This section lists the most user-visible limitations of the current toolset version.
The simulator supports only integer computations with a maximum word width of 32 bits.
The details of encoding and compression of the instruction memory are not taken into account before the actual generation of the bit image of the instruction memory. This decision was taken to allow simplification in the other parts of the toolset, and to allow easy "exploration" with different encodings and compression algorithms in the bit generation phase.
This implies that every time you see an instruction address in architectural simulation, you are actually seeing an instruction index. That is, instruction addressing (one instruction per instruction memory address) is assumed.
We might change this in future toolset versions to allow seeing exact instruction memory addresses during simulation, if that is seen as a necessity. Currently it does not seem to be a very important feature.
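To make the difference between an instruction index and a real instruction memory address concrete, the following is a minimal sketch. It assumes a hypothetical uncompressed bit image in which every instruction has the same width and is padded to whole bytes; the toolset does not prescribe such a layout, and the function name and parameters are illustrative only.

```python
def instruction_byte_address(index, instruction_width_bits):
    """Map a simulator instruction index to a byte address.

    Assumption (not from the toolset): fixed-width, uncompressed
    instructions, each padded up to a whole number of bytes.
    """
    bytes_per_instruction = (instruction_width_bits + 7) // 8
    return index * bytes_per_instruction
```

With a 64-bit instruction word, the "address" 3 shown by the simulator would correspond to byte address 24 under these assumptions. With compression or variable-length encodings there is no such simple relation, which is why the simulator reports indices.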
There is no tool to map data memory accesses in the source code to the actual target's memories. Therefore, you need to have a data memory which provides byte-addressing with 32-bit words. The data will be accessed using operations LDQ, LDH, LDW, STQ, STH, STW, which access the memory in 1 (q), 2 (h), and 4 (w) byte chunks. This should not be a problem, as it is rather easy to implement byte-addressing when the width of the actual memory is a power-of-two multiple of the byte. The parallel assembler allows any kind of minimum addressable unit (MAU) in the load/store units. In that case, LDQ/STQ access a single MAU, LDH/STH two MAUs, and LDW/STW four MAUs. One should keep in mind the 32-bit integer word limitation of the simulation. Thus, if the MAU is 32 bits wide, one cannot use LDH or LDW, because they would require 64 and 128 bits, respectively.
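The interaction between the MAU width and the 32-bit word limit can be checked with a few lines of arithmetic. The sketch below is illustrative only: the table and function names are hypothetical and not part of the toolset; only the operation names and the 1/2/4 MAU access sizes come from the text above.

```python
WORD_LIMIT_BITS = 32  # integer word limit of the simulator

# Number of MAUs touched by each memory operation (q = 1, h = 2, w = 4).
MAUS_PER_ACCESS = {
    "LDQ": 1, "STQ": 1,
    "LDH": 2, "STH": 2,
    "LDW": 4, "STW": 4,
}

def access_width_bits(operation, mau_bits):
    """Total number of bits moved by one execution of the operation."""
    return MAUS_PER_ACCESS[operation] * mau_bits

def usable_operations(mau_bits):
    """Operations whose access width fits in the 32-bit word limit."""
    return sorted(
        op for op in MAUS_PER_ACCESS
        if access_width_bits(op, mau_bits) <= WORD_LIMIT_BITS
    )
```

For an 8-bit MAU all six operations fit. For a 16-bit MAU only the q and h variants remain, and for a 32-bit MAU only LDQ and STQ can be simulated.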
The simulator assumes an ideal memory model which generates no stalls and returns data for the next instruction. This so-called 'Ideal SRAM' model allows modeling all types of memories from the point of view of the programmer. It is up to the load/store unit implementation to generate the lock signals in case the memory model does not match the ideal model.
There are hooks for adding more memory models which generate lock signals in the simulation, but in v1.0 the simulator does not provide other memory models, and thus does not support lock cycle simulation.
Guards as specified in the ADF specification [CSJ04] are only partially supported in TCE. Operators other than logical negation are not supported. That is, supported guards always "watch" a single register (FU output or a GPR). In addition, the shipped default scheduling algorithm in the compiler backend requires a register guard. Thus, if more exotic guarded execution is required, one has to write the programs in parallel assembly (Section 5.3).
Even though it is supported by the ADF and ProDe, writing operands after triggering an operation is supported neither by the compiler nor by the simulator. However, setting different latencies for the outputs of multi-result operations is supported. For example, using this feature one can have an iterative operation pipeline which computes several results, each of which is ready after the number of stages in one iteration.
TCE uses XML to store data of the architectures and implementation locations (see Section 2.2.1 and Section 2.2.4). The encoding of the XML files must be 7-bit ASCII. If other encodings are used, the result is undefined.
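Because the behavior with other encodings is undefined, it can be worth checking hand-edited XML files before feeding them to the tools. The following is a minimal sketch of such a check; it is not a TCE tool, and the function names are hypothetical.

```python
def non_ascii_positions(data):
    """Return (offset, byte value) for every byte outside 7-bit ASCII."""
    return [(offset, byte) for offset, byte in enumerate(data) if byte > 0x7F]

def is_seven_bit_ascii(path):
    """True if the file at 'path' contains only bytes in the range 0x00-0x7F."""
    with open(path, "rb") as handle:
        return not non_ascii_positions(handle.read())
```

A typical offender is a non-ASCII character in a comment or in an author name, which shows up as one or more bytes above 0x7F when the file is saved as UTF-8 or Latin-1.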
The simulator supports the half (16 bits), single (32 bits) and double (64 bits) precision floating point types. However, at this time only single precision float operations (ADDF, MULF, etc.) are selected automatically when starting from C/C++ code. Currently, the compiler converts doubles to floats to allow compiling and running code with doubles with reduced precision.
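The effect of converting doubles to floats can be estimated on a host machine before compiling. The sketch below rounds a value to IEEE 754 single precision using Python's standard struct module; it only illustrates the precision loss and makes no claim about how the compiler performs the conversion. The function name is hypothetical.

```python
import struct

def to_single_precision(value):
    """Round a Python float (a double) to the nearest single precision value."""
    # Packing as a 32-bit float and unpacking again performs the rounding.
    return struct.unpack("<f", struct.pack("<f", value))[0]
```

Values that are exactly representable in single precision, such as 0.5, are unchanged, whereas 0.1 or integers above 2^24 lose their low-order bits. Code that depends on the full 53-bit significand of a double will therefore produce different results.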
Pekka Jääskeläinen 2018-03-12