The processor template from which the application-specific processors designed with TCE are defined is called the Transport Triggered Architecture (TTA). For a detailed description of the TTA philosophy, refer to [Cor97]. This section describes certain aspects of the TTA template used in the TCE toolset that might not be very clear.
The TTA template supports two ways of transporting program constants in instructions. Short immediates are encoded in the move slot's source field, and thus consume a part of a single move slot. The constants transported in the source field are usually relatively small. Wider constants can be transported by means of so-called long immediates. Long immediates are defined using a parameter called an instruction template. The idea is that each TTA instruction is connected to a single instruction template, which defines the move slots that contain pieces of a long immediate, if any. A slot cannot be used for regular data transports while it is used for transporting a piece of a long immediate. An instruction containing a long immediate also specifies a target to which the long immediate must be transported. The target is a so-called immediate unit, which is written directly from the control unit, not through the transport buses. The immediate unit is like a register file except that it contains only read ports and is written only by the instruction decoder in the control unit when it detects an instruction with a long immediate.
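The difference between the two immediate kinds can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the slot widths, the signed interpretation of short immediates, and the most-significant-piece-first ordering are hypothetical and are not taken from the TCE instruction encoding.

```python
def fits_short_immediate(value, width, signed=True):
    """Return True if `value` fits in a source field of `width` bits.

    The signed/unsigned interpretation is an assumption of this sketch.
    """
    if signed:
        return -(1 << (width - 1)) <= value < (1 << (width - 1))
    return 0 <= value < (1 << width)


def split_long_immediate(value, slot_widths):
    """Split `value` into one piece per move slot named by a (hypothetical)
    instruction template. Pieces are ordered most significant first."""
    total = sum(slot_widths)
    value &= (1 << total) - 1  # truncate to the combined width of the slots
    pieces = []
    shift = total
    for width in slot_widths:
        shift -= width
        pieces.append((value >> shift) & ((1 << width) - 1))
    return pieces


def join_long_immediate(pieces, slot_widths):
    """Reassemble the pieces, as the instruction decoder would do before
    writing the result to the immediate unit."""
    value = 0
    for piece, width in zip(pieces, slot_widths):
        value = (value << width) | piece
    return value


# A constant too wide for an 8-bit source field is carried in two 16-bit slots.
constant = 0x12345678
assert not fits_short_immediate(constant, 8)
pieces = split_long_immediate(constant, [16, 16])
assert join_long_immediate(pieces, [16, 16]) == constant
```

In this sketch the two slots named by the template carry no regular moves in that instruction, which mirrors the restriction described above.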
Due to the way TCE abstracts operations and function units, an additional concept of operand binding is needed to connect the two in processor designs.
Operations in TCE are defined in a separate database (OSAL, Section 2.2.6) in order to allow defining a reusable database of "operation semantics". The operations are used in processor designs by adding function units (FU) that implement the desired operations. Operands of the operations can be mapped to different ports of the implementing FU, which affects programming of the processor. The mapping of operation operands to the FU ports must therefore be described explicitly by the processor designer.
Example. A designer adds an FU called 'ALU' which implements the operations 'ADD', 'SUB', and 'NOT'. The ALU has two input ports called 'in1' and 'in2t' (triggering), and an output port called 'out'. A logical binding of the 'ADD' and 'SUB' operands to the ALU ports is the following:
ADD.1 (the first input operand) bound to ALU.in1
ADD.2 (the second input operand) bound to ALU.in2t
ADD.3 (the output operand) bound to ALU.out
SUB.1 (the first input operand) bound to ALU.in1
SUB.2 (the second input operand) bound to ALU.in2t
SUB.3 (the output operand) bound to ALU.out
However, the operation 'NOT' (bitwise negation) has only one input, so that input must be bound to the triggering port 'ALU.in2t' for the operation to be triggered:
NOT.1 bound to ALU.in2t
NOT.2 (the output operand) bound to ALU.out
Because we have a choice in how we bind the 'ADD' and 'SUB' input operands, the binding has to be explicit in the architecture definition. The operand binding described above defines an architecturally different TTA function unit from the following:
SUB.2 bound to ALU.in1
SUB.1 bound to ALU.in2t
SUB.3 bound to ALU.out
The rest of the operands are bound as in the first example.
Due to the differing 'SUB' input bindings, code scheduled for the former processor cannot be run on a machine whose ALU uses the latter operand bindings. This small detail is important to understand when designing more complex FUs that implement multiple operations with different numbers of operands of varying sizes, but it is usually transparent to the basic user of TCE.
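The incompatibility can be demonstrated with a toy model of the example ALU. This is only a sketch under stated assumptions: the `run_alu` function, the move tuples, and the dictionary layout are hypothetical and do not correspond to any TCE file format or tool. 'NOT' is left out for brevity.

```python
# Operation semantics, expressed on operands 1 and 2 (operand 3 is the result).
OPERATIONS = {
    "ADD": lambda op1, op2: op1 + op2,
    "SUB": lambda op1, op2: op1 - op2,
}

# First binding from the text: SUB.1 -> in1, SUB.2 -> in2t.
BINDING_A = {
    "ADD": {1: "in1", 2: "in2t"},
    "SUB": {1: "in1", 2: "in2t"},
}

# Alternative binding from the text: SUB.2 -> in1, SUB.1 -> in2t.
BINDING_B = {
    "ADD": {1: "in1", 2: "in2t"},
    "SUB": {2: "in1", 1: "in2t"},
}


def run_alu(binding, moves):
    """Execute moves given as (value, port, opcode) tuples.

    A write to the triggering port 'in2t' starts the operation named by
    `opcode`; writes to 'in1' only latch the value (opcode is None).
    """
    ports = {"in1": 0, "in2t": 0, "out": 0}
    for value, port, opcode in moves:
        ports[port] = value
        if port == "in2t":
            operand_ports = binding[opcode]
            op1 = ports[operand_ports[1]]
            op2 = ports[operand_ports[2]]
            ports["out"] = OPERATIONS[opcode](op1, op2)
    return ports["out"]


# The same scheduled moves: 10 -> ALU.in1, then 3 -> ALU.in2t.SUB
program = [(10, "in1", None), (3, "in2t", "SUB")]

result_a = run_alu(BINDING_A, program)  # computes 10 - 3
result_b = run_alu(BINDING_B, program)  # computes 3 - 10
```

With the first binding the moves compute 10 - 3, while with the second they compute 3 - 10, so the same scheduled code yields different results on the two machines. 'ADD' is unaffected in this model only because addition is commutative.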
Reasons for fine-tuning the operand bindings might include using input ports of a smaller width for some operation operands. For example, the width of the address operands in the memory-accessing operations of a load-store unit is often smaller than the data width. Similarly, the second operand of a shift operation, which defines the number of bits to shift, requires fewer bits than the shifted data operand.
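As a rough worked example of the shift case, the following sketch computes how many bits a shift-amount port needs for a given data width. It assumes that valid shift amounts range from 0 to the data width minus one; the function name is hypothetical.

```python
def shift_amount_bits(data_width):
    """Bits needed to encode shift amounts 0 .. data_width - 1.

    Assumes shifting by the full data width or more is never requested.
    """
    return max(1, (data_width - 1).bit_length())


# A 32-bit shifter needs only a 5-bit port for the shift amount operand.
bits_for_32 = shift_amount_bits(32)
```

Under this assumption, binding the shift amount operand to a 5-bit port instead of a 32-bit one is sufficient for a 32-bit datapath.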
The currently identified and supported connectivity levels are, in the order of descending level of connectivity, as follows:
The easy target for the high-level language compiler tcecc. However, this is usually not a realistic design because of its high implementation costs.
An easy target for tcecc.
An easy target for tcecc. However, reduction of bypass connections means that less software bypassing can be done.
Compilation is fully supported starting from HLL. The number of copies is not limited by tcecc. However, this often results in suboptimal code due to the additional register copies, which introduce additional moves, consume registers, and create dependencies in the code that hinder parallelism.
Not supported by tcecc. However, any connectivity type is supported by the assembler. Thus, one can resort to manual TTA assembly programming.
Pekka Jääskeläinen 2012-06-07