3 Notes About the Processor Template of TCE

The processor template from which the application-specific processors designed with TCE are derived is called the Transport Triggered Architecture (TTA). For a detailed description of the philosophy behind TTA, refer to [Cor97]. This section describes certain aspects of the TTA template used in the TCE toolset that might not be obvious.

1 Immediates/Constants

The TTA template supports two ways of transporting program constants in instructions. Short immediates are encoded in the source field of a move slot, and thus consume part of a single move slot. Constants transported in the source field are usually relatively small. Wider constants can be transported by means of so-called long immediates. Long immediates are defined using a parameter called the instruction template. Each TTA instruction is associated with a single instruction template, which defines the move slots that contain pieces of a long immediate, if any. A slot cannot be used for a regular data transport while it carries a piece of a long immediate. An instruction containing a long immediate also specifies the target to which the long immediate must be transported. The target is a so-called immediate unit, which is written directly from the control unit rather than through the transport buses. The immediate unit is like a register file except that it has only read ports and is written only by the instruction decoder in the control unit when it detects an instruction with a long immediate.
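The way an instruction template spreads a long immediate over move slots can be illustrated with a small Python model. This is only a sketch under stated assumptions: the class InstructionTemplate, the slot names 'B0' to 'B2', and the 16-bit piece widths are hypothetical and are not taken from TCE or its file formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionTemplate:
    """Hypothetical model of an instruction template (not a TCE API)."""
    name: str
    # Move slot name -> number of long-immediate bits it carries,
    # listed from the most significant piece to the least significant.
    imm_slots: dict

    def is_free(self, slot: str) -> bool:
        """A slot is usable for a regular move only if it carries no piece."""
        return slot not in self.imm_slots

    def split(self, value: int) -> dict:
        """Split a long immediate into the pieces carried by each slot."""
        total = sum(self.imm_slots.values())
        if not 0 <= value < (1 << total):
            raise ValueError(f"value does not fit in {total} bits")
        pieces = {}
        remaining = total
        for slot, width in self.imm_slots.items():
            remaining -= width
            pieces[slot] = (value >> remaining) & ((1 << width) - 1)
        return pieces

    def join(self, pieces: dict) -> int:
        """Reassemble the value, as the instruction decoder would before
        writing it to the immediate unit."""
        value = 0
        for slot, width in self.imm_slots.items():
            value = (value << width) | pieces[slot]
        return value


# Assumed example: a 32-bit long immediate carried in two 16-bit pieces.
tmpl = InstructionTemplate("limm32", {"B0": 16, "B1": 16})
pieces = tmpl.split(0xDEADBEEF)
restored = tmpl.join(pieces)
```

In this model, slots 'B0' and 'B1' are occupied by immediate pieces in any instruction that uses the template, while a third slot such as 'B2' remains available for a regular data transport.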

2 Operations, Function Units, and Operand Bindings

Due to the way TCE abstracts operations and function units, an additional concept of operand binding is needed to connect the two in processor designs.

Operations in TCE are defined in a separate database (OSAL, Section 2.2.6) so that the "operation semantics" can be collected into a reusable database. Operations are used in processor designs by adding function units (FUs) that implement the desired operations. The operands of an operation can be mapped to different ports of the implementing FU, which affects how the processor is programmed. The processor designer must therefore describe the mapping of operation operands to FU ports explicitly.

Example. A designer adds an FU called 'ALU' which implements the operations 'ADD', 'SUB', and 'NOT'. The ALU has two input ports called 'in1' and 'in2t' (triggering), and an output port called 'out'. A logical binding of the 'ADD' and 'SUB' operands to the ALU ports is the following:

 ADD.1 (the first input operand) bound to ALU.in1
 ADD.2 (the second input operand) bound to ALU.in2t
 ADD.3 (the output operand) bound to ALU.out

 SUB.1 (the first input operand) bound to ALU.in1
 SUB.2 (the second input operand) bound to ALU.in2t
 SUB.3 (the output operand) bound to ALU.out

However, the operation 'NOT', that is, bitwise negation, has only one input, so that input must be bound to port 'ALU.in2t' in order for the operation to be triggered:

 NOT.1 bound to ALU.in2t
 NOT.2 (the output operand) bound to ALU.out

Because we have a choice in how we bind the 'ADD' and 'SUB' input operands, the binding has to be explicit in the architecture definition. The operand binding described above defines an architecturally different TTA function unit from the following:

 SUB.2 bound to ALU.in1
 SUB.1 bound to ALU.in2t
 SUB.3 bound to ALU.out

The rest of the operands are bound as in the first example.

Because of the differing 'SUB' input bindings, code scheduled for the first processor cannot be run on a machine whose ALU uses the latter operand bindings. This small detail is important to understand when designing more complex FUs that implement multiple operations with different numbers of operands of varying sizes, but it is usually transparent to the basic user of TCE.
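The incompatibility can be reproduced with a toy Python model of the ALU. The sketch below is illustrative only: the functions schedule and execute, the move tuples, and the dictionaries are hypothetical simplifications, not the TCE scheduler or simulator. The model assumes that writing to 'in2t' triggers the operation and that the opcode travels with the triggering move.

```python
# Operation semantics, independent of any function unit: (arity, function).
OPERATIONS = {
    "ADD": (2, lambda a, b: a + b),
    "SUB": (2, lambda a, b: a - b),
    "NOT": (1, lambda a: ~a),
}

# First binding from the text: operand index -> ALU input port.
BINDING_A = {
    "ADD": {1: "in1", 2: "in2t"},
    "SUB": {1: "in1", 2: "in2t"},
    "NOT": {1: "in2t"},
}
# Second binding: only the SUB input operands are swapped.
BINDING_B = dict(BINDING_A, SUB={1: "in2t", 2: "in1"})


def schedule(op, operands, binding):
    """Turn an operation into moves (port, value, opcode) for a machine
    with the given binding. The move to the triggering port goes last."""
    moves = [(binding[op][i], v) for i, v in enumerate(operands, start=1)]
    moves.sort(key=lambda m: m[0] == "in2t")  # stable: trigger move last
    return [(port, v, op if port == "in2t" else None) for port, v in moves]


def execute(moves, binding):
    """Run moves on an ALU with the given binding and return the value
    that would appear on the 'out' port."""
    ports = {}
    for port, value, opcode in moves:
        ports[port] = value
        if opcode is not None:  # write to the triggering port
            arity, fn = OPERATIONS[opcode]
            args = [ports[binding[opcode][i]] for i in range(1, arity + 1)]
            return fn(*args)
    return None


# Code scheduled for the first machine ...
code = schedule("SUB", (10, 3), BINDING_A)
on_first = execute(code, BINDING_A)    # computes 10 - 3
on_second = execute(code, BINDING_B)   # same moves compute 3 - 10
```

In this model the same move sequence yields 7 on the first machine and -7 on the second, whereas 'ADD' happens to survive the swap only because it is commutative.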

Reasons for fine-tuning the operand bindings might include using narrower input ports for some operation operands. For example, the address operand of the memory access operations in a load-store unit is often narrower than the data width. Similarly, the second operand of a shift operation, which defines the number of bits to shift, requires fewer bits than the shifted data operand.
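As a rough illustration of the shift example, the Python helper below computes how many bits a shift-amount port needs. It is a sketch that assumes shift amounts are restricted to the range 0 to data width minus one; the function name shift_amount_width is hypothetical.

```python
def shift_amount_width(data_width: int) -> int:
    """Bits needed to encode shift amounts 0 .. data_width - 1."""
    if data_width < 1:
        raise ValueError("data width must be positive")
    # The largest shift amount is data_width - 1; bit_length() gives the
    # number of bits needed to represent it. Keep at least a 1-bit port.
    return max(1, (data_width - 1).bit_length())


width_for_32 = shift_amount_width(32)
width_for_64 = shift_amount_width(64)
```

Under this assumption, a 32-bit shifter needs only a 5-bit port for the shift amount, so binding that operand to a narrow port leaves the wide port for the shifted data.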

Pekka Jääskeläinen 2010-05-28