3 Notes About the Processor Template of TCE

The processor template from which the application-specific processors designed with TCE are derived is called the Transport Triggered Architecture (TTA). For a detailed description of the philosophy behind TTA, refer to [Cor97]. This section describes certain aspects of the TTA template used in the TCE toolset that might not be obvious.

1 Immediates/Constants

The TTA template supports two ways of transporting program constants in instructions. Short immediates are encoded in a move slot's source field and thus consume part of a single move slot. Constants transported in the source field are usually relatively small. Wider constants can be transported by means of so-called long immediates. Long immediates are defined using a parameter called the instruction template. Each TTA instruction is associated with a single instruction template, which defines the move slots that contain pieces of a long immediate, if any. While slots carry pieces of a long immediate, they cannot be used for regular data transports. An instruction containing a long immediate also specifies a target to which the long immediate must be transported. The target is a so-called immediate unit, which is written directly from the control unit, not through the transport buses. The immediate unit is like a register file except that it has only read ports and is written only by the instruction decoder in the control unit when it detects an instruction with a long immediate.
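
The role of the instruction template can be illustrated with a small Python model. This is only a sketch and not TCE code: the class InstructionTemplate, the function decode, the uniform slot_width, and the most-significant-piece-first ordering are all assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionTemplate:
    """Hypothetical model of an instruction template (not the TCE data structure)."""
    name: str
    imm_slots: tuple   # move slot indices carrying long-immediate pieces (assumed MSB piece first)
    slot_width: int    # bits contributed by each piece (assumed uniform for simplicity)


def decode(template, slots, imm_unit, target_reg):
    """Model of the decoder in the control unit.

    Concatenates the long-immediate pieces and writes the result directly
    into the immediate unit (a dict here), bypassing the transport buses.
    Returns the (slot index, move) pairs still usable as regular data transports.
    """
    if template.imm_slots:
        mask = (1 << template.slot_width) - 1
        value = 0
        for idx in template.imm_slots:
            value = (value << template.slot_width) | (slots[idx] & mask)
        imm_unit[target_reg] = value
    # Slots that carry immediate pieces cannot hold regular moves.
    return [(i, s) for i, s in enumerate(slots) if i not in template.imm_slots]


# A template without a long immediate and one that uses slots 1 and 2 for it.
NO_LIMM = InstructionTemplate("no_limm", (), 8)
LIMM = InstructionTemplate("limm", (1, 2), 8)
```

With the LIMM template, an instruction whose slots are ["r1 -> alu.in1", 0x12, 0x34] leaves a single regular move in slot 0 and writes the 16-bit constant 0x1234 to the chosen immediate unit register.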

2 Operations, Function Units, and Operand Bindings

Due to the way TCE abstracts operations and function units, an additional concept of operand binding is needed to connect the two in processor designs.

Operations in TCE are defined in a separate database (OSAL, Section 2.2.6) so that a reusable database of ``operation semantics'' can be built. The operations are used in processor designs by adding function units (FU) that implement the wanted operations. Operands of the operations can be mapped to different ports of the implementing FU, which affects how the processor is programmed. The mapping of operation operands to FU ports must therefore be described explicitly by the processor designer.

Example. A designer adds an FU called 'ALU' which implements the operations 'ADD', 'SUB', and 'NOT'. The ALU has two input ports called 'in1' and 'in2t' (triggering), and an output port called 'out'. A logical binding of the 'ADD' and 'SUB' operands to the ALU ports is the following:

 ADD.1 (the first input operand) bound to ALU.in1
 ADD.2 (the second input operand) bound to ALU.in2t
 ADD.3 (the output operand) bound to ALU.out

 SUB.1 (the first input operand) bound to ALU.in1
 SUB.2 (the second input operand) bound to ALU.in2t
 SUB.3 (the output operand) bound to ALU.out

However, the operation 'NOT' (bitwise negation) has only one input, so that input must be bound to port 'ALU.in2t' in order for the operation to be triggered:

 NOT.1 bound to ALU.in2t
 NOT.2 (the output operand) bound to ALU.out

Because we have a choice in how we bind the 'ADD' and 'SUB' input operands, the binding has to be explicit in the architecture definition. The operand binding described above defines an architecturally different TTA function unit from the following:

 SUB.2 bound to ALU.in1
 SUB.1 bound to ALU.in2t
 SUB.3 bound to ALU.out

The rest of the operands are bound as in the first example.

Due to the differing 'SUB' input bindings, code scheduled for the first processor cannot be run on a machine whose ALU uses the latter operand bindings. This small detail is important to understand when designing more complex FUs that implement multiple operations with different numbers of operands of varying sizes, but it is usually transparent to the basic user of TCE.
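
The effect of the two 'SUB' bindings can be reproduced with a toy Python model of the ALU. This is a sketch under stated assumptions, not TCE code: the FunctionUnit class, its write method, and the run_scheduled_code helper are hypothetical, and the output operand is assumed to be always bound to 'out'.

```python
# Operation semantics, kept separate from the function unit (in the spirit of OSAL).
OPERATIONS = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
}

# Input operand bindings: operation -> {operand index: input port}.
FIRST_BINDING = {
    "ADD": {1: "in1", 2: "in2t"},
    "SUB": {1: "in1", 2: "in2t"},
}
SWAPPED_SUB_BINDING = {
    "ADD": {1: "in1", 2: "in2t"},
    "SUB": {1: "in2t", 2: "in1"},
}


class FunctionUnit:
    """Hypothetical ALU model with ports 'in1', 'in2t' (triggering), and 'out'."""

    def __init__(self, bindings):
        self.bindings = bindings
        self.ports = {"in1": 0, "in2t": 0, "out": 0}

    def write(self, port, value, opcode=None):
        """Model one data transport into an input port."""
        self.ports[port] = value
        if port == "in2t":  # writing the triggering port starts the operation
            binding = self.bindings[opcode]
            operands = [self.ports[binding[i]] for i in sorted(binding)]
            self.ports["out"] = OPERATIONS[opcode](*operands)


def run_scheduled_code(fu):
    """The same move sequence, scheduled with FIRST_BINDING in mind (5 - 3)."""
    fu.write("in1", 5)          # move 5 to ALU.in1
    fu.write("in2t", 3, "SUB")  # move 3 to ALU.in2t with opcode SUB
    return fu.ports["out"]


result_first = run_scheduled_code(FunctionUnit(FIRST_BINDING))
result_swapped = run_scheduled_code(FunctionUnit(SWAPPED_SUB_BINDING))
```

Here result_first is 2 while result_swapped is -2: the identical sequence of moves computes 5 - 3 on the first ALU and 3 - 5 on the second, which is why the two function units are architecturally different.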

Reasons for wanting to fine-tune the operand bindings might include using input ports of a smaller width for some operation operands. For example, the width of the address operands in the memory-accessing operations of a load-store unit is often smaller than the data width. Similarly, the second operand of a shift operation, which defines the number of bits to shift, requires fewer bits than the shifted data operand.


3 Datapath Connectivity Levels

The datapath interconnection network of TTAs is visible to the programmer. This enables full customization of the connectivity, making it possible to remove connections that are rarely, if at all, used by the programs the processor at hand is designed to run. However, the fewer connections the machine has, the more challenging it becomes to automatically produce efficient code for it. This section describes the different TTA ``connectivity levels'' and their support in the TCE design flow.

The currently identified and supported connectivity levels are, in the order of descending level of connectivity, as follows:

  1. Fully Connected. Completely connected interconnection network ``matrix''. All bus-socket and socket-port connections are there. There is a shortcut for creating this type of connectivity in the ProDe tool.

    An easy target for the high-level language compiler tcecc. However, usually not a realistic design due to its high implementation cost.

  2. Directly Reachable. The connectivity has been reduced. However, there is still at least one direct connection from each function unit (FU) or register file (RF) output to all inputs.

    An easy target for tcecc.

  3. Fully RF connected. All FUs are connected to all RFs. That is, you can read and write any general purpose register (GPR) from any FU with a single move. However, some or all bypass connections between FUs might be missing.

    An easy target for tcecc. However, reduction of bypass connections means that less software bypassing can be done.

  4. Reachable. All FUs are connected to at least one RF and all RFs (and thus other FUs) can be reached via one or more additional register copy moves.

    Compilation is fully supported starting from HLL. The number of copies is not limited by tcecc. However, this often results in suboptimal code because the additional register copies introduce extra moves, consume registers, and create dependencies in the code which hinder parallelism.

  5. RF disconnected. Some FUs are not connected to any RF or there are ``separated islands'' without connectivity to other ``islands''.

    Not supported by tcecc. However, any connectivity type is supported by the assembler. Thus, one can resort to manual TTA assembly programming.
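
The five levels listed above can be approximated with a short Python classifier. This is a simplified sketch, not the check performed by TCE: sockets and individual ports are collapsed so that each bus is just a pair (units whose outputs it can read, units whose inputs it can write), and the "reachable" test (every FU can read from and write to some RF, and every RF can reach every other RF through copy moves) is one possible reading of the definition. The names classify and can_move are hypothetical.

```python
from itertools import product


def classify(units, buses):
    """Classify a machine into one of the five connectivity levels.

    units: dict mapping unit name -> "FU" or "RF"
    buses: list of (sources, dests) pairs of sets of unit names
    """
    names = set(units)
    fus = [n for n, kind in units.items() if kind == "FU"]
    rfs = [n for n, kind in units.items() if kind == "RF"]

    def can_move(src, dst):
        # A single move is possible if some bus reads src and writes dst.
        return any(src in s and dst in d for s, d in buses)

    # 1. Every bus is connected to every output and every input.
    if buses and all(set(s) == names and set(d) == names for s, d in buses):
        return "fully connected"
    # 2. Every output reaches every input with one move.
    if all(can_move(a, b) for a, b in product(names, repeat=2)):
        return "directly reachable"
    # 3. Every FU can read and write every RF with one move.
    if rfs and all(can_move(f, r) and can_move(r, f) for f, r in product(fus, rfs)):
        return "fully RF connected"

    def rf_closure(start):
        # RFs reachable from 'start' through register copy moves only.
        seen, todo = {start}, [start]
        while todo:
            cur = todo.pop()
            for nxt in rfs:
                if nxt not in seen and can_move(cur, nxt):
                    seen.add(nxt)
                    todo.append(nxt)
        return seen

    # 4. Each FU is attached to some RF and the RFs are mutually reachable.
    fus_attached = all(
        any(can_move(f, r) for r in rfs) and any(can_move(r, f) for r in rfs)
        for f in fus
    )
    rfs_linked = all(rf_closure(r) == set(rfs) for r in rfs)
    if rfs and fus_attached and rfs_linked:
        return "reachable"
    # 5. Anything else: unattached FUs or separated islands.
    return "RF disconnected"
```

For example, a machine with units {"ALU": "FU", "LSU": "FU", "RF1": "RF", "RF2": "RF"} and the two buses ({"ALU", "RF1"}, {"ALU", "RF1"}) and ({"LSU", "RF2"}, {"LSU", "RF2"}) consists of two separated islands and is classified as "RF disconnected".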

Pekka Jääskeläinen 2011-12-08