2 Interconnection Network

See Section 7.1.4 for the definitions of the different connectivity classes of the TTAs designed with TCE.

Some additional points worth considering:

The best compromise between instructions per cycle (IPC) and clock speed is usually 'Fully RF connected' with additional FU-to-FU connections for commonly used bypasses.

This does not always hold, of course. For example, if the clock speed is limited by some component other than the interconnect, 'Directly Reachable' might give slightly better performance. Conversely, an architecture with only 'Reachable' connectivity can sometimes improve the clock speed so much that the gain outweighs the IPC penalty. The designer therefore needs to experiment, because the best choice depends on the parallelism of the program, among other things.
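The trade-off above comes down to simple arithmetic: execution time is the cycle count divided by the clock frequency, so a sparser interconnect wins only if its clock gain exceeds its cycle-count penalty. The following Python sketch illustrates this. The function names and all numbers are hypothetical and chosen only for illustration; in practice the cycle counts would come from simulating the program and the clock frequencies from synthesis.

```python
def runtime_us(cycle_count: int, clock_mhz: float) -> float:
    """Execution time in microseconds (cycles divided by cycles per microsecond)."""
    return cycle_count / clock_mhz


def breakeven_clock_mhz(base_cycles: int, base_clock_mhz: float,
                        new_cycles: int) -> float:
    """Clock frequency the new design needs to match the base design's runtime."""
    return base_clock_mhz * new_cycles / base_cycles


# Hypothetical figures, not measurements:
# design A ('Fully RF connected'): fewer cycles, slower clock
# design B ('Reachable'): 20% more cycles, faster clock
time_a = runtime_us(1_000_000, 200.0)
time_b = runtime_us(1_200_000, 250.0)
needed = breakeven_clock_mhz(1_000_000, 200.0, 1_200_000)

print(f"A: {time_a} us, B: {time_b} us, B breaks even at {needed} MHz")
```

With these made-up numbers, design B needs 20% more cycles but still finishes sooner, because its clock is 25% faster. Had its clock stayed below the break-even frequency, design A would have been the better choice.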

Future improvements to the compiler may clearly improve the performance of 'Reachable' architectures, at which point we'll update these hints.

There is a tool called ConnectionSweeper which can be used to optimize the interconnection network automatically. ConnectionSweeper is documented in Section 6.4.3.

1 Negative short immediates

Make sure you have one or more buses (strictly speaking, "move slots") with signed short immediate fields. Small negative numbers such as -1 and -4 are very common in the generated code, and transporting them as long immediates instead often causes bottlenecks in the machine code.
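To see why the signedness of the field matters, the following Python sketch checks whether a constant fits in a short immediate field of a given width. It assumes standard two's complement encoding for signed fields and plain zero extension for unsigned ones. The function name is hypothetical and is not part of TCE.

```python
def fits_short_immediate(value: int, width: int, signed: bool) -> bool:
    """Return True if `value` is representable in a `width`-bit immediate field.

    Signed fields use two's complement: -2**(width-1) .. 2**(width-1) - 1.
    Unsigned (zero-extended) fields cover 0 .. 2**width - 1.
    """
    if width <= 0:
        return False
    if signed:
        return -(1 << (width - 1)) <= value <= (1 << (width - 1)) - 1
    return 0 <= value <= (1 << width) - 1


# Small negative constants fit only when the field is signed.
for constant in (-1, -4, 100, 200):
    print(constant,
          "signed:", fits_short_immediate(constant, 8, signed=True),
          "unsigned:", fits_short_immediate(constant, 8, signed=False))
```

Under these assumptions, an 8-bit signed field holds -1 and -4 but not 200, whereas an 8-bit unsigned field holds 200 but no negative values at all. Any constant that fits in no short immediate field has to be handled some other way, such as with a long immediate.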

Pekka Jääskeläinen 2018-03-12