2 Programmer Interface

This section describes the programmer interface, which is the application binary interface followed by tcecc.

The programs produced by tcecc are fully linked and not expected to be relinked afterwards to other programs. Therefore, details such as the function calling convention can be customized per program by the compiler. For these parts, this section describes the current default behavior for reference.

1 Default Data Address Space Layout

When compiling from a higher-level language using tcecc, the machine must have at least one byte-addressable data address space. This is the "default address space", which is marked with the numerical id 0 when there are multiple data address spaces.

The C/C++ compilation lays out the global variables in the default address space, starting from the first data memory location. The first location after the global variables is the start of the heap, in case the program uses dynamic memory allocation. The heap grows upwards. The stack, which stores function-local variables and passes parameters in function calls, grows downwards from the largest memory address. The start address of the stack can be changed with the tcecc switch --init-sp.

Because the heap and the stack grow dynamically towards each other, the way to increase the space available to them is to increase the size of the data memory address space in the ADF.
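The layout described above can be sketched with a little address arithmetic. This is only an illustration: the data_layout struct and the helper names are hypothetical and not part of TCE, and the sketch takes the default stack start to be the largest address of the space without modelling any alignment that tcecc may apply.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the default data address space. */
typedef struct {
    uint32_t mem_start;    /* first data memory location */
    uint32_t mem_size;     /* size of the address space in bytes (from the ADF) */
    uint32_t globals_size; /* bytes occupied by global variables */
} data_layout;

/* The heap starts right after the globals and grows upwards. */
static uint32_t heap_start(const data_layout *l) {
    assert(l->globals_size <= l->mem_size);
    return l->mem_start + l->globals_size;
}

/* Largest address of the space; the stack grows downwards from here. */
static uint32_t stack_top(const data_layout *l) {
    assert(l->mem_size > 0);
    return l->mem_start + l->mem_size - 1;
}

/* Bytes left for the heap and the stack to share. */
static uint32_t shared_space(const data_layout *l) {
    assert(l->globals_size <= l->mem_size);
    return l->mem_size - l->globals_size;
}
```

With a 64 KiB address space starting at 0 and 1 KiB of globals, the heap would start at address 1024 and the heap and stack would share the remaining 64512 bytes. Growing mem_size is the only way to enlarge that shared region.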

2 Instruction Address Space

The default GCU assumes that the instruction memory is instruction addressable. In other words, the instruction word must fit in the MAU of the instruction memory. This way, the next instruction can be referenced by incrementing the current program counter by one.

How to interface the instruction memory with an actual memory chip is out of scope for TCE because there are too many different platforms, chips, and possibilities. The advantage is that the user is free to implement almost any kind of memory hierarchy. Most likely you will need to implement a memory adapter that binds the memory chip and the TTA processor interfaces together.

3 Alignment of Words in Memory

tcecc aligns the address of each data word to the size of its data type in bytes. For example, if a floating-point word takes 4 MAUs, its address must be a multiple of 4.

The memory-accessing operations in the base operation set (ldq, ldh, ldw, ldd, stq, sth, stw, and std) are aligned to their access size. Operations stq/ldq access single bytes, so their alignment is 1 byte; for sth/ldh it is 2 bytes, and for stw/ldw it is 4 bytes. Thus, one cannot assume to be able to access, for example, a 4-byte word at address 3 using stw/ldw.

The double-precision floating-point operations std/ldd, which access 64-bit words, are aligned at 8-byte addresses.

The effect of a misaligned word access is undefined and depends on the processor implementation. Typical outcomes are: (1) the processor accesses the nearest lower aligned address instead (because the lower address bits are zeroed); (2) the processor halts or raises an exception (unlikely). More likely, a processor could enter a slower operation mode and perform the misaligned memory access. However, none of these behaviors is required of the aligned base memory operations shipped with TCE.
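The alignment rule and the "nearest lower aligned address" outcome can be expressed as two small helpers. This is a minimal sketch, assuming byte addresses and power-of-two access sizes (1, 2, 4, 8). The function names are illustrative and do not come from TCE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of size; size must be a power of two. */
static bool is_aligned(uint32_t addr, uint32_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}

/* Nearest lower aligned address, obtained by zeroing the low address bits.
 * This models outcome (1) only; real hardware behavior is undefined. */
static uint32_t align_down(uint32_t addr, uint32_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return addr & ~(size - 1);
}
```

For instance, is_aligned(3, 4) is false, so a 4-byte stw/ldw access at address 3 is misaligned; a processor that zeroes the low address bits would access address align_down(3, 4), which is 0.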

4 Stack Frame Layout

The register assigned to act as the stack pointer register (referred to as SPR) is register number 0 of the first 32-bit register file in the ADF. There is an optional frame pointer register (FPR), which is enabled only in functions with variable-length arrays or alloca calls. The FPR is the third 32-bit register available in the ADF, counted in round-robin fashion across the register files.

The default mode without FPR is called NoFP mode, and the mode with FPR is called HasFP mode. In NoFP mode, the FPR is a freely usable GPR with callee-save semantics. The stack frame mode is function-specific: the same program can contain functions in either mode, and parameter passing in the two modes is compatible, so code in either mode can call a function that uses either mode.

Figure 7.2: Stack layout (NoFP mode on left, HasFP mode on right) for a situation with two call frames, A() calling B(). Functions like B that do not call any other functions do not store their return address to the stack.

The stack of TTAs supported by TCE grows downwards.

The stack of a program is divided up into blocks of contiguous memory called frames. Each frame is activated upon entering a function and is destroyed (its storage is made available for new frames) when that function invocation returns.

In NoFP mode, the stack pointer register (SPR) always points to the location of the last outgoing parameter of the calls the function can make, and it is adjusted only at the beginning and end of a function. If the function contains calls to other functions, the first word after (below) the old stack pointer contains the old return address. After (below) that is the area for local variables. Local variables and spilled registers share the same area; there is no distinction between them. After (below) the local variables is the area for outgoing parameters. When a function calls multiple other functions, the space allocated for outgoing parameters is determined by the call that needs the largest parameter area.
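The NoFP frame described above can be sized with a short calculation. This is a rough sketch under stated assumptions: the return address slot is taken to be one 32-bit word, alignment padding is ignored, and nofp_frame_size is a hypothetical helper rather than anything tcecc exposes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RA_SLOT_BYTES 4u /* assumption: return address occupies one 32-bit word */

/* Size in bytes of a NoFP frame: return address slot (only when the function
 * makes calls), local variables and spills, and an outgoing parameter area
 * large enough for the most demanding call. */
static uint32_t nofp_frame_size(uint32_t locals_bytes,
                                const uint32_t *call_param_bytes,
                                size_t n_calls) {
    assert(n_calls == 0 || call_param_bytes != NULL);
    uint32_t max_outgoing = 0;
    for (size_t i = 0; i < n_calls; i++) {
        if (call_param_bytes[i] > max_outgoing) {
            max_outgoing = call_param_bytes[i];
        }
    }
    uint32_t ra_bytes = (n_calls > 0) ? RA_SLOT_BYTES : 0u;
    return ra_bytes + locals_bytes + max_outgoing;
}
```

A function with 12 bytes of locals that makes three calls needing 8, 16, and 4 bytes of parameters would get a frame of 4 + 12 + 16 = 32 bytes under these assumptions, while a leaf function with the same locals would need only 12 bytes.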

In HasFP mode, space for the outgoing parameters is not allocated at the beginning of the function. At the beginning of a function, the old return address (RA) and FPR are pushed to the stack, and SPR is then copied to FPR. Space for local variables is then allocated by decreasing the SPR. The SPR always points to the topmost item of the stack; usually this is the end of the local variable area. When a function is called, the function parameters are pushed onto the stack and SPR is adjusted accordingly, which allocates the area for the outgoing parameters from the stack.

The stack grows downwards; therefore the incoming arguments and the local variables on the local stack frame have positive offsets when accessed from the current function, whereas the outgoing arguments have negative stack offsets.

The value of the return address register is pushed to the stack before the function's local variables if the function is not a leaf function (a function without calls to other functions). If the called function returns an object that does not fit in the return value register, the function gets an implicit argument that points to the location in the callee's stack where the called function will write the return value object.

5 Word Byte Order

TCE-TTA processors follow little-endian byte order if the ADF of the processor has the little-endian property enabled. Otherwise, the processors follow big-endian byte order.
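The difference between the two byte orders can be seen by storing the same 32-bit word both ways. The helpers below are a generic illustration with hypothetical names; they are not TCE code and do not read the ADF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit word with the least significant byte at the lowest address. */
static void store_le32(uint8_t *dst, uint32_t v) {
    assert(dst != NULL);
    dst[0] = (uint8_t)(v & 0xFFu);
    dst[1] = (uint8_t)((v >> 8) & 0xFFu);
    dst[2] = (uint8_t)((v >> 16) & 0xFFu);
    dst[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Store a 32-bit word with the most significant byte at the lowest address. */
static void store_be32(uint8_t *dst, uint32_t v) {
    assert(dst != NULL);
    dst[0] = (uint8_t)((v >> 24) & 0xFFu);
    dst[1] = (uint8_t)((v >> 16) & 0xFFu);
    dst[2] = (uint8_t)((v >> 8) & 0xFFu);
    dst[3] = (uint8_t)(v & 0xFFu);
}
```

Storing 0x11223344 places byte 0x44 at the lowest address in little-endian order and byte 0x11 at the lowest address in big-endian order.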

6 Function Calling Conventions

The current default calling convention generated by tcecc is as follows.

The second 32-bit GPR of the ADF (the first register of the second 32-bit register file, or the second register of the only 32-bit register file) is the return value register, which is used to return 32-bit values such as ints or floats from functions. The same register is reused for the first passed parameter value in function calls, if suitable.

In a function call, the arguments and return values that do not fit in the 32-bit return value / parameter register are pushed to the stack.

The return location for the last executed CALL operation is stored in the RA register in the control unit. It is pushed to the stack when function calls are nested. Refer to the stack frame layout for more information.
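One reading of the default convention above can be written down as a small classifier. This is a sketch, not tcecc's actual logic: it assumes that "suitable" means the first argument fits in 32 bits and that every other argument goes to the stack because only one parameter register is described. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PARAM_REG_BYTES 4u /* the 32-bit return value / parameter register */

typedef enum { ARG_IN_REGISTER, ARG_ON_STACK } arg_location;

/* Where an argument is passed, given its position (0-based) and size. */
static arg_location classify_arg(size_t index, size_t size_bytes) {
    assert(size_bytes > 0);
    if (index == 0 && size_bytes <= PARAM_REG_BYTES) {
        return ARG_IN_REGISTER;
    }
    return ARG_ON_STACK;
}

/* True if a return value of this size fits in the return value register. */
static bool returns_in_register(size_t size_bytes) {
    assert(size_bytes > 0);
    return size_bytes <= PARAM_REG_BYTES;
}
```

Under these assumptions, for a call such as f(int a, double b) the first argument would travel in the register and the second on the stack, and a 16-byte struct result would not be returned in the register.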

7 Register Context Saving Conventions

All registers except SPR may have to be saved and restored as a result of a function call. When a register carries a variable in the caller code that remains live across a function call, and uses a register overwritten in the callee, it needs to be saved and restored. Registers are subdivided in two groups: those that are saved and restored around a function call site, and those that are saved at the beginning of the called function and are restored before it returns. The first group is termed caller-saved registers, the second group callee-saved registers.

The compiler has the freedom to optimize the context saving convention. Depending on the properties of the program and the capacity of the register allocator, a varying fraction of the total GPRs in a register file may be assigned to either of the above-mentioned groups. The register allocator may even dedicate one register file completely to caller-saved registers and another to callee-saved registers.

The current default register saving convention followed by tcecc is to treat all registers except FPR (when not used as FPR) as caller-saved.

Pekka Jääskeläinen 2018-03-12