

3 Parallel Assembler and Disassembler

The TCE Assembler compiles parallel TTA assembly programs into a TPEF binary. The Disassembler provides a textual disassembly of parallel TTA programs. Both tools can be executed only from the command line.

1 Assembler

Input: program source file in the TTA parallel assembler language and an ADF

Output: parallel TPEF

2 Disassembler

Input: parallel TPEF

Output: textual disassembly of the program

The rest of this section describes the textual appearance of TTA programs, that is, how a TTA program should be disassembled. The same textual format is accepted by the assembler and assembled into a TTA program.

1 Usage of Disassembler

The usage of the tcedisasm application is as follows:

tcedisasm <options> adffile tpeffile

The adffile is the ADF file.
The tpeffile is the parallel TPEF file.

The possible options of the application are as follows:

Short Name  Long Name   Description
o           outputfile  The name of the output file.
h           help        Prints out help info about program usage and parameters.

The application disassembles the given parallel TPEF file according to the given ADF file. Program output is directed to the standard output stream if no output file is specified. The output is in the TTA parallel assembler language.

The program output can then be used as an input for the assembler program tceasm.

The options can be given using either the short name or the long name. If the short name is used, a hyphen (-) prefix must be used, for example -o followed by the name of the output file. If the long name is used, a double hyphen (--) prefix must be used instead.

1 An example of the usage

The following example generates a disassembly of a parallel TPEF in the file add4_schedule.tpef and writes the output to a file named output_dis.asm.

tcedisasm -o output_dis.asm add4_supported.adf add4_schedule.tpef

2 Usage of Assembler

The usage of the tceasm application is as follows:

tceasm <options> adffile assemblerfile

The adffile is the ADF file.
The assemblerfile is the program source file in TTA parallel assembler language.

The possible options of the application are as follows:

Short Name  Long Name   Description
o           outputfile  The name of the output file.
q           quiet       Do not print warnings.
h           help        Help info about program usage and parameters.

The application creates a TPEF binary file from the given assembler file. Program output is written to the file specified by the outputfile parameter. If the parameter is not given, the name of the output file is the base name of the given assembler file concatenated with .tpef.

The options can be given using either the short name or the long name. If the short name is used, a hyphen (-) prefix must be used, for example -o followed by the name of the output file. If the long name is used, a double hyphen (--) prefix must be used instead.

1 An example of the usage

The following example generates a TPEF binary file named program.tpef.

tceasm add4_schedule.adf program.asm

3 Memory Areas

A TTA assembly file consists of several memory areas. Each area specifies the contents (instructions or data) of part of an independently addressed memory (or address space). There are two kinds of memory areas: code areas and data areas. A code area begins a section of the file that defines TTA instructions. A data area begins a section that defines groups of memory locations (each group is collectively termed a ``memory chunk'' in this context) and reserves them for variables. By declaring data labels (see Section 5.3.7), variables can be referred to by name instead of by address.

Memory areas are introduced by a header, which defines the type of area and its properties. The header is followed by several logical lines (described in Section 5.3.4), each declaring a TTA instruction or a memory chunk. The end of an area in the assembly file is not marked. Simply, a memory area terminates when the header of another memory area or the end of the assembly file is encountered.

The memory area header has one of the following formats:

  CODE [start] ;
  DATA name [start] ;

A code area begins the declaration of TTA instructions that occupy a segment of the instruction memory. A data area begins the declaration of memory chunks reserved to data structures and variables.

A TTA program can work with several independently addressed data memories. The locations of different memories belong to different address spaces. The name parameter defines the address space a memory area belongs to. The code area declaration does not have a name parameter, because TTA programs support only one address space for instruction memory, so its name is redundant.

The start parameter defines the starting address of the area being declared within its address space. The start address can be, and usually is, omitted. When it is omitted, the assembler computes the start address and arranges the different memory area declarations that refer to the same address space. The way the start address is computed is left to the assembler, which must follow only two rules:

  1. If a memory area declaration for a given address space appears before another declaration for the same address space, it is assigned a lower address.
  2. The start address of a memory area declaration is at least equal to the size of the previous area declared in the same address space plus its start address.

The second rule guarantees that the assembler reserves enough memory for an area to contain all the (chunk or instruction) declarations in it.
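
The two rules can be illustrated with a small Python sketch. It is not the tceasm implementation: the function name `assign_start_addresses`, the tuple layout and the choice of packing areas back to back are assumptions made for illustration. Any other strategy that satisfies both rules would be equally valid. As a simplification, the sketch rejects an explicit start address that lies below the end of the previous area in the same address space.

```python
def assign_start_addresses(areas):
    """Assign a start address to each memory area, in declaration order.

    `areas` is a list of (address_space, size, explicit_start_or_None).
    Hypothetical helper: areas without an explicit start are packed
    directly after the previous area of the same address space.
    """
    next_free = {}  # address space name -> first address not yet reserved
    starts = []
    for space, size, explicit in areas:
        lowest_allowed = next_free.get(space, 0)
        start = lowest_allowed if explicit is None else explicit
        if start < lowest_allowed:
            # Simplification of this sketch, not a documented tceasm check.
            raise ValueError(f"area at {start:#x} collides with the previous area")
        starts.append(start)
        # Rule 2: the next area starts at or after previous start + size.
        next_free[space] = start + size
    return starts
```

For instance, two areas of sizes 4 and 8 declared in the same address space without start addresses are placed at addresses 0 and 4 by this sketch.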


4 General Line Format

The body of a memory area consists of logical lines. Each logical line can span one or more physical lines of the text. Conversely, multiple logical lines can appear in a single physical line. All logical lines are terminated by a semicolon `;'.

The format of logical lines is free. Any number of whitespace characters (tabs, blanks and newlines) can appear between any two tokens in a line. Whitespace is ignored and is only useful to improve readability. See Section 5.3.14 for suggestions about formatting style and use of whitespaces.

Comments start with a hash character (`#') and end at the end of the physical line. Comments are ignored by the syntax. A line that contains only a comment (and possibly whitespaces before the hash character) is completely removed before interpreting the program.


5 Allowed characters

Names (labels, procedures, function units etc.) used in assembly code must obey the following format:

 [a-zA-Z_][a-zA-Z0-9_]*

This means that a name must begin with a letter in the range a-z or A-Z, or with an underscore. After the first character, digits can also be used.

Upper case and lower case letters are treated as different characters. For example, the labels main: and Main: are distinct.
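
The name format can be checked with a short Python sketch. It assumes that the second character class is meant to be `a-zA-Z0-9_`, as the prose describes; the helper name `is_valid_name` is hypothetical.

```python
import re

# Name format as described above: a letter or underscore first,
# then any number of letters, digits or underscores.
NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

def is_valid_name(name):
    """Return True if `name` is a legal label, procedure or unit name."""
    return NAME_RE.fullmatch(name) is not None
```

With this check, `main`, `Main` and `_tmp1` are accepted, while `1abc` and `a-b` are rejected.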


6 Literals

Literals are expressions that represent constant values. There are two classes of literals: numeric literals and strings.

1 Numeric literals.

A numeric literal is a numeral in a positional system. The base of the system (or radix) can be decimal, hexadecimal or binary. Hexadecimal numbers are prefixed with `0x', binary numbers are prefixed with `0b'. Numbers in base 10 do not have a prefix. Floating-point numbers can only have decimal base.

Example: Numeric literals.

  0x56F05A
  7116083
  0b11011001001010100110011
  17.759
  308e+55
The first three literals are interpreted as integer numbers expressed in base 16, 10 and 2, respectively. An all-digit literal string starting with the digit `0' is interpreted as a decimal number, not as an octal number, as is customary in many high-level languages. The last two literals are interpreted as floating-point numbers. Unlike integer literals, floating-point literals can appear only in initialisation sequences of data declarations (see Section 5.3.8 for details).
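
The interpretation of integer literals can be sketched in Python as follows. The function name `parse_integer_literal` is hypothetical, and the sketch only handles the lowercase prefixes shown above; it does not cover floating-point literals.

```python
def parse_integer_literal(text):
    """Convert an integer literal of the assembly language to an int."""
    if text.startswith("0x"):
        return int(text[2:], 16)
    if text.startswith("0b"):
        return int(text[2:], 2)
    # No prefix: always base 10, even with leading zeroes (never octal).
    return int(text, 10)
```

Under this reading, the literal `010` denotes ten, not eight.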

2 String literals.

A string literal consists of a string of characters. The numeric values stored in the memory chunk initialised by a string literal depend on the character encoding of the host machine. The use of string literals therefore makes the assembly program less portable.

Literals are defined as sequences of characters enclosed in double (") or single (') quotes. A literal can be split into multiple quoted strings of characters. All strings are concatenated to form a single sequence of characters.

Double quotes can be used to escape single quotes and vice versa. To escape a string that contains both, the declaration must be split into multiple strings.

Example: String literals. The following literals all declare the same string Can't open file "%1".

  "Can't open file " '"%1"'
  'Can' "'" 't open file "%1"'
  "Can't open" ' file "%1"'

String literals can appear only in initialisation sequences of data declarations (see Section 5.3.8 for details).
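
The concatenation of quoted segments can be sketched in Python. This is an illustration only: the names `SEGMENT_RE` and `join_string_literal` are hypothetical, and the sketch ignores size specifiers.

```python
import re

# One quoted segment: either "..." (may contain ') or '...' (may contain ").
SEGMENT_RE = re.compile(r"""\s*(?:"([^"]*)"|'([^']*)')""")

def join_string_literal(text):
    """Concatenate all quoted segments of a string literal."""
    text = text.strip()
    pos, parts = 0, []
    while pos < len(text):
        match = SEGMENT_RE.match(text, pos)
        if match is None:
            raise ValueError(f"malformed string literal at offset {pos}")
        # Exactly one of the two groups matched; keep its contents.
        parts.append(match.group(1) if match.group(1) is not None else match.group(2))
        pos = match.end()
    return "".join(parts)
```

Applied to a literal split into three segments, such as the second example above, the function returns the single string Can't open file "%1".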

3 Size, encoding and layout of string literals.

By default, the size (number of MAU's) of the value defined by a string literal is equal to the number of characters. If one MAU is wider than the character encoding, then the value stored in the MAU is padded with zeroes. The position of the padding bits depends on the byte-order of the target architecture: most significant if ``big endian'', least significant if ``little endian''.

If one character does not fit in a single MAU, then each character is encoded in $\lceil m/n \rceil$ MAU's, where n is the MAU's bit width and m is the number of bits taken by a character.

When necessary (for example, to avoid wasting bits), it is possible to specify how many characters are packed in one MAU or, vice versa, how many MAU's are taken to encode one character. The size specifier for characters is prefixed to a quoted string and consists of a number followed by a colon.

If $n > m$, the prefixed number specifies the number of characters packed in a single MAU. For example, if one MAU is 32 bits long and a character takes 8 bits, then the size specifier in

  4:"My string"
means: pack 4 characters in one MAU. The size specifier cannot be greater than $\lceil n/m \rceil$. The size `1' is equivalent to the default.

If $m > n$, the prefixed number specifies the number of adjacent MAU's used to encode one character. For example, if MAU's are 8 bits long and one character takes 16 bits, then the same size specifier means: reserve 4 MAU's to encode a single character. In this case, a 16-bit character is encoded in 32 bits, and padding occurs as described above. Here the size specifier cannot be smaller than $\lceil m/n \rceil$, which is the default value when the size is not specified explicitly.
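
The size rules can be summarised with a Python sketch. The function `string_size_in_maus` and its parameter names are hypothetical. The sketch assumes that a partially filled last MAU still counts as a whole MAU, which the text does not state explicitly, and it does not validate the bounds of the size specifier.

```python
import math

def string_size_in_maus(num_chars, char_bits, mau_bits, specifier=None):
    """Number of MAUs occupied by a string literal of `num_chars` characters."""
    if char_bits <= mau_bits:
        # Characters fit in a MAU: the specifier is "characters per MAU".
        chars_per_mau = 1 if specifier is None else specifier
        return math.ceil(num_chars / chars_per_mau)
    # Characters are wider than a MAU: the specifier is "MAUs per character".
    default = math.ceil(char_bits / mau_bits)
    maus_per_char = default if specifier is None else specifier
    return num_chars * maus_per_char
```

Under these assumptions, the 9-character literal `4:"My string"` with 8-bit characters and 32-bit MAU's occupies 3 MAU's, against 9 MAU's without the size specifier.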


7 Labels

A label is a name that can be used in lieu of a memory address. Labels ``decorate'' data or instruction addresses and can be used to refer to, respectively, the address of a data structure or an instruction. The address space of a label does not need to be specified explicitly, because it is implied by the memory area declaration block the label belongs to.

A label declaration consists of a name string followed by a colon:

  label-name:

Only a restricted set of characters can appear in label names. See Section 5.3.5 for details.

A label must always appear at the beginning of a logical line and must be followed by a normal line declaration (see Sections 5.3.8, 5.3.9 for details). Only whitespace or another label can precede a label. Label declarations always refer to the address of the following memory location, which is the start location of the element (data chunk or a TTA instruction) specified by the line.

Labels can be used instead of the address literal they represent in data definitions and instruction definitions. They are referred to simply by their name (without the colon), as in the following examples:

  # label reference inside a code area (as immediate)
  aLabel -> r5 ;

  # label reference inside a data area (as initialisation value)
  DA 4 aLabel ;


8 Data Line

A data line consists of a directive that reserves a chunk of memory (expressed as an integer number of minimum addressable units) for a data structure used by the TTA program:

  DA size [init-chunk-1 init-chunk-2 ...] ;

The keyword `DA' (Data Area) introduces the declaration of a memory chunk. The parameter size gives the size of the memory chunk in MAU's of the address space of the memory area.

By default, memory chunks are initialised with zeroes. A memory chunk can also be initialised explicitly. In this case, size is followed by a number of literals (described in Section 5.3.6) or labels (Section 5.3.7) that represent initialisation values. An initialisation value represents a constant integer number and always takes an integer number of MAU's.

1 Size of the initialisation values.

The size of an initialisation value can be given by prepending the size (in MAU's) followed by a colon to the initialisation value. If not defined explicitly, the size of the initialisation values is computed by means of a number of rules. If the declaration contains only one initialisation value, then the numeric value is extended to size. Otherwise, the rules are more complex and depend on the type of initialisation value.

  1. If the initialisation value is a numeric literal expressed in base 10, then it is extended to size MAU's.
  2. If the initialisation value is a numeric literal expressed in base 2 or 16, then its size is extended to the minimum number of MAU's necessary to represent all its digits, even if the most significant digits are zeroes.
  3. If the initialisation value is a label, then it is extended to size MAU's.

2 Extension sign.

Decimal literals are sign-extended. Binary, hexadecimal and string literal values are zero-extended. Also the initialisation values represented by labels are always zero-extended.

3 Partial Initialisation.

If the combined size of the initialisation values (computed or specified explicitly, it does not matter) is smaller than the size declared by the `DA' directive, then the remaining MAU's are initialised with zeroes.

Example: Padding of single initialisation elements. Given an 8-bit MAU, the following declarations:

  DA 2 0xBB ; # equivalent to 2:0xBB
  DA 2 0b110001 ; # 0x31 (padded with 2 zero bits)
  DA 2 -13 ;
define 2-MAU initialisation values: 0x00BB, 0x0031, and 0xFFF3, respectively.
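
The extension of a single initialisation value can be sketched in Python. The helper `single_init_value` is hypothetical and handles only integer literals; the range check is an assumption, since the text does not say how oversized values are treated.

```python
def single_init_value(literal, size, mau_bits=8):
    """Value stored by `DA size literal`, as an unsigned integer
    of size * mau_bits bits."""
    total_bits = size * mau_bits
    if literal.startswith("0x"):
        value = int(literal[2:], 16)   # hexadecimal: zero-extended
    elif literal.startswith("0b"):
        value = int(literal[2:], 2)    # binary: zero-extended
    else:
        value = int(literal, 10)       # decimal: sign-extended
    # Assumed range check, not taken from the specification.
    if not -(1 << (total_bits - 1)) <= value < (1 << total_bits):
        raise ValueError("literal does not fit in the declared size")
    # Masking a negative number yields its two's complement,
    # which is the same as sign extension to total_bits.
    return value & ((1 << total_bits) - 1)
```

With an 8-bit MAU, the sketch reproduces the three values of the example: 0x00BB, 0x0031 and 0xFFF3.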

Example: Padding of multi-element initialisation lists. The following declarations:

  DA 4 0x00A8 0x11;
  DA 4 0b0000000010100100 0x11 ;
are equivalent and force the size of the first initialisation value in each list to 16 bits (2 MAU's) even if the integer expressed by the declarations takes fewer bits. The 4-MAU memory chunk is initialised, in both declarations, with the number 0x00A81100. Another way to force the number of MAU's taken by each initialisation value is to specify it explicitly. The following declarations are equivalent to the declarations above:
  DA 4 2:0xA8  0x11;
  DA 4 2:0b10100100  0x11;
Finally, the following declarations:
  DA 4 1:0xA8  0x11;
  DA 4 1:0b10100100  0x11;
define a memory chunk initialised with 0xA8110000. The initialisation value (in case of the binary literal, after padding to MAU bit width) defines only the first MAU.

When labels appear in initialisation sequences consisting of multiple elements, the size of the label address stored must be specified explicitly.

Example. Initialisation with Labels. The following declaration initialises a 6-MAU data structure where the first 2 MAU's contain characters `A' and `C', respectively, and the following 4 MAU's contain two addresses. The addresses, in this target architecture, take 2 MAU's.

  DA 6 0x41 0x43 2:nextPointer 2:prevPointer ;


9 Code Line

A code line defines a TTA instruction and consists of a comma-separated, fixed sequence of bus slots. A bus slot in any given cycle can either program a data transport or encode part of a long immediate and program the action of writing it to a destination (immediate) register for later use.

A special case of code line defines an empty TTA instruction. This line contains only three dots separated by one or more whitespace characters:

  . . . ; # completely empty TTA instruction

A special case of move slot is the empty move slot. An empty move slot neither programs a data transport nor encodes bits of a long immediate. A special token consisting of three dots represents an empty move slot. Thus, for a three-bus TTA processor, the following code line represents an empty instruction:

  ... , ... , ... ; # completely empty TTA instruction

10 Long Immediate Chunk

When a move slot encodes part of a long immediate, its declaration is surrounded by square brackets and has the following format:

  destination=value

where destination is a valid register specifier and value is a literal or a label that gives the value of the immediate. The only valid register specifiers are those that represent a register that belongs to an immediate unit. See Section 5.3.12 for details on register specifiers.

When the bits of a long immediate occupy more than one move slot, the format of the immediate declaration is slightly more complex. In this case, the value of the immediate (whether literal or label) is declared in one and only one of the slots (it does not matter which one). The other slots contain only the destination register specifier.
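
The rule that exactly one slot carries the value can be sketched in Python. The function `long_immediate_value` and its data layout are hypothetical: each move slot of one long immediate is modelled as a (destination, value) pair, with `None` standing for a slot that contains only the destination register specifier.

```python
def long_immediate_value(chunks):
    """Return the value of a long immediate spread over several move slots.

    `chunks` is a list of (destination, value_or_None) pairs, one per
    move slot that encodes bits of the same long immediate.
    """
    destinations = {dest for dest, _ in chunks}
    if len(destinations) != 1:
        raise ValueError("all chunks must name the same destination register")
    values = [value for _, value in chunks if value is not None]
    if not values:
        raise ValueError("unspecified long immediate value")
    if len(values) > 1:
        raise ValueError("multiply defined long immediate value")
    return values[0]
```

For the two slots `[i1=var]` and `[i1]`, modelled as `[("i1", "var"), ("i1", None)]`, the function returns `"var"`.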

11 Data Transport

A data transport consists of one optional part (a guard expression) and two mandatory parts (a source and a destination). All three can contain a port or register specifier, described in Section 5.3.12.

The guard expression consists of a single character that represents the invert flag, followed by a source register specifier. The invert flag is expressed as follows:

  1. Single-character token `!': the result of the guard expression evaluates to zero if the source value is nonzero, and evaluates to one if the source value is equal to zero.
  2. Single-character token `?': the result of the guard expression evaluates to zero if the source value is zero, and evaluates to one if the source value is not zero.
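
The two invert flags can be summarised in a short Python sketch. The function name `guard_passes` is hypothetical; it only models the truth value of the guard expression, not how the processor squashes a move.

```python
def guard_passes(invert_flag, source_value):
    """Evaluate a guard expression for the given source register value."""
    if invert_flag == "!":
        return source_value == 0   # one if the source is zero
    if invert_flag == "?":
        return source_value != 0   # one if the source is nonzero
    raise ValueError(f"unknown invert flag: {invert_flag!r}")
```

For example, a guard written as `?U.eq.3` evaluates to one only when the value read from `U.eq.3` is nonzero.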

The move source specifier can be either a register and port specifier or an in-line immediate. Register and port specifiers can be GPR's, FU output ports, long immediate registers or bridge registers. The format of all these is specified in Section 5.3.12. The in-line immediate represents an integer constant and can be defined as a literal or as a label. In the latter case, the in-line immediate can be followed by an equal sign and a literal corresponding to the value of the label. The value of a label is more likely to appear in the output of disassembling an existing program than in user input code, since users can delegate data allocation and address resolution to the assembler.

Example: Label with value. The following move copies the label `LAB', which represents the address 0x051F0, to a GPR:

  LAB=0x051F0 -> r.4

The move destination consists of a register and port specifier of two types: either GPR's or FU input ports.


12 Register Port Specifier

Any register or port of a TTA processor that can appear as a move or guard source, or as a move destination is identified and referred to by means of a string. There are different types of register port specifiers:

  1. General-purpose register.
  2. Function unit port.
  3. Immediate register.
  4. Bridge register.

GPR's are specified with a string of the following format:

  reg-file[.port].index

where reg-file is the name of the register file, port, which can be omitted, is the name of the port through which the register is accessed, and index is the address of the register within its register file.

Function unit input and output ports are specified with a string of the following format:

  function-unit.port.[operation]

where function-unit is the name of the function unit, port is the name of the port through which the register is accessed, and operation, which is required only for opcode-setting ports, identifies the operation performed as a side effect of the transport. It is not an error to specify operation also for ports that do not set the opcode. Although it does not represent any real information encoded in the TTA program, this could improve the readability of the program.

Immediate registers are specified with a string of the following format:

  imm-unit[.port].index

where imm-unit is the name of the immediate unit, port, which can be omitted, is the name of the port through which the register is accessed, and index is the address of the register within its unit.

Since any bus can be connected to at most two busses through bridges, it is not necessary to specify bridge registers explicitly. Instead, the string that identifies a bridge register can take only one of two values: `{prev}' or `{next}'. These strings identify the bus whose value in the previous cycle is stored in the register. A bus is identified by `{prev}' if it is programmed by the bus slot that precedes the bus slot that reads the bridge register. Conversely, a bus identified by `{next}' is programmed by the bus slot that follows the bus slot that reads the bridge register. In either case, the source bus slot must be adjacent to the bus slot that contains the move that reads the bridge register.

Example: possible register and port specifiers.

  IA.0      immediate unit `IA', register with index 0
  RFA.5     register file `RFA', register with index 5
  U.s.add   port `s' of function unit `U', opcode for operation `add'
  {prev}    bridge register that contains the value on the bus programmed by the previous bus slot in the previous cycle

1 Alternative syntax of function unit sources and destinations.

Most clients, especially user interfaces, may find direct references to function unit ports inconvenient. For this reason, an alternative syntax is supported for input and output ports of function units:
  function-unit.operation.index
where function-unit is the name of the function unit, operation identifies the operation performed as a side effect of the transport, and index is a number in the range [1,n], where n is the total number of inputs and outputs of the operation. The operation input or output indirectly identifies the FU input or output and the port accessed. Contrary to the base syntax, which requires the operation name only for opcode-setting ports, this alternative syntax makes the operation name mandatory. The main advantage of this syntax is that it makes the code easier to read, because it removes the need to know which operation input or output is bound to a port. The main drawback is an amount of (harmless) ``fuzziness'' and inconsistency, because it forces the user to define an operation for ports that do not set the opcode, even in cases where the operand is shared between two different operations. For example, suppose that operand `1' of operations `add' and `mul' is bound to a port that does not set the opcode and that its value is shared between an `add' and a `mul'. In the following code:
  r1 -> U1.add.1, r2 -> U1.add.2;
  U1.add.3 -> r3, r4 -> U1.mul.2;
  U1.mul.3 -> r5 ;
it looks as if the shared move belonged only to `add'. One could have also written, correctly but less clearly:
  r1 -> U1.mul.1, r2 -> U1.add.2;
  # same code follows
or even, assuming that operation `sub' is also supported by the same unit and its operand `1' is bound to the same port:
  r1 -> U1.sub.1, r2 -> U1.add.2;
  # same code follows

This alternative syntax is the only one permitted for TTA moves where operations are not assigned to a function unit of the target machine.


13 Assembler Command Directives

Command directives do not specify any code or data, but change the way the assembler treats (part of) the code or data declared in the assembly program. A command directive is introduced by a colon followed by the name string that identifies it, and must appear at the beginning of a new logical line (possibly with whitespace before).

The assembler recognises the following directives.

1 procedure

The `:procedure' directive defines the starting point of a new procedure. This directive is followed by one mandatory parameter: the name of the procedure. Procedure directives should appear only in code areas. The procedure directive also implicitly defines the end of the procedure declared by the previous `:procedure' directive. If the first code section of the assembly program contains any code before a procedure directive, the code is assumed to be part of a nameless procedure. Code in subsequent code areas that precedes any procedure directive is considered part of the last procedure declared in one of the previous code areas.

Example: declaration of a procedure.

  CODE ;
  :procedure Foo ;
  Foo:
      r5 -> r6 , ... ;
      . . . ;
      ... , r7 -> add.1 ;
In this example, a procedure called `Foo' is declared and a code label with the same name is declared at the procedure start point. The code label could be given any name, or could be placed elsewhere in the same procedure. In this case, the code label `Foo' marks the first instruction of procedure `Foo'.

2 global

The `:global' directive declares that a given label is globally visible, that is, it can be linked and resolved with external code. This directive is followed by one mandatory parameter: the name of the label. The label must be defined in the same assembly file. The label may belong to either a data or a code section.

3 extern

The `:extern' directive declares that a given label is globally visible and must be resolved by an external definition. This directive is followed by one mandatory parameter: the name of the label. The label must not be defined in the assembly file.

There can be only one label with any given name that is declared global or external.

Example: declaration of undefined and defined global labels.

  DATA dmem  0x540;
  aVar:
      DA 4 ;
  :global aVar ;
  :extern budVar ;
In this example, `aVar' is declared to have global linkage scope (that is, it may be used to resolve references from other object files, once the assembly file is assembled). `budVar' is also declared to have global linkage, but in this case the program does not define a data or code label with that name anywhere, and the symbol must be resolved with a definition in an external file.


14 Assembly Format Style

This section describes a number of nonbinding guidelines that add to the assembly syntax specification and are meant to improve programs' readability.

1 Whitespaces.

Although the format of the assembly is completely free-form, tabs, whitespaces and new lines can be used to improve the assembly layout and the readability. The following rules are suggested:
  1. Separate the following tokens with at least one whitespace character:
    1. Label declaration `name:' and first move or `DA' directive.
    2. Moves of an instruction and commas.
    3. Move source and destination and the `->' or `<-' token.
  2. Do not separate the following tokens with whitespaces:
    1. Long immediate chunk declaration and the surrounding brackets.
    2. Label, literal and the `=' token in between.
    3. Any part of a register specifier (unit, port, operation, index) and the `.' separator token.
    4. Register specifier, label or literal and the `=' in between.
    5. Invert flag mark (`!' or `?') and the register specifier of a guard expression.
    6. Initialisation chunk, the number of MAU's it takes and the `:' token in between.
    7. Colon `:' and the nearby label or directive name.

2 End of Line.

The length of physical lines accepted is limited only by the client implementation. Lines up to 1024 characters must be supported by any implementation that complies with these specifications. However, to improve readability, a physical line should not exceed the usual line length of 80 or 120 characters. If a TTA instruction or a data declaration does not fit in the standard length, the logical line should be split into multiple physical lines. The physical lines following the first piece of a logical line can be indented at the same column or more to the right. In the case of data declarations, the line should be split between two literals that form an initialisation data sequence. In the case of TTA instructions, logical lines should never be split after a token of type `X' if it is recommended that no whitespace follow `X' tokens. To improve readability, TTA instructions should be split only after the comma that separates two move slots:
    # good line breaking
    r2 -> U.sub.1 , [i1=var] , r3 -> U.sub.2 , i1 -> L.ld.1 , [i1] ,
    0 -> U.eq.2 ;

    # bad line breaking
    r2 -> U.sub.1 , [i1=var] , r3 -> U.sub.2 , i1 -> L.ld.1 , [i1] , 0 ->
    U.eq.2 ;

    # really bad line breaking
    r2 -> U.sub.1 , [i1=var] , r3 -> U.sub.2 , i1 -> L.ld.1 , [i1] , 0 -> U.
    eq.2 ;

3 Tabulation.

The following rules can be taken as starting point for a rather orderly layout of the assembly text, which resembles the layout of traditional assembly languages:
  1. The first n characters of the assembly lines are reserved to labels. Instruction or data declarations are always indented by n characters.
  2. Labels appear in the same physical line of the instruction or data declaration they refer to. Labels are no more than $n-2$ characters long.
This layout is particularly clean when the TTA instructions contain few bus slots and when multiple labels for the same data chunk or instruction do not occur.

Example: Assembly layout style best suited to target architectures with few busses.

DATA DMEM ;
var:      DA 4;

CODE ;
lab_A:    spr -> U.add.1 , 55 -> U.add.2 , spr -> r12 ;
          [i0=0x7F] , U.add.3 -> spr , i0 -> r2 ;
loop_1:   r2 -> U.sub.1 , r3 -> U.sub.2 , var -> L.ld.1 ;
          r2 -> U.eq.1 , U.sub.3 -> r2 , 0 -> U.eq.2 ;
          ?U.eq.3 loop_1 -> C.jump.1 , L.ld.2 -> U.and.2 ;
          0x1F -> U.and.1 , ... , ... ;
          ... , U.and.3 -> r8 , ... ;

An alternative layout of the assembly text is the following:

  1. Instruction and data declarations are always indented by n characters.
  2. Each label declaration appears on its own physical line, separate from the instruction or data declaration it refers to, and starts from column 0.
This layout could be preferable when the TTA instructions contain so many bus slots that the logical line is usually split into multiple physical lines, because it separates more clearly the code before and after a label (which usually marks also a basic block entry point). In addition, this layout looks better when an instruction or data declaration has multiple labels and when the label name is long.

Example: Assembly layout style best suited to targets with many busses.

DATA DMEM ;
var:
    DA 4;

CODE ;
a_long_label_name:
    spr -> U.add.1 , 55 -> U.add.2 , spr -> r12 , [i0=0x7F], i0 -> r2,
    ... ;
    ... ,  U.add.3 -> spr , ... , ... , ... , ... ;
loop_1:
    r2 -> U.sub.1 , [i1=var] , r3 -> U.sub.2 , i1 -> L.ld.1 , [i1] ,
    0 -> U.eq.2 ;
    r2 -> U.eq.1 , U.sub.3 -> r2 , ... , ?U.eq.3 loop_1 -> C.jump.1 ,
    0x1F -> U.and.1 , ... ;
    L.ld.2 -> U.and.2 , ... , ... , ... , ... , ... ;
    ... , ... , ... , U.and.3 -> r8 , ... , ... ;
This example of assembly code is exactly equivalent to the code of the previous example, except that the address of the `var' data chunk (a 4-MAU word) is encoded in a long immediate and takes 2 move slots.

4 Layout of Memory Area Declarations.

It is preferable to subdivide the contents of memories into several memory area declarations and to group near each other area declarations of different address spaces that are related to each other. This underlines the relation between data and code. The alternative, a single area for each address space, mixes together all data and all procedures of a program.

5 Mixing Alternative Syntaxes.

It is preferable to not mix alternative styles or even syntaxes, although any client that works with the assembly language is expected to deal with syntax variants.


15 Error Conditions

This section describes all the possible logical errors that can occur while assembling a TTA program.

1 Address Width Overflow in Initialisation.

A label is used as the initialisation value of a data declaration, and the label address computed by the assembler does not fit in the number of MAUs of the data declaration that must be initialised.

2 Address Width Overflow in Long Immediate.

A label is used as the initialisation value of a long immediate declaration, and the label address computed by the assembler exceeds the total width of the move slots that encode the immediate, when concatenated together.

3 Address Width Overflow in In-line Immediate.

A label is used as the initialisation value of an in-line immediate, and the label address computed by the assembler exceeds the width of the source field that encodes the immediate.
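
The three address width overflow conditions above all reduce to the same test: does the label address fit in the bits available to encode it? The following Python sketch illustrates that test. It is not the assembler's implementation; the function names are hypothetical, and it assumes that label addresses are non-negative and encoded as unsigned values.

```python
from typing import Sequence


def fits_in_width(address: int, width_bits: int) -> bool:
    """Return True if a non-negative address can be encoded in width_bits bits.

    Assumption: addresses are treated as unsigned values.
    """
    if address < 0:
        raise ValueError("addresses are assumed to be non-negative")
    return address.bit_length() <= width_bits


def data_init_overflows(address: int, mau_count: int, mau_width_bits: int) -> bool:
    """Condition 1: the address needs more bits than mau_count MAUs provide."""
    return not fits_in_width(address, mau_count * mau_width_bits)


def long_immediate_overflows(address: int, slot_widths: Sequence[int]) -> bool:
    """Condition 2: the address exceeds the concatenated width of the move slots."""
    return not fits_in_width(address, sum(slot_widths))


def inline_immediate_overflows(address: int, source_field_width: int) -> bool:
    """Condition 3: the address exceeds the width of the source field."""
    return not fits_in_width(address, source_field_width)
```

For instance, under these assumptions a label at address 0x1FF overflows a data declaration of one 8-bit MAU, but fits in a declaration of two such MAUs.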

4 Unspecified Long Immediate Value.

A long immediate is defined, but none of the move slots that encode its bits defines the immediate value.

5 Multiply Defined Long Immediate Value.

More than one of the move slots that contain the bits of a long immediate defines the immediate value.[*]
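
The two long immediate conditions above can be sketched as a single check over the move slots that encode one long immediate. The representation below is an assumption made for illustration only: each slot is modelled as either `None` (the slot does not define the value) or an integer (the slot defines it). The function name and the returned strings are hypothetical. The lenient treatment of identical repeated values follows the footnote to this condition, which leaves that choice to the client.

```python
from typing import Optional, Sequence


def check_long_immediate(slot_values: Sequence[Optional[int]]) -> str:
    """Classify the definition of one long immediate.

    slot_values holds one entry per move slot that encodes the immediate:
    None if the slot does not define the value, an int if it does.
    """
    defined = [v for v in slot_values if v is not None]
    if not defined:
        return "error: unspecified long immediate value"
    if len(defined) == 1:
        return "ok"
    if all(v == defined[0] for v in defined):
        # Optional leniency: identical values in every defining slot.
        return "warning: long immediate value repeated"
    return "error: multiply defined long immediate value"
```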

6 Overlapping Memory Areas.

The start addresses specified in the headers of two memory area declarations are such that, once the size of each memory area has been computed, the two areas overlap.
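
The overlap condition above amounts to an interval intersection test that can only run after all area sizes are known. The sketch below is illustrative, not the assembler's code. It assumes that every area passed in belongs to the same address space, that an area is described by a hypothetical `(name, start, size)` tuple with `start` and `size` counted in MAUs, and that empty areas never overlap anything.

```python
from typing import List, Sequence, Tuple

Area = Tuple[str, int, int]  # (name, start address, size in MAUs)


def find_overlaps(areas: Sequence[Area]) -> List[Tuple[str, str]]:
    """Return the name pairs of memory areas whose address ranges overlap."""
    # Empty areas occupy no addresses, so they are ignored.
    ordered = sorted((a for a in areas if a[2] > 0), key=lambda a: a[1])
    overlaps = []
    for i, (name_a, start_a, size_a) in enumerate(ordered):
        end_a = start_a + size_a  # first address past the end of area a
        for name_b, start_b, _ in ordered[i + 1:]:
            if start_b >= end_a:
                break  # later areas start even further away
            overlaps.append((name_a, name_b))
    return overlaps
```

With areas `("a", 0, 16)`, `("b", 16, 8)` and `("c", 20, 4)`, only `b` and `c` are reported: `a` ends exactly where `b` begins, so those two do not overlap.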

7 Multiple Global Symbols with Same Name.

A `:global' directive declares a symbol with a given name as globally visible, but multiple labels with that name are declared in the program.

8 Unknown Command Directive.

A command directive has been found that is not one of the directives supported by the assembler.

9 Misplaced Procedure Directive.

A `:procedure' directive appears inside a data area declaration block.

10 Procedure Directive Past End of Code.

A `:procedure' directive appears after the last code line in the program.

11 Label Past End of Area.

A label has been declared immediately before an area header or at the end of the assembly program. Labels must be followed by a code line or a data line.

12 Character Size Specifier too Big.

A size specified for the characters of a string literal is greater than the maximum number of characters that can fit in one MAU.

13 Character Size Specifier too Small.

A size specified for the characters of a string literal is smaller than the minimum number of MAUs necessary to encode one character.

14 Illegal Characters in Quoted String.

A quoted string cannot contain non-printable characters (that is, characters that cannot be printed in the host encoding) or end-of-line characters.


16 Warning Conditions

This section describes all conditions of target architecture or assembly syntax for which the client should issue an optional warning to prepare users for potential errors or problematic conditions.

1 Equally Named Register File and Function Unit.

A register file and a function unit of the target architecture have the same name. This is one of the conditions for the activation of the disambiguation rule.

2 Port with the Name of an Operation.

A register file port or a function unit port is identified by a name that is also the name of a valid operation supported by the target architecture. The first condition (port of a register file) is more serious, because it may require triggering a disambiguation rule. The second condition (FU port) is not ambiguous, but is confusing and ugly. The second condition is more severe when the operation with the same name is supported by the same FU, and less severe when it is supported by another FU.

3 Code without Procedure.

The first code area of the program begins with code lines before the first `:procedure' directive. A nameless procedure with local visibility is automatically created by the assembler.

4 Procedure Spanning Multiple Code Areas.

A code area contains code lines before the first `:procedure' directive, but it is not the first code area declared in the code. The code at the beginning of the area is attached to the procedure declared by the last `:procedure' directive.

5 Empty Quoted String.

Empty quoted strings are ignored.

17 Disambiguation Rules

Certain syntactic structures may be assigned different and (in principle) equally valid interpretations. In these cases, a disambiguation rule assigns priority to one of the two interpretations. Grammatically ambiguous assembly code should be avoided. Clients that operate on TTA assembly syntax should issue a warning whenever a disambiguation rule is employed.

1 Disambiguation of GPR and FU terms.

When a GPR term includes also the RF port specifier, it can be interpreted also as a function unit input or output.

Normally, the names of units, ports and operations rule out one of the two interpretations. Ambiguity can occur only if:

  1. The target architecture contains a RF and a FU with identical name.
  2. One of the RF ports has a name identical to one of the operations supported by the FU that has the same name as the RF.

PENDING: disambiguation of unscheduled TTA code [*]

Ambiguity is resolved in favour of the GPR interpretation. No condition applies to the indices (register index or operation input or output index). The GPR interpretation is chosen even when it results in a semantic error (an index out of range) whereas the FU interpretation would be valid.

Example. Disambiguation rule. The following move is interpreted as a move that writes the constant 55 to the register with index 2 of register file `xx' through port `yy'. Even if there exists an FU called `xx' that supports an operation `yy' which has an input with index 2, the FU interpretation of the move is never chosen.

  55 -> xx.yy.2

Even if the disambiguation rule is not triggered, clients should warn when the target architecture satisfies one of the conditions above (or a similar condition). See Section 5.3.16 for a description of this and other conditions for which a warning should be issued.
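
The rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the assembler's implementation: the machine description is modelled as two hypothetical dictionaries (`rf_ports` maps RF names to port names, `fu_ops` maps FU names to operation names), and a term `unit.name.index` has already been split into its parts. As the rule states, the index plays no role in the choice.

```python
from typing import Dict, Set, Tuple


def interpret_term(
    rf_ports: Dict[str, Set[str]],
    fu_ops: Dict[str, Set[str]],
    unit: str,
    name: str,
) -> Tuple[str, bool]:
    """Interpret the term 'unit.name.index' as a GPR or an FU term.

    Returns (interpretation, ambiguous). The index is deliberately not
    consulted: the GPR reading wins even if the index is out of range.
    """
    as_gpr = name in rf_ports.get(unit, set())
    as_fu = name in fu_ops.get(unit, set())
    if as_gpr:
        # Rule applied whenever both readings exist; clients should warn.
        return ("gpr", as_fu)
    if as_fu:
        return ("fu", False)
    raise ValueError(f"unknown unit or port/operation: {unit}.{name}")
```

For the move `55 -> xx.yy.2`, a machine with both an RF `xx` with port `yy` and an FU `xx` with operation `yy` yields `("gpr", True)`, where the second element tells the client that a warning is due.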

2 Disambiguation of variables and operation terms.

In unscheduled code, operation terms cannot be confused with variables. The special RF names `r', `f' and `b' are reserved for the integer, floating-point and Boolean register files of the universal machine, respectively. The assembler does not allow any operation to have one of these names.

PENDING: disambiguation of mixed TTA code [*]

Example. Unambiguous move term accessing a variable. The following move is interpreted as ``copy constant 55 to variable with index 2 of variable pool `r'''. There cannot exist an operation `r', so the interpretation of the move destination as operation term is impossible.

  55 -> r.2

3 Disambiguation of Register File and Immediate Unit names

Assembler syntax does not differentiate unit names of immediate units from unit names of register files. The same register specifier of a move source

   x.2 -> alu.add.1
can represent a GPR or an immediate register depending on whether `x' is an RF or an IU.

In this case the GPR interpretation is always preferred over the IU interpretation. However, using the same names for IUs and RFs severely restricts the programmability of the target machine and is not encouraged.



Footnotes

... languages.[*]
This notation for octal literals has been deprecated.
... use.[*]
The action of actually writing the long immediate to a destination register is encoded in a dedicated instruction field, and is not repeated in each move slot that encodes part of the long immediate. This detail is irrelevant from the point of view of program specification. Although the syntax is slightly redundant, because it repeats the destination register in every slot that encodes a piece of a long immediate, it is chosen because it is simple and avoids any chance of ambiguity.
... value.[*]
If the values are identical in every move slot, then the client could issue a warning rather than a critical error.
Pekka Jääskeläinen 2016-11-24