Better Assembly

In COMP1521, we’re boldly going forth and writing MIPS-flavoured assembly. Unfortunately, some serious style sins have been committed, so here are my hot tips on writing good assembly.

(Updated 2020-06-30, with some more rationales.)

Mechanical style

Most of these rules set out to improve whitespace and consistency. Don’t deliberately write dense, cryptic code; assembly is hard enough to read as is.

RULE Set your tab width to 8, and don’t insert spaces. Place the mnemonic, the operands, and any line comment at columns 8, 16, and 32 respectively, or as close after those columns as possible.

This is a controversial one, thanks to the ever-popular tabs-versus-spaces debate. Here, I like wide indentation because it makes patterns in the flow of data more apparent. If you find such wide indentation abhorrent, that’s OK: jas uses 3-column indentation, and andrewt uses 4, another reasonable value. Whatever you choose, pick something sensible and stick to it.
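If you generate assembly from a script, or just want to check your own layout, the column rule is mechanical enough to encode. The sketch below is a hypothetical C helper (not part of any course tool); it assumes a tab width of 8 and emits tabs rather than spaces, so the mnemonic lands at column 8, the operands at the next tab stop (column 16 for mnemonics shorter than 8 characters), and the comment at column 32, or the next tab stop after that if the operands run long.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render one instruction line using tabs, assuming
 * a tab width of 8. `comment` may be NULL. Writes at most `cap` bytes,
 * including the terminating NUL. */
void format_insn(char *dst, size_t cap, const char *mnemonic,
                 const char *operands, const char *comment)
{
    char pad[8];                         /* tabs between operands and comment */
    size_t n = 0;
    size_t col = 8 + strlen(mnemonic);   /* leading tab puts mnemonic at 8 */

    col = (col / 8 + 1) * 8;             /* tab after the mnemonic */
    col += strlen(operands);

    /* At least one tab; keep tabbing until we reach column 32. */
    do {
        pad[n++] = '\t';
        col = (col / 8 + 1) * 8;         /* a tab jumps to the next multiple of 8 */
    } while (col < 32 && n < sizeof pad - 1);
    pad[n] = '\0';

    if (comment != NULL)
        snprintf(dst, cap, "\t%s\t%s%s# %s", mnemonic, operands, pad, comment);
    else
        snprintf(dst, cap, "\t%s\t%s", mnemonic, operands);
}
```

For example, `format_insn(buf, sizeof buf, "li", "$v0, 11", "print_character")` yields a line that, at tab width 8, matches the `li $v0, 11 # print_character` lines used later in this guide.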

RULE Labels are never indented. Instructions are always indented.

# BAD:
f:
bgt $a0, $0, f_a0_false
add $v0, $a0, $a1

# ALSO BAD:
        f:
bgt $a0, $0, f_a0_false
add $v0, $a0, $a1

# ALSO BAD:
        f:
        bgt $a0, $0, f_a0_false
        add $v0, $a0, $a1

# GOOD:
f:
        bgt     $a0, $0, f_a0_false
        add     $v0, $a0, $a1
#       ^       ^       ^       ^       ^       ^
#       8      16      24      32      40      48
This is a readability point: labels are effectively the only ‘landmarks’ in our program, and obscuring them also obscures its structure and form. Aligning code this way makes labels easy to tell apart from instructions and directives.

RULE Don’t indent to show structure. Indent to the same level, and use comments or label names to indicate structure.

# DISGUSTINGLY BAD:
f:
bgt $0, $a0, f_a0_false
    f_a0_true:
    bgt $0, $a1, f_a1_false
        f_a1_true:
            add $v0, $a0, $a1
            jr $ra
    f_a1_false:
    f_a0_false:
li $v0, 0
jr $ra

# GOOD:
f:
        bgt     $0, $a0, f_a0_false
f_a0_true:
        bgt     $0, $a1, f_a1_false
f_a1_true:
        add     $v0, $a0, $a1
        jr      $ra
f_a1_false:
f_a0_false:
        li      $v0, 0
        jr      $ra
# (better: add vertical whitespace before non-empty labels)
The obvious temptation is to mimic the visual structure of C, or of any other language where indentation denotes structure. In C and many other languages, blocks have syntactic and semantic importance, and we emphasise them aggressively.

Assembly, however, is unstructured: there is no block structure, so indenting to show one makes little sense.

Since we can’t rely on indentation, the only real landmarks we have are labels. Descriptive label names (and comments) can superimpose a perceived structure onto the unstructured morass of assembly.

RULE Add whitespace between the mnemonic and arguments. Visually align the mnemonics and arguments.

# BAD:
f:
        bgt $a0, $0, f_no
        li $t0, 4
        j f_yes

# GOOD:
f:
        bgt     $a0, $0, f_no
        li      $t0, 4
        j       f_yes
This is, again, a readability thing: we want to make it easier to see instruction mnemonics and their operands. By visually aligning them — I suggest aligning mnemonics to column 8, and operands to column 16 — we make it easier to spot patterns of use.

Naming rules

RULE Give labels clear, systematic names.

Some suggestions for a systematic naming scheme follow; if you like them, use them, and use them consistently.

RULE Preface all labels with the function or scope they belong to.

Because there’s no scope bounding the names you can refer to, you need to uniquely name everything, including labels. Given a function f, it would be reasonable to prefix all relevant labels in it with, for example, f_.

RULE Give function epilogues (and, where necessary, prologues) dedicated labels.

It’s also useful to mark “special” labels, like those for the prologue and epilogue (or prelude and postlude, depending on what you call the sections that set up and tear down stack frames). To avoid confusion, use two underscores to separate the function name from the special label type: for example, f__epi or f__post might mark the epilogue of f. It’s uncommon to need a dedicated label for the prologue, so if you do need one, make it clear what magic and/or evil you’re doing.

RULE In a conditional, label all parts of that conditional, to make it clear how execution has reached here.

I like to use the scheme function_variable[_condition], with a trailing _f marking where control goes when that condition is false. So, for example, the label f_n_lt_0 reads “in function f, for variable n, n < 0 was true”, and f_n_lt_0_f means it was false. A special case is the _phi suffix: the point where control flow from all arms of the conditional rejoins; the name phi is borrowed from SSA form. You may like to come up with your own scheme, but whatever you choose, stick to it.

For example,
void f (int n) {
    if (n < 0) {
        putchar ('-');
    } else if (n > 0) {
        putchar ('+');
    }
}
might give these labels:
f:
f_n_lt_0:
f_n_lt_0_f:
f_n_gt_0:
f_n_gt_0_f:
f_n_phi:
f__epi:
Where several labels mark the same point, I’d still write all of them out, to make it crystal-clear what’s where.
f:
        bgez    $a0, f_n_lt_0_f # !(n < 0)
f_n_lt_0:
        li      $v0, 11         # print_character
        li      $a0, '-'
        syscall
        b       f_n_phi
f_n_lt_0_f:
        blez    $a0, f_n_gt_0_f # !(n > 0)
f_n_gt_0:
        li      $v0, 11         # print_character
        li      $a0, '+'
        syscall
        b       f_n_phi
f_n_gt_0_f:
f_n_phi:
f__epi:
        jr      $ra
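One way to check a translation like this is to first rewrite the C in goto form, using exactly the same label names, and only then translate it line by line. The sketch below assumes a hypothetical variant of f, called f_sign, that returns the character it would have printed (or '\0' if none) instead of calling putchar, so its control flow is easy to test. Each arm is entered by skipping past it when its condition is false, just as the assembly does.

```c
/* Hypothetical goto-form of f: same labels as the MIPS version, but it
 * returns the character f would print ('\0' if none) instead of calling
 * putchar. Unused labels are kept to mirror the assembly exactly. */
char f_sign(int n)
{
    char printed = '\0';

    if (n >= 0) goto f_n_lt_0_f;    /* skip this arm unless n < 0 */
f_n_lt_0:
    printed = '-';
    goto f_n_phi;
f_n_lt_0_f:
    if (n <= 0) goto f_n_gt_0_f;    /* skip this arm unless n > 0 */
f_n_gt_0:
    printed = '+';
    goto f_n_phi;
f_n_gt_0_f:
f_n_phi:
f__epi:
    return printed;
}
```

Writing the inverted conditions (n >= 0, n <= 0) explicitly in C first makes it harder to pick the wrong branch mnemonic later.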

RULE In a looping construct, label all parts of that loop.

Following the naming scheme above, I like to use the suffixes init, cond, step, and f (or false) for the loop initialisation, the loop condition, the loop increment, and the point where control resumes once the condition is false. The step label should sit directly before the instruction(s) that increment i, so that jumping to it acts as an analogue of continue. (A while loop has no increment, so it doesn’t need a step label.)

For example,
void f (int n) {
    for (int i = 0; i < n; i++) {
        // ...
    }
}
might give us these labels:
f:
f_i_init:
f_i_cond:
f_i_step:
f_i_false:
f__epi:
Again, more concretely,
f:
f_i_init:
        move    $t0, $zero
f_i_cond:
        bge     $t0, $a0, f_i_false
        ##  ...
f_i_step:
        addi    $t0, $t0, 1
        b       f_i_cond
f_i_false:
f__epi:
        jr      $ra
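As with conditionals, writing the loop in goto form first makes the labels concrete, and it shows why f_i_step earns its place: a continue in the structured version becomes a jump to the step label. The function below, sum_odd, is a hypothetical example rather than one from this guide; it sums the odd values of i below n.

```c
/* Hypothetical example: sum the odd values of i in [0, n), written with
 * the init/cond/step/false labels used above. In structured C this would
 * be a for loop with `if (i % 2 == 0) continue;` in its body. */
int sum_odd(int n)
{
    int sum = 0;
    int i;

f_i_init:
    i = 0;
f_i_cond:
    if (i >= n) goto f_i_false;
    if (i % 2 == 0) goto f_i_step;  /* the `continue` analogue */
    sum += i;
f_i_step:
    i++;
    goto f_i_cond;
f_i_false:
f__epi:
    return sum;
}
```

For example, sum_odd(10) adds 1 + 3 + 5 + 7 + 9 and returns 25.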

Commenting

To comment a function called main, I’d suggest following a template like this:

########################################################################
# .TEXT <main>
        .text
main:

# Frame:        $fp, $ra, $s0, $s1, $s2, $s3, $s4
# Uses:         $a0, $a1, $v0, $s0, $s1, $s2, $s3, $s4
# Clobbers:     $a0, $a1, $v0

# Locals:
#       - `argc' in $s0
#       - `argv' in $s1
#       - `length' in $s2
#       - `ntimes' in $s3
#       - `i' in $s4

# Structure:
#       main
#       -> [prologue]
#       -> main_seed
#         -> main_seed_t
#         -> main_seed_end
#       -> main_seed_phi
#       -> main_i_init
#       -> main_i_cond
#         -> main_i_step
#       -> main_i_end
#       -> [epilogue]

# Code:
        # set up stack frame
        # ...
        # tear down stack frame

There’s a lot of useful information condensed here. Notably:

- a visually distinctive marker for where a section of code begins: I often use a horizontal rule made of comment characters (here, a line of 72 hashes) to break up large pieces of code;
- a .text directive, immediately followed by the top-level label of this subroutine, which makes it clear where we are;
- the function’s stack frame, listed from high address to low address; listing the frame makes it easier to work out what is stored at which offset relative to $fp;
- the function’s used registers, in no particular order, so it’s easy to spot where register values might change, and to know which ones are worth saving and restoring;
- the function’s clobbered registers: the general rule I use is “clobbered = uses - frame”, that is, the clobbered list holds the registers whose values will be lost;
- a list of local variables and where they’re stored (in registers or on the stack), which is especially useful if you (like me) are prone to forgetting what value is in what register; if there are more locals than usable registers, you may want to spend some time thinking about how to represent this;
- a fairly lax graph of control flow, which I find serves more as a relative index of where labels are than as any strong guide to structure.

It’s important to keep this up to date, so you don’t accidentally confuse yourself. Writing assembly is hard enough as it is.
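The “clobbered = uses - frame” rule is a plain set difference, so it can be computed mechanically. As a sketch, represent each register set as a 32-bit mask indexed by MIPS register number ($v0 = 2, $a0 = 4, $a1 = 5, $s0 to $s4 = 16 to 20, $fp = 30, $ra = 31). The names regset, REG, clobbered, main_uses, and main_frame are hypothetical. Applied to the Uses and Frame lines of the main template, the difference is $a0, $a1, and $v0.

```c
#include <stdint.h>

/* MIPS register numbers used by the main template. */
enum { V0 = 2, A0 = 4, A1 = 5, S0 = 16, S1, S2, S3, S4, FP = 30, RA = 31 };

typedef uint32_t regset;            /* bit r set <=> register $r in the set */

#define REG(r) ((regset)1u << (r))

/* clobbered = uses - frame: registers used but not saved and restored. */
regset clobbered(regset uses, regset frame)
{
    return uses & ~frame;
}

/* "# Uses:  $a0, $a1, $v0, $s0, $s1, $s2, $s3, $s4" */
regset main_uses(void)
{
    return REG(A0) | REG(A1) | REG(V0)
         | REG(S0) | REG(S1) | REG(S2) | REG(S3) | REG(S4);
}

/* "# Frame: $fp, $ra, $s0, $s1, $s2, $s3, $s4" */
regset main_frame(void)
{
    return REG(FP) | REG(RA)
         | REG(S0) | REG(S1) | REG(S2) | REG(S3) | REG(S4);
}
```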

RULE Write clear, useful, meaningful comments, that make it clear to the reader what your code is doing, and why.

# Given $s0 is `row', $t1 is `col', $t3 is NCOLS, and $t2 holds '.':
# BAD:
        mul $t0, $s0, $t3 #confused.........
        add $t0, $t0, $t1
        sb $t2, grid($t0) #how to get grid[row][col] ='.'

# GOOD:
        mul     $t0, $s0, $t3     # (row * NCOLS
        add     $t0, $t0, $t1     #  ... + col
        sb      $t2, grid($t0)    #  ... + &grid[0][0]) <- '.'

# GOOD:
        mul     $t0, $s0, $t3     # t0 = row * NCOLS
        add     $t0, $t0, $t1     # t0 = (row * NCOLS) + col
        sb      $t2, grid($t0)    # *(grid + (row*NCOLS) + col) = '.'
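The comments above spell out row-major indexing: element [row][col] of an array with NCOLS columns lives row * NCOLS + col elements past the start, and that count is scaled by the element size to give a byte offset (1 for a char grid like this one, 4 for a 32-bit int). The C sketch below checks that arithmetic against the compiler’s own; NROWS, NCOLS, row_major_offset, and grid_set_dot are assumptions made for illustration.

```c
#include <stddef.h>

#define NROWS 3   /* assumed sizes, for illustration only */
#define NCOLS 5

char grid[NROWS][NCOLS];

/* Byte offset of [row][col] in a row-major array with `ncols` columns
 * and `elem_size`-byte elements: what the mul/add sequence computes. */
size_t row_major_offset(size_t row, size_t col, size_t ncols, size_t elem_size)
{
    return (row * ncols + col) * elem_size;
}

/* Store '.' through the computed address, like the `sb` above. */
void grid_set_dot(size_t row, size_t col)
{
    char *base = (char *)grid;
    base[row_major_offset(row, col, NCOLS, sizeof grid[0][0])] = '.';
}
```

For an int matrix, the same helper with elem_size 4 corresponds to the extra multiply by 4 needed before adding the base address.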

Structured data

RULE When using structured data, always get a base pointer and use fixed offsets.

For example,
struct student {
    int zid;
    char *name;
    double wam;
    int program;
} s;
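On 32-bit MIPS (4-byte ints and pointers, 8-byte-aligned doubles), this would be laid out with zid at offset 0, name at offset 4, wam at offset 8, and program at offset 16, followed by 4 bytes of trailing padding that round the struct up to 24 bytes.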
# with a base pointer to a `struct student` in $a0:
student_get_zid:
        lw      $v0, 0($a0)
        jr      $ra
student_get_name:
        lw      $v0, 4($a0)
        jr      $ra
student_get_wam:
        l.d     $f0, 8($a0)     # double result in $f0/$f1
        jr      $ra
student_get_program:
        lw      $v0, 16($a0)
        jr      $ra
This makes it much easier to use struct student and struct student *, as both are now effectively identical.

You should also make a clear note of the layout and offsets of a data structure: the byte-offsets, the types, the field names, and also where padding may fall.
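One way to keep such a note honest is to have a compiler check it. The sketch below is a stand-in, not the original type: struct student_mips mirrors struct student with a uint32_t in place of char *, so that a 64-bit host reproduces the 32-bit MIPS offsets, and C11 static_assert with offsetof pins those offsets down. The accessors then read fields through a base pointer plus the named fixed offsets, as the MIPS getters do. All names here are hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>   /* memcpy */

/* Stand-in for struct student as laid out on 32-bit MIPS: the 4-byte
 * `name` pointer is modelled as a uint32_t so the same offsets also
 * hold when this is compiled on a 64-bit host. */
struct student_mips {
    int32_t  zid;       /* offset  0 */
    uint32_t name;      /* offset  4 (a char * on MIPS) */
    double   wam;       /* offset  8 */
    int32_t  program;   /* offset 16 */
};

/* The layout note, as named offsets... */
enum {
    STUDENT_ZID = 0,
    STUDENT_NAME = 4,
    STUDENT_WAM = 8,
    STUDENT_PROGRAM = 16,
};

/* ...checked by the compiler, so the note cannot silently drift. */
static_assert(offsetof(struct student_mips, zid) == STUDENT_ZID, "zid");
static_assert(offsetof(struct student_mips, name) == STUDENT_NAME, "name");
static_assert(offsetof(struct student_mips, wam) == STUDENT_WAM, "wam");
static_assert(offsetof(struct student_mips, program) == STUDENT_PROGRAM, "program");

/* Base pointer plus fixed offset, as in the MIPS accessors. */
double student_get_wam(const void *s)
{
    double wam;
    memcpy(&wam, (const char *)s + STUDENT_WAM, sizeof wam);
    return wam;
}

int32_t student_get_program(const void *s)
{
    int32_t program;
    memcpy(&program, (const char *)s + STUDENT_PROGRAM, sizeof program);
    return program;
}
```

On an actual 32-bit MIPS target, the same static_assert lines could be written against struct student directly.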

Allocating registers

One really useful trick: when writing a function, and especially when translating one from another language, don’t decide up front which variables live in which registers (“register allocation”). Instead, use percent-prefixed placeholders, then search-and-replace each placeholder with the register you decide to use.

Given:

void f (int matrix[NROWS][NCOLS]) {
    for (int row = 0; row < NROWS; row++) {
        for (int col = 0; col < NCOLS; col++) {
            matrix[row][col] = 0;
        }
    }
}

It’s much easier to get the logic right with a first-pass translation that refers to those values by name:

f:
        # ... preamble elided ...
        li      %NROWS, 4
        li      %NCOLS, 4

f_row_init:
        # int row = 0;
        li      %row, 0
f_row_cond:
        # if (row >= NROWS) goto f_row_false;
        bge     %row, %NROWS, f_row_false

f_col_init:
        # int col = 0;
        li      %col, 0
f_col_cond:
        # if (col >= NCOLS) goto f_col_false;
        bge     %col, %NCOLS, f_col_false

        mul     %tmp, %row, %NCOLS  # row * NCOLS
        addu    %tmp, %tmp, %col    # (row * NCOLS) + col
        li      %tmp2, 4
        mul     %tmp, %tmp, %tmp2   # 4 * ((row * NCOLS) + col)
        addu    %tmp, %matrix, %tmp # matrix + row*NCOLS + col
        sw      $0, (%tmp)          # *(matrix + row*NCOLS + col) = 0

f_col_step:
        addi    %col, %col, 1
        j       f_col_cond

f_col_false:
f_row_step:
        addi    %row, %row, 1
        j       f_row_cond

f_row_false:
f__post:
        # ... postamble elided ...
        jr      $ra

Now I might like to replace %matrix with $a0, %row with $s0, %col with $s1, %NROWS with $t0, %NCOLS with $t1, %tmp with $t2, and %tmp2 with $t3, using some sort of string replacement in my text editor.
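If your editor’s search-and-replace is awkward, the substitution is simple enough to script. The sketch below is a hypothetical helper, not part of any toolchain: it scans for % followed by an identifier and swaps in the mapped register. It matches whole names, so %tmp does not clobber the start of %tmp2, which is the trap a naive “replace %tmp first” pass falls into.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct binding {
    const char *name;   /* placeholder name, without the '%' */
    const char *reg;    /* register to substitute, e.g. "$s0" */
};

/* Copy `src` to `dst` (capacity `cap`), replacing each %name with its
 * bound register. Whole identifiers are matched, so %tmp and %tmp2 are
 * distinct; unknown placeholders are copied through unchanged.
 * Returns 0 on success, -1 if `dst` is too small. */
int substitute(const char *src, char *dst, size_t cap,
               const struct binding *binds, size_t nbinds)
{
    size_t out = 0;

    if (cap == 0)
        return -1;

    while (*src != '\0') {
        const char *piece = src;   /* text to emit */
        size_t piece_len = 1;      /* its length */
        size_t consumed = 1;       /* input bytes used */

        if (*src == '%') {
            size_t n = 1;
            while (isalnum((unsigned char)src[n]) || src[n] == '_')
                n++;
            consumed = n;
            piece_len = n;         /* default: copy %name through as-is */
            for (size_t b = 0; b < nbinds; b++) {
                if (strlen(binds[b].name) == n - 1 &&
                    strncmp(binds[b].name, src + 1, n - 1) == 0) {
                    piece = binds[b].reg;
                    piece_len = strlen(piece);
                    break;
                }
            }
        }
        if (out + piece_len + 1 > cap)
            return -1;
        memcpy(dst + out, piece, piece_len);
        out += piece_len;
        src += consumed;
    }
    dst[out] = '\0';
    return 0;
}
```

Run over each line of the first-pass translation with the bindings listed above, this would turn `mul %tmp, %tmp, %tmp2` into `mul $t2, $t2, $t3`.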

Some assemblers (not SPIM, unfortunately) support defining macros, either with special syntax or via the C preprocessor. Macros offer another useful technique for naming the registers allocated within a region of code.