In COMP1521, we’re boldly going forth and writing MIPS-flavoured assembly. Unfortunately, some serious style sins have been committed, so here are my hot tips on writing good assembly.
(Updated 2020-06-30, with some more rationales.)
Mechanical style
Most of these rules set out to improve whitespace and consistency. Don’t deliberately write dense, cryptic code; assembly is hard enough to read as is.
RULE Set your tab width to 8, and don’t insert spaces. Place the mnemonic, operands, and line comment at columns 8, 16, and 32 respectively, or as close after as possible.
This is a controversial one, because of the ever-popular tabs-vs-spaces debate. In this case, I like wide indentation because it makes patterns in the flow of data more apparent. If you abhor such wide indentation, that’s OK: jas uses 3-column indentation; andrewt uses 4, another reasonable value. However: pick something sensible and stick to it.
RULE Labels are never indented. Instructions are always indented.
# BAD:
f:
bgt $a0, $0, f_a0_false
add $v0, $a0, $a1

# ALSO BAD:
    f:
        bgt $a0, $0, f_a0_false
        add $v0, $a0, $a1

# ALSO BAD:
f:      bgt $a0, $0, f_a0_false
        add $v0, $a0, $a1

# GOOD:
f:
        bgt     $a0, $0, f_a0_false
        add     $v0, $a0, $a1
#       ^       ^       ^       ^       ^       ^
#       8       16      24      32      40      48
This is a readability point, as, effectively, labels are the only ‘landmarks’ in our program. Obscuring them also obscures structure and form. Alignment in this style makes it easier to distinguish labels from instructions or directives.
RULE Don’t indent to show structure. Indent to the same level, and use comments or label names to indicate structure.
# DISGUSTINGLY BAD:
f:
        bgt     $0, $a0, f_a0_false
    f_a0_true:
            bgt     $0, $a1, f_a1_false
        f_a1_true:
                add     $v0, $a0, $a1
        f_a1_false:
    f_a0_false:
        li      $v0, 0

# GOOD:
f:
        bgt     $0, $a0, f_a0_false
f_a0_true:
        bgt     $0, $a1, f_a1_false
f_a1_true:
        add     $v0, $a0, $a1
f_a1_false:
f_a0_false:
        li      $v0, 0
# (better: add vertical whitespace before non-empty labels)
The obvious temptation is to mimic the visual structure of C, or any other language where we use indentation to denote structure. In many languages, including C, there is explicit block-structure which has syntactic and semantic importance, and we emphasise this aggressively.
But assembly is unstructured: there is no block structure, so indentation that pretends to show one makes little sense.
Given we cannot rely on indentation, the only real landmarks we have are the labels we use; descriptive labels (or comments) can superimpose a perceived structure onto the unstructured morass of assembly.
RULE Add whitespace between the mnemonic and arguments. Visually align the mnemonics and arguments.
# BAD:
f:
        bgt $a0, $0, f_no
        li $t0, 4
        j f_yes

# GOOD:
f:
        bgt     $a0, $0, f_no
        li      $t0, 4
        j       f_yes
This is, again, a readability thing: we want to make it easier to see instruction mnemonics and their operands. By visually aligning them — I suggest aligning mnemonics to column 8, and operands to column 16 — we make it easier to spot patterns of use.
Naming rules
RULE Give labels clear, systematic names.
Some suggestions for a systematic naming scheme follow; if you like them, use them, and use them consistently.
RULE Preface all labels with the function or scope they belong to.
Because there’s no scope bounding the names you can refer to, you need to uniquely name everything, including labels. Given a function f, it would be reasonable to prefix all relevant labels in it with, for example, f_.
RULE Give function epilogues (and, where necessary, prologues), dedicated labels.
It’s also useful to denote “special” labels, like the label for the prologue and epilogue (or prelude and postlude, depending on what you call the sections that set up and tear down stack frames). To avoid confusion, use two underscores to separate the function name from the special label type; for example, f__epi or f__post might mark the epilogue to f. It’s uncommon to need a specialised name for the prologue, so if you do need it, make it clear what magic and/or evil you’re doing.
RULE In a conditional, label all parts of that conditional, to make it clear how execution has reached here.
I like to use the scheme function_variable[_condition]. So, for example, the label f_n_lt_0 gives us “in function f, for variable n, n < 0 was true”. A special case is the _phi extension: control flow continues from this point from all arms of the conditional; the name phi is borrowed from SSA form. You may like to come up with your own scheme; but whatever you choose, stick to it.
For example,
void f (int n) {
    if (n < 0) {
        putchar ('-');
    } else if (n > 0) {
        putchar ('+');
    }
}
might give these labels:
f:
f_n_lt_0:
f_n_lt_0_f:
f_n_gt_0:
f_n_gt_0_f:
f_n_phi:
f__epi:
I’d explicitly add the multiple labels of a point, too, to make it crystal-clear what’s where.
f:
        bgez    $a0, f_n_lt_0_f         # n < 0 is false: skip this arm
f_n_lt_0:
        li      $v0, 11                 # print_character
        li      $a0, '-'
        syscall
        b       f_n_phi

f_n_lt_0_f:
        blez    $a0, f_n_gt_0_f         # n > 0 is false: skip this arm
f_n_gt_0:
        li      $v0, 11                 # print_character
        li      $a0, '+'
        syscall
        b       f_n_phi

f_n_gt_0_f:
f_n_phi:
f__epi:
        jr      $ra
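One way to sanity-check a set of labels like this, before writing any MIPS, is to rewrite the C using the same labels and nothing but if and goto. The sketch below is only an illustration: f_goto is a hypothetical variant of f that returns the character it would print (or 0 for none) instead of calling putchar, so the control flow is easy to check. Each test is the negation of the C condition and jumps to the matching _f label.

```c
// Hypothetical goto-style rewrite of f, using the label scheme above.
// Returns '-' or '+' (the character f would print), or 0 if nothing is printed.
int f_goto(int n)
{
    int ch = 0;

f:
    if (n >= 0) goto f_n_lt_0_f;    // n < 0 is false
f_n_lt_0:
    ch = '-';
    goto f_n_phi;

f_n_lt_0_f:
    if (n <= 0) goto f_n_gt_0_f;    // n > 0 is false
f_n_gt_0:
    ch = '+';
    goto f_n_phi;

f_n_gt_0_f:
f_n_phi:
f__epi:
    return ch;
}
```

Each if/goto line then corresponds to a single branch instruction, and the labels carry across unchanged. Some compilers warn about labels that nothing jumps to; here they are kept deliberately as landmarks.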
RULE In a looping construct, label all parts of that loop.
Following the above naming scheme, I like to use the suffixes init, cond, step, and f (or false) to represent the loop initialisation, loop condition, increment of the loop, and the point where control flow resumes when the condition is false. The step suffix should come directly before the instruction(s) that increment i. This allows us to build a continue analogue. (This isn’t necessary in a while loop.)
For example,
void f (int n) {
    for (int i = 0; i < n; i++) {
        // ...
    }
}
might give us these labels:
f:
f_i_init:
f_i_cond:
f_i_step:
f_i_false:
f__epi:
Again, more concretely,
f:
f_i_init:
        move    $t0, $zero
f_i_cond:
        bge     $t0, $a0, f_i_false
        ## ...
f_i_step:
        addi    $t0, $t0, 1
        b       f_i_cond
f_i_false:
f__epi:
        jr      $ra
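A loop’s labels can be tried out in C first, using only if and goto. In this sketch the loop body is made up for illustration (f_loop, which sums the odd values of i below n, is a hypothetical function); the point is that jumping to the step label behaves like continue, because the increment still runs before the condition is tested again.

```c
// Hypothetical goto-style loop using the init/cond/step/false labels.
// Sums the odd values of i in [0, n); even values are skipped via the
// step label, which plays the role of `continue`.
int f_loop(int n)
{
    int sum = 0;
    int i;

f_i_init:
    i = 0;
f_i_cond:
    if (i >= n) goto f_i_false;     // loop condition is false

    if (i % 2 == 0) goto f_i_step;  // continue
    sum += i;

f_i_step:
    i++;
    goto f_i_cond;

f_i_false:
f__epi:
    return sum;
}
```

For example, f_loop(6) adds 1, 3 and 5. Had the skip jumped to f_i_cond instead, i would never advance and the loop would spin forever, which is exactly why step gets its own label.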
Commenting
To comment a function called main, I’d suggest following a template like this:
########################################################################
# .TEXT <main>
        .text
main:
        # Frame:    $fp, $ra, $s0, $s1, $s2, $s3, $s4
        # Uses:     $a0, $a1, $v0, $s0, $s1, $s2, $s3, $s4
        # Clobbers: $a0, $a1
        # Locals:
        #   - `argc' in $s0
        #   - `argv' in $s1
        #   - `length' in $s2
        #   - `ntimes' in $s3
        #   - `i' in $s4
        # Structure:
        #   main
        #   -> [prologue]
        #   -> main_seed
        #   -> main_seed_t
        #   -> main_seed_end
        #   -> main_seed_phi
        #   -> main_i_init
        #   -> main_i_cond
        #   -> main_i_step
        #   -> main_i_end
        #   -> [epilogue]
        # Code:
        # set up stack frame
        # ...
        # tear down stack frame
There’s a lot of useful information being condensed here. Notably:
- a visually-distinctive marker for where a section of code begins: I often use a horizontal rule made up of comment characters (here, a line of 72 hashes) as a way to break up large pieces of code.
- a .text directive, immediately followed by the top-level label of this subroutine: this makes it clear where we are.
- the function’s stack frame, listed from high address to low address; listing the frame makes it easier to determine what is at what offset above $fp.
- the function’s used registers, in no particular order, so it’s fairly easy to spot where register values might change, and to know which ones are worth saving or restoring.
- the function’s clobbered registers: the general rule I use is “clobbered = uses - frame”; that is, the clobbered list is the registers whose values will be lost.
- a list of local variables, and where they’re stored (either in registers, or on the stack), which is especially useful if you (like me) are prone to forgetting what value is in what register; you may want to spend some time thinking about how to represent this if there are more locals than usable registers.
- a fairly lax graph of control flow, which I find serves more as a relative index of where labels are than any strong guide to structure.
It’s especially useful to keep this up-to-date, so you don’t accidentally confuse yourself. Writing assembly is hard enough as is.
RULE Write clear, useful, meaningful comments, that make it clear to the reader what your code is doing, and why.
# Given $s0 is `row' and $t3 is NCOLS:

# BAD:
        mul     $t0, $s0, $t3   #confused.........
        add     $t0, $t0, $s1
        sb      $t2, grid($t0)  #how to get grid[row][col] ='.'

# GOOD:
        mul     $t0, $s0, $t3           # (row * NCOLS
        add     $t0, $t0, $t1           #  ... + col
        sb      $t2, grid($t0)          #  ... + &grid[0][0]) <- '.'

# GOOD:
        mul     $t0, $s0, $t3           # t0 = row * NCOLS
        add     $t0, $t0, $t1           # t0 = (row * NCOLS) + col
        sb      $t2, grid($t0)          # *(grid + (row*NCOLS) + col) = '.'
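The arithmetic those good comments spell out is ordinary row-major indexing, and it can be checked against what a C compiler does. A minimal sketch, assuming a hypothetical helper called element_offset (my name for it, not something from the course code): it takes the element size as a parameter, so a grid of bytes uses 1 and an array of words would use 4.

```c
#include <stddef.h>

// Byte offset of element [row][col] in a row-major 2D array whose rows
// each hold `ncols` elements of `elem_size` bytes.
size_t element_offset(size_t row, size_t col, size_t ncols, size_t elem_size)
{
    return (row * ncols + col) * elem_size;
}
```

With elem_size of 1 this is the row * NCOLS + col that the mul and add compute before the sb; for word-sized elements the result needs one more multiply by 4 (or a shift left by 2) before it is added to the base address.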
Structured data
RULE When using structured data, always get a base pointer and use fixed offsets.
For example,
struct student {
    int zid;
    char *name;
    double wam;
    int program;
} s;
would be laid out with zid at offset 0, name at offset 4, wam at offset 8, and program at offset 16.
# with a base pointer to a `struct student` in $a0:
student_get_zid:
        lw      $v0, 0($a0)
student_get_name:
        lw      $v0, 4($a0)
student_get_wam:
        lw      $t0, 8($a0)
        mtc1    $t0, $f0                # low word of wam (assuming little-endian)
        lw      $t0, 12($a0)
        mtc1    $t0, $f1                # high word of wam
student_get_program:
        lw      $v0, 16($a0)
This makes it much easier to use struct student and struct student *, as both are now accessed in effectively the same way.
You should also make a clear note of the layout and offsets of a data structure: the byte-offsets, the types, the field names, and also where padding may fall.
Allocating registers
One really useful trick: when writing a function, and especially when translating a function from another language, don’t work out up front which variables live in which registers (“register allocation”). Instead, use percent-prefixed placeholders, then do a search-and-replace for those placeholders with the registers you decide to use.
Given:
void f (int matrix[NROWS][NCOLS]) {
    for (int row = 0; row < NROWS; row++) {
        for (int col = 0; col < NCOLS; col++) {
            matrix[row][col] = 0;
        }
    }
}
It’s much easier to make a first-pass translation referring to those values, to get the logic right.
f:
        # ... preamble elided ...
        li      %NROWS, 4
        li      %NCOLS, 4
f_row_init:
        # int row = 0;
        li      %row, 0
f_row_cond:
        # row < NROWS ? 1 : 0
        slt     $at, %row, %NROWS
        beq     $at, $0, f_row_false
f_col_init:
        # int col = 0;
        li      %col, 0
f_col_cond:
        # col < NCOLS ? 1 : 0
        slt     $at, %col, %NCOLS
        beq     $at, $0, f_col_false
        mul     %tmp, %row, %NCOLS      # row * NCOLS
        addu    %tmp, %tmp, %col        # (row * NCOLS) + col
        li      %tmp2, 4
        mul     %tmp, %tmp, %tmp2       # 4 * ((row * NCOLS) + col)
        addu    %tmp, %matrix, %tmp     # matrix + row*NCOLS + col
        sw      $0, (%tmp)              # *(matrix + row*NCOLS + col) = 0
f_col_step:
        addi    %col, %col, 1
        j       f_col_cond
f_col_false:
f_row_step:
        addi    %row, %row, 1
        j       f_row_cond
f_row_false:
f__post:
        # ... postamble elided ...
        jr      $ra
Now I might like to replace %matrix with $a0, %row with $s0, %col with $s1, %NROWS with $t0, %NCOLS with $t1, %tmp with $t2, and %tmp2 with $t3, using some sort of string replacement in my text editor.
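One thing to watch with a plain search-and-replace: %tmp is a prefix of %tmp2, so replacing %tmp first turns every %tmp2 into something like $t22. Matching whole placeholder names avoids that. Here is a minimal sketch of the idea in C; allocate_registers and struct reg_binding are hypothetical names I’ve made up for illustration, not part of any assembler or course tool.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// One placeholder-to-register assignment, e.g. { "tmp", "$t2" }.
struct reg_binding {
    const char *name;   // placeholder name, without the leading '%'
    const char *reg;    // register to substitute
};

// Copy `src` into `dst` (capacity `cap`), replacing each %placeholder with
// its register. Whole names are matched, so %tmp never rewrites part of
// %tmp2. Unknown placeholders are copied through unchanged.
// Returns 0 on success, or -1 if `dst` is too small.
int allocate_registers(const char *src, char *dst, size_t cap,
                       const struct reg_binding *map, size_t n_map)
{
    size_t out = 0;

    if (cap == 0)
        return -1;

    while (*src != '\0') {
        const char *piece = src;    // text to emit for this step
        size_t piece_len = 1;       // default: copy a single character
        size_t consumed = 1;

        if (*src == '%') {
            // Measure the identifier that follows the '%'.
            size_t len = 0;
            while (isalnum((unsigned char)src[1 + len]) || src[1 + len] == '_')
                len++;

            for (size_t i = 0; i < n_map; i++) {
                if (strlen(map[i].name) == len &&
                    strncmp(map[i].name, src + 1, len) == 0) {
                    piece = map[i].reg;
                    piece_len = strlen(map[i].reg);
                    consumed = 1 + len;
                    break;
                }
            }
        }

        if (out + piece_len >= cap)     // leave room for the terminator
            return -1;
        memcpy(dst + out, piece, piece_len);
        out += piece_len;
        src += consumed;
    }

    dst[out] = '\0';
    return 0;
}
```

Run over each line of the first-pass translation with the bindings listed above, this turns mul %tmp, %tmp, %tmp2 into mul $t2, $t2, $t3. In an editor, the equivalent is to replace the longer names first, or to use a whole-word match.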
Some assemblers (not SPIM, unfortunately) support defining macros either using special syntax or using the C preprocessor. This also provides a useful technique for allocating registers in a region.