TSC: The Programmer's View - Part 2
Assembly Language Programming
Writing in machine language is rarely done today because the process is tedious
and error-prone. Imagine trying to write a program that automates balancing a checkbook
using only machine language instructions. It would take weeks to write the thousands of
instructions needed, and some of them would inevitably be incorrect. Finding an error among
many thousands of zeroes and ones is nearly impossible.
To circumvent this problem, early computer programmers
created a language that is more user-friendly than machine language but still provides
access to a computer's individual instructions. They called it assembly language.
Assembly language lets the programmer use an instruction's English mnemonic instead of
its opcode and allows binary immediate values to be replaced with their decimal
equivalents. It also allows symbolic names, like LOOP or DONE, to be used in place of
memory addresses and branch offsets, freeing the programmer from having to
remember where every piece of data is in memory and from having to calculate
branch and jump offsets. A program called an assembler translates
the assembly instructions into binary machine code.
The program below shows the assembly code for a modified version of the
power-of-two multiplication program we wrote in Part 1. We will use this
code as an example throughout this section.
***************************************
* Multiply a value by a power of two. *
* The result is placed at ANSWER.     *
***************************************
* Code section
        .ORG  0
BEGIN   LAD   $2, DATA
        LWD   $0, $2, 0     ;The power value
        LWD   $1, $2, 1     ;The number to be multiplied
LOOP    SHL   $1, $1
        ADI   $0, $0, -1
        BGZ   $0, LOOP
        LAD   $2, ANSWER
        SWD   $1, $2, 0
        HLT
* Data section
        .ORG  0X0100
DATA    .BSC  4, 3          ;Data for 16 x 3
ANSWER  .BSS  1
        .END  BEGIN
Assembly Instruction Format
An assembly instruction follows this four-part format:
LABEL MNEMONIC OPERANDS COMMENTS
The parts of an assembly instruction must be separated from each other by a space,
a tab, or some combination of both. The label and comment parts are optional.
The Label
A label is a symbolic name denoting where the instruction is stored in memory.
It can consist of letters and digits but must start with a letter.
A label is used in one of three cases:
- The label location is the target of a branch or jump instruction. The
LOOP label in the sample code is the target of BGZ. When a label
is used in the offset field of a branch instruction, the assembler calculates the offset
between the instruction following the branch and the label.
- The label location contains values that are loaded or stored. The location represented
by the DATA label holds the data to be operated on and the location at ANSWER
is where the result is stored.
- The label serves as a reference for the programmer. The BEGIN label in the
sample code reminds the programmer that program execution should start
at that instruction. The label is never actually referenced by any instruction in the program.
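The offset rule in the first case above can be sketched in Python. This is only an illustration, not the TSC Assembler's code. It assumes that memory is word-addressed, that every real instruction occupies one word, and that each LAD expands into two instructions (LHI and ORI, as described under Pseudo-instructions). The function name branch_offset and the address table are hypothetical.

```python
def branch_offset(branch_addr, label_addr):
    """Offset measured from the instruction following the branch to the label."""
    return label_addr - (branch_addr + 1)

# Addresses in the sample program's code section under the assumptions above.
addresses = {
    "BEGIN": 0,  # LAD expands to LHI at 0 and ORI at 1
    "LOOP": 4,   # LWD at 2, LWD at 3, SHL at 4
}
BGZ_ADDR = 6     # ADI at 5, BGZ at 6

offset = branch_offset(BGZ_ADDR, addresses["LOOP"])
print(offset)  # -3
```

Under these assumptions the BGZ in the sample program branches back three instructions, counted from the LAD that follows it.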
The Mnemonic and Operands
Every instruction requires a mnemonic, and every instruction except HLT requires
operands. The mnemonic is an English-language replacement for an instruction's opcode. It is
used because a word is usually easier to recognize than an opcode value.
The operands tell the instruction the location of the data to be operated on. Depending on
the instruction, one to three operands are required (HLT has none). Operands are always
separated from each other by commas.
There are three types of operands: a register location, a constant, and an expression that
reduces to a constant. If the
operand is a register location, the register number is preceded by a dollar sign (i.e. $0, $1, $2, $3).
Constant operands are used in the immediate and target fields. They can be
written in either decimal or hex; hex numbers are preceded by '0x'.
A constant operand can initially be a two-operand mathematical expression. The
expression operands can be decimal or hex numbers or symbolic names. The available
expression operators are +, -, *, and /. The assembler reduces the expression to
a constant value before inserting it into the instruction. Some examples of expression
usage are
ADI $0, $0, A * 5
LAD $1, ARRAY + 0x0009
JMP L - 1
Expressions can be used wherever a constant is used, including in all the
assembler directives. The one exception is the branch
instructions: expressions complicate offset resolution, so they are not
allowed there.
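The reduction step can be sketched in Python as follows. This is a minimal sketch, not the TSC Assembler's implementation. The names reduce_expression and operand_value, the symbol values, and the choice of integer (floor) division for / are all assumptions, since the text does not say how division is rounded.

```python
import operator
import re

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,  # assumption: integer division
}

# One operand, optionally followed by an operator and a second operand.
EXPRESSION = re.compile(r"\s*(-?\w+)\s*(?:([-+*/])\s*(-?\w+))?\s*")

def operand_value(token, symbols):
    """Value of a decimal literal, a 0x hex literal, or a symbolic name."""
    negative = token.startswith("-")
    body = token[1:] if negative else token
    if body.lower().startswith("0x"):
        value = int(body, 16)
    elif body.isdigit():
        value = int(body)
    else:
        value = symbols[body]  # KeyError if the name is not defined
    return -value if negative else value

def reduce_expression(text, symbols):
    """Reduce 'operand' or 'operand op operand' to a single constant."""
    match = EXPRESSION.fullmatch(text)
    if match is None:
        raise ValueError(f"not a two-operand expression: {text!r}")
    left, op, right = match.groups()
    value = operand_value(left, symbols)
    if op is not None:
        value = OPERATORS[op](value, operand_value(right, symbols))
    return value

# Hypothetical symbol values chosen only for this illustration.
symbols = {"A": 56, "ARRAY": 0x0100, "L": 0x0020}
print(reduce_expression("A * 5", symbols))           # 280
print(reduce_expression("ARRAY + 0x0009", symbols))  # 265
print(reduce_expression("L - 1", symbols))           # 31
```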
Comments
It is good programming practice to place comments at strategic places in
your assembly code explaining why the code is there. Comments in TSC
assembly are preceded by a semicolon or an asterisk.
The asterisk can only be used if it is the first character on a line. Comments
do not affect the execution of the program in any way.
When the assembler encounters a semicolon, it ignores the rest of the line.
Likewise, if the assembler sees an asterisk at the beginning of a line, it
discards the entire line. See the sample code for an example of using
comments.
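The two comment rules can be sketched in a few lines of Python. This is an illustration only, and the function name strip_comment is hypothetical. An asterisk counts as a comment marker only in the first column, so an asterisk used as a multiplication operator inside an operand is left alone.

```python
def strip_comment(line):
    """Return the part of a source line that the assembler would keep."""
    if line.startswith("*"):
        return ""                          # whole-line comment: discard the line
    return line.split(";", 1)[0].rstrip()  # drop everything from the semicolon on

print(strip_comment("* Code section"))                   # prints an empty line
print(strip_comment("LWD $0, $2, 0 ;The power value"))   # LWD $0, $2, 0
print(strip_comment("ADI $0, $0, A * 5"))                # ADI $0, $0, A * 5
```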
Pseudo-instructions
As assembly programmers became proficient, they noticed that some
sequences of instructions were used over and over again throughout their
code. Instead of having to repeatedly type in these sequences, they
created pseudo-instructions. A pseudo-instruction is a one-line command
that replaces a group of real instructions. Pseudo-instructions often
look like real instructions but they are not supported by the hardware.
When the assembler encounters a pseudo-instruction in a program file,
it simply replaces it with the sequence of instructions it represents.
Pseudo-instructions not only make writing a program a little
easier, they also make the code less cluttered, which makes the
program easier to read.
In TSC, it is often desirable to load a 16-bit constant into a register. If
the constant is not found in memory, it has to be created. Since there
is no TSC immediate instruction that can load a value that large, the assembly
language programmer has to use LHI followed by an ORI. For example, the only
way to put the number 0x1289 into $3 is to execute
LHI $3, 0x12
ORI $3, $3, 0x89
The LHI-ORI instruction sequence is repeated so often that a Load Address
(LAD) pseudo-instruction has been created to represent it. (Don't let the
word "address" in the name confuse you. Although many 16-bit values used in
TSC programs are the literal addresses of data in memory, the LAD
pseudo-instruction can load any 16-bit value. It doesn't matter whether the
value is being "used" as an address or not.) Using LAD, the sequence above is
written as
LAD $3, 0x1289
or if the label LOOP is equal to 0x1289, it can be written as
LAD $3, LOOP
When the TSC Assembler sees a LAD pseudo-instruction, it takes it out of
the code list, splits the constant operand into two parts, creates the LHI-ORI
instruction pair, and places the pair where the LAD used to be. The
sample code above demonstrates the use of
LAD in a real program.
Currently, LAD is the only pseudo-instruction supported by the TSC
Assembler.
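The expansion can be sketched in Python. This is an illustration, not the assembler's actual code. The function name expand_lad is hypothetical, and the split into a high byte and a low byte is inferred from the 0x1289 example above. Masking the value to 16 bits is an additional assumption.

```python
def expand_lad(register, value):
    """Replace 'LAD register, value' with the equivalent LHI-ORI pair."""
    value &= 0xFFFF             # assumption: keep only the low 16 bits
    high = (value >> 8) & 0xFF  # upper byte goes to LHI
    low = value & 0xFF          # lower byte goes to ORI
    return [
        f"LHI {register}, 0x{high:02x}",
        f"ORI {register}, {register}, 0x{low:02x}",
    ]

for instruction in expand_lad("$3", 0x1289):
    print(instruction)
# LHI $3, 0x12
# ORI $3, $3, 0x89
```

Because one LAD becomes two real instructions, every label that follows a LAD sits one address further along than a count of source lines would suggest.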
Assembler Directives
Directives are commands within an assembly code file that "direct" the
assembler: they control the assembly process and reserve memory locations
for program data. Directives do not produce executable instructions when
assembled. In TSC, all directives are preceded by a period to
distinguish them from instructions.
The TSC directives are:
ORG (Origin)
Assembler Format
.ORG address
Description
The ORG directive provides the assembler with the
memory address where the next instruction is to be placed. ORG is often
on the first line of the program but is not required there; the TSC Assembler
defaults to address zero if ORG is not found. ORG is the means by which
separate program segments are created.
The address operand can be a literal number (in hex or decimal),
a symbolic name, or a mathematical expression. If the operand is a symbolic
name (label), it must have been previously defined. ORG cannot have a label
because it does not translate into an instruction or reserved memory location.
Examples
.ORG 0x56ff ;Next instruction placed at 0x56ff
.ORG START ;Next instruction placed at the value
;of the label START
.ORG START+3 ;Next instruction placed at START+3
See the sample code for an example
of how ORG is used to create segments.
END (Physical End of Code)
Assembler Format
.END operand
Description
The END directive marks the physical end of the program. All code
after END is discarded. All TSC programs must have an END directive.
END cannot have a label because it does not translate into an instruction
or reserved memory location. The operand is optional and is ignored by the
assembler.
Examples
.END ;Pretty simple
.END BEGIN ;The operand indicates where the
;program began
BSS (Block Storage - Space)
Assembler Format
.BSS operand
Description
The BSS directive reserves blocks of memory for data storage.
The operand indicates the number of memory locations to be reserved. The
TSC Assembler initializes all locations in the block to 0x0000.
A label is optional but BSS is generally useless without one. The
operand can be a literal number (in hex or decimal), a symbolic name, or a
mathematical expression although expressions and symbolic names usually
are not useful in the context of BSS. If the operand is a symbolic name (label), it
must have been previously defined. The operand must be positive.
Examples
A .BSS 10 ;The A block has 10 words
B .BSS 0x0011 ;The B block has 17 words
See the sample code for an example
of how the locations reserved by BSS are accessed by other instructions.
BSC (Block Storage - Constants)
Assembler Format
.BSC operand list
Description
The BSC directive reserves blocks of memory for data storage
and initializes the locations in the block to the values in the comma-delimited operand list
(that is, a list whose elements are separated by commas).
The number of elements in the operand list determines the number of memory
locations to be reserved.
A label is optional but BSC is generally useless without one. The
elements in the operand list can be literal numbers (in hex or decimal),
symbolic names, or mathematical expressions. If the operand is a symbolic
name (label), it must have been previously defined. The elements can be
positive or negative.
Examples
NEG2 .BSC -2 ;-2 stored at NEG2
X .BSC 0x0005, 10, 0x000f, -5
;The X block has 4 words initialized
;to the values in the list
Y .BSC X+1, X+2 ;The value X+1 is stored at Y
;The value X+2 is stored at Y+1
See the sample code for an example
of how the values defined by BSC are accessed by other instructions.
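How ORG, BSC, and BSS interact can be sketched in Python with a single location counter. This is an illustration under stated assumptions, not the TSC Assembler's implementation: memory is taken to be word-addressed with 16-bit words, negative values are stored in two's complement, and the function name layout and its tuple format are hypothetical.

```python
def layout(directives):
    """Assign addresses to labels and build the initial data memory image.

    directives: list of (label, directive, operands) tuples, where operands
    is a list of integers that have already been reduced to constants.
    """
    location = 0   # the assembler defaults to address zero
    symbols = {}
    memory = {}
    for label, directive, operands in directives:
        if directive == ".ORG":
            location = operands[0]     # ORG cannot have a label
            continue
        if label:
            symbols[label] = location  # a label names the first word of a block
        if directive == ".BSC":
            values = operands          # one word per list element
        elif directive == ".BSS":
            values = [0] * operands[0] # block initialized to 0x0000
        else:
            raise ValueError(f"unsupported directive: {directive}")
        for value in values:
            memory[location] = value & 0xFFFF  # assumption: 16-bit words
            location += 1
    return symbols, memory

# The data section of the sample program.
symbols, memory = layout([
    (None, ".ORG", [0x0100]),
    ("DATA", ".BSC", [4, 3]),
    ("ANSWER", ".BSS", [1]),
])
print({name: hex(addr) for name, addr in symbols.items()})
# {'DATA': '0x100', 'ANSWER': '0x102'}
```

Under these assumptions DATA names address 0x0100, the second value sits at 0x0101 (which is why the sample program reads it with an offset of 1), and ANSWER names 0x0102.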
EQU (Equate)
Assembler Format
label .EQU operand
Description
The EQU directive assigns the operand value to the
label. The value can represent a memory location or a data constant.
The operand can be a literal number (in hex or decimal), a symbolic name,
or a mathematical expression. If the operand is a symbolic name (label), it
must have been previously defined. The value of the operand can be positive
or negative.
EQU is not needed to write TSC programs but is provided as a convenience
to the advanced assembly programmer.
Examples
* EQU used to duplicate label values
X OR $3, $2, $0
Y .EQU X ;Y=X
* EQU used to create constants
A .EQU 56 ;A=56
B .EQU A+4 ;B=60
ADI $2, $2, B-A ;Add 4 to $2
* EQU used to create indexing constants
ESIZE .EQU 4 ;Size of each element in array
INDEX .EQU 7 ;Index of an element in the array
... ;Some code here
LWD $1, $3, INDEX * ESIZE
;$3 contains the address of the first
;element in the array. 4*7 is added
;to that address to get the element
;at index 7
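The effect of EQU on the assembler's symbol table can be sketched in Python. This is only an illustration: the function name define_equ is hypothetical, and the address given to X is an arbitrary value chosen for the example. The sketch enforces the rule that a symbolic operand must already be defined.

```python
def define_equ(symbols, label, operand):
    """Bind label to operand, which is an int or a previously defined name."""
    if isinstance(operand, str):
        if operand not in symbols:
            raise KeyError(f"{operand} must be defined before it is used")
        operand = symbols[operand]
    symbols[label] = operand

symbols = {"X": 0x0010}  # hypothetical address of the instruction labeled X
define_equ(symbols, "Y", "X")               # Y=X
define_equ(symbols, "A", 56)                # A=56
define_equ(symbols, "B", symbols["A"] + 4)  # B=60
define_equ(symbols, "ESIZE", 4)
define_equ(symbols, "INDEX", 7)

print(symbols["B"] - symbols["A"])          # 4
print(symbols["INDEX"] * symbols["ESIZE"])  # 28
```

The last two values are the constants the assembler would place in the ADI and LWD instructions of the examples above.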
Updated on 8 May 2000. Send comments to cs143@pel.cs.byu.edu
© 1999, 2000, Performance Evaluation Laboratory, Brigham Young University.
Reproduction of all or part of this work is permitted for non-profit educational or
research use provided this copyright notice remains intact. All other rights reserved.