Coa Unit 4
Introduction
Assembly language, like any other computer language, represents a compromise between
people and machines. People regularly process information in what are frequently called
natural languages, such as English, French, or German. Computers process information in
what are commonly known as machine languages. The route from human thought to
machine language is traveled halfway by people, who translate their thoughts into a computer
language, and halfway by computers, which use programs called translators to convert
computer language into machine language.
There are different kinds of translators. Translators that take an entire program and
translate it as a body into machine language are called compilers. Translators that process
programs one line at a time are called interpreters. Special-purpose translators that are
specifically designed to translate assembly language programs into machine language are
called assemblers.
Figure 5.1: Translators process program instructions into machine language at different levels of
organization.
The advantage of programming in assembly language lies in its specificity. An assembly
language programmer writes only the code that is absolutely necessary to accomplish a desired
task. The program will be much longer on paper than the equivalent program written in a
higher-level language, but it will be much shorter in the machine. An assembly language program will
typically occupy much less memory and run much faster than a program written in a higher-level
language that does the same thing. In general, the real reason for writing programs in assembly
language is to produce programs that run fast.
Aside from the practical advantages of programming in assembly language, there are several
emotionally satisfying reasons for learning this language. Assembly language gives you
complete control over the machine.
Perhaps the greatest value in learning to program in assembly language lies in the satisfaction
that comes from understanding computers and how they work. An understanding of assembly
language is in many respects an understanding of the computer itself.
Objectives……
Starting with any new language can be a little intimidating at first. It is often reassuring to begin
by composing and executing a simple program just for the sake of demonstrating to yourself
that you can get the thing to work. Program 5.1 shows a short program that does nothing more
than display the phrase “Have a nice day!” on the screen.
Getting a program to run involves four steps: entering, assembling, linking, and loading.
Each step in the process creates a separate image of the program. The first step, entering the
program, creates a source file that contains the source code of the program. The source code is
the assembly language image of the program. Program 5.1 shows the source code that
displays the text “Have a nice day!”.
Assembling a program converts its source code into an OBJect file. An OBJect file contains the
machine language image of the source code of a program in skeletal form. It also contains
enough information about the program to enable the linker and the loader to fill in the rest of
the program.
Linking a program converts an OBJect file into an EXEcutable file. In oversimplified terms, the
linker patches handles onto each end of an OBJect file and outputs the result as an EXEcutable
file.
Program 5.1
Hex $
Code segment
mov ax , data
mov ds, ax
int $21
int $21
Data segment
end
The loader copies the EXEcutable image of the program into memory and fills in the few
remaining gaps in the program with the very last details necessary to accommodate the
program in the exact place in memory to which it has been copied. After the loader has done
its work, the final image of the program exists in memory and is ready to be run. When the
loader is invoked directly from DOS, it automatically begins running the program
immediately after installing it. Figure 5.2 shows the progress of an assembly language program
from source code to machine language.
Figure 5.2: The four images of an assembly language program.
Hardware Requirements
Program 5.1 and all the other programs discussed in this text are designed to run on an
8086/8088-based computer running MS-DOS or PC-DOS version 2.10 or later. Intel
Corporation, which manufactures the 8086/8088 chip, has also produced a series of increasingly
powerful chips in the same family, designated the 80186, the 80188, the
80286, the 80386, and the 80486. The assembly language for the 8086/8088 is a subset of the
assembly language for each of these other chips. Programs written for the 8086/8088 should
run without modification on machines based on any chip in this family.
Software requirements
Getting your program into the computer, assembling it, linking it, and, if necessary, debugging it
are tasks that each require a separate software package. The first thing you will need is a word
or text processor for writing and editing your programs. Almost any word processor will suffice,
so long as it produces what are called standard ASCII text files.
Once a program is written, you will have to assemble it. To assemble the program, the
full-featured 8086/8088 macro assembler ELASS.EXE must be in the current DOS directory,
where the program is saved. All of the examples in this chapter are designed to be run
through ELASS.
After the program is assembled, it will have to be linked. To link the assembled program,
the executable file ELINK.EXE must also be in the current DOS directory.
Entering a program
If you want to enter a program, first open the DOS command prompt, change to the
directory that contains the assembler (ELASS.EXE) and the linker (ELINK.EXE), and then
run the “edit” command with the required file name.
For example, if those two executable files are found in the directory “C:\asm” and the current
directory is “C:\Documents and Settings\user>”, change to the “C:\asm>” directory as
follows:
Start → All Programs → Accessories → Command Prompt, and you will get the command
prompt window.
To change the directory to C:\asm, write the following command in the current
directory:
Now the current directory is asm, which contains the assembler (ELASS.EXE) and the linker
(ELINK.EXE).
(You can also configure the PATH environment variable so that you do not depend on being in
the directory that contains the assembler and the linker.)
Finally, we can open the editor to write the source code of the assembly language program using the
command “edit” as shown below.
You will then get the editor, which contains program5.1.asm.
Notice:
• We assume here that the user name is “user” and that the directory
that contains the assembler and the linker is the “asm” folder, but you can put the assembler
and the linker in any folder (directory) you like. Just do not forget,
before assembling the code, that the current directory must contain the assembler and
the linker.
To assemble an assembly language program, enter the following command at the DOS prompt:
Where progname is the name of the source file in which the program to be assembled is stored.
There is no need to include the .asm file name extension following progname. The assembler
will infer it if you omit it.
This command instructs DOS to load the ELASS assembler and then begin executing it. ELASS
begins by displaying its copyright message. It then reads the name of the source file from the
command line and sets about assembling it. If ELASS does not have any problems during
assembly, it will report ERRORS: NONE, create an object file named PROGNAME.OBJ in the
same directory and on the same device on which it found PROGNAME.ASM, and return to DOS.
If ELASS encounters one or more errors in the source file, it will issue an error message for each
one. Each error message will identify the nature of the problem and the line number of the
statement that caused it.
Once a program has been assembled and its object file has been recorded on the default
device, it is ready to be linked. To link a program using LINK.EXE, enter the following command
at the DOS prompt:
C:\asm>link progname;
The semicolon following the progname parameter directs LINK to apply its standard defaults. If
you inadvertently omit the semicolon, LINK will prompt for a series of inputs. You can ignore
those prompts and invoke the defaults by repeatedly pressing the enter key. Eventually, LINK
will report that it has finished linking the program. When it has finished linking, it will produce a
file named PROGNAME.EXE on the default device and return to DOS.
But if the linker that we have is ELINK.EXE, enter the following command at the DOS prompt to
link the program:
C:\asm>elink progname;
Under ELINK you can omit the trailing semicolon without effect. When ELINK begins linking
your program, it will report that it is WRITING PROGNAME.EXE. When ELINK finishes, it will display a
brief index of the program it has just linked.
Once an assembly language program has been linked into an executable file, it can be loaded
and executed just like any other executable file. To load and execute an executable file, enter
the following command at the DOS prompt:
C:\asm>progname
This command will instruct DOS to invoke the loader. The loader will load PROGNAME.EXE from
the default device into memory, add a few finishing touches to it, and automatically begin
executing it. It is this memory image of the program created by the loader that the computer
actually executes.
There are two possible kinds of errors you can run into in the course of executing an assembly
language program: processing errors and system errors. Processing errors produce invalid
outputs. They are the familiar errors you can encounter in the course of executing a program
written in any language. System errors, on the other hand, are unique to languages that permit
low-level operations. A system error is one that corrupts or disables the system under which
the program is running.
This topic provides a statement-by-statement analysis of the program. There are three kinds of
statements in the source code of an 8086/8088 assembly language program: instruction
statements, data allocation statements, and directives. The assembler translates each
instruction into machine language. It reserves and initializes data space in memory for each
data allocation statement. Directives serve to define the context in which instructions and data
allocation statements are processed. This topic begins by introducing the LIST directive.
There are several ways to generate a listing of an assembly language program. You could
take advantage of the listing and printing capabilities of whatever word processor you
used to write the program, you could use the DOS TYPE command, or you could insert a
LIST directive into the program to force the assembler to generate a listing.
The DOS command for displaying a program source file on the video screen is
C:\asm>type progname.asm
Alternatively, you might choose to generate a program listing by using a LIST directive. A LIST
directive causes the assembler to produce an annotated listing on the printer, the video screen,
a disk drive, or some combination of the three. An annotated listing shows the text of the
assembly language program, numbers each statement in the program, and, subject to certain
limitations, shows the offset associated with each instruction and each datum. It also displays
the machine language associated with each instruction. The advantage of using a LIST directive
instead of your word processor or DOS is that the LIST directive produces much more
informative output.
Where parameter is SCR, LPN, TXT, or TOF.
The SCR parameter directs the annotated listing to the video screen. The LPN parameter generates a similar listing
on the printer (PRN or LPT1). The TXT parameter creates an ASCII-format image of the annotated
listing on the default device under the name progname.TXT. If progname.ASM contains a
device name and/or subdirectory path specification, then progname.TXT will be written to the
specified device and subdirectory rather than to the default device. The TOF (Top Of Form)
parameter directs the assembler to begin each page of the listing with a header that gives the
name of the program, the current page number, the date, and the time of day.
When a LIST directive is used, it normally appears as the first statement in a program. When the assembler
encounters a LIST directive, it begins listing and continues until it encounters the end of the file
or a NOLIST directive. The general format for a NOLIST directive is:
NOLIST
There may be as many LIST and NOLIST directives in a program as you like. The LIST directives will
initiate listing, and the NOLIST directives will suspend it. Program5.2 is the same as program5.1
except that a LIST directive has been inserted to direct the assembler to generate an annotated
program listing. Figure 5.3 shows the annotated listing of program5.2.ASM.
An annotated program listing consists of two sections: the annotation of the program, which
appears to the left, and the source code of the program, which appears to the right.
The program annotation consists of three columns of data: line numbers, offsets, and
machine codes.
The left most column in the program annotation contains line numbers. The assembler assigns
line numbers to the statements in the source file sequentially. If the assembler should have
occasion to issue an error message, the message will contain a reference to one of these line
numbers.
The second column from the left contains offsets. Each offset indicates the address of an
instruction or a datum as an offset from the base of its logical segment. For example, the
statement at line number 0004 produces machine language at offset $0000 of the code
segment, and the instruction at line 0005 produces machine language at offset $0003.
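The offset column can be reproduced by adding up instruction lengths. The following Python sketch is illustrative only and is not part of the assembler. The lengths for lines 0004 and 0005 follow from the offsets quoted above; the instructions and lengths shown for lines 0006 through 0010 are assumptions based on the standard 8086 encodings of the instructions this chapter describes for program5.2.

```python
# Each entry: (listing line number, instruction length in bytes).
# Entries for lines 0006-0010 are assumed from standard 8086 encodings.
INSTRUCTION_LENGTHS = [
    ("0004", 3),  # MOV AX, DATA            -> B8 xx xx
    ("0005", 2),  # MOV DS, AX              -> 8E D8
    ("0006", 3),  # MOV DX, OFFSET Message  -> BA xx xx
    ("0007", 2),  # MOV AH, $09             -> B4 09
    ("0008", 2),  # INT $21                 -> CD 21
    ("0009", 3),  # MOV AX, $4C00           -> B8 00 4C
    ("0010", 2),  # INT $21                 -> CD 21
]


def code_offsets(lengths):
    """Return {line: offset}; each offset is the running total of bytes so far."""
    offsets = {}
    position = 0
    for line, size in lengths:
        offsets[line] = position
        position += size
    return offsets


offsets = code_offsets(INSTRUCTION_LENGTHS)
for line, offset in offsets.items():
    print(f"{line}  ${offset:04X}")
```

Each offset is simply the sum of the lengths of all the instructions that precede it in the segment, which is why a three-byte instruction at $0000 puts the next one at $0003.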
The third column in the annotation displays the machine language produced by each instruction
in the program. In program5.2, line 0005 contains the instruction
Mov DS, AX
The machine language image for this instruction is shown to be 8ED8. Each machine language
image of an instruction statement consists of an opcode byte, which is usually but not always
followed by one or more additional bytes. The opcode is always the first byte of an instruction.
It is also the first byte that the 8086/8088 “reads” as it prepares to execute an instruction. The
opcode tells the 8086/8088 what to do and whether the instruction contains additional bytes.
The opcode in the machine code 8ED8 is 8E. That opcode says to the 8086/8088, “this is a MOV
instruction; read another byte for more details.” The byte D8 (in the context of being prepared
by the opcode 8E) says: “The full text of this instruction reads MOV DS, AX.”
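How the second byte "completes" the instruction can be sketched in Python. This is a minimal decoder for this one instruction form only, assuming the standard 8086 register numbering; the function and table names are illustrative and not part of any assembler.

```python
# Standard 8086 register numbering for the fields of the second byte.
SEGMENT_REGISTERS = ["ES", "CS", "SS", "DS"]
WORD_REGISTERS = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def decode_mov_to_segment(code):
    """Decode a two-byte 'MOV segment register, register' instruction."""
    opcode, second = code[0], code[1]
    if opcode != 0x8E:
        raise ValueError("not opcode 8E")
    mode = second >> 6            # top two bits: 3 means the source is a register
    reg = (second >> 3) & 0b111   # middle three bits: destination segment register
    rm = second & 0b111           # low three bits: source register
    if mode != 0b11 or reg > 3:
        raise ValueError("only the register-to-segment-register form is handled")
    return f"MOV {SEGMENT_REGISTERS[reg]}, {WORD_REGISTERS[rm]}"


text = decode_mov_to_segment(bytes.fromhex("8ED8"))
print(text)
```

The byte D8 is 11 011 000 in binary: the register form, segment register number 3 (DS), and word register number 0 (AX).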
As you can see, the column of machine language is not altogether complete. The listings for
lines 0004 and 0006 indicate that the assembler did not know what the machine language
images for those instructions were going to be. The xxxx at line 0004 occupies a place in the
program into which the loader is supposed to insert the segment number of the data segment,
and the xxxx at line 0006 occupies a place where the linker is supposed to insert the offset of
the text “Have a nice day!$”.
Missing offsets: The xxxx in the machine language for the instruction at line 0006 is there
because the assembler does not know the offset of the text of the message “Have a nice
day!$”. The linker must supply that value. It may occur to you to ask why the assembler does
not know the offset of that datum when it is so clearly identified as $0000 in the annotation of
line 0013.
The answer is that the assembler did not know that program5.2 was going to be the only
module in your program. The source code of a program can be divided in to several modules,
and each module can be assembled separately. Then, after each has been assembled they can
all be linked together in to one single program.
When the assembler reports that the offset of the datum defined at line 0013 is $0000, it is just
fudging. The assembler reports offsets on the assumption that it is processing the one and only
module in a program. This is actually just as well for beginning programmers. Most of the
programs beginners write consist of only a single module, so the annotation is quite suitable for
their purposes.
LSB/MSB order: The 8086/8088 stores all word-sized values in memory in LSB (Least Significant
Byte)/MSB (Most Significant Byte) order. The 8086/8088 will store a number such as $1234 in
a word of memory with the value $34 in the first byte of the word and the value $12 in the
second byte of that word. For example, the machine language image of the instruction at line
0009 consists of an opcode, $B8, followed by a word of memory containing the value $4C00. The
full text of the machine language for that instruction reads B8004C because the value is stored
in LSB/MSB order.
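The byte ordering can be checked with a short Python sketch. It uses the standard struct module, whose "<H" format packs a 16-bit value in little-endian (LSB first) order; the helper name is illustrative.

```python
import struct


def word_to_memory(value):
    """Return the two bytes of a 16-bit value in LSB/MSB order."""
    return struct.pack("<H", value)


stored = word_to_memory(0x1234)
mov_ax_4c00 = bytes([0xB8]) + word_to_memory(0x4C00)

print(stored.hex().upper())
print(mov_ax_4c00.hex().upper())
```

The first line printed shows $34 ahead of $12, and the second reproduces the B8004C image discussed above.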
The right half of an annotated program listing shows the source code of the program itself. Each
assembly language statement appears as some variation on the same basic format:
The elements of a statement must appear in their appropriate order, but no significance is
attached to the column in which an element begins. Each statement must end with a carriage
return, a line feed, or a combination of the two, but managing that is really the
province of the word processor and should be transparent to the programmer. The assembler is
entirely indifferent to case. Any given token could be entered in one part of a program in lowercase
letters, elsewhere in uppercase, and elsewhere again in some combination of the two. All
such entries would be recognized as representing the same token.
Keywords: A keyword is at the heart of every assembly language statement. The keyword in a
statement defines the nature of that statement. If the statement is an assembly language
instruction, the keyword will be an instruction mnemonic; if the statement is a directive, the
keyword will be the title of the directive; if the statement is a data allocation statement, the
keyword will be a data‐definition‐type.
For example, the keyword in line 0001 of program5.2 is LIST, the keyword in line 0002 is HEX,
the keyword in line 0003 is SEGMENT, and the keyword in line 0004 is MOV.
Identifiers are composed of the letters of the alphabet, the digits 0 through 9, and the special
characters @, _, ?, !, and $. The first character in an identifier, however, may not be one of the
digits 0 through 9. An identifier may not be one of the assembler’s reserved words.
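The identifier rules above can be expressed as a small Python check. This is a sketch of the stated rules only. The RESERVED set is a hypothetical sample drawn from keywords mentioned in this chapter, not the assembler's actual reserved-word list.

```python
import string

ALLOWED = set(string.ascii_letters + string.digits + "@_?!$")
# Hypothetical sample only; the real reserved-word list belongs to the assembler.
RESERVED = {"MOV", "INT", "SEGMENT", "END", "LIST", "HEX"}


def is_valid_identifier(token):
    """Apply the three identifier rules stated in the text."""
    if not token or token[0] in string.digits:
        return False                      # may not start with a digit
    if any(ch not in ALLOWED for ch in token):
        return False                      # only letters, digits, @ _ ? ! $
    return token.upper() not in RESERVED  # case is ignored by the assembler


for candidate in ["Message", "APPLE_PIE", "9lives", "mov", "my-name"]:
    print(candidate, is_valid_identifier(candidate))
```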
Comment: A comment is a string of text that explains the program but is not part of the
program. A semicolon identifies all subsequent text in a statement as a comment. The
assembler ignores comments and does not process them at all.
4.4 Assembly Language Directives
Assembly language directives are statements that describe the context in which the instructions
in a program are to be assembled into machine language and in which the data allocation
statements are to be processed into data space. The ELASS assembler supports 28 different
directives, four of which appear in program5.2: LIST, HEX, SEGMENT, and END.
The HEX directive at line 0002 of program5.2 is there to facilitate the coding of hexadecimal
values in the body of the program. That statement directs the assembler to treat tokens in the
source file that begin with a dollar sign as numeric constants in hexadecimal notation. A HEX
directive affects only the source code that follows it, so it is customary to place the HEX
directive at the beginning of the program. If the HEX directive had not been included in
program5.2, the assembler would have processed the tokens beginning with dollar signs as
identifiers instead of as numeric values in hexadecimal notation. As a result of the HEX directive at
line 0002, the assembler recognizes the token $21 in lines 0008 and 0010 and the tokens $4C00
in line 0009 and $0400 in line 0011 as hexadecimal numbers.
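The effect of the HEX directive on token recognition can be modeled in a few lines of Python. This is a simplified sketch, not how ELASS is implemented; the function name and the hex_prefix parameter are illustrative.

```python
def classify_token(token, hex_prefix=None):
    """Return ('number', value) or ('identifier', token).

    hex_prefix models the HEX directive: None means no HEX directive
    has been seen yet, so nothing is treated as a hexadecimal constant.
    """
    if hex_prefix and token.startswith(hex_prefix):
        digits = token[len(hex_prefix):]
        try:
            return ("number", int(digits, 16))
        except ValueError:
            pass  # not valid hexadecimal, so fall through
    return ("identifier", token)


print(classify_token("$21"))                    # before any HEX directive
print(classify_token("$21", hex_prefix="$"))    # after HEX $
print(classify_token("$4C00", hex_prefix="$"))
```

The same token is classified differently depending on whether a HEX directive is in effect, which is why the directive is placed at the top of the program.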
A hex directive is only one of several techniques that you can employ to force the assembler to
recognize a numeric value represented in hexadecimal notation.
The SEGMENT Directive
A segment directive defines the logical segment to which subsequent instructions and data
allocation statements belong. It also gives a segment name to the base of that segment. This is
critically important. The address of every element in a program must be represented to the
8086/8088 in segment‐relative format. That means every address must be expressed in terms
of a segment register and an offset from the base of the segment addressed by that register. By
defining the base of a logical segment, a SEGMENT directive makes it possible to set a segment
register to address that base and also makes it possible to calculate the offset of each element
in that segment from a common base.
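Segment-relative addressing can be illustrated with a short Python sketch. It assumes the standard 8086/8088 rule that a segment value counts 16-byte paragraphs, so the physical address is the segment value times 16 plus the offset. The sample segment value $1234 is hypothetical.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment (paragraph number) with a 16-bit offset."""
    # The 8086/8088 has a 20-bit address bus, so the result wraps at 1 MB.
    return ((segment << 4) + offset) & 0xFFFFF


addr = physical_address(0x1234, 0x0005)
print(f"${addr:05X}")
```

Every element in a segment shares the same segment value and differs only in its offset, which is what allows offsets to be calculated from a common base.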
Typically, an 8086/8088 assembly language program will consist of three logical segments: a
code segment, a stack segment, and a data segment. Also typically, though not at all necessarily,
the three segments will be named CODE, STACK, and DATA respectively. Don’t confuse the name of
the segment with its role. The code segment is the code segment because it contains program
code, not because it is named CODE. You could edit program5.2 and replace the identifier CODE
in line 0003 with FRED or GEORGE or APPLE_PIE and the program would work just as well.
A segment directive indicates that all statements following it in the source file through and until
an ENDS (EndSegment) directive or until another segment directive are a part of that logical
segment. In program5.2 the code segment extends from line 0003 through line 0010, the stack
segment consists of line 0011, and the data segment consists of lines 0012 and 0013.
In program5.2 the end of each segment is marked implicitly by the presence of another SEGMENT
directive. Alternatively, the end of the segment can be marked explicitly with an ENDS directive:
Even though ENDS directives are optional, many programmers choose to use them
because they tend to document and emphasize the segmenting scheme employed by a given
program.
The first segment directive in program5.2 introduces a logical segment named CODE. By default
the linker assumes that the first segment in a program is its code segment. When the linker
links a program, it makes a note in the header section of the program’s executable file
describing the location of the code segment. When DOS invokes the loader to load an
executable file into memory, the loader reads that note. As it loads the program into memory,
the loader also makes notes to itself of exactly where in memory it actually places each of the
program’s other logical segments. As the loader turns execution over to the program it has just
loaded, it sets the CS (Code Segment) register to address the base of the segment identified by
the linker as the code segment. This renders every instruction in the code segment addressable
in segment-relative terms in the form CS:XXXX.
The linker also assumes by default that the first instruction in the code segment is intended to
be the first instruction to be executed. That instruction will appear in memory at an offset of
$0000 from the base of the code segment, so the linker passes that value on to the loader by
leaving another note in the header of the program’s executable file. The loader sets the IP
(Instruction Pointer) register to that value. This sets CS:IP to the segment relative address of the
first instruction in the program.
The architecture of the 8086/8088 is such that it automatically executes the instruction at the
address defined by CS:IP. The CPU steps through a program by executing an instruction,
updating the contents of the IP register so that CS:IP points to the next instruction, executing
that instruction, and so forth. Normally the CPU adjusts only the contents of the IP register as it
steps through a program, but a few instructions can cause the CPU to adjust both the CS
register and the IP register at once. The loader’s act of setting the CS and IP registers in accordance
with the direction of the linker turns control of the CPU over to the program it has just loaded.
The second segment directive in program5.2 appears at line 0011. That statement defines the
program’s stack segment. The name of the stack segment, like the name of the code segment,
is altogether arbitrary. The identifier STACK to the left of the keyword SEGMENT is the
segment’s name. Line 0011 would have served just as well in this program if it had been
written:
QWERTY SEGMENT STACK $0400
The appearance of the word STACK to the right of the keyword SEGMENT is not at all
arbitrary. Here the word STACK is a parameter that tells the assembler to alert the linker that
this SEGMENT statement defines the program’s stack area.
One of the many reasons a program must have a stack area is that the computer is continuously
carrying on several background operations that are completely transparent, even to an
assembly language programmer. Every 55 milliseconds the CPU has to drop what it is doing,
make a note of the address of the instruction it was about to execute, make a note of the state
of all its registers, and then go about updating the system clock. When it finishes servicing the
system clock, it has to read all those notes, restore all its registers, and go back to doing
whatever it was doing when the interruption occurred. All those notes are recorded in the stack.
The size of the stack segment is defined to be $0400 bytes by the set-aside parameter in line
0011. The linker notes the location and size of a program’s stack segment in that program’s
executable file. The loader uses that information to initialize the SS (Stack Segment) register
and the SP (Stack Pointer) register just before it sets the CS and IP registers to address the first
instruction in the program. The loader sets the SS register to address the base of the stack
segment. That makes every byte in the stack segment addressable in segment-relative format
as SS:XXXX. The loader then sets the contents of the SP register equal to the size of the stack
segment in bytes. That initializes SS:SP to address the byte of memory just beyond the last
byte in the program stack.
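Why SP starts just beyond the end of the stack becomes clearer from a small simulation. The Python sketch below is a simplified model, assuming the standard 8086 behavior that a push first decrements SP by 2 and then stores a word in LSB/MSB order; the class and method names are illustrative.

```python
STACK_SIZE = 0x0400  # the set-aside value from line 0011


class Stack:
    """A toy model of the stack segment; offsets are relative to SS."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.sp = size  # one past the last byte, as the loader sets it

    def push(self, word):
        self.sp -= 2                                   # move down first
        self.memory[self.sp] = word & 0xFF             # LSB
        self.memory[self.sp + 1] = (word >> 8) & 0xFF  # MSB

    def pop(self):
        word = self.memory[self.sp] | (self.memory[self.sp + 1] << 8)
        self.sp += 2
        return word


stack = Stack(STACK_SIZE)
print(f"initial SP = ${stack.sp:04X}")
stack.push(0x1234)
print(f"after push SP = ${stack.sp:04X}")
print(f"popped ${stack.pop():04X}, SP = ${stack.sp:04X}")
```

Because SP is decremented before anything is stored, the first word pushed lands in the last two bytes of the segment, and an empty stack always has SP equal to the segment size.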
The third and last segment in program5.2 begins at line 0012. This is the data segment. It
contains a single data allocation statement at line 0013. Once again, the name of this segment is
entirely arbitrary. It might just as well have been named RALPH, as in
RALPH SEGMENT
But if you should wish to change the name of this segment, remember that its name is
referenced in line 0004, so you would have to change it there too:
The END directive at line 0014 is simply an advisory to the assembler alerting it that it has
reached the end of the program.
4.5 Data Allocation Statements
Data allocation statements set aside and initialize one or more bytes of memory for use as data
space. The general format for a data allocation statement is:
The data allocation statement at line 0013 of program5.2 allocates a block of memory 17 bytes long, initializes it to the text string “Have a nice day!$”,
and assigns the identifier Message to it.
An instruction statement consists of a mnemonic followed by zero, one, or two operands. If there are two operands, they must be separated by a
comma. When a mnemonic takes two operands, the first operand is called the destination
operand, and the second operand is called the source operand. Following the mnemonic and
the operands, if any, there may be a comment, which, if present, must be preceded by a
semicolon.
The MOV(MOVe) instruction at line 0004 is the first instruction in the code segment. The
general format for a MOV instruction is:
The MOV instruction copies the contents of the source operand into the destination operand. It
is more or less like the assignment operator in a higher-level language. The MOV instruction in
line 0004,
MOV AX, DATA
copies the value of DATA into the AX register. DATA is defined as a segment name on line 0012. The value of a segment name is the
paragraph number at which that segment is loaded by the loader.
Line 0005 reads MOV DS, AX. Here the AX register is the source operand and the DS register is the
destination operand. This instruction directs the CPU to copy the contents of the AX register
into the DS register. The combined effect of lines 0004 and 0005 is to move the segment number of
the data segment into the DS register. You may well wonder why this was not done in a single
statement:
The reason is that on the 8086/8088 immediate (constant) values cannot be transferred
directly to segment registers such as DS. To load an immediate value into a segment register, we
must first move it into a general-purpose register and then move it from that general-purpose
register into the segment register. Immediate values can, however, be moved directly into
general-purpose registers.
The instruction at line 0006 is another “move immediate to register” instruction. In this case, the immediate value of
the offset of Message is moved to the DX register. OFFSET Message refers to the offset of Message.
The offset of Message is the distance in bytes from the beginning of the data segment to the
first byte in Message.
The mnemonic INT stands for INTerrupt. This mnemonic appears at lines 0008 and 0010 of
program5.2. The INT instruction is a kind of subroutine call. The 8086/8088 sets up 256 special
subroutine calls called software Interrupts, which are generally used by the operating system
and low-level applications. Put somewhat loosely, INT $21 means “call special subroutine
number $21.”
INT $21 has more than 100 functions supplied by DOS for most input, output, and other
essential machine functions. Taken collectively, those functions are called DOS function calls.
The DOS function calls are numbered sequentially. The contents of the AH register are used to
specify the function to be invoked. When the INT $21 is invoked, it directs program flow to the
function whose number is then present in the AH register.
Lines 0007 and 0008 invoke DOS function $09. That function displays a string of text on the
video screen. According to the protocol of DOS function $09, the string to be displayed must
be stored in memory at the location identified by DS:DX, its end must be flagged with a dollar
sign, and the contents of the AH register must be set to $09.
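The dollar-sign convention can be modeled in Python. This sketch only imitates how function $09 finds the end of the string. The bytes object stands in for the data segment addressed by DS, and the function name is illustrative.

```python
def dos_print_string(memory, dx):
    """Return the text function $09 would display, starting at offset dx."""
    end = memory.index(b"$", dx)  # the dollar sign marks the end of the string
    return memory[dx:end].decode("ascii")


data_segment = b"Have a nice day!$"  # the 17 bytes allocated for Message
shown = dos_print_string(data_segment, 0)

print(len(data_segment))
print(shown)
```

The terminating dollar sign occupies one of the 17 bytes but is never displayed, which also means this function cannot display a string that contains a dollar sign.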
Lines 0009 and 0010 of program5.2 invoke DOS function $4C of INT $21. That function returns
system control to DOS. The protocol of DOS function $4C specifies that the AL
register must contain a return code. A return code of $00 indicates an error-free execution.
Since the AL and AH registers are the lower and upper halves of the AX register, they can both
be set in one single instruction.
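The relationship between AX, AH, and AL can be verified with a small Python sketch; the function name is illustrative.

```python
def split_ax(ax):
    """Return (AH, AL): the upper and lower bytes of a 16-bit AX value."""
    return (ax >> 8) & 0xFF, ax & 0xFF


ah, al = split_ax(0x4C00)
print(f"AH = ${ah:02X}, AL = ${al:02X}")
```

Loading $4C00 into AX therefore selects function $4C through AH and supplies a return code of $00 through AL in one instruction.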
This topic presents two programs designed to introduce the four basic DOS function calls for
console I/O:
Analysis of program5.3
The code in program5.3 extends the programming task introduced in program5.2. Program5.3
does two things. It reads a character from the keyboard, and it displays that character
embedded in a message that reads, “the letter you typed was x”. Program5.3 is designed to
introduce DOS function $08 and DOS function $02.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>program5.3<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Documentation
Stylistically, the most striking difference between program5.2 and program5.3 is the presence
of comments in program5.3. There are two ways to document a statement of source code. A
comment can either precede the statement on a separate line, or it can be appended to the line
on which the statement appears.
The first line of program5.2 could have been documented in either of the following manners:
; list to screen
LIST SCR
Or
The code at lines 0010 and 0011 and at lines 0036 and 0037 is called boilerplate code.
Boilerplate code is code that is present in more or less the same form in every assembly
language program. Lines 0010 and 0011 set the DS register so that the program can access the
data segment, and lines 0036 and 0037 return processing control to DOS when the program
concludes.
DOS function $08 is invoked at lines 0014 and 0015. This function waits for an input from the
keyboard and returns the ASCII value of that input in the Al register. DOS function $08 is
invoked when an int $21 instruction is executed with the value $08 stored in the Ah register.
The instruction at line 0018 copies the contents of the Al register in to Bl register. This
operation is necessary because the program is going to use those contents in the call to
function $02 at lines 0026 through 0028. Before it can make that call, however, the contents of
the Al register will be contaminated ( i.e. changed ) by the call to function $09 at lines 0021
through 0023.
As a rule, the DOS function calls preserve the contents of all registers except for the Ax register
and any other register or registers in which they explicitly return data. Consequently, the
contents of Al register, which is part of the Ax register, will be undefined after the execution of
the INT $21 instruction at line 0023, but the contents of the Bl register will be unaffected.
Composing output
Program5.3 produces its output in three separate steps. First, it outputs a message, “the letter
you typed was ”; second, it outputs the character that the user typed; third and finally, it
outputs two spaces and a period. The first and third parts of the output are constant string
images. They are generated with the same DOS function $09 that was used in program5.2. The
second part of the output uses DOS function $02 to output a single character. To invoke DOS
function $02, a program must execute an INT $21 instruction with the character to be displayed
contained in the DL register and the value of $02 contained in the AH register. The sequence of
instructions at lines 0026 through 0028 does just that.
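The three output steps, and the reason the keystroke is parked in BL, can be traced with a toy model. The sketch below is written in Python and makes several simplifying assumptions: the class Dos and the function program5_3 are invented for illustration, the string is passed directly instead of through DS:DX, and the "undefined" state of AX after an output call is modeled by overwriting AH and AL with $FF.

```python
class Dos:
    """Toy model of the console functions reached through INT $21."""

    def __init__(self, keys):
        self.keys = list(keys)      # pending keystrokes (hypothetical input)
        self.screen = []            # text written so far
        self.reg = {"ah": 0, "al": 0, "bl": 0, "dl": 0}

    def int21(self, string=None):
        fn = self.reg["ah"]
        if fn == 0x08:                          # keyboard input without echo
            self.reg["al"] = ord(self.keys.pop(0))
            return
        if fn == 0x02:                          # character output from DL
            self.screen.append(chr(self.reg["dl"]))
        elif fn == 0x09:                        # string output up to '$'
            self.screen.append(string.split("$")[0])
        # AX is not preserved by the output calls; model that by trashing it.
        self.reg["ah"] = self.reg["al"] = 0xFF


def program5_3(dos):
    r = dos.reg
    r["ah"] = 0x08
    dos.int21()                                 # read a key into AL
    r["bl"] = r["al"]                           # save it before AX is lost
    r["ah"] = 0x09
    dos.int21("the letter you typed was $")
    r["ah"], r["dl"] = 0x02, r["bl"]
    dos.int21()                                 # print the saved character
    r["ah"] = 0x09
    dos.int21("  .$")
    return "".join(dos.screen)


print(program5_3(Dos("x")))
```

If the model copied AL into DL after the first string call instead of using BL, it would print the trashed value, which is the failure the copy into BL is there to prevent.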
The texts of the messages in program5.3 are enclosed in single quotes, whereas the text of the
message in program5.2 was enclosed in double quotes. This was done largely to make the point
that you can use either single or double quotes to enclose a string of text in an assembly
language program. One advantage of this flexibility is that you can embed single quotes within
text defined by double quotes and embed double quotes within text defined by single quotes.
Analysis of program5.4
Program5.4
The code at lines 0016 through 0018 of program5.4 invokes DOS function $09 to display the
user prompt “type a letter, please. ”. This line appears on screen immediately below the
command line that invokes the program:
C:\asm>program5.4
The combined action of the carriage return/line feed (CR/LF) sequence positions the cursor at
the start of the following line. The text of the user prompt, “type a letter, please. ”, appears at
the far left of the line immediately below the command line, because that is where the
command line left the DOS cursor just before program5.4 took over.
DOS Function $01
Lines 0021 and 0022 of program5.4 invoke DOS function $01: Keyboard Input with Echo. This
function is invoked when the INT $21 instruction is executed with a value of $01 in the AH
register. DOS function $01 waits for a keystroke at the keyboard. When a key is pressed, it
returns with the ASCII code for that key stored in the AL register, and echoes that keystroke to
the video screen.
DOS functions $01 and $08 are identical except that function $01 displays the image of the key
that was pressed while function $08 does not. When the user presses a key, function $01
echoes its image to the video screen at the current location of the cursor and advances the
cursor one position to the right.
Before program5.4 displays the next line of its output, it must first generate a CR/LF sequence.
Otherwise, the next line would begin where the call to function $01 left the cursor.
There are several ways to generate a CR/LF. The most straightforward one uses DOS
function $02 to output a carriage return character, and then uses it again to output a line feed
character. The ASCII code for a carriage return is $0D. The ASCII code for a line feed is $0A. The
following code generates a CR/LF sequence:
Mov ah,$02
Mov dl,$0d
Int $21
Mov ah,$02
Mov dl,$0a
Int $21
The INC instruction
The INC (Increment) instruction, which appears at line 0077, has the general form
INC operand
This instruction increases the contents of the operand by 1. In this case, inc dl adds 1
to the contents of the DL register. The combined effect of the instructions at lines 0025 and
0076 is to set the contents of the DL register to the ASCII code for the key read in by the call to
DOS function $01 at lines 0021 and 0022.
The collective effect of lines 0075 through 0078 is to display the image of the letter that is
alphabetically one position after the letter whose ASCII code is in the BL register.
When you execute program5.4 from the DOS prompt, the screen appears something like this:
C:\asm>program5.4
The DEC (Decrement) instruction is the counterpart of the INC instruction: it decreases the
contents of its operand by 1. The general format for a DEC instruction is:
DEC operand
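The arithmetic behind INC and DEC on an 8-bit register such as DL can be checked with a short Python sketch. The helper names inc8 and dec8 are hypothetical; the masking with $FF models the fact that an 8-bit register wraps around instead of growing.

```python
def inc8(value):
    """INC on an 8-bit register: add 1, wrapping from $FF to $00."""
    return (value + 1) & 0xFF


def dec8(value):
    """DEC on an 8-bit register: subtract 1, wrapping from $00 to $FF."""
    return (value - 1) & 0xFF


bl = ord("x")               # ASCII code of the key typed, as saved in BL
dl = inc8(bl)               # mov dl,bl followed by inc dl
print(chr(bl), "->", chr(dl))
```

Because the letters occupy consecutive ASCII codes, adding 1 to the code of a letter yields the code of the next letter in the alphabet.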
The control transfer instructions consist of calls, returns, jumps, loops, and interrupts. These
instructions intercept the flow of program control and redirect it elsewhere in a program. They
make it possible for a program to branch, to loop, and to execute subroutines.
In the normal course of events, the architecture of the CPU provides for program flow to
proceed sequentially from one instruction to the next. As the CPU reads an instruction, it
automatically updates the contents of the IP register. By the time it has finished reading one
instruction, CS:IP is addressing the opcode of the next. After the CPU has read an instruction, it
executes it, and when it has finished executing that instruction, it begins reading the instruction
at the location then addressed by CS:IP.
The control transfer instructions are a special class of instructions that, when executed, adjust
the contents of the IP register and in some cases the contents of the CS register as well. By the
time the CPU has finished executing one of these instructions, CS:IP is no longer necessarily
pointing to the next instruction in sequence, but may point to an instruction somewhere else in
the program.
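Sequential flow amounts to adding each instruction's length to IP. The Python sketch below replays the first five instructions of the program5.5 listing that appears later in this unit; the byte lengths are read off the machine code column of that listing, and the function name trace_ip is invented for the illustration.

```python
# (mnemonic, length in bytes) for the first five instructions of program5.5
instructions = [
    ("mov ax,data", 3),
    ("mov ds,ax", 2),
    ("mov ah,$09", 2),
    ("mov dx,offset user_promt", 3),
    ("int $21", 2),
]


def trace_ip(instructions, start=0x0000):
    """Return the IP value at which each instruction begins, plus the next one."""
    ip = start
    addresses = []
    for _, length in instructions:
        addresses.append(ip)
        ip = (ip + length) & 0xFFFF     # IP is a 16-bit register
    return addresses, ip


addresses, next_ip = trace_ip(instructions)
for (text, _), address in zip(instructions, addresses):
    print(f"{address:04X}  {text}")
print(f"{next_ip:04X}  (next instruction)")
```

The addresses produced this way agree with the offsets printed in the listing. A control transfer instruction is one that replaces this running sum with some other value.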
4.8.1 Subroutines
Program5.4 is an example of a program that could be improved with the use of a subroutine. As it
now stands, program5.4 contains the same fragment of code twice. The code that generates the
CR/LF sequence at lines 0028 through 0033 appears again at lines 0051 through 0056. The same
program could have been coded more succinctly as it appears in program5.5. In program5.5 the
CR/LF generating code appears as a subroutine at lines 0080 through 0089. This subroutine
appears only once, but it is called from two separate points in the program, once at line 0032
and again at line 0048.
Procedures
The program code in program5.5 is divided into two separate procedures: a procedure named
MAIN, which runs from line 0014 through line 0078, and a procedure named CRLF, which runs
from line 0080 through line 0089. Each procedure begins with a PROC directive of the form
Procname PROC
and ends with an ENDP directive of the form
Procname ENDP
The CALL instruction is used to invoke the code in a subroutine. The general format for an
instruction that calls a subroutine in a procedure is
CALL procname
program5.5
0001 list scr
0002 ;**********program5.5***********************************************
0003 ;*this program is the same as program5.4 except that *
0004 ;* it employs Procedures and Calls. *
0005 ;* *
0006 ;*Asks the user to input a letter from *
0007 ;*the keyboard and responds: *
0008 ;* "The letter you typed was x ." (CR/LF) *
0009 ;* "The letter after x is y ." *
0010 ;*******************************************************************
0011 hex $
0012 code segment
0013 ;*******************************************************************
0014 main proc
0015 ;set the DS register.
0016 0000 B8XXXX mov ax,data
0017 0003 8ED8 mov ds,ax
0018
0019 ;display user prompt
0020 0005 B409 mov ah, $09
0021 0007 BAXXXX mov dx, offset user_promt
0022 000A CD21 int $21
0023
0024 ; read keyboard with echo
0025 000C B401 mov ah, $01
0026 000E CD21 int $21
0027
0028 ;save input value
0029 0010 8AD8 mov bl, al
0030
0031 ;print the first part of message
0032 0012 E83F00 Call CRLF
0033 0015 B409 mov ah, $09
0034 0017 BAXXXX mov dx, offset message
0035 001A CD21 int $21
0036
0037 ;print contents of bl register
0038 001C B402 mov ah,$02
0039 001E 8AD3 mov dl,bl
0040 0020 CD21 int $21
0041
0042 ;display two spaces and a period
0043 0022 B409 mov ah,$09
0044 0024 BAXXXX mov dx,offset sp_sp_period
0045 0027 CD21 int $21
0046
0047 ;print start of the next line
0048 0029 E82800 call CRLF
0049 002C B409 mov ah,$09
0050 002E BAXXXX mov dx, offset line_two
0051 0031 CD21 int $21
0052
0053 ;print contents of bl register
0054 0033 B402 mov ah,$02
0055 0035 8AD3 mov dl,bl
0056 0037 CD21 int $21
0057
0058 ;print two spaces , "is",
0059 ;two more spaces
0060 0039 B409 mov ah,$09
0061 003B BAXXXX mov dx, offset is_text
0062 003E CD21 int $21
0063
0064 ;print the contents of Bl register plus 1
0065 0040 B402 mov ah,$02
0066 0042 8AD3 mov dl,bl
0067 0044 FEC2 inc dl
0068 0046 CD21 int $21
0069
0070 ;display two spaces and a period.
0071 0048 B409 mov ah,$09
0072 004A BAXXXX mov dx, offset sp_sp_period
0073 004D CD21 int $21
0074
0075 ;exit to DOS
0076 004F B8004C mov ax,$4c00
0077 0052 CD21 int $21
0078 main endp
0079 ;********************************************************
0080 CRLF proc
0081 ;generate CR/LF
0082 0054 B402 mov ah,$02
0083 0056 B20D mov dl,$0d
0084 0058 CD21 int $21
0085 005A B402 mov ah,$02
0086 005C B20A mov dl,$0a
0087 005E CD21 int $21
0088 0060 C3 ret
0089 CRLF endp
0090 ;********************************************************
0091 code ends
0092 ;***************************************************
0093 stack segment stack $0400
0094 ;***************************************************
0095 data segment
0096 0000 user_promt db 'type a letter, please. $'
0097 0019 message db 'the letter you typed was $'
0098 0034 sp_sp_period db ' .$'
0099 0038 line_two db 'The letter after $'
0100 004B is_text db ' is $'
0101 data ends
0102 ;***************************************************
0103 end
The CPU does several things in the course of executing a CALL instruction. First, it adjusts the
contents of IP register as if it were about to execute the next instruction, but then, instead of
going on to do so, it records the contents of the IP register in the program stack. Then it adjusts
the contents of the IP register again, this time to point to the first instruction in the named
procedure.
The result is that immediately after processing a CALL procname instruction, the CPU begins
executing the first instruction in the named procedure. From there on it continues executing
instructions until it encounters a RET instruction.
A RET (RETurn) instruction transfers program flow back from a subroutine to its caller. When
the CPU encounters a RET instruction, it recovers the return address that the CALL instruction
directed it to leave for itself on the program stack. Then it puts that address into the IP
register and continues execution. This gets the CPU back to where it was when the CALL
instruction sidetracked it.
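The bookkeeping performed by CALL and RET can be followed with a toy model. The Python sketch below tracks only IP and the stack; the class name Cpu is hypothetical, and the addresses are taken from the program5.5 listing, where the call at line 0032 sits at offset 0012, occupies three bytes, and targets CRLF at offset 0054.

```python
class Cpu:
    """Toy model of IP and the program stack for near CALL and RET."""

    def __init__(self):
        self.ip = 0x0000
        self.stack = []

    def call(self, at, length, target):
        self.ip = at + length          # IP first moves to the next instruction
        self.stack.append(self.ip)     # that return address is saved on the stack
        self.ip = target               # then IP is pointed at the procedure

    def ret(self):
        self.ip = self.stack.pop()     # recover the saved return address


cpu = Cpu()
cpu.call(at=0x0012, length=3, target=0x0054)   # "call CRLF" at line 0032
print(f"in CRLF:   IP={cpu.ip:04X}, saved={cpu.stack[-1]:04X}")
cpu.ret()                                       # "ret" at line 0088
print(f"after RET: IP={cpu.ip:04X}")
```

After the return, IP holds 0015, the offset of the instruction at line 0033 that follows the call.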
Placements of procedures
4.8.2 JUMPS
A program jump transfers program flow to the instruction at some specified location in
memory. An assembly language jump is analogous to a GOTO command in a higher-level
language. The format for an unconditional jump to an address specified by an assembly
language label is:
JMP label
where label is a program address identifier. A label consists of any valid assembler identifier
followed by a colon. Any instruction statement can be prefixed by a label. The assembler treats
a reference to a label as a reference to the address of the instruction to which that label is
affixed. The JMP instruction in this sequence:
JMP AX_ZERO
.
.
.
AX_ZERO: MOV AX, $0000
would cause the CPU to set the contents of the IP register to address the instruction labeled
AX_ZERO and to continue processing from there.
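The machine code column of the listings in this unit shows how the assembler records a label reference inside a near JMP or CALL: not as the label's address, but as a 16-bit displacement measured from the instruction that follows. The Python sketch below decodes one such instruction from the program5.6 listing. The function name near_target is made up for the illustration, and the sketch covers only the three-byte near forms that appear in these listings.

```python
def near_target(address, encoded):
    """Decode a 3-byte near JMP/CALL: opcode byte, then a 16-bit displacement
    stored low byte first and measured from the following instruction."""
    displacement = encoded[1] | (encoded[2] << 8)
    next_ip = address + len(encoded)
    return (next_ip + displacement) & 0xFFFF


# "0017 E91B00 jmp bye_bye" from the program5.6 listing
print(f"{near_target(0x0017, bytes.fromhex('E91B00')):04X}")
```

The value printed is the offset at which the label Bye_Bye appears in that listing, so executing the jump amounts to loading that value into IP.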
4.8.3 Branches
A program branch is a point in a program at which program flow can continue along either of two
paths. The path actually taken at a branch is selected under program control based on the state
of some condition. In higher level languages program branches are usually represented as
IF/THEN constructs. In a statement such as
IF A==B THEN GOTO 100
IF A==B is the test clause and THEN GOTO 100 is the operational clause. In the assembly language
analogue of an IF/THEN statement, the role of the test clause is performed by one instruction
and the role of the operational clause is performed by another. The 8086/8088 mediates the
transfer of information concerning the result of the test from the first instruction to the second
via the flag register.
The flag register is a 16-bit register, six of whose bits are devoted to status flags and three of
whose bits are devoted to control flags. The remaining seven bits in the flag register are
undefined. Generally speaking, the CPU adjusts the status flags in the course of executing
arithmetic operations.
The status flags reflect the outcome of an operation and record information about that outcome
in a manner that renders the information accessible for use in the execution of subsequent
instructions. The control flags control the operation of the CPU in certain circumstances.
In general, most of the instructions that perform arithmetic calculations such as addition or
subtraction will adjust some subset of the status flags to reflect the outcome of their operation.
Two flags of particular interest to programmers are the Zero flag and the Carry flag. In general the
Zero flag will be set when an arithmetic operation produces a result of zero and cleared
when an arithmetic operation produces a nonzero result. The Carry flag will be set by an
operation that produces a carry out of (or a borrow into) the most significant bit of its result
and cleared by one that does not.
Comparison
The CMP (CoMPare) instruction is frequently used to compare two values and to adjust the
status flags accordingly. The general format for the CMP instruction is:
CMP destination, source
The CMP instruction accomplishes its task by subtracting the value represented by the contents
of the source operand from the value represented by the contents of the destination operand,
but it does not store the result of that subtraction or affect the contents of either operand. It
merely reflects the status of the result it obtains in the status flags. If the contents of the
destination operand are equal to the contents of the source operand, the Zero flag will be set;
otherwise the Zero flag will be cleared. If the contents of the destination operand are below the
contents of the source operand, the Carry flag will be set; if the contents of the destination
operand are above or equal to the contents of the source operand, the Carry flag will be cleared.
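The way CMP sets the Zero and Carry flags can be sketched in Python. The function name cmp16 is hypothetical, both operands are assumed to be unsigned 16-bit values in the range $0000 through $FFFF, and only these two flags are modeled.

```python
def cmp16(destination, source):
    """Return (ZF, CF) as CMP would set them for unsigned 16-bit operands."""
    result = (destination - source) & 0xFFFF   # computed, then discarded
    zero_flag = int(result == 0)
    carry_flag = int(destination < source)     # a borrow was needed
    return zero_flag, carry_flag


for ax, bx in [(5, 5), (3, 7), (7, 3)]:
    zf, cf = cmp16(ax, bx)
    print(f"CMP {ax},{bx}: ZF={zf} CF={cf}")
```

Equal operands give ZF=1, a destination below the source gives CF=1, and a destination above the source leaves both flags clear.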
Conditional jumps
A conditional jump instruction will test the state of some specified status flag or flags and direct
program flow accordingly. In assembly language a conditional jump resembles the second half
of an
IF……THEN GOTO……….
statement in a higher level language. In 8086/8088 assembly language, this pair of instructions:
CMP AX, BX
JZ Label
transfers program flow to Label if the contents of the AX and BX registers are equal. The general
format for a conditional jump is:
Jxxx label
where xxx describes the condition for which you are testing. The syntax of 8086/8088 assembly
language supports 31 conditional jump instructions. Fifteen of those instructions test a status
flag or a combination of status flags and jump to the address indicated by label if the prescribed
condition is true. Another 15 conditional jumps are the converse of the first 15; they force a
jump if the condition proves false. For example, the converse of JZ (Jump if Zero) is JNZ (Jump if
Not Zero). Among the 15 pairs are a few synonyms. For example, the assembler recognizes the
mnemonics JE (Jump if Equal) and JNE (Jump if Not Equal) as logically indistinguishable from JZ
and JNZ, respectively.
Six of the conditional jumps are specifically designed for branching based on the outcome of
arithmetic comparisons of unsigned numbers:
JB Jump if Below
JBE Jump if Below or Equal
JE Jump if Equal
JNE Jump if Not Equal
JAE Jump if Above or Equal
JA Jump if Above
These six conditional jumps test either the Zero flag or the Carry flag or both the Zero and Carry
flags. For example, the test performed by the JB instruction will prove true if the Carry flag is
set, and the test performed by the JBE instruction will prove true if either the Carry flag or the
Zero flag is set.
Four of the other conditional jumps are variations on those six instructions:
JNAE Jump if Not Above or Equal
JNA Jump if Not Above
JNB Jump if Not Below
JNBE Jump if Not Below or Equal
But each of these variations is really just a synonym for one of the original six. JNB, for example,
is equivalent to JAE.
The 31st conditional jump is a bit anomalous. Instead of testing the status flags, this instruction
tests the contents of the CX register for zero. The mnemonic for this instruction is JCXZ. It
forces a jump to the address of an indicated label if the contents of the CX register are zero.
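The flag tests behind the unsigned conditional jumps can be written out as a small table. The Python sketch below is an illustration only: the names UNSIGNED_JUMPS, SYNONYMS, jump_taken, and jcxz_taken are invented here, and the Zero and Carry flags are derived directly from an unsigned comparison of the two operands, as CMP would set them.

```python
# Condition tested by each unsigned conditional jump, as a function of ZF and CF.
UNSIGNED_JUMPS = {
    "JB":  lambda zf, cf: cf == 1,
    "JBE": lambda zf, cf: cf == 1 or zf == 1,
    "JE":  lambda zf, cf: zf == 1,
    "JNE": lambda zf, cf: zf == 0,
    "JAE": lambda zf, cf: cf == 0,
    "JA":  lambda zf, cf: cf == 0 and zf == 0,
}
SYNONYMS = {"JNAE": "JB", "JNA": "JBE", "JNB": "JAE", "JNBE": "JA",
            "JZ": "JE", "JNZ": "JNE"}


def jump_taken(mnemonic, destination, source):
    """Would `CMP destination, source` followed by this jump branch?"""
    zf = int(destination == source)
    cf = int(destination < source)
    name = SYNONYMS.get(mnemonic, mnemonic)
    return UNSIGNED_JUMPS[name](zf, cf)


def jcxz_taken(cx):
    """JCXZ ignores the flags and tests the CX register itself."""
    return cx == 0


print(jump_taken("JB", 3, 7), jump_taken("JNB", 3, 7), jcxz_taken(0))
```

Each synonym resolves to one of the six basic tests, which is why the assembler can treat the paired mnemonics as interchangeable.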
The following sequence of instructions implements an IF/THEN/ELSE structure that executes one
fragment of code if the contents of the AX and BX registers are equal to one another and another
fragment of code if they are not:
CMP AX, BX
JZ EQUAL
. ; If AX is not equal to BX
. ; Statements
JMP END_IF
EQUAL: . ; If AX is equal to BX
. ; Statements
END_IF:
>>>>>>>>>>>>>>>>>>>>>>>PROGRAM5.6<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
0041 0014 E83700 call say_output
0042 0017 E91B00 jmp bye_bye
0043
0044 ;say "there is no letter after
0045 ; z ." and exit to DOS
0046 001A E86F00 none_after_z: call CRLf
0047 001D BAXXXX mov dx, offset after_z_text
0048 0020 B409 mov ah,$09
0049 0022 CD21 int $21
0050 0024 E90E00 jmp bye_bye
0051
0052 ;read keyboard buffer and ignore
0053 ; result.
0054 0027 B408 extend_code: mov ah,$08
0055 0029 CD21 int $21
0056
0057 ;print error message
0058 002B E85E00 bad_keystroke: call CRLf
0059 002E BAXXXX mov dx, offset bad_char_text
0060 0031 B409 mov ah,$09
0061 0033 CD21 int $21
0062
0063 ;exit to DOS
0064 0035 E85400 Bye_Bye: call CRLf
0065 0038 B8004C mov ax,$4c00
0066 003B CD21 int $21
0067 main endp
0068 ;********************************************************
0069 get_input proc
0070 ;print input query.
0071 003D E84C00 call CRLf
0072 0040 BAXXXX mov dx, offset input_please
0073 0043 B409 mov ah, $09
0074 0045 CD21 int $21
0075
0076 ;read keyboard entry with echo
0077 0047 B401 mov ah,$01
0078 0049 CD21 int $21
0079
0080 ;save input value
0081 004B 8AD8 mov bl,al
0082
0083 004D C3 ret
0084 get_input endp
0085 ;*********************************************************
0086 say_output proc
0087 ;print first part of message
0088 004E E83B00 call CRLf
0089 0051 BAXXXX mov dx,offset message
0090 0054 B409 mov ah,$09
0091 0056 CD21 int $21
0092
0093 ;print contents of BL register.
0094 0058 8AD3 mov dl,bl
0095 005A B402 mov ah,$02
0096 005C CD21 int $21
0097
0098 ;print two spaces and period
0099 005E BAXXXX mov dx,offset sp_sp_period
0100 0061 B409 mov ah, $09
0101 0063 CD21 int $21
0102
0103 ;print CR/LF and start of next line
0104 0065 E82400 call CRLf
0105 0068 BAXXXX mov dx, offset line_two
0106 006B B409 mov ah, $09
0107 006D CD21 int $21
0108
0109 ;print contents of the Bl register
0110 006F 8AD3 mov dl,bl
0111 0071 B402 mov ah,$02
0112 0073 CD21 int $21
0113
0114 ;print two space ,"is",two more spaces
0115 0075 BAXXXX mov dx, offset is_text
0116 0078 B409 mov ah, $09
0117 007A CD21 int $21
0118
0119 ;print contents of the BL plus 1
0120 007C 8AD3 mov dl,bl
0121 007E FEC2 inc dl
0122 0080 B402 mov ah,$02
0123 0082 CD21 int $21
0124
0125 ;print two spaces and a period
0126 0084 BAXXXX mov dx,offset sp_sp_period
0127 0087 B409 mov ah,$09
0128 0089 CD21 int $21
0129
0130 008B C3 ret
0131 say_output endp
0132 ;****************************************************************
0133 CRLF proc
0134 ;generate CR/LF
0135 008C B402 mov ah,$02
0136 008E B20D mov dl,$0d
0137 0090 CD21 int $21
0138 0092 B402 mov ah,$02
0139 0094 B20A mov dl,$0a
0140 0096 CD21 int $21
0141 0098 C3 ret
0142 CRLF endp
0143 ;********************************************************
0144 code ends
0145 ;***************************************************
0146 stack segment stack $0400
0147 ;***************************************************
0148 data segment
0149 0000 input_please db 'type a letter, please. $'
0150 0019 message db 'the letter you typed was $'
0151 0034 sp_sp_period db ' .$'
0152 0038 line_two db 'The letter after $'
0153 004B is_text db ' is $'
0154 0053 bad_char_text db 'Invalid character, sorry.$'
0155 006D after_z_text db 'there is no letter after z$'
0156 data ends
0157 ;***************************************************
0158 end
4.8.4 Loops
A program loop can be implemented with either an explicit counter or an implicit counter.
Explicitly counted loops usually rely upon the fact that the DEC (Decrement) instruction not only
subtracts 1 from the contents of its operand but also sets the Zero flag if that subtraction
results in zero and clears the Zero flag if it does not. The skeleton of an explicitly counted loop
might look like this:
Loop_Label:
DEC CX
JNZ Loop_Label
The DEC CX instruction will subtract 1 from the contents of the CX register and adjust the Zero
flag accordingly. Program flow will continue to cycle through the loop until the contents of the
CX register are reduced to zero.
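The number of passes such a loop makes depends only on the starting value of CX. The Python sketch below replays the DEC CX / JNZ skeleton; the function name count_iterations is hypothetical, and CX is treated as a 16-bit register that wraps around.

```python
def count_iterations(cx):
    """Run the DEC CX / JNZ skeleton and count passes through the loop body."""
    passes = 0
    while True:
        passes += 1                 # loop body would execute here
        cx = (cx - 1) & 0xFFFF      # DEC CX, 16-bit wraparound
        zero_flag = (cx == 0)       # DEC sets ZF when the result is zero
        if zero_flag:               # JNZ falls through once ZF is set
            return passes


print(count_iterations(4))
print(count_iterations(0))
```

Because the test comes after the decrement, the body always runs at least once. Starting with CX equal to zero does not skip the loop: the first DEC wraps CX to $FFFF, and the loop runs 65536 times before CX reaches zero again.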
Example4.1
Write a program that displays the following output using loops, without using any data
allocation statements:
ABCD
ABCD
ABCD
ABCD
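SOLUTION:
List scr
Hex $
Code segment
Mov ax,data
Mov ds,ax
;CX counts the four lines of output
Mov cx,$0004
Loop_outer:
;each line starts again at 'A' ($41) and holds four letters
Mov bh,$41
Mov bl,$04
Loop_inner:
;display the letter in BH, then step to the next letter
Mov dl,bh
Mov ah,$02
Int $21
Inc bh
Dec bl
Jnz loop_inner
Call CRLF
Dec CX
jnz loop_outer
;exit to DOS
Mov ax,$4c00
Int $21
CRLF proc
;generate CR/LF
mov ah,$02
mov dl,$0d
int $21
mov ah,$02
mov dl,$0a
int $21
ret
CRLF endp
Code ends
Data segment
Data ends
End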
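The nested loops of this solution can be checked on paper or with a short model. The Python sketch below mirrors the register use of the solution under two stated assumptions: BH is reset to the ASCII code of 'A' at the top of each outer pass, and CX starts at 4. The function name example_4_1 is invented for the illustration.

```python
def example_4_1(lines=4, letters=4):
    """Mirror the nested DEC/JNZ loops: BH holds the letter, BL and CX count."""
    screen = []
    cx = lines
    while True:                         # loop_outer
        bh = ord("A")                   # each line starts again at 'A'
        bl = letters
        while True:                     # loop_inner
            screen.append(chr(bh))      # function $02 with DL = BH
            bh += 1                     # inc bh
            bl -= 1                     # dec bl
            if bl == 0:                 # jnz loop_inner falls through
                break
        screen.append("\r\n")           # call CRLF
        cx -= 1                         # dec cx
        if cx == 0:                     # jnz loop_outer falls through
            break
    return "".join(screen)


print(example_4_1(), end="")
```

Without the reset of BH, the second line would continue from 'E' instead of starting over at 'A', so the position of that instruction inside the outer loop matters.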
Summary
• The translators that take an entire program and translate it as a body into machine
language are called compilers.
• Translators that process programs one line at a time are called interpreters.
• Special purpose translators that are specifically designed to translate assembly language
programs into machine language are called assemblers.
• Assembling a program converts its source code into an OBJect file. An OBJect file
contains the machine language image of the source code of a program in skeletal form.
• There are three kinds of statements in the source code of an 8086/8088 assembly
language program: instruction statements, data allocation statements, and directives.
• A comment is a string of text that explains the program but is not part of the
program. A semicolon identifies all subsequent text in a statement as a comment.
• The HEX directive directs the assembler to treat tokens in the source file that begin with
a dollar sign as numeric constants in hexadecimal notation. A HEX directive affects
only the source code that follows it, so it is customary to place the HEX directive at the
beginning of the program. If the HEX directive had not been included in program5.2,
the assembler would have processed the tokens beginning with dollar signs as
identifiers instead of as numeric values in hexadecimal notation.
• A segment directive defines the logical segment to which subsequent instructions and
data allocation statements belong. It also gives a segment name to the base of that
segment.
• The first segment directive in the program introduces a logical segment named CODE. By
default the linker assumes that the first segment in a program is its code segment.
• The second segment directive in the program defines the program’s stack segment.
• The third and last segment in the program is a data segment. It contains a single
data allocation statement.
• The general format for a data allocation statement is :
{ Varname } data‐definition‐type { init {{,init}} }
• The MOV instruction copies the contents of the source operand into the destination
operand
MOV destination, source
• The offset of Message is the distance in bytes from the beginning of the data segment to
the first byte in Message.
Mov dx, Offset message
• DOS function $01: keyboard input with Echo.
• DOS function $02: character output.
• DOS function $08: keyboard input without echo.
• DOS function $09: string output.
• The flag register is a 16‐bit register, six of whose bits are devoted to status flags and
three of its bits are devoted to control flags. The remaining seven bits in the flag register
are undefined. Generally speaking, the CPU adjusts the status flags in the course of
executing arithmetic operations.
Exercise
13. Write a program that displays the following ten lines of output:
The point of this exercise is to give you practice with program loops, so try to write this
program without using any data allocation statements.
14. Write a program that displays the following ten lines of output:
The point of this exercise is to give you practice with program loops, so try to write this
program without using any data allocation statements.