Unix and Shell Programming (@dcoder)
Typeset at: Kalyani Computer Services, New Delhi. 2355/010/10
Dedicated to
Lord Ganesha
Chapter 1
INTRODUCTION TO UNIX
1.1 DEVELOPMENT OF UNIX
UNIX began in 1970 as UNICS (Uniplexed Information and Computing System) and was rewritten
in C in 1973.
1.1.1 Bill Joy (University of California, Berkeley) is the Student Who Wrote the Vi Editor
Microsoft was the first to run UNIX on a PC with 640 KB of memory. They called their
product XENIX, which was based on an earlier edition of AT&T UNIX but with some utilities
borrowed from BSD. XENIX was later sold off to SCO (The Santa Cruz Operation), which today
markets the most popular commercial brand of UNIX for the desktop: SCO UNIX. It now offers two
major flavors, SCO OpenServer Release 5 and SCO UnixWare 7; the latter is SVR4-compliant.
[Figure: the layers of a UNIX system. The hardware (H/W) is at the centre, surrounded by the
Kernel and then the Shell; around these sit Unix commands, compilers, databases, packages,
Internet tools, other applications and system S/W, with the users outermost.]
Fig. 1.1
User
(i) Ordinary user → Works only within their own working directory.
(ii) Super user → Has command of the entire system and can, for example, add and remove
users.
1.2.1 The Reasons for Popularity and Success of the UNIX System
(i) The system is written in a high-level language, making it easy to read, understand,
change and move to other machines.
(ii) It has a simple user interface that has the power to provide the services that users want.
(iii) It provides primitives that permit complex programs to be built from simpler
programs.
(iv) It uses a hierarchical file system that allows easy maintenance and efficient
implementation.
(v) It uses a consistent format for files, the byte stream, making application programs easier
to write.
(vi) It provides a simple, consistent interface to peripheral devices.
(vii) It is a multiuser, multiprocess system. Each user can execute several processes
simultaneously.
(viii) It hides the m/c architecture from the user, making it easier to write programs that run
on different h/w implementations.
1. Multiuser System
In a multiuser system, the same computer resources (hard disk, memory etc.) are accessible
to many users. The users do not flock together at the same computer but are given different
terminals to operate from. All terminals are connected to the main computer, whose resources are
available to all users. The following figure shows a typical UNIX setup.
[Figure: a host M/C at the centre, connected to four terminals.]
Fig. 1.2
The host M/C is also known as the server or console. The number of terminals that can be
connected to the host M/C depends on the number of ports present in its controller card. Several
types of terminal can be attached to the host M/C.
(i) Dumb Terminal: These terminals consist of a keyboard and a display unit with no
memory or disk of their own. They can never act as independent machines.
(ii) Terminal Emulation: A PC has its own microprocessor, memory and disk drives. By
attaching it to the host M/C through a cable and running a S/W on the PC, we can
make it work as if it were a dumb terminal. At such times its memory and disk are
not in use and the PC cannot carry out any processing of its own. The S/W that makes the
PC work like a dumb terminal is called terminal emulation S/W. VTERM and XTALK
are two popular examples.
(iii) Dial-In Terminal: These terminals use telephone lines to connect with the host M/C. To
communicate over telephone lines we attach a modem.
[Figure: a dial-in terminal connected to the host M/C through a modem.]
Fig. 1.3
2. Multitasking Capabilities
UNIX is capable of multitasking: a single user can run more than one job at the same time.
In UNIX this is done by running one job normally and the other jobs in the background. This is
managed by dividing the CPU time b/w all processes.
“The multitasking provided by MS-DOS is known as serial multitasking”.
3. Communication
Communication may take place within the n/w of a single main computer or between two or
more such computer n/ws. Users can easily exchange mail, data and programs through such
n/ws. Distance poses no barrier to passing information or messages to and fro.
4. Security
UNIX has three inherent provisions for protecting data.
(i) Passwords and login names are assigned to individual users, ensuring that not just
anybody can gain access to your work.
(ii) At the file level, the read, write and execute permissions on each file decide who
can access that file.
(iii) File encryption.
5. Portability
It can be ported to almost any computer system with only the bare minimum of adaptations
to suit the given computer architecture.
[Figure: a UNIX file system tree; the node labels are only partly legible in the source.]
Fig. 1.4
In UNIX everything is treated as a file. It is still necessary to divide these files into three
categories:
(i) Ordinary files: Contain only data. This includes all data, source programs, object
and executable code, all UNIX commands, as well as any files created by the user. The
most common type of ordinary file is the text file.
(ii) Directory files: A directory contains no external data but keeps some details of the
files and sub-directories that it contains. A directory file contains two fields for each
file: the name of the file and its identification number.
(iii) Device files: Physical devices are treated as files. This definition includes printers,
tapes, floppy drives, CD-ROMs, hard disks and terminals.
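The three categories can be told apart from a program by looking at the file mode. The following is a minimal Python sketch, assuming a UNIX-like system; the function name `classify` and the labels it returns are illustrative choices, not names used by UNIX itself.

```python
import os
import stat


def classify(path):
    """Return 'ordinary', 'directory', 'device' or 'other' for a path."""
    mode = os.stat(path).st_mode
    if stat.S_ISREG(mode):
        return "ordinary"          # plain data: text, programs, commands
    if stat.S_ISDIR(mode):
        return "directory"         # names and identification numbers of entries
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "device"            # character or block device file
    return "other"                 # e.g. pipes and sockets, not covered in the text
```

On a typical system `classify("/")` reports a directory and `classify("/dev/null")` reports a device.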
2.1.1 Differences
(i) Processes in user mode can access their own instructions and data but not Kernel
instructions and data. Processes in Kernel mode, however, can access both Kernel and user
addresses. For example, the virtual address space of a process may be divided b/w
addresses that are accessible only in Kernel mode and addresses that are accessible in
either mode.
(ii) Some m/c instructions are privileged and result in an error when executed in user
mode. For example, a m/c may contain an instruction that manipulates the processor
status register. Processes executing in user mode should not have this capability.
Although the system executes in one of two modes, the Kernel runs on behalf of a user
process. The Kernel is not a separate set of processes that run in parallel to user processes; it is
part of each user process.
[Figure: the CPU, the interrupt controller and the disk controller, with arrows numbered 1 to 4
for the steps described below.]
Fig. 2.1
Steps in starting an I/O device and getting an interrupt.
In step 1, the driver tells the controller what to do by writing into its device registers. The
controller then starts the device. When the controller has finished reading or writing the number
of bytes it has been told to transfer, it signals the interrupt controller chip using certain bus lines
in step 2.
If the interrupt controller is prepared to accept the interrupt, it asserts a pin on the CPU chip
to inform it, in step 3.
In step 4, the interrupt controller puts the number of the device on the bus so the CPU can
read it and know which device has just finished.
Interrupt processing involves taking the interrupt, running the interrupt handler, and
returning to the user program.
[Figure: between the current instruction and the next instruction, control passes through
(1) interrupt, (2) dispatch to handler and (3) return.]
Exception
An exception condition refers to unexpected events caused by a process, such as addressing
illegal memory, executing privileged instructions, dividing by zero and so on. Put simply,
exceptions are run-time errors.
Difference
[Table: Exception versus Interrupt; the entries did not survive in the source.]
Interrupt levels, from higher to lower priority:
M/C Errors
Clock
Disk
N/W Devices
S/W Interrupts
Fig. 2.2
Memory Management
The Kernel permanently resides in main memory as does the currently executing process.
[Figure: block diagram of the Kernel. User level: user programs, libraries, trap. Kernel level:
file subsystem, buffer cache, character and block device drivers, process control subsystem
(inter-process communication, scheduler, memory management), hardware control. H/W level:
hardware.]
Trap
A trap (or exception) is a software-generated interrupt caused either by an error (for example,
division by zero or invalid memory access) or by a specific request from a user program that an
O.S. service be performed.
File Subsystem
The file subsystem manages files: allocating file space, administering free space, controlling
access to files and retrieving data for users. Processes interact with the file subsystem via a
specific set of system calls such as open, read, write, chown, stat, chmod etc.
2.3.2 Buffer
The file subsystem accesses file data using a buffering mechanism that regulates data flow
b/w the Kernel and secondary storage devices. The buffering mechanism interacts with block
I/O device drivers to initiate data transfer to and from the Kernel. Block I/O devices are random
access storage devices to the rest of the system. The file subsystem also interacts directly with
raw I/O devices (character devices that are not block devices) without the intervention of the
buffering mechanism.
Chapter 3
FILE SYSTEM
After the disk has been partitioned it is still not ready for use. A file system has to be created
in each partition. There are usually multiple file systems in one m/c, each one having its own
directory tree headed by root.
Every file system is organized as a sequence of blocks of 512 bytes each (1024 in Linux) and
has these four components:
3. Inodes: Since the blocks of a file are scattered throughout the disk, it is obvious that
the addresses of all its blocks have to be stored and not just the starting and ending
ones. These addresses are available in the form of a linked list in the inode, a table
maintained individually for each file. All inodes are stored in inode blocks, distinctly
separate from the data blocks, and are arranged contiguously in a user-inaccessible
area of the file system.
Inode Entry
Owner
Group
File Type
Permission
Access Time
Modification Time
Size
Block address table (each entry points to a data block such as 1024, 7034 or 8096):
Entry   Disk block
0       1024
1       7034
2       1392
3       2497
4       7877
5       6745
6       7045
7       8004
8       7056
9       8096
10      7096
11      3045
12      4506
13      (empty)
3.3 PROCESS
A process is a program in execution and consists of a pattern of bytes that the CPU interprets
as m/c instructions (called “text”), “data” and “stack”. Many processes appear to execute
simultaneously as the Kernel schedules them for execution, and several processes may be
instances of one program. A process executes by following a strict sequence of instructions that
is self-contained and does not jump to that of another process. It reads and writes its own data
and stack sections, but it cannot read the stack and data sections of other processes. Processes
communicate with other processes via system calls.
A process on a UNIX system is the entity that is created by the “fork” system call. Every
process except process 0 (sched) is created when another process invokes the “fork” system call.
The process that invokes the “fork” system call is called the parent process and the newly created
process is called the child process. The Kernel identifies each process by its process number,
called the PID.
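The parent and child relationship can be observed from a small program. The following is a minimal Python sketch, assuming a UNIX-like system where `os.fork` is available; the function name `run_child` and the exit status 7 are arbitrary illustrative choices.

```python
import os


def run_child(exit_code=7):
    """Fork once; in the parent, return (parent_pid, child_pid, child_status)."""
    parent_pid = os.getpid()
    pid = os.fork()
    if pid == 0:
        # Child process: fork returned 0. Leave at once with a known status.
        os._exit(exit_code)
    # Parent process: fork returned the PID the Kernel gave the child.
    _, status = os.waitpid(pid, 0)
    return parent_pid, pid, os.WEXITSTATUS(status)
```

The call returns only in the parent; the child ends inside the `if` branch, so the two processes follow different paths through the same program text.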
A user compiles the source code of a program to create an executable file, which consists of
several parts:
(i) A set of “headers” that describe the attributes of the file.
(ii) The program text.
(iii) A m/c language representation of data that has initial values when the program starts
execution, and an indication of how much space the Kernel should allocate for
uninitialized data, called bss (the Kernel initializes it to 0 at run time).
(iv) Other sections, such as symbol table information.
The name bss (block started by symbol) comes from an assembly pseudo-operator on the
IBM 7090 m/c.
The Kernel loads an executable file into memory during an exec system call, and the loaded
process consists of at least three parts called regions: text, data and stack. The text and data
regions correspond to the text and data-bss sections of the executable file, but the stack region is
automatically created and its size is dynamically adjusted by the Kernel at run time.
The stack consists of logical stack frames that are pushed when calling a function and
popped when returning; a special register called the stack pointer indicates the current stack
depth.
We know that in a UNIX system a process can execute in either (i) user mode or (ii) Kernel
mode. It uses a separate stack for each mode.
The user stack contains the arguments, local variables and other data for functions executing
in user mode.
The Kernel stack contains the stack frames for functions executing in Kernel mode. The
function and data entries on the Kernel stack refer to functions and data in the Kernel, not the
user program, but its construction is the same as that of the user stack.
The Kernel stack of a process is null when the process executes in user mode.
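[Figure: the user stack and the Kernel stack of a process, with stack frames linked by frame
addresses.]
[Figure: data structures for processes: process table, U area, per process region table, region
table and main memory.]
Fig. 3.4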
U-Area (User-area)
The U-area contains information describing the process that needs to be accessible only when
the process is executing. The important fields are:
(i) A pointer to the process table slot of the currently executing process.
(ii) Parameters of the current system call, return values and error codes.
(iii) File descriptors for all open files.
(iv) Current directory and current root.
(v) Process and file size limits.
The Kernel can directly access fields of the u-area of the executing process but not of the
u-area of other processes.
Process Table
The process table is resident all the time and contains information needed for all processes,
even those that are not currently present in memory. The user structure is swapped or paged out
when its associated process is not in memory. The process table entry and u-area contain control
and status information about the process. The fields in the process table are:
(i) A state field.
(ii) Identifiers indicating the user who owns the process (UID).
(iii) An event descriptor, set when a process is suspended.
(iv) Memory image: pointers to the text, data and stack segments or, if paging is used, to
their page tables.
Region Table
A region (called a segment in some early versions and Berkeley releases) is a contiguous area
of virtual address space that can be treated as a distinct object to be shared or protected. A region
is a data area which may be accessed in a mutually exclusive manner. The actual page
descriptors are stored in the region page table.
A process in UNIX terminology consists of three separate sections: text or code, data and
stack. Each occupies a contiguous area of virtual memory, but separate sections belonging to a
single task may be placed in non-contiguous areas of virtual memory. For example, when a UNIX
task creates a child process via the fork() system call, the two share a single copy of the text
region but the child obtains a fresh private copy of the parent's data region.
A region table entry describes the attributes of the region, such as whether it contains
text or data, whether it is shared or private, and where the “data” of the region is located in
memory.
When a process invokes the exec system call, the Kernel allocates regions for its text, data and
stack after freeing the old regions the process had been using. When a process invokes the fork()
system call, the Kernel duplicates the address space of the old process, allowing the processes to
share regions when possible and making a physical copy otherwise. When a process invokes the
exit() system call, the Kernel frees the regions the process had used.
[Figure: processes A and B, each with a user (U) part and a Kernel (K) part.]
In this figure, the Kernel does a context switch when it changes context from process A to
process B.
(iii) The process is not executing, but it is ready to run as soon as the scheduler chooses it.
Many processes may be in this state, and the scheduling algorithm determines which one
will execute next.
(iv) The process is sleeping. A process puts itself to sleep when it can no longer continue
executing, such as when it is waiting for I/O to complete.
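[Figure: process states and transitions: (1) user running and (2) Kernel running, joined by
"sys call or interrupt" and "return"; Kernel running loops on "interrupt" and "interrupt return";
further transitions are labelled "sleep" and "schedule process".]
[Figure: a doubly linked list, with pointers bp and bp1, left in an incorrect state.]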
To solve this problem, the system could prevent all interrupts while executing in Kernel
mode, but that would delay servicing of the interrupts. Instead, the Kernel raises the processor
execution level to prevent interrupts when entering critical regions of code. A section of code is
critical if execution of arbitrary interrupt handlers could result in consistency problems.
The Kernel protects its consistency by allowing a context switch only when a process puts
itself to sleep and by preventing one process from changing the state of another process. It also
raises the processor execution level around critical regions of code to prevent interrupts that could
otherwise cause inconsistency.
[Figure: timeline of several processes and one buffer. Each process that finds the buffer locked
sleeps. When the buffer is unlocked, all sleeping processes are woken up and become ready to
run. The one that runs first locks the buffer; the others run, find the buffer locked and sleep
again until the next wakeup, with a context switch between each step.]
For example, a process executing in Kernel mode must sometimes lock a data structure in
case it goes to sleep at a later stage. Processes attempting to manipulate the locked structure must
check the lock and sleep if another process owns the lock. The Kernel implements such a lock in
the following manner:
while (condition is true)
    sleep (event: the condition becomes false);
set condition true;
It unlocks the lock and awakens all processes sleeping on the lock in the following manner:
set condition false;
wakeup (event: the condition is false);
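The lock and unlock steps can be imitated in user space. The following is a minimal Python sketch that uses threads and `threading.Condition` as stand-ins for Kernel processes and the Kernel's sleep and wakeup; the names `SleepLock` and `demo` are hypothetical and the sketch is not how the Kernel itself is written.

```python
import threading


class SleepLock:
    """A lock built from a boolean 'condition' plus sleep and wakeup."""

    def __init__(self):
        self._cond = threading.Condition()
        self.locked = False               # the "condition" of the text

    def lock(self):
        with self._cond:
            while self.locked:            # while (condition is true)
                self._cond.wait()         #     sleep until it becomes false
            self.locked = True            # set condition true

    def unlock(self):
        with self._cond:
            self.locked = False           # set condition false
            self._cond.notify_all()       # wake up every sleeper


def demo(workers=4, rounds=200):
    """Let several threads increment a shared counter under the lock."""
    lock = SleepLock()
    counter = [0]

    def work():
        for _ in range(rounds):
            lock.lock()
            counter[0] += 1
            lock.unlock()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Every woken thread rechecks the condition in the `while` loop, which is why only one of them gets the lock and the rest go back to sleep.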
Multiple processes sleeping on a lock.
community. They are distinguished from general user processes only in the rights and privileges
they are allowed. The Kernel does not recognize a separate class of administrative processes.
The following figure shows some of the common fields present in a UNIX system.
the buffer. A disk block can never map into more than one buffer at a time. If two buffers were
to contain data for one disk block, the Kernel would not know which buffer contained the current
data and could write incorrect data back to disk.
The buffer header contains:
(i) Device number: It specifies the file system. It is the logical file system number, not a
physical device (disk) unit number.
(ii) Block number: It specifies the block number of the data on disk and uniquely identifies
the buffer.
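[Figure: buffer header fields: device number, block number, status, pointer to data area,
pointers to the neighbouring buffers on the hash queue, and pointers to the neighbouring buffers
on the free list.]
[Figure: the free list of buffers with forward and back pointers: free list head, buf 1, buf 2, ...,
buf n before the first buffer is removed, and free list head, buf 2, ..., buf n after.]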
The Kernel caches data in the buffer pool according to an LRU algorithm: after it allocates a
buffer to a disk block, it cannot use the buffer for another block until all other buffers have been
used more recently. The Kernel maintains a free list of buffers that preserves the least recently
used order. The free list is a doubly linked circular list of buffers with a dummy buffer header
that marks its beginning and end. When the Kernel returns a buffer to the buffer pool, it usually
attaches the buffer to the tail of the free list, occasionally to the head of the free list (for error
cases), but never to the middle.
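The free-list discipline can be sketched in a few lines. The following Python sketch uses `collections.deque` as a stand-in for the doubly linked circular list; the class name `FreeList` and its method names are hypothetical, chosen only for illustration.

```python
from collections import deque


class FreeList:
    """Free list of buffers; the left end is the head (least recently used)."""

    def __init__(self, buffers):
        self._q = deque(buffers)

    def allocate(self):
        # Buffers are always taken from the head of the list.
        return self._q.popleft()

    def release(self, buf, error=False):
        if error:
            self._q.appendleft(buf)   # error case: back to the head
        else:
            self._q.append(buf)       # usual case: attach at the tail

    def order(self):
        return list(self._q)
```

A buffer released normally is the last one to be reused, so its contents stay cached until every other free buffer has been handed out.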
When the Kernel accesses a disk block, it searches for a buffer with the appropriate
device-block number combination. Rather than search the entire buffer pool, it organizes the
buffers into separate queues, hashed as a function of the device and block number. The Kernel
links the buffers on a hash queue into a circular, doubly linked list. The number of buffers on a
hash queue varies during the lifetime of the system. The Kernel must use a hashing function that
distributes the buffers uniformly across the set of hash queues, yet the hash function must be
simple so that performance does not suffer. System administrators configure the number of hash
queues when generating the O.S.
Hash queue header     Buffers (block numbers)
Block no 0 mod 4      28   4   64
Block no 1 mod 4      17   5   97
Block no 2 mod 4      98   50  10
Block no 3 mod 4      3    35  99
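A hash function of this kind is easy to try out. The following Python sketch assumes four hash queues and hashes on the block number alone, ignoring the device number for simplicity; the names `hash_queue`, `build_queues` and `find` are illustrative.

```python
NUM_QUEUES = 4  # assumed number of hash queues


def hash_queue(block_no, num_queues=NUM_QUEUES):
    """Pick the hash queue for a block: block number modulo queue count."""
    return block_no % num_queues


def build_queues(block_numbers, num_queues=NUM_QUEUES):
    """Distribute buffers (identified by block number) over the queues."""
    queues = {i: [] for i in range(num_queues)}
    for blk in block_numbers:
        queues[hash_queue(blk, num_queues)].append(blk)
    return queues


def find(queues, block_no):
    """Search only the one queue on which the block could be."""
    return block_no in queues[hash_queue(block_no, len(queues))]
```

For example, block numbers 28, 4 and 64 all land on queue 0, while 17, 5 and 97 land on queue 1, so a search for block 5 never touches the other queues.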
[Figures: the hash queues and the free list header before and after a buffer is allocated; the
individual block numbers are not reliably recoverable from the source.]
The Kernel may read data from the disk to the buffer and manipulate it, or write data to the
buffer and possibly to disk. The Kernel leaves the buffer marked busy. When the Kernel finishes
using the buffer, it releases the buffer according to algorithm “brelse”. It wakes up processes that
had fallen asleep because the buffer was busy and processes that had fallen asleep because no
buffers remained on the free list. The Kernel places the buffer at the end of the free list, unless an
I/O error occurred or unless it marked the buffer “old”. The buffer is now free for another process
to claim.
[Figures: the hash queues and the free list header before and after a buffer is removed from the
free list and reassigned; the individual block numbers are not reliably recoverable from the
source.]
3. The Kernel cannot find the block on the hash queue and in attempting to allocate a buffer
from free list, finds a buffer on the free list that has been marked “delayed write”. The
Kernel must write the “delayed write” buffer to disk and allocate another buffer.
[Figure: hash queue headers (H.Q.H.) and free list, with the buffers at the front of the free list
marked "delay".]
Fig. 4.8. Search for Block 20. Delayed write blocks on free list.
[Figure: the same hash queues while the delayed-write buffers, now marked "writing", are
written to disk and another buffer is allocated.]
4. The Kernel cannot find the block on the hash queue and the free list of buffers is empty.
[Figure: search for a block when the free list is empty.]
[Figure: search for a block whose buffer is on its hash queue but marked busy.]
[Figure: timeline of a race for a locked buffer. One process allocates a buffer to block b, locks
the buffer, initiates I/O and sleeps until the I/O is done. A second process finds block b on the
hash queue, sees the buffer locked and sleeps. A third process sleeps waiting for any free buffer.
When the I/O is done, the sleepers are woken up.]
Chapter 5
READING AND WRITING
DISK BLOCKS
To read a disk block, a process uses the get-block algorithm to search for it in the buffer
cache. If it is in the cache, the Kernel can return it immediately without physically reading the
block from disk. If it is not in the cache, the Kernel calls the disk driver to schedule a read request
and goes to sleep awaiting the event that the I/O completes. The disk driver notifies the disk
controller h/w that it wants to read data, and the disk controller later transmits the data to the
buffer. Finally, the disk controller interrupts the processor when the I/O is complete and the disk
interrupt handler awakens the sleeping process; the contents of the disk block are now in the
buffer.
[Figure: a central computer connected through channels to disk controllers.]
In this scheme the I/O channels are not connected directly to the disk units they service. This
fact may lead the designer to investigate a bottleneck further before deciding to incorporate disk
scheduling. If a controller becomes saturated, the designer may wish to reduce the number of
disks on that controller. Thus h/w reconfiguration may be needed to eliminate certain bottlenecks.
block num = ((inode number - 1) / no. of inodes per block) + start block of inode list
Ex. start block = 2, inode no. = 8, no. of inodes per block = 8
block num = ((8 - 1) / 8) + 2 = (7 / 8) + 2 = 0 + 2 = 2   (integer division)
When the Kernel knows the device and disk block number, it reads the block and then uses the
following formula to compute the byte offset of the inode in the block:
byte offset = ((inode no. - 1) modulo (no. of inodes per block)) * size of disk inode
If each disk inode occupies 64 bytes (size of disk inode) and there are 8 inodes per disk block,
then inode no. 8 starts at byte offset 448 in the disk block:
byte offset = ((8 - 1) % 8) * 64 = (7 % 8) * 64 = 7 * 64 = 448
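Both formulas fit in one small function. The following Python sketch simply restates them; the function name `inode_location` and its parameter names are illustrative.

```python
def inode_location(inode_no, start_block, inodes_per_block, inode_size):
    """Return (disk block number, byte offset in that block) of a disk inode."""
    # Integer division picks the block within the inode list.
    block = (inode_no - 1) // inodes_per_block + start_block
    # The remainder picks the slot within the block.
    offset = ((inode_no - 1) % inodes_per_block) * inode_size
    return block, offset
```

With the values used in the text, `inode_location(8, 2, 8, 64)` gives block 2 and byte offset 448; inode 9 would be the first inode of block 3.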
The Kernel removes the in-core inode from the free list, places it on the correct hash queue
and sets its in-core reference count to 1. It copies the file type, owner fields, permission settings,
link count, file size and the table of contents from the disk inode to the in-core inode and returns
a locked inode.
The Kernel manipulates the inode lock and reference count independently. The lock is set
during execution of a system call to prevent other processes from accessing the inode while it is
in use. The Kernel releases the lock at the conclusion of the system call; an inode is never locked
across system calls. The Kernel increments the reference count for every active reference to a file.
It decrements the reference count only when the reference becomes inactive. The reference count
thus remains set across multiple system calls. The lock is free between system calls to allow
processes to share simultaneous access to a file; the reference count remains set between system
calls to prevent the Kernel from reallocating an active in-core inode. Thus the Kernel can lock
and unlock an allocated inode independent of the value of the reference count.
Processes have control over the allocation of inodes at user level via execution of the open()
and close() system calls, and consequently the Kernel cannot guarantee when an inode will
become available. Therefore a process that goes to sleep waiting for a free inode to become
available may never wake up. Rather than leave such a process “hanging”, the Kernel fails the
system call.
5.5.1 Buffer
However, processes do not have such control over buffers. Because a process cannot keep a
buffer locked across system calls, the Kernel can guarantee that a buffer will become free soon,
and a process therefore sleeps until one is available.
If the inode is in the cache, process A finds it on its hash queue and checks whether the
inode is currently locked by another process B. If it is, process A sleeps, setting a flag in the
in-core inode to indicate that it is waiting for the inode to become free. When process B unlocks
the inode, it awakens all processes waiting for the inode to become free. When process A is
finally able to use the inode, it locks the inode so that other processes cannot allocate it.
output : none
{
    lock inode if not already locked;
    decrement inode reference count;
    if (reference count == 0)
    {
        if (inode link count == 0)
        {
            free disk blocks for file;
            set file type to 0;
            free inode;
        }
        if (file accessed or inode changed or file changed)
            update disk inode;
        put inode on free list;
    }
    release inode lock;
}
[Figure: block addresses 40, 50, 60 and 70 allocated to files, before and after a further block at
address 81 is allocated.]
file blocks in the inode is difficult to manage. If a logical block contains 1K bytes, then a file
consisting of 10K bytes would require an index of 10 block numbers. Either the size of the inode
would vary according to the size of the file, or a relatively low limit would have to be placed on
the size of a file.
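Inode table of contents:
Entry   Block number
0       4096
1       228
2       45423
3       0
4       111
5       0
6       126
7       354
8       367     (data block)
9       485
10      925     (single indirect)
11      9156    (double indirect)
12      285
Entry 0 of double indirect block 9156 points to single indirect block 331.
Entry 75 of single indirect block 331 points to data block 3333.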
Processes access data in a file by byte offset. They work in terms of byte counts and view a
file as a stream of bytes starting at byte address 0 and going up to the size of the file. The Kernel
converts the user view of bytes into a view of blocks. The file starts at logical block 0 and
continues to a logical block number corresponding to the file size. The Kernel accesses the inode
and converts the logical file block into the appropriate disk block.
Consider the block layout in the figure and assume that a disk block contains 1024
bytes. If a process wants to access byte offset 9000, the Kernel finds that the byte is in direct block
8 in the file (counting from 0).
The first 8 blocks hold 8 * 1024 = 8192 bytes and the first 9 blocks hold 9216 bytes, so byte
9000 is located in the 9th block (block number 8).
9000 - 8192 = 808
It then accesses block number 367. The 808th byte in that block is byte 9000 in the file.
If a process wants to access byte offset 350,000 in a file, it must access a double indirect block.
The 10 direct blocks cover 10K bytes and the single indirect block covers a further 256K bytes:
256K + 10K = 262,144 + 10,240 = 272,384 < 350,000
so the byte lies beyond the reach of the single indirect block. The double indirect block extends
the reach far beyond 350,000, and the position of the byte within the double indirect range is
350,000 - 272,384 = 77,616
Each single indirect block reached through the double indirect block covers 256K = 262,144
bytes, so byte number 77,616 is in the first of them (number 0).
Within that single indirect block:
77,616 / 1024 = 75.7
so the byte is in the 76th direct block, that is block number 75 counted from 0, which is disk
block number 3333.
Bytes left to the end of that block: 1024 * 76 - 77,616 = 208
So the byte offset within block 3333 is 1024 - 208 = 816.
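The two worked examples can be checked mechanically. The following Python sketch assumes the layout used in the text: 1024-byte blocks, 10 direct entries, and 256 block numbers per indirect block. The function name `locate` and the shape of its result are illustrative, and the sketch computes indices only; it does not read any disk blocks.

```python
BLOCK_SIZE = 1024     # bytes per block, as in the text
DIRECT = 10           # direct entries in the inode
PER_INDIRECT = 256    # assumed block numbers held by one indirect block


def locate(offset):
    """Map a byte offset to (level, list of indices, byte offset in block)."""
    logical, byte_in_block = divmod(offset, BLOCK_SIZE)
    if logical < DIRECT:
        return "direct", [logical], byte_in_block
    logical -= DIRECT
    if logical < PER_INDIRECT:
        return "single", [logical], byte_in_block
    logical -= PER_INDIRECT
    if logical < PER_INDIRECT ** 2:
        # index in the double indirect block, then in the single indirect block
        return "double", list(divmod(logical, PER_INDIRECT)), byte_in_block
    logical -= PER_INDIRECT ** 2
    outer, rest = divmod(logical, PER_INDIRECT ** 2)
    return "triple", [outer, *divmod(rest, PER_INDIRECT)], byte_in_block
```

`locate(9000)` yields direct entry 8 at byte 808, and `locate(350000)` yields the double indirect level with indices 0 and 75 at byte 816, matching the arithmetic above.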
Several block entries in the inode are 0, meaning that the logical block entries contain no data.
This happens if no process ever wrote data into the file at any byte offsets corresponding to those
blocks, and hence the block numbers remain at their initial value 0. No disk space is wasted for
such blocks.
/* Block map of logical file byte offset to file system block */
algorithm : bmap
input : (1) inode
        (2) byte offset
output : (1) block number in file system
         (2) byte offset into block
         (3) bytes of I/O in block
         (4) read ahead block number
{
    calculate logical block number in file from byte offset;
    calculate start byte in block for I/O;
    calculate number of bytes to copy to user;
    check if read-ahead applicable, mark inode;
    determine level of indirection;
    while (not at necessary level of indirection)
    {
        calculate index into inode or indirect block
            from logical block number in file;
        get disk block number from inode or indirect block;
        release buffer from previous disk read, if any;
        if (no more levels of indirection)
            return (block number);
        read indirect disk block;
        adjust logical block number in file according to level of indirection;
    }
}
5.8 DIRECTORIES
Directories are the files that give the file system its hierarchical structure; they play an
important role in conversion of a file name to an inode number. A directory is a file whose data is a
sequence of entries each consisting of an inode number and the name of a file contained in the directory.
A path name is a null-terminated character string divided into separate components by the
“/” character. Each component except the last must be the name of a directory, but the last
component may be a non-directory file. UNIX System V restricts component names to a maximum
of 14 characters; with a two-byte entry for the inode number, the size of a directory entry is
16 bytes.
The Kernel stores data for a directory just as it stores data for an ordinary file, using the inode
structure and levels of direct and indirect blocks.
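The 16-byte entry layout can be decoded with the `struct` module. The following Python sketch assumes little-endian byte order for the 2-byte inode number and assumes that an inode number of 0 marks an unused slot; both are assumptions made for illustration, as are the names `pack_entry` and `parse_directory`.

```python
import struct

# 2-byte inode number followed by a 14-byte name: 16 bytes in all.
# "<" (little-endian, no padding) is an assumption about the byte order.
ENTRY = struct.Struct("<H14s")


def pack_entry(inode_no, name):
    """Build one directory entry; short names are padded with NUL bytes."""
    raw = name.encode("ascii")
    if len(raw) > 14:
        raise ValueError("component name longer than 14 characters")
    return ENTRY.pack(inode_no, raw)


def parse_directory(data):
    """Return the (inode number, name) pairs stored in directory data."""
    entries = []
    for inode_no, raw in ENTRY.iter_unpack(data):
        if inode_no == 0:          # assumed marker for an unused slot
            continue
        entries.append((inode_no, raw.rstrip(b"\0").decode("ascii")))
    return entries
```

Converting one path component to an inode number then amounts to scanning these pairs for a matching name.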
Chapter 6
INODE ASSIGNMENT TO A NEW FILE
The file system contains a linear list of inodes. An inode is free if its type field is zero. When
a process needs a new inode, the Kernel could theoretically search the inode list for a free inode.
However, such a search would be expensive, requiring at least one read operation for every inode.
To improve performance, the file system super block contains an array to cache the numbers of
free inodes in the file system.
To assign a new inode, the Kernel first verifies that no other process has locked access
to the super block free inode list. If the list of inode numbers in the super block is not empty, the
Kernel assigns the next inode number, allocates a free in-core inode for the newly assigned disk
inode, copies the disk inode to the in-core copy, initializes the fields in the inode and returns the
locked inode. It updates the disk inode to indicate that the inode is now in use. A non-zero file
type field indicates that the disk inode is assigned.
Allocate inode (Assigning New Inodes)
algorithm ialloc
input : file system
output : locked inode
{
  while (not done)
  {
    if (super block locked)
    { sleep (event : super block becomes free);
      continue;
    }
    if (inode list in super block is empty)
    { lock super block;
      get remembered inode for free inode search;
      search disk for free inodes until super block full or no more free
      inodes;
[Figure: super block free inode list, showing the index and the remembered inode]
numbers and, if it does, places the inode number in the list and returns. If the list is full, the Kernel
may not save the newly freed inode there. It compares the number of the freed inode with that
of the remembered inode. If the freed inode number is less than the remembered inode number,
it “remembers” the newly freed inode number, discarding the old remembered inode number from
the super block. The inode is not lost, because the Kernel can find it by searching the inode list
on disk. The Kernel maintains the super block list such that the last inode it dispenses from the
list is the remembered inode. There should never be free inodes whose inode number is less than
the remembered inode number, but exceptions are possible.
Process A               Process B                  Process C
Assigns inode I         :                          :
from super block        :                          :
Sleeps while            :                          :
reading inode (a)       :                          :
:                       Tries to assign inode      :
:                       from super block           :
:                       super block empty (b)      :
:                       Search for free inode      :
:                       on disk, puts inode I      :
:                       in super block (c)         :
Inode I in core         :                          :
does usual activity     :                          :
:                       completes search,          :
:                       assigns another inode (d)  :
:                       :                          Assigns inode I from
:                       :                          super block
:                       :                          I is in use
:                       :                          Assigns another inode (e)
[Figure: contents of the super block free inode list at steps (a) to (e) of the scenario above]
Fig. 6.4 Placing free inode number into the super block
System calls are standard functions that instruct the Kernel to perform a specific task.
7.1.1 Open
The open system call is the first step a process must take to access the data in a file. The syntax
for this system call is:
fd = open (pathname, flags, modes)
pathname: a file name. It may be absolute or relative.
flags: indicate the type of open, such as for reading or writing.
modes: give the file permissions if the file is being created.
fd: the open system call returns an integer called the user file descriptor.
unlock (inode);
return (user file descriptor);
}
The Kernel searches the file system for the file name parameter. It checks permission for
opening the file after it finds the in-core inode and allocates an entry in the file table for the open
file. The file table entry contains a pointer to the inode of the open file and a field that indicates
the byte offset in the file where the Kernel expects the next read or write to begin. The Kernel
initializes the offset to 0 during the open call, meaning that the initial read or write starts at the
beginning of a file by default. The entry in the user file descriptor table points to the entry in the
global file table.
[Figure: user file descriptor table, file table and inode table entries]
7.1.3 Read
The syntax of the read system call is:
number = read (fd, buffer, count)
fd: the file descriptor returned by the open system call.
buffer: the address of a data structure in the user process that will contain the read
data on successful completion of the call.
count: the number of bytes the user wants to read.
number: the number of bytes actually read.
[Figure: user file descriptor tables of Proc A and Proc B, the file table, and the inode table entries for etc/passwd, local and private]
byte count and the starting byte offset in the file. It calculates that byte offset 0 is in the 0th block
of the file and retrieves the entry for the 0th block in the inode. Assuming such a block exists, the
Kernel reads the entire block of 1024 bytes into a buffer but copies only 20 bytes to the user address
lilbuf. It increments the u-area byte offset to 20 and decrements the count of data to read to 0. Since
the read has been satisfied, the Kernel resets the file table offset to 20, so that subsequent reads
on the file descriptor will begin at byte 20 in the file, and the system call returns the number of
bytes actually read, 20.
For the second read call, the Kernel again determines that ‘fd’ is legal. It stores in the u-area the
user address bigbuf, the number of bytes the process wants to read, 1024, and the starting offset in
the file, 20, taken from the file table. It converts the file offset to the correct disk block as above and
reads the block. The Kernel cannot satisfy the read request entirely from the buffer, because only
1004 out of the 1024 bytes for this request are in the buffer. So it copies the last 1004 bytes from the
buffer into the user data structure bigbuf and updates the parameters in the u-area to indicate that
the next iteration of the read loop starts at byte 1024 in the file, that the data should be copied to
byte position 1004 in bigbuf and that the number of bytes to satisfy the read request is 20.
The Kernel then looks up the second direct block number in the inode and finds the correct disk
block to read. It copies 20 bytes from the buffer to the correct address in the user process. Before
leaving the system call, the Kernel sets the offset field in the file table entry to 1044, the byte offset
that should be accessed next. The last read call starts reading at byte 1044 in the file.
A Reader and A Writer Process
/* Process A: the reader */
#include <fcntl.h>
#include <unistd.h>

int main (void)
{
    int fd;
    char buf[512];

    fd = open ("/etc/passwd", O_RDONLY);
    read (fd, buf, sizeof (buf));    /* read 1 */
    read (fd, buf, sizeof (buf));    /* read 2 */
    return 0;
}

/* Process B: the writer (a separate program; it needs the same two headers) */
int main (void)
{
    int fd;
    size_t i;
    char buf[512];

    for (i = 0; i < sizeof (buf); i++)
        buf[i] = 'a';
    fd = open ("/etc/passwd", O_WRONLY);
    write (fd, buf, sizeof (buf));   /* write 1 */
    write (fd, buf, sizeof (buf));   /* write 2 */
    return 0;
}
When a process invokes the read system call, the Kernel locks the inode for the duration of the
call. Afterwards, it could go to sleep reading a buffer associated with data or with indirect blocks
of the inode. If another process were allowed to change the file while the first process was sleeping,
read could return inconsistent data. Hence the inode is left locked for the duration of the read call,
affording the process a consistent view of the file as it existed at the start of the call.
The Kernel can preempt a reading process between system calls in user mode and schedule
another process to run. Since the inode is unlocked at the end of a system call, nothing prevents other
processes from accessing the file and changing its contents. It would be unfair for the system to
keep an inode locked from the time a process opened the file until it closed the file, because one
process could keep a file open and thus prevent other processes from ever accessing it. To avoid
such problems, the Kernel unlocks the inode at the end of each system call that uses it. If another
process changes the file between the two read system calls by the first process, the first process
may read unexpected data, but the Kernel data structures are consistent.
Write
The syntax of the write system call is:
number = write (fd, buffer, count)
When writing a regular file, if the file does not contain a block that corresponds to the byte offset
to be written, the Kernel allocates a new block and assigns the block number to the correct
position in the inode table of contents. If the byte offset is that of an indirect block, the Kernel may
have to allocate several blocks for use as indirect blocks and data blocks. The inode is locked for
the duration of the write because the Kernel may change the inode when allocating new blocks;
allowing other processes access to the file could corrupt the inode if several processes allocate
blocks simultaneously for the same byte offset. When the write is complete, the Kernel updates the
file size entry in the inode if the file has grown larger.
7.4 CLOSE
A process closes an open file when it no longer wants to access it. The syntax is:
close (fd);
[Figure: user file descriptor tables of Proc A and Proc B, the file table and the inode table after Proc B closes its files]
Fig. 7.3 Tables after closing a file
The Kernel does the close operation by manipulating the file descriptor and the corresponding
file table and inode table entries. If the reference count of the file table entry is greater than 1
because of dup or fork calls, then other user file descriptors reference the file table entry; the Kernel
decrements the count and the close completes. If the file table reference count is 1, the Kernel frees
the entry and releases the in-core inode.
If other processes still reference the inode, the Kernel decrements the inode reference count but
leaves it allocated; otherwise the inode is free for reallocation because its reference count is 0.
dev → dev specifies the major and minor device number for block and character special files.
algorithm : make new node
inputs : node (file name)
file type
permissions
major, minor device number
output : none
{ if (new node not named pipe and user not super user)
return (error);
get inode of parent of new node;
if (new node already exists)
{ release parent inode;
return (error); }
assign free inode from file system for new node;
create new directory entry in parent directory;
include new node name and newly assigned inode number;
release parent directory inode;
if (new node is block or character special files)
write major, minor numbers into inode structure;
release new node inode;
}
Change Directory → chdir (pathname);
Change Root → chroot (pathname);
Change Owner → chown (pathname, owner, group);
Change Mode → chmod (pathname, mode);
7.8 PIPES
Pipes allow transfer of data between processes in a FIFO manner, and they also allow
synchronization of process execution. Their implementation allows processes to communicate even
though they do not know what processes are on the other end of the pipe. The traditional
implementation of pipes uses the file system for data storage. There are two kinds of pipes.
(i) Named pipe → Processes use the open system call for this pipe.
(ii) Unnamed pipe → Processes use the pipe() system call to create an unnamed pipe.
Only related processes, descendants of the process that issued the pipe call, can share access
to an unnamed pipe.
[Figure: process tree of Proc A to Proc E, marking the process that calls pipe and the processes that cannot share the pipe]
(ii) Opening a named pipe → A named pipe is a file whose semantics are the same as
those of an unnamed pipe, except that it has a directory entry and is accessed by a
pathname. Processes open named pipes in the same way that they open regular files,
and hence processes that are not closely related can communicate. Named pipes
permanently exist in the file system hierarchy, but unnamed pipes are transient: when all
processes finish using the pipe, the Kernel reclaims its inode.
A process that opens the named pipe for reading will sleep until another process opens
the named pipe for writing, and vice versa.
(iii) Reading and writing pipes → A pipe should be viewed as if processes write into one
end of the pipe and read from the other end. The number of processes reading from a pipe
does not necessarily equal the number of processes writing the pipe; if the number of
readers or writers is greater than 1, they must coordinate use of the pipe with other
mechanisms.
The difference between storage allocation for a pipe and a regular file is that a pipe
uses only the direct blocks of the inode for greater efficiency, although this places a
limit on how much data a pipe can hold at a time. The Kernel manipulates the direct
blocks of the inode as a circular queue.
7.10 DUP
The dup system call copies a file descriptor into the first free slot of the user file descriptor
table, returning the new file descriptor to the user. It works for all file types. The syntax is:
newfd = dup (fd)
#include <fcntl.h>
#include <unistd.h>

int main (void)
{
    int i, j;
    char buf1[512], buf2[512];

    i = open ("/etc/passwd", O_RDONLY);
    j = dup (i);                        /* j shares the file table entry of i */
    read (i, buf1, sizeof (buf1));
    read (j, buf2, sizeof (buf2));      /* continues where the first read stopped */
    close (i);
    read (j, buf2, sizeof (buf2));      /* j is still usable after i is closed */
    return 0;
}
The association of the mount point inode and the root inode of the mounted file system, set up
during the mount system call, allows the Kernel to traverse the file system hierarchy gracefully
without special user knowledge.
[Figure: mount table entry linking the super block, the mounted-on inode (marked as mount point), the device inode and the root inode of the mounted file system]
}
release working inode;
working inode = inode for new inode number;
}
else
return (no inode);
} return (working inode);
}
7.18 LINK
The link system call links a file to a new name in the file system directory structure, creating a
new directory entry for an existing inode. The syntax is:
link (source file name, target file name);
Source file name → the name of an existing file.
Target file name → the new (additional) name the file will have after completion of the link call.
[Figure: directory tree containing usr, src, include, sys, inode.h and test.h]
Process A                         Process B
:                                 Get inode e
:                                 Release e
:                                 Get inode f
:                                 Get inode a
:                                 Release a
:                                 :
:                                 :
:                                 Try to get inode d
:                                 SLEEP - process A locked inode
Get inode e
Release e
Try to get inode f
SLEEP - process B locked inode
(Time runs downward; at this point the two processes are deadlocked.)
In this example, process A would be holding a locked inode that process B wants, and
process B would be holding a locked inode that process A wants. The Kernel avoids this deadlock
condition by releasing the source file inode after incrementing its link count. Since the first
resource (inode) is free when accessing the next resource, no deadlock can occur.
This example showed how two processes could deadlock each other if the inode lock were not
released. A single process could also deadlock itself. If it executed.
7.19 UNLINK
The unlink system call removes a directory entry for a file. The syntax is:
unlink (pathname);
If the file being unlinked is the last link of the file, the Kernel eventually frees its data blocks.
algorithm : unlink
input : file name
output : none
{ get parent inode of file to be unlinked;
  if (last component of file name is “.”)
      increment inode reference count;
  else
      get inode of file to be unlinked;
  if (file is directory but user is not super user)
  { release inodes;
    return (error);
  }
  if (shared text file and link count currently 1)
      remove from region table;
  write parent directory : zero inode number of unlinked file;
  release inode parent directory;
  decrement file link count;
  release file inode;
}
II Race Condition
Race conditions abound in the unlink system call, particularly when unlinking directories.
Proc A                          Proc B                       Proc C
:                               Unlink file ‘c’              :
:                               Find inode for ‘c’ locked    :
:                               Sleeps                       :
:                               :                            :
Search directory ‘b’ for ‘c’    :                            :
Get inode no. for ‘c’           :                            :
Find inode for ‘c’ locked       :                            :
Sleeps                          :                            :
:                               :                            :
:                               Wakes up and ‘c’ free        :
:                               unlink ‘c’,                  :
:                               old inode free if            :
:                               link count 0                 :
:                               :                            :
:                               :                            Assign inode to new file ‘n’
:                               :                            Happen to assign
:                               :                            old inode for ‘c’
:                               :                            Eventually release
:                               :                            inode ‘n’ lock
Wakes up and old ‘c’ inode      :
free (now n)                    :
Get inode for ‘n’               :
Search ‘n’ for name ‘d’         :
Chapter 8
STRUCTURE OF A PROCESS
The Kernel contains a process table with an entry that describes the state of every active
process in the system. The u-area contains additional information that controls the operation of
a process. The process table entry and the u-area are part of the context of a process. The aspect
of the process context that most visibly distinguishes it from the context of another process is of
course, the contents of its address space.
When the process completes the system call it may move to the state “user running” where
it executes in user mode. After a period of time, the system clock may interrupt the processor and
the process enters state “preempted” and the other process executes.
[Figure: process states and their transitions: 1 User Running, 2 Kernel Running, 3 Ready to run in memory, 4 Asleep in memory, 5 Ready to run, swapped, 6 Sleep, swapped, 7 Preempted, 8 Created, 9 Zombie]
Fig. 8.1 Process state transition diagram
When a process executes a system call, it leaves the state “user running” and enters the state
“Kernel running”. Suppose the system call requires I/O from the disk and the process must wait
for the I/O to complete. It enters the state “asleep in memory”, putting itself to sleep until it is
notified that the I/O has completed. When the I/O later completes, the H/W interrupts the CPU, and
the interrupt handler awakens the process, causing it to enter the state “ready to run in memory”.
When a process completes, it invokes the exit system call, thus entering the state “Kernel
running” and finally the “zombie” state.
(iv) Process identifiers (PIDs) specify the relationship of processes to each other.
(v) The process table entry contains an event descriptor when the process is in the “sleep”
state.
(vi) Scheduling parameters allow the Kernel to determine the order in which processes
move to the states “Kernel running” and “User Running”.
(vii) A signal field enumerates the signals sent to a process but not yet handled.
(viii) Various timers give process execution time and Kernel resource utilization, used for
process accounting and for the calculation of process scheduling priority. One field is
a user-set timer used to send an alarm signal to a process.
8.2.2 u-Area
The u-area contains fields that need to be accessible only to the running process. Therefore,
the Kernel allocates space for the u-area only when creating a process. It does not need u-areas for
process table entries that do not have processes. The fields in the u-area are:
(i) A pointer to the process table identifies the entry that corresponds to the u-area.
(ii) The real and effective user IDs determine various privileges allowed the process, such
as file access rights.
(iii) Timer fields record the time the process spent executing in user mode and in Kernel
mode.
(iv) An array indicates how the process wishes to react to signals.
(v) The control terminal field identifies the “login terminal” associated with the process, if
one exists.
(vi) An error field records errors encountered during a system call.
(vii) A return value field contains the result of system calls.
(viii) I/O parameters describe the amount of data to transfer, the address of the source (or
target) data array in user space, file offsets for I/O and so on.
(ix) The current directory and current root describe the file system environment of the process.
(x) The user file descriptor table records the files the process has open.
(xi) Limit fields restrict the size of a process and the size of a file it can write.
(xii) A permission mode field masks mode settings on files the process creates.
The compiler therefore generates addresses for a virtual address space with a given address
range, and the machine's memory management unit translates the virtual addresses generated by
the compiler into address locations in physical memory. The compiler does not know where in
memory the Kernel will later load the program for execution. In fact, several copies of a program
can coexist in memory: all execute using the same virtual addresses but reference different
physical addresses.
(i) Region: The Kernel divides the virtual address space of a process into logical regions.
A region is a contiguous area of the virtual address space of a process that can be
treated as a distinct object to be shared or protected. Thus text, data and stack usually
form separate regions of a process. Several processes can share a region.
[Figure: pregion entries and page tables for the text, data and stack regions of a process]
Assume the page size is 1K bytes and the process wants to access virtual memory address (V.M.A.)
68432. The pregion entries show that the address is in the stack region, which starts at virtual
address 64K (65,536).
68432 - 65536 = 2896
Since each page is 1K (1024 bytes) and 2896 = 2 × 1024 + 848, the address is at byte offset 848 in
page 2 of the region, located at physical address 986K.
We use the following memory model in discussing memory management. The system contains
a set of memory management register triples.
(i) The first register in a triple contains the address of a page table in physical memory.
(ii) The second register contains the first virtual address mapped via the triple.
(iii) The third register contains control information, such as the number of pages in the page
table and page access permissions (read, write, execute).
(iv) Layout of the Kernel → The virtual memory mapping associated with the Kernel is
independent of all processes. The code and data for the Kernel reside in the system
permanently, and all processes share them. In many machines the virtual address space
of a process is divided into several classes, including system and user, and each class
has its own page tables. When executing in Kernel mode, the system permits access to
Kernel addresses, but it prohibits such access when executing in user mode.
[Figure: Kernel and user register triples (address of page table, virtual address, number of pages in page table); register triple 3 maps the u-area of processes A, B and C]
Fig. 8.5 Memory map of u-area in the kernel
8.4.1 Sleep
When a process goes to sleep, it typically does so during execution of a system call. The
process enters the Kernel (context layer 1) when it executes an operating system trap and goes
to sleep awaiting a resource. When the process goes to sleep, it does a context switch, pushing
its current context layer and executing in Kernel context layer 2. Processes also go to sleep when
they incur page faults as a result of accessing virtual addresses that are not physically loaded; they
sleep while the Kernel reads in the contents of the pages.
[Figure: processes b, d, e, g and h sleeping on events such as waiting for terminal input, with the events mapped to addresses A and C]
Fig. 8.8 Processes sleeping on events and events mapping into addresses
algorithm : sleep
input : (1) sleep address (2) priority
output : 1 if process awakened as a result of a signal that the
process catches, jump algorithm if process awakened as a result of a
signal that it does not catch, 0 otherwise;
{ raise processor execution level to block all interrupts;
  set process state to sleep;
  put process on sleep hash queue, based on sleep address;
  save sleep address in process table slot;
  set process priority level to input priority;
  if (process sleep is NOT interruptible)
  { do context switch;
    reset processor priority level to allow interrupts
    as when process went to sleep;
    return (0); }
  if (no signal pending against process)
  { do context switch;
    if (no signal pending against process)
    { reset processor priority level to what it was
of a process. The region may be a newly allocated region or an existing region that the
process will share with other processes. The Kernel allocates a free pregion entry, sets its
IV. Changing the size of a region: A process may expand or contract its virtual address space
with the sbrk system call. Similarly, the stack of a process automatically expands
according to the depth of nested procedure calls. Internally, the Kernel invokes the algorithm
growreg to change the size of a region. When a region expands, the Kernel makes sure
that the virtual addresses of the expanded region do not overlap those of another region
and that the growth of the region does not cause the process size to become greater than
the maximum allowed virtual memory space. The Kernel never invokes growreg to
increase the size of a shared region that is already attached to several processes; therefore,
it does not have to worry about increasing the size of a region for one process and causing
another process to grow beyond the system limit for process size.
Fig. 8.11
VI. Freeing a region: When a region is no longer attached to any process, the Kernel can
free the region and return it to the list of free regions. The Kernel releases physical
resources associated with the region, such as page tables and memory pages.
VII. Detaching a region from a process: The Kernel detaches regions in the exec, exit and shmdt
(detach shared memory) system calls. It updates the pregion entry and severs the
connection to physical memory by invalidating the associated memory management
register triple. The Kernel decrements the region reference count and the size field in
the process table entry according to the size of the region.
Chapter 9
PROCESS CONTROL
The fork system call creates a new process, the exit call terminates process execution, and
the wait call allows a parent process to synchronize its execution with the exit of a child process.
Signals inform processes of asynchronous events; the Kernel also synchronizes execution of
exit and wait via signals.
Fig. 9.1 Fork creating a new process context
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main (void)
{
    int uid, euid, fdmjb, fdmjb1;

    uid = getuid ();
    euid = geteuid ();
    printf ("uid %d euid %d\n", uid, euid);
    fdmjb = open ("mjb", O_RDONLY);
    fdmjb1 = open ("mjb1", O_RDONLY);
    printf ("fdmjb %d fdmjb1 %d\n", fdmjb, fdmjb1);
    setuid (uid);                      /* drop to the real user ID */
    printf ("after setuid(%d): uid %d euid %d\n", uid, (int) getuid (), (int) geteuid ());
    fdmjb = open ("mjb", O_RDONLY);
    fdmjb1 = open ("mjb1", O_RDONLY);
    printf ("fdmjb %d fdmjb1 %d\n", fdmjb, fdmjb1);
    setuid (euid);                     /* return to the saved effective user ID */
    printf ("after setuid(%d): uid %d euid %d\n", euid, (int) getuid (), (int) geteuid ());
    return 0;
}
Suppose the executable file produced by compiling the program has owner “mjb1” (ID 8319),
has its setuid bit on, and all users have permission to execute it. Assume that users “mjb” (ID 5088)
and “mjb1” own the files of their respective names and that both files have read-only permission
for their owners. User “mjb” sees the following output when executing the program:
uid 5088 euid 8319
fdmjb -1 fdmjb1 3
after setuid(5088): uid 5088 euid 5088
fdmjb 4 fdmjb1 -1
after setuid(8319): uid 5088 euid 8319
User “mjb1” sees the following output:
uid 8319 euid 8319
fdmjb -1 fdmjb1 3
after setuid(8319): uid 8319 euid 8319
fdmjb -1 fdmjb1 4
after setuid(8319): uid 8319 euid 8319
increment: changes the current break value by the specified number of bytes.
oldendds: is the break value before the call.
sbrk is a C library routine that calls brk.
algorithm : brk
input : new break value;
output : old break value;
{ lock process data region;
if (region size increasing)
if (new region size is illegal)
{ unlock data region;
return (error);
}
change region size;
zero out addresses in new data space;
unlock process data region;
}
[Figure: the shell waits while ls -l writes and wc reads; both processes then exit]
Chapter 10
INTER-PROCESS COMMUNICATION
IPC mechanisms allow arbitrary processes to exchange data and synchronize execution.
We have already considered several forms of interprocess communication, such as pipes,
named pipes and signals. Pipes suffer from the drawback that they are known only to processes
which are descendants of the process that invoked the pipe system call: unrelated processes cannot
communicate via pipes. Although named pipes allow unrelated processes to communicate, they
cannot generally be used across a network, nor do they readily lend themselves to setting up
multiple communication paths for different sets of communicating processes: it is impossible to
multiplex a named pipe to provide private channels for pairs of communicating processes. Arbitrary
processes can also communicate by sending signals via the kill system call, but the message consists
only of the signal number.
child process and puts it into the “ready-to-run” state, then sleeps until the child responds. When
the child resumes execution (in Kernel mode), it does the appropriate trace command, writes its
reply into the trace data structure, then awakens the debugger. Depending on the command type,
the child may reenter the trace state and wait for a new command, or return from handling signals
and resume execution. When the debugger resumes execution, the Kernel saves the “return value”
supplied by the traced process, unlocks the trace data structure, and returns to the user.
If the debugger process is not sleeping in the wait system call when the child enters the trace
state, it will not discover its traced child until it calls wait.
V. Each IPC entry has a permissions structure that includes the user ID and group ID of
the process that created the entry.
VI. Each entry contains other status information, such as the process ID of the last process to
update the entry and the time of last access or update.
VII. Each mechanism contains a “control” system call to query the status of an entry, to set
status information, or to remove the entry from the system.
10.2.1 Messages
Messages allow processes to send formatted data streams to arbitrary processes. There are four
system calls for messages:
(a) msgget: returns a message descriptor that designates a message queue for use in
other system calls. The syntax is:
msgqid = msgget (key, flag);
msgqid: the descriptor returned by the call.
key: a numeric key that serves as a user-chosen name. The Kernel searches the proper
table for an entry named by the key.
flag: specifies the action the Kernel should take.
The Kernel uses msgqid as an index into an array of message queue headers. The queue structure
contains the following fields.
(i) Pointers to the first and last messages on a linked list.
(ii) The number of messages and the total number of data bytes on the linked list.
(iii) The maximum number of bytes of data that can be on the linked list.
(iv) The process IDs of the last processes to send and receive messages.
(v) Time stamps of the last msgsnd, msgrcv and msgctl operations.
When a user calls msgget to create a new descriptor, the Kernel searches the array of message
queues to see if one exists with the given key. If there is no entry for the specified key, the Kernel
allocates a new queue structure, initializes it, and returns an identifier to the user. Otherwise it checks
permissions and returns.
msgsnd (msgqid, msg, count, flag);
msgqid: the descriptor of a message queue returned by a msgget call.
msg: a pointer to a structure consisting of a user-chosen integer type and a character array.
count: the size of the data array.
flag: specifies the action the Kernel should take if it runs out of internal buffer space.
The Kernel checks that the sending process has write permission for the message descriptor,
that the message length does not exceed the system limit, that the message queue does not contain
too many bytes, and that the message type is a positive integer. If all tests succeed, the Kernel
allocates space for the message from a message map and copies the data from user space. The Kernel
allocates a message header and puts it on the end of the linked list of message headers for the
message queue.
entry in the shared memory table and sets a flag there to indicate that no memory is
associated with the region. It allocates memory for the region only when a process
attaches the region to its address space.
10.2.3 Semaphores
The semaphore system calls allow processes to synchronize execution by doing a set of
operations atomically on a set of semaphores. Before the implementation of semaphores, a
process would create a lock file with the creat system call if it wanted to lock a resource. The creat
call fails if the file already exists, and the process would assume that another process had the resource
locked.
Disadvantages: The process does not know when to try again, and lock files may inadvertently
be left behind when the system crashes or is rebooted.
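The lock-file technique described above can be sketched in the shell. This is only an illustration under stated assumptions: it relies on the shell's noclobber option (set -C) in place of the creat system call, and the names acquire_lock, release_lock and resource.lock are hypothetical.

```shell
#!/bin/sh
# Sketch of lock-file locking; function names and paths are illustrative assumptions.
LOCKDIR=$(mktemp -d)                 # private scratch directory for the demo
LOCKFILE="$LOCKDIR/resource.lock"

# acquire_lock: try to create the lock file; fails if it already exists.
# With noclobber (set -C) the shell refuses to overwrite an existing file,
# so creating the file doubles as the "is it locked?" test.
acquire_lock() {
    ( set -C; : > "$LOCKFILE" ) 2>/dev/null
}

# release_lock: remove the lock file so that another process may lock the resource.
release_lock() {
    rm -f "$LOCKFILE"
}

if acquire_lock; then
    echo "first attempt: lock acquired"
fi
if ! acquire_lock; then
    echo "second attempt: resource is already locked"
fi
release_lock
```

The sketch shares both disadvantages named above: a process that fails to get the lock has to guess when to retry, and a crash between acquire_lock and release_lock leaves the file behind.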
P: if (S > 0)
S--; /* otherwise the process waits until S becomes positive */
V: S++; /* lets one waiting process, if any, proceed */
S is a semaphore variable.
A semaphore in UNIX System V consists of the following elements:
(i) The value of semaphore.
(ii) The process ID of the last process to manipulate the semaphore.
(iii) The number of processes waiting for the semaphore value to increase.
(iv) The number of processes waiting for the semaphore value to equal 0.
[Figure: the semaphore table, whose entries point to semaphore arrays]
semget: The semget system call creates and gains access to a set of semaphores. It creates an array of semaphores.
Syntax:
id = semget (key, count, flag);
The Kernel allocates an entry that points to an array of semaphore structures with count
elements. The entry also specifies the number of semaphores in the array, the time of the last semop
call and the time of the last semctl call.
Chapter 11
SOCKETS
Furthermore, the methods may not allow processes to communicate with other processes on
the same machine, because they assume the existence of a server process that sleeps in a driver open or
read system call. To provide common methods for IPC and to allow the use of sophisticated network
protocols, the BSD system provides a mechanism known as sockets.
[Figure: a client process and a server process, each above an IP layer, connected through the network]
Sockets that share common communication properties, such as naming conventions and protocol
address formats, are grouped into domains. The 4.2 BSD system supports the “UNIX system
domain” for processes communicating on one machine and the “Internet domain” for processes
communicating across a network using the DARPA (Defense Advanced Research Projects Agency)
communications protocols.
Each socket has a type:
(i) Virtual circuit: A virtual circuit allows sequenced, reliable delivery of data. Virtual circuits are expensive.
(ii) Datagram: Datagrams do not guarantee sequenced, reliable or unduplicated delivery,
but they are less expensive because they do not require expensive setup operations.
The socket mechanism contains several system calls:
(i) socket: The socket system call establishes the end point of a communications link. Syntax:
sd = socket (format, type, protocol)
sd: the socket descriptor.
format: specifies the communications domain.
type: indicates the type of communication over the socket.
protocol: indicates a particular protocol to control the communication.
The close system call closes the socket.
(ii) bind: The bind system call associates a name with the socket descriptor. Syntax:
bind (sd, address, length)
address: points to a structure that specifies an identifier specific to the communications
domain and protocol specified in the socket system call, e.g., a file name in the UNIX
system domain.
length: the length of the address structure. Without this parameter the Kernel
would not know how long the address is, because it can vary across domains and
protocols.
Server processes bind addresses to sockets and “advertise” their names to identify
themselves to client processes.
(iii) connect: The connect system call requests that the Kernel make a connection to an existing socket. Syntax:
connect (sd, address, length)
address: the address of the target socket that will form the other end of the
communications line.
(iv) listen: The listen system call specifies the maximum queue length. Syntax:
listen (sd, qlength)
qlength: the maximum number of outstanding requests.
(v) accept: The accept system call receives incoming requests for a connection to a server process. Syntax:
nsd = accept (sd, address, addrlen)
address: points to a user data array that the Kernel fills with the return address of the
connecting client.
addrlen: indicates the size of the user array.
(vi) send: Syntax:
count = send (sd, msg, length, flags)
count: the number of bytes actually sent.
msg: a pointer to the data being sent.
length: the length of the message.
flags: this parameter may be set to the value SOF_OOB to send data “out-of-band”,
meaning that the data being sent is not considered part of the regular sequence of data
exchanged between the communicating processes.
(vii) recv: Syntax:
count = recv (sd, buf, length, flags)
buf: the data array for incoming data.
flags: can be set to peek at an incoming message and examine its contents
without removing it from the queue, or to receive “out-of-band” data.
(viii) shutdown: The shutdown system call closes a socket connection. Syntax:
shutdown (sd, mode)
mode: indicates whether the sending side, the receiving side or both sides no longer
allow data transmission.
“wakes up” a slave processor from an idle state once a second. The slave processor schedules
the highest priority process that need not run on the master processor.
The only chance for corruption of Kernel data structures comes in the scheduler algorithm,
because it does not protect against having a process selected for execution on two processors.
For instance, if a configuration consists of a master processor and two slaves, it is possible
that the two slave processors find one process in user mode ready for execution. If both processors
were to schedule the process simultaneously, they would read, write and corrupt its address
space.
The system can avoid this problem in two ways:
(i) The master can specify the slave processor on which the process should execute,
permitting more than one process to be assigned to a processor. One processor may
have lots of processes assigned to it whereas others are idle. The master Kernel would
have to distribute the process load between the processors.
(ii) The Kernel can allow only one processor to execute the scheduling loop at a time.
Fig. 11.4 Race condition in sleep-locks on multiprocessor
As illustrated in the figure above, suppose a lock is free and two processes on two processors
simultaneously attempt to test and set it. They find that the lock is free at time t, set it, enter the
critical region, and may corrupt Kernel data structures. To prevent this situation, the locking
primitives must be atomic, i.e., the actions of testing the status of the lock and setting the lock must
be done as a single, indivisible operation, such that only one process can manipulate the lock at
a time.
Chapter 12
UNIX COMMAND
12.1 INTRODUCTION TO SHELL
The Shell is a program that provides an interpreter and interface between the user and the
UNIX Operating System. It executes commands that are read either from a terminal or from a file.
Files containing commands may be created, allowing users to build their own commands. In this
manner, users may tailor UNIX to their individual requirements and style.
There are a number of different Shells. Each provides a slightly different interface between
the user and the UNIX Operating System. The three important types of shell in UNIX
are:
(i) Bourne Shell
(ii) C-Shell
(iii) Korn Shell
There are other shells that are less widely used and not available on many machines. A
command issued by a user may be run in the present shell, or the shell itself may start up another
copy of itself in which to run that command. In this way, a user may run several commands at
the same time. A secondary shell is called a sub-shell.
When a user logs onto the system, a shell is automatically started. This will monitor the
user’s terminal, waiting for the issue of any commands. The type of shell used is stored in the
file /etc/passwd. Any other shell may be run as a sub-shell by issuing it as
a command. For example, /usr/bin/ksh will run a Korn shell. The original shell will still be
running (in background mode) until the Korn shell is terminated.
There are three prompts in UNIX/Linux:
(i) # It is used by the super user or System Administrator.
(ii) $ It is used by an ordinary user.
(iii) % It is used by an ordinary user working in the C-shell.
Some super user related commands:
1. Add a new user
Syntax:
useradd [OPTIONS] [USERNAME]
2. Change or assign password for a user
#passwd <username>
3. Add a group
#groupadd <groupname>
4. Delete
(i) User #userdel <username>
(ii) Group #groupdel <groupname>
5. Modify
(i) Group #groupmod <groupname>
(ii) User #usermod <username>
Password Administration
The following example sets the minimum and maximum number of weeks for change of
password for user Anoop.
#passwd -n 12 ANOOP /* Minimum 12 weeks */
#passwd -x 12 ANOOP /* Maximum 12 weeks */
Command
ls → Lists directories and files. Syntax:
ls [options] filename
Options
-l Long listing of files and directories. With this option the command displays the following
types of information:
File-type, Permission, Number-of-Links, Owner, Group-Owner, File-size, File-modification time,
File-name.
-a Show all files (including hidden files) -i Display inode number.
-A All files but not . and ..
$ ls ab* /* Names that start with ab */
$ ls ?ab /* Any single character followed by ab */
$ ls [abc]* /* First letter must be any one of the letters given in [] */
$ ls [!abc]* /* First letter must not be a, b or c */
$ ls [a-d][c-m][2-9]?? /* List all files whose first letter is a to d, second letter c to m, third
character 2 to 9, followed by any two other characters */
$ ls *.txt /* List all files with extension txt */
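The wildcard patterns above can be tried safely in a scratch directory. The following is a minimal sketch; the file names are made up for the demonstration, and echo is used in place of ls only to keep the output on one line.

```shell
#!/bin/sh
# Sketch: shell wildcards tried on scratch files (all file names are made up).
demo=$(mktemp -d)
cd "$demo" || exit 1
touch abc abd bab cab xyz note.txt

echo ab*        # names starting with ab         -> abc abd
echo ?ab        # any one character, then ab     -> bab cab
echo [abc]*     # first letter is a, b or c      -> abc abd bab cab
echo [!abc]*    # first letter is not a, b or c  -> note.txt xyz
echo *.txt      # names ending in .txt           -> note.txt
```

The shell, not ls, expands the patterns, which is why the same patterns work with any command.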
File Type    Meaning
-            Ordinary File
d            Directory File
p            Named Pipe
s            Semaphore
Permission Weight
Read (r) 4
Write (w) 2
Execute (x) 1
The permissions of a file are shown as rwxrwxrwx, in which the first rwx is for the owner, the second for
the group owner and the last three for others.
Change permission of a file → File permissions can be changed either by the owner of the file or
by the super user, not by other users.
chmod (Change mode of a file) → This command changes the permissions of a file.
Syntax →
$ chmod [WHO] [+/-/=] [Permission] <File name>
WHO means u → user (owner), g → group, and o → others. + → Add permission,
- → Remove permission, = → Instruct chmod to add the specified permission and take away
all others, if present.
$ chmod 777 <File name> /* Gives read, write and execute permission to owner, group and others */
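The permission weights and the symbolic form of chmod can be checked with a short sketch. The file name report and the helper show_mode are hypothetical, introduced only for this illustration.

```shell
#!/bin/sh
# Sketch: octal and symbolic chmod on a scratch file (file name is made up).
dir=$(mktemp -d)
f="$dir/report"
touch "$f"

# show_mode: print only the ten file-type and permission characters of ls -l.
show_mode() {
    ls -ld "$1" | cut -c1-10
}

chmod 754 "$f"      # owner rwx = 4+2+1 = 7, group r-x = 4+1 = 5, others r-- = 4
show_mode "$f"      # -rwxr-xr--

chmod go-rx "$f"    # remove read and execute from group and others
show_mode "$f"      # -rwx------

chmod u=r "$f"      # owner gets exactly read; write and execute are taken away
show_mode "$f"      # -r--------
```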
Create Directory → A directory is created by the mkdir command. Syntax: $ mkdir <Directory
name>
Option → -m → Set permission mode.
-v → Print a message for each created directory.
-p → Make parent directories as needed.
cat command → Create/display/append a file
$ cat >f1 /* Create the file f1; if f1 already exists, overwrite its contents */
$ cat >>f1 /* Append content to f1; if f1 does not exist, create it */
$ cat f1 /* Display f1; if f1 does not exist, report an error */
$ cat f1 >f2 /* Copy f1 to f2; if f2 exists, overwrite its contents */
$ cat f1 >>f2 /* Append the content of f1 to f2; if f2 does not exist, create it */
$ cat >a* /* Error (Try to create a file a*) */
$ cat >a\* /* The file a* will be created */
Directory File →
$ rm -r <directory name> /* Removes a directory even if it is not empty */
$ rmdir <directory name> /* Removes the directory only if it is empty */
Date Command → The date command prints the current date and time in a variety of
formats. It is also used by the system administrator to set the system time.
$ date
Day_of_week Month Date Time IST Year /* Printed by default */
wc → Print the line, word and byte counts of a file.
Options →
-c → Print the byte count, -m → Print the character count
-l → Print the newline count, -L → Print the length of the longest line
-w → Print the word count
banner → Print a large banner on the printer (not used in Linux)
pwd → Print the name of the current working directory
who → Show who is logged on
$ who [OPTIONS] [FILE | ARG1 ARG2]
Options →
-H → Print a line of column headings
-i, -u → Add user idle time as HOURS:MINUTES, . or old
-m → Only hostname and user associated with stdin
-q → All login names and the number of users logged on.
finger → Displays the following information about all users who are currently logged on:
Login-name tty idle login-time office office-phone
echo → Display a line of text
Options →
-n → Do not output the trailing newline
-E → Disable interpretation of backslash escape sequences in strings.
-e → Enable interpretation of the backslash-escaped characters listed below:
\NNN → The character whose ASCII code is NNN (octal)
\\ → Backslash, \v → Vertical tab, \r → Carriage return, \a → Alert
\b → Backspace, \n → New line, \t → Horizontal tab
cp → Copy source to destination.
cp [OPTIONS] source-pathname destination-pathname
Options →
-i → Interactive copy, -l → Link instead of copy, -v → Verbose
-R → Copy directories recursively, -H → Follow command line symbolic links
-d → Never follow symbolic links, -r → Copy recursively, treating non-directories as files
-u → Update: copy only when the source file is newer than the destination file or the destination file
is missing.
mv → Move (rename) files. The syntax is the same as for the cp command.
Options →
-f → Never prompt before overwriting, -i → Interactive
file → Report the type of a file. Syntax:
file [OPTIONS] [Name-of-file]
$ file * /* Display the type of every file in the current directory */
Standard functions
length(expression) Returns the length,
read() Reads a number from standard input,
scale(expression) The value of this function is the number of digits after the decimal point in
the expression.
sqrt(expression) The square root of the expression,
print (list) Provides another method of output
print "\n ANOOP CHATURVEDI"
{Statement list} It allows multiple statements to be grouped together for execution.
if (expression) stat1 [else stat2]
while (expression) statement,
$ split -5 f1 /* Split f1 into pieces of 5 lines each; it creates a group of files xaa, xab, ..., then xba, ... */
cmp: Comparing two files →
$ cmp f1 f2 /* f1 f2 differ: byte 15, line 2 */
-l → (list) Gives a detailed list of the byte number and the differing bytes in octal for each character that
differs between the two files.
comm → Finding what is common between two sorted files.
-1 → Suppress lines unique to the left file,
-2 → Suppress lines unique to the right file,
-3 → Suppress lines that appear in both files.
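A short sketch shows how the three comm options combine. The two scratch files and their contents are made up; both are already sorted, which comm expects.

```shell
#!/bin/sh
# Sketch: comm on two small sorted files (file names and contents are made up).
work=$(mktemp -d)
printf '%s\n' apple banana cherry > "$work/f1"
printf '%s\n' banana cherry date  > "$work/f2"

comm -12 "$work/f1" "$work/f2"    # only lines common to both: banana, cherry
comm -23 "$work/f1" "$work/f2"    # only lines unique to f1:   apple
comm -13 "$work/f1" "$work/f2"    # only lines unique to f2:   date

# cmp -s prints nothing and reports the result through its exit status.
cmp -s "$work/f1" "$work/f2" || echo "f1 and f2 differ"
```

Suppressing two of the three columns at a time is the usual way to get a clean, single-column answer.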
diff → Find the differences between two files.
-a → Treat all files as text and compare them line by line even if they do not seem to be text,
-B → Ignore changes that just insert or delete blank lines,
-b → Ignore changes in the amount of white space.
$ diff f1 f2
0a1 /* Append after line 0 of the first file */
> abc /* this line */
2c3,4 /* Change line 2 of the first file */
< cde /* replace this line with */
---
> cde
> dfg /* these two lines */
4d5 /* Delete line 4 of the first file */
< ceo /* containing this line */
Pattern Matching →
* → Zero or more characters, ? → A single character
When wild-cards lose their meaning →
‘*’ and ‘?’ lose their meaning inside a class, as in [*a*]; ‘!’ and ‘-’ lose their meaning outside a class, as in ![x-z] and x-[y-z].
Redirection →
1. Standard input: The command considers its input as a stream. This stream can
come from
(i) The keyboard. This is the default source.
(ii) A file (using a feature called redirection).
(iii) Another program (using the pipeline concept).
2. Standard output: It also has three similar destinations.
(i) It can be directed to the terminal.
(ii) It can be directed to a file.
(iii) It can serve as input to another program.
3. Standard error: It includes all error messages written to the terminal. This output may
be generated either by the command or by the shell, but in either case the default destination
is the terminal. Like standard output, it can also be assigned to a file. The file descriptor 2 is used for standard
error. If a file ‘aa’ does not exist then
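The three standard streams can be exercised with a few lines of shell. This is a minimal sketch: out.txt and err.txt are made-up scratch files, and aa is deliberately a file that does not exist, so that cat has an error message to write.

```shell
#!/bin/sh
# Sketch: the three standard streams (scratch file names are made up).
work=$(mktemp -d)
cd "$work" || exit 1

echo "hello" > out.txt        # standard output redirected to a file
wc -c < out.txt               # standard input taken from a file
echo "one two" | wc -w        # output of one program piped into another

# aa does not exist, so cat writes its complaint to standard error (descriptor 2),
# which is redirected here into err.txt instead of the terminal.
cat aa 2> err.txt || echo "cat failed; its message went to err.txt"
[ -s err.txt ] && echo "err.txt is not empty"
```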
1. System Variables →
(i) The set command displays a complete list of these variables.
(ii) PATH → PATH is a variable that instructs the shell about the route it should follow to
locate any executable command.
(iii) HOME → When you log in, UNIX places you in a directory called the home or login
directory.
(iv) IFS → IFS contains a string of characters that are used as word separators in the
command line.
(v) MAIL → MAIL determines where all incoming mail addressed to the user is to be stored.
(vi) PS1 & PS2 → The shell has two prompts, stored in PS1 and PS2. The primary prompt string
in PS1 is the one you normally see ($). The secondary prompt in PS2 (>) is what the shell responds
with when a multi-line command is continued.
$ PS1="ANOOP" // The prompt is now ANOOP in place of the $ sign.
(vii) SHELL → SHELL determines the type of shell that a user sees on logging in.
(viii) TERM → Indicates the terminal type being used.
(ix) LOGNAME → Shows your login name.
(x) .profile → The script executed during login time.
(xi) stty → Setting terminal characteristics (-a → display all current settings),
$ stty -echo //Off $ stty echo //On
(xii) intr → Changing the interrupt key, $ stty intr \^c
(xiii) eof → Changing the end-of-file key, $ stty eof \^a
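The role of IFS as a word separator can be seen by splitting a PATH-like string. The following is a sketch only; the function name count_dirs and the sample value are assumptions made for the illustration.

```shell
#!/bin/sh
# Sketch: using IFS to split a colon-separated list (the sample value is made up).

# count_dirs: split the argument on ':' and print the number of pieces.
count_dirs() {
    old_ifs=$IFS
    IFS=:
    set -- $1          # unquoted on purpose, so that the shell splits on IFS
    IFS=$old_ifs       # restore the usual separators
    echo $#
}

count_dirs "/bin:/usr/bin:/usr/local/bin"    # prints 3
echo "Home directory: $HOME"
echo "Search path:    $PATH"
```

Saving and restoring IFS matters, because every later command line in the same shell would otherwise be split on colons as well.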
Aliases → Aliases are listed when alias is used without arguments.
$ alias l='ls -l' and $ unalias l
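Vi Editor → The vi editor is a text editor. Modes of vi editor:
[Figure: transitions between the vi modes. Command mode to input mode: i, I, a, A, o, O, r, R, s, S; input mode to command mode: <Esc>; command mode to ex mode: : ; ex mode to command mode: <Enter>]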
(i) Input mode → Where any key pressed is entered as text. To go from input mode to command
mode, press the <Esc> key. The default mode is command mode.
i → Insert text to the left of the cursor, I → Insert text at the beginning of the line.
a → Append text to the right of the cursor, A → Append text at the end of the line.
o → Open a line below, O → Open a line above,
S → Replace the entire line, r ‘ch’ → Replace the single character under the cursor,
R → Replace text from the cursor to the right.
s → Replace the single character under the cursor with any number of characters
(ii) Command mode → Where keys are used as commands to act on text. To go from command mode to ex
mode, press (:). The meaning of the different keys in command mode:
x → Delete a single character, dd → Delete a single line,
h → Moves cursor left (←), l → Right (→),
k → Up (↑), j → Down (↓),
b → Moves back to the beginning of the word,
e → Forward to the end of the word, w → Forward to the beginning of the next word,
G → Moves the cursor to a particular line number,
J → Joins lines, . → Repeats the last command.
(iii) Ex mode → Where ex mode commands can be entered in the last line of the screen to act
on text. The operations in this mode are:
Command      Action
w            Saves file and remains in editing mode
x            Saves file and quits editing mode
w n2w.p      Like Save As in Microsoft Windows
.w f1        Writes current line to file f1
q!           Quits editing mode after abandoning changes
q            Quits editing mode when no changes have been made to the file
n1,n2w f1    Writes lines n1 to n2 to file f1
$w f1        Writes last line to file f1
sh           Escapes to the UNIX shell
wq           Saves file and quits editing mode
:set showmode → Shows the mode in which you are working.
Search and replace → :1,$s/anoop/ANOOP/g // All lines; g means global (every occurrence on a line)
1 → first line, $ → last line
Copying and moving text →
yw → Yank a word, y$ → Yank to end of line, y) → Yank to end of sentence,
y} → Yank to end of paragraph, y]] → Yank to end of section,
yy or Y → Yank the current line, Y} → Yank lines to the end of the paragraph.
Paste → p → Puts the yanked text to the right of the cursor,
P → Puts the yanked text to the left of the cursor,
Simple Filters → Filters are commands that accept some data as input, perform some
manipulation on it, and produce some output. Since they perform some filtering action on the
data, they are appropriately called filters.
pr → Paginating output: The pr command prepares a file for printing by adding suitable
headers, footers and formatted text.
$ pr <filename>
july 31 10:30 2007 <filename> page1
It is often used as a preprocessor before printing with the lp command.
$ pr f1 | lp //lpr in Linux
request id 112
By default the page size with pr is 66 lines, which can be changed with the -l option.
$ pr -l 50 f1 //Page length set to 50 lines, $ pr +20 f1 //Start printing from page 20.
-k (integer) → Output in more than one column,
-d → Double space the output,
-D → Use format for the header date,
-n → Number lines (counting),
-h → Use a centered header instead of the filename in the page header,
-h "" → Print a blank line,
-N number → Start counting with number at the 1st line of the 1st page printed,
-o margin → Offset each line with margin (zero) spaces,
-t → Omit page header and trailer,
-r → Omit warning when a file cannot be opened,
-w → Set page width to page_width (72) characters, for multiple text-column output only.
head → Display the beginning of a file. Syntax:
$ head [OPTION] <filename>
$ head f1 //Print the first 10 lines of file f1.
-c SIZE → Print the first SIZE bytes, -n N → Print the first N lines instead of 10,
-v → Always print a header giving the filename, -q → Never print a header giving the filename.
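The head options can be compared on a small numbered file. This is a minimal sketch; the scratch file f1 and its twelve lines are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: head on a scratch file of twelve numbered lines (file name is made up).
work=$(mktemp -d)
f="$work/f1"
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$f"

head "$f" | wc -l      # by default head passes on the first 10 lines
head -n 3 "$f"         # first three lines: 1, 2, 3
head -c 4 "$f"         # first four bytes: "1", newline, "2", newline
```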
-H Print file name for each match, -B N Print N lines of leading context before
matching lines (---)
3. Write a shell script to search for a pattern in a file by using command line arguments.
$ cat a1.sh
if [ $# -ne 3 ]; then echo -e "\n Not three arguments"; exit 2
elif grep "$1" "$2" > "$3" 2>/dev/null; then echo "Pattern found"
else echo "Pattern not found"; rm "$3"; fi
–x Executable –d Directory
–s Size>0 –b Block
ARG1 < ARG2, ARG1 > ARG2, ARG1 = ARG2, ARG1 != ARG2, ARG1 + ARG2, -, *, /, %
substr → substr STRING POS LENGTH: Substring of STRING, POS counted from 1.
index → index STRING CHARS: Index in STRING where any of CHARS is found, or 0.
length → length STRING: Length of STRING.
Directory
(i) Read permission: For a directory, this means that the list of file names stored in the directory
is accessible. If a directory has read permission you can use ls to list its contents.
(ii) Write permission: The presence of write permission for a directory implies that you
are permitted to create or remove files in it.
(iii) Execute permission: Execute privilege on a directory means that a user can “pass
through” the directory in searching for sub-directories. For example, to run
cat /usr/anoop/chaturvedi/d1/a1.sh you need to have execute permission for each of the directories
involved in the complete path name.
Example
1. Write a shell script to print all the prime number from X1 to X2.
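cat >prime.sh
echo "Enter Lower Limit"
read x1
echo "Enter Higher Limit"
read x2
while [ $x1 -le $x2 ]
do
# find the smallest divisor of x1 that is 2 or more
i=2
while [ $i -le $x1 ]
do
if [ `expr $x1 % $i` -eq 0 ]
then
break
else
i=`expr $i + 1`
fi
done
# x1 is prime when its smallest divisor is x1 itself
if [ $i -eq $x1 ]
then
echo $x1
fi
x1=`expr $x1 + 1`
done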
2. Write a shell script which receives any year through keyboard and determine whether the year
is leap or not. If no argument supply the current year should be assumed.
cat >leap.sh
echo "Enter Year:"
read year
# if nothing was typed, assume the current year
if [ -z "$year" ]
then
year=`date "+%Y"`
fi
if [ `expr $year % 400` -eq 0 ] || [ `expr $year % 4` -eq 0 -a `expr $year % 100` -ne 0 ]
then
echo "The Year $year is Leap Year"
else
echo "The Year $year is Not Leap Year"
fi
3. Write a shell script which receives two filename as arguments. It should check whether the two
files contents are same or not. If they are same then second file should be deleted.
cat >checkfile.sh
echo "Enter First File Name:"
read f1
echo "Enter Second File Name:"
read f2
if cmp -s "$f1" "$f2"
then
echo "The Files are Same"
rm "$f2"
else
echo "The Files Content are not Same"
fi
4. Write a shell script to print all the Armstrong number from X1 to X2.
cat >arm.sh
echo "Enter Lower Limit:"
read x1
echo "Enter Higher Limit:"
read x2
old=$x1
while [ $old -le $x2 ]
do
# work on a copy, because the digit loop reduces it to 0
x1=$old
sum=0
while [ $x1 -gt 0 ]
do
r=`expr $x1 % 10`
sum=`expr $sum + $r \* $r \* $r`
x1=`expr $x1 / 10`
done
if [ $old -eq $sum ]
then
echo $sum
fi
old=`expr $old + 1`
done
5. Write a shell script to print sum of digit of a number. Number entered through
keyboard.
cat >sumofdigit.sh
echo "Enter Number:"
read n
no=$n
sum=0
while [ $n -gt 0 ]
do
r=`expr $n % 10`
sum=`expr $sum + $r`
n=`expr $n / 10`
done
echo "Sum of Digit $no is: $sum"
6. Write a shell script to print all number from 1 to 10 in same row.
cat >printno.sh
i=1
while [ $i -le 10 ]
do
echo -n "$i "
i=`expr $i + 1`
done
7. Write a shell script calculate the factorial of a number.
cat >fact.sh
echo "Enter Number:"
read n
no=$n
f=1
while [ $n -ge 1 ]
do
f=`expr $n \* $f`
n=`expr $n - 1`
done
echo "The Factorial of $no is:" $f
8. Ramesh basic salary is input through the keyboard. His dearness allowance is 40% of basic
salary, and house rent allowance is 20% of basic salary. Write a shell script to calculate his
gross salary.
cat >grosspay.sh
echo "Calculation of Gross Salary of Ramesh"
echo "Salary of Ramesh is:"
read bs
d=`expr $bs \* 40`
h=`expr $bs \* 20`
da=`expr $d / 100`
hra=`expr $h / 100`
gs=`expr $bs + $da + $hra`
echo "Gross Salary of Ramesh Is:" $gs
9. Write a shell script which will receive the filename with its full path during execution.
This script should obtain information about this file as given by ls -l and display it in proper
format.
cat >fileinfo.sh
echo "Enter File Name:"
read file
if test -e $file
then
echo "The Information of File $file is:"
ls -l $file
else
echo "File $file Does Not Exist"
fi
10. Write a shell script to print all the even and odd number from 1 to 100.
cat >evenodd.sh
i=1
j=1
echo "Even Number"
while [ $i -le 100 ]
do
if [ `expr $i % 2` -eq 0 ]
then
echo $i
fi
i=`expr $i + 1`
done
echo "Odd Number"
while [ $j -le 100 ]
do
if [ `expr $j % 2` -ne 0 ]
then
echo $j
fi
j=`expr $j + 1`
done
11. Write a shell script which gets executed the moment the user logs in. It should display the
message “Good Morning” / “Good Afternoon” / “Good Evening” depending upon the time at
which the user logs in.
cat >loginfo.sh
tim=`date "+%H"`
if [ $tim -lt 12 ]
then
echo "Good Morning"
elif [ $tim -lt 17 ]
then
echo "Good Afternoon"
else
echo "Good Evening"
fi
12. Write a menu driven program which has the following options:
(i) Contents of /etc/passwd
(ii) List of users who have currently logged in
(iii) Present working directory
(iv) Exit
Make use of the case statement. The menu should be placed approximately in the center of the screen
and should be displayed in bold.
cat >menu.sh
echo -e "\t\t\t\t Menu\n\n
\t 1)\t Contents of /etc/passwd\n
\t 2)\t List of Users Who have Currently Logged in\n
\t 3)\t Present Working Directory\n
\t 4)\t Exit\n\n\t\t Enter Your Option: \c"
read ch
case "$ch" in
1) cat /etc/passwd ;;
2) who ;;
3) pwd ;;
4) exit ;;
*) echo "Invalid Option" ;;
esac
13. Write a shell script to count the number of lines and words supplied at standard input.
cat >count.sh
echo "Enter a File Name:"
read fname
if test -e $fname
then
echo "File Name is: $fname"
nol=`cat $fname | wc -l`
now=`cat $fname | wc -w`
echo "$fname file has $nol number of Lines"
echo "$fname file has $now number of Words"
else
echo "The File $fname Does Not Exist"
fi
14. Write a shell script which displays a list of all files in the current directory to which you have
read, write and execute permissions.
cat >filedisplay.sh
for file in *
do
if test -r "$file"
then
if test -w "$file"
then
if test -x "$file"
then
echo "$file"
fi
fi
fi
done
15. Write a shell script which will receive any number of filename as arguments. The shell script
should check whether every argument supplied is a file or directory. If it’s a directory it should
be appropriately reported. If it’s a filename then name of the file as well as the number of lines
present in it should be reported.
cat >filecheck.sh
# "$@" holds the file names given as arguments
for file in "$@"
do
if test -d "$file"
then
echo "$file is a Directory"
elif test -f "$file"
then
echo "$file"
nol=`cat "$file" | wc -l`
echo "The Number of Line is:$nol"
fi
done
Chapter 13
AWK AND PERL PROGRAMMING
13.1 INTRODUCTION TO AWK
awk is a simple and elegant pattern scanning and processing language. I would call it the
first and last simple scripting language.
awk is a little programming language, with a syntax close to C in many aspects. It is an
interpreted language and the awk interpreter processes the instructions.
About the syntax of the awk command interpreter itself:
awk is also one of the most portable scripting languages in existence. It was created in the late 1970s,
almost simultaneously with the Bourne shell. The name was composed from the initial
letters of its three original authors, Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan.
It is commonly used as a command-line filter in pipes to reformat the output of other commands.
It is a precursor of, and a main inspiration for, Perl. Although it originated in Unix, it is available and
widely used in the Windows environment too.
awk takes two inputs: a data file and a command file. The command file can be absent, and the
necessary commands can be passed as arguments. As Ronald P. Loui aptly noted, awk is a very
underappreciated language.
Most people are surprised when I tell them what language we use in our undergraduate AI
programming class. That’s understandable. We use GAWK. GAWK, Gnu’s version of Aho,
Weinberger, and Kernighan’s old pattern scanning language isn’t even viewed as a programming
language by most people. Like PERL and TCL, most prefer to view it as a “scripting language”.
There are three variations of awk:
AWK—the original from AT&T
NAWK—A newer, improved version from AT&T
GAWK—The Free Software foundation’s version.
The main advantage of awk is that, unlike Perl and other “scripting monsters”, it is very
slim, without the feature creep so characteristic of Perl, and thus it can be used very efficiently with
pipes. It also has a rather simple, clean syntax and, like the much heavier TCL, can be used with C for
“dual-language” implementations.
In awk you can become productive in several hours. For instance, to print only the second
and sixth fields of the date command (the month and year), with a space separating them, use:
date | awk '{print $2 " " $6}'
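The same field selection can be tried on a fixed line, so that the result does not depend on the current date. This is a small sketch; the sample line is made up to resemble the output of date.

```shell
#!/bin/sh
# Sketch: picking fields out of a line; the sample line mimics date output.
sample="Tue Jul 31 10:30:00 IST 2007"

echo "$sample" | awk '{print $2 " " $6}'    # second and sixth fields: Jul 2007
echo "$sample" | awk '{print NF}'           # number of fields on the line: 6
```

awk splits each input line on white space, so $1 is the day name, $2 the month, and $6 the year in this sample.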
Syntactically, a rule consists of a pattern followed by an action. The action is enclosed in
curly braces to separate it from the pattern. Newlines usually separate rules. Therefore, an awk
program looks like this:
pattern {action }
When you run awk, you specify an awk program that tells awk what to do. The program
consists of a series of rules. (It may also contain function definitions, an advanced feature that we
will ignore for now.) Each rule specifies one pattern to search for and one action to perform upon
finding the pattern.
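A two-rule program makes the pattern and action structure concrete. The following is a sketch; the function name print_matches and the three data lines are invented for the illustration.

```shell
#!/bin/sh
# Sketch: rules of the form pattern { action }; the data lines are made up.

# print_matches: run a two-rule awk program over three sample records.
print_matches() {
    printf '%s\n' "anoop 40" "ramesh 15" "sita 72" |
    awk '
        $2 > 30  { print $1 " scored above 30" }    # pattern is a comparison
        /^ram/   { print $1 " starts with ram" }    # pattern is a regular expression
    '
}

print_matches
```

Every record is tested against every rule in order, so a record may trigger no action, one action, or several.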
13.4.1 width
Pad field to this width as needed; fields that begin with a leading 0 are padded with zeros.
13.4.2 .prec
Specify maximum string width or digits to right of decimal point.
“awk printf conversion characters” lists the printf conversion characters.
c single character
d decimal number
e [-]d.ddddddE[+ –]dd
f [-]ddd.dddddd
g e or f conversion, whichever is shorter, with
nonsignificant zeros suppressed
o unsigned octal number
s string
x unsigned hexadecimal number
% print a %; no argument is converted
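The width, precision and conversion characters can be seen side by side in a short sketch. The wrapper function fmt_demo and the sample values are assumptions made for the illustration.

```shell
#!/bin/sh
# Sketch: awk printf width, precision and conversion characters (values are made up).

# fmt_demo: print a few values with different format specifications.
fmt_demo() {
    awk 'BEGIN {
        printf "%5d|\n", 42                 # width 5, padded with spaces
        printf "%05d|\n", 42                # leading 0 in the width pads with zeros
        printf "%.2f|\n", 3.14159           # two digits to the right of the point
        printf "%.3s|\n", "chaturvedi"      # at most three characters of the string
        printf "%o %x %c\n", 8, 255, "A"    # octal, hexadecimal, single character
        printf "100%%\n"                    # a literal percent sign
    }'
}

fmt_demo
```

The vertical bar after each field is only there to make the padding visible.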
Both scripts will output only those lines that don’t contain a matchme character sequence.
Again, you can choose the method that works best for your code. They both do the same thing.
awk also allows the use of boolean operators “||” (for “logical or”) and “&&”(for “logical
and”) to allow the creation of more complex boolean expressions:
( $1 == "foo" ) && ( $2 == "bar" ) { print }
initialize the counter n to zero, since awk does this automatically (see Variables). The second rule
increments the variable n every time a record containing the pattern ‘foo’ is read. The END rule
prints the value of n at the end of the run.
The special patterns BEGIN and END cannot be used in ranges or with Boolean operators
(indeed, they cannot be used with any operators). An awk program may have multiple BEGIN
and/or END rules. They are executed in the order in which they appear: all the BEGIN rules at
startup and all the END rules at termination. BEGIN and END rules may be intermixed with other
rules. This feature was added in the 1987 version of awk and is included in the POSIX standard.
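A counting program of the kind described above can be sketched as follows. The function name count_foo and the three input lines are made up; n + 0 is used so that 0 is printed even when no record matches.

```shell
#!/bin/sh
# Sketch: BEGIN and END rules around a counting rule (input lines are made up).

# count_foo: count the records on standard input that contain foo.
count_foo() {
    awk '
        BEGIN  { print "Scanning..." }                           # once, before any input
        /foo/  { n++ }                                           # for each record containing foo
        END    { print "foo seen in " (n + 0) " records" }       # once, after the last record
    '
}

printf '%s\n' "foo" "bar" "food fight" | count_foo
```

The variable n needs no initialization in BEGIN, since awk treats an unset variable as zero in arithmetic.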
You can use while loop as follows:
Syntax:
while (condition)
{
statement1
statement2
statementN
Continue as long as given condition is TRUE
}
While loop will continue as long as given condition is TRUE. To understand the while loop
lets write one more awk script:
Example:
$ cat > while_loop
{
no = $1
remn = 0
# peel off the last digit until nothing is left
while ( no > 0 )
{
remn = no % 10
no = int(no / 10)
printf "%d" ,remn
}
printf "\nNext number please (CTRL+D to stop):";
}
Run it as
$ awk -f while_loop
654
456
Next number please (CTRL+D to stop):587
785
ARGC is the number of command-line arguments present. Unlike most awk arrays, ARGV is
indexed from zero to ARGC-1. For example:
$ awk 'BEGIN {
>    for (i = 0; i < ARGC; i++)
>        print ARGV[i]
> }' inventory-shipped BBS-list
-| awk
-| inventory-shipped
-| BBS-list
In this example, ARGV[0] contains "awk", ARGV[1] contains "inventory-shipped",
and ARGV[2] contains "BBS-list". The value of ARGC is three, one more than the index of the
last element in ARGV, since the elements are numbered from zero.
The following fragment processes ARGV in order to examine, and then remove, command line
options.
BEGIN {
    for (i = 1; i < ARGC; i++) {
        if (ARGV[i] == "-v")
            verbose = 1
        else if (ARGV[i] == "-d")
            debug = 1
        else if (ARGV[i] ~ /^-./) {
            # any other argument that starts with a dash is an unknown option
            e = sprintf("%s: unrecognized option -- %c",
                        ARGV[0], substr(ARGV[i], 2, 1))
            print e > "/dev/stderr"
        } else
            break
        delete ARGV[i]
    }
}
ENVIRON
An associative array that contains the values of the environment. The array indices are
the environment variable names; the values are the values of the particular environment
variables. For example, ENVIRON["HOME"] might be "/home/arnold".
FILENAME
This is the name of the file that awk is currently reading. When no data files are listed
on the command line, awk reads from the standard input, and FILENAME is set to "-".
FILENAME is changed each time a new file is read.
FNR
FNR is the current record number in the current file. FNR is incremented each time a
new record is read. It is reinitialized to zero each time a new input file is started.
NF
NF is the number of fields in the current input record. NF is set each time a new record
is read, when a new field is created, or when $0 changes.
NR
This is the number of input records awk has processed since the beginning of the
program’s execution. NR is set each time a new record is read.
RLENGTH
RLENGTH is the length of the substring matched by the match function. RLENGTH is
set by invoking the match function. Its value is the length of the matched string, or
–1 if no match was found.
RSTART
RSTART is the start-index in characters of the substring matched by the match function.
RSTART is set by invoking the match function. Its value is the position of the string
where the matched substring starts, or zero if no match was found.
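RSTART and RLENGTH are easiest to understand by watching them change. The sketch below calls match on an arbitrary sample string, once with a pattern that matches and once with one that does not; the string "foobarbaz" and both patterns are assumptions chosen for the demonstration.

```shell
out=$(awk 'BEGIN {
    s = "foobarbaz"
    if (match(s, /ba+r/))            # "bar" starts at character 4 and is 3 long
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    match(s, /xyz/)                  # no match this time
    print RSTART, RLENGTH
}')
printf '%s\n' "$out"
```

Passing RSTART and RLENGTH to substr, as in the first print, is a common way to extract exactly the text that matched.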
A side note about NR and FNR. awk simply increments both of these variables each time it
reads a record, instead of setting them to the absolute value of the number of records read. This
means that your program can change these variables, and their new values will be incremented
for each record. For example:
$ echo '1
> 2
> 3
> 4' | awk 'NR == 2 { NR = 17 }
> { print NR }'
-| 1
-| 17
-| 18
-| 19
close($2)
} else
print
}'
Note here how the name of the extra input file is not built into the program; it is taken directly
from the data, from the second field on the ‘@include’ line.
The close function is called to ensure that if two identical ‘@include’ lines appear in
the input, the entire specified file is included twice.
atan2(y, x)
This gives you the arctangent of y/x in radians.
rand()
This gives you a random number. The values of rand are uniformly distributed between
zero and one. The value can be zero but is never one. Often you want random integers
instead. Here is a user-defined function you can use to obtain a random non-negative
integer less than n:
function randint(n) {
    return int(n * rand())
}
The multiplication produces a random real number greater than or equal to zero and less than n.
We then make it an integer (using int) between zero and n - 1, inclusive.
srand([x])
The function srand sets the starting point, or seed, for generating random numbers to
the value x. If you omit the argument x, as in srand(), then the current date and time
of day are used for a seed. This is the way to get random numbers that differ from one
run to the next. The return value of srand is the previous seed.
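The effect of seeding can be seen by running the randint function twice with the same seed. In this sketch the seed 42, the range 0 to 9 and the shell function name draw are arbitrary assumptions; the actual numbers printed depend on the awk implementation, so no particular values should be expected.

```shell
# Draw five random integers in the range 0..9 using randint from the text.
# The seed 42 is an arbitrary value picked so that runs are repeatable.
draw() {
    awk 'function randint(n) { return int(n * rand()) }
         BEGIN {
             srand(42)
             for (i = 1; i <= 5; i++)
                 print randint(10)
         }'
}

first=$(draw)
second=$(draw)
printf '%s\n' "$first"

# With an explicit seed, both runs should produce the same sequence.
[ "$first" = "$second" ] && echo "same sequence both times"
```

Replacing srand(42) with srand() would seed from the clock instead, so the two runs would normally differ unless they happen to start within the same second.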
print str
}'
-| dcaacbaaa
gsub(regexp, replacement [, target])
This is similar to the sub function, except gsub replaces all of the longest, leftmost, non-
overlapping matching substrings it can find. The ‘g’ in gsub stands for “global,” which
means replace everywhere. For example:
awk '{ gsub(/Britain/, "United Kingdom"); print }'
replaces all occurrences of the string ‘Britain’ with ‘United Kingdom’ for all input
records. The gsub function returns the number of substitutions made.
substr(string, start [, length])
This returns a length-character-long substring of string, starting at character number
start. The first character of a string is character number one. For example,
substr("washington", 5, 3) returns "ing". If length is not present, this function returns
the whole suffix of string that begins at character number start. For example,
substr("washington", 5) returns "ington". The whole suffix is also returned if length is
greater than the number of characters remaining in the string.
tolower(string)
This returns a copy of string, with each upper-case character in the string replaced with
its corresponding lower-case character. For example, tolower("MiXeD cAsE 123")
returns "mixed case 123".
toupper(string)
This returns a copy of string, with each lower-case character in the string replaced with
its corresponding upper-case character. For example, toupper("MiXeD cAsE 123")
returns "MIXED CASE 123".
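The string functions described above can be exercised together in one short run. The input line "Britain and Britain" is a made-up sample; the other arguments are the ones used in the text.

```shell
out=$(echo "Britain and Britain" | awk '{
    n = gsub(/Britain/, "United Kingdom")   # changes $0 and returns the count
    print n ": " $0
    print substr("washington", 5, 3)        # three characters from position 5
    print substr("washington", 5)           # the whole suffix from position 5
    print tolower("MiXeD cAsE 123")
    print toupper("MiXeD cAsE 123")
}')
printf '%s\n' "$out"
```

Because no target is given, gsub operates on the whole input record, which is why the modified text is available afterwards in $0.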
are not accessible in the function definition. The function body can contain expressions which
call functions. They can even call this function, either directly or by way of another function.
When this happens, we say the function is recursive.
In many awk implementations, including gawk, the keyword function may be abbreviated
func. To ensure that your awk programs are portable, always use the keyword function when
defining a function.
BEGIN {
a[1] = 1; a[2] = 2; a[3] = 3
changeit(a, 2, "two")
printf "a[1] = %s, a[2] = %s, a[3] = %s\n",
a[1], a[2], a[3]
}
This program prints ‘a[1] = 1, a[2] = two, a[3] = 3’, because changeit stores
“two” in the second element of a.
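The definition of changeit is not shown above. A minimal sketch that behaves as described, assuming the function simply assigns its third argument to the element selected by its second argument, could look like this (the parameter names array, ind and nvalue are assumptions):

```shell
out=$(awk '
# Assumed definition: store nvalue in element ind of the array passed in.
# Arrays are passed by reference, so the change is visible to the caller.
function changeit(array, ind, nvalue) {
    array[ind] = nvalue
}
BEGIN {
    a[1] = 1; a[2] = 2; a[3] = 3
    changeit(a, 2, "two")
    printf "a[1] = %s, a[2] = %s, a[3] = %s\n", a[1], a[2], a[3]
}')
printf '%s\n' "$out"
```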
The expression part is optional. If it is omitted, then the returned value is undefined and,
therefore, unpredictable. A return statement with no value expression is assumed at the end of
every function definition. So if control reaches the end of the function body, then the function
returns an unpredictable value. awk will not warn you if you use the return value of such a
function.
Here is an example of a user-defined function that returns a value for the largest number
among the elements of an array:
function maxelt(vec,   i, ret)
{
    for (i in vec) {
        if (ret == "" || vec[i] > ret)
            ret = vec[i]
    }
    return ret
}
You call maxelt with one argument, which is an array name. The local variables i and ret
are not intended to be arguments.
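A short run shows how maxelt might be called. In this sketch the input numbers and the array name nums are made up; each input record is stored in the array, and the END rule passes only the array name to the function.

```shell
out=$(printf '%s\n' 12 7 30 25 | awk '
function maxelt(vec,   i, ret) {
    for (i in vec) {
        if (ret == "" || vec[i] > ret)
            ret = vec[i]
    }
    return ret
}
{ nums[NR] = $1 }              # collect one number per input record
END { print maxelt(nums) }     # only the array is passed; i and ret stay local
')
printf '%s\n' "$out"
```

The extra spaces before i and ret in the parameter list are a common awk convention for marking local variables; they have no effect on how the function works.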
The interpreter makes one pass of the file to analyze it and if there are no syntax or other
obvious errors, the interpreter runs the Perl code. There is no “main” function—the interpreter just
executes the statements in the file starting at the top.
Following the Unix convention, the very first line in a Perl file usually looks like this...
#!/usr/bin/perl -w
This special line is a hint to Unix to use the Perl interpreter to execute the code in this file.
The "-w" switch turns on warnings, which is generally a good idea. In Unix, use "chmod" to set
the execute bit on a Perl file so it can be run right from the prompt...
> chmod u+x foo.pl    ## set the "execute" bit for the file once
>
> ./foo.pl            ## automatically uses the perl interpreter to "run" this file
The second line in a Perl file is usually a “require” declaration that specifies what version
of Perl the program expects ...
#!/usr/bin/perl -w
require 5.004;
Perl is available for every operating system imaginable, including of course Windows and
MacOS, and it’s part of the default install in Mac OSX.
2. Contained in the file specified by the first filename on the command line.
3. Passed in implicitly via standard input. This only works if there are no filename
arguments; to pass arguments to a stdin script you must explicitly specify a - for the
script name.
After locating your script, perl compiles it to an internal form. If the script is syntactically
correct, it is executed.
13.12.1 Strings
String constants are enclosed within double quotes (") or in single quotes ('). Strings in
double quotes are treated specially: special directives like \n (newline) and \x20 (hex 20) are
expanded. More importantly, a variable, such as $x, inside a double quoted string is evaluated
at run-time and the result is pasted into the string. This evaluation of variables into strings is
called "interpolation" and it's a great Perl feature. Single quoted (') strings suppress all the special
evaluation: they do not evaluate \n or $x, and they may contain newlines.
$fname = "binky.txt";
$a = "Could not open the file $fname.";   ## $fname evaluated and pasted in -- neato!
$b = 'Could not open the file $fname.';   ## single quotes (') do no special evaluation
## $a is now "Could not open the file binky.txt."
## $b is now "Could not open the file $fname."
The characters ‘$’ and ‘@’ are used to trigger interpolation into strings, so those characters
need to be escaped with a backslash (\) if you want them in a string. For example:
"nick\@stanford.edu found \$1".
The dot operator (.) concatenates two strings. If Perl has a number or other type when it wants
a string, it just silently converts the value to a string and continues. It works the other way
too—a string such as “42” will evaluate to the integer 42 in an integer context.
$num = 42;
$string = "The " . $num . " ultimate" . " answer";
## $string is now "The 42 ultimate answer"
use warnings;
my $string = 'Anoop';
my $chr = chop($string);
print "String: $string\n";
print "Char : $chr\n";
This program gives you:
String: Anoo
Char : p
If the string is empty, chop() will return an empty string. If the string is undefined, chop()
will return undefined.
Example 2. Chopping strings in an array
If you pass the chop() function an array, it will remove the last character from every element
in the array.
Note that this will only work for a one-dimensional array. In other words, it is not valid to
pass in an array reference, or an array that contains an array (or hash).
#!/usr/bin/perl
use strict;
use warnings;
my @array = ('fred', 'bob', 'jill', 'joan');
my $chr = chop(@array);
foreach my $str (@array) {
    print "$str\n";
}
print "Char: $chr\n";
This produces the output:
fre
bo
jil
joa
Char: n
Example 3. Chopping strings in a hash
If you pass a hash into chop(), it will remove the last character from the values (not the
keys) in the hash. For example:
#!/usr/bin/perl
use strict;
use warnings;
my %hash = (
    first => 'one',
Arithmetic Operators:
Operator   Example   Result      Definition
+          7 + 7     = 14        Addition
-          7 - 7     = 0         Subtraction
*          7 * 7     = 49        Multiplication
/          7 / 7     = 1         Division
**         7 ** 7    = 823543    Exponents
%          7 % 7     = 0         Modulus
With these operators we can take a number and perform some simple math operations.
$x = 81;
$add = $x + 9;
$sub = $x - 9;
$mul = $x * 10;
$div = $x / 9;
$exp = $x ** 5;
$mod = $x % 79;
print "$x plus 9 is $add<br />";
print "$x minus 9 is $sub<br />";
print "$x times 10 is $mul<br />";
print "$x divided by 9 is $div<br />";
print "$x to the 5th is $exp<br />";
print "$x modulus 79 is $mod<br />";
Your browser should read:
arithmetic.pl:
81 plus 9 is 90
81 minus 9 is 72
81 times 10 is 810
81 divided by 9 is 9
81 to the 5th is 3486784401
81 modulus 79 is 2
Assignment Operators:
Operator   Definition       Example
+=         Addition         ($x += 10)
-=         Subtraction      ($x -= 10)
*=         Multiplication   ($x *= 10)
/=         Division         ($x /= 10)
%=         Modulus          ($x %= 10)
**=        Exponent         ($x **= 10)
Logical operators express and/or relationships: you can take two conditions and test whether
both hold or whether either one holds. Logical operators are used later on in conditionals and
loops. For now, just be able to recognize them in the upcoming examples.
Logical/Relational Operators:
Relational
13.15.1 Logical
Shortens to:
while (<FH>) {
/Perl/ and
print FHO ;
print uc;
}
Note that the English module adds in the ability to refer to the special variables by other
longer, but easier to remember, names such as @ARG for @_ and $PID for $$. But use English;
can have a detrimental performance effect if you’re matching regular expressions against long
incoming strings.
13.17 ARRAYS— @
List arrays (also known simply as “arrays” for short) take the concept of scalar variables to
the next level. Whereas scalar variables associate one value with one variable name, list arrays
associate one array name with a “list” of values.
Array constants are specified using parentheses ( ) and the elements are separated with
commas. Perl arrays are like lists or collections in other languages since they can grow and shrink,
but in Perl they are just called “arrays”. Array variable names begin with the at-sign (@). Unlike
C, the assignment operator (=) works for arrays—an independent copy of the array and its
elements is made. Arrays may not contain other arrays as elements. Perl has sort of a “1-deep”
mentality. Actually, it’s possible to get around the 1-deep constraint using “references”, but it’s
no fun. Arrays work best if they just contain scalars (strings and numbers). The elements in an
array do not all need to be the same type. A list array is defined with the following syntax:
@array_name = ("element_1", "element_2"..."element_n");
For example, consider the following list array definition:
@available_colors = ("red", "green", "blue","brown");
@array = (1, 2, "hello"); ## a 3 element array
@empty = (); ## the array with 0 elements
$x = 1;
$y = 2;
@nums = ($x + $y, $x - $y);
## @nums is now (3, -1)
Just as in C, square brackets [ ] are used to refer to elements, so $a[6] is the element at index
6 in the array @a. As in C, array indexes start at 0. Notice that the syntax to access an element
begins with ‘$’ not ‘@’—use ‘@’ only when referring to the whole array (remember: all scalar
expressions begin with $).
@array = (1, 2, "hello", "there");
$array[0] = $array[0] + $array[1]; ## $array[0] is now 3
Perl arrays are not bounds checked. If code attempts to read an element outside the array size,
undef is returned. If code writes outside the array size, the array grows automatically to be big
enough. Well written code probably should not rely on either of those features.
@array = (1, 2, "hello", "there");
$sum = $array[0] + $array[27];   ## $sum is now 1, since $array[27] returned undef
$array[99] = "the end";          ## array grows to be size 100
When used in a scalar context, an array evaluates to its length. The “scalar” operator will
force the evaluation of something in a scalar context, so you can use scalar() to get the length of
an array. As an alternative to using scalar, the expression $#array is the index of the last element
of the array which is always one less than the length.
@array = (1, 2, "hello", "there");
$len = @array;            ## $len is now 4 (the length of @array)
$len = scalar(@array);    ## same as above, since $len represented a scalar
                          ## context anyway, but this is more explicit
@letters = ("a", "b", "c");
$i = $#letters;           ## $i is now 2
That scalar(@array) is the way to refer to the length of an array is not a great moment
in the history of readable code. At least I haven't shown you the even more vulgar forms such
as (0 + @a).
The sort operator (sort @a) returns a copy of the array sorted in ascending alphabetic
order. Note that sort does not change the original array. Here are some common ways to sort...
(sort @array)                        ## sort alphabetically, with uppercase first
(sort {$a <=> $b} @array)            ## sort numerically
(sort {$b cmp $a} @array)            ## sort reverse alphabetically
(sort {lc($a) cmp lc($b)} @array)    ## sort alphabetically, ignoring case (somewhat inefficient)
The sort expressions above pass a comparator function {...} to the sort operator, where the
special variables $a and $b are the two elements to compare; cmp is the built-in string compare,
and <=> is the built-in numeric compare.
There’s a variant of array assignment that is used sometimes to assign several variables at
once. If an array on the left hand side of an assignment operation contains the names of variables,
the variables are assigned the corresponding values from the right hand side.
($x, $y, $z) = (1, 2, "hello", 4);
## assigns $x=1, $y=2, $z="hello", and the 4 is discarded
This type of assignment only works with scalars. If one of the values is an array, the wrong
thing happens (see “flattening” below).
A hash array may be converted back and forth to an array where each key is immediately
followed by its value. Each key is adjacent to its value, but the order of the key/value pairs
depends on the hashing of the keys and so appears random. The “keys” operator returns an array
of the keys from an associative array. The “values” operator returns an array of all the values,
in an order consistent with the keys operator.
@array = %dict;
## @array will look something like
## ("homer", "D'oh", "lisa", "", "bart", "I didn't do it");
##
## (keys %dict) looks like ("homer", "lisa", "bart")
## or use (sort (keys %dict))
You can use => instead of comma and so write a hash array value this cute way...
%dict = (
    "bart" => "I didn't do it",
    "homer" => "D'oh",
    "lisa" => "",
);
In Java or C you might create an object or struct to gather a few items together. In Perl you
might just throw those things together in a hash array.
%ENV contains the environment variables of the context that launched the Perl program.
@ARGV and %ENV make the most sense in a Unix environment.
13.19.1 IF
if (expr) { ## basic if -- ( ) and { } required
stmt;
stmt;
}
13.19.2 If Variants
As an alternative to the classic if() { } structure, you may use if, while, and unless as
modifiers that come after the single statement they control ...
For these constructs, the parentheses are not required around the boolean expression. This
may be another case where Perl is using a structure from human languages. I never use this syntax
because I just cannot get used to seeing the condition after the statement it modifies. If you were
defusing a bomb, would you like instructions like this: “Locate the red wire coming out of the
control block and cut it. Unless it’s a weekday—in that case cut the black wire.”
13.19.3 Loops
These work just as in C...
while (expr) {
stmt;
stmt;
}
for (init_expr; test_expr; increment_expr) {
stmt;
stmt;
}
The "next" operator forces the loop to the next iteration. The "last" operator breaks out
of the loop like break in C. This is one case where Perl (last) does not use the same keyword name
as C (break).
$line = <STDIN>; ## read one line from the STDIN file handle
chomp($line); ## remove the trailing "\n" if present
$line2 = <FILE2>; ## read one line from the FILE2 file handle
## which must have been opened previously
Since the input operator returns undef at the end of the file, the standard pattern to read all
the lines in a file is ...
## read every line of a file
while ($line = <STDIN>) {
## do something with $line
}
Open returns undef on failure, so the following phrase is often used to exit if a file can't be opened.
The die operator prints an error message and terminates the program.
open(FILE, $fname) || die "Could not open $fname\n";
In this example, the logical-or operator || essentially builds an if statement, since it only
evaluates the second expression if the first is false. This construct is a little strange, but it is a
common code pattern for Perl error handling.
The behaviour of <FILE> also depends on the special global variable $/ which is the current
end-of-line marker (usually "\n"). Setting $/ to undef causes <FILE> to read the whole file
into a single string.
$/ = undef;
$all = <FILE>; ## read the whole file into one string
You can remember that $/ is the end-of-line marker because “/” is used to designate
separate lines of poetry. I thought this mnemonic was silly when I first saw it, but sure enough,
I now remember that $/ is the end-of-line marker.
An optional first argument to print can specify the destination file handle. There is no comma
after the file handle.
print FILE "Here", " there", " everywhere!", "\n"; ## no comma after FILE
The above uses “die” to abort the program if one of the files cannot be opened. We could
use a more flexible strategy where we print an error message for that file but continue to try to
process the other files. Alternately we could use the function call exit(-1) to exit the program
with an error code. Also, the following shift pattern is a common alternative way to iterate through
an array ...
while($fname = shift(@ARGV)) {...
Example 2
Here is the basic perl program which does the same as the UNIX cat command on a certain
file.
#!/usr/local/bin/perl
#
# Program to open the password file, read it in,
# print it, and close it again.
In the simplest case, the exact characters in the regular expression pattern must occur in the
string somewhere. All of the characters in the pattern must be matched, but the pattern does not
need to be right at the start or end of the string, and the pattern does not need to use all the
characters in the string.
#### Matches and stops at first 'cat'; does not get to 'catcat' on the right
"cathatcatcat" =~ m/(c|a|t)+/ ==> TRUE
#### ? = optional
"12121x2121x2" =~ m/^(1x?2)+$/ ==> TRUE
"aaaxbbbabaxbb" =~ m/^(a+x?b+)+$/ ==> TRUE
"aaaxxbbb" =~ m/^(a+x?b+)+$/ ==> FALSE
#### Three words separated by spaces
"Easy does it" =~ m/^\w+\s+\w+\s+\w+$/ ==> TRUE
#### Just matches "gates@microsoft" -- \w does not match the "."
"[email protected]" =~ m/\w+@\w+/ ==> TRUE
#### Add the .'s to get the whole thing
"[email protected]" =~ m/^(\w|\.)+@(\w|\.)+$/ ==> TRUE
#### words separated by commas and possibly spaces
"Klaatu, barada,nikto" =~ m/^\w+(,\s*\w+)*$/ ==> TRUE
13.22 SUBROUTINES
Perl subroutines encapsulate blocks of code in the usual way. You do not need to define
subroutines before they are used, so Perl programs generally have their “main” code first, and
their subroutines laid out toward the bottom of the file. A subroutine can return a scalar or an
array.
$x = Three(); ## call to Three() returns 3
exit(0); ## exit the program normally
sub Three {
return (1 + 2);
}
13.22.2 @_ Parameters
Perl subroutines do not have formal named parameters like other languages. Instead, all the
parameters are passed in a single array called “@_”. The elements in @_ actually point to the
original caller-side parameters, so the called function is responsible for making any copies.
Usually the subroutine will pull the values out of @_ and copy them to local variables. A Sum1()
function which takes two numbers and adds them looks like...
sub Sum1 {
    my ($x, $y) = @_;    # the first lines of many functions look like this
                         # to retrieve and name their params
    return($x + $y);
}
You can have any number of “&” in the replacement string. You could also double a pattern,
e.g., the first number of a line:
% echo "123 abc" | sed 's/[0-9]*/& &/'
123 123 abc
Let me slightly amend this example. Sed will match the first string, and make it as greedy
as possible. The first match for '[0-9]*' is the first character on the line, as this matches zero or
more numbers. So if the input was "abc 123" the output would be unchanged, except for a
space inserted before the letters. A better way to duplicate the number is to make sure it
matches a number:
% echo "123 abc" | sed 's/[0-9][0-9]*/& &/'
123 123 abc
The string “abc” is unchanged, because it was not matched by the regular expression.
echo abcd123 | sed 's/\([a-z]*\).*/\1/'
This will output “abcd” and delete the numbers.
If you want to switch two words around, you can remember two patterns and change the
order around:
sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/'
Note the space between the two remembered patterns. This is used to make sure two words
are found.
The “\1” doesn’t have to be in the replacement string (in the right hand side). It can be in
the pattern you are searching for (in the left hand side). If you want to eliminate duplicated words,
you can try:
sed 's/\([a-z]*\) \1/\1/'
You can have up to nine values: "\1" through "\9".
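The substitutions discussed in this section can be run side by side. In the sketch below the commands are the ones from the text; the inputs "hello world" and "the the cat" and the shell variable names are made up for illustration.

```shell
# Duplicate the leading number with "&".
dup=$(echo "123 abc" | sed 's/[0-9][0-9]*/& &/')

# Keep only the remembered leading letters.
keep=$(echo abcd123 | sed 's/\([a-z]*\).*/\1/')

# Swap two words by reversing the remembered patterns.
swap=$(echo "hello world" | sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/')

# Remove a duplicated word by using \1 on the left hand side.
dedup=$(echo "the the cat" | sed 's/\([a-z]*\) \1/\1/')

printf '%s\n' "$dup" "$keep" "$swap" "$dedup"
```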
13.23.5 /p-Print
By default, sed prints every line. If it makes a substitution, the new text is printed instead of
the old one. If you use an optional argument to sed, “sed -n,” it will not, by default, print any
new lines. I’ll cover this and other options later. When the “-n” option is used, the “p” flag will
cause the modified line to be printed. Here is one way to duplicate the function of grep with sed:
sed -n 's/pattern/&/p' <file
Without the "-n" option, the command sed 's/PATTERN/&/p' file
acts like the cat program if PATTERN is not in the file: nothing is changed. If PATTERN is
in the file, then each line that has this is printed twice. Add the "-n" option and the example acts
like grep:
sed -n 's/PATTERN/&/p' file
Nothing is printed, except those lines with PATTERN included.
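The effect of "-n" together with the "p" flag can be checked on a few lines of input. The three-line sample below is made up; the variable plain holds the output without "-n" and quiet holds the output with it.

```shell
# Made-up input: only the middle line contains PATTERN.
data='alpha
beta PATTERN
gamma'

# Without -n every line is printed, and the matching line is printed twice.
plain=$(printf '%s\n' "$data" | sed 's/PATTERN/&/p')

# With -n only the lines where the substitution succeeded are printed.
quiet=$(printf '%s\n' "$data" | sed -n 's/PATTERN/&/p')

printf 'without -n:\n%s\n' "$plain"
printf 'with -n:\n%s\n' "$quiet"
```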
8. Fast and Easy Installation: Most Linux distributions come with user-friendly installation
and setup programs. Popular Linux distributions also come with tools that make installation
of additional software very user friendly.
9. Full use of Hard Disk: Linux continues to work well even when the hard disk is almost
full.
10. Multitasking: Linux is designed to do many things at the same time. For example, a large
printing job in the background would not slow down your other work.
11. Security: Linux is one of the most secure operating systems. Linux users have the option to
select and safely download software, free of charge, from online repositories containing
thousands of high-quality packages. No purchase transactions requiring credit card
numbers or other sensitive personal information are involved.
12. Open Source: If you develop software that requires knowledge or modification of the
operating system code, the Linux source code is at your fingertips. Most Linux applications
are open source as well.
The Shell
The shell is a program that provides an interpreter and interface between the user and the
Linux Operating System. It executes commands that are read either from a terminal or from a file.
Files containing commands may be created, allowing users to build their own commands. In this
manner, users may tailor Linux to their individual requirements and style.
There are a number of different Shells. Each provides a slightly different interface between
the user and the Linux Operating System. The most important shells that originated from the Unix
operating system are:
There are other shells that are less widely used and not available on many machines. For
example, there is the Restricted Shell, rsh. This restricts the user to his or her own
directory, thus limiting access to all other users' files.
All of these shell interfaces are available to Linux. However, other shells have
been developed since, most generally for Linux: ash, tcsh and zsh are available on most versions
of Linux. The most widely used shell on Linux is the Bourne-Again shell
(bash). Based on the original Bourne shell, it has extensions similar to those of the Korn shell,
plus its own further extensions.
Linux also offers a windows-based shell interface, commonly known as X-Windows or
simply as X. More akin to the Macintosh windows than Microsoft Windows, it is another method
of interfacing with the Linux kernel. However, X-Windows interfaces are not considered in this
course.
A command issued by a user may be run in the present shell, or the shell itself may start
up another copy of itself in which to run that command. In this way, a user may run several
commands at the same time. A secondary shell is called a sub-shell.
When a user logs onto the System, a shell is automatically started. This will monitor the
user's terminal, waiting for the issue of any commands. The type of shell used is stored in a file
called passwd in the directory /etc (see Section 3.2). Any other shell may be run as a sub-shell
by issuing it as a command. For example, /usr/bin/ksh will run a Korn shell. The original shell
will still be running—in background mode—until the Korn shell is terminated.
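To make the role of the passwd file concrete, the sketch below pulls the login shell out of a passwd-style record. The record is a made-up sample (user anoop with shell /bin/bash), not a line from a real system; on a live system the same cut command could be applied to the matching line of /etc/passwd.

```shell
# A made-up passwd record: name:password:UID:GID:comment:home:shell
record='anoop:x:1000:1000:Anoop:/home/anoop:/bin/bash'

# The login shell is the seventh colon-separated field, the home directory the sixth.
login_shell=$(printf '%s\n' "$record" | cut -d: -f7)
home_dir=$(printf '%s\n' "$record" | cut -d: -f6)

printf 'login shell: %s\nhome directory: %s\n' "$login_shell" "$home_dir"
```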
Users
Linux is a multi-user operating system. Each user needs to create and access his or her
own files. These files must be secure from other users on the system. Because of this, each user
has a unique identification on a Linux system, with the option of a password to enhance security.
There are two types of user on a Linux system:
• Ordinary Users
An ordinary user has a Home Directory under which files and sub-directories are nor-
mally stored. After logging onto the system, a user is normally taken directly to that
directory.
An ordinary user is a member of a Group of users. For security reasons, files (and
directories) owned by a user may be accessed and used by the user, other members of the
user’s group and all other users at different levels of permission. For example, a file may
be read and altered by the user that owns it, may only be read by other members of the
same group and may not be accessed at all by any other user.
• Super-User
A super-user is a privileged user who has full access to all files, regardless of who
owns them or what their access permissions are.
The super-user has a position of responsibility: to administer and maintain the system.
The super-user is normally known as root. root’s Home directory is the primary directory
of the system, under which all other directories and all files are stored.
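The permission example given for ordinary users (the owner may read and alter a file, the group may only read it, and other users have no access) corresponds to mode 640. The sketch below applies that mode to a scratch file created with mktemp and shows the resulting permission string; the variable names are assumptions.

```shell
# Create a scratch file and apply the example permissions from the text:
# owner may read and write, group may only read, others get no access.
scratch=$(mktemp)
chmod 640 "$scratch"

# The first ten characters of a long listing show the file type and permission bits.
perms=$(ls -l "$scratch" | cut -c1-10)
printf '%s\n' "$perms"

rm -f "$scratch"
```

Reading the string from left to right: the first character is the file type, followed by three characters each for the owner, the group and all other users.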
EXERCISES
34. Discuss the unlink system call. How do you unlink an opened file?
35. Suppose a directory has read permission for a user but not execute permission. What
happens when the directory is used as a parameter to ls with the "-i" option? What
about the -l option?
36. What strange things could happen if the kernel allowed two processes to mount the
same file system simultaneously at two mount points?
37. When executing the command ls -ld on a directory, note that the number of links to the
directory is never 1. Why?
38. Explain in detail premature termination of a process.
39. How is /etc/passwd updated by any user, while changing his password, even though
the file does not have write permission?
40. Design an algorithm that translates a virtual address to a physical address, given the
virtual address and the address of the pregion entry.
41. Design an algorithm for allocating and freeing memory pages and page tables. What
data structure would allow best performance or simplest implementation?
42. It's possible to implement the system such that the kernel stack grows on top of the user
stack. Discuss the advantages and disadvantages of such an implementation.
43. Suppose a process goes to sleep and the system contains no process ready to run. What
happens when the sleeping process does its context switch?
44. What happens if the kernel issues a wakeup call for all processes asleep on address A,
but no processes are asleep on that address at that time?
45. Explain the following UNIX commands for communication:
(i) news
(ii) mail
(iii) wall
(iv) write
(v) mesg
(vi) crontab
46. How does the fork() system call create a new process? Write an algorithm for the fork
system call.
47. Draw the process state diagram and algorithm for checking and handling signals.
48. Write a shell script that prints the current date, user name and the name of your login
shell.
49. When executing the command ls -ld on a directory, the number of links of the directory
is never 1. Why?
50. When the shell creates a new process to execute a command, how does it know that
the file is executable? If it is executable, how does it distinguish between a shell script
and a file produced by a compilation? What is the correct sequence for checking the
above cases?
51. Write a menu-driven program which has the following options:
(i) Contents of /etc/passwd
(ii) Present working directory
(iii) List of users who are currently logged in
(iv) Exit
52. Illustrate the development of the open General Public Licence in the case of the Linux OS.
Give the history of the development of the Linux operating system.
53. The algorithms iget and iput do not require the processor execution level to be raised to
block out interrupts. What does this imply?
54. Describe the implementation of the kill system call.
55. A process checks for signals when it enters or leaves the sleep state and when it returns
to user mode from the kernel after completion of a system call or after handling an
interrupt. Why does the process not have to check for signals when entering the system
for execution of a system call?
56. Explain the security problem that exists if a setuid program is not write-protected.
57. If Anoop uses the su command to become a super user, he can't execute any of the shell
scripts in his directory. Explain why.
58. Explain the use of the following shell variables: $#, $*, $@ and $.
59. Write a shell script that reports, in descending order of size, the name and size of all
files whose size exceeds 40 bytes in a specific directory (supplied as an argument).
The total number of files found should also be displayed.
60. Mention the different commands of the grep family and explain each one of them very briefly.
Is it possible to use multiple search patterns with all the grep family of commands?
61. Write a shell script that would pick up all 'C' program files from the current directory
and add the extension '.CPP' at the end of each such file.
62. Write a sed command to count the number of students born in the year 1977 from the
database.
63. Discuss how one can insert text before the contents of an input file using sed.
64. When the shell creates a new process to execute a command, how does it know that
the file is executable? If it is executable, how does it distinguish between a shell script
and a file produced by a compilation? What is the correct sequence for checking the
above cases?
65. What is the function of the following UNIX commands? Explain with examples, writing
the proper syntax of these commands:
(i) tr
(ii) awk
(iii) grep, egrep, fgrep
(iv) finger
(v) batch
(vi) bc
(vii) sort
(viii) cut
(ix) copy
(x) umask
66. What do you understand by mounting and unmounting a file system in UNIX? How
is this achieved?
67. What are the basic functions of the shell? Discuss the different types of shell used in the UNIX OS.
68. What is shell programming? Write a shell script for taking a backup in UNIX.
69. What is the meaning of the password file and the group file? List the different entries
existing in these files.
70. Explain the case statement in UNIX.
71. Write the algorithm for process scheduling. List the scheduling parameters.
72. Write the algorithm for a client process and for receiving a message.
73. How is shared memory attached once or twice to a process?
74. Describe the sockets model in detail.
75. What is the problem of a multiprocessor system? How can it be solved with master and
slave processors and with semaphores?
76. Describe the Linux structure and also define the features of Linux.
77. What are the special built-in patterns in awk? Describe these patterns.
78. What are the advantages of the delayed write mechanism?
79. Write an algorithm for allocation of a buffer for a block. Trace your algorithm for all
the possible variations in input data.
80. What are the contents of an in-core inode, and what additional information is to be stored
in the in-core inode and why?
81. Give the layout of the UNIX system memory. Describe each section.
82. Write down the security features of UNIX.
83. What are awk patterns? Describe the BEGIN and END patterns.
84. Describe in brief any one technique of process synchronization used in UNIX.
85. What are pipes? Differentiate between named and unnamed pipes.
86. Describe how the write system call works. What are its input parameters and what
information does it return? Describe with the help of an algorithm.
87. Discuss the structure of a regular file. How can a byte offset be converted into a block
number? Give the algorithm.
88. Write short notes on:
(i) Features of Linux
(ii) Device Driver
(iii) Mounting of a file system
G
Getline, 137
Grep Family, 116

H
How to Run awk Programs?, 129
Hash Arrays-%, 159

I
Incore Copy of inode, 35
Inodes, 13
Internal and External Command, 105
Interrupts and Exceptions, 8

K
Kernel Data Structure, 21, 71
Knoppix, 180
Korn Shell, 103

L
Layout of System Memory, 72
Link, 66
Linux, 177
Linux Distributions, 178
Linux Operating System, 181
Loops in awk, 131
LSeek, 54

M
Messages, 92
mknod, 56
Mount, 61
Multiprocessor Systems, 99

O
Open, 49

P
@_Parameters, 170
Pattern Matching, 112
Per Process Region Table, 18
Perl, 148
Perl Chop() Function, 150
Perl Script, 149
Perl- $_ and @_, 156
PERL- Arithmetic Operators, 152
PERL- Assignment Operators, 153
PERL-Logical and Relational Operators, 154
Pipes, 58
Process, 15
Process Creation, 83
Process Data Structure, 17
Process State and Transitions, 18, 70
Process Table, 17
Process Termination, 89
Process Tracing, 90
Process Tree and Sharing Pipes, 59

R
Race condition in assigning inodes, 45
Read, 50
A Reader and A Writer Process, 53
Region Table, 17
Relation expression, 110
Remembered Inode, 43
Remove file and directory, 106

S
Scenarios for retrieval of a Buffer, 25
Sed, 171
Semaphores, 95, 102
Shared Memory, 93
Shell Programming, 117