[Figure: CPU and memory. The CPU holds the registers, EIP, and condition codes; memory holds object code, program data, OS data, and the stack; the two are connected by address, data, and instruction paths.]
Programmer-Visible State
EIP (Program Counter)
  Address of next instruction
Register File
  Heavily used program data
Condition Codes
  Store status information about most recent arithmetic operation
  Used for conditional branching
Memory
  Byte addressable array
  Code, user data, (most) OS data
  Includes stack used to support procedures
Turning C into Object Code
Code in files p1.c p2.c
Compile with command: gcc -O p1.c p2.c -o p
Use optimizations (-O)
Put resulting binary in file p
Primitive operations
Perform arithmetic function on register or memory data
Transfer data between memory and register
Load data from memory into register
Store register data into memory
Transfer control
Unconditional jumps to/from procedures
Conditional branches
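The slides that follow disassemble a routine named sum. Its source is not shown in this deck; presumably it is a two-argument add along these lines (a sketch, not the actual p1.c):

```c
/* Hypothetical source for the sum routine disassembled in the
   following slides; the real p1.c/p2.c contents are not shown. */
int sum(int x, int y)
{
    return x + y;   /* the add compiles to a single addl on IA32 */
}
```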
Object Code
Code for sum
0x401040 <sum>:
  0x55
  0x89
  0xe5
  0x8b
  0x45
  0x0c
  0x03
  0x45
  0x08
  0x89
  0xec
  0x5d
  0xc3
• Total of 13 bytes
• Each instruction 1, 2, or 3 bytes
• Starts at address 0x401040

Assembler
  Translates .s into .o
  Binary encoding of each instruction
  Nearly-complete image of executable code
  Missing linkages between code in different files

Linker
  Resolves references between files
  Combines with static run-time libraries
    E.g., code for malloc, printf
  Some libraries are dynamically linked
    Linking occurs when program begins execution
Machine Instruction Example

C Code
  int t = x+y;
  Add two signed integers

Assembly
  addl 8(%ebp),%eax
  Add 2 4-byte integers
    "Long" words in GCC parlance
    Same instruction whether signed or unsigned
  Similar to expression y += x
  Operands:
    y: Register %eax
    x: Memory M[%ebp+8]
    t: Register %eax
      » Return function value in %eax

Object Code
  0x401046: 03 45 08
  3-byte instruction
  Stored at address 0x401046
Disassembling Object Code
Disassembled
00401040 <_sum>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 45 0c mov 0xc(%ebp),%eax
6: 03 45 08 add 0x8(%ebp),%eax
9: 89 ec mov %ebp,%esp
b: 5d pop %ebp
c: c3 ret
d: 8d 76 00 lea 0x0(%esi),%esi
Disassembler
objdump -d p
Useful tool for examining object code
Analyzes bit pattern of series of instructions
Produces approximate rendition of assembly code
Can be run on either a.out (complete executable) or .o file
Alternate Disassembly
Object (bytes starting at 0x401040): 55 89 e5 8b 45 0c 03 45 08 89 ec 5d c3

Disassembled
0x401040 <sum>:      push %ebp
0x401041 <sum+1>:    mov %esp,%ebp
0x401043 <sum+3>:    mov 0xc(%ebp),%eax
0x401046 <sum+6>:    add 0x8(%ebp),%eax
0x401049 <sum+9>:    mov %ebp,%esp
0x40104b <sum+11>:   pop %ebp
0x40104c <sum+12>:   ret
0x40104d <sum+13>:   lea 0x0(%esi),%esi

Within gdb Debugger
gdb p
disassemble sum
  Disassemble procedure
x/13b sum
  Examine the 13 bytes starting at sum
What Can Be Disassembled?
% objdump -d WINWORD.EXE
No symbols in "WINWORD.EXE".
Disassembly of section .text:
30001000 <.text>:
30001000: 55 push %ebp
30001001: 8b ec mov %esp,%ebp
30001003: 6a ff push $0xffffffff
30001005: 68 90 10 00 30 push $0x30001090
3000100a: 68 91 dc 4c 30 push $0x304cdc91
Moving Data
movl Source,Dest:
  Move 4-byte ("long") word
  Lots of these in typical code

Operand Types
  Immediate: Constant integer data
    Like C constant, but prefixed with '$'
    E.g., $0x400, $-533
    Encoded with 1, 2, or 4 bytes
  Register: One of 8 integer registers
    (%eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp)
    But %esp and %ebp reserved for special use
    Others have special uses for particular instructions
  Memory: 4 consecutive bytes of memory
    Various "address modes"
movl Operand Combinations
  Source: immediate, register, or memory; Destination: register or memory
  (a single instruction cannot do a memory-to-memory transfer)
  E.g., movl -4(%ebp),%ebx   # Mem → Reg

Finish
  movl %ebp,%esp
  popl %ebp
  ret
Understanding Swap

void swap(int *xp, int *yp)
{
    int t0 = *xp;
    int t1 = *yp;
    *xp = t1;
    *yp = t0;
}

Stack (offset from %ebp):
 12   yp
  8   xp
  4   Rtn adr
  0   Old %ebp   <- %ebp
 -4   Old %ebx
Register  Variable
%ecx      yp
%edx      xp
%eax      t1
%ebx      t0

movl 12(%ebp),%ecx   # ecx = yp
movl 8(%ebp),%edx    # edx = xp
movl (%ecx),%eax     # eax = *yp (t1)
movl (%edx),%ebx     # ebx = *xp (t0)
movl %eax,(%edx)     # *xp = eax
movl %ebx,(%ecx)     # *yp = ebx
Understanding Swap (step by step)

Initially xp (at offset 8) holds 0x124, where memory holds 123, and yp (at
offset 12) holds 0x120, where memory holds 456; %ebp = 0x104.

Instruction            Effect
movl 12(%ebp),%ecx     %ecx = 0x120 (yp)
movl 8(%ebp),%edx      %edx = 0x124 (xp)
movl (%ecx),%eax       %eax = 456 (t1 = *yp)
movl (%edx),%ebx       %ebx = 123 (t0 = *xp)
movl %eax,(%edx)       Mem[0x124] = 456 (*xp = t1)
movl %ebx,(%ecx)       Mem[0x120] = 123 (*yp = t0)

Afterward address 0x124 holds 456 and address 0x120 holds 123: the two
values have been swapped.
Indexed Addressing Modes
Most General Form
D(Rb,Ri,S)    Mem[Reg[Rb] + S*Reg[Ri] + D]
D: Constant “displacement” 1, 2, or 4 bytes
Rb: Base register: Any of 8 integer registers
Ri: Index register: Any, except for %esp
Unlikely you’d use %ebp, either
S: Scale: 1, 2, 4, or 8
Special Cases
(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]]
D(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]+D]
(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]]
Address Computation Examples
Assume %edx = 0xf000 and %ecx = 0x100.
Uses
  Computing an address without doing a memory reference
    E.g., translation of p = &x[i];
  Computing arithmetic expressions of the form x + k*y
    k = 1, 2, 4, or 8
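Since the D(Rb,Ri,S) rule is plain arithmetic, it can be checked directly. The operand forms below are illustrative examples (not taken from the slide), evaluated with %edx = 0xf000 and %ecx = 0x100:

```c
#include <stdint.h>

/* Effective address of D(Rb,Ri,S): Reg[Rb] + S*Reg[Ri] + D.
   Registers are modeled as plain 32-bit values. */
uint32_t ea(uint32_t d, uint32_t rb, uint32_t ri, uint32_t s)
{
    return rb + s * ri + d;
}
```

With edx = 0xf000 and ecx = 0x100: 0x8(%edx) is ea(0x8, edx, 0, 1) = 0xf008; (%edx,%ecx) = 0xf100; (%edx,%ecx,4) = 0xf400; 0x80(,%edx,2) = 0x1e080.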
Assembly
Data types: 1) byte, 2) 2-byte word, 3) 4-byte long word,
  4) contiguous byte allocation, 5) address of initial byte
Control: branch/jump, call, ret
Machine state: memory, registers, ALU, condition codes, stack
Whose Assembler?
Intel/Microsoft Format              GAS/Gnu Format
lea eax,[ecx+ecx*2]                 leal (%ecx,%ecx,2),%eax
sub esp,8                           subl $8,%esp
cmp dword ptr [ebp-8],0             cmpl $0,-8(%ebp)
mov eax,dword ptr [eax*4+100h]      movl 0x100(,%eax,4),%eax
© 2020 KL University
Common Concurrency Problems
More recent work focuses on studying other types of common concurrency bugs.
• Take a brief look at some example concurrency problems found in real code
bases.
• Focus on four major open-source applications
– MySQL, Apache, Mozilla, OpenOffice.
Application What it does Non-Deadlock Deadlock
MySQL Database Server 14 9
Apache Web Server 13 4
Mozilla Web Browser 41 16
Open Office Office Suite 6 2
Total 74 31
Atomicity-Violation Bugs
• Example (MySQL): Thread 2 can set proc_info to NULL between Thread 1's
  check and its use of the pointer.

Thread1::
if (thd->proc_info) {
    …
    fputs(thd->proc_info, …);
    …
}

Thread2::
thd->proc_info = NULL;
Atomicity-Violation Bugs (Cont.)
• Solution: Simply add locks around the shared-variable references.
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

Thread1::
pthread_mutex_lock(&lock);
if (thd->proc_info) {
    …
    fputs(thd->proc_info, …);
    …
}
pthread_mutex_unlock(&lock);

Thread2::
pthread_mutex_lock(&lock);
thd->proc_info = NULL;
pthread_mutex_unlock(&lock);
Order-Violation Bugs
• The desired order between two memory accesses
is flipped.
– i.e., A should always be executed before B, but the
order is not enforced during execution.
– Example:
• The code in Thread2 seems to assume that the variable
mThread has already been initialized (and is not NULL).
Thread1::
void init() {
    mThread = PR_CreateThread(mMain, …);
}

Thread2::
void mMain(…) {
    mState = mThread->State;
}
Order-Violation Bugs (Cont.)
• Solution: Enforce ordering using condition
variables
pthread_mutex_t mtLock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t mtCond = PTHREAD_COND_INITIALIZER;
int mtInit = 0;

Thread 1::
void init() {
    …
    mThread = PR_CreateThread(mMain, …);

    // signal that the thread has been created.
    pthread_mutex_lock(&mtLock);
    mtInit = 1;
    pthread_cond_signal(&mtCond);
    pthread_mutex_unlock(&mtLock);
    …
}

Thread2::
void mMain(…) {
    …
Order-Violation Bugs (Cont.)
    // wait for the thread to be initialized …
    pthread_mutex_lock(&mtLock);
    while (mtInit == 0)
        pthread_cond_wait(&mtCond, &mtLock);
    pthread_mutex_unlock(&mtLock);

    mState = mThread->State;
    …
}
Deadlock Bugs
Thread 1: Thread 2:
lock(L1); lock(L2);
lock(L2); lock(L1);
– The presence of a cycle
  • Thread 1 holds lock L1 and is waiting for another one, L2.
  • Thread 2 holds lock L2 and is waiting for L1 to be released.
[Figure: the deadlock cycle. Thread 1 holds Lock L1 and wants Lock L2; Thread 2 holds Lock L2 and wants Lock L1.]
Conditional for Deadlock
• Four conditions need to hold for a deadlock to
occur.
Condition          Description
Mutual Exclusion   Threads claim exclusive control of resources that they require.
Hold-and-wait      Threads hold resources allocated to them while waiting for additional resources.
No preemption      Resources cannot be forcibly removed from threads that are holding them.
Circular wait      There exists a circular chain of threads such that each thread holds one or more
                   resources that are being requested by the next thread in the chain.
top:
    lock(L1);
    if (tryLock(L2) == -1) {
        unlock(L1);
        goto top;
    }
Prevention – No Preemption (Cont.)
• livelock
– Both systems are running through the code sequence
over and over again.
– Progress is not being made.
– Solution:
• Add a random delay before looping back and trying the
entire thing over again.
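The random back-off idea can be sketched with real pthreads calls; pthread_mutex_trylock stands in for the slide's tryLock, and the delay bound is an arbitrary assumption:

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

pthread_mutex_t L1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t L2 = PTHREAD_MUTEX_INITIALIZER;

/* Acquire 'first' then 'second'; if 'second' is busy, release 'first'
   and retry after a random delay so two threads do not livelock. */
static void lock_both(pthread_mutex_t *first, pthread_mutex_t *second) {
    for (;;) {
        pthread_mutex_lock(first);
        if (pthread_mutex_trylock(second) == 0)
            return;                     /* got both locks */
        pthread_mutex_unlock(first);
        usleep(rand() % 1000);          /* random back-off */
    }
}

static void *worker_a(void *arg) {
    lock_both(&L1, &L2);
    pthread_mutex_unlock(&L2);
    pthread_mutex_unlock(&L1);
    return NULL;
}

static void *worker_b(void *arg) {
    lock_both(&L2, &L1);                /* opposite order: plain
                                           lock();lock() could deadlock */
    pthread_mutex_unlock(&L1);
    pthread_mutex_unlock(&L2);
    return NULL;
}

int run_demo(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, worker_a, NULL);
    pthread_create(&b, NULL, worker_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;                           /* reached only if no deadlock */
}
```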
Prevention – Mutual Exclusion
• wait-free
– Using powerful hardware instruction.
– You can build data structures in a manner that does
not require explicit locking.
int CompareAndSwap(int *address, int expected, int new) {
    if (*address == expected) {
        *address = new;
        return 1; // success
    }
    return 0; // failure
}
Prevention – Mutual Exclusion (Cont.)
• Suppose we now want to atomically increment a value by a certain amount:
void AtomicIncrement(int *value, int amount) {
    int old;
    do {
        old = *value;
    } while (CompareAndSwap(value, old, old + amount) == 0);
}
– No lock is acquired
– No deadlock can arise
– livelock is still a possibility.
Prevention – Mutual Exclusion (Cont.)
• More complex example: list insertion
void insert(int value) {
    node_t *n = malloc(sizeof(node_t));
    assert(n != NULL);
    n->value = value;
    n->next = head;
    head = n;
}
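This insert is not thread-safe as written: two threads can race between reading head and writing it. A lock-free variant retries the head update with compare-and-swap; the sketch below uses the GCC/Clang builtin __sync_bool_compare_and_swap (an assumption, since the int-only CompareAndSwap above cannot swap pointers):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct __node_t {
    int value;
    struct __node_t *next;
} node_t;

node_t *head = NULL;

/* Retry the head swap until no other thread changed head in between. */
void insert_lockfree(int value)
{
    node_t *n = malloc(sizeof(node_t));
    assert(n != NULL);
    n->value = value;
    do {
        n->next = head;                 /* snapshot the current head */
    } while (!__sync_bool_compare_and_swap(&head, n->next, n));
}
```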
Schedule: CPU 1 runs T3, T4; CPU 2 runs T1, T2.
Example of Deadlock Avoidance via
Scheduling (2)
• More contention for the same resources
      T1   T2   T3   T4
L1    yes  yes  yes  no
L2    yes  yes  yes  no

Resulting schedule: CPU 1 runs T4; CPU 2 runs T1, T2, T3.
• Pi requests an instance of Rj: edge Pi → Rj
• Pi is holding an instance of Rj: edge Rj → Pi
Example of Detection Algorithm
• Five processes P0 through P4; three resource types
A (7 instances), B (2 instances), and C (6 instances)
• Snapshot at time T0:
      Allocation   Request   Available
      A B C        A B C     A B C
P0    0 1 0        0 0 0     0 0 0
P1    2 0 0        2 0 2
P2    3 0 3        0 0 0
P3    2 1 1        1 0 0
P4    0 0 2        0 0 2
• Sequence <P0, P2, P3, P1, P4> will result in Finish[i] = true for all i
Example (Cont.)
• P2 requests an additional instance of type C
      Request
      A B C
P0    0 0 0
P1    2 0 1
P2    0 0 1
P3    1 0 0
P4    0 0 2
• State of system?
– Can reclaim resources held by process P0, but there are insufficient
  resources to fulfill the other processes' requests
– Deadlock exists, consisting of processes P1, P2, P3, and P4
Banker’s Algorithm
• Multiple instances
      Need
      A B C
P0    7 4 3
P1    1 2 2
P2    6 0 0
P3    0 1 1
P4    4 3 1
• The system is in a safe state since the sequence < P1, P3, P4, P2, P0>
satisfies safety criteria
Example: P1 Request (1,0,2)
• Check that Request ≤ Available (that is, (1,0,2) ≤ (3,3,2)) ⇒ true
      Allocation   Need    Available
      A B C        A B C   A B C
P0    0 1 0        7 4 3   2 3 0
P1    3 0 2        0 2 0
P2    3 0 1        6 0 0
P3    2 1 1        0 1 1
P4    0 0 2        4 3 1
• Executing safety algorithm shows that sequence < P1, P3, P4, P0, P2>
satisfies safety requirement
• Can request for (3,3,0) by P4 be granted?
• Can request for (0,2,0) by P0 be granted?
Operating Systems Design - 19CS2106R
Session 37: The Producer/Consumer (Bounded
Buffer), Reader-Writer Locks, The Dining
Philosophers Problem
The Producer/Consumer (Bounded-Buffer)
Problem
• Producer: put() interface
• Wait for a buffer to become empty in order to put data into it.
• Consumer: get() interface
• Wait for a buffer to become filled before using it.
int buffer[MAX];
int fill = 0;
int use = 0;

void put(int value) {
    buffer[fill] = value;        // line f1
    fill = (fill + 1) % MAX;     // line f2
}

int get() {
    int tmp = buffer[use];       // line g1
    use = (use + 1) % MAX;       // line g2
    return tmp;
}
The Producer/Consumer (Bounded-Buffer)
Problem
sem_t empty;
sem_t full;

void *producer(void *arg) {
    int i;
    for (i = 0; i < loops; i++) {
        sem_wait(&empty);   // line P1
        put(i);             // line P2
        sem_post(&full);    // line P3
    }
}

void *consumer(void *arg) {
    int i, tmp = 0;
    while (tmp != -1) {
        sem_wait(&full);    // line C1
        tmp = get();        // line C2
        sem_post(&empty);   // line C3
        printf("%d\n", tmp);
    }
}
…
First Attempt: Adding the Full and Empty Conditions
The Producer/Consumer (Bounded-Buffer)
Problem
int main(int argc, char *argv[]) {
    // …
    sem_init(&empty, 0, MAX);  // MAX buffers are empty to begin with…
    sem_init(&full, 0, 0);     // … and 0 are full
    // …
}
[Figure: The Dining Philosophers. Philosophers P0 through P4 sit in a circle with forks f0 through f4 placed between neighbors; fork f_p sits to the left of philosopher P_p.]
The Dining Philosophers (Cont.)
• Key challenge: devise a solution such that
  • There is no deadlock.
  • No philosopher starves; every philosopher eventually gets to eat.
  • Concurrency is high.
while (1) {
    think();
    getforks();
    eat();
    putforks();
}

// helper functions
int left(int p)  { return p; }

int right(int p) { return (p + 1) % 5; }
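One standard fix, sketched below with one semaphore per fork: the last philosopher acquires the forks in the opposite order, which breaks the circular wait. The helper names match the slide; the forks array and its initialization are assumptions:

```c
#include <semaphore.h>

sem_t forks[5];                 /* one binary semaphore per fork */

int left(int p)  { return p; }
int right(int p) { return (p + 1) % 5; }

void getforks(int p) {
    if (p == 4) {               /* break the cycle: reverse order */
        sem_wait(&forks[right(p)]);
        sem_wait(&forks[left(p)]);
    } else {
        sem_wait(&forks[left(p)]);
        sem_wait(&forks[right(p)]);
    }
}

void putforks(int p) {
    sem_post(&forks[left(p)]);
    sem_post(&forks[right(p)]);
}
```

Each fork must be initialized with sem_init(&forks[i], 0, 1) before any philosopher runs.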
Semaphore: A definition
• An object with an integer value
• We can manipulate it with two routines: sem_wait() and sem_post().
• Initialization: sem_init() sets the semaphore's initial value.
• sem_wait()
  • If the value of the semaphore was one or higher when sem_wait() was called, it returns right away.
  • Otherwise, it causes the caller to suspend execution, waiting for a subsequent post.
  • When negative, the value of the semaphore equals the number of waiting threads.
Semaphore: Interact with semaphore (Cont.)
• sem_post()
int sem_post(sem_t *s) {
    increment the value of semaphore s by one
    if there are one or more threads waiting, wake one
}
sem_t m;
sem_init(&m, 0, X); // initialize semaphore to X; what should X be?

sem_wait(&m);
// critical section here
sem_post(&m);
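Here X should be 1, so the semaphore acts as a lock (a binary semaphore). A minimal check of that behavior with two threads incrementing a shared counter (the counter and loop count are illustrative):

```c
#include <pthread.h>
#include <semaphore.h>

sem_t m;
volatile long counter = 0;

static void *locked_worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        sem_wait(&m);           /* enter critical section */
        counter = counter + 1;
        sem_post(&m);           /* leave critical section */
    }
    return NULL;
}

/* Runs two workers and returns the final counter value. */
long run_binary_sem_demo(void) {
    pthread_t t1, t2;
    sem_init(&m, 0, 1);         /* X = 1: at most one thread inside */
    pthread_create(&t1, NULL, locked_worker, NULL);
    pthread_create(&t2, NULL, locked_worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```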
Thread Trace: Single Thread Using A
Semaphore
Value of Semaphore Thread 0 Thread 1
1
1 call sem_wait()
0 sem_wait() returns
0 (crit sect)
0 call sem_post()
1 sem_post() returns
Thread Trace: Two Threads Using A Semaphore
Value Thread 0 State Thread 1 State
1 Running Ready
1 call sem_wait() Running Ready
0 sem_wait() returns Running Ready
0 (crit sect: begin) Running Ready
0 Interrupt; Switch → T1 Ready Running
0 Ready call sem_wait() Running
-1 Ready decrement sem Running
-1 Ready (sem < 0)→sleep sleeping
-1 Running Switch → T0 sleeping
-1 (crit sect: end) Running sleeping
-1 call sem_post() Running sleeping
0 increment sem Running sleeping
0 wake(T1) Running Ready
0 sem_post() returns Running Ready
0 Interrupt; Switch → T1 Ready Running
0 Ready sem_wait() returns Running
0 Ready (crit sect) Running
0 Ready call sem_post() Running
1 Ready sem_post() returns Running
Semaphores As Condition Variables
sem_t s;

void *
child(void *arg) {
    printf("child\n");
    sem_post(&s); // signal here: child is done
    return NULL;
}

int
main(int argc, char *argv[]) {
    sem_init(&s, 0, X); // what should X be?
    printf("parent: begin\n");
    pthread_t c;
    pthread_create(&c, NULL, child, NULL);
    sem_wait(&s); // wait here for child
    printf("parent: end\n");
    return 0;
}

A Parent Waiting For Its Child. The execution result:
parent: begin
child
parent: end
• What should X be?
• The value of the semaphore should be set to 0.
Thread Trace: Parent Waiting For Child (Case 1)
• The parent call sem_wait() before the child has called
sem_post().
Value Parent State Child State
0 Create(Child) Running (Child exists; is runnable) Ready
0 call sem_wait() Running Ready
-1 decrement sem Running Ready
-1 (sem < 0)→sleep sleeping Ready
-1 Switch→Child sleeping child runs Running
-1 sleeping call sem_post() Running
0 sleeping increment sem Running
0 Ready wake(Parent) Running
0 Ready sem_post() returns Running
0 Ready Interrupt; Switch→Parent Ready
0 sem_wait() returns Running Ready
Thread Trace: Parent Waiting For Child (Case 2)
• The child runs to completion before the parent calls sem_wait().
Value Parent State Child State
0 Create(Child) Running (Child exists; is runnable) Ready
0 Interrupt; switch→Child Ready child runs Running
0 Ready call sem_post() Running
1 Ready increment sem Running
1 Ready wake(nobody) Running
1 Ready sem_post() returns Running
1 parent runs Running Interrupt; Switch→Parent Ready
1 call sem_wait() Running Ready
0 decrement sem Running Ready
0 (sem ≥ 0)→awake Running Ready
0 sem_wait() returns Running Ready
How To Implement Semaphores
• Build our own version of semaphores called Zemaphores
typedef struct __Zem_t {
    int value;
    pthread_cond_t cond;
    pthread_mutex_t lock;
} Zem_t;

// only one thread can call this
void Zem_init(Zem_t *s, int value) {
    s->value = value;
    Cond_init(&s->cond);
    Mutex_init(&s->lock);
}

void Zem_wait(Zem_t *s) {
    Mutex_lock(&s->lock);
    while (s->value <= 0)
        Cond_wait(&s->cond, &s->lock);
    s->value--;
    Mutex_unlock(&s->lock);
}
…
How To Implement Semaphores (Cont.)
void Zem_post(Zem_t *s) {
    Mutex_lock(&s->lock);
    s->value++;
    Cond_signal(&s->cond);
    Mutex_unlock(&s->lock);
}
• Zemaphores don't maintain the invariant that a negative semaphore value reflects the number of waiting threads; the value is never lower than zero.
• This behavior is easier to implement and matches the current Linux implementation.
Counting Semaphores
Each semaphore-table entry also specifies the number of semaphores in the
array, the time of the last semop call, and the time of the last semctl call.
Processes manipulate semaphores with
the semop system call:
oldval = semop(id, oplist, count);
where oplist is a pointer to an array of semaphore operations, and count is
the size of the array. The return value, oldval, is the value of the last
semaphore operated on in the set before the operation was done. Each element
of oplist specifies:
•The semaphore number identifying the semaphore array entry being
operated on
•The operation
•Flags
Synchronisation and Communication
The correct behaviour of a concurrent program depends on synchronisation
and communication between its processes
Synchronisation: the satisfaction of constraints on the interleaving of the
actions of processes (e.g. an action by one process only occurring after an
action by another)
Communication: the passing of information from one process to another
– Concepts are linked since communication requires synchronisation, and synchronisation
can be considered as contentless communication.
– Data communication is usually based upon either shared variables or message passing.
• A sequence of statements that must appear to be executed indivisibly is called a critical section
• The synchronisation required to protect a critical section is known as mutual exclusion
Synchronization
Synchronize threads / coordinate their activities so that when you access
shared data (e.g., global variables) you do not run into trouble.
An example: a race condition, where two threads' critical sections interleave.
Protecting Accesses to Shared Variables:
Mutexes
1. Thread 1 fetches the current value of glob into its local variable loc. Let’s assume that the current value
of glob is 2000.
2. The scheduler time slice for thread 1 expires, and thread 2 commences execution.
3. Thread 2 performs multiple loops in which it fetches the current value of glob into its local variable loc,
increments loc, and assigns the result to glob. In the first of these loops, the value fetched from glob will
be 2000. Let’s suppose that by the time the time slice for thread 2 has expired, glob has been increased to
3000.
4. Thread 1 receives another time slice and resumes execution where it left off. Having previously (step 1)
copied the value of glob (2000) into its loc, it now increments loc and assigns the result (2001) to glob. At
this point, the effect of the increment operations performed by thread 2 is lost.
If we run the program in Listing 30-1 multiple times with the same command-line argument, we see that
the printed value of glob fluctuates wildly:
$ ./thread_incr 10000000
glob = 10880429
$ ./thread_incr 10000000
glob = 13493953
This nondeterministic behavior is a consequence of the vagaries of the kernel’s CPU scheduling
decisions. In complex programs, this nondeterministic behavior means that such errors may occur only
rarely, be hard to reproduce, and therefore be difficult to find.
Protecting Accesses to Shared Variables:
Mutexes
To avoid the problems that can occur when threads try to update a shared variable, we must use a mutex
(short for mutual exclusion) to ensure that only one thread at a time can access the variable. More generally,
mutexes can be used to ensure atomic access to any shared resource, but protecting shared variables is
the most common use.
A mutex has two states: locked and unlocked. At any moment, at most one thread may hold the lock on a
mutex. Attempting to lock a mutex that is already locked either blocks or fails with an error, depending on the
method used to place the lock.
When a thread locks a mutex, it becomes the owner of that mutex. Only the mutex owner can unlock the
mutex. This property improves the structure of code that uses mutexes and also allows for some
optimizations in the implementation of mutexes. Because of this ownership property, the terms acquire and
release are
sometimes used synonymously for lock and unlock.
In general, we employ a different mutex for each shared resource (which may consist of multiple related
variables), and each thread employs the following protocol for accessing a resource:
• lock the mutex for the shared resource;
• access the shared resource; and
• unlock the mutex.
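Listing 30-1 itself is not reproduced in these slides; the sketch below follows its spirit and the lock/access/unlock protocol just described, making the two-thread increment deterministic (the loop count and names are illustrative):

```c
#include <pthread.h>

static long glob = 0;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

static void *incr_thread(void *arg) {
    long loops = *(long *)arg;
    for (long j = 0; j < loops; j++) {
        pthread_mutex_lock(&mtx);    /* lock the mutex for the resource */
        glob++;                      /* access the shared resource */
        pthread_mutex_unlock(&mtx);  /* unlock the mutex */
    }
    return NULL;
}

/* Runs two incrementing threads; with the mutex, the result is
   always exactly 2 * loops, never a "lost update". */
long run_incr(long loops) {
    pthread_t t1, t2;
    glob = 0;
    pthread_create(&t1, NULL, incr_thread, &loops);
    pthread_create(&t2, NULL, incr_thread, &loops);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return glob;
}
```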
Protecting Accesses to Shared Variables:
Mutexes
Finally, note that mutex locking is advisory, rather than mandatory. By this, we
mean that a thread is free to ignore the use of a mutex and simply access the
corresponding shared variable(s). In order to safely handle shared variables, all
threads must cooperate in their use of a mutex, abiding by the locking rules it
enforces.
Lock-based Concurrent Data structure
Adding locks to a data structure makes the structure thread safe.
A block of code is thread-safe if it can be simultaneously executed by
multiple threads without causing problems.
• Thread-safeness: in a nutshell, refers to an application's ability to execute
multiple threads simultaneously without "clobbering" shared data or
creating "race" conditions.
• For example, suppose that your application creates several threads, each
of which makes a call to the same library routine:
• This library routine accesses/modifies a global structure or location in memory.
• As each thread calls this routine it is possible that they may try to modify this
global structure/memory location at the same time.
• If the routine does not employ some sort of synchronization constructs to
prevent data corruption, then it is not thread-safe.
Lock-based Concurrent Data structure
Solution #1
• An obvious solution is to simply lock the list any time that a thread attempts to
access it.
• A call to each of the three functions can be protected by a mutex.
Solution #2
• Instead of locking the entire list, we could try to lock individual nodes.
• A “finer-grained” approach.
// basic node structure
typedef struct __node_t {
    int key;
    struct __node_t *next;
    pthread_mutex_t lock;
} node_t;
Concurrent Linked Lists
// basic node structure
typedef struct __node_t {
    int key;
    struct __node_t *next;
} node_t;

// basic list structure (one used per list)
typedef struct __list_t {
    node_t *head;
    pthread_mutex_t lock;
} list_t;

void List_Init(list_t *L) {
    L->head = NULL;
    pthread_mutex_init(&L->lock, NULL);
}
17
(Cont.)
Concurrent Linked Lists(Cont.)
(Cont.)
int List_Insert(list_t *L, int key) {
    pthread_mutex_lock(&L->lock);
    node_t *new = malloc(sizeof(node_t));
    if (new == NULL) {
        perror("malloc");
        pthread_mutex_unlock(&L->lock);
        return -1; // fail
    }
    new->key = key;
    new->next = L->head;
    L->head = new;
    pthread_mutex_unlock(&L->lock);
    return 0; // success
}
(Cont.)
Concurrent Linked Lists(Cont.)
(Cont.)
int List_Lookup(list_t *L, int key) {
    pthread_mutex_lock(&L->lock);
    node_t *curr = L->head;
    while (curr) {
        if (curr->key == key) {
            pthread_mutex_unlock(&L->lock);
            return 0; // success
        }
        curr = curr->next;
    }
    pthread_mutex_unlock(&L->lock);
    return -1; // failure
}
Concurrent Linked Lists(Cont.)
The code acquires the lock in the insert routine upon entry and must release it
on every exit path, including the malloc-failure path.
This kind of exceptional control flow has been shown to be quite error prone.
Solution: have the lock acquire and release surround only the actual critical
section in the insert code.
Concurrent Linked List: Rewritten
void List_Init(list_t *L) {
    L->head = NULL;
    pthread_mutex_init(&L->lock, NULL);
}

void List_Insert(list_t *L, int key) {
    // synchronization not needed
    node_t *new = malloc(sizeof(node_t));
    if (new == NULL) {
        perror("malloc");
        return;
    }
    new->key = key;

    // just lock critical section
    pthread_mutex_lock(&L->lock);
    new->next = L->head;
    L->head = new;
    pthread_mutex_unlock(&L->lock);
}
Concurrent Linked List: Rewritten(Cont.)
(Cont.)
int List_Lookup(list_t *L, int key) {
    int rv = -1;
    pthread_mutex_lock(&L->lock);
    node_t *curr = L->head;
    while (curr) {
        if (curr->key == key) {
            rv = 0;
            break;
        }
        curr = curr->next;
    }
    pthread_mutex_unlock(&L->lock);
    return rv; // now both success and failure
}
Scaling Linked List
Hand-over-hand locking (lock coupling)
Add a lock per node of the list instead of having a single lock for the entire list.
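A sketch of hand-over-hand traversal using the per-node-lock node structure shown earlier: each step locks the next node before releasing the current one, so a concurrent writer can never slip in between (the function and type names here are illustrative):

```c
#include <pthread.h>
#include <stddef.h>

typedef struct __hnode_t {
    int key;
    struct __hnode_t *next;
    pthread_mutex_t lock;       /* one lock per node */
} hnode_t;

/* Hand-over-hand lookup: hold the current node's lock while
   acquiring the next node's lock, then release the current one. */
int HoH_Lookup(hnode_t *head, int key)
{
    hnode_t *curr = head;
    if (curr == NULL)
        return -1;              /* empty list: failure */
    pthread_mutex_lock(&curr->lock);
    while (curr) {
        if (curr->key == key) {
            pthread_mutex_unlock(&curr->lock);
            return 0;           /* success */
        }
        hnode_t *next = curr->next;
        if (next)
            pthread_mutex_lock(&next->lock);   /* couple the locks */
        pthread_mutex_unlock(&curr->lock);
        curr = next;
    }
    return -1;                  /* failure */
}
```

In practice the extra lock traffic often makes this slower than one big lock; it mainly improves concurrency for long lists with many threads.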
Pthreads Read-Write Locks
Neither of our multi-threaded linked lists exploits the potential for
simultaneous access to any node by threads that are executing Member.
The first solution only allows one thread to access the entire list at any instant.
The second only allows one thread to access any given node at any instant.
A read-write lock is somewhat like a mutex except that it provides two lock
functions.
The first lock function locks the read-write lock for reading, while the second
locks it for writing.
Pthreads Read-Write Locks
So multiple threads can simultaneously obtain the lock by calling the read-lock
function, while only one thread can obtain the lock by calling the write-lock
function.
Thus, if any threads own the lock for reading, any threads that want to obtain
the lock for writing will block in the call to the write-lock function.
If any thread owns the lock for writing, any threads that want to obtain the
lock for reading or writing will block in their respective locking functions.
Pthreads Read-Write Locks
Reader-writer locks are similar to mutexes, except that they allow for higher
degrees of parallelism. With a mutex, the state is either locked or unlocked,
and only one thread can lock it at a time. Three states are possible with a
reader-writer lock: locked in read mode, locked in write mode, and unlocked.
Only one thread at a time can hold a reader-writer lock in write mode, but
multiple threads can hold a reader-writer lock in read mode at the same time.
When a reader-writer lock is write-locked, all threads attempting to lock it
block until it is unlocked. When a reader-writer lock is read-locked, all
threads attempting to lock it in read mode are given access, but any threads
attempting to lock it in write mode block until all the threads have
relinquished their read locks. Although implementations vary, reader-writer
locks usually block additional readers if a lock is already held in read mode
and a thread is blocked trying to acquire the lock in write mode. This prevents
a constant stream of readers from starving waiting writers.
Pthreads Read-Write Locks
Reader-writer locks are well suited for situations in which data structures are
read more often than they are modified. When a reader-writer lock is held in
write mode, the data structure it protects can be modified safely, since only
one thread at a time can hold the lock in write mode. When the reader-writer
lock is held in read mode, the data structure it protects can be read by
multiple threads, as long as the threads first acquire the lock in read mode.
Reader-writer locks are also called shared-exclusive locks. When a
reader-writer lock is read-locked, it is said to be locked in shared mode.
When it is write-locked, it is said to be locked in exclusive mode.
As with mutexes, reader-writer locks must be initialized before use and
destroyed before freeing their underlying memory.
Pthreads Read-Write Locks
#include <pthread.h>
int pthread_rwlock_init(pthread_rwlock_t *restrict rwlock, const
pthread_rwlockattr_t *restrict attr);
int pthread_rwlock_destroy(pthread_rwlock_t *rwlock);
Both return: 0 if OK, error number on failure
#include <pthread.h>
int pthread_rwlock_rdlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_wrlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_unlock(pthread_rwlock_t *rwlock);
All return: 0 if OK, error number on failure
Concurrency vs. Parallelism
Concurrency: 2 processes or threads run concurrently (are concurrent) if
their flows overlap in time
Otherwise, they are sequential.
Examples (running on single core):
Concurrent: A & B, A & C
Sequential: B & C
Server gives the page name to the thread and resumes listening.
Thread checks the disk cache in memory; if the page is not there, it does
disk I/O; then it sends the page to the client.
(New) Process Address Space w/ Threads
Thread State
State shared by all threads in process:
Memory content (global variables, heap, code, etc).
I/O (files, network connections, etc).
A change in the global variable will be seen by all other threads (unlike processes).
int
pthread_create(pthread_t *thread,
               const pthread_attr_t *attr,
               void *(*start_routine)(void *),
               void *arg);
#include <pthread.h>
int pthread_equal(pthread_t tid1, pthread_t tid2);
Thread Identification
A thread can obtain its own thread ID by calling the pthread_self function.
#include <pthread.h>
pthread_t pthread_self(void);
This function can be used with pthread_equal when a thread needs to identify data structures that are
tagged with its thread ID. For example, a master thread might place work assignments on a queue and use the
thread ID to control which jobs go to each worker thread.
Thread Termination
• If any thread within a process calls exit, _Exit, or _exit, then the entire
process terminates. Similarly, when the default action is to terminate
the process, a signal sent to a thread will terminate the entire process.
• A single thread can exit in three ways, thereby stopping its flow of
control, without terminating the entire process.
• The thread can simply return from the start routine. The return value is the
thread's exit code.
• The thread can be canceled by another thread in the same process.
• The thread can call pthread_exit.
#include <pthread.h>
void pthread_exit(void *rval_ptr);
The rval_ptr is a typeless pointer, similar to the single argument passed to the start routine.
This pointer is available to other threads in the process by calling the pthread_join function.
Wait for a thread to complete
#include <pthread.h>
int pthread_join(pthread_t thread, void **rval_ptr);
• The calling thread will block until the specified thread calls pthread_exit, returns
from its start routine, or is canceled. If the thread simply returned from its start
routine, rval_ptr will contain the return code. If the thread was canceled, the
memory location specified by rval_ptr is set to PTHREAD_CANCELED.
• By calling pthread_join, we automatically place a thread in the detached state
(discussed shortly) so that its resources can be recovered. If the thread was
already in the detached state, calling pthread_join fails, returning EINVAL.
• If we're not interested in a thread's return value, we can set rval_ptr to NULL. In
this case, calling pthread_join allows us to wait for the specified thread, but
does not retrieve the thread's termination status.
Detach
• The pthread_detach() routine can be used to explicitly detach a thread even
though it was created as joinable
• There is no converse routine
• Recommendations:
• If a thread requires joining, consider explicitly creating it as joinable
• This provides portability as not all implementations may create threads as joinable
by default
• If you know in advance that a thread will never need to join with another thread,
consider creating it in a detached state
• Some system resources may be able to be freed.
Locks
• A synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution
Locks (Cont.)
• All locks must be properly initialized.
• One way: using PTHREAD_MUTEX_INITIALIZER
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
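The static initializer shown above works for mutexes with default attributes. A mutex allocated at run time (for example, inside a malloc'd structure) is instead initialized with pthread_mutex_init and destroyed with pthread_mutex_destroy; a minimal sketch (the setup/teardown wrappers are illustrative):

```c
#include <pthread.h>

pthread_mutex_t lock;

int setup(void) {
    /* NULL = default mutex attributes; returns 0 on success */
    return pthread_mutex_init(&lock, NULL);
}

int teardown(void) {
    /* destroy the mutex before freeing its underlying memory */
    return pthread_mutex_destroy(&lock);
}
```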
Youjip Won
Locks (Cont.)
• Check error codes when calling lock and unlock
• An example wrapper:
#include <pthread.h>
#include <assert.h>

// Use this to keep your code clean but check for failures
// Only use if exiting the program is OK upon failure
void Pthread_mutex_lock(pthread_mutex_t *mutex) {
    int rc = pthread_mutex_lock(mutex);
    assert(rc == 0);
}
Locks (Cont.)
• These two calls are also used in lock acquisition:
int pthread_mutex_trylock(pthread_mutex_t *mutex);
int pthread_mutex_timedlock(pthread_mutex_t *mutex,
                            const struct timespec *abs_timeout);
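Unlike pthread_mutex_lock, trylock never blocks: it returns 0 if the lock was acquired and EBUSY if another thread currently holds it. A minimal sketch (the try_work function is illustrative):

```c
#include <pthread.h>
#include <errno.h>
#include <stdio.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 if the critical section was entered, 0 if the lock was busy. */
int try_work(void) {
    int rc = pthread_mutex_trylock(&m);
    if (rc == EBUSY) {
        printf("lock busy, doing something else\n");
        return 0;            /* did not enter the critical section */
    }
    /* ... critical section ... */
    pthread_mutex_unlock(&m);
    return 1;
}
```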
Condition Variables
A condition variable allows a thread to block itself until specified data reaches a predefined state.
A condition variable is associated with a predicate.
When the predicate becomes true, the condition variable is used to signal one or more threads waiting on the condition.
A single condition variable may be associated with more than one predicate.
A condition variable always has a mutex associated with it.
A thread locks this mutex and tests the predicate defined on the shared variable.
If the predicate is not true, the thread waits on the condition variable associated with the predicate using the function pthread_cond_wait.
Condition Variables
• Condition variables are useful when some kind of signaling must take place
between threads.
int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_signal(pthread_cond_t *cond);
• pthread_cond_wait:
• Put the calling thread to sleep.
• Wait for some other thread to signal it.
• pthread_cond_signal:
• Unblock at least one of the threads that are blocked on the condition variable
• A condition variable is a data object that allows a thread to suspend execution
until a certain event or condition occurs.
• When the event or condition occurs another thread can signal the thread to
“wake up.”
Condition Variables (Cont.)
• A thread calling wait routine:
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t init = PTHREAD_COND_INITIALIZER;
pthread_mutex_lock(&lock);
while (initialized == 0)
pthread_cond_wait(&init, &lock);
pthread_mutex_unlock(&lock);
• The wait call releases the lock when putting the caller to sleep.
• Before returning after being woken, the wait call re-acquires the lock.
• A thread calling signal routine:
pthread_mutex_lock(&lock);
initialized = 1;
pthread_cond_signal(&init);
pthread_mutex_unlock(&lock);
Condition Variables (Cont.)
• The waiting thread re-checks the condition in a while loop, instead of
a simple if statement.
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t init = PTHREAD_COND_INITIALIZER;
pthread_mutex_lock(&lock);
while (initialized == 0)
pthread_cond_wait(&init, &lock);
pthread_mutex_unlock(&lock);
• Without rechecking, the waiting thread could continue as if the condition had
changed even though it has not (for example, after a spurious wakeup).
Condition Variables (Cont.)
• Don’t ever do this.
• A thread calling wait routine:
while(initialized == 0)
; // spin
Compiling and Running
• To compile them, you must include the header pthread.h
• Explicitly link with the pthreads library by adding the -lpthread flag.
prompt> gcc -o main main.c -lpthread
/* A simple child/parent signaling example. - main-signal.c */
#include <stdio.h>
#include <pthread.h>
int done = 0;
void* worker(void* arg) {
printf("this should print first\n");
done = 1;
return NULL;
}
int main(int argc, char *argv[]) {
pthread_t p;
pthread_create(&p, NULL, worker, NULL);
while (done == 0)
;
printf("this should print last\n");
return 0;
}
/*
vishnu@mannava:~/threads$ cc main-signal.c -lpthread
vishnu@mannava:~/threads$ ./a.out
this should print first
this should print last
*/
/* A more efficient signaling via condition variables. - main-signal-cv.c */
#include <stdio.h>
#include <pthread.h>
/* simple synchronizer: allows one thread to wait for another.
   structure "synchronizer_t" has all the needed data; methods are:
     init (called by one thread)
     wait (to wait for a thread)
     done (to indicate a thread is done) */
typedef struct __synchronizer_t {
pthread_mutex_t lock;
pthread_cond_t cond;
int done;
} synchronizer_t;
synchronizer_t s;
void signal_init(synchronizer_t *s) {
pthread_mutex_init(&s->lock, NULL);
pthread_cond_init(&s->cond, NULL);
s->done = 0;
}
void signal_done(synchronizer_t *s) {
pthread_mutex_lock(&s->lock);
s->done = 1;
pthread_cond_signal(&s->cond);
pthread_mutex_unlock(&s->lock);
}
void signal_wait(synchronizer_t *s) {
pthread_mutex_lock(&s->lock);
while (s->done == 0)
pthread_cond_wait(&s->cond, &s->lock);
pthread_mutex_unlock(&s->lock);
}
void* worker(void* arg) {
printf("this should print first\n");
signal_done(&s);
return NULL;
}
int main(int argc, char *argv[]) {
pthread_t p;
signal_init(&s);
pthread_create(&p, NULL, worker, NULL);
signal_wait(&s);
printf("this should print last\n");
return 0;
}
/*
vishnu@mannava:~/threads$ cc main-signal-cv.c -lpthread
vishnu@mannava:~/threads$ ./a.out
this should print first
this should print last
*/
Thank you
19CS2106R
© 2020 KL University
Thread Concepts
• A typical UNIX process can be thought of as having a single thread of control: each
process is doing only one thing at a time. With multiple threads of control, we can
design our programs to do more than one thing at a time within a single process,
with each thread handling a separate task. This approach can have several benefits.
o We can simplify code that deals with asynchronous events by assigning a separate thread to
handle each event type. Each thread can then handle its event using a synchronous programming
model. A synchronous programming model is much simpler than an asynchronous one.
o Multiple processes have to use complex mechanisms provided by the operating system to share
memory and file descriptors. Threads, on the other hand, automatically have access to the same
memory address space and file descriptors.
o Some problems can be partitioned so that overall program throughput can be improved. A single
process that has multiple tasks to perform implicitly serializes those tasks, because there is only
one thread of control. With multiple threads of control, the processing of independent tasks can
be interleaved by assigning a separate thread per task. Two tasks can be interleaved only if they
don't depend on the processing performed by each other.
o Similarly, interactive programs can realize improved response time by using multiple threads to
separate the portions of the program that deal with user input and output from the other parts
of the program.
Thread Creation
• How to create and control threads?
#include <pthread.h>
int pthread_create(pthread_t *thread,
                   const pthread_attr_t *attr,
                   void *(*start_routine)(void *),
                   void *arg);
#include <pthread.h>
int pthread_equal(pthread_t tid1, pthread_t tid2);
Thank you
19CS2106R
Operating Systems Design
Session 34: Shared Memory Interprocess
communication
© 2020 KL University
Shared Memory
• Normally, the Unix kernel prohibits one process from accessing
(reading, writing) memory belonging to another process
• Sometimes, however, this restriction is inconvenient
• At such times, System V IPC Shared Memory can be created to
specifically allow one process to read and/or write memory created
by another process
• Efficiency:
• unlike message queues and pipes, which copy data from the process into
memory within the kernel, shared memory is directly accessed
• Shared memory resides in the user process memory, and is then shared
among other processes
Disadvantages of Shared Memory
• No automatic synchronization as in pipes or message queues (you
have to provide any synchronization). Synchronize with semaphores
or signals.
• You must remember that pointers are only valid within a given
process. Thus, pointer values cannot be assumed to be valid across
process boundaries. This complicates the sharing of linked lists
or binary trees.
Shared Memory
Sharing the part of virtual memory and reading to and writing
from it, is another way for the processes to communicate. The
system calls are:
• shmget creates a new region of shared memory or returns an
existing one.
• shmat logically attaches a region to the virtual address space of
a process.
• shmdt logically detaches a region.
• shmctl manipulates the parameters associated with the region.
Shared Memory
• Allows multiple processes to share a region of
memory
• Fastest form of IPC: no need of data copying between client & server
[Figure: process address space with the stack at the top (near 0xf7fffb2c), the attached shared-memory segment below it (at 0xf77d0000), and the heap (a malloc of 100,000 bytes, near 0x00024c28) above the initialized data.]
Syntax of shmctl:
shmctl(id, cmd, shmstatbuf);
which is similar to msgctl
shmdt()
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
UNIX provides several different IPC mechanisms.
Interprocess interactions have several distinct purposes:
• Data transfer — One process may wish to send data to another process. The amount of data sent may vary
from one byte to several megabytes.
• Sharing data — Multiple processes may wish to operate on shared data, such that if a process modifies the
data, that change will be immediately visible to other processes sharing it.
• Event notification — A process may wish to notify another process or set of processes that some event has
occurred. For instance, when a process terminates, it may need to inform its parent process. The receiver
may be notified asynchronously, in which case its normal processing is interrupted. Alternatively, the
receiver may choose to wait for the notification.
• Resource sharing — Although the kernel provides default semantics for resource allocation, they are not
suitable for all applications. A set of cooperating processes may wish to define their own protocol for
accessing specific resources. Such rules are usually implemented by a locking and synchronization scheme,
which must be built on top of the basic set of primitives provided by the kernel.
• Process control — A process such as a debugger may wish to assume complete control over the execution
of another (target) process. The controlling process may wish to intercept all traps and exceptions intended
for the target and be notified of any change in the target's state.
Interprocess Communication
Processes have to communicate to exchange data and to synchronize
operations. Several forms of interprocess communication are: pipes,
named pipes and signals. But each one has some limitations.
• System V IPC
The UNIX System V IPC package consists of three mechanisms:
1. Message queues allow processes to send formatted data
streams to arbitrary processes.
2. Shared memory allows processes to share parts of their virtual
address space. an area of memory accessible by multiple processes.
3. Semaphores allow processes to synchronize execution. Can be used
to implement critical-section problems; allocation of resources.
IPC System Calls
msgget / semget / shmget: create a new IPC structure (message queue, semaphore, or shared memory segment) or open an existing one. Each returns an IPC identifier.
int msgsnd(int msqid, const void *ptr, size_t nbytes, int flag);
Returns: 0 if OK, -1 on error
• msgsnd() places a message at the end of the queue.
o ptr: pointer that points to a message
o nbytes: length of message data
o if flag = IPC_NOWAIT: IPC_NOWAIT is similar to the nonblocking I/O flag for file I/O.
• Structure of messages
struct mymesg {
long mtype; /* positive message type */
char mtext[512]; /* message data, of length nbytes */
};
int msgrcv(int msqid, void *ptr, size_t nbytes, long type, int flag);
Returns: data size in message if OK, -1 on error
• msgrcv() retrieves a message from a queue.
• type == 0: the first message on the queue is returned
• type > 0: the first message on the queue whose message type equals type is returned
• type < 0: the first message on the queue whose message type is the lowest value less than or equal
to the absolute value of type is returned
• flag may be given by IPC_NOWAIT
Algorithm: msgrcv
• If processes were waiting to send messages because there was no
room on the list, the kernel awakens them after it removes a
message from the message queue. If a message is bigger
than maxcount, the kernel returns an error for the system call and
leaves the message on the queue. If a process ignores the size
constraints (the MSG_NOERROR bit is set in flag), the kernel
truncates the message, returns the requested number of bytes, and
removes the entire message from the list.
• If the type is a positive integer, the kernel returns the first message of
the given type. If it is negative, the kernel finds the lowest type of
all messages on the queue, provided it is less than or equal to the
absolute value of the type, and returns the first message of that type.
For example, if a queue contains three messages whose types are
3, 1, and 2, respectively, and a user requests a message with type -2,
the kernel returns the message of type 1.
The syntax of msgctl:
msgctl(id, cmd, mstatbuf);
where cmd specifies the type of command, and mstatbuf is the address of a
user data structure that will contain control parameters or the results of a
query.
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
Performs various operations on a queue int msgctl(int msqid, int cmd, struct msqid_ds *buf);
Returns: 0 if OK, -1 on error
• cmd = IPC_STAT:
fetch the msqid_ds structure for this queue, storing it in buf
• cmd = IPC_SET:
set the following four fields from buf: msg_perm.uid, msg_perm.gid,
msg_perm.mode, and msg_qbytes
• cmd = IPC_RMID:
remove the message queue.
example: sender.c – send/store 3 messages in
to MQ
#include <stdio.h>   // sender.c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#define DEFINED_KEY 0x10101010
int main(int argc, char **argv)
{
    int msg_qid;
    struct {
        long mtype;
        char content[256];
    } msg;
    fprintf(stdout, "=========SENDER==========\n");
    if ((msg_qid = msgget(DEFINED_KEY, IPC_CREAT | 0666)) < 0) {
        perror("msgget: "); exit(-1);
    }
    msg.mtype = 1;
    int i = 3;
    while (i--) {
        memset(msg.content, 0x0, 256);
        if (fgets(msg.content, 256, stdin) == NULL)  /* gets() is unsafe; fgets bounds the read */
            break;
        if (msgsnd(msg_qid, &msg, sizeof(msg.content), 0) < 0) {
            perror("msgsnd: "); exit(-1);
        }
    }
    return 0;
}
example: receiver.c – fetch 3 messages from MQ
#include <stdio.h>   // receiver.c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#define DEFINED_KEY 0x10101010
int main(int argc, char **argv)
{
    int msg_qid;
    struct {
        long mtype;
        char content[256];
    } msg;
    fprintf(stdout, "=========RECEIVER==========\n");
    if ((msg_qid = msgget(DEFINED_KEY, IPC_CREAT | 0666)) < 0) {
        perror("msgget: "); exit(-1);
    }
    int i = 3;
    while (i--) {
        memset(msg.content, 0x0, 256);
        if (msgrcv(msg_qid, &msg, 256, 0, 0) < 0) {
            perror("msgrcv: "); exit(-1);
        }
        puts(msg.content);
    }
    return 0;
}
Execute sender.c and receiver.c on two different terminals
int pthread_create(pthread_t *thread,
                   const pthread_attr_t *attr,
                   void *(*start_routine)(void *),
                   void *arg);
– Return an integer:
int pthread_create(…,                     // first two args are the same
                   int (*start_routine)(void *),
                   void *arg);
Example: Creating a Thread
#include <pthread.h>
typedef struct { int a; int b; } myarg_t;  /* argument struct assumed by this snippet */
…
myarg_t args;
args.a = 10;
args.b = 20;
rc = pthread_create(&p, NULL, mythread, &args);
…
}
Wait for a thread to complete
int pthread_join(pthread_t thread, void **value_ptr);
Recap of CO3
• Operating system organization: creating and running the first process, Page
tables: Paging, hardware, Process address space
• Page tables: Physical memory allocation
• Systems calls, exceptions and interrupts, Assembly trap handlers
• Disk driver and Disk scheduling
• Manipulation of the process address space
• Page tables: User part of an address space, sbrk, exec
• memory management policies: swapping, demand paging
• memory management policies: Page faults and replacement algorithms
• TLB, Segmentation
• Hybrid approach: paging and Segmentation, Multi-level paging
CO4 Topics
• Locking
• Inter-process communication
• Models of Inter-process communication
• Thread API, Condition Variables
• Mutex, Concurrent Linked List
• Semaphores
• Concurrency Control Problems
• Deadlocks
• Boot Loader
Process memory layout
A program is a file containing a range of information that describes how to construct a process at run time.
The memory allocated to each process is composed of a number of parts, usually referred to as segments.
These segments are as follows:
a. Text: the instructions of the program.
b. The initialized data segment contains global and static variables that are explicitly Initialized
c. The uninitialized data segment contains global and static variables that are not explicitly initialized.
d. Heap: an area from which programs can dynamically allocate extra memory.
e. Stack: a piece of memory that grows and shrinks as functions are called and return and that is used to
allocate storage for local variables and function call linkage information
Several more segment types exist in an a.out, containing the symbol table, debugging information, linkage
tables for dynamic shared libraries, and the like. These additional sections don't get loaded as part of the
program's image executed by a process.
The size(1) command reports the sizes (in bytes) of the text, data, and bss segments.
Race Condition occurs when multiple processes are trying to do something with shared data and the
final outcome depends on the order in which the processes run
Race Condition
This implementation is correct if executed in isolation. However, the code is not
correct if more than one copy executes concurrently. If two CPUs execute insert
at the same time, it could happen that both execute line 15 before either executes
16 (see Figure 4-1). If this happens, there will now be two list nodes with next set
to the former value of list. When the two assignments to list happen at line 16, the
second one will overwrite the first; the node involved in the first assignment will be
lost.
The lost update at line 16 is an example of a race condition. A race condition is
a situation in which a memory location is accessed concurrently, and at least one
access is a write. A race is often a sign of a bug, either a lost update (if the
accesses are writes) or a read of an incompletely-updated data structure. The
outcome of a race depends on the exact timing of the two CPUs involved and how
their memory operations are ordered by the memory system, which can make
race-induced errors difficult to reproduce and debug.
Race Condition example 2: i = 5 (shared)
• The problem is that in the time a single process takes to execute these three steps,
another process can perform the same three steps. Chaos can result, as we will see
in some examples that follow.
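The "three steps" are the usual load/modify/store decomposition of an increment. A minimal sketch (the explicit reg variable stands in for a machine register; it is illustrative, not real machine code):

```c
/* The single statement  i = i + 1;  on a shared i = 5 typically compiles
   to three separate steps, and a context switch can occur between any
   of them. */

int i = 5;          /* shared between the two processes/threads */

void increment(void) {
    int reg;
    reg = i;        /* 1. load the shared value into a register */
    reg = reg + 1;  /* 2. increment the register                */
    i = reg;        /* 3. store the register back to memory     */
}

/* If both runners execute step 1 before either executes step 3,
   both store 6 and one update is lost: i ends at 6, not 7. */
```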
sequence-number-increment problem
#define MAXLINE  4096      /* max text line length */
#define SEQFILE  "seqno"   /* filename */
#define LOCKFILE "seqno.lock"

void my_lock(int), my_unlock(int);

int main(int argc, char **argv)
{
    int     fd;
    long    i, seqno;
    pid_t   pid;
    ssize_t n;
    char    line[MAXLINE + 1];

    pid = getpid();
    fd = open(SEQFILE, O_RDWR, 0666);
    for (i = 0; i < 20; i++) {
        my_lock(fd);                     /* lock the file */
        lseek(fd, 0L, SEEK_SET);         /* rewind before read */
        n = read(fd, line, MAXLINE);
        line[n] = '\0';                  /* null terminate for sscanf */
        n = sscanf(line, "%ld\n", &seqno);
        printf("%s: pid = %ld, seq# = %ld\n", argv[0], (long) pid, seqno);
        seqno++;                         /* increment sequence number */
        snprintf(line, sizeof(line), "%ld\n", seqno);
        lseek(fd, 0L, SEEK_SET);         /* rewind before write */
        write(fd, line, strlen(line));
        my_unlock(fd);                   /* unlock the file */
    }
    exit(0);
}

/* no-op lock functions: this version performs no locking */
void my_lock(int fd)
{
    return;
}

void my_unlock(int fd)
{
    return;
}
If the sequence number in the file is initialized to one, and a single copy of the program
is run, we get the following output:

[vishnu@team-osd ~]$ cc seqnumnolock.c
[vishnu@team-osd ~]$ vi seqno
[vishnu@team-osd ~]$ ./a.out
./a.out: pid = 5448, seq# = 1
./a.out: pid = 5448, seq# = 2
...
./a.out: pid = 5448, seq# = 19
./a.out: pid = 5448, seq# = 20

When the sequence number is again initialized to one, and the program is run twice
in the background, we have the following output:

[vishnu@team-osd ~]$ vi seqno
[vishnu@team-osd ~]$ ./a.out & ./a.out &
[1] 7891
[2] 7892
./a.out: pid = 7892, seq# = 1
./a.out: pid = 7892, seq# = 2
...
./a.out: pid = 7892, seq# = 7
./a.out: pid = 7891, seq# = 8
./a.out: pid = 7892, seq# = 8
...
./a.out: pid = 7892, seq# = 19
./a.out: pid = 7892, seq# = 20
./a.out: pid = 7891, seq# = 20
./a.out: pid = 7891, seq# = 21
...
./a.out: pid = 7891, seq# = 36
[1]- Done ./a.out
[2]+ Done ./a.out

The final value is 36 rather than the expected 40: both processes produced some of the same sequence numbers, so updates were lost to the race.
Critical Section
• A critical section is a block of code that only one process at a time can execute
• The critical section problem is to ensure that only one process at a time is allowed to be operating in its critical section
• Each process takes permission from the operating system to enter into the critical section

do {
    entry section
        critical section
    exit section
} while (TRUE);

The term critical section is used to refer to a section of code that accesses a shared
resource and whose execution should be atomic; that is, its execution should not be
interrupted by another thread that simultaneously accesses the same shared resource.
Mutual exclusion
• If a process is executing in its critical section, then no other process is
allowed to execute in the critical section
• No two processes can be in the same critical section at the same time.
This is called mutual exclusion
Locks: The Basic Idea
• Ensure that any critical section executes as if it were a single
atomic instruction.
• An example: the canonical update of a shared variable
balance = balance + 1;
• Other threads are prevented from entering the critical section while the first
thread that holds the lock is in there.
Building A Lock
Efficient locks provide mutual exclusion at low cost.
Building a lock needs some help from the hardware and the
OS.
Design
Evaluating locks – Basic criteria
• Mutual exclusion
• Does the lock work, preventing multiple threads
from entering a critical section?
• Fairness
• Does each thread contending for the lock get a
fair shot at acquiring it once it is free? (Starvation)
• Performance
• The time overheads added by using the lock
Controlling Interrupts
• Disable Interrupts for critical sections
• One of the earliest solutions used to provide mutual exclusion
• Invented for single-processor systems.
void lock() {
    DisableInterrupts();
}
void unlock() {
    EnableInterrupts();
}
• Problems:
• Requires too much trust in applications
• A greedy (or malicious) program could monopolize the processor
• Does not work on multiprocessors
• Code that masks or unmasks interrupts is executed slowly by modern CPUs
Why is hardware support needed?
First attempt: use a flag denoting whether the lock is held or not.
The code below has problems.
typedef struct __lock_t { int flag; } lock_t;

void init(lock_t *mutex) {
    // 0 -> lock is available, 1 -> held
    mutex->flag = 0;
}

void lock(lock_t *mutex) {
    while (mutex->flag == 1)  // TEST the flag
        ;                     // spin-wait (do nothing)
    mutex->flag = 1;          // now SET it!
}

void unlock(lock_t *mutex) {
    mutex->flag = 0;
}
Why is hardware support needed? (Cont.)
• Problem 1: No Mutual Exclusion (assume flag=0 to begin)
Thread1 Thread2
call lock()
while (flag == 1)
interrupt: switch to Thread 2
call lock()
while (flag == 1)
flag = 1;
interrupt: switch to Thread 1
flag = 1; // set flag to 1 (too!)
• Fairness: no
• Spin locks don’t provide any fairness guarantees.
• Indeed, a thread spinning may spin forever.
• Performance:
• On a single CPU, performance overheads can be quite painful.
• On multiple CPUs, if the number of threads roughly equals the number of
CPUs, spin locks work reasonably well.
Compare-And-Swap
• Test whether the value at the address(ptr) is equal to expected.
• If so, update the memory location pointed to by ptr with the new value.
• In either case, return the actual value at that memory location.
int CompareAndSwap(int *ptr, int expected, int new) {
    int actual = *ptr;
    if (actual == expected)
        *ptr = new;
    return actual;
}