CS8603 UNIT 4 Agreement in A Failure Free System
• A distributed mechanism would have each process broadcast its value to the others,
and each process then computes the same function on the set of values received.
• Algorithms to collect the initial values and then distribute the decision may be
based on token circulation on a logical ring, on a three-phase tree-based
broadcast–convergecast–broadcast sequence, or on direct communication with all nodes.
• Further, concurrent common knowledge of the consensus value can also be attained.
• Consider a consensus algorithm for n processes, of which up to f (f < n) may
fail in the fail-stop failure model.
• Here the consensus variable x is an integer; each process i has an initial value xi. If
up to f failures are to be tolerated, then the algorithm has f + 1 rounds. In each round, a
process i sends the value of its variable xi to all other processes if that value has not
been sent before.
• Of all the values received within that round and its own value xi at the start of
the round, the process takes the minimum and updates xi. After f + 1 rounds, the local
value xi is guaranteed to be the consensus value.
• If one process among three is faulty, then f = 1, so agreement requires f + 1,
that is, two rounds.
• Suppose the faulty process sends 0 to one process and 1 to another among processes i, j, and k.
The process that received 0 broadcasts 0, and the process that received 1 broadcasts 1;
this completes the first round.
• In the second round, each process relays the value it received, so by the end of the
second round every non-faulty process has seen the same set of values and can take
their minimum.
(global constants)
integer: f;    // maximum number of crash failures tolerated
(1) Process Pi (1 ≤ i ≤ n) executes the consensus algorithm for up to f crash failures:
(1a) for round from 1 to f + 1 do
(1b)     if the current value of x has not been broadcast then
(1c)         broadcast(x);
(1d)     yj ← value (if any) received from process j in this round;
(1e)     x ← min over all j of (x, yj);
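The round structure above can be exercised with a short simulation. This is a minimal sketch, not the notes' own code: the function name min_consensus, the process ids, and the crash schedule are illustrative assumptions, and a crashed process is simply modeled as silent from its crash round onward (partial broadcasts are not modeled).

```python
def min_consensus(initial, f, crashed_at=None):
    """Simulate f+1 rounds of the min-based consensus algorithm.
    initial: process id -> initial integer value.
    crashed_at: process id -> first round it no longer participates in."""
    crashed_at = crashed_at or {}
    values = dict(initial)
    sent = {i: set() for i in values}        # values each process already broadcast
    for r in range(1, f + 2):                # rounds 1 .. f+1
        alive = [i for i in values if crashed_at.get(i, f + 2) > r]
        msgs = []
        for i in alive:                      # broadcast only values not sent before
            if values[i] not in sent[i]:
                sent[i].add(values[i])
                msgs.append(values[i])
        for i in alive:                      # take the minimum seen this round
            values[i] = min([values[i]] + msgs)
    # report the values of processes that survived all rounds
    return {i: values[i] for i in values if crashed_at.get(i, f + 2) > f + 1}

# Three processes, up to one crash tolerated: all agree on the minimum value.
print(min_consensus({0: 3, 1: 1, 2: 2}, f=1))   # {0: 1, 1: 1, 2: 1}
```

If process 1 is silent from round 1 on, its value 1 is never seen and the survivors agree on the smallest value actually broadcast, illustrating why a failure-free round suffices for agreement.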
• The agreement condition is satisfied because in the f + 1 rounds, there must be at least
one round in which no process failed.
• In this round, say round r, all the processes that have not failed so far succeed in
broadcasting their values, and all these processes take the minimum of the values
broadcast and received in that round.
• Thus, the local values at the end of round r are the same, say xi^r, for all non-failed
processes.
• In further rounds, this value may be sent by each process at most once, and no
process i will update its value xi^r.
• The validity condition is satisfied because processes do not send fictitious values in this
failure model.
• If the initial value of every process i is identical, then that is the only value sent by
any process, and it is the value agreed upon as per the agreement condition.
Complexity
• The algorithm requires f + 1 rounds, where f < n. In each round O(n²) messages are
exchanged, each carrying one integer, so the total number of messages is
O((f + 1) · n²).
• In the worst-case scenario, one process may fail in each round; with f + 1 rounds,
there is at least one round in which no process fails. In that guaranteed failure-free
round, all messages broadcast can be delivered reliably, and all processes that have
not failed can compute the common function of the received values to reach an
agreement value.
f ≤ ⌊(n − 1)/3⌋
• The condition f ≤ ⌊(n − 1)/3⌋ is violated here: if f = 1 and n = 3, then
⌊(n − 1)/3⌋ = 0, yet we are assuming one faulty process.
• Hence, as per the condition above, Byzantine agreement is not possible with three
processes and one fault.
• Two cases arise: in one, the source P0 is non-faulty; in the other, the source P0 is
faulty.
• First suppose the source is non-faulty but some other process, say Pb, is faulty. The
source, being non-faulty, sends the same value to all of Pa, Pb, and Pd; the faulty
process may relay a different value, but it is outvoted in the majority step.
• Now suppose the source itself is faulty. Say it sends 0 to Pb and to Pd, but 1 to Pa,
in the first round. Pa, after receiving 1, relays 1 to both of its neighbors; similarly
Pb, being non-faulty, relays the 0 it received, and so does Pd.
• Pa then holds the values 1 (from the source), 0 (from Pb), and 0 (from Pd), so its
majority is 0. Pb and Pd likewise each see one 1 and two 0s, so their majorities are
also 0.
• Thus, even when the source is faulty, the non-faulty processes reach agreement, and
the agreed-upon value is 0.
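The majority step of this four-process, one-fault example can be sketched in a few lines. This follows the standard oral-messages (OM(1)) construction; the function name om1_decide and the modeling of a faulty relay as a fixed lie are illustrative assumptions, and the lieutenant names Pa, Pb, Pd are taken from the example above.

```python
from collections import Counter

def om1_decide(commander_sends, faulty_relay=None):
    """commander_sends: lieutenant -> value received from the commander.
    Each lieutenant relays its received value to the other two; a faulty
    lieutenant listed in faulty_relay relays the lie given there instead."""
    faulty_relay = faulty_relay or {}
    decisions = {}
    for lt in commander_sends:
        votes = [commander_sends[lt]]            # value heard directly
        for other in commander_sends:
            if other == lt:
                continue
            # what `other` relays to `lt` (a faulty relay may lie)
            votes.append(faulty_relay.get(other, commander_sends[other]))
        decisions[lt] = Counter(votes).most_common(1)[0][0]
    return decisions

# Faulty commander: sends 1 to Pa but 0 to Pb and Pd; every honest
# lieutenant sees the multiset {1, 0, 0} and decides 0.
print(om1_decide({"Pa": 1, "Pb": 0, "Pd": 0}))   # {'Pa': 0, 'Pb': 0, 'Pd': 0}
```

With a non-faulty commander and one lying lieutenant (for example, Pb always relaying 0), the two honest lieutenants still obtain a majority for the commander's value, matching the first case discussed above.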
(JUANG-VENKATESAN)
➢ Assumption: communication channels are reliable, deliver messages in FIFO order, and have infinite buffers.
• To facilitate recovery after a process failure and restore the system to a consistent
state, two types of log storage are maintained:
➢ Volatile log: takes a short time to access but is lost if the processor crashes.
The contents of the volatile log are moved to the stable log periodically.
➢ Stable log: takes a longer time to access, but its contents survive a crash.
Asynchronous checkpointing
• After executing an event, a processor records a triplet (s, m, msgs_sent) in its volatile
storage, where
− s: state of the processor before the event;
− m: message received (if the event is a receive event);
− msgs_sent: set of messages that were sent by the processor during the
event.
• A local checkpoint at a processor consists of the record of an event occurring at the
processor and it is taken without any synchronization with other processors.
• Periodically, a processor independently saves the contents of the volatile log in the
stable storage and clears the volatile log.
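The two-level log described above can be sketched as a small class. This is a minimal sketch of the behavior, not the algorithm's own data structures: the class name ProcessorLog and its method names are illustrative assumptions.

```python
class ProcessorLog:
    """Volatile log flushed periodically to stable storage, as in
    asynchronous checkpointing."""
    def __init__(self):
        self.volatile = []       # fast to access, lost on a crash
        self.stable = []         # slower to access, survives crashes

    def record_event(self, state_before, msg_received, msgs_sent):
        # one local checkpoint = the record of one event: (s, m, msgs_sent)
        self.volatile.append((state_before, msg_received, tuple(msgs_sent)))

    def flush(self):
        # periodically move volatile contents to stable storage and clear
        self.stable.extend(self.volatile)
        self.volatile.clear()

    def crash(self):
        self.volatile.clear()    # the volatile log is lost on failure

log = ProcessorLog()
log.record_event("s0", "m1", ["m2"])
log.flush()                      # this triplet survives a later crash
log.record_event("s1", "m3", [])
log.crash()                      # the unflushed triplet is lost
print(len(log.stable), len(log.volatile))   # 1 0
```

Only events flushed before the crash are recoverable, which is why the recovery algorithm must roll a failed processor back to a state reconstructible from the stable log.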
Recovery Algorithm
SENTi→j(CkPti)
This represents the number of messages sent by processor pi to processor pj, from the
beginning of the computation until the checkpoint CkPti.
• The main idea of the algorithm is to find a set of mutually consistent checkpoints from
the set of existing checkpoints.
• Whenever a processor rolls back, all other processors must find out whether
any message sent by the rolled-back processor has become an orphan message.
• Orphan messages are identified by comparing the number of messages sent to
and received from neighboring processors.
Fig: Algorithm for Asynchronous Checkpointing and Recovery (Juang-Venkatesan)
• The rollback starts at the failed processor and gradually diffuses into the entire
system through ROLLBACK messages.
(i) Based on the state CkPti to which it rolled back in the (k − 1)th iteration, pi
computes SENTi→j(CkPti) for each neighbor pj and sends this value in a
ROLLBACK(i, SENTi→j(CkPti)) message to pj.
(ii) pi waits for and processes the ROLLBACK messages that it receives from its
neighbors in the kth iteration and determines a new recovery point CkPti for pi
based on the information in these messages.
At the end of each iteration, at least one processor will roll back to its final recovery point,
unless the current recovery points are already consistent.
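The orphan test at the heart of this recovery can be sketched as follows. This is a simplified single-channel view, not the full algorithm: the function name new_recovery_point and the representation of per-checkpoint receive counts as a list are illustrative assumptions.

```python
def new_recovery_point(rcvd_counts, sent_at_recovery):
    """rcvd_counts: for each checkpoint of pj (oldest first), the count
    RCVD of messages received from pi up to that checkpoint.
    sent_at_recovery: SENT from pi to pj at pi's current recovery point.
    Returns the index of pj's latest checkpoint with no orphan messages."""
    for idx in range(len(rcvd_counts) - 1, -1, -1):
        # a checkpoint recording more receives than pi ever sent
        # (as of its recovery point) has received orphan messages
        if rcvd_counts[idx] <= sent_at_recovery:
            return idx
    return 0   # roll back all the way to the initial state

# pj's checkpoints recorded 0, 2, and 5 messages received from pi; pi's
# recovery point shows only 3 messages sent, so the checkpoint with 5
# receives contains orphans and pj rolls back to the one with 2.
print(new_recovery_point([0, 2, 5], sent_at_recovery=3))   # 1
```

Iterating this test across all neighbors, and diffusing ROLLBACK messages whenever a recovery point moves, converges to a consistent set of checkpoints.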
• Consensus algorithms necessarily assume that some processes and systems will be
unavailable and that some communications will be lost.
• Failure models:
− A faulty process can behave in any manner allowed by the failure model assumed.
− Some of the well-known failure models include fail-stop, send omission, receive
omission, and Byzantine failures.
− Fail-stop model: a process may crash in the middle of a step, which could be the
execution of a local operation or the processing of a message for a send or receive event;
in particular, it may send a message to only a subset of the destination set before crashing.
− The choice of the failure model determines the feasibility and complexity of solving
consensus.
• Synchronous/asynchronous communication:
− If a failure-prone process fails before sending a message, or the message takes a very
long time in transit, the non-arrival cannot be distinguished from slow delivery. This is a
major hurdle in asynchronous systems.
− In a synchronous system, the intended recipient can deal with the non-arrival of the
expected message by assuming the arrival of a message containing some default data, and
then proceeding with the next round of the algorithm.
• Network connectivity:
− The system has full logical connectivity, i.e., each process can communicate with any
other by direct message passing.
• Sender identification:
− A process that receives a message always knows the identity of the sender process.
− When multiple messages are expected from the same sender in a single round, a
scheduling algorithm is employed that sends these messages in sub-rounds, so that each
message sent within the round can be uniquely identified.
• Channel reliability:
− The channels are reliable, and only the processes may fail.
− However, a faulty process can behave arbitrarily with respect to messages:
(i) it can forge a message and claim that it was received from another process;
(ii) it can tamper with the contents of a received message before relaying it.
− When a process receives a message, it has no way to verify its authenticity. This is
known as an unauthenticated message, an oral message, or an unsigned message.
− Using authentication via techniques such as digital signatures, it is easier to solve the
agreement problem, because tampering with a relayed message can be detected.
• Agreement variable:
− The agreement variable may be boolean or multivalued, and need not be an integer.
− This simplifying assumption does not affect the results for other data types, but helps in
the abstraction while presenting the algorithms.
• Imagine that the grand Eastern Roman empire aka Byzantine empire has decided to
capture a city.
• The army has many divisions and each division has a general.
• The generals communicate with each other, as well as with all lieutenants within their
division, only through messengers.
• All the generals or commanders have to agree upon one of the two plans of action.
• They must agree on an exact time to attack all at once or, if faced by fierce resistance,
a time to retreat all at once. The army cannot hold on forever.
• If the attack or retreat is without full strength then it means only one thing —
Unacceptable brutal defeat.
• If all generals and messengers were trustworthy, the solution would be very simple.
• However, there is a very high chance that some of them will not follow orders or will
pass on incorrect messages. The level of trust in the army is very low.
• Consider just a case of 1 commander and 2 Lieutenants and just 2 types of
messages- ‘Attack’ and ‘Retreat’.
• In Fig, the Lieutenant 2 is a traitor who purposely changes the message that is to be
passed to Lieutenant 1.
• Now Lieutenant 1 has received 2 messages and does not know which one to follow.
Assume Lieutenant 1 follows the Commander because of the strict hierarchy in the army.
• Still, one-third of the army is weaker in force, since Lieutenant 2 is a traitor, and this
creates a lot of confusion.
• However, what if the Commander is a traitor (as explained in Fig)? Now the Lieutenants
receive conflicting orders, and two-thirds of the total army can be misled.
• Now imagine the exponential increase when there are hundreds of Lieutenants.
• All participating nodes have to agree upon every message that is transmitted between
the nodes.
• If a group of nodes is corrupt or the message that they transmit is corrupt then still the
network as a whole should not be affected by it and should resist this ‘Attack’.
• Consensus problem
• A process is Byzantine if, during its execution, one of the following faults occurs:
− Crash: The process stops executing statements of its program and halts.
− Corruption: The process arbitrarily changes the value of a local variable with respect to
its program specification. This fault can be propagated to other processes by including
incorrect values in the content of a message sent by the process.
− Omission: The process omits to execute a statement of its program. If a process omits
to execute an assignment, this can lead to a corruption fault.
− Duplication: The process executes a statement of its program more than once. If a
process executes an assignment more than once, this can lead to a corruption fault.
Every process has an initial value, and all the correct processes must agree on a single
value. This is the consensus problem.
• Agreement: All non-faulty processes must agree on the same (single) value.
• Validity: If all the non-faulty processes have the same initial value, then the
agreed upon value by all the non-faulty processes must be that same value.
Every process has an initial value, and all the correct processes must agree upon a set
of values, with one value for each process. This is the interactive consistency problem.
• Agreement: All non-faulty processes must agree on the same array of values A
[v1, …,vn].
• Validity: If process i is non-faulty and its initial value is vi, then all non-faulty
processes agree on vi as the ith element of the array A. If process j is faulty, then the
non-faulty processes can agree on any value for A[j].
The difference between the agreement problem and the consensus problem is that, in
the agreement problem, a single process has the initial value, whereas in the
consensus problem, all processes have an initial value.
• Consensus is not solvable in asynchronous systems even if one process can fail by
crashing. Consensus is attainable for no failure case.
The results are tabulated below; f indicates the number of processes that can fail and n
indicates the total number of processes.
Failure mode               | Synchronous system                        | Asynchronous system
---------------------------|-------------------------------------------|--------------------------------------
1. No failure              | Agreement attainable; common              | Agreement attainable; concurrent
                           | knowledge also attainable                 | common knowledge also attainable
2. Crash failure           | Agreement attainable; f < n               | Agreement not attainable
                           | processes; (f + 1) rounds                 |
3. Byzantine (malicious)   | Agreement attainable; f ≤ ⌊(n − 1)/3⌋     | Agreement not attainable
   failure                 | Byzantine processes; (f + 1) rounds       |
• Terminating reliable broadcast: A correct process will always receive a message, even
if the sender crashes while sending. If the sender crashes while sending the message,
the message delivered may even be null, but it still has to be delivered to every correct
process.
• K-set consensus: Solvable in asynchronous systems as long as the number of crash
failures f is less than the parameter k. The non-faulty processes may agree on different
values, as long as the size of the set of values agreed upon is bounded by k.
• Approximate agreement: The consensus value is from a multivalued domain, and the
values agreed upon by the non-faulty processes must be within ε of each other.
• The Koo and Toueg coordinated checkpointing and recovery technique takes a consistent
set of checkpoints and avoids the domino effect and livelock problems during recovery.
• It includes two parts: the checkpointing algorithm and the recovery algorithm.
A. The Checkpointing Algorithm
The checkpoint algorithm makes the following assumptions about the distributed system:
• Processes communicate by exchanging messages through communication channels.
• Communication channels are FIFO.
• Assume that end-to-end protocols (such as the sliding window protocol) exist to cope
with message loss due to rollback recovery and communication failure.
• Communication failures do not divide the network.
• The checkpoint algorithm takes two kinds of checkpoints on the stable storage:
Permanent and Tentative.
• A permanent checkpoint is a local checkpoint at a process and is a part of a consistent
global checkpoint.
• A tentative checkpoint is a temporary checkpoint that is made a permanent checkpoint on
the successful termination of the checkpoint algorithm.
In the event of failure of process X, the above protocol will require processes X and Y to
restart from checkpoints x2 and y2, respectively. Process Z need not roll back because there
has been no interaction between process Z and the other two processes since the last
checkpoint at Z.
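The two-phase tentative/permanent structure of the checkpointing algorithm can be sketched as follows. This is a minimal sketch from the coordinator's point of view, not Koo and Toueg's full protocol: the function name coordinated_checkpoint and the can_checkpoint predicate are illustrative assumptions.

```python
def coordinated_checkpoint(processes, can_checkpoint):
    """Phase 1: each process is asked to take a tentative checkpoint.
    Phase 2: only if every process succeeds are the tentative checkpoints
    made permanent; any refusal aborts and the tentatives are discarded.
    processes: list of process ids.
    can_checkpoint: id -> bool, whether the process can take a tentative
    checkpoint (e.g., it is not mid-transaction)."""
    tentative = []
    for p in processes:                    # phase 1: collect tentative checkpoints
        if not can_checkpoint(p):
            return False, []               # abort: discard all tentatives
        tentative.append(p)
    return True, tentative                 # phase 2: all made permanent

ok, committed = coordinated_checkpoint(["X", "Y", "Z"], lambda p: True)
print(ok, committed)                       # True ['X', 'Y', 'Z']
```

The all-or-nothing commit is what guarantees that the set of permanent checkpoints always forms a consistent global state.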
The computation comprises three processes Pi, Pj, and Pk, connected through a
communication network. The processes communicate solely by exchanging messages over fault-
free, FIFO communication channels.
Processes Pi, Pj, and Pk have taken checkpoints.
• When 𝑃𝑗 receives a message m during 𝐼𝑗,𝑦 , it records the dependency from 𝐼𝑖,𝑥 to 𝐼𝑗,𝑦,
which is later saved onto stable storage when 𝑃𝑗 takes 𝐶𝑗,𝑦
• Upon receiving this message, a process whose current state belongs to the recovery line
simply resumes execution; otherwise, it rolls back to an earlier checkpoint as indicated by
the recovery line.
2. Coordinated Checkpointing
In coordinated checkpointing, processes orchestrate their checkpointing activities so that all
local checkpoints form a consistent global state.
Types
1. Blocking Checkpointing: After a process takes a local checkpoint, to prevent orphan
messages, it remains blocked until the entire checkpointing activity is complete.
Disadvantage: the computation is blocked during checkpointing.
2. Non-blocking Checkpointing: The processes need not stop their execution while taking
checkpoints. A fundamental problem in coordinated checkpointing is to prevent a process
from receiving application messages that could make the checkpoint inconsistent.
Example (a) : Checkpoint inconsistency
• Message m is sent by 𝑃0 after receiving a checkpoint request from the checkpoint
coordinator
• Assume m reaches 𝑃1 before the checkpoint request
• This situation results in an inconsistent checkpoint since checkpoint 𝐶1,𝑥 shows the
receipt of message m from 𝑃0, while checkpoint 𝐶0,𝑥 does not show m being sent from
𝑃0
Example (b): A solution with FIFO channels. If channels are FIFO, the checkpoint request
can be sent on each channel before any post-checkpoint message, so a message such as m
cannot overtake the request, and the receiver takes its checkpoint before delivering m.
Types
1. Pessimistic Logging
• Pessimistic logging protocols assume that a failure can occur after any non-deterministic
event in the computation. However, in reality failures are rare
• Pessimistic protocols implement the following property, often referred to as synchronous
logging, which is stronger than the always-no-orphans condition.
• Synchronous logging
– ∀e: ¬Stable(e) ⇒ |Depend(e)| = 0
• That is, if an event has not been logged on stable storage, then no process can depend
on it.
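The synchronous-logging rule can be sketched in a few lines: the determinant of a non-deterministic event is forced to stable storage before the event is allowed to affect the computation. The names stable_log and deliver are illustrative assumptions in this minimal sketch.

```python
stable_log = []

def deliver(msg, handler):
    """Pessimistic (synchronous) logging: first log the determinant of
    the delivery event to stable storage, blocking until it is durable,
    and only then let the application observe the event. No process can
    ever depend on an unlogged event."""
    stable_log.append(("determinant", msg))
    return handler(msg)

out = deliver("m1", lambda m: m.upper())
print(out, len(stable_log))   # M1 1
```

The blocking write on every delivery is exactly the failure-free overhead that optimistic and causal logging try to avoid.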
Example:
Suppose processes 𝑃1 and 𝑃2 fail as shown, restart from checkpoints B and C, and roll
forward using their determinant logs to deliver again the same sequence of messages as in the
pre-failure execution
2. Optimistic Logging
• Consider the example shown in Figure. Suppose process P2 fails before the determinant
for m5 is logged to stable storage. Process P1 then becomes an orphan process and
must roll back to undo the effects of receiving the orphan message m6. The rollback of
P1 further forces P0 to roll back to undo the effects of receiving message m7.
• Advantage: better performance in failure-free execution
• Disadvantages:
• coordination required on output commit
• more complex garbage collection
• Since determinants are logged asynchronously, output commit in optimistic logging
protocols requires a guarantee that no failure scenario can revoke the output. For
example, if process P0 needs to commit output at state X, it must log messages m4 and
m7 to the stable storage and ask P2 to log m2 and m5. In this case, if any process fails,
the computation can be reconstructed up to state X.
3. Causal Logging
• Combines the advantages of both pessimistic and optimistic logging at the expense of a more
complex recovery protocol
• Consider the example in Figure. Messages m5 and m6 are likely to be lost on the failures
of P1 and P2 at the indicated instants.
• Process P0 at state X will have logged the determinants of the nondeterministic events
that causally precede its state according to Lamport's happened-before relation.
• These events consist of the delivery of messages m0, m1, m2, m3, and m4.
• The determinant of each of these non-deterministic events is either logged on the stable
storage or is available in the volatile log of process P0.
• The determinant of each of these events contains the order in which its original receiver
delivered the corresponding message.
• The message sender, as in sender-based message logging, logs the message content.
Thus, process P0 will be able to “guide” the recovery of P1 and P2 since it knows the
order in which P1 should replay messages m1 and m3 to reach the state from which P1
sent message m4.
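The piggybacking idea behind causal logging can be sketched with a toy class: each outgoing message carries the determinants of delivery events that causally precede it, so the receiver can later guide the sender's replay. The class name CausalProcess and the representation of a determinant as a (process, message) pair are illustrative simplifications.

```python
class CausalProcess:
    """Toy causal-logging process: determinants of delivered messages are
    piggybacked on outgoing messages instead of being forced to stable
    storage on every delivery."""
    def __init__(self, name):
        self.name = name
        self.dets = []                       # determinants known locally

    def deliver(self, msg, piggyback):
        # merge piggybacked determinants, then record our own delivery order
        for d in piggyback:
            if d not in self.dets:
                self.dets.append(d)
        self.dets.append((self.name, msg))

    def send(self, msg):
        # piggyback everything known so the receiver can guide our recovery
        return msg, list(self.dets)

p1, p0 = CausalProcess("P1"), CausalProcess("P0")
p1.deliver("m1", [])
p1.deliver("m3", [])
msg, pb = p1.send("m4")
p0.deliver(msg, pb)   # P0 now knows the order in which P1 delivered m1, m3
print(pb)             # [('P1', 'm1'), ('P1', 'm3')]
```

This mirrors the example above: after receiving m4, P0 holds the determinants needed to replay m1 and m3 at P1 in the original order.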
A local checkpoint
• All processes save their local states at certain instants of time.
• A local checkpoint is a snapshot of the state of the process at a given instant.
• Assumption
– A process stores all local checkpoints on the stable storage
– A process is able to roll back to any of its existing local checkpoints