DC - Unit 4 Latest
Example:
• Fail-stop model: a process may crash in the middle of a step, which could be the
execution of a local operation or processing of a message for a send or receive
event.
• Byzantine failure model: a process may behave arbitrarily.
• Choice of failure model depends upon the feasibility and complexity of solving
consensus.
Problem Definition
2. Synchronous/Asynchronous communication:
3. Network connectivity:
• The system has full logical connectivity, i.e., each process can communicate with
any other by direct message passing.
4. Sender identification:
• A process that receives a message always knows the identity of the sender
process.
• When multiple messages are expected from the same sender in a single round,
we implicitly assume a scheduling algorithm that sends these messages in sub-
rounds, so that each message sent within the round can be uniquely identified.
5. Channel reliability:
• The channels are reliable, and only the processes may fail
6. Agreement variable:
• The agreement variable may be boolean or multi-valued, and need not be an
integer.
Case study: Difficulty of reaching agreement
• Inspired by the long wars fought by the Byzantine Empire in the Middle Ages
• Four camps of the attacking army, each commanded by a general, are camped
around the fort of Byzantium.
• They can succeed in attacking only if they attack simultaneously. Hence, they
need to reach agreement on the time of attack.
• The only way they can communicate is to send messengers among themselves.
The messengers model the messages.
• A traitor may inform one general to attack at 10am, and inform the other
generals to attack at noon. Or he may not send a message at all to some general.
Likewise, he may tamper with messages he gets from other generals, before
relaying those messages.
• Validity: If the source process is non-faulty, then the agreed upon value by all
the non-faulty processes must be the same as the initial value of the source.
• Validity: If all the non-faulty processes have the same initial value, then the
agreed upon value by all the non-faulty processes must be that same value.
• Agreement: All non-faulty processes must agree on the same array of values
A[v1 . . . vn].
• Validity: If process i is non-faulty and its initial value is vi, then all non-faulty
processes agree on vi as the ith element of the array A. If process j is faulty, then
the non-faulty processes can agree on any value for A[j].
• In this round, say round r, all the processes that have not failed so far succeed in
broadcasting their values, and all these processes take the minimum of the values
broadcast and received in that round.
• Thus, the local values at the end of the round are the same, say x_i^r, for all non-failed
processes. In further rounds, only this value may be sent by each process at most
once, and no process i will update its value x_i^r.
Agreement in (Message-Passing) Synchronous Systems with Failures
• The validity condition is satisfied because processes do not send fictitious values
in this failure model. (Thus, a process that crashes has sent only correct values
until the crash). For all i, if the initial value is identical, then the only value sent
by any process is that identical value which is the value agreed upon as per the
agreement condition
Complexity:
• The number of messages is at most O(n^2) in each round.
• The total number of messages is O((f + 1) · n^2).
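The round structure discussed above can be sketched as a small simulation. This is only an illustration under stated assumptions: it assumes the usual scheme of f + 1 rounds in which each process broadcasts its current value if it has not already sent it and then keeps the minimum of what it holds and receives. The names crash_consensus and crash_plan are hypothetical, and a crash is modelled as a broadcast that reaches only some recipients.

```python
def crash_consensus(initial, f, crash_plan=None):
    """Simulate min-value consensus under crash failures (illustrative sketch).

    initial    -- list of initial values, one per process
    f          -- maximum number of crash failures tolerated
    crash_plan -- hypothetical map {pid: (round, recipients_reached)}: the process
                  crashes in that round after reaching only those recipients
    Returns {pid: decided value} for the processes that did not crash.
    """
    crash_plan = crash_plan or {}
    n = len(initial)
    x = list(initial)
    alive = [True] * n
    last_sent = [None] * n

    for r in range(1, f + 2):            # rounds 1 .. f + 1
        inbox = [[] for _ in range(n)]
        for i in range(n):
            if not alive[i]:
                continue
            crashing = i in crash_plan and crash_plan[i][0] == r
            if x[i] != last_sent[i]:     # send a value at most once
                targets = crash_plan[i][1] if crashing else range(n)
                for j in targets:
                    inbox[j].append(x[i])
                last_sent[i] = x[i]
            if crashing:
                alive[i] = False
        for j in range(n):
            if alive[j]:
                x[j] = min([x[j]] + inbox[j])

    return {i: x[i] for i in range(n) if alive[i]}
```

In the example crash_consensus([3, 1, 2, 5], f=1, crash_plan={1: (1, [0])}), process 1 crashes in round 1 after reaching only process 0. The second round is what lets process 0 relay the minimum to the others, which is why f + 1 rounds are needed.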
2. Consensus Algorithms for Byzantine Failures (Synchronous System)
(A) Upper Bound on Byzantine Processes
Proof:
With n processes, of which f ≥ n/3 are Byzantine, the Byzantine agreement problem
cannot be solved.
• Let Z(3, 1) denote the Byzantine agreement problem for parameters n = 3 and f =
1.
• Let Z(n ≤ 3f, f) denote the Byzantine agreement problem for parameters n(≤ 3f)
and f.
• A reduction from Z(3, 1) to Z(n ≤ 3f, f) needs to be shown, i.e., if Z(n ≤ 3f, f) is
solvable, then Z(3, 1) is also solvable. After showing this reduction, we can argue
that as Z(3, 1) is not solvable, Z(n ≤ 3f, f) is also not solvable.
The main idea of the reduction argument
• In Z(n ≤ 3f, f), partition the n processes into three sets S1, S2, S3, each of size ≤
n/3.
• In Z(3, 1), each of the three processes P1, P2, P3 simulates the actions of the
corresponding set S1, S2, S3 in Z(n ≤ 3f, f).
• If one process is faulty in Z(3, 1), then at most n/3 ≤ f processes (those in the
set it simulates) are faulty in Z(n ≤ 3f, f).
• These actions are crucial – they entail taking the majority of the values at
each level of the tree.
• The final value of the root is the agreement value, which will be the same at
all processes.
(B) Byzantine Agreement Tree Algorithm: Exponential (Synchronous System)
(i) Iterative formulation
(i) Iterative formulation – Algorithm
(i) Correctness of Byzantine Agreement Algorithm
• Loyal commander: Given f and x, if the commander process is loyal, then
Oral_Msg(x) is correct if there are at least 2f + x processes.
• No assumption about commander: Given f, Oral_Msg(x) is correct if x ≥ f and
there are a total of 3x + 1 or more processes.
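The two correctness conditions above can be written as simple predicates. This is a minimal sketch; the function names are hypothetical, and each function only checks the sufficient condition stated in the corresponding bullet, not the algorithm itself.

```python
def oral_msg_ok_loyal_commander(n, f, x):
    """Loyal commander: Oral_Msg(x) is correct if there are at least 2f + x processes."""
    return n >= 2 * f + x


def oral_msg_ok_any_commander(n, f, x):
    """No assumption about the commander: correct if x >= f and n >= 3x + 1."""
    return x >= f and n >= 3 * x + 1
```

For example, with one traitor (f = 1) and x = 1, the second predicate holds for n = 4 but not for n = 3, which matches the bound on Byzantine processes in part (A).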
Complexity
• The phase-king algorithm solves the consensus problem using f + 1 phases, and a
polynomial number of messages
• Complexity:
The algorithm requires f + 1 phases of two sub-rounds each, and
(f + 1)[(n − 1)(n + 1)] messages.
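As a quick arithmetic check of the complexity figures above, the following sketch evaluates them for concrete n and f. The function name phase_king_cost is hypothetical; the formulas are taken directly from the bullet.

```python
def phase_king_cost(n, f):
    """Return (phases, sub_rounds, messages) for the phase-king algorithm."""
    phases = f + 1
    sub_rounds = 2 * phases                 # two sub-rounds per phase
    messages = phases * (n - 1) * (n + 1)   # (f + 1)[(n - 1)(n + 1)]
    return phases, sub_rounds, messages
```

For n = 4 and f = 1 this gives 2 phases, 4 sub-rounds and 30 messages.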
Checkpointing and Rollback Recovery
Introduction
• Distributed systems are not fault-tolerant, and the vast computing potential of
these systems is often hampered by their susceptibility to failures.
• Rollback recovery achieves fault tolerance by periodically saving the state of a
process during the failure-free execution, and restarting from a saved state upon
a failure to reduce the amount of lost work.
• The saved state is called a checkpoint, and the procedure of restarting from a
previously checkpointed state is called rollback recovery.
• A checkpoint can be saved on either the stable storage or the volatile storage
depending on the failure scenarios to be tolerated
• A system recovers correctly if its internal state is consistent with the observable
behavior of the system before the failure
2. Local checkpoint
• A local checkpoint is a snapshot of the state of the process at a given instant,
and the event of recording the state of a process is called local checkpointing.
• The contents of a checkpoint depend upon the application context and the
checkpointing method being used.
Background and Definitions
• Depending upon the checkpointing method used, a process may keep several
local checkpoints or just a single checkpoint at any time
• We assume that a process stores all local checkpoints on the stable storage so
that they are available even if the process crashes
• We also assume that a process is able to roll back to any of its existing local
checkpoints and thus restore to and restart from the corresponding state.
3. Consistent and Inconsistent system states
• A consistent system state is one in which, if a process’s state reflects a message
receipt, then the state of the corresponding sender reflects the sending of
that message.
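The definition above can be checked mechanically. The sketch below is illustrative only: it assumes each process state is summarised by the set of message ids it has recorded sending and the set of (sender, message id) pairs it has recorded receiving. The names find_orphans and is_consistent are hypothetical.

```python
def find_orphans(sent, received):
    """Return (sender, receiver, msg_id) for every receipt with no recorded send.

    sent[p]     -- set of message ids whose send is reflected in p's state
    received[p] -- set of (sender, msg_id) whose receipt is reflected in p's state
    """
    orphans = []
    for receiver, msgs in received.items():
        for sender, msg_id in sorted(msgs):
            if msg_id not in sent.get(sender, set()):
                orphans.append((sender, receiver, msg_id))
    return orphans


def is_consistent(sent, received):
    """A system state is consistent if every recorded receipt has a recorded send."""
    return not find_orphans(sent, received)
```

If P1 rolls back past the send of m1 while P2 still records its receipt, find_orphans reports ("P1", "P2", "m1") and the state is inconsistent.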
• For example, a printer cannot roll back the effects of printing a character, and an
automatic teller machine cannot recover the money that it dispensed to a
customer
• Output commit
Before sending output to the outside world process (OWP), the system must ensure
that the state from which the output is sent will be recovered despite any future
failure. This is commonly called the output commit problem.
• Input messages
• Input messages that a system receives from the OWP may not be reproducible
during recovery, because it may not be possible for the outside world to
regenerate them.
• Recovery protocols must arrange to save these input messages so that they can
be retrieved when needed for execution replay after a failure.
• A common approach is to save each input message on the stable storage before
allowing the application program to process it.
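The "save before processing" approach above might look as follows. This is a minimal sketch under assumptions: stable storage is modelled as a local append-only file forced to disk with os.fsync, and the names InputLog and handle_input are hypothetical.

```python
import json
import os


class InputLog:
    """Append-only log of input messages, standing in for stable storage."""

    def __init__(self, path):
        self.path = path

    def append(self, message):
        # Write the message and force it to disk before returning.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        """Return all logged inputs in arrival order, for execution replay."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]


def handle_input(log, message, process):
    """Save the input first, then let the application process it."""
    log.append(message)
    return process(message)
```

After a failure, replay() returns the saved inputs so the application can re-execute them without asking the outside world to regenerate them.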
5. Different types of messages
a. In-transit messages
Messages that have been sent but not yet received.
b. Lost messages
Messages whose "send" is done but whose "receive" is undone due to rollback.
c. Delayed messages
Messages whose "receive" is not recorded because the receiving process was
either down or the message arrived after rollback.
d. Orphan messages
• Messages with "receive" recorded but "send" not recorded.
• Do not arise if processes roll back to a consistent global state.
e. Duplicate messages
Due to message logging and replaying during process recovery some messages
are sent repeatedly.
Issues in Failure Recovery
Overlapping failure
2. Coordinated checkpointing
(a) Blocking coordinated checkpointing
(b) Non blocking checkpointing coordination
4. Communication-Induced Checkpointing
(a) Model-based Checkpointing
(b) Index based checkpointing
Checkpoint Based Recovery
1. Uncoordinated checkpointing
• Each process has autonomy in deciding when to take checkpoints.
Drawbacks:
1. There is the possibility of the domino effect during a recovery, which may cause
the loss of a large amount of useful work.
3. Useless Checkpoints
(b) Useless checkpoints are undesirable because they incur overhead and do not
contribute to advancing the recovery line.
5. It is not suitable for applications with frequent output commits because these
require global coordination to compute the recovery line.
2. When a process receives this message, it stops its execution and replies with
the dependency information saved on the stable storage
3. The initiator then calculates the recovery line based on the global dependency
information and broadcasts a rollback request message containing the
recovery line.
4. Upon receiving this message, a process whose current state belongs to the
recovery line simply resumes execution; otherwise, it rolls back to an earlier
checkpoint as indicated by the recovery line.
The direct dependency tracking technique shown in the below diagram is
commonly used in uncoordinated checkpointing.
• Let c_{i,x} be the xth checkpoint of process P_i, where i is the process id and x is
the checkpoint index.
• Let I_{i,x} denote the checkpoint interval (or simply interval) between checkpoints
c_{i,x−1} and c_{i,x}.
• When P_i sends a message m to P_j during interval I_{i,x}, it piggybacks the pair
(i, x) on m.
• When P_j receives m during interval I_{j,y}, it records the dependency from I_{i,x} to
I_{j,y}, which is later saved onto stable storage when P_j takes checkpoint c_{j,y}.
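The dependency tracking described above, and its use in finding a recovery line, can be sketched as follows. This is an illustration under assumptions, not the exact procedure from the slides: each process is assumed to restart from one of its saved checkpoints, checkpoint 0 stands for the initial state, and the names TrackedProcess and recovery_line are hypothetical.

```python
class TrackedProcess:
    """Records interval-to-interval dependencies via piggybacked (pid, interval) pairs."""

    def __init__(self, pid):
        self.pid = pid
        self.interval = 1      # currently executing interval I_{pid,1}
        self.pending = set()   # dependencies seen in the current interval (volatile)
        self.stable = set()    # dependencies saved together with checkpoints

    def send(self):
        """Return the pair piggybacked on an outgoing message."""
        return (self.pid, self.interval)

    def receive(self, tag):
        """Record the dependency from the sender's interval to the current one."""
        self.pending.add((tag, (self.pid, self.interval)))

    def checkpoint(self):
        """Take checkpoint c_{pid,interval}, save dependencies, start the next interval."""
        self.stable |= self.pending
        self.pending.clear()
        taken = self.interval
        self.interval += 1
        return taken


def recovery_line(latest, deps):
    """Lower checkpoint indices until no saved receipt depends on an undone send.

    latest -- {pid: index of the latest checkpoint available to pid}
    deps   -- set of ((i, x), (j, y)) pairs: sent in I_{i,x}, received in I_{j,y}
    """
    line = dict(latest)
    changed = True
    while changed:
        changed = False
        for (i, x), (j, y) in deps:
            # The send in I_{i,x} is undone, but the receipt is inside c_{j,line[j]}.
            if x > line[i] and y <= line[j]:
                line[j] = y - 1
                changed = True
    return line
```

With latest = {0: 1, 1: 2, 2: 2} and dependencies (0,2)→(1,2) and (1,2)→(2,1), the line falls to {0: 1, 1: 1, 2: 0}. This cascading rollback is the domino effect listed among the drawbacks of uncoordinated checkpointing.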
2. Coordinated checkpointing
• Processes coordinate their checkpointing activities so that all local checkpoints
form a consistent global state.
• Benefits:
o Simplifies recovery
o Avoids the domino effect
o Each process restarts from its most recent checkpoint
• Requires each process to maintain only one checkpoint on the stable storage,
reducing the storage overhead and eliminating the need for garbage
collection.
• Solution:
Checkpoint consistency can be achieved without synchronized clocks by either:
(a) Blocking the message sending for the running duration of the protocol
(b) Piggybacking checkpoint indices on messages to avoid blocking.
Approach:
Coordinated checkpointing involves blocking communications while the
checkpointing protocol executes.
Procedure:
• After a process takes a local checkpoint, to prevent orphan messages, it remains
blocked until the entire checkpointing activity is complete.
• The coordinator takes a checkpoint and broadcasts a request message to all
processes, asking them to take a checkpoint.
• A problem with this approach is that the computation is blocked during the
checkpointing and therefore, nonblocking checkpointing schemes are preferable.
(ii) Non Blocking Coordinated Checkpointing
• In this approach processes need not stop their execution while taking
checkpoints.
Approach 1:
• Uses the idea of the snapshot algorithm of Chandy and Lamport, in which
markers play the role of the checkpoint-request messages.
• Each process takes a checkpoint upon receiving the first marker and sends the
marker on all outgoing channels before sending any application message
Approach 2:
• Checkpoint indices are piggybacked on messages; a checkpoint is triggered when the
receiver’s local checkpoint index is lower than the piggybacked checkpoint
index.
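Approach 2 can be illustrated with a small sketch. It assumes that each message carries the sender's current checkpoint index, and that the receiver takes its checkpoint before processing the message so that the receipt is not part of the checkpointed state. The class and field names are hypothetical.

```python
class IndexedProcess:
    """Process that piggybacks its checkpoint index on every outgoing message."""

    def __init__(self, name):
        self.name = name
        self.ckpt_index = 0
        self.state = []               # application state: payloads processed so far
        self.checkpoints = {0: []}    # checkpoint index -> saved copy of the state

    def take_checkpoint(self, index):
        self.ckpt_index = index
        self.checkpoints[index] = list(self.state)

    def send(self, payload):
        return {"payload": payload, "ckpt_index": self.ckpt_index}

    def receive(self, msg):
        # Triggered checkpoint: the local index is lower than the piggybacked one.
        if self.ckpt_index < msg["ckpt_index"]:
            self.take_checkpoint(msg["ckpt_index"])
        self.state.append(msg["payload"])
```

Taking the checkpoint before appending the payload keeps a message sent after the sender's checkpoint out of the receiver's checkpoint with the same index, so that pair of checkpoints contains no orphan message.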
3. Impossibility of Min Process Non-blocking Checkpointing
• A min-process, non-blocking checkpointing algorithm is one that:
o Forces only a minimum number of processes to take a new checkpoint
o Does not force any process to suspend its computation
• Upon receiving the request, each process in turn identifies all processes it has
communicated with since the last checkpoints and sends them a request, and so
on, until no more processes can be identified.
Phase 2:
• all processes identified in the first phase take a checkpoint
• The result is a consistent checkpoint that involves only the participating
processes.
• In this protocol, after a process takes a checkpoint, it cannot send any message
until the second phase terminates successfully, although receiving a message
after the checkpoint has been taken is allowable.
• Based on the concept called ’Z-dependency’, Cao and Singhal proved that there
does not exist a non-blocking algorithm that allows a minimum number of
processes to take their checkpoints.
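The first-phase identification described above, where each process forwards the request to every process it has communicated with since its last checkpoint, amounts to a reachability search. The sketch below is illustrative only; identify_participants and the communicated_with map are hypothetical names, and the request messages are replaced by a local breadth-first traversal.

```python
from collections import deque


def identify_participants(initiator, communicated_with):
    """Return the set of processes that must take a checkpoint.

    communicated_with[p] -- processes p has communicated with since its last checkpoint
    """
    participants = {initiator}
    queue = deque([initiator])
    while queue:
        p = queue.popleft()
        for q in communicated_with.get(p, ()):
            if q not in participants:   # forward the request only once per process
                participants.add(q)
                queue.append(q)
    return participants
```

For communicated_with = {"P1": {"P2"}, "P2": {"P3"}, "P4": {"P1"}} and initiator "P1", the participants are P1, P2 and P3. P4 is not forced to checkpoint, which is what makes the scheme min-process.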
4. Communication-Induced Checkpointing
• It is another way to avoid domino effect while allowing processes to take some of
their checkpoints independently.
• Two types
1. Autonomous checkpoints: The checkpoints that a process takes independently
• The forced checkpoint must be taken before the application may process the
contents of the message
• Two types of communication-induced checkpointing
1. Model-based checkpointing
2. Index-based checkpointing
Model-based checkpointing
• Model-based checkpointing prevents checkpoints that could result in inconsistent
states among the existing checkpoints and the communication pattern related to
it.
• Communication pattern causing inconsistency is prevented by forced checkpoints.
2. Rollback prevention:
• Proposed by Wu and Fuchs.
• In this approach, the domino effect is avoided by taking a checkpoint immediately
after every message-sending event.
• Avoids rollback propagation
• Includes 2 parts:
o the checkpointing algorithm
o the recovery algorithm
• Thus a situation will not arise where there is a record of a message being
received but there is no record of sending it.
An Optimization
• The above protocol may cause a process to take a checkpoint even when it is not
necessary for consistency
Koo-Toueg Coordinated Checkpointing Algorithm
• Since taking a checkpoint is an expensive operation, we must avoid taking
checkpoints if it is not necessary
The Rollback Recovery Algorithm
• The rollback recovery algorithm restores the system state to a consistent state
after a failure.
Correctness
All processes restart from an appropriate state: if the processes decide to
restart, they resume execution from a consistent state, because the checkpointing
algorithm takes a consistent set of checkpoints.
An Optimization
The above recovery protocol causes all processes to roll back irrespective of
whether a process needs to roll back or not.
Juang and Venkatesan Algorithm for Asynchronous Checkpointing and Recovery
The algorithm makes the following assumptions about the underlying system
• The communication channels are reliable
• Messages are delivered in FIFO order
• The communication channels have infinite buffers
• The message transmission delay is arbitrary, but finite
Checkpointing Algorithm
• After executing an event, a processor records a triplet {s, m, msgs_sent} in its
volatile storage
o s is the state of the processor before the event
o m is the message (including the identity of the sender of m, denoted as
m.sender) received
o msgs_sent is the set of messages sent by the processor during the event.
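The per-event logging above can be sketched as follows. It is a minimal illustration under assumptions: the processor state is a flat dictionary, the event handler returns the messages it sends, and the volatile log is periodically flushed to a list standing in for stable storage. The names LogEntry, LoggingProcessor and flush are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    s: dict           # state of the processor before the event
    m: object         # message received (carrying its sender), or None
    msgs_sent: list   # messages sent by the processor during the event


class LoggingProcessor:
    def __init__(self):
        self.state = {"count": 0}
        self.volatile_log = []
        self.stable_log = []   # stands in for stable storage (assumption)

    def execute_event(self, m, handler):
        """Run one event and record the triplet {s, m, msgs_sent} in volatile storage."""
        before = dict(self.state)            # snapshot taken before the event
        msgs_sent = handler(self.state, m)   # the handler mutates the state
        self.volatile_log.append(LogEntry(before, m, list(msgs_sent)))

    def flush(self):
        """Move the volatile log to stable storage and clear it."""
        self.stable_log.extend(self.volatile_log)
        self.volatile_log.clear()
```

Only entries that have been flushed survive a crash, so the flush interval bounds how much work a failed processor has to redo.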
• The recovery algorithm achieves this by making each processor keep track of both
the number of messages it has sent to other processors as well as the number of
messages it has received from other processors
• Whenever a process rolls back, it is necessary for all other processes to find out if
any message sent by the rolled back process has become an orphan message
• If RCVD_{i←j}(CkPt_i) > SENT_{j→i}(CkPt_j), that is, p_i has recorded more messages
received from p_j than p_j has recorded sending to p_i, then one or more messages
received by processor p_i are orphan messages.
• In this case, processor p_i must roll back to a state where the number of
messages received agrees with the number of messages sent.
• When a process fails:
1. Rolls back to its latest checkpoint
2. Computes its send counts and transmits them to the other processes
3. Receives the send counts from the other processes
4. A process whose receive count exceeds the reported send count rolls back to the
latest checkpoint at which send = receive; otherwise it remains at its most
recent checkpoint.
• Suppose processor Y crashes at the point indicated and rolls back to a state
corresponding to checkpoint e_{y1}.
• According to this state, Y has sent only one message to X; however, according
to X’s current state (e_{x2}), X has received two messages from Y. Therefore, X must
roll back to a state preceding e_{x2} to be consistent with Y’s state.
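The count comparison in this example can be sketched as a search over X's logged states. The sketch is illustrative only: rollback_point is a hypothetical name, and the receive counts [0, 1, 2] for states e_{x0}, e_{x1}, e_{x2} are assumed values, since the original figure is not available.

```python
def rollback_point(rcvd_history, sent_by_peer):
    """Return the index of the latest logged state with no orphan messages.

    rcvd_history[k] -- messages received from the peer as of logged state k
                       (non-decreasing in k)
    sent_by_peer    -- messages the peer has sent according to its restored state
    """
    for k in range(len(rcvd_history) - 1, -1, -1):
        if rcvd_history[k] <= sent_by_peer:
            return k
    raise ValueError("no logged state is consistent with the peer's send count")


# Assumed counts: X has received 0, 1 and 2 messages from Y at e_x0, e_x1 and e_x2.
x_received_from_y = [0, 1, 2]
```

With Y reporting one message sent, rollback_point(x_received_from_y, 1) selects state index 1, a state preceding e_{x2}, as the example requires.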