DISTRIBUTED SYSTEMS

Structure

2.0   Introduction
2.1   Objectives
2.2   History of Distributed Computing
2.3   Distributed Systems
2.4   Key Features and Advantages of a Distributed System
2.5   Design Goals of a Distributed System
        Concurrency
        Scalability
        Openness
        Fault Tolerance
        Privacy and Authentication
        Transparency
2.6   Design Issues
        Naming
        Communication
        Software Structure
        Workload Allocation
        Consistency Maintenance
        Lamport's Scheme of Ordering of Events
2.7   Software Structure
2.8   Distributed Mutual Exclusion
2.9   Remote Procedure Calls
2.10
2.11  Summary
2.12
2.13
2.0 INTRODUCTION
In the earlier unit we discussed multiprocessor systems and, while studying processor coupling, came across tightly coupled and loosely coupled systems. In this unit we concentrate on the loosely coupled systems, called distributed systems. Distributed computing is the process of aggregating the power of several computing entities to collaboratively run a computational task in a transparent and coherent way, so that it appears as a single, centralised system.
The easiest way of explaining what distributed computing is all about is by naming a few of its properties.
Before going into the actual details of the distributed systems let us study how
distributed computing evolved.
2.1 OBJECTIVES
After going through this unit, you should be able to:
2.2 HISTORY OF DISTRIBUTED COMPUTING
Distributed computing began around 1970 with the emergence of two technologies:
With the minicomputer (e.g., Digital's PDP-11) came the timesharing operating system (e.g., MULTICS, Unix, RSX, RSTS): many users share the same machine, but it looks to each user as if they have their own machine.
The problem with mini-computers was that they were slower than the mainframes
made by IBM, Control Data, Univac, etc. As they became popular they failed to scale to large numbers of users as the mainframes could. The way to scale mini-computers was to buy more of them. The trend toward cheaper machines made the idea of having many minis a feasible replacement for a single mainframe, and made it possible to contemplate a future computing environment where every user had their own computer on their desk: a computer workstation.
Work on the first computer workstation began in 1970 at Xerox Corporation's Palo Alto Research Center (PARC). This computer was called the Alto. Over the next 10 years, the computer systems group at PARC would invent almost everything that is interesting about the computer workstations and personal computers we use today.
The Alto introduced ideas like the workstation, bit-mapped displays (before that
computer interfaces were strictly character-based) and the mouse.
Other innovations that came from PARC from 1970 to 1980 include Ethernet, the first local-area network; window- and icon-based computing (the Apple Lisa, the progenitor of the Macintosh, which in turn inspired Microsoft Windows and IBM OS/2, drew its inspiration from a visit to PARC); the first distributed fileserver, XDFS; the first print server; one of the first distributed services, Grapevine (a messaging and authentication system); object-oriented programming (Smalltalk); and Hoare's condition variables and monitors, which were implemented as part of the Mesa programming language used in the Cedar system.
2.2.1 Workstations-Networks
The vision of the PARC research (and the commercial systems that followed) was to
replace the timesharing mini-computer with single-user workstations. The genesis of
this idea is that it made computers (i.e., workstations and PCs) into a commodity item that, like a TV or a car, could be produced efficiently and cheaply. The main costs of a computer are the engineering costs of designing it and designing the manufacturing process to build it. If you build more units, you can amortise the engineering costs better and thus make the computers cheaper. This is a very important idea and is the main reason that distribution is an excellent way to build a cheap, scalable system.
2.2.2
Loosely-coupled (distributed): Each processor has its own memory and copy
of the OS.
The only way two nodes can communicate is by sending and receiving network messages; this differs from a hardware approach, in which hardware signalling can be used for flow control or failure detection.
2.4.3 Disadvantages of Distributed Systems
Although we have seen several advantages of distributed systems, there are certain disadvantages as well, which are listed below (a small worked example of the communication costs follows the list):
Latency: Delay that occurs after a send operation is executed before data starts
to arrive at the destination computer.
Data Transfer Rate: Speed at which data can be transferred between two
computers once transmission has begun.
Total network bandwidth: Total volume of traffic that can be transferred across the network in a given time.
Higher security risk due to more possible access points for intruders and
possible communication with insecure systems.
Software complexity.
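To see how these communication parameters combine, here is a small Python sketch that estimates the delivery time of a single message; the latency, rate and message size used are made-up figures, purely for illustration.

# Hypothetical link figures, chosen only to illustrate the arithmetic.
latency_s = 0.020              # delay before the first byte arrives (20 ms)
rate_bytes_per_s = 10_000_000  # data transfer rate once transmission has begun
message_bytes = 2_000_000      # size of the message being sent

total_time_s = latency_s + message_bytes / rate_bytes_per_s
print(f"approximate delivery time: {total_time_s * 1000:.0f} ms")   # about 220 ms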
2.5 DESIGN GOALS OF A DISTRIBUTED SYSTEM
The key goals that guide the design of a distributed system are:
Concurrency
Scalability
Openness
Fault Tolerance
Privacy and Authentication
Transparency.
2.5.1 Concurrency
A server must handle many client requests at the same time. Distributed systems are
naturally concurrent; that is, there are multiple workstations running programs
independently and at the same time. Concurrency is important because any distributed
service that isn't concurrent would become a bottleneck that would serialise the actions of its clients and thus reduce the natural concurrency of the system.
2.5.2 Scalability
The goal is to be able to use the same software for different size systems. A distributed
software system is scalable if it can handle increased demand on any part of the
system (i.e., more clients, bigger networks, faster networks, etc.) without a change to
the software. In other words, we would like the engineering impact of increased
demand to be proportional to that increase. Distributed systems, however, can be built
for a very wide range of scales and it is thus not a good idea to try to build a system
that can handle everything. A local-area network file server should be built differently
from a Web server that must handle millions of requests a day from throughout the
world. The key goal is to understand the target system's expected size and expected growth, and to understand how the distributed system will scale as the system grows.
2.5.3 Openness
2.5.4 Fault Tolerance
Failures are more harmful: Many clients are affected by the failure of a
distributed service, unlike a non-distributed system in which a failure affects
only a single node.
Recovery
After a failure, the system should recover critical data, even data that was being modified when the failure occurred. Data that survives failures is called persistent data.
Very long-running computations must also be made recoverable in order to
restart them where they left off instead of from the beginning.
For example, if a fileserver crashes, the data in the file system it serves should
be intact after the server is restarted.
Availability
This is a bit harder to achieve than recovery. We often speak of a highly available service as one that is almost always available even if failures occur.
For example, a fileserver could be made highly available by running two copies
of the server on different nodes. If one of the servers fails, the other should be
able to step in without service interruption.
2.5.5 Privacy and Authentication
Privacy is achieved when the sender of a message can control what other programs (or
people) can read the message. The goal is to protect against eavesdropping. For
example, if you use your credit card to buy something over the Web, you will
probably want to prevent anyone but the target Web server from reading the message
that contains your credit card account number. Authentication is the process of
ensuring that programs can know who they are talking to. This is important for both
clients and servers.
For clients, authentication is needed to enable a concept called trust. For example, the fact that you are willing to give your credit card number to a merchant when you buy something means that you are implicitly trusting that merchant to use your number according to the rules to which you have both agreed (to debit your account for the amount of the purchase and to give the number to no one else). To make a Web purchase, you must trust the merchant's Web server just like you would trust the merchant for an in-person purchase. To establish this trust, however, you must ensure that your Web browser is really talking to the merchant's Web server and not to some other program that is just pretending to be that merchant.
For servers, authentication is needed to enforce access control. For a server to control
who has access to the resources it manages (your files if it is a fileserver, your money
if it is a banking server), it must know who it is talking to. A Unix login is a crude
example of an authentication used to provide access control. It is a crude example
because a remote login sends your username and password in messages for which
privacy is not guaranteed. It is thus possible, though usually difficult, for someone to
eavesdrop on those messages and thus figure out your username and password.
For a distributed system, the only way to ensure privacy and authentication is by using
cryptography.
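As a minimal illustration (not taken from this unit) of applying cryptography to the authentication problem, the sketch below uses Python's standard hmac and hashlib modules to attach and check a message authentication code derived from a shared secret; the key and message are hypothetical, and privacy would additionally require encrypting the message.

import hashlib
import hmac

SHARED_SECRET = b"example-shared-key"    # hypothetical key agreed out of band

def sign(message: bytes) -> bytes:
    """Sender side: prepend an authentication tag to the message."""
    tag = hmac.new(SHARED_SECRET, message, hashlib.sha256).digest()
    return tag + message

def verify(packet: bytes) -> bytes:
    """Receiver side: check the tag before trusting the message."""
    tag, message = packet[:32], packet[32:]          # SHA-256 tags are 32 bytes
    expected = hmac.new(SHARED_SECRET, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: message rejected")
    return message

packet = sign(b"debit account 42 by 10.00")
print(verify(packet))    # succeeds only if the tag matches the shared secret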
2.5.6 Transparency
The final goal is transparency. We often use the term single system image to refer to
this goal of making the distributed system look to programs like it is a tightly coupled
(i.e., single) system.
This is really what distributed system software is all about. We want the system software (operating system, runtime library, language, compiler) to deal with all of the complexities of distributed computing so that writing distributed applications is as easy as possible.
Achieving complete transparency is difficult. There are eight types, namely: access, location, concurrency, replication, failure, migration, performance and scaling transparency.

2.6 DESIGN ISSUES
The main design issues to be addressed in a distributed system are:
Naming
Communication
Software Structure
Workload Allocation
Consistency Maintenance
2.6.1 Naming
For example, an IP domain name (e.g., cs.ubc.ca) is turned into an IP address by the Domain Name System (DNS), a distributed hierarchical service running in the Internet.
2.6.2 Communication
Messages
2.6.3 Software Structure
The main issues are to choose a software structure that supports our goals,
particularly the goal of openness.
We thus want structures that promote extensibility and otherwise make it easy to
program the system.
There are several alternative structures; we study the software structure in more detail in Section 2.7 of this unit.
2.6.4 Workload Allocation
The key issue is load balancing: The allocation of the network workload such
that network resources (e.g., CPUs, memory, and disks) are used efficiently.
Idle Workstations
Every user has a workstation on their desk for doing their work. These workstations
are powerful and are thus valuable resources. But really, people don't use their workstations all of the time. There's napping, lunches, reading, meetings, etc.: lots of times when workstations are idle. In addition, when people are using their workstation, they often don't need all of its power (e.g., reading mail doesn't require a 200-MHz CPU and 64 Mbytes of RAM). The goal then is to move processes from active workstations to idle workstations to balance the load and thus make better use of network resources. But people own (or at least feel like they own) their workstations. So a key issue for using idle workstations is to avoid impacting workstation users (i.e., we can't slow them down). So what happens when I come back from lunch and find your programs running on my machine? I'll want your processes moved elsewhere NOW. The ability to move an active process from one machine to another is called process migration. Process migration is necessary to deal with the user-returning-from-lunch issue and is useful for rebalancing network load as the load characteristics of a network change over time.
2.6.5 Consistency Maintenance
The final issue is consistency. There are four key aspects of consistency: atomicity,
coherence, failure consistency, and clock consistency.
Atomicity
Coherence
Coherence is the problem of maintaining the consistency of replicated data.
We have said that replication is a useful technique for (1) increasing availability and
(2) improving performance.
When there are multiple copies of an object and one of them is modified, we must
ensure that the other copies are updated (or invalidated) so that no one can erroneously
read an out-of-date version of the object.
A key issue for providing coherence is to deal with the event-ordering problem.
Failure Consistency
We have already discussed the importance of the goal of fault tolerance. To build systems that can handle failures, we need a model that clearly defines what a failure is. There are two key types of failures: fail-stop and Byzantine.
Fail-stop is a simplified failure model in which we assume that the only way a component will fail is by stopping. In particular, a component will never fail by giving a wrong answer.
In this unit we will assume failures are fail-stop; this is a common assumption for distributed systems and it greatly simplifies fault tolerance.
Clock Consistency
Lamport introduced a system of logical clocks in order to realise the happened-before (->) relation. It works like this: each process Pi in the system has its own clock Ci. Ci can be looked at as a function that assigns a number, Ci(a), to an event a. This is the timestamp of the event a in process Pi. These numbers are not in any way related to physical time; that is why they are called logical clocks. They are generally implemented using counters, which increase each time an event occurs. Generally, an event's timestamp is the value of the clock at the time it occurs.
Conditions satisfied by the logical clock system:
For any events a and b, if a -> b, then C(a) < C(b). This is true if two conditions are met (a small sketch follows):
If a and b are events in the same process Pi and a comes before b, then Ci(a) < Ci(b).
If a is a message sent from Pi and b is the receipt of that same message in Pj, then Ci(a) < Cj(b).
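A minimal sketch of Lamport's clock rules in Python may make these conditions concrete; the Process class and its method names are illustrative, not something defined in this unit.

class Process:
    """Toy model of one process carrying a Lamport logical clock."""

    def __init__(self, name):
        self.name = name
        self.clock = 0

    def local_event(self):
        self.clock += 1            # rule 1: tick before every local event
        return self.clock

    def send(self):
        return self.local_event()  # sending is an event; its timestamp travels with the message

    def receive(self, msg_timestamp):
        # rule 2: jump past both the local clock and the message's timestamp
        self.clock = max(self.clock, msg_timestamp) + 1
        return self.clock

p1, p2 = Process("P1"), Process("P2")
a = p1.send()          # event a in P1
b = p2.receive(a)      # event b in P2: receipt of the same message
assert a < b           # C1(a) < C2(b), as the second condition requires
print(a, b)            # e.g., 1 2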
2.7 SOFTWARE STRUCTURE
2.7.1 Clients and Servers
Nodes that export a service are called servers; nodes that import and use the service are called clients.
We should really use the terms client and server to refer to the pieces of software that implement the service and not to the nodes themselves. Why?
Because a node can actually be both a client and server. Consider NFS for
example. One workstation might export a local filesystem to other nodes (i.e., it
is a server for the file system) and import other filesystems exported by remote
nodes (i.e., it is a client for these filesystems).
With plain message passing, by contrast, a client must construct a message and then explicitly send it and wait for the reply. (More on RPC is given in Section 2.9 of this unit.)
2.7.2 Distributed Objects
Language-level objects (e.g., C++, Java, Smalltalk) are used to encapsulate data
and functions of a distributed service.
Clients communicate with servers through objects that they share with the
server. These shared objects are located in a clients memory and look to the
client just like other local objects. We call these objects remote objects
because they represent a remote service (i.e., something at the server). When a
client invokes a method of a remote object, the system might do something
locally or it might turn the method invocation into an RPC to the server. (Recall
that in the object-oriented world, procedures are called methods and procedure
callls are called method invocations).
There are two main advantages of distributed objects over RPC: (1) objects hide even more of the details of distribution than RPC, and (2) objects allow clients and servers to communicate in two different ways: function shipping (like RPC) and data shipping (a small sketch of both follows below).
Function shipping means that a client calls the server and asks it to perform a function; this is basically like RPC.
Data shipping means that the server sends some data to the client (stored as part of an object) and the client then performs subsequent functions locally instead of having to call the server every time it wants to do something.
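The difference between the two styles can be sketched with a toy client-side proxy in Python; the Server and RemoteObjectProxy classes below are hypothetical stand-ins for a real distributed-object runtime, and the locally cached copy used for data shipping is exactly the kind of replica whose coherence is discussed in Section 2.6.5.

class Server:
    """Hypothetical server holding the authoritative copy of some data."""

    def __init__(self):
        self._records = {"balance": 100}

    def invoke(self, method, *args):   # function shipping: run the call here
        return getattr(self, method)(*args)

    def deposit(self, amount):
        self._records["balance"] += amount
        return self._records["balance"]

    def snapshot(self):                # data shipping: hand the data out
        return dict(self._records)


class RemoteObjectProxy:
    """Client-side object that hides whether work happens locally or remotely."""

    def __init__(self, server):
        self._server = server
        self._cache = None

    def deposit(self, amount):
        # function shipping: forward the call to the server (like an RPC)
        return self._server.invoke("deposit", amount)

    def balance(self):
        # data shipping: fetch the data once, answer later reads locally
        if self._cache is None:
            self._cache = self._server.snapshot()
        return self._cache["balance"]


proxy = RemoteObjectProxy(Server())
proxy.deposit(50)          # shipped to the server
print(proxy.balance())     # 150, served from the locally shipped copy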
2.7.3
For example, a client program can access a local object A and a remote object B (managed by the server node) in exactly the same way; a procedure call flows into the object and a return flows back out of it. The server can design B so that calls are turned into remote calls, just like RPC, or it can ship some data to the client along with the client's copy of the object and thus allow some calls to run locally on the client.
The system implements the illusion of shared memory and translates accesses to
shared data into the appropriate messages.
The advantage is that it is really easy to program: it hides all of the details of distribution.
The disadvantage is that it often hides too much. As far as a program knows, everything in its memory is local. But really, some parts of its memory are stored on, or shared with, a remote node. Access to this remote data is very slow compared with accessing local data. For good performance, a program usually needs to know which data is local and which data is remote.
Another disadvantage is that it is very complicated to implement a distributed shared memory system that works correctly and performs well.
2.8 DISTRIBUTED MUTUAL EXCLUSION
2.8.1
In the printer example being used here, the problem of storage space in the spooler typically becomes acute with graphics printers. In such a context, it is desirable to block an application's process until the printer is ready to accept data from that process, and then let that process deliver data directly to the printer.
2.8.2 Token-Based Mutual Exclusion
One approach is to arrange the processes in a virtual ring and circulate a single token among them; a process may enter its critical section only while it holds the token, and it forwards the token when it is done. The token ring network protocol was developed for the hardware level, or the link level, of the protocol hierarchy, and it does exactly this. Here, we are proposing to build a virtual ring at or above the transport layer and to use essentially the same token-passing protocol.
This solution is not problem free. What if the token is lost? What if a process in the
ring ceases transmission? Nonetheless, it is at the root of a number of interesting and
useful distributed mutual exclusion algorithms. The advantage of such distributed
algorithms is that they do not rest on a central authority, and thus, they are ideal
candidates for use in fault tolerant applications.
An important detail in the token-based mutual exclusion algorithm is that, on receiving a token, a process must immediately forward the token if it is not waiting for entry into the critical section. This may be done in a number of ways (a small sketch follows the list below):
Each process could periodically check to see if the token has arrived. This
requires some kind of non-blocking read service to allow the process to poll the
incoming network connection on the token ring. The UNIX FNDELAY flag
allows non-blocking read, and the Unix select() kernel call allows testing an I/O
descriptor to see if a read from that descriptor would block; either of these is
sufficient to support this polling implementation of the token passing protocol.
The fact that UNIX offers two such mechanisms is good evidence that these are
afterthoughts added to Unix after the original implementation was complete.
The receipt of an incoming token could cause an interrupt. Under UNIX, for
example, the SIGIO signal can be attached to a socket or communications line
(see the FASYNC flag set by fcntl). To await the token, the process could
disable the SIGIO signal and do a blocking read on the incoming token socket.
To exit the critical section, the process could first enable SIGIO and then send
the token. The SIGIO handler would read the incoming token and forward it
before returning.
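A minimal two-node sketch of the polling variant, written in Python with select(), is shown below; the "virtual ring" is simulated with a local socket pair rather than a real network, and the function names and single-byte token format are assumptions made only for this example.

import select
import socket

TOKEN = b"T"

def poll_for_token(endpoint):
    """Non-blocking check (cf. select()/FNDELAY): has the token arrived here?"""
    readable, _, _ = select.select([endpoint], [], [], 0)   # timeout 0 means a pure poll
    return bool(readable) and endpoint.recv(1) == TOKEN

def handle_token(endpoint, wants_entry):
    """On receiving the token, enter the critical section only if entry is wanted,
    then forward the token immediately to the next node in the virtual ring."""
    if wants_entry:
        print("in critical section")     # ... protected work would go here ...
    endpoint.send(TOKEN)                 # forward the token without delay

# Two-node "virtual ring" built from a connected socket pair, purely for
# illustration; a real ring would use one network connection per neighbour.
node_a, node_b = socket.socketpair()

node_a.send(TOKEN)                       # inject the token: it travels to node B
while not poll_for_token(node_b):
    pass                                 # node B polls in between doing other work
handle_token(node_b, wants_entry=True)   # B uses the token, then forwards it to A
while not poll_for_token(node_a):
    pass
print("token has come round to node A again")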
2.8.3 Lamport's Bakery Algorithm
First, on arrival, take a ticket: pick a number that is larger than every ticket number currently held, just as a customer takes the next numbered ticket at a bakery. Second, wait until your ticket number is the minimum of all tickets in the room. There may be others with this minimum number, but in inspecting all the tickets in the room, you found them! If you find a tie, see if your customer ID number is less than the ID numbers of those with whom you've tied, and only then enter the critical section and meet with the baker.
This is inefficient, because you might wait a bit too long while some other process
picks a number after the number you picked, but for now, we'll accept this cost.
If you are not the person holding the smallest number, you start checking again. If you hold the smallest number, it is also possible that someone else holds the same smallest number. Therefore, what you've got to do is agree with everyone else on how to break ties.
The solution is simple. Instead of computing the value of the smallest number, compute the minimum process ID among the processes that hold the smallest value. In fact, we need not seek the minimum process ID; all we need to do is use any deterministic algorithm that all participants can agree on for breaking the tie. As long as all participants apply the same deterministic algorithm to the same information, they will arrive at the same conclusion.
To return its ticket, and exit the critical section, processes execute the following trivial
bit of code:
N[i] := 0;
When you return your ticket, if any other processes are waiting, then on their next
scan of the set of processes, one of them will find that it is holding the winning ticket.
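Before moving to the distributed setting, a compact Python sketch of the shared-memory bakery algorithm may help; the arrays follow the N[i] and C[i] of the text, but the thread count, loop bounds and variable names are arbitrary, and the busy-waiting is meant only to show the logic, not to be a practical lock.

import threading

N_PROCS = 3
number = [0] * N_PROCS          # N[i]: ticket held by process i (0 = not competing)
choosing = [False] * N_PROCS    # C[i]: process i is in the middle of picking a ticket
counter = 0                     # shared variable updated inside the critical section

def bakery_enter(i):
    choosing[i] = True
    number[i] = 1 + max(number)               # take a ticket larger than all others
    choosing[i] = False
    for j in range(N_PROCS):
        while choosing[j]:
            pass                              # wait while j is still picking a ticket
        # wait while j holds a smaller ticket, breaking ties by process id
        while number[j] != 0 and (number[j], j) < (number[i], i):
            pass

def bakery_exit(i):
    number[i] = 0                             # return the ticket (N[i] := 0)

def worker(i):
    global counter
    for _ in range(50):
        bakery_enter(i)
        counter += 1                          # the critical section
        bakery_exit(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_PROCS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)    # 150 if mutual exclusion held throughout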
Moving to a Distributed Context
In the context of distributed systems, Lamport's bakery algorithm has the useful
property that process i only modifies its own N[i] and C[i], while it must read the
entries for all others. In effect, therefore, we can implement this in a context where
each process has read-only access to the data of all other processes, and read-write
access only to its own data.
A distributed implementation of this algorithm can be produced directly by storing
N[i] and C[i] locally with process i, and using message passing when any process
wants to examine the values of N and C for any process other than itself. In this case,
each process must be prepared to act as a server for messages from the others
requesting the values of its variables; we have the same options for implementing this
service as we had for the token passing approach to mutual exclusion. The service
could be offered by an agent process, by an interrupt service routine, or by periodic
polling of the appropriate incoming message queues.
Note that we can easily make this into a fault tolerant model by using a fault-tolerant
client-server protocol for the requests. If there is no reply to a request for the values of
process i after some interval and a few retries, we can simply assume that process i has
failed.
This demonstrates that fault tolerant mutual exclusion can be done without any central authority! This direct port of Lamport's bakery algorithm is not particularly efficient, though. Each process must read the variables of all other processes a minimum of three times: once to select a ticket number, once to see if anyone else is in the process of selecting a number, and once to see if it holds the minimum ticket.
For each process contending for entry to the critical section, there are about 6N
messages exchanged, which is clearly not very good. Much better algorithms have
been devised, but even this algorithm can be improved by taking advantage of
knowledge of the network structure. On an Ethernet or on a tree-structured network, a
broadcast can be done in parallel, sending one message to N recipients in only a few
time units. On a tree-structured network, the reply messages can be merged on the
way to the root (the originator of the request) so that sorting and searching for the
maximum N or the minimum nonzero N can be distributed efficiently.
2.8.4 Ricart and Agrawala's Algorithm
Another alternative is for anyone wishing to enter a critical section to broadcast their
request; as each process agrees that it is OK to enter the section, they reply to the
broadcaster saying that it is OK to continue; the broadcaster only continues when all
replies are in.
If a process is in a critical section when it receives a request for entry, it defers its
reply until it has exited the critical section, and only then does it reply. If a process is
not in the critical section, it replies immediately.
This sounds like a remarkably naive algorithm, but with point-to-point
communications between N processes, it takes only 2(N-1) messages for a process to
enter the critical section, N-1 messages to broadcast the request and N-1 replies.
There are some subtle issues that make the result far from naive. For example, what
happens if two processes each ask at the same time? What should be done with
requests received while a process is waiting to enter the critical section?
Ricart and Agrawala's mutual exclusion algorithm solves these problems. In this solution, each process has three significant states, and its behaviour in response to messages from others depends on its state (a small sketch follows this list):
Outside the critical section
The process replies immediately to every entry request.
After requesting entry, awaiting permission to enter.
The process replies immediately to higher priority requests and defers all
other replies until exit from the critical section.
Inside critical section.
The process defers all replies until exit from the critical section.
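The reply-or-defer behaviour in these three states can be sketched as follows in Python; the class, the state names and the (timestamp, node id) request format are assumptions made for the example, and no real networking is shown.

from enum import Enum, auto

class State(Enum):
    RELEASED = auto()     # outside the critical section
    WANTED = auto()       # entry requested, waiting for all replies
    HELD = auto()         # inside the critical section

class Node:
    """Reply/defer logic of one participant in Ricart and Agrawala's algorithm."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.state = State.RELEASED
        self.my_request = None    # (lamport_timestamp, node_id) of our own request
        self.deferred = []        # requests to answer only when we leave the section

    def on_request(self, their_request):
        """Return True to reply 'OK' at once, False to defer until exit."""
        if self.state == State.HELD:
            self.deferred.append(their_request)
            return False
        if self.state == State.WANTED and self.my_request < their_request:
            # our pending request has priority (earlier timestamp, ties broken
            # by node id), so the other process must wait for us
            self.deferred.append(their_request)
            return False
        return True               # RELEASED, or they outrank us: reply immediately

    def on_exit(self):
        """Leave the critical section and release everyone kept waiting."""
        self.state = State.RELEASED
        to_answer, self.deferred = self.deferred, []
        return to_answer

node = Node(node_id=2)
node.state, node.my_request = State.WANTED, (5, 2)
print(node.on_request((7, 1)))   # False: timestamp 7 is later than our 5, so defer
print(node.on_request((3, 1)))   # True: their request is older, so it wins
print(node.on_exit())            # [(7, 1)]: the deferred reply is sent on exit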
As with Lamport's bakery algorithm, this algorithm has no central authority.
Nonetheless, the interactions between a process requesting entry to a critical section
and each other process have a character similar to client-server interactions. That is,
the interactions take the form of a request followed (possibly some time later) by a
reply.
As such, this algorithm can be made fault tolerant by applying the same kinds of tricks
as are applied in other client-server applications. On receiving a request, a process can be required to immediately send out either a reply or a negative acknowledgement. The latter says, in effect, "I got your request and I can't reply yet!"
With such a requirement, the requesting process can wait for either a reply or a
negative acknowledgement from every other process. If it gets neither, it can retry the
request to that process. If it retries some limited number of times and still gets no
answer, it can assume that the distant process has failed and give up on it.
If a process receives two consecutive requests from the same process because
acknowledgements have been lost, it must resend the acknowledgement. If a process
waits a long time and doesn't get an acknowledgement, it can send out a message saying "are you still there?", to which the distant process would reply "I got your request but I can't reply yet". If it gets no reply, it can retry some number of times and
then give up on the server as being gone.
If a process dies in its critical section, the scheme described above solves the problem and lets one of the surviving processes in. If a process dies outside its critical section, the scheme also works.
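A rough sketch of the retry-with-negative-acknowledgement idea follows, in Python; the message strings, timeout, retry count and the local queue standing in for the network are all invented for the example.

import queue

RETRIES = 3
TIMEOUT_S = 0.5

def request_with_retries(send, inbox, request):
    """Send a request and wait for 'OK' or 'WAIT' (a negative acknowledgement).
    If neither ever arrives, assume the remote process has failed and give up."""
    for _ in range(RETRIES):
        send(request)                          # hypothetical transmit function
        try:
            reply = inbox.get(timeout=TIMEOUT_S)
        except queue.Empty:
            continue                           # silence: retry the request
        if reply == "WAIT":
            # the process is alive but cannot reply yet, so keep waiting for
            # the real 'OK' instead of giving up on it
            reply = inbox.get()
        if reply == "OK":
            return True
    return False                               # treat the silent process as failed

# Toy usage: the "remote" process answers WAIT first and OK afterwards.
inbox = queue.Queue()
for msg in ("WAIT", "OK"):
    inbox.put(msg)
print(request_with_retries(lambda m: None, inbox, ("ENTER", 5, 2)))   # True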
Breaking Ties in Ricart and Agrawala's Algorithm
There are many ways to break ties between processes that make simultaneous
requests; all of these are based on including the priority of each requesting process in
the request message. It is worth noting that the same alternatives apply to Lamport's
bakery algorithm!
A unique process ID can be used as the priority, as was done in Lamport's bakery algorithm. This is a static priority assignment and is almost always needed to break
ties in any of the more complex cases. Typically, a process will append its statically
assigned process ID to any more interesting information it uses for tiebreaking, thus
guaranteeing that if two processes happen to generate the same interesting
information, the tie will still be broken.
The number of times the process has previously entered the same critical section can
be used; if processes that have entered the critical section more frequently are given
lower priority, then the system will be fair, giving the highest priority to the least
frequent user of the resource.
The time since last access to the critical section offers a similar opportunity to enforce
fairness if the process that used the critical section least recently is given the highest
priority.
If dynamic priority assignments are used, what matters is that the priority used on any
entry to the critical section is frozen prior to broadcasting the request for entry, and
that it remains the same until after the process is done with that round of mutual
exclusion. It is also important that each process has a unique priority, but this can be
assured by appending the process ID as the least significant bits of the dynamically
chosen priority.
Figure 4: Flow of activity that takes place during an RPC call between two networked systems
2.9.2
2.9.4
RPC is appropriate for client/server applications in which the client can issue a request and wait for the server's response before continuing its own processing. Because most RPC implementations do not support peer-to-peer, or asynchronous, client/server interaction, RPC is not well suited for applications involving distributed objects or object-oriented programming.
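As a concrete, minimal illustration of this synchronous request-reply style, the sketch below uses Python's standard xmlrpc package (rather than any particular RPC product named in this unit); the port number is chosen arbitrarily, and the client call blocks until the server's reply arrives.

import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b                      # the remote procedure offered by the server

server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

client = ServerProxy("http://localhost:8000")
print(client.add(2, 3))               # blocks until the reply arrives, then prints 5
server.shutdown()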
Asynchronous and synchronous mechanisms each have strengths and weaknesses that
should be considered when designing any specific application. In contrast to
asynchronous mechanisms employed by Message-Oriented Middleware, the use of a
synchronous request-reply mechanism in RPC requires that the client and server are
always available and functioning (i.e., the client or server is not blocked). In order to
allow a client/server application to recover from a blocked condition, an implementation of RPC is required to provide mechanisms such as error messages, request timers, retransmissions, or redirection to an alternate server. The complexity of the application using RPC depends on the sophistication of the specific RPC
implementation (i.e., the more sophisticated the recovery mechanisms supported by
RPC, the less complex the application utilising the RPC is required to be). RPCs that
implement asynchronous mechanisms are very few and are difficult (complex) to
implement.
When utilising RPC over a distributed network, the performance (or load) of the
network should be considered. One of the strengths of RPC is that the synchronous,
blocking mechanism of RPC guards against overloading a network, unlike the
asynchronous mechanism of Message Oriented Middleware (MOM). However, when
recovery mechanisms, such as retransmissions, are employed by an RPC application,
the resulting load on a network may increase, making the application inappropriate for
a congested network. Also, because RPC uses static routing tables established at compile time, load balancing across a network is difficult; this should be considered when designing an RPC-based application.
Tools are available for a programmer to use in developing RPC applications over a
wide variety of platforms, including Windows (3.1, NT, 95), Macintosh, 26 variants of
UNIX, OS/2, NetWare, and VMS. RPC infrastructures are implemented within the
Distributed Computing Environment (DCE), and within Open Network Computing
(ONC), developed by Sunsoft, Inc.
2.9.5 Limitations
RPC implementations are nominally incompatible with other RPC implementations, although some are compatible. Using a single implementation of RPC in a system will most likely result in a dependence on the RPC vendor for maintenance support and future enhancements. This could have a highly negative impact on a system's flexibility, maintainability, portability, and interoperability.
Because there is no single standard for implementing RPC, different features may be offered by individual RPC implementations. Features that may affect the design and cost of an RPC-based application include the following:
whether the RPC mechanism can be obtained individually, or only bundled with
a server operating system.
Because of the complexity of the synchronous mechanism of RPC and the proprietary
and unique nature of RPC implementations, training is essential even for the
experienced programmer.
Check Your Progress

1)
2)
3)
4)
5)
2.11 SUMMARY
A distributed operating system takes the abstraction to a higher level, and hides from the application where things are. The application can use things on any of many
computers just as if it were one big computer. A distributed operating system will also
provide for some sort of security across these multiple computers, as well as control
the network communication paths between them. A distributed operating system can
be created by merging these functions into the traditional operating system, or as
another abstraction layer on top of the traditional operating system and network
operating system.
Any operating system, including distributed operating systems, provides a number of
services. First, they control what application gets to use the CPU and handle switching
control between multiple applications. They also manage use of RAM and disk
storage. Controlling who has access to which resources of the computer (or
computers) is another issue that the operating system handles. In the case of
distributed systems, all of these items need to be coordinated for multiple machines.
As systems grow larger, handling them can be complicated by the fact that no one person controls all of the machines, so the security policies on one machine may not be the same as on another.
Some problems can be broken down into very tiny pieces of work that can be done in
parallel. Other problems are such that you need the results of step one to do step two
and the results of step two to do step three, and so on. These problems cannot be broken down into such small units of work. Those things that can be broken down into very small chunks of work are called fine-grained, and those that require larger chunks are called coarse-grained. When distributing the work to be done on many CPUs there is a balancing act to be followed. You don't want the chunk of work to be so small that it takes too long to send the work to another CPU, because then it is quicker to just have a single CPU do the work. You also don't want the chunk of work to be too big, because then you can't spread it out over enough machines to make the thing run quickly.
In this unit we have studied the features of the distributed operating system,
architecture, algorithms relating to the distributed processing, shared memory concept
and remote procedure calls.
Solutions/Answers

2)
3)
4)
Machines on a local area network have their own clocks. If these are not
synchronized, strange things can happen: e.g., the modification date of a file
can be in the future. All machines on a LAN should synchronize their clocks
periodically, setting their own time to the network time. However, adjusting
time by sudden jumps also causes problems: it may lead to time going backward
on some machines. A better way is to adjust the speed of the clock temporarily.
This is done by protocols such as NTP.
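A toy Python sketch of this gradual adjustment (slewing) is given below; the drift-rate limit, interval and clock values are invented for illustration and do not reflect what NTP actually uses.

def slew_towards(local_time, reference_time, interval_s, max_rate=0.0005):
    """Advance the local clock by one interval, running it slightly fast or slow
    (never stepping it backwards) until it converges on the reference time."""
    nominal = local_time + interval_s        # where the clock would end up untouched
    error = reference_time - nominal         # > 0: we are behind; < 0: we are ahead
    correction = max(-max_rate * interval_s, min(max_rate * interval_s, error))
    return nominal + correction

local, reference = 100.3, 100.0              # the local clock is 0.3 s ahead
for _ in range(5):
    reference += 1.0                         # one second of true time passes
    local = slew_towards(local, reference, interval_s=1.0)
print(round(local - reference, 4))           # 0.2975: offset shrinks, never jumps back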
5)
Many distributed systems use Remote Procedure Calls (RPCs) as their main
communication mechanism. It is a powerful technique for constructing
distributed, client server based applications. It is based on extending the notion
of conventional or local procedure calling, so that the called procedure need not
exist in the same address space as the calling procedure. The two processes may
be on the same system, or they may be on different systems with a network
connecting them. By using RPC, programmers of distributed applications avoid
the details of the interface with the network. The transport independence of RPC
isolates the application from the physical and the logical elements of the data
communications mechanism and allows the application to use a variety of
transports.