MCS 031
1.0 INTRODUCTION
    Two ideas lie gleaming on the jeweller's velvet. The first is the
    calculus; the second, the algorithm. The calculus and the rich body of
    mathematical analysis to which it gave rise made modern science
    possible; but it has been the algorithm that has made possible the
    modern world.

    David Berlinski, in The Advent of the Algorithm, 2000

We are constantly involved in solving problems. The problems may concern our
survival in a competitive and hostile environment, may concern our curiosity to know
more and more of various facets of nature, or may be about any other issues of
interest to us. A problem may be a state of mind of a living being, of not being
satisfied with some situation. However, for our purpose, we may take the
unsatisfactory/unacceptable/undesirable situation itself as a problem.

One way of looking at a possible solution of a problem is as a sequence of activities
(if such a sequence exists at all) that, if carried out using allowed/available tools,
leads us from the unsatisfactory (initial) position to an acceptable, satisfactory or
desired position. For example, the solution of the problem of baking delicious
pudding may be thought of as a sequence of activities that, when carried out, gives
us the pudding (the desired state) from the raw materials that may include sugar,
flour and water (constituting the initial position), using cooking gas, oven and some
utensils etc. (the tools). The sequence of activities, when carried out, gives rise to a
process.

Technically, the statement or description, in some notation, of the process is called
an algorithm, the raw materials are called the inputs and the resulting entity (in the
above case, the pudding) is called the output. In view of the importance of the
concept of algorithm, we repeat:
Introduction to Algorithmics

Computer Program: An algorithm, when expressed in a notation that can be
understood and executed by a computer system, is called a computer program or
simply a program. We should be clear about the distinction between the terms, viz., a
process, a program and an algorithm.
It may be noted that for some problems and the available tools, there may not exist
any algorithm that gives the desired output. For example, the problem of
baking delicious pudding may not be solvable, if no cooking gas or any other heating
substance is available. Similarly, the problem of reaching the moon is unsolvable, if
no spaceship is available for the purpose.
In particular, the symbol '←' is used for assignment. For example, x ← y + 3 means
that 3 is added to the value of the variable y and the resultant value becomes the new
value of the variable x. However, the value of y remains unchanged.
If, in an algorithm, more than one variable is required to store values of the
same type, notation of the form A[1..n] is used to denote the n variables
A[1], A[2], …, A[n].
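These conventions can be illustrated with a small Python sketch (Python lists are 0-indexed, so A[1..n] is modelled here with an extra, unused slot at index 0; the particular values are arbitrary assumptions):

```python
y = 5
x = y + 3          # x <- y + 3: x becomes 8; the value of y remains unchanged

n = 4
A = [0] * (n + 1)  # models A[1..n]; index 0 is left unused so A[1], ..., A[n] exist
A[1], A[n] = 10, 40
```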
E2. {Is r zero?} If r = 0, the algorithm terminates and n is the answer. Otherwise,
E3. {Interchange}. Let the new value of m be the current value of n and the new
value of n be the current value of r. Go back to Step E1.
The termination of the above method is guaranteed, as the value of n strictly
decreases in each iteration, and r must become zero in a finite number of repetitions
of steps E1, E2 and E3.
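The steps above can be sketched as a short Python function (step E1, which divides m by n and takes the remainder r, is assumed from the surrounding text):

```python
def gcd(m, n):
    # E1: divide m by n and let r be the remainder (assumed step)
    r = m % n
    # E2: if r is zero, the algorithm terminates and n is the answer
    while r != 0:
        # E3: interchange -- the new m is the current n, the new n is the
        # current r; then go back to step E1
        m, n = n, r
        r = m % n
    return n
```

For example, gcd(48, 18) returns 6.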
The great Greek mathematician Euclid, sometime between the fourth and third
centuries BC, at least knew, and may have been the first to suggest, the above
algorithm. The algorithm is considered to be among the first non-trivial algorithms.
However, the word 'algorithm' itself came into usage quite late. The word is derived
from the name of the Persian mathematician Mohammed al-Khwarizmi, who lived
during the ninth century A.D. The name, when written in Latin, became 'Algorismus',
from which 'algorithm' is a small step away.
In order to familiarise ourselves with the notation usually used to express algorithms,
next, we express Euclid's Algorithm in a pseudo-code notation which is closer to
a programming language.
x = (−b ± √(b² − 4ac)) / (2a),        (1.3.2)

For example, for the equation

3x² + 4x + 1 = 0        (1.3.3)

x = (−4 ± √(4² − 4 × 3 × 1)) / (2 × 3) = (−4 ± 2) / 6, i.e.,

x = −1/3 or −1.
With reference to the above discussion, the issue of finding the roots of the general
quadratic equation ax² + bx + c = 0, with a ≠ 0, is called a problem, whereas the issue
of finding the roots of the particular equation

3x² + 4x + 1 = 0

is called an instance (or a question) of the problem.
In general, a problem may have a large, possibly infinite, number of instances. The
above-mentioned problem of finding the roots of the quadratic equation

ax² + bx + c = 0

with a ≠ 0, and b and c real numbers, has infinitely many instances, each obtained by
giving some specific real values to a, b and c, taking care that the value assigned to a
is not zero. However, all problems may not be of a generic nature. For some problems,
there may be only one instance/question corresponding to each of the problems. For
example, the problem of finding out the largest integer that can be stored, or can be
arithmetically operated on, in a given computer, is a single-instance problem. Many
of the interesting problems, like the ones given below, are just single-instance
problems.
Problem (i): Crossing a river in a boat which can carry at one time, along with the
boatman, only one of a wolf, a horse and a bundle of grass, in such a way that neither
the wolf harms the horse nor the horse eats the grass. In the presence of the boatman,
the wolf does not attack the horse, nor does the horse attempt to eat the grass.
Problem (ii): The Four-Colour Problem ∗ which requires us to find out whether a
political map of the world, can be drawn using only four colours, so that no two
adjacent countries get the same colour.
The problem may be further understood through the following explanation. Suppose
we are preparing a coloured map of the world, and we use the colour green for the
terrestrial part of India. A country is a neighbour of a given country if it has some
boundary in common with it. For example, according to this definition, Pakistan,
Bangladesh and Myanmar (or Burma) are some of the countries which are India's
neighbours. Then, in the map, for all the neighbours of India, including Pakistan,
Bangladesh and Myanmar, we cannot use the colour green. The problem is to show
that four colours suffice, so that we are able to colour the map of the world under
the restrictions of the problem.
Problem (iii): Fermat's Last Theorem, which requires us to show that there
do not exist positive integers a, b, c and n such that

aⁿ + bⁿ = cⁿ with n ≥ 3.
The problem also has a very fascinating history. Its origin lies in the simple observation that the equation

x² + y² = z²

has a number of solutions in which x, y and z are all integers. For example, for x = 3, y = 4, z = 5, the
equation is satisfied. The fact was also noticed by the great mathematician Pierre de Fermat (1601–1665).
But, like all great intellectuals, he looked at the problem from a different perspective. Fermat felt
and claimed that for all integers n ≥ 3, the equation

xⁿ + yⁿ = zⁿ

has no non-trivial! solution in which x, y and z are all positive integers. And he jotted down the above
claim in a corner of a book without any details of the proof.

However, for more than 300 years, mathematicians could not produce any convincing proof of
Fermat's then conjecture, now a theorem. Ultimately, the proof was given by Andrew Wiles in
1994. The proof is based on sophisticated modern mathematics.
Problem (iv): On the basis of another generalisation of the problem of finding integral solutions of
x² + y² = z², the great Swiss mathematician Leonhard Euler conjectured that, for n ≥ 3, the sum of (n − 1)
∗
The origin of the Four-colour conjecture, may be traced to the observation by Francis Guthrie, a student
of Augustus De Morgan (of De Morgan’s Theorem fame), who noticed that all the counties (sort of
parliamentary constituencies in our country) of England could be coloured using four colours so that no
adjacent counties were assigned the same colour. De Morgan publicised the problem throughout the
mathematical community. Leaving aside the problem of parallel postulate and the problem in respect of
Fermat’s Last Theorem, perhaps, this problem has been the most fascinating and tantalising one for the
mathematicians, remaining unsolved for more than one hundred years. Ultimately, the problem was
solved in 1976 by two American mathematicians, Kenneth Appel and Wolfgang Haken.
However, the proof is based on a computer program written for the purpose, that took 1000 hours of
computer time (in 1976). Hence, the solution generated, among mathematicians, a controversy in the
sense that many mathematicians feel such a long program requiring 1000 hours of computer time in
execution, may have logical and other bugs and hence can not be a reliable basis for the proof of a
conjecture.
! One solution, of course, is given by x = 0 = y = z, though x, y and z, being zero, are not positive.
number of nth powers of positive integers cannot be an nth power of an integer. For a long time the
conjecture could neither be proved nor refuted. However, in 1966, L.J. Lander and T.R. Parkin found a
counter-example for n = 5, by showing that 27⁵ + 84⁵ + 110⁵ + 133⁵ = 144⁵.
Coming back to the problem of finding the roots of a quadratic equation, it can be
easily seen that, in finding the roots of a quadratic equation, the only operations that
have been used are plus, minus, multiplication and division of numbers, along with
the operation of finding the square root of a number. Using only these operations, it
is also possible, through a step-by-step method, to find the roots of a cubic equation
over the real numbers, which, in general, is of the form

ax³ + bx² + cx + d = 0, with a ≠ 0.
Further, using only the set of operations mentioned above, it is also possible, through
a step-by-step method, to solve a biquadratic (degree four) equation over the real
numbers, which, in general, is of the form

ax⁴ + bx³ + cx² + dx + e = 0, with a ≠ 0.
However, the problem of finding the roots of a general equation of degree five or
more, can not be solved, using only the operations mentioned above, through a step-
by-step method, i.e., can not be solved algorithmically.
It may be noted that a (general) problem, like finding the roots of an equation of
degree 5 or more, may not be solvable algorithmically, i.e., through some step-by-step
method; still, it is possible for some (particular) instances of the problem to have
algorithmic solutions. For example, the roots of the equation

x⁵ − 32 = 0

are easily available through a step-by-step method. Also, the roots of the equation
2x⁶ − 3x³ + 1 = 0 can be easily found through a method in which, to begin with, we
may take y = x³, reducing the equation to the quadratic 2y² − 3y + 1 = 0.
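The substitution method just mentioned can be sketched in Python: the quadratic formula (1.3.2) is applied to 2y² − 3y + 1 = 0, and each real value of y yields a real root x = y^(1/3):

```python
import math

# Solve 2x^6 - 3x^3 + 1 = 0 via the substitution y = x^3,
# which reduces it to the quadratic 2y^2 - 3y + 1 = 0.
a, b, c = 2.0, -3.0, 1.0
disc = b * b - 4 * a * c                # discriminant of the quadratic in y
y1 = (-b + math.sqrt(disc)) / (2 * a)   # y = 1
y2 = (-b - math.sqrt(disc)) / (2 * a)   # y = 1/2
roots = [y1 ** (1.0 / 3.0), y2 ** (1.0 / 3.0)]   # x = y^(1/3) for each real y
```

Each computed x satisfies the original sixth-degree equation up to floating-point error.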
Ex. 1) Give at least three examples of problems, each one of which has only finitely
many instances.
Hint: Structures over Boolean set {0, 1} may be good sources for such examples.
Example 1.4.1: A method which is effective (to be explained later) but not definite.
The following is a program fragment for the example method:
x←1
Toss a coin,
If the result is Head then x ←3 else x ← 4
{in the above, the symbol ‘←’ denotes that the value on its R.H.S is assigned to the
variable on its L.H.S. Detailed discussion under (i) of Section 1.6.1}
All the steps, like tossing the coin etc., can be (effectively) carried out. However, the
method is not definite, as two different executions may yield different outputs.
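The fragment can be mimicked in Python, with the coin toss simulated by the random module (an assumption standing in for a physical toss):

```python
import random

x = 1
# Toss a coin: every step here is effective (it can be carried out),
# yet the method is not definite -- two executions may produce
# different final values of x.
if random.choice(["Head", "Tail"]) == "Head":
    x = 3
else:
    x = 4
```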
3. Inputs: An algorithm has zero or more, but only finite, number of inputs.
Examples of algorithms requiring zero inputs:
(i) Print the largest integer, say MAX, representable in the computer system
being used.
(ii) Print the ASCII code of each of the letters in the alphabet of the computer
system being used.
(iii) Find the sum S of the form 1 + 2 + 3 + …, where S is the largest such sum
less than or equal to MAX, defined in Example (i) above.
4. Output: An algorithm has one or more outputs. The requirement of at least one
output is obviously essential, because, otherwise we can not know the
answer/solution provided by the algorithm.
The outputs have specific relation to the inputs, where the relation is defined by
the algorithm.
A method may be designed which is a definite sequence of actions but is not finite
(and hence not effective).
∗
There are some methods which are not definite but are still called algorithms, viz., Monte Carlo
algorithms in particular, and probabilistic algorithms in general. However, we restrict our
algorithms to those methods which are definite, along with the other four characteristics. In other
cases, the full name of the method, viz., probabilistic algorithm, is used.
e = 1 + 1/(1!) + 1/(2!) + 1/(3!) + …

and add it to x.

However, the instruction is definite, as it is easily seen that the computation of each
of the terms 1/n! is definite (at least for a given machine).
Ex. 2) For each of the following, give one example of a method, which is not an
algorithm, because
(i) the method is not finite
(ii) the method is not definite
(iii) the method is not effective but finite.
First Algorithm:
The usual method of multiplication, in which a table of products of pairs of digits x, y
(i.e., 0 ≤ x, y ≤ 9) is presumed to be available to the system that is required to
compute the product m*n.
For example, the product of two numbers 426 and 37 can be obtained as shown below,
using multiplication tables for numbers from 0 to 9.
      426
    ×  37
    -----
     2982
    12780
    -----
    15762
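A sketch of the First Algorithm in Python; only single-digit products (the presumed 0–9 table) and additions with shifts are used:

```python
def multiply(m, n):
    # Schoolbook multiplication: for each digit of n (low to high),
    # form a partial product of m using only single-digit products,
    # shift it, and add it to the running total.
    total = 0
    for i, dn in enumerate(reversed(str(n))):
        partial, carry = 0, 0
        for j, dm in enumerate(reversed(str(m))):
            p = int(dm) * int(dn) + carry     # entry from the 0..9 table
            partial += (p % 10) * 10 ** j
            carry = p // 10
        partial += carry * 10 ** len(str(m))
        total += partial * 10 ** i            # shift, as in the layout above
    return total
```

multiply(426, 37) reproduces the partial products 2982 and 12780, and the final result 15762.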
Second Algorithm:
For this algorithm, we assume that the only arithmetic capabilities the system is
endowed with are (i) that of making a mark, and (ii) that of counting marks.
For such a system, having only these two capabilities, one possible algorithm to
calculate m*n, as given below, uses two separate portions of a paper (or any other
storage devices). One of the portions is used to accommodate marks up to n, the
multiplier, and the other to accommodate marks up to m*n, the resultant product.
Step 3: Count the number of marks in First Portion. If the count equals n, then count
the number of all marks in the Second Portion and return the last count as the result.
However, if the count in the First Portion is less than n, then make one more mark in
the First Portion and go to Step 2.
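A sketch of this procedure in Python; lists stand in for the two portions of paper, and Step 2 (not shown above) is assumed to place m marks in the Second Portion for each mark made in the First Portion:

```python
def multiply_by_marks(m, n):
    first_portion = []       # will eventually hold n marks
    second_portion = []      # will eventually hold m * n marks
    while len(first_portion) < n:    # Step 3: count marks in the First Portion
        first_portion.append('|')    # one more mark in the First Portion
        for _ in range(m):           # assumed Step 2: m marks in the Second Portion
            second_portion.append('|')
    return len(second_portion)       # the last count is the result
```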
Third Algorithm:
The algorithm to be discussed is known as the à la russe method. In this method, it is
presumed that the system has only the capability of multiplying and dividing any
integer by 2, in addition to the capabilities of the Second Algorithm. The division
must result in an integer quotient, with remainder either 0 or 1.
The algorithm using only these capabilities for multiplying two positive integers m
and n, is based on the observations that
(ii) However, if m is odd then (m/2) is not an integer. In this case, we write
m = (m ─ 1) + 1, so that (m ─ 1) is even and (m ─ 1)/2 is an integer.
Then

m · n = ((m − 1) + 1) · n = (m − 1) · n + n = ((m − 1)/2) · (2n) + n,

where (m − 1)/2 is an integer, as m is an odd integer.
Then

m * n = 7 * 11 = ((7 − 1) + 1) * 11 = (7 − 1) * 11 + 11
      = ((7 − 1)/2) * (2 * 11) + 11 = 3 * 22 + 11 = 77.
Therefore, if at some stage, m is even, we halve m and double n and multiply the two
numbers so obtained and repeat the process. But, if m is odd at some stage, then we
halve (m – 1), double n and multiply the two numbers so obtained and then add to the
product so obtained the odd value of m which we had before halving (m ─1).
The algorithm that uses four variables, viz., First, Second, Remainder and
Partial-Result, may be described as follows:
Step 1: Initialize the variables First, Second and Partial-Result respectively with m
(the first given number), n (the second given number) and 0.
Step 2: If First or Second ∗ is zero, return Partial-Result as the final result and then
stop.
∗
If, initially, Second ≠ 0, then Second ≠ 0 in the subsequent calculations also.
Else, set the value of Remainder as 1 if First is odd, else set Remainder as 0. If
Remainder is 1, then add Second to Partial-Result to get the new value of
Partial-Result.
Step 3: New value of First is the quotient obtained on (integer) division of the current
value of First by 2. New value of Second is obtained by multiplying Second by 2. Go
to Step 2.
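Steps 1 to 3 translate directly into Python (using % for the remainder and // for integer division by 2):

```python
def a_la_russe(m, n):
    # Step 1: initialize First, Second and Partial-Result
    first, second, partial_result = m, n, 0
    # Step 2: stop when First or Second is zero
    while first != 0 and second != 0:
        if first % 2 == 1:             # Remainder is 1: First is odd
            partial_result += second   # add Second to Partial-Result
        # Step 3: halve First (integer division), double Second
        first = first // 2
        second = second * 2
    return partial_result
```

a_la_russe(7, 11) returns 77, matching the worked example above.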
Example 1.5.1: The logic behind the a’la russe method, consisting of Step 1, Step 2
and Step 3 given above, may be better understood, in addition to the argument given
the box above, through the following explanation:
Here, we may note that, as First = 9 is odd, Second is added to
Partial-Result. Also,

Partial-Result₁ = 4 * 32 = (2 * 2 + 0) * 32 = (2 * 2) * 32 + 0 * 32
                = 2 * (2 * 32) = First₂ * Second₂.

Again, we may note that First₁ = 4 is even, and we do not add Second₂ to
Partial-Result₂, where Partial-Result₂ = First₂ * Second₂.
As the value of First is 0, the value 855 of Partial-Result is returned as the result,
and the algorithm stops.
Ex. 3) A system has ONLY the following arithmetic capabilities:
Design an algorithm that multiplies two integers, and fully exploits the capabilities of
the system. Using the algorithm, find the product.
The following three basic actions and corresponding instructions form the basis of
any imperative language. For the purpose of explanations, the notation similar to
that of a high-level programming language is used.
j← 2*i + j− r ;
It is assumed that each of the variables occurring on R.H.S. of the above statement,
has a value associated with it before the execution of the above statement. The
association of a value to a variable, whether occurring on L.H.S or on R.H.S, is made
according to the following rule:
For each variable name, say i, there is a unique location, say loc(i), in the main
memory. Each location loc(i), at any point of time, contains a unique value, say v(i).
Thus the value v(i) is associated with the variable i.
Using these values, the expression on the R.H.S. is evaluated. The value so obtained
is then stored as the new value of the variable on the L.H.S. (in this case, j). It may
be noted that the variable on the L.H.S. (in this case, j) may also occur on the R.H.S.
of the assignment symbol.
In such cases, the value corresponding to the occurrence on R.H.S (of j, in this case)
is finally replaced by a new value obtained by evaluating the expression on R.H.S (in
this case, 2 * i + j ─ r).
The values of the other variables, viz., i and r, remain unchanged by the assignment
statement.
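The assignment rule can be checked with a small sketch (the initial values are arbitrary assumptions):

```python
i, j, r = 4, 10, 3   # assumed initial values in loc(i), loc(j), loc(r)

# The old value of j (10) is used on the R.H.S.; then j gets the new value:
j = 2 * i + j - r    # j becomes 2*4 + 10 - 3 = 15

# i and r remain unchanged by the assignment
```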
(ii) The next basic action is to read values of variables i, j, etc. from some
secondary storage device, the identity of which is (implicitly) assumed here, by a
statement of the form
read (i, j, …);
The values corresponding to the variables i, j, … in the read statement are, due to the read
statement, stored in the corresponding locations loc(i), loc(j), …, in the main memory. The
values are supplied either, by default, through the keyboard by the user, or from some
secondary or external storage. In the latter case, the identity of the secondary or external
storage is also specified in the read statement.
(iii) The last of the three basic actions, is to deliver/write values of some variables
say i, j, etc. to the monitor or to an external secondary storage by a statement of
the form
write (i, j ,….);
• one after the other on successive lines, or even on the same line if there is
enough space on a line, and
• separated by some statement separator, say semi-colons, and
• in the order of intended execution.
A; B;
C;
D;
denotes that the execution of A is to be followed by execution of B, to be followed by
execution of C and finally by that of D.
When the composite action consisting of the actions denoted by A, B, C and D, in this
order, is to be treated as a single component of some larger structure, brackets such as
'begin … end' may be introduced, i.e., in this case we may use the structure
If Q then do A else do B,

where A and B are instructions, which may even be composite instructions obtained
by applying these structuring rules recursively to other instructions.
Further, in some situations the action B is null, i.e., if Q is false, then no action is
stated.
If Q then do A
In this case, if Q is true, A is executed. If Q is not true, then the remaining part of the
instruction is ignored, and the next instruction, if any, in the program is considered for
execution.
Also, there are situations when Q is not just a Boolean variable i.e., a variable which
can assume either a true or a false value only. Rather Q is some variable capable of
assuming some finite number of values say a, b, c, d, e, f. Further, suppose depending
upon the value of Q, the corresponding intended action is as given by the following
table:
Value Action
a A
b A
c B
d NO ACTION
e D
f NO ACTION
Case Q of
a, b : A;
c : B;
e : D;
end;
80 ≤ M          A
60 ≤ M < 80     B
50 ≤ M < 60     C
40 ≤ M < 50     D
M < 40          F
Then the corresponding notation may be:
Case M of
80 . . 100 : ‘A’
60 . . 79 : ‘B’
50 . . 59 : ‘C’
40 . . 49 : ‘D’
0 . . 39 : ‘F’
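This case construct may be sketched in Python with an if–elif chain (M is assumed to be an integer between 0 and 100):

```python
def grade(M):
    # Case M of: map a marks range to a grade, as in the table above
    if 80 <= M <= 100:
        return 'A'
    elif 60 <= M < 80:
        return 'B'
    elif 50 <= M < 60:
        return 'C'
    elif 40 <= M < 50:
        return 'D'
    else:                   # M < 40
        return 'F'
```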
Example 1.6.2.2: We are required to find out the sum (SUM) of first n natural
numbers. Let a variable x be used to store an integer less than or equal to n, then the
algorithm for the purpose may be of the form:
algorithm Sum_First_N_1
begin
read (n); {assuming value of n is an integer ≥ 1}
x←1 ; SUM← 1;
while (x < n) do …………………………………… (α1)
begin
x ← x + 1;
SUM ← SUM + x
end; {of while loop}……………………………… ( β 1)
write (‘The sum of the first’, n, ‘natural numbers is’ SUM)
end. {of algorithm}
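The algorithm Sum_First_N_1 corresponds to the following Python function (the read(n) statement is replaced by a parameter, and write by the return value):

```python
def sum_first_n_1(n):
    # assuming the value of n is an integer >= 1
    x = 1
    total = 1              # SUM <- 1
    while x < n:           # (alpha-1)
        x = x + 1
        total = total + x
    return total           # after (beta-1): total holds 1 + 2 + ... + n
```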
Suppose we read 3 as the value of n; then (initially) x equals 1, because of x ← 1.
Therefore, as 1 < 3, the condition x < n is true. Hence the following portion of
the while loop is executed:
begin
x ← x + 1;
SUM ← SUM + x;
end
As soon as the word end is encountered, by the semantics of the while-loop, control
returns to the test at (α1), and the whole of the while-loop between (α1) and (β1),
including (α1) and (β1), is executed again if the condition holds.
By our assumption, n = 3, and it has not changed since the initial read; however, x
has become 2. Therefore, x < n is again satisfied. Again the rest of the while loop
is executed, because of which x becomes 3 and SUM becomes 6. Then the algorithm
comes to the execution of the first statement of the while-loop, i.e., while (x < n) do,
which tests whether x < n. At this stage x = 3 and n = 3. Therefore, x < n is false.
Therefore, all statements up to and including (β1) are skipped.
Then the algorithm executes the next statement, viz., write ('The sum of the first', n,
'natural numbers is', SUM). As, at this stage, the value of SUM is 6, the following
statement is prompted on the monitor:

The sum of the first 3 natural numbers is 6

It may be noticed that, in the statement write (' ', n, ' ', SUM), the variables n and
SUM are not within the quotes, and hence the values of n and SUM, viz., 3 and 6,
just before the write statement, are given as output.
Do S while (Q)
Here S is called the body of the 'do … while' loop. It may be noted that here S is not
surrounded by the brackets begin and end, because do and while themselves
enclose S.
Again consider the example given above, of finding the sum of the first n natural
numbers. The algorithm, using the 'do … while' statement, may be of the form:
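Python has no 'do … while' construct; the loop below simulates it with while True and break, giving one possible form of Sum_First_N_2 (the pseudo-code itself is not reproduced in this excerpt, so this is a sketch):

```python
def sum_first_n_2(n):
    # assuming the value of n is an integer >= 1
    x = 0
    total = 0
    while True:            # do S ...
        x = x + 1
        total = total + x
        if not (x < n):    # ... while (x < n): the body runs at least once
            break
    return total
```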
Then the execution of the for-loop is terminated. The statement S is called the body
of the for-loop. The variable x is called the index variable of the for-loop.
Example 1.6.2.4: Again, consider the problem of finding the sum of first n natural
numbers, algorithm using ‘for …’ may be of the form:
algorithm Sum_First_N_3
begin
read (n);
SUM← 0
for x ← 1 to n do (α 3)
begin
SUM ← SUM + x (β 3)
end;
write (‘The sum of the first’, n, ‘natural numbers is’, SUM)
end. {of the algorithm}
In the algorithm Sum_First_N_3 there is only one statement in the body of the
for-loop. Therefore, the bracket words begin and end may not be used in the for-loop.
In this algorithm, also, it may be noted that only the variable SUM is initialized. The
variable x is not initialized explicitly. The variable x is implicitly initialised to 1
through the construct ‘for x varying from 1 to n do’. And, after each execution of the
body of the for-loop, x is implicitly incremented by 1.
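Sum_First_N_3 corresponds directly to Python's for statement over range(1, n + 1):

```python
def sum_first_n_3(n):
    total = 0                      # SUM <- 0
    for x in range(1, n + 1):      # for x <- 1 to n do: x is implicitly
        total = total + x          # initialized to 1 and incremented by 1
    return total
```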
Ex.4) Write an algorithm that finds the real roots, if any, of a quadratic equation
ax² + bx + c = 0 with a ≠ 0, and b, c as real numbers.
Ex.5) Extend your algorithm of Ex. 4 above to find roots of equations of the form
ax² + bx + c = 0, in which a, b, c may be arbitrary real numbers, including 0.
Ex.6) (i) Explain how the algorithm Sum_First_N_2 finds the sum of the first 3
natural numbers.
(ii) Explain how the algorithm Sum_First_N_3 finds the sum of the first 3
natural numbers.
(i) Procedure
(ii) Recursion
1.6.3.1 Procedure
Among a number of terms that are used instead of procedure are subprogram and
even function. These terms may have shades of difference in their usage in different
programming languages. However, the basic idea behind these terms is the same,
and is explained next.
It may happen that a sequence of instructions occurs repeatedly, either in different
parts of the same algorithm or in different algorithms. In such cases, writing the
same sequence repeatedly is a wasteful activity. A procedure is a mechanism that
provides a method of checking this wastage.
where <name>, <parameter-list> and other expressions with in the angular brackets
as first and last symbols, are place-holders for suitable values that are to be substituted
in their places. For example, suppose finding the sum of squares of two variables is a
frequently required activity, then we may write the code for this activity independent
of the algorithm of which it would otherwise have formed a part. And then, in
(1.6.3.1), <name> may be replaced by ‘sum-square’ and <parameter-list> by the two-
element sequence x, y. The variables like x when used in the definition of an
algorithm, are called formal parameters or simply parameters. Further, whenever
the code which now forms a part of a procedure, say sum-square is required at any
place in an algorithm, then in place of the intended code, a statement of the form
is written, where values of a and b are defined before the location of the statement
under (1.6.3.2) within the algorithm.
Further, the pair of brackets in [: <type>] indicates that ': <type>' is optional. If the
procedure passes some value computed by it to the calling program, then ': <type>'
is used, and then <type> in (1.6.3.1) is replaced by the type of the value to be passed,
in this case integer.
In cases of procedures which pass a value to the calling program another basic
construct (in addition to assignment, read and write) viz., return (x) is used, where
x is a variable used for the value to be passed by the procedure.
There are various mechanisms by which values of a and b are respectively associated
with or transferred to x and y. The variables like a and b, defined in the calling
algorithm to pass data to the procedure (i.e., the called algorithm), which the
procedure may use in solving the particular instance of the problem, are called actual
parameters or arguments.
In order to explain the involved ideas, let us consider the following simple examples
of a procedure and a program that calls the procedure. In order to simplify the
discussion, in the following, we assume that the inputs etc., are always of the required
types only, and make other simplifying assumptions.
Example 1.6.3.1.1
Procedure sum-square (a, b : integer) : integer;
{denotes the inputs a and b are integers and the output is also an integer}
S: integer;
{to store the required number}
begin
S ← a² + b²
Return (S)
end;
Program Diagonal-Length
{the program finds lengths of diagonals of the sides of right-angled triangles whose
lengths are given as integers. The program terminates when the length of any side is
not positive integer}
begin
D ← square-root (sum-square (L1, L2));
write (‘For sides of given lengths’, L1, L2, ‘the required diagonal length is’, D);
read (L1, L2);
end.
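The procedure and program may be sketched in Python; math.sqrt plays the role of the built-in square-root procedure, and the read/write loop is replaced by a function taking the two side lengths:

```python
import math

def sum_square(a, b):
    # procedure sum-square (a, b : integer) : integer
    s = a * a + b * b      # S <- a^2 + b^2
    return s               # return (S)

def diagonal_length(l1, l2):
    # D <- square-root (sum-square (L1, L2))
    return math.sqrt(sum_square(l1, l2))
```

For side lengths 4 and 5, sum_square returns 41 and diagonal_length returns √41 (approximately 6.4), as in the walk-through that follows.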
∗ For the purpose, Ravi Sethi (1996) may be consulted.
In order to explain how the diagonal length of a right-angled triangle is computed by
the program Diagonal-Length using the procedure sum-square, let us consider the
side lengths being given as 4 and 5.
First Step: In program Diagonal-Length through the statement read (L1, L2), we read
L1 as 4 and L2 as 5. As L1 > 0 and L2 > 0. Therefore, the program enters the
while-loop. Next the program, in order to compute the value of the diagonal calls the
procedure sum-square by associating with a the value of L1 as 4 and with b the value
of L2 as 5. After these associations, the procedure sum-square takes control of the
computations. The procedure computes S as 41 = 16 + 25. The procedure returns 41
to the program. At this point, the program again takes control of further execution.
The program uses the value 41 in place of sum-square (L1, L2). The program calls the
procedure square-root, which is supposed to be built into the computer system, and
which temporarily takes control of the execution. The procedure square-root returns
the value √41 (approximately 6.4) and also returns control of execution to the
program Diagonal-Length, which in turn assigns this value to D and prints the
statement:
The program under while-loop again expects values of L1 and L2 from the user. If the
values supplied by the user are positive integers, whole process is repeated after
entering the while-loop. However, if either L1 ≤ 0 (say ─ 34) or L2 ≤ 0, then while-
loop is not entered and the program terminates.
factorial (1) = 1
factorial (n) = n* factorial (n─1). (1.6.3.2.1)
For those who are familiar with recursive definitions like the one given above for
factorial, it is easy to understand how the value of (n!) is obtained from the above
definition of factorial of a natural number. However, for those who are not familiar
with recursive definitions, let us compute factorial (4) using the above definition.
By definition
factorial (4) = 4 * factorial (3).
Again by the definition
factorial (3) = 3 * factorial (2)
Similarly
factorial (2) = 2* factorial (1)
And by definition
factorial (1) = 1
Substituting back the values of factorial (1), factorial (2), etc., we get

factorial (4) = 4 × 3 × 2 × 1 = 24, as desired.
This definition suggests the following procedure/algorithm for computing the factorial
of a natural number n:
In the following procedure factorial (n), let fact be the variable which is used to pass
the value computed by the procedure factorial to a calling program. The variable fact
is initially assigned the value 1, which is the value of factorial (1).
In order to compute factorial (n ─ 1), procedure factorial is called by itself, but this
time with (simpler) argument (n ─1). The repeated calls with simpler arguments
continue until factorial is called with argument 1. Successive multiplications of
partial results with 2,3, ….. upto n finally deliver the desired result.
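The recursive procedure described above may be sketched in Python as follows (the pseudo-code version itself is not reproduced in this excerpt):

```python
def factorial(n):
    # fact passes the value computed by the procedure to the caller
    if n == 1:
        return 1                       # factorial(1) = 1
    return n * factorial(n - 1)        # factorial(n) = n * factorial(n-1)
```

For example, factorial(4) returns 24.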
Let us consider how the procedure executes for n = 4, to compute the value of
factorial (4).
Initially, 1 is assigned to the variable fact. Next the procedure checks whether the
argument n equals 1. This is not true (as n is assumed to be 4). Therefore, the next
line with n = 4 is executed i.e.,
Now n, the parameter in the heading of procedure factorial (n) is replaced by 3. Again
as n ≠ 1, therefore the next line with n = 3 is executed i.e.,
On similar grounds, we get fact as 2 * factorial (1), and at this stage n = 1. The
value 1 of fact is returned by the last call of the procedure factorial. And here lies the
subtlety in understanding how the desired value 24 is returned. After this stage, the
recursive procedure under consideration executes as follows. When factorial
procedure is called with n = 1, the value 1 is assigned to fact and this value is
returned. However, this value of factorial (1) is passed to the statement fact ←2 *
factorial (1) which on execution assigns the value 2 to the variable fact. This value is
passed to the statement fact ← 3 * factorial (2) which on execution, gives a value of 6
to fact. And finally this value of fact is passed to the statement fact← 4 * factorial (3)
which in turn gives a value 24 to fact. And, finally, this value 24 is returned as value
of factorial (4).
Returning from the definition and procedure for computing factorial (n), let us
resume the general discussion.
In view of the significance of the concept of procedure, and specially of the concept of
recursive procedure, in solving some complex problems, we discuss another recursive
algorithm for the problem of finding the sum of first n natural numbers, discussed
earlier. For the discussion, we assume n is a non-negative integer.
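The recursive algorithm mentioned above (its pseudocode is not reproduced here) may be sketched in Python. The name SUM follows Ex.7; the body is an assumption consistent with the behaviour traced in the solution to Ex.7:

```python
def SUM(n):
    # base case: the sum of the first 0 natural numbers is 0
    if n == 0:
        return 0
    # recursive case: SUM(n) = n + SUM(n - 1)
    return n + SUM(n - 1)

print(SUM(5))  # → 15
```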
Ex.7) Explain how SUM (5) computes sum of first five natural numbers.
Study of these specific types of problems may provide useful help and guidance in
solving new problems, possibly of other problem types.
Next, we enumerate and briefly discuss the sequence of steps, which generally,
one goes through for designing algorithms for solving (algorithmic) problems,
and analyzing these algorithms.
(i) the type of problem, so that if a method of solving problems of the type, is
already known, then the known method may be applied to solve the problem
under consideration.
(ii) the type of inputs and the type of expected/desired outputs, specially, the
illegal inputs, i.e., inputs which are not acceptable, are characterized at this
stage. For example, in a problem of calculating income-tax, the income can not
be non-numeric character strings.
(iii) the range of inputs, for those inputs which are from ordered sets. For example,
in the problem of finding whether a large number is prime or not, we can not
give as input a number greater than the Maximum number (Max, mentioned
above) that the computer system used for the purpose, can store and
arithmetically operate upon. For still larger numbers, some other representation
mechanism has to be adopted.
(iv) special cases of the problem, which may need different treatment for solving
the problem. For example, if for an expected quadratic equation ax² + bx + c = 0, a,
the coefficient of x², happens to be zero, then the usual method of solving quadratic
equations, discussed earlier in this unit, cannot be used for the purpose.
∫ (5x² + sin²x cos²x) dx

∫ 5x² dx and ∫ sin²x cos²x dx
2 8 7
1 3 5
6 _ 4

1 2 3
8 _ 4
7 6 5

(the blank cell is shown as _ )
by sliding, any one of the digits from a cell adjacent to the blank cell, to the blank cell.
Then a wrong step cannot be ignored but has to be recovered. By recoverable, we
mean that we are allowed to move back to the earlier state from which we came to the
current state, if the current state seems to be less desirable than the earlier state. The
8-puzzle problem has recoverable steps, or, we may say, the problem is a recoverable
problem.
c) However if, we are playing chess, then a wrong step may not be even
recoverable. In other words, we may not be in a position, because of the
adversary’s move, to move back to earlier state. Such a problem is called an
irrecoverable step problem.
For example, for ignorable-step problems, simple control structures for sequencing
and iteration may be sufficient. However, if the problem additionally has recoverable-
step possibilities then facilities like back-tracking, as are available in the programming
language PROLOG, may have to be used. Further, if the problem additionally has
irrecoverable-step possibilities then planning tools should be available in the
computer system, so that entire sequence of steps may be analyzed in advance to find
out where the sequence may lead to, before the first step is actually taken.
which can be known through analyzing the problem under consideration, and
the knowledge of which, in turn, may help us in determining or guessing a correct
sequence of actions for solving the problem under consideration.
Most of the computer systems used for educational purposes are PCs based on Von-
Neumann architecture. Algorithms, that are designed to be executed on such
machines are called sequential algorithms.
Also, there are problems, for which finding the exact solutions may be possible, but
the cost (or complexity, to be defined later) may be too much.
In order to find the shortest path, one should find the cost of covering each of the
n! different paths covering the n given cities. Even for a problem of visiting 10
cities, n! = 10!, the number of possible distinct paths, is more than 3 million. In a
country like India, a travelling salesperson may be expected to visit even more than 10
cities. To find out exact solution in such cases, though possible, is very time
consuming. In such case, a reasonably good approximate solution may be more
desirable.
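The factorial growth claimed above can be checked directly with Python's standard library:

```python
import math

# number of distinct orders in which n cities can be visited
for n in (5, 10, 15):
    print(n, math.factorial(n))
# for n = 10 this prints 3628800, already more than 3 million
```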
We have already enumerated various design techniques and also various problem
domains which have been rigorously pursued for computational solutions. For each
problem domain, a particular set of techniques have been found more useful, though
other techniques also may be gainfully employed. A major part of the material of the
course, deals with the study of various techniques and their suitability for various
types of problem domains. Such a study can be a useful guide for solving new
problems or new problem types.
The topic is beyond the scope of the course and shall not be discussed any more.
Next, even an efficient algorithm that solves a problem, may be coded into an
inefficient program. Even a correct algorithm may be encoded into an incorrect
program.
In view of the facts mentioned earlier that the state of the art for proving an
algorithm/program correct is still far from satisfactory, we have to rely on testing the
proposed solutions. However, testing of a proposed solution can be effectively carried
out by executing the program on a computer system (an algorithm, which is not a
program can not be executed). Also by executing different algorithms if more than
one algorithm is available, on reasonably sized instances of the problem under
consideration, we may empirically compare their relative efficiencies. Algorithms,
which are not programs, can be hand-executed only for toy-sized instances.
1.8 SUMMARY
1. In this unit the following concepts have been formally or informally defined
and discussed:
5. In order to emphasize the significant role that available tools play in the
design of an algorithm, the problem of multiplication of two natural numbers
is solved in three different ways, each using a different set of available tools.
(Section 1.5)
8. In Section 10, the following issues which play an important role in designing,
developing and choosing an algorithm for solving a given problem, are
discussed:
1.9 SOLUTIONS/ANSWERS
Ex.1)
ax² + bx + c = 0,
where a, b, c ε {0,1} and a ≠ 0, giving the four equations:
x² + x + 1 = 0 (for a = 1 = b = c)
x² + x = 0 (for a = 1 = b, c = 0)
x² + 1 = 0 (for a = 1 = c, b = 0)
x² = 0 (for a = 1, b = 0 = c)
Example Problem 2: (Goldbach Conjecture): In 1742, Christian Goldbach
conjectured that every even integer n with n > 2 is the sum of two prime
numbers. For example, 4 = 2+2, 6 = 3+3, 8 = 5+3 and so on.
Example Problem 3: (The Twin Prime Conjecture): Two primes are said to
be twin primes, if these primes differ by 2. For example, 3 and 5, 5 and 7, 11
and 13 etc. The conjecture asserts that there are infinitely many twin primes.
Twin primes have been found each of which has more than 32,220
digits, yet the conjecture has neither been proved nor disproved. Again, the
Twin Prime Conjecture is a single-instance problem.
Ex.2)
S = 1 ─ 1/2 + 1/3 ─ 1/4 + 1/5 ─ …

S can be written in two different ways, showing ½ < S < 1:

S = 1 ─ (1/2 ─ 1/3) ─ (1/4 ─ 1/5) ─ (1/6 ─ 1/7) ─ …
{showing S < 1, as all the terms within parentheses are positive}

S = (1 ─ 1/2) + (1/3 ─ 1/4) + (1/5 ─ 1/6) + …
{showing ½ < S, again as all terms within parentheses are positive and the
first term equals ½}
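A quick numerical check of the bounds, computing partial sums of the series (the cut-off points 1000 and 100000 are this sketch's choice; the limit of the series is the known value ln 2 ≈ 0.693):

```python
import math

def partial_sum(n):
    # partial sum of 1 - 1/2 + 1/3 - 1/4 + ... up to n terms
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, partial_sum(n))   # values stay between 1/2 and 1
print(math.log(2))             # the limit the partial sums approach
```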
Method Not-Definite
Read (x)
{Let an Urn contain four balls of different colours viz., black, white, blue and
red. Before taking the next step, take a ball out of the urn without looking at the
ball}
Begin
(iii) If any of the following is a part of a method then the method is not
effective but finite.
∗
It is not possible to tell whether the speaker is actually telling lies or not. Because, if the
speaker is telling lies, then the statement: ‘I am telling lies’ should be false. Hence the speaker
is not telling lies. Similarly, if the speaker is not telling lies then the statement: ‘I am telling
lies’ should be true. Therefore, the speaker is telling lies. Hence, it is not possible to tell
whether the statement is true or not. Thus, the part of the method, and hence the method itself,
is not effective. But this part requires only finite amount of time to come to the conclusion that
the method is not effective.
**
A word is said to be autological if it is an adjective and the property denoted by it applies to
the word itself. For example, each of the words English, polysyllabic are autological. The
word single is a single word, hence single is autological. Also, the word autological is
autological.
A word is heterological, if it is an adjective and the property denoted by the word, does not
apply to the word itself. For example, the word monosyllabic is not monosyllabic. Similarly,
long is not a long word. German is not a German (language) word. Double is not a double
word. Thus, each of the words monosyllabic, long, German, and double is heterological.
But, if we think of the word heterological, which is an adjective, in respect of the matter of
determining whether it is heterological or not, then it is not possible to make either of the two
statements:
The reason being that either of these assertions, along with the definition of heterological, leads
to the assertion of the other. However, both of (i) and (ii) above cannot be asserted
simultaneously. Thus it is not possible to tell whether the word heterological is heterological
or not.
Case 2: When on division of first by 3, remainder = 1
Let first = 22 and second = 16
Then
first*second = 22*16= (7*3+1)*16
=7*3*16+1*16 = 7*(3*16)+1*16
=[ first/3]*(3*second)+1*second
= [ first/3]* (3* second)+ remainder * second
The required algorithm which uses variables First, Second, Remainder and
Partial-Result, may be described as follows:
Step 1: Initialise the variables First, Second and Partial-Result respectively with
m (the first given number), n (the second given number) and 0.
Step 2: If First or Second ∗ is zero, then return Partial-result as the final result
and then stop. Else
Partial-Result1← First1*Second1;
Partial-Result←Partial-Result1+Remainder1*Second;
Step 3:
{For computing Partial-Result1, replace First by First1; Second by Second1, and Partial-Result by
Partial-Result1 in Step 2 and repeat Step 2}
∗
If, initially, Second ≠ 0, then Second ≠ 0 in the subsequent calculations also.
{Remainder is obtained through the equation
**
First = 3*First1+Remainder1
with 0 ≤ Remainder1 ≤ 2
Second1 = 3*Second
Partial-Result1= First1 * Second1
Partial-Result = First * Second = (First1*3+Remainder1)*(Second)
=(First1*3)*Second+Remainder1*Second
=First1*(3*Second)+Remainder1*Second
=First1*Second1+Remainder1*Second
=Partial-Result1+Remainder1*Second
where 0 ≤ Remainder1 ≤ 2
Thus, at every stage, we are multiplying and dividing, if required, by at most 3.
The computation proceeds as follows:

Step          First   Second   Remainder on     Partial-Result
                               division by 3
Initially       52       19                           0
Step 2          52       19         1            1*19 + 0 = 19
Step 3          17       57
Step 2          17       57         2            2*57 + 19 = 133
Step 3           5      171
Step 2           5      171         2            2*171 + 133 = 475
Step 3           1      513
Step 2           1      513         1            1*513 + 475 = 988
Step 3           0     1539

(At each Step 2, since the value of First ≠ 0, the computation continues.)

As the value of First is now 0, the value 988 of Partial-Result is returned as the result
and the algorithm stops.
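The three-step scheme above may be sketched in Python; the loop mirrors Steps 1-3 of the text for non-negative integers, while the function name and loop structure are assumptions of this sketch:

```python
def multiply_base3(m, n):
    # Step 1: initialise First, Second and Partial-Result (non-negative m, n)
    first, second, partial_result = m, n, 0
    # Step 2: while neither First nor Second is zero, accumulate and simplify
    while first != 0 and second != 0:
        remainder = first % 3            # remainder of First on division by 3
        partial_result += remainder * second
        first = first // 3               # First1 = [First/3]
        second = 3 * second              # Second1 = 3 * Second
    # return Partial-Result as the final result
    return partial_result

print(multiply_base3(52, 19))  # → 988
```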
Ex. 4)
Algorithm Real-Roots-Quadratic
Ex. 5)
Algorithm Real-Roots-General-Quadratic
{In this case, a may be zero for the quadratic equation.
Variable name temp is used to store results of intermediate computations.
Further, first and second are variable names which may be used to store
intermediate results and finally, first and second store the values of first real
root (if it exists) and second real root (if it exists) of the equation}.
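A hedged Python sketch of Ex.5's algorithm: the variable names temp, first and second follow the text, but the control flow, including the handling of the degenerate case a = 0, is an assumption of this sketch:

```python
import math

def real_roots_general_quadratic(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0, where a may be zero."""
    if a == 0:
        if b == 0:
            # no variable term: every x is a root if c == 0, no root otherwise
            return "degenerate" if c == 0 else None
        # linear equation b*x + c = 0
        return (-c / b,)
    temp = b * b - 4 * a * c        # intermediate result, as in the text
    if temp < 0:
        return None                 # no real roots exist
    first = (-b + math.sqrt(temp)) / (2 * a)   # first real root
    second = (-b - math.sqrt(temp)) / (2 * a)  # second real root
    return (first, second)

print(real_roots_general_quadratic(1, -3, 2))  # → (2.0, 1.0)
```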
Ex.6) (i)
Initially the value of variable n is read as 3. Each of the variables x and Sum is
assigned value 0. Then without any condition the algorithm enters the do…while
loop. The value of x is incremented to 1 and the execution of the statement
SUM ← SUM + x makes SUM equal to 1.
Next the condition x<n is tested which is true, because x = 1 and n = 3. Once
the condition is true the body of the do..while loop is entered again and executed
second time. By this execution of the loop, x becomes 2 and SUM becomes
1+2=3.
As x = 2 < 3 = n, the body of the do..while loop is again entered and executed a
third time. In this round/iteration, x becomes 3 and SUM becomes 3 + 3 = 6.
Again the condition x < n is tested. But, now x = 3 = n, therefore x < n is false.
Hence the body of the do..while loop is no more executed i.e., the loop is
terminated. Next, the write statement gives the following output:
The last statement consisting of end followed by dot indicates that the algorithm
is to be terminated. Therefore, the algorithm terminates.
Ex.6) (ii)
SUM← Sum+x
is executed to give the value 1 to SUM. After executing the body of the
for-loop once, the value of the index variable x is implicitly incremented by 1 to
become 2.
After each increment in the index variable, the value of the index variable is
compared with the final value, which in this case, is n equal to 3. If index
variable is less than or equal to n (in this case) then body of for-loop is executed
once again.
The last statement consisting of end followed by dot, indicates that the
algorithm is to be terminated. Hence, the algorithm is terminated.
Ex.7)
At this stage n = 0, and accordingly, the algorithm returns value 0. Substituting
the value 0 of SUM (0) we get
S1 = 1 + 0 = 1, which is returned by SUM (1).
Substituting this value we get S2 = 3. Continuing like this, we get S3 = 6, S4 = 10
and S5 = 15.
UNIT 2 SOME PRE-REQUISITES AND ASYMPTOTIC BOUNDS
Structure Page Nos.
2.0 Introduction 41
2.1 Objectives 41
2.2 Some Useful Mathematical Functions & Notations 42
2.2.1 Functions & Notations
2.2.2 Modular Arithmetic/Mod Function
2.3 Mathematical Expectation 49
2.4 Principle of Mathematical Induction 50
2.5 Concept of Efficiency of an Algorithm 52
2.6 Well Known Asymptotic Functions & Notations 56
2.6.1 Enumerate the Five Well-Known Approximation Functions
and How These are Pronounced
2.6.2 The Notation O
2.6.3 The Ω Notation
2.6.4 The Notation Θ
2.6.5 The Notation o
2.6.6 The Notation ω
2.7 Summary 66
2.8 Solutions/Answers 67
2.9 Further Readings 70
2.0 INTRODUCTION
We have already mentioned that there may be more than one algorithm that solves a
given problem. In Section 3.3, we shall discuss eight algorithms to sort a given list of
numbers, each algorithm having its own merits and demerits. Analysis of algorithms,
the basics of which we study in Unit 3, is an essential tool for making well-informed
decisions in order to choose the most suitable algorithm, out of the available ones if
any, for the problem or application under consideration.

A number of mathematical and statistical tools, techniques and notations form an
essential part of the baggage for the analysis of algorithms. We discuss some of these
tools and techniques and introduce some notations in Section 2.2. However, for a
detailed discussion of some of these topics, one should refer to the course material of
MCS-013.

Also, in this unit, we will study a number of well-known approximation functions.
These approximation functions, which calculate approximate values of quantities
under consideration, prove quite useful in many situations where some of the
involved quantities are calculated just for comparison with each other, and the
correct result of comparisons of the quantities can be obtained even with approximate
values of the involved quantities. In such situations, the advantage is that the
approximate values may be calculated much more efficiently than can the actual
values.

The understanding of the theory of a routine may be greatly aided by providing, at
the time of construction, one or two statements concerning the state of the machine
at well chosen points… In the extreme form of the theoretical method a watertight
mathematical proof is provided for the assertions. In the extreme form of the
experimental method the routine is tried out on the machine with a variety of initial
conditions and is pronounced fit if the assertions hold in each case. Both methods
have their weaknesses.

A.M. Turing
Ferranti Mark 1 Programming Manual (1950)
2.1 OBJECTIVES
After going through this Unit, you should be able to:
Unless mentioned otherwise, we use the letters N, I and R in the following sense:
N = {1, 2, 3, …}
I = {…, ─2, ─1, 0, 1, 2, …}
R = set of Real numbers.
(i) Summation:
The expression
a1 + a2 + … + ai + … + an
may be denoted in shorthand as
∑_{i=1}^{n} ai

(ii) Product:
The expression
a1 × a2 × … × ai × … × an
may be denoted in shorthand as
∏_{i=1}^{n} ai
Definition 2.2.1.2:
Function:
For two given sets A and B (which need not be distinct, i.e., A may be the same as B)
a rule f which associates with each element of A, a unique element of B, is called a
function from A to B. If f is a function from a set A to a set B then we denote the fact
by f: A → B. Also, for x ε A, f(x) is called image of x in B. Then, A is called the
domain of f and B is called the Codomain of f.
Example 2.2.1.3:
Let f: I → I be defined such that
f(x) = x2 for all x ε I
Then
f maps ─ 4 to 16
f maps 0 to 0
f maps 5 to 25
Remark 2.2.1.4:
We may note the following:
(i) if f: X → Y is a function, then there may be more than one element, say x1
and x2, such that
f(x1) = f(x2)
For example, in the Example 2.2.1.3
f(2) = f(─2) = 4
(ii) For each element x ε X, there must be at least one element y ε Y
s.t. f(x) = y. However, it is not necessary that for each element y ε Y,
there must be an element x ε X such that f(x) = y. For example, for
y = ─ 3 ε Y there is no x ε X s.t. f(x) = x² = ─ 3.
By putting the restriction on a function f, that for each y ε Y, there must be at least
one element x of X s.t f(x) = y, we get special functions called onto or surjective
functions and shall be defined soon.
Definition 2.2.1.5:
We have already seen that the function defined in Example 2.2.1.3 is not 1-1.
However, by changing the domain, though defined by the same rule, f becomes a
1-1 function.
Example 2.2.1.6:
In this particular case, if we change the domain from I to N = {1,2,3…} then we can
easily check that function
∗
Some authors write 1-to-1 instead of 1-1. However, other authors call a function 1-to-1 if
f is both 1-1 and onto (to be defined in a short while).
f: N → I defined as
f(x) = x², for all x ε N,
is 1-1.
Because, in this case, for each x ε N its negative ─x ∉ N. Hence f(x) = f(y)
implies x = y. For example, if f(x) = 4 then there is only one value of x, viz., x = 2, s.t.
f(2) = 4.
Definition 2.2.1.7:
We have already seen that the function defined in Example 2.2.1.3 is not onto.
However, in this case, either by changing the codomain Y or changing the rule (or
both), we can make f onto.
Definition 2.2.1.10:
Monotonic Functions: For the definition of monotonic functions, we consider
only functions
f: R → R
where, R is the set of real numbers ∗ .
∗
Monotonic functions
f : X → Y,
may be defined even when each of X and Y, instead of being R, may be any ordered set.
But, such a general definition is not required for our purpose.
A function f: R → R is said to be monotonically increasing if for x, y ε R and x ≤ y
we have f(x) ≤ f(y).
In other words, as x increases, the value of its image f(x) also increases for a
monotonically increasing function.
Further, f is said to be strictly monotonically increasing, if x < y then f(x) < f(y)
Example 2.2.1.11:
We will discuss after a short while, useful functions called Floor and Ceiling
functions which are monotonic but not strictly monotonic.
Further, f is said to be strictly monotonically decreasing, if x < y then f(x) > f(y).
Example 2.2.1.12:
Let f: R → R be defined as
f(x) = ─x + 3
If x1 ≥ x2 then ─x1 ≤ ─x2, implying ─x1 + 3 ≤ ─x2 + 3,
which further implies f(x1) ≤ f(x2).
Hence, f is monotonically decreasing.
Next, we define Floor and Ceiling functions which map every real number to an
integer.
Definition 2.2.1.13:
Floor Function: maps each real number x to the integer, which is the greatest of all
integers less than or equal to x. Then the image of x is denoted by ⎣ x ⎦.
Definition 2.2.1.14:
Ceiling Function: maps each real number x to the integer, which is the least of all
integers greater than or equal to x. Then the image of x is denoted by ⎡x ⎤.
x ─ 1 < ⎣ x ⎦ ≤ x ≤ ⎡x ⎤ < x + 1.
Example 2.2.1.15:
Each of the floor function and ceiling function is a monotonically increasing function
but not strictly monotonically increasing function. Because, for real numbers x and y,
if x ≤ y then y = x + k for some k ≥ 0.
Similarly
⎡y⎤ = ⎡x + k⎤ = least integer greater than or equal to x + k ≥ least integer
greater than or equal to x = ⎡x⎤.
But, each of floor and ceiling function is not strictly increasing, because
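For instance, the failure of strictness can be illustrated numerically; the particular arguments 2.3 and 2.7 are this sketch's choice:

```python
import math

# floor and ceiling are monotonically increasing:
assert math.floor(2.3) <= math.floor(2.7)   # x <= y implies floor(x) <= floor(y)
assert math.ceil(2.3) <= math.ceil(2.7)

# ... but not strictly increasing: distinct arguments share the same image
print(math.floor(2.3), math.floor(2.7))  # → 2 2
print(math.ceil(2.3), math.ceil(2.7))    # → 3 3
```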
(ii) If it is the 5th day (i.e., Friday) of a week, then after 4 days it will be the 2nd day
(i.e., Tuesday), and not the 9th day, of course of another week (whenever the number of
the day exceeds 7, we subtract n = 7 from the number; we are taking here Sunday as the 7th
day, instead of the 0th day)
(iii) If it is the 6th month (i.e., June) of a year, then after 8 months it will be the 2nd month
(i.e., February) of, of course, another year (whenever the number of the month exceeds
12, we subtract n = 12)
Definition 2.2.2.1:
b mod n: if n is a given positive integer and b is any integer, then
If b = ─ 42 and n = 11 then
b mod n = ─42 mod 11 = 2 (∵ ─42 = (─4) × 11 + 2)
Mod function can also be expressed in terms of the floor function as follows:
b (mod n) = b ─ ⎣ b/n⎦ × n
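The floor-based formula can be checked directly in Python (the function name mod is this sketch's choice; note that Python's built-in % operator agrees with this definition, including for negative b):

```python
import math

def mod(b, n):
    # b mod n expressed through the floor function: b - floor(b/n) * n
    return b - math.floor(b / n) * n

print(mod(-42, 11))  # → 2, matching the worked example above
```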
Definition 2.2.2.2:
Factorial: For N = {1, 2, 3, …}, the factorial function
factorial: N ∪ {0} → N ∪ {0}
given by
factorial (0) = 1 and factorial (n) = n × factorial (n ─ 1) for n ≥ 1,
has already been discussed in detail in Section 1.6.3.2.
Definition 2.2.2.3:
Exponentiation Function Exp: is a function of two variables x and n where x is any
non-negative real number and n is an integer (though n can be taken as non-integer
also, but we restrict to integers only)
For n = 0
Exp (x, 0) = x⁰ = 1
For n > 0
Exp (x, n) = x × Exp (x, n ─ 1)
i.e.,
xⁿ = x × xⁿ⁻¹
For n < 0, let n = ─m for m > 0. Then
xⁿ = x⁻ᵐ = 1/xᵐ
In xⁿ, n is also called the exponent/power of x.
For example: if x = 1.5, n = 3, then Exp (1.5, 3) = 1.5 × 1.5 × 1.5 = 3.375.
For two integers m and n and a real number b, the following identities hold:
(bᵐ)ⁿ = bᵐⁿ
(bᵐ)ⁿ = (bⁿ)ᵐ
bᵐ · bⁿ = bᵐ⁺ⁿ
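The recursive definition of Exp can be rendered as a short Python sketch (the lowercase name exp is chosen here to avoid shadowing anything; the three cases follow the definition above):

```python
def exp(x, n):
    # Exp(x, 0) = 1
    if n == 0:
        return 1
    # Exp(x, n) = x * Exp(x, n - 1) for n > 0
    if n > 0:
        return x * exp(x, n - 1)
    # for n < 0 with n = -m, m > 0: x^n = 1 / x^m
    return 1 / exp(x, -n)

print(exp(1.5, 3))  # → 3.375
```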
Definition 2.2.2.4:
Polynomial: A polynomial in n of degree k, where k is a non-negative integer, over
R, the set of real numbers, denoted by P(n), is of the form
We may note that P(n) = nᵏ = 1·nᵏ, for any k, is a single-term polynomial. If k ≥ 0 then
P(n) = nᵏ is monotonically increasing. Further, if k ≤ 0 then P(n) = nᵏ is
monotonically decreasing.
lim_{n→∞} nᶜ/bⁿ = 0

The result, in non-mathematical terms, states that for any given constants b and c, but
with b > 1, the terms in the sequence 1ᶜ/b¹, 2ᶜ/b², 3ᶜ/b³, …, kᶜ/bᵏ, … gradually decrease
and approach zero. This further means that for constants b and c, and integer
variable n, the exponential term bⁿ, for b > 1, increases at a much faster rate than
the polynomial term nᶜ.
Definition 2.2.2.6:
The letter e is used to denote the quantity
1 + 1/1! + 1/2! + 1/3! + …,
and is taken as the base of the natural logarithm function; then for all real numbers x,
eˣ = 1 + x + x²/2! + x³/3! + … = ∑_{i=0}^{∞} xⁱ/i!
Definition 2.2.2.8:
Logarithm: The concept of logarithm is defined indirectly through the definition of
the Exponential defined earlier. If a > 0, b > 0 and c > 0 are three real numbers such that
c = aᵇ,
then b is called the logarithm of c to the base a, and we write b = log_a c.
The following important properties of logarithms can be derived from the properties
of exponents. However, we just state the properties without proof.
Result 2.2.2.9:
For n, a natural number and real numbers a, b and c all greater than 0, the following
identities are true:
(i) log_a (bc) = log_a b + log_a c
(ii) log_a (bⁿ) = n log_a b
(iii) log_a b = (log_c b) / (log_c a) (change of base)
(iv) log_a (1/b) = ─ log_a b
(v) log_a b = 1 / (log_b a)
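A quick numerical spot-check of these logarithm identities, using the optional base argument of Python's math.log (the particular values a = 2, b = 8, c = 5, n = 3 are this sketch's choice; the change-of-base form checked below is an assumption of the sketch):

```python
import math

a, b, c, n = 2.0, 8.0, 5.0, 3
eps = 1e-9

assert abs(math.log(b * c, a) - (math.log(b, a) + math.log(c, a))) < eps  # log of a product
assert abs(math.log(b ** n, a) - n * math.log(b, a)) < eps                # log of a power
assert abs(math.log(b, a) - math.log(b, c) / math.log(a, c)) < eps        # change of base
assert abs(math.log(1 / b, a) + math.log(b, a)) < eps                     # log of a reciprocal
assert abs(math.log(b, a) - 1 / math.log(a, b)) < eps                     # reciprocal of bases
print("logarithm identities verified numerically")
```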
Example 2.1: Suppose, the students of MCA, who completed all the courses in the
year 2005, had the following distribution of marks.
% marks          Number of students
0% to 20%            08
20% to 40%           20
40% to 60%           57
60% to 80%           09
80% to 100%          06
If a student is picked up randomly from the set of students under consideration, what
is the % of marks expected of such a student? After scanning the table given above,
we intuitively expect the student to score around the 40% to 60% class, because, more
than half of the students have scored marks in and around this class.
Assuming that marks within a class are uniformly scored by the students in the class,
the above table may be approximated by the following more concise table:
% marks    Percentage of students scoring the marks
10 ∗           08
30             20
50             57
70             09
90             06
Thus, we assign weight (8/100) to the score 10% (∵ 8, out of 100 students, score on
the average 10% marks); (20/100) to the score 30% and so on.
Thus
Expected % of marks = 10 × (8/100) + 30 × (20/100) + 50 × (57/100) + 70 × (9/100) + 90 × (6/100) = 47
The calculated expected value of 47 is roughly equal to our intuition that the
expected marks should be around 50.
We generalize and formalize these ideas in the form of the following definition.
Mathematical Expectation
For a given set S of items, let to each item, one of the n values, say, v1, v2,…,vn, be
associated. Let the probability of the occurrence of an item with value vi be pi. If an
item is picked up at random, then its expected value E(v) is given by
E(v) = ∑_{i=1}^{n} p_i v_i = p1·v1 + p2·v2 + … + pn·vn
∗
10 is the average of the class boundaries 0 and 20.
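The expectation formula applied to the marks example above can be computed directly; the list names are this sketch's choice:

```python
# expected value E(v) = p1*v1 + ... + pn*vn for the marks example
values = [10, 30, 50, 70, 90]            # representative % marks of each class
probs = [0.08, 0.20, 0.57, 0.09, 0.06]   # fraction of students in each class

expected = sum(p * v for p, v in zip(probs, values))
print(round(expected, 2))  # the expected % of marks, 47, as computed above
```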
Let us consider the following sequence in which the nth term S(n) is the sum of the
powers of 2 from 2⁰ through 2ⁿ⁻¹, e.g.,
S(1) = 2⁰ = 2¹ ─ 1
S(2) = 2⁰ + 2¹ = 2² ─ 1
S(3) = 2⁰ + 2¹ + 2² = 2³ ─ 1
(ii) Induction Hypothesis: Assume, for some k > base-value (=1, in this case)
that
S(k) = 2k ─ 1.
(iii) Induction Step: Using (i) & (ii) establish that (in this case)
S(k+1) = 2k+1 ─ 1
In order to establish
S(k+1) = 2k+1 ─ 1, (A)
we use the definition of S(n) and Steps (i) and (ii) above
By definition
S(k+1) = 2⁰ + 2¹ + … + 2⁽ᵏ⁺¹⁾⁻¹
= (2⁰ + 2¹ + … + 2ᵏ⁻¹) + 2ᵏ (B)
But by definition
2⁰ + 2¹ + … + 2ᵏ⁻¹ = S(k). (C)
Hence, using (C) and the Induction Hypothesis in (B),
S(k+1) = (2ᵏ ─ 1) + 2ᵏ
∴ S(k+1) = 2·2ᵏ ─ 1 = 2ᵏ⁺¹ ─ 1
which establishes (A).
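The statement just proved by induction can also be checked computationally for a range of small n (a numerical check, not a substitute for the proof):

```python
def S(n):
    # S(n) = 2^0 + 2^1 + ... + 2^(n-1)
    return sum(2 ** i for i in range(n))

for n in range(1, 11):
    assert S(n) == 2 ** n - 1
print("S(n) = 2**n - 1 holds for n = 1..10")
```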
Ex.2) Let us assume that we have unlimited supply of postage stamps of Rs. 5 and
Rs. 6 then
(i) through direct calculations, find what amounts can be realized in terms
of only these stamps.
(ii) Prove, using Principle of Mathematical Induction, the result of your
efforts in part (i) above.
Mainly the two computer resources taken into consideration for efficiency measures,
are time and space requirements for executing the program corresponding to the
solution/algorithm. Unless mentioned otherwise, we will restrict our attention to the
time complexities of algorithms for the problems.
It is easy to realize that given an algorithm for multiplying two n × n matrices, the
time required by the algorithm for finding the product of two 2 × 2 matrices, is
expected to take much less time than the time taken by the same algorithm for
multiplying say two 100 × 100 matrices. This explains intuitively the notion of the
size of an instance of a problem and also the role of size in determining the (time)
complexity of an algorithm. If the size (to be later considered formally) of general
instance is n then time complexity of the algorithm solving the problem (not just
the instance) under consideration is some function of n.
In view of the above explanation, the notion of size of an instance of a problem plays
an important role in determining the complexity of an algorithm for solving the
problem under consideration. However, it is difficult to define precisely the concept
of size in general, for all problems that may be attempted for algorithmic solutions.
Formally, one of the definitions of the size of an instance of a problem may be taken
as the number of bits required in representing the instance.
However, for all types of problems, this does not serve properly the purpose for which
the notion of size is taken into consideration. Hence different measures of size of an
instance of a problem, are used for different types of problem. For example,
(i) In sorting and searching problems, the number of elements, which are to be
sorted or are considered for searching, is taken as the size of the instance of
the problem of sorting/searching.
(ii) In the case of solving polynomial equations or while dealing with the algebra
of polynomials, the degrees of polynomial instances, may be taken as the
sizes of the corresponding instances.
There are two approaches for determining complexity (or time required) for executing
an algorithm, viz.,
(i) empirical (or a posteriori) and
(ii) theoretical (or a priori).
The theoretical approach has a number of advantages over the empirical approach
including the ones enumerated below:
(i) The approach does not depend on the programming language in which the
algorithm is coded and on how it is coded in the language,
(ii) The approach does not depend on the computer system used for executing (a
programmed version of) the algorithm.
(iii) In case of a comparatively inefficient algorithm, which ultimately is to be
rejected, the computer resources and programming efforts which otherwise
would have been required and wasted, will be saved.
(iv) Instead of applying the algorithm to many different-sized instances, the
approach can be applied for a general size say n of an arbitrary instance of the
problem under consideration. In the case of theoretical approach, the size n
may be arbitrarily large. However, in empirical approach, because of
practical considerations, only the instances of moderate sizes may be
considered.
Remark 2.5.1:
In view of the advantages of the theoretical approach, we are going to use it as
the only approach for computing complexities of algorithms. As mentioned earlier,
in the approach, no particular computer is taken into consideration for calculating time
complexity. But different computers have different execution speeds. However, the
speed of one computer is generally some constant multiple of the speed of the other.
Therefore, this fact of differences in the speeds of computers by constant
multiples is taken care of, in the complexity functions t for general instance sizes
n, by writing the complexity function as c.t(n) where c is an arbitrary constant.
An important consequence of the above discussion is that if the time taken by one
machine in executing a solution of a problem is a polynomial (or exponential)
function in the size of the problem, then time taken by every machine is a polynomial
(or exponential) function respectively, in the size of the problem. Thus, functions
differing from each other by constant factors, when treated as time complexities
should not be treated as different, i.e., should be treated as complexity-wise
equivalent.
Remark 2.5.2:
Asymptotic Considerations:
Computers are generally used to solve problems involving complex solutions. The
complexity of solutions may be either because of the large number of involved
computational steps and/or because of large size of input data. The plausibility of the
claim apparently follows from the fact that, when required, computers are used
generally not to find the product of two 2 × 2 matrices but to find the product of two
n × n matrices for large n, running into hundreds or even thousands.
Similarly, computers, when required, are generally used not only to find roots of
quadratic equations but for finding roots of complex equations including polynomial
equations of degrees more than hundreds or sometimes even thousands.
The above discussion leads to the conclusion that when considering time complexities
f1(n) and f2(n) of (computer) solutions of a problem of size n, we need to consider and
compare the behaviours of the two functions only for large values of n. If the relative
behaviours of two functions for smaller values conflict with the relative behaviours
for larger values, then we may ignore the conflicting behaviour for smaller values.
For example, if the earlier considered two functions f1(n) = 1000 n2 and f2(n) = 5 n4
represent time complexities of two solutions of a problem of size n, then despite the
fact that
f1(n) ≥ f2(n) for n ≤ 14,
we would still prefer the solution having f1(n) as time complexity, because
f1(n) ≤ f2(n) for all n ≥ 15.
This explains the reason for the presence of the phrase ‘n ≥ k’ in the definitions
of the various measures of complexities and approximation functions, discussed
below:
Remark 2.5.3:
Comparative Efficiencies of Algorithms: Linear, Quadratic, Polynomial and
Exponential
Suppose, for a given problem P, we have two algorithms say A1 and A2 which solve
the given problem P. Further, assume that we also know time-complexities T1(n) and
T2 (n) of the two algorithms for problem size n. How do we know which of the two
algorithms A1 and A2 is better?
The difficulty in answering the question arises from the difficulty in comparing time
complexities T1(n) and T2(n).
More explicitly
The issue will be discussed in more detail in Unit 3. However, here we may mention
that, in view of the fact that we generally use computers to solve problems of large
sizes, in the above case, the algorithm A1 with time-complexity T1(n) = 1000 n2 is
preferred over the algorithm A2 with time-complexity T2(n) = 5 n4, because
T1(n) ≤ T2(n) for all n ≥ 15.
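This crossover at n = 15 can be checked directly; a small sketch (the function names are mine, not from the text):

```python
def t1(n): return 1000 * n**2   # time complexity of algorithm A1
def t2(n): return 5 * n**4      # time complexity of algorithm A2

# smallest problem size at which A1 becomes at least as fast as A2
crossover = next(n for n in range(1, 100) if t1(n) <= t2(n))
```

For n up to 14 the quartic algorithm is actually faster, which is exactly why the asymptotic comparison ignores small sizes.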
BT1(n) = ak nk + ak−1 nk−1 + … + ai ni + … + a1 n + a0
for some k ≥ 0, with the ai's as real numbers and ak > 0, and
(ii) if, again, a problem is solved by two algorithms D1 and D2 with respectively
polynomial time complexities DT1 and DT2, and if DT1(n) ≤ DT2(n) for all n ≥ m for
some m, then the algorithm D1 is assumed to be more efficient and is preferred over D2.
Similarly, the terms 'quadratic' and 'polynomial time' complexity functions and
algorithms are used when the involved complexity functions are respectively of the
forms c n2 and c1 nk + … + ck.
Remark 2.5.4:
For all practical purposes, the use of c, in (c t(n)) as time complexity measure, offsets
properly the effect of differences in the speeds of computers. However, we need to be
on the guard, because in some rarely occurring situations, neglecting the effect of c
may be misleading.
For example, suppose two algorithms A1 and A2 respectively take n2 days and n3
seconds for execution of an instance of size n of a particular problem. As a 'day' is a
constant multiple of a 'second', as per our conventions we may take the two
complexities as C2 n2 and C3 n3 for some constants C2 and C3. As we will discuss
later, the algorithm A1 taking C2 n2 time is theoretically preferred over the algorithm
A2 with time complexity C3 n3. The preference is based on the asymptotic behaviour of the
complexity functions of the algorithms. However, in this case, only for instances
requiring millions of years of execution time does the algorithm A1 requiring C2 n2 time
outperform the algorithm A2 requiring C3 n3 time.
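The scale of "millions of years" can be made concrete; the sketch below (names mine) takes the constants literally as one day = 86,400 seconds:

```python
SECONDS_PER_DAY = 86_400

def a1_seconds(n): return n**2 * SECONDS_PER_DAY  # A1 takes n^2 days
def a2_seconds(n): return n**3                    # A2 takes n^3 seconds

# A1 overtakes A2 only once n^2 days < n^3 seconds, i.e. for n > 86400
crossover = SECONDS_PER_DAY + 1
years_at_crossover = a2_seconds(crossover) / (SECONDS_PER_DAY * 365)
```

At the crossover size, either algorithm already needs tens of millions of years, which is the "rarely occurring situation" the remark warns about.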
Remark 2.5.5:
Unit of Size for Space Complexity: Most of the literature discusses the
complexity of an algorithm only in terms of expected time of execution, generally
neglecting the space complexity. However, space complexity has one big advantage
over time complexity.
Ex.3) For a given problem P, two algorithms A1 and A2 have respectively time
complexities T1(n) and T2(n) in terms of size n, where
T1(n) = 4n5 + 3n and T2(n) = 2500n3 + 4n.
Find the range for n, the size of an instance of the given problem, for which A1 is more
efficient than A2.
f: N→N
g: N→N
Solution:
Part (i)
Consider f(x) = 2x3 + 3x2 + 1. For x ≥ 1,
2x3 + 3x2 + 1 ≤ 2x3 + 3x3 + x3 = 6x3.
∴ there exist C = 6 and k = 1 such that
f(x) ≤ C.x3 for all x ≥ k.
Thus we have found the required constants C and k. Hence f(x) is O(x3).
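The witnesses C = 6 and k = 1 can also be verified numerically over a sample range; a minimal check:

```python
def f(x): return 2 * x**3 + 3 * x**2 + 1

C, k = 6, 1   # the witnesses found above
# any x >= k violating f(x) <= C * x^3 would disprove the claim
violations = [x for x in range(k, 10_001) if f(x) > C * x**3]
```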
Part (ii)
As above, we can show that
2x3 + 3x2 + 1 ≤ 6x4 for all x ≥ 1, so that f(x) = O(x4).
However, we may also, by computing some values of f(x) and x4, find C and k as
follows:
Part (iii)
for C = 1 and k = 1 we get
x3 ≤ C (2x3 + 3x2 +1) for all x ≥ k
Part (iv)
We prove the result by contradiction. Let there exist positive constants C and k
such that
Part (v)
Again we establish the result by contradiction.
Let, if possible, 2x3 + 3x2 + 1 = O(x2), i.e., there exist positive constants C and k
such that
2x3 + 3x2 + 1 ≤ C x2 for all x ≥ k,
implying
2x + 3 + 1/x2 ≤ C for all x ≥ k,
implying
x ≤ C for x ≥ k.
Again, for x = max {C + 1, k}, the last inequality does not hold, which gives the
required contradiction.
Example 2.6.2.2:
The big-oh notation can be used to estimate Sn, the sum of first n positive integers
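Since each of the n summands is at most n, Sn ≤ n · n = n2, so Sn = O(n2) with C = 1 and k = 1; a quick numeric confirmation of that bound:

```python
def s(n):
    # S_n = 1 + 2 + ... + n
    return sum(range(1, n + 1))

# each of the n summands is at most n, hence S_n <= n * n
ok = all(s(n) <= n * n for n in range(1, 200))
```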
Remark 2.6.2.2:
It can be easily seen that for given functions f(x) and g(x), if there exists one pair of C
and k with f(x) ≤ C.g(x) for all x ≥ k, then there exist infinitely many pairs (Ci, ki)
which satisfy
f(x) ≤ Ci.g(x) for all x ≥ ki,
because, for any Ci ≥ C and any ki ≥ k, the above inequality is true whenever
f(x) ≤ C.g(x) for all x ≥ k.
Example 2.6.3.1:
For the functions
(iv) x3 = Ω (h(x))
(v) x2 ≠ Ω (h(x))
Solutions:
Part (i)
Part (ii)
h(x) = 2x3 − 3x2 + 2
Let C and k > 0 be such that
2x3 − 3x2 + 2 ≥ C x3 for all x ≥ k,
i.e., (2 − C) x3 − 3x2 + 2 ≥ 0 for all x ≥ k.
Part (iii)
2x3 − 3x2 + 2 = Ω (x2):
It can be easily seen that the smaller the value of C, the better the chances of the above
inequality being true. So, to begin with, let us take C = 1 and try to find a value of k
s.t.
2x3 − 3x2 + 2 ≥ x2, i.e., 2x3 − 4x2 + 2 ≥ 0.
For x ≥ 2, the above inequality holds. Hence, with C = 1 and k = 2, we get
2x3 − 3x2 + 2 = Ω (x2).
Part (iv)
To establish the equality x3 = Ω (2x3 − 3x2 + 2), we need C and k such that
x3 ≥ C (2x3 − 3x2 + 2) = C (2(x3 − (3/2) x2 + 1)) for all x ≥ k.
Taking C = 1/2, the inequality reduces to (3/2) x2 ≥ 1, which holds for all x ≥ 1.
Hence, with C = 1/2 and k = 1, x3 = Ω (2x3 − 3x2 + 2).
Part (v)
We prove the result by contradiction. Let, if possible, x2 = Ω (2x3 − 3x2 + 2), i.e.,
there exist positive constants C and k such that
x2 ≥ C (2x3 − 3x2 + 2) for all x ≥ k,
implying
(2C + 1)/C ≥ x for all x ≥ k.
But for any x ≥ 2 (2C + 1)/C, the last inequality does not hold, giving the required
contradiction. Hence x2 ≠ Ω (2x3 − 3x2 + 2).
Let f(x) and g(x) be two functions, each from the set of natural numbers or positive
real numbers to positive real numbers. Then f(x) is said to be Θ (g(x)) (pronounced as
big-theta of g of x) if there exist positive constants C1, C2 and k such that
C2 g(x) ≤ f(x) ≤ C1 g(x) for all x ≥ k.
(Note the last inequalities represent two conditions to be satisfied simultaneously viz.,
C2 g(x) ≤ f(x) and f(x) ≤ C1 g(x))
We state the following theorem without proof, which relates the three notations
O, Ω and Θ.
Theorem: For any two functions f(x) and g(x), f(x) = Θ (g(x)) if and only if
f(x) = O (g(x)) and f(x) = Ω (g(x)).
Solutions
Part (i)
x3 ≤ 2x3 + 3x2 + 1 ≤ 3x3 for all x ≥ 4. Hence, for C1 = 3, C2 = 1 and k = 4,
C2 x3 ≤ 2x3 + 3x2 + 1 ≤ C1 x3 for all x ≥ k,
so that 2x3 + 3x2 + 1 = Θ (x3).
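Both inequalities behind the Θ-claim can be spot-checked over a range of x; note that the upper bound indeed fails below k = 4:

```python
def f(x): return 2 * x**3 + 3 * x**2 + 1

C1, C2, k = 3, 1, 4
# both sides of C2*x^3 <= f(x) <= C1*x^3 must hold simultaneously
ok = all(C2 * x**3 <= f(x) <= C1 * x**3 for x in range(k, 5_001))
```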
Part (ii)
We can show by contradiction that no C1 exists.
Let, if possible, for some positive integers k and C1, we have 2x3 + 3x2 + 1 ≤ C1 x2 for all
x ≥ k.
Then, dividing throughout by x2,
2x + 3 + 1/x2 ≤ C1 for all x ≥ k,
implying
x ≤ C1 for all x ≥ k.
But for x = max {C1 + 1, k}, the last inequality does not hold. Hence no such C1
exists, and therefore f(x) ≠ Θ (x2).
Next, f(x) ≠ Θ (x4), because there is no C2 > 0 with
C2 x4 ≤ (2x3 + 3x2 + 1) for all sufficiently large x.
If such a C2 exists for some k, then C2 x4 ≤ 2x3 + 3x2 + 1 ≤ 6x3 for all x ≥ k ≥ 1,
implying
C2 x ≤ 6 for all x ≥ k.
But for x = max {6/C2 + 1, k}, the last inequality does not hold, which is the
required contradiction.
For f(x) = O (x3), though there exist C and k such that f(x) ≤ C.x3 for all x ≥ k, yet
there may also be some values of x for which the equality f(x) = C.x3 holds.
However, if we consider
f(x) = O (x4),
the inequality can be made strict for all sufficiently large x. The case of f(x) = O (x4)
provides an example for the next notation of small-oh.
The Notation o
Let f(x) and g(x) be two functions, each from the set of natural numbers or positive
real numbers to positive real numbers.
Further, let C > 0 be any number. Then f(x) = o(g(x)) (pronounced as little-oh of
g of x) if there exists a natural number k ≥ 1 satisfying
f(x) < C.g(x) for all x ≥ k    … (B)
Here we may note the following points:
(i) In the case of little-oh, the constant C does not depend on the two functions f(x)
and g(x); rather, we can choose C > 0 arbitrarily.
(ii) The inequality (B) is strict whereas the inequality (A) of big-oh is not
necessarily strict.
Solutions:
Part (i)
Let C > 0 be given. To find a k satisfying the requirement of little-oh, consider the
case when n = 4, for which we need
2x3 + 3x2 + 1 < C x4, i.e., (dividing throughout by x3)
2 + 3/x + 1/x3 < C x.
For x ≥ 1, the left-hand side is at most 2 + 3 + 1 = 6 < 7. Therefore, if we take
k = max {7/C, 1},
then C x ≥ 7 > 2 + 3/x + 1/x3 for all x ≥ k, and
therefore
2x3 + 3x2 + 1 = o(x4).
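The recipe k = max{7/C, 1} can be exercised for several choices of C; a sketch, under the same f as above:

```python
def f(x): return 2 * x**3 + 3 * x**2 + 1

def witness_k(C):
    # k = max{7/C, 1}, as found in Part (i)
    return max(7 / C, 1)

# for each arbitrarily chosen C, the strict inequality holds beyond k
ok = all(f(x) < C * x**4
         for C in (0.01, 0.5, 1, 10)
         for x in range(int(witness_k(C)) + 1, int(witness_k(C)) + 200))
```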
Part (ii)
We prove the result by contradiction. Let, if possible, f(x) = o(xn) for some n ≤ 3.
Then, for every C > 0, there exists a k such that
2x3 + 3x2 + 1 < C xn for all x ≥ k,
i.e., (dividing throughout by x3)
2 + 3/x + 1/x3 < C xn−3 for n ≤ 3 and x ≥ k.
As C is arbitrary, we may take C = 1, so that
2 + 3/x + 1/x3 < xn−3 for n ≤ 3 and x ≥ k ≥ 1.
Also, it can be easily seen that xn−3 ≤ 1 for n ≤ 3 and x ≥ 1.
∴ 2 + 3/x + 1/x3 ≤ 1 for n ≤ 3 and x ≥ k.
However, the last inequality is not true, since the left-hand side exceeds 2. Therefore,
the proof by contradiction is complete, and f(x) ≠ o(xn) for n ≤ 3.
We state (without proof) below two results which can be useful in finding the small-oh
upper bound for a given function.
Theorem 2.6.5.3: Let f(x) and g(x) be functions as in the definition of the small-oh
notation. Then f(x) = o(g(x)) if and only if
Lim (x→∞) f(x)/g(x) = 0.
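For f(x) = 2x3 + 3x2 + 1 and g(x) = x4, the theorem's limit can be observed numerically: the ratio f(x)/g(x) shrinks steadily towards 0.

```python
def f(x): return 2 * x**3 + 3 * x**2 + 1
def g(x): return x**4

# sample the ratio at increasing x; it should decrease towards 0
ratios = [f(x) / g(x) for x in (10, 100, 1_000, 10_000)]
```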
Next, we introduce the last asymptotic notation, namely, small-omega. The relation of
small-omega to big-omega is similar to the relation of small-oh to big-oh.
Further, f(x) = ω (g(x)) if, for every C > 0, there exists a natural number k such that
C g(x) < f(x) for all x ≥ k.
For example, 2x3 + 3x2 + 1 = ω (x). Indeed, let C > 0 be given, and consider
2x3 + 3x2 + 1 > C x, i.e.,
2x2 + 3x + 1/x > C (dividing throughout by x).
Let k be an integer with k ≥ C + 1. Then, for all x ≥ k,
2x2 + 3x + 1/x > 2x ≥ 2k ≥ 2(C + 1) > C,
hence 2x3 + 3x2 + 1 = ω (x).
More generally:
Theorem 2.6.6.3: Let f(x) and g(x) be functions as in the definition of little-omega.
Then f(x) = ω (g(x)) if and only if
Lim (x→∞) f(x)/g(x) = ∞,
equivalently, if and only if
Lim (x→∞) g(x)/f(x) = 0.
2.7 SUMMARY
In this unit, first of all, a number of mathematical concepts are defined. We defined
the concepts of function, 1-1 function, onto function, ceiling and floor functions, mod
function, exponentiation function and log function. Also, we introduced some
mathematical notations.
E(v) = ∑ (i = 1 to n) pi.vi = p1.v1 + p2.v2 + … + pn.vn
Next, five Well Known Asymptotic Growth Rate functions are defined and the
corresponding notations are introduced. Some important results involving these
are stated and/or proved.
The Notation O
Provides an asymptotic upper bound for a given function.
Let f(x) and g(x) be two functions, each from the set of natural numbers or set of
positive real numbers to positive real numbers. Then f(x) is said to be O(g(x)) if there
exist positive constants C and k such that f(x) ≤ C.g(x) for all x ≥ k.
The Notation Θ
Provides simultaneously both asymptotic lower bound and asymptotic upper bound
for a given function.
Let f(x) and g(x) be two functions, each from the set of natural numbers or positive
real numbers to positive real numbers. Then f(x) is said to be Θ (g(x)) (pronounced as
big-theta of g of x) if there exist positive constants C1, C2 and k such that
C2 g(x) ≤ f(x) ≤ C1 g(x) for all x ≥ k.
The Notation o
Let f(x) and g(x) be two functions, each from the set of natural numbers or positive
real numbers to positive real numbers.
Further, let C > 0 be any number. Then f(x) = o(g(x)) (pronounced as little-oh of
g of x) if there exists a natural number k satisfying f(x) < C.g(x) for all x ≥ k.
The Notation ω
Again the asymptotic lower bound Ω may or may not be tight. However, the
asymptotic bound ω cannot be tight. The formal definition of ω is as follows:
Let f(x) and g(x) be two functions each from the set of natural numbers or the set of
positive real numbers to set of positive real numbers.
Further, f(x) = ω (g(x)) if, for every C > 0, there exists a natural number k such that
C g(x) < f(x) for all x ≥ k.
2.8 SOLUTIONS/ANSWERS
Ex. 1) We follow the three-step method explained earlier.
Let S(n) be the statement: 6 divides n3 − n.
Base step: For n = 0, n3 − n = 0. But 0 = 6 × 0. Therefore, 6 divides 0. Hence S(0) is
correct.
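The statement itself is easy to sanity-check alongside the induction: n3 − n = (n − 1)·n·(n + 1) is a product of three consecutive integers, hence divisible by both 2 and 3.

```python
# 6 divides n^3 - n for every non-negative n in the sampled range
ok = all((n**3 - n) % 6 == 0 for n in range(0, 1_000))
```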
Ex. 2) Part (i): With stamps of Rs. 5 and Rs. 6, we can make the following
amounts:
5 = 1 × 5 + 0 × 6
6 = 0 × 5 + 1 × 6   using 1 stamp
10 = 2 × 5 + 0 × 6
11 = 1 × 5 + 1 × 6
12 = 0 × 5 + 2 × 6   using 2 stamps
15 = 3 × 5 + 0 × 6
16 = 2 × 5 + 1 × 6
17 = 1 × 5 + 2 × 6
18 = 0 × 5 + 3 × 6   using 3 stamps
19 is not possible
20 = 4 × 5 + 0 × 6
21 = 3 × 5 + 1 × 6
22 = 2 × 5 + 2 × 6
23 = 1 × 5 + 3 × 6
24 = 0 × 5 + 4 × 6   using 4 stamps
25 = 5 × 5 + 0 × 6
26 = 4 × 5 + 1 × 6
27 = 3 × 5 + 2 × 6
28 = 2 × 5 + 3 × 6
29 = 1 × 5 + 4 × 6
30 = 0 × 5 + 5 × 6   using 5 stamps
It appears that for any amount A ≥ 20, it can be realized through stamps
of only Rs. 5 and Rs. 6.
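The conjecture that every amount A ≥ 20 is realizable can be checked mechanically; a small sketch (the function name is mine):

```python
def representable(amount):
    # can `amount` be paid using stamps of Rs. 5 and Rs. 6 only?
    return any((amount - 6 * b) >= 0 and (amount - 6 * b) % 5 == 0
               for b in range(amount // 6 + 1))

# amounts >= 20 that cannot be realized (should be none)
gaps = [a for a in range(20, 200) if not representable(a)]
```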
Ex. 3) Algorithm A1 is more efficient than A2 for those values of n for which
4n5 + 3n = T1(n) ≤ T2(n) = 2500n3 + 4n,
i.e., (dividing throughout by n)
4n4 + 3 ≤ 2500n2 + 4,
i.e., 4n2 (n2 − 625) ≤ 1, which holds for n ≤ 25.
Next, consider n ≥ 26. Then
4n2 − 2500 ≥ 4(26)2 − 2500 = 2704 − 2500 = 204 > 1 > 1/(26)2 ≥ 1/n2,
so that 4n4 + 3 > 2500n2 + 4 for n ≥ 26. Hence A1 is more efficient than A2
precisely for 1 ≤ n ≤ 25.
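The derived range 1 ≤ n ≤ 25 can be confirmed by direct evaluation of the two complexity functions:

```python
def t1(n): return 4 * n**5 + 3 * n
def t2(n): return 2500 * n**3 + 4 * n

# sizes for which A1 is at least as fast as A2
efficient_range = [n for n in range(1, 100) if t1(n) <= t2(n)]
```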
Ex. 4) Each factor on the right-hand side is less than or equal to 1 for all values of
n. Hence, the right-hand side expression is always less than or equal to one.
Therefore, n!/nn ≤ 1,
or, n! ≤ nn.
Therefore, n! = O(nn).
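The inequality n! ≤ nn behind the conclusion is easy to confirm over a sample range:

```python
from math import factorial

# n! = 1 * 2 * ... * n has n factors, each at most n, so n! <= n^n
ok = all(factorial(n) <= n**n for n in range(1, 50))
```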
Ex. 5)
2.9 FURTHER READINGS
1. Discrete Mathematics and Its Applications (Fifth Edition), K.H. Rosen: Tata
McGraw-Hill (2003).
UNIT 3 BASICS OF ANALYSIS
Structure Page Nos.
3.0 Introduction 71
3.1 Objectives 72
3.2 Analysis of Algorithms ─ Simple Examples 72
3.3 Well Known Sorting Algorithms 75
3.3.1 Insertion Sort
3.3.2 Bubble Sort
3.3.3 Selection Sort
3.3.4 Shell Sort
3.3.5 Heap Sort
3.3.6 Divide and Conquer Technique
3.3.7 Merge Sort
3.3.8 Quick Sort
3.3.9 Comparison of Sorting Algorithms
3.4 Best-Case and Worst-Case Analyses 97
3.4.1 Various Analyses of Algorithms
3.4.2 Worst-Case Analysis
3.4.3 Best-Case Analysis
3.5 Analysis of Non-Recursive Control Structures 100
3.5.1 Sequencing
3.5.2 For Construct
3.5.3 While and Repeat Constructs
3.6 Recursive Constructs 105
3.7 Solving Recurrences 107
3.7.1 Method of Forward Substitution
3.7.2 Solving Linear Second-Order Recurrences with Constant Coefficients
3.8 Average-Case and Amortized Analyses 110
3.8.1 Average-Case Analysis
3.8.2 Amortized Analysis
3.9 Summary 114
3.10 Solutions/Answers 114
3.11 Further Readings 126
3.0 INTRODUCTION
It appears that the whole of the conditions which enable a finite machine to make calculations of unlimited extent are fulfilled in the Analytical Engine… I have converted the infinity of space, which was required by the conditions of the problem, into the infinity of time.
Charles Babbage
About provision of iteration & conditional branching in the design of his Analytical Engine (year 1834), the first general-purpose digital computer

Analysis of algorithms is an essential tool for making well-informed decisions in order
to choose the most suitable algorithm, out of the available ones, if any, for the
problem or application under consideration. For such a choice of an algorithm, which
is based on some efficiency measures relating to computing resources required by the
algorithm, there is no systematic method. To a large extent, it is a matter of judgment
and experience. However, there are some basic techniques and principles that help
and guide us in analysing algorithms. These techniques are mainly for
(i) analysing control structures, and
(ii) solving recurrence relations, which arise if the algorithm involves recursive
structures.
In this unit, we mainly discuss models, techniques and principles for analyzing
algorithms.
Also, sorting algorithms, which form good sources for learning how to design and
analyze algorithms, are discussed in detail in this unit.
3.1 OBJECTIVES
After going through this Unit, you should be able to:
• explain and use various types of analyses of algorithms;
• tell how we compute complexity of an algorithm from the complexities of the
basic instructions using the structuring rules;
• solve recurrence equations that arise in recursive algorithms, and
• explain and use any one of the several well-known algorithms discussed in the
text, for sorting a given array of numbers.
Computing Prefix Averages: For a given array A[1..n] of numbers, the problem is
concerned with finding an array B[1..n] such that
B[1] = A[1]
B[2] = average of first two entries = (A[1] + A[2])/2
B[3] = average of first 3 entries = (A[1] + A[2] + A[3])/3
and, in general, B[i] = average of the first i entries = (A[1] + A[2] + … + A[i])/i.
Next we discuss two algorithms that solve the problem; the second algorithm is
obtained by minor modifications of the first, but with major gains in
algorithmic complexity: the first is a quadratic algorithm, whereas the second
is linear. Each of the algorithms takes the array A[1..n] of numbers as input and
returns the array B[1..n] as discussed above.
First-Prefix-Averages (A[1..n])
begin {of algorithm}
for i ← 1 to n do
begin {of the first for-loop}
Sum ← 0;
for j ← 1 to i do
begin {of second for-loop}
Sum ← Sum + A[j];
end {of second for-loop}
B[i] ← Sum/i
end {of the first for-loop}
return B[1..n]
end {of algorithm}
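The two procedures can be sketched in Python (names mine); the first recomputes each prefix sum from scratch, the second carries a running sum forward:

```python
def first_prefix_averages(a):
    # quadratic: for every i, re-add a[0..i-1] from scratch
    b = []
    for i in range(1, len(a) + 1):
        total = 0
        for j in range(i):
            total += a[j]
        b.append(total / i)
    return b

def second_prefix_averages(a):
    # linear: carry the running sum forward instead of recomputing it
    b = []
    total = 0
    for i, x in enumerate(a, start=1):
        total += x
        b.append(total / i)
    return b
```

Both return the same array B; only the operation counts differ.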
Analysis of First-Prefix-Averages:
Step 1: The initialization step of setting up the array A[1..n] takes constant time, say
C1, in view of the fact that, for the purpose, only the address of A (or of A[1]) is to be
passed. Also, after all the values of B[1..n] are computed, returning the array
B[1..n] also takes constant time, say C2, again for the same reason.
Step 2: The body of the algorithm has two nested for-loops; the outer one, called the
first for-loop, is controlled by i and is executed n times. Hence the second for-loop,
along with its body, which forms a part of the first for-loop, is executed n times.
Further, for each value of i = 1, 2, …, n, each construct within the second for-loop,
controlled by j, is executed i times.
Thus, the first for-loop makes n additions (to reach (n+1)) and n comparisons with
(n+1).
Step 3: The second for-loop makes, for each value of i = 1, 2, …, n, i additions and
i comparisons. Thus, the total number of each of the additions and comparisons done
just for controlling the variable j
= (1 + 2 + … + n) = n(n+1)/2.
Step 4: Using the explanation of Steps 2 and 3, we count below the number of times the
various operations are executed.
(i) (From Step 1) Constant C1 for initialization of A[1..n] and constant C2 for
returning B[1..n].
(ii) (From Step 3)
Number of additions for control variable i = n.
Number of comparisons with (n+1) of variable i = n.
Number of additions for control variable j = n(n+1)/2.
Number of comparisons with (i+1) of control variable j = n(n+1)/2.
Number of initializations (Sum ← 0) = n.
Number of additions (Sum ← Sum + A[j]) = n(n+1)/2.
Number of assignments in (Sum ← Sum + A[j]) = n(n+1)/2.
Number of divisions (in Sum/i) = n.
Number of assignments in (B[i] ← Sum/i) = n.
Assuming each of the operations counted above takes some constant number of unit
operations, the total number of all operations is a quadratic function of n, the size
of the array A[1..n].
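The count n(n+1)/2 for the inner-loop operations can be reproduced by instrumenting the nested loops:

```python
def count_inner_operations(n):
    # one Sum <- Sum + A[j] (and one j-comparison) per inner iteration
    count = 0
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            count += 1
    return count
```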
Analysis of Second-Prefix-Averages
Step 1: As in First-Prefix-Averages, the initialization step of setting up the values of
A[1..n] and of returning the values of B[1..n] takes constant times, say C1 and C2
respectively (because in each case only the address of the first element of the array,
viz., A[1] or B[1], is to be passed).
Step 2: There are n additions for incrementing the values of the loop variable and n
comparisons with (n+1) in order to check whether for loop is to be terminated or not.
Step 3: There are n additions, one for each i (viz., Sum + A[i]), and n assignments,
again one for each i (Sum ← Sum + A[i]). Also there are n divisions, one for each i
(viz., Sum/i), and n (more) assignments, one for each i (viz., B[i] ← Sum/i).
Thus, in all, the algorithm requires:
(i) 2 n additions
(ii) n comparisons
(iii) (2n+1) assignments
(iv) n divisions
(v) C1 and C2, constants for initialization and return.
As each of the operations, viz., addition, comparison, assignment and division, takes a
constant number of units of time, the total time taken is C.n for some
constant C.
For the discussion on Sorting Algorithms, let us recall the concept of Ordered Set.
We know given two integers, say n1 and n2, we can always say whether n1 ≤ n2 or
n2 ≤ n1. Similarly, if we are given two rational numbers or real numbers, say n1 and
n2, then it is always possible to tell whether n1 ≤ n2 or n2 ≤ n1.
Ordered Set: Any set S with a relation, say, ≤, is said to be ordered if for any two
elements x and y of S, either x ≤ y or y ≤ x is true. Then, we may also say that
(S, ≤) is an ordered set.
Thus, if I, Q and R respectively denote set of integers, set of rational numbers and set
of real numbers, and if ‘≤’ denotes ‘the less than or equal to’ relation then, each of
(I, ≤), (Q, ≤) and (R, ≤) is an ordered set. However, it can be seen that the set
C = {x + iy : x, y ε R and i2 = ─ 1} of complex numbers is not ordered w.r.t ‘≤’. For
example, it is not possible to tell for at least one pair of complex numbers, say 3 + 4i
and 4+3i, whether 3 + 4i ≤ 4 +3i, or 4 + 3i ≤ 3 + 4i.
All the sorting algorithms discussed in this section are for sorting numbers in
increasing order.
Next, we discuss sorting algorithms, which form a rich source of examples of
algorithm design and analysis. Later, we will have occasions to discuss general
polynomial time algorithms, which of course include linear and quadratic algorithms.
One of the important applications for studying Sorting Algorithms is the area of
designing efficient algorithms for searching an item in a given list. If a set or a list is
already sorted, then we can have more efficient searching algorithms, which include
binary search and B-Tree based search algorithms, each taking (c. log (n)) time,
where n is the number of elements in the list/set to be searched.
Example 3.3.1.1
80
{Pick up the next number 32 from the list and place it at correct position
relative to 80, so that the sublist considered so far is sorted}.
32 80
{We may note, in respect of the above sorted sublist, that in order to insert 32
before 80, we have to shift 80 from the first position to the second and then insert 32
in the first position}.
32 80
{Next, number 31 is picked up and compared first with 80 and then (if required)
with 32. In order to insert 31 before 32 and 80, we have to shift 80 to the third
position and then 32 to the second position, and then 31 is placed in the first
position}.
31 32 80
{Next, 110 is picked up and compared with 80. As 110 > 80, there is no shifting
and there are no more comparisons. 110 is placed in the position following 80}.
31 32 80 110
{Next, number 50 is picked up. It is first compared with 110 and found less; next
compared with 80, again found less; then compared with 32 and found greater. The
correct position for 50 is between 32 and 80 in the sublist given above. Thus, each of
110 and 80 is shifted one place to the right to make space for 50, and then 50
is placed there}.
31 32 50 80 110
{Next in order to place 40 after 32 and before 50, each of the values 50, 80
and 110 need to be shifted one place to the right as explained above.
However, values 31 and 32 are not to be shifted. The process of inserting 40
at correct place is similar to the ones explained earlier}.
31 32 40 50 80 110
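The shifting-and-inserting process traced above is exactly Insertion Sort; a compact Python sketch:

```python
def insertion_sort(a):
    a = list(a)
    for j in range(1, len(a)):
        m, i = a[j], j - 1
        # shift the already-sorted elements greater than m one place right
        while i >= 0 and a[i] > m:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = m   # m lands at its correct relative position
    return a
```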
{to find out the correct relative position for A[j] and insert it there among
the already sorted elements A[1] to A[j − 1]}
{In order to find the correct relative position, we store A[j] in m and start with the
last element A[j−1] of the already sorted part. If m is less than A[j−1], then
we move towards the left and compare again with the new element of the array. The
process is repeated until either m ≥ some element of the array or we reach the
left-most element A[1]}.
{After finding the correct relative position, we move all the elements of the
array found to be greater than m = A[j], one place to the right so as to make
a vacancy at correct relative position for A[j]}
end; {of while loop}
A[i +1] ← m
{i.e., m = A[j] is stored at the correct relative position}
end {if}
end; {of for loop}
end; {of else part}
end; {of procedure}
Thus in the first pass after scanning once all the numbers in the given list, the largest
number will reach its destination, but other numbers in the array, may not be in order.
In each subsequent pass, one more number reaches its destination.
3.3.2.1 Example
In the following, in each line, pairs of adjacent numbers, shown in bold, are
compared. And if the pair of numbers are not found in proper order, then the
positions of these numbers are exchanged.
iteration number i = 2
31 32 81 50 40 (j = 2)
31 32 81 50 40 (j = 3)
31 32 50 81 40 (j = 4)
31 32 50 40 81 (j = 5)
↑
removed from further
consideration
In the second pass, the next-to-maximum element of the list, viz., 81, reaches the 5th
position from the left. In the next pass, the list of remaining (n − 2) = 4 elements is taken
into consideration.
iteration number i = 3
31 32 50 40 (j = 2)
31 32 50 40 (j = 3)
31 32 40 50 (j = 4)
↑
removed from further
consideration
31 32 40
31 32 40
31 32
These elements are compared and found in proper order. The process
terminates.
Note: As there is only one statement in the scope of each of the two for-loops,
therefore, no ‘begin’ and ‘end’ pair is used.
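A Python sketch of the passes traced above; after pass i, the i largest values occupy their final positions at the right:

```python
def bubble_sort(a):
    a = list(a)
    n = len(a)
    for i in range(n - 1):
        # compare adjacent pairs; the largest remaining value bubbles right
        for j in range(1, n - i):
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
    return a
```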
3.3.3 Selection Sort
Selection Sort for sorting a list L of n numbers, represented by an array A[1..n],
proceeds by finding the maximum element of the array and placing it in the last
position of the array representing the list. Then repeat the process on the subarray
representing the sublist obtained from the list by excluding the current maximum
element.
The difference between Bubble Sort and Selection Sort, is that in Selection Sort to find
the maximum number in the array, a new variable MAX is used to keep maximum of
all the values scanned upto a particular stage. On the other hand, in Bubble Sort, the
maximum number in the array under consideration is found by comparing adjacent
pairs of numbers and by keeping larger of the two in the position at the right. Thus
after scanning the whole array once, the maximum number reaches the right-most
position of the array under consideration.
Step 1: Create a variable MAX to store the maximum of the values scanned upto a
particular stage. Also create another variable say MAX-POS which keeps track of the
position of such maximum values.
Step 2: In each iteration, the whole list/array under consideration is scanned once to
find out the current maximum value through the variable MAX and to find out the
position of the current maximum through MAX-POS.
Step 3: At the end of an iteration, the value in the last position in the current array and
the (maximum) value in the position MAX-POS are exchanged.
Step 4: For further consideration, replace the list L by L \ {MAX} {and the array A
by the corresponding subarray} and go to Step 1.
Example 3.3.3.1:
80 32 31 40 50 110
80 32 31 40 50
50 32 31 40 80
50 32 31 40
40 32 31 50
40 32 31
Initially & finally
Max ← 40 ; MAX-POS ← 1
Therefore, entries 40 and 31 are exchanged to get
31 32 40
31 32
31
* As there is only one statement in the j-loop, we can omit 'begin' and 'end'.
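The MAX/MAX-POS scheme of Steps 1 to 4 translates directly into code; a sketch:

```python
def selection_sort(a):
    a = list(a)
    for last in range(len(a) - 1, 0, -1):
        # one scan of a[0..last] to find MAX (via its position MAX-POS)
        max_pos = 0
        for j in range(1, last + 1):
            if a[j] > a[max_pos]:
                max_pos = j
        # exchange the maximum with the value in the last position
        a[max_pos], a[last] = a[last], a[max_pos]
    return a
```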
3.3.4 Shell Sort
The sorting algorithm is named so in honour of D.L. Shell (1959), who suggested the
algorithm. Shell Sort is also called diminishing-increment sort. The essential idea
behind Shell Sort is to apply any of the other sorting algorithms (generally Insertion
Sort) to each of several interleaved sublists of the given list of numbers to be
sorted. In successive iterations, the sublists are formed by stepping through the file
with an increment INCi taken from some pre-defined decreasing sequence of step-
sizes INC1 > INC2 > … > INCi > … > 1, which must terminate in 1.
Example 3.3.4.2: Let the list of numbers to be sorted be represented by the next row.
13 3 4 12 14 10 5 1 8 2 7 9 11 6 (n = 14)
Taking the sublist of elements at 1st, 6th and 11th positions, viz., the sublist of values 13, 10 and 7, and sorting
these values, we get the sorted sublist
7 10 13
Taking the sublist of elements at 2nd, 7th and 12th positions, viz., the sublist of values 3, 5 and 9, after sorting
these values we get the sorted sublist
3 5 9
Taking the sublist of elements at 3rd, 8th and 13th positions, viz., the sublist of values 4, 1 and 11, after sorting
these values we get the sorted sublist
1 4 11
Similarly, from the elements at 4th, 9th and 14th positions we get the sorted sublist
6 8 12
and, from the elements at 5th and 10th positions, the sorted sublist
2 14
{Note that, in this case, the sublist has only two elements, because it is the 5th sublist and n = 14 is less than
(⌊14/INC⌋ · INC + 5), where INC = 5}
After merging or interleaving the entries from the sublists, while maintaining the
initial relative positions, we get the New List:
7 3 1 6 2 10 5 4 8 14 13 9 11 12
Next, take INC = 3 and repeat the process, we get sorted sublists:
5 6 7 11 14,
2 3 4 12 13 and
1 8 9 10
After merging the entries from the sublists, while maintaining the initial relative
positions, we get the New List:
New List
5 2 1 6 3 8 7 4 9 11 12 10 14 13
Taking INC = 1 and repeating the process, we get the sorted list
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Note: Sublists should not be chosen at distances which are multiples of each other,
e.g., 8, 4, 2, etc.; otherwise, the same elements may be compared again and
again.
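The whole example can be replayed in code. The sketch below uses insertion sort on each gap-interleaved sublist, with the decreasing increments 5, 3, 1 used above:

```python
def shell_sort(a, increments=(5, 3, 1)):
    # the increment sequence must be decreasing and terminate in 1
    a = list(a)
    for inc in increments:
        # insertion-sort each of the inc interleaved sublists in place
        for i in range(inc, len(a)):
            m, j = a[i], i
            while j >= inc and a[j - inc] > m:
                a[j] = a[j - inc]
                j -= inc
            a[j] = m
    return a
```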
else s ← r
Insertion-Sort (A [t… (t + s * j)])
end {of t-for-loop}
end {i for-loop}
end {of procedure}.
The following are all the distinct binary trees having three nodes.
Heap: is defined as a binary tree with keys assigned to its nodes (one key per node)
such that the following conditions are satisfied:
(i) The binary tree is essentially complete (or simply complete), i.e., all its levels
are full except possibly the last level, where only some rightmost leaves may be
missing.
(ii) The key at each node is greater than or equal to the keys at its children.
For example, the following binary tree is a Heap:
10
5 7
4 2
1
However, the following is not a heap because the value 6 in a child node is more
than the value 5 in the parent node.
10
5 7
6 2
1
Also, the following is not a heap, because some leaves (e.g., the right child of 5), in
between two other leaves (viz., 4 and 1), are missing.
10
5 7
4
1
Example 3.3.5.1:
Let us consider applying Heap Sort for the sorting of the list 80 32 31 110 50 40
120 represented by an array A[1…7]
First, the value 80 is placed in the root; next, the value 32 is attached as left child of
the root, giving the tree
80
32
As 32 < 80, therefore, heap property is satisfied. Hence, no modification of the tree.
Next, value 31 is attached as right child of the node 80, as shown below
80
32
31
Again as 31 < 80, heap property is not disturbed. Therefore, no modification of the
tree.
Next, the value 110 is attached as left child of the node 32, so that we get the tree
80
32 31
110
However, 110 > 32, the value in child node is more than the value in the parent node.
Hence the tree is modified by exchanging the values in the two nodes so that, we get
the following tree
80
110 31
32
Again as 110 > 80, the value in child node is more than the value in the parent node.
Hence the tree is modified by exchanging the values in the two nodes so that, we get
the following tree
110
80 31
32
This is a Heap.
Next, number 50 is attached as right child of 80 so that the new tree is as given below
110
80 31
32 50
As the tree satisfies all the conditions of a Heap, we insert the next number 40 as left
child of 31 to get the tree
110
80 31
32 50
40
As the new insertion violates the condition of Heap, the values 40 and 31 are
exchanged to get the tree which is a heap
110
80 40
32 50
31
Next, we insert the last value 120 as right child of 40 to get the tree
110
80 40
32 50
31 120
The last insertion violates the conditions for a Heap. Hence 40 and 120 are exchanged
to get the tree
110
80 120
32 50
31 40
Again, due to the movement of 120 upwards, the Heap property is disturbed at nodes
110 and 120. Again 120 is moved up to get the following tree, which is a Heap.
120
80 110
32 50
31 40
Steps 2 & 3: These consist of a sequence of actions, viz., (i) deleting the value of the root,
(ii) moving the last entry to the root, and (iii) then readjusting the Heap.
The root of a Heap is always the maximum of all the values in the nodes of the tree.
The value 120, currently in the root, is saved in the last location, B[n] (in our case
B[7]), of the array, say B[1..n], in which the values are to be stored after sorting in
increasing order.
Next, value 40 is moved to the root and the node containing 40 is removed from
further consideration, to get the following binary tree, which is not a Heap.
40
80 110
32 50
31
In order to restore the above tree as a Heap, the value 40 is exchanged with the maximum of the values of its two children. Thus 40 and 110 are exchanged to get the tree, which is a Heap:

         110
        /    \
      80      40
     /  \    /
   32    50 31
Again, 110 is copied to B[6], and 31, the last value of the tree, is shifted to the root; the last node is removed from further consideration, to get the following tree, which is not a Heap:

          31
        /    \
      80      40
     /  \
   32    50
Again, the root value is exchanged with the maximum of its children's values, i.e., with the value 80, to get the following tree, which again is not a Heap:

          80
        /    \
      31      40
     /  \
   32    50
Again, the value 31 is exchanged with the maximum of the values of its children, i.e., with 50, to get the tree, which is a Heap:

          80
        /    \
      50      40
     /  \
   32    31
Again, 80 is copied in B[5], and 31, the value of the last node, replaces 80 in the root; the last node is removed from further consideration, to get the tree, which is not a Heap:

          31
        /    \
      50      40
     /
   32
Again, 50, the maximum of the two children's values, is exchanged with the value of the root, 31, to get the tree, which is not a Heap:

          50
        /    \
      31      40
     /
   32

Again, 31 is exchanged with 32, the value of its only child, to get the tree, which is a Heap:

          50
        /    \
      32      40
     /
   31
Next, 50 is copied in B[4]. The entry 31 in the last node replaces the value in the root, and the last node is deleted, to get the following tree, which is not a Heap:

          31
        /    \
      32      40
Again, 40, the maximum of the values of the children, is exchanged with 31, the value in the root. We get the Heap:

          40
        /    \
      32      31
90
Again 40 is copied in B[3]. The value in the last node of the tree viz 31, replaces the Basics of Analysis
value in the root and the last node is removed from further consideration to get the
tree, which is not a Heap.
31
32
Again, 32, the value of its only child, is exchanged with the value of the root, to get the Heap:

          32
         /
       31
Next, 32 is copied in B[2], and 31, the value in the last node, is copied in the root; the last node is deleted, to get the single-node tree, which is trivially a Heap:

          31

This value is copied in B[1], and the Heap Sort algorithm terminates, with B = (31, 32, 40, 50, 80, 110, 120).
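Steps 2 and 3 traced above can likewise be sketched in Python (again an illustrative sketch; the list holds the heap in level-order, and the sorted values are collected in an array b playing the role of B[1..n]):

```python
def sift_down(h, end):
    # Restore the max-heap property from the root, within h[0:end].
    i = 0
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < end and h[left] > h[largest]:
            largest = left
        if right < end and h[right] > h[largest]:
            largest = right
        if largest == i:
            return
        h[i], h[largest] = h[largest], h[i]
        i = largest

def extract_sorted(h):
    # Repeatedly: save the root (the current maximum) in the last free
    # position of b, move the last entry to the root, shrink the heap
    # by one node, and readjust.
    b = [0] * len(h)
    for end in range(len(h) - 1, -1, -1):
        b[end] = h[0]
        h[0] = h[end]
        sift_down(h, end)
    return b

print(extract_sorted([120, 80, 110, 32, 50, 31, 40]))  # [31, 32, 40, 50, 80, 110, 120]
```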
The following procedure reads, one by one, the values from the given to-be-sorted array A[1..n] and gradually builds a Heap. For this purpose, it calls the procedure Build-Heap. For building the Heap, an array H[1..n] is used for storing the elements of the Heap. Once the Heap is built, the original contents of A[1..n] are no longer needed, and hence A itself may be used for storing the finally sorted list, for which we used the array B above. Then the following three steps are repeated n times (n being the number of elements in the array); in the ith iteration:
(i) The root element H[1] is copied in A[n ─ i+1] location of the given array A.
The first time, root element is stored in A[n]. The next time, root element is
stored in A[n ─1] and so on.
(ii) The last element of the array H[n ─ i + 1] is copied in the root of the Heap, i.e.,
in H[1] and H[n ─ i + 1] is removed from further consideration. In other words,
in the next iteration, only the array H[1..(n ─ i)] (which may not be a Heap) is
taken into consideration.
(iii) The procedure Build-Heap is called to build the array H[1..(n ─ i)] into a Heap.
Procedure Build-Heap
The following procedure takes an array B[1..m] of size m, which is to be sorted, and builds it into a Heap.
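The body of Build-Heap is not reproduced in this copy. A sketch of one standard way of building a Heap in place, assuming the usual bottom-up "sift-down" method (which may differ in detail from the unit's own procedure), is:

```python
def build_heap(b):
    # Bottom-up max-heap construction: sift down every internal node,
    # starting from the last internal node and moving towards the root.
    # (A sketch; the unit's procedure may instead build the heap by
    # successive insertions, as in the worked example.)
    n = len(b)
    for i in range(n // 2 - 1, -1, -1):
        j = i
        while True:
            left, right = 2 * j + 1, 2 * j + 2
            largest = j
            if left < n and b[left] > b[largest]:
                largest = left
            if right < n and b[right] > b[largest]:
                largest = right
            if largest == j:
                break
            b[j], b[largest] = b[largest], b[j]
            j = largest
    return b
```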
A sorting algorithm based on the 'Divide and Conquer' technique first divides the given list into two sublists, sorts each sublist (recursively, by the same technique), and then combines the results. There are two well-known 'Divide and Conquer' methods for sorting, viz., Merge Sort and Quick Sort.
Given List: 4 6 7 5 2 1 3
Chop the list to get two sublists, viz.,
((4, 6, 7, 5), (2, 1, 3)),
where the nested parentheses delimit the sublists.
Again chop each of the sublists to get two sublists for each, viz.,
(((4, 6), (7, 5)), ((2), (1, 3)))
Again repeating the chopping operation on each of the lists of size two or more
obtained in the previous round of chopping, we get lists of size 1 each viz 4 and 6, 7
and 5, 2, 1 and 3. In terms of our notations, we get
((((4), (6)), ((7), (5))), ((2), ((1), (3))))
At this stage, we start merging the sublists, in the reverse of the order in which the chopping was applied; during merging, the lists are sorted. Merging (after sorting) the single-element lists, we get sorted lists of at most two elements, viz.,
(((4, 6), (5, 7)), ((2), (1, 3)))
Merging two consecutive lists, each of at most two elements, we get the sorted lists
((4, 5, 6, 7), (1, 2, 3))
Finally, merging the two consecutive lists of at most 4 elements each, we get the sorted list
(1, 2, 3, 4, 5, 6, 7).
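The chop-and-merge process described above can be sketched in Python (an illustrative sketch; the split point for odd-length lists is a matter of convention, chosen here to match the 4-and-3 chop of the example):

```python
def merge(left, right):
    # Merge two sorted lists into one sorted list.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def merge_sort(lst):
    # Chop the list into two halves, sort each recursively, then merge.
    if len(lst) <= 1:
        return lst
    mid = (len(lst) + 1) // 2    # 7 elements chop as 4 and 3, as above
    return merge(merge_sort(lst[:mid]), merge_sort(lst[mid:]))

print(merge_sort([4, 6, 7, 5, 2, 1, 3]))  # [1, 2, 3, 4, 5, 6, 7]
```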
Ex. 6) Sort the following sequence of numbers, using Merge Sort:
15, 10, 13, 9, 12, 7
Further, find the number of comparisons and copy/assignment operations
required by the algorithm in sorting the list.
To partition the list, we first choose some value from the list for which, we hope,
about half the values will be less than the chosen value and the remaining values will
be more than the chosen value.
Division into sublists is done through the choice and use of a pivot value, which is a value in the given list chosen so that all values in the list less than the pivot are put in one sublist and the rest of the values in the other sublist. The process is applied recursively to the sublists till we get sublists of length one.
Remark 3.3.8.1:
The choice of pivot has a significant bearing on the efficiency of the Quick-Sort algorithm. Sometimes, the very first value is taken as the pivot.
However, the first value of a given list may be a poor choice, specially when the given list is already ordered or nearly ordered, because then one of the sublists may be empty. For example, for the list
7 6 4 3 2 1
the choice of the first value, 7, as the pivot yields an empty list of values greater than the pivot.
Even the choice of the middle value as the pivot may turn out to be a very poor choice; e.g., for the list
2 4 6 7 3 1 5
the choice of the middle value, viz., 7, is not good, because 7 is the maximum value of the list. Hence, with this choice, one of the sublists will be empty.
A better method for the choice of the pivot position is to use a random number generator to generate a number j between 1 and n for the position of the pivot, where n is the size of the list to be sorted. Some simpler methods take the median of a sample of values from positions 1 to n as the pivot. For example, the median of the values at the first, last and middle (or one of the middle, if n is even) positions may be taken as the pivot.
5 3 1 9 8 2 4 7    {← Given list; the first value, 5, is taken as the pivot. The index i moves from the left end and the index j moves from the right end.}
{The value 9 is the first value from the left that is greater than the pivot, viz., 5, and the value 4 is the first value from the right that is less than the pivot. These values are exchanged to get the following list:}
5 3 1 4 8 2 9 7
{Moving i toward the right and j toward the left: i stops if it reaches a value greater than the pivot, and j stops if it meets a value less than the pivot. Also, both stop if j ≤ i. Thus i stops at 8 and j stops at 2.}
{After exchanging the values 2 and 8, we get the list:}
5 3 1 4 2 8 9 7
{The next moves of i to the right and j to the left make j < i, and this indicates the completion of one iteration of movements of i and j.}
{At this stage, we exchange the pivot with the value at position j, i.e., 2 and 5 are exchanged, so that the pivot occupies almost the middle position, as shown below:}
2 3 1 4 5 8 9 7
{It may be noted that all values to the left of 5 are less than 5 and all values to the right of 5 are greater than 5. Then the two sublists, viz., (2, 3, 1, 4) and (8, 9, 7), are sorted independently:}
2 3 1 4    and    8 9 7
{In the left sublist (pivot 2), i stops at 3 and j stops at 1; in the right sublist (pivot 8), i stops at 9 and j stops at 7. The indicated pairs are exchanged to get:}
2 1 3 4    and    8 7 9
{In each sublist, i and j then cross, each pivot is exchanged with the value at its position j, and the process continues recursively.}
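The partitioning traced above can be sketched in Python (an illustrative sketch, taking the first element of each subarray as the pivot, as in the example):

```python
def partition(a, lo, hi):
    # The pivot is a[lo]; i scans right for a value greater than the
    # pivot, j scans left for a value less than it, and out-of-place
    # pairs are exchanged until i and j cross.
    pivot = a[lo]
    i, j = lo + 1, hi
    while True:
        while i <= hi and a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1
    a[lo], a[j] = a[j], a[lo]   # pivot moves to its final position j
    return j

def quick_sort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quick_sort(a, lo, p - 1)
        quick_sort(a, p + 1, hi)
    return a

print(quick_sort([5, 3, 1, 9, 8, 2, 4, 7]))  # [1, 2, 3, 4, 5, 7, 8, 9]
```

On the given list, the first call to partition returns position 4 (0-based) and leaves the array as 2 3 1 4 5 8 9 7, exactly as in the trace above.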
95
Introduction to Procedure Quick-Sort (A[min… max])
Algorithmics {min is the lower index and max the upper index of the array to be sorted using Quick Sort}
begin
if min < max then
p ← partition (A[min..max]);
{p is the position s.t for min ≤ i ≤ p ─ 1, A [i] ≤ A [p] and for all j ≥ p+1,
A[j] ≥ A [p]}
Quick-Sort (A [min .. p ─ 1]);
Quick-Sort (A[p+1 .. max]);
end;
In the above procedure, the procedure Partition is called. Next, we define the procedure Partition.

Procedure Partition (A[min..max])
{returns j, the final position of the pivot; the pivot p is taken as the first value, A[min]}
begin
p ← A[min]; i ← min + 1; j ← max;
while (i < j) do
  while (A[i] < p) do
    i ← i + 1
  while (A[j] > p) do
    j ← j ─ 1
  if i < j then
    exchange (A[i], A[j])
    {the exchange operation involves three assignments, viz., temp ← A[i]; A[i] ← A[j] and A[j] ← temp, where temp is a new variable}
end; {of while loop}
exchange (A[min], A[j]);
return j
{the index j is s.t., in the next iteration, sorting is to be done for the two subarrays A[min..j ─ 1] and A[j + 1..max]}
The following table gives the time-complexity requirements, in terms of the size n of the list to be sorted, for the two types of actions involved in executing the algorithms. Unless mentioned otherwise, the complexities are for the average-case behaviour of the algorithms:

Name of the     Comparisons of keys                 Assignments
algorithm
Bubble Sort     (1/2)(n² ─ n log n)  (average)      (1/4)(n² ─ n)  (average)
                (1/2)(n² ─ n)        (worst)        (1/2)(n² ─ n)  (worst)
Merge Sort is good for linked lists but poor for contiguous lists. On the other hand,
Quick Sort is good for contiguous lists but is poor for linked lists.
In the context of the complexities of sorting algorithms, we state below an important theorem without proof.
Again, for this purpose, one of the two lists is TS[ 1..n] in which the elements are
already sorted in increasing order, and the other is TR[1..n] in which elements are in
reverse-sorted order, i.e., elements in the list are in decreasing order.
The algorithm Insertion-Sort for the array TS[1..n] already sorted in increasing order,
does not make any displacement of entries (i.e., values) for making room for out-of-
order entries, because, there are no out-of-order entries in the array.
In order to find the complexity of Insertion Sort for the array TS[1..n], let us consider a specific case of sorting: the already sorted list {1, 3, 5, 7, 9}. Initially, A[1] = 1 remains at its place. For the next element, A[2] = 3, the following operations are performed by Insertion Sort:
(i) j ← 2
(ii) i ← j ─ 1
(iii) m ← A[j]
(iv) A[i + 1] ← m
(v) the single comparison m < A[i] in the while loop (which fails, as the list is already sorted).
97
Introduction to We can see that these are the minimum operation performed by Insertion Sort irrective
Algorithmics of the value of A[2]. Further, had A[2] been less than A[1], then more operations
would have been performed, as we shall see in the next example.
To conclude: if A[2] ≥ A[1] (as is the case here, because A[2] = 3 and A[1] = 1), then Insertion Sort performs 4 assignments and 1 comparison. We can see that, in general, if A[l + 1] ≥ A[l], then we require (exactly) 4 additional assignments and 1 comparison to place the value A[l + 1] in its correct position, viz., the (l + 1)th.
Further, we notice, that these are the minimum operations required to sort a list
of n elements by Insertion Sort.
Hence in the case of array TS[1..n], the Insertion-Sort algorithm takes linear time for
an already sorted array of n elements. In other words, Insertion-Sort has linear
Best-Case complexity for TS[1..n].
Next, we discuss the number of operations required by Insertion Sort for sorting
TR[1..n] which is sorted in reverse order. For this purpose, let us consider the sorting
of the list {9, 7, 5, 3, 1} stored as A[1] = 9, A[2] = 7, A[3] = 5 A[4] = 3 and A[5] = 1.
Let m denote the variable in which the value to be compared with other values of the
array, is stored. As discussed above in the case of already properly sorted list, 4
assignments and one comparison, of comparing A[k +1] with A[k], is essentially
required to start the process of putting each new entry A[k+1] after having already
sorted the list A[1] to A[k]. However, as 7 = A[2] = m < A[1] = 9, A[1] is copied to A[2] and m is copied to A[1]. Thus Insertion Sort requires one more assignment (viz., A[2] ← A[1]).
At this stage A[1] = 7 and A[2] = 9. It is easily seen that for a list of two elements, to
be sorted in proper order using Insertion-Sort, at most one comparison and
5 (= 4 +1) assignments are required.
Next, m ← A[3] = 5, and m is first compared with A[2] = 9 and as m < A[2], A[2] is
copied to A[3]. So, at this stage both A[2] and A[3] equal 9.
Next, m = 5 is compared with A[1] = 7 and as m < A[1] therefore A[1] is copied to
A[2], so that, at this stage, both A[1] and A[2] contain 7. Next 5 in m is copied to
A[1]. Thus, at this stage, A[1] = 5, A[2] = 7 and A[3] = 9. And, during this last round, we made two additional comparisons and two additional assignments (viz., A[3] ← A[2] and A[2] ← A[1]), and hence, in total, (4 + 2) assignments were made.
In general, for a list of n elements sorted in reverse order, the Insertion Sort algorithm makes

n(n ─ 1)/2 = (1/2)(n² ─ n) comparisons, and 4n + (1/2)(n² + n) = (1/2)(n² + 9n) assignments.
Again, it can be easily seen that, for a list of n elements, the Insertion-Sort algorithm should make at most (1/2)(n² ─ n) comparisons and (1/2)(n² + 9n) assignments.
Thus, the total number of operations
= (1/2)(n² ─ n) + ((1/2)(n² + n) + 4n) = n² + 4n.
If we assume that the time required for a comparison is a constant multiple of the time taken by an assignment, then the time-complexity of Insertion-Sort, in the case of a reverse-sorted list, is a quadratic in n.
When actually implemented for n = 5000, the Insertion-Sort algorithm took 1000
times more time for TR[1..n], the array which is sorted in reverse order than for
TS[1..n], the array which is already sorted in the required order, to sort both the
arrays in increasing order.
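The operation counts discussed above can be checked experimentally with a small Python sketch (the exact convention for counting assignments is a choice; here comparisons are those made in the while loop, which gives n(n ─ 1)/2 for a reverse-sorted list and n ─ 1 for an already sorted one):

```python
def insertion_sort_counting(a):
    # Sorts a copy of a; returns (sorted list, comparisons, assignments).
    # Assignments counted: m <- A[j], each shift A[i+1] <- A[i], and the
    # final copy A[i+1] <- m, mirroring the unit's accounting.
    a = list(a)
    comparisons = assignments = 0
    for j in range(1, len(a)):
        m = a[j]; assignments += 1
        i = j - 1
        while i >= 0:
            comparisons += 1
            if m < a[i]:
                a[i + 1] = a[i]; assignments += 1
                i -= 1
            else:
                break
        a[i + 1] = m; assignments += 1
    return a, comparisons, assignments

n = 6
_, comp_worst, _ = insertion_sort_counting(list(range(n, 0, -1)))  # reverse-sorted
_, comp_best, _ = insertion_sort_counting(list(range(1, n + 1)))   # already sorted
print(comp_worst, comp_best)  # 15 and 5, i.e., n(n-1)/2 and n-1 for n = 6
```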
We discuss worst-case analysis and best-case analysis in this section; the other two analyses are discussed in Section 3.8.
In other words, the worst-case time complexity of the Insertion-Sort algorithm is a quadratic polynomial in the size of the problem instance.
Best-case analysis of an algorithm is concerned with
(i) finding the instances (types) for which the algorithm runs fastest, and then
(ii) finding the running time of the algorithm for such instances (types).
If C-best (n) denotes the best-case complexity for instances of size n, then by
definition, it is guaranteed that no instance of size n, of the problem, shall take less
time than
C-best (n).
Thus the best time complexity of Insertion Sort algorithm is a linear polynomial in the
size of the problem instance.
3.5.1 Sequencing
Let F1 and F2 be two program fragments, with t1 and t2 respectively the times required for executing F1 and F2. Let the program fragment F1 ; F2 be obtained by sequencing the two given program fragments, i.e., by writing F1 followed by F2.
Then sequencing rule states that the time required for the program fragment
F1 ; F2 is t1 + t2.
Word of Caution: The sequencing rule, mentioned above, is valid only under the
assumption that no instruction in fragment F2 depends on any instruction in Fragment
F1. Otherwise, instead of t1 + t2, the time required for executing the fragment F1 ; F2
may be some more complex function of t1 and t2, depending upon the type of
dependency of instruction(s) of F2 on instructions of F1. Next, we consider the various
iterative or looping structures, starting with “For” loops.
Example 3.5.2.1: The following program fragment may be used for computing the sum of the first n natural numbers (where sum is assumed to be initialized to 0):
for i = 1 to n do
  sum = sum + i
end {for}
The example above shows that the instruction sum = sum + i depends upon the loop
variable ‘i’. Thus, if we write
for i= 1 to n do
P (i),
end {for}
where i in P(i) indicates that the program fragment P depends on the loop variable i.
Example 3.5.2.2: The following program fragment may be used to find the sum of n
numbers, each of which is to be supplied by the user:
for i = 1 to n do
read (x);
sum = sum + x;
end {for}.
In the latter example, the program fragment P, consisting of the two instructions viz., read (x) and sum = sum + x, does not involve the loop variable i. But still, there is nothing wrong if we write P as P(i). This is in view of the fact that a function f of a variable x, given by
f(x) = x²,
may also be considered as a function of the two variables x and y, because
f(x, y) = x² + 0·y.
Remark 3.5.2.3:
The for loop
for i = 1 to n do
P(i) ;
end for,
is actually a shorthand for the following program fragment
i=1
while i ≤ n do
P(i) ;
i =i + 1
end while ;
Remark 3.5.2.4:
The case when n = 0 in the loop for i = 1 to n do P (i) would not be treated as an error.
The case n = 0 shall be interpreted that P(i) is not executed even once.
Let us now calculate the time required for executing the loop
for i = 1 to n do
P(i)
end for
For this purpose, we use the expanded (while-loop) definition considered under Remark 3.5.2.3 and, in addition, the following notations: a, c, s and t denote, respectively, the times required for one assignment, one comparison, the sequencing of two instructions, and one execution of P(i); and fl denotes the time required by the 'for' loop. Then
fl = a              (for the initial assignment i = 1)
  + (n + 1) c       (for the (n + 1) tests of i ≤ n)
  + n t             (for the n executions of P(i))
  + n a             (for the n executions of the increment i = i + 1)
  + n s             (for the n sequencings involved)
i.e.,
fl = (n + 1) a + (n + 1) c + n s + n t
Now, if t, the time for one execution of P(i), is much larger than each of
(a) a, the time for making an assignment,
(b) c, the time for making a comparison, and
(c) s, the time for sequencing two instructions, one after the other,
then fl, the time taken by the 'for' loop, is approximately n t, i.e., fl ≈ n t.
However, if n = 0 or n is negative, the approximation fl ≈ n t is completely wrong. Because, in view of Remark 3.5.2.4 above, at least the assignment i = 1 is executed, and the test i ≤ n is also executed at least once; hence fl ≤ 0 or fl ≈ 0 cannot be true.
One of the frequently used techniques for analysing while/repeat loops, is to define a
function say f of the involved loop variables, in such a way that
(i) the value of the function f is an integer that decreases in successive iterations of
the loop.
(ii) the value of f remains non-negative throughout the successive executions of the
loop, and as a consequence.
(iii) the value of f reaches some minimum non-negative value, when the loop is to
terminate, of course, only if the loop under consideration is a terminating loop.
Once, such a function f, if it exists, is found, the analysis of the while/repeat loop gets
simplified and can be accomplished just by close examination of the sequence of
successive values of f.
We illustrate the technique for computing the time complexity of a while loop through an example given below. The analysis of a repeat loop can be handled on similar lines.
Example 3.5.3.1:
Let us analyze the following Bin-Search algorithm that finds the location of a value v
in an already sorted array A[1..n], where it is given that v occurs in A[ 1.. n ]
The Bin-Search algorithm, as defined below, is, in its rough version, intuitively applied by us in finding
the meaning of a given word in a dictionary or in finding out the telephone number of a person from the
telephone directory, where the name of the person is given. In the case of dictionary search, if the word
to be searched is say CARTOON, then in view of the fact that the word starts with letter C which is near
the beginning of the sequence of the letters in the alphabet set, we open the pages in the dictionary near
the beginning. However, if we are looking for the meaning of the word REDUNDANT, then, as R is the 18th letter out of the 26 letters of the English alphabet, we generally open to pages after the middle of the dictionary.
However, in the case of the search for a known value v in a given sorted array, the values of the array are not known to us. Hence we do not know the relative position of v. This is why we find the value at the middle position, [(1 + n)/2], of the given sorted array A[1..n] (where [x] denotes the integral part of x), and then compare v with the value A[[(1 + n)/2]]. These cases arise:
(i) If v = A[[(n + 1)/2]], then the search is successful and we stop. Else,
(ii) if v < A[[(1 + n)/2]], then we search only the part A[1..([(1 + n)/2] ─ 1)] of the given array. Similarly,
(iii) if v > A[[(1 + n)/2]], then we search only the part A[([(1 + n)/2] + 1)..n] of the array.
And we repeat the process. The explanation for searching v in a sorted array is formalized below as the function Bin-Search:

Function Bin-Search (A[1..n], v)
{finds the location of the value v in the array A[1..n], sorted in increasing order, where it is given that v occurs in A[1..n]}
begin
i = 1 ; j = n
while (i ≤ j) do
  k = [(i + j) ÷ 2]
  case
    v < A[k]: j = k ─ 1
    v = A[k]: return k
    v > A[k]: i = k + 1
  end case
end while
return i
end function;
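The function Bin-Search may be sketched in Python as follows (an illustrative sketch using 0-based indexing, so the returned position is one less than in the 1-based pseudocode):

```python
def bin_search(a, v):
    # a is sorted in increasing order; v is assumed to occur in a.
    i, j = 0, len(a) - 1
    while i <= j:
        k = (i + j) // 2        # middle position (integral part)
        if v < a[k]:
            j = k - 1           # search the left part
        elif v > a[k]:
            i = k + 1           # search the right part
        else:
            return k            # position of v
    return i

pos = bin_search([1, 4, 7, 9, 11, 15, 18, 21, 23, 24, 27, 30], 11)
print(pos)  # 4, i.e., position 5 in 1-based terms
```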
For an illustration, consider the sorted array
1 4 7 9 11 15 18 21 23 24 27 30
of 12 elements, and let v = 11 be the value to be searched. Initially i = 1 and j = 12, so that
k = [(1 + 12) ÷ 2] = [13 ÷ 2] = 6, and A[6] = 15.
As v = 11 < 15 = A[6], we get
j = 6 ─ 1 = 5, i = 1 (unchanged),
hence k = [(1 + 5) ÷ 2] = 3, and A[3] = 7.
As v = 11 > 7 = A[3], we get
i = 3 + 1 = 4, j = 5 (unchanged), hence k = [(4 + 5) ÷ 2] = 4, and A[4] = 9. As v = 11 > 9 = A[4], we get i = 4 + 1 = 5, j = 5, so that
k = [(5 + 5) ÷ 2] = 5,
and A[k] = A[5] = 11 = v, and the search terminates successfully.
We have seen, through the above illustration and also from the definition, that the array which needs to be searched in any iteration is A[i..j], which has (j ─ i + 1) elements. Let f denote this number, i.e., f = j ─ i + 1.
Also if fold, jold and iold are the values of respectively f, j and i before an iteration of the
loop and fnew , jnew and inew the new values immediately after the iteration, then for
each of the three cases v < A [ k ], v = A [ k ] and v > A [ k ] we can show that
fnew ≤ fold/2
Just for explanation, let us consider the case v < A[k], as follows:
for v < A[k], the instruction j = k ─ 1 is executed, and hence
inew = iold and jnew = [(iold + jold) ÷ 2] ─ 1, so that
fnew = jnew ─ inew + 1 = [(iold + jold) ÷ 2] ─ iold ≤ (jold ─ iold + 1)/2 = fold/2.
Initially f = n, and the loop terminates by the time f reaches 1. If t denotes the number of iterations of the loop then, as f is at least halved in each iteration, in the extreme case
1 = n/2^t, or n = 2^t,
i.e., t = log₂ n.
Analysis of the 'repeat' construct can be carried out on similar lines.
Example 3.6.1:
Function factorial (n)
{computes n!, the factorial of n recursively
where n is a non-negative integer}
begin
if n = 0 return 1
else return (n * factorial (n─ 1))
end factorial
We take n, the input, as the size of an instance of the problem of computing the factorial of n. From the above (recursive) algorithm, it is clear that multiplication is its basic operation. Let M(n) denote the number of multiplications required by the algorithm in computing factorial (n). The algorithm uses the formula
factorial (n) = n * factorial (n ─ 1) for n ≥ 1.
Therefore, the algorithm uses one extra multiplication for computing factorial (n) than for computing factorial (n ─ 1). Therefore,
M(n) = M(n ─ 1) + 1 for n ≥ 1. (3.6.1)
Also, for computing factorial (0), no multiplication is required, as, we are given
factorial (0) = 1. Hence
M(0) = 0 ( 3.6.2)
The Equation (3.6.1) does not define M(n) explicitly, but defines it implicitly through M(n ─ 1). Such an equation is called a recurrence relation/equation. The equation (3.6.2) is called an initial condition. The equations (3.6.1) and (3.6.2) together form a system of recurrences.
We shall discuss in the next section, in some detail, how to solve system of
recurrences. Briefly, we illustrate a method of solving such systems, called Method of
Backward Substitution, through solving the above system of recurrences viz.
M(n) = M(n ─ 1) + 1
     = [M(n ─ 2) + 1] + 1 = M(n ─ 2) + 2
     = [M(n ─ 3) + 1] + 2 = M(n ─ 3) + 3
     …
     = M(n ─ i) + i
     …
     = M(0) + n.
Using (3.6.2), we get
M(n) = 0 + n = n.
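The conclusion M(n) = n can be checked by instrumenting the recursive algorithm to count its multiplications (an illustrative sketch):

```python
def factorial_with_count(n):
    # Returns (n!, number of multiplications used), mirroring the
    # system of recurrences M(n) = M(n - 1) + 1, M(0) = 0.
    if n == 0:
        return 1, 0
    f, m = factorial_with_count(n - 1)
    return n * f, m + 1

for n in range(6):
    value, mults = factorial_with_count(n)
    print(n, value, mults)   # mults equals n, as the closed form predicts
```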
Example 3.7.1.1:
Consider the recurrence
F(n) = 2 F(n ─ 1) + 1 for n ≥ 2, with F(1) = 1.
The first few terms of the sequence < F(n) > are as given below:
F(1) = 1
F(2) = 2 * F(1) + 1 = 2 × 1 + 1 = 3
F(3) = 2 * F(2) + 1 = 2 × 3 + 1 = 7
F(4) = 2 * F(3) + 1 = 2 × 7 + 1 = 15
These values suggest the guess
F(n) = 2^n ─ 1 for n = 1, 2, 3, 4.
We verify, by induction, that the formula holds for all n ≥ 1:
(i) F(1) = 2¹ ─ 1 = 1, so the formula holds for n = 1;
(ii) assume that, for some k ≥ 1, F(k) = 2^k ─ 1;
(iii) we then show that F(k + 1) = 2^(k+1) ─ 1.
For showing F(k + 1) = 2^(k+1) ─ 1, consider, by definition,
F(k + 1) = 2 F(k + 1 ─ 1) + 1
         = 2 F(k) + 1
         = 2 (2^k ─ 1) + 1 (by Step (ii))
         = 2^(k+1) ─ 2 + 1
         = 2^(k+1) ─ 1.
Therefore, by the Principle of Mathematical Induction, our guess that F(n) = 2^n ─ 1 for all n ≥ 1 is mathematically correct.
Recurrences of the form
a F(n) + b F(n ─ 1) + c F(n ─ 2) = g(n), (3.7.2.1)
where a, b, c are real numbers and a ≠ 0, are called linear second-order recurrences with constant coefficients. Further, if g(n) = 0, then the recurrence is called homogeneous; otherwise, it is called inhomogeneous. Such systems of recurrences can be solved by neither the backward substitution method nor the forward substitution method.
First we consider only the homogeneous case, i.e., when g(n) = 0. The recurrence becomes
a F(n) + b F(n ─ 1) + c F(n ─ 2) = 0. (3.7.2.2)
The above equation has infinitely many solutions, except in the case when both b = 0 and c = 0.
Theorem 3.7.2.1:
Let x1 and x2 be the two roots of the characteristic equation
a x² + b x + c = 0. Then
Case I: If x1 and x2 are real and distinct, then the solutions of (3.7.2.2) are given by
F(n) = α x1^n + β x2^n, (3.7.2.4)
where α and β are two arbitrary real constants.
Case II: If the roots x1 and x2 are real but x1 = x2, then the solutions of (3.7.2.2) are given by
F(n) = α x1^n + β n x1^n, (3.7.2.5)
where α and β are two arbitrary real constants.
Case III: If x1 and x2 are complex conjugates given by u ± iv, where u and v are real numbers, then the solutions of (3.7.2.2) are given by
F(n) = r^n (α cos nθ + β sin nθ),
where r = √(u² + v²), θ = tan⁻¹(v/u), and α and β are two arbitrary real constants.
Example 3.7.2.2:
Consider the recurrence
F(n) ─ 4 F(n ─ 1) + 4 F(n ─ 2) = 0. (3.7.2.7)
Its characteristic equation is
x² ─ 4x + 4 = 0,
which has the repeated real roots x1 = x2 = 2. Hence, by Case II of Theorem 3.7.2.1, the solutions are given by
F(n) = α 2^n + β n 2^n.
Theorem 3.7.2.3:
The general solution of the inhomogeneous recurrence
a F(n) + b F(n ─ 1) + c F(n ─ 2) = g(n) (3.7.2.8)
can be obtained as the sum of the general solution of the homogeneous equation
a F(n) + b F(n ─ 1) + c F(n ─ 2) = 0
and a particular solution of (3.7.2.8).
The method of finding a particular solution of (3.7.2.8), and then a general solution of the inhomogeneous recurrence, is explained through the following examples.
Example 3.7.2.4:
Consider the recurrence
F(n) ─ 4 F(n ─ 1) + 4 F(n ─ 2) = 3.
If F(n) = c is a particular solution of the recurrence, then replacing F(n), F(n─1) and
F(n─2) by c in the recurrence given above, we get
c ─ 4c + 4c = 3
i.e., c=3
Also, the general solution of the associated homogeneous recurrence, viz.,
F(n) ─ 4 F(n ─ 1) + 4 F(n ─ 2) = 0,
is (as obtained in Example 3.7.2.2) α 2^n + β n 2^n.
Hence the general solution of the given recurrence is given by
F(n) = α 2^n + β n 2^n + 3,
where α and β are arbitrary real constants.
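As a quick numerical check (with arbitrarily chosen values of α and β, which is an assumption for illustration only), the closed form can be verified against the recurrence:

```python
def f(n, alpha=1.5, beta=-0.5):
    # General solution F(n) = alpha*2^n + beta*n*2^n + 3 of the
    # recurrence F(n) - 4 F(n-1) + 4 F(n-2) = 3.
    return alpha * 2**n + beta * n * 2**n + 3

# The homogeneous part cancels and the constant part gives 3 - 12 + 12 = 3,
# so every choice of alpha and beta satisfies the recurrence.
for n in range(2, 10):
    assert f(n) - 4 * f(n - 1) + 4 * f(n - 2) == 3
print("closed form satisfies the recurrence")
```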
Also, the last term, n · (1 ─ p), is the contribution of the case in which the while-loop is executed n times, after which it is found that A[i] ≠ K for i = 1, 2, …, n.
Cavg(n) = (p/n) [1 + 2 + … + i + … + n] + n (1 ─ p)
        = (p/n) · (n (n + 1)/2) + n (1 ─ p)
        = p (n + 1)/2 + n (1 ─ p)
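The formula for Cavg(n) can be checked empirically with a small simulation (an illustrative sketch; the key is present with probability p and, when present, is assumed equally likely to be at any of the n positions):

```python
import random

def sequential_search_comparisons(a, key):
    # Number of array entries examined before the search stops.
    for count, x in enumerate(a, start=1):
        if x == key:
            return count
    return len(a)

random.seed(1)
n, p, trials = 50, 0.6, 100000
a = list(range(n))                    # the array being searched
total = 0
for _ in range(trials):
    if random.random() < p:
        key = random.randrange(n)     # present key, uniform position
    else:
        key = -1                      # absent key: full scan of n entries
    total += sequential_search_comparisons(a, key)
avg = total / trials
predicted = p * (n + 1) / 2 + n * (1 - p)
print(round(avg, 1), round(predicted, 1))
```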
As can be seen from the above discussion, the average-case analysis is more difficult
than the best-case and worst-case analyses.
Through the above example we have obtained an idea of how we may proceed to find
average-case complexity of an algorithm. Next we outline the process of finding
average-case complexity of any algorithm, as follows:
(i) First, categorize all possible input instances into classes in such a way that inputs in the same class require, or are expected to require, the execution of the same number of the basic operation(s) of the algorithm.
(ii) Next, the probability distribution of the inputs over the different classes is obtained empirically or assumed on some theoretical grounds.
(iii) Using the process as discussed in the case of Sequential-Search above, we
compute the average-case complexity of the algorithm.
It is worth mentioning explicitly that average-case complexity need not be the average
of the worst-case complexity and best-case complexity of an algorithm, though in
some cases, the two may coincide.
Further, the effort required for computing the average-case complexity is worthwhile in view of its contribution, in the sense that, in some cases, the average-case complexity is much better than the worst-case complexity. For example, in the case of the Quicksort
algorithm, which we study later, for sorting an array of elements, the Worst-Case
complexity is a quadratic function whereas the average-case complexity is bounded by
some constant multiple of n log n. For large values of n, a constant multiple of
(n log n) is much smaller than a quadratic function of n. Thus, without average-case
analysis, we may miss many on-the-average good algorithms.
Another important fact that needs our attention is the fact that most of the operations,
including the most time-consuming operations, on a data structure (used for solving a
problem) do not occur in isolation, but different operations, with different time
complexities, occur as a part of a sequence of operations. Occurrences of a
particular operation in a sequence of operations are dependent on the occurrences of
other operations in the sequence. As a consequence, it may happen that the most time-consuming operation occurs only rarely, or that the operation only rarely consumes its theoretically determined maximum time. We will support our claim later through
an example. But, we continue with our argument in support of the need for another
type of analysis, viz., amortized analysis, for better evaluation of the behaviour of an
algorithm. However, this fact of dependence of both the occurrences and complexity
of an operation on the occurrences of other operations, is not taken into consideration
in the earlier mentioned analyses. As a consequence, the complexity of an algorithm
is generally over-evaluated.
Next, we give an example in support of our claim that the frequencies of occurrences
of operations and their complexities, are generally interdependent and hence the
impact of the most time-consuming operation may not be as bad as it is assumed or
appears to be.
Example 3.8.2.1:
We define a new data structure say MSTACK, which like the data structure STACK
has the usual operations of PUSH and POP. In addition there is an operation
MPOP(S, k), where S is a given stack and k is a non-negative integer. Then
MPOP(S, k) removes top k elements of the stack S, if S has at least k elements in the
stack. Otherwise it removes all the elements of S. MPOP (S, k) may be formally
defined as
For example, if, at some stage, the stack S has the elements
35 40 27 18 6 11
with TOP = 35 (at the left end) and BOTTOM = 11 (at the right end), then after MPOP (S, 4) we have
6 11
with TOP = 6 and BOTTOM = 11.
(i) The cost of each PUSH and POP is assumed to be 1, and if m ≥ 0 is the number of elements in the stack S when an MPOP (S, k) is issued, then
Cost (MPOP(S, k)) = k, if k ≤ m
                  = m, otherwise.
(ii) If we start with an empty stack S, then at any stage, the number of elements
that can be POPed off the stack either through a POP or MPOP, can not
exceed the total number of preceding PUSHes.
The above statement can be further strengthened as follows:
(ii a) If we start with an empty stack S, then, at any stage, the number of elements that can be popped off the stack through all the POPs and MPOPs cannot exceed the number of all the earlier PUSHes.
For example, if Si denotes the ith PUSH and Mj denotes the jth POP/MPOP, and if we have a sequence of PUSH/POP/MPOP operations as (say)
S1 S2 S3 M1 S4 S5 M2 S6 S7 M3 S8 S9 S10 S11 M4
then
Cost (M1) + Cost (M2) + … + Cost (Mt) ≤ sum of the costs of all the PUSHes = i1 + i2 + … + it ≤ n (from (3.8.2.3))
Thus, we observe that operations are not considered in isolation but as a part of a
sequence of operations, and, because of interactions among the operations, highly-
costly operations may either not be attained or may be distributed over the less costly
operations.
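The MSTACK example can be sketched in Python to make the cost accounting concrete (names such as MStack and total_cost are illustrative, not from the unit):

```python
class MStack:
    # A STACK with an extra MPOP(k): pop min(k, size) elements.
    def __init__(self):
        self.items = []
        self.total_cost = 0

    def push(self, x):
        self.items.append(x)
        self.total_cost += 1        # each PUSH costs 1

    def mpop(self, k):
        removed = min(k, len(self.items))
        del self.items[len(self.items) - removed:]
        self.total_cost += removed  # cost is k if k <= m, else m

s = MStack()
for i in range(10):
    s.push(i)           # 10 pushes, total cost 10
s.mpop(4)               # removes 4 elements, cost 4
s.mpop(100)             # only 6 elements remain, so cost is 6, not 100
print(s.total_cost)     # 10 + 4 + 6 = 20
```

Since each element is pushed once and popped at most once, the total cost of any such sequence is bounded by twice the number of PUSHes, which is the essence of the amortized argument above.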
The above discussion motivates the concept and study of AMORTIZED ANALYSIS.
3.9 SUMMARY
In this unit, the emphasis is on the analysis of algorithms, though in the process, we
have also defined a number of algorithms. Analysis of an algorithm generally leads to
computational complexity/efficiency of the algorithm.
It is shown that general analysis of algorithms may not be satisfactory for all types of
situations and problems, specially, in view of the fact that the same algorithm may
have vastly differing complexities for different instances, though, of the same size.
This leads to the discussion of worst-case and best-case analyses in Section 3.4 and of
average-case analysis and amortized analysis in Section 3.8.
In Section 3.2, we discuss two simple examples of algorithms for solving the same
problem, to illustrate some simple aspects of design and analysis of algorithms.
Specially, it is shown here, how a minor modification in an algorithm may lead to
major efficiency gain.
In Section 3.3, the following sorting algorithms are defined and illustrated with
suitable examples:
(i) Insertion Sort
(ii) Bubble Sort
(iii) Selection Sort
(iv) Shell Sort
(v) Heap Sort
(vi) Merge Sort
(vii) Quick Sort
Though these algorithms are not analysed here, a summary of the complexities of these algorithms is also included in this section.
Section 3.5 deals with the analysis of non-recursive control structures, and Section 3.6 deals with the analysis of recursive control structures.
3.10 SOLUTIONS/ANSWERS
Ex. 1) List to be sorted: 15, 10, 13, 9, 12, 17 by Insertion Sort.
Iteration I: For placing A[2] at its correct relative position w.r.t. A[1] in the finally sorted array, we need the following operations:
(i) As A[2] = 10 < 15 = A [1], therefore, we need following
additional operations
(ii) 10= A[2] is copied in m, s.t A[1] = 15, A[2] = 10, m = 10
(iii) 15 = A [1] is copied in A [2] s.t A[1] = 15, A[2] = 15, m = 10
(iv) 10 = m is copied in A[1], so that A[1] = 10, A[2] = 15.
Iteration II: Also, for the correct place for A[3] = 13 w.r.t A[1] and A[2], the following
operations are performed:
(i) 13 = A[3] is compared with A[2] = 15
As A[3] < A[2], therefore, the algorithm further performs the
following operations
(ii) 13 = A[3] copied to m so that m = 13
(iii) A [ 2] is copied in A [3] s.t. A[3] = 15 = A[2]
(iv) Then 13 = m is compared with
A [1] = 10 which is less than m.
Therefore A[2] is the correct location for 13
(v) 13 = m is copied in A[2] s.t., at this stage
A[1] = 10, A[2] = 13, A[3] = 15
And m = 13
Iteration III: For correct place for A[4] = 9 w.r.t A[1], A[2] and A[3],
the following operations are performed:
(i) A[4] = 9 is compared with A[3] = 15.
As A[4] = 9 < 15 = A[3], the algorithm further performs the following operations:
(ii) 9 = A[4] is copied to m so that m = 9
(iii) 15 = A [3] is copied to A [4] s.t A[4] = 15 = A[3]
(iv) m is compared with A[2] = 13, as m < A [2], therefore, further
the following operations are performed.
(v) 13 = A[2] is copied to A[3] s.t A[3] = 13
(vi) m = 9 is compared with A[1] = 10; as m < A[1], the algorithm performs the
following additional operations:
(vii) 10 = A[1] is copied to A[2]
(viii) finally 9 = m is copied to A[1]
Iteration IV: For correct place for A[5] = 12, the following operations
are performed.
In view of the earlier discussion, and in view of the fact that the
number 12 (contents of A[5]) occurs between A[2] = 10 and
A[3] = 13, the algorithm needs to perform, in all, the following
operations.
Iteration V: The correct place for A[6] = 17, after sorting, w.r.t the
elements of A[1..5] is A[6] itself. In order to determine that A[6] is the
correct final position for 17, we perform the following operations.
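The five iterations above are exactly the steps of insertion sort. A minimal C++ sketch (0-indexed, unlike A[1..6] in the text; the variable m plays the same role as above):

```cpp
#include <cstddef>
#include <vector>

// Insertion sort: for each i, slide A[i] left past larger elements
// until it reaches its correct position relative to A[0..i-1],
// exactly as in Iterations I-V above.
void insertionSort(std::vector<int>& A) {
    for (std::size_t i = 1; i < A.size(); ++i) {
        int m = A[i];                   // the element being placed (m in the text)
        std::size_t j = i;
        while (j > 0 && m < A[j - 1]) { // compare m with elements to its left
            A[j] = A[j - 1];            // shift the larger element one slot right
            --j;
        }
        A[j] = m;                       // copy m into its final position
    }
}
```

On the list 15, 10, 13, 9, 12, 17 this yields 9, 10, 12, 13, 15, 17.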
Ex. 2) List to be sorted: 15, 10, 13, 9, 12, 17 by Bubble Sort.
To summarize, an exchange of A[i] and A[j] is carried out through the three assignments:
m ← A [i]
A [i] ← A [j]
A [j] ← m
In the following, the numbers in bold are compared and, if required,
exchanged.
Iteration I:
15 10 13 9 12 17
10 15 13 9 12 17
10 13 15 9 12 17
10 13 9 15 12 17
10 13 9 12 15 17
10 13 9 12 15 17
In this iteration, 5 comparisons and 4 exchanges, i.e., 12 assignments,
were performed.
Iteration II:
10 13 9 12 15
10 13 9 12 15
10 9 13 12 15
10 9 12 13 15
10 9 12 13 15
Iteration III:
10 9 12 13
9 10 12 13
9 10 12 13
9 10 12 13
Iteration IV:
9 10 12
9 10 12
9 10 12
Iteration V:
9 10
9 10
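The passes above can be reproduced, and the operation counts verified, with a small C++ sketch of bubble sort (the comparison and exchange counters are our addition, used only for counting):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bubble sort: each pass compares adjacent pairs and exchanges them
// when out of order; after pass k the k largest elements are in their
// final places, so each pass scans one element fewer.
void bubbleSort(std::vector<int>& A, int& comparisons, int& exchanges) {
    comparisons = exchanges = 0;
    for (std::size_t pass = 1; pass < A.size(); ++pass) {
        for (std::size_t j = 0; j + pass < A.size(); ++j) {
            ++comparisons;
            if (A[j] > A[j + 1]) {
                std::swap(A[j], A[j + 1]);  // the three-assignment exchange
                ++exchanges;
            }
        }
    }
}
```

For the list 15, 10, 13, 9, 12, 17 this performs 5 + 4 + 3 + 2 + 1 = 15 comparisons in all; the number of exchanges equals the number of inversions in the input.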
Ex. 3) List to be sorted: 15, 10, 13, 9, 12, 17 by Selection Sort.
There will be five iterations in all. In each of the five iterations, at least
the following operations are performed:
Next we explain the various iterations and for each iteration, count
the operations in addition to these 4 assignments.
Iteration 1: MAX ← 15
MAX_ POS ← 1
MAX is compared with successively 10, 13, 9, 12, and 17, one at a time
i.e, 5 comparisons are performed.
The list after the iteration is 10, 9, 12. Again, 2 comparisons of 12 with 10
and 9 are performed. No assignments are made beyond the
normal 4.
Finally, the list left to be sorted is just 9, and the process of sorting terminates.
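The MAX/MAX_POS bookkeeping above corresponds to the following C++ sketch (we assume, as the iterations suggest, the variant that repeatedly selects the maximum of the unsorted part and exchanges it with its last element):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Selection sort (maximum-selection variant): each iteration scans
// the unsorted prefix A[0..end-1] for its largest element, tracked in
// MAX / MAX_POS as in the text, and exchanges it with A[end-1].
void selectionSort(std::vector<int>& A) {
    for (std::size_t end = A.size(); end > 1; --end) {
        int MAX = A[0];
        std::size_t MAX_POS = 0;
        for (std::size_t j = 1; j < end; ++j) {  // end - 1 comparisons
            if (A[j] > MAX) {
                MAX = A[j];
                MAX_POS = j;
            }
        }
        std::swap(A[MAX_POS], A[end - 1]);       // place the maximum at the end
    }
}
```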
In Iteration I: 5 comparisons and 6 assignments were performed.
t ← 1 (one assignment)
Next, to calculate the position of A[t + 3 × s], one multiplication and one
addition are performed. Thus, for the selection of a particular sublist, 2
assignments, one comparison, one subtraction, one addition and one
multiplication are performed.
Thus, just for the selection of all the three sublists, the following
operations are performed:
6 Assignments
3 Comparisons
3 Subtractions
3 additions
3 multiplications
i← 1
m ← A[2] = 9 (two assignments performed)
Next, the comparisons m = 9 < A[1] = 15 and 1 = i > 0 are performed,
both of which are true. (two comparisons)
Next, again one comparison is performed, viz., i > 0, which is false, and
hence 9 = m < A[0] is not performed.
We can count all the operations mentioned above for the final answer.
Ex.5) To sort the list 15, 10, 13, 9, 12, 17 stored in A[1..6], using Heap Sort
first build a heap for the list and then recursively delete the root and
restore the heap.
Step I
For j = 2
2 = location > 1 is tested, which is true. (one comparison)
Hence
parent ← ⌊location/2⌋ = 1 is performed. (one assignment)
After 10 is placed, the heap is
15
10
and after 13 is placed, it is
15
10 13
Parent ← ⌊location/2⌋ = ⌊4/2⌋ = 2 is performed (one assignment)
A [location] = A [4] = 9 < 10 = A [2] is performed. (one comparison)
As the above inequality is true, no more operations in this case. The heap
at this stage is
15
10 13
The Comparison
Location = 5 > 1 is performed, (one comparison)
which is true. Therefore,
parent ← ⌊location/2⌋ = ⌊5/2⌋ = 2
is performed (one assignment)
At this stage the tree is
15
10 13
9 12
After 12 is exchanged with its parent 10, the tree becomes
15
12 13
9 10
(ii) (e) j← 6
The comparison
Location = 6>1 is performed (one comparison)
Therefore, parent ← ⌊location/2⌋ = 3 is performed (one assignment)
A[location] = A[6] = 17 < 9 = A[3] is performed (one comparison),
which is false. Hence A[6] and A[3] are exchanged and, next,
location ← 3 (one assignment)
(location>1) is performed (one comparison)
and
parent ← ⌊location/2⌋ = 1 (one assignment)
is performed
Further, the comparison A[location] = A[3] = 17 < 15 = A[1] is performed,
which is false. Hence A[3] and A[1] are exchanged and location ← 1. Then
the comparison (location > 1) is performed,
which is not true. Hence the process is completed and we get the heap
17
12 15
9 10 13
Step II: The following three substeps are repeated 5 times:
The sub steps (i) and (ii) are performed 5 times each, which contribute to
10 assignments
Iteration (i): after first two sub steps the heap becomes the tree
13
12 15
9 10
Root node is compared, one by one, with the values of its children
(2 comparisons)
The variable MAX stores 15 and MAX_POS stores the index of the right
child (two assignments)
Then 15 is exchanged with 13 (3 assignments), to get the heap
15
12 13
9 10
Iteration (ii): 15 of the root is removed and 10, the value of last node, is
moved to the root to get the tree
10
12 13
9
MAX first stores value 12 and then 13, and MAX_POS stores first the
index of the left child and then index of the right child (4 assignments)
13
12 10
9
Iteration (iii): Again 13 of the root node is removed and 9 of the last
node is copied in the root to get the tree
9
12 10
12
9 10
Iteration (iv): 12 is removed and the value 10 in the last node is copied
in the root to get the tree
10
9
Iteration (v): 10 is deleted from the root and 9 is copied in the root.
As the root is the only node in the tree, the sorting process terminates.
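Steps I and II of the exercise (build the heap by sifting each new element up, then repeatedly delete the root and restore the heap) can be combined into one C++ sketch. As in the text, the heap lives in A[1..n], so index 0 is unused:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Heap sort on A[1..n] (A[0] unused). Step I builds a max-heap by the
// location/parent sift-up loop of the text; Step II repeatedly moves
// the root (the maximum) to the end and sifts the new root down.
void heapSort(std::vector<int>& A) {
    std::size_t n = A.size() - 1;
    // Step I: insert A[2..n] one by one, sifting up.
    for (std::size_t j = 2; j <= n; ++j) {
        std::size_t location = j;
        while (location > 1) {
            std::size_t parent = location / 2;
            if (A[location] <= A[parent]) break;   // heap property holds
            std::swap(A[location], A[parent]);
            location = parent;
        }
    }
    // Step II: delete-max and restore the heap.
    for (std::size_t end = n; end > 1; --end) {
        std::swap(A[1], A[end]);                   // root value is now final
        std::size_t loc = 1;
        while (true) {                             // sift the new root down
            std::size_t MAX_POS = loc;
            std::size_t left = 2 * loc, right = left + 1;
            if (left < end && A[left] > A[MAX_POS]) MAX_POS = left;
            if (right < end && A[right] > A[MAX_POS]) MAX_POS = right;
            if (MAX_POS == loc) break;
            std::swap(A[loc], A[MAX_POS]);
            loc = MAX_POS;
        }
    }
}
```

Running it on 15, 10, 13, 9, 12, 17 (stored in A[1..6]) first builds the heap 17, 12, 15, 9, 10, 13 obtained above and then leaves A[1..6] sorted.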
Ex. 7)
The sequence
15, 10, 13, 9, 12, 17,
to be sorted is stored in A[1..6].
We take A[1] = 15 as pivot.
i is assigned values 2, 3, 4 etc. to get the first value from the left such that
A[i] > pivot.
The index i = 6, is the first index s.t 17 = A[i] > pivot = 15.
Also j = 5 i.e. A [5] = 12, is the first value from the right such that
A [j] < pivot.
As j < i, therefore
A[j] = A[5] = 12 is exchanged with pivot = 15 so that we get the array
12, 10, 13, 9, 15, 17
Next the two sublists viz 12, 10, 13, 9 and 17 separated by the pivot value
15, are sorted separately. However the relative positions of the sublists
w.r.t 15 are maintained, so we write the lists as
(12, 10, 13, 9), 15, (17)
The right hand sublist having only one element viz 17 is already sorted.
So we sort only left-hand sublist but continue writing the whole list.
Pivot for the left sublist is 12 and i = 3 and j = 4 are such that A[i] = 13 is
the left most entry more than the pivot and A[j] =9 is the rightmost value,
which is less than the pivot = 12. After exchange of A[i] and A[j], we get
the list (12, 10, 9, 13), 15, (17). Again moving i to the right and j to the
left, we get, i = 4 and j = 3. As j < i, therefore, the iteration is complete
and A[j] = 9 and pivot = 12 are exchanged so that we get the list ((9, 10)
12 (13)) 15 (17). The only remaining sublist to be sorted is (9, 10). Again the pivot
is 9, i = 2 and j = 1, so that A[i] is the leftmost value greater than the
pivot and A[j] is the rightmost value less than or equal to the pivot. As j < i,
we should exchange A[j] = A[1] with the pivot. But the pivot also equals A[1],
hence there is no exchange. The only sublist left to be sorted is {10}, which,
being a single element, is already sorted. Since the sublists were formed so that
any element in a sublist on the left is less than any element of the sublist on
the right, no merging is required.
6. The Design and Analysis of Algorithms, Anany Levitin: (Pearson Education, 2003).
7. Discrete Mathematics and Its Applications, K.N. Rosen: (Fifth Edition) Tata
McGraw-Hill (2003).
UNIT 1 DIVIDE-AND-CONQUER
Structure
1.0 Introduction
1.1 Objectives
1.2 General Issues in Divide-and-Conquer
1.3 Integer Multiplication
1.4 Binary Search
1.5 Sorting
1.5.1 Merge Sort
1.5.2 Quick Sort
1.6 Randomization Quicksort
1.7 Finding the Median
1.8 Matrix Multiplication
1.9 Exponentiation
1.10 Summary
1.11 Solutions/Answers
1.12 Further Readings
1.0 INTRODUCTION
We have already mentioned that solving (a general) problem, with or without
computers, is quite a complex and difficult task. We also mentioned that a large number
of problems, which we may encounter even in a formal discipline like Mathematics,
may not have any algorithmic/computer solutions. Out of the problems which
theoretically can be solved algorithmically, designing a solution for such a problem is,
in general, quite difficult. In view of this difficulty, a number of standard techniques,
which are found to be helpful in solving problems, have become popular in computer
science. Out of these techniques Divide-and-Conquer is probably the most well-
known one.
The general plan for Divide-and-Conquer technique has the following three major
steps:
Step 1: An instance of the problem to be solved is divided into a number of smaller
instances of the (same) problem, generally of equal sizes. Any sub-instance
may be further divided into its sub-instances. A stage is reached when either a
direct solution of a sub-instance is available or it is not further
sub-divisible. In the latter case, when no further sub-division is possible, we
attempt a direct solution for the sub-instance.
Step 2: Each of the smaller instances is solved, either directly or recursively.
Step 3: Combine the solutions so obtained of the smaller instances to get the
solution of the original instance of the problem.
1.1 OBJECTIVES
After going through this Unit, you should be able to:
• explain the essential idea behind the Divide-and-Conquer strategy for solving
problems with the help of a computer, and
• use Divide-and-Conquer strategy for solving problems.
1.2 GENERAL ISSUES IN DIVIDE-AND-CONQUER
Recalling from the introduction, Divide-and-Conquer is a technique of designing
algorithms that (informally) proceeds as follows:
Given an instance of the problem to be solved, split it into more than one
sub-instance (of the given problem). If possible, divide each of the sub-instances into
smaller instances, till a sub-instance has a direct solution available or no further
subdivision is possible. Then independently solve each of the sub-instances and then
combine the solutions of the sub-instances so as to yield a solution for the original
instance.
Example 1.2.1:
We have an algorithm, alpha say, which is known to solve all instances of size n, of a
given problem, in at most c n2 steps (where c is some constant). We then discover an
algorithm, beta say, which solves the same problem by:
• dividing an instance into 3 sub-instances of size n/2;
• solving these 3 sub-instances; and
• combining the three sub-solutions, taking d n steps.
Suppose our original algorithm alpha is used to carry out Step 2, viz., ‘solve these
sub-instances’. Let T(alpha)(n) = cn² and T(beta)(n) denote the running times of alpha
and beta. Then
T(beta)(n) = 3c(n/2)² + dn = (3/4)cn² + dn.
So if dn < (cn²)/4 (i.e., 4d/c < n) then beta is faster than alpha.
In particular, for all large enough n’s, (viz., for n > 4d/c = Constant), beta is faster
than alpha.
The algorithm beta improves upon the algorithm alpha by just a constant factor. But
if the problem size n is large enough such that for some i > 1, we have
n > 4d/c and also
n/2 > 4d/c and even
n/2ⁱ > 4d/c
which suggests that using beta instead of alpha for the Step 2 repeatedly until the
sub-sub-sub…sub-instances are of size n0 < = (4d/c), will yield a still faster algorithm.
This motivates the following definition:
Procedure gamma (n : size);
begin
If n < = n0 then
Solve problem using Algorithm alpha;
else
Split the problem instance into 3 sub-instances of size n/2;
Use gamma to solve each sub-instance;
Combine the 3 sub-solutions;
end if ;
end gamma;
Let T (gamma) (n) denote the running time of this algorithm. Then
T (gamma) (n) = cn², if n < = n0
T (gamma) (n) = 3 T (gamma) (n/2) + dn, otherwise
Later in the course, we shall show how relations of this form can be estimated, and
with these methods it can be shown that
T (gamma) (n) = O(n^(log₂ 3)) = O(n^1.59)
This is a significant improvement upon algorithms alpha and beta, in view of the fact
that as n becomes larger the differences in the values of n1.59 and n2 becomes larger
and larger.
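The stated bound can be obtained by unfolding the recurrence; a sketch of the calculation (base-case constants suppressed):

```latex
\begin{aligned}
T(n) &= 3\,T(n/2) + dn\\
     &= 9\,T(n/4) + dn\left(1 + \tfrac{3}{2}\right)\\
     &\;\;\vdots\\
     &= 3^{k}\,T\!\left(\frac{n}{2^{k}}\right)
        + dn\sum_{i=0}^{k-1}\left(\tfrac{3}{2}\right)^{i}.
\end{aligned}
```

Taking k = log₂ n, so that the sub-instances reach the base size, the first term is 3^(log₂ n) · O(1) = O(n^(log₂ 3)), and the geometric sum contributes dn · O((3/2)^(log₂ n)) = O(n^(log₂ 3)) as well, giving T(n) = O(n^(log₂ 3)) ≈ O(n^1.59).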
The improvement that results from applying algorithm gamma is due to the fact that it
maximizes the savings achieved through beta. The (relatively) inefficient method
alpha is applied only to “small ” problem sizes.
In (ii), it is more usual to consider the ratio of initial problem size to sub-instance size.
In our example, the ratio was 2. The threshold in (i) is sometimes called the
(recursive) base value. In summary, the generic form of a divide-and-conquer
algorithm is:
Procedure D-and-C (n : input size);
begin
read (n0); -- read the threshold value
if n < = n0 then
solve problem without further sub-division;
else
Split the problem instance into r sub-instances, each of size n/k;
for each of the r sub-instances do
D-and-C (n/k);
Combine the resulting sub-solutions to produce the solution to the original
problem;
end if;
end D-and-C;
Such algorithms are naturally and easily realised as recursive procedures in (suitable)
high-level programming languages.
1.3 INTEGER MULTIPLICATION
Let x and y be two n-digit numbers with decimal representations
x = x_{n−1} x_{n−2} … x_1 x_0 and
y = y_{n−1} y_{n−2} … y_1 y_0,
and let their product z = x * y have the representation
z = z_{2n−1} z_{2n−2} z_{2n−3} … z_1 z_0.
Note: The algorithm given below works for any number base, e.g., binary, decimal,
hexadecimal, etc. We use decimal simply for convenience.
The classical algorithm for multiplication requires O(n2) steps to multiply two n-digit
numbers.
A step is regarded as a single operation involving two single digit numbers, e.g.,
5+6, 3* 4, etc.
If
x = x_{n−1} x_{n−2} … x_1 x_0 and
y = y_{n−1} y_{n−2} … y_1 y_0,
then
x = Σ_{i=0}^{n−1} x_i * 10^i ; and
y = Σ_{i=0}^{n−1} y_i * 10^i .
The product z = x * y, with representation
z = z_{2n−1} z_{2n−2} z_{2n−3} … z_1 z_0,
is given by
z = Σ_{i=0}^{2n−1} z_i * 10^i = ( Σ_{i=0}^{n−1} x_i * 10^i ) * ( Σ_{i=0}^{n−1} y_i * 10^i ).
For example:
581 = 5 * 102 + 8 * 101 + 1 *100
602 = 6 * 102 + 0 * 101 + 2 * 100
581*602 = 349762 = 3 * 105 + 4 * 104 + 9 × 103 + 7 × 102 + 6 × 101
+ 2 × 100
Let us denote
x = a * 10^[n/2] + b and
y = c * 10^[n/2] + d,
where a and c are the numbers formed by the more significant digits, and b and d those
formed by the [n/2] less significant digits, of x and y respectively∗.
From this we also know that the result of multiplying x and y (i.e., z) is
z = x*y = (a * 10^[n/2] + b) * (c * 10^[n/2] + d)
= (a * c) * 10^(2[n/2]) + (a * d + b * c) * 10^[n/2] + (b * d)
where
2[n/2] = n, if n is even, and
2[n/2] = n + 1, if n is odd.
∗ For a given n-digit number, whenever we divide the sequence of digits into two
subsequences, one of which has [n/2] digits, the other subsequence has n ─ [n/2] digits, which
is (n/2) digits if n is even and ((n + 1)/2) digits if n is odd. However, for convenience, we
may call both (n/2)-digit sequences/numbers.
3. Given the four returned products, the calculation of the result of multiplying
x and y involves only additions (can be done in O(n) steps) and multiplications
by a power of 10 (also can be done in O(n) steps, since it only requires placing
the appropriate number of 0s at the end of the number). (Combine stage).
This saving is accomplished at the expense of a slightly greater number of steps taken in
the ‘combine stage’ (Step 3) (although this still uses O(n) operations).
We continue with the earlier notations in which z is the product of two numbers x and
y having respectively the decimal representations
x = x_{n−1} x_{n−2} … x_1 x_0
y = y_{n−1} y_{n−2} … y_1 y_0
Further, a, b, c, d are the numbers whose decimal representations are given by
a = x_{n−1} x_{n−2} … x_{[n/2]}
b = x_{[n/2]−1} x_{[n/2]−2} … x_1 x_0
c = y_{n−1} y_{n−2} … y_{[n/2]}
d = y_{[n/2]−1} y_{[n/2]−2} … y_1 y_0
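For concreteness, the three-multiplication scheme can be sketched in C++ on machine-size integers (illustrative only: a real implementation works digit-by-digit on arbitrary-length numbers; the helper digits() is our own):

```cpp
#include <algorithm>
#include <cstdint>

// Number of decimal digits of v (for v >= 0).
static int digits(std::int64_t v) {
    int d = 1;
    while (v >= 10) { ++d; v /= 10; }
    return d;
}

// Karatsuba: write x = a*10^m + b and y = c*10^m + d; then
//   x*y = ac*10^(2m) + [(a+b)(c+d) - ac - bd]*10^m + bd,
// so only three recursive multiplications of roughly half-size
// numbers are needed instead of four.
std::int64_t karatsuba(std::int64_t x, std::int64_t y) {
    if (x < 10 || y < 10) return x * y;          // base case: a single-digit factor
    int m = std::max(digits(x), digits(y)) / 2;
    std::int64_t p = 1;
    for (int i = 0; i < m; ++i) p *= 10;         // p = 10^m
    std::int64_t a = x / p, b = x % p;
    std::int64_t c = y / p, d = y % p;
    std::int64_t ac = karatsuba(a, c);
    std::int64_t bd = karatsuba(b, d);
    std::int64_t cross = karatsuba(a + b, c + d) - ac - bd;   // = ad + bc
    return ac * p * p + cross * p + bd;
}
```

For example, karatsuba(581, 602) reproduces the product 581 × 602 computed earlier.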
Performance Analysis
One of the reasons why we study analysis of algorithms is that, if there is more than
one algorithm that solves a given problem, then, through analysis, we can find the
running times of the various available algorithms and then choose the one
which takes the least running time.
• The number of sub-instances into which an instance is split (let us call this
number ∝).
• The ratio of initial problem size to sub-instance size (let us call the ratio β).
• The number of steps required to divide the initial instance into substances and
to combine sub-solutions, expressed as a function of the input size, n.
Let T_P(n) denote the number of steps taken by P on instances of size n. Then
T_P(n0) = Constant (recursive base)
T_P(n) = ∝ T_P(n/β) + gamma(n)
In the case when ∝ and β are both constants (as in all the examples
we have given), there is a general method that can be used to solve such recurrence
relations in order to obtain an asymptotic bound for the running time T_P(n). These
methods were discussed in Block 1.
In general, the recurrence
T(n) = ∝ T(n/β) + O(n^gamma)
(where gamma is a constant) has the solution
T(n) = O(n^gamma), if ∝ < β^gamma,
T(n) = O(n^gamma log n), if ∝ = β^gamma,
T(n) = O(n^(log_β ∝)), if ∝ > β^gamma.
Ex. 1) Using Karatsuba’s Method, find the value of the product 1026732 × 732912
1.4 BINARY SEARCH
int BinarySearch (int * A, int low, int high, int value)
{ int mid;
  while (low <= high)
  { mid = (low + high) / 2;
    if (value == A [mid])
      return mid;
    else if (value < A [mid])
      high = mid - 1;
    else low = mid + 1;
  }
  return -1;
}
Explanation of the Binary Search Algorithm
It takes as parameters the array A, in which the value is to be searched, and the
lower and upper bounds of the array, viz., low and high respectively.
At each iteration of the while loop, the algorithm halves the number of
elements of the array still to be searched. If the value is found, then its index is
returned. However, if the value is not found, the loop terminates when the value
of low exceeds the value of high; there are then no more items to be
searched, and the function returns a negative value to indicate that the item is not
found.
Analysis
As mentioned earlier, each step of the algorithm divides the block of items being
searched in half. The presence or absence of an item in an array of n elements, can be
established in at most lg n steps.
Thus the running time of a binary search is proportional to lg n and we say this is an
O(lg n) algorithm.
Ex. 2) Explain how Binary Search method finds or fails to find in the given sorted
array:
8 12 15 26 35 48 57 78 86 93 97 108 135 168 201
the following values
(i) 15
(ii) 93
(iii) 43
1.5 SORTING
We have already discussed the two sorting algorithms, viz., Merge Sort and Quick
Sort. The purpose of repeating the algorithms is mainly to discuss, not the design, but
the analysis part.
Divide Step: If the given array A has zero or one element, then return the array A as it
is, since it is trivially sorted. Otherwise, chop the given array A at about the middle to
give two subarrays A1 and A2, each containing about half of the elements in A.
The recursion stops when the subarray has just one element, so that it is trivially
sorted. Below is the Merge Sort function in C++.
void merge_sort (int A[], int p, int r)
{
  if (p < r)
  { int q = (p + r) / 2;
    merge_sort (A, p, q);
    merge_sort (A, q + 1, r);
    merge (A, p, q, r);
  }
}
Next, we define the merge function, which is called by merge_sort. At this
stage, we have an array A and indices p, q, r such that p ≤ q < r. Subarray A[p .. q] is
sorted and subarray A[q + 1 .. r] is sorted, and by the restrictions on p, q, r, neither
subarray is empty. We want the two subarrays merged into a single sorted
subarray in A[p .. r]. We will implement it so that it takes O(n) time, where
n = r – p + 1, the number of elements being merged.
Let us consider two piles of cards. Each pile is sorted and placed face-up on a table
with the smallest card on top of each pile. We will merge these into a single sorted
pile, face-down on the table. A basic step will be to choose the smaller of the two top
cards, remove it from its pile, thereby exposing a new top card and then placing the
chosen card face-down onto the output pile. We will repeatedly perform these basic
steps until one input becomes empty. Once one input pile empties, just take the
remaining input pile and place it face-down onto the output pile. Each basic step
should take constant time, since we check just the two top cards and there are n basic
steps, since each basic step removes one card from the input piles, and we started with
n cards in the input piles. Therefore, this procedure should take O(n) time. We don’t
actually need to check whether a pile is empty before each basic step. Instead we will
put on the bottom of each input pile a special sentinel card. It contains a special value
that we use to simplify the code. We know in advance that there are exactly r – p + 1
non-sentinel cards. We will stop once we have performed r – p + 1 basic steps. Below
is the function merge which runs in O(n) time.
void merge (int A[], int p, int q, int r)
{
  int n1 = q - p + 1;
  int n2 = r - q;
  int* L = new int [n1 + 2];
  int* R = new int [n2 + 2];
  for (int i = 1; i <= n1; i++)
    L [i] = A [p + i - 1];
  for (int j = 1; j <= n2; j++)
    R [j] = A [q + j];
  L [n1 + 1] = R [n2 + 1] = INT_MAX;   // sentinels (INT_MAX, from <climits>)
  int i = 1, j = 1;
  for (int k = p; k <= r; k++)
  {
    if (L [i] <= R [j])
    {
      A [k] = L [i];
      i += 1;
    }
    else
    {
      A [k] = R [j];
      j += 1;
    }
  }
  delete [] L;
  delete [] R;
}
Solving the merge-sort recurrence T(n) = 2T(n/2) + O(n): By the master theorem, this recurrence has the
solution T(n) = O(n lg n). Compared to insertion sort (O(n2) worst-case time), merge
sort is faster. Trading a factor of n for a factor of lg n is a good deal. On small inputs,
insertion sort may be faster. But for large enough inputs, merge sort will always be
faster, because its running time grows more slowly than insertion sort’s.
Partition A[1…n] into subarrays A′ = A[1..q] and A′′ = A[q + 1…n] such that all
elements in A′′ are larger than all elements in A′.
Recursively sort A′ and A′′.
Pseudo code for QUICKSORT:
QUICKSORT (A, p, r)
If p < r THEN
q = PARTITION (A, p, r)
QUICKSORT (A, p, q ─ 1)
QUICKSORT (A, q + 1, r)
end if
Then, in order to sort an array A of n elements, we call QUICKSORT with the three
parameters A, 1 and n: QUICKSORT (A, 1, n).
If the partition step always gives q = n/2 and takes θ(n) time, we again get a
recurrence: if T(n) denotes the time taken by QUICKSORT in sorting an array of n
elements, then
T(n) = 2T(n/2) + θ(n). After solving the recurrence we get the running time
T(n) = θ (n log n)
The problem is that it is hard to develop a partition algorithm which always divides A
into two halves.
PARTITION (A, p, r)
x = A [r]
i = p ─ 1
FOR j = p TO r ─ 1 DO
IF A [j] ≤ x THEN
i = i + 1
Exchange A [i] and A [j]
end if
end DO
Exchange A [i + 1] and A [r]
RETURN i + 1
QUICKSORT correctness:
• Easy to show inductively, if PARTITION works correctly
Example:
⏐2 8 7 1 3 5 6 4 i = 0, j = 1
2 ⏐ 8 7 1 3 5 6 4 i = 1, j = 2
2 ⏐ 8 ⏐ 7 1 3 5 6 4 i = 1, j = 3
2 ⏐ 8 7 ⏐ 1 3 5 6 4 i = 1, j = 4
2 1 ⏐ 7 8 ⏐ 3 5 6 4 i = 2, j = 5
2 1 3 ⏐ 8 7 ⏐ 5 6 4 i = 3, j = 6
2 1 3 ⏐ 8 7 5 ⏐ 6 4 i = 3, j = 7
2 1 3 ⏐ 8 7 5 6 4 i = 3, j = 8
2 1 3 ⏐ 4 ⏐ 7 5 6 8 q=4
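The trace can be checked by transcribing PARTITION into C++ (0-indexed here, so the returned index 3 corresponds to q = 4 in the 1-indexed trace above):

```cpp
#include <utility>
#include <vector>

// PARTITION as above: the last element A[r] is the pivot x; elements
// <= x are moved to the front, and the pivot is finally placed
// between the two parts, at index i + 1.
int partition(std::vector<int>& A, int p, int r) {
    int x = A[r];                     // pivot
    int i = p - 1;
    for (int j = p; j <= r - 1; ++j) {
        if (A[j] <= x) {
            ++i;
            std::swap(A[i], A[j]);
        }
    }
    std::swap(A[i + 1], A[r]);
    return i + 1;
}
```

On 2 8 7 1 3 5 6 4 it leaves the array as 2 1 3 4 7 5 6 8 and returns index 3.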
• If we run QUICKSORT on a set of inputs that are already sorted, the average
running time will be close to the worst-case.
• Similarly, if we run QUICKSORT on a set of inputs that give good splits, the
average running time will be close to the best-case.
• If we run QUICKSORT on a set of inputs which are picked uniformly at
random from the space of all possible input permutations, then the average case
will also be close to the best-case. Why? Intuitively, if any input ordering is
equally likely, then we expect at least as many good splits as bad splits,
therefore on the average a bad split will be followed by a good split, and it gets
“absorbed” in the good split.
So, under the assumption that all input permutations are equally likely, the average
time of QUICKSORT is θ (n lg n) (intuitively). Is this assumption realistic?
• Not really. In many cases the input is almost sorted: think of rebuilding indexes
in a database etc.
The question is: how can we make QUICKSORT have a good average time
irrespective of the input distribution?
• Using randomization.
Running time of a randomized algorithm depends not only on input but also on the
random choices made by the algorithm.
Randomized algorithms have best-case and worst-case running times, but the inputs
for which these are achieved are not known, they can be any of the inputs.
We are normally interested in analyzing the expected running time of a randomized
algorithm, that is, the expected (average) running time over all inputs of size n.
∗ This section may be omitted after one reading.
• Alternatively we can modify PARTITION slightly and exchange the last element in
A with random element in A before partitioning.
RANDQUICKSORT (A, p, r)
IF p < r THEN
q = RANDPARTITION (A, p, r)
RANDQUICKSORT (A, p, q ─ 1)
RANDQUICKSORT (A, q + 1, r)
END IF
One call of PARTITION takes O(1) time plus time proportional to the number of
iterations of the FOR-loop.
─ In each iteration of the FOR-loop we compare an element with the pivot
element.
Each pair of elements zi and zj are compared at most once (when either of them is the
pivot)
X = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} X_ij , where
X_ij = 1, if z_i is compared to z_j, and
X_ij = 0, if z_i is not compared to z_j.
E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Pr[z_i compared to z_j]
To compute Pr [zi compared to zj] it is useful to consider when two elements are not
compared.
Assume the first pivot is 7 ⇒ the first partition separates the numbers into the sets
{1, 2, 3, 4, 5, 6} and {8, 9, 10}.
In partitioning, 7 is compared to all numbers. No number from the first set will ever
be compared to a number from the second set.
In general once a pivot r, zi < r < zj, is chosen we know that zi and zj cannot later be
compared.
On the other hand if zi is chosen as pivot before any other element in Zij then it is
compared to each element in Zij. Similar for zj.
In the example, 7 and 9 are compared because 7 is the first item from Z7,9 to be chosen
as pivot, and 2 and 9 are not compared because the first pivot in Z2,9 is 7.
Prior to an element in Z_ij being chosen as pivot, the set Z_ij is together in the same
partition ⇒ any element in Z_ij is equally likely to be the first element chosen as pivot ⇒
the probability that z_i or z_j is chosen first in Z_ij is 1/(j − i + 1) each. Hence
Pr[z_i compared to z_j] = 2/(j − i + 1).
• We now have:
E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Pr[z_i compared to z_j]
     = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1)
     = Σ_{i=1}^{n−1} Σ_{k=1}^{n−i} 2/(k + 1)
     ≤ Σ_{i=1}^{n−1} Σ_{k=1}^{n} 2/k
     = Σ_{i=1}^{n−1} O(log n)
     = O(n log n)
Next time we will see how to make quicksort run in worst-case O(n log n ) time.
1.7 FINDING THE MEDIAN
The selection problem is: given a list of n elements, find the i-th smallest element (the
median being the case i = ⌈n/2⌉). We will give here two algorithms for the solution of
this problem. One is a practical randomized algorithm with O(n) expected running
time. The other algorithm, which is of more theoretical interest only, has O(n) worst-case
running time.
Randomized Selection
The key idea is to use the algorithm partition () from quicksort but we only need to
examine one subarray and this saving also shows up in the running time O(n). We
will use the Randomized Partition (A, p,r) which randomly partitions the Array A
around an element A[q] such that all elements from A[p] to A[q─1] are less than A[q]
and all elements A[q+1] to A[r] are greater than A[q].
We can now give the pseudo code for Randomized Select (A, p, r, i). This procedure
selects the ith order statistic in the Array A [p ..r].
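A C++ sketch of Randomized Select along the lines just described (function names are ours); only the side of the partition that contains the answer is examined, which is where the O(n) expected time comes from:

```cpp
#include <cstdlib>
#include <utility>
#include <vector>

// Randomized partition: exchange A[r] with a random element of
// A[p..r], then partition around the (now last) pivot.
static int randomizedPartition(std::vector<int>& A, int p, int r) {
    std::swap(A[p + std::rand() % (r - p + 1)], A[r]);
    int x = A[r], i = p - 1;
    for (int j = p; j < r; ++j)
        if (A[j] <= x) std::swap(A[++i], A[j]);
    std::swap(A[i + 1], A[r]);
    return i + 1;
}

// Randomized Select: return the i-th smallest element (i >= 1) of A[p..r].
int randomizedSelect(std::vector<int>& A, int p, int r, int i) {
    if (p == r) return A[p];
    int q = randomizedPartition(A, p, r);
    int k = q - p + 1;                   // rank of the pivot within A[p..r]
    if (i == k) return A[q];
    if (i < k)  return randomizedSelect(A, p, q - 1, i);
    return randomizedSelect(A, q + 1, r, i - k);
}
```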
Worst case: The partition may always occur in a 0 : n−1 fashion. Therefore, the time
required by Randomized Select can be described by the recurrence given below:
T(n) = T(n─1) + O(n)
     = O(n²) (arithmetic series)
Average case: Let us now analyse the average case running time of Randomized
Select.
For upper bound, assume ith element always occurs in the larger side of partition:
T(n) ≤ (1/n) Σ_{k=0}^{n−1} T(max(k, n − k − 1)) + Θ(n)
     ≤ (2/n) Σ_{k=n/2}^{n−1} T(k) + Θ(n)
Substituting the inductive hypothesis T(k) ≤ ck:
T(n) ≤ (2/n) Σ_{k=n/2}^{n−1} T(k) + Θ(n)                    The recurrence we started with
     ≤ (2/n) Σ_{k=n/2}^{n−1} ck + Θ(n)                      Substitute T(k) ≤ ck
     = (2c/n) (Σ_{k=1}^{n−1} k − Σ_{k=1}^{n/2−1} k) + Θ(n)  “Split” the sum
     = (2c/n) ((1/2)(n−1)n − (1/2)(n/2 − 1)(n/2)) + Θ(n)    Expand arithmetic series
     = c(n−1) − (c/2)(n/2 − 1) + Θ(n)                       Multiply it out
     = cn − c − cn/4 + c/2 + Θ(n)
     = cn − (cn/4 + c/2 − Θ(n))                             Rearrange the arithmetic
     ≤ cn (if c is big enough)                              What we set out to prove
At least half of the ⌊n/5⌋ 5-element medians are ≤ x, i.e., at least
⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ of them, and hence at least 3⌊n/10⌋ elements are ≤ x. Now,
for large n, 3⌊n/10⌋ ≥ n/4. So at least n/4 elements are ≤ x and, similarly, at least n/4
elements are ≥ x. Thus, after partitioning around x, step 5 will call Select () on at most
3n/4 elements. The recurrence is therefore:
T(n) ≤ T(⌊n/5⌋) + T(3n/4) + Θ(n)
     ≤ T(n/5) + T(3n/4) + Θ(n)        ⌊n/5⌋ ≤ n/5
     ≤ cn/5 + 3cn/4 + Θ(n)            Substitute T(n) ≤ cn
     = 19cn/20 + Θ(n)                 Combine fractions
     = cn − (cn/20 − Θ(n))            Express in desired form
     ≤ cn if c is big enough          What we set out to prove
1.8 MATRIX MULTIPLICATION
The idea behind Strassen’s algorithm is to multiply 2 × 2 matrices with only 7
scalar multiplications (instead of 8). Consider the matrices
( r  s )   ( a  b ) ( e  g )
( t  u ) = ( c  d ) ( f  h )
The seven submatrix products used are
P1 = a . (g – h)
P2 = (a + b) . h
P3 = ( c+ d ) . e
P4 = d . (f – e)
P5 = (a + d) . ( e + h)
P6 = (b – d) . (f + h)
P7 = ( a ─ c) . ( e + g)
Using these submatrix products the matrix products are obtained by
r = P5 + P4 – P2 + P6
s = P1 + P2
t = P3 + P4
u = P5 + P1 – P3 – P7
This method works, as can easily be seen; for example, s = P1 + P2 = (ag – ah) + (ah + bh)
= ag + bh. In this method there are 7 multiplications and 18 additions. For (n × n)
matrices, it can be worth trading one multiplication for 18 additions, since
multiplication costs are much higher than addition costs.
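The identities can be checked mechanically; a C++ sketch multiplying 2 × 2 matrices via P1, …, P7 (note that, following the display above, the second matrix has rows (e, g) and (f, h)):

```cpp
#include <array>

using Mat2 = std::array<std::array<long, 2>, 2>;

// Strassen's scheme on 2x2 matrices: 7 multiplications (P1..P7) and
// 18 additions/subtractions, instead of the usual 8 multiplications.
Mat2 strassen2(const Mat2& X, const Mat2& Y) {
    long a = X[0][0], b = X[0][1], c = X[1][0], d = X[1][1];
    long e = Y[0][0], g = Y[0][1], f = Y[1][0], h = Y[1][1];
    long P1 = a * (g - h);
    long P2 = (a + b) * h;
    long P3 = (c + d) * e;
    long P4 = d * (f - e);
    long P5 = (a + d) * (e + h);
    long P6 = (b - d) * (f + h);
    long P7 = (a - c) * (e + g);
    return {{{P5 + P4 - P2 + P6, P1 + P2},
             {P3 + P4, P5 + P1 - P3 - P7}}};
}
```

Multiplying [[1, 2], [3, 4]] by [[5, 6], [7, 8]] this way gives [[19, 22], [43, 50]], the ordinary matrix product.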
1.9 EXPONENTIATION
Exponentiating by Squaring is an algorithm used for the fast computation of large
powers of a number x. It is also known as the square-and-multiply algorithm or
binary exponentiation. It implicitly uses the binary expansion of the exponent. It is
of quite general use, for example, in modular-arithmetic.
Squaring Algorithm
The following recursive algorithm computes xⁿ, for a positive integer n:
Power (x, n) = x, if n = 1
Power (x, n) = Power (x², n/2), if n is even
Power (x, n) = x · Power (x², (n – 1)/2), if n > 2 is odd
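A direct C++ transcription of the recursion (the n = 0 base case is an added convenience, not part of the definition above):

```cpp
#include <cstdint>

// Square-and-multiply: recurse on x^2 with exponent n/2, halving the
// exponent at every call, so only O(log n) multiplications are needed.
std::uint64_t power(std::uint64_t x, std::uint64_t n) {
    if (n == 0) return 1;                        // added base case
    if (n == 1) return x;
    if (n % 2 == 0) return power(x * x, n / 2);  // n even
    return x * power(x * x, (n - 1) / 2);        // n odd
}
```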
Further Applications
The same idea allows fast computation of large exponents modulo a number.
Especially in cryptography, it is useful to compute powers in a ring of integers modulo
q. It can also be used to compute integer powers in a group, using the rule
Power (x, ─ n) = (Power (x, n))─1.
The method works in every semigroup and is often used to compute powers of
matrices.
Examples 1.9.1:
13789^722341 (mod 2345)
would take a very long time and lots of storage space if the naive method were used:
compute 13789^722341, then take the remainder when divided by 2345. Even using a more
effective method will take a long time: square 13789, take the remainder when
divided by 2345, multiply the result by 13789, and so on. This will take 722340
modular multiplications. The square-and-multiply algorithm is based on the
observation that 13789^722341 = 13789 · (13789²)^361170. So if we computed 13789²,
then the full computation would only take 361170 modular multiplications. This is a gain
of a factor of two. But since the new problem is of the same type, we can apply the
same observation again, once more approximately halving the size.
The repeated application of this algorithm is equivalent to decomposing the exponent (by
a base conversion to binary) into a sequence of squares and products. For example,

x⁷ = x⁴ · x² · x¹
   = (x²)² · x² · x
   = ((x²) · x)² · x

so the algorithm needs only 4 multiplications instead of 7 − 1 = 6,
where 7 = (111)₂ = 2² + 2¹ + 2⁰.
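The same idea, applied modulo q and driven by the binary expansion of the exponent, gives the square-and-multiply procedure used in the example above. The following Python sketch (an iterative variant; the names are illustrative) reduces every intermediate result modulo q so that the numbers stay small:

```python
def power_mod(x, n, q):
    """Compute x**n mod q by square-and-multiply over the binary digits of n."""
    result = 1
    x %= q
    while n > 0:
        if n & 1:                  # current binary digit of the exponent is 1
            result = (result * x) % q
        x = (x * x) % q            # square once per binary digit
        n >>= 1
    return result
```

For instance, power_mod(13789, 722341, 2345) finishes in about log₂(722341) ≈ 20 squarings plus a few extra multiplications.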
Addition Chain
An addition chain for n is a sequence that starts with 1 and ends with n, in which every
element is the sum of two (not necessarily distinct) earlier elements; its length is the
number of sums used. For example, 1, 2, 3, 6, 12, 24, 30, 31 is an addition chain for 31,
of length 7, since
2=1+1
3=2+1
6=3+3
12 = 6 + 6
24 = 12 + 12
30 = 24 + 6
31 = 30 + 1
Addition chains can be used for exponentiation: thus, for example, we only need
7 multiplications to calculate 5³¹:
5² = 5¹ × 5¹
5³ = 5² × 5¹
5⁶ = 5³ × 5³
5¹² = 5⁶ × 5⁶
5²⁴ = 5¹² × 5¹²
5³⁰ = 5²⁴ × 5⁶
5³¹ = 5³⁰ × 5¹
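The seven multiplications above can be replayed mechanically: keep a table of the powers computed so far, and form each new power as the product of two earlier ones. A Python sketch (the function name and table layout are my own):

```python
def power_by_chain(x):
    """Compute x**31 with 7 multiplications via the chain 1,2,3,6,12,24,30,31."""
    p = {1: x}
    # each triple (k, i, j) records that k = i + j in the addition chain
    for k, i, j in [(2, 1, 1), (3, 2, 1), (6, 3, 3), (12, 6, 6),
                    (24, 12, 12), (30, 24, 6), (31, 30, 1)]:
        p[k] = p[i] * p[j]         # one multiplication per chain element
    return p[31]
```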
Addition chain exponentiation
In mathematics, addition chain exponentiation is a fast method of exponentiation. It
works by creating a minimal-length addition chain that generates the desired
exponent. Each exponentiation in the chain can be evaluated by multiplying two of
the earlier exponentiation results.
This algorithm works better than binary exponentiation for large exponents. However,
it trades off space for speed, so it may not be a good choice on heavily loaded systems.
1.10 SUMMARY
The unit discusses various issues in respect of the technique viz., Divide and Conquer
for designing and analysing algorithms for solving problems. First, the general plan of
the Divide and conquer technique is explained and then an outline of a formal Divide-
and-conquer procedure is defined. The issue of whether at some stage to solve a
problem directly or whether to further subdivide it, is discussed in terms of the relative
efficiencies in the two alternative cases.
1.11 SOLUTIONS/ANSWERS
Ex.1) 1026732 × 732912
Though the above product may be computed in other, simpler ways, we want to
explain Karatsuba's method; therefore, next, we compute the products.
U = 1026 × 732
V = 732 × 912
P = 1758 × 1644
Let us consider only the product 1026 × 732 and other involved products may
be computed similarly and substituted in (A).
Let us write
U = 1026 × 732 = (10 × 10² + 26) (07 × 10² + 32)
= (10 × 7) × 10⁴ + 26 × 32 + [(10 + 7) (26 + 32)
− 10 × 7 − 26 × 32] × 10²
= 70 × 10⁴ + 26 × 32 + (17 × 58 − 70 − 26 × 32) × 10²
At this stage, we do not apply Karatsuba’s algorithm and compute the products of
2-digit numbers by conventional method.
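The computation above can be checked with a small Karatsuba routine. This is a hedged sketch (the splitting rule and the cut-off below which conventional multiplication is used are my own choices): each number is split around a power of 10, and the three recursive sub-products correspond to ac, bd and (a + b)(c + d) in the solution.

```python
def karatsuba(u, v, cut=100):
    """Multiply u and v using three recursive sub-products (Karatsuba)."""
    if u < cut or v < cut:
        return u * v               # small factors: conventional multiplication
    m = max(len(str(u)), len(str(v))) // 2
    split = 10 ** m
    a, b = divmod(u, split)        # u = a * 10**m + b
    c, d = divmod(v, split)        # v = c * 10**m + d
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    mid = karatsuba(a + b, c + d) - ac - bd    # equals ad + bc, one product
    return ac * split * split + mid * split + bd
```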
Ex. 2) The number of elements in the given list is 15. Let us store these in an array,
say A[1..15]. Thus, initially low = 1 and high = 15 and, hence,
mid = (1 + 15)/2 = 8.

In the first iteration, the search algorithm compares the value to be searched
with A[8] = 78.

low = 1, high = 4 − 1 = 3
Therefore, mid = (1 + 3)/2 = 2

Therefore
(new) mid = ⌈(3 + 4)/2⌉ = 3

As A[3] = 15 (the value to be searched), the algorithm terminates and returns the
index value 3 as output.

and (new) mid = (9 + 15)/2 = 12, where A[12] = 108

and (new) mid = (9 + 11)/2 = 10, with A[10] = 93

low = 5, high = 6 − 1 = 5,
hence mid = 5, and A[5] = 35.
As 43 > A[5], hence value ≠ A[5].
But, at this stage, low is not less than high and hence the algorithm returns −1,
indicating failure to find the given value in the array.
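The trace can be reproduced with the usual iterative binary search. The array below is illustrative: only the entries the trace actually mentions (A[3] = 15, A[5] = 35, A[8] = 78, A[10] = 93, A[12] = 108) are taken from the solution; the remaining values are made up.

```python
def binary_search(A, value):
    """Return the 1-based index of value in the sorted list A, or -1 on failure."""
    low, high = 1, len(A)
    while low <= high:
        mid = (low + high) // 2
        if A[mid - 1] == value:    # A is treated as 1-indexed, as in the text
            return mid
        elif A[mid - 1] < value:
            low = mid + 1
        else:
            high = mid - 1
    return -1

A = [5, 12, 15, 20, 35, 40, 55, 78, 80, 93, 100, 108, 110, 120, 130]
```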
Let

[a  b]   [ 5  6]
[c  d] = [−4  3]

and

[e  g]   [−7  6]
[f  h] = [ 5  9]

Then

P1 = a · (g − h) = 5 (6 − 9) = −15
P2 = (a + b) · h = (5 + 6) · 9 = 99
P3 = (c + d) · e = (−4 + 3) · (−7) = 7
P4 = d · (f − e) = 3 · (5 − (−7)) = 36
P5 = (a + d) (e + h) = (5 + 3) (−7 + 9) = 16
P6 = (b − d) (f + h) = (6 − 3) · (5 + 9) = 42
P7 = (a − c) (e + g) = (5 − (−4)) (−7 + 6) = −9

Then the product matrix is

[r  s]
[t  u]
where
r = P5 + P4 ─ P2 + P6
= 16 + 36 ─ 99 + 42 = ─ 5
s = P1 + P2 = ─ 15 + 99 = 84
t = P3 + P4 = 7 + 36 = 43
u = P5 + P1 ─ P3 ─ P7
= 16 + (─ 15) ─ 7 ─ (─9)
= 16 ─ 15 ─ 7 + 9
=3
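The seven products and the four combinations can be verified mechanically. The sketch below follows the text's layout of the second matrix (row 1 is (e, g), row 2 is (f, h)); the function name is illustrative.

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using Strassen's seven products P1..P7."""
    (a, b), (c, d) = A
    (e, g), (f, h) = B
    p1 = a * (g - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (f - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (f + h)
    p7 = (a - c) * (e + g)
    r = p5 + p4 - p2 + p6          # top-left entry
    s = p1 + p2                    # top-right entry
    t = p3 + p4                    # bottom-left entry
    u = p5 + p1 - p3 - p7          # bottom-right entry
    return [[r, s], [t, u]]
```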
Graph Algorithms
UNIT 2 GRAPH ALGORITHMS
Structure Page Nos.
2.0 Introduction 29
2.1 Objectives 29
2.2 Examples 29
2.2.1 NIM/Marienbad Game
2.2.2 Function For Computing Winning Nodes
2.3 Traversing Trees 32
2.4 Depth-First Search 34
2.5 Breadth-First Search 44
2.5.1 Algorithm of Breadth First Search
2.5.2 Modified Algorithm
2.6 Best-First Search & Minimax Principle 49
2.7 Topological Sort 55
2.8 Summary 57
2.9 Solutions/Answers 57
2.10 Further Readings 59
2.0 INTRODUCTION
A number of problems and games like chess, tic-tac-toe etc. can be formulated and
solved with the help of graphical notations. The wide variety of problems that can be
solved by using graphs range from searching particular information to finding a good
or bad move in a game. In this Unit, we discuss a number of problem-solving
techniques based on graph notations, including ones involving searches of
graphs, and the application of these techniques in solving game and sorting problems.
2.1 OBJECTIVES
After going through this Unit, you should be able to:
• explain and apply various graph search techniques, viz Depth-First Search
(DFS), Breadth-First-Search (BFS), Best-First Search, and Minimax Principle;
• discuss relative merits and demerits of these search techniques, and
• apply graph-based problem-solving techniques to solve sorting problems and
games.
2.2 EXAMPLES
To begin with, we discuss the applicability of graphs to a popular game known as
NIM.
Nim is a game for 2 players, in which the players take turns alternately. Initially the
players are given a position consisting of several piles, each pile having a finite
number of tokens. On each turn, a player chooses one of the piles and then removes at
least one token from that pile. The player who picks up the last token wins.
“losing positions” (for the player whose turn it is to move next). The positions which
are not losing ones are called winning positions.
Marienbad is a variant of the nim game, and it is played with matches. The rules of this
game are similar to those of nim and are given below:
(1) It is a two-player game.
(2) It starts with n matches (n must be greater than or equal to 2, i.e., n ≥ 2).
(3) The winner of the game is the one who takes the last match; whoever is left with
no matches to pick, loses the game.
(4) On the very first turn, up to n − 1 matches can be taken by the player having the
very first move.
(5) On the subsequent turns, one must remove at least one match and at most twice
the number of matches picked up by the opponent in the last move.
Before going into a detailed discussion through an example, let us explain the
possible states which may indicate different stages in the game. At any stage, the
following two numbers are significant:
(i) The total number of match sticks still available, after the removals made by the
players so far.
(ii) The number of match sticks that the player having the move can pick up.
After discussing some of possible states, we elaborate the game described above
through the following example.
Example 2.2.1:
Let the initial number of matches be 6, and let player A have the first move.
What should be A's strategy to win, on his first move? Generally, A will consider all
possible moves and choose the best one, as follows:
• if A takes 5 matches, that leaves just one for B; B will take it and win the
game;
• if A takes 4 matches, that leaves 2 for B; B will take both and win;
• if A takes 3 matches, that leaves 3 for B; B will take all three and win;
• if A takes 2 matches, that leaves 4 for B; B will take all four and win;
• if A takes 1 match, that leaves 5 for B. In the next step, B can take 1 or 2 (recall
that B can take at most twice the number A just took), and B will reach either of
the states (4, 2) or (3, 3), both of which are winning positions for A, because
from either state A can force B into a losing position. Looking at this reasoning
process, it is clear that the best move for A is to take just one match stick.
The above process can be expressed by a directed graph, where each node
corresponds to a position (state) and each edge corresponds to a move between two
positions. Each node is expressed by a pair of numbers <i, j>, 0 ≤ j ≤ i, where
i: the number of matches left;
j: the upper limit on the number of matches which can be removed in the next move, that
is, any number of matches between 1 and j can be taken in the next move.
In the directed graph shown below, rectangular nodes denote losing nodes and
oval nodes denote winning nodes:
Figure 1: The directed graph of game positions reachable from <6, 5>, containing the
nodes <6,5>, <5,2>, <4,4>, <4,2>, <3,3>, <3,2>, <2,2>, <1,1> and <0,0>
• a terminal node < 0, 0 >, from which there is no legal move. It is a losing
position.
• a nonterminal node is a winning node (denoted by a circle), if at least one of its
successors is a losing node, because the player currently having the move can
leave his opponent in a losing position.
• a nonterminal node is a losing node (denoted by a square) if all of its successors
are winning nodes, because the player currently having the move cannot
avoid leaving his opponent in one of these winning positions.
How do we determine the winning nodes and losing nodes in a directed graph?
Intuitively, we can start at the losing node < 0, 0 > and work backwards, according to
the definitions of winning and losing nodes. A node is a losing node, for the
current player, if every move takes the game to a state from which the opponent can
force the current player to lose. On the other hand, a node is a winning node if, after
making some move, the current player will leave the opponent in a state from which
the opponent cannot win. For instance, in any of the nodes < 1, 1 >,
< 2, 2 >, < 3, 3 > and < 4, 4 >, a player can make a move and leave his opponent in
the position < 0, 0 >; thus these 4 nodes are winning nodes. From position
< 3, 2 >, two moves are possible, but both these moves take the opponent to a winning
position, so it is a losing node. The initial position < 6, 5 > has one move which takes
the opponent to a losing position, so it is a winning node. Continuing this process
in the backward direction, we can mark the types of all nodes in the graph. A
recursive program for the purpose can be implemented as follows:
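A Python version of the recursion (standing in for the C program; the function name is illustrative) follows directly from the definitions: a state <i, j> is winning exactly when some legal move leads to a losing state for the opponent.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def winning(i, j):
    """True if state <i, j> (i matches left, at most j may be taken)
    is a winning position for the player about to move."""
    if i == 0:
        return False               # <0, 0>: no legal move, a losing position
    for k in range(1, j + 1):
        # after taking k matches, the opponent faces <i - k, min(2k, i - k)>
        if not winning(i - k, min(2 * k, i - k)):
            return True
    return False
```

winning(6, 5) is True, while winning(5, 2) and winning(3, 2) are False, matching the graph of Figure 1.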
Ex.1) Draw a directed graph for a game of Marienbad when the number of match
sticks, initially, is 5.
Preconditioning
Consider a scenario in which a problem might have many similar situations or
instances which are required to be solved. In such a situation, it might be useful to
spend some time and energy in calculating auxiliary solutions (i.e., attaching
some extra information to the problem space) that can be used afterwards to speed up
the process of finding the solution of each of these situations. This is known as
preconditioning. Although some time has to be spent in calculating/finding the
auxiliary solutions, in the final tradeoff the benefit achieved
in terms of speeding up the process of finding the solution of the problem will often be
much more than the additional cost incurred in finding the auxiliary/additional
information.
In other words, let x be the time taken to solve the problem without preconditioning, y
be the time taken to solve the problem with the help of some auxiliary results (i.e.,
after preconditioning) and let t be the time taken in preconditioning the problem space
i.e., time taken in calculating the additional/auxiliary information. Then to solve n
typical instances, provided that y < x , preconditioning will be beneficial only
when ,
nx > t + ny
i.e., nx – ny > t
or n > t / (x – y)
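The break-even point can be computed directly; the figures below are purely illustrative.

```python
def preconditioning_pays(n, x, y, t):
    """True when solving n instances after preconditioning (cost t + n*y)
    beats solving them directly (cost n*x); assumes y < x."""
    return n * x > t + n * y

# Illustrative figures: x = 10, y = 4, t = 60 give break-even at n > 60/(10 - 4) = 10.
```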
Preconditioning is also useful when only a few instances of a problem need to be
solved. Suppose we need a solution to a particular instance of a problem, and we need
it quickly. One way is to solve all the relevant instances in advance and store
their solutions so that they can be provided quickly whenever needed. But finding
the solutions of all instances when the solution of only one is needed is a very
inefficient and impractical approach. A popular alternative is to
calculate and attach some additional information to the problem space which will be
useful to speed up the process of finding the solution of any given instance that is
encountered.
For an example, let us consider the problem of finding the ancestor of any given node
in a rooted tree (which may be a binary or a general tree).
In any rooted tree, node u will be an ancestor of node v, if node u lies on the path
from root to v. Also we must note that every node is an ancestor of itself and root is an
ancestor of all nodes in the tree including itself. Let us suppose we are given a pair of
nodes (u, v) and we are to find whether u is an ancestor of v or not. If the tree contains
n nodes, then any given instance can take O(n) time in the worst case. But, if we
attach some relevant information to each of the nodes of the tree, then after spending
O(n) time in preconditioning, we can answer the ancestor query for any given pair of
nodes in constant time.
Now to precondition the tree, we first traverse the tree in preorder and calculate the
precedence of each node in this order, similarly, we traverse the tree in postorder and
calculate the precedence of each node. For a node u, let precedepre[u] be its
precedence in preorder and let precedepost[u] be its precedence in postorder.
Let u and v be the two given nodes. Then according to the rules of preorder and
postorder traversal, we can see that :
In preorder traversal, as the root is visited first before the left subtree and the right
subtree, so,
If precedepre[u] <= precedepre[v], then
u is an ancestor of v or u is to the left of v in the tree.
In postorder traversal, as the root is visited last, because, first we visit leftsubtree,
then right subtree and in the last we visit root so,
If precedepost[u] >= precedepost[v], then
u is an ancestor of v or u is to the right of v in the tree.
So for u to be an ancestor of v, both the following conditions have to be satisfied:
precedepre[u] <= precedepre[v] and precedepost[u] >= precedepost[v].
Thus, we can see that after spending some time in calculating preorder and postorder
precedence of each node in the tree, the ancestor of any node can be found in constant
time.
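The preconditioning step and the constant-time query can be sketched as follows. The tree representation (a dict from each node to the ordered list of its children) and the function names are my own:

```python
def precedence_tables(tree, root):
    """Number every node in preorder and in postorder; tree maps a node
    to the ordered list of its children."""
    pre, post = {}, {}
    pre_time = post_time = 0

    def walk(u):
        nonlocal pre_time, post_time
        pre_time += 1
        pre[u] = pre_time              # preorder: a node before its subtrees
        for child in tree.get(u, []):
            walk(child)
        post_time += 1
        post[u] = post_time            # postorder: a node after its subtrees

    walk(root)
    return pre, post

def is_ancestor(u, v, pre, post):
    """u is an ancestor of v iff both precedence conditions hold."""
    return pre[u] <= pre[v] and post[u] >= post[v]
```

After the O(n) traversal, each ancestor query costs just two comparisons.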
[Figure: a rooted tree with second-level nodes B, C, D and lower-level nodes E, F, G, H]
Design Techniques-I
2.4 DEPTH-FIRST SEARCH
The depth-first search is a search strategy in which the examination of a given vertex
u, is delayed when a new vertex say v is reached and examination of v is delayed
when new vertex say w is reached and so on. When a leaf is reached (i.e., a node
which does not have a successor node), the examination of the leaf is carried out. And
then the immediate ancestor of the leaf is examined. The process of examination is
carried out in reverse order of reaching the nodes.
In depth-first search, for any given vertex u, we find or discover the first
adjacent vertex v (in its adjacency list) not already discovered. Then, instead of
exploring the other nodes adjacent to u, the search starts from vertex v, which finds its
first adjacent vertex not already discovered. The whole process is repeated for
each newly discovered node. When a vertex adjacent to v is explored down to a
leaf, we backtrack to explore the remaining adjacent vertices of v. So we search
farther or deeper in the graph whenever possible. This process continues until we
discover all the vertices reachable from the given source vertex. If any
undiscovered vertices still remain, then a next source is selected and the same search
process is repeated. This whole process goes on until all the vertices of the graph are
discovered.
The vertices have three different statuses during the process of traversal or
searching: unknown, discovered and visited. Initially all the vertices
have the status ‘unknown’; after being explored, the status of a vertex is
changed to ‘discovered’; and after all vertices adjacent to a given vertex are discovered,
its status is changed to ‘visited’. This technique ensures that in the depth-first forest,
each vertex belongs to only one depth-first tree, so these trees are disjoint.
Because we leave partially visited vertices and move ahead, to backtrack later, a stack
will be required as the underlying data structure to hold vertices. In the recursive
version of the algorithm given below, the stack is implemented implicitly;
however, if we write a non-recursive version of the algorithm, the stack operations
have to be specified explicitly.
In the algorithm, we assume that the graph is represented using adjacency list
representation. To store the parent or predecessor of a vertex in the depth-first search,
we use an array parent[]. Status of a ‘vertex’ i.e., unknown, discovered, or visited is
stored in the array status. The variable time is taken as a global variable. V is the
vertex set of the graph G.
In the depth-first search algorithm, we also timestamp each vertex. So the vertex u has
two times associated with it, the discovery time d[u] and the termination time t[u].
The discovery time corresponds to the status change of a vertex from unknown to
discovered, and termination time corresponds to status change from discovered to
visited. For the initial input graph when all vertices are unknown, time is initialized to
0. When we start from the source vertex, time is taken as 1 and with each new
discovery or termination of a vertex, the time is incremented by 1. Although DFS
algorithm can be written without time stamping the vertices, time stamping of vertices
helps us in a better understanding of this algorithm. However, one drawback of time
stamping is that the storage requirement increases.
Also, in the algorithm, we can see that for any given node u, its discovery time will
be less than its termination time, i.e., d[u] < t[u].
34
Graph Algorithms
The algorithm is:
Program
DFS(G)
// This fragment of the algorithm performs the initialization
// and starts the depth-first search process
1 for each vertex u ∈ V
2 { status[u] = unknown;
3   parent[u] = NULL }
4 time = 0
5 for each vertex u ∈ V
6 { if status[u] == unknown
7     VISIT(u) }

VISIT(u)
1 status[u] = discovered;
2 time = time + 1;
3 d[u] = time;
4 for each vertex v ∈ V adjacent to u
5 { if status[v] == unknown
6   { parent[v] = u;
7     VISIT(v) } }
8 time = time + 1;
9 t[u] = time;
10 status[u] = visited;
In the procedure DFS, the first for-loop initializes the status of each vertex to
unknown and parent or predecessor vertex to NULL. Then it creates a global variable
time and initializes it to 0. In the second for-loop belonging to this procedure, for each
node in the graph, if that node is still unknown, the VISIT(u) procedure is called. Now
we can see that every time the VISIT(u) procedure is called, the vertex u
becomes the root of a new tree in the forest of the depth-first search.
Whenever the procedure VISIT(u) will be called with parameter u, the vertex u will be
unknown. So in the procedure VISIT(u), first the status of vertex u is changed to
‘discovered’, time is incremented by 1 and it is stored as discovery time of vertex u in
d[u].
When the VISIT procedure is called for the first time, d[u] will be 1. In the
for-loop, for each given vertex u, every unknown vertex adjacent to u is visited
recursively and the parent[] array is updated. When the for-loop concludes, i.e., when
every vertex adjacent to u is discovered, the time is incremented by 1 and is stored as the
termination time of u, i.e., t[u], and the status of vertex u is changed to ‘visited’.
In procedure DFS(), each for-loop takes time O(|V|), where |V| is the number of
vertices in V. The procedure VISIT is called once for every vertex of the graph. In the
procedure VISIT, the for-loop is executed a number of times equal to the number of edges
emerging from that node and not yet traversed. Considering the adjacency lists of all
nodes, the total number of edges traversed is O(|E|), where |E| is the number of
edges in E. The running time of DFS is, therefore, O(|V| + |E|).
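The procedure translates almost line for line into Python. In the sketch below (names are illustrative), the adjacency lists used in the test are reconstructed to be consistent with the timestamps of Example 2.4.2; where the example leaves an edge unspecified, it is a guess.

```python
def dfs(graph):
    """graph: dict vertex -> ordered adjacency list.
    Returns discovery times d, termination times t, and parent pointers."""
    status = {u: 'unknown' for u in graph}
    parent = {u: None for u in graph}
    d, t = {}, {}
    time = 0

    def visit(u):
        nonlocal time
        status[u] = 'discovered'
        time += 1
        d[u] = time                    # discovery timestamp
        for v in graph[u]:
            if status[v] == 'unknown':
                parent[v] = u
                visit(v)
        time += 1
        t[u] = time                    # termination timestamp
        status[u] = 'visited'

    for u in graph:                    # restart from a new source if needed
        if status[u] == 'unknown':
            visit(u)
    return d, t, parent
```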
Example 2.4.1:
For the graph given in Figure 2.4.1.1, use DFS to visit the various vertices. The vertex
D is taken as the starting vertex and, if there is more than one vertex adjacent to a
given vertex, then the adjacent vertices are visited in lexicographic order.
In the following,
(i) the label i/ indicates that the corresponding vertex is the ith discovered vertex.
(ii) the label i/j indicates that the corresponding vertex is the ith discovered vertex
and jth in the combined sequence of discovered and visited.
Figure 2.4.1.2: D has two neighbors by convention A is visited first i.e., the status of A changes to
discovered, d[A] = 2
Figure 2.4.1.3: A has two unknown neighbors B and C, so status of B changes to ‘discovered’, i.e.,
d[B] = 3
36
Graph Algorithms
Figure 2.4.1.5: All of E’s neighbors are discovered so status of vertex E is changed to ‘visited’ and
t[E] = 5
Figure 2.4.1.7: Similarly, vertices G, C and H are discovered respectively, with d[G] = 7, d[C] = 8
and d[H] = 9
Figure 2.4.1.8: Now, as all the neighbors of H are already discovered, we backtrack to C, storing
H's termination time as t[H] = 10
Figure 2.4.1.9: We find the termination time of remaining nodes in reverse order, backtracking
along the original path ending with D.
The resultant parent pointer tree has its root at D, since this is the first node visited.
Each new node visited becomes the child of the most recently visited node. Also we
can see that while D is the first node to be ‘discovered’, it is the last node terminated.
This is due to recursion because each of D’s neighbors must be discovered and
terminated before D can be terminated. Also, all the edges of the graph, which are not
used in the traversal, are between a node and its ancestor. This property of depth-first
search differentiates it from breadth-first search tree.
Also we can see that the maximum termination time for any vertex is 16, which is
twice the number of vertices in the graph because time is incremented only when a
vertex is discovered or terminated and each vertex is discovered once and terminated
once.
Note: We should remember that in depth-first search the third case of overlapping
intervals is not possible, i.e., the situation given below is not possible, because of
recursion.
(2) Another important property of depth-first search (sometimes called white path
property) is that v is a descendant of u if and only if at the time of discovery of
u, there is at least one path from u to v contains only unknown vertices (i.e.,
white vertices or vertices not yet found or discovered).
(3) Depth-first search can be used to find connected components in a given graph:
one useful aspect of the depth-first search algorithm is that it traverses the
connected components one at a time, and hence it can be used to identify the
connected components in a given graph.
(4) Depth-first search can also be used to find cycles in an undirected graph:
we know that an undirected graph has a cycle if and only if at some particular
point during the traversal, when u is already discovered, one of the neighbors v
of u is also already discovered and is not parent or predecessor of u.
We can prove this property by the argument that if we discover v and find that
u is already discovered but u is not parent of v then u must be an ancestor of v
and since we traveled u to v via a different route, there is a cycle in the graph.
Ex.3) Trace how DFS traverses (i.e., discover and visits) the graph given below
when starting node/vertex is B.
[Figure: the graph for Ex. 3, with vertices B, C, E, F, G, H]
To perform depth first search in directed graphs, the algorithm given above can be
used with minor modifications. The main difference exists in the interpretation of an
“adjacent vertex”. In a directed graph vertex v is adjacent to vertex u if there is a
directed edge from u to v. If a directed edge exists from u to v but not from v to u,
then v is adjacent to u but u is not adjacent to v.
Because of this change, the algorithm behaves differently. Some of the previously
given properties may no longer be necessarily applicable in this new situation.
Edge Classification
Another interesting property of depth-first search is that the search can be used to
classify the different types of edges of the directed graph G(V, E). This edge
classification gives us some more information about the graph.
Note: In an undirected graph, every edge is either a tree edge or back edge, i.e.,
forward edges or cross edges are not possible.
Example 2.4.2:
In the following directed graph, we consider the adjacent nodes in increasing
alphabetic order, and let the starting vertex be a.
Figure 2.4.2.2: a has two unknown neighbors b and d; by convention, b is visited first, i.e., the
status of b changes to discovered, d[b] = 2
Figure 2.4.2.3: b has two unknown neighbors c and d, by convention c is discovered first i.e.,
d[c] = 3
Figure 2.4.2.4: c has only a single neighbor, a, which is already discovered, so c is terminated, i.e.,
t[c] = 4
Figure 2.4.2.5: The algorithm backtracks recursively to b; the next unknown neighbor is d, whose
status is changed to discovered, i.e., d[d] = 5
Figure 2.4.2.7: The algorithm backtracks recursively to b, which has no unknown neighbors, so
b(terminated) is visited i.e., t[b] = 7
Figure 2.4.2. 8: The algorithm backtracks to a which has no unknown neighbors so a is visited i.e.,
t[a] = 8.
Figure 2.4.2. 9: The connected component is visited so the algorithm moves to next component
starting from e (because we are moving in increasing alphabetic order) so e is
‘discovered’ i.e. , d[e] = 9
Figure 2.4.2. 10: e has two unknown neighbors f and g, by convention we discover f i.e.,
d[f] = 10
Figure 2.4.2. 12: The algorithm backtracks to e, which has g as the next ‘unknown’ neighbor, g is
‘discovered’ i.e., d[g] = 12
Figure 2.4.2.13: The only neighbor of g is e, which is already discovered, so g terminates and is
‘visited’, i.e., t[g] = 13
Figure 2.4.2.14: The algorithm backtracks to e, which has no unknown neighbors left, so
e terminates and is ‘visited’, i.e., t[e] = 14
Some more properties of depth-first search (in directed graphs)
(1) Given a directed graph, depth-first search can be used to determine whether it
contains a cycle.
(2) Cross edges go from a vertex of higher discovery time to a vertex of lower
discovery time, while a forward edge goes from a vertex of lower discovery time
to a vertex of higher discovery time.
(3) Tree edges, forward edges and cross edges all go from a vertex of higher
termination time to a vertex of lower termination time, whereas back edges go from a
vertex of lower termination time to a vertex of higher termination time.
(4) A graph is acyclic if and only if any depth-first search forest of the graph G yields
no back edges. This fact follows from property 3 above: if there are no back
edges, then all edges go from a vertex of higher termination time to a vertex of
lower termination time, so there can be no cycles. Thus, checking for cycles in a
directed graph can be done by ensuring that there are no back edges.
2.5 BREADTH-FIRST SEARCH

For recording the status of each vertex (whether it is still unknown, whether it has
been discovered, and whether all of its adjacent vertices have also been discovered),
the vertices are termed unknown, discovered and visited, respectively.
So if (u, v) ∈ E and u is visited, then v will be either discovered or visited, i.e., either v
has just been discovered or the vertices adjacent to v have also been found or visited.

As breadth-first search forms a breadth-first tree, if in the edge (u, v) vertex v is
discovered in the adjacency list of an already discovered vertex u, then we say that u is
the parent or predecessor vertex of v. Each vertex is discovered only once.
The data structure we use in this algorithm is a queue to hold vertices. In this
algorithm we assume that the graph is represented using adjacency list representation.
The front() procedure returns the element at the front of the queue. The Empty()
procedure returns true if the queue is empty; otherwise it returns false. The queue is
represented as Q. The procedures enqueue() and dequeue() are used to insert and
delete an element from the queue, respectively. The data structure status[] is used to
store the status of each vertex as unknown, discovered or visited.
The algorithm works as follows. Lines 1-2 initialize each vertex to ‘unknown’.
Because we have to start searching from vertex s, line 3 gives the status ‘discovered’
to vertex s. Line 4 inserts the initial vertex s in the queue. The while loop contains
statements from line 5 to the end of the algorithm, and runs as long as there
remain ‘discovered’ vertices in the queue; we can see that the queue can only
contain ‘discovered’ vertices. Line 6 takes the element u at the front of the queue, and
in lines 7 to 12 the adjacency list of vertex u is traversed; each unknown vertex v in
the adjacency list of u has its status marked as discovered and its parent marked as u,
and is then inserted in the queue. In line 13, vertex u is removed from the queue.
In lines 14-15, when there are no more unknown elements in the adjacency list of u,
the status of u is changed to ‘visited’ and it is also printed as visited.
The algorithm given above can also be improved by storing the distance of each
vertex u from the source vertex s using an array distance[] and also by permanently
recording the predecessor or parent of each discovered vertex in the array parent[]. In
fact, the distance of each reachable vertex from the source vertex as calculated by the
BFS is the shortest distance in terms of the number of edges traversed. So next we
present the modified algorithm for breadth first search.
In the above algorithm, the newly inserted line 3 initializes the parent of each vertex to
NULL, line 4 initializes the distance of each vertex from the source vertex to infinity,
line 6 initializes the distance of the source vertex s to 0, line 7 initializes the parent of
the source vertex s to NULL, line 14 records the parent of v as u, and line 15 calculates
the shortest distance of v from the source vertex s as the distance of u plus 1.
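The modified breadth-first search can be sketched in Python as follows (the dictionary-based representation and the names are my own): distance[] holds the number of edges from the source and parent[] the predecessor of each discovered vertex.

```python
from collections import deque

def bfs(graph, s):
    """Breadth-first search from source s over a dict-of-adjacency-lists graph.
    Returns shortest edge-count distances, parent pointers, and visiting order."""
    status = {u: 'unknown' for u in graph}
    parent = {u: None for u in graph}
    distance = {u: float('inf') for u in graph}
    status[s] = 'discovered'
    distance[s] = 0
    Q = deque([s])
    order = []                       # vertices in the order they become 'visited'
    while Q:
        u = Q[0]                     # element at the front of the queue
        for v in graph[u]:
            if status[v] == 'unknown':
                status[v] = 'discovered'
                parent[v] = u
                distance[v] = distance[u] + 1
                Q.append(v)
        Q.popleft()                  # u leaves the queue ...
        status[u] = 'visited'        # ... and is marked visited
        order.append(u)
    return distance, parent, order
```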
Example 2.5.3:
In the figure given below, we can see the graph given initially, in which only source s
is discovered.
We take unknown (i.e., undiscovered) adjacent vertex of s and insert them in queue,
first a and then b. The values of the data structures are modified as given below:
Next, after completing the visit of a we get the figure and the data structures as given
below:
Figure 2: We take unknown (i.e., undiscovered) adjacent vertices of s and insert them
in the queue.
Figure 3: Now the ‘discovered’ vertices in the adjacency list of u are b, c and d, and we can
visit any of them, depending upon which vertex was inserted in the queue first. As in this
example we have inserted b first, which is now at the front of the queue, so next we
will visit b.
Figure 5: Vertices e and f are discovered as adjacent vertices of c, so they are inserted
in the queue and then c is removed from the queue and is visited.
2.6 BEST FIRST SEARCH & MINIMAX
PRINCIPLE
Best First Search
In the two basic search algorithms we have studied before, i.e., depth-first search and
breadth-first search, we proceed in a systematic way by discovering/exploring
nodes in a predetermined order. In these algorithms, at each step during the
search there is no assessment of which way to go, because the method of moving is
fixed at the outset.
The best-first search belongs to a branch of search algorithms known as heuristic
search algorithms. The basic idea of heuristic search is that, rather than trying all
possible search paths at each step, we try to find which paths seem to be getting us
nearer to our goal state. Of course, we can't be sure that we are really near our goal
state; it could be that we have to take some really complicated and circuitous
sequence of steps to get there. But we might be able to make a good guess.
Heuristics are used to help us make that guess.
To use any heuristic search we need an evaluation function that scores a node in the
search tree according to how close to the goal or target node it seems to be. It will just
be an estimate, but it should still be useful. However, the estimate should always be
on the lower side (i.e., it should never overestimate) if we want to find the optimal
or lowest-cost path. For example, to find the optimal path/route between Delhi and
Jaipur, an estimate could be the straight aerial distance between the two cities.
There are a whole batch of heuristic search algorithms e.g., Hill Climbing, best first
search, A* ,AO* etc. But here we will be focussing on best first search.
Best First Search combines the benefits of both depth first and breadth first search
by moving along a single path at a time but changing paths whenever some other path
looks more promising than the current path.
At each step in best first search, we first generate the successors of the current
node and then apply a heuristic function to find the most promising child/successor.
We then expand/visit the chosen successor, i.e., find its
unknown successors. If one of the successors is a goal node we stop. If not, then all
these nodes are added to the list of nodes generated or discovered so far. During this
process of generating successors a bit of depth search is performed but ultimately if
the solution i.e., goal node is not found then at some point the newly
found/discovered/generated node will have a less promising heuristic value than one
of the top level nodes which were ignored previously. If this is the case then we
backtrack to the previously ignored but currently the most promising node and we
expand/visit that node. But when we backtrack, we do not forget the older branch
from where we have come. Its last node remains in the list of nodes which have been
discovered but not yet expanded/visited. The search can always return to it if at some
stage during the search process it again becomes the most promising node to move
ahead.
Choosing the most appropriate heuristic function for a particular search problem is not
easy, and computing it also incurs some cost. One of the simplest heuristic functions is an
estimate of the cost of getting to a solution from a given node; this cost could be in
terms of the number of expected edges or hops to be traversed to reach the goal node.
We should always remember that in best first search, although only one path is
pursued at a time, the other paths are not thrown away, so that they can be revisited
in future if the selected path becomes less promising.
Although the example we have given below shows the best first search of a tree, it is
sometimes important to search a graph instead of a tree so we have to take care that
the duplicate paths are not pursued. To perform this job, an algorithm will work by
searching a directed graph in which a node represents a point in the problem space.
Each node, in addition to describing the problem space and the heuristic value
associated with it, will also contain a link or pointer to its best parent and pointers
to its successor nodes. Once the goal node is found, the parent links will allow us to
trace the path from the source node to the goal node. The list of successors will allow
it to pass any improvement in value down to its successors, if they already exist.
• OPEN list → the list of nodes which have been found but not yet expanded,
i.e., the nodes which have been discovered/generated but whose
children/successors are not yet discovered. The OPEN list can be implemented in
the form of a priority queue in which the nodes are arranged in order of decreasing
priority from the front, i.e., the node with the most promising heuristic value (i.e.,
the highest priority node) will be at the first place in the list.
• CLOSED list → contains the expanded/visited nodes, i.e., the nodes whose
successors are also generated. We require to keep these nodes in memory if we
want to search a graph rather than a tree, since whenever a new node is generated
we need to check if it has been generated before.
The algorithm can be written as:
Best First Search
1. Place the start node on the OPEN list.
2. Create a list called CLOSED i.e., initially empty.
3. If the OPEN list is empty search ends unsuccessfully.
4. Remove the first node on OPEN list and put this node on CLOSED list.
5. If this is a goal node, search ends successfully.
6. Generate successors of this node:
For each successor :
(a) If it has not been discovered / generated before i.e., it is not on OPEN,
evaluate this node by applying the heuristic function, add it to the OPEN
and record its parent.
(b) If it has been discovered / generated before, change the parent if the new
path is better than the previous one. In that case update the cost of getting to
this node and to any successors that this node may already have.
7. Reorder the list OPEN, according to the heuristic merit.
8. Go to step 3.
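Steps 1-8 above can be sketched in Python. This is only an illustration, not the unit's own code: the graph, heuristic values and node names below are hypothetical, OPEN is kept as a priority queue (`heapq`) so that the reordering of step 7 happens implicitly, and step 6(b), re-parenting when a better path is found, is omitted for brevity (it is not needed when searching a tree).

```python
import heapq

def best_first_search(graph, h, start, goal):
    """Greedy best first search. graph: node -> list of successors;
    h: node -> estimated goal distance (heuristic). Returns a path
    from start to goal, or None when OPEN becomes empty."""
    open_list = [(h[start], start)]         # step 1: start node on OPEN
    parent = {start: None}
    closed = set()                          # step 2: CLOSED, initially empty
    while open_list:                        # step 3: empty OPEN means failure
        _, node = heapq.heappop(open_list)  # step 4: most promising node
        closed.add(node)
        if node == goal:                    # step 5: success, rebuild the path
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for succ in graph[node]:            # step 6: generate successors
            if succ not in parent:          # 6(a): not discovered before
                parent[succ] = node
                heapq.heappush(open_list, (h[succ], succ))
        # steps 7-8: the heap keeps OPEN ordered; the loop continues
    return None
```

With `h` giving C a smaller estimate than B, the search expands C before B, just as in the figures that follow.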
Example
In this example, each node has a heuristic value showing the estimated cost of getting
to a solution from this node. The example shows part of the search process using best
first search.
(Figures: successive snapshots of the best first search tree. A is the root with
successors B and C having estimated goal distances 8 and 7; expanding C later adds
D and E with estimated goal distances 4 and 9.)
Figure 3: As the estimated goal distance of C is smaller, expand C to find its
successors D and E.
Figure 4: Now D has the least estimated goal distance, i.e., 4, so expand D to
generate F and G with distances 9 and 11 respectively.
Figure 5: Now, among all the nodes which have been discovered but not yet expanded,
B has the smallest estimated goal distance, i.e., 8, so we backtrack and expand B, and
so on.
Best first search will always find a good path to a goal node, if there is one. But it
requires a good heuristic function for better estimation of the distance to a goal node.
Minimax Principle

Minimax is a method in decision theory for minimizing the expected maximum loss.
It is applied in two-player games such as tic-tac-toe or chess, where the two players
take alternate moves. It has also been extended to more complex games which require
general decision making in the presence of increased uncertainty. All these games
have a common property: they are logic games. This means that these games can
be described by a set of rules and premises, so it is possible to know, at a given point
of time, what the next available moves are. We can also call them full information
games, as each player has complete knowledge about the possible moves of the adversary.
In the subsequent discussion of games, the two players are named MAX and MIN.
We assume that MAX moves first and that, after that, the two players
move alternately. The extent of search before each move will depend on the ply
depth, i.e., the amount of lookahead measured in terms of pairs of alternating moves for
MAX and MIN.
generated in a few nanoseconds. Therefore, for many complex games, we must accept
the fact that search to termination is impossible; instead, we must use partial searching
techniques.
For searching we can use either breadth first, depth first or heuristic methods except
that the termination conditions must now be specified. Several artificial termination
conditions can be specified based on factors such as time limit, storage space and the
depth of the deepest node in the search tree.
In a two player game, the first step is to define a static evaluation function efun(),
which attaches a value to each position or state of the game. This value indicates how
good it would be for a player to reach that position. So after the search terminates, we
must extract from the search tree an estimate of the best first move by applying a
static evaluation function efun() to the leaf nodes of the search tree. The evaluation
function measures the worth of the leaf node position. For example, in chess a simple
static evaluation function might attach one point for each pawn, four points for each
rook, eight points for the queen, and so on. But such a static evaluation is too
simplistic to be of any real use: sometimes we might have to sacrifice the queen to
prevent the opponent from a winning move and to gain advantage in future. So the
key lies in the amount of lookahead. The more moves we are able to look ahead
before evaluating a move, the better will be the choice.
In analyzing game trees, we follow the convention that the value of the evaluation
function increases as the position becomes more favourable to player MAX: positive
values indicate positions that favour MAX, negative values indicate positions
favourable to player MIN, and values near zero correspond to game positions not
particularly favourable to either MAX or MIN. In a terminal position, the static
evaluation function returns either positive infinity or negative infinity, where positive
infinity represents a win for player MAX and negative infinity represents a win for
player MIN; a value of zero represents a draw.
In the algorithm given ahead, the search tree is generated starting with the current
game position, until an end-game position or the lookahead limit is reached. Increasing
the lookahead limit increases the search time but results in a better choice. The final
game positions are evaluated from MAX's point of view. Each node that belongs to
player MAX receives the maximum value of its children, whereas each node for
player MIN receives the minimum value of its children.
In the algorithm, lookahead limit represents the lookahead factor in terms of number
of steps, u and v represent game states or nodes, maxmove() and minmove() are
functions to describe the steps taken by player MAX or player MIN to choose a
move, efun() is the static evaluation function which attaches a positive or negative
integer value to a node ( i.e., a game state), value is a simple variable.
Now, to move a number of steps equal to the lookahead limit from a given game state
u, MAX should move to the game state v given by the following code:

maxval = −∞
for each game state w that is a successor of u
    val = minmove(w, lookaheadlimit)
    if (val >= maxval)
        maxval = val
        v = w // move to the state v
The minmove() function is as follows:

minmove(w, lookaheadlimit)
{
    if (lookaheadlimit == 0 or w has no successor)
        return efun(w)
    else
        minval = +∞
        for each successor x of w
            val = maxmove(x, lookaheadlimit − 1)
            if (minval > val)
                minval = val
        return (minval)
}

maxmove(w, lookaheadlimit)
{
    if (lookaheadlimit == 0 or w has no successor)
        return efun(w)
    else
        maxval = −∞
        for each successor x of w
            val = minmove(x, lookaheadlimit − 1)
            if (maxval < val)
                maxval = val
        return (maxval)
}
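The minmove()/maxmove() pair translates almost line for line into Python. The sketch below is ours, not the unit's code: `successors()` and `efun()` are caller-supplied stand-ins for the game's move generator and static evaluation function.

```python
def maxmove(w, lookahead, successors, efun):
    """Value of game state w when it is MAX's turn, searching up to
    `lookahead` plies. successors(w) -> list of next states;
    efun(w) -> static evaluation of w."""
    if lookahead == 0 or not successors(w):
        return efun(w)
    return max(minmove(x, lookahead - 1, successors, efun)
               for x in successors(w))

def minmove(w, lookahead, successors, efun):
    """Value of game state w when it is MIN's turn: MIN picks the
    child of minimum value."""
    if lookahead == 0 or not successors(w):
        return efun(w)
    return min(maxmove(x, lookahead - 1, successors, efun)
               for x in successors(w))
```

For example, for a root MAX node whose two MIN children lead to leaves valued (2, 7) and (1, 9), the MIN nodes take values 2 and 1, so MAX's best value is 2.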
We can see that in the minimax technique, player MIN tries to minimize the
advantage he allows to player MAX, while player MAX tries to maximize the
advantage he obtains after each move.
Let us suppose the graph given below shows part of the game. The values of the leaf
nodes are given using the efun() procedure for a particular game; the values of the
nodes above can then be calculated using the minimax principle. Suppose the
lookahead limit is 4 and it is MAX's turn.
(Figure: a game tree of depth 4 with leaf values assigned by efun(); the values of the
internal nodes are computed bottom-up, MIN nodes taking the minimum and MAX
nodes the maximum of their children's values.)
it is not worth spending time to search for children of the B node, and so we can
safely ignore all the remaining children of B.
This shows that the search on some paths can sometimes be aborted (i.e., it is not
required to explore all paths), because we find out that the search subtree will not take
us to any viable answer.
(Figure: a MAX node with two MIN children A and B. A has been fully evaluated to 5;
the first child of B evaluates to 3, so the value of B can be at most 3 and the
remaining children of B need not be examined.)
This optimization is known as the alpha-beta pruning procedure, and the values beyond
which the search need not be carried out are known as alpha and beta cutoffs.
• The alpha values of MAX nodes (including the start value) can never decrease.
• The beta value of MIN nodes can never increase.
So we can see that remarkable reductions in the amount of search needed to evaluate a
good move are possible by using alpha beta pruning / procedure.
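A hedged sketch of how the cutoffs are usually implemented: alpha carries the best value MAX is already assured of, beta the best value MIN is assured of, and a subtree is abandoned as soon as alpha ≥ beta. The state representation (`successors()`, `efun()`) is hypothetical, as in the minimax sketch.

```python
import math

def alphabeta(w, depth, alpha, beta, is_max, successors, efun):
    """Minimax value of state w with alpha-beta cutoffs; returns the
    same value as plain minimax while visiting fewer nodes."""
    kids = successors(w)
    if depth == 0 or not kids:
        return efun(w)
    if is_max:
        value = -math.inf
        for x in kids:
            value = max(value, alphabeta(x, depth - 1, alpha, beta,
                                         False, successors, efun))
            alpha = max(alpha, value)   # alpha of a MAX node never decreases
            if alpha >= beta:           # cutoff: MIN will never allow this line
                break
        return value
    value = math.inf
    for x in kids:
        value = min(value, alphabeta(x, depth - 1, alpha, beta,
                                     True, successors, efun))
        beta = min(beta, value)         # beta of a MIN node never increases
        if alpha >= beta:               # cutoff
            break
    return value
```

The two bulleted properties above appear directly in the code: alpha is only ever raised at MAX nodes, and beta only ever lowered at MIN nodes.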
2.7 TOPOLOGICAL SORT
In many applications we are required to indicate precedences or dependencies among
various events. Using directed graphs we can easily represent these dependencies. Let
G be a directed graph with vertex set V and edge set E. An edge from a vertex u to a
vertex v in the directed graph will then mean that v is dependent on u, or that u
precedes v. Also, there cannot be any cycles in these dependency graphs, as can be
seen from the following simple argument. Suppose that u is a vertex and there is an
edge from u to u, i.e., there is a single-node cycle. But the graph is a dependency
graph; this would mean that vertex u is dependent on vertex u, which means u must
be processed before u, which is impossible.
A directed graph that does not have any cycles is known as directed acyclic graph.
Hence, dependencies or precedences among events can be represented by using
directed acyclic graphs.
There are many problems in which we can easily tell which event follows or precedes
a given event, but we can’t easily work out in which order all the events are held. For
example, it is easy to specify/look up prerequisite relationships between modules in a
course, but it may be hard to find an order to take all the modules so that all
prerequisite material is covered before the modules that depend on it. Same is the case
with a compiler evaluating sub-expressions of an expression like the following:
(a + b)(c − d) − (a − b)(c + d)
Both of these problems are essentially equivalent. The data of both problems can be
represented by directed acyclic graph (See figure below). In the first each node is a
module; in the second example each node is an operator or an operand. Directed edges
occur when one node depends on the other, because of prerequisite relationships
among courses or the parenthesis order of the expression. The problem in both is to
find an acceptable ordering of the nodes satisfying the dependencies. This is referred
to as a topological ordering: a linear ordering of the vertices such that, for every
directed edge (u, v), u appears before v in the ordering.
(Figure: the directed acyclic graph of the expression above, with multiplication nodes
at the top, addition and subtraction nodes in the middle, and the shared operand
nodes a, b, c, d at the bottom.)
The term topological sort comes from the study of partial orders; the result is
sometimes called a topological order or linear order.
The algorithm given below assumes that the directed acyclic graph is represented
using adjacency lists. Each node of the adjacency list contains a variable indeg which
stores the indegree of the given vertex. Adj is an array of |V| lists, one for each vertex
in V.
Topological-Sort(G)
1 for each vertex u ∈ G
2     do indeg[u] = in-degree of vertex u
3        if indeg[u] = 0
4           then enqueue(Q, u)
5 while Q ≠ ∅
6     do u = dequeue(Q)
7        print u
8        for each v ∈ Adj[u]
9            do indeg[v] = indeg[v] − 1
10              if indeg[v] = 0
11                 then enqueue(Q, v)
The for loop of lines 1-3 calculates the indegree of each node, and if the indegree of
any node is found to be 0, then it is immediately enqueued. The while loop of lines 5-
11 works as follows. We dequeue a vertex u from the queue; its indegree will be
zero (why?). We then output the vertex and decrement the indegree of each vertex
adjacent to u. If, in the process, the indegree of any vertex adjacent to u becomes 0,
then it is also enqueued.
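Lines 1-11 can be rendered in Python as follows. This is a sketch under the assumption that the graph is given as a dict of adjacency lists; the relative order of vertices that become ready at the same time depends on insertion order.

```python
from collections import deque

def topological_sort(adj):
    """Queue-based topological sort of a DAG; adj maps each vertex to
    the list of its successors."""
    indeg = {u: 0 for u in adj}
    for u in adj:                    # lines 1-2: compute every indegree
        for v in adj[u]:
            indeg[v] += 1
    q = deque(u for u in adj if indeg[u] == 0)   # lines 3-4
    order = []
    while q:                         # line 5
        u = q.popleft()              # line 6
        order.append(u)              # line 7 ("print u")
        for v in adj[u]:             # lines 8-11: release u's successors
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    return order
```

For the graph a → c, b → c, c → d this returns an order in which a and b precede c, and c precedes d.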
We can also use Depth First Search traversal for topologically sorting a directed
acyclic graph. The DFS algorithm can be slightly changed, or used as it is, to find the
topological ordering. We simply run DFS on the input directed acyclic graph and
insert the vertices in a linked list (or simply print them) in decreasing
order of their termination times.
To see why this approach works, suppose that DFS is run on a given dag G = (V, E) to
determine the finishing times of its vertices. Let u, v ∈ V; if there is an edge in G from
u to v, then the termination time of v will be less than the termination time of u, i.e.,
t[v] < t[u]. Since we output the vertices in decreasing order of termination time, the
vertex with the least number of dependencies will be output first.
ALGORITHM
1. Run the DFS algorithm on graph G. In doing so, compute the termination time
of each vertex.
2. Whenever a vertex is terminated (i.e., its DFS call finishes), insert it at the front of a list.
3. Output the list.
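Steps 1-3 can be sketched recursively, again assuming a dict-of-lists graph representation (the helper name is ours):

```python
def dfs_topological_sort(adj):
    """DFS-based topological sort: prepend each vertex to the list
    when its DFS call terminates, i.e., when all its descendants
    are finished."""
    visited, order = set(), []
    def dfs(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                dfs(v)
        order.insert(0, u)           # step 2: insert at the front
    for u in adj:                    # step 1: cover every component
        if u not in visited:
            dfs(u)
    return order                     # step 3
```

Prepending on termination is exactly "decreasing order of termination time": the last vertex to finish ends up first.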
RUNNING TIME
Let n be the number of vertices (or nodes, or activities) and m the number of edges
(constraints). Each vertex is discovered only once, and for each vertex we loop
over all its outgoing edges once. Therefore, the total running time is O(n + m).
2.8 SUMMARY
This unit discusses some searching and sorting techniques for solving those problems
each of which can be efficiently represented in the form of a graph. In a graphical
representation of a problem, generally, a node represents a state of the problem, and
an arrow/arc represents a move between a pair of states.
2.9 SOLUTIONS/ANSWERS
Ex. 1)
(Diagram: the state space generated for the problem, with states labelled by the pairs
(0,0), (1,1), (2,1), (2,2), (3,2), (4,2), (3,3), (4,3).)
Ex.2)
(Diagrams: the tree with root A, second level B and C, third level E, F, G, H, and
fourth level D, I, K, L, shown first with only discovery times assigned and then with
both discovery and termination times of each vertex, e.g., A: 1/11.)
Ex.3)
(Diagrams: successive stages of the traversal for the graph of Ex. 3, showing
discovery times 1/, 2/, 3/ being assigned in turn to the vertices, with nodes B, C and
E, F, G, H.)
UNIT 1 DYNAMIC PROGRAMMING
Structure Page Nos.
1.0 Introduction 5
1.1 Objectives 8
1.2 The Problem of Making Change 8
1.3 The Principle of Optimality 13
1.4 Chained Matrix Multiplication 14
1.5 Matrix Multiplication Using Dynamic Programming 15
1.6 Summary 17
1.7 Solutions/Answers 18
1.8 Further Readings 21
1.0 INTRODUCTION
In the earlier units of the course, we have discussed some well-known techniques,
including the divide-and-conquer technique, for developing algorithms for
algorithmically solvable problems. The divide-and-conquer technique, though quite
useful in solving problems from a number of problem domains, yet in some cases, as
shown below, may give quite inefficient algorithms to solve problems.
Example 1.0.1: Consider the problem of computing the binomial coefficient
(in linear notation, C(n, k)), where n and k are given non-negative integers with
n ≥ k. One way of defining and calculating the binomial coefficient is by using the
following recursive formula:

C(n, k) = 1                                 if k = n or k = 0
C(n, k) = C(n − 1, k − 1) + C(n − 1, k)     if 0 < k < n          (1.0.1)
C(n, k) = 0                                 otherwise
The following recursive algorithm named Bin (n, k), implements the above formula
for computing the binomial coefficient.
If k = n or k = 0 then return 1
else return Bin (n−1, k−1) + Bin (n−1, k)
For computing Bin (n, k) for some given values of n and k, a number of terms Bin
(i, j), 1 ≤ i ≤ n and 1 ≤ j ≤ k, particularly for smaller values of i and j, are repeatedly
calculated. For example, to calculate Bin (7, 5), we compute Bin (6, 5) and Bin (6, 4).
Now, for computing Bin (6, 5), we compute Bin (5, 4) and Bin (5, 5). But for
calculating Bin (6, 4) we have to calculate Bin (5, 4) again. If the above argument is
carried further for still smaller values, the number of repetitions of Bin (i, j)
increases as the values of i and j decrease.
For given values of n and k, in order to compute Bin (n, k), we need to call Bin (i, j)
for 1 ≤ i ≤ n ─ 1 and 1 ≤ j ≤ k─1 and as the values of i and j decrease, the number of
times Bin (i, j) is required to be called and executed generally increases.
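The repetitions described above disappear as soon as Bin keeps a table of known results. A small sketch, using Python's `functools.lru_cache` as the table:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def bin_memo(n, k):
    """Bin(n, k) with memoization: each pair (n, k) is computed once
    and afterwards looked up instead of being recomputed."""
    if k == n or k == 0:
        return 1
    return bin_memo(n - 1, k - 1) + bin_memo(n - 1, k)
```

For example, while plain Bin(7, 5) recomputes Bin(5, 4) several times, bin_memo computes each Bin(i, j) exactly once, giving bin_memo(7, 5) = 21.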
The above example follows the Divide-and-Conquer technique in the sense that the
task of calculating C(n, k) is replaced by the two relatively simpler tasks, viz.,
calculating C(n−1, k) and C (n−1, k−1). But this technique, in this particular case,
makes a large number of avoidable repetitions of computations. This is not an isolated
instance where the Divide-and-Conquer technique leads to inefficient solutions. In
such cases, an alternative technique, viz., Dynamic Programming, may prove quite
useful. This unit is devoted to developing algorithms using Dynamic Programming
technique. But before, we discuss the technique in more details, let us briefly discuss
underlying idea of the technique and the fundamental difference between Dynamic
Programming and Divide-and-Conquer technique.
The essential idea of Dynamic Programming, being quite simple, is that we should avoid
calculating the same quantity more than once, usually by keeping a table of known
results for simpler instances. These results, instead of being calculated repeatedly,
can be retrieved from the table, as and when required, after the first computation.
The (i, j)th entry of the table contains the value C(i, j). We know,
C(i, 0) = 1 for all i = 0, 1, 2, …, n and
C(0, j) = 0 for j = 1, 2, …, k
0 1 2 3 …….. k
0 1 0 0 0 …….. 0
1 1
2 1
3 1
. .
. .
. .
. .
n 1
C (i, j) = C (i – 1, j – 1) + C (i – 1, j).
After filling up the entries of the first row, the table takes the following form:
0 1 2 3 …….. k
0 1 0 0 0 0
1 1 1 0 0 0
2 1
. 1
. .
. .
. .
n 1
From the already calculated values of a given row i, adding successive pairs of
consecutive values, we get the values for the (i + 1)th row. After completing the
entries for the row with index 4, the table may appear as follows, where the blank
entries to the right of the main diagonal are all zeros.
0 1 2 3 4 ….. k
0 1
1 1 1
2 1 2 1
3 1 3 3 1
4 1 4 6 4 1
. .
. .
. .
. .
n 1
We Summarize below the process followed above, for calculating C(i, j):
First of all, the simplest values C(i, 0) = 1 for i = 1, 2, …., n and C(0, j) = 0 for
j ≥ 1, are obtained directly from the given formula. Next, more complex values are
calculated from the already available less complex values. Obviously, the above
mentioned process is a bottom-up one.
Though the purpose in the above discussion was to introduce and explain the Dynamic
Programming technique, yet we may also consider the complexity of calculating
C (n, k) using the tabular method.
(i) The value in column 0 is always 1 and hence need not be stored.
(ii) Initially 0th row is given by C(0,0) = 1 and C (0, j) = 0 for j = 1, 2, …., k. Once
any value of row 1, say C(1, j) is calculated the values C(0, j−1) and C(0, j) are
no more required and hence C (1, j) may be written in the space currently
occupied by C(0, j) and hence no extra space is required to write C (1, j).
In general, when the value C(i, j) of the ith row is calculated, the value C(i − 1, j) is no
more required, and hence the cell currently occupied by C(i − 1, j) can be used to store
the value C(i, j). Thus, at any time, one row's worth of space is enough to calculate
C(n, k). Therefore, the space requirement is θ(k).
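The one-row scheme described in (i) and (ii) can be sketched as follows (function name is ours). Updating each row from right to left ensures that row[j] and row[j − 1] still hold the previous row's values C(i − 1, j) and C(i − 1, j − 1) at the moment C(i, j) is written:

```python
def binomial(n, k):
    """C(n, k) using a single row of the table: theta(k) space."""
    row = [1] + [0] * k                   # row 0: C(0, 0) = 1, C(0, j) = 0
    for i in range(1, n + 1):
        for j in range(min(i, k), 0, -1):  # right to left, so that the
            row[j] = row[j] + row[j - 1]   # C(i-1, .) values are reused:
            # C(i, j) = C(i-1, j) + C(i-1, j-1)
    return row[k]
```

After the outer loop finishes, row holds row n of the table, and row[k] is C(n, k).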
In the next sections, we shall discuss solution of some well-known problems, using
Dynamic Programming technique.
1.1 OBJECTIVES
After going through this unit, you should be able to:

1.2 THE PROBLEM OF MAKING CHANGE

We, in India, have currency notes or coins of denominations of Rupees 1, 2, 5, 10, 20,
50, 100, 500 and 1000. Suppose a person has to pay an amount of Rs. 5896 after
having decided to purchase an article. Then, the problem is about how to pay the
amount using minimum number of coins/notes.
of making payments. Further, let A, a positive integer, be the amount to be paid using
the above-mentioned coins. The problem is to use the minimum number of coins for
the purpose.
The problem with above mentioned algorithm based on greedy technique, is that in
some cases, it may either fail or may yield suboptimal solutions. In order to establish
inadequacy of greedy technique based algorithms, we consider the following two
examples.
Example 1.2.2: Next, we consider another example, in which the greedy algorithm
may yield a solution, but the solution may not be optimal, only suboptimal. For this
purpose, we consider a hypothetical situation, in which currency notes of
denominations 1, 4 and 6 are available, and we have to collect an amount of 8. The
greedy technique gives 8 = 6 + 1 + 1. But this solution uses three currency
notes/coins, whereas another solution using only two currency notes/coins, viz.,
8 = 4 + 4, is available.
Next, we discuss how the Coin Problem is solved using Dynamic Programming
technique.
Each of the denomination di, 1 ≤ i ≤ k, is made a row label and each of the value j for
1 ≤ j ≤ A is made a column label of the proposed table as shown below, where A is the
amount to be paid:
Amount → 1 2 3 4 ….. j …. A
denomination
1= d1
d2
.
.
.
di C[i, j]
.
.
.
dk
In the table given above, 0 < d1 < d2 < … < dk, and C[i, j] denotes the
minimum number of coins of denominations d1, d2, …., di (only) that is used to make
an amount j, where more than one coin of the same denomination may be used. The
value C[i, j] is entered in the table with row label di and column label j.
Next, in respect of entries in the table, we make the following two observations:
(i) In order to collect an amount 0, we require zero number of coins, which is true
whether we are allowed to choose from, say, one of the successively larger sets
of denominations viz., {d1}, {d1, d2}, {d1, d2, d3}, …, {d1, d2, …, dk}. Thus,
entries in the table in the column with 0 as column label, are all 0’s.
(ii) If d1 ≠ 1, then there may be some amounts j (including j = 1) for which, even
with dynamic programming technique, no solution may exist, i.e., there
may not be any number of coins of denominations d1, d2, … dk for which
the sum is j. Therefore, we assume d1 = 1. The case d1 ≠ 1 may be handled
similarly.
As d1 = 1, therefore, the first row of the table in the jth column, contains j, the
number of coins of denomination only d1 = 1 to form value j.
0 1 2 3 4 ….. j …. A
d1 0 1 2 3 4 … j … A
d2 0
di 0
. .
. .
. .
dk 0
Next, for i ≥ 2 and j ≥ 1, the value C[i, j], the minimum number of coins of
denominations up to di (only) that sum up to j, can be obtained recursively through
either of the following two ways:
By definition, C[i, j] = min {1 + C[i, j − di], C[i − 1, j]}, and can be calculated, as
the two involved values viz., C[i, j − di] and C[i − 1, j] are already known.
Comment 1.2.3
If j < di in case (1.1), then Equation (1.1) is impossible. Mathematically, we can
say C[i, j − di] = ∞ if j < di, because then the case is automatically excluded from
consideration while calculating C[i, j].
Similarly, we take
C[i − 1, j] = ∞ if i < 1
Following the above procedure, C[k, A] gives the desired number.
In order to explain the above method, let us consider the earlier example for which
greedy algorithm gave only suboptimal solution.
Example 1.2.4: Using Dynamic Programming technique, find out minimum number
of coins required to collect Rupees 8 out of coins of denominations 1, 4, 6.
From the earlier discussion we already know the following portion of the table to be
developed using Dynamic Programming technique.
0 1 2 3 4 5 6 7 8
d1 = 1 0 1 2 3 4 5 6 7 8
d2 = 4 0
d3 = 6 0
Next, an interesting case is C[2, 4], i.e., to find the minimum number of coins to make
an amount of 4 out of coins of denominations 1, 4 and 6. By definition,
C[2, 4] = min {1 + C[2, 4 − 4], C[1, 4]}
But C[2, 0] = 0 and C[1, 4] = 4; therefore,
C[2, 4] = min {1 + 0, 4} = 1
By following the method explained through the above example steps, finally, we get
the table as
0 1 2 3 4 5 6 7 8
d1 = 1 0 1 2 3 4 5 6 7 8
d2 = 4 0 1 2 3 1 2 3 4 2
d3 = 6 0 1 2 3 1 2 1 2 2
Let us formalize the method explained above of computing C[k, A], in the general
case, in the form of the following algorithm:
array C[1 … k, 0 … A]
For i = 1 to k
    Read(d[i])
{reads the various denominations available, with coins of each denomination in
sufficient numbers}
{assuming d1 = 1, initialize the table C[ ] as follows}
For i = 1 to k
    C[i, 0] = 0
For j = 1 to A
    C[1, j] = j
{fill the remaining rows using the recurrence}
For i = 2 to k
    For j = 1 to A
        If j < d[i] then C[i, j] = C[i − 1, j]
        else C[i, j] = min {1 + C[i, j − d[i]], C[i − 1, j]}
Return C[k, A]
Comments 1.2.5
Comment 1: The above algorithm explicitly gives only the number of coins, which
are minimum to make a pre-assigned amount A, yet it can also be used to determine
the set of coins of various denominations that add upto A.
By definition, C [i, j] = either 1 + C [i, j − di] or C [i −1, j], which means either we
choose a coin of denomination di or we do not choose a coin of denomination di,
depending upon whether 1 + C[i, j − di] ≤ C[i − 1, j] or not. Applying the above rule
recursively for decreasing values of i and j, we know which coins are chosen for
making an amount j out of the available coins.
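The table-filling rule, together with the trace-back of Comment 1, can be sketched in Python. The function and variable names are ours, not the unit's; as in the text, the sketch assumes the amounts are reachable (e.g., d1 = 1), since otherwise the trace-back would have nothing to follow.

```python
import math

def min_coins(denoms, amount):
    """Fills the table C[i][j] (minimum coins of denominations
    denoms[0..i] summing to j) and traces back which coins were used."""
    k = len(denoms)
    C = [[math.inf] * (amount + 1) for _ in range(k)]
    for i in range(k):
        C[i][0] = 0                              # zero coins make amount 0
        for j in range(1, amount + 1):
            C[i][j] = C[i - 1][j] if i > 0 else math.inf   # skip d_i
            if denoms[i] <= j:                   # or use one coin of d_i
                C[i][j] = min(C[i][j], 1 + C[i][j - denoms[i]])
    coins, i, j = [], k - 1, amount              # trace back (Comment 1)
    while j > 0:
        if denoms[i] <= j and C[i][j] == 1 + C[i][j - denoms[i]]:
            coins.append(denoms[i])              # a coin of d_i was chosen
            j -= denoms[i]
        else:
            i -= 1                               # d_i was not chosen here
    return C[k - 1][amount], coins
```

For Example 1.2.4, min_coins([1, 4, 6], 8) reports 2 coins, namely 4 + 4, matching the completed table above.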
1.3 THE PRINCIPLE OF OPTIMALITY

While using the Dynamic Programming technique in solving the coin problem, we
tacitly assumed the above-mentioned principle in defining C[i, j] as the minimum of
1 + C[i, j − di] and C[i − 1, j]. The principle is used so frequently that we are often
not even aware of having used it.
However, we must be aware that there may be situations in which the principle may
not be applicable. The principle of optimality may not be true, specially, in situations
where resources used over the (sub) components may be more than total resources
used over the whole, and where total resources are limited and fixed. A simple
example may explain the point. Suppose a 400-meter race champion
takes 42 seconds to complete the race. However, covering each 100 meters in 10.5
seconds may not be the best/optimal time for 100 meters; the champion may take less
than 10 seconds to cover 100 meters. The reason is that the total resources (here, the
concentration and energy of the athlete) are limited and fixed, whether distributed
over 100 meters or 400 meters.
Similarly, the best performance of a vehicle over 100 miles cannot be thought of
as 10 times its best performance over 10 miles. First of all, fuel usage, after
some lower threshold, increases with speed. Therefore, as the distance to be covered
increases (e.g., from 10 to 100 miles), fuel has to be used more cautiously,
restraining the speed, as compared to when the distance to be covered is less (e.g., 10
miles). Even if refuelling is allowed, refuelling also takes time. The driver's
concentration and energy are other fixed and limited resources, which in the case of
a shorter distance can be used more liberally as compared to longer distances, and
in the process produce better speed over short distances as compared to long
distances. The above discussion is for the purpose of drawing attention to the fact that
the principle of optimality is not universal, specially when the resources are limited
and fixed. Further, it is to draw attention that the Dynamic Programming technique assumes
validity of Principle of Optimality for the problem domain. Hence, while applying
Dynamic Programming technique for solving optimisation problems, in order for the
validity of the solution based on the technique, we need to ensure that the Optimality
Principle is valid for the problem domain.
Ex. 1) Using Dynamic Programming, solve the following problem (well known as
Knapsack Problem). Also write the algorithm that solves the problem.
into pieces∗. In other words, either a whole object is to be included or it has to be
excluded.
(iv) Though, for three or more matrices, matrix multiplication is associative,
the number of scalar multiplications may vary significantly depending upon
how we pair the matrices and their product matrices to get the final product.
Summarizing: the product of the matrices A(14 × 6), B(6 × 90) and C(90 × 4) takes
12600 scalar multiplications when first the product of A and B is computed and then
the product AB is multiplied with C. On the other hand, if the product BC is calculated
first and then the product of A with the matrix BC is taken, then only 2496 scalar
multiplications are required. The latter number is around 20% of the former. When a
large number of matrices are to be multiplied, and the product is defined,
proper parenthesizing through pairing of the matrices may cause dramatic savings in
the number of scalar multiplications.
∗
Another version allows any fraction xi with 0≤xi≤1. However, in this problem, we
assume either xi = 1 or xi = 0.
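The two counts computed above can be checked with a few lines of Python, using the rule that multiplying a p × q matrix by a q × r matrix takes p·q·r scalar multiplications (the function name is ours):

```python
def scalar_mults(p, q, r):
    """Scalar multiplications needed to multiply a (p x q) matrix
    by a (q x r) matrix: p * q * r."""
    return p * q * r

# A is 14 x 6, B is 6 x 90, C is 90 x 4
ab_first = scalar_mults(14, 6, 90) + scalar_mults(14, 90, 4)   # (AB)C
bc_first = scalar_mults(6, 90, 4) + scalar_mults(14, 6, 4)     # A(BC)
```

This reproduces the 12600 versus 2496 comparison in the text.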
This raises the question of how to parenthesize the pairs of matrices within the
expression A1A2 … An, a product of n matrices which is defined, so as to optimize
the computation of the product A1A2 … An. The product is known as Chained
Matrix Multiplication.
Brute-Force Method: One way for finding the optimal method (i.e., the method
which uses minimum number of scalar (numerical) operations) is to parenthesize the
expression A1A2 … An in all possible ways and calculate number of scalar
multiplications required for each way. Then choose the way which requires minimum
number of scalar multiplications.
However, if T(n) denotes the number of ways of putting parentheses for pairing the
expression A1A2 … An, T(n) is an exponentially increasing function. The rate at
which values of T(n) increase may be seen from the following table of values of T(n)
for n = 1, 2, …….
n : 1 2 3 4 5 … 10 … 15
T(n): 1 1 2 5 14 … 4862 … 2674440
Hence, for even moderately large n, it is practically impossible to use this method for
determining how to optimize the computation of the product A1A2 … An.
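The growth of T(n) can be checked directly from the recurrence T(n) = Σ T(k)·T(n − k), summing over the position k of the outermost split. A short computation (in Python, our choice of language, not the text's):

```python
def t(n):
    # T(1) = 1; for n >= 2, split A1...An as (A1...Ak)(Ak+1...An)
    # and sum T(k) * T(n - k) over all split points k.
    memo = {1: 1}

    def rec(m):
        if m not in memo:
            memo[m] = sum(rec(k) * rec(m - k) for k in range(1, m))
        return memo[m]

    return rec(n)

print([t(n) for n in (1, 2, 3, 4, 5, 10, 15)])
# [1, 1, 2, 5, 14, 4862, 2674440]
```

These values are the well-known Catalan numbers, confirming the entries in the table above.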
Thus, the Dynamic Programming technique can be applied to the problem, and is
discussed below:
Let us first define the problem. Let Ai, 1 ≤ i ≤ n, be a di-1 × di matrix. Let the vector
d[0..n] store the dimensions of the matrices, where the dimension of
Ai is di-1 × di for i = 1, 2, …, n. By definition, any subsequence Aj…Ak of
A1A2 … An for 1 ≤ j ≤ k ≤ n is a well-defined product of matrices. Let us consider a
table m[1..n, 1..n] in which the entry mij, for 1 ≤ i ≤ j ≤ n, represents the optimal (i.e.,
minimum) number of operations required to compute the product matrix (Ai … Aj).
We fill up the table diagonal-wise, i.e., in one iteration we fill up one
diagonal mi,i+s at a time, for some constant s ≥ 0. Initially we consider the main
diagonal mii, for which s = 0; then the diagonal mi,i+1 for s = 1, and so on.
Now mii stands for the minimum number of scalar multiplications required to compute
the product of the single matrix Ai, which requires no multiplication at all.
Hence,
mii = 0 for i = 1, 2, … n.
Filling up entries for mi,(i+1) for i = 1, 2, …, (n – 1):
mi,(i+1) denotes the minimum number of scalar multiplications required to find the
product Ai Ai+1. As Ai is a di-1 × di matrix and Ai+1 is a di × di+1 matrix, there is a
unique number of scalar multiplications for computing Ai Ai+1, giving
mi,(i+1) = di-1 di di+1 for i = 1, 2, …, (n − 1).
Next, assuming the optimal numbers of scalar multiplications, viz., mi,j and mj+1,i+s, are
already known for all valid j, we can say that
mi,i+s = min { mi,j + mj+1,i+s + di-1 dj di+s : i ≤ j < i + s } for i = 1, 2, …, n – s,
where the term di-1 dj di+s represents the number of scalar multiplications required to
multiply the resultant matrices (Ai … Aj) and (Aj+1 … Ai+s).
Summing up the discussion, we come to the definition of mi,i+s, for s = 1, 2, …, (n − 1), as
mi,i = 0 for i = 1, 2, …, n, and
mi,i+s = min { mi,j + mj+1,i+s + di-1 dj di+s : i ≤ j < i + s } for i = 1, 2, …, n − s.
Let us illustrate the algorithm to compute mi,i+s discussed above through an example.
A1 of order 14 × 6
A2 of order 6 × 90
A3 of order 90 × 4
A4 of order 4 × 35
For s = 1:
m12 = d0 d1 d2 = 14 × 6 × 90 = 7560
m23 = d1 d2 d3 = 6 × 90 × 4 = 2160
m34 = d2 d3 d4 = 90 × 4 × 35 = 12600
At this stage the table is:

        1      2      3      4
 1      0   7560
 2             0   2160
 3                    0   12600
 4                           0

Next, for s = 2:
m13 = min { m11 + m23 + d0 d1 d3, m12 + m33 + d0 d2 d3 }
    = min { 0 + 2160 + 14 × 6 × 4, 7560 + 0 + 14 × 90 × 4 } = min { 2496, 12600 } = 2496
m24 = min { m22 + m34 + d1 d2 d4, m23 + m44 + d1 d3 d4 }
    = min { 0 + 12600 + 6 × 90 × 35, 2160 + 0 + 6 × 4 × 35 } = min { 31500, 3000 } = 3000
Finally, for s = 3:
m14 = min { m11 + m24 + d0 d1 d4, m12 + m34 + d0 d2 d4, m13 + m44 + d0 d3 d4 }
    = min { 0 + 3000 + 2940, 7560 + 12600 + 44100, 2496 + 0 + 1960 }
    = min { 5940, 64260, 4456 } = 4456.
Hence, the product A1A2A3A4 can be computed using a minimum of 4456 scalar
multiplications.
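The diagonal-wise computation described above can be sketched in Python (d is the dimension vector d[0..n]; the indices follow the text's 1-based convention):

```python
def matrix_chain(d):
    # d[i-1] x d[i] is the dimension of matrix Ai, for i = 1, ..., n.
    n = len(d) - 1
    # m[i][j]: minimum scalar multiplications to compute Ai ... Aj.
    m = [[0] * (n + 1) for _ in range(n + 1)]
    for s in range(1, n):                    # fill the diagonal m[i][i+s]
        for i in range(1, n - s + 1):
            j = i + s
            m[i][j] = min(m[i][k] + m[k + 1][j] + d[i - 1] * d[k] * d[j]
                          for k in range(i, j))
    return m[1][n]

print(matrix_chain([14, 6, 90, 4]))       # the A(BC) example: 2496
print(matrix_chain([14, 6, 90, 4, 35]))   # the four-matrix example: 4456
```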
1.6 SUMMARY
(5) The Knapsack Problem: We are given n objects and a knapsack. For i = 1,
2, …, n, object i has a positive weight wi and a positive value vi. The
knapsack can carry a weight not exceeding W. The problem requires that
the knapsack is filled in a way that maximizes the value of the objects
included in the knapsack.
1.7 SOLUTIONS/ANSWERS
Ex. 1)
First of all, it can be easily verified that the Principle of Optimality is valid here, in
the sense that for an optimal solution of the overall problem, each subsolution is also
optimal. For, in this case, a non-optimal solution of a subproblem, when
replaced by a better solution of the subproblem, would lead to a better-than-optimal
solution of the overall problem, which is a contradiction.
In order to label the rows we first of all, order the given objects according to
increasing relative values R = v/w.
Thus the first object O1 is the one with minimum relative value R1. The object O2 is
the one with the next least relative value R2, and so on. The last object in the
sequence is On, with maximum relative value Rn.
The ith row of the table corresponds to object Oi having the ith relative value,
when values are arranged in increasing order. The jth column corresponds to
weight j for 0 ≤ j ≤ W. The entry Knap[i, j] denotes the maximum value that can
be packed in the knapsack when only the objects O1, O2, …, Oi are used and the
included objects have total weight at most j.
Next, in order to fill up the entries Knap[i, j], 1 ≤ i ≤ n and 0 ≤ j ≤ W, of the table,
we can check, as was done in the coin problem, that
 i \ j   0    1    2    3    4    …    j    …    W
  1      0   v1   v1   v1   v1   …   v1   …   v1
  2      0
  .      .
  i      0
  .      .
  n      0
Further, when calculating Knap[i, j], there are two possibilities: either
(i) the ith object Oi is taken into consideration; then
Knap[i, j] = Knap[i−1, j − wi] + vi, or
(ii) the ith object Oi is not taken into consideration; then
Knap[i, j] = Knap[i−1, j].
Thus we define
Knap[i, j] = max {Knap[i−1, j], Knap[i−1, j − wi] + vi}.
The above equation is valid for i ≥ 2 and j ≥ wi. In order that the above equation may
be applicable otherwise also, without violating the intended meaning, we take
Knap[i, j] = 0 if i = 0 and j ≥ 0, and
Knap[i, j] = −∞ if j < 0.
We explain the Dynamic Programming based solution suggested above, through the
following example.
We are given six objects, whose weights are respectively 1, 2, 5, 6, 7, 10 units and
whose values respectively are 1, 6, 18, 22, 28, 43. Thus relative values in increasing
order are respectively 1.00, 3.00, 3.60, 3.67, 4.00 and 4.30. If we can carry a
maximum weight of 12, then the table below shows that we can compose a load
whose value is 49.
Weights j:                       0   1   2   3   4   5    6    7    8    9    10   11   12
w1 = 1,  v1 = 1,  R1 = 1.00:     0   1   1   1   1   1    1    1    1    1    1    1    1
w2 = 2,  v2 = 6,  R2 = 3.00:     0   1   6   7   7   7    7    7    7    7    7    7    7
w3 = 5,  v3 = 18, R3 = 3.60:     0   1   6   7   7   18   19   24   25   25   25   25   25
w4 = 6,  v4 = 22, R4 = 3.67:     0   1   6   7   7   18   22   24   28   29   29   40   41
w5 = 7,  v5 = 28, R5 = 4.00:     0   1   6   7   7   18   22   28   29   34   35   40   46
w6 = 10, v6 = 43, R6 = 4.30:     0   1   6   7   7   18   22   28   29   34   43   44   49
Algorithm for the solution of the Knapsack Problem explained above:
For i = 1 to n do
begin
    read (Weight[i]);
    read (Value[i]);
    R[i] ← Value[i] / Weight[i]
end
{Sort the objects in increasing order of relative value, by selection sort}
For j = 1 to n do
begin
    k ← j
    For t = j + 1 to n do
        If R[t] < R[k] then
            k ← t
    Exchange (R[j], R[k]);
    Exchange (Weight[j], Weight[k]);
    Exchange (Value[j], Value[k])
end
{At this stage R[1..n] is a sorted array in increasing order, and Weight[j] and Value[j]
are respectively the weight and value for the jth least relative value}
{Next, we complete the table Knap for the problem, under the conventions
Knap[i, j] = 0 if i = 0 and j ≥ 0, and Knap[i, j] = −∞ if j < 0}
For i = 1 to n do
    Knap[i, 0] ← 0
For j = 1 to W do
    If j ≥ Weight[1] then Knap[1, j] ← Value[1] else Knap[1, j] ← 0
For i = 2 to n do
    For j = 1 to W do
        Knap[i, j] ← max {Knap[i−1, j], Knap[i−1, j − Weight[i]] + Value[i]}
Return Knap[n, W]
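The table-filling part of the algorithm translates directly into Python; a sketch (the sorting by relative value is not needed for the correctness of the table, so it is omitted here):

```python
def knapsack(weights, values, W):
    # knap[i][j]: maximum value using only the first i objects,
    # with the included objects weighing at most j.
    n = len(weights)
    knap = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, W + 1):
            knap[i][j] = knap[i - 1][j]            # object i excluded
            if weights[i - 1] <= j:                # object i included, if it fits
                knap[i][j] = max(knap[i][j],
                                 knap[i - 1][j - weights[i - 1]] + values[i - 1])
    return knap[n][W]

# The six objects of the worked example, knapsack capacity 12:
print(knapsack([1, 2, 5, 6, 7, 10], [1, 6, 18, 22, 28, 43], 12))   # 49
```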
Dynamic Programming
1.8 FURTHER READINGS
1. Foundations of Algorithms, R. Neapolitan & K. Naimipour,
(D.C. Heath & Company, 1996).
2. Algorithmics: The Spirit of Computing, D. Harel, (Addison-Wesley
Publishing Company, 1987).
3. Fundamental Algorithms (Second Edition), D.E. Knuth, (Narosa Publishing
House).
4. Fundamentals of Algorithmics, G. Brassard & P. Bratley, (Prentice-Hall
International, 1996).
5. Fundamentals of Computer Algorithms, E. Horowitz & S. Sahni, (Galgotia
Publications).
6. The Design and Analysis of Algorithms, Anany Levitin, (Pearson Education,
2003).
7. Programming Languages (Second Edition) ─ Concepts and Constructs, Ravi
Sethi, (Pearson Education, Asia, 1996).
Design Techniques-II
UNIT 2 GREEDY TECHNIQUES
Structure Page Nos.
2.0 Introduction 22
2.1 Objectives 23
2.2 Some Examples 23
2.3 Formalization of Greedy Technique 25
2.3.1 Function Greedy-Structure (GV: set): Set
2.4 Minimum Spanning Tree 27
2.5 Prim’s Algorithm 31
2.6 Kruskal’s Algorithm 34
2.7 Dijkstra’s Algorithm 38
2.8 Summary 41
2.9 Solutions/Answers 41
2.10 Further Readings 46
2.0 INTRODUCTION
Algorithms based on the Greedy technique are used for solving optimization problems.
An optimization problem is one in which some value (or set of values) of interest is
required to be either minimized or maximized w.r.t. some given relation on the values.
Such problems include maximizing profits or minimizing costs of, say, production of
some goods. Other examples of optimization problems are about
• finding the minimum number of currency notes required for an amount, say of
Rs. 289, where an arbitrary number of currency notes of each denomination from
Rs. 1 to Rs. 100 is available, and
• finding the shortest path covering a number of given cities, where distances
between pairs of cities are given.
As we will study later, algorithms based on the greedy technique, if they exist, are easy
to conceive, implement and explain. However, for many interesting optimization
problems, no algorithm based on the greedy technique yields an optimal solution. In
support of this claim, let us consider the following example:
Example 1.1: Let us suppose that we have to go from city A to city E through either
city B or city C or city D with costs of reaching between pairs of cities as shown
below:
[Figure 2.0.1: cities A, B, C, D and E, with costs (in Rs.) on the routes:
A to B = 3000, A to C = 4000, A to D = 5000, B to E = 8000, C to E = 5000,
D to E = 4000]
Then the greedy technique suggests that we take the route from A to B, the cost of
which, Rs. 3000, is the minimum among the three costs (viz., Rs. 3000, Rs. 4000 and
Rs. 5000) of the available routes.
22
However, at B there is only one route available to reach E. Thus, the greedy algorithm
suggests the route from A to B to E, which costs Rs. 11000. But the route from A to C
to E costs only Rs. 9000; also, the route from A to D to E costs Rs. 9000.
Thus, a locally better solution, suggested by the greedy technique at some stage, may
yield an overall (or globally) costlier solution.
2.1 OBJECTIVES
After studying this unit, you should be able to:

2.2 SOME EXAMPLES

Example 2.2.1
In this example, we discuss how, intuitively, we attempt to solve the Minimum
Number of Notes Problem, to be specific, to make up an amount of Rs. 289.
Solution: Intuitively, to begin with, we pick up a note of denomination D, satisfying
the conditions.
i) D ≤ 289 and
ii) if D1 is another denomination of a note such that D1 ≤ 289, then D1 ≤ D.
In other words, the picked-up note’s denomination D is the largest among all the
denominations satisfying condition (i) above.
To deliver Rs. 289 with minimum number of currency notes, the notes of different
denominations are chosen and rejected as shown below:
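The intuitive procedure just described can be sketched as follows (assuming, for illustration, that notes of denominations Rs. 1, 2, 5, 10, 20, 50 and 100 are available):

```python
def greedy_notes(amount, denominations):
    # Repeatedly pick the largest denomination not exceeding what remains.
    chosen = []
    remaining = amount
    for d in sorted(denominations, reverse=True):
        while d <= remaining:
            chosen.append(d)
            remaining -= d
    return chosen if remaining == 0 else None   # None: greedy got stuck

print(greedy_notes(289, [1, 2, 5, 10, 20, 50, 100]))
# [100, 100, 50, 20, 10, 5, 2, 2]  -- 8 notes
```

The later examples of this section can be replayed with the same function: greedy_notes(90, [20, 30, 50]) returns None (the greedy technique fails), and greedy_notes(80, [10, 40, 60]) returns [60, 10, 10], which is not optimal.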
Example 2.2.2
Next, we consider an example in which for a given amount A and a set of available
denominations, the greedy algorithm does not provide a solution, even when a
solution by some other method exists.
Let us consider a hypothetical country in which notes available are of only the
denominations 20, 30 and 50. We are required to collect an amount of 90.
Solution: Using the greedy technique:
i) First, we pick up a note of denomination 50, the largest denomination not
exceeding 90, so that the sum of the picked-up notes is 50.
ii) We cannot pick up another note of denomination 50, because then the sum of
the denominations of the picked-up notes becomes 100, which is greater than
90. Therefore, we do not pick up any note of denomination 50 or above.
iii) Therefore, we pick up a note of the next denomination, viz., of 30. The amount
made up by the sum of the denominations 50 and 30 is 80, which is less than
90. Therefore, we accept a note of denomination 30.
iv) Again, we cannot pick up another note of denomination 30, because
otherwise the sum of the denominations of the picked-up notes becomes 80 + 30 = 110,
which is more than 90. Therefore, we do not pick up any note of
denomination 30 or above.
v) Next, we attempt to pick up a note of the next denomination, viz., 20. But, in that
case, the sum of the denominations of the picked-up notes becomes 80 + 20 = 100,
which is again greater than 90. Therefore, we do not pick up any note of
denomination 20 or above.
vi) Next, we attempt to pick up a note of a still lesser denomination. However,
no lesser denominations are available.
Thus, the greedy technique fails to make up the amount of 90, even though a
solution exists: picking up 3 notes, each of denomination 30, makes up the amount
of 90 exactly.
Example 2.2.3
Next, we consider an example, in which the greedy technique, of course, leads to a
solution, but the solution yielded by greedy technique is not optimal.
Again, we consider a hypothetical country in which notes available are of the only
denominations 10, 40 and 60. We are required to collect an amount of 80.
Using the greedy technique, to make up an amount of 80, first we use a note of
denomination 60. For the remaining amount of 20, we can choose a note of only
denomination 10. And, finally, for the remaining amount of 10, we choose another
note of denomination 10. Thus, the greedy technique suggests the following solution
using 3 notes: 80 = 60 + 10 + 10.
However, this solution is not optimal: only 2 notes suffice, viz., 80 = 40 + 40.
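For contrast, the optimal number of notes can be found by dynamic programming over the amounts 0, 1, …, A; a sketch of the "coin problem" approach referred to later in this unit:

```python
def min_notes(amount, denominations):
    # best[a]: fewest notes needed to make exactly the amount a.
    INF = float('inf')
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for d in denominations:
            if d <= a and best[a - d] + 1 < best[a]:
                best[a] = best[a - d] + 1
    return best[amount] if best[amount] < INF else None   # None: impossible

print(min_notes(80, [10, 40, 60]))   # 2, i.e., 80 = 40 + 40
print(min_notes(90, [20, 30, 50]))   # 3, i.e., 90 = 30 + 30 + 30
```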
Ex.1) Give another example in which greedy technique fails to deliver an optimal
solution.
mentioned. Otherwise, it is assumed that each candidate value can be used as
many times as required for the solution using greedy technique. Let us call this
set as
GV: Set of Given Values
(ii) Set (rather multi-set) of considered and chosen values: This structure
contains those candidate values, which are considered and chosen by the
algorithm based on greedy technique to reach a solution. Let us call this
structure as
CV: Structure of Chosen Values
The structure is generally not a set but a multi-set in the sense that values may
be repeated. For example, in the case of Minimum Number of Notes problem,
if the amount to be collected is Rs. 289 then
CV = {100, 100, 50, 20, 10, 5, 2, 2}
(iii) Set of Considered and Rejected Values: As the name suggests, this is the set
of all those values, which are considered but rejected. Let us call this set as
RV: Set of considered and Rejected Values
A candidate value may belong to both CV and RV. But, once a value is put in
RV, this value cannot be put any more in CV. For example, to make an
amount of Rs. 289, once we have chosen two notes, each of denomination 100,
we have
CV = {100, 100}
At this stage, we have collected Rs. 200 out of the required Rs. 289, and
RV = {1000, 500}. So, we can choose a note of any denomination except
those in RV, i.e., except 1000 and 500. Thus, at this stage, we can choose a note
of denomination 100. However, this choice of 100 again will make the total
amount collected so far Rs. 300, which exceeds Rs. 289. Hence we reject
the choice of 100 a third time and put 100 in RV, so that now RV = {1000, 500,
100}. From this point onward, we cannot choose even denomination 100.
addition of 100 to the values already in CV, the total value becomes 300 which
exceeds 289, the value 100 is rejected and put in RV. Next, the function SelF
attempts the next lower denomination 50. The value 50 when added to the sum
of values in CV gives 250, which is less than 289. Hence, the value 50 is
returned by the function SelF.
(vi) The Feasibility-Test Function, say FeaF. When a new value say v is chosen
by the function SelF, then the function FeaF checks whether the new set,
obtained by adding v to the set CV of already selected values, is a possible part
of the final solution. Thus in the case of Minimum Number of Notes problem,
if amount to be collected is Rs. 289 and at some stage, CV = {100, 100}, then
the function SelF returns 50. At this stage, the function FeaF takes the control.
It adds 50 to the sum of the values in CV, and on finding that the sum 250 is
less than the required value 289 informs the main/calling program that {100,
100, 50} can be a part of some final solution, and needs to be explored further.
(vii) The Objective Function, say ObjF, gives the value of the solution. For
example, in the case of the problem of collecting Rs. 289; as CV = {100, 100,
50, 20, 10, 5, 2, 2} is such that sum of values in CV equals the required value
289, the function ObjF returns the number of notes in CV, i.e., the number 8.
After having introduced a number of sets and functions that may be required by
an algorithm based on the greedy technique, we give below the outline of the greedy
technique, say Greedy-Structure. For any actual algorithm based on the greedy
technique, the various structures and functions discussed above have to be
replaced by actual structures and functions.
These depend upon the problem under consideration. The Greedy-
Structure outlined below takes the set GV of given values as input parameter
and returns CV, the set of chosen values. For developing any algorithm based
on the greedy technique, the following function outline will be used.
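As an illustration, the outline can be rendered as the following Python sketch; the parameter names mirror the functions SolF, SelF, FeaF and ObjF described above, and the instantiation at the bottom (for the Minimum Number of Notes problem) is our own:

```python
def greedy_structure(gv, solf, self_fn, feaf, objf):
    # CV: multiset of chosen values; RV: set of considered-and-rejected values.
    cv, rv = [], set()
    while not solf(cv):
        v = self_fn(gv, cv, rv)   # most promising candidate not yet rejected
        if v is None:
            return None           # no candidate left: greedy fails
        if feaf(cv, v):           # can CV + [v] be part of a final solution?
            cv.append(v)
        else:
            rv.add(v)             # once rejected, never chosen again
    return cv, objf(cv)

# Instantiation for the Minimum Number of Notes problem, amount Rs. 289:
AMOUNT = 289
GV = {1000, 500, 100, 50, 20, 10, 5, 2, 1}
result = greedy_structure(
    GV,
    solf=lambda cv: sum(cv) == AMOUNT,
    self_fn=lambda gv, cv, rv: max((d for d in gv if d not in rv), default=None),
    feaf=lambda cv, v: sum(cv) + v <= AMOUNT,
    objf=len)
print(result)   # ([100, 100, 50, 20, 10, 5, 2, 2], 8)
```

Tracing this instantiation reproduces the discussion above: 1000 and 500 are rejected, 100 is chosen twice and then rejected, and the final CV is {100, 100, 50, 20, 10, 5, 2, 2} with ObjF returning 8.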
2.4 MINIMUM SPANNING TREE

Definitions
A Spanning tree of a connected graph, say G = (V, E) with V as set of vertices and E
as set of edges, is its connected acyclic subgraph (i.e., a tree) that contains all the
vertices of the graph.
A minimum spanning tree of a weighted connected graph is its spanning tree of the
smallest weight, where the weight of a tree is defined as the sum of the weights on all
its edges.
The minimum spanning tree problem has a number of useful applications in the
following type of situations:
Suppose we are given a set of cities along with the distances between each pair of
cities. In view of a shortage of funds, it is desired that, instead of connecting
each pair of cities directly, we provide roads, costing least, but allowing passage
between any pair of cities along the provided roads. However, the road between some
pair of cities may not be direct, but may pass through a number of other cities.
Next, we illustrate the concept of spanning tree and minimum spanning tree through
the following example.
[Figure 2.4.1: a weighted connected graph G on the vertices a, b, c, d, with visible
edge weights ab = 1, ac = 5, ad = 2 and cd = 3]
For the graph of Figure 2.4.1 given above, each of Figures 2.4.2, 2.4.3 and
2.4.4 shows a spanning tree, viz., T1, T2 and T3 respectively.
Out of these, T1 is a minimal spanning tree of G, of weight 1 + 2 + 3 = 6.
[Figure 2.4.2: spanning tree T1, with edges of weights 1, 2 and 3]
[Figure 2.4.3: spanning tree T2]
[Figure 2.4.4: spanning tree T3, with edges of weights 1, 5 and 2]
Remark 2.4.1:
The weight may denote (i) length of an edge between pair of vertices or (ii) the cost of
reaching from one town to the other or (iii) the cost of production incurred in reaching
from one stage of production to the immediate next stage of production or (iv) the cost
of construction of a part of a road or of laying telephone lines between a pair of towns.
Remark 2.4.2:
The weights on edges are generally positive. However, in some situations the weight
of an edge may be zero or even negative. A negative weight may appear
appropriate when the problem under consideration is not about 'minimizing costs' but
about 'maximizing profits' and we still want to use minimum spanning tree
algorithms. However, in such cases, it is not appropriate to use negative weights,
because the more we traverse a negative-weight edge, the lesser the cost becomes,
whereas with repeated traversals of edges the cost should increase instead of decreasing.
we need to find appropriate values of the various sets and functions discussed in
Section 2.3.
In the case of the problem of finding minimum-spanning tree for a given
connected graph, the appropriate values are as follows:
(i) GV: The set of candidate or given values is given by
GV = E, the set of edges of the given graph (V, E).
(ii) CV: The structure of chosen values is given by those edges from E, which
together will form the required minimum-weight spanning tree.
(iii) RV: set of rejected values will be given by those edges in E, which at some
stage will form a cycle with earlier selected edges.
(iv) In the case of the problem of minimum spanning tree, the function SolF
that checks whether a solution is reached or not, is the function that checks
that
(a) all the edges in CV form a tree,
(b) the set of vertices of the edges in CV equals V, the set of all vertices of the
graph, and
(c) the sum of the weights of the edges in CV is the minimum possible among
sets of edges which satisfy (a) and (b) above.
(v) Selection Function: depends upon the particular algorithm used for the
purpose. There are two well-known algorithms, viz., Prim’s algorithm and
Kruskal’s algorithm for finding the Minimum Spanning Tree. We will
discuss these algorithms in detail in subsequent sections.
(vi) FeaF: Feasibility Test Function: In this case, when the selection function
SelF returns an edge depending on the algorithm, the feasibility test function
FeaF will check whether the newly found edge forms a cycle with the earlier
selected edges. If the new edge actually forms a cycle, then generally the newly
found edge is dropped and the search for still another edge starts. However, in
some of the algorithms, it may happen that some earlier chosen edge is
dropped instead.
(vii) In the case of Minimum Spanning Tree problem, the objective function may
return
(a) the set of edges that constitute the required minimum spanning tree and
(b) the weight of the tree selected in (a) above.
Ex. 2) Find a minimal spanning tree of the following graph:
[Figure: a weighted graph, with visible edge weights 5, 2, 3, 4 and 8]
2.5 PRIM’S ALGORITHM
The algorithm due to Prim builds up a minimum spanning tree by adding edges to
form a sequence of expanding subtrees. The sequence of subtrees is represented by the
pair (VT, ET), where VT and ET respectively represent the set of vertices and the set of
edges of a subtree in the sequence. Initially, the subtree, in the sequence, consists of
just a single vertex which is selected arbitrarily from the set V of vertices of the given
graph. The subtree is built-up iteratively by adding an edge that has minimum weight
among the remaining edges (i.e., edge selected greedily) and, which at the same time,
does not form a cycle with the earlier selected edges.
Example 2.5.1: Let us illustrate Prim's algorithm through the following graph:
[Figure 2.5.1: a weighted connected graph on the vertices a, b, c, d, e, with edge
weights ab = 1, ac = 5, ad = 2, cd = 3 and de = 1.5]
In the first iteration, the edge having weight which is the minimum of the weights
of the edges having a as one of its vertices, is chosen. In this case, the edge ab with
weight 1 is chosen out of the edges ab, ac and ad of weights respectively 1,5 and 2.
Thus, after First iteration, we have the given graph with chosen edges in bold and
VT and ET as follows:
VT = (a, b)
ET = ((a, b))
[Figure 2.5.2: the graph, with the chosen edge ab shown in bold]
In the next iteration, out of the edges, not chosen earlier and not making a cycle with
earlier chosen edge and having either a or b as one of its vertices, the edge with
minimum weight is chosen. In this case the vertex b does not have any edge
originating out of it. In such cases, if required, weight of a non-existent edge may be
taken as ∞. Thus choice is restricted to two edges viz., ad and ac respectively of
weights 2 and 5. Hence, in the next iteration the edge ad is chosen. Hence, after
second iteration, we have the given graph with chosen edges and VT and ET as
follows:
VT = (a, b, d)
ET = ((a, b), (a, d))
[Figure 2.5.3: the graph, with the chosen edges ab and ad shown in bold]
In the next iteration, out of the edges, not chosen earlier and not making a cycle with
earlier chosen edges and having either a, b or d as one of its vertices, the edge with
minimum weight is chosen. Thus choice is restricted to edges ac, dc and de with
weights respectively 5, 3, 1.5. The edge de with weight 1.5 is selected. Hence, after
third iteration we have the given graph with chosen edges and VT and ET as
follows:
VT = (a, b, d, e)
ET = ((a, b), (a, d), (d, e))
[Figure 2.5.4: the graph, with the chosen edges ab, ad and de shown in bold]
In the next iteration, out of the edges, not chosen earlier and not making a cycle with
earlier chosen edge and having either a, b, d or e as one of its vertices, the edge with
minimum weight is chosen. Thus, choice is restricted to edges dc and ac with weights
respectively 3 and 5. Hence the edge dc with weight 3 is chosen. Thus, after fourth
iteration, we have the given graph with chosen edges and VT and ET as follows:
VT = (a, b, d, e, c)
ET = ((a, b), (a, d), (d, e), (d, c))
[Figure 2.5.5: the graph, with all the chosen edges shown in bold]
At this stage, it can be easily seen that each of the vertices, is on some chosen edge
and the chosen edges form a tree.
Given below is the semiformal definition of Prim’s Algorithm
Algorithm Spanning-Prim (G)
// The algorithm constructs a minimum spanning tree,
// for which the input is a weighted connected graph G = (V, E).
// The output is the set of edges, denoted by ET, which together constitute a
// minimum spanning tree of the given graph G.
// For pairs of vertices that are not adjacent in the graph, the weight label ∞,
// indicating 'infinite' distance between the pair of vertices, can be used.
// The set of vertices of the required tree is initialized with an arbitrary vertex v0.
VT ← { v0 }
ET ← φ // initially ET is empty
// let n = number of vertices in V
For i = 1 to n − 1 do
begin
    find a minimum-weight edge e = (v1, u1) among all the edges such that
    v1 is in VT and u1 is in V − VT
    VT ← VT ∪ { u1 }
    ET ← ET ∪ { e }
end
Return ET
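A Python sketch of Spanning-Prim, using a priority queue in place of the linear minimum-edge search (the adjacency list below encodes the graph of Example 2.5.1):

```python
import heapq

def prim(graph, start):
    # graph: dict vertex -> list of (neighbor, weight); returns the MST edge list.
    visited = {start}
    edges = [(w, start, u) for u, w in graph[start]]
    heapq.heapify(edges)
    mst = []
    while edges and len(visited) < len(graph):
        w, v, u = heapq.heappop(edges)   # minimum-weight edge leaving VT
        if u in visited:
            continue                     # would form a cycle: skip
        visited.add(u)
        mst.append((v, u, w))
        for x, wx in graph[u]:
            if x not in visited:
                heapq.heappush(edges, (wx, u, x))
    return mst

g = {'a': [('b', 1), ('c', 5), ('d', 2)],
     'b': [('a', 1)],
     'c': [('a', 5), ('d', 3)],
     'd': [('a', 2), ('c', 3), ('e', 1.5)],
     'e': [('d', 1.5)]}
print(prim(g, 'a'))
# [('a', 'b', 1), ('a', 'd', 2), ('d', 'e', 1.5), ('d', 'c', 3)]
```

The edges are chosen in exactly the order traced in the iterations above: ab, ad, de and finally dc.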
Ex. 3) Using Prim's algorithm, find a minimal spanning tree for the graph given
below:
[Figure: a weighted graph on the vertices a, b, c, d, e, with visible edge weights
1, 5, 2, 3, 4 and 8]
2.6 KRUSKAL’S ALGORITHM
Next, we discuss another method, of finding minimal spanning tree of a given
weighted graph, which is suggested by Kruskal. In this method, the emphasis is on
the choice of edges of minimum weight from amongst all the available edges, of
course, subject to the condition that chosen edges do not form a cycle.
The connectivity of the chosen edges, at any stage, in the form of a subtree, which was
emphasized in Prim’s algorithm, is not essential.
We briefly describe the Kruskal’s algorithm to find minimal spanning tree of a given
weighted and connected graph, as follows:
(i) First of all, order all the weights of the edges in increasing order. Then repeat
the following two steps till a set of edges is selected containing all the vertices
of the given graph.
(ii) Choose an edge having the weight which is the minimum of the weights of the
edges not selected so far.
(iii) If the new edge forms a cycle with any subset of the earlier selected edges, then
drop it, else, add the edge to the set of selected edges.
Example 2.6.1:
Let us consider the following graph, for which the minimal spanning tree is required.
[Figure 2.6.1: a weighted connected graph on the vertices a, b, c, d, e, with edge
weights ab = 1, cd = 3, ad = 4.2, ac = 5 and ed = 6]
Let Eg denote the set of edges of the graph that are chosen upto some stage.
According to the step (i) above, the weights of the edges are arranged in increasing
order as the set
{1, 3, 4.2, 5, 6}
In the first iteration, the edge (a,b) is chosen, which is of weight 1, the minimum of
all the weights of the edges of the graph.
As a single edge does not form a cycle, the edge (a,b) is selected, so that
Eg = ((a,b))
After the first iteration, the graph with selected edges in bold is as shown below:
[Figure 2.6.2: the graph, with the edge (a,b) shown in bold]
Second Iteration
Next, the edge (c,d) is of weight 3, the minimum among the remaining edges. Also,
the edges (a,b) and (c,d) do not form a cycle. Therefore, (c,d) is selected, so that
Eg = ((a,b), (c,d))
Thus, after the second iteration, the graph with selected edges in bold is as shown below:
[Figure 2.6.3: the graph, with the edges (a,b) and (c,d) shown in bold]
It may be observed that the selected edges do not form a connected subgraph or
subtree of the given graph.
Third Iteration
Next, the edge (a,d) is of weight 4.2, the minimum among the remaining edges. Also,
the edges in Eg along with the edge (a,d) do not form a cycle. Therefore, (a,d) is selected,
so that the new Eg = ((a,b), (c,d), (a,d)). Thus, after the third iteration, the graph with
selected edges in bold is as shown below:
[Figure 2.6.4: the graph, with the edges (a,b), (c,d) and (a,d) shown in bold]
Fourth Iteration
Next, the edge (a,c) is of weight 5, the minimum among the remaining edges. However,
the edge (a,c) forms a cycle with two edges in Eg, viz., (a,d) and (c,d). Hence, (a,c) is
not selected and hence not considered as a part of the to-be-found spanning tree.
[Figure 2.6.5: the edge (a,c) forms a cycle with (a,d) and (c,d)]
At the end of the fourth iteration, the graph with selected edges in bold remains the same
as at the end of the third iteration:
[Figure 2.6.6: same as Figure 2.6.4]
36
Fifth Iteration Greedy Techniques
Next, the edge (e,d), the only remaining edge that can be considered, is considered.
As (e,d) does not form a cycle with any of the edges in Eg. Hence the edge (e,d) is put
in Eg. The graph at this stage, with selected edge in bold is as follows.
Error!
1
a b
5 4.2
c d
3
e
Figure: 2.6.7
At this stage, we find that each of the vertices of the given graph is a vertex of
some edge in Eg. Further, we observe that the edges in Eg form a tree and hence form
the required spanning tree. Also, from the choice of the edges in Eg, it is clear that the
spanning tree is of minimum weight. Next, we consider a semi-formal definition of
Kruskal's algorithm.
Summary of Kruskal's Algorithm
(i) θ(a log a) time is required for sorting the edges in increasing order of lengths,
where a denotes the number of edges.
(ii) An efficient Union-Find implementation takes (2a) find operations and (n − 1)
merge operations, where n denotes the number of vertices.
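A Python sketch of Kruskal's algorithm with a simple Union-Find structure (the edge list below encodes the graph of Example 2.6.1):

```python
def kruskal(vertices, edges):
    # edges: list of (weight, u, v); returns the list of chosen edges.
    parent = {v: v for v in vertices}

    def find(x):                     # find the component root, compressing paths
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):    # consider edges in increasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                 # different components: no cycle is formed
            parent[ru] = rv          # merge the two components
            mst.append((u, v, w))
    return mst

edges = [(1, 'a', 'b'), (3, 'c', 'd'), (4.2, 'a', 'd'), (5, 'a', 'c'), (6, 'e', 'd')]
print(kruskal('abcde', edges))
# [('a', 'b', 1), ('c', 'd', 3), ('a', 'd', 4.2), ('e', 'd', 6)]
```

The edge (a,c) of weight 5 is considered but rejected, exactly as in the fourth iteration of the example.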
Ex. 4) Using Kruskal's algorithm, find a minimal spanning tree for the following
graph:
[Figure: a weighted graph on the vertices a, b, c, d, e, with visible edge weights
5, 4, 2, 3, 1 and 8]
2.7 DIJKSTRA'S ALGORITHM

Actually, the notation (a,b) in mathematics is used for an ordered pair of two
elements, viz., a and b, in which a comes first and then b follows; the ordered
pair (b,a) denotes a different ordered pair in which b comes first and then a follows.
However, we have misused the notation so far, in the sense that we used the notation
(a,b) to denote an unordered set of two elements, i.e., a set in which the order of
occurrence of a and b does not matter. In mathematics, the usual notation for an
unordered set is {a,b}. In this section, we use parentheses (i.e., ( and )) to denote
ordered sets and braces (i.e., { and }) to denote a general (i.e., unordered) set.
Definition:
A directed graph or digraph is a pair G = (V(G), E(G)), where V(G) denotes the set of
vertices of G and E(G) the set of directed edges, also called arcs, of G. An arc from a
to b is denoted as (a, b). Graphically it is denoted as
a → b,
in which the arrow indicates the direction. In the above case, the vertex a is sometimes
called the tail and the vertex b the head of the arc or directed edge.
Definition:
A Weighted Directed Graph is a directed graph in which each arc has an assigned
weight. A weighted directed graph may be denoted as G = (V(G), E(G)), where any
element of E(G) may be of the form (a,b,w) where w denotes the weight of the arc
(a,b). The directed graph G = ({a, b, c, d, e}, {(b, a, 3), (b, d, 2), (a, d, 7), (c, b, 4),
(c, d, 5), (d, e, 4), (e, c, 6)}) is diagrammatically represented as follows:
[Figure 2.7.1: diagram of the weighted directed graph G]
// and Distance D(v) of any other vertex v is taken as ∞.
// Iteratively distances of other vertices are modified taking into consideration the
// minimum distances of the various nodes from the node with most recently modified
// distance
D(s) ← 0
For each vertex v ≠ s do
D (v) ← ∞
// Let Set-Remaining-Nodes be the set of all those nodes for which the final minimum
// distance is yet to be determined. Initially
Set-Remaining-Nodes ← V
while (Set-Remaining-Nodes ≠ φ) do
begin
choose v ∈ Set-Remaining-Nodes such that D(v) is minimum
Set-Remaining-Nodes ← Set-Remaining-Nodes ∼ {v}
For each node x ∈ Set-Remaining-Nodes such that w(v, x) ≠ ∞ do
D(x) ← min {D(x), D(v) + w (v, x)}
end
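A Python sketch of the algorithm above, with Set-Remaining-Nodes managed as a priority queue keyed by D(v); the arc directions in the adjacency list below are our reading of the worked example that follows:

```python
import heapq

def dijkstra(graph, source):
    # graph: dict node -> list of (neighbor, weight);
    # returns the minimum distances of all nodes from the source.
    dist = {v: float('inf') for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    done = set()
    while heap:
        d, v = heapq.heappop(heap)   # node with minimum D(v)
        if v in done:
            continue
        done.add(v)
        for x, w in graph[v]:
            if d + w < dist[x]:      # D(x) <- min { D(x), D(v) + w(v, x) }
                dist[x] = d + w
                heapq.heappush(heap, (dist[x], x))
    return dist

g = {'a': [('b', 3), ('d', 7)], 'b': [('c', 4), ('d', 2)],
     'c': [('d', 5)], 'd': [('e', 4)], 'e': [('c', 6)]}
print(dijkstra(g, 'a'))
# {'a': 0, 'b': 3, 'c': 7, 'd': 5, 'e': 9}
```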
Example 2.7.1
For the purpose, let us take the following graph in which, we take a as the source
[Figure 2.7.2: the weighted directed graph of Figure 2.7.1]
Step            Additional node    Set-of-Remaining-Nodes    Distances from a of b, c, d, e
Initialization  a                  (b, c, d, e)              [3, ∞, 7, ∞]
1               b                  (c, d, e)                 [3, 3 + 4, 3 + 2, ∞]
2               d                  (c, e)                    [3, 7, 5, 5 + 4]
3               c                  (e)                       [3, 7, 5, 9]
For minimum distance from a, the node b is directly accessed; the node c is accessed
through b; the node d is accessed through b; and the node e is accessed through b and
d.
Ex. 5) Using Dijkstra’s algorithm, find the minimum distances of all the nodes from
node b which is taken as the source node, for the following graph.
[Figure: a weighted directed graph on the vertices a, b, c, d, e, with visible arc
weights 6, 4, 2, 3, 1, 1 and 2]
2.8 SUMMARY
In this unit, we have discussed the greedy technique, the essence of which is: in the
process of solving an optimization problem, initially and at subsequent stages,
evaluate the costs/benefits of the various available alternatives for the next step.
Choose the alternative which is locally optimal, in the sense that either it is the least
costly or it is the maximum-profit yielding. In this context, it may be noted that
the overall solution yielded by choosing locally optimal steps may not be optimal.
Next, the well-known algorithms, viz., Prim's and Kruskal's, that use the greedy
technique to find spanning trees for connected graphs, are discussed. Also, Dijkstra's
algorithm for solving the Single-Source Shortest Path problem, again using the
greedy technique, is discussed.
2.9 SOLUTIONS/ANSWERS
Ex.1)
Consider the following graph, in which vertices/nodes represent cities of a
country and each edge denotes a road between the cities denoted by the
vertices of the edge. The label on each edge denotes the distance in 1000
kilometers between the relevant cities. The problem is to find an optimal path
from A1 to A4.
[diagram: the graph has edges (A1, A2) = 5, (A2, A4) = 3, (A1, A4) = 9, (A1, A3) = 2
and (A3, A4) = 7]
Then the greedy technique suggests the route A1, A3, A4 of length 9000 kilometers,
whereas the optimal path A1, A2, A4 is of length 8000 kilometers only.
Ex.2)
[diagram: the required minimal spanning tree on the vertices a, b, c, d]
Ex.3)
The student should include the explanation on the lines of Example 2.5.1.
However, the steps and stages in the process of solving the problem are as
follows.
Initially
VT = (a)
ET = φ
In the following figures, the edges in bold denote the chosen edges.

After Second Iteration
VT = (a, b, d)
ET = ((a, b), (a, d))

After Third Iteration
VT = (a, b, c, d)
ET = ((a, b), (a, d), (c, d))
After Fourth Iteration
VT = (a, b, c, d, e)
ET = ((a, b), (a, d), (c, d), (c, e))
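The iterations above follow Prim's algorithm: repeatedly add the cheapest edge joining a vertex in VT to one outside it. A minimal Python sketch, run here on a small hypothetical graph (not the exercise's figure, whose edge weights are only partly legible):

```python
import heapq

def prim(graph, start):
    """Prim's algorithm: grow a minimal spanning tree from `start`.
    graph maps each node to a dict {neighbour: weight} (undirected)."""
    visited = {start}                       # the set VT
    tree = []                               # the set ET, as (u, v, weight) triples
    heap = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(heap)
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)       # cheapest edge leaving the tree
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v, w))
        for x, wx in graph[v].items():
            if x not in visited:
                heapq.heappush(heap, (wx, v, x))
    return tree

# Hypothetical undirected graph (each edge listed in both directions):
g = {"a": {"b": 1, "c": 4},
     "b": {"a": 1, "c": 2},
     "c": {"a": 4, "b": 2, "d": 3},
     "d": {"c": 3}}

mst = prim(g, "a")
print(mst)   # [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 3)]
```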
Ex. 4)
The student should include the explanation on the lines of Example 2.6.1.
However, the steps and stages in the process of solving the problem are as
follows:
[diagrams: the graph after the first and second iterations]
After Third Iteration
After Fourth Iteration
We cannot take the edge (a, c), because it forms a cycle with (a, d) and (c, d).

After Fifth Iteration
Eg = ((c, e), (a, d), (c, d), (a, b))
Now, all the vertices of the graph lie on the above four edges, and these edges form a
tree, which is the required minimal spanning tree.
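Ex. 4 is Kruskal's algorithm, and the cycle test it mentions is exactly what a union-find (disjoint-set) structure provides. A Python sketch, assuming the edge weights read off the figures: (a, b) = 5, (a, c) = 4, (a, d) = 2, (c, d) = 3, (c, e) = 1, (d, e) = 8. With these weights the sketch reproduces the choices above:

```python
def kruskal(nodes, edges):
    """Kruskal's algorithm; edges is a list of (weight, u, v) triples."""
    parent = {v: v for v in nodes}          # union-find forest
    def find(v):                            # representative of v's component
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    tree = []
    for w, u, v in sorted(edges):           # consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                        # no cycle: the edge joins two components
            parent[ru] = rv
            tree.append((u, v, w))
        # otherwise the edge is rejected, like (a, c) in Ex. 4
    return tree

edges = [(5, "a", "b"), (4, "a", "c"), (2, "a", "d"),
         (3, "c", "d"), (1, "c", "e"), (8, "d", "e")]
print(kruskal("abcde", edges))
# [('c', 'e', 1), ('a', 'd', 2), ('c', 'd', 3), ('a', 'b', 5)]
```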
Ex. 5)
A copy of the graph is given below
[diagram: the graph of Ex. 5 on vertices a, b, c, d, e with edge labels 6, 4, 2, 3, 1, 1, 2]
Step             Additional   S = Set-of-        Distances from source of
                 node         Remaining-Nodes    a, c, d, e
Initialization   b            (a, c, d, e)       [6, ∞, ∞, 1]
3                d            (a)                [5, 2, 3, 1]
For minimum distance from b, node a is accessed through d and e; node c is accessed
through e; node d is accessed through e and node e is accessed directly.
UNIT 3 MODELS FOR EXECUTING ALGORITHMS-I: FA
Structure Page Nos.
3.0 Introduction 47
3.1 Objectives 47
3.2 Regular Expressions 47
3.2.1 Introduction to Defining of Languages
3.2.2 Kleene Closure Definition
3.2.3 Formal Definition of Regular Expressions
3.2.4 Algebra of Regular Expressions
3.3 Regular Languages 53
3.4 Finite Automata 54
3.4.1 Definition
3.4.2 Another Method to Describe FA
3.5 Summary 59
3.6 Solutions/Answers 59
3.7 Further Readings 60
3.0 INTRODUCTION
In the earlier two blocks and in units 1 and 2 of this block, we discussed a number of
issues and techniques about designing algorithms. However, there are a number of
problems for which no algorithmic solution exists. Examples of such problems will be
provided in unit 2 of block 4. Many of these examples are drawn from the well-known
models of computation, viz., finite automata, push-down automata and Turing
machines. In this unit, we discuss the topic of Finite Automata.
3.1 OBJECTIVES
After studying this unit, you should be able to:
Alphabet: A finite set of symbols/characters. We generally denote an alphabet by Σ.
If we consider an alphabet having only one letter, say, the letter z, then Σ = {z}.
Letter: Each symbol of an alphabet may also be called a letter of the alphabet or
simply a letter.
Example 2: If the word zzz is called c and the word zz is called d, then the word
formed by concatenating c and d is
cd = zzzzz
When two words in our language L1 are concatenated they produce another word in
the language L1. However, this may not be true in all languages.
Note: The alphabet for L2 is the same as the alphabet for L1.
Example 4: A language L3 may denote the language having strings of even lengths,
including the string of length 0. In other words, L3 = {∧, zz, zzzz, …}.
In the above description of concatenation, we find that for a single-letter alphabet,
when we concatenate c with d, we get the same word as when we concatenate d with c,
that is, cd = dc. But this relationship does not hold for all languages. For example,
in the English language, when we concatenate “Ram” and “goes” we get “Ram goes”.
This is, indeed, a word but distinct from “goes Ram”.
Now, let us define the reverse of a language L. If c is a word in L, then reverse (c) is
the same string of letters spelled backward.
The reverse (L) = {reverse (w), w∈L}
Let us define a new language called PALINDROME over the alphabet Σ = {a,b}.
For a given alphabet Σ, the language L consists of all possible strings, including the
null string.
Example 7: If Σ = {0, 1}, then, Σ* = {∧, 0, 1, 00, 01, 10, 11, 000, 001 …..}
So, we can say that Kleene star is an operation that makes an infinite language of
strings of letters out of an alphabet, if the alphabet Σ ≠ φ. However, by the
definition, the alphabet Σ may also be φ; in that case, Σ* = {∧} is finite. By
“infinite language”, we mean a language with infinitely many words.
Now, we can generalise the use of the star operator to languages, i.e., to a set of
words, not just sets of alphabet letters.
Definition: If s is a set of words, then by s* we mean the set of all finite strings
formed by concatenating words from s, where any word may be used as often.
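The definition of s* can be made concrete with a short sketch that enumerates all strings of s* up to a given length (assuming the words of s are non-empty, so the enumeration terminates):

```python
def star(s, max_len):
    """All strings of s* with length <= max_len, where s is a set of
    non-empty words; each word may be used as often as desired."""
    result = {""}                  # the null string is always in s*
    frontier = {""}
    while frontier:
        # extend every string found so far by one more word from s
        frontier = {w + x for w in frontier for x in s
                    if len(w) + len(x) <= max_len}
        result |= frontier
    return result

print(sorted(star({"z"}, 3)))        # ['', 'z', 'zz', 'zzz']
print(sorted(star({"aa", "b"}, 2)))  # ['', 'aa', 'b', 'bb']
```

The second call illustrates Ex. 2's language {aa, b}*: up to length 2 it contains only ∧, aa, b and bb.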
Positive Closure: If we want to modify the concept of closure to refer to only the
concatenation leading to non-null strings from a set s, we use the notation + instead of
*. This plus operation is called positive closure.
Proof: We know that every word in s** is made up of factors from s*.
Also, every factor from s* is made up of factors from s.
Therefore, we can say that every word in s** is made up of factors from s.
For example, we can now build expressions from the symbols 0, 1 using the
operations of union, concatenation, and Kleene closure.
(i) 01 means a zero followed by a one (concatenation)
(ii) 0+1 means either a zero or a one (union)
(iii) 0* means ∧+0+00+000+….. (Kleene closure).
With parentheses, we can build larger expressions. And, we can associate meanings
with our expressions. Here is how:
The language denoted/represented by the regular expression R is L(R).
Example 9: The language L defined by the regular expression ab*a is the set of all
strings of a’s and b’s that begin and end with a’s, and that have nothing but b’s inside.
L = {aa, aba, abba, abbba, abbbba, …}
Example 10: The language associated with the regular expression a*b* contains all
the strings of a’s and b’s in which all the a’s (if any) come before all the b’s (if any).
Example 11: Let us consider the language L defined by the regular expression
(a+b)* a(a+b)*. The strings of the language L are obtained by concatenating a string
from the language corresponding to (a+b)*, followed by the letter a, followed by a
string from the language associated with (a+b)*. We can also say that the language is
the set of all words over the alphabet Σ = {a, b} that have an a in them somewhere.
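Examples 9, 10 and 11 can be checked with Python's re module, noting the change of notation: the union written + in the text is | in Python, and the null string ∧ is simply the empty string:

```python
import re

ab_star_a = re.compile(r"ab*a")          # Example 9: an a, then b's, then an a
a_star_b_star = re.compile(r"a*b*")      # Example 10: all a's before all b's
contains_a = re.compile(r"[ab]*a[ab]*")  # Example 11: (a+b)*a(a+b)*

assert ab_star_a.fullmatch("abbba")
assert not ab_star_a.fullmatch("abab")       # an a in the middle is not allowed
assert a_star_b_star.fullmatch("aabbb")
assert not a_star_b_star.fullmatch("aba")    # a's must come before b's
assert contains_a.fullmatch("bba")
assert not contains_a.fullmatch("bbb")       # no a anywhere
print("all regular-expression checks passed")
```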
Definition: If S and T are sets of strings of letters (they may be finite or infinite sets),
we define the product set of strings of letters to be ST = {all combinations of a string
from S concatenated with a string from T, in that order}.
Ex.4) Find a regular expression over the alphabet {0, 1} to describe the set of all
binary numerals without leading zeroes (except 0 itself). So the language is
the set
{0, 1, 10, 11, 100, 101, 110, 111, …}.
2. R+R = R
3. R+φ = φ+R = R.
4. R+S = S+R
5. Rφ = φR = φ
6. R∧ = ∧R = R
7. (RS)T = R(ST)
8. R(S+T) = RS+RT
9. (S+T)R = SR+TR
10. φ* = ∧* = ∧
11. R*R* = R* = (R*)*
12. RR* = R*R = R* = ∧+RR*
13. (R+S)* = (R*S*)* = (R*+S*)* = (R*S)*R* = R*(SR*)*
14. (RS)* = (R*S*)* = (R*+S*)*
Example 15: Show that R+RS*S = a*bS*, where R = b+aa*b and S is any regular
expression.
R+RS*S
= R(∧+S*S) (property 8)
= RS* (property 12)
= (b+aa*b)S* (definition of R)
= (∧+aa*) bS* (properties 6 and 8)
= a*bS* (property 12)
As we already know the concept of language and regular expressions, we have an
important type of language derived from the regular expression, called regular
language.
Definition: For a given alphabet Σ, the following rules define the regular language
associated with a regular expression.
Rule 1: φ and {∧} are regular languages denoted respectively by the regular
expressions φ and ∧.
Rule 2: For each a in Σ, the set {a} is a regular language denoted by the regular
expression a.
(i) The language LM = {xy : x∈L and y∈M} is the regular language associated with
the regular expression lm.
(ii) The regular expression l+m is associated with the language L∪M formed by the
union of the sets L and M.
To make one regular expression that defines the language L, turn all the words in L
into bold face type and insert plus signs between them. For example, the regular
expression that defines the language L = {baa, abbba, bababa} is baa + abbba +
bababa.
Example 16: If L = {aa, ab, ba, bb}, then the corresponding regular expression is
aa + ab +ba + bb.
So, a particular regular language can be represented by more than one regular
expressions. Also, by definition, each regular language must have at least one regular
expression corresponding to it.
Try some exercises.
Ex.7) Find a regular expression for each of the following languages over the
alphabet {a,b}.
(a) strings with even length.
(b) strings containing the sub string aba.
In our day-to-day life, we often use the word automatic. Automation is the process
by which output is produced directly from the input without direct human
involvement. The input passes through various states of the process. For the
processing of a language, we use a very important finite state machine called a
finite automaton.
Can a machine recognise a language? The answer is yes for some machines, in
particular for an elementary class of machines called finite automata. Regular
languages can be represented by certain kinds of algebraic expressions, recognised
by finite automata, and generated by certain grammars. For example, suppose we need
to compute with numbers that are represented in scientific notation. Can we write an
algorithm to recognise strings of symbols represented in this way? To do this, we
need to discuss some basic computing machines called finite automata.
An automaton is a finite automaton if it accepts exactly the words of some regular
language, where a language means a set of strings. In other words, the class of
regular languages is exactly the same as the class of languages accepted by FAs,
i.e., deterministic finite automata.
3.4.1 Definition
A system where energy and information are transformed and used for performing
some functions without direct involvement of man is called automaton. Examples are
automatic machine tools, automatic photo printing tools, etc.
A finite automaton over a finite alphabet A can be thought of as a finite directed
graph with the property that each node emits one labelled edge for each distinct
element of A. The nodes are called states. There is one special state called the start
(or initial) state, and there is a possibly empty set of states called final states.
For example, the labelled graph in Figure 1 given below represents a DFA over the
alphabet A = {a, b} with start state 1 and final state 4.
We always indicate the start state by writing the word start with an arrow pointing to
it. Final states are indicated by double circles.
The single arrow out of state 4 labelled with a, b is shorthand for two arrows from
state 4, going to the same place, one labelled a and one labelled b. It is easy to check
that this digraph represents a DFA over {a, b} because there is a start state, and each
state emits exactly two arrows, one labelled with a and one labelled with b.
1. A finite set of states, one of which is designated as the initial or start state, and
some (possibly none) of which are designated as final states.
2. An alphabet Σ of possible input letters from which are formed strings that are to
be read one letter at a time.
3. A finite set of transitions that tell for each state and for each letter of the input
alphabet which state to go to next.
For example, the input alphabet has only two letters a and b. Let us also assume that
there are only three states, x, y and z. Let the following be the rules of transition:
1. from state x and input a, go to state y;
2. from state x and input b, go to state z;
3. from state y and input a, go to state x;
4. from state y and input b, go to state z;
5. from state z and input a or b, stay at state z.
Let us also designate state x as the starting state and state z as the only final state.
Let us examine what happens to various input strings when presented to this FA. Let
us start with the string aaa. We begin, as always, in state x. The first letter of the
string is an a, and it tells us to go state y (by rule 1). The next input (instruction) is
also an a, and this tells us (by rule 3) to go back to state x. The third input is another
a, and (by Rule 1) again we go to the state y. There are no more input letters in the
input string, so our trip has ended. We did not finish in the final state (state z), so we
have an unsuccessful termination of our run.
The string aaa is not in the language of all strings that leave this FA in state z. The set
of all strings that do leave us in a final state is called the language defined by the finite
automaton. The input string aaa is not in the language defined by this FA. We may
say that the string aaa is not accepted by this FA because it does not lead to a final
state. We may also say “aaa is rejected by this FA.” The set of all strings accepted is
the language associated with the FA. So, we say that L is the language accepted by
this FA. FA is also called a language recogniser.
Let us examine a different input string for this same FA. Let the input be abba. As
always, we start in state x. Rule 1 tells us that the first input letter, a, takes us to state
y. Once we are in state y we read the second input letter, which is a b. Rule 4 now
tells us to move to state z. The third input letter is a b, and since we are in state z,
Rule 5 tells us to stay there. The fourth input letter is an a, and again Rule 5 says state
z. Therefore, after we have followed the instruction of each input letter we end up in
state z. State z is designated as a final state. So, the input string abba has taken us
successfully to the final state. The string abba is therefore a word in the language
associated with this FA. The word abba is accepted by this FA.
It is not difficult for us to predict which strings will be accepted by this FA. If an
input string is made up of only the letter a repeated some number of times, then the
action of the FA will be to jump back and forth between state x and state y. No such
word can ever be accepted.
To get into state z, it is necessary for the string to have the letter b in it. As soon as a b
is encountered in the input string, the FA jumps immediately to state z, no matter what
state it was in before. Once in state z, it is impossible to leave. When the input string
runs out, the FA will still be in state z, leading to acceptance of the string.
So, the FA above will accept all the strings that have the letter b in them and no other
strings. Therefore, the language associated with this FA is the one defined by the
regular expression (a+b)* b(a+b)*.
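The FA just described, and its language (a+b)*b(a+b)*, can be simulated directly. The transition dictionary below transcribes the five rules, with states named x, y, z as in the text:

```python
def accepts(delta, start, finals, word):
    """Run a deterministic FA whose transitions are given as a dict
    delta[(state, letter)] -> next state."""
    state = start
    for letter in word:
        state = delta[(state, letter)]
    return state in finals

# Transition table of the FA described above
delta = {("x", "a"): "y", ("x", "b"): "z",
         ("y", "a"): "x", ("y", "b"): "z",
         ("z", "a"): "z", ("z", "b"): "z"}

assert not accepts(delta, "x", {"z"}, "aaa")   # aaa is rejected, as in the text
assert accepts(delta, "x", {"z"}, "abba")      # abba is accepted
print("FA simulation agrees with the text")
```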
The list of transition rules can grow very long. It is much simpler to summarise them
in a table format. Each row of the table is the name of one of the states of the FA, and
each column of the table is a letter of the input alphabet. The entries inside the table
are the new states that the FA moves into, i.e., the transition states. The transition
table for the FA we have described is:
Table 1

                Input
  State        a      b
  Start x      y      z
        y      x      z
  Final z      z      z
The machine we have already defined by the transition list and the transition table can
be depicted by the state graph in Figure 2.
Note: A single state can be both a start state and a final state. In a finite automaton
there is exactly one start state, and there may be zero or more final states.
The finite automata shown in Figure 3 can also be represented in Tabular form as
below:
Table 2

  State        Input 0   Input 1   Accept?
  Start 1         1         2        No
  Final 2         2         3        Yes
        3         3         3        No
Before continuing, let’s examine the computation of a finite automaton. Our first
example begins in state one and reads the input symbols in turn changing states as
necessary. Thus, a computation can be characterized by a sequence of states. (Recall
that Turing machine configurations needed the state plus the tape content. Since a
finite automaton never writes, we always know what is on the tape and need only
look at the state as a configuration.) Here is the sequence for the input 0001001.
Input Read : 0 0 0 1 0 0 1
States : 1 → 1 → 1 → 1 → 2 → 2 → 2 → 3
Example 17 (An elevator controller): Let’s imagine an elevator that serves two
floors. Inputs are calls to a floor either from inside the elevator or from the floor
itself. This makes three distinct inputs possible, namely:
0 - no calls
1 - call to floor one
2 - call to floor two
The elevator itself can be going up, going down, or halted at a floor. If it is on a floor,
it could be waiting for a call or about to go to the other floor. This provides us with
the six states shown in Figure 4 along with the state graph for the elevator controller.
  State              Input
                   None   call to 1   call to 2
  W1 (wait on 1)    W1       W1          UP
  U1 (start up)     UP       U1          UP
  UP                W2       D2          W2
  DN                W1       W1          U1
  W2 (wait on 2)    W2       DN          W2
  D2 (start down)   DN       DN          D2
Accepting and rejecting states are not included in the elevator design because
acceptance is not an issue. If we were to design a more sophisticated elevator, it
might have states that indicated:
a) power failure,
b) overloading, or
c) breakdown.
Let us make a few small notes about the design. If the elevator is about to move (i.e.,
in state U1 or D2) and it is called to the floor it is presently on, it will stay. (This may
be good. Try it next time you are in an elevator.) And, if it is moving (up or down)
be good Try it next time you are in an elevator.) And, if it is moving (up or down)
and gets called back the other way, it remembers the call by going to the U1 or D2
state upon arrival on the next floor. Of course, the elevator does not do things like
open and close doors (these could be states too) since that would have added
complexity to the design. Speaking of complexity, imagine having 100 floors.
That is our levity for this section. Now that we know what a finite automaton is, we
must (as usual) define it precisely.
We also need some additional notation. The next state function is called the transition
function and the accepting states are often called final states. The entire machine is
usually defined by presenting a transition state table or a transition diagram. In this
way, the states, alphabet, transition function, and final states are constructively
defined. The starting state is usually the lowest numbered state. Our first example of
a finite automaton is:
where the transition function δ is defined explicitly by either a state table or a state
graph.
3.5 SUMMARY
In this unit, we introduced several formulations for regular languages: regular
expressions are algebraic representations of regular languages, and finite automata are
machines that recognise regular languages. From regular expressions, we can derive
regular languages. We also made some other observations, e.g., finite automata can be
used as output devices (Mealy and Moore machines).
3.6 SOLUTIONS/ANSWERS
Ex.1)
(i) ababbbaa
(ii) baaababb
(iii) ab abb ab abb
(iv) baa baa
(v) ababbababb baa
Ex.2)
(i) Suppose aa = x
Then { x, b}* = {∧, x, b, xx, bb, xb, bx, xxx, bxx, xbx, xxb, bbx, bxb, xbb,
bbb} substituting x = aa
{aa,b}* = { ∧, aa, b, aaaa, bb, aab, baa, aaaaaa, baaaa, aabaa,
Ex.3)
(a) a+b+c
(b) ab*+ba*
(c) ∧+a(bb)*
Ex.4)
0+1(0+1)*
Ex.5)
Starting with the left side and using properties of regular expressions, we get
b*(abb* + aabb*+aaabb*)*
= b*((ab+aab+aaab)b*)* (property 9)
= (b + ab + aab + aaab)* (property 7).
Ex.6)
(a) {a,b}
(b) {a,∧,b,bb,….bn,….}
(c) {a,b,ab,bc,abb,bcc,…abn,bcn,….}
Ex.7)
(a) (aa+ab+ba+bb)*
(b) (a+b)*aba(a+b)*
UNIT 4 MODELS FOR EXECUTING ALGORITHMS-II: PDFA & CFG
Structure Page Nos.
4.0 Introduction 61
4.1 Objectives 61
4.2 Formal Language & Grammar 61
4.3 Context Free Grammar (CFG) 68
4.4 Pushdown Automata (PDA) 72
4.5 Summary 74
4.6 Solutions/Answers 74
4.7 Further Readings 75
4.0 INTRODUCTION
We have mentioned earlier that not every problem can be solved algorithmically and
that good sources of examples of such problems are provided by formal models of
computation, viz., FA, PDFA and TM (Turing machines). In the previous unit, we
discussed FA. In this unit, we discuss PDFA, CFG and related topics.
4.1 OBJECTIVES
After going through this unit, you should be able to:
<noun> → I
<noun> → Ram
<noun> → Sam
<verb> → reads
<verb> → writes
From the above, we can collect all the symbols in two categories. One consists of
symbols whose values can be changed further, and the other of symbols at which
derivation terminates. These collections are called variables and terminals,
respectively. In the above discussion, the variables are <sentence>, <noun> and
<verb>, and the terminals are I, Ram, Sam, reads, writes. As sentence formation is
started with <sentence>, this symbol is special and is called the start symbol.
x→y
where x and y denote strings of symbols taken from A and from a set of grammar
symbols disjoint from A. The grammar rule x → y is called a production rule, and
application of production rule (x is replaced by y), is called derivation.
Every grammar has a special grammar symbol called the start symbol and there must
be at least one production with the left side consisting of only the start symbol. For
example, if S is the start symbol for a grammar, then there must be at least one
production of the form S→ y.
S→ ∧ (i)
S→ aS (ii)
S→ bS (iii)
S→ cS (iv)
The string to be derived is aacb. Each step in a derivation corresponds to a branch of
a tree, and this tree is called the parse tree, whose root is the start symbol. The
completed derivation and parse tree are shown in Figures 1, 2 and 3:
Let us derive the string aacb, its parse tree is shown in Figure 4.
S ⇒ aS ⇒ aaS ⇒ aacS ⇒ aacbS ⇒ aacb∧ = aacb
In G = (V, Σ, P, S) where
V = {<sentence>, <noun>, <verb>}
Σ = {Ram, reads,…}
P = <sentence> → <noun> <verb>
<noun> → Ram
<verb> → reads, and
S = <sentence>
xαy ⇒ xβy
To the left-hand side of the above production rule, x is the left context and y is the
right context. If the derivation is applied to the leftmost variable of the right-hand
side of any production rule, then it is called a leftmost derivation, and if applied to
the rightmost, then it is called a rightmost derivation.
S → b/aA
A → c/bS
S ⇒ aA ⇒ abS,
A ⇒ bS ⇒ baA
A grammar is recursive if it contains either a recursive production or an indirectly
recursive production.
Notice that any string in this language is either ∧ or of the form ax for some string x in
the language. The following grammar will derive any of these strings:
S → ∧/aS.
Now, we shall derive the string aaa:
Notice that any string in this language is either ∧ or of the form axb for some string x
in the language. The following grammar will derive any of the strings:
S → ∧/aSb.
For example, we will derive the string aaabbb;
Notice that any string in this language is either ∧ or of the form abx for some string x
in the language. The following grammar will derive any of these strings:
S → ∧/abS.
For example, we shall derive the string ababab:
S ⇒ abS ⇒ ababS ⇒ abababS ⇒ ababab.
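A derivation like the one above is mechanical enough to script. A sketch that performs the leftmost derivation of (ab)^n from the grammar S → ∧/abS:

```python
def derive(n):
    """Leftmost derivation of (ab)^n from the grammar S -> ∧ | abS.
    Returns the list of sentential forms, ending with the derived string."""
    steps = ["S"]
    for _ in range(n):
        steps.append(steps[-1].replace("S", "abS"))   # apply S -> abS
    steps.append(steps[-1].replace("S", ""))          # apply S -> ∧
    return steps

print(derive(3))   # ['S', 'abS', 'ababS', 'abababS', 'ababab']
```

The output reproduces exactly the derivation S ⇒ abS ⇒ ababS ⇒ abababS ⇒ ababab shown above.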
Suppose M and N are languages whose grammars have disjoint sets of non-terminals.
Suppose also that the start symbols for the grammars of M and N are A and B,
respectively. Then, we use the following rules to find the new grammars generated
from M and N:
Union Rule: The language M∪N starts with the two productions
S → A/B.
Product Rule: The language MN starts with the production.
S → AB
Closure Rule: The language M* starts with the production
S → AS/∧.
Example 6: Using the Union Rule:
L = M∪N,
L = {a^m b^n | m, n ≥ 0}.
S → AB product rule
A → ∧/aA grammar for M,
B → ∧/bB grammar for N,
Example 8: Using the Closure Rule: For the language L of all strings with zero or
more occurrence of aa or bb. L = {aa, bb}*. If we let M = {aa, bb}, then L = M*.
Thus, we can write the following grammar for L:
We can simplify the grammar by substituting for A to obtain the following grammar:
S → aaS/bbS/∧
Example 9: Let Σ = {a, b, c}. Let S be the start symbol. Then, the language of
palindromes over the alphabet Σ has the grammar.
S → aSa/bSb/cSc/a/b/c/∧.
E → a/b/E−E
The language of the grammar E → a/b/E-E contains strings like a, b, b−a, a−b−a, and
b−b−a−b. This grammar is ambiguous because it has a string, namely, a−b−a, that has
two distinct parse trees.
Having two distinct parse trees means the same as having two distinct leftmost
derivations.
S → S[S]/∧
For each of the following strings, construct a leftmost derivation, a rightmost
derivation and a parse tree.
Type 0: This grammar is also called unrestricted grammar. As its name suggests,
it is the grammar whose production rules are unrestricted.
In other words, |xAy| ≤ |xαy|, as α ≠ ∧. Here, x is the left context and y is the right
context.
A grammar is called type 1 grammar, if all of its productions are of type 1. For this,
grammar S → ∧ is also allowed.
Type 2: The grammar is also known as context free grammar. A grammar is called
type 2 grammar if all the production rules are of type 2. A production is said to be of
type 2 if it is of the form A → α where A∈V and α∈(V∪Σ)*. In other words, the left
hand side of production rule has no left and right context. The language generated by
a type 2 grammar is called context free language.
Type 3: A grammar is called type 3 grammar if all of its production rules are of type
3. (A production rule is of type 3 if it is of form A → ∧, A → a or A → aB where
a∈Σ and A,B∈V), i.e., if a variable derives a terminal or a terminal with one variable.
This type 3 grammar is also called regular grammar. The language generated by
this grammar is called regular language.
Ex.11) Find the highest type number that can be applied to the following grammar:
(a) S → ASB/b, A → aA
(b) S → aSa/bSb/a/b/∧
(c) S → Aa, A→ S/Ba, B → abc.
4.3 CONTEXT FREE GRAMMAR (CFG)
We know that there are non-regular languages. For example, {a^n b^n | n ≥ 0} is a
non-regular language. Therefore, we cannot describe the language by any of the four
representations of regular languages: regular expressions, DFAs, NFAs, and regular
grammars.
S → ∧/aSb.
So, a context-free grammar is a grammar whose productions are of the form :
S→x
Where S is a non-terminal and x is any string over the alphabet of terminals and non-
terminals. Any regular grammar is context-free. A language is context-free language
if it is generated by a context-free grammar.
A grammar that is not context-free must contain a production whose left side is a
string of two or more symbols. For example, the production Sc → x is not part of any
context-free grammar.
Most programming languages are context-free. For example, a grammar for some
typical statements in an imperative language might look like the following, where the
words in bold face are considered to be the single terminals:
L → SL/∧
E →….(description of an expression)
I →….(description of an identifier).
Where the strings of terminals and non-terminals can consist of only terminals or of
only non-terminals, or any combination of terminals and non-terminals or even the
empty string.
The language generated by a CFG is the set of all strings of terminals that can be
produced from the start symbol S using the productions as substitutions. A language
generated by a CFG is called a context-free language.
Example 11: Find a grammar for the language of decimal numerals by observing that
a decimal numeral is either a digit or a digit followed by a decimal numeral.
68
S → D/DS Models for Executing
Algorithms-II: PDFA &
D → 0/1/2/3/4/5/6/7/8/9 CFG
S ⇒ DS ⇒ 7S ⇒ 7DS ⇒ 7DDS ⇒ 78DS ⇒ 780S ⇒ 780D ⇒ 780.
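The grammar for decimal numerals translates directly into a recursive recognizer, mirroring the choice between S → D and S → DS. A sketch:

```python
def is_numeral(s):
    """Recogniser for the grammar S -> D | DS, D -> 0|1|...|9:
    the language of one or more decimal digits."""
    if s == "":
        return False            # S must produce at least one D
    head, tail = s[0], s[1:]
    if head not in "0123456789":
        return False            # the first symbol must come from D
    return tail == "" or is_numeral(tail)   # S -> D, or S -> DS on the rest

assert is_numeral("780")        # the string derived above
assert not is_numeral("78a")
assert not is_numeral("")
print("numeral recogniser agrees with the grammar")
```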
Example 12: Let the set of alphabet A = {a, b, c}
Then, the language of palindromes over the alphabet A has the grammar:
S → aSa⏐bSb⏐cSc⏐a⏐b⏐c⏐∧
For example, the palindrome abcba can be derived as follows:
A → LA⏐DA⏐∧
L → a⏐b⏐…⏐z
D → 0⏐1⏐…⏐9
The language generated by the grammar consists of all the strings formed from the
letters a, b, c, …, z and the digits 0, 1, …, 9.
Theorem 1: If L1 and L2 are context-free languages, then L1∪L2 is a context-free
language.
Proof: If L1 and L2 are context-free languages, then each of them has a context-free
grammar; call the grammars G1 and G2. Our proof requires that the grammars have
no non-terminals in common. So we shall subscript all of G1’s non-terminals with a 1
and subscript all of G2’s non-terminals with a 2. Now, we combine the two grammars
into one grammar that will generate the union of the two languages. To do this, we
add one new non-terminal, S, and two new productions.
S → S1 ⏐ S2
S is the starting non-terminal for the new union grammar and can be replaced either
by the starting non-terminal for G1 or for G2, thereby generating either a string from
L1 or from L2. Since the non-terminals of the two original languages are completely
different, once we begin using one of the original grammars, we must complete the
derivation using only the rules from that original grammar. Note that there is no
need for the alphabets of the two languages to be the same.
Concatenation
Theorem 2: If L1 and L2 are context-free languages, then L1L2 is a context-free
language.
Proof : This proof is similar to the last one. We first subscript all of the non-terminals
of G1 with a 1 and all the non-terminals of G2 with a 2. Then, we add a new
nonterminal, S, and one new rule to the combined grammar:
S → S1S2
S is the starting non-terminal for the concatenation grammar and is replaced by the
concatenation of the two original starting non-terminals.
Kleene Star
Theorem 3: If L is a context-free language, then L* is a context-free language.
Proof : Subscript the non-terminals of the grammar for L with a 1. Then add a new
starting nonterminal, S, and the rules
S → S1S ⏐ Λ
The rule S → S1S is used once for each string of L that we want in the string of L*,
then the rule S → Λ is used to kill off the S.
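Wait — this anchor was not declared; skipping.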
Intersection
Now, we will show that the set of context-free languages is not closed under
intersection. Think about the two languages L1 = {a^n b^n c^m | n, m ≥ 0} and
L2 = {a^m b^n c^n | n, m ≥ 0}. These are both context-free languages and we can give a
grammar for each one:
G1:
S → AB
A → aAb ⏐ Λ
B → cB ⏐ Λ

G2:
S → AB
A → aA ⏐ Λ
B → bBc ⏐ Λ
The strings in L1 contain the same number of a's as b's, while the strings in L2 contain
the same number of b's as c's. Strings that have to be both in L1 and in L2, i.e., strings
in the intersection, must have the same number of a's as b's and the same number of
b's as c's. That is, the intersection is {a^n b^n c^n | n ≥ 0}, which is known not to be
context-free.
Although the set is not closed under intersection, there are cases in which the
intersection of two context-free languages is context-free. Think about regular
languages, for instance. All regular languages are context-free, and the intersection of
two regular languages is regular. We have some other special cases in which an
intersection of two context-free languages is context-free.
Suppose that L1 and L2 are context-free languages and that L1⊆L2. Then L2∩L1 = L1,
which is a context-free language. An example is EQUAL ∩ {a^n b^n}. Since strings in
{a^n b^n} always have the same number of a's as b's, the intersection of these two
languages is the set {a^n b^n}, which is context-free.
Complement
The set of context-free languages is not closed under complement, although there are
again cases in which the complement of a context-free language is context-free.
Think carefully when doing unions and intersections of languages if one is a superset
of the other. The union of PALINDROME and (a+b)* is (a+b)*, which is regular. So,
sometimes the union of a context-free language and a regular language is regular. The
union of PALINDROME and a* is PALINDROME, which is context-free but not
regular.
(b) For any two positive integers p and q, the language of all words of the
form ax by az, where x, y, z = 1, 2, 3… and y = px + qz.
Design Techniques-II
4.4 PUSHDOWN AUTOMATA (PDA)
Informally, a pushdown automaton is a finite automaton equipped with a stack. The
pushdown automaton is the corresponding acceptor for context-free grammars. There
is one start state and a possibly empty set of final states. We can imagine a pushdown
automaton as a machine with the ability to read the letters of an input string, perform
stack operations, and make state changes.
The execution of a PDA always begins with one symbol on the stack. We should
always specify the initial symbol on the stack. We assume that a PDA always begins
execution with a particular symbol on the stack. A PDA will use three stack
operations as follows:
(i) The pop operation reads the top symbol and removes it from the stack.
(ii) The push operation writes a designated symbol onto the top of the stack. For
example, push (x) means put x on top of the stack.
(iii) The nop operation does nothing to the stack.
We can represent a pushdown automaton as a finite directed graph in which each state
(i.e., node) emits zero or more labelled edges. Each edge from state i to state j is
labelled with three items as shown in Figure 7, where L is either a letter of an
alphabet or ∧, S is a stack symbol, and 0 is the stack operation to be performed.
Figure 7: an edge from state i to state j carrying the label L S, 0
It takes five pieces of information to describe a labelled edge. We can also represent
it by the following 5-tuple, which is called a PDA instruction.
(i, L, S, 0, j)
An instruction of this form is executed as follows, where w is an input string whose
letters are scanned from left to right.
If the PDA is in state i, and either L is the current letter of w being scanned or L = ∧,
and the symbol on top of the stack is S, then perform the following actions:
(1) execute the stack operation 0;
(2) move to the state j; and
(3) if L ≠ ∧, then scan right to the next letter of w.
72
The second kind of nondeterminism occurs when a state emits two edges labelled with Models for Executing
Algorithms-II: PDFA &
the same stack symbol, where one input symbol is ∧ and the other input symbol is not. CFG
For example, the following two 5-tuples represent non-determinism because the
machine has the option of consuming the input letter b or leaving it alone.
(i, ∧, c, pop, j)
(i, b, c, push(D), k).
Example 14: The language {anbn⏐n≥0} can be accepted by a PDA. We will keep
track of the number of a’s in an input string by pushing the symbol Y onto the stack
for each a. A second state will be used to pop the stack for each b encountered. The
following PDA will do the job, where X is the initial symbol on the stack:
(0, ∧, X, nop, 2)
(0, a, X, push(Y), 0),
(0, a, Y, push(Y), 0),
(0, b, Y, pop,1),
(1, b, Y, pop,1),
(1, ∧, X, nop,2).
This PDA is non-deterministic because either of the first two instructions in the list
can be executed if the first input letter is a and X is on the top of the stack. A
computation sequence for the input string aabb can be written as follows:
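Such a computation sequence can be reproduced mechanically. The sketch below is my reading of the example, not notation from the text: the helper names and the acceptance convention (final state 2 with all input consumed) are assumptions. It interprets the six instructions nondeterministically by exploring every reachable configuration.

```python
# A sketch of a nondeterministic interpreter for the six PDA instructions above;
# state 2 is taken as the accepting state and X as the initial stack symbol.

INSTR = [
    (0, None, "X", ("nop", None), 2),   # (0, ^, X, nop, 2)
    (0, "a", "X", ("push", "Y"), 0),
    (0, "a", "Y", ("push", "Y"), 0),
    (0, "b", "Y", ("pop", None), 1),
    (1, "b", "Y", ("pop", None), 1),
    (1, None, "X", ("nop", None), 2),   # (1, ^, X, nop, 2)
]

def accepts(w):
    """Explore all computations of the PDA on w; accept if some computation
    consumes all of w and ends in state 2."""
    frontier = [(0, 0, ("X",))]          # (state, input position, stack)
    seen = set()
    while frontier:
        state, pos, stack = frontier.pop()
        if (state, pos, stack) in seen:
            continue
        seen.add((state, pos, stack))
        if state == 2 and pos == len(w):
            return True
        for i, letter, top, (op, arg), j in INSTR:
            if i != state or not stack or stack[-1] != top:
                continue                 # wrong state or wrong stack top
            if letter is not None and (pos >= len(w) or w[pos] != letter):
                continue                 # instruction needs a letter we don't have
            new_stack = stack[:-1] if op == "pop" else stack + (arg,) if op == "push" else stack
            frontier.append((j, pos + (letter is not None), new_stack))
    return False

print([w for w in ("", "ab", "aabb", "aab", "ba") if accepts(w)])   # ['', 'ab', 'aabb']
```

For aabb, one accepting computation pushes Y twice in state 0, pops twice in state 1, and then takes the ∧-move (1, ∧, X, nop, 2) to the accepting state.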
Example 15: (An empty stack PDA): Let’s consider the language {anbn⏐n≥0}; the
PDA that follows will accept this language by empty stack, where X is the initial
symbol on the stack.
The PDA shown in Figure 9 can also be represented by the following three instructions:
This PDA is non-deterministic. Let’s see how a computation proceeds. For example,
a computation sequence for the input string aabb can be as follows:
4.5 SUMMARY
In this unit we have considered the recognition problem and found out whether we can
solve it for a larger class of languages. The corresponding accepters for the context-
free languages are PDAs. There are some languages which are not context-free. We
can prove that a language is not context-free by using the pumping lemma. Also, in
this unit, we discussed the equivalence of the two approaches of specifying a context-
free language: one approach uses context-free grammars and the other uses Pushdown
Automata.
4.6 SOLUTIONS/ANSWERS
Ex.1)
Ex.2)
(a) S → aSb/aAb
A → bA/b
Ex.3)
(a) S → aSa/bSb/∧
(b) S → aSa/bSb/a/b.
Ex.4)
(c) Type 2.
Ex.5)
(a) S → AB
A → aAb5/∧
B → b7Ba/∧
(b) S → AB
A → aAbp/∧
B → bqBa/∧
Ex.6)
Suppose the language is {wcwT : w∈{a,b}*}; then the PDA is
Ex.7)
The language is {wwT : w∈{a,b}*}. Proceed similarly to Ex. 6.
UNIT 1 MODELS FOR EXECUTING
ALGORITHMS – III : TM
Structure
1.0 Introduction
1.1 Objectives
1.2 Prelude to Formal Definition
1.3 Turing Machine: Formal Definition and Examples
1.4 Instantaneous Description and Transition Diagram
1.4.1 Instantaneous Description
1.4.2 Transition Diagrams
1.5 Some Formal Definitions
1.6 Observations
1.7 Turing Machine as a Computer of Functions
1.8 Summary
1.9 Solutions/Answers
1.10 Further Readings
1.0 INTRODUCTION
In unit 3 and unit 4 of block 4, we discussed two of the major approaches to modeling
of computation viz. the automata/machine approach and linguistic/grammatical
approach. Under grammatical approach, we discussed two models viz., Regular
Languages and Context-free Languages.
Under automata approach, we discussed two models viz., Finite Automata and
Pushdown Automata. Next, we discuss still more powerful automata for computation.
The Turing machine (TM) is the next, more powerful, model of the automata
approach, which recognizes more languages than Pushdown automata models do. The
Phrase-structure model is the corresponding grammatical model that matches Turing
machines in computational power.
Halt or h: The halt state. The same symbol h is used for denoting the halt state for
all halt-state versions of TM, and then h is not used for other purposes.
e or ε: The empty string.
w1 a w2: The symbol a is the symbol currently being scanned by the Head.
↑
1.1 OBJECTIVES
After going through this unit, you should be able to:
• define and explain various terms mentioned under the title key words in the
previous section;
• construct TMs for simple computational tasks;
• realize some simple mathematical functions as TMs; and
• apply modular techniques for the construction of TMs for more complex
functions and computational tasks from TMs already constructed for simple
functions and tasks.
Infinite Tape
d a b # c b …… …… ….. …..
Read/Write Head
Finite Control
TURING MACHINE
Figure 1.2.1
Such a view, in addition to being more comprehensible to human beings, can be a
quite useful aid in the design of TMs accomplishing some computable tasks, by
allowing informal explanation of the various steps involved in arriving at a particular
design. Without physical view and informal explanations, whole design process
would be just a sequence of derivations of new formal symbolic expressions from
earlier known or derived symbolic expressions ⎯ not natural for human
understanding.
(i) a tape, with an end on the left but infinite on the right side. The tape is divided
into squares or cells, with each cell capable of holding one of the tape symbols
including the blank symbol #. At any time, there can be only finitely many cells
of the tape that can contain non-blank symbols. The set of tape symbols is
denoted by Γ .
As the very first step in the sequence of operations of a TM, the input, as a
finite sequence of the input symbols is placed in the left-most cells of the
tape. The set of input symbols denoted by ∑, does not contain the blank
symbol #. However, during operations of a TM, a cell may contain a tape
symbol which is not necessarily an input symbol.
There are versions of TM, to be discussed later, in which the tape may be
infinite in both left and right sides ⎯ having neither left end nor right end.
(ii) a finite control, which can be in any one of the finite number of states.
The states in TM can be divided in three categories viz.,
(a) the Initial state, the state of the control just at the time when TM starts its
operations. The initial state of a TM is generally denoted by q0 or s.
(b) the Halt state, which is the state in which TM stops all further operations.
The halt state is generally denoted by h. The halt state is distinct from the
initial state. Thus, a TM HAS AT LEAST TWO STATES.
(c) Other states
(iii) a tape head (or simply Head), is always stationed at one of the tape cells and
provides communication for interaction between the tape and the finite control.
The Head can read or scan the symbol in the cell under it. The symbol is
communicated to the finite control. The control, taking into consideration the
symbol and its current state, decides the further course of action, which may include
changing the symbol in the cell, changing its state, and moving the Head.
The course of action is called a move of the Turing Machine. In other words, the
move is a function of current state of the control and the tape symbol being
scanned.
In case the control decides for change of the symbol in the cell being scanned, then
the change is carried out by the head. This change of symbol in the cell being
scanned is called writing of the cell by the head.
Now, we are ready to consider a formal definition of a Turing Machine in the next
section.
The meaning of δ (qi, ak) = (qj, al, x) is that if qi is the current state of the TM,
and ak is the symbol in the cell currently under the Head, then the TM writes al in the
cell currently under the Head, enters the state qj, and the Head moves to the right
adjacent cell if the value of x is R, moves to the left adjacent cell if the value of x
is L, and continues scanning the same cell if the value of x is N.
(v) q0 ∈ Q, is the initial/start state.
(vi) h ∈ Q is the ‘Halt State’, in which the machine stops any further activity.
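One move of such a machine is just a table lookup on the pair (current state, scanned symbol). As an illustration only — the δ fragment below is essentially the erasing machine of Example 1.3.2 restricted to a one-letter alphabet, and is not part of the formal definition:

```python
# A sketch of one move of a deterministic TM: delta maps
# (state, scanned symbol) to (new state, symbol written, head action).

delta = {                      # an illustrative fragment, not from the text
    ("q0", "a"): ("q0", "#", "R"),   # erase the scanned a, keep moving right
    ("q0", "#"): ("h",  "#", "N"),   # blank reached: halt without moving
}

def move(state, tape, head):
    """Apply delta once: write, change state, then move the head R/L/N."""
    new_state, written, action = delta[(state, tape[head])]
    tape[head] = written
    head += {"R": 1, "L": -1, "N": 0}[action]
    return new_state, tape, head

state, tape, head = "q0", list("aa#"), 0
while state != "h":
    state, tape, head = move(state, tape, head)
print(state, "".join(tape))    # h ###
```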
Remark 1.3.1
Again, there are a number of variations in literature of even the above version of TM.
For example, some authors allow at one time only one of the two actions viz.,
(i) writing of the current cell and (ii) movement of the Head to the left or to the right.
However, this restricted version of TM can easily be seen to be computationally
equivalent to the definition of TM given above, because one move of the TM given by
the definition can be replaced by at most two moves of the TM introduced in the
Remark.
In the next unit, we will discuss different versions of TM and issues relating to
equivalences of these versions.
In order to illustrate the ideas involved, let us consider the following simple
examples.
Example 1.3.2
Consider the Turing Machine (Q, Σ, Γ, δ, q0, h) defined below that erases all the
non-blank symbols on the tape, where the sequence of non-blank symbols does not
contain any blank symbol # in-between:
Q = {q0, h}, Σ = {a, b}, Γ = {a, b, #}
and the next-move function δ is defined by the following table:
A string Accepted by a TM
Example 1.3.3
Design a TM which accepts all strings of the form bn dn for n ≥ 1 and rejects all other
strings.
Let the TM M to be designed be given by M = (Q, Σ, Γ, δ, q0, h) with Σ = {b, d}. The
values of Q, Γ and δ shall be determined by the design process explained below.
However, to begin with, we take Γ = {b, d, #}.
We illustrate the design process by considering various types of strings which are to
be accepted or rejected by the TM.
As input, we consider only those strings which are over {b, d}. Also, it is assumed
that, when moving from the left, the occurrence of the first # indicates termination of
the input string.
We are considering this particular type of strings, because, by taking simpler cases of
the type, we can determine some initial moves of the required TM both for strings to
be accepted and strings to be rejected.
B b d - - - -
Next, the TM should mark the b, if it exists, which is immediately on the right of the
previously marked b, i.e., it should mark the b which is the left-most b yet to be
marked.
Thus we require two additional Tape symbols B and D, i.e., Γ = {b, d, B, D, #}.
After one iteration of replacing one b by B and one d by D the tape would be of the
form
B b D - - - -
In respect of the states of the machine, we observe that in the beginning, in the
initial state q0, the cell under the Head is a b, and then this b is replaced by a B; and at
this stage, if we do not change the state then TM would attempt to change next b
also to B without matching the previous b to the corresponding d. But in order to
recognize the form bn dn of the string we do not want, in this round, other b’s to be
changed to B’s before we have marked the corresponding d. Therefore,
In q2, when we meet the first B, we know that none of the cells to the left of the
current cell contains b and, if there is some b still left on the tape, then it is in the cell
just to the right of the current cell. Therefore, we should move to the right and then if
it is a b, it is the left-most b on the tape and therefore the whole process should be
repeated, starting in state q0 again.
Therefore, before entering b from the left side, TM should enter the initial state q0.
Therefore,
δ (q2, B) = (q0, B, R).
For to-be-accepted type string, when all the b’s are converted to B’s and when the
last d is converted to D in q2, we move towards left to first B and then move to right in
q0 then we get the following transition:
from configuration
B B D D # #
↑
q2
to configuration
B B D D # #
↑
q0
b D b ……..
B D b
↑
q0
The above string is to be rejected. But if we take δ (q0, D) as q0 then whole process
of matching b’s and d’s will be again repeated and then even the (initial) input of the
form
b d b # #
will be incorrectly accepted. In general, in state q0, we encounter D, if all b’s have
already been converted to B’s and corresponding d’s to D’s. Therefore, the next state
of δ (q0, D) cannot be q0.
Let
δ (q0, D) = (q3, D, R).
As explained just above, for a string of the to-be-accepted type, i.e., of the form bn dn,
in q3 we do not expect symbols b, B or even another d because then there will be more
d’s than b’s in the string, which should be rejected.
In all these cases, strings are to be rejected. One of the ways of rejecting a string
say s by a TM is first giving the string as (initial) input to the TM and then by not
providing a value of δ in some state q ≠ h, while making some move of the TM.
Thus the TM, not finding next move, stops in a state q ≠ h. Therefore, the string
is rejected by the TM.
Case II when n = 0 but m ≠ 0, i.e., when the input string is of the form dm (b⏐d)* for
m ≠ 0.
Case III when the input is of the form bn #, n ≠ 0.
Case IV when the input is of the form # …………..
Case II:
d ………
↑
q0
δ (q1, #) is undefined
We have considered all possible cases of input strings over Σ = {b, d} in which, while
scanning from the left, the occurrence of the first # indicates termination of the
string.
After the above discussion, the design of the TM that accepts strings of the form
bndn and rejects all other strings over {b, d}, may be summarized as follows:
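Since the summary table is given as a figure, the moves derived above can instead be collected in code. In the sketch below, the transitions for q1 and q3, which the discussion describes only in prose, are filled in as assumptions consistent with it; only δ(q2, B) = (q0, B, R) and δ(q0, D) = (q3, D, R) are stated explicitly in the text.

```python
# A sketch of the b^n d^n machine; head moves are coded as +1 (R), -1 (L), 0 (N).

DELTA = {
    ("q0", "b"): ("q1", "B", 1),   # mark the left-most unmarked b
    ("q1", "b"): ("q1", "b", 1),   # assumed: skip remaining b's to the right
    ("q1", "D"): ("q1", "D", 1),   # assumed: skip already-marked d's
    ("q1", "d"): ("q2", "D", -1),  # assumed: mark the matching d, turn back
    ("q2", "D"): ("q2", "D", -1),  # assumed: skip marked d's to the left
    ("q2", "b"): ("q2", "b", -1),  # assumed: skip unmarked b's to the left
    ("q2", "B"): ("q0", "B", 1),   # delta(q2, B) = (q0, B, R) as in the text
    ("q0", "D"): ("q3", "D", 1),   # delta(q0, D) = (q3, D, R) as in the text
    ("q3", "D"): ("q3", "D", 1),   # assumed: all b's matched, only D's remain
    ("q3", "#"): ("h",  "#", 0),   # assumed: reached the blank: accept
}

def accepts(w):
    """Run the TM on w followed by a blank; accept iff it reaches halt state h.
    A missing entry in DELTA means no move is defined, i.e., rejection."""
    tape, head, state = list(w) + ["#"], 0, "q0"
    while state != "h":
        key = (state, tape[head])
        if key not in DELTA:
            return False
        state, tape[head], move = DELTA[key]
        head += move
    return True

print([w for w in ("bd", "bbdd", "bbd", "bdd", "db", "") if accepts(w)])   # ['bd', 'bbdd']
```

Rejection happens exactly as described in the text: the machine stops in a non-halt state for which no move is defined.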
Ex. 1) Design a TM that recognizes the language of all strings of even lengths over
the alphabet {a, b}.
Ex. 2) Design a TM that accepts the language of all strings which contain aba as a
sub-string.
(i) Contents of all the cells of the tape, starting from the left-most cell up to at least
the last cell containing a non-blank symbol, and containing all cells up to the cell
being scanned.
Initial Configuration: The total configuration at the start of the (Turing) Machine is
called the initial configuration.
There are various notations used for denoting the total configuration of a Turing
Machine.
# # b d a f # g h k # # # #
↑
q3
Alternatively, the configuration is also denoted by (q3, ##bdaf#ghk), where the
symbol g under the tape head is underscored and the last two commas are dropped.
It may be noted that the sequence of blanks after the last non-blank symbol, is not
shown in the configuration. The notation may be alternatively written (q3, w, g, u)
where w is the string to the left and u the string to the right respectively of the symbol
that is currently being scanned.
In case g is the left-most symbol then we use the empty string e instead of w.
Similarly, if g is being currently scanned and there is no non-blank character to the
right of g then we use e, the empty string instead of u.
Notation 3: The next notation neither uses parentheses nor commas. Here the state is
written just to the left of the symbol currently being scanned by the tape Head. Thus
the configuration (q3, ##bdaf#, g, hk) is denoted as ##bdaf#q3ghk.
Thus if the tape is like
g w # …………
↑
q5
then we may denote the corresponding configuration as (q5, e, g, u). And, if the tape
is like
a b c g # # …
↑
q6
Then the configuration is (q6, abc, g, e) or (q6, abc g) or, alternatively, abcq6g by
Notation 3.
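Translating from the 4-tuple notation to Notation 3 is purely mechanical, as the small sketch below (the helper name is mine) shows:

```python
# A sketch converting the 4-tuple configuration (q, w, g, u) into Notation 3,
# in which the state is written just to the left of the scanned symbol.

def to_notation3(q, w, g, u):
    """(q, w, g, u) -> w q g u, e.g. (q3, ##bdaf#, g, hk) -> ##bdaf#q3ghk."""
    return w + q + g + u

print(to_notation3("q3", "##bdaf#", "g", "hk"))   # ##bdaf#q3ghk
print(to_notation3("q6", "abc", "g", ""))         # abcq6g
```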
Example 1.4.2.1
0 1 #
q0 - - (q2, #, R)
q1 (q2, 0, R) (q1, #, R) (h, #, N )
q2 (q2, 0, L) (q1, 1, R) (h, #, N )
h - - -
Then, the above Turing Machine may be denoted by the Transition Diagram shown
below, where we assume that q0 is the initial state and h is a final state.
Figure 1.4.2.1: Transition diagram of the above TM (initial state q0, final state h),
with edge labels 1/#, R; 1/1, R; #/#, R; #/#, N; 0/0, R; 0/0, L.
Ex. 3) Design a TM M that recognizes the language L of all strings over {a, b, c}
with
(i) number of a’s = Number of b’s = Number of c’s and
(ii) if (i) is satisfied, the final contents of the tape are the same as the input, i.e.,
the initial contents of the tape are also the final contents of the tape; else
rejects the string.
Ex. 4) Draw the Transition Diagram of the TM that recognizes strings of the form bn
dn, n ≥1 and was designed in the previous section.
Ex. 5) Design a TM that accepts the language of all palindromes over the alphabet
{a, b}. A palindrome is a string which equals the string obtained by reversing
the order of occurrence of letters in it. Further, find computations for each of
the strings (i) babb (ii) bb (iii) bab.
Ex. 6) Construct a TM that copies a given string over {a, b}. Further find a
computation of the TM for the string aab.
M = (Q, Σ, Γ, δ, q0, h)
For the definition and notation for Move, assume the TM is in the configuration
(q, a1 a2 … ai-1, ai , ai+1 … an)
Case i(a) if i > 1, then the move is the activity of TM of going from the configuration
The suffix M, denoting the TM under consideration, may be dropped, if the machine
under consideration is known from the context.
Case i(c) when i = n and b is the blank symbol #, then the move is denoted as
(q, a1 a2 … an-1, an, e) ├ (p, a1 a2 … an-2, an-1, e).
Case (iii) δ(q, ai) = (p, b, ‘No Move’), when the Head does not move;
then the move is denoted as
(q, a1 … ai-1, ai, ai+1 … an) ├ (p, a1 … ai-1, b, ai+1 … an)
Definition: Computation
If c0 is an initial configuration and for some n, the configurations c1, c2, …, cn are
such that c0, |– c1 |– … |– cn, then, the sequence of configurations c0, c1 … cn
constitutes a computation.
Definition: Language accepted by a TM
M = (Q, Σ, Γ, δ, q0, h), denoted by L(M), is defined as
L(M) = {ω ⏐ ω ∈ Σ* and if ω = a1 … an then
(q0, e, a1, a2 … an) ├* (h, b1 … bj-1, bj, bj+1 … bn)
for some b1 b2 … bn ∈ Γ*}.
L(M), the language accepted by the TM M, is the set of all finite strings ω over Σ
which are accepted by M.
There are at least two alternate, but of course, equivalent ways of defining a Turing
Decidable Language as given below:
A language L over Σ is Turing decidable if there is a TM that computes its
characteristic function
fL : Σ* → {Y, N}
such that for each ω ∈ Σ*,
fL(ω) = Y if ω ∈ L, and fL(ω) = N if ω ∉ L.
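As a concrete illustration — the choice of L here is mine, not the text’s — the characteristic function fL for L = {anbn⏐n ≥ 0} can be written directly:

```python
# A sketch of the characteristic function a deciding TM would compute
# for L = {a^n b^n | n >= 0} over the alphabet {a, b}.

def f_L(w):
    """Return 'Y' if w is in L = {a^n b^n}, else 'N'."""
    n = len(w) // 2
    return "Y" if w == "a" * n + "b" * n else "N"

print([f_L(w) for w in ("", "ab", "aabb", "aba", "ba")])   # ['Y', 'Y', 'Y', 'N', 'N']
```

A TM deciding L halts on every input with the correct answer; this is stronger than merely accepting L, where the machine need only halt in h on members of L.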
Remark 1.5.1
This raises the question of how to decide that an input string w is not accepted by
a TM M.
(ii) The tape Head is scanning the left-most cell containing the symbol x, the state
of M is say q, and δ(q, x) suggests a move to the ‘left’ of the current cell.
However, there is no cell to the left of the left-most cell. Therefore, the move is
not possible. The potentially resulting situation (we can’t say exactly
configuration) is called a Hanging configuration.
(iii) The TM on the given input w enters an infinite loop. For example, if
configuration is as
x y
↑
q0
1.6 OBSERVATIONS
The concept of TM is one of the most important concepts in the theory of
Computation. In view of its significance, we discuss a number of issues in respect of
TMs through the following remarks.
Remark 1.6.1
Turing Machine is not just another computational model, which may be further
extended by another still more powerful computational model. It is not only the most
powerful computational model known so far but also is conjectured to be the ultimate
computational model.
Turing Thesis: The power of any computational process is captured within the class
of Turing Machines.
It may be noted that the Turing thesis is just a conjecture and not a theorem; hence,
the Turing Thesis cannot be logically deduced from more elementary facts. However, the
conjecture can be shown to be false, if a more powerful computational model is
proposed that can recognize all the languages which are recognized by the TM model
and also recognizes at least one more language that is not recognized by any TM.
In view of the unsuccessful efforts made in this direction since 1936, when Turing
suggested his model, at least at present, it seems to be unlikely to have a more
powerful computational model than TM Model.
Remark 1.6.2
The Finite Automata and Push-Down Automata models were used only as accepting
devices for languages in the sense that the automata, when given an input string from
a language, tells whether the string is acceptable or not. The Turing Machines are
designed to play at least the following three different roles:
(i) As accepting devices for languages, similar to the role played by FAs and
PDAs.
A TM on entering the Halt State stops making moves and whatever string is there on
the tape, is taken as output irrespective of whether the position of Head is at the end
or in the middle of the string on the tape. However, an FA/PDA, while scanning a
symbol of the input tape, if enters a final state, can still go ahead (as it can do on
entering a non-final state) with the repeated activities of moving to the right, of
scanning the symbol under the head and of entering a new state etc. In the case of
FA⏐ PDA, the portion of string from left to the symbol under tape Head is accepted if
the state is a final state and is not accepted if the state is not a final state of the
machine.
To be more clear we repeat: the only difference in the two situations when an FA/PDA
enters a final state and when it enters a non-final state is that in the case of the first
situation, the part of the input scanned so far is said to be accepted/recognized,
whereas in the second situation the input scanned so far is said to be unaccepted.
Remark 1.6.4
Final State Version of Turing Machine
Instead of the version discussed above, in which a particular state is designated as
Halt State, some authors define TM in which a subset of the set of states Q is
designated as Set of Final States, which may be denoted by F. This version is
extension of Finite automata with the following changes, which are minimum required
changes to get a Turing Machine from an FA.
(i) The Head can move in both Left and Right directions whereas in PDA/FA the
head moves only to the Right.
(ii) The TM, while scanning a cell, can both read the cell and also, if required,
change the value of the cell, i.e., can write in the cell. In Finite Automata, the
Head only can read the cell. It can be shown that the Halt State version of TM is
equivalent to the Final State version of Turing Machine.
(iii) In this version, the TM machine halts only if in a given state and a given symbol
under the head, no next move is possible. Then the (initial) input on the tape of
TM, is unacceptable.
We have already discussed the Turing Machine in the role of language accepting
device. Next, we discuss how a TM can be used as a computer of functions.
Remark 1.7.1
For the purpose of discussing TMs as computers of functions, we make the following
assumptions:
• A string ω over some alphabet say ∑ will be written on the tape as #ω#, where #
is the blank symbol.
• Also initially, the TM will be scanning the right-most # of the string #ω#.
Thus, the initial configuration, (q0, #ω#) represents the starting point for the
computation of the function with ω as input.
Though, most of the time, we require functions of one or more arguments having only
integer values with values of arguments under the functions again as integers, yet, we
consider functions with domain and codomain over arbitrary alphabet sets say Σ0 and
Σ1 respectively, neither of which contains the blank symbol #.
Remark 1.7.3
Next, we discuss the case of functions which require k arguments, where k may be
any finite integer, greater than or equal to zero. For example,
the operation PLUS takes two arguments m and n and returns m + n.
# x1 x2 y1 y2 y3 z1 z2 #
then the above tape contents may even be interpreted as a single argument viz.,
x1 x2, y1 y2 y3 z1 z2. Therefore, in order, to avoid such an incorrect interpretation,
the arguments are separated by #. Thus, the above three arguments will be written on
the tape as
# x1 x2 # y1 y2 y3 # z1 z2 #
In general, if a function f takes k ≥ 1 arguments say ω1, ω2, …, ωk where each of these
arguments is a string over Σ0 (i.e., each ωi belongs to Σ0*) and if f (ω1, ω2, …, ωk) = μ
for some μ ∈ Σ1*; then we say f is Turing Computable if there is a Turing Machine
M such that
Remark 1.7.4
Instead of functions with countable, but otherwise arbitrary, sets as domains and
ranges, we consider only those functions, for each of which the domain and range is
the set of natural numbers. This is not a serious restriction in the sense that any
countable set can, through proper encoding, be considered as a set of natural numbers.
For natural numbers, there are various representations; some of the well-known
representations are Roman Numerals (e.g., VI for six), Decimal Numerals (6 for six),
Binary Numerals (110 for six). Decimal number system uses 10 symbols viz., 0, 1, 2,
3,4, 5, 6, 7, 8 and 9. Binary number system uses two symbols denoted by 0 and 1.
In the discussion of Turing Computable Functions, the unary representation
described below is found useful. The unary number system uses one symbol only:
Let the symbol be denoted by I; then the number with name six is represented as
I I I I I I. In this notation, zero is represented by the empty/null string. Any other
number, say twenty, is represented in the unary system by writing the symbol I
twenty times. In order to facilitate the discussion, the number n, in unary notation,
will be denoted by In instead of writing the symbol I n times.
The advantage of the unary representation is that, in view of the fact that most of the
symbols on the tape are input symbols and if the input symbol is just one, then the
next state will generally be determined by only the current state, because the other
determinant of the next state viz., tape symbol is most of the time the unary symbol.
We recall that for the set X, the notation X* represents the set of all finite strings of
symbols from the set X. Thus, any function f from the set of natural number to the set
of natural numbers, in the unary notation, is a function of the form f : {I}* → {I}*
The above idea may be further generalized to the functions of more than one
integer arguments. For example, SUM of two natural numbers n and m takes two
integer arguments and returns the integer (n + m). The initial configuration with the
tape containing the representation of the two arguments say n and m respectively, is of
the form
# I I … I # I I ……I #
where the string contains respectively n and m I’s between respective pairs of #’s and
Head scans the last #. The function SUM will be Turing computable if we can
design a TM which when started with the initial tape configuration as given above,
halts in the Tape configuration as given below:
# I I … I I ….. I #
where the above string contains n + m consecutive I’s between pair of #’s.
Example 1.7.5
Show that the SUM function is Turing Computable.
The problem under the above-mentioned example may also be stated as: Construct a
TM that finds the sum of two natural numbers.
The following design of the required TM is not efficient, yet it explains a number of
issues about which a student should be aware while designing a TM for computing
a function.
# # # ***
↑
q0
representing n = 0, m =0
Configuration (ii)
# # I … # ***
↑
q0
n = 0, m ≠ 0
Configuration (iii)
# I … # # ***
↑
q0
n ≠ 0, m = 0
Configuration (iv)
# I … # I … # ***
↑
q0
n ≠0, m ≠ 0
# … # … # … #
↑
q0
containing two or more than two #’s to the left of # being scanned in initial
configuration, as valid, where ‘…’ denotes sequence of I’s only.
Configuration (v)
*** I … ***
↑
Where at least one of *** does not contain # and initially the Head is scanning an I or
any symbol other than # . The configuration is invalid as it does not contain required
number of #’s.
Configuration (vii)
# … # ***
↑
Where *** does not contain # then the configuration represents only one of the
natural numbers.
Also, in case of legal initial configurations, the final configuration that represents the
result m + n should be of the form:
# ….. #
↑
halt
(ii) the TM Head attempts to fall off the left edge (i.e., the TM has Hanging
configuration); or
(iii) the TM does not have a move in a non-Halt state.
(b) In the case of legal moves for the TM for the SUM function, the first move of
the Head should be to the Left only.
(c) In this case, initially there are at least two more #’s on the left of the # being
scanned. Therefore, to keep count of the #’s, we must change state after
scanning each # . Let q1, q2 and q3 be the states in which the required TM enters
after scanning the three #’s
(d) In this case the movement of the Head, after scanning the initial # and also after
scanning one more # on the left, should continue to move to the Left only, so as
to be able to ensure the presence of third # also. Also, in states q1 and q2, the
TM need not change state on scanning I.
Thus we have,
δ (q0, #) = (q1, #, L),
δ(q1, #) = (q2, #, L)
and
δ(q1, I) = (q1, I, L), δ(q2, I) = (q2, I, L).
However, from this point onward, the Head should start moving to the Right.
∴ δ (q2, #) = (q3, #, R).
Thus, at this stage we are in a configuration of the form.
# #
↑
q3
For further guidance in the matter of the design of the required TM, we
again look back on the legal configurations.
(e) In the configuration just shown above in q3, if the symbol being scanned is # (as
in case of configuration (i) and configuration (ii)), then the only action required
is to skip over I’s, if any, and halt at the next # on the right.
However, if the symbol being scanned in q3 of the above configuration, happens
to be an I (as in case of configuration (iii) and configuration (iv)) then the
actions to be taken, that are to be discussed after a while, have to be different.
But in both cases, movement of the Head has to be to the Right. Therefore, we
need two new states say q4 and q5 such that
δ(q3, #) = (q4, #, R)
(the processing/scanning of the argument on the left is completed).
δ(q3, I) = (q5, I, R)
(the scanning of the argument on the left is initiated).
The cases of initial configuration (i) and configuration (ii) are taken care of by state q4.
Next, taking into consideration the cases of initial configuration (iii) and configuration
(iv), we decide about the next moves, including the states, in the current state q5.
# I # #
↑
q5
where the spaces between the #’s may be empty or non-empty sequences of I’s.
Next landmark symbol is the next # on the right. Therefore, we may skip over the I’s
without changing the state i.e.,
δ(q5, I) = (q5, I, R)
But we must change the state when # is encountered in q5, otherwise, the next
sequence of I’s will again be skipped over and we will not be able to distinguish
between configuration (iii) and configuration (iv) for further necessary action.
Therefore,
δ(q5, #) = (q6, #, R)
(notice that, though at this stage the scanning of the argument on the left is completed,
we cannot enter state q4, as was done earlier, because in this case the sequence of
subsequent actions has to be different: the # in the middle has to be deleted, which is
not done in state q4).
# # #
↑
q6
Next, in q6, if the current symbol is a #, as is the case in configuration (iii), then we
must halt after moving to the left i.e.,
δ(q6, #) = (halt, #, L)
we reach the final configuration
# I # #
↑
halt
However, if in q6 the current symbol is an I, as in the case of configuration (iv), then
we are in the configuration
# I # I #
↑
q6
Then the following sequence of actions is required for deleting the middle #:
Action (i): To remove the # in the middle so that we get a continuous sequence of I’s
to represent the final result. For this purpose, we move to the left and replace the #
by I. But then this gives one I more than the number of I’s required in the final result.
Therefore,
Action (ii): We must find the right-most I, replace it by # and stop, i.e., enter the halt
state. To accomplish Action (ii), we skip over all I’s to reach the next # on the right,
then move one cell left to the right-most I. Next, we replace that I by # and halt.
δ(q6, I) = (q7, I, L)
δ(q7, #) = (q8, I, R)
(at this stage we have replaced the # in the middle of two sequences of I’s by an I)
δ(q8, I) = (q8, I, R)
δ(q8, #) = (q9, #, L)
δ(q9, I) = (halt, #, N)
It can be verified that, through the above-mentioned moves, the designed TM does not
have a next move at some stage in the case of each of the illegal configurations. The
complete transition function δ is given by the following table:
        I              #
q0      -              (q1, #, L)
q1      (q1, I, L)     (q2, #, L)
q2      (q2, I, L)     (q3, #, R)
q3      (q5, I, R)     (q4, #, R)
q4      (q4, I, R)     (halt, #, N)
q5      (q5, I, R)     (q6, #, R)
q6      (q7, I, L)     (halt, #, L)
q7      -              (q8, I, R)
q8      (q8, I, R)     (q9, #, L)
q9      (halt, #, N)   -
halt    -              -
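As a check on the design, the machine can be simulated mechanically. The following Python sketch (an illustration, not part of the original design) encodes the transition function as a dictionary and runs the machine on the tape # I I # I I I #, representing 2 + 3, starting with the head on the right-most #:

```python
# Transition function of the SUM machine, taken from the table above:
# (state, scanned symbol) -> (new state, symbol written, head move)
delta = {
    ("q0", "#"): ("q1", "#", "L"),
    ("q1", "I"): ("q1", "I", "L"), ("q1", "#"): ("q2", "#", "L"),
    ("q2", "I"): ("q2", "I", "L"), ("q2", "#"): ("q3", "#", "R"),
    ("q3", "I"): ("q5", "I", "R"), ("q3", "#"): ("q4", "#", "R"),
    ("q4", "I"): ("q4", "I", "R"), ("q4", "#"): ("halt", "#", "N"),
    ("q5", "I"): ("q5", "I", "R"), ("q5", "#"): ("q6", "#", "R"),
    ("q6", "I"): ("q7", "I", "L"), ("q6", "#"): ("halt", "#", "L"),
    ("q7", "#"): ("q8", "I", "R"),
    ("q8", "I"): ("q8", "I", "R"), ("q8", "#"): ("q9", "#", "L"),
    ("q9", "I"): ("halt", "#", "N"),
}

def run_sum_tm(n, m):
    tape = list("#" + "I" * n + "#" + "I" * m + "#")
    pos, state = len(tape) - 1, "q0"      # head on the right-most #
    while state != "halt":
        state, tape[pos], move = delta[(state, tape[pos])]
        pos += {"L": -1, "R": 1, "N": 0}[move]
    return "".join(tape).count("I")       # I's left on the tape = n + m

print(run_sum_tm(2, 3))   # -> 5
```

The same driver confirms the degenerate cases (n = 0 or m = 0) handled through state q4 and the move δ(q6, #) = (halt, #, L).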
Remark 1.7.6
As mentioned earlier in the case of the design of a TM for recognizing the language of
strings of the form bⁿdⁿ, the design given above contains a very detailed explanation of
the various steps. The purpose is to explain the design process in fine detail for better
understanding of the students. However, the students need not supply such details
while solving a problem of designing a TM for computing a function. While giving the
values of Q, ∑, Γ explicitly and representing δ either by a table or a transition diagram,
we need to give only some supporting statements to help the understanding of the ideas
involved in the definitions of Q, ∑, Γ and δ.
Example 1.7.7
Construct a TM that multiplies two integers, each integer greater than or equal to zero
(Problem may also be posed as: show that multiplication of two natural numbers is
Turing Computable).
The legal and illegal configurations for this problem are the same as those of the
problem of designing TM for SUM function. Also, the moves required to check the
validity of input given for SUM function are the same and are repeated below:
δ( q0, #) = (q1, #, L)
δ(q1, #) = (q2, #, L)
δ(q1, I) = (q1, I, L)
δ(q2, #) = (q3, #, R)
δ(q2, I) = (q2, I, L)
# # #
↑
q3
Since the result must be zero when one of the multiplier and the multiplicand is zero,
to get the representation of zero we should enter a state, say q4, which skips all I’s and
meets the next # on the right.
Once the Head meets the required #, Head should move to the left replacing all I’s by
#’s and halt on the # it encounters so that we have the configuration
# # #
↑
Halt
The moves suggested by the above explanation covering configuration (i) and
configuration (ii) are:
δ (q3, #) = (q4, #, R)
δ (q4, I) = (q4, I, R)
δ (q4, #) = (q5, #, L)
δ (q5, I) = (q5, #, L)
δ (q5, #) = (Halt, #, R)
Case II
# I # #
↑
q3
If we take δ(q3, I) = (q4, #, R), then we get the following desired configuration in
finite number of moves:
# # # # # #
↑
Halt
Case III
While covering configuration (iv), at one stage we are in the configuration
⏐← n I’s → ⏐ ⏐← m I’s →⏐
# I … # I #
↑
q3
and we finally need to reach the configuration
⏐← n.m I’s →⏐
# # … # I I … I #
↑
halt
The strategy to get the representation for n.m I’s consists of the following steps:
(i) Replace the left-most I in the representation of n by # and then copy the m I’s into
the cells which are on the right of the # that was being scanned in the initial
configuration. In the subsequent moves, copying of the I’s is initiated in the
left-most cells of the region to the right of the last I on the tape, a region which
contains a continuous infinite sequence of #’s.
Repeat the process till all I’s of the initial representation of n are replaced by #.
At this stage, as shown in the following figure, the tape contains the m I’s of the
initial representation of the integer m and, additionally, n.m I’s. Thus the tape
contains m more I’s than are required in the representation of the final result.
Hence, we replace all I’s of m by #’s and finally, skipping over all I’s of the
representation of (n . m), we reach the # which is on the right of all the (n . m)
I’s on the tape, as required.
# # I # I ….. I # I ……… I #
⏐← m I’s → ⏐ ⏐← ((n−1).m) I’s → ⏐ ↑
Then we replace the # between two sequences of I’s by I and replace the right-most I
by # and halt.
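The copying strategy above can be previewed at the tape level with a short Python sketch (illustrative only; it mimics the strategy, not the individual TM moves): for each I of n, erased one at a time, a fresh copy of the m I’s is appended on the right, and finally the m I’s of the multiplier are erased.

```python
def multiply_on_tape(n, m):
    left = ["I"] * n       # representation of n
    middle = ["I"] * m     # representation of m
    result = []            # region to the right of the last #
    while left:
        left.pop()                    # replace the left-most I of n by #
        result.extend(["I"] * m)      # copy the m I's to the right end
    middle.clear()                    # erase the extra m I's of the multiplier
    return len(result)                # n.m I's remain, as required

print(multiply_on_tape(3, 4))   # -> 12
```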
The case of illegal initial configurations may be handled on similar lines as was done
for the SUM Turing machine.
Remark 1.7.8
The informal details given above for the design of TM for multiplication function
are acceptable as complete answer/solution for any problem about design of a
Turing Machine. However, if more detailed formal design is required, the
examiner should explicitly mention about the required details.
Details of case (iii) are not being provided for the following reasons:
(i) Details are left as an exercise for the students
(ii) After some time we will learn how to construct more complex machines out of
already constructed machines, starting with the construction of very simple
machines. One of the simple machines discussed later is a copying machine
which copies symbols on a part of the tape, in other locations on the tape.
Ex. 7) Design a TM to compute the binary function MONUS (or also called PROPER
SUBTRACTION) defined as follows:
Monus : N × N → N
such that
monus (m, n) = m − n, if m ≥ n
             = 0,     otherwise.
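Before designing the machine, it may help to pin down the function itself; in Python the definition reads:

```python
def monus(m, n):
    # proper subtraction: m - n when m >= n, and 0 otherwise
    return m - n if m >= n else 0

print(monus(5, 3), monus(2, 4))   # -> 2 0
```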
1.8 SUMMARY
In this unit, after giving informal idea of what a Turing machine is, the concept is
formally defined and illustrated through a number of examples. Further, it is explained
how TM can be used to compute mathematical functions. Finally, a technique is
explained for designing more and more complex TMs out of already designed TMs,
starting with some very simple TMs.
1.9 SOLUTIONS/ANSWERS
Ex. 1)
The transition diagram of the required TM is as shown below:
Figure: 1.9.1
The next move function δ is given by the transition diagram above. If the
input string is of even length the TM reaches the halt state h. However, if the
input string is of odd length, then TM does not find any next move in state q1
indicating rejection of the string.
Ex. 2)
Figure: 1.9.2
Ex. 3)
Figure: 1.9.3
In state q1, we move towards left skipping over all symbols to reach the
leftmost symbol of the tape and enter state q5.
In q5, we start searching for b by moving to the right skipping over all non-
blank symbols except b and if such b exists, reach state q2.
In state q2, we move towards left skipping over all symbols to reach the
leftmost symbol of the tape and enter q6.
In q6, we start searching for c by moving to the right skipping over all non-
blank symbols except c and if such c exists, reach state q3.
In state q3, we move towards left, skipping all symbols, to reach the leftmost
symbol of the tape and enter state q0.
If in any one of the states q4, q5 or q6 no next move is possible, then reject the
string.
Else repeat the above process till all a’s are converted to A’s, all b’s to B’s
and all c’s to C’s.
Step II is concerned with restoring a’s from A’s, b’s from B’s and c’s from C’s,
while moving from right to left in state q7, and then, after successfully completing
the work, moving to the halt state h.
Ex. 4)
The Transition Diagram of the TM that recognizes strings of the form bn dn,
n ≥1 and designed in the previous section is given by the following Figure.
Figure: 1.9.4
Ex. 5)
The transition Figure of the required TM is as shown below.
Figure: 1.9.5
(i) In state q0, at any stage, if the TM finds the blank symbol then the TM has
found a palindrome of even length. Otherwise, it notes the symbol
being read and attempts to match it with the last non-blank symbol on the
tape. If the symbol is a, the TM replaces it by # and goes to state q1, in
which it skips all a’s and b’s; on #, the TM from q1 goes to q3 to
find a matching a in the last non-blank symbol position. If a is found, the TM
goes to q5 and replaces that a by #. However, if b is found, then the TM has no
next move, indicating the string is not a palindrome. However, if in state q3 only #’s
are found, then it indicates that the previous a was the middle-most
symbol of the given string, indicating a palindrome of odd length.
Similar is the case when b is found in state q0, except that the next state is
q2 in this case, and the roles of a’s and b’s are interchanged in the above
argument.
Figure: 1.9.6
# a A b # # ***
and for the input, output on the tape is of the form
# a a b # A a b # # ***
But, in this process, the original string of a’s and b’s is converted to a string of A’s
and B’s. At this stage the TM goes from q1 to state q8 to replace each A by a and
each B by b. This completes the task.
Ex.7)
# I …. I # I …. I #
↑
q0
and as observed above
δ (q0, #) = (q1, #, L)
I I #
↑
q1
and
# #
↑
q1
guide us to moves
δ (q1, I) = (q2, #, L)
(change of state is essential, else other I’s will also be converted to #’s),
δ (q1, #) = ( halt, #, N)
Observation 3: The moves are guided by the principle that, for each right-most I
converted to # in the block on the right, the corresponding left-most I in the block on
the left is also converted to #.
δ (q2, I) = (q2, I, L)
δ (q2, #) = (q3, #, L)
δ (q3, I) = (q3, I, L)
δ (q3, #) = (q4, #, R)
(We have reached the right-most # on the left of all I’s as shown below)
# # #
↑
q4
If, in state q4, the symbol being scanned is # instead of an I, then it must have
resulted from an initial configuration in which m < n, represented by,
say,
# I I # I I I #
↑
q4
Therefore, we must now enter a state, say q7, which erases all the remaining I’s on
the right (writing # over each of them, so that the tape finally represents 0) and then
halts.
Therefore
δ (q4, #) = (q7, #, R)
δ (q7, I) = (q7, #, R)
δ ( q7, #) = ( halt, #, N)
δ (q5, I) = (q5, I, R)
δ (q5, #) = (q6, #, R)
(the middle # is being crossed while moving from left to right)
δ (q6, I) = (q6, I, R)
δ (q6, #) = (q0, #, N)
(the left-most # on right side is scanned in q6 to reach q0 so that whole process
may be repeated again.)
Summarizing the above moves, the transition table for the δ function is given by:
        I              #
q0      -              (q1, #, L)
q1      (q2, #, L)     (halt, #, N)
q2      (q2, I, L)     (q3, #, L)
q3      (q3, I, L)     (q4, #, R)
q4      (q5, #, R)     (q7, #, R)
q5      (q5, I, R)     (q6, #, R)
q6      (q6, I, R)     (q0, #, N)
q7      (q7, #, R)     (halt, #, N)
halt    -              -
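The design can be checked by simulation. The Python sketch below (illustrative) encodes the moves derived above; note that q7 is taken here to erase the left-over I’s (writing # while moving right) rather than merely skip them — an adjustment assumed so that inputs with m < n correctly leave zero I’s on the tape:

```python
# (state, scanned symbol) -> (new state, symbol written, head move)
delta = {
    ("q0", "#"): ("q1", "#", "L"),
    ("q1", "I"): ("q2", "#", "L"), ("q1", "#"): ("halt", "#", "N"),
    ("q2", "I"): ("q2", "I", "L"), ("q2", "#"): ("q3", "#", "L"),
    ("q3", "I"): ("q3", "I", "L"), ("q3", "#"): ("q4", "#", "R"),
    ("q4", "I"): ("q5", "#", "R"), ("q4", "#"): ("q7", "#", "R"),
    ("q5", "I"): ("q5", "I", "R"), ("q5", "#"): ("q6", "#", "R"),
    ("q6", "I"): ("q6", "I", "R"), ("q6", "#"): ("q0", "#", "N"),
    ("q7", "I"): ("q7", "#", "R"),   # erase left-over I's when m < n
    ("q7", "#"): ("halt", "#", "N"),
}

def monus_tm(m, n):
    tape = list("#" + "I" * m + "#" + "I" * n + "#")
    pos, state = len(tape) - 1, "q0"   # head on the right-most #
    while state != "halt":
        state, tape[pos], move = delta[(state, tape[pos])]
        pos += {"L": -1, "R": 1, "N": 0}[move]
    return "".join(tape).count("I")    # I's left = monus(m, n)
```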
Ex.8)
# I ... I # # …
↑
q0
⏐← n I’s →⏐
If n is even, then f(n) = 0, which is represented by a final configuration of the form,
# #
↑
halt
If n is odd, then f(n) = 1, which is represented by a final configuration of the form,
# I #
↑
halt
δ (q0, #) = (q2, #, L)
δ (q2, I) = (q1, #, L)
δ (q2, #) = (halt, #, N)
δ (q1, I) = (q2, #, L)
δ (q1, #) = (q3, #, R)
δ (q3, #) = (halt, I, R)
Recall that, for a move δ(qi, ak) = (qj, al, m), the sequence of actions is as follows:
first, al is written in the current cell, so far containing ak. Then the tape head moves
to the left, to the right, or does not move, according as the value of m is L, R or N.
Finally, the state of the control changes to qj.
δ       #               I
q0      (q2, #, L)      (q1, #, L)
q1      (q3, #, R)      (q2, #, L)
q2      (halt, #, N)    (q1, #, L)
q3      (halt, I, R)    -
halt    -               -
UNIT 2 ALGORITHMICALLY
UNSOLVABLE PROBLEMS
Structure Page Nos.
2.0 Introduction 39
2.1 Objectives 39
2.2 Decidable and Undecidable Problems 39
2.3 The Halting Problem 40
2.4 Reduction to Another Undecidable Problem 44
2.5 Undecidability of Post Correspondence Problem 46
2.6 Undecidable Problems for Context Free Languages 47
2.7 Other Undecidable Problems 48
2.8 Summary 49
2.9 Solutions/Answers 49
2.10 Further Readings 52
2.0 INTRODUCTION
In this unit, we discuss issues and problems that exhibit the limitations of computing
devices in solving problems. We also prove the undecidability of the halting
problem. It is related to Gödel's Incompleteness Theorem which states that there is
no system of logic strong enough to prove all true sentences of number theory.
2.1 OBJECTIVES
After going through this unit, you should be able to:
where
q0ω denotes the initial configuration, with the left-most symbol of the string ω being
scanned in state q0, and qf g(ω) denotes the final configuration.
For some problems, we are interested in a simpler solution, in terms of “yes” or “no”.
For example, consider the following problem about context-free grammars: for a
context-free grammar G, is G ambiguous? For some G the answer will be “yes”,
for others it will be “no”, but clearly we must have one or the other. The problem is
to decide whether the statement is true for any G we are given. The domain for this
problem is the set of all context free grammars. We say that a problem is decidable if
there exists a Turing machine that gives the correct answer for every statement in the
domain of the problem.
A class of problems with two outputs “yes” or “no” is said to be decidable (solvable)
if there exists some definite algorithm which always terminates (halts) with one of
two outputs “yes” or “no”. Otherwise, the class of problems is said to be undecidable
(unsolvable).
Trial solution: Just run the machine M with the given input w. If M halts, we can
answer “yes”; but if M does not halt, we would wait forever without ever being able
to answer “no”.
What we need is an algorithm that can determine the correct answer for any M and w
by performing some analysis on the machine’s description and the input. But, we will
show that no such algorithm exists.
Let us first see the proof, devised by Alan Turing (1936), that the halting problem is
unsolvable.
Suppose you have a solution to the halting problem in terms of a machine, say, H.
H takes two inputs:
1. a program M and
2. an input w for the program M.
[Block diagram: H takes M and w, and outputs “halt” if M halts on w, or “loop” if M
runs forever on w.]
So now H can be revised to take M as both inputs (the program and its input) and H
should be able to determine if M will halt on M as its input.
Let us construct a new, simple algorithm K that takes H's output as its input and does
the following:
1. if H outputs “loop” then K halts,
2. otherwise H’s output of “halt” causes K to loop forever.
[Block diagrams: H run with M supplied as both of its inputs, and the machine K
built from H with the outputs inverted; finally K is applied to itself.]
If H says that K halts then K itself would loop (that’s how we constructed it).
If H says that K loops then K will halt.
In either case H gives the wrong answer for K. Thus H cannot work in all cases.
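The self-referential trick can even be typed into Python. In the sketch below (purely illustrative; the names k_factory and h are ours), k_factory builds K out of any claimed halting decider h(program, input), and whatever h answers about K run on itself turns out to be wrong:

```python
def k_factory(h):
    # h(program, input) is a claimed halting decider: True means "halts".
    def k(x):
        if h(k, x):            # h predicts K halts on x ...
            while True:        # ... so K loops forever instead
                pass
        # h predicts K loops on x, so K simply halts
    return k

# Take a concrete decider that always answers "loops":
h = lambda program, inp: False
k = k_factory(h)
k(k)   # halts immediately -- contradicting h's verdict on (k, k)
```

Had we instead chosen an h that answers “halts” on (k, k), then k(k) would loop forever, again contradicting h; no choice of h can be right on this input.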
We’ve shown that it is possible to construct an input that causes any solution H to
fail. Hence, the halting problem is undecidable.
Theorem 1.1: There does not exist any Turing machine H that behaves as required
by Definition 1.1. The halting problem is therefore undecidable.
[Block diagram: machine H with input WMw, starting in state q0 and ending in state
qy (“halts”) or qn (“loops”).]
We can achieve this by adding two more states say, q1 and q2. Transitions are defined
from qy to q1, from q1 to q2 and from q2 to q1, regardless of the tape symbol, in such a
way that the tape remains unchanged. This is shown by another block diagram given
below.
[Block diagram: machine H1, obtained from H by adding states q1 and q2 so that,
after reaching qy, the machine cycles forever between q1 and q2.]
Here, ∞ indicates that the Turing machine is in an infinite loop, i.e., the Turing
machine will run forever. Next, we construct another Turing machine H2 from H1. This new machine
takes as input WM and copies it, ending in its initial state q0. After that, it behaves
exactly like H1. The action of H2 is such that
This clearly contradicts what we assumed. In either case H2 gives the wrong answer
for WM. Thus H cannot work in all cases.
We’ve shown that it is possible to construct an input that causes any solution H to
fail. Hence, the halting problem is undecidable.
Theorem 2.2: If the halting problem were decidable, then every recursively
enumerable language would be recursive. Consequently, the halting problem is
undecidable.
2. If H says “yes”, then apply M to w. But M must halt, so it will ultimately tell
us whether w is in L or not.
This constitutes a membership algorithm, making L recursive. But, we know that
there are recursively enumerable languages that are not recursive. The contradiction
implies that H cannot exist, i.e., the halting problem is undecidable.
Not so surprising: although this result is sweeping in scope, perhaps it is not too
surprising. If a simple question such as whether a program halts or not is
undecidable, why should one expect that any other property of the input/output
behaviour of programs is decidable? Rice’s theorem makes it clear that failure to
decide halting implies failure to decide any other interesting question about the
input/output behaviour of programs. Before we consider Rice’s theorem, we need to
understand the concept of problem reduction, on which its proof is based.
One may ask: why is this important? A reduction of problem B to problem A shows
that problem A is at least as difficult to solve as problem B. Also, we can show the
following:
Proof: We prove that the halting problem is reducible to the totality problem. That
is, if an algorithm can solve the totality problem, it can be used to solve the halting
problem. Since no algorithm can solve the halting problem, the totality problem must
also be undecidable.
The reduction is as follows. For any TM M and input w, we create another TM M1
that takes an arbitrary input, ignores it, and runs M on w. Note that M1 halts on all
inputs if and only if M halts on input w. Therefore, an algorithm that tells us whether
M1 halts on all inputs also tells us whether M halts on input w, which would be a
solution to the halting problem.
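Viewed as programs-as-values, the construction of M1 is one line. In the Python sketch below (illustrative names; run_m stands for M), the wrapper ignores its own input and runs M on the fixed w, so it halts on every input exactly when M halts on w:

```python
def make_m1(run_m, w):
    # M1 ignores its argument and runs M on the fixed input w,
    # so M1 halts on every input iff M halts on w.
    def m1(_ignored):
        return run_m(w)
    return m1

# Toy stand-in for M: computing the length of w certainly halts.
m1 = make_m1(len, "ab")
print(m1(123), m1("anything"))   # -> 2 2
```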
Proof: We prove that the totality problem is reducible to the equivalence problem.
That is, if an algorithm can solve the equivalence problem, it can be used to solve the
totality problem. Since no algorithm can solve the totality problem, the equivalence
problem must also be unsolvable.
The reduction is as follows. For any TM M, we can construct a TM M1 that takes any
input w, runs M on that input, and outputs “yes” if M halts on w. We can also
construct a TM M2 that takes any input and simply outputs “yes.” If an algorithm can
tell us whether M1 and M2 are equivalent, it can also tell us whether M1 halts on all
inputs, which would be a solution to the totality problem.
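The same programs-as-values view makes this second reduction concrete (again purely illustrative): M1 answers “yes” once M has halted on the input, while M2 answers “yes” unconditionally, so deciding their equivalence would decide the totality of M1:

```python
def make_m1(run_m):
    def m1(w):
        run_m(w)          # may or may not halt
        return "yes"      # reached only if M halted on w
    return m1

def m2(w):
    return "yes"          # halts and says "yes" on every input

# Toy stand-in for M that halts on everything:
m1 = make_m1(len)
sample = ["", "a", "abc"]
print(all(m1(w) == m2(w) for w in sample))   # -> True
```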
Practical implications
• The fact that the totality problem is undecidable means that we cannot write a
program that can find any infinite loop in any program.
• The fact that the equivalence problem is undecidable means that the code
optimization phase of a compiler may improve a program, but can never
guarantee finding the optimally efficient version of the program. There may be
potentially improved versions of the program that it cannot even be sure are
equivalent.
• A property of a program (TM) can be viewed as the set of programs that have
that property.
• The language accepted by a TM contains two different strings of the same
length.
Rice’s theorem can be used to show that whether the language accepted by a Turing
machine is context-free, regular, or even finite, are undecidable problems.
Not all properties of programs are functional.
Let Σ be an alphabet, and let L and M be two lists of nonempty strings over Σ, such
that L and M have the same number of strings. We can represent L and M as follows:
• Each of the integers is less than or equal to k. (Recall that each list has k
strings).
If there exists a sequence (i, j, k, ..., m) satisfying the above conditions, then
(i, j, k, ..., m) is a solution of the PCP.
Alphabet Σ = { a, b }
List L = (a, ab)
List M = (aa, b)
We see that ( 1, 2 ) is a sequence of integers that solves this PCP instance, since the
concatenation of a and ab is equal to the concatenation of aa and b
(i.e., w1w2 = v1v2 = aab). Other solutions include: (1, 2, 1, 2), (1, 2, 1, 2, 1, 2),
and so on.
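A proposed PCP solution is easy to verify mechanically. The following Python sketch checks a candidate index sequence (1-based, as in the text) against the two lists:

```python
def is_pcp_solution(L, M, seq):
    top = "".join(L[i - 1] for i in seq)      # concatenation from list L
    bottom = "".join(M[i - 1] for i in seq)   # concatenation from list M
    return top == bottom

L = ["a", "ab"]
M = ["aa", "b"]
print(is_pcp_solution(L, M, [1, 2]))        # -> True  (both give "aab")
print(is_pcp_solution(L, M, [1, 2, 1, 2]))  # -> True
print(is_pcp_solution(L, M, [2, 1]))        # -> False
```

Of course, this only verifies a given sequence; the undecidability result says there is no algorithm that decides, for every instance, whether some such sequence exists.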
List L = ( 0, 01000, 01 )
List M = ( 000, 01, 1 )
2.6 UNDECIDABLE PROBLEMS FOR CONTEXT
FREE LANGUAGES
The Post correspondence problem is a convenient tool to study undecidable questions
for context free languages. We illustrate this with an example.
Theorem 1.2: There exists no algorithm for deciding whether any given context-free
grammar is ambiguous.
and
where the set of productions P is the union of two subsets: the first set PA consists of
S → SA,
SA → ui SA ai | ui ai, i = 1, 2, …, n,
and the second set PB consists of
S → SB,
SB → vi SB ai | vi ai, i = 1, 2, …, n.
Now take
Then,
LA = L(GA),
LB = L(GB),
and
L(G) = LA ∪ LB.
If a string w in L(G) ends with ai, then its derivation with grammar GA must have
started with S ⇒ ui SA ai. Similarly, we can tell at any later stage which rule has to
be applied. Thus, if G is ambiguous, it must be because there is a w for which there
are two derivations
S ⇒ SA ⇒ ui SA ai ⇒* ui uj … uk ak … aj ai = w
and
S ⇒ SB ⇒ vi SB ai ⇒* vi vj … vk ak … aj ai = w.
Consequently, if G is ambiguous, then the Post correspondence problem with the pair
(A, B) has a solution. Conversely, if G is unambiguous, then the Post correspondence
problem cannot have a solution.
If there existed an algorithm for solving the ambiguity problem, we could adapt it to
solve the Post correspondence problem. But, since there is no algorithm for the Post
correspondence problem, we conclude that the ambiguity problem is undecidable.
• Does Turing machine M halt for any input? (That is, is L(M) = ∅?)
• If G is an unrestricted grammar, does L(G) = ∅?
• Does L1 ∩ L2 = ∅?
• Does L1 = L2?
• Does L1 ⊆ L2?
• Is L empty?
• Is L finite?
Hint: The problem is described as follows: Given any Turing machine M = (Q, Σ,
Τ, δ, q0, F) and any q ∈ Q, w∈ Σ+, to determine whether Turing machine M,
when given input w, ever enters state q.
Hint: The problem is described as follows: Given a Turing machine M, does the
Turing machine M halt when given a blank input tape?
Alphabet Σ = { 0, 1, 2 }
List L = ( 0, 1, 2 )
List M = ( 00, 11, 22 )
Does PCP have a solution ?
Alphabet Σ = { a, b }
List L = ( ba, abb, bab )
List M = ( bab, bb, abb )
Does PCP have a solution ?
Ex. 5) Does PCP with two lists A = (b, babbb, ba) and B = (bbb, ba, a) have a
solution ?
Ex. 6) Does PCP with two lists A = (ab, b, b) and B = (abb, ba, bb) have a solution ?
Ex.7) Show that there does not exist algorithm for deciding whether or not
2.8 SUMMARY
• “Undecidable” does not mean that we do not know of a solution today but
might find one tomorrow. It means we can never find an algorithm for the
problem.
• We can show that no solution can exist for a problem A by reducing another
problem B, already known to be undecidable, to problem A.
2.9 SOLUTIONS/ANSWERS
Ex. 1)
The only way a Turing machine M halts is if it enters a state q for which
some transition function δ(qi, ai) is undefined. Add a new final state Z to
the Turing machine, and add all these missing transitions to lead to state Z.
Now use the (assumed) state-entry procedure to test whether state Z is ever
entered when M is given input w. This will reveal whether the original
machine M halts. We conclude that it must not be possible to build the
assumed state-entry procedure.
Ex. 2)
Here, we will reduce the blank tape halting problem to the halting problem.
Given M and w, we first construct from M a new machine Mw that starts
with a blank tape, writes w on it, then positions itself in configuration q0w.
After that, Mw acts exactly like M. Hence, Mw will halt on a blank tape if
and only if M halts on w.
Suppose that the blank tape halting problem were decidable. Given any M
and w, we first construct Mw, then apply the blank tape halting problem
algorithm to it. The conclusion tells us whether M applied to w will halt.
Since this can be done for any M and w, an algorithm for the blank tape
halting problem can be converted into an algorithm for the halting problem.
Since the halting problem is undecidable, the same must be true for the
blank tape halting problem.
Ex. 3)
There is no solution to this problem, since for any potential solution, the
concatenation of the strings from list L will contain half as many letters as
the concatenation of the corresponding strings from list M.
Ex. 4)
ba
bab
The next choice from L must begin with b. Thus we choose either w1 or w3,
as their strings start with the symbol b. But the choice of w1 will make the
two strings look like:
baba
babbab
While the choice of w3 forces the choice of v3, and the strings will look
like:
babab
bababb
Since the string from list M again exceeds the string from list L by the
single symbol b, a similar argument shows that we should pick w3 from
list L and v3 from list M. Thus, there is only one sequence of choices that
generates compatible strings, and for this sequence the string from M is
always one character longer. Thus, this instance of PCP has no solution.
Ex. 5)
We see that ( 2, 1, 1, 3 ) is a sequence of integers that solves this PCP
instance, since the concatenation of babbb, b, b and ba is equal to the
concatenation of ba, bbb, bbb and a (i.e., w2 w1 w1 w3 = v2 v1 v1 v3 =
babbbbbba).
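The claimed solution can be verified in the same mechanical way (a quick illustrative check):

```python
A = ["b", "babbb", "ba"]
B = ["bbb", "ba", "a"]
seq = [2, 1, 1, 3]                          # 1-based indices, as in the text
top = "".join(A[i - 1] for i in seq)        # w2 w1 w1 w3
bottom = "".join(B[i - 1] for i in seq)     # v2 v1 v1 v3
print(top, bottom, top == bottom)   # -> babbbbbba babbbbbba True
```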
Ex. 6)
For each string in A and the corresponding string in B, the length of the
string from A is less than that of the counterpart string from B for the same
sequence number. Hence, the string generated by any sequence of strings
from A is shorter than the string generated by the sequence of corresponding
strings from B. Therefore, the PCP has no solution.
Ex. 7)
Proof: Consider two sequences of strings A = (u1, u2, … , um) and
B = (v1, v2, … , vm) over some alphabet ∑, and choose a new set of
distinct symbols a1, a2, … , am. Consider the two grammars GA, with
productions
SA → ui SA ai | ui ai, i = 1, 2, …, m,
and GB, with productions
SB → vi SB ai | vi ai, i = 1, 2, …, m.
If L(GA) and L(GB) have a common element, then the pair (A, B) has a
PC-solution. Conversely, if the pair does not have a PC-solution, then
L(GA) and L(GB) cannot have a common element.
2.10 FURTHER READINGS
1. Elements of the Theory of Computation, H.R. Lewis & C.H.Papadimitriou: PHI,
(1981).
UNIT 3 COMPLEXITY OF ALGORITHMS
3.0 INTRODUCTION
In unit 2 of the block, we discussed a number of problems which cannot be solved by
algorithmic means and also discussed a number of issues about such problems.
The concept of the size of a problem, though fundamental, is difficult to define
precisely. Generally, the size of a problem is measured in terms of the size of the
input. The concept of the size of an input of a problem may be explained informally
through examples. In the case of multiplication of two n×n (square) matrices, the
size of the problem may be taken as n², i.e., the number of elements in each matrix to
be multiplied. For problems involving polynomials, the degrees of the polynomials
may be taken as a measure of the sizes of the problems.
(i) However, for every NTM solution, there is a Deterministic TM (DTM) solution
of a problem. Therefore, if there is an NTM solution of a problem, then there is
an algorithmic solution of the problem. However, the symmetry may end here.
P denotes the class of all problems, for each of which there is at least one
known polynomial time Deterministic TM solving it.
NP denotes the class of all problems, for each of which, there is at least one
known Non-Deterministic polynomial time solution. However, this solution
may not be reducible to a polynomial time algorithm, i.e, to a polynomial time
DTM.
Thus starting with two distinct classes of problems, viz., tractable problems and
intractable problems, we introduced two classes of problems called P and NP. Some
interesting relations known about these classes are:
(i) P = set of tractable problems
(ii) P⊆ NP.
(The relation (ii) above simply follows from the fact that every Deterministic TM is a
special case of a Non-Deterministic TM).
However, it is not known whether P=NP or P ⊂ NP. This forms the basis for the
subject matter of the rest of the chapter. As a first step, we introduce some notations
to facilitate the discussion of the concept of computational complexity.
3.1 OBJECTIVES
Suppose, a supercomputer executes instructions one million times faster than another
computer. Then irrespective of the size of a (solvable) problem and the solution used
to solve it, the supercomputer solves the problem roughly million times faster than the
computer, if the same solution is used on both the machines to solve the problem.
Thus we conclude that the time requirement for execution of a solution, changes
roughly by a constant factor on change in hardware, software and environmental
factors.
Similarly, computers, when required, are generally used not to find roots of quadratic
equations but for finding roots of complex equations including polynomial equations
of degrees more than hundreds or sometimes even thousands.
The above discussion leads to the conclusion that when considering time complexities
Completeness f1(n) and f2(n) of (computer) solutions of a problem of size n, we need to consider and
compare the behaviours of the two functions only for large values of n. If the relative
behaviours of two functions for smaller values conflict with the relative behaviours
for larger values, then we may ignore the conflicting behaviour for smaller values.
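To see such a conflict concretely, take the two complexity functions used in the example that follows, f1(n) = 1000n² and f2(n) = 5n⁴; a quick illustrative check locates where their order reverses:

```python
f1 = lambda n: 1000 * n ** 2   # time complexity of the first solution
f2 = lambda n: 5 * n ** 4      # time complexity of the second solution

# f1 is larger only for small sizes; from n = 15 onwards it stays smaller.
print(all(f1(n) >= f2(n) for n in range(1, 15)))    # -> True
print(all(f1(n) < f2(n) for n in range(15, 1000)))  # -> True
```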
For example, if the earlier considered two functions
f1(n) = 1000 n2 and
f2(n) = 5n4
represent time complexities of two solutions of a problem of size n, then despite the
fact that
f1(n) ≥ f2(n) for n ≤ 14,
we would still prefer the solution having f1(n) as its time complexity, because
f1(n) < f2(n) for all n ≥ 15, and the gap grows rapidly as n increases.
This explains the reason for the presence of the phrase ‘n ≥ k’ in the definitions
of the various measures of complexities discussed below:
f(n) = n² − 5n and
g(n) = n²
then
Remark 3.2.3.1
In the discussion of any one of the five notations, generally two functions, say f and g,
are involved. The functions have their domains and codomains as N, the set of natural
numbers, i.e.,
f: N→N
g: N→N
Remark 3.2.3.2
The purpose of these asymptotic growth rate notations and functions denoted by these
notations, is to facilitate the recognition of essential character of a complexity
function through some simpler functions delivered by these notations. For example, a
complexity function f(n) = 5004 n3 + 83 n2 + 19 n + 408, has essentially the same
behaviour as that of g(n) = n3 as the problem size n becomes larger and larger. But
g(n) = n3 is much more comprehensible than the function f(n). Let us discuss the
notations, starting with the notation O.
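The claim that f(n) = 5004n³ + 83n² + 19n + 408 behaves essentially like n³ can also be seen numerically: the ratio f(n)/n³ settles near the leading coefficient 5004 as n grows (an illustrative check):

```python
f = lambda n: 5004 * n ** 3 + 83 * n ** 2 + 19 * n + 408
g = lambda n: n ** 3

for n in (10, 100, 10_000):
    print(n, f(n) / g(n))   # ratio approaches 5004 from above
```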
Solutions
Part (i)
Consider
Thus, we have found the required constants C and k. Hence, f(x) is O(x³).
Part (ii)
Part (iii)
Part (iv)
We prove the result by contradiction. Let there exist positive constants C and k
such that
Part (v)
Example: The big-oh notation can be used to estimate Sn, the sum of first n positive
integers
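The estimate runs as follows: Sn = 1 + 2 + … + n ≤ n + n + … + n = n·n = n², so Sn is O(n²), with witnesses C = 1 and k = 1. A quick numerical confirmation (sketch):

```python
def S(n):
    return n * (n + 1) // 2   # closed form for the sum of the first n positive integers

# S(n) <= 1 * n**2 for all n >= 1, exhibiting C = 1 and k = 1
print(all(S(n) <= n ** 2 for n in range(1, 1001)))   # -> True
```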
Remark 3.2.4.2
It can be easily seen that, for given functions f(x) and g(x), if there exists one pair of C
and k with f(x) ≤ C.g(x) for all x ≥ k, then there exist infinitely many pairs (Ci, ki)
which satisfy the inequality, because for any Ci ≥ C and any ki ≥ k,
f(x) ≤ Ci.g(x) for all x ≥ ki.
Let f(x) and g(x) be two functions, each from the set of natural numbers or set of positive real numbers to positive real numbers. Then f(x) is said to be Ω(g(x)) (pronounced as big-omega of g of x) if there exist positive constants C and k such that f(x) ≥ C·g(x) for all x ≥ k.
Solutions:
Part (i)
Part (ii)
Part (iii)
It can easily be seen that the smaller the value of C, the better the chances of the above inequality being true. So, to begin with, let us take C = 1 and try to find a value of k such that
2x³ − 4x² + 2 ≥ 0.
Part (iv)
Let
x³ = Ω(2x³ − 3x² + 2)
be true. Therefore, let C > 0 and k > 0 be such that
x³ ≥ C(2x³ − 3x² + 2) for all x ≥ k.
For C = ½ and k = 1, the above inequality is true, since x³ ≥ x³ − (3/2)x² + 1 holds whenever (3/2)x² ≥ 1, i.e., for all x ≥ 1.
Part (v)
Let x² = Ω(3x³ − 2x² + 2). Then there exist positive constants C and k such that
x² ≥ C(3x³ − 2x² + 2) for all x ≥ k,
which implies
(2C + 1)/C ≥ x for all x ≥ k.
But for any x ≥ 2(2C + 1)/C, the above inequality cannot hold. Hence the contradiction.
Let f(x) and g(x) be two functions, each from the set of natural numbers or positive real numbers to positive real numbers. Then f(x) is said to be Θ(g(x)) (pronounced as big-theta of g of x) if there exist positive constants C1, C2 and k such that
C2 g(x) ≤ f(x) ≤ C1 g(x) for all x ≥ k.
(Note the last inequalities represent two conditions to be satisfied simultaneously viz.,
C2 g(x) ≤ f(x) and f(x) ≤ C1 g(x)).
We state the following theorem without proof, which relates the three notations O, Ω and Θ.
Theorem: For any two functions f(x) and g(x), f(x) = Θ (g(x)) if and only if
f(x) = O (g(x)) and f(x) = Ω (g(x)).
Examples 3.2.6.1: For the function
f(x) = 2x³ + 3x² + 1, show that
(i) f(x) = Θ(x³), (ii) f(x) ≠ Θ(x²), and (iii) f(x) ≠ Θ(x⁴).
Solutions
Part (i)
The inequalities C2 x³ ≤ 2x³ + 3x² + 1 ≤ C1 x³ hold for all x ≥ k, for C1 = 3, C2 = 1 and k = 4.
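The constants claimed in Part (i) can be sanity-checked numerically; the sketch below verifies both Θ-inequalities for f(x) = 2x³ + 3x² + 1 over a range of values:

```python
# Verify C2*x**3 <= f(x) <= C1*x**3 for f(x) = 2x^3 + 3x^2 + 1,
# with C1 = 3, C2 = 1, for all x from k = 4 up to 1000.

def f(x):
    return 2 * x**3 + 3 * x**2 + 1

C1, C2, k = 3, 1, 4
ok = all(C2 * x**3 <= f(x) <= C1 * x**3 for x in range(k, 1001))
print(ok)
```

Note that k = 4 is really needed here: at x = 3 we have f(3) = 82 > 3·3³ = 81, so the upper bound fails for smaller x.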
Part (ii)
Let, if possible, for some positive integers k and C1, we have
2x³ + 3x² + 1 ≤ C1·x² for all x ≥ k.
Then
x³ ≤ C1·x² for all x ≥ k,
i.e.,
x ≤ C1 for all x ≥ k.
But for
x = max {C1 + 1, k}
the last inequality is not true.
Part (iii)
f(x) ≠ Θ(x⁴): otherwise, there would exist positive constants C2 and k such that
C2 x⁴ ≤ (2x³ + 3x² + 1) for all x ≥ k.
If such a C2 exists for some k, then C2 x⁴ ≤ 2x³ + 3x² + 1 ≤ 6x³ for all x ≥ k ≥ 1, implying
C2 x ≤ 6 for all x ≥ k.
But for x = (6/C2) + 1,
the above inequality is false. Hence, the claim is proved by contradiction.
Then for f(x) = O(x³), though there exist C and k such that
f(x) ≤ C(x³) for all x ≥ k,
yet there may also be some values of x ≥ k for which the equality
f(x) = C(x³)
holds. However, if we consider
f(x) = O(x⁴),
then there cannot exist a positive integer C such that
f(x) = C(x⁴) for all x ≥ k;
the bound x⁴ is never tight.
The Notation o
Let f(x) and g(x) be two functions, each from the set of natural numbers or positive
real numbers to positive real numbers.
Further, let C > 0 be any number; then f(x) = o(g(x)) (pronounced as little-oh of g of x) if there exists a natural number k satisfying
f(x) < C g(x) for all x ≥ k ≥ 1.               (B)
Here we may note the following:
(i) In the little-oh definition, the inequality must hold for every C > 0, not merely for some C.
(ii) The inequality (B) is strict, whereas the inequality (A) of big-oh is not necessarily strict.
Solution
Let C > 0 be given; we wish to find k satisfying the requirement of little-oh.
Consider
2 + 3/x + 1/x³ < C x.
For x ≥ 1 the left-hand side is at most 2 + 3 + 1 = 6 < 7; hence, if we take
k = max {7/C, 1},
then
2x³ + 3x² + 1 < C x⁴ for x ≥ k.
Therefore,
2x³ + 3x² + 1 < C xⁿ for n ≥ 4 and for all x ≥ k,
with k = max {7/C, 1}.
Part (ii)
We prove the result by contradiction. Let, if possible, f(x) = o(xⁿ) for n ≤ 3. Then there exist C > 0 and k ≥ 1 such that
2 + 3/x + 1/x³ < C xⁿ⁻³ for n ≤ 3 and x ≥ k.
As C is arbitrary (the inequality must hold for every C > 0), we take C = 1; then the above inequality reduces to
2 + 3/x + 1/x³ < xⁿ⁻³ for n ≤ 3 and x ≥ k ≥ 1.
Since xⁿ⁻³ ≤ 1 for n ≤ 3 and x ≥ 1, this gives
2 + 3/x + 1/x³ ≤ 1 for n ≤ 3.
However, the last inequality is not true. Hence the proof by contradiction.
We state (without proof) below two results which can be useful in finding a small-oh upper bound for a given function.
Theorem 3.2.7.3: Let f(x) and g(x) be functions as in the definition of the small-oh notation. Then f(x) = o(g(x)) if and only if
lim_{x→∞} f(x)/g(x) = 0.
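A convenient way to see that 2x³ + 3x² + 1 = o(xⁿ) for n ≥ 4 is that the ratio of the two functions tends to 0 (the analogue, for little-oh, of the limit characterization stated for little-omega in Theorem 3.2.8.3 below). A numeric sketch:

```python
# The ratio f(x)/x**4 tends to 0, illustrating 2x^3 + 3x^2 + 1 = o(x^4).

def f(x):
    return 2 * x**3 + 3 * x**2 + 1

ratios = [f(x) / x**4 for x in (10, 100, 1000, 10000)]
print(ratios)  # strictly decreasing toward 0
```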
Next, we introduce the last asymptotic notation, namely, small-omega. The relation of
small-omega to big-omega is similar to what is the relation of small-oh to big-oh.
Let f(x) and g(x) be two functions each from the set of natural numbers or the set of
positive real numbers to set of positive real numbers.
Further,
f(x) = ω(g(x))
if, for every C > 0, there exists a positive integer k such that
C g(x) < f(x) for all x ≥ k.
Solution:
Let C > 0 be given. Consider
2x² + 3x + 1/x > C (dividing throughout by x).
Let k be an integer with k ≥ C + 1. Then, for all x ≥ k,
2x² + 3x + 1/x ≥ 2x² + 3x ≥ 2k² + 3k > 2C² + 3C > C. (∵ k ≥ C + 1)
∴ f(x) = ω(x)
Theorem 3.2.8.3: Let f(x) and g(x) be functions as in the definition of little-omega. Then f(x) = ω(g(x)) if and only if
lim_{x→∞} f(x)/g(x) = ∞
or
lim_{x→∞} g(x)/f(x) = 0.
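The limit characterization can be checked numerically for the example above, where f(x) = 2x³ + 3x² + 1 and g(x) = x: the ratio f(x)/g(x) grows without bound. A small sketch:

```python
# The ratio f(x)/x grows without bound, illustrating
# 2x^3 + 3x^2 + 1 = omega(x).

def f(x):
    return 2 * x**3 + 3 * x**2 + 1

ratios = [f(x) / x for x in (10, 100, 1000)]
print(ratios)  # strictly increasing
```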
a) Possible inputs
b) Possible outcomes
c) Entities occurring, and operations on these entities, in the (dynamic) problem domains.
In this sense of the definition of a problem, most problems cannot even be defined, let alone solved. Think of the following problems.
These are some of the problems, the definition of each of which requires enumeration of potentially infinitely many parameters, and which hence are almost impossible to define.
(II) Problems which can be formally defined but cannot be solved by computational means. We discussed some of these problems in the previous unit.
(III) Problems which, though theoretically solvable by computational means, are infeasible, i.e., they require such a large amount of computational resources that it is practically not feasible to solve them by computational means. These problems are called intractable or infeasible problems. The distinguishing feature of these problems is that for each of them, every solution has time complexity which is an exponential, or at least non-polynomial, function of the problem size.
(IV) Problems that are called feasible, or theoretically not difficult, to solve by computational means. The distinguishing feature of these problems is that for each instance of any of these problems, there exists a Deterministic Turing Machine that solves the problem and has time-complexity a polynomial function of the size of the problem. This class of problems is denoted by P.
(V) Last, but probably most interesting, class includes a large number of problems, for each of which it is not known whether it is in P or not.
These problems fall somewhere between class III and class IV given above. However, for each of the problems in this class, it is known that it is in NP, i.e., each can be solved by at least one Non-Deterministic Turing Machine whose time complexity is a polynomial function of the size of the problem.
A problem from the class NP can, equivalently but more intuitively, be defined as one for which a potential solution, if given, can be verified in polynomial time to be actually a solution or not.
The problems in this class are called NP-Complete problems (to be formally defined later). Informally, a problem is NP-complete if it is in NP and is at least as hard as every other problem in NP; for no NP-complete problem has a polynomial-time Deterministic TM solution been found so far.
The most interesting aspect of NP-complete problems is that, for each of these problems, so far it has been possible neither to design a Deterministic polynomial-time TM solving the problem nor to show that a Deterministic polynomial-time TM solution cannot exist.
The idea of NP-completeness was introduced by Stephen Cook ∗ in 1971 and the
satisfiability problem defined below is the first problem that was proved to be NP-
complete, of course, by S. Cook.
A good source for the study of NP-complete problems and of related topics is Garey &
Johnson+
Problem 1: Satisfiability problem (or, for short, SAT) states: Given a Boolean
expression, is it satisfiable?
(i) Boolean variables x1, x2, …, xi, …, each of which can assume a value either TRUE (generally denoted by 1) or FALSE (generally denoted by 0), and
(ii) Boolean connectives ∧ (AND), ∨ (OR) and ¬ (NOT).
* Cook S.A.: The complexity of theorem-proving procedures, Proceedings of the Third Annual ACM Symposium on Theory of Computing, New York: Association for Computing Machinery, 1971, pp. 151-158.
+ Garey M.R. and Johnson D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, New York, 1979.
For example: Let x1 = 0, x2 = 1, and x3 = 1 be one of the eight possible assignments to a Boolean expression involving x1, x2 and x3.
Truth-value of a Boolean expression:
The truth value of ((x1 ∧ x2) ∨ ¬x3) for the truth-assignment x1 = 0, x2 = 1 and x3 = 1 is
((0 ∧ 1) ∨ ¬1) = (0 ∨ 0) = 0.
For example: x1 = 1, x2 = 0 and x3 = 0 is one assignment that makes the Boolean expression ((x1 ∧ x2) ∨ ¬x3) True. Therefore, ((x1 ∧ x2) ∨ ¬x3) is satisfiable.
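Satisfiability of a small expression such as ((x1 ∧ x2) ∨ ¬x3) can be decided by brute force over all 2ⁿ truth assignments; the sketch below encodes the expression from the example as a Python function:

```python
from itertools import product

# The Boolean expression ((x1 AND x2) OR (NOT x3)), encoded as a function.
def expr(x1, x2, x3):
    return (x1 and x2) or (not x3)

# Try all 2**3 = 8 truth assignments; the expression is satisfiable
# if at least one of them makes it True.
satisfying = [a for a in product([False, True], repeat=3) if expr(*a)]
print(len(satisfying) > 0)
```

Of course, this exhaustive search takes exponential time in the number of variables, which is exactly why SAT is interesting.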
Each Ci is called a conjunct. It can easily be shown that every logical expression can equivalently be expressed in CNF.
Problem 3: 3-Satisfiability (or, for short, 3-SAT) Problem: Given a Boolean expression in 3-CNF, is it satisfiable?
Given a set of cities C = {C1, C2, …, Cn} with n > 1, and a function d which assigns to each pair of cities (Ci, Cj) some cost of travelling from Ci to Cj. Further, a positive integer/real number B is given. The problem is to find a route (covering each city exactly once) with cost at most B.
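In exponential time, the decision version just stated can be answered by trying every route; the sketch below uses a hypothetical symmetric 4-city cost matrix (an assumption for illustration only):

```python
from itertools import permutations

# Hypothetical symmetric cost matrix for 4 cities (illustration only).
d = [[0, 10, 15, 20],
     [10, 0, 35, 25],
     [15, 35, 0, 30],
     [20, 25, 30, 0]]

def has_route_within(budget):
    """Is there a tour visiting every city exactly once with cost <= budget?"""
    n = len(d)
    for perm in permutations(range(1, n)):          # fix city 0 as the start
        route = (0,) + perm + (0,)
        cost = sum(d[route[i]][route[i + 1]] for i in range(n))
        if cost <= budget:
            return True
    return False

print(has_route_within(80), has_route_within(70))
```

For this matrix the cheapest tour costs 80, so the answer flips between these two budgets; the search examines (n − 1)! routes, which is infeasible for large n.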
[Figure: a graph on the vertices 1, 2, 3, 4]
Then the above graph has one Hamiltonian circuit, viz., (1, 2, 4, 3, 1).
Explanation: A vertex cover for a graph G is a set C of vertices such that each edge of G has an endpoint in C. For example, for the graph shown above, {1, 2, 3} is a vertex cover. It can easily be seen that every superset of a vertex cover of a graph is also a vertex cover of the graph.
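Verifying that a given vertex set is a vertex cover is a simple polynomial-time check (the same check that later places the problem in NP). Since only the vertex labels of the figure survive here, the edge list below is a hypothetical one:

```python
# Hypothetical edge list for a graph on vertices 1..4 (assumed for
# illustration; the unit's figure is not reproduced here).
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]

def is_vertex_cover(cover, edges):
    """Every edge must have at least one endpoint in the cover."""
    return all(u in cover or v in cover for (u, v) in edges)

print(is_vertex_cover({1, 2, 3}, edges))  # True
print(is_vertex_cover({1, 4}, edges))     # False: edge (2, 3) is uncovered
```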
[Figure: a graph on the vertices 1, 2, 3, 4 in which the vertices 1, 2, 3 are mutually adjacent]
As the vertices 1, 2, 3 are mutually adjacent, we require at least three colours for the k-colouring problem.
Explanation: For a given graph G = (V, E), two vertices v1 and v2 are said to be
adjacent if there is an edge connecting the two vertices in the graph.
A subgraph H= (V1, E1) of a graph G = (V, E) is a graph such that
V1 ⊆ V and E1 ⊆ E. In other words, each vertex of the subgraph is a vertex of the graph and each edge of the subgraph is an edge of the graph.
For example, in the above graph, the subgraph containing the vertices {1, 2, 3} and the edges (1, 2), (1, 3), (2, 3) is a complete subgraph, or a clique, of the graph. However, the whole graph is not a clique, as there is no edge between vertices 1 and 4.
Problem 10: Independent set problem: Given a graph G = (V, E) and a positive
integer k, is there an independent set of vertices with at least k elements?
Problem 11: The subgraph isomorphism problem: Given graph G1 and G2,
does G1 contain a copy of G2 as a subgraph?
Explanation: Two graphs H1 = (V1, E1) and H2 = (V2, E2) are said to be isomorphic if we can rename the vertices in V2 in such a manner that, after renaming, the graphs H1 and H2 look identical (not necessarily pictorially, but as ordered pairs of sets).
For example:
[Figure: two isomorphic graphs, one on the vertices 1, 2, 3, 4 and the other on the vertices a, b, c, d]
Problem 12: Given a graph G and a positive integer k, does G have an “edge cover” of k edges?
Explanation: For a given graph G = (V, E), a subset E1 of the set of edges E of the graph is said to be an edge cover of G if every vertex is an end of at least one of the edges in E1.
[Figure: a graph on the vertices 1, 2, 3, 4]
The two-edge set {(1, 4), (2, 3)} is an edge cover for the graph.
Problem 13: Exact cover problem: For a given set P = {S1, S2, …, Sm}, where
each Si is a subset of a given set S, is there a subset Q of P such
that for each x in S, there is exactly one Si in Q for which x is in
Si ?
Example: Let S = {1, 2, …, 10}
and P = {S1, S2, S3, S4, S5} s.t.
S1 = {1, 3, 5}
S2 = {2, 4, 6}
S3 = {1, 2, 3, 4}
S4 = {5, 6, 7, 9, 10}
S5 = {7, 8, 9, 10 }
Then Q = {S1, S2, S5} is an exact cover for S.
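That Q covers each element of S exactly once can be verified mechanically; the sketch below uses the sets from the example:

```python
from collections import Counter

S = set(range(1, 11))
subsets = {
    "S1": {1, 3, 5},
    "S2": {2, 4, 6},
    "S3": {1, 2, 3, 4},
    "S4": {5, 6, 7, 9, 10},
    "S5": {7, 8, 9, 10},
}

def is_exact_cover(chosen):
    """Each element of S must occur in exactly one chosen subset."""
    counts = Counter(x for name in chosen for x in subsets[name])
    return all(counts[x] == 1 for x in S)

print(is_exact_cover(["S1", "S2", "S5"]))  # True: each element exactly once
print(is_exact_cover(["S3", "S4"]))        # False: 8 is not covered
```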
Problem 14: The knapsack problem: Given a list of k integers n1, n2, …, nk, can we partition these integers into two sets such that the sum of the integers in each of the two sets is the same? (This version is also known as the partition problem.)
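For small inputs the question can be settled by brute force over all subsets; a sketch (the example lists are hypothetical):

```python
from itertools import combinations

def can_partition(nums):
    """Can nums be split into two sets with equal sums?"""
    total = sum(nums)
    if total % 2:
        return False
    target = total // 2
    # Try every subset; the complement automatically holds the other half.
    return any(sum(c) == target
               for r in range(len(nums) + 1)
               for c in combinations(nums, r))

print(can_partition([3, 1, 1, 2, 2, 1]))  # True:  e.g. {3, 2} and {1, 1, 2, 1}
print(can_partition([1, 2, 5]))           # False: total 8, no subset sums to 4
```

The search examines 2ᵏ subsets, again exponential in the input size.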
In this section, we formally define the concept and then describe a general technique
of establishing the NP-Completeness of problems and finally apply the technique to
show some of the problems as NP-complete. We have already explained how a
problem can be thought of as a language L over some alphabet Σ . Thus the terms
problem and language may be interchangeably used.
The process of transforming the instances of a problem already known to be undecidable into instances of the problem whose undecidability is to be checked, is called reduction.
A somewhat similar, but slightly different and rather special, reduction called polynomial-time reduction is used to establish the NP-Completeness of problems.
The direction of the mapping must be clearly understood, as shown below:

    P1 ──Polynomial-time Reduction──> P2
Though we have already explained the concept of NP-Completeness, yet for the sake of completeness, we give below the formal definition of NP-Completeness.
In this context, we introduce below another closely related and useful concept.
However, from the above definitions, it is clear that every NP-complete problem L
must be NP-Hard and additionally should satisfy the condition that L is an NP-class
problem.
However, to begin with, there is a major hurdle in execution of the second step. The
above technique of reduction can not be applied unless we already have established at
least one problem as NP-Complete. Therefore, for the first NP-Complete problem, the
NP-Completeness has to be established in a different manner.
The proof that the Satisfiability problem is the first NP-Complete problem is quite lengthy, and we skip it. Interested readers may consult any of the texts given in the references.
Assuming the satisfiability problem to be NP-complete, the rest of the problems that we establish as NP-complete are established by the reduction method as explained above.
A diagrammatic notation of the form:

    SAT
     │
    3-CNF-SAT
     ├──> Clique Problem ──> Vertex Cover ──> Hamiltonian Cycle ──> Travelling Salesman
     └──> Subset-Sum

Figure: 3.1
[Figure: graph with one node for each literal of each clause of a 3-CNF formula, the nodes being labelled by the literals (x1, ¬x1, x2, ¬x2, x3, ¬x3) as they occur in the clauses]
Figure: 3.2
For each of the literals, create a graph node, and connect each node to every node in other clauses, except those with the same variable but different sign. This graph can be easily computed from a Boolean formula φ in 3-CNF in polynomial time. Consider an example, if we have−
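Although the specific example formula is not reproduced here, the construction just described can be sketched directly, using a hypothetical 3-CNF formula: literals become nodes, and two nodes are joined exactly when they lie in different clauses and are not negations of each other:

```python
from itertools import combinations

# A hypothetical 3-CNF formula: each clause is a list of literals,
# a literal being (variable, sign) with sign True for x, False for NOT x.
clauses = [
    [("x1", True), ("x2", True), ("x3", True)],
    [("x1", False), ("x2", False), ("x3", True)],
    [("x1", False), ("x2", True), ("x3", False)],
]

# Node = (clause index, position in clause); connect nodes in different
# clauses unless they are the same variable with opposite signs.
nodes = [(i, j) for i, clause in enumerate(clauses) for j in range(len(clause))]
edges = set()
for (i1, j1), (i2, j2) in combinations(nodes, 2):
    if i1 == i2:
        continue  # never connect literals of the same clause
    (v1, s1), (v2, s2) = clauses[i1][j1], clauses[i2][j2]
    if v1 == v2 and s1 != s2:
        continue  # contradictory literals are not connected
    edges.add(((i1, j1), (i2, j2)))

print(len(nodes), len(edges))
```

The formula (with m clauses) is satisfiable exactly when this graph has a clique of size m, which is what makes the construction a reduction.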
A vertex cover of an undirected graph G = (V, E) is a subset V′ of the vertices of the graph which contains at least one of the two endpoints of each edge.
[Figure: a graph G on the vertices A, B, C, D, E, F, and its complement G′]
The vertex cover problem is the optimization problem of finding a vertex cover of minimum size in a graph. The problem can also be stated as a decision problem:
Proof: To show that the vertex cover problem ∈ NP: for a given graph G = (V, E), we take V′ ⊆ V and verify whether it forms a vertex cover. Verification can be done by checking, for each edge (u, v) ∈ E, whether u ∈ V′ or v ∈ V′. This verification can be done in polynomial time.
Now, we show that the clique problem can be transformed to the vertex cover problem in polynomial time. This transformation is based on the notion of the complement of a graph G. Given an undirected graph G = (V, E), we define the complement of G as G′ = (V, E′), where E′ = {(u, v) | (u, v) ∉ E}, i.e., G′ is the graph containing exactly those edges that are not in G. The transformation takes an instance (G, k) of the clique problem and computes the complement G′, which can be done in polynomial time.
To complete the proof, we show that this transformation is indeed a reduction: the graph G has a clique of size k if and only if the graph G′ has a vertex cover of size |V| − k.
Suppose that G has a clique V′ ⊆ V with |V′| = k. We claim that V − V′ is a vertex cover in G′. Let (u, v) be any edge in E′. Then (u, v) ∉ E, which implies that at least one of u or v does not belong to V′, since every pair of vertices in V′ is connected by an edge of E. Equivalently, at least one of u or v is in V − V′, which means that edge (u, v) is covered by V − V′. Since (u, v) was chosen arbitrarily from E′, every edge of E′ is covered by a vertex in V − V′. Hence, the set V − V′, which has size |V| − k, forms a vertex cover for G′.
Conversely, suppose that G′ has a vertex cover V′ ⊆ V, where |V′| = |V| − k. Then, for all u, v ∈ V, if (u, v) ∈ E′, then u ∈ V′ or v ∈ V′ or both. The contrapositive of this implication is that for all u, v ∈ V, if u ∉ V′ and v ∉ V′, then (u, v) ∈ E. In other words, V − V′ is a clique, and it has size |V| − |V′| = k.
For example, the graph G = (V, E) has a clique {A, B, E}. The complement graph G′ then has the vertex cover {C, D, F} (equivalently, {A, B, E} is an independent set in G′).
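The reduction can be illustrated concretely. The sketch below uses a hypothetical 6-vertex edge list containing the clique {A, B, E} (the figure's exact edges are not reproduced in the text), complements the graph, and checks that V − V′ = {C, D, F} is a vertex cover of G′ of size |V| − k = 3, as the proof predicts:

```python
from itertools import combinations

V = {"A", "B", "C", "D", "E", "F"}
# Hypothetical edge list containing the clique {A, B, E} (an assumption;
# the figure's exact edges are not reproduced in the text).
E = {frozenset(e) for e in [("A", "B"), ("A", "E"), ("B", "E"),
                            ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]}

# Complement graph G': exactly the vertex pairs that are NOT edges of G.
E_comp = {frozenset(p) for p in combinations(sorted(V), 2)} - E

def is_clique(S):
    return all(frozenset(p) in E for p in combinations(sorted(S), 2))

def is_vertex_cover(cover, edges):
    return all(set(e) & cover for e in edges)

clique = {"A", "B", "E"}
print(is_clique(clique))                    # True
print(is_vertex_cover(V - clique, E_comp))  # True, as the theorem predicts
```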
3.6 SUMMARY
In this unit a number of concepts are defined.
P denotes the class of all problems, for each of which there is at least one known
polynomial time Deterministic TM solving it.
NP denotes the class of all problems, for each of which, there is at least one known
Non-Deterministic polynomial time solution. However, this solution may not be
reducible to a polynomial time algorithm, i.e., to a polynomial time DTM.
Next, five Well Known Asymptotic Growth Rate Notations are defined.
Let f(x) and g(x) be two functions, each from the set of natural numbers or set of
positive real numbers to positive real numbers.
The Notation Θ
Provides simultaneously both asymptotic lower bound and asymptotic upper bound
for a given function.
Let f(x) and g(x) be two functions, each from the set of natural numbers or positive real numbers to positive real numbers. Then f(x) is said to be Θ(g(x)) (pronounced as big-theta of g of x) if there exist positive constants C1, C2 and k such that
C2 g(x) ≤ f(x) ≤ C1 g(x) for all x ≥ k.
The Notation o
Let f(x) and g(x) be two functions, each from the set of natural numbers or positive
real numbers to positive real numbers.
Further, let C > 0 be any number; then f(x) = o(g(x)) (pronounced as little-oh of g of x) if there exists a natural number k satisfying
f(x) < C g(x) for all x ≥ k.
The Notation ω
Again, the asymptotic lower bound Ω may or may not be tight. However, the asymptotic bound ω cannot be tight. The formal definition of ω is as follows:
Let f(x) and g(x) be two functions each from the set of natural numbers or the set of
positive real numbers to set of positive real numbers.
Further,
f(x) = ω(g(x))
if, for every C > 0, there exists a positive integer k such that
C g(x) < f(x) for all x ≥ k.
Finally in Section 3.4, we discussed how some of the problems defined in Section 3.2
are established as NP-Complete.
3.7 SOLUTIONS/ANSWERS
Ex.1)
n!/nⁿ = (n/n)((n−1)/n)((n−2)/n)((n−3)/n)…(2/n)(1/n)
= 1·(1−(1/n))(1−(2/n))(1−(3/n))…(2/n)(1/n)
Each factor on the right-hand side is less than or equal to 1 for all values of n.
Hence, the right-hand side expression is always at most one.
Therefore, n!/nⁿ ≤ 1
or, n! ≤ nⁿ.
Therefore, n! = O(nⁿ).
Ex. 2)
For large values of n, 3 log n << n².
Therefore, (3 log n)/n² << 1, and
(n² + 3 log n)/n² = 1 + (3 log n)/n²
or, (n² + 3 log n)/n² < 2 for sufficiently large n
or, n² + 3 log n = O(n²).
Ex.3)
Given a set of integers, we have to divide the set into two disjoint sets such that their sums are equal.
To show that the partition problem ∈ NP: for a given set S, we take S1 ⊆ S and S2 ⊆ S with S1 ∩ S2 = ∅ and S1 ∪ S2 = S, and verify whether the sum of all the elements of set S1 is equal to the sum of all the elements of set S2. This verification can be done in polynomial time.
Ex. 5)
Ex. 6)
An independent set is defined as a subset of the vertices of a graph such that no two of its vertices are adjacent.
Proof: To show that the independent set problem ∈ NP: for a given graph G = (V, E), we take V′ ⊆ V and verify whether it forms an independent set. Verification can be done by checking, for each pair u ∈ V′ and v ∈ V′, whether (u, v) ∈ E. This verification can be done in polynomial time.
[Figure: a graph on the vertices A, B, C, D, E, F]
Figure: 3.5

[Figure: a graph on the vertices A, B, C, D, E, F]
Figure: 3.6
Ex.7)
Proof: To show that the travelling salesman problem ∈ NP, we show that verification of the problem can be done in polynomial time. We are given a constant M and a closed-circuit path of a weighted graph G = (V, E); does such a path exist in graph G, with the total weight of the path less than M? Verification can be done by checking, for each edge (u, v) of the path, whether (u, v) ∈ E, and whether the sum of the weights of these edges is less than M. This verification can be done in polynomial time.