DATA STRUCTURES AND ALGORITHMS-II
For S.Y.B.Sc. Computer Science : Semester – IV (Paper – I)
[Course Code CS 241 : Credits - 2]
CBCS Pattern
As Per New Syllabus, Effective from June 2020
Price ` 190.00
N5541
DATA STRUCTURES & ALGORITHMS-II ISBN 978-93-90506-31-6
First Edition : January 2021
© : Author
The text of this publication, or any part thereof, should not be reproduced or transmitted in any form or stored in any computer
storage system or device for distribution including photocopy, recording, taping or information retrieval system or reproduced on any disc,
tape, perforated media or other information storage device etc., without the written permission of Author with whom the rights are
reserved. Breach of this condition is liable for legal action.
Every effort has been made to avoid errors or omissions in this publication. In spite of this, errors may have crept in. Any mistake, error
or discrepancy so noted, when brought to our notice, shall be taken care of in the next edition. It is notified that neither the publisher
nor the author or seller shall be responsible for any damage or loss of action to anyone, of any kind, in any manner, therefrom.
DISTRIBUTION CENTRES
PUNE
Nirali Prakashan : 119, Budhwar Peth, Jogeshwari Mandir Lane, Pune 411002, Maharashtra
(For orders within Pune) Tel : (020) 2445 2044; Mobile : 9657703145
Email : [email protected]
Nirali Prakashan : S. No. 28/27, Dhayari, Near Asian College Pune 411041
(For orders outside Pune) Tel : (020) 24690204; Mobile : 9657703143
Email : [email protected]
MUMBAI
Nirali Prakashan : 385, S.V.P. Road, Rasdhara Co-op. Hsg. Society Ltd.,
Girgaum, Mumbai 400004, Maharashtra; Mobile : 9320129587
Tel : (022) 2385 6339 / 2386 9976, Fax : (022) 2386 9976
Email : [email protected]
DISTRIBUTION BRANCHES
JALGAON
Nirali Prakashan : 34, V. V. Golani Market, Navi Peth, Jalgaon 425001, Maharashtra,
Tel : (0257) 222 0395, Mob : 94234 91860; Email : [email protected]
KOLHAPUR
Nirali Prakashan : New Mahadvar Road, Kedar Plaza, 1st Floor Opp. IDBI Bank, Kolhapur 416 012
Maharashtra. Mob : 9850046155; Email : [email protected]
NAGPUR
Nirali Prakashan : Above Maratha Mandir, Shop No. 3, First Floor,
Rani Jhanshi Square, Sitabuldi, Nagpur 440012, Maharashtra
Tel : (0712) 254 7129; Email : [email protected]
DELHI
Nirali Prakashan : 4593/15, Basement, Agarwal Lane, Ansari Road, Daryaganj
Near Times of India Building, New Delhi 110002 Mob : 08505972553
Email : [email protected]
BENGALURU
Nirali Prakashan : Maitri Ground Floor, Jaya Apartments, No. 99, 6th Cross, 6th Main,
Malleswaram, Bengaluru 560003, Karnataka; Mob : 9449043034
Email: [email protected]
Other Branches : Hyderabad, Chennai
Note : Every possible effort has been made to avoid errors or omissions in this book. In spite of this, errors may have crept in. Any type of
error or mistake so noted, when brought to our notice, shall be taken care of in the next edition. It is notified that neither the publisher,
nor the author or book seller shall be responsible for any damage or loss of action to anyone of any kind, in any manner, therefrom. The
reader must cross-check all the facts and contents with original Government notifications or publications.
[email protected] | www.pragationline.com
Also find us on www.facebook.com/niralibooks
Preface …
I take this opportunity to present this Text Book on "Data Structures and Algorithms-II"
to the students of Second Year B.Sc. (Computer Science) Semester-IV as per the New
Syllabus, June 2020.
The book has its own unique features. It brings out the subject in a very simple and lucid
manner for easy and comprehensive understanding of the basic concepts. The book covers
theory of Tree, Efficient Search Trees, Graph and Hash Table.
A special word of thanks to Shri. Dineshbhai Furia and Mr. Jignesh Furia for
showing full faith in me to write this text book. I also thank Mr. Amar Salunkhe and
Mr. Akbar Shaikh of M/s Nirali Prakashan for their excellent co-operation.
I also thank Ms. Chaitali Takle, Mr. Ravindra Walodare, Mr. Sachin Shinde, Mr. Ashok
Bodke, Mr. Moshin Sayyed and Mr. Nitin Thorat.
Although every care has been taken to check mistakes and misprints, some errors may
remain; any errors, omissions and suggestions from teachers and students for the
improvement of this text book shall be most welcome.
Author
Syllabus …
1. Tree (10 Hrs.)
1.1 Concept and Terminologies
1.2 Types of Binary Trees - Binary Tree, Skewed Tree, Strictly Binary Tree, Full Binary Tree,
Complete Binary Tree, Expression Tree, Binary Search Tree, Heap
1.3 Representation - Static and Dynamic
1.4 Implementation and Operations on Binary Search Tree - Create, Insert, Delete, Search, Tree
Traversals - Preorder, Inorder, Postorder (Recursive Implementation), Level-Order Traversal
using Queue, Counting Leaf, Non-Leaf and Total Nodes, Copy, Mirror
1.5 Applications of Trees
1.5.1 Heap Sort, Implementation
1.5.2 Introduction to Greedy Strategy, Huffman Encoding (Implementation using Priority
Queue)
2. Efficient Search Trees (8 Hrs.)
2.1 Terminology: Balanced Trees - AVL Trees, Red Black Tree, Splay Tree, Lexical Search
Tree -Trie
2.2 AVL Tree - Concept and Rotations
2.3 Red Black Trees - Concept, Insertion and Deletion
2.4 Multi-Way Search Tree - B and B+ Tree - Insertion, Deletion
3. Graph (12 Hrs.)
3.1 Concept and Terminologies
3.2 Graph Representation - Adjacency Matrix, Adjacency List, Inverse Adjacency List, Adjacency
Multi-list
3.3 Graph Traversals - Breadth First Search and Depth First Search (With Implementation)
3.4 Applications of Graph
3.4.1 Topological Sorting
3.4.2 Use of Greedy Strategy in Minimal Spanning Trees (Prim’s and Kruskal’s Algorithm)
3.4.3 Single Source Shortest Path - Dijkstra’s Algorithm
3.4.4 Dynamic Programming Strategy, All Pairs Shortest Path - Floyd Warshall Algorithm
3.4.5 Use of Graphs in Social Networks
4. Hash Table (6 Hrs.)
4.1 Concept of Hashing
4.2 Terminologies - Hash Table, Hash Function, Bucket, Hash Address, Collision, Synonym,
Overflow etc.
4.3 Properties of Good Hash Function
4.4 Hash Functions: Division Function, Mid Square, Folding Methods
4.5 Collision Resolution Techniques
4.5.1 Open Addressing - Linear Probing, Quadratic Probing, Rehashing
4.5.2 Chaining - Coalesced, Separate Chaining
1.0 INTRODUCTION
• A tree is a non-linear data structure. A non-linear data structure is one in which its
elements do not form a sequence.
• A tree data structure is a widely used Abstract Data Type (ADT) that simulates a
hierarchical tree structure, with a root value and subtrees of children with a parent
node, represented as a set of linked nodes.
• A tree data structure stores the data elements in a hierarchical manner. A tree is a
hierarchical data structure that consists of nodes connected by edges.
• Let us see an example of a directory structure stored by an operating system. The
operating system organizes files into directories and subdirectories. This can be
viewed as the tree shown in Fig. 1.1.
Fig. 1.1: Tree Representation of Directory Structure (root Desktop with subdirectories Win, TC and Java; Java contains include and lib)
1.1
Data Structures & Algorithms - II Tree
Fig. 1.2: Parts of a Tree (a root with children B and C, labelling subtree, edge, parent node and siblings)
1.1.1 Definition
• A tree may be defined as, a finite set ‘T’ of one or more nodes such that there is a node
designated as the root of the tree and the other nodes (excluding the root) are divided
into n ≥ 0 disjoint sets T1, T2, … Tn and each of these sets is a tree in turn. The trees
T1, T2 … Tn are called the sub-trees or children of the root.
• The Fig. 1.3 (a) shows the empty tree, there are no nodes. The Fig. 1.3 (b) shows the
tree with only one node. The tree in Fig. 1.3 (c) has 12 nodes.
Fig. 1.3: Examples of Tree ((a) an empty tree, root = NULL; (b) a tree with the single node A; (c) a tree with 12 nodes A to L on levels 0 to 4)
• The root of tree is A, it has 2 subtrees. The roots of these subtrees are called the
children of the root A.
• The nodes with no subtrees are called terminal node or leaves. There are 5 leaves in
the tree of Fig. 1.3 (c).
• Because family relationships can be modeled as trees, we often call the root of a tree (or
subtree) the parent and the nodes of the subtrees the children; the children of a node
are called siblings.
1.1.3 Terminology
• The following terminology is used with trees:
1. Node: In a tree, every individual element is called as node. Node in a tree data
structure, stores the actual data of that particular element and link to next element
in hierarchical structure. Fig. 1.3 (c) has 12 nodes.
2. Root Node: Every tree must have a root node. In tree data structure, the first node
from where the tree originates is called as a root node. In any tree, there must be
only one root node.
3. Parent Node: In tree data structure, the node which is predecessor of any node is
called as parent node. Parent node can be defined as, "the node which has
child/children". In simple words, the node which has branch from it to any other
node is called as parent node.
4. Child Node: In tree data structure, the node which is descendant of any node is
called as child node. In simple words, the node which has a link from its parent
node is called as child node. In tree data structure, all the nodes except root node
are child nodes.
5. Leaf Node: The node which does not have any child is called as a leaf node. Leaf
nodes are also called as external nodes or terminal nodes.
6. Internal Node: In a tree data structure, a node which has at least one child is
called an internal node. Internal nodes are also called non-terminal nodes; every
node other than a leaf node is an internal node.
7. Edge: In tree data structure, the connecting link between any two nodes is called
as edge. In a tree with 'n' number of nodes there will be a maximum of 'n-1'
number of edges.
8. Path: In tree data structure, the sequence of nodes and edges from one node to
another node is called the path between those two nodes. The length of a path is
the total number of nodes in that path.
9. Siblings: Nodes which belong to the same parent are called as siblings. In other
words, nodes with the same parent are sibling nodes. [Oct. 17]
10. Null Tree: A tree with no nodes is called as a null tree (Refer Fig. 1.3 (a)).
11. Degree of a Node: Degree of a node is the total number of children of that node.
The degree of A is 2, degree of K is 1 and degree of L is 0, (Refer Fig. 1.3 (c)).
12. Degree of a Tree: The highest degree of a node among all the nodes in a tree is
called as degree of tree. In Fig. 1.3 (c), degree of the tree is 2. [April 18]
13. Depth or Level of a Node: In tree data structure, the root node is said to be at
Level 0 and the children of root node are at Level 1 and the children of the nodes
which are at Level 1 will be at Level 2 and so on. In simple words, in a tree each
step from top to bottom is called as a Level and the Level count starts with '0' and
incremented by one (1) at each level (step).
14. Descendants: The descendants of a node are those nodes which are reachable
from that node. In Fig. 1.3, nodes J, K, L are descendants of G.
15. Ancestor: The ancestors of a node are all the nodes along the path from the root to
that node. In Fig. 1.3, nodes A and C are ancestors of G.
16. In-degree: The in-degree of a node is the total number of edges coming to that
node.
17. Out-degree: The out-degree of a node is the total number of edges going outside
from the node.
18. Forest: A forest is a set of disjoint trees.
19. Sub Tree: In tree data structure, each child from a node forms a subtree
recursively. Every child node will form a subtree on its parent node.
20. Height: In tree data structure, the total number of edges from leaf node to a
particular node in the longest path is called as height of that node. In a tree, height
of the root node is said to be height of the tree. In a tree, height of all leaf nodes is
'0'.
21. Depth: Total number of edges from root node to a particular node is called
as depth of that node. In tree, depth of the root node is ‘0’. The tree of Fig. 1.3 (c),
has depth 4.
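The height of definition 20 can be computed recursively; a minimal sketch (not from the text), assuming the left/data/right node layout declared later in this chapter, and giving an empty subtree height −1 so that a leaf gets height 0:

```c
#include <stddef.h>

struct tnode {
    struct tnode *left;
    int data;
    struct tnode *right;
};

/* Height of a node = number of edges on the longest downward
   path below it; an empty subtree contributes height -1. */
int height(struct tnode *root)
{
    if (root == NULL)
        return -1;
    int lh = height(root->left);
    int rh = height(root->right);
    return 1 + (lh > rh ? lh : rh);
}
```

For the tree of Fig. 1.3 (c), this returns 4 for the root, matching the depth of the tree given above.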
(Figures: tree drawings for the types of binary trees listed in the syllabus, namely skewed, strictly binary, full and complete binary trees, an expression tree (Fig. 1.9) and binary search trees.)
1.2.8 Heap
• A heap is a special tree-based data structure in which the tree is a complete binary
tree.
• Generally, heaps can be of following two types:
1. Max-Heap: In a max-heap the key present at the root node must be greatest among
the keys present at all of its children. The same property must be recursively true
for all sub-trees in that binary tree.
2. Min-Heap: In a min-heap the key present at the root node must be minimum
among the keys present at all of its children. The same property must be
recursively true for all sub-trees in that binary tree.
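Whether an array stored level by level satisfies the max-heap property can be checked directly with parent/child index arithmetic; a small sketch (the function name is illustrative, not from the text):

```c
#include <stddef.h>

/* Return 1 if arr[0..n-1], viewed as a complete binary tree stored
   level by level, satisfies the max-heap property (every parent key
   is >= the keys of its children), else return 0. */
int is_max_heap(const int arr[], int n)
{
    for (int i = 0; i < n; i++) {
        int l = 2 * i + 1, r = 2 * i + 2;   /* 0-based child indices */
        if (l < n && arr[l] > arr[i])
            return 0;
        if (r < n && arr[r] > arr[i])
            return 0;
    }
    return 1;
}
```

The same loop with the comparisons reversed checks the min-heap property.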
(Figure: over the keys 10, 15, 30, 40, 50, 100, a min-heap has the root 10 and a max-heap has the root 100.)
• Hence, nodes of the tree are stored level by level, starting from the zero level where
the root is present.
• The root node is stored in the first memory location as the first element in the array.
• Static representation of tree needs sequential numbering of the nodes starting with
nodes on level zero, then those on level 1 and so on.
• A complete binary tree of height h has (2^(h+1) − 1) nodes in it. The nodes can be stored in
a one-dimensional array. An array of size (2^(h+1) − 1) or (2^d − 1) (where d = number of levels) is
required.
• The following rules can be used to decide the location of any i-th node of a tree.
For any node with index i, where 1 ≤ i ≤ n:
(a) PARENT(i) = ⌊i/2⌋, if i ≠ 1
If i = 1 then it is the root, which has no parent
(b) LCHILD(i) = 2 * i, if 2i ≤ n
If 2i > n, then i has no left child
(c) RCHILD(i) = 2i + 1, if 2i + 1 ≤ n
If (2i + 1) > n, then i has no right child.
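The three rules above reduce to simple integer arithmetic; a minimal sketch (function names are illustrative), returning 0 to mean "no such node":

```c
/* 1-based index rules for a complete binary tree stored in an
   array of n nodes; 0 signals that the requested node is absent. */
int parent(int i)        { return (i == 1) ? 0 : i / 2; }
int lchild(int i, int n) { return (2 * i <= n) ? 2 * i : 0; }
int rchild(int i, int n) { return (2 * i + 1 <= n) ? 2 * i + 1 : 0; }
```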
• Let us consider the complete binary tree in Fig. 1.12.
Fig. 1.12: A Complete Binary Tree (root A with children B and E; B has children C and D; E has children F and G)
• The representation of the tree in Fig. 1.12 using an array is given in Fig. 1.13.
1 2 3 4 5 6 7 8 9
A B E C D F G – –
Fig. 1.13
• The parent of node i is at location ⌊i/2⌋.
Example: parent of node D (node 5) = ⌊5/2⌋ = 2, i.e. B.
• The left child of node i is present at position 2i, if 2i ≤ n.
Example: left child of node B (node 2) = 2 * 2 = 4, i.e. C.
• The right child of node i is present at position 2i + 1, if 2i + 1 ≤ n.
Example: right child of node E (node 3) = (2 * 3) + 1 = 7, i.e. G.
Examples:
Example 1: Consider the complete binary tree in Fig. 1.14.
Fig. 1.14: A Complete Binary Tree (nodes numbered 0 to 6, level by level: A; B, E; C, D, F, G)
In Fig. 1.14,
Number of levels = 3 (0 to 2) and height = 2
Therefore, we need an array of size 2^3 − 1 or 2^(2+1) − 1 = 7.
The representation of the above binary tree using an array is as follows:
1 2 3 4 5 6 7
A B C D E F G
We will number each node of tree starting from the root. Nodes on same level will be
numbered left to right.
Example 2: Consider the almost complete binary tree in Fig. 1.15.
Fig. 1.15: An Almost Complete Binary Tree (level 0: A; level 1: B, C; level 2: D, E; level 3: F, G, H, I)
Here, depth = 4 (levels), therefore we need an array of size 2^4 − 1 = 15. The
representation of the above binary tree using an array is as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A B C D E − − F G H I − − − −
level 0 1 2 3
We can apply the above rules to find array representation.
1. Parent of node E (node 5) = ⌊5/2⌋ = 2, i.e. B.
2. LEFT(i) = 2i.
For example, left child of E = 2 × 5 = 10, i.e. H, since E is the 5th node of the tree.
3. RIGHT(i) = 2i + 1.
For example, right child of D = (2 × 4) + 1 = 8 + 1 = 9, i.e. G, since D is the 4th node
of the tree and G is the 9th element of the array.
Example 3: Consider the example of a skewed tree.
Fig. 1.16: Skewed Binary Tree (a left-skewed chain A → B → C → D)
The tree has following array representation.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
A B – C – – – D – – – – – – –
We need an array of size 2^4 − 1 = 15.
Thus, only four positions are occupied in the array and 11 are wasted.
From the above example it is clear that the array representation of a binary tree is
easy, but the method has certain drawbacks. In most trees there will be a lot of
unused space. For complete binary trees, the representation is ideal as no space is
wasted.
But for the skewed binary tree (see Fig. 1.16), less than half of the array is used and
the rest is left unused. In the worst case, a skewed binary tree of depth k will require
2^k − 1 array locations while occupying only k of them.
Advantages of Array Representation of Tree:
1. In static representation of tree, we can access any node from other node by
calculating index and this is efficient from the execution point of view.
2. In static representation, the data is stored without any pointers to their successor
or ancestor.
3. Programming languages like BASIC, FORTRAN etc., where dynamic allocation is
not possible, use array representation to store a tree.
Disadvantages of Array Representation of Tree:
1. In static representation, a number of the array entries may remain empty.
2. In static representation, there is no way to enhance the tree structure, since the
array size cannot be changed during execution.
3. Inserting a new node to it or deleting a node from it are inefficient for this
representation, (here, considerable data movement up and down the array
happens, thus more/excessive processing time is used).
Program 1.1: Program for static (array) representation of tree (Converting a list of array
in to binary tree).
#include<stdio.h>
#include<stdlib.h>
typedef struct node
{
struct node*left;
struct node*right;
char data;
}node;
node* insert(char c[],int n)
{
node*tree=NULL;
if(c[n]!='\0')
{
tree=(node*)malloc(sizeof(node));
tree->left=insert(c,2*n+1);
tree->data=c[n];
tree->right=insert(c,2*n+2);
}
return tree;
}
/*traverse the tree in inorder*/
void inorder(node*tree)
{
if(tree!=NULL)
{
inorder(tree->left);
printf("%c\t",tree->data);
inorder(tree->right);
}
}
void main()
{
node*tree=NULL;
char c[]={'A','B','C','D','E','F','\0','G','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0','\0'};
tree=insert(c,0);
inorder(tree);
}
Output:
G D B E A F C
Fig. 1.18: Linked Representation of Binary Tree ((a) a binary tree with root A and its linked representation, where the unused child links of the nodes hold NULL; (b) a skewed tree with nodes A, B, C, D, in which every node has a single child and all remaining links are NULL)
• Each node represents information (data) and two links namely, Lchild and Rchild are
used to store addresses of left child and right child of a node.
Declaration in 'C': We can define the node structure as follows:
struct tnode
{
struct tnode *left;
<datatype> data; /* data type is a data type of field data */
struct tnode *right;
};
• Using this declaration for linked representation, the binary tree representation can be
logically viewed as in Fig. 1.19 (c). In Fig. 1.19 (b) physical representation shows the
memory allocation of nodes.
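A node of this declared type is then allocated dynamically; a minimal sketch (the helper name make_node is illustrative, and int is assumed for the data field):

```c
#include <stdlib.h>

struct tnode {
    struct tnode *left;
    int data;               /* int chosen for this sketch */
    struct tnode *right;
};

/* Allocate a node with the given key and no children. */
struct tnode *make_node(int key)
{
    struct tnode *n = (struct tnode *) malloc(sizeof(struct tnode));
    if (n != NULL) {
        n->left = NULL;
        n->data = key;
        n->right = NULL;
    }
    return n;
}
```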
Fig. 1.19: Linked Representation of Binary Tree ((a) a binary tree with root A; (b) physical view, the nodes stored at arbitrary memory addresses, each with fields left, data and right; (c) logical view)
Advantages of Linked Representation of Binary Tree:
1. More efficient use of memory than static representation.
2. Insertion and deletion operations are more efficient.
3. Enhancement of tree is possible.
Disadvantages of Linked Representation of Binary Tree:
1. If we want to access a particular node, we have to traverse from root to that node;
there is no direct access to any node.
2. Since, two additional pointers are present (left and right), the memory needed per
node is more than that of sequential/static representation.
3. Programming languages not supporting dynamic memory management are not
useful for dynamic representation.
Program 1.2: Program for dynamic (linked list) representation of tree.
#include <stdio.h>
#include <malloc.h>
struct node {
struct node * left;
char data;
struct node * right;
};
struct node *constructTree( int );
void inorder(struct node *);
char array[] = {'A', 'B', 'C', 'D', 'E', 'F', 'G', '\0', '\0', 'H'};
int leftcount[] = {1, 3, 5, -1, 9, -1, -1, -1, -1, -1};
int rightcount[] = {2, 4, 6, -1, -1, -1, -1, -1, -1, -1};
void main() {
    struct node *root;
    root = constructTree( 0 );
    inorder(root);
}
struct node *constructTree( int n ) {
    struct node *temp = NULL;
    if (n != -1) {   /* -1 in leftcount/rightcount means no child */
        temp = (struct node *) malloc(sizeof(struct node));
        temp->left = constructTree(leftcount[n]);
        temp->data = array[n];
        temp->right = constructTree(rightcount[n]);
    }
    return temp;
}
void inorder(struct node *root) {
    if (root != NULL) {
        inorder(root->left);
        printf("%c\t", root->data);
        inorder(root->right);
    }
}
3. Deletion: Deletion removes a node from a non-empty binary tree. In order to
delete a node in a binary tree, it is required to reach the parent node of the node to
be deleted. The link field of the parent node which stores the address of the node
to be deleted is then set to NULL, as shown in Fig. 1.21.
Fig. 1.21: Node with Data G is Deleted and Left of D becomes NULL
4. Traversal: Binary tree traversal means visiting every node exactly once. There are
three methods of traversing a binary tree, called inorder, preorder and postorder
traversals.
(Figure: two binary search trees, one over the keys 5, 27, 32, 42, 48 and one over the month names Apr, Dec, May.)
1.4.1.1 Creating Binary Search Tree [April 16, 18, 19, Oct. 17]
• The function create() simply creates an empty binary search tree. This initializes the
root pointer to NULL.
struct tree * root = NULL;
• The actual tree is built through a call to the insert BST function, for every key to be
inserted.
Algorithm:
1. root = NULL
2. read key and create a new node to store key value
3. if root is null then new node is root
4. t = root, flag = false
5. while (t ≠ null) and (flag = false) do
case 1: if key < t → data
attach new node to t → left
case 2: if key > t → data
attach new node to t → right
case 3: if t → data = key
then print "key already exists"
6. Stop.
Example: Construct Binary Search Tree (BST) for the following elements:
15, 11, 13, 8, 9, 17, 16, 18
Step 1: insert 15; it becomes the root.
Step 2: 11 < 15, so 11 becomes the left child of 15.
Step 3: 13 < 15 and 13 > 11, so 13 becomes the right child of 11.
Step 4: 8 becomes the left child of 11.
Step 5: 9 becomes the right child of 8.
Step 6: 17 > 15, so 17 becomes the right child of 15.
Step 7: key = 16 becomes the left child of 17.
Step 8: key = 18 becomes the right child of 17.
• Here, all values in the left subtrees are less than the root and all values in the right
subtrees are greater than the root.
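That ordering property can itself be checked recursively by passing down the range of keys allowed in each subtree; a sketch (function names are illustrative, assuming distinct int keys):

```c
#include <limits.h>
#include <stddef.h>

struct node {
    int data;
    struct node *right;
    struct node *left;
};

/* Every key in the left subtree must stay below the node's key and
   every key in the right subtree above it; narrow (lo, hi) going down. */
static int in_range(struct node *t, int lo, int hi)
{
    if (t == NULL)
        return 1;
    if (t->data <= lo || t->data >= hi)
        return 0;
    return in_range(t->left, lo, t->data) &&
           in_range(t->right, t->data, hi);
}

int is_bst(struct node *root)
{
    return in_range(root, INT_MIN, INT_MAX);
}
```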
Program 1.3: Program to create a BST.
#include<stdio.h>
#include<malloc.h>
#include<stdlib.h>
struct node
{
int data;
struct node *right;
struct node *left;
};
Output:
Enter data for node 10
Do you want to insert more elements?y
Enter data for node 20
20 inserted right of 10
Do you want to insert more elements?y
Enter data for node 5
5 inserted left of 10
Do you want to insert more elements?y
Enter data for node 15
15 inserted right of 10
15 inserted left of 20
Do you want to insert more elements?y
Enter data for node 2
2 inserted left of 10
2 inserted left of 5
Do you want to insert more elements?
• To search a target key, we first compare it with the key at the root of the tree. If it is
the same, then the search ends and if it is less than key at root, search the target key in
left subtree else search in the right subtree.
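The search just described can be written as a short iterative C function over the struct node used in the programs of this chapter (a sketch; the function name is illustrative):

```c
#include <stddef.h>

struct node {
    int data;
    struct node *right;
    struct node *left;
};

/* Iterative BST search: follow left links for smaller keys and
   right links for larger keys; return the matching node or NULL. */
struct node *search_bst(struct node *root, int key)
{
    struct node *t = root;
    while (t != NULL && t->data != key) {
        if (key < t->data)
            t = t->left;
        else
            t = t->right;
    }
    return t;
}
```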
• Example: Consider the BST of month names in Fig. 1.23 (root Jan, with children Dec
and May; Apr is the left child of Dec).
1.4.1.3 Inserting a Node into BST [April 16, 18, Oct. 17, 18]
• We insert a node into binary search tree in such a way that resulting tree satisfies the
properties of BST.
Algorithm: Insert_BST (Key)
Steps: 1. t = root, flag = false
2. while (t ≠ null) & (flag = false) do
case 1: key < t → data
t1 = t;
t = t → left;
case 2: key > t → data
t1 = t
t = t → right
case 3: t → data = key
flag = true
display "item already exist"
break
end case
end while
3. if (t = null) then
new = getnode (node) //create node
new → data = key
new → left = null //initialize a node
new → right = null
if (t1 → data < key) then //insert at right
t1 → right = new
else t1 → left = new //insert at left
endif
4. Stop
Recursive C function for inserting node into BST:
BSTNODE *Insert_BST(BSTNODE *root, int n)
{
    if (root == NULL)
    {
        root = (BSTNODE *) malloc(sizeof(BSTNODE));
        root->data = n;
        root->left = root->right = NULL;
    }
    else
        if (n < root->data)
            root->left = Insert_BST(root->left, n);
        else
            root->right = Insert_BST(root->right, n);
    return (root);
}
Non-Recursive C function for inserting node into BST:
BSTNODE *Insert_BST(BSTNODE *root, int n)
{
    BSTNODE *temp, *newnode;
    newnode = (BSTNODE *) malloc(sizeof(BSTNODE));
    newnode->data = n;
    newnode->left = newnode->right = NULL;
    if (root == NULL)
        root = newnode;
    else
    {
        temp = root;
        while (temp)
        {
            if (n < temp->data)
            {
                if (temp->left == NULL)
                {
                    temp->left = newnode;
                    break;
                }
                else
                    temp = temp->left;
            }
            else if (n > temp->data)
            {
                if (temp->right == NULL)
                {
                    temp->right = newnode;
                    break;
                }
                else
                    temp = temp->right;
            }
            else
            {
                free(newnode);   /* duplicate key: discard new node */
                break;
            }
        }
    }
    return root;
}
• Example: Consider the insertion of the keys: Heena, Deepa, Tina, Meena, Beena, Anita.
• Initially the tree is empty. The first key 'Heena' is inserted and becomes the root. Since
'Deepa' < 'Heena', it is inserted at the left. The remaining keys are inserted similarly,
in such a way that they satisfy the BST properties.
• Now, suppose we want to insert the key 'Geeta'. It is first compared with the root. Since
'Geeta' < 'Heena', the search proceeds to the left side. Since 'Geeta' > 'Deepa' and the
right of 'Deepa' is NULL, 'Geeta' is inserted as the right child of 'Deepa'.
Fig. 1.25: Binary Search Tree (root 22; node 12 with children 10 and 15 on the left; node 30 with children 25 and 35 on the right; 35 has the left child 32, whose right child is 33)
After deletion of the leaf node with data 33, the right link of node 32 becomes NULL (Fig. 1.26: Deletion of Node 33).
When the node to be deleted has only one child, the child takes its parent's place: after deleting node 35, its child 32 becomes the right child of 30 (Fig. 1.27: Deletion of Node 35).
1.30
Data Structures & Algorithms - II Tree
Case 3: The node to be deleted has both children. When the node to be deleted has
two non-empty subtrees, the problem is more difficult.
One of the solutions is to attach the right subtree in place of the deleted node and then
attach the left subtree to the appropriate node of the right subtree.
From Fig. 1.28, delete the node with data 12.
Fig. 1.28: Deletion of Node 12. Its right subtree (rooted at 15) takes its place, and the left subtree (node 10) is attached to the leftmost node of that right subtree.
Fig. 1.29: Deleting a general node x with subtrees y and z; x is removed and its subtrees are reattached.
Another approach is to delete x from T by first deleting the inorder successor of node
x, say z, and then replacing the data content of node x by the data content of node z.
The inorder successor is the node which comes immediately after node x during the
inorder traversal of T.
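The inorder-successor approach can be sketched as a C function over the struct node used in this chapter (a sketch assuming distinct keys, not the text's own listing):

```c
#include <stddef.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *right;
    struct node *left;
};

/* Delete key from a BST. For a node with two children, copy the
   inorder successor's key (leftmost node of the right subtree)
   into the node, then delete the successor from the right subtree. */
struct node *delete_bst(struct node *root, int key)
{
    if (root == NULL)
        return NULL;
    if (key < root->data)
        root->left = delete_bst(root->left, key);
    else if (key > root->data)
        root->right = delete_bst(root->right, key);
    else {
        if (root->left == NULL || root->right == NULL) {
            /* zero or one child: splice the node out */
            struct node *child = (root->left != NULL) ? root->left
                                                      : root->right;
            free(root);
            return child;
        }
        struct node *succ = root->right;     /* inorder successor */
        while (succ->left != NULL)
            succ = succ->left;
        root->data = succ->data;
        root->right = delete_bst(root->right, succ->data);
    }
    return root;
}
```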
• Example: Consider Fig. 1.30 and delete the node with data 15. The inorder successor
of 15 is 16 (the leftmost node in the right subtree of 15), so the data 16 replaces 15 and
the original node 16 is deleted from the tree.
1.4.3 Tree Traversals (Preorder, Inorder and Postorder) [April 16, 19]
• Traversing a binary tree refers to the process of visiting each and every node of the
tree exactly once.
• Traversal operation is used to visit each node (or visit each node exactly once). There
are two methods of traversal namely, recursive and non-recursive.
1. Recursive is a straight forward approach where the programmer has to convert
the definitions into recursions. The entire load is on the language translator to
carry out the execution.
2. Non-recursive approach makes use of stack. This approach is more efficient as it
requires less execution time.
• In addition, it is good for programming languages which do not support a dynamic
memory management scheme.
• The sequence in which these nodes are processed defines a particular traversal method.
• There are three traversal methods i.e. inorder traversal, preorder traversal and post
order traversal.
• These traversal techniques are applicable on binary tree as well as binary search tree.
Preorder Traversal (DLR):
• In preorder traversal, the root node is visited before traversing its left child and
right child nodes.
• In this traversal, the root node is visited first, then its left child and later its right child.
This preorder traversal is applicable for every root node of all subtrees in the tree.
(i) Process the root node D (Data)
(ii) Traverse the left subtree of D in preorder (Left)
(iii) Traverse the right subtree of D in preorder (Right)
Preorder ⇒ Data − Left − Right (DLR)
Example:
• A preorder traversal of the tree in Fig. 1.31 (a) visits the nodes in the sequence
A B D E C F (root A with children B and C; B has children D and E; C has the child F).
Fig. 1.31 (b): Prefix Expression
• In preorder traversal, we visit a node, traverse left and continue again. When we
cannot continue, move right and begin again or move back, until we can move right
and stop. Preorder function can be written as both recursive and non-recursive way.
Recursive preorder traversal:
Algorithm:
Step 1 : begin
Step 2 : if tree not empty
visit the root
preorder (left child)
preorder(right child)
Step 3 : end
C Function for Recursive Preorder Traversal:
void preorder (struct treenode * root)
{
    if (root)
    {
        printf("%d\t", root->data);   /* data is integer type */
        preorder(root->left);
        preorder(root->right);
    }
}
Example:
• The tree in Fig. 1.32 contains an arithmetic expression. Its preorder traversal gives the
prefix expression + * − A / B C D E. The preorder traversal is also called depth-first
traversal.
Fig. 1.32: Binary Tree for Expression ((A − B/C) * D) + E
Inorder Traversal (LDR):
• In Inorder traversal, the root node is visited between the left child and right child.
• In this traversal, the left child node is visited first, then the root node is visited and
later we go for visiting the right child node.
• This inorder traversal is applicable for every root node of all subtrees in the tree. This
is performed recursively for all nodes in the tree.
(i) Traverse the left subtree of D in inorder (Left)
(ii) Process the root node D (Data)
(iii) Traverse the right subtree of D in inorder (Right)
Inorder ⇒ Left − Data − Right (LDR)
Algorithm for Recursive Inorder Traversal:
Step 1 : begin
Step 2 : if tree is not empty
inorder (left child)
visit the root
inorder (right child)
Step 3 : end
'C' function for Inorder Traversal:
• The inorder function moves down the tree towards the left until it can go no further;
it then visits the node, moves one node to the right and continues again.
void inorder (tnode * root)
{
    if (root)
    {
        inorder(root->left);
        printf("%d", root->data);
        inorder(root->right);
    }
}
• This traversal is also called as symmetric traversal.
Fig. 1.33 (expression tree for log n!)
Traversals: Preorder Traversal: log ! n
Inorder Traversal: log n !
Postorder Traversal: n ! log
(b)
Fig. 1.34 (expression tree for a < b and b < c and c < d)
Traversals: Preorder Traversal: and and < a b < b c < c d
Inorder Traversal: a < b and b < c and c < d
Postorder Traversal: a b < b c < and c d < and
(c)
Fig. 1.35
Traversals: Preorder Traversal: 1 2 3 4 5
Inorder Traversal: 2 4 5 3 1
Postorder Traversal: 5 4 3 2 1
(d)
Fig. 1.36 (root 1 with children 2 and 3; 2 has the left child 4, whose right child is 7; 3 has the left child 5 and the right child 6; 6 has children 8 and 9)
Traversals: Preorder Traversal: 1 2 4 7 3 5 6 8 9
Inorder Traversal: 4 7 2 1 5 3 8 6 9
Postorder Traversal: 7 4 2 5 8 9 6 3 1
Examples:
Example 1: Perform inorder, postorder and preorder traversals of the binary tree
shown in Fig. 1.37.
Inorder traversal (left-root-right): D C E B G F A I H J
Example 2: For the binary tree of Fig. 1.38:
Pre-order: 8, 5, 9, 7, 1, 12, 2, 4, 11, 3
In-order: 9, 5, 1, 7, 2, 12, 8, 4, 3, 11
Post-order: 9, 1, 2, 12, 7, 5, 3, 11, 4, 8
Fig. 1.38
Program 1.4: Menu driven program for binary search tree creation and tree traversing.
#include<stdio.h>
#include<malloc.h>
#include<stdlib.h>
struct node
{
int data;
struct node *right;
struct node *left;
};
struct node *Create(struct node *, int);
void Inorder(struct node *);
void Preorder(struct node *);
void Postorder(struct node *);
int main()
{
struct node *root = NULL;
setbuf(stdout, NULL);
int choice, item, n, i;
printf("\n*** Binary Search Tree ***\n");
printf("\n1. Creation of BST");
printf("\n2. Traverse in Inorder");
printf("\n3. Traverse in Preorder");
printf("\n4. Traverse in Postorder");
printf("\n5. Exit\n");
while(1)
{
printf("\nEnter Your Choice : (1.Create 2.Inorder 3.Preorder 4.Postorder 5.Exit)\n");
scanf("%d",&choice);
switch(choice)
{
case 1:
root = NULL;
printf("Enter number of nodes:\n");
scanf("%d",&n);
for(i = 1; i <= n; i++)
{
printf("\nEnter data for node %d : ", i);
scanf("%d",&item);
root = Create(root,item);
}
break;
case 2:
Inorder(root);
break;
case 3:
Preorder(root);
break;
case 4:
Postorder(root);
break;
case 5:
exit(0);
default:
printf("Wrong Choice !!\n");
}
}
return 0;
}
struct node *Create(struct node *root, int item)
{
if(root == NULL)
{
root = (struct node *)malloc(sizeof(struct node));
root->left = root->right = NULL;
root->data = item;
return root;
}
else
{
if(item < root->data )
root->left = Create(root->left,item); //recursive function call
else if(item > root->data )
root->right = Create(root->right,item);
else
printf(" Duplicate Element is Not Allowed !!!");
return(root);
}
}
void Inorder(struct node *root)
{
if( root != NULL)
{
Inorder(root->left); //recursive function call
printf("%d ",root->data);
Inorder(root->right);
}
}
Fig. 1.39 (figure: a binary tree whose nodes are to be counted)
• To count total nodes of a binary tree, we traverse a tree. We can use any of the three
traversal methods.
'C' Function:
int CountNode(struct node *root)
{
    int count;
    if (root == NULL)
        return 0;
    else
    {
        count = 1 + CountNode(root->left) + CountNode(root->right);
        return count;
    }
}
OR
int CountNode(struct node *root)
{
    if(root == NULL)
        return 0;
    else
        return (1 + CountNode(root->left) + CountNode(root->right));
}
(figure: the binary tree 20 -> (10 -> (5, 8), 30 -> (32)) and its mirror image, in which the left and right children of every node are interchanged)
'C' Function:
struct node * mirror(struct node * root)
{
struct node * temp=NULL;
if (root != NULL)
{
temp = root -> left;
root -> left = mirror(root -> right);
root -> right = mirror(temp);
}
return root;
}
(figure: (a) a max heap with 75 at the root and (b) a min heap with 2 at the root)
Fig. 1.41: Heap Trees
• Heap sort is one of the important applications of the heap tree.
• Heap sort is based on the heap tree and is an efficient sorting method. We can sort either in ascending or descending order using heap sort.
(figure: a complete binary tree with root 32; children 15 and 64; next level 2, 75, 67, 57; and last node 80)
Fig. 1.42
The tree of Fig. 1.42 is not a heap (max) tree, we convert it into heap tree as follows:
(a)–(d) Starting from the bottom, whenever a child key is greater than its parent the two are interchanged: 80 moves up past 2, then past 15, and finally past 32 to the root.
(e) Rebuilding the rest of the tree (75 above 32, 67 above 64) gives the max heap: 80 at the root; 75 and 67 as its children; 15, 32, 64, 57 at the next level; and 2 as the last leaf.
Fig. 1.43: Heap Tree (Max)
Here, whenever a key in a child is greater than the key in its parent node, the keys in the parent and the child are interchanged.
Iteration 1:
(a) The max heap of Fig. 1.43 written as an array: 80 75 67 15 32 64 57 2
Iteration 2:
(b) Swap the root node and the last node: 2 75 67 15 32 64 57 | 80 (keys to the right of | are in their final position)
Iteration 3:
(c) Rebuild the heap tree (75 returns to the root); (d) swap the new root 75 with the last unsorted node 57.
(e) Rebuild: 67 15 64 2 32 | 57 75 80
(f) Swap the root with the last unsorted node and rebuild: 64 15 57 2 32 | 67 75 80
(g) After swapping 32 and 64, (h) rebuild: 57 15 32 2 | 64 67 75 80
(i) After swapping 2 and 57: 2 15 32 | 57 64 67 75 80
(j) Continue until the heap is empty; the sorted list is 2 15 32 57 64 67 75 80.
Fig. 1.44: Tracing of Heap Sort
Hence we get the array sorted in ascending order.
(figure: a new internal node with frequency 5 + 9 = 14, whose children are a 5 and b 9)
Now min heap contains 5 nodes where 4 nodes are roots of trees with single
element each, and one heap node is root of tree with 3 elements
Character Frequency
c 12
d 13
Internal Node 14
e 16
f 45
Step 3 : Extract two minimum frequency nodes from heap. Add a new internal node
with frequency 12 + 13 = 25.
25
c 12 d 13
Now min heap contains 4 nodes where 2 nodes are roots of trees with single
element each, and two heap nodes are root of tree with more than one nodes.
Character Frequency
Internal Node 14
e 16
Internal Node 25
f 45
Step 4 : Extract two minimum frequency nodes. Add a new internal node with
frequency 14 + 16 = 30.
30
e 16
14
a 5 b 9
Step 5 : Extract two minimum frequency nodes. Add a new internal node with
frequency 25 + 30 = 55.
55
30
e 16
25
14 14
c 12 d 13 a 5 b 9
f 45
100
0 1
f 45
55
1
0 30
0 1
e 16
25
14 14
0 1 0 1
c 12 d 13 a 5 b 9
if (root->right) {
arr[top] = 1;
printCodes(root->right, arr, top + 1);
}
/* If this is a leaf node, then */
/* it contains one of the input */
/* characters, print the character */
/* and its code from arr[] */
if (isLeaf(root)) {
printf("%c: ", root->data);
printArr(arr, top);
}
}
/* The main function that builds a */
/* Huffman Tree and print codes by traversing */
/* the built Huffman Tree */
void HuffmanCodes(char data[], int freq[], int size)
{
/* Construct Huffman Tree */
struct MinHeapNode* root = buildHuffmanTree(data, freq, size);
/* Print Huffman codes using */
/* the Huffman tree built above */
int arr[MAX_TREE_HT], top = 0;
printCodes(root, arr, top);
}
/* Driver program to test above functions */
int main()
{
char arr[] = { 'a', 'b', 'c', 'd', 'e', 'f' };
int freq[] = { 5, 9, 12, 13, 16, 45 };
int size = sizeof(arr) / sizeof(arr[0]);
HuffmanCodes(arr, freq, size);
return 0;
}
Output:
f: 0
c: 100
d: 101
a: 1100
b: 1101
e: 111
PRACTICE QUESTIONS
Q. I Multiple Choice Questions:
1. Which is widely used non-linear data structure?
(a) Tree (b) Array
(c) Queue (d) Stack
2. Which in a tree data structure stores the actual data of that particular element and
link to next element in hierarchical structure?
(a) Root (b) Node
(c) Child (d) Leaf
3. Which is the operation in tree will remove a node and all of its descendants from
the tree?
(a) Prune (b) Graft
(c) Insert (d) Delete
4. The depth of the root node is,
(a) 1 (b) 3
(c) 0 (d) 4
5. Which is a set of several trees that are not linked to each other.
(a) Node (b) Forest
(c) Leaf (d) Root
6. In which tree, every node can have a maximum of two children, which are known
as left child and right child.
(a) Binary (b) Binary search
(c) Strictly (d) Extended
7. Which is data structure like a tree-based data structure that satisfies a property
called heap property?
(a) Tree (b) Graph
(c) Heap (d) Stack
8. How many roots contains a tree?
(a) 1 (b) 3
(c) 0 (d) 4
9. The total number of edges from root node to a particular node is called as,
(a) Height (b) Path
(c) Depth (d) Degree
10. In which binary tree every node has either two or zero number of children?
(a) Binary (b) Binary search
(c) Strictly (d) Extended
11. Which tree operation will return a list or some other collection containing every
descendant of a particular node, including the root node itself?
(a) Prune (b) Graft
(c) Insert (d) Enumerate
15. A binary tree has its nodes numbered 1 to 7 level by level (1 at the root, 2 and 3 as its children, and 4, 5, 6, 7 as leaves).
If the postorder traversal gives (ab–cd*+) then the labels of the nodes 1, 2, 3, 4, 5, 6, 7
will be,
(a) +, –, *, a, b, c, d (b) a, –, b, +, c, *, d
(c) a, b, c, d, –, *, + (d) –, a, b, +, *, c, d
16. Which of the following statement about binary tree is correct?
(a) Every binary tree is either complete or full
(b) Every complete binary tree is also a full binary tree
(c) Every full binary tree is also a complete binary tree
(d) A binary tree cannot be both complete and full
17. Which type of traversal of binary search tree outputs the value in sorted order?
(a) Preorder (b) Inorder
(c) Postorder (d) None of these
Answers
1. (a) 2. (b) 3. (a) 4. (c) 5. (b) 6. (a) 7. (c)
8. (a) 9. (c) 10.(c) 11. (d) 12. (c) 13. (a) 14. (b)
15. (a) 16. (c) 17. (b)
Q. II Fill in the Blanks:
1. A tree is a non-linear ______ data structure.
2. There is only ______ root per tree and one path from the root node to any node.
3. In tree data structure, every individual element is called as ______.
4. Nodes which belong to ______ parent are called as siblings.
5. The ______ operation will remove a specified node from the tree.
6. Height of all leaf nodes is ______.
7. A ______ tree is simply a tree with zero nodes.
18. What are different tree traversal methods? Explain with example.
19. What is binary search tree? How to implement it? Explain with example.
20. Traverse following trees in:
(i) Inorder (ii) Preorder (iii) Postorder
1
B P
Q S
T
I J
Ans. Refer to Section 1.4.3.
April 2017
1. Write a ‘C’ function to compare two BST. [5 M]
Ans. Refer to Section 1.4.1.2.
2. Define skewed binary tree. [1 M]
Ans. Refer to Section 1.2.2.
October 2017
1. What is siblings? [1 M]
Ans. Refer to Section 1.1.3, Point (9).
B F
C D G
H I
Ans. Refer to Section 1.4.3.
CHAPTER
2
Efficient Search Trees
Objectives …
To study AVL Trees with its Operations
To learn Red Black Trees
To understand B and B+ Tree with its Operations
2.0 INTRODUCTION
• The efficiency of many operations on trees is related to the height of the tree - for
example searching, inserting, and deleting.
• The efficiency of various operations on a Binary Search Tree (BST) decreases with
increase in the differences between the heights of right sub tree and left sub tree of the
root node.
• Hence, the differences between the heights of left sub tree and right sub tree should be
kept to the minimum.
• Various operations performed on binary tree can lead to an unbalanced tree, in which
either the height of left sub tree is much more than the height of right sub tree or vice
versa.
• Such type of tree must be balanced using some techniques to achieve better efficiency
level.
• The need to have a balanced tree led to the emergence of another type of binary search tree known as the height-balanced tree (also known as the AVL tree), named after its inventors G. M. Adelson-Velsky and E. M. Landis.
• The AVL tree is a special kind of binary search tree, which satisfies the following two
conditions:
1. The heights of the left and right sub trees of a node differ by at most one.
2. The left and right sub trees of a node (if exist) are also AVL trees.
2.1 TERMINOLOGY
• In 1962, Adelson-Velsky and Landis introduced a binary tree structure that is balanced
with respect to the height of subtree. That is why it is called a height balanced tree.
• The basic objective of the height balanced tree is to perform the searching, insertion
and deletion operations efficiently.
• A balanced binary tree is a binary tree in which the heights of two sub trees (left and
right) of every node never differ by more than 1.
2.1
Data Structures & Algorithms - II Efficient Search Trees
• A binary search tree is said to be a height balanced binary tree if all its nodes have a balance factor of +1, 0 or −1, i.e.,
|hL − hR | ≤ 1
Where, hL and hR are heights of left and right subtrees respectively.
1. Balance Factor: [Oct. 17, April 19]
• The term Balancing Factor (BF) is used to determine whether the given binary
search tree is balanced or not.
• The BF of a node is calculated as the height of its left sub tree minus the height of its right sub tree, i.e.,
Balance factor = hL − hR
• The balance factor of a node in a binary tree can have the value +1, −1 or 0 depending on whether the height of its left sub tree is greater than, less than or equal to the height of its right subtree.
• The balance factor of each node is indicated in the Fig. 2.1.
o If balance factor of any node is +1, it means that the left sub-tree is one level higher
than the right sub-tree.
o If balance factor of any node is 0, it means that the left sub-tree and right sub-tree
contain equal height.
o If balance factor of any node is −1, it means that the left sub-tree is one level lower
than the right sub-tree.
Fig. 2.1 (figure: a height balanced tree on the keys B, E, J, L, N, P, T, V with the balance factor of each node indicated)
(figure: a balanced BST with root 25; children 20 and 36; next level 10, 22, 30, 40; leaves 12, 28, 38, 48; each node labelled with its balance factor)
Fig. 2.2
3. Red Black Tree:
• A red black tree is a variant of Binary Search Tree (BST) in which an additional
attribute, ‘color’ is used for balancing the tree. The value of this attribute can be
either red or black.
• The red black trees are self-balancing binary search tree. In this type of tree, the leaf
nodes are the NULL/NIL child nodes storing no data.
• In addition to the conditions satisfied by binary search tree, the following
conditions/rules must also be satisfied for being a red black tree:
(i) Each and every node is either red or black.
(ii) The root node and leaf nodes are always black in color.
(iii) If any node is red, then both its child nodes are black.
(iv) Each and every path from a given node to the leaf node contains same
number of black nodes. The number of black nodes on such a path is known
as black-height of a node.
• Fig. 2.3 shows a red black tree.
4
2 6
1 3 5 8
7 9
NULL node
Fig. 2.3
• The AVL trees are more balanced compared to red black trees, but they may cause
more rotations during insertion and deletion.
• So if the application involves many frequent insertions and deletions, then Red Black
trees should be preferred.
• And if the insertions and deletions are less frequent and search is a more frequent
operation, then AVL tree should be preferred over red black tree.
4. Splay Tree:
• The splay tree was invented by D. D. Sleator and R. E. Tarjan in 1985, who gave the tree its name (splay means to spread wide apart).
• Splay tree is another variant of a Binary Search Tree (BST). In a splay tree, recently
accessed element is placed at the root of the tree.
• Splay Tree is a self-adjusted/self-balancing Binary Search Tree in which every
operation on element rearranges the tree so that the element is placed at the root
position of the tree.
• All normal operations on a binary search tree are combined with one basic operation,
called splaying.
• Splaying the tree for a certain element rearranges the tree so that the element is
placed at the root of the tree.
• Fig. 2.4 shows an example of splay tree.
j
f l
b i k m
a e g
d h
Fig. 2.4
Rotations in Splay Tree:
• In zig rotation, every node moves one position to the right from its current position.
• In zag rotation, every node moves one position to the left from its current position.
• In zig-zig rotation, every node moves two positions to the right from its current
position.
• In zag-zag rotation, every node moves two positions to the left from its current
position.
• In zig-zag rotation, every node moves one position to the right followed by one
position to the left from its current position.
• In zag-zig rotation, every node moves one position to the left followed by one position
to the right from its current position.
• The trie data structure makes retrieval of a string from a collection of strings easier.
• Trie is also called as Prefix Tree and sometimes Digital Tree. Trie is a tree like data
structure used to store collection of strings.
• Fig. 2.6 shows an example of trie. Consider a list of strings Cat, Bat, Ball, Rat, Cap, Be.
Root *
B C R
a e a a
l t $ t t
l $ $ $
$
Indicates end of a string
Fig. 2.6
2.2.1 Concept
• AVL, named after inventors Adelson-Velsky and Landis, is a binary tree that self-
balances by keeping a check on the balance factor of every node.
• The balance factor of a node is the difference in the heights of the left and right
subtrees. The balance factor of every node in the AVL tree should be either +1, 0 or -1.
• AVL trees are special kind of binary search trees. In AVL trees, difference of heights of
left subtree and right subtree of any node is less than or equal to one.
• AVL trees are also called self-balancing binary search trees. The node structure of an AVL tree is given below:
struct AVLNode
{
int data;
struct AVLNode *left, *right;
int balfactor;
};
• AVL tree can be defined as, let T be a non-empty binary tree with TL and TR as its left
and right subtrees. The tree is height balanced if:
o TL and TR are height balanced.
o |hL − hR| ≤ 1, where hL and hR are the heights of TL and TR.
(figure: an AVL tree with root J; children F and P; further nodes D, G, L, V, C, N, S, X, Q and U; each node labelled with its balance factor)
Fig. 2.7: AVL Tree with Balance Factors
• Balance factor of a node is the difference between the heights of the left and right
subtrees of that node. Consider the following binary search tree in Fig. 2.8.
90
63 200
30
175 300
21 35
180 220 320
10
184
182 186
Fig. 2.8: Binary Search Tree (BST)
Height of tree with root 90 (90) = 1 + max (height (63), height (200))
Height of (63) = 1 + height (30)
Height (30) = 1 + max (height (21), height (35))
Height (21) = 1 + height (10)
Height (10) = 1
Therefore,
Height (21) = 1 + 1 = 2
Height (30) = 1 + max (2, 1) = 1 + 2 = 3
Height (63) = 1 + 3 = 4
Height (200) = 1 + max (height (175), height (300))
Height (175) = 1 + height (180)
Height (180) = 1 + height (184)
Height (184) = 1 + max (height (182), height (186))
Height (182) = 1
Height (186) = 1
Height (184) = 1 + 1 = 2
Height (180) = 1 + 2 = 3
Height (175) = 1 + 3 = 4
Height (200) = 1 + max (height (175), height (300))
= 1 + max (4, 2)
= 1 + 4 = 5
Height (90) = 1 + max (height (63), height (200))
= 1 + max (4, 5)
= 1 + 5 = 6
• Thus, this tree has height 6. But from this we do not get any information about the balance of heights. The tree is said to be balanced if the difference between the heights of the left subtree and the right subtree of every node is not more than 1.
• Consider the above example in which all the leaf nodes have a balance factor of 0.
BF (21) = hL (10) − 0 = 1 − 0 = 1
BF (30) = hL − hR = 2 − 1 = 1
BF (63) = 3 − 0 = 3
BF (184) = 1 − 1 = 0
BF (180) = 0 − 2 = − 2
BF (175) = 0 − 3 = − 3
BF (300) = 1 − 1 = 0
BF (200) = 4 − 2 = 2
BF (90) = 4 − 5 = − 1
• Hence, the above tree is not height balanced. In order to balance a tree, we have to
perform rotations on the tree.
2.2.2 Rotations
• Rotation is the process of moving nodes either to left or to right to make the tree
balanced. To balance itself, an AVL tree may perform the following four kinds of
rotations:
1. Left Rotation (LL Rotation)
2. Right Rotation (RR Rotation)
3. Left-Right Rotation (LR Rotation)
4. Right-Left Rotation (RL Rotation)
• The first two rotations are single rotations and the next two rotations are double
rotations.
Single Left Rotation (LL Rotation):
• In LL Rotation, every node moves one position to left from the current position.
• To understand LL Rotation, let us consider the following insertion operation in AVL
tree.
insert 1, 2 and 3
(figure: after inserting 1, 2 and 3 the tree is right skewed and BF(1) = −2; after LL rotation, 2 becomes the root with children 1 and 3)
Fig. 2.9
• In RR Rotation, every node moves one position to right from the current position.
insert 3, 2 and 1
(figure: after inserting 3, 2 and 1 the tree is left skewed and BF(3) = +2; after RR rotation, 2 becomes the root with children 1 and 3)
Fig. 2.10
• The LR Rotation is a sequence of single left rotation followed by a single right rotation.
• In LR Rotation, at first, every node moves one position to the left and one position to
right from the current position.
insert 3, 1 and 2
(figure: after inserting 3, 1 and 2 the imbalance lies in the left-right direction; an LL rotation at 1 followed by an RR rotation at 3 makes 2 the root with children 1 and 3)
insert 2
2 becomes the right child of 1; the tree is balanced.
insert 3
BF(1) = −2; after LL rotation, 2 becomes the root with children 1 and 3. The tree is balanced.
insert 4
4 becomes the right child of 3; the tree remains balanced.
insert 5
BF(3) = −2; after LL rotation at 3, 4 takes its place with children 3 and 5.
insert 6
BF(2) = −2; after LL rotation at 2, 4 becomes the root, with 2 (children 1 and 3) as its left child and 5 (right child 6) as its right child.
insert 7
BF(5) = −2; after LL rotation at 5, 6 takes its place with children 5 and 7, giving root 4 with children 2 and 6 and leaves 1, 3, 5, 7.
insert 8
8 becomes the right child of 7.
Tree is balanced
Example: Delete the nodes 8, 12 and 14 from the AVL tree shown in the Fig. 2.13.
(0)
15
(0) (0)
12 54
(0) (-1) (1) (0)
8 13 18 60
Fig. 2.13
Step 1:
• The node to be deleted from the tree is 8.
• If we observe it is the parent node of the node 5 and 9.
• Since the node 8 has two children it can be replaced by either of its child nodes.
(0)
15
(0) (0)
12 54
(0) (-1) (1) (0)
8 13 18 60
Fig. 2.14
Step 2:
• The node 8 is deleted from the tree.
• As the node is deleted we replace it with either of its children nodes.
• Here we replaced the node with the inorder successor, i.e. 9.
• Again we check the balance factor for each node.
(0)
15
(0) (0)
12 54
(1) (-1) (1) (0)
9 13 18 60
Fig. 2.15
Step 3:
• Now the next element to be deleted is 12.
• If we observe, we can see that the node 12 has a left subtree and a right subtree.
• We again can replace the node by either it’s inorder successor or inorder
predecessor.
• In this case we have replaced it by the inorder successor.
(0)
15
(0) (0)
12 54
(1) (-1) (1) (0)
9 13 18 60
Fig. 2.16
Step 4:
• The node 12 is deleted from the tree.
• Since we have replaced the node with the inorder successor, the tree structure
looks like shown in the Fig. 2.17.
• After removal and replacing check for the balance factor of each node of the tree.
(0)
15
(1) (0)
13 54
(1) (0) (1) (0)
9 14 18 60
Fig. 2.17
Step 5:
• The next node to be eliminated is 14.
• It can be seen clearly in the image that 14 is a leaf node.
• Thus it can be eliminated easily from the tree.
(0)
15
(1) (0)
13 54
(1) (0) (1) (0)
9 14 18 60
Fig. 2.18
Step 6:
• As the node 14 is deleted, we check the balance factor of all the nodes.
• We can see the balance factor of the node 13 is 2.
• This violates the terms of the AVL tree thus we need to balance it using the rotation
mechanism.
(0)
15
(2) (0)
13 54
(1) (1) (0)
9 18 60
Fig. 2.19
Step 7:
• In order to balance the tree, we identify the rotation mechanism to be applied.
• Here, we need to use LL Rotation.
• The nodes involved in the rotation is shown in Fig. 2.20.
(0)
15
(2) (0)
13 54
(1) (1) (0)
9 18 60
Fig. 2.20
Step 8:
• The nodes are rotated and the tree satisfies the conditions of an AVL tree.
• The final structure of the tree is shown as follows.
• We can see all the nodes have their balance factor as ‘0’, ‘1’ and ‘-1’.
(-1)
15
(0) (0)
9 54
(0) (0) (1) (0)
5 13 18 60
Fig. 2.21
Example: Construct an AVL tree by inserting the names Manisha, Kamal, Archana, Reena, Nidhi, Shalaka, Leena and Meena (compared alphabetically):
(i)–(ii) Manisha becomes the root and Kamal its left child; the tree is balanced.
(iii) Inserting Archana makes BF(Manisha) = 2; after LL rotation, Kamal becomes the root with Archana and Manisha as its children.
(iv) Inserting Reena as the right child of Manisha needs no balancing.
(v) Inserting Nidhi makes BF(Manisha) = −2; after RL rotation, Nidhi becomes the right child of Kamal, with Manisha and Reena as its children.
(vi) Inserting Shalaka makes BF(Kamal) = −2; after RR rotation, Nidhi becomes the root, with Kamal (children Archana and Manisha) as its left subtree and Reena (right child Shalaka) as its right subtree.
(vii)–(viii) Finally Leena and Meena are inserted and, after rebalancing, appear as sibling leaves; the tree remains height balanced.
Example: Construct an AVL tree by inserting the days Mon, Sun, Thur, Fri, Wed, Sat and Tue (compared alphabetically):
(i)–(iii) Mon, Sun and Thur make BF(Mon) = −2; after RR rotation, Sun becomes the root with Mon and Thur as its children.
(iv) Fri becomes the left child of Mon; no balancing is required.
(v)–(vii) Inserting the remaining days (Wed, Sat and Tue) causes further imbalances that are corrected with RL rotations; the final tree is again height balanced with Sun at the root.
Example: Construct an AVL tree by inserting the keys 40, 20, 50, 80, 200, 100 and 150:
(i)–(v) 40 becomes the root with children 20 and 50, and 80 is added in the right subtree; no balancing is required.
(vi) Inserting 200 and then 100 drives the balance factor of a node in the right subtree to −2; an RL rotation restores balance, giving 80 at the root with 40 (children 20 and 50) as its left subtree and 100 (with 200, and later 150, on its right) as its right subtree.
Example: Construct an AVL tree by inserting the color names Blue, Red, Green, Orange, Pink and Black (compared alphabetically):
The early insertions need no balancing. After the final insertion the balanced tree has Green at the root, Blue (with left child Black) as its left subtree and Pink (with children Orange and Red) as its right subtree; no balancing is required after the last insertion.
(figure: a red black tree on the keys 1, 8, 11, 15, 17, 25)
insert (8)
Tree is Empty. So insert new node as Root node with black color.
8
insert (18)
Tree is not Empty. So insert new node with red color.
18
insert (5)
Tree is not Empty. So insert new node with red color.
5 18
insert ( 15 )
Tree is not Empty. So insert new node with red color.
8
Here there are two consecutive Red nodes (18 and 15).
5 18 The new node's parent sibling color is Red
and parent's parent is root node.
15 So we use RECOLOR to make it Red Black Tree.
After RECOLOR, the tree satisfies all Red Black Tree properties (8 black at the root; 5 and 18 black; 15 red).
insert (17)
Tree is not Empty. So insert new node with red color.
Here there are two consecutive Red nodes (15 and 17).
The new node's parent sibling is NULL. So we need rotation.
Here, we need LR Rotation and Recolor.
After LR Rotation and Recolor, 17 takes the place of 18, with 15 and 18 as its children.
insert ( 25 )
Tree is not Empty. So insert new node with red color.
8
Here there are two consecutive
5 17 Red nodes (18 and 25),
The new node's parent sibling color is Red
18 and parent's parent is not root node.
15
So we use RECOLOR and Recheck.
25
After Recolor
8
After Recolor operation, the tree is satisfying
5 17 all Red Black Tree properties.
15 18
25
insert (40)
Tree is not Empty. So insert new node with red color.
8
Here there are two consecutive
5 17 Red nodes (25 and 40).
The new node's parent sibling is NULL
15 18 So we need a Rotation and Recolor.
Here, we use LL Rotation and Recheck
25
After LL Rotation 40
and Recolor
5 17
After LL Rotation and Recolor operation,
the tree is satisfying all Red Black Tree properties.
15 25
18 40
Insert (80)
Tree is not Empty. So insert new node with red color.
8
Here there are two consecutive
5 17 Red nodes (40 and 80). The new node's
parent sibling color is Red
15 25 and parent's parent is not root node.
So we use RECOLOR and Recheck.
18 40
80
After Recolor
8
After Recolor again there are two consecutive
5 17 Red nodes (17 and 25). The new node's
parent sibling color is Black. So we need
15 25 Rotation. We use Left Rotation and Recolor.
18 40
80
17
8 25
5 15 18 40
80
Finally above tree is satisfying all the properties of Red Black Tree and
it is a perfect Red Black tree.
• The deletion operation in a red black tree is similar to the deletion operation in BST, but after every deletion operation we need to check the red-black tree properties.
• If any of the properties are violated, then suitable operations like recolor and rotation are performed to restore them.
• In this example, we show the red black trees that result from the successive deletion of the keys 8, 12, 19, 31, 38 and 41.
Delete 8
B B
38 38
R B R B
19 41 19 41
R B
R B
12 31 12 31
8
Delete 12
B B
38 38
R B B B
Case-2
19 41 19 41
B
B R
12 31 31
Delete 19
B B
38 38
B B R B
19 41 31 41
R
31
Delete 31
B B
38 38
B R
Case-2
31 41 41
Delete 38 B
B
38
41
R
41
Delete 41
No/Empty Tree.
• A multiway tree can have more than one value/child per node. They are written as m-
way trees where the m means the order of the tree. A multiway tree can have m-1
values per node and m children.
• An m-way tree is a search tree in which each node can have from 0 to m subtrees,
where m is defined as the B-tree order.
• Fig. 2.23 shows an example of m-way tree of order 4.
(Fig. 2.23: an m-way tree of order 4 — each node stores up to 3 keys in ascending order and points to up to 4 subtrees)
2.4.1 B-Tree
• In search trees like binary search tree, AVL tree, red-black tree, etc., every node
contains only one value (key) and a maximum of two children.
• But there is a special type of search tree called B-tree in which a node contains more
than one value (key) and more than two children.
• B-Tree was developed in the year 1972 by Bayer and McCreight with the name Height
Balanced m-way Search Tree. Later it was named as B-Tree.
• A B-tree is a specialized m-way tree that is widely used for disk access. A B tree of
order m can have a maximum of m–1 keys and m pointers to its sub-trees.
• B-tree is a type of tree in which each node can store multiple values and can point to
multiple subtrees.
• The B-trees are useful in the case of very large data that cannot be accommodated in the main memory of the computer and is stored on disks in the form of files.
• The B-tree is a self-balancing search tree like AVL and red-black trees. The main
objective of B-trees is to minimize/reduce the number of disk accesses for accessing a
record.
• Every B-tree has an order. B-tree of order m has the following properties:
1. All leaf nodes must be at same level.
2. All nodes except the root must have at least ⌈m/2⌉ − 1 keys and a maximum of m − 1 keys.
3. All non-leaf nodes except the root (i.e. all internal nodes) must have at least ⌈m/2⌉ children.
4. If the root node is a non-leaf node, then it must have at least two children.
5. A non-leaf node with n-1 keys must have n number of children.
6. All the key values in a node must be in Ascending Order.
• Fig. 2.26 shows an example of B-tree of order 5.
66
35 45 80 100
20 30 38 40 48 52 56 68 70 73 77 85 90 110 115
• Now, while inserting the element in the searched leaf node following one of the two
cases may arise:
1. There may be a space in the leaf node to accommodate more elements. In that
case, the element is simply added to the node in such a way that the order of
elements is maintained.
2. There may not be a space in the leaf node to accommodate more elements. In that
case, after inserting new element in the full leaf node, a single middle element is
selected from these elements and is shifted to the parent node and the leaf is split
into two nodes namely, left node and right node (at the same level).
• All the elements less than the middle element are placed in the left node and all the
elements greater than the middle element are placed in the right node.
• If there is no space for middle element in the parent node, the splitting of the parent
node takes place using the same procedure.
• The process of splitting may be repeated all the way to the root. In case the splitting of
root node takes place, a new root node is created that comprises the middle element
from the old root node.
• The rest of the elements of the old root node are distributed in two nodes created as a
result of splitting. The splitting of root node increases the height of B-tree by one.
• For example, consider the following step-by-step procedure for inserting elements in
the B-tree of order 4, i.e. any node can store at most 3 elements and can point to at
most 4 subtrees.
• The elements to be inserted in the B-tree are 66, 90, 40, 75, 30, 35, 80, 70, 20, 50, 45, 55,
110, 100, and 120.
• The element 66 forms the part of new root node, as B tree is empty initially
[Fig. 2.27 (a)].
• Since, each node of B tree can store up to 3 elements, the elements 90 and 40 also
become part of the root node [Fig. 2.27 (b)].
• Now, since the root node is full, it is split into two nodes. The left node stores 40, the
right node stores 90, and middle element 66 becomes the new root node. Since, 75 is
less than 90 and greater than 66, it is placed before 90 in the right node [Fig. 2.27 (c)].
• The elements 30 and 35 are inserted in left sub tree and the element 80 is inserted in
the right sub tree such that the order of elements is maintained [Fig. 2.27 (d)].
• The appropriate position for the element 70 is in the right sub tree, and since there is
no space for more elements, the splitting of this node takes place.
• As a result, the middle element 80 is moved to the parent node, the element 75 forms
the part of the left sub tree (of element 80) and the element 90 forms the part of the
right sub tree (of element 80). The new element 70 is placed before the element 75
[Fig. 2.27 (e)].
• The appropriate position for the element 20 is in the left most sub tree, and since
there is no space for more elements, the splitting of this node takes place as discussed
in the previous step. The new element 20 is placed before the element 30
[Fig. 2.27 (f)].
Data Structures & Algorithms - II Efficient Search Trees
• This tree can be used for future insertions, but a situation may arise when any of the
sub trees splits and it will be required to adjust the middle element from that sub tree
to the root node where there is no space for more elements.
• Hence, keeping in mind the future requirements, as soon as root node becomes full,
splitting of root node must take place [Fig. 2.27 (g)]. This splitting of root node
increases the height of tree by one.
• Similarly, other elements 50, 45, 55, 110, 100, and 120 can be inserted in this B-tree.
The resultant B-tree is shown in Fig. 2.27 (h).
[Figure: the B-tree after each insertion step, parts (a) to (h); the final tree (h) has root 66, internal nodes (35, 45) and (80, 100), and leaves (20, 30), (40), (50, 55), (70, 75), (90), (110, 120)]
Fig. 2.27: Insertion Operation in B-Tree
Deletion Operation on B-Tree:
• Deletion of an element from a B-tree involves following two steps:
1. Searching the desired element and
2. Deleting the element.
• Whenever, an element is deleted from a B-tree, it must be ensured that no property of
B-tree is violated after the deletion.
• The element to be deleted may belong to either leaf node or internal node.
• Consider a B-tree in Fig. 2.28.
[Figure: a B-tree of order 5 with root 13, internal nodes (4, 7) and (17, 20), and leaves (1, 3), (5, 6), (8, 11, 12), (14, 16), (18, 19), (23, 24, 25, 26)]
Fig. 2.28
• If we want to delete 8 then it is very simple (See Fig. 2.29).
[Figure: the B-tree after deleting 8]
Fig. 2.29
• Now we will delete 20. Since 20 is not in a leaf node, we find its successor, which is
23. Hence, 23 is moved up to replace 20.
[Figure: the B-tree after 23 replaces 20]
Fig. 2.30
• Next we will delete 18. Deleting 18 from the corresponding node leaves the node
with only one key, which is not allowed in a B-tree of order 5.
• The sibling node to the immediate right has an extra key. In such a case we can borrow a
key from the parent and move the sibling's spare key up (See Fig. 2.31).
[Figure: the B-tree after deleting 18 and borrowing a key]
Fig. 2.31
• Now delete 5. Deletion of 5 is not as easy. First, 5 is in a leaf node. Second, this
leaf node has no extra keys, and neither do its immediate left and right siblings.
• In such a situation we can combine this node with one of the siblings. That means
remove 5 and combine 6 with the node containing 1, 3.
• To keep the tree balanced we have to move the parent's key down. Hence, we move 4
down, as 4 lies between 1, 3 and 6. The tree will look as shown in Fig. 2.32.
[Figure: the B-tree after deleting 5 and moving 4 down]
Fig. 2.32
• But again the internal node of 7 contains only one key, which is not allowed in a
B-tree. We then try to borrow a key from a sibling.
• But the sibling 17, 24 has no spare key. Hence, what we can do is combine 7 with 13
and 17, 24. The B-tree will then look as shown in Fig. 2.33.
[Figure: the final B-tree with root keys 7, 13, 17, 24]
Fig. 2.33
2.4.2 B+ Tree
• B+ (B Plus) Tree is a balanced, multi-way search tree. The B+ Trees are an extended
version of B-Trees.
• A B+ tree is an m-ary tree with a variable but often large number of children per node.
A B+ tree consists of a root, internal nodes and leaves. The root may be either a leaf or
a node with two or more children.
• A B+ tree can be viewed as a B-tree in which each node contains only keys (not key–
value pairs), and to which an additional level is added at the bottom with linked
leaves.
• The B+ tree is a variation of B-tree in a sense that unlike B-tree, it includes all the key
values and record pointers only in the leaf nodes, and key values are duplicated in
internal nodes for defining the paths that can be used for searching purposes.
• In addition, each leaf node contains a pointer, which points to the next leaf node, that
is, leaf nodes are linked sequentially (See Fig. 2.34).
• Hence, B+ tree supports fast sequential access of records in addition to the random
access feature of B-tree.
• Note that in the case of B+ trees, if the key corresponding to the desired record is found
in any internal node, the traversal continues until its respective leaf node is reached to
access the appropriate record pointer.
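The linked-leaf idea can be sketched in C. The node layout below is a hypothetical one (the names `bpleaf` and `count_keys` are ours, not from the text): only the leaves hold record pointers, and the `next` links support the fast sequential access described above.

```c
#include <assert.h>
#include <stddef.h>

#define M 4                      /* assumed fan-out of the B+ tree */

struct bpleaf {
    int nkeys;
    int keys[M - 1];
    void *rec[M - 1];            /* record pointers live only in leaves */
    struct bpleaf *next;         /* link to the next leaf (sequential access) */
};

/* A range scan never climbs back up the tree: it just walks the leaf chain. */
int count_keys(struct bpleaf *leaf)
{
    int total = 0;
    while (leaf != NULL) {
        total += leaf->nkeys;
        leaf = leaf->next;
    }
    return total;
}
```

An internal node, by contrast, would store only keys and child pointers; duplicating a key there merely guides the search down to the correct leaf.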
• Fig. 2.34 shows a sample B+ tree.
[Figure: a sample B+ tree; the key value 40 is duplicated in an internal node, and only the leaf nodes hold record pointers with the key values 20, 30, 38, 40, 48, 52, 56, 68, 70, 73, 77, 85, 90, 110, 115]
Fig. 2.34
Insertion in B+ Tree:
[Figure: the initial B+ tree]
Fig. 2.35
• The value 195 will be inserted in the right sub-tree of 120 after 190. Insert it at the
desired position (See Fig. 2.36).
[Figure: the B+ tree after inserting 195 into the leaf]
Fig. 2.36
• The node now contains more than the maximum number of elements, i.e. 4; therefore,
split it and move the median element up to the parent.
[Figure: the leaf split, with the median moved up to the parent]
Fig. 2.37
• Now, the index node contains 6 children and 5 keys which violates the B+ tree
properties, therefore we need to split it, shown in Fig. 2.38.
[Figure: the index node split, with 108 as the new root]
Fig. 2.38
Deletion in B+ Tree:
• Delete the key and data from the leaves.
• Delete the key 200 from the B+ tree shown in the Fig. 2.39.
[Figure: the B+ tree before deleting 200]
Fig. 2.39
• The value 200 is present in the right sub-tree of 190, after 195, delete it.
[Figure: the B+ tree after deleting 200]
Fig. 2.40
• Merge the two nodes by using 195, 190, 154 and 129.
[Figure: the B+ tree after merging the nodes]
Fig. 2.41
• Now, element 120 is the single element present in the node which is violating the
B+ Tree properties. Therefore, we need to merge it by using 60, 78, 108 and 120.
• Now, the height of B+ tree will be decreased by 1.
[Figure: the final B+ tree with keys 60, 78, 108, 120]
Fig. 2.42
PRACTICE QUESTIONS
Q. I Multiple Choice Questions:
1. Which factor of many tree operations is related to the height of the tree?
(a) Efficiency (b) Degree
(c) Sibling (d) Path
14. A B-tree of order 4 and of height 3 will have a maximum of _______ keys.
(a) 255 (b) 63
(c) 127 (d) 188
15. Consider the below formations of red-black tree:
[Figure: three red-black tree formations of the keys 100, 50, 18, each with NULL children]
All the above formations are incorrect for a red-black tree. Then what may
be the correct order?
(a) 50-black root, 18-red left subtree, 100-red right subtree
(b) 50-red root, 18-red left subtree, 100-red right subtree
(c) 50-black root, 18-black left subtree, 100-red right subtree
(d) 50-black root, 18-red left subtree, 100-black right subtree
16. What is the special property of red-black trees and what root should always be?
(a) a color which is either red or black and root should always be black color
(b) height of the tree
(c) pointer to next node
(d) a color which is either green or black
Answers
1. (a) 2. (d) 3. (c) 4. (c) 5. (b) 6. (a) 7. (a)
8. (d) 9. (c) 10.(a) 11. (b) 12. (d) 13. (b) 14. (a)
15. (a) 16. (a)
CHAPTER
3
Graph
Objectives …
To study Basic Concepts of Graph Data Structure
To learn Graph Terminology
To understand Representation and Operations of Graph
To study Graph Traversals
To learn Greedy Strategy, Dynamic Programming Strategy of Graphs
3.0 INTRODUCTION
• Graph is one of the most important non-linear data structure. A graph is a pictorial
representation of a set of objects where some pairs of objects (vertices) are connected
by links (edges).
• A graph is a pair of sets (V, E), where V is the set of vertices and E is the set of edges,
connecting the pairs of vertices.
3.1.1 Concept
• A graph G is a set of two tuples G = (V, E) where, V is a finite non-empty set of vertices,
and E is the set of pairs of vertices called edges.
• Fig. 3.1 shows an example of graph.
[Figure: a graph with vertices A, B, C, D]
Fig. 3.1
Data Structures & Algorithms - II Graph
• There are two types of Graph, Undirected graph and Directed graph.
1. Undirected Graph:
• In an undirected graph, the pair of vertices representing any edge is unordered i.e. the
pairs (v1, v2) and (v2, v1) represent the same edge.
• In other words, the edges have no direction in undirected graph.
Example: Consider the Fig. 3.2.
[Figure: undirected graphs (a), (b) and (c)]
Fig. 3.2: Undirected Graphs
2. Directed Graph:
• In a directed graph, each edge is represented by an ordered pair of vertices, i.e. the
pairs (v1, v2) and (v2, v1) represent two different edges.
[Figure: directed graphs (a), (b) and (c)]
Fig. 3.3: Directed Graphs
• In Fig. 3.3 (a) G = (V, E) where V = {A, B, C, D, E} and E = {(A, B), (A, C), (C, D),
(A, E), (E, D)}.
• In Fig. 3.3 (b) G = (V, E) where V = {A, B, C, D} and E = {(A, B), (A, C), (C, D), (D, B)}.
• In Fig. 3.3 (c) G = (V, E) where V = {A, B, C, D, E, F} and E = {(A, B), (A, C), (B, D), (B, E), (C, F)}.
3.1.2 Terminology
• Basic graph terminology is listed below:
1. Adjacent Vertex: When there is an edge from one vertex to another then these
vertices are called adjacent vertices. Node 1 is called adjacent to node 2 as there
exists an edge from node 1 to node 2, as shown in Fig. 3.4.
[Figure: a graph with vertices 1, 2, 3, 4]
Fig. 3.4
2. Cycle: A path from a vertex to itself is called a cycle. Thus, a cycle is a path in
which the initial and last vertices are same.
Example: Fig. 3.5 shows the path (A, B, C, A) or (A, C, D, B, A) are cycles of different
lengths. If a graph contains a cycle it is cyclic otherwise acyclic.
[Figure: a cyclic graph with vertices A, B, C, D]
Fig. 3.5
[Figure: a connected graph and a non-connected graph]
Fig. 3.7: Connected Graph
Fig. 3.8: Non-connected Graph
5. Degree of a Vertex: It is the number of edges incident to a vertex. It is written as
degree(V), where V is a vertex.
6. In-degree of a Vertex: In directed graph, in-degree of a vertex ‘v’ is the number of
edges for which ‘v’ is the head.
[Figure: a digraph with vertices A, B, C, D, E, F]
Indegree of vertex B = 1
Indegree of vertex C = 2
Indegree of vertex D = 2
Indegree of vertex E = 1
Indegree of vertex F = 1
Indegree of vertex A = 1
Fig. 3.9: Digraph
7. Out-degree of a Vertex: In directed graph, the out-degree of a vertex ‘v’ is the total
number of edges for which ‘v’ is the tail.
From Fig. 3.9:
Outdegree of A = 1
Outdegree of B = 2
Outdegree of C = 2
Outdegree of D = 0
Outdegree of E = 2
Outdegree of F = 1
8. Isolated Vertex: If any vertex does not belong to any edge then it is called isolated
vertex.
[Figure: a graph with an isolated vertex]
[Figure: (a) graph G and (b) a subgraph of G]
Fig. 3.12
13. Weighted Graph: A weighted graph is a graph in which every edge is assigned a
weight. In Fig. 3.13, the weights denote the distance between the two vertices
connected by the corresponding edge. The weight of an edge is also called its cost.
In case of a weighted graph, an edge is a 3-tuple (U, V, W) where U and V are
vertices and W is weight of an edge (U, V).
[Figure: weighted graphs]
Fig. 3.13
3.2.1 Sequential Representation of Graph [April 16, 17, 18, 19 Oct. 17, 18]
• Graphs can be represented through matrix (array) in computer system's memory. This
is sequential in nature. This type of representation is called sequential representation
of graphs.
• The graph when represented using sequential representation using matrix, is
commonly known as Adjacency Matrix.
• Let G = (V, E) be a graph with n vertices, where n >= 1. An adjacency matrix of G is a
2-dimensional n × n array, say A, with the following property:
A[i][j] = 1 if the edge (vi, vj) is in E(G)
= 0 if there is no such edge in G
• If the graph is undirected then,
A[i] [j] = A[j] [i] = 1
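Given this definition, the in-degree and out-degree of Section 3.1.2 fall out directly as column and row sums of A. A small sketch in C (the function names are our own):

```c
#include <assert.h>

#define N 3                      /* number of vertices in the sample digraph */

/* Out-degree of v = number of 1s in row v of the adjacency matrix. */
int outdegree(int a[N][N], int v)
{
    int j, d = 0;
    for (j = 0; j < N; j++)
        d += a[v][j];
    return d;
}

/* In-degree of v = number of 1s in column v (edges for which v is the head). */
int indegree(int a[N][N], int v)
{
    int i, d = 0;
    for (i = 0; i < N; i++)
        d += a[i][v];
    return d;
}
```

For an undirected graph the matrix is symmetric, so both sums coincide and give the plain degree of the vertex.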
[Figure: undirected graphs G1 (vertices 1 to 4) and G2 (vertices 1 to 7), and directed graph G3 (vertices 1 to 3)]
Fig. 3.16: Undirected Graph G1, G2 and Directed Graph G3
• The graphs G1, G2 and G3 of Fig. 3.16 are represented using adjacency matrix in
Fig. 3.17.
G1:
    1 2 3 4
 1  0 1 1 1
 2  1 0 1 1
 3  1 1 0 1
 4  1 1 1 0
G3:
    1 2 3
 1  0 1 0
 2  1 0 1
 3  0 0 0
1 2 3 4 5 6 7
1 0 1 1 1 0 0 1
2 1 0 1 1 1 0 0
3 1 1 0 0 0 1 1
4 1 1 0 0 1 0 0
5 0 1 0 1 0 0 0
6 0 0 1 0 0 0 1
7 1 0 1 0 0 1 0
G2
Fig. 3.17 (a): Adjacency Matrix for G1, G2 and G3 of Fig. 3.16
• Adjacency Matrix Representation of a Weighted Graph:
For a weighted graph, the matrix A is represented as,
A[i][j] = weight of the edge (i, j)
        = 0 otherwise
Here, the weight is a label associated with the edge.
Example, following is the weighted graph and its associated adjacency matrix.
[Figure: a weighted directed graph on vertices 1 to 5]
    1   2   3   4   5
 1  0  15  12  19   0
 2 10   0  14   0   0
 3  0   0   0   0   9
 4  0   0   0   0   0
 5  0   0   6   8   0
Fig. 3.17 (b)
Example 1: Consider the following undirected graph and provide adjacency matrix.
The graph has 4 vertices and it is undirected graph. Write 1 to Number of vertices i.e. 1 to
4 as row and column headers to represent it in adjacency matrix. If edge exists between
any two nodes (row, column headers indicate nodes) write it as 1 otherwise 0 in the
matrix.
[Figure: undirected graph with vertices 1 to 4 and edges 1-2, 1-3, 1-4, 2-3, 3-4]
    1 2 3 4
 1  0 1 1 1
 2  1 0 1 0
 3  1 1 0 1
 4  1 0 1 0
(a) Undirected Graph (b) Adjacency Matrix
Fig. 3.18
for(j=0;j<n;j++)
{
printf("Enter Edge from %d to %d,(1: Edge 0: No edge) \n",i,j);
scanf("%d",&adj[i][j]);
}
for(i=0;i<n;i++)
{
degree(adj,i,n);
}
return 0;
}
Output:
Enter the total number of nodes in graph4
Enter Edge from 0 to 0,(1: Edge 0: No edge)
0
Enter Edge from 0 to 1,(1: Edge 0: No edge)
1
Enter Edge from 0 to 2,(1: Edge 0: No edge)
0
Enter Edge from 0 to 3,(1: Edge 0: No edge)
1
Enter Edge from 1 to 0,(1: Edge 0: No edge)
1
Enter Edge from 1 to 1,(1: Edge 0: No edge)
0
Enter Edge from 1 to 2,(1: Edge 0: No edge)
1
Enter Edge from 1 to 3,(1: Edge 0: No edge)
1
Enter Edge from 2 to 0,(1: Edge 0: No edge)
0
Enter Edge from 2 to 1,(1: Edge 0: No edge)
1
Enter Edge from 2 to 2,(1: Edge 0: No edge)
0
Enter Edge from 2 to 3,(1: Edge 0: No edge)
1
Enter Edge from 3 to 0,(1: Edge 0: No edge)
1
Enter Edge from 3 to 1,(1: Edge 0: No edge)
1
Enter Edge from 3 to 2,(1: Edge 0: No edge)
1
Enter Edge from 3 to 3,(1: Edge 0: No edge)
0
3.2.2 Linked Representation of Graphs [April 16, 17, 18, 19 Oct. 17, 18]
• We use the adjacency list for the linked representation of the graph. In adjacency lists
representation the n rows of the adjacency matrix are represented as n linked lists.
• The adjacency list representation maintains each node of the graph and a link to the
nodes that are adjacent to this node. When we traverse all the adjacent nodes, we set
the next pointer to null at the end of the list.
• Example: Consider the graph G. The Fig. 3.20 shows the graph G and linked
representation of G in memory. The linked representation will contain two lists:
1. A node vertex list: To keep the track of all the N nodes of the graph.
2. An edge list: To keep the information of adjacent vertices of each and every vertex
of a graph.
• Head node is used to represent each list, i.e. we can represent G by an array Head,
where Head [i] is a pointer to the adjacency list of vertex i.
[Figure: undirected graph G1 and its linked representation:
1 → 2 → 3 → 4 → NULL
2 → 1 → 3 → 4 → NULL
3 → 1 → 2 → 4 → NULL
4 → 1 → 2 → 3 → NULL]
Fig. 3.22: Adjacency List Representation of G1 (Undirected Graph)
• The structure for adjacency list representation can be defined in C as follows:
#define MAX_V 20
struct node
{
    int vertex;
    struct node *next;
};
struct node *head[MAX_V];
• In this representation every edge (vi, vj) of an undirected graph is represented twice,
once in the list of vi and once in the list of vj.
• With adjacency lists, determining whether there is an edge from vertex i to vertex j
takes O(n) time in the worst case, and traversing the whole structure of an undirected
graph takes O(n + e) time.
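A minimal sketch of building and querying such lists in C, extending the structure above (the helper names `add_edge` and `has_edge` are our own, not from the text):

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_V 20

struct node {
    int vertex;
    struct node *next;
};

struct node *head[MAX_V];        /* head[i] points to the list of vertex i */

/* Prepend w to v's list; for an undirected edge call this twice: (v,w), (w,v). */
void add_edge(int v, int w)
{
    struct node *p = malloc(sizeof *p);
    p->vertex = w;
    p->next = head[v];
    head[v] = p;
}

/* O(degree(v)) membership test: scan v's list for w. */
int has_edge(int v, int w)
{
    struct node *p;
    for (p = head[v]; p != NULL; p = p->next)
        if (p->vertex == w)
            return 1;
    return 0;
}
```

Compared with the adjacency matrix, this uses space proportional to n + e rather than n², at the cost of a list scan per edge query.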
Example 1: Consider the undirected graph in Fig. 3.23 and provide adjacency list.
[Figure: undirected graph with vertices 1 to 7]
Fig. 3.23
Solution: The adjacency list is given in Fig. 3.24
1 → 2 → 5 → 4
2 → 1 → 3
3 → 2 → 5
4 → 1 → 6
5 → 1 → 3 → 6
6 → 4 → 5
7 → NULL
Fig. 3.24
Example 2: Consider the weighted undirected graph in Fig. 3.25 and provide
adjacency list.
[Figure: weighted undirected graph on vertices A to F with edge weights A-B 12, A-E 15, A-F 17, B-C 1, B-D 2, B-F 7, C-D 6, D-E 14, D-F 10, E-F 19]
Fig. 3.25
Solution: An adjacency list for this graph is:
A → (B, 12) → (E, 15) → (F, 17)
B → (A, 12) → (C, 1) → (D, 2) → (F, 7)
C → (B, 1) → (D, 6)
D → (B, 2) → (C, 6) → (E, 14) → (F, 10)
E → (A, 15) → (D, 14) → (F, 19)
F → (A, 17) → (B, 7) → (D, 10) → (E, 19)
Fig. 3.26
[Figure: adjacency multilist representation of a graph on vertices 0 to 3; edge nodes N1 to N6 correspond to the edges (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)]
Lists: Vertex 0: N1 N2 N3, Vertex 1: N1 N4 N5
Vertex 2: N2 N4 N6, Vertex 3: N3 N5 N6
3.3.1 Depth First Search (DFS) [April 16, 17, 18, 19 Oct. 17, 18]
begin
    visited[v] = 1
    display vertex v
    push any one adjacent vertex x of v, which is not visited, onto STACK
end
end
3. Stop.
Pseudo 'C' Code for Recursive DFS:
int visited[MAX] = {0};
DFS (int i)
{
    int k;
    visited[i] = 1;
    printf(" %d", i);
    for (k = 0; k < n; k++)
        if (A[i][k] == 1 && visited[k] == 0)
            DFS(k);
}
• Let us consider the graph in Fig. 3.29 drawn below:
[Figure: a graph with vertices a, b, c, d, e, f, g, h, i]
Fig. 3.29
Sr. No. Steps:
1. Select vertex 'a' as the starting point (visit 'a'). Push 'a' onto the stack.
   Visited = {a}
2. Visit any adjacent vertex of 'a' which is not visited ('b'). Push 'b' onto the stack.
   Visited = {a, b}
3. Visit any adjacent vertex of 'b' which is not visited ('c'). Push 'c' onto the stack.
   Visited = {a, b, c}
4. Visit any adjacent vertex of 'c' which is not visited ('f'). Push 'f' onto the stack.
   Visited = {a, b, c, f}
   …
   Visited = {a, b, c, f, e, g}
Stack becomes empty, so stop the DFS traversal. The final result of the DFS traversal is
the following spanning tree:
[Figure: DFS spanning tree rooted at a, covering vertices a to i]
3.3.2 Breadth First Search (BFS) [April 16, 17, 18, 19 Oct. 17, 18]
• Another systematic way of visiting the vertices is Breadth-First Search (BFS). BFS
starts at a vertex i and marks it as visited; it differs from DFS in that all unvisited
vertices adjacent to i are visited next.
• Then unvisited vertices adjacent to these vertices are visited, and so on, until all the
vertices have been visited. The approach is called 'breadth first' because from each
vertex i that we visit, we search as broadly as possible by next visiting all the vertices
adjacent to i.
• In BFS method, all the vertices are stored in a Queue and each vertex of the graph is
visited or explored once.
• The oldest vertex (added first) in the Queue is explored first. To traverse the graph
using breadth first search, an adjacency list of the graph is created first.
• For example, the BFS of graph of Fig. 3.30 results in visiting the nodes in the following
order: 1, 2, 3, 4, 5, 6, 7, 8.
[Figure: a graph with vertices 1 to 8]
Fig. 3.30
• This search algorithm uses a queue to store the adjacent vertices of each vertex of the
graph as and when it is visited.
• These vertices are then taken out from queue in a sequence (FIFO) and their adjacent
vertices are visited and so on until all the vertices have been visited. The algorithm
terminates when the queue is empty.
• The algorithm for BFS is given below. The algorithm initializes the Boolean array
visited[ ] to 0 (false) i.e. marks each vertex as unvisited.
for i = 1 to n do
visited[i] = 0.
Algorithm:
BFS (i)
begin
visited[i] = 1
add (Queue, i)
while not empty (Queue) do
begin
i = delete (Queue)
for all vertices j adjacent to i do
begin
if (visited[j] = 0)
add (Queue, j)
visited[j] = 1
end
end
end
• Here, the while loop is executed n times, as n is the number of vertices and each vertex
is inserted in the queue once. If the adjacency list representation is used, then adjacent
nodes are computed in the for loop.
Pseudo 'C' code for non-recursive BFS:
BFS (int i)
{
    int j, k, visited[MAX];
    struct queue Q;
    for (k = 0; k < n; k++)
        visited[k] = 0;
    visited[i] = 1;
    printf(" %d", i);
    insert(Q, i);
    while (!empty(Q))
    {
        k = delete(Q);
        for (j = 0; j < n; j++)
            if (A[k][j] == 1 && visited[j] == 0)
            {   visited[j] = 1;
                printf(" %d", j);
                insert(Q, j);
            }
    }
}
[Figure: the graph with vertices a to i]
• Let us traverse the graph using non-recursive algorithm which uses queue. Let 'a' be a
start vertex. Initially queue is empty. Initial set of visited vertices, V = φ.
(i) Add ‘a’ to the queue. Mark ‘a’ as visited. V = {a}
Queue: [a]
(ii) As queue is not empty,
vertex = delete( ) = ‘a’.
Add all unvisited adjacent vertices of ‘a’ to the queue. Also mark them visited.
V = {a, b, d, e}
Queue: [b, d, e]
(iii) As queue is not empty, vertex = delete( ) = ‘b’. Insert all adjacent, unvisited
vertices of ‘b’ to the queue and mark them visited. V = {a, b, d, e, c}
Queue: [d, e, c]
(iv) As queue is not empty,
Vertex = delete( ) = ‘d’. Now insert all adjacent, unvisited vertices of ‘d’ to the
queue and mark them as visited
V = {a, b, d, e, c, g}
Queue: [e, c, g]
(v) As queue is not empty, vertex = delete( ) = ‘e’. All adjacent vertices of ‘e’ are
already visited.
Queue: [c, g]
(vi) As queue is not empty, vertex = delete( ) = ‘c’.
Insert all adjacent, unvisited vertices of ‘c’ to the queue and mark them visited.
V = {a, b, d, e, c, g, f}
Queue: [g, f]
(vii) As queue is not empty, vertex = delete( ) = ‘g’.
Insert all unvisited adjacent vertices of ‘g’ to the queue and mark them visited.
V = {a, b, d, e, c, g, f, h}
Queue: [f, h]
(viii) As queue is not empty, vertex = delete( ) = ‘f’. All adjacent vertices of ‘f’ are
already visited. Queue: [h]
(ix) As queue is not empty, vertex = delete( ) = ‘h’.
Insert its unvisited adjacent vertices to queue and Mark them.
V = {a, b, d, e, c, g, f, h, i}
Queue: [i]
(x) As queue is not empty,
Vertex = delete( ) = ‘ i ’, No adjacent vertices of i are unvisited.
(xi) As queue is empty, stop.
The sequence in which vertices are visited by BFS is as:
a, b, d, e, c, g, f, h, i.
[Figure: the graph with BFS visit order marked: a=1, b=2, d=3, e=4, c=5, g=6, f=7, h=8, i=9]
Fig. 3.31
Program 3.2: To create a graph and represent using adjacency matrix and adjacency list
and traverse in BFS order.
#include<stdio.h>
#include<stdlib.h>
struct q
{
int data[20];
int front, rear;
} q1;
struct node
{
int vertex;
struct node * next;
} * v[10];
while (temp)
{
printf("v%d -> ", temp -> vertex);
temp = temp -> next;
}
printf("null");
}
}
void bfs(int m[10][10], int n) // BFS traversal using the adjacency matrix
{
int i, j, v, w;
int visited[20];
initq();
for(i=0; i<n; i++)
visited[i] = 0;
printf("\n \t The BFS traversal is: \n");
v=0;
visited[v] = 1;
add(v);
while(! emptyq())
{
v = del();
printf("\n v%d ", v + 1);
printf("\n");
for(w = 0; w < n; w++)
if((m[v][w] ==1) && (visited[w] == 0))
{
add(w);
visited[w] = 1;
}
}
}
/* main program */
void main()
{
int m[10][10], n;
printf("\n \t enter no. of vertices");
scanf("%d", &n);
create(m,n);
disp(m,n);
create1(m,n);
displist(n);
bfs(m,n);
}
• In an algorithm to find the topological sort of an acyclic directed graph, the indegree of
the vertices is considered. Following are the steps that are repeated until the graph is
empty.
1. Select any vertex Vi with 0 indegree.
2. Add vertex Vi to the topological sort (initially the topological sort is empty).
3. Remove the vertex Vi along with its edges from the graph and decrease the
indegree of each adjacent vertex of Vi by one.
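The three steps above can be sketched in C over an adjacency matrix. This is a minimal illustration (the name `topo_sort` and the fixed size `N` are our own assumptions); when no vertex of in-degree 0 remains before all vertices are output, the graph contains a cycle.

```c
#include <assert.h>

#define N 4                      /* number of vertices in the sample DAG */

/* Repeatedly remove a 0-indegree vertex, appending it to order[].
   Returns the number of vertices output; a value < N signals a cycle. */
int topo_sort(int adj[N][N], int order[N])
{
    int indeg[N] = {0}, done[N] = {0};
    int i, j, k = 0;

    for (i = 0; i < N; i++)          /* step 0: compute all indegrees */
        for (j = 0; j < N; j++)
            indeg[j] += adj[i][j];

    while (k < N) {
        for (i = 0; i < N; i++)      /* step 1: pick a 0-indegree vertex */
            if (!done[i] && indeg[i] == 0)
                break;
        if (i == N)
            break;                   /* none left: the graph has a cycle */
        done[i] = 1;
        order[k++] = i;              /* step 2: append to the sort */
        for (j = 0; j < N; j++)      /* step 3: remove i's outgoing edges */
            if (adj[i][j])
                indeg[j]--;
    }
    return k;
}
```

Ties are broken here by lowest vertex number, which is why a graph can have several valid topological sorts, as the worked example below also shows.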
• To illustrate this algorithm, consider an acyclic directed graph shown in Fig. 3.35.
[Figure: an acyclic directed graph with vertices 1 to 7; the indegree of each vertex is annotated; topological sort: empty]
Fig. 3.35: Acyclic Directed Graph
• The steps for finding topological sort for this graph are shown in Fig. 3.36.
[Figure: steps of the algorithm:
(a) Removing Vertex 1 with 0 Indegree; topological sort: 1
(b) Removing Vertex 3 with 0 Indegree; topological sort: 1, 3
(c) Removing Vertex 2 with 0 Indegree; topological sort: 1, 3, 2
(d) Removing Vertex 4 with 0 Indegree; topological sort: 1, 3, 2, 4
(e) Removing the remaining vertices; topological sort: 1, 3, 2, 4, 5, 7, 6]
Fig. 3.36
(a) 1 → 3 → 2 → 4 → 5 → 7 → 6
(b) 1 → 3 → 4 → 2 → 5 → 7 → 6
Fig. 3.37: Graphical Representation of Topological Sort
• Topological sort is useful for the proper scheduling of various subtasks to be executed
to complete a particular task. In the computer field, it is used for scheduling
instructions.
• For example, consider a task in which smaller number is to be subtracted from the
larger one. The set of instructions for this task is as follows:
1. if A>B then goto Step 2, else goto Step 3
2. C = A-B, goto Step 4
3. C = B-A, goto Step 4
4. Print C
5. End
• The two possible scheduling orders to accomplish this task are (1, 2, 4, 5) and
(1, 3, 4, 5). From this, it can be concluded that the instruction 2 cannot be executed
unless the instruction 1 is executed before it. Moreover, these instructions are non-
repetitive, hence acyclic in nature.
• A spanning tree of a connected graph G is a tree that covers all the vertices and the
edges required to connect those vertices in the graph.
• Formally, a tree T is called a spanning tree of a connected graph G if the following two
conditions hold:
1. T contains all the vertices of G, and
2. All the edges of T are subsets of edges of G.
• For a given graph G with n vertices, there can be many spanning trees and each tree
will have n − 1 edges. For example, consider a graph as shown in Fig. 3.38.
• Since, this graph has 4 vertices, each spanning tree must have 4 − 1 = 3 edges. Some of
the spanning trees for this graph are shown in Fig. 3.39.
• Observe that in spanning trees, there exists only one path between any two vertices
and insertion of any other edge in the spanning tree results in a cycle.
• The spanning tree generated by using depth-first traversal is known as depth-first
spanning tree. Similarly, the spanning tree generated by using breadth-first traversal
is known as breadth-first spanning tree.
[Figure: graph G (Fig. 3.38) and four of its spanning trees (a) to (d), each on vertices 1, 2, 3, 4]
Fig. 3.39: Spanning Trees of Graph G
• For a connected weighted graph G, it is required to construct a spanning tree T such
that the sum of weights of the edges in T must be minimum. Such a tree is called a
minimum spanning tree.
• There are various approaches for constructing a minimum spanning tree out of which
Kruskal's algorithm and Prim's algorithm are commonly used.
• If each edge of the graph is associated with a weight and there exists more than one
spanning tree, we need to find the minimum spanning tree of the graph.
• A Minimum Spanning Tree (MST) or minimum weight spanning tree is a subset of the
edges of a connected, edge-weighted undirected graph that connects all the vertices
together, without any cycles and with the minimum possible total edge weight.
• A minimum spanning tree in an undirected connected weighted graph is a spanning
tree of minimum weight (among all spanning trees).
• In a minimum spanning tree of a graph, the maximum weight of an edge is the
minimum possible from all possible spanning trees of that graph.
• In the left image you can see a weighted undirected graph, and in the right image you
can see the corresponding minimum spanning tree.
[Figure: a weighted undirected graph and its corresponding minimum spanning tree]
(a) Weighted Undirected Graph (b) Minimum Spanning Tree
Fig. 3.40
Prim’s Minimum Spanning Tree (MST) Algorithm:
• Prim’s algorithm is a greedy strategy to find the minimum spanning tree. In this
algorithm, to form a MST we can start from an arbitrary vertex.
• Prim's algorithm shares a similarity with the shortest-path-first algorithms.
• Prim's algorithm treats the growing spanning tree as a single tree and keeps adding
new nodes to it from the given graph.
• Example: Consider the following graph.
[Figure: a weighted graph on vertices S, A, B, C, D, T that contains a loop and parallel edges]
Fig. 3.41
Step 1: Remove all loops and parallel edges:
[Figure: the graph after removing the loop]
Fig. 3.42
• Remove all loops and parallel edges from the given graph. In case of parallel edges,
keep the one which has the least cost associated and remove all others.
[Figure: the graph after removing loops and parallel edges]
Fig. 3.43
Step 2: Choose any arbitrary node as root node:
• In this case, we choose node S as the root node of Prim's spanning tree. This node is
arbitrarily chosen, so any node can be the root node. One may wonder why any node
can be a root node. The answer is that in the spanning tree all the nodes of a graph are
included, and because the graph is connected there must be at least one edge that joins
each node to the rest of the tree.
Step 3: Check outgoing edges and select the one with the least cost:
• After choosing the root node S, we see that S, A and S, C are two edges with weights 7
and 8, respectively. We choose the edge S, A as it is smaller than the other.
[Figure: edge S, A (weight 7) added to the tree]
Fig. 3.44
• Now, the tree S-7-A is treated as one node and we check for all edges going out from it.
We select the one which has the lowest cost and include it in the tree.
[Figure: edge A, C (weight 3) added to the tree]
Fig. 3.45
• After this step, S-7-A-3-C tree is formed. Now we'll again treat it as a node and will
check all the edges again. However, we will choose only the least cost edge. In this
case, C-3-D is the new edge, which is less than other edges' cost 8, 6, 4, etc.
[Figure: edge C, D (weight 3) added to the tree]
Fig. 3.46
• After adding node D to the spanning tree, we now have two edges going out of it
having the same cost, i.e. D-2-T and D-2-B. Thus, we can add either one. But the next
step will again yield edge 2 as the least cost. Hence, we are showing a spanning tree
with both edges included.
[Figure: the resulting spanning tree with edges S, A (7); A, C (3); C, D (3); D, B (2); D, T (2)]
Fig. 3.47
• We may find that the output spanning tree of the same graph using two different
algorithms is same.
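The procedure above can be sketched in C over an adjacency matrix, where a 0 entry means "no edge". The vertex labels S, A, B, C, D, T are mapped to 0 to 5, and the edge weights below are our reading of the example graph after Step 1; the function name `prim` is our own.

```c
#include <assert.h>
#include <limits.h>

#define N 6                      /* vertices: 0=S, 1=A, 2=B, 3=C, 4=D, 5=T */

/* Grow the tree from vertex 0, always taking the cheapest edge that
   connects a vertex not yet in the tree. Returns the total MST weight. */
int prim(int w[N][N])
{
    int in_tree[N] = {0}, dist[N], i, j, total = 0;

    in_tree[0] = 1;
    for (i = 0; i < N; i++)
        dist[i] = w[0][i] ? w[0][i] : INT_MAX;
    dist[0] = 0;

    for (j = 1; j < N; j++) {
        int best = -1;
        for (i = 0; i < N; i++)          /* cheapest outgoing edge */
            if (!in_tree[i] && (best < 0 || dist[i] < dist[best]))
                best = i;
        total += dist[best];
        in_tree[best] = 1;
        for (i = 0; i < N; i++)          /* new vertex may offer cheaper edges */
            if (!in_tree[i] && w[best][i] && w[best][i] < dist[i])
                dist[i] = w[best][i];
    }
    return total;
}
```

With the edges of the worked example (S-A 7, A-C 3, C-D 3, D-B 2, D-T 2 chosen), the total weight of the tree in Fig. 3.47 is 7 + 3 + 3 + 2 + 2 = 17.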
Kruskal's Minimum Spanning Tree (MST) Algorithm:
• Kruskal's algorithm to find the minimum cost spanning tree uses the greedy strategy.
• Kruskal's algorithm treats the graph as a forest and every node it has as an individual
tree. A tree connects to another only and only if, it has the least cost among all
available options and does not violate MST properties.
• To understand Kruskal's algorithm let us consider the Fig. 3.48.
[Figure: the same weighted graph on vertices S, A, B, C, D, T, containing a loop and parallel edges]
Fig. 3.48
Step 1: Remove all loops and parallel edges:
[Figure: the graph after removing the loop]
Fig. 3.49
• In case of parallel edges, keep the one which has the least cost associated and remove
all others.
[Figure: the graph after removing parallel edges]
Fig. 3.50
Step 2: Arrange all edges in their increasing order of weight:
• The next step is to create a set of edges and weight, and arrange them in an ascending
order of weightage (cost).
Edge   : B, D  D, T  A, C  C, D  C, B  B, T  A, B  S, A  S, C
Weight :  2     2     3     3     4     5     6     7     8
Step 3: Add the edge which has the least weightage:
• Now we start adding edges to the graph beginning from the one which has the least
weight. Throughout, we shall keep checking that the spanning properties remain
intact. In case, by adding one edge, the spanning tree property does not hold then we
shall consider not to include the edge in the graph.
[Figure: the graph before any edges are added to the tree]
Fig. 3.51
• The least cost is 2 and edges involved are B, D and D, T. We add them. Adding them
does not violate spanning tree properties, so we continue to our next edge selection.
• Next cost is 3, and associated edges are A, C and C, D. We add them again:
[Figure: edges B, D; D, T; A, C and C, D added]
Fig. 3.52
• Next cost in the table is 4, and we observe that adding it will create a circuit in the
graph.
[Figure: edge C, B (weight 4) would create a circuit]
Fig. 3.53
• We ignore it. In the process we shall ignore/avoid all edges that create a circuit.
Fig. 3.54: [Graph after ignoring the circuit-forming edge]
• We observe that edges with cost 5 and 6 also create circuits. We ignore them and move
on.
Fig. 3.55: [Edges with costs 5 and 6 also create circuits and are ignored]
• Now we are left with only one node, S, to be added. Between the two least-cost edges
available, 7 (S, A) and 8 (S, C), we add the edge with cost 7.
Fig. 3.56: [Edge S–A (weight 7) added]
• By adding edge S, A we have included all the nodes of the graph, and we now have a
minimum cost spanning tree.
Fig. 3.57: [Minimum cost spanning tree: B–D, D–T, A–C, C–D, S–A; total cost = 2 + 2 + 3 + 3 + 7 = 17]
Step 1:
No. of Nodes  : 0   1   2   3   4   5
Distance      : 0   3   1   6   ∞   ∞
Distance From : –   0   0   0   –   –
Step 2:
No. of Nodes  : 0   1   2   3   4   5
Distance      : 0   3   0   5   6   4
Distance From : –   0   –   2   2   2
Step 3:
No. of Nodes  : 0   1   2   3   4   5
Distance      : 0   0   0   5   3   4
Distance From : –   –   –   2   1   2
Step 4:
No. of Nodes  : 0   1   2   3   4   5
Distance      : 0   0   0   5   0   4
Distance From : –   2   2   3   2
Step 5:
No. of Nodes  : 0   1   2   3   4   5
Distance      : 0   0   0   3   0   0
Distance From : –   2   2   3   2   2
Hence, the minimum distance of vertex 9 from vertex 1 is 20, and the path is
1 → 3 → 7 → 8 → 6 → 9.
This path is determined based on the predecessor information.
Fig. 3.58: [Weighted graph with 9 vertices used in the above example]
Example: Consider the graph in Fig. 3.59.
Fig. 3.59: [Weighted undirected graph with vertices A, B, C, D, E; every vertex is initially labelled ∞]
Procedure for Dijkstra's Algorithm:
Step 1: Consider A as source vertex.
No. of Nodes  : A   B   C   D   E
Distance      : 0   10  3   ∞   ∞
Distance From : –   A   A   –   –
Step 2: Now consider vertex C.
No. of Nodes  : A   B   C   D   E
Distance      : 0   7   3   11  5
Distance From : –   C   A   C   C
Therefore,
          A    B    C    D    E
Initial   0    ∞    ∞    ∞    ∞
A         0   10    3    ∞    ∞
C              7    3   11    5
E                        14    5
B                         9
D                        16
Fig. 3.60: [Resulting shortest-path diagram]
• The all-pairs shortest path problem asks for the shortest paths between every pair of
vertices in the graph.
• The Floyd-Warshall algorithm solves the all-pairs shortest path problem for a given
weighted graph. It works for both directed as well as undirected graphs. It was
invented by Robert Floyd and Stephen Warshall, hence it is often called the
Floyd-Warshall algorithm.
• As a result, this algorithm generates a matrix which represents the minimum
distance from any node to all other nodes in the graph.
 0   1  −3   2  −4
 3   0  −4   1  −1
 7   4   0   5   3
 2  −1  −5   0  −2
 8   5   1   6   0
Fig. 3.61: [Weighted directed graph with 5 vertices and its all-pairs shortest path matrix]
• At first, the output matrix is the same as the given cost matrix of the graph. After that,
the output matrix will be updated with all vertices k as the intermediate vertex.
Input and Output:
Input: The cost matrix of the graph.
0 3 6 ∞ ∞ ∞ ∞
3 0 2 1 ∞ ∞ ∞
6 2 0 1 4 2 ∞
∞ 1 1 0 2 ∞ 4
∞ ∞ 4 2 0 2 1
∞ ∞ 2 ∞ 2 0 1
∞ ∞ ∞ 4 1 1 0
Output: Matrix of all pair shortest path.
0 3 5 4 6 7 7
3 0 2 1 3 4 4
5 2 0 1 3 2 3
4 1 1 0 2 3 3
6 3 3 2 0 2 1
7 4 2 3 2 0 1
7 4 3 3 1 1 0
• On the Graph API, everything is a vertex or node. These are entities such as Users,
Pages, Places, Groups, Comments, Photos, Photo Albums, Stories, Videos, Notes, Events
and so forth. Anything that has properties that store data is a vertex.
• The Graph API uses these collections of vertices and edges (essentially graph data
structures) to store its data.
• The Graph API has run into some problems because of its ability to obtain unusually
rich information about users' friends.
Jacie
John Juan
Jack Julia
Jade Jeff
Fig. 3.62: A sample Graph (in this graph, individuals are represented with nodes (circles) and
individuals who know each other are connected with edges (lines))
Example: Define spanning tree and minimum spanning tree. Find the minimum spanning
tree of the graph shown in Fig. 3.62.
Using Prim's Algorithm:
Let X be the set of nodes explored, initially X = {A}.
[Weighted graph with vertices A, B, C, D, E]
Step 1: Taking minimum weight edge of all adjacent edges of X = {A}: edge (A, B).
X = {A, B}
Step 2 : Taking minimum weight edge of all Adjacent edges of X = {A, B}.
[Tree so far: A–B (4), B–C (2)]
X = {A, B, C}
Step 3 : Taking minimum weight edge of all Adjacent edges of X = {A, B, C}.
[Tree so far: A–B (4), B–C (2), C–D (1)]
X = {A, B, C, D}
Step 4 : Taking minimum weight edge of all Adjacent edges of X = {A, B, C, D}.
[Tree so far: A–B (4), B–C (2), C–D (1), B–E (3)]
X = {A, B, C, D, E}
All nodes of graph are there with set X, so we obtained minimum spanning tree of
cost: 4 + 2 + 1 + 3 = 10.
Using Kruskal's Algorithm:
• Let X be the set of nodes explored, initially X = {A}.
[Weighted graph with vertices A, B, C, D, E]
Step 1 : Taking minimum edge (C, D).
[Tree: C–D (1)]
Step 2 : Taking next minimum edge (B, C).
[Tree: C–D (1), B–C (2)]
Step 3 : Taking next minimum edge (B, E).
[Tree: C–D (1), B–C (2), B–E (3)]
Step 4 : Taking next minimum edge (A, B).
[Tree: C–D (1), B–C (2), B–E (3), A–B (4)]
Step 5 : Taking next minimum edge (A, E); it forms a cycle, so it is not considered.
Step 6 : Taking next minimum edge (C, E); it forms a cycle, so it is not considered.
Step 7 : Taking next minimum edge (A, D); it forms a cycle, so it is not considered.
Step 8 : Taking next minimum edge (A, C); it forms a cycle, so it is not considered.
Step 9 : Taking next minimum edge (E, D); it forms a cycle, so it is not considered.
All edges of the graph have been considered, so we have obtained the minimum spanning
tree of cost: 4 + 2 + 1 + 3 = 10.
Program 3.3: Program that accepts the vertices and edges of a graph. Create adjacency list
and display the adjacency list.
#include <stdio.h>
#include <stdlib.h>
#define new_node (struct node*)malloc(sizeof(struct node))
struct node
{
int vertex;
struct node *next;
};
void main()
{
int choice;
do
{
printf("\n A Program to represent a Graph by using an Adjacency List \n");
printf("\n 1. Directed Graph ");
printf("\n 2. Un-Directed Graph ");
printf("\n 3. Exit ");
printf("\n\n Select a proper choice : ");
scanf("%d", &choice);
switch(choice)
{
case 1 : dir_graph();
break;
case 2 : undir_graph();
break;
case 3 : exit(0);
}
}while(1);
}
int dir_graph()
{
struct node *adj_list[10], *p;
int n;
int in_deg, out_deg, i, j;
printf("\n How Many Vertices ? : ");
scanf("%d", &n);
for( i = 1 ; i <= n ; i++ )
adj_list[i] = NULL;
read_graph (adj_list, n);
printf("\n Vertex \t In_Degree \t Out_Degree \t Total_Degree ");
for (i = 1; i <= n ; i++ )
{
in_deg = out_deg = 0;
p = adj_list[i];
while( p != NULL )
{
out_deg++;
p = p -> next;
}
for ( j = 1 ; j <= n ; j++ )
{
p = adj_list[j];
while( p != NULL )
{
if ( p -> vertex == i )
in_deg++;
p = p -> next;
}
}
Vertex   In_Degree   Out_Degree   Total_Degree
1        2           0            2
2        1           2            3
3        0           1            1
4        1           1            2
PRACTICE QUESTIONS
Q. I Multiple Choice Questions:
1. Which is a non-linear data structure?
(a) Graph (b) Array
(c) Queue (d) Stack
2. A graph G is represented as G = (V, E) where,
(a) V is set of vertices (b) E is set of edges
(c) Both (a) and (b) (d) None of these
3. In which representation is the graph represented using a matrix of size
(total number of vertices) × (total number of vertices)?
(a) Adjacency List (b) Adjacency Matrix
(c) Adjacency Queue (d) Adjacency Stack
4. Each node of the graph is called a ______.
(a) Edge (b) Path
(c) Vertex (d) Cycle
5. Which representation of graph is based on linked lists?
(a) Adjacency List (b) Adjacency Matrix
(c) Both (a) and (b) (d) None of these
6. Which of a graph means visiting each of its nodes exactly once?
(a) Insert (b) Traversal
(c) Delete (d) Merge
7. Which is a vertex based technique for finding a shortest path in graph?
(a) DFS (b) BST
(c) BFS (d) None of these
8. Which is usually implemented using a stack data structure?
(a) DFS (b) BST
(c) BFS (d) None of these
9. Which is a graph in which all the edges are uni-directional i.e. the edges point in a
single direction?
(a) Undirected (b) Directed
(c) Cyclic (d) Acyclic
10. Which sorting involves displaying the specific order in which a sequence of
vertices must be followed in a directed graph?
(a) Cyclic (b) Acyclic
(c) Topological (d) None of these
11. Which is a subset of an undirected graph that has all the vertices connected by
minimum number of edges?
(a) Spanning tree (b) Minimum spanning tree
(c) Both (a) and (b) (d) None of these
12. Which is a subset of edges of a connected weighted undirected graph that connects
all the vertices together with the minimum possible total edge weight?
(a) Spanning tree (b) Minimum Spanning Tree (MST)
(c) Both (a) and (b) (d) None of these
Answers
1. (a) 2. (c) 3. (b) 4. (c) 5. (a) 6. (b) 7. (c)
8. (a) 9. (b) 10. (c) 11. (a) 12. (b)
Q. II Fill in the Blanks:
13. The sequence of nodes that we need to follow when we have to travel from one
vertex to another in a graph is called the ______.
14. An adjacency ______ is a matrix of size n x n where n is the number of vertices in
the graph.
15. Topological ordering of vertices in a graph is possible only when the graph is a
______ acyclic graph.
16. A ______ (or minimum weight spanning tree) for a weighted, connected and
undirected graph is a spanning tree with weight less than or equal to the weight of
every other spanning tree.
17. The ______ algorithm is an example of an all-pairs shortest paths algorithm.
18. A graph ______ cycle is called acyclic graph.
Answers
1. G = (V, E) 2. cyclic 3. non-linear 4. Vertex, Arc
5. searching 6. Stack 7. degree 8. toward
9. Null 10. link 11. Queue 12. visited
13. path 14. matrix 15. directed
16. Minimum Spanning Tree (MST) 17. Floyd-Warshall 18. without
Q. III State True or False:
1. An acyclic graph is a graph that has no cycle.
2. A Graph consists of a finite set of vertices (or nodes) and set of Edges which
connect a pair of nodes.
3. A graph in which weights are assigned to every edge is called acyclic graph.
4. The Floyd-Warshall algorithm is for solving the all pairs shortest path problem.
5. The number of edges pointing away from a node is called its out-degree/out-order.
6. In a graph, if two nodes are connected by an edge then they are called adjacent
nodes or neighbors.
7. Dijkstra's Algorithm can be applied to either a directed or an undirected graph to
find the shortest path to each vertex from a single source.
8. A graph is a linear data structure.
9. BFS usually implemented using a queue data structure.
10. The spanning tree does not have any cycle (loops).
11. Prim’s Algorithm will find the minimum spanning tree from the graph G.
12. Given an undirected, connected and weighted graph, find Minimum Spanning
Tree (MST) of the graph using Kruskal’s algorithm.
13. The number of edges that are connected to a particular node is called the path of
the node.
14. A spanning tree of that graph is a subgraph that is a tree and connects all the
vertices together.
15. Directed graph is also called as Digraph.
Answers
1. (T) 2. (T) 3. (F) 4. (T) 5. (T) 6. (T) 7. (T) 8. (F)
9. (T) 10. (T) 11. (T) 12. (T) 13. (F) 14. (T) 15. (T)
[Figures: sample graphs with vertices V1 to V7 and their adjacency lists]
Starting vertex v1
(i) Draw adjacency list
(ii) Give DFS and BFS traversals.
Ans. Refer to Sections 3.2.2 and 3.3.
October 2018
[Figure: weighted graph]
April 2019
1. Consider the following graph: [4 M]
[Figure: graph with vertices V1 to V7]
CHAPTER
4
Hash Table
Objectives …
To study Basic Concepts of Hashing
To learn Hash Table and Hash Function
To understand Terminologies in Hashing
To study Collision Resolution Techniques
4.0 INTRODUCTION
• Hashing is a technique that uses a special function, called the hash function, to map
a given value to a particular key for faster access of elements.
• Hashing is the process of mapping large amount of data item to smaller table with the
help of hashing function.
• The mapping between an item and the slot where that item belongs in the hash table
is called the hash function.
• The hash function will take any item in the collection and return an integer in the
range of slot names, between 0 and m−1.
• A hash table is a collection of items which are stored in such a way as to make it easy
to find them later.
Need for Hash Table Data Structure:
• In the linear search and binary search, the location of the item is determined by a
sequence/series of comparisons. The data item to be searched is compared with items
at certain locations in the list.
• If any item/element matches with the item to be searched, the search is successful.
The number of comparisons required to locate an item depends on the data structure
like array, linked list, sorted array, binary search tree, etc. and the search algorithm
used.
• For example, if the items are stored in sorted order in an array, binary search can be
applied which locates an item in O(log n) comparisons.
• On the other hand, if an item is to be searched in a linked list or an unsorted array,
linear search has to be applied which locates an item in O(n) comparisons.
4.1
Data Structures and Algorithms - II Hash Table
• However, for some applications, searching is very critical operation and they need a
search algorithm which performs search in constant time, i.e. O(1).
• Although it is almost impossible to achieve a performance of O(1) in practice, a
search algorithm can still be derived which is independent of n and gives a
performance very close to O(1).
• That search algorithm is called hashing. Hashing uses a data structure called hash
table which is merely an array of fixed size and items in it are inserted using a
function called hash function.
• Best case timing behavior of searching using hashing = O(1) and Worst case timing
Behavior of searching using hashing = O(n).
• A hash table is a data structure in which the location of a data item is determined
directly as a function of the data item itself rather than by a sequence of comparisons.
• Under ideal conditions, the time required to locate a data item in a hash table is O(1),
i.e. it is constant and does not depend on the number of data items stored.
4.2 TERMINOLOGY
• The basic terms used in hashing are explained below:
1. Hash Table: A hash table is a data structure that is used to store keys/value pairs.
Hashing uses a data structure called hash table which is merely an array of fixed
size and items in it are inserted using a hash function. It uses a hash function to
compute an index into an array in which an element will be inserted or searched.
A hash table (hash map) implements an associative array abstract data type, a
structure that maps keys to values; the hash function computes an index, also
called a hash code, into an array of buckets or slots, from which the desired
value can be found.
2. Hash Function: A hash function is a function that maps a key to some slot in the
hash table. It is a mapping function which maps the set of all search keys to the
addresses where the actual records are placed.
3. Bucket: A hash file stores data in bucket format. Bucket is considered a unit of
storage. A bucket typically stores one complete disk block, which in turn can store
one or more records.
4. Hash Address: A hash function is a function which when given a key, generates an
address in the table. Hash index is an address of the data block.
5. Collision: The situation where a newly inserted key maps to an already occupied
slot in the hash table is called collision. A collision occurs when two data elements
are hashed to the same value and try to occupy the same space in the hash table.
In simple words, the situation in which a key hashes to an index which is already
occupied by another key is called as collision.
6. Synonym: It is possible for different keys to hash to the same array location. This
situation is called collision and the colliding keys are called synonyms.
4.3
Data Structures and Algorithms - II Hash Table
7. Overflow: An overflow occurs when the bucket for a new (key, element) pair
is full.
8. Open Addressing: It ensures that all elements are stored directly in the hash
table itself; it attempts to resolve collisions by implementing various probing
methods.
9. Linear Probing: It is performed to resolve collisions by placing the data into the
next open slot in the table.
10. Hashing: Hashing is a process that uses a hash function to get the key for the hash
table and transform it into an index that will point to different arrays of buckets,
which is where the information will be stored.
11. Chaining: It is a technique used for avoiding collisions in hash tables.
12. Collision Resolution: Collision should be resolved by finding some other location
to insert the new key, this process of finding another location is called as collision
resolution.
• Such a field is known as the primary key (denoted by k). The values k1, k2, …, kn in
the key field are known as keys or key values.
• The key through an algorithmic function determines the location of a particular
record.
The algorithmic function, i.e. the hashing function, basically performs the key-to-address
transformation, in which a key is mapped to the address of a record in the file, as shown
in Fig. 4.2.
while(hashArray[hashIndex] != NULL)
{
if(hashArray[hashIndex]->key == key)
{
struct DataItem* temp = hashArray[hashIndex];
//assign a dummy item at deleted position
hashArray[hashIndex] = dummyItem;
return temp;
}
//go to next cell
++hashIndex;
//wrap around the table
hashIndex %= SIZE;
}
return NULL;
}
void display()
{
int i = 0;
for(i = 0; i<SIZE; i++)
{
if(hashArray[i] != NULL)
printf(" (%d,%d)",hashArray[i]->key,hashArray[i]->data);
else
printf(" ~~ ");
}
printf("\n");
}
int main()
{
dummyItem = (struct DataItem*) malloc(sizeof(struct DataItem));
dummyItem->data = -1;
dummyItem->key = -1;
insert(1, 20);
insert(2, 70);
insert(42, 80);
insert(4, 25);
insert(12, 44);
insert(14, 32);
insert(17, 11);
insert(13, 78);
insert(37, 97);
display();
item = search(37);
if(item != NULL)
{
printf("Element found: %d\n", item->data);
} else
{
printf("Element not found\n");
}
delete(item);
item = search(37);
if(item != NULL)
{
printf("Element found: %d\n", item->data);
} else
{
printf("Element not found\n");
}
}
Output:
$gcc -o hashpgm *.c
$hashpgm
~~ (1,20) (2,70) (42,80) (4,25) ~~ ~~ ~~ ~~ ~~ ~~ ~~ (12,44)
(13,78) (14,32) ~~ ~~ (17,11) (37,97) ~~
Element found: 97
Element not found
• The number obtained after removing the digits is used as the hash value. Note that
the digits at the same positions of k² must be used for all keys.
• Thus, the hash function is given below:
h(k) = s
where, s is obtained by deleting digits from both sides of k².
• For example, consider a hash table with N = 1000. The hash value of the key value
132437 can be calculated as follows:
1. The square of the key value is calculated: 132437² = 17539558969.
2. The hash value is obtained by taking the 5th, 6th and 7th digits counting from
the right, which gives 955.
• Let us suppose k is the index retrieved from the hash function. If the kth index is
already filled, then we look at (k+1) % M, then (k+2) % M, and so on. When we find a
free slot, we insert the object into that free slot.
• The table below shows the result of inserting the keys (5, 18, 55, 78, 35, 15) using the
hash function h(k) = k % 10 and the linear probing strategy.
Index  Empty  After 5  After 18  After 55  After 78  After 35  After 15
0                                                              15
1
2
3
4
5             5        5         5         5         5         5
6                                55        55        55        55
7                                                    35        35
8                      18        18        18        18        18
9                                          78        78        78
• Linear probing is easy to implement but it suffers from "primary clustering".
• When many keys are mapped to the same location (clustering), linear probing will not
distribute these keys evenly in the hash table. These keys will be stored in
neighborhood of the location where they are mapped. This will lead to clustering of
keys around the point of collision.
Example 1: Consider an example of hash table of size 20, and the following items are to be
stored. Item are in the (key, value) format. Here, we can search the next empty location in
the array by looking into the next cell until we find an empty cell. This technique is called
linear probing.
(1, 20)
(2, 70)
(42, 80)
(4, 25)
(12, 44)
(14, 32)
(17, 11)
(13, 78)
(37, 98)
Solution:
Sr. No.  Key   Hash           Array Index   Array Index after Linear Probing
1.       1     1 % 20 = 1     1             1
2.       2     2 % 20 = 2     2             2
3.       42    42 % 20 = 2    2             3
4.       4     4 % 20 = 4     4             4
5.       12    12 % 20 = 12   12            12
6.       14    14 % 20 = 14   14            14
7.       17    17 % 20 = 17   17            17
8.       13    13 % 20 = 13   13            13
9.       37    37 % 20 = 17   17            18
Example 2: A hash table of length 10 uses open addressing with hash function h(k)=k mod
10, and linear probing. After inserting 6 values into an empty hash table, the table is as
shown below:
0
1
2 42
3 23
4 34
5 52
6 46
7 33
8
9
What is a possible order in which the key values could have been inserted in the table?
Solution: 46, 34, 42, 23, 52, 33 is the sequence in which the key values could have been
inserted in the table.
How many different insertion sequences of the key values using the same hash function
and linear probing will result in the hash table shown above?
Solution: 30
In a valid insertion sequence, the elements 42, 23 and 34 must appear before 52 and 33,
and 46 must appear before 33.
Total number of different sequences = 3! × 5 = 30
In the above expression, 3! is for elements 42, 23 and 34 as they can appear in any order,
and 5 is for element 46 as it can appear at 5 different places.
4.5.1.3 Rehashing
• As the name suggests, rehashing means hashing again. Rehashing is a technique in
which the table is resized, i.e. the size of table is doubled by creating a new table.
• Several deletion operations are intermixed with insertion operations while
performing operations on hash table. Eventually, a situation arises when the hash
table becomes almost full.
• At this time, it might happen that the insert, delete, and search operations on the hash
table take too much time. The insert operation even fails in spite of performing open
addressing with quadratic probing for collision resolution. This condition/situation
indicates that the current space allocated to the hash table is not sufficient to
accommodate all the keys.
• A simple solution to this problem is rehashing, in which all the keys in the original
hash table are rehashed to a new hash table of larger size.
• The default size of the new hash table is twice as that of the original hash table. Once
the new hash table is created, a new hash value is computed for each key in the
original hash table and the keys are inserted into the new hash table.
• After this, the memory allocated to the original hash table is freed. The performance of
the hash table improves significantly after rehashing.
Fig. 4.3: Rehashing [the contents of the old table are transferred to a new table of double size]
4.5.2 Chaining
• In hashing, chaining is one collision resolution technique. Chaining is a possible
way to resolve collisions: each slot of the array contains a link to a singly-linked
list containing the key-value pairs with the same hash.
• Chaining allows storing the key elements with the same hash value into linked list as
shown in Fig. 4.4.
• Thus, each slot in the hash table contains a pointer to the head of the linked list of
all the elements that hash to the value h.
• All collisions are chained in the lists attached to the appropriate slot. This allows an
unlimited number of collisions to be handled and does not require a prior knowledge
of how many elements are contained in the collection.
• The tradeoff is the same as with linked lists versus array implementations of
collections: linked list overhead in space and to a lesser extent, in time.
Hash Address = h(k) = key % 307 + 1 = 4
Fig. 4.4: [Chaining: keys 122803 and 151354 both hash to address 4 and are linked in a chain of (key, value, next) nodes]
0 ofu
1 gcl
2 qur
3 clq
4 ecd
5 dim
6 aty
7 rhv
8 qrj
9 qsu
Fig. 4.5
• Coalesced hashing technique is effective, efficient, and very easy to implement.
• In separate chaining technique, a separate list of all elements mapped to the same
value is maintained. Separate chaining is based on collision avoidance.
• If memory space is tight, separate chaining should be avoided. Additional memory
space for links is wasted in storing address of linked elements.
• Hashing function should ensure even distribution of elements among buckets;
otherwise the timing behavior of most operations on hash table will deteriorate.
• Fig. 4.6 shows a separate chaining hash table.
List of elements:
0 → 10 → 50
2 → 12 → 32 → 62
4 → 4 → 24
7 → 7
9 → 9 → 69
Fig. 4.6: Separate chaining hash table
Example 1: The integers given below are to be inserted in a hash table with 5 locations
using chaining to resolve collisions. Construct the hash table using the simplest hash function.
1, 2, 3, 4, 5, 10, 21, 22, 33, 34, 15, 32, 31, 48, 49, 50
Solution: An element can be mapped to a location in the hash table using the mapping
function h(k) = k % 5.
0 5, 10, 15, 50
1 1, 21, 31
2 2, 22, 32
3 3, 33, 48
4 4, 34, 49
0 → 5 → 10 → 15 → 50
1 → 1 → 21 → 31
2 → 2 → 22 → 32
3 → 3 → 33 → 48
4 → 4 → 34 → 49
Fig. 4.7
Example 2: Consider the key values 20, 32, 41, 66, 72, 80, 105, 77, 56, 53 that need to be
hashed using the simple hash function h(k) = k mod 10. The keys 20 and 80 hash to index
0, key 41 hashes to index 1, keys 32 and 72 hash to index 2, key 53 hashes to index 3,
key 105 hashes to index 5, keys 66 and 56 hash to index 6 and finally the key 77 is
hashed to index 7. The collision is handled using the separate chaining technique,
as shown below:
0 → 80 → 20 → NULL
1 → 41 → NULL
2 → 72 → 32 → NULL
3 → 53 → NULL
4 → NULL
5 → 105 → NULL
6 → 56 → 66 → NULL
7 → 77 → NULL
8 → NULL
9 → NULL
Sr. No.  Separate Chaining                              Open Addressing
1.       Chaining is simpler to implement.              Open addressing requires more
                                                        computation.
2.       The hash table never fills up; we can          The table may become full.
         always add more elements to a chain.
3.       Chaining is less sensitive to the hash         Open addressing requires extra care
         function or load factor.                       to avoid clustering and a high load
                                                        factor.
4.       Chaining is mostly used when it is unknown     Open addressing is used when the
         how many and how frequently keys may be        frequency and number of keys is
         inserted or deleted.                           known.
5.       Cache performance of chaining is not good,     Open addressing provides better cache
         as keys are stored in a linked list.           performance, as everything is stored
                                                        in the same table.
6.       Space is wasted (some parts of the hash        A slot can be used even if no input
         table in chaining are never used).             maps to it.
7.       Chaining uses extra space for links.           No links are needed in open
                                                        addressing.
PRACTICE QUESTIONS
Q. I Multiple Choice Questions:
1. Which data structure is designed to use a special function called the hash
function?
(a) Hashing (b) Stacking
(c) Queueing (d) None of these
2. Which is a data structure that represents data in the form of key-value pairs?
(a) Hash function (b) Hash table
(c) Hashing (d) None of these
3. Which is any function that can be used to map a data set of an arbitrary size to a
data set of a fixed size, which falls into the hash table?
(a) Hash function (b) Hash table
(c) Hash values (d) None of these
8. (a) 9. (c) 10. (b) 11. (d) 12. (a) 13. (b)