DSC Unit-4


Graphs:

Graphs are mathematical structures that represent pairwise relationships between
objects. A graph can be viewed as a structure that shows how various objects are
related to one another.

It can be visualized by using the following two basic components:

 Vertices: Also called nodes, these are the fundamental components of any
graph. Nodes are entities whose relationships are expressed using edges. If
a graph comprises two nodes A and B and an undirected edge between them, then
it expresses a bidirectional relationship between the two nodes.
 Edges: Edges are the components used to represent the relationships
between various nodes in a graph. An edge between two nodes expresses a one-
way or two-way relationship between the nodes.

Formally, the edges form a finite set of pairs of the form (u, v). The pair is ordered in
the case of a directed graph (digraph), because (u, v) is not the same as (v, u). A pair of
the form (u, v) indicates that there is an edge from vertex u to vertex v. Each edge
may carry a weight/value/cost.

Application of Graphs:
Graphs are used to represent networks.
 The networks may include paths in a city or telephone network or circuit
network.
 Graphs are also used in social networks like LinkedIn and Facebook.
For example, in Facebook, each person is represented by a vertex (or node).
Each node is a structure containing information such as person id, name, gender
and locale.
 Program call graph and variable dependency graph.
 Course prerequisites.
Representation of Graphs:
 Represent vertices with circles, perhaps containing a label inside the circle.

 Represent edges with lines between circles

 Example:

 V = {A,B,C,D}

 E = {(A,B),(A,C),(A,D),(B,D),(C,D)}

Kinds of Graphs
Various flavors of graphs have the following specializations and particulars about
how they are usually drawn.

 Weighted or unweighted

 Directed or undirected

 Cyclic or acyclic
Weighted and Unweighted Graphs:
Weighted Graph:

 A weighted graph is an edge-labelled graph where the labels can be operated
on by the usual arithmetic and comparison operators (less than, greater than, etc.).
 Usually the weights are integers or floats.
 The idea is that some edges may be more (or less) expensive, and this cost is
represented by the edge label or weight.
 Example of a weighted graph: the weights are distances between cities.

Unweighted Graph: Edges have no weight.


 Edges simply show connections between the vertices/nodes.

 Examples for Unweighted Graphs: course prerequisites.


Directed and Undirected Graph:
Directed Graph:
A directed graph is a graph, i.e., a set of objects (called vertices or nodes) that are
connected together, where all the edges are directed from one vertex to another. A
directed graph is sometimes called a digraph or a directed network. In contrast, a
graph where the edges are bidirectional is called an undirected graph.
When drawing a directed graph, the edges are typically drawn as arrows indicating the
direction.
A complete graph in which each edge is bidirected is called a complete directed graph.
A directed graph having no symmetric pair of directed edges (i.e., no bidirected edges)
is called an oriented graph.
Ex:

Example: E = {(A,B), (A,C), (A,D), (B,C), (D,C)}


Constraints in digraph:
 In a digraph, an edge is an ordered pair

 Thus: (u,v) and (v,u) are not the same edge

 In the example, (D,C) ∈ E, (C,D) ∉ E

 What would edge (B,A) look like? Remember (A,B) ≠ (B,A)

 Here a node can have an edge to itself (e.g., (A,A) is valid)


Undirected Graphs:
An undirected graph is a graph, i.e., a set of objects (called vertices or nodes) that are
connected together, where all the edges are bidirectional. An undirected graph is
sometimes called an undirected network.
When drawing an undirected graph, the edges are typically drawn as lines between
pairs of nodes.

Cyclic Graphs:
A cyclic graph is a directed graph which contains a path from at least one node back to
itself. In simple terms, cyclic graphs contain a cycle.
A cyclic graph is a graph containing at least one graph cycle. A graph that is not cyclic
is said to be acyclic. A cyclic graph possessing exactly one (undirected, simple) cycle
is called a unicyclic graph.
Cyclic graphs are not trees.

Acyclic Graphs:
 An acyclic graph contains no cycles
 Example: Course prerequisites.

Important Points about Graphs:


 An undirected graph is connected if every pair of vertices has a path between them.
 Otherwise it is unconnected.

 An unconnected graph can be broken into connected components.

 A directed graph is strongly connected if every pair of vertices has a path
between them, in both directions.

 Complete graph: a graph in which every vertex is directly connected to every
other vertex.

 Degree of a Node
 The degree of a node is the number of edges the node is associated with.

In the example above:
Degree 2: B and C; Degree 3: A and D.
A and D have odd degree, and B and C have even degree.
Graph Representations:
Following two are the most commonly used representations of a graph.
1. Adjacency Matrix
2. Adjacency List

There are other representations also like, Incidence Matrix and Incidence List.
The choice of the graph representation is situation specific.
It totally depends on the type of operations to be performed and ease of use.

Adjacency Matrix:

Adjacency Matrix is a 2D array of size V x V where V is the number of vertices in a graph.


Let the 2D array be adj[][]; a slot adj[i][j] = 1 indicates that there is an edge from vertex i to
vertex j. The adjacency matrix of an undirected graph is always symmetric. An adjacency matrix
can also be used to represent weighted graphs: if adj[i][j] = w, then there is an edge from vertex i
to vertex j with weight w.

Example of an undirected graph and its adjacency matrix as follows:


Example of a Directed graph and its adjacency matrix as follows

Adjacency matrix:
Example of a weighted graph and its adjacency matrix as follows:

Pros: The representation is easy to implement and follow. Removing an edge takes O(1)
time. Queries such as whether there is an edge from vertex 'u' to vertex 'v' are efficient
and can be answered in O(1).
Cons: Consumes more space, O(V^2). Even if the graph is sparse (contains fewer edges),
it consumes the same space. Adding a vertex takes O(V^2) time.
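As an illustration (not part of the original notes), a minimal C sketch of the adjacency
matrix for the earlier example graph V = {A,B,C,D}, E = {(A,B),(A,C),(A,D),(B,D),(C,D)},
with vertices mapped to indices A=0 .. D=3:

#include <stdio.h>

#define V 4                                     /* vertices: A=0, B=1, C=2, D=3 */

int main(void)
{
    int adj[V][V] = {0};
    int edges[][2] = { {0,1}, {0,2}, {0,3}, {1,3}, {2,3} };   /* E from the example */
    int e = sizeof(edges) / sizeof(edges[0]);

    for (int k = 0; k < e; k++) {               /* undirected: mark both directions */
        adj[edges[k][0]][edges[k][1]] = 1;
        adj[edges[k][1]][edges[k][0]] = 1;
    }

    printf("Edge B-D? %d\n", adj[1][3]);        /* O(1) edge query */

    for (int i = 0; i < V; i++) {               /* print the symmetric matrix */
        for (int j = 0; j < V; j++)
            printf("%d ", adj[i][j]);
        printf("\n");
    }
    return 0;
}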

Adjacency List:
An array of lists is used. Size of the array is equal to the number of vertices. Let the
array be array[]. An entry array[i] represents the list of vertices adjacent to the ith
vertex. This representation can also be used to represent a weighted graph. The
weights of edges can be represented as lists of pairs. Following is adjacency list
representation of the above graph.

Array of Adjacency Lists Representation :


Array of Adjacency Lists of weighted graphs:
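For comparison, a minimal C sketch of an (unweighted) adjacency list for the same example
graph, built from simple linked nodes; the node structure and helper names are assumptions
made for this illustration:

#include <stdio.h>
#include <stdlib.h>

#define V 4                                     /* vertices: A=0, B=1, C=2, D=3 */

struct node { int vertex; struct node *next; };

struct node *adj[V];                            /* adj[i] heads the list of i's neighbours */

void add_edge(int u, int v)                     /* undirected: add v to u's list and u to v's */
{
    struct node *a = malloc(sizeof *a), *b = malloc(sizeof *b);
    a->vertex = v; a->next = adj[u]; adj[u] = a;
    b->vertex = u; b->next = adj[v]; adj[v] = b;
}

int main(void)
{
    add_edge(0,1); add_edge(0,2); add_edge(0,3); add_edge(1,3); add_edge(2,3);

    for (int i = 0; i < V; i++) {               /* print each vertex's adjacency list */
        printf("%c:", 'A' + i);
        for (struct node *p = adj[i]; p; p = p->next)
            printf(" %c", 'A' + p->vertex);
        printf("\n");
    }
    return 0;
}
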
Graph Traversal Methods:
Graph traversal is a technique used for searching for a vertex in a graph. Graph
traversal also decides the order in which the vertices are visited during the search process.

There are two graph traversal techniques and they are as follows...
 DFS (Depth First Search)
 BFS (Breadth First Search)

1.DFS:
DFS traversal of a graph produces a spanning tree as the final result. A spanning tree is a
subgraph that connects all the vertices and contains no cycles (loops). We use a Stack data
structure, with a maximum size equal to the total number of vertices in the graph, to
implement DFS traversal.

We use the following steps to implement DFS traversal...

 Step 1 - Define a Stack of size total number of vertices in the graph.


 Step 2 - Select any vertex as starting point for traversal. Visit that vertex and
push it on to the Stack.
 Step 3 - Visit any one of the non-visited adjacent vertices of a vertex which is
at the top of stack and push it on to the stack.
 Step 4 - Repeat step 3 until there is no new vertex to be visited from the vertex
which is at the top of the stack.
 Step 5 - When there is no new vertex to visit, use backtracking and pop
one vertex from the stack.
 Step 6 - Repeat steps 3, 4 and 5 until stack becomes Empty.
 Step 7 - When stack becomes Empty, then produce final spanning tree by
removing unused edges from the graph

Note: Backtracking means coming back to the vertex from which we reached the current
vertex.
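A minimal iterative DFS sketch in C that follows the steps above, using an explicit stack
over an adjacency matrix; the example graph and the 0-based vertex numbering are assumptions
made for this illustration:

#include <stdio.h>

#define V 5

void dfs(int adj[V][V], int start)
{
    int visited[V] = {0};
    int stack[V], top = -1;

    visited[start] = 1;                       /* Step 2: visit the start vertex and push it */
    stack[++top] = start;
    printf("%d ", start);

    while (top >= 0) {
        int u = stack[top], found = 0;
        for (int v = 0; v < V; v++) {         /* Step 3: any unvisited neighbour of the top */
            if (adj[u][v] && !visited[v]) {
                visited[v] = 1;
                stack[++top] = v;
                printf("%d ", v);
                found = 1;
                break;
            }
        }
        if (!found)                           /* Step 5: backtrack by popping */
            top--;
    }
}

int main(void)
{
    int adj[V][V] = {                         /* small undirected graph, made up for the demo */
        {0,1,1,0,0},
        {1,0,0,1,0},
        {1,0,0,1,0},
        {0,1,1,0,1},
        {0,0,0,1,0}
    };
    dfs(adj, 0);
    printf("\n");
    return 0;
}
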
Example of DFS:

2.BFS:
Breadth first search is a graph traversal algorithm that starts traversing the graph from
a root node and explores all the neighbouring nodes. Then it selects the nearest node
and explores all of its unexplored neighbours. The algorithm follows the same process for
each of the nearest nodes until it finds the goal, ensuring that no node is visited twice.
We use a Queue data structure, with a maximum size equal to the total number of vertices
in the graph, to implement BFS traversal.
We use the following steps to implement BFS traversal...
Step 1 - Define a Queue of size total number of vertices in the graph.
Step 2 - Select any vertex as starting point for traversal. Visit that vertex and enqueue
it.
Step 3 - Visit all the non-visited adjacent vertices of the vertex which is at front of the
Queue and insert them into the Queue.
Step 4 - When there is no new vertex to be visited from the vertex which is at front of
the Queue then dequeue that vertex.
Step 5 - Repeat steps 3 and 4 until queue becomes empty.
Step 6 - When queue becomes empty, then produce final spanning tree by removing
unused edges from the graph
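A minimal BFS sketch in C that follows the steps above, using a simple array-based queue;
the example graph is again an assumption made for this illustration:

#include <stdio.h>

#define V 5

void bfs(int adj[V][V], int start)
{
    int visited[V] = {0};
    int queue[V], front = 0, rear = 0;

    visited[start] = 1;                      /* Step 2: visit the start vertex and enqueue it */
    queue[rear++] = start;
    printf("%d ", start);

    while (front < rear) {
        int u = queue[front];
        for (int v = 0; v < V; v++) {        /* Step 3: enqueue all unvisited neighbours */
            if (adj[u][v] && !visited[v]) {
                visited[v] = 1;
                queue[rear++] = v;
                printf("%d ", v);
            }
        }
        front++;                             /* Step 4: dequeue once u has no new neighbours */
    }
}

int main(void)
{
    int adj[V][V] = {                        /* small undirected graph, made up for the demo */
        {0,1,1,0,0},
        {1,0,0,1,0},
        {1,0,0,1,0},
        {0,1,1,0,1},
        {0,0,0,1,0}
    };
    bfs(adj, 0);
    printf("\n");
    return 0;
}
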
Example for BFS:

Sorting:
Before learning about sorting, we will review some important points about trees.
Types of binary trees :
Full binary tree: Every node other than leaf nodes has 2 child nodes (Every node has
0 or 2 children) .
Complete binary tree: All levels are filled except possibly the last one, and all nodes
are filled in as far left as possible.
Perfect binary tree: All nodes have two children and all leaves are at the same level.

No. of Nodes = 2^(d+1) – 1
No. of Leaf Nodes = 2^d
where d is the depth of the tree.
(For example, a perfect binary tree of depth d = 2 has 2^3 – 1 = 7 nodes and 2^2 = 4 leaves.)
Heap:
A heap is a data structure that stores a collection of objects (with keys), and has the
following properties:
 Complete Binary tree.
 Heap Order - Max Heap and Min Heap.
Heap order property:
 Min Heap: for every node v, other than the root, the key stored in v is greater than or
equal to the key stored in the parent of v.
 A min heap yields data in ascending order.

 Max Heap: for every node v, other than the root, the key stored in v is smaller than or
equal to the key stored in the parent of v.

 A max heap yields data in descending order.

Insertion in Heap:
Given a Binary Heap and a new element to be added to this Heap. The task is to insert the new
element to the Heap maintaining the properties of Heap.

 Algorithm
1. First increase the heap size by 1, so that it can store the new element.
2. Add the new element to the next available position at the lowest level or end
of the heap.

3. Restore the max-heap property if it is violated.

 The general strategy is to percolate up (or bubble up): if the parent of the
element is smaller than the element, interchange the parent and the child, then
repeat the check at the parent's position and swap again if the property is violated.

OR

Restore the min-heap property if it is violated.

 The general strategy is to percolate up (or bubble up): if the parent of the
element is larger than the element, interchange the parent and the child, then
repeat the check at the parent's position and swap again if the property is violated.
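A minimal sketch of max-heap insertion (percolate up) in C, assuming a 0-based array heap[]
and an element count *n kept by the caller; both names are assumptions for illustration:

void insert_max_heap(int heap[], int *n, int key)
{
    int i = (*n)++;                 /* steps 1-2: grow the heap, place key at the end */
    heap[i] = key;

    while (i > 0) {                 /* step 3: bubble up while the parent is smaller */
        int parent = (i - 1) / 2;
        if (heap[parent] >= heap[i])
            break;
        int tmp = heap[parent];     /* swap the child with its parent */
        heap[parent] = heap[i];
        heap[i] = tmp;
        i = parent;
    }
}
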
Example1:

Example2:

Deletion:

The standard deletion operation on Heap is to delete the element present at the root node of the
Heap.

1. Delete Max Heap

 Copy the last number to the root ( overwrite the maximum element stored there ).

 Restore the max heap property by percolate down.

2. Delete Min Heap

 Copy the last number to the root ( overwrite the minimum element stored there ).

 Restore the min heap property by percolate down.

Since deleting an element at any intermediate position in the heap can be costly, we can
simply replace the element to be deleted with the last element and then delete the last
element of the Heap.
 Replace the root (or the element to be deleted) with the last element.
 Delete the last element from the Heap.
 Since the last element is now placed at the position of the root node, it may not
follow the heap property. Therefore, heapify the node now placed at the root
position.
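A minimal delete-max sketch in C (0-based array), assuming a percolate-down heapify()
routine with the signature used later in these notes:

int delete_max(int heap[], int *n)
{
    int max = heap[0];              /* the root holds the maximum */
    heap[0] = heap[--(*n)];         /* replace the root with the last element, shrink the heap */
    heapify(heap, *n, 0);           /* restore the max-heap property by percolating down */
    return max;
}
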
Example For Deletion:

Heap Sort:

Heap sort is a comparison-based sorting technique based on the Binary Heap data structure. It is
similar to selection sort, where we first find the maximum element and place it at the end. We
repeat the same process for the remaining elements.

A Binary Heap is a Complete Binary Tree where items are stored in a special order such that the
value in a parent node is greater (or smaller) than the values in its two children. The former is
called a max heap and the latter a min heap. The heap can be represented by a binary tree or an array.

Since a Binary Heap is a Complete Binary Tree, it can be easily represented as array and array based
representation is space efficient.
The root of the tree is A[0]. Given the index i of a node (with n the array size), the indices of
its parent, left child and right child can be computed as:
 
PARENT (i )
    return (i+1)/2 -1
LEFT (i )
        return 2i +1
RIGHT (i )
        return 2i + 2
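The same index arithmetic written as small C helper functions for a 0-based array (an
illustrative aside, not part of the original notes); for example, with i = 4 the parent is 1,
the left child 9, and the right child 10:

int parent(int i) { return (i + 1) / 2 - 1; }   /* equivalently (i - 1) / 2 */
int left(int i)   { return 2 * i + 1; }
int right(int i)  { return 2 * i + 2; }
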
Procedures on Heap :

 Heapify – modifies the heap according to the properties of max heap or min heap.
 Build Heap – building a new heap with values.

Heapify :
Starting from a complete binary tree, we can modify it to become a Max-Heap by running a function
called heapify on all the non-leaf elements of the heap.

Heapify picks the largest child key and compares it to the parent key. If the parent key is
larger, heapify quits; otherwise it swaps the parent key with the largest child key, so that the
parent now becomes larger than its children, and repeats the process at the affected child.

Ex Algorithm:
Heapify(A, i)
{
        l ← left(i)
        r ← right(i)
        if l <= heapsize[A] and A[l] > A[i]
            then largest ← l
            else largest ← i
        if r <= heapsize[A] and A[r] > A[largest]
            then largest ← r
        if largest != i
            then swap A[i] ↔ A[largest]
                Heapify(A, largest)
 }

Build Heap:

We can use the procedure 'Heapify' in a bottom-up fashion to convert an array A[1 . . n] into a heap.
Since the elements in the subarray A[n/2 + 1 . . n] are all leaves, the procedure Build_Heap goes
through the remaining nodes of the tree and runs 'Heapify' on each one. The bottom-up order of
processing nodes guarantees that the subtrees rooted at a node's children are already heaps before
'Heapify' is run at that node.

Ex Algorithm:

Buildheap(A)

{
        heapsize[A] ← length[A]
        for i ← ⌊length[A]/2⌋ down to 1
            do Heapify(A, i)
 }
Heap Sort Algorithm :

The heap sort algorithm starts by using the procedure BUILD-HEAP to build a heap on the input
array A[1 . . n].
Since the maximum element of the array is stored at the root A[1], it can be put into its correct
final position by exchanging it with A[n] (the last element in A).
If we now discard node n from the heap, the remaining elements can be made into a heap. Note
that the new element at the root may violate the heap property, and all that is needed to restore
it is a call to heapify.

Ex Algorithm:
Heapsort(A)
{
        Buildheap(A)
        for i ← length[A] down to 2
            do swap A[1] ↔ A[i]
               heapsize[A] ← heapsize[A] - 1
               Heapify(A, 1)
}

Example :
Complexity Analysis of Heap Sort:
 Time complexity of heapify is O(Logn).
 Time complexity of createAndBuildHeap() is O(n).
 Worst Case Time Complexity: O(n*log n)
 Best Case Time Complexity: O(n*log n)
 Average Time Complexity: O(n*log n)
 Space Complexity : O(1)
 Heap sort is not a Stable sort, and requires a constant space for sorting a list.
 Heap Sort is very fast and is widely used for sorting.

Example Code for functions of Heap Sort :

void heapify(int arr[], int size, int i);       /* defined below */

void heapSort(int arr[], int size)
{
    int i, temp;

    for (i = size / 2 - 1; i >= 0; i--)         /* build the max heap: start from the last node's parent */
        heapify(arr, size, i);

    for (i = size - 1; i >= 1; i--)             /* repeatedly move the current maximum to the end */
    {
        temp = arr[0];                          /* swap the root (maximum) with the last heap element */
        arr[0] = arr[i];
        arr[i] = temp;
        heapify(arr, i, 0);                     /* restore the max-heap property on the reduced heap */
    }
}

void heapify(int arr[], int size, int i)
{
    int largest = i;                            /* assume the parent is the largest */
    int left = 2 * i + 1;
    int right = 2 * i + 2;
    int temp;

    if (left < size && arr[left] > arr[largest])
        largest = left;

    if (right < size && arr[right] > arr[largest])
        largest = right;

    if (largest != i)                           /* a child is larger: swap and continue downwards */
    {
        temp = arr[i];
        arr[i] = arr[largest];
        arr[largest] = temp;
        heapify(arr, size, largest);
    }
}
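A short, hypothetical driver showing how the two functions above could be called; the array
contents are arbitrary:

#include <stdio.h>

int main(void)
{
    int arr[] = {4, 10, 3, 5, 1};
    int n = sizeof(arr) / sizeof(arr[0]);

    heapSort(arr, n);                           /* sorts in ascending order */

    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);                  /* expected output: 1 3 4 5 10 */
    printf("\n");
    return 0;
}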

Merge Sort:

Merge Sort is a Divide and Conquer algorithm. It divides the input array into two halves,
calls itself for the two halves and then merges the two sorted halves. The merge()
function is used for merging the two halves. The call merge(arr, l, m, r) is the key process: it
assumes that arr[l..m] and arr[m+1..r] are sorted and merges the two sorted sub-arrays
into one.
With worst-case time complexity being Ο(n log n), it is one of the most respected
algorithms.
The top-down merge sort approach is the methodology which
uses recursion mechanism.
“Divide and Conquer” :
 Very important strategy in computer science:
 Divide problem into smaller parts
 Independently solve the parts
 Combine these solutions to get overall solution

Recursive Algorithm:
 Downward pass over the recursion tree.
 Divide large instances into small ones.
 Upward pass over the recursion tree.
 Merge pairs of sorted lists.
 Number of leaf nodes is n.
 Number of non leaf nodes is n-1.
Ex:
Iterative Algorithm:
Iterative MergeSort doesn’t require an explicit auxiliary stack.
It eliminates the downward pass.
Start with sorted lists of size 1 and do pairwise merging of these sorted lists, as in the
upward pass.

Operation of the bottom-up merge sort algorithm:


 The bottom-up merge sort algorithm first merges pairs of adjacent
arrays of 1 element each
 Then merges pairs of adjacent arrays of 2 elements
 And next merges pairs of adjacent arrays of 4 elements
 And so on... Until the whole array is merged
Ex:

Important functions in MergeSort:


void merge(int arr[], int beg, int mid, int end)
{
    int temp[end - beg + 1];                    /* temporary buffer for the merged run */
    int i = beg, j = mid + 1, index = 0;

    while (i <= mid && j <= end)                /* pick the smaller of the two front elements */
        temp[index++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];

    while (i <= mid)                            /* copy any remaining left-half elements */
        temp[index++] = arr[i++];
    while (j <= end)                            /* copy any remaining right-half elements */
        temp[index++] = arr[j++];

    for (i = 0; i < index; i++)                 /* copy the merged run back into arr */
        arr[beg + i] = temp[i];
}

void merge_sort(int arr[], int beg, int end)
{
    if (beg < end)
    {
        int mid = (beg + end) / 2;
        merge_sort(arr, beg, mid);              /* sort the left half */
        merge_sort(arr, mid + 1, end);          /* sort the right half */
        merge(arr, beg, mid, end);              /* merge the two sorted halves */
    }
}
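A short, hypothetical driver for the functions above; the array contents are arbitrary:

#include <stdio.h>

int main(void)
{
    int arr[] = {38, 27, 43, 3, 9, 82, 10};
    int n = sizeof(arr) / sizeof(arr[0]);

    merge_sort(arr, 0, n - 1);                  /* sort the whole array */

    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);                  /* expected output: 3 9 10 27 38 43 82 */
    printf("\n");
    return 0;
}
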
Time Complexity: Merge Sort is a recursive algorithm and its time complexity can be expressed
as the following recurrence relation:

T(n) = 2T(n/2) + Θ(n)

The above recurrence can be solved either using the Recurrence Tree method or the Master
method. It falls in case II of the Master method, and the solution of the recurrence
is Θ(n log n).

Time complexity of Merge Sort is Θ(n log n) in all 3 cases (worst, average and best),
as merge sort always divides the array into two halves and takes linear time to merge the two
halves.
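As a brief sketch of why this recurrence solves to Θ(n log n), expand it level by level
(the recursion-tree argument, with c a constant for the linear merge cost):

\begin{align*}
T(n) &= 2T(n/2) + cn \\
     &= 4T(n/4) + 2cn \\
     &= 2^k\,T\!\left(n/2^k\right) + k\,cn && \text{after $k$ levels of expansion} \\
     &= n\,T(1) + cn\log_2 n               && \text{choosing $k=\log_2 n$ so that $n/2^k = 1$} \\
     &= \Theta(n \log n).
\end{align*}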

Auxiliary Space: O(n)
Applications of Merge Sort:
 Merge Sort is useful for sorting linked lists in O(n log n) time. For linked lists
the situation is different from arrays, mainly due to the difference in memory
allocation: unlike arrays, linked list nodes may not be adjacent in memory. Unlike
an array, in a linked list we can insert items in the middle in O(1) extra space and
O(1) time. Therefore the merge operation of merge sort can be implemented without
extra space for linked lists.
 In arrays, we can do random access as elements are contiguous in memory. Let
us say we have an integer (4-byte) array A and let the address of A[0] be x then
to access A[i], we can directly access the memory at (x + i*4). Unlike arrays,
we can not do random access in the linked list. Quick Sort requires a lot of this
kind of access. In a linked list, to access the i’th index we have to travel through each and
every node from the head to the i’th node, as we don’t have a continuous block of
memory. Therefore, the overhead increases for quicksort. Merge sort accesses
data sequentially, and the need for random access is low.
 Inversion Count Problem
 Used in External Sorting

External Sorting :
 External sorting - sorting algorithms that can handle massive amounts of data.
 It is required when the data to be sorted do not fit into the main memory (RAM)
and instead they must reside in the slower external memory (hard disk).
 It typically uses a hybrid sort-merge strategy.
 In the sorting phase, chunks of data small enough to fit in main memory are
read, sorted, and written out to a temporary file.
 In the merge phase, the sorted sub-files (runs) are combined into a single larger
file (a single run).
Note: a run is a sorted subfile.
Example:
File: 4500 records, A1, …, A4500
internal memory: 750 records (3 blocks)
block length: 250 records
input disk vs. scratch pad (disk)
 File – 4500 records to be sorted present on Hard Disk
 RAM- 750 records (3 blocks i.e. block size=250 recs)
 No. of Chunks = 4500/750 = 6 each of 3 blocks
(Block- data unit transferred from/to hard disk)

Steps :

 Internally sort 3 blocks at a time (i.e. 750 records) to obtain 6 runs R1 to R6, written
onto the scratch disk. The sort method can be heap sort, merge sort or quick sort.
 Out of the 3 blocks of internal memory, 2 blocks will be used as input buffers
(to read runs) and 1 block as the output buffer (for the merged output).
 When the output buffer gets full, it is written onto disk; when an input buffer gets
empty it is refilled with another block from the same run.
 Runs are then merged pairwise: R1 with R2, next R3 with R4, and so on (similar to
iterative merge sort), until a single sorted run remains.

Two-Way Merge Sort or a binary merge:
Two-way merge sort is a technique which works in two stages, as follows:
Stage 1: First break the records into blocks and sort the individual blocks with the
help of two input tapes.
Stage 2: Merge the sorted blocks to create a single sorted file with the help of two
output tapes.
By this, it can be said that two-way merge sort uses two input tapes and two output
tapes for sorting the data.
No. of merge passes = ⌈log_2 m⌉, where m is the number of runs. Ex: m = 16, passes = 4.
Fig: 2 Way Merging
K – Way Merging:
The k-way merge problem consists of merging k sorted arrays to produce a single
sorted array with the same elements. Denote by n the total number of elements. n is
equal to the size of the output array and the sum of the sizes of the k input arrays.
The problem can be solved in O(n log k) running time with O(n) space. Several
algorithms that achieve this running time exist.
 k-way merging with m runs
No. of merge passes = ⌈log_k m⌉
Ex: m = 16, k = 4, passes = 2
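A small, hypothetical C helper that reproduces these pass counts by simulating k-way
merging of m runs:

#include <stdio.h>

/* Number of merge passes needed to reduce m initial runs to a single run
   with k-way merging, i.e. ceil(log_k(m)), computed by simulation. */
int passes(int m, int k)
{
    int p = 0;
    while (m > 1) {
        m = (m + k - 1) / k;        /* each pass merges groups of k runs (ceiling division) */
        p++;
    }
    return p;
}

int main(void)
{
    printf("2-way, 16 runs: %d passes\n", passes(16, 2));   /* prints 4 */
    printf("4-way, 16 runs: %d passes\n", passes(16, 4));   /* prints 2 */
    return 0;
}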
