Summary of C++ Data Structures: Part 0: Review
Wayne Goddard
School of Computing, Clemson University, 2018
Part 0: Review
1 Basics of C++
2 Basics of Classes
3 Program Development
CpSc2120 – Goddard – Notes Chapter 1
Basics of C++
1.1 Summary
C++ is an extension of C. So the simplest program just has a main function. The
program is compiled on our system with g++, which by default produces an executable
a.out that is run from the current directory.
C++ has for, while and do for loops, and if and switch for conditionals. Standard
output is accessed through cout and standard input through cin. These require
inclusion of the iostream library. The language is case-sensitive.
These integer-types also come in unsigned versions. We will not use these much. But
do note that arithmetic with unsigned data types is different. For example the code
1.3 Arrays
An array in C++ is declared to hold a specific number of objects of the same type.
The valid indices run from 0 up to one less than the size of the array. No checking is
done at run time for references outside the bounds of the array. Arrays can be initialized
at declaration.
1.4 Functions
A function is a self-standing piece of code. It can return a value of a specified
type, or have type void. It can have arguments of specific types. In general, variables
are passed by value, which means that the function receives a copy of the variable.
This is inefficient for large objects, so these are usually passed by address (as
automatically occurs for arrays) or by reference (discussed later).
To aid the compiler, a prototype of a function at the start of a program tells the
compiler of the existence of such a function: it specifies the name, arguments, and type
of the function. The actual names of the arguments are optional, but recommended.
1.5 Pointers
A pointer stores an address. A pointer has a type: this indicates the type of object
stored at the address to which the pointer points. A pointer is declared using the *, and
is dereferenced with the * too. An array name is equivalent to a pointer to the start of
that array. Primitive arithmetic can be applied to pointers. To indicate that a pointer
points to nothing, it is set equal to nullptr.
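As a small illustration of these points, here is a sketch that walks a pointer over an array and tests for nullptr. The function names are made up for this example.

```cpp
// Sums an array by walking a pointer from the first element
// to one past the last element.
int sumWithPointer(const int *start, int size) {
    int total = 0;
    for (const int *p = start; p != start + size; ++p)
        total += *p;            // dereference: read the int that p points to
    return total;
}

// Returns true if the pointer points to nothing.
bool pointsToNothing(const int *p) {
    return p == nullptr;
}
```

Because an array name acts as a pointer to the start of the array, a call such as sumWithPointer(A, 3) passes the address of A[0].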
1.6 Strings
There are two options for storing strings in C++. The first is the way it is done in C, now
called C-strings. A C-string is stored as a sequence of chars, terminated by the null
character (which is denoted '\0' and has value 0 as an int). The user must ensure
that the null terminator remains present. Constant strings defined by the user using
quotation marks are automatically C-strings. C-strings can be cin-ed and cout-ed, and
with the cstring library they can be compared, copied, appended, and several other things.
C-strings are passed to functions by address: that is, by supplying the address of the first
character using the array name or a char pointer.
We will mostly use objects of the string class provided in the string library.
These can be compared, cin-ed and cout-ed, assigned a C-string, appended, etc.
Sample Code
The first example code prints out the prime numbers less than 100. We will explain
the use of namespaces later.
In the second example code, the binarySearch function searches a sorted array for
a specific value. It returns the index if it finds the value, and -1 otherwise.
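A minimal sketch of such a function is given here. It is an illustration under the description above, not necessarily the code in BinarySearch.cpp; the parameter names are assumptions.

```cpp
// Searches the sorted array data[0..size-1] for target.
// Returns the index of target if found, and -1 otherwise.
int binarySearch(const int data[], int size, int target) {
    int low = 0;
    int high = size - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   // midpoint without overflow
        if (data[mid] == target)
            return mid;
        else if (data[mid] < target)
            low = mid + 1;                  // target can only be to the right
        else
            high = mid - 1;                 // target can only be to the left
    }
    return -1;                              // range is empty: not found
}
```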
primality.cpp
BinarySearch.cpp
CpSc2120 – Goddard – Notes Chapter 2
Basics of Classes
2.1 Objects
An object is a particular instance of a class and there may be multiple instances of
a class. An object has
object.method();
The code
MyString puppet;
• accessor functions: these allow the user to get data from the object.
• mutator functions: these allow the user to set data in the object.
2.3 Constructors
A constructor is a special function that initializes the state of the object; it has the
same name as the class, but does not have a return type. There can be more than one
constructor. Note that the compiler will provide a default no-argument constructor if
none is coded. Some constructor is always executed when an object is created.
2.4 Why Objects?
Object-oriented programming rests on the three basic principles of encapsulation:
OOP uses the idea of classes. A class is a structure which houses data together
with operations that act on that data. We strive for loose coupling: each class
is largely independent and communicates with other classes via a small well-defined
interface. We strive for cohesion: each class performs one and only one task (for
readability, reuse).
We strive for responsibility-driven design: each class should be responsible for
its own data. You should ask yourself: What does the class need to know? What does
it do?
The power of OOP also comes from two further principles which we will discuss
later:
• Inheritance: classes inherit properties from other classes (which allows partial
code reuse)
Sample Code
Below is a sample class and a main function. But note that there are several style
problems with it, some of which we will fix later. The output is
Citizen.cpp
CpSc2120 – Goddard – Notes Chapter 3
Program Development
3.1 Testing
One needs to test extensively. Start by trying some standard simple data. Look at
the boundary values: make sure it handles the smallest or largest value the program
must work for, and suitably rejects the value just out of range. Add watches or debug
statements so that you know what is happening at all times. Especially look at the
empty case, or the 0 input.
3.4 Algorithms
An algorithm for a problem is a recipe that:
(a) is correct,
(b) is concrete,
(c) is unambiguous,
(d) has a finite description, and
(e) terminates.
Having found an algorithm, one should look for an efficient algorithm. As Shaffer
writes: “First tune the algorithm, then tune the code.”
Part 1: Fundamentals
4 More about Classes, Files and I/O
5 Standard Class Methods
6 Algorithmic Analysis
7 Recursion
CpSc2120 – Goddard – Notes Chapter 4
More about Classes, Files and I/O
variable in the calling function. Pass by value is inefficient if the object is large.
Pass-by-reference/address provides access to the original variable; changing the
variable in the function does affect the variable in the calling function.
In C, pass-by-reference/address is achieved by pointers. This is still used in C++.
For example, we saw that arrays are implicitly passed this way.
C++ introduced the idea of references or aliases. This allows a method direct
access to an object (“sharing”) but uses a different syntax. An ampersand & indicates
an object passed by sharing; inside the function it is treated as if it were a local variable.
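The three ways of passing an argument can be compared side by side. This is only a sketch, and the function names are made up for the illustration.

```cpp
// Pass by value: the function works on a copy, so the
// caller's variable is unchanged.
void incrementCopy(int n) { n = n + 1; }

// Pass by address (C style): the caller supplies &x and the
// function dereferences the pointer.
void incrementViaPointer(int *n) { *n = *n + 1; }

// Pass by reference (C++): n is an alias for the caller's variable,
// but is used inside the function like a local variable.
void incrementViaReference(int &n) { n = n + 1; }
```

After int x = 5; the call incrementCopy(x) leaves x at 5, while incrementViaPointer(&x) and incrementViaReference(x) each add 1 to x.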
// file myHeader.h
#ifndef MYHEADER_H
#define MYHEADER_H
... code as before ...
#endif
4.6 Const’s
The const modifier has several different uses in C++. One can indicate that a variable
does not change by putting a const before it:
One can indicate that an argument is not changed by a function, by putting a const
before it:
One can indicate that the method does not change the object on which it is invoked
by putting a const after the parameter list:
One can of course use both. The compiler will try to check that the claims are correct.
But it cannot guarantee them, since one can create a pointer and get it to point to a
variable.
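The three uses of const described above can be sketched together in one place. The constant MAX_SIZE and the class Label are hypothetical names chosen for this illustration.

```cpp
#include <string>

const int MAX_SIZE = 100;     // const before a variable: it does not change

class Label {
public:
    Label(const std::string &text) : text_(text) { }

    // const after the parameter list: the method does not change
    // the object on which it is invoked
    int length() const { return static_cast<int>(text_.size()); }

    // both uses: other is not changed, and neither is this object
    bool sameAs(const Label &other) const { return text_ == other.text_; }

private:
    std::string text_;
};
```

Only const methods may be called on a const object, so the compiler would reject a call to a non-const method on something declared as const Label.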
class Foo {
public:
    Foo( ) : Bar(1) , ging('d')   // no-argument constructor
    { }
private:
    Iso Bar;
    char ging;
};
4.8 Libraries
Mathematical functions are available in the library cmath. Note that angles are
represented in radians.
A namespace is a context for a set of identifiers. By writing std::string, we
say to use the string from that namespace. A way to avoid writing the namespace
every time is to use a using directive (such as using namespace std;). But the user can
create a class with the same name as one in std, and then the compiler doesn't know
which to use.
4.9 More on Output
Including <iomanip> allows one to format stream output using what are called
manipulators. For example, setw() sets the minimum size of the output field, setprecision()
sets the precision, and fixed ensures that a fixed number of decimal places are displayed.
For example
double A = 4.999;
cout << setprecision(2) << showpoint;
cout << A;
produces 5.0 as output.
Sample Code
Consider a revision to our Citizen class. This is compiled by
g++ CitizenToo.cpp TestCitizenToo.cpp
CitizenToo.cpp
CitizenToo.h
TestCitizenToo.cpp
CpSc2120 – Goddard – Notes Chapter 5
If the user has some code where two fractions are added, e.g. A+B, then this member
function is called on A, with B as the argument. That is, the compiler changes A+B
to A.operator+(B) ; In the actual code for the function, the data members of the first
fraction are accessed directly; those of the second are accessed with other. notation.
class Foo {
    int bar;
public:   // the operator must be public to be usable from the calling function
    bool operator== ( const Foo &other ) const
    {
        return (bar == other.bar);
    }
};
Most binary operators are left-to-right associative. It follows that when in the
calling function we have
Foo X,Y;
if( X==Y )
5.3 Inputting or Outputting a Class
Output of a class can be achieved by overloading the stream insertion operator <<.
This is usually a separate global function (that is, not a member function). In order
to access the private variables of your class, you usually need to make it a friend of
your class (by adding its prototype inside the class).
class Foo {
private:
    int bar1,bar2;
    friend ostream &operator<< (ostream &, const Foo &);
};
Note that the arguments are passed by reference, and the stream itself is returned by
reference (so that the operator works with successive <<).
One can use the same approach to read an object from the user. Usually the user
data is read into a string and then parsed internally. This is to handle malformed data
without crashing.
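To make the output side concrete, here is a sketch of a complete friend operator<< for a small class. The class Pair, its members and the helper toText are assumptions made for this example.

```cpp
#include <iostream>
#include <sstream>
#include <string>

class Pair {
public:
    Pair(int first, int second) : first_(first), second_(second) { }
private:
    int first_, second_;
    // the prototype inside the class makes the global function a friend
    friend std::ostream &operator<<(std::ostream &out, const Pair &p);
};

// Global (non-member) function. It returns the stream by reference
// so that the operator works with successive <<.
std::ostream &operator<<(std::ostream &out, const Pair &p) {
    out << "(" << p.first_ << ", " << p.second_ << ")";
    return out;
}

// The same operator works on any output stream, including a string stream.
std::string toText(const Pair &p) {
    std::ostringstream buffer;
    buffer << p;
    return buffer.str();
}
```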
class Foo
{
private:
    Bar *barPtr;
public:
    Foo( const Foo &other ) {
        barPtr = new Bar( *(other.barPtr) );
    }
};
You should assume the deep copy is required unless otherwise specified.
Note that the code A=B uses the assignment operator. There is a fundamental trio:
either the default copy constructor, assignment operator and destructor are
all okay, or you need to provide all three.
We see how to create an assignment operator next.
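As a hedged sketch of the trio, consider a hypothetical class IntHolder that owns a single int on the heap. The class and its member names are invented for this illustration.

```cpp
// A class that owns heap memory, so it provides all three of
// copy constructor, assignment operator and destructor.
class IntHolder {
public:
    explicit IntHolder(int value) : ptr_(new int(value)) { }

    // copy constructor: deep copy into newly allocated memory
    IntHolder(const IntHolder &other) : ptr_(new int(*other.ptr_)) { }

    // assignment operator: skip self-assignment, then copy the value;
    // returns *this by reference so that A=B=C works
    IntHolder &operator=(const IntHolder &other) {
        if (this != &other)
            *ptr_ = *other.ptr_;
        return *this;
    }

    // destructor: release the memory this object owns
    ~IntHolder() { delete ptr_; }

    int get() const { return *ptr_; }
    void set(int value) { *ptr_ = value; }

private:
    int *ptr_;
};
```

With the compiler-provided defaults instead, two IntHolder objects would share one int after a copy and both destructors would try to delete it.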
Sample Code
We create a class called Fraction. Note that the fraction is stored in simplest form.
In what follows we have first the header file Fraction.h, then the implementation file
Fraction.cpp, and then a sample program that uses the class TestFraction.cpp.
Fraction.h
Fraction.cpp
TestFraction.cpp
CpSc2120 – Goddard – Notes Chapter 6
Algorithmic Analysis
So 5n is O(n^2) but n^2 is not O(5n). Note that constants do not matter; saying f is
O(√n) is the same thing as saying f is O(√(22n)).
The order (or growth rate) of a function is the simplest smallest function that it
is O of. It ignores coefficients and everything except the dominant term.
Example. Some would say f(n) = 2n^2 + 3n + 1 is O(n^3) and O(n^2). But
its order is n^2.
(Check!)
6.3 Combining Functions
• Add. If T1(n) is O(f(n)) and T2(n) is O(g(n)), then
T1(n) + T2(n) is max(O(f(n)), O(g(n))).
That is, when you add, the larger order takes over.
6.4 Logarithms
The log base 2 of a number is how many times you need to multiply 2 together to get
that number. That is, log n = L ⇐⇒ 2^L = n. Unless otherwise specified, computer
science log is always base 2. So it gives the number of bits. The function log n grows
forever, but it grows (much) slower than any power of n.
Example. A sequence of positive integers is a radio sequence if two integers
of the same value are at least that many places apart. Meaning, two
1s cannot be consecutive; two 2s must have at least 2 integers between
them; etc. Here is a test of this: this method is quadratic.
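One possible quadratic test compares every pair of positions. This sketch assumes the reading given by the examples above: two entries with the same value v need at least v other entries between them. The function name is made up.

```cpp
#include <cstddef>
#include <vector>

// Checks all pairs (i, j) with i < j, so the running time is quadratic.
// Assumption: equal values v must have at least v entries between them.
bool isRadioSequence(const std::vector<int> &seq) {
    for (std::size_t i = 0; i < seq.size(); ++i) {
        for (std::size_t j = i + 1; j < seq.size(); ++j) {
            std::size_t between = j - i - 1;   // entries strictly between i and j
            if (seq[i] == seq[j] && between < static_cast<std::size_t>(seq[i]))
                return false;
        }
    }
    return true;
}
```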
CpSc2120 – Goddard – Notes Chapter 7
Recursion
Often in solving a problem one breaks up the problem into subtasks. Recursion
can be used if one of the subtasks is a simpler version of the original problem.
7.1 An Example
Suppose we are trying to sort a list of numbers. We could first determine the minimum
element; and what remains to be done is to sort the remaining numbers. So the code
might look something like this:
Every recursive method needs a stopping case: otherwise we have an infinite loop
or an error. In this case, we have a problem when C is empty. So one always checks to
see if the problem is simple enough to solve directly.
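A recursive sort along these lines might be sketched as follows. This is one possible rendering, not necessarily the code the notes had in mind: it works in place on a std::vector, and the name recursiveSort and the start parameter are assumptions.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts the entries from position start onward: move the minimum
// to position start, then recursively sort what remains.
void recursiveSort(std::vector<int> &data, std::size_t start = 0) {
    if (start >= data.size())          // stopping case: nothing left to sort
        return;
    std::size_t minIndex = start;
    for (std::size_t i = start + 1; i < data.size(); ++i)
        if (data[i] < data[minIndex])
            minIndex = i;              // remember where the minimum is
    std::swap(data[start], data[minIndex]);
    recursiveSort(data, start + 1);    // sort the remaining numbers
}
```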
Example. Printing out a decimal number. The idea is to extract one digit and then
recursively print out the rest. It’s hard to get the most significant digit, but one can
obtain the least significant digit (the “ones” column): use num % 10. And then num/10
is the “rest” of the number.
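The idea can be sketched as below. Writing to a stream parameter rather than directly to cout is a choice made for this example so that the result can be captured; the function names are assumptions.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Writes the decimal digits of num, most significant digit first.
void printDecimal(unsigned int num, std::ostream &out) {
    if (num >= 10)
        printDecimal(num / 10, out);            // first print the "rest"
    out << static_cast<char>('0' + num % 10);   // then the ones digit
}

// Helper that captures the printed digits in a string.
std::string decimalText(unsigned int num) {
    std::ostringstream buffer;
    printDecimal(num, buffer);
    return buffer.str();
}
```

The stopping case is a number below 10, which is printed directly. Calling printDecimal(num, std::cout) prints to standard output.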
7.2 Tracing Code
It is important to be able to trace recursive calls: step through what is happening in
the execution. Consider the following code:
void g( int n ) {
    if( n==0 ) return;
    g(n-1);
    cout << n;
}
At first glance it might seem that g(3) prints out the numbers from 3 down
to 1. But one has to be a bit more careful: the recursive call occurs before the value
3 is printed out. This means that the output runs from smallest to biggest: 123
(the code prints no separator between the numbers).
Here is some more code to trace:
void f( int n ) {
    cout << n;
    if(n>1)
        f(n-1);
    if(n>2)
        f(n-2);
}
If you call the method f(4), it prints out 4 and then calls f(3) and f(2) in succession.
The call to f(3) calls both f(2) and f(1), and so on. One can draw a recursion tree:
this looks like a family tree except that the children are the recursive calls.
f(4)
    f(3)
        f(2)
            f(1)
        f(1)
    f(2)
        f(1)
Then one can work out that f(1) prints 1, that f(2) prints 21 and f(3) prints 3211.
What does f(4) print out?
Exercise
Give recursive code so that brackets(5) prints out ((((())))).
int fib(int n) {
    if( n<2 )
        return 1;
    else
        return fib(n-1) + fib(n-2);
}
Warning: Recursion is often easy to write (once you get used to it!). But
occasionally it is very inefficient. For example, the code for fib above is terrible. (Try to
calculate fib(30).)
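For comparison, here is a sketch of a loop that computes the same values with a linear number of additions. It keeps the convention of the code above that fib(0) and fib(1) are both 1; the name fibIterative is made up.

```cpp
// Same values as the recursive fib, but each value is computed once.
long long fibIterative(int n) {
    long long previous = 1;   // fib(i-1)
    long long current = 1;    // fib(i)
    for (int i = 2; i <= n; ++i) {
        long long next = previous + current;
        previous = current;
        current = next;
    }
    return current;
}
```

The recursive version is slow because it recomputes the same values over and over: fib(n-2) is computed once directly and again inside fib(n-1), and so on down the recursion tree.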
erase columns and diagonals this queen attacks;
}
}
}
The recursive boolean method takes as parameter the remaining distance required,
and returns whether this is possible or not. If the remaining distance is 0, it returns
true. Else it considers each possible first step in turn. If it is possible to get home after
making that first step, it returns true; failing which it returns false. One can adapt
this to actually count the minimum number of steps needed. See code below.
One can also use recursion to explore a maze or to draw a snowflake fractal.
Sample Code
StepsByRecursion.cpp
8.1 ADT
An ADT or abstract data type defines a way of interacting with data: it specifies
only how the ADT can be used and says nothing about the implementation of the
structure. An ADT is conceptually more abstract than a Java interface specification
or C++ list of class member function prototypes, and should be expressed in some
formal language (such as mathematics).
A data structure is a way of storing data that implements certain operations.
When choosing a data structure for your ADT, you might consider many issues such
as whether the data is static or dynamic, whether the deletion operation is important,
and whether the data is ordered. In general
1. The basic collection is often called a bag. It stores objects with no ordering of
the objects and no restrictions on them.
2. Another unstructured collection is a set, where repeated objects are not permitted:
it holds at most one copy of each item. A set is often from a predefined
universe.
8.3 The Array Implementation
A common implementation of a collection is a partially filled array. This is often
expanded every time it needs to be, but rarely shrunk. It has a pointer/counter which
keeps track of where the real data ends.
index:  0    1    2     3     4   5   6
value:  Amy  Bo   Carl  Dana  ?   ?   ?
count=4
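A partially filled array can be sketched as follows. This is not the StringSet sample code; the class StringBag, its starting capacity of 2 and the doubling rule are assumptions made for illustration.

```cpp
#include <string>

// A bag of strings stored in a partially filled array.
class StringBag {
public:
    StringBag() : data_(new std::string[2]), capacity_(2), count_(0) { }
    ~StringBag() { delete[] data_; }
    StringBag(const StringBag &) = delete;             // copying left out of this sketch
    StringBag &operator=(const StringBag &) = delete;

    void add(const std::string &item) {
        if (count_ == capacity_)
            grow();                   // expand only when the array is full
        data_[count_++] = item;       // count_ marks where the real data ends
    }
    int size() const { return count_; }
    const std::string &get(int index) const { return data_[index]; }

private:
    void grow() {                     // double the capacity; never shrinks
        std::string *bigger = new std::string[2 * capacity_];
        for (int i = 0; i < count_; ++i)
            bigger[i] = data_[i];
        delete[] data_;
        data_ = bigger;
        capacity_ *= 2;
    }
    std::string *data_;
    int capacity_;
    int count_;
};
```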
Sample Code
An array-based implementation of a set of strings.
StringSet.h
StringSet.cpp
TestStringSet.cpp
CpSc2120 – Goddard – Notes Chapter 9
Linked Lists
These links are also called pointers. Both metaphors work. They are links because
they go from one node to the next, and because if the link is broken the rest of the list
is lost. They are called pointers because this link is (usually) one-directional—and, of
course, they are pointers in C/C++.
The first node is called the head node. The last node is called the tail node.
The first node has to be pointed to by some external holder; often the tail node is too.
One can use a struct or class to create a node. We use here a struct. Note that in
C++ a struct is identical to a class except that its members are public by default.
struct Node {
    <data>
    Node *link;
};
(where <data> means any type of data, or multiple types). The class using or creating
the linked list then has the declaration:
Node *head;
for( cursor=head; cursor!=nullptr; cursor=cursor->link ){
    <do something with object referenced by cursor>
}
For insertion, there are two separate cases to consider: (i) addition at the head, and
(ii) addition elsewhere. For addition at the head, one creates a new node, sets its
link to where head currently points, and then gets head to point to it.
[Figure: a new node, insert, is linked in at the front of the list and head is updated to point to it.]
In code:
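A sketch of both insertion cases follows, assuming a Node whose data is a single int. The function names and the int payload are assumptions for this example.

```cpp
struct Node {
    int data;       // assumption: the data is one int
    Node *link;
};

// Case (i): addition at the head. head is passed by reference
// so that the function can change where it points.
void insertAtHead(Node *&head, int value) {
    Node *insert = new Node;
    insert->data = value;
    insert->link = head;      // new node points to where head currently points
    head = insert;            // then head points to the new node
}

// Case (ii): addition elsewhere, after the node referenced by cursor.
// Precondition: cursor is not nullptr.
void insertAfter(Node *cursor, int value) {
    Node *insert = new Node;
    insert->data = value;
    insert->link = cursor->link;
    cursor->link = insert;
}
```

In both cases the order matters: the new node's link is set before the existing pointer is changed, otherwise the rest of the list is lost.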
[Figure: a new node, insert, is linked in after the node referenced by cursor.]
9.3 Traps for Linked Lists
1. You must think of and test the exceptional cases: The empty list, the beginning
of the list, the end of the list.
2. Draw a diagram: you have to get the picture right, and you have to get the order
right.
9.4 Removal
The easiest case is removal of the first node. For this, one simply advances the head
to point to the next node. However, this means the first node is no longer referenced;
so one has to release that memory:
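A minimal sketch of this step, assuming a Node with an int payload (the function name is made up):

```cpp
struct Node {
    int data;
    Node *link;
};

// Removes the first node, if there is one, and releases its memory.
void removeFirst(Node *&head) {
    if (head == nullptr)
        return;               // empty list: nothing to remove
    Node *old = head;         // keep a handle on the node being removed
    head = head->link;        // advance head to the next node
    delete old;               // release the memory of the old first node
}
```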
In general, to remove a node that is elsewhere in the list, one needs a reference to
the node before the node one wants to remove. Then, to skip that node, one needs
only to update the link of the node before: that is, get it to point to the node after
the one that is to be deleted.
[Figure: the link of the node referenced by cursor is redirected past the node to be removed.]
If the node before is referenced by cursor, then cursor->link refers to the node to
be deleted, and cursor->link->link refers to the node after. Hence the code is:
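A sketch of this removal, again assuming a Node with an int payload and a made-up function name:

```cpp
struct Node {
    int data;
    Node *link;
};

// Removes the node after the one referenced by cursor, if there is one.
void removeAfter(Node *cursor) {
    if (cursor == nullptr || cursor->link == nullptr)
        return;                            // no node to remove
    Node *doomed = cursor->link;           // the node to be deleted
    cursor->link = cursor->link->link;     // skip over it
    delete doomed;                         // release its memory
}
```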
The problem is to organize cursor to be in the correct place. In theory, one would like
to traverse the list, find the node to be deleted, and then back up one: but that's not
possible. Instead, one has to look one node ahead. And then beware of null pointers.
See sample code.
9.5 And Beyond
Arrays are better at random access: they can provide an element, given a position,
in constant time. Linked lists are better at additions/removals at the cursor: done in
constant time. Resizing arrays can be inefficient (but is “on average” constant time).
Doubly-linked lists have pointers both forward and backward. These are useful if
one needs to traverse the list in both directions, or to add/remove at both ends.
Dummy header/tail nodes are sometimes used. These allow some of the special
cases (e.g. empty list) to be treated the same as a typical case. While searching takes
a bit more care, both removal and addition are simplified.
One can also have circularly linked lists where the last node points to the first.
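[Figures: linked-list variants. In the circular list, Head refers to the node holding the 1st value, and the links run through the 2nd, 3rd and 4th values and back to the 1st.]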
Sample Code
MyLinkedBag.h
MyLinkedBag.cpp
CpSc2120 – Goddard – Notes Chapter 10
Stacks and Queues
A linear data structure is one which is ordered. There are two special types with
restricted access: a stack and a queue.
• pop: remove the top element from the stack and return it
If the stack is empty and one tries to remove an element, this is called underflow.
Another common operation is called peek: this returns a reference to the top element
on the stack (leaving the stack unchanged).
A simple stack algorithm could be used to reverse a word: push all the characters
on the stack, then pop from the stack until it’s empty.
this → [stack, from top to bottom: s i h t] → siht
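The word-reversal idea can be sketched with std::stack from the standard library. The function name is made up. Note that std::stack separates the two steps: top() returns the top element and pop() removes it without returning it.

```cpp
#include <stack>
#include <string>

// Reverses a word: push all the characters, then pop until empty.
std::string reverseWord(const std::string &word) {
    std::stack<char> letters;
    for (char c : word)
        letters.push(c);
    std::string reversed;
    while (!letters.empty()) {
        reversed += letters.top();   // look at the top character
        letters.pop();               // then remove it
    }
    return reversed;
}
```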
10.2 Implementation
A stack is commonly and easily implemented using either an array or a linked list. In
the latter case, the head points to the top of the stack: so addition/removal (push/pop)
occurs at the head of the linked list.
Scan the string from left to right, and for each char:
1. If a left bracket, push onto stack
2. If a right bracket, pop bracket from stack
(if not match or stack empty then fail)
At end of string, if stack empty and always matched, then accept.
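For example, suppose the input is: ( [ ] ) [ ( ] ) Then the stack (written bottom to top) goes:
( → ( [ → ( → empty → [ → [ (
The next character, ], does not match the ( on top of the stack, so the string is rejected.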
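The scanning algorithm above can be sketched as follows. This version uses std::stack rather than the ArrayStack of the sample code, handles three kinds of brackets and ignores all other characters; those choices and the function name are assumptions.

```cpp
#include <stack>
#include <string>

// Returns true if every bracket in text is properly matched.
bool isBalanced(const std::string &text) {
    std::stack<char> open;
    for (char c : text) {
        if (c == '(' || c == '[' || c == '{') {
            open.push(c);                    // 1. left bracket: push
        } else if (c == ')' || c == ']' || c == '}') {
            if (open.empty())
                return false;                // 2. stack empty: fail
            char left = open.top();
            open.pop();
            bool match = (left == '(' && c == ')') ||
                         (left == '[' && c == ']') ||
                         (left == '{' && c == '}');
            if (!match)
                return false;                // 2. not a match: fail
        }
    }
    return open.empty();                     // accept only if nothing is left open
}
```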
With two stacks, we can evaluate each subexpression when we reach the closing
bracket:
Algorithm (assuming brackets are correct!) is as follows:
Scan the string from left to right and for each char:
1. If a left bracket, do nothing
2. If a number, push onto numberStack
3. If an operator, push onto operatorStack
4. If a right bracket, do an evaluation:
a) pop from the operatorStack
b) pop two numbers from the numberStack
c) perform the operation on these numbers (in the right order)
d) push the result back on the numberStack
At end of string, the single value on the numberStack is the answer.
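The two-stack algorithm can be sketched as below. The assumptions are stated loudly: the input is fully bracketed and correct, numbers are non-negative integers, the operators are + - * / with integer division, and no check is made for division by zero. The function names are made up.

```cpp
#include <cctype>
#include <cstddef>
#include <stack>
#include <string>

// Applies op to the two operands in the right order.
int applyOp(char op, int left, int right) {
    switch (op) {
        case '+': return left + right;
        case '-': return left - right;
        case '*': return left * right;
        default:  return left / right;   // '/' (assumes right is not 0)
    }
}

// Evaluates a fully bracketed expression such as ((11-5)*(8/4)).
int evaluate(const std::string &expr) {
    std::stack<int> numberStack;
    std::stack<char> operatorStack;
    for (std::size_t i = 0; i < expr.size(); ++i) {
        char c = expr[i];
        if (std::isdigit(static_cast<unsigned char>(c))) {
            int value = 0;                    // read a whole multi-digit number
            while (i < expr.size() &&
                   std::isdigit(static_cast<unsigned char>(expr[i]))) {
                value = value * 10 + (expr[i] - '0');
                ++i;
            }
            --i;                              // the for loop advances i again
            numberStack.push(value);
        } else if (c == '+' || c == '-' || c == '*' || c == '/') {
            operatorStack.push(c);
        } else if (c == ')') {
            char op = operatorStack.top(); operatorStack.pop();
            int right = numberStack.top(); numberStack.pop();  // popped first
            int left = numberStack.top();  numberStack.pop();
            numberStack.push(applyOp(op, left, right));
        }
        // a left bracket (or a space) needs no action
    }
    return numberStack.top();
}
```

The number popped first is the right operand, which matters for subtraction and division.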
(each stack is written bottom to top)
nums: 11 5    ops: -      →   nums: 6      ops: (empty)    to be read: *(8/4))
nums: 6 8 4   ops: * /    →   nums: 6 2    ops: *          to be read: )
nums: 6 2     ops: *      →   nums: 12     ops: (empty)
Graham Scan
1. Sort points by angle from 0
2. Push 0 and 1. Set i=2
3. While i ≤ n do:
      If i makes left turn w.r.t. top 2 items on stack
      then { push i; i++ }
      else { pop and discard }
We do not attempt to prove that the algorithm works. The running time: Each
time the while loop is executed, a point is either stacked or discarded. Since each
point is stacked at most once and discarded at most once, the loop is executed at
most 2n times. There is a constant-time method for checking, given three points in
order, whether the angle is a left or a right turn. This gives an O(n) time algorithm,
apart from the initial sort, which takes time O(n log n).
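One standard constant-time turn test uses the sign of a cross product. The sketch below assumes the usual orientation of the axes (x to the right, y upward); the struct Point and the function names are made up for the illustration.

```cpp
struct Point {
    double x, y;
};

// Cross product of the vectors (b - a) and (c - a).
// Positive: a, b, c turn left (counterclockwise).
// Negative: they turn right. Zero: the points are collinear.
double cross(const Point &a, const Point &b, const Point &c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// True if going from a to b and then on to c is a left turn.
bool isLeftTurn(const Point &a, const Point &b, const Point &c) {
    return cross(a, b, c) > 0;
}
```

In the scan, a and b would be the top two items on the stack and c the point i under consideration.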
For the points given earlier, the labeling is as follows:
[Figure: the points, labeled 0 through 8.]
The algorithm proceeds:
push(0)
push(1)
push(2)
pop(2), push(3)
pop(3), push(4)
push(5)
pop(5), push(6)
pop(6), push(7)
push(8)
push(0)
• void enqueue(QueueType ob): insert the item at the rear of the queue
• QueueType dequeue(): delete and return the item at the front of the queue
(sometimes called the first item).
A simple task with a queue is echoing the input (in the order it came): repeatedly
insert into the queue, and then repeatedly dequeue.
10.7 Queue Implementation as Array
The natural approach to implementing a queue is, of course, an array. This suffers
from the problem that as items are enqueued and dequeued, we reach the end of the
array but are not using much of the start of the array.
The solution is to allow wrap-around: after filling the array, you start filling it from
the front again (assuming these positions have been vacated). Of course, if there really
are too many items in the queue, then this approach will also fail. This is sometimes
called a circular array.
You maintain two markers for the two ends of the queue. The simplest is to maintain
instance variables:
• int count records the number of elements currently in the queue, and int capacity
the length of the array
• int front and int rear are such that: if front≤rear, then the queue is in positions
data[front] . . . data[rear]; otherwise it wraps around and is in data[front] . . .
data[capacity-1] data[0] . . . data[rear]
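A circular array queue with these instance variables might be sketched as follows. The class name ArrayQueue, the int item type, and the choice to report a full queue by returning false are assumptions of this example.

```cpp
// A queue of ints stored in a circular array.
// Precondition for the constructor: capacity > 0.
class ArrayQueue {
public:
    explicit ArrayQueue(int capacity)
        : data_(new int[capacity]), capacity_(capacity),
          count_(0), front_(0), rear_(capacity - 1) { }
    ~ArrayQueue() { delete[] data_; }
    ArrayQueue(const ArrayQueue &) = delete;             // copying left out of this sketch
    ArrayQueue &operator=(const ArrayQueue &) = delete;

    bool isEmpty() const { return count_ == 0; }
    bool isFull() const { return count_ == capacity_; }

    // Inserts at the rear; returns false if there is no room.
    bool enqueue(int item) {
        if (isFull())
            return false;
        rear_ = (rear_ + 1) % capacity_;     // wrap around past the end
        data_[rear_] = item;
        ++count_;
        return true;
    }

    // Deletes and returns the front item.
    // Precondition: the queue is not empty.
    int dequeue() {
        int item = data_[front_];
        front_ = (front_ + 1) % capacity_;   // wrap around past the end
        --count_;
        return item;
    }

private:
    int *data_;
    int capacity_;
    int count_;
    int front_;
    int rear_;
};
```

Keeping count as well as front and rear makes it easy to tell a full queue from an empty one, since in both of those cases rear is the position just before front.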
10.9 Application: Discrete Event Simulation
There are two very standard uses of queues in programming. The first is in
implementing certain searching algorithms. The second is in doing a simulation of a scenario that
changes over time. So we examine the CarWash simulation (taken from Main).
The idea: we want to simulate a CarWash to gain some statistics on how service
times etc. are affected by changes in customer numbers, etc. In particular, there is a
single Washer and a single Line to wait in. We are interested in how long on average
it takes to serve a customer.
We assume the customers arrive at random intervals but at a known rate. We
assume the washer takes a fixed time.
So we create an artificial queue of customers. We don’t care about all the details
of these simulated customers: just their arrival time is enough.
for currentTime running from 0 up to end of simulation:
    1. toss coin to see if new customer arrives at currentTime;
       if so, enqueue customer
    2. if washer timer expired, then set washer to idle
    3. if washer idle and queue nonempty, then
          dequeue next customer
          set washer to busy, and set timer
          update statistics
A key approach to such simulations is to look ahead whenever possible. The overall
mechanism is a loop that runs for as long as the simulation continues:
while(simulation continuing) do {
    dequeue nextEvent;
    update status;
    collect statistics;
    precompute associated nextEvent(s) and add to queue;
}
Thus when we “move” the Customer to the Washer, we immediately calculate what
time the Washer will finish, and then update the statistics. In this case, it allows us
to discard the Customer: the only pertinent information is that the Washer is busy.
Sample Code
Here is code for an array-based stack, and a balanced brackets tester.
ArrayStack.h
ArrayStack.cpp
brackets.cpp
CpSc2120 – Goddard – Notes Chapter 11
Standard Template Library
11.1 Overview
The standard template library (STL) provides templates for data structures and
algorithms. Each data structure is in its own header file. For example, there is vector, stack, and
set. These are implemented as templates (which we will discuss more later). For now,
it suffices to know that things like vector and stack are created to store a specific data
type. This data type is specified in angle brackets at declaration:
vector<int> A;
Thereafter we can just treat A as before. For example, the push_back method adds an
item at the end of the vector. Another useful method is emplace(val) (emplace_back(val)
for a vector); this adds an object to the structure treating val as the input to the
constructor for that object.
You can leave out the const_ part. Or even replace it with auto: this “typename”
can be used in places to help the reader where the compiler can infer the type. In the
above case, the vector template class also implements subscripting; so one could write:
int addup ( const vector<int> & A ) {   // const: A is not changed
    int sum = 0;
    for( size_t i=0; i<A.size(); i++ )  // size() returns an unsigned type
        sum += A[i];
    return sum;
}
A range-for loop can be used to process all the entries in some data structure.
E.g.
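For instance, the sum of a vector could be written with a range-for loop as in this sketch (the name addupRange is made up here):

```cpp
#include <vector>

// Adds up all the entries of A using a range-for loop.
int addupRange(const std::vector<int> &A) {
    int sum = 0;
    for (int value : A)     // value takes each entry of A in turn
        sum += value;
    return sum;
}
```

Writing for (int &value : A) instead would give access to the entries themselves, so that the loop body could change them (this requires A not to be const).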
class Foo {
    friend bool operator<(const Foo & A, const Foo & B);
};
bool operator<(const Foo & A, const Foo & B) { ... }
Sample Code
MyInteger.h
TestMyInteger.cpp
Part 3: Trees
12 Trees
13 Binary Search Trees
14 More Search Trees
15 Heaps and Priority Queues
CpSc2120 – Goddard – Notes Chapter 12
Trees
Examples include:
• father/family tree
• UNIX file system: each node is a level of grouping
• decision/taxonomy tree: each internal node is a question
For example, here is an expression tree that stores the expression (7 + 3) ∗ (8 − 2):
        ∗
      /   \
     +     −
    / \   / \
   7   3 8   2
The descendants of a node are its children, their children etc. A node and its
descendants form a subtree. A node u is ancestor of v if and only if v is descendant
of u. The depth of a node is the number of ancestors (excluding itself); that is, how
many steps away from the root it is. Here is a binary tree with the nodes’ depths
marked.
0
1 1
2 2 2 2
3 3 3
Special trees: A binary tree is proper/full if every internal node has two children.
A binary tree is complete if it is full and every leaf has the same depth. (NOTE:
different books have different definitions.)
Note that:
struct BTNode {
    <type> data;
    BTNode *left;
    BTNode *right;
};
If there is no child, then that child pointer is nullptr. It is common for tree methods
to return nullptr when a child does not exist (rather than print an error message or
throw an Exception).
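[Figure: root points to the node holding elt1; one child pointer of that node is null and the other points to the node holding elt2, both of whose child pointers are null.]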
12.3 Animal Guessing Example
(Based on Main.) The computer asks a series of questions to determine a mystery
animal. The data is stored as a decision tree. This is a full binary tree where each
internal node stores a question: one child is associated with yes, one with no. Each
leaf stores an animal.
The program moves down the tree, asking the question and moving to the
appropriate child. When a leaf is reached, the computer has identified the animal. The cool
idea is that if the program is wrong, it can automatically update the decision tree: If
the program is unsuccessful in a guess, it prompts the user to provide a question that
differentiates its answer from the actual answer. Then it replaces the relevant leaf by
a question node with two children, one for each of the two animals.
Code for such a method might look something like:
preorder(Node *v) {
    if v is nullptr, return
    visit node v
    preorder ( left child of v )
    preorder ( right child of v )
}
          1
     2         6
   3   5     7   8
   4            9 10
The standard application of a preorder traversal is printing a tree in a special way:
for example, the indented printout below:
The most common traversal is a postorder traversal. In this, each node is visited
after its children (so the root is last). Here is a tree labeled with postorder:
          10
     4         9
   2   3     5   8
   1            6 7
Examples include computation of disk-space of directories, or maximum depth of a
leaf. For the latter:
For the code, the time is proportional to the size of the tree, that is, it is O(n).
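A postorder computation of the maximum depth of a leaf might be sketched as follows. The int data type and the convention that the empty tree has depth -1 (so that a single node has depth 0) are assumptions of this example.

```cpp
#include <algorithm>

struct BTNode {
    int data;
    BTNode *left;
    BTNode *right;
};

// Returns the maximum depth of a leaf in the subtree rooted at v.
// The children are processed before the node itself (postorder).
int maxDepth(const BTNode *v) {
    if (v == nullptr)
        return -1;                        // empty tree
    int leftDepth = maxDepth(v->left);
    int rightDepth = maxDepth(v->right);
    return 1 + std::max(leftDepth, rightDepth);
}
```

Each node is handled once with a constant amount of work, which matches the O(n) running time.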
Practice. Calculate the size (number of nodes) of the tree using recursion.
CpSc2120 – Goddard – Notes Chapter 13
A binary search tree is used to store ordered data to allow efficient queries and
updates.
left descendants are smaller, right descendants are bigger. (One can adapt
this to allow repeated values.)
This assumes the data comes from a domain in which there is a total order : you can
compare every pair of elements (and there is no inconsistency such as a < b < c < a).
In general, we could have a large object at each node, but the objects are sorted with
respect to a key.
Here is an example:
            53
         /      \
       31        57
      /  \      /  \
    12    34  56    69
   /               /  \
  5              68    80
An inorder traversal is when a node is visited after its left descendants and
before its right descendants. The following recursive method is started by the call
inorder(root).
An inorder traversal of a binary search tree prints out the data in order.
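A sketch of such a method is given here. To make the result easy to check, this version appends each visited value to a vector instead of printing it; that choice, the int data and the BTNode name are assumptions.

```cpp
#include <vector>

struct BTNode {
    int data;
    BTNode *left;
    BTNode *right;
};

// Visits the left descendants, then the node, then the right descendants.
void inorder(const BTNode *v, std::vector<int> &out) {
    if (v == nullptr)
        return;                   // empty subtree: nothing to visit
    inorder(v->left, out);
    out.push_back(v->data);       // the "visit" step
    inorder(v->right, out);
}
```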
13.2 Insertion in BST
To find an element in a binary search tree, you compare it with the root. If larger,
go right; if smaller, go left. And repeat. The following method returns nullptr if not
found:
Node *find(key x) {
    Node *t=root;
    while( t!=nullptr && x!=t->key )
        t = ( x<t->key ? t->left : t->right );
    return t;
}
Insertion is a similar process to searching, except you need a bit of look ahead.
Here is a strange-looking recursive version:
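One recursive formulation passes the pointer by reference. This sketch may differ from the version the notes intend; it assumes int keys and ignores a key that is already present.

```cpp
struct Node {
    int key;
    Node *left;
    Node *right;
};

// t is a reference to the pointer that leads to the current subtree
// (the root, or some node's left or right field). Assigning to t
// therefore attaches the new node in the right place.
void insert(Node *&t, int x) {
    if (t == nullptr)
        t = new Node{x, nullptr, nullptr};   // found the empty spot
    else if (x < t->key)
        insert(t->left, x);
    else if (x > t->key)
        insert(t->right, x);
    // x equal to t->key: already present, so do nothing
}
```

The reference parameter removes the need for look ahead: the recursion goes all the way to the null pointer and then overwrites that very pointer.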
• Node x has only one child. Then delete the node and do “adoption by grand-
parent” (get old parent of x to point to old child of x).
• Node x has two children. Then find the node y with the next-lowest value:
go left, and then go repeatedly right (why does this work?). This node y cannot
have a right child. So swap the values of nodes x and y, and delete the node y
using one of the two previous cases.
The following picture shows a binary search tree and what happens if 11, 17, or 10
(assuming replace with next-lowest) is removed.
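Original tree:          After removing 11:
        10                      10
      /    \                  /    \
     8      17               8      17
    /      /                /
   6      11               6
  /                       /
 5                       5

After removing 17:      After removing 10:
        10                       8
      /    \                  /    \
     8      11               6      17
    /                       /      /
   6                       5      11
  /
 5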
All modification operations take time proportional to depth. In the best case, the depth
is O(log n) (why?). But the tree can become "lop-sided", and so in the worst case these
operations are O(n).
Sample Code
Here is code for a binary search tree.
BSTNode.h
BinarySearchTree.h
BinarySearchTree.cpp
CpSc2120 – Goddard – Notes Chapter 14
4. Every down-path from root/node to nullptr contains the same number of black
nodes.
The simplest (but not most efficient) method of insertion is called bottom up
insertion. Start by inserting as per binary search tree and making the new leaf red.
The only possible violation is that its parent is red.
This violation is solved recursively with recoloring and/or rotations. Everything
hinges on the uncle:
1. if uncle is red (but nullptr counts as black), then recolor: parent & uncle → black,
grandparent → red, and so percolate the violation up the tree.
14.3 B-Trees
Many relational databases use B-trees as the principal form of storage structure. A
B-tree is an extension of a binary search tree.
In a B-tree the top node is called the root. Each internal node has a collection of
values and pointers. The values are known as keys. If an internal node has k keys,
then it has k +1 pointers: the keys are sorted, and the keys and pointers alternate. The
keys are such that the data values in the subtree pointed to by a pointer lie between
the two keys bounding the pointer.
The nodes can have varying numbers of keys. In a B-tree of order M , each internal
node must have at least M/2 but not more than M − 1 keys. The root is an exception:
it may have as few as 1 key. Orders in the range of 30 are common. (Possibly each
node is stored on a different page of memory.)
The leaves are all at the same height. This stops the unbalancedness that can occur
with binary search trees. In some versions, the keys are real data. In our version, the
real data appears only at the leaves.
It is straightforward to search a B-tree. The search moves down the tree. At a
node with k keys, the input value is compared with the k keys and based on that, one
of the k + 1 pointers is taken. The time used for a search is proportional to the height
of the tree.
The insertion of a value into a B-tree can be stated as follows. Search for correct
leaf. Insert into leaf. If overfull then split. If parent full then split it, and so on up the
tree. If the root becomes overfull, it is split and a new root created. This is the only
time the height of the tree is increased.
For example, if we set M = 3 and insert the values 1 thru 15 into the tree, we get
the following B-tree:
                        [5 | 9]
        [3]               [7]               [11 | 13]
   [1 2] [3 4]       [5 6] [7 8]     [9 10] [11 12] [13 14 15]
Adding the value 16 causes a leaf to split, which causes its parent to split, and the
root to split, and the height of the tree is increased:
                              [9]
              [5]                            [13]
       [3]           [7]             [11]            [15]
  [1 2] [3 4]   [5 6] [7 8]    [9 10] [11 12]   [13 14] [15 16]
Deletion from B-trees is similar but harder. Some code for a B-tree implementation
is included in the chapter on inheritance.
Sample Code
Here is code for red-black tree. Note that we have adapted the code for binary search
trees given in the previous chapter. An alternative would have been to use inheritance,
where RBNode extends BSTNode and RedBlackTree extends BinarySearchTree.
RBNode.h
RedBlackTree.h
RedBlackTree.cpp
CpSc2120 – Goddard – Notes Chapter 15
Heaps and Priority Queues
• removeMin(): Remove and return item with minimum key (Error if priority queue
is empty).
15.2 Heap
In level numbering in binary trees, the nodes are numbered such that:
for a node numbered x, its children are 2x+1 and 2x+2
Thus a node’s parent is at (x-1)/2 (rounded down), and the root is 0.
            0
        /       \
      1           2
    /   \       /   \
   3     4     5     6
  / \   /
 7   8 9
One can store a binary tree in an array/vector by storing each value at the position
given by level numbering. But this wastes storage unless the tree is nearly balanced.
We can change the definition of a complete binary tree to be a binary tree in which
each level except the last is complete, and in the last level nodes are added left to right.
With this definition, a min-heap is a complete binary tree, normally stored as a
vector, with values stored at nodes such that:
heap-order property: for each node, its value is smaller than or equal to
its children’s
            7
        /       \
     24           19
    /   \        /   \
  25     56    68     40
 /  \   /
29  31 58
A max-heap can be defined similarly.
            7
        /       \
     24           19
    /   \        /   \
  25     56    68     40
 /  \   /  \
29  31 58  12
⇒
            7
        /       \
     12           19
    /   \        /   \
  25     24    68     40
 /  \   /  \
29  31 58  56
The idea for removeMin is to replace the root's value with the value from the last leaf,
delete the last leaf, and bubble the value down until the heap-order property is re-established.
Algorithm: RemoveMin()
    temp = value of root
    swap root value with last leaf
    delete last leaf
    v = root
    while v > any child(v) {
        swapElements(v, smaller child(v))
        v = smaller child(v)
    }
    return temp
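The pseudocode above might be realized as follows on a vector that uses level numbering (children of x at 2x+1 and 2x+2). This is a sketch under assumptions: int values, free functions rather than a class, and an exception for the empty case. The function names are illustrative, not taken from the course's Heap.h.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Add a value as the new last leaf and bubble it up.
void heapInsert(std::vector<int> &h, int value) {
    h.push_back(value);
    std::size_t v = h.size() - 1;
    while (v > 0 && h[v] < h[(v - 1) / 2]) {
        std::swap(h[v], h[(v - 1) / 2]);
        v = (v - 1) / 2;
    }
}

// Remove and return the minimum, which sits at the root (index 0).
int removeMin(std::vector<int> &h) {
    if (h.empty()) throw std::out_of_range("removeMin on empty heap");
    int temp = h[0];
    h[0] = h.back();                 // value of last leaf goes to the root
    h.pop_back();                    // delete last leaf
    std::size_t v = 0;
    while (true) {                   // bubble down
        std::size_t smallest = v, l = 2 * v + 1, r = 2 * v + 2;
        if (l < h.size() && h[l] < h[smallest]) smallest = l;
        if (r < h.size() && h[r] < h[smallest]) smallest = r;
        if (smallest == v) break;    // heap-order property holds again
        std::swap(h[v], h[smallest]);
        v = smallest;
    }
    return temp;
}
```

Both operations follow a single root-to-leaf path, so each takes time proportional to the depth of the tree, which is O(log n) for a complete binary tree.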
Here is an example of RemoveMin:
            7
        /       \
     24           19
    /   \        /   \
  25     56    68     40
 /  \   /
29  31 58
⇒
            19
        /       \
     24           40
    /   \        /   \
  25     56    68     58
 /  \
29  31
It is clear that inserting n values into a heap takes at most O(n log n) time. Perhaps
surprisingly, we can create a heap in linear time. Here is one approach: work up
the tree level by level, correcting as you go. That is, at each level, you push the value
down until it is correct, swapping with the smaller child.
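Analysis: Suppose the tree has depth $k$ and $n = 2^{k+1} - 1$ nodes. An item that
starts at depth $j$ percolates down at most $k - j$ steps, and there are $2^j$ items at
depth $j$. So the total data movement is at most
\[ \sum_{j=0}^{k} 2^j (k - j) = 2^{k+1} - k - 2 < n, \]
which is O(n).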
Thus we get Heap-Sort. Note that one can re-use the array/vector in which the
heap is stored: removeMin moves the minimum to the end, and so repeated application
leaves the list sorted in the vector (in decreasing order, for a min-heap).
A Heap-Sort Example is (the heap is the part to the left of the bar):
1 3 2 6 4 5 |
2 3 5 6 4 | 1
3 4 5 6 | 2 1
4 6 5 | 3 2 1
5 6 | 4 3 2 1
6 | 5 4 3 2 1
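The in-place idea can be sketched as below, assuming int values in a vector. The helper name bubbleDown is illustrative. The first loop is the linear-time heap construction (working from the lowest internal nodes up to the root); the second loop is the repeated removeMin that re-uses the tail of the same vector.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Restore min-heap order below index v, treating only a[0..n-1] as the heap.
void bubbleDown(std::vector<int> &a, std::size_t v, std::size_t n) {
    while (true) {
        std::size_t smallest = v, l = 2 * v + 1, r = 2 * v + 2;
        if (l < n && a[l] < a[smallest]) smallest = l;
        if (r < n && a[r] < a[smallest]) smallest = r;
        if (smallest == v) return;
        std::swap(a[v], a[smallest]);
        v = smallest;
    }
}

// Sorts into decreasing order, because each minimum is moved to the end.
void heapSort(std::vector<int> &a) {
    std::size_t n = a.size();
    for (std::size_t v = n / 2; v-- > 0; )   // build the heap bottom-up
        bubbleDown(a, v, n);
    while (n > 1) {
        std::swap(a[0], a[n - 1]);           // minimum goes to the end
        --n;                                 // the heap shrinks by one
        bubbleDown(a, 0, n);
    }
}
```

Using a max-heap instead (flip the two comparisons in bubbleDown) would produce increasing order.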
traverse the string until the part you have covered so far is a valid code;
cut it off and continue.
Repeat
    merge two (of the) rarest characters into a mega-character
    whose occurrence count is the combined count
Until only one mega-character left
Assign mega-character the code EmptyString
Repeat
    split a mega-character into its two parts, assigning each of these
    the mega-character’s code followed by either 0 or 1
Until no mega-characters are left
The information can be organized in a trie: this is a special type of tree in which
the links are labeled and the leaf corresponds to the sequence of labels one follows to
get there.
For example if 39 chars are A=13, B=4, C=6, D=5 and E=11, we get the coding
A=10, B=000, C=01, D=001, E=11.
              (root)
           0 /      \ 1
           15        24
        0 /  \ 1  0 /  \ 1
         9   C=6  A=13  E=11
      0 / \ 1
     B=4   D=5
Note that a priority queue is used to keep track of the frequencies of the letters.
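As a rough sketch of how the priority queue drives the merging, the following builds the trie and reads off the codes. The names HNode and huffmanCodes are hypothetical, the STL priority_queue stands in for the course's own heap, and ties are broken arbitrarily, so the exact bits can differ from the coding shown above even though the code lengths agree.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// One trie node; zero/one are indices into the node vector, -1 at a leaf.
struct HNode { long count; char ch; int zero; int one; };

std::map<char, std::string> huffmanCodes(const std::map<char, long> &freq) {
    std::vector<HNode> nodes;
    using Entry = std::pair<long, int>;                 // (count, node index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    for (const auto &kv : freq) {
        nodes.push_back({kv.second, kv.first, -1, -1});
        pq.push({kv.second, static_cast<int>(nodes.size()) - 1});
    }
    std::map<char, std::string> codes;
    if (nodes.empty()) return codes;

    while (pq.size() > 1) {                             // merge the two rarest
        Entry a = pq.top(); pq.pop();
        Entry b = pq.top(); pq.pop();
        nodes.push_back({a.first + b.first, '\0', a.second, b.second});
        pq.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }

    // Split phase: the root has the empty code; each child appends 0 or 1.
    std::vector<std::pair<int, std::string>> todo;
    todo.push_back({pq.top().second, ""});
    while (!todo.empty()) {
        std::pair<int, std::string> cur = todo.back();
        todo.pop_back();
        const HNode &node = nodes[cur.first];
        if (node.zero < 0) { codes[node.ch] = cur.second; continue; }
        todo.push_back({node.zero, cur.second + "0"});
        todo.push_back({node.one, cur.second + "1"});
    }
    return codes;
}
```

For the frequencies A=13, B=4, C=6, D=5, E=11 this gives 2-bit codes for A, C, E and 3-bit codes for B, D. With a single character the code comes out empty, which a real encoder would need to special-case.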
Sample Code
PriorityQ.h
Heap.h
Heap.cpp
Inheritance
The derived class automatically has the methods of the base class as member
functions, unless declared as private in the base class. They may be declared as protected
in the base class to allow direct access only to extensions. Similarly, the derived class
can access the instance variables of the base class, provided they are not private.
are expected. There are times an object may need to be cast back to its original type.
Note that the actual dynamic type of an object is forever fixed.
Suppose class Rectangle extends class Shape. Then we could do the following
assignments:
Shape *X;
Rectangle *Y;
Y = new Rectangle();
X = Y; // okay
X = new Rectangle(); // okay
Y = static_cast<Rectangle*>(X); // cast needed
class Shape {
public:
    virtual void foo() { cout << "base foo"; }
    void bar() { cout << "base bar"; }
};
class Rectangle : public Shape {
public:
    void foo() { cout << "derived foo"; }   // overrides the virtual function
    void bar() { cout << "derived bar"; }   // hides the nonvirtual function
};
Shape *X;
Rectangle *Y;
X = Y = new Rectangle();
Y -> foo( ); // prints derived foo
Y -> bar( ); // prints derived bar
X -> foo( ); // prints derived foo
X -> bar( ); // prints base bar
One can access in the Derived class the Base version of an overridden function by
using the scope resolution operator: Base::fooBar().
the calling program (but then one can only execute Number’s methods on ticket).
Sample Code
A (somewhat artificial) example of using inheritance to create a 3-dimensional point
given code for a 2-dimensional point.
TwoDPoint.h
TwoDPoint.cpp
ThreeDPoint.h
ThreeDPoint.cpp
TestPoint.cpp
BTreeNode.h
BTreeInternal.cpp
BTreeLeaf.cpp
BTree.h
BTree.cpp
CpSc2120 – Goddard – Notes Chapter 17
Templates & Exceptions
We briefly consider exceptions and templates.
17.1 Exceptions
An exception is an unexpected event that occurs when the program is running. For
example, if new cannot allocate enough space, this causes an exception. An exception
is explicitly thrown using a throw statement. A throw statement must specify an
exception object to be thrown. There are exceptions already defined; it is also possible
to create new ones. (Or one can, for example, throw an int.)
A try clause is used to delimit a block of code in which a method call or operation
might cause an exception. If an exception occurs within a try block, then C++ aborts
the try block, executes the corresponding catch block, and then continues with the
statements that follow the catch block. If there is no exception, the catch block is
ignored. Every exception that is thrown should eventually be caught; an exception that
is never caught terminates the program. A method might
not handle an exception, but instead propagate it for another method to handle.
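Good practice says that one should state which functions throw exceptions. Older
C++ did this with a throw clause that lists the exceptions that can be thrown by
a method; such clauses were deprecated in C++11 and removed in C++17, where one
instead documents the exceptions in a comment and marks functions that never throw
as noexcept. Write the exception handlers for these exceptions in the program that uses
the methods.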
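A small sketch of throw, try and catch working together is given below. The functions elementAt and describe are made up for illustration; std::out_of_range is one of the exception classes already defined in the standard library.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Throws std::out_of_range if index is not a valid position in v.
int elementAt(const std::vector<int> &v, int index) {
    if (index < 0 || index >= static_cast<int>(v.size()))
        throw std::out_of_range("index " + std::to_string(index) + " is out of range");
    return v[index];
}

// The caller decides how to handle the failure.
std::string describe(const std::vector<int> &v, int index) {
    try {
        return "value " + std::to_string(elementAt(v, index));
    } catch (const std::out_of_range &e) {
        // Reached only if the try block threw; e.what() gives the message.
        return std::string("error: ") + e.what();
    }
}
```

Catching by const reference avoids copying the exception object and preserves its dynamic type.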
17.2 Templates
Thus far in our code we have defined a special type for each collection. Templates let
the user of a collection tell the compiler what kind of thing to store in a particular
instance of a collection. We saw already that if we want a set from the STL that stores
strings, we say
set<string> S;
{ }
T data;
Node *next;
} ;
It is standard to break the class heading over two lines. To a large extent, one can
treat Node<> as just a new class type.
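For concreteness, a templated node class of this general shape might look like the sketch below. The constructor parameters and the helper function length are assumptions added for illustration, not the notes' own code.

```cpp
#include <string>

// The class heading is broken over two lines, as is standard.
template <typename T>
class Node {
public:
    Node(const T &initData, Node *initNext = nullptr)
        : data(initData), next(initNext)
    { }
    T data;
    Node *next;
};

// Generic code over the template: assumes only that T can be copied.
template <typename T>
int length(const Node<T> *head) {
    int count = 0;
    for (const Node<T> *p = head; p != nullptr; p = p->next) ++count;
    return count;
}
```

Inside the class, the name Node on its own means Node<T>; outside it, the parameter has to be written out, as in Node<std::string>.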
Note that one can write code assuming that the parameter (T or U) implements
various operations such as assignment, comparison, or stream insertion. These
assumptions should be documented! Note that:
the template code is not compiled abstractly; rather it is compiled for each
instantiated parameter choice separately.
Consequently, the implementation code must be in the template file: one can
#include the cpp-file at the end of the header file.
As an example, iterators allow one to write generic code. For example, rather than
having a built-in boolean contains function, one does:
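A minimal sketch of the idiom, wrapped in a hypothetical helper isPresent so that it can be reused with any container that provides begin() and end():

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Generic membership test; assumes the elements can be compared to x with ==.
template <typename Container, typename T>
bool isPresent(const Container &c, const T &x) {
    // std::find returns the end iterator when x does not occur.
    return std::find(c.begin(), c.end(), x) != c.end();
}
```

This is a linear scan. For a std::set the member function S.find(x) gives the same answer faster, because it uses the ordering of the set.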
The algorithm library has multiple templates for common tasks in containers.
Sample Code
Note that this code is compiled with g++ TestSimpleList.cpp only. (SimpleList.cpp is
#included by its header file.)
SimpleList.h
SimpleList.cpp
TestSimpleList.cpp
18.1 Dictionary
The dictionary ADT supports:
• insertItem(e): Insert new item e
18.2 Components
The hash table is designed to implement the unsorted dictionary ADT. A hash table consists
of:
• an array of fixed size (normally prime) of buckets
• convert the int to the required range by taking it mod the table-size
A natural method of obtaining a hash code for a string is to convert each char to
an int (e.g. ASCII) and then combine these. While concatenation is possibly the most
obvious, a simpler combination is to use the sum of the individual char’s integer values.
But it is much better to use a function that causes strings differing in a single bit to
have wildly different hash codes.
For example, compute the sum
\[ \sum_i a_i \, 37^i \]
where $a_i$ is the integer value of the $i$-th character.
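The sum can be computed with a running power of 37, as in this sketch. The function names are illustrative, the first character is taken as position i = 0 (an assumption about the indexing), and overflow is simply allowed to wrap around, which is well defined for unsigned types.

```cpp
#include <cstddef>
#include <string>

// Hash code: sum of a_i * 37^i, where a_i is the integer value of s[i].
unsigned long hashCode(const std::string &s) {
    unsigned long code = 0, power = 1;
    for (unsigned char c : s) {
        code += c * power;
        power *= 37;
    }
    return code;
}

// Convert the hash code to a bucket index in the range 0 .. tableSize-1.
std::size_t bucketIndex(const std::string &s, std::size_t tableSize) {
    return hashCode(s) % tableSize;
}
```

With ASCII values, "ab" hashes to 97 + 98 * 37 = 3723, which lands in bucket 3723 mod 101 = 87 for a table of size 101.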
18.4 Collision-Resolution
The simplest method of dealing with collisions is to put all the items with the same
hash-function value into a common bucket implemented as an unsorted linked list: this
is called chaining .
The load factor of a table is the ratio of the number of elements to the table size.
Chaining can handle load factor near 1. For example:
bucket 1: BigBro
bucket 4: MathsTest
bucket 5: Survivor, Dentist
• linear probing : move down array until find vacant (and wrap around if needed):
look at h, h + 1, h + 2, h + 3, . . .
Linear probing causes chunking (runs of occupied buckets) in the table, and open
addressing works best with a load factor below 0.5.
Operations of search and delete become more complex. For example, how do we
determine if string is already in table? And deletion must be done by lazy deletion:
when the entry in a bucket is deleted, the bucket must be marked as “previously used”
rather than “empty”. Why?
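The sketch below shows one way to combine linear probing with lazy deletion, for a set of strings. The class name ProbingSet and its interface are hypothetical, std::hash stands in for the hash function, the table size is assumed to be positive, and there is no rehashing. It also suggests an answer to the question above: a search may stop at an EMPTY bucket, but must continue past a DELETED one, since the item sought may have been placed further along while that bucket was still occupied.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

class ProbingSet {
    enum State { EMPTY, USED, DELETED };          // DELETED = "previously used"
    std::vector<std::string> keys;
    std::vector<State> state;

    std::size_t home(const std::string &s) const {
        return std::hash<std::string>{}(s) % keys.size();
    }
    // Index of s if present, otherwise keys.size().
    std::size_t locate(const std::string &s) const {
        std::size_t i = home(s);
        for (std::size_t n = 0; n < keys.size(); ++n, i = (i + 1) % keys.size()) {
            if (state[i] == EMPTY) break;         // s cannot be any further on
            if (state[i] == USED && keys[i] == s) return i;
        }
        return keys.size();
    }
public:
    explicit ProbingSet(std::size_t tableSize)
        : keys(tableSize), state(tableSize, EMPTY) {}

    bool contains(const std::string &s) const { return locate(s) != keys.size(); }

    // Returns false if s is already present or the table is full.
    bool insert(const std::string &s) {
        if (contains(s)) return false;
        std::size_t i = home(s);
        for (std::size_t n = 0; n < keys.size(); ++n, i = (i + 1) % keys.size())
            if (state[i] != USED) { keys[i] = s; state[i] = USED; return true; }
        return false;
    }
    bool remove(const std::string &s) {
        std::size_t i = locate(s);
        if (i == keys.size()) return false;
        state[i] = DELETED;                       // not EMPTY: later searches must not stop here
        return true;
    }
};
```

Insertion may re-use a DELETED bucket, which is why it first checks that the string is not already stored further along the probe sequence.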
18.5 Rehashing
If the table becomes too full, the obvious idea is to replace the array with one double
the size. However, we cannot just copy the contents over, because the bucket index (the
hash code mod the table size) changes with the table size. Rather, we have to go through
the array and re-insert each entry.
One can show (a process called amortized analysis) that this does not
significantly affect the average running time.
CpSc2120 – Goddard – Notes Chapter 19
Sorting
We have already seen one sorting algorithm: Heap Sort. This has running time
O(n log n). Below are four more comparison-based sorts; that is, they only compare
entries. (An example of an alternative sort is radix sort of integers, which directly
uses the bit pattern of the elements.)
Say the input is an array. Then the natural implementation is such that the sorted
portion is on the left and the yet-to-be-examined elements are on the right.
In the worst case, the running time of Insertion Sort is $O(n^2)$; there are n additions
each taking O(n) time. For example, this running time is achieved if the list starts in
exactly reverse order. On the other hand, if the list is already sorted, then the sort
takes O(n) time. (Why?)
Insertion Sort is an example of an in situ sort; it does not need extra temporary
storage for the data. It is also an example of a stable sort: if there are duplicate
values, then these values remain in the same relative order.
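A minimal array-based sketch, assuming int elements in a vector (the course's Sorting.cpp is a template version and may be organized differently):

```cpp
#include <cstddef>
#include <vector>

void insertionSort(std::vector<int> &a) {
    // Invariant: a[0..i-1] is sorted; a[i..] has not been examined yet.
    for (std::size_t i = 1; i < a.size(); ++i) {
        int value = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > value) {   // strict > keeps equal values in order
            a[j] = a[j - 1];                  // shift larger elements right
            --j;
        }
        a[j] = value;
    }
}
```

On already-sorted input the inner loop never runs, which is where the O(n) best case comes from.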
Since in phase k we end with a single Insertion Sort, the process is guaranteed to sort.
Why then the earlier phases? Well, in those phases, elements can move farther in
one step. Thus, there is a potential speed up. The most natural choice of sequence is
$h_i = n/2^i$. On average this choice does well; but it is possible to concoct data where
this still takes $O(n^2)$ time. Nevertheless, there are choices of the $h_i$ that guarantee
Shell Sort takes better than $O(n^2)$ time.
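With the halving sequence, Shell Sort might be sketched as follows, assuming int elements in a vector. Each phase is an Insertion Sort in which neighbors are h apart rather than 1 apart.

```cpp
#include <cstddef>
#include <vector>

void shellSort(std::vector<int> &a) {
    // Gaps n/2, n/4, ..., 1; the final phase (h = 1) is plain Insertion Sort.
    for (std::size_t h = a.size() / 2; h > 0; h /= 2) {
        for (std::size_t i = h; i < a.size(); ++i) {
            int value = a[i];
            std::size_t j = i;
            while (j >= h && a[j - h] > value) {
                a[j] = a[j - h];              // an element moves h positions in one step
                j -= h;
            }
            a[j] = value;
        }
    }
}
```

Only the line that generates the gaps needs to change to try a different sequence, as long as the last gap is 1.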
1. Arbitrarily split the data
2. Call MergeSort on each half
3. Merge the two sorted halves
The only step that actually does anything is the merging. The question is: how to
merge two sorted lists to form one sorted list. The algorithm is:
repeatedly: compare the two elements at the tops of both lists, removing the
smaller.
The running time of Merge Sort is O(n log n). The reason for this is that there are
$\log_2 n$ levels of the recursion. At each level, the total work is linear, since the merge
takes time proportional to the number of elements.
Note that a disadvantage of Merge Sort is that extra space is needed (this is not
an in situ sort). However, an advantage is that sequential access to the data suffices.
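The three steps can be sketched as below, assuming int elements and returning a new vector rather than sorting in place. The helper name mergeSorted is illustrative.

```cpp
#include <cstddef>
#include <vector>

// Merge two sorted vectors into one sorted vector.
std::vector<int> mergeSorted(const std::vector<int> &x, const std::vector<int> &y) {
    std::vector<int> out;
    out.reserve(x.size() + y.size());
    std::size_t i = 0, j = 0;
    while (i < x.size() && j < y.size())
        out.push_back(y[j] < x[i] ? y[j++] : x[i++]);   // on ties take from x: stable
    while (i < x.size()) out.push_back(x[i++]);
    while (j < y.size()) out.push_back(y[j++]);
    return out;
}

std::vector<int> mergeSort(const std::vector<int> &a) {
    if (a.size() <= 1) return a;                        // nothing to sort
    auto mid = a.begin() + a.size() / 2;                // split down the middle
    std::vector<int> left(a.begin(), mid), right(mid, a.end());
    return mergeSorted(mergeSort(left), mergeSort(right));
}
```

The merge only ever looks at the front of each list, which is why sequential access suffices; the extra vectors are the extra space mentioned above.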
19.4 QuickSort
A famous recursive divide-and-conquer algorithm is QuickSort.
1. Pick a pivot
2. Partition the array into those elements smaller and those elements bigger
than the pivot
3. Call QuickSort on each piece
The most obvious method of picking a pivot is just to take the first element. This
turns out to be a very bad choice if, for example, the data is already sorted. Ideally
one wants a pivot that splits the data into two like-sized pieces. A common method
to pick a pivot is called middle-of-three: look at the three elements at the start,
middle and end of the array, and use the median value of these three. The “average”
running time of QuickSort is O(n log n). But one can concoct data where QuickSort
takes $O(n^2)$ time.
There is a standard implementation. Assume the pivot is in the first position.
One creates two “pointers” initialized to the start and end of the array. The pivot is
removed to create a hole. The pointers move towards each other, one always pointing
to the hole. This is done such that: the elements before the first pointer are smaller
than the pivot and the elements after the second are larger than the pivot, while the
elements between the pointers have not been examined. When the pointers meet, the
hole is refilled with the pivot, and the recursive calls begin.
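The hole-based partition described above might look as follows. This is a sketch assuming int elements and the first element as pivot (no middle-of-three), with i and j as the two "pointers"; it is one common way to write the scheme and not necessarily the exact code the notes refer to.

```cpp
#include <vector>

// Sorts a[lo..hi] (inclusive). The pivot is taken from the first position.
void quickSort(std::vector<int> &a, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[lo];                        // removing the pivot leaves a hole at lo
    int i = lo, j = hi;
    while (i < j) {
        while (i < j && a[j] >= pivot) --j;   // hole is at i: scan from the right
        a[i] = a[j];                          // fill it; the hole moves to j
        while (i < j && a[i] <= pivot) ++i;   // hole is at j: scan from the left
        a[j] = a[i];                          // fill it; the hole moves to i
    }
    a[i] = pivot;                             // pointers have met: refill the hole
    quickSort(a, lo, i - 1);
    quickSort(a, i + 1, hi);
}

void quickSort(std::vector<int> &a) {
    quickSort(a, 0, static_cast<int>(a.size()) - 1);
}
```

To use middle-of-three, one would first swap the chosen median into position lo and then proceed exactly as above.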
19.5 Lower Bound for Sorting
Any comparison-based sorting algorithm has worst-case running time at least proportional
to n log n; that is, it is Ω(n log n).
Here is the idea behind this lower bound. First we claim that there are essentially
n! possible answers to the question: what does the sorted list look like. One way to see
this, is that sorting entails determining the rank (1 to n) of every element. And there
are n! possibilities for the list of ranks.
Now, each operation (such as a comparison) reduces the number of possibilities by
at best a factor of 2. So we need at least $\log_2(n!)$ steps to guarantee having narrowed
down the list to one possibility. (The code can be thought of as a binary decision tree.)
A mathematical fact (using Stirling’s formula) is that $\log_2(n!)$ is proportional to n log n.
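For readers who would rather avoid Stirling's formula, the same growth rate follows from an elementary estimate: keep only the larger half of the factors of $n!$ for the lower bound, and replace every factor by $n$ for the upper bound.

```latex
\[
\log_2(n!) \;=\; \sum_{i=1}^{n} \log_2 i
\;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \log_2 i
\;\ge\; \frac{n}{2}\,\log_2\frac{n}{2},
\qquad
\log_2(n!) \;\le\; n \log_2 n .
\]
```

The middle sum has at least $n/2$ terms, each at least $\log_2(n/2)$, so $\log_2(n!)$ is sandwiched between two multiples of $n \log n$.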
Sample Code
Here is template code for Insertion Sort. We also introduce the idea of a comparator ,
where the user can specify how the elements are to be compared.
Sorting.cpp
CpSc2120 – Goddard – Notes Chapter 20
Algorithmic Techniques
There are three main algorithmic techniques: Divide and conquer, greedy
algorithms, and dynamic programming.
1. Divide and Conquer. In this approach, you find a way to divide the problem
into pieces such that: if you recursively solve each piece, you can stitch together
the solutions to each piece to form the overall solution. Both Merge Sort and
QuickSort are classic examples of divide-and-conquer algorithms. Another
famous example is modular exponentiation (used in cryptography).
3. Dynamic Programming. If you find a way to break the problem into pieces, but
the number of pieces seems to explode, then you probably need the technique
known as dynamic programming. We do not study this.
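To make the divide-and-conquer item concrete, here is a sketch of modular exponentiation by repeated squaring. The function name modPow is illustrative, and the modulus is assumed to be positive and below 2^32 so that the intermediate products fit in 64 bits.

```cpp
#include <cstdint>

// Computes (base^exp) mod m. Assumes 0 < m < 2^32.
std::uint64_t modPow(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    if (exp == 0) return 1 % m;
    std::uint64_t half = modPow(base % m, exp / 2, m);    // solve the half-size problem
    std::uint64_t result = (half * half) % m;             // stitch: square the answer
    if (exp % 2 == 1) result = (result * (base % m)) % m; // odd exponent: one extra factor
    return result;
}
```

The exponent halves at each call, so only about log2(exp) multiplications are needed instead of exp of them.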
CpSc2120 – Goddard – Notes Chapter 21
Graphs
21.1 Graphs
A graph has two parts: vertices (singular: vertex), also called nodes, and edges joining
pairs of vertices. An undirected graph has undirected edges. Two vertices joined by an
edge are neighbors. A directed graph has directed edges/arcs; each arc goes from
in-neighbor to out-neighbor.
Examples include:
• city map
• circuit diagram
• chemical molecule
• family tree
Adjacency Matrix
1) container of numbered vertices, and
2) array where each entry has info about the corresponding edge.
Adjacency List
1) container of vertices, and
2) for each vertex an unsorted bag of out-neighbors.
An example directed graph (with labeled vertices and arcs):
[Figure: a directed graph on the vertices A, B, C, D, E whose arcs are labeled with colors.]
Adjacency array (row = tail of arc, column = head of arc):
        A       B       C       D       E
A       -       orange  -       -       -
B       -       -       black   green   blue
C       -       -       -       -       -
D       -       -       yellow  -       -
E       white   red     -       -       -
Adjacency list:
A:  (orange, B)
B:  (black, C), (green, D), (blue, E)
C:
D:  (yellow, C)
E:  (red, B), (white, A)
21.3 Aside
Practice. Draw each of the following without lifting your pen or going over the same
line twice.
A topological ordering is an ordering of the vertices such that every arc goes
from lower number to higher number vertex.
Example. In the following DAG, one topological ordering is: E A F B D C.
[Figure: a DAG on the vertices A, B, C, D, E, F.]
For efficiency, use the Adjacency List representation of the graph. Also:
1. maintain a counter in-degree at each vertex v; this counts the arcs into the vertex
from “nondeleted” vertices, and decrement every time the current source has an
arc to v (no actual deletions).
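A sketch of topological sort using in-degree counters on an adjacency list follows. Vertices are assumed to be numbered 0 to n-1, the function name is illustrative, and a queue is used to hold the current sources (any container would do).

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// adj[u] lists the out-neighbors of vertex u.
// Returns a topological ordering, or an empty vector if the graph has a cycle.
std::vector<int> topologicalOrder(const std::vector<std::vector<int>> &adj) {
    std::size_t n = adj.size();
    std::vector<int> inDegree(n, 0);
    for (const auto &out : adj)
        for (int v : out) ++inDegree[v];

    std::queue<int> sources;                 // vertices whose counter is 0
    for (std::size_t v = 0; v < n; ++v)
        if (inDegree[v] == 0) sources.push(static_cast<int>(v));

    std::vector<int> order;
    while (!sources.empty()) {
        int u = sources.front(); sources.pop();
        order.push_back(u);
        for (int v : adj[u])                 // "delete" u by decrementing counters only
            if (--inDegree[v] == 0) sources.push(v);
    }
    if (order.size() != n) order.clear();    // some vertex never became a source: cycle
    return order;
}
```

Each vertex and each arc is handled once, so the running time is proportional to the number of vertices plus the number of arcs.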
Sample Code
Here is an abstract base class DAG, an implementation of topological sort for that class,
and an adjacency-list implementation of the class.
Dag.h
GraphAlgorithms.cpp
AListDAG.h
AListDAG.cpp
CpSc2120 – Goddard – Notes Chapter 22
Paths & Searches
Visit the source; then all its neighbors; then all their neighbors; and so on.
If the graph is a tree and one starts at the root, then one visits the root, then the root’s
children, then the nodes at depth 2, and so on. That is, one level at a time. This is
sometimes called level ordering .
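[Figure: a tree with nodes numbered in level order: 1 at the root; 2 and 3 at depth 1; 4 to 7 at depth 2; 8 to 10 at depth 3.]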
BFS uses a queue: each time a node is visited, one adds its (not yet visited) out-
neighbors to the queue of nodes to be visited. The next node to be visited is extracted
from the front of the queue.
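The queue-based search can be sketched as follows, assuming vertices numbered 0 to n-1 and an adjacency list of out-neighbors. Here a vertex is marked as soon as it joins the queue, so that it is never added twice.

```cpp
#include <queue>
#include <vector>

// Returns the vertices reachable from source, in the order BFS visits them.
std::vector<int> bfsOrder(const std::vector<std::vector<int>> &adj, int source) {
    std::vector<bool> seen(adj.size(), false);
    std::vector<int> order;
    std::queue<int> q;
    seen[source] = true;
    q.push(source);
    while (!q.empty()) {
        int u = q.front(); q.pop();          // next node comes from the front
        order.push_back(u);                  // visit u
        for (int w : adj[u])
            if (!seen[w]) { seen[w] = true; q.push(w); }
    }
    return order;
}
```

Replacing the queue by a stack changes the order in which vertices are explored and gives a depth-first flavor instead.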
keep exploring new vertex from current vertex; when get stuck, backtrack to
most recent vertex with unexplored neighbors
In DFS, the search continues going deeper into the graph whenever possible. When
the search reaches a dead end, it backtracks to the last (visited) node that has
unvisited neighbors, and continues searching from there. A DFS uses a stack: each time
a node is visited, its unvisited neighbors are pushed onto the stack for later use, while
one of its children is explored next. When one reaches a dead end, one pops off the
stack. The edges/arcs used to discover new vertices form a tree.
Example. Here is a graph and a DFS-tree from vertex A:
[Figure: a graph on the vertices A, B, C, D, E, F, and beside it a DFS-tree of that graph rooted at A.]
If the graph is itself a tree, we can still use DFS. Here is an example:
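[Figure: a tree with nodes numbered in the order DFS visits them: 1 at the root; 2 and 6 at depth 1; 3, 5, 7, 8 at depth 2; 4, 9, 10 at depth 3.]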
Algorithm: DFS(v):
    label v as visited
    for all edges e outgoing from v
        w = other end of e
        if w unvisited then {
            label e as tree-edge
            recursively call DFS(w)
        }
Note:
• DFS visits all vertices that are reachable
• to keep track of whether a vertex has been visited, one must add a field to the
vertex (the decorator pattern)
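The recursive algorithm can be sketched as below. Instead of adding a field to each vertex, this version keeps the visited flags in a separate vector indexed by vertex number, which is an alternative to the decorator approach; the names DfsResult and dfsFrom are illustrative.

```cpp
#include <utility>
#include <vector>

struct DfsResult {
    std::vector<bool> visited;                     // one flag per vertex
    std::vector<std::pair<int, int>> treeEdges;    // arcs used to discover new vertices
};

void dfs(const std::vector<std::vector<int>> &adj, int v, DfsResult &r) {
    r.visited[v] = true;
    for (int w : adj[v]) {
        if (!r.visited[w]) {
            r.treeEdges.push_back({v, w});         // label the arc as a tree-edge
            dfs(adj, w, r);                        // the call stack is the DFS stack
        }
    }
}

DfsResult dfsFrom(const std::vector<std::vector<int>> &adj, int source) {
    DfsResult r;
    r.visited.assign(adj.size(), false);
    dfs(adj, source, r);
    return r;
}
```

After dfsFrom returns, the visited flags say exactly which vertices are reachable from the source.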
Algorithm:
    1. Do a DFS from arbitrary vertex v & check that all vertices are reached
    2. Reverse all arcs and repeat
22.4 Distance
The distance between two vertices is the minimum number of arcs/edges on a path
between them. In a weighted graph, the weight of a path is the sum of weights
of arcs/edges. The distance between two vertices is the minimum weight of a path
between them. For example, in a BFS in an unweighted graph, vertices are visited in
order of their distance from the start.
Example. In the example graph below, the distance from A to E is 7 (via vertices
B and D):
[Figure: a weighted graph on the vertices A, B, C, D, E, F.]
For each vertex, maintain dist giving minimum weight of path to it found so far. Each
iteration, choose a vertex of minimum dist, finalize it and update all dist values.
If doing this by hand, one can set it out in a table. Each round, one circles
the smallest value in an unfinalized column, and then updates the values in all other
unfinalized columns.
Example. Here are the steps of Dijkstra’s algorithm on the example graph above,
starting at A.
A    B    C    D    E    F
0    ∞    ∞    ∞    ∞    ∞     (finalize A)
     4    ∞    ∞    ∞    5     (finalize B)
          8    6    ∞    5     (finalize F)
          8    6    ∞          (finalize D)
          8         7          (finalize E)
          8                    (finalize C)
Comments:
• Implementation: store boolean array known. To get the actual shortest path,
store Vertex array prev.
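The algorithm might be coded as in the sketch below, which keeps the known and prev arrays mentioned above. Several things are assumptions for illustration: vertices are numbered 0 to n-1, weights are nonnegative, the names Arc and ShortestPaths are made up, and a priority queue is used to find the unfinalized vertex of minimum dist (scanning an array would also work).

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Arc { int to; long weight; };              // weight assumed nonnegative

struct ShortestPaths {
    std::vector<long> dist;                       // INF if unreachable
    std::vector<int> prev;                        // previous vertex on the path, -1 if none
};

const long INF = std::numeric_limits<long>::max();

ShortestPaths dijkstra(const std::vector<std::vector<Arc>> &adj, int source) {
    std::size_t n = adj.size();
    ShortestPaths sp{std::vector<long>(n, INF), std::vector<int>(n, -1)};
    std::vector<bool> known(n, false);
    using Entry = std::pair<long, int>;           // (dist, vertex)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    sp.dist[source] = 0;
    pq.push({0L, source});
    while (!pq.empty()) {
        int u = pq.top().second; pq.pop();
        if (known[u]) continue;                   // out-of-date queue entry
        known[u] = true;                          // finalize u
        for (const Arc &a : adj[u]) {
            if (!known[a.to] && sp.dist[u] + a.weight < sp.dist[a.to]) {
                sp.dist[a.to] = sp.dist[u] + a.weight;
                sp.prev[a.to] = u;
                pq.push({sp.dist[a.to], a.to});
            }
        }
    }
    return sp;
}
```

To recover an actual shortest path to a vertex, follow prev backwards from that vertex until the source is reached.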