Data Structures and Algorithms
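Growth of common running times:
Function               N=2  N=4  N=8  N=16
Logarithmic (log2 N)   1    2    3    4
Linear (N)             2    4    8    16
NlogN (N log2 N)       2    8    24   64
Quadratic (N^2)        4    16   64   256
Exponential (2^N)      4    16   256  65536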
How to measure recursive functions
N is the input size
T(N) is the running time as a function of N
T(1) = 1 (base case)
If N > 1:
2 recursive calls, each on an array of size N/2
T(N/2) + T(N/2) = 2T(N/2), plus N steps of non-recursive work
$T(N) = 2T(N/2) + N$
What is T(N/2)?
$T(N/2) = 2T(N/4) + N/2$
Say N = 8:
$T(8) = 2T(4) + 8 = 32$
$T(4) = 2T(2) + 4 = 12$
$T(2) = 2T(1) + 2 = 4$
$T(1) = 1$
$N = 2^k$, so $k = \log_2 N$
$T(N) = N(k+1) = Nk + N = N\log_2 N + N = O(N\log N)$ (drop the + N)
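To sanity-check the closed form, the recurrence can be evaluated directly for powers of two and compared with N(k + 1). A minimal C++ sketch; the function names are made up for illustration.

```cpp
#include <cstdint>

// T(1) = 1; T(N) = 2T(N/2) + N, assuming N is a power of 2.
std::uint64_t recurrenceT(std::uint64_t n) {
    if (n == 1) return 1;
    return 2 * recurrenceT(n / 2) + n;
}

// Closed form N * (k + 1) = N log2 N + N, where N = 2^k.
std::uint64_t closedForm(std::uint64_t n) {
    std::uint64_t k = 0;
    while ((std::uint64_t{1} << k) < n) ++k;   // k = log2 N for powers of two
    return n * (k + 1);
}
```

For example, recurrenceT(8) and closedForm(8) both give 32, matching the hand calculation.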
Template class examples:
Can't use the same typename (template parameter name) for two templates when one includes the other
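#include "stacknode.cpp"   // template code must be visible where it's used, so the node file is included
template <typename Comparable>
class Stack {
public:
    Stack() : top(nullptr) {}               // start empty (top was never initialized before)
    ~Stack() { while (!empty()) pop(); }    // free any nodes still on the stack
    Comparable first() { return top->getdata(); }   // peek; caller checks empty() first
    // Save the first one, delete it, then point top to the next one.
    Comparable pop()
    {
        Comparable temp = top->data;        // Stack must be a friend of StackNode
        StackNode<Comparable>* t = top->next;
        delete top;
        top = t;
        return temp;
    }
    void push(Comparable c)
    {
        StackNode<Comparable>* s = new StackNode<Comparable>(c);
        s->next = top;
        top = s;
    }
    bool empty() { return top == nullptr; }
private:
    StackNode<Comparable>* top;
};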
Chapter 3
Friday, February 08, 2008
11:01 AM
Different ways of arranging data
Abstract data types ADTs
Lists
List of stuff, duh
List nodes
Stacks
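New items go on at the beginning (the top), and items are taken off from the beginning
Last In, First Out
Queues
New items are added at the end (the back), and items are taken off from the front
First In, First Out
Both can be implemented with linked lists or contiguous arrays behind the scenes; to the user of the ADT it doesn't matter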
Next program due Feb 18th
Converts infix to postfix
Infix
1. 8+6*2
2. 9*8/2 + 3 - 4*7
Postfix expression
1. 8 6 2 * +
2. 9 8 * 2 / 3 + 4 7 * -
Input: a .txt file with one infix expression per line (parentheses allowed), no spaces
Assume all numbers are single-digit and positive
main
For all expressions
Read expression
Postfix = convert(expr)
Answer = eval (postfix)
All three steps use the Stack class I write
int precedence(char)
Character   Precedence
(           0
+ or -      1
* or /      2
Use getline
Reads a whole line from the input file into a C string (char array) and ends it with '\0'
The stream it returns tests false (null) once the end of the .txt file is reached
Push means put on the top of the stack
Pop means take off the top of the stack
After this, you have to evaluate the expression
Use a character converter, or take the ASCII code of the character and subtract the ASCII code of '0' (c - '0')
Then cast them to doubles because, with /, you need the precision
For operator characters, use an if statement: if (c == '*') then multiply
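A minimal C++ sketch of the convert and eval steps from the main outline, assuming single-digit operands, only + - * / and parentheses, and no spaces. It uses std::stack in place of the hand-written Stack class, with precedence ( = 0, + and - = 1, * and / = 2; treat it as one possible approach, not the assignment's required design.

```cpp
#include <cctype>
#include <stack>
#include <stdexcept>
#include <string>

// '(' is lowest, so it stays on the stack until its ')' arrives.
int precedence(char op) {
    if (op == '*' || op == '/') return 2;
    if (op == '+' || op == '-') return 1;
    return 0;  // '('
}

// Infix to postfix with an operator stack.
std::string convert(const std::string& infix) {
    std::stack<char> ops;
    std::string postfix;
    for (char c : infix) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            postfix += c;                          // operands go straight to the output
        } else if (c == '(') {
            ops.push(c);
        } else if (c == ')') {
            while (ops.top() != '(') { postfix += ops.top(); ops.pop(); }
            ops.pop();                             // discard the '('
        } else {
            // Pop operators of greater or equal precedence (left-to-right evaluation).
            while (!ops.empty() && precedence(ops.top()) >= precedence(c)) {
                postfix += ops.top();
                ops.pop();
            }
            ops.push(c);
        }
    }
    while (!ops.empty()) { postfix += ops.top(); ops.pop(); }
    return postfix;
}

// Evaluate postfix; digits become doubles so '/' keeps its precision.
double eval(const std::string& postfix) {
    std::stack<double> vals;
    for (char c : postfix) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            vals.push(static_cast<double>(c - '0'));   // ASCII code minus the code of '0'
            continue;
        }
        double right = vals.top(); vals.pop();         // second operand is on top
        double left = vals.top(); vals.pop();
        if (c == '+') vals.push(left + right);
        else if (c == '-') vals.push(left - right);
        else if (c == '*') vals.push(left * right);
        else if (c == '/') vals.push(left / right);
        else throw std::invalid_argument("unknown operator");
    }
    return vals.top();
}
```

For example, convert("9*8/2+3-4*7") gives "98*2/3+47*-", and eval of that string gives 11.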
Must have definition of a stack class
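// StackNode comes first because Stack's member functions use its members
class StackNode {
    friend class Stack;          // lets Stack read data and next directly
public:
    StackNode(char c) : data(c), next(nullptr) {}
    char getdata() { return data; }
private:
    char data;                   // or template it
    StackNode* next;             // nullptr if nothing below
};
class Stack {
public:
    Stack() : head(nullptr) {}
    char top() { return head->getdata(); }   // peek; stack must not be empty
    char pop()                   // save the first one, delete it, then point head to the next one
    {
        char c = head->data;
        StackNode* t = head->next;
        delete head;
        head = t;
        return c;
    }
    void push(char c)
    {
        StackNode* s = new StackNode(c);
        s->next = head;
        head = s;
    }
    bool empty() { return head == nullptr; }
private:
    StackNode* head;
};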
Iterators
// ListNode comes first so List can use it
class ListNode {
public:
    ListNode(int d) { data = d; next = nullptr; }
private:
    friend class List;
    int data;
    ListNode* next;
};
class List {
public:
    List() : head(nullptr) {}
    void insert(int);
    int remove();    // was "int delete();", but delete is a C++ keyword (not defined in these notes)
private:
    ListNode* head;
};
void List::insert(int d)
{
    // ITERATOR: ptr walks the list one node at a time
    ListNode* n = new ListNode(d);
    ListNode* ptr = head;
    if (ptr == nullptr)        // if empty list
    {
        head = n;
        return;                // even though it's a void function, returning nothing is ok
    }
    while (ptr->next != nullptr)
    {
        ptr = ptr->next;
    }                          // ptr is now the last node, so this inserts at the end
    ptr->next = n;
}
int main()
{
    List lst;
    return 0;
}
Debugging
In cygwin command line
g++ -g program.cpp -o output.exe
gdb output.exe
run
Have a second cygwin or other terminal window open so that you can recompile
the program without exiting gdb
In gdb, use kill to terminate current program, recompile it in the other window,
then use run again in gdb
Chapter 4
Friday, February 22, 2008
11:00 AM
Tree ADT (abstract data type)
Nodes, root
Drawn like an upside-down tree, because the root is at the top
Nodes that don’t have children are called leaves (NULL children)
Pasted from <http://www.toves.org/books/data/ch05-trees/not-a-tree.png>
Binary tree: at most 2 children per node
If complete:
Binary search tree
Each node holds a key; keys in its left subtree are smaller and keys in its right subtree are larger, so a search only follows one path down, which saves a lot of time
An in-order traversal prints the keys in sorted order
To write it you need two classes: Tree and TreeNode
Tree has TreeNode* root;
TreeNode class has
The data, maybe templated (template <typename T>)
TreeNode* lchild
TreeNode* rchild
Page 222 in book
Evaluate the tree
Start at the top pointer root
Program
Stack
Pointer to tree node in the stack
Use the stack (of chars) while converting the infix expression to postfix
Only using the stack as a temporary data structure
Use a stack (of TreeNode pointers) in the constructor of the tree
Pop subtrees off the stack and attach them to new operator nodes
Evaluation shouldn't need the stack: when construction finishes, the tree's root points to the last subtree left on the stack
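A sketch of the expression-tree idea in C++, assuming the expression is already in postfix form with single-digit operands. buildTree keeps TreeNode pointers on a std::stack while building; evaluate then walks the finished tree recursively without any stack. The names buildTree, evaluate, and destroy are hypothetical.

```cpp
#include <cctype>
#include <stack>
#include <string>

struct TreeNode {
    char data;                 // a digit or one of + - * /
    TreeNode* lchild;
    TreeNode* rchild;
    TreeNode(char c, TreeNode* l = nullptr, TreeNode* r = nullptr)
        : data(c), lchild(l), rchild(r) {}
};

// Build the tree from postfix using a stack of TreeNode pointers.
TreeNode* buildTree(const std::string& postfix) {
    std::stack<TreeNode*> st;
    for (char c : postfix) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            st.push(new TreeNode(c));                // operands become leaves
        } else {
            TreeNode* right = st.top(); st.pop();
            TreeNode* left = st.top(); st.pop();
            st.push(new TreeNode(c, left, right));   // operator node adopts two subtrees
        }
    }
    return st.top();   // the last subtree left on the stack is the whole tree
}

// Evaluate from the root down; no stack needed here.
double evaluate(const TreeNode* p) {
    if (std::isdigit(static_cast<unsigned char>(p->data))) return p->data - '0';
    double l = evaluate(p->lchild);
    double r = evaluate(p->rchild);
    switch (p->data) {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        default:  return l / r;   // '/'
    }
}

void destroy(TreeNode* p) {       // post-order delete
    if (p == nullptr) return;
    destroy(p->lchild);
    destroy(p->rchild);
    delete p;
}
```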
Another way to implement a general tree with only two pointers per node is first child / next sibling
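Pre-order traversal
Starts at the root and works its way down to the leaves: visit the node first, then its left subtree, then its right subtree
#include <iostream>
struct TreeNode {                 // minimal node so the functions below compile
    int data;
    TreeNode* lchild;
    TreeNode* rchild;
};
void pre(TreeNode* p)
{
    if (p == nullptr)
        return;
    std::cout << p->data << ' ';  // print
    pre(p->lchild);
    pre(p->rchild);
}
Post-order traversal
Goes to the leaves first, then works its way back up to the root: both subtrees are visited before the node itself
void post(TreeNode* p)
{
    if (p == nullptr)
        return;
    post(p->lchild);
    post(p->rchild);
    std::cout << p->data << ' ';  // print
}
In-order traversal
Visits the left subtree, then the node, then the right subtree; on a binary search tree the keys come out in sorted order
void in(TreeNode* p)
{
    if (p == nullptr)
        return;
    in(p->lchild);
    std::cout << p->data << ' ';  // print
    in(p->rchild);
}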
For a binary search tree of objects, the element type (Comparable) needs operator< overloaded
AVL trees
Figuring out better and better ways of storing things
If you keep inserting keys in increasing (or decreasing) order, you get a lopsided tree that is basically a linked list
Just making the left and right sides of the root the same height isn't enough; AVL trees require:
For every node, the heights of the left and right subtrees differ by at most 1
Needs to be true for every node and sub tree too! Not just root
When inserting, it is important that this is maintained
To maintain this property, insert a new node as usual, then when inserted, as
you go back up through the recursion, check the AVL property of each node
Check that the heights of children are ok (differ by <= 1)
If the property is VIOLATED after an insert, find the problem node (the alpha node): the deepest node on the insertion path whose subtrees' heights now differ by 2
Height = length of the longest path from that node to a leaf node, -1 if NULL
Pasted from <http://users.informatik.uni-halle.de/~jopsi/dinf204/avl_tree.gif>
Consider 4 cases
From the alpha node, which way did the insertion go?
Left-left or right-right: fix with a single rotation
Left-right or right-left: fix with a double rotation
http://www.cs.jhu.edu/~goodrich/dsa/trees/avltree.html
Check text book for how to do a rotation
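The textbook has the official rotation code; as a rough sketch, here is how the four cases map to single and double rotations in C++, assuming int keys and a stored height per node (-1 for NULL, matching the height definition above). The function names are illustrative, not the book's.

```cpp
#include <algorithm>

struct AvlNode {
    int key;
    AvlNode* left;
    AvlNode* right;
    int height;                        // a single node has height 0
    explicit AvlNode(int k) : key(k), left(nullptr), right(nullptr), height(0) {}
};

int heightOf(const AvlNode* t) { return t ? t->height : -1; }   // -1 if NULL

void updateHeight(AvlNode* t) {
    t->height = 1 + std::max(heightOf(t->left), heightOf(t->right));
}

// Left-left case: single rotation with the left child; returns the new subtree root.
AvlNode* rotateWithLeftChild(AvlNode* k2) {
    AvlNode* k1 = k2->left;
    k2->left = k1->right;
    k1->right = k2;
    updateHeight(k2);
    updateHeight(k1);
    return k1;
}

// Right-right case: the mirror image.
AvlNode* rotateWithRightChild(AvlNode* k1) {
    AvlNode* k2 = k1->right;
    k1->right = k2->left;
    k2->left = k1;
    updateHeight(k1);
    updateHeight(k2);
    return k2;
}

// Left-right case: rotate the left child first, then the alpha node.
AvlNode* doubleWithLeftChild(AvlNode* k3) {
    k3->left = rotateWithRightChild(k3->left);
    return rotateWithLeftChild(k3);
}

// Right-left case: the mirror image.
AvlNode* doubleWithRightChild(AvlNode* k1) {
    k1->right = rotateWithLeftChild(k1->right);
    return rotateWithRightChild(k1);
}

// Insert as in a plain BST, then check the AVL property on the way back up.
AvlNode* insert(AvlNode* t, int x) {
    if (t == nullptr) return new AvlNode(x);
    if (x < t->key) {
        t->left = insert(t->left, x);
        if (heightOf(t->left) - heightOf(t->right) == 2)      // t is the alpha node
            t = (x < t->left->key) ? rotateWithLeftChild(t)    // left-left
                                   : doubleWithLeftChild(t);   // left-right
    } else if (x > t->key) {
        t->right = insert(t->right, x);
        if (heightOf(t->right) - heightOf(t->left) == 2)
            t = (x > t->right->key) ? rotateWithRightChild(t)  // right-right
                                    : doubleWithRightChild(t); // right-left
    }                                                          // duplicates are ignored
    updateHeight(t);
    return t;
}
```

Inserting 1 through 7 in increasing order, which would make a plain BST lopsided, gives a perfectly balanced tree rooted at 4.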
In studying:
Be sure to read
4.8
Skip in chapter 4:
Average case analysis 4.3.6
Splay trees 4.5
Sets
A way of storing items so that inserting and erasing are efficient
Maps
A way of storing key/value pairs efficiently
Chap 4 class notes
Tuesday, March 04, 2008
12:24 PM
Chapter 5
Monday, March 17, 2008
11:03 AM
Hashes
Hash table
Records
Each record has a key, a string or an int
There has to be some algorithm (the hashing function) to turn the key into an array index
The hash table is an array
Take the key, turn it into an index into the array
Table size
For example, if the keys are strings, you could convert each char into an int, add them together, then take the sum mod the table size so the result is within the table
Collision: when the hashing function maps two different keys to the same position in the array
Have each array slot be a linked list, so the slot can point to one key, which points to the next
It's best to make the table size a prime number, so that when you do %TableSize you get an even distribution of keys
Separate chaining
Hash table class necessities
== != operators
My hash
The hashing function /algorithm
Insert
Remove
Contains (bool)
Rehash
The class that implements the hash table has to have a hash function in it
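A minimal separate-chaining hash table in C++, assuming string keys and the simple add-up-the-characters hash described above (easy to explain, not a good hash in practice). Chains are std::list; rehashing to the next prime after 2 * table size once the number of keys exceeds the table size is an assumption. The member names follow the list above; std::string already provides the == and != that lookups rely on.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

bool isPrime(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

int nextPrime(int n) {                       // smallest prime >= n
    while (!isPrime(n)) ++n;
    return n;
}

class HashTable {
public:
    explicit HashTable(int size = 7) : lists(nextPrime(size)), count(0) {}

    bool contains(const std::string& key) const {
        const std::list<std::string>& chain = lists[myhash(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    bool insert(const std::string& key) {
        std::list<std::string>& chain = lists[myhash(key)];
        if (std::find(chain.begin(), chain.end(), key) != chain.end())
            return false;                    // already present
        chain.push_back(key);
        if (++count > tableSize()) rehash(); // keep about one key per slot
        return true;
    }

    bool remove(const std::string& key) {
        std::list<std::string>& chain = lists[myhash(key)];
        auto it = std::find(chain.begin(), chain.end(), key);
        if (it == chain.end()) return false;
        chain.erase(it);
        --count;
        return true;
    }

    int tableSize() const { return static_cast<int>(lists.size()); }

private:
    std::vector<std::list<std::string>> lists;   // one chain per slot
    int count;                                    // number of keys stored

    // Add up the character codes, then take the sum mod the (prime) table size.
    std::size_t myhash(const std::string& key) const {
        unsigned long sum = 0;
        for (unsigned char c : key) sum += c;
        return sum % lists.size();
    }

    // Grow to the next prime after 2 * old size and reinsert every key.
    void rehash() {
        std::vector<std::list<std::string>> old = std::move(lists);
        lists.assign(nextPrime(2 * static_cast<int>(old.size())), std::list<std::string>());
        count = 0;
        for (const auto& chain : old)
            for (const auto& key : chain) insert(key);
    }
};
```

Starting from size 7, the 8th key triggers a rehash to 17 slots and the 18th key a rehash to 37.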
Example keys: 56 92 14 7 89 32 71 17 23 46 60 69
Table size 7; the hash function is just key % 7 (separate chaining, first four keys shown)
0 ==> 56 => 14 => 7
1 ==> 92
2 ==>
3 ==>
4 ==>
5 ==>
6 ==>
Linear probing (as opposed to quadratic probing)
On a collision, go down to the next slot (wrapping around) until you find an empty space
0 ==> 56
1 ==> 92
2 ==> 14
3 ==> 7
4 ==>
5 ==>
6 ==>
Keep the number of elements below tablesize/2
If there are more than tablesize/2, increase the size
When you increase the size, use the next prime number after 2*tablesize
Then rehash (reassign every key by redoing the hash function with the new table size)
First you go wherever the hash function takes you, h(x)
If there's a collision, go to the next spot ($h(x) + 1^2$, i.e. 1 spot over, if quadratic)
If there's another collision, go to the next spot ($h(x) + 2^2$, i.e. 4 spots from h(x), if quadratic); don't forget to use % for positions past the table size
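Both probing strategies fit in one small C++ function. This sketch assumes nonnegative int keys and uses -1 to mark empty slots; the names insertProbe and Probe are hypothetical. With the example keys 56, 92, 14, 7 and table size 7, linear probing fills slots 0, 1, 2, 3, while quadratic probing puts 14 in slot 4 (0 + 2^2) and 7 in slot 2 ((0 + 3^2) % 7).

```cpp
#include <vector>

enum class Probe { Linear, Quadratic };

// Insert key into an open-addressing table (empty slots hold -1).
// Probe i checks (h + i) % T for linear probing or (h + i*i) % T for quadratic.
// Returns the slot used, or -1 if no empty slot turned up within T probes.
int insertProbe(std::vector<int>& table, int key, Probe mode) {
    const int T = static_cast<int>(table.size());
    const int h = key % T;                              // hash function: key mod table size
    for (int i = 0; i < T; ++i) {
        long offset = (mode == Probe::Linear) ? i : static_cast<long>(i) * i;
        int pos = static_cast<int>((h + offset) % T);   // % wraps past the end of the table
        if (table[pos] == -1) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}
```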
Proof: quadratic probing
Theorem: if quadratic probing is used and the table size T is prime, then a new element can always be inserted if the table is at least half empty
Say the table size is 7; ceiling(7/2) is 4
Probe locations: $h(x)$, $(h(x)+1^2) \bmod 7$, $(h(x)+2^2) \bmod 7$, $(h(x)+3^2) \bmod 7$
Claim: the first $\lfloor T/2 \rfloor + 1$ probe locations (that's $\lceil T/2 \rceil$ for odd T), $(h(x)+i^2) \bmod T$ for $0 \le i \le \lfloor T/2 \rfloor$, are all distinct
Insert the 1st thing; if the next one has a collision, you'll get a new location
Proof by contradiction
Take two of these locations, $(h(x)+i^2) \bmod T$ and $(h(x)+j^2) \bmod T$, with $0 \le i < j \le \lfloor T/2 \rfloor$
What if they weren't distinct? Assume $(h(x)+i^2) \bmod T = (h(x)+j^2) \bmod T$
$A \bmod T = B \bmod T$ means T divides $A - B$, so T divides $(h(x)+i^2) - (h(x)+j^2) = i^2 - j^2$
$(i^2 - j^2) \bmod T = 0$, and $i^2 - j^2 = (i-j)(i+j)$
T is prime, so T divides $(i-j)$ or T divides $(i+j)$
$0 < j - i \le \lfloor T/2 \rfloor < T$, so T can't divide $i - j$
$0 < i + j \le 2\lfloor T/2 \rfloor - 1 < T$, so T can't divide $i + j$ either
Contradiction
So these first $\lfloor T/2 \rfloor + 1$ probes always land on different indexes
If the table is at least half empty, at most $\lfloor T/2 \rfloor$ slots are full, so at least one of these distinct locations is empty
(Without the mod, the same argument is just $h(x)+i^2 = h(x)+j^2 \Rightarrow i^2 = j^2 \Rightarrow i = j$ for $i, j \ge 0$, a contradiction)
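The claim can also be spot-checked by brute force. Since h(x) appears in both locations it cancels, so it is enough to check that the offsets i^2 mod T are distinct for i = 0 .. floor(T/2). A small C++ sketch (the function name is made up):

```cpp
#include <set>

// True if (i*i) % T is different for every i in 0..floor(T/2).
bool firstProbesDistinct(int T) {
    std::set<int> seen;
    for (int i = 0; i <= T / 2; ++i)
        if (!seen.insert((i * i) % T).second)   // insert() reports whether the value was new
            return false;
    return true;
}
```

It returns true for every prime tried; for a composite such as T = 8 it returns false, since 1^2 and 3^2 are both 1 mod 8.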
What if the data needs to be stored on disk (reads/writes)?
Extendible hashing
Collisions and extensive probing can make for a lot of disk reads, so you want to limit them
Solution: keep it shallow
Binary heap
Pasted from <http://en.wikipedia.org/wiki/Binary_heap>
Complete tree
Every node's key is smaller than its children's keys (unlike a binary search tree, there's no left/right ordering)
(assuming no duplicates)
Smallest always at top
For the ith element (array indexed from 1):
Left child: 2*i
Right child: (2*i)+1
Parent: floor(i/2)
The heap is stored in an array level by level (no hashing involved); the tree is implicit, which is why you need the ith element formulas above
Height h (if full): nodes = $2^{h+1} - 1$
h = 2: 7
h = 3: 15
h = 4: 31
If barely that height: nodes = $2^h$
h = 2: 4
h = 3: 8
h = 4: 16
Inserting
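Put the new element in the next open spot on the bottom level, then swap it with its parent while it's smaller than the parent (larger, in a max-heap); this is called percolate up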
Max heap example
Pasted from <http://en.wikipedia.org/wiki/Binary_heap>
Deleting
From root
The procedure for deleting the root from the heap — effectively extracting the maximum
element in a max-heap or the minimum element in a min-heap — starts by replacing it
with the last element on the last level. So, if we have the same max-heap as before, we
remove the 11 and replace it with the 4.
Now the heap property is violated since 8 is greater than 4. The operation that restores
the property is called down-heap, bubble-down, percolate-down, or sift-down. In this case,
swapping the two elements 4 and 8, is enough to restore the heap property and we need
not swap elements further:
In general, the wrong node is swapped with its larger child in a max-heap (in a min-heap it
would be swapped with its smaller child), until it satisfies the heap property in its new
position. Note that the down-heap operation (without the preceding swap) can be used in
general to modify the value of the root, even when no element is being deleted.
Pasted from <http://en.wikipedia.org/wiki/Binary_heap>
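A minimal array-based binary min-heap in C++ (smallest key at the root, as in the notes above), assuming int keys and 1-based indexing so the 2i, 2i+1, i/2 formulas apply directly. insert percolates up; deleteMin moves the last element to the root and percolates down, swapping with the smaller child. For a max-heap, flip the comparisons. The class and method names are illustrative.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Min-heap in an array that starts at index 1 (a[0] is unused):
// children of i are 2i and 2i+1, and the parent of i is i/2 (integer division = floor).
class BinaryHeap {
public:
    BinaryHeap() : a(1) {}

    bool empty() const { return a.size() == 1; }

    void insert(int x) {
        a.push_back(x);                          // next open spot on the bottom level
        std::size_t hole = a.size() - 1;
        while (hole > 1 && x < a[hole / 2]) {    // percolate up past larger parents
            a[hole] = a[hole / 2];
            hole /= 2;
        }
        a[hole] = x;
    }

    int deleteMin() {
        if (empty()) throw std::underflow_error("deleteMin on empty heap");
        int minItem = a[1];
        a[1] = a.back();                         // last element replaces the root
        a.pop_back();
        if (!empty()) percolateDown(1);
        return minItem;
    }

private:
    std::vector<int> a;

    void percolateDown(std::size_t hole) {
        const std::size_t n = a.size() - 1;      // number of elements
        int tmp = a[hole];
        while (2 * hole <= n) {
            std::size_t child = 2 * hole;
            if (child != n && a[child + 1] < a[child]) ++child;   // smaller child
            if (a[child] < tmp) a[hole] = a[child];
            else break;
            hole = child;
        }
        a[hole] = tmp;
    }
};
```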
Program
Customers in line
You don’t want the line to get too long
Need to have sense of when people arrive
Arrival heap: BinaryHeap<int>, where the ints are arrival times
100 customers is a good start
Randomly generate when each customer arrives; how long they'll take doesn't need to be stored in the heap and can be generated when the customer gets to a teller: a random number (1-5) of minutes to serve the customer
Events
Customer arriving
Customer finishing
Don't use ticks: instead of advancing the clock one unit at a time, jump straight to the next event
Calculate average wait time (n queue length?)
Clock
Int variable
Start at 1
While (heap not empty)
Keep going thru customers and stuff
All k tellers are full, customer waiting or not
Advance time to earliest departure of customer
Number of tellers busy--
A customer is finished at teller
Check time finished, check time of next arrival
Advance clock to earliest departure of customer if < departure
If person waited, add wait time to wait time variable
All tellers open no one waiting
Pull someone off arrival queue, increment k, make sure customer
arrived on time
No one at the teller, waiting for the next customer
1 or 2 tellers full
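The event loop above tracks arrivals, departures, and busy tellers explicitly. As a much simpler sketch of the same core idea, the C++ below keeps each teller's next-free time in a min-heap (std::priority_queue with std::greater), serves customers first come first served, and always jumps to the earliest departure instead of ticking. Arrival and service times are passed in (assumed sorted by arrival, with k >= 1) rather than generated randomly, and the function name averageWait is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

double averageWait(const std::vector<int>& arrival, const std::vector<int>& service, int k) {
    // Min-heap holding the time at which each of the k tellers is next free.
    std::priority_queue<int, std::vector<int>, std::greater<int>> freeAt;
    for (int t = 0; t < k; ++t) freeAt.push(0);
    long totalWait = 0;
    for (std::size_t i = 0; i < arrival.size(); ++i) {
        int tellerFree = freeAt.top();                  // earliest departure among the tellers
        freeAt.pop();
        int start = std::max(arrival[i], tellerFree);   // wait only if every teller is busy
        totalWait += start - arrival[i];
        freeAt.push(start + service[i]);                // this customer's departure time
    }
    if (arrival.empty()) return 0.0;
    return static_cast<double>(totalWait) / static_cast<double>(arrival.size());
}
```

With one teller, customers arriving at 1, 2, 3 who each need 3 minutes wait 0, 2, and 4 minutes, for an average of 2.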
Properties of a leftist heap
Binary tree
Key in a node <= keys in its children
Null path length (npl) of a node's left child >= npl of its right child
Null path length of a node x is:
The length of the shortest path from x to a node without two children
For a null node it is -1
Maintaining a leftist tree makes merging more efficient (the right path stays short)
Merging algorithm
merge(tree1, tree2)
If either tree is empty, return the other one
Else
Let smaller be the tree with the smaller root key and larger be the other one
Right = merge(larger, smaller's right subheap)
Make Right the new right subheap of smaller
If the npl property is violated at smaller's root, swap its left and right children
Return this heap
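A C++ sketch of the merge algorithm above for a leftist min-heap, assuming int keys and raw pointers for brevity. insert and deleteMin are both written as merges; all names are illustrative.

```cpp
#include <utility>

struct LeftistNode {
    int key;
    LeftistNode* left;
    LeftistNode* right;
    int npl;                           // null path length
    explicit LeftistNode(int k) : key(k), left(nullptr), right(nullptr), npl(0) {}
};

int nplOf(const LeftistNode* t) { return t ? t->npl : -1; }   // npl of NULL is -1

// Merge two leftist min-heaps and return the root of the result.
LeftistNode* merge(LeftistNode* h1, LeftistNode* h2) {
    if (h1 == nullptr) return h2;                 // one is empty: return the other
    if (h2 == nullptr) return h1;
    if (h2->key < h1->key) std::swap(h1, h2);     // h1 = smaller root, h2 = larger
    h1->right = merge(h1->right, h2);             // merge larger with smaller's right subheap
    if (nplOf(h1->left) < nplOf(h1->right))       // leftist property violated
        std::swap(h1->left, h1->right);
    h1->npl = nplOf(h1->right) + 1;               // shortest path now runs down the right
    return h1;
}

LeftistNode* insert(LeftistNode* h, int x) {      // insert = merge with a one-node heap
    return merge(h, new LeftistNode(x));
}

// Remove the root (the minimum) by merging its two subtrees.
LeftistNode* deleteMin(LeftistNode* h, int& minItem) {
    minItem = h->key;
    LeftistNode* rest = merge(h->left, h->right);
    delete h;
    return rest;
}
```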
Binomial Queue
Forest of heaps
Pasted from <http://en.wikipedia.org/wiki/Binomial_heap>
The tree sizes double each time: binomial tree $B_k$ has $2^k$ nodes, twice as many as $B_{k-1}$
Pasted from <http://en.wikipedia.org/wiki/Binomial_heap>
Leftist heaps
Insertion, deleteMin, merge
Worst case O(log N)
Insertion is not constant time on average
Binomial queues do have average constant time for insertion
Chapter 7
Monday, April 07, 2008
11:06 AM
Sorting
Read the book
Sections to skip
7.4.1
7.5.1
Average analysis 288-290
7.7.6 Linear expected time for selection
Skip in chapter 6
6.4.1
6.5 D heap
6.7 Skew heaps
Array of size N: N-1 passes/steps
Up to N swaps per pass, so $O(N^2)$
Shell sort
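With gap n, start at the first element (index n*0), then the element at index n*1, then n*2, and so on, and insertion-sort just those elements
Then start at the 2nd element and insertion-sort it together with the elements n, 2n, ... positions after it
…until you've started at each of the first n elements, then switch to a smaller n and start at the beginning again
For example, n starts as 5, then 3, then 1 (the last pass, with n = 1, is a plain insertion sort)
There's a rule for picking the increments that I missed; Shell's original sequence starts at N/2 and halves the gap each time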
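A minimal C++ Shell sort using Shell's original halving increments (an assumption, since the increment rule from lecture was missed; the 5, 3, 1 example would need a fixed list of gaps instead). Looping i from gap to the end handles all the interleaved slices in one pass per gap, which does the same work as insertion-sorting each slice separately.

```cpp
#include <cstddef>
#include <vector>

// Shell sort with gaps N/2, N/4, ..., 1; the final gap-1 pass is an ordinary insertion sort.
void shellSort(std::vector<int>& a) {
    for (std::size_t gap = a.size() / 2; gap > 0; gap /= 2) {
        for (std::size_t i = gap; i < a.size(); ++i) {
            int tmp = a[i];
            std::size_t j = i;
            for (; j >= gap && tmp < a[j - gap]; j -= gap)
                a[j] = a[j - gap];             // shift larger elements gap positions right
            a[j] = tmp;
        }
    }
}
```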
Heapsort with a min-heap
Each deleteMin only takes O(log N); put the removed elements, which come out in sorted order, into a new array, for O(N log N) total
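The same idea in C++, using std::priority_queue with std::greater<int> as the min-heap (a library stand-in, not a hand-written heap); the function name heapSortCopy is hypothetical. The range constructor builds the heap from the input, then each top()/pop() pair acts as one deleteMin.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Build a min-heap from the input, then deleteMin N times into a new array.
std::vector<int> heapSortCopy(const std::vector<int>& input) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap(input.begin(), input.end());
    std::vector<int> sorted;
    sorted.reserve(input.size());
    while (!heap.empty()) {
        sorted.push_back(heap.top());   // current minimum
        heap.pop();                     // O(log N)
    }
    return sorted;
}
```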
Merge sort
Assume arrays are length power of 2
recursive
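Base case
When it's down to one element, which is already sorted
Sort one half, sort the second, merge
T(N) = time (# of steps) to merge sort an array of length N
$T(1) = 1$
$T(N) = 2T(N/2) + N$
Divide both sides by N:
$T(N)/N = 2T(N/2)/N + N/N$
$T(N)/N = T(N/2)/(N/2) + 1$
$T(N/2)/(N/2) = T(N/4)/(N/4) + 1$
$T(N/4)/(N/4) = T(N/8)/(N/8) + 1$
…
$T(2)/2 = T(1)/1 + 1$
Add them all up; everything cancels except the first left side, the last right side, and $\log N$ ones:
$T(N)/N = T(1)/1 + \log N$
$T(N) = N \cdot T(1) + N\log N = N + N\log N = O(N\log N)$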
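A top-down merge sort sketch in C++ matching the recurrence above: split in half, sort each half, merge through a temporary array. It also works when the length isn't a power of 2. Function names are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Merge the sorted runs a[lo..mid) and a[mid..hi) through tmp.
void mergeRuns(std::vector<int>& a, std::vector<int>& tmp,
               std::size_t lo, std::size_t mid, std::size_t hi) {
    std::size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];   // ties take the left run (stable)
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi) tmp[k++] = a[j++];
    for (k = lo; k < hi; ++k) a[k] = tmp[k];
}

// Sort a[lo..hi): sort one half, sort the other, merge. One element is the base case.
void mergeSort(std::vector<int>& a, std::vector<int>& tmp, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return;
    std::size_t mid = lo + (hi - lo) / 2;
    mergeSort(a, tmp, lo, mid);
    mergeSort(a, tmp, mid, hi);
    mergeRuns(a, tmp, lo, mid, hi);
}

void mergeSort(std::vector<int>& a) {
    std::vector<int> tmp(a.size());
    mergeSort(a, tmp, 0, a.size());
}
```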
Quick Sort
Worst case $O(N^2)$
Average case $O(N\log N)$
Algorithm
If the number of elements in S is 0, do nothing (base case)
Else choose a pivot element v in S
Partition $S - \{v\}$ into two disjoint sets $S_1$ and $S_2$ (elements equal to v can go in either one)
$S_1 = \{x \in S - \{v\} \mid x \le v\}$
$S_2 = \{x \in S - \{v\} \mid x \ge v\}$
Return quicksort($S_1$), followed by v, followed by quicksort($S_2$)
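A direct C++ translation of the set-based description, assuming int elements. It copies into new vectors S1 and S2 instead of partitioning in place, so it uses extra memory; choosing the middle element as the pivot is an assumption (the worst-case analysis uses the first element), and elements equal to the pivot go to S1.

```cpp
#include <cstddef>
#include <vector>

// Pick pivot v, split S - {v} into S1 (<= v) and S2 (> v),
// then return quicksort(S1), v, quicksort(S2).
std::vector<int> quicksort(const std::vector<int>& s) {
    if (s.size() <= 1) return s;                 // base case
    std::size_t pivotIndex = s.size() / 2;
    int v = s[pivotIndex];
    std::vector<int> s1, s2;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i == pivotIndex) continue;           // leave v itself out
        if (s[i] <= v) s1.push_back(s[i]);
        else s2.push_back(s[i]);
    }
    std::vector<int> result = quicksort(s1);
    result.push_back(v);
    std::vector<int> right = quicksort(s2);
    result.insert(result.end(), right.begin(), right.end());
    return result;
}
```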
Pasted from <http://www.mycsresource.net/articles/image?id=32>
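Worst case of quicksort, T(N)
Pivot is the first element and the array is already sorted
$T(N) = T(i) + T(N-i-1) + cN$
$i = |S_1|$
$|S_2| = N - i - 1$
Sorted input with the first element as pivot makes $S_1$ empty every time, so (ignoring $T(0)$):
$T(N) = T(N-1) + cN$
$T(N-1) = T(N-2) + c(N-1)$
$T(N-2) = T(N-3) + c(N-2)$
… (telescoping)
$T(2) = T(1) + c \cdot 2$
Sum them all up and cancel out the matching terms
$T(N) = T(1) + c\sum_{i=2}^{N} i$
$= O(N^2)$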
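Best case
Pivot in the middle (it splits the array in half)
$T(N) = 2T(N/2) + cN$
$T(N)/N = 2T(N/2)/N + c$
$T(N)/N = T(N/2)/(N/2) + c$
$T(N/2)/(N/2) = T(N/4)/(N/4) + c$
…
$T(2)/2 = T(1)/1 + c$
Sum up and cancel the matching terms
$T(N)/N = T(1)/1 + c\log N$
$T(N) = N + cN\log N$
$O(N\log N)$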
Indirect sorting
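Instead of sorting an array of large objects, which requires copying an object every time it moves, sort an array of pointers to them so you don't need to do that copying (saves disk reads)
Cycle thing
After the pointers are sorted, rearrange the actual objects one cycle at a time: assign the one you are replacing to a temp variable, move the object that belongs there into its place, and keep following the cycle until the temp's destination comes up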
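A C++ sketch of the idea, using an array of indices in place of pointers (an assumption made for simplicity; the indices play the same role as the pointers). The index array is sorted with std::sort, then the objects are moved into place one cycle at a time with a single temp per cycle, so each object moves only about once. The name indirectSort is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

template <typename T>
void indirectSort(std::vector<T>& a) {
    const std::size_t n = a.size();
    // Sort positions instead of the (possibly large) objects themselves.
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    std::sort(idx.begin(), idx.end(),
              [&a](std::size_t x, std::size_t y) { return a[x] < a[y]; });
    // idx[i] now says which original element belongs at position i.
    for (std::size_t i = 0; i < n; ++i) {
        if (idx[i] == i) continue;               // already in place
        T tmp = std::move(a[i]);                 // save the one being replaced
        std::size_t j = i;
        while (idx[j] != i) {                    // follow the cycle
            std::size_t next = idx[j];
            a[j] = std::move(a[next]);
            idx[j] = j;                          // position j is done
            j = next;
        }
        a[j] = std::move(tmp);                   // close the cycle
        idx[j] = j;
    }
}
```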
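Proof that any sort that works only by comparing elements needs $\Omega(N\log N)$ comparisons in the worst case, so comparison sorting can never do better than that
Bucket Sort
No key will be bigger than M (keys are nonnegative integers)
N is the number of items
Simply put each key into the bucket at its own index; no comparisons, so it takes O(M + N) time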
Something about a<b<c<d
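A minimal C++ bucket sort for nonnegative int keys no bigger than M: each key is counted at its own index, then the buckets are read back in order, with no comparisons between keys. The function name is illustrative.

```cpp
#include <vector>

// Keys must be in 0..M. O(M + N) time and O(M) extra space.
std::vector<int> bucketSort(const std::vector<int>& keys, int M) {
    std::vector<int> count(M + 1, 0);
    for (int k : keys) ++count[k];               // key k goes into bucket k
    std::vector<int> out;
    out.reserve(keys.size());
    for (int k = 0; k <= M; ++k)
        for (int c = 0; c < count[k]; ++c) out.push_back(k);
    return out;
}
```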
External sorting
Sort chunks that fit in memory into sorted runs, then merge the runs (lots of merging)
Chapter 9
Monday, April 14, 2008
11:07 AM
Graphs
A graph G = (V, E) is defined by
Two sets: V, the vertices, and E, the edges
An edge is a pair (v, w) where v and w are both in V
Edges:
Directed/undirected
If directed, each edge (v, w) is an ordered pair: it goes from v to w, so w is adjacent to v but v isn't necessarily adjacent to w
undir. u v w x y z
u T T T T
v T T T
w T T T T
x T T T
y T T T T
z T T
Weighted/not weighted
Adjacency list
u v,w,x
v u,x,y
w u,y,z
x u,v
y v,w,x
z w
Graph algorithms
Huffman Encoding
"how much wood would a woodchuck chuck"
Each char is one ASCII code, one byte (8 bits)
Variable length codes
Each char is a symbol. We want to use fewer bits to encode symbols that appear more
frequently.
"this is an example of a huffman tree"
Pasted from <http://en.wikipedia.org/wiki/Image:Huffman_tree_2.svg>
Then you assign a value to each left and right edge, say 0 for left and 1 for right; a character's code is the sequence of values on the path from the root to its leaf
Char Freq Code
space 7 111
a 4 010
e 4 000
f 3 1101
h 2 1010
i 2 1000
m 2 0111
n 2 0010
s 2 1011
t 2 0110
l 1 11001
o 1 00110
p 1 10011
r 1 11000
u 1 00111
x 1 10010
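A C++ sketch of building the Huffman tree greedily: put every character in a min-heap keyed on frequency, repeatedly merge the two least frequent trees, then read each code off the root-to-leaf path (0 for left, 1 for right). Ties may be broken differently from the table above, so individual codes can differ, but every Huffman tree for these frequencies gives the same total length: 135 bits for the example sentence. Node and function names are hypothetical.

```cpp
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <vector>

struct HuffNode {
    int freq;
    char ch;                  // meaningful only at leaves
    HuffNode* left;
    HuffNode* right;
};

struct ByFreq {               // makes priority_queue a min-heap on frequency
    bool operator()(const HuffNode* a, const HuffNode* b) const { return a->freq > b->freq; }
};

void assignCodes(const HuffNode* n, const std::string& prefix, std::map<char, std::string>& codes) {
    if (n->left == nullptr && n->right == nullptr) {        // leaf
        codes[n->ch] = prefix.empty() ? "0" : prefix;       // lone symbol still gets one bit
        return;
    }
    assignCodes(n->left, prefix + '0', codes);               // left edge = 0
    assignCodes(n->right, prefix + '1', codes);              // right edge = 1
}

// Greedy construction: repeatedly merge the two least frequent trees.
std::map<char, std::string> huffmanCodes(const std::string& text) {
    std::map<char, std::string> codes;
    if (text.empty()) return codes;
    std::map<char, int> freq;
    for (char c : text) ++freq[c];

    std::vector<std::unique_ptr<HuffNode>> pool;            // owns every node
    std::priority_queue<HuffNode*, std::vector<HuffNode*>, ByFreq> pq;
    for (const auto& [c, f] : freq) {
        pool.push_back(std::make_unique<HuffNode>(HuffNode{f, c, nullptr, nullptr}));
        pq.push(pool.back().get());
    }
    while (pq.size() > 1) {
        HuffNode* a = pq.top(); pq.pop();
        HuffNode* b = pq.top(); pq.pop();
        pool.push_back(std::make_unique<HuffNode>(HuffNode{a->freq + b->freq, '\0', a, b}));
        pq.push(pool.back().get());
    }
    assignCodes(pq.top(), "", codes);
    return codes;
}
```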
Different types of algorithms
Greedy
Divide and conquer
…
All derive from certain problems
Graph coloring problem
G = (V, E)
N vertices
K colors
You want to color the vertices so that no two adjacent vertices have the same color
Optimization problem
Smallest number of colors
Search space
Heuristic
Likely to give a solution close to optimal in a reasonable amount of time as N gets bigger
Brute force
$K^N$ possible colorings
Complete graph
Every vertex has an edge to every other vertex
Circuit
Having to go through every vertex and come back
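One common heuristic is greedy coloring, sketched here in C++ under the assumption that the graph is given as adjacency lists of vertex numbers. It makes a single pass over the vertices, but depending on the vertex order it can use more colors than the optimum. The function name is illustrative.

```cpp
#include <vector>

// Visit vertices 0..N-1 in order and give each the smallest color
// not already used by one of its colored neighbors.
std::vector<int> greedyColoring(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> color(n, -1);                  // -1 = not colored yet
    for (int v = 0; v < n; ++v) {
        std::vector<bool> used(n + 1, false);
        for (int w : adj[v])
            if (color[w] != -1) used[color[w]] = true;
        int c = 0;
        while (used[c]) ++c;                        // smallest free color
        color[v] = c;
    }
    return color;
}
```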
Random media
Wednesday, April 16, 2008
11:30 AM
Bitmap
Has a header with a lot of information in it, including a color table. The max number of colors is about 16.7 million ($2^{24}$, with 24-bit color); you can use a simple 256-color table instead
Each pixel (height × width pixels in total) has an R, a G, and a B component, each one byte (8 bits)
For converting from more colors to less
Popularity algorithm
Pick the 256 (or however many) most popular colors and use that for your
color table
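A C++ sketch of the popularity algorithm's first step, choosing the color table, assuming each pixel is packed as 0xRRGGBB in a std::uint32_t. Mapping each pixel to its closest table color is left out, and the function name popularityPalette is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Count each color, then keep the k most popular ones as the color table.
std::vector<std::uint32_t> popularityPalette(const std::vector<std::uint32_t>& pixels, std::size_t k) {
    std::map<std::uint32_t, std::size_t> counts;         // color -> number of pixels
    for (std::uint32_t p : pixels) ++counts[p];
    std::vector<std::pair<std::uint32_t, std::size_t>> byCount(counts.begin(), counts.end());
    std::sort(byCount.begin(), byCount.end(), [](const auto& x, const auto& y) {
        if (x.second != y.second) return x.second > y.second;   // most popular first
        return x.first < y.first;                                // deterministic tie-break
    });
    std::vector<std::uint32_t> palette;
    for (std::size_t i = 0; i < byCount.size() && i < k; ++i)
        palette.push_back(byCount[i].first);
    return palette;
}
```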
Uniform partitioning
Divides the color space into equal-sized blocks and uses one color per block, which gives less popular colors a better chance of being represented
Bad if a whole bunch of different colors fall in one block, since it chooses only one color to represent them all
Final Exam
Monday, April 28, 2008
11:06 AM
Saturday at 2pm
Cumulative
Study old tests
Chapter 9 stuff that we weren't tested on
Topological sort
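Start by enqueuing every vertex with no prerequisites (indegree 0); dequeue the first one, output it, and remove its outgoing edges (decrement the indegree of each vertex it points to); enqueue any vertex whose indegree drops to 0, and repeat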
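This queue-based procedure is usually called Kahn's algorithm. A C++ sketch, assuming vertices are numbered 0..N-1 and adj[v] lists the vertices that v points to; the function name is illustrative.

```cpp
#include <queue>
#include <vector>

// Returns a topological order, or an empty vector if the graph has a cycle.
std::vector<int> topologicalSort(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> indegree(n, 0);
    for (const auto& edges : adj)
        for (int w : edges) ++indegree[w];
    std::queue<int> q;
    for (int v = 0; v < n; ++v)
        if (indegree[v] == 0) q.push(v);            // no prerequisites
    std::vector<int> order;
    while (!q.empty()) {
        int v = q.front();
        q.pop();
        order.push_back(v);
        for (int w : adj[v])
            if (--indegree[w] == 0) q.push(w);      // last prerequisite just finished
    }
    if (static_cast<int>(order.size()) != n) order.clear();   // some vertices never freed: cycle
    return order;
}
```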
Dijkstra's algorithm
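Take the unknown vertex v with the smallest dv and set it to known; for each vertex w adjacent to v, if w.dv > v.dv + weight(v, w), then change w.dv to v.dv + weight(v, w)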
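A C++ sketch of Dijkstra's algorithm, assuming nonnegative integer weights and adjacency lists of (to, weight) edges. It uses std::priority_queue instead of scanning for the smallest unknown dv, and skips stale heap entries for vertices that are already known; Edge and dijkstra are illustrative names.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge { int to; int weight; };   // weights assumed nonnegative

// d[v] plays the role of dv; unreachable vertices keep the max long long value.
std::vector<long long> dijkstra(const std::vector<std::vector<Edge>>& adj, int source) {
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> d(adj.size(), INF);
    std::vector<bool> known(adj.size(), false);
    using Item = std::pair<long long, int>;                // (tentative distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    d[source] = 0;
    pq.push({0, source});
    while (!pq.empty()) {
        int v = pq.top().second;
        pq.pop();
        if (known[v]) continue;                            // stale heap entry
        known[v] = true;                                   // smallest unknown dv is final
        for (const Edge& e : adj[v]) {
            if (!known[e.to] && d[e.to] > d[v] + e.weight) {
                d[e.to] = d[v] + e.weight;                 // shorter path through v
                pq.push({d[e.to], e.to});
            }
        }
    }
    return d;
}
```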
Huffman encoding
More frequent have smaller codes
Graph coloring
No two adjacent vertices can have the same color; find the minimum number of colors
Will NOT contain indexed color stuff