Comp 2123 Notes

The document discusses algorithms and data structures. It covers algorithm efficiency, big-O notation, abstract data types like lists and linked lists. It then discusses trees, binary trees, binary search trees, priority queues, heaps, and maps. Key data structures covered include lists, linked lists, stacks, queues, trees, binary search trees, heaps, and maps. The document provides information on their implementations and time complexities of common operations.

Data and algorithms

Algorithm Efficiency
• Efficient if it runs in polynomial time
- For n elements, it performs fewer than p(n) steps
- However, it is hard to determine exactly what p(n) is
• Big-Oh notation
- T(n) = O(f(n)) if there exist c > 0 and n0 such that T(n) <= c·f(n) for all n > n0
- T(n) = Ω(f(n)) if there exist c > 0 and n0 such that T(n) >= c·f(n) for all n > n0
- T(n) = Θ(f(n)) if T(n) = O(f(n)) and T(n) = Ω(f(n))

For a double 'for loop': using the upper bound, the inner body (j - i) happens at most n times per outer iteration, giving O(n²). Such code can often be improved by using fewer loops and more +, -, /.
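A minimal sketch of this improvement, using prefix averages as one illustrative example (the function names are mine, not from the course):

```python
def prefix_averages_slow(a):
    """O(n^2): a double for loop; the inner sum is the hidden second loop."""
    return [sum(a[:i + 1]) / (i + 1) for i in range(len(a))]

def prefix_averages_fast(a):
    """O(n): replace the inner loop with a running total (just + and /)."""
    out, total = [], 0
    for i, x in enumerate(a):
        total += x
        out.append(total / (i + 1))
    return out
```

Both return the same averages; only the amount of repeated work differs.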
Week 2

Abstract Data Types


• Using interfaces and inheritance to make coding easier
• Lists (array-based)
- Support size, isEmpty, get, set, add, remove
- Allow random access
- Run in O(n) time for operations such as insert and remove (elements must be shifted)
• LinkedLists
- Head -> node -> node -> … -> null
- Efficient insertion and deletion at the head - O(1)
- No maximum capacity
- Simple behaviour as the list grows
- Algorithm tips
• Can reverse all the next pointers in O(n) time to traverse backwards
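A sketch of the pointer-reversal tip, assuming a minimal singly linked node class (names are illustrative):

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def reverse(head):
    """Reverse all the next pointers in O(n) time and O(1) extra space."""
    prev = None
    while head is not None:
        # Redirect this node's next pointer, then step forward.
        head.next, prev, head = prev, head, head.next
    return prev  # new head (the old tail)
```

After reversal, traversing from the returned head visits the original list backwards.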

• Stack
- Uses LIFO (last in, first out)
- Supports push, pop, top, size, isEmpty
- All O(1)
• Queue
- Uses FIFO (first in, first out)
- You can remove the first element and add to the end
- Supports enqueue, dequeue, first, size, isEmpty
- Algorithm tips
• Can change enqueue/dequeue to support sum( ) or max( ) commands
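One way to realise the sum( ) tip, as a sketch: maintain a running total alongside the queue so sum( ) is O(1). (The class name is mine; supporting max( ) in O(1) amortised needs a monotonic deque, which is not shown here.)

```python
from collections import deque

class SumQueue:
    """FIFO queue that also answers sum() in O(1) via a running total."""
    def __init__(self):
        self._items = deque()
        self._total = 0

    def enqueue(self, x):
        self._items.append(x)
        self._total += x

    def dequeue(self):
        x = self._items.popleft()
        self._total -= x
        return x

    def sum(self):
        return self._total
```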
Week 3

Trees
• Inheritance hierarchy
• Terminology
- Root = the node without a parent (only one; every node in the tree is its descendant)
- Internal node = node with at least one child
- External/leaf node = node with no children
- Ancestors of a node = its parent, grandparent, etc. (the nodes on the path to the root)
- Descendants of a node = its children, grandchildren, etc.
- Siblings = nodes with the same parent
• Concepts
- Depth of a node = number of ancestors (so the root has depth 0)
- Level = all nodes at a given depth
- Height of tree = maximum depth + 1 if counting levels (generations); often defined as just the maximum depth
- Subtree = a node together with all its descendants (a branch)
- Edge = a direct parent–child link
- Path = a sequence of nodes in which consecutive nodes are joined by an edge
- Lowest common ancestor of (x, y) = the deepest node that is an ancestor of both x and y
• Operators
- size( )
- isEmpty( )
- iterator( )
- positions( )
- root( )
- parent( )
- children( )
- numChildren( )
- isInternal( ), isExternal( ), isRoot( )

Traversing
Pre-order traversal visits the root first; post-order traversal visits the root last
preorder(x)
visit(x)
for each child c of x
preorder(c)

Binary tree
• Each node has at most 2 children (a right and/or left child - left first)
• Proper binary tree = every internal node has exactly 2 children
• leftChild( ), rightChild( ), sibling( )
• In-order traversal goes all the way left first, then visits the node, then the right subtree
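The in-order rule can be sketched in a few lines, assuming a minimal node class (names are illustrative):

```python
class BTNode:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def inorder(node, out):
    """In-order: left subtree, then the node itself, then the right subtree."""
    if node is not None:
        inorder(node.left, out)
        out.append(node.value)
        inorder(node.right, out)
```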
Week 4 - Binary search trees

• Stores keys —> values
• An in-order traversal visits the keys in sorted order
• External nodes don't store items

insert(key, newvalue)
    search(key)
    if key is found:
        key.value = newvalue
    if key is not found:
        replace the external node where the search ended with a new node storing (key, newvalue)

delete(key)
    search(key)
    if key's node has at least 1 external child:
        delete the node and the external child; set the node's internal child's parent to the node's parent
    if key's node has 2 internal children:
        find the next node in in-order (the leftmost node of the right subtree)
        copy its key and value into key's node, then delete that next node as in the first case

get, put, remove all run in O(height)
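A sketch of search and insert in that O(height) style, using internal nodes only (the notes' version keeps external placeholder nodes; deletion is omitted here, and all names are mine):

```python
class BSTNode:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None

def put(root, key, value):
    """Insert or update; O(height). Returns the (possibly new) root."""
    if root is None:
        return BSTNode(key, value)      # key not found: attach a new node
    if key == root.key:
        root.value = value              # key found: overwrite the value
    elif key < root.key:
        root.left = put(root.left, key, value)
    else:
        root.right = put(root.right, key, value)
    return root

def get(root, key):
    """Search; O(height). Returns the value, or None if absent."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None
```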

Range Queries

• A BST range query runs in O(height + |output|)

Rank (balanced) trees
• rank r(v) constrains each subtree
• Makes for more balanced trees
• AVL tree
- r(v) is the height of v's subtree
- The ranks of siblings cannot differ by more than 1
- Height, searching, insertion and deletion are all O(log n)
- Insertion is normal besides sometimes having to do trinode restructuring
- Also constantly updates a height variable for each node
Week 5

Priority Queue
• Similar to a map or dictionary
• Operations
- insert(key, value)
- removeMin( ) - removes the entry with the smallest key
- min( )
• Can be implemented sorted or unsorted
- Unsorted has O(1) insertion but O(n) removal
- Sorted has O(n) insertion but O(1) removal
Sorting methods
• Priority Queue sorting: insert all n elements, then remove the minimum n times; with a list-based PQ one of the phases costs O(n) per element -> O(n²)
• Selection sort
- For each i, find the minimum ahead of it and swap
• Insertion sort
- For each i, if the next entry is smaller then swap, and with the smaller of the pair: head backwards and swap successively to find the right place
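The insertion-sort description above can be sketched directly (in-place, O(n²) worst case):

```python
def insertion_sort(a):
    """For each i, walk the new element backwards past larger entries."""
    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j] < a[j - 1]:
            a[j], a[j - 1] = a[j - 1], a[j]  # swap successively backwards
            j -= 1
    return a
```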

Heap data structure

• A binary tree with the (min-)heap property: key(node) >= key(node.parent)
• Last node = rightmost node on the maximum depth level
• Insertion
- To find the new last position: start at the last node, go up until a left child or the root is found
• If a left child is found, go to its right sibling and down-left until external
• If the root is reached, a new level starts: go down-left from the root until external
• This is all O(log(n))
- Then iterate upwards, swapping keys with successive parents until the heap property is satisfied = O(log(n)) because the height is log(n)
• removeMin
- Swap the last node with the root and remove the old root
- Do a downwards iteration to restore the heap property, always swapping with the smaller child
- O(log n) because it could iterate down the whole height

Heap sort
• Use a heapify function to convert an array to a heap in O(n)
• Then removeMin n times, so heap sort takes O(n log n)
• Array implementation used by heapify
- Root is index 0
- Last node is index n-1
- For node i
• Left child at 2i + 1
• Right child at 2i + 2
• Parent at (i-1)/2 (integer division)
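A sketch of the array layout and O(n) bottom-up heapify (min-heap; function names are mine):

```python
def sift_down(a, i, n):
    """Restore the min-heap property below index i (children at 2i+1, 2i+2)."""
    while True:
        left, right, smallest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] < a[smallest]:
            smallest = left
        if right < n and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest

def heapify(a):
    """Bottom-up heap construction in O(n): sift down every internal node."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    return a
```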
Week 6

Maps
• Key-value
• Operations
- Get
- Put - O(1) if the key is new, O(n) if it has to be found and replaced (list implementation)
- Remove - O(n)
- Size
- Keys, Values
Implementations
• Unsorted list
- Put is O(1) if the key is new, O(n) if it has to be found and replaced
- Remove and get are also O(n)
• Restricted keys
- Put in an array of size N, so index = key
- Makes get, put, remove O(1)
- However it is inefficient, e.g. for a 9-digit ID then N = 10⁹
- Can't do string keys
• Hash
- Key -> hash function -> h(key) -> index in array
- Used for a fixed array size N
- Usually uses mod N, e.g. h(key) = key mod N
• h(key) involves a hashcode (converting the key to an int) then compression (int to index)
• If the probability that 2 keys collide is at most 1/N, then E(collisions) = n/N
• A family of hash functions with this property is called universal
- Random linear hash function
• h(k) = ((ak + b) mod p) mod N where p is a prime larger than the maximum key, 1 <= a < p, 0 <= b < p
• Collision probability is still at most about 1/N
- Other hashcode options include the sum of the key's components, or a polynomial in the key's components (which can be evaluated in O(n) time)
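A sketch of drawing a random linear hash function from that universal family (the prime 10⁹+7 is my illustrative choice, valid for keys below it):

```python
import random

def make_universal_hash(N, p=10**9 + 7):
    """h(k) = ((a*k + b) mod p) mod N, with p prime, 1 <= a < p, 0 <= b < p.
    For distinct keys below p, the collision probability is about 1/N."""
    a = random.randrange(1, p)
    b = random.randrange(0, p)
    return lambda k: ((a * k + b) % p) % N
```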

Collision Handling
• Separate chaining
- Makes each array entry a separate linked list of the records whose hash values collide there
- E(collisions) is still n/N = α = the load factor
- This makes put O(1 + α) as you might have to walk through α entries in a linked list
- Worst case is O(n) if all hash values collide
• Open addressing and linear probing
- Put collisions in the next available cell
- However clusters can form, and you then have to linearly probe a longer run of cells
- Search
• Go to index h(key)
• If that cell's key = the search key then return its value
• Else step forward until it matches or an empty cell is reached
- Get, insert and delete
• Get skips over 'DEFUNCT' cells
• Remove removes the element but leaves a 'DEFUNCT' marker
• Put can reuse 'DEFUNCT' or null cells
• Can run at worst in O(n)
• Expected probes = 1/(1-α), so while α stays below 1 it runs in expected O(1)
• Cuckoo hashing
- Uses 2 hash arrays
- Get and remove are O(1) as you just check both arrays
- Put: you find the location and evict the resident, then put the resident in the other hash table
- This can create an eviction cycle, which can be solved by rehashing into larger tables
- Expected time of n put operations is O(n)

Sets are maps that store only keys (no duplicates, no values); operations have the same costs as the underlying map (expected O(1) with hashing)

Multi-sets support duplicates: the values store the number of occurrences
Week 8 - graphs

Composition
• V is a set of vertices (nodes)
• E is a set of edges
- Can be directed (from A to B)
- Or undirected (2-way)
Terminology
• Undirected
- Edges connect their endpoints; an edge is incident on each of its endpoints
- Edges that share an endpoint are incident on the same vertex
- Vertices joined by an edge are adjacent
- Degree is the number of edges incident on a vertex
- Parallel edges share the same 2 vertices
- A self-loop has both endpoints at the same vertex
- Simple graphs have no parallel edges or self-loops
• Directed
- Each edge has a head and a tail
- Out-degree and in-degree = number of outgoing and incoming edges
- Parallel edges have the same head and tail
- Simple directed graphs have no self-loops or parallel edges; anti-parallel edges are 2 edges with head and tail reversed
• Path
- A sequence of vertices joined by edges
- A simple path has only distinct vertices (no repeat visits)
- A cycle is a path that starts and ends at the same vertex
• A simple cycle repeats no other vertex (same logic)
• An acyclic graph has no cycles
• Subgraphs
- A subset of a graph
- Can be G' = (vertex subset, edges among that subset)
- Or G' = (vertices touched by an edge subset, edge subset)
• Connectivity
- Connected if there is a path between every pair of vertices
- Can have connected subgraphs (components)
• Trees and forests
- A tree is a graph that is connected but acyclic
- A forest is a graph whose connected components are trees
- A tree with n vertices must have n-1 edges
- A spanning tree is a subgraph that is a tree and contains every vertex of the graph
- Can also have spanning forests
Properties
• Letting m = number of edges, n = number of vertices
• Sum of all vertex degrees = 2m
• In a simple undirected graph: m <= n(n-1)/2
• In a simple directed graph: m <= n(n-1)

Data Structure
• getAllVertices( )
• getAllEdges( )
• getEdge(nodeA, nodeB)
• getVertices(edge)
• outDegree( ) and inDegree( ) or degree( )
• outgoingEdges(node)
• insert and delete for both vertex and edge

EdgeList structure: each edge just stores the 2 vertices it connects
Adjacency list: each vertex additionally stores a list of all its incident edges
- The same adjacency information can instead be stored as an adjacency matrix

Traversal techniques
Depth-first search (DFS)
- Essentially push as far as possible along a branch and fall back at a dead end
- Visits each vertex, and DFS_visit then explores each incident edge
- Visiting each vertex is O(n), and in total it examines the sum of all the degrees = O(m), so O(m+n)
- Recursively, each vertex and its discovery parent form a spanning tree of the connected subgraph; running DFS across the whole graph forms a spanning forest
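A sketch of DFS over an adjacency-list dictionary (iterative with an explicit stack; names are mine):

```python
def dfs(adj, start):
    """Depth-first search from start; O(n + m). adj maps vertex -> neighbours."""
    visited, stack, order = set(), [start], []
    while stack:
        u = stack.pop()           # push as far as possible, fall back at dead ends
        if u in visited:
            continue
        visited.add(u)
        order.append(u)
        for v in adj[u]:
            if v not in visited:
                stack.append(v)
    return order
```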

Cut edges
• An edge whose removal disconnects the graph
• Can be found with DFS: remove an edge and check whether the graph is still connected

Breadth-first traversal (BFS)

• Visits all vertices at a set distance k from the start vertex before going to distance k+1
• The same counting argument gives O(m+n)
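A sketch of BFS that records the distance-k layers directly (names are mine):

```python
from collections import deque

def bfs(adj, start):
    """Breadth-first search; returns edge-distance to each reachable vertex.
    O(n + m): every vertex enters the queue once, every edge is checked once."""
    dist = {start: 0}
    q = deque([start])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:          # first time seen = shortest distance
                dist[v] = dist[u] + 1
                q.append(v)
    return dist
```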
Week 8 - graph algorithms

Weighted graphs
• Each edge has an associated numerical value representing something

Shortest Path algorithms

• Dijkstra's algorithm
- Puts all vertices in a priority queue with D values of infinity (and 0 for the source s), and removes the minimum each iteration, beginning with s
- Go through this minimum vertex's neighbours and check whether any can be reached more quickly through it than previously; if so, update their D values (decreaseKey)
- The inner loop is O(deg(u)) for each u, summing to O(2m) = O(m)
- This operates on a priority queue or heap, which can do decreaseKey in O(log n) time
- So overall O(m log n) (+ 2n log n for the inserts and removeMins)
• Can use a Fibonacci heap, which is O(1) for decreaseKey, so = O(m + n log n)
• May not work for negative edge weights
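A sketch using Python's heapq: since heapq has no decreaseKey, this variant pushes duplicates and skips stale entries (lazy deletion), which keeps the same O(m log n) bound:

```python
import heapq

def dijkstra(adj, s):
    """Shortest paths from s with non-negative weights.
    adj maps u -> list of (v, weight)."""
    dist = {s: 0}
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                     # stale entry, already improved
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd             # relax: quicker path through u
                heapq.heappush(pq, (nd, v))
    return dist
```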

Minimum spanning tree

• Cutset
- For any subset of vertices, the cutset is the set of edges with exactly one endpoint in the subset
- The minimum-weight edge in any cutset must be in the minimum spanning tree

Prim's algorithm
Essentially grows one tree, repeatedly taking the minimum-weight edge in the cutset of the tree built so far

Sets d to infinity, but then with each neighbour of an extracted vertex u, if the connecting edge is lighter than the neighbour's current d, it updates d and sets this neighbour's parent to u

Same O(m log n) or O(m + n log n) complexity


Kruskal's algorithm

Sequentially add the minimum-weight edge unless it creates a cycle

• Works by the cycle property: the largest edge in any cycle won't be in the minimum spanning tree
• Sorting edges takes O(m log m); naively checking whether a cycle forms, m times, gives O(mn)
• Can be improved by keeping track of which vertices are already connected
- Set (union-find) structure
• find(element) - returns the set an element is in, O(1)
• union(setA, setB) - joins them, O(n)
• makeSets(list) - puts each element into its own set, O(n)
• Skipping cycle-closing edges stops double counting and means union is only called n-1 times = O(n²) vs O(mn)
• The cycle property assumes all edge weights are distinct; ties can be broken by adding i/n² to the weight of edge i
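A sketch of Kruskal with a simple union-find (this version uses path compression rather than the course's exact set structure; all names are mine):

```python
class DisjointSets:
    """Union-find over a fixed vertex set, with path halving."""
    def __init__(self, vertices):
        self.parent = {v: v for v in vertices}

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False          # same set: this edge would close a cycle
        self.parent[ra] = rb
        return True

def kruskal(vertices, edges):
    """edges: list of (weight, u, v). Returns the MST edges."""
    ds = DisjointSets(vertices)
    mst = []
    for w, u, v in sorted(edges):     # minimum-weight edge first
        if ds.union(u, v):            # skip edges that create a cycle
            mst.append((w, u, v))
    return mst
```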
Greedy Algorithms
• You build a solution by continually taking locally optimal choices
• May not give the best solution for every problem

Fractional knapsack
• Choosing the optimal combination of items, each with a benefit and a weight (fractions of items allowed)
• Total weight is however bounded by W
• So the sum of the x_i (the amount of item i taken) must satisfy sum x_i <= W
- With the cap 0 <= x_i <= w_i, which stops taking more of an item than exists
• Greedy: take items in decreasing order of benefit density b_i / w_i

O(n log n) to sort
O(n) to go through, so = O(n log n)
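A sketch of the greedy-by-density strategy (function name and item format are mine):

```python
def fractional_knapsack(items, W):
    """items: list of (benefit, weight). Greedy by density b/w; O(n log n)."""
    total = 0.0
    for b, w in sorted(items, key=lambda bw: bw[0] / bw[1], reverse=True):
        if W <= 0:
            break
        take = min(w, W)            # all of the item, or the fraction that fits
        total += b * (take / w)
        W -= take
    return total
```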

Interval partitioning
• Like classroom scheduling: assign intervals to the fewest resources so that no two overlapping intervals share one
• The answer equals the maximum depth (the largest number of intervals overlapping at any point)

O(n log n)

Text Compression - Huffman encoding

• Each letter has a code
• Encode high-frequency letters with the shortest codes
• The encoding can be read off a tree
- Constructed by repeatedly removing the two minimum-frequency trees and combining them
- This creates an encoding tree prioritised by frequency where each character's depth is its number of encoding bits, hence bits = sum of frequency(char) × depth(char)
- Time complexity is dominated by the priority queue = O(n log n) for n unique chars
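A sketch that computes just the total encoded bit count via the merge process (each merge adds one bit of depth to every symbol in the two trees combined, so total bits = sum of the merged frequencies; the function name is mine):

```python
import heapq

def huffman_cost(freqs):
    """Total encoded bits for the given character frequencies; O(n log n)."""
    heap = list(freqs)
    heapq.heapify(heap)
    bits = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)      # two minimum-frequency trees
        b = heapq.heappop(heap)
        bits += a + b                # every symbol inside gains one bit of depth
        heapq.heappush(heap, a + b)
    return bits
```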
Divide and conquer

• Dividing a problem up into smaller segments

e.g. Binary searching in sorted list

Merge-sort
• Halve the array, sort each half recursively, then recombine

Merge
Consecutively inserts the smaller of the two front uninserted items
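The halve-sort-merge scheme as a sketch (returns a new list; O(n log n)):

```python
def merge_sort(a):
    """Halve the array, sort each half recursively, then merge."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # insert the smaller front item
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]    # one side may have leftovers
```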

Quick sort
• Partition around a pivot and recurse on each side

= O(n log n) expected
Divide and conquer 2

Maxima-set

Integer multiplication of n digit numbers

• The naive approach is O(n²)
• Karatsuba's approach essentially multiplies 3 n/2-digit numbers

The master theorem can be used to get the time complexity; it depends on log_b a and the power of f(n)
Finding the kth smallest integer in an unsorted n-list can be done in expected O(n) with randomised divide and conquer (quickselect)
Randomisation

Randomized algorithms are algorithms where the behaviour doesn’t depend solely on the input. It
also depends (in part) on random choices or the values of a number of random bits.

Randomisation can be used to shuffle a list so that each permutation has equal probability of 1/n!
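One standard way to achieve the uniform 1/n! guarantee is the Fisher–Yates shuffle, sketched here:

```python
import random

def shuffle(a):
    """Fisher–Yates: in-place shuffle, each permutation has probability 1/n!."""
    for i in range(len(a) - 1, 0, -1):
        j = random.randrange(0, i + 1)   # uniform index in [0, i]
        a[i], a[j] = a[j], a[i]
    return a
```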

Skiplists
• Like a map
• Support get, put, remove etc.

Insert, remove and search are all O(log n) because the height is O(log n), since the number of elements at each level halves in expectation based on the randomised coin flips
