Data Structures and Algorithms

Data structure

Provides an organization for the mathematical and logical representation of data

Factors to consider in choosing a data structure

What kind of information will be stored?


How will that information be used?
Where should data be kept?
What is the best way to organize the data?
What aspects of memory and storage reservation management should be considered?

Algorithm

An ordered sequence of well-defined, finite instructions for processing data

Part of the Design phase in the Software Development Cycle

Wrapper Class

Java defines a wrapper class for each base type.

Wrapper classes provide a way to use primitive data types as objects.

Java provides additional support for implicitly converting between base types and their wrapper types through a process known as
automatic boxing (primitive type to object-reference type) and unboxing (object-reference type to primitive type)

Integer a = 5; // boxing - the int value 5 is wrapped inside Integer 'a'


int b = a; // unboxing - Integer 'a' is unboxed and assigned to int 'b'
int k = Integer.parseInt("34"); // using static method parseInt() from the Integer class

Note: one numerical wrapper type cannot be assigned to another, even though the corresponding assignment is allowed for primitive-type data.

Integer a = 5;
Double x = a; // Not valid
Double y = 4; //Not valid (it must be 4.0)
double m = 2; Double n = m; // valid

Bitwise Operators

Java provides the following bitwise operators for integers and Booleans

~ bitwise complement (prefix unary operator)


& bitwise and
| bitwise or
^ bitwise exclusive-or
<< shift bits left, filling in with zeros
>> shift bits right, filling in with sign bit
>>> shift bits right, filling in with zeros

Bitwise Operators
Shift Left
<< Operator.

Shifts a binary number such as 1010 to the left with 0 padding so that it becomes 10100

Maximum of 32 bits, otherwise causes overflow

Shift Right

Maximum of 32 bits, otherwise causes overflow

With Sign-Bit Padding

>> Operator
Shift to the right (so add a bit to the left) dependent on the Most Significant Bit.

Example 1:
1010 >> becomes 11010

Example 2:
0101 >> becomes 00101

With Zero Padding

>>> Operator
Shift to the right (so add a bit to the left) independent of the Most Significant Bit (always add a 0 as MSB)

Example 1:
0101 >>> becomes 00101

Example 2:
1010 >>> becomes 01010
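The three shift operators can be checked directly in Java; a small sketch (the numbers are our own examples, not from the notes):

```java
public class ShiftDemo {
    public static void main(String[] args) {
        int a = 0b1010;                    // decimal 10
        System.out.println(a << 1);        // 20 (binary 10100): left shift, zero padding
        System.out.println(-8 >> 1);       // -4: right shift keeps the sign bit
        System.out.println(-8 >>> 1);      // 2147483644: right shift fills with zeros
    }
}
```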

Recursion
Recursion

A top-down technique in which a method solves a problem in terms of a simpler version of itself, repeating until a base case is satisfied.

Disadvantage: Always takes up more memory (memory hungry process) than the Iteration approach, but could be faster to design

View Recursive Methods to see more info

Iteration

A bottom-up technique used to run a block of code repeatedly until a specific condition no longer exists. Normally performed with
loops

Recursive Method

A method that calls itself.

Recursive solutions use memory repetitively by

allocating memory for parameters and local variables


storing the address of where control returns after the method terminates in the Random Access Memory (RAM)
These actions are called overhead and take place with each method call.

The depth of recursion is the term used for how many times the recursive method calls itself.
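A minimal recursive method showing the base case and the depth of recursion (factorial is our own example, not from the notes):

```java
public class Factorial {
    // Each call allocates a new stack frame (the overhead described above);
    // the depth of recursion for factorial(n) is n.
    public static long factorial(int n) {
        if (n <= 1) {
            return 1;                    // base case: stops the recursion
        }
        return n * factorial(n - 1);     // simpler version of the same problem
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // 120
    }
}
```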

Generic Types
Generic Class

A generic class can be created to generate a family of classes

Syntax:

Follow the class name with an identifier enclosed in angle brackets: public class MyClass<T>
Here MyClass is a generic class, and the identifier T is a class-reference placeholder, which will be filled in by an existing
class reference

The key benefit of generics is to enable errors to be detected at compile time rather than at runtime.

Generic Type Method

Generic methods introduce their own type parameters. This is like declaring a generic type, but the type parameter's scope is limited
to the method where it is declared.

Static and non-static generic methods are allowed, as well as generic class constructors.

The syntax for a generic method includes a list of type parameters, inside angle brackets, which appears before the method's return
type. For static generic methods, the type parameter section must appear before the method's return type.

public <T> void myMethod(T o1, int x);

The static keyword must appear before the type parameter if you are declaring a static generic method

public static <T> void myMethod(T o1, int x);
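A compilable sketch of a static generic method (the method name and body are ours, for illustration):

```java
public class GenericDemo {
    // <T> appears after static and before the return type
    public static <T> T firstOf(T a, T b) {
        return a;
    }

    public static void main(String[] args) {
        String s = GenericDemo.firstOf("x", "y"); // T inferred as String
        Integer i = GenericDemo.firstOf(1, 2);    // T inferred as Integer (autoboxed)
        System.out.println(s + " " + i);
    }
}
```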

For-Each Loop

Java
Used to traverse a complete array sequentially from index 0 to the end, without using an index variable.

for (dataType variable_Name : array_Name) {


    // Code here
}

JavaScript
Executes a provided function once for every element in an array. Looks something like so:

array.forEach(element => console.log(element));

Bounded Generic Type

A generic type that can be specified as a subtype of another type.

Example:
<T extends Circle> Specifies that T is a generic subtype of Circle class
<? extends T> a bounded wildcard which specifies T or any subtype of T
<? super T> a lower-bounded wildcard which specifies T or any super-type of T

Unbounded Generic Type

<?> is an unbounded wildcard, the same as <? extends Object>

<T> is an unbounded generic type, the same as <T extends Object>

Restrictions on Generics

1. Cannot create an instance of a generic type


2. Generic array creation is not allowed
3. A generic type parameter of a class is not allowed in a static context
4. Exception classes cannot be generic

ArrayList

ArrayList is a generic class with the header class ArrayList<E>

An ArrayList class allows object-storage; not primitive-type data storage

An ArrayList dynamically expands or shrinks depending on items being added or removed.

It must be imported as so:


import java.util.ArrayList;

Creating an ArrayList
Here is an example of creating an ArrayList of strings

ArrayList<String> anyNameList = new ArrayList<String>();

The default capacity of an ArrayList is 10 items

ArrayList Methods

All ArrayList methods are done like so:


ArrayListName.method_goes_here()

add(Data) Adds a data-item at the end of the array list


add(index, Data) Adds a data-item at the indexed location
size() returns the number of items in the ArrayList
get(index) accesses the data-item at an index of an array-list
remove(index) removes a data-item from an index of an array-list
set(index, Data) replaces the existing data-item at the indexed location
contains(Data) returns true if the array list contains Data , otherwise returns false
toString() returns a formatted String form of the array list, however this can be accessed just by writing the name of the list.
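The methods above in a short runnable sketch (the list name and contents are our own):

```java
import java.util.ArrayList;

public class ListDemo {
    public static void main(String[] args) {
        ArrayList<String> names = new ArrayList<>();
        names.add("Ada");                             // append at the end
        names.add(0, "Alan");                         // insert at index 0 -> [Alan, Ada]
        names.set(1, "Grace");                        // replace -> [Alan, Grace]
        System.out.println(names.get(0));             // Alan
        System.out.println(names.contains("Grace"));  // true
        names.remove(0);                              // -> [Grace]
        System.out.println(names.size());             // 1
    }
}
```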

Generic Types Within an Interface


Consider an interface Pairable that declares this method

public interface Pairable<S>
{
    public void setPair(S firstItem, S secondItem);
} // end Pairable

A class that implements this interface could begin with the following statement:

public class OrderedPair < T > implements Pairable < T >

Comparable Interface

Method compareTo compares two objects, say x and y, and returns a negative integer if x < y, positive if x > y, and 0 if x == y

package java.lang;
public interface Comparable<T>{
public int compareTo(T other);
}
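A class implementing that contract (the Money class is a made-up example):

```java
public class Money implements Comparable<Money> {
    private final int cents;

    public Money(int cents) { this.cents = cents; }

    @Override
    public int compareTo(Money other) {
        // negative if this < other, positive if this > other, 0 if equal
        return Integer.compare(this.cents, other.cents);
    }

    public static void main(String[] args) {
        System.out.println(new Money(100).compareTo(new Money(250)) < 0); // true
    }
}
```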

Exception
Error

Syntax Error
Linker Error
Runtime Error
Logic Error

Exception

An event that represents a runtime error or run-time condition that prevents the program from executing normally.

In Java, exceptions are objects that are generated by a code that encounters an unexpected situation.

Default Exception Handler

If no handler catches an exception, Java's default exception handler prints the exception's stack trace and terminates the program.

Exception-Handling

When reading an exception's stack trace, focus on the line that names the exception and the first stack-frame line that points into your own code

Arithmetic Exception

A type of Exception which occurs when you attempt to divide by 0

NaN

For floating-point division, when both operands are 0 and at least one is a non-integer type (e.g. 0.0/0.0), the result is NaN

Null Pointer Exception

Thrown when you try to access an object through a reference variable that points to null (nothing)

Exception Handling
An effective means to handle runtime errors so that the regular flow of the application can be preserved. This includes declaring,
throwing, and catching an exception.

Checked Exception

The compiler makes exception handling mandatory; the code will not compile if the exception is neither caught nor declared.

Declaring Checked Exceptions

Every method must state the types of checked exceptions it might throw. This is known as declaring exceptions

Example:

public static void main(String[] args) throws IOException {


// Code Here
}

Unchecked Exception

Exception Handling is optional

A Null Pointer Exception is thrown if you try to access an object through a reference variable before an object is assigned to it

Try Catch

The general methodology for handling exceptions.

If it throws an exception, then that exception is caught by having the flow of control jump to a predefined catch block that contains
the code to apply an appropriate resolution.

If no exception occurs in the guarded code, all catch blocks are ignored/skipped.

try {
guardedBody
} catch (exceptionType1 variable1) {
remedyBody1
} catch (exceptionType2 variable2) {
remedyBody2
}...
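A concrete instance of the pattern above (the method name is ours), guarding a division against an ArithmeticException:

```java
public class TryCatchDemo {
    public static int safeDivide(int a, int b) {
        try {
            return a / b;                 // guarded body
        } catch (ArithmeticException e) { // remedy body for division by zero
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(safeDivide(10, 2)); // 5 - catch block skipped
        System.out.println(safeDivide(10, 0)); // 0 - exception caught
    }
}
```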

Fundamental Data Structures


Data structure

Provides an organization for the mathematical and logical representation of data

Factors to consider in choosing a data structure

What kind of information will be stored?


How will that information be used?
Where should data be kept?
What is the best way to organize the data?
What aspects of memory and storage reservation management should be considered?
Linear Data Structure

Sequential, going from one value, to the next, to the next, etc

Each element is linearly connected to the others, holding a reference to the next (and possibly previous) element.

Wastage of memory is more common in comparison to Non-Linear Data Structure

Stacks, Queues, Arrays, and Linked Lists are all examples of linear data structures

Non-Linear Data Structure

Data items in a non sequential order

Elements are connected in a hierarchical manner

Implementation is more complex as multiple levels are involved, and memory is consumed wisely and there is almost no wastage of
memory

Graphs or Trees are examples of non-linear data structures

Homogeneous Data Structure

Data items in each repository of the data structure are of the same type

Heterogeneous Data Structure

Data items in each repository of the data structure are of various types

Static Data Structure

A data structure that has fixed sizes and memory locations at compile time

Dynamic Data Structure

A data structure that has dynamic sizes and memory locations which can shrink or expand, depending on use.

Array

A concrete data structure that represents a collection of data items of the same type, stored in consecutive memory locations.
These are Static Data Structures, and the size of an array is constant.

Java

Declared like so:

datatype[] arrayRefVar = new datatype[size];


// OR
datatype[] myArr = new datatype[]{val1, val2, val3};

The initial values for all the array spaces would be the default values
Linear Search

Searches an Array for a key element by looking through each element in the array sequentially. This has a Time Complexity of
O(n).

This returns −1 if the key element is not found within the array.
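A straightforward Java version of the description above (our sketch; class and method names are ours):

```java
public class LinearSearch {
    // Scan each element sequentially: O(n); return -1 if the key is absent
    public static int search(int[] a, int key) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == key) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] nums = {7, 3, 9, 3};
        System.out.println(search(nums, 9)); // 2
        System.out.println(search(nums, 5)); // -1
    }
}
```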

Stack

A collection of elements, inserted and removed according to the last-in, first-out (LIFO) principle.

The elements can be inserted or deleted only from one side of the list. The insertion operation is called "push", and the deletion
operation is called "pop"

To keep track of the last element in the list, a pointer called "top" is defined.
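In the JDK, a Deque is commonly used as a LIFO stack; a usage sketch (ArrayDeque is one possible implementation choice):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackDemo {
    public static void main(String[] args) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        stack.push(3);                    // 3 is now the "top"
        System.out.println(stack.pop());  // 3 - last in, first out
        System.out.println(stack.peek()); // 2 - new top, not removed
    }
}
```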

Queue

A collection of elements inserted and removed according to the first-in, first-out (FIFO) principle

Unlike Stacks a queue is open on both ends. One end is used to insert data (add) and the other end is used to remove data (remove)

Methods

Enqueue
Dequeue
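In the JDK these operations appear on the Queue interface as add (enqueue) and remove (dequeue); a usage sketch:

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class QueueDemo {
    public static void main(String[] args) {
        Queue<String> q = new ArrayDeque<>();
        q.add("first");                  // enqueue at the rear
        q.add("second");
        System.out.println(q.remove());  // first - FIFO: dequeue from the front
        System.out.println(q.peek());    // second - next to be served
    }
}
```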

Circular Queue

The rear position is connected back to the front position of an array to make a circle.

Linked List

A data structure where the objects are arranged in a linear order. It contains a pointer to the next item in a field called "next" and a
value at each node.

Doubly Linked List

Similar to a Linked List, but has three fields at each node: previous (pointer to the previous node), next (pointer to the next
node), and value (the value at that node). It is easier to traverse back and forth on a doubly linked list.
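A minimal doubly linked node matching the three fields named above (class and method names are our own):

```java
public class DNode<T> {
    T value;
    DNode<T> prev;
    DNode<T> next;

    DNode(T value) { this.value = value; }

    // link b directly after a, wiring both directions
    static <T> void link(DNode<T> a, DNode<T> b) {
        a.next = b;
        b.prev = a;
    }

    public static void main(String[] args) {
        DNode<Integer> a = new DNode<>(1);
        DNode<Integer> b = new DNode<>(2);
        link(a, b);
        System.out.println(a.next.value); // 2 - forward traversal
        System.out.println(b.prev.value); // 1 - backward traversal
    }
}
```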

Nested Class

A class that is declared inside a class or interface. Scope is limited to use within the outer class.

Java Collection Framework and Iterators


Abstract Data Type

Represents the ways of organizing data in the computing environment. An ADT provides a specification of

A set of data that is stored


A set of operations that can be performed on the data
An ADT derives its name from the sense that it is independent of various concrete implementations, and so it is an entity with some
abstract methods

Data Structures are realized by implementing ADTs within a programming language

Collection

An ADT that contains a group of objects/items

Container

A class that implements the collection

Collection Interface

The root interface for manipulating a collection of objects. All the methods are abstract.

Iterator Interface

An ADT that, when implemented, scans a sequence of elements, one element at a time.

It can be asked to advance to the next entry, give a reference to the next entry or modify the list as it passes through.

It has a public boolean hasNext(); method which detects whether the iterator has finished its iteration, returning true if it has
not.

A public T next(); method which retrieves the next entry in the collection.

And a public void remove(); method which removes from the collection the last entry that next() returned. It throws
IllegalStateException if next() has not been called before remove(), and UnsupportedOperationException if the
iterator does not permit a remove operation.
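The hasNext()/next()/remove() contract in use; a sketch of our own that removes even numbers while scanning:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IteratorDemo {
    public static void main(String[] args) {
        List<Integer> nums = new ArrayList<>(List.of(1, 2, 3, 4));
        Iterator<Integer> it = nums.iterator();
        while (it.hasNext()) {            // false once the scan is finished
            if (it.next() % 2 == 0) {
                it.remove();              // removes the entry next() just returned
            }
        }
        System.out.println(nums);         // [1, 3]
    }
}
```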

Hashing

A technique that determines a storage index or location for storing an item in a data structure.

The hash function receives the search key and returns the index of an array where the search key is stored. A perfect hash function
maps each search key into an index of the hash table, which saves time in searching or accessing any data item.

Set

Extends the Collection interface and does not contain duplicate elements.

AbstractSet

Extends AbstractCollection, implements toString() and implements Set.

Provides concrete implementations for the equals() and hashCode() methods

The hash code of a set is the sum of the hash codes of all the elements in the set

Since the size() and iterator() methods are not implemented in the AbstractSet class, AbstractSet is an abstract class
HashSet

A concrete class that implements set

Can be used to store duplicate-free elements, and objects added to a hashset need to implement the hashCode() method in a
manner that properly disperses the hash code.

LinkedHashSet

A HashSet that imposes an ordering on its elements, with a head and a tail.

SortedSet

Sub-interface of Set which guarantees that the elements are sorted

TreeSet

Concrete class that implements the SortedSet interface. You can use an iterator to traverse the elements in the sorted order.

One can add objects into a TreeSet if they can be compared with each other. There are two ways to compare objects:

Using the Comparable Interface


If we are using a class whose elements are not instances of Comparable, we need to supply a comparator that implements the
compare() method for that class. This approach of using a comparator from the Java Collection Framework is referred to as
order by comparator.

Comparator Interface

Used when we want to define an ordering for TreeSet elements whose class does not implement Comparable (or when a different
order is needed). The Comparator interface has two very useful methods, compare() and equals().

List Interface

A List allows duplicate elements to be stored in a collection, and can grow without a fixed bound. The interface is inherited by
ArrayList and LinkedList.

Priority Queue

Elements are assigned priorities. When accessing elements, the element with the highest priority is removed (served) first. Priority is
given to the smallest value.

It compares the elements according to the natural order using the comparable interface.

ListIterator Interface

An alternative interface for iterators in the java class library that enables traversal in either direction.

Map ADT, Collections Class, and Arrays Class


Map Abstract Data Type

An ADT which models a searchable collection of key-value entries.


The main operations of a map are for searching, inserting, and deleting items. Multiple entries with the same key are not allowed, but
having multiple entries of the same value are allowed.

Some Functionalities

M.get(k); if the map M has an entry with key k, return its associated value; else, return null
M.put(k, v); insert entry (k, v) into map M; if key k is not already in M, then return null; else, return old value associated with k
M.remove(k); if the map M has an entry with key k, remove it from M and return its associated value; else, return null
size() , isEmpty() Returns the number of key-value pairs, or a boolean on if it is empty or not
entrySet() return an iterable collection of all of the entries in M
keySet() return an iterable collection of the keys in M
values() return an iterator of the values in M
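The return-value conventions above, exercised against a HashMap (a small sketch of our own):

```java
import java.util.HashMap;
import java.util.Map;

public class MapDemo {
    public static void main(String[] args) {
        Map<String, Integer> m = new HashMap<>();
        System.out.println(m.put("a", 1)); // null - key "a" was not in the map
        System.out.println(m.put("a", 2)); // 1 - old value returned, no duplicate keys
        System.out.println(m.get("a"));    // 2
        System.out.println(m.remove("b")); // null - no entry with key "b"
        System.out.println(m.size());      // 1
    }
}
```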

Map Interface

Maps keys to the data elements. The keys are like indices in a List, where the indices are integers. In Map, the keys can be any
object.

Map.Entry Interface

The entrySet() method in the Map interface returns a set of objects that implement the Map.Entry<K, V> interface where Entry
is an inner interface for the Map interface.

Each object in the set is a specific key-value pair in the underlying map.

Load Factor

It is the measure that decides when to increase the capacity of the Map. The default load factor is 0.75. The threshold
of a HashMap is the product of the current capacity and the load factor. A HashMap with the default initial capacity (16) and
the default load factor (0.75) has a threshold of 0.75 ⋅ 16 = 12, which means that it will increase the capacity from 16 to 32 after the
12th entry.

HashMap

Efficient for locating, inserting, and deleting a mapping value.

TreeMap

Implements SortedMap, and is efficient for traversing the keys in a sorted order. Maintains the mapping in ascending order of keys

LinkedHashMap

Extends HashMap with a linked-list implementation that supports an ordering of the entries in
the map. The entries in a HashMap are not ordered, but the entries in a LinkedHashMap can be retrieved in the order in which they
were inserted into the map, or the order in which they were accessed, from the first accessed to the most recently accessed.

Insertion Order

In which the entries were inserted in the linked hash map

Access Order
In which the entries were accessed. In this case, any entries that were never accessed are ordered by insertion order first, and the
accessed ones follow in access order.

Collections Class

Contains various static methods for operating on collections and maps, for creating synchronized collection classes, and for creating
read-only collection classes

A synchronized collection implies that the class is thread safe

Collection classes are not synchronized by default


A collection object is mutable; if two threads operate on the object at the same time and one thread changes its value,
the other thread is affected by the change. So it is not thread safe
A thread is the path followed when executing a program
A single-threaded application has only one thread and can handle one task at a time
To handle multiple tasks in parallel, multithreading is used: multiple threads are created, each performing a different task

Arrays Class

The Arrays class contains various static methods for sorting and searching arrays, for comparing arrays, and for filling arrays with
data elements

It also contains a static method for converting an array to a list (Arrays.asList())

Introduction to Algorithms
Algorithm

An ordered sequence of well-defined, finite instructions for processing data

Part of the Design phase in the Software Development Cycle

Divide and Conquer

An algorithm that repeatedly divides a problem into smaller subproblems (normally recursively) until the subproblems are small
enough to be solved easily, and then combines the solutions to the subproblems to solve the original problem.

Decrease and Conquer

A simpler variant of divide and conquer that reduces a problem to a single smaller subproblem and uses the solution of that
subproblem to solve the bigger problem.

Dynamic Programming

An algorithm which avoids re-computing solutions that have already been computed.

It constructs optimal solutions to a problem from optimal solutions to subproblems.

The main difference between dynamic programming and divide and conquer is that the subproblems are independent in divide and
conquer, whereas subproblems overlap in dynamic programming.

Greedy Algorithm
Like a dynamic programming algorithm, but the difference is that the solutions to the subproblems do not have to be known at each
stage; instead, a "greedy" choice can be made of what looks best for the moment.

The greedy method extends the solution with the best possible decision at an algorithmic stage based on the current local optimum
and the best decision made in a previous stage.

It is not exhaustive and does not give accurate answers to many problems

It does not guarantee to find the optimal solution for any given problem

Linear Programming

Informally, it determines the way to achieve the best outcome in a given mathematical model.

When solving a problem using linear programming, specific inequalities involving the inputs are found and then an attempt is made to
maximize some linear function of the inputs

You can use linear programming only if there is a linear relationship between the variables you are looking at

Linear programming is a mathematical technique that determines the best way to use available resources

Search and Enumeration

A trivial but very general problem-solving technique that consists of systematically enumerating/itemizing all possible candidates for
the solution and checking whether each candidate satisfies the problem's statement

Backtracking

General Search Algorithm for finding all solutions to some computational problem that incrementally builds candidates to the
solutions, and abandons each partial candidate c ("backtracks") as soon as it determines that c cannot possibly be a valid solution

Probabilistic Algorithm

An algorithm that makes some choices randomly. For some problems, it can in fact be proven that the fastest solutions must involve
some randomness

Genetic Algorithm

An algorithm that attempts to find solutions to problems by mimicking biological evolutionary processes, with a cycle of random
mutations yielding successive generations of "solutions"

Heuristic Algorithm

An algorithm whose general purpose is not to find an optimal solution, but an approximate solution when time or resources are
limited and finding a perfect solution is not practical.

Algorithm Efficiency
Time Complexity

The time it takes to run/execute an algorithm.

Space Complexity
The memory an algorithm needs to run/execute

Algorithm Growth Rate

Constant ≈ 1
Logarithmic ≈ log n
Linear ≈ n
N-Log-N ≈ n log n
Quadratic ≈ n²
Cubic ≈ n³
Exponential ≈ 2ⁿ

Searching Algorithms
Linear Search

Searches an Array for a key element by looking through each element in the array sequentially. This has a Time Complexity of
O(n).

This returns −1 if the key element is not found within the array.

Binary Search

A quick and efficient method that uses the decrease and conquer approach.

Prerequisite: The data set must be sorted

Divides the input collection into approximately equal halves, and with each iteration compares the search-element with the element in
the middle.

If the element is found, the search ends. Otherwise, it continues looking for the element by dividing and selecting the appropriate
portion of the array

public int search(int[] nums, int target) {
    int low = 0;
    int high = nums.length - 1;
    int mid;
    while (high >= low) {
        mid = (high + low) / 2;
        if (nums[mid] > target) {
            high = mid - 1;
        } else if (nums[mid] == target) {
            return mid;
        } else {
            low = mid + 1;
        }
    }
    return -low - 1; // not found: -(insertion point) - 1
}

Sorting Algorithms
Sorting

The process of rearranging a set of data items in ascending or in descending order based on a key in the data-set.
Stability of an Algorithm

A sorting algorithm is stable if it preserves the order of duplicate keys.

Sorting Algorithm

An algorithm that implements sorting.

Examples:

Bubble Sort (Stable)


Insertion Sort (Stable)
Selection Sort (Unstable)
Shell Sort (Unstable)
Merge Sort (Stable)
Quick Sort (Unstable)
Bucket Sort (Unstable)
Heap Sort (Unstable)

Bubble Sort

A Sorting Algorithm that steps through the data-set from one end to the other, and compares adjacent pairs of elements in each pass.
The elements are swapped if they are in the wrong order. After the first pass, the last element becomes the largest in the array. This
is repeated until the list is sorted completely.

Worst Case Time: O(n²)

Best Case Time: O(n)

bubbleSort (array list of size n){


for(i from 1 to n - 1){
for(j from 0 to n-1-i){
if(list[j]>list[j+1]){
temp = list[j];
list[j] = list[j+1];
list[j+1] = temp;
}
}
}
}
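The pseudocode above translates to Java as follows (a direct sketch of the same loops; names are ours):

```java
import java.util.Arrays;

public class BubbleSort {
    public static void sort(int[] list) {
        int n = list.length;
        for (int i = 1; i <= n - 1; i++) {        // after pass i, the last i items are in place
            for (int j = 0; j <= n - 1 - i; j++) {
                if (list[j] > list[j + 1]) {      // swap adjacent out-of-order pair
                    int temp = list[j];
                    list[j] = list[j + 1];
                    list[j + 1] = temp;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 1, 4, 2};
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 4, 5]
    }
}
```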

Insertion Sort

On each iteration, Insertion sort removes one element from the list, finds the best location and inserts it there. It repeats this process
until no input elements remain to be checked for its correct order.

Worst Case: O(n²)

Best Case: O(n)

Is adaptive (efficiently sorts data that is already generally sorted), stable, dynamic (sorts the array as it receives new items), in-place
(requires a consistent, small amount of memory), and simple implementation

insertionSort(array list of size n){


i = 0; j = 0;
for(i from 1 to n - 1){
key = list[i];
for(j = i - 1; j >= 0 and list[j] > key; j--){
list[j+1] = list[j]
}
list[j+1] = key
}
}

Selection Sort

Works on the data set by selecting the smallest data item from the list, and starts building a sorted list on the left end of the list. It
repeats the same for the unsorted portion till the whole list is sorted.

Worst Case: O(n²)

Advanced Sorting Algorithms


Merge Sort

Follows the divide and conquer approach. It continuously divides the unsorted list until it reaches n single element sub-lists. Then it
repeatedly merges the sub-lists together to produce new sorted sub-lists until all elements are fully merged into a single sorted array.

mergeSort(a, first, last){


if(first < last) {
mid := (first + last)/2
mergeSort(a, first, mid)
mergeSort(a, mid+1, last)
Merge the sorted halves a[first..mid] and a[mid+1..last]
}
}

Quick Sort

Sorts a sequence S using a recursive approach based on the Divide and Conquer technique:

Divide: If S has at least 2 elements, select a random element x (called the pivot) and partition S into 3 sequences:
L storing the elements less than x
E storing the elements equal to x. If the elements of S are all distinct, then E will hold the pivot only.
G storing the elements greater than x
Recur: Recursively sort the sequence L and G
Conquer: Put back the elements into S , by concatenating L, E, and G

Here is the pseudo code:

Algorithm quickSort(a, first, last){


if (first < last) {
Choose a pivot
Partition the array
Index := index of pivot
quickSort(a, first, Index-1)
quickSort(a, Index+1, last)
}
}

Time Complexity
The worst case for quicksort occurs when the pivot is the unique minimum or maximum element in a sorted data-set.

One of L and G has size n − 1 and the other has size 0.

The running time is proportional to the sum n + (n − 1) + ⋯ + 2 + 1 = n(n + 1)/2

Thus, the worst case running time of quick-sort is O(n²)

The best and average case is O(n log n)

In Place Partitioning
This is a methodology to implement quick sort without creating two new subsequences

Here are the initial stages of the first partition step of the algorithm:

Choose the pivot and swap it with the last element (array.length - 1) of the array
Use two indices, the left most index l (index: 0) and the right-most index r (index: array.length-2)
In the partition step, index l scans the sequence from left to right, and index r scans the sequence from right to left, until they
cross each other.
A swap is performed when the l-th element is larger than the pivot and the r-th element is smaller than the pivot.
Now, the above states are repeated recursively for the resultant partitions

Here is the pseudo-code:

Algorithm inPlaceQuickSort(S, a, b)
Input An array S, index a and index b
Output Array S

if a >= b {
    return
}
p := S[b]
l := a
r := b - 1
while l <= r {
    while l <= r and S[l] <= p {
        l++
    }
    while l <= r and S[r] >= p {
        r--
    }
    if (l < r) {
        swap S[l] and S[r]
        l++
        r--
    }
}
swap the elements S[l] and S[b]
inPlaceQuickSort(S, a, l-1)
inPlaceQuickSort(S, l+1, b)

Bucket Sort

This sorting algorithm avoids comparison by creating and distributing elements into buckets according to their radix.

It treats an array of elements as if they are strings of the same length.

Then, it groups elements by a specified digit or character of a string; this digit or character is known as the key.

Elements are placed into "buckets" based on the matching key

For elements with more than one significant digit, this bucketing process is repeated for each digit, while preserving the order of the
prior step, until all digits are considered.

Here is the pseudo-code:

Algorithm radixSort(a, first, last, maxDigits)


for(i := 0 to maxDigits - 1){
Clear bucket [0], bucket [1], ..., bucket[9]
for(index := first to last){
digit := digit i of a[index]
Place a[index] at end of bucket[digit]
}
Place contents of buckets into array a.
}

Time Complexity
Each key is looked at once for each digit of the data items.

If the longest key has M digits and there are N keys, the radix sort has order O(M × N)

Since the size of the keys is not significant, this is a time complexity of O(N)

Hashing
Hashing

A technique that determines a storage index or location for storing an item in a data structure.

The hash function receives the search key and returns the index of an array where the search key is stored. A perfect hash function
maps each search key into an index of the hash table, which saves time in searching or accessing any data item.

Hash Function

The hash function in a data structure maps a key with any arbitrary size of data to fixed-sized integer data called hash/hash
code/hash sum.

hash = hashfunction(key)

The hash/hash code is then compressed into the range of indices for the hash table (an array of entries, where the computed index
acts as the key and the value is the stored item):
index = hash % array_size
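That compression step can be written in Java; Math.floorMod keeps the index non-negative even for negative hash codes (a sketch of our own, the method name is hypothetical):

```java
public class HashIndexDemo {
    // compress a hash code into a table index in the range [0, tableSize)
    static int indexFor(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        System.out.println(indexFor("apple", 13)); // some index in 0..12
        System.out.println(indexFor(-42, 13));     // 10 - still non-negative
    }
}
```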
Hash Collision

Typical hash functions are not unique, and normally more than one search key can map to a single index. This is known
as a collision.

Hash Code

Any search key is converted into an integer value called the hash code, with the aid of a hash function. In this case, primitive type
integer data is used as is. Other primitive-type numerical ones can be casted into int-type and the values can be used as is.

Sometimes, internal binary representations are manipulated when a loss of bits may occur. For example, compressing a long (64
bits) into an int (32 bits) may result in the loss of bits. In that case, a folding process is used.

Size of a Hash Table

If the size of a hash table is an even number, even hash codes will result in even indices and odd hash codes will result in odd
indices.

We resolve the above problem by using a prime number (which is not 1 or 2) as the size of the hash table.

If a hash code returns a negative value, we add n to it after the mod operation to get a value between 0 and n-1.
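A minimal Java sketch of the compression step, including the fix for negative hash codes; the class and method names are illustrative:

```java
public class HashIndex {
    // Compresses an arbitrary hash code into a valid table index.
    // In Java, hash % tableSize can be negative; adding tableSize after
    // the mod operation yields a value between 0 and tableSize - 1.
    public static int indexFor(Object key, int tableSize) {
        int hash = key.hashCode();      // hash = hashfunction(key)
        int index = hash % tableSize;   // index = hash % array_size
        if (index < 0) index += tableSize;
        return index;
    }
}
```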

Open Addressing

Finding an unused, or open location in the hash table is called open addressing.

An open addressing scheme locates an alternate location.

A problem with open addressing is that a hash table may not have any null location; frequent additions and removals can cause
every location in the hash table to reference either a current entry or a former entry (available).

Linear Probing

Resolves a hash collision during hashing by examining consecutive locations in the hash table, beginning at the original hash index, to
find the next available location.

If a collision occurs at hashTable[k] look successively at location k+1, k+2,... till a free location is found - this is known as probe
sequence.

If the probe sequence reaches the end of the hash table, it continues at the beginning of the table.

Types of Locations in Hash Table

Occupied: The location references an entry in the hash table

Empty: The location contains null and always did. If in linear probing, the search will STOP the moment it encounters null

Available: The location's entry was removed from the hash table. The search will go past this location if the expected entry is not
found in any available location.
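A Java sketch of linear probing that distinguishes the three location types: occupied, empty (null, which stops a search), and available (a removed entry, which a search skips past). Names are illustrative, and the sketch assumes the table never fills completely:

```java
public class LinearProbingTable {
    private static final Object AVAILABLE = new Object(); // marks a removed entry
    private final Object[] keys;

    public LinearProbingTable(int size) { keys = new Object[size]; }

    private int hash(Object key) {
        int h = key.hashCode() % keys.length;
        return h < 0 ? h + keys.length : h;
    }

    // Probe sequence k, k+1, k+2, ... wrapping around at the end of the table.
    public void add(Object key) {
        int i = hash(key);
        while (keys[i] != null && keys[i] != AVAILABLE) i = (i + 1) % keys.length;
        keys[i] = key;   // reuse empty or available locations
    }

    public boolean contains(Object key) {
        int i = hash(key);
        while (keys[i] != null) {                 // empty (null) stops the search
            if (key.equals(keys[i])) return true;
            i = (i + 1) % keys.length;            // goes past available locations
        }
        return false;
    }

    public void remove(Object key) {
        int i = hash(key);
        while (keys[i] != null) {
            if (key.equals(keys[i])) { keys[i] = AVAILABLE; return; }
            i = (i + 1) % keys.length;
        }
    }
}
```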

Quadratic Probing

Similar to linear probing; however, instead of probing at k + 1, k + 2, k + 3, ..., it probes at k + 1², k + 2², k + 3², ...

This reaches half of the locations in the hash table if the table size is a prime number, hence is good for a load-factor of less than 0.5.
With a load-factor of greater than or equal to 0.5, the collision may not be resolved at all, since not all the open slots in the table will
be probed.

Quadratic probing avoids primary clustering, but can lead to secondary clustering.

Double Hashing

Double hashing uses a second hash function, which is different from the first and depends on the search key. Double hashing
produces key-dependent probe sequences while linear and quadratic probe sequences are key-independent.

This resolves collision by examining locations

At original hash index


Plus an increment (the constant 1 for linear probing, j² (j ≥ 0) for quadratic probing), here determined by a second hash function
that depends on the search key.

This reaches every location in hash table if table size is a prime number, and avoids both primary and secondary clustering.

The secondary hash function h2(key) is used in the probe sequence:

[h1(key) + j·h2(key)] mod n, for j = 0, 1, ⋯, n − 1

The secondary hash function cannot have a zero value.

A common choice of function for the second hash function is a compression function:

h2(key) = q − (key mod q)

Where:

q < n

q is a prime number

The possible values for h2(key) are 1, 2, ⋯, q
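A sketch of the resulting probe sequence in Java, assuming a non-negative integer key and h1(key) = key mod n; the names are illustrative:

```java
public class DoubleHashing {
    // Probe sequence [h1(key) + j*h2(key)] mod n for j = 0, 1, ..., n-1,
    // with h2(key) = q - (key mod q), where q is a prime smaller than n.
    // h2 can never be zero: its values range over 1..q.
    public static int[] probeSequence(int key, int n, int q) {
        int h1 = key % n;
        int h2 = q - key % q;
        int[] probes = new int[n];
        for (int j = 0; j < n; j++)
            probes[j] = (h1 + j * h2) % n;
        return probes;
    }
}
```

With n prime, the n probes visit every location exactly once, which is why double hashing avoids both primary and secondary clustering.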

Separate Chaining

The hash table structure is altered by making each index location a reference (pointer) to a chain of data items.

Each location is called a bucket.

A bucket can be:

A list
A sorted list
A chain of linked nodes
An array
A vector

Linked list is the most common implementation.

Chaining slows things down with collisions as we must now search through a linked list, an O(n) operation. However, since a chain is
typically shorter than the table, it is O(k) where k is the length of the chain.
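A minimal separate-chaining table in Java using linked-list buckets, the most common implementation; class and method names are illustrative:

```java
import java.util.LinkedList;

public class ChainedHashTable {
    private final LinkedList<String>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    private int indexFor(String key) {
        int h = key.hashCode() % buckets.length;
        return h < 0 ? h + buckets.length : h;
    }

    // Addition appends to the chain, so it costs O(1) plus the hash.
    public void add(String key) { buckets[indexFor(key)].add(key); }

    // Search walks one chain only: O(k), k = chain length.
    public boolean contains(String key) {
        return buckets[indexFor(key)].contains(key);
    }

    public boolean remove(String key) {
        return buckets[indexFor(key)].remove(key);
    }
}
```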

Successful retrieval or removal has the same efficiency as a successful search.

Unsuccessful retrieval or removal has the same efficiency as unsuccessful search.


Successful addition has the same efficiency as unsuccessful search.

Unsuccessful addition has the same efficiency as successful search.

Load Factor

It is the measure that decides when to increase the capacity of the Map. The default load factor is 75% of the capacity. The threshold
of a HashMap is approximately the product of current capacity and load factor. A HashMap with the default initial capacity (16) and
the default load factor (0.75) has the threshold of 0.75 ⋅ 16 = 12, which means that it will increase the capacity from 16 to 32 after the
12th entry.
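The threshold arithmetic can be illustrated in Java; java.util.HashMap also exposes both values through its constructor (the helper class here is illustrative):

```java
import java.util.HashMap;

public class LoadFactorDemo {
    // threshold = capacity * loadFactor; the map resizes
    // once the number of entries exceeds this value.
    public static int threshold(int capacity, double loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static HashMap<String, Integer> defaultStyleMap() {
        // the same values HashMap uses by default: capacity 16, load factor 0.75
        return new HashMap<>(16, 0.75f);
    }
}
```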

Rehashing

Increasing the size of a hash table followed by placing the current entries to the expanded table is rehashing.

When the preset load-factor of the hash table is reached, rehashing is carried out.

Tree Fundamentals
Tree

A data structure that stores elements hierarchically. Except for the top most element, each element in a tree has one parent element
and zero or more children elements.

The topmost element is called the root of the tree.

Binary Tree

A tree containing at most 2 children per node is called a binary tree.

Full Binary Tree

A binary tree where every non-leaf node has 2 children

Complete Binary Tree

Every level above the last level is full, while all the nodes in the last level are as far left as possible.

Pre-Order Traversal

Visit the root before the sub-trees. For each sub-tree visit the parent first and then left and right children (P-L-R).

In-Order Traversal

Visit root between visiting subtrees. For each sub-tree visit the left child, then the parent, and the right child (L-P-R).

Post-Order Traversal

Visit root after visiting the subtrees. For each sub-tree visit left child, then the right child and the parent (L-R-P).

Level-Order Traversal
Visit nodes by level from top to bottom and from left to right.
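The four traversals can be sketched in Java; the Node class and method names are illustrative:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class Traversals {
    static class Node {
        int value; Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value; this.left = left; this.right = right;
        }
    }

    static void preOrder(Node n, List<Integer> out) {   // P-L-R
        if (n == null) return;
        out.add(n.value); preOrder(n.left, out); preOrder(n.right, out);
    }

    static void inOrder(Node n, List<Integer> out) {    // L-P-R
        if (n == null) return;
        inOrder(n.left, out); out.add(n.value); inOrder(n.right, out);
    }

    static void postOrder(Node n, List<Integer> out) {  // L-R-P
        if (n == null) return;
        postOrder(n.left, out); postOrder(n.right, out); out.add(n.value);
    }

    static List<Integer> levelOrder(Node root) {        // top to bottom, left to right
        List<Integer> out = new ArrayList<>();
        Queue<Node> queue = new ArrayDeque<>();
        if (root != null) queue.add(root);
        while (!queue.isEmpty()) {
            Node n = queue.remove();
            out.add(n.value);
            if (n.left != null) queue.add(n.left);
            if (n.right != null) queue.add(n.right);
        }
        return out;
    }
}
```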

BST and Heap Trees


Binary Search Tree

A binary tree in which the nodes are arranged in a specific order to make the tree-search more efficient, such that the left child of every
node is less in value than the parent node, and the right child is always greater than the parent node.
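A minimal Java sketch of how the ordering property drives add and search; names are illustrative, and duplicates are simply ignored here:

```java
public class BST {
    static class Node {
        int value; Node left, right;
        Node(int value) { this.value = value; }
    }

    Node root;

    // Smaller values descend left, greater values descend right.
    void add(int value) { root = add(root, value); }

    private Node add(Node n, int value) {
        if (n == null) return new Node(value);
        if (value < n.value) n.left = add(n.left, value);
        else if (value > n.value) n.right = add(n.right, value);
        return n;
    }

    boolean contains(int value) {
        Node n = root;
        while (n != null) {
            if (value == n.value) return true;
            n = value < n.value ? n.left : n.right; // ordering prunes half the tree
        }
        return false;
    }
}
```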

Removing an Entry from a BST

We must first find the node, then remove it. If found, there are three cases:

1. The node has no children


2. The node has one child
3. The node has two children

If removing a leaf node, then just remove and no need to rearrange the tree.

If removing a node with a child, replace the node with its child (move the child up and remove the node).

If removing a node with two children, it depends on which subtree you are on. If you are in the left subtree, then find the MAXIMUM
node, and replace with that. Do the opposite for the right subtree. After doing this, you will need to deal with the removal of the right-
most or left-most item as it is moved up, and rearrange the rest of the tree as if it is a removal.

Efficiency of Operations in BST


Most operations on a BST take time directly proportional to the height of the tree (O(log n) for a balanced tree)

The worst case for a single BST operation can be O(n), and for m operations can be O(m * n)

Balanced Tree

A fully balanced tree has subtrees of each node with the exact same height.

A height-balanced tree has subtrees of each node whose heights differ by no more than 1.

Fully balanced or height balanced trees are balanced trees.

Heap Tree

A complete binary tree (naturally balanced) whose nodes are ordered in two different configurations, maxheap and minheap

MaxHeap

A heap tree where the root node is greater than its children. This property is recursively true for all its sub-trees.

Removing the Root

To remove a heap's root:

Replace the root with the heap's last leaf


This forms a semi-heap
Then use the reheap method (think of upheap but downwards)

MinHeap

A heap tree where the root node is smaller than its children. This property is recursively true for all its sub-trees.

Array to Represent a Heap

A heap tree is a complete binary tree, and hence we can store its entries in an array in level order (following a Level-Order Traversal)

This enables easy location of the data in a node's parent or children.

In this case, two approaches can be used to devise the algorithm:

Approach 1

The cell at index 0 is not used


The parent of a node at index i is at index i / 2 (when i ≠ 1)
The left child is at index 2i
The right child is at index 2i + 1

Approach 2

The cell at index 0 is used


The parent of a node at index i is at index (i − 1) / 2

The left child is at index 2i + 1

The right child is at index 2i + 2
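Both indexing conventions can be written as small helper methods. In the 1-based layout (cell 0 unused) the children of i are at 2i and 2i + 1; in the 0-based layout they are at 2i + 1 and 2i + 2. A Java sketch, with illustrative names:

```java
public class HeapIndex {
    // Approach 2: 0-based layout, cell 0 holds the root
    static int parent(int i) { return (i - 1) / 2; }
    static int left(int i)   { return 2 * i + 1; }
    static int right(int i)  { return 2 * i + 2; }

    // Approach 1: 1-based layout, cell 0 unused
    static int parent1(int i) { return i / 2; }
    static int left1(int i)   { return 2 * i; }
    static int right1(int i)  { return 2 * i + 1; }
}
```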

Upheap

Insert the new key k at the first available leaf position on the far left. After the insertion, the heap-order property may be violated. The
upheap algorithm restores the heap-order property by swapping k along an upward path from the insertion node.
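A Java sketch of upheap for a maxheap backed by a 0-based array (fixed capacity for brevity; names are illustrative):

```java
public class MaxHeap {
    private final int[] heap = new int[100]; // fixed capacity for brevity
    private int size = 0;

    // Insert at the first available leaf (the end of the array), then
    // swap the new key upward while it is larger than its parent.
    public void add(int key) {
        heap[size] = key;
        int i = size++;
        while (i > 0 && heap[i] > heap[(i - 1) / 2]) {
            int tmp = heap[i];
            heap[i] = heap[(i - 1) / 2];
            heap[(i - 1) / 2] = tmp;
            i = (i - 1) / 2;   // continue along the path toward the root
        }
    }

    public int peek() { return heap[0]; } // the maximum sits at the root
}
```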

Heap Sort

1. Place array items into a maxheap


2. Swap the root with the content of the last index
3. Decrement the last-index of the array
4. Re-heap
5. Repeat steps 2 to 4 while the last index is greater than 0

For descending order, we can use a minheap

HeapSort is an efficient unstable sorting algorithm, with an average, best, and worst case time complexity of O(n log n)
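The steps above can be sketched in Java with an in-place reheap (sift-down) helper; the names are illustrative:

```java
public class HeapSort {
    public static void sort(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--)   // 1. build a maxheap in place
            reheap(a, i, a.length);
        for (int last = a.length - 1; last > 0; last--) {
            int tmp = a[0]; a[0] = a[last]; a[last] = tmp; // 2. swap root with last index
            reheap(a, 0, last);                            // 3-4. shrink and re-heap
        }
    }

    // Sink the value at index i down until the maxheap order holds
    // within the first `size` cells of the array.
    private static void reheap(int[] a, int i, int size) {
        while (2 * i + 1 < size) {
            int child = 2 * i + 1;
            if (child + 1 < size && a[child + 1] > a[child]) child++; // larger child
            if (a[i] >= a[child]) break;
            int tmp = a[i]; a[i] = a[child]; a[child] = tmp;
            i = child;
        }
    }
}
```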

Balanced Search Trees


AVL Tree

A balanced binary search tree that rearranges its nodes whenever it becomes unbalanced.
The tree balancing happens during addition or removal of a node. Since AVL trees maintain their balance, its height is always
proportional to log n for an n-node tree.

To ensure balance, AVL uses single and double rotation.

2-3 Tree

A balanced search tree that contains internal nodes either with 2 or 3 children.

A 2-children node

Contains one data item s


Has two children
The data item in the left subtree is smaller than s
The data item in the right subtree is greater than s

A 3-children node
Contains 2 data items, s and l
Has 3 children
Data-item < s is in the left subtree
Data-item > l is in the right subtree
s < data-item < l is placed in the middle subtree

Adding Entries

1. Add entries to the 2-3 tree at a leaf-node (same as adding an entry to a binary search tree), one after another, until a leaf
node temporarily holds 3 entries
2. We split the 3-entry node by pushing the center entry up the tree to form a parent, if the parent does not exist
- If the parent exists, the parent accommodates the pushed entry and may become a 3-children node
- In some scenarios, both the leaf and the root may split based on the entry
- This process continues till the last entry is added to the tree
We locate the leaf by using a search algorithm, which is like the BST search algorithm with a minor adjustment

2-4 Tree

A generic balanced search tree.

Interior nodes must have 2, 3, or 4 children.

Leaves occur on the same level

Completely balanced

Red-Black Tree

A self-balancing binary search tree in which every node is colored either red or black. Compared to the AVL tree, it is easier to balance
a red-black BST using a smaller number of rotations, but it is not strictly height-balanced.

A red-black BST uses a specific set of rules to keep the tree balanced, which ensures that the time
complexity of its operations is always O(log n).

Properties of a Red-Black Tree

1. The root is black. All the NULL nodes (NOT leaf-nodes) are black
2. Every node is either red or black
3. Every red node has a black parent
4. Any child of a red parent must be black, that is, a red parent cannot have a red child.
5. Every path from the root to any descendant NULL node (NOT leaf-node) contains the same number of black nodes; this is known
as the black-height of the tree
This black-height is the central idea behind balancing a red-black tree, which is not a strictly balanced scenario
Based on these properties, two consecutive red nodes cannot appear on any path. Also, it can be shown that the longest path from
the root of a red-black tree to a NULL node cannot be greater than twice the shortest such path

Adding Entries to a Red-Black Tree: The Steps

1. Tree is empty - the new node is the root, and it is black


2. Tree is not empty - using BST principle, add the new node as a leaf with color red
3. Parent of the new node is black - exit
4. Parent (P) of the new node is red - check the color of the uncle (U) (parent's sibling)
a. Uncle is black or NULL - do a suitable rotation and change colors. If the rotation is LL or RR, then the grand-parent (GP) and P nodes
change color; if the rotation is RL or LR, then the GP and child nodes change color
b. Uncle is red - change the color of GP, U, and P
i. Now if the GP is a root node, change back GP's color to black and exit
ii. But if the GP is not a root-node, considering this GP as a new node, go back to step 3
c. More entry? Go back to step 2
d. Ensure the following before exit:
i. Root is black
ii. No two adjacent red nodes
iii. Same number of black nodes on each path

Graph Fundamentals
Graph

A non-linear data structure consisting of distinct nodes and edges.

The nodes are sometimes referred to as vertices and the edges are lines or arcs that connect any two nodes in the graph.

All trees are graphs, but not all graphs are trees.

Undirected Graph

A graph in which the edges do not have direction. The edges indicate a two-way relationship, essentially meaning that each edge can
be traversed in both directions.

Directed Graph

A graph where edges have direction. The edges indicate a one-way relationship, in that each edge can only be traversed in a single
direction.

A directed graph is strongly connected if there is a direct or indirect path between any pair of nodes.

Graph Path

A path in a graph is a sequence of edges that connect two vertices. A path in a directed graph is known as a directed path.
Graph cycle

A graph path that begins and ends at the same vertex

Simple cyclic path: does not pass through any vertex more than once
A graph with no cycles is an acyclic graph

Weighted Graph

A graph with values (weights) on its edges. This represents a sort of cost or gain by passing through a path.

Complete Graph

A graph which has an edge between every pair of distinct vertices

Spanning Tree

A tree that contains all vertices of a graph.

Adjacency Matrix

An n × n matrix for a graph with n vertices is known as the adjacency matrix, which has the following properties:

Each row-column pair corresponds to a vertex in the graph


Element a_ij indicates whether an edge exists between vertex i and vertex j

For an unweighted graph, the entries are true/false values; for a weighted graph, the entries are the edge weights.

Adjacency List

A Linked List that represents only edges that originate from the vertex. For example

A -> B -> C
B -> A -> D
C -> D

Means that A is connected to B and C, B to A and D, and C to D, but NOT B to C.
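The same adjacency list can be sketched in Java with a map from each vertex to its neighbor list; the class and method names are illustrative:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AdjacencyList {
    // Each vertex maps to the list of vertices its edges lead to,
    // matching the A -> B -> C example above.
    static Map<String, List<String>> buildExample() {
        Map<String, List<String>> adj = new HashMap<>();
        adj.put("A", List.of("B", "C"));
        adj.put("B", List.of("A", "D"));
        adj.put("C", List.of("D"));
        return adj;
    }

    static boolean hasEdge(Map<String, List<String>> adj, String from, String to) {
        return adj.getOrDefault(from, List.of()).contains(to);
    }
}
```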

Graph Traversal

Graph traversal is the process of visiting each node in a graph for either checking or updating data.

Generally, there are two graph traversal algorithms:

Breadth First Traversal


Depth First Traversal

Both traversals result in a spanning tree

Breadth First Traversal

A graph traversal method in which we:

1. Visit any vertex v of choice


2. Visit each of v's neighbors
3. Then visit the neighbors of the first neighbor of v, of the second neighbor, and so on. Continue this process until all vertices are visited

Here is some pseudo-code to represent the algorithm

Algorithm breadthFirstTraversal(originVertex){
    traversalOrder = new Queue()
    vertexQueue = new Queue()
    Mark originVertex as visited
    traversalOrder.enqueue(originVertex)
    vertexQueue.enqueue(originVertex)
    while(!vertexQueue.isEmpty()){
        frontVertex = vertexQueue.dequeue()
        while(frontVertex has a neighbor){
            nextNeighbor = next neighbor of frontVertex
            if(nextNeighbor is not visited){
                Mark nextNeighbor as visited
                traversalOrder.enqueue(nextNeighbor)
                vertexQueue.enqueue(nextNeighbor)
            }
        }
    }
    return traversalOrder
}

Depth First Traversal

A graph traversal method that follows these steps:

1. Visit any vertex v, then


A neighbor (w) of v
A neighbor of w
And continue until you are as far as possible
2. Back up by one vertex
Consider the next neighbor of that vertex, if any, and then follow step 1 for the unvisited ones.
3. Go back to step 2 till all the vertices are visited

Here is some pseudo code

Algorithm depthFirstTraversal(originVertex){
    traversalOrder = new Queue()
    vertexStack = new Stack()
    Mark originVertex as visited
    traversalOrder.enqueue(originVertex)
    vertexStack.push(originVertex)
    while(!vertexStack.isEmpty()){
        topVertex = vertexStack.peek()
        if(topVertex has an unvisited neighbor){
            nextNeighbor = next unvisited neighbor of topVertex
            Mark nextNeighbor as visited
            traversalOrder.enqueue(nextNeighbor)
            vertexStack.push(nextNeighbor)
        } else {
            vertexStack.pop()
        }
    }
    return traversalOrder
}

Graph Applications
Topological Sort

A linear ordering of a directed acyclic graph's vertices such that for every directed edge from vertex u to v, u comes before v in the
ordering.

Here is the pseudo-code

Algorithm topologicalOrder(){
    vertexStack = new Stack()
    numberOfVertices = number of nodes in graph
    for(counter = 1 to numberOfVertices){
        nextVertex = an unvisited vertex whose neighbors are all visited
        Mark nextVertex as visited
        vertexStack.push(nextVertex)
    }
    return vertexStack
}

Shortest Path in an Unweighted Graph

In an unweighted graph, the shortest path is the one with the fewest edges.

The algorithm is based on a slightly modified Breadth First Search, which records the length of the path the traversal
followed to reach each vertex v, along with the vertex visited immediately before v.

Here is the pseudocode:

Algorithm getShortestPath(originVertex, endVertex, path)

done = false
vertexQueue = a new queue to hold vertices as they are visited
Mark originVertex as visited
vertexQueue.enqueue(originVertex)
while(!done && !vertexQueue.isEmpty()){
    frontVertex = vertexQueue.dequeue()
    while(!done && frontVertex has a neighbor){
        nextNeighbor = next neighbor of frontVertex
        if(nextNeighbor is not visited){
            Mark nextNeighbor as visited
            Set the length of the path to nextNeighbor to 1 + length of path to frontVertex
            Set the predecessor of nextNeighbor to frontVertex
            vertexQueue.enqueue(nextNeighbor)
        }
        if(nextNeighbor equals endVertex)
            done = true
    }
}
pathLength = length of path to endVertex
path.push(endVertex)
vertex = endVertex
while(vertex has a predecessor){
    vertex = predecessor of vertex
    path.push(vertex)
}
return pathLength

Shortest Path in a Weighted Graph

In a weighted graph, the shortest path is not necessarily the one with the fewest edges.

This algorithm is based on a slightly modified Breadth First Search algorithm.

Algorithm getCheapestPath(originVertex, endVertex, path)

done = false
priorityQueue = a new priority queue
priorityQueue.add(new EntryPQ(originVertex, 0, null))
while(!done && !priorityQueue.isEmpty()){
    frontEntry = priorityQueue.remove()
    frontVertex = vertex in frontEntry
    if(frontVertex is not visited){
        Mark frontVertex as visited
        Set the cost of the path to frontVertex to the cost recorded in frontEntry
        Set the predecessor of frontVertex to the predecessor recorded in frontEntry
        if(frontVertex equals endVertex)
            done = true
        else {
            while(frontVertex has a neighbor){
                nextNeighbor = next neighbor of frontVertex
                weightOfEdgeToNeighbor = weight of edge to nextNeighbor
                if(nextNeighbor is not visited){
                    nextCost = weightOfEdgeToNeighbor + cost of path to frontVertex
                    priorityQueue.add(new EntryPQ(nextNeighbor, nextCost, frontVertex))
                }
            }
        }
    }
}
pathCost = cost of path to endVertex
path.push(endVertex)
vertex = endVertex
while(vertex has a predecessor){
    vertex = predecessor of vertex
    path.push(vertex)
}
return pathCost
