Data Structures Study Notes
Data structures provide ways to represent and arrange data in computer memory,
enabling algorithms to perform operations on that data with as little cost in
runtime and memory space as possible. This efficiency is important
for optimizing the performance of applications, especially when dealing with large
datasets or resource-constrained environments.
Trees: Hierarchical structures with a root node and child nodes, used for
representing hierarchical relationships and efficient searching.
Hash Tables: Data structures that use a hash function to map keys to array
indices, allowing for fast insertion, deletion, and lookup operations.
Linked Lists: Sequences of nodes where each node contains data and a
reference to the next node (called a pointer), without the nodes needing to
occupy contiguous memory locations. This allows for efficient insertion and
deletion.
Big O
Big O notation is a mathematical notation used to describe the upper bound or
worst-case scenario of an algorithm's time or space complexity. It provides a
standardized way to express how an algorithm's performance scales with input
size (typically denoted as n).
For example:
O(log n): Runtime grows very slowly as n increases, because each step halves the remaining input.
O(n log n): Faster than O(n²) but slower than O(n) for large inputs.
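To make this concrete, here is a minimal Python sketch (with invented example values) of binary search, a classic O(log n) algorithm: each pass halves the remaining input, so even a million-element list takes only about 20 steps.

```python
# Binary search halves the remaining input on every step, which is why
# its runtime is described as O(log n).
def binary_search(sorted_items, target):
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid, steps
        elif sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, steps

index, steps = binary_search(list(range(1_000_000)), 999_999)
print(index, steps)  # found in about 20 steps, not 1,000,000
```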
Data Structures
Array
The array is one of the most basic data structures.
[1, 2, 3, 4, 5, 6]
["This", "is", "an", "array", "example"]
Arrays have indexes: each element's position within the array, counted starting
from 0. For the five-element string array above, the indexes are:
0, 1, 2, 3, 4
Operations of an Array
Reading
Reading from an array takes just one step, O(1). This is because the computer
has the ability to jump to any particular index in the array and look inside.
We provide it an index (position) in the array and it can jump to that
instantly.
Searching
Searching an array for a particular value takes linear time, O(n), in the
worst case: the computer must inspect the elements one by one until it finds
the target or reaches the end of the array.
Insertion
If you’re adding a new data element to the end of an array, insertion takes
just one step. The computer knows where the end of the array is instantly
because when allocating an array in memory, the computer always keeps
track of the array’s size. In other words, the computer knows where it
ends.
If the array begins at memory address 1010 and is of size 5, that means its
final memory address is 1014. So to insert an item beyond that would mean
adding it to the next memory address, which is 1015.
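A quick sketch of that arithmetic in Python, assuming (as the example above does) that each element occupies exactly one memory address:

```python
# The address arithmetic from the example above, assuming each element
# occupies exactly one memory address.
base_address = 1010
size = 5

final_address = base_address + size - 1   # 1014, the last occupied address
next_address = base_address + size        # 1015, where a new element
                                          # appended to the end would go
print(final_address, next_address)        # 1014 1015
```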
However, inserting new data at the beginning of an array (or the middle) is
a different story.
In these cases, we need to shift pieces of data to make room for what
we’re inserting, and this leads to additional steps.
In other words, it will require a step for each element in the array: that's n
steps, where n represents the number of elements in the array.
Deletion
While the actual deletion of a value takes just one step, we may face a
problem:
An array isn’t effective when there are gaps between its indexes (it
must be contiguous), so to resolve this, the computer shifts all values
to the left until it fills the gap.
For example, if we delete the element at index 0, that index would become
empty, and we'd have to shift all the remaining elements to the left to fill
the gap.
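Here is a minimal sketch of the shifting work described above, using a Python list to stand in for a contiguous array (the element values are invented for illustration):

```python
# Shifting elements to insert into, and delete from, a contiguous array.
def insert_at(arr, index, value):
    arr.append(None)                    # grow by one slot at the end
    for i in range(len(arr) - 1, index, -1):
        arr[i] = arr[i - 1]             # shift each element right: ~n steps
    arr[index] = value

def delete_at(arr, index):
    for i in range(index, len(arr) - 1):
        arr[i] = arr[i + 1]             # shift each element left to fill the gap
    arr.pop()                           # drop the now-duplicated last slot

nums = [1, 2, 3, 4, 5, 6]
insert_at(nums, 0, 0)                   # worst case: every element shifts
delete_at(nums, 0)                      # worst case again
print(nums)                             # [1, 2, 3, 4, 5, 6]
```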
Time Complexity:
Reading: O(1). The computer can jump directly to any index.
Searching: O(n). Each element may need to be inspected.
Insertion: O(1) at the end; O(n) at the beginning or middle, since elements must shift right.
Deletion: O(1) at the end; O(n) elsewhere, since elements must shift left to fill the gap.
Stack
A stack is an abstract data type (ADT) with a linear arrangement resembling a pile.
It is similar to an array in that items can be arranged linearly. However, the
difference is that it is a last-in, first-out (LIFO) data structure. This means
that a stack only allows access to the most recently added data (the top of the
stack), and data can only be removed from the top of the stack as well.
So basically, the most recently added data gets removed first from the pile.
Operations of a Stack
Reading
A computer can read from a stack in just one step, similar to an array.
This is because the computer always has access to the top element of the
stack, which is where reading operations occur.
Searching
Searching a stack takes linear time, O(n), in the worst case. This is
because in a stack, elements are only accessible from the top, and to
search for a specific value that exists in the stack, you might need to pop
(remove) elements one by one until you find the desired value or reach the
bottom of the stack. So for a stack of n elements, it might take n steps.
Insertion
In a stack, insertion (also called "push") always occurs at the top of the
stack. This operation is very efficient because the new element is simply
placed on top of the existing elements.
The computer always knows where the top of the stack is, so it can add a
new element in a single step, regardless of how many elements are
already in the stack.
Deletion
Deletion from a stack (also called "pop") always occurs at the top of the
stack. This operation is very efficient because the computer only needs to
remove the topmost element. Like insertion, deletion in a stack can be
performed in a single step, regardless of the stack's size.
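A minimal stack sketch in Python, using a list whose end serves as the top of the stack (the class and method names are just illustrative):

```python
# A stack built on a Python list; the end of the list is the top of the
# stack, so push and pop are both O(1).
class Stack:
    def __init__(self):
        self._items = []

    def push(self, value):          # insertion: O(1)
        self._items.append(value)

    def pop(self):                  # deletion: O(1)
        return self._items.pop()

    def peek(self):                 # reading the top: O(1)
        return self._items[-1]

s = Stack()
s.push("a")
s.push("b")
s.push("c")
print(s.pop())   # "c": the most recently added item comes off first
```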
Summary: Stack
Description: A last-in, first-out (LIFO) data structure. Operations occur at the
top of the stack.
Time Complexity:
Reading: O(1). The top of the stack is always accessible.
Searching: O(n). Elements may need to be popped one by one.
Insertion (push): O(1). New elements go on top.
Deletion (pop): O(1). Only the topmost element is removed.
Queue
A queue is a linear data structure, similar to a stack, but has a different ordering
rule. Queues follow the First In, First Out (FIFO) principle. Unlike a stack, elements
are added at one end (called the rear or tail) and removed from the other end
(called the front or head). Think of it like people waiting in line - the first person to
join the queue is the first one to be served. The last person in line has to wait.
Queues are typically implemented using a linked list.
Operations of a Queue
Reading
A computer can read from a queue in constant time O(1) when accessing
elements at the front of the queue. This is because the front pointer
always points to the first element, making it immediately accessible.
However, accessing elements at other positions requires traversing
through the queue sequentially.
Searching
Searching a queue takes linear time, O(n), in the worst-case scenario. This
is because you might need to examine each element in the queue
sequentially from front to back to find a specific value. Since elements can
only be accessed in order, there is no way to jump directly to an arbitrary
position.
Insertion
Insertion into a queue (also called "enqueue") always occurs at the rear.
Because the queue keeps track of its rear, a new element can be added in a
single step, O(1), regardless of the queue's size.
Deletion
Deletion from a queue (also called "dequeue") always occurs at the front.
Since the front element is immediately accessible, removal takes a single
step, O(1).
Summary: Queue
Description: A First In, First Out (FIFO) data structure where elements are
added at the rear and removed from the front, similar to a real-world queue or
line. Elements are processed in the order they arrive, making it ideal for
managing sequential operations or waiting lists.
Time Complexity:
Reading: O(1). The front pointer gives immediate access to the first element.
Searching: O(n). Must traverse through all elements to find the target value.
Insertion: O(1). New elements are added at the rear.
Deletion: O(1). Elements are removed from the front.
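A small sketch of these operations using Python's collections.deque, which supports O(1) operations at both ends (the values are invented):

```python
# A queue using collections.deque: enqueue at the rear, dequeue from
# the front, both in O(1).
from collections import deque

line = deque()
line.append("first")     # insertion (enqueue) at the rear: O(1)
line.append("second")
line.append("third")

print(line[0])           # reading the front: O(1) -> "first"
print(line.popleft())    # deletion (dequeue) from the front: O(1) -> "first"
```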
Set
A set is just like an array, except that it does not allow duplicate elements:
no two values in a set can be the same. This property of distinctness makes
sets particularly useful for storing collections of unique items or for quickly
checking whether an item is already present in a collection.
Operations of a Set
Reading
Reading from a set depends on the underlying implementation: an array-based
set can be read by index in one step, O(1), and a hash-table-based set can
look up an element in O(1) on average. However, even in a hash table set, in
the worst-case scenario (due to hash collisions), the time complexity could
degrade to O(n).
Searching
Searching an array-based set works just like searching an array: up to n
elements must be inspected, O(n). A hash-table-based set can check membership
in O(1) on average.
Insertion
Inserting a value into a set means that the computer first needs to
determine that this value doesn’t already exist in the set, because that’s
what sets do: they prevent duplicate data from being inserted into them.
Because of this, the computer will first need to search the set to see
whether the value we want to insert is already in there.
Inserting a value at the end of a set is the best-case scenario, but we still
have to search each of the previous elements to know if there are any
duplicates before performing the final insertion.
Said another way: insertion into the end of a set will take up to N + 1 steps
for N elements. This is because there are N steps to ensure that the value
doesn’t already exist within the set, and then one step for the actual
insertion.
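Here is a sketch of those N + 1 steps for an array-based set (the helper name and values are invented for illustration):

```python
# N + 1 steps to insert into an array-based set: up to N comparison
# steps to rule out a duplicate, then one step for the insertion itself.
def set_insert(items, value):
    for existing in items:       # up to N comparison steps
        if existing == value:
            return False         # duplicate found; sets reject it
    items.append(value)          # the one final insertion step
    return True

colors = ["red", "green", "blue"]
print(set_insert(colors, "red"))     # False: already present
print(set_insert(colors, "yellow"))  # True
print(colors)                        # ['red', 'green', 'blue', 'yellow']
```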
Deletion
In a hash-table-based set, deleting a value takes O(1) on average. However,
like insertion and searching, the worst-case scenario for deletion in a set
is O(n) due to potential hash collisions or if using an array-based
implementation.
In an array-based set, deletion takes O(n) time in the worst case. This is
because after removing an element, we may need to shift all the
subsequent elements to fill the gap left by the deleted item. For example, if
we delete the first element of the set, we'd have to move all the remaining
elements one position to the left to maintain the contiguous nature of the
array.
Summary: Set
Description: A collection of unique elements where each item appears only
once. Sets are useful for storing distinct values.
Time Complexity (array-based set):
Reading: O(1). Direct access by index.
Searching: O(n). Each element may need to be inspected.
Insertion: O(n). Up to N + 1 steps: a full search for duplicates, then the insertion.
Deletion: O(n). Remaining elements may need to shift left to fill the gap.
Linked List
A linked list is a data structure where elements are stored in a sequence of nodes,
with each node containing both data and a reference (or pointer) to the next node.
The last node in the list points to null, indicating the end of the list. This structure
allows for efficient insertion and deletion of elements at any point in the list,
though it requires traversing from the beginning to reach specific positions.
Linked lists come in two varieties: singly linked lists and doubly linked lists. In a
singly linked list, each node only contains a reference to the next node in the
sequence. A doubly linked list, on the other hand, contains references to both the
next and previous nodes, allowing for bidirectional traversal: it can go either
backward or forward.
Operations of a Linked List
Reading
A computer can read from a linked list by traversing through the nodes
sequentially. Unlike arrays where elements are stored in contiguous
memory locations, accessing elements in a linked list requires following
the node references from the beginning until reaching the desired position.
This means that to read the nth element, we need to traverse through n-1
nodes first, so reading a linked list takes linear time: the time increases
proportionally with the size of the list. We cannot just jump to an index like
we can with an array; we must always traverse the nodes sequentially, following
the chain of references.
Searching
Searching for a value in a linked list requires traversing through the nodes
sequentially from the beginning until the target value is found. Since we
cannot jump to an arbitrary position, the worst case requires visiting every
node, making search O(n).
Insertion
Insertion at the beginning of a linked list takes O(1) time: we create a new
node that points to the current head and make it the new head. Inserting
anywhere else requires traversing to that position first, making it O(n) in
the worst case.
Deletion
Deletion from a linked list involves removing a node from the list by
adjusting the pointers of surrounding nodes. Like insertion, deletion from
the beginning of the list takes O(1) time, but deleting from any other
position requires traversing to that position first, making it O(n) in the worst
case. The process involves updating the pointer of the previous node to
skip over the deleted node and point to the next node in the sequence
instead.
Since we’re not technically deleting the target node and just
“unreferencing” it by choosing not to point to it anymore, that means that it
still exists in memory. It will be garbage collected by the program later
when the computer realizes there are no more references pointing to it in
memory. This automatic memory management helps prevent memory
leaks and ensures efficient use of system resources.
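A minimal singly linked list sketch in Python illustrating the operations above (class names invented for illustration):

```python
# A singly linked list: O(1) insertion and deletion at the head,
# O(n) traversal to reach anything else.
class Node:
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node    # reference (pointer) to the next node

class LinkedList:
    def __init__(self):
        self.head = None

    def insert_at_head(self, data):     # O(1): just repoint the head
        self.head = Node(data, self.head)

    def delete_head(self):              # O(1): the old head is simply
        if self.head:                   # "unreferenced" and later
            self.head = self.head.next  # garbage collected

    def search(self, target):           # O(n): follow the chain of references
        current = self.head
        while current:
            if current.data == target:
                return True
            current = current.next
        return False

lst = LinkedList()
lst.insert_at_head(3)
lst.insert_at_head(2)
lst.insert_at_head(1)
print(lst.search(3))    # True
lst.delete_head()       # removes 1
```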
Time Complexity:
Reading: O(n). Must traverse from the head to reach a given position.
Searching: O(n). Must traverse the list to find the target element.
Insertion: O(1) best case, O(n) worst case. Adding to the head or tail (if the tail is tracked) is instantaneous; anywhere else requires traversing up to n nodes.
Deletion: O(1) best case, O(n) worst case. Removing the head is instantaneous; elsewhere requires traversal.
Hash Table
A hash table is a list of paired values. The first item in each pair is called the key,
and the second item is called the value. In a hash table, the key and value have
some significant association with one another. For example: menu = { “french
fries”: 0.75, “hamburger”: 2.5 }
In the example provided, the string “french fries” is the key, and the 0.75 is the
value.
A hash table is an efficient data structure that uses a hash function to map
keys to locations in memory. The hash function will use some sort of
mathematical formula to convert a key to a particular location in memory.
For example, if we have a key "apple", the hash function might perform
mathematical operations on the characters in "apple" to generate a specific
memory address like 1052. This hash value then determines where in the hash
table the key-value pair will be stored. This process of taking characters and
converting them to numbers is known as hashing.
This mapping allows for extremely fast lookups and insertions, making hash tables
ideal for scenarios where quick access to data is crucial. The hash function
converts each key into a memory address, though collisions can occur when
different keys hash to the same location.
Operations of a Hash Table
Reading
Reading from a hash table involves using the hash function to convert the
key into a memory address and then accessing that location directly. This
makes reading operations extremely efficient in the average case, taking
constant time O(1). However, in cases where there are hash collisions
(multiple keys mapping to the same location), the operation might require
additional steps to resolve the collision, leading to a worst-case time
complexity of O(n).
Searching
Searching a hash table for a key works the same way as reading: hash the key,
then go directly to the resulting location. This takes O(1) on average and
O(n) in the worst case when many keys collide.
Insertion
Inserting a value into a hash table involves first using the hash function to
compute the memory location for the new key-value pair. In the average
case, this is a constant-time operation, O(1), as the hash function directly
maps to an available slot. However, when hash collisions occur (multiple
keys mapping to the same location), resolving the collision can require extra
steps, degrading insertion to O(n) in the worst case.
Deletion
Deletion from a hash table works similarly: hash the key to locate the entry,
then remove the key-value pair. This takes O(1) on average and O(n) in the
worst case when collisions must be resolved.
Time Complexity:
Reading: O(1) average, O(n) worst case. Hash the key and jump to its location; collisions add steps.
Searching: O(1) average, O(n) worst case. Same process as reading.
Insertion: O(1) average, O(n) worst case. Collisions may require resolution.
Deletion: O(1) average, O(n) worst case. Hash, locate, and remove the pair.
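A toy sketch of the hashing idea, using a simple character-sum hash function (invented for illustration; real hash functions are more sophisticated) and chaining to handle collisions:

```python
# A toy hash table: a hash function maps a key to a bucket index, and
# collisions are handled by chaining (key, value) pairs in each bucket.
def simple_hash(key, num_buckets):
    # Sum the character codes, then wrap into the table's range.
    return sum(ord(ch) for ch in key) % num_buckets

num_buckets = 16
buckets = [[] for _ in range(num_buckets)]

def put(key, value):                          # insertion: O(1) on average
    bucket = buckets[simple_hash(key, num_buckets)]
    for i, (k, _) in enumerate(bucket):
        if k == key:
            bucket[i] = (key, value)          # overwrite an existing key
            return
    bucket.append((key, value))

def get(key):                                 # reading: O(1) on average,
    bucket = buckets[simple_hash(key, num_buckets)]
    for k, v in bucket:                       # O(n) if everything collides
        if k == key:
            return v
    return None

put("french fries", 0.75)
put("hamburger", 2.5)
print(get("french fries"))   # 0.75
```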
Tree
A tree is a hierarchical data structure with a root node and child nodes. For
example, if a node “j” has two children “m” and “b”, we’d say that “j” is a
parent to “m” and “b” (which are also nodes).
As in a family tree, a node can have descendants and ancestors. A node’s
descendants are all the nodes that stem from a node, while a node’s ancestors are
all the nodes that it stems from.
Knowing how to find the height of a tree (the length of the longest path from
the root down to a leaf) is important, as this will be on the assessment.
Binary Tree
A binary tree is a type of tree where each node can have at most two children,
typically referred to as the left child and right child. This structure allows for
efficient organization of data in a tree-like format, with nodes branching off from a
single root node at the top. The hierarchical nature of binary trees makes them
particularly useful for representing relationships between data elements in a way
that enables efficient searching and traversal operations.
Reading
Reading a particular value from a standard binary tree requires traversing the
tree, since there is no index to jump to; in the worst case every node must be
visited, O(n).
Searching
Searching for a value in a binary tree involves traversing through the tree
structure to find a specific value. In a standard binary tree, we may need to
explore all nodes in the worst case since there's no guaranteed ordering of
elements. This process typically involves checking the root node, then
recursively searching both left and right subtrees until the value is found or
all nodes have been checked.
Insertion
Inserting into a standard (unordered) binary tree simply requires finding an
open child position, for example via a level-order traversal, which can take
O(n) in the worst case.
Deletion
Deletion from a standard binary tree first requires finding the target node,
which can take O(n) since the tree is unordered, and then restructuring the
tree so that it remains connected.
Time Complexity:
Reading: O(n). No ordering; traversal may visit every node.
Searching: O(n). Both subtrees may need to be explored.
Insertion: O(n). An open position must first be found.
Deletion: O(n). The target node must first be found.
Binary Search Tree
A binary search tree is a binary tree that also obeys the following ordering rules:
Each node can have at most one left child and one right child.
A node’s left descendants can only contain values that are less than the node
itself.
A node’s right descendants can only contain values that are greater than the
node itself.
In diagrams of binary search trees, each node’s lesser child is typically drawn
with a left arrow, and its greater child with a right arrow.
For example, in a tree rooted at 50, all of the 50’s left descendants are less
than it, while all of the 50’s right descendants are greater than it. The same
pattern goes for each and every node.
Note that a binary tree that violates these ordering rules is still a binary
tree, but it is not a binary search tree.
Reading
Reading from a binary search tree involves traversing the tree structure in
a systematic way. The most common traversal methods are in-order (left
subtree, root, right subtree), pre-order (root, left subtree, right subtree),
and post-order (left subtree, right subtree, root). In a binary search tree, an
in-order traversal will visit nodes in ascending order of their values.
Searching
Searching for a value in a binary search tree is efficient due to its ordered
structure. At each node, we can determine whether to search the left or
right subtree based on comparing the target value with the current node's
value. If the target value is higher than the current node, then we continue
searching on the right. If the target value is lower than the current node,
then we continue searching on the left. This binary decision process
allows us to eliminate half of the remaining nodes at each step, making the
search operation roughly O(log n) on a balanced tree, much faster than in a
regular binary tree.
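A sketch of that search process in Python; the tree values below are invented for illustration, with 50 at the root as in the discussion above:

```python
# Binary search tree search: each comparison discards one whole subtree.
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def bst_search(node, target):
    while node is not None:
        if target == node.value:
            return True
        elif target > node.value:   # greater: continue searching on the right
            node = node.right
        else:                       # lesser: continue searching on the left
            node = node.left
    return False

# A small tree rooted at 50 (child values invented for illustration):
root = TreeNode(50,
                TreeNode(25, TreeNode(10), TreeNode(33)),
                TreeNode(75, TreeNode(56), TreeNode(89)))
print(bst_search(root, 56))   # True: 50 -> right to 75 -> left to 56
print(bst_search(root, 40))   # False, after visiting only 3 nodes
```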
Insertion
Insertion into a binary search tree follows the same comparison-driven descent
as searching: move left or right at each node until reaching an empty child
position, then attach the new node there. This takes O(log n) on a balanced
tree and O(n) in the worst case.
Deletion
Deleting an item from a binary search tree involves first locating the node
to be deleted, then handling the restructuring based on the node's position
and number of children. When deleting a node with no children, we simply
remove it. For a node with one child, we connect its parent directly to its
child. The most complex case is deleting a node with two children, where
we typically replace it with its in-order successor (the smallest value in its
right subtree) to maintain the BST property.
Time Complexity:
Searching: O(log n) average, O(n) worst case (a badly unbalanced tree degrades into a list).
Insertion: O(log n) average, O(n) worst case.
Deletion: O(log n) average, O(n) worst case.
Priority Queue
A priority queue is a specialized data structure that operates like a regular queue
(FIFO) but with an important distinction: when inserting an element, we always
make sure the data remains sorted based on their priority level rather than their
order of entry.
Expressed another way:
A priority queue is a list whose deletions and access are just like a classic queue
but whose insertions are like an ordered array. So we only delete and access data
from the front of the priority queue, but when we insert data, we always make
sure the data remains sorted in a specific order.
One classic example of where a priority queue is helpful is an application that
manages the triage system for a hospital emergency room. In the ER, we don’t
treat people strictly in the order in which they arrived. Instead, we treat people in
the order of the severity of their symptoms. If someone suddenly arrives with a
life-threatening injury, that patient will be placed at the front of the queue, even if
the person with the flu had arrived hours earlier.
In determining the next patient to treat, we’ll always select the patient at the front
of the priority queue, since that person’s need is the most urgent. If Patient C has
the most severe condition, Patient C sits at the front of the priority queue and is
the next patient we’d treat.
The priority queue is an example of an abstract data type— one that can be
implemented using other, more fundamental, data structures. One straightforward
way we can implement a priority queue is by using an ordered array; that is, we
use an array and apply the following constraints:
When we insert data, we ensure we always maintain proper order.
Data can only be removed from the end of the array. (This will represent the front
of the priority queue.)
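Here is a minimal sketch of the ordered-array implementation just described; the class, patients, and priority numbers are invented for illustration:

```python
# An ordered-array priority queue: the array is kept sorted so the
# highest-priority item sits at the end, which represents the front
# of the priority queue.
class PriorityQueue:
    def __init__(self):
        self._items = []   # kept sorted, lowest priority first

    def insert(self, item, priority):            # like an ordered array: O(n)
        i = 0
        while i < len(self._items) and self._items[i][0] < priority:
            i += 1
        self._items.insert(i, (priority, item))

    def remove_front(self):                      # like a queue: O(1),
        return self._items.pop()[1]              # taken from the array's end

er = PriorityQueue()
er.insert("Patient A", priority=2)
er.insert("Patient B", priority=1)
er.insert("Patient C", priority=10)   # most urgent
print(er.remove_front())              # "Patient C"
```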
Deque
A deque (double-ended queue) is a linear data structure that allows elements to
be inserted and removed at both its front and its rear. Deques are usually
implemented using doubly linked lists, as they provide efficient
operations at both ends and maintain links between adjacent elements. This
structure allows for O(1) time complexity for both insertion and deletion operations
at either end. The double linking ensures that traversal can happen in both
directions, making it ideal for implementing deque operations.
Heaps
Heaps are of several different types, but we’re going to focus on the binary heap.
The binary heap is a specific kind of binary tree.
As a reminder, a binary tree is a tree where each node has a maximum of two child
nodes.
This is the same for heaps.
Now, even binary heaps come in two flavors: the max-heap and the min-heap.
We’re going to work with the max-heap for now, but as you’ll see later, the
difference between the two is trivial.
Going forward, I’m going to refer to this data structure simply as a heap, even
though we’re specifically working with a binary max-heap.
A heap must satisfy two conditions: the value of each node must be greater than
each of its descendant nodes, and the tree must be complete.
The first rule is known as the heap condition.
Let’s break down both of these conditions, starting with the heap condition.
For example, in a heap whose root node is 100, the root has no descendant
greater than it; similarly, a node of 88 is greater than both of its children,
and so is a node of 25. A tree in which some node is smaller than one of its
descendants isn’t a valid heap, because it doesn’t meet the heap condition.
Note how the heap is structured very differently than the binary search tree. In a
binary search tree, each node’s right child is greater than it. In a heap, however, a
node never has any descendant that is greater than it. As they say, “A binary
search tree doth not a heap make.” (Or something
like that.)
Complete Trees
Now, let’s get to the second rule of heaps—that the tree needs to be complete.
A complete tree is a tree that is completely filled with nodes; no nodes are
missing. So if you read each level of the tree from left to right, all of the nodes are
there.
However, the bottom row can have empty positions, as long as there aren’t any
nodes to the right of these empty positions. For example, a tree is complete
when each level (meaning, each row) of the tree is completely filled in with
nodes. A tree is not complete if there’s an empty position on, say, the third
level while other nodes exist on that level to the right of the empty position.
A heap, then, is a tree that meets the heap condition and is also complete. A
valid heap’s bottom row may have some gaps, as long as these gaps are limited
to the very right of the tree.
While heaps have some order, as descendants cannot be greater than their
ancestors, this isn’t enough order to make heaps worth searching through.
Another property of heaps, one that may be obvious by now but is worth calling
attention to: in a heap, the root node will always have the greatest value.
This will be the key as to why the heap is a great tool for implementing priority
queues.
In the priority queue, we always want to access the value with the greatest
priority.
With a heap, we always know that we can find this in the root node. Thus, the root
node will represent the item with the highest priority.
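A small sketch using Python's heapq module, which provides a binary min-heap; negating values is a common trick to simulate the max-heap described here (the values are invented):

```python
# Simulating a binary max-heap with heapq (a min-heap) by negating
# values: the root, the greatest value, is always immediately available.
import heapq

heap = []
for value in [25, 100, 88, 7]:
    heapq.heappush(heap, -value)   # negate on the way in

print(-heap[0])                # peek at the root: 100, the greatest value
print(-heapq.heappop(heap))    # remove the root: 100
print(-heap[0])                # the new root: 88
```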
Graphs
A graph is a data structure that consists of nodes (which are called vertices)
connected by lines or links (which are known as edges). Unlike trees, which have a
hierarchical structure, graphs can represent more complex relationships in which
any vertex may connect to any other vertex, with no required parent-child
hierarchy.
Graph terminology:
Vertex: A node in the graph; it holds the data.
Edge: A connection or link between two vertices that shows their relationship.
A cyclic graph is one that contains at least one cycle: a path of edges that
starts and ends at the same vertex. The simplest cycle is a self-loop, in which
a node has an edge pointing back to itself. Cycles and self-loops are important
in representing certain types of relationships in graph structures, such as
recursive processes or self-referential data.
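A minimal sketch of a graph stored as an adjacency list, a common representation; the vertex names are invented for illustration:

```python
# A graph as an adjacency list: each vertex maps to the list of
# vertices it shares an edge with.
graph = {
    "alice": ["bob", "carol"],
    "bob":   ["alice"],
    "carol": ["alice", "carol"],   # "carol" -> "carol" is a self-loop
}

def neighbors(vertex):
    return graph.get(vertex, [])

print(neighbors("alice"))              # ['bob', 'carol']
print("carol" in neighbors("carol"))   # True: the self-loop forms a cycle
```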