Com124 Data Structure Note-1-1
Data structures are the programmatic way of storing data so that data can be used efficiently.
A data item is a single unit of values; it is a raw fact which becomes information after
processing. Data items are called group items if they can be divided into sub-items. A date,
for instance, is a group item because it is represented by the day, the month and the year.
A data item such as a number is called an elementary item, because it cannot be sub-divided
into sub-items; it is treated as a single item. An entity is used to describe anything
that has certain attributes or properties, which may be assigned values. For example, the
following are possible attributes and their corresponding values for an entity known as
STUDENT.
Entities with similar attributes, for example all the 200 level Computer Science &
Statistics students, form an entity set.
Basic Terminology
Data − Data are values or sets of values. Data is a raw fact which becomes information
after processing.
Data Item − A data item refers to a single unit of values.
Group Items − Data items that can be divided into sub-items are called group items.
Elementary Items − Data items that cannot be divided are called elementary items.
Entity − An entity is something that has certain attributes or properties, which may be
assigned values.
Entity Set − Entities with similar attributes form an entity set.
Field − A field is a single elementary unit of information representing an attribute of an
entity.
Record − A record is a collection of field values of a given entity.
File − A file is a collection of records of the entities in a given entity set.
• Seek to identify and develop entities, operations and appropriate classes of problems to use
them.
Algorithm. A finite sequence of instructions, each of which has a clear meaning and can be
executed with a finite amount of effort in finite time. Whatever the input values, an algorithm
terminates after executing a finite number of instructions.
As applications are getting complex and data rich, there are three common problems that
applications face nowadays.
To solve the above-mentioned problems, data structures come to the rescue. Data can be organized
in a data structure in such a way that not every item has to be examined, and the
required data can be found almost instantly.
From the data structure point of view, following are some important categories of algorithms −
Shortest path by Dijkstra
Project scheduling
Types of Algorithms:
The simplest possible algorithm that can be devised to solve a problem is called the brute force
algorithm. To devise an optimal solution, we first need to get at least one solution and then try
to optimise it. Most problems can be solved by a brute force approach, although generally not
with acceptable space and time complexity.
For example:
Greedy Algorithm
In this algorithm, a decision is made that is good at that point without considering the future.
This means that some local best is chosen and treated as the global optimum. There are two
properties in this algorithm.
Greedily choosing the best option
Optimal substructure property: An optimal solution to the problem can be built from the
optimal solutions to its subproblems.
A greedy algorithm does not always work, but when it does, it works like a charm! This algorithm
is easy to devise and most of the time the simplest one. However, making locally best decisions
does not always lead to a globally best result. In such cases it is replaced by a more reliable
approach called Dynamic Programming.
Applications
For better understanding, let's go through one of the most common problems, the job scheduling
problem. Consider a situation where we are given the starting and ending times of various
events in an auditorium. The job is to maximise the number of events that can be organised
in the auditorium such that no two events overlap (the starting time or ending time of one event
does not fall between the starting and ending times of another event).
A brute force solution would suggest sorting the events by their starting times and, starting
with the first event, excluding every event that overlaps the previously chosen one. This
certainly gives a solution, but it won't necessarily maximise the number of events. Let us see,
after sorting by starting time –
So, the events that can be organised are – A, F. Our brute force approach will have multiple
such cases and fail if we don’t select the optimal initial event. Now let’s see what our greedy
algorithm suggests. According to the greedy algorithm, we sort the events by their ending times,
i.e. we select the events which end first. Our new event table will become:
So, we choose – B, E, C, which is certainly a larger number of events than before. Hence, in
such cases, the greedy algorithm gives the best solution to this type of problem.
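The two strategies can be compared with a small Python sketch. The event names follow the notes, but the start and end times are hypothetical values chosen for illustration (the original event tables are not reproduced here), and the function names are assumptions. Events that merely touch at an endpoint are treated as non-overlapping.

```python
def pick_events(events, sort_key):
    """Scan events in the given order, keeping each one that does not
    overlap the previously kept event. Events are (name, start, end)."""
    chosen = []
    last_end = float("-inf")
    for name, start, end in sorted(events, key=sort_key):
        if start >= last_end:      # assumption: touching endpoints do not overlap
            chosen.append(name)
            last_end = end
    return chosen


# Hypothetical timetable (not the table from the notes).
EVENTS = [
    ("A", 2, 10), ("B", 4, 6), ("C", 9, 14),
    ("D", 3, 12), ("E", 6, 8), ("F", 10, 16),
]

by_start = pick_events(EVENTS, sort_key=lambda e: e[1])  # naive: earliest start first
by_end = pick_events(EVENTS, sort_key=lambda e: e[2])    # greedy: earliest end first

print(by_start)  # ['A', 'F']
print(by_end)    # ['B', 'E', 'C']
```

Sorting by ending time leaves as much room as possible for the remaining events, which is why the greedy choice fits more events into the same timetable.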
Recursive Algorithm
This is one of the simplest algorithms to devise, as it does not require us to think specifically
about every subproblem. We only need to think about the exit (base) cases and the solution of
the simplest subproblem; all other complexity is handled automatically. Recursion is a
very powerful tool, although we should always take care of memory management, because
recursion uses a call stack that grows every time the recursive function is invoked.
Recursion simply means a function calling itself to solve its subproblems.
For Example:
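A minimal sketch of recursion in Python, using the factorial function as an illustration (this problem is an assumption chosen here, not one taken from the notes). The base case stops the recursion, and every other call reduces the problem to a smaller one.

```python
def factorial(n):
    """Return n! for a non-negative integer n, computed recursively."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:                       # base case: 0! = 1! = 1
        return 1
    return n * factorial(n - 1)      # recursive case: n! = n * (n - 1)!


print(factorial(5))  # 120
```

Each pending call occupies a frame on the call stack until the base case is reached, which is the memory cost mentioned above.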
Backtracking Algorithm
It is an improvement on the brute force approach. Here we start with one possible option out of
the many available and try to solve the problem. If we are able to solve the problem with the
selected move, we print the solution; otherwise we backtrack, select some other option and try
again. It is a form of recursion: when a given option cannot lead to a solution, we backtrack
to the previous decision point and proceed with the other options.
Applications
Graph colouring Problem
Let us see the application of this algorithm in generating all strings with n bits.
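One way to sketch this in Python is shown below. The function name `bit_strings` is an assumption for illustration. At each position the code chooses a bit, explores the rest of the string recursively, and then undoes the choice (backtracks) before trying the other bit.

```python
def bit_strings(n):
    """Return all strings of n bits, in lexicographic order."""
    result = []
    bits = []                      # the partial string built so far

    def extend():
        if len(bits) == n:         # a complete string has been built
            result.append("".join(bits))
            return
        for b in "01":
            bits.append(b)         # choose
            extend()               # explore
            bits.pop()             # un-choose (backtrack)

    extend()
    return result


print(bit_strings(2))  # ['00', '01', '10', '11']
```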
Divide and Conquer Algorithm
This is one of the most used algorithms in programming. This algorithm divides the problem
into subproblems, solves each of them, and then combines the results to form the solution of the
given problem.
Again, it is not possible to solve all problems using it. As the name suggests it has two parts:
Divide the problem into subproblems and solve them.
The given problem is divided into two parts of size n/a and n/b, which are then computed
separately and recursively, and the results are combined to form the solution.
Applications:
Binary Search
Merge Sort & Quick Sort
Median Finding
Matrix Multiplication
Let us discuss the simplest application, Binary Search. Previously we described how
searching for an element in a sorted array by scanning it takes O(n) time; this time we apply
the divide and conquer algorithm to reduce the complexity to O(log n).
Output 1:
The flow of the program moves to the right subarray because five is greater than the current
mid (3); it therefore does not iterate over half of the elements, which reduces the time
complexity. However, this is not applicable if the array is not sorted, because the algorithm
discards one part of the array at each step and would then fail.
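A minimal iterative sketch of binary search in Python, assuming a list sorted in ascending order. The function name and the sample array are illustrative, not taken from the original program.

```python
def binary_search(arr, target):
    """Return the index of target in the sorted list arr, or -1 if absent."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1           # target can only be in the right half
        else:
            hi = mid - 1           # target can only be in the left half
    return -1


print(binary_search([1, 2, 3, 4, 5], 5))  # 4
```

Searching for 5 first compares against the middle element 3 and then continues only in the right half, so each comparison halves the remaining search range.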
Dynamic Algorithm
This is one of the most sought-after algorithms, as it often provides an efficient way of solving
a problem whose subproblems overlap. It simply means remembering past results and applying them
to the corresponding future subproblems; hence this algorithm is quite efficient in terms of
time complexity.
Bottom-Up Approach: Starts solving from the bottom of the problem, i.e. solving the
smallest possible subproblems first and using their results to solve the subproblems
above them.
Top-Down Approach: Starts from the original problem and works down to the required
subproblems, solving each one using previously solved subproblems.
Applications
Let us take a simple example of such an algorithm: finding the Fibonacci sequence.
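Both approaches can be sketched in Python as follows. The function names are assumptions for illustration, and the sequence is taken to start with F(0) = 0 and F(1) = 1.

```python
def fib_top_down(n, memo=None):
    """Top-down: recurse from n and remember each result already computed."""
    if memo is None:
        memo = {}
    if n < 2:
        return n
    if n not in memo:
        memo[n] = fib_top_down(n - 1, memo) + fib_top_down(n - 2, memo)
    return memo[n]


def fib_bottom_up(n):
    """Bottom-up: build up from F(0) and F(1) to F(n)."""
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev


print(fib_top_down(4), fib_bottom_up(4))  # 3 3
```

Without the memo, the plain recursive version recomputes the same subproblems many times; storing them brings the number of additions down to roughly n.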
Recursive stack of the function for n = 4
Randomised Algorithm
This is a type of algorithm which makes its decisions on the basis of random numbers, i.e. it
uses random numbers in its logic. The best example of this is choosing the pivot element in
quicksort. The randomness is used to reduce the expected time or space complexity, although such
algorithms are not used regularly. Probability plays the most significant role in this algorithm.
In terms of quicksort, if we fail to choose a good pivot element we might end up with a running
time of O(n^2) in the worst case. If the pivot is chosen at random, however, the expected
running time is O(n log n).
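A minimal sketch of quicksort with a randomly chosen pivot, written in Python. This version returns a new list rather than sorting in place, which is a simplification assumed here for readability; the function name is illustrative.

```python
import random


def randomized_quicksort(items):
    """Return a sorted copy of items, using a random pivot at each step."""
    if len(items) <= 1:
        return list(items)
    pivot = random.choice(items)               # random pivot selection
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return randomized_quicksort(smaller) + equal + randomized_quicksort(larger)


print(randomized_quicksort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

Because the pivot is random, no fixed input (such as an already sorted list) reliably triggers the O(n^2) behaviour.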
Applications
There are three basic ways of writing algorithms in programming. They include:
English-Like Algorithm.
Problem − Design an algorithm to add two numbers and display the result.
Step 1 − START
Step 2 − declare three integers a, b & c
Step 3 − define values of a & b
Step 4 − add values of a & b
Step 5 − store output of step 4 to c
Step 6 − print c
Step 7 − STOP
Alternatively:
In design and analysis of algorithms, usually the second method is used to describe an algorithm.
It makes it easy for the analyst to analyze the algorithm, ignoring all unwanted definitions. The
analyst can observe what operations are being used and how the process is flowing.
We design an algorithm to get a solution to a given problem. A problem can be solved in more
than one way. Hence, many solution algorithms can be derived for a given problem.
Flowchart.
Problem − Design an algorithm to add two numbers and display the result.
Pseudocode.
This is a notation resembling a simplified programming language, used in program design.
Pseudocode has the advantage of being easily converted into any programming language. This
way of writing algorithms is the most acceptable and most widely used. In order to write
pseudocode, one must be familiar with the conventions of writing it.
THEY INCLUDE:
Sorting is arranging data in ascending or descending order. Sorting became important as
people realised the value of being able to search quickly.
There are many things in real life that we need to search for, like a particular record in a
database, roll numbers in a merit list, a particular telephone number in a telephone directory, a
particular page in a book, etc. All this would be a mess if the data were kept unordered and
unsorted; sorting makes it easier to arrange data in an order, and hence easier to search.
Sorting Efficiency
If you ask me how I would arrange a deck of shuffled cards in order, I would say that I would
start by checking every card and build the ordered deck as I move on.
It could take me hours to arrange the deck in order, but that's how I would do it.
Since the beginning of the programming age, computer scientists have been working on
the problem of sorting by coming up with various algorithms to sort data.
The two main criteria for judging which algorithm is better than another have been:
1. Bubble Sort
2. Insertion Sort
3. Selection Sort
4. Quick Sort
5. Merge Sort
6. Heap Sort
Although these sorting techniques are easy to understand, we suggest that you first learn
about space complexity, time complexity and the searching algorithms, to warm up
for sorting algorithms.
A data structure is a systematic way to organize data in order to use it efficiently. The
following terms are the foundation terms of a data structure.
Interface − Each data structure has an interface. The interface represents the set of operations
that a data structure supports. An interface only provides the list of supported operations,
the types of parameters they can accept and the return types of these operations.
Implementation − Implementation provides the internal representation of a data
structure. Implementation also provides the definition of the algorithms used in the
operations of the data structure.
Data Definition
Data Definition defines a particular data with the following characteristics.
Atomic − Definition should define a single concept.
Traceable − Definition should be able to be mapped to some data element.
Accurate − Definition should be unambiguous.
Clear and Concise − Definition should be understandable.
Data Object
A data object represents an object that holds data.
Data Type
Data type is a way to classify various types of data such as integer, string, etc. It determines
the values that can be used with the corresponding type of data and the type of operations that
can be performed on it. There are two data types −
Primitive Data Type (Built-in Data Type)
Abstract Data Type (Derived Data Type/Non-primitive)
Stack: A stack is a linear data structure in which elements can be inserted and deleted only from
one side of the list, called the top. A stack follows the LIFO (Last In First Out) principle, i.e.,
the element inserted last is the first element to come out. The insertion of an element into a
stack is called the push operation, and the deletion of an element from the stack is called the
pop operation. In a stack we always keep track of the last element present in the list with a
pointer called top.
A stack is an object (an abstract data type - ADT) that allows the following operations:
1. A pointer called TOP is used to keep track of the top element in the stack.
2. When initializing the stack, we set its value to -1 so that we can check if the stack is
empty by comparing TOP == -1.
3. On pushing an element, we increase the value of TOP and place the new element in the
position pointed to by TOP.
4. On popping an element, we return the element pointed to by TOP and then reduce the value of TOP.
5. Before pushing, we check if the stack is already full.
6. Before popping, we check if the stack is already empty.
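The six steps above can be sketched as a fixed-capacity stack in Python. The class name `ArrayStack`, its method names and the `reverse_word` helper are assumptions made for illustration, not an implementation given in the notes.

```python
class ArrayStack:
    """Fixed-capacity stack that tracks the top element with an index."""

    def __init__(self, capacity):
        self.items = [None] * capacity
        self.top = -1                          # -1 means the stack is empty

    def is_empty(self):
        return self.top == -1

    def is_full(self):
        return self.top == len(self.items) - 1

    def push(self, value):
        if self.is_full():                     # check before pushing
            raise OverflowError("stack is full")
        self.top += 1
        self.items[self.top] = value

    def pop(self):
        if self.is_empty():                    # check before popping
            raise IndexError("stack is empty")
        value = self.items[self.top]
        self.items[self.top] = None
        self.top -= 1
        return value


def reverse_word(word):
    """Reverse a word by pushing every letter and popping them all back."""
    stack = ArrayStack(len(word))
    for ch in word:
        stack.push(ch)
    return "".join(stack.pop() for _ in range(len(word)))


print(reverse_word("stack"))  # kcats
```

Reversing a word works because the last letter pushed is the first one popped, which is exactly the LIFO rule.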
Algorithm for PUSH operation
• To reverse a word: you push a given word onto a stack, letter by letter, and then pop the
letters from the stack.
• In browsers: the back button in a browser saves all the URLs you have visited
previously in a stack. Each time you visit a new page, it is added on top of the stack.
When you press the back button, the current URL is removed from the stack, and the
previous URL is accessed.
• An "undo" mechanism in text editors: this operation is accomplished by keeping all text
changes in a stack.
o Undo/Redo stacks in Excel or Word.
• Language processing:
o Space for parameters and local variables is created internally using a stack.
o A compiler's syntax check for matching braces is implemented using a stack.
• A stack of plates/books in a cupboard.
• Wearing/removing bangles.
• Support for recursion:
o Activation records of method calls.
Queue: A queue is a linear data structure in which elements can be inserted only from one side
of the list, called the rear, and deleted only from the other side, called the front.
The queue data structure follows the FIFO (First In First Out) principle, i.e. the element
inserted first in the list is the first element to be removed from the list. The insertion of an
element in a queue is called an enqueue operation and the deletion of an element is called a
dequeue operation. In a queue we always maintain two pointers: the front pointer, which points to
the element that was inserted first and is still present in the list, and the rear pointer, which
points to the element inserted last.
A queue is an object (an abstract data type - ADT) that allows the following operations:
Working of Queue
Enqueue and Dequeue Operations
Queue follows the First In First Out (FIFO) rule - the item that goes in first is the item that
comes out first.
In the above image, since 1 was kept in the queue before 2, it is the first to be removed from the
queue as well. It follows the FIFO rule.
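A minimal sketch of a fixed-capacity queue in Python with front and rear pointers. The class and method names are assumptions for illustration. This simple linear version does not reuse the slots freed by dequeue until the queue becomes empty; a circular queue would address that.

```python
class ArrayQueue:
    """Fixed-capacity queue using front and rear indices."""

    def __init__(self, capacity):
        self.items = [None] * capacity
        self.front = -1                        # -1 means the queue is empty
        self.rear = -1

    def is_empty(self):
        return self.front == -1

    def is_full(self):
        return self.rear == len(self.items) - 1

    def enqueue(self, value):
        if self.is_full():
            raise OverflowError("queue is full")
        if self.is_empty():
            self.front = 0                     # first element: front starts here
        self.rear += 1
        self.items[self.rear] = value

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        value = self.items[self.front]
        if self.front == self.rear:            # last element removed: reset
            self.front = self.rear = -1
        else:
            self.front += 1
        return value


q = ArrayQueue(5)
q.enqueue(1)
q.enqueue(2)
first = q.dequeue()
print(first)  # 1
```

Since 1 was enqueued before 2, it is also the first value dequeued.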
• A queue of people at a ticket window: the person who comes first gets the ticket first, and
the person who comes last gets the ticket last. Therefore, it follows the first-in-first-out
(FIFO) strategy of a queue.
• Vehicles on a toll-tax bridge: the vehicle that comes first to the toll tax booth leaves the
booth first, and the vehicle that comes last leaves last. Therefore, it follows the
first-in-first-out (FIFO) strategy of a queue.
• Phone answering system: the person who calls first gets a response first from the phone
answering system, and the person who calls last gets the response last. Therefore, it follows
the first-in-first-out (FIFO) strategy of a queue.
• Luggage checking machine: the machine checks first the luggage that comes first.
Therefore, it follows the FIFO principle of a queue.
• Patients waiting outside the doctor's clinic: the patient who comes first visits the doctor
first, and the patient who comes last visits the doctor last. Therefore, it follows the
first-in-first-out (FIFO) strategy of a queue.
Stacks | Queues
Stacks are based on the LIFO principle, i.e., the element inserted at the last, is the first element to come out of the list. | Queues are based on the FIFO principle, i.e., the element inserted at the first, is the first element to come out of the list.
Insertion and deletion in stacks takes place only from one end of the list called the top. | Insertion and deletion in queues takes place from the opposite ends of the list. The insertion takes place at the rear of the list and the deletion takes place from the front of the list.
Insert operation is called push operation. | Insert operation is called enqueue operation.
Delete operation is called pop operation. | Delete operation is called dequeue operation.
In stacks we maintain only one pointer to access the list, called the top, which always points to the last element present in the list. | In queues we maintain two pointers to access the list. The front pointer always points to the first element inserted in the list and is still present, and the rear pointer always points to the last inserted element.
Stack is used in solving problems that work on recursion. | Queue is used in solving problems having sequential processing.
An array is a container which can hold a fixed number of items, and these items should be of the
same type. Most data structures make use of arrays to implement their algorithms. Following are
the important terms to understand the concept of an array.
Array Representation
Arrays can be declared in various ways in different languages. For illustration, let's take C array
declaration.
As per the above illustration, following are the important points to be considered.
Basic Operations
A binary tree is a special data structure used for data storage purposes. A binary tree has the
special condition that each node can have a maximum of two children. When it is organized as a
balanced binary search tree, it has the benefits of both an ordered array and a linked list:
search is as quick as in a sorted array, and insertion or deletion operations are as fast as in
a linked list.
Important Terms
Path − Path refers to the sequence of nodes along the edges of a tree.
Root − The node at the top of the tree is called root. There is only one root per tree and
one path from the root node to any node.
Parent − Any node except the root node has one edge upward to a node called parent.
Child − The node below a given node connected by its edge downward is called its child
node.
Leaf − The node which does not have any child node is called the leaf node.
Subtree − Subtree represents the descendants of a node.
Visiting − Visiting refers to checking the value of a node when control is on the node.
Traversing − Traversing means passing through nodes in a specific order.
Levels − Level of a node represents the generation of a node. If the root node is at level
0, then its next child node is at level 1, its grandchild is at level 2, and so on.
Keys − A key represents the value of a node, based on which a search operation is
carried out for a node.
Types of Binary Trees
There are various types of binary trees, and each of these binary tree types has unique
characteristics. Here is each of the binary tree types in detail:
1. Full Binary Tree
It is a special kind of binary tree in which every node has either zero children or two children.
That is, every node in the tree either has two child nodes or is itself a leaf (external) node.
In other words, a full binary tree is a binary tree where every node except the external
nodes has two children. If any node holds a single child, the tree is not a full binary
tree. Here, the number of leaf nodes is equal to the number of internal nodes plus one. The
equation is L = I + 1, where L is the number of leaf nodes, and I is the number of internal
nodes.
2. Complete Binary Tree
A complete binary tree is another specific type of binary tree where all the tree levels are
filled entirely with nodes, except possibly the lowest level of the tree. Also, in the lowest
level of this binary tree, every node should reside as far to the left as possible. Here is the
structure of a complete binary tree:
3. Perfect Binary Tree
A binary tree is said to be ‘perfect’ if all the internal nodes have exactly two children, and
every external or leaf node is at the same level or depth within the tree. A perfect binary tree
of height ‘h’ has 2^h – 1 nodes, where the height is counted as the number of levels. Here is the
structure of a perfect binary tree:
4. Balanced Binary Tree
A binary tree is said to be ‘balanced’ if the tree height is O(log N), where ‘N’ is the number of
nodes. In a height-balanced binary tree (for example, an AVL tree), the heights of the left and
the right subtrees of each node differ by at most one. AVL trees and Red-Black trees are common
examples of data structures that maintain a balanced binary search tree. Here is an example of a
balanced binary tree:
5. Degenerate Binary Tree
A binary tree is said to be a degenerate binary tree or pathological binary tree if every internal
node has only a single child. Such trees are similar to a linked list performance-wise. Here is an
example of a degenerate binary tree:
Only two traversals are enough to provide the elements in sorted order
It is easy to pick up the maximum and minimum elements
Graph traversal also uses binary trees
Converting between postfix and prefix expressions is possible using binary trees
A binary search tree exhibits a special behavior: a node's left child must have a value less than
its parent's value, and the node's right child must have a value greater than its parent's value.
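The ordering rule can be sketched in Python as follows. The names `Node`, `insert` and `search` are assumptions for illustration, and duplicate keys are simply ignored in this sketch.

```python
class Node:
    """A binary search tree node holding a key and two child links."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the subtree rooted at root and return the subtree root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)     # smaller keys go left
    elif key > root.key:
        root.right = insert(root.right, key)   # larger keys go right
    return root                                # equal keys are ignored


def search(root, key):
    """Return the node containing key, or None if the key is absent."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root


root = None
for k in [27, 14, 35, 10, 19, 31, 42]:
    root = insert(root, k)

print(search(root, 19) is not None)  # True
print(search(root, 20) is not None)  # False
```

Each comparison in `search` discards one whole subtree, which is what makes lookups fast when the tree is reasonably balanced.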
The basic operations that can be performed on a binary search tree data structure, are the
following −
Traversal is a process to visit all the nodes of a tree and may print their values too. Because
all nodes are connected via edges (links), we always start from the root (head) node. That is, we
cannot randomly access a node in a tree. There are three ways which we use to traverse a tree −
In-order Traversal
Pre-order Traversal
Post-order Traversal
Generally, we traverse a tree to search or locate a given item or key in the tree or to print all the
values it contains.
In-order Traversal
In this traversal method, the left subtree is visited first, then the root and later the right
subtree. We should always remember that every node may represent a subtree itself.
If a binary search tree is traversed in-order, the output will be the key values sorted in
ascending order.
We start from A, and following in-order traversal, we move to its left subtree B. B is also
traversed in-order. The process goes on until all the nodes are visited. The output of inorder
traversal of this tree will be −
D→B→E→A→F→C→G
Algorithm
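A minimal recursive sketch of in-order traversal in Python. The tree is rebuilt here from the outputs quoted in the notes (root A, with children B and C, and leaves D, E, F, G); the `Node` class and function name are assumptions for illustration.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def in_order(node):
    """Return values in the order: left subtree, node, right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)


tree = Node("A",
            Node("B", Node("D"), Node("E")),
            Node("C", Node("F"), Node("G")))

print(" → ".join(in_order(tree)))  # D → B → E → A → F → C → G
```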
Pre-order Traversal
In this traversal method, the root node is visited first, then the left subtree and finally the right
subtree.
We start from A, and following pre-order traversal, we first visit A itself and then move to its left
subtree B. B is also traversed pre-order. The process goes on until all the nodes are visited. The
output of pre-order traversal of this tree will be −
A→B→D→E→C→F→G
Algorithm
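A minimal recursive sketch of pre-order traversal in Python, using the same seven-node tree as the output above (root A, children B and C, leaves D, E, F, G). The `Node` class and function name are assumptions for illustration.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def pre_order(node):
    """Return values in the order: node, left subtree, right subtree."""
    if node is None:
        return []
    return [node.value] + pre_order(node.left) + pre_order(node.right)


tree = Node("A",
            Node("B", Node("D"), Node("E")),
            Node("C", Node("F"), Node("G")))

print(" → ".join(pre_order(tree)))  # A → B → D → E → C → F → G
```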
Post-order Traversal
In this traversal method, the root node is visited last, hence the name. First we traverse the left
subtree, then the right subtree and finally the root node.
We start from A, and following Post-order traversal, we first visit the left subtree B. B is also
traversed post-order. The process goes on until all the nodes are visited. The output of post-order
traversal of this tree will be −
D→E→B→F→G→C→A
Algorithm
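A minimal recursive sketch of post-order traversal in Python, again on the seven-node tree implied by the output above (root A, children B and C, leaves D, E, F, G). The `Node` class and function name are assumptions for illustration.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def post_order(node):
    """Return values in the order: left subtree, right subtree, node."""
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.value]


tree = Node("A",
            Node("B", Node("D"), Node("E")),
            Node("C", Node("F"), Node("G")))

print(" → ".join(post_order(tree)))  # D → E → B → F → G → C → A
```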