Data Structures-UNIT I
Data structures are the fundamental building blocks of computer programming. They define
how data is organized, stored, and manipulated within a program. Understanding data
structures is very important for developing efficient and effective algorithms. In this tutorial,
we will explore the most commonly used data structures, including arrays, linked lists,
stacks, queues, trees, and graphs.
What is Data Structure?
A data structure is a particular way of organising data in a computer so that it can be used
effectively. The idea is to reduce the space and time complexities of different tasks.
The choice of a good data structure makes it possible to perform a variety of critical
operations effectively. An efficient data structure also uses minimum memory space and
execution time to process the structure. A data structure is not only used for organising the
data. It is also used for processing, retrieving, and storing data. There are different basic and
advanced types of data structures that are used in almost every program or software system
that has been developed. So we must have good knowledge of data structures.
Arrays find applications in sorting algorithms, dynamic programming, and implementing other
data structures.
Linked lists excel in memory management, implementing dynamic structures, and polynomial
manipulation.
Stacks are essential for expression evaluation, function call management, and backtracking
algorithms.
Queues play a crucial role in job scheduling, breadth-first search algorithms, and print job
management.
Trees are utilized in file systems, database indexing, representing hierarchical relationships,
and decision-making processes.
Graphs are versatile structures used in social network analysis, network routing, web page
ranking, and bioinformatics.
Hash tables offer fast retrieval in database indexing, caching mechanisms, symbol tables, and
key-value stores.
Introduction to Algorithms
Definition of Algorithm
The word algorithm means “a set of finite rules or instructions to be followed in
calculations or other problem-solving operations”
or
“a procedure for solving a mathematical problem in a finite number of steps that
frequently involves recursive operations”.
Therefore, an algorithm refers to a sequence of finite steps to solve a particular problem.
Algorithms have applications in many areas, and their use is continually expanding as
new technologies and fields emerge, making them a vital component of modern society.
Algorithms can be simple or complex depending on what you want to achieve.
This can be understood by taking the example of cooking a new recipe. To cook a new recipe,
one reads the instructions and steps and executes them one by one, in the given sequence.
The result thus obtained is that the new dish is cooked perfectly. Every time you use your
phone, computer, laptop, or calculator, you are using algorithms. Similarly, algorithms help
to do a task in programming to get the expected output.
Algorithms are language-independent, i.e. they are just plain instructions that
can be implemented in any language, and yet the output will be the same, as expected.
What is the need for algorithms?
1. Algorithms are necessary for solving complex problems efficiently and
effectively.
2. They help to automate processes and make them more reliable, faster, and easier
to perform.
3. Algorithms also enable computers to perform tasks that would be difficult or
impossible for humans to do manually.
4. They are used in various fields such as mathematics, computer science,
engineering, finance, and many others to optimize processes, analyse data, make
predictions, and provide solutions to problems.
What are the Characteristics of an Algorithm?
Just as one would not follow any arbitrary written instructions to cook a recipe, but only the
standard one, not all written instructions for programming are an algorithm. For a set of
instructions to be an algorithm, it must have the following characteristics:
Clear and Unambiguous: The algorithm should be unambiguous. Each of its
steps should be clear in all aspects and must lead to only one meaning.
Well-Defined Inputs: If an algorithm says to take inputs, it should be well-
defined inputs. It may or may not take input.
Well-Defined Outputs: The algorithm must clearly define what output will be
yielded and it should be well-defined as well. It should produce at least 1 output.
Finite-ness: The algorithm must be finite, i.e. it should terminate after a finite
time.
Feasible: The algorithm must be simple, generic, and practical, so that it can
be executed with the available resources. It must not depend on technology
that does not yet exist.
Language Independent: The Algorithm designed must be language-
independent, i.e. it must be just plain instructions that can be implemented in
any language, and yet the output will be the same, as expected.
Input: An algorithm has zero or more inputs. Each instruction that contains a
fundamental operator must accept zero or more inputs.
Output: An algorithm produces at least one output. Every instruction that
contains a fundamental operator must yield a well-defined result.
Definiteness: All instructions in an algorithm must be unambiguous, precise,
and easy to interpret. By referring to any of the instructions in an algorithm one
can clearly understand what is to be done. Every fundamental operator in
instruction must be defined without any ambiguity.
Finiteness: An algorithm must terminate after a finite number of steps in all test
cases. Every instruction which contains a fundamental operator must be
terminated within a finite amount of time. Infinite loops or recursive functions
without base conditions do not possess finiteness.
Effectiveness: An algorithm must be developed by using very basic, simple, and
feasible operations so that one can trace it out by using just paper and pencil.
Properties of Algorithm:
It should terminate after a finite time.
It should produce at least one output.
It should take zero or more input.
It should be deterministic, meaning it gives the same output for the same input.
Every step in the algorithm must be effective, i.e. every step should do some
work.
Pseudocode:
Pseudocode is a step-by-step description of an algorithm. It does
not use any programming language in its representation; instead, it uses simple
English text, as it is intended for human understanding rather than machine
reading.
Pseudocode is the intermediate state between an idea and its implementation (code) in a
high-level language.
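As a small illustration, here is pseudocode for finding the largest value in a list, written as a comment, followed by one possible C translation. The task and the name find_largest are chosen for this example only; this is a sketch, not a prescribed form.

```c
#include <stddef.h>

/* Pseudocode:
 *   set largest to the first element
 *   for each remaining element x
 *       if x > largest then set largest to x
 *   return largest
 */

/* Returns the largest value in a[0..n-1]; assumes n >= 1. */
int find_largest(const int a[], size_t n)
{
    int largest = a[0];
    for (size_t i = 1; i < n; i++) {
        if (a[i] > largest) {
            largest = a[i];
        }
    }
    return largest;
}
```

The pseudocode states the idea without committing to a language; the C function is one of many possible implementations of it.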
What is the need for Pseudocode?
Pseudocode is an important part of designing an algorithm. It helps the programmer in
planning the solution to the problem, as well as the reader in understanding the approach to
the problem. Pseudocode is an intermediate state between algorithm and program that
supports the transition of the algorithm into the program.
Algorithm vs. Pseudocode:
An algorithm uses only simple English.
Pseudocode also uses reserved keywords such as if-else, for, and while.
Flowchart vs. Pseudocode:
A flowchart is a pictorial representation of the flow of an algorithm. It is a way of
visually representing the algorithm for a better understanding of the code.
Pseudocode is a step-by-step description of an algorithm in a code-like structure using
plain English text. It is “fake code” (pseudo means fake): it uses a code-like structure but
plain English text instead of a programming language.
What is a flowchart?
A flowchart is a diagram that depicts a process, system or computer algorithm. They are
widely used in multiple fields to document, study, plan, improve and communicate often
complex processes in clear, easy-to-understand diagrams. Flowcharts, sometimes spelled as
flow charts, use rectangles, ovals, diamonds and potentially numerous other shapes to define
the type of step, along with connecting arrows to define flow and sequence. They can range
from simple, hand-drawn charts to comprehensive computer-drawn diagrams depicting
multiple steps and routes. If we consider all the various forms of flowcharts, they are one of
the most common diagrams on the planet, used by both technical and non-technical people in
numerous fields. Flowcharts are sometimes called by more specialized names such as Process
Flowchart, Process Map, Functional Flowchart, Business Process Mapping, Business Process
Modeling and Notation (BPMN), or Process Flow Diagram (PFD). They are related to other
popular diagrams, such as Data Flow Diagrams (DFDs) and Unified Modeling Language
(UML) Activity Diagrams.
History
Flowcharts to document business processes came into use in the 1920s and ‘30s. In 1921,
industrial engineers Frank and Lillian Gilbreth introduced the “Flow Process Chart” to the
American Society of Mechanical Engineers (ASME). In the early 1930s, industrial engineer
Allan H. Mogensen used the Gilbreths’ tools to present conferences on making work more
efficient to business people at his company. In the 1940s, two Mogensen students, Art
Spinanger and Ben S. Graham, spread the methods more widely. Spinanger introduced the
work simplification methods to Procter and Gamble. Graham, a director at Standard Register
Industrial, adapted flow process charts to information processing. In 1947, ASME adopted a
symbol system for Flow Process Charts, derived from the Gilbreths’ original work.
Also in the late ‘40s, Herman Goldstine and John von Neumann used flowcharts to develop
computer programs, and diagramming soon became increasingly popular for computer
programs and algorithms of all kinds. Flowcharts are still used for programming today,
although pseudocode, a combination of words and coding language meant for human reading,
is often used to depict deeper levels of detail and get closer to a final product.
Flowchart symbols
Here are some of the common flowchart symbols:
Terminal/Terminator
Process
Decision
Document
Data, or Input/Output
Stored Data
Flow Arrow
Comment or Annotation
Predefined process
On-page connector/reference
Off-page connector/reference
Often, programmers may write pseudocode, a combination of natural language and computer
language able to be read by people. This may allow greater detail than the flowchart and
serve either as a replacement for the flowchart or as a next step to actual code.
Beyond computer programming, flowcharts have many uses in many diverse fields.
In any field:
Identify bottlenecks, redundancies and unnecessary steps in a process and improve it.
Education:
Analysis of Algorithm:
In this section, we discuss the need for the analysis of algorithms and how to
choose a better algorithm for a particular problem, since one computational problem
can be solved by different algorithms.
Algorithms are often quite different from one another, even though their objective
is the same. For example, a set of numbers can be sorted
using different algorithms. The number of comparisons performed by one algorithm
may differ from that of another for the same input. Hence, the time complexity of those
algorithms may differ. At the same time, we need to calculate the memory space
required by each algorithm.
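To make the point about differing comparison counts concrete, the following sketch instruments two simple sorting algorithms so that each returns the number of element comparisons it performed. The choice of bubble sort (without early exit) and insertion sort, and the function names, are assumptions made for this illustration.

```c
#include <stddef.h>

/* Bubble sort (no early exit): returns the number of element comparisons. */
unsigned long bubble_sort_count(int a[], size_t n)
{
    unsigned long comparisons = 0;
    for (size_t pass = 0; pass + 1 < n; pass++) {
        for (size_t i = 0; i + 1 < n - pass; i++) {
            comparisons++;
            if (a[i] > a[i + 1]) {
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }
    return comparisons;
}

/* Insertion sort: returns the number of element comparisons. */
unsigned long insertion_sort_count(int a[], size_t n)
{
    unsigned long comparisons = 0;
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0) {
            comparisons++;
            if (a[j - 1] <= key) {
                break;              /* key is already in place */
            }
            a[j] = a[j - 1];        /* shift the larger element right */
            j--;
        }
        a[j] = key;
    }
    return comparisons;
}
```

On the already sorted input {1, 2, 3, 4, 5}, the first function performs 10 comparisons and the second performs 4, even though both produce the same sorted result.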
Linear Data Structures are a type of data structure in computer science where data
elements are arranged sequentially or linearly. Each element has a previous and a
next neighbour, except for the first and last elements.
Order Preservation: The order in which elements are added to the data structure is
preserved. Which element is accessed or removed first depends on the structure: a queue
returns the first element added, while a stack returns the last one added.
Fixed or Dynamic Size: Linear data structures can have either fixed or dynamic
sizes. Arrays typically have a fixed size when they are created, while other structures
like linked lists, stacks, and queues can dynamically grow or shrink as elements are
added or removed.
Linear data structures are commonly used for organising and manipulating data in a
sequential fashion. Some of the most common linear data structures include:
1. Array
Homogeneous Elements: All elements within an array must be of the same data type.
Random Access: Arrays provide constant-time (O(1)) access to elements. This means
that regardless of the size of the array, it takes the same amount of time to access any
element based on its index.
Types of arrays:
One-Dimensional Array
Multi-Dimensional Array: Arrays can have two or more dimensions, leading to
multi-dimensional arrays. These are used when data needs to be organized in a
multi-dimensional grid.
Searching: Linear search takes O(n) time and works on unsorted data, while
binary search takes O(log n) time but requires sorted data.
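The two searching strategies mentioned above can be sketched as follows. The function names and the convention of returning -1 for a missing key are choices made for this example.

```c
#include <stddef.h>

/* Linear search: works on unsorted data, O(n) comparisons in the worst case.
 * Returns the index of key, or -1 if it is absent. */
long linear_search(const int a[], size_t n, int key)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] == key) {
            return (long)i;
        }
    }
    return -1;
}

/* Binary search: requires a[] sorted in ascending order, O(log n).
 * Returns the index of key, or -1 if it is absent. */
long binary_search(const int a[], size_t n, int key)
{
    size_t lo = 0, hi = n;              /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] == key) {
            return (long)mid;
        }
        if (a[mid] < key) {
            lo = mid + 1;               /* key can only be in the right half */
        } else {
            hi = mid;                   /* key can only be in the left half */
        }
    }
    return -1;
}
```

Binary search halves the remaining range at every step, which is why it needs the data to be sorted first.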
2-D array
To find the address of any element in a 2-Dimensional array there are the following two
ways-
1. Row Major Order
2. Column Major Order
1. Row Major Order:
Row major ordering assigns successive elements, moving across the rows and then down the
next row, to successive memory locations. In simple language, the elements of an array are
stored in a Row-Wise fashion.
To find the address of an element using row-major order, use the following formula:
Address of A[I][J] = B + W * ((I – LR) * N + (J – LC))
I = Row subscript of the element whose address is to be found,
J = Column subscript of the element whose address is to be found,
B = Base address,
W = Storage size of one element stored in the array (in bytes),
LR = Lower limit of row/start row index of the matrix (if not given, assume zero),
LC = Lower limit of column/start column index of the matrix (if not given, assume zero),
N = Number of columns in the matrix.
Example: Given an array arr[1…10][1…15] with base value 100 and an element size of
1 byte in memory, find the address of arr[8][6] using row-major
order.
Solution:
Given:
Base address B = 100
Storage size of one element stored in the array W = 1 byte
Row subscript of the element whose address is to be found I = 8
Column subscript of the element whose address is to be found J = 6
Lower limit of row/start row index of the matrix LR = 1
Lower limit of column/start column index of the matrix LC = 1
Number of columns in the matrix N = Upper Bound – Lower Bound + 1
= 15 – 1 + 1
= 15
Formula:
Address of A[I][J] = B + W * ((I – LR) * N + (J – LC))
Solution:
Address of A[8][6] = 100 + 1 * ((8 – 1) * 15 + (6 – 1))
= 100 + 1 * ((7) * 15 + (5))
= 100 + 1 * (110)
Address of A[8][6] = 210
2. Column Major Order:
If the elements of an array are stored column by column, that is, moving down a
column and then on to the next column, the array is in column-major order. To find the address of
an element using column-major order, use the following formula:
Address of A[I][J] = B + W * ((J – LC) * M + (I – LR))
I = Row subscript of the element whose address is to be found,
J = Column subscript of the element whose address is to be found,
B = Base address,
W = Storage size of one element stored in the array (in bytes),
LR = Lower limit of row/start row index of the matrix (if not given, assume zero),
LC = Lower limit of column/start column index of the matrix (if not given, assume zero),
M = Number of rows in the matrix.
Example: Given an array arr[1…10][1…15] with a base value of 100 and an element size
of 1 byte in memory, find the address of arr[8][6] using column-major
order.
Solution:
Given:
Base address B = 100
Storage size of one element stored in the array W = 1 byte
Row subscript of the element whose address is to be found I = 8
Column subscript of the element whose address is to be found J = 6
Lower limit of row/start row index of the matrix LR = 1
Lower limit of column/start column index of the matrix LC = 1
Number of rows in the matrix M = Upper Bound – Lower Bound + 1
= 10 – 1 + 1
= 10
Formula used:
Address of A[I][J] = B + W * ((J – LC) * M + (I – LR))
Address of A[8][6] = 100 + 1 * ((6 – 1) * 10 + (8 – 1))
= 100 + 1 * ((5) * 10 + (7))
= 100 + 1 * (57)
Address of A[8][6] = 157
From the above examples, it can be observed that two different addresses are obtained for the
same position. That is because in row-major order the elements are laid out across each row
and then down to the next row, whereas in column-major order they are laid out down each
column and then across to the next column. Both answers are correct for their respective storage orders.
Whether the two orders give the same address depends on the position of the element:
for some positions (such as the first element) the results coincide, and for
others they differ.
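The two 2-D address formulas can be sketched in C as follows. The function and parameter names are chosen for this illustration; the parameters mirror B, W, I, J, LR, LC, N and M from the formulas above.

```c
/* Row-major address of A[i][j].
 * base = B, width = W (bytes per element), lr/lc = lower bounds,
 * ncols = N (number of columns). */
long row_major_address(long base, long width, long i, long j,
                       long lr, long lc, long ncols)
{
    return base + width * ((i - lr) * ncols + (j - lc));
}

/* Column-major address of A[i][j].
 * nrows = M (number of rows); other parameters as above. */
long col_major_address(long base, long width, long i, long j,
                       long lr, long lc, long nrows)
{
    return base + width * ((j - lc) * nrows + (i - lr));
}
```

With the values from the worked examples, row_major_address(100, 1, 8, 6, 1, 1, 15) gives 210 and col_major_address(100, 1, 8, 6, 1, 1, 10) gives 157, while both functions give 100 for the first element arr[1][1].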
Calculate the address of any element in the 3-D Array:
A 3-Dimensional array is a collection of 2-Dimensional arrays. An element is specified by using three
subscripts:
1. Block index
2. Row index
3. Column index
More dimensions in an array mean more data can be stored in that array.
Example:
3-D array
To find the address of any element in 3-Dimensional arrays there are the following two ways-
Row Major Order
Column Major Order
1. Row Major Order:
To find the address of the element using row-major order, use the following formula:
Address of A[i][j][k] = B + W * (M * N * (i – x) + N * (j – y) + (k – z))
Here:
B = Base Address (start address)
W = Weight (storage size of one element stored in the array)
M = Row (total number of rows)
N = Column (total number of columns)
P = Width (total number of blocks depth-wise; it does not appear in the formula)
x = Lower Bound of block (first subscript)
y = Lower Bound of Row
z = Lower Bound of Column
Example: Given an array arr[1:9, -4:1, 5:10] with a base value of 400 and an element
size of 2 bytes in memory, find the address of element arr[5][-1][8] using
row-major order.
Solution:
Given:
Block subscript of the element whose address is to be found i = 5
Row subscript of the element whose address is to be found j = -1
Column subscript of the element whose address is to be found k = 8
Base address B = 400
Storage size of one element stored in the array (in bytes) W = 2
Lower limit of blocks x = 1
Lower limit of row/start row index y = -4
Lower limit of column/start column index z = 5
M (rows) = Upper Bound – Lower Bound + 1 = 1 – (-4) + 1 = 6
N (columns) = Upper Bound – Lower Bound + 1 = 10 – 5 + 1 = 6
Formula used:
Address of A[i][j][k] = B + W * (M * N * (i – x) + N * (j – y) + (k – z))
Solution:
Address of arr[5][-1][8] = 400 + 2 * ((6 * 6 * (5 – 1)) + (6 * (-1 + 4)) + (8 – 5))
= 400 + 2 * ((6 * 6 * 4) + (6 * 3) + 3)
= 400 + 2 * (165)
= 400 + 330
= 730
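The row-major calculation for a 3-D array can be checked with a short C function. This is a sketch: the function name is hypothetical, and the parameters follow the symbols B, W, i, j, k, x, y, z, M and N used in the worked example.

```c
/* Row-major address of A[i][j][k], where i is the block subscript,
 * j the row subscript and k the column subscript.
 * x, y, z are the lower bounds of block, row and column;
 * nrows = M and ncols = N are the row and column counts of one block. */
long row_major_3d_address(long base, long width,
                          long i, long j, long k,
                          long x, long y, long z,
                          long nrows, long ncols)
{
    return base + width * (nrows * ncols * (i - x)
                           + ncols * (j - y)
                           + (k - z));
}
```

Calling row_major_3d_address(400, 2, 5, -1, 8, 1, -4, 5, 6, 6) reproduces the result 730 from the example.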
2. Column Major Order:
To find the address of the element using column-major order, use the following formula:
Address of A[i][j][k] = B + W * (M * N * (i – x) + M * (k – z) + (j – y))
Here:
B = Base Address (start address)
W = Weight (storage size of one element stored in the array)
M = Row (total number of rows)
N = Column (total number of columns)
P = Width (total number of blocks depth-wise; it does not appear in the formula)
x = Lower Bound of block (first subscript)
y = Lower Bound of Row
z = Lower Bound of Column
Example: Given an array arr[1:8, -5:5, -10:5] with a base value of 400 and an element
size of 4 bytes in memory, find the address of element arr[3][3][3] using
column-major order.
Solution:
Given:
Block subscript of the element whose address is to be found i = 3
Row subscript of the element whose address is to be found j = 3
Column subscript of the element whose address is to be found k = 3
Base address B = 400
Storage size of one element stored in the array (in bytes) W = 4
Lower limit of blocks x = 1
Lower limit of row/start row index y = -5
Lower limit of column/start column index z = -10
M (rows) = Upper Bound – Lower Bound + 1 = 5 + 5 + 1 = 11
N (columns) = Upper Bound – Lower Bound + 1 = 5 + 10 + 1 = 16
Formula used:
Address of A[i][j][k] = B + W * (M * N * (i – x) + M * (k – z) + (j – y))
Solution:
Address of arr[3][3][3] = 400 + 4 * ((11 * 16 * (3 – 1)) + (11 * (3 – (-10))) + (3 – (-5)))
= 400 + 4 * ((176 * 2) + (11 * 13) + 8)
= 400 + 4 * (503)
= 400 + 2012
= 2412
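The column-major calculation can be checked the same way. In this sketch (function name hypothetical), blocks are still stored one after another, and within each block the elements are stored column by column, which is what the arithmetic of the worked example assumes.

```c
/* Column-major address of A[i][j][k], where i is the block subscript,
 * j the row subscript and k the column subscript.
 * x, y, z are the lower bounds of block, row and column;
 * nrows = M and ncols = N are the row and column counts of one block. */
long col_major_3d_address(long base, long width,
                          long i, long j, long k,
                          long x, long y, long z,
                          long nrows, long ncols)
{
    return base + width * (nrows * ncols * (i - x)
                           + nrows * (k - z)
                           + (j - y));
}
```

Calling col_major_3d_address(400, 4, 3, 3, 3, 1, -5, -10, 11, 16) reproduces the result 2412 from the example.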
What is a String?
A string is generally considered a data type and is typically represented as an array of bytes (or
words) that stores a sequence of characters. A string is defined as an array of characters. In C, the
difference between a character array and a string is that the string is terminated with the special
character ‘\0’. Some examples of strings are: “geeks”, “for”, “GeeksforGeeks”,
“Geeks for Geeks”, “123Geeks”, “@123 Geeks”.
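The role of the terminating ‘\0’ can be seen in a hand-written length function. The name my_strlen is made up for this sketch; the standard library already provides strlen() for this purpose.

```c
#include <stddef.h>

/* Counts characters up to, but not including, the terminating '\0'. */
size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

For the string "geeks" the function returns 5, while the array that holds it needs 6 bytes: five characters plus the terminator.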
String Data Type:
In most programming languages, strings are treated as a distinct data type. This means that
strings have their own set of operations and properties. They can be declared and manipulated
using specific string-related functions and methods.
Note: In some languages, strings are implemented as arrays of characters, making them a
derived data type.
String Operations:
Strings support a wide range of operations, including concatenation, substring extraction,
length calculation, and more. These operations allow developers to manipulate and process
string data efficiently.
Below are fundamental operations commonly performed on strings in programming.
Concatenation: Combining two strings to create a new string.
Length: Determining the number of characters in a string.
Access: Accessing individual characters in a string by index.
Substring: Extracting a portion of a string.
Comparison: Comparing two strings to check for equality or order.
Search: Finding the position of a specific substring within a string.
Modification: Changing or replacing characters within a string.
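In C, several of these operations map directly to functions from <string.h>, such as strlen(), strcat(), and strcmp(). Substring extraction and search have no single ready-made call that returns a copy or an index, so the sketch below wraps standard functions in two helpers. The helper names and their signatures are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Search: index of the first occurrence of needle in haystack, or -1. */
long find_substring(const char *haystack, const char *needle)
{
    const char *p = strstr(haystack, needle);
    return p ? (long)(p - haystack) : -1;
}

/* Substring: copies up to len characters of src, starting at index start,
 * into dst (capacity dst_size bytes, including the '\0').
 * Returns the number of characters copied. */
size_t substring(char *dst, size_t dst_size,
                 const char *src, size_t start, size_t len)
{
    size_t src_len = strlen(src);
    if (dst_size == 0) {
        return 0;
    }
    if (start > src_len) {
        start = src_len;                /* clamp to the end of src */
    }
    if (len > src_len - start) {
        len = src_len - start;          /* do not read past the end */
    }
    if (len > dst_size - 1) {
        len = dst_size - 1;             /* leave room for '\0' */
    }
    memcpy(dst, src + start, len);
    dst[len] = '\0';
    return len;
}
```

For example, extracting 3 characters from index 5 of "GeeksforGeeks" yields "for", and searching the same string for "for" returns index 5.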
Applications of String:
Text Processing: Strings are extensively used for text processing tasks such as
searching, manipulating, and analyzing textual data.
Data Representation: Strings are fundamental for representing and manipulating
data in formats like JSON, XML, and CSV.
Encryption and Hashing: Strings are commonly used in encryption and hashing
algorithms to secure sensitive data and ensure data integrity.
Database Operations: Strings are essential for working with databases, including
storing and querying text-based data.
Web Development: Strings are utilized in web development for constructing URLs,
handling form data, processing input from web forms, and generating dynamic
content.
String Functions
C also has many useful string functions, which can be used to perform
certain operations on strings.
To use them, you must include the <string.h> header file in your program:
#include <string.h>
String Length
For example, to get the length of a string, you can use the strlen() function:
Example
char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
printf("%zu", strlen(alphabet));
The sizeof operator can also be used to get the size of a string/array. Note
that sizeof and strlen behave differently, as sizeof also includes
the \0 character when counting:
Example
char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
printf("%zu", strlen(alphabet)); // 26
printf("%zu", sizeof(alphabet)); // 27
It is also important that you know that sizeof will always return the memory
size (in bytes), and not the actual string length:
Example
char alphabet[50] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
printf("%zu", strlen(alphabet)); // 26
printf("%zu", sizeof(alphabet)); // 50
Concatenate Strings
To concatenate (combine) two strings, you can use the strcat() function:
Example
char str1[20] = "Hello ";
char str2[] = "World!";
// Concatenate str2 to str1 (the result is stored in str1)
strcat(str1, str2);
// Print str1
printf("%s", str1);
Note that the size of str1 should be large enough to store the result of the
two strings combined (20 in our example).
Copy Strings
To copy the value of one string to another, you can use the strcpy() function:
Example
char str1[20] = "Hello World!";
char str2[20];
// Copy str1 to str2
strcpy(str2, str1);
// Print str2
printf("%s", str2);
Note that the size of str2 should be large enough to store the copied string
(20 in our example).
Compare Strings
To compare two strings, you can use the strcmp() function.
It returns 0 if the two strings are equal, otherwise a value that is not 0:
Example
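char str1[] = "Hello";
char str2[] = "Hello";
char str3[] = "Hi";
// Compare str1 with str2, then str1 with str3, and print the results
printf("%d\n", strcmp(str1, str2)); // 0 (the strings are equal)
printf("%d\n", strcmp(str1, str3)); // a nonzero value (the strings differ)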
An array is a data structure that stores a collection of elements, all of the same data type, in a
contiguous block of memory. Each element in the array is identified by an index, which is a
numerical value that represents the position of the element in the array. The first element in
the array has an index of 0, the second element has an index of 1, and so on.
Arrays can be used to store a variety of data types, including integers, floats, characters, and
even user-defined types. They are commonly used to store collections of similar data, such as
a list of numbers, a list of strings, or a list of objects.
However, arrays have their own set of advantages and disadvantages.
Advantages of Arrays
Below are some advantages of the array:
In an array, accessing an element is very easy by using the index number.
The search process can be applied to an array easily.
2D Array is used to represent matrices.
If a user needs to store multiple values of the same type, an array
can be used efficiently.
Arrays have low overhead.
The C standard library provides functions for working with arrays, such as qsort() for
sorting and bsearch() for searching.
C supports arrays of multiple dimensions, which can be useful for representing
complex data structures like matrices.
Arrays can be easily converted to pointers, which allows arrays to be passed to
functions as arguments or, via a pointer, handed back from functions.
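As an illustration of the standard-library support for sorting and searching arrays, the sketch below combines qsort() and bsearch() from <stdlib.h>. The helper names compare_ints and sort_and_find are made up for this example.

```c
#include <stdlib.h>

/* Comparison callback for ascending int order, as qsort/bsearch expect. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorts a[0..n-1] in place, then returns the index of key in the
 * sorted array, or -1 if key is not present. */
long sort_and_find(int a[], size_t n, int key)
{
    qsort(a, n, sizeof a[0], compare_ints);
    int *hit = bsearch(&key, a, n, sizeof a[0], compare_ints);
    return hit ? (long)(hit - a) : -1;
}
```

Note that the array is passed to the function as a pointer to its first element, so the sorting happens in the caller's array.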
Disadvantages of Arrays
Array size is fixed: The array is static, which means its size is always fixed. The
memory which is allocated to it cannot be increased or decreased.
STACKS
A stack is a linear data structure that follows the LIFO (Last In First Out)
principle, so the last element inserted is the first to be popped out. This section
covers the basics of the stack, the operations on it, its implementation, and its
advantages and disadvantages.
What is Stack Data Structure?
Stack is a linear data structure based on LIFO(Last In First Out) principle in which the
insertion of a new element and removal of an existing element takes place at the same end
represented as the top of the stack.
To implement the stack, it is required to maintain the pointer to the top of the stack , which
is the last element to be inserted because we can access the elements only on the top of the
stack.
LIFO(Last In First Out) Principle in Stack Data Structure:
This strategy states that the element that is inserted last will come out first. You can take a
pile of plates kept on top of each other as a real-life example. The plate which we put last is
on the top and since we remove the plate that is at the top, we can say that the plate that was
put last comes out first.
Representation of Stack Data Structure:
Stack follows LIFO (Last In First Out) Principle so the element which is pushed last is
popped first.
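A minimal array-based stack can be sketched in C as follows. The capacity of 100 and all names are arbitrary choices for this illustration; the top field plays the role of the pointer to the top of the stack described above.

```c
#include <stdbool.h>

#define STACK_CAPACITY 100   /* arbitrary limit chosen for this sketch */

typedef struct {
    int items[STACK_CAPACITY];
    int top;                 /* index of the top element, -1 when empty */
} Stack;

void stack_init(Stack *s)
{
    s->top = -1;
}

bool stack_is_empty(const Stack *s)
{
    return s->top == -1;
}

/* Push: places value on top of the stack; returns false on overflow. */
bool stack_push(Stack *s, int value)
{
    if (s->top == STACK_CAPACITY - 1) {
        return false;
    }
    s->items[++s->top] = value;
    return true;
}

/* Pop: stores the removed top element in *out; returns false on underflow. */
bool stack_pop(Stack *s, int *out)
{
    if (stack_is_empty(s)) {
        return false;
    }
    *out = s->items[s->top--];
    return true;
}
```

Pushing 1, 2, 3 and then popping three times returns 3, 2, 1, which is the LIFO order.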
Infix Notation
The infix notation places the operator between the operands.
Example: A + B, (C - D), etc.
All these expressions are in infix notation because the operator comes between the operands.
Prefix Notation
The prefix notation places the operator before the operands. This notation was introduced by
the Polish mathematician Jan Łukasiewicz and is hence often referred to as Polish notation.
Postfix Notation
The postfix notation places the operator after the operands. This notation is just the reverse of
Polish notation and is also known as Reverse Polish notation.
Example: AB+, CD-, etc.
All these expressions are in postfix notation because the operator comes after the operands.
Infix: A * B    Prefix: * A B    Postfix: A B *
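Postfix expressions are convenient to evaluate with a stack: operands are pushed, and each operator pops two operands and pushes the result. The sketch below assumes single-digit operands and the four operators + - * /; the function name and the error convention are choices made for this example.

```c
#include <ctype.h>

/* Evaluates a postfix expression made of single-digit operands and the
 * operators + - * /, e.g. "23*4+". Spaces are ignored.
 * Returns 0 on success and stores the value in *result;
 * returns -1 if the expression is malformed. */
int eval_postfix(const char *expr, int *result)
{
    int stack[64];
    int top = -1;

    for (const char *p = expr; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (isspace(c)) {
            continue;
        }
        if (isdigit(c)) {
            if (top == 63) {
                return -1;                  /* stack overflow */
            }
            stack[++top] = c - '0';
            continue;
        }
        if (top < 1) {
            return -1;                      /* an operator needs two operands */
        }
        int right = stack[top--];
        int left = stack[top--];
        switch (c) {
        case '+': stack[++top] = left + right; break;
        case '-': stack[++top] = left - right; break;
        case '*': stack[++top] = left * right; break;
        case '/':
            if (right == 0) {
                return -1;                  /* division by zero */
            }
            stack[++top] = left / right;
            break;
        default:
            return -1;                      /* unknown symbol */
        }
    }
    if (top != 0) {
        return -1;                          /* leftover operands or empty input */
    }
    *result = stack[0];
    return 0;
}
```

For instance, the postfix expression "23*4+" corresponds to the infix expression 2 * 3 + 4 and evaluates to 10.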