Unit 1, 2, 3
A data structure can be defined as a group of data elements that provides an efficient
way of storing and organizing data in the computer so that it can be used efficiently. Some
examples of data structures are arrays, Linked List, Stack, Queue, etc. Data structures are
widely used in almost every area of Computer Science, e.g. Operating Systems, Compiler
Design, Artificial Intelligence, Graphics and many more.
ADVANTAGES OF DS:
1. Data Organization: We need a proper way of organizing data so that a particular item can be
accessed efficiently when we need it. Data structures provide different ways of organizing data,
so we can choose the data structure that best fits the requirement.
2. Efficiency: The main reason we organize data is to improve efficiency. If we can store
data in arrays, why do we need linked lists and other data structures? Because operations
such as adding, deleting, updating and searching take more time on arrays than on some
other data structures; for example, inserting at the front of an array requires shifting every
existing element. So we use other data structures because of their efficiency.
3. Reusability: Data structures can be reused. Once we have implemented a data structure, the
same data structure can be used again later.
4. Abstraction: A data structure hides (abstracts) its implementation details from the user. The
user accesses the data through an interface and enjoys the benefits of the data structure, while
the complex implementation details stay hidden.
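OPERATIONS ON DS:
1) Traversing: Traversing a data structure means visiting each of its elements in order to
perform some operation on it, such as searching or sorting.
Example: If we need to calculate the average of the marks obtained by a student in 6 different
subjects, we traverse the complete array of marks to calculate the total, then divide that sum
by the number of subjects, i.e. 6, to find the average.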
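A minimal Python sketch of the traversal in the marks example; the function name `average` and the six marks are hypothetical values chosen for illustration.

```python
def average(marks):
    """Traverse the list once, summing every element, then divide by the count."""
    total = 0
    for mark in marks:  # visit each element exactly once
        total += mark
    return total / len(marks)

marks = [72, 85, 90, 64, 78, 81]  # hypothetical marks in 6 subjects
print(average(marks))
```

Because every element is visited exactly once, the work grows linearly with the number of subjects.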
2) Insertion: Insertion is the process of adding an element to the data
structure at any location.
If the data structure has room for n elements, at most n elements can be stored in it; trying to
insert an element into a full data structure causes overflow.
3) Deletion: The process of removing an element from the data structure is called Deletion. We
can delete an element from any location in the data structure.
If we try to delete an element from an empty data structure, underflow occurs.
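A minimal sketch of insertion and deletion in a fixed-capacity array, showing where overflow and underflow arise. The class `FixedArray` is hypothetical; Python has no built-in underflow exception, so `IndexError` is used for that case as an assumption.

```python
class FixedArray:
    """A hypothetical fixed-capacity array illustrating insertion, deletion,
    overflow and underflow."""

    def __init__(self, capacity):
        self._items = [None] * capacity  # pre-allocated storage
        self._size = 0                   # number of slots in use

    def insert(self, index, value):
        if self._size == len(self._items):
            raise OverflowError("overflow: the structure is full")
        if not 0 <= index <= self._size:
            raise IndexError("index out of range")
        # Shift elements one place right to open a gap at `index`
        for i in range(self._size, index, -1):
            self._items[i] = self._items[i - 1]
        self._items[index] = value
        self._size += 1

    def delete(self, index):
        if self._size == 0:
            raise IndexError("underflow: the structure is empty")
        if not 0 <= index < self._size:
            raise IndexError("index out of range")
        value = self._items[index]
        # Shift elements one place left to close the gap
        for i in range(index, self._size - 1):
            self._items[i] = self._items[i + 1]
        self._size -= 1
        self._items[self._size] = None
        return value

    def to_list(self):
        return self._items[:self._size]
```

The shifting loops are why insertion and deletion at an arbitrary position of an array take time proportional to the number of elements moved.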
4) Searching: The process of finding the location of an element within the data structure is
called Searching. Two common searching algorithms are Linear Search and Binary
Search. We will discuss each of them later in this tutorial.
5) Sorting: The process of arranging the elements of a data structure in a specific order is known
as Sorting. There are many sorting algorithms, for example insertion sort,
selection sort, bubble sort, etc.
6) Merging: When two lists, List A and List B, of sizes M and N respectively, containing elements
of the same type, are joined to produce a third list, List C of size (M+N), the process is
called merging.
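The definition above does not say whether the lists are ordered; if they are not, joining them is just concatenation (`a + b` in Python). A common variant, sketched here as an assumption, merges two lists that are already sorted so that List C is sorted too. The function name `merge_sorted` is hypothetical.

```python
def merge_sorted(a, b):
    """Merge two ascending lists into a new ascending list of length len(a) + len(b)."""
    c = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:  # <= keeps equal elements from `a` first (stable)
            c.append(a[i])
            i += 1
        else:
            c.append(b[j])
            j += 1
    # One list is exhausted; copy whatever remains of the other
    c.extend(a[i:])
    c.extend(b[j:])
    return c

print(merge_sorted([1, 4, 7], [2, 3, 9, 10]))  # [1, 2, 3, 4, 7, 9, 10]
```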
Linear Data Structures: A data structure is called linear if all of its elements are arranged in
a linear order. In linear data structures, the elements are stored in a non-hierarchical way where
each element has a successor and a predecessor, except the first element (which has no
predecessor) and the last element (which has no successor).
Arrays: An array is a collection of data items of the same type, and each data item is called an
element of the array. The data type of the elements may be any valid data type, such as char,
int, float or double.
The elements of array share the same variable name but each one carries a different index
number known as subscript. The array can be one dimensional, two dimensional or
multidimensional.
Linked List: A linked list is a linear data structure used to maintain a list in memory.
It can be seen as a collection of nodes stored at non-contiguous memory locations. Each node
of the list contains its data and a pointer to the next node in the list.
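A minimal sketch of a singly linked list in Python, where the "pointer" is an ordinary object reference. The class and method names (`Node`, `LinkedList`, `push_front`, `append`) are hypothetical.

```python
class Node:
    """One node: a value plus a reference to the next node (None at the end)."""
    def __init__(self, data, next_node=None):
        self.data = data
        self.next = next_node

class LinkedList:
    def __init__(self):
        self.head = None  # empty list

    def push_front(self, data):
        # New node points at the old head, then becomes the head: O(1)
        self.head = Node(data, self.head)

    def append(self, data):
        # Walk to the last node, then link the new node after it: O(n)
        node = Node(data)
        if self.head is None:
            self.head = node
            return
        cur = self.head
        while cur.next is not None:
            cur = cur.next
        cur.next = node

    def __iter__(self):
        cur = self.head
        while cur is not None:
            yield cur.data
            cur = cur.next

ll = LinkedList()
ll.append(2)
ll.append(3)
ll.push_front(1)
print(list(ll))  # [1, 2, 3]
```

Unlike an array, inserting at the front only relinks one reference instead of shifting elements.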
Stack: A stack is a linear list in which insertions and deletions are allowed only at one end,
called the top. It stores items in a Last-In/First-Out (LIFO), or equivalently
First-In/Last-Out (FILO), manner.
A stack is an abstract data type (ADT) that can be implemented in most programming
languages. It is called a stack because it behaves like a real-world stack, for example a pile
of plates or a deck of cards.
Queue: A queue is a linear list in which elements can be inserted only at one end, called the rear,
and deleted only at the other end, called the front.
Like a stack, it is an abstract data structure. Because a queue is open at both ends, it follows the
First-In-First-Out (FIFO) methodology for storing data items.
Non-Linear Data Structures: These data structures do not form a sequence, i.e. each item or
element can be connected to two or more other items in a non-linear arrangement. The data
elements are not arranged in a sequential structure.
Trees: Trees are multilevel data structures with a hierarchical relationship among their elements,
known as nodes. The bottommost nodes in the hierarchy are called leaf nodes, while the topmost
node is called the root node. Each node contains pointers to its child nodes.
A tree data structure is based on the parent-child relationship among its nodes. Each node in the
tree can have more than one child, except the leaf nodes, which have none; each node has exactly
one parent, except the root node, which has none. Trees can be classified into many categories,
which will be discussed later in this tutorial.
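A minimal sketch of a general tree in Python, where each node keeps a list of its children. The names `TreeNode`, `add_child` and `leaves` are hypothetical; the sample tree has root A, children B and C, and B has children D and E.

```python
class TreeNode:
    """A general tree node: a value plus a list of child nodes."""
    def __init__(self, value):
        self.value = value
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

    def is_leaf(self):
        return not self.children  # a leaf has no children

def leaves(node):
    """Collect the values of all leaf nodes below (and including) `node`."""
    if node.is_leaf():
        return [node.value]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result

root = TreeNode("A")
b = root.add_child(TreeNode("B"))
root.add_child(TreeNode("C"))
b.add_child(TreeNode("D"))
b.add_child(TreeNode("E"))
print(leaves(root))  # ['D', 'E', 'C']
```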
Graphs: A graph can be defined as a pictorial representation of a set of elements
(represented by vertices) connected by links known as edges. A graph differs from a tree
in that a graph can contain cycles, while a tree cannot.
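A minimal sketch of a graph stored as an adjacency list (a dictionary mapping each vertex to its neighbours), with a breadth-first traversal. The sample graph is hypothetical; its edges A-B, B-C and C-A form a cycle, which is exactly what a tree cannot contain, so the traversal needs a `visited` set.

```python
from collections import deque

# Undirected graph as an adjacency list; A-B-C-A is a cycle
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B"],
}

def bfs(graph, start):
    """Return vertices in breadth-first order; `visited` prevents looping round cycles."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        for neighbour in graph[vertex]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order

print(bfs(graph, "A"))  # ['A', 'B', 'C', 'D']
```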
Characteristics of data structures
1. It contains data items that can be an elementary item, a group item or another data structure.
2. It has a set of operations that can be performed on the data items, such as searching and insertion.
3. It describes the rules of how the data items are related to each other.
1. Linear or non-linear: This characteristic describes whether the data items are arranged
in sequential order, as in an array, or in a non-sequential arrangement, as in a
graph.
Primitive types: primitive data structures, with Python examples. Non-primitive types: non-primitive
data structures, with Python examples. Linear and non-linear data structures, with
Python examples.
The primitive data types in Python are:
• Integers
• Float
• Strings
• Boolean
Integers
You can use an integer to represent numeric data, and more specifically, whole numbers, positive
or negative, such as 4, 5 or -1. Python integers have no fixed size limit; they can grow as large
as available memory allows.
Float
"Float" stands for "floating-point number". You can use it for real numbers with a fractional
part, written with a decimal point, such as 1.11 or 3.14.
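A short illustration of Python integers and floats; the variable names and values are illustrative only.

```python
n = 4
m = -1
big = 2 ** 100            # ints are not limited to a fixed number of bits
x = 3.14
y = 1.11

print(type(n), type(x))   # <class 'int'> <class 'float'>
print(big)                # 1267650600228229401496703205376
print(7 // 2, 7 / 2)      # 3 3.5  (// is floor division, / always gives a float)
print(0.1 + 0.2 == 0.3)   # False: most decimal fractions are stored approximately
```

The last line is why floats are usually compared with a tolerance rather than with `==`.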
Strings:
Strings are sequences of characters such as letters, digits, spaces and punctuation. In Python,
you can create strings by enclosing a sequence of characters within a pair of single or double
quotes, for example 'cake' or "cookie".
You can also apply the + operator to two or more strings to concatenate them.
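A short illustration of creating, indexing and concatenating strings, reusing the 'cake' and "cookie" examples from the text.

```python
a = 'cake'
b = "cookie"
s = a + " and " + b          # + concatenates strings
print(s)                     # cake and cookie
print(len(a), a[0], a[-1])   # 4 c e
print(a * 2)                 # cakecake  (* repeats a string)
```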
Boolean
Boolean is a built-in data type that can take one of two values: True and False. Because bool
is a subclass of int, True and False behave like the integers 1 and 0. Booleans are useful in
conditional and comparison expressions.
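A few examples of Booleans in comparison and conditional expressions; the values are illustrative.

```python
x, y = 5, 3
print(x > y)              # True
print(x == y)             # False
print(True + True)        # 2, because bool is a subclass of int
print(bool(0), bool(""))  # False False: zero and empty values are falsy

if x > y and y > 0:       # Booleans drive conditional statements
    print("x is larger and y is positive")
```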
Non-primitive data structure:
Non-primitive data types include Array, List, Tuple, Dictionary, Set and File. Some of these
non-primitive data types, such as List, Tuple, Dictionary and Set, are built into Python.
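Array:
An array is a special variable, which can hold more than one value at a time.
Storing a list of car names in separate variables looks like this:
car1 = "Ford"
car2 = "Volvo"
car3 = "BMW"
An array holds all of them in a single variable (in Python, a list is normally used for this):
cars = ["Ford", "Volvo", "BMW"]
Access the elements of an array by index:
x = cars[0]
Output: "Ford"
The value of an item, such as the first one, can be modified by assigning to its index.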
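A short sketch of modifying an array item, using a Python list as the array; "Toyota" is an illustrative value. For arrays restricted to one numeric type, the standard `array` module can be used; the type code `"i"` means signed int.

```python
from array import array

cars = ["Ford", "Volvo", "BMW"]
cars[0] = "Toyota"          # modify the first item
print(cars)                 # ['Toyota', 'Volvo', 'BMW']
print(len(cars))            # 3

# A typed array: every element must be a signed int
nums = array("i", [10, 20, 30])
nums[1] = 25
print(nums.tolist())        # [10, 25, 30]
# nums[0] = "ten" would raise TypeError, because the element type is fixed
```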
LIST: Lists are used to store multiple items in a single variable.
List items are indexed, the first item has index [0], the second item has index [1] etc.
Characteristics of Lists:
List items are ordered, changeable, and allow duplicate values.
EXAMPLE:
Create a List:
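A short sketch of creating and changing a list, showing that items are indexed from 0, duplicates are allowed, and the list can be modified in place. The fruit values are illustrative.

```python
fruits = ["apple", "banana", "cherry", "apple"]  # duplicates are allowed
print(fruits[0], fruits[-1])   # apple apple
fruits.append("orange")        # lists are changeable
fruits[1] = "blueberry"
fruits.remove("apple")         # removes only the first "apple"
print(fruits)                  # ['blueberry', 'cherry', 'apple', 'orange']
print(len(fruits))             # 4
```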
TUPLE:
A tuple is written as a collection of comma-separated (,) values enclosed in parentheses ().
The parentheses are optional, but using them is good practice. A tuple can be
defined as follows:
T1 = (101, "Peter", 22)
Characteristics of Tuple:
Tuple items are ordered, unchangeable (immutable), and allow duplicate values.
EXAMPLE:
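A short sketch of tuples, reusing `T1` from the text. It shows that the parentheses are optional, that a one-element tuple needs a trailing comma, and that assigning to an item fails because tuples are immutable.

```python
T1 = (101, "Peter", 22)
T2 = 101, "Peter", 22          # parentheses are optional
single = (5,)                  # a one-element tuple needs a trailing comma
print(T1 == T2)                # True
print(T1[1])                   # Peter
emp_id, name, age = T1         # tuples can be unpacked into variables
try:
    T1[2] = 23                 # tuples are immutable
except TypeError as err:
    print("TypeError:", err)
```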
Dictionary:
Characteristics of Dictionary:
Dictionary items are ordered (since Python 3.7), changeable, and do not allow duplicate keys.
Dictionary items are stored as key:value pairs and can be referred to by using the key
name.
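A short sketch of a dictionary; the keys and values are illustrative. It shows lookup by key, changing and adding pairs, that assigning to an existing key overwrites it rather than duplicating it, and that iteration follows insertion order.

```python
student = {"name": "Peter", "age": 22, "course": "Data Structures"}
print(student["name"])              # Peter (look up by key)
student["age"] = 23                 # changeable: update a value
student["city"] = "Delhi"           # add a new key:value pair
student["name"] = "Paul"            # existing key is overwritten, not duplicated
print(len(student))                 # 4
print(list(student))                # ['name', 'age', 'course', 'city']
print(student.get("phone", "n/a"))  # n/a: get() avoids KeyError for missing keys
```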
Sets:
A set is an unordered collection of unique elements. Each element must be immutable
(hashable), and duplicate values are removed automatically.
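A short sketch of sets with illustrative values. It shows automatic removal of duplicates, membership tests, set operations, and the rejection of mutable elements such as lists.

```python
nums = {3, 1, 3, 2, 1}          # duplicates are removed automatically
print(len(nums))                # 3
print(sorted(nums))             # [1, 2, 3] (sets themselves are unordered)
nums.add(4)
print(2 in nums)                # True: membership test
evens = {2, 4, 6}
print(sorted(nums & evens))     # [2, 4]  intersection
print(sorted(nums | evens))     # [1, 2, 3, 4, 6]  union
try:
    nums.add([5, 6])            # lists are mutable, so they cannot be set elements
except TypeError as err:
    print("TypeError:", err)
```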
FILES:
In Python, a file is handled in one of two modes, text or binary, depending on its format. In a
text file, each line ends with a special newline character ('\n').
File handling is done in three steps:
o Open a file
o Read or write (perform the operation)
o Close the file
The mode passed to open() can be:
o "r" - Read - Default value. Opens a file for reading; raises an error (FileNotFoundError) if
the file does not exist
o "a" - Append - Opens a file for appending; creates the file if it does not
exist
o "w" - Write - Opens a file for writing; creates the file if it does not exist (an existing
file is overwritten)
o "x" - Create - Creates the specified file; raises an error (FileExistsError) if the file
already exists
EXAMPLE:
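A minimal sketch of the open / read-or-write / close steps with each mode listed above. The file name `notes.txt` is hypothetical, and the file is created in a temporary directory so the example leaves nothing behind. The `with` statement performs the "close the file" step automatically.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w") as f:          # "w": create (or truncate) and write
    f.write("first line\n")
with open(path, "a") as f:          # "a": append to the end
    f.write("second line\n")
with open(path, "r") as f:          # "r": read (the default mode)
    lines = f.read().splitlines()
print(lines)                        # ['first line', 'second line']

try:
    open(path, "x")                 # "x": fails because the file already exists
except FileExistsError:
    print("x mode refused to overwrite", path)
```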
STACK:
A stack is a linear data structure that stores items in a Last-In/First-Out (LIFO), or
First-In/Last-Out (FILO), manner: data items are placed one on top of another, in the same way
as plates are stacked in a kitchen. A simple example of a stack is the Undo feature in an
editor, which undoes the most recent action first.
Just as we always take the top plate from a stack of plates, in a stack a new element is
inserted at one end, and an element can be removed only from that same end.
We can perform two main operations on a stack: PUSH and POP.
The PUSH operation adds an element to the stack, and the POP operation removes an element
from the stack.
EXAMPLE:
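A minimal sketch of a stack built on a Python list, where the end of the list is the top. The class name `Stack` and the `peek`/`is_empty` helpers are hypothetical. The usage mimics the Undo example: the last action pushed is the first one popped.

```python
class Stack:
    """A stack built on a Python list; the end of the list is the top."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)      # add on top: amortised O(1)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack (underflow)")
        return self._items.pop()      # remove from top: O(1)

    def peek(self):
        if self.is_empty():
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self):
        return not self._items

    def __len__(self):
        return len(self._items)

s = Stack()
for action in ["type A", "type B", "delete A"]:
    s.push(action)
print(s.pop())   # delete A  (the last action is undone first)
print(s.peek())  # type B
```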
QUEUE:
A queue is a linear data structure that stores data sequentially. The concept of a queue is
based on FIFO, which means "First In, First Out". It is also known as "first come, first
served". A queue has two ends, front and rear: new elements are inserted at the rear end and
removed from the front end.
o Enqueue - Enqueue is the operation that adds an item to the queue. If
the queue is full, this is the condition of Queue Overflow.
o Dequeue - Dequeue is the operation that removes an element from
the queue. Elements are removed in the same order in which they were inserted. If the
queue is empty, this is the condition of Queue Underflow.
o Front - Elements are removed from the front end.
o Rear - Elements are inserted at the rear end.
# Uncommenting print(queue.pop(0))
# will raise an IndexError
# as the queue is now empty
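A minimal sketch of a queue in Python, assuming the list-based approach that the leftover comment refers to: `append()` enqueues at the rear and `pop(0)` dequeues from the front. Because `pop(0)` shifts every remaining element, the standard `collections.deque`, whose `popleft()` is O(1), is also shown.

```python
from collections import deque

# List-based queue
queue = []
queue.append("a")     # enqueue at the rear
queue.append("b")
queue.append("c")
print(queue.pop(0))   # a  (dequeue from the front)
print(queue.pop(0))   # b
print(queue.pop(0))   # c
# The queue is now empty; another pop(0) would raise IndexError (underflow)

# deque-based queue
q = deque()
q.append(1)           # enqueue at the rear
q.append(2)
print(q.popleft())    # 1  (dequeue from the front)
print(q[0] if q else None)  # 2  (peek at the front)
```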
Abstraction:
An Abstract Data Type (ADT) is a type (or class) for objects whose behaviour is defined by a
set of values and a set of operations.
The definition of ADT only mentions what operations are to be performed but not how
these operations will be implemented. It does not specify how data will be organized in
memory and what algorithms will be used for implementing the operations. It is called
“abstract” because it gives an implementation-independent view. The process of providing
only the essentials and hiding the details is known as abstraction.
An abstract data type (or ADT) is a programmer-defined data type that specifies a set of data values and
a collection of well-defined operations that can be performed on those values. Abstract data types are
defined independent of their implementation, allowing us to focus on the use of the new data type
instead of how it’s implemented. This separation is typically enforced by requiring interaction with the
abstract data type through an interface or defined set of operations. This is known as information hiding.
By hiding the implementation details and requiring ADTs to be accessed through an interface, we can
work with an abstraction and focus on what functionality the ADT provides instead of how that
functionality is implemented.
User programs interact with instances of the ADT by invoking one of the several operations defined by
its interface. The set of operations can be grouped into four categories:
• Constructors: Create and initialize new instances of the ADT.
• Accessors: Return data contained in an instance without modifying it.
• Mutators: Modify the contents of an ADT instance.
• Iterators: Process individual data components sequentially.
The implementation of the various operations is hidden inside a black box, whose contents we
do not need to know in order to use the ADT.
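A minimal sketch of an ADT in Python with one operation from each of the four categories. The `Bag` ADT (an unordered collection that allows duplicates) is a hypothetical example chosen for illustration. Users interact only through the methods, and the list stored in `_items` is a hidden implementation detail that could be replaced without changing user code.

```python
class Bag:
    """A hypothetical Bag ADT: an unordered collection that allows duplicates."""

    def __init__(self):                 # constructor
        self._items = []

    def __len__(self):                  # accessor
        return len(self._items)

    def __contains__(self, item):       # accessor
        return item in self._items

    def add(self, item):                # mutator
        self._items.append(item)

    def remove(self, item):             # mutator
        if item not in self._items:
            raise ValueError("item not in bag")
        self._items.remove(item)

    def __iter__(self):                 # iterator
        return iter(self._items)

bag = Bag()
bag.add(19)
bag.add(74)
bag.add(19)
bag.remove(74)
print(len(bag), 19 in bag, list(bag))   # 2 True [19, 19]
```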
Defining the ADT:
The Gregorian calendar was introduced in the year 1582 by Pope Gregory XIII to replace the
Julian calendar. The new calendar corrected the Julian calendar's miscalculation of the length
of the solar year by refining the leap-year rule: century years are leap years only if they are
divisible by 400. The official first date of the Gregorian calendar is Friday, October 15, 1582.
The proleptic Gregorian calendar extends it backwards to accommodate earlier dates; here the
first date is November 24, 4714 BC, the start of the Julian day count. This extension simplifies
the handling of dates across older calendars, and it is used in many software applications.
A date represents a single day in the proleptic Gregorian calendar, in which the first day is
November 24, 4714 BC.
Date(month, day, year): Creates a new Date instance initialized to the given Gregorian date,
which must be valid. Year 1 BC and earlier are indicated by negative year components.
dayOfWeek(): Returns the day of the week as a number between 0 and 6, with 0 representing
Monday and 6 representing Sunday.
numDays(otherDate): Returns the number of days, as a non-negative integer, between this date
and otherDate.
isLeapYear(): Determines whether this date falls in a leap year and returns the appropriate
Boolean value.
advanceBy(days): Advances the date by the given number of days. The date is incremented if
days is positive and decremented if days is negative. If necessary, the date is capped at
November 24, 4714 BC.
comparable(otherDate): Compares this date to otherDate to determine their logical
ordering. The comparison can be done using any of the logical operators <, <=, >, >=, ==, !=.
toString(): Returns a string representing the Gregorian date in the format mm/dd/yyyy.
Implemented as the Python __str__ method, which is called automatically by the str() constructor.
There are two common approaches to storing a date in an object. One approach stores the three
components—month, day, and year—as three separate fields. With this format, it is easy to
access the individual components, but it’s difficult to compare two dates or to compute the
number of days between two dates since the number of days in a month varies from month to
month. The second approach stores the date as an integer value representing the Julian day,
which is the number of days elapsed since the initial date of November 24, 4714 BC (using the
Gregorian calendar notation). Given a Julian day number, we can compute any of the three
Gregorian components and simply subtract the two integer values to determine which occurs
first or how many days separate the two dates. We are going to use the latter approach as it is
very common for storing dates in computer applications and provides for an easy
implementation.
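A minimal sketch of the Date ADT following the second approach, storing only the Julian day number. Assumptions: the conversions use a standard integer formula for Gregorian dates (the text does not specify one), only years AD are handled, and `_to_jdn` and `_from_jdn` are hypothetical helper names. The method names follow the ADT definition; `comparable` is realised with Python's comparison operators, and `toString` with `__str__`.

```python
from functools import total_ordering

def _to_jdn(month, day, year):
    """Gregorian date -> Julian day number (integer formula)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045

def _from_jdn(jdn):
    """Julian day number -> (month, day, year) in the Gregorian calendar."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - 146097 * b // 4
    d = (4 * c + 3) // 1461
    e = c - 1461 * d // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return month, day, year

@total_ordering
class Date:
    """Sketch of the Date ADT; only the Julian day number is stored."""

    def __init__(self, month, day, year):
        self._jdn = _to_jdn(month, day, year)
        # Invalid dates such as 2/30 do not survive a round trip
        if _from_jdn(self._jdn) != (month, day, year):
            raise ValueError("invalid Gregorian date")

    def dayOfWeek(self):
        return self._jdn % 7             # 0 = Monday ... 6 = Sunday

    def numDays(self, otherDate):
        return abs(self._jdn - otherDate._jdn)

    def isLeapYear(self):
        year = _from_jdn(self._jdn)[2]
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

    def advanceBy(self, days):
        self._jdn = max(0, self._jdn + days)  # JDN 0 is the earliest date

    def __eq__(self, other):
        return self._jdn == other._jdn

    def __lt__(self, other):
        return self._jdn < other._jdn

    def __str__(self):
        month, day, year = _from_jdn(self._jdn)
        return "%02d/%02d/%04d" % (month, day, year)

d = Date(1, 1, 2000)
print(d, d.dayOfWeek(), d.isLeapYear())   # 01/01/2000 5 True
```

With this representation, `numDays` and the comparisons reduce to integer subtraction and comparison, which is the benefit the text describes.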
DATE:
Python's datetime module provides a date type. Its objects have year, month and day attributes
and use the Gregorian calendar.
Date Type Object
A date object represents a date made up of day, month and year parts. It uses the proleptic
Gregorian calendar, in which January 1 of year 1 is day number 1, the next day is day number 2,
and so on.
Some date-related methods are:
Constructor datetime.date(year, month, day)
This is the constructor that creates a date object. All arguments are required and must be
integers. The year must be in the range MINYEAR to MAXYEAR (1 to 9999). If the given date is
not valid, it raises ValueError.
Method date.today()
This method is used to return the current local date.
Method date.fromtimestamp(timestamp)
This method returns the local date corresponding to a POSIX timestamp. If the timestamp is out
of the range supported by the platform, it may raise OverflowError.
Method date.fromordinal(ordinal)
This method returns the date corresponding to a proleptic Gregorian ordinal, i.e. a count of
days in which January 1 of year 1 has ordinal 1.
Method date.toordinal()
This method returns the proleptic Gregorian ordinal of the date.
Method date.weekday()
This method returns the day of the week as an integer, where Monday is 0, Tuesday is 1, and so
on up to Sunday, which is 6.
Method date.isoformat()
This method returns the date as a string in ISO 8601 format, YYYY-MM-DD.
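A short demonstration of the `datetime.date` methods described above, using January 1, 2000 as an illustrative date. The output of `date.today()` depends on when the code is run, so no value is shown for it.

```python
from datetime import date, MINYEAR, MAXYEAR

d = date(2000, 1, 1)
print(d.year, d.month, d.day)                # 2000 1 1
print(d.weekday())                           # 5 (Monday is 0, so 5 is Saturday)
print(d.isoformat())                         # 2000-01-01
print(date(1, 1, 1).toordinal())             # 1
print(date.fromordinal(d.toordinal()) == d)  # True
print(MINYEAR, MAXYEAR)                      # 1 9999
print(date.today())                          # varies from run to run
try:
    date(2001, 2, 30)                        # invalid day for February
except ValueError as err:
    print("ValueError:", err)
```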
UNIT 2: Algorithm Analysis – Space Complexity, Time Complexity
Algorithm Analysis
The efficiency of an algorithm can be analysed at two different stages, before
implementation and after implementation, as follows:
A priori analysis: This is a theoretical analysis of an algorithm. Efficiency is measured by
assuming that all other factors, e.g. processor speed, are constant and have no effect on the
implementation.
A posteriori analysis: This is an empirical analysis of an algorithm. The chosen
algorithm is implemented in a programming language and then executed on the target
computer. In this analysis, actual statistics such as running time and space required are
collected.
Algorithm analysis deals with the execution or running time of the various operations involved.
The running time of an operation can be defined as the number of computer instructions executed
per operation.
Space Complexity
The space complexity of an algorithm represents the amount of memory space the algorithm needs
during its life cycle.
The space needed by an algorithm is the sum of the following two components:
A fixed part, which is the space required to store certain data and variables (e.g. simple
variables and constants, program size, etc.) that are independent of the size of the problem.
A variable part, which is the space required by variables whose size depends on the size of
the problem, for example recursion stack space, dynamic memory allocation, etc.
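A minimal sketch contrasting the fixed and variable parts, using two hypothetical ways of summing a list. The iterative version needs only a fixed amount of extra space. The recursive version keeps about n stack frames alive at once, so its recursion stack grows with the input. In CPython this is also why a very long input (for example 100,000 elements) makes `sum_recursive` raise `RecursionError` under the default recursion limit.

```python
def sum_iterative(values):
    # Fixed part only: `total` and the loop variable take the same space
    # however long `values` is, so the extra space is O(1).
    total = 0
    for v in values:
        total += v
    return total

def sum_recursive(values, i=0):
    # Variable part: each pending call adds a stack frame, so n elements
    # need about n frames at once, i.e. O(n) extra space.
    if i == len(values):
        return 0
    return values[i] + sum_recursive(values, i + 1)

data = list(range(1, 101))
print(sum_iterative(data), sum_recursive(data))  # 5050 5050
```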
Time Complexity
The time complexity of an algorithm represents the amount of time the algorithm requires to run
to completion. Time requirements can be expressed as a numerical function t(n) of the input
size n, where t(n) is measured as the number of steps, provided each step takes constant time.
For example, adding two n-bit integers takes n steps, one per bit. Consequently, the
total computational time is t(n) = c*n, where c is the time taken to add two bits.
Here, t(n) grows linearly as the input size increases.
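A minimal sketch of the n-bit addition example, counting one step per bit so that the step count equals n. The function name `add_bits` and the bit-list representation (least significant bit first) are assumptions made for illustration.

```python
def add_bits(a, b):
    """Add two n-bit numbers given as bit lists, least significant bit first.
    Returns (sum_bits, steps), where each step is one single-bit addition."""
    if len(a) != len(b):
        raise ValueError("both numbers must have n bits")
    result, carry, steps = [], 0, 0
    for i in range(len(a)):       # exactly n iterations
        s = a[i] + b[i] + carry
        result.append(s % 2)
        carry = s // 2
        steps += 1                # each step costs a constant time c
    if carry:
        result.append(carry)
    return result, steps

print(add_bits([1, 1, 0], [1, 0, 1]))   # ([0, 0, 0, 1], 3): 3 + 5 = 8 in 3 steps
```

Doubling the number of bits doubles the step count, which is the linear growth t(n) = c*n.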
Asymptotic notation
The word asymptotic means approaching a value or curve arbitrarily closely (i.e., as
some sort of limit is taken).
Asymptotic notations describe how the running time of an algorithm grows as the input size
grows. They are commonly used to express the fastest and slowest possible running times of an
algorithm, referred to as the 'best case' and 'worst case' scenarios respectively.
1. Big-oh notation: Big-oh is the formal method of expressing the upper bound of an
algorithm's running time. We write f(n) = O(g(n)) [read as "f of n is big-oh of g of n"] if and
only if there exist positive constants c and n0 such that f(n) <= c*g(n) for all n >= n0.
It is commonly used to describe the worst-case time complexity, i.e. the longest amount of time
an algorithm can possibly take to complete.
2. Omega notation: Big-omega is the formal method of expressing the lower bound of an
algorithm's running time. We write f(n) = Ω(g(n)) if and only if there exist positive constants
c and n0 such that f(n) >= c*g(n) for all n >= n0. It is commonly used to describe the best-case
time complexity.
3. Theta notation: The notation Θ(g(n)) is the formal way to express both the lower bound
and the upper bound of an algorithm's running time. We write f(n) = Θ(g(n)) if there exist
positive constants c1, c2 and n0 such that c1*g(n) <= f(n) <= c2*g(n) for all n >= n0, where
f(n) and g(n) are functions of the input size n.
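A finite check cannot prove an asymptotic bound, but it can make the constants concrete. For the hypothetical function f(n) = 3n + 2 and g(n) = n, the algebra in the comments gives the constants, and the loop spot-checks them over a range of n.

```python
def f(n):
    return 3 * n + 2

def g(n):
    return n

c1, c2, n0 = 3, 4, 2
# Big-oh:  3n + 2 <= 4n  holds exactly when n >= 2, so f(n) = O(n) with c = 4, n0 = 2
# Omega:   3n + 2 >= 3n  holds for every n >= 1, so f(n) = Omega(n) with c = 3
# Both together give f(n) = Theta(n) with c1 = 3, c2 = 4, n0 = 2
holds = all(c1 * g(n) <= f(n) <= c2 * g(n) for n in range(n0, 10_000))
print(holds)                # True
print(f(1) <= c2 * g(1))    # False: 5 > 4, which is why n0 = 2 is needed
```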
UNIT 3: ALGORITHM DESIGN STRATEGIES:
1. Brute force – Bubble sort, Selection Sort, Linear Search.
BRUTE FORCE:
This is the most basic and simplest type of algorithm. A brute-force algorithm is the
straightforward approach to a problem, i.e., the first approach that comes to mind on
seeing the problem.
Many problems in day-to-day life are solved using the brute-force strategy, for example
exploring all the paths to a nearby market to find the shortest one.
BUBBLE SORT:
Bubble sort is one of the simplest brute-force sorting algorithms. It is used to
sort elements in either ascending or descending order. Each pass compares adjacent pairs of
elements, so sorting n elements takes up to n(n-1)/2 comparisons.
Bubble sort uses straightforward logic: it works by repeatedly swapping adjacent
elements if they are not in the right order. It compares one pair at a time and swaps the pair
if the first element is greater than the second; otherwise, it moves on to the next pair of
elements.
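CODE:
def bubbleSort(arr):
    n = len(arr)
    for i in range(n - 1):
        for j in range(n - i - 1):
            if arr[j] > arr[j + 1]:  # swap adjacent elements that are out of order
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
arr = [64, 34, 25, 12, 22, 11, 90]  # example input
bubbleSort(arr)
for i in range(len(arr)):
    print("%d" % arr[i], end=" ")
OUTPUT:
11 12 22 25 34 64 90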
SELECTION SORT:
The selection sort algorithm sorts an array by repeatedly finding the minimum
element (considering ascending order) from unsorted part and putting it at the beginning.
The algorithm maintains two subarrays in a given array.
1) The subarray which is already sorted.
2) Remaining subarray which is unsorted.
In every iteration of selection sort, the minimum element (considering ascending order)
from the unsorted subarray is picked and moved to the sorted subarray.
Selection sort is a simple sorting algorithm. This sorting algorithm is an in-place comparison-
based algorithm in which the list is divided into two parts, the sorted part at the left end and
the unsorted part at the right end.
CODE:
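A minimal selection sort in Python, following the description above: `arr[:i]` is the sorted subarray, and each pass moves the minimum of the unsorted subarray `arr[i:]` to its front. The function name mirrors the `bubbleSort` style used in this unit, and the input list is illustrative.

```python
def selectionSort(arr):
    n = len(arr)
    for i in range(n - 1):
        # Find the index of the minimum element in the unsorted part arr[i:]
        min_idx = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        # Swap it to the front of the unsorted part, growing the sorted part
        arr[i], arr[min_idx] = arr[min_idx], arr[i]

arr = [64, 25, 12, 22, 11]
selectionSort(arr)
print(*arr)   # 11 12 22 25 64
```

Only one swap is made per pass, so selection sort performs at most n - 1 swaps even though it always makes n(n-1)/2 comparisons.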
LINEAR SEARCH:
Linear search is a method of finding an element within a list. It is also called sequential
search. It is the simplest searching algorithm because it checks each element in sequence
until it finds the desired one.
CODE:
# Searching an element in a list/array in Python
# can be done simply using the 'in' operator.
# Example:
# if x in arr:
#     print(arr.index(x))
def search(arr, x):
    for i in range(len(arr)):
        if arr[i] == x:
            return i
    return -1
EXAMPLE:
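A small example of linear search that also counts comparisons, to show its sequential behaviour. The function name `linear_search_count` and the sample data are hypothetical. A successful search stops at the first match, while an unsuccessful one must examine every element.

```python
def linear_search_count(arr, x):
    """Return (index of x or -1, number of comparisons made)."""
    comparisons = 0
    for i, value in enumerate(arr):
        comparisons += 1
        if value == x:
            return i, comparisons
    return -1, comparisons

data = [10, 20, 30, 40, 50]
print(linear_search_count(data, 30))   # (2, 3)
print(linear_search_count(data, 99))   # (-1, 5): every element was checked
```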
DECREASE AND CONQUER:
Decrease and conquer works in three steps:
1. Decrease: reduce the problem instance to a smaller instance of the same problem.
2. Conquer: solve the smaller instance of the problem.
3. Extend: extend the solution of the smaller instance to obtain a solution to the original
problem.
The basic idea of the decrease-and-conquer technique is to exploit the relationship between a
solution to a given instance of a problem and a solution to a smaller instance of it.
This approach is also known as the incremental or inductive approach.
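Binary search, mentioned earlier under Searching, is a classic decrease-and-conquer algorithm that decreases the problem by a constant factor. Each comparison discards half of the remaining range, and the answer for the smaller range is directly the answer for the whole list. A minimal sketch, assuming the input list is already sorted in ascending order; the function name is hypothetical.

```python
def binary_search(arr, x):
    """Search a sorted list; each step halves the remaining range."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == x:
            return mid
        if arr[mid] < x:
            low = mid + 1      # decrease: x can only be in the right half
        else:
            high = mid - 1     # decrease: x can only be in the left half
    return -1

data = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
print(binary_search(data, 23))   # 5
print(binary_search(data, 7))    # -1
```

Because the range halves each time, at most about log2(n) + 1 comparisons are needed, compared with up to n for linear search.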
INSERTION SORT:
Insertion sort is a simple sorting algorithm that works similarly to the way you sort playing
cards in your hands. The array is virtually split into a sorted part and an unsorted part. Values
from the unsorted part are picked and placed at the correct position in the sorted part.
It is an in-place and stable algorithm that works well for nearly sorted arrays or small
numbers of elements.
Insertion sort is not fast for large inputs because it uses nested loops to sort the elements,
which takes O(n^2) time in the worst case.
o In-place: An in-place algorithm needs only a constant amount of extra space, regardless of
the input size. It sorts by rearranging the elements within their original memory
locations in the collection.
o Stable: A stable algorithm preserves the relative order of equal elements from the
initial array.
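CODE:
# Python program for implementation of Insertion Sort
def insertionSort(arr):
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:  # shift larger sorted elements one place right
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key  # place key in the gap
arr = [12, 11, 13, 5, 6]
insertionSort(arr)
print(*arr, sep="\n")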