
DATA STRUCTURE

Prof. ACHAL KAUSHIK


Paper Code(s): CIC-209
Paper: Data Structures (L: 4, P: –, C: 4)
Marking Scheme:
1. Teachers Continuous Evaluation: 25 marks
2. Term end Theory Examinations: 75 marks
Instructions for paper setter:
1. There should be 9 questions in the term end examinations question paper.
2. The first (1st) question should be compulsory and cover the entire syllabus. This question should be objective, single line answers or short
answer type question of total 15 marks.
3. Apart from question 1 which is compulsory, rest of the paper shall consist of 4 units as per the syllabus. Every unit shall have two questions
covering the corresponding unit of the syllabus. However, the student shall be asked to attempt only one of the two questions in the unit.
Individual questions may contain upto 5 sub-parts / sub-questions. Each Unit shall have a marks weightage of 15.
4. The questions are to be framed keeping in view the learning outcomes of the course / paper. The standard / level of the questions to be asked
should be at the level of the prescribed textbook.
5. The requirement of (scientific) calculators / log-tables / data – tables may be specified if required.
Course Objectives :
1. To introduce basics of Data structures (Arrays, strings, linked list etc.)
2. To understand the concepts of Stacks, Queues and Trees, related operations and their implementation
3. To understand sets, heaps and graphs
4. To introduce various Sorting and searching Algorithms
Course Outcomes (CO)
CO 1 To be able to understand difference between structured data and data structure
CO 2 To be able to create common basic data structures and trees
CO 3 To have a knowledge of sets, heaps and graphs
CO 4 To have basic knowledge of sorting and searching algorithms
Course Outcomes (CO) to Programme Outcomes (PO) mapping (scale 1: low, 2: Medium, 3: High)
PO01 PO02 PO03 PO04 PO05 PO06 PO07 PO08 PO09 PO10 PO11 PO12

CO 1 3 2 2 2 3 - - - 2 2 2 3
CO 2 3 2 2 2 3 - - - 2 2 2 3
CO 3 3 2 2 2 3 - - - 2 2 2 3
CO 4 3 2 2 2 3 - - - 2 2 2 3
UNIT – I
Overview of data structure, Basics of Algorithm Analysis including Running Time Calculations, Abstract Data Types, Arrays, Arrays and Pointers,
Multidimensional Array, String processing, General Lists and List ADT, List manipulations, Single, double and circular lists. Stacks and Stack ADT,
Stack Manipulation, Prefix, infix and postfix expressions, recursion. Queues and Queue ADT, Queue manipulation.

UNIT – II
Sparse Matrix Representation (Array and Link List representation) and arithmetic (addition, subtraction and multiplication), polynomials and
polynomial arithmetic.
Trees, Properties of Trees, Binary trees, Binary Tree traversal, Tree manipulation algorithms, Expression trees and their usage, binary search trees,
AVL Trees, Heaps and their implementation, Priority Queues, B-Trees, B* Tree, B+ Tree

UNIT – III
Sorting concept, order, stability, Selection sorts (straight, heap), insertion sort (Straight Insertion, Shell sort), Exchange Sort (Bubble, quicksort),
Merge sort (External Sorting) (Natural merge, balanced merge and polyphase merge). Searching – List search, sequential search, binary search,
hashing methods, collision resolution in hashing.

UNIT – IV
Disjoint sets representation, union find algorithm, Graphs, Graph representation, Graph Traversals and their implementations (BFS and DFS).
Minimum Spanning Tree algorithms, Shortest Path Algorithms

Textbook(s):
1. Richard Gilberg, Behrouz A. Forouzan, “Data Structures: A Pseudocode Approach with C”, 2nd Edition, Cengage Learning, Oct 2004
2. E. Horowitz, S. Sahni, S. Anderson-Freed, "Fundamentals of Data Structures in C", 2nd Edition, Silicon Press (US), 2007.

References:
1. Mark Allen Weiss, “Data Structures and Algorithm Analysis in C”, 2nd Edition, Pearson, September, 1996
2. Robert Kruse, “Data Structures and Program Design in C”, 2nd Edition, Pearson, November, 1990
3. Seymour Lipschutz, “Data Structures with C (Schaum's Outline Series)”, McGrawhill, 2017
4. A. M. Tenenbaum, “Data structures using C”. Pearson Education, India, 1st Edition 2003.
5. Weiss M.A., “Data structures and algorithm analysis in C++”, Pearson Education, 2014.
Data Structure

• DATA
• Dictionary – sorted list of words
• Map – 2D plane, position, direction
• Cash book – Tabular, cash-in cash-out
DATA STRUCTURING
• Organizing DATA to solve the problem
• Establishes some level of order
• Arranges things so they can be retrieved quickly

• As data grows, it requires some management

• Linking
• Ordering
• Grouping etc.
Data Structure
• A Data Structure is a way of collecting and organizing data in such a
way that we can perform operations on these data in an effective
way
• In simple language, Data Structures are structures programmed to
store ordered data, so that various operations can be performed on
it easily
Why DS?
• Complex and data-rich applications face three common problems
nowadays:
• Data Search −
• Processor speed −
• Multiple requests −
• Data can be organized in a data structure in such a way that not all
items need to be searched, and the required data can be found
almost instantly
ALGORITHM

• An algorithm is a finite set of instructions or logic, written in order, to
accomplish a certain predefined task
• Algorithmic solution of the problem
INPUT → ALGORITHM → OUTPUT
• Different algorithms may be possible for the same problem
• An algorithm is designed to achieve the optimum solution for a given
problem
• Efficient algorithm
• Space used
• Running time
• PSEUDOCODE
• PSEUDOCODE
PSEUDOCODE
• A mixture of natural language and high-level programming concepts that
describe the main idea behind a generic implementation of a data structure
or algorithm
Pseudocode = part English + part structured code

• Pseudocode: more structured than plain English, but less formal
than a programming language

Categories of Algorithms
• An algorithm is a step-by-step procedure, which defines a set of
instructions to be executed in a certain order to get the desired output
• From data structure point of view, following are some important
categories of algorithms −
• Search − to search an item in a data structure.
• Sort − to sort items in certain order
• Insert − to insert item in a data structure
• Update − to update an existing item in a data structure
• Delete − to delete an existing item from a data structure
Characteristics of an Algorithm
• An algorithm should have the following characteristics −
• Unambiguous − Each of its steps (or phases), must lead to only one meaning
• Input − should have 0 or more well defined inputs.
• Output − should have 1 or more well defined outputs.
• Finiteness − must terminate after a finite number of steps.
• Feasibility − Should be feasible with available resources.
• Independent − independent of any programming code.
Analysis of Algorithm
• Primitive operations: Low level operations independent of programming
language
• Data movement ( Assignment)
• Control (Branch, subroutine call, return)
• Arithmetic or logical operation (+, -, > )

• By inspecting the pseudocode, we can count the number of primitive
operations executed by an algorithm
Algorithm Complexity
• Two main factors decide the efficiency of an algorithm:
• Time Factor − The time is measured by counting the number of key operations
such as comparisons in sorting algorithm
• Space Factor − The space is measured by counting the maximum memory space
required by the algorithm.

• The complexity of an algorithm f(n) gives the running time and / or storage space
required by the algorithm in terms of n as the size of input data.
Time Complexity
• Time Complexity of an algorithm represents the amount of time
required by the algorithm to run to completion.
• Time requirements can be defined as a numerical function T(n),
where T(n) can be measured as the number of steps, provided each
step consumes constant time.
Space Complexity
• Space complexity represents the amount of memory space required
by the algorithm in its life cycle
• Instruction Space : space required to store the executable version of
the program (number of lines of code)
• Data Space : space required to store constants and variables
• Environment Space : space required to store the environment
information needed to resume the suspended function
Algorithm Types
• Backtracking algorithms
• Branch and bound algorithms
• Brute force algorithms
• Divide and conquer algorithms
• Dynamic programming algorithms
• Greedy algorithms
• Randomized algorithms
• Simple recursive algorithms
Greedy algorithms
• In the greedy algorithm approach, decisions are made from the
given solution domain.
• The closest solution that seems to provide the optimum is
chosen.
• Greedy algorithms try to find a localized optimum solution,
which may eventually lead to a globally optimized solution.
• But in general, greedy algorithms do not guarantee globally
optimized solutions.
• Counting Coins
Greedy algorithms
• For currency system,
• where we have coins of 1, 7, 10 value,
• counting coins for value 18 is absolutely optimum (10, 7, 1)
• For a value like 15, it may use more coins than necessary.
• For example − the greedy approach will use
• 10 + 1 + 1 + 1 + 1 + 1, a total of 6 coins
• whereas the same value could be made with only 3 coins (7 +
7 + 1)
• Hence, we may conclude that greedy approach picks immediate
optimized solution and may fail where global optimization is major
concern.
Examples
• Many networking algorithms use the greedy approach. Here is a list
of a few of them −
• Travelling Salesman Problem
• Prim's Minimal Spanning Tree Algorithm
• Kruskal's Minimal Spanning Tree Algorithm
• Dijkstra's Shortest Path Algorithm
• Graph - Map Coloring
• Graph - Vertex Cover
• Knapsack Problem
• Job Scheduling Problem
Divide-and-Conquer approach
• In the divide-and-conquer approach, the problem at hand is divided
into smaller sub-problems, and then each sub-problem is solved
independently.
• Keep dividing the sub-problems into even smaller sub-problems, until
no further division is possible ("atomic" sub-problems)
• The solutions of all sub-problems are finally merged to obtain the
solution of the original problem.
Divide-and-Conquer approach
• Broadly, the divide-and-conquer approach is a three-step process:
• Divide/Break
• Conquer/Solve
• Merge/Combine
• This algorithmic approach works recursively, and the conquer and
merge steps work so closely together that they appear as one.
Examples

• The following computer algorithms are based on the divide-and-conquer
programming approach −
• Merge Sort
• Quick Sort
• Binary Search
• Strassen's Matrix Multiplication
• Closest pair (points)
Dynamic programming approach
• The dynamic programming approach is similar to divide and conquer in
• breaking down the problem into smaller and smaller possible sub-problems
• but the results of these smaller sub-problems are remembered and reused for
similar or overlapping sub-problems.

• Dynamic algorithms use memoization (storing sub-problem results for reuse).


Dynamic programming approach
• The following computer problems can be solved using dynamic
programming approach −
• Fibonacci number series
• Knapsack problem
• Tower of Hanoi
• All pair shortest path by Floyd-Warshall
• Shortest path by Dijkstra
• Project scheduling
Comparison

• Divide and conquer algorithms: sub-problem solutions are combined
to achieve the overall solution
• Dynamic algorithms: overall optimization
• Greedy algorithms: local optimization is addressed
Algorithm Efficiency
• One problem – many solutions (algorithms)
• Check which one is more efficient
• An algorithm's efficiency is expressed as a function of the number of
elements to be processed
f(n) = efficiency
• If the function is linear (no loops or recursion)
• efficiency is a function of the number of instructions
• and depends on the speed of the computer
Linear Loops
• Check how many times the body of the loop is repeated
for (i = 0; i < 1000; i++)
    application code
• The answer is 1000, since the number of iterations equals the loop
factor, i.e. 1000
f(n) = n
Linear Loops
• Check how many times the body of the loop is repeated
for (i = 0; i < 1000; i += 2)
    application code
• The answer is 500, since the number of iterations is half the loop
factor, i.e. 500
f(n) = n/2
• Plotting either of these loop examples gives a straight line (LINEAR
LOOPS)
Logarithmic Loops
• In linear loops, the loop update either adds or subtracts
• In logarithmic loops, the controlling variable is multiplied or divided
in each iteration
• MULTIPLY LOOPS
for (i = 1; i <= 1000; i *= 2)
    application code
• DIVIDE LOOPS
for (i = 1000; i >= 1; i /= 2)
    application code
Logarithmic Loop
Multiply                        Divide
Iteration   Value of i          Iteration   Value of i
1           1                   1           1000
2           2                   2           500
3           4                   3           250
4           8                   4           125
5           16                  5           62
6           32                  6           31
7           64                  7           15
8           128                 8           7
9           256                 9           3
10          512                 10          1
(exit)      1024                (exit)      0
Analysis of Multiply and Divide Loops
• The number of iterations is a function of the multiplier or
divisor, in this case 2
• The loop continues while the condition is true
Multiply: 2^iterations ≤ 1000
Divide: 1000 / 2^iterations ≥ 1

y = b^x ....is equivalent to... log_b(y) = x

f(n) = log n
Nested Loops
• When loops contain loops, we must determine how many iterations
each loop completes
• The total number of iterations is the product of the number of
iterations in the inner loop and the number of iterations in the outer loop

Iterations = outer loop iterations × inner loop iterations


Quadratic
for (i = 0; i < 10; i++)
    for (j = 0; j < 10; j++)
        application code
• The number of times the inner loop executes is
the same as the number of times the outer loop
executes
• Generalization:
f(n) = n²
Linear Logarithmic
for (i = 0; i < 1000; i++)
    for (j = 1; j <= 1000; j *= 2)
        application code
• The inner loop is a multiply loop (note that j must start at 1,
not 0, or it would never change)
• The number of iterations of the inner loop is log₂ 1000
• The inner loop is controlled by the outer loop, so the total
iterations = 1000 log₂ 1000
• Generalization: f(n) = n log n
Dependent Quadratic
for (i = 0; i < 10; i++)
    for (j = 0; j <= i; j++)
        application code

f(n) = ???
SORTING
• Input: 3, 4, 6, 8, 9, 7, 2, 5, 1
• Output: 1, 2, 3, 4, 5, 6, 7, 8, 9
• How to sort them? ALGORITHM
• First sorting technique – INSERTION SORT
• Playing cards
Insertion Sorting Example
• 3, 4, 6, 8, 9, 7, 2, 5, 1

• 3, 4, 6, 8, 9, 7, 2, 5, 1; Key=4

• 3, 4, 6, 8, 9, 7, 2, 5, 1; Key=6

• 3, 4, 6, 8, 9, 7, 2, 5, 1; Key=8

• 3, 4, 6, 8, 9, 7, 2, 5, 1; Key=9

• 3, 4, 6, 8, 9, 7, 2, 5, 1; Key=7

• 3, 4, 6, 8, 7, 9, 2, 5, 1; Key=7

• 3, 4, 6, 7, 8, 9, 2, 5, 1; Key=7 … so on

• 1, 2, 3, 4, 5, 6, 7, 8, 9
Analysis of Insertion Sort

Steps                                         Cost   Times
for j ← 2 to n                                C1     n (roughly)
    Key ← A[j]                                C2     n − 1
    // Insert A[j] into the sorted
    // sequence A[1 .. j−1]
    i ← j − 1                                 C3     n − 1
    while i > 0 and A[i] > Key                C4     Σ_{j=2..n} t_j
        A[i + 1] ← A[i]                       C5     Σ_{j=2..n} (t_j − 1)
        i ← i − 1                             C6     Σ_{j=2..n} (t_j − 1)
    A[i + 1] ← Key                            C7     n − 1

(For the running example 3, 4, 6, 8, 9, 7, 2, 5, 1, the while test first
does real shifting work when 9 and 7 are compared.)
Insertion Sorting
• t_j is the number of times the while-loop test runs for the jth element
• it accounts for shifting elements to the right while inserting the jth card
• Inserting 7 shifts TWO cards

• Total Time ≈
n(C1 + C2 + C3 + C7) + Σ_{j=2..n} t_j (C4 + C5 + C6) − (C2 + C3 + C5
+ C6 + C7)
• How does t_j affect the total?
Insertion Sorting
• Best case: Elements already sorted
• t_j = 1; only compares with the last element
• Running time is linear time f(n) - LOW

• Worst case: Elements sorted in reverse order

• t_j = j; comparisons with all the elements
• Running time is quadratic time f(n²) - HIGH

• Average case: Elements in random order

• t_j = j/2;
• Running time is quadratic time f(n²) - AVERAGE
Insertion Sorting
• The worst case is usually used
• It provides an upper bound

• The average case is often of the same order as the worst case

• Finding the average case can be very difficult
• It requires considering all possible instances (infinitely many)
Execution Time Cases
• There are three cases which are usually used to compare
various data structure's execution time in relative
manner.
• Worst Case − a particular data structure operation takes
maximum time it can take.
• Average Case − depicting the average execution time of
an operation of a data structure. If an operation takes ƒ(n)
time in execution, then m operations will take mƒ(n)
time.
• Best Case − depicting the least possible execution time
of an operation of a data structure.
Algorithm Efficiency
• Due to HIGH processor speed
• Exact measurement of an algorithm's efficiency is not required
• The general order of magnitude is sufficient

• The number of statements executed in the function for n
elements of data is a function of the number of elements,
expressed as f(n)
• LOOK FOR THE DOMINANT FACTOR
Asymptotic analysis of an Algorithm

• Characterizing algorithms according to their efficiency
❑ the order of growth of the running time of an algorithm,
❑ not the exact running time

• This is also referred to as the asymptotic running time


• Talk about rate of growth of functions so that we can
compare algorithms
• Asymptotic notation gives us a method for classifying
functions according to their rate of growth
Asymptotic analysis of an Algorithm

• Refers to
• defining the mathematical bounds/framing of an algorithm's
run-time performance as a function of the input size
• i.e., if there is no input to the algorithm, it is
concluded to work in constant time
• Other than the "input", all other factors are
considered constant
Asymptotic analysis
• GOAL: to simplify analysis of running time by getting rid
of "details" which may be affected by a specific
implementation and hardware
• For instance 10000001 ≈ 10000000
• And f(n) = 3n² ≈ n²
• constants like 3 usually depend on the computer system (H/w etc.)

• The running time increases with the size of the input


• Asymptotically more efficient algorithms are best for
all but small inputs
Asymptotic Notations
•Following are commonly used asymptotic
notations used in calculating running time
complexity of an algorithm.
•Ο Notation
•Ω Notation
•θ Notation
Big O Notation

•No need to find the complete measure


of efficiency, only the factor that
determines the magnitude
•This factor is big-O: “on the order of ”
•O(n) is on the order of n
Big Oh Notation, Ο
• The Ο(n) is the formal way to express the upper bound of an
algorithm's running time
• It measures the worst case time complexity or longest
amount of time an algorithm can possibly take to complete.
• For example, for a function f(n):

f(n) = O(g(n)) if there exist constants c > 0 and n₀
such that f(n) ≤ c·g(n) for all n > n₀
Omega Notation, Ω
• The Ω(n) is the formal way to express the lower bound of an
algorithm's running time.
• It measures the best case time complexity, the least amount of
time an algorithm can possibly take to complete.
• For example, for a function f(n):

f(n) = Ω(g(n)) if there exist constants c > 0 and n₀
such that c·g(n) ≤ f(n) for all n > n₀
Theta Notation, θ
• The θ(n) is the formal way to express both the lower bound
and upper bound of an algorithm's running time

• f(n) = θ(g(n)) if
• there exist c₁, c₂ > 0 and n₀
• such that
• c₁·g(n) ≤ f(n) ≤ c₂·g(n) for all n > n₀

• f(n) = θ(g(n)) if and only if f(n) = Ο(g(n)) and f(n) = Ω(g(n))


Big Oh Notation: Deriving Big-O
• Rules to derive the big-O notation from f(n):
• Drop
• lower order terms &
• constant factors (coefficients)

• Terms ranked from lower to higher order:

log n ≺ n ≺ n log n ≺ n² ≺ n³ ≺ … ≺ nᵏ ≺ 2ⁿ ≺ n!

(a polynomial grows slower than any exponential → nᵏ ≺ 2ⁿ)

Big Oh Notation, O
Following is a list of some common asymptotic notations −

constant − Ο(1)
logarithmic − Ο(log n)
linear − Ο(n)
n log n − Ο(n log n)
quadratic − Ο(n²)
cubic − Ο(n³)
polynomial − n^Ο(1)
exponential − 2^Ο(n)
Examples
• f(n) = 50 n log n is O(n log n)
• f(n) = 7n – 3 is O(n)
• f(n) = 8n2 log n + 5 n2 + n is O(n2 log n)

• f(n) = n(n+1)/2 is O( ? )
• f(n) = aj nk + aj-1 nk-1 + aj-2 nk-2 + … a2 n2 + a1 n +
a0 is O(?)
Example
• Show that f(x) = 4x² − 5x + 3 is O(x²)
|f(x)| = |4x² − 5x + 3| ≤ |4x²| + |−5x| + |3|
       ≤ 4x² + 5x + 3, for all x > 0
       ≤ 4x² + 5x² + 3x², for all x > 1
       ≤ 12x², for all x > 1
Hence we conclude that f(x) is O(x²)
Example
• By definition, f(n) is O(g(n)) if:
there exist constants c, n₀ where c > 0, s.t. for all n > n₀:
f(n) ≤ c · g(n)

So to prove that f(x) = 4x² − 5x + 3 is O(x²)

we need to show that:
there exist constants c, n₀ where c > 0, s.t. for all x > n₀:
f(x) ≤ c · g(x)
4x² − 5x + 3 ≤ c · x²
Example
• So to prove that f(x) = 4x² − 5x + 3 is O(x²)
we need to show that:
there exist constants c, n₀ where c > 0, s.t. for all x > n₀:
f(x) ≤ c · g(x), which means 4x² − 5x + 3 ≤ c · x²
• The aim is to find some c and some n₀ that will work.

The basic strategy is:

- break up f(x) into terms
- for each term, find some term with a coefficient · x² that is
clearly equal to or larger than it
- this shows that f(x) ≤ the sum of the larger x² terms
- the coefficient of the sum of the x² terms will be our c
Example
• Explanation of the provided proof:
• f(x) = 4x² − 5x + 3
• A number is always ≤ its absolute value
• e.g. −1 ≤ |−1| and 2 ≤ |2|
• So we can say that:
• f(x) ≤ |f(x)|
f(x) ≤ |f(x)| = |4x² − 5x + 3|
• 4x² + 3 will always be positive, but −5x will be negative for x > 0
• So we know that −5x ≤ |−5x|, and we can say that:
• f(x) ≤ |4x²| + |−5x| + |3|
• For x > 0, |4x²| + |−5x| + |3| = 4x² + 5x + 3,
• so we can say that: f(x) ≤ 4x² + 5x + 3, for all x > 0
Example
• For x > 0
• |4x²| + |−5x| + |3| = 4x² + 5x + 3
• Now suppose x > 1. Multiplying both sides of x > 1 by x shows
that x² > x
• So we can say that x ≤ x²
• Thus, f(x) ≤ 4x² + 5x² + 3x², for all x > 1
f(x) ≤ 12x², for all x > 1
• So c = 12, and since we had to assume x > 1 we pick n₀
= 1
Hence we conclude that f(x) is O(x²)
Example
• Interestingly:
• n² + n = O(n³)
• Proof: We have f(n) = n² + n, and g(n) = n³
• if n ≥ 1 then n ≤ n³ is clear, and n² ≤ n³ is also clear
• Therefore, n² + n ≤ n³ + n³ = 2n³
• This implies n² + n ≤ 2n³ for all n ≥ 1
• Thus, we have shown that n² + n = O(n³)
• (by the definition of Big-O, with n₀ = 1 and c = 2.)
Question
• Which one is correct?
• 5n² = O(n³) & 5n² = O(n²)
• Both these statements are true: they both just say that 5n²
grows no faster than n³ and n², respectively
• In fact, 5n² grows no faster than n² and grows slower
than n³.
• In ⪯ notation: 5n² ⪯ n² and 5n² ⪯ n³
• This resembles comparing integers: if x = 2, then both
statements x ≤ 2 and x ≤ 3 are correct.
Example
• To prove 5n² + 3n + 20 = O(n²),
• we pick c = 5 + 3 + 20 = 28.
• Then if n ≥ n₀ = 1,
• 5n² + 3n + 20 ≤ 5n² + 3n² + 20n² = 28n²,
• thus 5n² + 3n + 20 = O(n²).
Types of Notations for Time
Complexity
• Big Oh denotes "fewer than or the same as" <expression> iterations.
• Big Omega denotes "more than or the same as" <expression> iterations.
• Big Theta denotes "the same as" <expression> iterations.
• Little Oh denotes "fewer than" <expression> iterations.
• Little Omega denotes "more than" <expression> iterations.
Asymptotic notation: Summary
• Rule 1: Multiplicative constants can be omitted: 8n² + 6n + 4
grows no faster than n²
• Rule 2: Of two polynomials, the one with larger degree grows
faster: for constants a > b > 0, nᵃ grows faster than nᵇ
• Rule 3: Any polynomial grows slower than any exponential:
compare n⁶ and 2ⁿ
• Rule 4: Any polylogarithm grows slower than any polynomial:
log n indeed grows slower than n
• Rule 5: Smaller terms can be omitted: in 8n² + 6n + 4, both 6n
and 4 grow slower than 8n²
Data Type
• DATA TYPE
• set of values
• a set of operations on values

• The INTEGER type consists of a set of values (whole numbers)
and a set of operations (add, subtract, multiply etc.)

DATA TYPE = FIELDS + METHODS
Data Type
• Built-in Data Type
• Derived Data Type
Example

• Company XYZ's employee records (names such as A, C, G, H, J,
with positions Manager, VP, Employee) can be organized in two ways:
• Linear: as a LIST (a name/position table)
• Non-Linear: as a TREE (the organization chart)
Data Representation
• Physical representation
• How data is actually organized in memory cells
(e.g. a block of sequential cells at addresses 20001–20020)

• Logical representation
• How we think of the data as being stored in the computer
Abstract view
• Storing data using a LIST or POINTERS
• Both provide the same functionality
• although they are implemented differently
The ABSTRACT VIEW of the list is distinct from any
particular computer implementation
• The basic operations should be available
• The functionality must be there
• don't worry about the implementation
ABSTRACT DATA TYPE
• DATA STRUCTURE is a programming construct used to
implement abstract data types i.e.

• Physical implementation of abstract data type


• Declaration of data
• Declaration of operations
• Encapsulation of data and operations
Generic code for ADT
• Generic code allows us to write one set of code
& apply it to any data type
• Example
• Generic function to implement a stack structure
• Which can be used to implement an
• Integer stack
• Float stack
• Double stack and so on…
Example
• Linear Data Structure: Ordered List
• Ordered List
❑Keep items in a specific order such as alphabetic/ numeric
❑Whenever an item is added it is placed in the correct sorted position

• What is the abstract view of ordered list?


• LIST→ Abstract view demands: Add item/ Remove item
• LIST → Implementation view: Array/ Pointers
Ordered List: Array Implementation

• Reserve a block of adjacent memory cells large
enough to hold the entire data (ARRAY)
• Inserting 6 into 5 7 8 9 requires shifting 7, 8 and 9

• Two disadvantages −
❑ The list must remain sequential, so we can't leave
blanks in between
❑ An array has a fixed size; it can be increased only if adjacent
free memory cells are available
Ordered List: Pointers
Implementation
• Link groups of memory cells together using pointers
• Each memory cell is called a NODE
• Node →
DATA Pointer to next item

• A linked list is more flexible, but with higher complexity due to indirection
Pointers Implementation

A head pointer (stored elsewhere, e.g. at address 40003) holds 20001,
the address of the first node:

[ 5 | 20010 ] at cells 20001–20002
[ 7 | 30076 ] at cells 20010–20011
[ 8 | 30100 ] at cells 30076–30077
[ 9 | NULL  ] at cells 30100–30101

Each node stores a data value and the address of the next node.

Generic code for ADTs
• need to create generic code for abstract data
types
• Generic code
• allows us to write one set of code and apply it to any
data type
• C has limited capability through two features:
• pointer to void
• pointer to function
Pointer to void
• C is strongly typed
• assign and compare operations must use
• compatible types, or be cast to compatible types

• The one exception is the pointer to void

• It is a generic pointer
• It is not a null pointer
• It can be used to represent any data type during compilation or run time

• Note: it is pointing to a generic data type (void)


#include <stdio.h>

int main ()
{
    // Local Definitions
    void* p;
    int   i = 50;
    float f = 50.5;

    // Statements
    p = &i;
    printf ("i contains: %d\n", *((int*)p));
    p = &f;
    printf ("f contains: %f\n", *((float*)p));
    return 0;
} // main

Results:
i contains: 50
f contains: 50.500000
Pointer to void
• Pointer p, declared as a void pointer, can accept
• the address of an integer or a floating-point number
• Remember about pointers to void:
• a pointer to void cannot be dereferenced unless it is cast

• we cannot use *p without casting


• Example:
• use of malloc to create a pointer to an integer
• intPtr = (int*)malloc (sizeof (int));
Generic function to create:
a node structure
• Structure has two fields: data and link
• The link field is a pointer to the node structure.
• The data field:
• integer
• floating point
• string
• even another structure

• use a void pointer to data stored in dynamic memory


Generic function
•ADTs are stored in their own header files
• write the code for creating the node and
• placing the code in a header file
/* Header file for the node structure: "P1-node.h" */
typedef struct node
{
    void*        dataPtr;
    struct node* link;
} NODE;

/* =================== createNode ====================
   Creates a node in dynamic memory and stores the data pointer in it.
*/
NODE* createNode (void* itemPtr)
{
    NODE* nodePtr;
    nodePtr = (NODE*) malloc (sizeof (NODE));
    nodePtr->dataPtr = itemPtr;
    nodePtr->link    = NULL;
    return nodePtr;
} // createNode
/* Demonstrate simple generic node creation function */
#include <stdio.h>
#include <stdlib.h>
#include "P1-node.h"    // Header file

int main (void)
{
    // Local Definitions
    int*  newData;
    int*  nodeData;
    NODE* node;

    // Statements
    newData  = (int*)malloc (sizeof (int));
    *newData = 7;
    node     = createNode (newData);
    nodeData = (int*)node->dataPtr;
    printf ("Data from node: %d\n", *nodeData);
    return 0;
} // main

The createNode function allocates a node structure in dynamic memory,
stores the data void pointer in the node, and then returns the node's
address.
/* Create a list with two linked nodes */
#include <stdio.h>
#include <stdlib.h>
#include "P1-node.h"    // Header file

int main (void)
{
    // Local Definitions
    int*  newData;
    int*  nodeData;
    NODE* node;

    // Statements
    newData  = (int*)malloc (sizeof (int));
    *newData = 7;
    node = createNode (newData);

    newData  = (int*)malloc (sizeof (int));
    *newData = 75;
    node->link = createNode (newData);

    nodeData = (int*)node->dataPtr;
    printf ("Data from node 1: %d\n", *nodeData);
    nodeData = (int*)node->link->dataPtr;
    printf ("Data from node 2: %d\n", *nodeData);
    return 0;
} // main

Output:
Data from node 1: 7
Data from node 2: 75
Pointer to Function
• Functions of the program occupy memory
• Name of the function is a pointer constant to its first byte of memory
• For example: four functions stored in memory -- main, fun, pun, and sun
• The name of each function is a pointer to its code in memory
Pointer to Function
• Define pointers to function variables and store the address of fun, pun, and sun in
them
• To declare a pointer to function, we code it as if it were a prototype definition,
with the function pointer in parentheses.
• The parentheses are important: without them C interprets the function return
type as a pointer
Basic Operations
• Data in data structures are processed by certain operations
• The choice of a particular data structure largely depends on the
frequency of the operations that need to be performed on the
data structure
• Traversing
• Searching
• Insertion
• Deletion
• Sorting
• Merging
Linear Data Structures
• Elements form a sequence or LINEAR LIST
• Linear structures in memory representation
• Arrays – linear relationship between the
elements represented by means of sequential
memory locations
• Pointers – linear relationship between the
elements represented by means of pointers or
links
Arrays vs. Pointers
Arrays                                    Pointers
A list maintained sequentially            An ordered collection of data
(indexes)                                 (next pointer)
Searching is very efficient               Only sequential search is possible
Addition and deletion inefficient         Addition and deletion easy
(shifting required)                       (no shifting)
ARRAYS
• int value[5] = {1, 2, 3, 4, 5};
• Variable value is an array of integers with 5 components
• value[0], value[1], value[2], value[3], value[4]

Name Address Contents


value[0] 2000 1
value[1] 2002 2
value[2] 2004 3
value[3] 2006 4
value[4] 2008 5
Linear Arrays
• List of a finite number of homogeneous data
elements
❑Elements are referenced by an index set (consecutive)
❑Stored in successive memory locations

• Length or size of ARRAY = UB – LB + 1


UB: Upper bound
LB: Lower bound
Linear Arrays
• Linear Array A in the memory of the computer
• LOC(A[K]) = address of the element A[K] of the array
• Base(A) = address of the first element of A
• LOC(A[K]) = Base(A) + w (K – LB)
w = number of words per memory cell
(elements occupy successive addresses, e.g. 2000, 2001, …, 2005)
Traversing Linear Arrays
1. Repeat For i = LB to UB
2. Apply PROCESS to A[i]
[End of For Loop]
3. Exit
Inserting
1. Set i = N [Initialize counter]
2. Repeat While (i >= LOC)
3. Set A[i+1] = A[i] [Move elements downward]
4. Set i = i – 1 [Decrease counter by 1]
[End of While Loop]
5. Set A[LOC] = ITEM [Insert element]
6. Set N = N + 1 [Reset N]
7. Exit
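A 0-based C sketch of the insertion steps above (the `insert_at` name is mine; the caller is assumed to provide capacity for one extra element):

```c
/* Insert item at index loc (0-based) in a[0..n-1], shifting the
   tail one slot toward higher indices; returns the new length.
   Assumes the array has capacity for at least n + 1 elements. */
int insert_at(int a[], int n, int loc, int item) {
    for (int i = n; i > loc; i--)   /* move elements downward (rightward) */
        a[i] = a[i - 1];
    a[loc] = item;                  /* insert element */
    return n + 1;                   /* reset N */
}
```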
Deleting
1. Set ITEM = A[LOC] [Assign the element to be deleted to ITEM]
2. Repeat For i = LOC to N – 1
3.   Set A[i] = A[i+1] [Move the (i+1)th element upwards]
   [End of For Loop]
4. Set N = N – 1 [Reset N]
5. Exit
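The same steps in 0-based C (the `delete_at` name is illustrative):

```c
/* Delete the element at index loc (0-based) from a[0..*n-1],
   shifting the tail one slot toward lower indices; returns the
   removed item and decrements *n. */
int delete_at(int a[], int *n, int loc) {
    int item = a[loc];                 /* element to be deleted */
    for (int i = loc; i < *n - 1; i++) /* move elements upwards (leftward) */
        a[i] = a[i + 1];
    (*n)--;                            /* reset N */
    return item;
}
```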
Sorting: Bubble Sort
1. Repeat For j = 1 to N – 1
2.   Repeat For k = 1 to N – j
3.     If (A[k] > A[k+1]) Then
4.       Interchange A[k] and A[k+1]
       [End of If]
     [End of Step 2 For Loop]
   [End of Step 1 For Loop]
5. Exit
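A 0-based C sketch of the algorithm (the `bubble_sort` name is illustrative):

```c
/* Bubble sort: after pass j, the j+1 largest elements are in their
   final positions at the end of the array. */
void bubble_sort(int a[], int n) {
    for (int j = 0; j < n - 1; j++)            /* N-1 passes */
        for (int k = 0; k < n - 1 - j; k++)    /* compare neighbours */
            if (a[k] > a[k + 1]) {             /* out of order? */
                int t = a[k];                  /* interchange A[k] and A[k+1] */
                a[k] = a[k + 1];
                a[k + 1] = t;
            }
}
```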
Searching: Linear Search
1. Repeat For j = 1 to N
2.   If (ITEM == A[j]) Then
3.     Print: ITEM found at location j
4.     Return
     [End of If]
   [End of For Loop]
5. If (j > N) Then
6.   Print: ITEM doesn’t exist
   [End of If]
7. Exit
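In C the search conventionally returns an index instead of printing (the `linear_search` name and the -1 "not found" convention are illustrative):

```c
/* Linear search: scan a[0..n-1] left to right; return the index of
   the first occurrence of item, or -1 if it does not exist. */
int linear_search(const int a[], int n, int item) {
    for (int j = 0; j < n; j++)
        if (a[j] == item)
            return j;       /* ITEM found at location j */
    return -1;              /* ITEM doesn't exist */
}
```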
Searching: Binary Search
1. Set BEG = 1 and END = N
2. Set MID = (BEG + END) / 2
3. Repeat While (BEG <= END) and (A[MID] ≠ ITEM)
4.   If (ITEM < A[MID]) Then
5.     Set END = MID – 1
6.   Else
7.     Set BEG = MID + 1
     [End of If]
8.   Set MID = (BEG + END) / 2
   [End of While Loop]
9. If (A[MID] == ITEM) Then
10.  Print: ITEM exists at location MID
11. Else
12.  Print: ITEM doesn’t exist
   [End of If]
13. Exit
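A 0-based C sketch (the `binary_search` name is illustrative; the array must already be sorted in ascending order):

```c
/* Binary search on an ascending array: halve the live range
   [beg, end] until a[mid] == item or the range becomes empty. */
int binary_search(const int a[], int n, int item) {
    int beg = 0, end = n - 1;
    while (beg <= end) {
        int mid = beg + (end - beg) / 2;  /* avoids overflow of beg + end */
        if (a[mid] == item)
            return mid;                   /* ITEM exists at location mid */
        if (item < a[mid])
            end = mid - 1;                /* discard the upper half */
        else
            beg = mid + 1;                /* discard the lower half */
    }
    return -1;                            /* ITEM doesn't exist */
}
```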
Two-Dimensional Arrays
• An m × n array A is a collection of m · n data elements
  A[j, k] where 1 ≤ j ≤ m and 1 ≤ k ≤ n

                    Columns
            1       2       3       4
  Rows  1  A[1,1]  A[1,2]  A[1,3]  A[1,4]
        2  A[2,1]  A[2,2]  A[2,3]  A[2,4]
        3  A[3,1]  A[3,2]  A[3,3]  A[3,4]

  Two-Dimensional 3 × 4 Array A
Multidimensional Arrays
• Non-regular arrays: the lower bound need not be 1
• A(2:5, -1:5)
• Index sets
  • 2:5 → 2, 3, 4, 5
  • -1:5 → -1, 0, 1, 2, 3, 4, 5
• Length
  o first dimension: 5 – 2 + 1 = 4
  o second dimension: 5 – (–1) + 1 = 7
Memory representation
• In computing, row-major order and column-major order are methods for storing multidimensional arrays in linear storage such as random-access memory
• An array can be stored
  • Column-major order: column by column
  • Row-major order: row by row
• Row-major order is used in C/C++
• Column-major order is used in Fortran, MATLAB, R, Scilab
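C's row-major layout can be observed directly with pointer arithmetic; a small sketch (the function names are illustrative):

```c
#include <stddef.h>

/* Element offset of a[i][j] in a row-major array with ncols columns:
   i full rows precede row i, plus j elements within the row. */
ptrdiff_t row_major_offset(int i, int j, int ncols) {
    return (ptrdiff_t)i * ncols + j;
}

/* In C, &a[i][j] == &a[0][0] + row_major_offset(i, j, ncols),
   because rows are stored one after another in memory. */
int offset_matches_layout(void) {
    int a[3][4];
    return (&a[2][3] - &a[0][0]) == row_major_offset(2, 3, 4);
}
```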
Two-Dimensional Array
• For the 3 × 4 array A, the subscripts appear in memory in this order:

  Column major (column by column):
  (1,1) (2,1) (3,1) (1,2) (2,2) (3,2) (1,3) (2,3) (3,3) (1,4) (2,4) (3,4)

  Row major (row by row):
  (1,1) (1,2) (1,3) (1,4) (2,1) (2,2) (2,3) (2,4) (3,1) (3,2) (3,3) (3,4)
Row-Major
• Array a with row index range LB1 .. UB1 and column index range LB2 .. UB2
• addr(a[i, j]) = Base(a)
    + (i – LB1) × (UB2 – LB2 + 1) × size   [rows before the ith row, each holding UB2 – LB2 + 1 elements]
    + (j – LB2) × size   [elements before the jth element in the ith row]
Two-Dimensional Array
• Column major address
  LOC(A[j, k]) = Base(A) + w[L1(E2) + E1]
• Column major address (array size m × n)
  LOC(A[j, k]) = Base(A) + w[m(k – LB2) + (j – LB1)]
• Row major address
  LOC(A[j, k]) = Base(A) + w[L2(E1) + E2]
• Li = UBi – LBi + 1
• Ei = Ki – LBi
Two-Dimensional Array
• Array size m × n
• Base(A) is the address of the first element A[1,1]
• Column major address
  LOC(A[j, k]) = Base(A) + w[m(k – 1) + (j – 1)]
• Row major address
  LOC(A[j, k]) = Base(A) + w[n(j – 1) + (k – 1)]
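The two address formulas above can be coded directly; a sketch (function names and the base/width values in the example are illustrative):

```c
/* Address of A[j, k] for an m-by-n array with 1-based subscripts,
   base address `base` and element width w (slide formulas). */
long loc_row_major(long base, int w, int n, int j, int k) {
    return base + w * ((long)n * (j - 1) + (k - 1));   /* n = columns */
}

long loc_col_major(long base, int w, int m, int j, int k) {
    return base + w * ((long)m * (k - 1) + (j - 1));   /* m = rows */
}
```

For a 3 × 4 array with Base = 1000 and w = 4, A[2, 3] resolves to 1000 + 4(4·1 + 2) = 1024 in row-major order and 1000 + 4(3·2 + 1) = 1028 in column-major order.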
3-Dimensional Array
• B(2, 4, 3) contains 2 · 4 · 3 = 24 elements
• Three layers, where each layer has 2 × 4 elements
• Column major address
  LOC(B[K1, K2, K3]) = Base(B) + w[(E3L2 + E2)L1 + E1]
Multidimensional Arrays
• n-dimensional m1 × m2 × m3 × … × mn
• Find address of array C[ k1, k2, …, kn] ???
• where 1≤ k1 ≤ m1, 1≤ k2 ≤ m2, … , 1≤ kn ≤ mn
• Column major
  o First subscript varies first, the second subscript second, and so on
• Row major
  o Last subscript varies first, the second-to-last subscript second, and so on (like an ODOMETER)
General multidimensional Arrays
• Let C be n-dimensional array
• Length Li of dimension i of C is no. of elements
in index set
Li = UBi – LBi + 1
• For a given subscript Ki, the effective index Ei of
Li is the number of indices preceding Ki in the
index set
Ei = Ki – Lower bound
Two-Dimensional Array
• Column major address
  LOC(A[j, k]) = Base(A) + w[L1(E2) + E1]
• Column major address (array size m × n)
  LOC(A[j, k]) = Base(A) + w[m(k – 1) + (j – 1)]
• Row major address
  LOC(A[j, k]) = Base(A) + w[L2(E1) + E2]
• Li = UBi – LBi + 1
• Ei = Ki – LBi
Three-Dimensional Array
• Length Li , effective index Ei of Li
• Li = UBi – LBi + 1
• Ei = Ki – Lower bound
• Column major address
  • 2D: LOC(A[j, k]) = Base(A) + w[L1(E2) + E1]
  • 3D: LOC(B[K1, K2, K3]) = Base(B) + w[(E3L2 + E2)L1 + E1]
• Row major address
  • 2D: LOC(A[j, k]) = Base(A) + w[L2(E1) + E2]
  • 3D: LOC(B[K1, K2, K3]) = Base(B) + w[(E1L2 + E2)L3 + E3]
General multidimensional Arrays
• Base(C) is the address of the first element
• Column major address
  LOC(C[K1, K2, …, KN]) =
    Base(C) + w[(…((ENLN-1 + EN-1)LN-2 + EN-2)LN-3 + … + E2)L1 + E1]
• Row major address
  LOC(C[K1, K2, …, KN]) =
    Base(C) + w[(…((E1L2 + E2)L3 + E3)L4 + … + EN-1)LN + EN]
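Both nested formulas evaluate naturally in Horner form; a sketch (the `loc_nd` name is illustrative), checked against the numbers of Example 4:

```c
/* Address of C[k1..kn] given per-dimension lower bounds lb[] and
   lengths len[]; Horner-style evaluation of the nested formulas. */
long loc_nd(long base, int w, int n,
            const int k[], const int lb[], const int len[],
            int row_major) {
    long off = 0;
    if (row_major)
        for (int i = 0; i < n; i++)        /* last subscript varies fastest */
            off = off * len[i] + (k[i] - lb[i]);
    else
        for (int i = n - 1; i >= 0; i--)   /* first subscript varies fastest */
            off = off * len[i] + (k[i] - lb[i]);
    return base + (long)w * off;
}
```

For B(1:8, -5:5, -10:5) with Base = 300 and w = 4, element B[3, 3, 3] resolves to 2272 in row-major order and 5140 in column-major order, matching Example 4.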
Example 1
• Linear Array A(2:22)
• Find the number of elements.
• Suppose Base(A) = 300 and w = 4 words per
memory cell for A.
• Find the address of A[15]
• Find the address of A[25]
Solution 1
• Array A(2:22)
• Number of elements is equal to the length
• Length = UB – LB + 1 => 22 – 2 + 1 = 21
• Length A = 21
• Address Loc(A[15]) = Base(A) + w(K - LB) = 300 + 4(15 - 2) = 352
• Address Loc(A[25]) = not an element of A,
• since 25 exceeds UB=22
Example 2
• Array B(20)
• Find the number of elements.
• Find the address of B[15]
Solution 2
• Array B(20)
• Number of elements is equal to the length
• Length = UB – LB + 1 (LB = 1)
• 20 – 1 + 1 = 20
• Address of B[15], assuming Base(B) = 300 and w = 4 as in Example 1:
  Loc(B[15]) = Base(B) + w(K – LB) = 300 + 4(15 – 1) = 356
Example 3
• Array A(-2:2, 2:22)
• Find the length of each dimension
• Find the number of elements.
Solution 3
• Array A(-2:2, 2:22)
• Length = UB – LB + 1
• L1 = 2 – (– 2) +1=5
• L2 = 22 – 2 + 1 = 21
• A has 5 × 21 = 105 elements
Example 4
• Array B(1:8, -5:5, -10:5)
• Find the length of each dimension & Find the number of elements.
• Suppose Base(B) = 300 and w = 4 words per memory cell for B.
• Consider element B[3,3,3] in B.
• Find the effective indices E1, E2, E3 &
• Find the address of the element.
o Row major
o Column major
• Column major address
  LOC(C[K1, K2, …, KN]) = Base(C) + w[(…((ENLN-1 + EN-1)LN-2 + EN-2)LN-3 + … + E2)L1 + E1]
• Row major address
  LOC(C[K1, K2, …, KN]) = Base(C) + w[(…((E1L2 + E2)L3 + E3)L4 + … + EN-1)LN + EN]
Solution 4
• Array B(1:8, -5:5, -10:5)
• Length = UB – LB + 1
• L1 = 8 – 1 + 1 = 8, L2 = 5 – (–5) + 1 = 11, L3 = 5 – (–10) + 1 = 16
• B has 8 × 11 × 16 = 1408 elements
• The effective index Ei = Ki – LBi
• E1 = 3 – 1 = 2, E2 = 3 – (–5) = 8, E3 = 3 – (–10) = 13
• B stored in row major: LOC(B[K1, K2, K3]) = Base(B) + w[(E1L2 + E2)L3 + E3]
  = 300 + 4 × [((2 × 11 + 8) × 16) + 13] = 300 + 4 × 493 = 2272
• B stored in column major: LOC(B[K1, K2, K3]) = Base(B) + w[(E3L2 + E2)L1 + E1]
  = 300 + 4 × [((13 × 11 + 8) × 8) + 2] = 300 + 4 × 1210 = 5140
Sparse Matrix
• Matrix that has many elements with a value zero

  0 2 0 0 3 0
  0 0 6 0 0 0
  0 9 0 2 0 0
  0 4 0 0 0 4
Sparse Matrix: Arrays
• The matrix can be stored compactly as (row, column, value) triples, with a header triple recording the row count, column count, and number of non-zero entries (indices 0-based)

  Matrix:
  0 2 0 0 3 0
  0 0 6 0 0 0
  0 9 0 2 0 0
  0 4 0 0 0 4

  Triples:
  row col value
   4   6   7    (4 rows, 6 columns, 7 non-zeros)
   0   1   2
   0   4   3
   1   2   6
   2   1   9
   2   3   2
   3   1   4
   3   5   4
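Building the triple list is a row-by-row scan of the dense matrix; a sketch (the `to_triples` name, `struct term`, and `MAX_TERMS` bound are illustrative):

```c
#define MAX_TERMS 64

/* One (row, col, value) triple; t[0] holds the row count, column
   count, and number of non-zero terms, as in the representation above. */
struct term { int row, col, value; };

/* Scan a dense rows-by-cols matrix row by row and record each
   non-zero entry; returns the number of non-zero terms stored. */
int to_triples(int rows, int cols, int m[rows][cols], struct term t[]) {
    int nz = 0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            if (m[i][j] != 0 && nz + 1 < MAX_TERMS)
                t[++nz] = (struct term){ i, j, m[i][j] };
    t[0] = (struct term){ rows, cols, nz };   /* header triple */
    return nz;
}
```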