Lecture 1: Introduction to Data Structures and Algorithms: Mohsin Raza Khan


Books
• Recommended Text Books:
    • Langsam, Augenstein, Tenenbaum, “Data Structures Using C and C++”, Prentice Hall.
    • Seymour Lipschutz, “Theory and Problems of Data Structures”, Schaum’s Outline Series.
• Reference Books:
    • Alfred V. Aho, Jeffrey D. Ullman, John E. Hopcroft, “Data Structures and Algorithms”, Addison-Wesley.
    • Mark Allen Weiss, “Data Structures and Algorithm Analysis in C”.
    • Robert Lafore, “Data Structures and Algorithms in 24 Hours”, SAMS.
Goals of this Course
• The objective of this course is to introduce the analysis and design of data structures using various standard algorithms.
• Cover well-known data structures such as dynamic arrays, linked lists, stacks, queues, trees and graphs.
• Implement data structures in C++.


Data Representation
• A computer is a machine that manipulates data.
• The primary aim of studying data structures is to understand how data is organized in a computer, how it is manipulated, how it is retrieved, and how it can be utilized, resulting in more efficient programs.
What is a Data Structure?
• In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.
• Data may be organized in many different ways. The logical or mathematical model of a particular organization of data in memory or on disk is called a data structure.
• Algorithms are used for manipulation of data.
What is a Data Structure?
• There are three basic things associated with data structures:
    • space for each data item it stores
    • time to perform each basic operation
    • programming effort
Data Structure Operations
The data appearing in our data structure is processed by means of certain operations. The particular data structure that one chooses for a given situation depends largely on the frequency with which specific operations are performed. The following four operations play a major role:

• Traversing: Accessing each record exactly once so that certain items in the record may be processed. (This accessing or processing is sometimes called "visiting" the records.)
• Searching: Finding the location of the record with a given key value, or finding the locations of all records that satisfy one or more conditions.
• Inserting: Adding new records to the structure.
• Deleting: Removing a record from the structure.
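
The four operations can be illustrated on a simple array-like container. The following is a minimal sketch, assuming records are stored in a std::vector; the Record type and the function names are hypothetical and chosen only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record type used only for this illustration.
struct Record {
    int key;
    double value;
};

// Traversing: visit each record exactly once (here, to sum the values).
double sumValues(const std::vector<Record>& records) {
    double total = 0.0;
    for (const Record& r : records) {
        total += r.value;
    }
    return total;
}

// Searching: return the index of the first record with the given key,
// or -1 if there is no such record (linear search).
int findByKey(const std::vector<Record>& records, int key) {
    for (std::size_t i = 0; i < records.size(); ++i) {
        if (records[i].key == key) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Inserting: add a new record at the end of the structure.
void insertRecord(std::vector<Record>& records, const Record& r) {
    records.push_back(r);
}

// Deleting: remove the first record with the given key, if present.
// Returns true when a record was removed.
bool deleteByKey(std::vector<Record>& records, int key) {
    int pos = findByKey(records, key);
    if (pos < 0) {
        return false;
    }
    records.erase(records.begin() + pos);
    return true;
}
```

How often each of these functions is called in a program is what would guide the choice between, say, a vector and a linked list.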
Types of Data Structure

There are two types of data structure:

Linear Data Structure
A data structure is said to be linear if its elements form a sequence, or in other words a linear list.
• Array
• Stack
• Queue
• Linked List

Non-Linear Data Structure
A non-linear structure is mainly used to represent data containing a hierarchical relationship between elements.
• Tree
• Graph
Characteristics of Data Structures

Data Structure | Advantages | Disadvantages
Arrays | Quick insertion, very fast access if index is known | Slow search, slow deletion, fixed size
Ordered Array | Quicker search than unsorted array | Slow insertion and deletion, fixed size
Stack | Provides last-in first-out access | Slow access to other items
Queue | Provides first-in first-out access | Slow access to other items
Linked List | Quick insertion and quick deletion | Slow search
Binary Trees | Quick search, insertion and deletion if tree remains balanced | Deletion algorithm is complex
Hash Table | Very fast access if key known, fast insertion | Slow deletion, access slow if key not known, inefficient memory usage
Heap | Fast insertion and deletion, access to largest item | Slow access to other items
Graph | Models real-world situations | Some algorithms are slow
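
The last-in first-out and first-in first-out access listed for Stack and Queue can be observed with the C++ standard containers std::stack and std::queue. This is a minimal sketch; the helper names drainStack and drainQueue are made up for this illustration.

```cpp
#include <queue>
#include <stack>
#include <vector>

// Push all items onto a stack, then pop them: the last item pushed
// is the first one returned (last-in first-out).
std::vector<int> drainStack(const std::vector<int>& items) {
    std::stack<int> s;
    for (int x : items) {
        s.push(x);
    }
    std::vector<int> out;
    while (!s.empty()) {
        out.push_back(s.top());
        s.pop();
    }
    return out;
}

// Push all items into a queue, then pop them: the first item pushed
// is the first one returned (first-in first-out).
std::vector<int> drainQueue(const std::vector<int>& items) {
    std::queue<int> q;
    for (int x : items) {
        q.push(x);
    }
    std::vector<int> out;
    while (!q.empty()) {
        out.push_back(q.front());
        q.pop();
    }
    return out;
}
```

Given the items 1, 2, 3, drainStack returns them in reverse order while drainQueue returns them in the original order. Neither container offers direct access to items in the middle, which is the "slow access to other items" drawback in the table.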
Data Structure and Algorithms
• Algorithm: An algorithm is a finite set of instructions that takes some raw data as input and transforms it into refined data.
• An algorithm is a well-defined list of steps for solving a computational problem.
• Program: A program is an implementation of an algorithm in some programming language.
• Data Structure: The organization of data needed to solve the problem.
Algorithmic Problem

Specification of input → Specification of output as a function of input

• Infinite number of input instances satisfying the specification. For example, a sorted, non-decreasing sequence of natural numbers of non-zero, finite length:
    • 1, 20, 908, 909, 100000, 1000000000
    • 3
Algorithmic Solution

Input instance, adhering to the specification → Algorithm → Output related to the input as required

• Algorithm describes actions on the input instance.
• Infinitely many correct algorithms for the same algorithmic problem.
What is a Good Algorithm?
• Efficient:
    • Running time
    • Space used
• Efficiency as a function of input size:
    • The number of bits in an input number
    • Number of data elements (numbers, points)
Analysis of Algorithms
• The theoretical study of computer program performance and resource usage.
• What is more important than performance?
• Why study algorithms and performance?
Measuring the Running Time
• How should we measure the running time of an algorithm?
• Experimental Study
    • Write a program that implements the algorithm.
    • Run the program with data sets of varying size and composition.
    • Use a method like startTime and stopTime to get an accurate measure of the actual running time.
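
In C++, one way to take the start and stop readings of an experimental study is std::chrono::steady_clock. The sketch below is only an illustration: the workload (summing a vector) and the names sumAll and timeSumMicros are assumptions, not part of the lecture.

```cpp
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// The workload being measured (an assumed example): sum a vector.
long long sumAll(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), 0LL);
}

// Run sumAll on an input of n elements and return the elapsed time
// in microseconds. The two now() calls play the role of the
// start-time and stop-time readings.
long long timeSumMicros(std::size_t n) {
    std::vector<int> data(n, 1);
    auto start = std::chrono::steady_clock::now();
    // volatile keeps the compiler from discarding the unused result.
    volatile long long result = sumAll(data);
    auto stop = std::chrono::steady_clock::now();
    (void)result;
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling timeSumMicros with increasing values of n (for example 1,000, 10,000, 100,000) gives the "data sets of varying size" measurement; the numbers obtained depend on the machine and compiler settings.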
Limitations of Experimental Studies
• It is necessary to implement and test the algorithm in order to determine its running time.
• Experiments can be done only on a limited set of inputs, and may not be indicative of the running time on other inputs not included in the experiment.
• In order to compare two algorithms, the same hardware and software environments should be used.
Beyond Experimental Studies
• We will develop a general methodology for analyzing the running time of algorithms. This approach:
    • Uses a high-level description of the algorithm instead of testing one of its implementations.
    • Takes into account all possible inputs.
    • Allows one to evaluate the efficiency of any algorithm in a way that is independent of the hardware and software environment.
Pseudo-Code
• A mixture of natural language and high-level programming concepts that describes the main ideas behind a generic implementation of a data structure or algorithm.
• Example: Algorithm arrayMax(A, n)
    • Input: An array A storing n integers.
    • Output: The maximum element in A.
    currentMax = A[0]
    for i = 1 to n-1
        if currentMax < A[i] then currentMax = A[i]
    return currentMax
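
For comparison, the arrayMax pseudo-code could be written in C++ roughly as follows. This is a sketch that takes a std::vector instead of an array plus a length n, and it assumes, as the pseudo-code does, that the input holds at least one element.

```cpp
#include <cstddef>
#include <vector>

// Return the maximum element of A.
// Assumed precondition (as in the pseudo-code): A is not empty.
int arrayMax(const std::vector<int>& A) {
    int currentMax = A[0];
    for (std::size_t i = 1; i < A.size(); ++i) {
        if (currentMax < A[i]) {
            currentMax = A[i];
        }
    }
    return currentMax;
}
```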
Pseudo-Code
• It is more structured than usual prose but less formal than a programming language.
• Expressions:
    • Same as in programming languages
• Method Declarations:
    • Algorithm name(param1, param2)
• Programming Constructs:
    • Decision structures: if … then … [else]
    • While-loop: while … do
    • Repeat-loop: repeat … until …
    • For-loop: for …
    • Array indexing: A[i], A[i, j]
• Methods:
    • Calls: object.method(args)
    • Return: return value
Analysis of Algorithms
• Primitive Operation: a low-level operation independent of the programming language. It can be identified in pseudo-code. For example:
    • Data movement (assign)
    • Control (branch (if … then … else), subroutine call, return)
    • Arithmetic and logical operations (e.g., addition, comparison)
• By inspecting the pseudo-code, we can count the number of primitive operations executed by an algorithm.
Example: Sorting
Input: a sequence of numbers a1, a2, a3, …, an
Output: a permutation b1, b2, b3, …, bn of the sequence of numbers

a1, a2, a3, …, an → Sort → b1, b2, b3, …, bn
Example: an input holding the values 2, 4, 4, 7, 10 in some order is sorted into 2 4 4 7 10.

Correctness (requirements for the output)
For any given input the algorithm halts with the output:
• b1 ≤ b2 ≤ b3 ≤ … ≤ bn
• b1, b2, b3, …, bn is a permutation of a1, a2, a3, …, an

Running Time
Depends on:
• Number of elements (n)
• How (partially) sorted they are
• Algorithm
Insertion Sort

A: 3 4 6 8 9 7 2 5 1 (positions 1 to n)

INPUT: A[1..n], an array of integers
OUTPUT: a permutation of A such that A[1] ≤ A[2] ≤ … ≤ A[n]

Strategy
• Start “empty handed”
• Insert a card in the right position of the already sorted hand.
• Continue until all cards are inserted/sorted

for j = 2 to n
    key = A[j]
    // Insert A[j] into the sorted sequence A[1..j-1]
    i = j-1
    while i > 0 and A[i] > key
        do A[i+1] = A[i]
           i--
    A[i+1] = key
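
The same strategy can be sketched in C++. Note the assumption that differs from the slide: C++ vectors are indexed from 0, so the loop runs over positions 1 to n-1 and the index i here points one place to the right of the i in the pseudo-code.

```cpp
#include <cstddef>
#include <vector>

// Sort A in non-decreasing order using insertion sort (0-based indexing).
void insertionSort(std::vector<int>& A) {
    for (std::size_t j = 1; j < A.size(); ++j) {
        int key = A[j];
        // A[0..j-1] is already sorted; shift larger elements one place
        // to the right to open a slot for key.
        std::size_t i = j;
        while (i > 0 && A[i - 1] > key) {
            A[i] = A[i - 1];
            --i;
        }
        A[i] = key;
    }
}
```

Applied to the array 3 4 6 8 9 7 2 5 1 from the slide, the function leaves the vector holding 1 2 3 4 5 6 7 8 9.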
Analysis of Insertion Sort

INPUT: A[1..n], an array of integers
OUTPUT: a permutation of A such that A[1] ≤ A[2] ≤ … ≤ A[n]

Statement | Cost | Times
for j = 2 to n | c1 | n
key = A[j] | c2 | n-1
// Insert A[j] into the sorted sequence A[1..j-1] | |
i = j-1 | c3 | n-1
while i > 0 and A[i] > key | c4 | $\sum_{j=2}^{n} t_j$
do A[i+1] = A[i] | c5 | $\sum_{j=2}^{n} (t_j - 1)$
i-- | c6 | $\sum_{j=2}^{n} (t_j - 1)$
A[i+1] = key | c7 | n-1

Here t_j is the number of times the while-loop test is executed for that value of j. Since $\sum_{j=2}^{n} (t_j - 1) = \sum_{j=2}^{n} t_j - (n-1)$:

Total time = $n(c_1+c_2+c_3+c_7-c_5-c_6) + (c_4+c_5+c_6)\sum_{j=2}^{n} t_j - (c_2+c_3+c_7-c_5-c_6)$

Best/Worst/Average Case
Total time = $n(c_1+c_2+c_3+c_7-c_5-c_6) + (c_4+c_5+c_6)\sum_{j=2}^{n} t_j - (c_2+c_3+c_7-c_5-c_6)$

• Best Case: elements already sorted; t_j = 1, running time = f(n), i.e., linear time.
• Worst Case: elements are sorted in inverse order; t_j = j, running time = f(n^2), i.e., quadratic time.
• Average Case: t_j = j/2, running time = f(n^2), i.e., quadratic time.
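
The best and worst cases can be checked by counting how often the while-loop test runs. The sketch below is an instrumented insertion sort written for this illustration (the name countWhileTests is hypothetical); it returns the sum of t_j over all j, using 0-based indexing.

```cpp
#include <cstddef>
#include <vector>

// Sort a copy of A with insertion sort and return the total number of
// times the while-loop condition is evaluated (the sum of all t_j).
std::size_t countWhileTests(std::vector<int> A) {
    std::size_t tests = 0;
    for (std::size_t j = 1; j < A.size(); ++j) {
        int key = A[j];
        std::size_t i = j;
        while (true) {
            ++tests;  // one evaluation of the loop condition
            if (!(i > 0 && A[i - 1] > key)) {
                break;
            }
            A[i] = A[i - 1];
            --i;
        }
        A[i] = key;
    }
    return tests;
}
```

For an already sorted input of n elements every t_j is 1, so the count is n-1 (linear). For an input sorted in inverse order t_j = j, so the count is 2 + 3 + … + n = n(n+1)/2 - 1 (quadratic). With n = 5 these are 4 and 14.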
Best/Worst/Average Case
• Worst Case is usually used: it is an upper bound, and in certain application domains (air traffic control, surgery) knowing the worst-case time complexity is of crucial importance.
• For some algorithms the worst case occurs fairly often.
• The average case is often as bad as the worst case.
• Finding the average case can be very difficult.


What is Insertion Sort’s time?

• Depends on the computer:
    • Relative speed (on the same machine)
    • Absolute speed (on different machines)

This brings us to the Big Idea of Asymptotic Analysis.


Asymptotic Analysis

• Goal: to ignore machine-dependent constants
    • Like “rounding”: 1,000,001 ≈ 1,000,000
    • 3n^2 ≈ n^2
• Capturing the essence: how the running time of an algorithm increases with the size of the input in the limit.
Asymptotic Notation
• The “big-Oh” O-Notation:
    • Asymptotic upper bound
    • f(n) = O(g(n)) if there exist constants c > 0 and n0 such that f(n) ≤ c g(n) for all n ≥ n0
    • f(n) and g(n) are functions over non-negative integers.
• Drop low-order terms and ignore leading constants.
    • E.g.: 3n^3 + 90n^2 - 5n + 6 = Θ(n^3)
    • E.g.: 50 n log n = Θ(n log n)
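
As a worked illustration of the definition, the constants c and n0 can be exhibited explicitly for the polynomial 3n^3 + 90n^2 - 5n + 6. The particular choice c = 99, n0 = 1 below is one convenient option among many, not the only valid one.

```latex
\begin{align*}
3n^3 + 90n^2 - 5n + 6
  &\le 3n^3 + 90n^2 + 6
     && \text{since } -5n \le 0 \\
  &\le 3n^3 + 90n^3 + 6n^3
     && \text{since } n^2 \le n^3 \text{ and } 1 \le n^3 \text{ for } n \ge 1 \\
  &= 99\,n^3 .
\end{align*}
```

Hence f(n) ≤ 99 n^3 for all n ≥ 1, so the polynomial is O(n^3): the low-order terms and the leading constant have been absorbed into c.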
Asymptotic Analysis of Running Time
• Use O notation to express the number of primitive operations executed as a function of input size.
• Comparing asymptotic running times
    • An algorithm that runs in O(n) time is better than one that runs in O(n^2) time
    • Similarly, O(log n) is better than O(n)
    • Hierarchy of functions: log n < n < n^2 < n^3 < 2^n
• Caution! Beware of very large constant factors. An algorithm running in time 1,000,000 n is still O(n) but might be less efficient than one running in time 2n^2, which is O(n^2).
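
The caution about constant factors can be made concrete by comparing the two cost formulas directly. This is a small sketch; the function names are hypothetical and the costs are the operation counts quoted above, not measured times.

```cpp
// Operation counts from the caution above (assumed cost models).
long long linearCost(long long n) { return 1000000LL * n; }
long long quadraticCost(long long n) { return 2LL * n * n; }

// True when the O(n) algorithm performs strictly fewer operations
// than the O(n^2) algorithm for an input of size n.
bool linearIsCheaper(long long n) {
    return linearCost(n) < quadraticCost(n);
}
```

Since 1,000,000 n < 2n^2 exactly when n > 500,000, the O(n) algorithm only wins for inputs larger than 500,000 elements; for smaller inputs the O(n^2) algorithm performs fewer operations.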
Example of Asymptotic Analysis
• Algorithm prefixAverages1(X):
• Input: An n-element array X of numbers
• Output: An n-element array A of numbers such that A[i] is the average of elements X[0], …, X[i].
    for i = 0 to n-1
        a = 0
        for j = 0 to i
            a = a + X[j]        (executed O(n^2) times in total)
        A[i] = a/(i+1)          (executed n times)
    return array A
Analysis: Running time is O(n^2)
A Better Algorithm
• Algorithm prefixAverages2(X):
• Input: An n-element array X of numbers
• Output: An n-element array A of numbers such that A[i] is the average of elements X[0], …, X[i].
    s = 0
    for i = 0 to n-1
        s = s + X[i]
        A[i] = s/(i+1)
    return array A
Analysis: Running time is O(n)
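
Both prefix-average algorithms can be sketched in C++ as follows, assuming the input is a std::vector<double>. The two functions return the same result; only the amount of work differs.

```cpp
#include <cstddef>
#include <vector>

// O(n^2): recompute the sum X[0] + ... + X[i] from scratch for every i.
std::vector<double> prefixAverages1(const std::vector<double>& X) {
    std::vector<double> A(X.size());
    for (std::size_t i = 0; i < X.size(); ++i) {
        double a = 0.0;
        for (std::size_t j = 0; j <= i; ++j) {
            a += X[j];
        }
        A[i] = a / static_cast<double>(i + 1);
    }
    return A;
}

// O(n): keep a running sum s so each element is added exactly once.
std::vector<double> prefixAverages2(const std::vector<double>& X) {
    std::vector<double> A(X.size());
    double s = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        s += X[i];
        A[i] = s / static_cast<double>(i + 1);
    }
    return A;
}
```

For X = 2, 4, 6, 8 both functions produce the averages 2, 3, 4, 5.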
Asymptotic Notation (terminology)
• Special classes of algorithms:
    • Logarithmic: O(log n)
    • Linear: O(n)
    • Quadratic: O(n^2)
    • Polynomial: O(n^k), k ≥ 1
    • Exponential: O(a^n), a > 1
• “Relatives” of the Big-O
    • Ω(f(n)): Big Omega, asymptotic lower bound
    • Θ(f(n)): Big Theta, asymptotic tight bound
Time-Space Tradeoff
• In computer science, a space-time or time-memory tradeoff is a way of solving a problem or calculation in less time by using more storage space (or memory), or of solving a problem in very little space by spending a long time.
• So if your problem is taking a long time but not much memory, a space-time tradeoff would let you use more memory and solve the problem more quickly.
• Or, if it could be solved very quickly but requires more memory than you have available, you can try to spend more time solving the problem in the limited memory.
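
One common illustration of this tradeoff is a precomputed lookup table. The sketch below counts the 1-bits in a 32-bit number in two ways; the example and all function names are assumptions chosen for illustration, not taken from the lecture.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Less space, more time: examine the bits one by one on every call.
int popcountLoop(std::uint32_t x) {
    int count = 0;
    while (x != 0) {
        count += static_cast<int>(x & 1u);
        x >>= 1;
    }
    return count;
}

// More space: a 256-entry table holding the bit count of every
// possible byte value, built once on first use.
const std::array<std::uint8_t, 256>& bitTable() {
    static const std::array<std::uint8_t, 256> table = [] {
        std::array<std::uint8_t, 256> t{};
        for (std::size_t i = 0; i < t.size(); ++i) {
            t[i] = static_cast<std::uint8_t>(
                popcountLoop(static_cast<std::uint32_t>(i)));
        }
        return t;
    }();
    return table;
}

// Less time per call: four table lookups, one per byte of x.
int popcountTable(std::uint32_t x) {
    const std::array<std::uint8_t, 256>& t = bitTable();
    return t[x & 0xFFu] + t[(x >> 8) & 0xFFu] +
           t[(x >> 16) & 0xFFu] + t[(x >> 24) & 0xFFu];
}
```

popcountLoop needs no extra memory but may loop up to 32 times per call; popcountTable spends 256 bytes of memory so that each call needs only four lookups and three additions.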
Thank You
