Data Structure and Algorithm - 1


Data Structure and Algorithm

• Algorithm – An outline, the essence of a computational procedure; step-by-step instructions.
• Program – An implementation of an algorithm in some programming language.
• Data Structure – The organization of the data needed to solve the problem.
What is a good algorithm?

• Efficiency
  – Running time
  – Space used
• Efficiency as a function of input size
  – The number of bits in the input
  – The number of data elements
Analysis of Algorithms

The running time of an algorithm typically depends on:

• Size of input
• Data structure
• Hardware environment (clock speed, processor, memory, disk speed)
• Software environment (OS, language, interpreted or compiled)

If the hardware and software environments remain the same, the running time depends on the input size and the data structure.

When experimenting with an algorithm on test inputs:

• Test inputs must be representative.
• Tests must be done in the same hardware and software environment.
Analysis of Algorithms
We need an analytical framework that lets us analyze algorithms without actual experimentation. Such a framework:

• Allows comparison of the efficiency of two algorithms
• Can consider all possible inputs
• Can be applied without actually implementing or running the algorithm

Developing an analysis methodology involves several components:

• A standard language for expressing the algorithm
• A computational model for the algorithm
• A metric for measuring the running time of an algorithm
• A way of expressing running time in terms of input size, including for recursive algorithms
Limitations of Experimental Studies

• It is necessary to implement and test the algorithm in order to determine its running time.
• Experiments can be done only on a limited set of inputs, while an infinite set of inputs is possible. The measured running time may not be indicative, since many inputs are left out of the experiment.
• To compare algorithms experimentally, the same hardware and software must be used.
Without Actual Experimentation
We should develop a general methodology for analyzing the running time of algorithms. This approach:

• Uses a high-level description of the algorithm instead of testing one of its implementations.
• Takes into account all possible inputs.
• Allows one to evaluate the efficiency of an algorithm in a way that is independent of the hardware and software environment.
Language for expressing algorithms – Pseudo-code
Pseudo-code
• A way of expressing algorithms that uses a mixture of English phrases and indentation to make the steps in the solution explicit
• Pseudo-code mixes standard programming-language constructs with natural language.
• There are no strict grammar rules in pseudo-code.
• Pseudo-code is not case sensitive.

Following is the insertion-sort algorithm on a zero-based array A:

1. for j ← 1 to length(A) - 1
2.     key ← A[j]
3.     # A[j] is inserted into the sorted sequence A[0 .. j-1]
4.     i ← j - 1
5.     while i >= 0 and A[i] > key
6.         A[i+1] ← A[i]
7.         i ← i - 1
8.     A[i+1] ← key
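For reference, here is a direct, runnable Python version of the pseudo-code above (a sketch; it sorts the list in place):

```python
def insertion_sort(a):
    """Sort list a in place, following the zero-based pseudo-code above."""
    for j in range(1, len(a)):       # for j <- 1 to length(A)-1
        key = a[j]                   # key <- A[j]
        i = j - 1                    # i <- j-1
        # Shift elements of the sorted prefix a[0..j-1] that are > key
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key               # insert key into its final position

data = [5, 2, 4, 6, 1, 3]
insertion_sort(data)
print(data)  # [1, 2, 3, 4, 5, 6]
```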
Pseudo Code Constructs
The programming-language constructs we use in pseudo-code are taken from high-level languages such as C and C++. They are as follows:
• Expressions: ← is used for assignment and = is used for comparison.
• Method declarations: callx(param1, param2) declares a method named "callx" with two input parameters, param1 and param2.
Programming constructs
• Decision structures: if <condition> then <true part> else <false part>. Indentation is used to show scope.
• While-loops: while <condition> do <actions>. Again, indentation shows scope.
• For-loops: for <variable initialization, condition check, and increment> do <action>. Indentation shows the scope of the loop.
• Repeat-loops: repeat <action> until <condition>.
• Array indexing: A[i] denotes the i-th cell of array A.
Methods
• Method call: object.method(<arguments>), where the method belongs to object.
• Method return: return <value> gives the return value of the method back to the caller.
Random Access Machine (RAM) Model

• A Random Access Machine (RAM) is a theoretical computer model with an unlimited number of registers, each of unlimited size, which can be accessed randomly.
• A program runs step by step and consists of a limited number of instructions from a simple instruction set. The set includes basic arithmetic operations, jump instructions, and direct and indirect access to the registers.
• A RAM can perform any primitive operation in a constant number of steps, independent of the input size.
Primitive Operations

To get an idea of the running time of pseudo-code without actually implementing it, we define a set of high-level primitive operations that can be identified in pseudo-code and are independent of any implementation. To estimate running time, we count the number of primitive operations the pseudo-code performs; from this count we can compare two pieces of pseudo-code. The primitive operations are:

• Assigning a value to a variable
• Calling a method
• Performing an arithmetic operation
• Comparing two numbers
• Indexing into an array
• Following an object reference
• Returning from a method
Analysis of Algorithm by counting primitive operations

By inspecting the pseudo-code, we can count the number of primitive operations required by an algorithm.
Primitive Operations (one-based array, j running from 2 to length(A)):

Statement                                     Best case    Worst case
for j ← 2 to length(A)                        n-1          n-1
    key ← A[j]                                n-1          n-1
    # A[j] is inserted into sorted A[1..j-1]
    i ← j-1                                   n-1          n-1
    while i > 0 and A[i] > key                2(n-1)       (n-1)(n+2)/2
        A[i+1] ← A[i]                         0            n(n-1)/2
        i ← i-1                               0            n(n-1)/2
    A[i+1] ← key                              n-1          n-1

Summing these counts gives an idea of the running time of the code.
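To see such counts emerge in practice, the sketch below instruments the same insertion sort with a simple operation counter. The counting granularity (what we tally as one primitive operation per line) is our own choice for illustration, not part of the slide:

```python
def insertion_sort_counted(a):
    """Insertion sort returning a rough count of primitive operations."""
    ops = 0
    for j in range(1, len(a)):
        key = a[j]; ops += 2            # array index + assignment
        i = j - 1;  ops += 2            # arithmetic + assignment
        while True:
            ops += 2                    # the two comparisons in the while test
            if not (i >= 0 and a[i] > key):
                break
            a[i + 1] = a[i]; ops += 4   # two indexings + arithmetic + assignment
            i -= 1;          ops += 2   # arithmetic + assignment
        a[i + 1] = key; ops += 3        # index + arithmetic + assignment
    return ops
```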
Best-case, Average-case and worst-case analysis
Let us do a cost analysis of Insertion sort.
Statement                                     Cost    Times
for j ← 2 to n                                c1      n
    key ← A[j]                                c2      n-1
    # A[j] is inserted into sorted A[1..j-1]
    i ← j-1                                   c3      n-1
    while i > 0 and A[i] > key                c4      Σ(k=2..n) t_k
        A[i+1] ← A[i]                         c5      Σ(k=2..n) (t_k - 1)
        i ← i-1                               c6      Σ(k=2..n) (t_k - 1)
    A[i+1] ← key                              c7      n-1

Here t_k is the number of times the while-loop test is executed for j = k.
Total time:

T(n) = c1·n + (c2 + c3 + c7)·(n-1) + c4·Σ(k=2..n) t_k + (c5 + c6)·Σ(k=2..n) (t_k - 1)
Best-case, Average-case and worst-case analysis

• Best case for insertion sort: the input is already sorted. Then t_k = 1, and the running time is a linear function of n.
• Worst case: the input is in reverse sorted order. Then t_k = k, and the running time is a quadratic function of n.
• Average case: t_k ≈ k/2 on average, and the running time is still a quadratic function of n.
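Using the instrumented sketch from the primitive-operations slide (our own illustration), the three cases are easy to observe:

```python
import random

n = 100
best  = insertion_sort_counted(list(range(n)))         # already sorted
worst = insertion_sort_counted(list(range(n, 0, -1)))  # reverse sorted
data = list(range(n)); random.shuffle(data)
average = insertion_sort_counted(data)                 # a random permutation
print(best, average, worst)  # best grows ~linearly; average and worst ~quadratically
```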
Time vs Input Graph
Best-case, Average-case and worst-case analysis
• Best, worst and average cases of a given algorithm express what
the resource usage is at least, at most and on average,
respectively. Usually the resource being considered is running
time, but it could also be memory or other resources.
• In real-time computing, the worst-case execution time is often of
particular concern since it is important to know how much time
might be needed in the worst case to guarantee that the algorithm
will always finish on time.
• Average-case and worst-case performance are the most used in algorithm analysis. Best-case performance is less widely used. Probabilistic analysis techniques, especially expected value, are used to determine average-case behavior.
• The average case is often as bad as the worst case.
• Finding the average case can be very difficult.
Analyzing Recursive Algorithms

T(n) denotes the running time of an algorithm on an input of size n.


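The slides define T(n) but give no example here. As an illustration of analyzing a recursive algorithm (our example, not from the slides), consider recursive binary search, whose recurrence T(n) = T(n/2) + c solves to O(log n):

```python
def binary_search(a, key, lo, hi):
    """Return an index of key in sorted a[lo..hi], or -1 if absent.
    Each call does O(1) work plus one recursive call on half the range,
    so T(n) = T(n/2) + c, which gives T(n) = O(log n)."""
    if lo > hi:
        return -1
    mid = (lo + hi) // 2
    if a[mid] == key:
        return mid
    if a[mid] < key:
        return binary_search(a, key, mid + 1, hi)
    return binary_search(a, key, lo, mid - 1)

print(binary_search([1, 3, 5, 7, 9, 11], 7, 0, 5))  # 3
```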
Asymptotic Analysis
• Goal: to simplify the analysis of running time by getting rid of details that may be affected by the specific implementation and hardware.
• Capturing the essence: how the running time of an algorithm grows with the size of the input, in the limit.

• Ignore the constant factors in T(n).
• Analyze T(n) as n "gets large".
Example:

T(n) = 13n³ + 42n² + 2n log n + 4n

As n grows larger, n³ is MUCH larger than n², n log n, and n, so it dominates T(n).
The running time grows "roughly on the order of n³".
Notationally, T(n) = O(n³).
Asymptotic Notation
Definition (Big-Oh). Consider a function f(n) that is non-negative for all integers n > 0. We say "f(n) is big-oh of g(n)", written f(n) = O(g(n)), if there exist an integer n0 and a constant c > 0 such that for all integers n >= n0, f(n) <= c·g(n).
This notation is typically used for worst-case analysis.

[Figure: running time plotted against input size; for n >= n0, c·g(n) is an upper bound on f(n).]
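The definition can be spot-checked numerically for candidate witnesses c and n0. The helper below, check_big_oh, is our own illustration (evidence over a finite range, not a proof):

```python
def check_big_oh(f, g, c, n0, n_max=10_000):
    """Spot-check f(n) <= c*g(n) for all n0 <= n <= n_max."""
    return all(f(n) <= c * g(n) for n in range(n0, n_max + 1))

# f(n) = 5n + 2 is O(n): the witnesses c = 6, n0 = 3 work.
print(check_big_oh(lambda n: 5 * n + 2, lambda n: n, c=6, n0=3))  # True
```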
Asymptotic Notation
Big O notation is a huge simplification; can we justify it?
– It only makes sense for large problem sizes
– For sufficiently large problem sizes, the highest-order term
swamps all the rest!
Consider R = x² + 3x + 5 as x varies:

x = 0         x² = 0            3x = 0       5 = 5    R = 5
x = 10        x² = 100          3x = 30      5 = 5    R = 135
x = 100       x² = 10,000       3x = 300     5 = 5    R = 10,305
x = 1,000     x² = 1,000,000    3x = 3,000   5 = 5    R = 1,003,005
x = 10,000                                            R = 100,030,005
x = 100,000                                           R = 10,000,300,005
Some properties of Big Oh notation
• The fastest-growing function dominates a sum: O(f(n) + g(n)) is O(max{f(n), g(n)}).
• The product of upper bounds is an upper bound for the product: if f is O(g) and h is O(r), then fh is O(gr).
• Big-Oh is transitive: if f is O(g) and g is O(h), then f is O(h).
• If d is O(f), then a·d is O(f) for any constant a > 0.
• If d is O(f) and e is O(g), then d + e is O(f + g).
• If f(n) = a0 + a1·n + … + ad·n^d, then f(n) is O(n^d).
• log(n^x) is O(log n) for any fixed x > 0.
• Hierarchy of functions: O(1), O(log n), O(n^(1/2)), O(n), O(n log n), O(n²), O(2^n), O(n!).
An Example of Solving Big-Oh

• Thus c = 35 and n0 = 1.

Other Examples
• f(n) = 5n + 2 = O(n)   // g(n) = n
  – f(n) <= 6n for n >= 3 (c = 6, n0 = 3)
• f(n) = n/2 - 3 = O(n)
  – f(n) <= 0.5n for n >= 0 (c = 0.5, n0 = 0)
Classifying Algorithm Based on Big Oh
A function f(n) is said to be of at most logarithmic growth if f(n) = O(log n).
A function f(n) is said to be of at most quadratic growth if f(n) = O(n²).
A function f(n) is said to be of at most polynomial growth if f(n) = O(n^k), for some natural number k > 1.
A function f(n) is said to be of at most exponential growth if there is a constant c such that f(n) = O(c^n), and c > 1.
A function f(n) is said to be of at most factorial growth if f(n) = O(n!).
A function f(n) is said to have constant running time if the size of the input n has no effect on the running time of the algorithm (e.g., assignment of a value to a variable). The equation for this algorithm is f(n) = c.
Other logarithmic classifications: f(n) = O(n log n) and f(n) = O(log log n).
Growth Rates Compared

          n=1    n=2    n=4    n=8     n=16     n=32

1         1      1      1      1       1        1
log n     0      1      2      3       4        5
n         1      2      4      8       16       32
n log n   0      2      8      24      64       160
n²        1      4      16     64      256      1024
n³        1      8      64     512     4096     32768
2^n       2      4      16     256     65536    4294967296
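For readers who want to reproduce the table, here is a small Python sketch (our addition; log is base 2, matching the table):

```python
import math

funcs = [("1",       lambda n: 1),
         ("log n",   lambda n: int(math.log2(n))),
         ("n",       lambda n: n),
         ("n log n", lambda n: n * int(math.log2(n))),
         ("n^2",     lambda n: n ** 2),
         ("n^3",     lambda n: n ** 3),
         ("2^n",     lambda n: 2 ** n)]
ns = [1, 2, 4, 8, 16, 32]
for name, f in funcs:
    print(f"{name:8}" + "".join(f"{f(n):>12}" for n in ns))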


Big-Omega and Big-Theta notations
Big-Oh provides an asymptotic way of saying that a function f is less than or equal to another function g, i.e., f is O(g):
f(n) = O(g(n)) if f(n) grows at the same rate as or slower than g(n).
It represents the upper bound, or the worst case, of the algorithm.

If f = Ω(g), then f is at least as big as g (g is a lower bound on f).
In other words, f(n) is Ω(g(n)) if there exist a real constant c > 0 and an integer n0 >= 1 such that f(n) >= c·g(n) for n >= n0.
f(n) grows faster than or at the same rate as g(n): f(n) = Ω(g(n)).
It represents the best case, or lower bound, of the algorithm.

If f = Θ(g), then f = O(g) and f = Ω(g) (g is both an upper and a lower bound; it is a "tight" fit).
In other words, there exist constants c′ > 0 and c″ > 0 and an integer n0 >= 1 such that c′·g(n) <= f(n) <= c″·g(n) for n >= n0.
Some Words of Caution
Be careful about very large constant factors hidden by asymptotic notation. For example, an algorithm running in time 1,000,000·n is still O(n), but it is less efficient than one running in time 2n² (which is O(n²)) for all n < 500,000. So when comparing two algorithms based on big-Oh notation alone, we should keep the constants in mind.
Little-oh and little-Omega
f(n) grows slower than g(n) (or g(n) grows faster than f(n)):

• if lim(n→∞) f(n)/g(n) = 0
• Notation: f(n) = o(g(n)), pronounced "little-oh"
• In other words, f(n) is o(g(n)) if f(n) becomes insignificant in comparison to g(n) as n tends to infinity.

f(n) grows faster than g(n) (or g(n) grows slower than f(n)):

• if lim(n→∞) f(n)/g(n) = ∞
• Notation: f(n) = ω(g(n)), pronounced "little-omega"

If g(n) = o(f(n)), then f(n) = ω(g(n)).
Example of Asymptotic Analysis
Algorithm prefixAverage1(a):
Input: An n-element array a of numbers
Output: An n-element array b of numbers such that b[i] = (a[0] + a[1] + … + a[i]) / (i+1)
    for i ← 0 to n-1 do
        x ← 0
        for j ← 0 to i do
            x ← x + a[j]
        b[i] ← x / (i+1)
    return array b
Analysis: Running time is O(n²)
Algorithm prefixAverage2(a):
Input: An n-element array a of numbers
Output: An n-element array b of numbers such that b[i] = (a[0] + a[1] + … + a[i]) / (i+1)
    s ← 0
    for i ← 0 to n-1 do
        s ← s + a[i]
        b[i] ← s / (i+1)
    return array b
Analysis: Running time is O(n)
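Runnable Python translations of both algorithms (a sketch; note that the inner loop of prefixAverage1 must add a[j], not a[i]):

```python
def prefix_average_1(a):
    """O(n^2): recompute each prefix sum from scratch."""
    n = len(a)
    b = [0.0] * n
    for i in range(n):
        x = 0
        for j in range(i + 1):   # j = 0..i
            x += a[j]
        b[i] = x / (i + 1)
    return b

def prefix_average_2(a):
    """O(n): carry the running sum s across iterations."""
    n = len(a)
    b = [0.0] * n
    s = 0
    for i in range(n):
        s += a[i]
        b[i] = s / (i + 1)
    return b

print(prefix_average_1([1, 2, 3, 4]))  # [1.0, 1.5, 2.0, 2.5]
print(prefix_average_2([1, 2, 3, 4]))  # [1.0, 1.5, 2.0, 2.5]
```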
Importance of Asymptotics
A Quick Mathematical Review
• Σ(i=1..n) f(i) = f(1) + f(2) + … + f(n)
• Σ(i=0..n) a^i = (1 - a^(n+1)) / (1 - a), for a ≠ 1 (geometric series)
• Σ(i=1..n) i = 1 + 2 + … + n = n(n+1)/2
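These identities are easy to sanity-check numerically; a quick sketch (our addition):

```python
n, a = 50, 0.5
assert sum(range(1, n + 1)) == n * (n + 1) // 2                    # arithmetic series
geometric = sum(a ** i for i in range(n + 1))
assert abs(geometric - (1 - a ** (n + 1)) / (1 - a)) < 1e-12       # geometric series
print("both identities hold for n =", n)
```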
Logarithmic identities: log(xy) = log x + log y; log(x/y) = log x - log y; log(x^a) = a·log x; log_b x = (log x) / (log b).

Floor and ceiling functions: ⌊x⌋ is the largest integer <= x; ⌈x⌉ is the smallest integer >= x.
Simple Justification Techniques
To prove that an algorithm or data structure is correct or fast, we need a mathematical model. Short of a full mathematical analysis, we can use some simple techniques to justify claims about our algorithms.
By Example
To disprove a claim, we can look for an example that contradicts it. If somebody claims that all odd numbers are prime, then the number 9 (= 3×3) proves the claim wrong. Such an instance is called a counterexample.
By Contrapositive
Proof by contrapositive takes advantage of the logical equivalence between "P implies
Q" and "Not Q implies Not P". For example, the assertion "If it is my car, then it is red"
is equivalent to "If that car is not red, then it is not mine". So, to prove "If P, Then Q"
by the method of contrapositive means to prove "If Not Q, Then Not P".
If x and y are two integers for which x+y is even, then x and y have the same parity.
Proof. The contrapositive version of this theorem is "If x and y are two integers with
opposite parity, then their sum must be odd." So we assume x and y have opposite
parity. Since one of these integers is even and the other odd, there is no loss of
generality to suppose x is even and y is odd. Thus, there are integers k and m for which
x = 2k and y = 2m+1. Now then, we compute the sum x+y = 2k + 2m + 1 = 2(k+m) +
1, which is an odd integer by definition.
Simple Justification Techniques
In a proof by contradiction we assume, along with the hypotheses, the logical negation
of the result we wish to prove, and then reach some kind of contradiction. That is, if we
want to prove "If P, Then Q", we assume P and Not Q. The contradiction we arrive at
could be some conclusion contradicting one of our assumptions, or something
obviously untrue like 1 = 0.
There are no positive integer solutions to the equation x² - y² = 1.
Proof. (Proof by Contradiction.) Assume to the contrary that there is a solution (x, y) where x
and y are positive integers. If this is the case, we can factor the left side: x² - y² = (x-y)(x+y) = 1.
Since x and y are integers, it follows that either x-y = 1 and x+y = 1 or x-y = -1 and x+y = -1. In
the first case we can add the two equations to get x = 1 and y = 0, contradicting our assumption
that x and y are positive. The second case is similar, getting x = -1 and y = 0, again contradicting
our assumption.

The difference between the Contrapositive method and the Contradiction method is
subtle. Let's examine how the two methods work when trying to prove "If P, Then Q".
• Method of Contradiction: Assume P and Not Q and prove some sort of contradiction.
• Method of Contrapositive: Assume Not Q and prove Not P.
The method of Contrapositive has the advantage that your goal is clear: Prove Not P. In
the method of Contradiction, your goal is to prove a contradiction, but it is not always
clear what the contradiction is going to be at the start.
Simple Justification Techniques
Induction
If q(1) is true, and whenever q(n) is true for an integer n we can also prove that q(n+1) is true, then q(n) is true for all positive integers n.
For any positive integer n, 1 + 2 + ... + n = n(n+1)/2.
Proof. (Proof by Mathematical Induction.) Let P(n) be the statement "1 + 2 + ... + n = n(n+1)/2". (The idea is that P(n) should be an assertion that, for any n, is verifiably either true or false.) The proof will now proceed in two steps: the initial step and the inductive step.
Initial Step. We must verify that P(1) is True. P(1) asserts "1 = 1(2)/2", which is
clearly true. So we are done with the initial step.
Inductive Step. Here we must prove the following assertion: "If there is a k such that
P(k) is true, then (for this same k) P(k+1) is true." Thus, we assume there is a k such
that 1 + 2 + ... + k = k (k+1)/2. (We call this the inductive assumption.) We must
prove, for this same k, the formula 1 + 2 + ... + k + (k+1) = (k+1)(k+2)/2.
This is not too hard: 1 + 2 + ... + k + (k+1) = k(k+1)/2 + (k+1) = (k(k+1) + 2 (k+1))/2 =
(k+1)(k+2)/2. The first equality is a consequence of the inductive assumption.
Loop Invariant
A loop invariant is a property of a loop that is true before the first iteration (initialization) and remains true after each iteration (maintenance); together with the loop's exit condition (termination), it lets us prove the loop correct.
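As an illustration (ours, since the slide body is not included in this text), the insertion-sort invariant, "before each iteration, a[0..j-1] is sorted", can be checked with assertions:

```python
def insertion_sort_checked(a):
    """Insertion sort with its loop invariant checked by an assertion."""
    for j in range(1, len(a)):
        # Invariant: before this iteration, a[0..j-1] is sorted
        # (initialization: trivially true when j = 1).
        assert all(a[k] <= a[k + 1] for k in range(j - 1))
        key = a[j]
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
        # Maintenance: a[0..j] is sorted again after inserting key.
    # Termination: when the loop ends, the whole array is sorted.
    return a

print(insertion_sort_checked([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```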
Basic Probability
Independence: Two events A and B are independent if P(A ∩ B) = P(A)·P(B).

Conditional probability: The conditional probability that an event B occurs given an event A is denoted by P(B|A), which is defined as P(B|A) = P(A ∩ B) / P(A), provided P(A) > 0.

The expected value of a random variable is the weighted average of all possible values that the random variable can take on. The weights used in computing this average correspond to the probabilities in the case of a discrete random variable, or to the densities in the case of a continuous random variable.

Suppose a random variable X can take value x1 with probability p1, value x2 with probability p2, and so on, up to value xk with probability pk. Then the expectation of this random variable X is defined as

E(X) = x1·p1 + x2·p2 + … + xk·pk

For two arbitrary random variables X and Y, expectation is linear:

E(X + Y) = E(X) + E(Y)

If X and Y are independent, then in addition:

E(XY) = E(X)·E(Y)
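A small numeric check of these rules (our sketch, using a fair die as the example distribution):

```python
def expectation(values, probs):
    """E(X) = x1*p1 + x2*p2 + ... + xk*pk for a discrete random variable."""
    return sum(x * p for x, p in zip(values, probs))

die = list(range(1, 7))          # a fair six-sided die
p = [1 / 6] * 6
ex = expectation(die, p)
print(ex)                        # 3.5

# For two independent dice X and Y, E(XY) equals E(X)*E(Y):
exy = sum(x * y / 36 for x in die for y in die)
print(exy, ex * ex)              # 12.25 12.25
```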
