Introduction to Computer Science:

Programming Methodology

Lecture 8
Data Structure and Algorithm - Intro
Prof. Jiangfan Yu
School of Science and Engineering
Data structure and algorithm
• A data structure is a systematic way of organizing and accessing data

• An algorithm is a step-by-step procedure for performing some task in a finite amount of time
Why study data structure and algorithm?
• Important for all other branches of computer science

• Plays a key role in modern technological innovation

• Moore's law predicts that the density of transistors in integrated circuits doubles roughly every 1 to 2 years

• However, in many areas, performance gains due to improvements in algorithms have greatly exceeded even the dramatic performance gains due to increased processor speed
Why study data structure and algorithm?
• Provides a novel "lens" on processes outside of computer science and technology, such as quantum mechanics, economic markets, and evolution

• Challenging (good for your brain!!) and fun

Example: Integer Multiplication
• Inputs: two n-digit numbers x and y

• Output: the product of x and y

• Primitive operations: adding or multiplying two single-digit numbers

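The slides do not show code for this example; a minimal Python sketch of the familiar grade-school algorithm (an assumed baseline, not necessarily the algorithm intended on the slide) makes the primitive-operation count concrete:

```python
def grade_school_multiply(x, y):
    """Multiply two non-negative integers digit by digit, counting
    single-digit multiplications as the primitive operations."""
    x_digits = [int(d) for d in str(x)][::-1]  # least-significant digit first
    y_digits = [int(d) for d in str(y)][::-1]
    ops = 0
    result = 0
    for i, dy in enumerate(y_digits):
        for j, dx in enumerate(x_digits):
            # one single-digit multiplication; the additions are folded
            # into Python's big-int arithmetic for brevity
            result += dx * dy * 10 ** (i + j)
            ops += 1
    return result, ops

product, ops = grade_school_multiply(5678, 1234)
print(product, ops)  # 7006652 16 -- about n * n single-digit multiplications
```

For two n-digit inputs the nested loops perform n × n single-digit multiplications, which is the cost the lecture's later analysis vocabulary will describe as quadratic.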
The algorithm designer’s mantra

• “Perhaps the most important principle for the good


algorithm designer is to refuse to be content”

Aho, Hopcroft, and Ullman, The Design and Analysis of Computer


Algorithms, 1974
How do we define a "good" algorithm?

• The primary analysis of algorithms involves characterizing the running times and space usage of algorithms and data structure operations

• Running time is a natural measure of "goodness," since time is a precious resource: computer solutions should run as fast as possible

• Space usage is another major issue to consider when we design an algorithm, since we have only limited storage space
Measuring the running time experimentally
Visualize the running time
• Running time and space usage are dependent on the size of the input

• Perform independent experiments on many different test inputs of various sizes

• Visualize the results by plotting the performance of each run of the algorithm as a point
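As a concrete illustration (not from the slides), one way to run such an experiment in Python is with the time module, using the built-in sort as a stand-in for whatever algorithm is under study:

```python
import time
from random import randint

# Run the same algorithm on inputs of increasing size; each
# (n, elapsed) pair would become one point on the plot.
for n in (1_000, 10_000, 100_000, 1_000_000):
    data = [randint(0, n) for _ in range(n)]
    start = time.perf_counter()   # high-resolution wall-clock timer
    sorted(data)
    elapsed = time.perf_counter() - start
    print(n, elapsed)
```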
Challenges of experimental analysis

• Experimental running times of two algorithms are difficult to directly


compare unless the experiments are performed in the same hardware
and software environments

• Experiments can be done only on a limited set of test inputs; hence,


they leave out the running times of inputs not included in the
experiment (and these inputs may be important)

• An algorithm must be fully implemented in order to execute it to


study its running time experimentally
Principle of algorithm analysis 1: Counting
primitive operations
• To analyse the running time of an algorithm without performing experiments, we perform an analysis directly on a high-level description of the algorithm

• We define a set of primitive operations such as the following:
  ✓ Assigning an identifier to an object
  ✓ Determining the object associated with an identifier
  ✓ Performing an arithmetic operation (for example, adding two numbers)
  ✓ Comparing two numbers
  ✓ Accessing a single element of a Python list by index
  ✓ Calling a function (excluding operations executed within the function)
  ✓ Returning from a function
Principle of algorithm analysis 2: Measuring
Operations as a Function of Input Size

• To capture the order of growth of an algorithm's running time, we will associate with each algorithm a function f(n) that characterizes the number of primitive operations performed as a function of the input size n
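For instance (an illustrative accounting, not from the slides), the following loop performs a number of primitive operations proportional to the input size:

```python
def sum_list(data):
    """Sum a list of n numbers, annotated with a primitive-operation count."""
    total = 0                  # 1 assignment
    for value in data:         # n list-element accesses
        total = total + value  # n additions and n assignments
    return total               # 1 return

# Under this accounting f(n) = 3n + 2, so the number of primitive
# operations grows linearly with the input size n.
```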
Principle of algorithm analysis 3: Focusing
on the Worst-Case Input
• An algorithm may run faster on some inputs than it does on others of the same size. Thus, we may wish to express the running time of an algorithm as the function of the input size obtained by taking the average over all possible inputs of the same size

• Unfortunately, such an average-case analysis is typically quite challenging. It requires us to define a probability distribution on the set of inputs, which is often a difficult task

• We will therefore characterize running times in terms of the worst case, as a function of the input size n of the algorithm
The 7 functions used in algorithm analysis
• We may use the following 7 functions to measure the time complexity of an algorithm: constant (1), logarithm (log n), linear (n), n-log-n (n log n), quadratic (n^2), cubic (n^3) and other polynomials, and exponential (b^n)
Asymptotic Analysis
• In algorithm analysis, we focus on the growth rate of the running time as a function of the input size n, taking a "big-picture" approach

• Vocabulary for the analysis and design of algorithms

• "Sweet spot" for high-level reasoning about algorithms

• Coarse enough to suppress unnecessary details, e.g. architecture/language/compiler…

• Sharp enough to make meaningful comparisons between algorithms
The big Oh notation
• Let f(n) and g(n) be functions mapping positive integers to positive real numbers

• We say that f(n) is O(g(n)) if there is a real constant c > 0 and an integer constant n0 ≥ 1 such that
  f(n) ≤ c·g(n), for all n ≥ n0

• This definition is often referred to as the "big-Oh" notation

• Example: The function 8n+5 is O(n), since 8n+5 ≤ 13n for all n ≥ 1 (take c = 13 and n0 = 1)
The big Oh notation
• The big-Oh notation allows us to say that a function f(n) is "less than or equal to" another function g(n) up to a constant factor and in the asymptotic sense as n grows toward infinity

• The big-Oh notation is used widely to characterize running times and space bounds in terms of some parameter n, which varies from problem to problem, but is always defined as a chosen measure of the "size" of the problem
Some Properties of the Big-Oh
Notation
• The big-Oh notation allows us to ignore constant factors and lower-order terms and focus on the main components of a function that affect its growth

• Example: 5n^4 + 3n^3 + 2n^2 + 4n + 1 is O(n^4)

• Example: 2^(n+2) is O(2^n)

• Example: 2n + 100 log n is O(n)

• In general, we should use the big-Oh notation to characterize a function as closely as possible
Comparative analysis
Question: Suppose two algorithms solving the same problem are available: an algorithm A, which has a running time of O(n), and an algorithm B, which has a running time of O(n^2). Which algorithm is better?

Answer: Algorithm A is asymptotically better than algorithm B
Comparative analysis
• We can use the big-Oh notation to order classes of functions by asymptotic growth rate

• Our seven functions are ordered by increasing growth rate in the following sequence:
  1, log n, n, n log n, n^2, n^3, 2^n
Why is AlphaGo a remarkable achievement?
• If we use brute force to search for the best move in Go, the time complexity is on the order of O(10^170)

• The search space is even larger than the number of atoms in the universe!!!
The line of tractability
• To differentiate efficient from inefficient algorithms, the general line is drawn between polynomial-time algorithms and exponential-time algorithms

• The distinction between polynomial-time and exponential-time algorithms is considered a robust measure of tractability
Example: Finding the smallest number in a list

• What is the time complexity of this algorithm?


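The algorithm itself appears only as an image in the original slides; a minimal sketch of the kind of linear scan the question refers to (the function name is an assumption):

```python
def find_min(data):
    """Return the smallest value in a non-empty list using a single scan."""
    smallest = data[0]        # start with the first element
    for value in data:        # examine each of the n elements once
        if value < smallest:  # one comparison per element
            smallest = value
    return smallest

print(find_min([31, 41, 59, 26, 53]))  # 26
```

Since each element is examined exactly once and each iteration performs a constant number of primitive operations, the running time is O(n).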
Recursion
• Recursion is a technique by which a function makes one or more calls to itself during execution

• Recursion provides an elegant and powerful alternative for performing repetitive tasks

• Recursion is an important technique in the study of data structures and algorithms
Inception (the film's dream-within-a-dream plot is a popular analogy for recursion)
Example: The factorial function
• The factorial of a positive integer n, denoted n!, is defined as n! = n · (n−1) · · · 2 · 1, with the convention that 0! = 1

• The factorial function is important because it is known to equal the number of ways in which n distinct items can be arranged into a sequence, that is, the number of permutations of n items
The recursive definition

• First, a recursive definition contains one or more base cases, which are defined non-recursively in terms of fixed quantities

• Second, it also contains one or more recursive cases, which are defined by appealing to the definition of the function being defined
The recursive definition of
factorial function
• The factorial function can be naturally defined in a recursive way, for example, 5! = 5 · (4 · 3 · 2 · 1) = 5 · 4!

• More generally, for a positive integer n, we can define n! to be n · (n−1)!

• Therefore, the recursive definition of the factorial function is:
  n! = 1 if n = 0, and n! = n · (n−1)! if n ≥ 1
Solution
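The solution slide's code is not reproduced in this text; the recursive definition above translates directly into Python along these lines:

```python
def factorial(n):
    """Compute n! for a non-negative integer n using the recursive definition."""
    if n == 0:                       # base case: 0! = 1
        return 1
    return n * factorial(n - 1)      # recursive case: n! = n * (n-1)!

print(factorial(5))  # 120
```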
How Python implements recursion
• In Python, each time a function (recursive or otherwise) is called, a structure known as an activation record or frame is created to store information about the progress of that invocation of the function

• This activation record stores the function call's parameters and local variables

• When the execution of a function leads to a nested function call, the execution of the former call is suspended and its activation record stores the place in the source code at which the flow of control should continue upon return of the nested call
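One practical consequence (not on the slides) is that Python caps the number of simultaneously active frames to guard against runaway recursion; the limit can be inspected or raised via the standard sys module:

```python
import sys

print(sys.getrecursionlimit())  # typically 1000 by default

# A call such as factorial(2000), with factorial as defined above,
# would exceed the default limit and raise RecursionError; the limit
# can be raised when deeper recursion is genuinely needed.
sys.setrecursionlimit(5000)
```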
The recursive trace
Example: Drawing an English ruler
• We denote the length of the tick designating a whole inch as the major tick length

• Between the marks for whole inches, the ruler contains a series of minor ticks, placed at intervals of 1/2 inch, 1/4 inch, and so on

• As the size of the interval decreases by half, the tick length decreases by one
Recursive implementation of English
ruler

• An interval with a central tick length L ≥ 1 is composed of:
  ✓ An interval with a central tick length L−1
  ✓ A single tick of length L
  ✓ An interval with a central tick length L−1
Solution
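The solution slide's code is not reproduced here; a sketch following the decomposition described above (function names are assumptions):

```python
def draw_line(tick_length, tick_label=''):
    """Draw one line with the given tick length, followed by an optional label."""
    line = '-' * tick_length
    if tick_label:
        line += ' ' + tick_label
    print(line)

def draw_interval(center_length):
    """Draw the minor ticks of an interval around a central tick of this length."""
    if center_length > 0:                 # stop once the tick length reaches 0
        draw_interval(center_length - 1)  # recursively draw the top half
        draw_line(center_length)          # draw the central tick
        draw_interval(center_length - 1)  # recursively draw the bottom half

def draw_ruler(num_inches, major_length):
    """Draw an English ruler with the given number of inches and major tick length."""
    draw_line(major_length, '0')           # line for inch 0
    for j in range(1, 1 + num_inches):
        draw_interval(major_length - 1)    # interior ticks for this inch
        draw_line(major_length, str(j))    # line and label for inch j

draw_ruler(2, 3)
```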
The recursive
trace for
English ruler
Example: Binary search
• A classic and very useful recursive algorithm, binary search, can be used to efficiently locate a target value within a sorted sequence of n elements

• When the sequence is unsorted, the standard approach to search for a target value is to use a loop to examine every element, until either finding the target or exhausting the data set; this is known as the sequential search algorithm
Binary search
• When the sequence is sorted and indexable, binary search is a much more efficient algorithm

• For any index j, we know that all the values stored at indices 0, …, j−1 are less than or equal to the value at index j, and all the values stored at indices j+1, …, n−1 are greater than or equal to it
The strategy of binary search
• We call an element of the sequence a candidate if, at the current stage of the search, we cannot rule out that this item matches the target

• The algorithm maintains two parameters, low and high, such that all the candidate entries have index at least low and at most high

• Initially, low = 0 and high = n−1. We then compare the target value to the median candidate, that is, the item data[mid] with index mid = (low + high) // 2 (using integer division)
The strategy of binary search
• If the target equals data[mid], then we have found the item we are looking for, and the search terminates successfully

• If target < data[mid], then we recur on the first half of the sequence, that is, on the interval of indices from low to mid−1

• If target > data[mid], then we recur on the second half of the sequence, that is, on the interval of indices from mid+1 to high
Solution
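The solution slide's code is not reproduced in this text; the strategy above translates into Python along these lines:

```python
def binary_search(data, target, low, high):
    """Return True if target is found in data[low:high+1]; data must be sorted."""
    if low > high:
        return False                     # interval is empty; no match
    mid = (low + high) // 2
    if target == data[mid]:              # found a match
        return True
    elif target < data[mid]:
        # recur on the first half, indices low to mid-1
        return binary_search(data, target, low, mid - 1)
    else:
        # recur on the second half, indices mid+1 to high
        return binary_search(data, target, mid + 1, high)

data = [2, 4, 5, 7, 8, 9, 12, 14, 17, 19, 22, 25, 27, 28, 33, 37]
print(binary_search(data, 22, 0, len(data) - 1))  # True
```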
Time complexity of binary search

Proposition: The binary search algorithm runs in O(log n) time for a sorted sequence with n elements
Proof

Each recursive call performs a constant number of primitive operations and reduces the number of remaining candidates, high − low + 1, by at least half. Starting from n candidates, after j recursive calls at most n/2^j candidates remain, so the recursion must stop after at most ⌊log n⌋ + 1 calls. Hence the total running time is O(log n).
