
ECS 32B

Introduction to Data Structures


Winter 2024 - Instructor Siena Saltzen
Administrative
● HW 1 Due Tmr
● HW 2 Coming out tonight/Tmr morning
● SLAC NIGHT TMR!
● SDC Peeps should be good
● Q 2 thoughts?
● Midterm next week!
● Poke me for Practice Exam, should be done by Thurs.
● EXAM INFO:
○ In Class
○ On Gradescope
○ One 3x5 Note Card
○ Will be 45-50Qs, BUT CURVED
○ No other notes/tools
○ No Quiz Exam Week
○ Pseudo Programming
○ Material Through this week
Searching

Let's revisit two different types of search. Search is about finding one item in a
collection of items. We’ll assume that we’re searching arrays. Usually we’re just looking
to confirm that the item is in the list, so we return True or False, but we could also return
the location or index of the item in the list.

C++ has the std::find and std::binary_search algorithms to do this same thing.


Search

Not only will we look at two different types of search, but we’ll look at two
different implementations of each type:
• sequential (or linear) search – iterative and recursive
• binary search – iterative and recursive
Sequential search

Sequential search using iteration (i.e., a loop) begins with looking at the list element at index 0. If that element is what the function is searching for, the function is done and returns True. If it is not what is being looked for, the index is incremented, and the steps above are repeated for index 1. This continues until either the target item is found, or the end of the list is reached (in which case, the function returns False).
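In Python, a minimal sketch of that loop might look like this (alist and item follow the slide's naming):

def sequential_search(alist, item):
    # Walk the list from index 0, comparing each element to the target.
    pos = 0
    while pos < len(alist):
        if alist[pos] == item:
            return True      # to return the location instead, return pos here
        pos = pos + 1
    return False             # reached the end without finding the target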

How do we change this if we want to return the index of the found item?
What’s the time complexity of sequential search?
Sequential Search
O(n): even if targets are evenly distributed and we find them after about n/2 comparisons on average, that is still O(n).
What operation(s) are we counting in this analysis?

● The comparisons that occur on each loop iteration

If the list is ordered, how would we change the function? We don't always have to go to the end of the list to find out if the target item is in the list.
Add another check for alist[pos] > item (or <, depending on how it's sorted).
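As a sketch, the ordered version just adds that early-exit test (assuming ascending order):

def ordered_sequential_search(alist, item):
    pos = 0
    while pos < len(alist):
        if alist[pos] == item:
            return True
        if alist[pos] > item:   # we've passed where item would be; stop early
            return False
        pos = pos + 1
    return False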

How would that change the time complexity?

Might be some impact if you do a lot of searching for items that aren’t
there, but if not, it’s still O(n).
Sequential search with recursion

The recursive version of sequential search begins by checking to see if it's looking at the empty list. If so, the search is done and unsuccessful, so return False. If it's not the empty list, the function looks to see if the first thing on the list (alist[0]) is the target item. If so, return True. Otherwise, the function calls itself with the list of all elements except the first element (alist[1:]).
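A minimal Python sketch of that recursion:

def rec_sequential_search(alist, item):
    if len(alist) == 0:        # base case: empty list, search failed
        return False
    if alist[0] == item:       # base case: first element is the target
        return True
    return rec_sequential_search(alist[1:], item)   # recurse on the rest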
SS with Recursion

What’s the time complexity of sequential search with recursion?

Still O(n)

What is the impact of using the slice operator (alist[1:])?

Slicing copies the k elements it extracts, so each alist[1:] costs O(k). Since k is nearly n on every call here, the copying alone adds up to O(n^2) total work; passing a start index instead of a slice avoids it.
Binary search with iteration

Binary search is only useful if the list is ordered. When a collection of items is sorted, binary
search for one item can be much faster than a sequential search.
The search starts by examining the item at the midpoint of the list. If that item is the target, then
the procedure is done and returns True. If not, the procedure determines whether the target is
less than or greater than the midpoint value. Then the loop starts again, but on the segment of
the list that might contain the target value. This approach is often called a divide-and-conquer
strategy, because we're repeatedly cutting the search scope in half.
https://www.tutorialspoint.com/binary-search-recursive-and-iterative-in-c-program
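A minimal Python sketch of the iterative version (first and last bracket the segment that might still contain the target):

def binary_search(alist, item):
    first = 0
    last = len(alist) - 1
    while first <= last:
        mid = (first + last) // 2
        if alist[mid] == item:
            return True
        elif item < alist[mid]:
            last = mid - 1     # target, if present, is in the left half
        else:
            first = mid + 1    # target, if present, is in the right half
    return False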
Binary Search
What’s the time complexity of binary search?

O(log2 n)

Why? Because we’re reducing what’s left to search by ½ with every comparison.

A sequential search on an unordered list is probably more simply implemented as a loop instead of a series of recursive function calls. But for binary search on an ordered list, a recursive search algorithm is a simpler and more elegant solution than the previous loop-based approach...
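One way to write it recursively in Python (passing indices rather than slices, so the copying cost noted earlier doesn't apply):

def rec_binary_search(alist, item, first=0, last=None):
    if last is None:
        last = len(alist) - 1
    if first > last:           # empty segment: not found
        return False
    mid = (first + last) // 2
    if alist[mid] == item:
        return True
    elif item < alist[mid]:
        return rec_binary_search(alist, item, first, mid - 1)
    else:
        return rec_binary_search(alist, item, mid + 1, last)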
Which is Better

For binary search on an ordered list, a recursive search algorithm is arguably a simpler and more elegant solution than the loop-based approach. The recursive approach still uses up stack space, but the space complexity, like the time complexity, is O(log2 n). That's quite acceptable.
Fun fact: Python's default recursion limit is about 1000 frames on the stack. What about C++ recursion depth? Hah! There is no language-defined limit on the depth of recursion; you find out what it is when the stack overflows!
Recursion vs. Iteration

Both involve the repetition of a sequence of statements.

An iterative solution exists for any problem solvable by recursion. (On the midterm)

An iterative solution may be more efficient.

A recursive solution is often easier to understand.
Recursion vs. Iteration

Although the iterative and recursive solutions to binary search in a sorted list achieve the same result with roughly the same number of steps, there is technically more overhead for function calls and returns than for simple loop repetition, along with the stack usage previously noted.

This difference is small, however. Generally, if it is easier to conceptualize a problem as recursive, it should be coded as such. Problems like the Fibonacci numbers, where naive recursion redoes the same subproblems over and over, are the exceptions.
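For instance, compare these two Fibonacci sketches: the recursive one recomputes the same subproblems over and over, while the loop does linear work:

def fib_recursive(n):
    # Elegant, but exponential: fib(n-2) is recomputed inside fib(n-1).
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)

def fib_iterative(n):
    # Same answer in O(n) time and O(1) space.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a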
Ternary Search??
Given some basic code structure and the recursive binary search, can you code ternary search?

A binary search divides the search range into two parts, while a ternary search does the same but into three roughly equal parts.
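A sketch of what that might look like, patterned after the recursive binary search (mid1 and mid2 are the two probe points that split the range into thirds):

def ternary_search(alist, item, first=0, last=None):
    if last is None:
        last = len(alist) - 1
    if first > last:
        return False
    third = (last - first) // 3
    mid1 = first + third
    mid2 = last - third
    if alist[mid1] == item or alist[mid2] == item:
        return True
    if item < alist[mid1]:
        return ternary_search(alist, item, first, mid1 - 1)
    elif item > alist[mid2]:
        return ternary_search(alist, item, mid2 + 1, last)
    else:
        return ternary_search(alist, item, mid1 + 1, mid2 - 1)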
Bonus Point

What is the big O of Ternary Search?

O(log3 N) (still O(log N); the base only changes a constant factor)
Sorting and Searching

Computers are essential for keeping track of large quantities of data and finding little pieces of that data on demand. Searching for and finding data when necessary is made much easier when the data is sorted in some way, so computer people think a lot about how to sort things. Finding medical records, banking information, income tax returns, driver's license information, and your academic records all depends on the information being sorted.
Fun Sorting Terms

Stable sorting algorithm

Parallelizable algorithm

In-place sorting algorithm

Online vs offline algorithm


Stable Sorting
Stable sorting algorithms maintain the relative order of records with equal keys (i.e. values). That is, a
sorting algorithm is stable if whenever there are two records R and S with the same key and with R
appearing before S in the original list, R will appear before S in the sorted list.
Stability is important to preserve order over multiple sorts on the same data set. For example, say
that student records consisting of name and class section are sorted dynamically, first by name, then
by class section. If a stable sorting algorithm is used in both cases, the sort-by-class-section
operation will not change the name order; with an unstable sort, it could be that sorting by section
shuffles the name order, resulting in a non-alphabetical list of students.

Source : Wikipedia
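A quick illustration using Python's built-in sorted, which is stable (the records here are made up):

records = [("Ann", 2), ("Bob", 1), ("Cal", 2), ("Dee", 1)]   # (name, section)
by_name = sorted(records)                           # alphabetical by name
by_section = sorted(by_name, key=lambda r: r[1])    # stable sort by section
print(by_section)   # [('Bob', 1), ('Dee', 1), ('Ann', 2), ('Cal', 2)]
# Within each section the names stay alphabetical: the first sort's order survives.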
In-Place
An in-place sorting algorithm is a sorting algorithm that sorts the input array in place, using at most a small, constant amount of additional memory. This means that the algorithm does not need to create a new array to store the sorted elements.

In-place sorting algorithms are often used in applications where memory is limited.
Eg. Selection Sort: An in-place sorting algorithm that divides the list into two parts — the
sorted part and the unsorted part. It repeatedly selects the smallest (or largest) element from the
unsorted part and moves it to the end of the sorted part.
Online vs Offline algorithm
● In computer science, an online algorithm is one that can process its input
piece-by-piece in a serial fashion, i.e., in the order that the input is fed to the
algorithm, without having the entire input available from the start. In contrast, an
offline algorithm is given the whole problem data from the beginning and is required
to output an answer which solves the problem at hand.

○ An offline algorithm requires all information BEFORE the algorithm starts.


■ For example: selection sort is offline because step 1 is: Find the minimum
value in the list. To do this, you need to have the entire list available -
otherwise, how could you possibly know what the minimum value is? You
cannot.
○ Insertion sort, by contrast, is online because it does not need to know anything
about what values it will sort and the information is requested WHILE the
algorithm is running. Simply put, it can grab new values at every iteration.

Source : Wikipedia
Parallelizable
An algorithm is a sequence of steps that takes inputs from the user and, after some computation, produces an output. A parallel algorithm is an algorithm that can execute several instructions simultaneously on different processing devices and then combine all the individual outputs to produce the final result.
Parallelism is the process of executing several sets of instructions simultaneously. It reduces the total computational time. Parallelism can be implemented by using parallel computers, i.e. a computer with many processors. Parallel computers require parallel algorithms, programming languages, compilers and operating systems that support multitasking.

Source : Wikipedia
Sorting and Searching

Does the efficiency of one approach to sorting when compared to another really make a difference? It could. Here's an example of a well-known, simple, but not very efficient sorting algorithm that's easily implemented using a list.
Selection Sort

Let's say we want to sort the values in a list in increasing order. One way to approach the problem would be to use an algorithm called selection sort. We start by setting a pointer to the first element in the list; this is where the smallest value in the list will be placed. Then we'll look at every value in this unsorted list and find the minimum value.

Once we've found the minimum value, we swap that value with the one we selected at the beginning. At this point we know that the smallest number in the list is in the first (index 0) element in the list. That is, the first element is sorted, and the rest of the list remains unsorted. So now we can select the second element of the list to be the location which will hold the next smallest value in the list.

In other words, we'll do all that stuff we just did, only we'll do it only to the unsorted part of the list -- in this case, all but the first element.
Continue with the Steps
Selection sort

As the example proceeded, you watched the red arrow on the left and the yellow arrow on the right move down the list. The arrows represent two different variables, each one containing an index into the list. A program that performs a selection sort on a list of numbers follows; when you look at it, think of the outer loop as controlling the movement of the red arrow, and the inner loop as controlling the movement of the yellow arrow.
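A minimal Python sketch of that two-loop structure (the slide's original listing isn't reproduced here; min_pos is just an illustrative name):

def selection_sort(alist):
    for i in range(len(alist) - 1):         # outer loop: the red arrow
        min_pos = i
        for j in range(i + 1, len(alist)):  # inner loop: the yellow arrow
            if alist[j] < alist[min_pos]:
                min_pos = j                 # remember the smallest seen so far
        alist[i], alist[min_pos] = alist[min_pos], alist[i]  # swap into place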
Selection Sort

Some textbooks do this slightly differently: they start at the bottom of the list to put the largest value in the list in the correct place, then move upward to place the next largest value, and so on. Whether the sorting goes from small to large or large to small, the algorithm is still selection sort.
Now for the good stuff

The selection sort algorithm is a comparison-based sort, so to get some sense of the time requirements, we count the comparisons that must be made to complete the sorting.

Estimating time required to sort
We can go back to the selection sort example and count
the comparisons. The first pass through the list of 5
elements started with 16 being compared to 3, then 3
was compared to 19, 8, and 12. There were 4
comparisons. The value 3 was moved into the location
at index 0. Then the second pass through the list
began, starting with index 1. 16 was compared to 19,
then 16 was compared to 8, which became the new
minimum and was compared to 12. So among 4
elements in the list, there were 3 comparisons.
It takes 4 passes through the list to get it completely
sorted. There are 4 comparisons on the first pass, 3
comparisons on the second pass, 2 comparisons on the
third pass, and 1 comparison on the last pass. That is, it
takes 4 + 3 + 2 + 1 = 10 comparisons to sort a list of five
values.
If you do this same computation on a list with six values,
you'll find it takes 5 + 4 + 3 + 2 + 1 = 15 comparisons to
sort the list. Do you see a familiar pattern?

The number of comparisons required to perform selection sort on a list of N values is given by the expression N*(N-1)/2. The time complexity here is O(N^2).
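A quick sanity check of that formula in Python:

def selection_sort_comparisons(n):
    # (n-1) + (n-2) + ... + 1 comparisons across the passes
    return sum(range(1, n))

print(selection_sort_comparisons(5))        # 10
print(selection_sort_comparisons(6))        # 15
print(5 * 4 // 2, 6 * 5 // 2)               # 10 15 -- matches N*(N-1)/2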
Estimating time required to sort
Either way, it should be easy to see that as N, the number of values in the list, gets very big, the number of comparisons needed to sort the list grows in proportion to N^2, with the other terms becoming insignificant by comparison.
So…
So sorting a list of 1,000 values would require approximately 1,000,000 comparisons. Similarly, sorting a list of 1,000,000 values would take approximately 1,000,000,000,000 comparisons.

As the number of values to be sorted grows, the number of comparisons required to sort them grows much faster. Fortunately, there are other sorting algorithms that are much less time-consuming, and we'll talk about them too. In the meantime, here are some real numbers to help you think about just how long it might take to sort some really big lists...
Estimating time required to sort
Let's assume that your computer could make 1 billion (1,000,000,000) comparisons per second. That's a lot of comparisons in a second. And let's say your computer was using selection sort to sort the names of the people in a hypothetical telephone book of the whole world: about 7,000,000,000 names. Here's some mathematical food for thought.
But note that the best sorting algorithms can run in N log2 N time instead of N^2 time. If N is 7,000,000,000, then log2 N is just a little less than 33. So if we round up to 33, N log2 N would be 231,000,000,000 comparisons instead of 49,000,000,000,000,000,000 (N^2) comparisons.
If our computer can perform 1,000,000,000 comparisons per second, then sorting all the names in the whole-world phone book now takes 231 seconds instead of about 1554 years. And that's why it's important to think about the efficiency of algorithms, especially as the size of your data set gets really, really big.
Variation on a Theme
Insertion sort
Insertion sort takes a slightly different approach. Let’s assume
that when we begin to sort the elements of a list, the very first
element represents a sorted list. Everything after that remains to
be sorted.
So we store the first value in the unsorted portion (indicated by
the red arrow) as the item to be inserted into the sorted portion.
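A minimal Python sketch of that process:

def insertion_sort(alist):
    for i in range(1, len(alist)):       # alist[:i] is the sorted portion
        item = alist[i]                  # the value to insert (the red arrow)
        pos = i
        # Shift larger sorted elements right until item's spot opens up.
        while pos > 0 and alist[pos - 1] > item:
            alist[pos] = alist[pos - 1]
            pos = pos - 1
        alist[pos] = item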
Insertion Sort

What input would let insertion sort show off its best possible performance? (Bonus Point)

A list that’s already sorted.

How many comparisons would be made? (Bonus Point)

How many elements would be moved? (Bonus Point)


Insertion Sort

What input would be the worst possible case for insertion sort? (Bonus Point)

Sorted, but reversed.

How many comparisons would be made?

How many elements would be moved?


Big O

So what’s the worst case Big-O for insertion sort?

O(n^2), or pretty much the same as for selection sort


Can we sort faster than O(n^2)?

Selection sort and insertion sort both have worst case time complexity
of O(n^2). For small n, that's perfectly acceptable. But as we saw
previously, when n gets really big, O(n^2) is far from acceptable.

Back when we were analyzing the time complexity of selection sort, we raised the possibility of O(n lg n) time complexity. As we saw, that would be a huge advantage when sorting really large data sets.

Is it possible to get that kind of speedup? Yes.


Tune in Wed for the answer!
Merge
Mergesort takes a different approach to the problem. It falls in the class of
algorithms called “divide and conquer”.
In mergesort, the problem space is continually split in half by applying the
algorithm recursively to each half, until the base case is reached.
A simple algorithm for mergesort is:
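(A minimal Python sketch of one common formulation; the slide's original listing may have differed.)

def merge_sort(alist):
    if len(alist) <= 1:              # base case: 0 or 1 items are sorted
        return alist
    mid = len(alist) // 2
    left = merge_sort(alist[:mid])   # recursively sort each half
    right = merge_sort(alist[mid:])
    return merge(left, right)        # merge the two sorted halves

def merge(left, right):
    result = []
    i = j = 0
    # Repeatedly take the smaller front element to keep sorted order.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= keeps the sort stable
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])          # one of these is already empty
    result.extend(right[j:])
    return result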

where ‘list’ could be any sequential data structure


Mergesort
Here’s how these functions would sort a list of integers:
Mergesort
What kind of time complexity are we dealing with here?
Let’s just look at half the work. Multiply by two if you’re concerned.

Those last three lines represent the merge of n elements from the original list to larger
and larger lists, along with the required comparisons to maintain sorted order while
merging. So each round of merging is O(n).
Mergesort

What kind of time complexity are we dealing with here?

There are 3 rounds of merging when n = 8. How many rounds of merging would be required if n = 16? n = 32? What does that tell you?

How does O(lg n) sound?
Mergesort

What kind of time complexity are we dealing with here?

Worst case time complexity for mergesort is O(n lg n).

What's the best case time complexity for mergesort? O(n lg n) again.
Mergesort

● What about space complexity for this approach?
● With all the list copying, we're effectively using up another list's worth of memory at each level.
● n new list cells at each level and 2 lg n levels implies O(n lg n) worst case space complexity.

There's another way to do mergesort that only takes up 2 * n (i.e. O(n)) space. It involves creating another list of size n to be used as temporary storage. Partially sorted components are copied to the temp list, then back to the original list as needed. That program is less elegant.
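A sketch of that O(n)-space variant, assuming a single shared temp list (and, as noted, it is less elegant):

def merge_sort_buffered(alist, temp=None, first=0, last=None):
    if temp is None:                 # top-level call: allocate the one temp list
        temp = [None] * len(alist)
        last = len(alist) - 1
    if first >= last:
        return
    mid = (first + last) // 2
    merge_sort_buffered(alist, temp, first, mid)
    merge_sort_buffered(alist, temp, mid + 1, last)
    temp[first:last + 1] = alist[first:last + 1]   # copy the segment out
    i, j = first, mid + 1
    for k in range(first, last + 1):               # merge back into alist
        if j > last or (i <= mid and temp[i] <= temp[j]):
            alist[k] = temp[i]
            i += 1
        else:
            alist[k] = temp[j]
            j += 1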
As an Aside
Applications of Merge Sort:
● Sorting large datasets: Merge sort is particularly well-suited for sorting large datasets due to its guaranteed worst-case time complexity of O(n log n).
● External sorting: Merge sort is commonly used in external sorting, where the data to be sorted is too large to fit into memory.
● Custom sorting: Merge sort can be adapted to handle different input distributions, such as partially sorted, nearly sorted, or completely unsorted data.

Advantages of Merge Sort:


● Stability: Merge sort is a stable sorting algorithm, which means it maintains the relative order of equal elements in the input array.
● Guaranteed worst-case performance: Merge sort has a worst-case time complexity of O(N log N), which means it performs well even on large datasets.
● Parallelizable: Merge sort is a naturally parallelizable algorithm, which means it can be easily parallelized to take advantage of multiple processors or
threads.

Drawbacks of Merge Sort:


● Space complexity: Merge sort requires additional memory to store the merged sub-arrays during the sorting process.
● Not in-place: Merge sort is not an in-place sorting algorithm, which means it requires additional memory to store the sorted data. This can be a
disadvantage in applications where memory usage is a concern.
● Not always optimal for small datasets: For small datasets, Merge sort has a higher time complexity than some other sorting algorithms, such as
insertion sort. This can result in slower performance for very small datasets.
Mergesort
Merge Sort Visualizer
Quicksort
If we want to sort big sequences quickly, without the extra memory and copying back
and forth of mergesort, the answer is quicksort.

In real-world time, it is one of the fastest general-purpose sorting algorithms known.

It can be performed without extra memory (but our version isn't that one). While its
worst-case run time is O(n^2), its average run time is O(n lg n). A simple algorithm for
quicksort is:
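A minimal Python sketch in that spirit (this is the extra-memory version mentioned above, with the naive first-element pivot):

def quick_sort(alist):
    if len(alist) <= 1:              # base case
        return alist
    pivot = alist[0]                 # simple, not always good, pivot choice
    less = [x for x in alist[1:] if x < pivot]
    more = [x for x in alist[1:] if x >= pivot]
    return quick_sort(less) + [pivot] + quick_sort(more)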
Quicksort
Here’s an imperfect trace of quicksort on a small list (assume we’ve drawn little boxes to
represent the list):
Quicksort complexity

What input will give quicksort its worst performance? (Bonus)

Performance is best when the pivot splits the unsorted portion in half. It's worst when the pivot doesn't split it at all.
Performance depends on choosing good pivots. With good pivot choices, average case behavior for quicksort is O(n lg n). Common choices:
● Always pick the first element as the pivot.
● Always pick the last element as the pivot.
● Pick a random element as the pivot.
● Pick the middle element as the pivot.
Choosing a Pivot
Simple: Pick the first or last element of the range. (Bad on partially sorted input.)

Better: Pick the item in the middle of the range. (Better on partially sorted input.)

However, picking any arbitrary element runs the risk of poorly partitioning the array of size n into two arrays of size 1 and n-1. If you do that often enough, your quicksort runs the risk of becoming O(n^2).

One improvement I've seen is to pick median(first, last, mid). In the worst case, it can still go to O(n^2), but probabilistically, this is a rare case.

For most data, picking the first or last is sufficient. But if you find that you're running into worst-case scenarios often (partially sorted input), the first option would be to pick the central value (which is a statistically good pivot for partially sorted data).

Choosing a random pivot minimizes the chance that you will encounter worst-case O(n^2) performance, since always choosing the first or last would cause worst-case performance for nearly-sorted or nearly-reverse-sorted data. Beware of the relative performance of comparisons, though; if your comparisons are costly, then median-of-three does more comparisons than choosing a single pivot value at random.
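A sketch of the median-of-three idea (returns the index whose value is the median of the first, middle, and last values):

def median_of_three(alist, first, last):
    mid = (first + last) // 2
    a, b, c = alist[first], alist[mid], alist[last]
    if (a <= b <= c) or (c <= b <= a):
        return mid
    if (b <= a <= c) or (c <= a <= b):
        return first
    return last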
Complexity
● Best Case: Ω(N log N)
The best-case scenario for quicksort occurs when the pivot chosen at each step divides the array into roughly equal halves. In this case, the algorithm makes balanced partitions, leading to efficient sorting.
● Average Case: Θ(N log N)
Quicksort's average-case performance is usually very good in practice, making it one of the fastest sorting algorithms.
● Worst Case: O(N^2)
The worst-case scenario for quicksort occurs when the pivot at each step consistently results in highly unbalanced partitions, e.g., when the array is already sorted and the pivot is always chosen as the smallest or largest element. To mitigate the worst case, various techniques are used, such as choosing a good pivot (e.g., median of three) and using a randomized algorithm (randomized quicksort) to shuffle the elements before sorting.
● Auxiliary Space: O(1), if we don't consider the recursive stack space. If we do consider the recursive stack space, then in the worst case quicksort uses O(N) space.
General

Advantages of Quick Sort:


● It is a divide-and-conquer algorithm that makes it easier to solve problems.
● It is efficient on large data sets.
● It has a low overhead, as it only requires a small amount of memory to function.

Disadvantages of Quick Sort:


● It has a worst-case time complexity of O(N^2), which occurs when the pivot is chosen poorly.
● It is not a good choice for small data sets.
● It is not a stable sort, meaning that if two elements have the same key, their relative order is not guaranteed to be preserved in the sorted output, because elements are swapped according to the pivot's position (without considering their original positions).
Bubble sort and Shell sort

They are well-known approaches to sorting that are part of the computer
science vocabulary, even though they are often not particularly useful in
practice.
Bubble Sort
Bubble Sort is the simplest sorting algorithm; it works by repeatedly swapping adjacent elements if they are in the wrong order. This algorithm is not suitable for large data sets, as its average and worst-case time complexity is quite high.
In the Bubble Sort algorithm,
● traverse from the left and compare adjacent elements; the higher one is placed on the right side.
● In this way, the largest element is moved to the rightmost end at first.
● This process is then continued to find the second largest and place it, and so on, until the data is sorted.
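A minimal Python sketch, with the early-exit check that makes it cheap on almost-sorted input:

def bubble_sort(alist):
    n = len(alist)
    for i in range(n - 1):
        swapped = False
        # Each pass bubbles the largest remaining value to the right end.
        for j in range(n - 1 - i):
            if alist[j] > alist[j + 1]:
                alist[j], alist[j + 1] = alist[j + 1], alist[j]
                swapped = True
        if not swapped:              # no swaps means already sorted
            break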
Bubble Sort

https://www.geeksforgeeks.org/bubble-sort/?ref=header_search
Bubble Sort complexity
Time Complexity: O(N^2)
Auxiliary Space: O(1)

Advantages of Bubble Sort:


● Bubble sort is easy to understand and implement.
● It does not require any additional memory space.
● It is a stable sorting algorithm, meaning that elements with the same key value maintain their relative order in the sorted
output.

Disadvantages of Bubble Sort:


● Bubble sort has a time complexity of O(N^2) which makes it very slow for large data sets.

Due to its simplicity, bubble sort is often used to introduce the concept of a sorting algorithm. In computer graphics, it is popular for
its capability to detect a tiny error (like a swap of just two elements) in almost-sorted arrays and fix it with just linear complexity (2n).
Shell Sort

Shell sort is a generalized version of the insertion sort algorithm. It first sorts elements that are far apart from each other and successively reduces the interval between the elements to be sorted.
● Step 1 − Start
● Step 2 − Initialize the value of the gap size, h
● Step 3 − Divide the list into smaller sub-lists, each with an interval of h
● Step 4 − Sort these sub-lists using insertion sort
● Step 5 − Reduce the gap and repeat from Step 3 until the list is sorted
● Step 6 − Print the sorted list
● Step 7 − Stop
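A minimal Python sketch using the simple gap sequence n/2, n/4, ..., 1:

def shell_sort(alist):
    gap = len(alist) // 2
    while gap > 0:
        # Insertion-sort each gap-separated sub-list.
        for i in range(gap, len(alist)):
            item = alist[i]
            pos = i
            while pos >= gap and alist[pos - gap] > item:
                alist[pos] = alist[pos - gap]
                pos -= gap
            alist[pos] = item
        gap = gap // 2               # shrink the interval and repeat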
Worst Case Complexity
The worst-case complexity for shell sort is O(n^2).

Best Case Complexity
When the given array is already sorted, the total count of comparisons for each interval is equal to the size of the given array. So the best-case complexity is Ω(n log n).

Average Case Complexity
Shell sort's average-case complexity depends on the gap sequence selected by the programmer: roughly Θ(n (log n)^2), which for common gap sequences works out to between O(n log n) and O(n^1.25).

Space Complexity
The space complexity of shell sort is O(1).
Quick Recap
https://www.explainxkcd.com/wiki/index.php/1185:_Ineffective_Sorts
