ECS 32B Introduction to Data Structures: Winter 2024 - Instructor Siena Saltzen
Let's revisit two different types of search. Search is about finding one item in a
collection of items. We’ll assume that we’re searching arrays. Usually we’re just looking
to confirm that the item is in the list, so we return True or False, but we could also return
the location or index of the item in the list.
Not only will we look at two different types of search, but we’ll look at two
different implementations of each type:
• sequential (or linear) search – iterative and recursive
• binary search – iterative and recursive
Sequential search
Sequential search using iteration (i.e., a loop) begins by looking at the list element at index
0. If that element is what the function is searching for, the function is done and returns True.
If it is not, the index is incremented, and the steps above are repeated for index 1. This
continues until either the target item is found, or the end of the list is reached (in which case
the function returns False).
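A minimal iterative sketch in Python (the function name and signature are my own, not from the slides):

    def sequential_search(items, target):
        # Look at each element in turn; stop as soon as we find the target.
        for i in range(len(items)):
            if items[i] == target:
                return True
        return False    # reached the end without finding the target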
How do we change this if we want to return the index of the found item?
What’s the time complexity of sequential search?
Sequential Search
O(n): in the worst case we examine every element. If targets are evenly distributed, a
successful search examines about n/2 elements on average, which is still O(n).
What operation(s) are we counting in this analysis? Comparisons between the target and
list elements.
There might be some impact if you do a lot of searching for items that aren't there (every
such search is a full pass), but either way it's O(n).
Sequential search with recursion
Still O(n)
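A recursive sketch (hypothetical names; the extra index parameter tracks where we are in the list):

    def sequential_search_rec(items, target, i=0):
        # Base case 1: ran off the end of the list -- not found.
        if i >= len(items):
            return False
        # Base case 2: found the target at index i.
        if items[i] == target:
            return True
        # Recursive case: search the rest of the list.
        return sequential_search_rec(items, target, i + 1)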
Binary search is only useful if the list is ordered. When a collection of items is sorted, binary
search for one item can be much faster than a sequential search.
The search starts with finding the item at the midpoint of the list. If that item is the target, then
the procedure is done and returns True. If that item is not the target, then the procedure
determines whether the target is less than the midpoint value or greater than (or equal to) the
midpoint value. Then the loop starts again but on that segment of the list that might contain the
target value. This approach is often called a divide-and-conquer strategy, because we’re
repeatedly dividing the search scope in half.
https://www.tutorialspoint.com/binary-search-recursive-and-iterative-in-c-program
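An iterative sketch in Python, assuming the list is already sorted (names are my own):

    def binary_search(items, target):
        low, high = 0, len(items) - 1
        while low <= high:
            mid = (low + high) // 2      # midpoint of the current segment
            if items[mid] == target:
                return True
            elif target < items[mid]:
                high = mid - 1           # target can only be in the left half
            else:
                low = mid + 1            # target can only be in the right half
        return False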
Binary Search
What’s the time complexity of binary search?
O(log2 n)
Why? Because we’re reducing what’s left to search by ½ with every comparison.
For binary search on an ordered list, a recursive algorithm is arguably a simpler and more
elegant solution than the loop-based approach. The recursive approach still uses up stack
space, but the space complexity, like the time complexity, is O(log2 n).
That's quite acceptable.
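A recursive sketch that passes index bounds rather than slicing the list (slicing would copy the list and cost O(n) space per call; these names are my own):

    def binary_search_rec(items, target, low=0, high=None):
        if high is None:
            high = len(items) - 1
        if low > high:                   # empty segment: not found
            return False
        mid = (low + high) // 2
        if items[mid] == target:
            return True
        if target < items[mid]:
            return binary_search_rec(items, target, low, mid - 1)
        return binary_search_rec(items, target, mid + 1, high)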
Fun fact: Python's default recursion limit is about 1000 frames on the stack. What is the
C++ recursion depth? Hah! C++ places no language-level limit on the depth of recursion;
you find out you've gone too deep when the call stack overflows and the program breaks!
Recursion vs. Iteration
Sorting and Searching
Some properties used to classify sorting algorithms, covered below: in-place, online vs.
offline, and parallelizable. (Source: Wikipedia)
In-Place
An in-place sorting algorithm is a sorting algorithm that sorts the input array in place, using
at most a constant amount of additional memory. This means that the algorithm does not
need to create a new array to store the sorted elements.
In-place sorting algorithms are often used in applications where memory is limited.
E.g., Selection Sort: an in-place sorting algorithm that divides the list into two parts, the
sorted part and the unsorted part. It repeatedly selects the smallest (or largest) element from
the unsorted part and moves it to the end of the sorted part, as in the sketch below.
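A minimal in-place sketch in Python (my own naming):

    def selection_sort(items):
        n = len(items)
        for i in range(n - 1):
            # Find the smallest element in the unsorted part items[i:].
            smallest = i
            for j in range(i + 1, n):
                if items[j] < items[smallest]:
                    smallest = j
            # Swap it into place at the end of the sorted part.
            items[i], items[smallest] = items[smallest], items[i]

No new array is created: the only extra memory is a few index variables, which is what makes the sort in-place.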
Online vs Offline algorithm
● In computer science, an online algorithm is one that can process its input
piece-by-piece in a serial fashion, i.e., in the order that the input is fed to the
algorithm, without having the entire input available from the start. In contrast, an
offline algorithm is given the whole problem data from the beginning and is required
to output an answer which solves the problem at hand.
Source : Wikipedia
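As a sketch of the online idea, insertion into a sorted list can process items as they arrive (this uses Python's real bisect.insort; the surrounding names are my own):

    import bisect

    # Items arrive one at a time; the list stays sorted throughout,
    # so we never need the whole input up front.
    stream = [5, 2, 8, 1]
    sorted_so_far = []
    for x in stream:
        bisect.insort(sorted_so_far, x)   # O(log n) search, O(n) shift
    print(sorted_so_far)                  # [1, 2, 5, 8]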
Parallelizable
An algorithm is a sequence of steps that take inputs from the user and after some computation,
produces an output. A parallel algorithm is an algorithm that can execute several instructions
simultaneously on different processing devices and then combine all the individual outputs to
produce the final result.
Parallelism is the process of executing several sets of instructions simultaneously. It reduces
the total computational time. Parallelism can be implemented using parallel computers, i.e.,
computers with many processors. Parallel computers require parallel algorithms, programming
languages, compilers, and operating systems that support multitasking.
Source : Wikipedia
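A toy sketch of the divide/compute/combine pattern, using Python's multiprocessing module (the function and chunking scheme are my own, purely for illustration):

    from multiprocessing import Pool

    def parallel_sum(data, workers=4):
        # Divide: split the input into one chunk per worker.
        chunk = max(1, (len(data) + workers - 1) // workers)
        pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
        # Compute: each worker sums its chunk simultaneously.
        with Pool(workers) as pool:
            partial = pool.map(sum, pieces)
        # Combine: merge the individual outputs into the final result.
        return sum(partial)

    if __name__ == "__main__":
        print(parallel_sum(list(range(1_000_000))))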
Sorting and Searching
Selection sort and insertion sort both have worst case time complexity
of O(n^2). For small n, that's perfectly acceptable. But as we saw
previously, when n gets really big, O(n^2) is far from acceptable.
Mergesort
The final rounds of a mergesort trace show the merging of the n elements from the original
list into larger and larger sorted lists, along with the required comparisons to maintain sorted
order while merging. So each round of merging is O(n), and since halving the list gives
O(lg n) rounds, mergesort is O(n lg n) overall.
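A sketch of the standard not-in-place version (it allocates new lists, so it uses O(n) extra memory; names are my own):

    def merge_sort(items):
        if len(items) <= 1:              # a list of 0 or 1 items is sorted
            return items
        mid = len(items) // 2
        left = merge_sort(items[:mid])   # sort each half recursively
        right = merge_sort(items[mid:])
        # Merge: repeatedly take the smaller front element of the two halves.
        merged = []
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])          # one of these tails is already empty
        merged.extend(right[j:])
        return merged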
Quicksort
Quicksort can be performed without extra memory (but our version isn't that one). While its
worst-case run time is O(n^2), its average run time is O(n lg n). A simple sketch of quicksort:
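A minimal sketch of such a simple (not in-place) quicksort; the partition into smaller/larger lists copies elements, which is the extra-memory trade-off mentioned above (code and names are my own):

    def quicksort(items):
        if len(items) <= 1:                    # base case: already sorted
            return items
        pivot = items[0]                       # simple choice: first element
        less = [x for x in items[1:] if x < pivot]
        greater_eq = [x for x in items[1:] if x >= pivot]
        # Recursively sort each side, then stitch around the pivot.
        return quicksort(less) + [pivot] + quicksort(greater_eq)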
(Slide: an imperfect trace of quicksort on a small list, with little boxes drawn to represent
the list.)
Quicksort complexity
Performance is best when the pivot splits the unsorted portion in half. It's worst
when the pivot doesn't split it at all.
Performance depends on choosing good pivots.
With good pivot choices, average case behavior for quicksort is O(n lg n).
Choosing a Pivot
● Always pick the first element as the pivot.
● Always pick the last element as the pivot.
● Pick a random element as the pivot.
● Pick the middle element as the pivot.
Simple: pick the first or last element of the range (bad on partially sorted input).
Better: pick the item in the middle of the range (better on partially sorted input).
However, picking any arbitrary element runs the risk of poorly partitioning the array of size n into two arrays of size 1
and n-1. If you do that often enough, your quicksort runs the risk of becoming O(n^2).
One improvement is to pick median(first, mid, last); in the worst case it can still go to O(n^2), but probabilistically
this is a rare case.
For most data, picking the first or last element is sufficient. But if you find that you're running into worst-case
scenarios often (partially sorted input), the first option is to pick the central value, which is a statistically good pivot
for partially sorted data.
Choosing a random pivot minimizes the chance that you will encounter worst-case O(n^2) performance, since always
choosing the first or last element causes worst-case behavior on nearly-sorted or nearly-reverse-sorted data.
Beware of the relative cost of comparisons, though: if your comparisons are costly, the median-of-three choice does
more comparisons than choosing a single pivot value at random.
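A small sketch of the median-of-three idea (my own helper, returning the index of the median of the first, middle, and last elements):

    def median_of_three(items, low, high):
        mid = (low + high) // 2
        a, b, c = items[low], items[mid], items[high]
        # Return the index whose value is the median of a, b, c.
        if a <= b <= c or c <= b <= a:
            return mid
        if b <= a <= c or c <= a <= b:
            return low
        return high

A quicksort would swap the chosen element into pivot position before partitioning; the few extra comparisons per call are the cost mentioned above.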
Complexity
● Best Case: Ω(N log N)
The best-case scenario for quicksort occurs when the pivot chosen at each step divides the array into roughly equal
halves. In this case, the algorithm makes balanced partitions, leading to efficient sorting.
● Average Case: Θ(N log N)
Quicksort's average-case performance is usually very good in practice, making it one of the fastest sorting algorithms.
● Worst Case: O(N^2)
The worst-case scenario for quicksort occurs when the pivot at each step consistently results in highly unbalanced
partitions, e.g., when the array is already sorted and the pivot is always chosen as the smallest or largest element. To
mitigate the worst case, various techniques are used, such as choosing a good pivot (e.g., median of three) and using
randomization (randomized quicksort) to shuffle the elements before sorting.
● Auxiliary Space: O(1) if we don't consider the recursive stack space. If we do consider it, quicksort can use O(N)
stack space in the worst case.
General
Bubble sort and Shell sort are well-known approaches to sorting that are part of the
computer science vocabulary, even though they are often not particularly useful in
practice.
Bubble Sort
Bubble Sort is the simplest sorting algorithm; it works by repeatedly swapping adjacent
elements if they are in the wrong order. This algorithm is not suitable for large data sets,
as its average and worst-case time complexity is quite high.
In the Bubble Sort algorithm (see the sketch below),
● traverse from the left and compare adjacent elements; the larger one is placed on
the right.
● In this way, the largest element is moved to the rightmost end first.
● This process is then continued to find the second largest and place it, and so on,
until the data is sorted.
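A minimal sketch (the early-exit flag is a common optimization, not something the slides require):

    def bubble_sort(items):
        n = len(items)
        for i in range(n - 1):
            swapped = False
            # After pass i, the last i elements are already in place.
            for j in range(n - 1 - i):
                if items[j] > items[j + 1]:
                    items[j], items[j + 1] = items[j + 1], items[j]
                    swapped = True
            if not swapped:    # no swaps means already sorted: linear best case
                break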
Source: https://www.geeksforgeeks.org/bubble-sort/?ref=header_search
Bubble Sort complexity
Time Complexity: O(N^2)
Auxiliary Space: O(1)
Due to its simplicity, bubble sort is often used to introduce the concept of a sorting algorithm. In computer graphics, it is
popular for its ability to detect a tiny error (like a swap of just two elements) in an almost-sorted array and fix it with just
linear complexity (2n).
Shell Sort
Shell sort is a generalized version of the insertion sort algorithm. It first sorts elements that
are far apart from each other and successively reduces the interval between the elements
to be sorted.
● Step 1 − Start.
● Step 2 − Initialize the gap size, h.
● Step 3 − Divide the list into smaller sub-lists, each spaced at interval h.
● Step 4 − Sort these sub-lists using insertion sort.
● Step 5 − Reduce the gap and repeat from Step 3 until the list is sorted.
● Step 6 − Print the sorted list.
● Step 7 − Stop.
A sketch of these steps appears below.
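A compact sketch using the common halving gap sequence (my own choice; Shell sort admits many gap sequences):

    def shell_sort(items):
        n = len(items)
        gap = n // 2                            # initial gap size h
        while gap > 0:
            # Gapped insertion sort: each slice items[i], items[i-gap], ... stays sorted.
            for i in range(gap, n):
                current = items[i]
                j = i
                while j >= gap and items[j - gap] > current:
                    items[j] = items[j - gap]   # shift larger elements right by one gap
                    j -= gap
                items[j] = current
            gap //= 2                           # reduce the interval and repeat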
Worst Case Complexity
The worst-case complexity of shell sort (with the halving gap sequence above) is O(n^2).
Space Complexity
The space complexity of shell sort is O(1).
Quick Recap
https://www.explainxkcd.com/wiki/index.php/1185:_Ineffective_Sorts