SP24-DS&A-Week03-Sorting-Data-Structures
Week03 (28/29-Feb-2024)
M Ateeq,
Department of Data Science, The Islamia University of Bahawalpur.
Quick Sort
Introduction to Quick Sort:
Quick Sort is a highly efficient and widely used sorting algorithm that follows the Divide and
Conquer paradigm. Developed by Tony Hoare in 1960, Quick Sort exhibits remarkable
performance in the average and best cases. It is also an in-place sorting algorithm: apart from
the recursion stack, it requires no additional memory for sorting.
The motivation behind Quick Sort lies in its ability to handle large datasets efficiently, with an
average-case time complexity of O(n log n), making it one of the fastest sorting algorithms in
practice. Unlike Bubble Sort, Selection Sort, and Insertion Sort, Quick Sort does not rely on
comparing adjacent elements and, in many cases, outperforms other O(n log n) algorithms like
Merge Sort due to its lower constant factors.
Quick Sort employs a Divide and Conquer strategy to efficiently sort an array or list. The
algorithm's methodology can be outlined as follows:
1. Divide:
Choose a "pivot" element from the array. The choice of the pivot can vary, but
common strategies select the first, middle, or last element.
Partition the array into two sub-arrays: elements less than the pivot and elements
greater than the pivot.
2. Conquer:
Recursively apply Quick Sort to the sub-arrays created in the Divide step.
3. Combine:
The sorted sub-arrays are then combined, resulting in the entire array being sorted.
Key Steps:
Pivot Selection:
The efficiency of Quick Sort heavily depends on the choice of the pivot. A well-chosen
pivot can significantly reduce the number of comparisons.
Partitioning:
The partitioning step involves rearranging the elements so that those less than the pivot
come before it, and those greater come after it.
Recursion:
Quick Sort recursively applies the same process to the sub-arrays created during
partitioning until the entire array is sorted.
Advantages:
In-Place Sorting:
Quick Sort requires only a constant amount of additional memory space, making it an
in-place sorting algorithm.
Average-Case Efficiency:
The average-case time complexity of Quick Sort is O(n log n), making it highly efficient
for large datasets.
Adaptability:
With a good pivot strategy (such as random or median-of-three selection), Quick Sort
performs well even on partially sorted arrays.
Conclusion:
Quick Sort's efficiency, adaptability, and in-place nature make it a favored choice for various
applications where sorting is a critical operation. However, it's important to note that its worst-
case time complexity is O(n²), which occurs when poorly chosen pivots lead to unbalanced
partitions. Nonetheless, in practice, Quick Sort's average-case performance often outweighs the
worst-case scenario, making it a preferred sorting algorithm for many real-world scenarios.
Initial array: [7, 2, 1, 6, 8, 5, 3, 4]
After the first partition: [2, 1, 3, |4|, 8, 5, 7, 6]
1. The pivot, 4, is chosen as the last element (this example consistently uses the last
element as the pivot).
Partitioning: elements less than 4 move to the left of the pivot, and elements greater
than 4 move to its right.
Recursive Call on Left Sub-array: [2, 1, 3]
Recursive Call on Right Sub-array: [8, 5, 7, 6]
2. Left sub-array [2, 1, 3]: the pivot, 3, is chosen as the last element. Partitioning places 3
in its final position; the recursive call on [2, 1] (pivot 1) yields [1, 2], so the left side
becomes [1, 2, 3].
3. Right sub-array [8, 5, 7, 6]: the pivot, 6, is chosen as the last element. Partitioning
yields [5, |6|, 7, 8], leaving [5] on the left (already sorted) and [7, 8] on the right.
Recursive Call on Right Sub-array: [7, 8]
State of the full array at this point: [||1, 2, 3, 4, 5, 6||, 7, 8]
4. Sub-array [7, 8]: the pivot, 8, is chosen as the last element. Partitioning leaves [7] on
its left and an empty sub-array on its right, so the recursion stops.
Final sorted array: [1, 2, 3, 4, 5, 6, 7, 8]
This example keeps the pivot as the last element during every partitioning step, demonstrating
how Quick Sort sorts the array by recursively choosing a pivot, placing it in its final position,
and sorting the two sides independently.
Tree View:
Let's represent the steps of the Quick Sort algorithm on the example array in a tree format:
                 [7, 2, 1, 6, 8, 5, 3, 4]
                 /          |           \
          [2, 1, 3]        [4]        [8, 5, 7, 6]
          /      \                    /    |     \
      [1, 2]     [3]               [5]   [6]    [7, 8]
                                                /     \
                                              [7]     [8]
    // Swap arr[i+1] and arr[high] (put the pivot in its correct place)
    swap(arr[i + 1], arr[high]);
    quickSort(arr, 0, n - 1);
    return 0;
}
This C++ code defines the quickSort function for the Quick Sort algorithm, along with a
partition function to rearrange elements based on a chosen pivot. The main function
demonstrates the usage of the algorithm on an example array.
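For completeness, here is a minimal full listing consistent with that description, assuming the
Lomuto partition scheme with the last element as the pivot (matching the trace above); the
variable names are illustrative:

#include <iostream>
#include <utility> // std::swap

// Rearrange arr[low..high] around the last element (Lomuto scheme);
// returns the pivot's final index.
int partition(int arr[], int low, int high) {
    int pivot = arr[high]; // last element as the pivot
    int i = low - 1;       // boundary of the "less than pivot" region
    for (int j = low; j < high; ++j) {
        if (arr[j] < pivot) {
            std::swap(arr[++i], arr[j]);
        }
    }
    // Swap arr[i+1] and arr[high] (put the pivot in its correct place)
    std::swap(arr[i + 1], arr[high]);
    return i + 1;
}

void quickSort(int arr[], int low, int high) {
    if (low < high) {
        int p = partition(arr, low, high);
        quickSort(arr, low, p - 1);  // sort elements before the pivot
        quickSort(arr, p + 1, high); // sort elements after the pivot
    }
}

int main() {
    int arr[] = {7, 2, 1, 6, 8, 5, 3, 4};
    int n = sizeof(arr) / sizeof(arr[0]);
    quickSort(arr, 0, n - 1);
    for (int i = 0; i < n; ++i) std::cout << arr[i] << " ";
    std::cout << std::endl; // prints: 1 2 3 4 5 6 7 8
    return 0;
}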
Time Complexity Analysis:
1. Partitioning:
Partitioning an array of size n takes O(n) time, because each element is
compared to the pivot exactly once during the partitioning process.
2. Recursion:
After partitioning, Quick Sort is applied recursively to the two sub-arrays. For balanced
partitions, the recurrence relation for the time complexity is T(n) = 2T(n/2) + O(n),
which is unrolled after this list.
3. Combining:
Combining the sorted sub-arrays does not contribute significantly to the time complexity.
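Unrolling the balanced-case recurrence from step 2 (writing the O(n) partitioning cost as cn for
some constant c) makes the O(n log n) bound explicit:

$$T(n) = 2T\!\left(\tfrac{n}{2}\right) + cn = 4T\!\left(\tfrac{n}{4}\right) + 2cn = \cdots = 2^{k}\,T\!\left(\tfrac{n}{2^{k}}\right) + k\,cn$$

Setting $k = \log_2 n$ gives $T(n) = n\,T(1) + cn \log_2 n = O(n \log n)$.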
The choice of the pivot in Quick Sort significantly impacts the algorithm's performance. The
efficiency of Quick Sort is evident when a well-chosen pivot leads to balanced partitions during
each recursion step. The common strategies for pivot selection are:
1. First/Last Element:
Choose the first or last element of the array as the pivot. This is simple but can lead to
suboptimal performance for already sorted or nearly sorted arrays.
2. Random Pivot:
Randomly select a pivot element. This reduces the chances of encountering worst-case
scenarios and improves average-case performance.
3. Median-of-Three:
Choose the pivot as the median of the first, middle, and last elements. This aims to
provide a more balanced partitioning, especially for partially sorted arrays.
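As a sketch of the median-of-three strategy, a small helper can order the three candidates and
move the median into the pivot position (the helper name and the choice to reuse a
last-element partition are illustrative assumptions, not from the lecture):

#include <utility> // std::swap

// Order arr[low], arr[mid], arr[high], then move the median into arr[high]
// so a "last element as pivot" partition can be reused.
// Illustrative helper; returns the pivot index (high).
int medianOfThreePivot(int arr[], int low, int high) {
    int mid = low + (high - low) / 2;
    if (arr[mid] < arr[low]) std::swap(arr[low], arr[mid]);
    if (arr[high] < arr[low]) std::swap(arr[low], arr[high]);
    if (arr[high] < arr[mid]) std::swap(arr[mid], arr[high]);
    // Now arr[low] <= arr[mid] <= arr[high], so arr[mid] is the median.
    std::swap(arr[mid], arr[high]);
    return high;
}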
The impact of pivot selection on complexity is mainly observed in the worst case, where poorly
chosen pivots result in unbalanced partitions. In such cases, the time complexity degrades
to O(n²). On average, however, Quick Sort exhibits a time complexity of O(n log n) due to its
divide-and-conquer nature.
Taxonomy of Data Structures
The taxonomy of data structures provides a systematic classification of these fundamental
components, aiding in the understanding and organization of the vast array of structures used in
computer science. It's important to recognize that varying taxonomies exist, and the perspective
from which data structures are viewed greatly influences their categorization.
Varying Taxonomies:
1. Linear vs. Non-Linear:
One common taxonomy classifies data structures into linear and non-linear categories.
Linear structures, like arrays and linked lists, organize data sequentially, while non-
linear structures, such as trees and graphs, have a more hierarchical organization.
2. Primitive vs. Non-Primitive:
Another taxonomy distinguishes primitive structures (basic types such as integers,
floats, and characters, provided directly by the language) from non-primitive structures
(such as arrays, linked lists, trees, and graphs) that are built from them.
Perspective Matters:
1. Usage in Algorithms:
Data structures are often categorized based on their utility in algorithms. For instance,
arrays and linked lists are crucial for sorting algorithms, while trees and graphs play
pivotal roles in searching and traversing algorithms.
2. Memory Organization:
Data structures can also be viewed by how they occupy memory: contiguous structures
(such as arrays) store elements in adjacent locations, while linked structures (such as
linked lists and trees) connect nodes scattered across memory through pointers.
In conclusion, the taxonomy of data structures offers a lens through which we can organize,
understand, and apply these fundamental components in computer science. Here is one possible
organization:
Data Structures
Definition: A data structure is a specialized format for organizing and storing data to perform
efficient operations on that data. It defines the way data is organized, stored, and manipulated,
allowing for easy access and modification. Data structures are essential in computer science and
programming for managing and organizing data to meet specific computational needs.
Key Characteristics:
1. Organization: Data structures define the organization and layout of data in a systematic
manner.
2. Operations: They provide operations or functions for manipulating the stored data
efficiently.
3. Efficiency: Data structures are designed to optimize various operations, such as insertion,
deletion, searching, and sorting.
4. Abstraction: They often provide a level of abstraction, hiding the underlying implementation
details.
5. Memory Management: Data structures may involve memory management to allocate and
deallocate memory efficiently.
Examples:
Arrays, Linked Lists, Stacks, Queues, Trees, Graphs, Hash Tables, Heaps
Data Types
Definition: A data type is a classification of data that specifies the type of values a variable can
hold and the operations that can be performed on it. It defines a set of values along with the
allowable operations on those values. Data types are fundamental in programming languages for
ensuring type safety and providing a foundation for variable declarations.
Key Characteristics:
1. Representation: Data types define how data is represented in a computer's memory.
2. Operations: They specify the operations that can be performed on variables of that type.
3. Size: Each data type has a specific size, determining the amount of memory it occupies.
4. Default Values: Data types often have default values assigned when a variable is declared.
5. Type Safety: Data types help in enforcing type safety, preventing unintended operations on
variables.
Examples:
Integer, Float/Double, Character, Boolean, String, Array, Pointer, Structure/Class (in object-
oriented programming)
Similarities:
Abstraction: Both data types and data structures provide a level of abstraction, hiding
implementation details and allowing users to interact with them at a higher level.
Memory Usage: Both concepts involve considerations related to memory, including
allocation, deallocation, and access.
Differences:
Purpose: Data types primarily define the nature of individual variables, specifying the type
of values they can hold. Data structures, on the other hand, organize and structure multiple
variables to facilitate efficient operations.
Level of Complexity: Data types are relatively simpler, defining individual variables, while
data structures involve more complexity in organizing and managing multiple variables.
Operations: While data types specify operations that can be performed on individual
variables, data structures provide operations for managing and manipulating a collection of
variables.
In summary, data types are fundamental building blocks that define the nature of individual
variables, whereas data structures organize and structure multiple variables to efficiently perform
operations on them. Both are crucial concepts in programming, working together to enable
effective data handling and manipulation.
In practice, the terminology might be used interchangeably based on the specific context. What's
crucial is understanding the characteristics and use cases of these constructs, whether classified
as data types or as part of data structures. It's worth noting that the distinction can be subtle, and
different educational materials or programming languages might use varied classifications.
1. Integer ( int ):
In C++, the int data type is commonly used to represent integer values. On x86 architecture,
int is typically 4 bytes.
Representation:
Stored in little-endian format, where the least significant byte comes first.
Example: the value 42 (decimal, 0x0000002A in hexadecimal) is stored as:
| Byte 0 | Byte 1 | Byte 2 | Byte 3 |
|--------|--------|--------|--------|
| 2A | 00 | 00 | 00 |
Range:
-2,147,483,648 to 2,147,483,647 for a signed 4-byte int.
2. Floating-Point ( float ):
The float data type is used for single-precision floating-point numbers, typically 4 bytes on
x86 architecture.
Representation:
Stored in IEEE 754 single-precision format: 1 sign bit, 8 exponent bits, and 23 mantissa bits.
Range:
Approximately ±3.4 × 10^38, with about 7 significant decimal digits.
3. Double-Precision Floating-Point ( double ):
The double data type is used for double-precision floating-point numbers, typically 8 bytes on
x86 architecture.
Representation:
Stored in IEEE 754 double-precision format: 1 sign bit, 11 exponent bits, and 52 mantissa bits.
Range:
Approximately ±1.8 × 10^308, with about 15-16 significant decimal digits.
4. Character ( char ):
The char data type is used to represent single characters, typically 1 byte on x86 architecture.
Representation:
Stored as a 1-byte integer holding a character code, typically ASCII (e.g., 'A' is stored as 0x41).
Range:
-128 to 127 for a signed char (0 to 255 for unsigned char).
5. Boolean ( bool ):
The bool data type is used to represent Boolean values, typically 1 byte on x86 architecture.
Representation:
Stored as a single byte, where 00 represents false and 01 represents true. For example, true
is stored as:
| Byte 0 |
|--------|
| 01 |
Range:
true or false.
It's important to note that the actual sizes and representations may vary across different
compilers and architectures, but the provided examples illustrate common representations on
x86. Additionally, C++ offers fixed-size integer types ( int32_t , int64_t , etc.) to ensure a
specific size regardless of the platform.
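A quick way to check these sizes on a given compiler is with sizeof (a minimal sketch; the
values printed are what one would typically see on x86-64 and may differ elsewhere):

#include <cstdint>
#include <iostream>

int main() {
    // Typical output on x86-64: 4, 4, 8, 1, 1, 4, 8
    std::cout << "int:     " << sizeof(int) << " bytes\n";
    std::cout << "float:   " << sizeof(float) << " bytes\n";
    std::cout << "double:  " << sizeof(double) << " bytes\n";
    std::cout << "char:    " << sizeof(char) << " bytes\n";
    std::cout << "bool:    " << sizeof(bool) << " bytes\n";
    std::cout << "int32_t: " << sizeof(std::int32_t) << " bytes\n";
    std::cout << "int64_t: " << sizeof(std::int64_t) << " bytes\n";
    return 0;
}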
Arrays
Arrays in memory are contiguous blocks of storage that allow for the storage of multiple elements
of the same data type. The representation of arrays varies based on the data type, and memory
allocation for arrays is typically done in a way that ensures efficient access to individual
elements.
Let's consider arrays of different data types – int , float , char , and double .
The size of an array in C++ is determined by the data type and the available memory.
The language itself does not impose a strict limit on the size of an array.
However, practical limits may be determined by factors such as available RAM and system
architecture.
Arrays that are too large might lead to stack overflow or segmentation faults.
Example:
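One illustration (the array size below is an arbitrary assumption; the exact threshold depends
on the platform's stack limit):

#include <iostream>

int main() {
    // A large local array lives on the stack; roughly 40 MB here can
    // exceed the default stack size (often 1-8 MB) and crash the program.
    // int risky[10000000]; // likely stack overflow if uncommented

    // Heap allocation is the safer route for large arrays.
    int* big = new int[10000000];
    big[0] = 1;
    std::cout << "First element: " << big[0] << std::endl;
    delete[] big;
    return 0;
}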
It's essential to be mindful of the system's memory limitations and choose appropriate data
structures for handling large amounts of data, such as dynamic data structures like linked lists or
dynamic arrays.
In summary, arrays are represented in memory as contiguous blocks, and memory allocation is
determined by the size and data type of the array. Practical limits on array size depend on the
available system memory.
1. Base Address:
The memory address of the first element in the array is considered the base address.
2. Element Size:
The size of each element in the array is crucial for calculating the address of
subsequent elements.
3. Index:
The index of the element indicates its position within the array.
4. Address Calculation:
The address of the element at index i is computed as:
Address(i) = Base_address + (i × Size_of_each_element)
Where:
Base_address is the memory address of the first element.
i is the index of the element.
Size_of_each_element is the size of each element in the array.
#include <iostream>

int main() {
    // Integer array
    int arrInt[5] = {10, 20, 30, 40, 50};

    // Base address of the array
    int* baseAddress = arrInt;

    // Size (in bytes) of each element
    size_t sizeOfElement = sizeof(arrInt[0]);

    // Address of the third element (index 2) via pointer arithmetic
    int* addressOfThird = baseAddress + 2;

    // Byte offset of index 2 from the base address
    size_t byteOffset = 2 * sizeOfElement;

    std::cout << "Base address: " << baseAddress << "\n"
              << "Element size: " << sizeOfElement << " bytes\n"
              << "Address of arrInt[2]: " << addressOfThird << "\n"
              << "Byte offset of arrInt[2]: " << byteOffset << " bytes\n";
    return 0;
}
Line 8: Declares a pointer baseAddress of type int* and assigns it the address of the
first element of arrInt . This is the base address of the array.
Line 11: Uses the sizeof operator to calculate the size (in bytes) of each element in the
array and stores it in the variable sizeOfElement .
size_t is used for sizeOfElement because it represents the size of a memory block
and ensures compatibility across different platforms and architectures.
size_t is an unsigned integer type in C++ that is guaranteed to be able to represent
the size of any object. It is commonly used for variables that hold sizes or indices of
objects in memory.
Line 14: Calculates the address of the third element (index 2) by adding 2 *
sizeOfElement to the base address. This demonstrates pointer arithmetic.
Line 17: Calculates the byte offset by multiplying the index ( 2 ) with the size of each
element ( sizeOfElement ). This provides the number of bytes to offset from the base
address.
size_t is used for byteOffset because it represents the size of a memory offset in
bytes, and using size_t ensures non-negativity and consistency with memory-related
operations.
Using size_t is a good practice when dealing with sizes, indices, or memory-related
calculations to enhance portability and maintain code consistency across platforms.
Important Notes:
1. Pointer Arithmetic:
The actual memory address is calculated in bytes, so the size of each element is crucial
for correct address calculation.
2. Out-of-Bounds:
Accessing elements beyond the array bounds is undefined behavior and should be
avoided.
Arrays are versatile data structures widely used in various applications due to their simplicity and
efficiency. Here are some common use cases, along with their advantages and disadvantages:
1. Use Cases:
Sequential Access: Arrays are efficient for sequential access, making them suitable for
applications where data is accessed in a linear manner, such as iterating through a list
of items.
Random Access: Arrays provide constant-time random access to elements based on
their index, making them suitable for scenarios where quick access to any element is
required.
Fixed-Size Collections: When the size of the collection is known and fixed in advance,
arrays are a suitable choice due to their fixed size.
2. Advantages:
Constant-Time Access: Accessing any element in the array takes constant time, O(1),
as it can be directly calculated using the index.
Memory Efficiency: Arrays are memory-efficient, as they store elements in contiguous
memory locations, allowing for better cache locality.
Simplicity: Arrays are simple and easy to use, making them suitable for scenarios
where complexity needs to be minimized.
3. Disadvantages:
Fixed Size: The main disadvantage is that arrays have a fixed size, making them less
suitable for dynamic data that may grow or shrink.
Insertion and Deletion: Inserting or deleting elements in the middle of an array can be
inefficient, as it may require shifting elements to maintain order.
Wasted Memory: If the array size is larger than the actual number of elements,
memory may be wasted.
| Operation | Time Complexity |
|-----------|-----------------|
| Access by Index | O(1) |
| Search (Unsorted Array) | O(n) |
| Search (Sorted Array, Binary Search) | O(log n) |
| Insertion (at the End) | O(1) |
| Insertion (at Arbitrary Position) | O(n) |
| Deletion (from the End) | O(1) |
| Deletion (from Arbitrary Position) | O(n) |
| Resize | O(n) |
Search (Sorted Array): Searching in a sorted array can be more efficient, with a time
complexity of O(log n) using binary search.
Insertion (at the End): Inserting an element at the end of the array is a constant-time
operation, O(1).
Insertion (at Arbitrary Position): Inserting an element at an arbitrary position requires
shifting subsequent elements, resulting in a linear time complexity, O(n).
Deletion (from the End): Deleting an element from the end of the array is a constant-time
operation, O(1).
Deletion (from Arbitrary Position): Deleting an element from an arbitrary position requires
shifting subsequent elements, resulting in a linear time complexity, O(n).
Resize: Resizing an array involves creating a new array and copying elements, resulting in
a linear time complexity, O(n).
Understanding these operations and their associated time complexities helps in choosing the
right data structure based on the specific requirements of an application.
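To illustrate the O(log n) search on a sorted array mentioned above, here is a minimal iterative
binary search sketch (the function name is an illustrative choice):

#include <iostream>

// Iterative binary search on a sorted array: O(log n) comparisons.
// Returns the index of target, or -1 if it is not present.
int binarySearch(const int arr[], int n, int target) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2; // avoids overflow of (low + high)
        if (arr[mid] == target) return mid;
        if (arr[mid] < target) low = mid + 1;  // search the right half
        else                   high = mid - 1; // search the left half
    }
    return -1; // not found
}

int main() {
    int sorted[] = {1, 3, 5, 7, 9, 11};
    std::cout << binarySearch(sorted, 6, 7) << std::endl; // prints 3
    std::cout << binarySearch(sorted, 6, 4) << std::endl; // prints -1
    return 0;
}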
Dynamic Arrays
A dynamic array, also known as a resizable array or a dynamically allocated array, is a data
structure that allows for flexible and efficient management of a collection of elements. Unlike
static arrays, the size of a dynamic array can be changed during runtime, enabling the addition or
removal of elements as needed.
In addition to general array features, key characteristics and concepts related to dynamic arrays
include:
1. Heap Allocation:
Dynamic arrays are created by allocating memory on the heap using mechanisms like
new in C++.
This allocation allows for a more flexible size, as the memory is not fixed at compile-
time.
2. Resizable Capacity:
When the number of stored elements reaches the current capacity, a larger block is
allocated, the existing elements are copied into it, and the old block is freed.
Implementation Example:
Here's a C++ code example demonstrating dynamic arrays performing resizing. The example
includes detailed code explanations for each step.
#include <iostream>
int main() {
// Initialize variables
int* dynamicArray = nullptr; // Pointer for dynamic array
int size = 0; // Current size of the dynamic array
int capacity = 5; // Initial capacity
return 0;
}
Code Explanation:
1. Memory Allocation: The program starts by allocating memory for the initial dynamic array
with a specified capacity ( 5 in this case).
2. Print Function: A lambda function printArray is defined to print the elements of the
dynamic array.
3. Resize Function: A lambda function resizeArray is defined to resize the dynamic array
when its size reaches its capacity. It doubles the capacity and copies elements to the new
array.
4. Element Addition: A loop adds elements to the dynamic array. If the size exceeds the
current capacity, the array is resized using the resizeArray function.
5. Printing After Each Addition: The printArray function is called after each element
addition to visualize the dynamic array's state.
6. Memory Deallocation: Finally, memory is deallocated for the dynamic array using
delete[] to free the allocated memory.
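A complete version of the skeleton above, following those six steps (the element values and the
loop bound are illustrative):

#include <iostream>

int main() {
    // 1. Memory allocation: initial dynamic array with capacity 5
    int capacity = 5;
    int* dynamicArray = new int[capacity];
    int size = 0;

    // 2. Print function: prints the current contents of the array
    auto printArray = [&]() {
        std::cout << "size=" << size << " capacity=" << capacity << ": ";
        for (int i = 0; i < size; ++i) std::cout << dynamicArray[i] << " ";
        std::cout << "\n";
    };

    // 3. Resize function: doubles the capacity and copies elements over
    auto resizeArray = [&]() {
        capacity *= 2;
        int* newArray = new int[capacity];
        for (int i = 0; i < size; ++i) newArray[i] = dynamicArray[i];
        delete[] dynamicArray;
        dynamicArray = newArray;
    };

    // 4. Element addition: resize whenever the array is full
    for (int value = 1; value <= 12; ++value) {
        if (size == capacity) resizeArray();
        dynamicArray[size++] = value;
        printArray(); // 5. Print after each addition
    }

    // 6. Memory deallocation
    delete[] dynamicArray;
    return 0;
}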
Vectors
Here's a revised version of the dynamic array code example, this time using vectors in C++.
Vectors encapsulate dynamic arrays and provide a higher-level interface for dynamic resizing.
Code explanations are provided for each step.
#include <iostream>
#include <vector>
int main() {
    // Initialize a vector and reserve an initial capacity of 5
    std::vector<int> dynamicVector;
    dynamicVector.reserve(5);
return 0;
}
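A minimal sketch of what that vector-based version might look like (mirroring the dynamic-array
example above; the values and loop bound are illustrative):

#include <iostream>
#include <vector>

int main() {
    // Initialize a vector and reserve an initial capacity of 5
    std::vector<int> dynamicVector;
    dynamicVector.reserve(5);

    // push_back grows the vector automatically; no manual resizing needed
    for (int value = 1; value <= 12; ++value) {
        dynamicVector.push_back(value);
        std::cout << "size=" << dynamicVector.size()
                  << " capacity=" << dynamicVector.capacity() << ": ";
        for (int x : dynamicVector) std::cout << x << " ";
        std::cout << "\n";
    }

    // No delete[] needed: the vector releases its memory automatically
    return 0;
}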
Code Explanation:
By using vectors, the code becomes more concise and safer, as the vector class encapsulates
the complexities of dynamic resizing and memory management. Vectors are a powerful and
convenient alternative to dynamic arrays, providing a higher-level abstraction while maintaining
comparable performance.
| Operation | Array | Vector |
|-----------|-------|--------|
| Access by Index | O(1) | O(1) |
| Search | O(n) | O(n) |
| Insertion (at the End) | O(1) | Amortized* O(1) |
| Insertion (at Arbitrary Position) | O(n) | O(n) |
| Deletion (from the End) | O(1) | O(1) |
| Deletion (from Arbitrary Position) | O(n) | O(n) |
| Resize | O(n) | O(n) |
This revised table includes a column for vectors, highlighting the differences in time complexity
between arrays and vectors for various operations. Note that vector insertion at the end is stated
as amortized O(1) because vectors may occasionally need to resize, leading to occasional O(n)
insertions, but these are infrequent on average.
*Here, amortized means 'on average'.
Reflection MCQs:
4. Which data type is commonly used for single-precision floating-point numbers in C++?
A. float
B. double
C. int
D. char
Correct Option: A
5. What is the typical range of a signed char in C++?
A. 0 to 255
B. -128 to 127
C. -32768 to 32767
D. 0 to 127
Correct Option: B
6. Which of the following is a disadvantage of arrays?
A. Constant-time access
B. Dynamic size
C. Wasted memory
D. Variable-size collections
Correct Option: C
7. Which operation becomes more efficient, O(log n), when the array is sorted?
A. Access by Index
B. Search
C. Insertion
D. Deletion
Correct Option: B
8. Which characteristic describes how array elements are stored in memory?
A. Dynamic size
B. Wasted memory
C. Contiguous storage
D. Variable-size collections
Correct Option: C
11. In C++, how is the random access time complexity for arrays characterized?
A. O(1)
B. O(log n)
C. O(n)
D. O(n^2)
Correct Option: A
12. Which operation on an unsorted array requires O(n) time in the worst case?
A. Search
B. Insertion
C. Deletion
D. Access by Index
Correct Option: A
13. What is the main disadvantage of arrays?
A. Fixed size
B. Constant-time access
C. Efficient insertion
D. Contiguous storage
Correct Option: A
14. What is the time complexity for resizing an array in C++?
A. O(1)
B. O(log n)
C. O(n)
D. O(n^2)
Correct Option: C
Code Exercises:
Write a C++ program that performs the following operations on an integer array:
1. Calculates and prints the sum of all elements.
2. Finds and prints the maximum and minimum elements.
#include <iostream>
int main() {
    const int N = 5; // TODO: Change the size as needed
    int arr[N];
    // TODO: Implement code to calculate and print the sum of all elements
    // TODO: Implement code to find and print the maximum and minimum elements
    return 0;
}
Write a C++ program that:
1. Accepts user input to fill two integer arrays of the same size.
2. Combines the two arrays into a third array, interleaving elements from the first and second
arrays.
3. Prints the merged array.
4. Sorts the merged array in ascending order using any sorting algorithm.
5. Prints the sorted array.
#include <iostream>
int main() {
    const int N = 5; // TODO: Change the size as needed
    int arr1[N];
    int arr2[N];
    int mergedArr[2 * N];
    // TODO: Implement code to fill arr1 and arr2 with user input
    // TODO: Implement code to merge, print, sort, and print the arrays
    return 0;
}
Write a C++ program that extends a dynamic array one element at a time:
#include <iostream>
int main() {
    // feel free to use vectors instead
    int* dynamicArray = nullptr;
    int size = 0;
    int capacity = 5;
    // TODO: Implement code to add elements, resizing when capacity is reached
    // TODO: Implement code to print the extended array after each addition
    return 0;
}
Write a C++ program that performs binary search on a sorted integer array:
#include <iostream>
int main() {
    const int N = 5; // TODO: Change the size as needed
    int sortedArray[N];
    // TODO: Implement code to fill the sorted array with user input
    int target;
    // TODO: Implement code to read the target value
    // TODO: Implement code to perform binary search and print the result
    return 0;
}
1. Write a program that prints the memory addresses of elements in an integer array.
2. Modify the program to print the size of each element and the calculated memory address of
each element.
3. Explore the use of pointers to access and manipulate array elements.
#include <iostream>
int main() {
const int N = 5; // TODO: Change the size as needed
int arr[N];
    // TODO: Implement code to print the size of each element and the calculated memory address of each element
return 0;
}
Please feel free to share your problems to ensure that your weekly tasks are completed
and you are not falling behind.