DAA-Programming-Project_1
PROGRAMMING PROJECT 1
EMPIRICAL ANALYSIS OF
SORTING ALGORITHMS
Project Members:
BS COMPUTER SCIENCE 2A
Cadag, Jaycee D.
Dela Cruz, Mark L.
Espinas, A Z Rain L.
Loviña, John Melrick M.
Morcozo, Janna Carla R.
PROGRAMMING PROJECT REPORT
I. DOCUMENTATION
A. Overview
Our program, written in C, implements six common sorting algorithms: Selection
Sort, Bubble Sort, Insertion Sort, Mergesort, Quicksort (with median-of-three pivot
selection), and Heapsort. All six operate on arrays of 32-bit integers so that their
performance can be compared directly. The program lets the user specify the size of
an integer array and choose between randomly or incrementally generated data. It then
sorts the array with each of the six algorithms, measures their execution times, and
writes the original and sorted arrays, along with the execution times, to both the
console and a text file ("output.txt"). This makes it possible to compare the
performance of these algorithms across input sizes and data distributions.
Table 1. The key components of the program with their description or purpose
Furthermore, the program includes error handling for memory allocation and file
opening, validates user input, and prevents memory leaks by freeing all dynamically
allocated memory with free().
C. Execution Flow of the Program
The execution flow of the program involves user interaction for determining the
input size, generating, and then sorting the array, measuring the execution time, and
displaying the output, as detailed below:
1. The program begins by clearing the console, displaying the title, and prompting the
user to input the desired size (N) of the array. Once the array size is determined, the
program dynamically allocates memory for the array. If memory allocation fails, the
program outputs an error and terminates.
2. After allocating memory for the array, the program requests the user to select a data
generation method (random or incremental).
(a) If random generation is selected, the program proceeds to populate the array.
(b) If incremental generation is selected, the program prompts the user for the
starting value.
3. The array is then populated according to the user's chosen data generation method.
4. The original array is then written to the output file (output.txt).
5. Subsequently, the program creates a copy of the original array for each sorting
algorithm so that the original data remains unchanged. The program then runs each
sorting algorithm on its copy and measures its execution time with the clock()
function. The memory allocated for each copy is freed immediately after its
sorting operation.
6. The sorted array and the measured execution time for each sorting algorithm are
then written to both the console and the output file.
7. Finally, the dynamically allocated memory for the original array is released, the
output file is closed, and the program terminates.
(a) The images below show the program output in the terminal and output file for
random data generation.
(b) The images below show the program output in the terminal and output file for
incremental data generation.
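The generation, copy, and timing steps above can be sketched as follows. This is a minimal illustration, not the project's actual source: the names fill_incremental, fill_random, time_sort, and stdlib_sort are hypothetical, and the standard library's qsort() stands in for the six sorting routines. Writing the arrays to the console and to output.txt is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by every sorting routine in this sketch. */
typedef void (*sort_fn)(int *a, size_t n);

/* Incremental generation: start, start + 1, start + 2, ... */
void fill_incremental(int *a, size_t n, int start)
{
    for (size_t i = 0; i < n; i++)
        a[i] = start + (int)i;
}

/* Random generation using the C library's rand(), seeded by the caller. */
void fill_random(int *a, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        a[i] = rand();
}

static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Stand-in sorter (assumption): wraps qsort() so the harness has something to time. */
void stdlib_sort(int *a, size_t n)
{
    qsort(a, n, sizeof *a, cmp_int);
}

/* Sort a private copy of `original` and return the elapsed CPU time in
   seconds, or -1.0 if the copy cannot be allocated. The original array
   is never modified. */
double time_sort(sort_fn sort, const int *original, size_t n)
{
    assert(sort != NULL && original != NULL && n > 0);

    int *copy = malloc(n * sizeof *copy);
    if (copy == NULL)
        return -1.0;
    memcpy(copy, original, n * sizeof *copy);

    clock_t begin = clock();
    sort(copy, n);
    clock_t end = clock();

    free(copy);  /* released right after the sort, as in step 5 */
    return (double)(end - begin) / CLOCKS_PER_SEC;
}
```

A caller would fill one array with fill_random or fill_incremental and then call time_sort once per algorithm, passing the same original array each time so that every algorithm sorts identical data.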
II. PROGRAM EXECUTION
A. Random Input Case
The program was executed multiple times for six distinct array sizes: N = 10, 100,
1000, 10000, 100000, and 1000000. For each value of N, the program was run five
separate times, with each run generating a unique random array.
The following images display the recorded execution times for each sorting
algorithm across the varied array sizes. Each image shows the running times from
the five individual runs performed for one value of N.
N = 10
N = 100
N = 1000
N = 10000
N = 100000
N = 1000000
B. Sorted Input Case
The program was executed multiple times for six distinct array sizes: N = 10, 100,
1000, 10000, 100000, and 1000000. For each value of N, the program was run five
separate times, with each run generating an incrementally sorted array.
The following images display the recorded execution times for each sorting
algorithm across the varied array sizes. Each image shows the running times from
the five individual runs performed for one value of N.
N = 10
N = 100
N = 1000
N = 10000
N = 100000
N = 1000000
III. EMPIRICAL ANALYSIS
A. Performance Analysis
1. Random Input Case
Sorting Algorithm | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average Time for N = 10
Sorting Algorithm | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average Time for N = 100
Sorting Algorithm | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average Time for N = 1000
Sorting Algorithm | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average Time for N = 10000
Sorting Algorithm | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average Time for N = 100000
Sorting Algorithm | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average Time for N = 1000000
Sorting Algorithm | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average Time for N = 10
Insertion Sort | 0.000002s | 0.000002s | 0.000001s | 0.000000s | 0.000001s | 0.0000012s
Sorting Algorithm | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average Time for N = 100
Sorting Algorithm | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average Time for N = 1000
Sorting Algorithm | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average Time for N = 10000
Sorting Algorithm | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average Time for N = 100000
Sorting Algorithm | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average Time for N = 1000000
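The "Average Time" column is the arithmetic mean of the five runs. As a small worked sketch (the function name average_time is hypothetical), the Insertion Sort row for N = 10 gives (0.000002 + 0.000002 + 0.000001 + 0.000000 + 0.000001) / 5 = 0.0000012 seconds:

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n timing samples, in seconds. */
double average_time(const double *runs, size_t n)
{
    assert(runs != NULL && n > 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += runs[i];
    return sum / (double)n;
}
```

A run recorded as 0.000000s most likely means the sort finished in less time than clock() could resolve at the printed precision, so averages for very small N should be read as rough estimates.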
Figure 1.1. Line graph for the average runtimes of a randomly generated array
Figure 1.2. Scatter plot for the average runtime of a randomly generated array
ANALYSIS BASED ON NUMBER OF INPUTS FOR A RANDOMLY GENERATED ARRAY
Selection Sort
● Selection Sort starts off fast, but its performance degrades quickly as the input
size increases.
● Always makes O(n^2) comparisons regardless of the input order (although only O(n)
swaps), leading to slow performance.
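To make the cost of Selection Sort concrete, the sketch below counts comparisons and swaps. It is an instrumented illustration, not the project's implementation; the struct sort_counts type and the function name are hypothetical. For n elements it performs n(n-1)/2 comparisons on every input, while the number of swaps never exceeds n-1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type holding the operation counts. */
struct sort_counts {
    unsigned long long comparisons;
    unsigned long long swaps;
};

/* Selection Sort that also reports how much work it did. */
struct sort_counts selection_sort_counted(int *a, size_t n)
{
    assert(a != NULL || n == 0);

    struct sort_counts c = {0, 0};
    for (size_t i = 0; i + 1 < n; i++) {
        size_t min = i;
        /* Scan the unsorted tail for its smallest element. */
        for (size_t j = i + 1; j < n; j++) {
            c.comparisons++;
            if (a[j] < a[min])
                min = j;
        }
        /* At most one swap per outer iteration. */
        if (min != i) {
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
            c.swaps++;
        }
    }
    return c;
}
```

The comparison count does not depend on the data, which is why the running time grows quadratically for random and sorted inputs alike.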
Bubble Sort
● Demonstrates the least efficient performance for large datasets.
● Requires a high number of comparisons and swaps in worst-case scenarios and has
a time complexity of O(n^2).
Insertion Sort
● Better than Selection and Bubble Sort but remains inefficient for large inputs. For
N = 100000, the execution time was 1236.98 seconds (~20 minutes), significantly
better than Bubble Sort but still slow.
Mergesort
● Maintains consistent and efficient performance across all input sizes.
● Slightly slower than Quicksort but still highly efficient.
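A minimal top-down Mergesort sketch is shown below to illustrate why its running time is so consistent: the array is always split in half and merged in linear time, whatever the input order. This is an assumption about the general technique, not the group's code; the names merge_sort_int and merge_sort_range are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sort the half-open range a[lo, hi) using tmp as scratch space. */
static void merge_sort_range(int *a, int *tmp, size_t lo, size_t hi)
{
    if (hi - lo < 2)
        return;                       /* zero or one element: already sorted */

    size_t mid = lo + (hi - lo) / 2;
    merge_sort_range(a, tmp, lo, mid);
    merge_sort_range(a, tmp, mid, hi);

    /* Merge the two sorted halves into tmp, then copy back. */
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];  /* ties take the left item */
    while (i < mid)
        tmp[k++] = a[i++];
    while (j < hi)
        tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
}

/* Returns 0 on success, or -1 if the scratch buffer cannot be allocated. */
int merge_sort_int(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2)
        return 0;

    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    merge_sort_range(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```

The extra buffer of n integers is the usual price of this approach and is one plausible reason for Mergesort running slightly behind Quicksort.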
Quicksort
● Represents the fastest sorting algorithm for large datasets.
● Demonstrates exceptional speed and efficiency.
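The report states that Quicksort uses median-of-three pivot selection. The sketch below shows one common way to implement that idea; it is not taken from the project's source, and the function names and the Hoare-style partition are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

static void swap_int(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Order a[lo], a[mid], a[hi] so that the median of the three ends up at
   a[mid], and return mid. */
static long median_of_three(int *a, long lo, long hi)
{
    long mid = lo + (hi - lo) / 2;
    if (a[mid] < a[lo]) swap_int(&a[mid], &a[lo]);
    if (a[hi] < a[lo])  swap_int(&a[hi], &a[lo]);
    if (a[hi] < a[mid]) swap_int(&a[hi], &a[mid]);
    return mid;
}

static void quicksort_range(int *a, long lo, long hi)
{
    if (lo >= hi)
        return;

    int pivot = a[median_of_three(a, lo, hi)];
    long i = lo, j = hi;

    /* Partition: move values smaller than the pivot left, larger right. */
    while (i <= j) {
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) {
            swap_int(&a[i], &a[j]);
            i++;
            j--;
        }
    }
    quicksort_range(a, lo, j);
    quicksort_range(a, i, hi);
}

void quicksort_m3(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n > 1)
        quicksort_range(a, 0, (long)n - 1);
}
```

On an already sorted array the median of the first, middle, and last elements is the middle element, so each partition splits the range roughly in half instead of degenerating into the quadratic worst case that a first-element pivot would cause.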
Heapsort
● Exhibits slightly slower performance compared to Mergesort and Quicksort.
● Has consistent average performance across five runs.
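For reference, a minimal Heapsort sketch follows. It is a generic textbook-style version written iteratively with a sift-down helper, not the group's implementation; the names sift_down and heapsort_int are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Restore the max-heap property for the subtree rooted at `root`,
   considering only the first n elements of a. */
static void sift_down(int *a, size_t root, size_t n)
{
    for (;;) {
        size_t child = 2 * root + 1;          /* left child */
        if (child >= n)
            return;
        if (child + 1 < n && a[child + 1] > a[child])
            child++;                          /* pick the larger child */
        if (a[root] >= a[child])
            return;
        int t = a[root];
        a[root] = a[child];
        a[child] = t;
        root = child;
    }
}

void heapsort_int(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2)
        return;

    /* Build a max-heap from the bottom up. */
    for (size_t i = n / 2; i-- > 0; )
        sift_down(a, i, n);

    /* Repeatedly move the maximum to the end and shrink the heap. */
    for (size_t end = n - 1; end > 0; end--) {
        int t = a[0];
        a[0] = a[end];
        a[end] = t;
        sift_down(a, 0, end);
    }
}
```

Each of the n extractions costs at most about log n sift-down steps, which gives the O(n log n) bound. The scattered memory accesses inside the heap are a commonly cited reason for Heapsort trailing Mergesort and Quicksort in practice.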
1000000 | 1967.752563s | 1,494.514512s | 0.007214s | 0.102159s | 0.043919s | 0.179164s
Figure 2.1. Line graph for the average runtimes of an already sorted array
Figure 2.2. Scatter plot for the average runtimes of an already sorted array
C. Conclusion
The empirical analysis of sorting algorithms, conducted on both randomly
generated and incrementally sorted inputs, reveals several key trends. For randomly
generated inputs, the results show a clear distinction between the incremental
comparison-based and the recursive sorting algorithms. For small inputs, the
incremental comparison-based algorithms perform slightly better than the recursive
algorithms; however, as the input size increases, the quadratic growth of Selection,
Bubble, and Insertion Sort makes them significantly slower. Bubble Sort performs the
worst because of its large number of swaps, while Insertion Sort remains the fastest
of the incremental comparison-based approaches.
In contrast, recursive algorithms such as Quicksort, Mergesort, and Heapsort
consistently outperform the incremental comparison-based algorithms and remain
efficient even for large datasets. Of all the algorithms tested, Quicksort is the
best overall performer, with the fastest runtime and the slowest growth rate for
very large inputs (N ≥ 100,000).
For incrementally sorted inputs, Bubble and Selection Sort require fewer swaps
but still perform O(n^2) comparisons, which makes them inefficient for large
datasets. The recursive algorithms maintain stable performance, with Quicksort,
Mergesort, and Heapsort remaining efficient. Quicksort's pivot selection strategy,
however, plays a major role in preventing its worst case. As observed, Insertion
Sort is the best sorting algorithm for sorted inputs, with a near-linear growth rate.
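The near-linear behavior of Insertion Sort on sorted input can be checked by counting comparisons. The sketch below is an instrumented illustration with a hypothetical function name, not the project's code. On an already sorted array of n elements it makes exactly n-1 comparisons, while a reversed array needs n(n-1)/2.

```c
#include <assert.h>
#include <stddef.h>

/* Insertion Sort that returns the number of key comparisons it made. */
unsigned long long insertion_sort_counted(int *a, size_t n)
{
    assert(a != NULL || n == 0);

    unsigned long long comparisons = 0;
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        /* Shift larger elements right until key's position is found. */
        while (j > 0) {
            comparisons++;
            if (a[j - 1] <= key)
                break;            /* key is already in place: stop early */
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
    return comparisons;
}
```

With sorted input the inner loop stops after a single comparison per element, which matches the sub-millisecond times the report records for Insertion Sort in the sorted case.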
Comparing the two cases, it is evident that the initial order of the data
significantly affects performance, with Insertion Sort benefiting the most from
pre-sorted inputs. Note that a modified Bubble Sort would have grown similarly to
Insertion Sort, but this was not the case in our program. In addition, while Bubble
and Selection Sort remain inefficient regardless of input order, the recursive
sorting algorithms show minimal variation between the two cases, which reinforces
their reliability across different data distributions.
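"Modified Bubble Sort" is commonly understood to mean Bubble Sort with an early-exit flag that stops as soon as a pass makes no swaps. Under that assumption, a minimal sketch (with a hypothetical function name, and not part of the project's program) looks like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bubble Sort with an early exit; returns the number of comparisons made. */
unsigned long long bubble_sort_early_exit(int *a, size_t n)
{
    assert(a != NULL || n == 0);

    unsigned long long comparisons = 0;
    for (size_t pass = 0; pass + 1 < n; pass++) {
        bool swapped = false;
        /* After each pass the largest remaining value is in its final slot. */
        for (size_t j = 0; j + 1 < n - pass; j++) {
            comparisons++;
            if (a[j] > a[j + 1]) {
                int t = a[j];
                a[j] = a[j + 1];
                a[j + 1] = t;
                swapped = true;
            }
        }
        if (!swapped)
            break;                /* no swaps: the array is already sorted */
    }
    return comparisons;
}
```

On sorted input this variant finishes after one pass of n-1 comparisons, which is the same linear growth Insertion Sort shows. Without the flag, every pass still runs and the cost stays quadratic.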
After analyzing the data and comparing the results, we find that the empirical
results agree with the theoretical analysis for all the sorting algorithms. For
small to medium inputs (10 to 1,000), Selection, Bubble, or Insertion Sort is
recommended, although a recursive algorithm such as Mergesort, Quicksort, or
Heapsort gives similar, if slightly slower, runtimes. For large datasets
(10,000 and above), recursive algorithms should be preferred. Insertion Sort is the
best choice for nearly sorted inputs. These findings demonstrate the importance of
choosing an algorithm based on input size and data properties.
Our group initially thought that only one sorting algorithm could run per execution.
Unfortunately, we discovered this error too late, which prevented us from
submitting the project on time. As a result, we had to make significant revisions to
the program before conducting the empirical analysis again.
The group decided to start over despite missing the project deadline. We fixed the
code first and then restarted the empirical analysis.
Morcozo, Janna Carla R.
● Implemented the Insertion Sort and Merge Sort algorithms in C.
● Assembled and integrated the group’s source code contributions to create the
program's initial working version.
● Assisted in organizing screenshots of executed programs.
● Plotted some of the running times in the table and calculated their averages.
● Rechecked the content for validity and edited it.