Data Structures and Algorithms Coursework 2020-21


Rev. Date: 29-01-21

DATA STRUCTURES AND ALGORITHMS

Coursework Assignment 2020-21

SID Number Click or tap here to enter text.

Submission Date

Task A Search Algorithm Selected: Choose an item.

Task B Shell Sort Gap Sequences Compared:
First: Choose an item. Second: Choose an item. Third: Choose an item.

Task C Hash Functions Compared:
First: Choose an item. Second: Choose an item. Third: Choose an item.

Presentation Checklist: Choose an item.

☐ The structure of this template has not been changed
☐ Code included in the report has been inserted as plain text in a fixed width font
☐ Code is indented and commented, and uses meaningful identifier names
☐ Written sections use the third person voice (i.e., not the first person)
☐ All external sources used are listed in the References section for each task
☐ Equations are properly typeset using the Microsoft Word Equation Editor
☐ Figures are properly exported and not poor-quality screenshots
☐ The work has been placed in a folder containing this template and code only
☐ The folder name is your SID, and is compressed with ZIP (not RAR/7z) for upload
☐ On upload to Turnitin, the title of the submission is your SID number only

© 2021 Ian van der Linde, PhD


Marking Scheme:

Task A: Search Algorithm Analysis


Section Marks Awarded (0..5)
Brief Precise Description of how the Selected Search Algorithm Works (Written) 5
Static Analysis of Algorithm (Equations) 2.5
Empirical Analysis Test Harness (Code) 2.5
Empirical Analysis Test Function (Code) 2.5
Empirical Analysis Line Graph (Figure)
Evaluation of Empirical Analysis (Written)
TOTAL (0..30)

Task B: Sort Algorithm Analysis


Section Marks Awarded (0..5)
Precise Description of Shell Sort Algorithm and the Role of the Gap Sequence (Written) 5
Functions for Generating Selected Gap Sequences (Code) 2.5
Modified Shell Sort Function for Recording Metrics of Interest - Comparisons and Element Movements (Code) 2.5
Test Harness for Generating Test Data, Running Shell Sort Function, and Aggregating Metrics of Interest (Code)
Empirical Analysis Results Line Graph (Figure, Statistics)
Evaluation of Empirical Test Findings (Written)
TOTAL (0..30)

Task C: Hash Function Analysis


Section Marks Awarded (0..5)
Brief and Precise Description of Hash Tables and the Role of Hash Functions (Written) 5
Description and Static Demonstration of Hash Functions Selected (Written, Equations) 5
Implementation of Chosen Hash Functions as Function accepting a Key and Returning an Index (Code) 2.5
Test Harness for Evaluating Hash Function Efficacy with Random and Sequential Keys (Code)
Clustering Visualisations and Summary Statistics (Figure, Statistics)
Evaluation of Relative Efficacy of Hash Functions Compared (Written)
TOTAL (0..30)

Presentation:
All Checklist Items Followed (0 or 10) 10

GRAND TOTAL (0..100) 40

Task A: Search Algorithm Analysis
In this task you will compare the performance of your selected array search algorithm (ternary, exponential or
Fibonaccian) to the search algorithms provided as examples, using both static analysis and empirical experimentation.
Closely follow the structure of the test harness and search functions provided as examples (Jump Search example in
Appendix A) to implement your chosen algorithm; note that you are only required to examine the scenario of
successful searches. The examples are written in MATLAB, but you are free to use an alternative language if you
wish (e.g., Python); however, you will then need to convert the examples I have provided to that language to enable
you to plot graphs that compare their empirical performance to that of your chosen algorithm. If you use Python, you
could use matplotlib to plot graphs; if you use a language like C# or Java you may need to store your results in a text
file and plot them using Excel or similar. The easiest approach is to use MATLAB and simply modify the
examples provided.

Brief Precise Summary of how the Selected Search Algorithm Works (Written; Max 100 Words):
Make this box bigger if you need to, but do not exceed the word limit

Static Analysis of Algorithm (Table; Use Word Equation Editor):


Case       Exact Time       Asymptotic Notation
Best
Average
Worst

Empirical Analysis Test Harness (Code):


Make this box bigger as needed

Empirical Analysis Test Function (Code):


Make this box bigger as needed

Empirical Analysis Results Line Graph (Figure):


Make this box bigger as needed

Evaluation of Empirical Test Findings (Written; Max 100 Words):


Make this box bigger as needed, but do not exceed the word limit

References:
Make this box bigger as needed

Task B: Sort Algorithm Analysis
In this task you will empirically compare three different gap sequence functions that can be used with the Shell sort
algorithm, and compare them to Shell's original sequence (Shell, 1959). Appendix B contains an example function for
generating Shell’s original gap sequence, an implementation of Shell sort that uses Shell’s original gap sequence
function, and a table of alternative sequences, including algebraic processes that can generate them, the first few
terms of each sequence, and the worst case time complexities that are thought to be produced when these gap
sequences are used. As noted in class, big-O notation, and especially worst case analyses, do not necessarily capture
how an algorithm will function in the real world, so this task requires that you create a simulation to test the
following scenarios: i. reversed arrays (the putative worst case scenario); ii. random arrays of different lengths
containing unique values. Note that, unlike Task A, your test harness cannot test all possibilities (i.e., arrays
unsorted in all possible ways) because there are too many (n!). You will discuss the results of your experimentation
and compare the relative performance of the gap sequences you selected. Do not shy away from choosing gap sequences
that share worst case time complexities, as there could be differences in performance in practice that would be
interesting to uncover. For sorting there are two units of work that tell us how much work the algorithm has done:
comparisons and element moves. You will need to add to the Shell sort function provided to ensure that this
information is recorded. You may use any language you wish, but MATLAB and Python with matplotlib provide in-built
functions for graph plotting and statistics, so will be easier.
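
As an illustration of the two test scenarios described above, the following Python sketch generates a reversed array and random arrays of unique values. It is a minimal sketch only: the function names, the fixed seed, the array lengths and the number of trials are all assumptions made for the example, not requirements of the task.

```python
import math
import random


def reversed_array(n):
    """Scenario i: strictly descending values n, n-1, ..., 1."""
    return list(range(n, 0, -1))


def random_unique_array(n, rng):
    """Scenario ii: a random permutation of 1..n, so every value is unique."""
    values = list(range(1, n + 1))
    rng.shuffle(values)
    return values


# Assumed settings for illustration; choose your own lengths and trial counts.
rng = random.Random(2021)      # fixed seed so that runs are repeatable
test_lengths = [8, 16, 32]
trials_per_length = 5

test_data = {
    n: {
        "reversed": reversed_array(n),
        "random": [random_unique_array(n, rng) for _ in range(trials_per_length)],
    }
    for n in test_lengths
}

# Why exhaustive testing is impractical: even 10 elements have 10! orderings.
orderings_of_ten = math.factorial(10)  # 3628800
```

Averaging the metrics over several random arrays per length, rather than relying on a single array, should give smoother curves in the results graph.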

Precise Description of Shell Sort Algorithm and the Role of the Gap Sequence (Written; Max 100 Words):
Make this box bigger as needed, but do not exceed the word limit

Functions for Generating Selected Gap Sequences (Code):


Make this box bigger as needed

Modified Shell Sort Function for Recording Metrics of Interest - Comparisons and Element Movements (Code):
Make this box bigger as needed

Test Harness for Generating Test Data, Running Shell Sort Function, and Aggregating Metrics of Interest (Code):
Make this box bigger as needed

Empirical Analysis Results Line Graph (Figure, Statistics):


Make this box bigger as needed

Evaluation of Empirical Test Findings (Written; Max 100 Words):


Make this box bigger as needed, but do not exceed the word limit

References:

Make this box bigger as needed

Task C: Hash Function Analysis

In this task, you are to perform an empirical comparison of the efficacy of three different hash functions (selecting
from midsquare, modulus, folding, truncation, and Fibonacci) to assess their usefulness for transforming integer keys
to array indices in a hash table with the objective of minimising collisions and clustering. This task is a little more
open-ended than the previous two; however, you should build a simulation that involves the generation of keys
(random and sequential) and measures the degree of clustering (i.e., how evenly spread the keys are in the table) under
different load factors and how often collisions occur. You should also evaluate the avalanche property and other
desirable features of a hash function, such as having a surjective relationship with the target array and being fast to
compute. For each hash function, do some background reading to understand how it may be applied in practice (e.g.,
folding may, in practice, involve grouping digits before summing), and state any assumptions you make.
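
To make the idea of measuring collisions under a given load factor concrete, here is a minimal Python sketch using the modulus method only. The names `modulus_hash` and `collision_stats`, the table size of 101, the key ranges and the seed are assumptions chosen for illustration. A collision is counted here as a key that lands in a slot already holding another key, which is one possible definition among several.

```python
import random
from collections import Counter


def modulus_hash(key, table_size):
    """Modulus method: the index is the remainder of key / table size."""
    return key % table_size


def collision_stats(keys, hash_function, table_size):
    """Return (collisions, occupied_slots, longest_chain) for a set of keys."""
    slot_counts = Counter(hash_function(key, table_size) for key in keys)
    # Every key beyond the first in a slot is counted as one collision.
    collisions = sum(count - 1 for count in slot_counts.values())
    return collisions, len(slot_counts), max(slot_counts.values())


# Assumed experiment settings (hypothetical values, not part of the brief).
table_size = 101
load_factor = 0.5
num_keys = int(load_factor * table_size)

sequential_keys = list(range(1000, 1000 + num_keys))
rng = random.Random(2021)
random_keys = rng.sample(range(10**6), num_keys)

sequential_result = collision_stats(sequential_keys, modulus_hash, table_size)
random_result = collision_stats(random_keys, modulus_hash, table_size)
```

With fewer sequential keys than slots, the modulus method places each key in its own slot, so `sequential_result` reports no collisions; random keys will generally behave differently, which is the kind of contrast the evaluation section could discuss.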

Brief and Precise Description of Hash Tables and the Role of Hash Functions (Written, Max 100 Words):
Make this box bigger if you need to, but do not exceed the word limit

Description and Static Demonstration of Hash Functions Selected (Written, Equations, Max 50 Words):
Make this box bigger if you need to, but do not exceed the word limit

Implementation of Chosen Hash Functions as Function accepting a Key and Returning an Index (Code):
Make this box bigger if you need to, but do not exceed the word limit

Test Harness for Evaluating Hash Functions with Random and Sequential Keys (Code):
Make this box bigger as needed

Clustering Visualisations and Summary Statistics (Figure, Statistics):


Make this box bigger as needed

Evaluation of Relative Efficacy of Hash Functions Compared (Written, Max 100 Words):
Make this box bigger as needed

References:
Make this box bigger as needed

Appendix A: Searching Source Code

Example Test Function for Empirical Analysis:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Title: Jump Search Function
% Author: Ian van der Linde
% Rev. Date: 27-01-21
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [numComparisons, currentIndex] = jumpSearch(V, target)

    numComparisons = 0;
    N = length(V);

    % Check First Element

    numComparisons = numComparisons + 1;
    if(V(1)==target)
        currentIndex = 1;
        return;
    end % end if

    % Determine Jump Size

    jumpSize = ceil(sqrt(N));

    % Jumping Part: Stop at the First Block End that is >= target;
    % if No Block End Qualifies, the Target can Only be in the Tail

    blockStart = 1;
    blockEnd = N;

    for currentIndex = jumpSize:jumpSize:N

        numComparisons = numComparisons + 1;
        if(V(currentIndex)>=target)
            blockEnd = currentIndex;
            break;
        end % end if
        blockStart = currentIndex + 1;

    end % end for

    % Linear Search the Identified Block

    for linearSearchIndex = blockStart:blockEnd

        numComparisons = numComparisons + 1;
        if(V(linearSearchIndex)==target)
            currentIndex = linearSearchIndex;
            return;
        end % end if

    end % end for

    currentIndex = -1; % Unsuccessful Search, Return -1

end % end function
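
If you take the Python route mentioned in Task A, a port of the jump search function might look like the sketch below. It is not a definitive translation: the name `jump_search` and the 0-based indexing are choices made for this example. This version stops jumping at the first block end that is greater than or equal to the target, and searches the tail of the array if no block end qualifies.

```python
import math


def jump_search(values, target):
    """Return (num_comparisons, index) for a sorted list.

    Indices are 0-based (unlike MATLAB); index is -1 if target is absent.
    """
    n = len(values)
    num_comparisons = 0
    if n == 0:
        return num_comparisons, -1

    # Check first element
    num_comparisons += 1
    if values[0] == target:
        return num_comparisons, 0

    # Determine jump size
    jump_size = math.ceil(math.sqrt(n))

    # Jumping part: find the block whose last element is >= target
    block_start, block_end = 0, n - 1
    for block_last in range(jump_size - 1, n, jump_size):
        num_comparisons += 1
        if values[block_last] >= target:
            block_end = block_last
            break
        block_start = block_last + 1

    # Linear search of the identified block
    for index in range(block_start, block_end + 1):
        num_comparisons += 1
        if values[index] == target:
            return num_comparisons, index

    return num_comparisons, -1


# Usage: best, average and worst comparisons for successful searches, N = 16
array = list(range(1, 17))
counts = [jump_search(array, t)[0] for t in array]
best, average, worst = min(counts), sum(counts) / len(counts), max(counts)
```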

Example Test Harness for Empirical Analysis:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Title: Jump Search Test Harness
% Author: Ian van der Linde
% Rev. Date: 03-02-20
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

clear all;close all;clc;

maxArraySize = 1024;

for N = 1:maxArraySize

    array = 1:N;

    for searchTarget = 1:N
        comparisons(searchTarget) = jumpSearch(array, searchTarget);
    end % end for

    best_comparisons(N) = min(comparisons);
    average_comparisons(N) = mean(comparisons);
    worst_comparisons(N) = max(comparisons);

    clear comparisons;

end % end for

% Plot Observed

figure;
plot(1:maxArraySize, best_comparisons,'g','LineWidth',3);hold on; % best
plot(1:maxArraySize, average_comparisons,'y','LineWidth',3); % avg
plot(1:maxArraySize, worst_comparisons,'r','LineWidth',3); % worst

% Plot Expected

plot(1:maxArraySize,linspace(1,1,maxArraySize),'k:','LineWidth', 2); % best


plot(1:maxArraySize,sqrt(1:maxArraySize),'k--','LineWidth', 2); % avg
plot(1:maxArraySize,2*sqrt(1:maxArraySize),'k-','LineWidth', 2); % worst

% Annotate Chart

legend('O best','O average','O worst','E best','E average','E worst');


xlabel('Array Length (N)','FontSize',14);
ylabel('Comparisons', 'FontSize', 14);
titleString = sprintf('%s\n%s', 'Jump Search (Successful)', ...
    'Solid: Observed (O), Dotted: Expected (E)');
title(titleString,'FontSize', 14);

xlim([0 maxArraySize]);
ylim([0 max(worst_comparisons)]);
axis square;

% Store High-resolution Image for Report

print -f1 -r300 -dbmp jumpSearchSuccessful.bmp

Appendix B: Sorting Source Code

Example Gap Generator Function:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Title: Shell (1959) Gap Generator Function
% Author: Ian van der Linde, PhD
% Date: 29-01-21
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function h = shell1959(N,k)
h = floor(N/2^k);
end

Example Program to Test Gap Generator Function:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Title: Shell (1959) Gap Generator Function Test
% Author: Ian van der Linde, PhD
% Date: 29-01-21
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

clear all;
close all;
clc;

N = 1024; % Array Length

k = 1; % Gap Index (k=1 is first gap to use)


h = N; % h is Gap Size; Initialise to N as a dummy that is >1

while(h>1)
    h = shell1959(N,k);
    disp(h);
    k = k + 1;
end
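
For those working in Python, the gap generator and its test program could be sketched as follows. The names `shell_1959_gap` and `shell_1959_gaps` are hypothetical; the second function simply collects the gaps into a list, largest first, instead of displaying them one at a time.

```python
def shell_1959_gap(n, k):
    """Shell (1959): the k-th gap is floor(n / 2^k), with k = 1, 2, 3, ..."""
    return n // 2 ** k


def shell_1959_gaps(n):
    """Collect all gaps >= 1 for an array of length n, largest first."""
    gaps = []
    k = 1
    while True:
        gap = shell_1959_gap(n, k)
        if gap < 1:
            break
        gaps.append(gap)
        k += 1
    return gaps


print(shell_1959_gaps(1024))  # [512, 256, 128, 64, 32, 16, 8, 4, 2, 1]
```

Returning the whole sequence as a list makes it straightforward to pass different gap sequences to one sorting function when comparing them.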

Example Shell Sort Function:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Title: Shell Sort Function
% Author: Ian van der Linde, PhD
% Rev Date: 29-01-21
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [V] = shellSort(V)

N = length(V); % Length of Array to be Sorted


k = 1; % Gap Index (1, 2, 3, ...)
gap = shell1959(N,k); % Set Initial Gap Size

while(gap>0)

    fprintf('\nCurrent Gap h = %d', gap);

    for i = gap+1:N
        temp = V(i);
        j = i;
        while (j >= gap+1) && (V(j-gap) > temp)
            V(j) = V(j-gap);
            j = j - gap;
        end % while
        V(j) = temp;
    end % for

    if(gap == 2) % Causes shell sort to become Insertion Sort
        gap = 1;
    else
        k = k + 1; % Increment Gap Number
        gap = shell1959(N,k); % Set New Gap Value using Shell (1959)
    end % if

end % while

end % end function
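
One way the two units of work named in Task B could be recorded is sketched below in Python. This is an illustration under stated assumptions, not the required solution: the function name is hypothetical, the gap sequence is passed in as a list that must end with 1, a comparison is counted each time two array elements are compared, and a move is counted for each shift plus the final placement when the element actually changes position. Other counting conventions are defensible, so state yours in the report.

```python
def shell_sort_with_metrics(values, gaps):
    """Sort a copy of values using the given gaps (largest first, ending in 1).

    Returns (sorted_list, comparisons, moves).
    """
    data = list(values)
    comparisons = 0
    moves = 0

    for gap in gaps:
        for i in range(gap, len(data)):
            temp = data[i]
            j = i
            while j >= gap:
                comparisons += 1            # data[j - gap] compared with temp
                if data[j - gap] <= temp:
                    break
                data[j] = data[j - gap]     # shift the larger element right
                moves += 1
                j -= gap
            if j != i:
                data[j] = temp              # place element in its new position
                moves += 1

    return data, comparisons, moves


result, comparisons, moves = shell_sort_with_metrics([5, 4, 3, 2, 1], [2, 1])
```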

Appendix B.2: Shell Sort Gap Sequences

Author | Sequence Formula | Terms | Worst-case Time Complexity
Shell (1959) | $\lfloor N/2^k \rfloor$ | $\lfloor N/2 \rfloor, \lfloor N/4 \rfloor, \ldots, 1$ | $O(N^2)$
Frank & Lazarus (1960) | $2\lfloor N/2^{k+1} \rfloor + 1$ | $2\lfloor N/4 \rfloor + 1, \ldots, 3, 1$ | $O(N^{3/2})$
Hibbard (1963) | $2^k - 1$ | 1, 3, 7, 15, 31, 63, … | $O(N^{3/2})$
Papernov & Stasevich (1965) | 1, then $2^k + 1$ | 1, 3, 5, 9, 17, 33, 65, … | $O(N^{1.5})$
Pratt (1971) | $2^p 3^q$ (3-smooth numbers) | 1, 2, 3, 4, 6, 8, 9, 12, … | $O(N \log^2 N)$
Knuth (1973) | $(3^k - 1)/2 < \lceil N/3 \rceil$ | 1, 4, 13, 40, 121, … | $O(N^{3/2})$
Incerpi & Sedgewick (1985) | $\prod_{I} a_q$, where $a_0 = 3$, $a_q = \min\{n \in \mathbb{N} : n \ge (5/2)^{q+1}, \forall p : 0 \le p < q \Rightarrow \gcd(a_p, n) = 1\}$, $I = \{0 \le q < r \mid q \ne \frac{1}{2}(r^2 + r) - k\}$, $r = \lfloor \sqrt{2k + \sqrt{2k}} \rfloor$ | 1, 3, 7, 21, 48, 112, … | $O(N^{1 + \sqrt{8 \ln(5/2) / \ln N}})$
Sedgewick (1982) | 1, then $4^k + 3 \cdot 2^{k-1} + 1$ | 1, 8, 23, 77, 281, … | $O(N^{4/3})$
Sedgewick (1986) | $9(2^k - 2^{k/2}) + 1$ if $k \bmod 2 = 0$; $8 \cdot 2^k - 6 \cdot 2^{(k+1)/2} + 1$ if $k \bmod 2 = 1$ | 1, 5, 19, 41, 109, … | $O(N^{4/3})$
Gonnet & Baeza-Yates (1991) | $h_k = \max\{\lfloor 5h_{k-1}/11 \rfloor, 1\}$; $h_0 = N$ | $\lfloor 5N/11 \rfloor, \lfloor \frac{5}{11} \lfloor 5N/11 \rfloor \rfloor, \ldots, 1$ | unknown
Tokuda (1992) | $\lceil \frac{1}{5}(9 (9/4)^{k-1} - 4) \rceil$; where $h_k < N$ | 1, 4, 9, 20, 46, 103, … | unknown
Ciura (2001) | unknown | 1, 4, 10, 23, 57, 132, 301, 701 | unknown
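
As a quick check of how a formula in the table yields its listed terms, the Python sketch below generates the first few terms of the Hibbard, Knuth and Tokuda sequences. The function names are hypothetical, and each function returns a fixed number of terms in ascending order rather than applying the upper bound that depends on N; a real gap generator would also need to stop at the appropriate size and reverse the order.

```python
import math
from fractions import Fraction


def hibbard_terms(count):
    """Hibbard (1963): 2^k - 1 for k = 1, 2, ..."""
    return [2 ** k - 1 for k in range(1, count + 1)]


def knuth_terms(count):
    """Knuth (1973): (3^k - 1) / 2 for k = 1, 2, ..."""
    return [(3 ** k - 1) // 2 for k in range(1, count + 1)]


def tokuda_terms(count):
    """Tokuda (1992): ceil((9 * (9/4)^(k-1) - 4) / 5) for k = 1, 2, ...

    Exact fractions are used so that rounding error cannot affect the ceiling.
    """
    return [
        math.ceil((9 * Fraction(9, 4) ** (k - 1) - 4) / 5)
        for k in range(1, count + 1)
    ]


print(hibbard_terms(6))  # [1, 3, 7, 15, 31, 63]
print(knuth_terms(5))    # [1, 4, 13, 40, 121]
print(tokuda_terms(6))   # [1, 4, 9, 20, 46, 103]
```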
