
Implement an Algorithm to Read in the Iris Dataset: Programming Assignment 1 Analysis

This document summarizes an analysis of Programming Assignment 1, which involved working with the Iris dataset:
1. The algorithm loads the Iris dataset from R's built-in datasets package.
2. The algorithm visually displays two features at a time, with their class labels, on four scatterplots to show how well the iris types separate.
3. The sorting algorithm developed was merge sort, which runs in O(n log n) time for each of the four features. Code implementing the sort was provided.

Uploaded by maria ren
Copyright © All Rights Reserved

Programming Assignment 1 Analysis

1. Implement an algorithm to read in the Iris dataset.

As shown in the R code, because the iris dataset ships with R's built-in datasets package, I first
loaded the package with library(datasets) and then loaded the iris data with data(iris).
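For readers working outside R, the same loading step can be sketched in Python. This is a minimal sketch, assuming the data is available as headerless CSV text in the UCI "iris.data" column order (four numeric features followed by the class name); the names `FEATURES` and `read_iris` are hypothetical and not part of the original R code.

```python
import csv
import io

# Assumed column order (UCI "iris.data" layout); R's data(iris) names the same
# columns Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def read_iris(text):
    """Parse headerless iris CSV text into per-feature columns and a label list."""
    columns = {name: [] for name in FEATURES}
    labels = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines (the UCI file ends with some)
            continue
        for name, value in zip(FEATURES, row[:4]):
            columns[name].append(float(value))
        labels.append(row[4])
    return columns, labels


# Three sample rows in the assumed layout, one per species.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

columns, labels = read_iris(SAMPLE)
print(columns["petal_length"])  # [1.4, 4.7, 6.0]
```

To read a real file, pass its contents instead, e.g. `read_iris(open(path).read())` with `path` pointing at your local copy.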

2. Implement an algorithm to visually see two sets of features and the class they belong to.

I graphed the four different features of the dataset on four different scatterplots. For each of
them, the black points represent setosa, red points represent versicolor, and green points
represent virginica.
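The document does not say which feature pairs went on each plot, so the following sketch takes any two feature columns. It only prepares the data a plotting library would draw: points are grouped by species under the colour scheme described above. The function name `scatter_series` and the toy input are hypothetical.

```python
from collections import defaultdict

# Colour per species, matching the scheme described in the text.
COLORS = {"setosa": "black", "versicolor": "red", "virginica": "green"}


def scatter_series(x_values, y_values, species):
    """Group (x, y) points by species colour.

    Returns {colour: [(x, y), ...]}; drawing each list as its own scatter
    series reproduces one coloured scatterplot.
    """
    series = defaultdict(list)
    for x, y, name in zip(x_values, y_values, species):
        series[COLORS[name]].append((x, y))
    return dict(series)


# Hypothetical toy input: sepal length vs. sepal width for three flowers.
points = scatter_series([5.1, 7.0, 6.3], [3.5, 3.2, 3.3],
                        ["setosa", "versicolor", "virginica"])
print(points["red"])  # [(7.0, 3.2)]
```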

3. (a) Develop an algorithm (pseudocode) to sort the four features in the dataset.

// Sort one feature using merge sort
mergeSort(feature)
    // base case (stops the recursion): a list of length 0 or 1 is already sorted
    if feature.length <= 1
        return feature
    // divide: midpoint rounded down to an integer
    midpoint = floor(feature.length / 2)
    // conquer: recursively sort each half
    // (slices are end-exclusive, so [0 : midpoint] and [midpoint : feature.length]
    // together cover every element)
    left = mergeSort(feature[0 : midpoint])
    right = mergeSort(feature[midpoint : feature.length])
    // combine: merge the two sorted halves
    return merge(left, right)

merge(x, y)
    // initialize the read positions in x and y
    xIndex = 0
    yIndex = 0
    // initialize an array to store the result
    result = array(x.length + y.length)
    for resultIndex from 0 to result.length - 1
        // take from x if x still has elements and either y is exhausted
        // or x's current element is no larger than y's
        if xIndex < x.length and (yIndex >= y.length or x[xIndex] <= y[yIndex])
            result[resultIndex] = x[xIndex]
            xIndex++
        else
            result[resultIndex] = y[yIndex]
            yIndex++
    return result
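A direct Python translation of the mergeSort/merge pseudocode is sketched below. It writes the slices end-exclusive (`[:mid]` and `[mid:]`) so no element is skipped, and the merge loop checks each index bound before reading from that list. This is an illustrative sketch; the author's actual implementation is in R.

```python
def merge_sort(feature):
    """Return a new sorted list using merge sort."""
    # Base case: a list of length 0 or 1 is already sorted.
    if len(feature) <= 1:
        return list(feature)
    # Divide at the midpoint (// rounds down); end-exclusive slices
    # feature[:mid] and feature[mid:] together cover every element.
    mid = len(feature) // 2
    left = merge_sort(feature[:mid])
    right = merge_sort(feature[mid:])
    return merge(left, right)


def merge(x, y):
    """Merge two sorted lists into one sorted list."""
    result = []
    x_index = y_index = 0
    for _ in range(len(x) + len(y)):
        # Take from x while it has elements left and either y is exhausted
        # or x's current value is no larger (<= keeps equal values in order).
        if x_index < len(x) and (y_index >= len(y) or x[x_index] <= y[y_index]):
            result.append(x[x_index])
            x_index += 1
        else:
            result.append(y[y_index])
            y_index += 1
    return result


print(merge_sort([5.1, 4.9, 4.7, 4.6, 5.0]))  # [4.6, 4.7, 4.9, 5.0, 5.1]
```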


(b) Provide the efficiency (running time) of your sorting algorithm in O-notation.

I used merge sort as the foundation for sorting the features. Merge sort follows the divide,
conquer, and combine method introduced in Module 1. We first split the data array into halves,
and keep halving until it cannot be divided further, i.e. until every list holds a single element.
Then we merge the single elements pairwise into small sorted lists, and keep merging those sorted
lists until one sorted list remains. As for the cost of each step, as noted in the textbook on
page 35, dividing the array takes constant time, Θ(1). We then recursively solve two subproblems,
each of size n/2, which together take 2T(n/2). Finally, merging (with the merge(x, y) function
above) takes linear time, Θ(n). We also have to account for the base case: if the list has only
one element, we simply return it.
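To make the "Θ(n) work per level" argument concrete, the sketch below instruments a merge sort to record how many elements are written by merges at each recursion depth. For n = 8 there are lg 8 = 3 merging levels, each writing n = 8 elements, for a total of n lg n = 24. The function name `merge_sort_with_work` is hypothetical and exists only for this illustration.

```python
from collections import Counter


def merge_sort_with_work(values, depth=0, work=None):
    """Merge sort that records, per recursion depth, how many elements merges write."""
    if work is None:
        work = Counter()
    if len(values) <= 1:  # base case: nothing to merge
        return list(values), work
    mid = len(values) // 2
    left, _ = merge_sort_with_work(values[:mid], depth + 1, work)
    right, _ = merge_sort_with_work(values[mid:], depth + 1, work)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two is non-empty
    merged.extend(right[j:])
    work[depth] += len(merged)  # the linear combine step at this depth
    return merged, work


result, work = merge_sort_with_work([8, 3, 5, 1, 7, 2, 6, 4])
print(result)                # [1, 2, 3, 4, 5, 6, 7, 8]
print(sorted(work.items()))  # [(0, 8), (1, 8), (2, 8)]
print(sum(work.values()))    # 24
```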

So the total running time of merge sort is:

$$T(n) = \begin{cases} \Theta(1) & \text{if } n = 1 \\ 2T(n/2) + \Theta(n) & \text{if } n > 1 \end{cases}$$

Using the master theorem, we can solve the recurrence $T(n) = 2T(n/2) + \Theta(n)$ with

$$a = 2, \quad b = 2, \quad f(n) = n, \qquad n^{\log_b a} = n^{\log_2 2} = n^1 = n.$$

Using Case 2 of the master theorem with $k = 0$, we find

$$f(n) = \Theta\left(n^{\log_b a} \lg^k n\right) = \Theta\left(n \cdot \lg^0 n\right) = \Theta(n),$$

which satisfies the condition on $f(n)$, so by the second case of the master theorem,

$$T(n) = \Theta\left(n^{\log_b a} \lg^{k+1} n\right) = \Theta\left(n \cdot \lg^1 n\right) = \Theta(n \lg n).$$
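As a numerical check on the master-theorem result, the recurrence can be evaluated directly. This sketch assumes the Θ(1) and Θ(n) terms are exactly 1 and n (a choice of constants made only for illustration). For n a power of two the recurrence then unrolls to exactly n lg n + n, which is Θ(n lg n).

```python
def T(n):
    """Evaluate T(n) = 2T(n/2) + n with T(1) = 1, for n a power of two."""
    if n == 1:
        return 1
    return 2 * T(n // 2) + n


for k in range(1, 6):
    n = 2 ** k
    # For n = 2**k, unrolling gives n*k + n, i.e. n lg n + n.
    print(n, T(n), n * k + n)
```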

So the time complexity of the sorting algorithm (merge sort) is O(n lg n). Since we need to sort
four times, once for each feature, the total cost is 4 · O(n lg n). The constant factor drops out,
so the overall complexity is still O(n lg n).

(c) Implement your algorithm in your code of choice.

Please see R code and code output.
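The R code itself is not reproduced here. As a rough stand-in, the sketch below applies a compact merge sort to each of the four feature columns independently, which is the "four calls, each O(n lg n)" structure described in part (b). It uses the standard library's `heapq.merge` for the combine step; the function names and the two-flower input are hypothetical.

```python
from heapq import merge


def merge_sort(values):
    """Compact merge sort: sort each half recursively, then merge them."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    return list(merge(merge_sort(values[:mid]), merge_sort(values[mid:])))


def sort_features(columns):
    """Sort every feature column independently (one merge sort per feature)."""
    return {name: merge_sort(values) for name, values in columns.items()}


# Hypothetical two-flower input using the four iris feature names.
columns = {
    "sepal_length": [7.0, 5.1],
    "sepal_width": [3.2, 3.5],
    "petal_length": [4.7, 1.4],
    "petal_width": [1.4, 0.2],
}
print(sort_features(columns)["petal_length"])  # [1.4, 4.7]
```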

(d) Determine if any of the four can separate the three plant types.
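The document judges separation visually. One way to make the check mechanical, offered here as an assumed criterion rather than the author's method, is to require that after sorting by the feature each species forms one contiguous run with no value shared across a boundary. That is equivalent to the species' value ranges not overlapping. The function name `separates` and the inputs below are hypothetical.

```python
def separates(values, labels):
    """Return True if every class's value range lies strictly apart from the others'.

    Equivalently: sorted by this feature, each class forms one contiguous run,
    so thresholds placed between the runs split the classes perfectly.
    """
    ranges = {}
    for value, label in zip(values, labels):
        low, high = ranges.get(label, (value, value))
        ranges[label] = (min(low, value), max(high, value))
    ordered = sorted(ranges.values())  # class ranges ordered by their minimum
    return all(prev[1] < nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


# Hypothetical inputs: non-overlapping ranges, then overlapping ones.
print(separates([1.4, 1.3, 4.7, 6.0],
                ["setosa", "setosa", "versicolor", "virginica"]))  # True
print(separates([5.0, 6.0, 5.5, 6.5],
                ["setosa", "setosa", "versicolor", "versicolor"]))  # False
```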

Looking at the graphs generated above, we can see that after we sorted all the data values, sepal
length isn’t a great separator of the three plant types. On the graph, we can see that the black
