Implement An Algorithm To Read in The Iris Dataset.: Programming Assignment 1 Analysis
As shown in the R code, the iris dataset is already included in R's built-in datasets package, so I
first loaded the package with library(datasets) and then called data(iris) to load the iris data.
2. Implement an algorithm to visually see two sets of features and the class they belong to.
I graphed the four different features of the dataset on four different scatterplots. For each of
them, the black points represent setosa, red points represent versicolor, and green points
represent virginica.
3. (a) Develop an algorithm (pseudocode) to sort the four features in the dataset.
// sort one feature using merge sort
mergeSort (feature)
    // base case: a list of 0 or 1 elements is already sorted
    if (feature.length <= 1)
        return feature
    midpoint = feature.length / 2 // round down to an integer
    // recursively sort the left and right halves (the end of each slice is exclusive)
    left = mergeSort (feature [0 : midpoint])
    right = mergeSort (feature [midpoint : feature.length])
    return merge (left, right)

merge (x, y)
    // initialize the indices into x and y
    xIndex = 0
    yIndex = 0
    // initialize the array that stores the result
    result = array(x.length + y.length)
    for (resultIndex in 0 : result.length - 1)
        // take from x if y is used up, or if x still has elements and its next one is not larger
        if (yIndex >= y.length or (xIndex < x.length and x[xIndex] <= y[yIndex]))
            result[resultIndex] = x[xIndex]
            xIndex ++
        else
            result[resultIndex] = y[yIndex]
            yIndex ++
    return result
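The pseudocode above could be turned into runnable code along the following lines. This is only a
minimal Python sketch, not the R code submitted with the assignment. The names merge_sort, merge,
and sample_feature are my own assumptions, and the sample values are made up for illustration
rather than read from the iris data.

```python
def merge(x, y):
    """Merge two already-sorted lists into one sorted list."""
    result = []
    x_index = 0
    y_index = 0
    while x_index < len(x) and y_index < len(y):
        # Ties go to x, which keeps equal values in their original order.
        if x[x_index] <= y[y_index]:
            result.append(x[x_index])
            x_index += 1
        else:
            result.append(y[y_index])
            y_index += 1
    # One of the two lists is used up; copy whatever remains of the other.
    result.extend(x[x_index:])
    result.extend(y[y_index:])
    return result


def merge_sort(feature):
    """Return a new sorted list containing the values of `feature`."""
    if len(feature) <= 1:  # base case: nothing left to divide
        return list(feature)
    midpoint = len(feature) // 2  # integer division rounds down
    left = merge_sort(feature[:midpoint])
    right = merge_sort(feature[midpoint:])
    return merge(left, right)


# Hypothetical sample values, not the actual iris measurements.
sample_feature = [5.1, 4.9, 6.3, 5.8, 4.7, 7.0, 5.0]
sorted_feature = merge_sort(sample_feature)
print(sorted_feature)
```

To sort all four features, merge_sort would simply be called once per feature column.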
I used merge sort as the foundation for sorting each feature. Merge sort follows the divide,
conquer, and combine method that was mentioned in module 1. We first divide the data array into
halves until we cannot divide further, i.e. until we have lists of single elements. Then we merge
the elements two by two into small sorted lists, and keep merging the sorted lists until we end up
with one single sorted list. As for the timing of each step, as mentioned in the textbook on pg 35,
the process of dividing the array into smaller parts takes constant time, so we label it as Θ(1).
Then we need to recursively solve two subproblems, each of size n/2, which contributes 2T(n/2).
The merging part (the merge(x, y) function used above) takes linear time, Θ(n). We also have to
take into account the base case: if there is only 1 element in the list, we just return that
single element. Putting these together gives the recurrence T(n) = 2T(n/2) + Θ(n).
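In the form T(n) = aT(n/b) + f(n) used by the master theorem, the recurrence
T(n) = 2T(n/2) + Θ(n) has
a = 2, b = 2, f(n) = n
so n^(log_b a) = n^(log_2 2) = n.
Let k = 0. Then
f(n) = Θ(n^(log_b a) lg^k n) = Θ(n),
which satisfies the condition for f(n), so according to the second case of the master
theorem,
T(n) = Θ(n^(log_b a) lg^(k+1) n) = Θ(n lg n).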
So the time complexity of the sorting algorithm (merge sort) is O(n lg n). Since we need to run it
four times, once for each feature, we multiply the running time by 4. The constant factor drops
out in asymptotic notation, so the resulting complexity is still O(n lg n).
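As a rough sanity check on this result, the recurrence T(n) = 2T(n/2) + n can be evaluated
numerically. The Python sketch below relies on assumptions that are not part of the analysis
above: T(1) = 1, a merge cost of exactly n, and n restricted to powers of two. The function name
merge_sort_cost is hypothetical.

```python
def merge_sort_cost(n):
    """Evaluate T(n) = 2*T(n/2) + n with T(1) = 1 (assumed unit costs).

    Only meaningful when n is a power of two.
    """
    if n == 1:
        return 1
    return 2 * merge_sort_cost(n // 2) + n


for n in [2, 8, 64, 1024]:
    lg_n = n.bit_length() - 1          # exact log2 for a power of two
    one_feature = merge_sort_cost(n)   # cost of sorting one feature
    four_features = 4 * one_feature    # cost of sorting all four features
    # The ratio to n lg n stays bounded (it equals 4 * (1 + 1/lg n)),
    # which is what it means for the constant factor to drop out.
    print(n, one_feature, four_features, four_features / (n * lg_n))
```

Under these assumptions T(n) works out to n lg n + n for powers of two, so the printed ratio
shrinks toward 4 as n grows. Each iris feature has 150 values, which is not a power of two, so
this only illustrates the growth rate.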
(d) Determine if any of the four can separate the three plant types.
Looking at the graphs generated above, we can see that after we sorted all the data values, sepal
length isn’t a great separator of the three plant types. On the graph, we can see that the black