Midterm - APS1070 - 2020 - 05 Summer


1.

Please read this statement and Agree/Disagree below:

“In submitting this assessment, I confirm that my conduct during this quiz adheres to the Code
of Behaviour on Academic Matters. I confirm that I did NOT act in such a way that would
constitute cheating, misrepresentation, or unfairness, including but not limited to, using
unauthorized aids and assistance, impersonating another person, and committing plagiarism. I
pledge upon my honour that I have not violated the Faculty of Applied Science & Engineering’s
Honour Code during this assessment.”

2. [2] Which of the following statements is false?


a. Basis vectors forming an orthogonal basis are always orthonormal.
b. Basis vectors forming an orthonormal basis are always orthogonal.
c. Basis vectors forming an orthonormal basis are always normal.
d. All vectors in an orthonormal basis have length 1.
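
The two definitions in this question can be checked numerically. The following is a minimal sketch, not part of the original exam; the helper names `dot`, `is_orthogonal`, and `is_orthonormal`, the tolerance `tol`, and the two example bases are all illustrative assumptions.

```python
import math

def dot(u, v):
    """Standard inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def is_orthogonal(vectors, tol=1e-9):
    """True if every pair of distinct vectors has a zero inner product."""
    n = len(vectors)
    return all(
        abs(dot(vectors[i], vectors[j])) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )

def is_orthonormal(vectors, tol=1e-9):
    """True if the vectors are mutually orthogonal and each has length 1."""
    unit_length = all(abs(math.sqrt(dot(v, v)) - 1.0) <= tol for v in vectors)
    return is_orthogonal(vectors, tol) and unit_length

# Hypothetical example bases for R^2 (not taken from the exam).
scaled_basis = [(2.0, 0.0), (0.0, 3.0)]
unit_basis = [(1.0, 0.0), (0.0, 1.0)]

print(is_orthogonal(scaled_basis), is_orthonormal(scaled_basis))
print(is_orthogonal(unit_basis), is_orthonormal(unit_basis))
```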

3. [2] In lecture, we discussed decision trees (an intuitive classification model that splits on
different attributes, creating a tree-like structure). A data scientist is given a large data set and
uses part of the data to train a very large decision tree, with many branches and nodes, that
perfectly fits the training data. When they apply it to the validation data, overall accuracy is only 78%.
a. Why is validation performance so poor?
b. What can the data scientist do to improve the model?

4. [2] A data scientist has a data set with a lot of features and chooses to use some of these
features to train a model on training data and evaluate performance on testing data. They find
that both training and testing accuracy are poor. Which would you recommend: (i) removing a few
features or (ii) adding more features? Explain.

5. [2] A data set with 4 features has the following covariance matrix:

     A       B       C       D
A    0.5     0.018   0.11    0.048
B    0.018   0.01    0.0025  0.14
C    0.11    0.0025  0.023   0.0055
D    0.048   0.14    0.0055  6

You’re asked to remove a highly correlated feature from the data set. Which one would you
remove?
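
Covariances depend on each feature's scale, so one way to compare pairs is to rescale each entry into a correlation coefficient, r_ij = cov_ij / sqrt(cov_ii * cov_jj). The following is a minimal sketch of that conversion using the matrix above; the names `features`, `cov`, `correlation_matrix`, and `pairs` are illustrative and not part of the original exam.

```python
import math

features = ["A", "B", "C", "D"]

# Covariance matrix copied from the question.
cov = [
    [0.5,   0.018,  0.11,   0.048],
    [0.018, 0.01,   0.0025, 0.14],
    [0.11,  0.0025, 0.023,  0.0055],
    [0.048, 0.14,   0.0055, 6.0],
]

def correlation_matrix(cov):
    """Divide each covariance by the product of the two standard deviations."""
    n = len(cov)
    std = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (std[i] * std[j]) for j in range(n)] for i in range(n)]

corr = correlation_matrix(cov)

# Collect each unordered pair of distinct features once.
pairs = {
    (features[i], features[j]): corr[i][j]
    for i in range(len(features))
    for j in range(i + 1, len(features))
}

# Print pairs from strongest to weakest correlation.
# Because the table entries are rounded, a coefficient may fall
# slightly outside the theoretical range [-1, 1].
for pair, r in sorted(pairs.items(), key=lambda item: -abs(item[1])):
    print(pair, round(r, 3))
```

The pair with the largest absolute coefficient points to the two features that carry the most overlapping information; which of the two to drop is a separate judgment.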

6. [4] You have two binary classification models (P_1 and P_2) that use a series of features to
predict the probability of emails being spam. The computed probabilities are shown in the table
below, along with the actual labels, for six validation examples.

Email  Label  P_1  P_2
1      0      0.1  0.1
2      0      0.4  0.5
3      0      0.3  0.5
4      1      0.5  0.4
5      1      0.4  0.8
6      1      0.8  0.6

a. Calculate the AUC for each model.
b. Assuming you value F1-score, which model would you choose?
c. What are the precision, recall, accuracy, and confusion matrix for this best model?
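
The quantities in this question can be computed directly from the table. The sketch below is not part of the original exam and rests on two assumptions: AUC is computed with the pairwise (rank-based) formulation, in which a tie between a positive and a negative score counts as half a win, and the thresholded metrics use a cutoff of 0.5 with `score >= threshold` predicted as spam. The question does not state a threshold, so a different cutoff would give different precision, recall, and F1 values. The function names `pairwise_auc` and `threshold_metrics` are illustrative.

```python
labels = [0, 0, 0, 1, 1, 1]
p1 = [0.1, 0.4, 0.3, 0.5, 0.4, 0.8]
p2 = [0.1, 0.5, 0.5, 0.4, 0.8, 0.6]

def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def threshold_metrics(labels, scores, threshold=0.5):
    """Confusion-matrix counts and derived metrics at an assumed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / len(labels)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": precision, "recall": recall,
        "f1": f1, "accuracy": accuracy,
    }

for name, scores in [("P_1", p1), ("P_2", p2)]:
    print(name, "AUC:", pairwise_auc(labels, scores))
    print(name, "metrics at 0.5:", threshold_metrics(labels, scores))
```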
