Lab 6

Machine Learning Classification Theory Examination

Time allowed: 120 minutes. Total points: 100.

Part 1: Classification Fundamentals (30 points)


1. (10 points) Given the following confusion matrix for a facial expression recognition system:

                Actual
Predicted    Happy   Sad
Happy           85    15
Sad             20    80

a) Calculate accuracy, precision, recall, and F1-score

b) Explain why each metric would or would not be appropriate for this application

c) If this system were to be used in mental health screening, which metric should be prioritized and why?

2. (10 points) The sigmoid function σ(z) = 1/(1 + e⁻ᶻ) is fundamental to logistic regression.

a) For z = ln(3), calculate the exact value of σ(z)

b) Prove that σ(z) + σ(-z) = 1 for any value of z

c) Explain why property (b) makes this function suitable for binary classification

3. (10 points) For a KNN classifier with k=3, consider the following five training points:

Point   Feature x   Feature y   Class
P1          1           2         A
P2          2           1         A
P3          5           5         B
P4          4           4         B
P5          2           3         A

What class would be assigned to a test point at (3,3)? Show all distance calculations and explain your reasoning.

Question 1. Confusion Matrix Analysis (10 points)


Given matrix:

                Actual
Predicted    Happy   Sad
Happy           85    15
Sad             20    80

a) Metric calculations (taking Happy as the positive class): TP = 85, TN = 80, FP = 15, FN = 20

1. Accuracy:

Formula: (TP + TN)/(TP + TN + FP + FN)


Numbers: (85 + 80)/(85 + 80 + 15 + 20)
Calculation: 165/200 = 0.825
Result: 82.5%

2. Precision:

Formula: TP/(TP + FP)


Numbers: 85/(85 + 15)
Calculation: 85/100 = 0.85
Result: 85%

3. Recall:

Formula: TP/(TP + FN)


Numbers: 85/(85 + 20)
Calculation: 85/105 ≈ 0.81
Result: 81%

4. F1-score:

Formula: 2 × (Precision × Recall)/(Precision + Recall)


Numbers: 2 × (0.85 × 0.81)/(0.85 + 0.81)
Calculation: 2 × 0.6885/1.66 ≈ 0.83
Result: 83%
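
As a quick sanity check on the arithmetic above, here is a minimal Python snippet (an addition, not part of the original exam) that recomputes the four metrics from the confusion-matrix counts, with Happy as the positive class:

# Recompute the Question 1 metrics from the confusion-matrix counts.
TP, TN, FP, FN = 85, 80, 15, 20

accuracy = (TP + TN) / (TP + TN + FP + FN)              # 165/200 = 0.825
precision = TP / (TP + FP)                              # 85/100 = 0.85
recall = TP / (TP + FN)                                 # 85/105 ≈ 0.8095
f1 = 2 * precision * recall / (precision + recall)      # ≈ 0.8293

print(f"accuracy={accuracy:.4f}  precision={precision:.4f}  "
      f"recall={recall:.4f}  f1={f1:.4f}")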

b) Metric Appropriateness:

Accuracy: Appropriate because:

Classes are balanced (similar numbers of Happy and Sad)


Both types of errors have similar importance
Gives good overall performance measure

Precision: Important because:

Measures reliability of "Happy" predictions


Crucial if false positives are problematic
Helps avoid wrongly labeling sad expressions as happy

Recall: Critical because:

Shows ability to detect all happy expressions


Important for not missing positive emotions
Helps ensure comprehensive emotion detection

F1-score: Very appropriate because:

Balances precision and recall


Provides single metric for comparison
Accounts for both types of errors

c) For mental health screening:

Recall (computed with Sad treated as the positive class for screening) should be prioritized because:

Missing a sad expression (a false negative) could be dangerous
Better to flag potential concerns for further review
Early intervention is crucial in mental health
Cost of missing a sad expression > cost of falsely flagging a happy one

Question 2. Sigmoid Function Analysis (10 points)


a) For z = ln(3): σ(ln(3)) = 1/(1 + e^-ln(3)) = 1/(1 + 1/3) = 1/(3/3 + 1/3) = 1/(4/3) = 3/4 = 0.75

b) Proof that σ(z) + σ(-z) = 1:

σ(z) + σ(-z) = 1/(1 + e^-z) + 1/(1 + e^z)

= (e^z)/(e^z + 1) + (1)/(1 + e^z)   (multiplying the first fraction by e^z/e^z)

= (e^z + 1)/(e^z + 1)

= 1

c) This property makes sigmoid suitable for binary classification because:

Outputs are complementary probabilities


P(class 1) + P(class 0) = 1 always holds
Natural interpretation as probability
Smooth transition between classes
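
A short Python check (added here, not exam content) that evaluates σ(ln 3) and spot-checks the identity from part (b) at a few values of z:

import math

def sigmoid(z: float) -> float:
    # Logistic sigmoid: 1 / (1 + e^-z)
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(math.log(3)))              # 0.75, matching part (a)

# Numerically verify sigmoid(z) + sigmoid(-z) = 1 at a few sample points.
for z in (-5.0, -1.0, 0.0, 2.0, 10.0):
    assert abs(sigmoid(z) + sigmoid(-z) - 1.0) < 1e-12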

Question 3. KNN Classification (10 points)


For test point (3,3), calculate distances to all training points:

P1(1,2) class A: d₁ = √((3-1)² + (3-2)²) = √(4 + 1) = √5 ≈ 2.236

P2(2,1) class A: d₂ = √((3-2)² + (3-1)²) = √(1 + 4) = √5 ≈ 2.236

P3(5,5) class B: d₃ = √((3-5)² + (3-5)²) = √(4 + 4) = √8 ≈ 2.828


P4(4,4) class B: d₄ = √((3-4)² + (3-4)²) = √(1 + 1) = √2 ≈ 1.414

P5(2,3) class A: d₅ = √((3-2)² + (3-3)²) = √(1 + 0) = 1.000

Three nearest neighbors:

1. P5: d = 1.000 (Class A)
2. P4: d ≈ 1.414 (Class B)
3. P1: d ≈ 2.236 (Class A); P2 is tied at the same distance and is also Class A, so the tie does not affect the vote

Result: Class A (2 votes for A, 1 vote for B)
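
The same result can be reproduced with a small Python sketch (an illustrative addition, not part of the original answer):

import math
from collections import Counter

# Training points from the question: ((x, y), class label).
train = [((1, 2), "A"), ((2, 1), "A"), ((5, 5), "B"), ((4, 4), "B"), ((2, 3), "A")]
query = (3, 3)
k = 3

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Keep the k training points closest to the query, then take a majority vote.
neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
votes = Counter(label for _, label in neighbours)

print(neighbours)            # [((2, 3), 'A'), ((4, 4), 'B'), ((1, 2), 'A')]
print(votes.most_common(1))  # [('A', 2)] -> Class A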

Part 2: Advanced Concepts (40 points)


4. (15 points) Softmax Regression: Given the following scores for an image classification task:

Cat: 2.0
Dog: 1.0
Bird: -1.0

a) Calculate the softmax probabilities for each class

b) Show all steps of your calculation

c) Prove that your probabilities sum to 1

d) Explain why we need the exponential function in softmax

5. (10 points) For logistic regression with two features x₁ and x₂, weights w₁=2, w₂=-1, and bias b=1:

a) Write the complete equation for P(y=1|x)

b) Find the equation of the decision boundary where P(y=1|x) = 0.5

c) Calculate P(y=1|x) for point (1,1)

Question 4. Softmax Regression (15 points)


Given scores: Cat: 2.0 Dog: 1.0 Bird: -1.0

a) & b) Detailed calculation steps:

1. Calculate exponentials:

Cat: e^2.0 = 7.389

Dog: e^1.0 = 2.718

Bird: e^(-1.0) = 0.368

2. Calculate sum (denominator):

Sum = 7.389 + 2.718 + 0.368 = 10.475

3. Calculate probabilities:

P(Cat) = 7.389/10.475 = 0.705 = 70.5%

P(Dog) = 2.718/10.475 = 0.259 = 25.9%

P(Bird) = 0.368/10.475 = 0.035 = 3.5%

c) Proof probabilities sum to 1:

0.705 + 0.259 + 0.035 = 0.999 ≈ 1.0

(Small difference due to rounding)

d) Exponential function necessity:

1. Non-negativity: Converts all scores to positive numbers

2. Relative scale preservation: Maintains ordering of original scores


3. Amplification: Enhances differences between scores

4. Mathematical properties: Derivative of exp(x) is exp(x), simplifying optimization
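
A compact Python sketch (added, not exam content) reproducing the softmax computation; it also applies the usual max-subtraction trick, which does not change the result but avoids overflow for large scores:

import math

scores = {"Cat": 2.0, "Dog": 1.0, "Bird": -1.0}

# Subtract the maximum score before exponentiating for numerical stability.
m = max(scores.values())
exps = {k: math.exp(v - m) for k, v in scores.items()}
total = sum(exps.values())
probs = {k: e / total for k, e in exps.items()}

print(probs)                # ~{'Cat': 0.705, 'Dog': 0.259, 'Bird': 0.035}
print(sum(probs.values()))  # 1.0 up to floating-point rounding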

Question 5. Logistic Regression (10 points)


Given: w₁=2, w₂=-1, b=1

a) Complete probability equation: P(y=1|x) = 1/(1 + e^-(2x₁ - x₂ + 1))

b) Decision boundary where P(y=1|x) = 0.5:

1. Start with P(y=1|x) = 0.5: 0.5 = 1/(1 + e^-(2x₁ - x₂ + 1))

2. Multiply both sides by denominator: 0.5(1 + e^-(2x₁ - x₂ + 1)) = 1

3. Simplify:

1 + e^-(2x₁ - x₂ + 1) = 2

e^-(2x₁ - x₂ + 1) = 1

-(2x₁ - x₂ + 1) = 0

2x₁ - x₂ + 1 = 0

4. Final equation: x₂ = 2x₁ + 1

c) Calculate P(y=1|x) for (1,1):

1. Calculate z = 2x₁ - x₂ + 1: z = 2(1) - 1 + 1 = 2

2. Apply sigmoid: P(y=1|x) = 1/(1 + e^-2) = 1/(1 + 0.135) = 0.881 = 88.1%
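
The decision boundary and the probability at (1,1) can be checked with a short Python sketch (an addition, using only the weights given in the question):

import math

w1, w2, b = 2.0, -1.0, 1.0

def p_y1(x1: float, x2: float) -> float:
    # P(y=1|x) = sigmoid(w1*x1 + w2*x2 + b)
    z = w1 * x1 + w2 * x2 + b
    return 1.0 / (1.0 + math.exp(-z))

print(p_y1(1, 1))  # ~0.8808, matching part (c)

# Points on the boundary x2 = 2*x1 + 1 should all give probability 0.5.
for x1 in (-1.0, 0.0, 2.5):
    assert abs(p_y1(x1, 2 * x1 + 1) - 0.5) < 1e-12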

Part 3: Theoretical Analysis (30 points)


7. (15 points) Compare and contrast:

a) The mathematical basis for why KNN can handle non-linear decision boundaries while logistic regression cannot

b) How the curse of dimensionality affects each algorithm

c) The computational complexity during training and prediction phases

8. (15 points) Consider a face recognition system that needs to classify expressions into: Happy, Sad, Angry, Surprised.

a) Write out the full One-vs-All formulation for this problem

b) Calculate the number of binary classifiers needed for both One-vs-All and One-vs-One approaches

c) Explain mathematically why softmax regression might be more appropriate than multiple logistic regressions

Question 7. Algorithm Comparison (15 points)


a) KNN vs. Logistic Regression Decision Boundaries:

KNN decision boundaries:

Forms boundaries based on local neighborhood majority


Mathematical formulation: D(x) = argmax_y Σ_{i ∈ N_k(x)} I(yᵢ = y) · w(||x - xᵢ||), where:
N_k(x) is the set of the k training points nearest to x
I(yᵢ = y) is the indicator function
w(||x - xᵢ||) is an optional distance-based weight (w ≡ 1 for a plain majority vote)
Can create complex, non-linear boundaries because:
1. Decision at each point depends only on nearby training points
2. No global functional form constraint
3. Boundary shape adapts to local data distribution

Logistic regression boundaries:


Creates single linear boundary
Mathematical form: z = w^T x + b, with P(y=1|x) = 1/(1 + e^-z)
Can only create linear boundaries because:
1. Decision function is linear combination of features
2. Sigmoid function is monotonic
3. Decision boundary occurs where w^T x + b = 0

b) Curse of Dimensionality Effects:

KNN affected severely:

1. Volume grows exponentially with dimensions


In d dimensions, unit hypersphere volume ∝ π^(d/2)/Γ(d/2 + 1)
2. Distance metrics become less meaningful
Ratio of distances (dmax/dmin) → 1 as d → ∞
3. Required sample size grows exponentially
To maintain same density, N ∝ 2^d
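
The distance-concentration claim above (d_max/d_min → 1) can be illustrated with a quick simulation; this is an added sketch and assumes NumPy is available:

import numpy as np

rng = np.random.default_rng(0)
n = 1000

for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(n, d))   # n random points in the unit hypercube
    q = rng.uniform(size=d)        # one random query point
    dists = np.linalg.norm(X - q, axis=1)
    print(f"d={d:4d}  d_max/d_min = {dists.max() / dists.min():.2f}")
# Typical output: the ratio falls from double digits at d=2 toward ~1 as d grows.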

Logistic regression affected less:

1. Parameter count grows linearly


Number of parameters = d + 1
2. Model complexity independent of space volume
3. Still effective in high dimensions if:
Linear separation exists
Features are informative

c) Computational Complexity Analysis:

KNN: Training phase:

Time complexity: O(1)


Space complexity: O(nd)
Just stores training data
No actual training performed

Prediction phase:

Time complexity: O(nd + n log n) per query


nd for distance calculations
n log n for sorting distances
Space complexity: O(n)
Must compute distances to all training points

Logistic Regression: Training phase:

Time complexity: O(ndm)


n = samples
d = dimensions
m = iterations
Space complexity: O(d)
Gradient descent iterations

Prediction phase:

Time complexity: O(d)


Space complexity: O(d)
Single matrix multiplication
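
A rough sketch (an addition, assuming NumPy) contrasting the per-query prediction work of the two models: brute-force KNN must touch every stored training row, while logistic regression needs only one dot product with its d weights:

import numpy as np

rng = np.random.default_rng(1)
n, d = 10_000, 50
X_train = rng.normal(size=(n, d))     # KNN stores all of this: O(nd) space
y_train = rng.integers(0, 2, size=n)
w, b = rng.normal(size=d), 0.0        # logistic regression stores only O(d)

x_query = rng.normal(size=d)

# KNN prediction: O(nd) distance computations per query, then pick the k smallest.
dists = np.linalg.norm(X_train - x_query, axis=1)
nearest = np.argpartition(dists, 3)[:3]
knn_pred = int(np.bincount(y_train[nearest]).argmax())

# Logistic regression prediction: a single O(d) dot product.
lr_prob = 1.0 / (1.0 + np.exp(-(x_query @ w + b)))

print(knn_pred, float(lr_prob))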

Question 8. Multi-class Classification (15 points)


a) One-vs-All formulation for emotions:

For each emotion k ∈ {Happy, Sad, Angry, Surprised}:

Train binary classifier fₖ(x)


fₖ(x) = σ(wₖᵀx + bₖ)

Complete formulation:

1. Happy vs. rest: P(Happy|x) = σ(w₁ᵀx + b₁)


2. Sad vs. rest: P(Sad|x) = σ(w₂ᵀx + b₂)
3. Angry vs. rest: P(Angry|x) = σ(w₃ᵀx + b₃)
4. Surprised vs. rest: P(Surprised|x) = σ(w₄ᵀx + b₄)

Final prediction: ŷ = argmax_k fₖ(x)
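
A hedged sketch of this One-vs-All scheme (an addition; it assumes scikit-learn is available and uses synthetic stand-in data rather than real expression features):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 4-class data standing in for Happy / Sad / Angry / Surprised features.
X, y = make_classification(n_samples=400, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)

# One binary logistic classifier f_k(x) per class; prediction is argmax_k f_k(x).
ova = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ova.estimators_))   # 4 binary classifiers, one per emotion
print(ova.predict(X[:5]))     # class indices chosen by argmax over the four scores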

b) Number of Required Classifiers:

One-vs-All:

Number = K (number of classes)


Here: 4 classifiers

One-vs-One:

Number = K(K-1)/2
Here: 4(4-1)/2 = 6 classifiers
Pairs: (Happy-Sad), (Happy-Angry), (Happy-Surprised), (Sad-Angry), (Sad-Surprised), (Angry-Surprised)

c) Softmax Regression Advantages:

Mathematical explanation:

1. Joint probability modeling: softmax gives P(y=k|x) = exp(wₖᵀx)/Σⱼ exp(wⱼᵀx)

Naturally ensures Σₖ P(y=k|x) = 1 (see the numeric illustration after this list)


Models class probabilities simultaneously
2. Parameter sharing:

Features used by all classes simultaneously


Better feature utilization
More efficient learning
3. Training stability:

Single optimization objective


Avoids conflicting binary decisions
More numerically stable
4. Direct comparison:

Scores directly comparable


No need for calibration
Natural ranking of probabilities
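
A small numeric illustration (added, with made-up scores) of point 1: per-class sigmoids from independently trained binary classifiers need not sum to 1, while softmax sums to 1 by construction:

import math

scores = [2.0, 1.0, -1.0, 0.5]   # hypothetical scores for the four emotions

sigmoids = [1.0 / (1.0 + math.exp(-s)) for s in scores]   # One-vs-All style outputs
exps = [math.exp(s) for s in scores]
softmax = [e / sum(exps) for e in exps]

print(sum(sigmoids))  # ~2.50 -> not a probability distribution without calibration
print(sum(softmax))   # 1.0  -> a proper distribution over the four classes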
