Assignment_III

This document outlines the assignment details for the Foundations of Machine Learning course at IIT Madras, due on October 14, 2023. It includes instructions for submission, collaboration, and referencing, followed by a series of problems related to linear regression, ridge regression, data analysis, clustering techniques, and model evaluation. The assignment emphasizes practical applications in healthcare and data science, requiring students to implement various machine learning techniques and analyze their results.

Uploaded by

Anik Bhowmick ae20b102


Indian Institute of Technology Madras

ID5055 Foundations of Machine Learning

Assignment III
Due date: 14th October 2023

Instructions
1. Assignments must be submitted by the due date. Late submissions will not be accepted. If you
cannot submit the assignment for any reason, please contact the instructor by email.

2. All assignments must be the student’s own work. Students are encouraged to collaborate
or consult friends. In the case of collaborative work, please write every student’s name on the
submitted solution.
3. If you find a solution in a book, an article, or on a website, please cite the reference in your
solutions.

Problems
1. Consider the fitted values that result from performing linear regression without an intercept. In
this setting, the ith fitted value takes the form

$$\hat{y}_i = x_i \hat{\beta} \tag{1}$$

where

$$\hat{\beta} = \left( \sum_{i=1}^{n} x_i y_i \right) \Big/ \left( \sum_{i=1}^{n} x_i^2 \right) \tag{2}$$

Show that we can write

$$\hat{y}_i = \sum_{i'=1}^{n} a_{i'} y_{i'}. \tag{3}$$

What is the value of $a_{i'}$?
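
One way to sanity-check a candidate answer to Problem 1 is numerically. The sketch below uses random data and a candidate weight matrix `A` (a name introduced here for illustration, not part of the assignment), and checks that the fitted values from equations (1) and (2) are reproduced by a fixed linear combination of the responses. It is a numerical check only and does not replace the algebraic argument the problem asks for.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)  # single predictor, no intercept
y = rng.normal(size=8)  # responses

# Equations (1) and (2): slope estimate and fitted values.
beta_hat = (x @ y) / (x @ x)
y_hat = x * beta_hat

# Candidate weights (an assumption to be verified, not given in the text):
# A[i, j] = x_i * x_j / sum_k x_k^2, so that y_hat[i] = sum_j A[i, j] * y[j].
A = np.outer(x, x) / (x @ x)

print(np.allclose(A @ y, y_hat))
```

Note that row `i` of `A` depends only on the predictors, never on `y`, which is what makes the fitted values linear in the responses.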

2. The ridge regression objective function is defined as

$$J(\beta) = \frac{1}{2} \sum_{i=1}^{m} (\beta^T x_i - y_i)^2 + \frac{\lambda}{2} \|\beta\|^2 = \frac{1}{2} \|X\beta - y\|_2^2 + \frac{\lambda}{2} \|\beta\|^2.$$

Find the closed-form expression for the value of $\beta$ that minimizes the ridge regression objective
function.
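
A candidate closed form for Problem 2 can be checked numerically before writing up the derivation. The sketch below assumes the standard ridge solution obtained by setting the gradient of the objective to zero. The function names `ridge_closed_form` and `ridge_gradient` are illustrative, and the data are random.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Candidate minimizer: solve (X^T X + lam * I) beta = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ridge_gradient(X, y, beta, lam):
    """Gradient of 0.5 * ||X beta - y||^2 + 0.5 * lam * ||beta||^2."""
    return X.T @ (X @ beta - y) + lam * beta

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
lam = 0.7

beta = ridge_closed_form(X, y, lam)
# The gradient should vanish (up to rounding) at the minimizer.
print(np.abs(ridge_gradient(X, y, beta, lam)).max())
```

Using `np.linalg.solve` rather than forming an explicit inverse is a common numerical choice; for $\lambda > 0$ the system matrix is positive definite, so the solution is unique.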

3. You are given a design matrix $X$ (whose $i$th row is the sample point $x_i^T$) and an $n$-vector of labels
$y \triangleq [y_1 \ldots y_n]^T$. For simplicity, assume $X^T X = nI$. Do not add a fictitious dimension/bias term.
For input 0, the output is always 0. Let $x_{*i}$ denote the $i$th column of $X$.

(a) Show that the cost function for $L_1$-regularized least squares, $J_1(\beta) \triangleq \|X\beta - y\|^2 + \lambda\|\beta\|_1$
(where $\lambda > 0$), can be rewritten as $J_1(\beta) = \|y\|^2 + \sum_{i=1}^{d} f(x_{*i}, \beta_i)$, where $f(\cdot, \cdot)$ is a suitable
function whose first argument is a vector and second argument is a scalar.
(b) Using your solution to part 3a, derive the necessary conditions for the $i$th
component of the optimizer $\beta^*$ of $J_1(\cdot)$ to satisfy each of these three properties: $\beta_i^* > 0$, $\beta_i^* = 0$,
and $\beta_i^* < 0$.
(c) For the optimizer $\beta^\#$ of the $L_2$-regularized least squares cost function $J_2(\beta) \triangleq \|X\beta - y\|^2 + \lambda\|\beta\|^2$,
where $\lambda > 0$, derive a necessary and sufficient condition for $\beta_i^\# = 0$, where $\beta_i^\#$ is the
$i$th component of $\beta^\#$.
(d) A vector is called sparse if most of its components are 0. From your solutions to parts 3b and
3c, which of $\beta^*$ and $\beta^\#$ is more likely to be sparse? Why?
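
The behaviour asked about in Problem 3 can be explored numerically. The sketch below builds a design matrix with orthogonal columns so that $X^T X = nI$ holds. It then evaluates candidate per-coordinate minimizers: soft-thresholding for the $L_1$ cost and uniform shrinkage for the $L_2$ cost. These candidate formulas and all function names are assumptions introduced for illustration. The code only checks that random perturbations do not lower either cost, and is not a proof.

```python
import numpy as np

def lasso_orthogonal(X, y, lam):
    """Candidate minimizer of ||X b - y||^2 + lam * ||b||_1 when X^T X = n I."""
    n = X.shape[0]
    c = X.T @ y  # c[i] = (i-th column of X) . y
    return np.sign(c) * np.maximum(np.abs(c) - lam / 2.0, 0.0) / n

def ridge_orthogonal(X, y, lam):
    """Candidate minimizer of ||X b - y||^2 + lam * ||b||^2 when X^T X = n I."""
    n = X.shape[0]
    return (X.T @ y) / (n + lam)

def j1(X, y, b, lam):
    return np.sum((X @ b - y) ** 2) + lam * np.sum(np.abs(b))

def j2(X, y, b, lam):
    return np.sum((X @ b - y) ** 2) + lam * np.sum(b ** 2)

rng = np.random.default_rng(0)
n, d, lam = 40, 5, 10.0
Q, _ = np.linalg.qr(rng.normal(size=(n, d)))  # orthonormal columns
X = np.sqrt(n) * Q                            # so that X^T X = n I
y = rng.normal(size=n)

b1 = lasso_orthogonal(X, y, lam)
b2 = ridge_orthogonal(X, y, lam)
print("L1 solution:", b1)
print("L2 solution:", b2)
```

Printing both solutions for a few values of `lam` shows which coordinates are set exactly to zero, which is the observation part (d) asks you to explain.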

4. You are the visionary owner of “HealthPlus Insurance”, a prominent health insurance company
dedicated to improving the healthcare financing landscape. Your company’s long-term success
hinges on accurate predictions of medical expenses, allowing you to set competitive premiums while
ensuring profitability. Medical expenses are influenced by various factors, including age, smoking
habits, and obesity. As the owner, you are personally assuming the role of Chief Data Scientist,
responsible for analysing the provided dataset (“insurance.csv”) and developing a model to estimate
average medical care expenses for different population segments.

(a) Calculate and interpret the correlation matrix to understand relationships among features.
(b) Create a scatterplot matrix to visualize relationships among features. Explain the insights
you can gain from these visualizations.
(c) Perform data preprocessing and cleaning, which involves addressing missing values and handling
categorical features, followed by a train-test split of the data.
(d) Implement and train the linear regression model, also applying Ridge and Lasso regression
techniques, using appropriate Python libraries.
(e) Evaluate the model’s performance by calculating relevant metrics such as Mean Absolute
Error (MAE), Mean Squared Error (MSE), and R-squared. Additionally, interpret the model’s
coefficients and discuss how various features impact predictions of medical expenses.
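
Parts (c) to (e) of Problem 4 could be organised roughly as in the sketch below. It runs on a small synthetic table, because the contents of “insurance.csv” are not shown here. The column names `age`, `bmi`, `smoker`, and `charges`, the regularization strengths, and the helper name `evaluate_models` are all assumptions for illustration and should be replaced by the real columns and tuned values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

def evaluate_models(df, target="charges", seed=0):
    """Drop missing rows, one-hot encode categoricals, split, fit, and score."""
    df = df.dropna()
    X = pd.get_dummies(df.drop(columns=[target]), drop_first=True).astype(float)
    y = df[target]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    models = {
        "ols": LinearRegression(),
        "ridge": Ridge(alpha=1.0),  # alpha values are placeholders
        "lasso": Lasso(alpha=1.0),
    }
    scores = {}
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        scores[name] = {
            "mae": mean_absolute_error(y_te, pred),
            "mse": mean_squared_error(y_te, pred),
            "r2": r2_score(y_te, pred),
        }
    return scores

# Synthetic stand-in for the real dataset (hypothetical columns and effects).
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "bmi": rng.normal(30, 5, n),
    "smoker": rng.choice(["yes", "no"], n),
})
toy["charges"] = (250 * toy["age"] + 300 * toy["bmi"]
                  + 20000 * (toy["smoker"] == "yes") + rng.normal(0, 1000, n))

scores = evaluate_models(toy)
print(pd.DataFrame(scores).T)
```

Dropping rows with missing values is only one option; imputation may be more appropriate for the real data. Because Ridge and Lasso penalties depend on feature scale, standardizing the features before fitting is worth considering when interpreting coefficients.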

5. In Multiple Linear Regression, the normal equation solution was obtained by minimizing the sum of
squared errors. Show that the Maximum Likelihood method is in essence the same as minimizing
the sum of squared errors, and thus show that the Maximum Likelihood estimate for the matrix of
the coefficients ($\theta_{ML}$) is the same as that obtained by solving the normal equation. (Hint: Use the
assumption that the additive noise term is Gaussian.)
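
As a starting point for Problem 5, the key step can be sketched as follows. This outline assumes a single response per sample, the model $y_i = \theta^T x_i + \varepsilon_i$, and independent noise terms $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$; the symbols $\varepsilon_i$ and $\sigma^2$ are notation introduced here, not taken from the assignment.

```latex
\begin{aligned}
\log L(\theta)
  &= \sum_{i=1}^{n} \log \left[ \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\left( -\frac{(y_i - \theta^T x_i)^2}{2\sigma^2} \right) \right] \\
  &= -\frac{n}{2} \log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \theta^T x_i)^2 .
\end{aligned}
```

The first term does not depend on $\theta$ and the factor $1/(2\sigma^2)$ is positive, so maximizing $\log L(\theta)$ over $\theta$ is the same as minimizing $\sum_{i=1}^{n} (y_i - \theta^T x_i)^2 = \|y - X\theta\|^2$. Setting the gradient of this sum to zero gives $X^T X \theta = X^T y$, the normal equation.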
6. Using K-Means clustering for image compression: Image compression enables us to store enormous
amounts of data using less disk space while retaining the significant aspects of the image that
are needed for analysis. We can apply the K-Means algorithm to an image, where the parameter
K is the number of colours in the palette of the final image.

(a) Use the K-Means algorithm to apply compression on a test image. Visualize the results
obtained by using powers of 2 less than 2048 as the value for K.
(b) Decide on an appropriate value for K. (Hint: Use an elbow plot to justify your choice.)
(c) Is the compression obtained lossy or lossless? What is the effect of varying the value of K in
terms of overfitting or underfitting the data?

A color image is represented by a matrix of dimensions (w, h, c), where w, h, and c stand for width,
height, and number of color channels, which in our test image is three (for example RGB: Red,
Green, and Blue). These three colors can be treated as three features, and each pixel can be treated
as a separate data point on which we apply K-Means clustering.
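
The pixel-as-data-point idea described above can be sketched as follows. A random array stands in for the test image, which is not provided here, and the helper name `compress_image` is illustrative. Most Python image loaders return arrays shaped (height, width, channels), so the sketch assumes that layout.

```python
import numpy as np
from sklearn.cluster import KMeans

def compress_image(image, k, seed=0):
    """Replace every pixel by the centre of its K-Means cluster (a k-colour palette)."""
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)  # one row per pixel, one column per channel
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pixels)
    compressed = km.cluster_centers_[km.labels_].reshape(h, w, c)
    return compressed, km.inertia_

# Random stand-in image; replace with the actual test image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)

inertias = {}
for k in (2, 4, 8, 16):
    compressed, inertias[k] = compress_image(image, k)

print(inertias)  # within-cluster sum of squares per k, usable for an elbow plot
```

Plotting `inertias` against `k` gives the elbow plot mentioned in part (b). For display, the compressed array can be rounded and cast back to `uint8`.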

7. Visualize the difference in the clusters obtained by applying Hierarchical Clustering to the Online
Retail data using different linkage methods. Plot the dendrograms for the three linkage methods
(single, complete, and average). Use 3 as the number of clusters for producing the visualizations.
Provide a brief explanation of the use cases of each kind of linkage.
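
A possible workflow for Problem 7 is sketched below using SciPy. Three synthetic blobs stand in for the Online Retail features, which would first need to be engineered and scaled from the real data. The dendrogram data are computed without drawing, so no plotting library is required to run the sketch.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic stand-in: three well-separated groups of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(20, 2)) for c in centers])

cluster_labels = {}
dendrograms = {}
for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)                                    # merge history
    cluster_labels[method] = fcluster(Z, t=3, criterion="maxclust")  # cut into at most 3 clusters
    dendrograms[method] = dendrogram(Z, no_plot=True)                # layout data only

for method, labels in cluster_labels.items():
    print(method, np.bincount(labels)[1:])  # cluster sizes (labels start at 1)
```

To draw the dendrograms, call `dendrogram(Z)` inside a matplotlib figure instead of passing `no_plot=True`. On real data the three linkage methods usually disagree far more than on this clean toy example.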
8. Imagine that you are running “MedGenius Solutions”, a startup providing innovative healthcare
solutions, and you aim to develop a Proof of Concept (PoC) using a toy dataset. Your goal is
to secure a project and budget from leading clients in the healthcare industry. To achieve this,
you need to demonstrate the capabilities of spectral clustering in gene expression analysis. Gene
expression data, which records the activity levels of genes across different samples, is provided in
tabular form. Your tasks are to

(a) Implement spectral clustering using Python and scikit-learn to identify clusters of co-expressed
genes within the dataset.
(b) Create visualizations for the true clusters based on the information in the 3rd column of the
dataset.
(c) Evaluate and provide insights on the outcomes, including a comprehensive report on performance
metrics such as Adjusted Rand Index, Adjusted Mutual Information, and Silhouette
Score.
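
The steps in Problem 8 might be wired together as in the sketch below. The gene expression table is not shown here, so a two-moons toy set is used instead, arranged to match the description with two feature columns followed by the true cluster in the third column. The affinity type and neighbour count are assumptions that would need tuning on the real data.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import (adjusted_mutual_info_score, adjusted_rand_score,
                             silhouette_score)

# Toy stand-in table: columns 0 and 1 are features, column 2 is the true cluster.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)
data = np.column_stack([X, y_true])

features = data[:, :2]
truth = data[:, 2].astype(int)

model = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                           n_neighbors=10, random_state=0)
labels = model.fit_predict(features)

metrics = {
    "ari": adjusted_rand_score(truth, labels),
    "ami": adjusted_mutual_info_score(truth, labels),
    "silhouette": silhouette_score(features, labels),
}
print(metrics)
```

The Adjusted Rand Index and Adjusted Mutual Information compare predicted labels with the true ones, while the Silhouette Score uses only the features and the predicted labels. A good clustering of non-convex shapes can therefore still have a modest silhouette value.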

9. Imagine you are conducting a data analysis project and want to use DBSCAN (Density-Based
Spatial Clustering of Applications with Noise) to cluster data points. You are tasked with implementing
DBSCAN, a density-based clustering algorithm, on a toy dataset. The aim is to
demonstrate the clustering capabilities of DBSCAN and assess its sensitivity to parameter choices.

(a) Generate a toy dataset with varying densities and shapes. Set the eps (Epsilon) and
min_samples (MinPts) parameters, and then fit DBSCAN to the generated dataset.
Hint: You can use functions like make_blobs or make_moons from scikit-learn to create a
synthetic dataset.
(b) Experiment with each combination of eps and min_samples (consider at least 3 values of each
parameter). Report the values of the performance metrics to evaluate DBSCAN’s
sensitivity to parameter choices.
(c) Visualize the clustering results using a scatter plot, where each cluster is assigned a different
color. Additionally, use a different marker shape for noise points.
(d) Calculate and report the following performance metrics: Silhouette Score, Adjusted Rand
Index, Adjusted Mutual Information.
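
The parameter sweep in Problem 9 could look like the sketch below. The dataset, the three values chosen for each parameter, and the handling of degenerate cases are all illustrative choices rather than requirements of the assignment.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.metrics import (adjusted_mutual_info_score, adjusted_rand_score,
                             silhouette_score)

X, y_true = make_moons(n_samples=300, noise=0.07, random_state=0)

results = []
for eps in (0.1, 0.2, 0.3):
    for min_samples in (3, 5, 10):
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        n_clusters = len(set(labels) - {-1})  # label -1 marks noise points
        # The silhouette needs at least two distinct labels (noise counts as one here).
        sil = silhouette_score(X, labels) if len(set(labels)) > 1 else float("nan")
        results.append({
            "eps": eps,
            "min_samples": min_samples,
            "n_clusters": n_clusters,
            "n_noise": int(np.sum(labels == -1)),
            "silhouette": sil,
            "ari": adjusted_rand_score(y_true, labels),
            "ami": adjusted_mutual_info_score(y_true, labels),
        })

for row in results:
    print(row)
```

For the scatter plot in part (c), the points with `labels == -1` can be drawn with a separate marker (for example `marker="x"`), with the remaining points coloured by label. Treating noise as its own group in the silhouette computation is a simplification; another option is to exclude noise points before scoring.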
