Machine Learning Assignment

This document outlines an assignment for an introduction to machine learning course. It involves experiments with linear classification and regression methods. Students must generate synthetic datasets, train linear classifiers and k-NN on the data, perform linear and ridge regression on a real-world dataset, and submit code and a report detailing their results. They are instructed not to collaborate with others on the individual assignment.

Uploaded by

Nikhilesh Rajaraman


IITM-CS5011 : Introduction to Machine Learning Assignment #2

Given on: Aug 19, 10pm Due on : Sep 02, 11:55pm

The goal of this assignment is to experiment with linear methods for classification and regression. This is an individual assignment. Collaborations and discussions with others are strictly prohibited. You may use Matlab, Octave, Python, R or Java for your implementation. If you are using any other language, please contact us before you proceed. You have to turn in well-documented code along with a detailed report of the results of the experiments electronically on Moodle. Typeset your report in LaTeX. Your report should contain detailed answers to all of the questions asked below. Look at the end of the assignment for submission instructions.

1. You will use a synthetic data set for the classification task. Generate two classes with 10 features each. Each class is given by a multivariate Gaussian distribution, with both classes sharing the same covariance matrix. Ensure that the covariance matrix is not spherical, i.e., that it is not a diagonal matrix with all the diagonal entries being the same. Generate 1000 examples for each class. Choose the centroids for the classes close enough so that there is some overlap in the classes. Specify clearly the details of the parameters used for the data generation. Randomly pick 40% of each class (i.e., 400 data points per class) as a test set, and train the classifiers on the remaining 60% of the data. When you report performance results, they should be on the left-out 40%. Call this dataset DS1.

2. For DS1, learn a linear classifier by using regression on indicator variables. Report the best fit achieved by the classifier, along with the coefficients learnt.

3. For DS1, use k-NN to learn a classifier. Repeat the experiment for different values of k and report the performance for each value. Technically this is not a linear classifier, but I want you to appreciate how powerful linear classifiers can be. Do you do better than regression on indicator variables or worse? Are there particular values of k which perform better?

4. Now instead of having a single multivariate Gaussian distribution per class, each class is going to be generated by a mixture of 3 Gaussians. For each class, define 3 Gaussians, with the first Gaussian of the first class sharing its covariance matrix with the first Gaussian of the second class, and so on. For both classes, fix the mixture probabilities as (0.1, 0.42, 0.48), i.e., the sample has arisen from the first Gaussian with probability 0.1, the second with probability 0.42, and so on. Now sample from this distribution and generate the dataset as in question 1. Call this dataset DS2. Now perform the experiments in questions 2 and 3 again, but using DS2. What do you observe? Can you comment on the performance of both classifiers when you use DS1 and DS2?

5. For the regression tasks, you will use the Communities and Crime Data Set from the UCI repository (http://archive.ics.uci.edu/ml/datasets/Communities+and+Crime). This is a real-life data set and as such would not have the nice properties that we expect. Your first job is to make this dataset usable by filling in all the missing values. Use the sample mean of the missing attribute. Is this a good choice? What else might you use? If you have a better method, describe it, and you may use it for filling in the missing data. Turn in the complete data set.

6. Fit the above data using linear regression. Report the residual error of the best fit achieved, along with the coefficients learnt.

7. Use ridge regression on the above data. Repeat the experiment for different values of λ, the regularization parameter. Report the residual error for each value, along with the coefficients learnt. Which value of λ gives the best fit? Is it possible to use the information you obtained during this experiment for feature selection? If so, what is the best fit you achieve with a reduced set of features?
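The pipeline described in questions 1 to 3 can be sketched roughly as follows. This is only an illustrative sketch using NumPy, not a reference solution: the way the shared covariance is built (A Aᵀ + nI), the centroid shift of 2 per feature, the random seed, and the k values tried are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the split is reproducible
n_features, n_per_class = 10, 1000

# Shared covariance (assumed construction): A A^T + n I is symmetric positive
# definite, and its random off-diagonal entries make it non-spherical.
a = rng.standard_normal((n_features, n_features))
cov = a @ a.T + n_features * np.eye(n_features)

# Assumed centroids: class 1 is class 0 shifted by 2 along every feature.
means = [np.zeros(n_features), np.full(n_features, 2.0)]

def split(x, test_fraction=0.4):
    """Shuffle the rows of one class and return (train, test) parts."""
    x = x[rng.permutation(len(x))]
    n_test = int(round(test_fraction * len(x)))
    return x[n_test:], x[:n_test]

train_parts, test_parts = [], []
for mean in means:
    tr, te = split(rng.multivariate_normal(mean, cov, size=n_per_class))
    train_parts.append(tr)
    test_parts.append(te)

x_train, x_test = np.vstack(train_parts), np.vstack(test_parts)
y_train = np.repeat([0, 1], [len(p) for p in train_parts])
y_test = np.repeat([0, 1], [len(p) for p in test_parts])

def add_intercept(x):
    """Prepend a column of ones so the fit includes a bias term."""
    return np.hstack([np.ones((len(x), 1)), x])

# Regression on indicator variables: one 0/1 target column per class,
# fitted by least squares; predict the class with the larger fitted value.
targets = np.eye(2)[y_train]
coef, *_ = np.linalg.lstsq(add_intercept(x_train), targets, rcond=None)
pred_linear = np.argmax(add_intercept(x_test) @ coef, axis=1)
acc_linear = float(np.mean(pred_linear == y_test))

def knn_predict(x_ref, y_ref, x_query, k):
    """Majority vote among the k nearest reference points (Euclidean distance)."""
    d2 = ((x_query ** 2).sum(axis=1)[:, None]
          - 2.0 * x_query @ x_ref.T
          + (x_ref ** 2).sum(axis=1)[None, :])
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return (y_ref[nearest].mean(axis=1) > 0.5).astype(int)

# Odd k values avoid ties in the two-class vote.
acc_knn = {k: float(np.mean(knn_predict(x_train, y_train, x_test, k) == y_test))
           for k in (1, 5, 15, 45)}

print(f"indicator regression accuracy: {acc_linear:.3f}")
for k, acc in acc_knn.items():
    print(f"k-NN (k={k}) accuracy: {acc:.3f}")
```

Accuracy on the held-out 40% is used here as the performance measure; other measures (precision, recall, F-measure) could be reported from the same predictions.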
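For question 4, sampling from a three-component mixture amounts to first drawing a component index with the given probabilities and then drawing from that component's Gaussian. The sketch below shows one way to do this with NumPy; the function name `sample_gaussian_mixture`, the randomly drawn component means, and the covariance construction are hypothetical choices for illustration only.

```python
import numpy as np

def sample_gaussian_mixture(means, covs, weights, n_samples, rng):
    """Draw n_samples points from a Gaussian mixture.

    Returns the samples and the index of the component each one came from.
    """
    weights = np.asarray(weights, dtype=float)
    # Pick a component for every sample according to the mixture probabilities.
    components = rng.choice(len(weights), size=n_samples, p=weights)
    samples = np.empty((n_samples, means.shape[1]))
    for j in range(len(weights)):
        mask = components == j
        samples[mask] = rng.multivariate_normal(means[j], covs[j],
                                                size=int(mask.sum()))
    return samples, components

rng = np.random.default_rng(1)
n_features, n_per_class = 10, 1000
weights = (0.1, 0.42, 0.48)  # mixture probabilities from the assignment

def random_covariance():
    """Assumed construction of a non-spherical positive-definite covariance."""
    a = rng.standard_normal((n_features, n_features))
    return a @ a.T + n_features * np.eye(n_features)

# covs[j] is shared by the j-th Gaussian of both classes.
covs = [random_covariance() for _ in weights]
# Assumed component means, drawn at random around 0 (class 0) and 1 (class 1).
means_by_class = [rng.normal(0.0, 2.0, size=(3, n_features)),
                  rng.normal(1.0, 2.0, size=(3, n_features))]

class_samples = []
for means in means_by_class:
    samples, components = sample_gaussian_mixture(means, covs, weights,
                                                  n_per_class, rng)
    class_samples.append(samples)

print([s.shape for s in class_samples])
```

The resulting per-class arrays can then be split 60/40 and fed to the same classifiers used for DS1.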
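The regression steps in questions 5 to 7 (mean imputation, then ridge regression over several values of λ) can be illustrated on a small synthetic stand-in. Note that the data below is made up for the example and is not the Communities and Crime data set; the helper names, the 5% missing rate, the train/test sizes, and the λ grid are likewise assumptions. Setting λ = 0 recovers ordinary linear regression.

```python
import numpy as np

def impute_column_means(x):
    """Replace NaN entries with the mean of the observed values in that column."""
    x = np.array(x, dtype=float)
    col_means = np.nanmean(x, axis=0)
    rows, cols = np.where(np.isnan(x))
    x[rows, cols] = col_means[cols]
    return x

def fit_ridge(x, y, lam):
    """Closed-form ridge regression; the intercept is not penalised."""
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc, yc = x - x_mean, y - y_mean
    coef = np.linalg.solve(xc.T @ xc + lam * np.eye(x.shape[1]), xc.T @ yc)
    return coef, y_mean - x_mean @ coef

def residual_sum_of_squares(x, y, coef, intercept):
    residuals = y - (x @ coef + intercept)
    return float(residuals @ residuals)

# Toy data (hypothetical): 200 samples, 5 features, known coefficients.
rng = np.random.default_rng(0)
true_coef = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
x_full = rng.standard_normal((200, 5))
y = x_full @ true_coef + 0.1 * rng.standard_normal(200)

# Hide roughly 5% of the entries, then fill them with column means.
x_missing = x_full.copy()
x_missing[rng.random(x_missing.shape) < 0.05] = np.nan
x_filled = impute_column_means(x_missing)

x_train, y_train = x_filled[:120], y[:120]
x_test, y_test = x_filled[120:], y[120:]

results = []
for lam in (0.0, 0.1, 1.0, 10.0, 100.0):
    coef, intercept = fit_ridge(x_train, y_train, lam)
    results.append((lam,
                    residual_sum_of_squares(x_train, y_train, coef, intercept),
                    residual_sum_of_squares(x_test, y_test, coef, intercept),
                    float(np.linalg.norm(coef))))

for lam, rss_train, rss_test, norm in results:
    print(f"lambda={lam:g}  train RSS={rss_train:.2f}  "
          f"test RSS={rss_test:.2f}  ||coef||={norm:.3f}")
```

As λ grows, the coefficient norm shrinks and the training residual can only increase, so the choice of λ is better judged on held-out data. Coefficients that shrink towards zero quickly are natural candidates to drop when exploring feature selection.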

Submission Instructions
Specify your choice of language in the link provided in Moodle. Submit a single tarball/zip file containing the following files in the specified directory structure. Use the following naming convention:

cs5011 a2 rollno.tar.gz
    cs5011 a2 rollno
        Dataset
            DS1-train.csv
            DS1-test.csv
            DS2-train.csv
            DS2-test.csv
            CandC-train.csv
            CandC-test.csv
        Report
            rollno-report.pdf
        Code
            all your code files
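One possible way to assemble the archive is sketched below with Python's standard library. It assumes the separators in the naming convention are underscores (`cs5011_a2_rollno`), which the text above does not confirm; `build_submission` is a hypothetical helper, and the empty placeholder files stand in for the real data splits and report.

```python
import tarfile
import tempfile
from pathlib import Path

DATASET_FILES = ("DS1-train.csv", "DS1-test.csv", "DS2-train.csv",
                 "DS2-test.csv", "CandC-train.csv", "CandC-test.csv")

def build_submission(root: Path, roll_no: str) -> Path:
    """Create the directory layout under root and pack it into a .tar.gz."""
    # Assumption: underscores separate the parts of the directory name.
    top = root / f"cs5011_a2_{roll_no}"
    for sub in ("Dataset", "Report", "Code"):
        (top / sub).mkdir(parents=True, exist_ok=True)
    # Placeholders only; copy the real files into place in practice.
    for name in DATASET_FILES:
        (top / "Dataset" / name).touch()
    (top / "Report" / f"{roll_no}-report.pdf").touch()
    archive = root / f"{top.name}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(top), arcname=top.name)
    return archive

with tempfile.TemporaryDirectory() as tmp:
    archive_path = build_submission(Path(tmp), "rollno")
    with tarfile.open(str(archive_path), "r:gz") as tar:
        member_names = sorted(tar.getnames())

for name in member_names:
    print(name)
```

Listing the archive members before uploading is a cheap check that the three subdirectories and all six dataset files are present.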
