
IIT Bombay

Assignment #1
Points: 50
Course: CS 725 – Instructor: Preethi Jyothi
Due date: 11:55 pm, September 6, 2024

General Instructions

• Download the file assgmt1.tgz from Moodle and extract the file to get a directory named assgmt1 with all the necessary files within.

• For your final submission, create a final submission directory named assgmt1
with the following internal directory structure:

assgmt1/
|
+- README
+- part1A.py
+- histograms/
+- part1B.py
+- closedForm.py
+- batchGradientDescent.py
+- smilingJoker.py
+- part3.py
+- kaggle.csv

Compress your submission directory using the command tar -cvzf assgmt1.tgz assgmt1 and upload assgmt1.tgz to Moodle. This submission is due on or before 11:55 pm on September 6, 2024. No extensions will be entertained.

• README should contain the names and roll numbers of all your team members.

• A quick and short introduction to NumPy is here. Please make sure you go through this carefully; numpy operations beyond what is mentioned in this document can also be used for the assignment.


Part I: Numpy Basics (15 points)

(A) Coin Tossing. Say you toss 100 fair coins simultaneously and want to programmatically record the total number of heads. One simultaneous toss of the 100 coins counts as a single trial. part1A.py contains a list num_trials_list of varying numbers of trials. For every value in this list, we want to count the number of heads and plot a histogram. The histogram should take the shape of a binomial distribution and get more accurate with larger numbers of trials.

part1A.py has two functions:


1. toss(num_trials): Takes the number of times the experiment is to be performed as an argument and returns a numpy array of size num_trials with the number of heads obtained in each trial (NumPy documentation: https://numpy.org/doc/stable/user/index.html). [2 pts]

2. plot_hist(trial): Takes an array of the number of heads obtained for k trials from toss(k) and plots a histogram based on these counts. Generate plots for each value in num_trials_list and save them in a directory. [3 pts]
Complete the following tasks:

1. Complete the function definitions in part1A.py and submit it as assgmt1/part1A.py.

2. Create a directory named histograms inside the directory assgmt1/.

3. Save the histograms obtained by plot_hist() in histograms with names formatted as hist_<num_trials>.png. For example, for num_trials = 1000, the filename would be hist_1000.png.

Notes:

1. Use the plt.savefig() function to save the histograms.

2. Use the given template only. DO NOT change the names of the given functions
as it will cause the autograder to fail.

3. Use for loops to simulate the 100 coin tosses for a given num_trials value. Do not use predefined functions to compute the numpy array inside the toss() function; marks will be deducted otherwise.

Here is an example histogram for num_trials = 10000.
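For reference, here is a minimal sketch of one way toss() and plot_hist() could be structured (assuming matplotlib is installed and the histograms/ directory already exists; the signatures and any scaffolding already present in the part1A.py template take precedence):

import os
import numpy as np
import matplotlib.pyplot as plt

def toss(num_trials):
    # One trial = tossing 100 fair coins; record the number of heads per trial.
    heads_per_trial = np.zeros(num_trials, dtype=int)
    for t in range(num_trials):
        flips = np.random.randint(0, 2, size=100)  # 0 = tails, 1 = heads
        heads_per_trial[t] = flips.sum()
    return heads_per_trial

def plot_hist(trial):
    # Histogram of head counts, saved as histograms/hist_<num_trials>.png.
    num_trials = len(trial)
    plt.figure()
    plt.hist(trial, bins=range(0, 101))
    plt.xlabel("Number of heads out of 100 tosses")
    plt.ylabel("Frequency")
    plt.savefig(os.path.join("histograms", f"hist_{num_trials}.png"))
    plt.close()

Any equivalent simulation that respects the for-loop requirement in the notes above is fine.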



(B) Tensor Manipulations and Softmax. Consider $N$ column vectors $\{u_1, \ldots, u_N\}$, $u_i \in \mathbb{R}^d_{\ge 0}$, and two non-negative real-valued $d \times d$ matrices $M_1, M_2$.

Complete each of the following steps using numpy operations and do not use any for loops:

1. Compute $x_i = u_i^\top M_1$, $\forall i \in \{1, \ldots, N\}$. Stack the $N$ row vectors $x_i$ to form a new matrix $X \in \mathbb{R}^{N \times d}$. Similarly, construct a matrix $Y \in \mathbb{R}^{N \times d}$ using row vectors computed from $u_i^\top M_2$, $\forall i \in \{1, \ldots, N\}$. [1 pts]

2. Modify $X$ to add the integer $i$ to all elements of its $i$th row, where $i \in \{1, \ldots, N\}$. Let this offset-modified matrix now be $\hat{X}$. [2 pts]

3. Compute $Z = \hat{X} Y^\top$, $Z \in \mathbb{R}^{N \times N}$. Let

$$
Z =
\begin{bmatrix}
\leftarrow z_1 \rightarrow \\
\leftarrow z_2 \rightarrow \\
\vdots \\
\leftarrow z_N \rightarrow
\end{bmatrix}_{N \times N}
=
\begin{bmatrix}
z_{11} & z_{12} & \ldots & z_{1N} \\
z_{21} & z_{22} & \ldots & z_{2N} \\
 & \vdots & \\
z_{N1} & z_{N2} & \ldots & z_{NN}
\end{bmatrix}_{N \times N}
$$

Apply the following "sparsify" operation on $Z$ (examples shown below for $N = 3$ and $N = 4$):

$$
\begin{bmatrix}
z_{11} & z_{12} & z_{13} \\
z_{21} & z_{22} & z_{23} \\
z_{31} & z_{32} & z_{33}
\end{bmatrix}
\xrightarrow{\text{sparsify}}
\begin{bmatrix}
z_{11} & 0 & z_{13} \\
0 & z_{22} & 0 \\
z_{31} & 0 & z_{33}
\end{bmatrix}
\qquad
\begin{bmatrix}
z_{11} & z_{12} & z_{13} & z_{14} \\
z_{21} & z_{22} & z_{23} & z_{24} \\
z_{31} & z_{32} & z_{33} & z_{34} \\
z_{41} & z_{42} & z_{43} & z_{44}
\end{bmatrix}
\xrightarrow{\text{sparsify}}
\begin{bmatrix}
z_{11} & 0 & z_{13} & 0 \\
0 & z_{22} & 0 & z_{24} \\
z_{31} & 0 & z_{33} & 0 \\
0 & z_{42} & 0 & z_{44}
\end{bmatrix}
$$

You should use numpy broadcasting in this step. [2 pts]

4. Compute $\hat{Z}$ such that each row $z_i$ in $Z$ is replaced by $\hat{z}_i$, where $\hat{z}_{ij} = \frac{\exp(z_{ij})}{\sum_j \exp(z_{ij})}$. Note that each row in $\hat{Z}$ will now be a probability distribution. [2 pts]

5. Print the index of the maximum probability in each row of $\hat{Z}$. [1 pts]

Add your solution to part1B.py, which already contains an initialise_input function, and submit it as assgmt1/part1B.py. We will add new test cases during grading; successful completion will get you 2 points. [2 pts]
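For illustration only, the sketch below strings the five steps together, assuming the $N$ input vectors are stacked row-wise in an array U of shape (N, d) (the actual arrays come from initialise_input in the template, which may organise them differently):

import numpy as np

def tensor_ops(U, M1, M2):
    # U: (N, d) array whose i-th row is u_i; M1, M2: non-negative (d, d) matrices.
    N = U.shape[0]

    # Step 1: x_i = u_i^T M1 and the analogous y_i, stacked into X, Y of shape (N, d).
    X = U @ M1
    Y = U @ M2

    # Step 2: add the integer i to every element of the i-th row (i = 1, ..., N).
    X_hat = X + np.arange(1, N + 1).reshape(N, 1)

    # Step 3: Z = X_hat Y^T, then "sparsify" with a broadcast checkerboard mask
    # that keeps z_ij only when i + j is even (1-indexed), matching the examples.
    Z = X_hat @ Y.T
    rows = np.arange(1, N + 1).reshape(N, 1)
    cols = np.arange(1, N + 1).reshape(1, N)
    Z = np.where((rows + cols) % 2 == 0, Z, 0.0)

    # Step 4: row-wise softmax, so every row of Z_hat sums to 1.
    Z_hat = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)

    # Step 5: index of the maximum probability in each row.
    print(Z_hat.argmax(axis=1))
    return Z_hat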

Part II: Linear Regression (30 points)

(A) Closed-form Solution of Linear Regression (10 points). For a training set $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, $x_i \in \mathbb{R}^{d+1}$, represented by a feature matrix $X$ and a label vector $y$, the least squares solution $w^*$ can be computed by $w^* = (X^\top X)^{-1} X^\top y$, where

$$
X =
\begin{bmatrix}
\leftarrow x_1^\top \rightarrow \\
\leftarrow x_2^\top \rightarrow \\
\vdots \\
\leftarrow x_n^\top \rightarrow
\end{bmatrix}_{n \times (d+1)},
\quad
y =
\begin{bmatrix}
y_1 \\ y_2 \\ \vdots \\ y_n
\end{bmatrix}_{n \times 1},
\quad
w =
\begin{bmatrix}
w_0 \\ w_1 \\ \vdots \\ w_d
\end{bmatrix}_{(d+1) \times 1}
$$

Make the following changes to closedForm.py:


1. fit(): This function should calculate the weights using the closed-form equation defined above on the given values of X and y passed as input. [3 pts]

2. predict(): This function should return the predicted values of the model on
an input set of feature values. [2 pts]

3. plot_learned_equation(): This function should generate a plot (to be saved as closed_form.png) that shows all the data points along with the best fit. The generated plot should look something like: [2 pts]

Note:

1. generate_toy_dataset() can be used to generate the data. On completing your tasks, run:
$ python closedForm.py

2. To make your life easy, we have a test suite that checks your code on a few
test cases. You will need PyTest to run these test cases. We might add some
additional cases during the final grading. [3 pts]
To run the test suite, execute:
$ pytest test_closedForm.py

3. Stick to the given template code. DO NOT change the names of the functions
given as it will cause the autograder to fail.
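For orientation, a minimal sketch of the closed-form computation is given below. Free functions are used only for brevity; the structure in the closedForm.py template is what you should complete, and the use of np.linalg.pinv rather than np.linalg.inv is an assumption made to keep the sketch well defined even if $X^\top X$ is singular:

import numpy as np

def fit_closed_form(X, y):
    # w* = (X^T X)^{-1} X^T y; pinv keeps this well defined if X^T X is singular.
    return np.linalg.pinv(X.T @ X) @ X.T @ y

def predict(X, w):
    # Predictions for a feature matrix X given learned weights w.
    return X @ w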

(B) Gradient Descent for Linear Regression (12 points). Find the least squares solution using Batch Gradient Descent. batch_size indicates the number of data points in a batch. If batch_size is None, the fit() function implements full gradient descent and computes the gradient over all training examples. The code for generating and creating batches on a toy dataset is in batchGradientDescent.py.

Your task is to complete the following functions:


1. fit(): There are two loops inside this function. You have to complete the inner for loop and use compute_gradient() to calculate the gradient of the loss w.r.t. the weights. Next, you must calculate the training loss using compute_rmse_loss() and store these losses across training epochs in error_list. [2 pts]

2. compute_gradient(): This function should return the gradient of the loss w.r.t. the weights of the model. (Normalize the values before returning, or it may cause gradients to explode.) [2 pts]

3. compute_rmse_loss(): This function should return the Root Mean Square Error loss, $\sqrt{\frac{1}{n}\lVert y - Xw \rVert_2^2}$, between the target labels and predicted labels. [1 pts]

4. predict(): This function should return the predicted values of the model on
the given set of feature values passed as an argument to it. [1 pts]

5. plot_loss(): This function is used to plot the losses stored in error_list of the model. Save the image as plot_loss.png. [1 pts]

6. plot_learned_equation(): This function generates a plot (save the image as gradient_descent.png) on a dataset with 1 feature (i.e., d = 1). [1 pts]

Finally, your code should generate two plots that look like this:

Note:

1. Similar to the previous question, to generate these plots, simply execute:
$ python batchGradientDescent.py

2. You can run test cases using: [4 pts]

$ pytest test_batchGradientDescent.py

3. Stick to the given template code. DO NOT change the names of the functions
given as it will cause the autograder to fail.
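The sketch below shows one way the pieces can fit together, assuming a squared-error loss and targets y stored as an (n, 1) column vector; lr, num_epochs, and the in-line batching are stand-ins for whatever the batchGradientDescent.py template already provides:

import numpy as np

def compute_gradient(X, y, w):
    # Gradient of the squared-error loss w.r.t. w, normalised by the batch size
    # so that gradients do not blow up for large batches. y is (n, 1).
    n = X.shape[0]
    return (2.0 / n) * X.T @ (X @ w - y)

def compute_rmse_loss(X, y, w):
    # RMSE = sqrt((1/n) * ||y - Xw||_2^2)
    n = X.shape[0]
    residual = y - X @ w
    return float(np.sqrt((residual ** 2).sum() / n))

def fit(X, y, batch_size=None, lr=0.01, num_epochs=100):
    # Full gradient descent when batch_size is None, mini-batch otherwise.
    y = y.reshape(-1, 1)
    n, d = X.shape
    w = np.zeros((d, 1))
    error_list = []
    for epoch in range(num_epochs):
        if batch_size is None:
            batches = [(X, y)]
        else:
            batches = [(X[i:i + batch_size], y[i:i + batch_size])
                       for i in range(0, n, batch_size)]
        for X_b, y_b in batches:                       # inner loop over batches
            w -= lr * compute_gradient(X_b, y_b, w)
        error_list.append(compute_rmse_loss(X, y, w))  # training loss per epoch
    return w, error_list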

(C) Linear Regression with Basis Functions (8 points). (The Smiling Joker)

This question will require some basic reasoning about feature spaces. You are given a training set $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, $x_i \in \mathbb{R}$. The relation between $x_i$ and $y_i$ is defined by a weighted combination of transformations $\Phi = \{\phi_1, \phi_2, \ldots, \phi_K\}$ such that:

$$
y_i = w_0 + \sum_{j=1}^{K} w_j \cdot \phi_j(x_i) \qquad \forall i \in \{1, 2, \ldots, n\}
$$

Given just the dataset, you are tasked with identifying the appropriate number and choice of these transformations $\phi_j$ and fitting a linear regression model.
More concretely, complete the following functions in smilingJoker.py:
1. read_dataset(): This function should read the CSV file dataset.csv and generate train and test splits. The file contains 500 data points; you can use the first 90% of the data for training and the rest for testing your model. [2 pts]

2. transform_input(): This function should take $X \in \mathbb{R}^{n \times 1}$ as input and return the transformed $X \in \mathbb{R}^{n \times (K+1)}$ as output. (Note: This can be implemented as a matrix operation; an illustrative sketch follows this list.) [3 pts]
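Purely as an illustration, here is what transform_input() might look like if the basis happened to be polynomial, $\phi_j(x) = x^j$; the actual number and choice of transformations have to be worked out from the dataset and may be quite different:

import numpy as np

def transform_input(X, K=3):
    # X: (n, 1) array. Returns an (n, K+1) matrix whose columns are
    # [1, phi_1(x), ..., phi_K(x)] for the (assumed) polynomial basis phi_j(x) = x**j.
    powers = np.arange(K + 1)      # exponents 0, 1, ..., K
    return X ** powers             # broadcasting: (n, 1) ** (K+1,) -> (n, K+1)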
Note:

1. Similar to the previous questions, you should run the following and pass all the
prediction checks: [3 pts]
$ python smilingJoker.py

2. Stick to the given template code. DO NOT change the names of the functions
given as it will cause the autograder to fail.

Note:
When you plot the dataset, you will see a Smiling Joker (https://tinyurl.com/stock-smiley-joker), hence the name!


Part III: Kaggle Competition (5 points)

Challenge: Predict Sentiments from Compressed High-dimensional Features.


Overview. In this Kaggle competition, you will solve a realistic linear regression task using code you have written in the previous parts. The task is to predict target scores (0 to 5, ranging from very negative to very positive) based on a set of features extracted from product reviews. The final model will be evaluated on a test dataset via Kaggle, and test performance will be measured using the Mean Squared Error (MSE) metric.
Competition Link: You can join the competition on Kaggle: IIT Bombay CS 725 Assignment 1 (Autumn 2024). Please sign up on Kaggle using your IITB LDAP email ID, with your Kaggle "Display Name" set to the roll number of anyone in your team. This is important for us to identify you on the leaderboard.
Dataset Description. You are given three CSV files:
• train.csv: This file contains the training data with 64 features and a corresponding target score for each entry.

• test.csv: This file contains the test data with 64 features but without the target
scores.

• sample.csv: This file contains the submission format with predicted scores for
the test data. You will have to submit such a file with your test predictions.
Each row in the data files represents an instance with the following columns:
• ID: A unique identifier for each data point.
• feature_0, feature_1, ..., feature_63: The 64 features extracted from the dataset.
• score: The target score for each data point (only in train.csv).

Task Description. Implement a linear regression model for the given problem. You can reuse any of the functions from Part II of this assignment. Tune the hyperparameters on a held-out set from train.csv to achieve the best model performance on the test set. Predict the target scores on the test dataset. Round the predicted scores to the nearest integer before submission.
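A hedged outline of part3.py is sketched below; it assumes pandas is available, uses the closed-form solution as one possible baseline model, and assumes the submission file uses ID and score columns as in sample.csv (check the actual file to confirm the exact format):

import numpy as np
import pandas as pd

# Load the data described above.
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

feature_cols = [f"feature_{i}" for i in range(64)]
X_train = train[feature_cols].to_numpy()
y_train = train["score"].to_numpy().reshape(-1, 1)
X_test = test[feature_cols].to_numpy()

# Add a bias column and fit a closed-form baseline
# (any model from Part II can be reused instead).
X_train_b = np.hstack([np.ones((X_train.shape[0], 1)), X_train])
X_test_b = np.hstack([np.ones((X_test.shape[0], 1)), X_test])
w = np.linalg.pinv(X_train_b.T @ X_train_b) @ X_train_b.T @ y_train

# Predict, round to the nearest integer, and write kaggle.csv
# following the format of sample.csv.
preds = np.rint(X_test_b @ w).astype(int).ravel()
pd.DataFrame({"ID": test["ID"], "score": preds}).to_csv("kaggle.csv", index=False)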

Evaluation. The performance of your model will be evaluated based on the Mean
Squared Error (MSE) calculated on the test dataset. Your predicted scores must be
rounded to the nearest integer. You should not implement the MSE metric; it will be
automatically calculated via Kaggle. Your model will be evaluated on the provided
test set, where a random 50% of the examples are marked as private and the rest are public. Which examples are private and which are public will not be revealed. You can monitor your
model’s performance on the public part of the test set via the public leaderboard.
The final evaluation will be based on the private part of the test set, which will be
revealed via the private leaderboard after the competition concludes.

Submission. Submit your source file named part3.py and a CSV file kaggle.csv with your predicted scores (rounded to the nearest integer) for the test dataset, following the format in sample.csv. If you match or outperform the baseline RMSE obtained with the closed-form solution from Part II, you will get all 5 points. Top-scoring performers on the "Private Leaderboard" (with a suitable threshold determined after the deadline passes) will be awarded up to 3 extra points.
