Assignment 1
Note: If you are a BTech student enrolled in MM431, then you must NOT
submit your answers for this problem set. This is only for students enrolled
in MM531.
Please submit screenshots of the output (as a single PDF file), the revised
databases and the Python files (.py or .ipynb) via Google Classroom by
Thursday midnight.
Problem 1:
Write a program that reads an m x n matrix from the console (the user enters the
values of m and n). First rearrange the matrix so that the elements of each row
are in increasing order. Then reverse the elements of each individual column (i.e.,
the element at the top goes to the bottom, and so on). Print the transformed matrix.
(a) Run the code with a set of input values of your choice.
(b) Carry out the transformations manually and compare with the computer
output. Show your work.
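As a rough illustration of the two transformations in Problem 1, the sketch below sorts each row and then reverses each column. It is only one possible approach: the function name transform and the hard-coded example matrix are assumptions made for illustration, and reading m, n and the entries from the console is deliberately left out.

```python
def transform(matrix):
    """Sort each row in increasing order, then reverse every column."""
    # Step 1: sort the elements within each row.
    sorted_rows = [sorted(row) for row in matrix]
    # Step 2: reversing every column top-to-bottom is equivalent to
    # reversing the order of the rows.
    return sorted_rows[::-1]


# Hypothetical 2 x 3 example (not taken from the assignment).
example = [[3, 1, 2],
           [9, 7, 8]]
result = transform(example)
for row in result:
    print(*row)
```

For this example the sorted rows are 1 2 3 and 7 8 9, so after the column reversal the first printed row is 7 8 9 and the second is 1 2 3, which is easy to check by hand as asked in part (b).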
Problem 2:
Read the dataset “MM531data.xlsx”. Normalize the y values but not the x values.
Perform a linear regression by writing the code from scratch, without using
scikit-learn.
(a) Process your data by first visualizing and then normalizing only the y
values. Save your revised database as MM531data_rev.xlsx.
(b) State your hypothesis function and loss function.
(c) Write the code for performing linear regression, with the parameters
optimized using the Gradient Descent Method. The code should be written
such that it updates the learning rate automatically, instead of you
having to do this manually. Hint: This is similar to tutorial #3 except that
instead of using a single loop, you use a nested loop with the outer loop
performing the learning rate optimization and the inner loop performing the
gradient descent.
(d) Start with an initial estimate for intercept as 0.1 and slope as 0.01, as well
as a learning rate of 0.01 for both slope and intercept. Show after running
the code that your optimization converges. How many iterations were needed
for optimizing the learning rate (i.e., how many times did you have to iterate
in the outer loop before convergence)? What are the optimized values of
slope and intercept?
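The nested-loop structure described in the hint could look roughly like the sketch below. Several things here are assumptions rather than part of the assignment: the data are synthetic (the columns of MM531data.xlsx are not reproduced), y is normalized with min-max scaling, a single learning rate is shared by slope and intercept, and the outer loop uses a simple rule (halve the learning rate if the loss increases, double it if the inner loop runs out of steps). The hypothesis is h(x) = intercept + slope * x and the loss is half the mean squared error; tutorial #3 may use a different update rule.

```python
import numpy as np


def half_mse(x, y, intercept, slope):
    """Loss J = (1 / 2m) * sum((h(x) - y)^2) with h(x) = intercept + slope * x."""
    residual = intercept + slope * x - y
    return 0.5 * np.mean(residual ** 2)


def gradient_descent(x, y, intercept, slope, lr, max_steps=20000, tol=1e-12):
    """Inner loop: plain gradient descent with a fixed learning rate."""
    prev_loss = half_mse(x, y, intercept, slope)
    for _ in range(max_steps):
        residual = intercept + slope * x - y
        intercept -= lr * np.mean(residual)      # dJ/d(intercept)
        slope -= lr * np.mean(residual * x)      # dJ/d(slope)
        loss = half_mse(x, y, intercept, slope)
        if not np.isfinite(loss) or loss > prev_loss:
            return intercept, slope, "diverged"  # learning rate too large
        if prev_loss - loss < tol:
            return intercept, slope, "converged"
        prev_loss = loss
    return intercept, slope, "slow"              # learning rate too small


def fit(x, y, intercept=0.1, slope=0.01, lr=0.01, max_outer=50):
    """Outer loop: adjust the learning rate until the inner loop converges."""
    for outer in range(1, max_outer + 1):
        b, w, status = gradient_descent(x, y, intercept, slope, lr)
        if status == "converged":
            return b, w, lr, outer
        # Assumed rule: halve on divergence, double if progress was too slow.
        lr = lr * 0.5 if status == "diverged" else lr * 2.0
    raise RuntimeError("learning rate search did not converge")


# Synthetic stand-in data (NOT the assignment dataset).
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0
y_norm = (y - y.min()) / (y.max() - y.min())     # min-max normalization of y only

b, w, lr, outer = fit(x, y_norm)
print(f"intercept={b:.4f}, slope={w:.4f}, lr={lr}, outer iterations={outer}")
```

Each outer iteration restarts from the initial estimates (0.1 and 0.01), so the returned outer count is the number of learning rates that were tried. Calling fit(x, y_norm, lr=1.0) is one way to see the outer loop reduce the learning rate several times before the inner loop converges.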
Problem 3:
Read the dataset “HEA_phases.xlsx” attached with this assignment and process it
(drop the zeros and nulls). You will need to write code from scratch (i.e., without
using any of the ML libraries) in order to train a Logistic Regression model with all of
the given data (i.e., use all of it for training). Perform the following steps.
(a) Explore your dataset and get information about your dataset.
(b) Clean up the dataset – get rid of nulls, zeros and negatives. Save the new
dataset as HEA_phases_rev.xlsx
(c) Write a simple script such that the crystal structures can be treated in
numeric format. Overwrite HEA_phases_rev.xlsx created in part (b).
Here, you basically need to assign a number value (1 or 2) according to
the two crystal structures in the system.
(d) Write down the hypothesis function (i.e., logistic regression) and the loss
function that you will be minimizing.
(e) Write a script for classification of crystal structures as FCC or BCC
using Logistic Regression from scratch (i.e., without using ML libraries
such as scikit-learn). Use the Gradient Descent method for
optimizing the Logistic Regression. Write down the expression for the
optimized hypothesis function.
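For orientation, a from-scratch logistic regression trained by gradient descent might be organized as in the sketch below. Everything specific in it is assumed: the two features and their values are randomly generated stand-ins (the columns of HEA_phases.xlsx are not reproduced), the labels 1 and 2 are shifted to 0 and 1 so that the cross-entropy loss applies directly, and the learning rate and step count are arbitrary choices. The hypothesis is h(x) = 1 / (1 + exp(-theta . x)) and the loss is the mean binary cross-entropy.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used as the hypothesis h = sigmoid(X @ theta)."""
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(X, y, theta):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(sigmoid(X @ theta), 1e-12, 1.0 - 1e-12)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def fit_logistic(X, y, lr=0.1, n_steps=5000):
    """Batch gradient descent on the cross-entropy loss."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(y)
        theta -= lr * grad
    return theta


# Synthetic stand-in data (NOT the assignment dataset): two clusters,
# labelled 1 and 2 as in part (c).
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(-1.0, 0.5, (50, 2)),
                      rng.normal(1.0, 0.5, (50, 2))])
labels_12 = np.array([1] * 50 + [2] * 50)

y = labels_12 - 1                                   # map {1, 2} -> {0, 1}
X = np.column_stack([np.ones(len(y)), features])    # prepend the bias column

theta = fit_logistic(X, y)
pred_12 = (sigmoid(X @ theta) >= 0.5).astype(int) + 1
accuracy = np.mean(pred_12 == labels_12)
print("theta =", theta)
print("training accuracy =", accuracy)
```

The entries of theta are the bias and the per-feature weights, so the optimized hypothesis function can be written out by substituting them into the sigmoid expression. Which of the two numeric labels corresponds to FCC and which to BCC depends on the mapping chosen in part (c).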