
Indian Institute of Technology, Kanpur: Applied Machine Learning

This document contains a programming assignment for a linear regression model to predict crop production using area as a predictor variable. The student fits a linear regression model using gradient descent to learn the model parameters, achieving a mean absolute error of 0.26 and R2 value of 0.61. A plot of the error decreasing over 500 epochs with a learning rate of 0.005 is included. The student proposes enhancing the model to include additional features like weather and rainfall data, land quality reports, and fertilizer costs to improve predictions, as these factors also impact crop production.

Indian Institute of Technology, Kanpur

Department of Industrial Management and Engineering

IME 673

Applied Machine Learning

Programming Assignment 1
Linear Regression

Submitted To:
Prof. Veena Bansal
DIME, IIT Kanpur

Submitted By:
Shirsendu Samanta 18125048
Q1. Plotting and understanding the data
a) There are different ways of plotting the data. You could use a scatter plot. You could
plot one crop, say coconut. Plot area on the x-axis and production on the y-axis. You
can ignore other dimensions of the data. What inferences do you make from this
plot? Do you need to normalize your data?
Answer a)

• A scatter plot is used to visualize the data. I have taken one crop, “rice”, for further
analysis.
[Figure: scatter plot of rice, area vs. production; panels: not normalized, normalized]

• From the plot we can infer that area and the amount of production have a linear
relationship.
• For better analysis I have normalized the data, but there is not much difference
between the scatter plots before and after normalization.
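
The observation that the scatter plot looks the same before and after normalization can be checked numerically. The sketch below assumes z-score normalization (the report does not say which method was used) and uses made-up values rather than the assignment dataset. It shows that the correlation between area and production is unchanged by this rescaling, which is why the shape of the plot is preserved; only the axis scales change.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize values to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Made-up illustrative values, NOT taken from the assignment dataset.
area = [1200.0, 3400.0, 560.0, 8900.0, 4300.0, 150.0]
production = [2500.0, 7100.0, 900.0, 21000.0, 8800.0, 300.0]

area_n, production_n = zscore(area), zscore(production)

r_raw = pearson(area, production)       # correlation on the raw values
r_norm = pearson(area_n, production_n)  # correlation after normalization
print(r_raw, r_norm)                    # the two values agree
```

Normalization still matters for the next question: gradient descent with a fixed learning rate behaves much better when the predictor and the target are on comparable scales.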

b) What are the other ways of plotting the data in 2-D?

Answer b)
Other ways of plotting the data in 2-D are:
• Bar chart, box plot, histogram, pie chart
Q2. Fit a linear regression model with one variable to predict production. Decision you
must make is: how you want to use the data available to you? Use gradient descent to
learn the parameters of the model. What error measure will you use? Report model
parameters and training as well as testing error. Plot the graph of error as you train your
model. Clearly mention if you are using regularization, stratification etc.
Answer-
A linear regression model is used to find the relationship between area and production, since
these are the only numerical variables available in the dataset.
Gradient descent is used to learn the parameters of the model.
Gradient Descent:
Epochs = 500, Learning Rate = 0.005

Parameter/Metric    Value
Coefficient         0.861796
Intercept           0.009837285
MAE                 0.26
MSE                 0.30
R²                  0.61
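
For reference, the three error measures in the table can be computed as in the minimal sketch below. It assumes the standard definitions of MAE, MSE and R²; the function names and the toy values are illustrative and are not the values behind the reported results.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average of (actual - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

print(mae(y_true, y_pred), mse(y_true, y_pred), r2(y_true, y_pred))
```

Applying the same functions separately to the training set and to a held-out test set gives the training and testing errors that the question asks for.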

Plot of the error:

[Figure: error value vs. epochs]

The error clearly decreases as the number of epochs increases.
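
The training procedure can be sketched as follows. This is a minimal batch gradient descent for one-variable linear regression, assuming mean squared error as the training loss and using synthetic, roughly normalized data; it is not the code used for the reported results. Only the epoch count and learning rate are taken from the report. The function records the error at every epoch, which is the series shown in an error-vs-epochs plot.

```python
import random

def fit_gd(xs, ys, epochs=500, lr=0.005):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []  # training MSE at the start of each epoch
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        history.append(sum(e * e for e in errors) / n)
        # Gradients of the MSE with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history

# Synthetic data for illustration only (assumed slope 0.8 plus noise).
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]
ys = [0.8 * x + random.gauss(0.0, 0.5) for x in xs]

w, b, history = fit_gd(xs, ys)
print(w, b, history[0], history[-1])
```

With a small learning rate on normalized data the recorded error falls at every epoch, which matches the shape of the curve described above.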
Q3. Enhance your model to include more features to predict production. Are there
features that you would like to include that are not part of this dataset? Justify your
choice of features.
Answer-
We can further enhance the model by using multiple linear regression if more
numerical data is available.
Features that can be included are:
I. Weather & rainfall data – Crop production is highly correlated with rainfall and
weather. We can train the model on past data and predict crop production on
the basis of weather forecast data.
II. Land quality – Land quality is also highly correlated with the amount of production.
If we can get land quality reports for the various states, we can use them for further
prediction.
III. Fertilizer cost – The cost of fertilizer is inversely related to crop production, so we
can also include it in the model.
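
A minimal sketch of how the one-variable model extends to several features is given below. The feature columns (area, rainfall, land quality, fertilizer cost), the weights used to generate the data, and the data itself are all hypothetical; none of them come from the assignment dataset. The same epoch count and learning rate as in Q2 are reused as an assumption.

```python
import random

def fit_multi_gd(X, y, epochs=500, lr=0.005):
    """Multiple linear regression by batch gradient descent on the MSE."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        # Errors are computed once per epoch, so all weights update together.
        errors = [sum(w * v for w, v in zip(weights, row)) + bias - t
                  for row, t in zip(X, y)]
        for j in range(d):
            grad_j = 2.0 / n * sum(e * row[j] for e, row in zip(errors, X))
            weights[j] -= lr * grad_j
        bias -= lr * 2.0 / n * sum(errors)
    return weights, bias

# Hypothetical standardized features: area, rainfall, land quality, fertilizer cost.
random.seed(1)
true_weights = [0.8, 0.4, 0.3, -0.3]  # made up; negative sign for fertilizer cost
X = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(300)]
y = [sum(w * v for w, v in zip(true_weights, row)) + random.gauss(0.0, 0.3)
     for row in X]

weights, bias = fit_multi_gd(X, y)
print(weights, bias)
```

Because the features are standardized, the learned weights are directly comparable: a larger absolute weight indicates a feature with more influence on the prediction, and a negative weight indicates an inverse relationship.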
