ML Assignment Subjective Questions Answers

The document discusses various aspects of data analysis and linear regression, including the impact of categorical variables on bike rentals, the importance of using drop_first=True in dummy variable creation, and the validation of linear regression assumptions. It also explains linear regression, Anscombe's quartet, Pearson's R, scaling methods, VIF, and the significance of Q-Q plots in assessing data distribution. Key findings include the top features affecting bike demand and the necessity of data visualization for accurate modeling.


Assignment-based subjective questions:

1. From your analysis of the categorical variables from the dataset, what could you infer
about their effect on the dependent variable?

Season → Fall had the highest number of bike rentals, whereas spring had the least.

Weathersit → Rentals are high in clear and partly cloudy weather conditions and low in light snow.

Weekday → The count is almost the same throughout the week.

Mnth → September has the highest rentals while January has the least, as expected from the weather conditions.

2. Why is it important to use drop_first=True during dummy variable creation?

Dummy variables are created for every distinct value of a feature. For example, if a colour column has three values (black, white and beige), the dummy-variable command creates three columns, which is redundant: any one column is fully determined by the other two. drop_first=True removes the first dummy column, eliminating this redundancy (the dummy variable trap) and the multicollinearity it introduces. It also reduces the number of columns, which improves performance noticeably when the dataset has many categorical features.
For the season feature we would ideally get 4 columns for its 4 distinct values, but with drop_first=True the season_fall column is dropped; a row with '000' in the three remaining season columns is therefore identified as fall.
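A minimal sketch of this behaviour with pandas get_dummies, using a toy season column:

```python
import pandas as pd

df = pd.DataFrame({"season": ["spring", "summer", "fall", "winter"]})

# Without drop_first: one dummy column per distinct value (redundant)
full = pd.get_dummies(df["season"])
print(list(full.columns))     # ['fall', 'spring', 'summer', 'winter']

# With drop_first=True: the first category ('fall') is dropped;
# a row of all zeros in the remaining columns therefore means 'fall'
reduced = pd.get_dummies(df["season"], drop_first=True)
print(list(reduced.columns))  # ['spring', 'summer', 'winter']
```

Note that pandas sorts categories alphabetically before dropping, so here the dropped (baseline) category is fall.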

3. Looking at the pair-plot among the numerical variables, which one has the highest
correlation with the target variable? (1 mark)
From the pair-plot, the temp and atemp features show the highest correlation with the target variable cnt.

4. How did you validate the assumptions of Linear Regression after building the model on the
training set? (3 marks)

1. Error terms should follow a normal distribution, as linear regression assumes. The distribution plot of the residuals is approximately normal and centred at zero.
2. Linear regression assumes that the relationship between the dependent and independent
variables is linear. We visualised this using a pair-plot of the numeric variables.

3. There should be no multicollinearity among the predictors. We calculated the VIF at each
modelling step, eliminated the variables that were highly correlated, and thereby measured how much
collinearity each variable carries; the final model contains variables that are not strongly correlated
with each other.
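As an illustration of the residual check, the sketch below fits an ordinary least-squares model to synthetic data with NumPy (all names and numbers here are illustrative, not from the assignment notebook) and verifies that the residuals are centred at zero:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.1, size=200)

# Fit by ordinary least squares (design matrix with an intercept column)
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ beta

# Residuals of an OLS fit with an intercept should average to zero
print(round(residuals.mean(), 6))

# Rough normality check: roughly 68% of residuals within one standard deviation
within_1sd = np.mean(np.abs(residuals - residuals.mean()) < residuals.std())
print(round(within_1sd, 2))
```

In practice one would also plot a histogram or Q-Q plot of the residuals rather than rely on a single summary number.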

5. Based on the final model, which are the top 3 features contributing significantly towards
explaining the demand of the shared bikes? (2 marks)

The top 3 features contributing significantly are:

- yr, coefficient 0.2343
- temp, coefficient 0.4352
- weathersit_light_rain_and_thunderstorm, coefficient -0.2961

General Subjective questions:

1. Explain the linear regression algorithm in detail.

Linear Regression is a supervised learning algorithm that models the relationship between a
dependent variable and one or more independent variables, and uses that relationship to predict
the outcome of future events.

Ex: predicting sales based on factors such as product prices and marketing spend.

Linear Regression model is of two types: Simple linear regression and Multiple Linear Regression

Simple Linear Regression:


It involves one independent variable and one dependent variable and is the simplest form of linear
regression:

y = β0 + β1X

β0 = intercept
β1 = slope
y = dependent variable
X = independent variable

Multiple Linear Regression :


It involves more than one independent variable and one dependent variable, and its equation is:

y = β0 + β1X1 + β2X2 + β3X3 + ... + βnXn

β0 = intercept
β1, β2, ..., βn = slopes
y = dependent variable
X1, X2, ..., Xn = independent variables
The goal of linear regression algorithm is to find the best fit line that predicts the outcome based on
independent variables .
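As a small illustration, the best-fit line of a simple linear regression can be recovered with NumPy on synthetic data (the true intercept 1 and slope 2 are chosen for the example):

```python
import numpy as np

# Synthetic data following y = 1 + 2x, plus a little noise
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, size=x.size)

# Least-squares fit of y = b0 + b1 * x (polyfit returns highest degree first)
b1, b0 = np.polyfit(x, y, deg=1)
print(round(b0, 2), round(b1, 2))  # close to the true intercept 1 and slope 2
```

With noisier data the recovered coefficients drift further from the true values, which is exactly the estimation error the least-squares criterion minimises.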

2. Explain the Anscombe’s quartet in detail.

Anscombe's quartet was constructed by the statistician Francis Anscombe in 1973 and consists of four
datasets with nearly identical statistical properties, such as mean, variance and R², that nevertheless
show very different trends when plotted. It was designed to highlight the importance of exploratory
data analysis and of visualising data, because similar summary statistics alone can be misleading
when deriving insights.

Each of the four datasets consists of 11 x-y pairs. When plotted, each shows a distinct pattern of
variability between x and y, yet all four share the same summary statistics, such as the correlation
coefficient and the fitted linear regression line.

The quartet shows the importance of data visualisation and how easy it is to fool a regression
algorithm. Before applying any machine learning algorithm, we should first visualise the dataset in
order to build a well-fitting model.
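This can be checked numerically. The sketch below reproduces the x/y values of the first two published datasets and verifies their identical mean and correlation in plain Python:

```python
# Anscombe's first two datasets (values from the published quartet)
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def pearson(a, b):
    """Pearson correlation coefficient, computed from its definition."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    sa = sum((ai - ma) ** 2 for ai in a) ** 0.5
    sb = sum((bi - mb) ** 2 for bi in b) ** 0.5
    return cov / (sa * sb)

# Same mean and same correlation with x, yet the scatter plots differ:
# y1 is roughly linear with noise, y2 is a smooth parabola.
print(round(sum(y1) / len(y1), 2), round(sum(y2) / len(y2), 2))  # 7.5 7.5
print(round(pearson(x, y1), 3), round(pearson(x, y2), 3))        # 0.816 0.816
```

Only a plot reveals that the first dataset is noisy-linear while the second is a clean curve, which is exactly Anscombe's point.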

3. What is Pearson’s R?

Pearson's R describes the strength and direction of the linear relationship between two variables. It
lies between -1 and 1: a value between 0 and 1 indicates positive correlation, zero indicates no linear
correlation, and a value between 0 and -1 indicates negative correlation. It is the most widely used
correlation coefficient and is a good choice when the following conditions hold:

- both variables are quantitative
- both variables are normally distributed
- the data have no outliers
- the relationship between the variables is linear

It is also an inferential statistic, meaning it can be used to test statistical hypotheses; specifically,
we can test whether there is a significant linear relationship between two variables.
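A minimal example of computing Pearson's R, assuming NumPy and made-up hours-studied/exam-score data:

```python
import numpy as np

# Hypothetical data: hours studied vs. exam score
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
score = np.array([52, 55, 61, 64, 70, 74, 79, 83])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is R
r = np.corrcoef(hours, score)[0, 1]
print(round(r, 3))  # close to +1: strong positive linear relationship
```

The same value can be obtained with scipy.stats.pearsonr, which additionally returns a p-value for the hypothesis test mentioned above.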

4. What is scaling? Why is scaling performed? What is the difference between normalized scaling
and standardized scaling?

Scaling is a pre-processing step performed on the independent variables to bring the data into a
particular range, which helps the algorithm's calculations.

It is performed because, most of the time, the features in a dataset have different magnitudes and
units. If scaling is not done, the fitted coefficients are hard to compare and interpret, since the
algorithm takes only magnitudes into account and not units, leading to a poorly conditioned model.
To solve this, scaling is applied so that all features lie in the same magnitude/range.

There are 2 types of scaling.

1. Normalised /Min-Max Scaling


2. Standardization scaling

Min-Max Scaling:

Scaled values lie in the range 0 to 1.

x' = (x - min(x)) / (max(x) - min(x))

Standardization scaling:

It brings all of the data onto a standard normal scale, with zero mean and unit standard
deviation.

x' = (x - mean(x)) / sd(x)

A disadvantage of normalisation is that it compresses outliers into the [0, 1] range, losing some
information about them that standardization preserves.
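Both formulas can be sketched directly with NumPy on toy data (sklearn's MinMaxScaler and StandardScaler implement the equivalent transformations):

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Min-Max (normalised) scaling: maps values into [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())
print(x_minmax)  # 0, 0.25, 0.5, 0.75, 1

# Standardization: subtract the mean, divide by the standard deviation
x_std = (x - x.mean()) / x.std()
print(round(x_std.mean(), 10), round(x_std.std(), 10))  # mean 0, std 1
```

Note that the min-max output depends directly on the extreme values, which is why outliers distort it more than standardization.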

5. You might have observed that sometimes the value of VIF is infinite. Why does this happen?

VIF is calculated using the formula:

VIF = 1 / (1 - R²)

where R² comes from regressing the given variable on all the other independent variables. If R²
equals 1, the denominator becomes zero and VIF is infinite. This denotes perfect correlation: the
variable is an exact linear combination of the other variables. A large VIF indicates strong
correlation among the variables, and the ranges commonly used when modelling are:

- VIF = 1: no multicollinearity
- VIF = 4-5: moderate multicollinearity
- VIF ≥ 10: severe multicollinearity

If a VIF is large, we need to act before proceeding with multiple regression: drop the feature with
the largest VIF, recompute VIF for the remaining features, and repeat until the values are acceptable.
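The computation can be sketched from the definition above (this is a from-scratch NumPy helper, not a library implementation; statsmodels provides an equivalent variance_inflation_factor function):

```python
import numpy as np

def vif(X, j):
    """VIF of column j: regress it on the other columns, then 1 / (1 - R^2)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])  # intercept + other features
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(1)
a = rng.normal(size=100)
b = rng.normal(size=100)                        # independent of a -> low VIF
c = 2 * a + rng.normal(scale=0.01, size=100)    # almost a copy of a -> huge VIF

X = np.column_stack([a, b, c])
print(round(vif(X, 1), 2))  # near 1: b is uncorrelated with a and c
print(vif(X, 2) > 100)      # c is (almost) perfectly explained by a
```

If c were exactly 2 * a, R² would be exactly 1 and the division would blow up, which is the infinite-VIF case described above.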

6. What is a Q-Q plot? Explain the use and importance of a Q-Q plot in linear regression.

Q-Q (Quantile-Quantile) plots plot the quantiles of a sample distribution against the quantiles of a
theoretical distribution. A Q-Q plot is a graphical tool for assessing whether a set of data comes
from a theoretical distribution, such as a normal or uniform distribution, and for checking whether
two datasets come from populations with a common distribution.
In linear regression, a Q-Q plot of the residuals is commonly used to check the normality
assumption. It is also helpful when the training and test datasets are received separately, since it
lets us confirm that both come from populations with the same distribution.
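A rough numeric sketch of the Q-Q idea, using only the Python standard library (the sample and the probability points chosen are illustrative):

```python
import random
import statistics

random.seed(0)
# A sample we believe to be standard normal
sample = sorted(random.gauss(0, 1) for _ in range(1000))

nd = statistics.NormalDist(0, 1)
# Q-Q pairs: empirical quantile vs. theoretical quantile at the same probability
probs = (0.1, 0.25, 0.5, 0.75, 0.9)
qq = [(sample[int(p * len(sample))], nd.inv_cdf(p)) for p in probs]
for s_q, t_q in qq:
    print(round(s_q, 2), round(t_q, 2))
# For normally distributed data the pairs lie close to the line y = x
```

A Q-Q plot simply draws these pairs for all probabilities; systematic departures from the diagonal (curved tails, S-shapes) reveal skewness or heavy tails in the sample.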
