Assignment-Based Subjective Questions/Answers
Q1. From your analysis of the categorical variables from the dataset, what could you infer about their effect on the
dependent variable?
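A1. The categorical variables have a strong effect on the target variable (cnt). Year, holiday, season, month and
weather all appear in the final model: for example, Light Snow and the months Jan, Feb and Dec lower the demand, while yr,
Sep and Aug raise it.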
The equation of our best-fit line is:
cnt = 0.2464 * yr - 0.0820 * holiday - 0.2353 * windspeed + 0.0459 * summer + 0.0823 * winter + 0.1026 * Aug
- 0.1653 * Dec - 0.2074 * Feb - 0.2759 * Jan - 0.1111 * Nov + 0.1212 * Sep - 0.3090 * Light Snow - 0.0922 * Mist
A2. 1. drop_first=True is important because it drops the extra column created during dummy variable creation. The
dropped column is redundant, so removing it reduces the correlation among the dummy variables.
2. Let's say a categorical column has 3 types of values and we want to create dummy variables for it. If a row is neither
furnished nor semi_furnished, then it is obviously unfurnished, so we do not need a 3rd variable to identify
unfurnished.
Hence, if we have a categorical variable with n levels, we need only n-1 columns to represent the dummy variables.
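The n-1 rule can be illustrated with a minimal sketch using pandas. The small data frame and the column name "furnishing" are made up for illustration. Note that pandas drops the first level in sorted order (here "furnished"), not necessarily "unfurnished" as in the example above; the idea is the same, since a row of all zeros identifies the dropped level.

```python
import pandas as pd

# Hypothetical column with three levels, mirroring the furnishing example.
df = pd.DataFrame(
    {"furnishing": ["furnished", "semi_furnished", "unfurnished", "furnished"]}
)

# Without drop_first: one dummy column per level (3 columns).
full = pd.get_dummies(df["furnishing"])

# With drop_first=True: the first level in sorted order ("furnished") is dropped,
# leaving n - 1 = 2 columns. A row with zeros in both columns means "furnished".
reduced = pd.get_dummies(df["furnishing"], drop_first=True)

print(list(full.columns))     # ['furnished', 'semi_furnished', 'unfurnished']
print(list(reduced.columns))  # ['semi_furnished', 'unfurnished']
```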
Q5. Based on the final model, which are the top 3 features contributing significantly towards explaining the
demand of the shared bikes?
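A5. Going by the magnitude of the coefficients in the final model's equation, the top 3 features contributing
significantly towards explaining the demand of the shared bikes are:
1. Light Snow (weather), coefficient -0.3090
2. Jan (month), coefficient -0.2759
3. yr (year), coefficient +0.2464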
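One way to rank the features of the final model is by the absolute size of their coefficients in the fitted equation from A1. The sketch below does this with the coefficients copied from that equation; it assumes the predictors are on comparable scales, otherwise coefficient magnitudes are not directly comparable.

```python
# Coefficients copied from the fitted equation in A1 (no intercept is given there).
coefficients = {
    "yr": 0.2464, "holiday": -0.0820, "windspeed": -0.2353,
    "summer": 0.0459, "winter": 0.0823, "Aug": 0.1026, "Dec": -0.1653,
    "Feb": -0.2074, "Jan": -0.2759, "Nov": -0.1111, "Sep": 0.1212,
    "Light Snow": -0.3090, "Mist": -0.0922,
}

# Sort by absolute value, largest first, and keep the top three.
top3 = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)[:3]

for name, coef in top3:
    print(f"{name}: {coef:+.4f}")
# Light Snow: -0.3090
# Jan: -0.2759
# yr: +0.2464
```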
Most of the time, a collected data set contains features that vary widely in magnitude, unit and range. If scaling is not
done, the algorithm takes only the magnitudes into account and not the units, which leads to incorrect modelling. To solve
this issue, we scale the variables to bring them all to the same level of magnitude. It is important to note that scaling
affects only the coefficients and none of the other parameters such as the t-statistic, F-statistic, p-values, R-squared, etc.
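The claim that scaling changes the coefficients but not R-squared can be checked with a small sketch. The data is synthetic and the helper fit_r2 is a hypothetical name introduced here; it fits a one-predictor least-squares line with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1000, size=50)          # synthetic predictor on a large scale
y = 3.0 * x + rng.normal(0, 50, size=50)   # synthetic target

def fit_r2(x, y):
    """Fit y = a + b*x by least squares; return (slope b, R-squared)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    r2 = 1 - (resid @ resid) / (centered @ centered)
    return beta[1], r2

# Min-max scale the predictor to the range [0, 1].
x_scaled = (x - x.min()) / (x.max() - x.min())

slope_raw, r2_raw = fit_r2(x, y)
slope_scaled, r2_scaled = fit_r2(x_scaled, y)

print(slope_raw, slope_scaled)  # the slopes differ by the factor (max - min)
print(r2_raw, r2_scaled)        # R-squared is the same up to rounding
```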
Normalization/Min-Max Scaling: It brings all of the data into the range 0 to 1. sklearn.preprocessing.MinMaxScaler
helps to implement normalization in Python.
Standardization Scaling: Standardization replaces the values by their Z scores. The scaled data has mean (μ) zero and
standard deviation (σ) one; the shape of the distribution itself does not change.
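Both kinds of scaling reduce to one-line formulas. The sketch below applies them with NumPy to a made-up array; sklearn's MinMaxScaler and StandardScaler apply the same formulas column by column (StandardScaler uses the population standard deviation, which is also NumPy's default).

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # made-up data

# Min-max scaling: (x - min) / (max - min), result lies in [0, 1].
min_max = (values - values.min()) / (values.max() - values.min())

# Standardization: (x - mean) / std, result has mean 0 and std 1.
z_scores = (values - values.mean()) / values.std()

print(min_max.tolist())                 # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_scores.mean(), z_scores.std())  # approximately 0 and 1
```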
Q.5. You might have observed that sometimes the value of VIF is infinite. Why does this happen?
A5. The VIF of a variable is 1/(1 - R²), where R² comes from regressing that variable on all the other independent
variables. If there is perfect correlation, R² = 1, so 1/(1 - R²) becomes infinite. An infinite VIF value therefore
indicates that the corresponding variable can be expressed exactly as a linear combination of other variables (which show
an infinite VIF as well). To solve this problem, we need to drop one of the variables causing this perfect
multicollinearity from the dataset.
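A minimal sketch of this, using NumPy on synthetic data: the helper vif is a hypothetical name introduced here, not a library function. It regresses one column on the others and returns 1/(1 - R²), treating an unexplained share within rounding error of zero as exactly zero.

```python
import numpy as np

def vif(X, i):
    """VIF of column i: regress it on the other columns plus an intercept."""
    target = X[:, i]
    others = np.delete(X, i, axis=1)
    A = np.column_stack([np.ones(len(target)), others])
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ beta
    # 1 - R^2 is the unexplained share of the target's variance.
    unexplained = (resid @ resid) / ((target - target.mean()) ** 2).sum()
    if unexplained < 1e-12:   # R^2 equals 1 up to rounding error
        return float("inf")
    return 1.0 / unexplained

rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
c = 2 * a + 3 * b             # exact linear combination of a and b

collinear = np.column_stack([a, b, c])
print([vif(collinear, i) for i in range(3)])    # all three are infinite

independent = np.column_stack([a, b])
print([vif(independent, i) for i in range(2)])  # both close to 1
```

Dropping column c removes the exact dependence, and the remaining VIF values fall back to near 1.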
Q.6. What is a Q-Q plot? Explain the use and importance of a Q-Q plot in linear regression.
A.6. A Quantile-Quantile (Q-Q) plot is a graphical tool that helps us assess whether a set of data plausibly came from
some theoretical distribution such as a Normal, Exponential or Uniform distribution. It also helps to determine whether
two data sets come from populations with a common distribution.
This helps in linear regression when the training and test data sets are received separately: a Q-Q plot can confirm
that both data sets come from populations with the same distribution.
A few advantages:
b) Many distributional aspects like shifts in location, shifts in scale, changes in symmetry, and the presence of outliers can
all be detected from this plot.
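The points of a normal Q-Q plot can be computed with the standard library alone, as in the sketch below. The sample is synthetic, the plotting positions (i - 0.5) / n are one common convention among several, and pearson is a hypothetical helper introduced here. Plotting theoretical against ordered as a scatter plot gives the Q-Q plot itself; points close to a straight line suggest the sample is plausibly normal.

```python
import random
from statistics import NormalDist, mean

random.seed(0)
sample = [random.gauss(50, 10) for _ in range(200)]  # synthetic data

n = len(sample)
ordered = sorted(sample)  # sample quantiles

# Theoretical standard-normal quantiles at positions (i - 0.5) / n.
std_normal = NormalDist()
theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson(xs, ys):
    """Pearson correlation, used here to measure how straight the Q-Q line is."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

r = pearson(theoretical, ordered)
print(round(r, 3))  # close to 1 for a sample that is roughly normal
```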