Questions For Viva
1. What is a linear model?
2. What are the key assumptions of a linear regression model?
3. Explain the difference between simple linear regression and multiple linear regression.
4. How do you interpret the slope coefficient in a linear regression model?
5. What is the role of the intercept term in a linear regression equation?
6. What is the ordinary least squares (OLS) method, and how is it used in
linear regression?
7. What is the purpose of residual analysis in linear regression?
8. How do you check for multicollinearity in multiple linear regression?
12. What are influential points, and how do they affect linear regression analysis?
13. Describe the process of model selection in linear regression.
14. What are the assumptions of independence of errors in linear regression?
15. How can categorical variables be included in a linear regression model?
16. What is the difference between correlation analysis and regression analysis?
17. Why might you transform variables in a linear regression analysis?
18. What does R-squared represent in a linear regression model?
19. Explain the difference between explanatory and response variables in the
context of linear regression.
20. How do you handle outliers in linear regression analysis?
21. Explain the role of p-values in model selection and hypothesis testing in the context of regression modelling, and discuss their potential limitations.
Typical Answers
1. A linear model is a statistical approach used to describe the relationship
between a dependent variable and one or more independent variables,
assuming a linear relationship.
2. The key assumptions of a linear regression model include linearity, independence of errors, homoscedasticity, and normality of errors.
12. Influential points are observations whose inclusion or removal substantially changes the estimated regression model parameters.
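A common way to screen for such points is Cook's distance; a minimal sketch, assuming statsmodels and synthetic data with one planted unusual observation:

```python
# Sketch: flag influential observations with Cook's distance.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)
y[0] += 15.0                                 # plant one unusual observation

results = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = results.get_influence().cooks_distance

# Rule of thumb: observations with Cook's distance above 4/n deserve a look.
flagged = np.where(cooks_d > 4 / len(x))[0]
print("Potentially influential rows:", flagged)
```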
13. Model selection in linear regression involves choosing the most appropriate
set of independent variables using techniques such as stepwise regression
or information criteria.
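To make the information-criteria route concrete, the sketch below compares two candidate models by AIC on synthetic data (lower AIC is preferred; the library and data are my choices, not the author's):

```python
# Sketch: compare candidate models with AIC; lower is better.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                   # irrelevant predictor
y = 1.0 + 2.0 * x1 + rng.normal(0, 1, n)  # y depends on x1 only

m1 = sm.OLS(y, sm.add_constant(x1)).fit()
m2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

print("AIC, x1 only:   ", m1.aic)
print("AIC, x1 and x2: ", m2.aic)         # usually higher: x2 adds nothing
```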
14. The assumption of independence of errors in linear regression means that
errors are not correlated with each other.
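One standard diagnostic for this assumption is the Durbin-Watson statistic; a minimal sketch, assuming statsmodels and synthetic data:

```python
# Sketch: Durbin-Watson test for serial correlation in the residuals.
# Values near 2 suggest uncorrelated errors; values near 0 or 4 suggest
# positive or negative autocorrelation, respectively.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
x = np.arange(100, dtype=float)
y = 1.0 + 0.3 * x + rng.normal(0, 1, 100)

results = sm.OLS(y, sm.add_constant(x)).fit()
print("Durbin-Watson statistic:", durbin_watson(results.resid))
```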
15. Categorical variables can be included in a linear regression model by converting them into dummy variables.
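A minimal sketch of dummy coding with pandas (the toy data frame and column names are invented for illustration):

```python
# Sketch: encode a categorical column as 0/1 dummy variables.
import pandas as pd

df = pd.DataFrame({
    "size":  [1.0, 2.0, 1.5, 3.0],
    "color": ["red", "blue", "red", "green"],
})

# drop_first=True avoids the dummy-variable trap (perfect collinearity
# with the intercept); dtype=float gives numeric columns for regression.
encoded = pd.get_dummies(df, columns=["color"], drop_first=True, dtype=float)
print(encoded)
```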
16. Correlation analysis measures the strength and direction of the linear relationship between two continuous variables, while regression analysis models how a response variable depends on one or more predictors and can be used for prediction.
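The distinction is easy to see side by side; a sketch using scipy on synthetic data (my choice of functions, not the author's):

```python
# Sketch: correlation yields one number; regression yields a prediction equation.
import numpy as np
from scipy.stats import pearsonr, linregress

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 100)
y = 3.0 + 1.5 * x + rng.normal(0, 2, 100)

r, _ = pearsonr(x, y)   # strength and direction, in [-1, 1]
fit = linregress(x, y)  # slope and intercept for prediction
print(f"r = {r:.3f}; predicted y = {fit.intercept:.2f} + {fit.slope:.2f} * x")
```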
17. Transforming variables in linear regression can help meet model assumptions such as linearity and normality.
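For example, a log transformation can linearize a multiplicative relationship; a sketch on synthetic data (statsmodels assumed):

```python
# Sketch: OLS on log(y) recovers a relationship that is exponential
# in the raw scale (true slope on the log scale is 0.8).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(1, 5, 100)
y = np.exp(0.8 * x) * rng.lognormal(0, 0.1, 100)  # nonlinear, skewed errors

results = sm.OLS(np.log(y), sm.add_constant(x)).fit()
print(results.params)  # slope estimate should be close to 0.8
```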
18. R-squared represents the proportion of variance in the dependent variable
explained by independent variables.
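The definition can be checked directly from its formula, R-squared = 1 - SS_res / SS_tot; a short sketch on synthetic data (library choice is mine):

```python
# Sketch: R-squared from its definition, R^2 = 1 - SS_res / SS_tot.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)

results = sm.OLS(y, sm.add_constant(x)).fit()
ss_res = np.sum(results.resid ** 2)           # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)          # total variation
print(1 - ss_res / ss_tot, results.rsquared)  # the two values agree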
19. Explanatory variables are the independent variables that explain variation in the dependent variable, while the response variable is the variable of interest being predicted.
20. Outliers in linear regression analysis can be handled by removing them,
transforming variables, or using robust regression techniques.
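As one illustration of the robust option, the sketch below fits a Huber M-estimator next to plain OLS on synthetic data with planted outliers (library and data are my choices):

```python
# Sketch: robust regression down-weights outliers instead of deleting them.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)
y[:5] += 20.0                                        # plant a few gross outliers

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()  # Huber M-estimator
print("OLS (intercept, slope):", ols.params)  # pulled toward the outliers
print("RLM (intercept, slope):", rlm.params)  # closer to the true (2.0, 0.5)
```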
21. p-values serve two main purposes in regression modelling, and they come with important limitations:
(a) Variable Selection: In stepwise regression, for instance, variables
are added or removed based on their p-values. Variables with p-values
below a certain threshold (e.g., 0.05) are typically included in the
model, while those with higher p-values are excluded. This process
continues iteratively until no further variables meet the inclusion or
exclusion criteria.
(b) Hypothesis Testing: Each coefficient in a linear regression model comes with an associated p-value, indicating the probability of observing an estimate at least as extreme as the one obtained if the true coefficient were zero (i.e., if there were no relationship between the predictor and the response variable). If a predictor's coefficient has a p-value below a significance level (e.g., 0.05), it suggests that the predictor is statistically significant and contributes to explaining the variation in the response variable.
(c) Limitations:
- Multiplicity problem: If you perform multiple hypothesis tests (e.g., testing the significance of many coefficients), the likelihood of making a Type I error (false positive) increases. This is known as the multiplicity problem, and adjustments (such as the Bonferroni correction) may be necessary.
- Overfitting: Relying solely on p-values for variable selection can lead to overfitting, where the model performs well on the training data but poorly on new data. Overfitting occurs when the model is too complex relative to the amount of available data, and it often results from including variables that are not truly related to the response variable but happen to have low p-values by chance.
- Context dependence: The interpretation of p-values depends on various factors, including sample size, effect size, and the quality of the data. A small p-value does not necessarily imply a strong practical or meaningful relationship between the predictor and the response variable.
Therefore, while p-values can be a useful tool for model selection in linear regression, it is essential to consider them alongside other criteria, such as effect size, theoretical relevance, and model performance metrics like adjusted R-squared or AIC (Akaike Information Criterion), to ensure that the selected model is both statistically significant and practically meaningful.
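To make the multiplicity point concrete, the sketch below regresses pure noise on 20 unrelated predictors (all synthetic, statsmodels assumed): at the 5% level some coefficients typically look "significant" by chance, while a Bonferroni-adjusted threshold and the fit metrics tell a more honest story:

```python
# Sketch: the multiplicity problem. With 20 irrelevant predictors we
# expect about one "significant" p-value at the 5% level by chance alone.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n, k = 100, 20
X = rng.normal(size=(n, k))  # predictors unrelated to y
y = rng.normal(size=n)       # pure noise response

results = sm.OLS(y, sm.add_constant(X)).fit()
p = results.pvalues[1:]      # skip the intercept

print("Naive 'significant' count (p < 0.05):       ", int((p < 0.05).sum()))
print("After Bonferroni correction (p < 0.05 / 20):", int((p < 0.05 / k).sum()))
print("Adjusted R-squared:", round(results.rsquared_adj, 3),
      " AIC:", round(results.aic, 1))
```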