Team8 Lab3
REPORT LAB 3
STATISTICAL ANALYSIS
Table of contents
Task 1: Multivariable Linear Regression
What is Multivariable Linear Regression?
How is Multivariable Linear Regression used?
Why does Multivariable Linear Regression work like that?
Example for Multivariable Linear Regression
Task 2: Perform Multivariable Linear Regression with data file “Colleges and
Universities”
By using Microsoft Excel
By using R language
By using Python language
_______________________________
Task
Trần Ngọc Xuân Mai – 21522322
- Explain what Multivariable Linear Regression is and give an example
- Use Python language to perform Multivariable Linear Regression with data file “Colleges and Universities”
Trần Thị Mĩ Tiên – 21522674
- Explain how Multivariable Linear Regression is used
- Use Excel to perform Multivariable Linear Regression with data file “Colleges and Universities”
Vũ Thị Thanh Xuân – 21522816
- Explain why we use Multivariable Linear Regression and give an example
- Use R language to perform Multivariable Linear Regression with data file “Colleges and Universities”
Task 1
Explanation (What, How and Why) and example of Multivariable Linear
Regression.
1. What is Multivariable Linear Regression?
Multiple regression, also known as multiple linear regression (MLR), is a
statistical technique that uses two or more explanatory variables to predict the
outcome of a response variable. It describes the relationship between several
independent variables and one dependent variable. The independent variables
serve as predictor variables, while the single dependent variable serves as the
criterion variable. The technique is used in a variety of contexts, studies and
disciplines, including econometrics and financial inference. [1]
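As a rough illustration of the definition above, the following sketch fits a model with two explanatory variables and uses it to predict the response for one new observation. The data are synthetic and the names `x1`, `x2`, `y` and `new_obs` are assumptions made for this example only; they do not come from the report's data file.

```python
import numpy as np

# Synthetic data (assumption for illustration): y depends on x1 and x2.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(0, 0.5, n)

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones(n), x1, x2])

# Least-squares estimate of [b0, b1, b2].
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict the response for a new observation with x1 = 4 and x2 = 2.
new_obs = np.array([1.0, 4.0, 2.0])  # leading 1 multiplies the intercept
predicted = new_obs @ coef

print("coefficients [b0, b1, b2]:", coef)
print("predicted response:", predicted)
```

The estimated coefficients should land near the values used to generate the data (2.0, 1.5 and -0.8), which is a quick sanity check that the fit behaves as expected.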
4. Example of Multivariable Linear Regression.
Suppose you want to predict a student's final exam score (Y) based on several factors
(X1, X2, X3):
Task 2
a) Using MS Excel, R language and Python language to perform Multivariable Linear
Regression with data file: Colleges and Universities
Explain the problem:
Show Multivariable Linear Regression by Excel, R, Python with data file: Colleges and
Universities
MS Excel
Multiple R (Multiple Correlation Coefficient): Multiple R measures the strength of the
linear relationship between the set of independent variables and the dependent variable.
It equals the correlation between the observed and predicted values and ranges from 0 to
1, so it does not indicate direction. In this case, it is approximately 0.731, which
indicates a moderately strong linear relationship.
Standard Error (Standard Error of the Estimate): The standard error estimates the
standard deviation of the residuals (the differences between observed and predicted
values), in the units of the dependent variable. In this case, it is approximately 5.308.
A lower standard error indicates that the model's predictions are closer to the actual
values.
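The two quantities above can be reproduced by hand from the fitted values. The sketch below is a minimal illustration on synthetic data, not the Excel output itself: it uses 49 observations as in the report, while the choice of k = 4 predictors and all variable names are assumptions.

```python
import numpy as np

# Synthetic data (assumption): n = 49 observations, k = 4 predictors.
rng = np.random.default_rng(1)
n, k = 49, 4
X_raw = rng.normal(size=(n, k))
y = 1.0 + X_raw @ np.array([0.5, -0.3, 0.2, 0.1]) + rng.normal(0, 1.0, n)

# Fit by least squares with an intercept column.
X = np.column_stack([np.ones(n), X_raw])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef
residuals = y - y_hat

# Multiple R: correlation between observed and predicted values.
multiple_r = np.corrcoef(y, y_hat)[0, 1]

# Standard error of the estimate: sqrt(SSE / (n - k - 1)).
sse = np.sum(residuals ** 2)
standard_error = np.sqrt(sse / (n - k - 1))

print("Multiple R:", multiple_r)
print("Standard Error:", standard_error)
```

For a least-squares fit with an intercept, squaring Multiple R gives R Square, which is why the two values always move together in the regression summary.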
We will calculate:
We will have:
Then R Square =
Adjusted R Square is a modified form of R-squared that takes into account the number of
predictors in the model [1 with editing]. This metric becomes relevant when there are multiple
independent variables in the analysis.
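A small sketch of the usual adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - k - 1), may help here. The function name `adjusted_r_squared` is hypothetical. The inputs use the R Square of 0.53 and the 49 observations reported in this document; k = 4 predictors is an assumption inferred from the five-row coefficient matrix described in the Python section.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k predictors (intercept not counted)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# R Square = 0.53 and n = 49 come from the report; k = 4 is an assumption.
adj = adjusted_r_squared(0.53, n=49, k=4)
print(round(adj, 3))  # 0.487
```

Because the penalty factor (n - 1)/(n - k - 1) grows with k, adding a predictor that barely raises R Square can lower the adjusted value.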
Standard Error represents the degree of variation between the observed and predicted
values of the dependent variable (Y) [2 with editing].
Observations refer to the sample size in the dataset. In Figure 1, there are 49 samples.
The Null Hypothesis (H0) asserts that βi = 0, meaning that the independent variable Xi
has no linear effect on Y once the other variables are in the model, i.e. Xi is not
statistically significant [3 with editing].
A smaller p-value indicates stronger evidence against the null hypothesis. In other words,
if the p-value is very low (typically below a chosen significance level, such as 0.05), it
suggests that the data provide strong evidence to reject the null hypothesis in favor of the
alternative hypothesis. [chatgpt]
Conversely, a larger p-value (above the chosen significance level) suggests that the data
do not provide strong evidence against the null hypothesis, so there is no reason to
reject it. [chatgpt]
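To make the hypothesis test concrete, here is a sketch of how the t statistic and two-sided p-value for each coefficient could be computed by hand. It assumes SciPy is available for the t distribution and uses synthetic data in which `x2` is deliberately unrelated to `y`; none of the names or numbers come from the report's spreadsheet.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumption): x1 drives y, x2 has no real effect.
rng = np.random.default_rng(2)
n, k = 49, 2
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 2.0 * x1 + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef

# Residual variance with n - k - 1 degrees of freedom.
dof = n - k - 1
sigma2 = residuals @ residuals / dof

# Standard errors: square roots of the diagonal of sigma^2 * (X^T X)^-1.
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# t statistic for H0: beta_i = 0, and the two-sided p-value.
t_stat = coef / std_err
p_values = 2 * stats.t.sf(np.abs(t_stat), dof)

for name, p in zip(["intercept", "x1", "x2"], p_values):
    print(f"{name}: p-value = {p:.4f}")
```

The p-value for `x1` should come out far below 0.05, so H0 is rejected for that coefficient, whereas the p-value for `x2` is decided only by chance in the sample.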
Link: https://docs.google.com/spreadsheets/d/1tcpSCgSfvbSoS0SftGbSt4D4ELyL294s/edit?usp=sharing&ouid=107710634019894140302&rtpof=true&sd=true
R language
Link R: https://drive.google.com/file/d/1AOtt4LQ2plLRcQ82Wp7FJUbJ0AEP1LS_/view?usp=sharing
Python language
1. Import the necessary libraries:
2. Load the "Colleges and Universities" data file into a Pandas DataFrame:
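The original code for these two steps is not reproduced in this report, so the following is only a sketch of what they might look like. The file name, sheet layout and column names are not given here; the inline CSV text and the names `school`, `x1`, `x2` and `y` are placeholders chosen so that the example runs on its own.

```python
import io

import pandas as pd

# Placeholder data standing in for the real file (assumption): the actual
# file name and column names are not stated in this report.
csv_text = """school,x1,x2,y
A,1200,0.30,85
B,1100,0.45,78
C,1300,0.25,90
"""

# Load the data into a Pandas DataFrame. With a real file on disk, a call
# such as pd.read_csv(<path>) or pd.read_excel(<path>) would be used instead.
df = pd.read_csv(io.StringIO(csv_text))

# Separate the independent variables (X) from the dependent variable (y).
X = df[["x1", "x2"]]
y = df["y"]

print(df.head())
```

Keeping the predictors in a DataFrame `X` and the response in a Series `y` matches the input shape that regression libraries such as scikit-learn expect.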
5. To print the results, retrieve the common values for the linear regression
model using the scikit-learn API.
R Square is 0.53
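As a hedged sketch of step 5, the usual values can be read from a fitted scikit-learn `LinearRegression` through its `intercept_` and `coef_` attributes and its `score` method, which returns R Square. The data below are synthetic (49 rows and 4 predictors are assumptions), so the printed numbers will not match the 0.53 reported above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the dataset (assumption): 49 rows, 4 predictors.
rng = np.random.default_rng(3)
X = rng.normal(size=(49, 4))
y = 10.0 + X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(0, 1.0, 49)

# Fit the model; an intercept is estimated by default.
model = LinearRegression().fit(X, y)

# Retrieve the common results.
r_square = model.score(X, y)  # coefficient of determination on the training data
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)
print("R Square:", r_square)
```

Note that `score` evaluates R Square on whatever data is passed in, so calling it on the training data reproduces the value shown in an Excel or R regression summary.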
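- Explain:
+ We are given the formula y = b0 + b1x1 + b2x2 + ... + bkxk, where y depends on k variables
x1, x2, ..., xk, and we take a sample with n observations. Here, b0 represents the intercept
term, while b1, b2, ..., bk are the regression coefficients for the independent variables
x1, x2, ..., xk.
+ To determine the regression equation, we arrange the data into two matrices X and Y.
· Matrix X (n rows and k + 1 columns; the leading column of ones corresponds to b0):
$$X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}$$
· Transpose of matrix X: $X^T$, with k + 1 rows and n columns.
· Matrix Y (n rows and 1 column):
$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$
· Matrix C has 5 rows and 1 column and contains the coefficients b0, b1, ..., b4. It is obtained with the formula:
$$C = (X^T X)^{-1} X^T Y$$
· The formula used to calculate the coefficient of determination R² is as follows:
$$R^2 = \frac{SSR}{SST}$$
SSR: sum of squared deviations of the predicted values from the mean of the observed values.
SST: sum of squared deviations of the observed values from their mean.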
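The matrix steps above can be sketched in NumPy as follows, assuming the standard least-squares solution C = (XᵀX)⁻¹XᵀY and R² = SSR/SST. The data are synthetic; only the shape (49 observations, 4 predictors, hence 5 coefficients) follows this report, and every variable name is chosen for the example.

```python
import numpy as np

# Synthetic data (assumption): n = 49 observations, k = 4 predictors.
rng = np.random.default_rng(4)
n, k = 49, 4
predictors = rng.normal(size=(n, k))
Y = 5.0 + predictors @ np.array([1.0, 0.5, -1.5, 2.0]) + rng.normal(0, 1.0, n)

# Matrix X: a column of ones followed by the k predictor columns.
X = np.column_stack([np.ones(n), predictors])  # shape (n, k + 1)

# Matrix C: solve (X^T X) C = X^T Y, which gives the same result as
# (X^T X)^-1 X^T Y without forming the inverse explicitly.
XtX = X.T @ X
XtY = X.T @ Y
C = np.linalg.solve(XtX, XtY)  # 5 coefficients: b0, b1, ..., b4

# R^2 = SSR / SST.
Y_hat = X @ C
ssr = np.sum((Y_hat - Y.mean()) ** 2)  # predicted values vs. mean
sst = np.sum((Y - Y.mean()) ** 2)      # observed values vs. mean
r_squared = ssr / sst

print("C =", C)
print("R^2 =", r_squared)
```

Solving the linear system rather than inverting XᵀX is a design choice made here for numerical stability; both routes give the same coefficients when the predictors are not collinear.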
Link Python: https://drive.google.com/file/d/1DlK2WjYnfzUVJd6ARScn6yDQtPNjDmS2/view?usp=sharing
REFERENCES