Simple Linear Regression
Linear Regression
Labeled data is a dataset in which the target value for each record is already known.
The diagram above is an example of Simple Linear Regression, where the change in the value
of the target 'Y' is proportional to the change in the value of 'X'.
X: Independent variable.
Y: Dependent (target) variable.
https://colab.research.google.com/drive/1y9LXlz6NVX77W_zD3gdi3z19XXzEpKKd#scrollTo=rr7xAu2L2YET&printMode=true 1/7
16/08/2023, 20:32 linearregrsalaryprediction.ipynb - Colaboratory
Regression Line: the best-fit line of the model, which is used to predict the value of 'Y'
for new values of 'X'.
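For a single feature, the ordinary least squares best-fit line y = b0 + b1 * x has a closed form: b1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2) and b0 = y_mean - b1 * x_mean. A minimal NumPy sketch follows; the helper `fit_line` and the toy data are hypothetical and are not taken from the notebook's salary data.

```python
import numpy as np

def fit_line(x, y):
    """Hypothetical helper: closed-form OLS estimates for y ~ b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope = covariance(x, y) / variance(x) (the 1/n factors cancel)
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The OLS line always passes through the point (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Toy data lying exactly on y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
b0, b1 = fit_line(x, y)
print(b0, b1)  # 1.0 2.0

# Use the regression line to predict 'Y' for a new value of 'X'
y_new = b0 + b1 * 6.0
print(y_new)   # 13.0
```

This is the same least-squares criterion that scikit-learn's `LinearRegression` minimises; the closed form is shown only to make the "best-fit" idea concrete.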
Linear regression makes several key assumptions about the data and the relationships it
models. Violations of these assumptions can affect the validity and reliability of the
regression results. Here are the main assumptions of linear regression:
Linearity: The relationship between the independent variable(s) and the dependent
variable is linear. This means that the change in the dependent variable for a unit
change in the independent variable is constant.
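A quick numeric illustration of the linearity idea (not a formal test): under a linear relationship the change in y per unit step of x is constant, and Pearson's correlation coefficient is close to ±1. The data below is hypothetical. Note that Pearson's r only measures linear association, so a perfect but curved relationship can still give r close to 0.

```python
import numpy as np

x = np.arange(-3, 4, dtype=float)  # -3, -2, ..., 3
y_linear = 2 * x + 1               # change of 2 in y per unit change in x
y_curved = x ** 2                  # change in y per unit of x is not constant

# Per-unit changes: constant for the linear case, varying for the curved case
print(np.diff(y_linear))  # [2. 2. 2. 2. 2. 2.]
print(np.diff(y_curved))

# Pearson correlation between x and each response
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_curved = np.corrcoef(x, y_curved)[0, 1]
print(r_linear, r_curved)
```

In practice, a scatter plot of the data or of the residuals against the fitted values is the usual way to spot violations of linearity.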
No Perfect Collinearity: Perfect collinearity exists when one independent variable can
be perfectly predicted by a linear combination of other independent variables. This
situation leads to a rank-deficient matrix, making it impossible to estimate unique
regression coefficients.
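The rank deficiency can be seen directly in NumPy. In this hypothetical design matrix, one column is an exact multiple of another, so the matrix has 3 columns but rank 2, and two different coefficient vectors reproduce the same fitted values. That is why no unique solution exists.

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 3 * x1                      # x2 is an exact linear function of x1
intercept = np.ones_like(x1)

X_ok = np.column_stack([intercept, x1])       # full column rank
X_bad = np.column_stack([intercept, x1, x2])  # perfectly collinear columns

print(np.linalg.matrix_rank(X_ok))   # 2 (equals the number of columns)
print(np.linalg.matrix_rank(X_bad))  # 2, although X_bad has 3 columns

# Two different coefficient vectors give identical predictions for X_bad:
# 1 + 2*x1 + 0*x2 and 1 - 1*x1 + 1*x2 are the same function, because x2 = 3*x1
y = 1 + 2 * x1
b_a = np.array([1.0, 2.0, 0.0])
b_b = np.array([1.0, -1.0, 1.0])
print(np.allclose(X_bad @ b_a, y), np.allclose(X_bad @ b_b, y))  # True True
```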
data.shape
(30, 3)
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Unnamed: 0       30 non-null     int64
 1   YearsExperience  30 non-null     float64
 2   Salary           30 non-null     float64
dtypes: float64(2), int64(1)
memory usage: 848.0 bytes
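An `Unnamed: 0` column is what pandas typically produces when a CSV that was written together with its row index is read back. Assuming that is the case here, the sketch below shows two ways to avoid carrying it around. The CSV text is hypothetical and stands in for the notebook's file, whose name is not shown.

```python
import io
import pandas as pd

# Hypothetical CSV written by DataFrame.to_csv() with the default index=True:
# the first header cell is empty, and the first column holds the old row index.
csv_text = """,YearsExperience,Salary
0,1.0,40000.0
1,2.0,50000.0
2,3.0,60000.0
"""

df_raw = pd.read_csv(io.StringIO(csv_text))
print(df_raw.columns.tolist())  # ['Unnamed: 0', 'YearsExperience', 'Salary']

# Option 1: use the first column as the index while reading
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Option 2: drop the redundant column after reading
df2 = df_raw.drop(columns=["Unnamed: 0"])

print(df.columns.tolist())   # ['YearsExperience', 'Salary']
print(df2.columns.tolist())  # ['YearsExperience', 'Salary']
```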
data.isna().sum()
# So, no null values present
Unnamed: 0         0
YearsExperience    0
Salary             0
dtype: int64
data.duplicated()
# No duplicates present
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 False
13 False
14 False
15 False
16 False
17 False
18 False
19 False
20 False
21 False
22 False
23 False
24 False
25 False
26 False
27 False
28 False
29 False
dtype: bool
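Summing the boolean mask gives the duplicate count in one number. A caveat worth checking: `duplicated()` compares all columns, so if `Unnamed: 0` holds a unique row index, no row can ever be flagged. Restricting the comparison with `subset` avoids that. The frame below is hypothetical.

```python
import pandas as pd

# Hypothetical frame: rows 1 and 2 have identical feature and target values,
# but the index-like first column differs between them.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "YearsExperience": [1.0, 2.0, 2.0],
    "Salary": [40000.0, 50000.0, 50000.0],
})

# All columns compared: the unique index column hides the repeated row
print(int(df.duplicated().sum()))  # 0

# Only the meaningful columns compared: the repeated row is detected
print(int(df.duplicated(subset=["YearsExperience", "Salary"]).sum()))  # 1
```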
data.describe()
Model Fitting:
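from sklearn.linear_model import LinearRegression
# Fit an ordinary least squares line on the training split
regressor = LinearRegression()
regressor.fit(x_train, y_train)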
LinearRegression()
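The cell that creates `x_train`, `x_test`, `y_train` and `y_test` is not shown in this export. The 6 predictions below, out of 30 rows, are consistent with a 20% test split, so the sketch uses `test_size=0.2`; the synthetic data and `random_state=42` are arbitrary assumptions, not the notebook's values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with 30 rows, like the notebook's dataset
rng = np.random.default_rng(0)
years = rng.uniform(1, 10, size=30)
salary = 25000 + 9000 * years + rng.normal(0, 5000, size=30)

x = years.reshape(-1, 1)   # scikit-learn expects a 2-D feature matrix
y = salary.reshape(-1, 1)  # a 2-D target yields 2-D predictions, as in the output below

# 20% of 30 rows -> 6 test rows; random_state is arbitrary here
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

regressor = LinearRegression()
regressor.fit(x_train, y_train)
y_pred = regressor.predict(x_test)
print(x_train.shape, x_test.shape, y_pred.shape)  # (24, 1) (6, 1) (6, 1)
```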
y_pred = regressor.predict(x_test)
y_pred
array([[39297.22202233],
       [75603.43359409],
       [37386.36878171],
       [60316.60766914],
       [63182.88753007],
       [52673.19470666]])
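Each prediction is just a point on the fitted regression line: `intercept_ + x @ coef_`. The sketch below checks this on hypothetical data lying exactly on y = 1 + 2x, using a 1-D target so that `coef_` has shape (1,).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data on the line y = 1 + 2x
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression().fit(x, y)
print(model.intercept_, model.coef_)  # approximately 1.0 and [2.0]

# Reproduce predict() by hand from the learned line
x_new = np.array([[10.0]])
manual = model.intercept_ + x_new @ model.coef_
print(np.allclose(manual, model.predict(x_new)))  # True
```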
36064238.493955195
# R² score
from sklearn.metrics import r2_score
r2 = r2_score(y_test, y_pred)
r2
0.8143022783109011
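R² is defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the true values from their mean. A score of about 0.814 therefore means the line accounts for roughly 81% of the variance in the test-set salaries. The sketch below computes the formula by hand on hypothetical values and compares it with `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # 0.25 + 0.25 + 0 + 1 = 1.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # mean 6: 9 + 1 + 1 + 9 = 20
r2_manual = 1 - ss_res / ss_tot                 # 1 - 1.5 / 20 = 0.925

print(r2_manual, r2_score(y_true, y_hat))
```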
5392.453356511894
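Regression error is commonly reported as mean squared error (MSE), root mean squared error (RMSE, in the same units as the target) and mean absolute error (MAE). A sketch on hypothetical values using `sklearn.metrics`; RMSE is taken with `np.sqrt` so the example does not depend on any version-specific keyword.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

mse = mean_squared_error(y_true, y_hat)   # (0.25 + 0.25 + 0 + 1) / 4 = 0.375
rmse = np.sqrt(mse)                       # square root of MSE, in target units
mae = mean_absolute_error(y_true, y_hat)  # (0.5 + 0.5 + 0 + 1) / 4 = 0.5

print(mse, rmse, mae)
```

RMSE is always at least as large as MAE, because squaring weights large errors more heavily.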