Student - Linear Regression Example - Colaboratory
Saving Salary_Data.csv to Salary_Data.csv
dataset.describe()
       YearsExperience         Salary
count        30.000000      30.000000
mean          5.313333   76003.000000
std           2.837888   27414.429785
min           1.100000   37731.000000
25%           3.200000   56720.750000
50%           4.700000   65237.000000
75%           7.700000  100544.750000
max          10.500000  122391.000000
# Mounting Google Drive
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount
# Importing the dataset
# dataset = pd.read_csv('/content/drive/My Drive/ATAL/Salary_Data.csv')
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-6-242e04d314aa> in <module>()
      1 # Importing the dataset
----> 2 dataset = pd.read_csv('/content/drive/My Drive/ATAL/Salary_Data.csv')

4 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   2008         kwds["usecols"] = self.usecols
   2009
-> 2010         self._reader = parsers.TextReader(src, **kwds)
   2011         self.unnamed_cols = self._reader.unnamed_cols
   2012

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/My Drive/ATAL/Salary_Data.csv'
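The FileNotFoundError above means the Drive path did not point at an existing file when the cell ran. One way to make the import cell more forgiving is to try several locations in order and read the first one that exists. This is only a sketch: the helper name load_salary_data and the CANDIDATES list are hypothetical, and the two paths are simply the ones that appear in this notebook (the Drive path and the file saved by the upload widget).

```python
from pathlib import Path

import pandas as pd


def load_salary_data(candidates):
    """Read the first existing CSV among `candidates` into a DataFrame."""
    for path in map(Path, candidates):
        if path.is_file():
            return pd.read_csv(path)
    # Nothing matched: report every location that was tried.
    tried = ", ".join(str(p) for p in candidates)
    raise FileNotFoundError(f"No such file in any of: {tried}")


# Hypothetical search order: the Drive copy first, then the uploaded copy
# in the current working directory.
CANDIDATES = [
    "/content/drive/My Drive/ATAL/Salary_Data.csv",
    "Salary_Data.csv",
]
```

In a notebook cell this would be called as dataset = load_salary_data(CANDIDATES), which still raises FileNotFoundError if neither copy is present.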
print(dataset)
YearsExperience Salary
0 1.1 39343.0
1 1.3 46205.0
2 1.5 37731.0
3 2.0 43525.0
4 2.2 39891.0
5 2.9 56642.0
6 3.0 60150.0
7 3.2 54445.0
8 3.2 64445.0
9 3.7 57189.0
10 3.9 63218.0
11 4.0 55794.0
12 4.0 56957.0
13 4.1 57081.0
14 4.5 61111.0
15 4.9 67938.0
16 5.1 66029.0
17 5.3 83088.0
18 5.9 81363.0
19 6.0 93940.0
20 6.8 91738.0
21 7.1 98273.0
22 7.9 101302.0
23 8.2 113812.0
24 8.7 109431.0
25 9.0 105582.0
26 9.5 116969.0
27 9.6 112635.0
28 10.3 122391.0
29 10.5 121872.0
dataset.shape
(30, 2)
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 YearsExperience 30 non-null float64
1 Salary 30 non-null float64
dtypes: float64(2)
memory usage: 608.0 bytes
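The cell that builds X and y is not visible in this export, although both are printed next. A minimal sketch of how they could be derived from the DataFrame, assuming the last column is the target. The five rows are the first five of the dataset printed above, used as a stand-in for the full file, and the integer cast is an assumption made only because the printed y shows no decimal points.

```python
import pandas as pd

# First five rows of the dataset printed above (stand-in for the full CSV).
dataset = pd.DataFrame({
    "YearsExperience": [1.1, 1.3, 1.5, 2.0, 2.2],
    "Salary": [39343.0, 46205.0, 37731.0, 43525.0, 39891.0],
})

# Features: every column except the last, kept 2-D as scikit-learn expects.
X = dataset.iloc[:, :-1].values

# Target: the last column as a 1-D array. The int cast is an assumption.
y = dataset.iloc[:, -1].values.astype(int)

print(X.shape, y.shape)
```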
print(X)
[[ 1.1]
[ 1.3]
[ 1.5]
[ 2. ]
[ 2.2]
[ 2.9]
[ 3. ]
[ 3.2]
[ 3.2]
[ 3.7]
[ 3.9]
[ 4. ]
[ 4. ]
[ 4.1]
[ 4.5]
[ 4.9]
[ 5.1]
[ 5.3]
[ 5.9]
[ 6. ]
[ 6.8]
[ 7.1]
[ 7.9]
[ 8.2]
[ 8.7]
[ 9. ]
[ 9.5]
[ 9.6]
[10.3]
[10.5]]
print(y)
[ 39343 46205 37731 43525 39891 56642 60150 54445 64445 57189
63218 55794 56957 57081 61111 67938 66029 83088 81363 93940
91738 98273 101302 113812 109431 105582 116969 112635 122391 121872]
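# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
# The call is missing from this export: test_size = 1/3 matches the 20/10 rows printed below; random_state = 0 is an assumption.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)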
print(X_train)
[[ 2.9]
[ 5.1]
[ 3.2]
[ 4.5]
[ 8.2]
[ 6.8]
[ 1.3]
[10.5]
[ 3. ]
[ 2.2]
[ 5.9]
[ 6. ]
[ 3.7]
[ 3.2]
[ 9. ]
[ 2. ]
[ 1.1]
[ 7.1]
[ 4.9]
[ 4. ]]
print(X_test)
[[ 1.5]
[10.3]
[ 4.1]
[ 3.9]
[ 9.5]
[ 8.7]
[ 9.6]
[ 4. ]
[ 5.3]
[ 7.9]]
print(y_test)
[ 37731 122391 57081 63218 116969 109431 112635 55794 83088 101302]
print(y_train)
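The prediction cell that follows uses a regressor object, but the cell that creates and fits it does not appear in this export. A minimal sketch of that step, assuming scikit-learn's LinearRegression was used. X and y are copied from the printed arrays earlier in the notebook; the split parameters are assumptions (test_size=1/3 reproduces the 20/10 row counts, random_state=0 is a guess).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# X and y as printed earlier in this notebook.
X = np.array([
    1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
    3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
    6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5,
]).reshape(-1, 1)
y = np.array([
    39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445, 57189,
    63218, 55794, 56957, 57081, 61111, 67938, 66029, 83088, 81363, 93940,
    91738, 98273, 101302, 113812, 109431, 105582, 116969, 112635, 122391, 121872,
])

# Assumed split parameters (not stated in the export).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)

# Ordinary least squares fit of Salary on YearsExperience.
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predictions for the held-out rows, used later for the error metrics.
y_pred = regressor.predict(X_test)
print("slope:", regressor.coef_[0], "intercept:", regressor.intercept_)
```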
YearsExperience = 10
# predict() returns a one-element array; take element 0 before formatting
print("Salary for given Years of Experience is : %.f" % regressor.predict([[YearsExperience]])[0])
MAE 3426
RMSE 59
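The cell that produced these two numbers is not in the export. On the same predictions RMSE can never be smaller than MAE, so an RMSE of 59 beside an MAE of 3426 suggests the square root was applied to the MAE (√3426 ≈ 58.5) rather than to the mean squared error. A sketch of the usual calculation with scikit-learn follows; the three-value arrays are made-up illustrative numbers, not results from this notebook, and would be replaced by y_test and the model's predictions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Illustrative values only (hypothetical, not taken from the notebook).
y_true = np.array([100.0, 200.0, 300.0])
y_hat = np.array([110.0, 190.0, 330.0])

# MAE: mean of the absolute errors.
mae = mean_absolute_error(y_true, y_hat)

# RMSE: square root of the mean SQUARED error, not of the MAE.
rmse = np.sqrt(mean_squared_error(y_true, y_hat))

print("MAE %.2f" % mae)
print("RMSE %.2f" % rmse)
```

With real data, a quick sanity check is that rmse >= mae always holds; a violation points to a mistake in how one of the two was computed.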