ML Assignment 3
import pandas as pd
df = pd.read_csv('temperatures.csv')
df
YEAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC ANNUAL JAN-FEB MAR-MAY JUN-SEP OCT-DEC
0 1901 22.40 24.14 29.07 31.91 33.41 33.18 31.21 30.39 30.47 29.97 27.31 24.49 28.96 23.27 31.46 31.27 27.25
1 1902 24.93 26.58 29.77 31.78 33.73 32.91 30.92 30.73 29.80 29.12 26.31 24.04 29.22 25.75 31.76 31.09 26.49
2 1903 23.44 25.03 27.83 31.39 32.91 33.00 31.34 29.98 29.85 29.04 26.08 23.65 28.47 24.24 30.71 30.92 26.26
3 1904 22.50 24.73 28.21 32.02 32.64 32.07 30.36 30.09 30.04 29.20 26.36 23.63 28.49 23.62 30.95 30.66 26.40
4 1905 22.00 22.83 26.68 30.01 33.32 33.25 31.44 30.68 30.12 30.67 27.52 23.82 28.30 22.25 30.00 31.33 26.57
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
112 2013 24.56 26.59 30.62 32.66 34.46 32.44 31.07 30.76 31.04 30.27 27.83 25.37 29.81 25.58 32.58 31.33 27.83
113 2014 23.83 25.97 28.95 32.74 33.77 34.15 31.85 31.32 30.68 30.29 28.05 25.08 29.72 24.90 31.82 32.00 27.81
114 2015 24.58 26.89 29.07 31.87 34.09 32.48 31.88 31.52 31.55 31.04 28.10 25.67 29.90 25.74 31.68 31.87 28.27
115 2016 26.94 29.72 32.62 35.38 35.72 34.03 31.64 31.79 31.66 31.98 30.11 28.01 31.63 28.33 34.57 32.28 30.03
116 2017 26.45 29.46 31.60 34.95 35.84 33.82 31.88 31.72 32.22 32.29 29.60 27.18 31.42 27.95 34.13 32.41 29.69
df.head()
YEAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC ANNUAL JAN-FEB MAR-MAY JUN-SEP OCT-DEC
0 1901 22.40 24.14 29.07 31.91 33.41 33.18 31.21 30.39 30.47 29.97 27.31 24.49 28.96 23.27 31.46 31.27 27.25
1 1902 24.93 26.58 29.77 31.78 33.73 32.91 30.92 30.73 29.80 29.12 26.31 24.04 29.22 25.75 31.76 31.09 26.49
2 1903 23.44 25.03 27.83 31.39 32.91 33.00 31.34 29.98 29.85 29.04 26.08 23.65 28.47 24.24 30.71 30.92 26.26
3 1904 22.50 24.73 28.21 32.02 32.64 32.07 30.36 30.09 30.04 29.20 26.36 23.63 28.49 23.62 30.95 30.66 26.40
4 1905 22.00 22.83 26.68 30.01 33.32 33.25 31.44 30.68 30.12 30.67 27.52 23.82 28.30 22.25 30.00 31.33 26.57
df.describe()
YEAR JAN FEB MAR APR MAY JUN JUL AUG SEP ...
count 117.000000 117.000000 117.000000 117.000000 117.000000 117.000000 117.000000 117.000000 117.000000 117.000000 ...
mean 1959.000000 23.687436 25.597863 29.085983 31.975812 33.565299 32.774274 31.035897 30.507692 30.486752 ...
std 33.919021 0.834588 1.150757 1.068451 0.889478 0.724905 0.633132 0.468818 0.476312 0.544295 ...
min 1901.000000 22.000000 22.830000 26.680000 30.010000 31.930000 31.100000 29.760000 29.310000 29.070000 ...
25% 1930.000000 23.100000 24.780000 28.370000 31.460000 33.110000 32.340000 30.740000 30.180000 30.120000 ...
50% 1959.000000 23.680000 25.480000 29.040000 31.950000 33.510000 32.730000 31.000000 30.540000 30.520000 ...
75% 1988.000000 24.180000 26.310000 29.610000 32.420000 34.030000 33.180000 31.330000 30.760000 30.810000 ...
max 2017.000000 26.940000 29.720000 32.620000 35.380000 35.840000 34.480000 32.760000 31.840000 32.220000 ...
https://colab.research.google.com/drive/1L9zJu37fpdH7-NNEF-eDo8yJhKRP_GR9?authuser=0#scrollTo=jAOEaTids7wx&printMode=true
9/19/24, 1:25 AM Assign3.ipynb - Colab
df.tail()
YEAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC ANNUAL JAN-FEB MAR-MAY JUN-SEP OCT-DEC
112 2013 24.56 26.59 30.62 32.66 34.46 32.44 31.07 30.76 31.04 30.27 27.83 25.37 29.81 25.58 32.58 31.33 27.83
113 2014 23.83 25.97 28.95 32.74 33.77 34.15 31.85 31.32 30.68 30.29 28.05 25.08 29.72 24.90 31.82 32.00 27.81
114 2015 24.58 26.89 29.07 31.87 34.09 32.48 31.88 31.52 31.55 31.04 28.10 25.67 29.90 25.74 31.68 31.87 28.27
115 2016 26.94 29.72 32.62 35.38 35.72 34.03 31.64 31.79 31.66 31.98 30.11 28.01 31.63 28.33 34.57 32.28 30.03
116 2017 26.45 29.46 31.60 34.95 35.84 33.82 31.88 31.72 32.22 32.29 29.60 27.18 31.42 27.95 34.13 32.41 29.69
df.shape
(117, 18)
df.isnull().any()  # True for any column that contains a missing value
YEAR False
JAN False
FEB False
MAR False
APR False
MAY False
JUN False
JUL False
AUG False
SEP False
OCT False
NOV False
DEC False
ANNUAL False
JAN-FEB False
MAR-MAY False
JUN-SEP False
OCT-DEC False
df.isnull().sum()
YEAR 0
JAN 0
FEB 0
MAR 0
APR 0
MAY 0
JUN 0
JUL 0
AUG 0
SEP 0
OCT 0
NOV 0
DEC 0
ANNUAL 0
JAN-FEB 0
MAR-MAY 0
JUN-SEP 0
OCT-DEC 0
import matplotlib.pyplot as plt
x = df["YEAR"]    # feature: year
y = df["ANNUAL"]  # target: annual temperature
plt.plot(x, y, 'o')
[<matplotlib.lines.Line2D at 0x7b7259bea350>]
import seaborn as sns
sns.scatterplot(data=df, x="YEAR", y="ANNUAL")
from sklearn.linear_model import LinearRegression
model = LinearRegression()
type(x)
pandas.core.series.Series
x.shape
(117,)
x = x.values          # pandas Series -> NumPy array
x = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix of shape (n_samples, 1)
x.shape
(117, 1)
type(x)
numpy.ndarray
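The cell that creates x_train, x_test, y_train and y_test does not appear in this export. The following is a minimal sketch of such a split, assuming scikit-learn's train_test_split was used. The values test_size=0.25 and random_state=42 are assumptions, not the notebook's settings; test_size=0.25 is chosen only because it gives 30 test rows out of 117, which agrees with the y_pred shape printed later. The target here is a placeholder array so that the snippet runs on its own, and the _demo names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Years 1901..2017 as a 2-D feature matrix of shape (117, 1), like x above.
x_demo = np.arange(1901, 2018).reshape(-1, 1)
# Placeholder target (all zeros) standing in for df["ANNUAL"]; only shapes matter here.
y_demo = np.zeros(len(x_demo))

# test_size and random_state are assumed values, not taken from the notebook.
x_train_demo, x_test_demo, y_train_demo, y_test_demo = train_test_split(
    x_demo, y_demo, test_size=0.25, random_state=42
)

print(f"x train dataset: {x_train_demo.shape}")
print(f"y train dataset: {y_train_demo.shape}")
print(f"x test dataset: {x_test_demo.shape}")
print(f"y test dataset: {y_test_demo.shape}")
```

Fixing random_state makes the split reproducible between runs; without it, the fitted coefficients and the metrics change each time the cell is executed.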
print(f"x test dataset: {x_test.shape}")
print(f"y test dataset: {y_test.shape}")
model = LinearRegression()
model.fit(x_train,y_train)
LinearRegression()
model.coef_ #w
array([0.01279507])
model.intercept_ #b
4.1011851987150685
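With these two values the fitted line is ANNUAL ≈ w * YEAR + b. As a quick check, the line can be evaluated by hand for a single year using the coefficient and intercept printed above. The helper name predict_annual is illustrative and not part of the notebook.

```python
w = 0.01279507          # model.coef_[0] as printed above (display-rounded by NumPy)
b = 4.1011851987150685  # model.intercept_ as printed above

def predict_annual(year):
    """Evaluate the fitted line w * year + b for a single year."""
    return w * year + b

print(round(predict_annual(2017), 2))  # 29.91
```

For comparison, the ANNUAL value recorded for 2017 in the data is 31.42, so the straight line underestimates the most recent years.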
y_pred = model.predict(x_test)
y_pred.shape
(30,)
sns.regplot(x=x_train.ravel(), y=y_train)  # regplot needs 1-D inputs; data=df is unused when arrays are passed
<Axes: ylabel='ANNUAL'>
MSE: 0.1972410753986664
MAE: 0.30463888560251223
R-squared: 0.48700463368609614
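The cell that printed these metrics is not in the export. The sketch below shows how such values are commonly computed with sklearn.metrics. It is applied to the ten (YEAR, ANNUAL) rows visible in the df output, with predictions taken from the fitted line above, because the actual test split is not available here. The numbers it prints will therefore differ from the test-set values reported above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# The ten (YEAR, ANNUAL) pairs visible in the df output; NOT the actual test split.
years = np.array([1901, 1902, 1903, 1904, 1905, 2013, 2014, 2015, 2016, 2017])
annual = np.array([28.96, 29.22, 28.47, 28.49, 28.30, 29.81, 29.72, 29.90, 31.63, 31.42])

# Predictions from the fitted line, using the printed coefficient and intercept.
predicted = 0.01279507 * years + 4.1011851987150685

mse = mean_squared_error(annual, predicted)   # mean of squared residuals
mae = mean_absolute_error(annual, predicted)  # mean of absolute residuals
r2 = r2_score(annual, predicted)              # 1 - SS_res / SS_tot

print(f"MSE: {mse}")
print(f"MAE: {mae}")
print(f"R-squared: {r2}")
```

In the notebook itself the same three calls would take y_test and y_pred as their arguments.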