ML Assignment 3

The document contains a Jupyter notebook that analyzes temperature data from a CSV file using Python libraries such as pandas, numpy, seaborn, and sklearn. It includes data loading, exploration, and a linear regression model to predict annual temperatures based on the year. The notebook also visualizes the data and results using scatter plots and line plots.

Uploaded by

lucifer267302
Copyright
© All Rights Reserved

9/19/24, 1:25 AM Assign3.ipynb - Colab

import numpy as np
import pandas as pd
import seaborn as sns

df = pd.read_csv('temperatures.csv')

df

     YEAR    JAN    FEB    MAR    APR    MAY    JUN    JUL    AUG    SEP    OCT    NOV    DEC  ANNUAL  JAN-FEB  MAR-MAY  JUN-SEP  OCT-DEC
0    1901  22.40  24.14  29.07  31.91  33.41  33.18  31.21  30.39  30.47  29.97  27.31  24.49   28.96    23.27    31.46    31.27    27.25
1    1902  24.93  26.58  29.77  31.78  33.73  32.91  30.92  30.73  29.80  29.12  26.31  24.04   29.22    25.75    31.76    31.09    26.49
2    1903  23.44  25.03  27.83  31.39  32.91  33.00  31.34  29.98  29.85  29.04  26.08  23.65   28.47    24.24    30.71    30.92    26.26
3    1904  22.50  24.73  28.21  32.02  32.64  32.07  30.36  30.09  30.04  29.20  26.36  23.63   28.49    23.62    30.95    30.66    26.40
4    1905  22.00  22.83  26.68  30.01  33.32  33.25  31.44  30.68  30.12  30.67  27.52  23.82   28.30    22.25    30.00    31.33    26.57
...   ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...    ...     ...      ...      ...      ...      ...
112  2013  24.56  26.59  30.62  32.66  34.46  32.44  31.07  30.76  31.04  30.27  27.83  25.37   29.81    25.58    32.58    31.33    27.83
113  2014  23.83  25.97  28.95  32.74  33.77  34.15  31.85  31.32  30.68  30.29  28.05  25.08   29.72    24.90    31.82    32.00    27.81
114  2015  24.58  26.89  29.07  31.87  34.09  32.48  31.88  31.52  31.55  31.04  28.10  25.67   29.90    25.74    31.68    31.87    28.27
115  2016  26.94  29.72  32.62  35.38  35.72  34.03  31.64  31.79  31.66  31.98  30.11  28.01   31.63    28.33    34.57    32.28    30.03
116  2017  26.45  29.46  31.60  34.95  35.84  33.82  31.88  31.72  32.22  32.29  29.60  27.18   31.42    27.95    34.13    32.41    29.69


df.head()

     YEAR    JAN    FEB    MAR    APR    MAY    JUN    JUL    AUG    SEP    OCT    NOV    DEC  ANNUAL  JAN-FEB  MAR-MAY  JUN-SEP  OCT-DEC
0    1901  22.40  24.14  29.07  31.91  33.41  33.18  31.21  30.39  30.47  29.97  27.31  24.49   28.96    23.27    31.46    31.27    27.25
1    1902  24.93  26.58  29.77  31.78  33.73  32.91  30.92  30.73  29.80  29.12  26.31  24.04   29.22    25.75    31.76    31.09    26.49
2    1903  23.44  25.03  27.83  31.39  32.91  33.00  31.34  29.98  29.85  29.04  26.08  23.65   28.47    24.24    30.71    30.92    26.26
3    1904  22.50  24.73  28.21  32.02  32.64  32.07  30.36  30.09  30.04  29.20  26.36  23.63   28.49    23.62    30.95    30.66    26.40


df.describe()

              YEAR          JAN          FEB          MAR          APR          MAY          JUN          JUL          AUG          SEP
count   117.000000   117.000000   117.000000   117.000000   117.000000   117.000000   117.000000   117.000000   117.000000   117.000000  11
mean   1959.000000    23.687436    25.597863    29.085983    31.975812    33.565299    32.774274    31.035897    30.507692    30.486752  2
std      33.919021     0.834588     1.150757     1.068451     0.889478     0.724905     0.633132     0.468818     0.476312     0.544295
min    1901.000000    22.000000    22.830000    26.680000    30.010000    31.930000    31.100000    29.760000    29.310000    29.070000  2
25%    1930.000000    23.100000    24.780000    28.370000    31.460000    33.110000    32.340000    30.740000    30.180000    30.120000  2
50%    1959.000000    23.680000    25.480000    29.040000    31.950000    33.510000    32.730000    31.000000    30.540000    30.520000  2
75%    1988.000000    24.180000    26.310000    29.610000    32.420000    34.030000    33.180000    31.330000    30.760000    30.810000  3
max    2017.000000    26.940000    29.720000    32.620000    35.380000    35.840000    34.480000    32.760000    31.840000    32.220000  3

from sklearn.model_selection import train_test_split


from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

https://colab.research.google.com/drive/1L9zJu37fpdH7-NNEF-eDo8yJhKRP_GR9?authuser=0#scrollTo=jAOEaTids7wx&printMode=true 1/6

df.tail()

     YEAR    JAN    FEB    MAR    APR    MAY    JUN    JUL    AUG    SEP    OCT    NOV    DEC  ANNUAL  JAN-FEB  MAR-MAY  JUN-SEP  OCT-DEC
112  2013  24.56  26.59  30.62  32.66  34.46  32.44  31.07  30.76  31.04  30.27  27.83  25.37   29.81    25.58    32.58    31.33    27.83
113  2014  23.83  25.97  28.95  32.74  33.77  34.15  31.85  31.32  30.68  30.29  28.05  25.08   29.72    24.90    31.82    32.00    27.81
114  2015  24.58  26.89  29.07  31.87  34.09  32.48  31.88  31.52  31.55  31.04  28.10  25.67   29.90    25.74    31.68    31.87    28.27
115  2016  26.94  29.72  32.62  35.38  35.72  34.03  31.64  31.79  31.66  31.98  30.11  28.01   31.63    28.33    34.57    32.28    30.03

df.shape

(117, 18)

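# Flag columns that contain any missing value.
# (df.sum().isnull() is always False here because sum() skips NaN by default.)
df.isnull().any()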

YEAR False

JAN False

FEB False

MAR False

APR False

MAY False

JUN False

JUL False

AUG False

SEP False

OCT False

NOV False

DEC False

ANNUAL False

JAN-FEB False

MAR-MAY False

JUN-SEP False

OCT-DEC False

df.isnull().sum()


YEAR 0

JAN 0

FEB 0

MAR 0

APR 0

MAY 0

JUN 0

JUL 0

AUG 0

SEP 0

OCT 0

NOV 0

DEC 0

ANNUAL 0

JAN-FEB 0

MAR-MAY 0

JUN-SEP 0

OCT-DEC 0
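
The order of `isnull()` and `sum()` matters in a null check. The sketch below uses a hypothetical two-column frame (`toy`, not the temperature data) to show that summing first hides a missing value, while `isnull().sum()` counts it.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame with one missing value in column "A"
toy = pd.DataFrame({"A": [1.0, np.nan], "B": [1.0, 2.0]})

# sum() skips NaN by default, so each column total is a real number
# and isnull() on the totals reports False for every column
print(toy.sum().isnull())

# isnull() first, then sum(): counts the missing cells per column
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```

For the temperature data both checks agree, because every column reports 0 missing values.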

x = df["YEAR"]
y = df["ANNUAL"]

plt.plot(x,y,'o')

[<matplotlib.lines.Line2D at 0x7b7259bea350>]

sns.scatterplot(x=x,y=y,data=df)


<Axes: xlabel='YEAR', ylabel='ANNUAL'>

x_train, x_test, y_train, y_test = train_test_split(x, y,test_size=0.25)

print(f"x Training dataset: {x_train.shape}")


print(f"y Training dataset: {y_train.shape}")
print(f"x test dataset: {x_test.shape}")
print(f"y test dataset: {y_test.shape}")

x Training dataset: (87,)


y Training dataset: (87,)
x test dataset: (30,)
y test dataset: (30,)

model = LinearRegression()

type(x)

pandas.core.series.Series

x.shape

(117,)

x = x.values

# scikit-learn expects a 2-D feature array; -1 lets NumPy infer the row count
x = x.reshape(-1, 1)

x.shape

(117, 1)

type(x)

numpy.ndarray
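
The reshape step matters because scikit-learn estimators expect the feature array to be 2-D, with one row per sample and one column per feature. A minimal sketch, assuming a stand-in array `years` in place of `df["YEAR"].values`:

```python
import numpy as np

# Hypothetical stand-in for df["YEAR"].values: the 117 years 1901..2017
years = np.arange(1901, 2018)
print(years.shape)      # 1-D vector of samples

# -1 asks NumPy to infer the number of rows from the array length
features = years.reshape(-1, 1)
print(features.shape)   # 2-D: one row per sample, one feature column
```

Using `-1` instead of a hard-coded row count keeps the cell valid if the CSV gains or loses rows.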

x_train, x_test, y_train, y_test = train_test_split(x, y,test_size=0.25)

print(f"x Training dataset: {x_train.shape}")
print(f"y Training dataset: {y_train.shape}")
print(f"x test dataset: {x_test.shape}")
print(f"y test dataset: {y_test.shape}")

x Training dataset: (87, 1)


y Training dataset: (87,)
x test dataset: (30, 1)
y test dataset: (30,)
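
The split above is drawn at random on every run, so the fitted coefficients and scores that follow will vary between runs. A minimal sketch of a repeatable split, assuming made-up stand-in arrays (`x_demo`, `y_demo`) and an arbitrary seed of 42; neither is part of the original notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins with the notebook's shapes: 117 years, one feature column
x_demo = np.arange(1901, 2018).reshape(-1, 1)
y_demo = np.linspace(28.0, 31.0, 117)  # made-up targets, not the real ANNUAL column

# random_state=42 is an arbitrary seed chosen for this sketch
split_a = train_test_split(x_demo, y_demo, test_size=0.25, random_state=42)
split_b = train_test_split(x_demo, y_demo, test_size=0.25, random_state=42)

x_train_a, x_test_a, y_train_a, y_test_a = split_a
print(x_train_a.shape, x_test_a.shape)

# The same seed yields the same partition on every call
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same_split)
```

With 117 rows and `test_size=0.25`, the test set is rounded up to 30 rows, which matches the shapes printed above.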

model = LinearRegression()

model.fit(x_train,y_train)

LinearRegression()

model.coef_  # slope w of the fitted line

array([0.01279507])

model.intercept_  # intercept b of the fitted line

4.1011851987150685
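
The two numbers above define the fitted line, ANNUAL ≈ w · YEAR + b. A small sketch that evaluates it by hand, using the values printed by this particular run; the helper name `predict_annual` is hypothetical:

```python
# Values copied from the model.coef_ and model.intercept_ outputs above
w = 0.01279507
b = 4.1011851987150685

def predict_annual(year):
    """Evaluate the fitted line y = w * year + b for a single year."""
    return w * year + b

print(round(predict_annual(2017), 2))  # 29.91
print(round(w * 100, 2))               # 1.28, the fitted change per 100 years
```

The table lists an observed ANNUAL value of 31.42 for 2017, so the line underestimates that year by about 1.51 in the dataset's temperature units.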

y_pred = model.predict(x_test)

y_pred.shape

(30,)

plt.scatter(x_train, y_train, color='blue')


plt.plot(x_test, y_pred, color='red', linewidth=3)
plt.title("Temperature vs Year")
plt.xlabel("Year")
plt.ylabel("Temperature")
plt.show()

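# x_train is a 2-D array after the reshape, so flatten it for plotting;
# data=df is not needed when x and y are passed directly
sns.regplot(x=x_train.ravel(), y=y_train)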


<Axes: ylabel='ANNUAL'>

from sklearn.metrics import mean_absolute_error,mean_squared_error,r2_score


print(f"MSE: {mean_squared_error(y_test,y_pred)}")
print(f"MAE: {mean_absolute_error(y_test,y_pred)}")
print(f"R-Square : {r2_score(y_test,y_pred)}")

MSE: 0.1972410753986664
MAE: 0.30463888560251223
R-Square : 0.48700463368609614
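
To make the three scores concrete, the sketch below computes them from their textbook definitions on a hypothetical four-point example (the toy values are not from the notebook), then converts the reported MSE into an RMSE in the same units as the temperatures.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R^2) computed from their standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n          # mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mse, mae, r2

# Hypothetical toy values, chosen only to keep the arithmetic easy to follow
y_true_toy = [28.0, 29.0, 30.0, 31.0]
y_pred_toy = [28.5, 29.0, 29.5, 31.0]
mse, mae, r2 = regression_metrics(y_true_toy, y_pred_toy)
print(mse, mae, r2)

# Root of the MSE reported above, expressed in temperature units
rmse_reported = math.sqrt(0.1972410753986664)
print(round(rmse_reported, 3))
```

An R-Square near 0.49 means the straight line in YEAR accounts for roughly half of the variance of ANNUAL in this particular test split.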
