Use petrol_consumption dataset.
Your task is to predict the gas consumption (in millions of gallons) in 48 of
the US states based on petrol tax (in cents), per capita income (in dollars),
paved highways (in miles) and the proportion of the population with a driving
licence.
Build the regression model using Random Forest Regressor.
Analyze the prediction ability of your model.
importing dependencies
import numpy as np
import pandas as pd
data collection and analysis
petrol consumption dataset
# loading the petrol consumption dataset into a pandas DataFrame
dataset = pd.read_csv("C:\\Users\This PC\Desktop\ml practice\petrol_consumption.<
# printing the first 5 rows of the dataset
dataset.head()
   Petrol_tax  Average_income  Paved_Highways  Population_Driver_licence(%)  Petrol_Consumption
0         9.0            3571            1976                         0.525                 541
1         9.0            4092            1250                         0.572                 524
2         9.0            3865            1586                         0.580                 561
3         7.5            4870            2351                         0.529                 414
4         8.0            4399             431                         0.544                 410
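The `read_csv` call above passes a Windows path as an ordinary string literal, where a backslash can start an escape sequence. As a minimal sketch, here are two safer spellings: a raw string and `pathlib`. The path `C:\data\example.csv` is hypothetical and only stands in for the real file location.

```python
from pathlib import PureWindowsPath

# In an ordinary string literal "\t" is a tab character, so this is NOT
# the folder name "temp" that it appears to be.
plain = "C:\temp"
# A raw string keeps every backslash literally.
raw = r"C:\temp"

# Hypothetical path, used for illustration only.
csv_path = PureWindowsPath(r"C:\data\example.csv")

print(len(plain), len(raw))   # the two literals differ in length
print(csv_path.name)          # file name component
print(csv_path.as_posix())    # same path written with forward slashes
```

`pd.read_csv` accepts either a path string or a path object, so a raw string or a `pathlib` path could be passed to it directly.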
# number of rows and columns in this dataset
dataset.shape
(48, 5)
In [5]: # getting the statistical measures of the data
dataset.describe()
out[5]:
       Petrol_tax  Average_income  Paved_Highways  Population_Driver_licence(%)  Petrol_Consumption
count   48.000000       48.000000       48.000000                     48.000000               48.00
mean     7.668333     4241.833333     5565.416667                      0.570333              576.77
std      0.950770      573.623768     3491.507166                      0.055470              111.88
min      5.000000     3063.000000      431.000000                      0.451000              344.00
25%      7.000000     3739.000000     3110.250000                      0.529750              509.50
50%      7.500000     4298.000000     4735.500000                      0.564500              568.50
75%      8.125000     4578.750000     7156.000000                      0.595250              632.75
max     10.000000     5342.000000    17782.000000                      0.724000              968.00
In [7]: # Preparing the data
x = dataset.iloc[:, :-1].values   # all columns except the last are features
y = dataset.iloc[:, -1].values    # the last column is the target
In [8]: # Now the data split function
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=...)  # random_state value is cut off in the source
In [9]: from sklearn.ensemble import RandomForestRegressor
regr = RandomForestRegressor(max_depth=2, random_state=0)
regr.fit(x_train, y_train)
out[9]: RandomForestRegressor(max_depth=2, random_state=0)
In [10]: # Making predictions
y_pred = regr.predict(x_test)
In [11]: df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df
out[11]:
    Actual     Predicted
0      628    571.496777
1      547    538.999080
2      648    647.580572
3      640    590.781977
4      561    558.606228
5      414    494.660716
6      554    598.109298
7      577    583.124799
8      782  [?]27.321213
9      631    576.802467
10     574    547.557079
11     524    564.262270
12     540    497.195530
13     487    551.243347
14     640    705.237167
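The table above holds 15 test rows, which is what a 30% split of 48 rows yields, and each predicted value is the average over the forest's trees. The following is a minimal sketch of both points on synthetic stand-in data, not the petrol dataset. The names `X`, `forest` and `manual_pred`, and the `random_state=0` used for the split, are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the petrol dataset): 48 rows, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(48, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=48)

# test_size=0.3 of 48 rows is 14.4, which scikit-learn rounds up to 15.
# random_state=0 is an assumption; the notebook's value is not visible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

forest = RandomForestRegressor(max_depth=2, random_state=0)
forest.fit(X_train, y_train)

# The forest's prediction is the mean of its individual trees' predictions.
per_tree = np.stack([tree.predict(X_test) for tree in forest.estimators_])
manual_pred = per_tree.mean(axis=0)

print(X_train.shape, X_test.shape)
print(np.allclose(manual_pred, forest.predict(X_test)))
```

With `max_depth=2`, each tree can output at most four distinct values, so the averaged predictions tend to cluster around the middle of the target range.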
In [12]: # Evaluating the algorithm
from sklearn import metrics
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
Mean Absolute Error: 46.24860698172747
Mean Squared Error: 3472.023958285361
Root Mean Squared Error: 58.92388274957244
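To make the three metrics printed above concrete, here is a minimal sketch that computes them by hand in plain Python. The pairs are a small subset of the actual and predicted values from the output table, so the results will not match the full-test-set figures. The last lines relate the reported RMSE to the mean consumption of 576.77 from `describe()`, which is one rough way to judge the size of the error.

```python
import math

# A few (actual, predicted) pairs as they appear in the output table;
# this is only a subset, so the results differ from the reported metrics.
pairs = [
    (628, 571.496777),
    (648, 647.580572),
    (640, 590.781977),
    (631, 576.802467),
]

errors = [actual - predicted for actual, predicted in pairs]

mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
mse = sum(e * e for e in errors) / len(errors)    # mean squared error
rmse = math.sqrt(mse)                             # root mean squared error

# Relative size of the reported RMSE against the mean consumption.
reported_rmse = 58.92388274957244
mean_consumption = 576.77
relative_rmse = reported_rmse / mean_consumption

print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f}")
print(f"relative RMSE = {relative_rmse:.1%}")
```

On this reading the typical error is roughly a tenth of the average consumption, which suggests the model is usable but far from precise.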
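import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(7, 9))
# The distplot calls are cut off in the source; only the trailing "ax=ax)" survives.
# The arguments below are an assumed reconstruction.
ax = sns.distplot(y_test, hist=False, color="r", label="Actual values")
sns.distplot(y_pred, hist=False, color="b", label="Fitted values", ax=ax)
plt.grid()
plt.title('Actual vs Fitted Values for Price')
plt.xlabel('Actual values for price')
plt.ylabel('Fitted values for price')
plt.legend()
plt.show()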
C:\Anaconda\lib\site-packages\seaborn\distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).
  warnings.warn(msg, FutureWarning)
[Figure: Actual vs Fitted Values for Price (x-axis: Actual values for price)]
dataset['Petrol_tax'].value_counts()
dataset.groupby('Petrol_tax').mean()
On the basis of increasing petrol tax, we predict that petrol consumption is getti
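The two calls above count how many states share each tax level and average the other columns within each level. Here is a minimal sketch of what they return, using a tiny hypothetical frame; the values are illustrative, and five rows are far too few to support any conclusion about tax and consumption.

```python
import pandas as pd

# Hypothetical toy rows, used only to illustrate the two calls.
toy = pd.DataFrame({
    "Petrol_tax": [9.0, 9.0, 9.0, 7.5, 8.0],
    "Petrol_Consumption": [541, 524, 561, 414, 410],
})

# How many rows carry each tax level.
counts = toy["Petrol_tax"].value_counts()

# Mean consumption per tax level, indexed by tax.
mean_by_tax = toy.groupby("Petrol_tax")["Petrol_Consumption"].mean()

print(counts)
print(mean_by_tax)
```

Reading the grouped means in order of increasing tax is what lets a trend in consumption be stated, provided each tax level has enough rows behind it; `value_counts()` is the check for that.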