BHMC17 P5.ipynb - Colaboratory
import numpy as np
import pandas as pd
import seaborn as sns

from google.colab import drive
drive.mount('/content/drive')

data = pd.read_csv("/EconomiesOfScale.csv")
Mounted at /content/drive
data.head()
   Number of Units  Manufacturing Cost
0         1.000000           95.066056
1         1.185994           96.531750
2         1.191499           73.661311
3         1.204771           95.566843
4         1.298773           98.777013
data.describe()
# Count missing values per column; show only columns that have any
nans = pd.isnull(data).sum()
nans[nans > 0]
data.shape[0]
1000
ax = sns.boxplot(x=data["Manufacturing Cost"])
https://colab.research.google.com/drive/1Uvk4Nqweukf7QH3oYbfyTY417obmtB4J#printMode=true 1/4
10/14/23, 12:20 PM BHMC17 p5.ipynb - Colaboratory
ax = sns.boxplot(x=data['Number of Units'])
sns.jointplot(x='Number of Units', y='Manufacturing Cost', data=data)
<seaborn.axisgrid.JointGrid at 0x7cfbc13ba6b0>
array([[0. , 0.9383257 ],
[0.02066596, 0.95664687],
[0.02127763, 0.67076638],
...,
[0.86454312, 0.07467234],
[0.87752219, 0.06422889],
[1. , 0.01934721]])
scaled_data
   Number of Units  Manufacturing Cost
0         0.000000            0.938326
1         0.020666            0.956647
2         0.021278            0.670766
3         0.022752            0.944586
4         0.033197            0.984713
y = scaled_data.pop('Manufacturing Cost')
y
0 0.938326
1 0.956647
2 0.670766
3 0.944586
4 0.984713
...
995 0.048188
996 0.094207
997 0.074672
998 0.064229
999 0.019347
Name: Manufacturing Cost, Length: 1000, dtype: float64
X = scaled_data.values
"""
Split the dataset into training set and test set with an 80-20 ratio
"""
from sklearn.model_selection import train_test_split
seed=1
X_train, X_test, \
y_train, y_test = train_test_split(X, y, test_size=0.2, \
random_state=42)
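As a quick sanity check on the 80/20 split, the resulting shapes can be verified directly. This sketch uses synthetic arrays of the same sizes as the notebook's data (1000 samples, one feature after popping the target); the values are illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays shaped like the notebook's data.
X = np.random.rand(1000, 1)
y = np.random.rand(1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # 800 training samples, 200 test samples
```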
from sklearn.svm import SVR

regr = SVR(epsilon=0.2)
regr.fit(X_train, y_train)
SVR(epsilon=0.2)
y_pred = regr.predict(X_test)
y_pred
array([..., 0.58535064, 0.21132973, 0.70114225, 0.31394558, 0.19646868,
0.28550935, 0.46230228, 0.23054964, 0.3178442 , 0.21526703,
0.20025726, 0.20334417, 0.2047899 , 0.41879638, 0.23031656,
0.37798776, 0.24097249, 0.22005137, 0.19537004, 0.28626906,
0.28661788, 0.20441796, 0.5164567 , 0.22013279, 0.28129508,
0.20147005, 0.42688768, 0.29052581, 0.42638852, 0.27804957,
0.22929735, 0.22977071, 0.19732925, 0.19497802, 0.27538463,
0.2633768 , 0.25082279, 0.23453376, 0.31433468, 0.19467478,
0.25353211, 0.19456455, 0.39709234, 0.43353714, 0.19685745,
0.1971824 , 0.26981381, 0.22775389, 0.29543255, 0.20733488,
0.20571743, 0.22817938, 0.23200193, 0.33702665, 0.22039579,
0.23372122, 0.19661119, 0.22950817, 0.19593826, 0.39894177,
0.25449299, 0.32918061, 0.23097683, 0.2329515 , 0.28591771,
0.2110974 , 0.25549309, 0.19736443, 0.29471497, 0.30469764,
0.22748822, 0.22111173, 0.25231059, 0.2048678 , 0.19524356,
0.21631618, 0.20047142, 0.29288796, 0.22982797, 0.23354053,
0.46075801, 0.36576062, 0.31622165, 0.59955219, 0.41707837,
0.49931476, 0.29957463, 0.19680618, 0.28884013, 0.21141338,
0.21829503, 0.19566166, 0.30106666, 0.19659288, 0.27753212,
0.20592251, 0.34307049, 0.23110011, 0.25478658, 0.22832641,
0.24935498, 0.28069803, 0.19986757, 0.45245932, 0.20906462,
0.21711763, 0.19448755, 0.20520075, 0.205105 , 0.20446054,
0.20282835, 0.2849949 , 0.20061093, 0.27949631, 0.28022113,
0.26771314, 0.24630999, 0.37452241, 0.2290563 , 0.2430412 ,
0.26757145, 0.29705325, 0.2948896 , 0.19535961, 0.20187177,
0.24093193, 0.20266503, 0.26391159, 0.2341701 , 0.19928893,
0.35138871, 0.26921077, 0.26709635, 0.22872646, 0.19508277,
0.23028492, 0.19709374, 0.20392945, 0.32127952, 0.31694141,
0.25830457, 0.27435644, 0.31842808, 0.21207435, 0.40956217])
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
rmse
0.0822903295452384
mse
0.006771698336663936
r2_score(y_test, y_pred)
0.4700318323625766
The R² value is approximately 0.47, which is reasonably encouraging for a first model. This work could be improved by tuning the SVR hyperparameters (C, epsilon, kernel) and by trying out other regression algorithms.
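One way to "try out other algorithms" is to fit several regressors on the same split and compare their R² scores. A hedged sketch below: the alternative model (`RandomForestRegressor`) and the synthetic decaying-cost data are illustrative choices, not taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Synthetic stand-in for the cost curve (hypothetical, not the real CSV):
# manufacturing cost decays with the number of units, plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(1000, 1))
y = np.exp(-3.0 * X[:, 0]) + rng.normal(0.0, 0.05, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

results = {}
for model in (SVR(epsilon=0.2), RandomForestRegressor(random_state=0)):
    model.fit(X_train, y_train)
    results[type(model).__name__] = r2_score(y_test, model.predict(X_test))

print(results)
```

Looping over interchangeable estimators like this works because scikit-learn regressors share the same `fit`/`predict` interface.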