Laptop
import pandas as pd

# The file is not UTF-8 encoded, hence encoding='latin1'
df = pd.read_csv('E:\\PAML\\Week_2\\laptop_price.csv', encoding='latin1')
df
                TypeName  Inches                            ScreenResolution  \
0              Ultrabook    13.3          IPS Panel Retina Display 2560x1600
1              Ultrabook    13.3                                    1440x900
2               Notebook    15.6                           Full HD 1920x1080
3              Ultrabook    15.4          IPS Panel Retina Display 2880x1800
4              Ultrabook    13.3          IPS Panel Retina Display 2560x1600
...                  ...     ...                                         ...
1298  2 in 1 Convertible    14.0   IPS Panel Full HD / Touchscreen 1920x1080
1299  2 in 1 Convertible    13.3  IPS Panel Quad HD+ / Touchscreen 3200x1800
1300            Notebook    14.0                                    1366x768
1301            Notebook    15.6                                    1366x768
1302            Notebook    15.6                                    1366x768

                                       Cpu  Ram              Memory  \
0                     Intel Core i5 2.3GHz  8GB           128GB SSD
1300  Intel Celeron Dual Core N3050 1.6GHz  2GB  64GB Flash Storage
1302  Intel Celeron Dual Core N3050 1.6GHz  4GB           500GB HDD
dataset = df.copy()
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1303 entries, 0 to 1302
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 laptop_ID 1303 non-null int64
1 Company 1303 non-null object
2 Product 1303 non-null object
3 TypeName 1303 non-null object
4 Inches 1303 non-null float64
5 ScreenResolution 1303 non-null object
6 Cpu 1303 non-null object
7 Ram 1303 non-null object
8 Memory 1303 non-null object
9 Gpu 1303 non-null object
10 OpSys 1303 non-null object
11 Weight 1303 non-null object
12 Price_euros 1303 non-null float64
dtypes: float64(2), int64(1), object(10)
memory usage: 132.5+ KB
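# Drop rows, then columns, that contain missing values.
# info() above reports no nulls, so both calls leave the data unchanged.
dataset.dropna(axis=0, inplace=True)
dataset.dropna(axis=1, inplace=True)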
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1303 entries, 0 to 1302
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 laptop_ID 1303 non-null int64
1 Company 1303 non-null object
2 Product 1303 non-null object
3 TypeName 1303 non-null object
4 Inches 1303 non-null float64
5 ScreenResolution 1303 non-null object
6 Cpu 1303 non-null object
7 Ram 1303 non-null object
8 Memory 1303 non-null object
9 Gpu 1303 non-null object
10 OpSys 1303 non-null object
11 Weight 1303 non-null object
12 Price_euros 1303 non-null float64
dtypes: float64(2), int64(1), object(10)
memory usage: 132.5+ KB
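# Keep only the features that can be converted to numbers: Inches, Ram, Weight
X = df.drop(['Memory', 'Cpu', 'Gpu', 'ScreenResolution', 'laptop_ID',
             'Price_euros', 'Company', 'Product', 'TypeName', 'OpSys'], axis=1)
# Strip the unit suffixes so the columns can be cast to float
X['Ram'] = X['Ram'].str.replace('GB', '').astype(float)
X['Weight'] = X['Weight'].str.replace('kg', '').astype(float)
# Target: laptop price in euros
y = df['Price_euros']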
0.5561474558430259
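The cell that created `X_train`, `X_test`, `y_train` and `y_test` does not appear in this export, so the split settings are unknown. A minimal sketch of one common way to produce them is shown here. The `test_size=0.2` and `random_state=42` values are assumptions, and the small synthetic table only stands in for the cleaned `X` and `y` so that the block runs on its own; it is not the laptop data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned features and target (NOT the real data).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "Inches": rng.choice([13.3, 14.0, 15.6, 17.3], size=100),
    "Ram": rng.choice([4.0, 8.0, 16.0], size=100),
    "Weight": rng.uniform(1.0, 3.0, size=100).round(2),
})
y_demo = pd.Series(60.0 * X_demo["Ram"] + rng.normal(0, 50, size=100),
                   name="Price_euros")

# Assumed settings: hold out 20% for testing, fixed seed for reproducibility.
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_train_demo.shape, X_test_demo.shape)
```

With the real data, `X` and `y` from the previous cell would take the place of `X_demo` and `y_demo`.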
from sklearn.svm import SVR

# Fit a linear-kernel SVR and report R^2 on the held-out test set
svr = SVR(kernel='linear')
svr.fit(X_train, y_train)
y_pred_data1 = svr.predict(X_test)
svr.score(X_test, y_test)
0.5568239882131703
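The number returned by `svr.score` is the coefficient of determination (R²), which has no units. To judge accuracy in terms of the target itself, it can help to report error metrics alongside it. The sketch below does this on synthetic stand-in data (not the laptop dataset); all `_demo` names are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.svm import SVR

# Synthetic stand-in data (NOT the laptop dataset): one feature, noisy linear target.
rng = np.random.default_rng(0)
X_demo = rng.uniform(2, 16, size=(120, 1))
y_demo = 60.0 * X_demo[:, 0] + rng.normal(0, 40, size=120)
X_tr_demo, X_te_demo = X_demo[:90], X_demo[90:]
y_tr_demo, y_te_demo = y_demo[:90], y_demo[90:]

svr_demo = SVR(kernel="linear")
svr_demo.fit(X_tr_demo, y_tr_demo)
pred_demo = svr_demo.predict(X_te_demo)

r2 = r2_score(y_te_demo, pred_demo)              # same value as svr_demo.score(...)
mae = mean_absolute_error(y_te_demo, pred_demo)  # mean absolute error, in target units
rmse = np.sqrt(mean_squared_error(y_te_demo, pred_demo))  # weights large misses more

print(f"R2={r2:.3f}  MAE={mae:.1f}  RMSE={rmse:.1f}")
```

Applied to the real split, MAE and RMSE would be expressed in euros, which makes two models with nearly identical R² easier to compare.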
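import matplotlib.pyplot as plt

# Compare the test-set prices with the predictions of the SVR fitted above
actual = y_test.to_numpy()
actual = actual.reshape(-1,1)
y_pred_data1 = y_pred_data1.reshape(-1,1)
plt.plot(actual, color = 'red', label = 'Real data')
plt.plot(y_pred_data1, color = 'blue', label = 'Predicted data')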
plt.title('Real vs Predicted data')
plt.legend()
plt.show()
# Analyze the impact on accuracy. Use hyperparameter tuning to optimize an
# SVR model for laptop prices. Explain how this affects overfitting.
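One possible way to approach the tuning task is a cross-validated grid search. The sketch below is not the notebook's own solution: the parameter grid, the 5-fold setting, the added `StandardScaler` step and the synthetic stand-in data are all assumptions chosen for illustration. In SVR, a larger `C` lets the model follow the training data more closely, which raises the risk of overfitting, while a larger `epsilon` tolerates more error and gives a smoother fit. Comparing the mean training score with the cross-validated score for the selected setting gives a rough indication of how much the model overfits.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (NOT the laptop dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(150, 3))
y_demo = (50.0 * X_demo[:, 0] + 20.0 * np.sin(X_demo[:, 1])
          + rng.normal(0, 10, size=150))
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# SVR is sensitive to feature scale, so scaling happens inside the pipeline
# and is re-fitted on each training fold.
pipe = make_pipeline(StandardScaler(), SVR())

# Assumed, deliberately small grid; step name "svr" comes from make_pipeline.
param_grid = {
    "svr__kernel": ["linear", "rbf"],
    "svr__C": [1, 10, 100],
    "svr__epsilon": [0.1, 1.0],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2",
                      return_train_score=True)
search.fit(X_tr, y_tr)

best = search.best_index_
train_r2 = search.cv_results_["mean_train_score"][best]  # fit on seen folds
cv_r2 = search.best_score_                               # held-out folds
test_r2 = search.score(X_te, y_te)                       # untouched test set

print("best params:", search.best_params_)
print(f"train R2={train_r2:.3f}  CV R2={cv_r2:.3f}  test R2={test_r2:.3f}")
```

A training score far above the cross-validated score suggests overfitting; scores that are close together suggest the selected hyperparameters generalize reasonably. With the real data, `X_train` and `y_train` would be passed to `search.fit` instead of the stand-in arrays.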