Analysis On Weight Capacity
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
/kaggle/input/student-bag-price-prediction-dataset/Noisy_Student_Bag_Price_Prediction_Dataset.csv
/kaggle/input/playground-series-s5e2/sample_submission.csv
/kaggle/input/playground-series-s5e2/train.csv
/kaggle/input/playground-series-s5e2/test.csv
/kaggle/input/playground-series-s5e2/training_extra.csv
train = pd.read_csv("/kaggle/input/playground-series-s5e2/train.csv")
print("Train shape",train.shape)
train_extra = pd.read_csv("/kaggle/input/playground-series-s5e2/training_extra.csv")
print("Extra Train shape",train_extra.shape)
train = pd.concat([train,train_extra],axis=0,ignore_index=True)
print("Combined Train shape",train.shape)
train.head(10)
   id         Brand   Material    Size  Compartments  Laptop Compartment  Waterproof      Style  Color  Weight Capacity (kg)     Price
1   1      Jansport     Canvas   Small          10.0                 Yes         Yes  Messenger  Green             27.078537  68.88056
2   2  Under Armour    Leather   Small           2.0                 Yes          No  Messenger    Red             16.643760  39.17320
4   4        Adidas     Canvas  Medium           1.0                 Yes         Yes  Messenger  Green             17.749338  86.02312
7   7          Puma     Canvas   Small           1.0                 Yes         Yes   Backpack   Blue             21.488864  27.15815
8   8  Under Armour  Polyester  Medium           8.0                 Yes          No       Tote   Gray             10.207780  25.98652
9   9  Under Armour      Nylon  Medium           2.0                 Yes         Yes  Messenger   Pink             15.895100  38.48741
Unique Weight Capacity values: [11.61172281 27.07853658 16.64375995 ... 12.79080004 22.95972519
16.64173875]
plt.figure(figsize=(10, 6))
sns.histplot(train['Weight Capacity (kg)'], bins=30, kde=False)
plt.title("Distribution of Weight Capacity")
plt.xlabel("Weight Capacity")
plt.ylabel("Count")
plt.show()
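# Baseline: predict the training-set mean price for every row.
train_mean = train.Price.mean()
train['pred'] = train_mean
# Note: this RMSE is computed on the training rows themselves, not on a held-out validation split.
s = np.sqrt(np.mean((train.Price - train.pred) ** 2.0))
print(f"Validation RMSE using Train Mean = {s}")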
train.head()
   id         Brand Material    Size  Compartments  Laptop Compartment  Waterproof      Style  Color  Weight Capacity (kg)      Price       pred
0   0      Jansport  Leather  Medium           7.0                 Yes          No       Tote  Black             11.611723  112.15875  81.362175
1   1      Jansport   Canvas   Small          10.0                 Yes         Yes  Messenger  Green             27.078537   68.88056  81.362175
2   2  Under Armour  Leather   Small           2.0                 Yes          No  Messenger    Red             16.643760   39.17320  81.362175
3   3          Nike    Nylon   Small           8.0                 Yes          No  Messenger  Green             12.937220   80.60793  81.362175
4   4        Adidas   Canvas  Medium           1.0                 Yes         Yes  Messenger  Green             17.749338   86.02312  81.362175
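As a sanity check on the mean baseline: the RMSE of a constant mean prediction is, by definition, the population standard deviation of the target. The sketch below illustrates this on synthetic prices (the array `prices` and its value range are assumptions made up for the example, not the competition data).

```python
import numpy as np

# Synthetic prices, purely illustrative (not the competition data).
rng = np.random.default_rng(0)
prices = rng.uniform(15.0, 150.0, size=1_000)

# Constant baseline: predict the mean for every row.
baseline = np.full_like(prices, prices.mean())
rmse = np.sqrt(np.mean((prices - baseline) ** 2))

# With ddof=0, np.std is the root mean squared deviation from the mean,
# so the two numbers agree up to floating-point error.
print(rmse, prices.std(ddof=0))
```

Any feature-based model should therefore be judged against the spread of `Price` itself; an RMSE close to that spread means the features add little.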
# Now you can use these features in a model, e.g., a simple linear regression:
from sklearn.linear_model import LinearRegression
features = ['pred_TE1', 'pred_TE2', 'pred_TE3']
lr_model = LinearRegression()
lr_model.fit(train[features], train.Price)
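The columns `pred_TE1`, `pred_TE2` and `pred_TE3` are not created anywhere in this section, so how they were built is not known. One plausible reading, given the `TE.transform(test['Weight Capacity (kg)'])` call in the commented-out code, is that they are target encodings. The sketch below shows one common way such a column could be produced: an out-of-fold mean of `Price` per `Weight Capacity (kg)` value. The helper name `oof_target_encode`, the fold count, and the synthetic frame `demo` are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def oof_target_encode(df, col, target, n_splits=5, seed=0):
    """Out-of-fold mean of `target` per value of `col` (hypothetical helper)."""
    encoded = pd.Series(np.nan, index=df.index, dtype=float)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in kf.split(df):
        fit_part = df.iloc[fit_idx]
        # Mean target per category, learned only from the fitting folds.
        means = fit_part.groupby(col)[target].mean()
        mapped = df.iloc[enc_idx][col].map(means)
        # Values unseen in the fitting folds fall back to their global mean.
        encoded.iloc[enc_idx] = mapped.fillna(fit_part[target].mean()).to_numpy()
    return encoded


# Tiny synthetic frame, purely illustrative (not the competition data).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Weight Capacity (kg)": rng.choice([10.0, 15.0, 20.0, 25.0], size=200),
    "Price": rng.uniform(15.0, 150.0, size=200),
})
demo["pred_TE1"] = oof_target_encode(demo, "Weight Capacity (kg)", "Price")
print(demo.head())
```

Encoding each row from folds that exclude it keeps the row's own `Price` out of its feature, which is why the out-of-fold form is usually preferred over a plain `groupby` mean when the encoded column is later fed to a model such as `LinearRegression`.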
test = pd.read_csv("/kaggle/input/playground-series-s5e2/test.csv")
# sub = pd.read_csv("/kaggle/input/playground-series-s5e2/sample_submission.csv")
# print('Submission shape', sub.shape)
# test = pd.read_csv("/kaggle/input/playground-series-s5e2/test.csv")
# sub['Price'] = TE.transform(test['Weight Capacity (kg)'])
# sub.to_csv("submission_TE_weight_capacity.csv",index=False)
# sub.head()