Data Modeling Featurization Visualization Example

The document provides a step-by-step guide for data modeling, featurization, and visualization using Python. It includes importing necessary libraries, creating sample data, applying one-hot encoding for categorical variables, visualizing the relationship between area and price, and building a linear regression model. The final output compares predicted prices with actual prices based on the model.

Data Modeling, Featurization, and Visualization - Python Example

Step 1: Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

Step 2: Create Sample Data

data = {
    'Area': [1000, 1500, 2000, 2500, 1200],
    'Rooms': [2, 3, 4, 4, 2],
    'City': ['Mumbai', 'Delhi', 'Delhi', 'Mumbai', 'Chennai'],
    'Price': [50, 70, 100, 110, 60]  # in Lakhs
}

df = pd.DataFrame(data)
print("Sample Data:")
print(df)

Step 3: Featurization (One-Hot Encoding)

# sparse_output replaces the older `sparse` argument (scikit-learn 1.2 and later)
encoder = OneHotEncoder(sparse_output=False)
city_encoded = encoder.fit_transform(df[['City']])
city_labels = encoder.get_feature_names_out(['City'])

city_df = pd.DataFrame(city_encoded, columns=city_labels)

# Replace the original 'City' column with its one-hot columns
df_featurized = pd.concat([df.drop('City', axis=1), city_df], axis=1)
print("Featurized Data:")
print(df_featurized)
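
To make the encoding step concrete, the following is a minimal pure-Python sketch of what one-hot encoding produces for the City column. The helper name one_hot is hypothetical and not part of scikit-learn; it only mimics the alphabetical category ordering and the City_ prefixed column names that the encoder generates for this data.

```python
def one_hot(values, prefix):
    """Return (column_names, rows) for a list of categorical values.

    Hypothetical helper for illustration only; not a scikit-learn API.
    """
    # Sort the distinct categories alphabetically, as OneHotEncoder does by default
    categories = sorted(set(values))
    columns = [f"{prefix}_{cat}" for cat in categories]
    # Each row has 1.0 in the column of its own category and 0.0 elsewhere
    rows = [[1.0 if v == cat else 0.0 for cat in categories] for v in values]
    return columns, rows


cities = ['Mumbai', 'Delhi', 'Delhi', 'Mumbai', 'Chennai']
columns, rows = one_hot(cities, 'City')

print(columns)
for city, row in zip(cities, rows):
    print(city, row)
```

Every row contains exactly one 1.0, which is why the three City columns together carry the same information as the original text column.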

Step 4: Visualization (Area vs Price)

plt.figure(figsize=(6,4))
sns.scatterplot(x='Area', y='Price', data=df)
plt.title('Area vs Price')
plt.xlabel('Area (sq ft)')
plt.ylabel('Price (Lakhs)')
plt.grid(True)
plt.show()
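
The scatter plot gives a visual impression of how Area and Price move together. As a rough numeric companion, the sketch below computes the Pearson correlation coefficient for the same five rows in plain Python. The function name pearson_r is an assumption made for illustration; with only five samples the value should be read as indicative, not as evidence of a general relationship.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


area = [1000, 1500, 2000, 2500, 1200]   # sq ft, same values as the sample data
price = [50, 70, 100, 110, 60]          # Lakhs

r = pearson_r(area, price)
print(f"Pearson r (Area vs Price): {r:.3f}")
```

A value close to 1 is consistent with the roughly straight upward trend in the plot.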

Step 5: Data Modeling (Linear Regression)

X = df_featurized.drop('Price', axis=1)
y = df_featurized['Price']

# With 5 rows, test_size=0.2 holds out a single row for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

predicted = model.predict(X_test)
print("Predicted vs Actual:")
for i in range(len(y_test)):
    print(f"Actual: {y_test.iloc[i]} Lakhs, Predicted: {round(predicted[i], 2)} Lakhs")
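
To see what the regression step is estimating, here is a minimal sketch that fits a one-feature least-squares line (Price against Area only) with the closed-form formulas. It is a simplification, not the model trained above: it ignores Rooms and the City columns and uses all five rows instead of a train/test split. The name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept


area = [1000, 1500, 2000, 2500, 1200]   # sq ft, same values as the sample data
price = [50, 70, 100, 110, 60]          # Lakhs

slope, intercept = fit_line(area, price)
print(f"Price ~ {slope:.4f} * Area + {intercept:.2f}")

for x, y in zip(area, price):
    fitted = slope * x + intercept
    print(f"Area {x}: actual {y} Lakhs, fitted {round(fitted, 2)} Lakhs")
```

LinearRegression solves the same least-squares problem, extended to several input columns at once, so its predictions will generally differ from this single-feature line.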
