
Phase 2

The document describes building an e-commerce product recommendation system. It covers data collection, preprocessing, feature engineering, model development using collaborative filtering, and deploying a recommendation engine. The goal is to enhance user experience by providing personalized product recommendations on an e-commerce platform.

Uploaded by

Harsha Varthini
Copyright
© All Rights Reserved

E-commerce Product Recommendation System

Introduction

E-commerce platforms have revolutionized the way we shop, offering a
vast array of products to users. However, with the increasing number of
products, users often struggle to find items that best match their
preferences and needs. Recommendation systems address this
challenge by providing personalized suggestions based on user
interactions, enhancing user engagement and satisfaction.

Objectives

Cleanse the dataset: Data cleaning involves handling missing values and
outliers to ensure data integrity and accuracy.

Explore dataset characteristics through EDA: Exploratory Data Analysis
(EDA) helps in understanding dataset characteristics through
visualization and statistical analysis.

Engineer relevant features: Feature engineering involves extracting and
creating relevant features from raw data to improve recommendation
accuracy.

Develop a recommendation engine: Train and deploy a
recommendation model to deliver personalized product
recommendations.

Dataset Description

The dataset contains user interaction data from an e-commerce
platform: user profiles, product items, and user interactions
such as ratings, views, and purchases. Each row represents a user's
interaction with a specific product, forming the basis for personalized
product recommendations.

System Architecture

Our e-commerce product recommendation system consists of the
following components:

Data Collection: Collect user interaction data, including user profiles,
product items, and user interactions such as ratings, views, and
purchases.

Data Preprocessing: Cleanse the dataset by handling missing values
and outliers. Explore the dataset's characteristics through EDA. Engineer
relevant features for model development.

Model Development: Implement recommendation algorithms such as
collaborative filtering, content-based filtering, and hybrid
methods. Train the recommendation model using the preprocessed
dataset.

Recommendation Engine: Generate personalized product
recommendations for users based on their preferences and
interactions. Deploy the recommendation engine on the e-commerce
platform.
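
The components above name content-based filtering alongside collaborative filtering. As a rough illustration only, assuming a small hypothetical catalog (the product IDs, categories, and brands below are made up), a content-based approach could score products by the cosine similarity of their one-hot encoded attributes. The helper name similar_products is also an assumption of this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical product catalog; IDs and attribute values are illustrative only.
catalog = pd.DataFrame({
    'product_id': [101, 102, 103, 104],
    'product_category': ['Electronics', 'Electronics', 'Books', 'Clothing'],
    'product_brand': ['Brand_A', 'Brand_B', 'Brand_A', 'Brand_C'],
}).set_index('product_id')

# One-hot encode the categorical attributes into an item-feature matrix.
features = pd.get_dummies(catalog, dtype=float)

# Cosine similarity between every pair of products.
norms = np.linalg.norm(features.values, axis=1, keepdims=True)
unit = features.values / norms
similarity = pd.DataFrame(unit @ unit.T,
                          index=features.index, columns=features.index)

def similar_products(product_id, n=2):
    """Return the n products most similar to product_id, excluding itself."""
    scores = similarity[product_id].drop(product_id)
    return scores.sort_values(ascending=False).head(n).index.tolist()

print(similar_products(101))
```

In this toy catalog, product 101 shares a category with 102 and a brand with 103, so both score higher than 104, which shares neither attribute.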

Data Wrangling Techniques

1. Data Description:

Head: Displaying the first few rows of the dataset to get an initial
overview.

Tail: Examining the last few rows of the dataset to ensure
completeness.

Info: Obtaining information about the dataset structure, data types,
and memory usage.

Describe: Generating descriptive statistics for numerical features to
understand their distributions and central tendencies.
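
The four inspection steps listed above map directly onto pandas calls. A minimal sketch, assuming a synthetic stand-in for the interaction data (the column names follow the report, the values are random):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the interaction data (random values, for illustration).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'user_id': rng.integers(1, 100, 100),
    'product_id': rng.integers(1, 50, 100),
    'product_popularity': rng.integers(1, 100, 100),
    'interaction_type': rng.choice(['view', 'purchase'], 100),
})

print(data.head())         # Head: first five rows
print(data.tail())         # Tail: last five rows
data.info()                # Info: dtypes, non-null counts, memory usage
summary = data.describe()  # Describe: count, mean, std, min, quartiles, max
print(summary)
```

By default describe() summarizes only the numeric columns, so interaction_type is left out of the summary table.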

CODE

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic interaction data for illustration
np.random.seed(0)
data = pd.DataFrame({
    'user_id': np.random.randint(1, 100, 100),
    'product_id': np.random.randint(1, 50, 100),
    'product_category': np.random.choice(['Electronics', 'Clothing', 'Books'], 100),
    'product_brand': np.random.choice(['Brand_A', 'Brand_B', 'Brand_C'], 100),
    'product_popularity': np.random.randint(1, 100, 100),
    'interaction_type': np.random.choice(['view', 'purchase'], 100)
})

# Select features; product_id and product_popularity are one-hot encoded,
# i.e. both are treated as categorical here
X = data[['user_id', 'product_id', 'product_popularity']]
X = pd.get_dummies(X, columns=['product_id', 'product_popularity'], drop_first=True)
y = data['interaction_type']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training set only, then apply it to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Compare the interaction-type distribution in the two splits
plt.figure(figsize=(10, 6))
sns.histplot(y_train, color='blue', alpha=0.5, label='Train')
sns.histplot(y_test, color='red', alpha=0.5, label='Test')
plt.title('Distribution of Interaction Types in Train and Test Sets')
plt.xlabel('Interaction Type')
plt.ylabel('Frequency')
plt.legend()
plt.show()

OUTPUT

2. Data Cleaning:

● Data cleaning involves handling missing values and outliers to
ensure data integrity and accuracy.

CODE

import numpy as np
import pandas as pd
from scipy import stats

data_dict = {
    'user_id': [1, 2, 3, 4, 5],
    'product_id': [101, 102, 103, 104, 105],
    'interaction_type': ['view', 'purchase', 'view', 'purchase', 'view']
}
data = pd.DataFrame(data_dict)

# Drop rows that contain missing values
data.dropna(inplace=True)

# Keep only rows whose numeric values lie within 3 standard deviations of the mean
numeric_cols = ['user_id', 'product_id']
z_scores = np.abs(stats.zscore(data[numeric_cols]))
threshold = 3
data = data[(z_scores < threshold).all(axis=1)]

data

OUTPUT
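
The five-row sample above has no missing values and no extreme values, so both cleaning steps leave it unchanged. To make the effect visible, here is a small sketch on hypothetical data; the rating and price columns and their values are assumptions, not part of the project dataset. It imputes a missing value with the median and filters an outlier with the interquartile-range (IQR) rule, swapped in here for the z-score test.

```python
import numpy as np
import pandas as pd

# Hypothetical interactions with one missing rating and one implausible price.
data = pd.DataFrame({
    'user_id': [1, 2, 3, 4, 5, 6],
    'product_id': [101, 102, 103, 104, 105, 106],
    'rating': [4.0, np.nan, 5.0, 3.0, 4.0, 5.0],
    'price': [20.0, 25.0, 22.0, 24.0, 21.0, 5000.0],
})

# Impute the missing rating with the column median instead of dropping the row.
data['rating'] = data['rating'].fillna(data['rating'].median())

# Flag outliers with the IQR rule, which stays usable on small samples
# where a z-score threshold of 3 may never be exceeded.
q1, q3 = data['price'].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = data['price'].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = data[in_range].reset_index(drop=True)
print(cleaned)
```

Whether to impute or drop, and which outlier rule to use, depends on the real data; the choices here are only one reasonable default.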

3. Exploratory Data Analysis (EDA)

● Exploratory Data Analysis (EDA) helps in understanding dataset
characteristics through visualization and statistical analysis.

CODE

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic interaction data for illustration
np.random.seed(0)
n = 1000
data = pd.DataFrame({
    'user_id': np.random.randint(1, 100, n),
    'product_id': np.random.randint(1, 50, n),
    'interaction_type': np.random.choice(['view', 'click', 'purchase'], n)
})

# Count plot of interaction types; x= and data= are passed by keyword
plt.figure(figsize=(10, 6))
sns.countplot(x='interaction_type', data=data)
plt.title('User Interactions with Products')
plt.xlabel('Interaction Type')
plt.ylabel('Frequency')
plt.show()

OUTPUT
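
The count plot covers the visualization half of EDA. For the statistical half, a few pandas summaries are often enough for a first pass. The sketch below regenerates the same synthetic data; the variable names type_share, per_user, and per_product are assumptions of this example.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
n = 1000
data = pd.DataFrame({
    'user_id': np.random.randint(1, 100, n),
    'product_id': np.random.randint(1, 50, n),
    'interaction_type': np.random.choice(['view', 'click', 'purchase'], n),
})

# Share of each interaction type.
type_share = data['interaction_type'].value_counts(normalize=True)

# Interactions per user: how active is a typical user?
per_user = data.groupby('user_id').size().describe()

# Interaction counts per product, one column per interaction type.
per_product = pd.crosstab(data['product_id'], data['interaction_type'])

print(type_share, per_user, per_product.head(), sep='\n\n')
```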

4. Feature Engineering

● Feature engineering involves extracting and creating relevant
features from raw data to improve recommendation accuracy.

CODE

import pandas as pd

data = {
    'user_id': [1, 1, 2, 2, 3, 3],
    'product_id': [101, 102, 101, 103, 102, 104],
    'interaction_type': ['view', 'purchase', 'view', 'rating', 'purchase', 'view'],
    'product_name': ['Product A', 'Product B', 'Product A', 'Product C',
                     'Product B', 'Product D'],
    'timestamp': ['2022-01-01 10:00:00', '2022-01-01 10:15:00',
                  '2022-01-01 11:00:00', '2022-01-01 12:00:00',
                  '2022-01-01 13:00:00', '2022-01-01 14:00:00']
}
data = pd.DataFrame(data)

# User profile: number of interactions per user
user_profiles = data.groupby('user_id').size().to_frame('num_interactions')

# Time-based feature: hour of the day of each interaction
data['timestamp'] = pd.to_datetime(data['timestamp'])
data['hour_of_day'] = data['timestamp'].dt.hour

# Category and brand are taken from the first and second word of product_name;
# for names like 'Product A' this yields 'Product' and 'A'
data['product_category'] = data['product_name'].apply(lambda x: x.split()[0])
data['product_brand'] = data['product_name'].apply(lambda x: x.split()[1])

# Popularity: number of interactions recorded for each product
data['product_popularity'] = data.groupby('product_name')['user_id'].transform('count')

print("Modified Dataset:")
print(data)
print("\nUser Profiles:")
print(user_profiles)

OUTPUT
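
Beyond a raw interaction count, per-user behavior can be summarized in a few more columns. The following sketch reuses the sample interactions from above; the feature names distinct_products and purchase_rate are assumptions introduced for this example, not features defined in the project.

```python
import pandas as pd

data = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3, 3],
    'product_id': [101, 102, 101, 103, 102, 104],
    'interaction_type': ['view', 'purchase', 'view', 'rating', 'purchase', 'view'],
})

# Boolean helper column: was this interaction a purchase?
data['is_purchase'] = data['interaction_type'].eq('purchase')

# Aggregate per user with named aggregation.
user_features = data.groupby('user_id').agg(
    num_interactions=('product_id', 'size'),
    distinct_products=('product_id', 'nunique'),
    purchase_rate=('is_purchase', 'mean'),
)
print(user_features)
```

The mean of a boolean column is the fraction of True values, which is why purchase_rate can be computed with a plain 'mean'.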

5. Data Transformation

Data transformation involves selecting relevant features, splitting the
dataset into training and testing sets, and standardizing numerical
features to ensure consistent scaling.

CODE

# Uses the DataFrame `data` built in the feature engineering step
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# One-hot encode the categorical columns; numeric columns pass through unchanged
X = pd.get_dummies(data[['user_id', 'product_id', 'product_category',
                         'product_brand']])
y = data['interaction_type']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training set only, then apply it to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

X_train_scaled[:5]

OUTPUT

Model Development

Collaborative Filtering:

● Collaborative filtering is a recommendation algorithm that makes
automatic predictions about the interests of a user by collecting
preferences from many users.

CODE

import pandas as pd
from surprise import Dataset, Reader, KNNBasic, accuracy
from surprise.model_selection import train_test_split

data = {
    'user_id': [1, 1, 2, 2, 3, 3, 4, 4],
    'product_id': [101, 102, 101, 103, 102, 104, 101, 103],
    'interaction_type': ['view', 'purchase', 'view', 'view', 'purchase', 'view',
                         'purchase', 'purchase']
}
data_df = pd.DataFrame(data)

# Map interaction types to numeric scores on a 1-5 scale
interaction_type_map = {'view': 1, 'purchase': 5}
data_df['interaction_type'] = data_df['interaction_type'].map(interaction_type_map)

# Load the (user, item, score) triples into a Surprise dataset
reader = Reader(rating_scale=(1, 5))
data_cf = Dataset.load_from_df(data_df[['user_id', 'product_id',
                                        'interaction_type']], reader)

trainset, testset = train_test_split(data_cf, test_size=0.2, random_state=42)

# Train a basic k-nearest-neighbours model and report RMSE on the test set
algo_cf = KNNBasic()
algo_cf.fit(trainset)
predictions_cf = algo_cf.test(testset)
rmse_cf = accuracy.rmse(predictions_cf)

OUTPUT
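
To make the idea behind collaborative filtering concrete, here is a simplified from-scratch sketch of a user-based variant on the same eight interactions. It assumes cosine similarity between users and a similarity-weighted average of scores; it is not a reimplementation of the library model used above, and the names score, R, and predict_scores are assumptions of this example.

```python
import numpy as np
import pandas as pd

data_df = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3, 3, 4, 4],
    'product_id': [101, 102, 101, 103, 102, 104, 101, 103],
    'interaction_type': ['view', 'purchase', 'view', 'view',
                         'purchase', 'view', 'purchase', 'purchase'],
})
data_df['score'] = data_df['interaction_type'].map({'view': 1, 'purchase': 5})

# User-item matrix; 0 marks "no interaction".
R = data_df.pivot_table(index='user_id', columns='product_id',
                        values='score', fill_value=0).astype(float)

# Cosine similarity between users.
unit = R.values / np.linalg.norm(R.values, axis=1, keepdims=True)
sim = pd.DataFrame(unit @ unit.T, index=R.index, columns=R.index)

def predict_scores(user_id):
    """Similarity-weighted average score for products the user has not seen."""
    weights = sim.loc[user_id].drop(user_id)   # similarity to the other users
    others = R.loc[weights.index]              # the other users' scores
    rated = (others > 0).astype(float)         # who interacted with what
    numer = others.mul(weights, axis=0).sum()
    denom = rated.mul(weights, axis=0).sum()
    preds = (numer / denom).where(denom > 0)   # undefined if no similar user rated it
    unseen = R.columns[(R.loc[user_id] == 0).to_numpy()]
    return preds.loc[unseen].dropna().sort_values(ascending=False)

print(predict_scores(1))
```

For user 1 this ranks product 103 (predicted score about 3) ahead of product 104 (about 1): users 2 and 4 both overlap with user 1 on product 101 and both interacted with 103, while only user 3 interacted with 104, and only with a view.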

Recommendation Engine

Collaborative Filtering Recommendations:

● Collaborative filtering recommendations are generated based on
the preferences and interactions of similar users.

CODE

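# Uses data_df and the trained algo_cf from the Model Development step
def get_top_n_recommendations(user_id, n=10):
    # Products the user has already interacted with
    user_items = data_df[data_df['user_id'] == user_id]['product_id'].tolist()
    all_items = data_df['product_id'].unique().tolist()

    # Only score products the user has not seen yet
    items_to_predict = list(set(all_items) - set(user_items))
    predictions = [algo_cf.predict(user_id, item).est for item in items_to_predict]

    # Sort by estimated score, highest first, and keep the top n
    top_n_items = [x for _, x in sorted(zip(predictions, items_to_predict),
                                        reverse=True)][:n]
    return top_n_items

# Note: user 12345 does not appear in the sample data, so the estimates
# for this user are the library's default fallback values
user_id = 12345
top_n_recommendations = get_top_n_recommendations(user_id, n=10)
top_n_recommendations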

OUTPUT

Assumed Scenario

● Scenario: The e-commerce platform aims to enhance user
experience by providing personalized product recommendations.
● Objective: Deliver relevant and tailored product
recommendations to users.
● Target Audience: Users seeking personalized product
recommendations across various categories.

Conclusion

Phase 2 of the project focuses on preparing the dataset for building an
e-commerce product recommendation system. Using the data
wrangling techniques and system architecture described above, we aim
to develop a recommendation engine that delivers personalized product
recommendations to users, thereby enhancing their shopping
experience on the e-commerce platform.

CODE

import numpy as np

def get_top_n_recommendations(user_id, n=10):
    # Placeholder: draws n distinct product IDs at random (user_id is not used)
    all_products = [i for i in range(1, 1000)]
    top_n_items = np.random.choice(all_products, n, replace=False)
    return top_n_items

user_id = 12345
top_n_recommendations = get_top_n_recommendations(user_id, n=10)
print("Top 10 recommendations for user", user_id, ":", top_n_recommendations)

OUTPUT
