0% found this document useful (0 votes)
121 views

Hackathon Problem Statement

An e-commerce company wants to predict the top 3 categories each user might purchase from in the future based on their past transaction data. The training dataset contains user_id, product purchased, and order value for each transaction. Participants must predict the top 3 categories for each user in the test dataset and submit it in a CSV file with user_id and the 3 predicted categories (pred3) by July 17th. Submissions will be evaluated based on mean reciprocal rank and precision metrics which measure how accurate the top 3 predictions are for each user compared to actual future purchases.

Uploaded by

Tanmay Singh
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
121 views

Hackathon Problem Statement

An e-commerce company wants to predict the top 3 categories each user might purchase from in the future based on their past transaction data. The training dataset contains user_id, product purchased, and order value for each transaction. Participants must predict the top 3 categories for each user in the test dataset and submit it in a CSV file with user_id and the 3 predicted categories (pred3) by July 17th. Submissions will be evaluated based on mean reciprocal rank and precision metrics which measure how accurate the top 3 predictions are for each user compared to actual future purchases.

Uploaded by

Tanmay Singh
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 4

Problem Statement

An e-commerce company wants to recommend products to its users.


The company has collected only transaction data in the past. The
training dataset has only 3 columns - user_id, Product bought and
Order value of the product. Using this dataset, predict for all the users
in the training dataset, the top 3 categories that the user might buy
from.

Training dataset sample

aov = Order Value of the product

category = Product Category where the purchase was made

What do you need to predict?

For each user, predict the top 3 probable product categories that they
may purchase from, in the future.

Timeline
DEADLINE EXTENDED

21 DAYS LEFT

SUBMISSIONS OPEN SAT JUN 26

LAST DATE SAT JUL 17


Training Data

This file contains the detailed purchasing history for every user. It has
order value and the category of the product.

Training Data Target

This file contains data for some users about the category of items they
bought in future.

Test Data

This file contains the detailed purchasing history for some users. It has
the order value and the category of the product. You have to predict
the top 3 categories that the users with these user_ids will purchase
from in the future.

Evaluation
Measurements will be based on mean relevance rank
(mrr) and precision. Both the measurements are explained here.

Mean Relevance Rank

User Reciproca |
Products in the order shown Product bought
id Rank
E-readers, Kitchen Supplies, Phones, Comics,
1 1/3
Technology books Technology Books
2 Phones, Comics, Fruits None 0
3 Groceries, Fruits, Phones None 0
Fruits, Home Decor,
4 Phones, Home Decor, readers 1/2
readers
Home Decor, Home Furnishings,
5 Phones, Books, Fruits 0
Kitchen Supplies

Technology Books is the 3rd top prediction for user_id 1 and that is


the one bought by the user - hence the reciprocal rank is 1/rank of the
right prediction which is 1/3. If there is more than one product
matching, the reciprocal rank still takes only the first matching product.
For instance - user_id 4 though both Home Decor and readers are
matching, the first match product is at position 2 and hence the
reciprocal rank is 1/2. Once we get the reciprocal ranks, we do an
average of the reciprocal ranks to get the mean reciprocal rank.
Final MRR
 = ( ⅓ + ½ ) / 2 = 0.41666

Precision or Accuracy

We first find the Number of products in the prediction in each row that
matches with the number of products of the user_id. We then average
this number across all valid predictions. For the above table, precision
would look like -

User
Products in the order shown Product bought Precision
id
E-readers, Kitchen Supplies, Phones, Comics, Technology
1 1
Technology books Books
2 Phones, Comics, Fruits None NA
3 Groceries, Fruits, Phones None NA
4 Phones, Home Decor, readers Fruits, Home Decor, readers 2
Home Decor, Home Furnishings,
5 Phones, Books, Fruits NA
Kitchen Supplies

For user_id 1, one product matched and for user_id 4, two products


matched. So, accuracy is

number of items that matched / number of unique users with a


prediction.

Here it will be 3/2 = 1.5

Recall in this case, the number of items for which there is a prediction =
2/5 = 0.4

Ready to submit?

Submissions should be made in the same format as the sample


submission provided.

Sample Submission
Submissions should be made in the same format as the sample
provided.

Sample Prediction Dataset

Prediction dataset should be a .csv file with 19,981 rows (and one row for
headers) and the columns user_id and pred3 in the same format as the file
below.

You might also like