Assignment 2

Uploaded by

Divya Gajera

VRIJE UNIVERSITEIT AMSTERDAM

Assignment 2
Data Mining Techniques
A Real Life Competition

Deadline Competition: 19/05/2024, 23:59

Deadline Report: 24/05/2024, 23:59

INTRODUCTION
By now you should have a fair idea of the techniques we can use, and should also have some
practical experience with mining datasets. In this second assignment, you will gain more
experience, explore various techniques (and whether they work in this situation), and hopefully
learn a lot. The topic of this assignment lies in the area of recommender systems.
More specifically, your task is to predict which hotel a user is most likely to book. This could
greatly help companies such as Expedia (from which the dataset actually originates) to
organize the search results for a user in the most suitable way.

This document describes Assignment 2 of the Data Mining Techniques course at the VU.
Please make sure you read it thoroughly and carefully. This is a group task (maximum 3
members); please make sure all team members contribute to the work as expected. Three
things are to be submitted: (1) a report about the results; (2) a file to be uploaded to the
VU DMT Kaggle competition; and (3) a process report.

DATASET AND PROBLEM

The dataset can be downloaded from our in-class Kaggle website:
https://www.kaggle.com/competitions/dmt-2024-2nd-assignment. To sign up for the
competition, please follow this link: https://tinyurl.com/join-dmt-cup.
The dataset originates from a former Kaggle competition¹. Using the dataset from the
original Kaggle competition is not allowed. The data is split into a training and a test set,
train.csv and test.csv, each containing approximately 5 million records. Essentially, the dataset
contains information about a user's search query for a hotel, the hotel properties that resulted,
and, for the training set, whether the user clicked on the hotel and booked it. The fields that
are present are shown in Table 1².

Each line in the dataset represents a combination of a search query by a user with one specific
hotel property that was shown as part of the results. Of course, a list of hotels is presented to
the user (and hence, there are multiple lines describing a single search). Lines that belong to
the same user/search are identified by the same search ID (srch_id). The link between the fields
shown above and the Expedia site is shown graphically in Figures 1-3³.
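
The row layout described above (one line per search/property pair, grouped by srch_id) can be illustrated with a minimal sketch. It uses only the Python standard library; the function name `group_by_search` and the three sample rows are made up for illustration and are not part of the assignment material.

```python
import csv
import io
from collections import defaultdict


def group_by_search(csv_file):
    """Map each srch_id to the list of prop_id values shown for that search."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[int(row["srch_id"])].append(int(row["prop_id"]))
    return dict(groups)


# Tiny made-up sample using two of the columns from Table 1.
sample = io.StringIO("srch_id,prop_id\n1,893\n1,10404\n4,3625\n")
searches = group_by_search(sample)
print(searches)  # {1: [893, 10404], 4: [3625]}
```

For the real train.csv, an open file handle would be passed instead of the in-memory sample; with roughly 5 million records, a dataframe library is probably more practical than plain dictionaries.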

Figure 1: Search window

¹ https://www.kaggle.com/c/expedia-personalized-sort or go to Kaggle.com > Competitions > All
competitions > Personalize Expedia Hotel Searches ICDM 2013
² Primarily based on https://www.kaggle.com/c/expedia-personalized-sort/data or follow the
description above and select “data” from the menu.
³ Again based on Kaggle
Table 1: Description of the dataset (cf. Kaggle)
Field | Data Type | Description
srch_id | Integer | The ID of the search
date_time | Date/time | Date and time of the search
site_id | Integer | ID of the Expedia point of sale (i.e. Expedia.com, Expedia.co.uk, Expedia.co.jp, ...)
visitor_location_country_id | Integer | The ID of the country the customer is located in
visitor_hist_starrating | Float | The mean star rating of hotels the customer has previously purchased; null signifies there is no purchase history on the customer
visitor_hist_adr_usd | Float | The mean price per night (in US$) of the hotels the customer has previously purchased; null signifies there is no purchase history on the customer
prop_country_id | Integer | The ID of the country the hotel is located in
prop_id | Integer | The ID of the hotel
prop_starrating | Integer | The star rating of the hotel, from 1 to 5, in increments of 1. A 0 indicates the property has no stars, the star rating is not known or cannot be publicized
prop_review_score | Float | The mean customer review score for the hotel on a scale out of 5, rounded to 0.5 increments. A 0 means there have been no reviews, null that the information is not available
prop_brand_bool | Integer | +1 if the hotel is part of a major hotel chain; 0 if it is an independent hotel
prop_location_score1 | Float | A (first) score outlining the desirability of the hotel’s location
prop_location_score2 | Float | A (second) score outlining the desirability of the hotel’s location
prop_log_historical_price | Float | The logarithm of the mean price of the hotel over the last trading period. A 0 will occur if the hotel was not sold in that period
price_usd | Float | Displayed price of the hotel for the given search. Note that different countries have different conventions regarding displaying taxes and fees, and the value may be per night or for the whole stay
promotion_flag | Integer | +1 if the hotel had a sale price promotion specifically displayed
srch_destination_id | Integer | ID of the destination where the hotel search was performed
srch_length_of_stay | Integer | Number of nights stay that was searched
srch_booking_window | Integer | Number of days in the future the hotel stay started from the search date
srch_adults_count | Integer | The number of adults specified in the hotel room
srch_children_count | Integer | The number of (extra occupancy) children specified in the hotel room
srch_room_count | Integer | Number of hotel rooms specified in the search
srch_saturday_night_bool | Boolean | +1 if the stay includes a Saturday night, starts from Thursday, and has a length of stay less than or equal to 4 nights (i.e. weekend); otherwise 0
srch_query_affinity_score | Float | The log of the probability a hotel will be clicked on in Internet searches (hence the values are negative). A null signifies there are no data (i.e. the hotel did not register in any searches)
orig_destination_distance | Float | Physical distance between the hotel and the customer at the time of search. A null means the distance could not be calculated
random_bool | Boolean | +1 when the displayed sort was random, 0 when the normal sort order was displayed
comp1_rate | Integer | +1 if Expedia has a lower price than competitor 1 for the hotel; 0 if the same; -1 if Expedia’s price is higher than competitor 1; null signifies there is no competitive data
comp1_inv | Integer | +1 if competitor 1 does not have availability in the hotel; 0 if both Expedia and competitor 1 have availability; null signifies there is no competitive data
comp1_rate_percent_diff | Float | The absolute percentage difference (if one exists) between Expedia and competitor 1’s price (Expedia’s price the denominator); null signifies there is no competitive data
comp2_rate, comp2_inv, comp2_rate_percent_diff, ..., comp8_rate, comp8_inv, comp8_rate_percent_diff | | (same, for competitor 2 through 8)
Training set only
position | Integer | Hotel position on Expedia’s search results page. This is only provided for the training data, but not the test data
click_bool | Boolean | 1 if the user clicked on the property, 0 if not
booking_bool | Boolean | 1 if the user booked the property, 0 if not
gross_booking_usd | Float | Total value of the transaction. This can differ from the price_usd due to taxes, fees, conventions on multiple day bookings and purchase of a room type other than the one shown in the search

Figure 2: Hotel result

Figure 3: Cost overview

DETAILED TASK DESCRIPTION

To make things easier, we will use a DM process model to describe your task in a bit more
detail, similar to what you have seen in assignment 1, and during the lectures.

TASK 1: BUSINESS UNDERSTANDING

Your task is to predict which of the hotel properties returned by a user's search the user is
most likely to click on. Of course, other people have worked on such predictions. Can you
find others who have tried to make such predictions (e.g. from the Kaggle
competition)? What did they use as their most prominent predictors? Have other participants
in the competition mentioned anything about their approaches? Please spend a
couple of paragraphs on this topic in a 'Related Work' section in your report.

TASK 2: DATA UNDERSTANDING

In this task, you will do some exploratory data analysis (EDA). Explore the dataset, count,
summarize, plot things, and report findings that are useful for creating predictions.
Remember that EDA is not necessarily done only once at the start of the project. It is expected
that you do some EDA, build some features, train some models, then, when some idea comes up,
do some more EDA, modify your features, train another model on these new features, and so on.
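
As one possible starting point for the counting and summarizing mentioned above, the sketch below computes the fraction of missing values per column using only the standard library. How nulls are encoded in the CSV files is an assumption here (empty string, "NULL" or "NaN"); the function name `missing_fraction` and the sample rows are made up for illustration.

```python
import csv
import io


def missing_fraction(csv_file, null_tokens=("", "NULL", "NaN")):
    """Return {column: fraction of rows in which the value is missing}.

    null_tokens is an assumption about how nulls are written in the file.
    """
    reader = csv.DictReader(csv_file)
    fieldnames = reader.fieldnames or []
    counts = {name: 0 for name in fieldnames}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in fieldnames:
            value = row[name]
            if value is None or value.strip() in null_tokens:
                counts[name] += 1
    if n_rows == 0:
        return {name: 0.0 for name in fieldnames}
    return {name: counts[name] / n_rows for name in fieldnames}


# Made-up sample with a few of the columns from Table 1.
sample = io.StringIO(
    "srch_id,visitor_hist_starrating,prop_review_score,click_bool\n"
    "1,NULL,4.5,0\n"
    "1,NULL,,1\n"
    "2,3.5,4.0,0\n"
    "2,NULL,3.0,0\n"
)
fractions = missing_fraction(sample)
print(fractions)
```

Columns with a high missing fraction are natural candidates for the "do they still provide useful information?" question raised in Task 3.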

TASK 3: DATA PREPARATION

You’ll certainly need to work on the dataset to create, modify or add new features. For
instance, you might want to compare the different properties that resulted from the search
instead of learning from them one by one. Certain attributes have a large number of
missing values: do they still provide useful information? And how will you handle a missing
value if this turns out to be the case? Finally, in order to test your approach (since you do not
know the answers for the test set), you will need to split up your data to test your approach
yourself before you generate your answers on the test set.
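
One way to create such a hold-out set is to split on the search ID rather than on individual rows, so that all properties of one search end up on the same side. This is a sketch of that idea, not a prescribed method; the function name `split_search_ids`, the 20% validation fraction and the seed are illustrative assumptions.

```python
import random


def split_search_ids(search_ids, validation_fraction=0.2, seed=42):
    """Split the unique search IDs into disjoint training and validation sets."""
    unique_ids = sorted(set(search_ids))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique_ids)
    n_val = int(round(len(unique_ids) * validation_fraction))
    val_ids = set(unique_ids[:n_val])
    train_ids = set(unique_ids[n_val:])
    return train_ids, val_ids


# Made-up srch_id column: one entry per row, so IDs repeat.
row_search_ids = [1, 1, 1, 2, 2, 3, 4, 4, 5, 5]
train_ids, val_ids = split_search_ids(row_search_ids)
print(len(train_ids), len(val_ids))  # 4 1
```

Rows are then assigned to the training or validation part by checking which set their srch_id belongs to. Splitting by row instead would scatter a single search across both parts, which makes a per-query ranking score on the validation part hard to interpret.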
Of course, you are also allowed to use external data sources if you find ones that are useful.
In case you get inspired by previous work on this competition, please make sure you properly
cite the sources you base your approach on.
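
As an illustration of comparing the properties within one search instead of treating rows one by one, the sketch below adds two relative price features per row. The feature names `price_rank_in_search` and `price_minus_search_mean` are hypothetical, and the sample rows are made up; whether such features help is something to verify on your own validation data.

```python
from collections import defaultdict


def add_relative_price_features(rows):
    """Return copies of rows with price features relative to the same search.

    rows: list of dicts that contain at least 'srch_id' and 'price_usd'.
    """
    prices_by_search = defaultdict(list)
    for row in rows:
        prices_by_search[row["srch_id"]].append(row["price_usd"])

    enriched = []
    for row in rows:
        prices = prices_by_search[row["srch_id"]]
        mean_price = sum(prices) / len(prices)
        new_row = dict(row)
        # Rank 1 is the cheapest property shown for this search.
        new_row["price_rank_in_search"] = 1 + sum(p < row["price_usd"] for p in prices)
        new_row["price_minus_search_mean"] = row["price_usd"] - mean_price
        enriched.append(new_row)
    return enriched


rows = [
    {"srch_id": 1, "prop_id": 10, "price_usd": 100.0},
    {"srch_id": 1, "prop_id": 11, "price_usd": 150.0},
    {"srch_id": 1, "prop_id": 12, "price_usd": 80.0},
    {"srch_id": 2, "prop_id": 10, "price_usd": 200.0},
]
features = add_relative_price_features(rows)
print([r["price_rank_in_search"] for r in features])  # [2, 3, 1, 1]
```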

TASK 4: MODELING AND EVALUATION

Naturally, once you have prepared the dataset, you should be able to build models. You have great
freedom to select the techniques you feel are most appropriate and should try at least two
different techniques. Use at least one technique that has been discussed during the lecture
on Recommender Systems, or a variant thereof. The choice of techniques or alternatives
you try might be influenced by how we would like to measure your predictions at the end
(described later in this document). To see how your model compares to others, you can
upload your answers for the test set to the in-class Kaggle website:
https://www.kaggle.com/competitions/dmt-2024-2nd-assignment (see the previous instructions
for signing up using https://tinyurl.com/join-dmt-cup). Note that the score on Kaggle is only
for part of the test set; the score on the rest of the test set will only be disclosed at the
final lecture and will form part of your final grade.

TASK 5: DEPLOYMENT
While we do not go into real deployment, we do want you to go into a brief discussion on how
Expedia could use your approach to deploy it on their systems in a scalable way, knowing that
they have much more data available and also that the characteristics of the data can change
over time. Discuss this in the light of the methods that have been introduced in the big data
engineering and big data infrastructures lectures.

DELIVERABLES
We have covered the process above; let us now see what we expect you to deliver.

PREDICTIONS
You’ll need to submit your prediction file on the in-class Kaggle website. The file ranks the
properties belonging to a user search by the likelihood that the property will be booked:
for each search, start with the property most likely to be booked. An example of part of
such a file is shown below.

SearchId ,PropertyId
2, 7771
2, 26540
2, 25579
2, 7374
2, 131173
2, 37331
2, 27090
2, 12938
2, 78858
2, 30434
2, 91899
2, 3105
2, 6399
3, 130729
3, 103937
3, 55688
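
A file in this shape could be produced from model scores along the following lines. This is a sketch under assumptions: the scores are made up, the function name `write_submission` is hypothetical, and the header is written as `SearchId,PropertyId` without spaces on the assumption that the spaces in the example above are typesetting artifacts. Please check the exact format against the sample submission on the Kaggle site.

```python
import csv
import io
from collections import defaultdict


def write_submission(scored_rows, out_file):
    """Write one line per (search, property), best-scored property first.

    scored_rows: iterable of (srch_id, prop_id, score) tuples, where a
    higher score means the property is predicted more likely to be booked.
    """
    by_search = defaultdict(list)
    for srch_id, prop_id, score in scored_rows:
        by_search[srch_id].append((score, prop_id))

    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["SearchId", "PropertyId"])  # header spelling is an assumption
    for srch_id in sorted(by_search):
        ranked = sorted(by_search[srch_id], key=lambda pair: pair[0], reverse=True)
        for _score, prop_id in ranked:
            writer.writerow([srch_id, prop_id])


# Made-up scores for a few of the properties from the example above.
scored = [(2, 26540, 0.31), (2, 7771, 0.87), (3, 130729, 0.55), (2, 25579, 0.12)]
buffer = io.StringIO()
write_submission(scored, buffer)
print(buffer.getvalue())
```

To write an actual file, replace the in-memory buffer with `open("submission.csv", "w", newline="")`; the file name is only an example.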

Please make sure to submit as a team with team name in the format VU-DM-2024-Group-x,
replacing x with the group number from Canvas, e.g. VU-DM-2024-Group-132. This way,
we can take your score into account when grading and when identifying the winner of the
competition.
The deadline for submitting the predictions on Kaggle is 19/05/2024, 23:59.

SCIENTIFIC REPORT
The assignment is not only about winning, but also about the quality of the process and your
understanding of what you did. Therefore, we would like you to write a report, which should
contain the following:

1. What you did (you might want to follow the process model, and describe the steps you
took. If you tried a number of things but only some worked, please mention those that
did not work as well, and discuss why they might not have worked).

2. A discussion on scalable deployment of your approach.

3. What you learned (either inside the main part of the report or separately in a
paragraph or two, please describe what skills and knowledge you have gained from this
assignment, what the main difficulties were, the expected and unexpected outcomes of your
experiments, etc.).

4. Please format the document according to the LNCS guidelines. Templates are available
on Canvas for both LaTeX and Microsoft Word; do not deviate from these templates.
Note that you do not need to include an abstract in your report. The paper should not
exceed 14 pages including all figures and tables, but excluding references (references do
not count towards the number of pages, to encourage you to cite all relevant work). With the
page limit, the aim is to challenge you to report only what is necessary. Make sure we
can identify your report, i.e., your group number, names and student numbers should
be in the document’s header.

PROCESS REPORT
As the assignment is done in a group, we would like to get insight into what each individual
group member contributed to the eventual result. Therefore we ask you to compose a process
report of at most 2 pages (using the same template) which addresses:

1. A schedule describing when you performed what task (e.g. on April 28 we explored the
dataset and looked for suitable approaches for the task at hand).

2. Who contributed to what task (e.g. Angie was responsible for transforming the dataset
into a suitable format for the algorithms chosen whereas Berend was working on the
report).

3. A critical reflection of the overall cooperation within the team.

EVALUATION AND GRADING

Here’s how you will get rewarded for your work. 80% of the mark can be achieved by
submitting a nice and thorough report; 20% will come from where you end up in the competition.
The process report will be used to make sure everyone contributed enough. In case of a clearly
unequal contribution, grade differentiation will be applied within the group. The deadline for
the reports is 24/05/2024, 23:59. The two reports should be submitted via Canvas, while your
prediction file should be uploaded to our Kaggle competition site by the deadline of 19/05/2024,
23:59. Regarding the competition-based marks, scores will be computed based on your
results: the winner gets a 10, and a performance equal to random gives you a 4. The
evaluation of the competition is explained in more detail below, as is the final
presentation session. Furthermore, a detailed grading scheme can be found in Table 2.

WINNING THE COMPETITION

The winner will be rewarded with the fame and glory of winning the 2024 VU Data Mining
Techniques cup. Your accuracy score will be determined as follows (cf. Kaggle):
The evaluation metric for this competition is Normalized Discounted Cumulative Gain (NDCG)@5,
calculated per query and averaged over all queries, with the values weighted by the log_2
function. See https://en.wikipedia.org/wiki/Discounted_cumulative_gain for more
details.
Hotels for each user query are assigned relevance grades as follows:

• 5 - The user purchased a room at this hotel

• 1 - The user clicked through to see more information on this hotel

• 0 - The user neither clicked on this hotel nor purchased a room at this hotel

Submissions for each user query should recommend hotels in order from the highest grade
(most likely to purchase a hotel room) to the lowest grade (least likely to purchase a hotel
room or click on the hotel). We know that the correct values for the test set are available
online; of course, you are not allowed to use them. If we suspect that you used those values,
we will ask you for your code and check whether your results are reproducible using the
training set as a basis for generating your predictive model. You should upload your
predictions to the in-class Kaggle website:
https://www.kaggle.com/competitions/dmt-2024-2nd-assignment.
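
To evaluate a model locally on a hold-out set, the metric and relevance grades described above can be sketched as follows. The handout does not state which gain function is used, so the exponential gain (2^rel - 1) is an assumption here, with the linear gain available as an alternative; scoring a query without any click or booking as 0 is likewise an assumption. All function names are illustrative.

```python
import math


def relevance(click_bool, booking_bool):
    """Relevance grade from the assignment: 5 = booked, 1 = clicked, 0 = neither."""
    if booking_bool:
        return 5
    if click_bool:
        return 1
    return 0


def dcg_at_k(relevances, k=5, exponential_gain=True):
    """Discounted cumulative gain over the first k positions (log2 discount)."""
    total = 0.0
    for position, rel in enumerate(relevances[:k], start=1):
        gain = (2 ** rel - 1) if exponential_gain else rel
        total += gain / math.log2(position + 1)
    return total


def ndcg_at_k(ranked_relevances, k=5, exponential_gain=True):
    """NDCG@k for one query; ranked_relevances follows the predicted order."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k, exponential_gain)
    if ideal == 0:
        return 0.0  # assumption: a query with no click or booking scores 0
    return dcg_at_k(ranked_relevances, k, exponential_gain) / ideal


def mean_ndcg(queries, k=5, exponential_gain=True):
    """Average NDCG@k over all queries (each query is a list of grades)."""
    scores = [ndcg_at_k(q, k, exponential_gain) for q in queries]
    return sum(scores) / len(scores)


perfect = ndcg_at_k([5, 1, 0, 0])   # booked hotel ranked first
swapped = ndcg_at_k([0, 5])         # booked hotel ranked second
print(perfect, swapped)
```

A booked hotel placed second instead of first yields 1 / log2(3), roughly 0.63, which shows how strongly the top positions are weighted.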

CLOSING EVENT
The presentation of your final assignment will take place during the closing event. Here, we
will present the final outcome of the competition (given the predictions you handed in the
weekend before) and hand out the cup and the fame and glory to the lucky winners. Six
groups will be asked to present their work: the groups that ended up in the top 3 and three
additional random groups. The closing event will take place on May
28th 2024 between 15:30 and 17:15 in room HG-01C (Aula).

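Table 2: Grading scheme

Task | Grading Component | Weight
Scientific report | Dataset statistics | 10
Scientific report | Plots | 10
Scientific report | Rationale and interpretation | 10
Scientific report | Dataset pre-processing: report a replicable process of feature engineering | 5
Scientific report | Rationale for feature engineering | 5
Scientific report | Algorithm: which/why/how it works | 10
Scientific report | Parameters of algorithm used | 5
Scientific report | Evaluation of model created | 10
Scientific report | Quality of the writing | 10
Scientific report | Final model description | 10
Scientific report | Deployment in real life with big data engineering and infrastructure | 10
Scientific report | What you learned | 5
Scientific report | Extra page | -10
Scientific report | Wrong formatting | -10
Scientific report | Total | 100 (Weight 80%)
Kaggle ranking | Total | 100 (Weight 20%)
Total | | 100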
