Problem Statement

You have been provided with a Seattle Airbnb dataset and asked to predict listing prices in the test set using a model of your choice. You are to perform exploratory data analysis to understand relationships between variables, conduct any needed data engineering, test several models and select a final model to make predictions. Your submission should include a Jupyter notebook documenting your process and a CSV file with predictions for the test set.

Uploaded by

suryansh
Copyright
© All Rights Reserved

You have been provided with the Seattle AirBnb dataset of listings data. You will need to predict
the prices of listings shared in the test dataset. A natural use case for this would be in helping
people price their listings. In your assignment, feel free to use either Python or R, with any
libraries of your choice. You will be evaluated on being able to justify your solution during the
interview, explain the underlying mathematics and statistical phenomena, and make accurate
predictions.

Task Description

1. Download a sample of the Seattle AirBnb listings dataset linked here (original data can be
found here).

2. The goal of the assignment is to predict the prices of AirBnb listings from the test set, using a
model carefully selected by you, trained, tested, and explained.

3. Conduct some exploratory data analysis and understand the relationships between potential
predictors. Document your EDA in a notebook.

4. Note that this is not a particularly large dataset. You will be partially scored based on your
ability to perform ETL on the dataset. Describe what you have done for ETL in 3-4 sentences.

5. Try out a few different models (use your judgement after doing the EDA), and note down why
you have tried each one (2-3 sentences describing the “why” is enough).

6. Pick your final model, and explain why this model is better than the others. Train it, test it, and
list out your analyses (4-5 sentences, or more if required). Finally, run your predictions on the
real test set provided above.

Submission

Your final submission should have two files, as follows:

1. Notebook with the following components and partial scores for each component:

a. EDA - documented in the notebook, with graphs describing correlations between variables, potential
predictors, initial analyses on the data, and feature engineering (if any)

b. Data Engineering - documented in the notebook, in a few sentences describing the ETL
process and any data engineering that was performed

c. Initial Modelling - a few models run on smaller folds of the dataset, with explanations for why
each model was experimented with

d. Model Selection - analyses around output from each of the models initially selected, and
justification for selecting one model over the others you had initially contemplated

2. CSV file of listing prices for the test set:

a. Final Predictions - each listing from the data set and the model-predicted price (2 columns: id, price)
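
As a rough illustration of how the Initial Modelling, Model Selection, and Final Predictions components might fit together, here is a minimal Python sketch using pandas and scikit-learn. It is not a reference solution. The data is synthetic, the feature columns (`accommodates`, `bedrooms`, `bathrooms`), the two candidate models, the `make_listings` helper, and the output filename `predictions.csv` are all assumptions made for the example; only the two output columns (id, price) come from the brief.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)


def make_listings(n, start_id):
    """Build a synthetic stand-in for listings data (hypothetical columns)."""
    df = pd.DataFrame({
        "id": np.arange(start_id, start_id + n),
        "accommodates": rng.integers(1, 9, size=n),
        "bedrooms": rng.integers(0, 5, size=n),
        "bathrooms": rng.integers(1, 4, size=n),
    })
    # Assumed price relationship, used only to give the models something to fit.
    df["price"] = (40 + 20 * df["accommodates"] + 15 * df["bedrooms"]
                   + rng.normal(0, 10, size=n))
    return df


train = make_listings(300, start_id=0)
test = make_listings(50, start_id=1000).drop(columns="price")
features = ["accommodates", "bedrooms", "bathrooms"]

# Candidate models to compare; the choice of these two is an assumption.
candidates = {
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Score each candidate with 5-fold cross-validated RMSE on the training data.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_rmse = {}
for name, model in candidates.items():
    scores = cross_val_score(model, train[features], train["price"],
                             cv=cv, scoring="neg_root_mean_squared_error")
    cv_rmse[name] = -scores.mean()

# Refit the candidate with the lowest cross-validated RMSE on all training rows.
best_name = min(cv_rmse, key=cv_rmse.get)
final_model = candidates[best_name].fit(train[features], train["price"])

# Write the two-column submission file (id, price).
submission = pd.DataFrame({
    "id": test["id"],
    "price": final_model.predict(test[features]),
})
submission.to_csv("predictions.csv", index=False)
```

Cross-validated RMSE is only one possible selection criterion. With a small dataset, the spread of the fold scores and the interpretability of each model are also worth discussing when justifying the final choice.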
