Uploaded by Pranav Khurana

GoTo Data Science Recruiting Assignment: Solution Approach

The solution has primarily achieved the following –


1. Ensuring seamless end-to-end execution of the pipeline
2. Augmenting the data merging script based on business and intent understanding
3. Implementing a feature capturing historical completed trips for each driver
4. Capturing and storing the model performance metrics
5. Improving performance of the classification model by hyperparameter tuning
6. Miscellaneous semantic changes

Detailed Walkthrough –

1. Initially, the code was setting the target to 1 wherever the merged dataset had
"ACCEPTED" in the `participant_status` column. However, this was incorrect
because a "CREATED" event is logged whenever the system polls a driver, after
which the driver either ACCEPTS, REJECTS, or IGNORES the request. Including
"CREATED" rows biased the target towards 0. To correct this, I removed the
"CREATED" rows and set the target to 0 only for "REJECTED" or "IGNORED"
statuses, as our goal is to maximize the "ACCEPTED" responses.
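The corrected target definition can be sketched as follows. This is a minimal illustration, not the assignment's actual script: the dataframe contents are made up, and only the column name `participant_status` and the renamed target `is_accepted` come from the write-up.

```python
import pandas as pd

# Toy stand-in for the merged dataset described above (values are illustrative).
merged_df = pd.DataFrame({
    "driver_id": [1, 2, 3, 4, 5],
    "participant_status": ["CREATED", "ACCEPTED", "REJECTED", "IGNORED", "CREATED"],
})

# Drop "CREATED" rows: they only record that a driver was polled,
# not how the driver responded, so keeping them biases the target to 0.
responses = merged_df[merged_df["participant_status"] != "CREATED"].copy()

# Target is 1 for ACCEPTED and 0 for REJECTED / IGNORED.
responses["is_accepted"] = (responses["participant_status"] == "ACCEPTED").astype(int)
```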

2. To implement the new feature capturing the track record of drivers, I retrieved the
number of unique rides COMPLETED by each driver from the booking_log
database. I then merged this data with the master database on the driver_id column
and dropped the null values.
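A sketch of that feature construction is below. The dataframes and the column names `order_id` and `booking_status` are assumptions for illustration; `driver_id`, the `booking_log` source, and the COMPLETED filter follow the description above.

```python
import pandas as pd

# Toy stand-ins for the booking_log and master tables (values are illustrative).
booking_log = pd.DataFrame({
    "driver_id": [1, 1, 2, 2, 2, 3],
    "order_id": ["a", "b", "c", "c", "d", "e"],
    "booking_status": ["COMPLETED"] * 5 + ["CANCELLED"],
})
master = pd.DataFrame({"driver_id": [1, 2, 4]})

# Count unique COMPLETED rides per driver.
completed = (
    booking_log[booking_log["booking_status"] == "COMPLETED"]
    .groupby("driver_id")["order_id"]
    .nunique()
    .rename("historical_completes")
    .reset_index()
)

# Merge onto the master table on driver_id, then drop drivers with no history.
master = master.merge(completed, on="driver_id", how="left").dropna()
```

Note that `nunique()` deduplicates repeated order IDs, so a ride logged twice is counted once.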

3. To evaluate the model, I used accuracy, precision, recall, and F1 score metrics from
the `sklearn` library. I defined a new `predict_class` function in the
`SklearnClassifier` class to return the predicted classes instead of the predicted
probabilities, which the current `predict` function was computing.
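The wrapper change can be sketched roughly as follows. The real `SklearnClassifier` class in the assignment is larger; this shape, the wrapped model, and the synthetic data are all assumptions, kept only to show the `predict` / `predict_class` split and the four metrics.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

class SklearnClassifier:
    """Minimal sketch of the wrapper described above (assumed shape)."""

    def __init__(self, model):
        self.model = model

    def fit(self, X, y):
        self.model.fit(X, y)

    def predict(self, X):
        # Existing behaviour: probability of the positive class.
        return self.model.predict_proba(X)[:, 1]

    def predict_class(self, X):
        # New method: hard class labels, needed by the metrics below.
        return self.model.predict(X)

# Synthetic data purely for demonstration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)

clf = SklearnClassifier(RandomForestClassifier(random_state=0))
clf.fit(X, y)
y_pred = clf.predict_class(X)

metrics = {
    "accuracy": accuracy_score(y, y_pred),
    "precision": precision_score(y, y_pred),
    "recall": recall_score(y, y_pred),
    "f1": f1_score(y, y_pred),
}
```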

4. The model was performing well with the predefined parameters on this data, but I
altered the hyperparameters, reducing the max_depth attribute to prevent
overfitting, which resulted in a better score on the test data. Note that without the
changes to the code specified in Step 1, the model was performing poorly on the
test data, reaffirming that the data was initially biased towards 0.
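The max_depth adjustment can be illustrated as below. The model type, dataset, and the particular depth values are assumptions; only the idea of capping max_depth to reduce overfitting comes from the write-up, so no claim is made here about which variant scores higher on any specific data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data purely for demonstration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An uncapped tree can memorise the training data; limiting max_depth
# trades some training accuracy for better generalisation.
deep = RandomForestClassifier(max_depth=None, random_state=0).fit(X_train, y_train)
shallow = RandomForestClassifier(max_depth=5, random_state=0).fit(X_train, y_train)

scores = {
    "deep_test": deep.score(X_test, y_test),
    "shallow_test": shallow.score(X_test, y_test),
}
```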

5. Finally, I made some semantic changes, including adding relevant comments
wherever necessary, adding the new historical_completes column to the config file,
and renaming the target from is_complete to is_accepted.
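The config additions might look something like the fragment below. The key names other than `historical_completes` and `is_accepted` are hypothetical, since the assignment's actual config schema is not shown here.

```yaml
features:
  - historical_completes   # new driver track-record feature (Step 2)
target: is_accepted        # renamed from is_complete (Step 1)
```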
