FraudAnalystTakehomeTestv4 1
Instructions
This assignment has a time limit of 24 hours. You are required to complete the following sections:
1. Data Analysis
2. SQL
3. Fraud Domain Knowledge
For each section, please write your answers in the corresponding boxes on the right. You may expand them if you need more space.
You may also submit your solution in other formats (e.g., PowerPoint, PDF, Markdown), as long as all sections are present within a single document.
GOOD LUCK!! :)
For this section, investigate and summarise your findings as though you are presenting to a group of staff who will take action on the fraud you find.
Please include any pivot tables and graphs that might support your analysis and represent your thinking process, and state all assumptions made.
Finally, you may use any outside resources necessary to help your answer.
Overall, you will be evaluated on your reasoning and findings, as well as clarity of explanation.
Problem A
You are looking at the user logs of six agents (Lila, Ricky, Tono, Mamat, Bayu, and
Sarah) who are registering new drivers for one week (8-14 May 2017).
Based on the data, are there any anomalies that you identify? Please state what the
anomalies are, and explain why.
Problem B
You are given a set of extracted data on the completed bookings between customers
and drivers, over a one-week period, for a GO-FOOD merchant called Warkop ABC.
Based on the data, are there any fraudulent acts that you identify from this data?
Please state details (ID and name) of all fraudulent entities that you identify, and
elaborate on any patterns of fraud you find concerning them.
You may use any SQL dialect, although we prefer BigQuery Standard-SQL.
Please comment your code where necessary. Higher scores will be given for both efficiency and readability.
Problem C
Please write a query to return the order_no from all rows where customers made
concurrent bookings (i.e. where a customer made a new booking before they
completed or cancelled the previous booking). Please return the unique order
numbers of all concurrent bookings in ascending order.
Note that there are millions of bookings created in Gojek daily, hence the query
should be efficient.
Sample Output
10000
10001
Explanation
Only Customer A had a concurrent booking. The correct answer would return
order_no #10000 and #10001.
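To make the definition of a concurrent booking concrete, here is a minimal sketch, not a model answer. The table layout is not reproduced in this document, so the `bookings` table, its columns (`order_no`, `customer_id`, `booking_time`, `end_time`) and the four sample rows are all assumptions, chosen only so that the result matches the sample output above. `end_time` stands for the completion or cancellation time. The query runs on SQLite through Python so it can be executed locally; the window functions it uses (`MAX ... OVER`, `LEAD`) also exist in BigQuery Standard SQL, though the syntax has not been verified there.

```python
import sqlite3

# Hypothetical schema and data: assumptions made for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE bookings (
           order_no     INTEGER PRIMARY KEY,
           customer_id  TEXT,
           booking_time TEXT,  -- when the booking was created
           end_time     TEXT   -- when it was completed or cancelled
       )"""
)
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?, ?)",
    [
        (10000, "A", "2017-05-08 09:00", "2017-05-08 09:30"),
        (10001, "A", "2017-05-08 09:10", "2017-05-08 09:40"),  # starts before 10000 ends
        (10002, "B", "2017-05-08 10:00", "2017-05-08 10:20"),
        (10003, "B", "2017-05-08 10:30", "2017-05-08 10:50"),  # starts after 10002 ends
    ],
)

CONCURRENT_SQL = """
WITH w AS (
    SELECT
        order_no,
        booking_time,
        end_time,
        -- latest end time among this customer's earlier bookings
        MAX(end_time) OVER (
            PARTITION BY customer_id
            ORDER BY booking_time, order_no
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        ) AS prev_max_end,
        -- creation time of this customer's next booking
        LEAD(booking_time) OVER (
            PARTITION BY customer_id
            ORDER BY booking_time, order_no
        ) AS next_start
    FROM bookings
)
SELECT order_no
FROM w
WHERE prev_max_end > booking_time   -- an earlier booking was still open
   OR next_start < end_time         -- the next booking began before this one ended
ORDER BY order_no
"""


def concurrent_orders(connection):
    """Return order numbers of overlapping bookings, ascending."""
    return [row[0] for row in connection.execute(CONCURRENT_SQL)]


print(concurrent_orders(conn))
```

The sketch sorts each customer's bookings once and compares every row only with a running maximum and its immediate successor, which avoids a self-join on the bookings table. It assumes one row per booking, so no explicit de-duplication is applied.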
Problem D
You are given two tables, one on booking orders for trips, and the other on driver
information.
Please identify all customers who have had at least 60% of their GO-CAR bookings
completed by the same driver within the last 30 days. Then, please return the unique
full name of those drivers in alphabetical order.
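The 60% condition can be read in more than one way, so the following is only a sketch under stated assumptions, not a reference solution. The two tables are not reproduced in this document; the table names, the columns (`service_type`, `status`, `booking_date`, `full_name` and so on), the driver names and all rows are hypothetical. The sketch assumes the denominator is all of a customer's GO-CAR bookings in the window and the numerator is those completed by one driver. "Last 30 days" is measured from a supplied `as_of` date rather than the current date so the result is reproducible.

```python
import sqlite3

# Hypothetical schema and data: assumptions made for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE drivers (driver_id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE bookings (
        order_no INTEGER PRIMARY KEY, customer_id TEXT, driver_id INTEGER,
        service_type TEXT, status TEXT, booking_date TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO drivers VALUES (?, ?)",
    [(1, "Budi Santoso"), (2, "Agus Wijaya"), (3, "Citra Dewi")],
)
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Customer X: 3 of 5 GO-CAR bookings completed by driver 1 (exactly 60%)
        (1, "X", 1, "GO-CAR", "COMPLETED", "2017-06-10"),
        (2, "X", 1, "GO-CAR", "COMPLETED", "2017-06-11"),
        (3, "X", 1, "GO-CAR", "COMPLETED", "2017-06-12"),
        (4, "X", 2, "GO-CAR", "COMPLETED", "2017-06-13"),
        (5, "X", 3, "GO-CAR", "CANCELLED", "2017-06-14"),
        # Customer Y: 50% each for two drivers, below the threshold
        (6, "Y", 2, "GO-CAR", "COMPLETED", "2017-06-15"),
        (7, "Y", 3, "GO-CAR", "COMPLETED", "2017-06-16"),
        # Customer Z: one booking too old, one for a different service
        (8, "Z", 3, "GO-CAR", "COMPLETED", "2017-05-01"),
        (9, "Z", 3, "GO-RIDE", "COMPLETED", "2017-06-20"),
        # Customer W: a single GO-CAR booking, so driver 2 has 100%
        (10, "W", 2, "GO-CAR", "COMPLETED", "2017-06-21"),
    ],
)

REPEAT_DRIVER_SQL = """
WITH recent AS (          -- GO-CAR bookings in the 30 days up to :as_of
    SELECT customer_id, driver_id, status
    FROM bookings
    WHERE service_type = 'GO-CAR'
      AND booking_date >= date(:as_of, '-30 days')
      AND booking_date <= :as_of
),
totals AS (               -- denominator: all GO-CAR bookings per customer
    SELECT customer_id, COUNT(*) AS total
    FROM recent
    GROUP BY customer_id
),
pairs AS (                -- numerator: completed bookings per customer and driver
    SELECT customer_id, driver_id, COUNT(*) AS completed
    FROM recent
    WHERE status = 'COMPLETED'
    GROUP BY customer_id, driver_id
)
SELECT DISTINCT d.full_name
FROM pairs AS p
JOIN totals AS t ON t.customer_id = p.customer_id
JOIN drivers AS d ON d.driver_id = p.driver_id
WHERE p.completed * 100 >= t.total * 60   -- integer form of completed / total >= 0.6
ORDER BY d.full_name
"""


def repeat_drivers(connection, as_of):
    """Return driver names meeting the 60% share, in alphabetical order."""
    rows = connection.execute(REPEAT_DRIVER_SQL, {"as_of": as_of})
    return [row[0] for row in rows]


print(repeat_drivers(conn, "2017-06-30"))
```

Comparing `completed * 100` with `total * 60` keeps the threshold check in integer arithmetic, so a share of exactly 60% is not lost to floating-point rounding. Note that a customer with a single booking trivially reaches 100%; whether to require a minimum number of bookings is a judgment call the problem statement leaves open.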
Explanation
Please note that you will also be evaluated on your reasoning and clarity of communication. Fraud in real life is often ambiguous; therefore, analysts must be able to justify their decision to both internal
and external stakeholders. So again, please state your methods and conclusions in a clear and concise manner.
In GO-JEK and all other ride-hailing apps, there are instances where an individual may
have multiple customer and/or driver accounts. Please discuss the following:
(a) What kind of fraudulent activities could be done with multiple accounts?
(b) What kind of datapoints would you collect to identify these duplicate accounts,
and what methods or thresholds would you set?
(c) From an operational and also a data perspective, what recommendations would
you make to prevent them?
End
ed to have a good sense of situational reasoning.