DE-Jumbotail Data Engineering Hiring Assignment
Problem Statement
This is a machine coding round that aims to assess how well you can implement a solution you have in mind. The task is to develop
a system for tracking the user journey within an e-commerce application.
The following steps outline the user's journey within the application:
This is generally referred to as the app funnel and in this task we want to build a system to track the percentage of users at each step of
the funnel.
An event is recorded for each of these steps, and the user can discontinue their journey at any point. The app could have 100,000
daily active users generating around 60,000 events per minute, but for the scope of this assignment you can ignore the
scale and focus on implementing an API that satisfies the requirements.
The objective is to create a REST API that accepts incoming event payloads asynchronously and stores them in a database for
analysis. The solution has five main components, all of which you need to implement:
1. Event Producer
2. Webhook
3. In-memory queue
4. Queue consumer
5. Database
Final Output
The final output, obtained by querying the database, includes the following information:
1. Percentage of users in each stage of the user journey: This indicates the proportion of users at each stage of the journey within the
application. It provides insights into how users progress through the different steps of the app.
2. Evaluation of the performance of different cities: This evaluation is based on the percentage of users from each city. It allows us to
assess how well or poorly different cities are performing in terms of user engagement with the application. By analyzing the distribution
of users across cities, we can identify which cities have a higher or lower percentage of users using the app.
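As a rough illustration of the two queries, the sketch below assumes a SQLite table named events with the columns event_id, user_id,
city and stage. The table, the column names, the stage names and the cities are all placeholders chosen for this example and are not
prescribed by the assignment. It also assumes that "percentage of users in each stage" means the share of distinct users who reached
that stage.

```python
import sqlite3


def user_percentage_by(conn, column):
    """Return {value: percentage of distinct users} grouped by `column`."""
    # Only allow known column names, since the name is placed into the SQL text.
    if column not in ("stage", "city"):
        raise ValueError(f"unsupported column: {column}")
    total = conn.execute("SELECT COUNT(DISTINCT user_id) FROM events").fetchone()[0]
    if total == 0:
        return {}
    rows = conn.execute(
        f"SELECT {column}, COUNT(DISTINCT user_id) FROM events GROUP BY {column}"
    ).fetchall()
    return {value: 100.0 * users / total for value, users in rows}


def make_demo_db():
    """Build a tiny in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "event_id TEXT PRIMARY KEY, user_id TEXT, city TEXT, stage TEXT)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        [
            ("e1", "u1", "Bengaluru", "app_open"),
            ("e2", "u1", "Bengaluru", "add_to_cart"),
            ("e3", "u2", "Mysuru", "app_open"),
            ("e4", "u3", "Bengaluru", "app_open"),
            ("e5", "u3", "Bengaluru", "add_to_cart"),
            ("e6", "u3", "Bengaluru", "order_placed"),
            ("e7", "u4", "Mysuru", "app_open"),
        ],
    )
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(user_percentage_by(db, "stage"))
    print(user_percentage_by(db, "city"))
```

Counting distinct users rather than rows keeps the percentages stable even if a user emits the same stage more than once.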
1. Event Producer
1. To simulate the journey of users, a DRIVER class should be written. This class will utilise multithreading, where each thread represents a
unique user on the e-commerce app.
2. The DRIVER class will generate events for each user, with each event representing a particular step of the funnel and containing
information related to it.
3. event = generateEvent(user)
a. When implementing the event generation method, it is important to consider user behavior. This means generating events in a ratio
that reflects the actual user behavior, considering the order of events, which events are more frequent, which stages users engage
with the most, and at which stages users are more likely to drop out.
b. Additionally, implement batching when sending these events. Sending events in batches rather than one at a time reduces the
number of API calls.
c. Consider how the Event entity will be structured to answer the questions required in the Final Output (described above). This means
designing the Event entity in a way that allows extracting the necessary information to generate the desired output.
d. To monitor the performance of the API, log the response time of each API call. The response time is expected to be on the order
of milliseconds.
4. response = eventWebhook(event)
a. Once the events are generated, send them to your webhook API (explained in the next section).
b. To invoke the webhook, multiple threads (representing users) will be simultaneously calling the eventWebhook method.
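One possible shape for the producer is sketched below. It is only an illustration: the stage names, the continuation
probabilities, the city list and the event fields are assumptions made for this example, and send_batch is a hypothetical stand-in
for the call to eventWebhook (for instance an HTTP POST), passed in so the sketch runs without a server.

```python
import random
import threading
import time
import uuid

# Hypothetical funnel: each stage is paired with the assumed probability that a
# user who reached the previous stage continues to this one.
FUNNEL = [
    ("app_open", 1.0),
    ("search", 0.8),
    ("add_to_cart", 0.5),
    ("checkout", 0.6),
    ("order_placed", 0.7),
]
CITIES = ["Bengaluru", "Hyderabad", "Chennai"]  # placeholder values


def generate_user_events(user_id, city, rng):
    """Walk the funnel in order and stop at the first stage the user drops out of."""
    events = []
    for stage, p_continue in FUNNEL:
        if rng.random() >= p_continue:
            break  # the user abandons the journey here
        events.append({
            "event_id": str(uuid.uuid4()),  # lets the consumer detect duplicates
            "user_id": user_id,
            "city": city,
            "stage": stage,
            "ts": time.time(),
        })
    return events


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def run_user(user_id, send_batch, timings, batch_size=2, seed=None):
    """Simulate one user: generate events, send them in batches, time each call."""
    rng = random.Random(seed)
    events = generate_user_events(user_id, rng.choice(CITIES), rng)
    for batch in batched(events, batch_size):
        start = time.perf_counter()
        send_batch(batch)
        timings.append((time.perf_counter() - start) * 1000.0)  # milliseconds


def run_driver(n_users, send_batch):
    """Start one thread per simulated user and return the recorded response times."""
    timings = []
    threads = [
        threading.Thread(target=run_user, args=(f"user-{i}", send_batch, timings),
                         kwargs={"seed": i})
        for i in range(n_users)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return timings
```

Because every user walks the stages in order and may stop at any of them, earlier stages naturally receive more events than later
ones, which is the ratio the funnel queries are meant to reveal.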
2. Webhook
1. The system includes a server that acts as a receiver for events. This server is implemented as a REST API and can be accessed at
"localhost:8888/webhook". To ensure efficient handling of events, the API processes requests asynchronously by utilizing an in-memory
queue.
2. To ensure reliability, put a retry mechanism in place in case the push operation to the queue fails. This mechanism will attempt to
push the event again if an error occurs during the initial push operation.
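A minimal sketch of such a webhook, using only the Python standard library, might look like the following. The queue size, the
retry count, the back-off and the status codes are assumptions made for this example, as is the choice to treat the request body as a
JSON list of events. push_with_retry and WebhookHandler are illustrative names.

```python
import json
import queue
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Bounded in-memory queue shared with the consumer (the size is an arbitrary choice).
EVENT_QUEUE = queue.Queue(maxsize=10_000)


def push_with_retry(q, event, attempts=3, backoff_s=0.01):
    """Try to enqueue `event`; retry with a growing pause while the queue is full."""
    for attempt in range(attempts):
        try:
            q.put_nowait(event)
            return True
        except queue.Full:
            time.sleep(backoff_s * (attempt + 1))
    return False


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/webhook":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            events = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        if not isinstance(events, list):
            self.send_error(400, "expected a JSON list of events")
            return
        # Stops at the first event that cannot be queued. The client is then
        # expected to resend the batch, so the consumer must tolerate duplicates.
        accepted = all(push_with_retry(EVENT_QUEUE, e) for e in events)
        # 202 signals that the events were queued, not yet stored.
        self.send_response(202 if accepted else 503)
        self.end_headers()


def serve():
    """Block forever, serving the webhook on localhost:8888."""
    ThreadingHTTPServer(("localhost", 8888), WebhookHandler).serve_forever()
```

The handler returns as soon as the events are queued, so the response time stays independent of how long the database insert takes.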
3. In-memory queue
1. Consider different options for selecting an in-memory queue system. Focus on what feels most familiar and comfortable for you. The
main objective of the queue is to enable asynchronous behavior. In the context of our webhook API, events will be pushed into this in-
memory queue, and a consumer will retrieve the events from the queue to ultimately insert them into a database of your preference.
4. Queue Consumer
1. The queue consumer is responsible for retrieving messages from the in-memory queue in batches and inserting them into the database
of your choice. It processes multiple messages together for efficient handling.
2. To ensure reliability, implement a simple retry mechanism in case the database insertion fails.
This mechanism will attempt the insert again if any errors occur during the DB calls. (event = eventRepository.insert(event))
3. Handle duplicate events coming from the client. Duplicates can occur for any reason; don’t try to figure out the cause. Treat it as
a given that the events sent by the client can contain duplicates.
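The three consumer requirements above could be combined roughly as follows. This is a sketch under stated assumptions: it uses
SQLite, a table named events keyed by a client-supplied event_id, and the hypothetical helper names init_db, drain_batch and
insert_batch. Duplicates are dropped by the primary key through INSERT OR IGNORE, and a failed insert is retried a fixed number of
times.

```python
import queue
import sqlite3
import time


def init_db(conn):
    """Create the events table; the primary key makes duplicate inserts no-ops."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "event_id TEXT PRIMARY KEY, user_id TEXT, city TEXT, stage TEXT)"
    )


def drain_batch(q, max_size, timeout_s=0.1):
    """Wait briefly for the first event, then take whatever else is ready."""
    batch = []
    try:
        batch.append(q.get(timeout=timeout_s))
        while len(batch) < max_size:
            batch.append(q.get_nowait())
    except queue.Empty:
        pass
    return batch


def insert_batch(conn, batch, attempts=3, backoff_s=0.05):
    """Insert a batch in one transaction, retrying on operational errors."""
    rows = [(e["event_id"], e["user_id"], e["city"], e["stage"]) for e in batch]
    for attempt in range(attempts):
        try:
            with conn:  # commits on success, rolls back on error
                conn.executemany(
                    "INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?)", rows
                )
            return True
        except sqlite3.OperationalError:
            time.sleep(backoff_s * (attempt + 1))
    return False
```

A consumer loop would call drain_batch repeatedly and pass each non-empty batch to insert_batch. What to do with a batch that still
fails after all retries (log it, requeue it, or write it to a file) is left open here.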
5. Database
1. Choose a database that you are familiar and comfortable with for storing the events. The selected database will be used to store the
events and generate the required output as described in the Final Output section (above).
2. Ensure proper handling of multiple asynchronous write requests to the database.
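One simple way to handle concurrent writers, shown here only as an illustration, is to route every write through a single
connection guarded by a lock. The EventRepository class, its method names and the table layout are assumptions made for this
sketch. Other databases offer their own mechanisms, such as connection pools and transactions, which may be a better fit.

```python
import sqlite3
import threading


class EventRepository:
    """Serializes all access to one SQLite connection with a lock."""

    def __init__(self, path=":memory:"):
        # check_same_thread=False allows use from several threads; the lock
        # below ensures only one thread touches the connection at a time.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS events ("
                "event_id TEXT PRIMARY KEY, user_id TEXT, city TEXT, stage TEXT)"
            )

    def insert_many(self, rows):
        """Insert (event_id, user_id, city, stage) tuples in one transaction."""
        with self._lock, self._conn:
            self._conn.executemany(
                "INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?)", rows
            )

    def count(self):
        with self._lock:
            return self._conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

With this arrangement several consumer threads can call insert_many at the same time without interleaving statements inside a
transaction.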
Notes
If you are using ChatGPT, do not just copy-paste blindly. Use it as a helper and write your solution after proper research and
understanding. Please have a thorough understanding of whatever solution you are providing.
You are free to use any language of your choice but it is essential to adhere to appropriate software development and OOP principles
while implementing the solution.
Submission
Include an output file with results of the queries asked in the final output.
Please submit the completed assignment compressed in .zip format, including a detailed readme file that provides instructions for
running the code from scratch. Please note that if there is no proper readme file, the assignment won’t be considered for evaluation.
When called for an interview, please come prepared with a running demo of your code. The assignment won’t be considered for evaluation if
the demo is not running.