0% found this document useful (0 votes)
332 views9 pages

Data Analyst 101

1. The document discusses a case study of analyzing whether adding a new chatbot feature to a website would increase purchases. 2. An A/B test was run which found the conversion rate was actually lower for users seeing the chatbot interface. 3. Rather than just reporting the results, the analyst broke down why the new interface underperformed by scrutinizing differences in colors, latency, and support to provide insights beyond the numbers.

Uploaded by

Indira Maharani
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
332 views9 pages

Data Analyst 101

1. The document discusses a case study of analyzing whether adding a new chatbot feature to a website would increase purchases. 2. An A/B test was run which found the conversion rate was actually lower for users seeing the chatbot interface. 3. Rather than just reporting the results, the analyst broke down why the new interface underperformed by scrutinizing differences in colors, latency, and support to provide insights beyond the numbers.

Uploaded by

Indira Maharani
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 9

Data Analyst 101 : Reflection and Insights to become a Great Data Analyst

To avoid the pitfall of turning into a query monkey, a data analyst should actively follow these
steps, with a focus on the 1st, 3rd and 4th steps.
1. Understand the business problem with business stakeholders.
2. Extract raw data efficiently.
3. Extract insights and recommendations.
4. Present findings intuitively.
Case Study :
One day, you received an email from a product stakeholder containing the following.
The product team has just built a new chatbot that shows up on the home page. Can the data team
plan an experiment and tell us if the new chat bot should be launched? Attached is how the new
chat bot looks like.

1. Understand the business problem with business stakeholders.


An understanding of the business problem is important because it allows a data analyst in
providing an analysis that is in line with the business needs. Executing the request without an
understanding of the problem is a futile exercise at best.
I like to imagine a business problem as an iceberg. When we first encounter a business problem,
we only see the email that briefly explains the problem from the internal customer. There are a
lot more to this problem than we can observe at this point.
As cliche as it sounds, it found it helpful to clarify the tasks using the 5W1H framework.
5W1H’s of the data
 Who is the end-user of this data?
 What is the impact of this work?
 When is the relevant period?
 Where can I get the data from?
 Why do we need this?
 How will the data analysis be used?
After clarifying the task, we understand much more about the context surrounding the task.
(Why the request is raised) Recently, we see a decrease in the number of users who are making
a purchase on the website. The product team hypothesizes that promoting the product with a
chatbot will help increase the number of purchasing users.
(Where to get the data) Since this is a brand new feature, there is no existing data set on this
new feature. We will need to plan an experiment and collect new data.
(Who is the customer + how will the analysis be used) This will be presented to the business
team, who will decide whether this new chatbot feature makes business sense.
(What is the impact of this work) If we can increase the conversion, we can potentially
increase the revenue we earn from this product by X USD.
That clarifies our thought process and sheds new light on how we can make our next move.
2. Extract raw data and collecting data
To obtain the raw data of whether the chatbot will increase the number of users who make a
purchase, we plan an experiment or an A/B Test.
In setting up the A/B Test, we will divide the users who landed on the home page into equally
sized groups A and B. The users in group A will see the old interface without the chatbot, while
those in B will see the new interface with the chatbot. Our goal is then to see if more users in A
or B make a purchase.
In the process of planning of an experiment, we determine the hypothesis to be tested, and then
use statistical methods to calculate the required sample size based on the level of significance
and the desired power of the test. To understand more, you can refer to the video below for a
short introduction to AB Testing.
Link video : https://youtu.be/zFMgpxG-chM
To plan the experiment, we need to extract data from the database through SQL.
After 5 months as an analyst, I realize that the skills of producing efficient and readable SQL is
extremely valuable for a data professional. In fact, SQL is the common language for a data
scientist, data engineer and a data analyst. A data analyst who does not know SQL fluently will
have a challenging time communicating with other data colleagues and face difficulty in
navigating the data landscape in an organization.
If you are an aspiring data scientist, and you scoff at the thought of collecting data through
querying data, you might want to rethink your decision: according to a survey of 2,360 data
scientists across the world by Anaconda, a data scientist spends up to 19% of their time loading
(extracting) their data.
A beginner in SQL would be familiar with the following simple SQL functions…
≠ SELECT and WHERE for filtering and selection
≠ COUNT, SUM, MAX, GROUP BY, HAVING for aggregating data
≠ DISTINCT, COUNT DISTINCT for producing useful distinct lists and distinct
aggregates
≠ OUTER (e.g. LEFT) and INNER JOIN when/where to use them
However, I found my work to be much more efficient once I learnt the following:
 Temporary tables
 Handling NULL with COALESCE
 Sub-queries and their impact on the query’s efficiency
 Window functions like PARTITION, LEAD, LAG
 User defined functions
As I mentioned in my previous post How to Learn Data Science in 2020, I picked up the basics
of SQL from Datacamp and Dataquest which did not go into such intermediate concepts
extensively.
Alas, when I first started, I resorted to convoluted nested queries that were a pain to the eye since
I was ignorant of the beautiful syntax of sub-queries. The use of sub-queries make the overall
query much more modular, more readable and easier to troubleshoot.
To illustrate this point, let’s compare the two queries.
Some of the resources I used to learn these intermediate functions include Zachary Thomas’
SQL Questions and Leetcode. They helped my role as an analyst tremendously.
3. Adding insights to analysis
Assume that we’ve collected the data and performed the experiment. Next step: we calculate
whether there is a statistically significant difference between the A/B groups. In this case, the
question is
Does the difference in user interface increases the number of users making a purchase?
Assume that the results come in as follow:
We can calculate the conversion rate as the number of users who made a purchase as a
proportion of the number of users who landed on the home page for the control and treatment
groups. We see that the conversion rate of the treatment group who received the chatbot interface
had a lower conversion rate.
At this point, a query monkey will report this table and conclude that the chatbot did not help
increase the number of users making a purchase… but we’ve learnt not to be a query monkey.
One of the main goals of an analyst is to provide insights. Most of the time, insights come from
an explanation of the numbers that are observed in the data.
Providing data without insight is like uncovering impure gold in a gold mine. To get to the real
gold (insights), we need a little more work.
At this point, the most important questions that we can answer is
Why is the conversion rate of the new interface poorer than the old one?
This is an open-ended question, which is not necessarily easy to answer. To answer this, we can
break down the problem into smaller parts,
We can further break down these two steps logically into smaller parts and seek to answer them
sequentially. The following is an illustration of how we can break down the problem:
Checking the validity of the data
 Pre-experiment
Faulty experimental design: Was the experimental design correct? Did we calculate the sample
size correctly? Did we use the correct test statistic?
 During the experiment
Faulty data: Was the data collected correctly? For instance, was the clicks of both groups of
users being logged by the database correctly?
 Post experiment
Incorrect analysis: Was the test statistic applied correctly?
Seeking the Difference between the Old and the New User Interface
To seek out the differences between the old and the new user interface, we can scrutinize the
differences between the appearance and the performance of the old and new interfaces. This
might provide us with an explanation of the poorer performance of the new interface. For
example, we can look out for:
Color: Was the color in the new interface less appealing than that of the old interface?
Latency: Was the new interface slower than the old interface? If so, by how much?
Support: Was the new function supported by the phone or the browser?

There are many, many more differences that we can explore.
Once we have found the reason that explains the difference in conversion rate, we can then make
recommendations on what experiment is to be done next.
The bottom line is this: as an analyst, it helps to constantly ask yourself…
Can I explain this data further? Can I get more insights from this?
And this can be done by breaking problems down into smaller parts and answering them
logically.

4. Present findings intuitively


After extracting the insights, a data analyst needs to present his or her findings. More often than
not, this comes in the form of data visualization and presentations. In this example, we will be
presenting our finding that the chatbot did not increase the sales of the website to the product and
business teams.
Here are some pointers that I found particularly helpful when I make presentations to non-data
stakeholders.
 Gently introduce the problem statement.
If the attendees do not understand the business value of the presentation, it is easy for them to
shut down. It will be great if you can introduce the problem as an important and high-value
problem to address so we can capture their attention from the start. Here, we can translate the
impact of the analysis by showing the increase in revenue if the conversion rates are increased.
 Provide as much context to the attendees of the meeting as possible.
The attendees might not know as much about the new product as you do. As such, providing
relevant background information about the product gets the attendees up to speed. In this
example, we can provide pictures of the flow of the chatbot.
 Explain metrics and concepts intuitively.
The attendees might not come from a technical background, and may not understand some
technical terms that you might use in the presentation. Avoiding technical jargons might help the
attendees understand your presentation better.
 Pause at appropriate points to allow questions mid-presentation.
By allowing the attendees to clarify their doubts as early as possible, we can help them follow
the logic of the presentation.
These are simply guidelines to follow and are context-specific. In a future post, I will document
more good practices of a data presentation.

You might also like