Data Analyst 101

1. The document discusses a case study of analyzing whether adding a new chatbot feature to a website would increase purchases. 2. An A/B test was run which found the conversion rate was actually lower for users seeing the chatbot interface. 3. Rather than just reporting the results, the analyst broke down why the new interface underperformed by scrutinizing differences in colors, latency, and support to provide insights beyond the numbers.

Uploaded by

Indira Maharani

Data Analyst 101: Reflection and Insights to Become a Great Data Analyst

To avoid the pitfall of turning into a query monkey, a data analyst should actively follow these
steps, with a focus on the 1st, 3rd and 4th steps.
1. Understand the business problem with business stakeholders.
2. Extract raw data efficiently.
3. Extract insights and recommendations.
4. Present findings intuitively.
Case Study:
One day, you received an email from a product stakeholder containing the following.
The product team has just built a new chatbot that shows up on the home page. Can the data team
plan an experiment and tell us if the new chatbot should be launched? Attached is what the new
chatbot looks like.

1. Understand the business problem with business stakeholders.

An understanding of the business problem is important because it allows a data analyst to
provide an analysis that is in line with the business needs. Executing the request without an
understanding of the problem is a futile exercise at best.
I like to imagine a business problem as an iceberg. When we first encounter a business problem,
we only see the email from the internal customer that briefly explains the problem. There is a
lot more to this problem than we can observe at this point.
As cliche as it sounds, I found it helpful to clarify the tasks using the 5W1H framework.
5W1H’s of the data
• Who is the end-user of this data?
• What is the impact of this work?
• When is the relevant period?
• Where can I get the data from?
• Why do we need this?
• How will the data analysis be used?
After clarifying the task, we understand much more about the context surrounding the task.
(Why the request is raised) Recently, we have seen a decrease in the number of users who make
a purchase on the website. The product team hypothesizes that promoting the product with a
chatbot will help increase the number of purchasing users.
(Where to get the data) Since this is a brand new feature, there is no existing data set on this
new feature. We will need to plan an experiment and collect new data.
(Who is the customer + how will the analysis be used) This will be presented to the business
team, who will decide whether this new chatbot feature makes business sense.
(What is the impact of this work) If we can increase the conversion, we can potentially
increase the revenue we earn from this product by X USD.
That clarifies our thought process and sheds new light on how we can make our next move.
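The "X USD" above can be estimated with simple arithmetic once the stakeholders supply the inputs. Here is a minimal sketch; the function name and every figure in it are hypothetical placeholders, not numbers from this case study.

```python
def estimated_revenue_uplift(visitors, baseline_rate, new_rate, avg_order_value):
    """Extra revenue if the conversion rate moves from baseline_rate to new_rate."""
    extra_purchases = visitors * (new_rate - baseline_rate)
    return extra_purchases * avg_order_value


# Hypothetical inputs: monthly home page visitors, conversion rates, average order value in USD.
uplift = estimated_revenue_uplift(
    visitors=100_000,
    baseline_rate=0.040,
    new_rate=0.045,
    avg_order_value=30.0,
)
print(f"Estimated monthly uplift: {uplift:,.0f} USD")
```

Putting a number like this in front of stakeholders early makes it easier to agree on how much effort the analysis deserves.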
2. Extract and collect raw data
To obtain the raw data of whether the chatbot will increase the number of users who make a
purchase, we plan an experiment or an A/B Test.
In setting up the A/B Test, we will divide the users who landed on the home page into equally
sized groups A and B. The users in group A will see the old interface without the chatbot, while
those in B will see the new interface with the chatbot. Our goal is then to see if more users in A
or B make a purchase.
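One common way to split users into two roughly equal groups is to hash a user identifier, so that the same user always sees the same interface. The sketch below is only an illustration: `assign_group`, the experiment name and the `user-<n>` identifiers are assumptions, not part of the actual setup.

```python
import hashlib


def assign_group(user_id: str, experiment: str = "homepage_chatbot") -> str:
    """Deterministically assign a user to 'A' (old interface) or 'B' (chatbot)."""
    # Hashing the experiment name together with the user id keeps the
    # assignment stable across visits and independent between experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


# Hypothetical user ids, used only to show that the split is close to 50/50.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_group(f"user-{i}")] += 1
print(counts)
```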
In planning an experiment, we determine the hypothesis to be tested, and then
use statistical methods to calculate the required sample size based on the level of significance
and the desired power of the test. To understand more, you can refer to the video below for a
short introduction to A/B testing.
Video link: https://youtu.be/zFMgpxG-chM
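To make the sample size step concrete, here is a rough sketch using the standard normal-approximation formula for comparing two proportions. The baseline and target conversion rates are hypothetical, and the defaults (5% significance, 80% power) are common conventions rather than values taken from this case study.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate users needed per group for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # critical value for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical rates: 4.0% baseline conversion, hoping to detect a lift to 4.5%.
n = sample_size_per_group(0.040, 0.045)
print(f"Users needed in each group: {n}")
```

The smaller the difference we want to detect, the more users each group needs, which is why the expected effect should be agreed with stakeholders before the test starts.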
To plan the experiment, we need to extract data from the database through SQL.
After 5 months as an analyst, I realize that the skill of producing efficient and readable SQL is
extremely valuable for a data professional. In fact, SQL is the common language for a data
scientist, data engineer and a data analyst. A data analyst who is not fluent in SQL will
have a challenging time communicating with other data colleagues and face difficulty in
navigating the data landscape in an organization.
If you are an aspiring data scientist, and you scoff at the thought of collecting data by
writing queries, you might want to rethink your decision: according to a survey of 2,360 data
scientists across the world by Anaconda, a data scientist spends up to 19% of their time loading
(extracting) their data.
A beginner in SQL would be familiar with the following simple SQL clauses and functions…
• SELECT and WHERE for filtering and selection
• COUNT, SUM, MAX, GROUP BY, HAVING for aggregating data
• DISTINCT, COUNT DISTINCT for producing useful distinct lists and distinct
aggregates
• OUTER (e.g. LEFT) and INNER JOIN, and when to use each
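The basics listed above can be tried out without a database server by using Python's built-in `sqlite3` module. This is only a sketch: the `users` and `purchases` tables and their rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'SG'), (2, 'SG'), (3, 'ID');
    INSERT INTO purchases VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 3, 15.0);
""")

# SELECT + WHERE: filter rows.
big_purchases = conn.execute(
    "SELECT purchase_id, amount FROM purchases WHERE amount > 18 ORDER BY purchase_id"
).fetchall()

# GROUP BY + HAVING: aggregate per user, then filter on the aggregate.
repeat_buyers = conn.execute("""
    SELECT user_id, COUNT(*) AS n_purchases, SUM(amount) AS total
    FROM purchases
    GROUP BY user_id
    HAVING COUNT(*) > 1
""").fetchall()

# COUNT DISTINCT: how many different users bought something.
n_buyers = conn.execute("SELECT COUNT(DISTINCT user_id) FROM purchases").fetchone()[0]

# LEFT JOIN: keep every user, including those with no purchases.
purchases_per_user = conn.execute("""
    SELECT u.user_id, COUNT(p.purchase_id) AS n_purchases
    FROM users AS u
    LEFT JOIN purchases AS p ON p.user_id = u.user_id
    GROUP BY u.user_id
    ORDER BY u.user_id
""").fetchall()

print(big_purchases, repeat_buyers, n_buyers, purchases_per_user)
```

With an INNER JOIN in the last query, user 2 (who never purchased) would disappear from the result, which is the usual reason to reach for a LEFT JOIN.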
However, I found my work to be much more efficient once I learnt the following:
• Temporary tables
• Handling NULL with COALESCE
• Sub-queries and their impact on the query’s efficiency
• Window functions like LEAD and LAG, used with PARTITION BY
• User-defined functions
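A few of these intermediate features can be sketched with Python's built-in `sqlite3` module. The `visits` table and its rows are hypothetical, and the window function assumes the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (user_id INTEGER, visit_day INTEGER, referrer TEXT);
    INSERT INTO visits VALUES
        (1, 1, 'search'), (1, 4, NULL), (1, 9, 'email'),
        (2, 2, NULL),     (2, 3, 'search');

    -- Temporary table: an intermediate result that disappears with the connection.
    -- COALESCE replaces a missing referrer with a readable default.
    CREATE TEMP TABLE clean_visits AS
    SELECT user_id, visit_day, COALESCE(referrer, 'unknown') AS referrer
    FROM visits;
""")

# LAG looks at the previous row within each user's partition, so we can
# compute the gap between consecutive visits without a self-join.
rows = conn.execute("""
    SELECT user_id,
           visit_day,
           referrer,
           visit_day - LAG(visit_day) OVER (
               PARTITION BY user_id ORDER BY visit_day
           ) AS days_since_last_visit
    FROM clean_visits
    ORDER BY user_id, visit_day
""").fetchall()

for row in rows:
    print(row)
```

The first visit of each user has no previous row, so its gap comes back as NULL (`None` in Python).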
As I mentioned in my previous post How to Learn Data Science in 2020, I picked up the basics
of SQL from Datacamp and Dataquest which did not go into such intermediate concepts
extensively.
Alas, when I first started, I resorted to convoluted nested queries that were a pain to the eye since
I was ignorant of the beautiful syntax of sub-queries. The use of sub-queries makes the overall
query much more modular, more readable and easier to troubleshoot.
To illustrate this point, let’s compare the two queries.
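The original pair of queries is not reproduced here, so the following is a stand-in sketch rather than the author's example. It assumes that "sub-queries" refers to named sub-queries written with a WITH clause (common table expressions), and it uses a made-up `orders` table. Both queries return the same answer; the second reads top to bottom and each step has a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    -- Hypothetical rows; the negative amount stands for a refund we want to exclude.
    INSERT INTO orders VALUES (1, 10.0), (1, 30.0), (2, 20.0), (3, -5.0);
""")

# Nested version: the logic has to be read from the innermost query outwards.
nested_sql = """
    SELECT AVG(total)
    FROM (
        SELECT user_id, SUM(amount) AS total
        FROM (
            SELECT user_id, amount FROM orders WHERE amount > 0
        ) AS valid_orders
        GROUP BY user_id
    ) AS user_totals
"""

# WITH version: each step is named and can be inspected on its own.
with_sql = """
    WITH valid_orders AS (
        SELECT user_id, amount FROM orders WHERE amount > 0
    ),
    user_totals AS (
        SELECT user_id, SUM(amount) AS total
        FROM valid_orders
        GROUP BY user_id
    )
    SELECT AVG(total) FROM user_totals
"""

nested_result = conn.execute(nested_sql).fetchone()[0]
with_result = conn.execute(with_sql).fetchone()[0]
print(nested_result, with_result)
```

When a result looks wrong, the WITH version lets you select from `valid_orders` or `user_totals` directly to see which step went astray.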
Some of the resources I used to learn these intermediate functions include Zachary Thomas’
SQL Questions and Leetcode. They helped me tremendously in my role as an analyst.
3. Add insights to the analysis
Assume that we’ve collected the data and performed the experiment. Next step: we calculate
whether there is a statistically significant difference between the A/B groups. In this case, the
question is
Does the difference in user interface increase the number of users making a purchase?
Assume that the results come in as follows:
We can calculate the conversion rate as the number of users who made a purchase as a
proportion of the number of users who landed on the home page for the control and treatment
groups. We see that the treatment group, which received the chatbot interface, had a lower
conversion rate.
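As a sketch of the calculation, the conversion rates and a two-proportion z-test can be computed as follows. The counts below are hypothetical placeholders, not the results of the experiment, and the z-test is one common choice of test for this comparison.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(purchases_a, users_a, purchases_b, users_b):
    """Return both conversion rates, the z statistic and the two-sided p-value."""
    rate_a = purchases_a / users_a
    rate_b = purchases_b / users_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (purchases_a + purchases_b) / (users_a + users_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_b - rate_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_a, rate_b, z, p_value


# Hypothetical counts: group A is the control, group B saw the chatbot.
rate_a, rate_b, z, p_value = two_proportion_z_test(
    purchases_a=500, users_a=10_000,
    purchases_b=430, users_b=10_000,
)
print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  z = {z:.2f}  p = {p_value:.4f}")
```

A negative z here means the chatbot group converted less often than the control group.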
At this point, a query monkey will report this table and conclude that the chatbot did not help
increase the number of users making a purchase… but we’ve learnt not to be a query monkey.
One of the main goals of an analyst is to provide insights. Most of the time, insights come from
an explanation of the numbers that are observed in the data.
Providing data without insight is like uncovering impure gold in a gold mine. To get to the real
gold (insights), we need a little more work.
At this point, the most important question that we can answer is
Why is the conversion rate of the new interface poorer than the old one?
This is an open-ended question, which is not necessarily easy to answer. To answer it, we can
break the problem down into two steps: checking the validity of the data, and seeking the
differences between the old and the new user interface.
We can further break down these two steps logically into smaller parts and seek to answer them
sequentially. The following is an illustration of how we can break down the problem:
Checking the validity of the data
• Pre-experiment
Faulty experimental design: Was the experimental design correct? Did we calculate the sample
size correctly? Did we use the correct test statistic?
• During the experiment
Faulty data: Was the data collected correctly? For instance, were the clicks of both groups of
users being logged by the database correctly?
• Post-experiment
Incorrect analysis: Was the test statistic applied correctly?
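One possible check for faulty data, offered here as an illustration rather than as part of the original checklist, is to test whether the two groups really ended up equally sized, since the experiment was designed as a 50/50 split. The sketch below uses a chi-square goodness-of-fit test with one degree of freedom; the group sizes are hypothetical.

```python
import math


def sample_ratio_check(users_a, users_b):
    """Chi-square test of the observed group sizes against an intended 50/50 split."""
    expected = (users_a + users_b) / 2
    chi2 = ((users_a - expected) ** 2 + (users_b - expected) ** 2) / expected
    # With one degree of freedom, the chi-square tail probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical group sizes logged during the experiment.
chi2, p_value = sample_ratio_check(users_a=10_000, users_b=9_700)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

A very small p-value suggests that users were not being assigned or logged as designed, in which case the conversion comparison should not be trusted until the cause is found.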
Seeking the Difference between the Old and the New User Interface
To seek out the differences between the old and the new user interface, we can scrutinize the
differences between the appearance and the performance of the old and new interfaces. This
might provide us with an explanation of the poorer performance of the new interface. For
example, we can look out for:
Color: Was the color in the new interface less appealing than that of the old interface?
Latency: Was the new interface slower than the old interface? If so, by how much?
Support: Was the new function supported by the phone or the browser?
…
There are many, many more differences that we can explore.
Once we have found the reason that explains the difference in conversion rate, we can then make
recommendations on what experiment is to be done next.
The bottom line is this: as an analyst, it helps to constantly ask yourself…
Can I explain this data further? Can I get more insights from this?
And this can be done by breaking problems down into smaller parts and answering them
logically.

4. Present findings intuitively

After extracting the insights, a data analyst needs to present his or her findings. More often than
not, this comes in the form of data visualization and presentations. In this example, we will be
presenting our finding that the chatbot did not increase the sales of the website to the product and
business teams.
Here are some pointers that I found particularly helpful when I make presentations to non-data
stakeholders.
• Gently introduce the problem statement.
If the attendees do not understand the business value of the presentation, it is easy for them to
shut down. It will be great if you can introduce the problem as an important and high-value
problem to address so we can capture their attention from the start. Here, we can translate the
impact of the analysis by showing the increase in revenue if the conversion rates are increased.
• Provide as much context to the attendees of the meeting as possible.
The attendees might not know as much about the new product as you do. As such, providing
relevant background information about the product gets the attendees up to speed. In this
example, we can provide pictures of the flow of the chatbot.
• Explain metrics and concepts intuitively.
The attendees might not come from a technical background, and may not understand some
technical terms that you might use in the presentation. Avoiding technical jargon might help the
attendees understand your presentation better.
• Pause at appropriate points to allow questions mid-presentation.
By allowing the attendees to clarify their doubts as early as possible, we can help them follow
the logic of the presentation.
These are simply guidelines to follow and are context-specific. In a future post, I will document
more good practices of a data presentation.
