Intro to Data Analytics Activity Templates
● You’ll use this file for the entirety of this course. Save it in a place where you can easily
access it over the upcoming weeks.
○ You can edit and save this document in Google Drive
○ If you download this document, keep it in a place you can find it later
● The content you put into this document will be used for later lessons
○ It is recommended that you do not skip any activity in any of the lessons
○ It is recommended that you update this document after each week of content,
starting with Week 2
● Requirements:
○ Answer all the questions in this document
○ When complete, download this as a PDF document for submission in the peer
review assignment.
○ Don’t know how to download it as a PDF? You can find more information about
downloading this by clicking here.
○ Remove this section before submitting
Content
Week 2 Activity: Obtaining and Scrubbing Data
Week 3 Activity: Exploring and Modeling Data
Week 4 Activity: iNterpreting Data
Week 2 Activity: Obtaining and Scrubbing Data
Anna owns a clothing boutique in New York, called BrightThreads. She sells a mix of clothing
brands and chooses items for her store that she believes her clients will like. She also sells
online.
Anna is working on long-term planning for the upcoming year at BrightThreads. Business has
been going well, but she would really like to increase sales and potentially open up a second
location in a different neighborhood. Next year, Anna would like to increase her total sales by
10%. This would be a very good year for Anna and BrightThreads, but it seems doable based
on the last few quarters and with some hard work.
Using this information, answer the questions below regarding the obtain and scrub stages of the
OSEMN process. Add your answers to the template below.
In this scenario, what is a SMART goal that would benefit from data analysis?
What is a Primary KPI that would be useful to analyze for this goal?
The total revenue generated from both in-store and online sales.
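The KPI and the 10% goal reduce to simple arithmetic. A minimal sketch, assuming hypothetical channel totals (the dollar figures are placeholders, not BrightThreads data):

```python
# Hypothetical annual revenue per channel (placeholder values, not real data)
in_store_revenue = 180_000.00
online_revenue = 70_000.00

# Primary KPI: total revenue across both sales channels
total_revenue = in_store_revenue + online_revenue

# Next year's goal: increase total sales by 10%
growth_rate = 0.10
target_revenue = total_revenue * (1 + growth_rate)

print(f"Total revenue: ${total_revenue:,.2f}")
print(f"10% growth target: ${target_revenue:,.2f}")
```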
Understanding customer purchase patterns and demographics allows the business to tailor
their inventory and marketing strategies, while tracking sales trends across different channels
helps identify growth opportunities and optimize resource allocation.
How do you imagine you could obtain this data? What sources would you gather data from?
Specifically, note what kind of data (first-party, third-party) and what methods you might use
(survey, web analytics).
First-party data: sales transactions (POS system), customer behavior (CRM, web analytics),
and surveys.
Third-party data: market research reports and demographic data from industry sources.
Anna at BrightThreads has begun the process of gathering data to help analyze current sales.
She has collected data on recent online sales directly from the online storefront.
Access this sample Customer Data and click on Use Template in the upper right corner. You will
need to be logged into a Google account to use this template.
Anna has isolated 4 different segments that each have issues that need to be fixed. You can
access each segment in the four sheets in this one spreadsheet. Click on each sheet for a
different segment of the dataset. You can click on the tabs at the bottom of the spreadsheet to
move between sheets. Review the image below for a preview:
The four sheets are accessible by clicking the tabs at the bottom of the spreadsheet.
Using what you know about data validity, do you think the data Anna has gathered is valid? Why
or why not?
Yes, the data is valid, because it lets us start analyzing progress toward our SMART goal and
our primary KPI.
This is the only segment that appears to be in order, with no issues, missing values, or incorrect values.
Some customer zip codes are dirty (inconsistently formatted); we have to standardize them to match the other values.
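Standardizing the zip codes can be sketched in Python. The cleaning rule here is an assumption (keep digits only, drop a ZIP+4 suffix, restore leading zeros lost when a value was stored as a number), and the function name `standardize_zip` and the sample values are hypothetical:

```python
import re

def standardize_zip(raw):
    """Return a 5-digit ZIP string, or None if no usable ZIP is found.

    Hypothetical cleaning rule: keep digits only, keep the first five
    (dropping a ZIP+4 suffix), and left-pad with zeros.
    """
    if raw is None:
        return None
    digits = re.sub(r"\D", "", str(raw))  # strip spaces, letters, hyphens
    if not digits:
        return None
    return digits[:5].zfill(5)  # "10001-1234" -> "10001", 7030 -> "07030"

# Placeholder examples of dirty values
raw_zips = ["10001", " 10002 ", "10001-1234", 7030, "NY 10013", ""]
print([standardize_zip(z) for z in raw_zips])
```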
Segment 3 has missing elements, such as an order number and a customer zip code.
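Missing values like the ones in segment 3 can be flagged programmatically. A minimal sketch, assuming hypothetical column names `order_number` and `customer_zip` (only `order_total` appears in the dataset description) and placeholder rows:

```python
# Placeholder rows shaped like a segment of the sales sheet
rows = [
    {"order_number": "1001", "customer_zip": "10001", "order_total": "59.99"},
    {"order_number": "",     "customer_zip": "10002", "order_total": "89.99"},
    {"order_number": "1003", "customer_zip": None,    "order_total": "74.50"},
]

# Hypothetical list of fields every row must have
REQUIRED = ("order_number", "customer_zip", "order_total")

def find_missing(rows, required=REQUIRED):
    """Return (row_index, column) pairs where a required field is empty."""
    problems = []
    for i, row in enumerate(rows):
        for col in required:
            value = row.get(col)
            if value is None or str(value).strip() == "":
                problems.append((i, col))
    return problems

print(find_missing(rows))
```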
What issue did you identify in segment 4 of the data?
Week 3 Activity: Exploring and Modeling Data
Access BrightThreads’ online sales data and click on Use Template in the upper right corner to
access the dataset. Please note you will need to be logged into a Google account.
Review the following data and charts, then share what you can learn in the exploration stage of
the OSEMN process.
Using this information, answer the questions below regarding the explore and model stages of
the OSEMN process. Add your answers to the template below.
What are some things you can tell about this dataset? For instance, what does the size of the
dataset tell you?
It is a small dataset, so it is easy to study, chart, and explain clearly. However, the limited
amount of data makes any forecast of future sales less accurate.
Both. Numerical: sale date, customer ID, order number, quantity of items per order, order total.
Categorical: zip code, item category.
Reviewing this data, what is the minimum value in the order_total column? What is the maximum
value in order_total column?
The minimum value in the order_total column is 39.99, and the maximum value is 149.99.
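The same minimum and maximum can be computed outside the spreadsheet (in Google Sheets, `=MIN()` and `=MAX()` over the column do this). A minimal Python sketch, assuming a CSV export with an `order_total` column; the rows below are placeholders, not the real dataset:

```python
import csv
import io

# Placeholder CSV export of the online sales sheet
CSV_TEXT = """order_number,order_total
1001,39.99
1002,84.50
1003,149.99
1004,72.00
"""

def order_total_range(csv_text):
    """Return (min, max) of the order_total column, skipping blank cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    totals = [float(row["order_total"]) for row in reader if row["order_total"]]
    return min(totals), max(totals)

low, high = order_total_range(CSV_TEXT)
print(f"min = {low:.2f}, max = {high:.2f}")
```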
What kind of chart would you use to help visualize this data?
A line chart is ideal for tracking sales trends over time, while a histogram or box plot helps
visualize the distribution of order values.
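A histogram groups values into fixed-width buckets and counts each bucket. A minimal sketch of that counting step, assuming $20-wide buckets and placeholder order totals; buckets are half-open (a $80.00 order lands in $80-$100) and empty buckets are simply omitted:

```python
from collections import Counter

def bin_order_totals(totals, width=20):
    """Count orders per fixed-width price bucket, as a histogram would."""
    counts = Counter(int(t // width) * width for t in totals)
    return {f"${lo}-${lo + width}": counts[lo] for lo in sorted(counts)}

# Placeholder order totals (not the real dataset)
totals = [39.99, 45.00, 78.50, 82.00, 88.75, 91.20, 120.00, 149.99]
for bucket, n in bin_order_totals(totals).items():
    print(f"{bucket:>10} {'#' * n}")
```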
Based on what you have learned, would you add an additional column to this dataset using
feature engineering? For instance, using the sales dates, would it be helpful to add in the day of
the week data?
Yes, adding a day of the week column would be helpful. It could reveal trends in purchasing
behavior, such as which days have the highest or lowest sales. We could also add the month
and a flag for whether the purchase was made on a weekday or a weekend.
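These feature-engineering columns can be derived from each sale date. A minimal sketch using Python's `datetime`; the feature names are hypothetical, and day and month names are English under the default C locale:

```python
from datetime import date

def date_features(sale_date):
    """Derive hypothetical feature columns from a sale date."""
    return {
        "day_of_week": sale_date.strftime("%A"),  # e.g. "Saturday"
        "month": sale_date.strftime("%B"),        # e.g. "March"
        "is_weekend": sale_date.weekday() >= 5,   # Monday=0 ... Saturday=5, Sunday=6
    }

print(date_features(date(2024, 3, 16)))
```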
Anna has created the following chart to explore the relationship between order totals and the
number of orders.
Based on the data in this chart, what would be a good title for this chart?
Orders between $75 and $95 are more frequent than orders under $75 or over $95
Anna has also been analyzing data on the amount of money she spends on social media ads
and how many clicks to the BrightThreads website they are generating.
Do you notice any correlations between the variables in this chart? If so, how would you
describe them?
Yes, there is a positive correlation between weekly social media ad spend and the number of
clicks on the ad: the more money spent on the weekly ads, the more clicks the ads receive.
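The strength of that relationship can be quantified with the Pearson correlation coefficient r, where values near +1 indicate a strong positive linear relationship (Google Sheets computes the same statistic with `=CORREL()`). A minimal sketch with placeholder spend and click figures, not Anna's actual data:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Placeholder weekly ad spend ($) and ad clicks
spend = [50, 80, 120, 150, 200, 240]
clicks = [30, 45, 70, 80, 110, 130]
r = pearson(spend, clicks)
print(f"r = {r:.2f}")
```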
Anna has learned a lot while exploring the data she has gathered. Now, it’s time to model some
of this data.
Reviewing this linear regression model, roughly how many site visits can be expected if the
marketing budget is increased to $250?
If Anna spends around $250, the positive correlation suggests she can expect around 12
site visits.
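Reading a value off a linear regression line means evaluating y = slope × x + intercept. A minimal least-squares sketch, assuming for illustration that the fitted line passes through the points (100, 5) and (200, 10) described in the model discussion below; with only two points the fit is exact, whereas the full dataset would give a best-fit line:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Assumed points on the model line (not the full dataset)
budget = [100, 200]
visits = [5, 10]

slope, intercept = fit_line(budget, visits)
predicted = slope * 250 + intercept
print(f"Expected site visits at $250: {predicted:.1f}")
```

The result, about 12.5 visits, agrees with the rough estimate of 12 given above.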
Review this linear regression model which shows the actual data values and the values
predicted by the model when given a test set. Do you think that this model is sufficient for
general use for this data? Why or why not?
This model shows only a slight difference between the actual and predicted regression lines for
money spent on ads versus site visits. The actual line starts a little higher than the predicted
one, but as spending increases its slope decreases, while the predicted line keeps increasing.
The two lines meet at the points (100, 5) and (200, 10), which makes them similar enough for the
model to be effective for the current prediction.
Review this clustering model. A clustering algorithm has been used and identified two
groups. How would you describe the two different customer groups? Why?
Group 1: Their order totals range from approximately $40 to $80, so this group likely consists of
budget-conscious or casual buyers who make smaller purchases.
Group 2: Their order totals range from about $80 to $150 or more, so this group includes customers
who make larger purchases, possibly taking advantage of bulk buying or premium products.
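The worksheet does not say which clustering algorithm was used; k-means is assumed here as a common choice. A minimal one-dimensional k-means sketch (Lloyd's algorithm) on placeholder order totals, with centers initialized at evenly spaced quantiles; `kmeans_1d` is a hypothetical helper, not a library function:

```python
def kmeans_1d(values, k=2, max_iter=100):
    """Basic 1-D k-means (Lloyd's algorithm). Returns (centers, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Start each center at an evenly spaced quantile of the data
    centers = [ordered[int((2 * i + 1) * n / (2 * k))] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest center
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: move each center to the mean of its members
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels

# Placeholder order totals in dollars
order_totals = [40, 48, 56, 64, 72, 78, 84, 88, 92, 96, 100, 150]
centers, labels = kmeans_1d(order_totals, k=2)
for c in range(2):
    group = [v for v, lab in zip(order_totals, labels) if lab == c]
    print(f"Group {c + 1}: ${min(group)}-${max(group)} (mean ${centers[c]:.2f})")
```

On these placeholder values the split falls between $78 and $84, close to the roughly $80 boundary described above. k-means results depend on the starting centers, so a different initialization can produce a different split.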
You are trying to forecast BrightThreads sales in the coming quarter. What model might you
use? Why did you choose this?
To forecast BrightThreads sales for the next quarter, you could use a trendline or a regression
model in Excel:
Trendline: Helps visualize and extend past sales trends into the future using linear,
polynomial, or moving average methods.
Regression Model: Useful if multiple factors (such as seasonality or promotions) impact sales,
using Excel’s LINEST function or the Analysis ToolPak.
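A linear trendline forecast extends the least-squares line fitted to past periods one step ahead. A minimal sketch with placeholder quarterly sales, alongside a simple moving-average forecast for comparison; both function names are hypothetical:

```python
def linear_trend_forecast(history, steps_ahead=1):
    """Fit sales = slope * t + intercept over t = 1..n and extend it."""
    n = len(history)
    t = list(range(1, n + 1))
    mean_t = sum(t) / n
    mean_y = sum(history) / n
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, history))
    stt = sum((ti - mean_t) ** 2 for ti in t)
    slope = sxy / stt
    intercept = mean_y - slope * mean_t
    return slope * (n + steps_ahead) + intercept

def moving_average_forecast(history, window=4):
    """Forecast the next period as the mean of the last `window` periods."""
    return sum(history[-window:]) / window

# Placeholder quarterly sales in dollars (not BrightThreads data)
quarterly_sales = [52_000, 55_000, 58_000, 61_000]
print(f"Linear trend: ${linear_trend_forecast(quarterly_sales):,.0f}")
print(f"4-quarter moving average: ${moving_average_forecast(quarterly_sales):,.0f}")
```

On a steadily rising series like this one, the moving average ($56,500) lags behind the trend projection ($64,000), so the choice of method matters when sales are growing.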
Week 4 Activity: iNterpreting Data
Review the presentation, then share your thoughts on Anna’s interpretation of the data at the
end of the OSEMN process.
Using this information, answer the questions below regarding the interpret stage of the OSEMN
process. Add your answers to the template below.
The data helps answer Anna’s questions by identifying sales trends, customer purchasing
behaviors, and key growth opportunities. It provides insights into which products perform best
and when, helping shape strategies to achieve the 10% sales increase goal.
Anna can apply this in a business context by using the insights to optimize marketing
strategies, adjust inventory based on demand patterns, and tailor promotions to high-
performing products or customer segments. This data-driven approach will help BrightThreads
make informed decisions to achieve its 10% sales growth goal.
Slides 2 to 6
What slides in the presentation covered the methods used in the project?
Slide 7
Slides 8 to 10
Slide 15
Slide 16
In your opinion, what parts of the presentation were meant to explain, engage, and enlighten the
audience? Why?
I think the whole presentation is necessary to understand how we are going to increase sales
by 10% next year, because we have to go through the entire OSEMN framework and analyze each
step, from our SMART goal to how we are going to achieve it.
In your opinion, what parts of the presentation were the setup, buildup, climax, and conclusion?
Why?
Setup: Slides 1 to 6
Buildup: Slides 7 to 12
Climax: Slides 13 to 16