Excel DataAnalysis

The document discusses using Excel to analyze sales data from an ice cream truck business. It introduces sample sales data and describes how to create pivot tables to summarize total sales and customers by date, location, and other factors. It also covers using correlation, regression, and descriptive statistics tools to analyze relationships in the data and describe variables.

Uploaded by

tom a

Excel Data Analysis

https://www.depts.ttu.edu/itts/training/shortcourses/handouts.php
OR
Google "TTU Short Course", click the link and then "Handouts"

Benjamin Chamness
Accessibility & Data Analyst
ITTS Technology Support, IT Team Web
Material Notes
• Information here may apply to other spreadsheet software tools
(LibreOffice, OpenOffice, Google Sheets, etc.).
• These other tools are not approved for use with TTU data.

• All sample data here is either fully anonymized and random OR publicly available.
• Any connection to real TTU data / events is not intentional.
• I do not intend to make any statements with the sample data presented.
• I don’t know your data – so my examples may not be 100% accurate with your processing.
• I may touch on concepts in statistics, but this course isn’t meant to teach you statistics.
Outline
• Our Test Data
• Pivot Tables and Charts
• Analysis ToolPak
Our Test Data
Our Test Data – Scoop! There It Is!
• Scoop! There It Is! is an ice cream truck serving cool treats to
people all over the city.
• We have 1 year of sales data, which includes:
• How much we posted on social media
• Where the truck was located
• What the weather was like (high temperature and if it was raining)
• How many people passed the truck
• How many people made a purchase
• The total sales from the day
• Not every day is tracked, because even ice cream trucks need
a break!
About Our Data
Categories: Things that describe our data. These can help “Group” our data for insights.
• Date
• Weekday
• Location
• Social Media Posts
• High Temp
• Rain
• Foot Traffic
Facts: The values we care about for our business needs. These values are often summarized.
• Foot Traffic
• Customer Count
• Total Sales
• Customer Conversion (*careful when summarizing percentages)
Pivot Tables and Charts
Pivot Tables and Charts
• Pivot Tables are an easy way to build reports on your data in Excel.
• They allow you to group your data by category (date, location, temperature, college, major, residence hall, etc.).
• They allow you to summarize the data by some calculation (total sales, average traffic, count of students, etc.).
What reports may be relevant with our data?
• Total sales and customers by date and location
• Average sales and customers by day of week and location
• Average sales by temperature
Pivot Table Creation
1. Select all your data (ideally by selecting the column letters)
2. Go to the “Insert” menu and select “PivotTable”
3. Confirm that the “Table/Range” field is the correct value (for
our test data, it should be “SampleData!$A:$J”)
4. State where you want the PivotTable to be created. The
default of “New Worksheet” is fine for our purposes, but you
could also tell Excel to create the PivotTable in a specific cell.
PivotTable Parts
• Field List shows all of
your available data
fields.
• Filters will filter your
data by a category.
• Rows and Columns
break data out by
categories.
• Values will hold your
data summaries.
About Our Data
Categories (use these in “Rows”, “Columns”, and “Filters”):
• Date
• Weekday
• Location
• Social Media Posts
• High Temp
• Rain
• Foot Traffic
Facts (use these in “Values”):
• Foot Traffic
• Customer Count
• Total Sales
• Customer Conversion (*careful when summarizing percentages)
Rows and Columns
• Will identify distinct category values in your dataset
• Will separate your data points by the categories
• Will NOT fill in any gaps (days of the week not represented)
Values
• Perform some summary (count, average, sum, etc) on a "fact" in your
dataset
• Will NOT fill in any gaps (data intersections not represented in your data)
• Do not summarize on percentage data
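The caution about percentages can be seen in a small worked example. The sketch below is in Python rather than Excel, and the two days of figures are made up for illustration (they are not from the sample data). It compares averaging the daily conversion rates with recomputing the rate from summed totals.

```python
# Hypothetical two-day example (made-up numbers, not from the sample data).
days = [
    {"foot_traffic": 100, "customers": 50},  # 50% conversion
    {"foot_traffic": 900, "customers": 90},  # 10% conversion
]

# Misleading: averaging the daily percentages weights both days equally,
# even though one day had nine times the foot traffic.
daily_rates = [d["customers"] / d["foot_traffic"] for d in days]
average_of_rates = sum(daily_rates) / len(daily_rates)

# Safer: summarize the underlying facts first, then compute the percentage.
total_customers = sum(d["customers"] for d in days)
total_traffic = sum(d["foot_traffic"] for d in days)
overall_rate = total_customers / total_traffic

print(f"Average of daily rates: {average_of_rates:.0%}")
print(f"Overall conversion:     {overall_rate:.0%}")
```

In a PivotTable, the equivalent approach is to summarize Customer Count and Foot Traffic and divide the two totals, rather than summarizing the Customer Conversion column itself.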
Report 1: Total sales and customers by time and location
• Drag “Date” to rows, which automatically summarizes by month
• Drag Location to columns
• Drag Sales and Customers to values
• Change Sales to report in dollar values
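For readers who want to see what the PivotTable is doing behind the scenes, the grouping in Report 1 can be sketched in plain Python. This is only an illustration: the rows, the year, the location names ("Park", "Downtown"), and the function name `pivot_sum` are all assumptions, not part of the sample workbook.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows (made-up values); the real sheet has more columns.
rows = [
    {"date": date(2023, 1, 5), "location": "Park", "customers": 40, "sales": 310.50},
    {"date": date(2023, 1, 6), "location": "Park", "customers": 25, "sales": 190.00},
    {"date": date(2023, 1, 7), "location": "Downtown", "customers": 60, "sales": 480.25},
    {"date": date(2023, 2, 2), "location": "Park", "customers": 30, "sales": 240.00},
]

def pivot_sum(rows, value_field):
    """Sum value_field for each (month, location) pair, similar to a
    PivotTable with Date (grouped by month) in Rows and Location in Columns."""
    totals = defaultdict(float)
    for row in rows:
        key = (row["date"].strftime("%Y-%m"), row["location"])
        totals[key] += row[value_field]
    return dict(totals)

sales = pivot_sum(rows, "sales")
customers = pivot_sum(rows, "customers")

for (month, location), total in sorted(sales.items()):
    count = customers[(month, location)]
    print(f"{month}  {location:<10} ${total:,.2f}  {count:.0f} customers")
```

As with a PivotTable, combinations that never occur in the data (here, "Downtown" in February) simply do not appear in the result.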
Report 2: Average sales and customers by day of week and location
Report 3: Average sales by temperature
Charts
• Bar charts show counts or totals grouped by a category
• Pie charts show how distinct categories make up parts of a
whole
• Line charts show changes in value over time

• As with other charts in Excel, you have all of the same tools to
adjust labels, titles, colors, and other chart elements
Bar and Pie Chart Examples
Stacked Area Chart Example
• Set "Sum of Total Sales" to be a "Running Total In > Date"
• Notice date gap: Jan 9 - 20
• Solution: Build a list of dates and use GetPivotData to fetch values
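The idea behind the suggested fix (build a complete list of dates, then look up each day's value) can be sketched outside Excel as well. The Python below is a minimal illustration with made-up daily totals; the function name `running_total_by_day` is hypothetical, and the dictionary lookup stands in for the lookup that GetPivotData would perform in the workbook.

```python
from datetime import date, timedelta

# Hypothetical daily totals (made-up values) with untracked days in between.
daily_sales = {
    date(2023, 1, 8): 400.0,
    date(2023, 1, 21): 250.0,
    date(2023, 1, 22): 100.0,
}

def running_total_by_day(daily_sales):
    """Return (day, running total) for every calendar day in the range,
    carrying the total forward across days with no data."""
    day = min(daily_sales)
    last = max(daily_sales)
    total = 0.0
    result = []
    while day <= last:
        total += daily_sales.get(day, 0.0)  # untracked days add nothing
        result.append((day, total))
        day += timedelta(days=1)
    return result

series = running_total_by_day(daily_sales)
for day, total in series:
    print(day.isoformat(), f"${total:,.2f}")
```

Because every calendar day now has a row, a chart built from this series shows a flat stretch over the untracked days instead of skipping them.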
Analysis ToolPak
To Enable:
1. Click on "File" in the top left
and then "Options" towards
the bottom.
2. In the "Excel Options" window
that opens, click on "Add-ins"
on the left, and then click the
button that says "Go…"
3. Check the box next to
"Analysis ToolPak" and then
click "OK".
4. When enabled, it should be in
the "Data" ribbon menu.
Analysis Limitations: Text
• String data is hard to analyze
• Binary strings (yes/no, true/false) can be mapped to 0 and 1
• Categories with some defined order can sometimes be mapped
to integers
• Freshman, Sophomore, Junior, Senior
• Strongly Agree, Agree, Disagree, Strongly Disagree
• Our “Rain” column is a binary yes/no, so it is coded as 1s and
0s
• Weekday and Location can’t be mapped to numbers as easily
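The mappings described above can be sketched in Python as simple lookup tables. The names `RAIN_CODES`, `CLASS_CODES`, and `encode` are hypothetical, and the choice of 1 through 4 for the ordered categories is an assumption (it treats the steps between categories as equal, which may not suit every analysis).

```python
# Binary strings map directly to 0 and 1.
RAIN_CODES = {"no": 0, "yes": 1}

# Ordered categories can map to integers that preserve the order.
CLASS_CODES = {"Freshman": 1, "Sophomore": 2, "Junior": 3, "Senior": 4}

def encode(values, codes):
    """Replace each text value with its numeric code.

    Matching ignores case and surrounding spaces; an unknown value
    raises KeyError so bad data is noticed rather than silently coded.
    """
    lookup = {key.lower(): code for key, code in codes.items()}
    return [lookup[value.strip().lower()] for value in values]

rain = encode(["Yes", "no", "NO "], RAIN_CODES)
classes = encode(["Junior", "Freshman"], CLASS_CODES)
print(rain)
print(classes)
```

Weekday and Location have no natural order of this kind, which is why they are left out of the numeric analysis.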
Correlation
• Helps describe a relationship between variables in your dataset.
• A positive correlation means two variables will change in the
same direction (increase/decrease together).
• A negative correlation means two variables will change in
opposite directions (increase in one, decrease in the other).
• Values close to +/-1 show a strong correlation, values close to 0
show weak / no correlation

• Correlation is not causation: the relationship does not mean that one change causes the other.
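To make the +/-1 scale concrete, here is a minimal Python sketch of the Pearson correlation coefficient, the usual definition of correlation. The function name `pearson` and all of the numbers are made up for illustration; none of them come from the sample data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists.

    A column with no variation (all values equal) has no defined
    correlation and raises ZeroDivisionError here.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up values for illustration only.
customers = [10, 20, 30, 40]
sales = [80.0, 150.0, 260.0, 310.0]
rain = [1, 1, 0, 0]

print(f"customers vs sales: {pearson(customers, sales):+.2f}")  # strong positive
print(f"rain vs customers:  {pearson(rain, customers):+.2f}")   # negative
```

In this toy data, sales rise with customers (a value near +1) while customers fall on rainy days (a negative value), matching the two directions described above.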
Correlation
1. Click the "Data Analysis" tool in the "Data" menu.
2. Select "Correlation"
3. Select your data for the "Input Range" (entering in column
letters), check the "Labels in First Row" box, and click "OK"
                    Social Media Posts  High Temp  Rain   Foot Traffic  Customer Count  Total Sales
Social Media Posts  1.00
High Temp           -0.07               1.00
Rain                0.01                -0.10      1.00
Foot Traffic        0.02                0.07       -0.40  1.00
Customer Count      0.25                0.35       -0.25  0.73          1.00
Total Sales         0.24                0.29       -0.17  0.60          0.91            1.00

• Green cells are our high(er) correlated values.
• Blue cells may be moderately correlated.
Regression
• Attempts to create a function that can predict a given value.
• In our case, given known values for temperature, rain, traffic,
etc – how can we predict our daily sales?
• Values:
• R Square: How well the model fits the given data, ideally above 0.95.
• Coefficient: This value will be multiplied by your variable to produce the
predicted output.
• P-Value: A p-value less than 0.05 means the variable is a likely
predictor of your output.
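The Coefficient and R Square values can be illustrated with a much simpler case than the ToolPak handles: one predictor instead of several. The Python sketch below fits a straight line by least squares and reports R Square. The function name `fit_line` and the data are made up for illustration, and p-values are left out because they need more statistics than this sketch covers.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R Square = 1 - (unexplained variation / total variation)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot

# Made-up values for illustration only.
customers = [10, 20, 30, 40]
sales = [80.0, 150.0, 260.0, 310.0]

intercept, slope, r_squared = fit_line(customers, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * customers")
print(f"R Square = {r_squared:.3f}")
```

Here the slope plays the role of the Coefficient: it is multiplied by the customer count to produce the predicted sales.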
Regression
1. Click the "Data Analysis" tool in the "Data" menu.
2. Select "Regression"
3. Select the value you want to predict as the "Input Y Range"
4. Select the values that will influence this output as the "Input X
Range"
5. Check the "Labels" box and the "Confidence Interval" box,
then "OK"

Note: In items 3 and 4, you will need to specifically select the cells with data (H1:H314), not just the full column (H:H)
Regression (R Square: 0.83)
                    Coefficients  P-value   Note
Intercept           335.88        0.32      Ignore: High P-Value
Social Media Posts  -23.07        0.55      Ignore: High P-Value
High Temp           -7.08         0.04
Rain                152.38        0.40      Ignore: High P-Value
Foot Traffic        -2.55         0.00
Customer Count      23.65         6.09E-75

• Theoretically, we can predict sales with this formula:
dailySales = -$7.08 * highTemp + $152.38 * rain - $2.55 * footTraffic + $23.65 * customerCount
• This fails to consider the text category of "Location", which was a large contributor of sales.
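As a worked example of applying the formula, the Python sketch below plugs one hypothetical day into it. The coefficients are copied from the regression output table; the function name `predict_daily_sales` and the input values are made up for illustration. The rain term is kept because the slide's formula includes it, even though the table flags its p-value as high.

```python
# Coefficients copied from the regression output table.
# Intercept and Social Media Posts are left out, as in the slide's formula.
COEFFICIENTS = {
    "high_temp": -7.08,
    "rain": 152.38,  # kept to match the slide, despite its high p-value
    "foot_traffic": -2.55,
    "customer_count": 23.65,
}

def predict_daily_sales(high_temp, rain, foot_traffic, customer_count):
    """Multiply each input by its coefficient and add up the results."""
    inputs = {
        "high_temp": high_temp,
        "rain": rain,
        "foot_traffic": foot_traffic,
        "customer_count": customer_count,
    }
    return sum(COEFFICIENTS[name] * value for name, value in inputs.items())

# Hypothetical day: high of 90, no rain, 200 people passing, 60 customers.
estimate = predict_daily_sales(90, 0, 200, 60)
print(f"Predicted sales: ${estimate:,.2f}")
```

The arithmetic is -7.08 * 90 + 152.38 * 0 - 2.55 * 200 + 23.65 * 60 = -637.20 + 0 - 510.00 + 1419.00 = 271.80.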
Descriptive Statistics
• Produces a set of summary values that describe a given variable.
• Statistics:
• Mean – average
• Median – "middle" number when sorted low to high
• Standard Deviation – how much values typically vary from the mean
• Skewness – whether the distribution of your data is shifted to one side
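These four statistics can be computed by hand to see what each one measures. The Python sketch below uses made-up daily sales and a hypothetical function name `describe`. The skewness line assumes the adjusted sample skewness formula that Excel's SKEW function is documented to use; that the ToolPak output matches it exactly is an assumption here.

```python
import statistics

def describe(values):
    """Return mean, median, sample standard deviation, and skewness."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    # Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)
    skew = (n / ((n - 1) * (n - 2))) * sum(((v - mean) / stdev) ** 3 for v in values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "stdev": stdev,
        "skewness": skew,
    }

# Made-up daily sales: several small days and one very large day.
daily_sales = [100.0, 200.0, 300.0, 400.0, 4000.0]
summary = describe(daily_sales)
for name, value in summary.items():
    print(f"{name:<9} {value:,.2f}")
```

With one very large day in the list, the mean (1000) sits well above the median (300) and the skewness is positive, which is the same pattern to look for in real sales data.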
Skewness
[Figure: example distributions for Skewness = 0, Skewness > 0, and Skewness < 0 (axes: Value, Frequency; label: Daily Sales)]
Descriptive Statistics
1. Click the "Data Analysis" tool in the "Data" menu.
2. Select "Descriptive Statistics"
3. Select your data for the "Input Range" (entering in the column
letters)
4. Check the "Labels in First Row" box and the "Summary
Statistics" box, then "OK"
Descriptive Statistics
Total Sales
Mean                1370.937
Standard Error      139.6033
Median              399.0955
Mode
Standard Deviation  2469.834
Sample Variance     6100081
Kurtosis            7.653245
Skewness            2.805633
Range               12895.57
Minimum             0
Maximum             12895.57
Sum                 429103.4
Count               313

• Average sales is $1371, but median is $399.
• Lots of small sale days, a few REALLY large sale days
• Standard Deviation is about $2470, so the daily sale value varies quite a bit.
• Skewness is fairly large
Actual Data Distribution
• Heavily skewed towards
smaller daily sales
• Positive Skewness
Questions? Answers?
Favorite PivotTable Uses?
