
LEARN - Statistics For Data Analysis

The document provides an overview of the statistics workflow: understanding sample data through descriptive statistics, modeling a population with a probability distribution if the sample fits one (or using the central limit theorem if it does not), leveraging the central limit theorem to draw conclusions about populations from samples, and using additional variables to increase the accuracy of estimates and predictions.

Uploaded by Shreeya Sarkar
Copyright © All Rights Reserved

THE STATISTICS WORKFLOW

1) Understand what your sample data looks like
2) If the sample data fits a probability distribution, use it as a model for the entire population
3) If the sample doesn’t fit a distribution, use the central limit theorem to make estimates about population parameters
4) Continue to leverage the central limit theorem to draw conclusions about what a population looks like based on a sample
5) Use additional variables to increase the accuracy of your estimates and make predictions based on their relationships

PRO TIP: Do what the task requires and don’t go overboard. If you have all the population data, or simply need a bit of inspiration to make an “unimportant” decision, then descriptive statistics may be all you need!
Maven Analytics
DESCRIPTIVE STATS: 3 TYPES

1) Frequency distribution: represents the frequency of each value
• Frequency tables
• Histograms

2) Central tendency: represents the middle of the values
• Mean, median, and mode
• Skew

3) Variability: represents the dispersion of the values
• Min, max, and range
• Quartiles & interquartile range
• Box & whisker plots
• Variance & standard deviation
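The three types of descriptive statistics can also be computed outside Excel. The sketch below uses Python’s standard library and a small list of hypothetical values, not course data. The helper name `describe` is our own. One assumption to note is the quartile method. `statistics.quantiles` by default uses the same (n + 1) positioning as Excel’s QUARTILE.EXC, and passing method="inclusive" would match QUARTILE.INC instead.

```python
from collections import Counter
import statistics

def describe(values: list[float]) -> dict:
    """Summarize values by frequency, central tendency, and variability."""
    # Frequency distribution: how often each value occurs (a frequency table)
    frequency = dict(sorted(Counter(values).items()))

    # Central tendency: the middle of the values
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)

    # Variability: the dispersion of the values
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "frequency": frequency,
        "mean": mean,
        "median": median,
        "mode": mode,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "stdev": statistics.stdev(values),        # sample standard deviation
    }

# Hypothetical example values
print(describe([2, 3, 3, 5, 7, 8, 8, 8, 10]))
```

A box & whisker plot is drawn from the same min, q1, median, q3, and max values.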

PROBABILITY DISTRIBUTIONS

1) Discrete probability distributions (e.g., uniform, binomial, Poisson)
• The height of each bar is its probability
• There are “gaps” between the numbers

2) Continuous probability distributions (e.g., uniform, exponential, normal)
• The numbers can take any value
• The height of the curve is NOT its probability; the area under the curve is (more on this later!)
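Here is a minimal Python sketch of the discrete vs. continuous distinction. For a discrete distribution (binomial here), each bar height is a probability. For a continuous one (normal here), the curve can rise above 1, so probabilities have to come from areas, computed with the cumulative distribution function (CDF). The sketch uses the standard library’s `math.comb` and `statistics.NormalDist`, which need Python 3.8+. All parameter values are hypothetical.

```python
import math
from statistics import NormalDist

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial distribution: the height of bar k."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Discrete: bar heights are probabilities, and they sum to 1
bars = [binomial_pmf(k, n=10, p=0.5) for k in range(11)]
print(sum(bars))  # 1.0 up to floating-point rounding

# Continuous: a narrow normal curve is taller than 1 at its peak,
# so the height of the curve cannot be a probability
narrow = NormalDist(mu=0, sigma=0.1)
print(narrow.pdf(0))  # about 3.99

# Probabilities are areas under the curve, computed with the CDF
dist = NormalDist(mu=100, sigma=15)
print(dist.cdf(115) - dist.cdf(85))  # P(85 <= X <= 115), about 0.683

# The inverse CDF turns a probability back into a value estimate
print(dist.inv_cdf(0.95))  # value with 95% of the area below it, about 124.7
```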

CONFIDENCE INTERVALS
A confidence interval is an estimate of an unknown population value using a sample
• It is a range defined by a point estimate, like the sample mean, plus/minus a margin of error
• It includes a confidence level: the probability that an interval built this way contains the population value (you can’t be certain that any single interval does!)

[Figure: estimating the population mean (μ = ?) from a sample with n = 95 and 𝑥̅ = 239.9]
• Remember, the sample means are normally distributed around the population mean
• The area under the curve between the interval bounds is the confidence level
• The distance between the mean and each bound is the margin of error
• It’s possible, but not probable, that the interval won’t include the population mean!
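The interval (point estimate plus/minus margin of error) can be sketched in Python. The sketch assumes the sample is large enough for the sample mean to be approximately normal, so it uses a z critical value from `statistics.NormalDist`. Excel’s CONFIDENCE.NORM computes the same margin. Excel’s CONFIDENCE.T uses the t-distribution instead, which gives a slightly wider interval. The function name is our own. Only x̄ = 239.9 and n = 95 come from the slide. The standard deviation s = 20 in the usage example is hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(x_bar: float, s: float, n: int,
                             confidence: float = 0.95) -> tuple[float, float]:
    """Return (lower, upper) bounds for the population mean (z-based)."""
    # Critical value, e.g. about 1.96 for a 95% confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = s / math.sqrt(n)
    margin_of_error = z * standard_error  # same as Excel's CONFIDENCE.NORM
    return x_bar - margin_of_error, x_bar + margin_of_error

# x_bar and n from the slide; s = 20 is a hypothetical sample standard deviation
low, high = mean_confidence_interval(x_bar=239.9, s=20, n=95)
print(f"95% CI for the population mean: {low:.1f} to {high:.1f}")
```

Raising the confidence level widens the interval, because a larger critical value multiplies the same standard error.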

HYPOTHESIS TESTING

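1) Two-tailed test: H₀: μ = μ₀, Hₐ: μ ≠ μ₀
• The p-value is split between both tails (p/2 below tlower and p/2 above tupper)
• Excel p-value formulas: =T.DIST(tlower, df, TRUE)*2 or =T.DIST.2T(tupper, df)

2) Left-tailed test: H₀: μ ≥ μ₀, Hₐ: μ < μ₀
• The p-value is the area p in the lower tail, below t
• Excel p-value formula: =T.DIST(t, df, TRUE)

3) Right-tailed test: H₀: μ ≤ μ₀, Hₐ: μ > μ₀
• The p-value is the area p in the upper tail, above t
• Excel p-value formulas: =1-T.DIST(t, df, TRUE) or =T.DIST.RT(t, df)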
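The three p-value formulas can be mirrored in Python. Python’s standard library has no t-distribution, so this sketch hand-rolls the CDF by integrating the Student’s t density numerically with Simpson’s rule. It does not call a statistics package (SciPy users would typically use `scipy.stats.t` instead). The helper names are hypothetical. The one-sample t statistic, t = (x̄ − μ₀)/(s/√n) with df = n − 1, is the standard formula for testing a mean.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x), like Excel's T.DIST(x, df, TRUE).

    Integrates the density from 0 to |x| with Simpson's rule (steps must be
    even) and uses the symmetry of the distribution around 0.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_statistic(x_bar: float, mu0: float, s: float, n: int) -> float:
    """One-sample t statistic for a hypothesis about the population mean."""
    return (x_bar - mu0) / (s / math.sqrt(n))

def p_value(t: float, df: int, alternative: str) -> float:
    """p-value for Ha: mu != mu0 ("two"), mu < mu0 ("left"), mu > mu0 ("right")."""
    if alternative == "two":
        return 2 * t_cdf(-abs(t), df)  # =T.DIST.2T(ABS(t), df)
    if alternative == "left":
        return t_cdf(t, df)  # =T.DIST(t, df, TRUE)
    if alternative == "right":
        return 1 - t_cdf(t, df)  # =1-T.DIST(t, df, TRUE) or =T.DIST.RT(t, df)
    raise ValueError("alternative must be 'two', 'left', or 'right'")

# Hypothetical example: n = 25, x_bar = 52, s = 5, testing mu0 = 50
t = t_statistic(x_bar=52, mu0=50, s=5, n=25)
for alt in ("two", "left", "right"):
    print(alt, round(p_value(t, df=24, alternative=alt), 4))
```

For the two-tailed case, `-abs(t)` plays the role of tlower, so doubling its left-tail area matches =T.DIST(tlower, df, TRUE)*2.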

REGRESSION ANALYSIS
The goal of regression is to predict a dependent variable using independent variables
• This is achieved by fitting a line through the sample data points that models the population

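[Figure: a regression line fit through sample data points of monthly site traffic (y) vs. advertising budget (x)]
• This line is a model that can be used to predict site traffic in a given month based on the advertising budget!
• Site traffic is the dependent variable (y), which is what you’re trying to predict
• Advertising budget is the independent variable (x), which helps you predict the dependent variable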
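Fitting the line can be sketched with ordinary least squares, the usual way a simple regression line is fit. Excel’s SLOPE and INTERCEPT functions return the same two coefficients. The advertising-budget and site-traffic numbers below are hypothetical and only show the mechanics. `fit_line` is our own helper name.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = co-variation of x and y divided by the variation in x
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical data: monthly advertising budget (x) and site traffic (y)
budget = [10, 20, 30, 40, 50]
traffic = [1200, 1900, 3100, 3900, 5100]
slope, intercept = fit_line(budget, traffic)
print(f"traffic = {intercept:.0f} + {slope:.1f} * budget")
print("Predicted traffic for a budget of 60:", intercept + slope * 60)
```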

NEW COURSE: STATS FOR DATA ANALYSIS!
• Why Statistics?: Discuss the role of statistics in the context of business intelligence and decision-making, and introduce the statistics workflow
• Descriptive Statistics: Understand data using descriptive statistics, including frequency distributions and measures of central tendency & variability
• Probability Distributions: Model data with probability distributions, and use the normal distribution to calculate probabilities and make value estimates
• Central Limit Theorem: Introduce the Central Limit Theorem, which leverages the normal distribution to make inferences on populations with any distribution
• Confidence Intervals: Make estimates with confidence intervals, which use sample statistics to define a range where an unknown population parameter likely lies
• Hypothesis Tests: Draw conclusions with hypothesis tests, which let you evaluate assumptions about population parameters using sample statistics
• Regression Analysis: Make predictions with regression analysis, and estimate the values of a dependent variable via its relationship with independent variables
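The Central Limit Theorem idea can be illustrated with a quick simulation. The sketch draws many samples from a clearly non-normal (exponential) population and looks at how their means are distributed. It uses Python’s `random` module with a fixed seed. The sample size and number of samples are arbitrary choices, not values from the course.

```python
import math
import random
import statistics

def sample_means(sample_size: int, num_samples: int, seed: int = 42) -> list[float]:
    """Means of repeated samples from an exponential population (mean 1, sd 1)."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]

n = 50
means = sample_means(sample_size=n, num_samples=2000)
se = 1 / math.sqrt(n)  # population sigma (1) / sqrt(n): the standard error

# The sample means cluster around the population mean (1.0) ...
print("mean of sample means:", round(statistics.fmean(means), 3))
# ... with a spread close to the standard error
print("sd of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(se, 3))

# Roughly 95% of sample means fall within 1.96 standard errors,
# as a normal distribution predicts, even though the population is skewed
share = sum(abs(m - 1.0) <= 1.96 * se for m in means) / len(means)
print("share within 1.96 SE:", round(share, 3))
```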
THE COURSE PROJECT
You’ve just been hired as a Recruitment Analyst by Maven Business School, an online startup that’s looking to disrupt the postgraduate programs offered by traditional universities.

You have data from the first graduating class of its MBA program, including details & scores from the students’ applications and the program itself, as well as their employment status 2 months later.

Your goal is to leverage statistics to evaluate the results of this class, predict the performance of future classes, and propose changes in recruitment to improve graduate outcomes.

• Understand the data with descriptive statistics
• Model the data with probability distributions
• Make estimates with confidence intervals
• Draw conclusions with hypothesis tests
• Make predictions with regression analysis

COURSE EXPECTATIONS
This course is about introducing & demystifying essential statistics concepts
• Our goal is to break down seemingly complex techniques using simple and intuitive explanations that will help you develop an intuition for when, why, and how to use them in the real world

It’s also about applying those concepts to real-world use cases
• As we introduce each topic, we’ll use Microsoft Excel as a tool to apply it through hands-on demos & assignments, and include additional projects to test your knowledge in different scenarios

We’ll be using Excel for Office 365 on a PC for the course demos
• What you see on your screen may not always match what you see on ours, especially if you are running a different operating system or following along with an older version of Excel

You do NOT need a math or stats background to take this course
• Although we will cover many statistical equations (and their equivalent Excel functions), the focus will be on the meaning behind them, not on the technical details or proofs
WHERE TO FIND THE COURSE
Like all Maven Analytics courses, Statistics for Data Analysis is included with an unlimited access subscription at mavenanalytics.io

For those who prefer to purchase individual courses, this one just went live on Udemy and you can get it for $9.99 with code: STATSFORDATA
udemy.com/course/essential-statistics-for-data-analysis/