Evans Analytics2e PPT 04

The document discusses key concepts in descriptive statistics including measures of location, dispersion, and proportions for categorical variables. It defines important terms like population, sample, mean, median, range, interquartile range, variance, standard deviation, proportions, and correlation. Examples are provided to demonstrate how to calculate and interpret these descriptive statistics. Key points about outliers and how they can impact measures of location and dispersion are also covered.

Uploaded by

txuan

 Notation
 Measures of Location
 Measures of Dispersion
 Standardization
 Proportions for Categorical Variables
 Measures of Association
 Outliers
 Population - all items of interest for a particular
decision or investigation
- all married drivers over 25 years old
- all subscribers to Netflix

 Sample - a subset of the population

- a list of individuals who rented a comedy from
Netflix in the past year

 The purpose of sampling is to obtain sufficient

information to draw a valid conclusion about a
population.

Is the Netflix sample above a good sample? Why?

Other ways to select a sample?
 We typically label the elements of a data set using subscripted
variables, x1, x2, …, and so on, where xi represents the ith
observation. Upper-case letters like X often represent random
variables.

 It is common practice in statistics to use

◦ Greek letters, such as μ (mu; mean), σ (sigma; std. deviation), and π (pi;
proportion), to represent population measures and
◦ italic letters, such as x̄ (called x-bar), s, and p, to represent sample statistics.

 N represents the number of items in a population and n represents
the number of observations in a sample.
 Notation
 Measures of Location
 Mean
 Median
 Measures of Dispersion
 Standardization
 Proportions for Categorical Variables
 Measures of Association
 Outliers
 Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

 Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

 Excel function: =AVERAGE(data range)

 Property of the mean: the deviations from the mean sum to zero, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$

 Outliers can affect the value of the mean.

 The mean is valid for interval and ratio variables but is often
questionable for ordinal variables.
Purchase Orders database
 Using formula:

=SUM(B2:B95)/COUNT(B2:B95)

Mean = $2,471,760/94
= $26,295.32

Using Excel AVERAGE Function

=AVERAGE(B2:B95)
Person  Age        Person  Age
1       17         1       17
2       21         2       21
3       15         3       15
4       18         4       18
5       999        5
6       22         6       22
7       11         7       11
8       25         8       25
Mean    141.00     Mean    18.43
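
The two means in the table can be reproduced with a short Python sketch. The variable names are illustrative only, and dropping person 5 by filtering on the value 999 is simply a convenient way to mimic the right-hand table.

```python
from statistics import mean

# Ages from the table above; person 5's value of 999 is the outlier.
ages_with_outlier = [17, 21, 15, 18, 999, 22, 11, 25]

# Same data with the outlying observation left out.
ages_without_outlier = [a for a in ages_with_outlier if a != 999]

mean_with = mean(ages_with_outlier)        # 1128 / 8
mean_without = mean(ages_without_outlier)  # 129 / 7

print(f"Mean with outlier:    {mean_with:.2f}")
print(f"Mean without outlier: {mean_without:.2f}")
```

A single extreme value moves the mean from roughly 18 to 141, which is why the mean should be read together with a check for outliers.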

Wikipedia: In statistics, an outlier is an observation point that is distant from

other observations. An outlier may be due to variability in the
measurement or it may indicate experimental error; the latter are
sometimes excluded from the data set.
 The median specifies the middle value when the data are arranged
from least to greatest.
◦ Half the data are below the median, and half the data are above it.
◦ For an odd number of observations, the median is the middle of the
sorted numbers.
◦ For an even number of observations, the median is the mean of the two
middle numbers.
 We could use the Sort option in Excel to rank-order the data and
then determine the median. The Excel function =MEDIAN(data
range) could also be used.

 The median is meaningful for ratio, interval, and ordinal data.

 Not affected by outliers.
 Sort the data from smallest to largest. Since we
have 94 observations, the median is the average
of the 47th and 48th observations.

Median =
($15,562.50 + $15,750.00)/2
= $15,656.25

=MEDIAN(B2:B95)
Person Age
1 17.00
2 21.00
3 15.00
4 18.00
5 999.00
6 22.00
7 11.00
8 25.00
Mean 141.00
Median 19.50

Median is insensitive to outliers!
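
As a rough check of the odd/even rule and of the claim above, the following Python sketch computes the median by hand and compares it with the standard library. The helper name `median_by_hand` is hypothetical and used only for illustration.

```python
from statistics import median

def median_by_hand(values):
    """Median via the sort-and-take-the-middle rule for odd and even counts."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

ages = [17, 21, 15, 18, 999, 22, 11, 25]            # even count, with outlier
ages_without_outlier = [a for a in ages if a != 999]  # odd count

median_with = median_by_hand(ages)
median_without = median_by_hand(ages_without_outlier)

print(f"Median with outlier:    {median_with}")
print(f"Median without outlier: {median_without}")
```

Removing the value 999 changes the median only from 19.5 to 18, whereas the mean of the same data drops from 141.00 to 18.43.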

The Excel file Computer Repair Times includes 250
repair times for customers.
 What repair time would be
reasonable to quote to a
new customer?
 Median repair time is 2
weeks; mean and mode are
about 15 days.
 Examine the histogram.
90% are completed within 3 weeks

Distribution is important!
 Notation
 Measures of Location
 Measures of Dispersion
 Range
 Interquartile Range
 Variance
 Standard Deviation
 Empirical Rules
 Standardization
 Proportions for Categorical Variables
 Measures of Association
 Outliers
 Dispersion refers to the degree of variation in
the data; that is, the numerical spread (or
compactness) of the data.

 Key measures:
◦ Range
◦ Interquartile range
◦ Variance
◦ Standard deviation
 The range is the simplest and is the difference
between the maximum value and the minimum
value in the data set.

 In Excel, compute it as =MAX(data range) - MIN(data range).

 The range is affected by outliers, and is often

used only for very small data sets.
 Purchase Orders data

 For the cost per order data:

◦ Maximum = $127,500
◦ Minimum = $68.78
 Range = $127,500 - $68.78 = $127,431.22
 The interquartile range (IQR), or the midspread
is the difference between the first and third
quartiles, Q3 – Q1.

 This includes only the middle 50% of the data and,

therefore, is not influenced by extreme values.
 Purchase Orders data
 For the Cost per order data:
 Third Quartile = Q3 = $27,593.75
 First Quartile = Q1 = $6,757.81
 Interquartile Range = $27,593.75 – $6,757.81
=$20,835.94
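
The range and interquartile range can be sketched in Python as follows. The data values are hypothetical (they are not the Purchase Orders data), and quartile conventions differ between tools, so the quartiles from `statistics.quantiles` may not match a given spreadsheet function exactly.

```python
from statistics import quantiles

# Hypothetical values for illustration; 40 plays the role of an extreme value.
values = [2, 4, 4, 5, 7, 9, 10, 12, 40]

# Range: maximum minus minimum.
data_range = max(values) - min(values)

# Quartiles by linear interpolation between sorted data points
# (the "inclusive" method treats min and max as the 0th and 100th percentiles).
q1, q2, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print(f"Range = {data_range}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

In this sketch the single large value stretches the range to 38, while the IQR of 6 reflects only the middle half of the data.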
 The variance is the “average” of the squared
deviations from the mean.
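 For a population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

◦ In Excel: =VAR.P(data range)

 For a sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

◦ In Excel: =VAR.S(data range)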

 Note the difference in denominators!

 The standard deviation is the square root of the
variance.
◦ Note that the dimension of the variance is the square of the
dimension of the observations, whereas the dimension of the
standard deviation is the same as the data. This makes the
standard deviation more practical to use in applications.
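 For a population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

◦ In Excel: =STDEV.P(data range)

 For a sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

◦ In Excel: =STDEV.S(data range)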
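
The difference between the population and sample versions comes down to the denominator (N versus n - 1). A minimal Python sketch, using a small hypothetical data set, computes both by hand and compares them with the standard library:

```python
from math import sqrt
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

data_mean = sum(data) / n
# Sum of squared deviations from the mean (shared by both formulas).
sum_sq_dev = sum((x - data_mean) ** 2 for x in data)

pop_var = sum_sq_dev / n         # population variance: divide by N
samp_var = sum_sq_dev / (n - 1)  # sample variance: divide by n - 1
pop_sd = sqrt(pop_var)
samp_sd = sqrt(samp_var)

print(f"Population: variance = {pop_var:.4f}, std. dev. = {pop_sd:.4f}")
print(f"Sample:     variance = {samp_var:.4f}, std. dev. = {samp_sd:.4f}")
```

The sample variance is always slightly larger than the population variance computed from the same numbers, and the gap shrinks as n grows.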

Excel file: Closing Stock
Prices
Intel (INTC):
Mean = $18.81
Standard deviation = $0.50
General Electric (GE):
Mean = $16.19
Standard deviation = $0.35

INTC is a higher risk

investment than GE.
 For many data sets encountered in practice:
 Approximately 68% of the observations fall within one
standard deviation of the mean
 Approximately 95% fall within two standard deviations of
the mean
 Approximately 99.7% fall within three standard deviations
of the mean

 These rules are commonly used to characterize

the natural variation in manufacturing processes
and other business phenomena.
 The empirical rule comes from the normal distribution.

Most data does not follow a normal distribution!

 For any data set (any distribution), the
proportion of values that lie within +/- k (k > 1)
standard deviations of the mean is at least
$1 - 1/k^2$ (Chebyshev's theorem).

 Examples:
◦ For k = 2: at least ¾ or 75% of the data lie within two
standard deviations of the mean
◦ For k = 3: at least 8/9 or 89% of the data lie within
three standard deviations of the mean
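
To make the bound concrete, the sketch below evaluates 1 - 1/k^2 and compares it with the observed proportion in a small, strongly non-normal data set (the eight ages used earlier, including the value 999). The function names are hypothetical, and the population standard deviation is used because the data set is treated as the whole distribution.

```python
from statistics import mean, pstdev

def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

def fraction_within(values, k):
    """Observed proportion of values within k standard deviations of the mean."""
    center = mean(values)
    spread = pstdev(values)
    inside = sum(abs(x - center) <= k * spread for x in values)
    return inside / len(values)

ages = [17, 21, 15, 18, 999, 22, 11, 25]

for k in (2, 3):
    bound = chebyshev_lower_bound(k)
    observed = fraction_within(ages, k)
    print(f"k = {k}: at least {bound:.3f}, observed {observed:.3f}")
```

The observed proportions are never below the bound, although for well-behaved data they are usually much higher, as the empirical rule suggests.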
 Notation
 Measures of Location
 Measures of Dispersion
 Standardization
 Proportions for Categorical Variables
 Measures of Association
 Outliers
 A standardized value, commonly called a z-score,
provides a relative measure of the distance an
observation is from the mean, which is independent of
the units of measurement.
 The z-score for the ith observation in a data set is
calculated as follows: $z_i = \frac{x_i - \bar{x}}{s}$

◦ Excel function: =STANDARDIZE(x, mean, standard_dev).

Standardized data is needed by many predictive

methods since it makes variables comparable.
 Purchase Orders Cost per order data

=(B2 - $B$97)/$B$98, or
=STANDARDIZE(B2,$B$97,$B$98).
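
The same calculation can be sketched in Python. The cost values are hypothetical, and the sketch assumes the sample standard deviation is the one used for standardizing; the function name `z_scores` is illustrative.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample std. dev."""
    center = mean(values)
    spread = stdev(values)
    return [(x - center) / spread for x in values]

# Hypothetical cost-per-order values for illustration.
costs = [120.0, 250.0, 310.0, 480.0, 900.0]
z = z_scores(costs)

for cost, score in zip(costs, z):
    print(f"{cost:8.2f} -> z = {score:+.3f}")
```

After standardizing, the z-scores have mean 0 and standard deviation 1 regardless of the original units, which is what makes variables on different scales comparable.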

 Notation
 Measures of Location
 Measures of Dispersion
 Standardization
 Proportions for Categorical Variables
 Measures of Association
 Outliers
 The proportion, denoted by p, is the fraction of
data that have a certain characteristic.

 Proportions are key descriptive statistics for

categorical data, such as defects or errors in
quality control applications or consumer
preferences in market research.

 Example: Proportion of female students is 60%.

 Proportion of orders placed by Spacetime Technologies
=COUNTIF(A4:A97, "Spacetime Technologies")/94
= 12/94 = 0.128
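
A proportion is a count divided by the total number of observations. The following Python sketch mirrors the COUNTIF idea on a short hypothetical list of orders; the supplier names other than Spacetime Technologies are placeholders, not values from the Purchase Orders file.

```python
# Hypothetical supplier column; "Supplier A/B/C" are placeholder names.
suppliers = [
    "Spacetime Technologies", "Supplier A", "Supplier B", "Supplier A",
    "Spacetime Technologies", "Supplier C", "Supplier B", "Supplier A",
]

target = "Spacetime Technologies"

# Count the matching rows and divide by the number of rows.
matches = suppliers.count(target)
proportion = matches / len(suppliers)

print(f"{matches} of {len(suppliers)} orders -> p = {proportion:.3f}")
```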
 Notation
 Measures of Location
 Measures of Dispersion
 Standardization
 Proportions for Categorical Variables
 Measures of Association
 Correlation
 Outliers
 Two variables have a strong statistical relationship
with one another if they appear to “move” together.

 When two variables appear to be related, you

might suspect a cause-and-effect relationship.

 Caution: Correlation does not prove causation!

Statistical relationships may exist even though a
change in one variable is not caused by a change
in the other.
 Covariance is a measure of the linear association between two
variables, X and Y. Like the variance, different formulas are used for
populations and samples.
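 Population covariance: $\mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$

◦ Excel function: =COVARIANCE.P(array1,array2)

 Sample covariance: $\mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

◦ Excel function: =COVARIANCE.S(array1,array2)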

 The covariance between X and Y is the average of the product of
the deviations of each pair of observations from their respective
means.
 Colleges and
Universities data
 Correlation is a measure of the linear relationship between two
variables, X and Y, which does not depend on the units of
measurement.
 Correlation is measured by the correlation coefficient, also known as
the Pearson product moment correlation coefficient.
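 Correlation coefficient for a population: $\rho_{xy} = \frac{\mathrm{cov}(X, Y)}{\sigma_x \sigma_y}$

 Correlation coefficient for a sample: $r_{xy} = \frac{\mathrm{cov}(X, Y)}{s_x s_y}$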

 The correlation coefficient is scaled between -1 and 1.

 Excel function: =CORREL(array1,array2)
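
The relationship between covariance and correlation can be sketched in Python as follows. The functions implement the sample formulas directly; the function names and the data are hypothetical and chosen only for illustration.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    """Sample covariance divided by the product of sample std. deviations."""
    s_x = sqrt(sample_covariance(xs, xs))  # cov(X, X) is the variance of X
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)

# Changing the units of x rescales the covariance but not the correlation.
x_rescaled = [100 * v for v in x]
cov_rescaled = sample_covariance(x_rescaled, y)
r_rescaled = sample_correlation(x_rescaled, y)

print(f"cov = {cov_xy:.3f}, r = {r_xy:.4f}")
print(f"after rescaling x: cov = {cov_rescaled:.3f}, r = {r_rescaled:.4f}")
```

The rescaling step illustrates why correlation, unlike covariance, does not depend on the units of measurement.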
Why is correlation important?
 Colleges and Universities data
 Is a school's graduation rate related to the SAT scores of
incoming students?

Is there a causal relationship?

Data > Data Analysis > Correlation

 Excel computes the correlation coefficient

between all pairs of variables in the Input Range.
Input Range data must be in contiguous columns.
 Colleges and Universities data

◦ Moderate negative correlation between acceptance rate and

graduation rate, indicating that schools with lower acceptance
rates have higher graduation rates.
◦ Acceptance rate is also negatively correlated with the median
SAT and Top 10% HS, suggesting that schools with lower
acceptance rates have higher student profiles.
◦ The correlations with Expenditures/Student suggest that schools
with higher student profiles spend more money per student.
Value Field Settings include several statistical
measures:
 Average
 Max and Min
 Product
 Standard deviation
 Variance
 Credit Risk Data
 First, create a PivotTable.
 In the PivotTable Field List, move Job to the Row Labels
field and Checking and Savings to the Values field. Then
change the field settings from “Sum of Checking” and
“Sum of Savings” to the averages.
 Notation
 Measures of Location
 Measures of Dispersion
 Standardization
 Proportions for Categorical Variables
 Measures of Association
 Outliers
 There is no standard definition of what constitutes an
outlier!

 Wikipedia: “In statistics, an outlier is an observation point that is

distant from other observations. […] Outliers can occur by
chance in any distribution, but they often indicate either
measurement error or that the population has a heavy-tailed
distribution.”

 If the outlier is due to a measurement error then we often want to

exclude it from the analysis.

 Some typical rules of thumb:

 Normal distribution: z-scores greater than +3 or less than -3
 Boxplot:
 Extreme outliers are more than 3*IQR to the left of Q1 or right of Q3
 Mild outliers are between 1.5*IQR and 3*IQR to the left of Q1 or right of Q3
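
The boxplot rule of thumb can be sketched in Python as follows. The function name is hypothetical, the data are the eight ages used earlier, and the quartiles depend on the quartile convention, so borderline points may be labeled differently by other tools.

```python
from statistics import quantiles

def classify_boxplot_outliers(values):
    """Label each value 'none', 'mild', or 'extreme' using the IQR fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    labeled = []
    for x in values:
        # Distance below Q1 or above Q3; zero for values inside the box.
        distance = max(q1 - x, x - q3, 0)
        if distance > 3 * iqr:
            label = "extreme"
        elif distance > 1.5 * iqr:
            label = "mild"
        else:
            label = "none"
        labeled.append((x, label))
    return labeled

ages = [17, 21, 15, 18, 999, 22, 11, 25]
result = classify_boxplot_outliers(ages)

for value, label in result:
    print(f"{value:4d}: {label}")
```

Because the fences are built from quartiles rather than from the mean and standard deviation, the value 999 does not distort the rule that is used to detect it.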
 Home Market Value data

 None of the z-scores exceed 3. However, while

individual variables might not exhibit outliers,
combinations of them might.
◦ The last observation has a high market value ($120,700) but
a relatively small house size (1,581 square feet) and may be
an outlier.
 Excel file Surgery Infections
◦ Is month 12 simply random variation or some explainable
phenomenon?
 Three-standard deviation empirical rule:

 There is only about a 0.3% chance (for normally distributed data), or at most
an 11% chance (for any distribution), of seeing an observation outside +/- 3
standard deviations of the mean.
 This suggests that month 12 is statistically different from the rest of
the data.
