Eba3e PPT ch04

Uploaded by

Nazia Enayet

Business Analytics: Methods, Models, and Decisions
Third Edition, Global Edition

Chapter 4
Descriptive Statistics

Copyright © 2021 Pearson Education Ltd. Slide - 1

Statistics
• Statistics, as defined by David Hand, past president of
the Royal Statistical Society in the UK, is both the
science of uncertainty and the technology of extracting
information from data.
– Statistics involves collecting, organizing, analyzing,
interpreting, and presenting data.
– A statistic is a summary measure of data.
• Descriptive statistics refers to methods of describing
and summarizing data using tabular, visual, and
quantitative techniques.

Metrics and Data Classification
• Metric - a unit of measurement that provides a way
to objectively quantify performance.
• Measurement - the act of obtaining data associated
with a metric.
• Measures - numerical values associated with a
metric.

Types of Metrics
• Discrete metric - one that is derived from counting
something.
– For example, a delivery is either on time or not; an order
is complete or incomplete; or an invoice can have one,
two, three, or any number of errors. Some discrete
metrics would be the proportion of on-time deliveries; the
number of incomplete orders each day, and the number
of errors per invoice.
• Continuous metrics are based on a continuous
scale of measurement.
– Any metrics involving dollars, length, time, volume, or
weight, for example, are continuous.

Measurement Scales
• Categorical (nominal) data - sorted into
categories according to specified characteristics.
• Ordinal data - can be ordered or ranked according
to some relationship to one another.
• Interval data - ordinal but have constant
differences between observations and have
arbitrary zero points.
• Ratio data - continuous and have a natural zero.

Example 4.1: Classifying Data
Elements

Frequency Distributions and
Histograms
• A frequency distribution is a table that shows
the number of observations in each of several
nonoverlapping groups.

• A graphical depiction of a frequency distribution in
the form of a column chart is called a histogram.

Frequency Distributions for
Categorical Data
• Categorical variables naturally define the groups
in a frequency distribution.
• To construct a frequency distribution, we need
only count the number of observations that appear
in each category.
– This can be done using the Excel COUNTIF function.

Example 4.2: Constructing a Frequency Distribution
for Items in the Purchase Orders Database
• List the item names in a column on the spreadsheet.
• Use the function =COUNTIF(range, cell_reference), where range is the
column of item names in the data and cell_reference is the cell containing
the item name.

Example 4.2 Continued
• Construct a column chart to visualize the frequencies.

Relative Frequency Distributions
• Relative frequency is the fraction, or proportion, of the
total.
• If a data set has n observations, the relative frequency of
category i is:

\[ \text{relative frequency of category } i = \frac{\text{frequency of category } i}{n} \]

• We often multiply the relative frequencies by 100 to
express them as percentages.
• A relative frequency distribution is a tabular summary of
the relative frequencies of all categories.

Example 4.3: Constructing a Relative Frequency
Distribution for Items in the Purchase Orders Database
• First, sum the frequencies to find the total number (note
that the sum of the frequencies must be the same as the
total number of observations, n).
• Then divide the frequency of each category by this value.
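
The two steps above can be sketched outside Excel as well. The following is a minimal Python illustration, not the textbook's method; the item names are hypothetical stand-ins for the Item column of the Purchase Orders data.

```python
from collections import Counter

# Hypothetical item names standing in for the "Item" column of the
# Purchase Orders data (the real values are not shown on the slides).
items = ["Gasket", "O-Ring", "Gasket", "Shielded Cable", "O-Ring", "Gasket"]

frequency = Counter(items)      # plays the role of COUNTIF for each category
n = sum(frequency.values())     # the frequencies must sum to n

# Relative frequency of category i = frequency of category i / n
relative_frequency = {item: count / n for item, count in frequency.items()}

for item, count in frequency.most_common():
    print(f"{item:15s} {count:3d} {relative_frequency[item]:7.1%}")
```

The relative frequencies always sum to 1, which is a quick check that no observation was missed.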

Frequency Distributions for
Numerical Data
• For numerical data that consist of a small number of
discrete values, we may construct a frequency
distribution similar to the way we did for categorical
data; that is, we simply use COUNTIF to count the
frequencies of each discrete value.

Example 4.4: Frequency and Relative
Frequency Distribution for A/P Terms
• In the Purchase Orders data, the A/P terms are all
whole numbers 15, 25, 30, and 45.

Excel Histogram Tool
• Frequency distributions and histograms can be
created using the Analysis Toolpak in Excel.
– Click the Data Analysis tools button in the Analysis group
under the Data tab in the Excel menu bar and select
Histogram from the list.

Histogram Dialog
• Specify the Input Range corresponding to the data. If you include the
column header, then also check the Labels box so Excel knows that
the range contains a label. The Bin Range defines the groups (Excel
calls these “bins”) used for the frequency distribution.

Using Bin Ranges
• If you do not specify a bin range, Excel will automatically
determine bin values for the frequency distribution and
histogram, which often results in a rather poor choice.
• If you have discrete values, set up a column of these
values in your spreadsheet for the bin range and specify
this range in the Bin Range field.

Example 4.5: Using the Histogram
Tool
• We will create a frequency distribution and histogram for
the A/P Terms variable in the Purchase Orders database.
• We defined the bin range in cells below the data (the label Month
followed by the values 15, 25, 30, and 45).

Example 4.5 Continued
• Histogram tool results:

Grouped Frequency Distributions
• For numerical data that have many different discrete values with little
repetition or are continuous, a frequency distribution requires that we
define groups by specifying
1. the number of groups,
2. the width of each group, and
3. the upper and lower limits of each group.
• Choose between 5 and 15 groups, and make the width of each group
equal.
• Choose the lower limit of the first group (LL) as a whole number
smaller than the minimum data value and the upper limit of the last
group (UL) as a whole number larger than the maximum data value.
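
As a rough sketch of these rules, the Python below computes the group width and counts the observations in each group. It assumes every value lies between the chosen limits and that a value equal to an upper limit is counted in that group; the function names and the middle data values are hypothetical, while the limits, minimum, and maximum come from Example 4.6.

```python
import math

def group_width(lower_limit, upper_limit, groups):
    """Width of each group: (UL - LL) / number of groups."""
    return (upper_limit - lower_limit) / groups

def grouped_frequencies(data, lower_limit, upper_limit, groups):
    """Count observations per group; a value equal to an upper limit
    is counted in that group. Assumes all data lie within the limits."""
    width = group_width(lower_limit, upper_limit, groups)
    counts = [0] * groups
    for x in data:
        index = max(0, math.ceil((x - lower_limit) / width) - 1)
        counts[index] += 1
    return counts

# Minimum and maximum come from Example 4.6; the other values are made up.
costs = [68.78, 5000.0, 26000.0, 30000.0, 127500.0]
print(group_width(0, 130000, 5))
print(grouped_frequencies(costs, 0, 130000, 5))
```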

Example 4.6: Constructing a Frequency
Distribution and Histogram for Cost Per Order
• The data range from a minimum of $68.78 to a maximum of $127,500;
set the lower limit of the first group to $0 and the upper limit of the last
group to $130,000.
• If we select 5 groups, using equation (4.2) the width of each group is

\[ \text{group width} = \frac{UL - LL}{\text{number of groups}} = \frac{130{,}000 - 0}{5} = 26{,}000 \]

Example 4.6 Continued
• Ten-group histogram

Cumulative Relative Frequency
Distributions
• The cumulative relative frequency represents
the proportion of the total number of
observations that fall at or below the upper limit
of each group.
• A tabular summary of cumulative relative
frequencies is called a cumulative relative
frequency distribution.

Example 4.7: Computing Cumulative
Relative Frequencies
• Set the cumulative relative frequency of the first group equal to its
relative frequency. Then add the relative frequency of the next group to
the cumulative relative frequency.
• For example, the cumulative relative frequency in cell D3 is
computed as =D2+C3 = 0.000 + 0.4468 = 0.4468.
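
The running-sum logic of Example 4.7 can be sketched in Python as follows. Only the first relative frequency (0.4468) comes from the slide; the remaining values are hypothetical and were chosen so that the total is 1.

```python
from itertools import accumulate

# The first relative frequency (0.4468) appears in Example 4.7;
# the remaining values are hypothetical and chosen to sum to 1.
relative = [0.4468, 0.2500, 0.2000, 0.1032]

# Each cumulative value is the previous cumulative value plus the
# relative frequency of the current group.
cumulative = list(accumulate(relative))

for rel, cum in zip(relative, cumulative):
    print(f"{rel:.4f}  {cum:.4f}")
```

The last cumulative relative frequency should equal 1, apart from rounding.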

Constructing Frequency Distributions
Using PivotTables
• In the Purchase Orders data, we can simply
build a PivotTable to find a count of the number
of orders for each item.
• For continuous numerical data, we can also use
PivotTables to construct a grouped frequency
distribution.

Example 4.8: Constructing a Grouped
Frequency Distribution Using PivotTables
1. Using the Purchase Orders database, create a
PivotTable as shown:

Example 4.8 Continued
2. Click on any value in the Row Labels column, and from
the Analyze tab for PivotTable Tools, select Group Field. Edit
the dialog to start at 0 and end at 130000, and use 26000 as
the group range.

Example 4.8 Continued
• Grouped frequency distribution results:

Percentiles
• The kth percentile is a value at or below which at least k percent
of the observations lie. The most common way to compute the
kth percentile is to order the data values from smallest to largest
and calculate the rank of the kth percentile using the formula:

\[ \text{rank} = \frac{nk}{100} + 0.5, \text{ rounded to the nearest integer} \]

• Statistical software packages use different methods that often involve
interpolating between ranks instead of rounding, thus producing
different results.
– The Excel function =PERCENTILE.INC(array, k) computes the kth
percentile of data in the range specified in the array field, where k is in the
range 0 to 1, inclusive (i.e., including 0 and 1).
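
A minimal Python sketch of the rank-based method (not the interpolating method Excel uses) is shown below. The function names and the five sample values are hypothetical; n = 94 and k = 90 are the figures used for the Purchase Orders data.

```python
import math

def percentile_rank(n, k):
    """Rank of the kth percentile: nk/100 + 0.5, rounded to the nearest integer."""
    raw = n * k / 100 + 0.5
    rank = math.floor(raw + 0.5)      # round half up
    return min(max(rank, 1), n)       # keep the rank inside 1..n

def percentile(data, k):
    """Value at the rank of the kth percentile in the sorted data."""
    ordered = sorted(data)
    return ordered[percentile_rank(len(ordered), k) - 1]

print(percentile_rank(94, 90))            # rank used for n = 94, k = 90
print(percentile([12, 5, 9, 20, 7], 50))  # hypothetical data
```

Because this method picks an actual observation rather than interpolating, its result can differ from what PERCENTILE.INC returns.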

Examples 4.9 and 4.10: Computing
Percentiles
• Compute the 90th percentile for Cost per order in the
Purchase Orders data.
• Rank of kth percentile: \( nk/100 + 0.5 \)
• n = 94; k = 90
• For the 90th percentile, the rank is \( 94(90)/100 + 0.5 = 85.1 \), which rounds to 85.
– Value of the 85th observation = $74,375
• Using the Excel function =PERCENTILE.INC(data range, 0.9), the 90th percentile is
$73,737.50, which is different from using formula (4.3).

Example 4.11: Excel Rank and
Percentile Tool

• The Rank and Percentile tool lists $74,375 as the 90.3rd percentile.
This is the same value obtained by manually computing the 90th
percentile in Example 4.9.

Quartiles
• Quartiles break the data into four parts.
– The 25th percentile is called the first quartile, Q1;
– the 50th percentile is called the second quartile, Q2;
– the 75th percentile is called the third quartile, Q3; and
– the 100th percentile is the fourth quartile, Q4.

• One-fourth of the data fall below the first quartile, one-half
are below the second quartile, and three-fourths are below
the third quartile.
• Excel function: =QUARTILE.INC(array, quart), where
array specifies the range of the data and quart is a whole
number between 1 and 4, designating the desired quartile.

Example 4.12: Computing Quartiles in
Excel
• Compute the Quartiles of the Cost per Order data
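– First quartile: =QUARTILE.INC(data range, 1) = $6,757.81
– Second quartile: =QUARTILE.INC(data range, 2)
– Third quartile: =QUARTILE.INC(data range, 3) = $27,593.75
– Fourth quartile: =QUARTILE.INC(data range, 4) = $127,500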

Cross-Tabulations
• A cross-tabulation is a tabular method that displays the
number of observations in a data set for different
subcategories of two categorical variables.
– A cross-tabulation table is often called a contingency table.

• The subcategories of the variables must be mutually
exclusive and exhaustive, meaning that each observation
can be classified into only one subcategory, and, taken
together over all subcategories, they must constitute the
complete data set.
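
To illustrate how a cross-tabulation of counts turns into row percentages, here is a small Python sketch. The counts are the Book and DVD figures by region reported for the Sales Transactions example; the helper function names are hypothetical.

```python
# Counts of books and DVDs by region from the Sales Transactions example.
counts = {
    "East":  {"Book": 56,  "DVD": 42},
    "North": {"Book": 43,  "DVD": 42},
    "South": {"Book": 62,  "DVD": 37},
    "West":  {"Book": 100, "DVD": 90},
}

def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * value / total for col, value in cells.items()}
    return result

def column_totals(table):
    """Sum each column over all rows."""
    totals = {}
    for cells in table.values():
        for col, value in cells.items():
            totals[col] = totals.get(col, 0) + value
    return totals

for region, cells in row_percentages(counts).items():
    print(region, {col: f"{pct:.1f}%" for col, pct in cells.items()})
print(column_totals(counts))
```

Because the subcategories are mutually exclusive and exhaustive, each row of percentages sums to 100% and the column totals sum to the number of observations.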

Example 4.13: Constructing a Cross-
Tabulation
• Sales Transactions database

• Count the number (and compute the percentage) of books and
DVDs ordered by region (easy with PivotTables).

Region   Book   DVD   Total
East       56    42      98
North      43    42      85
South      62    37      99
West      100    90     190
Total     261   211     472

Region   Book    DVD     Total
East     57.1%   42.9%   100.0%
North    50.6%   49.4%   100.0%
South    62.6%   37.4%   100.0%
West     52.6%   47.4%   100.0%

Cross-Tabulation Visualization: Chart
of Regional Sales by Product

Populations and Samples
• Population - all items of interest for a particular
decision or investigation
– all married drivers over 25 years old
– all subscribers to Netflix
• Sample - a subset of the population
– a list of individuals who rented a comedy from
Netflix in the past year
• The purpose of sampling is to obtain sufficient
information to draw a valid inference about a
population.

Understanding Statistical Notation
• We typically label the elements of a data set using
subscripted variables, \( x_1, x_2, \ldots, x_n \), where \( x_i \)
represents the ith observation.
• It is common practice in statistics to use Greek letters,
such as \( \mu \) (mu), \( \sigma \) (sigma), and \( \pi \) (pi), to represent
population measures and italic letters such as \( \bar{x} \)
(called x-bar), s, and p to represent sample statistics.
• N represents the number of items in a population and n
represents the number of observations in a sample.

• Capital Greek sigma, \( \Sigma \), represents summation:

\[ \sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n \]

Measures of Location: Arithmetic
Mean
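• Population mean:

\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]

• Sample mean:

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

• Excel function: = AVERAGE(data range)

• Property of the mean: the deviations from the mean sum to zero,

\[ \sum_{i} (x_i - \bar{x}) = 0 \]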

• Outliers can affect the value of the mean.
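
A short Python sketch with hypothetical data illustrates the mean, the zero-sum property of the deviations, and the effect of a single outlier.

```python
# Hypothetical observations, used only to illustrate the calculations.
data = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]

mean = sum(data) / len(data)                    # sample mean, x-bar
deviation_sum = sum(x - mean for x in data)     # deviations sum to zero

# Adding one extreme value pulls the mean far above the typical values.
mean_with_outlier = sum(data + [1000.0]) / (len(data) + 1)

print(mean, deviation_sum, round(mean_with_outlier, 2))
```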

Example 4.14: Computing Mean Cost
Per Order
Purchase Orders database
• Using formula (4.5):

Using Excel AVERAGE Function

Measures of Location: Median
• The median specifies the middle value when the data are
arranged from least to greatest.
– Half the data are below the median, and half the data are above
it.
– For an odd number of observations, the median is the middle of
the sorted numbers.
– For an even number of observations, the median is the mean of
the two middle numbers.
• We could use the Sort option in Excel to rank-order the data
and then determine the median. The Excel function
=MEDIAN(data range) could also be used.
• The median is meaningful for ratio, interval, and ordinal data.
• The median is not affected by outliers.

Example 4.15: Finding the Median
Cost Per Order
• Sort the data from smallest to largest. Since we
have 94 observations, the median is the average of
the 47th and 48th observations.

Measures of Location: Mode
• The mode is the observation that occurs most
frequently.
• The mode is most useful for data sets that contain a
relatively small number of unique values.
• You can easily identify the mode from a frequency
distribution by identifying the value or group having
the largest frequency or from a histogram by
identifying the highest bar.
• Excel function: =MODE.SNGL(data range)
• For multiple modes: =MODE.MULT(data range)

Example 4.16: Finding the Mode
• Purchase Orders database: A/P Terms
– Mode = 30 months
• Cost per order
– Mode is the group between $0 and $13,000.

Measures of Location: Midrange
• The midrange is the average of the greatest and
least values in the data set.
• Caution must be exercised when using the
midrange because extreme values easily distort
the result. This is because the midrange uses only
two pieces of data, whereas the mean uses all the
data; thus, it is usually a much rougher estimate
than the mean and is often used for only small
sample sizes.
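
The median, mode, and midrange can be sketched with Python's standard `statistics` module. The list of A/P terms below is hypothetical (it only reuses the values 15, 25, 30, and 45 mentioned on the slides); the minimum and maximum cost per order are the figures used in Example 4.17.

```python
import statistics

# Hypothetical A/P terms built from the values 15, 25, 30, and 45.
terms = [30, 15, 30, 45, 25, 30, 15]

median_terms = statistics.median(terms)     # middle value of the sorted data
modes = statistics.multimode(terms)         # list of all most-frequent values
even_median = statistics.median([1, 2, 3, 4])  # mean of the two middle values

# Midrange of cost per order from the minimum and maximum in Example 4.17.
midrange_cost = (68.78 + 127500) / 2

print(median_terms, modes, even_median, round(midrange_cost, 2))
```

`statistics.multimode` plays a role similar to MODE.MULT in that it returns every value tied for the highest frequency.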

Example 4.17: Computing the
Midrange
• Purchase Orders data
• Use the Excel MIN and MAX functions or sort the
data and find them easily.
• Cost per order midrange:

=($68.78 + $127,500)/2 = $63,784.39

Example 4.18: Quoting Computer Repair
Times
The Excel file Computer Repair Times includes 250
repair times for customers.
• What repair time would be
reasonable to quote to a new
customer?
• Median repair time is 2
weeks; mean and mode are
about 15 days.
• Examine the histogram.

Example 4.18 Continued

Measures of Dispersion
• Dispersion refers to the degree of variation in the
data; that is, the numerical spread (or
compactness) of the data.
• Key measures:
– Range
– Interquartile range
– Variance
– Standard deviation

Measures of Dispersion: Range
• The range is the simplest measure of dispersion: the difference
between the maximum value and the minimum
value in the data set.
• In Excel, compute as
=MAX(data range) − MIN(data range).
• The range is affected by outliers, and is often used
only for very small data sets.

Example 4.19: Computing the Range
• Purchase Orders data
• For the cost per order data:
– Maximum = $127,500
– Minimum = $68.78

Range = $127,500 − $68.78 = $127,431.22

Measures of Dispersion: Interquartile
Range
• The interquartile range (IQR), or the midspread, is
the difference between the third and first quartiles, \( Q_3 - Q_1 \).
• This includes only the middle 50% of the data and,
therefore, is not influenced by extreme values.
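
The range and interquartile range can be sketched in Python as below. The first two calculations reuse the cost per order figures from the examples; the nine-value data set is hypothetical. The `method="inclusive"` option of `statistics.quantiles` interpolates between ranks, which is assumed here to correspond to Excel's QUARTILE.INC.

```python
import statistics

# Values reported in the range and interquartile range examples
# for the Cost per order data.
cost_range = 127500 - 68.78
cost_iqr = 27593.75 - 6757.81

# Hypothetical data to show the same calculations from raw observations.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
data_range = max(data) - min(data)
iqr = q3 - q1

print(round(cost_range, 2), round(cost_iqr, 2), data_range, iqr)
```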

Example 4.20: Computing the
Interquartile Range
• Purchase Orders data
• For the Cost per order data:
– Third Quartile = $27,593.75
– First Quartile = $6,757.81

Interquartile Range =
$27,593.75 − $6,757.81 =$20,835.94

Measures of Dispersion: Variance
• The variance is the “average” of the squared
deviations from the mean.
• For a population:

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]

– In Excel: =VAR.P(data range)
• For a sample:

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]

– In Excel: =VAR.S(data range)
• Note the difference in denominators!

Example 4.21: Computing the
Variance
• Purchase Orders Cost per order data

Measures of Dispersion: Standard
Deviation
• The standard deviation is the square root of the variance.
– Note that the dimension of the variance is the square of the
dimension of the observations, whereas the dimension of the
standard deviation is the same as the data. This makes the
standard deviation more practical to use in applications.
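• For a population:

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]

– In Excel: =STDEV.P(data range)
• For a sample:

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]

– In Excel: =STDEV.S(data range)

Example 4.22: Computing the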
Standard Deviation
• Purchase Orders Cost per order data
• Using the results of Example 4.21, take the square
root of the variance:

• Alternatively, use the Excel function =STDEV.S(B2:B95)
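
For comparison with the Excel functions, the sketch below uses Python's `statistics` module to compute population and sample versions of the variance and standard deviation. The eight data values are hypothetical, chosen so that the population results are round numbers.

```python
import statistics

# Hypothetical data: mean 5, sum of squared deviations 32.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

pop_variance = statistics.pvariance(data)    # divides by N   (like VAR.P)
sample_variance = statistics.variance(data)  # divides by n-1 (like VAR.S)
pop_std = statistics.pstdev(data)            # like STDEV.P
sample_std = statistics.stdev(data)          # like STDEV.S

print(pop_variance, round(sample_variance, 4), pop_std, round(sample_std, 4))
```

The sample versions are always slightly larger than the population versions for the same data because of the smaller denominator.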

Standard Deviation as a Measure of
Risk
Excel file: Closing Stock
Prices

Intel (INTC):
Mean = $18.81
Standard deviation = $0.50
General Electric (GE):
Mean = $16.19
Standard deviation = $0.35
INTC is a higher risk
investment than GE.

Chebyshev’s Theorem
• For any data set, the proportion of values that lie
within k standard deviations (k > 1) of the mean is
at least \( 1 - 1/k^2 \).

• Examples:
– For k = 2: at least 3/4, or 75%, of the data lie within two
standard deviations of the mean
– For k = 3: at least 8/9, or 89%, of the data lie within three
standard deviations of the mean

Example 4.23: Applying Chebyshev’s
Theorem
• Purchase Orders database
• A two-standard-deviation interval around the
mean is [-$33,390.34, $85,980.98].
– 89 of 94, or 94.7%, of the observations fall in
this interval.
• A three-standard-deviation interval is
[-$63,233.17, $115,823.81]
– 92 of 94, or 97.9%, fall in this interval.
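
The figures in this example can be checked with a short Python sketch. As an assumption for illustration, the mean and standard deviation are backed out from the two-standard-deviation interval shown above (its midpoint and one quarter of its length), since the slide does not state them directly; the function name is hypothetical.

```python
def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations: 1 - 1/k^2."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

# Two-standard-deviation interval from the example.
lower_2s, upper_2s = -33390.34, 85980.98
mean = (lower_2s + upper_2s) / 2     # midpoint of the interval (derived)
s = (upper_2s - lower_2s) / 4        # interval length is 4 standard deviations

three_sd_interval = (mean - 3 * s, mean + 3 * s)

# Observed proportions reported in the example.
observed = {2: 89 / 94, 3: 92 / 94}
for k, share in observed.items():
    print(k, round(chebyshev_bound(k), 3), round(share, 3),
          share >= chebyshev_bound(k))
print([round(limit, 2) for limit in three_sd_interval])
```

The observed proportions exceed the Chebyshev minimums, as the theorem guarantees for any data set.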

Empirical Rules
• For many data sets encountered in practice:
– Approximately 68% of the observations fall within one
standard deviation of the mean \( (\bar{x} \pm s) \).
– Approximately 95% fall within two standard deviations of
the mean \( (\bar{x} \pm 2s) \).
– Approximately 99.7% fall within three standard deviations
of the mean \( (\bar{x} \pm 3s) \).
• These rules are commonly used to characterize the
natural variation in manufacturing processes and
other business phenomena.

Process Capability Index
• The process capability index is a measure of
how well a manufacturing process can achieve
specifications.
• Using a sample of output, measure the dimension of
interest and compute the total variation using the
third empirical rule.
• Compare results to specifications using:

\[ C_p = \frac{\text{upper specification} - \text{lower specification}}{\text{total variation}} \]
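
A minimal sketch of this comparison in Python follows. It assumes that "total variation" means the six-standard-deviation spread implied by the third empirical rule (three standard deviations on each side of the mean); the function name and the specification limits are hypothetical.

```python
def process_capability(upper_spec, lower_spec, std_dev):
    """Cp = (upper specification - lower specification) / total variation,
    with total variation taken as 6 standard deviations (assumption)."""
    total_variation = 6 * std_dev
    return (upper_spec - lower_spec) / total_variation

# Hypothetical specification limits and sample standard deviation.
cp = process_capability(upper_spec=10.3, lower_spec=9.7, std_dev=0.1)
print(round(cp, 3))
```

A value of about 1 means the natural variation just fits inside the specification limits; larger values indicate more margin.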

Example 4.24: Using Empirical Rules to Measure
the Capability of a Manufacturing Process

Standardized Values
• A standardized value, commonly called a z-score,
provides a relative measure of the distance an
observation is from the mean, which is independent of
the units of measurement.
• The z-score for the ith observation in a data set is
calculated as follows:

\[ z_i = \frac{x_i - \bar{x}}{s} \]

– Excel function: =STANDARDIZE(x, mean, standard_dev)

Properties of z-Scores

• The numerator represents the distance that \( x_i \) is from the
sample mean; a negative value indicates that \( x_i \) lies to
the left of the mean, and a positive value indicates that it
lies to the right of the mean. By dividing by the standard
deviation, s, we scale the distance from the mean to
express it in units of standard deviations. Thus,
– a z-score of 1.0 means that the observation is one standard
deviation to the right of the mean;
– a z-score of -1.5 means that the observation is 1.5 standard
deviations to the left of the mean.
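
As an illustration, the sketch below standardizes a small hypothetical data set in Python using the sample mean and sample standard deviation.

```python
import statistics

# Hypothetical observations.
data = [10.0, 12.0, 14.0, 16.0, 18.0]

mean = statistics.mean(data)
s = statistics.stdev(data)              # sample standard deviation

# z-score: distance from the mean in units of standard deviations.
z_scores = [(x - mean) / s for x in data]

for x, z in zip(data, z_scores):
    side = "right of" if z > 0 else "left of" if z < 0 else "at"
    print(f"{x:5.1f}  z = {z:+.3f}  ({side} the mean)")
```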

Example 4.25: Computing z-Scores
• Purchase Orders Cost per order data

Coefficient of Variation
• The coefficient of variation (CV) provides a relative
measure of dispersion in data relative to the mean:

\[ CV = \frac{\text{standard deviation}}{\text{mean}} \]

– Sometimes expressed as a percentage.
– Provides a relative measure of risk to return.
• Return to risk = 1/CV is often easier to interpret,
especially in financial risk analysis.
– The Sharpe ratio is a related measure in finance.
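
A brief Python sketch applies these definitions to the Intel and General Electric means and standard deviations quoted earlier in the deck; the function and variable names are hypothetical.

```python
def coefficient_of_variation(mean, std_dev):
    """CV = standard deviation / mean."""
    return std_dev / mean

# (mean, standard deviation) of closing prices from the risk slide.
stocks = {"INTC": (18.81, 0.50), "GE": (16.19, 0.35)}

cv = {ticker: coefficient_of_variation(m, s) for ticker, (m, s) in stocks.items()}
return_to_risk = {ticker: 1 / value for ticker, value in cv.items()}

for ticker in stocks:
    print(ticker, f"CV = {cv[ticker]:.2%}", f"return to risk = {return_to_risk[ticker]:.1f}")
```

Because the CV scales the standard deviation by the mean, it allows stocks with different price levels to be compared on a relative basis.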

Example 4.26: Applying the
Coefficient of Variation
• Closing Stock Prices worksheet
• Intel (INTC) is slightly riskier than the other stocks.
• The Index fund has the least risk (lowest CV).

Measures of Shape: Skewness
• Skewness describes the lack of symmetry of data.
– Distributions that tail off to the right are called positively
skewed; those that tail off to the left are said to be
negatively skewed.

Negatively skewed Positively skewed

Coefficient of Skewness
• Coefficient of Skewness (CS):

\[ CS = \frac{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^3}{\sigma^3} \]

• Excel function: =SKEW(data range)

– CS is negative for left-skewed data.
– CS is positive for right-skewed data.
– |CS| > 1 suggests high degree of skewness.
– 0.5 ≤ |CS| ≤ 1 suggests moderate skewness.
– |CS| < 0.5 suggests relative symmetry.

Example 4.27: Measuring Skewness
• Purchase Orders database
• Cost per order data: CS = 1.66 (high positive
skewness)
• A/P terms data: CS = 0.60 (more symmetric)

Measures of Shape: Kurtosis
• Kurtosis refers to the peakedness (i.e., high, narrow) or
flatness (i.e., short, flat-topped) of a histogram.
• The coefficient of kurtosis (CK) measures the degree of
kurtosis of a population:

\[ CK = \frac{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^4}{\sigma^4} \]

– CK < 3 indicates the data are somewhat flat with a wide degree of
dispersion.
– CK > 3 indicates the data are somewhat peaked with less dispersion.

Excel Function for Kurtosis
• Excel computes kurtosis differently; the function
=KURT(data range) computes "excess kurtosis"
for sample data, which is CK − 3. (Excel does
not have a corresponding function for a
population.)
• Thus, to interpret kurtosis values in Excel,
distributions with values less than 0 are more
flat, while those with values greater than 0 are
more peaked.
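
The population forms of the two shape coefficients can be sketched directly from their definitions. This Python illustration uses hypothetical data and hypothetical function names; it does not reproduce Excel's sample-based SKEW and KURT adjustments.

```python
import math

def _moments(data):
    """Population mean and standard deviation of the data."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return n, mu, sigma

def coefficient_of_skewness(data):
    """CS = average cubed deviation divided by sigma cubed."""
    n, mu, sigma = _moments(data)
    return (sum((x - mu) ** 3 for x in data) / n) / sigma ** 3

def coefficient_of_kurtosis(data):
    """CK = average fourth-power deviation divided by sigma to the fourth."""
    n, mu, sigma = _moments(data)
    return (sum((x - mu) ** 4 for x in data) / n) / sigma ** 4

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]                      # hypothetical
right_skewed = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 20.0]   # hypothetical

print(coefficient_of_skewness(symmetric))
print(round(coefficient_of_skewness(right_skewed), 2))
print(round(coefficient_of_kurtosis(symmetric), 2))
print(round(coefficient_of_kurtosis(symmetric) - 3, 2))    # excess kurtosis
```

Subtracting 3 from CK gives the "excess kurtosis" scale on which Excel's values are interpreted, although KURT applies a sample correction that this sketch omits.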

Excel Descriptive Statistics Tool
This tool provides a summary of numerical statistical measures
for sample data.

• Enter Input Range

• Labels (optional)
• Check Summary Statistics box

• The data must be in a single row or column. If the data are in
multiple columns, the tool treats each row or column as a
separate data set.

Example 4.28: Using the Descriptive
Statistics Tool
• Purchase Orders database
Note: Results of the Analysis Toolpak do not change when
changes are made to the data.

Descriptive Statistics for Frequency
Distributions
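• Population mean (f_i is the frequency of the ith distinct value x_i):

\[ \mu = \frac{\sum_{i} f_i x_i}{N} \]

• Sample mean:

\[ \bar{x} = \frac{\sum_{i} f_i x_i}{n} \]

• Population variance:

\[ \sigma^2 = \frac{\sum_{i} f_i (x_i - \mu)^2}{N} \]

• Sample variance:

\[ s^2 = \frac{\sum_{i} f_i (x_i - \bar{x})^2}{n - 1} \]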

Example 4.29: Computing Statistical
Measures from Frequency Distributions
• Computer Repair Times

Grouped Data
• If the data are grouped into k cells in a frequency
distribution, we can use modified versions of the
formulas to estimate the mean and variance by
replacing \( x_i \) with a representative value (such as the
midpoint, M) for all the observations in each cell
group and summing over all groups.
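
The frequency-weighted calculations can be sketched in Python as follows. The values and frequencies are hypothetical, as are the function names; for grouped data, the `values` list would hold the group midpoints, in which case the results are only estimates.

```python
import statistics

def frequency_mean(values, freqs):
    """Mean from a frequency distribution: sum of f*x divided by n."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n

def frequency_sample_variance(values, freqs):
    """Sample variance: sum of f*(x - mean)^2 divided by (n - 1)."""
    n = sum(freqs)
    mean = frequency_mean(values, freqs)
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / (n - 1)

# Hypothetical distinct values (or group midpoints) and their frequencies.
values = [1, 2, 3, 4]
freqs = [2, 3, 4, 1]

# Expanding the distribution back to raw data gives a cross-check.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]

print(frequency_mean(values, freqs), statistics.mean(expanded))
print(round(frequency_sample_variance(values, freqs), 4),
      round(statistics.variance(expanded), 4))
```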

Example 4.30: Computing Descriptive Statistics
for a Grouped Frequency Distribution

Descriptive Statistics for Categorical
Data: The Proportion
• The proportion, denoted by p, is the fraction of
data that have a certain characteristic.
• Proportions are key descriptive statistics for
categorical data, such as defects or errors in
quality control applications or consumer
preferences in market research.

Example 4.31: Computing a
Proportion
• Proportion of orders placed by Spacetime Technologies

Statistics in PivotTables
Value Field Settings include several statistical
measures:

• Average
• Max and Min
• Product
• Standard deviation
• Variance

Example 4.32: Statistical Measures in
PivotTables
• Credit Risk Data
• First, create a PivotTable.
• In the PivotTable Field List, move Job to the Row Labels
field and Checking and Savings to the Values field. Then
change the field settings from “Sum of Checking” and
“Sum of Savings” to the averages.

Measures of Association
• Two variables have a strong statistical relationship
with one another if they appear to move together.
• When two variables appear to be related, you
might suspect a cause-and-effect relationship.
• Sometimes, however, statistical relationships exist
even though a change in one variable is not
caused by a change in the other.

Measures of Association: Covariance
• Covariance is a measure of the linear association between two
variables, X and Y. Like the variance, different formulas are used for
populations and samples.
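• Population covariance:

\[ \operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N} \]

– Excel function: =COVARIANCE.P(array1, array2)
• Sample covariance:

\[ \operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]

– Excel function: =COVARIANCE.S(array1, array2)

• The covariance between X and Y is the average of the product of the
deviations of each pair of observations from their respective means.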

Measures of Association: Correlation
• Correlation is a measure of the linear relationship between two
variables, X and Y, which does not depend on the units of
measurement.
• Correlation is measured by the correlation coefficient, also known as
the Pearson product moment correlation coefficient.
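• Correlation coefficient for a population:

\[ \rho_{xy} = \frac{\operatorname{cov}(X, Y)}{\sigma_x \sigma_y} \]

• Correlation coefficient for a sample:

\[ r_{xy} = \frac{\operatorname{cov}(X, Y)}{s_x s_y} \]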

• The correlation coefficient is scaled between −1 and 1.
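
The relationship between covariance and correlation can be sketched in Python with hypothetical paired data. The function names are illustrative; the `sample` flag switches between the n − 1 and N denominators.

```python
import math

def covariance(x, y, sample=True):
    """Average product of paired deviations from the respective means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return total / (n - 1 if sample else n)

def std_dev(x, sample=True):
    """Standard deviation as the square root of the covariance of x with itself."""
    return math.sqrt(covariance(x, x, sample))

def correlation(x, y, sample=True):
    """Covariance divided by the product of the standard deviations."""
    return covariance(x, y, sample) / (std_dev(x, sample) * std_dev(y, sample))

# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

print(covariance(x, y), covariance(x, y, sample=False))
print(round(correlation(x, y), 4), round(correlation(x, y, sample=False), 4))
```

The two correlation values agree because the denominators cancel, which is why a single correlation function serves both samples and populations.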

Example 4.34: Computing the
Correlation Coefficient
• Colleges and Universities data

Divide the covariance by the product of the standard
deviations in cell F54.

Notes on the CORREL Function
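• When using the CORREL function, it does not
matter if the data represent samples or
populations. In other words,

CORREL(array1, array2)
= COVARIANCE.P(array1, array2) / (STDEV.P(array1) × STDEV.P(array2))
= COVARIANCE.S(array1, array2) / (STDEV.S(array1) × STDEV.S(array2))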

Excel Correlation Tool

• Excel computes the correlation coefficient between
all pairs of variables in the Input Range. Input
Range data must be in contiguous columns.

Example 4.35: Using the Correlation
Tool
• Colleges and Universities data

– Moderate negative correlation between acceptance rate and
graduation rate, indicating that schools with lower acceptance
rates have higher graduation rates.
– Acceptance rate is also negatively correlated with the median SAT
and Top 10% HS, suggesting that schools with lower acceptance
rates have higher student profiles.
– The correlations with Expenditures/Student suggest that schools
with higher student profiles spend more money per student.

Identifying Outliers
• There is no standard definition of what constitutes
an outlier.
• Some typical rules of thumb:
– z-scores greater than +3 or less than −3
– Values more than 3 × IQR to the left of Q1 or right of Q3
(extreme outliers)
– Values between 1.5 × IQR and 3 × IQR to the left of Q1
or right of Q3 (mild outliers)
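
A rough Python sketch of the quartile-based rule of thumb follows. It uses the first and third quartiles reported for the Cost per order data; the function name and the 15,000 test value are hypothetical, while 74,375 and 127,500 appear earlier in the deck.

```python
def classify_outlier(x, q1, q3):
    """Label a value using the 1.5*IQR (mild) and 3*IQR (extreme) rules."""
    iqr = q3 - q1
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "extreme"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "mild"
    return "none"

# Quartiles reported for the Cost per order data.
q1, q3 = 6757.81, 27593.75

for value in (15000.0, 74375.0, 127500.0):
    print(value, classify_outlier(value, q1, q3))
```

Since there is no standard definition of an outlier, labels like these are best treated as prompts for closer inspection rather than as verdicts.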

Example 4.36: Investigating Outliers
• Home Market Value data

• None of the z-scores exceed 3. However, while individual
variables might not exhibit outliers, combinations of them
might.
– The last observation has a high market value ($120,700) but a
relatively small house size (1,581 square feet) and may be an
outlier.

Using Descriptive Statistics to
Analyze Survey Data
• Descriptive statistics tools are extremely valuable for
summarizing and analyzing survey data.
– Frequency distributions and histograms for the ratio variables
– Descriptive statistical measures for the ratio variables using the
Descriptive Statistics tool
– Proportions for various attributes of the categorical variables in
the sample
– PivotTables that break down the averages of ratio variables
– Cross-tabulations
– Z-scores for examination of potential outliers

Statistical Thinking in Business
Decisions
• Statistical Thinking is a philosophy of learning and
action for improvement, based on principles that:
– all work occurs in a system of interconnected processes
– variation exists in all processes
– better performance results from understanding and reducing
variation

• Work gets done in any organization through processes:
systematic ways of doing things that achieve desired
results.
• Understanding business processes provides the context
for determining the effects of variation and the proper
type of action to be taken.

Example 4.37: Applying Statistical
Thinking
• Excel file Surgery Infections
– Is month 12 simply random variation or some
explainable phenomenon?

Example 4.37 Continued
• Three-standard-deviation empirical rule: \( \bar{x} \pm 3s \)

• This suggests that month 12 is statistically different
from the rest of the data.

Variability in Samples
• Different samples from any population will vary.
– They will have different means, standard deviations, and
other statistical measures.
– They will have differences in the shapes of histograms.

• Samples are extremely sensitive to the sample size, that is,
the number of observations included in the samples.

Example 4.38: Variation in Sample
Data
• Samples from Computer Repair Times data
• Population statistics:
• Two samples of size 50:

• Two samples of size 25:

Evans Analytics1e PPT 10
No ratings yet
Evans Analytics1e PPT 10
61 pages
Render Qam13e PPT 02
No ratings yet
Render Qam13e PPT 02
109 pages
CH 1 MIS2023 Spring
No ratings yet
CH 1 MIS2023 Spring
52 pages
Evans Analytics1e PPT 04
No ratings yet
Evans Analytics1e PPT 04
64 pages
Chapter 8 B - Trendlines and Regression Analysis
No ratings yet
Chapter 8 B - Trendlines and Regression Analysis
73 pages
CH 4
100% (1)
CH 4
14 pages
Chapter 12 - Simulation and Risk Analysis
No ratings yet
Chapter 12 - Simulation and Risk Analysis
64 pages
Statistics For Managers Using Microsoft® Excel 5th Edition: Some Important Discrete Probability Distributions
No ratings yet
Statistics For Managers Using Microsoft® Excel 5th Edition: Some Important Discrete Probability Distributions
48 pages
A Multi-Dimensional Data Model
No ratings yet
A Multi-Dimensional Data Model
37 pages
Evans Analytics3e PPT 16 Accessible
No ratings yet
Evans Analytics3e PPT 16 Accessible
60 pages
Evans Analytics2e PPT 12
100% (1)
Evans Analytics2e PPT 12
63 pages
Data Analysis Midterm 5
0% (3)
Data Analysis Midterm 5
25 pages
IBS Hyderabad : Programme: Mba Course Code Course Title Faculty Name Consultation Hours (Day/time)
No ratings yet
IBS Hyderabad : Programme: Mba Course Code Course Title Faculty Name Consultation Hours (Day/time)
10 pages
Tabular and Graphical Methods: Business Statistics: Communicating With Numbers, 4e
No ratings yet
Tabular and Graphical Methods: Business Statistics: Communicating With Numbers, 4e
32 pages
In-Class Practices - Session 1 - Answers
No ratings yet
In-Class Practices - Session 1 - Answers
19 pages
The Influence of Type of Implicit EWOM On Purchase Intention
No ratings yet
The Influence of Type of Implicit EWOM On Purchase Intention
156 pages
2e. Supply Chain Analytics - Presentation
No ratings yet
2e. Supply Chain Analytics - Presentation
39 pages
Frequency Distribution New
No ratings yet
Frequency Distribution New
18 pages
Analytics Compendium
No ratings yet
Analytics Compendium
41 pages
Evans Analytics1e PPT 02
0% (1)
Evans Analytics1e PPT 02
36 pages
Spatial Economterics Using SMLE
100% (1)
Spatial Economterics Using SMLE
29 pages
Chapter 3 - Data Visualization
No ratings yet
Chapter 3 - Data Visualization
36 pages
MT416 - BCommII - Introduction To Business Analytics - MBA - 10039 - 19 - PratyayDas
No ratings yet
MT416 - BCommII - Introduction To Business Analytics - MBA - 10039 - 19 - PratyayDas
44 pages
ATHE 2024 Fee Structures
No ratings yet
ATHE 2024 Fee Structures
14 pages
Sixteenth Edition: Global E-Business and Collaboration
No ratings yet
Sixteenth Edition: Global E-Business and Collaboration
36 pages
RIBA - Worst Book Ever
No ratings yet
RIBA - Worst Book Ever
64 pages
Statistical Infrences Lec 1
No ratings yet
Statistical Infrences Lec 1
35 pages
Lecture 11 - Chapter 11
No ratings yet
Lecture 11 - Chapter 11
35 pages
Bus604 - CH13
No ratings yet
Bus604 - CH13
24 pages
Evans Analytics1e PPT 07
No ratings yet
Evans Analytics1e PPT 07
49 pages
Sharda dss10 PPT 04
No ratings yet
Sharda dss10 PPT 04
38 pages
MCQ QB
No ratings yet
MCQ QB
2 pages
STAT 2601 Final Exam Extra Practice Questions
No ratings yet
STAT 2601 Final Exam Extra Practice Questions
9 pages
Unit 5
No ratings yet
Unit 5
19 pages
Introduction To Datascience (R20DS501)
100% (1)
Introduction To Datascience (R20DS501)
19 pages
Sixteenth Edition: I T Infrastructure and Emerging Technologies
No ratings yet
Sixteenth Edition: I T Infrastructure and Emerging Technologies
18 pages
ASap
No ratings yet
ASap
66 pages
Session 3 Descriptive Analysis I-Frequency Distribution and Cross Tabulation
No ratings yet
Session 3 Descriptive Analysis I-Frequency Distribution and Cross Tabulation
30 pages
Sharda dss10 PPT 01
No ratings yet
Sharda dss10 PPT 01
27 pages
Evans Analytics2e PPT 03
No ratings yet
Evans Analytics2e PPT 03
76 pages
Determining Sample Size For Research Activities
No ratings yet
Determining Sample Size For Research Activities
16 pages
Sharda dss10 PPT 01
No ratings yet
Sharda dss10 PPT 01
41 pages
INFERENTIAL STATISTICS (Project)
No ratings yet
INFERENTIAL STATISTICS (Project)
17 pages
Macroeconomics Chapter 1
No ratings yet
Macroeconomics Chapter 1
24 pages
Notes On Forecasting
100% (1)
Notes On Forecasting
12 pages
CH 2
0% (1)
CH 2
16 pages
Coaching and Mentoring NOS
No ratings yet
Coaching and Mentoring NOS
53 pages
CH 1
No ratings yet
CH 1
17 pages
MCQ SettingtheStandard
No ratings yet
MCQ SettingtheStandard
26 pages
CH 3
No ratings yet
CH 3
25 pages
Royal Arsalan Digital Marketing Plan
No ratings yet
Royal Arsalan Digital Marketing Plan
1 page
Fees Structure 2023 08
No ratings yet
Fees Structure 2023 08
2 pages
Understanding The Confusion Matrix in Machine Learning
No ratings yet
Understanding The Confusion Matrix in Machine Learning
4 pages
National Occupational Standards For Personal Tutoring
No ratings yet
National Occupational Standards For Personal Tutoring
25 pages
Chapter 5
No ratings yet
Chapter 5
49 pages
Maintaining and Monitoring The Online Presence
No ratings yet
Maintaining and Monitoring The Online Presence
6 pages
Chapter 2 Answer Key - CompSec4e
No ratings yet
Chapter 2 Answer Key - CompSec4e
2 pages
Faq Pvip
No ratings yet
Faq Pvip
10 pages
Midterm Review
100% (1)
Midterm Review
10 pages
CSCM Certifcate
No ratings yet
CSCM Certifcate
1 page
Audit Sampling Plan
No ratings yet
Audit Sampling Plan
2 pages
Data Warehousing
No ratings yet
Data Warehousing
24 pages
pp01
No ratings yet
pp01
2 pages
Hasil Spss Log
No ratings yet
Hasil Spss Log
6 pages
Projet - COLD STORAGE
No ratings yet
Projet - COLD STORAGE
21 pages
Types of Analytics: What Is Descriptive Analytics?
No ratings yet
Types of Analytics: What Is Descriptive Analytics?
3 pages
Linear Programming Examples Assignment 1
No ratings yet
Linear Programming Examples Assignment 1
5 pages
AP05 Internal Audits For CABs Rev2
No ratings yet
AP05 Internal Audits For CABs Rev2
9 pages
Regression Analysis 2022
No ratings yet
Regression Analysis 2022
92 pages
SP04 Accreditation Fee Schedule For CB & IB-Rev2
No ratings yet
SP04 Accreditation Fee Schedule For CB & IB-Rev2
5 pages
EC1 CBE Centre Application v13.1
No ratings yet
EC1 CBE Centre Application v13.1
11 pages
AP01 Accreditation Procedure Rev 8
No ratings yet
AP01 Accreditation Procedure Rev 8
11 pages
04-SAS For Statistical Genetics
No ratings yet
04-SAS For Statistical Genetics
19 pages
Gao Et Al 2018 Reading - Programs - Academic
No ratings yet
Gao Et Al 2018 Reading - Programs - Academic
15 pages
Day of The Week Effects
No ratings yet
Day of The Week Effects
13 pages
Binomial Distribution
No ratings yet
Binomial Distribution
14 pages
Assign 343
No ratings yet
Assign 343
12 pages
Lampiran 1. Output SPSS Deskriptif. Variabel Kualitas Pelayanan Keperawatan Frequency Table
No ratings yet
Lampiran 1. Output SPSS Deskriptif. Variabel Kualitas Pelayanan Keperawatan Frequency Table
7 pages
Chapter - 4 Sampling Design: Introduction Important Terminologies in Sampling Steps in Sampling
No ratings yet
Chapter - 4 Sampling Design: Introduction Important Terminologies in Sampling Steps in Sampling
45 pages
Brandt and Kinlay - Estimating Historical Volatility v1.2 June 2005
No ratings yet
Brandt and Kinlay - Estimating Historical Volatility v1.2 June 2005
44 pages
Stats Crib Sheet Exam
No ratings yet
Stats Crib Sheet Exam
2 pages
BB A 3 Econometric Sand Excel
No ratings yet
BB A 3 Econometric Sand Excel
28 pages
Simple Linear Regression Interpretation PDF
No ratings yet
Simple Linear Regression Interpretation PDF
2 pages
Exercise TimeSeries
No ratings yet
Exercise TimeSeries
1 page
JNTUH Usedpapers March 2022: (Common To CSE, IT, CSE (SE), CSE (IOT), CSEN)
No ratings yet
JNTUH Usedpapers March 2022: (Common To CSE, IT, CSE (SE), CSE (IOT), CSEN)
2 pages
Lesson 5
No ratings yet
Lesson 5
5 pages
Formule
No ratings yet
Formule
3 pages
Frequency Table: Jenis Kelamin
No ratings yet
Frequency Table: Jenis Kelamin
4 pages