
Chapter 3: Statistics

This document summarizes numerical measures used in descriptive statistics: measures of location, variability, distribution shape, relative location, and association between two variables. Measures of location include the mean, median, trimmed mean, mode, weighted mean, geometric mean, percentiles, and quartiles. Measures of variability include the range, interquartile range, variance, standard deviation, and coefficient of variation. Distribution shape is measured by skewness and relative location by z-scores, while percentiles describe how the data are spread from the smallest value to the largest value.

Uploaded by

Sophia Athena

CHAPTER 3
DESCRIPTIVE STATISTICS: NUMERICAL MEASURES

Numerical Measures
• If the measures are computed for data from a sample, they are called sample statistics.
• If the measures are computed for data from a population, they are called population parameters.
• A sample statistic is referred to as the point estimator of the corresponding population parameter.

Measures of Location
• Mean
o The mean provides a measure of central location.
o The mean of a data set is the average of all the data values.
o The sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.
o Sample Mean: $\bar{x} = \frac{\sum x_i}{n}$
o Population Mean: $\mu = \frac{\sum x_i}{N}$
where: $\sum x_i$ = sum of the values of the N observations
N = number of observations in the population
• Median
o The median of a data set is the value in the middle when the data items are arranged in ascending order.
o Whenever a data set has extreme values, the median is the preferred measure of central location.
o The median is the measure of location most often reported for annual income and property value data.
o A few extremely large incomes or property values can inflate the mean.
o For an odd number of observations, arrange the data in ascending order; the middle value is the median.
o For an even number of observations, arrange the data in ascending order; the average of the middle two values is the median.
• Trimmed Mean
o Another measure sometimes used when extreme values are present.
o It is obtained by deleting a percentage of the smallest and largest values from a data set and then computing the mean of the remaining values.
o For example, the 5% trimmed mean is obtained by removing the smallest 5% and the largest 5% of the data values and then computing the mean of the remaining values.
• Mode
o The mode of a data set is the value that occurs with greatest frequency.
o The greatest frequency can occur at two or more different values.
o If the data have exactly two modes, the data are bimodal.
o If the data have more than two modes, the data are multimodal.
• Weighted Mean
o In some instances the mean is computed by giving each observation a weight that reflects its relative importance.
o The choice of weights depends on the application.
o The weights might be the number of credit hours earned for each grade, as in GPA.
o In other weighted mean computations, quantities such as pounds, dollars, or volume are frequently used.
o $\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$
where: $x_i$ = value of observation i
$w_i$ = weight for observation i
Numerator: sum of the weighted data values
Denominator: sum of the weights
• Geometric Mean
o The geometric mean is calculated by finding the nth root of the product of n values.
o It is often used in analyzing growth rates in financial data (where using the arithmetic mean will provide misleading results).
o It should be applied anytime you want to determine the mean rate of change over several successive periods (be it years, quarters, weeks, . . .).
o Other common applications include: changes in populations of species, crop yields, pollution levels, and birth and death rates.
o $\bar{x}_g = \sqrt[n]{(x_1)(x_2)\cdots(x_n)} = [(x_1)(x_2)\cdots(x_n)]^{1/n}$
• Percentiles
o A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
o Admission test scores for colleges and universities are frequently reported in terms of percentiles.
o The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
o Arrange the data in ascending order.
o Compute $L_p$, the location of the pth percentile: $L_p = \frac{p}{100}(n + 1)$
• Quartiles
o Quartiles are specific percentiles.
o First Quartile = 25th Percentile
o Second Quartile = 50th Percentile = Median
o Third Quartile = 75th Percentile
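The measures of location in these notes (mean, median, mode, trimmed mean, weighted mean, geometric mean) can be sketched in a few lines of Python. This is only an illustration: the data values, the helper names `trimmed_mean`, `weighted_mean`, and `geometric_mean`, and the choice to round the trimmed count down are assumptions, not part of the notes.

```python
import math
import statistics


def trimmed_mean(values, percent):
    """Mean after dropping the smallest and largest `percent`% of the values."""
    ordered = sorted(values)
    # Number of values removed from each end; rounding down is an assumption.
    k = int(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def weighted_mean(values, weights):
    """Sum of the weighted data values divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def geometric_mean(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


# Hypothetical sample with one extreme value (132).
data = [2, 4, 4, 5, 7, 9, 10, 12, 15, 132]

print(statistics.mean(data))       # arithmetic mean, pulled upward by 132
print(statistics.median(data))     # average of the two middle values
print(statistics.multimode(data))  # every value with the greatest frequency
print(trimmed_mean(data, 10))      # drops 2 and 132 before averaging

# Hypothetical grade points weighted by credit hours, as in a GPA.
print(weighted_mean([4.0, 3.0, 2.0], [3, 4, 3]))

# Hypothetical growth factors for three successive periods.
print(geometric_mean([1.10, 0.95, 1.20]))
```

With this sample the mean is far above the median and the trimmed mean, which is the effect of an extreme value that the notes describe.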
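The percentile location rule L_p = (p/100)(n + 1) can be turned into a small function. The notes give only the location formula; interpolating linearly between the two neighbouring ordered values when L_p is not a whole number, and clamping to the smallest or largest value at the ends, are assumptions made for this sketch, as are the function name `percentile` and the data.

```python
def percentile(values, p):
    """Value at location L_p = (p/100)(n + 1) in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    location = (p / 100) * (n + 1)  # 1-based position in the ordered data
    lower = int(location)           # whole part of the position
    fraction = location - lower     # fractional part of the position
    if lower < 1:                   # position falls before the first value
        return ordered[0]
    if lower >= n:                  # position falls at or after the last value
        return ordered[-1]
    # Linear interpolation between neighbours (an assumption of this sketch).
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


# Hypothetical sample of n = 11 values, so the quartile locations are 3, 6, 9.
data = [3, 5, 7, 8, 12, 13, 14, 18, 21, 25, 30]

q1 = percentile(data, 25)      # first quartile
median = percentile(data, 50)  # second quartile
q3 = percentile(data, 75)      # third quartile
print(q1, median, q3)
```

Python's `statistics.quantiles(data, n=4)` with its default `'exclusive'` method uses the same n + 1 positioning, so it can serve as a cross-check.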
Measures of Variability
• It is often desirable to consider measures of variability (dispersion), as well as measures of location.
• For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.
• Range
o The range of a data set is the difference between the largest and smallest data values.
o Range = Largest value - Smallest value
o It is the simplest measure of variability.
o It is very sensitive to the smallest and largest data values.
• Interquartile Range
o The interquartile range of a data set is the difference between the third quartile and the first quartile.
o It is the range for the middle 50% of the data.
o It overcomes the sensitivity to extreme data values.
• Variance
o The variance is a measure of variability that utilizes all the data.
o It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).
o The variance is useful in comparing the variability of two or more variables.
o The variance is the average of the squared differences between each data value and the mean.
o Sample Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
o Population Variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
• Standard Deviation
o The standard deviation of a data set is the positive square root of the variance.
o It is measured in the same units as the data, making it more easily interpreted than the variance.
o Sample SD: $s = \sqrt{s^2}$
o Population SD: $\sigma = \sqrt{\sigma^2}$
• Coefficient of Variation
o The coefficient of variation indicates how large the standard deviation is in relation to the mean.
o Sample CoV: $\left[\frac{s}{\bar{x}} \times 100\right]\%$
o Population CoV: $\left[\frac{\sigma}{\mu} \times 100\right]\%$

Measures of Distribution Shape, Relative Location, and Detecting Outliers
• Distribution Shape
o Skewness
▪ An important measure of the shape of a distribution is called skewness.
▪ The formula for the skewness of sample data is
Skewness = $\frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$
▪ Skewness can be easily computed using statistical software. (Chapter 2)
• z-Scores
o The z-score is often called the standardized value.
o It denotes the number of standard deviations a data value $x_i$ is from the mean.
o $z_i = \frac{x_i - \bar{x}}{s}$
o Excel's STANDARDIZE function can be used to compute the z-score.
o An observation's z-score is a measure of the relative location of the observation in a data set.
o A data value less than the sample mean will have a z-score less than zero.
o A data value greater than the sample mean will have a z-score greater than zero.
o A data value equal to the sample mean will have a z-score of zero.
• Chebyshev's Theorem
o At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
o Chebyshev's theorem requires z > 1, but z need not be an integer.
o At least 75% of the data values must be within z = 2 standard deviations of the mean.
o At least 89% of the data values must be within z = 3 standard deviations of the mean.
o At least 94% of the data values must be within z = 4 standard deviations of the mean.
• Empirical Rule
o When the data are believed to approximate a bell-shaped distribution:
▪ The empirical rule can be used to determine the percentage of data values that must be within a specified number of standard deviations of the mean.
▪ The empirical rule is based on the normal distribution, which is covered in Chapter 6.
o For data having a bell-shaped distribution:
▪ Approximately 68% of the data values will be within +/- 1 standard deviation of the mean.
▪ Approximately 95% of the data values will be within +/- 2 standard deviations of the mean.
▪ Almost all of the data values will be within +/- 3 standard deviations of the mean.
• Detecting Outliers
o An outlier is an unusually small or unusually large value in a data set.
o A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
o It might be:
▪ an incorrectly recorded data value
▪ a data value that was incorrectly included in the data set
▪ a correctly recorded unusual data value that belongs in the data set

Five-Number Summaries and Box Plots
• Summary statistics and easy-to-draw graphs can be used to quickly summarize large quantities of data.
• Five-Number Summary
o Smallest Value
o First Quartile
o Median
o Third Quartile
o Largest Value
• Box Plot
o A box plot is a graphical summary of data that is based on a five-number summary.
o A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.
o Box plots provide another way to identify outliers.
o Limits are located (not drawn) using the interquartile range (IQR).
o Data outside these limits are considered outliers.
o The location of each outlier is marked with a symbol.

Measures of Association between Two Variables
• Often a manager or decision maker is interested in the relationship between two variables.
• Covariance
o The covariance is a measure of the linear association between two variables.
o Positive values indicate a positive relationship.
o Negative values indicate a negative relationship.
o Sample Covariance: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
o Population Covariance: $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
• Correlation Coefficient
o Correlation is a measure of linear association and not necessarily causation.
o Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.
o Sample CC: $r_{xy} = \frac{s_{xy}}{s_x s_y}$
o Population CC: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
o The coefficient can take on values between -1 and +1.
o Values near -1 indicate a strong negative linear relationship.
o Values near +1 indicate a strong positive linear relationship.
o The closer the correlation is to zero, the weaker the relationship.

Data Dashboards: Adding Numerical Measures to Improve Effectiveness
• Data dashboards are not limited to graphical displays.
• The addition of numerical measures, such as the mean and standard deviation of KPIs, to a data dashboard is often critical.
• Dashboards are often interactive.
• Drilling down refers to functionality in interactive dashboards that allows the user to access information and analyses at an increasingly detailed level.
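The variability and relative-location formulas in these notes (range, sample variance, standard deviation, coefficient of variation, skewness, z-scores, Chebyshev's bound, and the z-score outlier check) can be tried out with a short Python sketch. The sample values and all function names below are illustrative assumptions, not something the notes prescribe.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


def z_scores(values):
    """Number of sample standard deviations each value lies from the mean."""
    mean = sum(values) / len(values)
    s = math.sqrt(sample_variance(values))
    return [(x - mean) / s for x in values]


def sample_skewness(values):
    """n / ((n - 1)(n - 2)) times the sum of cubed z-scores."""
    n = len(values)
    return n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in z_scores(values))


def chebyshev_bound(z):
    """Minimum proportion of items within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


# Hypothetical sample of five observations.
data = [46, 54, 42, 46, 32]
mean = sum(data) / len(data)
s = math.sqrt(sample_variance(data))

print(max(data) - min(data))      # range
print(sample_variance(data), s)   # sample variance and standard deviation
print(s / mean * 100)             # coefficient of variation, in percent
print(z_scores(data))
print(sample_skewness(data))      # negative: the longer tail is on the left

# Values whose z-score is below -3 or above +3 are flagged as possible outliers.
outliers = [x for x, z in zip(data, z_scores(data)) if abs(z) > 3]
print(outliers)

print([chebyshev_bound(z) for z in (2, 3, 4)])  # the 75%, 89%, 94% bounds
```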
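A five-number summary and box-plot limits can be computed along the following lines. The notes say only that the limits are located using the IQR; the multiplier of 1.5 used here is the common convention and should be read as an assumption, as should the data and the function names.

```python
import statistics


def five_number_summary(values):
    """Smallest value, Q1, median, Q3, largest value."""
    ordered = sorted(values)
    # The default 'exclusive' method positions quartiles using n + 1,
    # in line with the location rule L_p = (p/100)(n + 1).
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, median, q3, ordered[-1]


def box_plot_limits(values, k=1.5):
    """Lower and upper limits at k * IQR beyond Q1 and Q3 (k = 1.5 assumed)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical sample with one unusually large value.
data = [3, 5, 7, 8, 12, 13, 14, 18, 21, 25, 60]

summary = five_number_summary(data)
lower, upper = box_plot_limits(data)
outliers = [x for x in data if x < lower or x > upper]

print(summary)
print(lower, upper)
print(outliers)
```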
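The sample covariance and sample correlation coefficient follow directly from their definitions. The sketch below is a minimal Python version with made-up paired data; the function names are assumptions chosen for readability.

```python
import math


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    # The covariance of a variable with itself is its sample variance.
    s_x = math.sqrt(sample_covariance(xs, xs))
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(sample_covariance(x, y))   # positive: y tends to rise with x
print(sample_correlation(x, y))  # between -1 and +1
```

Because the correlation coefficient divides out the units of both variables, it is easier to compare across data sets than the covariance.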
