Descriptive Stat Pt.2

The document provides an overview of various statistical data visualization methods, including stem and leaf plots, frequency tables, box and whisker plots, and measures of location and variability. It explains how to construct these plots and interpret their features, and discusses concepts such as interquartile range, outliers, and trimmed means. It also includes data analysis exercises on topics such as energy consumption and movie running times.

Uploaded by

Arwan Chua

STEM AND LEAF PLOT:

1. A stem and leaf plot is used to organize data as they are collected.
2. It shows the leading digits of each number (thousands, hundreds or
tens) as the stem and the last digit (ones) as the leaf.
3. It usually uses whole numbers; anything with a decimal point is
rounded to the nearest whole number. Typical data include test
results, speeds, heights and weights.
4. It looks like a bar graph turned on its side.
5. It shows how the data are spread: the highest number, the lowest
number, the most common number and any outliers (numbers that lie
outside the main group of numbers).
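The construction described above can be sketched in a few lines of Python. This is a minimal illustration, assuming non-negative whole numbers; the function names `stem_and_leaf` and `render` and the sample scores are hypothetical, not part of the original slides.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} for non-negative whole numbers, leaves sorted."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # stem = leading digits, leaf = ones digit
        plot[stem].append(leaf)
    return dict(plot)

def render(plot):
    """Format the plot as text, one row per stem, including empty stems."""
    lines = []
    for stem in range(min(plot), max(plot) + 1):
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return "\n".join(lines)

# Hypothetical test scores, for illustration only
scores = [78, 92, 85, 77, 91, 68, 85, 99, 54]
print(render(stem_and_leaf(scores)))
```

Rows for empty stems are printed on purpose, so that gaps in the data stay visible in the same way they would in a bar graph.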
FREQUENCY TABLE:
A frequency table lists a set of values and how often each one
appears. Frequency is the number of times a specific data value occurs
in your dataset. These tables help you understand which data values
are common and which are rare.
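As a small sketch of how such a table can be built, the standard-library `Counter` counts how often each value occurs. The helper name `frequency_table` and the data are hypothetical and serve only as an illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency) pairs sorted by value."""
    return sorted(Counter(values).items())

# Hypothetical quiz scores, for illustration only
data = [3, 5, 3, 4, 5, 3, 2]
print("value  frequency")
for value, freq in frequency_table(data):
    print(f"{value:>5}  {freq:>9}")
```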
BOX AND WHISKERS PLOT:
The box and whisker plot, sometimes simply called the box plot,
is a type of graph that helps visualize the five-number summary:
A. Minimum Value
B. Maximum Value
C. Q2 (Median of the data set)
D. Q1 (Median of the 1st Half)
E. Q3 (Median of the 2nd Half)
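The five values can be computed as in the sketch below. It assumes that, when the number of observations is odd, the overall median is left out of both halves before Q1 and Q3 are taken; other conventions exist and give slightly different quartiles. The function name and data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two observations.

    Assumption: for an odd number of observations the overall median is
    excluded from both halves.
    """
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    half = len(xs) // 2
    lower = xs[:half]      # 1st half of the ordered data
    upper = xs[-half:]     # 2nd half of the ordered data
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Hypothetical data, for illustration only
print(five_number_summary([7, 1, 3, 9, 5, 11, 13, 15]))
# (1, 4.0, 8.0, 12.0, 15)
```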
BOX AND WHISKERS PLOT:
A short box means the data points consistently hover around the center
values. A taller box implies more variable data.
INTERQUARTILE RANGE
The interquartile range (IQR) is a measure of statistical
dispersion, which is the spread of the data. It is the difference
between the third and first quartiles: IQR = Q3 − Q1.
OUTLIERS:
An outlier is a data point that is extremely high or extremely low
relative to the nearest data point and to the rest of the values in
the data graph or dataset you're working with.

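TEST FOR OUTLIERS:

A data point is an outlier if it falls outside the
RANGE: [Q1 − 1.5 IQR, Q3 + 1.5 IQR]
NOTE:
If the max or min data point is an outlier, then the whisker ends
instead at the largest or smallest data point that is not an outlier
(the previous or next value in the ordered data, respectively).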
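The test and the note can be sketched as follows. The quartile rule (overall median excluded from both halves when the count is odd), the function names, and the sample are assumptions made for illustration.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the 1st and 2nd halves."""
    xs = sorted(values)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[-half:])

def outlier_fences(values, k=1.5):
    """Return the range [Q1 - k*IQR, Q3 + k*IQR] as a (low, high) pair."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(values, k=1.5):
    """Separate the points inside the fences from the outliers."""
    low, high = outlier_fences(values, k)
    inside = [x for x in values if low <= x <= high]
    outliers = [x for x in values if x < low or x > high]
    return inside, outliers

# Hypothetical sample, for illustration only
sample = [12, 14, 15, 15, 16, 18, 19, 21, 40]
inside, outliers = split_outliers(sample)
print(outlier_fences(sample))    # (6.25, 28.25)
print(outliers)                  # [40]
print(min(inside), max(inside))  # whisker ends: 12 21
```

Here 40 lies above the upper fence, so the upper whisker stops at 21, the largest value that is not an outlier. A wider multiplier such as `k=3` is a common choice for flagging extreme outliers, though the slides do not state which multiplier they intend for that case.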
CANDLE STICKS:
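Candlestick charts are similar to box plots. Both show maximum
and minimum values. The difference between them is in the
information conveyed by the box in between the max and min values:
the box of a box plot spans Q1 to Q3, while the box of a candlestick
spans the opening and closing values of the period.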
PLATE NUMBER 1: (PROBLEMS 2 TO 6)
1. ANALYZE THE FOLLOWING AND CONSTRUCT THE
A. STEM AND LEAF PLOT FOR PROBLEMS (2, 3, 4, 5)
B. BOX AND WHISKERS PLOT FOR ALL PROBLEMS.
2. Data collected from 50 students by RAUL.

78, 77, 92, 80, 93, 95, 84, 93, 77, 78, 89, 96,
81, 92, 89, 77, 77, 86, 78, 77, 94, 96, 94, 95,
94, 96, 94, 76, 79, 89, 82, 79, 85, 84, 83, 85,
78, 84, 81, 90, 82, 78, 88, 87, 87, 82, 95, 96,
79, 88
3. Create a box and whisker plot for the following data.

18, 34, 76, 29, 15, 41, 46, 25, 54, 38, 20, 32, 43, 22
4. Do running times of American movies differ somehow from
running times of French movies? The author investigated this
question by randomly selecting 25 recent movies of each type,
resulting in the following running times:

Am: 94 90 95 93 128 95 125 91 104 116 162 102 90
110 92 113 116 90 97 103 95 120 109 91 138

Fr: 123 116 90 158 122 119 125 90 96 94 137 102 105
106 95 125 122 103 96 111 81 113 128 93 92

Construct a comparative stem-and-leaf display by listing stems
in the middle of your paper and then placing the Am leaves out
to the left and the Fr leaves out to the right.
5. An environmental research company is analyzing the energy consumption of
two residential neighborhoods, Greenfield and Sunnyvale, over a year. They
recorded the monthly energy usage in kilowatt-hours (kWh) for 12 months.
The data is as follows:
Construct a box plot and comment on its features.
6. A sample of 26 offshore oil workers took
part in a simulated escape exercise,
resulting in the accompanying data on
time (sec) to complete the escape
(“Oxygen Consumption and Ventilation
During Escape from an Offshore Platform,”
Ergonomics, 1997: 281–292):
a. Determine the value of the IQR.
b. Are there any outliers in the sample?
Any extreme outliers?
c. Construct a boxplot and comment on its
features.
MEASURES OF LOCATION:
For a given set of numbers x1, x2, x3, …, xn, the most familiar and
useful measure of the center is the mean, or arithmetic average, of the
set.
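In symbols, and assuming the usual notation x̄ for the sample mean (the symbol itself is not given in the text above), the definition reads:

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

For example, the three values 2, 4 and 9 have mean (2 + 4 + 9)/3 = 5.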
6. Caustic stress corrosion cracking of iron and steel has been
studied because of failures around rivets in steel boilers and
failures of steam rotors. Consider the accompanying
observations on x = crack length in µm as a result of constant
load stress corrosion tests on smooth bar tensile specimens for a
fixed length of time. Determine the mean of the sample.
MEASURES OF LOCATION:
The word median is synonymous with “middle,” and the sample
median is indeed the middle value once the observations are ordered
from smallest to largest. We will use the symbol x̃ to represent the
sample median.
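A minimal sketch of the computation follows. It assumes the common convention that, for an even number of observations, the median is the average of the two middle values; the function name is hypothetical.

```python
def sample_median(values):
    """Middle value of the ordered observations (non-empty input).

    Assumption: with an even count, the two middle values are averaged.
    """
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # single middle value
    return (xs[mid - 1] + xs[mid]) / 2    # average of the two middle values

print(sample_median([5, 1, 9]))       # 5
print(sample_median([5, 1, 9, 3]))    # 4.0
```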
QUARTILES
Quartiles are the three values that divide the ordered data set
into four equal parts.

PERCENTILES
Percentiles are a type of quantile, obtained by dividing the
ordered data into 100 groups.
25TH PERCENTILE – FIRST QUARTILE
50TH PERCENTILE – SECOND QUARTILE
75TH PERCENTILE – THIRD QUARTILE
PERCENTILES
PERCENTILE LOCATOR:

A rank at the nth percentile means you are higher than n% of the
sample.
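A percentile can be located as in the sketch below. It assumes the locator L = (k/100)(n + 1) with linear interpolation between neighbouring values, which is one common textbook convention and may not be the one intended here; other conventions give slightly different answers. The function name and data are hypothetical.

```python
def percentile(values, k):
    """k-th percentile using the assumed locator L = (k / 100) * (n + 1)."""
    xs = sorted(values)
    n = len(xs)
    loc = k / 100 * (n + 1)   # 1-based position in the ordered data
    if loc <= 1:
        return xs[0]          # position falls at or before the first value
    if loc >= n:
        return xs[-1]         # position falls at or after the last value
    lower = int(loc)          # whole part of the position
    frac = loc - lower        # fractional part of the position
    # Interpolate between the lower-th and (lower + 1)-th ordered values
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])

# Hypothetical data, for illustration only
data = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(percentile(data, 50))   # 50.0
print(percentile(data, 25))   # 25.0
```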
7. The production of Bidri is a traditional craft of India. Bidri wares
(bowls, vessels, and so on) are cast from an alloy containing
primarily zinc along with some copper. Consider the following
observations on copper content (%) for a sample of Bidri
artifacts in London’s Victoria and Albert Museum (“Enigmas of
Bidri,” Surface Engr., 2005: 333–339), listed in increasing order:

2.0 2.4 2.5 2.6 2.6 2.7 2.7 2.8 3.0 3.1 3.2 3.3 3.3
3.4 3.4 3.6 3.6 3.6 3.6 3.7 4.4 4.6 4.7 4.8 5.3 10.1

Determine the 44th percentile.
TRIMMED MEAN
A trimmed mean (similar to an adjusted mean) is a method of
averaging that removes a small designated percentage of the
largest and smallest values before calculating the mean. After
removing the specified outlier observations, the trimmed mean is
found using a standard arithmetic averaging formula. The use of a
trimmed mean helps eliminate the influence of outliers or data
points on the tails that may unfairly affect the traditional or arithmetic
mean.
A trimmed mean with a moderate trimming percentage, somewhere
between 5% and 25%, will yield a measure of center that is
neither as sensitive to outliers as the mean nor as insensitive as the
median.
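The procedure can be sketched as follows. The sketch assumes that the trimming percentage is removed from each end of the ordered data and that, when the count to remove is not a whole number, it is rounded down; the function name and data are hypothetical.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from EACH end.

    Assumption: a fractional number of observations to drop is rounded down.
    """
    xs = sorted(values)
    n = len(xs)
    cut = int(n * proportion + 1e-9)   # count removed per end (guards float error)
    if 2 * cut >= n:
        raise ValueError("trimming proportion removes every observation")
    kept = xs[cut:n - cut]
    return sum(kept) / len(kept)

# Hypothetical data with one large value, for illustration only
data = [2, 4, 5, 6, 7, 8, 9, 50]
print(sum(data) / len(data))       # ordinary mean: 11.375
print(trimmed_mean(data, 0.125))   # 12.5% trimmed mean: 6.5
```

With eight observations, 12.5% trimming removes one value from each end, so the single large value 50 no longer pulls the average upward.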
8. The minimum injection pressure (psi) for injection molding
specimens of high amylose corn was determined for eight
different specimens (higher pressure corresponds to greater
processing difficulty), resulting in the following observations
(from “Thermoplastic Starch Blends with a Polyethylene-Co-Vinyl
Alcohol: Processability and Physical Properties,” Polymer Engr.
and Science, 1994: 17–23):

15.0 13.0 18.0 14.5 12.0 11.0 8.9 8.0

a. Determine the values of the sample mean, sample median,
and 12.5% trimmed mean, and compare these values.
b. By how much could the smallest sample observation,
currently 8.0, be increased without affecting the value of the
sample median?
8. Exposure to microbial products, especially endotoxin, may have
an impact on vulnerability to allergic diseases. The article “Dust
Sampling Methods for Endotoxin—An Essential, But
Underestimated Issue” (Indoor Air, 2006: 20–27) considered
various issues associated with determining endotoxin
concentration. The following data on concentration (EU/mg) in
settled dust for one sample of urban homes and another of farm
homes was kindly supplied by the authors of the cited article.

U: 6.0 5.0 11.0 33.0 4.0 5.0 80.0 18.0 35.0 17.0 23.0
F: 4.0 14.0 11.0 9.0 9.0 8.0 4.0 20.0 5.0 8.9 21.0 9.2 3.0 2.0 0.3

a. Determine the sample mean for each sample. How do they
compare?
b. Determine the sample median for each sample. How do they
compare? Why is the median for the urban sample so different
from the mean for that sample?
MEASURES OF VARIABILITY
The simplest measure of variability in a sample is the
range, which is the difference between the largest and smallest
sample values.
The deviations from the mean are obtained by subtracting the
sample mean x̄ from each of the n sample observations.
• A deviation will be positive if the observation is larger
than the mean (to the right of the mean on the measurement
axis).
• It will be negative if the observation is smaller than the mean.
• If all the deviations are small in magnitude, then all x’s are
close to the mean and there is little variability.
• Alternatively, if some of the deviations are large in
magnitude, then some x’s lie far from the mean, suggesting a
greater amount of variability.
MEASURES OF VARIABILITY
VARIANCE AND STANDARD DEVIATION

Variance - a statistical measurement of the spread
between numbers in the data set. It measures how far
each number in the set is from the mean. The standard
deviation is the square root of the variance.
DEGREES OF FREEDOM (df)
It is the number of independent values that are free to
vary. For the sample variance, df = n − 1, because the n
deviations from the mean must sum to zero.
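Putting the deviations, the variance, and the degrees of freedom together gives the sketch below. It assumes the usual sample definitions, with the sum of squared deviations divided by n − 1; the function names and data are hypothetical.

```python
from math import sqrt

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by df = n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # these always sum to zero
    return sum(d * d for d in deviations) / (n - 1)

def sample_std(values):
    """Sample standard deviation: square root of the sample variance."""
    return sqrt(sample_variance(values))

# Hypothetical data, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))   # 32 / 7, about 4.571
print(sample_std(data))        # about 2.138
```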
9. Traumatic knee dislocation often requires surgery to repair
ruptured ligaments. One measure of recovery is range of motion
(measured as the angle formed when, starting with the leg
straight, the knee is bent as far as possible). The given data on
postsurgical range of motion appeared in the article
“Reconstruction of the Anterior and Posterior Cruciate Ligaments
After Knee Dislocation” (Amer. J. Sports Med., 1999: 189–197):

154 142 137 133 122 126 135 135 108 120 127 134 122

Determine the variance and standard deviation.
10. Blood cocaine concentration (mg/L) was determined both for a sample of
individuals who had died from cocaine induced excited delirium (ED) and for
a sample of those who had died from a cocaine overdose without excited
delirium; survival time for people in both groups was at most 6 hours. The
accompanying data was read from a comparative boxplot in the article “Fatal
Excited Delirium Following Cocaine Use” (J. of Forensic Sciences, 1997: 25–31).
A) Calculate and compare the values of mean, median and s for the two
types of blood cocaine concentration.
B) Construct a comparative boxplot and comment on interesting features.
