Chapter 1-Overview & Descriptive Statistics - Classroom Upload

The document provides an overview of data types, including univariate, bivariate, and multivariate data, as well as classifications of variables into quantitative and categorical types. It discusses methods of data collection, including census and sampling techniques, and introduces concepts such as enumerative and analytic studies. Additionally, it covers statistical visualization techniques like stem-and-leaf displays, histograms, and boxplots, along with measures of central tendency and variability.

Uploaded by

snehahussain6


Chapter 1

Overview &
Descriptive Statistics
Dr. Harpreet Kaur
2022-23
Populations, Samples, and Processes
Data is a collection of facts.
Univariate data records the value of only one variable for each
observation.
Multivariate data records the value of multiple variables for each
observation.
Bivariate data is a special case of multivariate data; there are two
variables quantified.
A variable is any characteristic whose value may change from one object
to another in the population.
Variables can be quantitative or categorical.
Categorical variables take values from a finite number of possibilities.
Quantitative variables, however, take numerical values.
Data can be classified into nominal, ordinal, interval, and ratio types. The first
two subdivide the categorical data type and the last two subdivide the
quantitative data type.

A population is a group of interest.

If we collect data for the entire population, we
have conducted a census.
Usually, though, we collect data for a subset of a
population, called a sample. Our objective is to
use the data in the sample to reach conclusions
about the population as a whole.
Enumerative Versus Analytic Studies
In enumerative studies, the population is a fixed, finite, tangible group that
presently exists.
In analytic studies the population may not presently exist.
Statistical conclusions depend crucially on how the data are collected, whether
through surveys or observational studies. If data are collected poorly, the results
of the analysis cannot be trusted.
Data can be collected using a simple random sample, in which every member
of the population of interest has the same chance of being selected for the
sample. Alternatively, stratified sampling can be employed: the population is
divided into observable strata, and a simple random sample is then selected
from the individuals in each stratum. A third approach is convenience
sampling, which selects individuals in a way that is not completely random.
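The first two sampling schemes can be sketched in a few lines of Python. This is only an illustration: the population of 60 student IDs, the three year-group strata, and the function names are hypothetical and do not come from the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of size n_per_stratum from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 60 student IDs split into three year groups.
population = list(range(1, 61))
strata = {
    "first_year": population[:20],
    "second_year": population[20:40],
    "third_year": population[40:],
}

srs = simple_random_sample(population, 6, seed=1)
strat = stratified_sample(strata, 2, seed=1)
print(srs)
print(strat)
```

A simple random sample may, by chance, miss a whole year group; the stratified sample cannot, because each stratum contributes a fixed number of individuals.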
An enumerative study focuses on obtaining information about, and taking
action on, specific items contained in a frame, which is a well-defined group of
physical items (for instance, sampling from a batch of product to answer the
question: should the batch be accepted or rejected?). Statistical inference made
from the data is applied to the remaining units in the frame. The goal is not to
characterise the process that produces the frame but to describe and act on
the frame.
An analytic study focuses on obtaining information about the system or
process under study and taking action on the cause system to improve
performance in the future (for instance, sampling from a batch of product to
answer the questions: has the process or system changed as a result of our
actions? Is the process consistently producing acceptable product?).
Statistical inference made from the data is applied to the process. The goal is to
characterise the process that produces the frame, not to describe and act on
the frame.
A distribution describes what values a variable takes and
how frequently it takes them.
Visualization is an important first step in a statistical
project, as it reveals patterns that are difficult to describe
using numbers only, and could suggest what statistical
procedures are appropriate.
Constructing a stem-and-leaf display:
1. Select one or more leading digits to be the stem
values. The remaining (trailing) digits are the leaf values.
2. Draw a vertical line and list the stem values to the left of
this line, in order.
3. Record the leaf of each observation in the row
corresponding to its stem value.
4. Somewhere in the display, indicate the units of the stem
and leaves. (For example, the stems start at the tens place,
and the leaves start at the ones place.)
A stem-and-leaf display conveys information about the
following aspects of the data:
➢ identification of a typical or representative value
➢ extent of spread about the typical value
➢ presence of any gaps in the data
➢ extent of symmetry in the distribution of values
➢ number and location of peaks
➢ presence of any outlying values
A teacher asked 10 of her students how many books they had read in the
last 12 months. Their answers were as follows:
12, 23, 19, 6, 10, 7, 15, 25, 21, 12
Prepare a stem and leaf plot for these data.
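One way to check a hand-drawn display for the book counts above is the following sketch. It assumes non-negative integer data with the tens digit as the stem and the ones digit as the leaf; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: sorted leaves}, using the ones digit as the leaf."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    # List every stem between the smallest and largest so gaps stay visible.
    return {s: rows.get(s, []) for s in range(min(rows), max(rows) + 1)}

books = [12, 23, 19, 6, 10, 7, 15, 25, 21, 12]
display = stem_and_leaf(books)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
# Stem: tens digit, leaf: ones digit
# 0 | 6 7
# 1 | 0 2 2 5 9
# 2 | 1 3 5
```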
The results of 41 students' math tests (with a best possible score of 70) are
recorded below:
31, 49, 19, 62, 50, 24, 45, 23, 51, 32, 48, 55, 60, 40, 35, 54, 26, 57, 37, 43, 65, 50, 55,
18, 53, 41, 50, 34, 67, 56, 44, 4, 54, 57, 39, 52, 45, 35, 51, 63, 42

1. Is the variable discrete or continuous? Explain.
2. Prepare an ordered stem and leaf plot for the data and briefly describe what it
shows.
3. Are there any outliers? If so, which scores?
4. Look at the stem and leaf plot from the side. Describe the distribution's main
features such as:
a. number of peaks
b. symmetry
c. value at the centre of the distribution
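Observations from a stem-and-leaf display of the percentage of students who are
binge drinkers at each college in a sample:
➢ The display suggests that a typical or representative
value is in the stem 4 row, perhaps in the mid-40%
range.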
➢ The observations are not highly concentrated about this
typical value, as would be the case if all values were
between 20% and 49%.
➢ The display rises to a single peak as we move downward,
and then declines; there are no gaps in the display.
➢ The shape of the display is not perfectly symmetric, but
instead appears to stretch out a bit more in the direction
of low leaves than in the direction of high leaves.
➢ Lastly, there are no observations that are unusually far
from the bulk of the data (no outliers), as would be the
case if one of the 26% values had instead been 86%.
➢ At most colleges in the sample, at least one-quarter of
the students are binge drinkers. The problem of heavy
drinking on campuses is much more pervasive than
many had suspected.
Find the range of the data represented in the given stem-and-leaf plot.
For the given stem-and-leaf plot, what is the median value?
What is the mode for the given stem-and-leaf plot?
A dotplot represents each data point as a dot along a real
number line, putting the point on the line according to its
value. If two points would be almost overlapping, they
would instead be stacked.
A dotplot gives information about location, spread, extremes,
and gaps.
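A text-only dotplot can be sketched as follows. It assumes integer data, so that each value maps to one character column, and stacks repeated values vertically; the data are the ten book counts from the earlier stem-and-leaf exercise, and the function name is illustrative.

```python
from collections import Counter

def text_dotplot(values):
    """Return the rows of a character dotplot, top row first (integer data)."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    height = max(counts.values())
    rows = []
    for level in range(height, 0, -1):
        row = "".join("*" if counts[v] >= level else " "
                      for v in range(lo, hi + 1))
        rows.append(row.rstrip())
    return rows

books = [12, 23, 19, 6, 10, 7, 15, 25, 21, 12]
rows = text_dotplot(books)
for row in rows:
    print(row)
print(f"axis runs from {min(books)} to {max(books)}")
```

The repeated value 12 appears as a stack of two dots, and the empty columns show where the gaps in the data lie.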
A quantitative variable is discrete if its possible values are countable, and
continuous if its possible values consist of entire intervals of the real number
line. Generally, discrete variables arise from counting, while continuous
variables arise from measurements.
The frequency of a value is the number of times that value was seen
in a dataset. For discrete variables it is reasonable to list the frequency
of each observed value, but for continuous variables this is not
reasonable. Instead, for continuous variables, we list the frequency of each
range (class interval) in which data points lie.
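The frequency of any particular 𝑥 value is the number of times that
value occurs in the data set. The relative frequency of a value is the
fraction or proportion of times the value occurs:

\[ \text{relative frequency of a value} = \frac{\text{number of times the value occurs}}{\text{number of observations in the data set}} \]

Suppose, for example, that our data set consists of 200 observations on
𝑥 = the number of courses a college student is taking this term. If 70 of these
𝑥 values are 3, then the frequency of the 𝑥 value 3 is 70 and its relative
frequency is 70/200 = 0.35.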
A frequency distribution is a tabulation of frequencies or relative frequencies.
Constructing a Histogram for Discrete Data
First, determine the frequency and relative frequency of each x value. Then mark
possible x values on a horizontal scale. Above each value, draw a rectangle whose
height is the relative frequency (or alternatively, the frequency) of that value.
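The tabulation behind a discrete histogram can be sketched as below. The ten course counts are hypothetical and chosen only for illustration; the bar of `#` characters stands in for the rectangle drawn above each value.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, frequency, relative frequency), sorted by value."""
    n = len(values)
    counts = Counter(values)
    return [(x, counts[x], counts[x] / n) for x in sorted(counts)]

# Hypothetical sample: number of courses taken by 10 students.
courses = [3, 4, 5, 3, 4, 4, 6, 5, 4, 3]
table = frequency_table(courses)
for x, freq, rel in table:
    print(f"x = {x}   frequency = {freq}   relative frequency = {rel:.2f}   {'#' * freq}")
```

The relative frequencies always sum to 1, which makes histograms of data sets with different sizes directly comparable.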
How unusual is a no-hitter or a
one-hitter in a major league
baseball game, and how
frequently does a team get more
than 10, 15, or even 20 hits? The
given table is a frequency
distribution for the number of hits
per team per game for all nine-
inning games that were played
between 1989 and 1993.
Constructing a Histogram for Continuous Data
Determine the frequency and relative frequency for each class. Mark the class
boundaries on a horizontal measurement axis. Above each class interval, draw a
rectangle whose height is the corresponding relative frequency (or frequency).
Constructing a Histogram for Continuous Data: Unequal Class Widths
After determining frequencies and relative frequencies, calculate the height of
each rectangle using the formula

\[ \text{rectangle height} = \frac{\text{relative frequency of the class}}{\text{class width}} \]

The resulting rectangle heights are usually called densities, and the vertical scale is
the density scale. This prescription will also work when class widths are equal.
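A small sketch of the density calculation, under the assumption that each height is the class's relative frequency divided by its width. The class boundaries and frequencies are hypothetical. On the density scale the area of each rectangle equals the relative frequency of its class, so the areas sum to 1.

```python
def density_heights(boundaries, frequencies):
    """Height of each rectangle: relative frequency divided by class width."""
    n = sum(frequencies)
    widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]
    return [(f / n) / w for f, w in zip(frequencies, widths)]

# Hypothetical classes with unequal widths: [0,2), [2,4), [4,8), [8,16)
boundaries = [0, 2, 4, 8, 16]
frequencies = [4, 8, 6, 2]

heights = density_heights(boundaries, frequencies)
widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]
areas = [h * w for h, w in zip(heights, widths)]

for lo, hi, h in zip(boundaries, boundaries[1:], heights):
    print(f"[{lo}, {hi}): density = {h:.4f}")
print("total area =", round(sum(areas), 10))
```

Plotting raw frequencies instead would exaggerate the wide classes, which is why the density scale is preferred when widths differ.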
Q 27
Histograms come in a variety of shapes.
A unimodal histogram is one that rises to a single peak and then declines.
A bimodal histogram has two different peaks. Bimodality can occur when the
data set consists of observations on two quite different kinds of individuals or
objects.
A histogram with more than two peaks is said to be multimodal. Of course, the
number of peaks may well depend on the choice of class intervals, particularly
with a small number of observations. The larger the number of classes, the more
likely it is that bimodality or multimodality will manifest itself.
A histogram is symmetric if the left half is a mirror image of the right half.
A unimodal histogram is positively skewed if the right or upper tail is stretched
out compared with the left or lower tail and negatively skewed if the stretching
is to the left.
Both a frequency distribution and a histogram can be constructed when the data
set is qualitative (categorical) in nature.
A Pareto diagram is a variation of a histogram for
categorical data resulting from a quality control
study. Each category represents a different type of
product nonconformity or production problem. The
categories are ordered so that the one with the
largest frequency appears on the far left, then the
category with the second largest frequency, and so
on. Suppose the following information on
nonconformities in circuit packs is obtained: failed
component, 126; incorrect component, 210;
insufficient solder, 67; excess solder, 54; missing
component, 131. Construct a Pareto diagram.
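The ordering step of the Pareto diagram can be sketched with the circuit-pack counts given above. The cumulative percentage column is an addition that many Pareto charts include; it is not part of the exercise statement.

```python
def pareto_table(counts):
    """Sort categories by descending frequency and add a cumulative percentage."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for category, count in ordered:
        running += count
        rows.append((category, count, 100 * running / total))
    return rows

nonconformities = {
    "failed component": 126,
    "incorrect component": 210,
    "insufficient solder": 67,
    "excess solder": 54,
    "missing component": 131,
}

rows = pareto_table(nonconformities)
for category, count, cumulative in rows:
    print(f"{category:<20} {count:>4}  {cumulative:6.1f}%  {'#' * (count // 10)}")
```

The leftmost bar is "incorrect component" (210), followed by "missing component" (131), "failed component" (126), "insufficient solder" (67), and "excess solder" (54).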
Quartiles:
Quartiles divide the data set into four equal parts, with the
observations above the third quartile constituting the upper
quarter of the data set, the second quartile being identical to
the median, and the first quartile separating the lower
quarter from the upper three-quarters.
Percentiles:
A data set (sample or population) can be even more finely
divided using percentiles; the 99th percentile separates the
highest 1% from the bottom 99%, and so on.
Trimmed Mean:
A trimmed mean is a compromise between the sample mean \(\bar{x}\) and the
sample median \(\tilde{x}\). A 10% trimmed mean, for example, would be computed by
eliminating the smallest 10% and the largest 10% of the
sample and then averaging what remains.

If the desired trimming percentage is 100𝛼% and 𝑛𝛼 is not
an integer, the trimmed mean must be calculated by
interpolation.
\(\bar{x} = 3.65\), \(\tilde{x} = 3.35\), \(\bar{x}_{tr(7.7)} = 3.42\)
Data: 350, 408, 540, 555, 575, 590, 608, 679, 815, 1285

Median    Trimmed mean (20%)    Trimmed mean (10%)    Trimmed mean (15%)
582.5     591.1667              596.25                593.7083
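The trimmed means for these ten observations can be reproduced with the sketch below. The interpolation rule is an assumption: when 𝑛𝛼 is not an integer, the code interpolates linearly between the trimmed means obtained by dropping the two neighbouring whole numbers of observations from each end. That rule gives the 15% value shown for this data set, but the slides do not spell out the exact scheme.

```python
import math

def trimmed_mean(values, alpha):
    """100*alpha% trimmed mean; linear interpolation if n*alpha is not an integer."""
    xs = sorted(values)
    n = len(xs)

    def drop(k):
        # Mean after removing the k smallest and k largest observations.
        kept = xs[k:n - k]
        return sum(kept) / len(kept)

    cut = n * alpha
    k = math.floor(cut)
    frac = cut - k
    if frac < 1e-9:
        return drop(k)
    return (1 - frac) * drop(k) + frac * drop(k + 1)

data = [350, 408, 540, 555, 575, 590, 608, 679, 815, 1285]
for alpha in (0.10, 0.15, 0.20):
    print(f"{alpha:.0%} trimmed mean = {trimmed_mean(data, alpha):.4f}")
```

The function assumes the trimming proportion is small enough that some observations remain after trimming.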
When the data is categorical, a frequency distribution or
relative frequency distribution provides an effective tabular
summary of the data. The natural numerical summary
quantities in this situation are the individual frequencies and
the relative frequencies.
The range is the difference between the largest and smallest
sample values.
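Deviations from the mean: \(x_1 - \bar{x},\ x_2 - \bar{x},\ \ldots,\ x_n - \bar{x}\)
Sum of deviations from the mean: \(\sum_{i=1}^{n} (x_i - \bar{x}) = 0\)
Squared deviations from the mean: \((x_i - \bar{x})^2\)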
The Web site www.fueleconomy.gov contains a wealth of information about fuel
characteristics of various vehicles. In addition to EPA mileage ratings, there are many
vehicles for which users have reported their own values of fuel efficiency (mpg).
Consider the following sample of n=11 efficiencies for the 2009 Ford Focus equipped
with an automatic transmission (for this model, EPA reports an overall rating of 27
mpg–24 mpg for city driving and 33 mpg for highway driving):
The \(x_i\)'s tend to be closer to their average \(\bar{x}\) than to the population average
𝜇, so to compensate for this the divisor 𝑛 − 1 is used rather than 𝑛.
In other words, if we used a divisor 𝑛 in the sample variance, then the
resulting quantity would tend to underestimate the population variance (produce
estimated values that are too small on the average), whereas dividing by the slightly
smaller 𝑛 − 1 corrects this underestimating.

\(s^2\) is based on 𝑛 − 1 degrees of freedom (df).
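The effect of the divisor can be seen in a short sketch. The five-value sample is hypothetical; the function computes the deviations from the mean, confirms that they sum to zero, and divides the sum of squared deviations by 𝑛 − 1. Python's `statistics.variance` uses the same 𝑛 − 1 divisor and serves as a cross-check.

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    return sum(d * d for d in deviations) / (n - 1)

# Hypothetical sample, chosen so the arithmetic is easy to follow.
sample = [1, 2, 3, 4, 5]
mean = sum(sample) / len(sample)
deviations = [x - mean for x in sample]

s2 = sample_variance(sample)
divide_by_n = sum(d * d for d in deviations) / len(sample)

print("sum of deviations:", sum(deviations))
print("s^2 (divisor n - 1):", s2)
print("divisor n instead:", divide_by_n)
print("sample standard deviation:", s2 ** 0.5)
```

For this sample the 𝑛 − 1 divisor gives 2.5, while dividing by 𝑛 gives the smaller value 2.0.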

Boxplots can be used to describe several of a data set’s most prominent
features such as
(1) Center
(2) Spread
(3) the extent and nature of any departure from symmetry
(4) identification of “outliers”

Because even a single outlier can drastically affect the values of \(\bar{x}\) and s, a
boxplot is based on measures that are “resistant” to the presence of a few
outliers: the median and a measure of variability called the fourth spread,
\(f_s\) = upper fourth − lower fourth.
Roughly speaking, the fourth spread is unaffected by the positions of those observations
in the smallest 25% or the largest 25% of the data. Hence it is resistant to outliers.

The simplest boxplot is based on the following five-number summary:

smallest \(x_i\), lower fourth, median, upper fourth, largest \(x_i\)

➢ First, draw a horizontal measurement scale.
➢ Then place a rectangle above this axis; the left edge of the rectangle is at the lower
fourth, and the right edge is at the upper fourth.
➢ Place a vertical line segment or some other symbol inside the rectangle at the location
of the median; the position of the median symbol relative to the two edges conveys
information about skewness in the middle 50% of the data.
➢ Finally, draw “whiskers” out from either end of the rectangle to the smallest and
largest observations.
A boxplot with a vertical orientation can also be drawn by making obvious modifications
in the construction process.
Ultrasound was used to gather the accompanying corrosion data on the thickness
of the floor plate of an aboveground tank used to store crude oil (“Statistical
Analysis of UT Corrosion Data from Floor Plates of a Crude Oil Aboveground
Storage Tank,” Materials Eval., 1994: 846–849); each observation is the largest pit
depth in the plate, expressed in milli-in.

40 52 55 60 70 75 85 85 90 90 92 94 94 95 98 100 115 125 125
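The five-number summary for these 19 pit depths can be computed with the sketch below. Two points are assumptions rather than statements from the slides: the lower and upper fourths are taken as the medians of the smallest and largest halves of the sorted data, with the median included in both halves when 𝑛 is odd, and an observation is flagged as an outlier when it lies more than 1.5 fourth spreads beyond the nearer fourth.

```python
import statistics

def five_number_summary(values):
    """Return (smallest, lower fourth, median, upper fourth, largest)."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # each half includes the median when n is odd
    lower_fourth = statistics.median(xs[:half])
    upper_fourth = statistics.median(xs[n - half:])
    return xs[0], lower_fourth, statistics.median(xs), upper_fourth, xs[-1]

depths = [40, 52, 55, 60, 70, 75, 85, 85, 90, 90,
          92, 94, 94, 95, 98, 100, 115, 125, 125]

summary = five_number_summary(depths)
smallest, lower_fourth, median, upper_fourth, largest = summary
fs = upper_fourth - lower_fourth

low_fence = lower_fourth - 1.5 * fs
high_fence = upper_fourth + 1.5 * fs
outliers = [x for x in depths if x < low_fence or x > high_fence]

print("five-number summary:", summary)
print("fourth spread:", fs)
print("outliers:", outliers)
```

Under these assumptions the summary is 40, 72.5, 90, 96.5, 125 with a fourth spread of 24, and no observation falls outside the fences. The median sits closer to the upper fourth than to the lower fourth, which points to negative skew in the middle half of the data.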

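Summary values from a separate example (its data are not reproduced here):
\(\tilde{x} = 92.17\), lower fourth = 45.64, upper fourth = 167.79
\(f_s = 122.15\), \(1.5 f_s = 183.225\), \(3 f_s = 366.45\)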
325 325 334 339 356 356 359 359 363 364 364 366 369 370 373 373 374 375 389 392 393 394 397 402 403 424
