05.1 Data Organization PRESENTATION

This document provides an introduction to statistics, covering key concepts such as data collection, descriptive statistics, statistical inference, and the importance of samples. It explains various methods for presenting data, including frequency distributions, histograms, and stem-and-leaf displays, as well as measures of central tendency and variation. Additionally, it discusses quartiles and percentiles in the context of data analysis.

Uploaded by

ckranock


Section 5 – Introduction to Organization & Description of Data (Statistics)
ENGR 3311 – Engineering Math Methods
Instructor: Michael Weeks, PhD
Spring 2024
Introduction to Statistics
• Statistics includes the collection, processing, analysis, and
interpretation of numerical data.
• The results from the statistical analysis can provide the basis for
making decisions or choosing actions.
• Statistics can also be described as the study of how to make
inferences and decisions in the face of uncertainty and variability.
• Probability theory is devoted to the study of uncertainty and
variability.
• Descriptive Statistics:
• the presentation of data in tables and charts,
• the summarization of data by means of numerical
descriptions and graphs.
• Statistical Inference:
• the task of making generalizations based on the sample
data,
• allows for making inferences beyond the information
contained in the data set.
• Any experiment or investigation involves the collection of
relevant data.
• A thorough evaluation requires an exhaustive set of data to be
collected.
• A unit is defined as a single entity, usually an object or a person,
whose characteristics are of interest, i.e. the source of each
measurement.
• A variable is often used to quantify the characteristics of interest.
• A population of units is the complete collection of units about
which information is sought.
• A statistical population is the set of all measurements that
correspond to each unit in the entire population of units about
which information is sought.
Population and Sample

• It is often impossible or infeasible to obtain a complete set
of data.
• In most situations, we must work with only partial
information.
• A sample from a statistical population is the subset of
measurements that are actually collected in the course of
an investigation.
• The distinction between the data actually collected and the
vast information of all potential observations is a key to
understanding statistics.

• The sample needs both to be representative of the
population and to be large enough to contain sufficient
information to answer the questions about the population
that are crucial to the investigation.
• The selection of a sample from a finite population must be
done impartially and objectively.
• Avoid any bias due to self-selected samples.
• The selection can be carried out using a chance mechanism,
or a random number generator.

• Why use samples?

• The population may be too large to count
• The population may be too dangerous to observe
• The population may be too difficult to measure

Presentation of Data
• Data obtained from experiments and investigations are
often voluminous and need to be condensed into a suitable form
before any meaningful information can be extracted.
• It is typical to group the data and present them in tabular
or graphical form.
• Graphical presentations are often the most effective way of
communicating the information.

Raw data:
71 91 63 99 93 88 95 63 67
76 65 68 82 81 83 61 77 100
87 68 85 60 98 89 80 78 82

Figure 1.1: Histogram of Grades (x-axis: Grade Range; y-axis: Frequency)

Range Frequency
90-100 6
80-89 9
70-79 4
60-69 8
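The tally behind the histogram can be reproduced with a short sketch. Python is used here only for illustration, since the slides do not prescribe a language, and the names `grades`, `ranges`, and `tally` are assumptions of this example.

```python
# Tally the raw grades into the four ranges shown on the histogram.
grades = [71, 91, 63, 99, 93, 88, 95, 63, 67,
          76, 65, 68, 82, 81, 83, 61, 77, 100,
          87, 68, 85, 60, 98, 89, 80, 78, 82]

# Each (low, high) pair is inclusive at both ends, as in "60-69".
ranges = [(60, 69), (70, 79), (80, 89), (90, 100)]


def tally(values, limits):
    """Return {"low-high": count} for each inclusive range."""
    return {f"{lo}-{hi}": sum(lo <= v <= hi for v in values)
            for lo, hi in limits}


grade_counts = tally(grades, ranges)
for label, count in grade_counts.items():
    # One '#' per observation gives a rough text histogram.
    print(f"{label:>6} {count:2d} {'#' * count}")
```

The ranges do not overlap and together cover every grade, so the counts add up to the 27 raw observations.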

Pareto Diagrams
• A diagram that contains both a bar chart and a line graph.
• The bar chart represents the individual values, usually arranged in
descending order.
• The cumulative total, or percentage (%), is represented by
the line graph.
• The purpose of the Pareto diagram is to highlight the most
important among a set of factors.

• The following example illustrates the amount of
improvement that can be made by addressing the first two
major causes of faults on a machine.

Fault Frequency
Power fluctuations 6
Unstable controller 22
Operator error 13
Worn tool 2
other 5
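The two ingredients of a Pareto diagram, bars in descending order and a cumulative percentage line, can be computed from the fault table. The following is a minimal sketch; the function name `pareto_table` and the tuple layout are assumptions made for illustration.

```python
# Fault counts taken from the table above.
faults = {"Power fluctuations": 6, "Unstable controller": 22,
          "Operator error": 13, "Worn tool": 2, "Other": 5}


def pareto_table(counts):
    """Sort causes by descending frequency and attach cumulative percentages."""
    total = sum(counts.values())
    rows, running = [], 0
    for name, freq in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += freq
        rows.append((name, freq, 100.0 * running / total))
    return rows


pareto_rows = pareto_table(faults)
for name, freq, cum_pct in pareto_rows:
    print(f"{name:<20} {freq:3d} {cum_pct:6.1f}%")
```

The first two causes (unstable controller and operator error) account for 35 of the 48 recorded faults, roughly 73%, which is the improvement the diagram is meant to highlight.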

Dot Diagrams

• Used for identifying variations, or patterns of variations.

3 6 -2 4 7 4 3

• Dot plots are usually used for small data sets. They are
useful for highlighting clusters, gaps, skews in distribution,
and outliers.
38.9 58.0 96.3 122.2 155.6 333.3 3408.0

• Dot diagrams can be drawn for multiple samples to
help reveal the differences between them.
Example:
• Samples of the copper content in a welding material
produced from one plant are as follows.
0.27 0.35 0.37
• Samples from another plant are as follows.
0.23 0.15 0.25 0.24 0.30 0.33 0.26
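A text-only dot diagram is enough to compare the two samples. This sketch stacks one dot per occurrence of each value; the labels "Plant 1" and "Plant 2" and the helper `dot_rows` are hypothetical names chosen for the example.

```python
from collections import Counter

# Copper content samples from the two plants in the example.
plant_1 = [0.27, 0.35, 0.37]
plant_2 = [0.23, 0.15, 0.25, 0.24, 0.30, 0.33, 0.26]


def dot_rows(sample):
    """Return (value, dots) pairs in ascending order, one dot per occurrence."""
    counts = Counter(sample)
    return [(value, "•" * counts[value]) for value in sorted(counts)]


for name, sample in (("Plant 1", plant_1), ("Plant 2", plant_2)):
    print(name)
    for value, dots in dot_rows(sample):
        print(f"  {value:.2f} {dots}")
```

A graphical version would place the dots along a common horizontal axis so that shifts between the samples are visible at a glance.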

Frequency Distributions
• A frequency distribution is a table that divides a set of data
into a suitable number of classes (or categories), showing
also the number of items belonging to each category.
• This grouping often highlights some important features of
the data.
• Once the data have been grouped, each observation has lost
its identity in the sense that its exact value is no longer
known.
• The first step in constructing a frequency distribution
consists of deciding how many classes to use and choosing
the class limits for each class.
• Use between 5 and 15 different classes.
• The different classes should:
• Not overlap
• Accommodate all the data
• Have the same width
Raw data:
245 333 296 304 276 336 289 234 253 292
366 323 309 284 310 338 297 314 305 330
266 391 315 305 290 300 292 311 272 312
315 355 346 337 303 265 278 276 373 271
308 276 364 390 298 290 308 221 274 343

Classes Frequency
206 – 245 3
246 – 285 11
286 – 325 23
326 – 365 9
366 – 405 4
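The frequency table can be checked by counting the raw data directly. The sketch below assumes integer-valued data and equal-width classes defined by inclusive class limits; `frequency_distribution` and its parameters are illustrative names, not part of the original slides.

```python
# Raw data from the slide (50 observations).
data = [245, 333, 296, 304, 276, 336, 289, 234, 253, 292,
        366, 323, 309, 284, 310, 338, 297, 314, 305, 330,
        266, 391, 315, 305, 290, 300, 292, 311, 272, 312,
        315, 355, 346, 337, 303, 265, 278, 276, 373, 271,
        308, 276, 364, 390, 298, 290, 308, 221, 274, 343]


def frequency_distribution(values, first_limit, width, n_classes):
    """Count values in consecutive classes [low, low + width - 1]."""
    table = []
    for i in range(n_classes):
        low = first_limit + i * width
        high = low + width - 1
        table.append((low, high, sum(low <= v <= high for v in values)))
    return table


freq_table = frequency_distribution(data, first_limit=206, width=40, n_classes=5)
for low, high, freq in freq_table:
    print(f"{low} – {high} {freq:3d}")
```

Because the classes do not overlap and accommodate all the data, the frequencies sum to the sample size of 50.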

Frequency Distributions

• The number of observations in each class is counted to
obtain the frequency distribution.
• The class limits are given to as many decimal places as the
original data.
• The ranges in each class can be defined using the endpoint
convention.
• For example, using the right-hand endpoint convention,
the class (205,245] includes all data greater than 205 and
less than or equal to 245.
• The class boundaries are the endpoints of the intervals that
specify each class.
• The class interval is the length of the range for the class. All
classes are typically of equal length.
• The class marks of a frequency distribution are obtained by
averaging successive class limits or boundaries.
Raw data:
245 333 296 304 276 336 289 234 253 292
366 323 309 284 310 338 297 314 305 330
266 391 315 305 290 300 292 311 272 312
315 355 346 337 303 265 278 276 373 271
308 276 364 390 298 290 308 221 274 343

Classes Frequency
(205,245] 3
(245,285] 11
(285,325] 23
(325,365] 9
(365,405] 4
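The right-hand endpoint convention and the class marks can be expressed in a few lines. This is a sketch under the assumption of equal-width classes starting at boundary 205 with width 40; `class_index` is a hypothetical helper.

```python
import math

# Class boundaries for (205,245], (245,285], ..., (365,405].
boundaries = [205, 245, 285, 325, 365, 405]


def class_index(x, lower=205, width=40):
    """Index of the class (lower + i*width, lower + (i+1)*width] containing x.

    With the right-hand endpoint convention a value equal to the upper
    boundary belongs to the class, while the lower boundary does not.
    """
    return math.ceil((x - lower) / width) - 1


# Class marks: the average of successive class boundaries.
class_marks = [(a + b) / 2 for a, b in zip(boundaries, boundaries[1:])]

print(class_index(245), class_index(246))  # 245 stays in the first class
print(class_marks)
```

A value of exactly 205 yields index -1, meaning it falls outside every class, which matches the statement that (205,245] does not include 205.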

Cumulative Distributions
• A cumulative distribution is an alternative form into which data can be grouped.
• A cumulative “less-than-or-equal-to” distribution shows the total
number of observations that are less than or equal to the given
values.
• A cumulative “less-than” distribution is used when each class includes
the left-hand endpoint but not the right-hand endpoint.
• A cumulative “greater-than” distribution is similarly
constructed by adding the frequencies, one by one, starting at the
other end of the frequency distribution.
Classes Frequency Cumulative (≤) Cumulative (≥)
206 – 245 3 3 50
246 – 285 11 14 47
286 – 325 23 37 36
326 – 365 9 46 13
366 – 405 4 50 4
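Both cumulative columns are running sums of the class frequencies, taken from opposite ends of the table. A minimal sketch, with variable names chosen for illustration:

```python
from itertools import accumulate

# Class frequencies for 206–245, 246–285, 286–325, 326–365, 366–405.
frequencies = [3, 11, 23, 9, 4]

# Running total from the first class downward: "less than or equal to".
cumulative_le = list(accumulate(frequencies))

# Running total from the last class upward: "greater than or equal to".
cumulative_ge = list(accumulate(reversed(frequencies)))[::-1]

for f, le, ge in zip(frequencies, cumulative_le, cumulative_ge):
    print(f"{f:3d} {le:3d} {ge:3d}")
```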

Percentage Distributions
• Distributions can be compared easily if each is
converted into a percentage distribution.
• This is accomplished by dividing each class frequency by the total
frequency (or number of observations) and multiplying by 100.
• The result is the percentage of data that falls into each class of the
distribution.
Classes Frequency Frequency (%) Cumulative % (≤)
206 – 245 3 6% 6%
246 – 285 11 22% 28%
286 – 325 23 46% 74%
326 – 365 9 18% 92%
366 – 405 4 8% 100%
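The percentage columns follow from the frequencies by a single division. The sketch below is illustrative only; the variable names are assumptions.

```python
from itertools import accumulate

frequencies = [3, 11, 23, 9, 4]
total = sum(frequencies)  # number of observations, 50

# Percentage of the data falling in each class.
percentages = [100 * f / total for f in frequencies]

# Cumulative "less than or equal to" percentages.
cumulative_pct = [100 * c / total for c in accumulate(frequencies)]

for pct, cum in zip(percentages, cumulative_pct):
    print(f"{pct:5.1f}% {cum:6.1f}%")
```

Converting to percentages puts distributions with different sample sizes on a common scale, which is what makes the comparison straightforward.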

Graphs of Frequency Distributions

• The most common form of graphical representation of a
frequency distribution is the histogram.
• The histogram consists of rectangles with heights equal to
the class frequencies and the bases extending between
successive class boundaries.
Classes Frequency
206 – 245 3
246 – 285 11
286 – 325 23
326 – 365 9
366 – 405 4

• The cumulative distributions are typically presented in the
form of ogives, where the cumulative frequencies are
plotted at the class boundaries.
• The resulting points are connected by means of straight
lines.
• The curve is the steepest over the class with the highest
frequency.

Classes Cumulative
(205,245] 3
(245,285] 14
(285,325] 37
(325,365] 46
(365,405] 50

Stem-and-Leaf Displays
• The previous methods involved the grouping of large sets of
data to present them in a manageable form.
• This entailed some loss of information.
• To avoid the loss of information, the following stem-and-
leaf display can be used to keep track of the last digits of
the readings within each class.
• The stem is the left-hand column which contains the tens
digits.
• Each number to the right of the vertical line is a leaf.
• For example, the first row corresponds to the data 12, 17,
and 15.
Raw data:
29 44 12 53 21 34 39 25 48 23
17 24 27 32 34 15 42 21 28 37

Classes Frequency
10 – 19 3
20 – 29 8
30 – 39 5
40 – 49 3
50 – 59 1

stem | leaves
10 – 19 | 2 7 5
20 – 29 | 9 1 5 3 4 7 1 8
30 – 39 | 4 9 2 4 7
40 – 49 | 4 8 2
50 – 59 | 3

• The same stem-and-leaf display can be constructed with the
stem column only showing the digit that corresponds to the
tens.
• The list of units to the right of the vertical line, i.e. the leaves,
can also be listed in ascending order.

stem | leaves
1 | 2 5 7
2 | 1 1 3 4 5 7 8 9
3 | 2 4 4 7 9
4 | 2 4 8
5 | 3
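A stem-and-leaf display with ordered leaves can be generated by splitting each reading into its tens digit and units digit. This is a minimal sketch for two-digit data; `stem_and_leaf` is an illustrative name.

```python
from collections import defaultdict

readings = [29, 44, 12, 53, 21, 34, 39, 25, 48, 23,
            17, 24, 27, 32, 34, 15, 42, 21, 28, 37]


def stem_and_leaf(values):
    """Map each tens digit (stem) to its units digits (leaves), ascending."""
    display = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return dict(display)


plot = stem_and_leaf(readings)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Unlike the frequency table, every original reading can be recovered from the display, so no information is lost.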
Descriptive Measures

• In addition to graphical representations of the data,
numerical measures can also be used to describe the data.
• The descriptive measures are computed from a sample of
data (raw or ungrouped) of $n$ measurements
$x_1, x_2, \ldots, x_n$.
• The sample mean $\bar{x}$ is the sum of all of the observations in
the data set divided by the sample size $n$:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
• The sample median is the center, or location, of a set of
data. If the observations are arranged in an ascending or
descending order:
• the median is the middle value if the number of observations
is odd.
• the median is the average of the two middle values if the
number of observations is even.

Descriptive Measures (Example 1)

An engineering group receives e-mail requests for technical
information from sales and service. The daily numbers of
e-mails for six days are:

11 9 17 19 4 15

Find the mean and median.
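The example can be worked directly from the definitions of the sample mean and median. The sketch below implements the odd and even cases of the median explicitly; the function names are assumptions made for illustration.

```python
emails = [11, 9, 17, 19, 4, 15]


def sample_mean(xs):
    """Sum of the observations divided by the sample size."""
    return sum(xs) / len(xs)


def sample_median(xs):
    """Middle value if n is odd, mean of the two middle values if n is even."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_mean(emails), sample_median(emails))
```

The six values sum to 75, so the mean is 12.5. Sorted, they read 4, 9, 11, 15, 17, 19, so the median is the average of 11 and 15, which is 13.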

Descriptive Measures

• The sample mean and median each summarize a set of data in
terms of a single number that describes its “middle” or
average.
• Another important measure is the variation of a set of data
in terms of the amount by which the values deviate from
their mean.
• For a set of observations $x_1, x_2, \ldots, x_n$ with a mean $\bar{x}$, the
following are the deviations from the mean:
$x_1 - \bar{x},\ x_2 - \bar{x},\ \ldots,\ x_n - \bar{x}$
• The mean of the deviations might seem a natural measure of the
variation in the set of data, but the deviations always sum to zero.
• The most common measure of variation is based on the average of
the squared deviations from the mean $\bar{x}$, using the divisor $n - 1$. This is known as
the sample variance:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
• The greater the variance, the more widely the data are spread about the mean.
• The calculation of variance uses squares and thus weights
outliers more heavily than data very near the mean.
• The standard deviation $s = \sqrt{s^2}$ of the observations is the square
root of the variance. It is more commonly used than the
variance since it is expressed in the same units as the
observations.
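The statement that the deviations sum to zero follows from the definition of the sample mean. A short derivation, writing $\bar{x}$ for the mean of the observations $x_1, \ldots, x_n$:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```

Positive and negative deviations cancel exactly, which is why the squared deviations are used to measure variation instead.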

Descriptive Measures (Example 2)

Calculate the variance and standard deviation of the
following data sample.
0.6 1.2 0.9 1.0 0.6 0.8

$s^2 = 0.055$
$s \approx 0.235$
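The stated results can be reproduced with a few lines. This sketch assumes the sample variance uses the divisor n − 1, which is the convention that yields 0.055 for these data; `sample_variance` is an illustrative name.

```python
import math

sample = [0.6, 1.2, 0.9, 1.0, 0.6, 0.8]


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


s2 = sample_variance(sample)
s = math.sqrt(s2)
print(f"mean = {sum(sample) / len(sample):.2f}, s^2 = {s2:.3f}, s = {s:.3f}")
```

The mean is 0.85, the squared deviations sum to 0.275, and dividing by 5 gives 0.055. The square root is about 0.235, in the same units as the observations.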
Quartiles and Percentiles

• The median divides the set of data into two halves.
• When an ordered data set is divided into quarters, the
resulting division points are called sample quartiles.
• The first quartile, $Q_1$, is the data value that has one-fourth (25%) of
the observations below its value.
• The first quartile is also known as the 25th percentile, or
$P_{0.25}$.

• The median is also known as the 50th percentile.

Inter Quartile Range (IQR) = 3rd Quartile – 1st Quartile = Q3 – Q1

• The sample (100𝑝)th percentile is a value such that at least 100𝑝%
of the observations are at or below this value, and at least
100(1 − 𝑝)% are at or above this value (0 < 𝑝 < 1).

To calculate the sample (100𝑝)th percentile:

• Order the 𝑛 observations from the smallest to the largest.
• Determine the product 𝑛𝑝.
• If 𝑛𝑝 is not an integer, round it up to the next integer and find the
corresponding ordered value.
• If 𝑛𝑝 is an integer 𝑘, find the mean of the 𝑘th and (𝑘 + 1)th ordered
observations.

Quartiles and Percentiles (Example 3)

Find the 1st quartile, 2nd quartile, 3rd quartile, and the 93rd
percentile for the following ordered data.

221 234 245 253 265 266 271 272 274 276
276 276 278 284 289 290 290 292 292 296
297 298 300 303 304 305 305 308 308 309
310 311 312 314 315 315 323 330 333 336
337 338 343 346 355 364 366 373 390 391

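Q1 = 278 (𝑛𝑝 = 12.5, rounded up to the 13th ordered value)
Q2 = 304.5 (𝑛𝑝 = 25, mean of the 25th and 26th ordered values)
Q3 = 330 (𝑛𝑝 = 37.5, rounded up to the 38th ordered value)
P0.93 = 366 (𝑛𝑝 = 46.5, rounded up to the 47th ordered value)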
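The percentile rule (round 𝑛𝑝 up when it is not an integer, otherwise average the 𝑘th and (𝑘 + 1)th ordered values) can be applied to the 50 ordered observations as follows. This is a sketch of that specific rule, not of the interpolation methods used by default in most statistics libraries; `sample_percentile` is a hypothetical helper.

```python
import math

# The 50 ordered observations from the example.
ordered_data = [221, 234, 245, 253, 265, 266, 271, 272, 274, 276,
                276, 276, 278, 284, 289, 290, 290, 292, 292, 296,
                297, 298, 300, 303, 304, 305, 305, 308, 308, 309,
                310, 311, 312, 314, 315, 315, 323, 330, 333, 336,
                337, 338, 343, 346, 355, 364, 366, 373, 390, 391]


def sample_percentile(ordered, p):
    """(100p)th percentile of already-sorted data, for 0 < p < 1."""
    n = len(ordered)
    product = n * p
    k = round(product)
    if math.isclose(product, k):
        # np is an integer k: mean of the kth and (k+1)th ordered values.
        return (ordered[k - 1] + ordered[k]) / 2
    # np is not an integer: round up and take that ordered value.
    return ordered[math.ceil(product) - 1]


for p in (0.25, 0.50, 0.75, 0.93):
    print(p, sample_percentile(ordered_data, p))
```

With 𝑛 = 50, the products are 12.5, 25, 37.5, and 46.5, so the rule selects the 13th value (278), the mean of the 25th and 26th values (304.5), the 38th value (330), and the 47th value (366).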
Boxplots
• The summary information
contained in the quartiles is
highlighted in a graphic display
called a boxplot.
• The center half of the data,
extending from the 1st to the 3rd
quartile, is represented by a
rectangle.
• The median is identified by a bar
within the box.
• A line extends from the 3rd
quartile to the maximum, and
another line extends from the 1st
quartile to the minimum of the
data set.

• A boxplot is a graphical display of the
five-number summary:
• Minimum
• 1st Quartile $Q_1$
• 2nd Quartile $Q_2$ (the median)
• 3rd Quartile $Q_3$
• Maximum
• Multiple boxplots on the same
display can reveal differences
and similarities among the
various sets of observations.

Descriptive Measures (Grouped Data)

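• The calculation of the descriptive measures, such as sample
mean and standard deviation, can be simplified if the data
are grouped:

$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{n}$

$s^2 = \frac{\sum_{i=1}^{k} x_i^2 f_i - \left(\sum_{i=1}^{k} x_i f_i\right)^2 / n}{n - 1}$

where $x_i$ is the class mark of the ith class, $f_i$ is the
corresponding class frequency, $k$ is the number of classes
in the distribution, and $n = \sum_{i=1}^{k} f_i$ is the total number of observations.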
Classes Frequency
(205,245] 3
(245,285] 11
(285,325] 23
(325,365] 9
(365,405] 4
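For the grouped table above, the mean and standard deviation can be approximated by treating every observation in a class as if it sat at the class mark. The sketch below assumes the usual grouped-data formulas with divisor n − 1; the variable names are illustrative.

```python
import math

# Class boundaries and frequencies from the table.
boundaries = [205, 245, 285, 325, 365, 405]
frequencies = [3, 11, 23, 9, 4]

# Class marks: midpoints of successive boundaries.
marks = [(a + b) / 2 for a, b in zip(boundaries, boundaries[1:])]

n = sum(frequencies)                                        # total observations
sum_xf = sum(x * f for x, f in zip(marks, frequencies))     # sum of x_i * f_i
sum_x2f = sum(x * x * f for x, f in zip(marks, frequencies))  # sum of x_i^2 * f_i

grouped_mean = sum_xf / n
grouped_var = (sum_x2f - sum_xf ** 2 / n) / (n - 1)
grouped_sd = math.sqrt(grouped_var)

print(f"mean = {grouped_mean:.1f}, s^2 = {grouped_var:.1f}, s = {grouped_sd:.2f}")
```

Here the weighted sum of the class marks is 15250, giving a grouped mean of 305, and the grouped variance is 76800/49, about 1567.3. These are approximations to the values that would be obtained from the raw data, because grouping discards the exact observations.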
