
Week 2 Chapter 2 Describing Data

The document outlines the types and organization of data, distinguishing between qualitative (categorical) and quantitative (numerical) data. It explains methods for presenting numerical data, including ordered arrays, frequency distributions, histograms, and scatter diagrams, as well as categorical data representation through bar charts and pie charts. Additionally, it discusses the importance of summarizing data for effective decision-making and visual interpretation.

Uploaded by

cminh1088

HO CHI MINH CITY OPEN UNIVERSITY

ADVANCED STUDY PROGRAM


CHAPTER OUTLINE

I Data

II Numerical data

III Categorical data


I. DATA
What is data
⚫ Data in raw form are usually not easy to use for decision making
⚫ Some type of organization is needed
◦ Table
◦ Graph
⚫ Data: facts known with certainty, from which conclusions may be drawn
I. DATA
Type of data
Qualitative data (categorical data)
Quantitative data (numerical data)
Data can also be classified by source into two types:
Primary data: data collected directly from the original source, i.e. data collected for the first time.
Secondary data: data that have already been collected by some agency and have been processed or used at least once.
I. DATA
Type of data

Data
◦ Categorical (defined categories or groups)
   Examples: Marital Status; Are you registered to vote?; Eye Color
◦ Numerical
   ◦ Discrete (counted items)
      Examples: Number of Children; Defects per hour
   ◦ Continuous (measured characteristics)
      Examples: Weight; Voltage
I. DATA
Graphical presentation of data
⚫ Numerical data: data about values, numbers
⚫ Categorical data: data about categories
II. NUMERICAL DATA

Tables used for organizing numerical data


Numerical Data
◦ Ordered Array
◦ Frequency Distributions
◦ Cumulative Distributions
II. NUMERICAL DATA
2.1. The ordered array
A sequence of data, in rank order, from the smallest value
to the largest value
⚫ Shows range (min to max)

⚫ Provides some signals about variability within the range

⚫ May help identify outliers (unusual observations)

⚫ If the data set is large, the ordered array is less useful


II. NUMERICAL DATA
2.1. The ordered array
⚫ Data in raw form (as collected):

24, 26, 24, 21, 27, 27, 30, 41, 32, 38

⚫ Data in ordered array from smallest to largest:

21, 24, 24, 26, 27, 27, 30, 32, 38, 41
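
The step from raw data to an ordered array can be sketched in a few lines of Python. This is only an illustration using the ten values from the slide; the variable names are our own.

```python
# Raw observations exactly as listed in the slide.
raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

# The ordered array is the same data sorted from smallest to largest.
ordered = sorted(raw)

# The ordered array makes the range (min to max) easy to read off.
minimum, maximum = ordered[0], ordered[-1]
data_range = maximum - minimum

print(ordered)
print(f"min={minimum}, max={maximum}, range={data_range}")
```

With only ten values the sorted list is easy to scan by eye; for a large data set a frequency distribution is usually more helpful.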


II. NUMERICAL DATA
2.2. Organizing numerical data: Frequency distribution
⚫ Sort raw data in ascending order:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41,
43, 44, 46, 53, 58
⚫ Find range: 58 - 12 = 46
⚫ Select number of classes: 5 (usually between 5 and 15)
⚫ Compute class interval (width): 10 (46/5 then round up)
⚫ Determine class boundaries (limits): 10, 20, 30, 40, 50,
60
⚫ Compute class midpoints: 15, 25, 35, 45, 55
⚫ Count observations & assign to classes
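
The steps above can be sketched in Python as follows. Two choices here are assumptions rather than something the slide states: "round up" is taken to mean rounding the width up to the next multiple of 10, and the first boundary is taken as the multiple of the width at or below the minimum. Classes are treated as "lower boundary but less than upper boundary".

```python
import math

# Sorted data from the slide.
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

num_classes = 5
data_range = max(data) - min(data)        # 58 - 12 = 46
raw_width = data_range / num_classes      # 46 / 5 = 9.2

# Assumption: round the width up to the next multiple of 10.
width = math.ceil(raw_width / 10) * 10    # 10

# Assumption: start at the multiple of the width at or below the minimum.
start = (min(data) // width) * width      # 10

boundaries = [start + i * width for i in range(num_classes + 1)]
classes = list(zip(boundaries, boundaries[1:]))
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Count observations with lo <= x < hi ("lo but less than hi").
frequencies = [sum(lo <= x < hi for x in data) for lo, hi in classes]

for (lo, hi), mid, freq in zip(classes, midpoints, frequencies):
    print(f"{lo} but less than {hi}: midpoint {mid}, frequency {freq}")
```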
II. NUMERICAL DATA
2.2. Organizing numerical data: Data from ordered array

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38,
41, 43, 44, 46, 53, 58

Relative Frequency = Frequency / Total, e.g. 0.10 = 2 / 20


II. NUMERICAL DATA
2.2. Organizing numerical data: Cumulative frequency
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20        3           15                3                     15
20 but less than 30        6           30                9                     45
30 but less than 40        5           25               14                     70
40 but less than 50        4           20               18                     90
50 but less than 60        2           10               20                    100
Total                     20          100

Cumulative Percentage = Cumulative Frequency * 100 / Total e.g. 45% = 100*9/20
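
As a small check of the two formulas, the relative and cumulative columns can be recomputed from the class frequencies. This is an illustrative sketch; the list names are our own.

```python
from itertools import accumulate

# Class frequencies for 10-20, 20-30, 30-40, 40-50, 50-60.
frequencies = [3, 6, 5, 4, 2]
total = sum(frequencies)                                  # 20

# Relative Frequency = Frequency / Total
relative = [f / total for f in frequencies]

# Percentage = Frequency * 100 / Total
percentages = [100 * f / total for f in frequencies]

# Cumulative frequency is the running total of the frequencies.
cumulative = list(accumulate(frequencies))

# Cumulative Percentage = Cumulative Frequency * 100 / Total
cumulative_pct = [100 * c / total for c in cumulative]

print(relative)
print(cumulative)
print(cumulative_pct)
```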


II. NUMERICAL DATA

Tables and charts for numerical data

Numerical Data
◦ Ordered Array: Stem-and-Leaf Display
◦ Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive
II. NUMERICAL DATA
2.2. Organizing numerical data
Stem and leaf diagram:
⚫ A simple way to see distribution details in a data set

METHOD: Separate the sorted data series into leading digits (the stem) and the trailing digits (the leaves)
II. NUMERICAL DATA
2.2. Organizing numerical data: example

Data in ordered array:

21, 24, 24, 26, 27, 27, 30, 32, 38, 41

⚫ Here, use the 10’s digit for the stem unit:

                  Stem   Leaf
21 is shown as     2      1
38 is shown as     3      8
41 is shown as     4      1
II. NUMERICAL DATA
2.2. Organizing numerical data
Example

Data in ordered array:

21, 24, 24, 26, 27, 27, 30, 32, 38, 41

⚫ Completed stem and leaf diagram
Stem Leaves
2 1 4 4 6 7 7
3 0 2 8
4 1
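
The same display can be produced programmatically. The sketch below assumes two-digit data, so the 10's digit is the stem and the unit digit is the leaf; the names are illustrative.

```python
from collections import defaultdict

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

# Map each stem (10's digit) to the list of its leaves (unit digits).
stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

# One text line per stem, leaves in ascending order.
lines = [
    f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in sorted(stems.items())
]
print("\n".join(lines))
```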
II. NUMERICAL DATA
2.2. Organizing numerical data
Using other stem units
⚫ Using the 100’s digit as the stem:
◦ Round off the 10’s digit to form the leaves

                       Stem   Leaf
613 would become         6      1
776 would become         7      8
1224 would become       12      2
II. NUMERICAL DATA
Stem and leaf
Using other stem units
⚫ Using the 100’s digit as the stem:
◦ The completed stem-and-leaf display:

Data:
613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891, 894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224

Stem   Leaves
6      1 3 6
7      2 2 5 8
8      3 4 6 6 9 9
9      1 3 3 6 8
10     3 5 6
11     4 7
12     2
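
A possible way to reproduce this display is to round each value to the nearest ten first and then split it into stem and leaf. The rounding rule (halves round up) is an assumption; the slide only says to round off the 10's digit.

```python
data = [613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891,
        894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224]

display = {}
for value in sorted(data):
    # Value expressed in tens, rounded to the nearest ten
    # (assumption: a unit digit of 5 rounds up).
    rounded_tens = (value + 5) // 10
    # Stem = hundreds part, leaf = rounded 10's digit.
    stem, leaf = divmod(rounded_tens, 10)
    display.setdefault(stem, []).append(leaf)

for stem, leaves in sorted(display.items()):
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Rounding before splitting also handles a value such as 896 cleanly: it becomes 900, i.e. stem 9 with leaf 0, rather than a leaf of 10.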
II. NUMERICAL DATA
2.2. Organizing numerical data

Frequency distributions
What is a frequency distribution?
⚫ A frequency distribution is a list or a table …
⚫ Containing class groupings (ranges within which the data fall) ...
⚫ And the corresponding frequencies with which data fall within each grouping or category
II. NUMERICAL DATA
2.2. Organizing numerical data:
Why use a frequency distribution?
⚫ It is a way to summarize numerical data
⚫ It condenses the raw data into a more useful
form...
⚫ It allows for a quick visual interpretation of the
data
⚫ It enables the determination of the major
characteristics of the data set including where
the data are concentrated / clustered
II. NUMERICAL DATA
2.2. Organizing numerical data:
Class intervals and class boundaries

⚫ Each class grouping has the same width
⚫ Determine the width of each interval by

   Width of interval ≅ range / number of desired class groupings

◦ Usually at least 5 but no more than 15 groupings
◦ Class boundaries never overlap
◦ Round up the interval width to get desirable endpoints
II. NUMERICAL DATA
2.3. The histogram
⚫ A graph of the data in a frequency distribution
is called a histogram
⚫ The class boundaries (or class midpoints) are
shown on the horizontal axis
⚫ The vertical axis is either frequency, relative
frequency, or percentage
⚫ Bars of the appropriate heights are used to
represent the number of observations within each
class
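
To make the idea concrete, here is a rough text-only sketch of a histogram for the frequency distribution built in section 2.2. It draws the bars horizontally with `#` characters, which is a simplification of our own; a real histogram would normally be drawn with a plotting tool, with vertical bars and no gaps between them.

```python
# Classes and frequencies from the frequency distribution in section 2.2.
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [3, 6, 5, 4, 2]

rows = []
for (lo, hi), freq in zip(classes, frequencies):
    midpoint = (lo + hi) // 2
    # Bar length is proportional to the number of observations in the class.
    rows.append(f"{midpoint:>3} | {'#' * freq} ({freq})")

print("\n".join(rows))
```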
II. NUMERICAL DATA
2.3. The histogram
Example

Class                  Midpoint   Frequency
10 but less than 20       15          3
20 but less than 30       25          6
30 but less than 40       35          5
40 but less than 50       45          4
50 but less than 60       55          2

[Figure: Histogram: Daily High Temperature. Vertical axis: Frequency (0 to 7); horizontal axis ticks at 5, 15, 25, 35, 45, 55, 65. No gaps between bars.]
II. NUMERICAL DATA

2.4. The frequency polygon

Class                  Class Midpoint   Frequency
10 but less than 20         15              3
20 but less than 30         25              6
30 but less than 40         35              5
40 but less than 50         45              4
50 but less than 60         55              2

[Figure: Frequency Polygon: Daily High Temperature. Vertical axis: Frequency (0 to 7); horizontal axis: Class Midpoints (5 to 65).]

(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class)
II. NUMERICAL DATA
2.4. The ogive (cumulative % polygon)

Class                  Lower class boundary   Cumulative Percentage
Less than 10                    0                      0
10 but less than 20            10                     15
20 but less than 30            20                     45
30 but less than 40            30                     70
40 but less than 50            40                     90
50 but less than 60            50                    100

[Figure: Ogive: Daily High Temperature. Vertical axis: Cumulative Percentage (0 to 100); horizontal axis ticks at 10, 20, 30, 40, 50, 60.]
II. NUMERICAL DATA
Distribution shape
⚫ The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the center
II. NUMERICAL DATA
2.5. Scatter diagrams
⚫ Scatter diagrams are used to examine possible
relationships between two numerical
variables

⚫ The Scatter Diagram:


◦ one variable is measured on the vertical axis
and the other variable is measured on the
horizontal axis
II. NUMERICAL DATA
2.5. Scatter diagram
Example

Volume per day   Cost per day
     23              131
     24              120
     26              140
     29              151
     33              160
     38              167
     41              185
     42              170
     50              188
     55              195
     60              200

[Figure: Cost per Day vs. Production Volume. Vertical axis: Cost per Day (0 to 250); horizontal axis: Volume per Day (0 to 70).]
III. CATEGORICAL DATA

Tables and charts for categorical data

Categorical Data
◦ Tabulating Data: Frequency Distribution Table
◦ Graphing Data: Bar Charts, Pie Charts, Pareto Diagram
III. CATEGORICAL DATA
The summary table
Summarize data by category

Example: Current Investment Portfolio

Investment Type   Amount (in thousands $)   Percentage (%)
Stocks                     46.5                 42.27
Bonds                      32.0                 29.09
CD                         15.5                 14.09
Savings                    16.0                 14.55
Total                     110.0                100.0

(Variables are Categorical)
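
The percentage column is each amount divided by the total. A short sketch using the amounts from the table (the dictionary name is our own) confirms the figures:

```python
# Amounts in thousands of dollars, from the summary table.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(amounts.values())

# Percentage for each category, rounded to two decimals as in the table.
percentages = {
    name: round(100 * amount / total, 2) for name, amount in amounts.items()
}

for name, pct in percentages.items():
    print(f"{name:<8} {amounts[name]:>6.1f} {pct:>6.2f}")
print(f"{'Total':<8} {total:>6.1f}")
```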
III. CATEGORICAL DATA

Bar and pie charts


⚫ Bar charts and pie charts are often used for qualitative data (categories or nominal scale)

⚫ The bar chart visualizes a categorical variable as a series of bars. The length of each bar represents either the frequency or percentage of values for each category. Each bar is separated by a space called a gap

⚫ Height of bar or size of pie slice shows the frequency or percentage for each category
III. CATEGORICAL DATA
3.1. Bar chart example

Current Investment Portfolio

[Figure: bar chart "Investor's Portfolio". Categories: Savings, CD, Bonds, Stocks; horizontal axis: Amount in $1000's (0 to 50).]
III. CATEGORICAL DATA

Pie charts

⚫ The pie chart is a circle broken up into slices that represent categories. The size of each slice of the pie varies according to the percentage in each category.
III. CATEGORICAL DATA
3.2. Pie chart example
Current Investment Portfolio

[Figure: pie chart with slices Stocks 42%, Bonds 29%, Savings 15%, CD 14%. Percentages are rounded to the nearest percent.]
III. CATEGORICAL DATA
3.3. Pareto diagram
⚫ Used to portray categorical data (nominal scale)
⚫ A bar chart, where categories are shown in descending order of frequency
⚫ A cumulative polygon is often shown in the same graph
⚫ Used to separate the “vital few” from the “trivial many”
III. CATEGORICAL DATA
3.3. Pareto diagram
⚫ For example: 400 defective items are examined for cause of defect:

Source of Manufacturing Error   Number of defects
Bad Weld                               34
Poor Alignment                        223
Missing Part                           25
Paint Flaw                             78
Electrical Short                       19
Cracked case                           21
Total                                 400
III. CATEGORICAL DATA
3.3. Pareto diagram
For example: 400 defective items are examined for cause of defect:
⚫ Step 1: Sort by defect cause, in descending order
⚫ Step 2: Determine % in each category

Source of Manufacturing Error   Number of defects   % of Total defects
Poor Alignment                        223                 55.75
Paint Flaw                             78                 19.50
Bad Weld                               34                  8.50
Missing Part                           25                  6.25
Cracked case                           21                  5.25
Electrical Short                       19                  4.75
Total                                 400                100%
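
Steps 1 and 2 can be sketched in Python as below. One assumption is made about the data: Bad Weld is taken as 34 defects, since that is the count for which the six causes sum to the stated total of 400 and which corresponds to the 8.50% figure. The cumulative percentage column is added because a Pareto diagram usually shows a cumulative polygon as well.

```python
# Defect counts by cause (Bad Weld taken as 34 so the counts sum to 400).
defects = {
    "Bad Weld": 34,
    "Poor Alignment": 223,
    "Missing Part": 25,
    "Paint Flaw": 78,
    "Electrical Short": 19,
    "Cracked case": 21,
}
total = sum(defects.values())

# Step 1: sort causes by number of defects, in descending order.
ordered = sorted(defects.items(), key=lambda item: item[1], reverse=True)

# Step 2: percentage per cause, plus the running cumulative percentage.
pareto = []
running = 0
for cause, count in ordered:
    running += count
    pareto.append((cause, count, 100 * count / total, 100 * running / total))

for cause, count, pct, cum_pct in pareto:
    print(f"{cause:<17} {count:>4} {pct:>6.2f} {cum_pct:>7.2f}")
```

In this example the two largest causes, Poor Alignment and Paint Flaw, together account for about three quarters of all defects, which is the "vital few" pattern the diagram is meant to expose.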
III. CATEGORICAL DATA
3.3. Pareto diagram example
⚫ Step 3: Show the results graphically
III. CATEGORICAL DATA
Graphs for Time-series data
⚫ A line chart (time series plot) is used to study patterns in
the values of a variable over time
⚫ Time is measured on the horizontal axis
⚫ The variable of interest is measured on the vertical axis
III. CATEGORICAL DATA
Cross table or frequency table
⚫ Cross Tables (or contingency tables) list the number of
observations for every combination of values for two
categorical or ordinal variables
⚫ If there are r categories for the first variable (rows) and c
categories for the second variable (columns), the table
is called an r x c cross table
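
A cross table can be built by counting each combination of values. The sketch below uses a small made-up sample (hypothetical observations of marital status and voter registration, not data from these slides) to produce a 2 x 2 table.

```python
from collections import Counter

# Hypothetical observations: (marital status, registered to vote?).
observations = [
    ("Married", "Yes"), ("Single", "No"), ("Married", "Yes"),
    ("Single", "Yes"), ("Married", "No"), ("Single", "Yes"),
]

# Count how often each (row value, column value) combination occurs.
counts = Counter(observations)

row_labels = sorted({row for row, _ in observations})   # r categories
col_labels = sorted({col for _, col in observations})   # c categories

# r x c table: one row per first-variable category,
# one column per second-variable category.
table = [[counts[(row, col)] for col in col_labels] for row in row_labels]

print("         " + "  ".join(f"{col:>3}" for col in col_labels))
for row, cells in zip(row_labels, table):
    print(f"{row:<9}" + "  ".join(f"{cell:>3}" for cell in cells))
```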
III. CATEGORICAL DATA
Tabulating and graphing multivariate categorical
data
⚫ Contingency Table for Investment Choices by investors (value in $1000’s)
III. CATEGORICAL DATA
Tabulating and graphing multivariate categorical
data
⚫ Side-by-side bar charts
[Figure: side-by-side bar chart "Comparing Investors". Categories: Savings, CD, Bonds, Stocks; horizontal axis: 0 to 60; series: Investor A, Investor B, Investor C.]
III. CATEGORICAL DATA
Side-by-side chart: Example
⚫ Sales by quarter for three sales territories:
         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East      20.4      27.4      59        20.4
West      30.6      38.6      34.6      31.6
North     45.9      46.9      45        43.9

[Figure: side-by-side bar chart of sales by quarter (1st Qtr to 4th Qtr) for East, West, and North; vertical axis: 0 to 60.]
