14-Pattern of Data

The document discusses various ways to describe patterns of data distribution including the center, spread, shape, and unusual features. It covers different charts and graphs used to visualize distributions such as dot plots, bar charts, histograms, stem-and-leaf plots, box plots, and scatter plots. Tables are also presented as an alternative way to display data. Key aspects like comparing distributions and Simpson's paradox are mentioned.

Uploaded by

Hoàng Lê
Pattern of data

Part 1 – section 4

Lecturer: Le Hoai Long (Ph.D.)


Center
• The center of a distribution is located at the
median of the distribution.
• This is the point where about half of the
observations are on either side.
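
A minimal sketch of this idea, using Python's standard library and a made-up sample (the values are an assumption, not lecture data). It finds the median and counts the observations on each side:

```python
from statistics import median

# Hypothetical sample, not taken from the lecture.
observations = [2, 3, 3, 5, 7, 8, 21]

center = median(observations)
below = sum(1 for x in observations if x < center)
above = sum(1 for x in observations if x > center)

print(center, below, above)
```

The extreme value 21 does not move the median, which is one reason it is a convenient measure of center.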

Spread
• The spread of a distribution refers to the
variability of the data.
• If the observations cover a wide range, the
spread is larger. If the observations are
clustered around a single value, the spread is
smaller.
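
As a rough illustration, spread can be measured by the range or the standard deviation. Both samples below are invented for this sketch, and `value_range` is a hypothetical helper name:

```python
from statistics import pstdev

# Two hypothetical samples with the same center but different spread.
wide = [1, 5, 10, 15, 19]
narrow = [9, 10, 10, 10, 11]

def value_range(data):
    """Distance between the largest and smallest observation."""
    return max(data) - min(data)

for name, data in (("wide", wide), ("narrow", narrow)):
    print(name, value_range(data), round(pstdev(data), 2))
```

Both measures come out larger for the sample that covers the wider range.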

Shape
• The shape of a distribution is described by the
following characteristics.
– Symmetry
– Number of peaks. Distributions can have few
or many peaks.
• Distributions with one clear peak are called
unimodal.
• Distributions with two clear peaks are
called bimodal.

Shape
• And by the following characteristics.
– Skewness. Distributions with most of their
observations on the left (toward lower values)
are said to be skewed right; distributions with
most of their observations on the right (toward
higher values) are said to be skewed left.
– Uniform. When the observations in a set of
data are equally spread across the range of the
distribution, the distribution is called a uniform
distribution.
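
One possible way to put a number on skewness is the moment coefficient (third central moment divided by the 1.5 power of the second). This is a sketch under that assumption; the slides do not prescribe a formula, and the samples are invented:

```python
def skewness(data):
    """Moment coefficient of skewness: positive = skewed right, negative = skewed left."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical samples.
right_skewed = [1, 1, 2, 2, 3, 10]   # most values low, long tail to the right
left_skewed = [1, 8, 9, 9, 10, 10]   # most values high, long tail to the left
symmetric = [1, 2, 3, 4, 5]

for sample in (right_skewed, left_skewed, symmetric):
    print(round(skewness(sample), 3))
```

The sign follows the naming convention above: the direction of the skew is the direction of the long tail, not of the bulk of the data.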

Shape

Gap and outlier
• Gaps: areas of a distribution where there are no
observations.
• Outliers: extreme values that differ greatly from the
other observations.
Chart and graph
Dotplot
• A dotplot is made up of dots plotted on a graph.
– Each dot can represent a single observation or a
specified number of observations.
– The dots are stacked in a column over a category.
– If the categories are quantitative, the pattern of data
in a dotplot can be described in terms of symmetry
and skewness.
• Dotplots are used most often to plot frequency
counts within a small number of categories,
usually with small sets of data.
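
A text-only sketch of a dotplot, separate from the SPSS workflow. The helper `text_dotplot` is hypothetical, and the plot is rotated so that each category gets a row instead of a column:

```python
from collections import Counter

def text_dotplot(values):
    """One row per category; each dot stands for a single observation."""
    counts = Counter(values)
    lines = []
    for category in sorted(counts):
        lines.append(f"{category:>3} | " + "." * counts[category])
    return "\n".join(lines)

# Hypothetical small data set.
print(text_dotplot([1, 2, 2, 3, 3, 3]))
```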

Dotplot
• In SPSS:
1. Graphs
2. Legacy dialogs
3. Scatter/Dot

Chart and graph
Bar Charts
• A bar chart is made up of columns plotted on
a graph.
– The columns are positioned over a label that
represents a categorical variable.
– The height of the column indicates the size of the
group defined by the column label.

Chart and graph
Histograms
• Like a bar chart, a histogram is made up of
columns plotted on a graph. Usually, there is no
space between adjacent columns.
– The columns are positioned over a label that
represents a quantitative variable.
– The column label can be a single value or a range of
values.
– The height of the column indicates the size of the
group defined by the column label.
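
The grouping behind a histogram can be sketched as follows. The function name, the bin settings and the ages are assumptions made for illustration; each column covers a range of values of equal width:

```python
def histogram_counts(data, start, width, bins):
    """Count observations in `bins` equal-width ranges beginning at `start`."""
    counts = [0] * bins
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < bins:  # values outside the covered range are ignored
            counts[index] += 1
    return counts

# Hypothetical ages grouped into 20-29, 30-39, 40-49 and 50-59.
ages = [21, 23, 25, 31, 34, 38, 39, 42, 47, 58]
counts = histogram_counts(ages, start=20, width=10, bins=4)
print(counts)
```

The height of each column in the histogram would be the matching entry of `counts`.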

Bar chart and histogram
• In SPSS: Graphs => Legacy dialogs => Bar
(Histogram)

Chart and graph
Difference Between Bar Charts and Histograms
• With bar charts, each column represents a
group defined by a categorical variable; and
with histograms, each column represents a
group defined by a quantitative variable.
• It is always appropriate to talk about the
skewness of a histogram. And how about bar
charts? Their categories can be listed in any
order, so skewness is not meaningful for them.

Chart and graph
Stemplots
• A stemplot is used to display quantitative
data, generally from small data sets (50 or
fewer observations).
• The entries on the left are called stems, and
the entries on the right are called leaves.
• Stemplots usually do not include explicit
labels for the stems and leaves.
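
A minimal sketch of building a stemplot by hand. It assumes non-negative whole numbers, with the tens digit as the stem and the units digit as the leaf; `stemplot` is a hypothetical helper:

```python
from collections import defaultdict

def stemplot(values):
    """Return a text stemplot for non-negative integers (stem = tens, leaf = units)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(str(leaf))
    lines = []
    # Keep empty stems so that gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(lines)

# Hypothetical data with no observations in the twenties.
print(stemplot([12, 15, 34]))
```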
Stemplot (Stem and leaf)

Chart and graph
Boxplot Basics
• A boxplot splits the data set into quartiles. The body of
the boxplot consists of a "box" which goes from the
first quartile (Q1) to the third quartile (Q3).
• Within the box, a vertical line is drawn at Q2, the
median of the data set.
• Two horizontal lines, called whiskers, extend from the
front and back of the box. The front whisker goes from
Q1 to the smallest non-outlier in the data set, and the
back whisker goes from Q3 to the largest non-outlier.
• If the data set includes one or more outliers, they are
plotted separately as points on the chart.
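
The numbers behind a boxplot can be sketched as below. Two assumptions are not stated in the slides: outliers are taken to be points more than 1.5 × IQR beyond the quartiles (a common convention), and quartiles use the "inclusive" method of `statistics.quantiles`. SPSS may compute quartiles slightly differently.

```python
from statistics import quantiles

def boxplot_summary(data):
    """Quartiles, whisker ends and outliers under the 1.5 * IQR convention."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1, "median": q2, "q3": q3,
        "whisker_low": min(inside),    # smallest non-outlier
        "whisker_high": max(inside),   # largest non-outlier
        "outliers": outliers,
    }

# Hypothetical data with one extreme value.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

Here the back whisker stops at 9 and the value 30 is reported separately as an outlier.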
Boxplot
• In SPSS: Graphs => Legacy dialogs => Boxplot

Chart and graph
Scatterplot
• A scatterplot is a graphic tool used to display
the relationship between two quantitative
variables.
• A scatterplot consists of an X axis (the
horizontal axis), a Y axis (the vertical axis), and
a series of dots.
• Each dot on the scatterplot represents one
observation from a data set.
Chart and graph
Scatterplot
• Scatterplots are used to analyze patterns in
bivariate data.
• These patterns are described in terms of
linearity, slope, and strength.
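
Slope and strength can be estimated numerically. The sketch below assumes the least-squares slope as the measure of slope and Pearson's correlation coefficient r as the measure of strength; the slides name neither, and the data are invented:

```python
from math import sqrt

def slope_and_strength(xs, ys):
    """Least-squares slope and Pearson correlation r for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Hypothetical perfectly linear data: y doubles x.
slope, r = slope_and_strength([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(slope, r)
```

A positive slope means the dots rise from left to right. The closer r is to +1 or -1, the stronger the linear pattern; a curved pattern can give a low r even when the relationship is strong.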

Scatter plot

Compare distributions
• Focus on four features:
– Center.
– Spread.
– Shape.
– Unusual features.

Table
• Alternatively, data can be presented in table
form
– One-way table
– Two-way table

Table
• A one-way table is the tabular equivalent of a bar
chart. Like a bar chart, a one-way table displays
categorical data in the form of frequency counts
and/or relative frequencies.
– Frequency tables: a one-way table that shows frequency
counts for each category of a categorical
variable.
– Relative frequency tables: a one-way table that shows
relative frequencies for each category of a
categorical variable.
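
A one-way table can be sketched with a counter. The contract types below are invented sample data and `one_way_table` is a hypothetical helper:

```python
from collections import Counter

def one_way_table(values):
    """Map each category to (frequency count, relative frequency)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (count, count / total) for category, count in counts.items()}

# Hypothetical categorical data.
contract_types = ["civil", "civil", "industrial", "civil"]
table = one_way_table(contract_types)

for category, (count, share) in table.items():
    print(f"{category:<12}{count:>3}{share:>8.0%}")
```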
Table
• A two-way table (also called a contingency
table) is a useful tool for examining
relationships between categorical variables.
The entries in the cells of a two-way table can
be frequency counts or relative frequencies,
just like a one-way table.
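
A two-way table of frequency counts can be sketched by counting pairs of categories. The observations and the helper name `two_way_table` are assumptions made for illustration:

```python
from collections import Counter

def two_way_table(pairs):
    """Cell counts plus row and column totals for (row, column) observations."""
    cells = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    return cells, row_totals, col_totals

# Hypothetical observations: (contractor, type of contract).
pairs = [("A", "Civil"), ("A", "Industrial"), ("B", "Civil"), ("A", "Civil")]
cells, row_totals, col_totals = two_way_table(pairs)

print(cells[("A", "Civil")], row_totals["A"], col_totals["Civil"])
```

Dividing each cell by the grand total, or by its row or column total, turns the counts into relative frequencies.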

Table

Be careful,
Simpson’s paradox
• Simpson's paradox (or the Yule-Simpson
effect) is a paradox in which a correlation
present in different groups is reversed when
the groups are combined.
• It occurs when frequency data are hastily
given causal interpretations.
• Simpson's Paradox disappears when causal
relations are brought into consideration
(Wikipedia)

Be careful,
Simpson’s paradox
• Consider two contractors, with results shown in the table
below as good-quality contracts / number of contracts.
• Who is better? (Long N.D. 2010)

               Type of contract
               Civil    Industrial  Total
Contractor A   40/60    13/15       53/75
               66.7%    86.7%       70.7%
Contractor B   5/8      42/50       47/58
               62.5%    84.0%       81.0%
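
The reversal in the table can be checked with a short sketch. The counts are those from the table; the variable and function names are chosen only for illustration:

```python
# (good-quality contracts, number of contracts) from the table above.
results = {
    "A": {"Civil": (40, 60), "Industrial": (13, 15)},
    "B": {"Civil": (5, 8), "Industrial": (42, 50)},
}

def rate(good, total):
    """Share of good-quality contracts."""
    return good / total

def overall(contractor):
    """Pool both contract types for one contractor."""
    good = sum(g for g, _ in results[contractor].values())
    total = sum(t for _, t in results[contractor].values())
    return good, total

for kind in ("Civil", "Industrial"):
    print(kind, round(rate(*results["A"][kind]), 3), round(rate(*results["B"][kind]), 3))
print("Total", round(rate(*overall("A")), 3), round(rate(*overall("B")), 3))
```

Contractor A has the higher rate within each contract type, yet Contractor B has the higher rate overall. Most of A's contracts are civil, where both contractors do worse, while most of B's are industrial.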