Chapter 1 Data Analysis Making Sense of Data(1)
Discrete
2. Descriptive analysis of categorical variable
The distribution of a categorical variable lists the categories and gives either
I. the count (frequency), or
II. the percent (relative frequency) of individuals in each category.
Pie charts
• A pie chart is a circular chart which uses slices or sectors of the circle to
show categorical data.
• Must have
1. a title
2. number, scale or axis
3. categories displayed
4. Pie
Marginal distribution
Conditional Distribution
Relationships Between Categorical Variables
We could have used a segmented bar graph to compare the distributions
of male and female responses in the previous example.
• The distribution of a categorical variable lists the categories and gives the count
(frequency) or percent (relative frequency) of individuals
• Pie charts and bar graphs display the distribution of a categorical variable
• A two-way table of counts organizes data about two categorical variables measured
for the same set of individuals.
• Marginal and conditional distributions are often used to analyze the relationship
between two categorical variables.
• There is an association between two variables if knowing the value of one variable
helps predict the value of the other.
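The summary points on two-way tables, marginal distributions and conditional distributions can be sketched in a few lines of Python. The counts below are made up purely for illustration (they are not from the slides), and the function names are hypothetical.

```python
# Hypothetical two-way table of counts (made-up numbers for illustration only):
# rows are one categorical variable (gender), columns are another (response).
table = {
    "Female": {"Yes": 30, "No": 20},
    "Male": {"Yes": 10, "No": 40},
}


def marginal_distribution(table):
    """Share of all individuals in each column category, ignoring the row variable."""
    total = sum(sum(row.values()) for row in table.values())
    columns = next(iter(table.values())).keys()
    return {c: sum(row[c] for row in table.values()) / total for c in columns}


def conditional_distributions(table):
    """Distribution of the column variable within each row category."""
    return {
        r: {c: count / sum(row.values()) for c, count in row.items()}
        for r, row in table.items()
    }


marginal = marginal_distribution(table)
conditional = conditional_distributions(table)
print(marginal)
print(conditional)
```

In this made-up table the conditional distributions differ between the two rows, so knowing the row category helps predict the response, which is what the last summary point means by an association.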
2. Descriptive analysis of categorical variable -Exercise
3. Display of quantitative data with Graphs
• Here are data on the number of goals scored by the team in the 12
months prior to the 2012 Olympics.
Histograms are for quantitative data; bar graphs are for categorical data.
2. Class widths/interval
3. Frequency/Frequency density
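The slide text names frequency density without defining it. A minimal sketch, assuming the standard definition (frequency divided by class width) and using hypothetical class boundaries and frequencies:

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency) for each class.
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 8)]


def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width (assumed standard definition)."""
    return frequency / (upper - lower)


densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]
print(densities)
```

When class widths are unequal, plotting frequency density as the bar height keeps the area of each bar proportional to its frequency.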
• In any graph, look for the overall pattern and for striking departures
from that pattern.
1. Centre tendency
2. Spread
3. Shape
4. Outliers
1. Centre tendency
2. Spread
3. Shape
4. Outliers
High/low outliers
1. Centre tendency
2. Spread
3. Shape
4. Outliers
Compare shape:
e.g. The distribution of household size for the U.K. sample is roughly
symmetric and unimodal, while the distribution for the South Africa
sample is skewed to the right and unimodal.
Compare outliers
For data:
4 6 6 6 7 14 8
Mode = 6 (most frequent value)
Mean = 51/7 ≈ 7.29 (numerical average)
$\text{Mean} = E(X) = \dfrac{\sum xf}{\sum f}$ (grouped data) $= \dfrac{\sum xp}{\sum p}$ (random data)
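As a quick check of the mode and mean for the data 4 6 6 6 7 14 8, here is a minimal sketch using Python's standard `statistics` module. The variable names are illustrative, and the grouped calculation simply applies the Σxf / Σf form to the frequencies of the same data.

```python
from collections import Counter
from statistics import mean, mode

data = [4, 6, 6, 6, 7, 14, 8]

data_mode = mode(data)  # most frequent value
data_mean = mean(data)  # sum of the values divided by how many there are

# Grouped form: mean = sum(x * f) / sum(f), where f is the frequency of value x.
freq = Counter(data)
grouped_mean = sum(x * f for x, f in freq.items()) / sum(freq.values())

print(data_mode, data_mean, grouped_mean)
```

Both ways of computing the mean agree, because the grouped formula is just the ordinary average with repeated values collected together.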
4. Display of quantitative data with Numbers
• Range:
[Mean: ]
• Variance:
If your data set consists of the entire population, then it’s appropriate to
use the population formula (σ, dividing by n). Most often, the data we’re
examining come from a sample, so we use the sample formula (s, dividing by n − 1).
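A minimal sketch of the population versus sample distinction, assuming the usual convention that a population variance divides by n while a sample variance divides by n − 1. It reuses the data set 4 6 6 6 7 14 8 from earlier and the standard `statistics` module.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [4, 6, 6, 6, 7, 14, 8]

pop_var = pvariance(data)  # treats data as the whole population: divides by n
samp_var = variance(data)  # treats data as a sample: divides by n - 1

pop_sd = pstdev(data)      # square root of the population variance
samp_sd = stdev(data)      # square root of the sample variance

print(pop_var, samp_var)
print(pop_sd, samp_sd)
```

The sample versions are always slightly larger than the population versions for the same numbers, and the gap shrinks as the number of observations grows.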
• Percentiles
• Z-scores
• Quartiles
• Two very important percentiles are the lower and upper quartiles.
These lie 25% and 75% of the way through the data respectively.
• If the position does not turn out to be a whole number, you simply
find the mean of the pair of numbers on either side.
• For each of the following sets of data calculate the median, upper and
lower quartiles.
• In each case calculate the interquartile range.
I. 13 12 8 6 11 14 8 5 1 10 16 12
II. 14 10 8 19 15 14 9
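One way to check answers to this exercise is the sketch below. It assumes the median-of-halves convention for quartiles (the overall median is left out of both halves when the number of values is odd). Other conventions exist and can give different quartiles for other data sets, and the function name `quartiles` is illustrative only.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Assumption: when the number of values is odd, the overall median is
    left out of both halves. Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]   # values below the median position
    upper = ordered[-half:]  # values above the median position
    return median(lower), median(ordered), median(upper)


set_i = [13, 12, 8, 6, 11, 14, 8, 5, 1, 10, 16, 12]
set_ii = [14, 10, 8, 19, 15, 14, 9]

for name, values in [("I", set_i), ("II", set_ii)]:
    q1, med, q3 = quartiles(values)
    print(name, "median:", med, "Q1:", q1, "Q3:", q3, "IQR:", q3 - q1)
```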
• As with the range, the interquartile range (IQR) measures how spread out,
or how consistent, the data are.
• If one set of data has a smaller IQR than another set, then the first set is
more consistent and less spread out. This can be a useful comparison tool.
The 1.5 × IQR rule for outliers
Call an observation an outlier if it falls more than 1.5 × IQR above the third
quartile or below the first quartile, that is, above Q3 + 1.5 × IQR or
below Q1 − 1.5 × IQR.
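The rule can be sketched in Python as follows, applied to the data set 4 6 6 6 7 14 8 listed earlier. The quartiles are computed with the median-of-halves convention, which is an assumption (other quartile conventions may move the fences slightly), and `iqr_outliers` is a hypothetical helper name.

```python
from statistics import median


def iqr_outliers(values):
    """Return the values flagged as outliers by the 1.5 x IQR rule.

    Assumption: quartiles use the median-of-halves convention, leaving the
    overall median out of both halves when the number of values is odd.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in ordered if x < low_fence or x > high_fence]


data = [4, 6, 6, 6, 7, 14, 8]
print(iqr_outliers(data))
```

For this data set Q1 = 6 and Q3 = 8, so the IQR is 2 and the fences sit at 3 and 11. Only the value 14 lies outside them.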
• Try to explain the following terms in your own words. You may use
descriptions, examples, or comparisons.