DOM105 Session 1
Reading: SfM Ch.2,3
Categorical and Numerical Data
Categorical data is data that is separated
into various groupings or categories for
display.
Takes the form of tables, bar charts, pie
charts, etc.
Numerical data consists of numbers that
have not been separated into categories.
Displays of numerical data include arrays,
frequency distributions, scatter plots, etc.
Both types of data can be displayed using
some types of tables such as Pivot Tables.
Summary Table
Tallies the values in each category as frequencies and
percentages.
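A summary table can be sketched in a few lines of Python. The `holdings` list and the `summary_table` helper below are hypothetical, made up for illustration; they are not data or code from the course.

```python
from collections import Counter

# Hypothetical categorical observations (not from the slides).
holdings = ["Stocks", "Bonds", "Stocks", "CD", "Savings", "Stocks", "Bonds", "CD"]

def summary_table(values):
    """Return {category: (frequency, percentage)} for a list of labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (freq, 100 * freq / total) for cat, freq in counts.items()}

table = summary_table(holdings)
for category, (freq, pct) in table.items():
    print(f"{category:<8} {freq:>3} {pct:6.1f}%")
```

Each row pairs a category with its count and its share of the total, which is the same information a bar chart or pie chart would display.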
Contingency Table
Cross-tabulates, or tallies jointly, the values
of two or more categorical variables,
allowing the study of patterns. Cells can hold
frequencies or percentages.
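As a rough illustration, a two-variable contingency table can be built by counting pairs of values. The `records` data (fund type and risk level) and the `contingency_table` helper are assumptions invented for this sketch.

```python
from collections import Counter

# Hypothetical records: (fund type, risk level). Both variables are made up.
records = [
    ("Growth", "High"), ("Growth", "Low"), ("Value", "Low"),
    ("Growth", "High"), ("Value", "High"), ("Value", "Low"),
]

def contingency_table(pairs):
    """Jointly tally two categorical variables into a nested dict of counts."""
    joint = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, {r: {c: joint[(r, c)] for c in cols} for r in rows}

rows, cols, table = contingency_table(records)
print(" " * 8 + "".join(f"{c:>6}" for c in cols))
for r in rows:
    print(f"{r:<8}" + "".join(f"{table[r][c]:>6}" for c in cols))
```

Dividing each cell by the grand total, a row total, or a column total would turn the frequencies into percentages.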
Display Categorical Data – Bar Chart
[Figure: bar chart titled "Investor's Portfolio". Categories: Savings, CD, Bonds, Stocks. Horizontal axis: Amount in K$, 0 to 50.]
Pie Chart – Investor’s portfolio
[Figure: pie chart of the investor's portfolio. Visible labels: Savings 15%, Stocks 42%, CD 14%.]
Comparing Investors
[Figure: bar chart comparing investors across Savings, CD, Bonds, and Stocks. Horizontal axis: 0 to 60.]
Class              Frequency   Relative Frequency   Percentage
10 but under 20        3             .15                15
20 but under 30        6             .30                30
30 but under 40        5             .25                25
40 but under 50        4             .20                20
50 but under 60        2             .10                10
Total                 20            1                  100
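The relative frequency and percentage columns follow directly from the class frequencies. A minimal sketch, using the frequencies from the table above (the variable names are my own):

```python
# Class labels and frequencies copied from the frequency table above.
classes = [
    "10 but under 20", "20 but under 30", "30 but under 40",
    "40 but under 50", "50 but under 60",
]
frequencies = [3, 6, 5, 4, 2]

total = sum(frequencies)
relative = [f / total for f in frequencies]           # share of the total
percentages = [100 * f / total for f in frequencies]  # same share, in percent

for label, f, r, p in zip(classes, frequencies, relative, percentages):
    print(f"{label:<16} {f:>3} {r:6.2f} {p:5.0f}")
print(f"{'Total':<16} {total:>3} {sum(relative):6.2f} {sum(percentages):5.0f}")
```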
© 2002 Prentice-Hall, Inc.
Graphing Numerical Data:
The Histogram
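[Figure: histogram of the frequency distribution above. Vertical axis: Frequency, 0 to 7. Horizontal axis: class midpoints 5, 15, 25, 35, 45, 55, More, with class boundaries marked. Annotation: no gaps between bars.]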
Tabulating Numerical Data:
Cumulative Frequency
Class              Cumulative Frequency   Cumulative % Frequency
10 but under 20             3                      15
20 but under 30             9                      45
30 but under 40            14                      70
40 but under 50            18                      90
50 but under 60            20                     100
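Each cumulative frequency is the running total of the class frequencies up to and including that class. A short sketch, assuming the class frequencies 3, 6, 5, 4, 2 from the earlier frequency table:

```python
from itertools import accumulate

# Class frequencies for the five classes from "10 but under 20" to "50 but under 60".
frequencies = [3, 6, 5, 4, 2]

# Running totals give the cumulative frequency column.
cumulative = list(accumulate(frequencies))

# Dividing by the grand total gives the cumulative percentage column.
cumulative_pct = [100 * c / cumulative[-1] for c in cumulative]

for c, p in zip(cumulative, cumulative_pct):
    print(f"{c:>3} {p:5.0f}")
```

The cumulative percentages are the values an ogive plots against the upper class boundaries.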
Graphing Numerical Data:
The Ogive (Cumulative % Polygon)
[Figure: ogive. Vertical axis: cumulative percentage, 0 to 100. Horizontal axis: 10 to 60.]
Sample mean:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
n is the size of the sample.
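The mean can be checked with a small example. The `sample` values and the `sample_mean` helper are hypothetical, chosen only to illustrate the arithmetic.

```python
import statistics

# Hypothetical sample (not from the slides).
sample = [4, 6, 8, 10, 12]

def sample_mean(xs):
    """Sum of the observations divided by the sample size n."""
    return sum(xs) / len(xs)

mean_by_hand = sample_mean(sample)
mean_stdlib = statistics.mean(sample)  # standard-library equivalent
print(mean_by_hand, mean_stdlib)
```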
Median
3rd Quartile splits the lowest 75% of the values from the rest.
Sample variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$, where $\bar{X}$ is the sample mean.
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$, where µ is the population
mean.
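The two variance formulas differ only in the divisor. A minimal sketch, using a hypothetical data set and treating it first as a sample (divide by n - 1) and then as a whole population (divide by N); the `sum_squared_deviations` helper is my own:

```python
import statistics

# Hypothetical data (not from the slides).
data = [4, 6, 8, 10, 12]

def sum_squared_deviations(xs):
    """Sum of squared deviations of each value from the mean of xs."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

# Treating the data as a sample: divide by n - 1.
sample_variance = sum_squared_deviations(data) / (len(data) - 1)

# Treating the data as the whole population: divide by N.
population_variance = sum_squared_deviations(data) / len(data)

print(sample_variance, statistics.variance(data))       # sample versions
print(population_variance, statistics.pvariance(data))  # population versions
```

The standard library's `statistics.variance` and `statistics.pvariance` implement the sample and population versions respectively, so they make a convenient cross-check.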
Standard Deviation
Most important measure of variation
Shows variation about the mean
Has the same units as the original data
Is the square root of the variance
Sample standard deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
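Because the standard deviation is the square root of the variance, it can be computed in one extra step. The data below are hypothetical; `statistics.stdev` and `statistics.pstdev` are the standard-library sample and population versions.

```python
import math
import statistics

# Hypothetical data (not from the slides), e.g. amounts in K$.
data = [4, 6, 8, 10, 12]

# Square root of the sample variance (divisor n - 1).
sample_sd = math.sqrt(statistics.variance(data))

# Square root of the population variance (divisor N).
population_sd = math.sqrt(statistics.pvariance(data))

print(round(sample_sd, 3), round(statistics.stdev(data), 3))
print(round(population_sd, 3), round(statistics.pstdev(data), 3))
```

If the data are in K$, the variance is in squared K$ while the standard deviation is back in K$, which is why it is easier to interpret.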
Why do we divide by (n-1) for sample variance?
For the sample variance to be unbiased, the average of the sample
variances over all possible samples from a given population has to
equal the population variance.
It can be shown mathematically that if the sample variance is
calculated using n instead of n-1, the average over all
possible samples falls below the population variance.
Dividing by n-1 instead of n is called Bessel's correction.
It is used only when the population mean and variance are unknown.
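The argument can be checked by brute force on a tiny case. The sketch below assumes a hypothetical population of three values and enumerates every ordered sample of size 2 drawn with replacement (an assumption of this example; the slides do not specify the sampling scheme). Exact fractions avoid rounding noise.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population and sample size (not from the slides).
population = [1, 2, 3]
n = 2

def variance(xs, divisor):
    """Sum of squared deviations from the mean of xs, over the given divisor."""
    mean = Fraction(sum(xs), len(xs))
    return sum((x - mean) ** 2 for x in xs) / divisor

# Population variance: divide by N.
pop_var = variance(population, len(population))

# Every ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Average sample variance using n - 1 versus n as the divisor.
avg_with_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)
avg_with_n = sum(variance(s, n) for s in samples) / len(samples)

print(pop_var, avg_with_n_minus_1, avg_with_n)
```

In this small case the n-1 version averages to exactly the population variance, while the n version averages to half of it, which is the bias the correction removes.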
Shape of a Distribution
Describes how data is distributed
Measures of shape
Symmetric or skewed
[Figure: boxplot on an axis from 4 to 12, marking X_smallest, Q1, Median (Q2), Q3, and X_largest.]
Distribution Shape and the Boxplot
[Figure: three boxplots, each marking Q1, Q2, and Q3, showing different distribution shapes.]
Measuring skewness
Skewness is a measure of asymmetry in a data
distribution. One method of calculating it is the adjusted
Fisher-Pearson coefficient, as follows: