Dr. Cemre Erciyes
Frequency table of sex
Female
Male
Total 10 1 10
Frequency table of Year of birth
Year of birth | Frequency | Relative Frequency | Cumulative Frequency | Percentage
Frequency table of Grouped CGPA
Grouped CGPA | Frequency | Relative Frequency | Cumulative Frequency | Percentage | Cumulative Percentage
2.50<
2.01-2.50
1.51-2.00
1.01-1.50
<1.01
Total
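The columns of the frequency tables above can be filled in mechanically once the raw data are listed. A minimal Python sketch, assuming a hypothetical list of ten years of birth (the class data are not given in these slides); the helper name `frequency_table` is made up for this illustration:

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, frequency, relative frequency,
    cumulative frequency, percentage, cumulative percentage)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        freq = counts[value]
        cumulative += freq
        rows.append((value, freq, freq / n, cumulative,
                     100 * freq / n, 100 * cumulative / n))
    return rows

# Hypothetical years of birth for 10 students (illustration only).
years = [1995, 1996, 1996, 1997, 1995, 1996, 1998, 1997, 1996, 1995]
table = frequency_table(years)
for row in table:
    print("{:<6} {:>4} {:>6.2f} {:>4} {:>6.1f}% {:>6.1f}%".format(*row))
```

The last row always has a cumulative frequency equal to the sample size and a cumulative percentage of 100, which is a quick check on a hand-made table.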
Histogram or Bar chart of sex?
Histogram or Bar chart of Year of Birth
Histogram or bar chart of CGPA?
Stem and Leaf Graph of Year of Birth
Stem and leaf Graph of CGPA
Descriptive statistics for sex
Mean
Median
Mode
Lower quartile
Upper quartile
30th percentile
60th percentile
The range
Descriptive statistics for year of birth
Mean
Median
Mode
Lower quartile
Upper quartile
30th percentile
60th percentile
The range
Descriptive statistics for CGPA
Mean
Median
Mode
Lower quartile
Upper quartile
30th percentile
60th percentile
The range
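The quartiles, percentiles and range asked for above can be sketched in Python as follows. The CGPA values are hypothetical (the class data are not part of these slides), and the "inclusive" interpolation method is only one of several conventions for percentiles, so hand calculations with another rule may differ slightly:

```python
import statistics

# Hypothetical CGPA values for 10 students (illustration only).
cgpa = [1.20, 1.45, 1.60, 1.85, 2.00, 2.10, 2.30, 2.45, 2.60, 2.90]

# quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(cgpa, n=4, method="inclusive")

# quantiles(n=10) returns the 10th, 20th, ..., 90th percentiles.
deciles = statistics.quantiles(cgpa, n=10, method="inclusive")
p30, p60 = deciles[2], deciles[5]

# Range as the width between the highest and lowest values.
data_range = max(cgpa) - min(cgpa)

print(q1, q3, p30, p60, data_range)
```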
Today
When to use which measure of central
tendency?
Normal distribution (introduction)
Measures of variability (Range, Variance,
Std. Dev., Empirical Rule)
Properties of distributions
Weighted mean
Measures of Central Tendency
The best way to reduce a set of data and
still retain its information is to summarize it
with a single value.
Measures of central tendency—mean,
median, and mode—can help you capture,
with a single number, what is typical of a
data set.
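As an illustration, the three measures can be computed with Python's standard `statistics` module. This is a minimal sketch that reuses the shoe counts reported by 25 students in the example at the end of these slides:

```python
import statistics

# Number of shoes reported by 25 students (data from the shoe example).
shoes = [30, 11, 12, 20, 14, 12, 15, 8, 6, 8, 10, 15, 25, 6, 35, 20, 20, 20, 5,
         7, 5, 5, 5, 25, 15]

mean = statistics.fmean(shoes)        # arithmetic average
median = statistics.median(shoes)     # middle value of the sorted data
modes = statistics.multimode(shoes)   # every value tied for most frequent

print(mean, median, modes)
```

Here the data have two modes (5 and 20 each occur four times), and the mean is larger than the median because a few students report many more shoes than the rest.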
Properties of the mean
The formula for the mean uses numerical values for the observations.
Mean is appropriate only for quantitative variables.
It is not sensible to compute the mean for observations on a nominal scale.
For instance, for religion measured with categories such as (Protestant, Catholic,
Jewish, Other), the mean religion does not make sense, even though these levels
may sometimes be coded by numbers for convenience.
Similarly, we cannot find the mean of observations on an ordinal rating such as
excellent, good, fair, and poor, unless we assign numbers such as 4, 3, 2, 1 to the
ordered levels, treating it as quantitative.
The mean can be highly influenced by an observation that falls well above or well
below the bulk of the data, called an outlier.
The mean is pulled in the direction of the longer tail of a skewed distribution,
relative to most of the data.
The mean is the point of balance on the number line when an equal weight is at
each observation point.
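Two of these properties are easy to check numerically: the deviations from the mean cancel out (the balance point), and a single outlier pulls the mean toward itself. A small sketch with hypothetical values chosen only for illustration:

```python
import statistics

# Hypothetical observations (illustration only).
data = [2, 4, 6, 8, 10]
mean = statistics.fmean(data)

# Balance point: deviations above and below the mean sum to zero.
deviations = [x - mean for x in data]
balance = sum(deviations)

# Replace the largest value with a hypothetical outlier.
with_outlier = [2, 4, 6, 8, 100]
mean_with_outlier = statistics.fmean(with_outlier)

print(mean, balance, mean_with_outlier)
```

With the outlier, the mean is larger than four of the five observations, so it no longer describes a typical value.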
Properties of the median
If there is an even number of scores, you can either take the arithmetic average of the
two middle scores or calculate the 50th percentile.
The median, like the mean, is appropriate for quantitative variables. Since it requires
only ordered observations to compute it, it is also valid for ordinal-scale data, as the
previous example showed. It is not appropriate for nominal-scale data, since the
observations cannot be ordered.
For symmetric distributions, the median and the mean are identical.
For skewed distributions, the mean lies toward the direction of skew (the longer
tail) relative to the median.
The median is insensitive to the distances of the observations from the middle, since
it uses only the ordinal characteristics of the data. For example, the following four
sets of observations all have medians of 10:
Set 1: 8, 9, 10, 11, 12
Set 2: 8, 9, 10, 11, 100
Set 3: 0, 9, 10, 10, 10
Set 4: 8, 9, 10, 100, 100
The median is not affected by outliers. For instance, the incomes of the seven
employees in Example 3.5 have a median of $12,200 whether the largest observation
is $20,000, $215,000, or $2,000,000.
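The four sets above can be checked directly. A minimal sketch (reading the first value of Set 3 as 0) that compares the median with the mean for each set:

```python
import statistics

sets = {
    "Set 1": [8, 9, 10, 11, 12],
    "Set 2": [8, 9, 10, 11, 100],
    "Set 3": [0, 9, 10, 10, 10],
    "Set 4": [8, 9, 10, 100, 100],
}

medians = {name: statistics.median(values) for name, values in sets.items()}
means = {name: statistics.fmean(values) for name, values in sets.items()}

for name in sets:
    print(name, "median =", medians[name], "mean =", means[name])
```

The median stays at 10 in every set, while the mean moves toward whichever side holds the extreme values.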
Normal distribution
In a normal distribution, mean, median and mode are
identical in value.
Normal distribution
Often just called the bell-curve or bell-shaped curve.
Most of the scores in this graph accumulate around the
middle.
The mean, median and mode are all equal, and the
scores at either end of the distribution occur less often.
For example, a curve representing the results of an
intelligence test would have the largest number of people
in the middle, around the 'average' intelligence range.
The number of people decreases as the scores
get farther away on either side of the average, giving the
curve its shape and name.
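These properties can be seen with `statistics.NormalDist` from the Python standard library. The scale used here (mean 100, standard deviation 15) is an assumed example, not a value taken from these slides:

```python
from statistics import NormalDist

# Hypothetical test-score scale: mean 100, standard deviation 15 (assumed values).
scores = NormalDist(mu=100, sigma=15)

# For a normal distribution the three measures of central tendency coincide.
print(scores.mean, scores.median, scores.mode)

# The density is highest at the mean and falls off symmetrically on both sides.
for x in (55, 70, 85, 100, 115, 130, 145):
    print(x, round(scores.pdf(x), 5))
```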
Measures of Variability
Measures of variability describe the spread of the data.
Measures of Variability
The range of a set of data is the difference between the highest and
lowest values in the set. To find the range, first order the data from
least to greatest. Then subtract the smallest value from the largest
value in the set.
The variance averages the squared deviations about the mean.
Its square root, the standard deviation, is easier to interpret,
describing a typical distance from the mean.
The Empirical Rule states that for a bell-shaped distribution, about
68% of the observations fall within one standard deviation of the
mean, about 95% fall within two standard deviations, and nearly all,
if not all, fall within three standard deviations.
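For an exactly normal distribution, the proportions in the Empirical Rule can be computed from the cumulative distribution function. A minimal sketch using `statistics.NormalDist`; the helper name `share_within` is made up for this illustration:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

within = {k: share_within(k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} sd: {100 * p:.1f}%")
```

The results are roughly 68%, 95% and 99.7%. Real data that are only approximately bell-shaped will match these figures only approximately.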
Range
Range = difference between highest and lowest observed values
The range value of a data set is greatly influenced by the presence of just
one unusually large or small value (outlier).
The range can be expressed as an interval such as 4–10, where 4 is the
lowest value and 10 is highest. Often, it is expressed as interval width. For
example, the range of 4–10 can also be expressed as a range of 6.
The disadvantage of using range is that it does not measure the spread of
the majority of values in a data set—it only measures the spread between
highest and lowest values.
Other measures are required in order to give a better picture of the data
spread.
The range is an informative tool used as a supplement to other measures
such as the standard deviation or semi-interquartile range, but it should
rarely be used as the only measure of spread.
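A short sketch of the range, both as an interval and as a width, and of how one unusual value changes it. The observations are hypothetical and chosen to match the 4 to 10 example above:

```python
# Hypothetical observations (illustration only).
values = [7, 4, 10, 6, 5]

# Order the data, then subtract the smallest value from the largest.
ordered = sorted(values)
lowest, highest = ordered[0], ordered[-1]
width = highest - lowest
print(f"range as interval: {lowest}-{highest}, range as width: {width}")

# A single unusually large value changes the range dramatically.
with_outlier = values + [40]
width_with_outlier = max(with_outlier) - min(with_outlier)
print("range with outlier:", width_with_outlier)
```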
Variance
n= n-1=
Sample variance 1 - solution:
xi => x̄ = (7+7+9+13+15+23)/6 = 74/6 ≈ 12.33
x1 = 7
x2 = 7
x3 = 9
x4 = 13
x5 = 15
x6 = 23
n = 6    n-1 = 5
Sample variance 2:
(xi - x̄) =>    (xi - x̄)² =>
x1 - x̄ =    (x1 - x̄)² =
x2 - x̄ =    (x2 - x̄)² =
x3 - x̄ =    (x3 - x̄)² =
x4 - x̄ =    (x4 - x̄)² =
x5 - x̄ =    (x5 - x̄)² =
x6 - x̄ =    (x6 - x̄)² =
∑ (xi - x̄)² =
Sample variance 2 - solution:
(xi - x̄) =>    (xi - x̄)² =>
x1 - x̄ = 7 - 12.33 = -5.33    (x1 - x̄)² = 28.44
x2 - x̄ = 7 - 12.33 = -5.33    (x2 - x̄)² = 28.44
x3 - x̄ = 9 - 12.33 = -3.33    (x3 - x̄)² = 11.11
x4 - x̄ = 13 - 12.33 = 0.67    (x4 - x̄)² = 0.44
x5 - x̄ = 15 - 12.33 = 2.67    (x5 - x̄)² = 7.11
x6 - x̄ = 23 - 12.33 = 10.67    (x6 - x̄)² = 113.78
∑ (xi - x̄)² ≈ 189.33
s² = ∑ (xi - x̄)² / (n-1) ≈ 189.33 / 5 ≈ 37.87, so s ≈ 6.15
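The steps of this worked example (mean, deviations, squared deviations, sum) can be checked in Python. A minimal sketch using the six observations above; the final division by n-1 and the square root follow the usual definitions of the sample variance and standard deviation, and the result is compared with the `statistics` module as a cross-check:

```python
import statistics

data = [7, 7, 9, 13, 15, 23]
n = len(data)

mean = sum(data) / n                                  # x̄
squared_deviations = [(x - mean) ** 2 for x in data]  # (xi - x̄)²
sum_of_squares = sum(squared_deviations)              # ∑(xi - x̄)²
sample_variance = sum_of_squares / (n - 1)            # s²
sample_sd = sample_variance ** 0.5                    # s

print(round(mean, 2), round(sum_of_squares, 2),
      round(sample_variance, 2), round(sample_sd, 2))

# Cross-check against the library implementation.
print(statistics.variance(data), statistics.stdev(data))
```

A useful check on hand calculations: the unsquared deviations (xi - x̄) must always sum to zero.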
Properties of the standard deviation
s ≥ 0.
s = 0 only when all observations have the same value.
For instance, if the ages for a sample of five students are
19,19,19,19,19, then the sample mean equals 19, each of the
five deviations equals 0, and s = 0. This is the minimum possible
variability.
The greater the variability about the mean, the larger is
the value of s.
If the data are rescaled, the standard deviation is also
rescaled.
For instance, if we change annual incomes from dollars (such as
34,000) to thousands of dollars (such as 34.0), the standard
deviation also changes by a factor of 1000 (such as from 11,800
to 11.8).
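Two of these properties can be verified with a few lines of Python. The ages come from the example above; the income values are hypothetical and used only to show the effect of rescaling:

```python
import statistics

# s = 0 only when all observations have the same value.
ages = [19, 19, 19, 19, 19]
s_ages = statistics.stdev(ages)

# Hypothetical annual incomes in dollars (illustration only).
incomes_dollars = [22000, 34000, 41000, 28000, 50000]
incomes_thousands = [x / 1000 for x in incomes_dollars]

s_dollars = statistics.stdev(incomes_dollars)
s_thousands = statistics.stdev(incomes_thousands)

print(s_ages, round(s_dollars, 1), round(s_thousands, 4))
```

Dividing every observation by 1000 divides s by 1000 as well, so the choice of units does not change the picture of the spread.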
Lecture 4
Same mean different variability
Same variability different mean
Variability provides a quantitative
measure of the degree to which
scores in a distribution are spread out
or clustered together.
In other words, variability refers to the
degree of “differentness” of the scores in
the distribution.
Empirical rule
Weighted mean
Denote the sample means for two sets of data with sample sizes n1 and
n2 by x̄1 and x̄2. The overall sample mean for the combined set of (n1 +
n2) observations is the weighted average
x̄ = (n1 x̄1 + n2 x̄2) / (n1 + n2)
The numerator n1 x̄1 + n2 x̄2 is the sum of all the observations, since
n x̄ = ∑x for each set of observations.
The denominator is the total sample size.
Example of two groups of students
asked to state their number of shoes.
Number of shoes:
30, 11, 12, 20, 14, 12, 15, 8, 6, 8, 10, 15, 25, 6, 35, 20, 20, 20, 5,
7, 5, 5, 5, 25, 15
Group 1: X̄1 = ∑X / n1 = (5+7+5+5+5) / 5 = 27 / 5 = 5.4
Group 2: X̄2 = ∑X / n2 = 327 / 20 = 16.35
Overall: X̄N = (X̄1 n1 + X̄2 n2) / (n1 + n2) = (5.4 * 5 + 16.35 * 20) / (5 + 20) = 354 / 25 = 14.16
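The same calculation can be sketched in Python. The split of the 25 values into two groups is inferred from the group sums on the slide (the values 5, 7, 5, 5, 5 for the group of five, the remaining twenty values for the other group), and the function name `combined_mean` is made up for this illustration:

```python
# Group membership inferred from the slide's group sums (27 and 327).
group1 = [5, 7, 5, 5, 5]                            # n1 = 5
group2 = [30, 11, 12, 20, 14, 12, 15, 8, 6, 8,
          10, 15, 25, 6, 35, 20, 20, 20, 25, 15]    # n2 = 20

def combined_mean(mean1, n1, mean2, n2):
    """Weighted average of two group means, weighted by group size."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

mean1 = sum(group1) / len(group1)
mean2 = sum(group2) / len(group2)
overall = combined_mean(mean1, len(group1), mean2, len(group2))

# The weighted mean equals the plain mean of all 25 observations pooled together.
pooled = sum(group1 + group2) / len(group1 + group2)

print(mean1, mean2, round(overall, 2), round(pooled, 2))
```

Simply averaging the two group means, (5.4 + 16.35) / 2, would give a different and incorrect overall mean, because the groups are not the same size.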