Lecture 2
AZA 2490
Do you have a mental block towards research?
https://www.awakeningbusiness.com/how-to-overcome-business-mental-blocks/
Stay open-minded & try to move beyond this mental block.
https://www.idealady.com/stop-a-creative-mental-block/
Revision:
Descriptive Statistics
Very brief summary & reminder: it is your responsibility to engage with the textbook chapters and Moodle slides to ensure you are up to date and on track with last year's work.
Descriptive: Inferential:
• Methods for describing individual scores within a distribution: z-score / standard score
• Methods for describing an entire distribution of scores: mean & standard deviation
Central Tendency
• Central tendency is a statistical measure to determine a single score that
defines the center of a distribution.
• The goal of central tendency is to find the single score that is most typical or most
representative of the entire group.
[Figure: a distribution with the score X = 5 marked]
Does X = 5 adequately sum up this distribution?
The Mean
• The mean – the arithmetic average
• The mean for a distribution is the sum of the scores divided by the
number of scores.
• The formula for the population mean is $\mu = \sum X / N$ (μ is the Greek letter mu, pronounced "mew")
• The formula for the sample mean is $M = \sum X / n$
The Median
• If the scores in a distribution are listed in order from smallest to largest, the median is the midpoint of the list.
• Defining the median as the midpoint of a distribution means that the scores are being divided into two equal-sized groups.
• Example: for the scores 1, 1, 4, 5, 7, 8, the two middle scores are 4 and 5, so the median is (4 + 5) / 2 = 4.5.
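The mean and median calculations above can be sketched in Python as an illustrative check (not taken from the textbook), using the slide's example scores 1, 1, 4, 5, 7, 8. The median is worked out by hand and then cross-checked against the standard library's `statistics.median`.

```python
import statistics

scores = [1, 1, 4, 5, 7, 8]  # example scores from the median slide

# Mean: the sum of the scores divided by the number of scores (M = ΣX / n)
mean = sum(scores) / len(scores)

# Median: the midpoint of the ordered list. With an even number of scores,
# it is the average of the two middle scores.
ordered = sorted(scores)
middle = len(ordered) // 2
if len(ordered) % 2 == 0:
    median = (ordered[middle - 1] + ordered[middle]) / 2
else:
    median = ordered[middle]

print(f"mean = {mean:.2f}, median = {median}")  # mean = 4.33, median = 4.5

# Cross-check the hand calculation against the standard library
assert median == statistics.median(scores)
```

Note that the mean (4.33) and the median (4.5) of the same scores need not be equal.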
Mode
• In a frequency distribution, the mode is the score or category that has the
greatest frequency.
• Although a distribution will have only one mean and only one median, it is
possible to have more than one mode.
• A distribution with two modes is said to be bimodal, and a distribution with more
than two modes is called multimodal.
Number of absences (X)    f
5                         1
4                         2
3                         7
2                         5
1                         3
0                         2
Mode = 3 (the most frequent score).
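A small Python sketch can find the mode straight from the frequency table above: take the score(s) with the greatest frequency. Returning a list keeps the bimodal and multimodal cases in view. The dictionary layout is just one convenient way to store the table, and `statistics.multimode` is used only as a cross-check.

```python
from statistics import multimode

# Frequency table from the slide: number of absences (X) -> frequency (f)
freq = {5: 1, 4: 2, 3: 7, 2: 5, 1: 3, 0: 2}

# Mode: the score(s) with the greatest frequency; ties would give several modes
highest = max(freq.values())
modes = [x for x, f in freq.items() if f == highest]
print(modes)  # [3]

# Cross-check: expand the table back into raw scores and use multimode
raw_scores = [x for x, f in freq.items() for _ in range(f)]
assert sorted(multimode(raw_scores)) == sorted(modes)
```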
When to calculate a median rather than a mean:
• Extreme scores or skewed distributions
• Undetermined values
• Open-ended distributions
• Ordinal scales
See the explanation on pages 87–88 of your textbook.
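To show why extreme scores favour the median, here is a minimal Python sketch with made-up scores (hypothetical, for illustration only): replacing one score with an extreme value drags the mean but leaves the median unchanged.

```python
from statistics import fmean, median

# Hypothetical scores, made up for illustration
typical = [3, 4, 4, 5, 6]
with_extreme = [3, 4, 4, 5, 60]  # the 6 is replaced by one extreme score

for label, scores in [("typical", typical), ("with extreme", with_extreme)]:
    # The mean moves with the extreme score; the median stays put
    print(f"{label}: mean = {fmean(scores):.1f}, median = {median(scores)}")
```

Running it prints a mean of 4.4 for the typical scores and 15.2 once the extreme score is included, while the median stays at 4 in both cases.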
When to calculate a mode rather than a mean or median:
• Nominal scales
• Discrete variables
• Describing shape
See the explanation on pages 87–88 of your textbook.
Central Tendency
[Figure panels: a normal distribution; a bimodal distribution; a distribution in which all X values occur with the same frequency]
Central Tendency: Skewed Distributions
[Figure: a skewed distribution with the mean, median and mode marked]
• Median: requires 50% of the distribution on either side
• Mean: is influenced by the extreme scores on the left
• Mode: the highest frequency, at the peak
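The ordering in the skewed-distribution figure can be checked with a short Python sketch. The scores are hypothetical, chosen to have a tail of low scores on the left, and show the mean pulled furthest towards the tail, then the median, with the mode at the peak.

```python
from statistics import mean, median, mode

# Hypothetical scores with a tail of extreme scores on the left (negative skew)
scores = [1, 6, 7, 8, 8, 9, 9, 9]

m, md, mo = mean(scores), median(scores), mode(scores)
print(f"mean = {m}, median = {md}, mode = {mo}")  # mean = 7.125, median = 8.0, mode = 9

# The single low score pulls the mean furthest towards the tail
assert m < md < mo
```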
Statistics intro: Mean, median, and mode | Data and statistics | 6th grade | Khan Academy
https://youtu.be/h8EYEJ32oQ8
Variability
• Variability provides a quantitative measure of the differences
between scores in a distribution and describes the degree to which
the scores are spread out or clustered together.
https://cyntegrity.com/clinical-data-quality-article/variability-graph
I can measure variability in a number of ways:
Variability
• The range is the distance covered by the scores in a distribution, from the
smallest score to the largest score.
• Standard deviation is the square root of the variance and provides a measure of
the standard, or average distance from the mean.
• Standard deviation uses the mean of the distribution as a reference point and measures
variability by considering the distance between each score and the mean.
• For interval or ratio scales only
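As a sketch of how these measures capture spread, the Python below compares two hypothetical distributions (invented for illustration) that share the same mean of 10 but differ in how clustered the scores are. The range is computed as the largest minus the smallest score, as defined above, and `statistics.pstdev` gives the population standard deviation.

```python
from statistics import pstdev

# Two hypothetical distributions with the same mean (10) but different spread
clustered = [9, 10, 10, 10, 11]
spread_out = [2, 6, 10, 14, 18]

for label, scores in [("clustered", clustered), ("spread out", spread_out)]:
    score_range = max(scores) - min(scores)  # largest score minus smallest score
    sd = pstdev(scores)                      # population standard deviation
    print(f"{label}: range = {score_range}, SD = {sd:.2f}")
```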
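Calculating Standard Deviation
• Deviation scores (X − μ) are squared to get rid of the signs, which would otherwise cancel each other out: the deviations from the mean always sum to zero.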
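The calculation can be traced step by step in Python. This is a sketch using an illustrative population of five scores (not from the slides): the raw deviations sum to zero, so they are squared, summed into SS, averaged over N to give the variance, and square-rooted to give the standard deviation.

```python
import math

scores = [1, 9, 5, 8, 7]  # an illustrative population of N = 5 scores

N = len(scores)
mu = sum(scores) / N                   # population mean = 6.0
deviations = [x - mu for x in scores]  # distance of each score from the mean

# The raw deviations always sum to zero, so their average tells us nothing
print(sum(deviations))  # 0.0

# Squaring removes the signs before averaging
ss = sum(d ** 2 for d in deviations)   # sum of squared deviations (SS) = 40.0
variance = ss / N                      # population variance = 8.0
sd = math.sqrt(variance)               # population standard deviation ≈ 2.83
print(f"SS = {ss}, variance = {variance}, SD = {sd:.2f}")
```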
• A sample usually has less variability than the population it comes from, so a sample variance computed by dividing by n tends to underestimate the population variance (the average squared distance from the mean).
• Once the sample mean is known, only n − 1 of the scores are free to vary; this number is the degrees of freedom (df = n − 1).
• To make the sample a better estimate of the population, sample variance and standard deviation calculations divide by the degrees of freedom (n − 1) instead of n.
• There are important notational differences when calculating sample and population variance: population variance $\sigma^2 = SS/N$; sample variance $s^2 = SS/(n-1)$, where SS is the sum of squared deviations.
http://www.statisticshowto.com/what-is-standard-deviation/
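The difference between dividing by N and by n − 1 can be sketched in Python with the same five illustrative scores, now treated as a sample. The standard library follows the same convention: `statistics.variance` divides by n − 1 and `statistics.pvariance` divides by N.

```python
from statistics import pvariance, variance

scores = [1, 9, 5, 8, 7]  # illustrative scores, this time treated as a sample

n = len(scores)
M = sum(scores) / n
ss = sum((x - M) ** 2 for x in scores)  # SS = 40.0

df = n - 1                 # degrees of freedom: n - 1 scores are free to vary
sample_var = ss / df       # s² = SS / (n - 1) = 10.0
population_var = ss / n    # σ² = SS / N, if these 5 scores were the whole population

# statistics.variance divides by n - 1; statistics.pvariance divides by N
assert sample_var == variance(scores)
assert population_var == pvariance(scores)
print(f"s² = {sample_var}, σ² = {population_var}")
```

Dividing by the smaller number (df) makes the sample variance slightly larger, which offsets the tendency of samples to underestimate population variability.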
SD rule of thumb (normal distribution)
http://www.biologyforlife.com/standard-deviation.html
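The usual rule of thumb for a normal distribution is that roughly 68%, 95% and 99.7% of scores fall within 1, 2 and 3 standard deviations of the mean. A minimal Python sketch using the standard library's `statistics.NormalDist` reproduces these proportions from the normal curve itself.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of a normal distribution within k standard deviations of the mean
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within ±{k} SD of the mean: {share:.1%}")
```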
Variability and Inferential Statistics
• In very general terms, the goal of inferential statistics is to detect
meaningful and significant patterns in research results.
• In general, low variability means that existing patterns can be seen clearly, whereas high variability (high error variance) tends to obscure any patterns that might exist.
https://youtu.be/t8kDuV1Alt4
Z-scores
Describing Data
• Methods for describing individual scores within a distribution: z-score / standard score
• Methods for describing an entire distribution of scores: mean & standard deviation
The purpose of z-scores
(We will look at another purpose of z-scores next week.)
1. Each z-score tells the exact location of the original X value within the
distribution
• Suppose you received a score of X = 76 on a statistics exam. How did you do?
• Your score of X = 76 could be one of the best scores, or it might be the lowest
score.
• To find the location of your score, you must have information about the other
scores in the distribution:
• The mean
• The standard deviation
1. The sign of the z-score (+ or −) signifies whether the score is above the
mean (positive) or below the mean (negative).
2. The numerical value of the z-score specifies the distance from the mean by
counting the number of standard deviations between X and μ.
[Figure: a normal distribution; positive z-scores lie above the mean and negative z-scores lie below the mean]
http://www.brainy-child.com/experts/normal-iq-range.shtml
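Z-scores & location
If you know any 3 of the 4 statistics in the equation, you can adjust the equation to find the 4th statistic:
Population equations: $z = \frac{X - \mu}{\sigma}$ and $X = \mu + z\sigma$
Sample equations: $z = \frac{X - M}{s}$ and $X = M + zs$
Worked examples:
• μ = 70, σ = 3, X = 76: 70 + 3 + 3 = 76, so X is 2 standard deviations above the mean; z = +2.00
• σ = 12 and X is 6 points above the mean: 6 is half of 12, so X is 0.5 standard deviations above the mean; z = +0.50
• σ = 8 and z = +1.50: X is 8 × 1.50 = 12 points above the mean (8 plus half of 8)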
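The z-score equation and its rearrangements can be sketched as small Python helpers. The function names are hypothetical, chosen for illustration. The first example uses the slide's values (μ = 70, σ = 3, X = 76); the other two examples do not state the mean, so a hypothetical mean of 50 is assumed there.

```python
def z_score(x, mean, sd):
    """z = (X - mean) / sd: use mu and sigma for a population, M and s for a sample."""
    return (x - mean) / sd

def raw_score(z, mean, sd):
    """X = mean + z * sd: the same equation rearranged to find X."""
    return mean + z * sd

def sd_from(x, mean, z):
    """sd = (X - mean) / z: rearranged to find the standard deviation (z must not be 0)."""
    return (x - mean) / z

def mean_from(x, z, sd):
    """mean = X - z * sd: rearranged to find the mean."""
    return x - z * sd

# Example 1 from the slide: mu = 70, sigma = 3, X = 76
print(z_score(76, 70, 3))      # 2.0

# Examples 2 and 3 do not state the mean, so 50 is a hypothetical value
print(z_score(56, 50, 12))     # 0.5 (6 points above the mean, sigma = 12)
print(raw_score(1.5, 50, 8))   # 62.0 (12 points above the mean, sigma = 8)
```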
This is a standardized distribution. It is useful for comparing scores in different distributions, like exam results for different subjects.
http://www.brainy-child.com/experts/normal-iq-range.shtml
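Comparing scores across distributions can be sketched in Python with two hypothetical exams (all numbers invented for illustration). Both raw scores sit 10 points above their exam's mean, but converting to z-scores shows which one is further above the mean relative to that exam's spread.

```python
def z_score(x, mean, sd):
    # z = (X - mean) / sd
    return (x - mean) / sd

# Hypothetical exam results, invented for illustration
stats_exam = {"x": 60, "mean": 50, "sd": 5}
biology_exam = {"x": 70, "mean": 60, "sd": 10}

z_stats = z_score(**stats_exam)      # 2.0
z_biology = z_score(**biology_exam)  # 1.0

# Both raw scores are 10 points above their mean, but the statistics score
# is further above the mean relative to the spread of its distribution
better = "statistics" if z_stats > z_biology else "biology"
print(f"z (statistics) = {z_stats}, z (biology) = {z_biology}; relatively better: {better}")
```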
Recap Z-scores
Probability
http://www.e-center.lt/article/statistics-and-probability/
Types of Statistics
Descriptive: Inferential:
• Probability establishes a connection between samples and populations. Inferential statistics rely on this connection when they use sample data as the basis for making conclusions about populations.
See Chapters 6 & 7 for more info on probability.
Large Samples & Probability
• You are likely to work with large samples containing many scores and will need to compute a z-score to describe an entire sample.
• In general, the difficulty of working with samples is that a sample provides an incomplete picture of the population.
[Figure: a sample drawn from a population]
From Durrheim, K. & Tredoux, C. (Eds.). (2002). The Sampling Distribution of the Mean. In Numbers, Hypotheses & Conclusions: A Course in Statistics for the Social Sciences. Lansdowne: UCT Press.
[Figure: an infinite number of samples is drawn from the population; the mean (M) of each sample is recorded, and together these means form the distribution of sample means]
Characteristics of the Distribution of Sample Means
• The sample means should pile up around the population mean.
• The larger the sample size, the closer the sample means should be to
the population mean, μ.
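Both characteristics can be checked with a short Python simulation, offered as a sketch under stated assumptions: the population is a hypothetical set of 10,000 random scores between 0 and 100, and 2,000 samples are drawn for each sample size. The sample means should centre on μ, and their spread should shrink when n grows from 5 to 50.

```python
import random
from statistics import fmean, pstdev

random.seed(1)  # fixed seed so the run is repeatable

# A hypothetical population of 10,000 scores between 0 and 100
population = [random.randint(0, 100) for _ in range(10_000)]
mu = fmean(population)

def sample_means(n, repetitions=2_000):
    """Draw many random samples of size n and return the mean of each one."""
    return [fmean(random.sample(population, n)) for _ in range(repetitions)]

small = sample_means(5)
large = sample_means(50)

print(f"population mean = {mu:.2f}")
for label, means in [("n = 5", small), ("n = 50", large)]:
    # The sample means pile up around mu; larger samples cluster more tightly
    print(f"{label}: mean of sample means = {fmean(means):.2f}, spread (SD) = {pstdev(means):.2f}")
```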
Sampling Distributions (Statistics - Vol 1 - Sect 1)
https://youtu.be/EOlNb1XXC_M
The Central Limit Theorem
(Theorem: a logical argument or chain of reasoning.)
• For any population with mean μ and standard deviation σ, the distribution of sample means for samples of size n has a mean of μ and a standard deviation of σ/√n, and it approaches a normal distribution as n becomes very large.
• The CLT is used to estimate how accurately a sample mean estimates the population mean.
• Knowing this, we can ask what proportion of samples have a mean greater or smaller than a particular value, i.e. the probability that a randomly selected sample has a mean less than a particular value. In other words, we can compare a sample to the population [this makes inferential statistics possible].
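The Shape of the Distribution of Sample Means
• The distribution of sample means is almost perfectly normal if either of the following two conditions is satisfied:
1. The population from which the samples are selected is a normal distribution.
2. The number of scores (n) in each sample is relatively large, around 30 or more.
https://youtu.be/JNm3M9cqWyc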
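A Python simulation can illustrate the large-sample condition, as a sketch under stated assumptions. The population is assumed to be exponential with rate 1, a strongly skewed shape chosen only as an example with mean 1 and SD 1, and n = 30 follows the common rule of thumb. If the sample means are close to normal, about 68% of them should fall within one standard error (σ/√n) of μ.

```python
import math
import random
from statistics import NormalDist

random.seed(2)  # fixed seed so the run is repeatable

n, repetitions = 30, 10_000
mu = sigma = 1.0  # an exponential population with rate 1 has mean 1 and SD 1

# Means of many samples of size n drawn from a strongly skewed population
means = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(repetitions)]

standard_error = sigma / math.sqrt(n)
observed = sum(abs(m - mu) <= standard_error for m in means) / repetitions

# Under a normal distribution, about 68% of values lie within 1 SD of the mean
normal_prediction = NormalDist().cdf(1) - NormalDist().cdf(-1)
print(f"within ±1 SE: observed {observed:.3f}, normal prediction {normal_prediction:.3f}")
```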
Probability and the Distribution of Sample Means
• The primary use of the distribution of sample means is to find the
probability associated with any specific sample.
• Recall that probability is equivalent to proportion.
• Because the distribution of sample means presents the entire set of all
possible sample means, we can use proportions of this distribution to
determine probabilities.
http://psychology.illinoisstate.edu/jccutti/psych340/fall02/oldlecturefiles/prob.html
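Finding the probability for a specific sample can be sketched in Python, assuming the distribution of sample means is normal with mean μ and standard deviation σ/√n. All the numbers here are hypothetical. The sample mean is converted to a z-score, and the proportion of the normal curve beyond that z is the probability.

```python
import math
from statistics import NormalDist

# Hypothetical values: population mu = 500, sigma = 100, samples of n = 25
mu, sigma, n = 500, 100, 25
sample_mean = 540

standard_error = sigma / math.sqrt(n)     # sigma_M = 100 / 5 = 20.0
z = (sample_mean - mu) / standard_error   # (540 - 500) / 20 = 2.0

# Probability = proportion of the distribution of sample means beyond z
p_greater = 1 - NormalDist().cdf(z)
print(f"z = {z}, p(M > {sample_mean}) = {p_greater:.4f}")  # z = 2.0, p(M > 540) = 0.0228
```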
The Standard Error of M
• The standard deviation of the distribution of sample means, σM, is called the standard error of M: $\sigma_M = \sigma / \sqrt{n}$.
• The size of the sample: the law of large numbers states that the larger the sample size (n), the more probable it is that the sample mean will be close to the population mean.
FIGURE 7.3 The relationship between standard error and sample size. As the sample size
is increased, there is less error between the sample mean and the population mean.
Standard Error is often written as SE or SEM.
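The relationship in Figure 7.3 can be sketched numerically in Python using the standard error formula σM = σ/√n with a hypothetical population standard deviation of 10. Quadrupling the sample size halves the standard error.

```python
import math

sigma = 10  # hypothetical population standard deviation

# Standard error of M for several sample sizes: sigma_M = sigma / sqrt(n)
errors = {n: sigma / math.sqrt(n) for n in (1, 4, 16, 25, 100)}

for n, se in errors.items():
    print(f"n = {n:>3}: standard error = {se:.1f}")
```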
Why did I need to know that?
http://www.e-center.lt/article/statistics-and-probability/
Where to from here?
Inferential Statistics – methods that use sample data as the basis for
drawing general conclusions about populations.
• The natural differences that exist between samples and populations introduce a degree of uncertainty and error into all inferential processes.
• There is always a margin of error that must be considered whenever a researcher uses a sample mean as the basis for drawing conclusions about a population mean.
• For the rest of the semester we will look at a variety of statistical methods that all use sample means to draw inferences about population means.
References:
• Many of these slides are directly from or adapted from your textbook and
generic textbook slides:
Durrheim, K. & Tredoux, C. (Eds.). (2002). The Sampling Distribution of the Mean. In Numbers, Hypotheses & Conclusions: A Course in Statistics for the Social Sciences. Lansdowne: UCT Press.
• Other sources: