
Lecture 2

lecture notes

Uploaded by

Lorraiine Ndadza
Copyright © All Rights Reserved

Research Assessment

AZA 2490
Do you have a mental block towards research?

Does it frighten you?

https://www.awakeningbusiness.com/how-to-overcome-business-mental-blocks/
Stay open-minded & try to move beyond this mental block.

Dare to conquer this challenge.


Face your fear head on.

https://www.idealady.com/stop-a-creative-mental-block/
Revision:
Descriptive Statistics
Very brief summary & reminder: it is your responsibility to engage with the textbook chapters and Moodle slides to ensure you are up to date and on track with last year's work.

Each year builds on the knowledge of the previous year.
Scenario:

You ask 1000 people what their favourite colour is.


You have 1000 scores.

People ask you what your research has found.


How do you describe your data?
Types of Statistics

Descriptive: statistical procedures used to summarise, organise, and simplify data.
Examples: mean, median, mode, frequency, probability

Inferential: techniques that allow us to study samples and then make generalizations about the population from which they were selected.
Examples: t-test, correlation, ANOVA, regression
Frequency Distribution
Frequency Distributions
• Data is organized into a frequency distribution to see the patterns in
the data – the researcher can see all the scores at a glance
• A frequency distribution is an organized tabulation showing exactly
how many individuals are located in each category on the scale of
measurement.
• Can be structured either as a table or as a graph, and presents the same two
elements:
1. The set of categories that make up the original measurement scale
2. A record of the frequency, or number of individuals in each category
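A minimal Python sketch of building a frequency distribution table, pairing each category on the measurement scale with its frequency (the two elements listed above). The scores and the helper name frequency_table are hypothetical, made up for illustration.

```python
from collections import Counter

def frequency_table(scores):
    """Return (category, frequency) pairs, listing categories from highest to lowest."""
    counts = Counter(scores)  # number of individuals in each category
    return sorted(counts.items(), key=lambda pair: pair[0], reverse=True)

# Hypothetical quiz scores, made up for illustration
scores = [8, 9, 7, 8, 10, 6, 8, 9, 7, 8]

for category, f in frequency_table(scores):
    print(category, f)
```

Listing the categories from highest to lowest follows the usual layout of a frequency table; the frequencies always add up to the number of scores.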
Frequency Distributions
• One commonly occurring population distribution is the normal curve.
• The word normal refers to a specific shape that can be precisely defined by
an equation.
Frequency Distributions
• Researchers often simply describe a distribution by listing its
characteristics.

• There are three characteristics that completely describe any distribution:
1. Central tendency measures where the center of the distribution is located.
2. Variability measures the degree to which the scores are spread over a wide
range or are clustered together.
3. Shape is concerned with whether the distribution is symmetrical or
skewed.
Shapes of Frequency Distributions
• Symmetrical: the left side of the graph is (roughly) a mirror image of the right side.
• Skewed: the scores tend to pile up toward one end of the scale and taper off gradually at the other end.
Central Tendency
Describing Data
• Methods for describing an entire distribution of scores: mean & standard deviation
• Methods for describing individual scores within a distribution: z-score / standard score
Central Tendency
• Central tendency is a statistical measure to determine a single score that
defines the center of a distribution.
• The goal of central tendency is to find the single score that is most typical or most
representative of the entire group.

• There is no single, standard procedure for determining central tendency.


• The problem is that no single measure produces a central, representative value in
every situation.

• To deal with this problem, statisticians have developed 3 different methods for measuring central tendency:
• the mean, the median, and the mode.
Where is the centre?
[Figure: X could equal 8, but the majority of scores are below 8. Does X = 5 adequately sum up this distribution?]
The Mean
• The mean is the arithmetic average.
• The mean for a distribution is the sum of the scores divided by the number of scores.
• The formula for the population mean is $\mu = \sum X / N$ (μ is the Greek letter mu, pronounced "mew").
• The formula for the sample mean is $M = \sum X / n$.

205 individuals are in a sample, so there are 205 scores. The sum (total) of the scores is 902. What is the mean? $M = 902/205 = 4.4$
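The mean formula can be checked with a short Python sketch. The function name mean is our own choice; the totals 902 and 205 come from the example above, and the list of five scores is the odd-numbered median example used later in these notes.

```python
def mean(scores):
    """Sample mean M = sum of the scores divided by the number of scores (n)."""
    return sum(scores) / len(scores)

# The worked example only gives the totals, so divide them directly
sum_x, n = 902, 205
print(round(sum_x / n, 1))  # 4.4

print(mean([3, 5, 8, 10, 11]))  # 7.4
```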
Median
• The goal of the median is to locate the midpoint of the distribution.

• If the scores in a distribution are listed in order from smallest to largest, the
median is the midpoint of the list.

• Defining the median as the midpoint of a distribution means that the scores are being divided into two equal-sized groups.

• Usually, the median can be found by a simple counting procedure:
1. With an odd number of scores, list the values in order, and the median is the middle score in the list.
2. With an even number of scores, list the values in order, and the median is half-way between the middle two scores.
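Median examples for a discrete variable
3, 5, 8, 10, 11: odd number of scores, so the median is the middle score, 8
1, 1, 4, 5, 7, 8: even number of scores, so the median is half-way between 4 and 5: (4 + 5)/2 = 4.5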
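The counting procedure above translates directly into a small Python function. This is a sketch, and the function name median is our own. Python's built-in statistics.median should give the same answers.

```python
def median(scores):
    """Median by the counting procedure: sort, then take the middle score
    (odd n) or the point half-way between the two middle scores (even n)."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([3, 5, 8, 10, 11]))   # 8
print(median([1, 1, 4, 5, 7, 8]))  # 4.5
```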
Mode
• In a frequency distribution, the mode is the score or category that has the
greatest frequency.

• The mode indicates the peak of the distribution

• Although a distribution will have only one mean and only one median, it is
possible to have more than one mode.
• A distribution with two modes is said to be bimodal, and a distribution with more
than two modes is called multimodal.
Number of absences (X) | f
5 | 1
4 | 2
3 | 7
2 | 5
1 | 3
0 | 2

Mode = 3: the most frequent score.
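In this sketch, the mode is read straight from the frequency table above, stored as a dictionary mapping each X value to its f. The helper name modes is hypothetical. It returns every score that ties for the greatest frequency, so a bimodal or multimodal distribution returns more than one value.

```python
# Frequency table for the absences example: X -> f
absences = {5: 1, 4: 2, 3: 7, 2: 5, 1: 3, 0: 2}

def modes(freq_table):
    """Return every score with the greatest frequency (two or more means bimodal/multimodal)."""
    highest = max(freq_table.values())
    return [x for x, f in freq_table.items() if f == highest]

print(modes(absences))  # [3]
```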
When to calculate a median rather than a mean:
• Extreme scores or skewed distributions
• Undetermined values
• Open-ended distributions
• Ordinal scales
See explanation on pages 87-88 of your textbook.
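The first point in the list above concerns extreme scores. The following Python sketch uses made-up scores to show why they matter. Adding a single extreme score pulls the mean a long way but barely moves the median.

```python
from statistics import mean, median

# Hypothetical scores; the second list adds one extreme score
scores = [2, 3, 3, 4, 5]
with_extreme = scores + [40]

print(mean(scores), median(scores))              # 3.4 3
print(mean(with_extreme), median(with_extreme))  # 9.5 3.5
```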
When to calculate a mode rather than a mean or median:
• Nominal scales
• Discrete variables
• Describing shape
See explanation on pages 87-88 of your textbook.
Central Tendency
[Figure: central tendency in a normal distribution, a bimodal distribution, and a distribution in which all X values occur with the same frequency]
Central Tendency: Skewed Distributions
• Mode: the highest frequency, at the peak
• Median: requires 50% of the distribution on either side
• Mean: influenced by the extreme scores on the left
Statistics intro: Mean, median, and mode | Data and statistics | 6th grade | Khan Academy

https://youtu.be/h8EYEJ32oQ8
Variability
Variability
• Variability provides a quantitative measure of the differences
between scores in a distribution and describes the degree to which
the scores are spread out or clustered together.

• Variability describes the distribution.


• Variability measures how well an individual score (or group of scores)
represents the entire distribution.
I look at variability when I want to understand the distribution of scores.

https://cyntegrity.com/clinical-data-quality-article/variability-graph
I can measure variability in a number of ways:
Variability
• The range is the distance covered by the scores in a distribution, from the
smallest score to the largest score.

• Deviation is distance from the mean: deviation score = X - µ (a score minus the population mean)

• Variance equals the mean of the squared deviations.


• Variance is the average squared distance from the mean.

• Standard deviation is the square root of the variance and provides a measure of
the standard, or average distance from the mean.
• Standard deviation uses the mean of the distribution as a reference point and measures
variability by considering the distance between each score and the mean.
• For interval or ratio scales only
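Here is a Python sketch of the measures listed above, applied to a small hypothetical population. It treats the scores as a whole population, so it divides by N. The range is computed as largest minus smallest score, and the function name is our own.

```python
import math

def describe_variability(population):
    """Range, deviation scores, variance and standard deviation for a population."""
    N = len(population)
    mu = sum(population) / N
    value_range = max(population) - min(population)  # largest score minus smallest score
    deviations = [x - mu for x in population]        # X - mu; these always sum to zero
    ss = sum(d ** 2 for d in deviations)             # squaring removes the signs
    variance = ss / N                                # mean of the squared deviations
    sd = math.sqrt(variance)                         # square root returns to the original units
    return value_range, deviations, variance, sd

# Hypothetical population of N = 5 scores
value_range, deviations, variance, sd = describe_variability([1, 9, 5, 8, 7])
print(value_range, deviations, variance, round(sd, 2))  # 8 [-5.0, 3.0, -1.0, 2.0, 1.0] 8.0 2.83
```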
Calculating Standard Deviation
• Square each deviation to get rid of the signs, which would otherwise cancel each other out.
• To make the value meaningful, take the square root.

The standard deviation provides a measure of the standard/average distance from the mean. It describes whether the scores are clustered closely around the mean or widely scattered.

Copyright © 2017 Cengage Learning. All Rights Reserved.


Population vs Sample Variance
• Samples are meant to represent the population

• Samples are usually much smaller than the population

• Because a sample contains fewer scores, it tends to be less variable than the population, so computing the sample variance (the average squared distance from the mean) with n would tend to underestimate the population variance.

• This is corrected using degrees of freedom: the number of scores in the sample that are free to vary.

• So that the sample represents the population accurately, sample variance and standard deviation calculations divide by the degrees of freedom (i.e. n - 1) instead of n.

• There are important notational differences when calculating sample and population variance.



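Measuring variability in the Population vs the Sample

Population variance: $\sigma^2 = SS/N$
Sample variance: $s^2 = SS/(n-1) = SS/df$
(SS is the sum of squared deviations from the mean.)

Population standard deviation: $\sigma = \sqrt{\sigma^2}$ (σ is the Greek letter sigma)
Sample standard deviation: $s = \sqrt{s^2}$, also written SD

Degrees of freedom: $df = n - 1$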



Measuring variability in the Sample
• The degrees of freedom determine the number of scores in the
sample that are independent and free to vary.

• For a sample of n scores, the degrees of freedom, or df, for the sample variance are defined as df = n - 1.

Using n - 1 reduces bias and makes the sample variance an accurate (unbiased) estimate of the population variance.
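This Python sketch contrasts the population formula (SS/N) with the sample formula (SS/(n - 1)) on the same hypothetical scores. It checks each formula against the standard library's statistics.pvariance and statistics.variance, which use N and n - 1 respectively.

```python
import statistics

scores = [1, 9, 5, 8, 7]  # hypothetical scores

mean = statistics.mean(scores)
ss = sum((x - mean) ** 2 for x in scores)  # SS: sum of squared deviations
n = len(scores)

print(ss / n, statistics.pvariance(scores))       # population formula: SS/N
print(ss / (n - 1), statistics.variance(scores))  # sample formula: SS/(n - 1) = SS/df
```

Dividing by the smaller number n - 1 makes the sample variance slightly larger. This offsets the tendency of samples to underestimate the population variance.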



Representing the Standard Deviation graphically
$\sigma = \sqrt{\text{variance}}$ = the average/typical/standard distance from the mean



Standard deviation

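The size of the SD indicates how wide the distribution is: scores clustered tightly around the mean give a small SD, and widely scattered scores give a large SD.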

http://www.statisticshowto.com/what-is-standard-deviation/
SD rule of thumb (normal distribution)

http://www.biologyforlife.com/standard-deviation.html
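This sketch assumes the rule of thumb shown in the figure is the usual one for a normal distribution: about 68%, 95% and 99.7% of scores fall within 1, 2 and 3 standard deviations of the mean. It computes those proportions with Python's statistics.NormalDist (Python 3.8+). The helper name proportion_within is our own.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {proportion_within(k):.1%}")
# within ±1 SD: 68.3%
# within ±2 SD: 95.4%
# within ±3 SD: 99.7%
```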
Variability and Inferential Statistics
• In very general terms, the goal of inferential statistics is to detect
meaningful and significant patterns in research results.

• Variability plays an important role in the inferential process because the variability in the data influences how easy it is to see patterns.

• In general, low variability means that existing patterns can be seen clearly, whereas high variability (high error variance) tends to obscure any patterns that might exist.



What is the standard deviation?

https://youtu.be/t8kDuV1Alt4
Z-scores
Describing Data
• Methods for describing an entire distribution of scores: mean & standard deviation
• Methods for describing individual scores within a distribution: z-score / standard score
The purpose of z-scores

X values (scores) are transformed into z-scores for 2 useful purposes:

1. Each z-score tells the exact location of the original X value within the distribution.

2. The z-scores form a standardized distribution that can be directly compared to other distributions that also have been transformed into z-scores.

(We will look at another purpose of z-scores next week.)


Z-scores & Location
• To make raw scores more meaningful, they are transformed into z-scores, which indicate the location of each raw score.

• Suppose you received a score of X = 76 on a statistics exam. How did you do?
• Your score of X = 76 could be one of the best scores, or it might be the lowest
score.
• To find the location of your score, you must have information about the other
scores in the distribution:
• The mean
• The standard deviation



Z-scores example of 2 exam scores
[Figure: two exam-score distributions, (a) and (b)]
In which scenario, (a) or (b), are you doing really well compared to the class average?
A score by itself does not necessarily provide much information about position within a distribution.


Z-scores & location
• 2 elements of the z-score indicate the precise location of each X value
within a distribution:

1. The sign of the z-score (+ or −) signifies whether the score is above the
mean (positive) or below the mean (negative).

2. The numerical value of the z-score specifies the distance from the mean by
counting the number of standard deviations between X and μ.



Population Distribution
[Figure: z-scores on a population distribution. Positive numbers above the mean; negative numbers below the mean.]


IQ example
• µ = 100
• σ = 15
• A score of X = 130 is transformed into z = +2.00.
• The z-score indicates that the score is above the mean (positive) by a distance of 2 standard deviations: 130 - 100 = 30 points, and 30/15 = 2 (equivalently, 2 × 15 = 30).

http://www.brainy-child.com/experts/normal-iq-range.shtml
Z-scores & location
If you know any 3 of the 4 statistics in the equation, you can rearrange the equation to find the 4th statistic:

Population equation: $z = \frac{X - \mu}{\sigma}$, so $X = \mu + z\sigma$
Sample equation: $z = \frac{X - M}{s}$, so $X = M + zs$
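The two forms of the z-score equation above can be written as a pair of small Python functions (the function names are our own). The same functions work for a sample if you pass M and s in place of μ and σ. The examples reuse the IQ slide (μ = 100, σ = 15) and the raw-score example from these notes (μ = 60, σ = 8, z = -1.50).

```python
def z_score(x, mean, sd):
    """z = (X - mean) / sd: the sign gives the direction, the size gives the distance in SDs."""
    return (x - mean) / sd

def raw_score(z, mean, sd):
    """X = mean + z * sd: the same equation rearranged to recover X."""
    return mean + z * sd

print(z_score(130, 100, 15))    # 2.0  (IQ example)
print(raw_score(-1.50, 60, 8))  # 48.0 (mu = 60, sigma = 8, z = -1.50)
```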



What is the z-score for each X value?
The locations identified by z-scores are the same for all distributions, no matter what mean or standard deviation the distributions may have.

• μ = 70, σ = 3: 70 + 3 + 3 = 76, so X = 76 is 2σ above the mean; z = +2.00
• σ = 12: X is 6 points above the mean, and 6 is half of 12, so X is 0.5σ above the mean; z = +0.50


Z-scores & location
You can determine the raw score (X) from a z-score.
Calculate X if: µ = 60, σ = 8, z = -1.50

Use logic:
• The z-score is negative, so X is below the mean.
• The mean is 60.
• The standard deviation is 8, and X is 1.50 standard deviations from the mean: 8 × 1.50 = 12 (8 + half of 8).
• 12 points below the mean of 60 is 60 - 12 = 48.

Use an equation:
$X = \mu + z\sigma = 60 + (-1.50 \times 8) = 60 - 12 = 48$
Using z-scores to Standardize a Distribution
• It is possible to transform every X value in a distribution into a
corresponding z-score.

• If every X value is transformed into a z-score, then the distribution of z-scores will have the following properties:
1. The distribution of z-scores will have exactly the same shape as the original
distribution of scores – each individual score stays in the same position.
2. The z-score distribution will always have a mean of zero – making the mean a
convenient reference point.
3. The distribution of z-scores will always have a standard deviation of 1 – the
advantage of having a standard deviation of 1 is that the numerical value of a z-
score is exactly the same as the number of standard deviations from the mean.
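The three properties above can be checked with a short Python sketch. It standardizes a hypothetical population (dividing by N, via statistics.pstdev), then confirms that the z-scores keep each score's position, have a mean of 0 and have a standard deviation of 1, up to floating-point rounding. The function name standardize is our own.

```python
import statistics

def standardize(scores):
    """Transform every X into z = (X - mu) / sigma, treating the scores as a population."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)
    return [(x - mu) / sigma for x in scores]

scores = [1, 9, 5, 8, 7]  # hypothetical population
z = standardize(scores)

print([round(value, 2) for value in z])          # each score keeps its relative position
print(statistics.mean(z), statistics.pstdev(z))  # approximately 0 and 1
```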
Using z-scores to Standardize a Distribution
[Figure: relabeling the X values as z-scores; the result is a standardized distribution]


Using z-scores to Standardize a Distribution
• A standardized distribution is composed of scores that have been transformed to create predetermined values of µ and σ.
• Standardized distributions are used to make dissimilar distributions comparable. This is useful for comparing scores in different distributions, like exam results for different subjects.


Transforming z-scores to a distribution with a predetermined µ and σ
A new standardized distribution has 'simple' values for the mean and standard deviation but does not change any individual's location within the distribution.
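The transformation is a two-step process: first find each score's z-score, then place the same z-score in the new distribution using X = μ + zσ. The Python sketch below is a minimal version of that process. The original distribution (μ = 57, σ = 14), the target values (μ = 50, σ = 10) and the function name transform are hypothetical, chosen only for illustration.

```python
def transform(x, old_mean, old_sd, new_mean, new_sd):
    """Move X into a distribution with a predetermined mean and SD, keeping its z-score."""
    z = (x - old_mean) / old_sd   # step 1: location of X as a z-score
    return new_mean + z * new_sd  # step 2: the same z-score in the new distribution

# Hypothetical original distribution (mu = 57, sigma = 14) rescaled to mu = 50, sigma = 10
print(transform(64, 57, 14, 50, 10))  # 55.0 (z = +0.50 in both distributions)
print(transform(43, 57, 14, 50, 10))  # 40.0 (z = -1.00 in both distributions)
```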


Example of a standardized distribution
Most IQ tests are standardized so that they have the same mean and standard deviation, which makes it possible to compare scores from different tests.

http://www.brainy-child.com/experts/normal-iq-range.shtml
Recap Z-scores

The purpose of z-scores or standard scores is to:


• identify and describe the exact location of each score in a distribution.
• standardize an entire distribution.

We will look at
another purpose of
z-scores next week.
Probability
http://www.e-center.lt/article/statistics-and-probability/
Types of Statistics

Descriptive: statistical procedures used to summarise, organise, and simplify data.
Examples: mean, median, mode, frequency, probability

Inferential: techniques that allow us to study samples and then make generalizations about the population from which they were selected.
Examples: t-test, correlation, ANOVA, regression
Probability
• What is the chance of rain today?
• 100% chance: it will rain
• 0% chance: no rain
• p(event) = a number between 0 and 1
• p(rain) = 0 (no rain)
• p(rain) = 1 (rain is certain)
Probability ranges between 0 and 1, or 0% and 100%.
Probability
• Probability is used to predict the type of samples that are likely to be
obtained from a population.

• Thus, probability establishes a connection between samples and populations.

• Inferential statistics rely on this connection when they use sample data as the
basis for making conclusions about populations.



Probability definition:
• For a situation in which several different outcomes are possible, the probability for any specific outcome is defined as a fraction or a proportion of all the possible outcomes.
• If the possible outcomes are identified as A, B, C, D, and so on, then
$p(A) = \frac{\text{number of outcomes classified as A}}{\text{total number of possible outcomes}}$

• Example: 52 playing cards in a deck
• The probability of selecting one specific card, such as the king of hearts, is 1/52 ≈ 0.02 = 2%.
• The probability of selecting an ace is 4/52 ≈ 0.08 (8%) because there are 4 aces in a deck.
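The definition above is a simple ratio, so Python's fractions.Fraction can keep the exact value before rounding. The helper name probability is hypothetical. The card counts are the ones given in the example.

```python
from fractions import Fraction

def probability(outcomes_classified_as_a, total_outcomes):
    """p(A) = number of outcomes classified as A / total number of possible outcomes."""
    return Fraction(outcomes_classified_as_a, total_outcomes)

p_specific_card = probability(1, 52)  # one particular card, e.g. the king of hearts
p_ace = probability(4, 52)            # four aces in a deck

print(p_specific_card, round(float(p_specific_card), 2))  # 1/52 0.02
print(p_ace, round(float(p_ace), 2))                      # 1/13 0.08
```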
Probability & Random Sampling
• For the definition of probability used here to be accurate, it is
necessary that the outcomes be obtained by a process called
independent random sampling:

• A random sample requires that each individual in the population has an equal chance of being selected.

• An independent random sample requires that each individual has an equal chance of being selected and that the probability of being selected stays constant from one selection to the next if more than one individual is selected.
Probability & the Normal Distribution
• The graph represents the entire population; different portions of the graph represent different portions of the population.
• Probabilities and proportions (%) are equivalent.
• A particular portion of the graph corresponds to a particular probability.


Probability & the Normal Distribution
• The normal distribution is symmetrical, with the highest frequency in the middle and frequencies tapering off as you move toward either extreme.
• Because the locations in the distribution are identified by z-scores, the percentages shown in the figure apply to any normal distribution regardless of the values for the mean and the standard deviation.
• z = +2.00 is an extreme value: the probability of a score beyond it is only p = 0.0228.
Using a unit normal table to answer probability
questions about a normal distribution
• The unit normal table lists several different proportions
corresponding to each z-score location.
• Because probability is equivalent to proportion, the table values can
also be used to determine probabilities.

The table does not list negative z-score values. To find proportions for negative z-scores, you must look up the corresponding proportions for the positive value of z.
A unit normal table
(Use the table to find z-scores, probabilities, proportions and X-values.)
• Column A lists z-score values.
• For each z-score location,
columns B and C list the
proportions in the body
and tail, respectively.
• Finally, column D lists the
proportion between the
mean and the z-score
location.



Example of finding the proportion
• What proportion of the normal distribution corresponds to z-score
values greater than z = 1.00?
• First, you should sketch the distribution and shade in the area you are trying
to determine.
• In this case, the shaded portion is the tail of the distribution beyond z = 1.00.



Example of finding the proportion
To find this shaded area, you simply look for z = 1.00 in column A to
find the appropriate row in the unit normal table. Then scan across the
row to column C (tail) to find the proportion.
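Python's statistics.NormalDist can reproduce a row of the unit normal table, which is useful for checking a lookup like the one above. In this sketch the helper name table_row is hypothetical. Column B is the body, column C is the tail beyond z, and column D is the area between the mean and z. As on the table slide, a negative z is looked up by its positive value.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # mean 0, standard deviation 1

def table_row(z):
    """Columns B, C and D of a unit normal table for the z-score in column A."""
    z = abs(z)                  # the table lists only positive z; proportions are symmetric
    body = unit_normal.cdf(z)   # B: proportion in the body (the larger part)
    tail = 1 - body             # C: proportion in the tail beyond z
    mean_to_z = body - 0.5      # D: proportion between the mean and z
    return body, tail, mean_to_z

body, tail, mean_to_z = table_row(1.00)
print(f"{body:.4f} {tail:.4f} {mean_to_z:.4f}")  # 0.8413 0.1587 0.3413
```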

See Chapters 6 & 7 for more info on Probability.


Large Samples & Probability

• The z-scores and probabilities that we have considered so far are limited to situations in which the sample consists of a single score.

• You are likely to work with large samples containing many scores and will need to compute a z-score to describe an entire sample.
Large Samples & Probability
• In general, the difficulty of working with samples is that a sample provides an incomplete picture of the population.
• Sampling error is the natural discrepancy, or amount of error, between a sample statistic and its corresponding population parameter.
[Figure: several samples selected from one population. These samples will be different: different individuals, scores, means, etc.]
Large Samples & Probability
• Luckily, the set of possible samples forms a relatively simple and orderly pattern that makes it possible to predict the characteristics of a sample.
• The ability to predict the characteristics of a sample is based on the distribution of sample means: the collection of sample means for all the possible random samples of a particular size (n) that can be obtained from a population.
• A distribution of sample means is an example of a sampling distribution, which is a distribution of statistics obtained by selecting all the possible samples of a specific size from a population.
Example of a distribution of sample means
[Figure: infinitely many samples are drawn from the population; their means (M) together form the distribution of sample means]
From Durrheim, K. & Tredoux, C. (Eds.). (2002). The Sampling Distribution of the Mean. In Numbers, Hypotheses & Conclusions: A Course in Statistics for the Social Sciences. Lansdowne: UCT Press.
Characteristics of the Distribution of Sample
Means
• The sample means should pile up around the population mean.

• The pile of sample means should tend to form a normal-shaped distribution.

• The larger the sample size, the closer the sample means should be to
the population mean, μ.
Sampling Distributions (Statistics - Vol 1 - Sect 1)

https://youtu.be/EOlNb1XXC_M
The Central Limit Theorem
(Theorem: a logical argument or chain of reasoning)

• It is usually impossible to obtain every possible random sample from a population.

• BUT, it is possible to determine exactly what the distribution of sample means looks like thanks to the mathematical proposition known as the Central Limit Theorem:

For any population with mean μ and standard deviation σ, the distribution of sample means for sample size n will have a mean of μ and a standard deviation of $\sigma/\sqrt{n}$ and will approach a normal distribution as n approaches infinity.
The Central Limit Theorem
2 simple facts about the CLT:

1. It describes the distribution of sample means for any population, no matter what shape, mean, or standard deviation.

2. The distribution of sample means “approaches” a normal distribution very rapidly; by the time the sample size reaches n = 30, the distribution is almost perfectly normal.
So what does this mean?

• Regardless of the shape of the population distribution, the sampling distribution of the mean will be approximately normally distributed as long as the sample size is not too small. In other words, sample size influences the shape of the distribution, and small samples may not be suitably representative of the population.

• The CLT is used to estimate the accuracy with which a sample mean estimates the population mean.

• Knowing this, we can ask what proportion of samples have a mean greater or smaller than a particular value. We can ask for the probability of a randomly selected sample having a mean less than a particular value; that is, we can compare a sample to the population [this makes inferential statistics possible].
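The Central Limit Theorem can be illustrated, though not proved, with a simulation. The Python sketch below uses a hypothetical, strongly skewed population. It repeatedly draws samples of n = 30 with random.choices, which samples with replacement and so approximates independent random sampling, and it records each sample mean. The mean of those sample means should land close to μ, and their standard deviation should land close to σ/√n. The population, seed and number of repetitions are all assumptions chosen for the illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical, strongly skewed population: mostly low scores, a few high ones
population = [1] * 60 + [2] * 20 + [3] * 10 + [10] * 10
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

n = 30
sample_means = [statistics.mean(random.choices(population, k=n)) for _ in range(5000)]

print(round(mu, 2), round(statistics.mean(sample_means), 2))                   # both close to mu
print(round(sigma / n ** 0.5, 3), round(statistics.pstdev(sample_means), 3))  # both close to sigma / sqrt(n)
```

A histogram of sample_means would look roughly normal even though the population itself is far from normal.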
The Shape of the Distribution of Sample
Means
• The distribution of sample means is almost perfectly normal if either
of the following two conditions is satisfied:

1. The population from which the samples are selected is a normal distribution.

2. The number of scores (n) in each sample is relatively large, around 30 or more.
Central limit theorem | Inferential statistics | Probability and Statistics | Khan Academy

https://youtu.be/JNm3M9cqWyc
Probability and the Distribution of Sample
Means
• The primary use of the distribution of sample means is to find the
probability associated with any specific sample.
• Recall that probability is equivalent to proportion.
• Because the distribution of sample means presents the entire set of all
possible sample means, we can use proportions of this distribution to
determine probabilities.

Although we cannot construct the distribution of sample means by repeatedly taking samples and calculating means, we know exactly what the distribution looks like based on the Central Limit Theorem.
The Mean of the Distribution of Sample Means
• The average value of all the
sample means is exactly equal to
the value of the population mean.

• The mean of the distribution of sample means is always identical to the population mean. This mean value is called the expected value of M.

• In this case, the average value of all the sample means is exactly equal to μ.

http://psychology.illinoisstate.edu/jccutti/psych340/fall02/oldlecturefiles/prob.html
The Standard Error of M
• The standard deviation of the distribution of sample means, $\sigma_M$, is called the standard error of M.

• The standard error provides a measure of how much distance is expected on average between a sample mean (M) and the population mean (μ).

• The standard error:
• describes the distribution of sample means.
• measures how well an individual sample mean represents the entire distribution; specifically, how much distance is reasonable to expect between a sample mean and the overall mean for the distribution of sample means.
The Standard Error of M
The standard error is an extremely
valuable measure - It specifies
precisely how well a sample mean
estimates its population mean.

The standard error tells you how much error you can expect between M and μ.
The Standard Error of M
The magnitude of the standard error is determined by 2 factors:

1. The size of the sample: the law of large numbers states that the larger the sample size (n), the more probable it is that the sample mean will be close to the population mean.

2. The standard deviation of the population from which the sample is selected: the larger σ is, the larger the standard error.

Together these give $\sigma_M = \sigma/\sqrt{n}$.
• There is an inverse relationship between the sample size and the standard error: bigger samples have smaller error, and smaller samples have bigger error.
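A minimal Python sketch of $\sigma_M = \sigma/\sqrt{n}$ shows both factors at work. The population standard deviation of 10 is a hypothetical value. Because of the square root, the sample size must be quadrupled to halve the standard error.

```python
import math

def standard_error(sigma, n):
    """sigma_M = sigma / sqrt(n): the expected distance between M and mu."""
    return sigma / math.sqrt(n)

# Hypothetical population with sigma = 10
for n in (1, 4, 25, 100):
    print(n, standard_error(10, n))
# 1 10.0
# 4 5.0
# 25 2.0
# 100 1.0
```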
The Relationship between Standard Error and
Sample Size

FIGURE 7.3 The relationship between standard error and sample size. As the sample size
is increased, there is less error between the sample mean and the population mean.



Sampling Error
• The general concept of sampling error is that a sample typically will not
provide a perfectly accurate representation of its population.

• More specifically, there typically is some discrepancy (or error) between a statistic computed for a sample and the corresponding parameter for the population.
Standard Error
The standard error provides a way to measure the ‘average’, or standard, distance between a sample mean and the population mean.

It’s like the standard deviation used to describe individual scores in a distribution, except that this is a distribution of sample means, and so we use the concept of standard error.

Whenever you are working with a sample mean, you must use the standard error.
Standard Error
• For each individual sample, you can measure the error (or distance)
between the sample mean and the population mean.

• The standard error provides a way to measure the “average”, or standard, distance between a sample mean and the population mean.

Standard Error is often written as SE or SEM.
Why did I need to know that?
http://www.e-center.lt/article/statistics-and-probability/
Where to from here?
Inferential Statistics: methods that use sample data as the basis for drawing general conclusions about populations.

• The natural differences that exist between samples and populations introduce a degree of uncertainty and error into all inferential processes.
• There is always a margin of error that must be considered whenever a researcher uses a sample mean as the basis for drawing conclusions about a population mean.
• For the rest of the semester we will look at a variety of statistical methods that all use sample means to draw inferences about population means.
References:
• Many of these slides are directly from or adapted from your textbook and
generic textbook slides:

Gravetter, F. J. & Wallnau, L. B. (2017). Statistics for the Behavioral Sciences (10th ed.). Cengage.

Durrheim, K. & Tredoux, C. (Eds.). (2002). The Sampling Distribution of the Mean. In Numbers, Hypotheses & Conclusions: A Course in Statistics for the Social Sciences. Lansdowne: UCT Press.

• Other sources:

See individual videos for the URL.
