
Week 12

This document covers descriptive statistics, focusing on measures of variability such as range, variance, and standard deviation, which indicate how spread out data values are. It also discusses the importance of examining relationships among variables using techniques like correlation coefficients and regression analysis. Additionally, it introduces standardized scores (z-scores) and effect size measures like Cohen's d to interpret differences between group means.

Uploaded by

Mishca Heynemann
Copyright © All Rights Reserved

RES 320

Social Research –
Methodological Thinking
Week 12
Department of Psychology
Faculty of Humanities

Professor Eugene L Davids


Unit 10
Descriptive statistics (Part 2)
Measures of variability
• When you want to find out how much your data values are
spread out (i.e., how different they are)
• you want to know how much variability is present

• A measure of variability is
• a numerical index that provides information about how spread out or how
much variation is present in a variable
• If all of the data values for a variable were the same, there would be
no variability

• 4, 4, 4, 4, 4, 4, 4, 4, 4, 4 – no variability in these numbers


• 1, 2, 3, 3, 4, 4, 4, 6, 8, 10 – there is variability in these numbers

• The more different your numbers, the more variability you have
• Data for group one: 44, 45, 45, 45, 46, 46, 47, 47, 48, 49
• Data for group two: 34, 37, 45, 51, 58, 60, 77, 88, 90, 98

• the data for group two have more variability than the data for group one
• when there is little variability in a group, we say that the scores are
homogeneous
• when the scores show a lot of variability, we say that the scores are
heterogeneous
Measures of variability
• Three types of measures of variability
• Range
• Variance
• Standard deviation
Range
• Simplest, but most crude, measure of variability
• the highest (i.e., largest) number minus the lowest (i.e., smallest) number
in a set of numbers

• Range = H - L
• H is the highest number, and L is the lowest number
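A minimal sketch of the range formula in Python (the slides use no code; the function name `value_range` is hypothetical), applied to the group one and group two data from the earlier example. It shows the range picking up the difference in spread between the homogeneous and the heterogeneous group:

```python
# Range = H - L: the highest value minus the lowest value.
# Data copied from the group one / group two example above.
group_one = [44, 45, 45, 45, 46, 46, 47, 47, 48, 49]
group_two = [34, 37, 45, 51, 58, 60, 77, 88, 90, 98]


def value_range(values):
    """Return H - L for a non-empty list of numbers."""
    return max(values) - min(values)


range_one = value_range(group_one)  # 49 - 44
range_two = value_range(group_two)  # 98 - 34

print(f"Group one range: {range_one}")
print(f"Group two range: {range_two}")
```

Group one has a range of 5 and group two a range of 64, consistent with group two being described as heterogeneous.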
Variance and standard deviation
• Two most popular measures of variability are
• the variance and
• the standard deviation

• They are superior to the range because they take into account
all of the data values for a variable
• provide information about the dispersion or variation around the mean
value of a variable
• Variance is
• the average of the squared deviations of the data values from their mean, so it is expressed in “squared units”
• the variance is popular because it has nice mathematical properties

• To turn the variance into more meaningful units, you can obtain
the standard deviation
• the standard deviation is the square root of the variance
• to calculate the standard deviation,
• you take the square root of the variance (i.e., you put the value of the
variance into your calculator and press the square root key)
• an approximate indicator of the average distance that your data values
are from their mean
• if you have a mean of 5 and a standard deviation of 2, then the data values
tend to be approximately 2 units above or below 5
• For the variance and the standard deviation,
• the larger the value, the greater the data are spread out;
• the smaller the value, the less the data are spread out
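The slides describe the variance and standard deviation in words. Written as formulas (the notation is an assumption, not taken from the slides), with X_i the data values, X̄ their mean and n the number of values:

```latex
\begin{aligned}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i \\
\text{variance (dividing by } n) &= \frac{1}{n}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 \\
s^2 \text{ (dividing by } n - 1) &= \frac{1}{n - 1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 \\
\text{standard deviation} &= \sqrt{\text{variance}}
\end{aligned}
```

The first version matches the description of the variance as an average of squared deviations; the n - 1 version is the usual sample estimate and is what packages such as SPSS and R report by default. Which one your course expects should be checked against the textbook.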
Calculate the range, variance and standard deviation
• Data set
• 14; 72; 52; 15; 19; 36; 58; 25

• Calculate
• Range
• Variance
• Standard deviation

• Round off to 2 decimal places
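One way to check the exercise, sketched in Python. The slides do not say whether the variance should divide by n or by n - 1, so both conventions are computed; which one your course expects is an assumption to confirm against the textbook.

```python
import math

# Data set from the exercise above.
data = [14, 72, 52, 15, 19, 36, 58, 25]

n = len(data)
mean = sum(data) / n                  # 291 / 8
data_range = max(data) - min(data)    # H - L

# Sum of squared deviations from the mean.
ss = sum((x - mean) ** 2 for x in data)

# Two common conventions (which one is expected is an assumption):
var_n = ss / n               # divide by n: average squared deviation
var_sample = ss / (n - 1)    # divide by n - 1: usual sample estimate
sd_n = math.sqrt(var_n)
sd_sample = math.sqrt(var_sample)

print(f"Range: {data_range}")
print(f"Mean: {mean}")
print(f"Variance (n):     {var_n:.2f}   SD (n):     {sd_n:.2f}")
print(f"Variance (n - 1): {var_sample:.2f}   SD (n - 1): {sd_sample:.2f}")
```

Rounded to 2 decimal places, this gives a range of 58, a variance of 421.23 (standard deviation 20.52) when dividing by n, and a variance of 481.41 (standard deviation 21.94) when dividing by n - 1.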


Standard deviation and normal curve
• Normal curve or normal distribution has a bell shape; it is high in
the middle and it tapers off to the left and the right
• if the data were fully normally distributed, you would be able to apply the “68, 95, 99.7 percent rule”
• 68% of the cases fall within one standard deviation from the mean, 95%
fall within two standard deviations, and 99.7% fall within three standard
deviations
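The percentages in the rule can be checked against the theoretical standard normal curve with Python's `statistics.NormalDist` (standard library, Python 3.8+). This is a sketch for illustration, not part of the slides:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0, sigma=1)

# Proportion of cases between k SDs below and k SDs above the mean.
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"Within {k} SD of the mean: {share:.1%}")
```

The exact proportions are about 68.3%, 95.4% and 99.7%, which the rule rounds to 68, 95 and 99.7 percent.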

• In practice, it is important to understand that sample data are never fully normally distributed in the sense of perfectly matching the normal distribution described here
• the normal curve is a theoretical distribution
Z scores
• Researchers sometimes like to convert their observed data into a
type of standardised scores called z scores
• these scores are the values for a variable that have been transformed
from their original “raw scores” into a new “standardized” metric that has
a mean of zero and a standard deviation of one
• convenient because the data values now can be interpreted in terms of
how far they are from their mean

• If a data value is +1.00, one can say that this value falls one
standard deviation above the mean, a value of +2.00 means it falls
two standard deviations above the mean, a value of –1.5 means it
falls one and a half standard deviations below the mean, and so
on
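A minimal sketch of the z score transformation, z = (X - mean) / SD, in Python. The data values are hypothetical, chosen so the results land on round numbers, and the standard deviation here divides by n (`statistics.pstdev`); using the n - 1 version (`statistics.stdev`) would give slightly smaller z scores in absolute value.

```python
import statistics


def z_scores(values):
    """Convert raw scores to z scores: (X - mean) / SD (SD divides by n)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(x - mean) / sd for x in values]


raw = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical raw scores: mean 5, SD 2
z = z_scores(raw)

print(z)
print("mean of z:", statistics.mean(z), " SD of z:", statistics.pstdev(z))
```

The value 9 becomes +2.00 (two standard deviations above the mean of 5) and the value 2 becomes -1.50 (one and a half standard deviations below it), and the transformed scores have a mean of 0 and a standard deviation of 1.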
• “Standardized units” or “z scores” were used with the normal
curve
Calculate the z-score for the highest value
• Data set
• 14; 72; 52; 15; 19; 36; 58; 25

• Round off to 2 decimal places
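A worked version of the exercise, assuming the sample standard deviation (dividing by n - 1) is intended; the slides do not specify which formula to use.

```latex
\begin{aligned}
\bar{X} &= \frac{14 + 72 + 52 + 15 + 19 + 36 + 58 + 25}{8} = \frac{291}{8} = 36.375 \\
s &= \sqrt{\frac{\sum_{i=1}^{8} (X_i - \bar{X})^2}{8 - 1}} = \sqrt{\frac{3369.875}{7}} \approx 21.94 \\
z_{72} &= \frac{72 - \bar{X}}{s} = \frac{72 - 36.375}{21.94} = \frac{35.625}{21.94} \approx 1.62
\end{aligned}
```

If the standard deviation is instead computed by dividing by n (√(3369.875/8) ≈ 20.52), the z score for 72 is 35.625/20.52 ≈ 1.74.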


Examining relationships among variables
• We use independent (or predictor) variables to “explain variance”
in dependent (or outcome) variables

• Determining what independent variables predict or cause changes in dependent variables is perhaps the primary goal of science
Unstandardised and standardised differences between group means
• Most direct and simplest way to determine the magnitude of
difference between two means is to subtract one mean from
another and examine the size of the difference
• unstandardised difference between means

• Using data from TikTok University’s graduate data set, the mean
(i.e., the average) starting salary for males is ZAR 34,791.67, and
the mean starting salary for females is ZAR 31,269.23. Therefore,
the unstandardized difference between these two means is ZAR
34,791.67 minus ZAR 31,269.23, which is ZAR 3,522.44
• What can we deduce from this?
• To assist in deciding how different the group means are,
• the difference between the means is often transformed into a
standardised measure

• For group means, Cohen’s d
• is a popular standardised measure of the difference between the means
• one of many effect size indicators that researchers use
• effect size indicator is a standardized measure of the magnitude or
strength of a relationship between variables
Cohen’s d
As a rough starting point for interpreting d, Cohen defined effect sizes of
d = .2 as “small,”
d = .5 as “medium,” and
d = .8 as “large.”
• What can we deduce using the example of TikTok University’s
graduate data?

Cohen’s d is .88. This says that the mean starting salary for men is .88
standard deviations above the mean starting salary for women.

Using Cohen’s criteria for interpretation, one would consider this a “large”
difference between the means.
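A sketch of Cohen's d in Python. The slides do not give the group standard deviations for the TikTok University example, so the function is tested on small hypothetical groups instead. It assumes the pooled sample standard deviation as the standardiser, which is one common choice among several, and the helper `label_effect` (a hypothetical name) applies Cohen's rough benchmarks quoted above.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardised mean difference: (mean_a - mean_b) / pooled SD.

    Uses the pooled sample standard deviation as the standardiser,
    which is one common variant of Cohen's d.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def label_effect(d):
    """Rough label from Cohen's benchmarks (.2 small, .5 medium, .8 large)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical groups: means 4 and 3, pooled SD 2, so d = 0.5.
print(cohens_d([2, 4, 6], [1, 3, 5]))
# d = .88 from the TikTok University example on the slide.
print(label_effect(0.88))
```

With d = .88 from the TikTok University example, `label_effect` returns “large”, matching the interpretation on the slide.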
Correlation coefficient
• Index indicating the strength and direction of linear relationship
between two quantitative variables
• value ranges from +1.0 to -1.0

• absolute value indicates strength of relationship
• sign indicates direction
• Positive correlation
• correlation in which values of two variables tend to move in the same
direction
• e.g., the more hours students spend studying for a test, the higher their
test grades tend to be

• Negative correlation
• correlation in which values of two variables tend to move in opposite
directions
• e.g., the more hours students spend partying the night before an exam,
the lower their test grades tend to be
• Pearson correlation (r)
• used with two quantitative variables
• only appropriate if the two variables are related in a linear fashion

• Partial correlation
• a technique that involves examining correlation after controlling for one or
more variables

• a scatterplot can be used to judge the strength and direction of a correlation
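A minimal Python sketch of the Pearson correlation and of a first-order partial correlation (correlating x and y after controlling for one variable z, using the standard formula based on the three pairwise correlations). The study and partying data are hypothetical, made up to mirror the positive and negative examples above:

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data for the positive and negative correlation examples.
hours_studied = [1, 2, 3, 4, 5]
grades_study = [50, 55, 65, 70, 80]
hours_partying = [0, 1, 2, 3, 4]
grades_party = [80, 70, 65, 55, 50]

r_study = pearson_r(hours_studied, grades_study)
r_party = pearson_r(hours_partying, grades_party)
print(f"studying vs grade: r = {r_study:.2f}")
print(f"partying vs grade: r = {r_party:.2f}")
```

Because Pearson's r only captures linear relationships, it is worth looking at a scatterplot before trusting the number.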
Scatterplot
Regression analysis
• Use of one or more quantitative independent variables to explain
or predict the values of a single quantitative dependent variable

• Two main types
• simple regression
• involves the use of one independent or predictor variable

• multiple regression
• involves two or more independent or predictor variables
• Prediction is made using the regression equation

• This equation defines the regression line that best fits the pattern
of observations in your data
• slope – how steep the line is
• y-intercept – point where regression line crosses y-axis
• Regression coefficient
• predicted change in the dependent variable (Y) given a one unit change in
the independent variable (X)

• Partial regression coefficient
• the regression coefficient for a predictor in a multiple regression equation (its predicted effect on Y while the other predictors are held constant)
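A sketch of simple regression by ordinary least squares in Python, fitting the regression equation Ŷ = a + bX, where b is the slope (the regression coefficient) and a is the y-intercept. The data and the function names are hypothetical; multiple regression needs matrix methods and is normally run in a statistics package, so it is not shown here.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of Y-hat = a + b*X; returns (intercept a, slope b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx                     # regression coefficient b
    intercept = mean_y - slope * mean_x   # where the line crosses the y-axis
    return intercept, slope


def predict(intercept, slope, x_value):
    """Predicted Y for a given X using the fitted regression equation."""
    return intercept + slope * x_value


# Hypothetical data: hours studied (X) and test grade (Y).
hours = [1, 2, 3, 4, 5]
grades = [50, 55, 65, 70, 80]

a, b = fit_simple_regression(hours, grades)
print(f"Y-hat = {a} + {b} * X")
print("Predicted grade after 6 hours:", predict(a, b, 6))
```

Here the slope of 7.5 reads as a predicted increase of 7.5 grade points for each additional hour of study, and the intercept of 41.5 is the predicted grade at zero hours.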
Contingency table
• Table used to examine relationship between two categorical
variables
• Cells may contain frequencies or percentages
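A minimal sketch of building a contingency table from raw categorical data with Python's `collections.Counter`. The records (gender by whether a student passed) are hypothetical, and each cell shows the frequency followed by the row percentage:

```python
from collections import Counter

# Hypothetical records: (gender, passed the exam?) for ten students.
records = [
    ("female", "yes"), ("female", "yes"), ("female", "no"), ("female", "yes"),
    ("male", "yes"), ("male", "no"), ("male", "no"), ("male", "yes"),
    ("female", "yes"), ("male", "no"),
]

counts = Counter(records)               # frequency of each (row, column) cell
rows = sorted({r for r, _ in records})  # categories of the first variable
cols = sorted({c for _, c in records})  # categories of the second variable

# Print frequencies with row percentages in each cell.
print("gender  " + "  ".join(f"{c:>12}" for c in cols))
for r in rows:
    row_total = sum(counts[(r, c)] for c in cols)
    cells = [f"{counts[(r, c)]:>4} ({counts[(r, c)] / row_total:.0%})" for c in cols]
    print(f"{r:<7} " + "  ".join(f"{cell:>12}" for cell in cells))
```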
Summary
• Measures of variability
• Range
• Variance
• Standard deviation

• Examining relationships
• Cohen’s d
• Correlation
• Regression analysis
• Contingency tables
Thank You
Next Lecture
Unit 11
Inferential statistics (Chapter 15)

Student Evaluation:
Please keep an eye out for this email and complete the evaluation

Revision Lecture:
Complete the Google Form on ClickUp announcements

Exam Focus
Units 7 - 11

Prof Eugene L Davids
Room 11-30 (Humanities Building)
Consultation: Tuesdays 9h00 – 11h00 (by prior email arrangement)
Email: [email protected]
