Study+section+2 4

The document outlines the structure and requirements for a semester test in SOCY 323, focusing on quantitative data analysis. It details the test format, study sections, and expected outcomes, emphasizing the importance of understanding various statistical methods and their applications. Additionally, it covers the types of variables, levels of measurement, and analysis techniques necessary for effective quantitative research.

Uploaded by

Palesa Motshwene
Copyright
© All Rights Reserved

SOCY 323

Social Research Methodology

Study section 2.4

Quantitative data analysis

Prof. Doret Botha


Semester Test
Date: 13 September 2024
Time: 12:00
Venues:
PC-E5: Abed A - Mimba ML
PC-G20: Mjalo MY - Zondo NJ
Duration of Test: 01:30
Total: 50 Marks
Format: Essay format. Write in paragraphs; no bullets should be used.
You will be given a case study, similar to Test 1, and you will need
to answer specific questions and apply them to the context of the
case.
Work to study:
Study section 2.1: The nature of quantitative research
Study section 2.2: Sampling in quantitative research
Study section 2.3: Data collection in quantitative research
Outcomes
Study Unit 2
On completion of this study unit you should be able to:
• critically explain the main characteristics of quantitative social research including its
main features, main steps, the development of measures for concepts, procedures
for checking the reliability and validity of the measurement process, and some
criticisms.
• critically discuss the rationale and principles of sampling in quantitative research;
• differentiate critically between probability and non-probability sampling;
• apply sampling methods and techniques to given case studies;
• critically discuss the nature and extent of structured interviews and self-completed
questionnaires;
• apply the theoretical knowledge in practical assignments by compiling structured
interviews and self-completed questionnaires;
• critically explain the considerations involved in asking the questions that are used in
structured interviews and questionnaires and apply them to given case studies; and
• identify, explain and apply some of the most used methods for analysing quantitative
data.
Study Unit 2 focus

Study section 2.1 The nature of quantitative research
Study section 2.2 Sampling in quantitative research
Study section 2.3 Data collection in quantitative research
Study section 2.4 Quantitative data analysis
Outcomes
Study section 2.4

On completion of this study section you should be able to:

• identify, explain and apply some of the most used methods for analysing
quantitative data
Prescribed readings

Clark, T., Foster, L., Sloan, L. & Bryman, A. 2021. Bryman's Social Research Methods. Cape Town: Oxford University Press. Chapter 13.

Bryman, A. 2012. Social Research Methods. 4th ed. New York: Oxford University Press. Chapter 15.
Introduction
Focuses:
• the distinctions between the different kinds of variables in
quantitative research;
• methods for analysing a single variable at a time (univariate
analysis);
• methods for analysing relationships between two variables (bivariate
analysis);
• methods for analysing relationships between three or more variables
(multivariate analysis);
• the meaning of statistical significance and how to assess it.
• It is important to decide early in the research process which techniques
you will be applying, e.g., when designing the questionnaire.
• Main reasons:
o Not every technique can be applied to every variable
o The size and nature of the sample can affect which analysis
techniques are appropriate
• Keep your research questions in mind: determine the variables
which will be employed and the types of analysis you will conduct
• The types of data you have collected, and what you want to find out,
will dictate whether you conduct univariate, bivariate, or multivariate
analysis
Levels of analysis
Univariate

• Exploring one variable

Bivariate

• Looking at patterns between two variables

Multivariate

• Analysing three or more variables simultaneously


15.2 Approaching quantitative data analysis

© Oxford University Press, 2021


Descriptive and inferential statistics
Descriptive statistics

• Methods used to describe data and their characteristics

Inferential statistics

• Methods to make inferences (estimates or predictions) about what we don’t know



Missing data

• When respondents fail to reply to a question, either by accident or because they do not want to answer it.

n2 Gender
                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Male              82      37,8            38,3                 38,3
          Female           127      58,5            59,3                 97,7
          Other              5       2,3             2,3                100,0
          Total            214      98,6           100,0
Missing   System             3       1,4
Total                      217     100,0
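The difference between the Percent and Valid Percent columns can be checked by hand. The sketch below is a minimal illustration that reuses the counts from the n2 Gender table; the variable names are illustrative assumptions, not part of any statistical package.

```python
# Counts copied from the "n2 Gender" table; names are illustrative.
valid_counts = {"Male": 82, "Female": 127, "Other": 5}
missing = 3

n_valid = sum(valid_counts.values())  # respondents who answered
n_total = n_valid + missing           # all respondents

rows = []
cumulative = 0.0
for category, count in valid_counts.items():
    percent = 100 * count / n_total        # share of ALL respondents
    valid_percent = 100 * count / n_valid  # share of those who answered
    cumulative += valid_percent            # running total of valid percent
    rows.append((category, count, round(percent, 1),
                 round(valid_percent, 1), round(cumulative, 1)))

for row in rows:
    print(row)
```

Percent divides by everyone (217), whereas Valid Percent divides only by those who answered (214), which is why the two columns differ slightly.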
Types of variables

In order to analyse data, the different types of variables should be classified according to levels of measurement.
Levels of measurement
Interval/Ratio

• Scale on which the distances between categories are equal and constant

Ordinal

• Categories that can be ranked

Nominal

• Categories that cannot be ranked

Dichotomous

• Data that have only two categories

15.4 Types of variable



Nominal variables

• Values indicate classes or categories that cannot be ranked
• Each respondent can belong to only ONE category, e.g.
Faculty of Study: Humanities, Health Sciences, Engineering
• The categories do not have order or importance

What is your hair colour?
x Brown
Black
Blonde
Gray
Other

What is your Gender?
Male
x Female
Other
Dichotomous variables

• Variables have only two categories
• Each respondent can belong to only ONE category, e.g.
Did you vote in the election? Yes or No
• The categories do not have order or importance

Do you have work experience?
Yes
x No
Ordinal variables

• Values indicate classes or categories
• These categories can be arranged in an order (high to low, or low to
high), e.g.:
Year of Study (First year, Second year, Third year, Postgraduate)
Do you feel safe on the University campus? (Not at all, A little, Very safe)
• Often Likert scale items. Most Likert scales are classified as ordinal variables.

What is your job satisfaction?
0 Poor
1 Reasonable
2 Good
3 Excellent
Interval variables

• Equal intervals between the categories
• BUT no absolute zero
• Relative comparisons cannot be made (we cannot say it is twice as cold
today as yesterday)
• For example:
• IQ: we cannot say that one student is twice as intelligent as the other, because there is no absolute zero level
• Temperature
Ratio variables
• Equal intervals between the categories, and there is an absolute zero (zero is the lowest possible value)
• Differences can be calculated
• Relative comparisons can be made (she is twice the age of her friend)
• E.g:
• Age
• Weight
• Height
Levels of measurement
Deciding how to categorize a variable



ACTIVITY
A. Determine the level of measurement used in the following question:

B. Determine the level of measurement used in the following question:


ACTIVITY
C. Determine the level of measurement used in the following question:
ACTIVITY
D. Determine the level of measurement used in the following question:
ACTIVITY
E. Determine the level of measurement used in the following question:
Univariate analysis

Refers to the process of analysing one variable at a time. It produces descriptive statistics: numerical representations or summaries of data, which help to give the data meaning.
Frequency tables

Nominal variables are often represented in frequency tables.

Table 1.1: Reasons for visiting the gym

Reason                          n     %
Relaxation                      9    10
Maintain or improve fitness    31    34
Lose weight                    33    37
Build strength                 17    19
TOTAL                          90   100

15.5 Univariate analysis
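A frequency table like Table 1.1 is simply a count of how often each category occurs, expressed as a share of the total. The sketch below is a rough illustration; the list of raw responses is hypothetical and was constructed only so that the counts match Table 1.1.

```python
from collections import Counter

# Hypothetical raw survey responses, built to match the counts in Table 1.1.
responses = (["Relaxation"] * 9
             + ["Maintain or improve fitness"] * 31
             + ["Lose weight"] * 33
             + ["Build strength"] * 17)

counts = Counter(responses)    # n per category
total = sum(counts.values())   # number of respondents

# Map each reason to (n, percentage rounded to a whole number).
table = {reason: (n, round(100 * n / total)) for reason, n in counts.items()}

for reason, (n, pct) in table.items():
    print(f"{reason:<30}{n:>4}{pct:>5}")
print(f"{'TOTAL':<30}{total:>4}{sum(p for _, p in table.values()):>5}")
```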


Bar charts

Use for nominal and ordinal variables; involves non-continuous data.

Pie charts

Use for nominal and ordinal variables; involves non-continuous data.



Histograms

Use for interval and ratio variables; involves continuous data.

With a histogram there is no space between the bars.

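The chart guidance on these slides can be summarised as a simple lookup. The function below is only a sketch of that rule of thumb; the name `suggest_chart` is an illustrative assumption, and dichotomous variables are deliberately left out because the slides do not assign them a chart type.

```python
def suggest_chart(level: str) -> list[str]:
    """Return the chart types the slides recommend for a level of measurement."""
    level = level.lower()
    if level in {"nominal", "ordinal"}:
        return ["bar chart", "pie chart"]  # non-continuous data
    if level in {"interval", "ratio", "interval/ratio"}:
        return ["histogram"]               # continuous data
    raise ValueError(f"no recommendation for level: {level!r}")


print(suggest_chart("ordinal"))
print(suggest_chart("ratio"))
```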


ACTIVITY
F. Interpret the results displayed in the following figure:

[Figure: bar chart, Faculty of Study (%). Categories: Humanities; Natural and Agricultural Sciences; Education; Economic and Management Sciences; Law; Engineering; Health Sciences. Bar values as extracted, not matched to categories: 22,7; 19; 14,7; 13,3; 13,7; 9,0; 7,6]
Measures of central tendency

Use for ordinal / ratio / interval variables.

Mean

• Average score
• Sum all values in the distribution, then divide by the total number of
values

Median

• Middle point within the entire range of values
• E.g., if there are 89 values, we would list them from the smallest
to the largest and then treat the 45th value as the median

Mode

• Most frequently occurring value

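The three measures can be computed with Python's standard `statistics` module. This is a minimal sketch on hypothetical data (monthly gym visits for nine people); the values were chosen so that the mean, median and mode differ.

```python
import statistics

# Hypothetical data: number of gym visits per month for nine respondents.
visits = [1, 2, 2, 2, 3, 4, 4, 5, 13]

mean = statistics.mean(visits)      # sum of values / number of values
median = statistics.median(visits)  # middle value once sorted
mode = statistics.mode(visits)      # most frequently occurring value

# For an odd number of values, the median is the ((n + 1) / 2)-th value.
median_position = (len(visits) + 1) // 2

print(mean, median, mode, median_position)
```

With 89 values the same position formula gives (89 + 1) / 2 = 45, the 45th value mentioned above. Note how the single large value (13) pulls the mean above the median.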


Measures of dispersion
• Indicate the amount of variation in a sample

Range

• The difference between the maximum and the minimum value in a distribution of values associated
with an interval/ratio variable
• Range = highest value - lowest value

Standard deviation

• Essentially the average amount by which individual values differ from the mean (calculated as the square root of the average squared deviation from the mean)
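Both measures of dispersion can be worked out step by step. The sketch below uses hypothetical data (monthly gym visits) and assumes the sample standard deviation, which divides by n - 1; statistical packages commonly report this version.

```python
import math
import statistics

# Hypothetical data: number of gym visits per month for nine respondents.
visits = [1, 2, 2, 2, 3, 4, 4, 5, 13]

# Range = highest value - lowest value
value_range = max(visits) - min(visits)

# Standard deviation by hand: square root of the average squared deviation.
mean = sum(visits) / len(visits)
squared_deviations = [(v - mean) ** 2 for v in visits]
sd_by_hand = math.sqrt(sum(squared_deviations) / (len(visits) - 1))

# The standard library gives the same sample standard deviation.
sd_library = statistics.stdev(visits)

print(value_range, round(sd_by_hand, 3), round(sd_library, 3))
```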
ACTIVITY
G. Interpret the results displayed in the following figure regarding the work
preferences of Gen Zs:

Descriptive statistics

Number   N     Mean   Std. Deviation   Item
19.1     212   4,08   0,985            Working in a casual work environment (i.e., a relaxed, supportive work environment).
19.2     211   4,54   0,698            Doing work that offers a good monthly salary.
19.3     211   4,32   0,936            Having flexible working hours that allows you to balance your work-life (i.e., career) and the demands of your personal life.
19.4     212   3,74   1,229            Having the opportunity to work remotely (i.e., working in any other place outside of the traditional office building, e.g., at home).

The questionnaire employed a 5-point Likert scale including the following
categories: not at all (1), to some extent (2), to a moderate extent (3), to a
large extent (4) and to a very large extent (5).
Boxplots

The boxplot is useful because it provides an indication of both central tendency (the median) and dispersion (the range). It is also helpful in indicating whether there are any outliers.

[Figure: boxplot of minutes spent in gym]
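The numbers behind a boxplot can be sketched as follows. The data are hypothetical, and the 1.5 × IQR rule for flagging outliers is an assumption (a common convention that the slides do not state). Quartile values also depend on the calculation method, so other software may give slightly different quartiles.

```python
import statistics

# Hypothetical data: minutes spent in the gym by nine respondents.
minutes = [20, 30, 35, 40, 45, 50, 55, 60, 150]

# Quartiles: the box runs from q1 to q3, with a line at the median.
q1, median, q3 = statistics.quantiles(minutes, n=4)
iqr = q3 - q1  # interquartile range (height of the box)

# Assumed convention: values beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [m for m in minutes if m < lower_fence or m > upper_fence]

print(median, iqr, outliers)
```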
Presenting data effectively
• Use the correct way of presenting your data. Consider whether the data are
categorical or continuous and then select a suitable way to present it.
• Label the diagram clearly. Give each diagram a clear title. In any graph,
make sure both the X (horizontal) and Y (vertical) axes have labels and also
provide the units of measurement.
• The diagram should be clear. Also limit the number of different shadings in
a single graphic. Include a key (legend) where necessary.
• Include the source. E.g., secondary data.
• Be consistent. E.g., use the same number of decimal places throughout
for the same type of data (0.65 or 0.654, not a mixture of the two).
• Totals and subtotals. Include relevant totals and subtotals in the diagram,
and check that they add up correctly.
Figure 1: Reasons for visiting a gym

Use for nominal and ordinal variables; involves non-continuous data.



Bivariate analysis
• Involves analysing two variables at a time in order to uncover whether or not they are related.
• Thus, these methods simply uncover relationships.
• They do not allow us to infer that one variable causes another (causal inference).
Contingency tables

A contingency table allows us to analyse two variables at the same time.

Table 13.5: Contingency table showing the relationship
between gender and reasons for visiting the gym

                        Gender
                     Male           Female
Reasons            No.     %      No.     %
Relaxation           3     7        6    13
Fitness             15    36       16    33
Lose weight          8    19       25    52
Build strength      16    38        1     2
TOTAL               42             48

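The percentages in Table 13.5 are column percentages: each count is divided by the total for its gender, which makes groups of different sizes comparable. The sketch below reproduces them from the counts; the helper `round_half_up` is an illustrative assumption, used because Python's built-in `round` would turn 12.5 into 12 rather than the 13 shown in the table.

```python
import math

# Counts copied from Table 13.5.
counts = {
    "Relaxation":     {"Male": 3,  "Female": 6},
    "Fitness":        {"Male": 15, "Female": 16},
    "Lose weight":    {"Male": 8,  "Female": 25},
    "Build strength": {"Male": 16, "Female": 1},
}


def round_half_up(x: float) -> int:
    """Round to the nearest whole number, with .5 rounding upwards."""
    return math.floor(x + 0.5)


genders = ("Male", "Female")
column_totals = {g: sum(row[g] for row in counts.values()) for g in genders}

column_pct = {
    reason: {g: round_half_up(100 * row[g] / column_totals[g]) for g in genders}
    for reason, row in counts.items()
}

for reason, pct in column_pct.items():
    print(f"{reason:<16}{pct['Male']:>4}%{pct['Female']:>5}%")
```

Comparing percentages rather than raw counts shows, for example, that 38% of men but only 2% of women visit the gym to build strength.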


Pearson’s r

When both variables are normally distributed, use Pearson's correlation coefficient.

-1 = perfect negative relationship; 0 = no relationship; +1 = perfect positive relationship

The closer the coefficient is to 1 or −1, the stronger the relationship, and the closer it is to 0, the weaker the relationship.

This method works on the assumption that the relationship between the two variables is broadly linear: plotted against each other, the values should form a straight line.

15.6 Bivariate analysis
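To see where the values -1 and +1 come from, Pearson's r can be computed directly from its definition. The sketch below is illustrative only: the function name `pearson_r` and the data are assumptions, and in practice the coefficient is read from statistical software such as SPSS.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How the two variables vary together, and how each varies on its own.
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / math.sqrt(variation_x * variation_y)


# Hypothetical data lying exactly on a straight line.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # increases with hours
falling = [10, 8, 6, 4, 2]  # decreases with hours

print(pearson_r(hours, rising))   # perfect positive relationship
print(pearson_r(hours, falling))  # perfect negative relationship
```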
Spearman’s rho

When both variables are not normally distributed, use Spearman’s rho correlation coefficient.

-1 = perfect negative relationship; 0 = no relationship; +1 = perfect positive relationship

The closer the coefficient is to 1 or −1, the stronger the relationship, and the closer it is to 0, the weaker the relationship.

Spearman's correlation determines the strength and direction of the monotonic relationship between your two variables.

A monotonic relationship: (1) as the value of one variable increases, so does the value of the other variable, OR, (2) as the value of one variable increases, the other variable value decreases.
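Spearman's rho can be understood as a Pearson correlation calculated on ranks instead of raw values. The sketch below illustrates this under that assumption; the function names and data are hypothetical, and tied values receive the average of their ranks.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    variation_x = sum((a - mean_x) ** 2 for a in rx)
    variation_y = sum((b - mean_y) ** 2 for b in ry)
    return co_variation / math.sqrt(variation_x * variation_y)


# Hypothetical data: y always rises with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman_rho(x, y))
```

Because the example relationship is monotonic but curved, rho equals 1 even though the points do not form a straight line.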
Spearman’s rho

When both variables are not normally distributed, use Spearman’s rho correlation coefficient.
• p-value = 0.00, thus p < 0.05
• Thus, there is a relationship between the digital capability of Gen Zs and
their work preferences
• r = 0.267, thus r < 0.3, thus there is a small positive correlation between
the digital capability of Gen Zs and their work preferences
• p > 0.05 (the source prints "p-value = 0.00" here, which cannot be greater than 0.05; the exact value is not recoverable)
• Thus, there is not a relationship between the gender role expectations of
Gen Zs and their work preferences
Multivariate analysis

• Involves analysing three or more variables simultaneously.
• Thus, these methods simply uncover relationships.
• They do not allow us to infer that one variable causes another (causal inference).
Multivariate analysis
• The relationship between two variables might be spurious
– Each variable could be related to a separate, third variable

15.7 Multivariate analysis



Multiple regression
• Assess the strength and direction of a relationship
• Assess the relative influence of individual variables on a dependent variable
• Varies between -1 and +1



Statistical significance

• Useful in exploring how confident we can be that the findings from a sample can be generalised to the population as a whole.
• Tests of statistical significance can only be used in relation to samples that have been selected using probability sampling.

15.8 Statistical significance



Statistical significance
p < 0.05 (p means probability)

• This means we are recognizing that if we drew 100 samples from a population, as many as 5 of them might falsely suggest that there is a relationship when there is none in the population

p < 0.01 (p means probability)

• This means we are recognizing that if we drew 100 samples from a population, only 1 of them might falsely suggest that there is a relationship when there is none in the population



Testing procedure for statistical significance

1. Set up a null hypothesis
2. Decide on a level of statistical significance
3. Use a statistical test
4. Reject/accept null hypothesis

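The final step of the procedure is a simple comparison between the p-value produced by the test and the chosen significance level. The function below is a minimal sketch of that decision rule; the name `decide` and its wording are illustrative assumptions.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a test's p-value with the chosen significance level."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        # Result is statistically significant at the chosen level.
        return "reject the null hypothesis"
    # Not enough evidence of a relationship or difference.
    return "do not reject the null hypothesis"


print(decide(0.001))             # significant at the 0.05 level
print(decide(0.20))              # not significant
print(decide(0.03, alpha=0.01))  # significant at 0.05 but not at 0.01
```

The last call shows why the significance level must be chosen before the test is run: the same p-value leads to different decisions at 0.05 and 0.01.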


The independent samples t-test
is used to compare two sample means from
unrelated groups
Hypothesis: There is a relationship between gender and "NWU Generation Z
undergraduate students’ work preferences".

Group statistics

                       Gender    N     Mean     Std. Deviation   Std. Error
F19_Work preferences   Male       81   4,4794   0,54844          0,06094
                       Female    126   4,6012   0,53402          0,04757

p > 0.05
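The t statistic behind this result can be approximated from the group statistics alone. The sketch below assumes the equal-variances (pooled) form of the test, which the slide does not specify, and the function name `pooled_t` is illustrative. The critical value of about 1.97 for 205 degrees of freedom at the 0.05 level (two-tailed) is likewise an approximation taken from standard t tables.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """t statistic and degrees of freedom, assuming equal variances."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Group statistics from the table (decimal commas written as points).
t, df = pooled_t(81, 4.4794, 0.54844, 126, 4.6012, 0.53402)

# Each group's Std. Error column is its Std. Deviation / sqrt(N).
se_male = 0.54844 / math.sqrt(81)

approx_critical = 1.97  # assumed two-tailed critical value at 0.05
print(round(t, 3), df, abs(t) > approx_critical)
```

Under these assumptions the absolute value of t falls below the critical value, which agrees with the slide's p > 0.05: the difference between the male and female means is not statistically significant.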
Analysis of variance (ANOVA)
is a statistical technique used to check if the means of
two or more groups are significantly different from
each other
Hypothesis: There is a relationship between the socio-economic
status of the family and "NWU undergraduate students’ preparedness
for employment in the 4IR workplace".

Descriptive statistics

                                                                            95% Confidence Interval for Mean
                               N     Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
Struggling to make ends meet    45   4.2587   0.55814          0.08320      4.0910        4.4264        3.00      5.00
Living an adequate life         86   4.2688   0.64328          0.06937      4.1309        4.4067        1.14      5.00
Well off                        97   4.3093   0.54737          0.05558      4.1990        4.4196        1.14      5.00
Affluent                        41   4.3449   0.49998          0.07808      4.1871        4.5028        2.29      5.00
Total                          269   4.2933   0.57264          0.03491      4.2246        4.3621        1.14      5.00

ANOVA

                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups    0.239             3   0.080         0.241   0.867
Within Groups    87.642           265   0.331
Total            87.881           268

p > 0.05
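The columns of the ANOVA table are linked by simple arithmetic, which can be checked as follows. This is a sketch that reuses the sums of squares and group sizes from the tables above; the variable names are illustrative.

```python
# Values copied from the ANOVA and descriptive statistics tables.
ss_between, ss_within = 0.239, 87.642
group_sizes = [45, 86, 97, 41]

df_between = len(group_sizes) - 1               # number of groups - 1
df_within = sum(group_sizes) - len(group_sizes)  # total N - number of groups

# Mean Square = Sum of Squares / df
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F = variation between groups relative to variation within groups
f_ratio = ms_between / ms_within

print(df_between, df_within, round(ms_between, 3),
      round(ms_within, 3), round(f_ratio, 3))
```

An F value this far below 1 means the group means differ less than would be expected from the variation within the groups, in line with Sig. = 0.867 and p > 0.05.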
Spearman's rank-order correlation
measures the strength and direction of
association between two ranked variables
Hypothesis: There is a relationship between "NWU undergraduate
students’ digital skills" and "NWU undergraduate students’
preparedness for employment in the 4IR workplace".

Correlations (Spearman's rho)

                                               F19_Digital_skills   F9_4IR
F19_Digital_skills   Correlation Coefficient   1.000                .283**
                     Sig. (2-tailed)                                0.000
                     N                         270                  270
F9_4IR               Correlation Coefficient   .283**               1.000
                     Sig. (2-tailed)           0.000
                     N                         270                  270

** Correlation is significant at the 0.01 level (2-tailed)
* Correlation is significant at the 0.05 level (2-tailed)
(a) small effect: r = 0.1, (b) medium effect: r = 0.3 and (c) large effect: r > 0.5
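The effect-size guideline can be applied mechanically once the coefficient is known. The sketch below treats the listed values as lower cut-offs for each band, which is an assumption: the slide gives only the three reference points. The "negligible" label for coefficients below 0.1 and the function name `effect_size` are also illustrative.

```python
def effect_size(r: float) -> str:
    """Label the strength of a correlation using the slide's guideline.

    Assumption: the listed values are lower cut-offs for each band.
    """
    size = abs(r)  # strength ignores the direction (sign)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # label assumed; not given on the slide


print(effect_size(0.283))  # coefficient from the correlations table
print(effect_size(-0.45))
```

For the coefficient in the table, 0.283, this gives a small effect, so the relationship is statistically significant but weak.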
ACTIVITY
H. Interpret the results displayed in the following figure:
ACTIVITY
I. Interpret the results displayed in the following figure:
Conclusion

• You need to ensure that your data analysis will be able to address your
research questions.
• You need to think about your data analysis before you begin designing your
research instruments.
• The different techniques of data analysis are suitable for different types of
variable.
• To understand what kind of analysis you can use, you will need to know the
difference between the four types of variable: nominal, ordinal, interval/ratio,
and dichotomous variables.
• It is a good idea to familiarize yourself with software such as SPSS before
you begin designing your research instruments.
• Make sure you are familiar with the techniques introduced in this chapter
and when you can and cannot use them.
Efundi: Study Section 2.4
Individual activity
1. At what stage should you begin to think about the kinds of data analysis you
need to conduct?
2. What are missing data and why do they arise?
3. What are the differences between the four types of variable outlined in this
chapter: interval/ratio; ordinal; nominal; and dichotomous?
4. Why is it important that you can distinguish between the four types of
variables?
5. Imagine that you administered the following four questions in a survey.
What kind of variable would each question generate: dichotomous; nominal;
ordinal; or interval/ratio?
6. What is an outlier and why might it distort the mean and the range?
7. In conjunction with which measure of central tendency would you expect to
report the standard deviation: the mean; the median; or the mode?
8. Can you infer causality from bivariate analysis?
9. Why are percentages crucial when presenting contingency tables?
10. What does statistical significance mean and how does it differ from
substantive significance?
11. What is a significance level?
12. What does the chi-square test achieve?
13. What does it mean to say that a correlation of 0.42 is statistically significant
at p < 0.05?
14. Imagine that you administered the following four questions in a survey.
What kind of variable would each question generate: dichotomous; nominal;
ordinal; or interval/ratio?
Next lecture

Study Unit 2:
Quantitative research

Complete the class activity and bring to class for discussion.


Any questions?

Thank you

© North-West University (2012)
