Measuring Individuals' Misogynistic Attitudes - Development and Validation of The Misogyny Scale

Measuring Individuals’ Misogynistic Attitudes: Development and Validation of the Misogyny Scale
Bettina Rottweiler1 | Paul Gill1

1 Security and Crime Science Department, University College London, United Kingdom
Correspondence
Bettina Rottweiler, Security and Crime Science Department, University College London, 35
Tavistock Square, London WC1H 9EZ, United Kingdom.
Email: [email protected]
Funding information
This study received funding from The European Research Council (ERC) under the European
Union’s Horizon 2020 research and innovation programme (Grant 758834). See
https://www.grievance-erc.com.
This manuscript has not been peer reviewed. Contents may change prior to publication.
Version: 27/07/2021
Word Count: 9919
Abstract
Across three studies based on a nationally representative survey (n = 1500), we developed and
validated the misogyny scale. Initial items were generated from an extensive literature search
and subsequently derived from validated scales assessing internalised misogyny, hostile
sexism, and hostility towards women. Construct and measurement validity were established
across several studies. An exploratory factor analysis (Study 1, n = 750) established the factor
structure of the 10-item misogyny scale. In Study 2 (n = 750), the 10-item structure was
replicated via confirmatory factor analysis. The misogyny scale displayed good convergent
(i.e., significant and strong relationships with male sexual entitlement, masculinity-related
violent beliefs and willingness to use violence) and discriminant validity (i.e., no relationship
with analytical thinking). In Study 3 (n = 750), we established measurement invariance across
gender and age groups. This allows researchers to deploy the scale among male and female
individuals, across different age groups as well as to assess latent mean differences. Significant
latent mean differences for all three latent factors emerged between male and female
participants, demonstrating that men had significantly stronger misogynistic attitudes than
women (MDiff1 = -.482***; MDiff2 = -.324***; MDiff3 = -.197***). The latent mean
differences ranged from small (Cohen’s d2 = .27; Cohen’s d3 = .19) to medium effect sizes
(Cohen’s d1 = .38). The strongest latent mean differences between age groups were found for
the factor ‘manipulative and exploitative nature of women’. Older age groups reported
significantly stronger attitudes relating to this factor than younger participants. The misogyny
scale will allow researchers to explore the psychological antecedents and consequences of
misogyny among population samples and the subsequent findings may have important practical
implications for prevention and intervention programs on violent (extremist) propensity
development.
Keywords: misogyny, scale development, violent extremism, incels, domestic violence, gender-based violence
Introduction
Public discourse on misogyny and its consequences is growing. Broad-based social
movements (e.g., Me Too), violence prevention awareness programs, and highly publicised
instances of harassment and violence against women have consistently brought discussions of
misogyny and related constructs (e.g., toxic masculinity) to the fore. In the United Kingdom,
there are proposals to make misogyny a hate crime under the Domestic Abuse Bill currently
under consideration. At the same time, different research designs demonstrate and argue the
link between misogyny and domestic/family violence (Blake, O’Dean, Lian, & Denson, 2021),
sexual violence (Munsch & Willer, 2012; Leone & Parrott, 2019), harassment (Marwick &
Caplan, 2018), coercive control (Dragiewicz et al., 2018), the celebration of violence (Scaptura,
2019), and violent fantasies (Scaptura & Boyle, 2020).
Increasingly, studies on violent extremism also highlight the role of misogyny (Díaz &
Valji, 2019; Hoffman, Ware, & Shapiro, 2020). Misogynistic worldviews form a core part of
the extreme right’s recruitment (Bjork-James, 2020; Center on Extremism, 2019) and
misogyny has been a fundamental motive within recent far right terrorist attacks (Wilson,
2020), adding to the argument that misogyny may constitute a precursor for different forms of
mass murder, including school shootings (Freeman, 2017; Neiwert, 2017; Wilson, 2018; Lyle
& Esmail, 2019; Muschert, 2007; Tyberg, 2016). Most evidently, misogyny is central to the
‘involuntary celibate’ movement. The violent fringe of this online subculture holds extreme
misogynistic attitudes and advocates for violence against women (Ging, 2017; Maxwell,
Robinson, Williams, & Keaton, 2020). Since 2014, movement advocates, colloquially known
as ‘incels’, have conducted several acts of mass murder in the United States and Canada. These
attacks were explicitly motivated by hatred towards women. Perpetrators stated that they sought
vengeance for being unable to find a romantic partner and for being rejected by women (Bratich
& Banet-Weiser, 2019; Baele, Brace, & Coan, 2019).
Despite the increased public attention and scholarly research, the definitional
boundaries of misogyny remain quite loose. This motivates a finer-grained
measurement of misogyny, as well as an exploration of misogyny’s psychological antecedents
and consequences. Surprisingly, no validated psychometric tools that measure misogyny
amongst males and females exist.¹ This renders it vital to develop a tool that adequately
measures the construct. This paper therefore develops a psychometric scale
assessing misogyny. We do so across three studies, which we now outline in
turn.
Study 1
Study 1 attempted to gain a conceptual and theoretical understanding of misogyny.
Psychometrically sound measures are fundamental to quantitative research. These tools have
to be valid and reliable in order to generate robust findings. Yet, proper scale development
techniques and reporting procedures are often absent or fragmented (Carpenter, 2018).
As a result, methodological inconsistencies mean that standards of scale development vary
(Davidson, Shaw, & Ellis, 2020). We therefore draw on Carpenter’s (2018) ten steps for
scale development to introduce the misogyny scale: (1) research the intended meaning and
breadth of the theoretical concept; (2) determine the sampling procedure; (3) examine data quality;
(4) verify the factorability of the data; (5) conduct common factor analysis; (6) select the factor
extraction method; (7) determine the number of factors; (8) rotate factors; (9) evaluate items based
on a priori criteria; and (10) present results.
¹ For an internalised misogyny scale for homosexual women, see Piggott (2004).
Step 1: Research the intended meaning and breadth of the theoretical concept
Theoretical and conceptual research for scale development. We began the scale
development process by trying to understand the meaning and breadth of the theoretical
concept and, subsequently, to identify potential dimensions of the construct and
related items. If the scale dimensions and items adequately capture the intended representation
of the abstract construct, meaningful measurement can be achieved (Carpenter, 2010; Chaffe,
1991). As such, it is important to ensure the content validity of the construct before conducting the
methodological applications and statistical analyses by taking several steps, such as trying to
understand the extent of the construct and its dimensions. Further steps include careful
conceptualisation, such as finding suitable conceptual definitions, selecting appropriate
conceptual labels for the overall construct and its dimensions, as well as generating and
refining items of the proposed scale. We conducted all of these steps as they have proven
critical in the dimension identification and item generation process (DeVellis, 2012).
The literature review was intended to provide an overview of the concept of ‘misogyny’.
We broadened our literature search to include related concepts in order to gain a more
holistic view of the construct and thus to achieve a greater theoretical and conceptual
understanding. The labelling of the construct and potential subscales affect future
interpretations of the concept (Carpenter, 2018). We defined misogyny as the hatred or
devaluation of, hostility to, or prejudice against women. However, our conceptualisation of
misogyny does not include subtle sexism or gender bias in favour of men. Yet, we chose to
also search for theoretically and conceptually related concepts, such as hostile sexism and more
general hostility towards women. We conducted a literature search via Google Scholar using
search terms such as “misogyny”, “hostile sexism” and “hostility and/or hatred towards
women”.
Scale dimension and item generation. The literature review was further intended
to bring together existing questionnaires that measure misogyny, hostile sexism, or hostility
towards women and, consequently, to examine the different dimensions of the constructs and
to identify potential subscales. While searching the literature for existing scales measuring the
construct of misogyny, we were able to identify only one existing scale that explicitly
measured misogyny. However, the focus of this scale was different to ours, as it was
specifically developed to assess homosexual women’s internalised misogyny (Piggott, 2004).
We further identified validated scales measuring hostile sexism, such as the subscale
‘hostile sexism’ of the Ambivalent Sexism Inventory (Glick & Fiske, 1996), the Hostility
Towards Women Scale (Check, Malamuth, Elias, & Barton, 1985), as well as the Modern
Sexism Scale (Swim, Aiken, Hall & Hunter, 1995). We excluded the subscale ‘benevolent
sexism’ of the Ambivalent Sexism Inventory (Glick & Fiske, 1996), as we found that only the
hostile sexism subscale aligned with our theoretical conceptualisation of misogyny. As such,
we identified and developed the items based on the literature on misogyny specifically and
sexism more generally. We identified several dimensions pertaining to the construct of
misogyny and hostility towards women, yet the most common ones appeared to be related to
the distrust of women, the devaluation of women and the manipulative and exploitative nature
of women.
Generating and refining items. Several steps informed the compilation of items for our
proposed scale. To set up our initial pool of items, we began by listing the existing tools of
misogyny and hostile sexism published up until 2020. After compiling 44 items from 4 existing
instruments that measure misogyny or hostile sexism, each item was reviewed individually. In
the next step, we narrowed these measures down. More specifically, we either kept items
without changing them, modified (e.g., due to slightly diverging conceptualisation), or
removed (e.g., due to repetition/ redundancy or because they seemed unsuitable to capture our
conceptual definition) individual items. We added five further items pertaining to the
manipulative and exploitative nature of women, as these attributes play a fundamental part in
our conceptualisation of misogyny but had not been adequately assessed in previous
measures.
Feedback for scale item refinement. Before the main data collection, we conducted a
pilot test (n = 40) via Prolific in June 2020. Respondents were not sampled based on any pre-
set requirements. The pilot test was run to reduce response burden and to assess the possibility
of measurement error, which can arise due to complex phrasing or language, lack of clarity in
questions or response categories as well as leading or biased questions (Ruel, Wagner, &
Gillespie, 2016). We specifically asked participants whether the wording or meaning of any of
the items was unclear or needed refinement and whether they had any other comments relating
to response burden. None of the participants indicated a lack of clarity in the survey items, nor
did they indicate any sign of response burden. After reviewing the individual items, the pilot test
and peer feedback discussions, 19 of the 49 items remained. These 19 items were generated to
assess the latent construct of misogyny and to create a scale for research purposes. Following
several scale content development stages, we started our main data collection process.
Method EFA
Data collection procedure. We conducted a large scale general population survey in
order to proceed with our scale development process. The primary purpose of this survey was
to collect individual-level data on risk and protective factors for violent extremism. Yet, the
secondary purpose was to collect data on the 19 items pertaining to the construct of misogyny
and subsequently, to conduct the scale development and validation tests. Participants were
recruited via Prolific. After completing the consent form, participants were asked to fill out the
questionnaire. Unless stated otherwise, throughout all studies, all items were measured on a 7-
point scale ranging from 1 (strongly disagree) to 7 (strongly agree). After completing the
questionnaire, the respondents were thanked and debriefed.
Participants. The main data collection took place in July 2020. Participants were part
of a UK nationally representative sample (by age, gender, and ethnicity; n = 1500). We split the
whole sample in half in order to conduct an EFA on one half of the sample (n = 750) and to
run the CFA on the other half of the sample (n = 750). In the EFA sample, 51.2% (n = 384)
identified as female and 48.8% (n = 366) identified as male (Mage = 45.02; SDage = 16.46). The
majority of participants (n = 644; 85.5%) indicated ‘White’ as their ethnicity. This was
followed by 7.7% (n = 58) who stated ‘Asian’, 2.9% (n = 22) who identified as ‘Black’,
2% (n = 15) who identified as ‘Mixed’, and 1.5% (n = 11) who answered ‘Other’.
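The split-half design described above can be sketched as follows. The text does not state how the two halves were drawn, so the random assignment, the fixed seed, and the function name `split_half` are all assumptions made for illustration.

```python
import random

def split_half(participant_ids, seed=2020):
    """Randomly split participants into two equally sized halves.

    Random assignment and the seed value are assumptions; the source only
    states that the sample of 1500 was split into two halves of 750.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)  # local generator keeps the split reproducible
    rng.shuffle(ids)
    midpoint = len(ids) // 2
    return ids[:midpoint], ids[midpoint:]

# Hypothetical participant identifiers 1..1500
efa_ids, cfa_ids = split_half(range(1, 1501))
print(len(efa_ids), len(cfa_ids))
```

Keeping the EFA and CFA halves disjoint is what allows the factor structure found in one half to be tested independently in the other.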
Step 2: Determine sampling procedure
The sample of n = 750 exceeds previously recommended guidelines of a minimum of
300 participants (Worthington & Whittaker, 2006). However, some call for abandoning
absolute sample size thresholds and instead relying on respondent-to-item ratios to determine sufficient sample sizes
(Osborne, 2014). Costello and Osborne (2005) suggest a 20:1 ratio of respondents to items,
as they found that samples of this size produced the most robust and correct solutions.
Our sample size translated into a ratio of roughly 39:1 (750 respondents to 19 items), which should allow us to achieve robust and generalisable
results.
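The ratio check in this step is simple arithmetic and can be reproduced as below. The sample size (750), the item count (19), and the 20:1 guideline come from the text; the function name is a hypothetical helper.

```python
def respondents_per_item(n_respondents, n_items):
    """Return the number of respondents available per scale item."""
    return n_respondents / n_items

ratio = respondents_per_item(750, 19)   # EFA half-sample and preliminary item pool
meets_guideline = ratio >= 20           # 20 respondents per item (Costello & Osborne, 2005)
print(f"{ratio:.1f} respondents per item; guideline met: {meets_guideline}")
```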
Step 3: Examine Data Quality
After data collection ended, we manually reviewed the dataset to ensure data quality
and to examine any missing data. We examined whether participants had missed attention
checks and we also reviewed the completion time for each respondent. We excluded
participants from our data analysis if they missed more than one attention check or if they
completed the survey more than two standard deviations quicker than the average survey
completion time. We also assessed the ‘Bot Detection’ review. None of the participants was
flagged as a potential bot. There were no missing data or responses on the misogyny items.
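A minimal sketch of the exclusion rule follows. It assumes that either criterion alone is sufficient for exclusion and that the sample standard deviation is used; the field names (`id`, `failed_checks`, `seconds`) and all data values are hypothetical.

```python
from statistics import mean, stdev

def flag_exclusions(participants, max_failed_checks=1, sd_cutoff=2.0):
    """Return ids of participants who fail either data-quality criterion.

    Assumption: a participant is excluded if they missed more than
    `max_failed_checks` attention checks OR finished more than `sd_cutoff`
    sample standard deviations faster than the mean completion time.
    """
    times = [p["seconds"] for p in participants]
    fastest_allowed = mean(times) - sd_cutoff * stdev(times)
    excluded = []
    for p in participants:
        failed_attention = p["failed_checks"] > max_failed_checks
        too_fast = p["seconds"] < fastest_allowed
        if failed_attention or too_fast:
            excluded.append(p["id"])
    return excluded

# Hypothetical completion times (seconds) and attention-check failures
times = [600, 620, 580, 610, 590, 605, 595, 615, 585, 600, 100]
failed = [0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0]
sample = [
    {"id": i + 1, "seconds": t, "failed_checks": f}
    for i, (t, f) in enumerate(zip(times, failed))
]
excluded_ids = flag_exclusions(sample)
print(excluded_ids)
```

In this toy data, participant 3 is flagged for two failed attention checks and participant 11 for an implausibly short completion time; a single failed check (participant 2) is tolerated.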
An exploratory factor analysis (EFA) was conducted to evaluate the factor structure of the
19 items that comprised the preliminary misogyny scale. We conducted the EFA in R, using the
‘psych’ package (Revelle, 2020). Further reliability analyses were conducted with the R package ‘multilevel’ (Bliese,
2016). We decided to apply principal axis factoring (PAF) rather than the maximum likelihood method, as the former
constitutes a more robust method and is recommended when the normality assumption is
violated (Costello & Osborne, 2005).
Results EFA
Step 4: Verify the Factorability of the Data. The first step was to verify the factorability
of the data. Bartlett’s test of sphericity is expected to be significant at p < .05, and the Kaiser-
Meyer-Olkin (KMO) measure of sampling adequacy with a value of ≥ .60 is recommended
before proceeding with the exploratory factor analysis (Tabachnick & Fidell, 2007). Bartlett’s
chi square test, χ2 (19) = 503.91, p < .001, and KMO = .95 were inspected and demonstrated
very good common variance as well as multivariate normality of the set of distributions,
thereby verifying the factorability of the misogyny scale. Second, we inspected the correlation
matrix. Carpenter (2018) suggests that inter-item correlations should be ≥ .30. Items that do
not reach this threshold should be considered for deletion, if it makes theoretical sense to do so.
All items correlated ≥ .30.
Steps 5-7: Conduct Factor Analysis, Select Factor Extraction Method and Determine
Number of Factors. Next, we conducted an EFA using the principal axis factoring (PAF)
method and we ran a parallel analysis to establish how many factors to retain. Parallel analysis
is one of the most accurate factor retention methods (Hayton, Allen, & Scarpello, 2004; Kline,
2013). Parallel analysis compares eigenvalues of the EFA sample against a randomly ordered
data set. Factors are retained if the sample’s eigenvalues are larger than the ones pertaining to
the random dataset (Carpenter, 2018). Based on the parallel analysis scree plot (Supplementary
Materials A), a 5-factor solution was initially extracted.
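The retention rule used in parallel analysis can be sketched as below. The eigenvalues are hypothetical and chosen only to illustrate the comparison; they are not the values behind the reported scree plot, and the function name is an assumption.

```python
def n_factors_to_retain(observed_eigenvalues, random_eigenvalues):
    """Count leading factors whose observed eigenvalue exceeds the
    corresponding eigenvalue obtained from random data."""
    retained = 0
    for observed, simulated in zip(observed_eigenvalues, random_eigenvalues):
        if observed <= simulated:
            break  # stop at the first factor that does not beat chance
        retained += 1
    return retained

# Hypothetical eigenvalues, sorted from largest to smallest
observed = [6.10, 1.40, 0.95, 0.40, 0.30]
simulated = [1.25, 1.18, 1.12, 1.07, 1.02]
print(n_factors_to_retain(observed, simulated))
```

In practice the random-data eigenvalues are usually averages (or upper percentiles) over many simulated datasets of the same size as the sample.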
Step 8: Rotate Factors
Next, we chose an oblique rotation technique, Promax, based on the assumption that
the factors should be related to one another. Promax has been argued to be more robust than
the Direct Oblimin rotation method, and thus is recommended (Thompson, 2004).
Step 9: Evaluate Items based on a Priori Criteria. We based scale item selection on
several a priori criteria to decide which items to retain or delete. This was necessary in order
to ensure consistency across the item selection process. We followed recommended guidelines
(e.g., Kline, 2013; Tabachnick & Fidell, 2007; Worthington & Whittaker, 2006). First, items
had to display a minimum factor loading in order to be retained. We set the minimum
loading at .50, although Carpenter (2018) suggests loadings above .32 are acceptable.
Further, items which cross-loaded on another factor above .32 were excluded. The next
inclusion criterion was a minimum of three items per factor. Factors with fewer than three
items would be discarded. Additionally, we assessed items based on their theoretical
convergence. More specifically, we examined whether individual items, loading onto the same
factor, were found to demonstrate a clear conceptual grouping. Lastly, we retained or omitted
items based on the principle of parsimony, which aimed to minimise the redundancy of
wording or meaning across items. Non-parsimonious items were dropped (DeVellis, 2012).
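The three numerical criteria (minimum loading, maximum cross-loading, minimum items per factor) can be expressed as a small screening routine. This is a sketch under stated assumptions: the loading matrix and item names are hypothetical, and the judgement-based criteria (theoretical convergence and parsimony) are deliberately left out because they cannot be reduced to a threshold.

```python
MIN_LOADING = 0.50           # primary loading below this is treated as weak
MAX_CROSS_LOADING = 0.32     # any secondary loading above this is a cross-loading
MIN_ITEMS_PER_FACTOR = 3     # factors with fewer items are discarded

def screen_items(loadings):
    """Apply the numerical retention criteria to a loading matrix.

    `loadings` maps an item name to its loadings on each factor.
    Returns a dict mapping factor index to the list of retained items.
    """
    by_factor = {}
    for item, row in loadings.items():
        magnitudes = [abs(value) for value in row]
        primary = max(range(len(row)), key=lambda j: magnitudes[j])
        secondary = [m for j, m in enumerate(magnitudes) if j != primary]
        if magnitudes[primary] < MIN_LOADING:
            continue  # weak primary loading
        if any(m > MAX_CROSS_LOADING for m in secondary):
            continue  # cross-loading on another factor
        by_factor.setdefault(primary, []).append(item)
    return {
        factor: items
        for factor, items in by_factor.items()
        if len(items) >= MIN_ITEMS_PER_FACTOR
    }

# Hypothetical pattern loadings on three factors
example_loadings = {
    "item_a": [0.80, 0.10, 0.05],
    "item_b": [0.75, 0.05, 0.12],
    "item_c": [0.70, 0.20, 0.01],
    "item_d": [0.45, 0.10, 0.10],  # weak loading
    "item_e": [0.60, 0.40, 0.05],  # cross-loading
    "item_f": [0.05, 0.85, 0.10],
    "item_g": [0.10, 0.78, 0.02],  # its factor ends up with only two items
}
retained = screen_items(example_loadings)
print(retained)
```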
Two items showed loadings < .50 and were therefore removed.
A further two items were omitted as they yielded cross-loadings > .32. One factor consisted of
only two items and had to be excluded, as a minimum of three items per factor is required.
Three items loaded onto the same factor, yet there was no clear conceptual grouping (‘Women
always feel offended’; ‘I believe that most women do not tell the truth’; ‘The intellectual
leadership should be in the hands of men’). The last of these items also showed a weak
loading. These three items indicated poor theoretical convergence and were therefore also
dropped. All retained items and corresponding factors were found to be parsimonious.
After deleting the above-mentioned items, 10 items remained (see Table 1 for the misogyny
scale after EFA).
Step 10: Present Results. Finally, we re-ran the EFA on the remaining 10 items using
principal axis factoring analysis with Promax rotation. The results indicated a three-factor
solution. Parallel analysis indicated that these three factors exceeded chance values and were
above the simulated data (see Supplementary Materials B). No cross-loadings (>
.32) or weak loadings (< .50) remained. As a result, 3 factors and 10 items remained: factor 1
comprised four items, while factor 2 and factor 3 consisted of three items each (see
Table 1 for the finalised scale). The sums of squared loadings are the factors’ variances after
extraction. Sums of squared loadings of 2.90, 2.37, and 1.53 emerged, representing 29%,
23.7% and 15.3% of the variance, respectively, and explaining 68% of the total variance.
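The percentages above follow directly from the sums of squared loadings: with 10 standardised items, the total variance equals 10, so each factor's share is its sum of squared loadings divided by the number of items. A short check using the values reported in the text (variable names are illustrative):

```python
ss_loadings = [2.90, 2.37, 1.53]   # sums of squared loadings reported for factors 1-3
n_items = 10                       # each standardised item contributes variance 1

percent_per_factor = [round(100 * ss / n_items, 1) for ss in ss_loadings]
percent_total = round(100 * sum(ss_loadings) / n_items, 1)

print(percent_per_factor, percent_total)
```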
Inter-item reliability indices examine scale homogeneity and assess the level of
consistency between multiple items measuring the same underlying construct. For instance,
corrected item-total correlations are widely accepted item indices to assess item-score
reliability (Zijlmans, van der Ark, Tijmstra, & Sijtsma, 2018). A value of ≥ .30 per item for the
item-total correlation is considered to be sufficient, but researchers should aim for .30 – .70 to
achieve a greater degree of homogeneity (de Vaus, 2004). Our corrected item-total correlations
ranged between .57 – .80, indicating good scale homogeneity. We further assessed the
communalities of items. Communalities (h2) are the sum of squared factor loadings for the
variables. A communality indicates the proportion of each item’s variance that can be
explained by the factors (i.e., the underlying latent construct). Communalities are considered
satisfactory if they range between .40 and .70 (Ibid). All item communalities of the misogyny
scale ranged between .44 and .82 (Table 1).
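Both item indices described in this paragraph can be computed by hand, as sketched below. The response data and loadings are hypothetical. The communality function follows the definition given in the text (sum of squared loadings), which is exact for uncorrelated factors; with an oblique rotation the factor correlations would also enter the calculation, so treat this as a simplification.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def corrected_item_total(responses, item_index):
    """Correlate one item with the sum of all OTHER items.

    Removing the item from the total is what makes the correlation
    'corrected': the item is not correlated with itself.
    """
    item = [row[item_index] for row in responses]
    rest = [sum(row) - row[item_index] for row in responses]
    return pearson(item, rest)

def communality(item_loadings):
    """Sum of squared loadings of one item across all factors."""
    return sum(value ** 2 for value in item_loadings)

# Hypothetical responses: one row per respondent, one column per item (1-7 scale)
responses = [[1, 2, 1], [2, 2, 3], [4, 5, 4], [6, 5, 7], [7, 7, 6]]
r_item_1 = corrected_item_total(responses, 0)
h2_example = communality([0.6, 0.3, 0.1])
print(round(r_item_1, 3), round(h2_example, 2))
```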
Inter-item correlations were also examined, as they are an essential element of
scale validity tests. Inspecting the inter-item correlation matrix is a fundamental part of
examining item redundancy. Correlations below .20 or above .80 are problematic, and ideally
the values should range between .20 and .50 (Cohen & Swerdlik, 2005). Our inter-item
correlations varied between .32 and .76 (Supplementary Materials C). The misogyny scale yielded
an average inter-item correlation of .55, which is satisfactory.
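Screening an inter-item correlation matrix against the .20 and .80 cut-offs can be sketched as follows. The matrix is hypothetical and the function name is an assumption; only the cut-off values come from the text.

```python
from itertools import combinations

def screen_inter_item(corr, low=0.20, high=0.80):
    """Return the average off-diagonal correlation and the item pairs
    whose correlation falls outside the [low, high] range."""
    pairs = list(combinations(range(len(corr)), 2))
    values = [corr[i][j] for i, j in pairs]
    flagged = [(i, j) for i, j in pairs if not (low <= corr[i][j] <= high)]
    return sum(values) / len(values), flagged

# Hypothetical symmetric correlation matrix for four items
example_corr = [
    [1.00, 0.55, 0.62, 0.15],
    [0.55, 1.00, 0.85, 0.40],
    [0.62, 0.85, 1.00, 0.48],
    [0.15, 0.40, 0.48, 1.00],
]
average_r, flagged_pairs = screen_inter_item(example_corr)
print(round(average_r, 2), flagged_pairs)
```

Pairs above the upper cut-off suggest redundant items, while pairs below the lower cut-off suggest items that may not belong to the same construct.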
Table 1. Misogyny Scale final item selection and factor loadings obtained with exploratory factor analysis (EFA) in Study 1 (n = 750).
No. | Dimension | Item | Factor 1 | Factor 2 | Factor 3 | M (SD) | Skewness | Corrected item-total correlation | h2
1 | Manipulative and exploitative nature of women | Women seek to gain power by getting control over men | .58 | | | 2.92 (1.64) | .46 | .75 | .63
2 | Manipulative and exploitative nature of women | Women use their sexuality to manipulate men | .92 | | | 3.70 (1.75) | -.10 | .69 | .68
3 | Manipulative and exploitative nature of women | Women exploit men for their own agendas | .88 | | | 2.75 (1.64) | .58 | .81 | .80
4 | Manipulative and exploitative nature of women | If things don’t go their way, women will play the victim | .82 | | | 2.85 (1.72) | .56 | .80 | .78
5 | Distrust of women | It is generally safer not to trust women too much | | .83 | | 2.15 (1.40) | 1.27 | .74 | .70
6 | Distrust of women | When it comes down to it a lot of women are deceitful | | .94 | | 2.37 (1.52) | 1.00 | .79 | .82
7 | Distrust of women | I think that most women would lie just to get ahead | | .61 | | 2.46 (1.54) | .95 | .81 | .72
8 | Devaluation of women | I think I get a raw deal from women in my life | | | .60 | 2.06 (1.33) | 1.27 | .62 | .52
9 | Devaluation of women | Sometimes women bother me by just being around | | | .96 | 1.71 (1.19) | 2.01 | .55 | .67
10 | Devaluation of women | I feel uncomfortable when a woman dominates the conversation | | | .54 | 1.68 (1.08) | 1.79 | .57 | .44
Note. Further displayed are the communalities (h2), corrected item-total correlations and additional descriptive statistics for all scale items.
The misogyny scale is a 3-factor scale, whereby factor 1 contains four items and factor
2 as well as factor 3 each are composed of three items, reflecting underlying aspects of a
misogynistic belief system (MScale = 2.46, SD = 1.15). Each factor refers to a different, yet
related aspect of the overall latent construct. The subscale naming logic aimed to identify an
overarching ‘concept’ linking the individual items of each factor. We also compared our factor
naming with the subscale labels of the hostile sexism and the internalised misogyny
instruments (Glick & Fiske, 1996; Piggott, 2004). Factor 1 ‘Manipulative and exploitative
nature of women’ included items that addressed individuals’ attitudes about the manipulative
and exploitative nature of women. Factor 2 ‘Distrust of women’ reflected a general
distrust towards women. Lastly, Factor 3 ‘Devaluation of women’ focused on items which
referred to a general devaluation and derogation of women. All factors showed a strong positive
correlation with one another. The 10-item misogyny scale displayed good factor loadings,
satisfactory inter-item reliabilities (i.e., communalities h2 and corrected item-total correlations) as well
as very good internal consistency (i.e., McDonald’s ω) for each subscale and thus provides
a psychometric instrument that represents misogynistic beliefs among a general population
sample (see Table 2).
Table 2. McDonald’s ω, means, SDs, and correlations between the three dimensions of the
misogyny scale.
Subscale | McDonald’s ω | M (SD) | Factor 1 | Factor 2 | Factor 3
1. Manipulative and exploitative nature of women | .91 | 3.05 (1.49) | - | |
2. Distrust of women | .89 | 2.32 (1.35) | .75*** | - |
3. Devaluation of women | .80 | 1.82 (.99) | .57*** | .66*** | -
Note: ***p < .001. Correlation coefficient r is reported.
Study 2
In Study 2, we aimed to replicate the 3-dimensional factor structure of the misogyny
scale via confirmatory factor analysis (CFA) and we further assessed the convergent and
discriminant validity of the scale to confirm the structural and external aspects of construct
validity. More specifically, we analysed the relationship between the misogyny scale and other
constructs, which have been shown in past research to correlate with misogynistic and hostile
sexist beliefs or where no significant relationship has been found and/or there is no theoretical
reason to hypothesise such a relationship. The internal consistency (i.e., composite reliability)
of the scale was also examined along with the average variance extracted (AVE).
Method CFA
Participants. As mentioned above, participants were part of a UK nationally
representative sample (by age, gender, and ethnicity; n = 1500) surveyed in July 2020. We ran
the CFA on the other half of the total sample (n = 750). In the CFA sample, 51.3% (n = 385)
identified as female and 48.7% (n = 365) identified as male (Mage = 44.82; SDage = 15.36). Out
of all participants, 84% (n = 630) stated ‘White’ as their ethnicity, 7.6% (n = 57) answered
‘Asian’, 4.4% (n = 33) identified as ‘Black’, 2.1% (n = 16) as ‘Mixed’, and 1.9% (n =
14) answered ‘Other’. The ratio of respondents to items was again 39:1, which exceeds the rule of 10–
20 participants per item when conducting a CFA and thus should ensure robust results
(Schumacker & Lomax, 2015).
Procedure. To validate our newly developed theoretical three-dimensional construct of
misogyny, we applied a CFA. We ran the CFA on the second half (n = 750) of the total sample
(nTotal = 1500) in order to confirm the structure of the proposed scale, which we obtained from
the EFA (Study 1) (Worthington & Whittaker, 2006). We ran the models in
R using the packages ‘lavaan’ (Rosseel, 2020) and ‘semTools’ (Jorgensen,
2020). We evaluated multiple fit indices (i.e., χ2/df, CFI, TLI, RMSEA, SRMR) to accept or
reject our proposed and alternative models. Model fit was accepted if the χ2/df ratio was less
than three (Byrne, 2001), Comparative Fit Index (CFI) ≥ .90, Tucker Lewis index (TLI) ≥ .90,
Root Mean Square Error of Approximation (RMSEA) ≤ .08, Standardized Root Mean Square
Residual (SRMR) ≤ .08 (Hu & Bentler, 1999). We applied a robust estimator as the data
displayed a skewed distribution, violating the normality assumption. As such, we conducted a
maximum likelihood estimation with robust standard errors and a Satorra-Bentler scaled test
statistic (Rosseel, 2020). Where available, we report the robust fit indices.
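The acceptance rule can be written as an explicit checklist, sketched below. The thresholds are those stated in the paragraph above, and the example values are the 3-factor model's fit statistics as reported in the CFA results; the function name and dictionary keys are hypothetical.

```python
def evaluate_fit(chi2, df, cfi, tli, rmsea, srmr):
    """Check each fit index against the cut-offs stated in the text."""
    return {
        "chi2/df < 3": chi2 / df < 3,
        "CFI >= .90": cfi >= 0.90,
        "TLI >= .90": tli >= 0.90,
        "RMSEA <= .08": rmsea <= 0.08,
        "SRMR <= .08": srmr <= 0.08,
    }

# Values reported for the 3-factor model (robust indices where available)
checks = evaluate_fit(chi2=94.55, df=32, cfi=0.982, tli=0.975, rmsea=0.051, srmr=0.029)
model_accepted = all(checks.values())
print(checks, model_accepted)
```

Returning the individual checks, rather than a single verdict, makes it easy to see which index fails when a model is rejected.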
Results CFA
The 3-factor model showed very good fit: χ2 (32) = 94.55, p < .001, χ2/df ratio = 2.95;
CFIRobust = .982, TLIRobust = .975, RMSEARobust = .051; SRMR = .029 and thus, model fit was
accepted. Factor loadings (λ1-9) ranged from .71 - .92, demonstrating strong factor loadings
(see Figure 1).
Figure 1. Confirmatory factor analysis of the 3-factor Misogyny Scale (Study 2)
Note: Standardised coefficients are shown. All beta coefficients were statistically significant
(all p < .05).
Further, to ensure that the proposed model was the best-fitting model, the 3-dimensional
scale was compared to a unidimensional model, whereby all items loaded onto one factor. As
expected, the 1-factor model displayed poor model fit: χ2(35) = 337.35, p < .001, χ2/df ratio =
9.64; CFIRobust = .913, TLIRobust = .889, RMSEARobust = .107; SRMR = .055. We further ran an
ANOVA on both models to see whether the χ2 difference test was significant, which would indicate a
statistically significantly worse fit of the alternative model, and we compared the
alternative fit indices across models. Due to the non-normality of the data, we applied the
Satorra-Bentler scaled chi-square difference test (Satorra & Bentler, 2001). A significant χ2 difference test
and a drop in fit indices beyond the cut-offs (ΔCFI = .01, ΔTLI = .01, ΔRMSEA = .015, ΔSRMR = .03)
would indicate that the alternative 1-factor model fit the data significantly worse and thus
would be rejected. Changes in χ2 and fit indices are denoted by ‘Δ’. The χ2 difference test
was significant at p < .001 (Δχ2 = 242.80; Δdf = 3), and drops in fit indices emerged:
ΔCFIRobust = .069, ΔTLIRobust = .086, ΔRMSEARobust = .056, ΔSRMR = .026. The 3-factor
model yielded a significantly better fit than the alternative factor solution. We accepted the
original 3-factor construct and rejected the 1-factor model. Therefore, the misogyny scale
is best conceptualised as a multidimensional model with three underlying factors, representing
three underlying dimensions of misogynistic attitudes related to: (1) the manipulative and
exploitative nature of women, (2) the distrust towards women, and (3) the devaluation of
women.
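The reported changes in fit can be reproduced from the two sets of fit statistics, as sketched below. Note the limitation: this computes plain differences only. The Satorra-Bentler scaled difference test itself requires the scaling correction factors of both models, which are not given in the text, so no p-value is computed here. Dictionary keys and the function name are hypothetical.

```python
three_factor = {"chi2": 94.55, "df": 32, "CFI": 0.982, "TLI": 0.975, "RMSEA": 0.051, "SRMR": 0.029}
one_factor = {"chi2": 337.35, "df": 35, "CFI": 0.913, "TLI": 0.889, "RMSEA": 0.107, "SRMR": 0.055}

def fit_deltas(model_a, model_b):
    """Absolute difference between two models on every shared fit statistic."""
    return {key: round(abs(model_a[key] - model_b[key]), 3) for key in model_a}

deltas = fit_deltas(one_factor, three_factor)
print(deltas)
```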
The composite reliability (CR), McDonald’s ω, is a less biased estimate of reliability
than Cronbach’s alpha (α) as it takes into account the strength of association between items
and constructs as well as item-specific measurement errors (Zinbarg, Revelle, Yovel, & Li,
2005). An acceptable value for McDonald’s ω is .7 and above. The internal consistencies for
each factor of the misogyny scale ranged from good to excellent. AVE values ≥ .6 are
considered good, whereas values ≥ .5 are acceptable (Hayes & Coutts, 2020). The AVE was
.60 or above for all factors. Correlations between all factors were moderate to strong (see
Table 3). Correlations between manifest variables as well as latent correlations between latent
factors can be found in the Supplementary Materials D.
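For readers who want to see how these two indices relate to factor loadings, the sketch below uses the standard formulas for a single factor with standardised loadings and uncorrelated residuals. Those modelling conditions are assumptions, the loadings are hypothetical, and the results are not meant to reproduce the values in Table 3, which come from the fitted model.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardised loadings of a factor's items."""
    return sum(value ** 2 for value in loadings) / len(loadings)

def composite_reliability(loadings):
    """Composite reliability (omega) for one factor.

    Assumes standardised loadings and uncorrelated residuals, so each
    item's error variance is 1 minus its squared loading.
    """
    true_score_variance = sum(loadings) ** 2
    error_variance = sum(1 - value ** 2 for value in loadings)
    return true_score_variance / (true_score_variance + error_variance)

# Hypothetical standardised loadings for a three-item factor
example_factor_loadings = [0.80, 0.85, 0.75]
ave = average_variance_extracted(example_factor_loadings)
omega = composite_reliability(example_factor_loadings)
print(round(ave, 3), round(omega, 3))
```

Unlike Cronbach's alpha, this formula weights each item by its own loading, which is why it is less biased when loadings differ across items.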
Table 3. Correlations between the three dimensions of the misogyny scale as well as composite
reliability, means and SDs (Study 2).
Dimensions | McDonald’s ω | M (SD) | AVE | F1 | F2 | F3
F1 | .90 | 3.14 (1.48) | .69 | - | |
F2 | .88 | 2.36 (1.34) | .71 | .79*** | - |
F3 | .77 | 1.86 (1.05) | .60 | .62*** | .68*** | -
Note: ***p < .001. AVE = Average variance extracted. Correlation coefficient r is reported.
Convergent and discriminant validity of the misogyny scale. Convergent and
discriminant validity assessment are both fundamental aspects of construct validity (Piedmont,
2014). Convergent validity refers to how strongly a construct is related to measures of other
latent constructs that are theorised to have causal relationships. Conversely, discriminant
validity describes how well a measure performs in not being associated with theoretically
dissimilar and unrelated concepts (Chin & Yao, 2014).
To test whether the misogyny scale has convergent validity, we explored constructs that
have been shown to positively correlate with misogynistic and hostile sexist beliefs (e.g., sexual
entitlement, Hill & Fischer, 2001; tendency to seek revenge, Pina et al., 2017; hypermasculinity,
Johnson & Knight, 2000; physical aggression, Forbes, Adams-Curtis, & White, 2004). As
anticipated, misogyny was positively correlated with sexual entitlement (r = .45, p < .001;
MSexualEntitlement = 1.70, SD = 0.91) and tendency for revenge motivation (r = .47, p < .001;
MRevenge = 2.49, SD = 1.33). Additionally, misogyny was positively correlated with
masculinity-related violent beliefs (r = .44, p < .001; MViolentBeliefs = 1.76, SD = .91) as well as physical
aggression (r = .38, p < .001; MViolentIntentions = 1.80, SD = 1.03). Next, to examine discriminant
validity, we explored the relationship with a construct where no relationship is expected. Our
findings confirmed that there was no significant correlation between misogyny and a measure
of analytical thinking (r = -.06, p > .05, MAnalyticalThinking = 5.13, SD = .94).
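The significance pattern reported above can be checked approximately from the correlation coefficients alone. The sketch below converts each r into a test statistic and a two-sided p-value, using the large-sample normal approximation to the t distribution. The sample size of 750 is an assumption taken from the Study 2 sample size; the effective n per correlation may differ.

```python
import math


def correlation_p_value(r, n):
    """Two-sided p-value for a Pearson r (normal approximation, large n)."""
    t = r * math.sqrt((n - 2) / (1.0 - r ** 2))
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for a standard normal.
    return math.erfc(abs(t) / math.sqrt(2.0))


n_assumed = 750  # assumption: full Study 2 sample
reported_r = {
    "sexual entitlement": 0.45,
    "revenge motivation": 0.47,
    "violent beliefs": 0.44,
    "physical aggression": 0.38,
    "analytical thinking": -0.06,
}
for construct, r in reported_r.items():
    p = correlation_p_value(r, n_assumed)
    label = "< .001" if p < 0.001 else f"= {p:.3f}"
    print(f"{construct}: r = {r:+.2f}, p {label}")
```

Under this assumption the four convergent correlations fall below p = .001, while the correlation with analytical thinking does not reach the .05 level.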
Study 3
The third study was further designed to establish the generalisability aspect of construct
validity, i.e., measurement validity. We conducted measurement invariance tests in order to
ensure that the scale operates equivalently across comparison groups (Wang, Willett, & Eccles,
2011), and subsequently estimated latent mean differences. Measurement invariance can
establish whether group differences represent accurate mean differences rather than
measurement bias (Dimitrov, 2010). More specifically, we examined whether there were
significant gender and age group differences in regard to misogynistic attitudes. We also
explored whether latent mean differences between younger and older individuals exist.
Measurement invariance and latent mean comparisons
Measurement invariance represents a fundamental step within the construct validation
process. Self-report questionnaires consist of individual items, which are developed to assess
an underlying latent construct. For such an instrument to be valid (i.e., not biased against one
group or another), measurement invariance has to be demonstrated (van de Schoot, Lugtig, & Hox,
2012). Measurement invariance examines the psychometric equivalence of an instrument
across groups or time and assesses whether a tool displays the same psychometric properties
across heterogeneous groups (Chen, 2007). If a questionnaire measures an identical construct
with the same structure and meaning across groups or time points, the assessment instrument
is called measurement invariant (van de Schoot et al., 2012). Conversely, if a psychometric
construct demonstrates measurement noninvariance, it suggests that the instrument has a
different structure or meaning to different groups (e.g., male and female participants) or across
different measurement points (e.g., pre-test and post-test), and thus the tool cannot be
meaningfully tested or interpreted across groups or across time (Putnick & Bornstein, 2016).
Measurement invariance analysis examines whether the factor loadings and intercepts/
thresholds, from which the latent factor scores are created, are equal across groups (Meredith,
1993). Therefore, it is required to establish measurement invariance prior to testing for group
mean differences of latent constructs as latent means cannot be adequately assessed and
compared when measures are noninvariant (Cheung & Rensvold, 2002). This points to a key
issue in analyses that compare group means of latent constructs. Observed composite scores, on
which most group comparison analyses (e.g., t-tests, ANOVAs) are based, cannot simply be
equated with the latent or true means of the underlying construct. Instead, the relationship
between an observed mean and the latent factor mean is a probabilistic function, which includes
two further important parameters, the indicator intercepts/thresholds and the factor loadings,
which link individual indicators to the latent construct (Steinmetz, 2010; Sass, 2011). Yet, it
is common practice to compare means and other statistics of latent constructs across groups
without establishing strong factorial invariance (i.e., establishing that factor loadings and
intercepts/thresholds are equal across groups or measurement occasions). The violation of the
measurement invariance assumption can lead to inaccurate inferences of group comparisons
(Yuan & Chan, 2016). For instance, when measurements are non-invariant across groups, the
observed group mean difference may simply be due to differential meaning or understanding
of the construct or particular items across groups (Sass, 2011).
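The argument above can be made concrete with the standard expectation for a continuous indicator, in which the expected item score equals the intercept plus the loading times the latent mean. The following minimal sketch uses hypothetical parameter values to show that two groups with identical latent means still differ in their observed item means once the intercept is noninvariant.

```python
def expected_indicator_mean(intercept, loading, latent_mean):
    """Expected observed score of one indicator: intercept + loading * latent mean."""
    return intercept + loading * latent_mean


# Hypothetical parameters: both groups have the same latent mean (0.0)
# and the same loading, but the item intercept differs between groups.
group_a = expected_indicator_mean(intercept=2.0, loading=0.8, latent_mean=0.0)
group_b = expected_indicator_mean(intercept=2.3, loading=0.8, latent_mean=0.0)
observed_gap = group_b - group_a
print(f"Observed mean difference with equal latent means: {observed_gap:.2f}")

# With equal intercepts and loadings, an observed gap reflects only the
# latent mean difference scaled by the loading.
invariant_gap = (expected_indicator_mean(2.0, 0.8, 0.5)
                 - expected_indicator_mean(2.0, 0.8, 0.0))
print(f"Observed mean difference under invariance: {invariant_gap:.2f}")
```

In the first comparison the whole observed difference is measurement bias, which is why comparisons of composite scores are only interpretable once loadings and intercepts have been shown to be equal.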
Method Measurement Invariance
Participants. The measurement invariance tests as well as the latent mean comparisons
are based on the same sample as study 2.
Procedure. We ran all analyses within study 3 in the software program R using the
packages ‘lavaan’ (Rosseel, 2020) and ‘semTools’ (Jorgensen, 2020). We tested for strong
factorial invariance across gender and age groups. Measurement invariance is tested within the
framework of multiple-group confirmatory factor analysis (MGCFA). The procedure in our
analysis involved testing for configural, measurement, and structural invariance (e.g., equality
of group means) (for an outline of the full procedure see Chen, Sousa, & West, 2005; Putnick
& Bornstein, 2016; van de Schoot et al., 2012). The MGCFA process consists of conducting
multiple hierarchically nested confirmatory factor analyses by incrementally increasing levels
of group equality constraints (e.g., constraining factor loadings and thresholds across groups).
Typically, model evaluation includes testing whether the differences between these models are
statistically significant and/or assessing the change in magnitudes of fit indices to see if more
restricted models perform less well, suggesting that instruments are noninvariant (Gregorich,
2006). More specifically, to decide whether invariance can be confirmed or not, the majority
of analyses use the chi-square difference test (∆χ2) and/or examine the change in
magnitudes of accepted fit indices (i.e., ∆AFI) between two nested models.
A significant limitation of the χ2 test is its tendency to reject the null
hypothesis when analyses include large samples as well as complex models. Chi-square
difference tests have been found to reject adequate models if the sample size is large but
conversely, fail to reject poor models if the sample is rather small (van de Schoot et al., 2016).
Due to our large sample size and the relatively complex model structure of the misogyny scale,
a multidimensional construct with three latent underlying factors, we decided to employ
alternative fit statistics, such as CFI, TLI, RMSEA and SRMR, which adjust for sample size
and model complexity. This procedure is increasingly employed within MGCFA and
researchers recommend applying this approach if models are based on large samples and contain
a complex structure (e.g., Chen, 2007; Cheung & Rensvold, 2002).
Hence, we compared the changes in model fit statistics for the invariance models to the
previous, less restrictive model and we further evaluated the overall model fit for each
individual model. We compared the ΔCFI, ΔTLI, ΔRMSEA, and ΔSRMR rather than the χ2 difference
test and based our cut-off criteria for model evaluation on recommendations made by Chen
(2007). As such, acceptable changes in fit for the more restrictive invariance models are: ΔCFI < 0.01,
ΔTLI < 0.01, ΔRMSEA < 0.015, and ΔSRMR < 0.03 for metric invariance (i.e., equal factor
loadings) and ΔCFI < 0.01, ΔTLI < 0.01, ΔRMSEA < 0.015, and ΔSRMR < 0.01 for scalar
invariance (i.e., equal item intercepts). The Δχ2(Δdf) values are nevertheless reported in the Supplementary
Materials F. We applied a Satorra-Bentler scaled (mean-adjusted) chi-square difference test
(SBχ2) due to violations of the normality assumption of the data. The SBχ2 is the normal χ2 divided
by a scaling correction to improve the chi-square approximation under non-normality (Satorra &
Bentler, 2001). Due to the non-normal distribution of the data, we ran all MGCFA models with
a robust estimator, MLM, which applies maximum likelihood parameter estimates with
standard errors and a mean-adjusted chi-square test statistic that are robust to non-normality
(Lavaan, 2021).
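To illustrate the scaled difference test described above, the following sketch implements the Satorra and Bentler (2001) computation for two nested models. It takes the unscaled chi-square statistics, the degrees of freedom and the scaling correction factors as inputs. The numerical inputs are hypothetical and do not come from the present analyses, and the function names are illustrative rather than part of any package.

```python
def scaled_chi_square(unscaled_chi2, scaling_factor):
    """Mean-adjusted chi-square: the normal chi-square divided by the scaling correction."""
    return unscaled_chi2 / scaling_factor


def sb_scaled_difference(t0, df0, c0, t1, df1, c1):
    """Satorra-Bentler (2001) scaled chi-square difference.

    t0, df0, c0: unscaled chi-square, df and scaling factor of the nested
                 (more constrained) model.
    t1, df1, c1: the same quantities for the less constrained model.
    Returns the scaled difference statistic and its degrees of freedom.
    """
    delta_df = df0 - df1
    # Scaling correction for the difference test.
    cd = (df0 * c0 - df1 * c1) / delta_df
    return (t0 - t1) / cd, delta_df


# Hypothetical inputs (illustration only).
stat, delta_df = sb_scaled_difference(t0=120.0, df0=78, c0=1.20,
                                      t1=100.0, df1=71, c1=1.25)
print(f"Scaled difference = {stat:.2f} on {delta_df} df")
```

Note that the scaled difference is not simply the difference between the two scaled statistics, which is why the correction factor for the difference is computed separately.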
Results
Measurement Invariance – Gender. First, we ran separate CFAs for both male and
female groups to assess the factor loadings and fit indices before we proceeded with the
multigroup confirmatory analyses. Findings indicated strong factor loadings and good fit
indices for each group, suggesting adequate factorial validity and thus, allowed us to pursue
the measurement invariance tests (see Supplementary Materials E, for an overview of
measurement invariance results see Supplementary Materials F). Next, we tested for configural
invariance (CI), i.e., form invariance. This step examines the invariance of the model
configuration and is another prerequisite for testing measurement invariance. The CI
analysis assesses the factor structure of latent constructs and a baseline model is estimated for
each group. If the number and pattern of factors and indicator loadings are equal across groups,
configural invariance is supported (Dimitrov, 2010; Vandenberg & Lance, 2000). The fit
indices for the CI model showed good model fit: χ2 (64) = 142.36, p < 0.001; χ2/df ratio = 2.22;
CFIRobust = .978, TLIRobust = .969, RMSEARobust = .057; SRMR = .032, confirming configural
invariance across genders. Configural invariance justified the evaluation of more restrictive
invariance models.
The following step involved testing for metric invariance, (i.e., the equivalence of
individual item loadings on the latent factors). Metric invariance is tested by constraining factor
loadings to be equivalent across groups (van de Schoot et al., 2012). Findings of the metric
invariance test showed that the model fit the data well: χ2 (71) = 167.03, p < 0.001; χ2/df ratio
= 2.35; CFIRobust = .974, TLIRobust = .967, RMSEARobust = .060; SRMR = .050. The changes in
fit indices, i.e., ΔCFI = -.004, ΔTLI = -.002, ΔRMSEA = -.003 as well as ΔSRMR = -.018
between the configural and metric invariance model were all within the thresholds outlined
above, supporting metric invariance across men and women. Next, tests for scalar invariance
were conducted. Scalar invariance assesses the equivalence of item intercepts or indicator
means across groups. This is tested by constraining item intercepts to be equivalent in the
respective groups (Chen et al., 2005; Putnick & Bornstein, 2016). Invariance testing comparing
female and male individuals suggested good model fit across all indices at the scalar level: χ2
(78) = 207.12, p < 0.001; χ2/df ratio = 2.66; CFIRobust = .971, TLIRobust = .963, RMSEARobust =
.063; SRMR = .053. Evaluation of scalar invariance showed acceptable changes in fit indices:
ΔCFI = -.003, ΔTLI = -.004, ΔRMSEA = -.003 as well as ΔSRMR = -.003. Our findings
confirmed strong factorial invariance of the misogyny scale, indicating that the scale
measured comparable constructs among males and females.
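The decision rule applied in this section can be sketched as follows, using the fit indices reported above for the gender models and the cut-offs stated in the Procedure. Changes are computed here as the more constrained model minus the less constrained model, which is an assumption about the sign convention; only the absolute size of each change is compared against its cut-off.

```python
# Cut-offs as stated in the Procedure (after Chen, 2007).
CUTOFFS = {
    "metric": {"CFI": 0.01, "TLI": 0.01, "RMSEA": 0.015, "SRMR": 0.03},
    "scalar": {"CFI": 0.01, "TLI": 0.01, "RMSEA": 0.015, "SRMR": 0.01},
}

# Robust fit indices reported for the gender invariance models.
GENDER_FIT = {
    "configural": {"CFI": 0.978, "TLI": 0.969, "RMSEA": 0.057, "SRMR": 0.032},
    "metric": {"CFI": 0.974, "TLI": 0.967, "RMSEA": 0.060, "SRMR": 0.050},
    "scalar": {"CFI": 0.971, "TLI": 0.963, "RMSEA": 0.063, "SRMR": 0.053},
}


def fit_changes(less_constrained, more_constrained):
    """Change in each fit index when moving to the more constrained model."""
    return {index: round(more_constrained[index] - less_constrained[index], 3)
            for index in more_constrained}


def within_cutoffs(changes, level):
    """True if the absolute change of every index is below its cut-off."""
    return all(abs(changes[index]) < CUTOFFS[level][index] for index in changes)


metric_changes = fit_changes(GENDER_FIT["configural"], GENDER_FIT["metric"])
scalar_changes = fit_changes(GENDER_FIT["metric"], GENDER_FIT["scalar"])
print("metric:", metric_changes, within_cutoffs(metric_changes, "metric"))
print("scalar:", scalar_changes, within_cutoffs(scalar_changes, "scalar"))
```

With these inputs the magnitudes match the changes reported in the text, and both steps stay within the stated cut-offs.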
Latent mean differences – Gender. Once we were able to confirm measurement
invariance for the misogyny scale across men and women, we tested for structural invariance
(i.e., the equality of latent factor means across gender groups). More specifically, we examined
whether there were any latent mean differences on the three latent factors between female and
male individuals. If the factor means in the reference group are fixed to zero, the estimated
latent factor means in the other groups show the relative differences between the groups (Sass,
2011). The male group in our analyses was treated as the reference group while comparing the
latent means to the female group. We therefore fixed the latent means of the male group to
zero, so that the latent means of the three factors in the comparison group (i.e., females)
show the mean differences between the two groups. Additionally, we report the effect size
Cohen’s d to allow comparisons across analyses. The latent mean differences and Cohen’s d
for all models are displayed in the Supplementary Materials G.
We tested for structural invariance by constraining the structural coefficients (i.e.,
latent means) to be equal across groups. We evaluated the χ2 difference test (Δχ2) in order to
determine whether significant group mean differences exist. The structural invariance analysis
compared two nested models, one with the latent means constrained to be equal and the other
estimating them freely, and revealed a significant difference: Δχ2 (3) = 22.66, p < .001.
Post-hoc tests confirmed significant Δχ2 on all latent factors between males and females, which
indicates significant gender mean differences on all subscales: (1) manipulative and
exploitative nature of women, (2) distrust towards women, and (3) devaluation of women.
Results showed that men report significantly stronger misogynistic attitudes than women for
all three latent factors: MDiff1 = -.482*** (manipulative and exploitative nature of women),
MDiff2 = -.324*** (distrust towards women), and MDiff3 = -.197*** (devaluation of women).
The Cohen’s d values indicate a medium effect size for the factor ‘manipulative and
exploitative nature of women’ (d1 = .38), whereas the factors ‘distrust towards
women’ (d2 = .27) and ‘devaluation of women’ (d3 = .19) display small effect sizes.
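The p-value attached to a chi-square difference statistic can be obtained from the upper tail of the chi-square distribution. The sketch below uses the closed-form expressions that exist for integer degrees of freedom and applies them to the statistic reported above for the gender comparison. Referring the reported statistic directly to the chi-square distribution is an assumption of this illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    z = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return math.exp(-z) * sum(z ** k / math.factorial(k)
                                  for k in range(df // 2))
    # Odd df: start from erfc for df = 1 and add the recurrence terms.
    tail = sum(z ** (k - 0.5) / math.gamma(k + 0.5)
               for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(z)) + math.exp(-z) * tail


# Reported structural invariance test for gender: delta chi-square (3) = 22.66.
p_gender = chi2_sf(22.66, 3)
print(f"p = {p_gender:.6f}")
print("p < .001:", p_gender < 0.001)
```

The same function can be applied to any of the nested-model comparisons once the difference statistic and its degrees of freedom are known.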
Measurement Invariance – age groups. For the age group measurement invariance
tests, we applied the same sequential constraint imposition approach, which we used for the
gender measurement invariance tests. The tests followed a logical sequence of nested models
ordered in an increasingly restrictive fashion. At each step, we evaluated the difference of
multiple fit indices to decide whether invariance is accepted or rejected (see Dimitrov, 2010).
To start with, we inspected separate CFAs for all three age groups: (1) individuals aged 18–29,
(2) individuals aged 30–49, and (3) individuals aged 50–82. The fit indices showed good model fit
for each group (see Supplementary Materials G, which also lists non-significant differences).
The configural invariance model across age groups demonstrated satisfactory fit indices, which
supported invariance of the configural model. The metric invariance model, with factor loadings
being restricted to be equal, yielded very good model fit, which suggests that the model fits the
data well: χ2 (110) = 188.07, p < 0.001; χ2/df ratio = 1.71; CFIRobust = .979, TLIRobust = .975,
RMSEARobust = .053; SRMR = .053. Additionally, the changes in fit indices between the configural
and metric invariance models, ΔCFI = -.004, ΔTLI = -.002, ΔRMSEA = -.002 as
well as ΔSRMR = -.020, were all within the thresholds of 0.01 and 0.03, respectively. These
results confirm metric invariance across age groups. The scalar invariance model also
presented good model fit: χ2 (124) = 234.91, p < 0.001; χ2/df ratio = 1.89; CFIRobust = .975,
TLIRobust = .972, RMSEARobust = .057; SRMR = .057. The scalar invariance analysis further
revealed that the indicators’ intercepts were invariant across age groups, as the changes between
the metric and scalar invariance models were all within the thresholds for scalar invariance testing:
ΔCFI = -.004, ΔTLI = -.003, ΔRMSEA = -.004, and ΔSRMR = -.004.
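The degrees of freedom reported for the nested invariance models can be reproduced from the model structure. The sketch below assumes a 10-item, three-factor model with a mean structure, marker-variable identification (one loading per factor fixed), correlated factors, and latent means fixed to zero in the reference group and freed in the other groups at the scalar level. These identification choices are assumptions of this illustration, although they yield the degrees of freedom reported in the text.

```python
def mgcfa_df(n_items, n_factors, n_groups, level):
    """Degrees of freedom of a multiple-group CFA with mean structure."""
    if level not in ("configural", "metric", "scalar"):
        raise ValueError(f"unknown invariance level: {level}")
    # Observed moments per group: variances/covariances plus means.
    moments = n_groups * (n_items * (n_items + 1) // 2 + n_items)
    per_group = (
        (n_items - n_factors)                 # free loadings (one marker per factor)
        + n_items                             # residual variances
        + n_factors * (n_factors + 1) // 2    # factor variances and covariances
        + n_items                             # item intercepts
    )
    free = n_groups * per_group
    other_groups = n_groups - 1
    if level in ("metric", "scalar"):
        free -= other_groups * (n_items - n_factors)  # loadings held equal
    if level == "scalar":
        free -= other_groups * n_items    # intercepts held equal
        free += other_groups * n_factors  # latent means freed outside reference group
    return moments - free


for groups, label in ((2, "gender"), (3, "age")):
    dfs = [mgcfa_df(10, 3, groups, lvl) for lvl in ("configural", "metric", "scalar")]
    print(label, dfs)
```

Under these assumptions the function returns 64, 71 and 78 for the two gender groups, and 110 and 124 for the metric and scalar models across the three age groups, in line with the values reported above.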
Latent mean differences – age groups. Based on the establishment of measurement
invariance across age groups, we further compared the latent mean differences across these age
groups. We assessed the equality of factor means by comparing two nested models: one had
the latent means constrained to be equal and the other estimated them freely. Findings
from the structural invariance test revealed significant differences between age groups. The
constrained model had a significantly worse fit, indicated by a significant χ2 difference test between
models: Δχ2 (6) = 32.71, p < .001, as well as a notable drop in fit indices. Several post hoc tests
were run, each time comparing Δχ2 among another set of age groups, i.e., (1) group 1 and 2,
(2) group 1 and 3, as well as (3) group 2 and 3. Several significant differences emerged between
age groups. Results revealed that latent mean differences existed for the first (manipulative and
exploitative nature of women) and third latent factor (devaluation of women) of the misogyny
scale but not for the second latent factor (distrust towards women).
Manipulative and exploitative nature of women. Overall, results showed that younger
individuals hold weaker attitudes about the manipulative and exploitative nature of women.
Specifically, age group 1 (individuals aged 18–29) reported significantly weaker attitudes
referring to the manipulative and exploitative nature of women than age group 2 (individuals
aged 30–49) (MDiff1 = .187***, d1 = .14). Age group 1 also reported weaker attitudes than
group 3 (individuals aged 50–82) on this factor (MDiff2 = .333***, d2 = .28). Further, the
latent factor mean for group 3 was significantly larger than for group 2, which demonstrated
that group 3 holds stronger misogynistic attitudes relating to the manipulative and exploitative
nature of women (MDiff3 = .166***, d3 = .13).
Devaluation of women. In regard to our third latent factor addressing the ‘devaluation
of women’, on average, younger individuals reported stronger attitudes than older
participants. Our results showed that the latent factor mean for age group 1 was significantly
larger than the mean for age group 3 (MDiff4 = -.159***, d4 = .17), indicating that younger
individuals hold stronger attitudes, which capture the devaluation of women. Additionally,
we found that age group 2 scored significantly higher on this latent factor than age group 3 in
our sample (MDiff5 = -.159***, d5 = .14). However, we did not find any significant latent
mean differences between group 1 and 2 for this latent factor. Overall, we did not find any
significant differences for the second latent factor ‘distrust towards women’.
Discussion
Recent incidents have demonstrated that misogynistic beliefs can lead to acts of
violence. Manifestos of incel as well as far-right terrorist attackers have repeatedly shown that
the perpetrators espoused extreme misogynistic attitudes (Maxwell et al., 2020; Wilson, 2020).
Misogyny has further been characterised as a key motivating factor within extremist
recruitment and ideologies (Center on Extremism, 2019). These incidents clearly demonstrate
that misogyny represents an urgent topic which requires more research. Yet, the topic of
misogyny is particularly under-researched, which became clear when searching for existing
studies or psychometric scales assessing the concept. Across several studies, the present work
has developed and validated a novel measure of misogynistic attitudes that is suitable for
population samples. Overall, the misogyny scale displays robust psychometric properties. It
has been shown to be reliable and valid in assessing misogynistic beliefs among the general
population. In study 1, using exploratory factor analysis, a 10-item scale with three latent
factors was identified. The factors were labelled ‘manipulative and exploitative nature of
women’, ‘distrust towards women’, and ‘devaluation of women’, each consisting of three items.
All item communalities showed satisfactory correlations and item-total correlations
showed good scale homogeneity and inter-item reliability. The misogyny scale displayed very
good internal consistency, indicated by high values of McDonald’s ω across all factors. In
Study 2, the 3-dimensional factor structure was replicated via confirmatory factor analysis,
demonstrating very good model fit. All three factors showed high internal reliabilities. Further,
the scale displayed good convergent (i.e., relationship with sexual entitlement, violent beliefs,
physical aggression as well as revenge motivations) and discriminant (i.e., no relationship with
analytical thinking) validity. Study 3 demonstrated the significance of establishing
measurement invariance before comparing latent factor means. More specifically,
measurement invariance testing allows researchers to identify those items that are problematic
or noninvariant, which in turn may enhance the development of new or revised items or
instruments. We established full factorial invariance of the misogyny scale across gender and
age groups. Latent mean analyses highlighted the differences for different latent factors
between men and women as well as between older and younger individuals.
While this paper offers a valuable contribution to studying the concept of misogyny, it
is important to acknowledge several limitations. First, we did not assess test-retest reliability
of the misogyny scale and therefore, we were not able to examine the stability of the misogyny
scale over time. Further research should test for test-retest reliability and assess the intra-class
correlation coefficient (ICC) as well as conduct paired-samples t-tests to confirm the scale’s repeatability.
Relatedly, another shortcoming is the fact that our scale development is solely based on a cross-
sectional research design so far. Future research should collect longitudinal data to test for the
predictive validity of the misogyny scale and to conduct the above-mentioned reliability tests.
Additionally, it would also be beneficial to assess the relationship of our newly
developed misogyny scale and existing scales measuring closely related concepts, such as
hostile sexism. Strong correlations would suggest convergent validity. Lastly, our misogyny
scale will most likely be dependent on the cultural context where the scale is applied and as
such, may not be applicable to non-WEIRD (Western, educated, industrialised, rich, and
democratic) countries. The concepts of women’s rights and the role of women within society
more broadly and that of misogyny more specifically, vary heavily between countries and are
progressive issues which constantly change and adapt. As such, no universal measure of
misogyny should be expected. Studies operationalising the scale should consider carefully whether
the underlying latent construct is applicable to the respective study context. Further, all studies
using the scale should run a CFA to confirm the factor structure and should conduct
measurement invariance tests to see whether the scale possesses measurement validity in that
specific sample.
Conclusion
Taken together, the current paper makes important first steps in establishing a
conceptualisation of misogyny, which may be used in future survey studies and may encourage
further research in this area. Given the prevalence of extreme misogynistic beliefs among incel
as well as far-right extremists and the fundamental role these attitudes have been shown to play in
motivating acts of violence, P/CVE programs and violence risk assessment approaches should
more strongly incorporate those concepts into their work.
References
Baele, S. J., Brace, L., & Coan, T. G. (2019). From “Incel” to “Saint”: Analyzing the violent
worldview behind the 2018 Toronto attack. Terrorism and Political Violence, 1-25.
Bjork‐James, S. (2020). White sexual politics: the patriarchal family in white nationalism and
the religious right. Transforming Anthropology, 28(1), 58-7.
Blake, K. R., O’Dean, S. M., Lian, J., & Denson, T. F. (2021). Misogynistic tweets correlate
with violence against women. Psychological science, 0956797620968529.
Bliese, P., & Bliese, M. P. (2016). Package ‘multilevel’. R version, 2.
Bratich, J., & Banet-Weiser, S. (2019). From pick-up artists to incels: con (fidence) games,
networked misogyny, and the failure of neoliberalism. International Journal of
Communication, 13, 25.
Byrne, B. M. (2001). Structural equation modeling with AMOS, EQS, and LISREL:
Comparative approaches to testing for the factorial validity of a measuring
instrument. International journal of testing, 1(1), 55-86.
Carpenter, S. (2018). Ten steps in scale development and reporting: A guide for
researchers. Communication Methods and Measures, 12(1), 25-44.
Casey, E. A., Masters, N. T., Beadnell, B., Hoppe, M. J., Morrison, D. M., & Wells, E. A.
(2017). Predicting sexual assault perpetration among heterosexually active young
men. Violence against women, 23(1), 3-27.
Center on Extremism (2018) When women are the enemy: The intersection of misogyny and
white supremacy. ADL Center on Extremism Report. New York.
Chaffee, S.H. (1991) Communication Concepts 1: Explication. Newbury Park, CA: Sage.
Check, J. V., Malamuth, N. M., Elias, B., & Barton, S. (1985). On hostile ground. Psychology
Today, 19(4), 56-61.
Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement
invariance. Structural equation modeling: a multidisciplinary journal, 14(3), 464-504.
Chen, F. F., Sousa, K. H., & West, S. G. (2005). Testing measurement invariance of second-
order factor models. Structural Equation Modeling, 12, 471–492.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing
measurement invariance. Structural equation modeling, 9(2), 233-255.
Chin CL., Yao G. (2014) Convergent Validity. In: Michalos A.C. (Eds.) Encyclopedia of
Quality of Life and Well-Being Research. Springer, Dordrecht.
Cohen, R. J., & Swerdlik, M. E. (2005). Psychological testing and assessment: An introduction
to tests and measurement. Boston: McGraw-Hill.
Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four
recommendations for getting the most from your analysis. Practical Assessment,
Research & Evaluation, 10(7), 1–9.
Davidson, B. I., Shaw, H., & Ellis, D. A. (2020). Fuzzy constructs in assessment: The overlap
between mental health and technology “use.”. Open Science Framework.
De Vaus, D. (2004). Surveys in Social Research (5th ed.). London: Routledge.
DeVellis, R. F. (2012). Scale development. Theory and applications (3rd ed.). Thousand Oaks,
CA: Sage.
Diaz, P. C., & Valji, N. (2019). Symbiosis of Misogyny and Violent Extremism. Journal of
International Affairs, 72(2), 37-56.
Dimitrov, D. M. (2010). Testing for factorial invariance in the context of construct
validation. Measurement and Evaluation in Counseling and Development, 43(2), 121-
149.
Dragiewicz, M., Burgess, J., Matamoros-Fernández, A., Salter, M., Suzor, N. P., Woodlock,
D., & Harris, B. (2018). Technology facilitated coercive control: Domestic violence
and the competing roles of digital media platforms. Feminist Media Studies, 18(4), 609-
625.
Forbes, G. B., Adams-Curtis, L. E., & White, K. B. (2004). First-and second-generation
measures of sexism, rape myths and related beliefs, and hostility toward women: Their
interrelationships and association with college students’ experiences with dating
aggression and sexual coercion. Violence against women, 10(3), 236-261.
Ging, D. (2017). Alphas, Betas, and Incels: Theorizing the Masculinities of the Manosphere.
Men and Masculinities. Advance online publications.
Glick, P., & Fiske, S. T. (1996). The ambivalent sexism inventory: Differentiating hostile and
benevolent sexism. Journal of personality and social psychology, 70(3), 491.
Gregorich, S. E. (2006). Do self-report instruments allow meaningful comparisons across
diverse population groups? Testing measurement invariance using the confirmatory
factor analysis framework. Medical care, 44(11 Suppl 3), S78.
Hayton, J. C., Allen, D. G., & Scarpello, V. (2004). Factor retention decisions in exploratory
factor analysis: A tutorial on parallel analysis. Organizational research methods, 7(2),
191-205.
Hayes, A. F., & Coutts, J. J. (2020). Use omega rather than Cronbach’s alpha for estimating
reliability. But…. Communication Methods and Measures, 14(1), 1-24.
Hill, M. S., & Fischer, A. R. (2001). Does entitlement mediate the link between masculinity
and rape-related variables? Journal of Counseling Psychology, 48(1), 39–50.
Hoffman, B., Ware, J., & Shapiro, E. (2020). Assessing the threat of incel violence. Studies in
Conflict & Terrorism, 43(7), 565-587.
Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure
analysis: Conventional criteria versus new alternatives. Structural equation modeling:
a multidisciplinary journal, 6(1), 1-55.
Wilson, J. (2018). What Do Incels, Fascists, and Terrorists Have in Common? Violent Misogyny.
The Guardian, 4 May 2018.
Johnson, G. M., & Knight, R. A. (2000). Developmental antecedents of sexual coercion in
juvenile sexual offenders. Sexual Abuse: A Journal of Research and Treatment, 12(3),
165-178.
Jorgensen, T. D., Pornprasertmanit, S., Schoemann, A. M., & Rosseel, Y. (2020). semTools:
Useful tools for structural equation modeling.
Kline, R. B. (2013). Exploratory and confirmatory factor analysis. In Y. Petscher, C.
Schatschneider, & D. L. Compton (Eds.), Applied quantitative analysis education and
the social sciences (pp. 171–207). New York, NY, USA: Routledge.
Leone, R. M., & Parrott, D. J. (2019). Misogynistic peers, masculinity, and bystander
intervention for sexual aggression: Is it really just “locker‐room talk?”. Aggressive
behavior, 45(1), 42-51.
Lyle, P., & Esmail, A. (2019). Mass Shooting and Misogyny: Broken Males are Pulling the
Trigger.
Marwick, A. E., & Caplan, R. (2018). Drinking male tears: Language, the manosphere, and
networked harassment. Feminist Media Studies, 18(4), 543-559.
Munsch, C. L., & Willer, R. (2012). The role of gender identity threat in perceptions of date
rape and sexual coercion. Violence Against Women, 18(10), 1125-1146.
Maxwell, D., Robinson, S. R., Williams, J. R., & Keaton, C. (2020). “A Short Story of a Lonely
Guy”: A Qualitative Thematic Analysis of Involuntary Celibacy Using
Reddit. Sexuality & Culture, 1-23.
Meredith, W. (1993). Measurement invariance, factor analysis and factorial
invariance. Psychometrika, 58(4), 525-543.
Osborne, J. W. (2014). Best practices in exploratory factor analysis. Scotts Valley, CA:
CreateSpace Independent Publishing.
Piggot, M. (2004). Double jeopardy: Lesbians and the legacy of multiple stigmatized
identities. Unpublished thesis, Psychology Strand at Swinburne University of
Technology, Australia.
Pina, A., Holland, J., & James, M. (2017). The malevolent side of revenge porn proclivity:
Dark personality traits and sexist ideology. International Journal of Technoethics (IJT),
8(1), 30-43.
Piedmont R.L. (2014) Construct Validity. In: Michalos A.C. (eds) Encyclopedia of Quality of
Life and Well-Being Research. Springer, Dordrecht
Putnick, D. L., & Bornstein, M. H. (2016). Measurement invariance conventions and reporting:
The state of the art and future directions for psychological research. Developmental
review, 41, 71-90.
Revelle, W. (2018). psych: Procedures for psychological, psychometric, and personality
research. R package version, 1(10).
Rosseel, Y. (2012). Lavaan: An R package for structural equation modeling and more. Version
0.5–12 (BETA). Journal of statistical software, 48(2), 1-36.
Ruel, E., Wagner III, W. E., & Gillespie, B. J. (2015). The practice of survey research: Theory
and applications. Sage Publications.
Sass, D. A. (2011). Testing measurement invariance and comparing latent factor means within
a confirmatory factor analysis framework. Journal of Psychoeducational
Assessment, 29(4), 347-363.
Satorra, A., & Bentler, P. M. (2001). A scaled difference chi-square test statistic for moment
structure analysis. Psychometrika, 66(4), 507-514.
Scaptura, M. N. (2019). Masculinity Threat, Misogyny, and the Celebration of Violence in
White Men (Doctoral dissertation, Virginia Tech).
Scaptura, M. N., & Boyle, K. M. (2020). Masculinity threat, “Incel” traits, and violent fantasies
among heterosexual men in the United States. Feminist Criminology, 15(3), 278-298.
Smith, J. (2019). ‘When I Saw Women Being Attacked… It Made Me Want to Stand Up and
Fight’: Reporting, Responding to, and Resisting Online Misogyny. In Online
Othering (pp. 287-308). Palgrave Macmillan, Cham.
Steinmetz, H. (2011). Estimation and comparison of latent means across cultures. Cross-
cultural analysis: Methods and applications, 85-116.
Swim, J. K., Aikin, K. J., Hall, W. S., & Hunter, B. A. (1995). Sexism and racism: Old-
fashioned and modern prejudices. Journal of Personality and Social Psychology, 68(2),
199.
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA:
Allyn and Bacon.
Thompson, B. (2004). Exploratory and confirmatory analysis: Understanding concepts and
applications. Washington, DC, USA: American Psychological Association.
Tyberg, S. A. (2016). Entitlement and Anguish: An Analysis of Masculinity and Misogyny in
American School Shootings. Dickinson College Honors Theses. Paper 224.
Van de Schoot, R., Lugtig, P., & Hox, J. (2012). A checklist for testing measurement
invariance. European Journal of Developmental Psychology, 9(4), 486-492.
Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement
invariance literature: Suggestions, practices, and recommendations for organizational
research. Organizational Research Methods, 3(1), 4-70.
Wang, M. T., Willett, J. B., & Eccles, J. S. (2011). The assessment of school engagement:
Examining dimensionality and measurement invariance by gender and
race/ethnicity. Journal of School Psychology, 49(4), 465-480.
Worthington, R. L., & Whittaker, T. A. (2006). Scale development research: A content analysis
and recommendations for best practices. The Counseling Psychologist, 34(6), 806–838.
Yuan, K. H., & Chan, W. (2016). Measurement invariance via multigroup SEM: Issues and
solutions with chi-square-difference tests. Psychological Methods, 21(3), 405.
Zinbarg, R. E., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach’s α, Revelle’s β, and
McDonald’s ωH: Their relations with each other and two alternative
conceptualizations of reliability. Psychometrika, 70(1), 123-133.
Zijlmans, E. A., van der Ark, L. A., Tijmstra, J., & Sijtsma, K. (2018). Methods for estimating
item-score reliability. Applied Psychological Measurement, 42(7), 553-570.
Supplementary materials A
Parallel Analysis Scree Plots (figure not reproduced). Series: FA Actual Data, FA Simulated Data, FA Resampled Data. Vertical axis: eigenvalues of principal factors (ticks 0 to 8). Horizontal axis: Factor Number (ticks 5 to 15).
Supplementary materials B
Parallel Analysis Scree Plots (figure not reproduced). Series: FA Actual Data, FA Simulated Data, FA Resampled Data. Vertical axis: eigenvalues of principal factors (ticks 0 to 6). Horizontal axis: Factor Number (ticks 2 to 10).
Supplementary materials C. Inter-item correlations of all manifest items and factors of the
misogyny scale after EFA (Study 1).
Item 1 2 3 4 5 6 7 8 9 10 F1 F2 F3
1 -
2 .63 -
3 .67 .74 -
4 .71 .70 .76 -
5 .58 .46 .59 .59 -
6 .62 .52 .62 .65 .76 -
7 .64 .58 .66 .67 .69 .75 -
8 .48 .37 .48 .46 .50 .53 .54 -
9 .36 .32 .43 .41 .46 .47 .46 .57 -
10 .43 .34 .43 .43 .48 .50 .47 .46 .53 -
F1 .85 .87 .91 .91 .63 .68 .72 .51 .43 .46 -
F2 .68 .58 .69 .70 .89 .93 .90 .58 .51 .53 - -
F3 .51 .42 .55 .53 .59 .61 .59 .84 .85 .78 - - -
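
As a rough illustration of how the item-level part of this matrix can be summarised, the following Python sketch averages the 45 pairwise correlations among items 1 to 10. The values are copied from the lower triangle printed above; the function and variable names are hypothetical and not part of the original analysis. Because the printed entries are rounded to two decimals, the result may differ slightly from a value computed on the raw data.

```python
# Lower triangle of the item-by-item correlations (items 2-10 against earlier items),
# copied from the table above. Names below are illustrative assumptions.
LOWER_TRIANGLE = [
    [0.63],
    [0.67, 0.74],
    [0.71, 0.70, 0.76],
    [0.58, 0.46, 0.59, 0.59],
    [0.62, 0.52, 0.62, 0.65, 0.76],
    [0.64, 0.58, 0.66, 0.67, 0.69, 0.75],
    [0.48, 0.37, 0.48, 0.46, 0.50, 0.53, 0.54],
    [0.36, 0.32, 0.43, 0.41, 0.46, 0.47, 0.46, 0.57],
    [0.43, 0.34, 0.43, 0.43, 0.48, 0.50, 0.47, 0.46, 0.53],
]


def average_inter_item_correlation(rows):
    """Return the arithmetic mean of all pairwise correlations in a lower triangle."""
    values = [r for row in rows for r in row]
    return sum(values) / len(values)


n_pairs = sum(len(row) for row in LOWER_TRIANGLE)
mean_r = average_inter_item_correlation(LOWER_TRIANGLE)
print(f"{n_pairs} item pairs, mean r = {mean_r:.2f}")
```

A simple arithmetic mean is used here; averaging after a Fisher z transformation is a common alternative and would give a slightly different figure.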
Supplementary materials D. Correlations and covariances between all manifest items and
factors and latent correlations between all factors of the finalised misogyny scale after CFA
(Study 2).
Item 1 2 3 4 5 6 7 8 9 10 F1 F2 F3
1 2.80 1.67 1.84 1.99 1.28 1.41 1.58 .98 .81 .73 - - -
2 .62 2.99 1.91 2.07 1.33 1.47 1.63 1.02 .84 .76 - - -
3 .63 .70 2.72 2.28 1.46 1.61 1.79 1.12 .93 .84 - - -
4 .69 .68 .76 2.92 1.58 1.75 1.93 1.21 1.00 .90 - - -
5 .60 .51 .61 .67 2.08 1.43 1.58 1.04 .86 .74 - - -
6 .59 .56 .65 .70 .70 2.10 1.74 1.15 .95 .81 - - -
7 .62 .58 .66 .73 .65 .76 2.53 1.28 1.05 .89 - - -
8 .52 .47 .51 .53 .53 .56 .57 1.96 .89 .85 - - -
9 .40 .33 .46 .48 .49 .51 .50 .57 1.47 .71 - - -
10 .39 .37 .42 .43 .48 .44 .48 .47 .55 1.39 - - -
F1 .84 .86 .90 .91 .68 .71 .74 .58 .47 .46 1.60 1.29 1.00
F2 .67 .62 .72 .77 .87 .91 .91 .62 .55 .52 .89 1.31 1.02
F3 .53 .48 .56 .58 .60 .61 .62 .87 .85 .80 .73 .82 1.17
Supplementary materials E. Standardised factor loadings and fit indices of models for each group.
Columns 1 to 10: standardised factor loadings. Remaining columns: fit indices of models.

Group                   1    2    3    4    5    6    7    8    9    10   χ2/df  df  CFI   TLI   RMSEA  SRMR
Gender
Female (n = 385)        .72  .75  .89  .92  .75  .87  .85  .79  .84  .68  2.29   32  .979  .971  .057   .036
Male (n = 365)          .78  .75  .87  .92  .82  .86  .88  .77  .70  .63  2.22   32  .977  .967  .058   .034
Age groups
Age group 1 (n = 157)   .76  .77  .85  .89  .84  .91  .79  .74  .66  .60  1.45   32  .984  .978  .053   .034
Age group 2 (n = 279)   .77  .77  .89  .93  .76  .89  .88  .77  .71  .64  1.97   32  .974  .968  .059   .039
Age group 3 (n = 314)   .75  .75  .89  .92  .78  .84  .89  .83  .82  .73  1.51   32  .990  .986  .040   .031
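
To illustrate how standardised loadings such as these relate to composite reliability (McDonald's ω), the sketch below applies the standard formula for a single factor with uncorrelated residuals, ω = (Σλ)² / ((Σλ)² + Σ(1 − λ²)), to the loadings reported for the female group. The assignment of items 1 to 4 to F1, items 5 to 7 to F2, and items 8 to 10 to F3 is an assumption inferred from the item-factor correlations in Supplementary materials C and D. The function and variable names are hypothetical, and this is not necessarily how the reliabilities in the paper were computed.

```python
def composite_reliability(loadings):
    """McDonald's omega for one factor, given standardised loadings.

    Assumes a congeneric one-factor model with uncorrelated residuals,
    so each residual variance is 1 - loading**2.
    """
    total = sum(loadings)
    residual = sum(1 - loading ** 2 for loading in loadings)
    return total ** 2 / (total ** 2 + residual)


# Standardised loadings for the female group (items 1-10), copied from the table above.
FEMALE_LOADINGS = [0.72, 0.75, 0.89, 0.92, 0.75, 0.87, 0.85, 0.79, 0.84, 0.68]

# ASSUMPTION: item-to-factor assignment inferred from Supplementary materials C and D.
ASSUMED_FACTORS = {"F1": range(0, 4), "F2": range(4, 7), "F3": range(7, 10)}

omegas = {
    name: composite_reliability([FEMALE_LOADINGS[i] for i in items])
    for name, items in ASSUMED_FACTORS.items()
}
for name, omega in omegas.items():
    print(f"{name}: omega = {omega:.2f}")
```

The same function can be applied to the loadings of any other group in the table by swapping the input list.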
Supplementary materials F. Measurement invariance (configural, metric, and scalar).

Model Comparison χ2 (df) CFI TLI RMSEA SRMR Δχ2(Δdf) ΔCFI ΔTLI ΔRMSEA ΔSRMR MI
Gender
Model 1: CI 142.36 (64) .978 .969 .057 .032
Model 2: Metric Model 1 167.03 (71) .974 .967 .060 .050 24.67 (7)*** -.004 -.002 -.003 -.018 Met
Model 3: Scalar Model 2 207.12 (78) .971 .963 .063 .053 40.09 (7)*** -.003 -.004 -.003 -.003 Met
Model 4: Eqmeans Model 3 229.78 (81) 22.66 (3)***
Age groups
Model 1: CI 158.20 (96) .983 .977 .051 .037
Model 2: Metric Model 1 188.07 (110) .979 .975 .053 .053 29.87 (14) -.004 -.002 -.002 -.016 Met
Model 3: Scalar Model 2 234.91 (124) .975 .972 .057 .057 46.84 (14)*** -.004 -.003 -.004 -.004 Met
Model 4: Eqmeans Model 3 267.62 (130) 32.71 (6)***
Notes: Statistically significant Δχ2 were marked with ‘*’. The Satorra-Bentler scaled chi-square difference test is reported. CI = configural
invariance; Metric = Metric invariance; Scalar = Scalar invariance; Eqmeans = Equal group means. MI = measurement invariance.
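
The comparison columns of this table can be recomputed from the χ2, df, and CFI values of successive nested models. The Python sketch below does this for the gender models, using values copied from the table; the names are hypothetical. It takes plain differences only. The Satorra-Bentler scaled difference test additionally requires the scaling correction factors of both models, which are not reported here, so the sketch does not reproduce that test or its p-values.

```python
# (label, chi-square, df, CFI) for the gender models, copied from the table above.
# CFI is None where the table reports no value.
GENDER_MODELS = [
    ("Configural", 142.36, 64, 0.978),
    ("Metric", 167.03, 71, 0.974),
    ("Scalar", 207.12, 78, 0.971),
    ("Eqmeans", 229.78, 81, None),
]


def nested_deltas(models):
    """Return (label, delta chi-square, delta df, delta CFI) for each successive model pair."""
    rows = []
    for (_, chi_a, df_a, cfi_a), (label, chi_b, df_b, cfi_b) in zip(models, models[1:]):
        delta_cfi = None if cfi_a is None or cfi_b is None else round(cfi_b - cfi_a, 3)
        rows.append((label, round(chi_b - chi_a, 2), df_b - df_a, delta_cfi))
    return rows


deltas = nested_deltas(GENDER_MODELS)
for label, d_chi, d_df, d_cfi in deltas:
    print(f"{label}: delta chi2 = {d_chi} (delta df = {d_df}), delta CFI = {d_cfi}")
```

The age-group models can be checked the same way by replacing the input list with the corresponding rows.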
Supplementary materials G. Latent Mean Differences.
Model Factor MDiff Cohen’s d

Gender, Male = 0
F1 -.482*** .38
F2 -.324*** .27
F3 -.197*** .19
Age Groups
Age group 1-2 F1 .187*** .14
F2 -.019 .02
F3 -.048 .05
Age group 1-3 F1 .333*** .28
F2 -.055 .05
F3 -.159*** .17
Age group 2-3 F1 .166*** .13
F2 -.038 .03
F3 -.159*** .14