Construct Validity and Reliability of The Scale of Attitudes Towards External Assessments Applied On A Large Scale
Abstract: The Scale of Attitudes towards External Assessments applied on a Large-Scale was developed to capture what basic
education teachers think, feel, and how they behave in response to this type of assessment. Considering the potential of the instrument
to support management decisions in the field of educational assessment policies, this article aimed to evaluate construct validity,
via factor analysis, as well as scale reliability, via composite reliability of the scale, based on its application to a sample of teachers
affiliated with the Education Department of Espírito Santo/Brazil. This is a quantitative, nonexperimental, instrumental study
involving 405 teachers from the public school network. The results indicate adequate psychometric indices and a satisfactory
factor structure consistent with the proposed three-dimensional attitude construct. The statistical coefficients obtained in
the analyses legitimize its use for the development of public policies and effective practices in the educational field.
Keywords: psychometrics, attitude scale, teachers, educational research, factor analysis
1 Instituto Federal de Educação, Ciência e Tecnologia de Minas Gerais, Ouro Preto - MG, Brazil
2 Centro Federal de Educação Tecnológica Celso Suckow da Fonseca, Rio de Janeiro - RJ, Brazil
3 Instituto Federal de Educação, Ciência e Tecnologia do Sul de Minas Gerais, Passos - MG, Brazil
4 Universidade Federal do Espírito Santo, Vitória - ES, Brazil
Support: Article derived from the research project entitled “Standardized evaluation in the states of Espírito Santo/Brazil and Baja California/Mexico: dilemmas and tensions of Paebes and Exeims-BC,” funded by the Foundation for Research and Innovation Support of Espírito Santo, Public Notice No. 03/2021 – Universal, No. SIAFEM: 2021-1WKB8.
Correspondence address: Denilson Junio Marques Soares. Instituto Federal de Educação, Ciências e Tecnologia de Minas Gerais - Campus Ouro Preto - Rua Pandiá Calógeras, 898, Bauxita, Ouro Preto, CEP 35400-000. E-mail: [email protected]
Available in www.scielo.br/paideia
Paidéia, 34, e3410

In the educational context, assessments have evolved into a multifaceted domain, transcending specific theories, processes, and methods. This approach is manifested in the development and guidance of evidence-based public policies, which seek to establish quality standards aligned with the constant social, cultural, scientific, and technological transformations that drive innovation and knowledge production in an increasingly globalized world.
When assessments are implemented by agents external to the school, they are called external assessments. These are generally applied on a Large Scale, i.e., to a large number of people, and provide important information for the monitoring of educational systems. Thus, external Large
Scale assessments have gained relevance in the national and international scenarios, being highlighted for their role as instruments in the development of public policies that impact teaching practice, aiming to improve the quality of teaching (Soares et al., 2022a).
Studies developed by educational researchers have highlighted this movement, seeking to understand what impacts the accountability policies manifested by these assessments can have on teaching practice (Baidoo-Anu & Ennu Baidoo, 2022). However, there is still a lack of studies in the literature aimed at assessing the attitudes of teaching professionals towards such assessments.
The Scale of Attitudes towards External Assessments applied on a Large Scale (EAAE), proposed by Soares et al. (2022a), was developed for this purpose. It is a 30-item instrument, composed of statements, which seeks to capture what basic education teachers think, feel, and how they behave towards this type of assessment. The EAAE employs a five-point Likert scale, ranging from 1 (strongly disagree) to 5 (strongly agree). In summary, it is expected that lower (higher) scores indicate less (more) positive/favorable attitudes towards the object evaluated.
During EAAE development, content validity was analyzed by experts, based on the calculation of the Content Validity Coefficient (CVC), and a pilot application was conducted with a sample of the target population (Soares et al., 2022a). However, considering the potential of the EAAE to support decision-making within the scope of Large Scale assessment policies, its validation process requires further studies to confirm its validity and reliability, as the use of scales with good psychometric parameters is essential to ensure the accuracy and usefulness of the results in different contexts, including education.
Accordingly, this research aimed to evaluate construct validity, via Factor Analysis, as well as scale reliability, via composite reliability, based on its application to a sample of teachers affiliated with the Education Department of Espírito Santo/Brazil.
The EAAE is part of the field of study on attitudes that originated in the early 20th century, based on the contributions of sociologists Allport (1935) and Thomas and Znaniecki (1919). In search of a definition that would fit the various theories and perspectives of the field, and based on a systematic review of the literature on the concept of attitude, Eagly and Chaiken (1993) define it as a hypothetical construct related to “a psychological tendency that is expressed by evaluating a particular entity with some degree of favor or disfavor” (Eagly & Chaiken, 1993).
The specialized literature presents various models to explain attitude, of which the tripartite model, introduced by Rosenberg and Hovland (1960), is the most relevant, as it encompasses multiple psychological factors (Mazana et al., 2019; Svenningsson et al., 2022). This model, assumed by the EAAE, considers the following interrelated dimensions: cognitive, affective, and behavioral.
The cognitive component is related to the beliefs, perceptions, concepts, and knowledge that the individual has about the attitudinal object, being usually elicited in its verbal or written form. The affective component, in turn, refers to feelings, emotions, and sensations, assuming a connotation of the individual’s evaluative judgment in relation to the object in question, positively or negatively. As indicated by Svenningsson et al. (2022), a special case of the affective dimension is interest, analytically understood as an emotional schema that also includes cognitive dimensions.
The theory of planned behavior (TPB), initially proposed by Ajzen and Fishbein (1980), holds that these two components (cognitive and affective) determine, in part, the behavioral intention, which is the immediate motivational factor for the behavior itself. In this respect, behavioral intention can be seen as a direct result of the affective-cognitive consistency of the subjects (Svenningsson et al., 2022). However, according to Ankiewicz (2019), this influence can be positive or negative, depending on other factors that also affect behavior, such as situational and cultural factors. The analysis of the internal structure of the instrument allows us to examine these relationships.

Methods

This is a quantitative, non-experimental, instrumental study (Carretero-Dios & Pérez, 2007), conducted with a cross-sectional design, which consists of the search for evidence of structural validity of the EAAE.

Participants

The EAAE was applied to a non-probabilistic convenience sample of 405 teachers linked to the Education Department of Espírito Santo/Brazil. After treating missing cases and extreme values (univariate and multivariate outliers), 367 responses were considered valid (n = 367). As the scale consists of 30 items, there are approximately 12.23 subjects per item (367/30), which exceeds the recommendations of Hair et al. (2021), who suggest, as a general rule, a minimum sample of 200 respondents and an ideal ratio of at least 10 subjects per item.
In this sample, there was a predominance of women (62.13%), whites (56.4%), graduates of federal universities (51.5%) with a teaching degree (72.48%) or a teaching degree and a bachelor’s degree (19.35%), who work in high school (77.11%), in a single school (61.85%), with an average workload of 36 hours (SD = 10.18), and on a permanent basis (51%). The age of participants ranged from 22 to 69 years (M = 40.5, SD = 9.56) and the average time of teaching was 13 years (SD = 9.32). Regarding schooling, 16.08% held a bachelor’s degree as their highest qualification, 54.5% were specialists, 21.53% were masters, 6.81% were doctors, and 1% were post-doctorates.
The participating teachers worked in 29 of the 78 municipalities of Espírito Santo, of which most lived and worked in the Metropolitan Region of Greater
Soares, D. J. M. et al. (2024). Construct Validity and Reliability of the EAAE.
Vitória (64.31%). As for the subject they teach, most are in the area of Languages (28.34%), followed by Natural Sciences (21.25%), Human Sciences (19.62%), and Mathematics (18.80%), respectively. Approximately 12% of participants reported working in other disciplines or did not want to state the discipline in which they work.

Instruments

The Scale of Attitudes towards External Assessments applied on a Large Scale (EAAE) was used. It consists of 30 items, elaborated in the form of assertions and structured on a five-point Likert scale, ranging from 1 (strongly disagree) to 5 (strongly agree). It seeks to capture what basic education teachers (target audience) think, feel, and how they behave towards the assessments of this type that are applied on a Large Scale in Brazil.
To this end, it considers the attitudes construct, composed of the cognitive (12 items), affective (8 items), and behavioral (10 items) dimensions. Each dimension is accompanied by a guiding phrase: for the cognitive dimension, participants were requested to answer based on what they believe (beliefs, knowledge, information, and/or opinions) about external assessments applied on a Large Scale; for the affective dimension, a response based on feelings was requested; and, for the behavioral dimension, a response based on daily actions was requested.
The attitude measurement values, both for the general scale and for each of its dimensions, were obtained by summing the answers given by the participant on the respective items. Thus, the EAAE score varies from 30 to 150 points, with a neutral score of 90 points. In summary, higher values (above the neutral point) reveal more positive attitudes and, on the other hand, lower values (below the neutral point) indicate more negative attitudes towards external assessments applied on a Large Scale.

Procedures

Data collection. The scale was applied in the online (38.15%) and face-to-face (61.85%) formats. For the online method, an online survey form was used, hosted on the Google Forms platform, with dissemination via e-mails, sent by the Education Department of Espírito Santo (SEDU), and via the WhatsApp messaging application, in specific groups of teachers in the network. For the face-to-face method, subjects were approached in their workplaces.
Notably, in both methods, participants signed an informed consent form before responding, which explained the objectives of the study and ensured the confidentiality of the information provided. In the description of the applied instrument, it was emphasized that participation was voluntary and that participants could withdraw at any time, without penalty. The absence of right or wrong answers was also emphasized, and the anonymity of the participants was assured.
Data analysis. Initially, descriptive statistics were estimated for the scale score and its dimensions. Then, to assess the factorial structure of the EAAE and thus search for evidence of construct validity, an Exploratory Factor Analysis (EFA) was performed. To verify the possibility of factoring the data, two indices were analyzed: the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy, which needs to be at least 0.60 to support this type of analysis, and Bartlett’s test of sphericity, whose chi-square value must be statistically significant (Tabachnick & Fidell, 2007).
The analysis was implemented using a polychoric correlation matrix and the robust diagonally weighted least squares (RDWLS) extraction method. To define the number of factors to be extracted, the Parallel Analysis technique was used with random permutation of the observed data, and the rotation adopted was the Robust Promin (Timmerman & Lorenzo-Seva, 2011). The Hull method was also used to aid in deciding the number of dimensions to be retained (Timmerman & Lorenzo-Seva, 2011).
To confirm the hypothetical factor structure found via the EFA, a confirmatory factor analysis (CFA) was performed to verify whether this structure was adequate to the observed variables, thus consolidating the theoretical model previously identified by the EFA (Hair et al., 2021). The adequacy of the model was assessed using the Root Mean Square Error of Approximation (RMSEA), Comparative Fit Index (CFI), and Tucker-Lewis Index (TLI) fit indexes. According to the literature, adequate RMSEA values should range from 0.05 to 0.08, which can go up to 0.10, and CFI and TLI values should be above 0.90, or preferably 0.95 (Timmerman & Lorenzo-Seva, 2011).
The factor loadings and thresholds of the items were also assessed. These indicators were analyzed to investigate in depth the accuracy of the items, using factor loadings, as well as the difficulty limits of the items, using thresholds, assessed using the Reckase parameterization (Reckase, 1985).
Finally, a Gaussian graphical model was estimated, regularized by the L1 regularization technique (LASSO), with model selection by the Extended Bayesian Information Criterion (EBIC). The model was presented in a network structure, in which the nodes represent the questionnaire items and the lines (edges) represent the relationship between the questionnaire items, aiming to identify the strength of the correlation between them (Epskamp & Fried, 2018).

Ethical Considerations

The research was approved by the Research Ethics Committee on Human Subjects of the Universidade Federal do Espírito Santo, CAAE No. 57014722.2.0000.5542 and was authorized by the Education Department of Espírito Santo.

Results

Table 1 shows the descriptive statistics estimated for the applied EAAE and its cognitive, affective, and behavioral dimensions. Note that the variation in scores indicates the relevance of the scale to discriminate positive and negative attitudes towards external assessments applied on a Large Scale.
Regarding the factorial structure of the scale, Bartlett’s sphericity test (131.02, df = 29, p < 0.001) and KMO (0.953) suggested the interpretability of the correlation matrix of the items. The parallel analysis and the Hull method suggested three factors as being the most representative for the data, as indicated by Figures 1 and 2, respectively. In the first, the eigenvalues and random data obtained from the resampling process via bootstrap methods were presented.
Table 1
Descriptive Statistics of the EAAE Application
Scale and its dimensions   Quantity of items   Mean Score   Standard Deviation   Coefficient of Variation   Range
Cognitive                  12                  34.55        10.00                28.94%                     12–60
Affective                  8                   22.32        7.60                 34.05%                     8–40
Behavioral                 10                  34.70        7.80                 22.48%                     10–50
Full Scale                 30                  91.57        21.97                23.99%                     30–150
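
The ranges and coefficients of variation in Table 1 follow from the scoring rule described in the Instruments section (sum of item responses coded 1 to 5). The following minimal Python sketch recomputes them from the values reported in Table 1 only. The function name `summarize` and the dictionary layout are illustrative assumptions and are not part of the original analysis.

```python
# Values copied from Table 1: (number of items, mean score, standard deviation).
TABLE_1 = {
    "Cognitive": (12, 34.55, 10.00),
    "Affective": (8, 22.32, 7.60),
    "Behavioral": (10, 34.70, 7.80),
    "Full Scale": (30, 91.57, 21.97),
}


def summarize(n_items, mean, sd, low=1, high=5):
    """Derive score range, neutral point, and CV for a summed Likert score."""
    neutral = n_items * (low + high) / 2  # every item answered at the midpoint
    return {
        "min": n_items * low,
        "max": n_items * high,
        "neutral": neutral,
        "cv_percent": 100 * sd / mean,  # coefficient of variation = SD / mean
        # Simplification: a mean exactly at the neutral point is labeled "negative".
        "direction": "positive" if mean > neutral else "negative",
    }


summary = {name: summarize(*values) for name, values in TABLE_1.items()}

for name, row in summary.items():
    print(
        f"{name}: range {row['min']}-{row['max']}, "
        f"neutral {row['neutral']:.0f}, CV {row['cv_percent']:.2f}%, "
        f"mean on the {row['direction']} side"
    )
```

Comparing each mean with its neutral point is only a descriptive reading of the table, not an inferential test.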
Figure 1
Parallel Analysis Scree Plots
[Figure: eigenvalues of principal factors (y-axis, 0 to 14) plotted against factor number (x-axis, 0 to 30); series: FA Actual Data, FA Simulated Data, FA Resampled Data.]
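
The logic behind a parallel analysis such as the one summarized in Figure 1 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure used in the study: it uses Pearson correlations and eigenvalues of the full correlation matrix (the study used a polychoric matrix and reports eigenvalues of principal factors), and the data are simulated. The function names, the simulated three-factor structure, and the 95th-percentile criterion are all hypothetical choices made for the example.

```python
import numpy as np


def sorted_eigenvalues(data):
    """Eigenvalues of the Pearson correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def parallel_analysis(data, n_permutations=200, percentile=95, seed=0):
    """Count leading eigenvalues that exceed those of column-permuted data."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    observed = sorted_eigenvalues(data)
    permuted = np.empty((n_permutations, data.shape[1]))
    for i in range(n_permutations):
        # Shuffling each column independently keeps the item distributions
        # but destroys the correlations between items.
        shuffled = np.column_stack([rng.permutation(col) for col in data.T])
        permuted[i] = sorted_eigenvalues(shuffled)
    threshold = np.percentile(permuted, percentile, axis=0)
    exceeds = observed > threshold
    # Number of leading eigenvalues above the permutation threshold.
    n_factors = data.shape[1] if exceeds.all() else int(np.argmin(exceeds))
    return n_factors, observed, threshold


def simulate_likert(n_respondents=367, items_per_factor=4, n_factors=3,
                    loading=0.7, seed=1):
    """Simulated 5-point responses with a known factor structure (hypothetical)."""
    rng = np.random.default_rng(seed)
    factors = rng.standard_normal((n_respondents, n_factors))
    noise = rng.standard_normal((n_respondents, n_factors * items_per_factor))
    latent = (loading * np.repeat(factors, items_per_factor, axis=1)
              + np.sqrt(1 - loading ** 2) * noise)
    return np.clip(np.rint(3 + latent), 1, 5)


responses = simulate_likert()
n_factors, observed, threshold = parallel_analysis(responses)
print("Suggested number of factors:", n_factors)
```

For ordinal items, replacing the Pearson matrix with a polychoric one would be closer to the analysis described in the Methods section.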
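Figure 2
Hull Method
[Figure: fit index f (y-axis, 0.0 to 0.6) plotted against degrees of freedom df (x-axis); plotted solutions are labeled by number of factors.]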
From the determination of the three-dimensional model for the structuring of the scale, confirmed by the CFA, we sought to estimate the factor loadings and thresholds of the items, which can be observed in Table 2. The variance explained by the scale and its dimensions was also reported, as well as the composite reliability indices (Timmerman & Lorenzo-Seva, 2011).
Table 2
Psychometric analyses of EAAE
As Table 2 shows, the factor structure confirms the differentiation of the three dimensions of the attitude construct. Factor I, which presented the highest percentage of explained variance (R² = 25%), integrates the items related to the cognitive dimension; Factor II includes all those related to the behavioral component (R² = 21%); and Factor III, all those related to the affective component (R² = 18%).
As for the factor loadings, associated with the precision of the items, there are adequate and relatively high values in their respective factors, ranging from 0.484 to 0.873 on the scale. Only two items presented a cross-loading pattern (i.e., items with factor loadings above 0.30 in more than one factor), namely Items 5 and 13. However, Pratt’s importance measures (Wu & Zumbo, 2017) demonstrated that both items were more strongly explained by their original factors. The fit indices of the instrument were adequate (χ² = 777.708, df = 348; p < 0.001; RMSEA = 0.058 (0.053 – 0.064); CFI = 0.997; TLI = 0.996). The composite reliability was also acceptable (above 0.70) for all factors.
Regarding the thresholds, estimated via item response theory, no unexpected pattern of response was found, with a gradual increase in the difficulty of response along the interval scale; that is, as the response category on the scale increased, so did the level of latent trait required for endorsement. Thus, the difficulty is greater when the answer option of the item is closer to the alternative “I totally agree.” Items 1 “Adequately assess the quality of teaching and learning” and 18 “I feel that my knowledge is valued by them” presented greater difficulty, and items 25 “I talk to students about its importance” and 26 “I recommend participation in these assessments” were easier to answer.
The correlations obtained between the cognitive factor and the affective and behavioral factors were 0.778 and 0.574, respectively. Between the affective and behavioral factors, the correlation was 0.530. The networks of partial correlations between the EAAE items are represented in Figure 3, in which the size and density of the edges between the nodes (which represent the EAAE items) indicate the strength of the existing correlation.
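
As a rough check on two of the indices reported above, the sketch below computes the RMSEA point estimate from the reported chi-square, degrees of freedom, and sample size, and shows how composite reliability is obtained from standardized loadings. It assumes the common formula RMSEA = sqrt(max(χ² − df, 0) / (df (N − 1))) (some programs divide by N instead) and the usual composite reliability formula with uncorrelated errors. The loadings in the example are hypothetical, since the item-level values of Table 2 are not reproduced here, and the robust estimator used in the study may yield slightly different figures.

```python
import math


def rmsea(chi_square, df, n):
    """RMSEA point estimate from a model chi-square (N - 1 convention)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def composite_reliability(loadings):
    """Composite reliability from standardized loadings of one factor.

    Assumes uncorrelated errors, so each error variance is 1 - loading**2.
    """
    total = sum(loadings)
    error_variance = sum(1 - loading ** 2 for loading in loadings)
    return total ** 2 / (total ** 2 + error_variance)


# Values reported in the text: chi-square = 777.708, df = 348, n = 367.
reported_fit = rmsea(777.708, 348, 367)

# Hypothetical loadings for illustration only (not taken from Table 2).
hypothetical_loadings = [0.70, 0.70, 0.70]
example_cr = composite_reliability(hypothetical_loadings)

print(f"RMSEA from reported chi-square: {reported_fit:.3f}")
print(f"Composite reliability (hypothetical loadings): {example_cr:.3f}")
```

Under this formula the reported chi-square, degrees of freedom, and sample size give a value that rounds to the RMSEA of 0.058 stated in the text.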
Figure 3
Partial correlation networks between EAAE items
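
The edge weights in a network of this kind are partial correlations, which can be read off the inverse of the covariance matrix. The sketch below shows only this unregularized step, assuming a known covariance matrix; it does not implement the LASSO regularization or the EBIC model selection described in the Methods section. The function name and the three-variable example are hypothetical.

```python
import numpy as np


def partial_correlations(covariance):
    """Partial correlation matrix from a covariance matrix (no regularization)."""
    precision = np.linalg.inv(np.asarray(covariance, dtype=float))
    scale = np.sqrt(np.diag(precision))
    # Partial correlation between i and j given all other variables:
    # -p_ij / sqrt(p_ii * p_jj), where p is the precision matrix.
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 0.0)  # no self-edges in the network
    return pcor


# Hypothetical chain structure A - B - C: A and C are related only through B.
example_precision = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
])
example_covariance = np.linalg.inv(example_precision)
edges = partial_correlations(example_covariance)
print(np.round(edges, 3))
```

In this example A and C are correlated marginally, but their partial correlation is zero, so a network drawn from `edges` would show no direct edge between them.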
Figure 3 reveals a network with many connections between the nodes. Especially strong connections emerge within each factor and between some items of the cognitive and affective factors, reinforcing the existence of a strong correlation between them. From Figure 3, it is also possible to infer, although subjectively, the tripartite structure of the EAAE.

Discussion

Initially, it should be noted that the mean values obtained by the cognitive and affective dimensions of the scale, indicated in Table 1, reflect, for the sample analyzed, negative attitudes towards external assessments applied on a Large Scale. On the other hand, the behavioral dimension indicates positive attitudes, signaling that despite negative beliefs and feelings, in general, the teachers have positive responses in the behavioral component, reflecting aspects of cognitive dissonance (Yahya & Sukmayadi, 2020). For the scale as a whole, the mean score also indicates positive attitudes.
This result is revealing and dialogues with the significant increase in assessment systems based on accountability policies on exams and the use of management models based on corporate rationality, called the Global Education Reform Movement (GERM) (Falabella, 2021). These policies involve credentialing, promotion, and inspection processes,
as well as rewarding or punishing schools and teachers, resulting in greater school control. This management model reflects the market ideology in education.
Parcerisa et al. (2022) postulate that these policies regulate teaching practice, aligning teachers’ behaviors with the State’s intentions around external assessments. Thus, even if teachers disagree with Large Scale assessment policies, they adopt consonant practices via political mechanisms of coercion created by the State, which may explain the cognitive dissonance evidenced.
It should be noted that the state of Espírito Santo, Brazil, has assumed external assessment as the central axis of educational policies since the beginning of this century. To this end, several initiatives have been implemented to monitor student performance and the quality of education offered by schools, which are used as a basis for decision-making and the implementation of public policies towards education.
Among them, we highlight: the emergence of the Basic Education Assessment Program of Espírito Santo (PAEBES), in 2000, with the declared objective of assessing the performance of the state public network of elementary and secondary education, and its reformulations; the implementation of the new common curricular base, based on the notions of competencies and skills (Espírito Santo State Department of Education, 2009); and the establishment of the School Development Index of Espírito Santo (IDE) and the performance bonus policy (Complementary Law No. 504 of 23 November 2009).
In summary, the performance bonus policy provides monetary rewards to teachers and other education professionals in the state who have achieved pre-established educational goals, based on the results of the PAEBES. The amount to be received can reach up to 150% of the teacher’s base salary, which represents a significant amount for the category, with the potential to influence their daily school activities.
This practice has generated some criticism and controversy regarding its effectiveness. Soares et al. (2022b), for example, showed that bonuses can be seen as a way of pressuring teachers to achieve results at any cost, without considering the real working conditions and difficulties faced in the day-to-day life of the classroom. Moreover, the authors argue that the vertical way in which the policy was implemented generated harmful competition among teachers, pressuring them to focus only on the content that is evaluated by PAEBES, to the detriment of other important areas of knowledge, which would amount to a gaming and score inflation tactic (Baidoo-Anu & Ennu Baidoo, 2022).
On the other hand, some teachers see this policy as a way of recognizing and valuing their work, allowing an increase in their remuneration. These teachers believe that bonuses can encourage the improvement of student performance and, consequently, improve the quality of the education offered (Soares et al., 2022b). This contrast of opinions can be identified in the high coefficients of variation obtained for the total scale and its dimensions. This statistic reveals the absence of a uniform conception of the object among the sample participants.
Regarding the analysis of the internal structure, the results indicate that the EAAE is a tool with adequate psychometric indicators and a satisfactory factorial structure that is consistent with the three-dimensional proposal of the attitude construct, given that the factor analysis revealed three factors that explained 64% of the total variance. In addition, the items presented adequate and high factor loadings in their respective factors, whose composite reliability was also acceptable. The variation in the scores also points to the relevance of the EAAE to discriminate positive and negative attitudes towards the assessments.
It should be noted that the use of scales with good psychometric parameters is essential to ensure the accuracy and usefulness of the results in different contexts, including education. In this respect, the statistical coefficients found in the psychometric analyses legitimize its use. Therefore, by applying it to a specific target audience, the EAAE can generate discussions and reflections on the impact of assessments on teaching practice and on their relationship with social/demographic/economic variables and with the results achieved by different school units, among others, which can contribute to the development of more effective public policies and educational practices.
It is also necessary to highlight the association between the items of the cognitive and affective dimensions of the EAAE evidenced by Figure 3, which, in fact, was expected. As indicated by the literature, the beliefs and thoughts a person has about an object influence their emotions and feelings associated with it (Eagly & Chaiken, 1993; Rosenberg & Hovland, 1960). In the case of this scale, it is understood that a professional who believes that external assessments applied on a Large Scale “adequately assess the quality of teaching and learning” (Item 1) and/or “satisfactorily fulfill the purpose of measuring students’ learning levels” (Item 3) may feel positive emotions, such as appreciation for this type of assessment (Item 13), which explains the magnitude of the connections between these items.
However, it is necessary to consider the external validity of the scale, in terms of generalizing its results to other populations or contexts. The sample analyzed in this study was extracted from a context recognized by the specialized literature as an Evaluative State, in which decision-making and the allocation of technical and financial resources in the educational field are based on the metadata produced by external assessments applied on a Large Scale (Costa et al., 2019). It is important to consider this limitation in the interpretation of the results obtained, as well as in the application of the EAAE in other populations.
Moreover, other factors or variables that are not measured by the scale may affect the attitudes of the subjects investigated. This may limit its construct validity and requires further studies. In fact, the evidence of validity of any instrument needs to be continuously verified and, thus, subsequent psychometric studies should be performed to investigate it in different contexts.
Authors’ Contribution:
All authors made substantial contributions to the conception
and design of this study, to data analysis and interpretation,
and to the manuscript revision and approval of the final
version. All the authors assume public responsibility for the
content of the manuscript.
Associate editor:
Sônia Maria Guedes Gondim