

METTL TEST FOR ABSTRACT REASONING (MTAR)
Technical Manual

Copyright © 2019 Mercer Mettl. All rights reserved.

Revised in 2022

Mettl Test for Abstract Reasoning Manual© may not, in whole or in part, be copied, photocopied, reproduced, translated, or converted to any electronic
or machine-readable form without prior written consent of Mercer Mettl.

Table of Contents

Executive Summary
Theoretical Foundations of the MTAR
    Definition of Intelligence
    Fluid Intelligence/Abstract Reasoning
Literature Review
Test Development & Standardization - MTAR
    Item Banking
    Item Bank Development
    Item Bank Calibration
Psychometric Properties of MTAR
    Internal Consistency Reliability
    Validity
    Construct Validity
    Criterion Validity
Group Differences: Adverse Impact Analysis
Administration, Scoring and Interpretation
Summary Remarks and Recommendations for Use
Appendices
    Appendix 1: Demographic details of Pilot study (N = 710)
    Appendix 2: Demographic Details of Standardization Study (N = 1218)
    Appendix 3: Criterion Validity Results
    Appendix 4: Adverse Impact Analysis
    Appendix 5: Sample Item and Sample Report
        Sample Report
    Appendix 6: Demographic details for the norming sample - Global (2021)
    Appendix 7: Demographic details for the norming sample - India (2021)
    Appendix 8: Demographic details for the norming sample - Simplified Mandarin (2021)
    Appendix 9: Demographic details for the norming sample - Portuguese (2021)
    Appendix 10: Demographic details for the norming sample - Spanish (2021)
    Appendix 11: Demographic details for the norming sample - Turkish (2021)
References

Executive Summary

The purpose of this technical manual is to describe the process of standardization and validation of the Mettl Test for Abstract Reasoning (MTAR). The MTAR is a nonverbal test designed to measure an individual’s fluid intelligence: the ability to make meaning out of ambiguity, manage new information and solve novel problems. Organizations across the globe use ability tests as part of their hiring process. Empirical research has shown that cognitive ability tests are extremely useful in assessing candidates’ capability to reason, solve problems and make appropriate decisions, all of which are associated with better work outcomes. In comparison with other methods of employment testing, especially interviews, which are prone to subjective bias, cognitive tests are objective in nature. In addition, in increasingly global and diverse employment settings there is a growing need for non-verbal reasoning tests that are free from cultural bias; such tests are helpful for candidates from diverse backgrounds for whom English is not their first language. In our experience, abstract reasoning tests are among the most used and most effective tests for predicting job performance. The previous version of this test was used in hiring and developmental initiatives in major industries such as e-commerce, financial services, manufacturing, retail, IT and ITES, and the results indicate a positive relationship between the MTAR and competencies such as ambiguity tolerance, learning agility and innovation.

Mettl’s Test for Abstract Reasoning is a test of inductive, rather than deductive, reasoning. That is, it requires respondents to look for patterns in information and then generalise those patterns to the next space in a sequence. It is a non-verbal, abstract measure that uses shapes and patterns to test respondents’ lateral thinking abilities. It measures the following capabilities of test takers:
• Ability to understand and detect the meaning behind data or given information.
• Ability to identify the relationship between subtle ideas.
• Ability to think abstractly, finding patterns and relationships to solve novel problems.
• Ability to grasp the bigger picture, think clearly and effectively solve complex problems.
• Ability to process and analyse ambiguous information.
• Ability to think creatively and come up with innovative solutions.

• Ability to learn new skills quickly and efficiently.

The test consists of increasingly difficult pattern-matching tasks and has little dependency on language abilities. Each item in the MTAR comprises a pattern of diagrammatic puzzles with one piece missing. The candidate’s task is to choose the correct missing piece from a series of possible answers. The following goals guided the development of the MTAR. The test must be:
• Relevant: The test is designed to measure an individual’s ability to find patterns in information, solve problems and deal with abstract situations.
• Credible: This manual outlines the statistical evidence of reliability and validity, and therefore the credibility, of the assessment.
• Easy to Use and Interpret: The assessment has been designed with simple, easy-to-understand instructions. The feedback reports are also simple to interpret.
• Convenient: The assessment is short, taking 20-30 minutes to complete on average. It is available online and accessible from anywhere in the world.
• Free from cultural biases: The test has undergone statistical analysis to ensure it is free from bias or adverse impact.
• In line with international standards of psychological testing: The MTAR has been developed in line with the Uniform Guidelines on Employee Selection Procedures (EEOC, 1978), the Society for Industrial and Organizational Psychology’s Principles for the Validation and Use of Personnel Selection Procedures (SIOP, 2003), the EFPA Test Review Model, and the Standards for Educational and Psychological Testing developed jointly by the American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (1999).

Theoretical Foundations of the MTAR


Definition of Intelligence

Kanazawa (2004)i, Sternberg (1997)ii and Weinberg (1989)iii variously defined intelligence as:

• The mental abilities that enable one to adapt to, shape, or select one’s environment.

• The ability to deal with novel situations.


• The ability to judge, comprehend, and reason.
• The ability to understand and deal with people, objects, and symbols.
• The ability to act purposefully, think rationally, and deal effectively with the environment.

The intellectual or cognitive ability of an individual cannot be reduced to a single function or capacity. Psychologists have therefore attempted to identify the various components of intelligence, resulting in theories and models such as Spearman’s two-factor theory, Cattell’s theory of fluid and crystallized intelligence, Thurstone’s theory of primary mental abilities, Gardner’s theory of multiple intelligences and Sternberg’s triarchic theory. Spearman’s two-factor theory proposed that intelligence is a general cognitive ability which energizes diverse mental faculties and functions. According to Spearman there are two components of intelligence: general intelligence, or ‘G’, which influences performance on all mental tasks, and specific intelligence, which influences ability on a particular task. Thurstone, on the other hand, proposed that intelligence consists of seven primary abilities, namely reasoning, verbal comprehension, numerical ability, word fluency, perceptual speed, spatial visualization and associative memory. Gardner alternatively proposed eight distinct types of intelligence, including musical, kinaesthetic, spatial and inter- as well as intrapersonal ability. Sternberg’s triarchic theory involves three factors: analytical, creative and practical intelligence. In summary, despite considerable debate, the definition and exact nature of intelligence remain contested. However, Spearman’s two-factor theory and Horn & Cattell’s theory of fluid and crystallized intelligence are the two most dominant theories of intelligence, and they are also the most psychometrically sound and empirically tested. Therefore, we used these theories in conceptualizing our cognitive tests, especially the Mettl Test for Abstract Reasoning and the Mettl General Mental Ability Test.

Fluid Intelligence/Abstract Reasoning

The MTAR is based on Horn & Cattell’s (1967)iv theory of fluid and crystallized intelligence. According to Cattell (1987)v, intelligence is broadly classified into two distinct factors: fluid and crystallized intelligence. Fluid intelligence is the ability to reason and use novel information; it includes the ability to discern relationships, solve novel or unfamiliar problems, and expand one’s knowledge base with new information. Crystallized intelligence, on the other hand, is the capability to acquire skills and knowledge and apply that knowledge in specific situations.

Cattell (1987) believed the label ‘fluid intelligence’ reflected the construct’s quality of being applicable to almost any problem, which is why it is assessed with nonverbal or graphical items. The term fluid is intended
to indicate that fluid intelligence is not tied to any specific habits or sensory, motor, or memory area (Cattell,
1987). Fluid intelligence is a basic reasoning ability that can be applied to any problem, including unfamiliar
ones. It is an essential aspect of human cognition because it allows us to adapt to novel and challenging
situations and helps in figuring things out. It also represents the ability to detect meaningful patterns and
relationships.

Figure 1: Difference between Fluid and Crystallized Intelligence

Literature Review

Intelligence is one of the most investigated and significant predictors of real-world outcomes like academic performance, training performance and on-the-job performance (Kuncel & Hezlett, 2007vi; Salgado, Anderson, Moscoso, Bertua, & de Fruyt, 2003vii; Schmidt & Hunter, 1998viii). As per the findings of a meta-analysis conducted by Postlethwaite (2011)ix, fluid intelligence is a significant predictor of performance in high-complexity occupations. Fluid intelligence includes basic cognitive abilities which are essential to assimilate critical evidence about a problem or decision. To answer [abstract reasoning] questions, a person must generate hypotheses, test them, and infer rules (Carpenter, Just, & Shell, 1990). Fluid intelligence is also significantly related to metacognition and to high reasoning and problem-solving ability (Cattell, 1971). Duncan, Burgess, and Emslie (1995)x believed that fluid intelligence relies on prefrontal cortex activation and may be the best measure of executive functioning. Zook et al. (2004)xi also reported the significant role of fluid intelligence in executive functioning, measured in terms of successfully solving complex, goal-directed problem-solving tasks.

Kuncel, Hezlett, and Ones (2004)xii believe that both fluid and crystallized intelligence play important roles in the work setting. Effective job performance depends both on the effective processing of new information and on prior learning and experience. For efficient workplace functioning it is important that employees possess both technical knowledge and the ability to acquire new knowledge, which allows them to use new information to solve novel problems. In sum, “selecting employees for their ability to solve problems that don’t exist today…to be able to learn new technologies quickly” is the need of the contemporary organization (Baker, 1996)xiii. In order to predict job performance accurately we also offer numerical and verbal reasoning tests, which measure a candidate’s crystallized intelligence, as well as a broad measure of ‘G’ with the general mental ability test.

Fluid intelligence has also proven to be a significant predictor of an individual’s ability to multitask (Ben-Shakhar & Sheffer, 2001xiv; König & Mürling, 2005xv). Individuals who score high on fluid intelligence/abstract reasoning tests are good at managing large amounts of information and prioritising. A large body of research also suggests a strong link between fluid intelligence and working memory (Ackerman, Beier, & Boyle, 2005xvi; Kane & Engle, 2002xvii). Lastly, fluid intelligence has also proven to be a significant determinant of learning, specifically in novel conditions (Kvist & Gustafsson, 2008xviii; Watkins, Lei & Canivez, 2007xix). This is because an individual’s early learning phase is generally disorganized and ambiguous, and the ability to conceptualize and make meaning out of ambiguity matters most at this stage. Therefore, fluid intelligence is a proven significant predictor of learning (Primi, Ferrão & Almeida, 2010xx).

Table 1: Summary of Literature Review of Fluid Intelligence and Job Performance

Research Study | Major Findings
Postlethwaite (2011) | Fluid intelligence is a significant predictor of performance in high-complexity occupations.
Cattell (1971) | Fluid intelligence is significantly related to metacognition and to high reasoning and problem-solving ability.
Duncan, Burgess, and Emslie (1995); Zook et al. (2004) | Fluid intelligence may be the best measure of executive functioning.
Ben-Shakhar & Sheffer (2001); König & Mürling (2005) | Fluid intelligence significantly predicts an individual’s ability to multitask.
Ackerman, Beier, & Boyle (2005); Kane & Engle (2002) | Fluid intelligence and working memory are significantly positively correlated with each other.
Kvist & Gustafsson (2008); Primi, Ferrão & Almeida (2010) | Fluid intelligence is a significant determinant of learning.

Test Development & Standardization - MTAR

The development and standardization studies were conducted between April and September 2019.

Item Banking
The MTAR was developed using an item banking approach to generate multiple equivalent forms and support item randomization. The term ‘item bank’ describes a group of items which are organized, classified and catalogued systematically. According to research conducted by Nakamura (2001)xxi, Item Response Theory (IRT) facilitates item bank standardization by calibrating and positioning all items in the bank on the same latent continuum by means of a common metric. This method can further be used to add items to the bank, increasing its strength. IRT also allows construction of multiple equivalent tests as per a predefined test blueprint.

Our item bank was developed as per a test composition plan based on two parameters: representation of all types of item content, and inclusion of easy, medium and difficult items.xxii In an abstract reasoning test, individual items are designed as per certain rules, such as shape, size, addition or subtraction of elements, and movement. Test composition is defined by a specified number or percentage of items from the various content domains/rules, as well as equal numbers of easy, medium and difficult items, and is used to develop a uniform content outline, which is crucial to confirm the construct validity of the test. An item bank contains more questions than are needed for any one candidate; this enables random generation of items within certain parameters, ensuring each test is no more or less difficult than the last (see the sketch below). Although item characteristics can be estimated with both Classical Test Theory (CTT) and IRT models, the psychometric literature indicates the IRT method is more suitable for an item-banked test (Embretson & Reise, 2013xxiii; Van der Linden, 2018xxiv). Classical item and test statistics based on the CTT model vary depending on sample characteristics, whereas an IRT model provides ‘sample-free’ indices of item and test statistics. Therefore, we use item response theory to standardize our item banks.
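To make the blueprint-driven assembly concrete, the following is a minimal Python sketch, not Mettl’s actual platform code: the bank layout and field names are assumptions, and the 9/9/5 blueprint simply matches the 23-item forms described in the norming appendices.

```python
import random

# Assumed bank layout: 67 calibrated items, each tagged with a difficulty band.
ITEM_BANK = [{"id": i, "difficulty": ["easy", "medium", "difficult"][i % 3]}
             for i in range(67)]

def assemble_form(bank, blueprint):
    """Randomly draw the blueprint's quota of items from each difficulty band,
    so every generated form has the same difficulty composition."""
    form = []
    for band, count in blueprint.items():
        pool = [item for item in bank if item["difficulty"] == band]
        form.extend(random.sample(pool, count))
    random.shuffle(form)  # present items in random order
    return form

# One equivalent 23-item form: 9 easy, 9 medium, 5 difficult.
form = assemble_form(ITEM_BANK, {"easy": 9, "medium": 9, "difficult": 5})
```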

The advantages of the item banking methodology are as follows:


• All items in the bank are calibrated/validated in terms of psychometric properties with the help of item response theory.
• Item banking enables us to generate equivalent but different tests which can be randomly assigned to respondents.
• Item banks randomise questions, which helps to prevent cheating and piracy of items.
• New items can be added to the bank continuously, and overexposed items can be retired when they reach a specified exposure level.
• Only fair and non-discriminatory items are included in the item bank, which reduces adverse impact for different groups and produces fair assessments for all candidates.

Item Bank Development


The development of items typically goes through the following stages:

1. Item construction
2. Item review by a panel of experts
3. Pilot testing of items
4. Review of item properties based on pilot data
5. Test administration on representative sample
6. Analysis of item properties and test properties
7. Item finalization and development of item bank
Item Writing
The MTAR consists of matrices with black and white geometrical figures. Candidates are given a three-by-three matrix in which eight cells contain geometric patterns and the ninth is left blank. Candidates must find the logical rules that govern how the sequence progresses horizontally or vertically, and from these identify the shape that should fill the blank space. Little or no use of language or pre-existing knowledge is required to complete the questions. The development of this test was done in four broad stages: item creation, multiple rounds of item review, pilot testing and standardization. In the first stage, a large pool of 170 items was developed by subject matter experts and psychometricians. Detailed in-depth interviews were conducted with the SMEs to explore which item images and item logic should be used.

The following general rules were followed when designing the items:
a. Images/ shapes used should be neutral and not include any culturally specific elements.
b. Images/ shapes should be clear to comprehend, and unambiguous e.g. no blurred lines.
c. There should be a balanced mix of easy, medium and difficult items.
d. There should be a balanced mix of items incorporating different numbers of logical rules.
Item Review
Item reviews were conducted by our in-house psychometricians, who have over 10 years of research experience. Items and answer keys were both reviewed in depth, as were the difficulty level and logic of each item. The items were also analysed for cultural neutrality, so that no ethnic or cultural group would be advantaged or disadvantaged by culturally specific images. All items that did not meet these strict standards were removed. After multiple rounds of review, 90 of the original 170 items were finalized for the next step.

Item Bank Calibration


Stage 1: Item trial for item difficulty estimation

Procedure: In the first stage we conducted a pilot study, and individual item parameters were estimated using a Rasch model. The objective of the pilot study was to ascertain basic item properties, especially the difficulty of all 90 items. The 90 items were divided into three equivalent sets, and data was collected through online administration of all three sets. All items were mandatory, and participants were not allowed to skip an item without responding. Only respondents with at least a 90% completion rate were included in the final data set, resulting in 233, 234 and 243 responses for the three sets respectively.

Sample Details: In the first stage, data was collected from 710 respondents: 45.5% male, 43.9% female, 1.7% choosing ‘other’ as their gender and 8.9% preferring not to disclose. English was the native language of 32% of respondents, and the mean age of the sample was 31 years. A detailed description of the sample is reported in Appendix 1.

Analysis: A Rasch model was used to ascertain item properties at stage 1 because of the smaller sample size; the model provides stable estimates even with fewer than 30 responses per item. The Rasch model is the one-parameter model of Item Response Theory, which estimates the probability of a correct response to a given test item based on two variables: the difficulty of the item and the ability of the candidate. The primary function of this model is to provide information on item difficulty, which helps to organize the test items according to difficulty level, spread of item difficulty and test length, ultimately increasing measurement accuracy and test validity. Based on the findings of the Rasch model, items exhibiting extreme b parameters (values substantially less than -3 or greater than +3) were rejected at this stage; 21 items from the initial pool of 90 were removed.
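For reference, the Rasch model referred to above is the standard one-parameter logistic model, which expresses the probability that candidate $j$ answers item $i$ correctly as:

$$P(X_{ij} = 1 \mid \theta_j, b_i) = \frac{e^{\theta_j - b_i}}{1 + e^{\theta_j - b_i}}$$

where $\theta_j$ is the candidate’s ability and $b_i$ the item’s difficulty, both expressed on the same logit scale. The screening rule above rejects items whose estimated $b_i$ falls outside the [-3, +3] band of this scale.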

Stage 2: Item bank calibration and estimation of psychometric properties of test



Procedure: A total of 69 items survived the pilot stage. These were ordered by difficulty parameter and divided into three sets of 23 items each for the final stage of data collection. The objective of the second stage was to standardize the item bank and ascertain the essential psychometric properties (reliability and validity) of the test. All items were mandatory at this stage, and participants were not allowed to skip an item without responding. Only respondents with at least a 90% completion rate were included in the final data set, resulting in 486, 365 and 367 responses for the three sets respectively.

Sample: In the second stage, data was collected from 1218 respondents: 52.6% male, 44.6% female, and 2.8% identifying their gender as ‘other’. English was the native language of 28% of respondents, and the mean age of the sample was 31.9 years. A detailed description of the sample is reported in Appendix 2.

Figure 2: Sample Item Characteristic Curve

Analysis: In the second stage of analysis we used a two-parameter IRT model, in which the probability of a correct response is a function of both item difficulty and the respondent’s proficiency. The two-parameter model provides meaningful estimates of item difficulty and item discrimination. For the finalization of items in the item bank, the following procedure was followed:

• Items displaying a b parameter (item difficulty) less than -3 or greater than +3 were removed from the data set.
• Items displaying an a parameter (item discrimination) less than .2 were also removed at this stage.

Two of the 69 items were removed, meaning the final item bank consists of 67 items with a balanced spread of easy, medium and difficult items.
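The two retention rules can be expressed as a simple screening filter over the calibrated parameters. The sketch below is illustrative only: the function names and the (a, b) values are hypothetical, not the MTAR’s calibrated estimates. It also shows the standard two-parameter logistic response function on which the model is based.

```python
import numpy as np

def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic IRT model: probability of a correct response
    given ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def screen_items(params, b_limit=3.0, a_min=0.2):
    """Retention rules described above: drop items with extreme difficulty
    (|b| > 3) or weak discrimination (a < .2)."""
    return [(a, b) for a, b in params if abs(b) <= b_limit and a >= a_min]

# Hypothetical calibrated (discrimination, difficulty) pairs:
params = [(0.9, -0.4), (0.15, 1.2), (1.3, 3.6), (0.7, 0.8)]
retained = screen_items(params)  # keeps (0.9, -0.4) and (0.7, 0.8)
```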

Psychometric Properties of MTAR


Internal Consistency Reliability
A commonly used indicator of internal consistency reliability is Cronbach’s alpha, an index obtained by examining the homogeneity of the items/questions within an assessment; its value ranges from 0 to 1. As per the APA Standards, there are three broad categories of reliability coefficients: alternate-form coefficients, test-retest coefficients and internal-consistency coefficients. In the present study, we computed Cronbach’s alpha coefficients, which are based on the relationships among scores on the individual items within the MTAR, using data accrued from a single test administration. As per the APA Standards, “A higher degree of reliability is required for score uses that have more significant consequences for test takers”. The EFPA BOA test review model also provides guidance on Cronbach’s alpha values; according to it, under some conditions a reliability of 0.70 is considered good. For the 3 sets of abstract reasoning tests generated, the median reliability (internal consistency) was 0.70 and the interquartile range was 0.67 to 0.72. The SEM across all three sets ranged from .08 to .09.
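For readers who wish to reproduce these indices, the computations are standard. The sketch below assumes a respondents-by-items matrix of dichotomous (0/1) item scores and is illustrative rather than Mettl’s production code.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) 0/1 score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def sem(scores: np.ndarray) -> float:
    """Standard error of measurement: SD of total scores * sqrt(1 - alpha).
    The manual does not state the score scale of its reported SEM values;
    this returns SEM on the raw total-score scale."""
    sd_total = scores.sum(axis=1).std(ddof=1)
    return sd_total * np.sqrt(1 - cronbach_alpha(scores))
```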

Validity
Validity is the most fundamental property of any psychological test. It involves accumulating relevant scientific evidence to support test score interpretation. The APA Standardsxxv identify four major sources of evidence to consider when evaluating the validity of a test: evidence based on test content, evidence based on response processes, evidence based on internal structure, and evidence based on relationships with other variables, especially criterion variables. In order to ascertain the validity of the MTAR we collected evidence based on internal structure (construct validity) and evidence based on relationships with other variables, especially criterion variables (criterion-related validity).

Construct Validity
The purpose of the construct validation is to ascertain whether the test measures the proposed construct or
something else. The most common method of ascertaining the construct validity of an assessment is
exploratory and confirmatory factor analysis. We used the CFA method because our objective is to test a
predefined unidimensional measurement model. One of the most important assumptions of using an IRT
model as a measurement system is that it includes unidimensional items from the item bank. Therefore, in
order to establish construct validity evidence confirmatory factor analyses was used. The CFA results
confirmed the unidimensional factor structure with fit statistics that were satisfactory. As per the CFA model
the fit indices were as per the norms (IFI = .927; RMSEA = .02; CFI = .919 and TLI = .903).
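In standard notation, the unidimensional model tested here is a single-factor measurement model,

$$x_i = \lambda_i \eta + \varepsilon_i, \qquad i = 1, \dots, k,$$

where each item score $x_i$ loads on the single latent factor $\eta$ (fluid intelligence) with loading $\lambda_i$ and uniqueness $\varepsilon_i$. The reported indices clear the benchmarks conventionally applied to such models (comparative fit indices such as CFI, TLI and IFI above roughly .90, and RMSEA below roughly .08).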

Criterion Validity
Criterion-related validity evidence indicates the extent to which assessment outcomes are predictive of
employee performance in a specified job or role. In order to establish the criterion-related validity, there are
two major methods used:
1. Concurrent Validity: In this method, data on the criterion measures are obtained at the same time
as the psychometric test scores. This indicates the extent to which the psychometric test scores
accurately estimate an individual’s present job performance.
2. Predictive Validity: In this method, data on criterion measures are obtained after the test. This indicates the extent to which the psychometric test scores accurately predict a candidate’s future performance. Tests are administered to candidates when they apply for the job, and their performance is reviewed after six months or a year; scores on the two measures are then correlated to estimate the criterion validity of the psychometric test.

In order to ascertain the MTAR’s validity, concurrent criterion-related validity evidence was gathered: performance data and MTAR scores were collected at the same time, the relationship between the two variables was tested, and significant relationships were found. It is important to note that in criterion-related validity analysis, the precision and relevance of the criterion (employee performance) data are extremely important. Error in measurement of the criterion is a threat to accurate assessment of the test’s validity: it may attenuate the relationship between test scores and criterion variables, and thus lead to an erroneous criterion-related validity estimate. A good criterion measure should therefore:
• Have a clear and objective definition and calculation of performance levels.
• Align with the key demands of the role.
• Have crucial implications for business outcomes.
• Produce reasonable variance to effectively separate the various performance levels.

Study Procedure: In the present study MTAR scores were used as the predictor variable, and respondents’ competency scores based on line managers’ ratings were used as the criterion variable. Data was collected from a multinational company specializing in HR consulting. A sample of 150 employees from this organization was invited to participate in the study, and the purpose of the assessments was explained to them in detail. After collecting the employees’ responses on the MTAR, a detailed competency-based performance rating form was completed by their respective line managers. In the form, all competencies were defined, and raters were asked to rate each competency on a 10-point scale (1 = low, 10 = high). The Pearson product-moment correlation was used to test the relationship between MTAR scores and competency ratings.

Sample: A total of 114 employees participated in the study and completed the MTAR; managerial competency ratings were received for 88 of these respondents. The mean age of the sample was 35 years; 57% of respondents were male and 43% female. 73% of the respondents worked as analysts and consultants, and the remaining 27% were leaders and product owners.

Analysis: Pearson correlations were computed between MTAR scores and line-manager competency ratings. Results indicate significant positive correlations: MTAR scores correlate with analytical ability (r = .325, p < .01), critical thinking (r = .28, p < .01), innovation (r = .309, p < .05) and high potential (r = .244, p < .05). These correlation coefficients are not corrected for attenuation or range restriction. MTAR scores are also positively correlated with learning orientation, employability and ability with numbers (refer to Appendix 3, Table 1).
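The attenuation caveat can be made concrete with the classical correction for attenuation, sketched below. The criterion reliability of .60 used in the example is purely an assumed value for demonstration, not a figure from this study.

```python
import math

def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation:
    estimated true-score correlation = r_observed / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Observed r = .325 (MTAR vs. analytical ability); MTAR reliability ~ .70
# (median alpha reported above); criterion reliability of .60 is assumed
# here only to illustrate the size of the correction.
print(round(disattenuate(0.325, 0.70, 0.60), 3))  # ~0.501
```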

Group Differences: Adverse Impact Analysis

Definition of Adverse Impact (UGESP, 1978)


The Uniform Guidelines on Employee Selection Procedures (UGESP, 1978xxvi) defines Adverse Impact as “a
substantially different rate of selection in hiring, promotion, or other employment decisions which works to
the disadvantage of members of a race, sex or ethnic group” (see section 1607.16). UGESP recommends the
four-fifths rule for examining the potential of Adverse Impact, stating that the “selection rate for any race, sex
or ethnic group which is less than four-fifths (4/5) (or 80%) of the rate for the group with the highest rate
will generally be regarded by the Federal enforcement agencies as evidence of adverse impact.” (1978, see
section 1607.4 D). Courts have also applied this rule to cases involving age discrimination. The Age
Discrimination in Employment Act (ADEA) of 1967 prohibited discrimination in selection contexts against
individuals 40 years of age or older. In addition, the UK’s Equality Act (2010) legally protects people from
discrimination in the workplace and in wider society. Researchers have proposed alternative methods for
examining Adverse Impact (e.g., moderated multiple regression, one-person rule, and the N of 1 rule),
although none have been as widely adopted as the four-fifths rule. Additionally, a statistical significance test
for mean group differences on individual assessment scales is often considered informative.
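As a sketch of how the four-fifths rule is applied in practice (the selection counts below are hypothetical, chosen only for illustration):

```python
def four_fifths_ratio(selected_focal: int, applied_focal: int,
                      selected_ref: int, applied_ref: int) -> float:
    """Adverse-impact ratio: focal-group selection rate divided by the
    reference (highest-rate) group's selection rate. Values below 0.80
    are treated as evidence of potential adverse impact under UGESP."""
    return (selected_focal / applied_focal) / (selected_ref / applied_ref)

# Hypothetical example: 30 of 100 focal-group applicants selected vs.
# 45 of 100 reference-group applicants.
ratio = four_fifths_ratio(30, 100, 45, 100)   # 0.30 / 0.45 = 0.667
flagged = ratio < 0.80                        # True: potential adverse impact
```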

In the present study, group differences on the MTAR based on age, gender and ethnicity were examined; the results are reported in Tables 1-3 (refer to Appendix 4). Table 1 presents the comparison of mean scores between genders. Results suggest a significant difference in mean score between male and female respondents; however, based on traditional ranges for interpreting effect sizes (Cohen’s d; Cohen, 1988), the difference (d = 0.29) is small. Table 2 presents mean scores for two age groups: those under 40 years of age and those 40 and above. These differences are statistically significant, but an examination of effect sizes indicates the difference is small. We also examined mean differences in MTAR scores between White (reference group) and non-White (focal group) respondents; these differences were not statistically significant.
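The effect sizes in Appendix 4 are consistent with the standard pooled-standard-deviation form of Cohen’s d:

$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_p}, \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

As a check, substituting the male/female values from Appendix 4, Table 1 (means 12.78 and 11.72, SDs 3.68 and 3.59, n = 641 and 543) gives d ≈ 0.29, matching the reported effect size.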

Additionally, in order to test the impact of English language skills on MTAR scores, we examined the mean difference in MTAR scores between native English speakers and non-native speakers. These differences were statistically significant, but the effect size was small. This finding suggests that the MTAR carries little language bias and can serve as a global, culture-agnostic tool (refer to Appendix 4, Table 4).

Administration, Scoring and Interpretation

Test Administration
The MTAR is an online test administered through an internet-based testing system designed by Mettl for the administration, scoring, and reporting of occupational tests. Test takers are sent a test link to complete the test, and candidate data is instantly captured for processing through the online system. Test scores and interpretive reports are generated instantly. Tests can also be administered remotely; importantly, all candidate data, question banks, reports and benchmarks are stored in a well-encrypted, highly regarded cloud service. To prevent cheating and other malpractice, Mettl’s platform also offers AI-powered anti-cheating solutions that include live monitoring, candidate authenticity checks, and secure browsing.

Scoring
Responses to the MTAR are scored by the number of correct answers a respondent chooses. Each item has 5 answer options, of which only one is correct. Each item answered correctly is awarded 1 mark; items answered incorrectly or not attempted are given 0 (zero) marks. An individual’s overall raw score is the average of these item scores. Raw scores are then converted into sten scores, bringing them onto a 10-point scale, using the formula:

Sten = (Z * 2) + 5.5, where Z = (X - M) / SD
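A minimal sketch of this conversion follows. Rounding to whole stens and clipping to the conventional 1-10 band are assumptions the manual does not spell out, and in operational scoring M and SD would come from the relevant norm group; the norm values shown are illustrative only.

```python
import numpy as np

def to_sten(raw: np.ndarray, norm_mean: float, norm_sd: float) -> np.ndarray:
    """Apply the manual's formula: sten = (z * 2) + 5.5, z = (X - M) / SD.
    Rounding and clipping to 1-10 are conventional extras, assumed here."""
    z = (raw - norm_mean) / norm_sd
    return np.clip(np.rint(z * 2 + 5.5), 1, 10)

# Illustrative norm values only (not published MTAR norms):
stens = to_sten(np.array([8, 12, 16, 20]), norm_mean=12.3, norm_sd=3.6)
# -> [3., 5., 8., 10.]
```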
Test Composition
Each test taker is asked to complete 23 items in 30 minutes. A sample item and a sample report are provided in Appendix 5.

Interpretation
The MTAR measures the abstract reasoning ability of test takers working in a variety of individual contributor or managerial roles, and is suitable for use in both recruitment and development settings. Abstract reasoning is defined as the ability to think laterally, examine problems in unique and unusual ways, and make fresh connections between different concepts. A high score on the MTAR indicates that the test taker has a higher ability to solve complex problems by identifying patterns and their underlying rules; it also indicates a greater ability to solve problems effectively and perform well in novel situations.

Summary Remarks and Recommendations for Use

Mettl recommends that the MTAR be used with the following caveats and tips in mind:
• Use with other tests: The MTAR, like any other hiring tool, is best used as part of a systematic selection process, along with other scientifically developed and job-relevant predictors of future success. Ideally, the MTAR should be administered to job applicants who possess the minimum requirements for the job. The assessment results can serve as an important part of the hiring decision, but not the only one.
• Aggregate results: The MTAR, when used with large numbers of job applicants in the recommended ways, will yield a better-quality workforce over time. However, as with any assessment of human abilities, it is not infallible and should be used in conjunction with other information and followed up with behavioural tools such as structured interviews and competency assessments.
• Simple to complex: If the primary focus is to screen out candidates unlikely to succeed, hiring managers should first eliminate those “not recommended for hire” from the pool. Of those remaining, those “recommended for hire” should be prioritized, followed by those “cautiously recommended for hire”.

Appendices
Appendix 1: Demographic details of Pilot study (N = 710)

Table 1: Gender
Gender Frequency Percent
Male 323 45.5
Female 312 43.9
Others 12 1.7
Prefer not to say 63 8.9

Table 2: Age
Age Frequency Percent
20 - 30 years 399 56.2
31-40 years 196 27.6
41-50 years 74 10.4
51-60 years 41 5.8

Table 3: Years of work experience


Years of work experience Frequency Percent

0-5 years 415 58.5


6-10 years 123 17.3
11-15 years 62 8.7
16-20 years 54 7.6
20+ years 56 7.9

Table 4: Educational Qualification


Educational Qualification Frequency Percent
Non-Graduate 112 15.8
Bachelors 337 47.5
Masters 219 30.8
Doctorate 42 5.9

Table 5: Employment Status


Employment Status Frequency Percent

Student 164 23.1


Seeking Employment 207 29.2
Working 339 47.7

Table 6: Job Level


Job Level Frequency Percent
Level 1: Executive Officers: Senior-most Leaders (CEO + One Level Below) 109 15.4

Level 2: Senior Managers/Directors: Senior Management (Three Levels Below CEO). 73 10.3

Level 3: Managers/Supervisors: Middle management to first-level managers (Five Levels 135 19.0
Below CEO)
Level 4: Entry Level: Non-management/ individual contributor (including entry level) 193 27.2

Not Applicable 200 28.2

Table 7: Industry Details


Industry Frequency Percent

Consulting 122 17.2


Education 81 11.4
Financial services, Banking, Insurance 99 13.9
Government, Public service, Defence 38 5.4
Health Care 29 4.1
Human Resources 6 0.8
Information Technology & Telecommunications 60 8.5
Manufacturing & Production 43 6.1
Not Applicable 95 13.4
Others 97 13.7
Professional services 22 3.1
Publishing, Printing 5 0.7
Trading 13 1.8

Table 8: Nature of Occupation


Nature of Occupation Frequency Percent

Architecture and Engineering 98 13.8


Arts, Design, Entertainment, Sports, and Media 60 8.5
Building and Grounds Cleaning and Maintenance 19 2.7
Business and Financial Operations 96 13.5
Community and Social Service 7 1.0
Computer and Mathematical 36 5.1
Construction and Extraction 3 0.4
Education, Training, and Library 25 3.5
Farming, Fishing, and Forestry 2 0.3
Food Preparation and Serving Related 5 0.7
Healthcare Practitioners and Technical 9 1.3
Healthcare Support 5 0.7
Installation, Maintenance, and Repair 4 0.6
Legal 8 1.1
Life, Physical, and Social Science 7 1.0
Management 42 5.9
Military Specific 7 1.0
Not Applicable 107 15.1
Office and Administrative Support 16 2.3
Others 91 12.8
Personal Care and Service 3 0.4
Production 13 1.8
Protective Service 3 0.4
Sales and Related 27 3.8
Transportation and Material Moving 17 2.4

Table 9: Nationality
Nationality Frequency Percent

Africa 72 10.1
Asia 231 32.5
Australia & NZ 29 4.1
Europe 162 22.8
LATAM 26 3.7
UK 104 14.6
US & Canada 54 7.6
Not disclosed 32 4.5

Table 10: Ethnicity


Ethnicity Frequency Percent
Asian 226 31.8
Black 96 13.5
Chinese 30 4.2
Prefer not to say 110 15.5
White 248 34.9

Appendix 2: Demographic Details of Standardization Study (N = 1218)

Table 1: Gender
Gender Frequency Percent

Male 641 52.6


Female 543 44.6
Others 34 2.8

Table 2: Age
Age Frequency Percent

20-30 years 709 58.2


31 to 40 years 294 24.1
41 to 50 years 125 10.3
51 to 60 years 90 7.4

Table 3: Work Experience


Work experience Frequency Percent

0-5 years 718 58.9


6-10 years 194 15.9
11-15 years 122 10.0
16-20 years 83 6.8
20+ years 101 8.3

Table 4: Educational Qualifications


Educational Qualifications Frequency Percent

Non-Graduate 168 13.8


Bachelors 624 51.2
Masters 384 31.5
Doctorate 42 3.4

Table 5: Employment Status


Employment Status Frequency Percent

Seeking Employment 340 27.9


Student 279 22.9
Working 599 49.2

Table 6: Job Level


Job Level Frequency Percent

Level 1: Executive Officers: Senior-most Leaders (CEO + One Level Below) 151 12.4

Level 2: Senior Managers/Directors: Senior Management (Three Levels Below CEO). 137 11.2

Level 3: Managers/Supervisors: Middle management to first-level managers (Five Levels 230 18.9
Below CEO).
Level 4: Entry Level: Non-management/ individual contributor (including entry level). 382 31.4

Not Applicable 318 26.1

Table 7: Industry
Industry Frequency Percent

Consulting 197 16.2


Education 98 8.0
Financial services, Banking, Insurance 166 13.6
Government, Public service, Defence 43 3.5
Health Care 41 3.4
Human Resources 27 2.2
Information Technology & Telecommunications 147 12.1
Manufacturing & Production 66 5.4
Not Applicable 153 12.6
Others 120 9.9
Professional services 28 2.3
Publishing, Printing 5 0.4
Trading 13 1.1

Table 8: Nature of Occupation


Nature of Occupation Frequency Percent

Architecture and Engineering 167 13.7


Arts, Design, Entertainment, Sports, and Media 70 5.7
Building and Grounds Cleaning and Maintenance 23 1.9
Business and Financial Operations 187 15.4
Community and Social Service 14 1.1
Computer and Mathematical 90 7.4
Construction and Extraction 11 0.9
Education, Training, and Library 27 2.2
Farming, Fishing, and Forestry 6 0.5
Food Preparation and Serving Related 6 0.5
Healthcare Practitioners and Technical 14 1.1
Healthcare Support 6 0.5
Installation, Maintenance, and Repair 6 0.5
Legal 14 1.1
Life, Physical, and Social Science 13 1.1
Management 172 14.1
Military Specific 5 0.4
Not Applicable 145 11.9
Office and Administrative Support 18 1.5
Others 129 10.6
Personal Care and Service 5 0.4
Production 25 2.1
Protective Service 4 0.3
Sales and Related 40 3.3
Transportation and Material Moving 21 1.7

Table 9: Nationality
Nationality Frequency Percent

Africa 136 11.2


Asia 399 32.8
Australia & NZ 52 4.3
Europe 262 21.5
LATAM 56 4.6
UK 148 12.2
USA & Canada 95 7.8
Not disclosed 70 5.7

Table 10: Ethnicity


Ethnicity Frequency Percent

White 374 30.7


Black 122 10.0
Asian 357 29.3
Chinese 32 2.6
Prefer not to say 333 27.3

Appendix 3: Criterion Validity Results

Table 1: Correlation Analysis (N = 88)


Competencies Correlation

Change Agility -0.002
Verbal Comprehension 0.109
High Potential 0.244*
Learning Orientation 0.206
Employability 0.208
Critical Thinking 0.280**
Collaboration 0.030
Innovation 0.309*
Organizational Citizenship 0.239*
Ability with Numbers 0.270*
Communication and Influence 0.049
Result Focus 0.150
Analytical Ability 0.325**
Client Service 0.032
*significant at .05 level; **significant at .01 level

Appendix 4: Adverse Impact Analysis


Table 1: Mean differences - Gender
Gender N Mean SD F value Effect size
Male 641 12.78 3.68
Female 543 11.72 3.59 12.46** 0.29
Others 34 12.02 3.05
**significant at .01 level

Table 2: Mean differences – Age group


Age N Mean SD t value Effect size
Less than 40 years 1003 12.46 3.67 3.53* 0.27
More than 40 years 215 11.49 3.50
*significant at .05 level

Table 3: Mean differences – Ethnicity


Ethnicity N Mean SD t value Effect size
White 374 12.45 3.56 0.30 0.06
Non-white 844 12.22 3.71

Table 4: Mean differences – Native Language


Native Language N Mean SD t value Effect size
English 404 11.91 3.78 2.5** -0.15
Others 814 12.47 3.58
**significant at .01 level

Appendix 5: Sample Item and Sample Report



Sample Report

Appendix 6: Demographic details for the norming sample - Global (2021)

The norms for Abstract Reasoning (Standardized English) for the Global region have been developed on a representative sample of 10785 respondents. These norms are based on responses from candidates who attempted 23 randomly drawn questions (9 easy, 9 medium and 5 difficult). The average time taken to complete this assessment was 24.42 minutes.

Table 1: Gender distribution of the Norming sample (n=10785)


Gender % of Sample

Female 18.28

Male 39.24

Did not Specify 42.48

Table 2: Age group distribution of the Norming sample (n=10785)


Age (in years) % of Sample
Up to 20 Years 9.75

20-30 Years 23.90

30-40 Years 15.71

40-50 Years 10.50

50-60 Years 2.65

Above 60 Years 0.46

Not Specified 37.03




Table 3: Region based distribution of the Norming Sample (n=10785)


Region % of Sample
APAC 51.28

Europe 2.04

Middle East 1.91

Africa 0.83

USA and Canada 0.57

UK 0.05

LATAM 0.01

Not Specified 43.28

Table 4: Industry based distribution of the Norming sample (n=10785)


Industry % of Sample

Information Technology and Services 22.38

Education Management 12.68

Higher Education 11.63

Telecommunications 9.53

Management Consulting 6.74

Railroad Manufacture 4.21

Professional Training & Coaching 2.93

Banking 2.30

Consumer Services 2.29

Automotive 2.16

Computer Software 1.91

Consumer Goods 1.91

Human Resources 1.59



Industry % of Sample

Market Research 1.42

Financial Services 1.32

Insurance 1.10

Others 13.89

Appendix 7: Demographic details for the norming sample - India (2021)

The new norms for Abstract Reasoning (Standardized English) for the India region have been developed on 4886 respondents. These norms are based on responses from candidates who attempted 23 randomly drawn questions (9 easy, 9 medium and 5 difficult). The average time taken to complete this assessment was 23.59 minutes.

Table 1: Gender distribution of the Norming sample (n=4886)


Gender % of Sample

Female 28.98

Male 65.82

Did not Specify 5.20

Table 2: Age based distribution of the norming Sample (n=4886)


Age (In years) % of Sample

Up to 20 Years 21.43

20-30 Years 39.23

30-40 Years 18.60

40-50 Years 14.24

50-60 Years 3.99

Above 60 Years 0.49

Not Specified 2.01



Table 3: Industry based distribution of the norming Sample (n=4886)


Industry % of Sample

Information Technology and Services 35.39

Higher Education 25.40

Consumer Services 4.93

Human Resources 3.48

Management Consulting 3.05

Market Research 2.97

Automotive 2.74

Insurance 2.25

Consumer Goods 2.21

Computer Software 2.09

Financial Services 2.07

Computer Hardware 1.74

Others 10.50

Appendix 8: Demographic details for the norming sample - Simplified Mandarin (2021)

The new norms for Abstract Reasoning (Simplified Mandarin) for the China region have been developed on 1471 respondents. These norms are based on responses from candidates who attempted 23 randomly drawn questions (9 easy, 9 medium and 5 difficult). The average time taken to complete this assessment was 26.60 minutes.

Table 1: Age based distribution of the norming Sample (n=1471)


Age (In years) % of Sample

20-29 Years 13.93

40-49 Years 13.66

50-59 Years 4.01

Above 60 0.13

Not Specified 68.25

Table 2: Industry based distribution of the norming Sample (n=1471)


Industry % of Sample

Automotive 80.28

Farming 4.42

Investment Management 3.67

Banking 3.40

Electrical/Electronic Manufacturing 3.33

Oil & Energy 2.85

Consumer Goods 2.04



Appendix 9: Demographic details for the norming sample - Portuguese (2021)

The new norms for Abstract Reasoning (Portuguese) for the LATAM region have been developed on 1776 respondents. These norms are based on responses from candidates who attempted 23 randomly drawn questions (9 easy, 9 medium and 5 difficult). The average time taken to complete this assessment was 25.61 minutes.

Table 1: Gender distribution of the norming Sample (n=1776)


Gender % of Sample

Female 31.25

Male 48.59

Not Specified 20.16

Table 2: Age based distribution of the norming Sample (n=1776)


Age (In years) % of Sample

Below 20 0.11

20-29 5.91

30-39 26.63

40-49 25.11

50-59 7.55

Above 60 1.30

Not Specified 33.39



Table 3: Industry based distribution of the norming Sample (n=1776)


Industry % of Sample

Information Services 41.10

Higher Education 10.81

Financial Services 9.57

Chemicals 6.87

Farming 6.76

Telecommunications 5.18

Facilities Services 3.77

Pharmaceuticals 3.43

Oil & Energy 2.36

Others 4.22

Not Specified 5.91



Appendix 10: Demographic details for the norming sample - Spanish (2021)

The new norms for Abstract Reasoning (Standardized Spanish) for the LATAM region have been developed on 5029 respondents. These norms are based on responses from candidates who attempted 23 randomly drawn questions (9 easy, 9 medium and 5 difficult). The average time taken to complete this assessment was 26.25 minutes.

Table 1: Gender distribution of the norming Sample (n=5029)


Gender % of Sample
Female 34.80

Male 45.89

Did not Specify 19.31

Table 2: Age based distribution of the norming Sample (n=5029)


Age (In years) % of Sample

20-30 Years 15.99

30-40 Years 24.88

40-50 Years 11.43

50-60 Years 3.18

Above 60 Years 0.28

Not Specified 44.24



Table 3: Country based distribution of the norming Sample (n=5029)


Country % of Sample

Mexico 21.14

Colombia 7.14

Ecuador 5.65

Costa Rica 5.63

Peru 4.39

Dominican Republic 2.64

Bolivia 2.49

Argentina 2.09

Chile 1.33

Others 1.85

Not Specified 45.66



Table 4: Industry based distribution of the norming Sample (n=5029)


Industry % of Sample

Food & Beverages 36.59

Information Technology & Services 13.70

Management Consulting 11.47

Financial Services 9.25

Information Services 4.55

Machinery 3.10

Telecommunications 2.35

Human Resources 1.47

Banking 1.47

Biotechnology 1.41

Food Production 1.35

Oil & Energy 1.23

Investment Banking 1.13

Others 2.25

Not Specified 8.67



Appendix 11: Demographic details for the norming sample - Turkish (2021)

The new norms for Abstract Reasoning (Turkish) for the Turkey region have been developed on 4471 respondents. These norms are based on responses from candidates who attempted 23 randomly drawn questions (9 easy, 9 medium and 5 difficult). The average time taken to complete this assessment was 26.43 minutes.

Table 1: Age based distribution of the norming Sample (n=4471)


Age (In years) % of Sample

20-29 30.93

30-39 27.24

40-49 8.03

50-59 1.45

Above 60 0.58

Not Specified 31.76

Table 2: Industry based distribution of the norming Sample (n=4471)


Industry % of Sample

Information Technology and Services 62.45

Banking 30.57

Food & Beverages 3.13

Financial Services 2.15

Commercial Real Estate 1.70



References

i Kanazawa, S. (2004). General intelligence as a domain-specific adaptation. Psychological review,


111(2), 512.
ii Sternberg, R. J. (1997). The concept of intelligence and its role in lifelong learning and success.

American psychologist, 52(10), 1030.


iii Weinberg, R. A. (1989). Intelligence and IQ: Landmark issues and great debates. American

psychologist, 44(2), 98.


iv Horn, J. L., & Cattell, R. B. (1967). Age differences in fluid and crystallized intelligence. Acta

psychologica, 26, 107-129.


v Cattell, R. B. (1987). Intelligence: Its structure, growth and action (Vol. 35). Elsevier.

vi Kuncel, N. R., & Hezlett, S. A. (2007). Standardized tests predict graduate students’ success. Science,

315, 1080-1081.
vii Salgado, J. F., Anderson, N., Moscoso, S., Bertua, C., & de Fruyt, F. (2003). International validity

generalization of GMA and cognitive abilities: A European Community meta-analysis. Personnel


Psychology, 56, 573-605
viii Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel

psychology: Practical and theoretical implications of 85 years of research findings. Psychological


Bulletin, 124, 262–274.
ix Postlethwaite, B. E. (2011). Fluid ability, crystallized ability, and performance across multiple

domains: a meta-analysis.
x Duncan, J., Burgess, P. W., & Emslie, H. (1995). Fluid intelligence after frontal lobe lesions.

Neuropsychologia, 33, 261–268.


xi Zook, N. A., Davalos, D. B., DeLosh, E. L., & Davis, H. P. (2004). Working memory, inhibition, and fluid

intelligence as predictors of performance on Tower of Hanoi and London tasks. Brain and cognition,
56(3), 286-292.
xii Kuncel, N. R., Hezlett, S. A., & Ones, D. S. (2004). Academic performance, career potential, creativity,

and job performance: Can one construct predict them all?. Journal of personality and social
psychology, 86(1), 148.
xiii Baker, T. G. (1996). Essence of intelligence. Practice Network. Society for Industrial &

Organizational Psychology (SIOP). Retrieved August 8, 2009, from


http://www.siop.org/tip/backissues/tipjul96/BAKER.aspx.
xiv Ben-Shakhar, G., & Sheffer, L. (2001). The relationship between the ability to divide attention and

standard measures of general cognitive abilities. Intelligence, 29, 293–306.


xv König, C. J., Bühner, M., & Mürling, G. (2005). Working memory, fluid intelligence, and attention are

predictors of multitasking performance, but polychronicity and extraversion are not. Human
performance, 18(3), 243-266.
xvi Ackerman, P. L., Beier, M. E., & Boyle, M. O. (2005). Working memory and intelligence: The same or

different constructs? Psychological Bulletin, 131, 30–60.


xvii Kane, M. J., & Engle, R. W. (2002). The role of prefrontal cortex in working-memory capacity,

executive attention, and general fluid intelligence: An individual-differences perspective.



Psychonomic Bulletin & Review, 9, 637–671.


xviii Kvist, A. V., & Gustafsson, J. E. (2008). The relation between fluid intelligence and the general

factor as a function of cultural background: A test of Cattell's Investment theory. Intelligence, 36(5),
422-436.
xix Watkins, M. W., Lei, P. W., & Canivez, G. L. (2007). Psychometric intelligence and achievement: A

cross-lagged panel analysis. Intelligence, 35(1), 59-68.


xx Primi, R., Ferrão, M. E., & Almeida, L. S. (2010). Fluid intelligence as a predictor of learning: A

longitudinal multilevel approach applied to math. Learning and Individual Differences, 20(5), 446-
451.
xxi Nakamura, Y. (2001). Rasch Measurement and Item Banking: Theory and Practice.

xxii Bergstrom, B. A., & Lunz, M. E. (1999). CAT for certification and licensure. Innovations in

computerized assessment, 67-91.


xxiii Embretson, S. E., & Reise, S. P. (2013). Item response theory. Psychology Press.

xxiv Van der Linden, W. J. (2018). Handbook of item response theory, three volume set. Chapman and

Hall/CRC.
xxv American Educational Research Association, American Psychological Association, Joint Committee

on Standards for Educational, Psychological Testing (US), & National Council on Measurement in
Education. (1985). Standards for educational and psychological testing. American Educational
Research Association.
xxvi Uniform Guidelines on Employee Selection Procedures (EEOC, 1978). Retrieved September 11, 2019, from https://www.eeoc.gov/policy/docs/factemployment_procedures.html
