Efficacy Research Report
Product Summary
Assessment Quality Indicators
Foundational Research
Intended Product Implementation
Product Research
Adapted and published in many countries across the globe, the WISC is the leading cognitive
ability measure in the world. The WISC-V is currently published in the US, Canada, Australia
and Spain, with future publications planned in the United Kingdom, France, Germany,
the Netherlands, and Scandinavia.
The WISC-V was developed for use with children between the ages of 6 and 16 and is used
to obtain a comprehensive assessment of general intellectual functioning in the context of
various types of evaluations, including (but not limited to):
• Identifying students with specific learning disabilities and determining their
eligibility for services.
• Identifying children with intellectual disability or giftedness.
• Evaluating cognitive processing strengths and weaknesses.
• Assessing the impact of brain injuries.
The original WISC adapted subtests of the Wechsler-Bellevue Intelligence Scale (Wechsler,
1939) for use with children. It provided a Verbal IQ (VIQ), Performance IQ (PIQ), and Full Scale
IQ (FSIQ).
The WISC–Revised (WISC-R) retained all 12 subtests from the first edition, shifted the age
range, and continued to offer a VIQ, PIQ, and FSIQ.
The WISC–Third Edition (WISC-III) retained all of the subtests from the WISC-R and
introduced a new subtest. It also introduced four new index scores that represented
narrower domains of cognitive function: the Verbal Comprehension Index, the
Perceptual Organization Index, the Freedom from Distractibility Index, and the
Processing Speed Index. It continued to offer a VIQ, PIQ, and FSIQ.
The WISC–Fourth Edition (WISC-IV) dropped three subtests that appeared on the
WISC-III. Ten of the subtests were retained with revised item content and scoring procedures.
Five new subtests were developed. The traditional VIQ and PIQ scores were eliminated,
and the FSIQ was retained. Several process scores, which provided more detailed
information about certain aspects of WISC-IV performance, were also included.
The general revision goals for the WISC-V were to incorporate advances in structural models
of intelligence, cognitive neuroscience, neurodevelopmental research, and psychometrics, and
to respond to contemporary practical clinical demands. The latter included revising instructions
and item phrasing to make task demands easier to understand; simplifying scoring criteria;
shortening testing time; improving norming methods and psychometric properties; improving
floors and ceilings; increasing the significance-level options for critical values; improving
the measurement of visual-spatial processing, fluid reasoning, and working memory; adding
a variety of new composite scores to provide more clinical information; and adding measures
of cognitive processes that are sensitive to learning problems. Together, these changes
refine the entire battery.
As the WISC-V remains on the market longer, more data on this edition will become
available. Many external researchers request access to the WISC data to independently
verify and conduct their own studies on factor structure and many other questions.
They also independently collect and publish large special group studies to validate the
use of the test in their frequently tested populations. In addition to a variety of published
studies, there is ongoing research to extend the norms for intellectually gifted test-takers.
Research Studies
Research Study NA
Three mini-pilot studies (N=17, 5, and 20) and three pilot studies (N=431, 397, and 120)
were conducted on research versions of the test to examine issues with item content and
relevance, instructions for the examiner and child, administration procedures, psychometric
properties, and scoring criteria.
A national tryout was conducted on a version of the scale including all 21 of the subtests to
confirm findings from the earlier pilots, as well as refine item order and conduct statistical
analysis on test structure and potential item bias. Participants included 356 children
sampled using a stratified sampling procedure to account for representation across key
demographic characteristics (sex, race/ethnicity, parent education level, and geographic
region). Within each of nine different age groupings, the sample was similar to the U.S.
population according to 2012 census data.
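The report does not say how the census targets were turned into recruitment quotas. As a rough illustration only, the sketch below assumes proportional allocation within a single age group: each stratum receives a share of the group's sample equal to its population share, with largest-remainder rounding so the counts add up exactly. The function name and the regional shares in the example are hypothetical, not figures from the study.

```python
from math import floor


def allocate_proportional(total_n, population_shares):
    """Split total_n participants across strata in proportion to their shares.

    Hypothetical helper: population_shares maps a stratum label to its
    population share (shares need not sum exactly to 1). Largest-remainder
    rounding keeps the allocated counts summing to total_n.
    """
    total_share = sum(population_shares.values())
    # Exact (fractional) quota for each stratum.
    quotas = {k: total_n * s / total_share for k, s in population_shares.items()}
    counts = {k: floor(q) for k, q in quotas.items()}
    shortfall = total_n - sum(counts.values())
    # Give the leftover slots to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in by_remainder[:shortfall]:
        counts[k] += 1
    return counts


# Hypothetical regional shares for one age group of 40 children.
shares = {"South": 0.38, "West": 0.24, "Midwest": 0.21, "Northeast": 0.17}
print(allocate_proportional(40, shares))
# {'South': 15, 'West': 10, 'Midwest': 8, 'Northeast': 7}
```

In practice a stratified design crosses several variables (sex, race/ethnicity, parent education, region), so quotas would be set for combined cells rather than for one variable at a time.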
The WISC-V includes eight new subtests. Two of them are adaptations of item types previously
used and studied on the WAIS; the other six are brand new for the WISC-V. Five of these six
contain item types similar to those studied in the previous intelligence research literature.
However, the Picture Span subtest includes some novel elements that may not be as well
researched (e.g., the use of semantically meaningful stimuli). Because these subtests are new
to the WISC-V, there may be less published research supporting their use than for subtests
that appeared in previous versions of the WISC. Nevertheless, the WISC-V norms, which are critical
for valid interpretation of individual performance, were developed using rigorous,
industry-standard methods with large, representative samples
of learners, and norms based on such samples
enhance the validity of interpretations.
Research Study NA
A study was conducted on all primary and secondary subtests, in part, to evaluate factor
structure of the test. Participants included 2,200 children from 11 age groups, with each age
group closely matched to 2012 U.S. census data on race/ethnicity, parent education level,
and geographic region and balanced among males and females.
Patterns of correlations among all subtests provide initial evidence of convergent and
discriminant validity. Confirmatory factor analysis shows that the WISC-V measures five related
but distinct general abilities, and that each of the primary subtests included in the analysis
(e.g., Digit Span) is associated with the hypothesized aspect of cognitive ability (e.g., working
memory). This hierarchical structure was independently confirmed for test-takers in five
different age groups.
Thus, empirical data patterns are consistent with the hypothesized structure of
the test, which is rooted in contemporary intelligence theory, providing support
for its valid use as a measure of cognitive ability.
Research Study NA
The Kaufman Assessment Battery for Children, Second Edition (KABC–II) is an individually
administered battery of subtests measuring the cognitive abilities of children and adolescents
aged 3–18. The WISC-V and the KABC-II were administered to 89 children, aged 6-16, in
counterbalanced order, with a testing interval of 14-70 days and a mean testing interval of
22 days. Researchers computed correlations between composite scores and corresponding
subtest scores, which were corrected for range restriction using the normative sample as the
referent group. Corrected correlations between WISC-V FSIQ and KABC-II Fluid Crystallized
Index (FCI) and Mental Processing Index (MPI) were 0.77 and 0.81, respectively. Corrected
correlations between corresponding subscores of the WISC-V and KABC-II (e.g., WISC-V VCI
and KABC-II Knowledge/Gc) were moderate, ranging from 0.50 to 0.74.
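The report says correlations were corrected for range restriction using the normative sample as the referent group, but it does not give the formula. One widely used correction for direct restriction on a single variable is Thorndike's Case 2, sketched below under that assumption. The function name and the example values (observed r and sample SD) are hypothetical; composite scores are scaled to an SD of 15 in the normative sample.

```python
from math import sqrt


def correct_for_range_restriction(r, sample_sd, reference_sd):
    """Thorndike Case 2 correction for direct range restriction.

    r: correlation observed in the study sample
    sample_sd: SD of the restricted variable in the study sample
    reference_sd: SD of the same variable in the referent (normative) group
    """
    k = reference_sd / sample_sd
    return (r * k) / sqrt(1 - r**2 + (r**2) * (k**2))


# Hypothetical values: observed r = .70, sample SD = 12, normative SD = 15.
print(round(correct_for_range_restriction(0.70, 12.0, 15.0), 2))  # 0.77
```

When the study sample is less variable than the normative sample (k > 1), the correction raises the observed correlation, which is one reason corrected coefficients should be read alongside the caveats below.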
It should be noted that nonclinical samples were used in each study and that correlations
were corrected for range restriction. Furthermore, external criterion measures may not
have been designed to assess exactly the same mix of abilities as the WISC-V.
Nevertheless, this collection of studies demonstrates that the WISC-V exhibits
consistent, positive relationships with other published measures of cognitive
ability and achievement.
Research Study NA
The Wechsler Intelligence Scale for Children–Fifth Edition, Integrated (WISC-V Integrated)
is an individually administered, comprehensive clinical instrument for assessing the cognitive
processes of children ages 6:0–16:11. Its subtests and scores extend the clinical information
about the cognitive processes and test-taking behaviors that may affect performance on
the WISC-V. The WISC-V Integrated also provides two index scores that permit additional
understanding of the cognitive abilities measured with the WISC-V in specific areas of
intellectual functioning (i.e., Multiple Choice Verbal Comprehension Index and Visual
Working Memory Index).
In particular, eight subtests are adaptations of WISC-V subtests: they include the same item
content as their corresponding WISC-V subtests, but the mode of presentation or the response format is
modified. Two subtests are variations of WISC-V subtests, which include either novel item
content or modifications to the mode of presentation or response format. Finally, four
subtests are designed to expand the scope of construct coverage or to provide information
that may be related to the child’s performance on Coding.
Research Study NA
Description of Sample Participants were 58% male, 54% White, 16% African American, 14%
Asian, 12% Hispanic, and 5% other. 98% of participants had a parent
who completed at least 12 years of school and 42% had a parent
who completed 16 or more years of school. 47% of participants came
from the South, 26% from the Northeast, 19% from the West, and 9%
from the Midwest.
A sample of examinees took the WISC-V and then the WASI-II. Correlations between corresponding
subtest pairs and between the two FSIQ scores, corrected for the variability of the normative
sample, are moderately high and all statistically significant at the .05 level, ranging from
0.53 (Matrix Reasoning) to 0.87 (FSIQ). Thus,
performance on the WISC-V shows consistently strong and positive relationships
with performance on corresponding subtests of an abbreviated form of the test
designed to measure the same constructs, but using different items.
Research Study NA
Description of Sample Intellectually Gifted: The sample was 65% male, 73% White, 10%
Hispanic, 8% other, 6% Asian, and 3% African-American. 100% of
participants had parents with at least 12 years of education, with 88%
of the sample reporting at least 16 years of parental education. 52%
of participants were drawn from the Midwest, 32% from the South, 8%
from the Northeast, and 6% from the West.
Intellectual Disability - Mild Severity: The sample was 55% male, 60%
White, 26% African-American, 14% Hispanic, and 1% other. 68% of
participants had parents with at least 12 years of education, with 16%
of the sample reporting at least 16 years of parental education. 60% of
participants were drawn from the South, 27% from the Midwest, 10%
from the West, and 4% from the Northeast.
Specific Learning Disorder - Reading: The sample was 57% female, 63%
White, 28% Hispanic, and 10% African-American. 87% of participants
had parents with at least 12 years of education, with 40% reporting
at least 16 years of parental education. 57% of participants were drawn
from the South, 23% from the West, 17% from the Midwest, and 3%
from the Northeast.
Disruptive Behavior: The sample was 52% male, 48% White, 38%
African-American, 10% other, and 4.8% Asian. 92% of participants had parents
with at least 12 years of education, with 10% reporting at least 16 years
of parental education. 38% of participants were drawn from the Midwest,
33% from the South, 14% from the Northeast, and 14% from the West.
Traumatic Brain Injury: The sample was 60% male, 55% White, 30%
Hispanic, 10% African-American, and 5% other. 90% of participants had
parents with at least 12 years of education, with 40% reporting at least
16 years of parental education. 45% of participants were drawn from the
South, 45% from the West, and 10% from the Midwest.
English Language Learners: The sample was 50% female, 88% Hispanic,
and 13% Asian. 50% of participants had parents with at least 12 years of
education, with 6% reporting at least 16 years of parental education. 38%
of participants were drawn from the West, 31% from the South, 19% from
the Midwest, and 13% from the Northeast.
Study Citation Costa, E. B. Adams, Day, L. A., & Raiford, S. E. (2016). WISC–V special
group study: Children with hearing differences who utilize spoken language
and have assistive technology. Bloomington, MN: Pearson.
Sample Size N=15 children, 6-8 years of age, with hearing differences
Description of Sample The sample was 60% male, 40% White, 20% Asian, 20% Other, 13%
African-American, and 7% Hispanic. 100% of participants had parents
with at least some college or technical school, and 80% of participants
had parents holding a Bachelor’s degree. 100% of participants came
from the South.
Participants included children with hearing loss falling within at least the
mild range unilaterally, who use either a cochlear implant or hearing aid.
These results replicate previous research on cochlear implant users that demonstrated
lower scores on subtests from the Verbal Comprehension domain. In addition, these
findings are consistent with previous research that demonstrates vulnerability in the area
of verbal working memory for children with hearing differences (Geraci, Gozzi, Papagno,
& Cecchetto, 2008). It should be noted that the sample was one of convenience, and
included children who use appropriate technology and receive specialized educational
services and supports. Further, the sample was small and not necessarily representative
in several demographic factors (e.g., parental education, geographic region). Nevertheless,
results from this study support the conclusion that the WISC-V is sensitive to
performance differences exhibited by hearing impaired individuals with access
to high-quality assistive devices and ideal levels of educational support.
Research Study NA
Description of Sample Each sample was drawn from the nationally representative
standardization sample. Participants in each of 11 age groups
were closely matched to 2012 U.S. census data on race/ethnicity,
parent education level, and geographic region and were balanced
with respect to gender.
The risk factors analysis sample excluded all clinical and intellectually-
gifted learners.
Researchers analyzed the WISC-V performance of several subgroups using the normative
sample. First, they computed the mean male-female difference on all composite scores to
identify those exhibiting significant sex-related differences (at p<.05 or p<.01). Researchers
concluded that the WISC-V Working Memory Index, Processing Speed Index, FSIQ, Nonverbal
Index, Cognitive Proficiency Index, and Symbol Translation Index are significantly higher in
female than in male children, and the Quantitative Reasoning Index is significantly higher
in male than in female children. However, with the exception of the Processing Speed Index
and the Cognitive Proficiency Index, the magnitude of mean differences is small (i.e., most
differences are less than 1.5 points). The Verbal Comprehension Index, Visual Spatial Index,
Auditory Working Memory Index, General Ability Index, Naming Speed Index, and Storage
and Retrieval Index showed no significant sex differences.
Next, researchers computed mean score differences among test-takers with different
parental education levels as a proxy indicator for socioeconomic status. Differences
between all levels were statistically significant at the p<.001 level and favored children
with higher parental education levels. This effect was strongest for composites that depend
on verbal ability – the General Ability Index, FSIQ, and the Verbal Comprehension Index.
Finally, researchers computed mean composite score differences among White, African-
American, Asian, Hispanic, and Other test-takers. Before adjusting for sex and parental
education level, Asian and White test-takers tend to outperform their African-American,
Hispanic, and Other counterparts on all composites. Combined, sex and parental education
account for between 3.5% and 19.5% of the variance in composite scores. However, even
when subgroup means were adjusted for sex and parental education level, there were still
performance differences between groups, with the largest amount of residual variance
attributable to race/ethnicity for the Visual Spatial Index, General Ability Index, Verbal
Comprehension Index, and FSIQ. The differences between White and African-American
test-takers were the largest, with significant differences persisting on all composite scores,
and the largest differences observed for measures of crystallized ability and acquired
knowledge. Even where large differences persist, however, the percentage of variance
attributable to race/ethnicity is relatively small, ranging from 1% to 6%. White and Hispanic
group score differences are noticeably smaller after adjusting for sex and parental education
level, and may no longer be practically meaningful.
Researchers designed two new risk assessments: the Child and Adolescent Academic
Questionnaire (Academic-Q), containing items related to risk factors for school failure, and
the Child and Adolescent Behavior Questionnaire (Behavior-Q), containing items related to
risk factors for delinquency and criminal behavior. Instruments were based on an extensive
review of the academic and delinquency risk research and included both static and dynamic
factors, as well as both individual-level and family-level risk factors.
Scatter Analysis
Scatter refers to the degree of variability across, between, and within composite
and subtest scores, and has long been a topic of interest for the Wechsler family of
assessments (Matarazzo, Daniel, Prifitera, & Herman, 1988; McLean, Kaufman, & Reynolds,
1989; Wechsler, 1991; Wechsler, 2003). Traditionally, scatter analysis has been used to inform
comparison of different types of scores as a way of identifying relative cognitive strengths
and weaknesses. The concept of normative base rates is important in any discussion of
scatter, and captures the degree of variability in scores for nonclinical samples.
Researchers investigated the prevalence of scatter in the normative sample using the index
range, calculated by subtracting the lowest primary index score from the highest. The mean
index range and standard deviations (SDs) were calculated separately for each age level,
gender, parental education level, and racial/ethnic category. The means and SDs were highly
consistent across these groups, with mean index ranges hovering around 25 points and SDs
of approximately 10 points. This result suggests that the large amount of scatter in the index
scores cannot be attributed to demographic factors. Researchers also computed index
ranges for different clinical groups and the intellectually gifted. Results suggest that all
special groups except one (children with intellectual disability) exhibit mean index ranges
consistent with the “normal” range of about 25. Moreover, researchers found that 18% of
learners in the normative sample earned three or more significant differences between
primary indexes and their FSIQ, and only a few clinical groups (intellectually gifted, children
with intellectual disability) demonstrated rates that were substantially different from this.
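The two scatter statistics described above can be sketched as follows. The index range is the highest minus the lowest primary index score; the count of significant differences compares each primary index with the FSIQ against a critical value. The five abbreviations are the WISC-V primary indexes (Verbal Comprehension, Visual Spatial, Fluid Reasoning, Working Memory, Processing Speed). The function names, the example scores, and the flat critical value of 10 are hypothetical; published critical values vary by age and significance level.

```python
from statistics import mean, stdev

PRIMARY_INDEXES = ("VCI", "VSI", "FRI", "WMI", "PSI")


def index_range(scores):
    """Highest minus lowest primary index score for one child."""
    values = [scores[name] for name in PRIMARY_INDEXES]
    return max(values) - min(values)


def count_significant_differences(scores, fsiq, critical_values):
    """Count primary indexes whose difference from the FSIQ meets its critical value."""
    return sum(
        1 for name in PRIMARY_INDEXES
        if abs(scores[name] - fsiq) >= critical_values[name]
    )


def summarize_ranges(children):
    """Mean and SD of the index range across a group of children."""
    ranges = [index_range(child) for child in children]
    return mean(ranges), stdev(ranges)


# Hypothetical scores for two children and a hypothetical flat critical value.
children = [
    {"VCI": 100, "VSI": 112, "FRI": 95, "WMI": 88, "PSI": 110},
    {"VCI": 90, "VSI": 100, "FRI": 120, "WMI": 105, "PSI": 95},
]
critical = {name: 10 for name in PRIMARY_INDEXES}
print(summarize_ranges(children))
print(count_significant_differences(children[0], fsiq=101, critical_values=critical))  # 2
```

The first child has an index range of 24 points, close to the normative mean reported above, yet still shows two significant index-FSIQ differences, which illustrates why base rates matter when interpreting scatter.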
Researchers replicated this analysis with subtest scores for the 10 primary subtests to
compute the mean subtest range across demographic groups. Once again, mean ranges
were quite similar across demographic groups (hovering around 7 +/- 2 points) and were
consistent with previous estimates of subtest ranges for the Wechsler assessments.
Similarly, subtest scatter for most special groups (excluding children with intellectual
disability) was comparable to that for both the normative and nonclinical samples. Once
again, as many as 17% of learners in the normative sample earned four or more significant
differences between primary subtest scores, and results for most special groups did not
differ much.
These results are consistent with previous research showing a large degree of variability
in cognitive ability scores for nonclinical samples (Orsini, Pezzuti, & Hulbert, 2014), which
supports the value of the WISC-V as a measure of cognitive ability. They also provide
support for the practice of interpreting discrepancies in terms of significance and
prevalence, which is facilitated by including WISC-V base rates in the manual. Thus,
WISC-V interpretive materials enhance clinical utility.
Research Study NA
The WISC-V was administered twice to a sample of 218 students within five different age
bands (6-7, 8-9, 10-11, 12-13, and 14-16), with test-retest intervals ranging from 9–82 days, and
a mean interval of 26 days. The stability coefficient is the correlation between the first and
second testing, corrected for range restriction using the normative sample as the referent.
The corrected test-retest coefficient for the FSIQ was 0.92, and corrected coefficients for
the primary index scores ranged from 0.75 to 0.94. Corrected coefficients for the WISC-V
subtest scores ranged from 0.71 to 0.90. It should be noted, however, that sample sizes
for the test-retest reliability analysis were somewhat small, particularly for the complementary
subtests and process scores, and that correlations were corrected for range restriction.
Nevertheless, results generally suggest that both primary index scores and subtest
scores demonstrate moderate to high consistency across testing occasions.
Research Study NA
Sample Size N=60 randomly selected cases from the normative sample
Description of Sample The mean age of the participants was 11.3 years. The sample was evenly
split between males and females. 47% of the sample was White, 28%
Hispanic, 12% African American, 8% Other, and 5% Asian. 83% of the
sample reported parental educational levels of at least 12 years and 30%
reported at least 16 years of parental education. 38% of the sample was
drawn from the South, 28% from the West, 22% from the Midwest, and
12% from the Northeast.
Assessment Quality Indicator Measured Test scores are consistent over time and/or over multiple raters (Reliability)
Most subtests on all WISC-V protocols in the normative sample were double scored by two
independent scorers, providing evidence of interscorer agreement. Data collected by examiners
were scored by trained personnel. All
scorers were required, at a minimum, to have a Bachelor’s degree and to attend a training
program conducted by members of the research team. In addition, all scorers received
feedback on scoring errors and additional training, as needed, and a research team member
coached each scorer intermittently. Interscorer agreement for a subset of all subtests was
high, ranging from 0.98 to 0.99.
Scoring of the Verbal Comprehension subtests is more subjective, so a separate study was
conducted. A sample of 60 cases was randomly selected from the normative sample and
scored independently by nine different raters who were completing doctoral-level clinical
psychology programs and had completed at least one semester course in psychological
assessment but had no prior training on WISC-V scoring criteria. Interscorer reliabilities, in
the form of the intraclass correlation coefficient, were .98 for Similarities, .97 for Vocabulary,
.99 for Information, and .97 for Comprehension.
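The report gives intraclass correlations but does not say which ICC form was used. As an illustration only, the sketch below computes ICC(2,1) from Shrout and Fleiss's taxonomy (two-way random effects, absolute agreement, single rater). It assumes a complete design in which every rater scores every case, which the report does not state for the nine raters and 60 cases; the function name is hypothetical.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per case, each holding one score per rater
    (every rater must score every case).
    """
    n = len(ratings)        # number of cases
    k = len(ratings[0])     # number of raters
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: cases, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


print(round(icc_2_1([[2, 3], [4, 5], [6, 7]]), 2))  # 0.89
```

In the example the second rater is always one point higher than the first. The rank order agrees perfectly, but the absolute-agreement form penalizes the constant offset, giving 8/9 rather than 1.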
Given the extensive training, feedback and support provided to the scorers participating
in the study, it is not clear whether the estimated interrater agreement rates would apply to
the typical clinician who does not receive this type of feedback and support. Nevertheless,
evidence suggests that scoring of the WISC-V is highly consistent across raters.
Study Citation Daniel, M.H., Wahlstrom, D., & Zhang, O. (2014). Equivalence
of Q-interactive and Paper Administrations of Cognitive Tasks:
WISC-V. Q-interactive Technical Report 8. Bloomington, MN: Pearson.
Research Study NA
Description of Sample Paper: The sample was 58% female, 67% White, 17% Hispanic, 10%
African-American, and 6% other. 90% of participants had parents with
at least 12 years of education, with 42% reporting at least 16 years of
parental education. The mean age for the group was 11.1 years.
Assessment Quality Indicator Measured Test scores can be interpreted the same way for test-takers of different subgroups (Fairness)
As part of the WISC-V standardization, 350 nonclinical participants, ages 6-16, were randomly
assigned to either the paper or the digital format of the test. Within each condition,
participants were placed into matched pairs on the basis of age range, gender, ethnicity,
and parent education. All examiners were trained, engaged in practice administrations,
and were provided feedback on any administration errors. Researchers calculated effect
sizes for the format effect using a multiple-regression-based approach in which the
dependent variables were the subtest scaled scores and the predictors were demographic
covariates and WISC-V subtests that had previously shown only very minor format effects.
Effect sizes were mixed, with some positive and some negative. A criterion of greater than
0.20 was used to identify effect sizes worthy of following up. An effect size of 0.20 is slightly
more than one-half of a scaled-score point on the commonly used subtest metric that has
a mean of 10 and a standard deviation of 3. Only three subtests showed a statistically
significant format effect (two that were significant at the p<.05 level and one significant at
the p<.01 level); however, none of these exceeded the effect size criterion of 0.20. There
were no significant differences in format effects by ability level, age, socioeconomic status,
gender, or race/ethnicity.
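The 0.20 criterion can be checked with simple arithmetic, assuming the effect size is expressed in standard-deviation units of the scaled-score metric (SD = 3): 0.20 × 3 = 0.6 scaled-score points, slightly more than half a point. The sketch below, with hypothetical function names, converts an effect size to scaled-score points and applies the "greater than 0.20" rule to the magnitude, since the report notes that effects were both positive and negative.

```python
SCALED_SCORE_SD = 3.0      # subtest scaled-score metric: mean 10, SD 3
FOLLOW_UP_CRITERION = 0.20


def effect_size_to_scaled_points(effect_size, sd=SCALED_SCORE_SD):
    """Express a standardized format effect in scaled-score points."""
    return effect_size * sd


def needs_follow_up(effect_size, criterion=FOLLOW_UP_CRITERION):
    """Flag effects whose magnitude exceeds the criterion (sign ignored)."""
    return abs(effect_size) > criterion


print(round(effect_size_to_scaled_points(0.20), 2))  # 0.6
print(needs_follow_up(0.20), needs_follow_up(-0.25))  # False True
```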
It should be noted that this study was based on nonclinical samples, so equivalence cannot
be assumed for clinical groups of test-takers. Test-takers and non-Pearson examiners were
compensated for their participation. Moreover, given the training, practice and feedback
provided to the examiners participating in the study, it is not clear whether the equivalence
could be expected to hold when examiners have not been provided this type of feedback.
This collection of studies suggests that paper and digital formats of the WISC-V
provide comparable results. Thus, learners taking one format will not be at a
disadvantage relative to learners taking the other format.
Study Citation Raiford, S.E., Holdnack, J., Drozdick, L., & Zhang, O. (2014). Q-interactive
special group studies: The WISC-V and children with intellectual giftedness
and intellectual disability.
Q-interactive Technical Report 9. Bloomington, MN: Pearson.
Research Study NA
Description of Sample Intellectual giftedness sample: The sample was 54% male, 71% White,
17% other, 8% Hispanic, and 4% Asian. 100% of participants had parents
with at least 12 years of education, with 88% reporting at least 16 years
of parental education.
Assessment Quality Indicator Measured Test scores can be interpreted the same way for test-takers of different subgroups (Fairness)
A special study was conducted to investigate the performance of the digital format
of the WISC-V for clinical groups. The purpose of the study was to show that the digital
format of the test demonstrates similar sensitivity to clinical conditions as the paper format.
24 test-takers identified as intellectually gifted and 22 test-takers identified as intellectually
disabled were each matched with a non-clinical counterpart from the sample used in
the first digital-paper equivalence study on the basis of age range, gender, ethnicity,
and parent education. All examiners were trained, engaged in practice administrations,
and were provided feedback on any administration errors. For each protocol, two
independent scorers reevaluated all subjectively scored items using the final scoring
rules, and an expert scorer or a member of the research team resolved any discrepancies
between the two scorers as needed.
The intellectual giftedness sample outperformed the matched control sample across all
composite scores and subtests. Most of these differences were significant at the p<.01
level, with Cohen’s D effect sizes ranging from 0.46 to 1.72. Moreover, the pattern of subtest
effect sizes is consistent with those observed in the WISC-V paper study, and mean General
Ability Index scores were identical for the intellectually gifted samples on both paper and
digital formats. The intellectual disability sample earned significantly lower scores than their
matched control counterparts across all primary and ancillary indices, as well as all subtests,
with Cohen’s D effect sizes ranging from 1.76 to 3.86. In addition, the mean General Ability
Index scores were nearly identical for the intellectual disability samples on both forms
(63.7 on the digital versus 63.5 on paper).
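The report does not say how its Cohen's d values were computed for the matched samples. A common definition, assumed here, divides the difference between group means by the pooled standard deviation; some matched-pair analyses instead use the normative SD or the SD of the paired differences, which would give different values. The function name and the example scaled scores are hypothetical, not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical subtest scaled scores for a clinical group and matched controls.
clinical_group = [14, 15, 13, 16, 12]
matched_controls = [10, 11, 9, 12, 8]
print(round(cohens_d(clinical_group, matched_controls), 2))  # 2.53
```

A positive value means the first group scored higher; for the intellectual disability comparison, passing the clinical group first would yield negative values.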
It should be noted that test-takers and non-Pearson examiners were compensated for
their participation. Moreover, given the training, practice and feedback provided to the
examiners participating in the study, it is not clear whether the equivalence could be
expected to hold when examiners have not been provided this type of feedback. However,
this collection of studies provides further support for the comparability of paper
and digital formats of the WISC-V for intellectually gifted learners and those with
an intellectual disability.
Study Citation Raiford, S.E., Drozdick, L., & Zhang, O. (2015). Q-interactive special
group studies: The WISC-V and children with Autism Spectrum Disorder
and accompanying language impairment or Attention Deficit/Hyperactivity
disorder. Q-interactive Technical Report 11. Bloomington, MN: Pearson.
Research Study NA
Assessment Quality Indicator Measured Test scores can be interpreted the same way for test-takers of different subgroups (Fairness)
A special study was conducted to investigate the performance of digital formats of the
WISC-V for clinical groups. The purpose of the study was to show that the digital format of
the test demonstrates similar sensitivity to clinical conditions as the paper format. 30 test-
takers identified as being on the autism spectrum with accompanying language impairment
(ASD-L) and 25 test-takers identified as having ADHD were each matched with a non-clinical
counterpart from the sample used in the first digital-paper equivalence study on the basis
of age range, gender, ethnicity, and parent education. All examiners were trained, engaged
in practice administrations, and were provided feedback on any administration errors.
For each protocol, two independent scorers reevaluated all subjectively scored items using
the final scoring rules, and an expert scorer or a member of the research team resolved
any discrepancies between the two scorers as needed.
The ASD-L sample earned significantly lower scores (p<.01) than the matched control
sample on all primary and ancillary indices, as well as all subtests, with Cohen’s D effect
sizes ranging from 0.81 to 2.00. The pattern of performance differences was similar to
those observed for the paper format. The mean General Ability Index scores for the
ASD-L samples taking the digital and paper formats were 81.8 and 85.7, respectively.
It should be noted that test-takers and non-Pearson examiners were compensated for their
participation. Moreover, given the training, practice and feedback provided to the examiners
participating in the study, it is not clear whether the equivalence could be expected to hold
when examiners have not been provided this type of feedback. However, this collection
of studies provides further support for the comparability of paper and digital
formats of the WISC-V for ASD-L and ADHD groups.
Study Citation Raiford, S.E., Drozdick, L. W., & Zhang, O. (2016). Q-interactive special
group studies: The WISC-V and children with specific learning disorders in
reading or mathematics. Q-interactive Technical Report 13. Bloomington,
MN: Pearson.
Research Study NA
Description of Sample SLD-R sample: The sample was 63% male, 88% White, 8% Hispanic,
and 4% African American. 88% of participants had parents with at least
a high school diploma or equivalent, with 42% of participants having a
parent with a Bachelor’s degree. Sample demographics were generally
similar to those of the SLD-R sample used for the special group study
conducted with the WISC-V paper format, although the sample was
slightly older and more male.
SLD-M sample: The sample was 44% male, 61% White, 13% Hispanic,
13% African-American, and 13% other. 91% of participants had parents
with at least a high school diploma or equivalent, with 35% of participants
having a parent with a Bachelor’s degree. Sample demographics were
generally similar to those of the SLD-M sample used for the special group
study conducted with the WISC-V paper format, although the sample
was slightly younger and less racially diverse and reported slightly higher
levels of parental education.
Assessment Quality Indicator Measured
Test scores can be interpreted the same way for test-takers of different subgroups (Fairness)
A special study was conducted to investigate the performance of a digital format of the
WISC-V for clinical groups. The purpose of the study was to show that the digital format
of the test is as sensitive to clinical conditions as the paper format. Twenty-four
test-takers identified as having SLD-R and 23 test-takers identified as having SLD-M were
each matched with a non-clinical counterpart from the sample used in the first digital-paper
equivalence study on the basis of age range, gender, ethnicity, and parent education. All
examiners were trained, engaged in practice administrations, and were provided feedback
on any administration errors. For each protocol, two independent scorers reevaluated all
subjectively scored items using the final scoring rules, and an expert scorer or a member of
the research team resolved any discrepancies between the two scorers as needed.
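The matching step described above can be illustrated with exact matching without replacement. This is a hypothetical Python sketch: the report names the matching variables (age range, gender, ethnicity, and parent education) but not the algorithm, so the field names, the age bands, and the first-available selection rule are assumptions.

```python
from collections import defaultdict

# Hypothetical field names for the matching variables named in the report.
MATCH_KEYS = ("age_band", "gender", "ethnicity", "parent_education")


def match_controls(clinical, pool, keys=MATCH_KEYS):
    """Pair each clinical case with one unused control that shares every key.

    Exact matching without replacement: a control is removed from the pool
    once used. Returns (pairs, unmatched_cases).
    """
    available = defaultdict(list)
    for person in pool:
        available[tuple(person[k] for k in keys)].append(person)

    pairs, unmatched = [], []
    for case in clinical:
        bucket = available.get(tuple(case[k] for k in keys), [])
        if bucket:
            pairs.append((case, bucket.pop(0)))  # take the first eligible control
        else:
            unmatched.append(case)
    return pairs, unmatched


if __name__ == "__main__":
    clinical = [
        {"id": "c1", "age_band": "9-10", "gender": "F", "ethnicity": "White", "parent_education": "16+"},
        {"id": "c2", "age_band": "13-14", "gender": "M", "ethnicity": "Hispanic", "parent_education": "12"},
    ]
    pool = [
        {"id": "n1", "age_band": "9-10", "gender": "F", "ethnicity": "White", "parent_education": "16+"},
        {"id": "n2", "age_band": "11-12", "gender": "M", "ethnicity": "Hispanic", "parent_education": "12"},
    ]
    pairs, unmatched = match_controls(clinical, pool)
    print([(c["id"], n["id"]) for c, n in pairs])  # [('c1', 'n1')]
    print([c["id"] for c in unmatched])            # ['c2']
```

A real matching procedure would also need a rule for cases with no exact match, such as widening the age band; the sketch simply reports them as unmatched.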
The SLD-R sample earned significantly lower scores on all primary indices (p<.05) than the
matched control sample, with the largest effect sizes on the VCI (1.01), WMI (1.24), and
AWMI (1.14). Several individual subtest scores were also significantly lower (p<.01) for the
SLD-R sample than for the control sample, including Letter-Number Sequencing, Digit Span,
Arithmetic, Picture Concepts, Information, Picture Span, and Similarities, with effect sizes
ranging from 0.99 to 1.44. Finally, all of the complementary subtest scores were significantly lower
for the SLD-R sample than for the control sample (p<.05), with effect sizes ranging from 0.92 to 1.79.
The results indicate significant difficulties with immediate paired associate learning, rapid
verbal naming, verbal comprehension, and working memory. This pattern of performance
differences was similar to that observed for the paper format.
It should be noted that test-takers and non-Pearson examiners were compensated for their
participation. Moreover, given the training, practice, and feedback provided to the examiners
participating in the study, it is unclear whether equivalence would hold when examiners
have not received this type of feedback. However, this collection
of studies provides further support for the comparability of paper and digital
formats of the WISC-V for learners with specific learning disorders in reading
and mathematics.
Study Citation Raiford, S. E., Zhang, O., Drozdick, L. W., Getz, K., Wahlstrom, D., Gabel, A.,
Holdnack, J. A., & Daniel, M. (2016). WISC-V Coding and Symbol Search in
digital format: Reliability, validity, special group studies, and interpretation.
Q-interactive Technical Report 12. Bloomington, MN: Pearson.
Research Study NA
Type of Study Q-interactive equivalence study and performance for special populations
Description of Sample Non-clinical equivalence: The sample was 52% male, and was 55% White,
24% Hispanic, 12% African-American, 7% other, and 1% Asian. 87% of
participants had parents with at least 12 years of education, and one-third
reported their parents had a Bachelor’s degree. 48% of the sample
was drawn from the South, 25% from the West, 14% from the Northeast,
and 12% from the Midwest.
SLD-R: The sample was 58% male, and was 50% Hispanic, 46% White,
and 4% other. 100% of participants had parents with at least 12 years of
education, and one-third reported their parents had a Bachelor’s degree.
71% of the sample was drawn from the South, 12% from the Northeast,
and 8% each from the Midwest and the West.
SLD-M: The sample was 54% male, and was 54% White, 23% Hispanic,
18% African-American, and 4% other. 100% of participants had parents
with at least 12 years of education, and 13% reported their parents had a
Bachelor’s degree. 82% of the sample was drawn from the South, and 9%
each from the Midwest and the West.
MI (motor impairment): The sample was 80% male, and was 80% White, 13% Hispanic,
and 7% other. 93% of participants had parents with at least 12 years
of education, and 47% reported their parents had a Bachelor’s degree.
73% of the sample was drawn from the South, 20% from the Northeast,
and 7% from the West.
Assessment Quality Indicator Measured
Test scores can be interpreted the same way for test-takers of different subgroups (Fairness)
A special study was conducted to investigate the performance of digital formats of the
WISC-V for clinical groups. The purpose of the study was to show that the digital format of the
test is as sensitive to specific learning disorders in reading and mathematics as the paper
format. For motor-impaired children, the purpose of the study was to illustrate typical
performance with touch responses compared with written responses.
Twenty-four test-takers identified as having SLD-R, 22 test-takers identified as having SLD-M,
and 15 test-takers with significant motor impairment (MI) were each matched with a non-clinical
counterpart from the sample used in the scaling study on the basis of age range, gender,
ethnicity, and parent education.
The SLD-R sample earned significantly lower scores (p<.05) than the matched control sample
on most primary indices and the FSIQ, with Cohen’s d effect sizes ranging from
0.30 to 1.77. The SLD-M sample earned significantly lower scores than its matched
control counterparts across all primary and ancillary indices (p<.05). Across all indices, Cohen’s
d effect sizes ranged from 0.75 to 1.85. The MI sample earned significantly lower scores
(p<.05) than the matched control sample on the Coding and Symbol Search subtests and the
Processing Speed Index, with Cohen’s d effect sizes ranging from 0.81 to 0.95.
For all three special groups, the pattern of performance differences was similar
to that observed for the paper format, suggesting that when the Processing Speed
Index subtests are administered digitally, scores for these groups are comparable to
those obtained with the paper format.
Research Study NA
Description of Sample Intellectually Gifted: The sample was 65% male, 73% White, 10% Hispanic,
8% other, 6% Asian, and 3% African-American. 100% of participants
had parents with at least 12 years of education, with 88% of the sample
reporting at least 16 years of parental education. 52% of participants
were drawn from the Midwest, 32% from the South, 8% from the
Northeast, and 6% from the West.
Intellectual Disability - Mild Severity: The sample was 55% male, 60%
White, 26% African-American, 14% Hispanic, and 1% other. 68% of
participants had parents with at least 12 years of education, with
16% of the sample reporting at least 16 years of parental education.
60% of participants were drawn from the South, 27% from the Midwest,
10% from the West, and 4% from the Northeast.
Disruptive Behavior: The sample was 52% male, 48% White, 38% African-
American, 10% other, and 4.8% Asian. 92% of participants had parents
with at least 12 years of education, with 10% reporting at least 16 years
of parental education. 38% of participants were drawn from the Midwest,
33% from the South, 14% from the Northeast, and 14% from the West.
Traumatic Brain Injury: The sample was 60% male, 55% White, 30%
Hispanic, 10% African-American, and 5% other. 90% of participants had
parents with at least 12 years of education, with 40% reporting at least
16 years of parental education. 45% of participants were drawn from
the South, 45% from the West, and 10% from the Midwest.
English Language Learners: The sample was 50% female, 88% Hispanic,
and 13% Asian. 50% of participants had parents with at least 12 years of
education, with 6% reporting at least 16 years of parental education. 38%
of participants were drawn from the West, 31% from the South, 19% from
the Midwest, and 13% from the Northeast.
Assessment Quality Indicator Measured
Test scores can be interpreted the same way for test-takers of different subgroups (Fairness),
and they are consistent over time and/or over multiple raters (Reliability)
Split-half reliability coefficients were computed for each subtest and scaled process score, with the
exception of the Coding, Symbol Search, Cancellation, and Naming Speed subtests and the Naming
Speed and Symbol Translation standard process scores. Coefficients were averaged across
groups using Fisher’s z transformation. Across all special groups, the average split-half
reliability of the subtest and scaled process scores ranged from 0.86 to 0.97. Coefficients
were generally consistent with corresponding estimates for the normative sample. It should
be noted that the clinical samples were not randomly selected but were recruited based
on availability. These samples may not be representative of the WISC-V performance of all
children in each diagnostic category. The diagnoses of children within the same special group
might have been made on the basis of different criteria and procedures. Moreover,
the sample sizes for some of the studies are small and cover only a portion of the WISC-V
age range. Nevertheless, evidence from these studies suggests that the WISC-V
subtests are internally consistent for a wide variety of clinical groups,
and their consistency is comparable to that for non-clinical test-takers.
Study Citation Chen, H., Zhang, O., Raiford, S. E., Zhu, J., & Weiss, L. G. (2015). Factor
invariance between genders on the Wechsler Intelligence Scale for
Children–Fifth Edition. Personality and Individual Differences, 86, 1–5.
Research Study NA
A representative sample of 2,637 children, ages 6-16, from the standardization sample
was administered 16 subtests from the WISC-V. A second-order factor model, positing
an overarching general intelligence factor subsuming five additional factors (Verbal
Comprehension, Visual Spatial, Fluid Reasoning, Working Memory, and Processing Speed)
was tested for invariance across Caucasian, African-American, and Hispanic males and
females. The model tested was identical to the model from the Chen et al. (2015) paper,
except that Arithmetic scores were allowed to cross-load on Working Memory and Verbal
Comprehension, in addition to Fluid Reasoning. Based on CFI, RMSEA, and changes in CFI
and RMSEA for successive models, the results demonstrate that the hypothesized model shows
configural invariance (same number of factors and factor pattern), metric invariance
(equal factor loadings), and intercept invariance (equal subtest intercepts) across all groups
(overall model chi-square = 1644.45, df = 747, CFI = 0.95, RMSEA = 0.051). These results
suggest that WISC-V scores can be interpreted in the same way for Caucasian,
African-American, and Hispanic males and females.
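The invariance decision described above compares CFI and RMSEA across successively constrained models. The Python sketch below implements the standard point-estimate formulas for both indices and a hypothetical decision helper. The cutoffs (a CFI drop of at most 0.01 and an RMSEA rise of at most 0.015) are widely used rules of thumb assumed here, not thresholds stated in the study, and the optional square-root-of-groups factor for RMSEA reflects a multigroup convention that only some SEM programs apply. The baseline chi-square needed for CFI is not reported above, so no attempt is made to reproduce the published values.

```python
import math


def rmsea(chi2, df, n, groups=1):
    """Point estimate of RMSEA.

    groups > 1 applies a sqrt(G) multigroup adjustment used by some
    SEM programs; conventions differ across software.
    """
    return math.sqrt(groups) * math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index relative to an independence (baseline) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_baseline - df_baseline, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def invariance_holds(cfi_prev, cfi_next, rmsea_prev, rmsea_next,
                     max_cfi_drop=0.01, max_rmsea_rise=0.015):
    """Hypothetical helper: retain the more constrained model if fit does
    not worsen beyond the assumed rule-of-thumb cutoffs."""
    return ((cfi_prev - cfi_next) <= max_cfi_drop
            and (rmsea_next - rmsea_prev) <= max_rmsea_rise)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the study.
    print(rmsea(200, 100, 101))           # 0.1
    print(cfi(110, 100, 1100, 100))       # 0.99
    print(invariance_holds(0.955, 0.950, 0.050, 0.052))  # True
```

In practice these indices come from the SEM software that fits the models; the helper only makes the comparison logic of successive models explicit.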
