Evaluation of Multiple Variables Predict
Purpose: A significant positive correlation between the Physician Assistant National Certifying Examination (PANCE) and the PAEA Physician Assistant Clinical Knowledge Rating and Assessment Tool (PACKRAT) has been found in previous studies. This study's goal was to improve prediction of PANCE failure using other predictive measures, alone and in combination with the PACKRAT.
Methods: Correlation, discriminant, and regression analyses were conducted on 3 years of data (2007–2009) collected from Chatham University PA Studies graduates, with convergent results.
Results: Significant positive correlations between the PANCE and each of the assessment measures were found (in order of significance): (1) the PACKRAT exam; (2) a summative multiple-choice-question exam (SUMM) given near the end of the clinical (second) year; (3) average multiple-choice-question exam (MCQ) results from the didactic (first) year; (4) prerequisite preadmission GPA; and (5) overall preadmission GPA. Discriminant analyses revealed the combined solution of PACKRAT, SUMM, and MCQ to most accurately differentiate between the “pass” versus “fail” groups on the PANCE. Most importantly, regression analyses revealed the combined solution of SUMM and MCQ to most accurately predict PANCE performance at the low end of the scale.
Conclusion: This research suggests that all assessment measures tested can provide helpful estimates of PANCE performance; however, the combined SUMM and MCQ solution provided the most reliable and accurate prediction of PANCE performance for “high risk” students. Other physician assistant programs could develop analyses of their internal testing assessments to identify students at risk of PANCE failure and allocate institutional and program resources to improve rates of PANCE passage.
ed in graduate school due to the more intensive curriculum. To assist students with academic difficulties, many institutions have learning centers that provide diagnostic inventories that pinpoint problems and then assist with solutions. Early detection of knowledge or skills gaps is essential for identifying appropriate remediation techniques for students in need of targeted supplemental instruction. Proactive intervention saves the student wasted time, effort, and money in a program that is unlikely to culminate in professional certification. Likewise, early detection enables the educators to tailor their resources toward the individual needs of students more effectively. In the long run, early intervention will accomplish savings for the institution in terms of staff attention and resources as students’ problems are rectified before deficiencies are compounded over time.

Few studies have been able to identify reliable predictors of successful PANCE performance. In 1999, the type of degree conferred, program length, and length of program accreditation were evaluated.2 Higher overall PANCE passage rates were found within master’s degree PA programs as compared with bachelor’s degree programs. Master’s degree programs also were found to have higher core, primary care and clinical skills scores. In addition, significantly greater PANCE pass rates occurred within programs that had been in existence and accredited longer. This significance was only for higher clinical skills scores. However, since PANCE dropped its clinical component, these findings are no longer relevant. No significant relationship was found between length of program curriculum and PANCE passage rate.

In 2002, attempts to identify the cut score for a norm-referenced examination to indicate the likelihood of PANCE passage was undertaken in a small sample (N = 29) of first-time PANCE takers.3 Their results were predictive of PANCE passage; however, the study subjects were PA program graduates preparing for their first PANCE attempt postgraduation, not PA students in their didactic year.

Subsequently, in 2003, a retrospective analysis of several demographic and internally developed academic measures was performed for prediction of PANCE passage.4 Statistically significant predictors were found to be Year One GPA and summative exam scores. Year One GPA of less than 3.0 and summative scores of less than 67% were associated with increased risk for PANCE failure.

Demographic factors such as age and gender were evaluated to predict PANCE passage in 2004.5 Using three separate cohorts of PANCE takers, younger test takers and women were found to score relatively higher than older test takers and men. When considered together, the overall effect was rather small but has practical significance in that older students, especially males, tend to have relatively poorer performance on the PANCE. This between-group difference suggests a possible need for alternative instructional approaches or other opportunities for improving the curriculum for targeted subgroups.

Student results on PACKRAT 1 and PACKRAT 2 and subsequent PANCE performance were analyzed in 2004.6 Statistically significant positive relationships were found between the three exams. The testing conditions were noted as a limitation for this study, as the PACKRAT was not used as a criterion for passing a specific course, which could have undermined students’ motivation to perform well on the test. The timing of both PACKRAT exams might have also affected student performance, since first-year students took the exam immediately after a holiday break, and second-year students took the exam one month prior to the completion of their clinical year.

In the same year, a statistically significant correlation between PACKRAT and PANCE scores was found.7 Scores below 55% on the PACKRAT corresponded with an increased likelihood of failing the PANCE. As in the prior study, the PACKRAT score was not used for grading purposes and therefore might not be indicative of actual student knowledge or motivation for achieving higher scores.

In a larger collaborative analysis, authors found a strong correlation between PACKRAT-1 and PACKRAT-2 scores and PANCE success.8 This large multi-institutional study (N = 638) identified a PACKRAT-1 cutoff score of 118 and PACKRAT-2 cutoff score of 127 for distinguishing between likelihood of passing versus failing the PANCE. These levels have been used as target PACKRAT scores in subsequent years of PA study.

More recently, PACKRAT scores, comprehensive summative evaluation scores, and first-attempt PANCE scores were correlated over 4 years.9 The PACKRAT was given 6 months prior to the program’s summative examination and first PANCE attempt. Its use was intended as a diagnostic tool for students’ self-directed learning and was not included in a course grade. Results support PACKRAT as a helpful tool to improve students’ performance on both the summative examination and PANCE. However, in only one group of students were PACKRAT scores below 135 found to correspond to low passing scores on the PANCE as well as high rates of first-time failure
of the summative exam.

Other health care fields have also evaluated factors that influence first-time pass rates on national certifying examinations. Five academic variables (overall GPA, athletic training GPA, academic minor GPA, ACT Composite score, and number of university semesters) could be used as predictors of first-time pass rates on The National Athletic Trainer’s Association Board of Certification Examination.10 No single variable was identified as most predictive, but academic variables were determined to be the strongest predictors. A retrospective population-based cohort study of student performance on the National Physical Therapy Examination (NPTE) was evaluated by attempting to determine the failure rates of students who experienced academic difficulty during their professional training.11 This sampling of 20 physical therapy programs predicted with 95% confidence that students who had academic difficulties during training had 5.89 times the odds of first-time NPTE failure compared to students who did not experience academic difficulties.

In summary, the literature that examines predictive measures of PANCE passage is hopeful. Other research has focused on predicting medical and health care academic performance using admissions criteria, but these studies typically do not focus on predicting successful first-attempt board or licensing examinations. Of the many didactic and clinical assessments administered throughout the student’s training, several indicate strong potential for serving as leading indicators of PANCE performance given that the right testing conditions, timing, interpretation, and combination of measures are used. The purpose of this research effort is to contribute to knowledge in this area to help students transition from pedagogy to certification and, ultimately, to successful practice in the field.

METHODS
Description of Measures
Five standardized measures of student knowledge were tested for their individual and combined accuracy for predicting student performance on the PANCE. The five measures were collected at different points during the 2-year PA program, reflecting the level of achievement commensurate with the type and amount of instruction provided. The five predictor measures were: (1) Overall GPA (OGPA; the overall undergraduate GPA); (2) Prerequisite GPA (PGPA; the GPA for courses that are PA undergraduate prerequisites); (3) Multiple Choice Question Exam (MCQ; the composite score for a series of multiple-choice exams given throughout the didactic year); (4) Summative Exam (SUMM; the 200 multiple-choice-question exam representing all areas of medicine and reflecting the types of questions that may be encountered on the PANCE and given at the end of the clinical year, approximately 3 months before becoming eligible for the PANCE); and (5) PACKRAT.

The internally developed exams (MCQ and SUMM) have remained consistent from year to year. A Kuder-Richardson Formula 20 analysis indicates adequate internal consistency and reliability as reflected by point biserial correlation of each test item with the exam as a whole. MCQ questions are taken from a faculty-created database of questions uploaded onto Exam Master.12

Hypotheses
Five hypotheses regarding the relationships between predictors and the PANCE were evaluated:

• H1: Each of the five predictor measures (OGPA, PGPA, MCQ, SUMM, and PACKRAT) is positively correlated with PANCE.

• H2: PACKRAT explains the most unique variance when differentiating PANCE Pass versus Fail groups, as compared with the unique variance explained by each of the other four predictors (OGPA, PGPA, MCQ, and SUMM).

• H3: SUMM adds predictive accuracy beyond PACKRAT when differentiating PANCE Pass versus Fail groups, as compared with the unique variance provided by the remaining three predictors (OGPA, PGPA, and MCQ).

• H4: MCQ adds predictive accuracy beyond PACKRAT and SUMM when differentiating PANCE Pass versus Fail groups, as compared with the unique variance provided by the remaining two predictors (OGPA and PGPA).

• H5: When combined, MCQ and SUMM have predictive accuracy that is comparable to the PACKRAT alone for identifying students “at risk” of failing the PANCE.

Data Analysis
Three consecutive years of data (2007, 2008, and 2009) were collected, standardized, and analyzed to determine the optimal combination and weighting of predictors for discriminating between passing and failing PANCE performance. Two phases of analysis were conducted to address the stated hypotheses. Each phase of analysis utilized the 2007 dataset (N = 47) as the development sample; the 2008 and 2009 datasets (N = 45 and 49, respectively) served as validation samples. A detailed
description of methodologies used to develop and validate the PANCE predictive algorithm is provided below.

First, a discriminant analysis was conducted in which PANCE scores were dichotomized into a “Pass” group (scoring at or above 450) and a “Fail” group (scoring below 450). The 450 cut score is substantially higher than the typical minimum passing score on NCCPA (roughly 350 depending upon the specific year). The higher cut score of 450 was selected for this research to ensure a minimally acceptable number of observations in the Pass and the Fail groups. In reality, very few PA graduates actually fall below the 350 “Fail” threshold on the PANCE.

Second, multiple regression analysis was performed to more realistically gauge the accuracy of the predictive function. Specifically, PANCE score estimates were generated from predictors alone and in combination to determine the optimal model for accurately classifying “at risk” students. “At risk” was defined as predicted PANCE scores that fall below a specified cut score. Three cut score levels were tested: the actual “Fail” threshold (< 350), a small bandwidth above the actual “Fail” threshold (< 375), and a larger bandwidth above the “Fail” threshold (< 400).

The rationale for testing predictive functions at three cut score levels rather than only at the true “failure” cut score level is two-fold. First, the intent of this research is to provide PA educators with an “early alert” system for identifying students in need of remedial instruction. An early alert for a broader range of poor performers would be desirable as a greater number of poor performers could be identified and assisted, which in turn could raise the overall quality of the class. Second, the validity of predictive functions needs to be tested at each proposed cut score level to optimize classification accuracy. The measures in this study are reliable but are unlikely to be perfectly linear in their relationships with each other or with the PANCE at all points in its range. For example, a measure might be more accurate for predicting higher versus lower PANCE scores. It is thus important to verify predictive accuracy for individual measures and functions at each cut score level, so that the strongest function can be identified and implemented at a cut score level that optimizes its performance.

When combined, the conceptual discriminant analysis followed by the realistic multiple regression analysis provided convergent validity for specifying a predictive function for PANCE performance that captures the maximum number of low performers. Given the importance of accurately identifying those students most in need of early intervention, the measures and weights used to anticipate PANCE difficulties further down the road demand a multimethod statistical approach that ensures rigor as well as practicality when applying the research findings.

RESULTS
Goodness of Data
Before attempting to run any inferential tests, overall data quality was established. All indices confirmed sufficient data quality. The “goodness of data” indices did not reveal any significant deviations from normality, in terms of skewness and kurtosis (z scores < 1.96, ns, for indices; see Table 1). When all predictors were entered into the discriminant analysis, Box M test of matrix variance is nonsignificant, ie, covariance matrices are homogeneous (Box M = 13.238, ns; see Table 2). In sum, the measures to be used in this study conformed to the properties of a normal distribution, demonstrated homogeneity of covariance, and otherwise did not suggest any major violations of parametric statistical assumptions.
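The normality screen described above divides each skewness or kurtosis statistic by its standard error and compares the resulting z score with 1.96. A minimal sketch follows, assuming the standard small-sample standard-error formulas reported by common statistical packages; the paper does not say which formulas were used, and the function names here are illustrative. For N = 47 these formulas give 0.347 and 0.681, the same SE values that appear in Table 1.

```python
from math import sqrt

def se_skewness(n):
    """Standard error of sample skewness for a sample of size n (assumed formula)."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of sample excess kurtosis for a sample of size n (assumed formula)."""
    return 2 * se_skewness(n) * sqrt((n * n - 1) / ((n - 3) * (n + 5)))

def flags_nonnormal(statistic, se, critical=1.96):
    """True when |statistic / se| exceeds the two-tailed .05 critical value."""
    return abs(statistic / se) > critical

n = 47  # development sample size reported in the study
print(round(se_skewness(n), 3), round(se_kurtosis(n), 3))
print(flags_nonnormal(0.209, se_skewness(n)))  # MCQ skewness statistic from Table 1
```

A measure would be treated as departing from normality only when `flags_nonnormal` returns True for its skewness or kurtosis.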
Table 1
                    Skewness                                      Kurtosis
Measure    N    Statistic  SE     z score  1.96, P < .05    Statistic  SE     z score  1.96, P < .05
MCQ        47   0.209      0.347  0.585    ns               0.193      0.681  0.270    ns
SUMM       47   0.124      0.347  0.347    ns               0.242      0.681  0.339    ns
OGPA       47   0.541      0.347  1.514    ns               -0.711     0.681  -0.995   ns
PGPA       47   0.134      0.347  0.375    ns               -1.073     0.681  -1.502   ns
PACKRAT    47   0.363      0.347  1.016    ns               0.992      0.681  1.388    ns
NCCPA      47   0.58       0.347  1.623    ns               0.861      0.681  1.205    ns
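The Pass/Fail discriminant step described under Data Analysis, and the leave-one-out check applied to it in the hypothesis tests, can be sketched for a single predictor. This is a simplified illustration, not the authors' stepwise procedure: it assumes one predictor, equal group variances, and priors taken from group proportions. All scores below are hypothetical, and the function names are illustrative.

```python
from math import log
from statistics import mean

def fit_rule(scores, passed):
    """Fit a one-predictor linear discriminant rule.

    Returns (threshold, direction); a score is classified as Pass when
    (score - threshold) * direction > 0.
    """
    hi = [s for s, p in zip(scores, passed) if p]
    lo = [s for s, p in zip(scores, passed) if not p]
    m_pass, m_fail = mean(hi), mean(lo)
    # Pooled within-group variance (equal group variances assumed).
    ss = sum((s - m_pass) ** 2 for s in hi) + sum((s - m_fail) ** 2 for s in lo)
    var = ss / (len(scores) - 2)
    # Boundary where both group posteriors are equal, with priors taken
    # from the group proportions in the fitting sample.
    shift = var * log(len(lo) / len(hi)) / (m_pass - m_fail)
    return (m_pass + m_fail) / 2 + shift, (1 if m_pass > m_fail else -1)

def classify(rule, score):
    threshold, direction = rule
    return (score - threshold) * direction > 0

def accuracy(rule, scores, passed):
    """Share of cases whose predicted group matches the actual group."""
    hits = sum(classify(rule, s) == p for s, p in zip(scores, passed))
    return hits / len(scores)

def leave_one_out_accuracy(scores, passed):
    """Refit the rule without each case in turn, then classify that case."""
    hits = 0
    for i in range(len(scores)):
        rule = fit_rule(scores[:i] + scores[i + 1:], passed[:i] + passed[i + 1:])
        hits += classify(rule, scores[i]) == passed[i]
    return hits / len(scores)

# Hypothetical development sample; PANCE is dichotomized at the study's 450 cut.
pance = [520, 480, 610, 455, 700, 470, 400, 380, 430]
packrat = [150, 155, 160, 165, 170, 148, 120, 125, 130]
passed = [p >= 450 for p in pance]

rule = fit_rule(packrat, passed)
print(accuracy(rule, packrat, passed), leave_one_out_accuracy(packrat, passed))
```

Applying `accuracy(rule, ...)` to a later cohort's scores corresponds to the validation-sample checks, where a rule fitted on the development year is tested on other years.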
Table 2. Homogeneity of Covariance
Box M        13.238
F Approx.    0.687
df1          15.000
df2          1069.745
Sig.         0.799

Test of H1: Each of the five predictor measures (OGPA, PGPA, MCQ, SUMM, and PACKRAT) is positively correlated with PANCE. Hypothesis 1 is supported. Significant positive Pearson R correlations between all measures indicated linear relationships between each of the independent variables with the dependent variable PANCE. Correlations with PANCE ranged from .539 to .858; all are significant at P <.00 (see Table 3).

In addition to a significant correlation with the dependent measure PANCE, predictors were significantly correlated with each other. This overlap suggested the possibility of multicollinearity in the predictor set. Stepwise discriminant and multiple regression analysis will control for this redundancy, retaining only the significantly unique variance in the resulting solution. The internal validity of each predictor was further ensured by comparing the mean PANCE between-group scores (“Pass,” n = 37, raw score >= 450; “Fail,” n = 10, raw score < 450). Mean scores were found to differ significantly in the expected direction, where “Pass” means were significantly higher than “Fail” means on each predictor. Between-group variance was found to be insignificant on all measures except the OGPA measure, where the Pass group demonstrated greater Y1 GPA variability than did the Fail group. As such, between-group mean scores were tested with equal variance assumed for PGPA, MCQ, SUMM, and PACKRAT and unequal variance assumed for OGPA (see Table 4).

Test of H2: PACKRAT explains the most unique variance when differentiating PANCE Pass versus Fail groups, as compared with the unique variance explained by each of the other four predictors (OGPA, PGPA, MCQ, and SUMM). Hypothesis 2 is supported. When all predictor variables were included in a stepwise discriminant analysis, PACKRAT was retained as the only meaningful predictor in the final model, with all of the other predictors excluded. This result implies substantial redundancy between PACKRAT and the other predictors in the set when predicting PANCE Pass versus Fail group membership. The PACKRAT model explained
32.6% of PANCE between-group variance, accurately classifying 85.1% of the development sample cases. Cross validation of the PACKRAT model was performed using three distinct methods: (1) “Leave-one-out” classification, whereby cases in the development sample (ie, the 2007 dataset) were systematically excluded and then reclassified using the model developed from the remaining cases; (2) 2008 classification, whereby the model developed from the 2007 dataset was tested on the 2008 dataset; and (3) 2009 classification, whereby the model developed from the 2007 dataset was tested on the 2009 dataset. For each of these cross-validation methods, the PACKRAT model’s classification accuracy was, respectively, (1) 85.1%, (2) 82.2%, and (3) 79.6% (see Table 4).

The stepwise method is attractive for its ability to determine an “optimal” solution where both parsimony and power are maximized. The result is an elegant statistical model that satisfies pure research purposes. Applied research, however, is more concerned with power than parsimony (assuming that organizational resources can support multiple measures). The real world objective is accuracy and predictive power, sacrificing simplicity in the process. With this goal in mind, the other predictor measures were assessed for their incremental contribution to PACKRAT’s classification accuracy in the Tests of H3, H4, and H5 below.

Test of H3: SUMM has the second-highest contribution of unique variance, following PACKRAT, when differentiating PANCE Pass versus Fail groups, as compared with the unique variance provided by the remaining three predictors (OGPA, PGPA, and MCQ). Hypothesis 3 is supported. Results indicate that superior classification accuracy did in fact exist for the model containing both the PACKRAT and the SUMM predictors, as compared with the model containing the PACKRAT predictor alone. The PACKRAT/SUMM discriminant function accounted for relatively more unique variance than did PACKRAT alone (33.9% vs. 32.6%). Cross-validation results revealed as many or more cases correctly classified by the PACKRAT/SUMM model in all three cross-validation tests (87.2%; 84.4%; 79.6%) as compared with the PACKRAT alone model (85.1%; 82.2%; 79.6%); see Table 5.

The significant partial correlation of SUMM with PANCE, after controlling for PACKRAT, is .494, P <.000, which further demonstrated the internal validity of SUMM as adding unique variance beyond PACKRAT for predicting PANCE (see Table 6).

Test of H4: MCQ adds predictive accuracy beyond PACKRAT and SUMM, when differentiating PANCE Pass versus Fail groups, as compared with the unique variance provided by the remaining two predictors (OGPA and PGPA). Hypothesis 4 is partially supported. Results indicate that the model containing PACKRAT, SUMM, and MCQ did account for relatively more unique variance (35%) than PACKRAT alone (32.6%) or PACKRAT and SUMM combined (33.9%). However, the cross-validation results revealed fewer cases correctly classified by the PACKRAT/SUMM/MCQ model in all three
Table 9. Multiple Regression “Hit Rates” Using Different Cut Scores and Predictive Models*
*NOTE: This chart shows only the hit rate, or correct classification percentage, of “at risk” students. For example, PACKRAT accurately classifies 50%, or 1 of
the 2 students, who scored below 350 in 2007, while PACKRAT and SUMM combined accurately classify 100%, or 2 of the 2 students, who scored below
350 in 2007. The correct classification of students who “passed” the PANCE is not shown in this chart, nor is it included in the hit rate percentage. The hit
rate among the disproportionate number of students who “pass” causes the hit rate of “at risk” students to be overshadowed, rendering the results uninterpretable.
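The hit rate defined in the note above counts only students whose actual PANCE score fell below the cut score and asks what share of them the model also predicted to fall below it. A minimal sketch follows, using hypothetical scores arranged to mirror the note's "1 of 2" and "2 of 2" examples. The function name and all data are illustrative and are not taken from the study.

```python
def hit_rate(actual, predicted, cut):
    """Share of truly at-risk students (actual < cut) also predicted below cut.

    Returns None when no student actually scored below the cut score.
    Students who scored at or above the cut are ignored, as in the note.
    """
    at_risk = [(a, p) for a, p in zip(actual, predicted) if a < cut]
    if not at_risk:
        return None
    hits = sum(1 for _, p in at_risk if p < cut)
    return hits / len(at_risk)

# Hypothetical cohort: two students actually scored below 350.
actual = [340, 345, 420, 510, 600]
pred_single = [360, 330, 430, 500, 590]    # e.g. estimates from one predictor
pred_combined = [335, 348, 410, 520, 580]  # e.g. estimates from two predictors

print(hit_rate(actual, pred_single, 350))    # 0.5 (1 of 2 at-risk students)
print(hit_rate(actual, pred_combined, 350))  # 1.0 (2 of 2 at-risk students)
```

Repeating the call with cut scores of 375 and 400 would correspond to the wider at-risk bands that the study also tests.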
2008, and 2009); see Table 9. These analyses served to verify predictors under more realistic conditions in which a larger bandwidth of low-performing students could be targeted rather than only the students who fall below the actual failure cut score of 350. External validity checks at multiple pass/fail cutoffs also control for possible nonlinear measurement error that could influence discriminant function performance at the low end of the PANCE scale. Predictive accuracy at the low end of the PANCE range is of critical importance and must be optimized in terms of overall predictive accuracy of the function and maximum classification accuracy at a specified cut score level.

Hit rate percentages of accurately classified “at-risk” students, defined as falling below each of the specified cut scores and averaged across the 3 years of data collection, are summarized in Table 10. Results indicated variability in predictor accuracy across cut score levels. At the lowest cut score level of 350, the PACKRAT/SUMM solution was a better predictor of at-risk students on average than any other predictor alone or in combination (50% accuracy), while MCQ alone, SUMM alone, and PACKRAT alone tied as the weakest solutions (16% accuracy). At the 375 cut score level, the PACKRAT/SUMM function dropped down to 33% accuracy, while SUMM/MCQ demonstrated improved accuracy of 67%. The weakest solutions at the 375 cut score level were again PACKRAT alone and MCQ alone with 25% classification accuracy of “at-risk” students. Lastly, the strongest solution at the 400 level cut score was SUMM alone (67% accuracy), while MCQ alone was the weakest solution (33% accuracy).

[Figure: SUMM, MCQ, and PACKRAT plotted by Cut Scores (350, 375, 400); vertical axis 0 to 50]

The results of the discriminant analysis and multiple regression analyses do not neatly converge in support of a single predictive model; however, the combined evidence points to SUMM/MCQ at a 375 cut score as the optimal model for early detection of struggling PA students (71–83% discriminant function classification accuracy and a 67% multiple regression “hit rate” at the 375 cut score level). Alternatively, SUMM alone at a cut score of 400 performs nearly as well (65–79% discriminant function classification accuracy and a 67% multiple regression “hit rate”). As discussed in the next section, the SUMM alone model might be the better solution to implement in an
actual program setting since this is the most parsimonious approach (predicting PANCE from one set of test scores rather than from a combination of two sets of test scores) and the most inclusive solution (captures more students in need of additional instruction due to its optimal performance at the higher cut score level of 400 vs. 375), and it does so without sacrificing real world predictive accuracy (both algorithms produce the same 67% classification accuracy “hit rate”).

DISCUSSION
Based on the analysis of the data presented, it is apparent that there is no way to predict 100% of the time which students will be unsuccessful when attempting to pass the PANCE. However, moderate accuracy can be achieved when predicting PANCE performance with the SUMM/MCQ algorithm at a 375 cut score or with SUMM at a 400 cut score. The latter solution may be preferable due to its parsimony, as only one measure (SUMM) is needed to predict PANCE rather than a combination of two measures (SUMM and MCQ). Another compelling feature of the SUMM alone solution is its higher optimal cut score, which serves to capture a broader range of at-risk students than the SUMM/MCQ solution.

An important goal of this research effort was to test algorithm performance at higher cut score levels in order to identify a broader range of “at-risk” students, ie, those who might score within 50 points of the 350 passing score and are still within the danger zone of failing the exam. From this real world, applied perspective, the SUMM solution is preferable, since it functions best at a cut score of 400 and would capture more “at-risk” students than SUMM/MCQ, which functions best at the lower cut score level of 375.

While some aspects of this research effort were realized, other important objectives remain unfulfilled. Specifically, a primary goal was to identify at-risk students as early as possible in their PA education so that intervention strategies could be performed sooner and lead to the desired result of higher first-time PANCE passage rates. The authors believe that intervention strategies performed as early as possible would most likely lead to the highest chance of success. Unfortunately, the data did not support identification early in the educational program. The earliest marker tested was MCQ. This marker alone was not a good predictor of at-risk students. However, when combined with SUMM or PACKRAT performance measured in the clinical year, predictability was significantly stronger. It is interesting to note that correlation strength inversely corresponded to the amount of time that passed in between the two measurements.

Specifically, the longest interval of time was between OGPA and PANCE, where the relatively weakest correlation was found (r = 0.539). In contrast, PACKRAT and PANCE were taken with minimal time in between and resulted in a relatively stronger correlation (r = 0.858). The other measures followed this pattern, where a greater time interval between measures was associated with a weaker