Naval Postgraduate School: NPS-CS-20-002

This technical report analyzes machine learning methods for predicting naval aviator training outcomes. Researchers assembled a database of 18,596 pilots and Naval Flight Officers covering pretesting, classroom work, generic aircraft training, and specialized aircraft training. The data was standardized and aggregated to create 301 features per candidate. Correlation analysis identified some early indicators of success or failure in the program, though most were not surprising. The report concludes the Navy is effective at identifying candidates likely to succeed in training.

NPS-CS-20-002

NAVAL
POSTGRADUATE
SCHOOL
MONTEREY, CALIFORNIA

MACHINE LEARNING FOR ANALYSIS OF NAVY AVIATOR

TRAINING

by

Neil C. Rowe and Arijit Das

October 2020

Approved for public release; distribution unlimited


Prepared for: U.S. Fleet Forces Command
Form Approved
REPORT DOCUMENTATION PAGE OMB No. 0704-0188
Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources,
gathering and maintaining the data needed, and completing and reviewing this collection of information. Send comments regarding this burden estimate or any other aspect of this
collection of information, including suggestions for reducing this burden to Department of Defense, Washington Headquarters Services, Directorate for Information Operations and
Reports (0704-0188), 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no
person shall be subject to any penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. PLEASE DO NOT
RETURN YOUR FORM TO THE ABOVE ADDRESS.
1. REPORT DATE (DD-MM-YYYY) 2. REPORT TYPE 3. DATES COVERED (From-To)
14-10-2020 Technical Report 01-10-2019 – 14-10-2020
4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER
Machine Learning for Analysis of Naval Aviator Training
5b. GRANT NUMBER
NPS-20-N309-B-NRP
5c. PROGRAM ELEMENT
NUMBER

6. AUTHOR(S) 5d. PROJECT NUMBER


Neil C. Rowe
Arijit Das 5e. TASK NUMBER

5f. WORK UNIT NUMBER

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING


Naval Postgraduate School ORGANIZATION REPORT
1 University Circle NUMBER
Monterey, CA 93943 NPS-CS-20-002
9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR’S
U.S. Fleet Forces Command, 1562 Mitscher Ave., Suite 250, Norfolk VA 23551-2487 ACRONYM(S)
UCFLTFORCOM
11. SPONSOR/MONITOR’S
REPORT NUMBER(S)

12. DISTRIBUTION / AVAILABILITY STATEMENT


Distribution Statement A: Distribution unlimited

13. SUPPLEMENTARY NOTES

14. ABSTRACT
This project investigated patterns in the training data of Navy aviators in an attempt to predict their success in training.
With the help of the sponsor, we assembled a database from many sources of training data. This database covered
18,596 pilot and Naval Flight Officer candidates through their pretesting, classroom instruction, candidate training in
generic aircraft, and candidate training in specialized aircraft. This data was a challenge to organize because it had
incompatible formats and missing data. After standardizing the formats and fixing errors in the data, and aggregating
sparse records to a smaller set of average scores, we had 301 features for the candidates. We then correlated their
features using both numeric-correlation and nonnumeric-association (class-characterization) methods. We identified
38 kinds of measures of success in the program and particularly focused on correlations involving those. We did
confirm some early indicators of success and failure in the program, but most were not surprising. We conclude that
the Navy is doing a good job of identifying candidates likely to be successful.
15. SUBJECT TERMS
training, pilots, aviators, performance, testing, prediction, database, classroom, regression, correlation, classes
16. SECURITY CLASSIFICATION OF: 17. LIMITATION 18. NUMBER 19a. NAME OF
a. REPORT b. ABSTRACT c. THIS PAGE OF ABSTRACT OF PAGES RESPONSIBLE PERSON
Unclassified Unclassified Unclassified None 31 Neil C. Rowe
19b. TELEPHONE
NUMBER (include area code)
(831)656-2462
Standard Form 298 (Rev. 8-98)
Prescribed by ANSI Std. Z39.18

NAVAL POSTGRADUATE SCHOOL
Monterey, California 93943-5000

Ann E. Rondeau Rob Dell


President Acting Provost

The report entitled “Machine Learning for Analysis of Naval Aviator Training” was
prepared for Navy Pacific Fleet and funded by NPS Naval Research Program.

Further distribution of all or part of this report is subject to the Distribution
Statement appearing on the front cover.

This report was prepared by:

________________________ _______________________
Neil C. Rowe Arijit Das
Professor of Computer Science Research Associate

Reviewed by: Released by:

________________________ ________________________
Gurminder Singh Jeffrey D. Paduan
Chairman of Computer Science Dean of Research

ABSTRACT

This project investigated patterns in the training data of Navy aviators in an attempt to
predict their success in training. With the help of the sponsor, we assembled a database
from many sources of training data. This database covered 18,596 pilot and Naval
Flight Officer candidates through their pretesting, classroom instruction, candidate
training in generic aircraft, and candidate training in specialized aircraft. This data was a
challenge to organize because it had incompatible formats and missing data. After
standardizing the formats and fixing errors in the data, and aggregating sparse training
records to a smaller set of average scores, we had 301 features for the candidates. We
then correlated their features using both numeric-correlation and nonnumeric-association
(class-characterization) methods. We identified 38 kinds of measures of success in the
program and particularly focused on correlations involving those. We did confirm some
early indicators of success and failure in the program, most of which were not surprising.
We conclude that the Navy is doing a good job of identifying candidates likely to be
successful.

I. INTRODUCTION AND PREVIOUS WORK

This project investigated methods of predicting the training performance of pilots from
earlier data on them. The goal was to test which features, and combinations of features,
were most helpful in guiding the Navy on investments in training of pilots and flight officers.

Military training assessment has many difficulties due to the expense of staging realistic
exercises and the rarity of exceptional events for which warfighters must be ready (Salas,
Milham, and Bowers, 2003; Schnell, Keller, and Poolman, 2008). Skills decay is an
important issue for this kind of training (Schendel and Hagman, 1991; Foggliatto and
Anzanello, 2011; Ebbatson et al, 2012). It is thus important to thoroughly exploit
existing data through data-mining techniques to get early warning of potential problems
(Dubey, 2016; Huggins, 2018; Gombolay, Jensen, and Son, 2019). An important
subproblem is that of predicting future pilot performance, for which a variety of data-
mining techniques have been tried (Kaplan, 1965; Hunger and Burke, 2009; McFarland,
2017).

Previous analysis of the Naval training data by the sponsor used regression analysis.
However, many attribute values were missing in this data, and regression does not work
well on incomplete data. Our previous work (Rowe, 2012) provided some more robust
approaches. It examined records of carrier landings as graded by Landing Signal
Officers. We were able to show the rate at which landing success and quality increased
with experience, and we were able to correlate phrases in the comments on the landings
with the degree of eventual success the candidate had.

II. ANALYSIS SETUP

A. SIMPLIFYING THE DATA


The sponsor sent us data in 143 Excel tables concerning Navy training performance by
18,596 training candidates. We first converted the tables into CSV (comma-separated-
value) files to make them easier to manipulate with programs. The main categories of
tables were:

• The ASTB_IFS_API_PRI_v2.1 table which reports data from early in training.


• The “Cumulative_All_Students_2012-2019” table giving basic information about
trainee pilots such as their air wing and curricula.
• The API_DATA table which appears to contain scores additional to those in the preceding tables.
• “Academic” tables reporting test scores on written tests in training after API
instruction. These we averaged for each pilot for each course as we will explain.
• The other tables reporting scores on flight performance by the pilot trainees
(which we called “maneuver” tables). These we averaged for each pilot as we
will explain.
• “1542” files which have some data not appearing in any other tables.
A traditional database design would keep the tables separate and do joins between the
tables on their primary keys, the ID codes. See the database design discussion later.
However, the total number of pilot trainees was small enough, with 18,596 explicit pilot
ID codes, that it was simpler and more efficient to do the joins in advance and store a
single flat file in main memory for analysis. When we did this, the file was 301 columns
and 46.3 megabytes, a size that will not require much paging when stored in main
memory since processing generally can operate on one candidate at a time.

We had to be careful about the joins because there was much missing data. Some tests
are not used in some curricula; some candidates are authorized to skip certain tests; some
candidates drop out of the program and lack data for the later stages of training; and some
candidate data could not be located in the incomplete data we were given. For these
reasons, it was important to do outer joins rather than the traditional inner joins to
connect tables, meaning that unmatched values in one table were represented by null
values for their columns in the join.
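The effect of an outer join on tables with missing candidates can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the project; the function name, the ID codes, and the score values are hypothetical, and only the column names AQR and API_NSS come from the data described in this report.

```python
NULL = None  # stand-in for a missing (null) value


def outer_join(left, left_cols, right, right_cols):
    """Full outer join of two tables keyed by candidate ID code.

    Each table is a dict mapping ID code -> {column: value}.
    Candidates missing from one table get nulls for that table's columns.
    """
    joined = {}
    for cid in set(left) | set(right):
        lrow = left.get(cid, {})
        rrow = right.get(cid, {})
        row = {c: lrow.get(c, NULL) for c in left_cols}
        row.update({c: rrow.get(c, NULL) for c in right_cols})
        joined[cid] = row
    return joined


# Hypothetical data: candidate 101 has no API record, candidate 103 has no ASTB record.
astb = {101: {"AQR": 6}, 102: {"AQR": 4}}
api = {102: {"API_NSS": 52}, 103: {"API_NSS": 47}}
flat = outer_join(astb, ["AQR"], api, ["API_NSS"])
```

An inner join would keep only candidate 102 here; the outer join keeps all three and marks the gaps with nulls.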

B. DATA CLEANUP
The sponsor sent us data of many types from several sources. Some of the data was
numeric like traditional test scores; some was numeric in a limited range, such as grades
of 1, 2, 3, 4, or 5 on flight tests; and some of it was nonnumeric, such as candidate race,
the kind of previous flight training they had, and whether they had been given an
exemption on a particular evaluation. Pilot names and other personally identifiable
information were excluded.

There were many null values (blanks) in the data for measurements and features that did
not apply to particular candidates, such as tests not taken in their curriculum. Null values
were inferred for the empty string, a string consisting of a single space, “N/A”, “#N/A”,
“NONE”, and “NULL”. These were replaced by the string “NULL” to regularize them.
4804 null values for candidate ID codes occurred in the ASTB_IFS_API_PRI_v2.1 table
for records from 2000 to 2010; they were replaced with consecutive negative numbers
since the rest of their rows contained significant information. Null values for numeric
attributes generally meant missing values, so we excluded them from averages, but nulls
for nonnumeric attributes were generally important, such as a null for the type of
previous flight training which meant the candidate had no previous flight training.
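A minimal sketch of this null handling is given below. It is an illustration under stated assumptions rather than the project code: the column name "ID" and the sample rows are hypothetical, and matching of the null strings is assumed to be exact and case-sensitive.

```python
# Strings treated as null, as listed in the text above.
NULL_TOKENS = {"", " ", "N/A", "#N/A", "NONE", "NULL"}


def normalize_nulls(rows):
    """Replace every recognized null string with the single marker "NULL"."""
    return [
        {col: ("NULL" if val in NULL_TOKENS else val) for col, val in row.items()}
        for row in rows
    ]


def fill_missing_ids(rows, id_col="ID"):
    """Give rows with a null ID code consecutive negative ID codes (-1, -2, ...)."""
    next_id = -1
    for row in rows:
        if row[id_col] == "NULL":
            row[id_col] = str(next_id)
            next_id -= 1
    return rows


def mean_ignoring_nulls(values):
    """Average numeric strings, excluding nulls; None if nothing remains."""
    nums = [float(v) for v in values if v != "NULL"]
    return sum(nums) / len(nums) if nums else None


# Hypothetical rows with two missing ID codes and one missing score.
rows = [
    {"ID": "", "score": "N/A"},
    {"ID": "17", "score": "80"},
    {"ID": " ", "score": "90"},
]
rows = fill_missing_ids(normalize_nulls(rows))
average = mean_ignoring_nulls(r["score"] for r in rows)
```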

We had to regularize other inconsistent formats. For instance, some grades were 0 and 1
and others were Y and N for the same test. Most training scores were the integers 0 to
100, but some were 10000 and had to be changed to 100 to avoid distorting averages.
We also converted some nonnumeric values into numeric values when it appeared
reasonable and helpful for analysis. For instance, pilot course status was rated as
“Complete”, “Pass”, “Incomplete”, “Conditional Pass”, and “Fail”; to get averages for a
pilot, we converted the first two to numeric value 1.0, the next two to 0.5, and the last to
0.0. Dates we converted to epoch time (seconds since January 1, 1970 at midnight) so
they became numeric and easier to compare.
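These conversions can be sketched as small helper functions. This is an illustrative sketch, not the project code. Several details are assumptions: the label "Fail" for the status mapped to 0.0, the choice that Y corresponds to 1 and N to 0, the date format, and the use of UTC for epoch time.

```python
from datetime import datetime, timezone

# Course status to numeric value. The "Fail" label is an assumption.
STATUS_VALUE = {
    "Complete": 1.0,
    "Pass": 1.0,
    "Incomplete": 0.5,
    "Conditional Pass": 0.5,
    "Fail": 0.0,
}


def fix_score(score):
    """Map the anomalous training score 10000 to 100; leave other scores alone."""
    return 100 if score == 10000 else score


def yn_to_binary(value):
    """Put Y/N and 0/1 grades on one 0/1 scale (assumes Y means 1)."""
    return {"Y": 1, "N": 0, "1": 1, "0": 0}[str(value)]


def to_epoch(date_text, fmt="%Y-%m-%d"):
    """Convert a date string to seconds since January 1, 1970 (UTC assumed)."""
    dt = datetime.strptime(date_text, fmt).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())
```

For example, `to_epoch("1970-01-02")` gives 86400, one day after the epoch, so later dates always compare as larger numbers.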

C. CONSOLIDATING FLIGHT TEST AND ACADEMIC DATA


The main challenge in data setup for this project was dealing with the many tables for
specialized flight tests and academics (143 in all) in the later stages of training, many of
which had multiple rows for the same pilot and many nulls in the flight tests. Our study
of the tables indicated they could be combined horizontally in two circumstances. First,
some tables were labeled “v2” which meant they were the second half of another table
that exceeded the Excel size limit, so we combined those. Second, some tables were
labeled with names of different training wings but covered the same skills, so we
combined those. (We also considered combining some tables that were labeled “CH-1”
and “CH-2”, but we decided against it because they had differences in the column names
and we did not understand what they represented.) However, in seven cases of table pairs
for flight tests of the two acceptable categories, the number of columns differed between
the pair. We found this meant that some tests were not administered to the candidates
listed in one table, so we added columns of nulls to that table to permit appending tables
of equal length.
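Appending two tables whose column sets differ can be sketched as below. This is a minimal illustration, not the project code; the function name, the column names, and the sample rows are hypothetical.

```python
def append_tables(cols_a, rows_a, cols_b, rows_b):
    """Append table B to table A, padding columns absent from either with nulls.

    Rows are dicts of column -> value. Returns the combined column list
    and the combined rows, every row having the same columns.
    """
    all_cols = list(cols_a) + [c for c in cols_b if c not in cols_a]

    def pad(row):
        return {c: row.get(c, "NULL") for c in all_cols}

    return all_cols, [pad(r) for r in rows_a + rows_b]


# Hypothetical example: the second wing did not administer the "formation" test.
cols, rows = append_tables(
    ["ID", "landing", "formation"], [{"ID": "1", "landing": "4", "formation": "3"}],
    ["ID", "landing"], [{"ID": "2", "landing": "5"}],
)
```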

We further chose to aggregate the sparse data of the remaining flight-test tables into
fewer columns since there were so many nulls. We normalized the grades by dividing
them by the corresponding level-of-difficulty (MIF) values, then averaged them for each
pilot for a particular skill. We then took the average for each pilot over all skills on
which they were tested in a curriculum. This meant we had one average grade for each
pilot and curriculum they took, and reduced the number of such tables to two, one for
flight tests and one for academics. This averaging no longer allowed testing the
correlations involving particular grades; however, the database implementation to be
described did allow such queries.
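The two-level averaging can be sketched as follows. This is an illustration under assumptions, not the project code: the record layout (pilot, curriculum, skill, grade, MIF) and the sample values are hypothetical, and records with a null grade or a null or zero MIF are assumed to be skipped.

```python
from collections import defaultdict


def curriculum_averages(records):
    """Average MIF-normalized grades per skill, then per pilot and curriculum.

    records: iterable of (pilot, curriculum, skill, grade, mif) tuples.
    Returns {(pilot, curriculum): average normalized grade}.
    """
    per_skill = defaultdict(list)
    for pilot, curriculum, skill, grade, mif in records:
        if grade is None or not mif:
            continue  # skip nulls and unusable difficulty values
        per_skill[(pilot, curriculum, skill)].append(grade / mif)

    per_curriculum = defaultdict(list)
    for (pilot, curriculum, _skill), vals in per_skill.items():
        per_curriculum[(pilot, curriculum)].append(sum(vals) / len(vals))

    return {key: sum(v) / len(v) for key, v in per_curriculum.items()}


# Hypothetical records: two graded landings and one graded stall for pilot 1.
records = [
    (1, "PRI", "landing", 4, 4),
    (1, "PRI", "landing", 2, 4),
    (1, "PRI", "stall", 3, 2),
    (1, "PRI", "spin", None, 3),
]
averages = curriculum_averages(records)
```

Here the landing skill averages 0.75 and the stall skill 1.5, so the curriculum average is 1.125; each skill counts once regardless of how many times it was graded.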

III. ANALYSIS

A. CORRELATING PILOT FEATURES


Our analysis used programs we wrote in the Python programming language. Python is
not subject to the size limits of Excel, and it could process the files quickly once they
were converted to comma-delimited text format (CSV). Setup of the data took a few
minutes, and the correlations to be described took a few hours on a single workstation.

The data joined into a single table had 301 distinct columns and 18,596 rows representing
pilot candidates. The columns are listed in Appendix B. To determine predictive ability,
we could convert everything to numbers and run regressions. However, many numeric
attribute values were missing, and regressions can be misled when too many variables are
included, since the weaker factors may confound (interfere with the calculation of) the
stronger factors. Thus we focused on comparing pairs of columns to find those that had
statistically significant correlations on just nonnull numeric values. Once these are
found, regressions can be done on only the statistically significant sets of pairs.
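Comparing a pair of columns on just their nonnull numeric values can be sketched as below. This is a minimal standard-library illustration of a Pearson correlation with null skipping, not the project code; the function name and the sample values are hypothetical.

```python
import math


def pearson_nonnull(xs, ys):
    """Pearson correlation over rows where both values are nonnull.

    Returns None if fewer than two usable rows remain or a column is constant.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


# Hypothetical columns: the third row is dropped because one value is null.
r = pearson_nonnull([1, 2, None, 4], [2, 4, 5, 8])
```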

However, there are two issues. First, the data for the columns is generally acquired in a
specific order, and we want to predict later data from earlier data. This meant only
certain correlations of columns were useful. Studying the sponsor’s diagrams, we
obtained this sequence of training phases:

PRE – ASTB – IFS – API – PRI – PRI2 – INT – ADVCORE – ADV – FRS

Here “PRE” is our own phase label representing information about the candidate before
they start any training such as their previous flight training, their grades in previous
academic work, their gender, and their race. ASTB (Aviation Selection Test Battery)
represents initial testing, IFS (Initial Flight Screening) is initial flight school, API
(Aviation Preflight Indoctrination) is academic work on basic aviation concepts, PRI
(Primary Flight Training) is the first phase of flight experience in designated aircraft, INT
(Intermediate Flight Training) is the second phase, ADV (Advanced Flight Training) is
the third phase, and FRS (Fleet Replacement Squadron) is the graduate program. PRI2
represents later concepts in PRI, and ADVCORE represents earlier concepts in ADV.
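The phase sequence lends itself to a simple ordering check when deciding which column pairs are worth correlating. The sketch below is an illustration, not the project code; the function name is hypothetical, and it allows pairs within the same phase as well as earlier-to-later pairs.

```python
# Training phases in the order given above.
PHASES = ["PRE", "ASTB", "IFS", "API", "PRI", "PRI2", "INT", "ADVCORE", "ADV", "FRS"]
PHASE_RANK = {name: i for i, name in enumerate(PHASES)}


def can_predict(predictor_phase, target_phase):
    """True if the predictor's phase is the same as or earlier than the target's."""
    return PHASE_RANK[predictor_phase] <= PHASE_RANK[target_phase]
```

For example, `can_predict("ASTB", "PRI")` is true, while `can_predict("FRS", "API")` is false because FRS data is collected after API data.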

Candidates increasingly differ in their training as they get more specialized training at the
later stages, but still follow the same basic pattern above. The Naval Flight Officers in
particular have many later courses different from those of the pilots.

Most attributes in our data are associated with particular phases. The file “Key for
ATSB_IFS_API.xlsx” provided phase information on the 116 columns of
ASTB_IFS_API_PRI_v2.1, and the names of the Academic Test and Maneuver Test files
themselves indicated their phases. Phase names for the other columns we determined
from background research.

Another issue is that some columns were numeric and others were nonnumeric
categorical data like graduation status. Though most columns were numeric, some
nonnumeric columns are quite important such as those relating to success of training.
This meant we had to implement four cases in correlating columns:
• Two numeric columns, such as two test scores: We did a Pearson correlation and
a linear regression from the earlier column to the later column. The correlation
was a measure of statistical significance.
• An earlier categorical column and a later numeric column: We compared the
mean in the later column for each categorical value in the earlier column. Degree
of significance was the number of standard deviations from the overall mean of the
later column.
• An earlier numeric column and a later categorical column: We used the same
method as the preceding in the reverse direction.
• Two categorical columns: We measured the statistical significance as the number
of standard deviations of the frequency of the occurrence of the pair of values
from the expected frequency of a Poisson distribution based on the occurrence
rates of the individual values. Specifically, if the value in the first column occurs
n1 times out of N1 and the value in the second column occurs n2 times out of N2,
and the pair of values occurred K times in the data, the number of standard
deviations from the expected frequency is $|K - E| / \sigma$, where $E$ is the expected
frequency of the pair computed from the rates n1/N1 and n2/N2, and $\sigma$ is the standard
deviation of the Poisson distribution with mean $E$.

Rows with nulls for numeric columns being correlated were ignored, but nulls for
nonnumeric columns were useful and their rows retained, such as nulls for final grades
indicating a candidate had dropped out.
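The two measures involving categorical columns can be sketched as below. This is one plausible reading, not the project code, and it rests on assumptions the text does not settle: for the categorical-to-numeric case, the deviation is measured in standard errors of the group mean (overall standard deviation divided by the square root of the group size); for the two-categorical case, the expected pair count is taken as N(n1/N1)(n2/N2) for N rows considered, with a Poisson standard deviation equal to the square root of that expected count. The function names and sample values are hypothetical.

```python
import math
from statistics import mean, pstdev


def category_deviation(categories, values, category):
    """Deviations of one category's mean from the overall mean.

    Rows with a null numeric value are ignored; a null category (None)
    is treated as a category in its own right.
    """
    pairs = [(c, v) for c, v in zip(categories, values) if v is not None]
    all_vals = [v for _, v in pairs]
    group = [v for c, v in pairs if c == category]
    if not group:
        return None
    sigma = pstdev(all_vals)
    if sigma == 0:
        return None
    std_error = sigma / math.sqrt(len(group))  # assumption: standard error of the mean
    return (mean(group) - mean(all_vals)) / std_error


def poisson_pair_deviation(k, n1, N1, n2, N2, N):
    """Deviations of an observed pair count k from its expected count.

    Assumes expected = N * (n1/N1) * (n2/N2) with Poisson std sqrt(expected).
    """
    expected = N * (n1 / N1) * (n2 / N2)
    if expected <= 0:
        return None
    return abs(k - expected) / math.sqrt(expected)


# Hypothetical counts: the pair is expected about 20 times but occurs 40 times.
z_pair = poisson_pair_deviation(k=40, n1=100, N1=1000, n2=200, N2=1000, N=1000)
z_group = category_deviation(["A", "A", "B", "B"], [1, 1, 3, 3], "A")
```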

We did not correlate some columns we considered uninteresting:


• Columns having only one value since we cannot conclude anything from them.
• Nonnumeric columns having more than 100 values since these were unlikely to
show statistically significant trends.
• ID code number. This occurred several times in the join table because we were
using outer joins.
• Raw test scores when normalized scores were available.
• Redundant data on sex and gender.

B. MEASURES OF CANDIDATE SUCCESS


We were primarily tasked to find factors indicating future success or failure of a
candidate. 38 attributes could be relevant, both numeric and categorical:

• RetestStatus and ExamineeStatus attributes of the ASTB data

• IFS_DISENROLLMENT_DESCRIPTION, IFS_STATUS_NUM, and
IFS_USNA_PFP in the ASTB_IFS_API_PRI_v2.1 table
• IFS_ACAD_FAIL and IFS_FLT_FAIL in the ASTB_IFS_API_PRI_v2.1 table
• API_NSS and API_Test_FAILS in the ASTB_IFS_API_PRI_v2.1 table
• Pri, Int, and Adv in the ASTB_IFS_API_PRI_v2.1 table, representing status of
candidates in primary, intermediate, and advanced training
• NGCode in the ASTB_IFS_API_PRI_v2.1 table
• Number of ASTB1-5 and Number of ASTBE in the ASTB_IFS_API_PRI_v2.1
table
• SYL_ST (syllabus status) and STAT_RESN attributes in the Cumulative table
• NSS_UNSATS, OFFICIAL_NMU, NUM_RRU, IPC, FPC, and NSS in the
Cumulative table, which all seem to be related to grading.
• FRS_TW1_Grade, FRS_TW2_Grade, FRS_TW3_Grade, FRS_TW4_Grade,
FRS_TW5_Grade, and FRS_TW6_Grade from the FRS data.
• FRS_TW6_Status from the FRS data. This is the only FRS_Status attribute that
was not null in our data.
• Counts that we calculated on the number of nonnull records for the candidate for
each of the 10 phases. Unsuccessful candidates will be missing data for the later
phases, although incomplete records mean some successful candidates are missing
data for the earlier phases.
• Averages that we calculated for academic and flight-test grades for the PRI, INT,
and ADV phases.
Table 1 summarizes correlation information for these columns of the data with earlier
data. (All columns are listed in Appendix B.) Pearson correlations were considered
positive when greater than 0.1 and negative when less than -0.1; correlations
only compared non-null values. Other correlations had a threshold of significance of 5.0.
Note that the “Suitability for prediction” column does not include obvious correlations such as
those between different status measures. Note also that for prediction purposes, it only makes sense
to assess the effect of an attribute on an attribute at the same or a later phase.
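The thresholds can be applied with a small classifier such as the following. This is an illustrative sketch, not the project code; the function name and labels are hypothetical, and treating a deviation of exactly 5.0 as significant is an assumption, since the text does not say whether the threshold is inclusive.

```python
PEARSON_THRESHOLD = 0.1    # for numeric-to-numeric correlations
DEVIATION_THRESHOLD = 5.0  # standard deviations, for the other three cases


def classify(kind, value):
    """Label a result as reportable or return None if below threshold.

    kind is "pearson" for a correlation coefficient, anything else for a
    count of standard deviations.
    """
    if kind == "pearson":
        if value > PEARSON_THRESHOLD:
            return "positive"
        if value < -PEARSON_THRESHOLD:
            return "negative"
        return None
    # Assumption: the 5.0 threshold is inclusive.
    return "significant" if abs(value) >= DEVIATION_THRESHOLD else None
```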

Table 1: Possible measures of success of a candidate.

| Success-related attribute and phase | Possible nonnull values | Nonnull occurrences | Suitability for prediction |
|---|---|---|---|
| RetestStatus (ASTB) (column 5) | Never, 30Days, 90Days, 180Days, Never1992, Resume | 8540 | Positive correlations of “Never” with AQR_345 > 6.2, SAT_RAW_345 > 0.85, MCT_Z_ATBE > 0.55, AQR_Z_ASTBE > 0.70, OAT_X_ASTBE > 0.70. |
| ExamineeStatus (ASTB) (column 9) | None | None | No nonnull data |
| IFS_STATUS (IFS) (column 76) | Complete, Disenroll, Closing | 13844 | Positive correlations of “Disenroll” with IFS_ACAD_FAIL > 0.30, IFS_FLT_FAIL > 0.08, IFS_USNA_PFP = “PFP Completer” |
| IFS_STATUS_NUM (IFS) (column 78) | Numbers 0.0 to 1.0 | 13834 | Mild (around 0.07) correlations with AQR, PFAR, FOFAR, OAR, SAT. |
| IFS_DISENROLLMENT_DESCRIPTION (IFS) (column 77) | String | 5787 | All nonnull values had significantly fewer previous flight hours. DOR and Performance disenrollment had significantly lower values on MST, RCT, MCT, ANIT, Personality3, 7, DOT, DLT, VTT, Skill, PFAR, FOFAR, OAR, and significantly higher on Personality1, 2, 4, 5, 6. |
| IFS_USNA_PFP (IFS) (column 98) | String | 634 | Value of PFP Attrite was significantly correlated with lower SAT, RCT, MCT, ANIT, Personality2, 3, 4, 5, 7, 8, 9, DOT, DLT, ATT, Skill, AQR, PFAR. |
| IFS_ACAD_FAIL (IFS) (column 93) | Number 0.0 to 1.0 | 13844 | Significantly negatively correlated with AQR, PFAR, FOFAR, OAR, MST, RCT, MCT, ANIT, IFS_STATUS_NUM, less with flight-hours attributes |
| IFS_FLT_FAIL (IFS) (column 95) | Number 0.0 to 1.0 | 13844 | Positive correlations (higher failure rates) for female, African, and Asian; pipeline of SNFO, IFS_DATE_ENROLLED, IFS_FPY_HRS_TO_FIRST_SOLO, IFS_WAIVED_HOURS_TO_SOLO. Negative correlations on HasFormalFlightInstr, MCT, ANIT, PFAR, TOTAL_SOLO_HOURS. |
| API_NSS (API) (column 106) | Integer | 17401 | Positive correlations on AQR, PFAR, FOFAR, OAR, SAT, ANI, MST, RCT, MCT, ANIT, DOTFactor, DLTFactor, AQR; positively correlated with IFS_STG_1, 3, IFS_EOC, IFS_FAA. Negative correlations on IRS_RPY_HRS_TO_FIRST_SOLO, IFS_ACAD_FAIL. |
| API_Test_FAILS (API) (column 107) | Integer | 17446 | Positive correlations (higher failure rates) on female, Afri, Hisp, PacIslander, Aircrew, and Training, pipeline SNFO, IFS_ACAD_FAIL. Negative correlations on AQR, PFAR, FOFAR, OAR, MST, RCT, EOC, FAA. |
| Pri (status in training) (PRI) (column 109) | G, UI, NG, AT, TG, UA | 14461 | Positive correlations of graduate status with FormalFlightInstrHours, SAT, Personality9, AQR, IFS_STATUS value Complete. Negative correlations on IFS_FLT_FAIL, API_Test_FAILS. |
| Int (status in training) (INT) (column 110) | G, AT, UI, NG, MA, J | 5530 | Higher API_NSS for graduates versus non-graduates of this phase, and fewer API_Test_FAILS. |
| Adv (status in training) (ADV) (column 111) | G, AT, UI, NG, UU, TG, SQ, UIT | 16005 | Very similar to INT. |
| NGCode (PRE) (column 113) | 12 strings | 1761 | Too many values for useful correlation. |
| Number of ASTB1-5 (ASTB) (column 114) | Floating point | 14477 | Significantly higher values for female versus male, African and Hispanic versus Caucasian, Aircrew or Training experience, negative correlations with AQR, PFAR, FOFAR, OAR, MST, RCT, and MCT. |
| Number of ASTBE (ASTB) (column 115) | Floating point | 4564 | Very similar correlations to Number of ASTB1-5. |
| SYL_ST (syllabus status) (ADV) (column 124) | Complete, Active, Attrite | 6292 | Higher attrition rates for Adv_Helo, E-2/C-2, 44USCG, Tiltrotor, and USN_P3_P8, and lower rates for Adv_Stk and CV12ADV_TILT. |
| STAT_RESN (reason for syllabus status) (ADV) (column 125) | 20 strings | 260 | Too many values for useful correlation. |
| NSS_UNSATS (ADV) (column 127) | Number | 3012 | Significantly higher for SYL_ST value Attrite. No correlations possible with ASTB and IFS tests because no candidates have data for both. |
| OFFICIAL_NMU (ADV) (column 129) | Number | 3012 | Values strongly positively correlated with NSS_UNSATS. Same comments as for NSS_UNSATS. |
| NUM_RRU (ADV) (column 130) | Number | 3012 | Values positively correlated with NSS_UNSATS. Same comments as for NSS_UNSATS. |
| IPC (ADV) (column 131) | Number | 3012 | Not significantly correlated with anything. |
| FPC (ADV) (column 132) | Number | 3012 | Positively correlated with NSS_UNSATS, ALL_NMU, and OFFICIAL_NMU. |
| NSS (ADV) (column 133) | Floating point | 5221 | Positive correlations on SYL_ST value Attrite. Negative correlations on NSS_UNSATS, ALL_NMU, and OFFICIAL_NMU. |
| FRS_TW6_Grade (FRS) (there were no grades for TW1, 2, 3, 4, or 5) (column 151) | Number | 1274 | Positive correlations on DOTFactor, IFS_FAA, IFS_ACAD_FAIL, API_FY, PRI_TW-5_166A_CH-1_Academics grade, NFO_TW-6_171_MPR_E6_CH-1 grade, PRI_166B_Academics grade, PRI_166A_CH-1_Academics grade, PRI_166A_CH-2_Academics grade, AER2, API_COUNT, PRI_COUNT. Negative correlations on female, Hisp race, lSNP pipeline, PFP Attrite, Pri, Adv; significantly higher for Afri race; HasFormalFlightInstr, Personality5, Personality8, IFS_STG_1, 2, 3, IFS_PRIOR_HRS_2_IFS, NFO_TW-6_171_Core_Academics grade, NFO_TW-6_155C_157B_Int grade, NFO_TW-6_158F_CH-1_Fighter grade, NFO_TW-6_158F_CH-1_Strike grade, NFO_TW-6_164A_CH-1 grade, NFO_TW-6_164 grade, NFO_TW-6_171_E2_MPR_E6 grade, INT_COUNT, ADV_COUNT |
| FRS_TW6_Status (FRS) (column 152) | Number 0.0 to 1.0 | 7682 | Positive correlations on ADV_R_TW-5_156D_Academics, ADV-R_TW-5_156D_GTN650_CN_Academics, ADV-S_TW-1_167A_CH-1_Academics, INT-J-167A_Academics, INT-J_167_CH-2_Academics, NFO_TW-6_158F_CH-1_Fighter_Academics, NFO_TW-6_162_Pri1_Academics, NFO_TW-6_162_Pri2_Academics, NFO_TW-6_164A_CH-1_Academics, NFO_TW-6_171_Core_Academics, NFO_TW-6_171_Core_CH-1_Academics, NFO_TW-6_171_E2_CH1_Academics, NFO_TW-6_171_E2_MPR_E6_Academics, PRI_166A_CH-1_Academics, PRI_166A_CH-2_Academics, ADV-E2_176 grade, ADV-R_TW-5_156D_GTN650 grade, ADV-R_TW-5_156D grade, INT-J_167A grade, NFO_TW-6_155C_157B_Int grade, NFO_TW-6_155C_Primary grade, NFO_TW-6_158F_CH-1_Strike grade, NFO-TW-6_162A_Pri2 grade, NFO_TW-6_162_Pri1 grade, NFO_TW-6_163 grade, NFO_TW-6_164A_CH-1 grade, NFO_TW-6_164 grade, NFO_TW-6_164 grade, NFO_TW-6_171_Core grade, NFO_TW-6_171_E2_CH1 grade, NFO_TW-6_171_E2_MPR_E6 grade, PRI_166A_CH-2 grade, PRI_TW-5_166A_TOP-offs_CH-1 grade, PRI_TW-5_166B grade, AER2, AWX1. Negative correlations on ADV-E2_TW-4_147G_T-44A_Academics, INT-J_167_CH-2_Academics, NFO_TW-6_164A_Academics, PRI_166B_Academics, NFO_TW-6_163A grade, PRI_166A_CH-1 grade, INT_COUNT. |
| Count of nonnull values for the API phase (API) (column 288) | Floating point | 18596 | Positive correlations on MCT, AQR, PFAR, FOFAR, OAR, IFSFISCAL_YEAR; negative correlations on IFS_TOTAL_FLIGHT_TIME. Negative correlations on SNP and SNFO pipelines, Disenroll for IFS_STATUS, IRS_STG_1, 2, 3. |
| Count of nonnull values for the PRI phase (PRI) (column 289) | Floating point | 18596 | Positive correlations on MCT, AQR, PFAR, FOFAR, OAR, SNA pipeline, IFS_STATUS_NUM, IFS_STG_1, 2, 3, PRI_166A_CH-1_Academics grade, NFO_TW-6_162A_Pri1 grade, NFO_TW-6_162_Pri1 grade, PRI_TW-5_166A_Top-offs_CH-1 grade, API_COUNT. Negative correlation on PRI_166A_CH-2_Academics grade. |
| PRI academic average (column 290) | Floating point | 13555 | Positive correlations on “Never”, “180Days”, and “Resume” in RetestStatus, AQR, PFAR, FOFAR, OAR, MST, RCT, ANIT, IFS_EOC, IFS_FAA, API_NSS, AER1, AER2, AWX1, ENG1, FRR1, NAV1. Negative correlations on HasFormalFlightInstr, IFS_PILOT_SCHOOL of Trident Gulf Shores, IFS_ACAD_FAIL, API_Test_FAILS, PRI_COUNT. |
| PRI flight average (column 291) | Floating point | 10664 | Positive correlations on HasFormalFlightInstr, AQR, PFAR, FOFAR, OAR, SAT, ANI, MCT, ANIT, DOTFactor, IFS_EOC, IFS_FAA, API_NSS, AER1, AER2, AWX1, ENG1, FRR1, NAV1. Negative correlations on IFS_ACAD_FAIL, IFS_FLT_FAIL, API_Test_FAILS. |
| Count of nonnull values for the INT phase (INT) (column 293) | Floating point | 18596 | Positive correlations on License for previous training, AQR, PFAR, FOFAR, OAR, SAT, ANIT, MCT, ATTFactor, SNA pipeline, API_NSS, PRI_166A_CH-1_Academics grade, PRI_166A_CH-2_Academics grade, PRI_166B_Academics grade, NFO_TW-6_155C_Primary grade, NFO_TW-6_162A_Pri1 grade, NFO_TW-6_162A_Pri2 grade, PRI_166A_CH-1 grade, PRI_166A_CH-2 grade, PRI_166B grade, AER1, AER2, AWX1, ENG1, NAV1, PRE_COUNT, PRI_COUNT. Negative correlations on female, any race but Cauc, Aircrew for previous training, IFS_TOTAL_FLIGHT_TIME, API_Test_FAILS, NFO_TW-6_155C_157B_Int grade, NFO_TW-6_155C_157B_Int_Academics grade, NFO_TW-6_162A_Pri1_Academics grade, NFO_TW-6_162_Pri1_Academics grade |
| INT academic average (column 294) | Floating point | 8153 | Positive correlations on AQR, PFAR, FOFAR, OAR, ANI, MST, RCT, ANIT, Personality8, AQR, IFS_STG_1, IFS_STG_3, IFS_EOC, IFS_FAA, AER1, AER3, AWX1, ENG1, FRR1, NAV1, PRI_ACADEMIC_AV, PRI_FLIGHT_AV. Negative correlations on IFS_ACAD_FAIL, API_test_FAILS, Number of ASTB1-5, Number of ASTBE. |
| INT flight average (column 295) | Floating point | 3301 | Positive correlations on DLTFactor, Number of ASTBE, PRI_TW-5_166A grade, PRI_TW-5_166A_Top-offs_CH-1 grade, API_COUNT, PRI_COUNT, PRI_FLIGHT_AV. Negative correlations on HasFormalFlightInstr, MCT, ANIT, ATTFactor, AQR, OAR, IFS_PRIOR_HRS_2_IFS, PRI_166A_CH-1 grade, PRI_166A_CH-2 grade, PRI_166B grade, ENG1. |
| Count of nonnull values for the ADV phase (ADV) (column 297) | Floating point | 18596 | Positive correlations on female, any race other than Cauc, Pri graduation, Int graduation, ADV-E2-176_Academics grade, ADV-E2-176_CN_Academics grade, NFO_TW-6_164A_CH-1_CN_Academics grade, NFO_TW-6_171_Core_Academics grade, PRI_166A_CH-1_Academics grade, PRI_166A_CH-2_Academics grade, ADV_E2_TW-1_176_CN grade, INT-J_167_CH-2 grade, NFO_TW-6_155C_Primary grade, NFO_TW-6_158F_CH-1_Fighter grade, NFO_TW-6_158F_CH-1_Strike grade, NFO_TW-6_162A_Pri1 grade, NFO_TW-6_162A_Pri2 grade, NFO_TW-6_162_Pri1 grade, NFO_TW-6_163 grade, NFO_TW-6_164A_CH-1_CN_Feb_18 grade, NFO_TW-6_171_Core grade, PRI_166A_CH-1 grade, PRI_166A_CH-2 grade, PRI_TW-5_166A_top-offs_CH-1 grade, PRE_COUNT, API_COUNT, PRI_COUNT, PRI2_COUNT, ADVCORE_COUNT. Negative correlations on ANIT, ATTFactor, PFAR, ADV-E2_176_CN_Academics grade, NFO_TW-6_164A_Academics grade, INT-J_167A grade, NFO_TW-6_164A grade, NFO_TW-6_164 grade, PRI_TW-4_166A_CH-2 grade, PRI_166B grade. |
ADV Floating 9712 Positive correlations on PFAR, FOFAR, OAR,
academic point MST, RCT, MCT, ANIT, AQR, IFS_EOC,
average IFS_FAA, API_NSS, AER1, AER2, AWX1,
(column 298) ENG1, FRR1, PRI2_COUNT, INT_COUNT,
INT_ACADEMIC_AV, INT_FLIGHT_AV,
NAV1.
Negative correlations on IFS_ACAD_FAIL,
API_Test_FAILS, Int nongraduate,
API_COUNT,
ADV flight Floating 4593 Positive correlations on AQR, PFAR, FOFAR,
average point OAR, MST, MCT, ANIT,
(column 299) IFS_TOTAL_FLIGHT_TIME, IFS_EOC,
API_NSS, INT-J_167_CH-
2_Academics_RAW_SCORE_DV, NFO_TW-
6_155C_157B_Int_Academics, NFO_TW-
6_155C_Primary_Academics, NFO_TW-
6_162A_Pri1_Academics, NFO_TW-
6_162A_Pri2_Academics, NFO_TW-
6_162_Pri2_Academics, NFO_TW-
6_155C_Primary grade, NFO_TW-6_157B_Int
grade, NFO_TW-6_162A_Pri1 grade,
NFO_TW-6_162A_Pri2 grade,
NFO_TW-6_162_Pri1 grade,
PRI_166A_CH-1 grade, AER1, AER2, AWX1,
ENG1, FRR1, NAV1, API_COUNT,

17
PRI_COUNT, PRI_ACADEMIC_AV,
PRI_FLIGHT_AV, INT_FLIGHT_AV.
Negative correlations on
FormalFlightInstrHours, IFS_ACAD_FAIL,
API_Test_FAILS, Pri nongraduate,
INT_J_167A_Academics, NFO_TW-6,
PRI_166A_CH-1_Academics, INT-J_167A
grade, NFO_TW-6_155C_157B_Int grade.
Count of Floating 18,596 Positive correlations on female, Cauc race,
nonnull point AQR, PFAR, FOFAR, IFS_STG_1,
values for the IFS_STG_2, IFS_STG_3, Pri graduate, Int
FRS phase graduate, Adv graduate, NFO_TW-
(FRS) 6_155C_157B_Int_Academics grade,
(column 300) NFO_TW-6_162_Pri1_Academics grade,
NFO_TW-6_164A_Academics grade,
NFO_TW-6_164_Academics grade, NFO_TW-
6_171_Core_Academics grade, PRI_166A_CH-
1_Academics grade, PRI_166A_CH-
2_Academics grade, PRI_166B_Academics
grade, ADV-E2_176 grade, INT-J_167A grade,
INT-J_167_CH-2 grade, NFO_TW-
6_155C_Primary grade, NFO_TW-6_162_Pri1
grade, NFO_TW-6_163A grade, NFO_TW-
6_163 grade, NFO_TW-6_164A_CH-
1_CN_Feb_18 grade, NFO_TW-6_164A_CH-1
grade, NFO_TW-6_164A grade, NFO_TW-
6_171_Core grade, PRI_TW-5_166A grade,
PRE_COUNT, API_COUNT, PRI_COUNT,
INT_COUNT, ADVCORE_COUNT,
ADV_COUNT.
Negative correlations on Aircrew and Winged
training, DOTFactor,
IFS_TOTAL_FLIGHT_TIME, ADV-
S_167A_CH-1_Academics grade, INT-
J_167_CH-2_Academics grade, INT-
J_167A_Academics grade, PRI_166A_CH-
1_Academics grade, ADV-R_TW-
5_156D_GTN560 grade, INT-J_167A grade,
NFO_TW-6_164 grade, NFO_TW-
6_171_E2_CH-1 grade, PRI_166A_CH-1 grade,
PRI_166A_CH-2 grade.

C. DISCUSSION

The reliability of these correlations was hampered by the low counts of candidates who
dropped out of the training program. For instance, IFS_STATUS recorded 13,460
candidates who completed and only 374 who were disenrolled; SYL_ST had 5,369 who
were complete and 260 who were attrited. Data were especially sparse for the flight
tests since there were so many specialized curricula; the aggregation we did was essential
to make sense of the data. It is difficult to do useful machine learning when there is such
a strong bias in one direction, in this case toward success in the program.
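
To make the imbalance concrete, the following minimal sketch computes the fraction of non-completions from the two pairs of counts quoted above. The function name is hypothetical, and treating the two quoted categories as the only outcomes of each field is an assumption.

```python
def minority_fraction(majority_count, minority_count):
    """Fraction of records that fall in the rarer class."""
    return minority_count / (majority_count + minority_count)

# Counts quoted in the text; other status values, if any, are ignored (assumption).
ifs = minority_fraction(13460, 374)   # IFS_STATUS: completed vs. disenrolled
syl = minority_fraction(5369, 260)    # SYL_ST: complete vs. attrited

print(f"IFS disenrolled: {ifs:.1%}; SYL_ST attrited: {syl:.1%}")
```

Under that assumption, fewer than 5% of the records in either field are failures, so a predictor that always guessed "success" would already be right more than 95% of the time.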

Nonetheless, we did see some interesting trends. Note that since correlations were only
calculated on pairs of values where both values were non-null, correlations on later
phases did not include people who attrited at earlier phases.
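
The restriction to pairs with both values non-null can be sketched as follows. This is an illustration only: it uses the Pearson coefficient as a stand-in, which may differ from the statistic our programs compute, and the function name and sample values are hypothetical.

```python
import math

def pairwise_correlation(xs, ys):
    """Pearson correlation over rows where both values are non-null.

    Returns (r, n), where n is the number of usable pairs and r is None
    when fewer than two pairs exist or either column is constant.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None, n
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None, n
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical data: the third candidate has no later-phase grade (attrited),
# and the fifth has no early score, so both rows are skipped.
early_score = [50, 60, 40, 70, None]
later_grade = [2.0, 3.0, None, 4.0, 3.5]
r, n = pairwise_correlation(early_score, later_grade)
```

Returning the pair count alongside the coefficient is a deliberate choice here, since a correlation computed on only a handful of surviving pairs deserves less weight.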

• There were some strong correlations of success with increasing dates, but these
are likely spurious due to having more complete data for recent candidates.
• There were some strong correlations of success with number of flight hours.
However, “Formal flight instruction hours” correlated negatively with several
measures of final success. It may be that weaker candidates are attrited or get more
remedial instruction, or that formal flight instruction on different aircraft confuses
candidates.
• Female gender and minority race showed relatively more failures early in training
but relatively fewer failures later in training.
• Several ASTB test results correlated well with success in IFS, Primary,
Intermediate, and FRS; we gather that the ASTB has been designed to do this.
However, ASTB metrics were not helpful in predicting success in the Advanced
training, by which time many additional skills have been learned.
• Several Primary, Intermediate and Advanced training grades correlated positively
with both success in Advanced training and FRS. We gather these are useful
metrics that should be preserved. However, some of the advanced-training grades
correlated negatively with success, and these should be investigated further.
Perhaps the grades tend to be recorded more for “makeup” activities for
candidates who have failed in other skills, or perhaps the training associated with
those skills is counterproductive.
• Some of the strong correlations to phase counts may be due to policy rather than
candidate aptitude, as when candidates are attrited if they fail to score sufficiently
well on a metric or fail a benchmark too many times. We are not familiar with
Navy policy and cannot guess what the attrition conditions are. However, being
attrited in the next phase after a test score is probably a good indicator of aptitude
rather than policy because a policy on that test score would have attrited them
earlier.

IV. DESIGN OF A DATABASE

An alternative way to store the data is in a traditional database, which offers more
flexibility in running queries on it. We built a prototype with Oracle XE for Laptops
since the Navy has an Oracle license. The SQL Developer (SQLD) interface tool was
used to access Oracle XE and run SQL queries. After all the data preparation was
completed on the laptop, the final schema was moved (using SQLD) to an Oracle 19c
Database residing at the campus Network Operations Center. SQLD has a tool to load
an Excel spreadsheet into a database table, which we used to load the files. Access to the
database was over port 1521, as set up by the Network Operations staff.

The main tables needed for a traditional database design are a student table, a curriculum
table, a score table, a student-curriculum linking table, and a student-score linking table.
There will be many scores for each student, so there needs to be an auxiliary data
structure holding links to the score records for each student. Additional tables were sent
to us beyond those mentioned earlier, and they could be helpful in database queries.
Examples are the list of curricula and their names, the descriptions of the coded values,
and the descriptions of the column labels.
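
The five-table layout can be sketched as below. This is not the schema of our prototype: it uses SQLite (from the Python standard library) rather than Oracle so that it is self-contained, and every table name, column name, and data value is hypothetical.

```python
import sqlite3

# All table and column names here are hypothetical placeholders.
SCHEMA = """
CREATE TABLE student (id_code INTEGER PRIMARY KEY);
CREATE TABLE curriculum (curriculum_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE score (
    score_id INTEGER PRIMARY KEY,
    curriculum_id INTEGER REFERENCES curriculum(curriculum_id),
    raw_score REAL);
CREATE TABLE student_curriculum (
    id_code INTEGER REFERENCES student(id_code),
    curriculum_id INTEGER REFERENCES curriculum(curriculum_id),
    PRIMARY KEY (id_code, curriculum_id));
CREATE TABLE student_score (
    id_code INTEGER REFERENCES student(id_code),
    score_id INTEGER REFERENCES score(score_id),
    PRIMARY KEY (id_code, score_id));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO student VALUES (1)")
conn.execute("INSERT INTO curriculum VALUES (10, 'PRI_166B')")
conn.execute("INSERT INTO student_curriculum VALUES (1, 10)")
conn.executemany("INSERT INTO score VALUES (?, ?, ?)",
                 [(100, 10, 88.0), (101, 10, 92.0)])
conn.executemany("INSERT INTO student_score VALUES (?, ?)",
                 [(1, 100), (1, 101)])

def scores_for_student(connection, id_code):
    """Follow the student-score linking table to one student's scores."""
    rows = connection.execute(
        "SELECT s.raw_score FROM student_score AS ss "
        "JOIN score AS s ON s.score_id = ss.score_id "
        "WHERE ss.id_code = ? ORDER BY s.score_id", (id_code,)).fetchall()
    return [row[0] for row in rows]
```

The linking table plays the role of the auxiliary structure mentioned above: a student with many scores simply has many rows in it.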

An Oracle database has data and metadata constraints we had to address including:

• Column name length: Database column names usually have length limits. Fortunately,
in the 19c version the limit has been increased beyond the 30-character limit of
previous versions.
• Non-alphabetic characters in column names: A database column name cannot
have a hyphen, so each hyphen was replaced by an underscore. Other characters that
had to be replaced were “(”, “)”, “.”, and “/”.
• Values that were spaces: These needed to be replaced by nulls since that is what
they meant.
• Dates and times: Most formats used were either “MM-DD-YYYY” (Date) or
“MM-DD-YYYY HH24:MI” (Timestamp). Some date values did not fit either
format, and were loaded as a Character type and cleaned up later. All dates were
converted to epoch time as described in section II.
• As with the previously described analysis, missing ID_CODE values were
replaced with sequential negative numbers.
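
The cleanup steps in this list can be sketched in Python as follows. The function names are hypothetical; interpreting epoch time as seconds since 1970 in UTC and starting the replacement ID codes at -1 are both assumptions made for the example.

```python
import calendar
import re
import time

def clean_column_name(name):
    """Replace hyphen, parentheses, period, and slash with underscores."""
    return re.sub(r"[-()./]", "_", name)

def blank_to_null(value):
    """Treat empty or all-space strings as nulls."""
    if value is None or value.strip() == "":
        return None
    return value

def to_epoch(text):
    """Parse the two formats quoted above; return None if neither fits.

    Assumes UTC and seconds since 1970-01-01.
    """
    for fmt in ("%m-%d-%Y %H:%M", "%m-%d-%Y"):
        try:
            return calendar.timegm(time.strptime(text, fmt))
        except ValueError:
            continue
    return None

def fill_missing_ids(ids):
    """Replace missing ID codes with -1, -2, ... in order of appearance."""
    next_id = -1
    filled = []
    for value in ids:
        if value is None:
            filled.append(next_id)
            next_id -= 1
        else:
            filled.append(value)
    return filled
```

Values for which `to_epoch` returns None correspond to the dates that fit neither format and would need separate cleanup.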

A problem with doing outer joins on the ID_CODE attribute is that its values will occur
twice in the columns of the result. The usual way to do the outer join using SQL would be:
CREATE TABLE T3 AS SELECT * FROM T1 FULL OUTER JOIN T2 ON
T1.ID_CODE = T2.ID_CODE

The problem with this is that the “*” (select all columns) will confuse the SQL processor
since the ID_CODE column is in both tables. One option is to write out all the column
names instead of the “*”, but that would be tedious as we have tables with hundreds of
columns. Another option is to rename (in SQLD) the ID_CODE column to
ID_CODE1 (in table T1) and ID_CODE2 (in table T2). Then the OUTER JOIN
“SELECT *” generates two ID_CODE columns (ID_CODE1 and ID_CODE2) and the
rest of the columns. To combine the two ID_CODE columns, we first created two
separate tables using SQL: “CREATE TABLE T5 AS (SELECT * FROM T3 WHERE
T3.ID_CODE2 IS NULL)”, followed by DROP COLUMN of ID_CODE2. Next we
created a table T6 with ID_CODE2. Next the columns ID_CODE1 and ID_CODE2 (in
tables T5 and T6 respectively) were renamed to ID_CODE (in SQLD). Finally the two
tables were merged into one table (with a single ID_CODE column) using the SQL
“CREATE TABLE T7 AS (SELECT * FROM T5) UNION (SELECT * FROM T6)”. This process
was repeated for all tables one by one until a FULL OUTER JOIN of all tables was generated.
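
The intended result, a full outer join with a single ID_CODE column, can be illustrated outside the database with a short Python sketch. This is not the SQL procedure described above; the function name and sample values are hypothetical, and it assumes that ID_CODE is unique within each table and that the tables share no other column names.

```python
def full_outer_join(rows1, cols1, rows2, cols2, key="ID_CODE"):
    """Full outer join of two lists of dicts, keeping one key column.

    cols1 and cols2 name the non-key columns of each table. Columns from a
    table with no matching row are filled with None (null).
    """
    by_key1 = {row[key]: row for row in rows1}
    by_key2 = {row[key]: row for row in rows2}
    joined = []
    for k in sorted(set(by_key1) | set(by_key2)):
        out = {key: k}
        for cols, table in ((cols1, by_key1), (cols2, by_key2)):
            source = table.get(k, {})
            for col in cols:
                out[col] = source.get(col)
        joined.append(out)
    return joined

# Hypothetical sample tables: candidate 1 is only in t1, candidate 3 only in t2.
t1 = [{"ID_CODE": 1, "AQR": 6}, {"ID_CODE": 2, "AQR": 7}]
t2 = [{"ID_CODE": 2, "NSS": 50.0}, {"ID_CODE": 3, "NSS": 45.5}]
t3 = full_outer_join(t1, ["AQR"], t2, ["NSS"])
```

Joining further tables would repeat the call with the previous result as the first argument, mirroring the table-by-table process described above.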

The ML_ADV_E2_TW_1_176_ACADEMICS table has a score for each time a test was
taken by a student, so to get an average value the SQL used was:

SELECT ID_CODE, SUM(RAW_SCORE_DV)/COUNT(RAW_SCORE_DV) AS RAW_SCORE_DV_AVG
FROM ML_ADV_E2_TW_1_176_ACADEMICS
GROUP BY ID_CODE
ORDER BY ID_CODE

To take into account the “Degree of Difficulty” a similar SQL query was used where the
score was divided by the MIP column value.
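
Both averages can be tried on a few rows with the SQLite module in the Python standard library. This is a sketch under stated assumptions: SQLite stands in for Oracle, the rows are invented, MIP is assumed to be nonnull and nonzero, and dividing each score by MIP before averaging is only one reading of the difficulty adjustment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ML_ADV_E2_TW_1_176_ACADEMICS "
             "(ID_CODE INTEGER, RAW_SCORE_DV REAL, MIP REAL)")
# Invented sample rows; the last one has a null score.
conn.executemany(
    "INSERT INTO ML_ADV_E2_TW_1_176_ACADEMICS VALUES (?, ?, ?)",
    [(1, 80.0, 2.0), (1, 90.0, 3.0), (2, 70.0, 2.0), (2, None, 2.0)])

# Plain average per candidate; SUM and COUNT both skip null scores.
plain = conn.execute(
    "SELECT ID_CODE, SUM(RAW_SCORE_DV) / COUNT(RAW_SCORE_DV) "
    "FROM ML_ADV_E2_TW_1_176_ACADEMICS "
    "GROUP BY ID_CODE ORDER BY ID_CODE").fetchall()

# Difficulty-adjusted average: each score divided by MIP (assumed reading).
adjusted = conn.execute(
    "SELECT ID_CODE, SUM(RAW_SCORE_DV / MIP) / COUNT(RAW_SCORE_DV) "
    "FROM ML_ADV_E2_TW_1_176_ACADEMICS "
    "GROUP BY ID_CODE ORDER BY ID_CODE").fetchall()
```

Because the aggregate functions ignore nulls, the null score for the second candidate does not lower that candidate's average.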

V. CONCLUSIONS

Our results identified quite a few factors helpful in predictions, some that were obvious
and some that were not. We did not see any obvious factors in performance that the
Navy is not acting upon. The factors we measured as significant, such as previous flight
training, gender, and race, are not ones the Navy can control practically or legally.
Overall, we conclude that the Navy is doing a good job of predicting the performance of
candidates from its multistage testing program.

Future work should try to obtain more complete data on the candidates, since
many potentially useful comparisons, such as those between cumulative metrics like NMU,
RRU, IPC, FPC, and NSS, lacked sufficient data. Further work could investigate
additional metrics for predicting performance by additional testing; combinations of
factors could show new trends. Combining factors with a set-covering
machine-learning approach to optimize statistical significance is promising and should be
explored.

APPENDIX A: DELIVERABLES

Besides output files, we are sending to the sponsor the programs (in the Python
programming language) we created:
• pilotscript2.py: Runs overall analysis script.
• extract_labels_from_csv.py: Used to remove column labels from CSV
(comma-separated value) files and store them in a separate file. The results are
“_nolabels.csv” and “_labels.txt” files.
• bettersplit.py: This splits rows in delimited files carefully, taking into account
quotation marks and carriage returns within entries.
• count_all_nonnull_column_values: Counts the number of values not null in a
given column of a given table. Useful since there are so many nulls in this data.
• append_and_extract_labels_from_csv: Combines two academic or flight-test files
that contain similar data.
• append_and_extract_labels_from_csv: Combines all pairs of academic or flight-
test files where either (1) one of the pair is “v2” of the other, or (2) the data is the
same curriculum for different training wings.
• aggregate_academic_data_all.py: Aggregates all the files with classroom grades
(with “Academic” in their names).
• aggregate_maneuver_data_all.py: Aggregates all the files with flight-test grades.
• aggregate_manever_data.py: Aggregates data for a single flight-test file for both
multiple grades of a single pilot on a single skill and on all skills in a particular
curriculum.
• join_files_out_pilots.py: Does an outer join of two tables on specified column
numbers, with columns separated by a given delimiter, and stores the result in a
specified file.
• setup_earlyfile.py: Replaces null ID codes with negative numbers in the
ASTB_IFS_API_PRI_v2.1 table.
• get_time_patterns.py: Counts the number of records for each phase of training for
each pilot.
• join_files_outer_pilots.py: Does an outer join between two tables in CSV form,
also joining the column labels files. A join matches rows in two tables based on
attribute values in a column of each table. An outer join inserts nulls for values in
one table that do not match anything in the other table; this was important for our
data because many candidate records were missing information.
• correlate_table_columns.py: Compares columns of a CSV file by doing tests of
statistical significance as described. Currently it focuses on the “success” metrics
described previously.
• jekamp_nolabels.csv and jekamp_labels.txt: Comma-separated join of all the
relevant data, including some aggregated data, and the labels for the columns.
• jekamp_nolabels_unarystats.txt: Statistics on each column of jekamp, with
average and standard deviation of the nonnull values for numeric and date
columns, and the possible values for nonnumeric columns.

• jekamp_nolabels_binarystats.txt: Statistics on the correlation of each column
measuring success in one way or another with all the other columns, as explained
in the discussion of correlation in this document.
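
As an illustration of two of the tasks listed above, quote-aware row splitting and counting nonnull values in a column, the following sketch uses the `csv` module from the Python standard library. It is not the code of bettersplit.py or count_all_nonnull_column_values; the function name, the sample text, and the treatment of blank strings as nulls are assumptions.

```python
import csv
import io

def count_nonnull(rows, column):
    """Count rows whose entry in the given column is present and non-blank."""
    return sum(1 for row in rows
               if column < len(row) and row[column].strip() != "")

# Hypothetical file contents: the quoted comment holds a comma and a newline.
text = 'ID_CODE,COMMENT,NSS\r\n1,"Good, steady\nprogress",50\r\n2,,\r\n'

# newline="" leaves line endings untouched so the csv module can handle
# line breaks that occur inside quoted entries.
rows = list(csv.reader(io.StringIO(text, newline="")))
header, data = rows[0], rows[1:]
comment_count = count_nonnull(data, header.index("COMMENT"))
```

A naive split on commas and line breaks would cut the first data row into pieces, which is why careful splitting matters for free-text columns such as supervisor comments.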

APPENDIX B: ATTRIBUTES ANALYZED

Table 2 lists the attributes of the complete join of all the tables. For Type, N=numeric,
S=string, and U=uninteresting or unused. Dates and yes/no attributes are converted into
numbers. The last columns 285-300 were calculated by us from the other data as
additional metrics of candidate success. Note the duplicate ID_CODE columns are
necessary when combining data from candidates not appearing in all tables since we are
doing outer joins rather than inner joins and the ID code may not appear in all tables
joined.

Table 2: Complete list of attributes in the full join of all tables.

Column Attribute name Type Phase
0 ID_CODE U PRE
1 Examineeid U PRE
2 Gender S PRE
3 Race S PRE
4 AviationTraining S PRE
5 RetestStatus S ASTB
6 HasFormalFlightInstr N PRE
7 FormalFlightInstrDesc S PRE
8 FormalFlightInstrHours N PRE
9 ExamineeStatus S ASTB
10 TestID S ASTB
11 DesignTestID U ASTB
12 StartDt N ASTB
13 EndDT N ASTB
14 Form N ASTB
15 AQR_RAW_345 N ASTB
16 AQR_345 N ASTB
17 PFAR_RAW_345 N ASTB
18 PFAR_345 N ASTB
19 FOFAR_RAW_345 N ASTB
20 FOFAR_345 N ASTB
21 OAR_RAW_345 N ASTB
22 OAR_345 N ASTB
23 MCT_RAW_345 N ASTB
24 SAT_RAW_345 N ASTB
25 ANI_RAW_345 N ASTB
26 ANI_345 N ASTB
27 MST_RAW_345 N ASTB
28 MST_345 N ASTB
29 RCT_RAW_345 N ASTB
30 RCT_345 N ASTB
31 Status_345 S ASTB
32 RecruitingBranch_A S PRE
33 ExamineeStatus_A S PRE
34 MST_Z_ASTBE N ASTB
35 RCT_Z_ASTBE N ASTB
36 MCT_Z_ASTBE N ASTB
37 ANIT_Z_ASTBE N ASTB

38 Personality1_Z_ASTBE N ASTB
39 Personality2_Z_ASTBE N ASTB
40 Personality3_Z_ASTBE N ASTB
41 Personality4_Z_ASTBE N ASTB
42 Personality5_Z_ASTBE N ASTB
43 Personality6_Z_ASTBE N ASTB
44 Personality7_Z_ASTBE N ASTB
45 Personality8_Z_ASTBE N ASTB
46 Personality9_Z_ASTBE N ASTB
47 DOTFactor_Z_ASTBE N ASTB
48 DLTFactor_Z_ASTBE N ASTB
49 ATTFactor_Z_ASTBE N ASTB
50 VTTFactor_Z_ASTBE N ASTB
51 SkillFactor_Z_ASTBE N ASTB
52 AQR_Z_ASTBE N ASTB
53 AQR_Stanine_ASTBE U ASTB
54 PFAR_Z_ASTBE N ASTB
55 PFAR_Stanine_ASTBE U ASTB
56 FOFAR_Z_ASTBE N ASTB
57 FOFAR_Stanine_ASTBE U ASTB
58 OAR_Z_ASTBE N ASTB
59 OAR_T_ASTBE N ASTB
60 IFS_LOCATION S IFS
61 IFS_PILOT_SCHOOL S IFS
62 IFSFISCAL_YEAR N IFS
63 IFS_BRANCH S IFS
64 IFS_TRAINING_PIPELINE S IFS
65 IFS_DATE_ENROLLED N IFS
66 IFS_DATE_COMPLETED_OR_DISENROLLED N IFS
67 IFS_DAYS_ENROLLED N IFS
68 IFS_DAYS_PENDING N IFS
69 IFS_TOTAL_FLIGHT_TIME N IFS
70 IFS_TOTAL_DUAL_HOURS N IFS
71 IFS_TOTAL_SOLO_HOURS N IFS
72 IFS_TOTAL_LANDINGS N IFS
73 IFS_NIGHT_HOURS N IFS
74 IFS_COMPLETED_CROSS_COUNTRY N IFS
75 IFS_FPY_HRS_TO_FIRST_SOLO N IFS
76 IFS_STATUS S IFS
77 IFS_DISENROLLMENT_DESCRIPTION S IFS
78 IFS_STATUS_NUM N IFS
79 IFS_WAIVED_HOURS_TO_SOLO N IFS
80 IFS_WAIVED_DAYS_TO_SOLO N IFS
81 IFS_WAIVED_DAYS_TO_COMPLETE N IFS
82 IFS_CLASS_NO S IFS
83 IFS_SUPERVISORS_COMMENTS S IFS
84 IFS_DATE_OF_LAST_FLIGHT N IFS
85 IFS_GENDER U IFS
86 IFS_RACE U IFS
87 IFS_ETHNICITY U IFS
88 IFS_STG_1 N IFS
89 IFS_STG_2 N IFS
90 IFS_STG_3 N IFS

91 IFS_EOC N IFS
92 IFS_FAA N IFS
93 IFS_ACAD_FAIL N IFS
94 IFS_ACAD_FAIL_BINARY U IFS
95 IFS_FLT_FAIL N IFS
96 IFS_FLT_FAIL_BINARY U IFS
97 IFS_PRIOR_HRS_2_IFS N IFS
98 IFS_USNA_PFP S IFS
99 API_FY N API
100 API_Service S API
101 API_Program N API
102 API_Desig N API
103 API_Source S API
104 API_StartCls S API
105 API_EndCls S API
106 API_NSS N API
107 API_Test_FAILS N API
108 Trawing S API
109 Pri S PRI
110 Int S INT
111 Adv S ADV
112 Select S PRI
113 NGCode S PRE
114 Number of ASTB1-5 N ASTB
115 Number of ASTBE N ASTB
116 WING S PRE
117 SQDN S PRE
118 PHASE_NAME S PRE
119 ID_CODE N PRE
120 BRANCH S PRE
121 SYLLABUS S PRE
122 VERSION S PRE
123 SYL_TRACK S PRE
124 SYL_ST S ADV
125 STAT_RESN S ADV
126 SYL_STAT_DATE N PRE
127 NSS_UNSATS N ADV
128 ALL_NMU N ADV
129 OFFICIAL_NMU N ADV
130 NUM_RRU N ADV
131 IPC N ADV
132 FPC N ADV
133 NSS N ADV
134 ID_CODE U PRE
135 ADV-E2_TW-1_176_Academics_RAW_SCORE_DV N ADV
136 ADV-E2_TW-1_176_CN_Academics_RAW_SCORE_DV N ADV
137 ADV-E2_TW-4_147G_T-44A_Academics_RAW_SCORE_DV N ADV
138 ADV-R_TW-5_156D_Academics_RAW_SCORE_DV N ADV
139 ADV-R_TW-5_156D_GTN650_CN_Academics_RAW_SCORE_DV N ADV
140 ADV-R_TW-5_156D_GTN650_CN_CH-1_Academics_RAW_SCORE_DV N ADV
141 ADV-S_167A_CH-1_Academics_RAW_SCORE_DV N ADV
142 ADV-S_TW-2_167A_Academics_RAW_SCORE_DV N ADV

143 FRS_TW1_Grade N FRS
144 FRS_TW1_Status U FRS
145 FRS_TW2_Grade N FRS
146 FRS_TW2_Status U FRS
147 FRS_TW4_Grade N FRS
148 FRS_TW4_Status U FRS
149 FRS_TW5_Grade N FRS
150 FRS_TW5_Status U FRS
151 FRS_TW6_Grade N FRS
152 FRS_TW6_Status S FRS
153 INT-E2_TW-4_175_Academics_RAW_SCORE_DV N INT
154 INT-J_167A_Academics_RAW_SCORE_DV N INT
155 INT-J_167_CH-2_Academics_RAW_SCORE_DV N INT
156 INT-T_TW-5_161_Academics_RAW_SCORE_DV N INT
157 INT-T_TW-5_161_CH-1_Academics_RAW_SCORE_DV N INT
158 INT-T_TW-5_161_CH-2_Academics_RAW_SCORE_DV N INT
159 INT-T_TW-5_161_CH-2_GTN650_CN_Academics_RAW_SCORE_DV N INT
160 INT-T_TW-5_161_CH-2_GTN650_CN_CH-1_Academics_RAW_SCORE_DV N INT
161 NFO_TW-6_155C_157B_Int_Academics_RAW_SCORE_DV N INT
162 NFO_TW-6_155C_Primary_Academics_RAW_SCORE_DV N PRI
163 NFO_TW-6_157B_Int_Academics_RAW_SCORE_DV N INT
164 NFO_TW-6_158F_CH-1_ATM_Academics_RAW_SCORE_DV N ADV
165 NFO_TW-6_158F_CH-1_Fighter_Academics_RAW_SCORE_DV N ADV
166 NFO_TW-6_158F_Strike_CH-1_Academics_RAW_SCORE_DV N ADV
167 NFO_TW-6_162A_Pri1_Academics_RAW_SCORE_DV N PRI
168 NFO_TW-6_162A_Pri2_Academics_RAW_SCORE_DV N PRI2
169 NFO_TW-6_162B_Academics_RAW_SCORE_DV N ADV
170 NFO_TW-6_162_Pri1_Academics_RAW_SCORE_DV N PRI
171 NFO_TW-6_162_Pri2_Academics_RAW_SCORE_DV N PRI2
172 NFO_TW-6_164A_Academics_RAW_SCORE_DV N ADV
173 NFO_TW-6_164A_CH-1_Academics_RAW_SCORE_DV N ADV
174 NFO_TW-6_164A_CH-1_CN_Academics_RAW_SCORE_DV N ADV
175 NFO_TW-6_164_Academics_RAW_SCORE_DV N ADV
176 NFO_TW-6_171_Core_Academics_RAW_SCORE_DV N ADVCORE
177 NFO_TW-6_171_Core_CH-1_Academics_RAW_SCORE_DV N ADVCORE
178 NFO_TW-6_171_E2_CH1_Academics_RAW_SCORE_DV N ADV
179 NFO_TW-6_171_E2_MPR_E6_Academics_RAW_SCORE_DV N ADV
180 NFO_TW-6_171_E6_CH-1_Academics_RAW_SCORE_DV N ADV
181 NFO_TW-6_171_MPR_CH-1_Academics_RAW_SCORE_DV N ADV
182 NFO_TW-6_171_MPR_E6_CH-1_Academics_RAW_SCORE_DV N ADV
183 PRI_166A_CH-1_Academics_RAW_SCORE_DV N PRI
184 PRI_166A_CH-2_Academics_RAW_SCORE_DV N PRI
185 PRI_166B_Academics_RAW_SCORE_DV N PRI
186 PRI_TW-5_166A_Academics_RAW_SCORE_DV N PRI
187 PRI_TW-5_166A_Top-offs_CH-1_Academics_RAW_SCORE_DV N PRI
188 ID_CODE N PRE
189 1542.147GT44AAdvE2C2_DATA_COUNT N PRE
190 1542.147GT44AAdvE2C2_DATA_GRADE N PRE
191 1542.176CNATRANOTEJUNE19_DATA_COUNT N PRE
192 1542.176CNATRANOTEJUNE19_DATA_GRADE N PRE
193 1542.176_DATA_COUNT N PRE
194 1542.176_DATA_GRADE N PRE
195 ADV-E2_176_COUNT N ADV

196 ADV-E2_176_GRADE N ADV
197 ADV-E2_TW-2_176_CN_COUNT N ADV
198 ADV-E2_TW-2_176_CN_GRADE N ADV
199 ADV-R_TW-5_156D_GTN650_CH-1_COUNT N ADV
200 ADV-R_TW-5_156D_GTN650_CH-1_GRADE N ADV
201 ADV-R_TW-5_156D_GTN650_COUNT N ADV
202 ADV-R_TW-5_156D_GTN650_GRADE N ADV
203 ADV-R_TW-5_156D_COUNT N ADV
204 ADV-R_TW-5_156D_GRADE N ADV
205 ADV_E2_TW-1_176_CN_COUNT N ADV
206 ADV_E2_TW-1_176_CN_GRADE N ADV
207 INT-E2_TW-4_175_COUNT N INT
208 INT-E2_TW-4_175_GRADE N INT
209 INT-J_167A_COUNT N INT
210 INT-J_167A_GRADE N INT
211 INT-J_167_CH-2_COUNT N INT
212 INT-J_167_CH-2_GRADE N INT
213 INT-T_TW-5_161_CH-2_GTN650_CH-1_COUNT N INT
214 INT-T_TW-5_161_CH-2_GTN650_CH-1_GRADE N INT
215 INT-T_TW-5_161_CH-2_GTN650_COUNT N INT
216 INT-T_TW-5_161_CH-2_GTN650_GRADE N INT
217 INT-T_TW-5_161_CH-2_COUNT N INT
218 INT-T_TW-5_161_CH-2_GRADE N INT
219 NFO_TW-6_155C_157B_Int_COUNT N INT
220 NFO_TW-6_155C_157B_Int_GRADE N INT
221 NFO_TW-6_155C_Primary_COUNT N PRI
222 NFO_TW-6_155C_Primary_GRADE N PRI
223 NFO_TW-6_157B_Int_COUNT N INT
224 NFO_TW-6_157B_Int_GRADE N INT
225 NFO_TW-6_158F_CH-1_ATM_COUNT N ADV
226 NFO_TW-6_158F_CH-1_ATM_GRADE N ADV
227 NFO_TW-6_158F_CH-1_Fighter_COUNT N ADV
228 NFO_TW-6_158F_CH-1_Fighter_GRADE N ADV
229 NFO_TW-6_158F_CH-1_Strike_COUNT N ADV
230 NFO_TW-6_158F_CH-1_Strike_GRADE N ADV
231 NFO_TW-6_162A_Pri1_COUNT N PRI
232 NFO_TW-6_162A_Pri1_GRADE N PRI
233 NFO_TW-6_162A_Pri2_COUNT N PRI2
234 NFO_TW-6_162A_Pri2_GRADE N PRI2
235 NFO_TW-6_162B_COUNT N ADV
236 NFO_TW-6_162B_GRADE N ADV
237 NFO_TW-6_162_Pri1_COUNT N PRI
238 NFO_TW-6_162_Pri1_GRADE N PRI
239 NFO_TW-6_163A_COUNT N ADV
240 NFO_TW-6_163A_GRADE N ADV
241 NFO_TW-6_163_COUNT N ADV
242 NFO_TW-6_163_GRADE N ADV
243 NFO_TW-6_164A_CH-1_CN_Feb_18_COUNT N ADV
244 NFO_TW-6_164A_CH-1_CN_Feb_18_GRADE N ADV
245 NFO_TW-6_164A_CH-1_COUNT N ADV
246 NFO_TW-6_164A_CH-1_GRADE N ADV
247 NFO_TW-6_164A_COUNT N ADV
248 NFO_TW-6_164A_GRADE N ADV

249 NFO_TW-6_164_COUNT N ADV
250 NFO_TW-6_164_GRADE N ADV
251 NFO_TW-6_171_Core_CH-1_COUNT N ADVCORE
252 NFO_TW-6_171_Core_CH-1_GRADE N ADVCORE
253 NFO_TW-6_171_Core_COUNT N ADVCORE
254 NFO_TW-6_171_Core_GRADE N ADVCORE
255 NFO_TW-6_171_E2_CH1_COUNT N ADV
256 NFO_TW-6_171_E2_CH1_GRADE N ADV
257 NFO_TW-6_171_E2_MPR_E6_COUNT N ADV
258 NFO_TW-6_171_E2_MPR_E6_GRADE N ADV
259 NFO_TW-6_171_E6_CH-1_COUNT N ADV
260 NFO_TW-6_171_E6_CH-1_GRADE N ADV
261 NFO_TW-6_171_MPR_CH-1_COUNT N ADV
262 NFO_TW-6_171_MPR_CH-1_GRADE N ADV
263 NFO_TW-6_171_MPR_E6_CH-1_COUNT N ADV
264 NFO_TW-6_171_MPR_E6_CH-1_GRADE N ADV
265 PRI_166A_CH-1_COUNT N PRI
266 PRI_166A_CH-1_GRADE N PRI
267 PRI_166A_CH-2_COUNT N PRI
268 PRI_166A_CH-2_GRADE N PRI
269 PRI_166B_COUNT N PRI
270 PRI_166B_GRADE N PRI
271 PRI_TW-5_166A_COUNT N PRI
272 PRI_TW-5_166A_GRADE N PRI
273 PRI_TW-5_166A_Top-offs_CH-1_COUNT N PRI
274 PRI_TW-5_166A_Top-offs_CH-1_GRADE N PRI
275 PRI_TW-5_166A_Top-offs_COUNT N PRI
276 PRI_TW-5_166A_Top-offs_GRADE N PRI
277 ID_CODE U PRE
278 AER1 N API
279 AER2 N API
280 AWX1 N API
281 ENG1 N API
282 FRR1 N API
283 NAV1 N API
284 ID_CODE U PRE
285 PRE_COUNT (calculated) N PRE
286 ASTB_COUNT (calculated) N ASTB
287 IFS_COUNT (calculated) N IFS
288 API_COUNT (calculated) N API
289 PRI_COUNT (calculated) N PRI
290 PRI_ACADEMIC_AV (calculated) N PRI
291 PRI_FLIGHT_AV (calculated) N PRI
292 PRI2_COUNT (calculated) N PRI2
293 INT_COUNT (calculated) N INT
294 INT_ACADEMIC_AV (calculated) N INT
295 INT_FLIGHT_AV (calculated) N INT
296 ADVCORE_COUNT (calculated) N ADVCORE
297 ADV_COUNT (calculated) N ADV
298 ADV_ACADEMIC_AV (calculated) N ADV
299 ADV_FLIGHT_AV (calculated) N ADV
300 FRS_COUNT (calculated) N FRS

LIST OF REFERENCES

Ambriz, A. (2017, June). Database system design and implementation for Marine air-
traffic-controller training. M.S. thesis, Naval Postgraduate School.
Dubey, R. (2016). Performance evaluation of military training exercises using data
mining. M.S. thesis, University of Skovde.
Ebbatson, M., Harris, D., Huddleston, J., and Sears, R. (2012). Manual flying skill
decay. In De Voogt, A., and D’Oliveira, T. (eds.), Mechanisms in the Chain of Safety:
Research and Operational Experiences in Aviation Psychology, Farnham, UK: Ashgate,
pp. 67-80.
Fogliatto, M., and Anzanello, M. (2011). Learning curves: The state of the art and
research directions. In Jaber, M. (ed.), Learning Curves: Theory, Models, and
Applications, Boca Raton, FL: CRC Press, pp. 3-21.
Gombolay, M., Jensen, R., and Son, S.-H. (2019, June). Machine learning techniques for
analyzing training behavior in serious gaming. IEEE Transactions on Games, Vol. 11,
No. 2, pp. 109-120.
Huggins, K. (Ed.) (2018). Military applications of data analytics. New York: Auerbach.
Hunter, D., and Burke, E. (2009). Predicting aircraft pilot-training success: A meta-
analysis of published research. Intl. Jnl. of Aviation Psychology, Vol. 4, No. 4, pp. 297-
313.
Kaplan, H. (1965). Prediction of success in Army aviation training. Technical research
report 1152, U.S. Army Personnel Research Office.
McFarland, M. (2017, May). Student pilot aptitude as an indicator of success in a part
151 collegiate flight training program. Ph.D. dissertation, Kent State University College.
Rowe, N. (2012, December). Automated trend analysis for Navy-carrier landing
attempts. Proc. Interservice/Industry Training, Simulation, and Education Conference
(I/ITSEC), Orlando, FL.
Salas, E., Milham, L., and Bowers, C. (2003). Training evaluation in the military:
Misconceptions, opportunities, and challenges, Military Psychology, Vol. 15, No. 1, pp.
3-16.
Schendel, J., and Hagman, J. (1991). Long-term retention of motor skills. In Training
for Performance: Principles of Applied Human Learning, Chichester, UK: Wiley, pp. 53-
92.
Schnell, T., Keller, M., and Poolman, P. (2008, October). Quality of training effectiveness
assessment (QTEA): A neurophysiologically based method to enhance flight training.
Proc. 27th Digital Avionics Systems Conference, pp. 4.D.6-1-4.D.6-13.

INITIAL DISTRIBUTION LIST

1. Defense Technical Information Center
Ft. Belvoir, Virginia

2. Dudley Knox Library
Naval Postgraduate School
Monterey, California

3. Research Sponsored Programs Office, Code 41
Naval Postgraduate School
Monterey, CA 93943
