Figure 2.
Illustration of the data structure of the BRAVA study. The first column shows the true second breast cancer event (SBCE) status obtained through chart review in the validation data, which are typically available for only a small subset of patients (size = n). The second column shows the SBCE status derived from an automated algorithm, which is subject to misclassification (size = N). The last 4 columns represent the risk factors (ie, year, age, stage, and ER_PR [Surveillance, Epidemiology, and End Results (SEER), estrogen receptor (ER), and progesterone receptor (PR) status of index breast cancer]), which are available for all subjects in the EHR data.
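The layout described in this caption can be sketched in code as a rough illustration. The sketch assumes that N denotes the full EHR cohort and that the chart-reviewed status is simply missing outside the validation subset. The class name, field names, and all values are hypothetical and synthetic; none of them are BRAVA data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientRecord:
    # Hypothetical field names chosen to mirror the six columns in the figure.
    true_sbce: Optional[int]  # chart-review SBCE status; None outside the validation subset
    algo_sbce: int            # algorithm-derived SBCE status, subject to misclassification
    year: int
    age: int
    stage: str
    er_pr: str


# Synthetic toy rows for illustration only (not study data).
records = [
    PatientRecord(1, 1, 2005, 61, "I", "positive"),
    PatientRecord(0, 1, 2008, 54, "II", "negative"),
    PatientRecord(None, 0, 2010, 47, "I", "positive"),
    PatientRecord(None, 1, 2012, 70, "III", "negative"),
    PatientRecord(None, 0, 2003, 58, "II", "positive"),
]

# Validation subset: rows where the chart-reviewed status is observed.
validation = [r for r in records if r.true_sbce is not None]

N = len(records)     # all patients with an algorithm-derived status
n = len(validation)  # patients with a chart-reviewed status
```

In this toy layout the algorithm-derived status and the risk factors are filled in for every row, while the true status is observed only for the n validation rows, so n is smaller than N.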