
Research papers on the prediction of Current Procedural Terminology (CPT) codes in medical data

Neural Machine Translation–Based Automated Current Procedural Terminology Classification System Using Procedure Text: Development and Validation Study

Introduction
• This paper develops an automated anesthesiology Current Procedural Terminology (CPT) prediction system that translates manually entered surgical procedure text into standardized forms using neural machine translation (NMT) techniques. Similarity scores between the translated text and the standardized forms are then used to predict the most appropriate CPT codes.
• The model's performance is compared with that of previously developed machine learning algorithms for CPT prediction.

Dataset
• The researchers collected and analyzed all operative procedures performed at Michigan Medicine between January 2017 and June 2019 (2.5 years).
• The first 2 years of data were used to train and validate the models and to compare the NMT-based model against the previously developed baselines. Data from 2019 (a 6-month follow-up period) were then used to measure the accuracy of CPT code prediction.
• Three experimental settings were designed with different data types to evaluate the models.

NMT Model Architecture


• The authors developed an NMT-based automated CPT coding system that first translates surgical procedure texts in electronic health records (EHRs) into preferred terms from the Unified Medical Language System (UMLS) and then normalizes the translated preferred terms to predict CPT codes.
• Within Michigan Medicine, each surgical procedure record contains a surgical procedure text and a preoperative diagnosis entered by a surgeon or surgical resident. After completion of the procedure, surgical and anesthesiology CPT codes were assigned by clinical staff and/or professional medical coders.
• The manually entered texts are the input source, and the preferred terms of the assigned CPT codes are the output target sentences of the NMT model.
• In this study, surgical procedure texts and preoperative diagnoses were the inputs of the model to predict CPT codes.
• Once trained, the NMT model generates multiple candidate translation outputs ranked by a beam search algorithm.
• The top three target sentences were retained and processed through step 2: transformation. With these three target sentences, the best CPT code was computed in the transformation step using the Levenshtein and Jaccard distances (see the sketch below the figure).
(Figure: the architecture of the NMT-based automated CPT prediction system.)
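As a rough illustration of this transformation step, the sketch below scores candidate NMT translations against CPT preferred terms using a normalized Levenshtein distance plus a token-level Jaccard distance. The cpt_preferred_terms mapping, the normalization, and the way the two distances are combined are assumptions for illustration, not the authors' exact formulation.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of the two token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(ta & tb) / len(ta | tb) if ta | tb else 0.0


# Hypothetical mapping from CPT code to its preferred term (illustrative only).
cpt_preferred_terms = {
    "00790": "anesthesia for intraperitoneal procedure upper abdomen",
    "00840": "anesthesia for intraperitoneal procedure lower abdomen",
}


def best_cpt(candidate_translations):
    """Pick the CPT code whose preferred term is closest to any of the top-k
    NMT outputs, combining normalized Levenshtein and Jaccard distances."""
    def score(translation, term):
        lev = levenshtein(translation, term) / max(len(translation), len(term), 1)
        return lev + jaccard_distance(translation, term)
    return min(cpt_preferred_terms,
               key=lambda code: min(score(t, cpt_preferred_terms[code])
                                    for t in candidate_translations))


print(best_cpt(["anesthesia for lower abdomen intraperitoneal procedure"]))  # -> "00840"
```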

• The researchers also selected SVM and LSTM models as baselines. For SVM model development, they applied grid search cross-validation for training and hyperparameter tuning. The input features of the SVM model were bigrams extracted from the training data and weighted using term frequency-inverse document frequency (TF-IDF); a sketch of such a baseline follows this list.
• For the LSTM model development, a sequence of words from the procedure text and preoperative diagnosis text in the training data was fed into the
embedding layer. The embedding layer then converted each word in the sequence to a vector representation using a Word2Vec model pretrained on
PubMed, PubMed Central, and Wikipedia. The LSTM model was trained on this sequence of vector representations and returned a hidden vector from
each state that was passed through a fully connected layer. A final softmax layer was then used to predict the final label.
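A minimal sketch of a bigram TF-IDF SVM baseline of this kind, using scikit-learn. The toy texts, CPT labels, and hyperparameter grid are illustrative assumptions rather than the authors' configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for procedure text + preoperative diagnosis, labeled with CPT codes.
texts = [
    "laparoscopic cholecystectomy ; preop dx : cholelithiasis",
    "lap cholecystectomy with cholangiogram ; preop dx : biliary colic",
    "total knee arthroplasty right ; preop dx : osteoarthritis",
    "revision total knee arthroplasty left ; preop dx : osteoarthritis",
]
labels = ["00790", "00790", "01402", "01402"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(2, 2))),  # bigram features weighted by tf-idf
    ("svm", LinearSVC()),
])

# Hypothetical hyperparameter grid; the paper reports grid search cross-validation
# but not the exact grid, so these values are placeholders.
grid = GridSearchCV(pipeline, param_grid={"svm__C": [0.1, 1.0, 10.0]}, cv=2)
grid.fit(texts, labels)
print(grid.best_params_)
print(grid.predict(["laparoscopic cholecystectomy ; preop dx : gallstones"]))
```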
Results
• The reported results indicate that the top-1 and top-3 accuracies of the NMT-based model were equivalent to those of the SVM and LSTM models when using procedure texts.
• The study also demonstrated that the use of additional information, such as preoperative diagnosis, improves SVM, LSTM, and NMT model
performance.
Classification of Current Procedural Terminology Codes from Electronic Health Record Data Using Machine Learning

Introduction
• This paper uses data science techniques applied to perioperative electronic health record data across multiple centers. Anesthesia CPT code classification
models were developed via multiple machine learning methods and evaluated.
• The study hypothesized that machine learning and NLP could be used to develop an automated system capable of classifying anesthesia CPT codes with
accuracy exceeding current benchmarks.
• This classification modeling could prove beneficial in efforts to optimize performance and reduce costs for research, quality improvement, and
reimbursement tasks reliant on such codes.

Dataset Used
• This study included all patients, adult and pediatric, undergoing elective or emergent procedures with an institution-assigned valid anesthesia CPT code and an operative date between January 1st, 2014 and December 31st, 2016 from 16 contributing centers in the Multicenter Perioperative Outcomes Group database.
• This data set includes both academic hospitals and community-based practices across the United States.
• A second, distinct data set was created using cases of patients undergoing elective or urgent procedures with a valid institution-assigned CPT code between October 1st, 2015 and November 1st, 2016 from a single Multicenter Perioperative Outcomes Group institution not included in the Train/Test data set.
• This “Holdout” data set was used for external validation of the models created in this study. The figure shows a flow diagram of the data sets used and the experimental design of the study.
Features
• To maximize the number of cases included in the study, the features used in each model were limited to perioperative electronic health record data
commonly found in anesthesia records:
• age
• gender
• American Society of Anesthesiologists (ASA) physical status
• emergent status
• procedure text
• procedure duration
• derived procedure text length (number of words in procedure text)
• Institution-assigned anesthesia CPT codes were used as labels for each case, and each case represents an instance for machine learning modeling.
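As a rough sketch of how such a feature table might be assembled, with the derived procedure text length computed from the procedure text. The column names, values, and CPT codes are hypothetical; the actual MPOG schema is not given in this summary.

```python
import pandas as pd

# Hypothetical column names and values; the actual MPOG schema is not given here.
cases = pd.DataFrame({
    "age": [54, 7],
    "gender": ["F", "M"],
    "asa_physical_status": [2, 1],
    "emergent": [False, True],
    "procedure_text": ["laparoscopic cholecystectomy",
                       "tonsillectomy and adenoidectomy"],
    "procedure_duration_min": [95, 40],
    "anesthesia_cpt": ["00790", "00170"],  # institution-assigned label for each case
})

# Derived feature: number of words in the procedure text.
cases["procedure_text_length"] = cases["procedure_text"].str.split().str.len()

X = cases.drop(columns=["anesthesia_cpt"])  # features, one instance per case
y = cases["anesthesia_cpt"]                 # CPT code labels
print(X)
```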

Supervised Machine Learning Methods


• Five unique supervised machine learning classification models were compared: Random Forest, Long Short-term Memory, Extreme Gradient Boosting,
Support Vector Machine, and Label-embedding Attentive Model.
• After initial hyper-parameter tuning, all models were trained and tested 20 times using 5-fold cross validation: 80% of data for training and the remaining
20% for testing.
• The deep learning methods in this study were the label-embedding attentive model and long short-term memory. Procedure texts for these models were
encoded into vectors using word2vec embedding as input.
• The label-embedding attentive model also encoded the descriptions of each anesthesia CPT code from the CPT Professional Edition medical code set maintained by the American Medical Association; in contrast, most deep learning models for text classification embed only the input (feature) text.
• A “compatibility matrix” was computed between embedded words and labels via cosine similarity.
• From this matrix, an attention score was calculated for each word, and a representation of the entire procedure text sequence was derived as the average of the embedded words weighted by the attention scores. This representation was used for CPT classification.
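A minimal NumPy sketch of this label-embedding attention mechanism. The embedding dimensions, the max-over-labels pooling, and the softmax normalization are assumptions for illustration, not the exact formulation used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, seq_len, n_labels = 50, 6, 4           # embedding size, words in the text, CPT labels

words = rng.normal(size=(seq_len, d))     # word2vec embeddings of the procedure text
labels = rng.normal(size=(n_labels, d))   # embeddings of the CPT code descriptions


def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


# Compatibility matrix: cosine similarity between every word and every label.
compat = l2_normalize(words) @ l2_normalize(labels).T    # shape (seq_len, n_labels)

# One attention score per word (here: softmax over each word's best label match).
scores = compat.max(axis=1)
attn = np.exp(scores) / np.exp(scores).sum()

# Sequence representation: attention-weighted average of the word embeddings,
# scored against each label embedding for classification.
doc = attn @ words                                       # shape (d,)
logits = l2_normalize(doc) @ l2_normalize(labels).T      # one score per CPT label
print("predicted label index:", int(logits.argmax()))
```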
Results on Train/Test Dataset
• The highest overall accuracy was found with the support vector machine model (87.9%, CI 87.6–88.2%) (Table 2). Extreme gradient boosting (87.9%, CI 87.5–88.3%), long short-term memory (86.4%, CI 83.5–89.3%), and the label-embedding attentive model (84.2%, CI 84.1–84.3%) were all more accurate than the random forest model (82.0%, CI 68.1–95.9%).
• Using CPT categories to identify cases for which the random forest model showed differential performance, accuracy ranged from a low of 70.7% for radiology procedures to a high of 92.0% for shoulder procedures. A positive relationship was observed between the number of cases comprising a specific CPT code and model accuracy for that code, with a Pearson correlation of 0.72.
• Overall accuracy within the top three predictions was 96.8% for the support vector machine model and 94.0% for the label-embedding attentive model.
• In external validation, the best-performing model (the label-embedding attentive model) achieved an overall accuracy of 82.1% on the Holdout data set.
Comparison of machine-learning algorithms for the prediction of Current Procedural Terminology (CPT) codes from pathology reports

Introduction
• The primary objective of this study is to compare the capacity of state-of-the-art machine learning models to delineate primary CPT procedural codes (CPT 88302, 88304, 88305, 88307, 88309), which correspond to case complexity, over a large corpus of 93,039 pathology reports from the Dartmouth-Hitchcock Department of Pathology and Laboratory Medicine (DPLM).
• They compared XGBoost, SVM, and BERT methodologies for the prediction of primary CPT codes as well as 38 ancillary CPT codes, using both the
diagnostic text alone and text from all subfields.

Data Acquisition
• The researchers obtained Institutional Review Board approval and accessed 96,418 pathology reports from DPLM, collected between June 2015 and June 2020.
• They removed a total of 3,379 reports that did not contain any diagnostic text associated with CPT codes, retaining 93,039 reports.
• Each report was appended with metadata, including corresponding EPIC (EPIC systems, Verona, WI), Charge Description Master (CDM), and CPT
procedural codes, the sign-out pathologist, the amount of time to sign out the document, and other details.
• The documents were deidentified by stripping all PHI-containing fields and numerals from the text and replacing them with placeholder characters.
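A hedged sketch of the numeral-replacement part of that step. The placeholder character and the exact rules are assumptions, and the stripping of whole PHI-containing fields is omitted here.

```python
import re

def deidentify(text, placeholder="#"):
    """Replace every digit with a placeholder character so numeric PHI
    (MRNs, dates, accession numbers) cannot survive in the text."""
    return re.sub(r"\d", placeholder, text)

print(deidentify("Accession S20-12345, collected 06/14/2020."))
# -> "Accession S##-#####, collected ##/##/####."
```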

Machine Learning Models


The study implemented the following three machine-learning algorithms as a basis for the text classification pipeline.

SVM.
• An SVM model was trained to make predictions using the UMAP embeddings formed from the tf-idf matrix. The SVM operates by learning a hyperplane that maximizes the distance (margin) to the nearest data points of each class.
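A minimal sketch of that pipeline (tf-idf matrix reduced with UMAP, then an SVM on the embeddings), assuming the umap-learn package is available. The toy reports, CPT codes, and parameter choices are illustrative, not the paper's configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
import umap  # provided by the umap-learn package

# Toy stand-ins for pathology report text and their primary CPT codes.
templates = {
    "88302": "gross examination only specimen received labeled and submitted",
    "88305": "skin punch biopsy showing basal cell carcinoma margins uninvolved",
    "88307": "colon segmental resection adenocarcinoma invading muscularis propria",
}
reports, codes = [], []
for code, text in templates.items():
    for i in range(20):                   # repeat so UMAP has enough points to embed
        reports.append(f"{text} case {i}")
        codes.append(code)

tfidf = TfidfVectorizer().fit_transform(reports)   # word-by-report tf-idf matrix

# Reduce the sparse tf-idf matrix to a low-dimensional UMAP embedding.
embeddings = umap.UMAP(n_components=5, random_state=0).fit_transform(tfidf)

# Max-margin classifier trained on the UMAP embeddings.
svm = SVC(kernel="rbf").fit(embeddings, codes)
print(svm.predict(embeddings[:3]))
```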
Bag of words with XGBoost
• XGBoost algorithms operate on the entire word-by-report count matrix and ensemble predictions across individual Classification and Regression Tree (CART) models.
• Individual CART models devise splitting rules that partition instances of the pathology notes based on whether the count of a particular word or phrase in a
pathology note exceeds an algorithmically derived threshold.
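A hedged sketch of this bag-of-words XGBoost setup, with a word-by-report count matrix fed to gradient-boosted CART trees. The toy reports, integer-encoded labels, and parameters are assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from xgboost import XGBClassifier

# Toy reports; labels are integer-encoded CPT classes (e.g. 0 = 88305, 1 = 88307).
reports = [
    "skin punch biopsy basal cell carcinoma",
    "skin shave biopsy squamous cell carcinoma in situ",
    "colon resection invasive adenocarcinoma lymph nodes negative",
    "colon segmental resection adenocarcinoma muscularis propria",
]
labels = [0, 0, 1, 1]

counts = CountVectorizer().fit_transform(reports)  # word-by-report count matrix

# Each boosted CART tree splits on whether a word count exceeds a learned threshold.
model = XGBClassifier(n_estimators=50, max_depth=3)
model.fit(counts, labels)
print(model.predict(counts))
```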

BERT
• The BERT model was trained using the Hugging Face Transformers package.
• The researchers used a collection of models that have already been pretrained on a large medical corpus in order to both improve the predictive accuracy of
their model and significantly reduce the computational load.
• Most BERT models limit the input length to 512 tokens. To address this, the researchers split pathology reports into document subsections when training the BERT models.
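A rough sketch of how such a BERT classifier might be set up with Hugging Face Transformers. The checkpoint name and five-label head are assumptions, and the paper's subsection-splitting strategy is reduced here to simple truncation at 512 tokens.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# A publicly available biomedical BERT checkpoint is assumed here; the paper uses
# models pretrained on a large medical corpus, with five primary CPT codes as labels.
checkpoint = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=5)

report = "Diagnosis: skin, punch biopsy - basal cell carcinoma, nodular type."

# BERT-style models cap the input length (512 tokens), so long reports must be
# truncated here or, as in the paper, split into document subsections.
inputs = tokenizer(report, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print("predicted primary CPT class index:", int(logits.argmax(dim=-1)))
```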
Results
• The study indicates that the XGBoost and BERT methodologies produce highly accurate predictions of both primary and ancillary CPT codes, which has the potential to reduce operating costs by suggesting codes prior to manual inspection and by flagging potential manual coding errors for review.
• Further, both the BERT and XGBoost models preserved the ordering of code/case complexity, with most misclassifications occurring between codes of similar complexity.
