Generating Computable Phenotype Intersection Metadata Using the Phenoflow Library
Toward Implementation: Addressing Real-World Deployments
S25
Martin Chapman, Vasa Curcin
King’s College London
Luke V. Rasmussen,
Jennifer A. Pacheco
Northwestern University
Laura K. Wiley
WashU Medicine
DISCLOSURE OF CONFLICTS OF INTEREST
I have not had any relationships with ACCME-defined ineligible companies within
the past 24 months.
Background: Computable phenotypes
Knowledge objects that capture the logic required to identify individuals with a
disease or condition from their medical records.
Phenotype libraries
Online phenotype catalogues, which store a significant number of computable
phenotypes for the same disease or condition.
Phenotype definition multiplicity
This is a good thing (mostly)…
• It is not desirable (or feasible) to have a single computable phenotype for
every condition. Different use cases necessitate different logic.
But…
• We need to understand which use cases are already supported, to facilitate
reuse. In other words, we need to understand what is unique about each
phenotype. This can then be stored as metadata.
Phenotype intersection
To understand what is unique about each
phenotype (and thus which use cases it best
supports), we can first do the opposite and
understand how two phenotypes for the
same condition intersect.
We can aim to do this automatically and
therefore at scale.
Barriers to automated intersection analysis
1. Identifying when two computable phenotypes target the same disease or
condition in the first place.
• e.g. ‘T2DM Implementation’ vs. ‘Type 2 Diabetes Mellitus’ (PheKB)
2. Comparing different forms of computable phenotypes
• e.g. codelists vs. Natural Language Processing (NLP)
Methods: Identifying same disease/condition
1. Levenshtein distance to identify text similarity.
2. HDR UK API calls, using common keywords, to identify phenotypes
that target the same condition but lack text similarity.
3. Large Language Model (LLM) (Llama 3.1) to validate the
additional phenotypes returned in step 2 (e.g. not all 161 definitions
returned are actually for diabetes).
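As an illustration of step 1, the text-similarity check can be sketched in a few lines of Python. This is a minimal sketch, not the Phenoflow implementation: the function names `levenshtein` and `name_similarity`, and the normalisation by the longer name's length, are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Hypothetical helper: 1.0 means identical names, 0.0 means nothing shared."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# The PheKB example from the barriers slide: same condition, low text similarity.
score = name_similarity("T2DM Implementation", "Type 2 Diabetes Mellitus")
```

A low score for a pair like this one is the reason a second, keyword-based step is needed: edit distance alone cannot tell that 'T2DM' abbreviates 'Type 2 Diabetes Mellitus'.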
Methods: Comparing definitions
Results: Intersection – Condition groups
1171 definitions loaded into the Phenoflow library.
137 condition groups (conditions with two or more phenotypes). PPV 95%.
574 definitions exist as a part of a group (49%).
Good insight into the extent of the definition multiplicity phenomenon.
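The grouping behind these figures can be sketched as follows. This is a toy example under stated assumptions: each definition is taken to carry a condition label already (in the talk this label comes from the matching steps in the methods), and `condition_groups`, `grouped_share` and the data in `toy` are hypothetical names and values chosen for illustration.

```python
from collections import defaultdict


def condition_groups(labelled):
    """Map each condition to its definitions, keeping groups of two or more."""
    groups = defaultdict(list)
    for name, condition in labelled:
        groups[condition].append(name)
    return {c: names for c, names in groups.items() if len(names) >= 2}


def grouped_share(labelled):
    """Fraction of all definitions that sit inside some condition group."""
    if not labelled:
        return 0.0
    grouped = sum(len(names) for names in condition_groups(labelled).values())
    return grouped / len(labelled)


# Toy data (not from the study): two diabetes definitions and one singleton.
toy = [
    ("T2DM Implementation", "type 2 diabetes"),
    ("Type 2 Diabetes Mellitus", "type 2 diabetes"),
    ("Asthma", "asthma"),
]
groups = condition_groups(toy)
share = grouped_share(toy)

# The reported share follows the same arithmetic: 574 of 1171 definitions.
reported_percent = round(100 * 574 / 1171)
```

Here `reported_percent` evaluates to 49, matching the figure on the slide.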
Results: Intersection – Steps
Trend: Across the 10 largest condition
groups, the average number of steps in
common between pairs of definitions
relative to the average number of
steps in the group is low.
While definition multiplicity exists,
definitions still have a considerable
number of unique steps.
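The measure described here can be sketched as below, assuming each definition is reduced to a set of step identifiers. That representation, the function names `intersection_ratio` and `unique_steps`, and the example group are assumptions for illustration, not the Phenoflow data model.

```python
from itertools import combinations
from statistics import mean


def intersection_ratio(group):
    """Average steps shared per pair, relative to average steps per definition."""
    if len(group) < 2:
        raise ValueError("a condition group needs at least two definitions")
    shared = mean(len(a & b) for a, b in combinations(group.values(), 2))
    size = mean(len(steps) for steps in group.values())
    return shared / size if size else 0.0


def unique_steps(group):
    """Steps that appear in exactly one definition of the group."""
    result = {}
    for name, steps in group.items():
        others = set().union(*(s for n, s in group.items() if n != name))
        result[name] = steps - others
    return result


# Hypothetical condition group with three definitions.
group = {
    "A": {"read", "icd10-diabetes", "output"},
    "B": {"read", "snomed-diabetes", "output"},
    "C": {"read", "nlp-notes", "hba1c", "output"},
}
ratio = intersection_ratio(group)
unique = unique_steps(group)
```

In this toy group every pair shares only the generic read and output steps, so the condition-specific logic of each definition ends up in `unique`, which is the kind of information the talk proposes storing as metadata.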
Results: LLM impact
We observed our LLM:
• Identifying false positives (e.g. matches between phenotypes for different
types of heart failure).
• Identifying false negatives (e.g. phenotype names that do not include the
condition but still aim to identify the condition via the presence of
medications).
Summary and Future work
The use of Phenoflow has allowed us to compare definitions to understand more
about definition multiplicity (extensive) and intersection (limited).
Integrating an LLM increases the reliability of this process.
Unique steps will soon be added to Phenoflow as metadata to support reuse.
To complement definition intersection insight (horizontal), definition
subsumption (vertical) will be explored next.
Links
Implementation (Python): https://github.com/phenoflow/curator
Data analysis (Jupyter): https://github.com/phenoflow/intersection-analysis
Live Phenoflow site: https://kclhi.org/phenoflow

