Knowledge Graphs MSDTS-24-03
Knowledge Graphs and Structured Data
Exploring Explainability and Efficiency in Information Retrieval with Knowledge Graphs
Course Title: Information Retrieval System
Presented by: Naba Majeed
Presented to: Dr. Israr Hanif
MSDTS-24-03
Presentation Outline
• Introduction
⚬ Explainability in IR Systems
⚬ KGs in IR Systems
• Research Abstract
• Problem Statement
R-01: Towards Improving the Explainability of Text-based
Information Retrieval with Knowledge Graphs (2023)
Authors: Boqi Chen et al.
Link: https://arxiv.org/pdf/2301.06974
Abstract:
This paper tackles the lack of explainability in modern IR systems by integrating Knowledge Graphs to
make results more understandable. It proposes methods to highlight key sentences (MIS) and re-rank
documents using KGs, tested on datasets like WIKIQA, showing better explainability and performance.
• KEYWORDS: explainable information retrieval, knowledge graphs, entity linking, natural language
processing
MIS (Most Important Sentence): It is a method to find and highlight the most relevant sentence in a
document that directly answers a user’s query.
WIKIQA: A dataset of questions and answers created from Wikipedia, often used to test how well a system
can answer questions.
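The idea behind MIS can be illustrated with a minimal sketch. It is not the paper's method: it assumes a tiny hand-written entity list (`KNOWN_ENTITIES`) in place of a real entity linker, and it scores each sentence simply by how many entities it shares with the query. All names and example data are hypothetical.

```python
import re

# Hypothetical toy entity list; a real system would use an entity-linking tool.
KNOWN_ENTITIES = {"paris", "france", "eiffel tower", "seine"}


def find_entities(text):
    """Return the known entities mentioned in text (case-insensitive substring match)."""
    lowered = text.lower()
    return {entity for entity in KNOWN_ENTITIES if entity in lowered}


def split_sentences(document):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", document)
    return [part.strip() for part in parts if part.strip()]


def most_important_sentence(query, document):
    """Return (sentence, score) for the sentence sharing the most entities with the query.

    Ties go to the earliest sentence. Returns (None, 0) for an empty document.
    """
    query_entities = find_entities(query)
    sentences = split_sentences(document)
    if not sentences:
        return None, 0
    scored = [(len(find_entities(s) & query_entities), s) for s in sentences]
    best_score, best_sentence = max(scored, key=lambda pair: pair[0])
    return best_sentence, best_score


if __name__ == "__main__":
    doc = (
        "The Seine flows through the city. "
        "The Eiffel Tower is in Paris, France. "
        "It opened in 1889."
    )
    sentence, score = most_important_sentence("Where in France is the Eiffel Tower?", doc)
    print(sentence, score)
```

The shared entities double as the explanation: the highlighted sentence is chosen because it mentions the same entities as the query.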
Problem Statement
This paper focuses on making search results easier to understand (explainable information
retrieval). It explores using Knowledge Graphs with clear relationships to improve explainability.
• Knowledge Graphs (KGs) help explain search results by organizing information about
entities and their relationships.
• Improving how entities are matched in KGs can make the system work even better.
• Developed a framework that uses KGs to improve the transparency of search results.
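To make the role of entities and relationships concrete, here is a minimal sketch of a knowledge graph stored as (subject, relation, object) triples, with a breadth-first search that returns the chain of relations linking two entities. The triples and function names are made up for illustration and are not taken from the paper.

```python
from collections import deque

# Hypothetical toy triples: (subject, relation, object).
TRIPLES = [
    ("Eiffel Tower", "located_in", "Paris"),
    ("Paris", "capital_of", "France"),
    ("Louvre", "located_in", "Paris"),
]


def build_graph(triples):
    """Build a directed adjacency map: entity -> list of (relation, neighbor)."""
    graph = {}
    for subject, relation, obj in triples:
        graph.setdefault(subject, []).append((relation, obj))
        graph.setdefault(obj, [])  # make sure every entity is a key
    return graph


def explain_path(graph, start, goal):
    """Return the shortest list of triples leading from start to goal, or None."""
    if start not in graph:
        return None
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


if __name__ == "__main__":
    kg = build_graph(TRIPLES)
    for triple in explain_path(kg, "Eiffel Tower", "France"):
        print(" -> ".join(triple))
```

A path such as Eiffel Tower, located_in, Paris, capital_of, France is the kind of human-readable evidence a KG can attach to a search result.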
Related Work
Explainable AI (XAI)
XAI methods explain how models make decisions.
⚬ Model-specific: Works only for simple models like trees and linear models.
⚬ Model-agnostic: Explains input-output but struggles with global explanation.
Explainable IR
Methods used:
• Highlighting: Shows the key parts of a document (e.g., Google snippets).
• Feature Importance: Scores queries and documents but may miss the deeper
context.
This research overcomes these problems by using a knowledge graph for better highlights
and re-ranking.
Limitations
• Heavy reliance on the quality of entity-linking tools, which are prone to errors.
• Current implementation does not fully capture the structural information of sentences,
which could improve explainability.
Future Work
While the framework demonstrates significant promise, further improvements in entity
linking and evaluation across diverse domains (science, law) are necessary to fully realize
its potential.
R-02: HybridRAG: Integrating Knowledge Graphs and Vector Retrieval
Augmented Generation for Efficient Information Extraction (Aug 2024)
Authors: Bhaskarjit Sarmah et al.
Link: https://arxiv.org/pdf/2408.04948
Abstract:
HybridRAG merges VectorRAG and GraphRAG methods to extract accurate information from unstructured
financial data. The method combines vector-based and KG-based retrieval for better Q&A performance,
tested on financial call transcripts.
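The combination of vector-based and KG-based retrieval can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: bag-of-words cosine similarity stands in for embedding search, a subject-name match stands in for graph querying, and the two contexts are simply concatenated before being handed to a generator. The company names, text chunks, and triples are invented.

```python
import math
import re
from collections import Counter

# Hypothetical transcript chunks and KG triples (invented data).
CHUNKS = [
    "Revenue grew 12 percent year over year in the third quarter.",
    "The company opened two new plants in Gujarat.",
    "Management expects margins to improve next year.",
]
TRIPLES = [
    ("Acme Ltd", "reported", "revenue growth of 12 percent"),
    ("Acme Ltd", "opened", "two plants in Gujarat"),
    ("Beta Corp", "acquired", "Gamma Inc"),
]


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_retrieve(query, k=1):
    """Stand-in for VectorRAG retrieval: top-k chunks by bag-of-words cosine."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        CHUNKS,
        key=lambda chunk: cosine(query_vec, Counter(tokenize(chunk))),
        reverse=True,
    )
    return ranked[:k]


def graph_retrieve(query):
    """Stand-in for GraphRAG retrieval: triples whose subject is named in the query."""
    lowered = query.lower()
    return [f"{s} {r} {o}" for s, r, o in TRIPLES if s.lower() in lowered]


def hybrid_context(query, k=1):
    """Concatenate vector context and graph context (one simple way to combine them)."""
    return vector_retrieve(query, k) + graph_retrieve(query)


if __name__ == "__main__":
    for line in hybrid_context("How much did Acme Ltd revenue grow?"):
        print(line)
```

In this sketch the vector side contributes free-text passages while the graph side contributes explicit facts about the named entity, which is the complementary behavior the hybrid approach relies on.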
Contribution
• Development of HybridRAG, a framework combining VectorRAG and GraphRAG for
enhanced retrieval and generation.
• Demonstrated significant improvements in answer relevance and retrieval accuracy.
• Created a novel dataset of financial earnings call transcripts, contributing to financial NLP
research.
Related Work
VectorRAG enhances NLP by retrieving text to support generation tasks, but it struggles to
handle multiple documents and long contexts effectively, while GraphRAG uses knowledge
graphs (KGs) to improve NLP tasks.
• This research combines VectorRAG and GraphRAG into a hybrid approach to improve
performance.
• It is the first to use this hybrid method for better analysis of financial documents. A unique
Q&A dataset from financial call transcripts of Nifty-50 companies is used to demonstrate its
effectiveness.
• This approach addresses previous limitations by leveraging the strengths of both techniques
for more accurate and efficient document analysis.
Limitations
• High computational cost due to the integration of two retrieval systems.
• Performance depends on the quality of KG construction and maintenance,
which is resource-intensive.
• Results may vary across domains beyond finance due to domain-specific
optimizations.
Future Work
Focus on expanding the approach to other domains and optimizing computational
efficiency.
Critical Analysis
Aspect: OPPORTUNITIES
• R-01: Can benefit from advancements in entity-linking technologies and expanded KG datasets.
• R-02: Adaptable to other fields like healthcare or legal analysis, opening new research avenues.
Thank you for your time