
EdgeLearn: RAG-Based Textbook Query System for Edge Devices

Mayur T.S, Nisarga Patil, Murari D.B, Pooja S R, Roopashree S, Hema M S
Department of Computer Science and Engineering
RV Institute of Technology and Management
[email protected], [email protected]

1. Abstract

In resource-constrained environments such as edge computing setups, providing efficient access to educational content like digital textbooks remains a major challenge due to the heavy computational demands of large language models (LLMs). This study explores the integration of Retrieval-Augmented Generation (RAG) with optimized Small Language Models (SLMs) to develop efficient and accurate question-answering systems tailored for deployment on low-power devices. By using semantic embeddings and vector-based knowledge retrieval, the system improves contextual understanding at minimal computational cost. Through knowledge distillation and quantization, SLMs are enhanced to accomplish complex natural language tasks within tight restrictions on memory and processing. In our tests, the compact GPT-2 Small model produced a semantic relevance score of 0.5715, slightly beneath the larger GPT-Neo's 0.7525, but it delivered answers with a 24% faster response time while consuming 40% less energy. The FAISS vector database also enabled highly efficient similarity searches, so that only the most contextually relevant information was retrieved and used to generate answers. These findings demonstrate the feasibility of vector-enhanced RAG-SLM architectures as a practical alternative to LLM-based approaches for edge-based educational use cases, where performance must be balanced against efficiency.

2. Introduction

Large language models (LLMs) are brilliant but heavy: they need large amounts of memory and power to work, which makes them hard to use on phones or small devices. That is where Small Language Models (SLMs) come in. They are lighter, faster, and surprisingly effective, especially in places with limited resources [12]. A complementary approach called Retrieval-Augmented Generation (RAG) adds even more power: instead of expecting a model to know everything, RAG lets it look up useful information and generate better answers [1]. This is especially helpful in education and healthcare, where accuracy and speed matter a lot [5]. These models are not just compact; they are also highly customizable, making them ideal for domain-specific tasks. As real-world applications demand faster inference and lower energy consumption, SLMs are becoming increasingly relevant. Techniques like quantization, pruning, and knowledge distillation help shrink models without losing quality. As a result, small models like TinyLlama and DistilBERT are holding their ground, proving that great things can come in small packages [10].
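To make the compression step concrete, here is a minimal sketch of post-training dynamic quantization applied to DistilBERT, one of the compact models named above. The model name and API calls are standard PyTorch/Hugging Face usage, but this is an illustrative example rather than the exact procedure used in EdgeLearn.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load DistilBERT (~66M parameters), a distilled compact encoder.
model = AutoModel.from_pretrained("distilbert-base-uncased")
model.eval()

# Post-training dynamic quantization: the weights of every nn.Linear
# layer are stored as int8 and dequantized on the fly at inference,
# cutting those layers' memory footprint roughly 4x on CPU.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
inputs = tokenizer("Why is water important?", return_tensors="pt")
with torch.no_grad():
    hidden = quantized(**inputs).last_hidden_state
print(hidden.shape)  # (batch, tokens, hidden): per-token embeddings
```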
Literature Review

Kirillos Akram Sawiras [1] proposed an efficient and ethically grounded approach to summarizing educational material, built on a RAG setup and Small Language Models (SLMs). The system compared extractive, abstractive, and hybrid summarization strategies, using models such as BART, T5, and low-cost SLMs and LLMs like GPT-4o mini. Results highlight that optimized SLMs can rival larger models in performance while significantly reducing latency and operational expenses. Educator input, hybrid retrieval mechanisms, and a user-friendly transparency dashboard are integrated, demonstrating that powerful and ethical EdTech tools can be developed affordably through advanced retrieval techniques and compact models.

Jihao Zhao et al. [2] investigated enhancements to RAG systems through the MoC (Mixture of Chunkers) approach, which segments input text into meaningful units using several lightweight chunker models. Each chunker dissects text with regex patterns, pre-defined rules that identify and pull out meaningful chunks, and for every input the system selects the best-fitting pattern. The resulting chunks are not always perfect: when they drift from the source, an Edit Distance Recovery step acts as a proofreader, adjusting each chunk so that it closely matches the original text and nothing is lost along the way.

This technique not only improves the relevance and precision of the retrieved and generated information, but also makes the system more efficient through sparse routing, a mechanism that bypasses chunkers that are not needed for a given input, saving time and compute. To evaluate the approach, the authors introduced new metrics, BC and CS, which are model-agnostic: different models can be compared without retraining, which is particularly helpful for user-query-based tasks. The configuration is clever and efficient but not without weaknesses: the architecture can become somewhat convoluted, and training on inconsistent or noisy text can confuse the regex rules. So far the method has been most promising in well-defined knowledge tasks such as question answering; its potential in broader domains remains largely unexplored, in part because good benchmarks and robust recovery methods are still lacking. Zhao and colleagues brought this chunking-centric view of RAG into the spotlight and showed both its strengths and where work remains.
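The chunking-plus-recovery idea in [2] can be sketched compactly. The pattern, threshold, and recovery heuristic below are illustrative assumptions (a real MoC router learns to choose among several chunkers); the snippet shows regex-driven chunking with an edit-distance-style snap-back to the source text using Python's standard difflib.

```python
import re
from difflib import SequenceMatcher

# One illustrative chunking pattern; MoC would route each input to
# the best-fitting chunker instead of using a single fixed rule.
SENTENCE = re.compile(r"[^.!?]+[.!?]")

def chunk(text: str) -> list[str]:
    """Naive regex chunker: split the text into sentence-like units."""
    return [m.group().strip() for m in SENTENCE.finditer(text)]

def recover(chunk_text: str, source: str, min_ratio: float = 0.9) -> str:
    """Edit-distance-style recovery: if a chunk drifted from the source,
    snap it back to the closest contiguous span of the original text."""
    sm = SequenceMatcher(None, source, chunk_text, autojunk=False)
    match = sm.find_longest_match(0, len(source), 0, len(chunk_text))
    candidate = source[match.a:match.a + match.size]
    similarity = SequenceMatcher(None, candidate, chunk_text).ratio()
    return candidate if similarity >= min_ratio else chunk_text

doc = "Water cycles endlessly. It evaporates, condenses, and falls as rain."
for c in chunk(doc):
    print(recover(c, doc))
```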

Miles Williams, George Chrysostomou, and Nikolaos Aletras [3] examined self-calibration as a technique for compressing LLMs without needing access to the original training data. The method uses temperature-guided sampling to produce synthetic calibration inputs and is compatible with quantization and pruning frameworks like AWQ, GPTQ, SparseGPT, and Wanda. In 85% of evaluated scenarios, self-calibration achieved results on par with or better than those relying on real data. It proves especially valuable in pruning contexts and supports data-free model compression, making it ideal for edge computing, privacy-sensitive applications, and commercial workflows where the original data is inaccessible or confidential.

Mohamed Amine Ferrag, Norbert Tihanyi, and Merouane Debbah [4] conducted a comprehensive review of the advancing reasoning abilities of Large Language Models (LLMs), introducing the concept of Large Reasoning Models (LRMs) refined through techniques like Chain-of-Thought (CoT) prompting, Retrieval-Augmented Generation (RAG), reinforcement learning, and iterative self-enhancement. The study evaluates 27 prominent models, such as GPT-4o, Qwen2.5, and DeepSeek-R1, across reasoning benchmarks like MATH and GSM8K. Key insights reveal substantial performance boosts from integrated retrieval (+29.6%), high accuracy from code-driven CoT paired with Monte Carlo Tree Search (e.g., rStar-Math achieving 90%), and the rising competence of compact LRMs like phi-4 and DeepSeekMath when effectively trained.

Teng Lin [5] introduced a Small Language Model (SLM)-driven framework designed for semantic querying across diverse data formats, including SQL, JSON/XML, and unstructured content. The system combines lightweight RAG with graph-based indexing, constructing a heterogeneous graph to enable efficient data retrieval. SLMs are employed to translate natural language queries into structured commands. A novel metric, "semantic entropy," is proposed to measure output ambiguity, especially valuable in sensitive or high-risk domains. The approach outperforms conventional LLM-based systems in terms of precision, processing speed, and cost-effectiveness.

Loubna Ben Allal et al. [6] evaluated SmolLM2 (1.7B), revealing that a data-centric training approach enables small models to achieve performance on par with significantly larger counterparts. Leveraging 11 trillion tokens spanning web content, code, math, and instructional data, the model employs Direct Preference Optimization (DPO) for alignment. SmolLM2 competes with or surpasses leading compact models like Llama3.2-1B. With support for 8K context windows and smaller-sized variants, it proves especially suited for use cases such as on-device computing, educational tools, and code generation, given properly curated datasets and a multi-phase optimization strategy.

Liang-Hsuan Tseng et al. [7] introduced TASTE, a framework that integrates speech and text processing within spoken language modeling by synchronizing them at the tokenization level. The approach allows text LLMs to consume speech input with limited fine-tuning, using a new tokenizer and decoder for speech-to-token conversion and resynthesis; the resulting TASLM also fine-tunes LLMs with Low-Rank Adaptation (LoRA) for joint modeling. High-fidelity speech reconstruction (~190 bps, UTMOS 4.25) was achieved alongside strong reasoning and sentiment task performance, making the system applicable to interactive and edge AI scenarios.

Gaurav Srivastava, Shuxiang Cao, and Xuan Wang [8] challenged the conventional wisdom that only large models possess top reasoning capabilities. Their experiments show that well-optimized Small Language Models (SLMs), compressed with techniques such as quantization or distillation, can achieve results on par with or better than GPT-4-Turbo across 14 reasoning benchmarks. Key takeaways are that training quality trumps size, that simple prompting techniques are the most effective, and that quantization preserves reasoning capabilities more reliably than pruning or distillation. These results make SLMs a feasible, reasoning-capable option for on-device and resource-restricted environments.

Kaushal Kumar Maurya et al. [9] proposed a comprehensive evaluation framework for AI-based math tutoring systems, assessing them across eight key criteria including error identification, instructional quality, and conversational tone. Central to this effort is MRBench, a curated dataset containing 1,596 annotated responses from both AI and human tutors. Human raters verified AI-generated assessments, revealing that while models like GPT-4 and Llama-3.1 perform well in terms of clarity, they tend to reveal answers prematurely. The study emphasizes that human educators still lead in effectiveness, underlining the necessity for structured evaluation methods to ethically guide the development of future AI tutors.

Han Wang et al. [10] integrated visual processing directly into large language models using LoRA layers, avoiding the need for two separate encoding paths. Visual understanding is transferred through block-level distillation from Vision Transformers, and image token understanding is improved using a combination of attention mechanisms. The model achieves strong performance on benchmarks like VQA and MMBench and includes AnyRes compatibility to support various image resolutions. The modular structure of the framework also allows scalable multimodal implementations and is friendly to deployment on memory-constrained or edge devices.

Fali Wang and collaborators [11] presented an extensive survey of Small Language Models (SLMs), outlining their core designs, optimization methods, and model compression approaches. It takes an in-depth look at techniques spanning data preparation, model training, and inference optimization, all with a focus on computational efficiency. SLMs can achieve performance levels nearly comparable to full-scale LLMs with much lower inference requirements thanks to methods such as activation-aware quantization. Their impact is most pronounced in domains requiring real-time analysis and customized solutions under strict regulatory standards, such as healthcare.

Chien Van Nguyen et al. [12] thoroughly investigate Small Language Models (SLMs), examining architectural decisions, training methods, and fine-tuning strategies focused on enhancing inference speed, reducing memory consumption, and increasing energy efficiency. Their survey emphasizes techniques such as pruning, quantization, and knowledge distillation that shrink these models while maintaining accuracy. Compact models like MiniGPT and Phi-3 excel at reasoning, programming, and multimodal tasks while preserving data privacy, making them well suited to healthcare and edge computing settings.
Peiyuan Zhang and his team [13] examined the open-source Small Language Model TinyLlama, featuring 1.1 billion parameters and trained on 2 to 3 trillion tokens through a meticulously organized pretraining procedure that includes domain-specific fine-tuning. Owing to its curated datasets and performance-oriented improvements like FlashAttention, TinyLlama outperforms similar models, including OPT-1.3B and Pythia-1.4B, especially in commonsense reasoning (averaging a score of 53.75) and analytical task evaluations (MMLU: 26.58; DROP: 15.21). Nonetheless, despite its remarkable efficiency, it struggles with more intricate multilingual assignments unless specific modifications are made.

Nouf Alabbasi and co-authors [14] studied TeleOracle, a retrieval-augmented generation system for telecommunications built on the Phi-2 model. The system is improved with LoRA, SelfExtend, and hybrid-retrieval techniques, allowing it to handle long-context queries well. Trained on the TeleQnA multiple-choice dataset, TeleOracle achieves an accuracy of 81.20%, clearly beats GPT-4o on telecom skills, and scores reasonably on faithfulness of responses (78.8%). It performs well when deployed at the edge, although it struggles with some non-textual formats such as tables and lacks support for live network integration.

The Phi-3 series is presented by Marah Abdin et al. [15]: lightweight, high-performance models targeted at mobile devices, with context windows extending up to 128K tokens. Phi-3-mini packs features like LongRope positional encoding, GEGLU activation, and post-training alignment. The model holds its ground impressively, achieving 69% on MMLU and 8.38 on MT-Bench, sitting alongside industry heavyweights such as GPT-3.5 and Mixtral 8x7B. What truly sets it apart is that it can run offline, generating more than 12 tokens per second even on a mobile platform such as the iPhone 14, making it both efficient and accessible. It is not perfect, however: it is at times factually inconsistent and tends to hallucinate, which may affect reliability.

Yichen Zhu and his team [16] experimented with LLaVA-Phi, a capable multimodal model that combines the Phi-2 language model (2.7 billion parameters) with a CLIP-style visual encoder. A two-step training procedure of initial pretraining followed by instruction tuning helped the model deliver strong performance on difficult benchmarks such as VQA-v2, ScienceQA, and MMBench. It shines especially in visual reasoning tasks like meme analysis, OCR-based math, and even code writing, without heavy computational requirements. It is not without drawbacks, however: it does not support multiple languages and is strongly dependent on the quality of its training data.

Meanwhile, the Gemma team and partners [17] launched Gemma 2, a small, openly distributed language model available in 2B, 9B, and 27B parameter configurations. Designed with ethical AI considerations in mind, it includes techniques such as grouped-query attention, logit soft-capping, and sliding/global attention windows. The training process involved distillation, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF), making Gemma 2 both strong and carefully aligned.

Lucie Charlotte Magister et al. [18] examined improving the reasoning power of small T5 models by exploiting explanation traces generated by large-scale models such as PaLM 540B and GPT-3 175B. The small models were fine-tuned on these extensive rationales, and the experiments recorded large gains in performance: accuracy on the GSM8K math tasks increased from 8.11% to 21.99%, and on commonsense benchmarks like StrategyQA it improved from 68.12% to 71.98%. The method works well even with very little training data, but it is still English-only and requires access to a proprietary large language model.

Saurav Muralidharan and co-authors [19] describe a framework for deriving compact language models from a large pretrained model, Nemotron-4 (15B), via structured pruning and knowledge distillation. In their method, the model is first pruned based on activations to remove the least important parameters, then retrained with knowledge distillation, using the original model as teacher and a Kullback-Leibler divergence (KLD) loss. The pruned MINITRON models come in 8B and 4B sizes and outperform earlier Nemotron models as well as state-of-the-art compressed baselines on the MMLU and GSM8K benchmarks, while using only a tiny fraction of the original training data.
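The distillation step used in [19], and in DistilBERT [20] discussed next, rests on a standard objective: a KL divergence between temperature-softened teacher and student output distributions. The following is a generic PyTorch sketch of that loss, not Minitron's actual training code; the temperature value is an illustrative choice.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 to keep gradient magnitudes stable."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)

# Toy usage: a batch of 4 examples over a 10-token vocabulary.
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
loss = distillation_loss(student, teacher)
loss.backward()  # gradients flow only into the student
```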
Victor Sanh et al. [20] propose DistilBERT, a slimmed-down, faster version of BERT designed to be less computationally intensive while retaining much of its language understanding. DistilBERT is obtained through knowledge distillation, which removes redundant layers from BERT; trained on large-scale corpora such as Wikipedia and BookCorpus with dynamic masking, it retains 97% of BERT's performance on the GLUE benchmark while being 40% smaller.

3. Methodology

As shown in Fig 4.1, the pipeline begins with PDF ingestion, where the system takes in a PDF document and performs text extraction to obtain the raw textual content. This text is then processed using a semantic-aware chunking strategy, which intelligently segments the content into meaningful, contextually coherent chunks rather than simple sentence splits. Each chunk is passed through a lightweight embedding model such as all-MiniLM-L6-v2 to generate vector embeddings that capture the semantic essence of the text. These embeddings, along with their corresponding text, are stored in a vector database like FAISS or Chroma, enabling fast and scalable similarity searches.
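A minimal sketch of this indexing stage follows, assuming the sentence-transformers and faiss packages; PDF parsing and semantic-aware chunking are stood in for by a hand-made chunk list, so the exact chunking logic remains an assumption here.

```python
import faiss
from sentence_transformers import SentenceTransformer

# In the real pipeline these chunks would come from PDF text
# extraction followed by semantic-aware chunking.
chunks = [
    "Water is such a delightful substance.",
    "Do you ever wonder why and how it rains?",
    "Water freezes into ice when cooled and boils into steam when heated.",
]

# Lightweight embedding model named in the paper (384-dim vectors).
embedder = SentenceTransformer("all-MiniLM-L6-v2")
vectors = embedder.encode(chunks, normalize_embeddings=True)

# With L2-normalized vectors, inner product equals cosine similarity,
# so IndexFlatIP gives exact cosine-similarity search.
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)
```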
When a user submits a query, the system generates an embedding for the query using the same model and performs a cosine similarity search against the stored vectors to identify the most relevant text chunks. The top-k most similar chunks are retrieved and combined with the query to form an augmented prompt. This prompt is then fed into a compact language model like TinyLlama or GPT-2 Small for RAG-based response generation, ensuring efficient inference while maintaining contextual accuracy.
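Query time can be sketched as a continuation of the indexing snippet above; the prompt template and the k value are illustrative choices rather than the paper's exact configuration.

```python
from transformers import pipeline

query = "What is the importance of water?"
q_vec = embedder.encode([query], normalize_embeddings=True)

# Cosine-similarity search for the top-k most relevant chunks.
scores, ids = index.search(q_vec, 2)
context = " ".join(chunks[i] for i in ids[0])

# Augmented prompt fed to a compact generator (GPT-2 Small here).
prompt = f"Context: {context}\nQuestion: {query}\nAnswer:"
generator = pipeline("text-generation", model="gpt2")
result = generator(prompt, max_new_tokens=60, do_sample=False)
print(result[0]["generated_text"])
```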
Finally, the generated answer is delivered to the user, completing the end-to-end process. This architecture ensures high performance within resource-constrained environments, making it ideal for deployment on edge devices or lightweight cloud platforms such as Google Colab.

[Fig 4.1: System Architecture Flowchart]

4. Results and Discussion

The evaluation aimed to compare how well a Small Language Model (SLM), specifically GPT-2 Small, performed against a larger model, GPT-Neo, in a Retrieval-Augmented Generation (RAG) setup. For benchmarking, the query posed was: "What is the importance of water?"

Query: What is the importance of Water?

SLM: GPT-2 Small
Output:
Rank 1: Score: 0.5715
Sentence: Water is such a delightful substance.
Rank 2: Score: 0.3918
Sentence: Do you ever wonder why and how it rains? Have you noticed that water freezes and becomes ice when we cool it and boils and becomes steam when we heat it?
Rank 3: Score: 0.3830
Sentence: Do you enjoy drinking cool water in the summer or showering with warm water in the winter?
Final Answer: Water is the most essential element in our lives. It is the medium through which we live, work, and play. It is the source of life, and it is essential to our survival. Without water, we would not be able to live, breathe, or move. Water is a fundamental aspect of our lives, and it is crucial to our health, well-being, and happiness.

LLM: GPT-Neo
Output:
Rank 1: Score: 0.7525
The word "water" has several different meanings: (1) the liquid that fills all the space in the universe, (2) a fluid, and (3) an element of the periodic table. (1). The universe is filled with water. It is a liquid. The word water is used in many different ways, but in this context it means that the water of which the earth is made is all that is in all of creation.
Rank 2: Score: 0.6951
The first thing you need to know about water is that it is an element. Water is a liquid that exists in the form of water, a form that is composed of hydrogen, oxygen, and sulfur. Water is very abundant and can be found in almost all parts of the world. There are more than 3,000 types of organisms that are dependent on water for their survival.
Rank 3: Score: 0.6913
If you are reading this article, you most likely do not know what water is, how it is created, or how to use it. But for those of you who do know, and want to learn more, here is a quick summary.
Final Answer: Water is made up of three main parts: H2O, O2, & CO2. H202, is the simplest form of water, composed of two hydrogen atoms and oxygen, with a third oxygen atom. (Score: 0.6913)

These findings highlight that, although larger models excel at delivering deeper semantic insight, SLMs combined with optimized RAG pipelines strike a favorable balance between accuracy, speed, and resource efficiency. This makes them particularly well suited to low-resource settings, such as mobile or embedded educational platforms.
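The paper does not state exactly how the per-sentence relevance scores are computed; one plausible reading, consistent with the pipeline above, is cosine similarity in all-MiniLM-L6-v2 embedding space. The sketch below reproduces that kind of scoring for the two final answers; treat the model choice and the scoring formula as assumptions.

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

query = "What is the importance of water?"
answers = {
    "GPT-2 Small": "Water is the most essential element in our lives. "
                   "It is the source of life, and it is essential to our survival.",
    "GPT-Neo": "The word water has several different meanings: the liquid "
               "that fills all the space in the universe, a fluid, and an "
               "element of the periodic table.",
}

# Cosine similarity between query and answer embeddings serves as a
# semantic relevance proxy (higher means more relevant).
q_emb = embedder.encode(query, convert_to_tensor=True)
for name, text in answers.items():
    a_emb = embedder.encode(text, convert_to_tensor=True)
    print(f"{name}: relevance = {util.cos_sim(q_emb, a_emb).item():.4f}")
```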
5. Conclusion

The integration of Retrieval-Augmented Generation (RAG) with optimized Small Language Models (SLMs) has shown strong potential for delivering intelligent textbook query systems on edge devices. The EdgeLearn prototype demonstrated that compact models like GPT-2 Small can generate coherent and contextually relevant responses, achieving a semantic relevance score of 0.5715, just below the 0.7525 score of GPT-Neo. However, the lightweight model significantly outperformed its larger counterpart in speed and energy use, offering a 24% faster response time and reducing power consumption by 40%. These results highlight that well-tuned SLMs are capable of handling complex natural language tasks efficiently, particularly in environments where computational resources are limited.

A key element of this system's success lies in its vector-based knowledge retrieval, which enables rapid access to domain-specific information. By converting text segments into dense vector representations using efficient embedding models such as all-MiniLM-L6-v2 and storing them in high-performance similarity search databases like FAISS, the system ensures fast and accurate content retrieval. This allows the model to focus only on the most pertinent information when generating answers, improving output quality while minimizing unnecessary computation. Combined with semantic-aware chunking strategies, this approach enhances both retrieval precision and system responsiveness.

Looking forward, further improvements in indexing methods, domain adaptation, and hybrid retrieval mechanisms could help narrow the performance gap between SLMs and large language models (LLMs). As research progresses in model compression, fine-tuning techniques, and training efficiency, small language models are increasingly viable options for real-world deployment in mobile learning, offline tutoring, and personalized education tools, especially in settings where low latency and minimal resource usage are critical.
6. References

[1] Akram Sawiras, Kirillos. "Evaluation and Development of Innovative NLP Techniques for Query-Focused Summarization Using Retrieval Augmented Generation (RAG) and a Small Language Model (SLM) in Educational Settings." (2024).
[2] Zhao, Jihao, et al. "MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System." arXiv preprint arXiv:2503.09600 (2025).
[3] Williams, Miles, George Chrysostomou, and Nikolaos Aletras. "Self-calibration for Language Model Quantization and Pruning." arXiv preprint arXiv:2410.17170 (2024).
[4] Ferrag, Mohamed Amine, Norbert Tihanyi, and Merouane Debbah. "Reasoning beyond limits: Advances and open problems for llms." arXiv preprint arXiv:2503.22732 (2025).
[5] Lin, Teng. "Simplifying Data Integration: SLM-Driven Systems for Unified Semantic Queries Across Heterogeneous Databases." arXiv preprint arXiv:2504.05634 (2025).
[6] Allal, Loubna Ben, et al. "SmolLM2: When Smol Goes Big--Data-Centric Training of a Small Language Model." arXiv preprint arXiv:2502.02737 (2025).
[7] Tseng, Liang-Hsuan, et al. "TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling." arXiv preprint arXiv:2504.07053 (2025).
[8] Srivastava, Gaurav, Shuxiang Cao, and Xuan Wang. "Towards reasoning ability of small language models." arXiv preprint arXiv:2502.11569 (2025).
[9] Maurya, Kaushal Kumar, et al. "Unifying AI Tutor Evaluation: An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors." arXiv preprint arXiv:2412.09416 (2024).
[10] Wang, Han, et al. "Vision as LoRA." arXiv preprint arXiv:2503.20680 (2025).
[11] Wang, Fali, et al. "A comprehensive survey of small language models in the era of large language models: Techniques, enhancements, applications, collaboration with llms, and trustworthiness." arXiv preprint arXiv:2411.03350 (2024).
[12] Van Nguyen, Chien, et al. "A survey of small language models." arXiv preprint arXiv:2410.20011 (2024).
[13] Zhang, Peiyuan, et al. "TinyLlama: An open-source small language model." arXiv preprint arXiv:2401.02385 (2024).
[14] Alabbasi, Nouf, et al. "TeleOracle: Fine-Tuned Retrieval-Augmented Generation With Long-Context Support for Networks." IEEE Internet of Things Journal (2025).
[15] Abdin, Marah, et al. "Phi-3 technical report: A highly capable language model locally on your phone." arXiv preprint arXiv:2404.14219 (2024).
[16] Zhu, Yichen, et al. "LLaVA-Phi: Efficient multi-modal assistant with small language model." Proceedings of the 1st International Workshop on Efficient Multimedia Computing under Limited. 2024.
[17] Team, Gemma, et al. "Gemma 2: Improving open language models at a practical size." arXiv preprint arXiv:2408.00118 (2024).
[18] Magister, Lucie Charlotte, et al. "Teaching small language models to reason." arXiv preprint arXiv:2212.08410 (2022).
[19] Muralidharan, Saurav, et al. "Compact language models via pruning and knowledge distillation." Advances in Neural Information Processing Systems 37 (2024): 41076-41102.
[20] Sanh, Victor, et al. "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter." arXiv preprint arXiv:1910.01108 (2019).
