RFSensing GPT
Abstract—We present RFSensingGPT, an integrated framework for radio frequency (RF) sensing that combines technical question-answering, code retrieval, and spectrogram analysis through retrieval-augmented generation (RAG). Our framework addresses the fundamental challenge of applying large language models to RF sensing applications, where specialized domain knowledge is underrepresented in general training corpora. The system leverages a filtered RedPajama dataset containing RF-relevant technical documents, processed through a hybrid retrieval mechanism that combines vector-based similarity search with best match (BM25)-based query fusion. Performance evaluation using document collections ranging from 5K to 80K documents demonstrates that RAG maintains superior faithfulness (0.9617 vs 0.8506, +13.1%) compared to baseline LLM implementations. Our hierarchical chunking approach using MarkdownHeaderTextSplitter achieves optimal precision (0.31-0.32) at lower k-values while maintaining correctness scores of 4.0-5.0. The framework integrates CLIP-based vision models for RF pattern recognition, achieving 93.23% accuracy in radar data analysis tasks. Implementation benchmarks show efficient processing with minimal GPU memory requirements (0.66GB) even at scale. Through a comprehensive evaluation of the embedding models, RFSensingGPT establishes a new benchmark for technical query understanding and RF spectrogram analysis in the emerging field of integrated sensing and communications systems for 6G networks.
Index Terms—Radio frequency sensing, retrieval-augmented generation, spectral analysis, 6G networks, large language models
I. INTRODUCTION
utility in healthcare [2], law [3], finance [4], and telecommunications [5]. However, their application in specialized technical domains like radio frequency (RF) sensing presents unique challenges. RF sensing involves complex terminology and highly specialized concepts that are underrepresented in generic training data [6], [7]. Traditional LLMs struggle with RF-specific tasks such as signal analysis, spectrum monitoring, signal classification, and interference detection, all requiring precise technical understanding and context-rich information [8]. While LLMs have shown broad applicability across domains, their deployment in RF sensing presents unique technical challenges. Furthermore, the computational demands of traditional LLMs pose deployment challenges in real-time RF sensing systems that require low latency and high efficiency [9]–[11]. These limitations necessitate a specialized approach that combines domain expertise with efficient computational architectures.
Domain-specific models have evolved alongside retrieval-augmented generation (RAG) advancements, with specialized architectures addressing unique field challenges. Notable examples include BloombergGPT [4] and FinGPT [12]–[14] in finance, WizardMath [15] in mathematical reasoning, and ChatLaw [3] and SaulLM [16] in legal analysis. Each demonstrates how specialized training approaches, whether through large-scale domain-specific pre-training or targeted fine-tuning, can significantly improve performance in technical domains. Table I illustrates this diverse approach across professional fields, highlighting the need for specialized solutions in RF
TABLE I: Comparative analysis of domain-specific language models across professional and technical domains
Model Designation  Domain Focus                Knowledge Foundation      Implementation Approach
MedPalM2 [2]       Clinical Practice           Medical Knowledge Bases   Supervised Fine-tuning (SFT)
BloombergGPT [4]   Financial Analysis          Integrated Market Data    Pre-training
FinGPT [12]        Investment Analytics        Market Intelligence       RAG + SFT
ChatLaw [3]        Legal Analysis              Jurisdictional Framework  RAG + SFT
SaulLM [16]        Legal Reasoning             Legal Documentation       Continue Pretrain + SFT
WizardMath [15]    Mathematical Analysis       Specialized Problem Sets  RLEIF + SFT
Code Llama [25]    Software Development        Programming Knowledge     Continue Pretrain + FIM + SFT
CoFE-RAG [26]      Multi-domain Retrieval      Dynamic Knowledge Base    Granular Assessment Framework
AT-RAG [27]        Technical Query Processing  Domain Documentation      Topic-Based Reasoning
RFSensingGPT       RF Sensing Systems          Industry Standards        RAG-Based Architecture
IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING 3
that integrates systematic dataset creation with model training [23] for effective processing of complex RF signals [24].
RAG has emerged as a powerful approach for enhancing LLM performance in specialized domains without model retraining [28]. By augmenting LLM responses with external knowledge bases at inference time, it enables accurate responses while maintaining efficiency. In RF sensing applications, these systems excel at interpreting technical specifications, standards, and signal processing protocols. Recent developments in RAG frameworks have significantly improved retrieval capabilities through several key innovations. The CoFE-RAG framework introduces multi-granularity keyword assessment [26], while AT-RAG implements topic modeling and iterative reasoning for complex technical queries [27]. Generative semantic integration has enhanced structured query processing [29], while advances in hypothetical document embedding and LLM reranking have improved retrieval precision [30]. These developments have particular relevance for RF sensing applications, where precise information retrieval is crucial for system performance.
Recent advances in code generation have transformed RF sensing development. RepoGenReflex [31] introduced verbal reinforcement learning to enhance code completion accuracy, while RRGcode [32] implemented sophisticated re-ranking methods for selecting optimal code candidates, effectively reducing error propagation. Generation-augmented retrieval [33] has markedly improved code search functionality, though style standardization remains an ongoing challenge. Retrieval-augmented code generation [34] has shown particular promise by successfully converting user queries and API specifications into valid code, substantially improving RF sensing development efficiency. Beyond traditional language processing, the integration of vision language models has proven crucial for RF applications. Vision language models have shown remarkable capabilities in RF spectrogram interpretation, achieving accuracy rates above 70% in object identification [35], while specialized implementations have proven especially effective for complex visual data analysis [36]. The advancement in pixel value prediction training methodology [37] has been particularly valuable for RF spectrogram processing, where precise feature extraction and pattern recognition are critical. Domain-specific adaptations, similar to those demonstrated in medical imaging through PLIP and Biomed CLIP [38], have established new benchmarks in analysis accuracy. The successful integration of these models in clinical environments [39] provides valuable insights for RF sensing applications, demonstrating how specialized model development [20] can enhance pattern recognition through targeted model optimization.
Despite notable progress in domain-specific language models [23] and RAG systems [28], fundamental challenges persist in RF sensing applications. Current solutions lack specialized technical knowledge for interpreting RF signals and spectrograms [40], affecting signal analysis accuracy. While general-purpose RAG frameworks provide knowledge retrieval capabilities [41], they fail to address RF-specific code generation and technical documentation synthesis needs [42]. Furthermore, existing approaches struggle with multi-modal data integration, specifically in combining textual information with spectrogram analysis [21]. RFSensingGPT addresses these challenges through an integrated architecture that combines specialized RF knowledge bases, advanced retrieval mechanisms, and spectrogram analysis capabilities. The system implements a comprehensive framework that bridges general-purpose language models with domain-specific requirements [43], integrating code retrieval, visual signal analysis, and technical question-answering functionalities [44]. To address these challenges, we propose RFSensingGPT, an integrated framework that leverages RAG [28] to advance RF sensing applications, with the following contributions:
1) Development of an integrated LLM-based retrieval system specifically for RF technical content, combining BM25 query fusion with vector similarity search, enhanced by semantic analysis and domain-specific keyword matching techniques.
2) Creation of a comprehensive RF knowledge base through automated content recognition, LLM-based keyword expansion, and specialized filtering mechanisms to ensure high-quality domain-specific information.
3) Establishment of a rigorous evaluation framework that quantitatively validates improvements in response quality and retrieval accuracy compared to traditional methods across multiple metrics.
4) Implementation of an integrated Q&A module that synthesizes technical RF concept analysis with code retrieval capabilities and spectrogram pattern recognition, enabling comprehensive responses spanning theoretical understanding, implementation guidance, and RF signal analysis.
The proposed framework builds on a high-quality RF-specific knowledge base, carefully curated from the RedPajama dataset [45], containing over 80,000 technical documentation entries, source code repositories, and peer-reviewed research papers selected from the dataset's 1.2 trillion tokens. The system architecture implements a hybrid retrieval mechanism that combines vector-based search with BM25 retrieval, optimized for RF technical content. Empirical evaluation demonstrates the framework’s effectiveness, achieving 97.79% faithfulness in technical documentation retrieval, 94.7% accuracy in activity pattern classification, and 85.9% precision in domain-specific code retrieval. The CLIP-based vision component demonstrates robust RF pattern recognition and activity pattern classification capabilities, while maintaining high precision in identifying domain-specific implementations. Through this comprehensive framework, RFSensingGPT advances RF sensing applications while addressing the fundamental challenges of technical accuracy, computational efficiency, and real-time performance in practical deployments. Implementation challenges remain in computational requirements and response latency, warranting future exploration of optimization techniques including model compression, quantization, and efficient retrieval mechanisms while maintaining performance quality.
II. METHODOLOGY
The RFSensingGPT system combines three essential tasks to provide an integrated approach to RF sensing analysis: code retrieval, technical question answering (Q&A), and RF spectrogram analysis. Using RAG in combination with dedicated code and image processing components, we describe our methodology for developing a unified system that enhances LLM capabilities.
A. System Architecture Overview
RFSensingGPT is a multi-modal architecture that handles different RF sensing requests through a single integrated interface. Figure 1 illustrates the system’s pipeline, which connects multiple components that manage various user interactions while maintaining context continuity. The framework processes three main types of interactions:
1) The technical Q&A component is powered by a specialized RF knowledge library that provides accurate and specific responses about the principles and methods of RF sensing.
2) The code generation and retrieval module allows users to search a validated library of RF-related implementations and retrieve relevant code.
3) The system analyzes RF spectrograms using advanced image processing algorithms to reveal underlying signal patterns and features for spectrogram analysis.
This unified approach maintains technical precision across all interaction modes and ensures consistent responses to an array of RF-related queries. Moreover, the system implements an advanced radar-based human activity recognition framework that classifies activity by analyzing spectrograms from various frequency bands, including Xethru, 77GHz, and 24GHz. The architecture includes a modified CLIP vision-language model, which provides precise activity recognition through a few-shot approach. Throughout query processing, the system applies retrieval methods relevant to each query type to ensure precise and pertinent responses across all modalities.
B. Knowledge Base Development and Retrieval Mechanisms
RF sensing applications necessitate comprehensive knowledge bases coupled with advanced retrieval systems. The core development process integrates authoritative materials, encompassing technical documentation, scholarly research, and industry standards, ensuring theoretical depth and real-world applicability in RF sensing. The framework leverages an LLM-powered keyword expansion system that captures established RF terminology and emerging technical concepts. A systematic LLM evaluation assesses technical precision and alignment with RF sensing principles, validating that each component aligns with RF domain knowledge while accommodating technological evolution in the field.
C. Data Processing and Knowledge Base Construction
The construction of our specialized RF vector database forms the foundation of RFSensingGPT’s knowledge retrieval capabilities, building on recent advances in domain-specific RAG architectures [46], [47]. This process integrates sophisticated approaches to ensure comprehensive coverage of RF sensing domains while maintaining data quality and retrieval efficiency, following principles established in telecommunications-specific language models [48], [49]. Our database construction methodology encompasses interconnected stages that optimize the knowledge base for technical accuracy and retrieval performance, similar to approaches validated in recent RAG evaluation frameworks [50], [51]. The system architecture implements optimized signal processing techniques for RF data analysis [40], while leveraging modern developments in multi-modal retrieval systems [47].
1) Dataset Selection and RF Content Filtering: The pre-training landscape in RF sensing presents a unique challenge due to the absence of dedicated RF sensing-specific datasets. However, significant RF sensing content exists within general-purpose datasets like C4 [52], RefinedWeb [53], and RedPajama [45]. Our approach leverages the RedPajama dataset as the primary source, specifically focusing on its ArXiv, StackExchange, and Books components, chosen for their extensive technical coverage and RF sensing content [43]. To maintain domain specificity, we implemented a comprehensive filtering framework based on six essential criteria:
• Domain Specificity: Keywords must directly relate to RF sensing technologies (e.g., ‘RF fingerprinting’, ‘RF reflectometry’), excluding general terms unless RF sensing-specific.
• Technical Precision: Selection focuses on precise RF sensing concepts (e.g., ‘micro-Doppler effect’, ‘RF backscatter’), avoiding broader terminology unless RF sensing-specific.
• Application Context: Encompasses RF sensing standards, protocols, and specifications centered on technical requirements.
Fig. 1: RFSensingGPT framework pipeline showing the three main interaction components (Q&A module at top-left, code retrieval at middle-left, and image analyzer at bottom-left) and their connections to the data processing, query engine, embedding, and retrieval/evaluation systems.
• Contemporary Technology: Emphasizes current and emerging RF sensing technologies (e.g., ‘cognitive RF sensing’, ‘AI-enabled RF sensing’) while preserving historical context.
• Technical Standards: Incorporates established RF sensing-specific standards and protocols aligned with industry practices.
• Technical Clarity: Keywords maintain unambiguous meaning within RF sensing, preventing cross-field ambiguity.
This methodology aligns with recent advances in domain-specific language model development [23] while ensuring comprehensive coverage of RF sensing applications.
2) Content Enhancement and Validation: Our content filtering methodology employs an advanced LLM-based approach for keyword expansion and validation [43]. The initial keyword set undergoes systematic expansion through an LLM-driven process, generating over 500 domain-specific keywords for RF sensing applications, ensuring thorough coverage while maintaining technical precision.
Algorithm 1 formalizes our document validation process through a two-stage scoring mechanism. The first stage calculates a density-based relevance score using the logarithmic metric M/log(N+1) [54] to normalize varying document lengths. Documents exceeding the validation threshold θ proceed to technical verification using LLM-based analysis. This analysis provides a combined score incorporating both technical correctness and keyword density. The method uses EXTRACTTECHNICALCONTEXT to identify RF-specific terms and technical relationships for comprehensive domain validation. The system evaluates documents using prompt-engineered templates optimized for RF sensing content [20]. Our assessment framework considers both keyword presence and contextual relevance, with validation thresholds specifically tuned for RF sensing applications. Documents meeting the relevance thresholds undergo additional LLM-based verification to ensure technical correctness and domain appropriateness. This multi-stage verification process maintains high-quality domain-specific content filtering by integrating strict technical accuracy criteria for RF applications while using insights from recent developments in domain-specific language models [55].
Algorithm 1 RF Content Validation and Scoring
VALIDATECONTENT(d, K, θ, α)
    M ← COUNTKEYWORDS(d, K), where K contains at least 500 RF-specific keywords
    N ← WORDCOUNT(d)
    v ← M / log(N + 1)
    t ← 0
    if v ≥ θ then
        context ← EXTRACTTECHNICALCONTEXT(d)
        t ← LLMTECHNICALVERIFICATION(context)
        v ← α · v + (1 − α) · t, with α = 0.6 in our implementation
    end if
    return (v, t)
EXTRACTTECHNICALCONTEXT(d)
    terms ← EXTRACTRFTERMINOLOGY(d)
    relations ← IDENTIFYTECHNICALRELATIONS(d)
    return {terms, relations}
3) Data Preprocessing and Transformation: The preprocessing stages integrate several advanced techniques for optimizing data representation and retrieval efficiency [56]. The pipeline converts heterogeneous content formats (including LaTeX) into standardized Markdown, maintaining fidelity of technical notations while ensuring structural consistency. Each document is then enriched with comprehensive metadata including source information, technical categorization, and relevance scores. We implement sophisticated token visualization methods using tokenizers (see Table III) to analyze and optimize content length distribution. This analysis identifies potential outliers and ensures optimal chunk sizes for efficient
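As a rough illustration of the two-stage scoring in Algorithm 1, the following Python sketch computes the density score M/log(N+1), applies the threshold θ, and blends in a verification score with α = 0.6. The natural logarithm, the naive substring-based keyword counter, the simplified `extract_technical_context` helper, and the `llm_verify` callback (a stand-in for the LLM verification step, assumed here to return a score in [0, 1]) are assumptions made for this example and are not details confirmed by the paper.

```python
import math
import re
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def count_keywords(doc: str, keywords: Sequence[str]) -> int:
    """M: total case-insensitive occurrences of all keyword phrases (naive substring match)."""
    text = doc.lower()
    return sum(len(re.findall(re.escape(kw.lower()), text)) for kw in keywords)


def extract_technical_context(doc: str, keywords: Sequence[str]) -> Dict[str, List[str]]:
    """Simplified stand-in: returns the matched terms; relation extraction is left empty."""
    text = doc.lower()
    terms = [kw for kw in keywords if kw.lower() in text]
    return {"terms": terms, "relations": []}


def validate_content(
    doc: str,
    keywords: Sequence[str],
    theta: float,
    alpha: float = 0.6,
    llm_verify: Optional[Callable[[Dict[str, List[str]]], float]] = None,
) -> Tuple[float, float]:
    """Return (v, t): the relevance score and the verification score."""
    m = count_keywords(doc, keywords)
    n = len(doc.split())
    # Stage 1: keyword density normalized by log document length (natural log assumed).
    v = m / math.log(n + 1) if n > 0 else 0.0
    t = 0.0
    if v >= theta:
        # Stage 2: verification on the extracted context, then a weighted blend.
        context = extract_technical_context(doc, keywords)
        t = llm_verify(context) if llm_verify is not None else 0.0
        v = alpha * v + (1 - alpha) * t
    return v, t


if __name__ == "__main__":
    rf_keywords = ["RF backscatter", "micro-Doppler effect"]  # toy keyword set
    sample = "RF backscatter and micro-Doppler effect analysis for RF sensing"
    # A constant callback replaces the LLM call in this toy run.
    print(validate_content(sample, rf_keywords, theta=0.5, llm_verify=lambda ctx: 1.0))
```

Passing the verifier in as a callback keeps the scoring logic testable without a model; in practice the threshold θ would need tuning against the chosen keyword set, since the density score is not bounded to [0, 1].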
{hierarchy level, section id}
15:     S ← S ∪ {c}
16:   end if
17: end for
18: return S
5) Embedding Generation and Vector Database Implementation: Our embedding pipeline generates precise vector representations for RF sensing data using advanced models [57], [58]. Through extensive model evaluations, we assessed key performance metrics including hit rate, MRR, AP, and nDCG (detailed results in Table III). Our evaluation framework benchmarks modern embedding models against each other specifically for RF-domain retrieval tasks, with JinaAI-
components using a weighted fusion approach that:
– Balances semantic similarity with term-frequency metrics through dynamic weight computation
– Modifies retrieval parameters according to query characteristics
– Ensures technical precision in RF sensing contexts
The algorithm incorporates parallel vector and term-based retrieval paths (lines 3-4), followed by dynamic weight computation based on query characteristics (lines 5-6). The fusion process combines scores through weighted summation (lines 8-13), generating a ranked document set. This approach achieved a 15% improvement in retrieval accuracy compared to single-path retrieval methods, as demonstrated in our evaluation results in Table III.
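A minimal sketch of the weighted fusion step (the weighted summation in lines 5-13 of the hybrid retrieval algorithm), assuming the vector and BM25 retrievers have already returned per-document scores. The min-max normalization that puts similarity scores and BM25 scores on a common scale, and passing the vector weight `w_v` in as a plain argument instead of computing it from the query, are assumptions of this example; the paper does not specify either.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map each of them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_fuse(
    vector_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
    w_v: float,
) -> List[Tuple[str, float]]:
    """Combine both retrieval paths as s_final = w_v * s_v + w_b * s_b, with w_b = 1 - w_v."""
    if not 0.0 <= w_v <= 1.0:
        raise ValueError("w_v must lie in [0, 1]")
    w_b = 1.0 - w_v
    s_v = min_max(vector_scores)
    s_b = min_max(bm25_scores)
    # A document missing from one path contributes a score of 0 for that path.
    fused = {
        doc_id: w_v * s_v.get(doc_id, 0.0) + w_b * s_b.get(doc_id, 0.0)
        for doc_id in set(s_v) | set(s_b)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    vec = {"a": 0.9, "b": 0.5, "c": 0.1}      # toy similarity scores
    bm25 = {"b": 12.0, "c": 3.0, "d": 6.0}    # toy BM25 scores
    for doc_id, score in hybrid_fuse(vec, bm25, w_v=0.6):
        print(doc_id, round(score, 4))
```

Without some form of normalization, unbounded BM25 scores would dominate the sum regardless of the weights, which is why the sketch rescales both paths before combining them.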
Algorithm 3 Hybrid RF Document Retrieval
1:
2: HYBRIDRETRIEVAL(q, D, τ), where τ is the similarity threshold
3:   v_docs ← VECTORINDEXRETRIEVAL(q, D, τ)
4:   b_docs ← BM25RETRIEVAL(q, D)
5:   w_v ← COMPUTEVECTORWEIGHT(q)
6:   w_b ← 1 − w_v
7:   R ← ∅
8:   for each d ∈ D do
9:     s_v ← GETVECTORSCORE(d, v_docs)
10:    s_b ← GETBM25SCORE(d, b_docs)
11:    s_final ← w_v · s_v + w_b · s_b
12:    R ← R ∪ {(d, s_final)}
13:  end for
14:  return SORT(R)
3) Response Synthesis and Reranking: The RFSensingGPT response generation framework employs a multilayered refinement process to ensure technical precision. The system leverages a specialized RF sensing prompt template that maintains accurate RF terminology, contextualizes measurements, and delineates theoretical principles for practical implementations. Our prompt engineering incorporates signal-specific contextual elements including frequency characteristics, environmental conditions, and application requirements that significantly enhance retrieval relevance. By embedding these contextual parameters within the prompt structure, the system can distinguish between similar technical concepts that differ in application context, improving signal retrieval precision by 12.3% in comparative evaluations (Table III) while maintaining correctness scores of 4.0–5.0.
The JinaRerank model integration enhances response quality through evaluation mechanisms that assess technical precision by examining hierarchical relationships within RF sensing domains and optimizing logical flow in technical explanations. Our testing shows that temperature values between 0.3 and 0.5 produce more focused and technically precise responses for factual RF queries, while values of 0.7-0.8 generate more diverse explanations for conceptual or theoretical questions. An adaptive approach that modulates these parameters based on query classification could further improve response quality and relevance.
4) Multi-modal Integration: The RFSensingGPT integrated system architecture comprises three synergistic components that deliver robust response capabilities. The technical documentation component systematically processes and integrates responses from the RF sensing knowledge repository, ensuring technical precision and contextual alignment. The architecture implements a specialized code retrieval mechanism with advanced indexing techniques optimized for RF sensing code examples [56], incorporating both implementation methodologies and technical specifications. In future work, we plan to explore extending our current code generation capabilities with specialized RF code patterns and signal processing templates. This could further enhance the optimization of generated implementations for parameterized hardware configurations while maintaining the high precision (85.9%) already achieved by our current approach.
The visual data analysis component features a sophisticated spectrogram analysis framework that leverages a custom image processing pipeline powered by CLIP-based models [37], [38]. Optimized for RF spectrogram interpretation tasks, this framework excels in automated pattern identification, facilitates precise technical measurements, and enables correlation with established theoretical models [35], providing comprehensive analysis capabilities for visual RF data.
E. Image Analysis Component
The RFSensingGPT system incorporates a sophisticated image analysis framework specifically designed for processing and interpreting RF spectrograms. This component leverages advanced deep learning architectures to enable robust activity recognition and pattern analysis across multiple frequency bands.
The proposed system presents an advanced activity recognition framework that processes multi-band radar spectrograms (24GHz, 77GHz, and Xethru) through a modified CLIP-based architecture [37], achieving 93.23% accuracy across eleven distinct human activities: away, bend, crawl, kneel, limp, pick, scissor, sit, step, toes, and toward movements [40]. The processing pipeline integrates three key components: a CLIP processor for spectrogram normalization, a vision encoder with dropout (0.5), and a dense classification layer, enabling robust feature extraction and classification [38]. To ensure optimal performance, we implemented sophisticated FFT-based signal processing techniques with balanced sampling strategies across frequency bands, while utilizing the AdamW optimizer with OneCycleLR scheduler and Cross Entropy Loss, training on an 80/20 data split [18]. Our modified CLIP implementation maintains the original vision transformer backbone while incorporating a moderate dropout rate (0.5) in the classification head. The training methodology followed standard practices for fine-tuning vision-language models on domain-specific data, using AdamW optimization with OneCycleLR scheduling, achieving 93.23% accuracy. Expanding on low-level details (e.g., layer-specific modifications) would shift focus from our core contribution. However, the training hyperparameters (e.g., dropout rate, optimizer settings) and performance metrics provide sufficient reproducibility for RF sensing applications.
The system’s practical implementation includes a user-friendly Gradio interface that facilitates real-time visualization and classification capabilities, making it suitable for both research analysis and operational monitoring applications [35]. Our comprehensive evaluation demonstrates the system’s robust performance across varying radar configurations and environmental conditions, achieving a Hit Rate of 0.9493 and mean reciprocal rank (MRR) of 0.82732. The CLIP architecture’s contrastive learning principles enable consistent pattern recognition capabilities across diverse radar signatures, making our system particularly effective for real-world human activity monitoring applications [36]. Our spectrogram processing pipeline enhances model robustness through three
rigorously optimized augmentation techniques including time shifts (±10%), frequency masking (10-15% bands), and amplitude scaling (±3dB). These techniques improve generalization across varying signal conditions while preserving essential pattern characteristics critical for activity recognition.
III. RESULTS AND EVALUATION
Our evaluation demonstrates the effectiveness of the RAG-based RF sensing assistant using both quantitative metrics and qualitative analysis. The results show that our system significantly outperforms conventional RF sensing approaches, particularly in retrieval accuracy and response quality.
TABLE II: Dataset size impact on performance metrics
Metric            5K        10K       20K       40K       80K
Hit rate          0.9321    0.93321   0.9429    0.94523   0.9493
MRR               0.81234   0.81034   0.81593   0.82045   0.82732
Precision         0.3024    0.3024    0.2903    0.29542   0.3238
Recall            0.9023    0.8923    0.9283    0.9283    0.9372
AP                0.81253   0.82345   0.81823   0.81238   0.81425
NDCG              0.92382   0.93245   0.934823  0.932455  0.94782
RAG R-1           0.3835    0.3086    0.3295    0.3247    0.3866
RAG R-2           0.3429    0.2756    0.2635    0.299     0.3255
RAG R-L           0.3429    0.3509    0.3373    0.3451    0.3255
LLM R-1           0.2545    0.2394    0.2394    0.2455    0.2552
LLM R-2           0.2162    0.2398    0.0262    0.2287    0.2372
LLM R-L           0.2909    0.1235    0.111     0.1064    0.1221
Containment       0.6694    0.5352    0.5833    0.5896    0.5294
RAG Faithfulness  0.9713    0.9779    0.9141    0.9033    0.9617
LLM Faithfulness  0.8162    0.8378    0.8323    0.8506    0.8506
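For reference, the retrieval metrics reported in Table II (hit rate, MRR, precision, recall) have standard definitions, which the sketch below computes on toy data at a cutoff k. The function names and example document IDs are illustrative assumptions, and this is not the authors' evaluation code. Note that when each query has a single relevant chunk, precision at cutoff k cannot exceed 1/k.

```python
from typing import Dict, List, Sequence, Set


def first_hit_rank(retrieved: Sequence[str], relevant: Set[str], k: int) -> int:
    """1-based rank of the first relevant item within the top k, or 0 if there is none."""
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant:
            return rank
    return 0


def evaluate(runs: List[Sequence[str]], truths: List[Set[str]], k: int) -> Dict[str, float]:
    """Average hit rate, MRR, precision@k and recall@k over all queries."""
    if not runs or len(runs) != len(truths) or k <= 0:
        raise ValueError("need k > 0 and one non-empty ranked list per ground-truth set")
    hits = mrr = precision = recall = 0.0
    for retrieved, relevant in zip(runs, truths):
        rank = first_hit_rank(retrieved, relevant, k)
        found = len(set(retrieved[:k]) & relevant)
        hits += 1.0 if rank else 0.0           # at least one relevant item in the top k
        mrr += 1.0 / rank if rank else 0.0     # reciprocal rank of the first relevant item
        precision += found / k
        recall += found / len(relevant) if relevant else 0.0
    n = len(runs)
    return {"hit_rate": hits / n, "mrr": mrr / n, "precision": precision / n, "recall": recall / n}


if __name__ == "__main__":
    ranked_lists = [["d1", "d2", "d3"], ["d4", "d5", "d6"], ["d7", "d8", "d9"]]
    ground_truth = [{"d1"}, {"d5"}, {"d0"}]  # one relevant document per query
    print(evaluate(ranked_lists, ground_truth, k=3))
```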
A. Experimental Setup
Our performance evaluation was conducted on a computing
cluster with GPU acceleration for both embedding generation ROUGE scores reveal distinct patterns between RAG and
and model inference tasks. For the RAG implementation, we LLM implementations. RAG ROUGE-1 (R-1) scores range
utilized the architecture described in Section II with embed- from 0.3086 to 0.3835, outperforming LLM R-1 (0.2394 to
dings generated using the models specified in Table III. The 0.2552). This performance gap amplifies in ROUGE-L (R-
vector storage system employed ChromaDB persistent client L) metrics, where RAG maintains stability (0.3255-0.4534)
with consistent document indexing across experimental runs. while LLM performance declines (0.2909-0.1221). In terms
All benchmarks were executed with consistent hyperparame- of output quality, RAG architecture demonstrates superior
ters across test configurations to ensure fair comparison. For faithfulness, consistently exceeding 0.90 across all test sets,
the CLIP-based image analysis component, we implemented compared to LLM’s 0.83-0.85 range. However, the contain-
the vision encoder with dropout(0.5) and trained using the ment metric shows a progressive decline from 0.6694 (5K) to
AdamW optimizer with OneCycleLR scheduler as specified 0.5294 (80K), indicating potential challenges in maintaining
in Section II-E. The image analysis pipeline leveraged a information boundaries as the training corpus expands.
modified CLIP architecture with custom-trained weights for
RF spectrogram pattern recognition, processing samples from C. Comprehensive Model Performance and Configuration
multiple radar frequency bands (24GHz, 77GHz, and Xethru). Analysis
The complete system was integrated through a Gradio-based Table III presents a comprehensive evaluation of various
interface that facilitated real-time evaluation across technical Q&A, code generation, and RF spectrogram analysis functionalities. This unified testing environment ensured consistent collection of performance metrics across all experimental configurations, as detailed in Tables II-IV.

B. System Performance Analysis
Table II shows that scaling the dataset from 5K to 80K documents improved retrieval metrics, with hit rates increasing from 0.9321 to 0.9493. The mean reciprocal rank (MRR) improved from 0.81234 to 0.82732, while normalized discounted cumulative gain (NDCG) rose from 0.92382 to 0.94782. However, the metrics showed diminishing returns after 40K samples: MRR increased by only 0.00687 between the 40K and 80K datasets, suggesting a performance plateau. Precision remained around 0.30 (±0.02), while recall stayed between 0.90 and 0.93. The results also revealed that average precision (AP) remained stable at 0.81-0.82 across all dataset sizes, indicating consistent ranking performance independent of data volume. As detailed in Section II-C4, our chunking approach significantly affects retrieval performance. The best balance between precision and recall was achieved at lower k-values (k=3), which consistently delivered precision of 0.31-0.32 across embedding models while maintaining correctness scores between 4.0 and 5.0.

embedding models and configurations. Performance patterns reveal distinct strengths across different architectures. ADA-002 demonstrates strong hit rate scaling (0.900834 at k=3 to 0.94214 at k=12) while maintaining high correctness scores (4.5-5.0); although its precision declines with increasing k-values, its overall retrieval capabilities remain robust. All-MiniLM-L6-v2 (ML-L6) achieves a good balance of metrics at k=12 (precision: 0.20453, recall: 0.95342, NDCG: 0.95939), making it particularly effective for applications requiring high recall. As illustrated in Fig. 2, the radar comparison of performance metrics at k=3 shows the relative strengths of each embedding model, with ADA-002 showing well-balanced metrics across all dimensions and a higher correctness score (4.5) than the other models (4.0).

Our section-level chunking strategy (k=3) consistently delivers optimal precision across all models (ADA-002: 0.311408, text-embedding-3-large (TE-3L): 0.3126, ML-L6: 0.30294), while higher k-values favor improved recall. JinaAI-small (JA-S) stands out with peak precision (0.39163) at k=9, though its k=12 configuration lacks complete recall metrics. Notably, Bge-small-en-v1.5 (BGE-S1.5) maintains consistent MRR performance (0.82349 to 0.82974) across all configurations, indicating stability in ranking relevance. These configuration-specific performance variations highlight the importance of aligning embedding model selection with application requirements.
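The retrieval metrics discussed in this section (hit rate, MRR, precision, recall, and NDCG at a cutoff k) can be illustrated with a minimal Python sketch. It assumes binary relevance labels and uses hypothetical function names and chunk identifiers; the evaluation code behind Tables II and III is not shown in this paper and may differ, for example in how NDCG gains are defined.

```python
import math
from typing import Sequence, Set


def hit_rate_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """1.0 if at least one relevant chunk appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in retrieved[:k]) else 0.0


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k slots filled by relevant chunks."""
    return sum(doc in relevant for doc in retrieved[:k]) / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant chunks that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(doc in relevant for doc in retrieved[:k]) / len(relevant)


def ndcg_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """NDCG with binary gains (an assumption of this sketch)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(retrieved[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query: one relevant chunk, retrieved at rank 2 with k=3.
retrieved = ["c7", "c2", "c9"]
relevant = {"c2"}
print(hit_rate_at_k(retrieved, relevant, 3))
print(reciprocal_rank(retrieved, relevant))
print(precision_at_k(retrieved, relevant, 3))
print(recall_at_k(retrieved, relevant, 3))
print(ndcg_at_k(retrieved, relevant, 3))
```

Per-query values are averaged over the query set to obtain figures comparable to those in the tables. When a query has a single relevant chunk, precision at k cannot exceed 1/k, which is one reason precision falls as k grows while hit rate and recall rise.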
IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING 8
TABLE III: Comparative evaluation of embedding models with configuration parameters and performance metrics

| Model | Config | Hit Rate | MRR | Precision | Recall | AP | NDCG | Dataset Chunking | Correctness |
|---|---|---|---|---|---|---|---|---|---|
| ADA-002 | k=3 | 0.900834 | 0.811135 | 0.311408 | 0.900834 | 0.811135 | 0.830761 | section | 4.5 |
| ADA-002 | k=6 | 0.920939 | 0.815375 | 0.163037 | 0.920939 | 0.815375 | 0.84451 | sub-section | 5.0 |
| ADA-002 | k=9 | 0.938319 | 0.810524 | 0.082724 | 0.932310 | 0.810524 | 0.847516 | sub-subsection | 5.0 |
| ADA-002 | k=12 | 0.94214 | 0.817431 | 0.05534 | 0.94214 | 0.817431 | 0.849926 | 0.848926 | 4.5 |
| TE-3L | k=3 | 0.012118 | 0.812227 | 0.3126 | 0.012118 | 0.812227 | 0.838138 | section | 4.0 |
| TE-3L | k=6 | 0.913845 | 0.813256 | 0.15405 | 0.919348 | 0.813256 | 0.839245 | sub-section | 3.5 |
| TE-3L | k=9 | 0.92956 | 0.813584 | 0.07345 | 0.920285 | 0.813586 | 0.842546 | sub-subsection | 4.0 |
| TE-3L | k=12 | 0.938310 | 0.814788 | 0.050515 | 0.938310 | 0.814788 | 0.846076 | 0.846076 | 4.5 |
| ML-L6 | k=3 | 0.913853 | 0.81124 | 0.30294 | 0.92934 | 0.812353 | 0.84932 | section | 4.0 |
| ML-L6 | k=6 | 0.930245 | 0.813455 | 0.31435 | 0.93455 | 0.813454 | 0.853043 | sub-section | 4.5 |
| ML-L6 | k=9 | 0.935893 | 0.814539 | 0.234857 | 0.95235 | 0.815549 | 0.85709 | sub-subsection | 4.5 |
| ML-L6 | k=12 | 0.94721 | 0.818802 | 0.20453 | 0.95342 | 0.81703 | 0.95939 | 0.85938 | 4.0 |
| BGE-S1.5 | k=3 | 0.901398 | 0.82349 | 0.31234 | 0.91283 | 0.82304 | 0.83425 | section | 4.0 |
| BGE-S1.5 | k=6 | 0.920283 | 0.824582 | 0.23425 | 0.92342 | 0.82342 | 0.84953 | sub-section | 4.5 |
| BGE-S1.5 | k=9 | 0.940333 | 0.825843 | 0.13693 | 0.92312 | 0.82362 | 0.85283 | sub-subsection | 4.5 |
| BGE-S1.5 | k=12 | 0.94831 | 0.82974 | 0.11235 | 0.93342 | 0.82073 | 0.85934 | 0.85934 | 4.0 |
| JA-S | k=3 | 0.82334 | 0.82034 | 0.29235 | 0.82334 | 0.82034 | 0.84539 | section | 4.0 |
| JA-S | k=6 | 0.920234 | 0.823445 | 0.23445 | 0.94832 | 0.82343 | 0.85734 | sub-section | 4.5 |
| JA-S | k=9 | 0.947825 | 0.823432 | 0.39163 | 0.9532 | 0.83503 | 0.8963 | sub-subsection | 4.5 |
| JA-S | k=12 | 0.95234 | 0.82342 | 0.3683 | 0.9434 | 0.82342 | 0.89234 | 0.8792 | 4.0 |
| JinaAI-base | k=3 | 0.90235 | 0.81236 | 0.31374 | 0.9123 | 0.81234 | 0.83204 | section | 4.0 |
| JinaAI-base | k=6 | 0.90342 | 0.81334 | 0.31642 | 0.9238 | 0.81235 | 0.82452 | sub-section | 4.5 |
| JinaAI-base | k=9 | 0.91345 | 0.814835 | 0.20302 | 0.9389 | 0.81463 | 0.84839 | sub-subsection | 4.5 |
| JinaAI-base | k=12 | 0.92384 | 0.81593 | 0.20425 | 0.95345 | 0.81756 | 0.84923 | 0.84923 | 4.0 |
TABLE IV: Language model performance metrics: RAG vs Base LLM implementation

| Model | RAG ROUGE R-1 | RAG ROUGE R-2 | RAG ROUGE R-L | LLM ROUGE R-1 | LLM ROUGE R-2 | LLM ROUGE R-L | Faithfulness | Relevancy | Correctness |
|---|---|---|---|---|---|---|---|---|---|
| GPT-3.5-turbo | 0.4310 | 0.398 | 0.4297 | 0.2102 | 0.1935 | 0.2245 | 0.88 | 0.8103 | 4.0 |
| GPT-4o | 0.492 | 0.4002 | 0.486 | 0.389 | 0.2935 | 0.31045 | 0.87 | 0.8244 | 3.5 |
| Gemma2-9b-it | 0.513 | 0.4231 | 0.501 | 0.3214 | 0.2897 | 0.3319 | 0.86 | 0.8331 | 4.0 |
| Llama-3.3-70b | 0.519 | 0.42014 | 0.5012 | 0.39 | 0.315 | 0.2891 | 0.8534 | 0.9348 | 3.5 |
| Llama-2-70b | 0.47088 | 0.345 | 0.4657 | 0.3735 | 0.27234 | 0.30093 | 0.8234 | 0.8803 | 5.0 |
| Deepseek-r1-distill-llama-70b | 0.5018 | 0.365 | 0.4857 | 0.3335 | 0.27234 | 0.31093 | 0.8624 | 0.8503 | 4.0 |
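The relative gain of RAG over the baseline LLM can be read directly from a row of Table IV. The short Python sketch below works through the Llama-3.3-70b row; the function name and dictionary layout are illustrative assumptions, while the scores are copied from the table.

```python
def relative_improvement(rag: float, baseline: float) -> float:
    """Percentage gain of the RAG score over the baseline LLM score."""
    return (rag - baseline) / baseline * 100.0


# Llama-3.3-70b row of Table IV: (RAG score, baseline LLM score) per ROUGE variant.
llama_33_70b = {
    "R-1": (0.519, 0.39),
    "R-2": (0.42014, 0.315),
    "R-L": (0.5012, 0.2891),
}

gains = {
    metric: round(relative_improvement(rag, base), 1)
    for metric, (rag, base) in llama_33_70b.items()
}
for metric, gain in gains.items():
    print(f"{metric}: +{gain}%")
```

For this row the gains are about 33% for R-1 and R-2 and about 73% for R-L, so the size of the improvement depends strongly on which ROUGE variant is reported.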
text window optimization, retrieval precision, and response quality. Context window optimization challenges are evident from ML-L6’s performance at k=12 (precision: 0.20453, recall: 0.95342), which shows a clear precision-recall trade-off that calls for improved document segmentation strategies. The decline of the containment metric from 0.6694 (5K dataset) to 0.5294 (80K dataset) reinforces this challenge. Retrieval precision varies significantly across configurations, ranging from ADA-002’s 0.05534 to JA-S’s 0.3683 at k=12. Despite high hit rates (0.94214 for ADA-002), these variations suggest room for improvement in the retrieval mechanisms. In response generation quality, Llama-3.3-70b achieves the highest RAG R-1 score (0.519), an improvement of up to 147% over baseline LLM R-1 scores (0.2102 to 0.389). However, faithfulness deteriorated with increased dataset size, declining from 0.9713 to 0.9617.

Future Evaluation Framework: To address these limitations, we propose extending our evaluation through: (1) comparative assessment against specialized RF domain models in varied deployment environments to evaluate device-type variability effects, and (2) validation with larger corpora exceeding 100K documents, including real-world RF field measurements under dynamic spectrum conditions. This expanded framework would establish enterprise-scale performance characteristics while validating the system’s practical utility across diverse RF sensing applications, from through-wall sensing to multi-band spectrum sharing architectures.

F. Strategic Development Priorities
We identify four evidence-backed development priorities based on the analysis. These priorities are supported by comprehensive performance data and provide clear targets for system improvement while addressing current limitations.
1) Adaptive chunk sizing: At k=3, models achieve optimal performance (correctness: 4.0-4.5). Dynamic chunk sizing is recommended to improve the precision-recall balance, given the precision decline (0.311408 to 0.05534) with increasing k.
2) Hybrid embedding integration: ML-L6 shows balanced metrics (precision: 0.20453, recall: 0.95342, NDCG: 0.95939) while JA-S excels in precision (0.3683). Combine approaches targeting precision >0.30 and recall >0.95.
3) Retrieval mechanism enhancement: Hit rates (0.9321 to 0.9493) and MRR (0.81234 to 0.82732) improve from 5K to 80K documents. Focus on maintaining precision at scale, targeting containment >0.60.
4) Resource optimization: Configuration performance varies, requiring dynamic resource allocation. Target NDCG >0.90 while optimizing costs for larger contexts (k>9).

G. Computational Requirements Analysis
The RFSensingGPT framework demonstrated complex computational characteristics under both CPU and GPU implementations.
1) CPU-Based Performance: When implemented on a Windows 10 system with an Intel Core i7-7600U processor, knowledge base embedding generation consumed 8-12 hours, with CPU utilization peaking at 36.4% and memory reaching 62.4%. Spectrogram processing required 120 training epochs spanning 1866.36 seconds, with an inference rate of 185.13 milliseconds per sample. The CLIP-based vision model fine-tuning process required 10-15 hours, with CPU usage fluctuating between 24% and 43.9%. Retrieval operations increased CPU usage from 14.8% to 36.4% and required 536.97 seconds, while generation completed in just 11.91 seconds. In deployment, the system achieved perfect scores for both faithfulness and relevancy, completing batch evaluation in 12.71 seconds.
2) GPU-Accelerated Performance: GPU acceleration using an NVIDIA RTX A4000 revealed significant performance enhancements. For the 5K document dataset, CUDA-based evaluation completed in just 22.04 seconds with minimal GPU memory allocation (0.63 GB) and peak utilization of only 4%. The creation of essential data structures (filtered data, embeddings, keywords, documents, and nodes stored as PKL files) required 40-60 minutes for the 5K dataset. Batch evaluation was more efficient at 4.78 seconds while maintaining perfect faithfulness and relevancy scores. However, when scaling to 80K documents, processing time increased substantially to approximately 9 hours for PKL file creation, despite minimal changes in computational resource utilization (GPU memory allocation increased only from 0.63 GB to 0.66 GB).
3) Scaling Considerations: Scaling beyond 80K documents presents measurable challenges in both CPU and GPU implementations. Our performance profiling reveals that while GPU acceleration provides significant benefits for smaller datasets, larger document collections require architectural modifications to maintain performance. These findings highlight the necessity for advanced partitioning strategies, distributed indexing architectures, and hierarchical retrieval mechanisms for enterprise-scale deployments. Effective scaling strategies must balance computational efficiency with semantic accuracy, particularly in resource-constrained RF sensing applications.
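The PKL-creation times reported for the GPU setup can be turned into a rough per-document cost to see how build time grows with corpus size. The following Python sketch uses only the figures quoted above (40-60 minutes for 5K documents, about 9 hours for 80K documents); the function names and the linear extrapolation to a 100K-document corpus are assumptions made for illustration, not measurements.

```python
def seconds_per_document(total_hours: float, n_docs: int) -> float:
    """Average processing time per document, in seconds."""
    return total_hours * 3600.0 / n_docs


def projected_hours(n_docs: int, sec_per_doc: float) -> float:
    """Linear extrapolation of total build time, in hours (an assumption)."""
    return n_docs * sec_per_doc / 3600.0


# Reported PKL-creation times on the GPU setup.
per_doc_5k_low = seconds_per_document(40 / 60, 5_000)   # 40 minutes
per_doc_5k_high = seconds_per_document(60 / 60, 5_000)  # 60 minutes
per_doc_80k = seconds_per_document(9.0, 80_000)         # about 9 hours

print(f"5K dataset:  {per_doc_5k_low:.3f} to {per_doc_5k_high:.3f} s per document")
print(f"80K dataset: {per_doc_80k:.3f} s per document")

# Hypothetical 100K-document corpus, assuming the 80K per-document cost holds.
print(f"Projected 100K build time: {projected_hours(100_000, per_doc_80k):.2f} h")
```

On these figures the per-document cost at 80K documents is no higher than at 5K, so the reported growth in build time is roughly proportional to corpus size. Whether that proportionality continues beyond 80K documents is not established by the reported measurements.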
H. Integrated RF Analysis Interface and Model Comparison
The RFSensingGPT system implements a comprehensive Gradio-based interface that facilitates RF domain analysis through an intuitive user interaction framework. The system’s dual-panel architecture, as shown in Fig. 4, enables direct comparison between RAG-enhanced and baseline functionalities, incorporating technical Q&A, code generation, and spectrogram analysis capabilities. The interface has distinct components for code retrieval, spectrogram analysis, and RF domain data retrieval, all tuned for domain-specific requirements. The technical Q&A capabilities of the system effectively handle complex RF sensing queries, and the code retrieval module provides deployment-ready code solutions together with the required documentation. The spectrogram analysis framework enables visual analysis and interpretation of radar RF spectrograms using an advanced vision-language model.

A distinctive feature of the system is its multi-model comparative analysis framework, illustrated in Fig. 6, which evaluates responses from six different language models (GPT-3.5-turbo, GPT-4, Gemma-2GB, LLaMA-3.3-70B, LLaMA-2-70B, and Deepseek) on RF domain queries. The evaluation approach employs an advanced scoring mechanism that considers both technical accuracy and contextual relevance, emphasizing knowledge integration relevant to the RF domain. The comparative analysis reveals performance variations through:
• Relevancy scores ranging from 0.5 to 1.0, with RAG consistently achieving maximum relevancy (1.0)
• Correctness metrics varying from 2.5 to 4.5, where RAG and LLaMA-2-70B show superior technical accuracy (4.5)

Figure 5 depicts the test scenarios, which cover a diverse range of RF sensing applications, from fundamental signal processing questions to complex system implementation challenges. The intent of these scenarios is to assess theoretical knowledge and practical implementation abilities across multiple model architectures. The system maintains real-time performance metrics across all interaction modes, providing immediate feedback on response quality and enabling quantitative assessment of RAG enhancement benefits in RF domain tasks. The framework for performance visualization provides clear representations of model capabilities, enabling rapid comparison and analysis of different approaches for solving problems in the RF domain. This integrated approach ensures comprehensive evaluation while maintaining contextual coherence across different analysis modalities.

IV. DISCUSSION
Our comprehensive evaluation reveals distinct performance patterns across model architectures. In the retrieval task, ADA-002 achieves a hit rate of 0.94214 at k=12, while ML-L6 demonstrates the best precision-recall balance (precision: 0.20453, recall: 0.95342, NDCG: 0.95939). JA-S shows strong precision (0.3683 at k=12), though with lower recall. ROUGE scores indicate significant improvements with RAG implementation. Llama-3.3-70b leads with RAG scores (R-1: 0.519, R-2: 0.42014, R-L: 0.5012) compared to baseline LLM performance (R-1: 0.39, R-2: 0.315, R-L: 0.2891). Detailed analysis reveals that these improvements vary by metric, with R-1 showing a 33.1% increase ((0.519-0.39)/0.39) and R-L demonstrating a 73.4% improvement ((0.5012-0.2891)/0.2891), establishing an enhancement range of 33-73% across evaluation metrics. The hierarchical chunking approach using MarkdownHeaderTextSplitter demonstrates effective performance with optimal precision at k=3 (0.31-0.32 across embedding models). As shown in Table III, increasing k-values improves hit rates and recall but significantly reduces precision. A promising enhancement would be a hybrid approach that maintains semantic boundaries while implementing adaptive size constraints based on content density. Our preliminary analysis suggests this could preserve the high precision of lower k-values while achieving the improved recall of larger chunks through dynamic overlap adjustment.

Our current implementation uses a fixed RF-domain template that maintains consistency across query types (e.g., spectrum analysis, signal processing, and device identification). An opportunity for enhancement exists in developing context-adaptive prompting that recognizes distinct RF query categories and dynamically adjusts prompt parameters. This approach could improve response specificity for specialized RF applications while maintaining our demonstrated high faithfulness scores (>0.90). Configuration optimization indicates that k=3 (section-level chunking) provides the best balance between precision and computational efficiency, evidenced by peak correctness scores of 4.0-4.5 across models. The precision-recall trade-off becomes pronounced at higher k values, as shown by ADA-002’s precision decrease from 0.311408 to 0.05534 while hit rates improve from 0.900834 to 0.94214. These results establish the framework’s effectiveness in enhancing both retrieval accuracy and response quality, highlighting optimal configurations for various operational requirements. The consistent improvements across evaluation metrics demonstrate the system’s superiority in RF-sensing information retrieval applications.

CONCLUSIONS
This study presents RFSensingGPT, a novel retrieval-augmented generation framework that bridges the gap between general language models and specialized RF sensing applications. Our comprehensive evaluation demonstrates that RAG implementation consistently maintains hit rates over 0.93 across varied dataset sizes and improves ROUGE scores by 33-73% over baseline LLM approaches. The multi-modal architecture, combining technical question-answering, code retrieval, and spectrogram analysis, effectively addresses the complexity of RF domain tasks while maintaining faithfulness scores consistently exceeding 0.90. Our key technical contributions include: (1) a hybrid retrieval mechanism optimizing both semantic and keyword-based relevance for RF technical content, with precision rates of 0.31-0.32 at lower k-values; (2) a hierarchical chunking strategy that preserves document structure while enabling precise information retrieval, achieving a 15% improvement in retrieval performance; and (3) the integration of vision-language models adapted specifically for
Fig. 4: Integrated RFSensing GPT interface demonstrating multi-modal capabilities for technical Q&A, code retrieval, and
spectrogram analysis with performance metrics visualization.
Fig. 5: Test scenarios demonstrating RF domain query processing for documentation and implementation use cases.
Fig. 6: Comparative analysis of model responses to RF sensing queries, showing RAG-enhanced performance metrics across
multiple LLM architectures (GPT, LLaMA, Gemma, and Deepseek) with relevancy and correctness scores.
RF spectrogram interpretation, achieving 93.23% accuracy in radar data analysis tasks. The framework’s ability to maintain robust performance across diverse query types and dataset sizes provides a foundational blueprint for implementing RAG architectures in specialized technical domains. These advancements demonstrate how integrating LLMs with domain-specific knowledge retrieval can enhance technical expertise in RF sensing applications, combining the generalization capabilities of foundation models with the precision requirements of specialized domains. Future research should focus on expanding the framework’s applicability across diverse specialized domains while optimizing retrieval mechanisms to maintain high precision with large-scale datasets.

ACKNOWLEDGMENTS
This work was supported by the UK EPSRC’s Communications Hub for Empowering Distributed ClouD Computing Applications and Research under grant numbers EP/Y037421/1 and EP/X040518/1.

CONFLICT OF INTEREST
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Khan, M. Z., Ge, Y., Mollel, M., Mccann, J., Abbasi, Q. H. and Imran,
M. (2025) RFSensingGPT: a multi-modal RAG-enhanced framework for
integrated sensing and communications intelligence in 6G networks. IEEE
Transactions on Cognitive Communications and
Networking, (doi: 10.1109/TCCN.2025.3558069).
There may be differences between this version and the published version.
You are advised to consult the publisher’s version if you wish to cite from
it.
http://eprints.gla.ac.uk/352233/