
DEGREE PROJECT

AI in Automotive Repair:
Building a Data Driven Chatbot for
Enhanced Vehicle Diagnostics
(In Collaboration with Volvo Group)

Dipanwita Dash

Master Programme in Applied AI


2024

Luleå University of Technology


Department of Computer Science, Electrical and Space Engineering
Abstract

Generative Artificial Intelligence (AI) and Large Language Models (LLMs) present a promising
avenue to augment service delivery and customer satisfaction in many sectors, including
automotive repair. The traditional diagnostic systems in this sector support the "Triple C"
(Complaint, Cause, Correction, CCC) process of capturing Complaints, identifying Causes and
providing Corrections, but often suffer from inefficiencies, such as the under-utilization of
insights from historic cases stored in massive databases containing structured and unstructured
data. This results in increased costs and extended vehicle downtime due to repetitive or
misdiagnosed issues.
The primary objective of this research is to enhance the efficiency and accuracy of automotive
repair services by developing a chatbot system that can retrieve relevant CCC information
from a dataset of technician service and repair entries. The dataset is sourced from workshops
in several countries and includes technical codes and free-form text with vehicle and service
descriptions. In order to explore and overcome the infrastructure challenges of implementing
this system within the organizational setup, this thesis develops and analyzes two different
chatbot systems, both featuring a Retrieval Augmented Generation (RAG) framework to
augment Large Language Model (LLM) outputs. The first system is implemented on-premises
and integrates the Instructor XL embedding model, the Milvus vector database, and the
Mixtral 8x7B LLM. The second system operates within the Azure cloud environment,
employing the text-embedding-ada-002 model for embedding, Azure AI Search for vector
retrieval and GPT-3.5 Turbo as the LLM. Both systems are evaluated based on performance,
accuracy, scalability, and cost-effectiveness.
The on-premises system performs better in terms of speed and cost-effectiveness, whereas the
cloud-based system is superior in scalability, availability and support for searchable metadata.
The latter system was implemented by the author.
The main impact of this research is demonstrated through its contribution to the integration of
AI in automotive services, addressing critical aspects such as data privacy, system scalability,
and practical implementation of state-of-the-art AI technologies in an industry-specific con-
text. Recommendations for future research include multilingual support, enhanced interactions,
improved evaluation, exploration of hybrid architectural frameworks combining the strengths
of RAG with LLM fine-tuning, and extended integration with real-time vehicle data systems
for a comprehensive service experience.

Acknowledgments

I am deeply grateful to a number of individuals whose support and guidance were indispensable
throughout the journey of this thesis.
First, a special thanks goes to my thesis partner, Erfan Kanani (Stockholm University), whose
collaboration and insights were crucial in shaping many of the concepts and analyses presented
in this work. Working with him has been an enriching experience.
I would like to express my appreciation to my thesis supervisor Karl Löwenmark from Luleå Uni-
versity of Technology, whose expertise and insightful guidance have been invaluable throughout
this research.
I am also grateful to my Volvo supervisor, Leila Jamshidian Sales, for her continuous support
and insights that enhanced the quality of this thesis. I extend my gratitude to my colleagues at
Volvo Group, for their understanding and support, providing us with the necessary resources
and flexibility to pursue our research objectives.
Lastly, a big thanks to my family and friends who encouraged and supported me, directly or
indirectly, throughout my academic journey and this thesis.
This thesis would not have been possible without the collective support and encouragement of
everyone mentioned, and for that, I am truly grateful.

Contents

Acronyms 5

List of Symbols 6

List of Figures 7

List of Tables 8

1 Introduction 1
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Thesis Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.5 Usage of AI tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Background 4
2.1 Introduction to Chatbot Technologies . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Foundational Concepts in Conversational AI . . . . . . . . . . . . . . . . . . . . 4
2.2.1 Natural Language Processing . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.2 Traditional Chatbots Architecture . . . . . . . . . . . . . . . . . . . . . . 5
2.2.3 Core Deep Learning Technologies . . . . . . . . . . . . . . . . . . . . . . 5
2.2.4 Evolution of LLMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2.5 Retrieval Augmented Generation . . . . . . . . . . . . . . . . . . . . . . 7
2.2.5.1 Ingestion Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.5.2 Retrieval Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.5.3 Generation Phase . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Limitations of LLM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3.1 Limited Knowledge Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3.2 Difficulty with Up-to-Date Information . . . . . . . . . . . . . . . . . . . 9
2.3.3 Prone to Hallucination . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.4 Lack of Grounding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Gap and Novelty of This Research . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4.1 Identification of the Gap . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4.2 Novelty of This Research . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4.3 Impact of the Novelty . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

3 Methodology 11
3.1 Data Collection and Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.2 Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.3 Data Pre-processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2 General System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2.1 Chatbot Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.3 Chatbot Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3.1 On-premise Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3.2 Cloud-based Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 15


3.3.3 Implementation Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15


3.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4 Results 20
4.1 Graphical User Interface (GUI) for the Chatbot . . . . . . . . . . . . . . . . . . 20
4.2 Comparative Analysis - LLM vs RAG(Cloud-based) . . . . . . . . . . . . . . . . 22
4.3 Performance Analysis of RAG pipelines . . . . . . . . . . . . . . . . . . . . . . . 25
4.4 Evaluation Results for RAG pipelines . . . . . . . . . . . . . . . . . . . . . . . . 26

5 Discussion 28
5.1 Interpretation of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.2 Pros and Cons - On-premise and Cloud-based . . . . . . . . . . . . . . . . . . . 29
5.3 Challenges Encountered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.4 Ethical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

6 Conclusion 31
6.1 Summary of Findings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.2 Key Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.3 Recommendations for Future Research . . . . . . . . . . . . . . . . . . . . . . . 31
6.4 Final Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

References 33

Appendix A Sample Data Record 35

Appendix B GUI for the Chatbot 36

Acronyms

AI Artificial Intelligence. 1–4, 6, 9, 10, 17, 31, 32


API Application Programming Interface. 17, 22

BERT Bidirectional Encoder Representations from Transformers. 6

CCC Complaint, Cause, Correction. 1, 2, 16, 28

ELMo Embeddings from Language Models. 6

GDPR General Data Protection Regulation. 30


GPT Generative Pre-trained Transformer. 6, 14, 16, 28, 29
GUI Graphical User Interface. 4, 14, 15, 17, 20, 21, 28

KNN K-Nearest Neighbor. 7

LLM Large Language Model. 1, 3, 4, 7, 8, 13–15, 17, 18, 22–24, 28, 29, 31
LLMs Large Language Models. 1–3, 6, 8–10, 26
LSTM Long Short-Term Memory. 5, 6

MIT Massachusetts Institute of Technology. 4


ML Machine Learning. 4, 5

NLP Natural Language Processing. 1–6, 10

RAG Retrieval Augmented Generation. 1, 2, 4, 7, 13, 14, 18, 22–28, 31


RLHF Reinforcement Learning from Human Feedback. 7
RNN Recurrent Neural Network. 5

VSR Vehicle Service Records. 11

List of Symbols

List of Figures

1 Components of a RAG system . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7


2 General System Architecture/pipeline for the Chatbot . . . . . . . . . . . . . . . 13
3 Evaluation Framework for RAG . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4 GUI for the Chatbot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

List of Tables

1 Vehicle Service Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11


2 Overview of Technologies Used in the On-premise Implementation of the chatbot 14
3 Overview of Azure Components Used in the Cloud-based implementation of the
chatbot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4 Overview of Search Methods in Azure AI Search . . . . . . . . . . . . . . . . . . 17
5 Comparison of Answers from GPT 3.5 turbo api and RAG pipeline (Cloud Im-
plementation) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
6 Comparison of Execution Time for On-premise and Cloud-based Pipelines . . . 26
7 Evaluation Metrics for On-premise RAG . . . . . . . . . . . . . . . . . . . . . . 27
8 Evaluation Metrics for Cloud-based RAG . . . . . . . . . . . . . . . . . . . . . . 27
9 Comparison of Evaluation Metrics for On-Premise and Cloud-based Systems . . 27

Introduction

1.1 Overview
AI has become a cornerstone of the modern technology landscape, capable of transforming and
enhancing productivity across diverse sectors. The quest for mastering linguistic intelligence
in machines began with the inception of the Turing Test in the 1950s [37]. Language is a
complex system of human expression regulated by grammatical rules, and it poses a significant
challenge in developing AI algorithms capable of grasping and leveraging it. Machines can
begin to comprehend natural language only with sophisticated AI advancements. Achieving
the ability for machines to read, write and communicate in a human-like manner has been a
long-standing research challenge [29].
Significant strides in language modeling have been crucial in overcoming these hurdles. The
introduction of Embeddings from Language Models (ELMo) marked a revolutionary shift in
Natural Language Processing (NLP) by emphasizing word context rather than viewing words
as fixed entities [24]. Subsequent progress was achieved with the Transformer architecture [30],
exemplified by models like BERT [6], which uses self-attention to analyze large volumes of
unlabeled text. This approach has enabled models to capture the linguistic subtleties more
effectively, boosting their performance across various NLP tasks significantly.
Continuous improvements in the field suggest that increasing the scale of pre-trained language
models enhances their effectiveness on diverse linguistic tasks, a principle known as the scaling
law. This phenomenon has been demonstrated by the introduction of larger models such as the
175-billion-parameter GPT-3, which exhibits more complex behaviors and superior problem-solving
capabilities compared to predecessors like the 330-million-parameter BERT or the
1.5-billion-parameter GPT-2 [17] [34].
LLMs such as GPT-3 [3], PaLM [4], Galactica [27], LLaMA [28] and Mistral [13] represent the
leading edge of NLP technology. These models, containing hundreds of billions of parameters
trained on extensive text data collections [25], have been able to grasp the subtleties and
complexities of natural language. Their ability to generate coherent and contextually relevant
text showcases their expertise in tasks such as writing, summarizing, answering questions and
generating creative content. The effectiveness of these models highlights not only their vast
scale but also the innovative Transformer architecture [30] that supports their complex data
processing needs.
The automotive sector is one such area where AI has been pivotal in redefining customer
interactions and service diagnostics [18]. Using vehicle diagnostics data collected from various
sources in an AI setup makes it possible to create specialized services, offering quick responses
to queries about vehicle maintenance and even complex automotive troubleshooting.

1.2 Problem Statement


In the area of automotive service repair, customers frequently encounter challenges related to
accurate diagnosis of vehicle issues. This often leads to dissatisfaction due to delays in service
and potential escalation of unresolved problems. In order to address these critical issues, there

is a need for a systematic approach that captures customer complaints accurately, identifies the
root causes, and provides effective corrections in a timely manner.
The specific problem that we aim to resolve is related to the "Triple C" or CCC approach.
Firstly, many customers may not accurately describe technical problems, which can lead to
incorrect or incomplete diagnoses. Secondly, the cause of automotive problems often involves
complex interactions of mechanical and electronic systems, which can be difficult to analyze
without an expert. Lastly, providing effective corrections requires an understanding of the
problem and its causes and also needs to be communicated in a way that is clear and actionable
for customers.
In the context of the Volvo Group, the application of AI introduces distinct challenges and
opportunities, especially within vehicle maintenance. Volvo's vehicle repair workshops require
precise, timely, and contextually relevant information to maintain operational efficiency.
Technicians must have access to a wide range of specialized knowledge, including diagnostics,
repair protocols, and vehicle maintenance histories. This need for domain-specific information,
coupled with the importance of immediate data updates, highlights the necessity for
advancements in language modeling that extend beyond the capabilities of traditional LLMs.
By integrating a solution equipped with AI and NLP capabilities, the aim is to streamline the
interaction process, enhance the accuracy of diagnostics and improve the overall efficiency of
vehicle service repairs.
On the other hand, LLMs face substantial challenges when tasked with handling queries that
require specialized domain knowledge or expertise [16], despite rapid advancements. One per-
sistent issue is their tendency to generate inaccurate or entirely fabricated "hallucinated" re-
sponses, a problem that is exacerbated when the query extends beyond their training data or
requires up-to-date information [11]. These limitations reveal the inherent risks of relying solely
on LLMs for practical applications in real-world scenarios, emphasizing the critical need for
additional safeguards and enhancements to ensure their reliability.

1.3 Objectives
The primary goal of this thesis is to develop and evaluate an AI-driven chatbot system that
leverages a RAG framework combined with LLMs to improve the efficiency and precision of
vehicle service diagnostics. The system aims to interpret customer complaints, accurately
diagnose underlying causes, and provide effective solutions based on the CCC approach.
Additionally, the research seeks to improve customer satisfaction by providing contextually
relevant advice, while continuously evaluating the performance, scalability, and cost-effectiveness
of the system compared to traditional methods. The proposed chatbot is envisioned as a crucial step
in narrowing the gap between extensive technical data and the immediate information needs
of technicians, enabling more knowledgeable and fast maintenance activities. Through this
exploration, we intend to contribute valuable insights into the application of advanced AI tech-
nologies within the automotive service industry, laying a foundation for future research and
development.

1.4 Thesis Structure


The thesis consists of six main chapters:
1. Introduction: This chapter introduces the thesis, providing a background on the im-
portance and role of AI in enhancing language understanding, followed by the problem

statement covering the need for enhancements in vehicle service diagnostics and a discussion
of the specific challenges of applying these technologies within this specialized domain.
2. Background: Explores the evolution and types of chatbot technologies, emphasizing
the transition from rule-based systems to AI-powered chatbots, foundational concepts in
conversational AI and the significant advancements brought by deep learning technologies
in NLP.
3. Methodology: Outlines the research strategies and methods used for developing the
chatbot, detailing the research strategy employed to guide the chatbot’s systematic de-
velopment, along with the steps involved in data collection, preprocessing and system
architecture planning.
4. Results: Describes the technical architecture of the chatbot, including detailed descrip-
tions of the integration and functionality of the components and how these are tailored
to meet the operational needs of Volvo’s repair workshops and also presents the results
of the implemented chatbot system, assessing its performance through both qualitative
feedback and quantitative metrics. This chapter evaluates the chatbot’s impact on the
efficiency and accuracy of maintenance operations, user satisfaction and its alignment
with the intended operational goals.
5. Discussion: Interprets the results and presents the challenges encountered and
observations related to the whole implementation.
6. Conclusion and Future Work: Concludes the thesis by summarizing the findings,
discussing the implications of the research, and suggesting areas for future research to
further enhance and expand the capabilities of AI-driven chatbot solutions in specialized
industrial applications.

1.5 Usage of AI tools


I used ChatGPT-4 as a reference while writing this thesis, for seeking information during
implementation, and for suggestions on different approaches. Since this thesis is based on a
Volvo-internal dataset and infrastructure, I did not use external AI tools directly during
coding due to data privacy policies; for example, I did not use GitHub Copilot.

Background

2.1 Introduction to Chatbot Technologies


Chatbots are software programs designed to simulate human dialogue in its natural format [9].
They have revolutionized how businesses interact with customers by providing automated,
instantaneous responses to inquiries and support requests. The concept of chatbots dates back
to the 1960s with the creation of ELIZA, one of the first chatbots, developed at the Massachusetts
Institute of Technology (MIT) [35].
The fundamental purpose of a chatbot is to simulate human-like interactions through text or
voice to perform a wide range of tasks, from answering Frequently Asked Questions to providing
personalized recommendations or support. The evolution of chatbots has been closely tied to the
advancements in NLP and Machine Learning (ML), enabling more context-aware interactions
[20].
Chatbots can be broadly categorized into two main types: rule-based chatbots and AI-powered
chatbots:
Rule-based Chatbots: These chatbots operate on predefined rules and scripted pathways,
providing specific responses to user inputs.
AI-powered Chatbots: These chatbots utilize AI techniques like NLP and ML and they
can understand complex language, learn from interactions and generate contextually adaptive
responses.
The advent of AI has significantly evolved chatbot technology from rule-based systems to more
sophisticated AI-powered models. Modern chatbots leverage advanced techniques in deep learn-
ing and neural networks to analyze and comprehend human language in a more sophisticated
way. This transition from a fixed, rule-based framework to a more flexible, learning-oriented
approach enables chatbots to refine their responses over time based on continuous interactions,
thereby delivering contextually relevant conversational experiences. This evolution signifies a
transition towards more intelligent systems capable of managing complex dialogues and en-
hancing user engagement [5].

2.2 Foundational Concepts in Conversational AI


Understanding the foundational elements of conversational AI requires evaluation of its core
components. This exploration begins with NLP, which focuses on enabling computers to un-
derstand and manipulate human language. It considers the design principles behind traditional
chatbot architectures and highlights the pivotal role of machine learning in driving the capa-
bilities of these systems.

2.2.1 Natural Language Processing


NLP is the foundational technology enabling conversational AI systems such as chatbots, to
process and understand human language effectively. It combines disciplines like Computer
Science, AI and linguistics to create systems capable of interpreting, analyzing and generating
human language in a meaningful and contextual way. NLP employs various techniques and

methodologies that facilitate the interaction between humans and machines. For example,
tokenization breaks down text into individual words or phrases, allowing the chatbots to analyze
languages at a granular level. Semantic analysis investigates the meaning behind words and
sentences, enabling chatbots to grasp the intent and sentiment of the user's input. This level
of understanding is essential for the chatbot to respond in a manner that is contextually
appropriate and emotionally attuned to the user's needs [15].
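As a concrete illustration of tokenization, the following short sketch (illustrative only, not part
of the thesis pipeline; it assumes the Hugging Face transformers package and the public
bert-base-uncased tokenizer) splits a sentence into sub-word tokens:

# Illustrative tokenization sketch; assumes the Hugging Face "transformers"
# package and the public "bert-base-uncased" tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokens = tokenizer.tokenize("The coolant hose is leaking near the turbocharger.")
print(tokens)  # sub-word pieces, e.g. ['the', 'cool', '##ant', 'hose', ...]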
Beyond these fundamental techniques, NLP also involves pragmatic analysis, which helps un-
derstand the language in context and allows chatbots to maintain the coherence of conversations
over multiple turns of dialogue. These advanced NLP capabilities enable chatbots to handle
complex conversations, track the conversation’s context over time, and provide logical, relevant
and conversational responses. The importance of NLP in chatbots cannot be overstated. It
transforms chatbots from simple, keyword matching programs to systems that engage in natu-
ral, fluid, and meaningful interactions. NLP allows chatbots to answer queries and understand
the intent behind them, manage conversation flow and adapt responses based on the conversa-
tion’s context. This leads to a more natural and satisfying user experience, bridging the gap
between human-human and human-computer communication [20].

2.2.2 Traditional Chatbots Architecture


Before the advent of deep learning technologies, traditional chatbot architecture was predominantly
rule-based, operating on predetermined rules and decision trees to guide conversations.
This chatbot development era saw several critical systems emerge that laid the groundwork for
future advancements in conversational AI.
The journey began with ELIZA, developed in 1966 at MIT, which utilized a pattern-matching
approach to simulate conversation. Despite its rudimentary method and lack of contextual
understanding, ELIZA marked a pivotal moment in chatbot history, exhibiting the potential for
machines to mimic human-like conversation [35]. Following ELIZA, the chatbot PARRY was
introduced in 1972 by psychiatrist Kenneth Colby. It offered more advanced conversation ca-
pabilities, including a model of behavior and emotions, to simulate a patient with paranoid
schizophrenia, enhancing the realism and variability of responses compared to ELIZA.
The progression in chatbot technology continued into the 1990s with the development of AL-
ICE [32]. ALICE used Artificial Intelligence Markup Language (AIML), representing a signifi-
cant evolution in rule-based chatbots, providing more complex pattern-matching and heuristic
conversation rules, thus creating an illusion of understanding. During the same era, Rollo
Carpenter’s creations, Jabberwacky, and later Cleverbot, moved towards a more dynamic and
data-driven approach. These systems learned from past interactions and generated responses
from a growing database of conversation logs, offering a more natural and engaging conversa-
tional experience.
The field witnessed a transformative leap with the introduction of IBM Watson in 2010, which
utilized NLP and ML to analyze and interpret large volumes of data. Watson’s capabilities
were showcased on the "Jeopardy!" game show, signaling a new era in conversational AI where
the understanding and processing of human language at scale became a reality [8].

2.2.3 Core Deep Learning Technologies


The field of conversational AI experienced a transformative shift with the introduction of core
deep learning technologies, particularly Long-Short Term Memory (LSTM) networks, trans-
formers and attention mechanisms. LSTM networks, a kind of Recurrent Neural Network

(RNN), were among the first to address the need for context and memory in chatbot con-
versations. They could retain information over long periods, allowing for more coherent and
context-aware dialogue [10].
The introduction of attention mechanisms and transformers marked a further evolution [30].
technologies allowed for even more complex processing of conversational context, enabling the
chatbot to focus on relevant parts of the conversation and generate more appropriate and
natural responses.
Transformers have set new standards for chatbot performance, fostering the development of
LLMs like OpenAI’s GPT series. These models can process vast amounts of data, learn from
diverse linguistic patterns, and generate remarkably human-like responses in their complexity,
relevance and coherence [6]. The integration of these core deep learning technologies has
revolutionized the field of conversational AI, leading to the creation of chatbots that are not only
more responsive and smart but also capable of continuous learning and improvement. This
shows a significant leap forward from the rule-based systems of the past, indicating a new era
of AI-driven conversational agents [3].

2.2.4 Evolution of LLMs


The development of LLMs has seen significant advancements over the past decade, with each
new model building upon the strengths and addressing the limitations of its predecessors.
Embeddings from Language Models (ELMo)
ELMo marked a significant breakthrough in NLP by introducing deep contextualized word
representations. ELMo generates word embeddings that depend on the entire context in
which a word appears. This is achieved through a bidirectional LSTM architecture that
processes text sequences in both forward and backward directions, capturing more nuanced
meanings of words based on their usage in sentences. Later models instead build on the
Transformer [30], whose self-attention mechanism enables the model to weigh the importance
of different words in a sentence when constructing representations, combined with positional
encoding, which adds information about the position of words in a sequence and compensates
for the lack of sequential data processing in the self-attention layers.
Bidirectional Encoder Representations from Transformers (BERT)
BERT is a step forward in NLP that leverages the bidirectional nature of the Transformer
architecture. It processes text by looking at all the words to the left and right of a given
token simultaneously, which enables a deeper understanding of context. It is trained with
Masked Language Modelling, where some tokens in the input are masked randomly and the
model is trained to predict the masked words from the surrounding context. The other training
technique for this model is Next Sentence Prediction, where the model learns the relation
between two sentences by predicting whether a given sentence follows another in the original
text.
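To make Masked Language Modelling concrete, the following sketch (illustrative, assuming the
Hugging Face transformers package and a public BERT checkpoint) asks the model to predict
a masked token from its two-sided context:

# Illustrative Masked Language Modelling sketch: a public BERT model
# predicts the [MASK] token using both left and right context.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("The technician replaced the faulty [MASK] pump."):
    print(candidate["token_str"], round(candidate["score"], 3))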
Generative Pre-trained Transformer (GPT)
GPT, especially with the release of GPT-3, focused on generating text that is coherent and
contextually appropriate. It processes text in a unidirectional manner, generating tokens se-
quentially. It uses Causal Language Modelling which trains the model to predict the next token
in the sequence, considering only the preceding tokens. This unidirectional approach is suitable
for text generation tasks. After pre-training on a large corpus, GPT models are fine-tuned on
specific tasks to adapt their general language understanding to particular applications.

Reinforcement Learning from Human Feedback (RLHF)


Human-in-the-Loop Training: Involves human evaluators providing feedback on the model's
responses. This feedback is used to further train the model to align its responses with desired
outcomes, improving relevance and coherence.
Policy Optimization: Utilizes reinforcement learning algorithms to optimize the chatbot’s re-
sponse strategy based on human feedback, ensuring that the model’s behavior aligns with user
expectations and delivers high-quality interactions.

2.2.5 Retrieval Augmented Generation


RAG integrates the robustness of retrieval systems with the generative capabilities of LLMs,
forming a hybrid model that leverages the best of both technologies. This integration occurs in two main
phases: retrieval and generation. Before initiating the retrieval phase, the documents within the
external knowledge base are processed into vector embeddings. Vector embeddings are high-
dimensional, numerical representations of data that allow complex documents to be compared
and retrieved based on content similarity. This process involves transforming the textual data
into vectors using word embedding algorithms, which capture the semantic meaning of words
and phrases in a way that can be numerically analyzed [6].
RAG was first introduced by Lewis et al. [19]. Figure 1 shows a general overview of the RAG
process. The components of RAG are mainly classified into three phases: Ingestion,
Retrieval and Generation.

Figure 1: Components of a RAG system

2.2.5.1 Ingestion Phase


The ingestion phase of a RAG system involves three key steps after data preprocessing: document
chunking, embedding generation and index creation. Initially, relevant documents or data
chunks are collected, segmenting larger documents into manageable chunks. These processed
texts are then transformed into vector representations or embeddings by using an embedding
model. Finally, the embeddings are stored in a vector database and indexed using algorithms
like K-Nearest Neighbor (KNN) to optimize retrieval. This indexing facilitates efficient and
quick retrieval of the most relevant documents in response to user queries, setting a solid foun-
dation for the RAG system’s performance in generating accurate and contextually relevant
responses.
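A minimal, self-contained sketch of the ingestion phase is given below. The embed() function
is a placeholder for a real embedding model (such as Instructor XL or text-embedding-ada-002,
used later in this thesis), and the NumPy matrix stands in for a vector database such as
Milvus or Azure AI Search:

# Ingestion sketch: chunk documents, embed each chunk, and build a simple
# in-memory index. embed() is a placeholder for a real embedding model.
import numpy as np

def chunk(text: str, size: int = 200) -> list[str]:
    # Split a document into fixed-size character chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(texts: list[str]) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 384))  # hypothetical 384-dim vectors

documents = [
    "Complaint: engine warning light on. Cause: faulty NOx sensor. "
    "Correction: sensor replaced and fault codes cleared.",
]
chunks = [c for doc in documents for c in chunk(doc)]
index = embed(chunks)  # one row per chunk
index /= np.linalg.norm(index, axis=1, keepdims=True)  # normalize for cosine search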

2.2.5.2 Retrieval Phase


In this phase, the system searches through the external knowledge base, typically a large-scale
document store, to find content relevant to the user’s query. The retriever module utilizes
algorithms, including vector similarity matching, to effectively identify and fetch pertinent
information. This is done by comparing the vector representation of the user’s query to the
vectors of documents in the knowledge base, identifying the most relevant matches through
similarity techniques such as dot product and cosine similarity.
• Dot Product: The dot product is a primary method for calculating vector similarity. It
is especially important for determining cosine similarity and plays a pivotal role in various
algorithms used in search engines, recommendation systems and data-driven applications.
Definition and Formula: Given two vectors, A and B, the dot product is computed as:

$$A \cdot B = \sum_{i=1}^{n} A_i \times B_i$$

When vectors are normalized to unit length, the dot product provides the cosine of the
angle between them, serving as an effective measure of similarity.
• Cosine Similarity: Cosine similarity measures the cosine of the angle between two
vectors. This similarity index helps to understand the orientation of vectors in a multi-dimensional
space, indicating how similar their directions are irrespective of their magnitude.
Definition and Formula: For two vectors, A and B, cosine similarity is calculated
using the dot product and the magnitudes of the vectors:

$$S(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

Here, $\|A\|$ and $\|B\|$ represent the magnitudes (or norms) of vectors A and B, computed
as $\sqrt{\sum_{i=1}^{n} A_i^2}$ and $\sqrt{\sum_{i=1}^{n} B_i^2}$ respectively. This ratio of the dot product to the product
of the magnitudes measures the cosine of the angle between the vectors, effectively
quantifying their directional similarity.
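Both measures can be computed in a few lines; the sketch below (plain NumPy, with
illustrative vector values) also verifies that after unit-length normalization the dot product
equals the cosine similarity:

# Dot product and cosine similarity for two example vectors.
import numpy as np

A = np.array([0.2, 0.7, 0.1])
B = np.array([0.3, 0.6, 0.2])

dot = float(A @ B)  # sum over A_i * B_i
cosine = dot / (np.linalg.norm(A) * np.linalg.norm(B))

# For unit-length vectors the dot product *is* the cosine similarity.
A_hat = A / np.linalg.norm(A)
B_hat = B / np.linalg.norm(B)
assert np.isclose(A_hat @ B_hat, cosine)
print(f"dot={dot:.3f}, cosine={cosine:.3f}")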

2.2.5.3 Generation Phase


Leveraging the retrieved information, the generation module, often an LLM, synthesizes this
data with the original query to produce coherent and contextually appropriate responses. This
phase ensures that the responses are not only accurate but also tailored to the current context,
significantly reducing the risks associated with outdated information or inherent inaccuracies
in the pre-trained LLMs.

2.3 Limitations of LLM


2.3.1 Limited Knowledge Scope
LLMs have inherent limitations in their knowledge scope due to the static nature of their
training datasets. These datasets are historical snapshots, encapsulating information available
up to the point of the model's last training cycle, but not beyond. This limitation restricts their
ability to accurately reflect ongoing developments or changes in various fields, thereby impacting
their usage in dynamic and rapidly evolving domains. The scope of LLMs is confined to the
data they are trained on [1], which not only bounds their knowledge but also embeds historical

biases and limitations within their responses. This highlights the need for continuous updating
of the training data and retraining of the model to keep pace with new knowledge and societal
changes, which is resource-intensive and not always feasible in practice.

2.3.2 Difficulty with Up-to-Date Information


In fast-paced sectors like technology, medicine, and global news, where new information emerges
continuously, LLMs can provide inaccurate or even outdated information, impacting their reli-
ability and effectiveness in tasks requiring current knowledge. The static training approach of
LLMs degrades their performance and relevance over time [17]. The inability to
incorporate new data or learn from ongoing events means that these models can fall behind
human knowledge, necessitating periodic updates and retraining to maintain their accuracy
and relevance, which can be a costly and time-consuming process.

2.3.3 Prone to Hallucination


LLMs are linked to a phenomenon known as hallucination, where they generate text that seems
coherent and plausible but is factually incorrect [36]. This issue stems from the models’ funda-
mental reliance on statistical patterns derived from extensive data, leading them to prioritize
linguistic plausibility over factual truth. Such a tendency results in the creation of content
which is often misleading or entirely false, particularly when LLMs encounter topics with insuf-
ficient or ambiguous data in their training sets. It is suggested that due to their computational
design, LLMs inherently fail to differentiate between computable truths and the complex, yet
inconsistent real-world information they are trained on. All LLMs, regardless of their sophis-
tication, are prone to hallucinate because they cannot fully learn or represent all computable
functions, which leads to inconsistencies in their outputs compared to factual reality.

2.3.4 Lack of Grounding


Another limitation of LLMs is their lack of grounding, which refers to the inability to anchor
linguistic constructs to real-world entities and experiences. Unlike humans, who use sensory
experiences and interactions with the physical world to inform their understanding of language,
LLMs operate solely within the realm of text-based data. This limitation means that LLMs
can generate text that is syntactically correct and semantically plausible, but lacks a true
understanding of the subject matter, leading to responses that can be out of touch with real-
world contexts. The role of grounding is critical in achieving true language understanding, and
LLMs face substantial challenges in this regard [2]. It is argued that without the ability to connect
language with real-world experiences and knowledge, LLMs remain limited in their ability to
comprehend and respond to complex or novel situations, highlighting a significant gap between
human and machine understanding of language.

2.4 Gap and Novelty of This Research


2.4.1 Identification of the Gap
The investigation into AI applications within the automotive maintenance industry, particu-
larly as identified by the Volvo Group, highlights a substantial gap in the existing operational
systems. Despite the availability of sophisticated diagnostic tools and information systems
in repair workshops, there remains a notable lack of an integrated, intelligent system designed
specifically to support technicians in their repair tasks across various organizational sectors.

This gap leads to inefficiencies and delays in accessing essential repair information, consequently
extending vehicle downtime and potentially escalating maintenance costs. Technicians at Volvo
are often confronted with complex repair scenarios that require immediate access to an extensive
repository of technical data and historical repair records. While the current systems are robust,
they fail to fully leverage AI capabilities to provide context-sensitive, real-time assistance. This
becomes even more important when technicians face rare repair issues or issues involving
interactions between multiple vehicle systems.
Moreover, there exists an opportunity to augment system intelligence through the deployment
of AI-driven tools capable of adapting and learning from historical data—qualities inherent to
LLMs. The adoption of such technologies could effectively close the existing gap by offering
dynamic, actionable guidance specifically tailored to the unique challenges of automotive repair.
In response to these needs articulated by the Volvo Group, the objective for this thesis has been
formulated to explore how LLMs can be employed to develop domain-specific chatbots. These
chatbots are envisioned to significantly enhance the efficiency and precision of technical support,
seamlessly integrating into the workflows of automotive technicians and thereby substantially
improving the operational efficacy of the organization.

2.4.2 Novelty of This Research


This research introduces a novel solution to the identified gap by proposing the development of a
data driven chatbot, specifically tailored for the Volvo Group’s repair workshops. The novelty of
this research lies in its approach to harness the capabilities of LLMs and the specific application
of a chatbot that utilizes a tabular dataset provided by Volvo. This dataset comprises
extensive information from past workshop visits, including detailed annotations on vehicle
issues, diagnostic steps and repair outcomes.
The proposed chatbot aims to integrate seamlessly into the existing technical environment of
Volvo’s repair workshops. By leveraging advanced NLP techniques and the latest develop-
ments in conversational AI, the chatbot will be able to interpret technicians’ queries in natural
language and provide precise, contextually relevant information drawn directly from Volvo’s
proprietary datasets.

2.4.3 Impact of the Novelty


The implementation of this chatbot is expected to significantly enhance the efficiency and accu-
racy of the information delivery process in repair workshops. By reducing the time technicians
need to access and interpret technical data, the chatbot will streamline the repair process,
reducing vehicle downtime and potentially increasing workshop throughput. Additionally, the
accuracy of the chatbot’s responses, ensured through continuous learning and updates from
real-time data, will minimize the risk of errors in diagnostics and repairs, leading to higher
quality outcomes and increased satisfaction among the customers.

Methodology

This chapter presents the methodology followed in the research conducted as part of this thesis.

3.1 Data Collection and Preparation


3.1.1 Data Collection
The main dataset for this research is a novel dataset from the Volvo application containing
repair and service records of vehicles. The database is known as Vehicle Service Records (VSR)
and contains vehicle service information collected over time from the vehicles' workshop visits.
Table 1 explains the attributes present in the database.

Table 1: Vehicle Service Information

Vehicle Information: Details of the vehicle, such as vehicle identification number, brand,
model and product type.
Repair Information: Details of the workshop visit of the vehicle, such as place of repair,
date, defect description and causal part. This also includes claims related to the repair.
Technician Comments: Comments logged in the workshop containing the complaint from
the customer, the cause of the problem, and the correction or action taken to fix the problem.
These free-text fields are of main interest for this research.

This dataset combines structured tabular data, such as vehicle information and repair details,
with unstructured data predominantly consisting of textual comments from both customers
and technicians. The structured data provides a framework for the chatbot to understand
and organize the technical aspects of vehicle maintenance, while the unstructured comments
are crucial for deriving insights into the real-world practice of maintenance work,
including troubleshooting and customer service interactions.

3.1.2 Data Analysis


The database analyzed for this project consists of multiple tables aligned with the information
provided in Table 1, linked by a key associated with each repair. Each vehicle visiting the
workshop is logged as an entry in this database, where each repair may involve multiple jobs,
and each job may have a comment logged by the technician in a free-text column. These
comments vary in language, mostly reflecting the geographical location of the workshop.
Through initial analysis, several challenges were identified:
• Duplicate records: The same service record may be entered through multiple applications,
resulting in a lot of duplicated information under different keys.
• Comments consolidation issue: Multiple comments related to one repair are spread across
different tasks/jobs.

• Multilingual comments: The presence of multiple languages in the database poses
significant challenges for consistent data processing.
• Legacy data model: The database is quite old and lacks a good data model and structure.
A pre-existing transformed version of the database was also reviewed. Although it is limited
in scope compared to the original database, it offers several enhancements, making it more
suitable for this research:
• Consolidated Claims Data: This version focuses on vehicle services that resulted in claims
paid by the organization, providing a more specific dataset.
• Unified Source: It aggregates data from a single source, eliminating issues of duplicate
records.
• Standardized Language: Comments are consolidated into a single record for each service
and translated into English, simplifying further processing.
• Enriched Content: Compared to the original database, it includes more comprehensive
information about each service.

3.1.3 Data Pre-processing


The decision to utilize the transformed database for data extraction was based on its pre-cleaned
state and relevance to the project’s objectives. The following steps outline the data extraction
and cleansing process:
1. The main attributes were identified as: repair key (unique), repair date, vehicle
identification number, product type, model, brand, mileage, causal part, defect information,
complaint text, cause text and correction text.
2. Information was collated from four tables using defined relational keys to ensure compre-
hensive data compilation.
3. Filters were applied to restrict the data by brand and to repair dates from 2023-01-01 to
2024-05-01, in order to focus on recent repairs claimed from Volvo and to obtain a good
sample covering a wide variety of products.
4. Records without comments were excluded to ensure that the dataset contained only entries
with actionable textual data. Additionally, only records with a minimum of 100 characters in
each of the complaint, cause and correction text fields were kept in the final dataset.
5. Duplicated records with the same content in the complaint, cause and correction text fields
were eliminated, keeping only one record each.
6. The null values in the non-mandatory columns were replaced with the text "Not Available".
(Steps 3-6 are sketched in code below.)
By addressing the initial challenges and utilizing a more refined database, the preprocessing
steps ensure that the data fed into the chatbot system is accurate, relevant, and structured
to maximize the system’s performance and reliability. This careful preparation is essential
for successfully deploying a data-driven AI solution in complex operational environments like
vehicle maintenance.


3.2 General System Architecture


As discussed in the previous chapter, LLMs face significant limitations, including the tendency to
generate responses that may not align with up-to-date or factually accurate information. One
innovative approach to address these limitations is the RAG framework. RAG enhances the
reliability and accuracy of LLMs by grounding their responses in externally retrieved, current
data.

3.2.1 Chatbot Pipeline


The Chatbot pipeline is the seamless integration of the two phases within a chatbot system
which ensures that each user query is processed through a dynamic workflow. First, the query
initiates a retrieval from the knowledge base, followed by real-time response generation based
on the retrieved information and the capabilities of the generative model [12]. Using RAG, the
system minimizes the common pitfalls of LLMs, such as generating hallucinated or irrelevant
responses. It also provides a transparent approach to answering queries, where each response
is supported by traceable data sources, enhancing the trustworthiness of the system.
As shown in Figure 2, there are two steps in the RAG pipeline that have been implemented for
the project.

Figure 2: General System Architecture/pipeline for the Chatbot

Document Retrieval and Ingestion


The first step, as shown in Figure 2, starts with the database containing the raw vehicle
service repair data, which is used as the input to the pipeline.
As part of Data Pre-processing the data is cleaned and prepared by removing irrelevant infor-
mation and converting it into a format usable by the model. This cleaned dataset is processed
through an Embedding Model (Encoder) which encodes the text into numerical vectors. This
encoding captures the semantic meaning of the documents in a way that can be understood and
utilized by machine learning models. Then the document embeddings are stored in a Vector
database. This database is designed to facilitate efficient similarity searches which are used to
find the most relevant documents based on a user’s query.
User Query and Response Generation

This is the second step as shown in Figure 2. The user submits a query through a chatbot
web application, which acts as the interface between the user and the system. This query is
captured by the web application and processed into a vector by the same embedding model
used for document processing. This vectorized query is matched against document embeddings
stored in the Vector DB to identify the most relevant documents. These documents are used by
the LLM, which generates a contextually appropriate response. This response is then streamed
back to the user through the chatbot, providing an answer that leverages the most relevant
information from the database to address the user’s query.
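Putting the two steps together, the per-query flow can be sketched as below. Here embed_query,
vector_db and llm are placeholders for the concrete components of either pipeline (described in
the next section), not an exact implementation:

# Per-query RAG sketch: embed the question, retrieve the top-k similar
# service records, and ground the LLM's answer in them.
def answer(query: str, embed_query, vector_db, llm, k: int = 5) -> str:
    query_vector = embed_query(query)  # same embedding model as ingestion
    hits = vector_db.search(query_vector, top_k=k)  # most similar records

    context = "\n\n".join(hit.text for hit in hits)  # retrieved CCC records
    prompt = (
        "Answer the technician's question using only the service records below.\n\n"
        f"Records:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm.generate(prompt)  # grounded, streamable response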

3.3 Chatbot Implementation


As described in the section above, the major component of the chatbot is the RAG pipeline.
Two different RAG pipelines were implemented as part of this project.
• On-Premise Implementation: Volvo-owned infrastructure was used; the required
components for RAG were already available through an internal framework.
• Cloud-based Implementation: Volvo also has an infrastructure setup in the Azure cloud
environment with its private endpoints.

3.3.1 On-premise Implementation


For the on-premise implementation of the chatbot system aimed at vehicle service repair, the
architecture integrates several components, including the Instructor XL embedding model from
Hugging Face [26], the Milvus vector database [33], and the Mixtral 8x7B [14] LLM hosted
locally on Volvo servers. Table 2 briefly describes each component and how they function
together to create an efficient chatbot system.

Table 2: Overview of Technologies Used in the On-premise Implementation of the chatbot

Instructor XL (Embedder): An advanced embedding model from Hugging Face, designed to
convert textual data into high-dimensional vector representations, ideal for semantic
understanding in complex queries.
Milvus (Vector Database): An open-source vector database that manages large-scale vector
similarity searches efficiently, essential for quick retrieval of relevant embeddings.
Mixtral 8x7B (LLM): A sparse mixture-of-experts LLM from Mistral AI, optimized for
generating accurate and contextually relevant textual responses, tailored for
high-performance applications.
Streamlit (GUI Framework): A framework used to create interactive web applications quickly
and easily. Streamlit was utilized to develop the user interface for the chatbot, enabling
intuitive interaction with the system.

All of these products are made available securely through an internal framework in the
on-premise setup, which is part of Volvo's own datacenter. The architecture is the same as
shown in Figure 2. My thesis partner Erfan Kanani worked more on building this pipeline,
whereas I focused on the cloud-based implementation.

3.3.2 Cloud-based Implementation


The vehicle service repair dataset is stored in the Volvo Azure cloud environment; hence this
chatbot was implemented with the products available in the Azure environment. Table 3 lists
the Azure products used in building the pipeline.

Table 3: Overview of Azure Components Used in the Cloud-based implementation of the chatbot

text-embedding-ada-002 (Embedding Model): A specialized version of the Ada language model
from OpenAI, used as the embedding model for processing textual data into vector
representations. It offers efficient semantic understanding capabilities for generating
embeddings.
AI Search (Vector Search): AI Search in Azure provides advanced search capabilities by
managing and querying vector embeddings efficiently, which helps in retrieving the most
relevant information quickly.
GPT-3.5 Turbo API (LLM): Utilizes OpenAI's GPT-3.5 Turbo API for generating text
responses. This API is known for its speed and efficiency, providing high-quality,
contextually relevant answers based on vector inputs.
Streamlit (GUI Framework): A framework used to create interactive web applications quickly
and easily. Streamlit is utilized to develop the user interface for the chatbot, enabling
intuitive interaction with the system.

These products were chosen for the project because they are part of the white-listed technology
stack available in the Volvo cloud environment, with consideration of important security
assessments as well as the pricing perspective; this is discussed in more detail in the
Discussion chapter. How these products are used, along with the technical details associated
with them, is described in the following sub-section on implementation steps.

3.3.3 Implementation Steps


As previously mentioned, two pipelines were implemented for the chatbot. Data preprocessing
and model evaluation are shared between them. The steps followed are common at a high
level; however, they are elaborated here specifically for the cloud-based implementation,
following data preprocessing.
1. Setting up the Development Environment - This step mainly involved setting up the
Python virtual environment on the personal computer and installing the required Python
libraries. The other activities included configuring the Azure services and resources needed
for the end-to-end chatbot implementation, i.e. the endpoints and secret keys for storage,
the text-embedding-ada-002 embedding model, AI Search and the GPT-3.5 Turbo API. We
ensured that the network settings, security protocols and access permissions were
appropriately configured to allow seamless interaction between the Azure services and our
application.
2. Creation of Embeddings using text-embedding-ada-002 - The pre-processed dataset
was stored in a file in Parquet format; this file was read into a Pandas DataFrame using the
PyArrow engine before the data was prepared for embedding. We decided to merge
the complaint, cause and correction fields to create one vector, given the use case's focus
on CCC. Other fields, such as causal part and defect description, were merged to create a
second vector, and the rest of the fields were stored as-is to be used as metadata. The
text-embedding-ada-002 model leverages the capabilities of the GPT architecture, particularly
focusing on understanding and encoding textual information into dense vectors that can be
used in various natural language processing tasks [22]. When we input text into the
text-embedding-ada-002 model, the following happens (see also the code sketch after this list):
• Tokenization: The text is broken down into tokens using a tokenizer. These tokens are often words or sub-words that the model can understand. There is a token limit (8192 tokens), so the input text should not be very long; otherwise, it is truncated in our implementation.
• Embedding: Each token is then converted into a numerical representation (embed-
ding). These initial embeddings are learnt from the data during the training process.
• Transformer Processing: The embeddings pass through several layers of the trans-
former model. Each layer consists of self-attention mechanisms and feed-forward
neural networks, allowing the model to analyze the text from various perspectives.
• Final Output: The output from the transformer layer is a series of vectors, each
corresponding to a token in the input text. In order to derive a single embedding
vector for the entire input text, the vectors are aggregated and then normalized.
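The sketch below illustrates the embedding step under stated assumptions: the file name, column names, and batch size are illustrative, and the client and deployment names come from the configuration sketch above.

```python
# Illustrative embedding-creation sketch; file and column names are assumed.
import pandas as pd

df = pd.read_parquet("service_records.parquet", engine="pyarrow")

# Merge Complaint, Cause and Correction into one text per record (the CCC
# vector), and causal part + defect description into a second text.
df["ccc_text"] = (
    df["complaint"].fillna("") + " "
    + df["cause"].fillna("") + " "
    + df["correction"].fillna("")
)
df["part_defect_text"] = df["causal_part"].fillna("") + " " + df["defect"].fillna("")

def embed(texts):
    """Embed a batch of texts with text-embedding-ada-002 (1536-dim vectors)."""
    response = client.embeddings.create(model=EMBEDDING_DEPLOYMENT, input=texts)
    return [item.embedding for item in response.data]

# Embed in small batches, mirroring the batched processing described in the
# Discussion, so that not too much data is sent at the same time.
batch_size = 16
ccc_vectors = []
for start in range(0, len(df), batch_size):
    ccc_vectors.extend(embed(df["ccc_text"].iloc[start:start + batch_size].tolist()))
df["ccc_vector"] = ccc_vectors
```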
3. Ingestion into Vector Storage (Azure AI Search) - This step refers to the process of transferring and storing the embeddings created from text data into Azure's AI Search service, which is designed to manage and facilitate efficient search operations over large datasets [23]. It enables the subsequent retrieval of the information most relevant to user queries by using the embeddings to enhance the precision of search results. The embeddings generated by the text-embedding-ada-002 model are vectors that represent the semantic content of text inputs. These vectors need to be indexed in Azure AI Search to allow for efficient querying. The process involves the following (a minimal ingestion sketch follows the list):
• The embeddings, along with their associated metadata, were formatted according to the requirements of AI Search.
• In Azure AI Search, defining the schema was a critical step, where we could specify which data fields would be indexed and how they should be processed. For embeddings, this included setting the correct data types and marking fields as searchable or filterable.
• The prepared data was then uploaded to AI Search, where an index was created. The structure of this index is designed to optimize the performance of vector searches, ensuring that search queries return the most semantically relevant results quickly.
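As a minimal sketch, assuming an index named "ccc-index" with fields matching the earlier sketches has already been defined (the index-definition classes of the azure-search-documents SDK vary by version, so index creation is omitted here):

```python
# Illustrative ingestion sketch; the index name and field names are
# assumptions, and the index itself is assumed to already exist.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint=SEARCH_ENDPOINT,
    index_name="ccc-index",
    credential=AzureKeyCredential(SEARCH_KEY),
)

# Format each record as a document matching the index schema:
# a key, the merged text, its vector, and filterable metadata.
documents = [
    {
        "id": str(i),
        "ccc_text": row.ccc_text,
        "ccc_vector": row.ccc_vector,  # 1536-dim ada-002 embedding
        "vehicle_id": row.vehicle_id,  # example metadata field
    }
    for i, row in df.iterrows()
]

# Upload in batches; each result reports per-document success or failure.
for start in range(0, len(documents), 1000):
    results = search_client.upload_documents(documents=documents[start:start + 1000])
    if not all(r.succeeded for r in results):
        raise RuntimeError("Some documents failed to index")
```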
Azure AI Search supports different types of search methods in the context of information retrieval. Table 4 explains each of them, with pros and cons:

16
3.3 Chatbot Implementation

Table 4: Overview of Search Methods in Azure AI Search

Keyword Search - Matches query terms directly against a set of documents. Pros: simple to implement and effective for precise matches. Cons: may return irrelevant results if query words are ambiguous; does not account for context.

Vector Search - Uses embeddings to perform search based on semantic similarity. Pros: captures semantic meaning and returns more contextually relevant results. Cons: requires computational resources for embeddings.

Hybrid Search - Combines keyword and vector search methodologies. Pros: balances speed and accuracy, leveraging both exact matching and semantic understanding. Cons: more complex to implement and manage.

Semantic Reranking - Reranks initial search results using AI to better match the query intent. Pros: improves result quality by considering deeper linguistic factors. Cons: computationally intensive; requires advanced AI models.

The flexibility and scalability of Azure AI Search make it a powerful tool for building advanced search solutions for chatbots. We used semantic reranking as the search method while creating the index for our vehicle service repair embeddings and metadata.
4. Creation of User Query Embedding - This step involved transforming the text of the user's query into a high-dimensional vector representation using the same embedding model, text-embedding-ada-002. This embedding captures the semantic meaning of the query and can then be compared to the embeddings stored in the vector database in order to get contextually relevant search results.
5. Prompt Creation - The prompt plays an important role in providing context to the LLM. It is the combination of the user query and the context, i.e. the relevant documents retrieved from the vector database based on the similarity search. The number of documents to be retrieved based on similarity is configurable. The prompt also includes some general instructions to the LLM on how to respond to the queries.
6. Interaction with LLM - This step involved sending the constructed prompt to the model GPT-3.5 Turbo (0301) [21] via its Application Programming Interface (API) and processing the response to provide the user with a context-rich answer. The context length of this model is a maximum of 8192 tokens, so the length of the prompt passed to the LLM affects the length of the generated response.
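The sketch below is one way steps 4-6 can be tied together, under stated assumptions: the index and field names follow the earlier sketches, VectorizedQuery is from azure-search-documents 11.4+, and the instruction text is a paraphrase rather than the project's actual prompt.

```python
# Illustrative retrieval-and-generation sketch for steps 4-6; prompt
# wording, field names and parameter values are assumptions.
from azure.search.documents.models import VectorizedQuery

def answer(question, k=3):
    # Step 4: embed the user query with the same embedding model.
    query_vector = client.embeddings.create(
        model=EMBEDDING_DEPLOYMENT, input=[question]
    ).data[0].embedding

    # Retrieve the k most similar documents; for semantic reranking one
    # would additionally pass query_type="semantic" together with a
    # semantic configuration name.
    results = search_client.search(
        search_text=question,
        vector_queries=[VectorizedQuery(
            vector=query_vector, k_nearest_neighbors=k, fields="ccc_vector"
        )],
        top=k,
    )
    context = "\n\n".join(doc["ccc_text"] for doc in results)

    # Step 5: combine instructions, retrieved context and the user query.
    prompt = (
        "Answer the question using only the repair records below.\n\n"
        f"Records:\n{context}\n\nQuestion: {question}"
    )

    # Step 6: send the prompt to GPT-3.5 Turbo and return the answer.
    completion = client.chat.completions.create(
        model=CHAT_DEPLOYMENT,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return completion.choices[0].message.content
```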
7. Creation of a GUI - A simple GUI was created using Python Streamlit in order to provide an interactive interface where the user can ask questions and get responses. The same interface also displays the documents retrieved to generate the answer. The GUI was then deployed to the Azure environment as an application.
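As a minimal sketch of the interface (widget labels are illustrative, the display of retrieved documents is omitted for brevity, and answer() is the hypothetical function from the previous sketch):

```python
# Minimal Streamlit sketch of the chatbot GUI; labels are illustrative.
import streamlit as st

st.title("Data-Driven Chatbot for Enhanced Vehicle Diagnostics")

# The user can configure how many documents the answer is based on.
k = st.sidebar.number_input("Select the number of documents to retrieve",
                            min_value=1, max_value=20, value=3)

question = st.text_input("Ask a question")
if st.button("Submit and Process") and question:
    st.write(answer(question, k=k))
```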


3.4 Evaluation
To ensure the chatbot effectively addresses the specific needs of Volvo Group’s truck repair
operations, a thorough evaluation framework was implemented as part of the research method-
ology. As with any machine learning model, the performance of individual components within
the LLM and RAG pipeline significantly influences the overall user experience. For this eval-
uation, we employed the RAGAs library [7], which provides specialized metrics designed to
evaluate each component of the RAG pipeline.

Figure 3: Evaluation Framework for RAG

Figure 3 outlines a framework for evaluating the performance of a RAG system, with a focus on the alignment of generated answers with the relevant context and ground truth data. The main components of the evaluation framework are: Question, Answer, Context, and Ground truth (optional). The following are the evaluation paths.
Faithfulness
Faithfulness measures the factual consistency of the generated answer against the given context.
It is calculated from the answer and the retrieved context, scaled to a range of (0,1), with higher
values indicating better performance. A generated answer is considered faithful if all the claims
made in the answer can be inferred from the provided context. To determine this, a set of
claims from the generated answer is first identified, and each claim is then cross-checked with
the given context to see if it can be inferred from it.
The formula for faithfulness is:
\[
\text{Faithfulness} = \frac{\text{Number of claims in the generated answer that can be inferred from the given context}}{\text{Total number of claims in the generated answer}}
\]
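For example (an illustrative case rather than one of the thesis questions): if a generated answer contains four claims and three of them can be inferred from the retrieved context, its faithfulness score is 3/4 = 0.75.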

Answer Relevancy
Answer Relevancy focuses on assessing how pertinent the generated answer is to the given
prompt. Lower scores are assigned to answers that are incomplete or contain redundant in-
formation, while higher scores indicate better relevancy. This metric is computed using the
question, the context, and the answer.
Answer Relevancy is defined as the mean cosine similarity of the original question to a number
of artificial questions, which are reverse-engineered based on the answer:
\[
\text{Answer Relevancy} = \frac{1}{N} \sum_{i=1}^{N} \cos(E_{g_i}, E_o)
\]
or, equivalently,
\[
\text{Answer Relevancy} = \frac{1}{N} \sum_{i=1}^{N} \frac{E_{g_i} \cdot E_o}{\|E_{g_i}\| \, \|E_o\|}
\]

Where:
• E_{g_i} is the embedding of the generated question i,
• E_o is the embedding of the original question,
• N is the number of generated questions, typically set to 3 by default.
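As an illustrative example: with the default N = 3, if the embeddings of the three generated questions have cosine similarities of 0.95, 0.90, and 0.85 to the embedding of the original question, the answer relevancy score is (0.95 + 0.90 + 0.85)/3 = 0.90.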
Context Precision
Context Precision is a metric that evaluates whether all of the ground-truth relevant items
present in the contexts are ranked higher or not. Ideally, all the relevant records must appear
at the top ranks. This metric is computed using the question, ground truth, and the contexts,
with values ranging between 0 and 1, where higher scores indicate better precision. The formula
for Context Precision is given by:
\[
\text{Context Precision@K} = \frac{\sum_{k=1}^{K} \left( \text{Precision@}k \times v_k \right)}{\text{Total number of relevant items in the top } K \text{ results}}
\]
Where:
\[
\text{Precision@}k = \frac{\text{true positives@}k}{\text{true positives@}k + \text{false positives@}k}
\]
where:
• K is the total number of rows in contexts,
• v_k ∈ {0, 1} is the relevance indicator at rank k.
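As a brief illustrative example: with K = 3 retrieved contexts where the first and third are relevant (v_1 = 1, v_2 = 0, v_3 = 1), Precision@1 = 1/1 and Precision@3 = 2/3, so Context Precision@3 = (1 + 2/3)/2 ≈ 0.83, since there are two relevant items in the top 3 results.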
Context Recall
Context Recall measures the extent to which the retrieved context aligns with the annotated
answer, treated as the ground truth. It is computed based on the ground truth and the retrieved
context, with values ranging between 0 and 1, where higher values indicate better performance.
To estimate context recall from the ground truth answer, each sentence in the ground truth
answer is analyzed to determine whether it can be attributed to the retrieved context. In an
ideal scenario, all sentences in the ground truth answer should be attributable to the retrieved
context.
The formula for calculating context recall is:
\[
\text{Context Recall} = \frac{\text{Number of sentences in the ground truth answer that can be attributed to the context}}{\text{Total number of sentences in the ground truth answer}}
\]
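As an illustration of how these four metrics are computed with the RAGAs library [7], the following sketch follows common RAGAs (0.1.x) usage; the sample record is a placeholder, and import paths and column names may differ across library versions.

```python
# Illustrative RAGAs evaluation sketch; the sample record is a placeholder.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (answer_relevancy, context_precision,
                           context_recall, faithfulness)

eval_data = Dataset.from_dict({
    "question": ["What to do when signal lamp is malfunctioned in a vehicle?"],
    "answer": ["If the lamp is burned out, it needs to be replaced."],
    "contexts": [[
        "Complaint: signal lamp not working. Cause: burned-out bulb. "
        "Correction: replaced the rear signal lamp."
    ]],
    "ground_truth": ["Replace the burned-out bulb of the signal lamp."],
})

result = evaluate(
    eval_data,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)  # per-question metric scores, as reported in Tables 7 and 8
```

Note that RAGAs itself uses an LLM as a judge when scoring, so model credentials must be configured for the evaluation run.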

Results

This section describes the results obtained from the different experiments and implementations, along with the qualitative and quantitative comparisons performed.

4.1 GUI for the Chatbot


Figure 4 shows the GUI for the chatbot, created using Streamlit. This is for the cloud-based system; a similar GUI was created for the on-premise pipeline. The GUI contains instructions for using the chatbot, a text box to input questions, and the response to the questions as the main part. The number of documents to be retrieved can be configured by the user, and the documents used to generate the answers are displayed on the left side.


[Screenshot: the Streamlit interface titled "Data-Driven Chatbot for Enhanced Vehicle Diagnostics". A configuration panel lets the user select the number of documents to retrieve (set to 20 in this example), the retrieved documents are listed on the left, and the main area contains usage instructions, a question box with the example query "What to do when signal lamp is malfunctioned in a vehicle?", and the generated answer, which suggests replacing the lamp of the rear signal on the right side if the bulb is burned out and, if the issue persists, performing diagnostics on the wiring harness and control unit, checking the electrical connections, replacing faulty parts, and performing any required software updates or parameter resets.]

Figure 4: GUI for the Chatbot


4.2 Comparative Analysis - LLM vs RAG (Cloud-based)


Table 5 compares answers to the same questions when asked directly to the LLM GPT-3.5 Turbo API versus when asked to the chatbot with RAG (cloud implementation). This serves as a qualitative comparison; however, the questions and answers were chosen to be generic so that they could be published in this report.
Table 5: Comparison of Answers from the GPT-3.5 Turbo API and the RAG Pipeline (Cloud Implementation)

Question 1: Why we need to replace the seal in a vehicle, explain in more details.

Answer from GPT-3.5 Turbo: The seal in a vehicle is an important component that helps to prevent leaks and maintain the integrity of various systems. There are several reasons why a seal may need to be replaced, including: 1. Wear and tear: Over time, seals can become worn and damaged due to exposure to heat, friction, and other factors. This can cause them to lose their effectiveness and start to leak. 2. Age: Seals can also deteriorate over time simply due to age. This is especially true for older vehicles that may have seals made from materials that are no longer used in modern cars. 3. Damage: Seals can be damaged by accidents, impacts, or other types of physical trauma. This can cause them to fail and start to leak. 4. Improper installation: If a seal is not installed correctly, it may not be able to perform its intended function. This can lead to leaks and other problems. Replacing a seal is important because it helps to ensure that the affected system is functioning properly and that there are no leaks that could cause damage or safety issues. It is also important to use high-quality replacement seals that are designed to meet the specific requirements of the vehicle and the system in question.

Answer from RAG pipeline (cloud implementation): The seals in a vehicle are important because they prevent leaks of fluids such as oil, lubricants, and coolants. If a seal fails, it can cause a leak which can lead to damage to the engine or other components. For example, if the crankshaft seal fails, it can cause oil to leak out of the engine and potentially damage the engine. Similarly, if the valve seal fails, it can cause oil to leak into the combustion chamber and cause excessive smoke and oil contamination in the exhaust manifold. In short, replacing a failed seal is important to prevent further damage to the vehicle.

Question 2: What to do when signal lamp is malfunctioned in a vehicle?

Answer from GPT-3.5 Turbo: If the signal lamp is malfunctioned in a vehicle, the following steps can be taken: 1. Check the bulb: The first thing to do is to check the bulb. If the bulb is burnt out, it needs to be replaced. 2. Check the fuse: If the bulb is not the problem, check the fuse. A blown fuse can cause the signal lamp to malfunction. 3. Check the wiring: If the bulb and fuse are fine, check the wiring. Loose or damaged wiring can cause the signal lamp to malfunction. 4. Replace the relay: If the wiring is fine, the problem may be with the relay. A faulty relay can cause the signal lamp to malfunction. 5. Take the vehicle to a mechanic: If none of the above steps work, it is best to take the vehicle to a mechanic. They will be able to diagnose and fix the problem.

Answer from RAG pipeline (cloud implementation): If your signal lamp is malfunctioning, it could be due to various reasons such as a burned-out bulb, broken wiring, interruption, or contact fault. The correction for the issue would depend on the cause of the malfunction. For example, if the lamp is burned out, it would need to be replaced. If there is a broken wiring issue, the wiring harness may need to be replaced. It is recommended to take your vehicle to a certified Volvo service center to diagnose and repair the issue.

Question 3: There is a fault in GSM Antenna, what can I do?

Answer from GPT-3.5 Turbo: If there is a fault in the GSM antenna of your vehicle, you can take the following steps: 1. Check the antenna connection: Make sure that the antenna is properly connected to the device. Check for any loose connections or damaged cables. 2. Replace the antenna: If the antenna is damaged or faulty, you may need to replace it. You can purchase a new antenna from a reputable supplier and install it yourself or take it to a professional for installation. 3. Check the signal strength: If the signal strength is weak, you may need to move to a location with better coverage. You can also try using a signal booster to improve the signal strength. 4. Contact the manufacturer: If the problem persists, you can contact the manufacturer for assistance. They may be able to provide you with troubleshooting tips or offer a repair or replacement service.

Answer from RAG pipeline (cloud implementation): Based on the information provided, it seems that the fault in the GSM antenna could be due to various reasons such as short-circuit, interruption, water/coolant damage, or oxidation. The correction for all these issues is to replace the faulty part, which could be the SIM card, cable, base antenna, add antenna, or AM/FM/DAB rod antenna depending on the specific case. It is recommended to take the vehicle to an authorized Volvo service center for proper diagnosis and repair.

Question 4: Error indication on display: DTCP02D500: Cylinder 5 fuel injector offset learning at max limit, no subtype information.

Answer from GPT-3.5 Turbo: This error message indicates that there is an issue with the fuel injector on cylinder 5 of the engine. The offset learning process for the fuel injector has reached its maximum limit, which means that the engine control module (ECM) is unable to adjust the fuel injection timing and quantity properly. This can result in poor engine performance, reduced fuel efficiency, and increased emissions. The error message does not provide any subtype information, which means that further diagnostic testing is required to determine the exact cause of the problem. Possible causes of this error message include a faulty fuel injector, a clogged fuel injector, a damaged fuel injector wiring harness, or a malfunctioning ECM. To diagnose and repair this issue, it is recommended to take the vehicle to a qualified mechanic or dealership for further testing and repair. They will use specialized diagnostic equipment to identify the root cause of the problem and perform the necessary repairs to restore proper engine performance.

Answer from RAG pipeline (cloud implementation): The error code DTCP02D500 indicates that there is an issue with the fuel injector in cylinder 5. Specifically, the fuel injector offset learning is at the maximum limit, and there is no subtype information available. This could be caused by a defective injector, mechanical or software failures, or a blocked or clogged injector. The correction for this issue would be to replace the defective injector and perform a thorough examination to ensure that there are no other issues causing the problem.

4.3 Performance Analysis of RAG pipelines


The same dataset was used for both pipelines; the total number of records after pre-processing was 76,945, covering 52,402 unique vehicles. As discussed in the previous section, data pre-processing and evaluation are done in a common way for both the on-premise and cloud-based pipelines. Table 6 shows the execution times for the remaining steps, which were performed as part of the respective pipelines.


Table 6: Comparison of Execution Time for On-premise and Cloud-based Pipelines

Step                                     On-premise Pipeline    Cloud-based Pipeline
Embedding Generation (~77K rows)         2 hrs                  4.5 hrs
Vector Storage Ingestion (~77K rows)     30 minutes             55 minutes
RAG - Retrieval                          < 1 sec                1 sec - 1.5 sec
RAG - Generation                         1 sec                  1 sec - 5 sec

It is worth noting that the execution/response times observed for the RAG Retrieval and Generation components vary with the number of documents configured to be retrieved.

4.4 Evaluation Results for RAG pipelines


RAG evaluation metrics were used to perform the comparative analysis between the two RAG pipelines, the on-premise and cloud-based implementations.
The metrics utilized in our assessment include faithfulness and answer relevancy, which are critical for evaluating the generation phase of the pipeline, while context precision and context recall assess the retrieval phase.
Context Precision and Context Recall are metrics that require the presence of ground truth
to evaluate the performance of the retrieval phase accurately. Ground truth in the context
of vehicle diagnostics means having an authoritative set of correct responses, which usually
necessitates the involvement of specialists or domain experts, such as expert technicians in the
case of truck maintenance. However, since such domain experts were not available to evaluate our chatbot system within the thesis timeline, we could not obtain actual ground truth for the evaluation.
Given this constraint, we came up with an alternative approach for the evaluation. Since we had two RAG pipelines using the same dataset but different ways of retrieval and generation, we chose 10 questions and asked both chatbots separately. Then we used the answers from one chatbot as the "acting ground truth" for evaluating the retrieval phase of the other chatbot. While these answers are not necessarily the actual ground truths, we assumed them to be representative for the purpose of checking the robustness of the systems. Robustness is a critical aspect when using LLMs, and when two different pipelines produce similar results, that strongly suggests the result is correct. We manually validated a few of the responses from both chatbots by searching for keywords from the question in the retrieved documents.
This approach allowed us to perform a comparative analysis of the retrieval capabilities of
both implementations, even in the absence of actual ground truth data. The results provided
insights into the relative performance of the two architectures, highlighting their strengths and
weaknesses in handling the given queries. Table 7 shows the evaluation metrics for the on-
premise RAG. Similarly, Table 8 shows the evaluation metrics for the cloud-based RAG. The
number of documents to be retrieved was set to 3 for both pipelines.
Table 9 shows the average of the evaluation metrics for the on-premise and cloud-based RAG
solutions.


Table 7: Evaluation Metrics for On-premise RAG

Question Faithfulness Answer Relevancy Context Precision Context Recall


Q1 1.00 0.89 1.00 0.67
Q2 1.00 0.87 1.00 0.75
Q3 1.00 0.95 1.00 0.50
Q4 1.00 0.96 1.00 0.20
Q5 1.00 0.94 1.00 0.60
Q6 1.00 0.96 0.58 0.00
Q7 1.00 0.82 1.00 0.83
Q8 1.00 0.86 1.00 0.43
Q9 1.00 0.94 1.00 0.67
Q10 1.00 0.93 0.50 0.60

Table 8: Evaluation Metrics for Cloud-based RAG

Question Faithfulness Answer Relevancy Context Precision Context Recall


Q1 1.00 0.83 0.58 1.00
Q2 1.00 0.89 1.00 0.64
Q3 0.86 0.86 1.00 0.00
Q4 1.00 0.93 1.00 0.67
Q5 1.00 0.98 1.00 0.33
Q6 1.00 0.96 1.00 1.00
Q7 1.00 0.86 1.00 1.00
Q8 1.00 0.87 0.83 0.71
Q9 0.83 0.91 1.00 0.33
Q10 1.00 0.89 1.00 0.33

Table 9: Comparison of Evaluation Metrics for On-Premise and Cloud-based Systems

Parameter On-Premise Cloud-based


Faithfulness 1.00 0.96
Answer Relevancy 0.91 0.89
Context Precision 0.90 0.94
Context Recall 0.52 0.60

Discussion

In this section, the results are discussed in more detail.

5.1 Interpretation of Results


Figure 4 shows the GUI for the cloud-based chatbot, which is very similar to the on-premise version, since both are created using Streamlit. The example question and the number of documents to be retrieved have been chosen carefully in order to keep the answer short and generic for data privacy reasons. As mentioned in the Methodology section, the context length of the LLM GPT-3.5 Turbo is a maximum of 8192 tokens, which includes the retrieved documents as part of the context and the generated response. Since the number of documents for retrieval was set to 20 (which is on the higher side), the generated answer is short. It was observed during several experiments that keeping the number of retrieved documents in the range of 3-10 can improve the generated answers by including more specific details related to CCC.
Table 5 shows the answers generated from the LLM GPT-3.5 Turbo directly and from the cloud-based RAG pipeline. For the same questions, we obtained results with vehicle metadata and component details by changing the number of documents to be retrieved from the RAG pipeline, which is not possible with the direct response from the LLM. Though the answers from both systems seem generic, careful observation shows that the answer from the latter is somewhat more specific from a vehicle-component perspective, even without additional details from individual repair cases.
Table 6 illustrates the execution times observed in the different steps of the on-premise and cloud-based pipelines. Embedding generation was particularly time-consuming in the cloud-based pipeline, possibly due to the network configuration of the cloud infrastructure. It was done in batches by splitting the records into 100 files, in order not to send a lot of data at the same time. The embedding model text-embedding-ada-002 has a limit of 8192 tokens, so text beyond the maximum number of tokens was set to be truncated. However, this did not impact our task, as the maximum length of the merged text fields considered for embedding generation was below the token limit. This step also raises a question about the feasibility of making this solution production-ready. However, this should be manageable by generating the embeddings incrementally, using the data received at the database on a daily basis after applying the necessary data cleaning.
Regarding vector ingestion, though the cloud-based pipeline seems to take longer than the on-premise one, it was quite evident that Azure AI Search is more capable in terms of creating an index with multiple vectors and searchable/filterable metadata. For the purpose of this thesis, we ingested two vectors into Azure AI Search: one with the merged fields containing CCC and the other with the merged fields for causal part and defect, alongside the original fields and other metadata. However, we could not use the same configuration for the on-premise pipeline due to limitations of the version of the Milvus vector database, and could only ingest one merged vector with the original fields and other metadata fields.
The Retrieval and Generation parts of the RAG are also faster in the on-premise pipeline compared to the cloud-based pipeline. However, it was observed that the cloud-based pipeline can answer a wider variety of questions due to the advanced capabilities of Azure AI Search, which enhances the retrieval part and consequently the context and generation from the LLM. Considering that the response times differ by only 1-5 seconds, this does not make a big difference.
Table 9 shows the average values of the evaluation parameters for both the on-premise and cloud-based systems, calculated from the values obtained by asking the same 10 questions to both systems, as illustrated in Table 7 and Table 8. Based on this evaluation, the generation part, measured by faithfulness and answer relevancy, is better in the on-premise chatbot system, whereas the retrieval part, measured by context precision and recall, is better in the cloud-based chatbot system. However, many additional experiments are needed to support this statement, possibly with more questions, varying numbers of retrieved documents, and real ground truth validated by experts in a qualitative manner.

5.2 Pros and Cons - On-premise and Cloud-based


On-premise system
Pros: The main advantage of this pipeline is cost saving. It provides the opportunity for prolonged research with different types of tests. A lot of customization is possible when working with open-source products such as Mixtral as the LLM, Milvus as the vector DB, and Instructor XL as the embedder. Part of the framework already exists, which makes the implementation easier to maintain. It is also a good option when considering data security and control for sensitive data.
Cons: The dataset that we used for this thesis is stored in the Azure cloud environment, so there is no direct connection between the actual location of the dataset and the on-premise working environment. This results in manual downloads of the data from the cloud to that environment. Hence, the solution is not scalable, and data ingestion cannot be automated as of now.
Cloud-based system
Pros: The dataset is co-located in the same environment, so the solution is scalable, i.e. new data can be ingested incrementally. The Streamlit application deployment process is stable, and multiple users can already be added as viewers to try out the chatbot.
Cons: Cost can be the main challenge in the long run with the cloud implementation. There are restrictions on the usage of better LLM models (e.g. GPT-4) so far. There are costs associated with embedding generation and with the requests sent to the LLM, e.g. paying for 1000 requests per month per user.

5.3 Challenges Encountered


Many challenges were encountered during the project. The primary one was obtaining approvals and access for the dataset. After examining the original dataset that we intended to use, we realised that it needed a lot of cleaning, as described in the subsection "Data Collection and Preparation" under the Methodology section. We finally used a dataset from another database which had already been cleaned, although it has a limited scope compared to the original dataset. Another challenge was the infrastructure-related issues during the initial as well as intermediate stages of the thesis; the on-premise environment is not very stable yet, since it is still being upgraded with Kubernetes. The next challenge was to identify the right stakeholders/users for qualitative feedback on the answers from the chatbot. Since we do not interact with the workshop technicians in our daily work, we are trying to identify a few representatives within the department who could reflect on the needs better and help us improve the quality of the chatbot.


5.4 Ethical Considerations


The development and deployment of AI technologies, particularly in domains involving sen-
sitive personal and operational data like vehicle service information, raise significant ethical
considerations. This research recognizes the importance of adhering to ethical standards in AI
development, focusing particularly on issues related to data privacy, integrity, and user trust.
Given the chatbot’s reliance on extensive data collected from Volvo vehicles’ workshop visits,
which include personal and sensitive information related to vehicle ownership and technical
specifications, ensuring data privacy is paramount. All data used in this project complies with
the General Data Protection Regulation (GDPR), which mandates stringent measures for data
handling and privacy within the European Union [31].
Data handling practices are rigorously designed to remain in compliance with both GDPR and
Volvo’s internal data protection rules. These rules are specifically tailored to address the unique
operational requirements and privacy concerns of Volvo Group, ensuring that all data used is
handled in a manner that is not only legally compliant but also aligned with corporate ethics
and customer expectations. This dual layer of compliance helps maintain high standards of data
integrity and security, reinforcing the commitment to ethical AI development and deployment.
By integrating these strict data handling and privacy measures, this project upholds the ethical
standards essential for developing technology solutions in sensitive sectors, ensuring that the
chatbot serves as a responsible and trustworthy tool for Volvo Group’s vehicle repair operations.

Conclusion

6.1 Summary of Findings


This thesis has illustrated how integrating a RAG framework can contribute to the development
of AI-driven chatbots in automotive repair. By comparing an on-premise system with a cloud-
based one, we discovered notable differences in their performance, scalability and cost. The
on-premise setup excelled in terms of performance and cost-effectiveness, while the cloud-based
system offered better scalability and possibility of seamless integration with existing digital
infrastructures.

6.2 Key Contributions


This research has made two important contributions.
Innovative AI Integration: By applying AI technologies in a traditional sector, the project
helps to improve diagnostic processes and to enhance customer satisfaction through quicker
and more precise responses.
Practical AI Tool Enhancement: By customizing the RAG framework, we have demon-
strated how AI tools can be effectively adapted to specialized domains, paving the way for
future AI applications in similar industrial settings.

6.3 Recommendations for Future Research


There are a few short-term recommendations which are feasible but could not be explored due to the time constraints of the thesis.
The chatbot currently serves as a question-answer application; it could be further improved to consider the history of conversations.
Evaluation of the chatbot could be enhanced by including ground truth through qualitative analysis by experts. Several experiments can be done with a larger number of questions and with different numbers of retrieved documents.
Another subsequent feature which could be very useful is to support several languages for
interaction. This could be feasible with the Cloud-based implementation. This would make
it more practical for the technicians or technical services across several countries to use this
application.
Fine-tuning of the LLM can be considered, followed by assessing the feasibility of using both RAG and the fine-tuned model together for domain-specific diagnostics.
Investigating hybrid systems that combine the security benefits of on-premise setups with the scalability and efficiency of cloud services could yield more versatile and cost-effective solutions.
Experiments can be done with incremental learning where the chatbot continues to learn from
new interactions. Integrating the chatbot system with other data platforms, such as telematics
or vehicle monitoring systems, could provide a more holistic service experience, leveraging
real-time data for enhanced diagnostic capabilities.


6.4 Final Remarks


The integration of AI into automotive repair services is just the beginning. With continuous
advancement in AI, machine learning, and increasing data availability, there is a vast potential
to transform automotive diagnostics. The successful implementation and evaluation of the chatbot in this thesis suggest a future where vehicle services are more informed, efficient, and customer-focused.

References

[1] Emily M. Bender et al. "On the dangers of stochastic parrots: Can language models be too big?" In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 2021, pp. 610–623.
[2] Yonatan Bisk et al. "Experience grounds language". In: arXiv preprint arXiv:2004.10151 (2020).
[3] Tom Brown et al. "Language models are few-shot learners". In: Advances in Neural Information Processing Systems 33 (2020), pp. 1877–1901.
[4] Aakanksha Chowdhery et al. "Palm: Scaling language modeling with pathways". In: Journal of Machine Learning Research 24.240 (2023), pp. 1–113.
[5] Richard Csaky. "Deep learning based chatbot models". In: arXiv preprint arXiv:1908.08835 (2019).
[6] Jacob Devlin et al. "Bert: Pre-training of deep bidirectional transformers for language understanding". In: arXiv preprint arXiv:1810.04805 (2018).
[7] Shahul Es et al. "Ragas: Automated evaluation of retrieval augmented generation". In: arXiv preprint arXiv:2309.15217 (2023).
[8] David Ferrucci et al. "Building Watson: An overview of the DeepQA project". In: AI Magazine 31.3 (2010), pp. 59–79.
[9] Zhuoyan Han. "The applications of chatbot". In: Highlights in Science, Engineering and Technology 57 (2023), pp. 258–266.
[10] Sepp Hochreiter and Jürgen Schmidhuber. "Long short-term memory". In: Neural Computation 9.8 (1997), pp. 1735–1780.
[11] Lei Huang et al. "A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions". In: arXiv preprint arXiv:2311.05232 (2023).
[12] Gautier Izacard and Edouard Grave. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. 2021. arXiv: 2007.01282 [cs.CL].
[13] Albert Q. Jiang et al. "Mistral 7B". In: arXiv preprint arXiv:2310.06825 (2023).
[14] Albert Q. Jiang et al. Mixtral of Experts. 2024. arXiv: 2401.04088 [cs.LG].
[15] Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.
[16] Nikhil Kandpal et al. "Large language models struggle to learn long-tail knowledge". In: International Conference on Machine Learning. PMLR. 2023, pp. 15696–15707.
[17] Jared Kaplan et al. "Scaling laws for neural language models". In: arXiv preprint arXiv:2001.08361 (2020).
[18] Ali Khodadadi, Soroush Ghandiparsi, and Chen-Nee Chuah. "A Natural Language Processing and deep learning based model for automated vehicle diagnostics using free-text customer service reports". In: Machine Learning with Applications 10 (2022), p. 100424. issn: 2666-8270. doi: 10.1016/j.mlwa.2022.100424. url: https://www.sciencedirect.com/science/article/pii/S2666827022000998.
[19] Patrick S. H. Lewis et al. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks". In: CoRR abs/2005.11401 (2020). arXiv: 2005.11401. url: https://arxiv.org/abs/2005.11401.
[20] Michael McTear et al. "Conversational interfaces: devices, wearables, virtual agents, and robots". In: The Conversational Interface: Talking to Smart Devices (2016), pp. 283–308.
[21] Microsoft. OpenAI Models on Azure - Concepts and Models. Last updated on 2024-05-13, accessed on 2024-06-07. 2024. url: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#gpt-35.
[22] Microsoft. Tutorial: Generate and use embeddings with OpenAI service in Azure. Last updated on 2024-02-22, accessed on 2024-06-07. 2024. url: https://learn.microsoft.com/en-us/azure/ai-services/openai/tutorials/embeddings?tabs=python-new%2Ccommand-line&pivots=programming-language-python.
[23] Microsoft. What is Azure Search? Last updated on 2024-05-21, accessed on 2024-06-07. 2024. url: https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search.
[24] Matthew E. Peters et al. "Deep contextualized word representations". In: CoRR abs/1802.05365 (2018). arXiv: 1802.05365. url: http://arxiv.org/abs/1802.05365.
[25] Murray Shanahan. "Talking about large language models". In: Communications of the ACM 67.2 (2024), pp. 68–79.
[26] Hongjin Su et al. One Embedder, Any Task: Instruction-Finetuned Text Embeddings. 2023. arXiv: 2212.09741 [cs.CL].
[27] Ross Taylor et al. "Galactica: A large language model for science". In: arXiv preprint arXiv:2211.09085 (2022).
[28] Hugo Touvron et al. "Llama: Open and efficient foundation language models". In: arXiv preprint arXiv:2302.13971 (2023).
[29] Alan M. Turing. "Computing Machinery and Intelligence." In: Creative Computing 6.1 (1980), pp. 44–53.
[30] Ashish Vaswani et al. "Attention is all you need". In: Advances in Neural Information Processing Systems 30 (2017).
[31] Paul Voigt and Axel Von dem Bussche. "The EU General Data Protection Regulation (GDPR)". In: A Practical Guide, 1st Ed., Cham: Springer International Publishing 10.3152676 (2017), pp. 10–5555.
[32] Richard Wallace. "The elements of AIML style". In: Alice AI Foundation 139 (2003).
[33] Jianguo Wang et al. "Milvus: A Purpose-Built Vector Data Management System". In: Proceedings of the 2021 International Conference on Management of Data. SIGMOD '21. Virtual Event, China: Association for Computing Machinery, 2021, pp. 2614–2627. isbn: 9781450383431. doi: 10.1145/3448016.3457550. url: https://doi.org/10.1145/3448016.3457550.
[34] Jason Wei et al. "Emergent abilities of large language models". In: arXiv preprint arXiv:2206.07682 (2022).
[35] Joseph Weizenbaum. "ELIZA—a computer program for the study of natural language communication between man and machine". In: Communications of the ACM 9.1 (1966), pp. 36–45.
[36] Ziwei Xu, Sanjay Jain, and Mohan Kankanhalli. "Hallucination is inevitable: An innate limitation of large language models". In: arXiv preprint arXiv:2401.11817 (2024).
[37] Wayne Xin Zhao et al. "A survey of large language models". In: arXiv preprint arXiv:2303.18223 (2023).

Appendix A: Sample Data Record

Sample record (fields shown as label: value):

Index: 1
Vehicle ID: ABC2745
Brand: Trucks
Product Type: BEV
Defect: voltage drop
Causal Part: battery
Complaint: voltage issue during start-up.....
Cause: Voltage drop in 2 batteries noticed during a test
Correction: Replace the batteries with the issue

Appendix B: GUI for the Chatbot

[Screenshot: the Streamlit interface ("Data-Driven Chatbot for Enhanced Vehicle Diagnostics"), with the number of documents to retrieve set to 10 and the ten retrieved documents listed on the left. The example query is "Why do we need to replace seal in a vehicle, explain?", and the generated answer reads: "Seals are an important component in a vehicle as they prevent oil leakage and ensure that the engine runs smoothly. The need to replace a seal in a vehicle arises when it starts to leak oil. The leakage can be caused by various factors such as worn-out seals, material defects, or poor sealing due to faulty workmanship. In such cases, the seal needs to be replaced with a new one. The replacement process involves removing the faulty seal, cleaning the surface, and installing a new seal. The new seal is then tested to ensure that there are no leaks. It is important to note that regular maintenance of the vehicle can help prevent such issues from occurring."]
