Introduction to Python and Large Language Models: A Guide to Language Models
Dilyan Grigorov
Varna, Bulgaria
Introduction .............................................................................................. xxi
StableLM .................................................................................................. 258
Palmyra ................................................................................................... 259
GPT4ALL .................................................................................................. 260
Summary ................................................................................................. 261
Chapter 7: Harnessing Python 3.11 and Python Libraries for LLM Development ......... 303
LangChain ................................................................................................ 303
LangChain Features ................................................................................... 304
What Are the Integrations of LangChain? ...................................................... 305
How to Build Applications in LangChain? ....................................................... 305
Use Cases of LangChain ............................................................................. 306
Example of a LangChain App – Article Summarizer ......................................... 307
Hugging Face ............................................................................................ 309
History of Hugging Face ............................................................................. 310
Key Components of Hugging Face ................................................................ 310
OpenAI API ............................................................................................... 316
Features of the OpenAI API ......................................................................... 317
Industry Applications of the OpenAI API ....................................................... 318
Simple Example of a Connection to the OpenAI API ........................................ 320
Cohere ..................................................................................................... 322
Cohere Models .......................................................................................... 323
Pinecone .................................................................................................. 327
How Vector Databases Operate ................................................................... 327
What Exactly Is a Vector Database? ............................................................. 328
Pinecone’s Features ................................................................................... 328
Practical Applications ................................................................................. 329
Lamini.ai .................................................................................................. 332
Lamini’s Operational Mechanics ................................................................... 332
Lamini’s Features, Functionalities, and Advantages ........................................ 332
Applications and Use Cases for Lamini ......................................................... 333
Index ....................................................................................................... 369
About the Author
Dilyan Grigorov is a software developer with a passion for
Python software development, generative deep learning and
machine learning, data structures, and algorithms. He is an
advocate for open source and the Python language itself. He
has 16 years of industry experience programming in Python
and has spent five of those years researching and testing
generative AI solutions. Dilyan is a student in Stanford’s Graduate Program in
Artificial Intelligence, where he has taken classes taught by Andrew Ng, Fei-Fei Li,
and Christopher Manning.
He has been mentored by software engineers and AI experts
from Google and Nvidia. His passion for AI and ML stems from his background as an
SEO specialist dealing with search engine algorithms daily. He enjoys engaging with the
software community, often giving talks at local meetups and larger conferences. In his
spare time, he enjoys reading books, hiking in the mountains, taking long walks, playing
with his son, and playing the piano.
About the Technical Reviewer
Tuhin Sharma is Sr. Principal Data Scientist at Red Hat in
the Data Development Insights and Strategy group. Prior to
that, he worked at Hypersonix as an AI architect. He also
co-founded and has been CEO of Binaize, a website
conversion intelligence product for e-commerce SMBs. He
received a master’s degree from IIT Roorkee and a bachelor’s
degree from IIEST Shibpur in Computer Science. He loves to
code and collaborate on open source and research projects.
He has published four research papers and holds five patents in the field of
AI and NLP. He is a reviewer for the IEEE MASS conference in the AI track.
He writes deep learning articles for O’Reilly in collaboration with the
AWS MXNet team, and he is a regular speaker at prominent AI conferences
such as O’Reilly Strata & AI, ODSC, GIDS, and Devconf.
Acknowledgments
As I reflect on the journey of writing this book, I am overwhelmed with gratitude for the
countless individuals who have supported, inspired, and guided me along this path.
First and foremost, I extend my deepest thanks to my family, whose unwavering
support and endless patience have been my anchor and source of strength.
I owe a great debt of gratitude to my mentors and colleagues, who have shared their
wisdom, critiqued my ideas with kindness, and encouraged me to push the boundaries
of my knowledge. Special thanks to my mentor Alexandre Blanchette whose insightful
feedback and encouragement were invaluable to the completion of this manuscript.
Very special thanks to another of my mentors – Haiguang Li, one of the best machine
learning and AI experts I have ever met. You have been my north star throughout this
writing process. Your belief in me has been a gift beyond measure. Your generous
sharing of knowledge, patience, and encouragement has not just shaped this book but
has transformed me as a writer, software engineer, and individual. My sincerest thanks
for your invaluable contribution.
My appreciation extends to the team at Apress and Springer Group, especially my
editor, Celestine Suresh John, whose keen eye and creative vision have significantly
enhanced this book. Thank you for your patience, guidance, and commitment to
excellence.
A special word of thanks goes to the teams behind the development of large
language models. To the researchers and engineers at organizations like OpenAI, Google
Brain, the Google Research team, and others who have pushed the boundaries of what’s
possible with AI and machine learning, and who generously share their findings and
tools with the world. Your work has not only inspired this book but also revolutionized
how we think about human–computer interaction.
Finally, to you, the reader, embarking on this journey to explore Python and large
language models: I hope this book serves as a valuable guide and inspires you to delve
deeper into the transformative power of programming and AI. Thank you for your
curiosity and your commitment to learning!
Introduction
In the evolving landscape of technology, where the boundaries between science fiction
and reality blur, lies a transformative tool: the large language model (LLM). These
models, sophisticated engines of artificial intelligence, have not only redefined our
interaction with machines but have also opened new avenues for understanding human
language. This book, structured into seven comprehensive chapters, serves as both a
beacon and a bridge for those embarking on a journey through the intricate world of
LLMs and their application using Python.
Chapter 1, “Evolution and Significance of Large Language Models,” lays the
foundation. It’s here we start our journey, unraveling the complex yet fascinating
world of natural language processing (NLP) and large language models. With over 50
pages dedicated to setting the stage, this chapter aims to provide the reader with a
solid understanding of the evolution, significance, and basic concepts underpinning
LLMs. Through a meticulous exploration of topics such as text preprocessing, word
embeddings, and sentiment analysis, we uncover the magic and mechanics of LLMs and
their impact across various domains.
Chapter 2, “What Are Large Language Models?”, shifts the focus to the tools that
make working with LLMs possible, with a particular emphasis on “Python and Why
Python for LLMs?” It demystifies Python – a language synonymous with simplicity
and power in the world of programming. From basic syntax to the nuanced features
of Python 3.11, readers will gain the necessary knowledge to navigate the subsequent
chapters and harness Python for their LLM endeavors.
In Chapter 3, “Python for LLMs,” we plunge into the heart of LLMs, dissecting
their components and understanding their workings. This chapter covers everything
from embedding layers to attention mechanisms, providing insights into the technical
makeup of models like GPT-4, BERT, and others. It’s a chapter designed to equip readers
with a profound understanding of how LLMs predict the next token, learn from few
examples, and, occasionally, hallucinate.
Chapter 4, “Python and Other Programming Approaches,” is a practical guide to
leveraging Python for LLM development. Here, readers will familiarize themselves with
essential Python libraries, frameworks, and platforms such as Hugging Face and OpenAI
API, exploring their use in building applications powered by LLMs. With a focus on data
preparation and a showcase of basic examples built with each framework, this chapter is
a testament to Python’s role in the democratization of AI.
Chapter 5, “Basic Overview of the Components of the LLM Architectures,”
demonstrates the versatility and potential of LLMs through practical Python
applications. Readers will learn how to employ LLMs for tasks ranging from text
generation to chatbots, each accompanied by step-by-step examples. This chapter not
only highlights the capabilities of LLMs but also inspires readers to envision and create
their own applications.
Chapter 6, “Applications of LLMs in Python,” explores how large language models
(LLMs) are used in various domains, focusing on text generation
and creative writing. It details how LLMs can generate human-like text using models
like RNNs and transformers. The chapter covers key use cases, including content
creation, chatbots, virtual assistants, and data augmentation. It also highlights how LLMs
assist in creative writing tasks, brainstorming, dialogue crafting, world-building, and
experimental literature. Additionally, the chapter discusses language translation, text
summarization, and document understanding, emphasizing LLMs’ impact on improving
accuracy and efficiency in these areas. Finally, the chapter presents an example of
building a question-answering chatbot using LLMs.
Chapter 7 explores how Python 3.11 and libraries such as LangChain, Hugging
Face, and others are utilized to develop applications powered by large language models
(LLMs). It covers the features of LangChain, such as model interaction, data connection,
and memory, and explains how to build applications using these tools. The chapter
also discusses the integrations and use cases of LangChain in various industries, like
customer support, coding assistants, healthcare, and e-commerce, highlighting its
flexibility in creating AI-powered solutions.
This book is an invitation to a world where understanding meets creation. It’s for
the curious minds eager to decode the language of AI and for the creators ready to
shape the future. Whether you’re a student, software engineer, data scientist, an AI or
ML researcher or practitioner, or simply an enthusiast, the journey through these pages
will equip you with the knowledge and skills to participate in the ongoing conversation
between humans and machines. Welcome to the frontier of language, learning, and
imagination.
CHAPTER 1
Evolution and Significance of Large Language Models
Over the recent decades, there have been notable advancements in language models
and artificial intelligence technologies. Alongside advancements in computer vision,
voice and speech processing, and image processing models, large language models
(LLMs) are poised to profoundly impact the evolution of AI technologies. Therefore, it
is crucial to examine the progress of language models since their inception and, more
importantly, anticipate their future growth.
This chapter presents a concise overview of language models, tracing their
development from statistical and rule-based models to today’s transformer-based
multimodal large language models with billions of parameters. It also aims to provide a
clear definition of what an LLM is and to explain how LLMs work in general. Additionally,
the benefits and limitations of existing models are examined, and areas where current
models require improvement are identified.
In recent times, substantial attention has been garnered by large language models,
owing to numerous accomplishments in the field of natural language processing.
Notably, the emergence of powerful chatbots like OpenAI’s ChatGPT, Google Bard, and
Meta’s LLaMA, among others, has played a pivotal role. The achievements in language
models are the culmination of decades of research and development. These models not
only advance state-of-the-art NLP technologies but are also anticipated to significantly
impact the evolution of AI technologies.
The foundational models, initially arising in NLP research, such as the transformers,
have transcended into other domains like computer vision and speech recognition.
However, AI models, especially language models, are not flawless, and the technology
itself is akin to a double-edged sword. There are unresolved aspects that require further
research and analysis. It remains uncertain whether these models are sophisticated
computer programs generating responses solely through numerical computations and
probabilities or if they possess a form of understanding and intelligence.
The sheer size of these gigantic language models makes it challenging to interpret
their internal logic meaningfully. Hence, understanding the history of language models
is crucial for depicting a better picture of their future development. The following
paragraphs aim to provide a concise overview of language models and their evolution.
Their goal is to review key milestones and innovations in language and machine learning
models that have significantly influenced today’s modern large language models.
How these models will continue to evolve is a matter for future research. The hope is
that this overview will offer valuable insights into the past, present, and future of
language models, contributing to the development of trustworthy models aligned with
universal human values.
1. A. A. Markov, An Example of Statistical Investigation of the Text Eugene Onegin Concerning the Connection of Samples in Chains, www.alpha60.de/research/markov/, 1913; A. A. Markov, An example of statistical investigation in the text of Eugene Onegin illustrating coupling of “tests” in chains, Proceedings of the Academy of Sciences of St. Petersburg, vol. 7, pp. 153–162, 1913
2. C. E. Shannon, A mathematical theory of communication, The Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948
It is intriguing that the initial application area of the Markov chain was in language,
with Markov’s study serving as a foundational exploration into the simplest form of a
language model.
Moving forward in the mid-20th century, Claude Shannon suggested the utilization
of Markov processes to model natural language. Employing n-th order Markov chains,
he developed a statistical model capable of characterizing the probability of sequences
of letters, encompassing words and sentences. From a mathematical perspective,
Shannon’s approach involved counting the frequency of character sequences of length
n, referred to as n-grams.
In 1948, Claude Shannon revolutionized the field of information theory with his
seminal paper, “A Mathematical Theory of Communication.” This work marked the
inception of key concepts such as entropy and cross-entropy, as well as an exploration
of the n-gram model. (Shannon adopted the term “entropy” from statistical mechanics
following advice from John von Neumann.)
Entropy, in this context, signifies the uncertainty inherent in a probability
distribution, while cross-entropy encapsulates the uncertainty of one probability
distribution concerning another. Notably, entropy serves as a lower bound for cross-
entropy. To elaborate, as the length of a word sequence tends toward infinity, the
language’s entropy can be defined, assuming a constant value that can be estimated
from the language’s data.
Shannon’s contribution establishes a valuable tool for evaluating language models.
If one language model demonstrates superior accuracy in predicting word sequences
compared to another, it will exhibit lower cross-entropy. This insight provides a robust
framework for assessing language modeling. It is essential to recognize that language
models extend beyond natural languages to encompass formal and semi-formal
languages.
The pioneering works of Markov and Shannon paved the way for diverse approaches
to language modeling, encompassing rule-based systems, neural network–based
systems, and more recently, transformer-based pre-trained large language models
(LLMs) founded on attention mechanisms.
3. N. Chomsky, Three models for the description of language, IEEE Transactions on Information Theory, vol. 2, no. 3, pp. 113–124, 1956
4. J. Weizenbaum, ELIZA, a computer program for the study of natural language communication between man and machine, Communications of the ACM, vol. 9, no. 1, pp. 36–45, 1966
approaches to models capable of learning and generalizing from data. This shift gave rise
to statistical language processing techniques like n-grams and hidden Markov models,
opening the door to more nuanced language analysis.
This distinction also highlights a significant difference between early rule-based
chatbot programs and contemporary advanced models based on neural networks and
large language models (LLMs). Another category of early language models relied on
information retrieval (IR) techniques. In this approach, chatbots generated responses
by matching patterns in pre-constructed databases of conversation pairs. Information
retrieval–based chatbots, exemplified by CleverBot from the late 1980s, benefited from
techniques like term frequency-inverse document frequency (TF-IDF), cosine similarity,
and state space models at the word level.
N-grams
The introduction of n-gram models in the 1990s and early 2000s marked a pivotal
advancement in statistical language modeling. Founded on a simple yet potent
concept, these models evaluated the likelihood of a word’s occurrence by examining the
preceding words within a sequence.
Despite their uncomplicated nature, n-gram models presented a crucial mechanism
for grasping context in language. They represent sequences of n adjacent items from a
sample of data, such as words in a sentence. By concentrating on the local relationships
between words, these models began capturing the inherent dependencies that shaped
meaningful linguistic expressions.
Despite their simplicity, n-grams have played a crucial role in predicting the
probability of the nth word based on the previous n-1 words. This concept has
been instrumental in both basic language modeling and the development of more
sophisticated models.
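To make this concrete, here is a minimal illustrative sketch (the toy corpus and helper function below are invented for demonstration and are not taken from the book) that estimates the probability of a word given the previous word by counting bigrams:

from collections import Counter

# Toy corpus; in practice the counts come from a large text collection
corpus = "the cat sat on the mat the cat ate the fish".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_probability(prev_word, word):
    # Estimate P(word | prev_word) as count(prev_word, word) / count(prev_word)
    if unigram_counts[prev_word] == 0:
        return 0.0
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

# "the" occurs four times and is followed by "cat" twice, so P("cat" | "the") = 0.5
print(bigram_probability("the", "cat"))

The same counting idea generalizes to longer histories: a trigram model conditions on the previous two words, a 4-gram model on the previous three, and so on.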
Bag-of-Words (BOW)
Another fundamental technique in language modeling is the bag-of-words (BOW). This
simple approach represents elements of language as numerical values based on word
frequency in a document. Essentially, BOW6 utilizes word frequency to create fixed-
length vectors for document representation.
Although straightforward, BOW has been a foundational vectorization or embedding
technique. Modern language models often build upon advanced word embedding and
tokenization techniques.
The BOW model represents a method for converting a document into numerical
form, a prerequisite before employing it in a machine learning algorithm. In any natural
language processing task, this initial conversion is essential as machine learning
algorithms cannot operate on raw text; thus, we must transform the text into a numerical
representation, a process known as text embedding.
Text embedding involves two primary approaches: word vectors and document
vectors. In the word vectors approach, each word in the text is represented as a vector
(a sequence of numbers), and the entire document is then converted into a sequence
of these word vectors. Conversely, document vectors embed the entire document as a
single vector, simplifying the process compared to individual word embedding.
5. Lawrence Page and S. Brin, The PageRank Citation Ranking: Bringing Order to the Web, 1996
6. Z. S. Harris, Distributional structure, Word, vol. 10, no. 2–3, pp. 146–162, 1954
Moreover, it ensures that all documents are embedded in the same size, a
convenience for machine learning algorithms that often require a fixed-size input.
For instance, with a vocabulary of 1000 words, the document is expressed as a
1000-dimensional vector, where each entry signifies the frequency of the corresponding
vocabulary word in the document.
While this technique may be limited for complex tasks, it serves well for simpler
classification problems. Its simplicity and ease of use make it an appealing choice for
embedding a set of documents and applying various machine learning algorithms. The
BOW model offers easy implementation and swift execution.
Unlike other embedding methods that often demand specialized domain knowledge
or extensive pre-training, this approach avoids such complexities or even manual
feature engineering. It essentially works out of the box. However, its efficacy is limited
to relatively simple tasks that do not rely on understanding the contextual nuances
of words.
The typical application of the bag-of-words model is in embedding documents for
classifier training. Classification tasks involve categorizing documents into multiple
types, and the model’s features are particularly effective for tasks like spam filtering,
sentiment analysis, and language identification.
For instance, spam emails can be identified based on the frequency of key phrases like
“act now” and “urgent reply,” while sentiment analysis can discern positive or negative
tones using terms like “boring” and “awful” vs. “beautiful” and “spectacular.” Additionally,
language identification becomes straightforward when examining the vocabulary.
Once documents are embedded, they can be fed into a classification algorithm.
Common choices include the naive Bayes classifier, logistic regression, or decision
trees/random forests – options that are relatively straightforward to implement and
understand compared to more complex neural network solutions.
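As an illustration of this workflow, the short sketch below (the example messages, labels, and model choice are invented for demonstration) turns a handful of texts into bag-of-words vectors with scikit-learn's CountVectorizer and trains a naive Bayes spam classifier on them:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny illustrative dataset: 1 = spam, 0 = not spam
messages = [
    "act now urgent reply needed",
    "urgent offer act now",
    "let us meet for lunch tomorrow",
    "see you at the meeting tomorrow",
]
labels = [1, 1, 0, 0]

# Each message becomes a fixed-length vector of word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)

# Train a naive Bayes classifier on the bag-of-words vectors
classifier = MultinomialNB()
classifier.fit(X, labels)

test = vectorizer.transform(["urgent act now"])
print(classifier.predict(test))  # expected: [1], i.e., spam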
7. H. P. Luhn, A statistical approach to mechanized encoding and searching of literary information, IBM Journal of Research and Development, vol. 1, no. 4, pp. 309–317, 1957
On the other hand, inverse document frequency (IDF) assesses the proportion of
documents in the corpus that contain the term. Terms unique to a small percentage
of documents, such as technical jargon terms, receive higher IDF values compared to
common words found across all documents, like “a,” “the,” and “and.”
The TF-IDF score for a term is determined by the multiplication of its TF and IDF
scores. In simpler terms, a term holds high importance when it appears frequently in a
specific document but infrequently across others. This balance between commonality
within a document, measured by TF, and rarity between documents, measured by
IDF, results in the TF-IDF score, indicating the term’s significance for a document in
the corpus.
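To see the arithmetic in action, here is a small sketch of one common TF-IDF variant (the toy documents and the exact formulas are illustrative; libraries differ in smoothing and normalization details):

import math

# Toy corpus of three documents, each a list of words
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "quantum entanglement experiments".split(),
]

def tf(term, doc):
    # Term frequency: how often the term appears in this document
    return doc.count(term) / len(doc)

def idf(term, docs):
    # Inverse document frequency: terms found in few documents score higher
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing) if containing else 0.0

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# "the" appears in most documents, so its TF-IDF is low;
# "quantum" is unique to one document, so its TF-IDF is higher
print(tf_idf("the", docs[0], docs))
print(tf_idf("quantum", docs[2], docs))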
8. T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781, 2013
9. D. E. Rumelhart, G. E. Hinton, R. J. Williams, et al., Learning internal representations by error propagation, 1985
In the RNNs at each position, there exists an intermediate representation for each
layer, embodying the “state” of the word sequence up to that point. The intermediate
representation for the current layer at the current position is determined by the
intermediate representation of the same layer at the previous position and the
intermediate representation of the layer beneath at the current position. The ultimate
intermediate representation at the current position plays a pivotal role in computing the
probability of the next word.
12. Y. Bengio, P. Simard, and P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157–166, 1994
• Forget Gate
• Input Gate
• Output Gate
13. S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997
The initial part decides whether information from the previous timestamp is relevant
and should be retained or deemed irrelevant and discarded. The second segment
focuses on learning new information from the input. Finally, in the third section,
the updated information from the current timestamp is passed to the subsequent
timestamp, constituting a single time step in the LSTM cycle.
These three components are also known as gates. The gates govern the flow of
information in and out of the memory cell or LSTM cell. The Forget Gate, the Input Gate,
and the Output Gate regulate these flows. An LSTM unit, comprising these gates along with a
memory cell or LSTM cell, can be likened to a layer of neurons in a traditional feedforward
neural network, where each neuron encompasses a hidden layer and a current state.
Similar to a basic RNN, LSTM possesses a hidden state denoted as H(t-1) for
the previous timestamp and H(t) for the current timestamp. Additionally, LSTM
incorporates a cell state represented by C(t-1) and C(t) for the previous and current
timestamps, respectively. The hidden state is identified as Short-Term Memory, while
the cell state is termed Long-Term Memory.
Example: Consider two sentences separated by a full stop: “John is a nice person,”
followed by “Allison, on the other hand, is bad.” As we transition from the first sentence
to the second, the LSTM network, utilizing the Forget Gate, can appropriately disregard
information about John and acknowledge that the focus has shifted to Allison.
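To make the hidden state H(t) and cell state C(t) tangible, here is a minimal PyTorch sketch (the layer sizes and random input are illustrative, and this is not code from the book) that passes a toy sequence through an LSTM layer and inspects both states:

import torch
import torch.nn as nn

# One LSTM layer: 8-dimensional inputs, 16-dimensional hidden and cell states
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# A toy batch: one sequence of 5 timesteps, each an 8-dimensional vector
x = torch.randn(1, 5, 8)

output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([1, 5, 16]) - hidden state at every timestep
print(h_n.shape)     # torch.Size([1, 1, 16]) - final hidden state (short-term memory)
print(c_n.shape)     # torch.Size([1, 1, 16]) - final cell state (long-term memory)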
14. K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, arXiv preprint arXiv:1406.1078, 2014
15. I. Sutskever, O. Vinyals, and Q. V. Le, Sequence to sequence learning with neural networks, Advances in Neural Information Processing Systems, vol. 27, 2014
Attention Mechanism
For decades, researchers have endeavored to develop artificial systems and integrate
elements of the human visual system and brain mechanisms into these systems. A
significant stride toward achieving this objective occurred in 2014, when researchers
introduced a fundamental yet ingenious concept: the incorporation of an attention
mechanism16 into encoder-decoder architectures.
The key proposition was that the relative importance of elements within a sequence
could be encoded and assigned during sequence processing. The attention technique
proposed an alignment model to compute scores, which were then subjected to a
softmax function to generate weights. Subsequently, a context vector was derived
through a weighted sum of the encoder’s hidden states.
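The score-softmax-weighted-sum pipeline described above can be sketched in a few lines of NumPy. The snippet below is a deliberate simplification (it uses plain dot-product scores instead of the learned alignment model of the original paper, and all values are random toy data):

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Toy encoder hidden states: 4 positions, each a 3-dimensional vector
encoder_states = np.random.randn(4, 3)
# Current decoder state, used as the query
decoder_state = np.random.randn(3)

scores = encoder_states @ decoder_state   # 1. alignment scores
weights = softmax(scores)                 # 2. attention weights that sum to 1
context = weights @ encoder_states        # 3. weighted sum = context vector

print(weights)        # four attention weights
print(context.shape)  # (3,) - same dimensionality as a hidden state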
The integration of the attention mechanism stands as a pivotal concept in advancing
artificial neural networks and, more broadly, machine learning models. Notably,
it served as inspiration for several influential breakthroughs in natural language
processing (NLP), including the renowned transformer architecture. This transformer
architecture, in turn, paved the way for various innovative developments such as the
vision transformer (ViT), the BERT model, and GPT models, leading to the creation of
powerful chatbot programs like ChatGPT.
16. D. Soydaner, Attention mechanism in neural networks: where it comes and where it goes, Neural Computing and Applications, vol. 34, no. 16, pp. 13371–13385, 2022
17. https://research.google/pubs/attention-is-all-you-need/
18. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805, 2018
images, video, audio, and text. Through the integration of different modalities, a deep
learning model can gain a more comprehensive understanding of its surroundings, as
certain cues are exclusive to specific modalities.
To illustrate, consider the task of emotion recognition; while visual cues from a
human face (visual modality) are crucial, the tone and pitch of a person’s voice (audio
modality) carry significant information about their emotional state, which may not be
evident from facial expressions alone, even when they are synchronized.
While Unimodal or Monomodal models, which process only a single modality, have
undergone extensive research and yielded impressive results in areas such as computer
vision and natural language processing, they possess inherent limitations. As a result,
there is a growing necessity for multimodal models.
Multimodal models frequently leverage deep neural networks, although earlier
research has integrated other machine learning models like hidden Markov models
(HMM) or Restricted Boltzmann Machines (RBM).
In the domain of multimodal deep learning, the prevalent modalities typically
encompass visual elements (such as images and videos), textual information, and
auditory components (including voice, sounds, and music). Nonetheless, less
conventional modalities also play a role, such as 3D visual data, depth sensor data, and
LiDAR data, particularly in applications like self-driving cars.
Nevertheless, the most prevalent combinations involve the four primary
modalities:
• Image + Text
• Image + Audio
• Text + Audio
Representation, the first challenge, involves the encoding of data from diverse
modalities into a vector or tensor. The creation of effective representations that
encapsulate the semantic information of raw data is crucial for the success of machine
learning models. However, the difficulty lies in extracting features from heterogeneous
data in a manner that optimally exploits their synergies. It is equally essential to fully
harness complementarity among different modalities while avoiding the inclusion of
redundant information.
Multimodal representations can be categorized into two types:
1. Joint Representation: Each individual modality undergoes encoding and is subsequently placed into a shared high-dimensional space. This direct approach may be effective, especially when modalities exhibit similar natures.
2. Coordinated Representation: Each modality is encoded separately, and the resulting representations are coordinated through a constraint such as a similarity measure, so that semantically related inputs from different modalities end up close to one another.
Fusion
The fusion task involves integrating information from two or more modalities to perform
a prediction. Effectively merging diverse modalities, such as video, speech, and text,
poses a challenge due to the heterogeneous nature of multimodal data.
Fusing heterogeneous information constitutes a fundamental aspect of multimodal
research, accompanied by a myriad of challenges. Practical issues include addressing
variations in formats, lengths, and non-synchronized data. Theoretical challenges
revolve around identifying the optimal fusion technique, encompassing simple
operations like concatenation or weighted sums, as well as more intricate mechanisms
such as transformer networks or attention-based recurrent neural networks (RNNs).
The choice between early and late fusion is also a consideration. Early fusion
integrates features immediately after extraction, employing mechanisms like those
mentioned previously. In contrast, late fusion performs integration only after each
unimodal network produces a prediction, often employing voting schemes, weighted
averages, or other techniques. Hybrid fusion techniques, which combine outputs from
early fusion and unimodal predictors, have also been proposed.
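The difference between the two strategies can be sketched in a few lines of PyTorch (the feature sizes, classifiers, and the simple averaging rule below are illustrative assumptions, not a prescribed implementation):

import torch
import torch.nn as nn

image_features = torch.randn(1, 128)  # e.g., output of a vision encoder
text_features = torch.randn(1, 64)    # e.g., output of a text encoder

# Early fusion: concatenate the modality features, then classify jointly
early_classifier = nn.Linear(128 + 64, 2)
early_logits = early_classifier(torch.cat([image_features, text_features], dim=1))

# Late fusion: classify each modality separately, then combine the predictions
image_classifier = nn.Linear(128, 2)
text_classifier = nn.Linear(64, 2)
late_logits = (image_classifier(image_features) + text_classifier(text_features)) / 2

print(early_logits.shape, late_logits.shape)  # torch.Size([1, 2]) torch.Size([1, 2])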
Alignment
Alignment involves identifying direct relationships between different modalities.
Contemporary multimodal learning research strives to create modality-invariant
representations, meaning that when different modalities signify a similar semantic
concept, their representations should be close together in a latent space. For instance,
the sentence “she dived into the pool,” an image of a pool, and the audio signal of a
splash sound should share proximity in the representation space manifold.
Translation
Translation entails mapping one modality to another, exploring how one modality (e.g.,
textual) can be translated to another (e.g., visual) while preserving semantic meaning.
Translations are inherently open-ended and subjective, lacking a definitive answer,
adding complexity to the task.
Current multimodal learning research involves constructing generative models for
modal translations. Recent examples include DALL-E and other text-to-image models,
illustrating the development of generative models facilitating translations between text
and visual modalities.
Co-learning
Multimodal co-learning is directed toward transferring knowledge acquired from one or
more modalities to tasks involving other modalities. This approach is particularly crucial
in scenarios characterized by low-resource target tasks, the absence or partial presence
of modalities, or the presence of noisy data.
Translation can be employed as a method of co-learning to facilitate the transfer
of knowledge from one modality to another. Neuroscientific insights also indicate
that humans may engage in co-learning through translation methods. Individuals
experiencing aphantasia, the inability to mentally create images, demonstrate poorer
performance on memory tests. Conversely, those adept at creating such mappings,
such as translating between textual/auditory and visual modalities, exhibit enhanced
performance on memory tests. This underscores the significance of the ability to
convert representations across different modalities as a key aspect of human cognition
and memory.
These components are commonly referred to as the encoding module, the fusion
module, and the classification module.
Image Captioning
Image captioning involves generating brief textual descriptions for given images,
utilizing multimodal datasets containing both images and text descriptions. It addresses
the translation challenge by converting visual representations into textual ones.
Models need to grasp the semantics of an image, detecting key objects, actions, and
characteristics. This task can also extend to video captioning.
Image captioning models find utility in providing text alternatives to images,
benefiting blind and visually impaired users.
Image Retrieval
Image retrieval entails finding relevant images within a large database based on
a retrieval key. This task, also known as Content-Based Image Research (CBIR),
leverages deep learning multimodal models, offering a broader solution with enhanced
capabilities, often reducing the reliance on traditional tags. Image retrieval is evident in
everyday scenarios, such as search engine results displaying images related to a query.
Text-to-Image Generation
Text-to-image generation is a popular multimodal learning application that tackles the
translation challenge. Models like OpenAI’s DALL-E and Google’s Imagen create images
based on short text descriptions, providing a visual representation of the text’s semantic
meaning. These models find application in photoshopping, graphic design, and digital
art inspiration.
Emotion Recognition
Emotion recognition, a use case where multimodal datasets outperform monomodal
ones, involves analyzing video, text, and audio data to identify emotional states.
Incorporating sensor data like encephalogram data further enhances the multimodal
input. While multimodal datasets convey more information, training multimodal
networks poses challenges, sometimes leading to performance degradation compared
to single modality counterparts. Understanding these difficulties is crucial for effective
implementation.
In summary, multimodal deep learning finds diverse applications, enhancing the
capabilities of computer vision systems across various tasks.
NLP encompasses the task of enabling machines to grasp, decipher, and generate
human language in a manner that is both valuable and meaningful. OpenAI, renowned
for developing advanced language models such as ChatGPT, underscores the pivotal
role of NLP in crafting intelligent systems capable of comprehending, responding to, and
generating text, thereby enhancing the user-friendliness and accessibility of technology.
Elements of NLP
Natural language processing is not a single, unified approach but instead consists of
several constituent elements, each contributing to the comprehensive comprehension
of language. The primary components that NLP endeavors to fathom encompass syntax,
semantics, pragmatics, and discourse.
Syntax
Syntax encompasses the organization of words and phrases to construct well-structured
sentences within a language. In the sentence “The boy played with the ball” syntax
entails scrutinizing the grammatical structure of this sentence to ensure it adheres to
English grammatical rules, including subject-verb agreement and correct word order.
Semantics
Semantics is concerned with grasping the meanings of words and how they collectively
convey meaning when used in sentences. In the sentence “The panda eats shoots and
leaves,” semantics assists in discerning whether the panda consumes plants (shoots and
leaves) or engages in a violent act (shoots) and subsequently departs (leaves), depending
on word meanings and context.
Pragmatics
Pragmatics deals with comprehending language within diverse contexts, ensuring
that the intended meaning is derived based on the situation, the speaker’s intent, and
shared knowledge. When someone says, “Can you pass the salt?” pragmatics involves
recognizing that it is a request rather than an inquiry about one’s capability to pass the
salt, interpreting the speaker’s intent within the dining context.
Discourse
Discourse concentrates on the analysis and interpretation of language beyond the
sentence level, considering how sentences interrelate within texts and conversations.
In a conversation where one person exclaims, “I’m freezing,” and another responds,
“I’ll close the window,” discourse entails understanding the coherence between the
two statements, acknowledging that the second statement is a response to the implied
request in the first.
Comprehending these constituent elements is imperative for individuals venturing
into NLP, as they constitute the foundation of how NLP models decipher and generate
human language.
NLP Tasks
Human language is rife with complexities that pose significant challenges when it comes
to developing software capable of accurately discerning the intended meaning from
text or spoken data. Homonyms, homophones, sarcasm, idioms, metaphors, exceptions
in grammar and usage, and variations in sentence structures are just a few of the
intricacies of human language. While humans may take years to master these nuances,
programmers must instill the ability to recognize and understand them accurately from
the outset in natural language-driven applications, if these applications are to be truly
effective.
Various NLP tasks are employed to deconstruct human text and voice data in ways
that facilitate the computer’s comprehension. Some of these tasks encompass:
Tokenization
This process entails the segmentation of text into individual words, phrases, symbols,
or other meaningful units known as tokens. Tokenization serves as a fundamental
objective: to convey text in a format that retains its context while making it
comprehensible to machines. Through the transformation of text into tokens, algorithms
gain the capability to discern patterns effectively.
This capacity for pattern recognition is pivotal as it empowers machines to interpret
and react to human inputs. For instance, when confronted with the word “running,” a
machine doesn’t perceive it as a solitary entity; instead, it views it as a composition of
tokens that it can dissect and extract meaning from.
Tokenization methods can vary depending on the level of granularity required and
the specific demands of the task. These methods encompass a spectrum, ranging from
disassembling text into individual words to breaking it down into characters or even
smaller subword units.
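As a quick illustration of the two ends of that spectrum (a toy snippet, not taken from the book):

text = "Tokenization helps machines read text"
word_tokens = text.split()   # word-level tokens
char_tokens = list(text)     # character-level tokens
print(word_tokens)
print(char_tokens)

Subword tokenizers such as BPE or WordPiece sit between these two extremes, splitting rare words into smaller reusable pieces.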
Parsing
In natural language processing (NLP), parsing refers to the process of analyzing the
grammatical structure of a sentence or a piece of text to determine its syntactic and
grammatical elements. The primary goal of parsing is to understand the hierarchical
relationships between words and phrases within a sentence, identifying how they
function within the sentence’s structure.
Key parsing tasks include tagging each word with its part of speech and working out how phrases and clauses combine into the sentence’s overall syntactic structure.
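For instance, part-of-speech tagging, one building block of parsing, can be sketched with NLTK in the same style as the examples later in this chapter (this snippet is illustrative and not taken from the book):

import nltk

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("The boy played with the ball.")
# Prints (word, tag) pairs such as ('boy', 'NN') and ('played', 'VBD')
print(nltk.pos_tag(tokens))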
Lemmatization
This technique aims to reduce words to their base or root form, facilitating the grouping
of various word forms with a common root. Lemmatization aims to find the base or
dictionary form of a word, known as the “lemma,” while preserving the word’s actual
meaning. It involves considering the word’s part of speech (e.g., noun, verb, adjective)
and context in order to perform a more accurate transformation.
Lemmatization results in valid words that can be found in a dictionary, making it
suitable for tasks where the interpretability and meaningfulness of words are crucial.
For example, the word “running” would be lemmatized to “run,” and “better” would be
lemmatized to “good.”
Word Segmentation
Word segmentation involves the process of extracting individual words from a given text
string. For instance, when a person scans a handwritten document into a computer, an
algorithm can be employed to analyze the page and identify word boundaries, typically
indicated by spaces between words.
Morphological Segmentation
Morphological segmentation dissects words into smaller units known as morphemes.
For instance, the word “untestably” would be deconstructed into [[un[[test]able]]ly],
with the algorithm recognizing “un,” “test,” “able,” and “ly” as distinct morphemes. This
technique proves particularly valuable in applications like machine translation and
speech recognition.
Stemming
Stemming aims to isolate the root forms of words, especially those that contain
inflections. For instance, in the sentence, “The boy played with the ball,” the algorithm
can identify that the root form of the word “played” is “play.” This functionality proves
useful when users seek to analyze a text for all occurrences of a specific word, including
its various conjugations. The algorithm recognizes that these conjugated forms
essentially represent the same word, despite variations in their lettering.
Employing this text-processing technique proves valuable in addressing issues
of sparsity and standardizing vocabulary. It not only aids in minimizing redundancy,
as inflected words and their word stems typically convey the same meaning, but it
also enables NLP models to establish connections between inflected words and their
corresponding word stems. This linkage enhances the model’s comprehension of how
these words are utilized in analogous contexts.
Stemming algorithms operate by identifying common prefixes and suffixes
encountered in inflected words and truncating them from the word. Occasionally, this
may yield word stems that do not correspond to actual words. Consequently, while this
approach undoubtedly offers advantages, it is not exempt from its inherent limitations.
Tokenization
import nltk
from nltk.tokenize import word_tokenize
nltk.download('punkt')
text = "The quick brown fox jumped over the lazy dog."
tokens = word_tokenize(text)
print(tokens)
Output:
['The', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy',
'dog', '.']
Stop Word Removal
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download('punkt')
nltk.download('stopwords')
text = "The quick brown fox jumped over the lazy dog."
stop_words = set(stopwords.words('english'))
tokens = word_tokenize(text)
filtered_tokens = [token for token in tokens if not token in stop_words]
print(filtered_tokens)
Output:
['The', 'quick', 'brown', 'fox', 'jumped', 'lazy', 'dog', '.']
Stemming
import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
nltk.download('punkt')
text = "The quick brown foxes jumped over the lazy dogs."
stemmer = PorterStemmer()
tokens = word_tokenize(text)
stemmed_tokens = [stemmer.stem(token) for token in tokens]
print(stemmed_tokens)
Output:
['the', 'quick', 'brown', 'fox', 'jump', 'over', 'the', 'lazi', 'dog', '.']
Lemmatization
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
nltk.download('wordnet')
text = "The quick brown foxes jumped over the lazy dogs."
lemmatizer = WordNetLemmatizer()
tokens = word_tokenize(text)
lemmatized_tokens = [lemmatizer.lemmatize(token) for token in tokens]
print(lemmatized_tokens)
Output:
['The', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dog', '.']
N-grams
import nltk
from nltk.util import ngrams
text = "The quick brown foxes jumped over the lazy dogs."
tokens = nltk.word_tokenize(text)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(list(bigrams))
print(list(trigrams))
Output:
[('The', 'quick'), ('quick', 'brown'), ('brown', 'foxes'), ('foxes', 'jumped'), ('jumped', 'over'), ('over', 'the'), ('the', 'lazy'), ('lazy', 'dogs'), ('dogs', '.')]
[('The', 'quick', 'brown'), ('quick', 'brown', 'foxes'), ('brown', 'foxes', 'jumped'), ('foxes', 'jumped', 'over'), ('jumped', 'over', 'the'), ('over', 'the', 'lazy'), ('the', 'lazy', 'dogs'), ('lazy', 'dogs', '.')]
TF-IDF
from sklearn.feature_extraction.text import TfidfVectorizer
corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown foxes jump over the lazy dogs and cats.",
    "The lazy dogs and cats watch the quick brown foxes jump over the moon.",
]
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(corpus)
print(tfidf_matrix.toarray())
Output:
Table 1-1 lists some common feature extraction techniques for natural language
processing based on NLTK.
Word Embedding
In word embedding, both words and documents are converted into numerical vectors.
This conversion ensures that words with similar meanings share similar vector
representations. This technique captures meaning in a compressed dimensional space,
offering a faster alternative to traditional, more complex graph embedding models like
WordNet.
The vectors are then utilized by machine learning models, maintaining the text’s
semantic and syntactic integrity. The transformed data is processed by NLP algorithms,
which efficiently interpret these representations.
• Count Vector
• TF-IDF Vector
• Co-occurrence Vector
Word embeddings are capable of training advanced deep learning models such
as GRU, LSTM, and Transformers. These models have shown remarkable success in
various NLP tasks, including sentiment analysis, named entity recognition, and speech
recognition.
The effectiveness of word embedding has boosted the popularity of ML in NLP,
making it a highly favored field among developers. For example, a 50-value word
embedding can represent 50 distinct characteristics. Many opt for preexisting word
embedding models such as Flair, fastText, and spaCy, among others.
Common techniques for word embeddings, some of them already featured in this book,
range from frequency-based representations such as count and TF-IDF vectors to
prediction-based models such as Word2Vec.
Benefits
1. Word embeddings excel in grasping the meaning and context
of words, leading to enhanced accuracy in text analysis and
predictions.
Limitations
1. The effectiveness of word embeddings might be limited for certain
text types or languages.
2. They might not fully capture all subtleties of meaning and context,
as they rely on statistical patterns in data and may not accurately
represent real semantic links between words.
Example:
Before running the example, run the command pip install gensim to get the latest
version of Gensim.
from gensim.models import Word2Vec

sentences = [ "The quick brown fox jumps over the lazy dog".split(),
    "The lazy dog watches the quick brown fox".split(),
    "The quick brown cat jumps over the lazy dog".split(),
    "The lazy dog watches the quick brown cat".split() ]

# Train a small Word2Vec model on the toy corpus (illustrative parameters)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)
print(model.wv["quick"])  # the learned 100-dimensional vector for "quick"

Output:
The expected output is a 100-dimensional vector of floating-point values and is too large to reproduce in full here.
Semantic Understanding
Machines often struggle to grasp the inherent meaning in words, sentences, and
documents. Techniques like word sense disambiguation and recognition of meanings
can improve machines’ understanding of language.
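For example, NLTK ships a classic word sense disambiguation algorithm (the Lesk algorithm); the illustrative snippet below asks it to pick the WordNet sense of "bank" that best fits the surrounding sentence:

import nltk
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

nltk.download('punkt')
nltk.download('wordnet')

sentence = word_tokenize("I went to the bank to deposit my money")
# Prints the WordNet synset that lesk selects for "bank" in this context
print(lesk(sentence, "bank"))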
Grasping these concepts is vital for NLP applications aimed at extracting insights
and information and supporting automated processing and conversational systems like
chatbots.
Sentiment Analysis
The primary goal of sentiment analysis is to automatically assess the sentiment or
emotional state conveyed by the text, whether it is positive, negative, neutral, or a more
nuanced sentiment like joy, anger, sadness, or surprise.
Each of these techniques plays a pivotal role in enabling computers to process and
make sense of human language, serving as the fundamental building blocks for more
advanced NLP applications.
Example:
Before running the example, run the command pip install textblob==0.18.0.
from textblob import TextBlob

def analyze_sentiment(text):
    testimonial = TextBlob(text)
    polarity = testimonial.sentiment.polarity
    subjectivity = testimonial.sentiment.subjectivity
    # Return both scores so the caller can inspect them
    return {"polarity": polarity, "subjectivity": subjectivity}
# Example usage
text = "I love this phone. It has an amazing camera!"
result = analyze_sentiment(text)
print(f"Sentiment Analysis Result: {result}")
Text Classification
In today’s increasingly digital world, understanding and processing vast volumes of
data is crucial. Natural language processing (NLP), a dynamic and influential domain in
computer science, addresses this need.
For instance, emails can be classified as spam or not, tweets as positive or negative,
and articles as relevant or irrelevant to a specific subject. This section aims to introduce
the fundamentals of text classification, paving the way for you to develop your
own models.
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)
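These two lines convert the raw training and test documents into numeric feature vectors. For context, here is a self-contained sketch of the end-to-end flow they belong to; the mini-dataset, the TF-IDF vectorizer, and the logistic regression model below are illustrative choices rather than the book's exact example:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative mini-dataset: 1 = positive review, 0 = negative review
texts = [
    "I love this phone, the camera is amazing",
    "Great battery life and a beautiful screen",
    "Terrible build quality, it broke in a week",
    "The worst purchase I have ever made",
]
labels = [1, 1, 0, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=42, stratify=labels
)

# Vectorize the text, train a classifier, and evaluate it
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)

model = LogisticRegression()
model.fit(X_train, y_train)
print(model.score(X_test, y_test))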
Supervised NLP
Supervised NLP methods involve training the software using a dataset where both
inputs and corresponding outputs are labeled or known. Initially, the program processes
extensive sets of established data, learning to generate correct outputs for any unfamiliar
input. For instance, companies utilize supervised NLP to train tools for categorizing
documents according to predefined labels.
Unsupervised NLP
Unsupervised NLP relies on statistical language models to anticipate patterns when
provided with input that lacks labeling. An example of this is the autocomplete feature
in text messaging, which suggests appropriate words for a sentence by observing the
user’s input.
NLP Challenges
Despite its progress, natural language processing (NLP) grapples with several formidable
challenges arising from the intricate and nuanced nature of human language, including
ambiguity, sarcasm, idioms, and the many different ways the same idea can be phrased.
Unlike traditional NLP techniques, LLMs do not require explicit programming or feature engineering for each
specific task. Instead, they can learn intricate relationships and patterns directly from the
data, making them exceptionally versatile.
However, this versatility comes at a cost. LLMs are computationally intensive,
requiring substantial resources for training and deployment. Additionally, they are
data-hungry, relying on vast amounts of text data to achieve optimal performance. These
factors pose challenges in terms of computational infrastructure and data availability,
which must be carefully considered when employing LLMs in real-world applications.
Understanding the distinction between traditional NLP techniques and LLMs is vital
for selecting the most appropriate approach for a given task. Traditional NLP techniques
may be more suitable when the task is well defined and the required data is limited. On
the other hand, LLMs offer unparalleled versatility and the potential to tackle a broader
range of tasks, albeit with higher computational demands and data requirements.
Table 1-2 summarizes the key differences between conventional NLP techniques
and LLMs.
Despite these challenges, LLMs represent a major breakthrough in the field of NLP
and are rapidly advancing the state of the art in many applications. As these models
continue to develop and improve, it is expected to see them play an increasingly
important role in our lives.
While the bedrock of NLP provides the core techniques and foundational knowledge
necessary for tackling language processing tasks, LLMs herald a groundbreaking leap
forward. LLMs offer an unrivaled breadth of versatility and exceptional performance
across a diverse spectrum of tasks. However, it is important to acknowledge that LLMs
come with their own set of unique requirements and challenges.
For those aspiring to embark on a journey into the captivating realm of language
processing and artificial intelligence, a comprehensive understanding of both NLP
principles and the capabilities of LLMs is of paramount importance. By mastering the
fundamentals of NLP, one gains insights into the essential techniques, algorithms,
and mathematical underpinnings that facilitate the processing, understanding, and
generation of human language.
On the other hand, exploring the world of LLMs unveils their remarkable ability
to learn from vast text corpora, enabling them to perform a multitude of language-
related tasks, ranging from text generation and translation to question answering and
summarization. LLMs have demonstrated impressive results in various domains, such
as healthcare, finance, and customer service, showcasing their potential to revolutionize
industries.
However, it is crucial to recognize that LLMs are not without limitations. They may
exhibit biases inherited from the training data, struggle with common-sense reasoning,
and occasionally generate inaccurate or nonsensical responses. Understanding these
challenges is essential for responsible and effective deployment of LLMs in real-world
applications.
By cultivating a deep understanding of both NLP fundamentals and LLM
capabilities, aspiring practitioners and researchers will be well equipped to navigate the
complexities of language processing, contribute to the ongoing advancements in the
field, and unlock the full potential of LLMs in driving innovation and solving real-world
problems.
Summary
This chapter provides a comprehensive overview of the development and impact of
large language models (LLMs) within the field of artificial intelligence. The chapter
traces the progression from early statistical and rule-based models to contemporary
transformer-based models with billions of parameters. It discusses the emergence and
achievements of notable LLMs such as OpenAI’s ChatGPT, Google Bard, and Meta’s
LLaMA, highlighting their contributions to natural language processing (NLP).
The chapter concludes by discussing the era of multimodal learning, where LLMs are
integrated with diverse data types such as text, images, and audio.
In the following chapter, we will explore in more detail what large language models are, how they are trained, and where they are applied.
CHAPTER 2
What Are Large Language Models?
1 W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong, et al., "A survey of large language models," arXiv preprint arXiv:2303.18223, 2023
The initial milestone in LM was the advent of Statistical Language Models (SLMs),
including n-gram models.2 These models gauge the probability of the next word in
a sequence based on the frequency of previous n-grams of words.3 For instance, a
bigram model leverages the frequency of word pairs to estimate the probability of the
succeeding word.
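To make this concrete, a toy bigram model can be written in a few lines of plain Python; the miniature corpus below is invented purely for illustration.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# Count how often each word follows each other word
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def next_word_probability(prev, nxt):
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][nxt] / total if total else 0.0

print(next_word_probability("the", "cat"))   # 2 of the 3 words following "the" are "cat" -> about 0.67
print(next_word_probability("cat", "sat"))   # 0.5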
The second phase of LM development introduced Neural Language Models
(NLMs), also known as neural language modeling. This approach employs neural
networks to predict the probability distribution of the next word given the preceding
words in the sequence. Recurrent neural networks (RNNs) and their variations like Long
Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs) are commonly utilized
in this paradigm.4
The third stage of LM evolution encompasses the emergence of contextualized word
embeddings, termed Pre-trained Language Models (PLMs). These models employ
neural networks to acquire a vector representation of words that considers the context in
which the word appears. Examples of contextualized word embeddings include ELMo5
and BERT.
The fourth phase of language model (LM) advancement marked the inception of
extensive pre-training language models known as Large Language Models (LLMs).6
These models, exemplified by GPT-3 and GPT-4, possess the capability to excel in
various natural language processing (NLP) tasks. They undergo training on vast volumes
of text data and can be fine-tuned for specific tasks like language translation or question
answering.
To sum up, these four developmental stages in LM (visualized in Figure 2-1)
signify substantial progress in the field, with each stage building upon its predecessor
and pushing the boundaries of what machines can accomplish in both NLP and
computer vision.
2 J. Gao and C.-Y. Lin, "Introduction to the special issue on statistical language modeling," 2004
3 A. Stolcke, "SRILM - an extensible language modeling toolkit," in Seventh International Conference on Spoken Language Processing, 2002
4 S. Kombrink, T. Mikolov, M. Karafiát, L. Burget, "Recurrent neural network based language modeling in meeting recognition," in Interspeech, vol. 11, pp. 2877–2880, 2011
5 M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, "Deep contextualized word representations," arXiv preprint arXiv:1802.05365, 2018
6 M. Shanahan, "Talking about large language models," arXiv preprint arXiv:2212.03551, 2022
Large language models are trained on massive amounts of text data from diverse sources, such as books, articles, websites, and various other forms of written content. Through this training, they analyze statistical relationships among words, phrases, and sentences, enabling them to generate coherent and contextually relevant responses to prompts or queries.
For example, GPT-3, the large language model behind ChatGPT, was trained on vast amounts of Internet text, endowing it with the ability to comprehend multiple languages and with knowledge across diverse topics. Its proficiency in producing text in various styles, and in tasks such as translation, text summarization, and question answering, may appear remarkable. However, these capabilities operate through learned "grammars" aligned with prompts, which explains their impressive performance.
During the inference stage, when interacting with an LLM, a user inputs a prompt or query, and the model generates a response based on learned knowledge and context. This response is produced using probabilistic methods that consider the likelihood of various words or phrases given the input context.
Tokenization
In the training of LLMs to predict text, a fundamental preprocessing step is tokenization, a common practice in natural language processing systems. Tokenization breaks the text down into indivisible units known as tokens; depending on the model's size and type, these tokens may be characters, subwords, symbols, or whole words.
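As a quick illustration, the snippet below uses the GPT-2 tokenizer from the Hugging Face transformers library purely as one convenient example; any subword tokenizer would behave similarly (pip install transformers is assumed).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Tokenization breaks text into subword units."
print(tokenizer.tokenize(text))   # subword pieces, e.g., ['Token', 'ization', ...]
print(tokenizer.encode(text))     # the integer IDs the model actually consumes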
Attention
The concept of attention, particularly selective attention, has been extensively examined
within the realms of perception, psychophysics, and psychology. Selective attention
can be understood as “the programming by the O of which stimuli will be processed or
encoded and in what order this will occur.”
While this definition originates from visual perception, it bears striking resemblances
to the recently formulated attention mechanisms7 (determining which stimuli will be
processed) and positional encoding (deciding the order of processing) in LLMs.
7 S. Biderman, H. Schoelkopf, Q. Anthony, H. Bradley, K. O'Brien, E. Hallahan, M. A. Khan, S. Purohit, U. S. Prashanth, E. Raff, et al., "Pythia: A suite for analyzing large language models across training and scaling," arXiv preprint arXiv:2304.01373, 2023
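To make the mechanism concrete, here is a minimal NumPy sketch of scaled dot-product self-attention; the toy dimensions and random inputs are illustrative only.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # similarity of every query with every key
    weights = softmax(scores, axis=-1)        # how strongly each token attends to each other token
    return weights @ V                        # weighted mixture of the value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                   # four "token embeddings" of dimension 8
out = scaled_dot_product_attention(x, x, x)   # self-attention: Q = K = V = x
print(out.shape)                              # (4, 8)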
Positional Encoding
The attention modules, as designed, do not inherently account for the order of
processing. To address this, transformers introduced “positional encodings” to
incorporate information about the positions of tokens in input sequences. Various
positional encoding variants have been proposed. Intriguingly, a recent study8 suggests
that incorporating this information may not significantly impact state-of-the-art
decoder-only transformers.
• Alibi: This approach subtracts a scalar bias from the attention score computed between two tokens; the bias grows with the distance between the tokens' positions, which effectively biases attention toward recent tokens.
8 M. Irfan, A. I. Sanka, Z. Ullah, R. C. Cheung, "Reconfigurable content-addressable memory (CAM) on FPGAs: A tutorial and survey," Future Generation Computer Systems, vol. 128, pp. 451–465, 2022
• RoPE: In LLMs, keys, queries, and values are all vectors. RoPE
involves rotating the query and key representations at an angle
proportional to their absolute positions in the input sequence.
This rotation results in a relative positional encoding scheme that
diminishes with the distance between the tokens.
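As a rough illustration of the rotation idea (the base of 10,000 follows common practice, and the vector size is arbitrary), a NumPy sketch of RoPE applied to a single query or key vector might look like this.
import numpy as np

def rope(x, position, base=10000.0):
    # Rotate consecutive feature pairs of x by an angle proportional to the token position
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # one frequency per feature pair
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    rotated = np.empty_like(x)
    rotated[0::2] = x1 * cos - x2 * sin
    rotated[1::2] = x1 * sin + x2 * cos
    return rotated

q = np.ones(8)
print(rope(q, position=0))   # position 0: the vector is unchanged
print(rope(q, position=3))   # later positions: the same vector, rotated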
Activation Functions
Activation functions play a crucial role in enhancing the curve-fitting capabilities of
neural networks. The contemporary activation functions employed in LLMs differ from
earlier squashing functions but are integral to the success of LLMs.
ReLU
The Rectified Linear Unit (ReLU) is defined as ReLU(x) = max(0, x) (1).
GELU
The Gaussian Error Linear Unit (GELU) combines ReLU, dropout, and zoneout. It stands
out as the most widely utilized activation function in current LLM literature.
GLU Variants
The Gated Linear Unit is a neural network layer involving an element-wise product (⊗)
of a linear transformation and a sigmoid-transformed (σ) linear projection of the input,
given by the equation:
GLU(x,W,V,b,c)=(xW+b)⊗σ(xV+c) (2)
Here, x represents the input of the layer, and W, b, V, and c are learned parameters.
GLU was modified to assess the impact of various variations in the training and
testing of transformers, leading to improved empirical results. The following are different
GLU variations introduced9 and utilized in LLMs:
9 C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, "Exploring the limits of transfer learning with a unified text-to-text transformer," The Journal of Machine Learning Research, vol. 21, no. 1, pp. 5485–5551, 2020
68
Chapter 2 What Are Large Language Models?
• ReGLU(x,W,V,b,c)=max(0,xW+b)⊗(xV+c)
• GEGLU(x,W,V,b,c)=GELU(xW+b)⊗(xV+c)
• SwiGLU(x,W,V,b,c,β)=Swishβ(xW+b)⊗(xV+c)
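The following NumPy sketch implements these activations directly from the formulas above; the tanh-based GELU approximation and the small random inputs are illustrative choices.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def gelu(x):
    # Widely used tanh approximation of the Gaussian Error Linear Unit
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def swish(x, beta=1.0):
    return x / (1.0 + np.exp(-beta * x))        # x * sigmoid(beta * x)

def swiglu(x, W, V, b, c, beta=1.0):
    # SwiGLU: Swish_beta(xW + b) elementwise-multiplied by (xV + c)
    return swish(x @ W + b, beta) * (x @ V + c)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))
W, V = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
b, c = np.zeros(3), np.zeros(3)
print(relu(x).shape, gelu(x).shape, swiglu(x, W, V, b, c).shape)   # (2, 4) (2, 4) (2, 3)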
Layer Normalization
Layer normalization facilitates quicker convergence and is a prevalent component in
transformers. In this section, we explore various normalization techniques extensively
employed in LLM literature.
LayerNorm
Layer normalization calculates statistics across all the hidden units within a layer (l)
using the following formula:
u^l = \frac{1}{n}\sum_{i=1}^{n} a_i^l, \qquad \sigma^l = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(a_i^l - u^l\right)^2}
In this equation, n represents the number of neurons in layer l and a_i^l is the aggregated input of the i-th neuron in layer l. LayerNorm offers invariance to both weight rescaling and distribution re-centering.
RMSNorm
RMSNorm challenges the presumed invariance properties of LayerNorm. It suggests
that comparable performance benefits to those of LayerNorm can be achieved through
a computationally efficient normalization technique that compromises re-centering
invariance for speed. The normalized summed input to layer l in LayerNorm is expressed as follows:
\bar{a}_i^l = \frac{a_i^l - u^l}{\sigma^l}\, g_i^l
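A compact NumPy sketch of both normalizations, written directly from the formulas above (the gain g is simply initialized to ones, and eps is the usual small constant added for numerical stability):
import numpy as np

def layer_norm(a, g, eps=1e-5):
    # Normalize with the mean and standard deviation computed across the layer's units
    mu = a.mean(axis=-1, keepdims=True)
    sigma = np.sqrt(((a - mu) ** 2).mean(axis=-1, keepdims=True) + eps)
    return (a - mu) / sigma * g

def rms_norm(a, g, eps=1e-5):
    # RMSNorm skips the re-centering step and rescales by the root mean square only
    rms = np.sqrt((a ** 2).mean(axis=-1, keepdims=True) + eps)
    return a / rms * g

a = np.array([[1.0, 2.0, 3.0, 4.0]])
g = np.ones(4)
print(layer_norm(a, g))
print(rms_norm(a, g))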
DeepNorm
Despite the advantages of pre-LN over post-LN training, pre-LN training can inadvertently affect the gradients,10 making them larger in the earlier layers than in the later ones. DeepNorm is introduced as a solution to mitigate these adverse effects on gradients. Its formulation is expressed as
x^{l_f} = \mathrm{LN}\left(\alpha \cdot x^{l_p} + G^{l_p}\left(x^{l_p}, \theta^{l_p}\right)\right)
In this expression, α denotes a constant, and θ_lp represents the parameters of layer
lp. These parameters undergo scaling by another constant β. Both of these constants are
architecture dependent.
10 J. Hoffmann, S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, D. d. L. Casas, L. A. Hendricks, J. Welbl, A. Clark, et al., "Training compute-optimal large language models," arXiv preprint arXiv:2203.15556, 2022
Data Preprocessing
Data preprocessing is a crucial step in machine and deep learning that involves cleaning,
transforming, and organizing raw data into a usable format. This process enhances data
quality, ensures consistency, and prepares the dataset for effective analysis and model
training, ultimately leading to more accurate and reliable results.
Several preprocessing techniques, most notably quality filtering, deduplication, and privacy scrubbing, are applied to the data used to train large language models (LLMs).
Architectures
In this section, different variants of the transformer architecture are explored at a high level; they stem from variations in how attention is applied and in how the transformer blocks are interconnected. Figure 2-2 illustrates the attention patterns of these architectures.
Encoder-Decoder
Originally designed for sequence transduction models, transformers adopted the
encoder-decoder architecture, particularly for tasks like machine translation. This
architectural design involves an encoder that encodes input sequences into variable-
length context vectors. These vectors are then passed to the decoder, aiming to minimize
the discrepancy between predicted token labels and the actual target token labels.
Causal Decoder
The primary goal of an LLM is to predict the next token based on the input sequence.
While the encoder provides additional context, it has been observed that LLMs can
perform well even without an encoder, relying solely on the decoder. Similar to the
original encoder-decoder architecture’s decoder block, this decoder limits information
flow backward. In other words, the predicted token tk depends only on the tokens
preceding it, up to tk−1. This variant is widely utilized in state-of-the-art LLMs.
Prefix Decoder
Causal masked attention is justifiable in encoder-decoder architectures, where the
encoder can attend to all tokens in the sentence from any position using self-attention.
However, when the encoder is removed, and only the decoder is retained, this flexibility
in attention is lost. A variation in decoder-only architectures involves changing the mask
from strictly causal to fully visible on a portion of the input sequence. The Prefix decoder
is also referred to as the non-causal decoder architecture.
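In practice, the difference between these decoder variants comes down to the attention mask. A small NumPy sketch (the sequence length and prefix length are arbitrary) makes the two patterns visible: a 1 at row i, column j means token i may attend to token j.
import numpy as np

def causal_mask(seq_len):
    # Token i may attend only to tokens 0..i
    return np.tril(np.ones((seq_len, seq_len), dtype=int))

def prefix_mask(seq_len, prefix_len):
    # Prefix tokens are fully visible to everyone; the rest of the sequence stays causal
    mask = causal_mask(seq_len)
    mask[:, :prefix_len] = 1
    return mask

print(causal_mask(5))
print(prefix_mask(5, prefix_len=2))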
Pre-training Objectives
The pre-training objectives of large language models (LLMs) are typically variants of language modeling, such as predicting the next token in a sequence (full language modeling) or reconstructing masked or corrupted spans of text (denoising objectives).
Model Adaptation
This section provides an overview of the fundamental stages of large language models
(LLMs) adaptation, spanning from pre-training to fine-tuning for downstream tasks
and practical application. The term “alignment-tuning” is used to denote aligning with
human preferences, while the literature may occasionally employ the term “alignment”
for different purposes.
Pre-training
In the initial stage, the model undergoes self-supervised training on a large corpus,
predicting the next tokens based on the input. The design choices for LLMs encompass a
range of architectures, including encoder-decoder and decoder-only, employing diverse
building blocks and loss functions.
Fine-Tuning
Fine-tuning LLMs can be achieved through various approaches, including supervised fine-tuning on task-specific or instruction-formatted data, alignment tuning guided by human feedback, and parameter-efficient methods that update only a small fraction of the model's weights.
11 "An era of ChatGPT as a significant futuristic support tool: A study on features, abilities, and challenges"
Prompting/Utilization
Prompting is a method for querying trained LLMs to generate responses. LLMs can be prompted in various setups, adapting to instructions either without fine-tuning or with fine-tuning on data containing different prompt styles; common setups include zero-shot prompting, few-shot (in-context) learning, and chain-of-thought prompting.
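As a simplified illustration of zero-shot versus few-shot prompting, the snippet below only constructs the prompt text; the example reviews and labels are invented, and actually sending the prompt to a model (through whichever API or library you use) is left out.
def zero_shot_prompt(review):
    return ("Classify the sentiment of this review as positive or negative.\n"
            f"Review: {review}\nSentiment:")

def few_shot_prompt(review):
    examples = [
        ("The battery lasts for days, love it.", "positive"),
        ("Broke after a week, complete waste of money.", "negative"),
    ]
    shots = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in examples)
    return f"{shots}\nReview: {review}\nSentiment:"

review = "The screen is gorgeous but the speakers are tinny."
print(zero_shot_prompt(review))
print(few_shot_prompt(review))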
12 Adrian Tam, "What Are Zero-Shot Prompting and Few-Shot Prompting," https://machinelearningmastery.com/what-are-zero-shot-prompting-and-few-shot-prompting/
13 Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, Zhifang Sui, "A Survey on In-context Learning," https://arxiv.org/abs/2301.00234
14 Jie Huang and Kevin Chen-Chuan Chang, "Towards Reasoning in Large Language Models: A Survey," https://arxiv.org/abs/2212.10403
15 Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou, "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," https://arxiv.org/abs/2201.11903
Training of LLMs
Developing large language models encompasses several essential steps crucial for their
successful training. The process typically initiates with the gathering and preprocessing
of an extensive volume of text data from diverse sources, including books, articles,
websites, and various textual corpora. The meticulously curated dataset forms the
cornerstone for the training of large language models (LLMs). The training procedure
itself revolves around unsupervised learning, wherein the model acquires the ability
to predict the subsequent word in a sequence based on the preceding context. This
particular task is commonly known as language modeling.
LLMs employ advanced neural network architectures, such as transformers,
enabling them to capture intricate patterns and dependencies within language. The
primary training objective is to optimize the model’s parameters, aiming to maximize the
likelihood of generating the correct next word within a given context. This optimization
is typically achieved through an algorithm known as stochastic gradient descent (SGD)
or its variants, coupled with backpropagation, which iteratively computes gradients to
update the model’s parameters.
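To ground these ideas, the deliberately tiny PyTorch sketch below trains a toy next-token predictor, with an embedding plus a linear layer standing in for a real transformer, using cross-entropy loss, backpropagation, and SGD; the vocabulary, synthetic data, and hyperparameters are all illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embed_dim, seq_len = 20, 16, 12

# Synthetic "text": each next token is the previous token plus one (mod vocab_size),
# so the next-token rule is actually learnable
starts = torch.randint(0, vocab_size, (8, 1))
data = (starts + torch.arange(seq_len)) % vocab_size
inputs, targets = data[:, :-1], data[:, 1:]      # predict token t+1 from token t

model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),         # token IDs -> vectors
    nn.Linear(embed_dim, vocab_size),            # vectors -> next-token logits
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = nn.CrossEntropyLoss()

for step in range(300):
    logits = model(inputs)                       # shape: (batch, sequence, vocab)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                              # backpropagation computes the gradients
    optimizer.step()                             # SGD updates the parameters

print(f"final loss: {loss.item():.3f}")          # drops well below the initial value of about ln(20) ≈ 3.0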
16 Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, Ajmal Mian, "A Comprehensive Overview of Large Language Models"
17 Yuchong Sun, Che Liu, Kun Zhou, Jinwen Huang, Ruihua Song, Wayne Xin Zhao, Fuzheng Zhang, Di Zhang, Kun Gai, "Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models"
General Purpose
LLMs stand out as versatile tools capable of handling a diverse array of tasks, even those
beyond their specific training. Their proficiency lies in their capacity to comprehend,
produce, and modify text in ways that are contextually appropriate and human-like.
This versatility enables them to undertake a range of functions, from straightforward
language translation and answering questions to more intricate activities like text
summarization, creative writing, and programming assistance. LLMs’ adaptability
extends to mimicking the style and tone of the text they are working with, resulting in
outputs that are both user-centric and context-sensitive.
In everyday scenarios, LLMs find utility as virtual personal assistants, aiding in tasks
such as composing emails or organizing meetings. They are also increasingly employed
in customer service, handling routine inquiries and thereby allowing human staff to
focus on more complex matters. Additionally, LLMs are being used in content creation
for digital platforms, generating human-like text based on specified prompts. Another
significant role of LLMs is in data analysis, where they can sift through and summarize
large volumes of text data, identifying patterns and key points more rapidly than human
analysts.
Despite their broad utility, it’s crucial to acknowledge that LLMs, like any AI
technology, are limited by the quality of their training data. As such, they need to be
utilized cautiously, given that they may inadvertently reflect biases present in their
training data, leading to potentially skewed or inaccurate outcomes.
Medical Applications
In healthcare, LLMs such as ChatGPT have shown remarkable promise across various
medical applications. They have been effectively used in medical education, radiology
decision-making, clinical genetics, and patient care, as evidenced by numerous studies.
In medical education, for instance, ChatGPT has become a valuable interactive learning
and problem-solving tool. Its performance on the United States Medical Licensing Exam
(USMLE) has been notably proficient, achieving scores that meet or surpass the passing
criteria, showcasing its deep understanding of medical knowledge without needing
specific training.
Studies point toward the future development of AI-driven clinical decision-making
tools. ChatGPT, for instance, has shown potential in radiological decision-making,
enhancing clinical workflows and the responsible use of radiology services. Research
by Kung and others has indicated that LLMs like ChatGPT could significantly improve
personalized, compassionate, and scalable healthcare delivery, aiding both in education
and clinical decision-making.
A study on clinical genetics found ChatGPT’s performance comparable to human
responses in answering genetics-related questions, excelling in memory-based queries
but less so in critical thinking tasks. This research also noted ChatGPT’s ability to provide
varied, plausible explanations for both correct and incorrect responses. Additionally,
research evaluating ChatGPT’s accuracy in life support and resuscitation questions
applications where LLMs provide more natural, human-like interactions. For instance,
they can handle complex customer queries, offer personalized recommendations, and
automate routine tasks, significantly improving efficiency and user experience.
Education
Recent discussions have highlighted the transformative role of artificial intelligence (AI)
in education, particularly concerning student assignments and examinations. Since
OpenAI introduced ChatGPT, there has been a noticeable shift in how students engage
with educational content, assignments, and coursework.
One significant benefit of incorporating ChatGPT and AI bots in educational settings
is the enhancement of assignment completion efficiency. ChatGPT, for instance, is adept
at producing high-quality answers across diverse prompts, thereby saving students
considerable time and effort in their academic work. Furthermore, AI bots have the
potential to streamline the grading process, lightening the load on educators while
affording them the opportunity to offer more comprehensive feedback to students.
Another key advantage of these AI tools is their ability to deliver tailored learning
experiences. AI bots can assess a student’s performance on past assignments and tests,
using this data to tailor future learning recommendations. This approach helps students
identify their academic strengths and weaknesses, allowing them to concentrate on
areas needing improvement.
Khan Academy, a renowned educational non-profit, has expressed interest in
leveraging ChatGPT through its AI chatbot, Khanmigo. This virtual tutor and classroom
assistant aims to enrich tutoring and coaching by facilitating direct interactions with
students. Such initiatives reflect the optimism surrounding AI’s potential in education,
challenging the misconception that its primary use is for cheating. AI technology, though
still evolving, is seen as promising for catering to diverse student needs.
Nevertheless, the use of ChatGPT and AI bots in education is not without potential
downsides. A primary concern is the risk of diminishing creativity and critical thinking
skills among students. Overreliance on AI for assignments and exams might hinder the
development of essential skills needed for independent problem-solving and critical
analysis.
Finance
Large language models (LLMs) are increasingly influential in the finance sector, offering
a range of applications from financial natural language processing (NLP) tasks to risk
analysis, algorithmic trading, market forecasting, and financial reporting. Models like
BloombergGPT, a 50-billion-parameter LLM trained on a vast and diverse financial
dataset, have significantly improved tasks like news categorization, entity identification,
and query answering in financial NLP. Leveraging the extensive financial data at its
disposal, this model significantly enhances customer service by effectively addressing
customer inquiries and providing top-tier financial advice.
Creative Arts
In the realm of creative arts, LLMs are tools for inspiration and creativity. They assist
artists, writers, and musicians in generating ideas, lyrics, scripts, and even entire
compositions. This collaborative process between AI and human creativity is spawning
new forms of art and entertainment.
In the legal domain, LLMs are being adapted to legal terminology, with a focus on enhancing factual accuracy and relevance. This is achieved by integrating sentences from relevant case law into the LLM, enabling it to produce more accurate and high-quality explanations with fewer factual errors.
Additionally, LLMs have been developed and trained with specific legal knowledge,
enabling them to engage in legal reasoning tasks and respond to legal queries effectively.
Accessibility Enhancements
LLMs significantly contribute to making technology more accessible. They can generate
real-time captions and descriptions for audio and video content, aiding those with
hearing or visual impairments. They also improve voice recognition software, making
technology more accessible for individuals with different speech patterns or accents.
Chatbots
Chatbots are increasingly common in customer service roles, adept at handling
inquiries, providing support, and resolving issues. Their applications extend to
entertainment, healthcare, and education sectors. These chatbots are often integrated
with LLMs to craft more advanced and interactive conversational experiences. For
instance, a chatbot might employ an LLM like ChatGPT to enhance the quality of its
textual responses. Well-known examples of such chatbots include ChatGPT, Google
Bard, and Microsoft Bing. An illustration of an educational interaction with ChatGPT is
provided as an example.
In terms of comparison, ChatGPT and Google Bard represent two of the leading
LLMs currently in use. Both are skilled in generating text, translating languages, creating
various forms of creative content, and providing informative responses to queries.
Despite their similarities, notable differences exist between these models. For instance,
ChatGPT is recognized for its creative capabilities, whereas Google Bard is known for
its authenticity. A detailed comparison of ChatGPT, Google Bard, and Microsoft Bing
Chatbots is presented, highlighting the unique features and capabilities of each.
LLM Agents
LLM agents are advanced AI systems built using large-scale language models like
OpenAI’s GPT-4. Think of them as highly intelligent virtual assistants that can
understand and generate human language, making them incredibly versatile for
various tasks.
LLM agents can grasp the meaning behind complex questions and instructions,
making them great at comprehending human language. They can write like humans,
drafting emails, creating reports, and even writing stories or articles.
These agents can chat with users, providing relevant responses and engaging in meaningful dialogues, which makes them a natural fit for customer service and virtual assistant roles. They can pull information from vast databases and even use real-time data to give up-to-date answers.
LLM agents are also scalable: they can handle a large number of interactions at the same time, making them ideal for businesses with high customer engagement.
They are able to take care of repetitive tasks, freeing up human workers for more
complex duties. By automating routine tasks, they can help save on operational costs,
which makes them cost-effective.
LLM Limitations
While large language models (LLMs) have contributed significantly to natural language
processing, they are accompanied by a range of shortcomings. This section outlines
various such limitations. These include biases in the data used for training, an
overdependence on superficial patterns, a lack of robust common sense, challenges in
reasoning and processing feedback, a need for extensive data and computing power, and
issues with generalizing their knowledge.
Further, LLMs struggle with interpretability, handling rare or unknown words,
grasping syntax and grammar intricacies, possessing domain-specific expertise, and
vulnerability to targeted misinformation attempts. Ethical concerns are also prominent,
along with difficulties in contextual language processing, emotion and sentiment
analysis, multilingual support, and memory constraints. Their creativity is limited, as is their capacity for real-time processing, and training and maintaining them can be costly. Scalability, causal understanding, multimodal input processing, attention span, transfer learning, world knowledge beyond text, and the comprehension of human behavior and psychology are additional limitations.
Bias
Bias in language models arises when their training data reflects existing prejudices.
As highlighted by Schramowski18 and colleagues, these models, although aimed at
emulating natural language, can inadvertently propagate biases, leading to unfair or
skewed outputs. This can spark critique in various sectors such as politics, law, and
society. The forms of bias include the following:
• User Interaction Bias: The input from users shapes the responses of
chatbots. If the input is consistently biased or prejudiced, the model
may learn and repeat these biases. For example, a model exposed to
frequent discriminatory queries against a certain group might start to
reflect these biases in its responses.
18 P. Schramowski, C. Turan, N. Andersen, C. A. Rothkopf, and K. Kersting, "Large pre-trained language models contain human-like biases of what is right and wrong to do," Nature Machine Intelligence, vol. 4, no. 3, pp. 258–268, 2022
Understanding and addressing these biases is crucial for ensuring fairness and
accuracy in language model outputs.
Hallucinations
LLMs occasionally produce content that deviates from factual accuracy, a phenomenon
known as “information hallucination.”19 This typically occurs when the model attempts
to bridge gaps in its knowledge or context, drawing upon learned patterns rather than
actual data. Such instances can result in false or misleading information, which is
especially concerning in critical applications.
The underlying reasons for these hallucinations are the subject of ongoing
investigation. Current research indicates that the issue may stem from various aspects,
including the training process, the dataset used, and the model’s structure. LLMs
may have a tendency to generate more engaging or coherent content, inadvertently
increasing the likelihood of hallucinations.
Efforts to curb hallucinations have led to several strategies. One such method
involves adjusting the training regimen to discourage hallucinations, as seen in
techniques like “reality grounding.” Expanding the training dataset to be more
varied and extensive could also diminish the model’s tendency to make unfounded
assumptions.
Another avenue being explored is training models with data that is verifiable or
capable of being fact-checked. This approach aims to make the model more reliant on
factual information over assumptions. However, implementing this strategy requires a
meticulous selection of data and metrics.
19 T. McCoy, E. Pavlick, and T. Linzen, "Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Florence, Italy), pp. 3428–3448, Association for Computational Linguistics, July 2019
LLMs offer industries worldwide the ability to automate intellectual tasks, provide
round-the-clock customer engagement, and extract vital insights from significant data
and analytics. They support businesses in predicting market trends while minimizing the
time, resources, and expenses involved.
These models are tailored with industry-specific data, enabling them to deliver
precise and reliable outcomes in fields that demand intricate domain knowledge. LLMs,
when configured appropriately, can process vast text datasets and complex structures.
This capability provides organizations with more dependable insights, freeing up
resources to concentrate on other areas and amplify business growth opportunities.
• Impact on Art and Creativity: Beyond text, some LLMs are venturing
into artistic domains, aiding in music composition, visual arts, and
creative writing, thereby influencing the artistic landscape.
AI and large language models (LLMs) have revolutionized the way individuals and
organizations interact with digital technology. These advancements propel innovation
and automate processes across various industries, saving time and simultaneously
altering the decision-making processes and customer communication strategies of
professionals. They have reshaped specific industry domains, bolstering industrial
advancement and the potential for innovation. As research and development continue
to progress, AI-driven models are on track to emulate human speech and interaction
characteristics.
• Tailoring content for voice search queries, in line with the growing
trend of voice-based searches.
Content Moderation
In the realm of automated content regulation, large language models (LLMs) are
integral, adeptly spotting and sifting through user-generated material on various digital
platforms. Their capabilities include recognizing problematic elements such as offensive
language, hate speech, threats, misinformation, spam, and other forms of harmful or
inappropriate content, thereby fostering a safer digital environment. These models can
either mark such content for further inspection or autonomously eliminate it based on
established moderation policies.
It’s crucial to acknowledge that despite the substantial aid LLMs provide in content
regulation, the necessity for human supervision remains, especially for complex or
nuanced instances.
Emotion/Sentiment Analysis
LLMs are proficient in evaluating customer emotions through social media posts,
reviews, and feedback. This process enables businesses to understand customer
viewpoints and levels of satisfaction, an invaluable asset in monitoring social media and
managing brand image.
These models classify emotional tone into various categories like positive, negative,
or neutral and offer detailed emotion analysis by identifying varying intensities and
subtle expression differences. This leads to a deeper comprehension of the emotions
conveyed in texts.
Additionally, they are evolving in understanding context, capable of recognizing
sarcasm, irony, and other complex language forms. While there has been notable
progress, the area is still developing, with ongoing enhancements expected in the future.
Client Services
LLMs significantly contribute to customer service by improving and automating different
facets of client interactions.
These include the following:
Language Translation
LLMs have transformed language translation, offering effective and accurate translation
capabilities. Their key feature is real-time translation of spoken or written material,
invaluable in live conversations, international events, or instantaneous customer
support.
Moreover, LLMs can specialize in specific domains or industries to heighten
translation precision, including sector-specific terminology. Utilizing LLMs, businesses
can communicate effectively with international clientele, bridge language gaps, and
venture into new markets.
Virtual Teamwork
Incorporating LLMs in workplaces significantly boosts staff productivity and
collaboration. They are essential in virtual teamwork and in streamlining routine operations.
Sales Enhancement
LLMs significantly aid sales teams by providing insights and support throughout the sales cycle.
Fraud Identification
LLMs excel in textual analysis, pattern recognition, and anomaly detection, making them
effective in risk assessment and fraud prevention. Their real-time monitoring of data
streams like financial transactions and client interactions allows quick identification of
irregular or suspicious patterns, triggering immediate alerts for investigation.
Moreover, they assign risk scores to various transactions or activities based on a
broad range of data, aiding in fraud likelihood assessment.
The future scope and capabilities of AI remain uncertain, yet its potential for
innovation and progress seems boundless. The swift expansion of AI in business and
industry sectors indicates that we are just beginning to uncover its full potential.
As AI functions evolve to be quicker and more efficient, sectors like healthcare,
education, and financial services are set to flourish even more, providing reliable and
trustworthy care and services to patients, students, and clients globally. LLMs, offering
key support in data analysis and analytics, will lead to cost reductions as professionals
redirect their focus and efforts. This era is marked by thrilling technological
advancements, with both users and developers exploring the future direction of business
and technology.
Summary
This chapter delves into the concept of large language models (LLMs), exploring
their development, capabilities, and underlying technologies. It also describes the
architecture of LLMs, which includes various neural network layers such as recurrent
layers, feedforward layers, embedding layers, and attention layers. These layers work
together to process input text and generate output predictions.
The development of LLMs is attributed to advances in deep learning, substantial
computational resources, and the abundance of training data. These models, typically
pre-trained on extensive web corpora, can grasp intricate patterns, linguistic subtleties,
and semantic relationships. Fine-tuning these models for specific tasks has led to state-
of-the-art performance across various benchmarks.
In the following chapter, we'll dive deeper into:
• Python functions
CHAPTER 3
Python at a Glance
Working with computers frequently leads to the realization that automating certain tasks
can be highly beneficial. For instance, one might want to automate the process of editing
multiple text files; other tasks could include creating a basic database, a unique GUI
application, building a simple game, or dealing with data science, machine learning, and
AI in your daily work.
Professional software developers often face the need to interact with various C/C++/
Java libraries, but the standard cycle of writing, compiling, testing, and re-compiling can
be inefficient. Writing test suites for such libraries and adding an extension language to a
program are other common tasks where automation and simplicity are desired. In these
scenarios, Python emerges as an ideal solution.
Python, on the other hand, is not only easier to use but is also available across
various operating systems including Windows, Mac OS X, and Unix. Its simplicity doesn’t
take away from its capabilities as a real programming language, offering more structure
and support for larger programs than what is possible with shell scripts or batch files.
Python’s modularity allows for the reuse of code in other programs, enhancing
efficiency. It comes with a vast array of standard modules that can serve as a foundation
for your programs or as examples to learn Python programming. These modules cover
a wide range of functionalities, including file I/O, system calls, sockets, and interfaces to
GUI toolkits like Tk.
Being an interpreted language, Python cuts down on development time, as
it doesn’t require compilation and linking. Its interactive nature facilitates easy
experimentation with language features, writing of temporary programs, or testing
functions in a bottom-up approach to program development. Python also serves as a
convenient desktop calculator.
Python’s ability to enable compact and readable programs is one of its key
strengths. Programs in Python are typically shorter than their equivalents in C, C++,
or Java. This brevity is due to several factors: high-level data types allow for complex
operations in single statements, indentation rather than brackets for statement grouping,
and no need for declaring variables or arguments.
Interestingly, Python’s name is derived from the BBC show “Monty Python’s Flying
Circus,” and it has no association with snakes. In fact, incorporating references to Monty
Python’s sketches in documentation is not just permitted but actively encouraged.
Zen of Python
• Beautiful is better than ugly.
• Readability counts.
• There should be one – and preferably only one – obvious way to do it.
• Although that way may not be obvious at first unless you’re Dutch.
1 "PEP 20 - The Zen of Python," Python Software Foundation, 2004-08-23
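You can print the complete set of these aphorisms from any Python prompt:
import this   # prints "The Zen of Python" by Tim Peters to the console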
Python Identifiers
In Python, an identifier refers to a name given to entities like variables, functions, classes,
modules, or other objects. It begins with a letter (A–Z or a–z) or an underscore (_),
followed by any combination of letters, underscores, and digits (0–9). Python identifiers
cannot include special characters such as @, $, or %.
Python is case-sensitive, meaning identifiers like “Hello” and “hello” are considered
distinct.
Python follows specific naming conventions for identifiers:
• Identifiers that start and end with two underscores are reserved for
special names defined by the language.
Table 3-1 lists 35 keywords or reserved words that are not available for use as
identifiers.
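You can also list the reserved words for the interpreter you are running with the keyword module; the count of 35 applies to recent Python 3.x releases and may vary slightly between versions.
import keyword

print(keyword.kwlist)        # ['False', 'None', 'True', 'and', 'as', ...]
print(len(keyword.kwlist))   # 35 on Python 3.11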
Python Indentation
Python employs whitespace for defining blocks of control flow, adhering to the off-side
rule. This characteristic of using indentation rather than punctuation or keywords to
signify a block’s extent is inherited from its forerunner, ABC.
In computer programming, a language is described as following the off-side
rule syntax if the structure of its blocks is determined by indentation. This term was
introduced by Peter Landin and is thought to be a play on words relating to the offside
rule in soccer. This concept differs from free-form languages, especially those using
curly brackets, where indentation doesn’t carry computational significance and is purely
a matter of coding style and format. Languages that apply the off-side rule are often
characterized by their significant indentation.
Contrastingly, in “free-format” languages that evolve from the ALGOL block
structure, code blocks are demarcated using braces ({ }) or specific keywords. Typically,
in these languages, coders indent the contents of a block as a conventional practice to
visually distinguish it from the adjacent code.
Example:
def greetings():
    print("hello world")

for i in range(10):
    print(i)
total = one + \
two + \
three
Statements enclosed within square brackets [], curly braces {}, or parentheses () do
not require the line continuation character. For instance, the following statement is valid
in Python:
Example:
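A list literal, for instance, can simply continue onto the next line (the list contents here are arbitrary):
days = ['Monday', 'Tuesday', 'Wednesday',
        'Thursday', 'Friday']   # no continuation character needed inside the brackets
print(days)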
Quotations in Python
In Python, string literals can be represented using single ('), double ("), or triple (''' or """) quotes, provided that the same type of quote is used to both start and end the string.
Triple quotes, on the other hand, are employed to encompass strings that span
multiple lines. For instance, the following examples are all valid in Python:
Example:
name = 'Alice'
print(f"My name is {name}.")
print("Shakespeare once said, 'To be or not to be, that is the question.'")
story = '''Once upon a time,
there was a brave knight
who embarked on a grand adventure.'''
print(story)
Output:
My name is Alice.
Shakespeare once said, 'To be or not to be, that is the question.'
Once upon a time,
there was a brave knight
who embarked on a grand adventure.
Comments in Python
In Python, comments serve as programmer-readable explanations or annotations within
the source code. They are included to enhance the code’s comprehensibility for human
readers and are disregarded by the Python interpreter.
Python, like many contemporary programming languages, supports both single-
line (end-of-line) and multiline (block) comments. The structure of Python comments
closely resembles those found in languages such as PHP, BASH, and Perl.
To create a comment in Python, you use the hash sign (#). Any hash sign that is
not within a string literal marks the beginning of a comment. Everything following
the # symbol up to the end of the physical line is considered part of the comment, and
Python’s interpreter disregards it.
# First comment
print ("Hello, World!") # Second comment
You can type a comment on the same line after a statement or expression, as shown above. To comment out several consecutive lines, start each line with a hash sign:
# Comment 1
# Comment 2
# Comment 3
The Python interpreter will also ignore the triple-quoted string that follows,
allowing it to serve as a multiline comment:
'''
This is my first multiline
comment.
'''
Multiple statements can also be placed on a single line, separated by semicolons:
x = 5; y = 10; z = x + y; print(z)
Python can be installed on all major operating systems:
• Windows
• macOS
• Linux
For a customized installation where you can modify parameters such as the
installation directory or select specific components, click “Customize installation”
(Figure 3-2).
• pip: This option installs pip, enabling you to install additional Python
packages as needed.
• Tkinter and IDLE (tcl/tk and IDLE): This option installs tkinter
and IDLE.
• Python test suite: Selecting this option installs the standard library
test suite, helpful for testing your code.
Once you’ve made your selections, click “Next.” A new dialog box with advanced
options will appear, offering further choices as shown in Figure 3-3.
Ensure that the chosen installation directory is correct, then proceed to the next step.
Step 4: Commence Python Installation
After configuring your preferred installation settings, click “Install” to initiate the
installation procedure. The installer will copy essential files to your computer and
configure Python. This process may take a few minutes.
Step 5: Confirm the Installation
Once the installation concludes, confirm that Python has been successfully installed
by opening the Command Prompt (search for “cmd” in the Start menu) and entering the
following command:
python --version
Press "Enter," and the version of the interpreter you just installed will appear in the output, for example:
Python 3.11.x
Installing Python on macOS
Step 1: Check Whether Python Is Already Installed
Older versions of macOS ship with Python 2.7, so running python --version in the Terminal may display something like:
Python 2.7.x
If Python 3.x is already installed on your system, you can ascertain its version by executing the subsequent command:
python3 --version
Step 2: Visit the Python Site and Download the macOS installer
Navigate to the download page, where you’ll find the macOS installer package in the
form of a .pkg file, corresponding to the latest Python release. Proceed to download the
installer to your computer.
Locate the installer file you’ve downloaded, typically located in your Downloads
folder, and double-click it to initiate the installation procedure. Subsequently, follow the
on-screen instructions as prompted.
print("Test message")
Press “Enter,” and you should observe the text “Test message” appearing on the
subsequent line within the IDLE environment. You can also confirm the installation
through the Terminal application. Open Terminal and input the following command:
python3 --version
Press “Enter,” and you should see the version of Python that you recently installed
displayed in the output. This serves as confirmation that Python has been successfully
installed on your Mac.
python --version
These commands will help you effortlessly install Python on your Linux distribution
using the respective Package Manager.
To write your first program, open it and type the following line of code, then hit Enter:
print("Hello, World!")
You can activate the Python interpreter by opening your operating system’s terminal
application and typing “python3”. This command will launch the Python interpreter in
your terminal, allowing you to interact with Python directly from the command line.
Python offers versatility in its usage, allowing you to utilize it in two primary modes:
interactive mode and scripting mode. The Python program you’ve installed inherently
functions as an interpreter. An interpreter processes text commands and executes them
as you input them, making it particularly convenient for experimentation and rapid
testing.
user_name = "John"
Once you've defined the "user_name" variable, your program will allocate a specific portion of your computer's memory to hold this data. You can subsequently access and modify this data by referring to it by its name, "user_name". When introducing a new variable, you must provide it with an initial value. In this instance, we've assigned it the value "John". It's important to note that you can always alter this value within your program later on.
Moreover, you can define multiple variables in a single step.
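For example, the following assigns three values to three variables in one statement (the names and values are arbitrary):
x, y, z = 1, 2, 3
a = b = c = 0   # or give several variables the same initial value at once
print(x, y, z, a, b, c)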
Naming a Variable
In Python, variable names must adhere to certain rules and conventions:
• Pascal Case: Identical to Camel Case, except the first word is also
capitalized. Example: NumberOfUsers.
Ultimately, adhering to these rules and conventions ensures that your variable
names are both valid and readable within the Python programming language.
Using underscores to separate words in variable names, also known as snake_case,
is more widely used as the naming convention for Python variables. This convention is
recommended by the official Python style guide, PEP 8, which is widely followed by the
Python community. Snake_case is considered more Pythonic and is the preferred style
for variable names in most Python projects and libraries.
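For instance (the variable names themselves are arbitrary):
number_of_users = 10   # snake_case, the style recommended by PEP 8
numberOfUsers = 10     # camelCase, common in other languages
NumberOfUsers = 10     # PascalCase, conventionally reserved for class names in Python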
Data Types
Numbers in Python
There are three numeric types in Python:
• int
• float
• complex
x = 10 # int
y = 2.8 # float
z = 3 + 2j # complex
Integers
Integers represent whole numbers, whether positive or negative, without any decimal
component. For instance, in the field of neuroscience, researchers might use integers to
tally the count of active neurons at a specific moment.
Floats
Floats, also known as floating-point real values, encompass real numbers that can
include decimal points. For instance, recording a participant’s reaction time in
milliseconds for a specific task would involve using floating-point numbers.
Complex
In Python, complex numbers are a built-in data type used to represent numbers in
the form “a + bj,” where “a” and “b” are real numbers and “j” represents the imaginary
unit, equal to the square root of -1. Complex numbers are often used in mathematical
and scientific computations where real and imaginary parts need to be handled
simultaneously.
You can perform various operations on complex numbers in Python, including
addition, subtraction, multiplication, division, and more. Python’s standard library
provides functions and methods for working with complex numbers.
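For example:
z1 = 3 + 2j
z2 = 1 - 1j
print(z1 + z2)            # (4+1j)
print(z1 * z2)            # (5-1j)
print(z1.real, z1.imag)   # 3.0 2.0
print(abs(z1))            # magnitude, about 3.606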
Note Numeric variables are created automatically when you assign values to them, which keeps everyday coding straightforward. Therefore, you can assign values to variables without concerning yourself with specifying their data type.
Strings
A string is a collection of characters, which can include letters, numbers, symbols, or
spaces, enclosed within either “single” or “double” quotation marks in Python. It’s
important to note that strings in Python are immutable, which means once a string is
created, you cannot modify its contents directly; instead, you can only overwrite it by
redefining the variable.
You can display strings on the screen using the print function. In Python, a string is a sequence of characters, and you can access an individual character or a substring using indexing and slicing.
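For instance, given a short string (any string behaves the same way):
s = "Python"
print(s[0])     # 'P'  -- indexing starts at 0
print(s[-1])    # 'n'  -- negative indices count from the end
print(s[0:3])   # 'Pyt' -- slice from index 0 up to, but not including, 3
print(s[2:])    # 'thon'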
To include either type of quotation mark within the string, a simple and effective
method is to enclose the string with the opposite type of quotation marks. If you wish to
incorporate a single quote, surround the string with double quotes, and conversely,
if you need to include a double quote, enclose the string within single quotes. This
approach ensures that the desired quotation mark is correctly interpreted as part of
the string.
To achieve this, you can use the backslash (\) character. When a backslash appears
in a string, it signals that one or more characters immediately following it should be
treated in a special manner. This mechanism is known as an “escape sequence” because
the backslash causes the subsequent character sequence to deviate from its standard
interpretation.
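For example:
print('It\'s a sunny day')          # escaped single quote inside single quotes
print("She said: \"Hello\"")        # escaped double quotes inside double quotes
print("First line\nSecond line")    # \n starts a new line
print("Name:\tAlice")               # \t inserts a tab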
RAW Strings
To denote a raw string literal, you use the prefix “r” or “R,” indicating that escape
sequences within the string will remain unaltered. This means the backslash character
will not be interpreted as an escape character:
Raw String:
print(r'foo\nbar')
foo\nbar
Raw String:
print(R'foo\\bar')
foo\\bar
In a raw string, backslashes are not treated as escape characters, preserving their
literal representation.
In triple-quoted strings, the inclusion of single and double quotes does not
necessitate escape characters, making it a convenient choice for such scenarios.
Booleans in Python
The Boolean data type, bool, has exactly two possible values:
• True
• False
No other value within Python assumes the bool type. You can ascertain the type of
True and False by employing the built-in “type()” function:
>>> type(False)
<class 'bool'>
>>> type(True)
<class 'bool'>
The output <class ‘bool’> indicates the variable is a Boolean data type. It’s worth
noting that the “bool” type is an intrinsic component of Python, eliminating the need for
importing external libraries. However, the name “bool” itself is not a reserved keyword in
the Python language.
>>> x = true
Traceback (most recent call last):
File "<input>", line 1, in <module>
NameError: name 'true' is not defined
Values of other types can be converted to Booleans with the built-in bool() function; zero evaluates to False, while nonzero numbers evaluate to True:
zero_int = 0
bool(zero_int)
# Output: False
pos_int = 1
bool(pos_int)
# Output: True
neg_flt = -5.1
bool(neg_flt)
# Output: True
Boolean Operators
Boolean arithmetic revolves around the manipulation and combination of true and false
logic values. In Python, boolean values are represented as either True or False, and you
can perform operations on them using boolean operators.
The commonly used boolean operators in Python are as follows:
• or
• and
• not
• == (equivalent)
• != (not equivalent)
In the following code segment, the variables A, B, and C are assigned the Boolean values True, False, and False, respectively. These values are then combined using Boolean operators:
A = True
B = False
C = False
A or (C and B) # Result: True
(A and B) or C # Result: False
A  B  not A  not B  A == B  A != B  A or B  A and B
T  F  F      T      F       T       T       F
F  T  T      F      F       T       T       F
T  T  F      F      T       F       T       T
F  F  T      T      T       F       F       F
Python Operators
Operators are specialized symbols employed to execute operations on both values and
variables. They encompass a set of unique symbols dedicated to carrying out arithmetic
and logical computations. The item upon which the operator acts is referred to as the
operand.
Arithmetic Operators
In Table 3-3, Python arithmetic operators serve the purpose of executing mathematical
operations, including addition, subtraction, multiplication, and division.
Example:
>>> a = 3
>>> b = 2
>>> print(a + b)
5
>>> print(a - b)
1
>>> print(a * b)
6
>>> print(a % b)
1
>>> print(a ** b)
9
Python provides two kinds of division:
• Float division (/)
• Floor division (//)
>>> print(5/5)
>>> print(10/2)
>>> print(-10/2)
>>> print(20.0/2)
Output:
1.0
5.0
-5.0
10.0
>>> print(10//3)
>>> print(-5//2)
>>> print(5.0//2)
>>> print(-5.0//2)
Output:
3
-3
2.0
-3.0
Comparison Operators
Table 3-4 shows the relational (comparison) operators, which evaluate two values and produce a result of either True or False, depending on whether the condition is met.
> Greater than: True if the left operand is greater than the right (x > y)
< Less than: True if the left operand is less than the right (x < y)
== Equal to: True if both operands are equal (x == y)
!= Not equal to: True if operands are not equal (x != y)
>= Greater than or equal to: True if the left operand is greater than or equal to the right (x >= y)
<= Less than or equal to: True if the left operand is less than or equal to the right (x <= y)
Example:
>>> a = 5
>>> b = 3
>>> print(a > b)
True
>>> print(a < b)
False
>>> print(a == b)
False
>>> print(a != b)
True
>>> print(a >= b)
True
>>> print(a <= b)
False
Logical Operators
In Python, logical operators are used in conditional statements, which evaluate to either True or False; they are presented in Table 3-5. These operators perform logical AND, logical OR, and logical NOT operations.
and  Returns True if both operands are true           x and y  x > 5 and x > 7
or   Returns True if either of the operands is true   x or y   x < 7 or x > 21
not  Returns True if the operand is false             not x    not(x > 11 and x > 21)
Example:
>>> a = True
>>> b = False
>>> print(a and b)
False
>>> print(a or b)
True
>>> print(not a)
False
Bitwise Operators
Python’s bitwise operators, presented in Table 3-6, function at the level of individual bits
and execute operations that involve the manipulation of bits themselves. They find their
utility in working with binary numbers.
Example:
>>> a = 10
>>> b = 4
>>> print(a & b)
0
>>> print(a | b)
14
>>> print(~a)
-11
>>> print(a ^ b)
14
>>> print(a >> 2)
2
>>> print(a << 2)
40
Assignment Operators
Assignment operators in Python, represented in Table 3-7, serve the purpose of assigning
values to variables.
=    Assign the value of the right side of the expression to the left-side operand (x = y + z)
+=   Add AND: Add the right-side operand to the left-side operand and assign to the left operand (a += b is a = a + b)
-=   Subtract AND: Subtract the right operand from the left operand and assign to the left operand (a -= b is a = a - b)
*=   Multiply AND: Multiply the right operand with the left operand and assign to the left operand (a *= b is a = a * b)
/=   Divide AND: Divide the left operand by the right operand and assign to the left operand (a /= b is a = a / b)
%=   Modulus AND: Take the modulus using the left and right operands and assign the result to the left operand (a %= b is a = a % b)
//=  Divide (floor) AND: Floor-divide the left operand by the right operand and assign to the left operand (a //= b is a = a // b)
**=  Exponent AND: Raise the left operand to the power of the right operand and assign to the left operand (a **= b is a = a ** b)
&=   Perform bitwise AND on the operands and assign to the left operand (a &= b is a = a & b)
|=   Perform bitwise OR on the operands and assign to the left operand (a |= b is a = a | b)
^=   Perform bitwise XOR on the operands and assign to the left operand (a ^= b is a = a ^ b)
>>=  Perform bitwise right shift on the operands and assign to the left operand (a >>= b is a = a >> b)
<<=  Perform bitwise left shift on the operands and assign to the left operand (a <<= b is a = a << b)
Example:
>>> a = 10
>>> b = a
>>> print(b)
10
>>> b += a
>>> print(b)
20
>>> b -= a
>>> print(b)
10
>>> b *= a
>>> print(b)
100
>>> b <<= a
>>> print(b)
102400
Identity Operators
In Python, “is” and “is not” are identity operators employed to verify whether two values
occupy the same memory location as shown in Table 3-8. It’s important to note that
equality between two variables does not necessarily imply their identity.
Example:
>>> a = 10
>>> b = 20
>>> c = a
>>> print(a is not b)
True
>>> print(a is c)
True
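Equality does not imply identity; for example, two separately created lists with the same contents are equal but not identical:
>>> p = [1, 2, 3]
>>> q = [1, 2, 3]
>>> print(p == q)
True
>>> print(p is q)
False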
Membership Operators
Within Python, the “in” and “not in” operators are categorized as membership operators
(Table 3-9), and their primary function is to assess whether a particular value or variable
exists within a given sequence.
Example:
>>> x = 21
>>> y = 10
>>> numbers = [10, 20, 30, 40, 50]
>>> if x not in numbers:
...     print("x is NOT present in given list")
... else:
...     print("x is present in given list")
...
x is NOT present in given list
>>> if y in numbers:
...     print("y is present in given list")
... else:
...     print("y is NOT present in given list")
...
y is present in given list
Ternary Operator
A Ternary operator, also referred to as conditional expressions, is an operator designed
to assess a condition as either true or false. It was introduced to Python starting from
version 2.5. It is a concise way to evaluate a condition in a single line, thus replacing the
need for a multiline if-else statement and resulting in more compact code.
Syntax: [on_true] if [expression] else [on_false]
Example:
>>> a, b = 10, 20
>>> minimum = a if a < b else b
>>> print(minimum)
10
Conditionals
In the examples up to this point, you’ve accumulated a substantial amount of Python
programming knowledge, focusing on code that executes in a linear sequence. This
means each command is processed one after the other, in the precise sequence they are
laid out.
However, real-world scenarios often demand more flexibility. Programs may need
to bypass certain instructions, repeatedly execute a block of statements, or choose from
among different sets of instructions depending on various conditions.
This is where the concept of control structures becomes crucial. Control structures
are instrumental in guiding the flow of execution within a program, also known as its
control flow.
Within the context of Python, the “if” statement is the fundamental mechanism
for making decisions. It enables the conditional execution of a single statement or a
collection of statements, contingent on the evaluation of an expression.
Example:
if <expr>:
<statement>
• It’s important to note the necessity of the colon (“:”) after “<expr>”.
Unlike some languages that mandate the encapsulation of “<expr>”
within parentheses, Python has no such requirement.
Example:
>>> x = 0
>>> y = 5
>>> if x < y: # Truthy
... print('yes')
yes
Grouping Statements
As mentioned earlier in this chapter, Python employs a programming principle known as the off-side rule, which uses indentation to demarcate blocks of code. Python belongs to a relatively exclusive group of languages that implement the off-side rule.
The role of indentation is not merely stylistic but carries functional significance
in Python code. The reason behind this is now clear: indentation serves to delineate
compound statements or blocks within the code. Therefore, in Python, lines of code
that share the same level of indentation are treated as part of the same block.
Consequently, the structure of a compound “if” statement in Python is defined
through indentation:
1| if <expr>:
2| <statement>
3| <statement>
4| <statement>
...
5| <following_statement>
In this case, statements that share the same indentation level (from lines 2 to 4)
are grouped together as a single block. If <expr> evaluates to true, the entire block
is executed; if <expr> is false, the block is bypassed. After processing this block, the
program continues with <following_statement>.
Nested Blocks
Blocks within Python can be nested to any level of depth, where each additional
indentation level signifies the beginning of a new block and each decrease in indentation
marks the end of the current block. This hierarchical arrangement of code blocks creates
a structure that is simple, uniform, and easy to understand.
Example:
age = 20
has_license = True
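A possible nested block using these variables (the conditions and messages are assumed for illustration):
if age >= 18:
    if has_license:
        print("You are allowed to drive.")
    else:
        print("You are old enough, but you need a license.")
else:
    print("You are too young to drive.")
Output:
You are allowed to drive.
Python's if statement also supports an else clause for running an alternative block when the condition is false: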
if <expr>:
<statement(s)>
else:
<statement(s)>
When <expr> evaluates to true, the program executes the initial set of statements
and bypasses the second set. Conversely, if <expr> is false, the program skips the first
set and proceeds with the execution of the second set. Following the completion of
these conditional branches, the program’s flow continues beyond the second set of
statements.
Example:
temperature = 30
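A sketch using this variable (the threshold and messages are assumed):
if temperature > 25:
    print("It is warm outside.")
else:
    print("It is cool outside.")
Output:
It is warm outside.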
One-Line if Statements
Conventionally, “<expr>” in an “if” statement is written on one line, with the
“<statement>” indented beneath it on the next line. However, it’s also acceptable to
construct the entire “if” statement on a single line, achieving the same functionality.
Syntax:
if <expr>: <statement>
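For example (values assumed):
x = 10
if x > 5: print("x is greater than 5")
Output:
x is greater than 5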
Loops
Loops can include various control statements to modify their execution flow, such as "break" to exit the loop prematurely, "continue" to skip the current iteration and proceed to the next one, or "else" to execute a block of code once the loop condition is no longer true (a feature specific to Python).
while Loops
Syntax:
while expression:
    statement(s)
Like with the if conditionals, Python groups statements into a single block of code
based on the uniform indentation level following a programming construct. The same
number of spaces used to indent statements determines their inclusion in the same
code block.
Example:
counter = 0
while counter < 5:       # Loop reconstructed to match the output shown below
    print("Counter is", counter)
    counter += 1
print("Loop finished")
Output:
Counter is 0
Counter is 1
Counter is 2
Counter is 3
Counter is 4
Loop finished
In this example, the while loop executes as long as count is less than 3. It prints the
current value of count and then increments count by 1 each time through the loop.
When count reaches 3, the loop condition count < 3 becomes false, causing the loop to
exit. At this point, the else block is executed, printing “Count is no longer less than 3”.
The else part of a while loop runs when the loop condition becomes false naturally,
meaning it wasn’t exited through a break statement.
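A minimal sketch matching this description (the variable name count is taken from the explanation):
count = 0
while count < 3:
    print("Count is", count)
    count += 1
else:
    print("Count is no longer less than 3")
Output:
Count is 0
Count is 1
Count is 2
Count is no longer less than 3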
for Loops
Example:
n = 5
for i in range(0, n):
print(i)
Output:
0
1
2
3
4
for i in range(3):
print(f"i is {i}")
else:
print("Loop completed without break")
In this example, the for loop iterates over a sequence generated by range(3), which
produces the numbers 0, 1, and 2. For each iteration, the value of i is printed out. Once
the loop has iterated over all items in the sequence (i.e., after printing 0, 1, and 2), the
loop naturally concludes, and control passes to the else block. The else block then
executes, printing “Loop completed without break”.
Nested while loop:
while expression:
    while expression:
        statement(s)
    statement(s)
Example:
# Outer loop
for i in range(3): # Will iterate over 0, 1, 2
# Inner loop
for j in range(2): # Will iterate over 0, 1
print(f"i = {i}, j = {j}")
In this example, there are two loops: an outer loop and an inner loop. The outer loop
iterates through a range of numbers from 0 to 2 (inclusive), and for each iteration of the
outer loop, the inner loop iterates through a range of numbers from 0 to 1 (inclusive).
The outer loop starts with i = 0. Then, the inner loop begins its execution, iterating
with j taking values 0 and then 1. For each iteration of the inner loop, it prints the current
values of i and j.
After the inner loop completes its iterations for j = 0 and j = 1, control returns to the
outer loop, incrementing i to the next value.
This process repeats until the outer loop has completed all its iterations (for i = 0,
i = 1, and i = 2).
The result is a series of prints showing each combination of i and j, demonstrating
how nested loops can be used to generate or iterate over a Cartesian product of two
ranges. This pattern is commonly used in scenarios requiring iteration over multiple
dimensions, such as processing the cells in a 2D matrix or grid.
Output:
i = 0, j = 0
i = 0, j = 1
i = 1, j = 0
i = 1, j = 1
i = 2, j = 0
i = 2, j = 1
# The surrounding loop header is assumed; num iterates over the integers 1 through 9
for num in range(1, 10):
    if num % 2 == 0:
        continue  # Skip the rest of the loop for even numbers
    if num == 5:
        pass  # Do nothing for num == 5, placeholder for future code
    if num == 7:
        break  # Exit the loop when num is 7
    print(num)
Each data structure has distinct characteristics and use cases, making Python a
powerful and flexible language for managing data.
Creating Lists
You can create a list by enclosing your elements within square brackets. Leaving the
brackets empty will result in an empty list.
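Example (a minimal sketch; the list contents are assumed):
fruits = ["apple", "banana", "cherry"]
empty_list = []
print(fruits)
print(empty_list)
Output:
['apple', 'banana', 'cherry']
[]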
Adding Elements
To add elements to a list, Python provides methods such as the following:
• The append() method adds its argument as a single element to the
end of a list.
Deleting Elements
Elements can be removed from a list using several techniques:
Accessing Elements
Accessing elements in a list is straightforward and similar to accessing characters in
a string; you use the index to obtain the desired element.
Example:
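A minimal sketch (the list contents are assumed) that also uses the index() and sort()/sorted() methods explained next:
my_list = [10, 25, 8, 2, 10, 5]
print(my_list[0])         # First element
print(my_list[-1])        # Last element
print(my_list.index(10))  # Index of the first occurrence of 10
print(sorted(my_list))    # New sorted list; my_list itself is unchanged
my_list.sort()            # Sorts my_list in place
print(my_list)
Output:
10
5
0
[2, 5, 8, 10, 10, 25]
[2, 5, 8, 10, 10, 25]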
In this example:
• index() searches for a given value and returns the index of its first
occurrence.
• Both sorted() and sort() are used for sorting lists. sorted() returns
a new sorted list, leaving the original list unchanged, while sort()
sorts the list in place.
Example:
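One sketch that would produce the output shown below (the list contents are assumed):
my_list = [2, 5, 8, 10, 25, 10]
print(len(my_list))       # Number of elements
print(my_list.index(10))  # Index of the first occurrence of 10
print(my_list.count(10))  # How many times 10 appears
print(sorted(my_list))    # New sorted list; my_list itself is unchanged
my_list.sort()            # Sorts my_list in place
print(my_list)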
Output:
6
3
2
[2, 5, 8, 10, 10, 25]
[2, 5, 8, 10, 10, 25]
Understanding these fundamentals of lists will enhance your ability to manage and
manipulate data effectively in Python.
Dictionaries in Python
Dictionaries in Python are collections that store data as key-value pairs, akin to a phone
directory where each name (key) is associated with a phone number (value). Keys are
unique identifiers that map to values, allowing for efficient data retrieval. This structure
is akin to looking up a name in a phone book to find the corresponding number.
Creating a Dictionary
You can create dictionaries by enclosing key-value pairs within curly braces {}, or
by using the dict() constructor. Each key-value pair is added to the dictionary in
this manner.
Example:
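A sketch that produces the output below (variable names are assumed):
person = {"name": "John Doe", "age": 30, "city": "New York"}
empty_dict = {}
print("Person Information:", person)
print("Empty Dictionary:", empty_dict)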
Output:
Person Information: {'name': 'John Doe', 'age': 30, 'city': 'New York'}
Empty Dictionary: {}
Example:
# Initial dictionary
car = {"make": "Ford", "model": "Mustang", "year": 1964}
• The popitem() method removes and returns the last key-value pair
as a tuple.
• The clear() method empties the entire dictionary, removing all its
contents.
Example:
# Initial dictionary
book = {"title": "The Great Gatsby", "author": "F. Scott Fitzgerald",
"year": 1925}
Accessing Elements
Elements are accessed through their keys. You can retrieve a value by referencing its
key directly or by using the get() method, which returns the value associated with a
given key.
Example:
# Creating a dictionary
student_info = {
"name": "Alice",
"age": 25,
"grade": "A"
}
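# Accessing values (assumed reconstruction): directly by key and with get()
name = student_info["name"]
age = student_info["age"]
grade = student_info.get("grade")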
print("Name:", name)
print("Age:", age)
print("Grade:", grade)
Output:
Name: Alice
Age: 25
Grade: A
Other Functions
Dictionaries offer several methods for interacting with the data they contain, such as keys(), values(), and items().
This overview introduces the basic operations and methods available for dictionaries
in Python, showcasing their flexibility and power in organizing and accessing data.
Example:
# Creating a dictionary
student_info = {
"name": "Alice",
"age": 25,
"grade": "A"
}
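# Assumed continuation: common dictionary methods
print(student_info.keys())    # All keys
print(student_info.values())  # All values
print(student_info.items())   # Key-value pairs as tuples
Output:
dict_keys(['name', 'age', 'grade'])
dict_values(['Alice', 25, 'A'])
dict_items([('name', 'Alice'), ('age', 25), ('grade', 'A')])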
Tuples in Python
Tuples in Python resemble lists in many ways, except for one crucial difference:
once data is added to a tuple, it cannot be altered or modified. There is, however, an
exception: when the data contained within a tuple is mutable, it can be changed.
Creating a Tuple
You can create a tuple using parentheses () or by using the tuple() function.
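Example (a minimal sketch; the element values are assumed):
numbers_tuple = (1, 2, 3)
letters_tuple = tuple(["a", "b", "c"])
print(numbers_tuple)
print(letters_tuple)
Output:
(1, 2, 3)
('a', 'b', 'c')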
Accessing Elements
Accessing elements in a tuple is identical to accessing values in lists.
Example:
# Creating a tuple
fruits_tuple = ("Apple", "Banana", "Cherry", "Date")
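# Assumed continuation: accessing elements by index, just as with lists
print(fruits_tuple[0])   # First element
print(fruits_tuple[-1])  # Last element
Output:
Apple
Date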
Appending Elements
To append values to a tuple, you can use the + operator, which allows you to concatenate
another tuple onto it.
Example:
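A sketch matching the output below:
tuple_1 = (1, 2, 3)
tuple_2 = (4, 5, 6)
appended_tuple = tuple_1 + tuple_2
print("Tuple 1:", tuple_1)
print("Tuple 2:", tuple_2)
print("Appended Tuple:", appended_tuple)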
Output:
Tuple 1: (1, 2, 3)
Tuple 2: (4, 5, 6)
Appended Tuple: (1, 2, 3, 4, 5, 6)
Other Functions
Functions available for tuples are similar to those for lists, as tuples share many
characteristics with lists in terms of data access and manipulation.
Example:
# Finding the index of a specific element (the modified list) in the tuple
index_element = my_tuple.index(['english', 'python'])
Sets in Python
Sets in Python are collections of unique, unordered elements. Even if a value is repeated multiple times, it is included in the set only once, just as in mathematical sets. Sets also support operations akin to those of mathematical set theory, such as union and intersection.
Creating a Set
To create a set in Python, you use curly braces “{}”. Unlike dictionaries, you only provide
values, not key-value pairs.
Example:
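A sketch matching the output below (duplicates are dropped automatically):
my_set = {1, 2, 3, 4, 4, 5}
print("My Set:", my_set)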
Output:
My Set: {1, 2, 3, 4, 5}
Adding Elements
To add elements to a set, you can use the “add()” function and pass the value you
want to add.
Example:
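A minimal sketch (the set contents are assumed):
my_set = {1, 2, 3, 4, 5}
my_set.add(6)
print("My Set after add():", my_set)
Output:
My Set after add(): {1, 2, 3, 4, 5, 6}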
Operations on Sets
Various set operations like union and intersection are demonstrated here. These
operations allow you to manipulate and combine sets as needed.
Example:
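The two sets below are assumed for illustration:
my_set = {1, 2, 3, 4, 5}
my_set_2 = {4, 5, 6, 7, 8}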
# Performing set operations and comparing them using union and | operator
union_result = my_set.union(my_set_2)
union_operator_result = my_set | my_set_2
# Performing set operations and comparing them using intersection and & operator
intersection_result = my_set.intersection(my_set_2)
intersection_operator_result = my_set & my_set_2
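# Printing the results (assumed)
print("Union:", union_result)
print("Union with |:", union_operator_result)
print("Intersection:", intersection_result)
print("Intersection with &:", intersection_operator_result)
Output:
Union: {1, 2, 3, 4, 5, 6, 7, 8}
Union with |: {1, 2, 3, 4, 5, 6, 7, 8}
Intersection: {4, 5}
Intersection with &: {4, 5}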
In this example:
• Union: The union() function combines the data from both sets,
creating a new set that contains all unique elements from both sets.
• Intersection: The intersection() function finds the data that
is common to both sets. It returns a new set containing only the
elements that exist in both sets.
These functions allow you to perform various set operations and manipulate sets to
get specific subsets of data as needed.
Example:
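A minimal example of defining and calling a function (the name add_numbers, which also appears later in this section, is assumed here):
def add_numbers(a, b):
    return a + b

result = add_numbers(5, 3)
print(result)
Output:
8
A return statement like the one in add_numbers can be used to do two things: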
• Exit the function at a certain point before the end of its block of code
• Pass control back to the point in the program where the function was
called, optionally passing data back to that point
• return sends a value back to the caller and exits the function.
It allows the function to output a result, which can then be used
elsewhere in your program. The returned value can be stored in a
variable, used in expressions, or passed to other functions.
In short, use “return” when you want to output a value from a function and use it
further, and use “print” when you want to display something to the console without
affecting the function’s output to its caller.
To call a function, write its name followed by parentheses containing any arguments:
function_name(arguments)
For example:
result = add_numbers(5, 3)
This executes the “add_numbers” function with “5” and “3” as its arguments and
stores the return value in the variable “result”. If the function does not require any
arguments, you still need to use parentheses, but leave them empty:
function_name()
This syntax is straightforward and is the same for both built-in functions and user-
defined functions.
Positional Arguments
These are the most common type of arguments, where the order in which the arguments
are passed matters. The number of positional arguments in the call must match the
number expected by the function definition.
Example:
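A sketch (the function and argument values are assumed):
def describe_pet(animal, name):
    print(f"I have a {animal} named {name}.")

describe_pet("dog", "Rex")
Output:
I have a dog named Rex.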
Keyword Arguments
These are arguments passed to a function by explicitly specifying the name of the
parameter and the value. Keyword arguments can be listed in any order when calling the
function. This makes your code more readable and allows you to call functions with the
parameters in a different order than they were defined.
Example:
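Reusing the describe_pet function from the previous sketch, the arguments can be supplied by name in any order:
describe_pet(name="Whiskers", animal="cat")
Output:
I have a cat named Whiskers.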
Default Arguments
A function can have default values for parameters. If the caller does not provide a value
for such an argument, the function uses the default value. Default arguments make
certain parameters optional.
Example:
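A sketch (the function name and default value are assumed):
def greet(name, greeting="Hello"):
    print(f"{greeting}, {name}!")

greet("Alice")              # Uses the default greeting
greet("Alice", "Welcome")   # Overrides the default
Output:
Hello, Alice!
Welcome, Alice!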
Lambda Functions
A lambda function is a small anonymous function defined with the lambda keyword.
Syntax: lambda arguments: expression
This syntax allows the lambda function to take any number of arguments, but it can
only have one expression. The expression is evaluated and returned when the lambda
function is called.
Characteristics of Lambda Functions
• Single Expression: Unlike standard functions defined with “def”,
which can consist of multiple expressions and statements, a lambda
function is limited to a single expression.
Example:
square = lambda x: x * x
print(square(5)) # Output: 25
Output:
25
And here’s an example of using a lambda function with the “filter()” function to filter
out even numbers from a list:
numbers = [1, 2, 3, 4, 5, 6]
even_numbers = list(filter(lambda x: x % 2 == 0, numbers))
print(even_numbers) # Output: [2, 4, 6]
Output:
[2, 4, 6]
Lambda functions are particularly useful for simple operations that can be expressed
in a single line. However, for complex operations, it’s recommended to use named
functions for the sake of clarity and readability.
Summary
Python is a general-purpose, high-level programming language, an ideal choice for
scripting and rapid application development across various domains and platforms.
In this chapter, we reviewed Python’s syntax and semantics and noted its similarity to
other programming languages while highlighting its unique features. In the next chapter,
we’ll see how Python supports various programming approaches, including structured,
object-oriented, and functional programming.
CHAPTER 4
Python and Other Programming Approaches
First, we define a class named “Book”, which is like a blueprint for creating “Book”
objects. This class will have attributes to store data such as the title and author of the
book and a method to display this information.
class Book:
def __init__(self, title, author):
self.title = title
self.author = author
def display_info(self):
print(f"Book Title: {self.title}\nAuthor: {self.author}")
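# Assumed usage: creating an instance and calling its method
my_book = Book("1984", "George Orwell")
my_book.display_info()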
Explanation:
The “my_book” instance is an object of the “Book” class. It encapsulates data (the
title and author attributes) and provides functionality through a method (“display_info”).
This object is a practical embodiment of OOP principles in Python, demonstrating
how data and related behaviors are bundled together. Each “Book” object can hold
different data, showcasing the power of objects in managing state and behavior in a
structured way.
@abstractmethod
def area(self):
pass
def area(self):
return 3.1415 * self.radius ** 2
def area(self):
return self.width * self.height
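Putting these method fragments together, a minimal self-contained sketch of abstraction (the class names Shape, Circle, and Rectangle are assumed) could look like this:
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self):
        pass

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 3.1415 * self.radius ** 2

class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

print(Circle(5).area())
print(Rectangle(3, 4).area())
Output:
78.5375
12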
Inheritance
Inheritance is the mechanism by which classes can derive characteristics and behaviors
from pre-existing classes. This principle supports the Don’t Repeat Yourself (DRY)
philosophy, enabling the reuse of code by incorporating shared attributes and methods
into base classes (superclasses). This concept mirrors biological inheritance, where
subclasses inherit traits and behaviors from their parents (superclasses), showcasing
shared attributes and potentially shared methods.
Example:
def speak(self):
pass # Placeholder for the speak method
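A fuller sketch (the class names Animal, Dog, and Cat are assumed):
class Animal:
    def speak(self):
        pass  # Placeholder for the speak method

class Dog(Animal):
    def speak(self):
        return "Woof!"

class Cat(Animal):
    def speak(self):
        return "Meow!"

print(Dog().speak())
print(Cat().speak())
Output:
Woof!
Meow!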
Polymorphism
Polymorphism permits the tailoring of methods and attributes in subclasses that were
initially defined in a superclass, embodying the concept of “many forms.” It allows for the
creation of methods with the same name but differing implementations across classes.
Reflecting on our real-life analogy, children (subclasses) may share a common behavior
like getting hungry (a method) from their parents, but the specifics, such as frequency of
hunger, can vary, demonstrating polymorphism.
Example:
def speak(self):
pass # Placeholder for the speak method
# Function that takes an Animal object and calls its speak method
def animal_sound(animal):
return animal.speak()
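Assuming the Dog and Cat subclasses from the inheritance sketch above, the same function works with either type:
print(animal_sound(Dog()))
print(animal_sound(Cat()))
Output:
Woof!
Meow!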
Encapsulation
Encapsulation ensures the safekeeping of a class’s internal data, safeguarding the
integrity and privacy of the data within. Although Python does not explicitly support
private attributes through syntax, it achieves encapsulation through name mangling and
by using getter and setter methods for controlled access and modification of data.
Example:
class Human:
def __init__(self, name, age):
self.__name = name # Private attribute
self.__age = age # Private attribute
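    # Assumed continuation: getter and setter methods for controlled access
    def get_name(self):
        return self.__name

    def set_age(self, age):
        if age > 0:
            self.__age = age
        else:
            print("Invalid age")

    def display(self):
        print(f"Name: {self.__name}, Age: {self.__age}")

# Assumed usage matching the output below
person = Human("Alice", 30)
print("Name:", person.get_name())
person.set_age(35)
person.display()
person.set_age(-5)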
Output:
Name: Alice
Name: Alice, Age: 35
Invalid age
# math_operations.py
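# Assumed contents: two simple functions that can be imported elsewhere
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b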
Note Search path
The search path is essentially a list of directories that the interpreter scans when
searching for a module.
If you keep the previous file we created and create a new one to call it, you should
have the following structure:
├── calc.py
└── math_operations.py
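A calc.py along these lines (the argument values are assumed) produces the output below:
# calc.py
import math_operations

print("Addition result:", math_operations.add(5, 3))
print("Subtraction result:", math_operations.subtract(10, 4))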
Output:
Addition result: 8
Subtraction result: 6
# Importing the math module and specifically importing the sqrt function
from math import sqrt
number = 25
result = sqrt(number)
print(f"The square root of {number} is: {result}")
Output:
The square root of 25 is: 5.0
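A wildcard import brings every public name of a module into the current namespace; a minimal sketch:
from math import *

print(sqrt(25))  # sqrt is available without the math. prefix
print(pi)
Output:
5.0
3.141592653589793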
The use of “*” has its advantages and disadvantages. It is not recommended to use it
if you know precisely what you need from the module. Use it judiciously.
import sys
import importlib

# The module to import; "json" is used here purely for illustration
module_name = "json"

try:
    custom_module = importlib.import_module(module_name)
    print(f"Successfully imported {module_name} from {custom_module.__file__}")
except ImportError:
    print(f"Failed to import {module_name}")
Output on my computer:
• urllib: Provides tools for working with URLs and fetching data from
web resources, often used in web scraping and web requests.
These are just a few of the popular built-in modules in Python. Python’s extensive
standard library includes many more modules, each serving specific purposes and
making Python a versatile language for a wide range of applications.
├── example.txt
└── test.py
# Specify the file path (replace with the actual file path)
file_path = "example.txt"
try:
# Open the file and read its contents
file = open(file_path, 'r')
file_contents = file.read()
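    # Assumed continuation: print the contents and close the file
    print(file_contents)
    file.close()
except FileNotFoundError:
    print("File not found:", file_path)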
Output:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the
1500s, when an unknown printer took a galley of type and scrambled it to
make a type specimen book. It has survived not only five centuries, but
also the leap into electronic typesetting, remaining essentially unchanged.
It was popularized in the 1960s with the release of Letraset sheets
containing Lorem Ipsum passages, and more recently with desktop publishing
software like Aldus PageMaker including versions of Lorem Ipsum.
• w: Opens an existing file for writing. If the file already contains data, it
will be overwritten. If the file doesn’t exist, it will be created.
• r+: Allows reading and writing data in the file without truncating it; data you write overwrites existing content starting at the current file position.
• w+: Enables writing and reading data. It will overwrite existing data.
• a+: Permits appending and reading data from the file. It won’t
overwrite existing data.
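The same file can also be read with a with statement, which closes the file automatically; a minimal sketch:
with open("example.txt", "r") as file:
    print(file.read())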
Output:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the
1500s, when an unknown printer took a galley of type and scrambled it to
make a type specimen book. It has survived not only five centuries, but
also the leap into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets
containing Lorem Ipsum passages, and more recently with desktop publishing
software like Aldus PageMaker including versions of Lorem Ipsum.
Example:
Output:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the
1500s, when an unknown printer took a galley of type and scrambled it to
make a type specimen book. It has survived not only five centuries, but
also the leap into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets
containing Lorem Ipsum passages, and more recently with desktop publishing
software like Aldus PageMaker including versions of Lorem Ipsum.
Example:
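A sketch of the code that would precede this call (the file name is assumed):
with open("example.txt", "r") as file:
    data = file.read()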
print(data)
Output:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the
1500s, when an unknown printer took a galley of type and scrambled it to
make a type specimen book. It has survived not only five centuries, but
also the leap into electronic typesetting, remaining essentially unchanged.
It was popularized in the 1960s with the release of Letraset sheets
containing Lorem Ipsum passages, and more recently with desktop publishing
software like Aldus PageMaker including versions of Lorem Ipsum.
In Python, it’s possible to split lines while reading files. The split() function is used
to divide a string into parts when a space is encountered by default. However, you can
customize the splitting character to be any character of your choice.
Example:
Output:
# Specify the file path (replace with the desired file path)
file_path = "new_file.txt"
try:
# Open the file in write mode ('w')
with open(file_path, 'w') as file:
# Write content to the file
file.write("This is the first line.\n")
file.write("This is the second line.\n")
file.write("This is the third line.\n")
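    # Assumed continuation: confirm the write and handle possible errors
    print("File written successfully.")
except OSError as error:
    print("Could not write to the file:", error)
Output:
File written successfully.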
TypedDicts
TypedDict was specified in PEP 589 and introduced in Python 3.8. On older versions
of Python, you can install it from typing_extensions (pip install typing_extensions). In
Python 3.11, it is directly imported from typing.
• The dict[key_type, value_type] type lets you declare uniform dictionary types, where every value has the same type and arbitrary keys are supported.
Example:
from typing import List, TypedDict

class SalesSummary(TypedDict):
    sales: int
    country: str
    product_codes: List[str]
Individual keys can be marked as optional with NotRequired (new in Python 3.11):

from typing import TypedDict, NotRequired

class User(TypedDict):
    name: str
    age: int
    married: NotRequired[bool]
Example:
class MyThing(TypedDict):
    req1: int
    opt1: -str # - means a potentially-missing key, or NotRequired[]
    req2: float

class MyThing(TypedDict):
    req1: int
    opt1: ~str # ~ means an opposite-of-normal-totality key
    req2: float
Self Type
Previously, if you had to define a class method that returned an object of the class itself,
it would look something like this:
from typing import TypeVar

T = TypeVar('T', bound=type)
class Circle:
def __init__(self, radius: int) -> None:
self.radius = radius
@classmethod
def from_diameter(cls: T, diameter) -> T:
circle = cls(radius=diameter/2)
return circle
To be able to say that a method returns the same type as the class itself, you had
to define a TypeVar and say that the method returns the same type T as the current
class itself.
from typing import Self

class Language:
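    # Assumed body: storing a version and returning the instance itself via Self
    def __init__(self, version: float) -> None:
        self.version = version

    def set_version(self, version: float) -> Self:
        self.version = version
        return self

lang = Language(3.11).set_version(3.12)
print(lang.version)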
Output:
3.12
Improved Exceptions
Better Error Messages
Until now, the only information a traceback gave you about where an exception was raised was the line. In Python 3.11, the exact error location within the line is shown in the traceback:
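For example, code along these lines (the values are assumed):
d = {"inner": {"x": None}}
print(d["inner"]["x"]["y"])
now produces a traceback that marks the exact expression that failed, roughly:
Traceback (most recent call last):
  File "example.py", line 2, in <module>
    print(d["inner"]["x"]["y"])
          ~~~~~~~~~~~~~~~^^^^^
TypeError: 'NoneType' object is not subscriptable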
Exception Notes
Python 3.11 introduces exception notes (PEP 678). Now, inside your except clauses, you
can call the add_note() function and pass a custom message when you raise an error.
Example:
import math
try:
math.sqrt(-1)
except ValueError as e:
e.add_note("Negative value passed! Please try again.")
raise
import math
class MyOwnError(Exception):
__notes__ = ["This is a custom error!"]
try:
math.sqrt(-1)
except:
raise MyOwnError
Exception Groups
One way to think about exception groups (Figure 4-1) is that they’re regular exceptions
wrapping several other regular exceptions. However, while exception groups behave
like regular exceptions in many respects, they also support special syntax that helps you
handle each of the wrapped exceptions effectively. In Python 3.11, we group exceptions
with ExceptionGroup().
Example:
def exceptionGroup():
exec_gr = ExceptionGroup('ExceptionGroup Message!',
[FileNotFoundError("This File is not found"),
ValueError("Invalid Value Provided"),
ZeroDivisionError("Trying to divide by 0")])
raise exec_gr
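The wrapped exceptions can then be handled selectively with the new except* syntax; a minimal sketch:
try:
    exceptionGroup()
except* FileNotFoundError as eg:
    print("File errors:", eg.exceptions)
except* (ValueError, ZeroDivisionError) as eg:
    print("Other errors:", eg.exceptions)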
TOML Support
TOML is short for Tom’s Obvious Minimal Language. It’s a configuration file format that’s
gotten popular over the last decade. The Python community has embraced TOML as the
format of choice when specifying metadata for packages and projects.
TOML has been designed to be easy for humans to read and easy for computers
to parse. While TOML has been used for years by many different tools, Python hasn’t
had built-in TOML support. That changes in Python 3.11, when tomllib is added to the
standard library. This new module builds on top of the popular tomli third-party library
and allows you to parse TOML files.
Example:
[second]
label = { singular = "second", plural = "seconds" }
aliases = ["s", "sec", "seconds"]
[minute]
label = { singular = "minute", plural = "minutes" }
aliases = ["min", "minutes"]
multiplier = 60
to_unit = "second"
[day]
label = { singular = "day", plural = "days" }
aliases = ["d", "days"]
multiplier = 24
to_unit = "hour"
[year]
label = { singular = "year", plural = "years" }
aliases = ["y", "yr", "years", "julian_year", "julian years"]
multiplier = 365.25
to_unit = "day"
The new tomllib library brings support for parsing TOML files. tomllib does not support writing TOML. It's based on the tomli library. When using tomllib.load(), you pass in a file object that must be opened in binary mode by specifying mode="rb".
The two main functions in tomllib are tomllib.load(), which parses a TOML document from a binary file object, and tomllib.loads(), which parses a TOML string.
Example:
import tomllib
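# incorrect – tomllib requires the file to be opened in binary mode,
# so the following would raise a TypeError:
# with open('t.toml', 'r') as f:
#     tomllib.load(f)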
# correct
with open('t.toml', 'rb') as f:
tomllib.load(f)
Python 3.11 also brings several typing improvements, including LiteralString and variadic generics.
Previously, a parameter annotated as str accepted any string value, with no way to require an actual string literal. To address this limitation, Python 3.11 introduces the special type LiteralString, which allows only literal strings (or strings built from literals), like the following:
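A sketch of a function annotated this way (the function body is assumed):
from typing import LiteralString

def paint_color(color: LiteralString) -> None:
    print(f"Painting with {color}")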
paint_color("cyan")
paint_color("blue")
Variadic Generics
from typing import Generic, TypeVar
Dim1 = TypeVar('Dim1')
Dim2 = TypeVar('Dim2')
Dim3 = TypeVar('Dim3')
class Shape1(Generic[Dim1]):
    pass

class Shape2(Generic[Dim1, Dim2]):
    pass

class Shape3(Generic[Dim1, Dim2, Dim3]):
    pass
As shown, for three dimensions, we’ll have to define three types and their respective
classes, which isn’t clean and represents a high level of repetition that we should be
cautious about. Python 3.11 is introducing the TypeVarTuple that allows you to create
generics using multiple types.
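A minimal sketch of the same idea with TypeVarTuple (the names are assumed):
from typing import Generic, TypeVarTuple

Ts = TypeVarTuple('Ts')

class Shape(Generic[*Ts]):
    pass

# A single generic class now covers any number of dimensions:
point_2d: Shape[int, int]
point_3d: Shape[int, int, int]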
Python 3.11 also adds the z option to the format specification mini-language, which normalizes negative zero to positive zero when formatting:
small_num = -0.00321
print(f"{small_num:z.2f}")
# 0.00
3. Easy to Learn and Read: Python is known for its simplicity and
readability. Its clear and concise syntax makes it an excellent
choice for both beginners and experienced programmers. This
simplicity reduces the barrier to entry for individuals looking to
get started in AI.
4. Versatility: Python is a versatile language that can be used for a
wide range of tasks beyond AI, including web development, data
analysis, scripting, and more. This versatility means that you can
apply Python across different domains and integrate AI solutions
into various projects.
8. Open Source: Python and most of its AI libraries are open source,
which means they are freely available and can be customized or
extended as needed. This open nature fosters innovation and
collaboration in the AI community.
Summary
Python’s syntax is designed for readability and simplicity, adhering to the philosophy
that there should be a clear way to accomplish tasks. This design choice, coupled
with English keywords instead of punctuation, makes Python’s code clean and easy to
understand.
In the next chapter, we will discuss the following:
• Attention mechanisms
CHAPTER 5
Basic Overview of the Components of the LLM Architectures
Recurrent layers are crucial for processing sequential data, particularly in natural
language processing (NLP). These layers maintain hidden states that carry forward
information from previous elements in a sequence, allowing the model to consider
the context of earlier words when interpreting current ones. Training recurrent layers
involves techniques like backpropagation through time (BPTT), which adjusts weights
based on the model’s performance across entire sequences. Variants like Long Short-
Term Memory (LSTM) units and Gated Recurrent Units (GRUs) are employed to manage
long-term dependencies more effectively.
Attention mechanisms are a cornerstone of modern LLMs, enabling them to focus
on specific parts of the input text that are most relevant to the task at hand. The attention
mechanism functions through a two-phase approach: during the attention phase, words
seek out and exchange information with relevant words in the context, and during the
feedforward phase, they process this accumulated information. Transformers, a type of
LLM, utilize multiple attention heads to perform various information-exchange tasks in
parallel, enhancing their ability to process extensive texts efficiently.
Self-attention allows each word in a sentence to attend to all other words, capturing
their contextual relationships. Multi-head attention extends this by enabling the
model to focus on different parts of the sentence simultaneously, thereby improving
its understanding of syntactic and semantic relationships. These mechanisms are vital
for tasks requiring a comprehensive understanding of text, such as translation and
summarization.
Tokenization is the process of breaking down text into manageable units known as
tokens, which are then transformed into numerical representations called embeddings.
This segmentation is fundamental for LLMs to process and learn from large datasets
efficiently. The distribution and handling of these tokens, especially predicting the next
token in a sequence, form the basis of many LLM tasks, including text generation and
translation.
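As a toy illustration of this idea (real LLM tokenizers use subword schemes such as byte-pair encoding, and the vocabulary below is invented), text can be split into tokens and mapped to integer IDs:
text = "the cat sat on the mat"

# Toy whitespace tokenizer with an invented vocabulary
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

tokens = text.split()
token_ids = [vocab[token] for token in tokens]

print(tokens)     # ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(token_ids)  # [0, 1, 2, 3, 0, 4]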
Embedding Layers
Embedding layers in neural networks, including large language models (LLMs), play a
crucial role in transforming discrete inputs, such as words or tokens, into continuous
vector forms. At the heart of this transformation is the embedding matrix, a learnable
component of the layer that starts off with random values.
This matrix features rows that correspond to the unique tokens in the model’s
vocabulary. In operation, the embedding layer conducts a lookup for each input token,
fetching the corresponding row from the matrix. These fetched rows serve as the
continuous vector representations, or embeddings (Figure 5-1), of the tokens. This
outlines the fundamental operation of embedding layers, although each embedding
algorithm may vary, incorporating the position and context of sentences. Such variations
enable different models to perform with varying effectiveness across diverse scenarios.
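A toy sketch of this lookup (the matrix values are random and purely illustrative):
import random

vocab_size, embedding_dim = 5, 4

# Learnable embedding matrix, initialized with random values
embedding_matrix = [[random.uniform(-1, 1) for _ in range(embedding_dim)]
                    for _ in range(vocab_size)]

token_ids = [0, 1, 2, 3, 0, 4]                          # IDs produced by a tokenizer
embeddings = [embedding_matrix[i] for i in token_ids]   # One row lookup per token

print(len(embeddings), "vectors of dimension", embedding_dim)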
Word embeddings (Figure 5-2) represent individual words within a text as vectors,
generated by specialized models designed for this purpose. Each word is assigned
a distinct vector, which is essentially an array filled with numbers that uniquely
identify that word. These vectors are multidimensional entities, with each dimension
representing a numerical component specific to the word. The uniqueness of these
vectors allows for the distinct representation of each word within a document.
The principle of word embedding is to map words with similar meanings close
to each other in the vector space. For instance, the vector assigned to “apple” would
be more similar to that of “orange” than to “violin,” reflecting the closer relationship
between the fruits compared to the musical instrument. This similarity in vectors
mirrors the semantic proximity between words.
It’s important to mention that the choice of dimensionality for these vectors lies in
the hands of the model’s architect. Take Word2Vec as an instance; it employs vectors of
300 dimensions, representing each word with 300 distinct numbers. In this context, we
aim to construct a basic word embedding model using just two dimensions.
Stage 1: Nodes
The initial phase involves establishing a series of nodes grouped within a “hidden layer”
(Figure 5-3). Each node is linked to an input word through a weighted connection.
Nodes are structured in two segments. The first segment aggregates the weighted inputs
from each word. This aggregated sum is then forwarded to the second segment, the
activation function. The role of the activation function is to determine the node’s output
based on its specific process. In this scenario, the activation functions act as simple
identity functions, leaving the input unchanged.
The total number of nodes determines the vector dimensionality, as previously
discussed. This means it sets the quantity of numbers (or vector components) assigned
to each input word. Typically, in practical applications, the count of nodes ranges from
tens to hundreds.
Every word in the dataset is fed into each node with a specific weight, aiming to
facilitate the model’s learning of word relationships without increasing the system’s
complexity.
Figure 5-3. Input words are processed through a hidden layer in a neural network
(Source: http://medium.com)
In the initial stage, as outlined, each word is linked to nodes through weighted
connections, represented as (w₁, w₂, ... wn). These weights are initially set to random
values by the models to kick-start the learning process. The objective is to adjust and
refine these weights through numerous iterations to enhance model accuracy. Naturally,
it’s not feasible to perform mathematical operations like multiplication between a string
(word) and a number directly. Hence, a binary approach is employed where words are
encoded as 0 or 1. Specifically, the word immediately preceding the target word for
prediction is assigned a value of 1.
Figure 5-4. Detailed neural network diagram with a focus on the hidden layer
and includes specific weights for connections
Feedforward Layers
Feedforward layers in a neural network are layers where the connections between nodes
do not form cycles. Each node in one layer connects only to nodes in the next layer,
allowing information to flow in one direction – from input to output. This structure is
fundamental in neural networks for tasks like classification and regression.
Feedforward Phase
During this phase, the network processes the input by advancing it through the layers.
Hidden layers calculate the weighted sum of inputs and then apply an activation
function (e.g., ReLU, Sigmoid, TanH) to incorporate nonlinearity, continually forwarding
the data until the output layer is reached and a prediction is made.
Backpropagation Phase
Following a prediction, the network evaluates the discrepancy between the actual output
and the expected output. This error is then propagated backward through the network.
To reduce the error, the network employs a gradient descent optimization technique,
adjusting the weights across connections accordingly.
The colored circles represent neurons, which are essentially mathematical functions
calculating a weighted sum of their inputs. The strength of the feedforward layer lies
in its extensive network of connections. Although illustrated with a modest number
of neurons, the actual scale in GPT-3’s feedforward layers is vastly larger, featuring
12,288 neurons in the output layer to match its 12,288-dimensional word vectors, and a
staggering 49,152 neurons in the hidden layer.
In GPT-3’s most comprehensive version, the hidden layer alone comprises 49,152
neurons, each receiving 12,288 inputs, equating to the same number of weight parameters
per neuron. The output layer contains 12,288 neurons, each with 49,152 inputs, resulting
in 49,152 weight parameters per neuron. This configuration results in over 1.2 billion
weight parameters per feedforward layer, and with 96 such layers, the total parameter
count reaches 116 billion. This massive number represents nearly two-thirds of GPT-3’s
total parameter count of 175 billion.
Research by Tel Aviv University in 2020¹ revealed that feedforward layers function
through pattern recognition: each neuron in the hidden layer identifies specific patterns
within the input text. For instance, in a 16-layer GPT-2 model, neurons were found to
recognize patterns ranging from word sequences ending with “substitutes” to those
pertaining to military bases, time intervals, and television shows, showcasing an increase
in the abstraction level of recognized patterns across layers. Early layers pinpointed
specific words, while deeper layers identified phrases within broader semantic fields.
This insight is particularly fascinating given that the feedforward layer processes
each word individually. For example, when identifying a phrase as television related, it
relies on the vector for “archived” without direct access to associated words like “NBC”
or “daytime”. This suggests that attention mechanisms preceding the feedforward layer
integrate contextual information into the vector, enabling pattern recognition.
Upon recognizing a pattern, a neuron enriches the word vector with additional
information, which, while sometimes abstract, can often hint at a probable
following word.
¹ Mor Geva, Roei Schuster, Jonathan Berant, Omer Levy, "Transformer Feed-Forward Layers Are Key-Value Memories"
Recurrent Layers
The last layer of large language models (LLMs) is known as the recurrent layer. This layer
is responsible for understanding the words in the input text sequence, enabling it to grasp
the connections among various words within the sequence presented in user prompts.
Recurrent layers in large language models (LLMs) are pivotal for processing
sequential data, particularly in the domain of natural language processing (NLP). These
layers enable LLMs to handle temporal dependencies and contextual nuances in text
data, leveraging the sequential nature of language.
Hidden States
At the core of a recurrent layer is the hidden state, which is updated at each step of
the sequence. When a new input is received, the recurrent layer combines it with the
existing hidden state to produce a new hidden state. This mechanism enables the model
to accumulate knowledge about the entire sequence up to the current point, which is
essential for understanding the structure and meaning of the text.
Attention Mechanisms
Let’s explore the internal workings of a transformer in processing input text. The
mechanism involves a dual-phase approach to refresh the contextual information for
every word:
It’s crucial to note that these operations are conducted by the network as a whole
rather than by individual words. This clarification is important to highlight the
transformer’s method of analyzing text at the word level, as opposed to larger text blocks.
This strategy leverages the extensive parallel processing capabilities of contemporary
GPUs, enabling large language models (LLMs) to efficiently process extensive texts, an
aspect where previous models faced limitations.
The attention mechanism functions akin to a matchmaking system for words, where
each word generates a query vector outlining the traits of words it seeks and a key
vector detailing its own traits. The network evaluates the compatibility of key and query
vectors through dot product calculations to identify matching words, facilitating the
exchange of information between them.
Consider a scenario where a transformer determines that “his” refers to “John” in
a sentence fragment. This process involves matching the query vector for “his,” which
might be looking for a noun identifying a male person, with the key vector for “John,”
indicating it as a noun for a male person. Upon finding a match, the network transfers
information from “John” to “his.”
Transformers feature multiple “attention heads” in each attention layer, enabling
parallel processing of various information-exchange tasks. These tasks range from
associating pronouns with nouns, clarifying homonyms, to linking multi-word phrases. The
operation of attention heads is sequential, with the output from one serving as the input for
another in the next layer, often requiring multiple attention heads for complex tasks.
In the case of GPT-3’s largest variant, it consists of 96 layers, each with 96 attention
heads, culminating in 9,216 attention operations for every word prediction.
Large language models (LLMs), such as those based on the transformer architecture,
utilize sophisticated attention mechanisms to process and generate language. These
mechanisms are pivotal for understanding the context and relationships between words
in a sentence. Let’s explore the primary types of attention mechanisms used in LLMs.
Self-attention (Intra-attention)
Self-attention, a cornerstone of the transformer model, allows each word in a sentence
to attend to all other words to capture their contextual relationships. This mechanism
helps the model understand the meaning of each word within the context of the entire
sentence. It’s particularly effective in identifying dependencies and relationships,
regardless of their distance in the text.
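A compact sketch of single-head, unmasked scaled dot-product self-attention over toy word vectors, using NumPy purely for illustration (all values and dimensions are invented):
import numpy as np

d = 4                             # embedding / head dimension
X = np.random.rand(3, d)          # 3 tokens, each represented by a d-dimensional vector

Wq, Wk, Wv = (np.random.rand(d, d) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv  # query, key, and value vectors for each token

scores = Q @ K.T / np.sqrt(d)     # compatibility of each query with each key
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax over keys
output = weights @ V              # each token gathers information from all tokens

print(weights.shape, output.shape)  # (3, 3) (3, 4)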
Multi-head Attention
Multi-head attention is an extension of self-attention that allows the model to focus on
different parts of the sentence simultaneously. By dividing the attention mechanism
into multiple “heads,” the model can capture various aspects of word context, such as
syntactic and semantic relationships, in parallel. This leads to a more comprehensive
understanding of the text.
Masked Attention
Used primarily in the decoder part of the transformer, masked attention prevents
positions from attending to subsequent positions. This is essential during training to
ensure that the prediction for a particular word only depends on previously generated
words, maintaining the autoregressive property. Masked attention is key in generating
coherent and contextually appropriate text.
Sparse Attention
Sparse attention mechanisms are designed to improve efficiency and scalability for
processing long sequences of text. By selectively focusing on a subset of the input
positions, sparse attention reduces computational complexity. Models like Longformer
and BigBird implement variations of sparse attention to handle longer documents
effectively.
Global/Local Attention
Some models incorporate global and local attention mechanisms to balance between
focusing on the entire text and concentrating on specific, relevant segments. Global
attention may consider all input tokens, while local attention focuses on a neighborhood
around the current token, optimizing both performance and computational efficiency.
These attention mechanisms enable LLMs to process and understand language at an
unprecedented scale, handling complex linguistic patterns and nuances. By leveraging
these diverse attention strategies, LLMs achieve remarkable performance across a wide
range of natural language processing tasks, from translation and summarization to
question answering and creative writing.
Tokenization, a critical step in processing text for natural language models, faces
several notable challenges.
Navigating Ambiguity
Tokenization may struggle to address lexical ambiguity, where a single word carries
multiple meanings. Deciphering the correct interpretation can prove to be a
complex puzzle.
Interpreting Idioms
The segmentation of idiomatic expressions into discrete tokens risks stripping them
of their inherent meanings, as these phrases derive their significance from the specific
combination of words used together.
² Coursera, Natural Language Processing with Attention Models
6. The transformer (Figure 5-11 and Figure 5-12) selects the word
with the highest probability score, which is then fed into the
decoder to facilitate the generation of the subsequent word.
This iterative process continues until the model produces an
end token, illustrating the procedure of language translation
performed by a transformer.
Few-Shot Learning
Few-shot learning is a technique that involves training models on a significantly smaller
dataset than is typically used. This approach is a prime example of meta-learning, where
the model undergoes training across a variety of related tasks during its meta-training
stage. This process equips the model to perform effectively on new, unseen data by
leveraging a very limited number of examples.
Zero-Shot Learning
Zero-shot learning is an advanced approach in machine learning where a model, once
trained, is capable of making predictions for classes it has never seen during its training
phase. This technique draws inspiration from human cognitive abilities to recognize
and relate new information based on learned concepts, enabling machines to similarly
identify new categories.
The primary goal of zero-shot learning is to empower models to accurately classify
or recognize objects from entirely new categories without the need for direct training
on those specific classes. This capability is achieved through the transfer of knowledge
from previously learned data, emphasizing the model’s ability to understand and apply
semantic relationships and attributes to new, unseen data.
Zero-shot learning focuses on the development of models that can interpret and
utilize intermediate semantic features to identify novel classes. An illustrative example of
this can be understood through the analogy of distinguishing a zebra from a horse. Even
if one has never encountered a zebra, understanding that it resembles a horse with black
and white stripes enables recognition upon first sight.
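In practice, zero-shot behavior can be demonstrated with an off-the-shelf zero-shot classification pipeline, for example, the NLI-based BART model below; the candidate labels were never part of any task-specific training:

from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The new graphics card renders 4K games at a smooth frame rate.",
    candidate_labels=["technology", "sports", "cooking"],
)
# The most likely label comes first, even though the model never saw these classes during training.
print(result["labels"][0], round(result["scores"][0], 3))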
Each of these approaches addresses the challenge of data scarcity from a different
angle, offering solutions that range from utilizing a handful of examples to making
educated guesses without any examples at all. Through these methodologies, machine
learning can achieve remarkable flexibility and adaptability, pushing the boundaries of
what’s possible even when data is limited.
Examples
Here are examples of few-shot learning, one-shot learning, and zero-shot learning in the context of large language models.
Few-Shot Learning
• Scenario: Providing a language model with a small number of examples to perform a specific task.
• Training Data: A handful of poems written by a particular poet (Poet A).
• Task: After seeing these few examples, the language model can generate new poems that mimic the style and themes of Poet A.
One-Shot Learning
• Scenario: Providing a language model with a single example to learn a task.
• Training Data: A single English phrase paired with its translation into a rare language.
• Task: After seeing this single example, the language model can translate other similar phrases from English to the rare language.
Zero-Shot Learning
• Scenario: Asking a language model to perform a task it hasn’t been
explicitly trained on by leveraging its general knowledge.
These examples illustrate how large language models can be adapted to new tasks
and contexts with varying amounts of specific training data or even just descriptive
information, showcasing the flexibility and power of modern AI systems.
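As a concrete illustration, the snippet below expresses few-shot learning purely through the prompt: two labeled examples are shown, and the model is expected to continue the pattern for a new input. It follows the same openai client style used later in this book and assumes your API key is already configured.

import openai

few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day and the screen is gorgeous."
Sentiment: Positive

Review: "It stopped working after a week and support never replied."
Sentiment: Negative

Review: "Setup took five minutes and everything just worked."
Sentiment:"""

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": few_shot_prompt}],
    max_tokens=5,
    temperature=0,
)
print(response.choices[0].message.content)  # the model is expected to continue the pattern with "Positive"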
LLM Hallucinations
AI hallucination refers to the occurrence where advanced artificial intelligence systems,
particularly large language models (LLMs) and computer vision technologies, generate
outputs that contain fabricated or nonsensical information, not grounded in reality
or the data they were trained on. This phenomenon can result in responses or visual
outputs that seem bizarre or entirely incorrect from a human perspective.
Typically, users expect AI-generated responses to accurately reflect the information
pertinent to their queries or prompts. Nonetheless, there are instances where the
AI’s algorithms produce outcomes that diverge from the underlying training data,
misinterpret the input due to flawed decoding processes, or generate responses without
a discernible logical foundation, effectively “hallucinating” the information.
The use of the term “hallucination” to describe such AI behaviors might appear
odd at first, as it anthropomorphizes machine processes with a term usually applied
to human or animal experiences. Yet this metaphor aptly conveys the unexpected and
often surreal nature of the outputs, reminiscent of how humans might discern shapes in
clouds or faces on the moon, driven by the AI’s misinterpretations caused by issues like
overfitting, biases or inaccuracies in the training data, and the inherent complexity of the
models themselves.
Instances of AI hallucinations have been noted in several high-profile cases,
underscoring the challenges inherent in deploying generative AI
technologies. Examples include Google’s Bard chatbot making unfounded claims
about astronomical discoveries, Microsoft’s AI expressing inappropriate emotional
attachments or behaviors, and Meta withdrawing its Galactica LLM demo due to the
propagation of biased or incorrect information.
Although steps have been taken to address and rectify these problems, they highlight
the potential for unexpected and sometimes problematic outcomes in the application of
AI technologies, even under optimal conditions.
Factuality Hallucinations
These occur when an LLM produces information that is factually incorrect. An example
of this would be an LLM asserting that Charles Lindbergh was the first person to land
on the moon, clearly a factual mistake. Such errors typically stem from the model’s
inadequate grasp of context and the presence of inaccuracies or misleading information
in its training data, resulting in outputs that fail to align with actual facts.
Faithfulness Hallucinations
This category encompasses situations where the content generated by an LLM deviates
from or contradicts the source material it is supposed to reflect or summarize. For
instance, during summarization tasks, if a source article mentions the FDA’s approval of
the first Ebola vaccine in 2019, a faithfulness hallucination might manifest as the model
inaccurately stating that the FDA disapproved of the vaccine (an intrinsic hallucination, which directly contradicts the source) or introducing details that never appeared in the article (an extrinsic hallucination).
Implications of AI Hallucination
The repercussions of AI hallucination are profound, especially in critical sectors such
as healthcare, where an AI misdiagnosis could mistakenly identify a harmless condition
as severe, prompting unnecessary treatments. Moreover, AI-generated inaccuracies
can fuel the dissemination of false information. Consider a scenario where AI-driven
news platforms inaccurately report on an unfolding crisis based on unchecked facts,
potentially exacerbating the situation by spreading misinformation.
A primary factor contributing to AI hallucinations is the presence of bias in the
training data. When AI systems are trained on datasets that are skewed or not fully
representative, they might generate outputs that inaccurately reflect these biases,
interpreting nonexistent patterns or characteristics.
Furthermore, AI systems are susceptible to adversarial attacks, where malicious
entities deliberately alter inputs to deceive the AI into making incorrect identifications
or decisions. In the context of image recognition, such an attack could involve the
introduction of imperceptible, specially designed noise to an image, leading the AI to
erroneously categorize it. This vulnerability is particularly alarming in areas critical to
public safety and security, including cybersecurity measures and the development of
autonomous driving technologies.
To counter these threats, AI researchers are diligently working on developing robust
defense mechanisms, such as adversarial training, which involves training AI models
on both standard and adversarially modified inputs to enhance their resilience against
such attacks. Despite these advancements, maintaining rigorous standards during the
training process and ensuring thorough verification of information remain essential to
mitigating the risks associated with AI hallucinations.
Efforts to mitigate LLM hallucinations include improving the models’ training data,
refining their architectures, and developing more sophisticated techniques for checking
and validating generated content. Additionally, user feedback and prompt engineering
(designing inputs to the model that are more likely to yield accurate and relevant
outputs) are crucial strategies for reducing the occurrence of hallucinations.
Future Implications
Although large language models (LLMs) hold the promise of transforming numerous
sectors, it’s crucial to acknowledge their constraints and the ethical concerns they
raise. Companies and professionals ought to weigh the benefits against the potential
drawbacks and hazards of implementing LLMs. Moreover, it’s imperative for creators
of these models to persistently enhance their designs, aiming to reduce biases and
augment their applicability across varied contexts.
As we navigate through the existing boundaries of large language models (LLMs),
the quest for the next breakthrough in AI is leading researchers down innovative paths. A
key insight fueling this journey is the realization that only a small fraction of the data humans interact with daily is digital, and an even smaller fraction of that is textual.
The exploration into multimodal learning stands out as a pivotal direction, merging
textual data with other forms such as images, videos, and audio. This integration
promises to unlock a richer, more complex understanding of information, enabling AI
systems to grasp and interpret human language with unprecedented depth and nuance.
Achieving this enhanced understanding necessitates progress in specialized fields
including computer vision and video analysis, potentially revolutionizing speech
recognition for more engaging and holistic AI interactions. However, the shift toward a
multimodal approach introduces its own set of challenges, particularly in handling the
increased data scale and complexity, necessitating innovative solutions to streamline
and optimize the training process.
For example, BERT (Bidirectional Encoder Representations from Transformers), developed by Google, processes text in both directions (left and right) for tasks like question answering and sentiment analysis.
Additionally, T5 (Text-to-Text Transfer Transformer), also by Google, converts all NLP
tasks into a text-to-text format, enabling a unified approach to various language tasks.
These architectures leverage the power of transformers to achieve state-of-the-art
performance in diverse NLP applications.
GPT-4
The fourth iteration in the series of foundational models by OpenAI, the Generative
Pre-trained Transformer 4 (GPT-4), is a multimodal language model that was
introduced on March 14, 2023. It is accessible through the subscription-based ChatGPT
Plus, OpenAI’s API, and Microsoft Copilot, a free chatbot service. GPT-4 employs a
transformer architecture, leveraging a combination of publicly available data and data
obtained under license from third-party providers for pre-training. This process involves
predicting subsequent tokens, which is then refined through fine-tuning with human
and AI-generated reinforcement learning feedback to ensure alignment with human
values and adherence to policy guidelines.
Compared to its predecessor, GPT-3.5, the GPT-4 version of ChatGPT is seen as an
enhancement, though it still shares some limitations of its earlier versions. A distinctive
feature of GPT-4, referred to as GPT-4V, includes the ability to process image inputs
in ChatGPT. Despite its advancements, OpenAI has chosen not to disclose specific
technical details and metrics about the model, including its exact size.
Model Characteristics
GPT-4 is capable of handling over 25,000 words of text, allowing for use cases like
long form content creation, extended conversations, and document search and analysis.
GPT-4 Limitations
Similar to its forerunners, GPT-4 sometimes produces information that is either not
present in its training data or contradicts the input provided by users, a phenomenon
often referred to as “hallucination.” Additionally, the model operates with a lack of
transparency regarding its decision-making process. Although it can offer explanations
for its responses, these are generated after the fact and may not accurately represent the
underlying reasoning process. Frequently, the explanations provided by GPT-4 can be
inconsistent with its prior responses.
In an evaluation using ConceptARC,3 a benchmark designed for assessing abstract
reasoning, GPT-4’s performance was significantly lower than expected, scoring under
33% across all categories. This was in stark contrast to specialized models and human
performance, which scored around 60% and at least 91%, respectively.
This outcome suggests that abstract reasoning, which involves understanding
complex relationships and patterns, remains a challenging domain for general-purpose
AI models like GPT-4. This benchmark focuses on abstract reasoning, a critical aspect
of human cognition that involves identifying patterns, logical rules, and relationships
among objects. GPT-4’s sub-33% performance indicates its struggles in this domain.
The specialized models, designed with specific architectures or trained on datasets
tailored for abstract reasoning, showed much better performance, scoring at 60%,
emphasizing the importance of domain-specific training and optimization.
Specialized models refer to those specifically designed and optimized to handle
particular types of tasks or domains. In the context of abstract reasoning and the
ConceptARC benchmark, specialized models demonstrated superior performance
compared to GPT-4. Here are details about these specialized models.
Table 5-1 compares the performance of humans, the top two ARC-Kaggle
competition entries, and GPT-4 on various conceptual tasks. Each task is evaluated
based on the accuracy or success rate of completing the task.
3. Arseny Moskvichev, Victor Vikram Odouard, Melanie Mitchell, The ConceptARC Benchmark: Evaluating Understanding and Generalization in the ARC Domain, https://arxiv.org/abs/2305.07141
The specialized models (first and second place in ARC-Kaggle) outperformed GPT-4
across various concept groups in the ConceptARC benchmark. Here are some examples
of their performance.
• Above and Below: First Place ARC-Kaggle scored 70%, while GPT-4
scored 23%.
• Filled and Not Filled: First Place ARC-Kaggle scored 73%, while
GPT-4 scored 17%.
Key Takeaways
Sam Bowman, a researcher not involved in the study, noted that these findings
might not necessarily point to a deficiency in abstract reasoning capabilities of GPT-4,
considering that the assessment was visually oriented and GPT-4 is primarily a
linguistic model.
A study in January 2024 by Cohen Children’s Medical Center5 researchers reported
that GPT-4 had a 17% accuracy rate in diagnosing pediatric medical conditions,
highlighting limitations in its medical diagnostic capabilities.
Regarding bias, GPT-4 underwent a two-stage training process. Initially, it was
fed vast amounts of Internet text to learn to predict the next token in a sequence.
Subsequently, it underwent refinement through reinforcement learning from human
5. JAMA Pediatr. Published online, January 2, 2024. doi: 10.1001/jamapediatrics.2023.5750
feedback, aimed at teaching the model to reject prompts that could lead to harmful
behaviors, as defined by OpenAI. This includes generating responses related to illegal
activities, self-harm, or the description of explicit content.
Researchers from Microsoft have raised concerns that GPT-4 may display certain
cognitive biases, including confirmation bias, anchoring, and base-rate neglect,
indicating potential areas where the model’s reasoning could be influenced by
inherent biases.
BERT
BERT, short for Bidirectional Encoder Representations from Transformers, is a freely
available software library developed for tasks related to natural language processing
(NLP). Introduced in 2018 by the Google AI Language team, BERT is designed to
understand the subtleties and context of human language.
Introduction to BERT
At its core, BERT utilizes a neural network architecture based on transformers to process
and interpret the nuances of human language. Unlike the traditional transformer model,
which includes both encoder and decoder components, BERT incorporates only the
encoder mechanism. This design choice underscores BERT’s focus on comprehending
input texts over generating new text.
Consider, for example, a sentence in which the word “bank” appears alongside a mention of money. A unidirectional model may settle on the wrong sense, whereas BERT furnishes a more layered interpretation, recognizing that the “bank” here pertains to a financial establishment, given the context of money. This showcases the enhanced comprehension BERT achieves by considering the entire sentence’s context, overcoming limitations seen in unidirectional models.
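A quick way to see this bidirectional behavior is the fill-mask pipeline from Hugging Face transformers; the sentence below is an illustrative stand-in for the money/bank example discussed above:

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT reads the words on both sides of [MASK], so "deposit" and "money"
# push the prediction toward the financial sense of the missing word.
for prediction in fill_mask("I went to the [MASK] to deposit my money.")[:3]:
    print(prediction["token_str"], round(prediction["score"], 3))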
T5
The “Text-to-Text Transfer Transformer” (T5) (Figure 5-15), unveiled by Google in
2020,6 represents an advanced model framework based on the transformer architecture,
specifically employing both encoder and decoder components for generating text. This
approach distinguishes it from other notable transformer-based models like BERT or
GPT, which utilize either encoder or decoder structures but not both. The innovation of
the T5 model is further underscored by its introduction of the Colossal Clean Crawled
Corpus (C4), a vast, meticulously cleaned dataset designed for pre-training the language
model through self-supervised learning techniques.
Training or fine-tuning the T5 model necessitates a pair of input and output text
sequences, enabling the execution of diverse tasks such as text generation, translation,
and more. Subsequent developments have led to various iterations of the original T5
model, including T5v1.1, which benefits from architectural improvements and exclusive
pre-training on the C4 dataset; mT5, a multilingual variant trained across 101 languages;
ByT5, which leverages byte sequences instead of subword token sequences; and LongT5,
tailored for processing extended text inputs.
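In code, the text-to-text framing means every task is selected by a plain-text prefix on the input. The sketch below uses the small public T5 checkpoint from Hugging Face purely for illustration:

from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The task is selected purely by the text prefix; the same model also accepts
# prefixes such as "summarize:".
inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))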
The T5 model’s examination of transformer architectures (Figure 5-16) revealed
three main types: the standard encoder-decoder, the decoder-only language model, and
the prefix language model, each distinguished by unique masking strategies to control
attention mechanisms. Among these, the encoder-decoder setup, characterized by its
comprehensive masking techniques, proved to be the most effective.
6. Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Cohere
Cohere has emerged as a distinguished entity within the field of natural language
processing (NLP), achieving remarkable progress since its foundation. The organization
is driven by a vision to master language comprehension, which has culminated in the
creation of an advanced language model. A pivotal achievement for Cohere has been
the integration of transformer technologies, enhancing the model’s capability to analyze
and generate text with an acute awareness of context and meaning. Additionally, Cohere
places a strong emphasis on the ethical deployment of AI, instituting comprehensive
measures to counteract bias and adhere to ethical principles, thereby ensuring the
development of both potent and principled AI tools.
At the heart of Cohere’s technological infrastructure lies the implementation of
cutting-edge neural network innovations, particularly transformer models. These
models excel in processing sequential information, grasping the intricate dynamics
between words within and across sentences. Employing attention mechanisms, Cohere’s
models adeptly prioritize significant segments of text when necessary. This meticulous
attention to context and sequence grants Cohere a superior capacity for deciphering the
subtleties of language, including its tone, style, and underlying implications. Designed
with scalability in mind, the architecture effortlessly accommodates an array of linguistic
tasks, ranging from straightforward text categorization to elaborate question-answering
frameworks.
Known for their effectiveness and scalability, Cohere’s models deliver consistent
results even under limited computational conditions. These models are exceptionally
advantageous for enterprises seeking a tailor-made solution that seamlessly meshes
with their current infrastructure without demanding extensive computational power.
Cohere’s proficiency in language comprehension and generation renders it an excellent
tool for automating customer support, analyzing sentiments, and crafting content.
In various sectors, Cohere’s AI model demonstrates its versatility. In customer
support, for instance, it powers chatbots capable of not just understanding and
addressing user inquiries but also tailoring responses to reflect the customer’s emotional
state and prior interactions. When it comes to content moderation, Cohere plays a
crucial role in efficiently screening and managing user-generated content to ensure
adherence to community standards. Moreover, in the educational technology space,
Cohere’s models are instrumental in customizing learning materials to fit individual
preferences and learning velocities, thereby transforming the e-learning landscape.
PaLM 2
Language models have revolutionized the field of natural language processing,
significantly enhancing AI’s capacity to understand and produce text that closely
mimics human communication. Among these innovative developments, the Pathways
Language Model 2 (PaLM 2) is a standout example, advancing the frontiers of linguistic
comprehension and context-aware processing.
Extensive Pre-training
The pre-training phase involves the model learning through predicting missing words,
grasping contexts, and generating coherent text across a vast dataset. This exposure
allows PaLM 2 to familiarize itself with various language patterns and nuances,
progressively honing its linguistic representation skills.
Task-Specific Fine-Tuning
While general pre-training provides a broad language foundation, fine-tuning tailors
PaLM 2 for specific applications by training it on targeted, domain-specific datasets. This
process enables the model to apply its extensive language understanding to specific real-
world tasks effectively.
Generating Outputs
After processing, PaLM 2 produces outputs tailored to the task it’s fine-tuned for,
showcasing its versatility across a range of language processing applications.
PaLM 2 signifies a monumental stride in AI, ushering in a new era of sophisticated
language understanding and generation. By incorporating advanced techniques and a
multifaceted architecture, PaLM 2 excels in adaptability and generalization, establishing
itself as a formidable tool for addressing complex linguistic tasks.
With its profound grasp of context and nuanced expression, PaLM 2 promises more
natural and human-like interactions with AI systems, enhancing user experiences across
various applications. As we move forward, the impact of PaLM 2 on the development of
conversational agents, machine translation, and text summarization will undoubtedly be
profound, marking a significant milestone in the evolution of AI technologies.
Jurassic-2
AI21’s Jurassic-2 language model comes in three variants – Jumbo, Grande, and Large
– with each offered at distinct price levels. The specifics of the model sizes are kept
confidential; however, the documentation highlights the Jumbo version as the most
potent option. These models are characterized as versatile, excelling across all types of
generative tasks. The Jurassic-2 model is proficient in seven languages and allows for
fine-tuning with specific datasets. Users can obtain an API key through the AI21 platform
and utilize the AI21() class for model access.
The J2 models have been developed using an extensive database of textual content,
equipping them with the capability to generate text that closely mimics human writing.
They excel in a wide range of complex activities, including but not limited to answering
questions, categorizing text, and more.
These models are adaptable to almost any task involving language, through the use
of prompt engineering. This involves designing a prompt that outlines the task at hand
and may include examples. They are particularly beneficial for creating advertising
content, powering conversational agents, and enhancing creative writing efforts.
While experimenting with different prompts can lead to satisfactory outcomes
for your specific needs, optimizing performance and expanding your application’s
capabilities may require training a bespoke model.
Claude v1
Claude v1, Anthropic’s inaugural release of its conversational AI assistant, marks
a significant achievement in the company’s quest to create safe artificial general
intelligence. Established by Dario Amodei and Daniela Amodei in 2021, after their
tenure at OpenAI focusing on AI safety, Anthropic is a San Francisco-based AI safety
startup. The organization is committed to crafting AI technologies that are beneficial,
nonharmful, and truthful, emphasizing the importance of aligning AI development with
human values through a safety and ethics-first research approach.
In March 2023, Anthropic introduced Claude v1 to the public, following more than a year of
development in stealth. This debut was designed to demonstrate the company’s commitment
to delivering safe and effective conversational AI tools. Claude is engineered to engage in
natural conversations while steering clear of damaging, unethical, or false exchanges.
Model Design
Anthropic crafted a bespoke neural network structure for Claude v1, based on
transformer technology commonly applied in natural language processing tasks. While
the precise size of Claude v1’s model is proprietary, its billions of parameters enable it to
handle extensive discussions across myriad subjects.
Incorporating safety measures such as Constitutional AI and Vigilance into its
architecture, Claude v1 is engineered to avoid unsafe interactions. Its primary features
include engaging in natural dialogues, providing valuable assistance, refraining from
harmful content, maintaining honesty, offering personalized experiences, and the
capacity for continuous improvement.
Falcon 40B
Falcon 40B is a member of the Falcon series of large language models (LLMs), developed
by the Technology Innovation Institute (TII). This series also features models like Falcon
7B and Falcon 180B. Specifically, Falcon 40B is a causal decoder-only model tailored for
a variety of natural language processing tasks.
This model boasts multilingual support, covering languages such as English,
German, Spanish, and French, and offers partial proficiency in additional languages like
Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish.
Model Design
The design of Falcon 40B (Figure 5-17) draws inspiration from the GPT-3 architecture
but incorporates significant enhancements to boost its performance. It introduces rotary
positional embeddings to improve its understanding of sequences. The model also
benefits from advanced attention mechanisms, including multi-query attention and
Flash Attention, alongside a decoder architecture that combines parallel attention with
Multilayer Perceptron (MLP) structures, all under a dual-layer normalization framework
to optimize computational efficiency.
Training Process
The training of Falcon 40B utilized the power of AWS SageMaker equipped with 384
A100 40GB GPUs, adopting a 3D parallelism approach (Tensor Parallelism=8, Pipeline
Parallelism=4, Data Parallelism=12) in harmony with ZeRO optimization. The training
kicked off in December 2022 and was completed over a span of two months.
For enthusiasts and developers interested in training large language models using
PyTorch, a comprehensive guide is available that walks through the entire process from
setup to execution.
LLaMA
Meta AI initiated the LLaMA (Large Language Model Meta AI) project in February
2023, introducing a series of autoregressive large language models (LLMs) that marked
a significant advancement in the field. The initial launch featured models of varying
complexities, including those with 7, 13, 33, and 65 billion parameters.
Impressively, the 13-billion-parameter version outperformed the vastly larger GPT-3,
which consists of 175 billion parameters, across numerous NLP benchmarks, while the
largest LLaMA model showed competitive results against leading models like PaLM and
Chinchilla.7 Unlike previous high-capacity LLMs that were typically available through
restricted APIs, Meta took an unprecedented step by making LLaMA’s models openly
accessible to researchers under a noncommercial license, although the model weights
were leaked online shortly after their release.
In July 2023, Meta further expanded its LLaMA offerings with the introduction of
LLaMA 2, developed in collaboration with Microsoft. This next iteration included models
with 7, 13, and 70 billion parameters,8 and although the architectural foundation remained
7. Touvron, Hugo; Lavril, Thibaut; Izacard, Gautier; Martinet, Xavier; Lachaux, Marie-Anne; Lacroix, Timothée; Rozière, Baptiste; Goyal, Naman; Hambro, Eric; Azhar, Faisal; Rodriguez, Aurelien; Joulin, Armand; Grave, Edouard; Lample, Guillaume (2023). “LLaMA: Open and Efficient Foundation Language Models”. arXiv:2302.13971 [cs.CL]
8. “Meta and Microsoft Introduce the Next Generation of LLaMA”. Meta. July 18, 2023. Retrieved July 21, 2023
similar to the first version, the data used for training was expanded by 40%. Meta’s strategy
involved not only releasing foundational models but also versions fine-tuned for dialog,
dubbed LLaMA-2 Chat, which were made available for broad commercial use, albeit with
certain restrictions that stirred debate regarding their open source status.
An evaluation by Patronus AI in November 2023 compared LLaMA 2 with other
prominent AI models like GPT-4 and Anthropic’s Claude 2 in a specialized test, revealing
strengths and weaknesses in their abilities to process and interpret complex financial
documents.
Leveraging the transformer architecture, LLaMA incorporates unique features such as
the SwiGLU activation function,9 rotary positional embeddings,10 and root-mean-squared
layer normalization11 to enhance its performance, instead of standard layer normalization.
The LLaMA 2 series further increased its context length capability, underscoring Meta’s
continuous efforts to push the boundaries of LLM efficiency and effectiveness.
Training for these models prioritized the augmentation of data volume over
parameter count, with the LLaMA 1 models being trained on a dataset comprising
1.4 trillion tokens from diverse and publicly accessible sources. The LLaMA 2 models
benefited from an even larger dataset, meticulously curated to enhance reliability
and minimize privacy concerns, alongside specialized fine-tuning to optimize dialog
interactions and ensure AI alignment through innovative training methods.
This evolution from LLaMA to LLaMA 2 not only showcases Meta’s commitment
to advancing AI technology but also emphasizes its intention to make powerful LLMs
more accessible and applicable across a range of uses, setting a new standard in the
development and deployment of language models.
LaMDA
Introduced by Google as the next iteration following Meena in 2020, LaMDA made
its debut at the Google I/O keynote in 2021, with its advancement revealed in the
subsequent year. This conversational AI model distinguishes itself by its capacity for
engaging in unrestricted dialogues.
9. Shazeer, Noam (2020-02-01), “GLU Variants Improve Transformer”. arXiv:2002.05202
10. Su, Jianlin; Lu, Yu; Pan, Shengfeng; Murtadha, Ahmed; Wen, Bo; Liu, Yunfeng (2021-04-01), “RoFormer: Enhanced Transformer with Rotary Position Embedding”. arXiv:2104.09864 [cs.CL]
11. Zhang, Biao; Sennrich, Rico (2019-10-01), “Root Mean Square Layer Normalization”. arXiv:1910.07467 [cs.LG]
Guanaco-65B
Guanaco, a large language model (LLM), employs a fine-tuning technique known as
QLoRA, created by Tim Dettmers and colleagues within the University of Washington’s
Natural Language Processing group. QLoRA enables the fine-tuning of models with as
many as 65 billion parameters on a single 48GB GPU while matching the quality of full
16-bit fine-tuning without degradation.
The Guanaco series of models surpasses the performance of all prior models on the
Vicuna benchmark. However, due to their foundation on the LLaMA model family, their
use in commercial settings is restricted.
QLoRA is a groundbreaking fine-tuning technique designed to significantly reduce
memory requirements, enabling the fine-tuning of models with up to 65 billion
parameters on a single 48GB GPU without compromising the quality of 16-bit fine-
tuning tasks. This method innovatively directs gradient backpropagation through
a statically quantized, 4-bit version of a pre-trained language model into Low Rank
Adapters (LoRA).
The premier model suite, dubbed Guanaco, sets a new standard by outshining
all previously available models on the Vicuna benchmark, achieving 99.3% of
ChatGPT’s performance with just 24 hours of fine-tuning on a solitary GPU. QLoRA’s
memory efficiency is achieved through several key innovations: the introduction of
4-bit NormalFloat (NF4), an optimally efficient data type for normally distributed
weights; Double Quantization, which further reduces memory demands by quantizing
quantization constants; and Paged Optimizers, designed to smooth out memory
usage spikes.
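With the Hugging Face transformers and bitsandbytes libraries, the NF4 and double-quantization settings described above map onto a BitsAndBytesConfig such as the one below. The base model name is only an example, and fine-tuning itself would additionally attach LoRA adapters (for instance, via the peft library):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization, the memory-saving core of QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NormalFloat 4-bit data type
    bnb_4bit_use_double_quant=True,      # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b",               # example base model; substitute your own
    quantization_config=bnb_config,
    device_map="auto",
)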
Orca
Microsoft Research unveiled Orca 2, an advanced version of the LLaMA 2 language
model, demonstrating performance on par with or surpassing models with ten times
its parameter count. This leap in efficiency is attributed to a novel training approach
involving a synthetic dataset and a technique known as Prompt Erasure.
In the development of Orca 2, a teacher-student learning framework is employed,
where a larger, more adept language model (the teacher) guides a smaller, less complex
model (the student) toward achieving performance levels akin to those of significantly
larger counterparts.
This method allows the smaller model to learn various reasoning strategies and
select the most suitable one for any given problem. The teacher model uses complex
prompts to elicit specific reasoning behaviors, but with Prompt Erasure, these prompts
are not passed to the student model. Instead, the student model receives only the task
specifications and the expected outcome.
This approach enabled a 13-billion-parameter Orca 2 model to outshine a similarly
sized LLaMA 2 model by 47.54% in benchmark tests. Moreover, the 7-billion-parameter
version of Orca 2 was found to perform on a level “better or comparable” to a 70-billion-
parameter LLaMA 2, particularly in reasoning tasks.
StableLM
Stability AI has introduced StableLM, marking a significant step forward in enhancing
language comprehension within the realm of machine learning. The launch features
two versions of StableLM, one equipped with 3 billion parameters and the other with 7
billion parameters.
The alpha release of StableLM invites users to explore and evaluate the model’s
performance, offering insights that will aid in refining its capabilities. Stability AI
is gearing up to unveil two additional models, boasting 15 billion and 65 billion
parameters, to further push the boundaries of language processing capabilities.
StableLM functions as an autoregressive model, adept at recognizing language
patterns and crafting responses based on provided inputs. It comprises a foundational
model that reliably predicts subsequent tokens and a more specialized model fine-
tuned to adhere to explicit instructions. This fine-tuning process utilizes diverse datasets
like Alpaca, GPT4All, Dolly, and HH, enhancing the model’s proficiency in delivering
customized responses and instructions.
Palmyra
Palmyra Base underwent its primary training phase focusing on English language texts,
although a small proportion of non-English content from Common Crawl was also
included in its training dataset. Employing a causal language modeling (CLM) strategy
for pretraining, Palmyra Base aligns with models like GPT-3 by incorporating only a
decoder component in its architecture. This approach to training, centered on self-
supervised causal language modeling, mirrors that of GPT-3, including the use of similar
prompts and experimental frameworks for evaluation.
Palmyra Base distinguishes itself through its remarkable speed and capability,
proving adept across a variety of sophisticated tasks, including sentiment analysis and
content summarization.
The model, Palmyra Base (5b), was developed using a proprietary dataset curated
by Writer.
Palmyra Base is engineered to internalize and represent the English language,
making it a valuable tool for deriving features applicable to a range of downstream
applications. Nonetheless, its primary strength lies in text generation based on prompts,
the task for which it was originally designed and trained.
GPT4ALL
GPT4All emerges as a pioneering open source solution aimed at enhancing accessibility
and privacy in the digital realm. Designed for users seeking a powerful, privacy-
conscious chatbot that runs on local machines without the need for sophisticated
hardware or Internet connectivity, GPT4All offers a blend of performance and privacy.
Key attributes of GPT4All include the following:
• Compact Model Sizes: The models range between 3GB and 8GB,
simplifying the download and integration process.
Ecosystem Components
The strength of GPT4All lies in its support for multiple model architectures,
including the following:
• EleutherAI’s GPT-J
• Meta’s LLaMA
• Replit
• TII’s Falcon
• BigCode’s StarCoder
These models are regularly updated to ensure they offer peak performance and
quality, with GPT-J and MPT models, in particular, demonstrating impressive results
compared to LLaMA, and ongoing innovations in MPT models suggesting exciting future
enhancements.
Developed and maintained by Nomic AI, the GPT4All project stands as a testament
to the potential for high-quality, secure, and locally operated chatbot solutions, offering
versatility for both personal and professional use across various model architectures.
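Running such a model locally typically takes only a few lines with the gpt4all Python package. The model file name below is illustrative; any model from the GPT4All catalog can be substituted:

from gpt4all import GPT4All

# Downloads the model file on first use and runs it entirely on the local machine.
model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")

with model.chat_session():
    reply = model.generate("Explain in one sentence what a vector database is.", max_tokens=100)
    print(reply)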
Summary
This chapter provides an overview of the components of large language model (LLM)
architectures, essential for transforming raw textual data into meaningful, context-aware
outputs. It begins with embedding layers, which convert tokens into continuous vector
representations through an adaptable embedding matrix.
Next, feedforward neural networks (FFNs) are introduced, highlighting their role
in processing input data through weighted sums and activation functions to recognize
complex patterns. Recurrent layers are also discussed, emphasizing their importance in
handling sequential data by maintaining hidden states and considering the context of
earlier words.
The chapter also covers attention mechanisms, such as self-attention and
multi-head attention, crucial for focusing on relevant parts of input text. Transformers
utilize these mechanisms to process extensive texts efficiently. Additionally, activation
functions and normalization techniques are discussed for enhancing neural network
performance and training stability.
In the next chapter, we explore the diverse and impactful applications of large
language models within the Python ecosystem.
CHAPTER 6
Applications of LLMs
in Python
In this chapter, we explore the diverse and impactful applications of large language
models (LLMs) within the Python ecosystem. From enhancing natural language
processing tasks to generating creative content, LLMs have become integral to many
innovative solutions. We will delve into practical use cases, examine real-world
examples, and provide insights into how these powerful models are transforming
industries and driving technological advancements.
Note You need to install the OpenAI package with the command pip install openai,
sign up on the OpenAI website, add credit (or set up automatic payments), and
generate an API key before running the following app.
import openai

openai.api_key = "YOUR-API-KEY"  # set your OpenAI API key here

def generate_blog_post(prompt):
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": "You are an experienced blog post writer."
            },
            {
                "role": "user",
                "content": f"{prompt}"
            }
        ],
        max_tokens=1000,
        temperature=0.7,
        n=1,
        stop=None
    )
    return response.choices[0].message.content

def main():
    # Provide a prompt to start generating a blog post
    user_input = input("Enter your blog post topic: ")
    length = input("How many words? ")
    prompt = f"""Write a blog post about the topic {user_input} and make it
{length} words long."""
    # Generate the post and print it
    post = generate_blog_post(prompt)
    print(post)

if __name__ == "__main__":
    main()
Note Feel free to play with the temperature parameter of the model and you will
see different nuances of the text generated.
As a result, the app will produce a blog post whose length is close to the number of
words you specified when the app prompted you for it.
While LLMs offer myriad benefits for translation, it’s essential to address associated
challenges. With the ongoing development of LLMs, efforts are underway to overcome
these obstacles, paving the way for further advancements in translation technology.
The emergence of large language models (LLMs) is poised to bring about significant
transformations within the translation sector, presenting a multitude of potential
impacts.
Elevated Quality
Leveraging massive datasets encompassing extensive text and code, LLMs are adept
at assimilating the intricacies inherent in various languages. This comprehensive
training equips them to produce translations characterized by heightened accuracy
and natural fluency. Consequently, businesses stand to benefit from translations that
resonate authentically with target audiences, fostering improved communication and
comprehension.
Pioneering Opportunities
The advent of LLMs heralds new prospects for translation endeavors, particularly in
realms that have traditionally remained underserved or overlooked. These models can
effectively navigate the translation of languages that were previously sidelined, including
minority languages and specialized technical jargon. By facilitating the translation
of content into such languages, LLMs open avenues to untapped markets, enabling
businesses to extend their reach and diversify their clientele.
Note Please note that the T5 model used here supports only a limited set of
languages. Nonetheless, this example illustrates the general procedure for utilizing
translation models in our projects.
from transformers import pipeline

text = input("Enter your text for translation: ")
source_language = input("Which language do you want to translate from: ")
target_language = input("Enter your desired language to translate to: ")
prompt = f"translate {source_language} to {target_language}: {text}"

t5_small_pipeline = pipeline(
    task="text2text-generation",
    model="t5-large",
    max_length=1000,
    model_kwargs={"cache_dir": '/content/Translation Test'},
)
print(t5_small_pipeline(prompt))
Sample output:
Enter your text for translation: Hello, how is it going for you?
Which language do you want to translate from: english
Enter your desired language to translate to: german
import openai

openai.api_key = "YOUR-API-KEY"  # replace with your own OpenAI API key

def translate(text, source_language, target_language):
    response = openai.chat.completions.create(
        model='gpt-4',
        messages=[
            {
                "role": "system",
                "content": "You are an experienced translator from one language to another."
            },
            {
                "role": "user",
                "content": f"Translate the following text from {source_language} to {target_language}:\n{text}"
            }
        ],
        max_tokens=100,
        n=1,
        stop=None,
        temperature=0.2,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0,
    )
    translation = response.choices[0].message.content
    return translation
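With the function defined above, a call might look like this:

print(translate("Hello, how is it going for you?", "English", "German"))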
Output:
openai_key = 'YOUR-API-KEY'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
}
session = requests.Session()
try:
response = session.get(article_url, headers=headers, timeout=10)
article_title = article.title
article_text = article.text
# validating whether the generated summary has at least three lines
@field_validator('summary')
def has_three_or_more_lines(cls, list_of_lines):
if len(list_of_lines) < 3:
raise ValueError("Generated summary has less than three bullet
points!")
return list_of_lines
==================
Title: {article_title}
{article_text}
==================
{format_instructions}
"""
prompt = PromptTemplate(
    template=template,
    input_variables=["article_title", "article_text"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

# Format the prompt using the article title and text obtained from scraping
formatted_prompt = prompt.format_prompt(article_title=article_title,
                                        article_text=article_text)
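The fragments above stop at the formatted prompt. A minimal sketch of the remaining steps, assuming the OpenAI completion model mentioned in the explanation below and a Pydantic model with title and summary fields, would be:

from langchain.llms import OpenAI

# Run the formatted prompt through the model and parse the structured result.
model = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0.0)
output = model(formatted_prompt.to_string())
parsed_output = parser.parse(output)

print("Title:", parsed_output.title)
print("Summary:")
for line in parsed_output.summary:
    print("-", line)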
Sample output:
Explanation
This app is an extended version of the summarizing online articles app from the
previous chapter using the OpenAI GPT-3.5 model. Before using it, you need to install
the following packages:
• langchain==0.1.4
• deeplake==3.9.11
• openai==1.10.0
• tiktoken==0.7.0
• newspaper3k==0.2.8
• pydantic==2.7.4
This is something you can do by executing the following command in your notebook
or terminal: pip install langchain==0.1.4 deeplake openai==1.10.0 tiktoken
newspaper3k pydantic. You also need an API key for OpenAI.
Here's a breakdown of what happens in the code:
6. It defines a template for the prompt using the article’s title and text
obtained from scraping.
8. It instantiates the OpenAI model class with the specified API key
and model name.
11. It prints the parsed output, displaying the title and summary of the
article.
Note OpenAI summarization
The actual summarization happens within the OpenAI model, where the input
prompt is provided along with the article details, and the model generates a
summary based on that. The Pydantic model is used to ensure the structure and
validation of the output summary.
Answer Generation
Functioning as a generative model, the LLM utilizes retrieved contexts alongside
the query to formulate responses. By calculating conditional probabilities of word
sequences, it generates contextually accurate and insightful answers.
This systematic approach underscores how generative QA (GQA) systems,
augmented with LLMs, not only retrieve relevant information but also produce
responses that deepen query understanding. These systems epitomize the next generation of question-answering technology.
1. Scrape online articles provided by the user and archive the textual
content along with the corresponding URLs.
Before proceeding, ensure you have installed the necessary packages by executing
the following command:
pip install langchain==0.1.4 deeplake openai==0.27.8 tiktoken newspaper3k==0.2.8

This command will install the required packages, including langchain version 0.1.4,
deeplake, openai version 0.27.8, and tiktoken. Additionally, it will install the newspaper3k
package, version 0.2.8.
Next, it’s essential to incorporate your OpenAI and Deep Lake API keys into the
environment variables. The LangChain library will then access these tokens and employ
them for the integrations:
import os
os.environ["OPENAI_API_KEY"] = "<YOUR-OPENAI-API-KEY>"
os.environ["ACTIVELOOP_TOKEN"] = "<YOUR-ACTIVELOOP-API-KEY>"
import requests
from newspaper import Article # https://github.com/codelucas/newspaper
import time
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82
Safari/537.36'
}
article_urls = [
    "https://www.site.com/2023/05/16/page-one/",
    "https://www.site.com/2023/05/16/page-two/",
    "https://www.site.com/2023/05/16/page-three/",
    # Add your own URLs here...
]
session = requests.Session()
pages_content = [] # where we save the scraped articles
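The download-and-parse loop itself is not reproduced above; a minimal version using newspaper3k, matching the pages_content structure used in the next steps, might look like this:

for url in article_urls:
    try:
        response = session.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            article = Article(url)
            article.download()
            article.parse()
            pages_content.append({"url": url, "text": article.text})
        else:
            print(f"Failed to fetch article at {url}")
    except Exception as e:
        print(f"Error occurred while fetching article at {url}: {e}")
    time.sleep(1)  # small pause between requests to avoid hammering the server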
Following this, we’ll proceed to compute the embeddings of our documents utilizing
an embedding model and preserve them within Deep Lake, a database capable of
handling multimodal vectors. OpenAIEmbeddings will serve as the tool for generating
vector representations of our documents.
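The LangChain imports and the Deep Lake dataset path used below do not appear in the excerpt; a plausible setup for the LangChain version mentioned earlier, with placeholder organization and dataset names, is the following:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.llms import OpenAI

# Replace with your own Activeloop organization and a dataset name of your choice.
my_activeloop_org_id = "<YOUR-ACTIVELOOP-ORG>"
my_activeloop_dataset_name = "qa_with_sources"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"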
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
db = DeepLake(dataset_path=dataset_path, embedding=embeddings)
This segment is pivotal in configuring the system to handle the storage and retrieval
of documents based on their semantic content. Such functionality stands as a linchpin
for subsequent stages, where the objective is to pinpoint the most pertinent documents
to address user queries.
Subsequently, we’ll dissect these articles into smaller segments, with each segment’s
corresponding URL being preserved as a point of reference. This segmentation aids
in streamlining data processing, rendering the retrieval task more manageable, and
directing attention toward the most pertinent text fragments when responding to
inquiries.
The RecursiveCharacterTextSplitter is instantiated with a chunk size of 1000
characters and an overlap of 100 characters between adjacent chunks. The “chunk_
size” parameter delineates the length of each text segment, while “chunk_overlap”
specifies the number of characters shared between adjacent segments. For every
document within “pages_content”, the text undergoes segmentation using the “.split_
text()” method.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
all_texts, all_metadatas = [], []
for d in pages_content:
    chunks = text_splitter.split_text(d["text"])
    for chunk in chunks:
        all_texts.append(chunk)
        all_metadatas.append({"source": d["url"]})
In the metadata dictionary, we utilize the “source” key to conform with the
expectations of the RetrievalQAWithSourcesChain class, which autonomously retrieves
this “source” item from the metadata. Subsequently, we incorporate these segmented
chunks into our Deep Lake database alongside their corresponding metadata.
db.add_texts(all_texts, all_metadatas)
Let’s dive into the exciting phase of constructing the QA Chatbot. We’ll embark
on developing a RetrievalQAWithSourcesChain, a chain designed not only to retrieve
pertinent document excerpts for answering queries but also to maintain records of the
sources associated with these documents.
llm = OpenAI(model_name="gpt-3.5-turbo-instruct",
temperature=0)
chain = RetrievalQAWithSourcesChain.from_chain_type(llm=llm,
chain_type="stuff",
retriever=db.as_retriever())
Finally, we’ll utilize the chain to generate a response to a question. This response will
encompass both the answer to the question and its associated sources.

d_response = chain({"question": "Write your question here"}, return_only_outputs=True)

print("Response:")
print(d_response["answer"])
print("Sources:")
for source in d_response["sources"].split(", "):
    print("- " + source)
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
}
article_urls = [
    "https://www.site.com/2023/05/16/page-one/",
    "https://www.site.com/2023/05/16/page-two/",
    "https://www.site.com/2023/05/16/page-three/",
    # Add your own URLs here...
]
session = requests.Session()
pages_content = []
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
db = DeepLake(dataset_path=dataset_path, embedding=embeddings)
db.add_texts(all_texts, all_metadatas)
chain = RetrievalQAWithSourcesChain.from_chain_type(llm=llm,
                                                    chain_type="stuff",
                                                    retriever=db.as_retriever())
print("Response:")
print(d_response["answer"])
print("Sources:")
for source in d_response["sources"].split(", "):
print("- " + source)
Example output:
Sources:
https://developers.google.com/search/docs/specialty/ecommerce/pagination-
and-incremental-page-loading
Model Selection
The initial step is crucial – selecting the right language model. Choices abound, from the
likes of GPT-3, GPT-Neo, GPT-2, to BERT, each differing in size, capabilities, and pre-
trained weights. Opt for a model that harmonizes with the chatbot’s intended purpose
and available resources.
Ensure you’ve installed the required packages by running the following command:
“pip install langchain==0.0.208 deeplake openai==0.27.8 tiktoken”. Now, let’s proceed to
import the necessary libraries.
These libraries offer capabilities for managing OpenAI embeddings, handling vector
storage, text segmentation, and interfacing with the OpenAI API. They facilitate the
development of a context-aware question-answering system, integrating retrieval and
text generation functionalities. Our chatbot’s database will primarily comprise articles
concerning technical issues.
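The import statements and the list of source URLs are not shown in the excerpt below; a plausible setup, based on the libraries and version named above, is the following (the URLs are placeholders for your own articles):

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import SeleniumURLLoader
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# URLs of the technical articles that will form the chatbot's knowledge base
urls = [
    "https://www.example.com/support/article-one",
    "https://www.example.com/support/article-two",
]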
loader = SeleniumURLLoader(urls=urls)
docs_not_splitted = loader.load()
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")
{chunks_formatted}
Answer to the following question from a customer. Use only information from
the previous context information. Do not invent stuff.
Question: {query}
Answer:"""
prompt = PromptTemplate(
input_variables=["chunks_formatted", "query"],
template=template,
)
# user question
# generate answer
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)
answer = llm(prompt_formatted)
print(answer)
{chunks_formatted}
Answer to the following question from a user. Use only information from the
previous context information. Do not be creative.
Question: {query}
Answer:"""
prompt = PromptTemplate(
input_variables=["chunks_formatted", "query"],
template=template,
)
# user question
query = "Your question?"
# retrieve the chunks most relevant to the user question and build the prompt
retrieved_chunks = [doc.page_content for doc in db.similarity_search(query)]
chunks_formatted = "\n\n".join(retrieved_chunks)
prompt_formatted = prompt.format(chunks_formatted=chunks_formatted, query=query)
# generate answer
llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0)
answer = llm(prompt_formatted)
print(answer)
Example output:
tensor htype shape dtype compression
------- ------- ------- ------- -------
embedding embedding (58, 1536) float32 None
id text (58, 1) str None
metadata json (58, 1) str None
text text (58, 1) str None
Understanding Prompting
Prompting involves furnishing instructions or cues to an LLM, informing it of the desired
task. This could be a simple query, an elaborate task description, or even a creative
stimulus. Clarity, brevity, and specificity in instructions are crucial for optimal outcomes.
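As a small illustration of how much specificity matters, compare a vague and a specific instruction sent through the same helper; this reuses the openai client style from earlier in the chapter and assumes your API key is configured:

import openai

def ask(prompt: str) -> str:
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=150,
        temperature=0.7,
    )
    return response.choices[0].message.content

# A vague instruction leaves the model to guess the scope and format...
print(ask("Tell me about Python."))

# ...while a specific instruction constrains both.
print(ask("In exactly three bullet points, explain why Python is popular for data science."))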
Benefits of Effective Prompting
Creative Writing
Code Generation
• Bug Fixing: “Fix the syntax error in the following code snippet.”
Translation
Summary
This chapter delves into the diverse and impactful applications of large language models
(LLMs) within the Python ecosystem, highlighting their ability to enhance natural
language processing tasks and generate creative content. The chapter covers key use
cases such as text generation, language translation, and document summarization,
explaining the underlying mechanisms and practical benefits in each domain.
It also explores the construction of advanced chatbots and virtual assistants using
LLMs, providing insights into model selection, data preprocessing, and integration
strategies. Emphasis is placed on practical applications, including customer support
automation, efficient search in unstructured documents, and knowledge management
in large organizations. The chapter concludes by discussing best practices for building
effective and secure LLM-driven solutions, ensuring continuous learning and
improvement in user experience.
In the upcoming final chapter, we will explore how to build real-life applications
using LLMs with
• LangChain
• Hugging Face
• Pinecone
• OpenAI
• Cohere
• Lamini.ai
CHAPTER 7
Harnessing Python 3.11 and Python Libraries for LLM Development
LangChain
LangChain is a public, open source platform designed to empower developers
working in the realm of artificial intelligence (AI) and machine learning. It facilitates
the integration of expansive language models with various external systems, thereby
enabling the creation of applications powered by large language models (LLMs).
LangChain’s primary objective is to forge connections between robust LLMs, such
as OpenAI’s GPT-3.5 and GPT-4, Cohere, and multiple external data sources. This
integration aims to enhance the development and utilization of natural language
processing (NLP) applications.
The framework is accessible to developers, software engineers, and data scientists
proficient in Python, JavaScript, or TypeScript, providing packages in these languages.
LangChain was initiated as a public, open source endeavor by Harrison Chase and
Ankush Gola in 2022, with its first version also being released within the same year.
The significance of LangChain lies in its ability to streamline the creation of generative
AI applications. It offers a simplified avenue for developers to build sophisticated
NLP applications by organizing and making accessible large volumes of data. This is
particularly beneficial for LLMs that need to process and access vast datasets.
LangChain Features
LangChain encompasses a suite of components designed to enhance the
development and functionality of NLP applications:
This approach enables the development of robust applications that harness the
power of language models to meet diverse and specific use cases.
import json
from dotenv import load_dotenv
load_dotenv()
import requests
from newspaper import Article
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
}
article_url = "YOUR-URL"
session = requests.Session()
try:
    response = session.get(article_url, headers=headers, timeout=10)
    if response.status_code == 200:
        # download and parse the article with newspaper3k before printing its fields
        article = Article(article_url)
        article.download()
        article.parse()
        print(f"Title: {article.title}")
        print(f"Text: {article.text}")
    else:
        print(f"Failed to fetch article at {article_url}")
except Exception as e:
    print(f"Error occurred while fetching article at {article_url}: {e}")
Sample output:
Hugging Face
While the term “Hugging Face” might evoke images of a friendly emoji for many,
within the technological community, it represents something far more significant: a
central hub akin to the “GitHub” for machine learning (ML), dedicated to the
collaborative development, training, and deployment of natural language processing
(NLP) and ML models through open source collaboration.
The standout feature of Hugging Face lies in its provision of pre-trained models.
This key innovation means that developers no longer need to initiate their projects from
the ground up; instead, they can leverage these ready-made models, adjusting them to fit
their specific requirements, thereby streamlining the development workflow.
Hugging Face serves as a vital gathering place for data scientists, researchers, and ML
engineers to share insights, solicit support, and contribute to the broader open source
movement. Identifying itself as “the AI community for building the future,” Hugging
Face’s ethos is deeply rooted in community-driven advancement.
The platform’s rapid expansion can also be attributed to its user-friendly design,
which welcomes both beginners and seasoned professionals alike. By striving to amass
the most extensive collection of NLP and ML resources, Hugging Face is on a mission to
democratize AI technology, making it widely available to an international audience.
Transformers Library
At the heart of Hugging Face is the Transformers library, a collection of cutting-
edge machine learning models tailored for NLP tasks. This library includes a wide
range of pre-trained models designed for text analysis, content generation, language
translation, and summary creation, among other applications. The introduction of the
“pipeline()” method simplifies the application of these complex models to practical
scenarios, offering an intuitive API for a variety of NLP tasks. This library is pivotal for
democratizing access to advanced NLP technologies, allowing users to easily customize
and deploy sophisticated models.
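As a minimal sketch of how pipeline() is typically used (the task and example sentence here are illustrative, not taken from the chapter's listings):
from transformers import pipeline

# downloads a default pre-trained sentiment model on first use
classifier = pipeline("sentiment-analysis")
print(classifier("Hugging Face makes working with transformers much easier."))
# prints something like [{'label': 'POSITIVE', 'score': 0.999...}]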
Additionally, the Hub is equipped with features such as versioning, commit history,
diffs, branches, and integration with over a dozen libraries. For a deeper understanding
of these functionalities, the Repositories documentation provides detailed insights.
Model Hub
Serving as the community’s hub, the Model Hub (Figure 7-1) is where users can explore
and share a plethora of models and datasets. This feature promotes a collaborative
environment for NLP development, enabling practitioners to contribute their own
models and benefit from the collective wisdom of the community. The Model Hub is
easily navigable on the Hugging Face website, featuring various filters to help users find
models suited to specific tasks. This hub is instrumental in fostering a dynamic, evolving
ecosystem where new models are regularly added and refined.
Within the extensive library of over 200,000 models, you have access to a broad
spectrum of functionalities, including the following:
Tokenizers
Essential for the preprocessing of text, tokenizers break down language into manageable
pieces, or tokens, which are then used by machine learning models to understand and
generate human language. These tokens can represent words, subwords, or characters
and are crucial for converting text into a machine-readable format. Hugging Face’s
tokenizers are optimized for compatibility with the Transformers library, ensuring
efficient text preprocessing for a variety of languages and text formats.
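For example, here is a short, hedged sketch of preprocessing text with a pre-trained tokenizer (the checkpoint name is chosen purely for illustration):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokens = tokenizer.tokenize("Tokenizers turn raw text into model-ready pieces.")
ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens)  # the subword tokens
print(ids)     # the integer IDs the model actually consumes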
Datasets Library
The Hub features a diverse collection of over 5,000 datasets spanning more than 100
languages, suitable for a wide array of applications in NLP, computer vision, and audio
analysis. It streamlines the process of discovering, downloading, and contributing
datasets. To enhance user experience, each dataset is presented with comprehensive
documentation through Dataset Cards and an interactive Dataset Preview, allowing for
in-browser exploration.
The datasets library facilitates a programmatic approach to interacting with these
datasets, making it straightforward to integrate them into your projects. This library
supports efficient data handling, enabling access to even the largest datasets that exceed
your local storage capacity through streaming technology.
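A brief, hedged sketch of that programmatic access (the dataset name is an illustrative choice); streaming=True lets you iterate over records without downloading the whole dataset:
from datasets import load_dataset

dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train", streaming=True)
for i, example in enumerate(dataset):
    print(example["text"][:80])  # first 80 characters of each record
    if i == 2:                   # stop after a few records
        break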
Example App:
First, sign up for an account on Hugging Face and then install the required packages
(note that these packages are required by the selected model and could be different for
another model).
The code begins by installing necessary Python packages using “pip”. These packages
include "torch" (PyTorch), "huggingface_hub", "accelerate", "torchaudio", "datasets",
“transformers”, and “pillow” (PIL – Python Imaging Library). These packages are
essential for working with deep learning models, datasets, and image processing.
After installing the required packages, the code imports necessary modules and
functions from these packages. Key imports include “huggingface_hub” for interacting
with the Hugging Face model hub, “transformers” for accessing pre-trained models,
“PIL.Image” for handling images, “requests” for making HTTP requests to fetch images,
“torch.nn” for neural network operations, and “matplotlib.pyplot” for plotting images.
Log in to Hugging Face Model Hub: The code logs into the Hugging Face model hub
using the “login()” function. This step is necessary if you plan to use private models or
datasets hosted on the Hugging Face platform.
import requests
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from PIL import Image
from transformers import SegformerImageProcessor, AutoModelForSemanticSegmentation

processor = SegformerImageProcessor.from_pretrained("mattmdjaga/segformer_b2_clothes")
model = AutoModelForSemanticSegmentation.from_pretrained("mattmdjaga/segformer_b2_clothes")

url = "https://www.telegraph.co.uk/content/dam/luxury/2018/09/28/L1010137_trans_NvBQzQNjv4BqZgEkZX3M936N5BQK4Va8RWtT0gK_6EfZT336f62EI5U.JPG"
# download the image and prepare the model inputs
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

outputs = model(**inputs)
logits = outputs.logits.cpu()
# upsample the low-resolution logits back to the original image size
upsampled_logits = nn.functional.interpolate(
    logits,
    size=image.size[::-1],
    mode="bilinear",
    align_corners=False,
)
pred_seg = upsampled_logits.argmax(dim=1)[0]
plt.imshow(pred_seg)
OpenAI API
The OpenAI API acts as a gateway to the advanced machine learning capabilities
developed by OpenAI, enabling seamless integration of state-of-the-art AI functionalities
into your applications. Essentially, this API functions as a conduit, granting access to
OpenAI’s sophisticated algorithms, which include capabilities for text understanding,
generation, and even code creation, all without requiring deep technical knowledge of
the models that power these abilities.
Pre-trained Models
The API offers access to a variety of pre-trained models, which have been developed
and refined by OpenAI. These models, including versions of GPT-4, GPT-3.5, and others,
are ready to be deployed for tasks ranging from text and code generation to image
creation and audio transcription. This collection also includes specialized models for
embeddings, content moderation, and more, all trained on vast datasets with substantial
computational resources, making sophisticated machine learning accessible to a wider
audience.
Key models include the following:
Scalable Infrastructure
The infrastructure behind the OpenAI API, including robust Kubernetes clusters,
ensures scalability to accommodate projects of any size. This scalability is crucial for
supporting the deployment of large models and accommodating the growth of user
projects over time.
• Image Recognition: With models like CLIP, the API extends its
capabilities to visual tasks, enabling object detection and image
classification, which have applications in fields from retail to
healthcare.
The edit endpoint requires the image you wish to edit and an RGBA mask marking the edit point, in addition to the other parameters. The variation endpoint, on the other hand, only requires the target image, the variation count, and the output size. The endpoint for generating images enables the creation of unique visuals from a textual description. With DALL-E 3, these images can be produced in dimensions of 1024x1024, 1024x1792, or 1792x1024 pixels.
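A hedged sketch of calling the image generation endpoint from Python (the prompt and parameter values are illustrative):
from openai import OpenAI

client = OpenAI(api_key="YOUR API KEY")
result = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor illustration of a lighthouse at sunrise",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # URL of the generated image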
The OpenAI API stands as a powerful tool for integrating advanced AI into a
multitude of projects, democratizing access to machine learning innovations and
fostering the development of intelligent, responsive, and personalized technologies.
from openai import OpenAI

client = OpenAI(
    api_key='YOUR API KEY'  # this is also the default, it can be omitted
)
person = input() # Write the name of the person you are interested in.
response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are a person biography summarizer."},
{"role": "user", "content": f"Summarize this biography for me {person}"},
]
)
print(response.choices[0].message.content)
Instructions
client = OpenAI(
    api_key='YOUR API KEY'  # this is also the default, it can be omitted
)
4. Prompt the user to input the name of the person they are interested in:
person = input()  # Write the name of the person you are interested in.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a person biography summarizer."},
        {"role": "user", "content": f"Summarize this biography for me {person}"},
    ]
)
After setting up the OpenAI client with the API key, the Python
script sends a request to OpenAI’s chat API to generate a summary
of the biography for the specified person.
Once the request is sent to the OpenAI API, the response contains the
generated summary, which is then extracted and printed using the print
statement.
print(response.choices[0].message.content)
Cohere
Cohere, established in 2019 and headquartered across Toronto and San Francisco with
additional offices in Palo Alto and London, operates as a global technology enterprise
with a focus on artificial intelligence solutions for businesses, particularly through the
development of sophisticated large language models. The company’s inception was the
collective effort of founders Aidan Gomez, Ivan Zhang, and Nick Frosst, all of whom
share an academic background from the University of Toronto.
Cohere Models
Cohere offers a diverse array of models tailored to meet a broad spectrum of needs. For
those seeking a more bespoke solution, there is the option to custom-train a model to
align precisely with particular requirements.
Command
The Command model serves as Cohere’s primary generation tool, designed to interpret
and execute textual commands or prompts from users. This model is not only adept at
generating text in response to instructions but also possesses conversational abilities,
making it ideal for powering chat-based applications.
Embed
The Embed models provide functionality for generating text embeddings or for
classifying text according to a set of criteria. These embeddings are useful for a range
of tasks, such as measuring the semantic similarity between sentences, selecting the
sentence most likely to succeed another, or sorting user feedback into categories.
Additionally, the Classify function within the Embed models supports various
classification or analytical tasks. The Representation model enhances these capabilities
with additional support functions, including language detection for inputs.
Rerank
Lastly, the Rerank model is designed to refine and optimize the outputs of existing
models by reordering their results based on specific criteria. This functionality is
particularly beneficial for enhancing the efficacy of search algorithms.
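A minimal, hedged sketch of reranking with the Cohere Python SDK (the model name, query, and documents are illustrative assumptions):
import cohere

co = cohere.Client("YOUR-API-KEY")
docs = [
    "Carson City is the capital city of the American state of Nevada.",
    "Washington, D.C. is the capital of the United States.",
    "Capitalization rules differ between languages.",
]
# reorder the candidate documents by relevance to the query
response = co.rerank(
    query="What is the capital of the United States?",
    documents=docs,
    top_n=2,
    model="rerank-english-v2.0",
)
print(response)  # the documents, reordered by relevance score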
import cohere
from cohere.responses.classify import Example

co = cohere.Client('YOUR-API-KEY')  # never commit a real API key to source code
examples = [
    Example("Dermatologists don't like her!", "Spam"),
    Example("'Hello, open to this?'", "Spam"),
    Example("I need help please wire me $1000 right now", "Spam"),
    Example("Nice to know you ;)", "Spam"),
    Example("Please help me?", "Spam"),
    # illustrative "Not spam" counter-examples so both classes are represented
    Example("Your parcel will be delivered today", "Not spam"),
    Example("Pre-read for tomorrow's meeting", "Not spam"),
]
# texts to classify (illustrative)
inputs = ["Confirm your email address", "hey i need u to send some $"]
response = co.classify(
    model='large',
    inputs=inputs,
    examples=examples,
)
print(response.classifications)
Output:
This Python code uses the "cohere" library to perform text classification, in this case labeling short messages as "Spam" or "Not spam". Here's a breakdown of what the code does:
• “cohere”: The main library for interfacing with the Cohere API.
• This will print out the classification result, indicating whether the
input is categorized as “Spam” or “Not spam”.
Pinecone
In today’s digital era, where rapid access to and storage of diverse information forms are
paramount, traditional relational databases fall short in managing varied data types like
documents, key-value pairs, and graphs. Enter the era of vector databases, a cutting-edge
solution that employs vectorization for enhanced search capabilities, efficient storage,
and in-depth data analysis.
Among these innovative databases, Pinecone stands out as a leading vector database
widely recognized for its ability to tackle issues related to complexity and dimensionality.
Pinecone, a vector database engineered for the cloud, excels in managing high-
dimensional vector data. At its core, Pinecone leverages the Approximate Nearest
Neighbor (ANN) search algorithm to swiftly find and rank the closest matches within
vast datasets.
This guide will delve into the intricacies of Pinecone, highlighting its key features,
challenges it addresses, and practical applications.
Pinecone’s Features
Pinecone distinguishes itself with a suite of features that cater to the needs of modern
data infrastructure:
Practical Applications
Pinecone’s utility spans across numerous sectors:
• Audio/Text Searches: Offers advanced search capabilities for text
and audio data
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR-API-KEY")
index = pc.Index("test-index")
index.upsert(
    vectors=[
        {
            "id": "vec1",
            "values": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
            "metadata": {"genre": "drama"}
        }, {
            "id": "vec2",
            "values": [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
            "metadata": {"genre": "action"}
        }, {
            "id": "vec3",
            "values": [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3],
            "metadata": {"genre": "drama"}
        }, {
            "id": "vec4",
            "values": [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4],
            "metadata": {"genre": "action"}
        }
    ],
    namespace="ns1"
)

index.query(
    namespace="ns1",
    vector=[0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3],
    top_k=2,
    include_values=True,
    include_metadata=True,
    filter={"genre": {"$eq": "action"}}
)
Output:
Lamini.ai
Lamini is at the forefront of artificial intelligence innovation, introducing an advanced
AI-powered large language model (LLM) platform designed to transform enterprise
software development. Leveraging the power of generative AI and machine learning,
Lamini offers a dynamic tool that automates workflows, enriches the software
development lifecycle, and elevates productivity levels. What sets this platform apart
is its capacity to equip developers with sophisticated tools and features, enabling the
crafting of private, tailored models that surpass the efficiency, speed, and usability of
conventional LLMs.
• Produce and refine content with AI tools that align with brand
guidelines and voice
• Simplify the coding and debugging workflow through AI models
that are adept at navigating a company’s unique code base and
programming methodologies
Example App:
In order to use Lamini, first you need to install and upgrade it in your notebook or terminal and sign up on their website to get an API key. Install and upgrade Lamini by using the following commands:
!pip install lamini
!pip install --upgrade lamini
import lamini
from lamini import LaminiClassifier  # the classifier class exposed by the lamini package

lamini.api_key = "YOUR-API-KEY"

llm = LaminiClassifier()
prompts = {
    "cat": "Cats are generally more independent and aloof than dogs, who are often more social and affectionate.",
    "dog": "Dogs are more pack-oriented and tend to be more loyal to their human family.",
}
llm.prompt_train(prompts)
# classify new texts (the two inputs referenced in the explanation below)
print(llm.predict(["I'm more independent than dogs", "woof"]))
Sample output:
['cat', 'dog']
Explanation
The preceding code utilizes the Lamini library for text classification, specifically for
classifying text into different categories or classes. Here’s a detailed breakdown of what
each part of the code does:
1. Installation of Lamini
The first two lines ("!pip install lamini" and "!pip install
--upgrade lamini") are using “pip”, a package management
system for Python, to install the Lamini library and ensure it’s up
to date.
2. Importing LaminiClassifier
• In this case, there are two categories: “cat” and “dog”, with
example texts describing characteristics or behaviors associated
with each category.
7. Making Predictions
• In this case, the model predicts the category labels for the given
texts “I’m more independent than dogs” and “woof”, based on the
training it received using the prompts and example texts.
Note Environment variables
It's recommended to add your API key as an environment variable for security purposes.
Data Acquisition
The journey to developing a robust LLM begins with the collection of a comprehensive
natural language corpus, sourced from a wide array of platforms. The diversity and
volume of data collected from these sources directly influence the model’s proficiency.
Types of Data Used
• Deduplication
• Decontamination
• Prompt control
Tools and methods like justext and trafilatura are effective for eliminating standard
web page filler while maintaining a balance between minimizing irrelevant content
(precision) and retaining all pertinent content (recall). Additionally, leveraging metadata
associated with web content can serve as an effective filter.
A simple example with trafilatura:
import trafilatura

if __name__ == "__main__":
    # Define URL of the web page to extract content from - feel free to change the URL
    url = 'https://blog.hootsuite.com/what-is-discord/'
    # download the page and extract its main text, stripping navigation and other boilerplate
    downloaded = trafilatura.fetch_url(url)
    extracted_text = trafilatura.extract(downloaded)
    print(extracted_text)
Output:
If you work in social media, you may be wondering, "What is Discord — and
wait, why should I care?"
What is the Discord app?
Servers can be public or private spaces. You can join a big community for
people who share a common interest or start a smaller private server for a
group of friends.
How did Discord get started?
Discord launched in 2015, and its initial growth was largely thanks to
its widespread adoption by gamers. However, it wasn’t until the COVID-19
pandemic that it began to attract a broader audience.
The company embraced its newfound audience, changing its motto from "Chat
for Gamers" to "Chat for Communities and Friends" in May 2020 to reflect
its more inclusive direction.
Who uses Discord now?
Source: eMarketer
1. Build community
These lfg channels accomplish two things for Fortnite. First, they build
a community around the brand by making it easier for fans to connect. And
they make it easier for players to use their product.
In this case, Discord doesn’t just help Fortnite players connect outside
the game. It improves their experience of the product itself.
2. Use roles to customize your audience’s Discord experience
(Discord roles are a defined set of permissions that you can grant to
users. They’re handy for plenty of reasons, including customizing your
community’s experience on your server)
Here are a few ways to use roles in your server:
- Flair: Use roles to give users aesthetic perks, like changing the color
of their usernames or giving them custom icons.
- Custom alerts: Use "@role" in the chat bar to notify all users with
the role. This allows you to send messages to specific segments of your
audience.
- Role-based channels: Grant users access to exclusive channels open only
to users with certain roles.
- VIP roles: Reward paying subscribers or customers with a VIP role.
Combined with role-based channels, you can make subscriber-only channels.
- Identity roles: Discord profiles are pretty bare bones. With roles,
users can let each other know what their pronouns are or what country
they’re from.
……………………………………………….
A server template provides a Discord server’s basic structure. Templates
define a server’s channels, channel topics, roles, permissions, and default
settings.
You can use one of Discord’s pre-made templates, one from a third-party
site, or create your own.
Can I advertise on Discord?
Save time managing your social media presence with Hootsuite. Publish and
schedule posts, find relevant conversions, engage the audience, measure
results, and more — all from one dashboard. Try it free today.
In this app:
• In the __main__ block, the URL of the web page, the output file path,
and the list of unwanted keywords are defined.
def segment_text(examples):
    segmented_texts = []
    for text in examples['text']:
        # Break the text into segments of 70 characters
        segmented_texts += [text[j:j + 70] for j in range(0, len(text), 70)]
    return {'segmented_texts': segmented_texts}
import pandas as pd

# df is assumed to hold the collected corpus, one document per row (illustrative data)
df = pd.DataFrame({"text": ["LLMs learn from text.", "LLMs learn from text.", "Duplicates waste compute."]})
# Drop duplicates
df = df.drop_duplicates()
unique_content = df["text"].tolist()
print(unique_content)
Output:
Data Decontamination
Ensuring the cleanliness and integrity of data in machine learning involves
straightforward practices like separating training and testing datasets. Yet, for large
language models (LLMs) drawing both training and evaluation data from the expansive
terrain of the Internet, maintaining a clear distinction becomes a complex challenge.
Simple example:
import pandas as pd
df = pd.DataFrame(data)
Output:
Text IsSensitive
2 Our support email is [email protected] False
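As a hedged sketch of the train/evaluation separation described above (the data and the exact-match criterion are illustrative; production pipelines typically rely on n-gram overlap instead):
import pandas as pd

train_df = pd.DataFrame({"Text": ["LLMs learn from large corpora.",
                                  "What is the capital of France?",
                                  "Normalization speeds up convergence."]})
eval_df = pd.DataFrame({"Text": ["What is the capital of France?"]})

def normalize(s):
    return " ".join(s.lower().split())

# drop any training row whose normalized text also appears in the evaluation set
eval_set = set(eval_df["Text"].map(normalize))
decontaminated = train_df[~train_df["Text"].map(normalize).isin(eval_set)]
print(decontaminated)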
RealToxicityPrompts1 have quantified the prevalence of toxic content within widely used
datasets, highlighting the necessity of filtering out such content to prevent perpetuating
harmful biases in model outputs.
Techniques and tools like Perspective API serve to identify and mitigate the inclusion
of toxic materials in training datasets, ensuring the resulting language models do not
propagate or amplify these biases. Nevertheless, filtering for toxicity and bias demands
meticulous consideration to avoid silencing marginalized voices or reinforcing dominant
narratives, necessitating a comprehensive analysis of the content for pejorative language
and biases related to gender, religion, and other sensitive areas before training.
Here’s a simplified approach using Python:
1. Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith, "RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models," Paul G. Allen School of Computer Science & Engineering, University of Washington; Allen Institute for Artificial Intelligence, Seattle, USA
The following is a basic outline of a Python program that incorporates these steps.
It assumes you have access to a toxicity detection API and functions for bias detection,
which you may need to implement or integrate based on available resources:
Simple example:
import pandas as pd
from your_toxicity_detection_tool import detect_toxicity
from your_bias_detection_tool import detect_bias

# Load dataset
def load_dataset(file_path):
    return pd.read_csv(file_path)

# Preprocess text
def preprocess_text(text):
    # Implement text cleaning here
    return text.lower()

# Main function
def main():
    dataset_path = 'path_to_your_dataset.csv'
    dataset = load_dataset(dataset_path)
    dataset['text'] = dataset['text'].apply(preprocess_text)
    # drop rows flagged by the (assumed) toxicity and bias detectors
    dataset = dataset[~dataset['text'].apply(detect_toxicity)]
    dataset = dataset[~dataset['text'].apply(detect_bias)]
    dataset.to_csv('filtered_dataset.csv', index=False)

if __name__ == "__main__":
    main()
Simple example:
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer

# sample data with missing values (matches the "Original DataFrame" output below)
df = pd.DataFrame({'Feature1': [1.0, 2.0, np.nan, 4.0],
                   'Feature2': [np.nan, 2.0, 3.0, 4.0],
                   'Feature3': [1.0, np.nan, 3.0, 4.0]})
print("Original DataFrame:")
print(df)
# impute missing values with the most frequent value in each column
imputer = SimpleImputer(strategy='most_frequent')
df_filled_most_frequent = pd.DataFrame(imputer.fit_transform(df),
                                       columns=df.columns)
print("\nDataFrame after imputing missing values with most frequent value:")
print(df_filled_most_frequent)
Output:
Original DataFrame:
Feature1 Feature2 Feature3
0 1.0 NaN 1.0
1 2.0 2.0 NaN
2 NaN 3.0 3.0
3 4.0 4.0 4.0
This program demonstrates four basic strategies for handling missing data:
These methods are basic and widely used, but the choice of method depends on the
dataset’s specifics and the problem at hand. More advanced techniques, such as using
models to predict missing values or employing deep learning for imputation, can also be
explored for complex datasets.
Data Normalization
Normalization plays a crucial role in standardizing the structure of dataset features to
a consistent scale, enhancing the efficiency and accuracy of machine learning models.
Techniques such as Min-Max scaling, log transformation, and z-score standardization
are commonly employed by machine learning practitioners to achieve this uniformity.
By adjusting the data to fit within a more restricted range, normalization facilitates
quicker model convergence. Research in the field of data science has revealed that
applying normalization techniques to datasets can enhance the performance of
multiclass classification models by as much as 6%.
Simple example:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# sample data (matches the output shown below)
df = pd.DataFrame({'Feature1': [1, 2, 3, 4], 'Feature2': [10, 20, 30, 40], 'Feature3': [100, 200, 300, 400]})
print("Original DataFrame:")
print(df)
# Data normalization: Min-Max scaling to [0, 1] and z-score standardization
df_minmax = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)
df_standard = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)
Output:
Original DataFrame:
Feature1 Feature2 Feature3
0 1 10 100
1 2 20 200
2 3 30 300
3 4 40 400
Explanation
For instance, if your model needs input features within a specific range, Min-Max
scaling might be more appropriate. On the other hand, if your model benefits from
features having properties of a standard normal distribution (mean=0, variance=1), then
Standardization would be the better choice.
Data Parsing
Parsing is the process of breaking down data to understand its syntax and extract useful
information. This information then becomes input for large language models (LLM). In
the realm of structured data, such as XML, JSON, or HTML, parsing is straightforward as
it involves data formats with clear organization. For natural language processing (NLP),
parsing takes on the task of deciphering the grammatical structure of sentences or
phrases, which is essential for applications like machine translation, text summarization,
and sentiment analysis.
Moreover, parsing extends to making sense of semi-structured or unstructured data
sources, including email messages, social media content, or web pages. This capability
is crucial for performing tasks like topic modeling, recognizing entities, and extracting
relationships between them.
Simple example:
import string

def preprocess_text(file_path):
    """
    This function reads a text file and preprocesses it by:
    - Removing punctuation
    - Converting to lowercase
    - Splitting into words
    """
    # Define a translation table to remove punctuation
    translator = str.maketrans('', '', string.punctuation)
    # read the file, strip punctuation, lowercase, and split into words
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()
    words = text.translate(translator).lower().split()
    return words

# print the first ten words of the parsed file (referenced in the explanation below)
print(preprocess_text('Sample_data.txt')[:10])
Output:
Sample_data.txt content:
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the
1500s, when an unknown printer took a galley of type and scrambled it to
make a type specimen book. It has survived not only five centuries, but
also the leap into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets
containing Lorem Ipsum passages, and more recently with desktop publishing
software like Aldus PageMaker including versions of Lorem Ipsum.
Explanation
• The text is then split into individual words using the split() method.
• Finally, the script prints the first ten words from the processed
dataset to show the result of the parsing.
This example is quite basic and intended for demonstration purposes. Depending
on your specific needs, you might want to include additional preprocessing steps like
removing stop words, stemming, lemmatization, or handling special text patterns
and emojis.
Tokenization
Tokenization is the process of dividing text into smaller pieces, known as tokens. These
tokens can range from individual words and subwords to characters. This segmentation
turns complex text into a simpler, structured format that the model can efficiently
process. By dissecting text into tokens, the model acquires a detailed insight into the
nuances of language and its syntax, facilitating the generation and analysis of coherent
word sequences.
Additionally, tokenization plays a critical role in establishing a vocabulary and
developing word embeddings, which are essential for the model’s ability to comprehend
and produce language. This foundational step is vital for the preprocessing of text in
large language models (LLMs), setting the stage for advanced language modeling and
understanding.
Simple example:
import nltk
from nltk.tokenize import word_tokenize

nltk.download('punkt')
# Sample text
text = "Hello, world! This is an example of tokenization for language models."
# split the text into word tokens
tokens = word_tokenize(text)
print(tokens)
Output:
Explanation
• First, the script imports nltk and the word_tokenize function from
nltk.tokenize. The word_tokenize function is designed to split text
into words using the Punkt tokenizer.
• The word_tokenize function is then called with the sample text as its
argument, which returns a list of word tokens.
• Finally, it prints the list of tokens to show the result of the tokenization.
This example demonstrates basic word-level tokenization, which is suitable for many
natural language processing (NLP) tasks. However, when working with LLMs, especially
those using models like BERT or GPT, you might use more sophisticated tokenizers like
byte pair encoding (BPE), WordPiece, or SentencePiece.
These tokenizers are capable of breaking text down into subword units, helping
the model handle a wider variety of words, including those not seen during training,
more efficiently. Many deep learning frameworks and libraries, such as Hugging Face’s
Transformers, provide easy access to these advanced tokenizers.
To perform stemming and lemmatization with NLTK, first install it with the command pip install nltk, then download the data it needs:
import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('punkt')
import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download('punkt')
# Sample text
text = "The leaves on the tree are falling quickly due to the strong wind."
# tokenize the sentence and stem each word
stemmer = PorterStemmer()
words = word_tokenize(text)
stemmed_words = [stemmer.stem(word) for word in words]
print(stemmed_words)
Output:
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download('wordnet')
# Sample text
text = "The leaves on the tree were falling quickly due to the strong winds."
# tokenize the sentence and lemmatize each word
lemmatizer = WordNetLemmatizer()
words = word_tokenize(text)
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
print(lemmatized_words)
Output:
Explanation
Word Embeddings
This process converts words or phrases into numerical vectors, positioning words with
similar meanings close together in a continuous vector space. Static word embedding
techniques like Word2Vec, GloVe, and fastText are renowned for producing these
compact, multidimensional representations of text.
Word embeddings capture the essence of word context and semantic relationships,
facilitating language models in tasks such as text classification, sentiment analysis, and
language translation by understanding word usage and associations.
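A small, hedged sketch with gensim's Word2Vec (the toy corpus and hyperparameters are illustrative; real embeddings are trained on millions of sentences):
from gensim.models import Word2Vec

sentences = [["king", "queen", "royal", "palace"],
             ["dog", "cat", "pet", "animal"],
             ["python", "code", "programming", "language"]]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=100)
print(model.wv["python"][:5])                 # first dimensions of the word vector
print(model.wv.most_similar("dog", topn=2))   # nearest neighbours in the vector space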
Contextual Embeddings
Offering a leap beyond traditional word embeddings, contextual embeddings generate
dynamic word representations based on their usage in sentences. This approach, utilized
by models like GPT and BERT, allows for the nuanced differentiation of meanings in
polysemous words (words with several meanings) and homonyms (words identical in
spelling but varying in meaning), like the word “bank,” which could denote a financial
establishment or a river’s edge.
Contextual embeddings dynamically adjust word representations to reflect their
specific context within sentences, significantly improving LLMs’ performance across a
spectrum of NLP applications by capturing the intricate variances in word meanings.
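A hedged sketch of this effect using a BERT checkpoint (the model choice is an assumption; only the idea matters): the token "bank" receives a different vector in each sentence, and the cosine similarity between the two vectors falls well below 1.0.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def token_vector(sentence, token="bank"):
    # return the contextual embedding of the given token inside the sentence
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    position = enc.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(token))
    return hidden[position]

v1 = token_vector("She deposited the check at the bank.")
v2 = token_vector("They had a picnic on the river bank.")
print(torch.nn.functional.cosine_similarity(v1, v2, dim=0).item())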
Subword Embeddings
Subword embeddings represent another innovative strategy, breaking down words into
smaller subword units or vectors. This technique proves invaluable for managing rare
or out-of-vocabulary (OOV) words, which fall outside the model’s known vocabulary.
By dissecting words into their subcomponents, the model can still attribute meaningful
representations to these unfamiliar terms.
Techniques like byte pair encoding (BPE) and WordPiece are instrumental in this
process. BPE progressively merges frequent subword pairs, whereas WordPiece divides
words into characters before combining common character pairs. These methods
adeptly grasp the morphological structures of words, boosting the model’s capacity to
handle a vast and varied vocabulary, thereby sharpening its semantic and syntactic
discernment of words.
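As a small, hedged illustration (the tokenizer checkpoints are chosen for demonstration), a rare word is split into reusable subword pieces rather than mapped to a single unknown token:
from transformers import AutoTokenizer

bpe = AutoTokenizer.from_pretrained("gpt2")                     # byte-level BPE
wordpiece = AutoTokenizer.from_pretrained("bert-base-uncased")  # WordPiece
print(bpe.tokenize("unbelievability"))        # several BPE pieces
print(wordpiece.tokenize("unbelievability"))  # WordPiece pieces, continuations marked with '##'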
To make some embeddings, first install the required libraries, for example with pip install transformers torch, and then run the following:
import torch
from transformers import AutoTokenizer, AutoModel

# load a pre-trained BERT checkpoint (an illustrative choice; any encoder model works similarly)
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

# Encode text
text = "Hello, world!"
encoded_input = tokenizer(text, return_tensors='pt')
# Get embeddings
with torch.no_grad():
    outputs = model(**encoded_input)
# The last hidden state is the sequence of hidden states of the last layer of the model
last_hidden_states = outputs.last_hidden_state
# For simplicity, we can take the mean of the last hidden state as the sentence embedding
sentence_embedding = torch.mean(last_hidden_states, dim=1)
print(sentence_embedding)
Output:
Keep in mind that this is a basic approach to obtaining sentence embeddings and
there are more sophisticated methods for specific tasks or to capture deeper semantic
meanings.
• Pattern: A go-to for extracting data from the web, Pattern merges
NLP, web scraping, and data mining functionalities for online content
analysis.
Summary
This chapter focuses on harnessing Python 3.11 and Python libraries for the
development of large language models (LLMs), introducing key frameworks such as
LangChain and Hugging Face, and detailing their features, applications, and practical
implementations.
As we conclude our journey through the fascinating world of Python and large
language models (LLMs), it’s evident how these powerful tools have revolutionized the
landscape of technology and data science. Python’s simplicity and versatility, coupled
with the transformative capabilities of LLMs, offer unprecedented opportunities for
innovation and problem-solving across various domains.
Whether you’re developing sophisticated AI applications, automating complex
workflows, or exploring new frontiers in natural language processing, the knowledge
and skills you’ve gained from this book provide a strong foundation. As you continue
to experiment, learn, and grow, remember that the fusion of creativity, curiosity,
and technical prowess is the key to unlocking the full potential of Python and LLMs.
The future is bright, and your contributions will undoubtedly shape the next wave of
technological advancements.