LARGE LANGUAGE MODELS
A Bite Sized Guide on LLMs
for Managers
Preface
Ever since ChatGPT was released to the general public in November 2022, interest in
Large Language Models has skyrocketed. Having seen the relatively high accuracy
and ease of use of ChatGPT, everyone seems convinced of the potential of
Large Language Models.
The pace at which the technology has evolved over the last 6-8 months has been
breathtaking. A lot of managers we spoke to want to develop an in-depth
understanding of this emerging field so that they can make informed decisions.
In this book, you will find our learnings from the field and our experiments with LLMs,
condensed into a short 40-page book that will help you develop a foundational
understanding of this field from different perspectives.
All you need to invest is a couple of hours of your time to read this book, and you will
come away with a deeper understanding of LLMs, their capabilities, what businesses are
doing with LLMs, and how you can infuse LLMs into your own business.
We hope that you find this book useful and enjoy reading it as much as we enjoyed
working on it.
TABLE OF CONTENTS
01 Introduction 1
02 Technology Landscape 11
03 LLMs and Your Businesses 15
04 LLMs' Practical Considerations 19
05 Generative AI and LLMs 25
01 Introduction to LLMs
In this section, we provide a collection of foundational articles
on Large Language Models (LLMs). We begin with the
fundamental concepts and move on to the evolution of LLMs.
We then delve into their operational mechanics without getting
into too many technical details. The article on fundamentals of
LLMs is aimed at helping you understand LLMs sans all the
complicated technical concepts. We also take a look at LLM-related jargon. We
conclude this section with the impact of
LLMs on various industries. After reading this section, you will
have a solid understanding of LLMs, their functionality, and
potential applications.
Large Language Models (LLMs) are a type of artificial intelligence model designed to
process, understand, and generate human-like text.
In technical terms, Large Language Models are complex neural networks trained on
massive amounts of text data. They use this data to learn patterns in language
and generate text that is similar to what a human would write.
LLMs are being used for a wide range of tasks, such as language translation, text
summarization, and generating content.
Language Models
There are two main types of language models:
1. Autoregressive models, which predict the next word in a sequence based on the
patterns they have learned.
2. Autoencoding models, which predict a missing or masked word based on the other
words in the text.
LLMs that generate responses to text queries (“prompts”) use Autoregressive Models to
predict what comes next in a piece of text, given what they've seen so far.
Language Models use Neural Networks (NNs), which are loosely modeled on the human
brain, to learn the patterns in language. Neural networks are structured in layers: they
consist of layers of nodes, or "neurons", which are interconnected. Each node takes in
input, performs a calculation, and then passes the output on to the next layer. The final
layer of nodes provides the model's prediction.
The "learning" part of a neural network happens during training, when it adjusts the
weights and biases of the connections between these nodes to minimize the difference
between the model's predictions and the actual data. These weights and biases
between neurons are called parameters. A Large Language Model has billions of
parameters. Autoregressive models predict the next word in a sequence of words based
on these parameters.
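To make this concrete, here is a minimal sketch (not from the book) of a tiny two-layer network in Python/NumPy: it runs a forward pass through weights and biases, then nudges one parameter to reduce the prediction error. The input values and target are made up; real training computes exact gradients for billions of parameters.

```python
import numpy as np

# A toy network: 3 inputs -> 4 hidden neurons -> 1 output.
# The weights and biases below are the "parameters"; an LLM has billions of them.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input layer  -> hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden layer -> output layer

def forward(x):
    hidden = np.maximum(0, x @ W1 + b1)          # each neuron: weighted sum + activation
    return float((hidden @ W2 + b2)[0])          # final layer gives the prediction

x = np.array([0.5, -1.0, 2.0])    # made-up input features
target = 1.0                       # made-up "actual data" the prediction should match

error = (forward(x) - target) ** 2
print("squared error before training step:", error)

# "Learning": nudge one parameter in the direction that reduces the error.
# Real training computes exact gradients for every parameter; this shows only the idea.
eps = 1e-4
W2[0, 0] += eps
grad = ((forward(x) - target) ** 2 - error) / eps   # numerical estimate of the gradient
W2[0, 0] -= eps                                      # undo the probe
W2[0, 0] -= 0.01 * grad                              # take a small step downhill
print("squared error after training step:", (forward(x) - target) ** 2)
```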
We can think of it this way – a Large Language Model uses billions of parameters (on the
order of 100 billion "equations") to predict which words belong at which position in a
sentence. So when ChatGPT generates a text response to your query (prompt), it is simply
predicting which word will come next in the context of your input prompt. Technically, it is
not generating anything new; it is using its learned knowledge to produce a string of words
that have the best probability of occurring together.
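The next-word prediction idea can be seen directly with a small open model. Here is a minimal sketch, assuming the Hugging Face `transformers` library and the small GPT-2 model as a stand-in (ChatGPT itself is only available through an API): it scores candidate next words and then generates text by repeating that prediction.

```python
# Minimal sketch of autoregressive next-word prediction, using the small open
# GPT-2 model from the Hugging Face `transformers` library as a stand-in.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Large Language Models are"
inputs = tokenizer(prompt, return_tensors="pt")

# The model scores every word in its vocabulary for the next position...
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]
top5 = torch.topk(logits, 5).indices
print("Most likely next words:", [tokenizer.decode(t) for t in top5])

# ...and generating text is just repeating that prediction one word at a time.
output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```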
Fig. Where LLMs fit: Artificial Neural Networks; Natural Language Processing (NLP), which includes rules-based systems and language models; Deep Learning (DL), with 3+ layers; and Large Language Models (LLMs), with more than 1 billion parameters, sitting at the intersection of deep learning and NLP.
Common LLMs
ChatGPT:
The renowned language model powering conversations and text
generation.
Vicuna 13B:
A powerful LLM designed to handle complex language tasks with its
13 billion parameters.
Bloom:
An open, multilingual LLM with 176 billion parameters, developed by the
BigScience research collaboration.
Databricks Dolly:
LLM trained on the Databricks machine learning platform. Read
more about it on the next page.
Most open source LLMs have a non-commercial license which restricts their
commercial use. However, these LLMs can be used for research and
internal projects where the organization does not make any commercial gains. For
example, an organization's internal knowledge-sharing system can be converted into a
question-answer system using an open source LLM.
Databricks
Databricks Dolly 2.0 is the first open source, instruction-following LLM, fine-tuned on
a human-generated instruction dataset, licensed for research and commercial use.
Dolly 2.0 is a 12B-parameter language model based on the EleutherAI Pythia model
family and fine-tuned exclusively on a new, high-quality, human-generated
instruction-following dataset.
Databricks has open sourced the entirety of Dolly 2.0, including the training code,
the dataset, and the model weights, all suitable for commercial use. This means that
any organization can create, own, and customize powerful LLMs that can talk to
people, without paying for API access or sharing data with third parties.
Fig. A basic neural network has interconnected artificial neurons in three layers:
Input Layer – Information enters the artificial neural network from the input layer. Input nodes process the data and pass it on to the next layer.
Hidden Layer – Hidden layers take their input from the input layer or other hidden layers. Each hidden layer analyzes the output from the previous layer, processes it further, and passes it on to the next layer.
Output Layer – The output layer gives the final result of all the data processing by the artificial neural network.
Encoder – A part of a transformer that helps it understand and remember input. Example: encoding a sentence to represent its meaning in numerical form.
Beam Search – A method to find the best word sequence when generating text. Example: choosing the most suitable words to generate a poem.
LLM Concepts
1. PROMPT & PROMPT ENGINEERING
Fig. Prompting: an input prompt (e.g., "Summarize a text of X") is given to a Large Language Model, which produces the output (e.g., the written summary).
Prompt engineering is the process of designing and refining specific text prompts to
guide Large Language Models (LLMs) in generating desired outputs. It involves crafting
clear and specific instructions to get the desired output.
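As an illustration, here is a minimal sketch assuming the `openai` Python client, an API key in the environment, and the gpt-3.5-turbo model name; the report text is a placeholder. The point is only the contrast between a vague prompt and an engineered one.

```python
# Sketch of prompt engineering against a hosted LLM (assumes the `openai`
# Python client and an OPENAI_API_KEY in the environment; model name is illustrative).
from openai import OpenAI

client = OpenAI()

vague_prompt = "Tell me about our sales."

engineered_prompt = (
    "You are a financial analyst. Summarize the quarterly sales report below "
    "in exactly three bullet points, each under 20 words, and end with one "
    "recommended action.\n\nREPORT:\n{report_text}"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": engineered_prompt.format(report_text="...")}],
)
print(response.choices[0].message.content)
```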
2. AGENTS
Agents use LLMs' outputs to decide what actions should be taken. Tools such as web
search or a calculator are packaged into a logical chain of operations. Using agents
involves a base LLM, a tool that it interacts with, and an agent to control the
interaction, as in the sketch below.
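Here is a minimal, library-free sketch of the agent pattern: the `llm()` function is a placeholder for any base model (hosted or open source), a calculator is the only tool wired in, and the agent loop routes between them. Frameworks like LangChain package the same idea more robustly.

```python
# Minimal sketch of the agent pattern: an LLM decides which tool to call,
# the agent code executes the tool, and the result is fed back to the LLM.
# `llm()` is a placeholder for any base model (hosted or open source).

def llm(prompt: str) -> str:
    # Stand-in: a real call would go to your chosen LLM here. We pretend the
    # model answers a math question by first requesting the calculator tool.
    if "TOOL RESULT" not in prompt:
        return "ACTION: calculator: 23 * 7"
    return "FINAL ANSWER: 23 * 7 = 161"

TOOLS = {"calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def run_agent(question: str, max_steps: int = 3) -> str:
    prompt = f"Question: {question}\nYou may use: {list(TOOLS)}"
    for _ in range(max_steps):
        reply = llm(prompt)
        if reply.startswith("FINAL ANSWER:"):
            return reply
        if reply.startswith("ACTION:"):
            _, tool, arg = [part.strip() for part in reply.split(":", 2)]
            result = TOOLS[tool](arg)
            prompt += f"\nTOOL RESULT ({tool}): {result}"
    return "No answer within step limit."

print(run_agent("What is 23 * 7?"))
```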
LLMs make it possible to analyze huge amounts of unstructured data using just natural
language. A well trained LLM can “read” through thousands of pages of text and come
back with the insights you asked for within just a few seconds.
LLMs have the potential to revolutionize the way tedious manual text-processing tasks
are performed across industries. LLMs are helping businesses unleash a wave of
hyper-productivity.
Over the last two years, LLM neural networks have been expanding AI's impact in fields
such as healthcare, education, finance, software and manufacturing. However, it was
ChatGPT that really created ripples in the industry with its generative capabilities.
Let’s take a look at LLM use cases that have been implemented across various industries.
LLMs in Healthcare
Use Cases:
1. Question-and-answer based enquiry about medical topics. Example: "How to
treat rashes?"
2. Interpret medical research papers and extract important results.
Example, input PubMed abstract and ask about key results.
3. Automatically generate detailed medical reports based on key inputs from
the doctor.
4. Summarize long form clinical notes such as discharge summaries, visit
summaries, pathology and other test reports into a short paragraph or
bulleted list.
LLMs in Finance
Use Cases:
Continued...
Software Engineering
Use Cases:
1. AI co-pilot to help write code which adheres to the syntax of the programming
language.
2. Code generation based on natural language prompts.
3. Documentation generation – software release notes, change logs, product
documentation can be generated using LLMs.
4. Error finding and Testing.
Education
Use Cases:
Across Industries
Use Cases:
New use cases are emerging even as this book goes to press.
02 LLM LANDSCAPE
We have established the fundamentals of Large Language
Models (LLMs) in Section 1. Let’s shift our focus to the diverse
array of operational LLMs. In this section, we take you through
the LLM landscape and help you understand which LLMs are
available, their sizes, and the companies building them. We then
cover the ways in which you can implement LLMs, giving you a
deeper understanding of the implementation approaches. This
section concludes with an important article on whether size
should be the only consideration when choosing an LLM. After
reading this section,
you will have a nuanced understanding of the LLM landscape.
# | Name | Full Form | Company | Parameters | Open/Closed Source | Notes
1 | GPT-3 | Generative Pre-trained Transformer | Open AI | 175B | Closed | GPT-4 released in March 2023 with unknown parameter size.
2 | LLaMA | - | Meta | 7-65B | Open | Downloadable model. Access only available to researchers and non-commercial personnel.
3 | LaMDA | Language Model for Dialogue Applications | Google | 173B | Closed | Designed to have more natural and engaging conversations with users.
4 | ChatGPT | Chat Generative Pre-trained Transformer | Open AI | 20B | Closed | Provides API access only.
10 | Claude | - | Anthropic | Unknown | Closed | Positioned as "Next generation AI assistant".
1. LLMs via API
You can use a closed source LLM such as ChatGPT via an API key, or an open source LLM. When you
use any LLM via an API, every query has a cost associated with it. The cost for ChatGPT is typically
around $0.002 per 1,000 tokens.
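To see how per-token pricing adds up, here is a quick back-of-the-envelope sketch; the price, token counts and query volumes below are illustrative assumptions, not quoted rates.

```python
# Illustrative cost estimate for a hosted LLM API (all numbers are assumptions).
price_per_1k_tokens = 0.002      # e.g., USD per 1,000 tokens
tokens_per_query = 500           # prompt + response, rough average
queries_per_day = 2_000

daily_cost = queries_per_day * tokens_per_query / 1_000 * price_per_1k_tokens
print(f"~${daily_cost:.2f} per day, ~${daily_cost * 30:.0f} per month")
```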
When to use:
• When you need a single model that can be used for multiple tasks.
• When you need to make predictions based on just a handful of labeled examples.
• When versatility and enterprise access are more important than latency.
Things to watch out for:
• Model parameters are not released for closed LLMs, so the model is a black box.
• These models may not work best on your specific data.
• Data privacy and security concerns exist.
2. Fine-Tuned LLMs
Fine-tuning improves the ability of the model to complete a specific task. You start with an existing LLM
and fine-tune it for your specific context. Fine-tuned models are generally smaller than their large
language model counterparts.
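For a sense of what fine-tuning looks like in practice, here is a minimal sketch assuming the Hugging Face `transformers` and `datasets` libraries, the small GPT-2 model as the base, and a hypothetical company_docs.txt file; fine-tuning a genuinely large model needs far more data and compute than this.

```python
# Sketch of fine-tuning a small base model on your own text (assumptions:
# Hugging Face transformers/datasets installed, "company_docs.txt" is your data).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token           # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

dataset = load_dataset("text", data_files={"train": "company_docs.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="fine-tuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()   # the resulting model is specialized to your context
```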
When to use:
3. Edge LLMs
Purposefully small in size, edge LLMs can take the form of fine-tuned models. Because they run offline
and on-device, there are no cloud usage fees to pay, and they offer privacy since there is no need to
connect to the cloud.
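One way this can look in practice is running a quantized model locally with the llama-cpp-python package; the sketch below assumes you have already downloaded a GGUF model file, and the file path and prompt are hypothetical.

```python
# Sketch of running an LLM fully on-device with llama-cpp-python.
# Assumes a quantized model file has been downloaded locally (path is hypothetical).
from llama_cpp import Llama

llm = Llama(model_path="./models/small-model.q4.gguf", n_ctx=2048)

out = llm(
    "Q: Summarize our refund policy in one sentence.\nA:",
    max_tokens=64,
    stop=["Q:"],
)
print(out["choices"][0]["text"].strip())
# Everything runs locally: no cloud fees, and no data leaves the device.
```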
When to use:
LLM choice should be based on your use case requirements. Larger LLMs offer
advantages such as improved performance in language generation and context
understanding. Nevertheless, they also come with drawbacks, including higher
computational demands, longer inference times, and increased costs.
However, it's important to note that smaller models may not excel at tasks involving
reasoning or mathematics. If your use case requires such capabilities, careful
evaluation of your options is necessary.
Fig. The LLM landscape: models from different AI labs/groups, both open and closed, range from millions to hundreds of billions of parameters – for example GPT-1 (117M), BERT (340M), GPT-2 (1.5B), Cedille (6B), GPT-J (6B), BlenderBot 2.0 (9.4B), Megatron (11B), Fairseq (13B), Vicuna (13B), Dolly (15B), AlexaTM (20B), GPT-NeoX (20B), UL2 (20B), Cohere (70B), YaLM (100B), GLM (130B), LaMDA (137B), GPT-3 (175B), OPT (175B), BB3 (175B), Bloom (176B), Jurassic-1 (178B), Gopher (280B), MT-NLG (530B), Minerva and PaLM (540B).
03 LLMs AND
YOUR BUSINESSES
In this section, we evaluate the transformative potential of LLMs
for business. We begin with the benefits LLMs can
drive for your businesses. Next, we cover the emerging trends in
LLMs. This article will help you understand how other businesses are
looking at LLMs. We close this section with a practical framework to
stimulate brainstorming on how LLMs could be harnessed to benefit
your unique business needs. The aim is to empower you to not just
understand, but to strategize and innovate using LLMs in your
business landscape. After reading this section, you will have a deeper
understanding of how LLMs can impact your business.
Hyper Productivity
Agility / Speed of Operation
Reduced Costs
Automation of manual tasks results in reduction of personnel
costs, faster cycle time also reduces overheads and other costs.
Improved Accuracy
Automation of any sort results in elimination of human errors.
LLM led automation also has this effect.
Enhanced Personalization
LLMs can be used for personalization at scale - something
which was not possible in the past.
Organizational Impact
Democratization of AI
Thanks to LLM based natural language querying systems
everyone can use the power of AI via simple prompt based chats.
Innovation
LLM based automation helps knowledge workers significantly
reduce time spent in non-core activities freeing up bandwidth for
innovation.
2. Companies want to start with open source LLMs for their first LLM experiments and then, based on their use cases and requirements, plan to evaluate commercial options.
3. ROI calculation is still evolving – in most cases, LLMs are saving time and improving productivity. Hyper-productivity is the dominant theme right now.
5. Prompt engineering – that is, how to pose questions to LLMs to get the best answers – is becoming a must-have skill.
“Conversations with Knowledge Base” Use Cases: Find tasks where interactions with text data or a knowledge base can speed up decision making.
“Hyperproductivity” Use Cases: Find areas where text generation, summarization and automation can be used to enhance productivity.
“Human in the loop” Use Cases: Identify workflows where computer output with 80%+ accuracy, approved by humans, can help.
“Re-imagine processes” Use Cases: Redefine the way value is delivered to stakeholders using LLMs. Think of newer customer experiences.
The best way to brainstorm about LLM use cases is to consider how LLM strengths – superior
context-based search, automated text generation, content summarization, question-and-answer
based conversational information flow, etc. – can be built into your workflows. Avoid
mission-critical use cases where anything below 100% accuracy will be an issue; in such cases,
consider a human-in-the-loop approach.
04 PRACTICAL
CONSIDERATIONS
WHILE IMPLEMENTING LLMs
This section underscores the practical aspects of implementing
Large Language Models. We provide comprehensive articles
discussing common pitfalls to avoid with LLMs and the paradigms
you need to consider while productionizing them. Additionally,
we delve into the profound influence LLMs can exert on your
organizational culture and present essential change
management considerations. Our objective here is to offer a
pragmatic approach towards LLM adoption, emphasizing its
strategic impact on your organization and its culture. After
reading this section you’ll have a well-rounded, practical
perspective on LLMs.
Mistakes to avoid
While LLMs are easy to use and most often correct, any mistakes can prove
to be very expensive. Here's our list of mistakes to avoid while using LLMs.
Productionizing LLMs
For business managers, productionizing LLMs is driven by the Return on Investment
(ROI). Everyone is talking about AI induced hyper productivity. Most LLM use cases are
evaluated on the ROI they generate in terms of time savings. In this article, we introduce
another ROI paradigm you need to consider while productionizing LLMs – the
"Conversational Insights" use case.
Conversational bots are popular because of their simplicity and ease of use. We
recommend building your own version of a conversational chatbot or AI copilot with an
open source LLM and a framework such as LangChain. You can use this framework for
deriving business insights from your own knowledge base.
We've developed an architecture where you can choose your own model (we have tried
Vicuna 13B and Databricks Dolly) and your own vector database, and connect them to your
own knowledge base. The resulting system can be queried using natural language to get
insights from your own data. This conversational chatbot will complement your BI
dashboard layer, which is generally the data consumption layer in a traditional analytics
architecture. In this approach, 80% of your insights can come from the conversational
chatbot or AI co-pilot, while the remaining 20% can continue to depend on dashboarding.
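Here is a condensed sketch of the retrieval step behind such a chatbot, assuming the sentence-transformers and faiss libraries; the documents, the question and the embedding model name are placeholders, and the final answer-generation call would go to whichever LLM you choose (e.g., Vicuna 13B or Databricks Dolly).

```python
# Condensed sketch of the "Conversational Insights" pattern: embed documents,
# store them in a vector index, retrieve the most relevant ones for a question,
# and hand them to an LLM as context. Library and model names are assumptions.
import faiss
from sentence_transformers import SentenceTransformer

documents = [                                    # placeholder: your parsed documents
    "Q3 support tickets rose 18% month over month.",
    "The travel policy caps hotel spend at $180 per night.",
    "Invoice 4312 from Acme Corp is 45 days overdue.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(documents)         # one vector per document

index = faiss.IndexFlatL2(doc_vectors.shape[1])  # the "Vector DB" in the diagram
index.add(doc_vectors)

question = "Which invoices are overdue?"
_, hits = index.search(embedder.encode([question]), 2)   # top-2 relevant documents
context = "\n".join(documents[i] for i in hits[0])

prompt = (f"Answer using only the context below.\n\nCONTEXT:\n{context}\n\n"
          f"QUESTION: {question}")
# The prompt would now be sent to your chosen open source LLM
# (e.g., Vicuna 13B or Databricks Dolly) to produce the answer.
print(prompt)
```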
Fig. Conversational Insights architecture: the organizational knowledge base (docx, pptx and xlsx documents; PDFs such as reports, policies, invoices and proposals; webpages such as project tracking, support tickets and handbooks; txt and log files such as application logs) feeds a documents parser service and a Vector DB. A document processing and query engine built on Large Language Models handles question answering, search, and query parsing and retrieval, with access management, and is exposed through a chat interface.
This approach, which focuses on LLM capabilities to drive data-driven decision making for
businesses, has several advantages over traditional BI-dashboard-driven decision making.
In addition to the time savings, it provides flexibility as well as an enhanced user experience.
On the next page, we list the advantages of the Conversational Insights approach.
Productionizing LLMs
Advantages of this approach:
2. This approach is modular so you can choose your own model and vector
database. You can even evaluate combinations to choose the one that suits
you the best.
3. The open-source models mean that you know the parameter weights, and
the model provides the transparency businesses need.
Let's now take a look at use cases driven by time savings, as most businesses are
focused on using LLMs to drive hyper-productivity.
These use cases, along with almost 90% of all other use cases, tie their ROI to time
savings: the return on investment is justified by shortened cycle times or reduced
costs due to time saved. This is the ROI paradigm businesses most commonly apply
when evaluating LLM use cases. We recommend considering both paradigms while
planning to productionize LLMs.
AI-powered hyper-productivity will become the norm in businesses, with up to 10x increases in productivity in tasks such as content writing, research, graphic design, coding etc.
Specialized human-in-the-loop systems will have to be implemented in areas where LLMs are likely to provide incorrect outputs.
• Prompt Engineering will become a key skill. Employees who can create good
prompts will be more productive.
• Employees will have to be sensitized to how LLMs work so that they fully
understand when to trust the outputs provided by LLMs and which cases to
cross-verify.
Have clear objectives – while beginning your LLM journey, clearly define your objectives.
Educate all stakeholders – while most users know how to use LLMs via a chat interface, very few understand the inner workings and the risks associated with completely relying on LLMs. Not understanding how LLMs work may be catastrophic in some cases.
Define success – what improvement are you trying to achieve in terms of:
• Productivity
• Reduced costs
• New capability development
• Process innovation
In the absence of measurable success criteria, LLM projects may end up becoming “cool toys” without strategic impact.
Identify strategies for knowledge workers to maximize the power of LLMs.
Choose the right use case – We recommend beginning with a use case where the
potential savings from LLM based workflow will fund the cost of implementing the
use case.
For example, if you have a manual process with 10 personnel which you can
automate to save 8 person-years' worth of costs, you should begin with this use case
while allocating a budget equivalent to the potential savings.
“As long as you keep the needs of people at the heart of your plan, there are many
ways to orchestrate successful and lasting change.”
05 GENERATIVE AI
AND LLMs
In this final section, we introduce you to Generative AI which is one
of the most hyped fields these days. We also include a short seven-question
quiz to check your Generative AI knowledge. We then have
some recommendations on how to leverage LLMs. Next, we share
insights into how we're aiding our existing customers in harnessing
the transformational power of LLMs and Generative AI. We close this
section with the answers to the Generative AI test. We recommend
that you take a look at our LLM accelerator and our playstore by
scanning the QR code on page 32!
Generative AI (GenAI)
What is Generative AI?
• GenAI is a type of Artificial Intelligence that creates new content based on what it has
learned from existing content.
• The process of learning from existing content is called training and results in the creation
of a statistical model.
• When given a prompt, GenAI uses this statistical model to predict what an expected
response might be, and this generates new content.
Fig. Discriminative technique: a discriminative model classifies existing content (e.g., classify an image as an airplane or a ship).
Generative AI uses Generative Models which use existing content to generate new content.
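The distinction can be seen with a minimal scikit-learn sketch on toy data: a naive Bayes classifier is a generative model (it learns how each class's data is distributed and can therefore produce new samples), while logistic regression is discriminative (it only learns the boundary between classes). The data and class labels below are illustrative assumptions.

```python
# Toy contrast between a generative and a discriminative model (scikit-learn).
import numpy as np
from sklearn.naive_bayes import GaussianNB           # generative: models p(X, Y)
from sklearn.linear_model import LogisticRegression  # discriminative: models p(Y | X)

rng = np.random.default_rng(0)
# Two made-up classes of 2-D points (think "ships" vs "airplanes").
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

gen = GaussianNB().fit(X, y)
disc = LogisticRegression().fit(X, y)

point = [[2.5, 2.5]]
print("generative model, p(Y|X):    ", gen.predict_proba(point)[0])
print("discriminative model, p(Y|X):", disc.predict_proba(point)[0])

# Because the generative model learned how each class's data looks,
# it can also be sampled to produce a new, plausible point for a class:
new_point = rng.normal(gen.theta_[1], np.sqrt(gen.var_[1]))
print("new sample generated for class 1:", new_point)
```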
Generative AI - Capabilities
Generative AI is an umbrella term for transformers, large language models, diffusion models
and other neural networks which can create text, images, music, software and more.
Fig. Capabilities of Generative AI – Input: text. Output: text (translation, summarization, chat-based information retrieval, grammar correction); image/video (image generation, video generation); audio (text to speech); decisions (insights engine).
Large Language Models based on Transformer architecture generate text output only.
For generating image/audio/video Generative Models such as Generative Adversarial
Networks (GANs), Diffusion Models, Variational Autoencoders (VAEs), and Flow-based
models are used.
For visual and multimedia artifacts, Generative Adversarial Network (GAN) technologies
are commonly used. GANs can create visual and multimedia artifacts from both imagery
and textual input data.
A generative adversarial network, or GAN, is a machine learning algorithm that pits two
neural networks – a generator and a discriminator – against each other, hence the name
"adversarial". A compact sketch of one adversarial training step follows.
Diffusion models are trained by adding Gaussian noise to images and learning how to
remove it. The learned denoising process is then applied to random seeds to generate
realistic images. Diffusion models can be used for image generation, image denoising,
inpainting, outpainting, and bit diffusion.
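Here is a minimal NumPy sketch of the forward "noising" process that diffusion models are trained to reverse; the 16-pixel "image" and the noise schedule are toy assumptions, not a real implementation.

```python
import numpy as np

# Toy illustration of the diffusion *forward* process: progressively mix an
# "image" with Gaussian noise. Training teaches a network to undo these steps.
rng = np.random.default_rng(0)
image = np.linspace(0.0, 1.0, 16)        # stand-in for real pixel values

alphas = np.linspace(0.99, 0.90, 10)     # toy noise schedule
alpha_bar = np.cumprod(alphas)           # how much signal survives after t steps

for t in [0, 4, 9]:
    noise = rng.normal(size=image.shape)
    noisy = np.sqrt(alpha_bar[t]) * image + np.sqrt(1 - alpha_bar[t]) * noise
    print(f"step {t + 1}: signal kept = {alpha_bar[t]:.2f}, "
          f"first noisy pixels = {np.round(noisy[:3], 2)}")

# Generation runs the learned reverse process: start from pure noise and
# denoise step by step until a realistic image emerges.
```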
Generative AI - Test
________ is a short piece of text that is given to the large language model as input, and it can be used to control the output of the model in many ways.
a. Input   b. Generator   c. Transformer   d. Prompt
We work with some of the most cutting-edge startups and growth-oriented SMBs. We
partner with enterprises to help them drive innovation. All these customer segments
are looking at leveraging LLMs and the power of Generative AI to drive a competitive
advantage. As the technology partner for our customers, here are 5 things we are
doing for our customers:
Generative AI - Test: Answers
1. A generative model could generate new photos of animals that look like real animals, while a discriminative model could tell a dog from a cat. In terms of probability, given a set of data instances X and a set of labels Y: generative models capture the joint probability p(X, Y), or just p(X) if there are no labels, while discriminative models capture the conditional probability p(Y | X).
2. Generative AI generates text, audio, video and images, while large language models can generate only text.
3. Prompt is the technical term used to describe the instructions provided to the generative model.
4. A foundation model (also called a base model) is a large machine learning (ML) model trained on a vast quantity of data at scale (often by self-supervised learning or semi-supervised learning), resulting in a model that can be adapted to a wide range of tasks.
7. LLMs are fundamentally next-word prediction machines. Images and videos are produced by diffusion models, which are different from LLMs.
AUTHOR
KETAN PAITHANKAR
CO-FOUNDER & CTO - KONVERGE.AI
Ketan served as a Research Assistant for Cisco Systems during his Master's
program. Notably, he received the Mayor's award in a city council meeting for his
contributions to building the m-governance platform for the City of Wichita.
Recognized as a dynamic public speaker, Ketan has delivered over 50 talks across
different platforms and industries. This book marks Ketan's third publication,
following the success of his previous books, "Accelerate your AI Product Journey"
and "Modern Data Stack."
Co-Authoring Team