
ChatGPT Mastery:

Prompt Engineering
___

Alfonso Fuertes Fuentes


CHAPTER 1: INTRODUCTION TO PROMPT ENGINEERING

CHAPTER 2: CRAFTING CLEAR AND EFFECTIVE INSTRUCTIONS

CHAPTER 3: STRATEGIES FOR ENHANCED PROMPT ENGINEERING

CHAPTER 4: GIVING THE MODEL TIME TO "THINK"

CHAPTER 5: UTILIZING EXTERNAL TOOLS

CHAPTER 6: SYSTEMATIC TESTING AND EVALUATION

CHAPTER 7: STRATEGIES FOR HANDLING LONG DOCUMENTS AND CONVERSATIONS

CHAPTER 8: GUIDING THE MODEL'S REASONING PROCESS

CHAPTER 9: LEVERAGING EXTERNAL TOOLS

CHAPTER 10: EVALUATING AND TESTING MODEL PERFORMANCE

QUIZ

ESSAY QUESTIONS

GLOSSARY OF KEY TERMS

FAQ

Chapter 1: Introduction to Prompt
Engineering
Prompt engineering is a critical skill for effectively using
large language models (LLMs) like GPT-4. It involves
designing and refining input prompts to elicit the
desired outputs from these AI systems. The quality of
the prompt directly impacts the quality of the output,
making prompt engineering both an art and a science.

Imagine interacting with an LLM as a conversation with


a highly skilled but literal-minded assistant. This
assistant possesses a vast knowledge base and
exceptional language abilities but lacks the intuition
and contextual understanding of a human. To get the
best results, you need to provide this assistant
with clear, specific, and well-structured
instructions. This is where prompt engineering comes
into play.

Prompt engineering enables users to communicate


their intentions effectively to LLMs and guide them
towards the desired output. The process involves
understanding the model's capabilities and limitations
while carefully crafting prompts that minimize ambiguity
and maximize clarity.

Here are some key reasons why prompt engineering is


crucial:

• LLMs can't read your mind: They rely solely


on the information provided in the prompt to
generate responses.

• Ambiguity leads to unpredictable outputs:


Vague or poorly structured prompts can result
in irrelevant, nonsensical, or even fabricated
answers.

• Effective prompts unlock LLM potential: A
well-crafted prompt can guide the model to
produce creative, accurate, and insightful
outputs, making it a powerful tool for various
applications.

The applications of prompt engineering are vast


and continue to expand as LLMs evolve. Some
common use cases include:

• Content Creation: Generating creative stories,


articles, marketing copy, and other written
content.

• Translation: Translating text between multiple


languages accurately and efficiently.

• Code Generation and Debugging: Writing,


debugging, and explaining code in various
programming languages.

• Question Answering and Research:


Answering questions, retrieving specific
information, and summarizing complex
documents.

• Dialogue Systems and Chatbots: Building


engaging and interactive conversational agents
for various purposes.

This book will provide a comprehensive guide to


mastering the art of prompt engineering. From basic
principles and strategies to advanced techniques and
real-world examples, you'll learn how to communicate
effectively with LLMs and harness their power to
achieve your desired outcomes.

Chapter 2: Crafting Clear and Effective
Instructions

Crafting clear and effective instructions is the


foundation of successful prompt engineering. As
discussed in the previous chapter, LLMs rely heavily on
the clarity and specificity of the prompt to generate
relevant and accurate responses. Ambiguity in
instructions can lead to unpredictable and often
undesirable results.

To ensure that the model understands your intent,


it's crucial to follow these key principles when
writing instructions:

• Be Specific: Avoid vague language and


provide precise details about the desired
output. For example, instead of asking the
model to "write about climate change," specify
the type of content, target audience, and
desired tone. A more specific instruction could
be: "Write a 500-word informative article for a
general audience about the impact of climate
change on coastal communities."

• Use Actionable Verbs: Clearly state what you


want the model to do using action verbs like
"write," "summarize," "translate," or "generate."
For instance, instead of "information about the
American Revolution," instruct the model to
"Summarize the key events of the American
Revolution in three paragraphs."

• Define the Output Format: If you have specific


formatting requirements, clearly state them in
the prompt. For instance, you can ask the
model to "Generate a list of five bullet points
outlining the benefits of solar energy" or "Write
a haiku about the beauty of nature."
• Set Expectations for Length and Style:
Guide the model by specifying the desired
length of the output, whether it's in words,
sentences, paragraphs, or bullet points.
Additionally, you can specify the desired style,
such as formal, informal, technical, or creative.

• Iterate and Refine: Don't expect to get the


perfect prompt on the first try. Experiment with
different phrasings, levels of detail, and
formatting to observe how the model responds.
Refine your instructions based on the outputs
you receive to improve the model's
performance.

By following these principles and iteratively refining your prompts, you can effectively communicate your intentions to LLMs and consistently achieve the desired outcomes.
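
To make these principles concrete, here is a minimal sketch of how a vague request can be tightened into a specific, well-structured prompt and sent to a chat model. It assumes the openai Python SDK (v1-style client) with an API key in the environment; the model name and prompts are illustrative.

# Minimal sketch: a vague request versus a specific, well-structured one.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Too vague: the model must guess the audience, length, format, and tone.
vague_prompt = "Write about climate change."

# Specific: states the task, length, audience, topic focus, tone, and structure.
specific_prompt = (
    "Write a 500-word informative article for a general audience about the "
    "impact of climate change on coastal communities. Use a neutral, factual tone "
    "and structure the article as an introduction, three body paragraphs, and a "
    "short conclusion."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": specific_prompt}],
)
print(response.choices[0].message.content)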

Chapter 3: Strategies for Enhanced
Prompt Engineering

This chapter expands on the six key strategies outlined in the sources for achieving better outcomes when working with large language models.

• Write Clear Instructions: Emphasize the


importance of providing specific details in
prompts to guide the model effectively. Include
examples like those in the sources, contrasting
vague prompts with more detailed ones that
yield better results. You can further elaborate
on:

o Tactic: Ask the Model to Adopt a


Persona: Explain how instructing the
model to respond in a specific persona
can influence the style and tone of the
output.

o Tactic: Use Delimiters: Discuss the


use of delimiters like triple quotes, tags,
or section titles to clearly separate
different parts of the input, particularly
when dealing with complex tasks.

• Provide Reference Text: Explain how


providing relevant information to the model can
improve accuracy and reduce the chances of
fabricated answers. You can explore tactics
like:

o Tactic: Instruct the Model to Answer


Using a Reference Text: Show how to
direct the model to use the provided text
as the sole basis for its response.

o Tactic: Instruct the Model to Answer With Citations From a Reference Text: Explain how to prompt the model to cite specific passages from the provided text to support its answers, increasing transparency and verifiability. A prompt sketch combining delimiters and citations appears at the end of this chapter.

• Split Complex Tasks into Simpler Subtasks:


Discuss how breaking down intricate tasks into
a series of manageable steps can reduce errors
and improve overall performance. Explain
tactics like:

o Tactic: Use Intent Classification:


Demonstrate how to categorize user
queries to provide the most relevant
instructions for each specific type of
request.

o Tactic: For Dialogue Applications


That Require Very Long
Conversations, Summarize or Filter
Previous Dialogue: Explain methods
for handling long conversations,
including summarizing previous turns or
selectively filtering relevant information.

o Tactic: Summarize Long Documents


Piecewise and Construct a Full
Summary Recursively: Describe
techniques for summarizing extensive
documents by breaking them down into
smaller sections and creating
summaries hierarchically.
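
The following sketch makes two of the tactics above concrete: triple-quote delimiters that separate a reference document from the instructions, and a request for citations from that document. The system and user strings are illustrative, and the resulting message list can be sent to any Chat Completions-style endpoint.

# Minimal sketch: delimiters plus citation-backed answers from a reference text.
reference_text = "..."  # the document the model should answer from (placeholder)

system_message = (
    "You will be provided with a document delimited by triple quotes and a question. "
    "Answer the question using only the provided document, and cite the passage(s) "
    'you used. If the document does not contain the answer, reply: "Insufficient information."'
)

user_message = f'"""{reference_text}"""\n\nQuestion: What are the key findings of this report?'

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message},
]
# `messages` can now be passed to a chat model, e.g. client.chat.completions.create(...).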

Chapter 4: Giving the Model Time to
"Think"

Given the previous chapters on prompt engineering,


we can now explore a crucial aspect of improving the
performance of large language models: giving them
adequate time to process information and reason. This
chapter will focus on the strategy of giving the model
time to "think" to enhance accuracy, especially when
dealing with complex tasks.

This concept is analogous to human problem-solving.


When faced with a challenging question, we don't
always have an immediate answer. We need time to
analyze the information, consider different
perspectives, and work through potential solutions.
Similarly, large language models can benefit from a
structured approach that allows them to "think" before
providing a final answer.

This chapter will discuss two primary tactics outlined in


the sources for implementing this strategy:

• Instruct the Model to Work Out Its Own


Solution Before Rushing to a Conclusion:
This tactic encourages the model to generate
its own solution to a problem before evaluating
other solutions. This can help identify errors
that might be missed if the model simply
focuses on assessing the correctness of a
given answer. The sources illustrate this
concept with an example of a math problem. By
first prompting the model to solve the problem
independently, it can then effectively compare
its solution to a student's solution and
accurately identify any discrepancies.

• Use Inner Monologue or a Sequence of Queries to Hide the Model's Reasoning Process: This tactic involves structuring the
model's output to separate the reasoning
process from the final answer. This is
particularly useful in scenarios where revealing
the model's thought process might be
detrimental, such as in educational applications
where students should be encouraged to solve
problems independently.

o Inner Monologue: The model can be


instructed to enclose its reasoning steps
within specific delimiters, like triple
quotes. This allows for easy parsing of
the output, enabling the removal of the
reasoning steps before presenting the
final answer to the user.

o Sequence of Queries: Alternatively, the


task can be divided into a series of
queries. The initial queries focus on
guiding the model's reasoning process,
with their outputs hidden from the user.
The final query then utilizes the model's
analysis to generate the final answer.

By implementing these tactics, you can enhance the model's ability to provide well-reasoned and accurate responses, particularly when dealing with complex or sensitive tasks.
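
As a rough illustration, the sketch below shows one way the "work out your own solution first" tactic might be phrased as a system message. The math problem and the deliberately incorrect student solution are invented placeholders.

# Minimal sketch: ask the model to solve the problem itself before judging the student.
system_message = (
    "First work out your own solution to the problem. Then compare your solution to "
    "the student's solution and evaluate whether the student's solution is correct. "
    "Do not decide whether the student's solution is correct until you have solved "
    "the problem yourself."
)

user_message = (
    "Problem statement: A store sells pencils at 3 for $1. How much do 12 pencils cost?\n\n"
    "Student's solution: 12 pencils cost $3."  # deliberately wrong; 4 groups of 3 cost $4
)

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message},
]
# The intent is that the model first derives the correct total ($4) and only then
# flags the discrepancy in the student's answer.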

Chapter 5: Utilizing External Tools

In this chapter, we will explore the strategy of using


external tools to enhance the capabilities of large
language models. As powerful as these models are,
they have limitations. They might not excel at tasks like
complex calculations or accessing real-time
information. By integrating external tools, we can
overcome these limitations and create more robust and
versatile applications.

The sources present three key tactics for incorporating


external tools:

• Use Embeddings-Based Search to


Implement Efficient Knowledge Retrieval:
This tactic addresses the limitation of fixed
context windows in language models. When
dealing with large amounts of information or
dynamic data, it's crucial to retrieve relevant
information efficiently.

o Embeddings: These are vector


representations of text that capture
semantic meaning. Similar texts have
embeddings that are closer together in
vector space.

o Knowledge Retrieval: By embedding


both a user's query and chunks of text
from a knowledge base, we can use
efficient vector search algorithms to find
the most relevant information. This
retrieved knowledge can then be
provided as context to the language
model, enabling it to generate more
informed and accurate responses. The
sources mention OpenAI Cookbook as a
resource for example implementations.

• Use Code Execution to Perform More
Accurate Calculations or Call External APIs:
Language models are not inherently reliable for
performing precise mathematical calculations or
executing code. To address this, we can
instruct the model to generate code for specific
tasks and then execute that code using a
dedicated engine.

o Calculations: By enclosing code within


delimiters like triple backticks, we can
signal to the model that this section
should be executed as code. The model
can then generate code for
mathematical operations, and the output
of this code can be fed back into the
model for further processing.

o External APIs: This tactic extends code


execution to interact with external
systems and services. By providing the
model with documentation and
examples of API usage, it can learn to
generate code that makes API calls,
retrieving real-time information or
performing actions in external systems.

o Safety Precautions: The sources


emphasize the importance of
sandboxed code execution
environments when executing code
generated by a model. This helps
mitigate potential risks associated with
running untrusted code.

• Give the Model Access to Specific


Functions: This tactic involves providing the
model with predefined functions that it can call.
The model learns to generate function arguments based on the provided function
schemas, and these arguments are used to
execute the functions. The output from these
function calls is then fed back into the model.
This approach, recommended by the sources,
streamlines the integration of external
functionality into language model applications.
The sources again point to the OpenAI
Cookbook and introductory text generation
guides for more information and examples.

By strategically employing these tactics, you can significantly expand the capabilities of large language models, allowing them to tackle a wider range of tasks and generate more accurate and informed responses.
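
As a rough illustration of the code-execution tactic, the sketch below extracts a code block that the model was asked to wrap in triple backticks so it can be handed to a separate execution engine. The sample model output is invented, and any real model-generated code should only ever run inside a sandboxed environment, as noted above.

# Minimal sketch: pull model-generated code out of triple-backtick delimiters.
import re

model_output = (
    "The total can be computed as follows:\n"
    "```python\n"
    "total = sum(i * i for i in range(1, 11))\n"
    "print(total)\n"
    "```"
)

match = re.search(r"```python\n(.*?)```", model_output, re.DOTALL)
if match:
    generated_code = match.group(1)
    # In a real application this string would be handed to a sandboxed interpreter;
    # it should never be exec()'d directly in the host process.
    print(generated_code)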

Chapter 6: Systematic Testing and
Evaluation

This chapter will focus on the importance of


systematically testing and evaluating changes made
to prompts or the overall system design when working
with large language models. While intuition and small-
scale testing can provide some insights, a more
rigorous approach is essential to ensure that
modifications lead to genuine improvements in
performance.

The sources highlight the significance of using


evaluation procedures, also known as "evals," to
objectively assess the impact of changes. A well-
designed evaluation process should be:

• Representative of Real-World Usage: The


evaluation should use test cases that reflect the
diversity and complexity of the tasks the model
will encounter in practical applications. While
this might not always be fully achievable,
striving for representativeness is crucial to
avoid overfitting to specific examples or
scenarios.

• Statistically Robust: The evaluation should include a sufficient number of test cases to provide statistically significant results. The sources offer a table that suggests the minimum sample size needed to detect differences in performance with a 95% confidence level. For instance, to reliably detect a 10% difference in performance, the evaluation should include at least 100 test cases.
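
A minimal evaluation harness along these lines might look like the sketch below. The ask_model helper and the test cases are illustrative placeholders; real evals usually need larger suites and fuzzier comparisons than an exact string match.

# Minimal sketch: score a prompt change against gold-standard answers.
test_cases = [
    {"prompt": "What is the capital of France? Answer with one word.", "gold": "Paris"},
    {"prompt": "What is 7 * 8? Answer with the number only.", "gold": "56"},
    # ... ideally 100+ cases to reliably detect a ~10% difference, per the guidance above
]

def ask_model(prompt):
    # Placeholder: call your model here (e.g. the Chat Completions API) and return its text.
    raise NotImplementedError

def run_eval():
    correct = 0
    for case in test_cases:
        if ask_model(case["prompt"]).strip() == case["gold"]:
            correct += 1
    return correct / len(test_cases)  # accuracy over the whole test suite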

The evaluation of model outputs can be conducted through various methods, including:

• Computer-Based Evaluation: This approach
utilizes computers to automatically assess
model outputs based on predetermined criteria.
It's particularly effective for tasks with objective
answers, like multiple-choice questions or
factual recall. Computers can also be used to
evaluate outputs based on subjective criteria,
such as fluency or coherence, by employing
model-based queries.

• Human Evaluation: Human judgment is often


necessary for evaluating aspects of model
output that require subjective interpretation,
such as creativity, humor, or persuasive writing.
While human evaluation can be more time-
consuming and potentially less consistent, it
remains essential for assessing qualities that
are difficult to capture through automated
metrics.

• Hybrid Evaluation: This approach combines


computer-based and human evaluation
methods to leverage the strengths of both. For
instance, a model's factual accuracy might be
assessed automatically, while its writing quality
could be evaluated by human judges.

The sources provide examples of how to design model-based evaluations. In one example, a model is tasked with evaluating whether a given text contains specific factual information. The model is provided with a set of facts and instructed to:

1. Restate each fact.

2. Find the closest citation from the text for each fact.

3. Determine whether someone unfamiliar with the topic could infer the fact from the citation.

4. Indicate with a "yes" or "no" whether the citation effectively supports the fact.

The evaluation then counts the number of "yes" responses, providing a quantitative measure of the text's factual accuracy.
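
Scoring such an eval can be as simple as counting the "yes" lines in the grading model's output, as in the small sketch below; the grader output shown is only an invented illustration of the expected format.

# Minimal sketch: count how many facts the grading model marked "yes".
grader_output = """Fact 1: restated fact ... citation ... yes
Fact 2: restated fact ... citation ... no
Fact 3: restated fact ... citation ... yes"""

yes_count = sum(
    1 for line in grader_output.splitlines() if line.strip().lower().endswith("yes")
)
print(f"{yes_count} of 3 facts supported")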

Another model-based evaluation example focuses on assessing the type of overlap between a submitted answer and an expert answer. The model is instructed to:

1. Reason step-by-step to determine the relationship between the two answers: disjoint, equal, subset, superset, or overlapping.

2. Reason step-by-step to determine whether the submitted answer contradicts the expert answer.

3. Output a JSON object indicating the type of overlap and whether a contradiction exists.

These examples illustrate how models can be used to


automate the evaluation process, particularly for tasks
where objective criteria can be defined. The sources
recommend experimenting with different model-based
evaluation approaches to determine their effectiveness
for specific use cases.

Additionally, the sources mention OpenAI Evals, an


open-source framework that provides tools for creating
automated evaluations. For those interested in
exploring this further, researching OpenAI Evals would
provide more insights into building robust evaluation
procedures.

Chapter 7: Strategies for Handling Long
Documents and Conversations

This chapter focuses on addressing the inherent


limitations of language models when dealing with
extensive amounts of text, specifically in the context of
long documents and extended conversations. The fixed
context window size of language models poses a
challenge when processing information that exceeds
this limit.

The sources present two key tactics for handling these


scenarios:

• Summarizing or Filtering Previous Dialogue


for Long Conversations: In conversational
applications where the interaction extends over
multiple turns, it becomes crucial to manage the
growing amount of text within the context
window.

o Summarization: One approach is to


periodically summarize previous
conversation turns, condensing the
information into a more concise
representation. Once the context
window reaches a predefined threshold,
a query can be initiated to summarize a
portion of the conversation history. This
summary can then replace the original
text, freeing up space within the context
window.

o Background Summarization: The


summarization process can also occur
asynchronously in the background,
continuously summarizing the
conversation as it progresses. This
ensures that the context window
remains manageable without
interrupting the flow of the conversation.

o Filtering: Another approach is to filter


previous conversation turns, retaining
only the most relevant information
based on the current query or context.
This can be achieved using techniques
like embeddings-based search to
identify the most semantically similar
previous turns.

• Summarizing Long Documents Piecewise:


When dealing with documents that exceed the
model's context window, it's impossible to
summarize the entire text in a single pass.

o Piecewise Summarization: The


sources suggest breaking down the
document into smaller sections and
summarizing each section individually.
These section summaries can then be
concatenated and summarized, creating
summaries of summaries.

o Recursive Summarization: This


process can be repeated recursively,
progressively summarizing higher-level
summaries until a final, concise
summary of the entire document is
generated.

o Running Summary: To enhance


coherence and maintain context, the
sources recommend incorporating a
running summary of preceding sections
while summarizing a particular section.
This helps preserve the overall flow of
information and ensures that later sections are interpreted within the
context of earlier content.

The effectiveness of recursive summarization techniques for summarizing books using variants of GPT-3 has been previously researched by OpenAI.
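
A piecewise, running-summary approach might be sketched as follows. The summarize helper stands in for a chat-model call, and chunking by a fixed character count is a simplification of token-aware chunking.

# Minimal sketch: summarize a long document piecewise while carrying a running summary.
def summarize(text, instruction):
    # Placeholder: send `instruction` plus `text` to a chat model and return its summary.
    raise NotImplementedError

def summarize_document(document, chunk_size=8000):
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    running_summary = ""
    for chunk in chunks:
        instruction = (
            "Running summary of the document so far:\n"
            f"{running_summary}\n\n"
            "Summarize the next section so it reads as a continuation of that summary."
        )
        section_summary = summarize(chunk, instruction)
        # Fold the new section into the running summary (a recursive summary of summaries).
        running_summary = summarize(
            running_summary + "\n" + section_summary,
            "Condense these notes into a single concise summary.",
        )
    return running_summary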

Chapter 8: Guiding the Model's
Reasoning Process

This chapter explores strategies for guiding the


reasoning process of large language models,
prompting them to approach problem-solving in a more
deliberate and structured manner. This can lead to
more accurate and reliable results, especially for tasks
that involve logical reasoning or complex calculations.

The sources outline two main tactics for achieving this:

• Instructing the Model to Work Out Its Own


Solution Before Rushing to a Conclusion:
Instead of directly asking the model for an
answer, we can guide it to first generate its own
solution step-by-step. This encourages the
model to think through the problem
independently, reducing the risk of errors or
biases introduced by a user's potentially
incorrect input.

o Example: In a scenario where a student


is asked to solve a math problem,
instead of simply evaluating the
student's solution, the model can be
instructed to:

1. Solve the problem independently. This step ensures that the model has a clear understanding of the correct solution.

2. Compare its solution to the student's solution. This allows the model to identify any discrepancies or errors in the student's reasoning.

3. Evaluate the student's solution. Based on the comparison, the model can provide feedback on the student's approach.

o Benefits: This tactic helps the model to


avoid being influenced by potentially
incorrect student solutions and
encourages a more thorough analysis of
the problem.

• Using Inner Monologue or a Sequence of


Queries to Hide the Model's Reasoning
Process: In certain situations, it might be
undesirable to reveal the model's entire
reasoning process to the user.

o Inner Monologue: This tactic involves


instructing the model to structure its
output in a way that separates the
internal reasoning steps from the final
answer. For instance, the model could
be instructed to enclose its internal
workings within specific delimiters (e.g.,
triple quotes). This allows the output to
be parsed, extracting only the relevant
information for the user while concealing
the internal reasoning.

o Sequence of Queries: An alternative


approach is to break down the problem
into a series of queries, where the
output of each query (except the final
one) is hidden from the user. This allows
the model to reason through the
problem step-by-step without revealing
its internal process.

§ Example: Consider the same
math problem scenario. We can
use a sequence of queries:

1. Present the problem statement to the model. This prompts the model to generate its solution without being influenced by the student's attempt.

2. Provide the model with both its solution and the student's solution. This allows the model to compare and evaluate the student's work.

3. Instruct the model to provide feedback to the student, drawing on its analysis in the previous step.

o Benefits: These tactics are particularly


useful in educational settings where
providing the full solution might hinder a
student's learning process. They allow
the model to act as a tutor, guiding
students toward the correct answer
without simply giving it away.

These strategies, when implemented effectively, empower language models to act not just as information providers, but also as insightful problem solvers. They enhance the model's ability to reason, analyze, and provide more nuanced and helpful responses, making them valuable tools in a wider range of applications.
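
The sketch below illustrates how an application might strip a triple-quoted inner monologue from the model's output before showing the answer to a student. The sample output string is invented for the example.

# Minimal sketch: hide the delimited reasoning, show only the student-facing hint.
model_output = (
    '"""\n'
    "The student counted 3 one-dollar groups of pencils instead of 4, so the total is $1 too low.\n"
    '"""\n'
    "Hint: Recount how many groups of 3 pencils fit into 12 pencils."
)

# Everything after the final closing delimiter is what the user actually sees.
visible_part = model_output.split('"""')[-1].strip()
print(visible_part)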

Additionally, the sources introduce a tactic for
prompting the model to review its previous work,
especially when dealing with tasks like information
extraction from large documents:

• Ask the Model if It Missed Anything on Previous Passes: Language models, due to
their limited context windows, might
prematurely stop processing information,
potentially missing relevant content. This tactic
involves prompting the model to re-examine the
source material to ensure it has extracted all
pertinent information. This can be implemented
by asking the model a follow-up question, such
as "Are there any more relevant excerpts?"
after an initial response. This encourages the
model to review its work and potentially uncover
previously overlooked information.
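
In practice, this follow-up can be implemented by appending the model's first answer and an extra user turn to the same message list, as in the sketch below; ask_chat is an illustrative placeholder for the actual API call.

# Minimal sketch: ask the model to re-check its previous extraction pass.
def ask_chat(messages):
    # Placeholder: send the full message list to a chat model and return its reply text.
    raise NotImplementedError

messages = [
    {"role": "system", "content": "Extract every excerpt relevant to the question from the document."},
    {"role": "user", "content": "<document and question go here>"},
]
first_pass = ask_chat(messages)

messages.append({"role": "assistant", "content": first_pass})
messages.append({"role": "user", "content": "Are there any more relevant excerpts? Only add ones you missed."})
second_pass = ask_chat(messages)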

Chapter 9: Leveraging External Tools

This chapter focuses on expanding the capabilities of


large language models by integrating them with
external tools. This approach allows models to access
and utilize information and functionalities that are
beyond their inherent capabilities, leading to more
powerful and versatile applications.

The sources highlight two primary strategies for


utilizing external tools:

• Embeddings-Based Search for Efficient


Knowledge Retrieval: One limitation of
language models is their fixed context window,
which restricts the amount of information they
can process at once. To overcome this, we can
use embeddings to dynamically retrieve
relevant information from external sources.

o Embeddings: Text embeddings are


vector representations of text strings
that capture semantic relationships.
Similar text strings have embeddings
that are closer together in the vector
space.

o Knowledge Retrieval: We can use


embeddings to implement efficient
knowledge retrieval by following these
steps:

1. Chunk and Embed: Divide a large text corpus into smaller chunks and embed each chunk.

2. Embed the Query: Embed the user's query.

3. Vector Search: Perform a vector search to find the embedded chunks from the corpus that are closest to the query embedding. This retrieves the most semantically relevant information.

4. Provide Context: Provide the retrieved information as context to the language model, allowing it to generate more informed responses.

o Benefits: This approach allows models


to access and utilize vast amounts of
external knowledge, overcoming the
limitations of their context window. It
enables them to provide more accurate,
comprehensive, and up-to-date
answers.

o Examples: The sources provide references to the OpenAI Cookbook for specific implementation details and examples of how to use knowledge retrieval effectively. A minimal retrieval sketch also appears at the end of this list of strategies.

• Code Execution for Accurate Calculations


and API Calls: Language models are not
inherently reliable for performing complex
calculations or interacting with external
systems. To address this, we can instruct them
to write and execute code instead of
attempting these tasks directly.

o Code Generation: The model can be


instructed to generate code within
specific delimiters (e.g., triple backticks).
The code can then be extracted and
executed by a suitable interpreter (e.g.,
a Python interpreter).

o Calculation: This technique allows


models to perform accurate arithmetic
and other calculations that would be
unreliable if performed directly by the
model.

o API Interaction: Models can also be


taught to interact with external APIs by
providing them with API documentation
and code examples. This enables them
to access and utilize a wide range of
external services and functionalities.

o Benefits: Integrating code execution


empowers language models to perform
tasks that are beyond their inherent
capabilities, making them more versatile
and powerful tools.

o Safety Precautions: The sources emphasize the importance of taking safety precautions when executing code generated by a language model. They recommend using sandboxed code execution environments to limit potential harm from untrusted code.
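
To make the retrieval steps listed earlier in this chapter concrete, here is a minimal sketch using the OpenAI embeddings endpoint and a brute-force cosine-similarity search. The model name, example chunks, and query are illustrative; production systems typically use a vector database rather than a Python loop.

# Minimal sketch: embed chunks and a query, then retrieve the closest chunk as context.
import math
from openai import OpenAI

client = OpenAI()

def embed(texts):
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in response.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

chunks = [
    "Solar panels convert sunlight into electricity.",
    "Wind turbines generate power from moving air.",
]
chunk_embeddings = embed(chunks)

query = "How do solar panels work?"
query_embedding = embed([query])[0]

best_chunk = max(zip(chunks, chunk_embeddings), key=lambda pair: cosine(query_embedding, pair[1]))[0]
# `best_chunk` would now be prepended to the prompt as retrieved context for the chat model.
print(best_chunk)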

In addition to the above, the sources mention function


calling as a recommended way of using OpenAI
models to interact with external functions:

• Function Calling: The Chat Completions API


allows developers to provide descriptions of
external functions. The model can then
generate arguments for these functions, which
are returned by the API in JSON format. These
arguments can be used to execute the functions, and the output can be fed back into
the model for further processing.

The sources provide references to the introductory text generation guide and the OpenAI Cookbook for more information on function calling and its implementation.
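
A minimal function-calling sketch with the Chat Completions API might look like the following. The get_current_weather schema, model name, and question are illustrative, and exact parameter names can vary across SDK versions.

# Minimal sketch: let the model generate JSON arguments for a described function.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": "What's the weather in Madrid?"}],
    tools=tools,
)

# A production application would first check that the model actually requested a tool call.
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
print(tool_call.function.name, args)
# The application now runs its real weather lookup with `args` and feeds the result back
# to the model as a follow-up message so it can produce the final answer.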

Chapter 10: Evaluating and Testing
Model Performance

This chapter shifts the focus to the crucial aspect of


evaluating and systematically testing the
performance of large language models, particularly
when making changes or improvements to prompts or
system designs. It emphasizes the importance of
moving beyond anecdotal evidence and adopting a
more rigorous and data-driven approach to assess the
impact of modifications.

The sources present the following key strategy and


tactics for evaluating model performance:

• Strategy: Test Changes Systematically:


When introducing changes to prompts or
system designs, it's essential to evaluate their
impact systematically rather than relying on
isolated examples. This involves establishing
comprehensive test suites, also known as
"evals," to measure the effects of modifications
on a representative set of inputs. This approach
ensures that changes lead to genuine
improvements rather than simply appearing
beneficial in a few specific cases while
potentially degrading performance on a broader
scale.

o Tactic: Evaluate Model Outputs with


Reference to Gold-Standard
Answers: The core principle of
evaluation involves comparing model
outputs to gold-standard answers,
which are pre-determined correct or
ideal responses. The sources highlight
various methods for conducting such
evaluations, including:

§ Human Evaluation: Involves
human judges assessing the
quality and accuracy of model
outputs against gold-standard
answers. This approach is
particularly valuable for
subjective or nuanced tasks
where human judgment is
crucial.

§ Computer-Based Evaluation:
Uses automated metrics and
algorithms to compare model
outputs to gold-standard
answers. This is suitable for
tasks with objective criteria and
single correct answers.

§ Model-Based Evaluation:
Employs another language
model to evaluate the outputs of
the model being tested. This can
be useful when there's a range
of acceptable outputs, and a
model can effectively judge their
quality.

§ OpenAI Evals: The sources


mention OpenAI Evals, an open-
source software framework that
provides tools for creating
automated evaluations.

o Evaluating for Factual Accuracy: The


sources provide an example of a system
message designed to evaluate a
model's ability to incorporate specific
known facts into its answer. This
involves:

1. Defining Required Facts: Listing the facts that should be present in the model's answer.

2. Restating and Citing: Instructing the evaluating model to restate each required fact and provide a citation from the candidate answer.

3. Judging Clarity: Assessing whether the citation is clear and understandable for someone unfamiliar with the topic.

4. Counting "Yes" Answers: Determining how many facts were successfully incorporated by counting the number of "yes" responses.

o Evaluating Overlap and


Contradiction: Another example
demonstrates a model-based evaluation
that analyzes the type of overlap
between a submitted answer and an
expert answer. It also checks for any
contradictions between the two
answers. This method involves:

1. Categorizing Overlap: The evaluating model determines whether the submitted answer is disjoint, equal, a subset, a superset, or overlapping with the expert answer.

2. Identifying Contradictions: The model checks if the submitted answer contradicts any part of the expert answer.

3. Outputting Results in JSON: The evaluation results are provided in a structured JSON format. A small parsing sketch appears later in this chapter.

o Importance of Representative Test


Cases: The sources stress the
importance of using a large and
representative set of test cases for evaluation. They provide guidelines for
determining the necessary sample size
based on the desired statistical power
and the magnitude of the difference
being measured.

o The OpenAI Cookbook: The sources


recommend referring to the OpenAI
Cookbook for more examples and
inspiration for developing effective
evaluation strategies.
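
A harness for the overlap-and-contradiction eval described above might parse the grading model's JSON as in the sketch below; the grader output string is an invented example of the expected format, not real model output.

# Minimal sketch: pull the JSON verdict out of a model-based grader's response.
import json

grader_output = (
    "Step-by-step reasoning: the submitted answer covers every point in the expert "
    "answer plus extra detail, and nothing conflicts.\n"
    '{"type_of_overlap": "superset", "contradiction": false}'
)

# Take the JSON object from the end of the grader's response.
result = json.loads(grader_output[grader_output.rindex("{"):])

print(result["type_of_overlap"])   # -> "superset"
print(result["contradiction"])     # -> False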

By adopting systematic testing and evaluation methods, developers can ensure that changes made to prompts or system designs genuinely enhance model performance and contribute to the development of more reliable and trustworthy AI systems.

Quiz

Instructions: Answer the following questions in 2-3


sentences each.

1. Why is it crucial to write clear and detailed


instructions when interacting with language
models?

2. Explain the concept of "persona" in the context


of prompt engineering and provide an example.

3. What are delimiters and how can they be used


to enhance the clarity of prompts?

4. Describe the strategy of splitting complex tasks


into simpler subtasks and provide a scenario
where this approach would be beneficial.

5. Why is it sometimes necessary to give the


model time to "think," and what tactics can be
employed to achieve this?

6. How can external tools such as code execution


engines be integrated into prompt engineering
to improve model performance?

7. Explain the purpose and importance of


systematic testing when refining prompts.

8. What are embeddings, and how can they be


utilized in knowledge retrieval for language
models?

9. Describe the concept of "inner monologue" as a


tactic for hiding the model's reasoning process
from the user.

10. Provide an example of a situation where it
would be advantageous to ask the model if it
missed anything on previous passes.

Answer Key

1. Clear and detailed instructions are crucial


because language models cannot read minds.
The more specific and unambiguous the
prompt, the better the model understands the
desired outcome and can deliver more relevant
results.

2. "Persona" refers to instructing the model to


adopt a specific character, tone, or style in its
responses. For instance, you can ask the model
to act as a helpful tutor, a sarcastic comedian,
or a formal news reporter.

3. Delimiters are symbols or phrases used to


separate different sections within a prompt.
Triple quotes, XML tags, or section headings
can clearly indicate distinct parts of the input,
helping the model differentiate instructions from
the content.

4. Splitting complex tasks into smaller,


manageable steps allows the model to focus on
one aspect at a time, reducing errors and
improving overall performance. This is
beneficial for tasks like summarizing long
documents or executing multi-step instructions.

5. Giving the model time to "think" helps it avoid


rushing to conclusions and encourages more
thoughtful reasoning. Tactics like prompting for
a "chain of thought" or using inner monologue
can encourage deliberate processing.

6. External tools can augment the capabilities of
language models. Code execution engines can
handle mathematical calculations or execute
API calls, providing accurate results for tasks
that models struggle with independently.

7. Systematic testing involves evaluating model


outputs against a set of predefined criteria to
assess the effectiveness of prompt
modifications. This ensures that changes lead
to consistent improvements in performance
across various scenarios.

8. Embeddings are numerical representations of


text that capture semantic meaning. They can
be used for efficient knowledge retrieval by
identifying relevant information from a large
database based on the similarity between the
query and stored text chunks.

9. "Inner monologue" involves prompting the


model to perform intermediate reasoning steps
but enclose them within delimiters. This allows
the developer to parse the output and present
only the final answer or relevant information to
the user, hiding the internal thought process.

10. When extracting information from lengthy


documents or generating lists, the model might
not capture all relevant details in a single pass.
Asking if it missed anything prompts the model
to re-evaluate the input and potentially identify
additional relevant information.

Essay Questions

1. Discuss the ethical considerations associated


with prompt engineering, particularly in contexts
where the model's outputs might be used for
decision-making or influencing human behavior.

2. Analyze the potential benefits and limitations of


using language models for educational
purposes. How can prompt engineering be
leveraged to create effective tutoring or learning
experiences?

3. Compare and contrast different strategies for


improving the factual accuracy and reliability of
language model outputs. How can prompt
engineering be used to mitigate the risks of
generating misleading or fabricated
information?

4. Explore the future of prompt engineering as


language models continue to evolve. What new
challenges and opportunities might arise, and
how could prompt engineering techniques
adapt to address them?

5. Critically evaluate the role of human creativity


and intuition in prompt engineering. To what
extent can prompt engineering be considered
an art as well as a science, and how does
human ingenuity influence the effectiveness of
prompts?

Glossary of Key Terms

Chain of thought: A prompting technique that


encourages the model to explicitly articulate its
reasoning process before arriving at a final answer.

Code execution engine: An external tool that can


execute code, enabling the model to perform
calculations or interact with APIs.

Delimiters: Symbols or phrases used to clearly


separate different sections within a prompt, improving
clarity and disambiguation.

Embeddings: Numerical representations of text that


capture semantic meaning, used for tasks like
knowledge retrieval and similarity comparisons.

Eval (evaluation procedure): A systematic process


for assessing the performance of a language model,
typically involving a set of test cases and predefined
criteria.

Few-shot prompting: A technique that provides the


model with a few examples of the desired output style
or behavior.

Inner monologue: A tactic where the model performs


reasoning steps within delimiters, allowing the
developer to filter the output before presenting it to the
user.

Intent classification: The process of categorizing user


queries based on their underlying intent or purpose.

Knowledge retrieval: The process of retrieving


relevant information from a database or external
sources based on a given query.

Persona: Instructing the model to adopt a specific
character, tone, or style in its responses.

Prompt engineering: The art and science of crafting


effective prompts to elicit desired responses from
language models.

Reference text: Providing the model with external


information or documents to use as a source for
answering questions or completing tasks.

Subtasks: Breaking down complex tasks into smaller,


more manageable steps for the model to process
sequentially.

Systematic testing: Evaluating model outputs against


predefined criteria to measure the impact of prompt
modifications and ensure consistent improvements.

FAQ

1. How can I get better results from large language


models like GPT-4?

Large language models (LLMs) can be incredibly


powerful, but they need clear instructions to perform at
their best. Here are a few key strategies:

• Write clear and detailed instructions: Avoid


ambiguity. Specify exactly what you want,
including the desired format, length, and level of
detail.

• Provide reference text: If relevant, offer


context and background information to guide
the model's responses.

• Break down complex tasks: Decompose


larger tasks into smaller, more manageable
subtasks to reduce error rates.

• Give the model time to “think”: Encourage


step-by-step reasoning or “inner monologue” to
help the model arrive at a well-reasoned
answer.

• Utilize external tools: Integrate tools like


embeddings-based search for knowledge
retrieval or code execution engines for
calculations and API calls.

• Test changes systematically: Evaluate the


impact of prompt modifications through a robust
testing framework to ensure overall
performance improvement.

2. My model keeps giving me irrelevant or made-up


answers. How can I fix this?

LLMs can sometimes hallucinate information,
especially when dealing with obscure topics. Here's
how to improve accuracy:

• Provide specific details in your prompt: The


more context you give, the less the model has
to guess.

• Offer reference text: Giving the model relevant


information to draw from helps reduce the
chance of fabricating answers.

• Use embeddings-based search: This can


help retrieve pertinent information from external
sources to ground the model's responses.

3. How can I control the length and format of the


model's output?

You can influence the output by:

• Specifying the desired length: Request a


specific number of words, sentences,
paragraphs, or bullet points.

• Demonstrating the desired format: Show the


model examples of the output style you prefer.

• Using delimiters: Clearly separate different


sections of the input with markers like triple
quotes or XML tags.

4. Can I make the model adopt a specific persona


or writing style?

Yes, you can guide the model's tone and style by:

• Asking the model to adopt a persona:


Provide instructions like "You are a helpful
assistant who always uses humor" or "You are
a technical expert writing a concise report."

• Providing examples of the desired style:
Show the model samples of the tone and
language you want it to emulate.

5. How do I handle very long conversations or


documents that exceed the model’s context
window?

• Summarize or filter previous dialogue:


Condense past interactions to free up space in
the context window.

• Summarize long documents piecewise:


Break down lengthy text into smaller sections,
summarize each part, and then combine the
summaries recursively.

6. Can I prevent the model from revealing its


reasoning process in certain situations?

Yes, you can use techniques like:

• Inner monologue: Instruct the model to


enclose its reasoning steps within specific
delimiters, allowing you to parse and hide them
before presenting the output to the user.

• Sequence of queries: Break down the task


into multiple queries, hiding the output of
intermediate steps from the end user.

7. How can I utilize external tools to enhance the


model's capabilities?

• Embeddings-based search: Retrieve relevant


information from external knowledge bases to
enrich the model's input.

• Code execution: Allow the model to execute


code for calculations, data manipulation, or API
calls.
• Function calling: Provide descriptions of
external functions that the model can call,
enabling interaction with various tools and
services.

8. How can I test and evaluate whether changes to


my prompts are actually improving the model’s
performance?

• Define a comprehensive test suite: Create a


diverse set of test cases that reflect real-world
usage.

• Evaluate model outputs against gold-


standard answers: Use human evaluation or
model-based evaluation methods to assess the
quality and accuracy of the responses.

• Use OpenAI Evals or other evaluation


frameworks: Leverage available tools for
automating and streamlining the evaluation
process.

