Prompt Engineering Module 2
Zero-Shot Prompting
Zero-Shot Prompting refers to the practice of asking a language model to perform a task
without providing any explicit examples or prior instruction on how to do so. The model is
expected to understand the task based purely on the prompt, without being given training
examples or additional context specific to the task at hand.
Key Features:
• No Examples Given: The prompt contains only the task instruction; the model relies entirely on what it learned during pre-training.
• Instruction-Driven: The quality of the output depends heavily on how clearly the task is described.
Example Prompt: Translate the following sentence from English to French: "I like learning."
In this example, the model translates the sentence without being explicitly trained or given prior examples of translation.
Benefits:
• Simplicity: No example collection is needed, so prompts stay short and quick to write.
• Broad Applicability: A single well-phrased instruction can be reused across many inputs.
Limitations:
• Lower Accuracy: Since no examples are provided, the model may not always
interpret the task correctly or produce the most accurate results compared to
approaches like few-shot prompting or fine-tuning.
Few-Shot Prompting
Few-Shot Prompting is a technique in which a language model is given a few examples
(typically 1 to 5) of the task at hand within the prompt, to guide it on how to generate the
expected output. The idea is to help the model understand the task by providing a minimal
number of examples, which increases the likelihood of the model producing more accurate or
task-specific results compared to zero-shot prompting.
Key Features:
• Minimal Examples Provided: The model is given a few examples to understand the
pattern or structure of the task.
• Better Task Understanding: By seeing how the task is performed, the model can
infer how to complete it in a more consistent and reliable way.
• Useful for Complex Tasks: Few-shot prompting is especially helpful when the task
is complex, or when the model may need more specific guidance on what kind of
output is expected.
• Prompt:
Translate the following sentences from English to French:
1. "I love cats." -> "J'aime les chats."
2. "The sky is blue." -> "Le ciel est bleu."
3. "She is reading a book." ->
In this example, the model is provided with two examples of translations before being asked
to translate a third sentence. The few examples guide the model toward understanding how to
complete the task.
Limitations:
• Scaling Issues: If too many examples are needed, the prompt might become too long
or inefficient.
• Performance Variation: The quality of the model’s output depends on the examples
provided—if examples are unclear or inconsistent, the model may generate poor
results.
Few-shot prompting strikes a balance between zero-shot and fully trained approaches,
providing better performance without requiring large amounts of training data.
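Assembling such a few-shot prompt can also be automated; the helper below is a minimal sketch (the function name and numbering format are illustrative, mirroring the translation example above):

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a few-shot prompt: task description, worked examples, new query."""
    lines = [task]
    for i, (src, tgt) in enumerate(examples, start=1):
        lines.append(f'{i}. "{src}" -> "{tgt}"')
    # The final line leaves the output blank for the model to complete.
    lines.append(f'{len(examples) + 1}. "{query}" ->')
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate the following sentences from English to French:",
    [("I love cats.", "J'aime les chats."),
     ("The sky is blue.", "Le ciel est bleu.")],
    "She is reading a book.",
)
print(prompt)
```

The same builder works for any input/output task: swap in different example pairs and the structure stays identical.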
• Prompt:
Example 1:
Input: "Hello"
Output: "olleH"
Example 2:
Input: "Computer"
Output: "retupmoC"
Example 3:
Input: "Engineering"
Output:
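• Expected Output: "gnireenignE"
• Explanation: The pattern the examples define is string reversal, which can be checked with a quick sketch:

```python
def reverse_string(s):
    # Slicing with step -1 walks the string backwards.
    return s[::-1]

print(reverse_string("Engineering"))  # -> "gnireenignE"
```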
• Prompt:
Example 1:
Input: [1, 2, 3, 4, 5]
Output: 5
Example 2:
Input: [10, 20, 30]
Output: 30
Example 3:
Input: [7, 12, 5, 20]
Output:
• Expected Output: 20
• Explanation: The prompt gives examples of finding the maximum in a list, and the model is expected to apply the same pattern to the final input.
• Prompt:
Example 1:
Input: [5, 3, 8, 4]
Output: [3, 4, 5, 8]
Example 2:
Input: [10, 1, 7, 6]
Output: [1, 6, 7, 10]
Example 3:
Input: [15, 20, 5, 10]
Output:
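• Expected Output: [5, 10, 15, 20]
• Explanation: The pattern in the examples is an ascending sort; a quick sketch confirms the expected completion:

```python
# Confirm the pattern on the two given examples, then apply it to Example 3.
for given, expected in [([5, 3, 8, 4], [3, 4, 5, 8]),
                        ([10, 1, 7, 6], [1, 6, 7, 10])]:
    assert sorted(given) == expected

print(sorted([15, 20, 5, 10]))  # -> [5, 10, 15, 20]
```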
• Prompt:
Example 1:
Input: 5
Output: 5 (Sequence: 0, 1, 1, 2, 3, 5)
Example 2:
Input: 7
Output: 13 (Sequence: 0, 1, 1, 2, 3, 5, 8, 13)
Example 3:
Input: 10
Output:
• Expected Output: 55
• Explanation: The examples guide students to calculate the Fibonacci sequence and
then apply the function for the third input.
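The underlying computation can be sketched as follows, using the same convention as the examples (F(0) = 0, F(1) = 1):

```python
def fib(n):
    """Return the n-th Fibonacci number with F(0) = 0, F(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b  # advance the pair one step along the sequence
    return a

print(fib(10))  # -> 55
```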
Chain-of-Thought (CoT) Prompting
Chain-of-Thought (CoT) prompting improves a model’s performance on reasoning tasks by including intermediate reasoning steps in the prompt, so the model works through the problem step by step before stating the final answer.
Prompt:
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12,
2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8,
12, 24.
A: Adding all the odd numbers (17, 19) gives 36. The answer is True.
The odd numbers in this group add up to an even number: 16, 11, 14, 4, 8,
13, 24.
A: Adding all the odd numbers (11, 13) gives 24. The answer is True.
The odd numbers in this group add up to an even number: 17, 9, 10, 12,
13, 4, 2.
A: Adding all the odd numbers (17, 9, 13) gives 39. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13,
82, 7, 1.
A:
Output:
Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is
False.
Prompt:
The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12,
2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82,
7, 1.
A:
Output:
Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is
False.
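The arithmetic behind these demonstrations can be verified with a short sketch:

```python
def odd_sum_is_even(nums):
    """Sum the odd numbers in a list and report whether that sum is even."""
    total = sum(n for n in nums if n % 2 == 1)
    return total, total % 2 == 0

print(odd_sum_is_even([15, 32, 5, 13, 82, 7, 1]))  # -> (41, False)
```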
A more recent idea is zero-shot CoT (Kojima et al., 2022), which simply appends "Let's think step by step" to the original prompt.
Automatic Chain-of-Thought (Auto-CoT)
When applying chain-of-thought prompting with demonstrations, the process involves hand-crafting effective and diverse examples. This manual effort can lead to suboptimal solutions. Zhang et al. (2022) propose an approach to eliminate manual effort by leveraging LLMs with the "Let's think step by step" prompt to generate reasoning chains for demonstrations one by one. This automatic process can still produce mistakes in the generated chains, so the diversity of the demonstrations matters for mitigating their effect. Their method, Auto-CoT, samples a diverse set of questions and generates reasoning chains for them to construct the demonstrations. It consists of two main stages: (1) question clustering, which partitions the questions of a dataset into a few clusters, and (2) demonstration sampling, which selects a representative question from each cluster and generates its reasoning chain using zero-shot CoT with simple heuristics.
The simple heuristics can be the length of the question (e.g., no more than 60 tokens) and the number of steps in the rationale (e.g., no more than 5 reasoning steps). This encourages the model to use simple and accurate demonstrations.
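These heuristics can be sketched as a simple filter; whitespace tokenization below is a stand-in for a real tokenizer:

```python
def is_simple_demo(question, rationale_steps, max_tokens=60, max_steps=5):
    """Keep demonstrations with a short question and a short rationale."""
    short_question = len(question.split()) <= max_tokens  # crude token count
    few_steps = len(rationale_steps) <= max_steps
    return short_question and few_steps

print(is_simple_demo("What is 2 + 3?", ["Add 2 and 3.", "The result is 5."]))
```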
• Problem: Sort the array [5, 3, 8, 4] using Bubble Sort, showing each step.
• Chain-of-Thought Response:
Step 1: Recall that Bubble Sort compares adjacent elements and swaps
them if they are in the wrong order.
Step 2: Compare the first two elements (5 and 3). Since 5 > 3, swap
them. The array becomes [3, 5, 8, 4].
Step 3: Compare the next pair (5 and 8). No swap needed.
Step 4: Compare the next pair (8 and 4). Since 8 > 4, swap them. The
array becomes [3, 5, 4, 8].
Step 5: Repeat the process for the second pass and continue until the
array is fully sorted: [3, 4, 5, 8].
Therefore, the sorted array is [3, 4, 5, 8].
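The procedure traced above corresponds to the following implementation:

```python
def bubble_sort(arr):
    """Bubble Sort: repeatedly swap adjacent out-of-order elements."""
    arr = list(arr)  # work on a copy
    n = len(arr)
    for i in range(n - 1):            # one pass per outer iteration
        for j in range(n - 1 - i):    # the tail is already sorted
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

print(bubble_sort([5, 3, 8, 4]))  # -> [3, 4, 5, 8]
```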
• Problem: Analyze the time complexity of a function that performs binary search on a sorted array of size n.
• Chain-of-Thought Prompt:
Step 1: At each step, binary search compares the target with the middle element and discards half of the remaining interval.
Step 2: Starting from n elements, after k halvings about n / 2^k elements remain.
Step 3: The search ends when one element remains, i.e., n / 2^k ≈ 1, so k ≈ log2(n).
Therefore, the time complexity of binary search is O(log n).
Meta Prompting
Meta Prompting is the practice of using prompts to create, refine, evaluate, or combine other prompts. Common forms include:
1. Prompt Creation: Generating new prompts for a specific task by providing a base
prompt to create other prompts.
o Example: "Create five different prompts to teach a model how to classify news
articles as politics, sports, or entertainment."
2. Prompt Refinement: Asking the AI to improve or rephrase existing prompts for
clarity, specificity, or tone.
o Example: "Here’s a prompt: 'Explain the effects of climate change on
agriculture.' Can you refine this to make it more detailed and specific?"
3. Prompt Evaluation: Using prompts to evaluate the effectiveness of other prompts,
based on criteria like relevance, accuracy, or engagement.
o Example: "Given the prompt 'Describe a method to reduce traffic congestion
in cities,' evaluate whether it encourages creative problem-solving."
4. Prompt Combination: Merging multiple prompts into a more comprehensive or
versatile prompt for broader task coverage.
o Example: "Combine these prompts: 'Summarize the article' and 'Explain the
author’s viewpoint' into a single prompt."
Applications:
• Prompt Engineering: Meta prompting is useful for iterating on and refining the
effectiveness of prompts when working with AI models, especially when fine-tuning
responses.
• Human-AI Collaboration: It enables a more interactive and dynamic way to craft,
analyze, and improve prompts during conversations with AI.
• Model Training: This technique can be used to create training data in cases where AI
models need to be trained to handle a wide variety of tasks through diverse prompts.
Meta Prompting ultimately makes the AI model more versatile by guiding how it interacts
with other prompts, enabling deeper control over how outputs are generated and refined.
Self-Consistency
Self-Consistency Prompting is a technique used in prompt engineering, particularly for
improving the quality and reliability of responses generated by AI models. This approach
leverages the fact that even large language models can sometimes produce inconsistent or
varying results, and it helps to improve accuracy by focusing on the most robust output.
• Prompt: "What is the value of 5! (5 factorial)?" The model is sampled several times:
• First Attempt:
5! = 5 * 4 * 3 * 2 * 1 = 120
• Second Attempt:
5! = 120 (stepwise: 5 * 4 = 20, 20 * 3 = 60, 60 * 2 = 120, 120 * 1 = 120)
• Third Attempt:
5! = 5 * 4 * 3 * 2 * 1 = 120
• Self-Consistency: All the outputs agree that the answer is 120, so the model returns
120 as the final, consistent answer.
1. Generate Multiple Outputs: For any given prompt, the AI generates multiple
responses.
o Prompt: "Explain how photosynthesis works in plants."
o Output 1: "Photosynthesis is the process by which plants convert light energy
into chemical energy using chlorophyll."
o Output 2: "Plants use chlorophyll in their leaves to convert sunlight into
chemical energy during photosynthesis."
o Output 3: "Through photosynthesis, plants convert light energy into sugars,
using chlorophyll as a key catalyst."
2. Compare and Choose: The model evaluates these responses and determines that all
are consistent in explaining the core concept (photosynthesis involves light energy,
chlorophyll, and sugar production), so it outputs a final answer that synthesizes this
information.
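The select-by-agreement step can be sketched as a majority vote over sampled answers; the hard-coded samples below stand in for repeated model generations:

```python
from collections import Counter

def self_consistent_answer(samples):
    """Return the most common answer among sampled generations."""
    return Counter(samples).most_common(1)[0][0]

samples = ["120", "120", "120"]  # three sampled answers to "What is 5!?"
print(self_consistent_answer(samples))  # -> "120"
```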
Applications:
• Math Problems: Generating multiple ways to solve a problem to ensure the correct
result.
• Natural Language Tasks: Ensuring consistent answers in summarization or
explanation tasks.
• Question-Answering Systems: Providing reliable responses by cross-verifying
multiple answers generated by the model.
Generated Knowledge Prompting
Generated Knowledge Prompting is a technique in which the model is first asked to generate relevant facts or background knowledge about a question, and that generated knowledge is then included in a follow-up prompt to produce a better-informed final answer.
Why is it Important?
1. Understand the Model: Know what the AI model is capable of and how it responds
to different types of prompts.
2. Design Effective Prompts: Create prompts that clearly convey the context and the
specific information needed. For example, instead of asking "How does a pump
work?" you might ask, "Explain the working principle of a centrifugal pump and its
applications in chemical engineering."
3. Iterate and Refine: Test and adjust your prompts based on the responses you get.
Refine them to improve clarity and relevance.
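The idea of asking for facts first and answering second can be sketched as a two-stage prompt chain; the function names and prompt wording below are illustrative:

```python
def generate_knowledge_prompt(question):
    """Stage 1: elicit background facts about the question."""
    return f"List key facts relevant to the question: {question}"

def answer_prompt(question, knowledge):
    """Stage 2: fold the generated knowledge into the final prompt."""
    return (f"Knowledge:\n{knowledge}\n\n"
            f"Using the knowledge above, answer: {question}")

q = "How does a centrifugal pump work?"
stage1 = generate_knowledge_prompt(q)
# In practice, the model's response to stage1 would be inserted here:
stage2 = answer_prompt(q, "A centrifugal pump converts rotational energy "
                          "into fluid flow via an impeller.")
print(stage2)
```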
Example in Engineering
Scenario: You’re working on a project involving the design of a new heat exchanger.
• Vague Prompt: “Tell me about heat exchangers.”
• Focused Prompt: “Explain the key design considerations for a shell-and-tube heat exchanger, including material selection, flow arrangement, and heat transfer efficiency.”
The second prompt is more focused and will likely result in a more detailed and relevant
response, aiding your engineering project.
Practical Tips
1. Be Specific: The more detailed your prompt, the more specific and useful the
response will be.
2. Use Context: Provide background information or context to help the model
understand the scope of the question.
3. Test Variations: Experiment with different ways of phrasing your prompts to find the
most effective approach.
Tree of Thoughts (ToT)
What is Tree of Thoughts (ToT)?
The Tree of Thoughts is a framework for organizing and guiding the generation of ideas or
solutions by breaking down complex problems into a structured set of interconnected
prompts or “thoughts.” It leverages the model's ability to handle and generate detailed
information by systematically exploring various aspects of a problem.
How It Works
The problem is broken into interconnected nodes, each node is given its own prompt, the responses are linked together, and the prompts are refined iteratively. The following example applies this to designing an automated irrigation system for a large-scale farm.
Example
1. Problem Decomposition:
o Node 1: Requirements for the irrigation system.
o Node 2: Sensor technologies for monitoring soil moisture.
o Node 3: Control mechanisms for automated watering.
o Node 4: Cost analysis and budget considerations.
2. Prompt Structuring:
o Node 1 Prompt: “What are the key requirements for an automated irrigation
system in a large-scale farm?”
o Node 2 Prompt: “Explain the latest sensor technologies available for
monitoring soil moisture in agricultural applications.”
o Node 3 Prompt: “Describe different control mechanisms that can be used to
automate irrigation based on sensor data.”
o Node 4 Prompt: “Perform a cost analysis for implementing an automated
irrigation system, including initial setup and maintenance costs.”
3. Interconnection:
o Link the responses from Node 2 and Node 3 to Node 1 to ensure that the
sensor technologies and control mechanisms meet the requirements specified.
o Use information from Node 4 to evaluate if the proposed solutions from
Nodes 1, 2, and 3 fit within the budget.
4. Iterative Refinement:
o Review the responses and refine the prompts based on the integration of
information from different nodes. For example, if the control mechanisms are
too expensive, you might need to adjust your requirements or explore
alternative options.
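The node structure can be sketched in code; this models only the tree and its dependencies, with no model calls:

```python
class ThoughtNode:
    """One node of the tree: a named sub-problem with its own prompt."""
    def __init__(self, name, prompt):
        self.name, self.prompt = name, prompt
        self.depends_on = []  # nodes whose answers feed into this one

requirements = ThoughtNode("sensors_and_control_requirements",
    "What are the key requirements for an automated irrigation system?")
sensors = ThoughtNode("sensors",
    "Explain sensor technologies for monitoring soil moisture.")
control = ThoughtNode("control",
    "Describe control mechanisms for automated watering.")
# Interconnection: the sensor and control answers must satisfy the requirements.
requirements.depends_on = [sensors, control]

print([n.name for n in requirements.depends_on])
```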
Benefits
• Structured Exploration: Complex problems are broken into manageable, focused sub-prompts.
• Coherent Integration: Linking nodes keeps individual answers consistent with the overall goal.
Practical Tips
1. Define Clear Nodes: Ensure that each node represents a distinct aspect of the
problem.
2. Maintain Coherence: Regularly check how the responses from different nodes
integrate and address the overall problem.
3. Iterate and Refine: Continuously refine prompts and responses as you progress
through the tree to enhance the quality of the solution.
Retrieval Augmented Generation
Retrieval-Augmented Generation (RAG) is a powerful technique that combines the
strengths of two approaches: information retrieval and text generation. It can be especially
useful in applications where accurate and up-to-date information is essential, such as
answering technical questions, troubleshooting, or generating design ideas based on existing
knowledge.
RAG is a hybrid model that retrieves relevant information from an external database or
knowledge source and uses that information to generate a response to a query. It enhances
the generation of responses by grounding them in factual, retrievable data, which is crucial
for accurate and context-aware outputs.
1. Query Input: A user inputs a question or request (e.g., "How do I calculate the stress
in a beam?").
2. Retrieval: The RAG model searches a database of documents, textbooks, or research
papers to find relevant information (e.g., formulas and concepts related to beam stress
analysis).
3. Generation: Using the retrieved information, the model then generates a response
that answers the query (e.g., providing a step-by-step explanation of beam stress
calculation).
4. Output: The generated response is returned to the user, combining the retrieved facts
with human-readable explanations.
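A toy sketch of this pipeline, with keyword-overlap retrieval standing in for a real vector search and a template standing in for LLM generation:

```python
import re

DOCS = [
    "Beam stress: sigma = M * c / I for bending of a beam.",
    "Ohm's law: V = I * R relates voltage, current, and resistance.",
]

def tokens(text):
    """Lowercased word set; a stand-in for real embeddings."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, docs=DOCS):
    """Return the document sharing the most words with the query."""
    q = tokens(query)
    return max(docs, key=lambda d: len(q & tokens(d)))

def generate(query):
    """Ground the response in the retrieved document (template in place of an LLM)."""
    return f"Based on '{retrieve(query)}', here is an answer to: {query}"

print(generate("How do I calculate the stress in a beam?"))
```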
• Technical Support: A user asks, "What is the best material to use for a heat
exchanger?" RAG retrieves data on material properties, thermal conductivity, and
heat exchanger design and generates a recommendation based on facts.
• Design Assistance: For example, an engineer might ask, "What are the design
constraints for a cantilever beam?" The RAG system retrieves relevant technical
documents and generates a comprehensive answer covering material properties,
dimensions, and load-bearing capabilities.
• Research and Development: RAG can assist engineers in staying up-to-date with
cutting-edge research by retrieving relevant research papers and summarizing the
findings.
Advantages of RAG:
• Factual Grounding: Responses are anchored in retrieved documents, reducing hallucinated or outdated answers.
• Up-to-Date Knowledge: The underlying knowledge source can be updated without retraining the model.
Challenges:
• Data Availability: The effectiveness of RAG depends on the availability and quality
of the data sources it retrieves from.
• Domain-Specific Knowledge: The retriever needs to have access to relevant
engineering databases and not just general knowledge.
• Computational Cost: The process of retrieving and generating answers can be
computationally intensive, especially for complex queries.
Automatic Reasoning and Tool-use (ART)
Automatic Reasoning and Tool-use (ART) is an approach in which an AI model interleaves step-by-step reasoning with calls to external tools, feeding the tool results back into the reasoning process. It has two core components:
1. Automatic Reasoning:
o AI models use reasoning techniques, such as deductive or inductive reasoning,
to make decisions, solve problems, or derive conclusions based on provided
information.
o This reasoning may involve understanding causal relationships, performing
multi-step logical operations, or recognizing patterns in data.
2. Tool-use:
o AI systems can utilize external tools (such as search engines, databases,
calculators, or APIs) to fetch additional information or perform specialized
tasks.
o This allows AI to extend its capabilities beyond what is present in its training
data, enabling access to up-to-date or domain-specific resources.
Benefits of ART:
• Extended Capabilities: Tool access lets the model use up-to-date information and operations beyond its training data.
• More Reliable Multi-Step Reasoning: Grounding intermediate steps in tool results reduces errors on complex tasks.
Automatic Reasoning and Tool-use represents a promising direction for making AI more
capable, versatile, and reliable in diverse real-world applications.
Examples of Automatic Reasoning and Tool-use (ART) prompts that showcase how AI
can leverage both reasoning and external tools to generate responses:
• Prompt: "What are the latest advancements in quantum computing? Use an external
database or a web search tool to retrieve the most up-to-date information."
• Expected Output:
o The AI would first reason through the basics of quantum computing and then
access external resources, like recent academic papers or news articles, to
gather the most current information on recent advancements.
• Prompt: "Based on the following scenario, determine whether the defendant’s actions
constitute negligence under U.S. law. Use external legal databases to reference
relevant case law."
• Expected Output:
o The AI would reason through the legal definition of negligence and utilize an
external legal database (such as Westlaw or LexisNexis) to find similar cases
and precedents that match the scenario, providing a legally informed
conclusion.
4. Code Generation and Testing
• Prompt: "Write a Python function that sorts a list of integers using the quicksort
algorithm. Test the function using a coding environment to ensure it works correctly."
• Expected Output:
o The AI would reason through the steps of implementing the quicksort
algorithm, generate the appropriate Python code, and then test it using an
external execution environment (like a Python interpreter) to confirm the
function behaves as expected.
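A minimal sketch of the reasoning-plus-tool loop; the CALC(...) step format and the calculator tool are invented for illustration:

```python
import ast
import operator as op

OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def calc(expr):
    """Tiny safe arithmetic evaluator standing in for a calculator tool."""
    def ev(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

# Reasoning steps interleaved with a tool call; results feed back in.
steps = ["Reason: total apples after giving 4 away and buying 6",
         "CALC(10 - 4 + 6)"]
results = [calc(s[5:-1]) if s.startswith("CALC(") else s for s in steps]
print(results[-1])  # -> 12
```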
Automatic Prompt Engineer (APE)
Automatic Prompt Engineer (APE) is an emerging concept in AI and machine learning
where an AI system autonomously creates, refines, and optimizes prompts to improve its own
performance in completing tasks. The idea is that instead of relying on human engineers to
carefully craft prompts, the AI can dynamically generate and test different prompt
formulations to get the best possible result for any given task.
1. Self-Optimizing Prompts:
o The AI generates multiple prompt variations for a task and selects the ones
that yield the most accurate or desired outcomes. It iterates through different
formats, styles, or levels of detail until it converges on the most effective
prompt.
2. Adaptive Learning:
o The AI adapts its prompt generation strategy based on feedback from previous
outputs. For example, if one style of prompt consistently produces better
results, the AI will prioritize similar prompts in future tasks.
3. Task-Specific Refinement:
o APE can customize prompts for specific domains (e.g., medical, legal,
technical) by analyzing patterns in the task and using specialized vocabulary,
structures, or examples relevant to that domain.
4. Efficiency in Complex Tasks:
o For tasks requiring complex or multi-step reasoning, APE can break down the
process by generating prompts that guide the model through each step,
ensuring that it doesn’t miss any crucial details or misinterpret instructions.
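The generate-and-select loop at the heart of APE can be sketched as follows; the scoring function here is a placeholder, not a real evaluator:

```python
candidates = [
    "Summarize this article.",
    "Summarize the key findings of this article in three bullet points.",
]

def score(prompt):
    """Placeholder metric rewarding more specific (longer) prompts.
    A real APE system would score candidates on held-out task examples."""
    return len(prompt.split())

best = max(candidates, key=score)
print(best)
```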
Examples:
1. Code Generation:
o Task: "Generate a function to calculate the factorial of a number."
o APE Process: The AI tries various prompt styles like "Write a Python
function to calculate the factorial of an integer," "Generate a Python recursive
function to compute the factorial of a number," and "Create an iterative
Python function for factorial calculation." It evaluates which prompt produces
the most efficient and accurate code and refines it further if necessary.
2. Text Summarization:
o Task: "Summarize this research article on climate change."
o APE Process: The AI could start with a simple prompt like "Summarize this
article in one sentence," then refine it to "Summarize the key findings of this
research article in three bullet points," based on how well the initial
summaries capture key details.
3. Question Answering:
o Task: "What is the capital of Japan?"
o APE Process: The AI might first ask the question directly and then generate
more detailed prompts like "Explain why Tokyo is the capital of Japan and
how it became the capital," refining the prompt to provide both the answer and
relevant context.
4. Data Classification:
o Task: "Classify these reviews as either 'Positive' or 'Negative'."
o APE Process: The AI might test various formulations like "Is the following
review positive or negative?" or "Categorize the sentiment of this review,"
iterating on the prompt structure to maximize classification accuracy.
Benefits of APE:
• Scalability: Automatically generating effective prompts at scale saves time and effort,
especially for large or complex datasets.
• Improved Accuracy: Through self-optimization, the AI can consistently improve its
performance on a given task by finding the best prompts.
• Adaptability: APE adapts to various tasks and domains, making it a powerful tool in
dynamic environments where tasks or data requirements change frequently.
Challenges:
• Over-Optimization: The AI may focus too much on certain prompts that are
effective for short-term tasks but lack long-term generalization.
• Bias: APE systems can inadvertently reinforce biases if they optimize prompts based
on biased datasets or tasks.
Active-Prompt
Active-Prompt stands in contrast to static prompting, where a single, fixed prompt is used
throughout a task. Instead, with Active-Prompt, the AI adjusts its prompts or further
questions to refine its output as the process continues.
Key Features:
1. Dynamic Adaptation:
o Prompts can change based on the AI’s partial outputs or feedback from the
user. As more information becomes available, the system refines its queries to
improve the accuracy or relevance of the results.
2. Iterative Improvement:
o Rather than generating a final response immediately, the AI engages in an
iterative process where it actively modifies or updates its prompts based on the
quality or content of its intermediate outputs. This allows for incremental
improvements.
3. Contextual Awareness:
o The AI becomes more contextually aware as it actively uses prior outputs or
knowledge from previous parts of the conversation or task. This means that
the AI’s understanding deepens as the interaction continues, leading to more
nuanced and precise results.
4. Multi-Step Reasoning:
o Active-Prompt is particularly useful in tasks that require multi-step reasoning
or complex problem-solving, where one prompt’s answer feeds into the next
prompt. For example, in tasks involving logic, math, or scientific reasoning,
active prompts guide the AI step-by-step, refining each step based on previous
answers.
Benefits of Active-Prompt:
• Improved Accuracy: By allowing the AI to adapt its prompts during the task,
Active-Prompt helps ensure more accurate and relevant outputs.
• Flexibility: The AI can handle complex, evolving tasks that require different
information at different stages, improving its utility in dynamic environments.
• User Interaction: Active-Prompt can be used to guide AI-human collaboration,
where the system actively asks for clarifications or more details from the user as the
interaction unfolds.
Challenges:
• Increased Complexity and Cost: Iteratively updating prompts requires additional model calls and careful orchestration compared to a single static prompt.
Directional Stimulus Prompting
Directional Stimulus Prompting steers the model toward a desired output by embedding targeted hints, keywords, or constraints (the "stimulus") directly in the prompt. Its benefits include:
• Benefit: Directional stimulus prompts help to narrow down the focus of AI models,
ensuring that responses are aligned with specific goals. By guiding the AI to attend to
particular aspects of a topic, the outputs become more contextually relevant.
• Example: Instead of a general question like, "What are the benefits of renewable
energy?" a prompt with directional stimuli like "Explain the economic benefits of
solar energy for small businesses" ensures a focused, relevant response.
3. Reduced Ambiguity
• Benefit: With clear guidance in the prompts, fewer iterations are needed to achieve
the desired outcome, making the interaction more efficient. The AI produces higher-
quality responses early on, reducing the need for multiple rounds of refinement.
• Example: Instead of trial and error with general prompts, using targeted directional
stimuli like "Analyze the environmental impacts of plastic waste in oceans, with a
focus on marine life" saves time and leads directly to the relevant insights.
• Benefit: Directional stimulus prompting can adjust the AI's tone, style, and content to
suit different audiences or contexts, making the outputs more engaging and effective.
It allows prompt engineers to control the AI's voice based on the situation.
• Example: A prompt like "Explain quantum computing to a high school student"
tailors the explanation to a simpler level, whereas "Provide a detailed technical
overview of quantum computing for computer science professionals" guides the AI
toward a more advanced and technical response.
• Benefit: Directional stimuli improve the alignment between the user’s intent and the
AI’s output. By offering specific guidance, prompt engineers can ensure that the
model’s responses better match what the user wants to achieve.
• Example: A prompt like "Design a marketing campaign" could yield broad results,
but a more directed prompt such as "Design a social media marketing campaign for a
new vegan product targeting millennials, with a focus on sustainability" aligns the
response with the user’s clear intent.
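Adding a directional stimulus can also be done programmatically; the helper below and its "Hint:" format are illustrative:

```python
def add_stimulus(prompt, hints):
    """Append hint keywords that steer the model's focus."""
    return f"{prompt}\nHint: focus on {', '.join(hints)}."

p = add_stimulus("What are the benefits of renewable energy?",
                 ["economic benefits", "solar energy", "small businesses"])
print(p)
```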
Examples of directional stimuli include target keywords, a required structure, or a specified audience level embedded directly in the prompt.
ReAct Prompting
ReAct (Reason + Act) prompting interleaves explicit reasoning steps with actions such as tool calls or queries. It has two components:
1. Reasoning:
o Involves providing the model with a structured approach to think through a
problem step by step.
o The model "thinks aloud," breaking down the components of the problem,
making inferences, and identifying the steps needed to solve it.
2. Action:
o Once reasoning is established, the model takes action based on the conclusions
drawn during the reasoning process.
o These actions can include performing specific operations, interacting with
external tools (in cases like web browsing or interacting with an API), or
choosing an option from a set of possibilities.
• Prompt:
Problem: "John has 10 apples. He gives 4 apples to Sarah and then buys 6 more. How
many apples does John have now?"
Step 1 (Reasoning):
"John starts with 10 apples. He gives away 4 apples, so now he has 10 - 4 = 6 apples.
After that, he buys 6 more apples, so we need to add those 6 to his current number of
apples."
Step 2 (Action):
"6 + 6 = 12."
Step 3 (Reasoning):
"Therefore, John has 12 apples in total now."
In this case, the model first reasons through the operations that need to be performed, then
takes the actions (in this case, performing arithmetic operations), and finally confirms the
result by reasoning through the final step.
• Prompt:
"Find the current price of Bitcoin."
Step 1 (Reasoning):
"To find the price of Bitcoin, I need to check a reliable source, such as a financial
website or a cryptocurrency exchange."
Step 2 (Action):
"Let's visit the website of a popular cryptocurrency exchange like Coinbase or
Binance."
Step 3 (Reasoning):
"I have found the price of Bitcoin on Coinbase, and it is $35,000 at the moment."
In this case, ReAct prompting structures the AI's reasoning before it takes an action (visiting
the website), ensuring that the steps are clear, logical, and goal-directed.
Use Cases of ReAct Prompting:
• Multi-step problem solving: For complex math or logic problems that require
multiple steps and decisions.
• Tool-assisted tasks: When an AI is interacting with external tools like APIs or
databases.
• Puzzle solving and reasoning challenges: Where AI needs to reason through puzzles
(e.g., Sudoku) or strategic problems.
• Research or retrieval tasks: Finding specific information on the web, browsing
documents, or conducting detailed searches.
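A minimal ReAct-style loop, with a scripted thought/action sequence and a canned price lookup standing in for a live model and API:

```python
def lookup_price(asset):
    """Canned data in place of a real exchange API."""
    prices = {"bitcoin": 35000}
    return prices[asset]

# A scripted "model" trace: Thought steps reason, Action steps call tools.
script = [
    ("Thought", "I need the current price of Bitcoin from a reliable source."),
    ("Action", ("lookup_price", "bitcoin")),
]

observation = None
for kind, content in script:
    if kind == "Action":
        tool, arg = content
        # Dispatch the action to the named tool and record the observation,
        # which would normally be appended to the model's context.
        observation = {"lookup_price": lookup_price}[tool](arg)

print(observation)  # -> 35000
```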
Multimodal Chain-of-Thought (CoT) Prompting
Multimodal Chain-of-Thought (CoT) Prompting is an advanced technique in AI that
combines multimodal inputs (such as text, images, audio, or video) with chain-of-thought
reasoning to guide an AI model through complex problem-solving tasks. This approach
enables the model to reason step-by-step while processing and integrating different types of
information (modalities), improving its ability to generate more accurate and contextually
rich responses.
Key Components:
1. Multimodal Inputs:
o Involves providing the AI with more than one type of input—such as text
combined with images, diagrams, or even audio.
o This is crucial for tasks where understanding or generating information
requires more than just textual data. For example, analyzing visual data (e.g., a
chart) alongside a written report or correlating images with descriptions.
2. Chain-of-Thought (CoT) Reasoning:
o The Chain-of-Thought approach allows the AI to reason through problems in
a step-by-step manner. Instead of jumping to an answer directly, the model
breaks the problem into smaller steps, reasoning through each part
sequentially.
o CoT enhances transparency, as it provides insight into the model’s decision-
making process, making it easier to follow how it arrived at the final solution.
Benefits:
1. Enhanced Understanding:
o By integrating multiple forms of information, the AI gains a deeper, more
comprehensive understanding of the problem. It can make better inferences by
considering text descriptions alongside images, diagrams, or other data types.
o Example: In a task where the model needs to interpret an image of a graph
and a related paragraph of text, CoT prompting allows the AI to reason
through how the textual explanation matches the trends in the graph.
2. Improved Transparency and Explainability:
o The step-by-step reasoning makes the model’s thought process more
transparent, allowing users to follow how the AI interprets different inputs and
combines them to reach the final conclusion.
o Example: In a medical diagnosis scenario, the AI can be asked to explain how
a symptom in the text correlates with a feature in the medical image, making
the diagnostic process more understandable.
3. Better Performance on Complex Tasks:
o For tasks that require analyzing and combining data from different modalities,
multimodal CoT prompting ensures that the AI doesn't overlook or
misinterpret any piece of information. This is especially useful in domains like
data analysis, research, or technical problem-solving, where multiple types of
information need to be synthesized.
o Example: In technical research, where data from experiments (e.g., numerical
tables) and written reports must be analyzed together, multimodal CoT
prompting ensures that the AI carefully reasons through how each modality
contributes to the conclusions.
4. Versatility Across Domains:
o Multimodal CoT prompting can be applied across a wide range of domains:
healthcare, scientific research, visual tasks (like art analysis or object
recognition), business intelligence, and more. It excels in tasks that require a
combination of logical reasoning with multimodal data processing.
Example in Data Analysis (Chart and Text):
Task: An AI is asked to analyze a chart showing sales data along with a text report
explaining the reasons for fluctuations.
• Prompt:
Step 1 (Analyze the chart): "First, examine the sales chart. Identify any trends or
significant changes over time."
Step 2 (Analyze the text): "Next, read the text report and summarize the reasons
provided for any increases or decreases in sales."
Step 3 (Combine insights): "Now, combine your analysis of the chart with the
information from the report. Explain how the trends in sales data correspond to the
reasons outlined in the report."
Here, the model is guided through a step-by-step process: analyzing each modality (the chart
and the text) independently, then reasoning through how the two relate to one another to form
a final conclusion.
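The three-step template can be assembled programmatically; a minimal sketch with placeholder modality descriptions:

```python
def multimodal_cot_prompt(chart_desc, report_desc):
    """Build the three-step analyze/analyze/combine prompt from descriptions."""
    return "\n".join([
        f"Step 1 (Analyze the chart): Examine {chart_desc}; identify trends.",
        f"Step 2 (Analyze the text): Read {report_desc}; summarize the reasons given.",
        "Step 3 (Combine insights): Explain how the trends correspond to the reasons.",
    ])

p = multimodal_cot_prompt("the monthly sales chart", "the quarterly report")
print(p)
```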
Example in Medical Diagnosis (Text and Image):
Task: A doctor provides an AI assistant with a text description of a patient's symptoms and
an X-ray image for diagnosis.
• Prompt:
Step 1 (Analyze symptoms): "First, analyze the patient’s symptoms: cough, fever, and
shortness of breath."
Step 2 (Analyze X-ray): "Now, examine the X-ray image. Identify any abnormalities,
such as fluid buildup in the lungs."
Step 3 (Reason through the diagnosis): "Next, based on the combination of symptoms
and the X-ray findings, suggest a diagnosis and explain your reasoning."
Here, the model reasons through the relationship between the patient’s symptoms and the
visual data in the X-ray, allowing it to generate a diagnosis based on multimodal inputs.
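When the model can actually accept images, the same step-by-step structure can be expressed as a structured payload. The sketch below uses a generic message format; the field names ("type", "text", "image_path") and the filename are illustrative assumptions and would need to match whatever multimodal API is actually used:

```python
# Sketch: structuring a multimodal CoT request as a generic message payload.
# Field names and the image filename are placeholders, not a real API schema.

symptoms = "cough, fever, and shortness of breath"

messages = [
    {"type": "text", "text": (
        "Step 1 (Analyze symptoms): First, analyze the patient's symptoms: "
        f"{symptoms}."
    )},
    # Placeholder reference to the X-ray image supplied alongside the text.
    {"type": "image", "image_path": "chest_xray.png"},
    {"type": "text", "text": (
        "Step 2 (Analyze X-ray): Examine the X-ray image above and identify "
        "any abnormalities, such as fluid buildup in the lungs.\n"
        "Step 3 (Reason through the diagnosis): Based on the combination of "
        "symptoms and the X-ray findings, suggest a diagnosis and explain "
        "your reasoning."
    )},
]
```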
Applications of Multimodal CoT Prompting:
• Medical Diagnosis: Combining patient descriptions (text) with medical images (e.g.,
X-rays, MRI scans) to reason through potential diagnoses.
• Data Analysis: Analyzing data from graphs, tables, and reports in business
intelligence or scientific research.
• Creative Arts: Interpreting images, videos, or audio alongside text, such as reviewing
an artwork (image) and its critical analysis (text).
• Education: Assisting students in understanding complex subjects by analyzing
diagrams, equations, and text together in subjects like physics or chemistry.
Graph Prompting
Graph Prompting refers to the use of graph-based structures to enhance and guide AI
models in reasoning, understanding, and generating responses based on relationships between
data points. Graphs, in this context, are representations where nodes (or vertices) represent
entities, concepts, or data points, and edges (or links) represent relationships or connections
between them.
By utilizing graph-based structures within prompts, the AI can more effectively interpret,
reason through, and generate information that is contextually linked and relationally aware.
How Graph Prompting Works:
1. Graph-Structured Inputs:
o Prompts can include graphs (or references to graph structures) as part of the
input, instructing the AI to consider the relationships between nodes in the
graph.
o The model is asked to analyze the graph and reason through the relationships
before generating an output.
2. Reasoning Based on Graph Topology:
o The AI can be guided to perform specific types of reasoning based on the
graph’s structure. For example, in a social graph, the model could be prompted
to analyze how the removal of a key node (a highly connected person) might
impact the overall network.
3. Chain-of-Thought Reasoning with Graphs:
o Chain-of-thought (CoT) reasoning can be applied within the graph structure,
where the model is prompted to think step by step about how moving through
different nodes (concepts or decisions) impacts the final outcome or
conclusion.
4. Inference from Relationships:
o The model is guided to make inferences based on the edges (relationships)
between nodes. For example, in a knowledge graph, the AI can be prompted to
infer new information by following the relationships between connected
concepts.
Benefits of Graph Prompting:
1. Relational Understanding:
o Graph prompting helps AI understand complex relational data, making it more
adept at answering questions or solving problems that depend on
interconnected pieces of information.
o Example: In a knowledge graph of historical events, the AI can infer that two
events are related by following the edges between them, leading to a deeper
understanding of their cause and effect.
2. Efficient Problem Solving:
o By using graph structures, AI can efficiently traverse nodes to find
solutions, reducing the computational complexity of searching through large
datasets.
o Example: In a decision tree graph, the AI can systematically explore different
decision paths to determine the optimal outcome.
3. Improved Multimodal Integration:
o Graph prompting is useful in tasks that require integrating multiple types of
information. For instance, a graph may represent both textual data (concepts)
and visual data (images), allowing the AI to reason across modalities.
o Example: In a medical diagnosis graph, text-based symptoms could be
connected to image-based results (X-rays), guiding the AI to make more
accurate diagnoses.
4. Enhanced Explainability:
o The graph structure, especially when paired with step-by-step reasoning,
provides a more transparent explanation of how the AI arrived at a particular
conclusion. Users can trace the path through the graph to understand the
reasoning process.
o Example: In a knowledge graph-based system, the AI can explain how two
concepts are related by walking through the specific nodes and edges that
connect them.
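A minimal sketch of how a graph might be encoded and serialized into a prompt so the model can reason over relationships, as described under graph-structured inputs and inference from relationships above. The entities and relations below form a tiny invented knowledge graph, used only for illustration:

```python
# Sketch: representing a small knowledge graph as labeled edges and
# serializing it into a prompt. All entities and relations are illustrative.

edges = [
    ("Treaty of Versailles", "ended", "World War I"),
    ("World War I", "preceded", "World War II"),
]

# One edge per line, in a "source --relation--> target" format the model
# can follow step by step.
graph_text = "\n".join(f"{src} --{rel}--> {dst}" for src, rel, dst in edges)

prompt = (
    "Here is a knowledge graph, one edge per line:\n"
    f"{graph_text}\n\n"
    "Following the edges step by step, explain how the Treaty of Versailles "
    "relates to World War II."
)

print(prompt)
```

The explicit edge notation is what lets the model (and a human reader) trace exactly which relationships a conclusion rests on, supporting the explainability benefit above.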
Example of Graph Prompting:
Task: An AI is asked to analyze a family tree (a graph where nodes represent family
members and edges represent relationships like parent-child or siblings) to answer a question
about relationships.
• Prompt:
Graph Input: "Here is a family tree with nodes representing family members and
edges showing relationships (e.g., parent-child, sibling). Analyze this graph."
Question: "Who is the grandparent of Sarah?"
Step 1 (Analyze nodes and edges): "Sarah is a node. Her parents are connected to her
by edges. I will trace those edges to find Sarah's parents."
Step 2 (Trace relationships): "Now, I will trace the edges from Sarah's parents to their
parents, which gives me Sarah's grandparents."
Conclusion (Generate answer): "Sarah's grandparent is [Name], based on the graph."
In this case, the AI reasons through the family tree by following the edges to find the relevant
relationships.
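The same two-step edge traversal can be written as ordinary code, which makes the reasoning the prompt asks for concrete. The family tree below is invented; `child_to_parents` maps each person to their parents:

```python
# Sketch: finding grandparents by traversing parent-child edges, mirroring
# the two reasoning steps in the prompt. The family tree is invented.

child_to_parents = {
    "Sarah": ["David", "Emma"],
    "David": ["George", "Helen"],
    "Emma": ["Frank", "Irene"],
}

def grandparents(person, tree):
    # Step 1: follow edges from the person to their parents.
    parents = tree.get(person, [])
    # Step 2: follow edges from each parent to *their* parents.
    result = []
    for parent in parents:
        result.extend(tree.get(parent, []))
    return result

print(grandparents("Sarah", child_to_parents))
# -> ['George', 'Helen', 'Frank', 'Irene']
```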