The document summarizes various prompt engineering techniques for large language models (LLMs), including zero-shot and few-shot prompting, chain-of-thought prompting, and retrieval augmented generation. Each technique is accompanied by examples and references to relevant research papers, highlighting their effectiveness in improving model performance across different tasks. Additionally, it addresses methods to reduce hallucination in LLM outputs, emphasizing the importance of structured reasoning and validation processes.
LLM Tips Summary: A Prompt Engineering Guide
The techniques below are organized for quick reference and easy use.

🌟1. New Tasks Without Extensive Training

1.1 Zero-Shot Prompting

Zero-shot prompting is an important innovation in the field of LLMs. Proposed by Radford et al. (2019), this technique guides models to new tasks with carefully designed prompts in the absence of large-scale, task-specific training data. The model receives only a description of the task, rather than labeled examples for that task, and relies on its own knowledge base to respond to and make predictions for the new task. Here's an example:

Input:
💡 Classify the text into neutral, negative or positive.
Text: I think the vacation is okay.
Sentiment:

Output:
💡 Neutral

1.2 Few-Shot Prompting

Few-shot prompting was proposed by Brown et al. (2020) to help models learn a specific task by providing a few input-output examples, in contrast to zero-shot prompting. As described in the paper, a model's performance on complex tasks can be significantly improved by a selection of high-quality examples, especially compared to providing no examples at all. Nonetheless, this approach may struggle with long texts because it consumes more input tokens. In addition, the selection of examples is critical to the final performance: inappropriate example selection may lead the model to learn imprecise or biased information.

Input:
💡 A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is: We were traveling in Africa and we saw these very cute whatpus. To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses the word farduddle is:

Output:
💡 When we won the game, we all started to farduddle in celebration.

🌟2. Reasoning and Logic

2.1 Chain-of-Thought (CoT) Prompting

To overcome the limitations of LLMs in handling complex inference tasks, Wei et al. (2022) proposed an innovative approach called CoT. The technique introduces a special prompting strategy to encourage the model to think in a more continuous, step-by-step manner. Its main contribution is that it motivates LLMs to produce structured and well-considered responses more effectively than traditional prompting methods. Through a series of experiments, the technique has demonstrated its utility in facilitating logical reasoning, especially in enabling models to gain a deeper understanding of a problem. For example, it spells out in detail the logical steps required to solve a complex mathematical problem, a process much like human problem-solving. Using CoT, the researchers achieved an unprecedented accuracy of 90.2% on mathematical and common-sense reasoning tests with the PaLM 540B model.
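To make this concrete, here is a minimal sketch of a few-shot CoT prompt in Python. It only assembles the prompt string (no API call is made); the worked tennis-ball demonstration is the classic example from the CoT literature, and the final question is just an illustration.

```python
# Minimal sketch of a few-shot chain-of-thought prompt (Wei et al., 2022).
# The worked example spells out its intermediate reasoning, nudging the model
# to reason step by step before answering the new question.
COT_EXAMPLE = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend a worked, step-by-step example to the new question."""
    return COT_EXAMPLE + f"Q: {question}\nA:"

print(build_cot_prompt(
    "The cafeteria had 23 apples. It used 20 and bought 6 more. "
    "How many apples are there now?"
))
```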
2.2 Automatic Chain-of-Thought (Auto-CoT) Prompting

While manually constructing CoT exemplars can improve a model's inference, the process is time-consuming and inefficient. To solve this problem, Zhang et al. (2022) proposed the Auto-CoT technique, which automatically generates "let's think step by step" style reasoning chains to help large language models build inference chains. In particular, the technique focuses on avoiding the errors that can occur in a single chain of inference and improving overall stability through diverse sample generation: it generates multiple unique reasoning chains for a variety of problems and combines them into one final set of demonstrations. This automated and diverse sample generation effectively reduces the error rate, improves the efficiency of few-shot learning, and avoids the tedious work of manually constructing CoT exemplars. After applying this technique, accuracy on arithmetic and symbolic reasoning tests using GPT-3 increased by 1.33% and 1.5%, respectively, compared to traditional CoT.

2.3 Self-Consistency

Wang et al. (2022) proposed a novel decoding strategy, Self-Consistency, which aims to "replace the naive greedy decoding used in chain-of-thought prompting". The method samples multiple different reasoning paths from the language model's decoder to generate a variety of possible reasoning chains, then synthesizes these chains and selects the most consistent answer. The strategy is based on the idea that questions requiring in-depth analysis typically admit more reasoning paths, which increases the likelihood of finding the right answer. Combining Self-Consistency with CoT resulted in significant accuracy gains across multiple standard tests: a 17.9% improvement on GSM8K, 11.0% on SVAMP, 12.2% on AQuA, 6.4% on StrategyQA, and 3.9% on the ARC challenge.
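The control flow is easy to sketch. In the snippet below, `sample_llm` is a placeholder for any completion API that supports temperature sampling, and the answer extraction is a simplified assumption (taking the last number in each completion).

```python
# Minimal self-consistency sketch (Wang et al., 2022): sample several
# chain-of-thought completions at non-zero temperature, extract each final
# answer, and return the majority vote.
import re
from collections import Counter

def sample_llm(prompt: str, temperature: float = 0.7) -> str:
    """Stand-in for a real completion call; plug in your own API here."""
    raise NotImplementedError

def final_answer(completion: str) -> str:
    """Pull the last number out of a chain-of-thought completion."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else completion.strip()

def self_consistency(prompt: str, n_samples: int = 10) -> str:
    answers = [final_answer(sample_llm(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```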
2.4 Logical Chain-of-Thought (LogiCoT) Prompting

For LLMs, logical reasoning is key to solving complex, multi-step problems across domains. LogiCoT, proposed by Zhao et al. (2023), introduces a completely new framework compared to previous stepwise inference methods such as CoT. The framework draws on symbolic logic to make the reasoning process more structured and organized. In particular, LogiCoT uses a proof-by-contradiction strategy: it verifies and corrects the reasoning steps generated by the model by showing that a step is wrong if it leads to a contradiction. This think-verify-revise cycle effectively reduces logical errors and incorrect assumptions. In tests with Vicuna-33b and GPT-4, LogiCoT significantly improved inference ability, with accuracy on the GSM8K dataset improved by 0.16% and 1.42% and accuracy on the AQuA dataset improved by 3.15% and 2.75%, respectively, compared with traditional CoT.

2.5 Chain-of-Symbol (CoS) Prompting

LLMs often struggle with tasks involving complex spatial relationships, in part because they rely on natural language, which is prone to ambiguity and bias. To overcome this limitation, Hu et al. (2023) proposed the CoS approach. Instead of natural language, it uses simplified symbols as prompts, which makes the prompts clearer and more concise, significantly improves the model's ability to handle spatial-relationship problems, and makes the model's behavior easier to understand. However, challenges remain in terms of scalability, applicability, integration with other techniques, and the interpretability of symbol-based inference. Notably, with CoS, ChatGPT's accuracy on the Brick World spatial task improved dramatically, jumping from 31.8% to 92.6%. In addition, simplifying the prompt reduced the number of symbols required by up to 65.8%, improving efficiency while maintaining a high level of accuracy.

2.6 Tree-of-Thoughts (ToT) Prompting

Yao et al. (2023) and Long (2023) propose a novel prompting framework called ToT, which aims to enhance the ability of models to handle complex tasks that require deep exploration and forward-looking thinking. ToT extends existing prompting methods by building a tree-like structure of intermediate reasoning steps, called "thoughts". Each thought is a coherent sequence of language that moves toward the final answer, and this structure allows the language model to deliberately assess how much each thought contributes to solving the problem. ToT enables systematic exploration of the reasoning process by combining functions that generate and evaluate thoughts with search algorithms such as breadth-first or depth-first search. This allows the model to expand a promising partial solution, or to backtrack when an error is encountered. In the Game of 24 task, ToT is particularly effective, with a success rate of 74%, significantly higher than the 4% of traditional methods. ToT also performed well on word-level tasks, with a success rate of 60%, significantly higher than the 16% of traditional methods.
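The following is a highly simplified beam-search sketch of the ToT idea, not the authors' implementation. `propose` and `score` stand in for LLM calls that, respectively, generate candidate next thoughts and rate a partial solution between 0 and 1; the beam width and depth are arbitrary.

```python
# Minimal tree-of-thoughts sketch (Yao et al., 2023): breadth-first search
# over partial "thoughts", keeping only the most promising candidates at
# each depth.
from typing import Callable, List

def tree_of_thoughts(question: str,
                     propose: Callable[[str, str], List[str]],
                     score: Callable[[str, str], float],
                     beam_width: int = 3,
                     depth: int = 3) -> str:
    frontier = [""]                      # partial reasoning paths
    for _ in range(depth):
        candidates = []
        for path in frontier:
            for step in propose(question, path):       # expand each path
                candidates.append(path + step + "\n")
        if not candidates:                              # nothing left to expand
            break
        # keep the top-scoring partial paths (the "beam")
        candidates.sort(key=lambda p: score(question, p), reverse=True)
        frontier = candidates[:beam_width]
    return frontier[0]                   # best reasoning path found
```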
2.7 Graph-of-Thoughts (GoT) Prompting

Human thought is often non-linear and does not proceed strictly step by step, which poses a challenge for chain- and tree-based prompting approaches. In response, Yao et al. (2023) proposed the Graph-of-Thoughts (GoT) prompting method. It simulates the non-linear thinking pattern of the human brain by constructing a thought graph, so that information can jump freely, be revisited, and be integrated across different reasoning paths. This makes it possible to reason from multiple perspectives, breaking through the limitations of traditional linear thinking. The core innovation of GoT is to treat the reasoning process as a directed graph structure and to support diverse transformations of thoughts through a flexible, modular design. This approach not only comes closer to the way humans think but also significantly strengthens the model's ability to handle complex problems. In practice, GoT shows significant performance improvements over traditional CoT prompting on multiple tasks. For example, on the GSM8K dataset, the accuracy of the T5-base and T5-large models improves by 3.41% and 5.08%, respectively, and compared with the state-of-the-art multimodal CoT method, accuracy on ScienceQA increases by 6.63% and 1.09%, respectively.

2.8 System 2 Attention (S2A) Prompting

In LLM applications, soft attention can sometimes latch onto irrelevant information, which reduces the accuracy of generated answers. To overcome this, Weston and Sukhbaatar (2023) propose S2A, which significantly improves the quality of information processing and the relevance of responses by regenerating the input context so that the model can focus on the most critical parts of the information. Specifically, S2A works in two stages: first the context is regenerated, then the answer is produced from this refined context. The method was tested on a variety of tasks, including factual question answering, long-form text generation, and math word problems. On the factual question answering task, S2A achieves an accuracy of 80.3%, significantly improving factuality; on long-form generation, it also improves the objectivity of the text, with a score of 3.82 out of 5.

2.9 Thread of Thought (ThoT) Prompting

Zhou et al. (2023) proposed ThoT, a technique designed to improve the reasoning of LLMs in complex, cluttered contexts. The approach mimics the human thought process by breaking a complex context down into smaller, more manageable pieces and analyzing them step by step. It uses a two-stage strategy: each part is first summarized and examined, and then the information is further refined to arrive at the final answer. One of ThoT's strengths is its flexibility: it can be used as a versatile "plug-and-play" component that improves the inference of many models and prompting techniques. When tested on question answering and conversational datasets, especially in complex scenarios, ThoT showed significant performance gains of 47.20% and 17.8%, respectively.
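A minimal sketch of the two ThoT stages follows. The `llm` function is a placeholder for any completion API, and the trigger sentence is paraphrased from the paper rather than quoted.

```python
# Two-stage Thread-of-Thought flow (Zhou et al., 2023): first walk through a
# long, messy context piece by piece, then distill a final answer from that
# walkthrough.
def llm(prompt: str) -> str:
    """Stand-in for a real completion call; plug in your own API here."""
    raise NotImplementedError

def thread_of_thought(context: str, question: str) -> str:
    # Stage 1: step-through analysis of the context.
    analysis = llm(
        f"{context}\nQ: {question}\n"
        "Walk me through this context in manageable parts step by step, "
        "summarizing and analyzing as we go."
    )
    # Stage 2: distill the final answer from the analysis.
    return llm(
        f"{context}\nQ: {question}\nAnalysis: {analysis}\n"
        "Therefore, the answer is:"
    )
```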
2.10 Chain-of-Table Prompting
Traditional methods such as CoT, PoT, and ToT represent reasoning steps as free-form text or code, which runs into difficulty on complex tabular data. To address this, Wang et al. (2024) developed the Chain-of-Table prompting method. It implements a dynamic tabular reasoning process by performing step-by-step SQL/DataFrame operations on the table, with each iteration refining the intermediate result, thereby improving the LLM's ability to reach predictions through a logical chain of inference. Notably, Chain-of-Table achieves significant performance improvements of 8.69% and 6.72% on the TabFact and WikiTQ tabular benchmarks, respectively.
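The loop below is only an illustrative reading of that idea: the tiny operation set (SELECT/FILTER/ANSWER), the prompt wording, and the `llm` call are assumptions made for the sketch, not the paper's actual operation pool.

```python
# Illustrative chain-of-table loop (Wang et al., 2024): the model picks one
# table operation at a time, the operation is applied to a DataFrame, and the
# updated table is fed back until the model is ready to answer.
import pandas as pd

def llm(prompt: str) -> str:
    """Stand-in for a real completion call; plug in your own API here."""
    raise NotImplementedError

def chain_of_table(df: pd.DataFrame, question: str, max_steps: int = 5) -> str:
    for _ in range(max_steps):
        decision = llm(
            f"Table:\n{df.to_string(index=False)}\nQuestion: {question}\n"
            "Reply 'SELECT <column>' to keep one column, "
            "'FILTER <column>=<value>' to keep matching rows, or 'ANSWER'."
        )
        if decision.startswith("SELECT"):
            df = df[[decision.split(maxsplit=1)[1].strip()]]
        elif decision.startswith("FILTER"):
            col, val = decision.split(maxsplit=1)[1].split("=", 1)
            df = df[df[col.strip()].astype(str) == val.strip()]
        else:
            break
    return llm(f"Table:\n{df.to_string(index=False)}\n"
               f"Question: {question}\nAnswer:")
```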
🌟3. Reduce Hallucination
3.1 Retrieval Augmented Generation (RAG)

While LLMs have made breakthroughs in text generation, their reliance on limited, fixed training data restricts their ability to give accurate answers on tasks that require extensive external knowledge. Traditional prompting techniques cannot overcome this limitation, and retraining the model is costly. Faced with this challenge, Lewis et al. (2020) proposed Retrieval Augmented Generation (RAG), which offers a new solution by seamlessly integrating information retrieval into the prompting process. RAG analyzes the user input, generates targeted queries, retrieves relevant information from a pre-built knowledge base, and then incorporates the retrieved passages into the original prompt as added context. This approach not only improves the accuracy and relevance of the answers but also, thanks to its flexibility, breaks through the limitations of traditional models, bringing significant improvements to tasks that rely on up-to-date knowledge. On standard open-domain question answering (ODQA) benchmarks, the RAG model outperformed seq2seq models and task-specific architectures, achieving an exact match score of 56.8% on the TriviaQA dataset and 44.5% on the Natural Questions dataset. For a detailed introduction to RAG and hands-on practice, please refer to my other articles.

3.2 ReAct Prompting

Unlike traditional research that treats reasoning and action as independent elements, Yao et al. (2022) proposed ReAct, which lets LLMs act while generating reasoning. This integrated approach fosters greater synergy between reasoning and action, allowing the model to develop, track, and update its action plan more effectively in the face of unexpected events. ReAct has been applied to a variety of language processing and decision-making tasks and outperforms previous state-of-the-art methods. In particular, on the HotpotQA and Fever tasks, ReAct interacts with the Wikipedia API to effectively address hallucination and error propagation, providing a clearer path to the solution. ReAct also performed well in interactive decision-making tasks such as ALFWorld and WebShop, improving success rates by 34% and 10%, respectively, with only minimal in-context examples.
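Here is a bare-bones sketch of a ReAct loop. The single `search` tool, the prompt wording, and the `llm` call are placeholders chosen for the sketch; the paper's own setup (e.g., its Wikipedia API tools) is richer than this.

```python
# Minimal ReAct loop (Yao et al., 2022): the model interleaves Thought /
# Action / Observation lines, and we execute each Action with a tool before
# asking for the next step.
def llm(prompt: str) -> str:
    """Stand-in for a real completion call; plug in your own API here."""
    raise NotImplementedError

def search(query: str) -> str:
    """Stand-in for a retrieval or Wikipedia lookup tool."""
    raise NotImplementedError

def react(question: str, max_turns: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        step = llm(
            transcript +
            "Continue with one 'Thought:' line and one 'Action:' line, "
            "using Action: Search[<query>] or Action: Finish[<answer>]."
        )
        transcript += step + "\n"
        if "Finish[" in step:
            return step.split("Finish[", 1)[1].split("]", 1)[0]
        if "Search[" in step:
            query = step.split("Search[", 1)[1].split("]", 1)[0]
            transcript += f"Observation: {search(query)}\n"
    return transcript  # ran out of turns; return the trace for inspection
```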
3.3 Chain-of-Verification (CoVe) Prompting
To address hallucination, Dhuliawala et al. (2023) proposed a method called CoVe. The method has four main steps:
1. Generate an initial answer.
2. Plan verification questions to check the work.
3. Answer these questions independently.
4. Revise the preliminary answer based on the verification results.
CoVe mimics the human process of verifying one's own work, improving the consistency and accuracy of LLM outputs. On tasks such as list questions, question answering, and long-form text generation, CoVe effectively reduces fabricated information while preserving the factuality of what is provided. With well-designed verification questions, the model is able to identify and correct its own errors, resulting in a significant increase in accuracy.
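The four steps map directly onto a small pipeline, sketched below under the assumption of a placeholder `llm` completion call and simple line-per-question parsing.

```python
# Sketch of the four chain-of-verification steps (Dhuliawala et al., 2023).
def llm(prompt: str) -> str:
    """Stand-in for a real completion call; plug in your own API here."""
    raise NotImplementedError

def chain_of_verification(question: str) -> str:
    # 1. Generate an initial (possibly flawed) answer.
    draft = llm(f"Q: {question}\nA:")
    # 2. Plan short fact-checking questions about the draft.
    plan = llm(f"Question: {question}\nDraft answer: {draft}\n"
               "List short verification questions, one per line.")
    checks = [q.strip() for q in plan.splitlines() if q.strip()]
    # 3. Answer each verification question independently of the draft.
    answers = [llm(f"Answer concisely and factually: {q}") for q in checks]
    qa = "\n".join(f"{q} -> {a}" for q, a in zip(checks, answers))
    # 4. Revise the draft so it agrees with the verified facts.
    return llm(f"Question: {question}\nDraft answer: {draft}\n"
               f"Verification results:\n{qa}\n"
               "Rewrite the draft answer so it is consistent with the "
               "verification results.")
```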
🌟4. User Interface
4.1 Active Prompting

Active Prompting, developed by Diao et al. (2023), aims to make LLMs more effective at adapting to a variety of complex inference tasks. The approach introduces task-specific example prompts annotated with chain-of-thought reasoning to improve the model's performance on complex question answering. Unlike traditional CoT, which relies on a fixed set of exemplars, Active Prompting focuses on identifying and selecting the questions that the model is most uncertain about and that would therefore be most helpful to annotate. The method is inspired by uncertainty-based active learning and optimizes the question selection process by evaluating different uncertainty metrics. Across eight complex inference tasks, Active Prompting significantly outperformed the self-consistency strategy, achieving average improvements of 7.0% and 1.8% on the text-davinci-002 and code-davinci-002 models, respectively.

🌟5. Fine-Tuning and Optimization

5.1 Automatic Prompt Engineer (APE)

Designing effective prompts for LLMs generally requires expert crafting, which is a complex task. The APE technique proposed by Zhou et al. (2022) opens up a way to create and select instructions automatically. APE breaks through the limitations of manual, fixed prompts: it analyzes the user input, has the model draft a series of candidate instructions, and then scores the candidates to select the most effective prompt for the specific task, adapting to different scenarios as needed. After extensive testing on a variety of BIG-Bench suites and CoT tasks, APE showed significant results, outperforming human-written prompts in most cases (19 of 24 tasks) and noticeably enhancing the inference performance of LLMs. APE thus gives LLMs a more efficient and flexible way to handle a wider range of tasks, maximizing their potential in a variety of applications.
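A rough APE-style loop is sketched below. The `llm` call, the number of candidates, and the exact-match scoring rule on a small dev set are simplifying assumptions for illustration, not the paper's full scoring and search procedure.

```python
# APE-style sketch (Zhou et al., 2022): have the model propose candidate
# instructions from a few input/output demonstrations, score each candidate
# on a small dev set, and keep the best one.
def llm(prompt: str) -> str:
    """Stand-in for a real completion call; plug in your own API here."""
    raise NotImplementedError

def propose_instructions(demos: list[tuple[str, str]], n: int = 8) -> list[str]:
    demo_text = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    # Ask the model to infer what instruction could have produced the demos.
    return [llm(f"{demo_text}\nThe instruction was:") for _ in range(n)]

def score(instruction: str, dev_set: list[tuple[str, str]]) -> float:
    hits = sum(llm(f"{instruction}\nInput: {x}\nOutput:").strip() == y
               for x, y in dev_set)
    return hits / len(dev_set)

def ape(demos: list[tuple[str, str]], dev_set: list[tuple[str, str]]) -> str:
    candidates = propose_instructions(demos)
    return max(candidates, key=lambda inst: score(inst, dev_set))
```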
🌟6. Knowledge-Based Reasoning and Generation

6.1 Automatic Reasoning and Tool-use (ART)

LLMs are limited in handling complex tasks by their restricted inference capabilities and their inability to use external tools. In response, Paranjape et al. (2023) proposed the ART technique, which gives LLMs the ability to reason through multi-step processes and to seamlessly integrate external knowledge. ART fills this inference gap, enabling LLMs to handle problems that go far beyond simple text generation. By integrating external expertise and computational tools, ART brings new versatility and utility to LLMs, enabling them to contribute in areas such as scientific research, data analysis, and decision support. ART eliminates tedious manual design by automating the reasoning steps as structured programs, and its dynamic tool-integration capability ensures smooth collaboration with external tools. In empirical testing on two challenging benchmarks, BIG-Bench and MMLU, ART demonstrated excellent results, not only surpassing traditional prompting techniques but in some cases even matching carefully hand-crafted demonstrations.

🌟7. Improving Consistency and Coherence

7.1 Contrastive Chain-of-Thought (CCoT) Prompting

Traditional CoT prompting misses the opportunity to learn from mistakes. To address this, Chia et al. (2023) proposed the CCoT technique. It guides the model by providing examples of both correct and incorrect reasoning, like a map that shows both the right path and the wrong turns, and this is what makes CCoT distinctive. The dual-view approach has been validated on reasoning benchmarks such as SQuAD and COPA, leading LLMs to perform stepwise inference and achieving a 4% to 16% improvement over traditional CoT on strategic and mathematical reasoning evaluations; combined with self-consistency, performance improves by roughly a further 5%. However, the technique still faces challenges, such as how to automatically generate contrastive examples for different problems and how well it applies to natural language processing tasks beyond reasoning.
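The contrast lives entirely in the demonstration. The snippet below just builds such a prompt string; the worked pen-price example and its deliberately wrong variant are ours, made up for illustration rather than taken from the paper.

```python
# Illustrative contrastive chain-of-thought prompt (Chia et al., 2023): the
# demonstration shows both a correct reasoning chain and an incorrect one,
# so the model sees what to imitate and what to avoid.
CONTRASTIVE_DEMO = (
    "Q: A shop sells pens at 3 dollars each. How much do 4 pens cost?\n"
    "Correct explanation: Each pen costs 3 dollars, so 4 pens cost "
    "4 * 3 = 12 dollars. The answer is 12.\n"
    "Incorrect explanation: 4 pens plus 3 dollars is 4 + 3 = 7 dollars. "
    "The answer is 7. (This wrongly adds instead of multiplying.)\n\n"
)

def build_ccot_prompt(question: str) -> str:
    return (CONTRASTIVE_DEMO +
            f"Q: {question}\nGive a correct explanation, then the answer.")

print(build_ccot_prompt("A book costs 8 dollars. How much do 5 books cost?"))
```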
🌟8. Managing Emotions and Tone
8.1 Emotion Prompting

While LLMs have demonstrated excellent performance on many tasks, their ability to understand psychological and emotional cues leaves much to be desired. To address this, Li et al. (2023) proposed the EmotionPrompt technique. Inspired by psychological research on how language affects human emotional responses, it adds 11 emotional stimulus sentences to the prompt with the aim of enhancing the emotional intelligence of LLMs. Experimental results show that these emotional stimulus sentences significantly improve the performance of LLMs on various tasks: EmotionPrompt achieves an 8% improvement on instruction induction tasks and a leap of up to 115% on BIG-Bench tasks, demonstrating its effectiveness in improving how LLMs handle emotional signals. In addition, an evaluation involving 106 participants showed an average improvement of 10.9% in performance, truthfulness, and responsibility on generative tasks compared to standard prompts.
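Mechanically, the technique is just string concatenation, as the sketch below shows. The stimulus sentence is paraphrased from one of the styles reported in the paper, and any of the 11 stimuli can be swapped in; the review text is an invented example.

```python
# Illustrative EmotionPrompt usage (Li et al., 2023): append an emotional
# stimulus sentence to an otherwise ordinary prompt.
EMOTIONAL_STIMULUS = "This is very important to my career."

def build_emotion_prompt(task: str) -> str:
    return f"{task} {EMOTIONAL_STIMULUS}"

print(build_emotion_prompt(
    "Determine whether the following review is positive or negative: "
    "'The battery died after two days.'"
))
```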
🌟9. Code Generation and Execution
9.1 Scratchpad Prompting

While Transformer-based LLMs excel at writing code for simple programming tasks, they struggle with complex, multi-step algorithmic computations that require precise reasoning. In response, Nye et al. (2021) proposed an approach that focuses on task design rather than modifying the model itself, introducing the concept of a "scratchpad". This strategy lets the model generate a series of intermediate steps before giving its final answer. With scratchpad prompting, the model's success rate on MBPP-aug reached 46.8%; combined with CodeNet and the single-line dataset, the model showed its best performance, with 26.6% of final outputs correct and 24.6% of execution traces perfect. However, scratchpad prompting has its limitations, including a context window fixed at 512 tokens and a heavy reliance on supervised learning to use the scratchpad effectively.

9.2 Program of Thoughts (PoT) Prompting

LLMs are prone to arithmetic errors, handle complex equations poorly, and express complex iteration inefficiently. To strengthen their numerical reasoning, Chen et al. (2022) proposed PoT, which delegates computational steps to an external language interpreter. With this approach, models such as Codex write and execute Python programs to express their reasoning, improving performance by about 12% on average over traditional CoT prompting on datasets of math word problems and financial questions.
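A minimal PoT-style sketch is shown below. The `llm` call, the prompt wording, and the convention that the generated code stores its result in a variable named `ans` are assumptions for the sketch; executing model-written code should of course happen in a sandbox.

```python
# Program-of-thoughts sketch (Chen et al., 2022): ask the model to write
# Python that computes the answer, then run that code with a real interpreter
# instead of trusting the model's arithmetic.
def llm(prompt: str) -> str:
    """Stand-in for a real completion call; plug in your own API here."""
    raise NotImplementedError

def program_of_thoughts(question: str):
    code = llm(
        f"# Question: {question}\n"
        "# Write Python that computes the answer and stores it in `ans`.\n"
    )
    namespace: dict = {}
    exec(code, namespace)   # the interpreter replaces error-prone mental math
    return namespace.get("ans")
```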
9.3 Structured Chain-of-Thought (SCoT) Prompting

The CoT approach commonly used by LLMs for code generation first produces intermediate reasoning steps in natural language before generating code. While this works well for natural language generation, it is less accurate for code generation tasks. In response, Li et al. (2023) proposed SCoT, a prompting method designed specifically for code generation. SCoT significantly improves LLMs' ability to generate structured source code by incorporating program structures such as sequences, branches, and loops into the reasoning step. This approach emphasizes analyzing the requirements from a source-code perspective, yielding a marked improvement in code generation over traditional CoT prompting. The method was validated on three benchmarks (HumanEval, MBPP, and MBCPP) with ChatGPT and Codex, where its performance was up to 13.79% higher than that of CoT prompting.

9.4 Chain-of-Code (CoC) Prompting

Although CoT is excellent at improving the semantic reasoning of LLMs, it is less effective on problems that require numerical or symbolic reasoning. In response, Li et al. (2023) proposed the CoC technique, which strengthens the model's reasoning on both logical and semantic tasks through programming. CoC encourages LLMs to express semantic subtasks as flexible pseudocode, so that an interpreter can catch undefined behavior and hand it off to an "LMulator" that simulates the execution. Experimental results show that CoC surpasses CoT and other baselines with an accuracy of 84% on the BIG-Bench Hard test, a 12% improvement.

🌟10. Optimization and Efficiency

10.1 Optimization by Prompting (OPRO)

Finding the best solution in many fields often requires trial and error. Yang et al. (2023) proposed an innovative idea: use LLMs themselves to search for solutions, an approach known as OPRO. Its distinctive feature is that it uses LLM prompts to improve a solution step by step from a natural-language description of the problem, allowing it to adapt quickly to different problems and adjust the search process as needed. Through case studies of classic problems such as linear regression and the traveling salesman problem, the study demonstrates the potential of LLMs as optimizers. It also explores how to optimize prompts for the highest accuracy on natural language tasks, further illustrating how sensitive LLMs are to prompt phrasing. With OPRO-optimized prompts, performance improves by up to 8% on the GSM8K dataset and by up to 50% on some of the more challenging Big-Bench tasks compared to manually designed prompts.

🌟11. Understanding User Intent

11.1 Rephrase and Respond (RaR) Prompting

Deng et al. (2023) pointed out that when we use LLMs, we often overlook the gap between the way humans think and the way LLMs think. To bridge this gap, they proposed RaR, which has the LLM rephrase and expand the question in the prompt before answering, improving both its understanding of the question and the accuracy of its answer. By combining rephrasing and answering, RaR's two-step approach delivers significant performance gains across a wide range of tasks. The study found that rephrased questions convey the semantics more clearly and carry less ambiguity than questions as humans casually ask them. These findings provide valuable insights for understanding and improving the effectiveness of LLMs in different applications.
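The two-step variant of RaR is easy to sketch. The `llm` call is a placeholder and the rephrasing instruction is paraphrased rather than quoted from the paper.

```python
# Two-step Rephrase-and-Respond sketch (Deng et al., 2023): first ask the
# model to restate and expand the question, then answer the expanded version.
def llm(prompt: str) -> str:
    """Stand-in for a real completion call; plug in your own API here."""
    raise NotImplementedError

def rephrase_and_respond(question: str) -> str:
    rephrased = llm(
        f"{question}\n"
        "Rephrase and expand this question so it is unambiguous."
    )
    return llm(f"Original question: {question}\n"
               f"Rephrased question: {rephrased}\n"
               "Answer the rephrased question:")
```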
🌟12. Metacognition and Self-Reflection
12.1 Take a Step Back Prompting

Faced with the challenge of complex multi-step reasoning, Zheng et al. (2023) proposed Take a Step Back Prompting for advanced language models such as PaLM-2L. This innovation has the model first engage in high-level abstract thinking, drawing out basic principles and high-level concepts from the concrete case. Take a Step Back Prompting uses a two-step process of abstraction followed by inference, and extensive experiments show that applying it to reasoning-intensive tasks such as STEM questions, knowledge-based QA, and multi-step reasoning significantly improves the reasoning capability of PaLM-2L. In particular, performance on the MMLU physics and chemistry tasks, TimeQA, and MuSiQue improved by 7%, 27%, and 7%, respectively.
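The abstraction-then-reasoning flow can be sketched in a few lines. As before, `llm` is a placeholder completion call and the prompt wording is our own paraphrase of the two steps.

```python
# Take-a-step-back sketch (Zheng et al., 2023): first elicit the underlying
# principle behind a specific question, then answer the original question
# conditioned on that principle.
def llm(prompt: str) -> str:
    """Stand-in for a real completion call; plug in your own API here."""
    raise NotImplementedError

def step_back(question: str) -> str:
    # Step 1: abstraction - ask for the general principle or concept.
    principle = llm(
        f"Question: {question}\n"
        "Step back: what general principle or concept is needed to answer this?"
    )
    # Step 2: reasoning - answer the original question using that principle.
    return llm(f"Principle: {principle}\nQuestion: {question}\n"
               "Using the principle above, reason step by step and answer:")
```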