0% found this document useful (0 votes)

36 views471 pages

Langchain API Docs

Uploaded by

胡修雨

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

36 views471 pages

Langchain API Docs

Uploaded by

胡修雨

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 471

LangChain Expression How Inspect your

Language to runnables

On this page

Inspect your runnables

Once you create a runnable with LCEL, you may often want to inspect it to get a better sense for what is going on. This
notebook covers some methods for doing so.

First, let’s create an example LCEL. We will create one that does retrieval

%pip install --upgrade --quiet langchain langchain-openai faiss-cpu tiktoken

from langchain.prompts import ChatPromptTemplate
from langchain.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
vectorstore = FAISS.from_texts(
["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()

template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI()
chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)

Get a graph

You can get a graph of the runnable

chain.get_graph()

Print a graph

While that is not super legible, you can print it to get a display that’s easier to understand

chain.get_graph().print_ascii()
+---------------------------------+
| Parallel<context,question>Input |
+---------------------------------+
** **
*** ***
** **
+----------------------+ +-------------+
| VectorStoreRetriever | | Passthrough |
+----------------------+ +-------------+
** **
*** ***
** **
+----------------------------------+
| Parallel<context,question>Output |
+----------------------------------+
*
*
*
+--------------------+
| ChatPromptTemplate |
+--------------------+
*
*
*
+------------+
| ChatOpenAI |
+------------+
*
*
*
+-----------------+
| StrOutputParser |
+-----------------+
*
*
*
+-----------------------+
| StrOutputParserOutput |
+-----------------------+

Get the prompts

An important part of every chain is the prompts that are used. You can get the prompts present in the chain:

chain.get_prompts()
[ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'quest

Help us out by providing feedback on this documentation page:

Previous
« Stream custom generator functions
Next
Add message history (memory) »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O

On this page

Model I/O
The core element of any language model application is...the model. LangChain gives you the building blocks to interface with
any language model.

Conceptual Guide

A conceptual explanation of messages, prompts, LLMs vs ChatModels, and output parsers. You should read this before
getting started.

Quickstart

Covers the basics of getting started working with diﬀerent types of models. You should walk throughthis section if you want to
get an overview of the functionality.

Prompts

This section deep dives into the diﬀerent types of prompt templates and how to use them.

LLMs

This section covers functionality related to the LLM class. This is a type of model that takes a text string as input and returns
a text string.

ChatModels
This section covers functionality related to the ChatModel class. This is a type of model that takes a list of messages as input
and returns a message.

Output Parsers

Output parsers are responsible for transforming the output of LLMs and ChatModels into more structured data.This section
covers the diﬀerent types of output parsers.

Help us out by providing feedback on this documentation page:

Previous
« Modules
Next
Model I/O »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesRetrievalRetrieversTime-weighted vector store retriever

On this page

Time-weighted vector store retriever

This retriever uses a combination of semantic similarity and a time decay.

The algorithm for scoring them is:

semantic_similarity + (1.0 - decay_rate) ^ hours_passed

Notably, hours_passed refers to the hours passed since the object in the retrieverwas last accessed, not since it was created.
This means that frequently accessed objects remain “fresh”.

from datetime import datetime, timedelta

import faiss
from langchain.docstore import InMemoryDocstore
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

Low decay rate

A low decay rate (in this, to be extreme, we will set it close to 0) means memories will be “remembered” for longer. Adecay rate
of 0 means memories never be forgotten, making this retriever equivalent to the vector lookup.

# Deﬁne your embedding model

embeddings_model = OpenAIEmbeddings()
# Initialize the vectorstore as empty
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model, index, InMemoryDocstore({}), {})
retriever = TimeWeightedVectorStoreRetriever(
vectorstore=vectorstore, decay_rate=0.0000000000000000000000001, k=1
)
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents(
[Document(page_content="hello world", metadata={"last_accessed_at": yesterday})]
)
retriever.add_documents([Document(page_content="hello foo")])
['c3dcf671-3c0a-4273-9334-c4a913076bfa']
# "Hello World" is returned ﬁrst because it is most salient, and the decay rate is close to 0., meaning it's still recent enough
retriever.get_relevant_documents("hello world")
[Document(page_content='hello world', metadata={'last_accessed_at': datetime.datetime(2023, 12, 27, 15, 30, 18, 457125), 'created_at': datetime.datetime(2023, 12

High decay rate

With a high decay rate (e.g., several 9’s), the recency score quickly goes to 0! If you set this all the way to 1,recency is 0 for all
objects, once again making this equivalent to a vector lookup.
# Deﬁne your embedding model
embeddings_model = OpenAIEmbeddings()
# Initialize the vectorstore as empty
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model, index, InMemoryDocstore({}), {})
retriever = TimeWeightedVectorStoreRetriever(
vectorstore=vectorstore, decay_rate=0.999, k=1
)
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents(
[Document(page_content="hello world", metadata={"last_accessed_at": yesterday})]
)
retriever.add_documents([Document(page_content="hello foo")])
['eb1c4c86-01a8-40e3-8393-9a927295a950']
# "Hello Foo" is returned ﬁrst because "hello world" is mostly forgotten
retriever.get_relevant_documents("hello world")
[Document(page_content='hello foo', metadata={'last_accessed_at': datetime.datetime(2023, 12, 27, 15, 30, 50, 57185), 'created_at': datetime.datetime(2023, 12, 27

Virtual time

Using some utils in LangChain, you can mock out the time component.

import datetime

from langchain.utils import mock_now

# Notice the last access time is that date time
with mock_now(datetime.datetime(2024, 2, 3, 10, 11)):
print(retriever.get_relevant_documents("hello world"))
[Document(page_content='hello world', metadata={'last_accessed_at': MockDateTime(2024, 2, 3, 10, 11), 'created_at': datetime.datetime(2023, 12, 27, 15, 30, 44, 53

Help us out by providing feedback on this documentation page:

Previous
« Self-querying
Next
Indexing »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Memory
ModulesMoreMemorytypes

Memory types
There are many diﬀerent types of memory. Each has their own parameters, their own return types, and is useful in diﬀerent
scenarios. Please see their individual page for more detail on each one.

Help us out by providing feedback on this documentation page:

Previous
« Chat Messages
Next
Conversation Buﬀer »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression
Language CookbookAdding moderation

Adding moderation
This shows how to add in moderation (or other safeguards) around your LLM application.

%pip install --upgrade --quiet langchain langchain-openai

from langchain.chains import OpenAIModerationChain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAI
moderate = OpenAIModerationChain()
model = OpenAI()
prompt = ChatPromptTemplate.from_messages([("system", "repeat after me: {input}")])
chain = prompt | model
chain.invoke({"input": "you are stupid"})
'\n\nYou are stupid.'
moderated_chain = chain | moderate
moderated_chain.invoke({"input": "you are stupid"})
{'input': '\n\nYou are stupid',
'output': "Text was found that violates OpenAI's content policy."}

Help us out by providing feedback on this documentation page:

Previous
« Adding memory
Next
Managing prompt size »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
ModulesAgentsTools

On this page

Tools
Tools are interfaces that an agent can use to interact with the world. They combine a few things:

1. The name of the tool

2. A description of what the tool is
3. JSON schema of what the inputs to the tool are
4. The function to call
5. Whether the result of a tool should be returned directly to the user

It is useful to have all this information because this information can be used to build action-taking systems! The name,
description, and JSON schema can be used to prompt the LLM so it knows how to specify what action to take, and then the
function to call is equivalent to taking that action.

The simpler the input to a tool is, the easier it is for an LLM to be able to use it. Many agents will only work with tools that
have a single string input. For a list of agent types and which ones work with more complicated inputs, please see this
documentation

Importantly, the name, description, and JSON schema (if used) are all used in the prompt. Therefore, it is really important that
they are clear and describe exactly how the tool should be used. You may need to change the default name, description, or
JSON schema if the LLM is not understanding how to use the tool.

Default Tools

Let’s take a look at how to work with tools. To do this, we’ll work with a built in tool.

from langchain_community.tools import WikipediaQueryRun

from langchain_community.utilities import WikipediaAPIWrapper

Now we initialize the tool. This is where we can conﬁgure it as we please

api_wrapper = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=100)

tool = WikipediaQueryRun(api_wrapper=api_wrapper)

This is the default name

tool.name
'Wikipedia'

This is the default description

tool.description
'A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Inpu

This is the default JSON schema of the inputs

tool.args
{'query': {'title': 'Query', 'type': 'string'}}

We can see if the tool should return directly to the user

tool.return_direct
False

We can call this tool with a dictionary input

tool.run({"query": "langchain"})
'Page: LangChain\nSummary: LangChain is a framework designed to simplify the creation of applications '

We can also call this tool with a single string input. We can do this because this tool expects only a single input. If it required
multiple inputs, we would not be able to do that.

tool.run("langchain")
'Page: LangChain\nSummary: LangChain is a framework designed to simplify the creation of applications '

Customizing Default Tools

We can also modify the built in name, description, and JSON schema of the arguments.

When deﬁning the JSON schema of the arguments, it is important that the inputs remain the same as the function, so you
shouldn’t change that. But you can deﬁne custom descriptions for each input easily.

from langchain_core.pydantic_v1 import BaseModel, Field

class WikiInputs(BaseModel):
"""Inputs to the wikipedia tool."""

query: str = Field(

description="query to look up in Wikipedia, should be 3 or less words"
)
tool = WikipediaQueryRun(
name="wiki-tool",
description="look up things in wikipedia",
args_schema=WikiInputs,
api_wrapper=api_wrapper,
return_direct=True,
)
tool.name
'wiki-tool'
tool.description
'look up things in wikipedia'
tool.args
{'query': {'title': 'Query',
'description': 'query to look up in Wikipedia, should be 3 or less words',
'type': 'string'}}
tool.return_direct
True
tool.run("langchain")
'Page: LangChain\nSummary: LangChain is a framework designed to simplify the creation of applications '

Built-In Tools: For a list of all built-in tools, seethis page

Custom Tools: Although built-in tools are useful, it’s highly likely that you’ll have to deﬁne your own tools. Seethis guide for
instructions on how to do so.

Toolkits: Toolkits are collections of tools that work well together. For a more in depth description as well as a list of all built-in
toolkits, see this page

Tools as OpenAI Functions: Tools are very similar to OpenAI Functions, and can easily be converted to that format. See
this notebook for instructions on how to do that.

Help us out by providing feedback on this documentation page:

Previous
« Agents
Next
Toolkits »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Text embedding
ModulesRetrievalmodels

On this page

Text embedding models

INFO

Head to Integrations for documentation on built-in integrations with text embedding model providers.

The Embeddings class is a class designed for interfacing with text embedding models. There are lots of embedding model
providers (OpenAI, Cohere, Hugging Face, etc) - this class is designed to provide a standard interface for all of them.

Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the
vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.

The base Embeddings class in LangChain provides two methods: one for embedding documents and one for embedding a
query. The former takes as input multiple texts, while the latter takes a single text. The reason for having these as two
separate methods is that some embedding providers have diﬀerent embedding methods for documents (to be searched over)
vs queries (the search query itself).

Get started

Setup

OpenAI
Cohere

To start we'll need to install the OpenAI partner package:

pip install langchain-openai

Accessing the API requires an API key, which you can get by creating an account and headinghere. Once we have a key
we'll want to set it as an environment variable by running:

export OPENAI_API_KEY="..."

If you'd prefer not to set an environment variable you can pass the key in directly via theopenai_api_key named parameter
when initiating the OpenAI LLM class:

from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings(openai_api_key="...")

Otherwise you can initialize without any params:

from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings()

embed_documents

Embed list of texts

embeddings = embeddings_model.embed_documents(
[
"Hi there!",
"Oh, hello!",
"What's your name?",
"My friends call me World",
"Hello World!"
]
)
len(embeddings), len(embeddings[0])
(5, 1536)

embed_query

Embed single query

Embed a single piece of text for the purpose of comparing to other embedded pieces of texts.

embedded_query = embeddings_model.embed_query("What was the name mentioned in the conversation?")

embedded_query[:5]
[0.0053587136790156364,
-0.0004999046213924885,
0.038883671164512634,
-0.003001077566295862,
-0.00900818221271038]

Help us out by providing feedback on this documentation page:

Previous
« Retrieval
Next
CacheBackedEmbeddings »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Message Memory in Agent backed by a
ModulesMoreMemorydatabase

Message Memory in Agent backed by a database

This notebook goes over adding memory to an Agent where the memory uses an external message store. Before going
through this notebook, please walkthrough the following notebooks, as this will build on top of both of them:

Memory in LLMChain
Custom Agents
Memory in Agent

In order to add a memory with an external message store to an agent we are going to do the following steps:

1. We are going to create a RedisChatMessageHistory to connect to an external database to store the messages in.
2. We are going to create an LLMChain using that chat history as memory.
3. We are going to use that LLMChain to create a custom Agent.

For the purposes of this exercise, we are going to create a simple custom Agent that has access to a search tool and utilizes
the ConversationBuﬀerMemory class.

from langchain.agents import AgentExecutor, Tool, ZeroShotAgent

from langchain.chains import LLMChain
from langchain.memory import ConversationBuﬀerMemory
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain_community.utilities import GoogleSearchAPIWrapper
from langchain_openai import OpenAI
search = GoogleSearchAPIWrapper()
tools = [
Tool(
name="Search",
func=search.run,
description="useful for when you need to answer questions about current events",
)
]

Notice the usage of the chat_history variable in the PromptTemplate, which matches up with the dynamic key name in the
ConversationBuﬀerMemory.

preﬁx = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suﬃx = """Begin!"

{chat_history}
Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
tools,
prefix=prefix,
suffix=suffix,
input_variables=["input", "chat_history", "agent_scratchpad"],
)

Now we can create the RedisChatMessageHistory backed by the database.

message_history = RedisChatMessageHistory(
url="redis://localhost:6379/0", ttl=600, session_id="my-session"
)

memory = ConversationBuﬀerMemory(
memory_key="chat_history", chat_memory=message_history
)

We can now construct the LLMChain, with the Memory object, and then create the agent.
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_chain = AgentExecutor.from_agent_and_tools(
agent=agent, tools=tools, verbose=True, memory=memory
)
agent_chain.run(input="How many people live in canada?")

> Entering new AgentExecutor chain...

Thought: I need to ﬁnd out the population of Canada
Action: Search
Action Input: Population of Canada
Observation: The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations dat
Thought: I now know the ﬁnal answer
Final Answer: The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations da
> Finished AgentExecutor chain.

'The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations data.'

To test the memory of this agent, we can ask a followup question that relies on information in the previous exchange to be
answered correctly.

agent_chain.run(input="what is their national anthem called?")

> Entering new AgentExecutor chain...

Thought: I need to ﬁnd out what the national anthem of Canada is called.
Action: Search
Action Input: National Anthem of Canada
Observation: Jun 7, 2010 ... https://twitter.com/CanadaImmigrantCanadian National Anthem O Canada in HQ - complete with lyrics, captions, vocals & music.LYRICS
Thought: I now know the ﬁnal answer.
Final Answer: The national anthem of Canada is called "O Canada".
> Finished AgentExecutor chain.

'The national anthem of Canada is called "O Canada".'

We can see that the agent remembered that the previous question was about Canada, and properly asked Google Search
what the name of Canada’s national anthem was.

For fun, let’s compare this to an agent that does NOT have memory.

preﬁx = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suﬃx = """Begin!"

Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
tools, prefix=prefix, suffix=suffix, input_variables=["input", "agent_scratchpad"]
)
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_without_memory = AgentExecutor.from_agent_and_tools(
agent=agent, tools=tools, verbose=True
)
agent_without_memory.run("How many people live in canada?")

> Entering new AgentExecutor chain...

'The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations data.'
agent_without_memory.run("what is their national anthem called?")

> Entering new AgentExecutor chain...

Thought: I should look up the answer
Action: Search
Action Input: national anthem of [country]
Observation: Most nation states have an anthem, deﬁned as "a song, as of praise, devotion, or patriotism"; most anthems are either marches or hymns in style. List o
Thought: I now know the ﬁnal answer
Final Answer: The national anthem of [country] is [name of anthem].
> Finished AgentExecutor chain.
'The national anthem of [country] is [name of anthem].'

Help us out by providing feedback on this documentation page:

Previous
« Memory in Agent
Next
Customizing Conversational Memory »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O PromptsPipeline

Pipeline
This notebook goes over how to compose multiple prompts together. This can be useful when you want to reuse parts of
prompts. This can be done with a PipelinePrompt. A PipelinePrompt consists of two main parts:

Final prompt: The ﬁnal prompt that is returned

Pipeline prompts: A list of tuples, consisting of a string name and a prompt template. Each prompt template will be
formatted and then passed to future prompt templates as a variable with the same name.

from langchain.prompts.pipeline import PipelinePromptTemplate

from langchain.prompts.prompt import PromptTemplate
full_template = """{introduction}

{example}

{start}"""
full_prompt = PromptTemplate.from_template(full_template)
introduction_template = """You are impersonating {person}."""
introduction_prompt = PromptTemplate.from_template(introduction_template)
example_template = """Here's an example of an interaction:

Q: {example_q}
A: {example_a}"""
example_prompt = PromptTemplate.from_template(example_template)
start_template = """Now, do this for real!

Q: {input}
A:"""
start_prompt = PromptTemplate.from_template(start_template)
input_prompts = [
("introduction", introduction_prompt),
("example", example_prompt),
("start", start_prompt),
]
pipeline_prompt = PipelinePromptTemplate(
ﬁnal_prompt=full_prompt, pipeline_prompts=input_prompts
)
pipeline_prompt.input_variables
['example_q', 'example_a', 'input', 'person']
print(
pipeline_prompt.format(
person="Elon Musk",
example_q="What's your favorite car?",
example_a="Tesla",
input="What's your favorite social media site?",
)
)
You are impersonating Elon Musk.

Here's an example of an interaction:

Q: What's your favorite car?

A: Tesla

Now, do this for real!

Q: What's your favorite social media site?

Help us out by providing feedback on this documentation page:

Previous
« Partial prompt templates
Next
Chat Models »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesMoreCallbacksAsync callbacks

Async callbacks
If you are planning to use the async API, it is recommended to useAsyncCallbackHandler to avoid blocking the runloop.

Advanced if you use a sync CallbackHandler while using an async method to run your LLM / Chain / Tool / Agent, it will still
work. However, under the hood, it will be called with run_in_executor which can cause issues if your CallbackHandler is not thread-
safe.

import asyncio
from typing import Any, Dict, List

from langchain.callbacks.base import AsyncCallbackHandler, BaseCallbackHandler

from langchain_core.messages import HumanMessage, LLMResult
from langchain_openai import ChatOpenAI

class MyCustomSyncHandler(BaseCallbackHandler):
def on_llm_new_token(self, token: str, **kwargs) -> None:
print(f"Sync handler being called in a `thread_pool_executor`: token: {token}")

class MyCustomAsyncHandler(AsyncCallbackHandler):
"""Async callback handler that can be used to handle callbacks from langchain."""

async def on_llm_start(

self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> None:
"""Run when chain starts running."""
print("zzzz....")
await asyncio.sleep(0.3)
class_name = serialized["name"]
print("Hi! I just woke up. Your llm is starting")

async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:

"""Run when chain ends running."""
print("zzzz....")
await asyncio.sleep(0.3)
print("Hi! I just woke up. Your llm is ending")

# To enable streaming, we pass in `streaming=True` to the ChatModel constructor

# Additionally, we pass in a list with our custom handler
chat = ChatOpenAI(
max_tokens=25,
streaming=True,
callbacks=[MyCustomSyncHandler(), MyCustomAsyncHandler()],
)

await chat.agenerate([[HumanMessage(content="Tell me a joke")]])

zzzz....
Hi! I just woke up. Your llm is starting
Sync handler being called in a `thread_pool_executor`: token:
Sync handler being called in a `thread_pool_executor`: token: Why
Sync handler being called in a `thread_pool_executor`: token: don
Sync handler being called in a `thread_pool_executor`: token: 't
Sync handler being called in a `thread_pool_executor`: token: scientists
Sync handler being called in a `thread_pool_executor`: token: trust
Sync handler being called in a `thread_pool_executor`: token: atoms
Sync handler being called in a `thread_pool_executor`: token: ?
Sync handler being called in a `thread_pool_executor`: token:

Sync handler being called in a `thread_pool_executor`: token: Because

Sync handler being called in a `thread_pool_executor`: token: they
Sync handler being called in a `thread_pool_executor`: token: make
Sync handler being called in a `thread_pool_executor`: token: up
Sync handler being called in a `thread_pool_executor`: token: everything
Sync handler being called in a `thread_pool_executor`: token: .
Sync handler being called in a `thread_pool_executor`: token:
zzzz....
Hi! I just woke up. Your llm is ending
LLMResult(generations=[[ChatGeneration(text="Why don't scientists trust atoms? \n\nBecause they make up everything.", generation_info=None, message=AIMessa

Help us out by providing feedback on this documentation page:

Previous
« Callbacks
Next
Custom callback handlers »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How Bind runtime
Language to args

On this page

Bind runtime args

Sometimes we want to invoke a Runnable within a Runnable sequence with constant arguments that are not part of the
output of the preceding Runnable in the sequence, and which are not part of the user input. We can use Runnable.bind() to
easily pass these arguments in.

Suppose we have a simple prompt + model sequence:

%pip install --upgrade --quiet langchain langchain-openai

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"Write out the following equation using algebraic symbols then solve it. Use the format\n\nEQUATION:...\nSOLUTION:...\n\n",
),
("human", "{equation_statement}"),
]
)
model = ChatOpenAI(temperature=0)
runnable = (
{"equation_statement": RunnablePassthrough()} | prompt | model | StrOutputParser()
)

print(runnable.invoke("x raised to the third plus seven equals 12"))

EQUATION: x^3 + 7 = 12

SOLUTION:
Subtracting 7 from both sides of the equation, we get:
x^3 = 12 - 7
x^3 = 5

Taking the cube root of both sides, we get:

x = ∛5

Therefore, the solution to the equation x^3 + 7 = 12 is x = ∛5.

and want to call the model with certain stop words:

runnable = (
{"equation_statement": RunnablePassthrough()}
| prompt
| model.bind(stop="SOLUTION")
| StrOutputParser()
)
print(runnable.invoke("x raised to the third plus seven equals 12"))
EQUATION: x^3 + 7 = 12

Attaching OpenAI functions

One particularly useful application of binding is to attach OpenAI functions to a compatible OpenAI model:
function = {
"name": "solver",
"description": "Formulates and solves an equation",
"parameters": {
"type": "object",
"properties": {
"equation": {
"type": "string",
"description": "The algebraic expression of the equation",
},
"solution": {
"type": "string",
"description": "The solution to the equation",
},
},
"required": ["equation", "solution"],
},
}
# Need gpt-4 to solve this one correctly
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"Write out the following equation using algebraic symbols then solve it.",
),
("human", "{equation_statement}"),
]
)
model = ChatOpenAI(model="gpt-4", temperature=0).bind(
function_call={"name": "solver"}, functions=[function]
)
runnable = {"equation_statement": RunnablePassthrough()} | prompt | model
runnable.invoke("x raised to the third plus seven equals 12")
AIMessage(content='', additional_kwargs={'function_call': {'name': 'solver', 'arguments': '{\n"equation": "x^3 + 7 = 12",\n"solution": "x = ∛5"\n}'}}, example=False)

Attaching OpenAI tools

tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
},
}
]
model = ChatOpenAI(model="gpt-3.5-turbo-1106").bind(tools=tools)
model.invoke("What's the weather in SF, NYC and LA?")
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zHN0ZHwrxM7nZDdqTp6dkPko', 'function': {'arguments': '{"location": "San Francisco, CA", "unit": "c

Help us out by providing feedback on this documentation page:

Previous
« RunnableBranch: Dynamically route logic based on input
Next
Conﬁgure chain internals at runtime »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Structured
ModulesAgentsAgent Typeschat

On this page

Structured chat
The structured chat agent is capable of using multi-input tools.

from langchain import hub

from langchain.agents import AgentExecutor, create_structured_chat_agent
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI

Initialize Tools

We will test the agent using Tavily Search

tools = [TavilySearchResults(max_results=1)]

Create Agent

# Get the prompt to use - you can modify this!

prompt = hub.pull("hwchase17/structured-chat-agent")
# Choose the LLM that will drive the agent
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-1106")

# Construct the JSON agent

agent = create_structured_chat_agent(llm, tools, prompt)

Run Agent

# Create an agent executor by passing in the agent and tools

agent_executor = AgentExecutor(
agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)
agent_executor.invoke({"input": "what is LangChain?"})

> Entering new AgentExecutor chain...

Action:
```
{
"action": "tavily_search_results_json",
"action_input": {"query": "LangChain"}
}
```[{'url': 'https://www.ibm.com/topics/langchain', 'content': 'LangChain is essentially a library of abstractions for Python and Javascript, representing common steps an
```
{
"action": "Final Answer",
"action_input": "LangChain is an open source orchestration framework for the development of applications using large language models. It simpliﬁes the process of
}
```

> Finished chain.

{'input': 'what is LangChain?',

'output': 'LangChain is an open source orchestration framework for the development of applications using large language models. It simpliﬁes the process of program
Use with chat history

from langchain_core.messages import AIMessage, HumanMessage

agent_executor.invoke(
{
"input": "what's my name? Do not use tools unless you have to",
"chat_history": [
HumanMessage(content="hi! my name is bob"),
AIMessage(content="Hello Bob! How can I assist you today?"),
],
}
)

> Entering new AgentExecutor chain...

Could not parse LLM output: I understand. Your name is Bob.Invalid or incomplete responseCould not parse LLM output: Apologies for any confusion. Your name is B
"action": "Final Answer",
"action_input": "Your name is Bob."
}

> Finished chain.

{'input': "what's my name? Do not use tools unless you have to",
'chat_history': [HumanMessage(content='hi! my name is bob'),
AIMessage(content='Hello Bob! How can I assist you today?')],
'output': 'Your name is Bob.'}

Help us out by providing feedback on this documentation page:

Previous
« JSON Chat Agent
Next
ReAct »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Get started

Get started
Get started with LangChain

️ Introduction
LangChain is a framework for developing applications powered by language models. It enables applications that:

️ Installation
Oﬃcial release

️ Quickstart
In this quickstart we'll show you how to:

️ Security
LangChain has a large ecosystem of integrations with various external resources like local and remote ﬁle systems, APIs and databases. These integrations
… allow de

Next
Introduction »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Model Example Selector Select by n-gram
ModulesI/O Prompts Types overlap

Select by n-gram overlap

The NGramOverlapExampleSelector selects and orders examples based on which examples are most similar to the input,
according to an ngram overlap score. The ngram overlap score is a ﬂoat between 0.0 and 1.0, inclusive.

The selector allows for a threshold score to be set. Examples with an ngram overlap score less than or equal to the threshold
are excluded. The threshold is set to -1.0, by default, so will not exclude any examples, only reorder them. Setting the
threshold to 0.0 will exclude examples that have no ngram overlaps with the input.

from langchain.prompts import FewShotPromptTemplate, PromptTemplate

from langchain.prompts.example_selector.ngram_overlap import NGramOverlapExampleSelector

example_prompt = PromptTemplate(
input_variables=["input", "output"],
template="Input: {input}\nOutput: {output}",
)

# Examples of a ﬁctional translation task.

examples = [
{"input": "See Spot run.", "output": "Ver correr a Spot."},
{"input": "My dog barks.", "output": "Mi perro ladra."},
{"input": "Spot can run.", "output": "Spot puede correr."},
]
example_selector = NGramOverlapExampleSelector(
# The examples it has available to choose from.
examples=examples,
# The PromptTemplate being used to format the examples.
example_prompt=example_prompt,
# The threshold, at which selector stops.
# It is set to -1.0 by default.
threshold=-1.0,
# For negative threshold:
# Selector sorts examples by ngram overlap score, and excludes none.
# For threshold greater than 1.0:
# Selector excludes all examples, and returns an empty list.
# For threshold equal to 0.0:
# Selector sorts examples by ngram overlap score,
# and excludes those with no ngram overlap with input.
)
dynamic_prompt = FewShotPromptTemplate(
# We provide an ExampleSelector instead of examples.
example_selector=example_selector,
example_prompt=example_prompt,
preﬁx="Give the Spanish translation of every input",
suﬃx="Input: {sentence}\nOutput:",
input_variables=["sentence"],
)
# An example input with large ngram overlap with "Spot can run."
# and no overlap with "My dog barks."
print(dynamic_prompt.format(sentence="Spot can run fast."))
Give the Spanish translation of every input

Input: Spot can run.

Output: Spot puede correr.

Input: See Spot run.

Output: Ver correr a Spot.

Input: My dog barks.

Output: Mi perro ladra.

Input: Spot can run fast.

Output:
# You can add examples to NGramOverlapExampleSelector as well.
new_example = {"input": "Spot plays fetch.", "output": "Spot juega a buscar."}

example_selector.add_example(new_example)
print(dynamic_prompt.format(sentence="Spot can run fast."))
Give the Spanish translation of every input

Input: Spot can run.

Output: Spot puede correr.

Input: See Spot run.

Output: Ver correr a Spot.

Input: Spot plays fetch.

Output: Spot juega a buscar.

Input: My dog barks.

Output: Mi perro ladra.

Input: Spot can run fast.

Output:
# You can set a threshold at which examples are excluded.
# For example, setting threshold equal to 0.0
# excludes examples with no ngram overlaps with input.
# Since "My dog barks." has no ngram overlaps with "Spot can run fast."
# it is excluded.
example_selector.threshold = 0.0
print(dynamic_prompt.format(sentence="Spot can run fast."))
Give the Spanish translation of every input

Input: Spot can run.

Output: Spot puede correr.

Input: See Spot run.

Output: Ver correr a Spot.

Input: Spot plays fetch.

Output: Spot juega a buscar.

Input: Spot can run fast.

Output:
# Setting small nonzero threshold
example_selector.threshold = 0.09
print(dynamic_prompt.format(sentence="Spot can play fetch."))
Give the Spanish translation of every input

Input: Spot can run.

Output: Spot puede correr.

Input: Spot plays fetch.

Output: Spot juega a buscar.

Input: Spot can play fetch.

Output:
# Setting threshold greater than 1.0
example_selector.threshold = 1.0 + 1e-9
print(dynamic_prompt.format(sentence="Spot can play fetch."))
Give the Spanish translation of every input

Input: Spot can play fetch.

Output:

Help us out by providing feedback on this documentation page:

Previous
« Select by maximal marginal relevance (MMR)
Next
Select by similarity »
Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog

YouTube
Model Function
ModulesI/O Chat Modelscalling

On this page

Function calling
A growing number of chat models, like OpenAI, Gemini, etc., have a function-calling API that lets you describe functions and
their arguments, and have the model return a JSON object with a function to invoke and the inputs to that function. Function-
calling is extremely useful for building tool-using chains and agents, and for getting structured outputs from models more
generally.

LangChain comes with a number of utilities to make function-calling easy. Namely, it comes with:

simple syntax for binding functions to models

converters for formatting various types of objects to the expected function schemas
output parsers for extracting the function invocations from API responses
chains for getting structured outputs from a model, built on top of function calling

We’ll focus here on the ﬁrst two points. For a detailed guide on output parsing check out theOpenAI Tools output parsers
and to see the structured output chains check out the Structured output guide.

Before getting started make sure you have langchain-core installed.

%pip install -qU langchain-core langchain-openai

import getpass
import os

Binding functions

A number of models implement helper methods that will take care of formatting and binding diﬀerent function-like objects to
the model. Let’s take a look at how we might take the following Pydantic function schema and get diﬀerent models to invoke
it:

from langchain_core.pydantic_v1 import BaseModel, Field

# Note that the docstrings here are crucial, as they will be passed along
# to the model along with the class name.
class Multiply(BaseModel):
"""Multiply two integers together."""

a: int = Field(..., description="First integer")

b: int = Field(..., description="Second integer")

OpenAI
Fireworks
Mistral
Together

Set up dependencies and API keys:

%pip install -qU langchain-openai

os.environ["OPENAI_API_KEY"] = getpass.getpass()

We can use the ChatOpenAI.bind_tools() method to handle converting Multiply to an OpenAI function and binding it to the model
(i.e., passing it in each time the model is invoked).
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)

llm_with_tools = llm.bind_tools([Multiply])
llm_with_tools.invoke("what's 3 * 12")
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_Q8ZQ97Qrj5zalugSkYMGV1Uo', 'function': {'arguments': '{"a":3,"b":12}', 'name': 'Multiply'}, 'type': 'fun

We can add a tool parser to extract the tool calls from the generated message to JSON:

from langchain_core.output_parsers.openai_tools import JsonOutputToolsParser

tool_chain = llm_with_tools | JsonOutputToolsParser()

tool_chain.invoke("what's 3 * 12")
[{'type': 'Multiply', 'args': {'a': 3, 'b': 12}}]

Or back to the original Pydantic class:

from langchain_core.output_parsers.openai_tools import PydanticToolsParser

tool_chain = llm_with_tools | PydanticToolsParser(tools=[Multiply])

tool_chain.invoke("what's 3 * 12")
[Multiply(a=3, b=12)]

If we wanted to force that a tool is used (and that it is used only once), we can set thetool_choice argument:

llm_with_multiply = llm.bind_tools([Multiply], tool_choice="Multiply")

llm_with_multiply.invoke(
"make up some numbers if you really want but I'm not forcing you"
)
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_f3DApOzb60iYjTfOhVFhDRMI', 'function': {'arguments': '{"a":5,"b":10}', 'name': 'Multiply'}, 'type': 'func

For more see the ChatOpenAI API reference.

Deﬁning functions schemas

In case you need to access function schemas directly, LangChain has a built-in converter that can turn Python functions,
Pydantic classes, and LangChain Tools into the OpenAI format JSON schema:

Python function
import json

from langchain_core.utils.function_calling import convert_to_openai_tool

def multiply(a: int, b: int) -> int:

"""Multiply two integers together.

Args:
a: First integer
b: Second integer
"""
return a * b

print(json.dumps(convert_to_openai_tool(multiply), indent=2))
{
"type": "function",
"function": {
"name": "multiply",
"description": "Multiply two integers together.",
"parameters": {
"type": "object",
"properties": {
"a": {
"type": "integer",
"description": "First integer"
},
"b": {
"type": "integer",
"description": "Second integer"
}
},
"required": [
"a",
"b"
]
}
}
}

Pydantic class
from langchain_core.pydantic_v1 import BaseModel, Field

class multiply(BaseModel):
"""Multiply two integers together."""

a: int = Field(..., description="First integer")

b: int = Field(..., description="Second integer")

print(json.dumps(convert_to_openai_tool(multiply), indent=2))
{
"type": "function",
"function": {
"name": "multiply",
"description": "Multiply two integers together.",
"parameters": {
"type": "object",
"properties": {
"a": {
"description": "First integer",
"type": "integer"
},
"b": {
"description": "Second integer",
"type": "integer"
}
},
"required": [
"a",
"b"
]
}
}
}

LangChain Tool
from typing import Any, Type

from langchain_core.tools import BaseTool

class MultiplySchema(BaseModel):
"""Multiply tool schema."""

a: int = Field(..., description="First integer")

b: int = Field(..., description="Second integer")

class Multiply(BaseTool):
args_schema: Type[BaseModel] = MultiplySchema
name: str = "multiply"
description: str = "Multiply two integers together."

def _run(self, a: int, b: int, **kwargs: Any) -> Any:

return a * b

# Note: we're passing in a Multiply object not the class itself.

print(json.dumps(convert_to_openai_tool(Multiply()), indent=2))
{
"type": "function",
"function": {
"name": "multiply",
"description": "Multiply two integers together.",
"parameters": {
"type": "object",
"properties": {
"a": {
"description": "First integer",
"type": "integer"
},
"b": {
"description": "Second integer",
"type": "integer"
}
},
"required": [
"a",
"b"
]
}
}
}

Next steps

Output parsing: See OpenAI Tools output parsers and OpenAI Functions output parsers to learn about extracting the
function calling API responses into various formats.
Structured output chains: Some models have constructors that handle creating a structured output chain for you.
Tool use: See how to construct chains and agents that actually call the invoked tools inthese guides.

Help us out by providing feedback on this documentation page:

Previous
« Quick Start
Next
Caching »

Community
Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression Querying a SQL
Language CookbookDB

Querying a SQL DB
We can replicate our SQLDatabaseChain with Runnables.

%pip install –upgrade –quiet langchain langchain-openai

from langchain_core.prompts import ChatPromptTemplate

template = """Based on the table schema below, write a SQL query that would answer the user's question:
{schema}

Question: {question}
SQL Query:"""
prompt = ChatPromptTemplate.from_template(template)
from langchain_community.utilities import SQLDatabase

We’ll need the Chinook sample DB for this example. There’s many places to download it from, e.g.https://database.guide/2-
sample-databases-sqlite/

db = SQLDatabase.from_uri("sqlite:///./Chinook.db")
def get_schema(_):
return db.get_table_info()
def run_query(query):
return db.run(query)
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

model = ChatOpenAI()

sql_response = (
RunnablePassthrough.assign(schema=get_schema)
| prompt
| model.bind(stop=["\nSQLResult:"])
| StrOutputParser()
)
sql_response.invoke({"question": "How many employees are there?"})
'SELECT COUNT(*) FROM Employee'
template = """Based on the table schema below, question, sql query, and sql response, write a natural language response:
{schema}

Question: {question}
SQL Query: {query}
SQL Response: {response}"""
prompt_response = ChatPromptTemplate.from_template(template)
full_chain = (
RunnablePassthrough.assign(query=sql_response).assign(
schema=get_schema,
response=lambda x: db.run(x["query"]),
)
| prompt_response
| model
)
full_chain.invoke({"question": "How many employees are there?"})
AIMessage(content='There are 8 employees.', additional_kwargs={}, example=False)

Help us out by providing feedback on this documentation page:

Previous
« Multiple chains
Next
Agents »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Multiple callback
ModulesMoreCallbackshandlers

Multiple callback handlers

In the previous examples, we passed in callback handlers upon creation of an object by usingcallbacks=. In this case, the
callbacks will be scoped to that particular object.

However, in many cases, it is advantageous to pass in handlers instead when running the object. When we pass through
CallbackHandlers using the callbacks keyword arg when executing an run, those callbacks will be issued by all nested objects
involved in the execution. For example, when a handler is passed through to an Agent, it will be used for all callbacks related
to the agent and all the objects involved in the agent’s execution, in this case, the Tools, LLMChain, and LLM.

This prevents us from having to manually attach the handlers to each individual nested object.
from typing import Any, Dict, List, Union

from langchain.agents import AgentType, initialize_agent, load_tools

from langchain.callbacks.base import BaseCallbackHandler
from langchain_core.agents import AgentAction
from langchain_openai import OpenAI

# First, deﬁne custom callback handler implementations

class MyCustomHandlerOne(BaseCallbackHandler):
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> Any:
print(f"on_llm_start {serialized['name']}")

def on_llm_new_token(self, token: str, **kwargs: Any) -> Any:

print(f"on_new_token {token}")

def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> Any:
"""Run when LLM errors."""

def on_chain_start(
self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
) -> Any:
print(f"on_chain_start {serialized['name']}")

def on_tool_start(
self, serialized: Dict[str, Any], input_str: str, **kwargs: Any
) -> Any:
print(f"on_tool_start {serialized['name']}")

def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:

print(f"on_agent_action {action}")

class MyCustomHandlerTwo(BaseCallbackHandler):
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> Any:
print(f"on_llm_start (I'm the second handler!!) {serialized['name']}")

# Instantiate the handlers

handler1 = MyCustomHandlerOne()
handler2 = MyCustomHandlerTwo()

# Setup the agent. Only the `llm` will issue callbacks for handler2
llm = OpenAI(temperature=0, streaming=True, callbacks=[handler2])
tools = load_tools(["llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

# Callbacks for handler1 will be issued by every object involved in the

# Agent execution (llm, llmchain, tool, agent executor)
agent.run("What is 2 raised to the 0.235 power?", callbacks=[handler1])
on_chain_start AgentExecutor
on_chain_start LLMChain
on_llm_start OpenAI
on_llm_start (I'm the second handler!!) OpenAI
on_new_token I
on_new_token need
on_new_token to
on_new_token use
on_new_token a
on_new_token calculator
on_new_token to
on_new_token solve
on_new_token this
on_new_token .
on_new_token
Action
on_new_token :
on_new_token Calculator
on_new_token
Action
on_new_token Input
on_new_token :
on_new_token 2
on_new_token ^
on_new_token 0
on_new_token .
on_new_token .
on_new_token 235
on_new_token
on_agent_action AgentAction(tool='Calculator', tool_input='2^0.235', log=' I need to use a calculator to solve this.\nAction: Calculator\nAction Input: 2^0.235')
on_tool_start Calculator
on_chain_start LLMMathChain
on_chain_start LLMChain
on_llm_start OpenAI
on_llm_start (I'm the second handler!!) OpenAI
on_new_token
on_new_token ```text
on_new_token

on_new_token 2
on_new_token **
on_new_token 0
on_new_token .
on_new_token 235
on_new_token

on_new_token ```

on_new_token ...
on_new_token num
on_new_token expr
on_new_token .
on_new_token evaluate
on_new_token ("
on_new_token 2
on_new_token **
on_new_token 0
on_new_token .
on_new_token 235
on_new_token ")
on_new_token ...
on_new_token

on_new_token
on_chain_start LLMChain
on_llm_start OpenAI
on_llm_start (I'm the second handler!!) OpenAI
on_new_token I
on_new_token now
on_new_token know
on_new_token the
on_new_token ﬁnal
on_new_token answer
on_new_token .
on_new_token
Final
on_new_token Answer
on_new_token :
on_new_token 1
on_new_token .
on_new_token 17
on_new_token 690
on_new_token 67
on_new_token 372
on_new_token 187
on_new_token 674
on_new_token

'1.1769067372187674'

Help us out by providing feedback on this documentation page:

Previous
« Logging to ﬁle
Next
Tags »
Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesRetrievalRetrievers

On this page

Retrievers
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A
retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the
backbone of a retriever, but there are other types of retrievers as well.

Retrievers accept a string query as input and return a list of Document's as output.

Advanced Retrieval Types

LangChain provides several advanced retrieval types. A full list is below, along with the following information:

Name: Name of the retrieval algorithm.

Index Type: Which index type (if any) this relies on.

Uses an LLM: Whether this retrieval method uses an LLM.

When to Use: Our commentary on when you should considering using this retrieval method.

Description: Description of what this retrieval algorithm is doing.

Index Uses an
Name When to Use Description
Type LLM
If you are just getting This is the simplest method and the one that is easiest to
Vectorstore Vectorstore No started and looking for get started with. It involves creating embeddings for each
something quick and easy. piece of text.
If your pages have lots of
Vectorstore smaller pieces of distinct This involves indexing multiple chunks for each
+ information that are best document. Then you find the chunks that are most similar
ParentDocument No
Document indexed by themselves, in embedding space, but you retrieve the whole parent
Store but best retrieved all document and return that (rather than individual chunks).
together.
If you are able to extract
Vectorstore This involves creating multiple vectors for each
Sometimes information from
+ document. Each vector could be created in a myriad of
Multi Vector during documents that you think
Document ways - examples include summaries of the text and
indexing is more relevant to index
Store hypothetical questions.
than the text itself.
If users are asking
This uses an LLM to transform user input into two things:
questions that are better
(1) a string to look up semantically, (2) a metadata filer to
answered by fetching
Self Query Vectorstore Yes go along with it. This is useful because oftentimes
documents based on
questions are about the METADATA of documents (not
metadata rather than
the content itself).
similarity with the text.
If you are finding that your
This puts a post-processing step on top of another
retrieved documents
Contextual retriever and extracts only the most relevant information
Any Sometimes contain too much
Compression from retrieved documents. This can be done with
irrelevant information and
embeddings or an LLM.
are distracting the LLM.
If you have timestamps
associated with your This fetches documents based on a combination of
Time-Weighted
Vectorstore No documents, and you want semantic similarity (as in normal vector retrieval) and
Vectorstore
to retrieve the most recent recency (looking at timestamps of indexed documents)
ones
If users are asking This uses an LLM to generate multiple queries from the
questions that are complex original one. This is useful when the original query needs
Multi-Query
Any Yes and require multiple pieces pieces of information about multiple topics to be properly
Retriever
of distinct information to answered. By generating multiple queries, we can then
respond fetch documents for each of them.
If you have multiple
retrieval methods and This fetches documents from multiple retrievers and then
Ensemble Any No
want to try combining combines them.
them.
If you are working with a This fetches documents from an underlying retriever, and
long-context model and then reorders them so that the most similar are near the
Long-Context noticing that it's not paying beginning and end. This is useful because it's been
Any No
Reorder attention to information in shown that for longer context models they sometimes
the middle of retrieved don't pay attention to information in the middle of the
documents. context window.

Third Party Integrations

LangChain also integrates with many third-party retrieval services. For a full list of these, check outthis list of all integrations.

Using Retrievers in LCEL

Since retrievers are Runnable 's, we can easily compose them with otherRunnable objects:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()

def format_docs(docs):
return "\n\n".join([d.page_content for d in docs])

chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)

chain.invoke("What did the president say about technology?")

Custom Retriever

Since the retriever interface is so simple, it's pretty easy to write a custom one.

from langchain_core.retrievers import BaseRetriever

from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from typing import List

class CustomRetriever(BaseRetriever):

def _get_relevant_documents(
self, query: str, *, run_manager: CallbackManagerForRetrieverRun
) -> List[Document]:
return [Document(page_content=query)]

retriever = CustomRetriever()

retriever.get_relevant_documents("bar")

Help us out by providing feedback on this documentation page:

Previous
« Vector stores
Next
Vector store-backed retriever »

Community

Discord
Twitter
GitHub
Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression Managing prompt
Language Cookbooksize

Managing prompt size

Agents dynamically call tools. The results of those tool calls are added back to the prompt, so that the agent can plan the
next action. Depending on what tools are being used and how they’re being called, the agent prompt can easily grow larger
than the model context window.

With LCEL, it’s easy to add custom functionality for managing the size of prompts within your chain or agent. Let’s look at
simple agent example that can search Wikipedia for information.

%pip install --upgrade --quiet langchain langchain-openai wikipedia

from operator import itemgetter

from langchain.agents import AgentExecutor, load_tools

from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
from langchain.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain_core.prompt_values import ChatPromptValue
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
wiki = WikipediaQueryRun(
api_wrapper=WikipediaAPIWrapper(top_k_results=5, doc_content_chars_max=10_000)
)
tools = [wiki]
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful assistant"),
("user", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
]
)
llm = ChatOpenAI(model="gpt-3.5-turbo")

Let’s try a many-step question without any prompt size handling:

agent = (
{
"input": itemgetter("input"),
"agent_scratchpad": lambda x: format_to_openai_function_messages(
x["intermediate_steps"]
),
}
| prompt
| llm.bind_functions(tools)
| OpenAIFunctionsAgentOutputParser()
)

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

agent_executor.invoke(
{
"input": "Who is the current US president? What's their home state? What's their home state's bird? What's that bird's scientiﬁc name?"
}
)
> Entering new AgentExecutor chain...

Invoking: `Wikipedia` with `List of presidents of the United States`

Page: List of presidents of the United States

Summary: The president of the United States is the head of state and head of government of the United States, indirectly elected to a four-year term via the Electoral

Page: List of presidents of the United States by age

Summary: In this list of presidents of the United States by age, the ﬁrst table charts the age of each president of the United States at the time of presidential inaugura

Page: List of vice presidents of the United States

Summary: There have been 49 vice presidents of the United States since the oﬃce was created in 1789. Originally, the vice president was the person who received t
The persons who have served as vice president were born in or primarily aﬃliated with 27 states plus the District of Columbia. New York has produced the most of an

Page: List of presidents of the United States by net worth

Summary: The list of presidents of the United States by net worth at peak varies greatly. Debt and depreciation often means that presidents' net worth is less than $0
Presidents since 1929, when Herbert Hoover took oﬃce, have generally been wealthier than presidents of the late nineteenth and early twentieth centuries; with the e

Page: List of presidents of the United States by home state

Summary: These lists give the states of primary aﬃliation and of birth for each president of the United States.
Invoking: `Wikipedia` with `Joe Biden`

Page: Joe Biden

Summary: Joseph Robinette Biden Jr. ( BY-dən; born November 20, 1942) is an American politician who is the 46th and current president of the United States. A me
Born in Scranton, Pennsylvania, Biden moved with his family to Delaware in 1953. He graduated from the University of Delaware before earning his law degree from
As president, Biden signed the American Rescue Plan Act in response to the COVID-19 pandemic and subsequent recession. He signed bipartisan bills on infrastruc

Page: Presidency of Joe Biden

Summary: Joe Biden's tenure as the 46th president of the United States began with his inauguration on January 20, 2021. Biden, a Democrat from Delaware who pre
The foreign policy goal of the Biden administration is to restore the US to a "position of trusted leadership" among global democracies in order to address the challeng

Page: Family of Joe Biden

Summary: Joe Biden, the 46th and current president of the United States, has family members who are prominent in law, education, activism and politics. Biden's imm

Page: Inauguration of Joe Biden

Summary: The inauguration of Joe Biden as the 46th president of the United States took place on Wednesday, January 20, 2021, marking the start of the four-year te
The inauguration took place amidst extraordinary political, public health, economic, and national security crises, including the ongoing COVID-19 pandemic; outgoing
Invoking: `Wikipedia` with `Delaware`

Page: Delaware
Summary: Delaware ( DEL-ə-wair) is a state in the northeast and Mid-Atlantic regions of the United States. It borders Maryland to its south and west, Pennsylvania to
The southern two counties, Kent and Sussex counties, historically have been predominantly agrarian economies. New Castle is more urbanized and is considered pa
Delaware was one of the Thirteen Colonies that participated in the American Revolution and American Revolutionary War, in which the American Continental Army, le
On December 7, 1787, Delaware was the ﬁrst state to ratify the Constitution of the United States, earning it the nickname "The First State".Since the turn of the 20th c

Page: Delaware City, Delaware

Summary: Delaware City is a city in New Castle County, Delaware, United States. The population was 1,885 as of 2020. It is a small port town on the eastern terminu

Page: Delaware River

Summary: The Delaware River is a major river in the Mid-Atlantic region of the United States and is the longest free-flowing (undammed) river in the Eastern United S
The river has been recognized by the National Wildlife Federation as one of the country's Great Waters and has been called the "Lifeblood of the Northeast" by Amer
The Delaware River has two branches that rise in the Catskill Mountains of New York: the West Branch at Mount Jefferson in Jefferson, Schoharie County, and the E
Before the arrival of European settlers, the river was the homeland of the Lenape native people. They called the river Lenapewihittuk, or Lenape River, and Kithanne,

Page: University of Delaware

Summary: The University of Delaware (colloquially known as UD or Delaware) is a privately governed, state-assisted land-grant research university located in Newar

Page: Lenape
Summary: The Lenape (English: , , ; Lenape languages: [lənaːpe]), also called the Lenni Lenape and Delaware people, are an Indigenous people of the Northeastern
During the last decades of the 18th century, European settlers and the eﬀects of the American Revolutionary War displaced most Lenape from their homelands and p

BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 4097 tokens. However, your messages resulted in 5487 tokens (541

LangSmith trace

Unfortunately we run out of space in our model’s context window before we the agent can get to the ﬁnal answer. Now let’s
add some prompt handling logic. To keep things simple, if our messages have too many tokens we’ll start dropping the
earliest AI, Function message pairs (this is the model tool invocation message and the subsequent tool output message) in
the chat history.
def condense_prompt(prompt: ChatPromptValue) -> ChatPromptValue:
messages = prompt.to_messages()
num_tokens = llm.get_num_tokens_from_messages(messages)
ai_function_messages = messages[2:]
while num_tokens > 4_000:
ai_function_messages = ai_function_messages[2:]
num_tokens = llm.get_num_tokens_from_messages(
messages[:2] + ai_function_messages
)
messages = messages[:2] + ai_function_messages
return ChatPromptValue(messages=messages)

agent = (
{
"input": itemgetter("input"),
"agent_scratchpad": lambda x: format_to_openai_function_messages(
x["intermediate_steps"]
),
}
| prompt
| condense_prompt
| llm.bind_functions(tools)
| OpenAIFunctionsAgentOutputParser()
)

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

agent_executor.invoke(
{
"input": "Who is the current US president? What's their home state? What's their home state's bird? What's that bird's scientiﬁc name?"
}
)

> Entering new AgentExecutor chain...

Invoking: `Wikipedia` with `List of presidents of the United States`

Page: List of presidents of the United States

Summary: The president of the United States is the head of state and head of government of the United States, indirectly elected to a four-year term via the Electoral

Page: List of presidents of the United States by age

Summary: In this list of presidents of the United States by age, the ﬁrst table charts the age of each president of the United States at the time of presidential inaugura

Page: List of vice presidents of the United States

Page: List of presidents of the United States by net worth

Page: List of presidents of the United States by home state

Summary: These lists give the states of primary aﬃliation and of birth for each president of the United States.
Invoking: `Wikipedia` with `Joe Biden`

Page: Joe Biden

Page: Presidency of Joe Biden

Page: Family of Joe Biden

Summary: Joe Biden, the 46th and current president of the United States, has family members who are prominent in law, education, activism and politics. Biden's imm

Page: Inauguration of Joe Biden

Page: Delaware River

Page: University of Delaware

Summary: The University of Delaware (colloquially known as UD or Delaware) is a privately governed, state-assisted land-grant research university located in Newar

Invoking: `Wikipedia` with `Blue hen chicken`

Page: Delaware Blue Hen

Summary: The Delaware Blue Hen or Blue Hen of Delaware is a blue strain of American gamecock. Under the name Blue Hen Chicken it is the oﬃcial bird of the Sta

Page: Delaware Fightin' Blue Hens

Summary: The Delaware Fightin' Blue Hens are the athletic teams of the University of Delaware (UD) of Newark, Delaware, in the United States. The Blue Hens com
On November 28, 2023, UD and Conference USA (CUSA) jointly announced that UD would start a transition to the Division I Football Bowl Subdivision (FBS) in 2024

Page: Brahma chicken

Summary: The Brahma is an American breed of chicken. It was bred in the United States from birds imported from the Chinese port of Shanghai,: 78 and was the prin

Page: Silkie
Summary: The Silkie (also known as the Silky or Chinese silk chicken) is a breed of chicken named for its atypically ﬂuﬀy plumage, which is said to feel like silk and s

Page: Silverudd Blue

Summary: The Silverudd Blue, Swedish: Silverudds Blå, is a Swedish breed of chicken. It was developed by Martin Silverudd in Småland, in southern Sweden. Hens

> Finished chain.

{'input': "Who is the current US president? What's their home state? What's their home state's bird? What's that bird's scientiﬁc name?",
'output': 'The current US president is Joe Biden. His home state is Delaware. The home state bird of Delaware is the Delaware Blue Hen. The scientiﬁc name of the D

LangSmith trace

Help us out by providing feedback on this documentation page:

Previous
« Adding moderation
Next
Using tools »

Community

Discord
Twitter
GitHub

Python
JS/TS
More
Homepage
Blog
YouTube
Model Output Custom Output
ModulesI/O Parsers Parsers

On this page

Custom Output Parsers

In some situations you may want to implement a custom parser to structure the model output into a custom format.

There are two ways to implement a custom parser:

1. Using RunnableLambda or RunnableGenerator in LCEL – we strongly recommend this for most use cases
2. By inherting from one of the base classes for out parsing – this is the hard way of doing things

The diﬀerence between the two approaches are mostly superﬁcial and are mainly in terms of which callbacks are triggered
(e.g., on_chain_start vs. on_parser_start), and how a runnable lambda vs. a parser might be visualized in a tracing platform like
LangSmith.

Runnable Lambdas and Generators

The recommended way to parse is using runnable lambdas and runnable generators!

Here, we will make a simple parse that inverts the case of the output from the model.

For example, if the model outputs: “Meow”, the parser will produce “mEOW”.

from typing import Iterable

from langchain_anthropic.chat_models import ChatAnthropic

from langchain_core.messages import AIMessage, AIMessageChunk

model = ChatAnthropic(model_name="claude-2.1")

def parse(ai_message: AIMessage) -> str:

"""Parse the AI message."""
return ai_message.content.swapcase()

chain = model | parse

chain.invoke("hello")
'hELLO!'
TIP

LCEL automatically upgrades the function parse to RunnableLambda(parse) when composed using a | syntax.

If you don’t like that you can manually import RunnableLambda and then runparse = RunnableLambda(parse) .

Does streaming work?

for chunk in chain.stream("tell me about yourself in one sentence"):

print(chunk, end="|", ﬂush=True)
i'M cLAUDE, AN ai ASSISTANT CREATED BY aNTHROPIC TO BE HELPFUL, HARMLESS, AND HONEST.|

No, it doesn’t because the parser aggregates the input before parsing the output.

If we want to implement a streaming parser, we can have the parser accept an iterable over the input instead and yield the
results as they’re available.
from langchain_core.runnables import RunnableGenerator

def streaming_parse(chunks: Iterable[AIMessageChunk]) -> Iterable[str]:

for chunk in chunks:
yield chunk.content.swapcase()

streaming_parse = RunnableGenerator(streaming_parse)
INFO

Please wrap the streaming parser in RunnableGenerator as we may stop automatically upgrading it with the| syntax.

chain = model | streaming_parse

chain.invoke("hello")
'hELLO!'

Let’s conﬁrm that streaming works!

for chunk in chain.stream("tell me about yourself in one sentence"):

print(chunk, end="|", ﬂush=True)
i|'M| cLAUDE|,| AN| ai| ASSISTANT| CREATED| BY| aN|THROP|IC| TO| BE| HELPFUL|,| HARMLESS|,| AND| HONEST|.|

Inherting from Parsing Base Classes

Another approach to implement a parser is by inherting from BaseOutputParser, BaseGenerationOutputParser or another one of the
base parsers depending on what you need to do.

In general, we do not recommend this approach for most use cases as it results in more code to write without signiﬁcant
beneﬁts.

The simplest kind of output parser extends theBaseOutputParser class and must implement the following methods:

parse:takes the string output from the model and parses it

(optional) _type: identiﬁes the name of the parser.

When the output from the chat model or LLM is malformed, the can throw anOutputParserException to indicate that parsing fails
because of bad input. Using this exception allows code that utilizes the parser to handle the exceptions in a consistent
manner.

:::{.callout-tip} Parsers are Runnables!

Because BaseOutputParser implements the Runnable interface, any custom parser you will create this way will become valid
LangChain Runnables and will beneﬁt from automatic async support, batch interface, logging support etc. :::

Simple Parser

Here’s a simple parser that can parse a string representation of a booealn (e.g., YES or NO) and convert it into the
corresponding boolean type.
from langchain_core.exceptions import OutputParserException
from langchain_core.output_parsers import BaseOutputParser

# The [bool] desribes a parameterization of a generic.

# It's basically indicating what the return type of parse is
# in this case the return type is either True or False
class BooleanOutputParser(BaseOutputParser[bool]):
"""Custom boolean parser."""

true_val: str = "YES"

false_val: str = "NO"

def parse(self, text: str) -> bool:

cleaned_text = text.strip().upper()
if cleaned_text not in (self.true_val.upper(), self.false_val.upper()):
raise OutputParserException(
f"BooleanOutputParser expected output value to either be "
f"{self.true_val} or {self.false_val} (case-insensitive). "
f"Received {cleaned_text}."
)
return cleaned_text == self.true_val.upper()

@property
def _type(self) -> str:
return "boolean_output_parser"
parser = BooleanOutputParser()
parser.invoke("YES")
True
try:
parser.invoke("MEOW")
except Exception as e:
print(f"Triggered an exception of type: {type(e)}")
Triggered an exception of type: <class 'langchain_core.exceptions.OutputParserException'>

Let’s test changing the parameterization

parser = BooleanOutputParser(true_val="OKAY")
parser.invoke("OKAY")
True

Let’s conﬁrm that other LCEL methods are present

parser.batch(["OKAY", "NO"])
[True, False]
await parser.abatch(["OKAY", "NO"])
[True, False]
from langchain_anthropic.chat_models import ChatAnthropic

anthropic = ChatAnthropic(model_name="claude-2.1")
anthropic.invoke("say OKAY or NO")
AIMessage(content='OKAY')

Let’s test that our parser works!

chain = anthropic | parser

chain.invoke("say OKAY or NO")
True
NOTE

The parser will work with either the output from an LLM (a string) or the output from a chat model (anAIMessage)!

Parsing Raw Model Outputs

Sometimes there is additional metadata on the model output that is important besides the raw text. One example of this is tool
calling, where arguments intended to be passed to called functions are returned in a separate property. If you need this ﬁner-
grained control, you can instead subclass the BaseGenerationOutputParser class.

This class requires a single method parse_result. This method takes raw model output (e.g., list ofGeneration or ChatGeneration)
and returns the parsed output.

Supporting both Generation and ChatGeneration allows the parser to work with both regular LLMs as well as with Chat Models.
from typing import List

from langchain_core.exceptions import OutputParserException

from langchain_core.messages import AIMessage
from langchain_core.output_parsers import BaseGenerationOutputParser
from langchain_core.outputs import ChatGeneration, Generation

class StrInvertCase(BaseGenerationOutputParser[str]):
"""An example parser that inverts the case of the characters in the message.

This is an example parse shown just for demonstration purposes and to keep
the example as simple as possible.
"""

def parse_result(self, result: List[Generation], *, partial: bool = False) -> str:

"""Parse a list of model Generations into a speciﬁc format.

Args:
result: A list of Generations to be parsed. The Generations are assumed
to be diﬀerent candidate outputs for a single model input.
Many parsers assume that only a single generation is passed it in.
We will assert for that
partial: Whether to allow partial results. This is used for parsers
that support streaming
"""
if len(result) != 1:
raise NotImplementedError(
"This output parser can only be used with a single generation."
)
generation = result[0]
if not isinstance(generation, ChatGeneration):
# Say that this one only works with chat generations
raise OutputParserException(
"This output parser can only be used with a chat generation."
)
return generation.message.content.swapcase()

chain = anthropic | StrInvertCase()

Let’s the new parser! It should be inverting the output from the model.

chain.invoke("Tell me a short sentence about yourself")

'hELLO! mY NAME IS cLAUDE.'

Help us out by providing feedback on this documentation page:

Previous
« Quickstart
Next
CSV parser »

Community

Discord
Twitter
GitHub

Python
JS/TS
More
Homepage
Blog
YouTube
Model Tracking token
ModulesI/O LLMsusage

Tracking token usage

This notebook goes over how to track your token usage for speciﬁc calls. It is currently only implemented for the OpenAI API.

Let’s ﬁrst look at an extremely simple example of tracking token usage for a single LLM call.

from langchain.callbacks import get_openai_callback

from langchain_openai import OpenAI
llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)
with get_openai_callback() as cb:
result = llm.invoke("Tell me a joke")
print(cb)
Tokens Used: 37
Prompt Tokens: 4
Completion Tokens: 33
Successful Requests: 1
Total Cost (USD): $7.2e-05

Anything inside the context manager will get tracked. Here’s an example of using it to track multiple calls in sequence.

with get_openai_callback() as cb:

result = llm.invoke("Tell me a joke")
result2 = llm.invoke("Tell me a joke")
print(cb.total_tokens)
72

If a chain or agent with multiple steps in it is used, it will track all those steps.

from langchain.agents import AgentType, initialize_agent, load_tools

from langchain_openai import OpenAI

llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
with get_openai_callback() as cb:
response = agent.run(
"Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?"
)
print(f"Total Tokens: {cb.total_tokens}")
print(f"Prompt Tokens: {cb.prompt_tokens}")
print(f"Completion Tokens: {cb.completion_tokens}")
print(f"Total Cost (USD): ${cb.total_cost}")
> Entering new AgentExecutor chain...
I need to ﬁnd out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.
Action: Search
Action Input: "Olivia Wilde boyfriend"
Observation: ["Olivia Wilde and Harry Styles took fans by surprise with their whirlwind romance, which began when they met on the set of Don't Worry Darling.", 'Olivi
Thought: Harry Styles is Olivia Wilde's boyfriend.
Action: Search
Action Input: "Harry Styles age"
Observation: 29 years
Thought: I need to calculate 29 raised to the 0.23 power.
Action: Calculator
Action Input: 29^0.23
Observation: Answer: 2.169459462491557
Thought: I now know the ﬁnal answer.
Final Answer: Harry Styles is Olivia Wilde's boyfriend and his current age raised to the 0.23 power is 2.169459462491557.

> Finished chain.

Total Tokens: 2205
Prompt Tokens: 2053
Completion Tokens: 152
Total Cost (USD): $0.0441

Help us out by providing feedback on this documentation page:

Previous
« Streaming
Next
Output Parsers »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model Output CSV
ModulesI/O Parsers Types parser

CSV parser
This output parser can be used when you want to return a list of comma-separated items.

from langchain.output_parsers import CommaSeparatedListOutputParser

from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

output_parser = CommaSeparatedListOutputParser()

format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
template="List ﬁve {subject}.\n{format_instructions}",
input_variables=["subject"],
partial_variables={"format_instructions": format_instructions},
)

model = ChatOpenAI(temperature=0)

chain = prompt | model | output_parser

chain.invoke({"subject": "ice cream ﬂavors"})
['Vanilla',
'Chocolate',
'Strawberry',
'Mint Chocolate Chip',
'Cookies and Cream']
for s in chain.stream({"subject": "ice cream ﬂavors"}):
print(s)
['Vanilla']
['Chocolate']
['Strawberry']
['Mint Chocolate Chip']
['Cookies and Cream']

Help us out by providing feedback on this documentation page:

Previous
« Custom Output Parsers
Next
Datetime parser »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesMoreMemory

On this page

[Beta] Memory
Most LLM applications have a conversational interface. An essential component of a conversation is being able to refer to
information introduced earlier in the conversation. At bare minimum, a conversational system should be able to access some
window of past messages directly. A more complex system will need to have a world model that it is constantly updating,
which allows it to do things like maintain information about entities and their relationships.

We call this ability to store information about past interactions "memory". LangChain provides a lot of utilities for adding
memory to a system. These utilities can be used by themselves or incorporated seamlessly into a chain.

Most of memory-related functionality in LangChain is marked as beta. This is for two reasons:

1. Most functionality (with some exceptions, see below) are not production ready

2. Most functionality (with some exceptions, see below) work with Legacy chains, not the newer LCEL syntax.

The main exception to this is the ChatMessageHistory functionality. This functionality is largely production ready and does
integrate with LCEL.

LCEL Runnables: For an overview of how to useChatMessageHistory with LCEL runnables, see these docs

Integrations: For an introduction to the various ChatMessageHistory integrations, see these docs

Introduction

A memory system needs to support two basic actions: reading and writing. Recall that every chain deﬁnes some core
execution logic that expects certain inputs. Some of these inputs come directly from the user, but some of these inputs can
come from memory. A chain will interact with its memory system twice in a given run.

1. AFTER receiving the initial user inputs but BEFORE executing the core logic, a chain will READ from its memory
system and augment the user inputs.
2. AFTER executing the core logic but BEFORE returning the answer, a chain will WRITE the inputs and outputs of the
current run to memory, so that they can be referred to in future runs.
Building memory into a system

The two core design decisions in any memory system are:

How state is stored

How state is queried

Storing: List of chat messages

Underlying any memory is a history of all chat interactions. Even if these are not all used directly, they need to be stored in
some form. One of the key parts of the LangChain memory module is a series of integrations for storing these chat
messages, from in-memory lists to persistent databases.

Chat message storage: How to work with Chat Messages, and the various integrations oﬀered.

Querying: Data structures and algorithms on top of chat messages

Keeping a list of chat messages is fairly straight-forward. What is less straight-forward are the data structures and algorithms
built on top of chat messages that serve a view of those messages that is most useful.

A very simple memory system might just return the most recent messages each run. A slightly more complex memory
system might return a succinct summary of the past K messages. An even more sophisticated system might extract entities
from stored messages and only return information about entities referenced in the current run.

Each application can have diﬀerent requirements for how memory is queried. The memory module should make it easy to
both get started with simple memory systems and write your own custom systems if needed.

Memory types: The various data structures and algorithms that make up the memory types LangChain supports

Get started

Let's take a look at what Memory actually looks like in LangChain. Here we'll cover the basics of interacting with an arbitrary
memory class.

Let's take a look at how to useConversationBufferMemory in chains. ConversationBufferMemory is an extremely simple form of
memory that just keeps a list of chat messages in a buffer and passes those into the prompt template.
from langchain.memory import ConversationBufferMemory

memory = ConversationBuﬀerMemory()
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")

When using memory in a chain, there are a few key concepts to understand. Note that here we cover general concepts that
are useful for most types of memory. Each individual memory type may very well have its own parameters and concepts that
are necessary to understand.

What variables get returned from memory

Before going into the chain, various variables are read from memory. These have speciﬁc names which need to align with the
variables the chain expects. You can see what these variables are by calling memory.load_memory_variables({}). Note that the
empty dictionary that we pass in is just a placeholder for real variables. If the memory type you are using is dependent upon
the input variables, you may need to pass some in.

memory.load_memory_variables({})
{'history': "Human: hi!\nAI: what's up?"}

In this case, you can see that load_memory_variables returns a single key, history. This means that your chain (and likely your
prompt) should expect an input named history. You can usually control this variable through parameters on the memory class.
For example, if you want the memory variables to be returned in the key chat_history you can do:

memory = ConversationBuﬀerMemory(memory_key="chat_history")
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")
{'chat_history': "Human: hi!\nAI: what's up?"}

The parameter name to control these keys may vary per memory type, but it's important to understand that (1) this is
controllable, and (2) how to control it.

Whether memory is a string or a list of messages

One of the most common types of memory involves returning a list of chat messages. These can either be returned as a
single string, all concatenated together (useful when they will be passed into LLMs) or a list of ChatMessages (useful when
passed into ChatModels).

By default, they are returned as a single string. In order to return as a list of messages, you can setreturn_messages=True

memory = ConversationBuﬀerMemory(return_messages=True)
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")
{'history': [HumanMessage(content='hi!', additional_kwargs={}, example=False),
AIMessage(content='what's up?', additional_kwargs={}, example=False)]}

What keys are saved to memory

Often times chains take in or return multiple input/output keys. In these cases, how can we know which keys we want to save
to the chat message history? This is generally controllable by input_key and output_key parameters on the memory types. These
default to None - and if there is only one input/output key it is known to just use that. However, if there are multiple input/output
keys then you MUST specify the name of which one to use.

End to end example

Finally, let's take a look at using this in a chain. We'll use anLLMChain, and show working with both an LLM and a ChatModel.

Using an LLM
from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.memory import ConversationBuﬀerMemory

llm = OpenAI(temperature=0)
# Notice that "chat_history" is present in the prompt template
template = """You are a nice chatbot having a conversation with a human.

Previous conversation:
{chat_history}

New human question: {question}

Response:"""
prompt = PromptTemplate.from_template(template)
# Notice that we need to align the `memory_key`
memory = ConversationBuﬀerMemory(memory_key="chat_history")
conversation = LLMChain(
llm=llm,
prompt=prompt,
verbose=True,
memory=memory
)
# Notice that we just pass in the `question` variables - `chat_history` gets populated by memory
conversation({"question": "hi"})

Using a ChatModel
from langchain_openai import ChatOpenAI
from langchain.prompts import (
ChatPromptTemplate,
MessagesPlaceholder,
SystemMessagePromptTemplate,
HumanMessagePromptTemplate,
)
from langchain.chains import LLMChain
from langchain.memory import ConversationBuﬀerMemory

llm = ChatOpenAI()
prompt = ChatPromptTemplate(
messages=[
SystemMessagePromptTemplate.from_template(
"You are a nice chatbot having a conversation with a human."
),
# The `variable_name` here is what must align with memory
MessagesPlaceholder(variable_name="chat_history"),
HumanMessagePromptTemplate.from_template("{question}")
]
)
# Notice that we `return_messages=True` to ﬁt into the MessagesPlaceholder
# Notice that `"chat_history"` aligns with the MessagesPlaceholder name.
memory = ConversationBuﬀerMemory(memory_key="chat_history", return_messages=True)
conversation = LLMChain(
llm=llm,
prompt=prompt,
verbose=True,
memory=memory
)
# Notice that we just pass in the `question` variables - `chat_history` gets populated by memory
conversation({"question": "hi"})

Next steps

And that's it for getting started! Please see the other sections for walkthroughs of more advanced topics, like custom memory,
multiple memories, and more.

Help us out by providing feedback on this documentation page:

Previous
« Chains
Next
Chat Messages »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How Add
Language to fallbacks

On this page

Add fallbacks
There are many possible points of failure in an LLM application, whether that be issues with LLM API’s, poor model outputs,
issues with other integrations, etc. Fallbacks help you gracefully handle and isolate these issues.

Crucially, fallbacks can be applied not only on the LLM level but on the whole runnable level.

Handling LLM API Errors

This is maybe the most common use case for fallbacks. A request to an LLM API can fail for a variety of reasons - the API
could be down, you could have hit rate limits, any number of things. Therefore, using fallbacks can help protect against these
types of things.

IMPORTANT: By default, a lot of the LLM wrappers catch errors and retry. You will most likely want to turn those oﬀ when
working with fallbacks. Otherwise the ﬁrst wrapper will keep on retrying and not failing.

%pip install --upgrade --quiet langchain langchain-openai

from langchain_community.chat_models import ChatAnthropic
from langchain_openai import ChatOpenAI

First, let’s mock out what happens if we hit a RateLimitError from OpenAI

from unittest.mock import patch

import httpx
from openai import RateLimitError

request = httpx.Request("GET", "/")

response = httpx.Response(200, request=request)
error = RateLimitError("rate limit", response=response, body="")
# Note that we set max_retries = 0 to avoid retrying on RateLimits, etc
openai_llm = ChatOpenAI(max_retries=0)
anthropic_llm = ChatAnthropic()
llm = openai_llm.with_fallbacks([anthropic_llm])
# Let's use just the OpenAI LLm first, to show that we run into an error
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
try:
print(openai_llm.invoke("Why did the chicken cross the road?"))
except RateLimitError:
print("Hit error")
Hit error
# Now let's try with fallbacks to Anthropic
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
try:
print(llm.invoke("Why did the chicken cross the road?"))
except RateLimitError:
print("Hit error")
content=' I don\'t actually know why the chicken crossed the road, but here are some possible humorous answers:\n\n- To get to the other side!\n\n- It was too chicke

We can use our “LLM with Fallbacks” as we would a normal LLM.

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You're a nice assistant who always includes a compliment in your response",
),
("human", "Why did the {animal} cross the road"),
]
)
chain = prompt | llm
with patch("openai.resources.chat.completions.Completions.create", side_eﬀect=error):
try:
print(chain.invoke({"animal": "kangaroo"}))
except RateLimitError:
print("Hit error")
content=" I don't actually know why the kangaroo crossed the road, but I'm happy to take a guess! Maybe the kangaroo was trying to get to the other side to ﬁnd som

Specifying errors to handle

We can also specify the errors to handle if we want to be more speciﬁc about when the fallback is invoked:

llm = openai_llm.with_fallbacks(
[anthropic_llm], exceptions_to_handle=(KeyboardInterrupt,)
)

chain = prompt | llm

with patch("openai.resources.chat.completions.Completions.create", side_eﬀect=error):
try:
print(chain.invoke({"animal": "kangaroo"}))
except RateLimitError:
print("Hit error")
Hit error

Fallbacks for Sequences

We can also create fallbacks for sequences, that are sequences themselves. Here we do that with two diﬀerent models:
ChatOpenAI and then normal OpenAI (which does not use a chat model). Because OpenAI is NOT a chat model, you likely
want a diﬀerent prompt.

# First let's create a chain with a ChatModel

# We add in a string output parser here so the outputs between the two are the same type
from langchain_core.output_parsers import StrOutputParser

chat_prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You're a nice assistant who always includes a compliment in your response",
),
("human", "Why did the {animal} cross the road"),
]
)
# Here we're going to use a bad model name to easily create a chain that will error
chat_model = ChatOpenAI(model_name="gpt-fake")
bad_chain = chat_prompt | chat_model | StrOutputParser()
# Now lets create a chain with the normal OpenAI model
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI

prompt_template = """Instructions: You should always include a compliment in your response.

Question: Why did the {animal} cross the road?"""

prompt = PromptTemplate.from_template(prompt_template)
llm = OpenAI()
good_chain = prompt | llm
# We can now create a ﬁnal chain which combines the two
chain = bad_chain.with_fallbacks([good_chain])
chain.invoke({"animal": "turtle"})
'\n\nAnswer: The turtle crossed the road to get to the other side, and I have to say he had some impressive determination.'

Help us out by providing feedback on this documentation page:

Previous
« Create a runnable with the `@chain` decorator
Next
Stream custom generator functions »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Text
ModulesRetrievalSplitters Semantic Chunking

On this page

Semantic Chunking
Splits the text based on semantic similarity.

Taken from Greg Kamradt’s wonderful notebook: https://github.com/FullStackRetrieval-

com/RetrievalTutorials/blob/main/5_Levels_Of_Text_Splitting.ipynb

All credit to him.

At a high level, this splits into sentences, then groups into groups of 3 sentences, and then merges one that are similar in the
embedding space.

Install Dependencies

!pip install --quiet langchain_experimental langchain_openai

Load Example Data

# This is a long document we can split up.

with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()

Create Text Splitter

from langchain_experimental.text_splitter import SemanticChunker

from langchain_openai.embeddings import OpenAIEmbeddings
text_splitter = SemanticChunker(OpenAIEmbeddings())

Split Text

docs = text_splitter.create_documents([state_of_the_union])
print(docs[0].page_content)
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A

Breakpoints

This chunker works by determining when to “break” apart sentences. This is done by looking for diﬀerences in embeddings
between any two sentences. When that diﬀerence is past some threshold, then they are split.

There are a few ways to determine what that threshold is.

Percentile

The default way to split is based on percentile. In this method, all diﬀerences between sentences are calculated, and then
any diﬀerence greater than the X percentile is split.
text_splitter = SemanticChunker(
OpenAIEmbeddings(), breakpoint_threshold_type="percentile"
)
docs = text_splitter.create_documents([state_of_the_union])
print(docs[0].page_content)
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A

print(len(docs))
26

Standard Deviation

In this method, any diﬀerence greater than X standard deviations is split.

text_splitter = SemanticChunker(
OpenAIEmbeddings(), breakpoint_threshold_type="standard_deviation"
)
docs = text_splitter.create_documents([state_of_the_union])
print(docs[0].page_content)
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A

print(len(docs))
4

Interquartile

In this method, the interquartile distance is used to split chunks.

text_splitter = SemanticChunker(
OpenAIEmbeddings(), breakpoint_threshold_type="interquartile"
)
docs = text_splitter.create_documents([state_of_the_union])
print(docs[0].page_content)
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A

print(len(docs))
25

Help us out by providing feedback on this documentation page:

Previous
« Recursively split by character
Next
Split by tokens »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression
Language Get started

On this page

Get started
LCEL makes it easy to build complex chains from basic components, and supports out of the box functionality such as
streaming, parallelism, and logging.

Basic example: prompt + model + output parser

The most basic and common use case is chaining a prompt template and a model together. To see how this works, let’s
create a chain that takes a topic and generates a joke:

%pip install --upgrade --quiet langchain-core langchain-community langchain-openai

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("tell me a short joke about {topic}")

model = ChatOpenAI(model="gpt-4")
output_parser = StrOutputParser()

chain = prompt | model | output_parser

chain.invoke({"topic": "ice cream"})

"Why don't ice creams ever get invited to parties?\n\nBecause they always drip when things heat up!"

Notice this line of this code, where we piece together then diﬀerent components into a single chain using LCEL:

chain = prompt | model | output_parser

The | symbol is similar to a unix pipe operator, which chains together the diﬀerent components feeds the output from one
component as input into the next component.

In this chain the user input is passed to the prompt template, then the prompt template output is passed to the model, then
the model output is passed to the output parser. Let’s take a look at each component individually to really understand what’s
going on.

1. Prompt
prompt is a BasePromptTemplate , which means it takes in a dictionary of template variables and produces aPromptValue. A
PromptValue is a wrapper around a completed prompt that can be passed to either an LLM (which takes a string as input) or
ChatModel (which takes a sequence of messages as input). It can work with either language model type because it deﬁnes
logic both for producing BaseMessages and for producing a string.

prompt_value = prompt.invoke({"topic": "ice cream"})

prompt_value
ChatPromptValue(messages=[HumanMessage(content='tell me a short joke about ice cream')])
prompt_value.to_messages()
[HumanMessage(content='tell me a short joke about ice cream')]
prompt_value.to_string()
'Human: tell me a short joke about ice cream'

2. Model

The PromptValue is then passed to model. In this case our model is a ChatModel, meaning it will output a BaseMessage.

message = model.invoke(prompt_value)
message
AIMessage(content="Why don't ice creams ever get invited to parties?\n\nBecause they always bring a melt down!")
If our model was an LLM, it would output a string.

from langchain_openai.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct")
llm.invoke(prompt_value)
'\n\nRobot: Why did the ice cream truck break down? Because it had a meltdown!'

3. Output parser

And lastly we pass our model output to the output_parser, which is a BaseOutputParser meaning it takes either a string or a
BaseMessage as input. The StrOutputParser speciﬁcally simple converts any input into a string.

output_parser.invoke(message)
"Why did the ice cream go to therapy? \n\nBecause it had too many toppings and couldn't ﬁnd its cone-ﬁdence!"

4. Entire Pipeline

To follow the steps along:

1. We pass in user input on the desired topic as{"topic": "ice cream"}

2. The prompt component takes the user input, which is then used to construct a PromptValue after using thetopic to
construct the prompt.
3. The model component takes the generated prompt, and passes into the OpenAI LLM model for evaluation. The
generated output from the model is a ChatMessage object.
4. Finally, the output_parser component takes in a ChatMessage, and transforms this into a Python string, which is returned
from the invoke method.

Note that if you’re curious about the output of any components, you can always test out a smaller version of the chain such
as prompt or prompt | model to see the intermediate results:

input = {"topic": "ice cream"}

prompt.invoke(input)
# > ChatPromptValue(messages=[HumanMessage(content='tell me a short joke about ice cream')])

(prompt | model).invoke(input)
# > AIMessage(content="Why did the ice cream go to therapy?\nBecause it had too many toppings and couldn't cone-trol itself!")

RAG Search Example

For our next example, we want to run a retrieval-augmented generation chain to add some context when responding to
questions.
# Requires:
# pip install langchain docarray tiktoken

from langchain_community.vectorstores import DocArrayInMemorySearch

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings

vectorstore = DocArrayInMemorySearch.from_texts(
["harrison worked at kensho", "bears like to eat honey"],
embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()
output_parser = StrOutputParser()

setup_and_retrieval = RunnableParallel(
{"context": retriever, "question": RunnablePassthrough()}
)
chain = setup_and_retrieval | prompt | model | output_parser

chain.invoke("where did harrison work?")

In this case, the composed chain is:

chain = setup_and_retrieval | prompt | model | output_parser

To explain this, we ﬁrst can see that the prompt template above takes incontext and question as values to be substituted in the
prompt. Before building the prompt template, we want to retrieve relevant documents to the search and include them as part
of the context.

As a preliminary step, we’ve setup the retriever using an in memory store, which can retrieve documents based on a query.
This is a runnable component as well that can be chained together with other components, but you can also try to run it
separately:

retriever.invoke("where did harrison work?")

We then use the RunnableParallel to prepare the expected inputs into the prompt by using the entries for the retrieved
documents as well as the original user question, using the retriever for document search, and RunnablePassthrough to pass
the user’s question:

setup_and_retrieval = RunnableParallel(
{"context": retriever, "question": RunnablePassthrough()}
)

To review, the complete chain is:

setup_and_retrieval = RunnableParallel(
{"context": retriever, "question": RunnablePassthrough()}
)
chain = setup_and_retrieval | prompt | model | output_parser

With the ﬂow being:

1. The ﬁrst steps create a RunnableParallel object with two entries. The ﬁrst entry, context will include the document results
fetched by the retriever. The second entry, question will contain the user’s original question. To pass on the question, we
use RunnablePassthrough to copy this entry.
2. Feed the dictionary from the step above to theprompt component. It then takes the user input which is question as well as
the retrieved document which is context to construct a prompt and output a PromptValue.
3. The model component takes the generated prompt, and passes into the OpenAI LLM model for evaluation. The
generated output from the model is a ChatMessage object.
4. Finally, the output_parser component takes in a ChatMessage, and transforms this into a Python string, which is returned
from the invoke method.

Next steps
We recommend reading our Why use LCEL section next to see a side-by-side comparison of the code needed to produce
common functionality with and without LCEL.

Help us out by providing feedback on this documentation page:

Previous
« LangChain Expression Language (LCEL)
Next
Why use LCEL »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression
Language CookbookUsing tools

Using tools
You can use any Tools with Runnables easily.

%pip install --upgrade --quiet langchain langchain-openai duckduckgo-search

from langchain.tools import DuckDuckGoSearchRun
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
search = DuckDuckGoSearchRun()
template = """turn the following user input into a search query for a search engine:

{input}"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI()
chain = prompt | model | StrOutputParser() | search
chain.invoke({"input": "I'd like to ﬁgure out what games are tonight"})
'What sports games are on TV today & tonight? Watch and stream live sports on TV today, tonight, tomorrow. Today\'s 2023 sports TV schedule includes football, bas

Help us out by providing feedback on this documentation page:

Previous
« Managing prompt size
Next
LangChain Expression Language (LCEL) »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Model Output
ModulesI/O Parsers Quickstart

On this page

Quickstart
Language models output text. But many times you may want to get more structured information than just text back. This is
where output parsers come in.

Output parsers are classes that help structure language model responses. There are two main methods an output parser
must implement:

“Get format instructions”: A method which returns a string containing instructions for how the output of a language
model should be formatted.
“Parse”: A method which takes in a string (assumed to be the response from a language model) and parses it into
some structure.

And then one optional one:

“Parse with prompt”: A method which takes in a string (assumed to be the response from a language model) and a
prompt (assumed to be the prompt that generated such a response) and parses it into some structure. The prompt is
largely provided in the event the OutputParser wants to retry or ﬁx the output in some way, and needs information from
the prompt to do so.

Get started

Below we go over the main type of output parser, thePydanticOutputParser.

from langchain.output_parsers import PydanticOutputParser

from langchain.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from langchain_openai import OpenAI

model = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0.0)

# Deﬁne your desired data structure.

class Joke(BaseModel):
setup: str = Field(description="question to set up a joke")
punchline: str = Field(description="answer to resolve the joke")

# You can add custom validation logic easily with Pydantic.

@validator("setup")
def question_ends_with_question_mark(cls, field):
if field[-1] != "?":
raise ValueError("Badly formed question!")
return field

# Set up a parser + inject instructions into the prompt template.

parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
template="Answer the user query.\n{format_instructions}\n{query}\n",
input_variables=["query"],
partial_variables={"format_instructions": parser.get_format_instructions()},
)

# And a query intended to prompt a language model to populate the data structure.
prompt_and_model = prompt | model
output = prompt_and_model.invoke({"query": "Tell me a joke."})
parser.invoke(output)
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')

LCEL

Output parsers implement the Runnable interface, the basic building block of theLangChain Expression Language (LCEL).
This means they support invoke , ainvoke, stream, astream, batch, abatch, astream_log calls.

Output parsers accept a string or BaseMessage as input and can return an arbitrary type.

parser.invoke(output)
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')

Instead of manually invoking the parser, we also could’ve just added it to ourRunnable sequence:

chain = prompt | model | parser

chain.invoke({"query": "Tell me a joke."})
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')

While all parsers support the streaming interface, only certain parsers can stream through partially parsed objects, since this
is highly dependent on the output type. Parsers which cannot construct partial objects will simply yield the fully parsed output.

The SimpleJsonOutputParser for example can stream through partial outputs:

from langchain.output_parsers.json import SimpleJsonOutputParser

json_prompt = PromptTemplate.from_template(
"Return a JSON object with an `answer` key that answers the following question: {question}"
)
json_parser = SimpleJsonOutputParser()
json_chain = json_prompt | model | json_parser
list(json_chain.stream({"question": "Who invented the microscope?"}))
[{},
{'answer': ''},
{'answer': 'Ant'},
{'answer': 'Anton'},
{'answer': 'Antonie'},
{'answer': 'Antonie van'},
{'answer': 'Antonie van Lee'},
{'answer': 'Antonie van Leeu'},
{'answer': 'Antonie van Leeuwen'},
{'answer': 'Antonie van Leeuwenho'},
{'answer': 'Antonie van Leeuwenhoek'}]

While the PydanticOutputParser cannot:

list(chain.stream({"query": "Tell me a joke."}))

[Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')]

Help us out by providing feedback on this documentation page:

Previous
« Output Parsers
Next
Custom Output Parsers »

Community

Discord

Twitter
GitHub
Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How RunnableParallel: Manipulating
Language to data

On this page

Manipulating inputs & output

RunnableParallel can be useful for manipulating the output of one Runnable to match the input format of the next Runnable in
a sequence.

%pip install --upgrade --quiet langchain langchain-openai

from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

vectorstore = FAISS.from_texts(
["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()

retrieval_chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)

retrieval_chain.invoke("where did harrison work?")

'Harrison worked at Kensho.'
TIP

Note that when composing a RunnableParallel with another Runnable we don’t even need to wrap our dictionary in the
RunnableParallel class — the type conversion is handled for us. In the context of a chain, these are equivalent:

{"context": retriever, "question": RunnablePassthrough()}

RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
RunnableParallel(context=retriever, question=RunnablePassthrough())

Using itemgetter as shorthand

Note that you can use Python’s itemgetter as shorthand to extract data from the map when combining withRunnableParallel. You
can ﬁnd more information about itemgetter in the Python Documentation.

In the example below, we use itemgetter to extract speciﬁc keys from the map:
from operator import itemgetter

from langchain_community.vectorstores import FAISS

vectorstore = FAISS.from_texts(
["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()

template = """Answer the question based only on the following context:

{context}

Question: {question}

Answer in the following language: {language}

"""
prompt = ChatPromptTemplate.from_template(template)

chain = (
{
"context": itemgetter("question") | retriever,
"question": itemgetter("question"),
"language": itemgetter("language"),
}
| prompt
| model
| StrOutputParser()
)

chain.invoke({"question": "where did harrison work", "language": "italian"})

'Harrison ha lavorato a Kensho.'

Parallelize steps

RunnableParallel (aka. RunnableMap) makes it easy to execute multiple Runnables in parallel, and to return the output of
these Runnables as a map.

from langchain_core.prompts import ChatPromptTemplate

from langchain_core.runnables import RunnableParallel
from langchain_openai import ChatOpenAI

model = ChatOpenAI()
joke_chain = ChatPromptTemplate.from_template("tell me a joke about {topic}") | model
poem_chain = (
ChatPromptTemplate.from_template("write a 2-line poem about {topic}") | model
)

map_chain = RunnableParallel(joke=joke_chain, poem=poem_chain)

map_chain.invoke({"topic": "bear"})
{'joke': AIMessage(content="Why don't bears wear shoes?\n\nBecause they have bear feet!"),
'poem': AIMessage(content="In the wild's embrace, bear roams free,\nStrength and grace, a majestic decree.")}

Parallelism

RunnableParallel are also useful for running independent processes in parallel, since each Runnable in the map is executed
in parallel. For example, we can see our earlier joke_chain, poem_chain and map_chain all have about the same runtime, even
though map_chain executes both of the other two.

%%timeit

joke_chain.invoke({"topic": "bear"})
958 ms ± 402 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit

poem_chain.invoke({"topic": "bear"})
1.22 s ± 508 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit

map_chain.invoke({"topic": "bear"})
1.15 s ± 119 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Help us out by providing feedback on this documentation page:

Previous
« How to
Next
RunnablePassthrough: Passing data through »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O Chat ModelsQuick Start

On this page

Quick Start
Chat models are a variation on language models. While chat models use language models under the hood, the interface they
use is a bit diﬀerent. Rather than using a “text in, text out” API, they use an interface where “chat messages” are the inputs
and outputs.

Setup

For this example we’ll need to install the OpenAI partner package:

pip install langchain-openai

Accessing the API requires an API key, which you can get by creating an account and headinghere. Once we have a key
we’ll want to set it as an environment variable by running:

export OPENAI_API_KEY="..."

If you’d prefer not to set an environment variable you can pass the key in directly via theopenai_api_key named parameter
when initiating the OpenAI LLM class:

from langchain_openai import ChatOpenAI

chat = ChatOpenAI(openai_api_key="...")

Otherwise you can initialize without any params:

from langchain_openai import ChatOpenAI

chat = ChatOpenAI()

Messages

The chat model interface is based around messages rather than raw text. The types of messages currently supported in
LangChain are AIMessage, HumanMessage , SystemMessage, FunctionMessage and ChatMessage – ChatMessage takes in an arbitrary role
parameter. Most of the time, you’ll just be dealing with HumanMessage , AIMessage, and SystemMessage

LCEL

Chat models implement the Runnable interface, the basic building block of theLangChain Expression Language (LCEL). This
means they support invoke , ainvoke, stream, astream, batch, abatch, astream_log calls.

Chat models accept List[BaseMessage] as inputs, or objects which can be coerced to messages, includingstr (converted to
HumanMessage ) and PromptValue.

from langchain_core.messages import HumanMessage, SystemMessage

messages = [
SystemMessage(content="You're a helpful assistant"),
HumanMessage(content="What is the purpose of model regularization?"),
]
chat.invoke(messages)
AIMessage(content="The purpose of model regularization is to prevent overﬁtting in machine learning models. Overﬁtting occurs when a model becomes too comple

for chunk in chat.stream(messages):

print(chunk.content, end="", flush=True)
The purpose of model regularization is to prevent overfitting and improve the generalization of a machine learning model. Overfitting occurs when a model is too com

chat.batch([messages])
[AIMessage(content="The purpose of model regularization is to prevent overﬁtting in machine learning models. Overﬁtting occurs when a model becomes too comple

await chat.ainvoke(messages)
AIMessage(content='The purpose of model regularization is to prevent overﬁtting in machine learning models. Overﬁtting occurs when a model becomes too complex

async for chunk in chat.astream(messages):

print(chunk.content, end="", flush=True)
The purpose of model regularization is to prevent overfitting in machine learning models. Overfitting occurs when a model becomes too complex and starts to memor

async for chunk in chat.astream_log(messages):

print(chunk)
RunLogPatch({'op': 'replace',
'path': '',
'value': {'final_output': None,
'id': '754c4143-2348-46c4-ad2b-3095913084c6',
'logs': {},
'streamed_output': []}})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='The')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' purpose')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' of')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' model')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' regularization')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' is')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' to')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' prevent')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' a')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' machine')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' learning')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' model')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' from')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' over')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='fit')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='ting')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' training')})
RunLogPatch({'op': 'add',
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' data')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' and')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' improve')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' its')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' general')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='ization')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' ability')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='.')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' Over')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='fit')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='ting')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' occurs')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' when')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' a')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' model')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' becomes')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' too')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' complex')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' and')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' learns')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' to')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' fit')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' noise')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' or')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' random')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' fluctuations')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' in')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' training')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' data')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=',')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' instead')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' of')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' capturing')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' underlying')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' patterns')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' and')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' relationships')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='.')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' Regular')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='ization')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' techniques')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' introduce')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' a')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' penalty')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' term')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' to')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' model')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content="'s")})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' objective')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' function')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=',')})
'value': AIMessageChunk(content=',')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' which')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' discour')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='ages')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' model')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' from')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' becoming')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' too')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' complex')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='.')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' This')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' helps')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' to')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' control')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' model')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content="'s")})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' complexity')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' and')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' reduces')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' risk')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' of')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' over')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='fit')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='ting')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=',')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' leading')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' to')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' better')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' performance')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' on')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' unseen')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' data')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='.')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='')})
RunLogPatch({'op': 'replace',
'path': '/final_output',
'value': {'generations': [[{'generation_info': {'finish_reason': 'stop'},
'message': AIMessageChunk(content="The purpose of model regularization is to prevent a machine learning model from overfitting the training da
'text': 'The purpose of model regularization is '
'to prevent a machine learning model '
'from overfitting the training data and '
'improve its generalization ability. '
'Overfitting occurs when a model becomes '
'too complex and learns to fit the noise '
'or random fluctuations in the training '
'data, instead of capturing the '
'underlying patterns and relationships. '
'Regularization techniques introduce a '
"penalty term to the model's objective "
'function, which discourages the model '
'from becoming too complex. This helps '
"to control the model's complexity and "
'reduces the risk of overfitting, '
'leading to better performance on unseen '
'data.'}]],
'llm_output': None,
'run': None}})

LangSmith

All ChatModels come with built-in LangSmith tracing. Just set the following environment variables:

export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY=<your-api-key>

and any ChatModel invocation (whether it’s nested in a chain or not) will automatically be traced. A trace will include inputs,
outputs, latency, token usage, invocation params, environment params, and more. See an example here:
https://smith.langchain.com/public/a54192ae-dd5c-4f7a-88d1-daa1eaba1af7/r.

In LangSmith you can then provide feedback for any trace, compile annotated datasets for evals, debug performance in the
playground, and more.

[Legacy] call {#legacy-call}

Messages in -> message out

For convenience you can also treat chat models as callables. You can get chat completions by passing one or more
messages to the chat model. The response will be a message.
from langchain_core.messages import HumanMessage, SystemMessage

chat(
[
HumanMessage(
content="Translate this sentence from English to French: I love programming."
)
]
)
AIMessage(content="J'adore la programmation.")

OpenAI’s chat model supports multiple messages as input. Seehere for more information. Here is an example of sending a
system and user message to the chat model:

messages = [
SystemMessage(
content="You are a helpful assistant that translates English to French."
),
HumanMessage(content="I love programming."),
]
chat(messages)
AIMessage(content="J'adore la programmation.")

[Legacy] generate

Batch calls, richer outputs

You can go one step further and generate completions for multiple sets of messages usinggenerate. This returns an LLMResult
with an additional message parameter. This will include additional information about each generation beyond the returned
message (e.g. the ﬁnish reason) and additional information about the full API call (e.g. total tokens used).

batch_messages = [
[
SystemMessage(
content="You are a helpful assistant that translates English to French."
),
HumanMessage(content="I love programming."),
],
[
SystemMessage(
content="You are a helpful assistant that translates English to French."
),
HumanMessage(content="I love artiﬁcial intelligence."),
],
]
result = chat.generate(batch_messages)
result
LLMResult(generations=[[ChatGeneration(text="J'adore programmer.", generation_info={'ﬁnish_reason': 'stop'}, message=AIMessage(content="J'adore programmer.

You can recover things like token usage from this LLMResult:

result.llm_output
{'token_usage': {'prompt_tokens': 53,
'completion_tokens': 18,
'total_tokens': 71},
'model_name': 'gpt-3.5-turbo'}

Help us out by providing feedback on this documentation page:

Previous
« Chat Models
Next
Function calling »
Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog

YouTube
LangChain Expression Code
Language Cookbookwriting

Code writing
Example of how to use LCEL to write Python code.

%pip install --upgrade --quiet langchain-core langchain-experimental langchain-openai

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import (
ChatPromptTemplate,
)
from langchain_experimental.utilities import PythonREPL
from langchain_openai import ChatOpenAI
template = """Write some python code to solve the user's problem.

Return only python code in Markdown format, e.g.:

```python
....
```"""
prompt = ChatPromptTemplate.from_messages([("system", template), ("human", "{input}")])

model = ChatOpenAI()
def _sanitize_output(text: str):
_, after = text.split("```python")
return after.split("```")[0]
chain = prompt | model | StrOutputParser() | _sanitize_output | PythonREPL().run
chain.invoke({"input": "whats 2 plus 2"})
Python REPL can execute arbitrary code. Use with caution.
'4\n'

Help us out by providing feedback on this documentation page:

Previous
« Agents
Next
Routing by semantic similarity »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model Output
ModulesI/O Parsers

Output Parsers
Output parsers are responsible for taking the output of an LLM and transforming it to a more suitable format. This is very
useful when you are using LLMs to generate any form of structured data.

Besides having a large collection of diﬀerent types of output parsers, one distinguishing beneﬁt of LangChain OutputParsers
is that many of them support streaming.

Quick Start

See this quick-start guide for an introduction to output parsers and how to work with them.

Output Parser Types

LangChain has lots of diﬀerent types of output parsers. This is a list of output parsers LangChain supports. The table below
has various pieces of information:

Name: The name of the output parser

Supports Streaming: Whether the output parser supports streaming.

Has Format Instructions: Whether the output parser has format instructions. This is generally available except when (a) the
desired schema is not speciﬁed in the prompt but rather in other parameters (like OpenAI function calling), or (b) when the
OutputParser wraps another OutputParser.

Calls LLM: Whether this output parser itself calls an LLM. This is usually only done by output parsers that attempt to correct
misformatted output.

Input Type: Expected input type. Most output parsers work on both strings and messages, but some (like OpenAI Functions)
need a message with speciﬁc kwargs.

Output Type: The output type of the object returned by the parser.

Description: Our commentary on this output parser and when to use it.
Supports Has Format Calls Input
Name Output Type Description
Streaming Instructions LLM Type
Message
Uses latest OpenAI function calling args tools and
(Passes tools tool_choice to structure the return output. If you
OpenAITools (with JSON object
to model) tool_choice)
are using a model that supports function calling,
this is generally the most reliable method.
(Passes Message Uses legacy OpenAI function calling args
OpenAIFunctions ✅ functions to (with JSON object functions and function_call to structure the return
model) function_call) output.
Returns a JSON object as specified. You can
str \|
specify a Pydantic model and it will return JSON
JSON ✅ ✅ Message
JSON object for that model. Probably the most reliable output
parser for getting structured data that does NOT
use function calling.
str \|
Returns a dictionary of tags. Use when XML
XML ✅ ✅ Message
dict output is needed. Use with models that are good
at writing XML (like Anthropic's).
str \|
CSV ✅ ✅ List[str] Returns a list of comma separated values.
Message
Wraps another output parser. If that output
str \| parser errors, then this will pass the error
OutputFixing ✅
Message message and the bad output to an LLM and ask
it to fix the output.
Wraps another output parser. If that output
parser errors, then this will pass the original
str \| inputs, the bad output, and the error message to
RetryWithError ✅
Message an LLM and ask it to fix it. Compared to
OutputFixingParser, this one also sends the
original instructions.
str \| Takes a user defined Pydantic model and
Pydantic ✅ pydantic.BaseModel
Message returns data in that format.
Takes a user defined Pydantic model and
str \|
YAML ✅ Message
pydantic.BaseModel returns data in that format. Uses YAML to
encode it.
str \| Useful for doing operations with pandas
PandasDataFrame ✅ dict
Message DataFrames.
str \| Parses response into one of the provided enum
Enum ✅ Enum
Message values.
str \|
Datetime ✅ datetime.datetime Parses response into a datetime string.
Message
An output parser that returns structured
information. It is less powerful than other output
str \|
Structured ✅ Dict[str, str] parsers since it only allows for fields to be
Message
strings. This can be useful when you are
working with smaller LLMs.

Help us out by providing feedback on this documentation page:

Previous
« Tracking token usage
Next
Quickstart »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesRetrievalRetrieversMultiVector Retriever

On this page

MultiVector Retriever
It can often be beneﬁcial to store multiple vectors per document. There are multiple use cases where this is beneﬁcial.
LangChain has a base MultiVectorRetriever which makes querying this type of setup easy. A lot of the complexity lies in how to
create the multiple vectors per document. This notebook covers some of the common ways to create those vectors and use
the MultiVectorRetriever.

The methods to create multiple vectors per document include:

Smaller chunks: split a document into smaller chunks, and embed those (this is ParentDocumentRetriever).
Summary: create a summary for each document, embed that along with (or instead of) the document.
Hypothetical questions: create hypothetical questions that each document would be appropriate to answer, embed
those along with (or instead of) the document.

Note that this also enables another method of adding embeddings - manually. This is great because you can explicitly add
questions or queries that should lead to a document being recovered, giving you more control.

from langchain.retrievers.multi_vector import MultiVectorRetriever

from langchain.storage import InMemoryByteStore
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
loaders = [
TextLoader("../../paul_graham_essay.txt"),
TextLoader("../../state_of_the_union.txt"),
]
docs = []
for loader in loaders:
docs.extend(loader.load())
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000)
docs = text_splitter.split_documents(docs)

Smaller chunks

Often times it can be useful to retrieve larger chunks of information, but embed smaller chunks. This allows for embeddings
to capture the semantic meaning as closely as possible, but for as much context as possible to be passed downstream. Note
that this is what the ParentDocumentRetriever does. Here we show what is going on under the hood.

# The vectorstore to use to index the child chunks

vectorstore = Chroma(
collection_name="full_documents", embedding_function=OpenAIEmbeddings()
)
# The storage layer for the parent documents
store = InMemoryByteStore()
id_key = "doc_id"
# The retriever (empty to start)
retriever = MultiVectorRetriever(
vectorstore=vectorstore,
byte_store=store,
id_key=id_key,
)
import uuid

doc_ids = [str(uuid.uuid4()) for _ in docs]

# The splitter to use to create smaller chunks
child_text_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
sub_docs = []
for i, doc in enumerate(docs):
_id = doc_ids[i]
_sub_docs = child_text_splitter.split_documents([doc])
for _doc in _sub_docs:
_doc.metadata[id_key] = _id
sub_docs.extend(_sub_docs)
retriever.vectorstore.add_documents(sub_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))
# Vectorstore alone retrieves the small chunks
retriever.vectorstore.similarity_search("justice breyer")[0]
Document(page_content='Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitution

# Retriever returns larger chunks

len(retriever.get_relevant_documents("justice breyer")[0].page_content)
9875

The default search type the retriever performs on the vector database is a similarity search. LangChain Vector Stores also
support searching via Max Marginal Relevance so if you want this instead you can just set thesearch_type property as follows:

from langchain.retrievers.multi_vector import SearchType

retriever.search_type = SearchType.mmr

len(retriever.get_relevant_documents("justice breyer")[0].page_content)
9875

Summary

Oftentimes a summary may be able to distill more accurately what a chunk is about, leading to better retrieval. Here we show
how to create summaries, and then embed those.

import uuid

from langchain_core.documents import Document

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
chain = (
{"doc": lambda x: x.page_content}
| ChatPromptTemplate.from_template("Summarize the following document:\n\n{doc}")
| ChatOpenAI(max_retries=0)
| StrOutputParser()
)
summaries = chain.batch(docs, {"max_concurrency": 5})
# The vectorstore to use to index the child chunks
vectorstore = Chroma(collection_name="summaries", embedding_function=OpenAIEmbeddings())
# The storage layer for the parent documents
store = InMemoryByteStore()
id_key = "doc_id"
# The retriever (empty to start)
retriever = MultiVectorRetriever(
vectorstore=vectorstore,
byte_store=store,
id_key=id_key,
)
doc_ids = [str(uuid.uuid4()) for _ in docs]
summary_docs = [
Document(page_content=s, metadata={id_key: doc_ids[i]})
for i, s in enumerate(summaries)
]
retriever.vectorstore.add_documents(summary_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))
# # We can also add the original chunks to the vectorstore if we so want
# for i, doc in enumerate(docs):
# doc.metadata[id_key] = doc_ids[i]
# retriever.vectorstore.add_documents(docs)
sub_docs = vectorstore.similarity_search("justice breyer")
sub_docs[0]
Document(page_content="The document is a speech given by President Biden addressing various issues and outlining his agenda for the nation. He highlights the im

retrieved_docs = retriever.get_relevant_documents("justice breyer")

len(retrieved_docs[0].page_content)
9194

Hypothetical Queries
An LLM can also be used to generate a list of hypothetical questions that could be asked of a particular document. These
questions can then be embedded

functions = [
{
"name": "hypothetical_questions",
"description": "Generate hypothetical questions",
"parameters": {
"type": "object",
"properties": {
"questions": {
"type": "array",
"items": {"type": "string"},
},
},
"required": ["questions"],
},
}
]
from langchain.output_parsers.openai_functions import JsonKeyOutputFunctionsParser

chain = (
{"doc": lambda x: x.page_content}
# Only asking for 3 hypothetical questions, but this could be adjusted
| ChatPromptTemplate.from_template(
"Generate a list of exactly 3 hypothetical questions that the below document could be used to answer:\n\n{doc}"
)
| ChatOpenAI(max_retries=0, model="gpt-4").bind(
functions=functions, function_call={"name": "hypothetical_questions"}
)
| JsonKeyOutputFunctionsParser(key_name="questions")
)
chain.invoke(docs[0])
["What was the author's ﬁrst experience with programming like?",
'Why did the author switch their focus from AI to Lisp during their graduate studies?',
'What led the author to contemplate a career in art instead of computer science?']
hypothetical_questions = chain.batch(docs, {"max_concurrency": 5})
# The vectorstore to use to index the child chunks
vectorstore = Chroma(
collection_name="hypo-questions", embedding_function=OpenAIEmbeddings()
)
# The storage layer for the parent documents
store = InMemoryByteStore()
id_key = "doc_id"
# The retriever (empty to start)
retriever = MultiVectorRetriever(
vectorstore=vectorstore,
byte_store=store,
id_key=id_key,
)
doc_ids = [str(uuid.uuid4()) for _ in docs]
question_docs = []
for i, question_list in enumerate(hypothetical_questions):
question_docs.extend(
[Document(page_content=s, metadata={id_key: doc_ids[i]}) for s in question_list]
)
retriever.vectorstore.add_documents(question_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))
sub_docs = vectorstore.similarity_search("justice breyer")
sub_docs
[Document(page_content='Who has been nominated to serve on the United States Supreme Court?', metadata={'doc_id': '0b3a349e-c936-4e77-9c40-0a39fc3e07f0'}
Document(page_content="What was the context and content of Robert Morris' advice to the document's author in 2010?", metadata={'doc_id': 'b2b2cdca-988a-4af1-
Document(page_content='How did personal circumstances inﬂuence the decision to pass on the leadership of Y Combinator?', metadata={'doc_id': 'b2b2cdca-988a-
Document(page_content='What were the reasons for the author leaving Yahoo in the summer of 1999?', metadata={'doc_id': 'ce4f4981-ca60-4f56-86f0-89466de623

retrieved_docs = retriever.get_relevant_documents("justice breyer")

len(retrieved_docs[0].page_content)
9194

Help us out by providing feedback on this documentation page:

Previous
« Long-Context Reorder
Next
Parent Document Retriever »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Text Recursively split by
ModulesRetrievalSplitters character

Recursively split by character

This text splitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in
order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""] . This has the eﬀect of trying to keep all paragraphs
(and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest
semantically related pieces of text.

1. How the text is split: by list of characters.

2. How the chunk size is measured: by number of characters.

%pip install -qU langchain-text-splitters

# This is a long document we can split up.
with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
# Set a really small chunk size, just to show.
chunk_size=100,
chunk_overlap=20,
length_function=len,
is_separator_regex=False,
)
texts = text_splitter.create_documents([state_of_the_union])
print(texts[0])
print(texts[1])
page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and'
page_content='of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.'
text_splitter.split_text(state_of_the_union)[:2]
['Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and',
'of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.']

Help us out by providing feedback on this documentation page:

Previous
« Recursively split JSON
Next
Semantic Chunking »

Community

Discord

Twitter
GitHub

Python
JS/TS
More
Homepage
Blog

YouTube
LangChain Expression
Language Cookbook

Cookbook
Example code for accomplishing common tasks with the LangChain Expression Language (LCEL). These examples show
how to compose diﬀerent Runnable (the core LCEL interface) components to achieve various tasks. If you're just getting
acquainted with LCEL, the Prompt + LLM page is a good place to start.

️ Prompt + LLM
The most common and valuable composition is taking:

️ RAG
Let’s look at adding in a retrieval step to a prompt and LLM, which adds

️ Multiple chains
Runnables can easily be used to string together multiple Chains

️ Querying a SQL DB
We can replicate our SQLDatabaseChain with Runnables.

️ Agents
You can pass a Runnable into an agent. Make sure you have langchainhub

️ Code writing
Example of how to use LCEL to write Python code.

️ Routing by semantic similarity

With LCEL you can easily add [custom routing
️ Adding memory
This shows how to add memory to an arbitrary chain. Right now, you can

️ Adding moderation
This shows how to add in moderation (or other safeguards) around your

️ Managing prompt size

Agents dynamically call tools. The results of those tool calls are added

️ Using tools
You can use any Tools with Runnables easily.

Help us out by providing feedback on this documentation page:

Previous
« Add message history (memory)
Next
Prompt + LLM »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Memory in the Multi-Input
ModulesMoreMemoryChain

Memory in the Multi-Input Chain

Most memory objects assume a single input. In this notebook, we go over how to add memory to a chain that has multiple
inputs. We will add memory to a question/answering chain. This chain takes as inputs both related documents and a user
question.

from langchain_community.vectorstores import Chroma

from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)

embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_texts(
texts, embeddings, metadatas=[{"source": i} for i in range(len(texts))]
)
Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.
query = "What did the president say about Justice Breyer"
docs = docsearch.similarity_search(query)
from langchain.chains.question_answering import load_qa_chain
from langchain.memory import ConversationBuﬀerMemory
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI
template = """You are a chatbot having a conversation with a human.

Given the following extracted parts of a long document and a question, create a ﬁnal answer.

{context}

{chat_history}
Human: {human_input}
Chatbot:"""

prompt = PromptTemplate(
input_variables=["chat_history", "human_input", "context"], template=template
)
memory = ConversationBuﬀerMemory(memory_key="chat_history", input_key="human_input")
chain = load_qa_chain(
OpenAI(temperature=0), chain_type="stuﬀ", memory=memory, prompt=prompt
)
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "human_input": query}, return_only_outputs=True)
{'output_text': ' Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, a

print(chain.memory.buﬀer)

Human: What did the president say about Justice Breyer

AI: Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring

Help us out by providing feedback on this documentation page:

Previous
« Memory in LLMChain
Next
Memory in Agent »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog

YouTube
Document
ModulesRetrievalloaders File Directory

On this page

File Directory
This covers how to load all documents in a directory.

Under the hood, by default this uses the UnstructuredLoader.

from langchain_community.document_loaders import DirectoryLoader

We can use the glob parameter to control which files to load. Note that here it doesn't load the.rst file or the .html files.

loader = DirectoryLoader('../', glob="**/*.md")

docs = loader.load()
len(docs)
1

Show a progress bar

By default a progress bar will not be shown. To show a progress bar, install thetqdm library (e.g. pip install tqdm ), and set the
show_progress parameter to True.

loader = DirectoryLoader('../', glob="**/*.md", show_progress=True)

docs = loader.load()
Requirement already satisﬁed: tqdm in /Users/jon/.pyenv/versions/3.9.16/envs/microbiome-app/lib/python3.9/site-packages (4.65.0)

0it [00:00, ?it/s]

Use multithreading

By default the loading happens in one thread. In order to utilize several threads set theuse_multithreading ﬂag to true.

loader = DirectoryLoader('../', glob="**/*.md", use_multithreading=True)

docs = loader.load()

Change loader class

By default this uses the UnstructuredLoader class. However, you can change up the type of loader pretty easily.

from langchain_community.document_loaders import TextLoader

loader = DirectoryLoader('../', glob="**/*.md", loader_cls=TextLoader)
docs = loader.load()
len(docs)
1

If you need to load Python source code ﬁles, use the PythonLoader.

from langchain_community.document_loaders import PythonLoader

loader = DirectoryLoader('../../../../../', glob="**/*.py", loader_cls=PythonLoader)
docs = loader.load()
len(docs)
691
Auto-detect ﬁle encodings with TextLoader

In this example we will see some strategies that can be useful when loading a large list of arbitrary ﬁles from a directory using
the TextLoader class.

First to illustrate the problem, let's try to load multiple texts with arbitrary encodings.

path = '../../../../../tests/integration_tests/examples'
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader)

A. Default Behavior
loader.load()
<pre style="white-space:pre;overﬂow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">│ /data/source/langchain/langch
│ │ 26 │ 27 │ 28 │ ❱ 29 │ 30 │ 31 │ 32 │ │ /home/spike/.pyenv/versions/3
│ │ 319 │ 320 │ 321 │ ❱ 322 │ 323 │ 324 │ 325 ╰────────────────────────────────────────────
UnicodeDecodeError: <span style="color: #008000; text-decoration-color

The above exception was the direct cause of the following exception:

╭─────────────────────────────── <span style

│ in <module>│ │ ❱ 1 loader.load()
│ 2 
│ │ /data/source/langchain/langch
│ │ 81 │ 82 │ 83 │ ❱ 84 │ 85 │ 86 │ 87 │ │ /data/source/langchain/langch
│ │ 75 │ 76 │ 77 │ ❱ 78 │ 79 │ 80 │ 81 │ │ /data/source/langchain/langch
│ │ 41 │ 42 │ 43 │ ❱ 44 │ 45 │ 46 │ 47 
╰────────────────────────────────────────────
RuntimeError: Error loading ..
The file example-non-utf8.txt uses a different encoding, so the load() function fails with a helpful message indicating which file
failed decoding.

With the default behavior of TextLoader any failure to load any of the documents will fail the whole loading process and no
documents are loaded.

B. Silent fail

We can pass the parameter silent_errors to the DirectoryLoader to skip the ﬁles which could not be loaded and continue the load
process.

loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, silent_errors=True)

docs = loader.load()
Error loading ../../../../../tests/integration_tests/examples/example-non-utf8.txt
doc_sources = [doc.metadata['source'] for doc in docs]
doc_sources
['../../../../../tests/integration_tests/examples/whatsapp_chat.txt',
'../../../../../tests/integration_tests/examples/example-utf8.txt']

C. Auto detect encodings

We can also ask TextLoader to auto detect the ﬁle encoding before failing, by passing theautodetect_encoding to the loader class.

text_loader_kwargs={'autodetect_encoding': True}
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
docs = loader.load()
doc_sources = [doc.metadata['source'] for doc in docs]
doc_sources
['../../../../../tests/integration_tests/examples/example-non-utf8.txt',
'../../../../../tests/integration_tests/examples/whatsapp_chat.txt',
'../../../../../tests/integration_tests/examples/example-utf8.txt']

Help us out by providing feedback on this documentation page:

Previous
« CSV
Next
HTML »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How RunnablePassthrough: Passing data
Language to through

On this page

Passing data through

RunnablePassthrough allows to pass inputs unchanged or with the addition of extra keys. This typically is used in conjuction
with RunnableParallel to assign data to a new key in the map.

RunnablePassthrough() called on it’s own, will simply take the input and pass it through.

RunnablePassthrough called with assign (RunnablePassthrough.assign(...)) will take the input, and will add the extra arguments
passed to the assign function.

See the example below:

%pip install --upgrade --quiet langchain langchain-openai

from langchain_core.runnables import RunnableParallel, RunnablePassthrough

runnable = RunnableParallel(
passed=RunnablePassthrough(),
extra=RunnablePassthrough.assign(mult=lambda x: x["num"] * 3),
modiﬁed=lambda x: x["num"] + 1,
)

runnable.invoke({"num": 1})
{'passed': {'num': 1}, 'extra': {'num': 1, 'mult': 3}, 'modiﬁed': 2}

As seen above, passed key was called with RunnablePassthrough() and so it simply passed on {'num': 1} .

In the second line, we used RunnablePastshrough.assign with a lambda that multiplies the numerical value by 3. In this cased,extra
was set with {'num': 1, 'mult': 3} which is the original value with the mult key added.

Finally, we also set a third key in the map withmodiﬁed which uses a lambda to set a single value adding 1 to the num, which
resulted in modiﬁed key with the value of 2.

Retrieval Example

In the example below, we see a use case where we use RunnablePassthrough along with RunnableMap.
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()

retrieval_chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)

retrieval_chain.invoke("where did harrison work?")

'Harrison worked at Kensho.'

Here the input to prompt is expected to be a map with keys “context” and “question”. The user input is just the question. So
we need to get the context using our retriever and passthrough the user input under the “question” key. In this case, the
RunnablePassthrough allows us to pass on the user’s question to the prompt and model.

Help us out by providing feedback on this documentation page:

Previous
« RunnableParallel: Manipulating data
Next
RunnableLambda: Run Custom Functions »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
LangSmith

LangSmith
LangSmith helps you trace and evaluate your language model applications and intelligent agents to help you move from
prototype to production.

Check out the interactive walkthrough to get started.

For more information, please refer to theLangSmith documentation.

For tutorials and other end-to-end examples demonstrating ways to integrate LangSmith in your workﬂow, check out the
LangSmith Cookbook. Some of the guides therein include:

Leveraging user feedback in your JS application (link).

Building an automated feedback pipeline (link).
How to evaluate and audit your RAG workﬂows (link).
How to ﬁne-tune an LLM on real usage data (link).
How to use the LangChain Hub to version your prompts (link)

Help us out by providing feedback on this documentation page:

Previous
« ️ LangServe
Next
LangSmith »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog

YouTube
ModulesRetrievalIndexing

On this page

Indexing
Here, we will look at a basic indexing workﬂow using the LangChain indexing API.

The indexing API lets you load and keep in sync documents from any source into a vector store. Speciﬁcally, it helps:

Avoid writing duplicated content into the vector store

Avoid re-writing unchanged content
Avoid re-computing embeddings over unchanged content

All of which should save you time and money, as well as improve your vector search results.

Crucially, the indexing API will work even with documents that have gone through several transformation steps (e.g., via text
chunking) with respect to the original source documents.

How it works

LangChain indexing makes use of a record manager (RecordManager) that keeps track of document writes into the vector store.

When indexing content, hashes are computed for each document, and the following information is stored in the record
manager:

the document hash (hash of both page content and metadata)

write time
the source id – each document should include information in its metadata to allow us to determine the ultimate source
of this document

Deletion modes

When indexing documents into a vector store, it’s possible that some existing documents in the vector store should be
deleted. In certain situations you may want to remove any existing documents that are derived from the same sources as the
new documents being indexed. In others you may want to delete all existing documents wholesale. The indexing API deletion
modes let you pick the behavior you want:

Cleanup De-Duplicates Cleans Up Deleted Cleans Up Mutations of Source Docs Clean Up

Parallelizable
Mode Content Source Docs and/or Derived Docs Timing
None ✅ ✅ ❌ ❌ -
Incremental ✅ ✅ ❌ ✅ Continuously
At end of
Full ✅ ❌ ✅ ✅
indexing

None does not do any automatic clean up, allowing the user to manually do clean up of old content.

incremental and full oﬀer the following automated clean up:

If the content of the source document or derived documents haschanged, both incremental or full modes will clean up
(delete) previous versions of the content.
If the source document has been deleted (meaning it is not included in the documents currently being indexed), thefull
cleanup mode will delete it from the vector store correctly, but the incremental mode will not.

When content is mutated (e.g., the source PDF ﬁle was revised) there will be a period of time during indexing when both the
new and old versions may be returned to the user. This happens after the new content was written, but before the old version
was deleted.

incremental indexing minimizes this period of time as it is able to do clean up continuously, as it writes.
full mode does the clean up after all batches have been written.

Requirements

1. Do not use with a store that has been pre-populated with content independently of the indexing API, as the record
manager will not know that records have been inserted previously.
2. Only works with LangChain vectorstore’s that support:
document addition by id (add_documents method with ids argument)
delete by id (delete method with ids argument)

Compatible Vectorstores: AnalyticDB, AstraDB, AwaDB, Bagel, Cassandra, Chroma, DashVector, DatabricksVectorSearch, DeepLake, Dingo,
ElasticVectorSearch, ElasticsearchStore, FAISS , HanaDB, Milvus, MyScale, OpenSearchVectorSearch , PGVector, Pinecone, Qdrant, Redis, Rockset,
ScaNN, SupabaseVectorStore, SurrealDBStore, TimescaleVector, Vald, Vearch, VespaStore, Weaviate, ZepVectorStore.

Caution

The record manager relies on a time-based mechanism to determine what content can be cleaned up (when usingfull or
incremental cleanup modes).

If two tasks run back-to-back, and the ﬁrst task ﬁnishes before the clock time changes, then the second task may not be able
to clean up content.

This is unlikely to be an issue in actual settings for the following reasons:

1. The RecordManager uses higher resolution timestamps.

2. The data would need to change between the ﬁrst and the second tasks runs, which becomes unlikely if the time interval
between the tasks is small.
3. Indexing tasks typically take more than a few ms.

Quickstart

from langchain.indexes import SQLRecordManager, index

from langchain_core.documents import Document
from langchain_elasticsearch import ElasticsearchStore
from langchain_openai import OpenAIEmbeddings

Initialize a vector store and set up the embeddings:

collection_name = "test_index"

embedding = OpenAIEmbeddings()

vectorstore = ElasticsearchStore(
es_url="http://localhost:9200", index_name="test_index", embedding=embedding
)

Initialize a record manager with an appropriate namespace.

Suggestion: Use a namespace that takes into account both the vector store and the collection name in the vector store; e.g.,
‘redis/my_docs’, ‘chromadb/my_docs’ or ‘postgres/my_docs’.

namespace = f"elasticsearch/{collection_name}"
record_manager = SQLRecordManager(
namespace, db_url="sqlite:///record_manager_cache.sql"
)

Create a schema before using the record manager.

record_manager.create_schema()

Let’s index some test documents:

doc1 = Document(page_content="kitty", metadata={"source": "kitty.txt"})

doc2 = Document(page_content="doggy", metadata={"source": "doggy.txt"})
Indexing into an empty vector store:

def _clear():
"""Hacky helper method to clear content. See the `full` mode section to to understand why it works."""
index([], record_manager, vectorstore, cleanup="full", source_id_key="source")

None deletion mode

This mode does not do automatic clean up of old versions of content; however, it still takes care of content de-duplication.

_clear()
index(
[doc1, doc1, doc1, doc1, doc1],
record_manager,
vectorstore,
cleanup=None,
source_id_key="source",
)
{'num_added': 1, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}
_clear()
index([doc1, doc2], record_manager, vectorstore, cleanup=None, source_id_key="source")
{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

Second time around all content will be skipped:

index([doc1, doc2], record_manager, vectorstore, cleanup=None, source_id_key="source")

{'num_added': 0, 'num_updated': 0, 'num_skipped': 2, 'num_deleted': 0}

"incremental" deletion mode

_clear()
index(
[doc1, doc2],
record_manager,
vectorstore,
cleanup="incremental",
source_id_key="source",
)
{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

Indexing again should result in both documents gettingskipped – also skipping the embedding operation!

index(
[doc1, doc2],
record_manager,
vectorstore,
cleanup="incremental",
source_id_key="source",
)
{'num_added': 0, 'num_updated': 0, 'num_skipped': 2, 'num_deleted': 0}

If we provide no documents with incremental indexing mode, nothing will change.

index([], record_manager, vectorstore, cleanup="incremental", source_id_key="source")

{'num_added': 0, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

If we mutate a document, the new version will be written and all old versions sharing the same source will be deleted.

changed_doc_2 = Document(page_content="puppy", metadata={"source": "doggy.txt"})

index(
[changed_doc_2],
record_manager,
vectorstore,
cleanup="incremental",
source_id_key="source",
)
{'num_added': 1, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 1}

"full" deletion mode

In full mode the user should pass the full universe of content that should be indexed into the indexing function.

Any documents that are not passed into the indexing function and are present in the vectorstore will be deleted!

This behavior is useful to handle deletions of source documents.

_clear()
all_docs = [doc1, doc2]
index(all_docs, record_manager, vectorstore, cleanup="full", source_id_key="source")
{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

Say someone deleted the ﬁrst doc:

del all_docs[0]
all_docs
[Document(page_content='doggy', metadata={'source': 'doggy.txt'})]

Using full mode will clean up the deleted content as well.

index(all_docs, record_manager, vectorstore, cleanup="full", source_id_key="source")

{'num_added': 0, 'num_updated': 0, 'num_skipped': 1, 'num_deleted': 1}

Source

The metadata attribute contains a ﬁeld called source. This source should be pointing at theultimate provenance associated
with the given document.

For example, if these documents are representing chunks of some parent document, thesource for both documents should be
the same and reference the parent document.

In general, source should always be speciﬁed. Only use a None, if you never intend to use incremental mode, and for some
reason can’t specify the source ﬁeld correctly.

from langchain_text_splitters import CharacterTextSplitter

doc1 = Document(
page_content="kitty kitty kitty kitty kitty", metadata={"source": "kitty.txt"}
)
doc2 = Document(page_content="doggy doggy the doggy", metadata={"source": "doggy.txt"})
new_docs = CharacterTextSplitter(
separator="t", keep_separator=True, chunk_size=12, chunk_overlap=2
).split_documents([doc1, doc2])
new_docs
[Document(page_content='kitty kit', metadata={'source': 'kitty.txt'}),
Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
Document(page_content='doggy doggy', metadata={'source': 'doggy.txt'}),
Document(page_content='the doggy', metadata={'source': 'doggy.txt'})]
_clear()
index(
new_docs,
record_manager,
vectorstore,
cleanup="incremental",
source_id_key="source",
)
{'num_added': 5, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}
changed_doggy_docs = [
Document(page_content="woof woof", metadata={"source": "doggy.txt"}),
Document(page_content="woof woof woof", metadata={"source": "doggy.txt"}),
]

This should delete the old versions of documents associated withdoggy.txt source and replace them with the new versions.

index(
changed_doggy_docs,
record_manager,
vectorstore,
cleanup="incremental",
source_id_key="source",
)
{'num_added': 0, 'num_updated': 0, 'num_skipped': 2, 'num_deleted': 2}
vectorstore.similarity_search("dog", k=30)
[Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]

Using with loaders

Indexing can accept either an iterable of documents or else any loader.

Attention: The loader must set source keys correctly.

from langchain_community.document_loaders.base import BaseLoader

class MyCustomLoader(BaseLoader):
def lazy_load(self):
text_splitter = CharacterTextSplitter(
separator="t", keep_separator=True, chunk_size=12, chunk_overlap=2
)
docs = [
Document(page_content="woof woof", metadata={"source": "doggy.txt"}),
Document(page_content="woof woof woof", metadata={"source": "doggy.txt"}),
]
yield from text_splitter.split_documents(docs)

def load(self):
return list(self.lazy_load())
_clear()
loader = MyCustomLoader()
loader.load()
[Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),
Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'})]
index(loader, record_manager, vectorstore, cleanup="full", source_id_key="source")
{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}
vectorstore.similarity_search("dog", k=30)
[Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),
Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'})]

Help us out by providing feedback on this documentation page:

Previous
« Time-weighted vector store retriever
Next
Agents »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Tools as OpenAI
ModulesAgentsToolsFunctions

Tools as OpenAI Functions

This notebook goes over how to use LangChain tools as OpenAI functions.

%pip install -qU langchain-community langchain-openai

from langchain_community.tools import MoveFileTool
from langchain_core.messages import HumanMessage
from langchain_core.utils.function_calling import convert_to_openai_function
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-3.5-turbo")
tools = [MoveFileTool()]
functions = [convert_to_openai_function(t) for t in tools]
functions[0]
{'name': 'move_file',
'description': 'Move or rename a file from one location to another',
'parameters': {'type': 'object',
'properties': {'source_path': {'description': 'Path of the file to move',
'type': 'string'},
'destination_path': {'description': 'New path for the moved file',
'type': 'string'}},
'required': ['source_path', 'destination_path']}}
message = model.invoke(
[HumanMessage(content="move file foo to bar")], functions=functions
)
message
AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n "source_path": "foo",\n "destination_path": "bar"\n}', 'name': 'move_file'}})
message.additional_kwargs["function_call"]
{'name': 'move_file',
'arguments': '{\n "source_path": "foo",\n "destination_path": "bar"\n}'}

With OpenAI chat models we can also automatically bind and convert function-like objects withbind_functions

model_with_functions = model.bind_functions(tools)
model_with_functions.invoke([HumanMessage(content="move ﬁle foo to bar")])
AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n "source_path": "foo",\n "destination_path": "bar"\n}', 'name': 'move_ﬁle'}})

Or we can use the update OpenAI API that uses tools and tool_choice instead of functions and function_call by using
ChatOpenAI.bind_tools:

model_with_tools = model.bind_tools(tools)
model_with_tools.invoke([HumanMessage(content="move ﬁle foo to bar")])
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_btkY3xV71cEVAOHnNa5qwo44', 'function': {'arguments': '{\n "source_path": "foo",\n "destination_p

Help us out by providing feedback on this documentation page:

Previous
« Deﬁning Custom Tools
Next
Chains »

Community
Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog

YouTube
Get startedInstallation

On this page

Installation

Oﬃcial release

To install LangChain run:

Pip
Conda

pip install langchain

This will install the bare minimum requirements of LangChain. A lot of the value of LangChain comes when integrating it with
various model providers, datastores, etc. By default, the dependencies needed to do that are NOT installed. You will need to
install the dependencies for speciﬁc integrations separately.

From source

If you want to install from source, you can do so by cloning the repo and be sure that the directory is
PATH/TO/REPO/langchain/libs/langchain running:

pip install -e .

LangChain community

The langchain-community package contains third-party integrations. It is automatically installed by langchain , but can also be used
separately. Install with:

pip install langchain-community

LangChain core

The langchain-core package contains base abstractions that the rest of the LangChain ecosystem uses, along with the
LangChain Expression Language. It is automatically installed by langchain , but can also be used separately. Install with:

pip install langchain-core

LangChain experimental

The langchain-experimental package holds experimental LangChain code, intended for research and experimental uses. Install
with:

pip install langchain-experimental

LangServe
LangServe helps developers deploy LangChain runnables and chains as a REST API. LangServe is automatically installed by
LangChain CLI. If not using LangChain CLI, install with:

pip install "langserve[all]"

for both client and server dependencies. Or pip install "langserve[client]" for client code, and pip install "langserve[server]" for server
code.

LangChain CLI

The LangChain CLI is useful for working with LangChain templates and other LangServe projects. Install with:

pip install langchain-cli

LangSmith SDK

The LangSmith SDK is automatically installed by LangChain. If not using LangChain, install with:

pip install langsmith

Help us out by providing feedback on this documentation page:

Previous
« Introduction
Next
Quickstart »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model Types of
ModulesI/O Prompts `MessagePromptTemplate`

Types of `MessagePromptTemplate`
LangChain provides diﬀerent types of MessagePromptTemplate. The most commonly used are AIMessagePromptTemplate,
SystemMessagePromptTemplate and HumanMessagePromptTemplate , which create an AI message, system message and human
message respectively.

However, in cases where the chat model supports taking chat message with arbitrary role, you can use
ChatMessagePromptTemplate, which allows user to specify the role name.

from langchain.prompts import ChatMessagePromptTemplate

prompt = "May the {subject} be with you"

chat_message_prompt = ChatMessagePromptTemplate.from_template(
role="Jedi", template=prompt
)
chat_message_prompt.format(subject="force")
ChatMessage(content='May the force be with you', role='Jedi')

LangChain also provides MessagesPlaceholder, which gives you full control of what messages to be rendered during formatting.
This can be useful when you are uncertain of what role you should be using for your message prompt templates or when you
wish to insert a list of messages during formatting.

from langchain.prompts import (

ChatPromptTemplate,
HumanMessagePromptTemplate,
MessagesPlaceholder,
)

human_prompt = "Summarize our conversation so far in {word_count} words."

human_message_template = HumanMessagePromptTemplate.from_template(human_prompt)

chat_prompt = ChatPromptTemplate.from_messages(
[MessagesPlaceholder(variable_name="conversation"), human_message_template]
)
from langchain_core.messages import AIMessage, HumanMessage

human_message = HumanMessage(content="What is the best way to learn programming?")

ai_message = AIMessage(
content="""\
1. Choose a programming language: Decide on a programming language that you want to learn.

2. Start with the basics: Familiarize yourself with the basic programming concepts such as variables, data types and control structures.

3. Practice, practice, practice: The best way to learn programming is through hands-on experience\
"""
)

chat_prompt.format_prompt(
conversation=[human_message, ai_message], word_count="10"
).to_messages()
[HumanMessage(content='What is the best way to learn programming?'),
AIMessage(content='1. Choose a programming language: Decide on a programming language that you want to learn.\n\n2. Start with the basics: Familiarize yoursel
HumanMessage(content='Summarize our conversation so far in 10 words.')]

Help us out by providing feedback on this documentation page:

Previous
« Few-shot examples for chat models
Next
Partial prompt templates »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesRetrieval

Retrieval
Many LLM applications require user-speciﬁc data that is not part of the model's training set. The primary way of accomplishing
this is through Retrieval Augmented Generation (RAG). In this process, external data is retrieved and then passed to the LLM
when doing the generation step.

LangChain provides all the building blocks for RAG applications - from simple to complex. This section of the documentation
covers everything related to the retrieval step - e.g. the fetching of the data. Although this sounds simple, it can be subtly
complex. This encompasses several key modules.

Document loaders

Document loaders load documents from many diﬀerent sources. LangChain provides over 100 diﬀerent document loaders
as well as integrations with other major providers in the space, like AirByte and Unstructured. LangChain provides
integrations to load all types of documents (HTML, PDF, code) from all types of locations (private S3 buckets, public
websites).

Text Splitting

A key part of retrieval is fetching only the relevant parts of documents. This involves several transformation steps to prepare
the documents for retrieval. One of the primary ones here is splitting (or chunking) a large document into smaller chunks.
LangChain provides several transformation algorithms for doing this, as well as logic optimized for speciﬁc document types
(code, markdown, etc).

Text embedding models

Another key part of retrieval is creating embeddings for documents. Embeddings capture the semantic meaning of the text,
allowing you to quickly and efficiently find other pieces of a text that are similar. LangChain provides integrations with over 25
different embedding providers and methods, from open-source to proprietary API, allowing you to choose the one best suited
for your needs. LangChain provides a standard interface, allowing you to easily swap between models.

Vector stores

With the rise of embeddings, there has emerged a need for databases to support eﬃcient storage and searching of these
embeddings. LangChain provides integrations with over 50 diﬀerent vectorstores, from open-source local ones to cloud-
hosted proprietary ones, allowing you to choose the one best suited for your needs. LangChain exposes a standard interface,
allowing you to easily swap between vector stores.

Retrievers

Once the data is in the database, you still need to retrieve it. LangChain supports many diﬀerent retrieval algorithms and is
one of the places where we add the most value. LangChain supports basic methods that are easy to get started - namely
simple semantic search. However, we have also added a collection of algorithms on top of this to increase performance.
These include:

Parent Document Retriever: This allows you to create multiple embeddings per parent document, allowing you to look
up smaller chunks but return larger context.
Self Query Retriever: User questions often contain a reference to something that isn't just semantic but rather
expresses some logic that can best be represented as a metadata filter. Self-query allows you to parse out the semantic
part of a query from other metadata filters present in the query.
Ensemble Retriever: Sometimes you may want to retrieve documents from multiple different sources, or using multiple
different algorithms. The ensemble retriever allows you to easily do this.
And more!

Indexing

The LangChain Indexing API syncs your data from any source into a vector store, helping you:

Avoid writing duplicated content into the vector store

Avoid re-writing unchanged content
Avoid re-computing embeddings over unchanged content

All of which should save you time and money, as well as improve your vector search results.

Help us out by providing feedback on this documentation page:

Previous
« YAML parser
Next
Document loaders »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O PromptsQuick Start

On this page

Quick Start
Prompt templates are predeﬁned recipes for generating prompts for language models.

A template may include instructions, few-shot examples, and speciﬁc context and questions appropriate for a given task.

LangChain provides tooling to create and work with prompt templates.

LangChain strives to create model agnostic templates to make it easy to reuse existing templates across diﬀerent language
models.

Typically, language models expect the prompt to either be a string or else a list of chat messages.

PromptTemplate

Use PromptTemplate to create a template for a string prompt.

By default, PromptTemplate uses Python’s str.format syntax for templating.

from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
"Tell me a {adjective} joke about {content}."
)
prompt_template.format(adjective="funny", content="chickens")
'Tell me a funny joke about chickens.'

The template supports any number of variables, including no variables:

from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template("Tell me a joke")

prompt_template.format()
'Tell me a joke'

You can create custom prompt templates that format the prompt in any way you want. For more information, seePrompt
Template Composition.

ChatPromptTemplate

The prompt to chat models is a list of chat messages.

Each chat message is associated with content, and an additional parameter calledrole. For example, in the OpenAI Chat
Completions API, a chat message can be associated with an AI assistant, a human or a system role.

Create a chat prompt template like this:

from langchain_core.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful AI bot. Your name is {name}."),
("human", "Hello, how are you doing?"),
("ai", "I'm doing well, thanks!"),
("human", "{user_input}"),
]
)

messages = chat_template.format_messages(name="Bob", user_input="What is your name?")

ChatPromptTemplate.from_messages accepts a variety of message representations.

For example, in addition to using the 2-tuple representation of (type, content) used above, you could pass in an instance of
MessagePromptTemplate or BaseMessage.

from langchain.prompts import HumanMessagePromptTemplate

from langchain_core.messages import SystemMessage
from langchain_openai import ChatOpenAI

chat_template = ChatPromptTemplate.from_messages(
[
SystemMessage(
content=(
"You are a helpful assistant that re-writes the user's text to "
"sound more upbeat."
)
),
HumanMessagePromptTemplate.from_template("{text}"),
]
)
messages = chat_template.format_messages(text="I don't like eating tasty things")
print(messages)
[SystemMessage(content="You are a helpful assistant that re-writes the user's text to sound more upbeat."), HumanMessage(content="I don't like eating tasty things"

This provides you with a lot of ﬂexibility in how you construct your chat prompts.

LCEL

and ChatPromptTemplate implement the Runnable interface, the basic building block of theLangChain Expression
PromptTemplate
Language (LCEL). This means they support invoke , ainvoke, stream, astream, batch, abatch, astream_log calls.

PromptTemplateaccepts a dictionary (of the prompt variables) and returns aStringPromptValue. A ChatPromptTemplate accepts a
dictionary and returns a ChatPromptValue.

prompt_val = prompt_template.invoke({"adjective": "funny", "content": "chickens"})

prompt_val
StringPromptValue(text='Tell me a joke')
prompt_val.to_string()
'Tell me a joke'
prompt_val.to_messages()
[HumanMessage(content='Tell me a joke')]
chat_val = chat_template.invoke({"text": "i dont like eating tasty things."})
chat_val.to_messages()
[SystemMessage(content="You are a helpful assistant that re-writes the user's text to sound more upbeat."),
HumanMessage(content='i dont like eating tasty things.')]
chat_val.to_string()
"System: You are a helpful assistant that re-writes the user's text to sound more upbeat.\nHuman: i dont like eating tasty things."

Help us out by providing feedback on this documentation page:

Previous
« Prompts
Next
Composition »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Memory in
ModulesMoreMemoryAgent

Memory in Agent
This notebook goes over adding memory to an Agent. Before going through this notebook, please walkthrough the following
notebooks, as this will build on top of both of them:

Memory in LLMChain
Custom Agents

In order to add a memory to an agent we are going to perform the following steps:

1. We are going to create an LLMChain with memory.

2. We are going to use that LLMChain to create a custom Agent.

For the purposes of this exercise, we are going to create a simple custom Agent that has access to a search tool and utilizes
the ConversationBuﬀerMemory class.

from langchain.agents import AgentExecutor, Tool, ZeroShotAgent

from langchain.chains import LLMChain
from langchain.memory import ConversationBuﬀerMemory
from langchain_community.utilities import GoogleSearchAPIWrapper
from langchain_openai import OpenAI
search = GoogleSearchAPIWrapper()
tools = [
Tool(
name="Search",
func=search.run,
description="useful for when you need to answer questions about current events",
)
]

Notice the usage of the chat_history variable in the PromptTemplate, which matches up with the dynamic key name in the
ConversationBuﬀerMemory.

preﬁx = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suﬃx = """Begin!"

{chat_history}
Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
tools,
prefix=prefix,
suffix=suffix,
input_variables=["input", "chat_history", "agent_scratchpad"],
)
memory = ConversationBufferMemory(memory_key="chat_history")

We can now construct the LLMChain, with the Memory object, and then create the agent.

llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)

agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_chain = AgentExecutor.from_agent_and_tools(
agent=agent, tools=tools, verbose=True, memory=memory
)
agent_chain.run(input="How many people live in canada?")
> Entering new AgentExecutor chain...
Thought: I need to ﬁnd out the population of Canada
Action: Search
Action Input: Population of Canada
Observation: The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations dat
Thought: I now know the ﬁnal answer
Final Answer: The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations da
> Finished AgentExecutor chain.

'The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations data.'

To test the memory of this agent, we can ask a followup question that relies on information in the previous exchange to be
answered correctly.

agent_chain.run(input="what is their national anthem called?")

> Entering new AgentExecutor chain...

'The national anthem of Canada is called "O Canada".'

We can see that the agent remembered that the previous question was about Canada, and properly asked Google Search
what the name of Canada’s national anthem was.

For fun, let’s compare this to an agent that does NOT have memory.

preﬁx = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suﬃx = """Begin!"

Question: {input}
{agent_scratchpad}"""

> Entering new AgentExecutor chain...

'The national anthem of [country] is [name of anthem].'

Help us out by providing feedback on this documentation page:

Previous
« Memory in the Multi-Input Chain
Next
Message Memory in Agent backed by a database »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How Stream custom generator
Language to functions

On this page

Stream custom generator functions

You can use generator functions (ie. functions that use theyield keyword, and behave like iterators) in a LCEL pipeline.

The signature of these generators should be Iterator[Input] -> Iterator[Output]. Or for async generators: AsyncIterator[Input] ->
AsyncIterator[Output].

These are useful for: - implementing a custom output parser - modifying the output of a previous step, while preserving
streaming capabilities

Let’s implement a custom output parser for comma-separated lists.

Sync version

%pip install --upgrade --quiet langchain langchain-openai

from typing import Iterator, List

from langchain.prompts.chat import ChatPromptTemplate

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
"Write a comma-separated list of 5 animals similar to: {animal}"
)
model = ChatOpenAI(temperature=0.0)

str_chain = prompt | model | StrOutputParser()

for chunk in str_chain.stream({"animal": "bear"}):
print(chunk, end="", flush=True)
lion, tiger, wolf, gorilla, panda
str_chain.invoke({"animal": "bear"})
'lion, tiger, wolf, gorilla, panda'
# This is a custom parser that splits an iterator of llm tokens
# into a list of strings separated by commas
def split_into_list(input: Iterator[str]) -> Iterator[List[str]]:
# hold partial input until we get a comma
buffer = ""
for chunk in input:
# add current chunk to buffer
buffer += chunk
# while there are commas in the buffer
while "," in buffer:
# split buffer on comma
comma_index = buffer.index(",")
# yield everything before the comma
yield [buffer[:comma_index].strip()]
# save the rest for the next iteration
buffer = buffer[comma_index + 1 :]
# yield the last chunk
yield [buffer.strip()]
list_chain = str_chain | split_into_list
for chunk in list_chain.stream({"animal": "bear"}):
print(chunk, flush=True)
['lion']
['tiger']
['wolf']
['gorilla']
['panda']
list_chain.invoke({"animal": "bear"})
['lion', 'tiger', 'wolf', 'gorilla', 'panda']
Async version

from typing import AsyncIterator

async def asplit_into_list(

input: AsyncIterator[str],
) -> AsyncIterator[List[str]]: # async def
buffer = ""
async for (
chunk
) in input: # ìnput` is a àsync_generator` object, so use àsync for`
buffer += chunk
while "," in buffer:
comma_index = buffer.index(",")
yield [buffer[:comma_index].strip()]
buffer = buffer[comma_index + 1 :]
yield [buffer.strip()]

list_chain = str_chain | asplit_into_list

async for chunk in list_chain.astream({"animal": "bear"}):
print(chunk, ﬂush=True)
['lion']
['tiger']
['wolf']
['gorilla']
['panda']
await list_chain.ainvoke({"animal": "bear"})
['lion', 'tiger', 'wolf', 'gorilla', 'panda']

Help us out by providing feedback on this documentation page:

Previous
« Add fallbacks
Next
Inspect your runnables »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Long-Context
ModulesRetrievalRetrieversReorder

Long-Context Reorder
No matter the architecture of your model, there is a substantial performance degradation when you include 10+ retrieved
documents. In brief: When models must access relevant information in the middle of long contexts, they tend to ignore the
provided documents. See: https://arxiv.org/abs/2307.03172

To avoid this issue you can re-order documents after retrieval to avoid performance degradation.

%pip install --upgrade --quiet sentence-transformers > /dev/null

from langchain.chains import LLMChain, StuﬀDocumentsChain
from langchain.prompts import PromptTemplate
from langchain_community.document_transformers import (
LongContextReorder,
)
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAI

# Get embeddings.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

texts = [
"Basquetball is a great sport.",
"Fly me to the moon is one of my favourite songs.",
"The Celtics are my favourite team.",
"This is a document about the Boston Celtics",
"I simply love going to the movies",
"The Boston Celtics won the game by 20 points",
"This is just a random text.",
"Elden Ring is one of the best games in the last 15 years.",
"L. Kornet is one of the best Celtics players.",
"Larry Bird was an iconic NBA player.",
]

# Create a retriever
retriever = Chroma.from_texts(texts, embedding=embeddings).as_retriever(
search_kwargs={"k": 10}
)
query = "What can you tell me about the Celtics?"

# Get relevant documents ordered by relevance score

docs = retriever.get_relevant_documents(query)
docs
[Document(page_content='This is a document about the Boston Celtics'),
Document(page_content='The Celtics are my favourite team.'),
Document(page_content='L. Kornet is one of the best Celtics players.'),
Document(page_content='The Boston Celtics won the game by 20 points'),
Document(page_content='Larry Bird was an iconic NBA player.'),
Document(page_content='Elden Ring is one of the best games in the last 15 years.'),
Document(page_content='Basquetball is a great sport.'),
Document(page_content='I simply love going to the movies'),
Document(page_content='Fly me to the moon is one of my favourite songs.'),
Document(page_content='This is just a random text.')]
# Reorder the documents:
# Less relevant document will be at the middle of the list and more
# relevant elements at beginning / end.
reordering = LongContextReorder()
reordered_docs = reordering.transform_documents(docs)

# Conﬁrm that the 4 relevant documents are at beginning and end.

reordered_docs
[Document(page_content='The Celtics are my favourite team.'),
Document(page_content='The Boston Celtics won the game by 20 points'),
Document(page_content='Elden Ring is one of the best games in the last 15 years.'),
Document(page_content='I simply love going to the movies'),
Document(page_content='This is just a random text.'),
Document(page_content='Fly me to the moon is one of my favourite songs.'),
Document(page_content='Basquetball is a great sport.'),
Document(page_content='Larry Bird was an iconic NBA player.'),
Document(page_content='L. Kornet is one of the best Celtics players.'),
Document(page_content='This is a document about the Boston Celtics')]
# We prepare and run a custom Stuﬀ chain with reordered docs as context.

# Override prompts
document_prompt = PromptTemplate(
input_variables=["page_content"], template="{page_content}"
)
document_variable_name = "context"
llm = OpenAI()
stuﬀ_prompt_override = """Given this text extracts:
-----
{context}
-----
Please answer the following question:
{query}"""
prompt = PromptTemplate(
template=stuﬀ_prompt_override, input_variables=["context", "query"]
)

# Instantiate the chain

llm_chain = LLMChain(llm=llm, prompt=prompt)
chain = StuﬀDocumentsChain(
llm_chain=llm_chain,
document_prompt=document_prompt,
document_variable_name=document_variable_name,
)
chain.run(input_documents=reordered_docs, query=query)
'\n\nThe Celtics are referenced in four of the nine text extracts. They are mentioned as the favorite team of the author, the winner of a basketball game, a team with o

Help us out by providing feedback on this documentation page:

Previous
« Ensemble Retriever
Next
MultiVector Retriever »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression Multiple
Language Cookbookchains

On this page

Multiple chains
Runnables can easily be used to string together multiple Chains

%pip install –upgrade –quiet langchain langchain-openai

from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt1 = ChatPromptTemplate.from_template("what is the city {person} is from?")

prompt2 = ChatPromptTemplate.from_template(
"what country is the city {city} in? respond in {language}"
)

model = ChatOpenAI()

chain1 = prompt1 | model | StrOutputParser()

chain2 = (
{"city": chain1, "language": itemgetter("language")}
| prompt2
| model
| StrOutputParser()
)

chain2.invoke({"person": "obama", "language": "spanish"})

'El país donde se encuentra la ciudad de Honolulu, donde nació Barack Obama, el 44º Presidente de los Estados Unidos, es Estados Unidos. Honolulu se encuentra

from langchain_core.runnables import RunnablePassthrough

prompt1 = ChatPromptTemplate.from_template(
"generate a {attribute} color. Return the name of the color and nothing else:"
)
prompt2 = ChatPromptTemplate.from_template(
"what is a fruit of color: {color}. Return the name of the fruit and nothing else:"
)
prompt3 = ChatPromptTemplate.from_template(
"what is a country with a ﬂag that has the color: {color}. Return the name of the country and nothing else:"
)
prompt4 = ChatPromptTemplate.from_template(
"What is the color of {fruit} and the ﬂag of {country}?"
)

model_parser = model | StrOutputParser()

color_generator = (
{"attribute": RunnablePassthrough()} | prompt1 | {"color": model_parser}
)
color_to_fruit = prompt2 | model_parser
color_to_country = prompt3 | model_parser
question_generator = (
color_generator | {"fruit": color_to_fruit, "country": color_to_country} | prompt4
)
question_generator.invoke("warm")
ChatPromptValue(messages=[HumanMessage(content='What is the color of strawberry and the ﬂag of China?', additional_kwargs={}, example=False)])
prompt = question_generator.invoke("warm")
model.invoke(prompt)
AIMessage(content='The color of an apple is typically red or green. The ﬂag of China is predominantly red with a large yellow star in the upper left corner and four sm
Branching and Merging

You may want the output of one component to be processed by 2 or more other components.RunnableParallels let you split
or fork the chain so multiple components can process the input in parallel. Later, other components can join or merge the
results to synthesize a ﬁnal response. This type of chain creates a computation graph that looks like the following:

Input
/\
/ \
Branch1 Branch2
\ /
\/
Combine
planner = (
ChatPromptTemplate.from_template("Generate an argument about: {input}")
| ChatOpenAI()
| StrOutputParser()
| {"base_response": RunnablePassthrough()}
)

arguments_for = (
ChatPromptTemplate.from_template(
"List the pros or positive aspects of {base_response}"
)
| ChatOpenAI()
| StrOutputParser()
)
arguments_against = (
ChatPromptTemplate.from_template(
"List the cons or negative aspects of {base_response}"
)
| ChatOpenAI()
| StrOutputParser()
)

ﬁnal_responder = (
ChatPromptTemplate.from_messages(
[
("ai", "{original_response}"),
("human", "Pros:\n{results_1}\n\nCons:\n{results_2}"),
("system", "Generate a ﬁnal response given the critique"),
]
)
| ChatOpenAI()
| StrOutputParser()
)

chain = (
planner
|{
"results_1": arguments_for,
"results_2": arguments_against,
"original_response": itemgetter("base_response"),
}
| ﬁnal_responder
)
chain.invoke({"input": "scrum"})
'While Scrum has its potential cons and challenges, many organizations have successfully embraced and implemented this project management framework to great e

Help us out by providing feedback on this documentation page:

Previous
« RAG
Next
Querying a SQL DB »
Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Ensemble
ModulesRetrievalRetrieversRetriever

On this page

Ensemble Retriever
The EnsembleRetriever takes a list of retrievers as input and ensemble the results of theirget_relevant_documents() methods and
rerank the results based on the Reciprocal Rank Fusion algorithm.

By leveraging the strengths of diﬀerent algorithms, the EnsembleRetriever can achieve better performance than any single
algorithm.

The most common pattern is to combine a sparse retriever (like BM25) with a dense retriever (like embedding similarity),
because their strengths are complementary. It is also known as “hybrid search”. The sparse retriever is good at ﬁnding
relevant documents based on keywords, while the dense retriever is good at ﬁnding relevant documents based on semantic
similarity.

%pip install --upgrade --quiet rank_bm25 > /dev/null

from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
doc_list_1 = [
"I like apples",
"I like oranges",
"Apples and oranges are fruits",
]

# initialize the bm25 retriever and faiss retriever

bm25_retriever = BM25Retriever.from_texts(
doc_list_1, metadatas=[{"source": 1}] * len(doc_list_1)
)
bm25_retriever.k = 2

doc_list_2 = [
"You like apples",
"You like oranges",
]

embedding = OpenAIEmbeddings()
faiss_vectorstore = FAISS.from_texts(
doc_list_2, embedding, metadatas=[{"source": 2}] * len(doc_list_2)
)
faiss_retriever = faiss_vectorstore.as_retriever(search_kwargs={"k": 2})

# initialize the ensemble retriever

ensemble_retriever = EnsembleRetriever(
retrievers=[bm25_retriever, faiss_retriever], weights=[0.5, 0.5]
)
docs = ensemble_retriever.invoke("apples")
docs
[Document(page_content='You like apples', metadata={'source': 2}),
Document(page_content='I like apples', metadata={'source': 1}),
Document(page_content='You like oranges', metadata={'source': 2}),
Document(page_content='Apples and oranges are fruits', metadata={'source': 1})]

Runtime Conﬁguration

We can also configure the retrievers at runtime. In order to do this, we need to mark the fields as configurable

from langchain_core.runnables import ConﬁgurableField

faiss_retriever = faiss_vectorstore.as_retriever(
search_kwargs={"k": 2}
).configurable_fields(
search_kwargs=ConfigurableField(
id="search_kwargs_faiss",
name="Search Kwargs",
description="The search kwargs to use",
)
)
ensemble_retriever = EnsembleRetriever(
retrievers=[bm25_retriever, faiss_retriever], weights=[0.5, 0.5]
)
config = {"configurable": {"search_kwargs_faiss": {"k": 1}}}
docs = ensemble_retriever.invoke("apples", config=config)
docs

Notice that this only returns one source from the FAISS retriever, because we pass in the relevant conﬁguration at run time

Help us out by providing feedback on this documentation page:

Previous
« Contextual compression
Next
Long-Context Reorder »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesAgentsAgent TypesXML Agent

On this page

XML Agent
Some language models (like Anthropic’s Claude) are particularly good at reasoning/writing XML. This goes over how to use
an agent that uses XML when prompting.

Use with regular LLMs, not with chat models.

Use only with unstructured tools; i.e., tools that accept a single string input.
See AgentTypes documentation for more agent types.

from langchain import hub

from langchain.agents import AgentExecutor, create_xml_agent
from langchain_community.chat_models import ChatAnthropic
from langchain_community.tools.tavily_search import TavilySearchResults

Initialize Tools

We will initialize the tools we want to use

tools = [TavilySearchResults(max_results=1)]

Create Agent

# Get the prompt to use - you can modify this!

prompt = hub.pull("hwchase17/xml-agent-convo")
# Choose the LLM that will drive the agent
llm = ChatAnthropic(model="claude-2")

# Construct the XML agent

agent = create_xml_agent(llm, tools, prompt)

Run Agent

# Create an agent executor by passing in the agent and tools

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "what is LangChain?"})

> Entering new AgentExecutor chain...

<tool>tavily_search_results_json</tool><tool_input>what is LangChain?[{'url': 'https://aws.amazon.com/what-is/langchain/', 'content': 'What Is LangChain? What is L

> Finished chain.

{'input': 'what is LangChain?',

'output': 'LangChain is an open source framework for building applications based on large language models (LLMs). It allows developers to leverage the power of LL

Using with chat history

from langchain_core.messages import AIMessage, HumanMessage

agent_executor.invoke(
{
"input": "what's my name? Only use a tool if needed, otherwise respond with Final Answer",
# Notice that chat_history is a string, since this prompt is aimed at LLMs, not chat models
"chat_history": "Human: Hi! My name is Bob\nAI: Hello Bob! Nice to meet you",
}
)

> Entering new AgentExecutor chain...

<ﬁnal_answer>Your name is Bob.</ﬁnal_answer>

Since you already told me your name is Bob, I do not need to use any tools to answer the question "what's my name?". I can provide the ﬁnal answer directly that you

> Finished chain.

{'input': "what's my name? Only use a tool if needed, otherwise respond with Final Answer",
'chat_history': 'Human: Hi! My name is Bob\nAI: Hello Bob! Nice to meet you',
'output': 'Your name is Bob.'}

Help us out by providing feedback on this documentation page:

Previous
« OpenAI tools
Next
JSON Chat Agent »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Vector
ModulesRetrievalstores

On this page

Vector stores
INFO

Head to Integrations for documentation on built-in integrations with 3rd-party vector stores.

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding
vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to
the embedded query. A vector store takes care of storing embedded data and performing vector search for you.

Get started

This walkthrough showcases basic functionality related to vector stores. A key part of working with vector stores is creating
the vector to put in them, which is usually created via embeddings. Therefore, it is recommended that you familiarize yourself
with the text embedding model interfaces before diving into this.

There are many great vector store options, here are a few that are free, open-source, and run entirely on your local machine.
Review all integrations for many great hosted oﬀerings.

Chroma
FAISS
Lance

This walkthrough uses the chroma vector database, which runs on your local machine as a library.

pip install chromadb

We want to use OpenAIEmbeddings so we have to get the OpenAI API Key.

import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')

from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.vectorstores import Chroma

# Load the document, split it into chunks, embed each chunk and load it into the vector store.
raw_documents = TextLoader('../../../state_of_the_union.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
db = Chroma.from_documents(documents, OpenAIEmbeddings())

Similarity search
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans c

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Ju

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice B

Similarity search by vector

It is also possible to do a search for documents similar to a given embedding vector usingsimilarity_search_by_vector which
accepts an embedding vector as a parameter instead of a string.

embedding_vector = OpenAIEmbeddings().embed_query(query)
docs = db.similarity_search_by_vector(embedding_vector)
print(docs[0].page_content)

The query is the same, and so the result is also the same.

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans c

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Ju

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice B

Asynchronous operations

Vector stores are usually run as a separate service that requires some IO operations, and therefore they might be called
asynchronously. That gives performance beneﬁts as you don't waste time waiting for responses from external services. That
might also be important if you work with an asynchronous framework, such as FastAPI.

LangChain supports async operation on vector stores. All the methods might be called using their async counterparts, with
the preﬁx a, meaning async.

Qdrant is a vector store, which supports all the async operations, thus it will be used in this walkthrough.

pip install qdrant-client

from langchain_community.vectorstores import Qdrant

Create a vector store asynchronously

db = await Qdrant.afrom_documents(documents, embeddings, "http://localhost:6333")

Similarity search
query = "What did the president say about Ketanji Brown Jackson"
docs = await db.asimilarity_search(query)
print(docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans c

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Ju

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice B

Similarity search by vector

embedding_vector = embeddings.embed_query(query)
docs = await db.asimilarity_search_by_vector(embedding_vector)

Maximum marginal relevance search (MMR)

Maximal marginal relevance optimizes for similarity to query and diversity among selected documents. It is also supported in
async API.

query = "What did the president say about Ketanji Brown Jackson"
found_docs = await qdrant.amax_marginal_relevance_search(query, k=2, fetch_k=10)
for i, doc in enumerate(found_docs):
print(f"{i + 1}.", doc.page_content, "\n")
1. Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans c

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Bre

2. We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together.

I recently visited the New York City Police Department days after the funerals of Oﬃcer Wilbert Mora and his partner, Oﬃcer Jason Rivera.

They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun.

Oﬃcer Mora was 27 years old.

Oﬃcer Rivera was 22.

Both Dominican Americans who’d grown up on the same streets they later chose to patrol as police oﬃcers.

I spoke with their families and told them that we are forever in debt for their sacriﬁce, and we will carry on their mission to restore the trust and safety every communit

I’ve worked on these issues a long time.

I know what works: Investing in crime prevention and community police oﬃcers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and sa

Help us out by providing feedback on this documentation page:

Previous
« CacheBackedEmbeddings
Next
Retrievers »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Custom
ModulesMoreMemoryMemory

Custom Memory
Although there are a few predeﬁned types of memory in LangChain, it is highly possible you will want to add your own type of
memory that is optimal for your application. This notebook covers how to do that.

For this notebook, we will add a custom memory type toConversationChain. In order to add a custom memory class, we need to
import the base memory class and subclass it.

from typing import Any, Dict, List

from langchain.chains import ConversationChain

from langchain.schema import BaseMemory
from langchain_openai import OpenAI
from pydantic import BaseModel

In this example, we will write a custom memory class that uses spaCy to extract entities and save information about them in
a simple hash table. Then, during the conversation, we will look at the input text, extract any entities, and put any information
about them into the context.

Please note that this implementation is pretty simple and brittle and probably not useful in a production setting. Its
purpose is to showcase that you can add custom memory implementations.

For this, we will need spaCy.

%pip install --upgrade --quiet spacy

# !python -m spacy download en_core_web_lg
import spacy

nlp = spacy.load("en_core_web_lg")
class SpacyEntityMemory(BaseMemory, BaseModel):
"""Memory class for storing information about entities."""

# Deﬁne dictionary to store information about entities.

entities: dict = {}
# Deﬁne key to pass information about entities into prompt.
memory_key: str = "entities"

def clear(self):
self.entities = {}

@property
def memory_variables(self) -> List[str]:
"""Deﬁne the variables we are providing to the prompt."""
return [self.memory_key]

def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, str]:

"""Load the memory variables, in this case the entity key."""
# Get the input text and run through spaCy
doc = nlp(inputs[list(inputs.keys())[0]])
# Extract known information about entities, if they exist.
entities = [
self.entities[str(ent)] for ent in doc.ents if str(ent) in self.entities
]
# Return combined information about entities to put into context.
return {self.memory_key: "\n".join(entities)}

def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
"""Save context from this conversation to buﬀer."""
# Get the input text and run through spaCy
text = inputs[list(inputs.keys())[0]]
doc = nlp(text)
# For each entity that was mentioned, save this information to the dictionary.
for ent in doc.ents:
ent_str = str(ent)
if ent_str in self.entities:
self.entities[ent_str] += f"\n{text}"
else:
self.entities[ent_str] = text

We now deﬁne a prompt that takes in information about entities as well as user input.

from langchain.prompts.prompt import PromptTemplate

template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI doe

Relevant entity information:

{entities}

Conversation:
Human: {input}
AI:"""
prompt = PromptTemplate(input_variables=["entities", "input"], template=template)

And now we put it all together!

llm = OpenAI(temperature=0)
conversation = ConversationChain(
llm=llm, prompt=prompt, verbose=True, memory=SpacyEntityMemory()
)

In the ﬁrst example, with no prior knowledge about Harrison, the “Relevant entity information” section is empty.

conversation.predict(input="Harrison likes machine learning")

> Entering new ConversationChain chain...

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know th

Relevant entity information:

Conversation:
Human: Harrison likes machine learning
AI:

> Finished ConversationChain chain.

" That's great to hear! Machine learning is a fascinating ﬁeld of study. It involves using algorithms to analyze data and make predictions. Have you ever studied mach

Now in the second example, we can see that it pulls in information about Harrison.

conversation.predict(
input="What do you think Harrison's favorite subject in college was?"
)

> Entering new ConversationChain chain...

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know th

Relevant entity information:

Harrison likes machine learning

Conversation:
Human: What do you think Harrison's favorite subject in college was?
AI:

> Finished ConversationChain chain.

' From what I know about Harrison, I believe his favorite subject in college was machine learning. He has expressed a strong interest in the subject and has mentione

Again, please note that this implementation is pretty simple and brittle and probably not useful in a production setting. Its
purpose is to showcase that you can add custom memory implementations.

Help us out by providing feedback on this documentation page:

Previous
« Customizing Conversational Memory
Next
Multiple Memory classes »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How Conﬁgure chain internals at
Language to runtime

On this page

Conﬁgure chain internals at runtime

Oftentimes you may want to experiment with, or even expose to the end user, multiple diﬀerent ways of doing things. In order
to make this experience as easy as possible, we have deﬁned two methods.

First, a configurable_fields method. This lets you configure particular fields of a runnable.

Second, a conﬁgurable_alternatives method. With this method, you can list out alternatives for any particular runnable that can be
set during runtime.

Conﬁguration Fields

With LLMs

With LLMs we can conﬁgure things like temperature

%pip install --upgrade --quiet langchain langchain-openai

from langchain.prompts import PromptTemplate
from langchain_core.runnables import ConﬁgurableField
from langchain_openai import ChatOpenAI

model = ChatOpenAI(temperature=0).configurable_fields(
temperature=ConfigurableField(
id="llm_temperature",
name="LLM Temperature",
description="The temperature of the LLM",
)
)
model.invoke("pick a random number")
AIMessage(content='7')
model.with_config(configurable={"llm_temperature": 0.9}).invoke("pick a random number")
AIMessage(content='34')

We can also do this when its used as part of a chain

prompt = PromptTemplate.from_template("Pick a random number above {x}")

chain = prompt | model
chain.invoke({"x": 0})
AIMessage(content='57')
chain.with_conﬁg(conﬁgurable={"llm_temperature": 0.9}).invoke({"x": 0})
AIMessage(content='6')

With HubRunnables

This is useful to allow for switching of prompts

from langchain.runnables.hub import HubRunnable

prompt = HubRunnable("rlm/rag-prompt").configurable_fields(
owner_repo_commit=ConfigurableField(
id="hub_commit",
name="Hub Commit",
description="The Hub commit to pull from",
)
)
prompt.invoke({"question": "foo", "context": "bar"})
ChatPromptValue(messages=[HumanMessage(content="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer t
prompt.with_config(configurable={"hub_commit": "rlm/rag-prompt-llama"}).invoke(
{"question": "foo", "context": "bar"}
)
ChatPromptValue(messages=[HumanMessage(content="[INST]<<SYS>> You are an assistant for question-answering tasks. Use the following pieces of retrieved co

Conﬁgurable Alternatives

With LLMs

Let’s take a look at doing this with LLMs

from langchain.prompts import PromptTemplate

from langchain_community.chat_models import ChatAnthropic
from langchain_core.runnables import ConfigurableField
from langchain_openai import ChatOpenAI
llm = ChatAnthropic(temperature=0).configurable_alternatives(
# This gives this field an id
# When configuring the end runnable, we can then use this id to configure this field
ConfigurableField(id="llm"),
# This sets a default_key.
# If we specify this key, the default LLM (ChatAnthropic initialized above) will be used
default_key="anthropic",
# This adds a new option, with name òpenai` that is equal to `ChatOpenAI()`
openai=ChatOpenAI(),
# This adds a new option, with name `gpt4` that is equal to `ChatOpenAI(model="gpt-4")`
gpt4=ChatOpenAI(model="gpt-4"),
# You can add more configuration options here
)
prompt = PromptTemplate.from_template("Tell me a joke about {topic}")
chain = prompt | llm
# By default it will call Anthropic
chain.invoke({"topic": "bears"})
AIMessage(content=" Here's a silly joke about bears:\n\nWhat do you call a bear with no teeth?\nA gummy bear!")
# We can use `.with_config(configurable={"llm": "openai"})` to specify an llm to use
chain.with_config(configurable={"llm": "openai"}).invoke({"topic": "bears"})
AIMessage(content="Sure, here's a bear joke for you:\n\nWhy don't bears wear shoes?\n\nBecause they already have bear feet!")
# If we use the `default_key` then it uses the default
chain.with_config(configurable={"llm": "anthropic"}).invoke({"topic": "bears"})
AIMessage(content=" Here's a silly joke about bears:\n\nWhat do you call a bear with no teeth?\nA gummy bear!")

With Prompts

We can do a similar thing, but alternate between prompts

llm = ChatAnthropic(temperature=0)
prompt = PromptTemplate.from_template(
"Tell me a joke about {topic}"
).configurable_alternatives(
# This gives this field an id
# When configuring the end runnable, we can then use this id to configure this field
ConfigurableField(id="prompt"),
# This sets a default_key.
# If we specify this key, the default LLM (ChatAnthropic initialized above) will be used
default_key="joke",
# This adds a new option, with name `poem`
poem=PromptTemplate.from_template("Write a short poem about {topic}"),
# You can add more configuration options here
)
chain = prompt | llm
# By default it will write a joke
chain.invoke({"topic": "bears"})
AIMessage(content=" Here's a silly joke about bears:\n\nWhat do you call a bear with no teeth?\nA gummy bear!")
# We can configure it write a poem
chain.with_config(configurable={"prompt": "poem"}).invoke({"topic": "bears"})
AIMessage(content=' Here is a short poem about bears:\n\nThe bears awaken from their sleep\nAnd lumber out into the deep\nForests filled with trees so tall\nForag

With Prompts and LLMs

We can also have multiple things configurable! Here’s an example doing that with both prompts and LLMs.
llm = ChatAnthropic(temperature=0).configurable_alternatives(
# This gives this field an id
# When configuring the end runnable, we can then use this id to configure this field
ConfigurableField(id="llm"),
# This sets a default_key.
# If we specify this key, the default LLM (ChatAnthropic initialized above) will be used
default_key="anthropic",
# This adds a new option, with name òpenai` that is equal to `ChatOpenAI()`
openai=ChatOpenAI(),
# This adds a new option, with name `gpt4` that is equal to `ChatOpenAI(model="gpt-4")`
gpt4=ChatOpenAI(model="gpt-4"),
# You can add more configuration options here
)
prompt = PromptTemplate.from_template(
"Tell me a joke about {topic}"
).configurable_alternatives(
# This gives this field an id
# When configuring the end runnable, we can then use this id to configure this field
ConfigurableField(id="prompt"),
# This sets a default_key.
# If we specify this key, the default LLM (ChatAnthropic initialized above) will be used
default_key="joke",
# This adds a new option, with name `poem`
poem=PromptTemplate.from_template("Write a short poem about {topic}"),
# You can add more configuration options here
)
chain = prompt | llm
# We can configure it write a poem with OpenAI
chain.with_config(configurable={"prompt": "poem", "llm": "openai"}).invoke(
{"topic": "bears"}
)
AIMessage(content="In the forest, where tall trees sway,\nA creature roams, both fierce and gray.\nWith mighty paws and piercing eyes,\nThe bear, a symbol of stren

# We can always just conﬁgure only one if we want

chain.with_conﬁg(conﬁgurable={"llm": "openai"}).invoke({"topic": "bears"})
AIMessage(content="Sure, here's a bear joke for you:\n\nWhy don't bears wear shoes?\n\nBecause they have bear feet!")

Saving conﬁgurations

We can also easily save conﬁgured chains as their own objects

openai_joke = chain.with_conﬁg(conﬁgurable={"llm": "openai"})

openai_joke.invoke({"topic": "bears"})
AIMessage(content="Why don't bears wear shoes?\n\nBecause they have bear feet!")

Help us out by providing feedback on this documentation page:

Previous
« Bind runtime args
Next
Create a runnable with the `@chain` decorator »

Community

Discord
Twitter
GitHub

Python
JS/TS
More
Homepage
Blog
YouTube
Text embedding
ModulesRetrievalmodels CacheBackedEmbeddings

On this page

CacheBackedEmbeddings
sidebar_label: Caching

Embeddings can be stored or temporarily cached to avoid needing to recompute them.

Caching embeddings can be done using a CacheBackedEmbeddings. The cache backed embedder is a wrapper around an
embedder that caches embeddings in a key-value store. The text is hashed and the hash is used as the key in the cache.

The main supported way to initialized a CacheBackedEmbeddings is from_bytes_store. This takes in the following parameters:

underlying_embedder: The embedder to use for embedding.

document_embedding_cache: Any ByteStore for caching document embeddings.
namespace: (optional, defaults to "") The namespace to use for document cache. This namespace is used to avoid
collisions with other caches. For example, set it to the name of the embedding model used.

Attention: Be sure to set the namespace parameter to avoid collisions of the same text embedded using diﬀerent embeddings
models.

from langchain.embeddings import CacheBackedEmbeddings

Using with a Vector Store

First, let’s see an example that uses the local ﬁle system for storing embeddings and uses FAISS vector store for retrieval.

%pip install --upgrade --quiet langchain-openai faiss-cpu

from langchain.storage import LocalFileStore
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

underlying_embeddings = OpenAIEmbeddings()

store = LocalFileStore("./cache/")

cached_embedder = CacheBackedEmbeddings.from_bytes_store(
underlying_embeddings, store, namespace=underlying_embeddings.model
)

The cache is empty prior to embedding:

list(store.yield_keys())
[]

Load the document, split it into chunks, embed each chunk and load it into the vector store.

raw_documents = TextLoader("../../state_of_the_union.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

Create the vector store:

%%time
db = FAISS.from_documents(documents, cached_embedder)
CPU times: user 218 ms, sys: 29.7 ms, total: 248 ms
Wall time: 1.02 s
If we try to create the vector store again, it’ll be much faster since it does not need to re-compute any embeddings.

%%time
db2 = FAISS.from_documents(documents, cached_embedder)
CPU times: user 15.7 ms, sys: 2.22 ms, total: 18 ms
Wall time: 17.2 ms

And here are some of the embeddings that got created:

list(store.yield_keys())[:5]
['text-embedding-ada-00217a6727d-8916-54eb-b196-ec9c9d6ca472',
'text-embedding-ada-0025fc0d904-bd80-52da-95c9-441015bfb438',
'text-embedding-ada-002e4ad20ef-dfaa-5916-9459-f90c6d8e8159',
'text-embedding-ada-002ed199159-c1cd-5597-9757-f80498e8f17b',
'text-embedding-ada-0021297d37a-2bc1-5e19-bf13-6c950f075062']

Swapping the ByteStore

In order to use a diﬀerent ByteStore, just use it when creating your CacheBackedEmbeddings. Below, we create an equivalent
cached embeddings object, except using the non-persistent InMemoryByteStore instead:

from langchain.embeddings import CacheBackedEmbeddings

from langchain.storage import InMemoryByteStore

store = InMemoryByteStore()

cached_embedder = CacheBackedEmbeddings.from_bytes_store(
underlying_embeddings, store, namespace=underlying_embeddings.model
)

Help us out by providing feedback on this documentation page:

Previous
« Text embedding models
Next
Vector stores »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Get startedSecurity

On this page

Security
LangChain has a large ecosystem of integrations with various external resources like local and remote ﬁle systems, APIs and
databases. These integrations allow developers to create versatile applications that combine the power of LLMs with the
ability to access, interact with and manipulate external resources.

Best Practices

When building such applications developers should remember to follow good security practices:

Limit Permissions: Scope permissions speciﬁcally to the application's need. Granting broad or excessive permissions
can introduce signiﬁcant security vulnerabilities. To avoid such vulnerabilities, consider using read-only credentials,
disallowing access to sensitive resources, using sandboxing techniques (such as running inside a container), etc. as
appropriate for your application.
Anticipate Potential Misuse: Just as humans can err, so can Large Language Models (LLMs). Always assume that
any system access or credentials may be used in any way allowed by the permissions they are assigned. For example,
if a pair of database credentials allows deleting data, it’s safest to assume that any LLM able to use those credentials
may in fact delete data.
Defense in Depth: No security technique is perfect. Fine-tuning and good chain design can reduce, but not eliminate,
the odds that a Large Language Model (LLM) may make a mistake. It’s best to combine multiple layered security
approaches rather than relying on any single layer of defense to ensure security. For example: use both read-only
permissions and sandboxing to ensure that LLMs are only able to access data that is explicitly meant for them to use.

Risks of not doing so include, but are not limited to:

Data corruption or loss.

Unauthorized access to conﬁdential information.
Compromised performance or availability of critical resources.

Example scenarios with mitigation strategies:

A user may ask an agent with access to the file system to delete files that should not be deleted or read the content of
files that contain sensitive information. To mitigate, limit the agent to only use a specific directory and only allow it to
read or write files that are safe to read or write. Consider further sandboxing the agent by running it in a container.
A user may ask an agent with write access to an external API to write malicious data to the API, or delete data from that
API. To mitigate, give the agent read-only API keys, or limit it to only use endpoints that are already resistant to such
misuse.
A user may ask an agent with access to a database to drop a table or mutate the schema. To mitigate, scope the
credentials to only the tables that the agent needs to access and consider issuing READ-ONLY credentials.

If you're building applications that access external resources like ﬁle systems, APIs or databases, consider speaking with your
company's security team to determine how to best design and secure your applications.

Reporting a Vulnerability

Please report security vulnerabilities by email to [email protected]. This will ensure the issue is promptly triaged and
acted upon as needed.

Help us out by providing feedback on this documentation page:

Previous
« Quickstart
Next
LangChain Expression Language (LCEL) »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model Get log
ModulesI/O Chat Modelsprobabilities

On this page

Get log probabilities

Certain chat models can be conﬁgured to return token-level log probabilities. This guide walks through how to get logprobs
for a number of models.

OpenAI

Install the LangChain x OpenAI package and set your API key

%pip install -qU langchain-openai

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

For the OpenAI API to return log probabilities we need to conﬁgure thelogprobs=True param

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125").bind(logprobs=True)

msg = llm.invoke(("human", "how are you today"))

The logprobs are included on each output Message as part of theresponse_metadata:

msg.response_metadata["logprobs"]["content"][:5]
[{'token': 'As',
'bytes': [65, 115],
'logprob': -1.5358024,
'top_logprobs': []},
{'token': ' an',
'bytes': [32, 97, 110],
'logprob': -0.028062303,
'top_logprobs': []},
{'token': ' AI',
'bytes': [32, 65, 73],
'logprob': -0.009415812,
'top_logprobs': []},
{'token': ',', 'bytes': [44], 'logprob': -0.07371779, 'top_logprobs': []},
{'token': ' I',
'bytes': [32, 73],
'logprob': -4.298773e-05,
'top_logprobs': []}]

And are part of streamed Message chunks as well:

ct = 0
full = None
for chunk in llm.stream(("human", "how are you today")):
if ct < 5:
full = chunk if full is None else full + chunk
if "logprobs" in full.response_metadata:
print(full.response_metadata["logprobs"]["content"])
else:
break
ct += 1
[]
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.7523563, 'top_logprobs': []}]
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.7523563, 'top_logprobs': []}, {'token': ' an', 'bytes': [32, 97, 110], 'logprob': -0.019908238, 'top_logprobs': []}]
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.7523563, 'top_logprobs': []}, {'token': ' an', 'bytes': [32, 97, 110], 'logprob': -0.019908238, 'top_logprobs': []}, {'token': ' AI', 'b
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.7523563, 'top_logprobs': []}, {'token': ' an', 'bytes': [32, 97, 110], 'logprob': -0.019908238, 'top_logprobs': []}, {'token': ' AI', 'b

Help us out by providing feedback on this documentation page:

Previous
« Custom Chat Model
Next
Streaming »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How Create a runnable with the `@chain`
Language to decorator

Create a runnable with the `@chain` decorator

You can also turn an arbitrary function into a chain by adding a@chain decorator. This is functionaly equivalent to wrapping in
a RunnableLambda.

This will have the beneﬁt of improved observability by tracing your chain correctly. Any calls to runnables inside this function
will be traced as nested childen.

It will also allow you to use this as any other runnable, compose it in chain, etc.

Let’s take a look at this in action!

%pip install --upgrade --quiet langchain langchain-openai

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import chain
from langchain_openai import ChatOpenAI
prompt1 = ChatPromptTemplate.from_template("Tell me a joke about {topic}")
prompt2 = ChatPromptTemplate.from_template("What is the subject of this joke: {joke}")
@chain
def custom_chain(text):
prompt_val1 = prompt1.invoke({"topic": text})
output1 = ChatOpenAI().invoke(prompt_val1)
parsed_output1 = StrOutputParser().invoke(output1)
chain2 = prompt2 | ChatOpenAI() | StrOutputParser()
return chain2.invoke({"joke": parsed_output1})

custom_chain is now a runnable, meaning you will need to useinvoke

custom_chain.invoke("bears")
'The subject of this joke is bears.'

If you check out your LangSmith traces, you should see acustom_chain trace in there, with the calls to OpenAI nested
underneath

Help us out by providing feedback on this documentation page:

Previous
« Conﬁgure chain internals at runtime
Next
Add fallbacks »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Model Example Selector Select by maximal marginal relevance
ModulesI/O Prompts Types (MMR)

Select by maximal marginal relevance (MMR)

The MaxMarginalRelevanceExampleSelector selects examples based on a combination of which examples are most similar to the
inputs, while also optimizing for diversity. It does this by ﬁnding the examples with the embeddings that have the greatest
cosine similarity with the inputs, and then iteratively adding them while penalizing them for closeness to already selected
examples.

from langchain.prompts import FewShotPromptTemplate, PromptTemplate

from langchain.prompts.example_selector import (
MaxMarginalRelevanceExampleSelector,
SemanticSimilarityExampleSelector,
)
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

example_prompt = PromptTemplate(
input_variables=["input", "output"],
template="Input: {input}\nOutput: {output}",
)

# Examples of a pretend task of creating antonyms.

examples = [
{"input": "happy", "output": "sad"},
{"input": "tall", "output": "short"},
{"input": "energetic", "output": "lethargic"},
{"input": "sunny", "output": "gloomy"},
{"input": "windy", "output": "calm"},
]
example_selector = MaxMarginalRelevanceExampleSelector.from_examples(
# The list of examples available to select from.
examples,
# The embedding class used to produce embeddings which are used to measure semantic similarity.
OpenAIEmbeddings(),
# The VectorStore class that is used to store the embeddings and do a similarity search over.
FAISS,
# The number of examples to produce.
k=2,
)
mmr_prompt = FewShotPromptTemplate(
# We provide an ExampleSelector instead of examples.
example_selector=example_selector,
example_prompt=example_prompt,
prefix="Give the antonym of every input",
suffix="Input: {adjective}\nOutput:",
input_variables=["adjective"],
)
# Input is a feeling, so should select the happy/sad example as the first one
print(mmr_prompt.format(adjective="worried"))
Give the antonym of every input

Input: happy
Output: sad

Input: windy
Output: calm

Input: worried
Output:
# Let's compare this to what we would just get if we went solely off of similarity,
# by using SemanticSimilarityExampleSelector instead of MaxMarginalRelevanceExampleSelector.
example_selector = SemanticSimilarityExampleSelector.from_examples(
# The list of examples available to select from.
examples,
# The embedding class used to produce embeddings which are used to measure semantic similarity.
OpenAIEmbeddings(),
# The VectorStore class that is used to store the embeddings and do a similarity search over.
FAISS,
# The number of examples to produce.
k=2,
)
similar_prompt = FewShotPromptTemplate(
# We provide an ExampleSelector instead of examples.
example_selector=example_selector,
example_prompt=example_prompt,
prefix="Give the antonym of every input",
suffix="Input: {adjective}\nOutput:",
input_variables=["adjective"],
)
print(similar_prompt.format(adjective="worried"))
Give the antonym of every input

Input: happy
Output: sad

Input: sunny
Output: gloomy

Input: worried
Output:

Help us out by providing feedback on this documentation page:

Previous
« Select by length
Next
Select by n-gram overlap »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O LLMsStreaming

Streaming
All LLMs implement the Runnable interface, which comes with default implementations of all methods, ie. ainvoke, batch,
abatch, stream, astream. This gives all LLMs basic support for streaming.

Streaming support defaults to returning an Iterator (or AsyncIterator in the case of async streaming) of a single value, the ﬁnal
result returned by the underlying LLM provider. This obviously doesn’t give you token-by-token streaming, which requires
native support from the LLM provider, but ensures your code that expects an iterator of tokens can work for any of ourLLM
integrations.

See which integrations support token-by-token streaming here.

from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0, max_tokens=512)

for chunk in llm.stream("Write me a song about sparkling water."):
print(chunk, end="", ﬂush=True)

Verse 1:
Bubbles dancing in my glass
Clear and crisp, it's such a blast
Refreshing taste, it's like a dream
Sparkling water, you make me beam

Chorus:
Oh sparkling water, you're my delight
With every sip, you make me feel so right
You're like a party in my mouth
I can't get enough, I'm hooked no doubt

Verse 2:
No sugar, no calories, just pure bliss
You're the perfect drink, I must confess
From lemon to lime, so many ﬂavors to choose
Sparkling water, you never fail to amuse

Chorus:
Oh sparkling water, you're my delight
With every sip, you make me feel so right
You're like a party in my mouth
I can't get enough, I'm hooked no doubt

Bridge:
Some may say you're just plain water
But to me, you're so much more
You bring a sparkle to my day
In every single way

Chorus:
Oh sparkling water, you're my delight
With every sip, you make me feel so right
You're like a party in my mouth
I can't get enough, I'm hooked no doubt

Outro:
So here's to you, my dear sparkling water
You'll always be my go-to drink forever
With your eﬀervescence and refreshing taste
You'll always have a special place.

Help us out by providing feedback on this documentation page:

Previous
« Caching
Next
Tracking token usage »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Text Split by
ModulesRetrievalSplitters character

Split by character
This is the simplest method. This splits based on characters (by default “”) and measure chunk length by number of
characters.

1. How the text is split: by single character.

2. How the chunk size is measured: by number of characters.

%pip install -qU langchain-text-splitters

# This is a long document we can split up.
with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
separator="\n\n",
chunk_size=1000,
chunk_overlap=200,
length_function=len,
is_separator_regex=False,
)
texts = text_splitter.create_documents([state_of_the_union])
print(texts[0])
page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Co

Here’s an example of passing metadata along with the documents, notice that it is split along with the documents.

metadatas = [{"document": 1}, {"document": 2}]

documents = text_splitter.create_documents(
[state_of_the_union, state_of_the_union], metadatas=metadatas
)
print(documents[0])
page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Co

text_splitter.split_text(state_of_the_union)[0]
'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A

Help us out by providing feedback on this documentation page:

Previous
« HTMLHeaderTextSplitter
Next
Split code »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Model Custom
ModulesI/O LLMsLLM

Custom LLM
This notebook goes over how to create a custom LLM wrapper, in case you want to use your own LLM or a diﬀerent wrapper
than one that is supported in LangChain.

There are only two required things that a custom LLM needs to implement:

A _call method that takes in a string, some optional stop words, and returns a string.
A _llm_type property that returns a string. Used for logging purposes only.

There is a second optional thing it can implement:

An _identifying_params property that is used to help with printing of this class. Should return a dictionary.

Let’s implement a very simple custom LLM that just returns the ﬁrst n characters of the input.

from typing import Any, List, Mapping, Optional

from langchain_core.callbacks.manager import CallbackManagerForLLMRun

from langchain_core.language_models.llms import LLM
class CustomLLM(LLM):
n: int

@property
def _llm_type(self) -> str:
return "custom"

def _call(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> str:
if stop is not None:
raise ValueError("stop kwargs are not permitted.")
return prompt[: self.n]

@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {"n": self.n}

We can now use this as an any other LLM.

llm = CustomLLM(n=10)
llm.invoke("This is a foobar thing")
'This is a '

We can also print the LLM and see its custom print.

print(llm)
CustomLLM
Params: {'n': 10}

Help us out by providing feedback on this documentation page:

Previous
« Quick Start
Next
Caching »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Custom
ModulesAgentsHow-toagent

On this page

Custom agent
This notebook goes through how to create your own custom agent.

In this example, we will use OpenAI Tool Calling to create this agent.This is generally the most reliable way to create
agents.

We will ﬁrst create it WITHOUT memory, but we will then show how to add memory in. Memory is needed to enable
conversation.

Load the LLM

First, let’s load the language model we’re going to use to control the agent.

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

Deﬁne Tools

Next, let’s deﬁne some tools to use. Let’s write a really simple Python function to calculate the length of a word that is passed
in.

Note that here the function docstring that we use is pretty important. Read more about why this is the casehere

from langchain.agents import tool

@tool
def get_word_length(word: str) -> int:
"""Returns the length of a word."""
return len(word)

get_word_length.invoke("abc")
3
tools = [get_word_length]

Create Prompt

Now let us create the prompt. Because OpenAI Function Calling is ﬁnetuned for tool usage, we hardly need any instructions
on how to reason, or how to output format. We will just have two input variables: input and agent_scratchpad. input should be a
string containing the user objective. agent_scratchpad should be a sequence of messages that contains the previous agent tool
invocations and the corresponding tool outputs.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are very powerful assistant, but don't know current events",
),
("user", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
]
)

Bind tools to LLM

How does the agent know what tools it can use?

In this case we’re relying on OpenAI tool calling LLMs, which take tools as a separate argument and have been speciﬁcally
trained to know when to invoke those tools.

To pass in our tools to the agent, we just need to format them to theOpenAI tool format and pass them to our model. (By bind -
ing the functions, we’re making sure that they’re passed in each time the model is invoked.)

llm_with_tools = llm.bind_tools(tools)

Create the Agent

Putting those pieces together, we can now create the agent. We will import two last utility functions: a component for
formatting intermediate steps (agent action, tool output pairs) to input messages that can be sent to the model, and a
component for converting the output message into an agent action/agent ﬁnish.

from langchain.agents.format_scratchpad.openai_tools import (

format_to_openai_tool_messages,
)
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser

agent = (
{
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: format_to_openai_tool_messages(
x["intermediate_steps"]
),
}
| prompt
| llm_with_tools
| OpenAIToolsAgentOutputParser()
)
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

list(agent_executor.stream({"input": "How many letters in the word eudca"}))

> Entering new AgentExecutor chain...

Invoking: `get_word_length` with `{'word': 'eudca'}`

5There are 5 letters in the word "eudca".

> Finished chain.

[{'actions': [OpenAIToolAgentAction(tool='get_word_length', tool_input={'word': 'eudca'}, log="\nInvoking: `get_word_length` with `{'word': 'eudca'}`\n\n\n", message_lo
'messages': [AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_A07D5TuyqcNIL0DIEVRPpZkg', 'function': {'arguments': '{\n "word": "
{'steps': [AgentStep(action=OpenAIToolAgentAction(tool='get_word_length', tool_input={'word': 'eudca'}, log="\nInvoking: `get_word_length` with `{'word': 'eudca'}`\n\
'messages': [FunctionMessage(content='5', name='get_word_length')]},
{'output': 'There are 5 letters in the word "eudca".',
'messages': [AIMessage(content='There are 5 letters in the word "eudca".')]}]

If we compare this to the base LLM, we can see that the LLM alone struggles

llm.invoke("How many letters in the word educa")

AIMessage(content='There are 6 letters in the word "educa".')
Adding memory

This is great - we have an agent! However, this agent is stateless - it doesn’t remember anything about previous interactions.
This means you can’t ask follow up questions easily. Let’s ﬁx that by adding in memory.

In order to do this, we need to do two things:

1. Add a place for memory variables to go in the prompt

2. Keep track of the chat history

First, let’s add a place for memory in the prompt. We do this by adding a placeholder for messages with the key"chat_history".
Notice that we put this ABOVE the new user input (to follow the conversation ﬂow).

from langchain.prompts import MessagesPlaceholder

MEMORY_KEY = "chat_history"
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are very powerful assistant, but bad at calculating lengths of words.",
),
MessagesPlaceholder(variable_name=MEMORY_KEY),
("user", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
]
)

We can then set up a list to track the chat history

from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

We can then put it all together!

agent = (
{
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: format_to_openai_tool_messages(
x["intermediate_steps"]
),
"chat_history": lambda x: x["chat_history"],
}
| prompt
| llm_with_tools
| OpenAIToolsAgentOutputParser()
)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

When running, we now need to track the inputs and outputs as chat history

input1 = "how many letters in the word educa?"

result = agent_executor.invoke({"input": input1, "chat_history": chat_history})
chat_history.extend(
[
HumanMessage(content=input1),
AIMessage(content=result["output"]),
]
)
agent_executor.invoke({"input": "is that a real word?", "chat_history": chat_history})

> Entering new AgentExecutor chain...

Invoking: `get_word_length` with `{'word': 'educa'}`

5There are 5 letters in the word "educa".

> Finished chain.

> Entering new AgentExecutor chain...

No, "educa" is not a real word in English.

> Finished chain.

{'input': 'is that a real word?',
'chat_history': [HumanMessage(content='how many letters in the word educa?'),
AIMessage(content='There are 5 letters in the word "educa".')],
'output': 'No, "educa" is not a real word in English.'}

Help us out by providing feedback on this documentation page:

Previous
« OpenAI assistants
Next
Streaming »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
ModulesMoreMemoryChat Messages

Chat Messages
INFO

Head to Integrations for documentation on built-in memory integrations with 3rd-party databases and tools.

One of the core utility classes underpinning most (if not all) memory modules is theChatMessageHistory class. This is a super
lightweight wrapper that provides convenience methods for saving HumanMessages, AIMessages, and then fetching them
all.

You may want to use this class directly if you are managing memory outside of a chain.

from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()

history.add_user_message("hi!")

history.add_ai_message("whats up?")
history.messages
[HumanMessage(content='hi!', additional_kwargs={}),
AIMessage(content='whats up?', additional_kwargs={})]

Help us out by providing feedback on this documentation page:

Previous
« [Beta] Memory
Next
Memory types »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Memory Conversation Knowledge
ModulesMoreMemorytypes Graph

On this page

Conversation Knowledge Graph

This type of memory uses a knowledge graph to recreate memory.

Using memory with LLM

from langchain.memory import ConversationKGMemory

from langchain_openai import OpenAI
llm = OpenAI(temperature=0)
memory = ConversationKGMemory(llm=llm)
memory.save_context({"input": "say hi to sam"}, {"output": "who is sam"})
memory.save_context({"input": "sam is a friend"}, {"output": "okay"})
memory.load_memory_variables({"input": "who is sam"})
{'history': 'On Sam: Sam is friend.'}

We can also get the history as a list of messages (this is useful if you are using this with a chat model).

memory = ConversationKGMemory(llm=llm, return_messages=True)

memory.save_context({"input": "say hi to sam"}, {"output": "who is sam"})
memory.save_context({"input": "sam is a friend"}, {"output": "okay"})
memory.load_memory_variables({"input": "who is sam"})
{'history': [SystemMessage(content='On Sam: Sam is friend.', additional_kwargs={})]}

We can also more modularly get current entities from a new message (will use previous messages as context).

memory.get_current_entities("what's Sams favorite color?")

['Sam']

We can also more modularly get knowledge triplets from a new message (will use previous messages as context).

memory.get_knowledge_triplets("her favorite color is red")

[KnowledgeTriple(subject='Sam', predicate='favorite color', object_='red')]

Using in a chain

Let’s now use this in a chain!

llm = OpenAI(temperature=0)
from langchain.chains import ConversationChain
from langchain.prompts.prompt import PromptTemplate

template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know. The AI ONLY uses information contained in the "Relevant Information" section and

Relevant Information:

{history}

Conversation:
Human: {input}
AI:"""
prompt = PromptTemplate(input_variables=["history", "input"], template=template)
conversation_with_kg = ConversationChain(
llm=llm, verbose=True, prompt=prompt, memory=ConversationKGMemory(llm=llm)
)

conversation_with_kg.predict(input="Hi, what's up?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know. The AI ONLY uses information contained in the "Relevant Information" section and

Relevant Information:

Conversation:
Human: Hi, what's up?
AI:

> Finished chain.

" Hi there! I'm doing great. I'm currently in the process of learning about the world around me. I'm learning about diﬀerent cultures, languages, and customs. It's really

conversation_with_kg.predict(
input="My name is James and I'm helping Will. He's an engineer."
)

> Entering new ConversationChain chain...

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know. The AI ONLY uses information contained in the "Relevant Information" section and

Relevant Information:

Conversation:
Human: My name is James and I'm helping Will. He's an engineer.
AI:

> Finished chain.

" Hi James, it's nice to meet you. I'm an AI and I understand you're helping Will, the engineer. What kind of engineering does he do?"
conversation_with_kg.predict(input="What do you know about Will?")

> Entering new ConversationChain chain...

Relevant Information:

On Will: Will is an engineer.

Conversation:
Human: What do you know about Will?
AI:

> Finished chain.

' Will is an engineer.'

Help us out by providing feedback on this documentation page:

Previous
« Entity
Next
Conversation Summary »

Community
Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
ModulesRetrievalRetrieversSelf-querying

On this page

Self-querying
Head to Integrations for documentation on vector stores with built-in support for self-querying.

A self-querying retriever is one that, as the name suggests, has the ability to query itself. Specifically, given any natural
language query, the retriever uses a query-constructing LLM chain to write a structured query and then applies that
structured query to its underlying VectorStore. This allows the retriever to not only use the user-input query for semantic
similarity comparison with the contents of stored documents but to also extract filters from the user query on the metadata of
stored documents and to execute those filters.

Get started

For demonstration purposes we’ll use a Chroma vector store. We’ve created a small demo set of documents that contain
summaries of movies.

Note: The self-query retriever requires you to have lark package installed.

%pip install --upgrade --quiet lark chromadb

from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

docs = [
Document(
page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
metadata={"year": 1993, "rating": 7.7, "genre": "science ﬁction"},
),
Document(
page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
),
Document(
page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
),
Document(
page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
),
Document(
page_content="Toys come alive and have a blast doing so",
metadata={"year": 1995, "genre": "animated"},
),
Document(
page_content="Three men walk into the Zone, three men walk out of the Zone",
metadata={
"year": 1979,
"director": "Andrei Tarkovsky",
"genre": "thriller",
"rating": 9.9,
},
),
]
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())

Creating our self-querying retriever

Now we can instantiate our retriever. To do this we’ll need to provide some information upfront about the metadata ﬁelds that
our documents support and a short description of the document contents.

from langchain.chains.query_constructor.base import AttributeInfo

from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import ChatOpenAI

metadata_field_info = [
AttributeInfo(
name="genre",
description="The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
type="string",
),
AttributeInfo(
name="year",
description="The year the movie was released",
type="integer",
),
AttributeInfo(
name="director",
description="The name of the movie director",
type="string",
),
AttributeInfo(
name="rating", description="A 1-10 rating for the movie", type="float"
),
]
document_content_description = "Brief summary of a movie"
llm = ChatOpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
llm,
vectorstore,
document_content_description,
metadata_field_info,
)

Testing it out

And now we can actually try using our retriever!

# This example only speciﬁes a ﬁlter
retriever.invoke("I want to watch a movie rated higher than 8.5")
[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'director': 'Andrei Tarkovsky', 'genre': 'thriller', 'rating': 9.9, 'year
Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'director':

# This example speciﬁes a query and a ﬁlter

retriever.invoke("Has Greta Gerwig directed any movies about women")
[Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'director': 'Greta Gerwig', 'rating': 8

# This example speciﬁes a composite ﬁlter

retriever.invoke("What's a highly rated (above 8.5) science ﬁction ﬁlm?")
[Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'director':
Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'director': 'Andrei Tarkovsky', 'genre': 'thriller', 'rating': 9.9, 'year

# This example speciﬁes a query and composite ﬁlter

retriever.invoke(
"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated"
)
[Document(page_content='Toys come alive and have a blast doing so', metadata={'genre': 'animated', 'year': 1995})]

Filter k

We can also use the self query retriever to specify k : the number of documents to fetch.

We can do this by passing enable_limit=True to the constructor.

retriever = SelfQueryRetriever.from_llm(
llm,
vectorstore,
document_content_description,
metadata_ﬁeld_info,
enable_limit=True,
)

# This example only speciﬁes a relevant query

retriever.invoke("What are two movies about dinosaurs")
[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'genre': 'science ﬁction', 'rating': 7.7, 'year': 1993}),
Document(page_content='Toys come alive and have a blast doing so', metadata={'genre': 'animated', 'year': 1995})]

Constructing from scratch with LCEL

To see what’s going on under the hood, and to have more custom control, we can reconstruct our retriever from scratch.

First, we need to create a query-construction chain. This chain will take a user query and generated aStructuredQuery object
which captures the ﬁlters speciﬁed by the user. We provide some helper functions for creating a prompt and output parser.
These have a number of tunable params that we’ll ignore here for simplicity.

from langchain.chains.query_constructor.base import (

StructuredQueryOutputParser,
get_query_constructor_prompt,
)

prompt = get_query_constructor_prompt(
document_content_description,
metadata_ﬁeld_info,
)
output_parser = StructuredQueryOutputParser.from_components()
query_constructor = prompt | llm | output_parser

Let’s look at our prompt:

print(prompt.format(query="dummy question"))
Your goal is to structure the user's query to match the request schema provided below.

<< Structured Request Schema >>

When responding use a markdown code snippet with a JSON object formatted in the following schema:

```json
{
"query": string \ text string to compare to document contents
"ﬁlter": string \ logical condition statement for ﬁltering documents
}
```

The query string should contain only text that is expected to match the contents of documents. Any conditions in the ﬁlter should not be mentioned in the query as we
A logical condition statement is composed of one or more comparison and logical operation statements.

A comparison statement takes the form: `comp(attr, val)`:

- `comp` (eq | ne | gt | gte | lt | lte | contain | like | in | nin): comparator
- `attr` (string): name of attribute to apply the comparison to
- `val` (string): is the comparison value

A logical operation statement takes the form `op(statement1, statement2, ...)`:

- `op` (and | or | not): logical operator
- `statement1`, `statement2`, ... (comparison statements or logical operation statements): one or more statements to apply the operation to

Make sure that you only use the comparators and logical operators listed above and no others.
Make sure that filters only refer to attributes that exist in the data source.
Make sure that filters only use the attributed names with its function names if there are functions applied on them.
Make sure that filters only use format `YYYY-MM-DD` when handling date data typed values.
Make sure that filters take into account the descriptions of attributes and only make comparisons that are feasible given the type of data being stored.
Make sure that filters are only used as needed. If there are no filters that should be applied return "NO_FILTER" for the filter value.

<< Example 1. >>

Data Source:
```json
{
"content": "Lyrics of a song",
"attributes": {
"artist": {
"type": "string",
"description": "Name of the song artist"
},
"length": {
"type": "integer",
"description": "Length of the song in seconds"
},
"genre": {
"type": "string",
"description": "The song genre, one of "pop", "rock" or "rap""
}
}
}
```

User Query:
What are songs by Taylor Swift or Katy Perry about teenage romance under 3 minutes long in the dance pop genre

Structured Request:
```json
{
"query": "teenager love",
"ﬁlter": "and(or(eq(\"artist\", \"Taylor Swift\"), eq(\"artist\", \"Katy Perry\")), lt(\"length\", 180), eq(\"genre\", \"pop\"))"
}
```

<< Example 2. >>

User Query:
What are songs that were not published on Spotify

Structured Request:
```json
{
"query": "",
"ﬁlter": "NO_FILTER"
"ﬁlter": "NO_FILTER"
}
```

<< Example 3. >>

Data Source:
```json
{
"content": "Brief summary of a movie",
"attributes": {
"genre": {
"description": "The genre of the movie. One of ['science ﬁction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
"type": "string"
},
"year": {
"description": "The year the movie was released",
"type": "integer"
},
"director": {
"description": "The name of the movie director",
"type": "string"
},
"rating": {
"description": "A 1-10 rating for the movie",
"type": "ﬂoat"
}
}
}
```

User Query:
dummy question

Structured Request:

And what our full chain produces:

query_constructor.invoke(
{
"query": "What are some sci-ﬁ movies from the 90's directed by Luc Besson about taxi drivers"
}
)
StructuredQuery(query='taxi driver', ﬁlter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='genre

The query constructor is the key element of the self-query retriever. To make a great retrieval system you’ll need to make
sure your query constructor works well. Often this requires adjusting the prompt, the examples in the prompt, the attribute
descriptions, etc. For an example that walks through reﬁning a query constructor on some hotel inventory data, check out this
cookbook.

The next key element is the structured query translator. This is the object responsible for translating the genericStructuredQuery
object into a metadata ﬁlter in the syntax of the vector store you’re using. LangChain comes with a number of built-in
translators. To see them all head to the Integrations section.

from langchain.retrievers.self_query.chroma import ChromaTranslator

retriever = SelfQueryRetriever(
query_constructor=query_constructor,
vectorstore=vectorstore,
structured_query_translator=ChromaTranslator(),
)
retriever.invoke(
"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated"
)
[Document(page_content='Toys come alive and have a blast doing so', metadata={'genre': 'animated', 'year': 1995})]

Help us out by providing feedback on this documentation page:

Previous
« Parent Document Retriever
Next
Time-weighted vector store retriever »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog

YouTube
Document
ModulesRetrievalloaders Markdown

On this page

Markdown
Markdown is a lightweight markup language for creating formatted text using a plain-text editor.

This covers how to load Markdown documents into a document format that we can use downstream.

# !pip install unstructured > /dev/null

from langchain_community.document_loaders import UnstructuredMarkdownLoader
markdown_path = "../../../../../README.md"
loader = UnstructuredMarkdownLoader(markdown_path)
data = loader.load()
data
[Document(page_content="ð\x9f¦\x9cï¸\x8fð\x9f”\x97 LangChain\n\nâ\x9a¡ Building applications with LLMs through composability â\x9a¡\n\nLooking for the JS/TS v

Retain Elements

Under the hood, Unstructured creates diﬀerent "elements" for diﬀerent chunks of text. By default we combine those together,
but you can easily keep that separation by specifying mode="elements".

loader = UnstructuredMarkdownLoader(markdown_path, mode="elements")

data = loader.load()
data[0]
Document(page_content='ð\x9f¦\x9cï¸\x8fð\x9f”\x97 LangChain', metadata={'source': '../../../../../README.md', 'page_number': 1, 'category': 'Title'})

Help us out by providing feedback on this documentation page:

Previous
« JSON
Next
PDF »

Community

Discord

Twitter
GitHub

Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O PromptsComposition

On this page

Composition
LangChain provides a user friendly interface for composing diﬀerent parts of prompts together. You can do this with either
string prompts or chat prompts. Constructing prompts this way allows for easy reuse of components.

String prompt composition

When working with string prompts, each template is joined together. You can work with either prompts directly or strings (the
ﬁrst element in the list needs to be a prompt).

from langchain.prompts import PromptTemplate

prompt = (
PromptTemplate.from_template("Tell me a joke about {topic}")
+ ", make it funny"
+ "\n\nand in {language}"
)
prompt
PromptTemplate(input_variables=['language', 'topic'], output_parser=None, partial_variables={}, template='Tell me a joke about {topic}, make it funny\n\nand in {langu

prompt.format(topic="sports", language="spanish")
'Tell me a joke about sports, make it funny\n\nand in spanish'

You can also use it in an LLMChain, just like before.

from langchain.chains import LLMChain

from langchain_openai import ChatOpenAI
model = ChatOpenAI()
chain = LLMChain(llm=model, prompt=prompt)
chain.run(topic="sports", language="spanish")
'¿Por qué el futbolista llevaba un paraguas al partido?\n\nPorque pronosticaban lluvia de goles.'

Chat prompt composition

A chat prompt is made up a of a list of messages. Purely for developer experience, we’ve added a convenient way to create
these prompts. In this pipeline, each new element is a new message in the ﬁnal prompt.

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

First, let’s initialize the base ChatPromptTemplate with a system message. It doesn’t have to start with a system, but it’s often
good practice

prompt = SystemMessage(content="You are a nice pirate")

You can then easily create a pipeline combining it with other messagesor message templates. Use a Message when there is
no variables to be formatted, use a MessageTemplate when there are variables to be formatted. You can also use just a string
(note: this will automatically get inferred as a HumanMessagePromptTemplate.)

new_prompt = (
prompt + HumanMessage(content="hi") + AIMessage(content="what?") + "{input}"
)

Under the hood, this creates an instance of the ChatPromptTemplate class, so you can use it just as you did before!

new_prompt.format_messages(input="i said hi")

[SystemMessage(content='You are a nice pirate', additional_kwargs={}),
HumanMessage(content='hi', additional_kwargs={}, example=False),
AIMessage(content='what?', additional_kwargs={}, example=False),
HumanMessage(content='i said hi', additional_kwargs={}, example=False)]

You can also use it in an LLMChain, just like before.

from langchain.chains import LLMChain

from langchain_openai import ChatOpenAI
model = ChatOpenAI()
chain = LLMChain(llm=model, prompt=new_prompt)
chain.run("i said hi")
'Oh, hello! How can I assist you today?'

Help us out by providing feedback on this documentation page:

Previous
« Quick Start
Next
Example Selector Types »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
OpenAI
ModulesAgentsAgent Typestools

On this page

OpenAI tools
Newer OpenAI models have been ﬁne-tuned to detect when one or more function(s) should be called and respond with the
inputs that should be passed to the function(s). In an API call, you can describe functions and have the model intelligently
choose to output a JSON object containing arguments to call these functions. The goal of the OpenAI tools APIs is to more
reliably return valid and useful function calls than what can be done using a generic text completion or chat API.

OpenAI termed the capability to invoke a single function as functions, and the capability to invoke one or more functions as
tools.

In the OpenAI Chat API, functions are now considered a legacy options that is deprecated in favor oftools.

If you’re creating agents using OpenAI models, you should be using this OpenAI Tools agent rather than the OpenAI
functions agent.

Using tools allows the model to request that more than one function will be called upon when appropriate.

In some situations, this can help signﬁcantly reduce the time that it takes an agent to achieve its goal.

See

OpenAI chat create

OpenAI function calling

%pip install --upgrade --quiet langchain-openai tavily-python

from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI

Initialize Tools

For this agent let’s give it the ability to search the web with Tavily.

tools = [TavilySearchResults(max_results=1)]

Create Agent

# Get the prompt to use - you can modify this!

prompt = hub.pull("hwchase17/openai-tools-agent")
# Choose the LLM that will drive the agent
# Only certain models support this
llm = ChatOpenAI(model="gpt-3.5-turbo-1106", temperature=0)

# Construct the OpenAI Tools agent

agent = create_openai_tools_agent(llm, tools, prompt)

Run Agent

# Create an agent executor by passing in the agent and tools

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "what is LangChain?"})
> Entering new AgentExecutor chain...

Invoking: `tavily_search_results_json` with `{'query': 'LangChain'}`

[{'url': 'https://www.ibm.com/topics/langchain', 'content': 'LangChain is essentially a library of abstractions for Python and Javascript, representing common steps and c

> Finished chain.

{'input': 'what is LangChain?',

'output': 'LangChain is an open source orchestration framework for the development of applications using large language models. It is essentially a library of abstract

Using with chat history

from langchain_core.messages import AIMessage, HumanMessage

agent_executor.invoke(
{
"input": "what's my name? Don't use tools to look this up unless you NEED to",
"chat_history": [
HumanMessage(content="hi! my name is bob"),
AIMessage(content="Hello Bob! How can I assist you today?"),
],
}
)

> Entering new AgentExecutor chain...

Your name is Bob.

> Finished chain.

{'input': "what's my name? Don't use tools to look this up unless you NEED to",
'chat_history': [HumanMessage(content='hi! my name is bob'),
AIMessage(content='Hello Bob! How can I assist you today?')],
'output': 'Your name is Bob.'}

Help us out by providing feedback on this documentation page:

Previous
« OpenAI functions
Next
XML Agent »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Document
ModulesRetrievalloaders HTML

On this page

HTML
The HyperText Markup Language or HTML is the standard markup language for documents designed to be
displayed in a web browser.

This covers how to load HTML documents into a document format that we can use downstream.

from langchain_community.document_loaders import UnstructuredHTMLLoader

loader = UnstructuredHTMLLoader("example_data/fake-content.html")
data = loader.load()
data
[Document(page_content='My First Heading\n\nMy ﬁrst paragraph.', lookup_str='', metadata={'source': 'example_data/fake-content.html'}, lookup_index=0)]

Loading HTML with BeautifulSoup4

We can also use BeautifulSoup4 to load HTML documents using the BSHTMLLoader. This will extract the text from the HTML into
page_content, and the page title as title into metadata.

from langchain_community.document_loaders import BSHTMLLoader

loader = BSHTMLLoader("example_data/fake-content.html")
data = loader.load()
data
[Document(page_content='\n\nTest Title\n\n\nMy First Heading\nMy ﬁrst paragraph.\n\n\n', metadata={'source': 'example_data/fake-content.html', 'title': 'Test Title'})

Help us out by providing feedback on this documentation page:

Previous
« File Directory
Next
JSON »

Community

Discord
Twitter
GitHub

Python

JS/TS
More
Homepage
Blog

YouTube
LangChain Expression Prompt +
Language CookbookLLM

On this page

Prompt + LLM
The most common and valuable composition is taking:

PromptTemplate / ChatPromptTemplate -> LLM / ChatModel -> OutputParser

Almost any other chains you build will use this building block.

PromptTemplate + LLM

The simplest composition is just combining a prompt and model to create a chain that takes user input, adds it to a prompt,
passes it to a model, and returns the raw model output.

Note, you can mix and match PromptTemplate/ChatPromptTemplates and LLMs/ChatModels as you like here.

%pip install –upgrade –quiet langchain langchain-openai

from langchain_core.prompts import ChatPromptTemplate

from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("tell me a joke about {foo}")

model = ChatOpenAI()
chain = prompt | model
chain.invoke({"foo": "bears"})
AIMessage(content="Why don't bears wear shoes?\n\nBecause they have bear feet!", additional_kwargs={}, example=False)

Often times we want to attach kwargs that’ll be passed to each model call. Here are a few examples of that:

Attaching Stop Sequences

chain = prompt | model.bind(stop=["\n"])
chain.invoke({"foo": "bears"})
AIMessage(content='Why did the bear never wear shoes?', additional_kwargs={}, example=False)

Attaching Function Call information

functions = [
{
"name": "joke",
"description": "A joke",
"parameters": {
"type": "object",
"properties": {
"setup": {"type": "string", "description": "The setup for the joke"},
"punchline": {
"type": "string",
"description": "The punchline for the joke",
},
},
"required": ["setup", "punchline"],
},
}
]
chain = prompt | model.bind(function_call={"name": "joke"}, functions=functions)
chain.invoke({"foo": "bears"}, conﬁg={})
AIMessage(content='', additional_kwargs={'function_call': {'name': 'joke', 'arguments': '{\n "setup": "Why don\'t bears wear shoes?",\n "punchline": "Because they hav
PromptTemplate + LLM + OutputParser

We can also add in an output parser to easily transform the raw LLM/ChatModel output into a more workable format

from langchain_core.output_parsers import StrOutputParser

chain = prompt | model | StrOutputParser()

Notice that this now returns a string - a much more workable format for downstream tasks

chain.invoke({"foo": "bears"})
"Why don't bears wear shoes?\n\nBecause they have bear feet!"

Functions Output Parser

When you specify the function to return, you may just want to parse that directly

from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser

chain = (
prompt
| model.bind(function_call={"name": "joke"}, functions=functions)
| JsonOutputFunctionsParser()
)
chain.invoke({"foo": "bears"})
{'setup': "Why don't bears like fast food?",
'punchline': "Because they can't catch it!"}
from langchain.output_parsers.openai_functions import JsonKeyOutputFunctionsParser

chain = (
prompt
| model.bind(function_call={"name": "joke"}, functions=functions)
| JsonKeyOutputFunctionsParser(key_name="setup")
)
chain.invoke({"foo": "bears"})
"Why don't bears wear shoes?"

Simplifying input

To make invocation even simpler, we can add a RunnableParallel to take care of creating the prompt input dict for us:

from langchain_core.runnables import RunnableParallel, RunnablePassthrough

map_ = RunnableParallel(foo=RunnablePassthrough())
chain = (
map_
| prompt
| model.bind(function_call={"name": "joke"}, functions=functions)
| JsonKeyOutputFunctionsParser(key_name="setup")
)
chain.invoke("bears")
"Why don't bears wear shoes?"

Since we’re composing our map with another Runnable, we can even use some syntactic sugar and just use a dict:

chain = (
{"foo": RunnablePassthrough()}
| prompt
| model.bind(function_call={"name": "joke"}, functions=functions)
| JsonKeyOutputFunctionsParser(key_name="setup")
)
chain.invoke("bears")
"Why don't bears like fast food?"

Help us out by providing feedback on this documentation page:

Previous
« Cookbook
Next
RAG »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Text Split by
ModulesRetrievalSplitters tokens

On this page

Split by tokens
Language models have a token limit. You should not exceed the token limit. When you split your text into chunks it is
therefore a good idea to count the number of tokens. There are many tokenizers. When you count tokens in your text you
should use the same tokenizer as used in the language model.

tiktoken

tiktoken is a fast BPE tokenizer created by OpenAI.

We can use it to estimate tokens used. It will probably be more accurate for the OpenAI models.

1. How the text is split: by character passed in.

2. How the chunk size is measured: by tiktoken tokenizer.

%pip install --upgrade --quiet langchain-text-splitters tiktoken

# This is a long document we can split up.
with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
from langchain_text_splitters import CharacterTextSplitter
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
chunk_size=100, chunk_overlap=0
)
texts = text_splitter.split_text(state_of_the_union)
print(texts[0])
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A

Last year COVID-19 kept us apart. This year we are ﬁnally together again.

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans.

With a duty to one another to the American people to the Constitution.

Note that if we use CharacterTextSplitter.from_tiktoken_encoder, text is only split by CharacterTextSplitter and tiktoken tokenizer is used to
merge splits. It means that split can be larger than chunk size measured by tiktoken tokenizer. We can use
RecursiveCharacterTextSplitter.from_tiktoken_encoder to make sure splits are not larger than chunk size of tokens allowed by the
language model, where each split will be recursively split if it has a larger size.

We can also load a tiktoken splitter directly, which ensure each split is smaller than chunk size.

from langchain_text_splitters import TokenTextSplitter

text_splitter = TokenTextSplitter(chunk_size=10, chunk_overlap=0)

texts = text_splitter.split_text(state_of_the_union)
print(texts[0])

spaCy

spaCy is an open-source software library for advanced natural language processing, written in the programming
languages Python and Cython.

Another alternative to NLTK is to use spaCy tokenizer.

1. How the text is split: by spaCy tokenizer.
2. How the chunk size is measured: by number of characters.

%pip install --upgrade --quiet spacy

# This is a long document we can split up.
with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
from langchain_text_splitters import SpacyTextSplitter

text_splitter = SpacyTextSplitter(chunk_size=1000)
texts = text_splitter.split_text(state_of_the_union)
print(texts[0])
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman.

Members of Congress and the Cabinet.

Justices of the Supreme Court.

My fellow Americans.

Last year COVID-19 kept us apart.

This year we are ﬁnally together again.

Tonight, we meet as Democrats Republicans and Independents.

But most importantly as Americans.

With a duty to one another to the American people to the Constitution.

And with an unwavering resolve that freedom will always triumph over tyranny.

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways.

But he badly miscalculated.

He thought he could roll into Ukraine and the world would roll over.

Instead he met a wall of strength he never imagined.

He met the Ukrainian people.

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.

SentenceTransformers

The SentenceTransformersTokenTextSplitter is a specialized text splitter for use with the sentence-transformer models. The default
behaviour is to split the text into chunks that ﬁt the token window of the sentence transformer model that you would like to
use.

from langchain_text_splitters import SentenceTransformersTokenTextSplitter

splitter = SentenceTransformersTokenTextSplitter(chunk_overlap=0)
text = "Lorem "
count_start_and_stop_tokens = 2
text_token_count = splitter.count_tokens(text=text) - count_start_and_stop_tokens
print(text_token_count)
2
token_multiplier = splitter.maximum_tokens_per_chunk // text_token_count + 1

# `text_to_split` does not ﬁt in a single chunk

text_to_split = text * token_multiplier

print(f"tokens in text to split: {splitter.count_tokens(text=text_to_split)}")

tokens in text to split: 514
text_chunks = splitter.split_text(text=text_to_split)

print(text_chunks[1])
lorem

NLTK

The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and
statistical natural language processing (NLP) for English written in the Python programming language.

Rather than just splitting on “”, we can use NLTK to split based on NLTK tokenizers.

1. How the text is split: by NLTK tokenizer.

2. How the chunk size is measured: by number of characters.

# pip install nltk

# This is a long document we can split up.
with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
from langchain_text_splitters import NLTKTextSplitter

text_splitter = NLTKTextSplitter(chunk_size=1000)
texts = text_splitter.split_text(state_of_the_union)
print(texts[0])
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman.

Members of Congress and the Cabinet.

Justices of the Supreme Court.

My fellow Americans.

Last year COVID-19 kept us apart.

This year we are ﬁnally together again.

Tonight, we meet as Democrats Republicans and Independents.

But most importantly as Americans.

With a duty to one another to the American people to the Constitution.

And with an unwavering resolve that freedom will always triumph over tyranny.

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways.

But he badly miscalculated.

He thought he could roll into Ukraine and the world would roll over.

Instead he met a wall of strength he never imagined.

He met the Ukrainian people.

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.

Groups of citizens blocking tanks with their bodies.

KoNLPY

KoNLPy: Korean NLP in Python is is a Python package for natural language processing (NLP) of the Korean
language.

Token splitting involves the segmentation of text into smaller, more manageable units called tokens. These tokens are often
words, phrases, symbols, or other meaningful elements crucial for further processing and analysis. In languages like English,
token splitting typically involves separating words by spaces and punctuation marks. The eﬀectiveness of token splitting
largely depends on the tokenizer’s understanding of the language structure, ensuring the generation of meaningful tokens.
Since tokenizers designed for the English language are not equipped to understand the unique semantic structures of other
languages, such as Korean, they cannot be eﬀectively used for Korean language processing.

Token splitting for Korean with KoNLPy’s Kkma Analyzer

In case of Korean text, KoNLPY includes at morphological analyzer calledKkma (Korean Knowledge Morpheme Analyzer).
Kkma provides detailed morphological analysis of Korean text. It breaks down sentences into words and words into their
respective morphemes, identifying parts of speech for each token. It can segment a block of text into individual sentences,
which is particularly useful for processing long texts.

Usage Considerations

While Kkma is renowned for its detailed analysis, it is important to note that this precision may impact processing speed. Thus,
Kkma is best suited for applications where analytical depth is prioritized over rapid text processing.

# pip install konlpy

# This is a long Korean document that we want to split up into its component sentences.
with open("./your_korean_doc.txt") as f:
korean_document = f.read()
from langchain_text_splitters import KonlpyTextSplitter

text_splitter = KonlpyTextSplitter()
texts = text_splitter.split_text(korean_document)
# The sentences are split with "\n\n" characters.
print(texts[0])
춘향전 옛날에 남원에 이 도령이라는 벼슬아치 아들이 있었다.

그의 외모는 빛나는 달처럼 잘생겼고, 그의 학식과 기예는 남보다 뛰어났다.

한편, 이 마을에는 춘향이라는 절세 가인이 살고 있었다.

춘 향의 아름다움은 꽃과 같아 마을 사람들 로부터 많은 사랑을 받았다.

어느 봄날, 도령은 친구들과 놀러 나갔다가 춘 향을 만 나 첫 눈에 반하고 말았다.

두 사람은 서로 사랑하게 되었고, 이내 비밀스러운 사랑의 맹세를 나누었다.

하지만 좋은 날들은 오래가지 않았다.

도령의 아버지가 다른 곳으로 전근을 가게 되어 도령도 떠나 야만 했다.

이별의 아픔 속에서도, 두 사람은 재회를 기약하며 서로를 믿고 기다리기로 했다.

그러나 새로 부임한 관아의 사또가 춘 향의 아름다움에 욕심을 내 어 그녀에게 강요를 시작했다.

춘 향 은 도령에 대한 자신의 사랑을 지키기 위해, 사또의 요구를 단호히 거절했다.

이에 분노한 사또는 춘 향을 감옥에 가두고 혹독한 형벌을 내렸다.

이야기는 이 도령이 고위 관직에 오른 후, 춘 향을 구해 내는 것으로 끝난다.

두 사람은 오랜 시련 끝에 다시 만나게 되고, 그들의 사랑은 온 세상에 전해 지며 후세에까지 이어진다.

- 춘향전 (The Tale of Chunhyang)

Hugging Face tokenizer

Hugging Face has many tokenizers.

We use Hugging Face tokenizer, the GPT2TokenizerFast to count the text length in tokens.

1. How the text is split: by character passed in.

2. How the chunk size is measured: by number of tokens calculated by theHugging Face tokenizer.

from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
# This is a long document we can split up.
with open("../../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
from langchain_text_splitters import CharacterTextSplitter
text_splitter = CharacterTextSplitter.from_huggingface_tokenizer(
tokenizer, chunk_size=100, chunk_overlap=0
)
texts = text_splitter.split_text(state_of_the_union)
print(texts[0])
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A

Last year COVID-19 kept us apart. This year we are ﬁnally together again.

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans.

With a duty to one another to the American people to the Constitution.

Help us out by providing feedback on this documentation page:

Previous
« Semantic Chunking
Next
Retrieval »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesChains

Chains
Chains refer to sequences of calls - whether to an LLM, a tool, or a data preprocessing step. The primary supported way to
do this is with LCEL.

LCEL is great for constructing your own chains, but it’s also nice to have chains that you can use oﬀ-the-shelf. There are two
types of oﬀ-the-shelf chains that LangChain supports:

Chains that are built with LCEL. In this case, LangChain oﬀers a higher-level constructor method. However, all that is
being done under the hood is constructing a chain with LCEL.

[Legacy] Chains constructed by subclassing from a legacy Chain class. These chains do not use LCEL under the hood
but are rather standalone classes.

We are working creating methods that create LCEL versions of all chains. We are doing this for a few reasons.

1. Chains constructed in this way are nice because if you want to modify the internals of a chain you can simply modify the
LCEL.

2. These chains natively support streaming, async, and batch out of the box.

3. These chains automatically get observability at each step.

This page contains two lists. First, a list of all LCEL chain constructors. Second, a list of all legacy Chains.

LCEL Chains

Below is a table of all LCEL chain constructors. In addition, we report on:

Chain Constructor

The constructor function for this chain. These are all methods that return LCEL runnables. We also link to the API
documentation.

Function Calling

Whether this requires OpenAI function calling.

Other Tools

What other tools (if any) are used in this chain.

When to Use

Our commentary on when to use this chain.

Functio
Other
Chain Constructor n When to Use
Tools
Calling
This chain takes a list of documents and formats them all into a
prompt, then passes that prompt to an LLM. It passes ALL
create_stuﬀ_documents_chain
documents, so you should make sure it ﬁts within the context window
the LLM you are using.
If you want to use OpenAI function calling to OPTIONALLY structured
create_openai_fn_runnable ✅ an output response. You may pass in multiple functions for it call, but it
does not have to call it.
If you want to use OpenAI function calling to FORCE the LLM to
create_structured_output_runnable ✅ respond with a certain function. You may only pass in one function,
and the chain will ALWAYS return this response.
Can be used to generate queries. You must specify a list of allowed
load_query_constructor_runnable operations, and then will return a runnable that converts a natural
language query into those allowed operations.
SQL If you want to construct a query for a SQL database from natural
create_sql_query_chain
Database language.
This chain takes in conversation history and then uses that to
create_history_aware_retriever Retriever
generate a search query which is passed to the underlying retriever.
This chain takes in a user inquiry, which is then passed to the retriever
create_retrieval_chain Retriever to fetch relevant documents. Those documents (and original inputs)
are then passed to an LLM to generate a response

Legacy Chains

Below we report on the legacy chain types that exist. We will maintain support for these until we are able to create a LCEL
alternative. We report on:

Chain

Name of the chain, or name of the constructor method. If constructor method, this will return aChain subclass.

Function Calling

Whether this requires OpenAI Function Calling.

Other Tools

Other tools used in the chain.

When to Use

Our commentary on when to use.

Functio
Chain n Other Tools When to Use
Calling
This chain uses an LLM to convert a query into an API request,
Requests
APIChain then executes that request, gets back a response, and then
Wrapper
passes that request to an LLM to respond
Similar to APIChain, this chain is designed to interact with APIs.
OpenAPI
OpenAPIEndpointChain The main difference is this is optimized for ease of use with
Spec
OpenAPI endpoints
This chain can be used to have conversations with a document.
It takes in a question and (optional) previous conversation history.
If there is previous conversation history, it uses an LLM to rewrite
ConversationalRetrievalChain Retriever the conversation into a query to send to a retriever (otherwise it
just uses the newest user input). It then fetches those documents
and passes them (along with the conversation) to an LLM to
respond.
This chain takes a list of documents and formats them all into a
prompt, then passes that prompt to an LLM. It passes ALL
StuffDocumentsChain
documents, so you should make sure it fits within the context
window the LLM you are using.
This chain combines documents by iterative reducing them. It
groups documents into chunks (less than some context length)
groups documents into chunks (less than some context length)
Functio then passes them into an LLM. It then takes the responses and
ReduceDocumentsChain
Chain n Other Tools continues to do this until it can
When to Use
fit everything into one final LLM
Calling call. Useful when you have a lot of documents, you want to have
the LLM run over all of them, and you can do in parallel.
This chain first passes each document through an LLM, then
reduces them using the ReduceDocumentsChain. Useful in the
MapReduceDocumentsChain
same situations as ReduceDocumentsChain, but does an initial
LLM call before trying to reduce the documents.
This chain collapses documents by generating an initial answer
based on the first document and then looping over the remaining
documents to refine its answer. This operates sequentially, so it
RefineDocumentsChain cannot be parallelized. It is useful in similar situatations as
MapReduceDocuments Chain, but for cases where you want to
build up an answer by refining the previous answer (rather than
parallelizing calls).
This calls on LLM on each document, asking it to not only answer
but also produce a score of how confident it is. The answer with
the highest confidence is then returned. This is useful when you
MapRerankDocumentsChain
have a lot of documents, but only want to answer based on a
single document, rather than trying to combine answers (like
Refine and Reduce methods do).
This chain answers, then attempts to refine its answer based on
ConstitutionalChain constitutional principles that are provided. Use this when you want
to enforce that a chain’s answer follows some principles.
LLMChain
This chain converts a natural language question to an
ElasticSearch ElasticSearch query, and then runs it, and then summarizes the
ElasticsearchDatabaseChain
Instance response. This is useful for when you want to ask natural
language questions of an Elastic Search database
This implements FLARE, an advanced retrieval technique. It is
FlareChain
primarily meant as an exploratory advanced retrieval method.
This chain constructs an Arango query from natural language,
Arango
ArangoGraphQAChain executes that query against the graph, and then passes the
Graph
results back to an LLM to respond.
A graph that
This chain constructs an Cypher query from natural language,
works with
GraphCypherQAChain executes that query against the graph, and then passes the
Cypher query
results back to an LLM to respond.
language
This chain constructs a FalkorDB query from natural language,
Falkor
FalkorDBGraphQAChain executes that query against the graph, and then passes the
Database
results back to an LLM to respond.
This chain constructs an HugeGraph query from natural
HugeGraphQAChain HugeGraph language, executes that query against the graph, and then
passes the results back to an LLM to respond.
This chain constructs a Kuzu Graph query from natural language,
KuzuQAChain Kuzu Graph executes that query against the graph, and then passes the
results back to an LLM to respond.
This chain constructs a Nebula Graph query from natural
Nebula
NebulaGraphQAChain language, executes that query against the graph, and then
Graph
passes the results back to an LLM to respond.
This chain constructs an Neptune Graph query from natural
Neptune
NeptuneOpenCypherQAChain language, executes that query against the graph, and then
Graph
passes the results back to an LLM to respond.
Graph that This chain constructs an SparQL query from natural language,
GraphSparqlChain works with executes that query against the graph, and then passes the
SparQL results back to an LLM to respond.
This chain converts a user question to a math problem and then
LLMMath
executes it (using numexpr)
This chain uses a second LLM call to varify its initial answer. Use
LLMCheckerChain this when you to have an extra layer of validation on the initial
LLM call.
This chain creates a summary using a sequence of LLM calls to
make sure it is extra correct. Use this over the normal
LLMSummarizationChecker
summarization chain when you are okay with multiple LLM calls
(eg you care more about accuracy than speed/cost).
Uses OpenAI function calling to answer questions and cite its
create_citation_fuzzy_match_chain ✅
sources.
sources.
create_extraction_chain Functio
✅ Uses OpenAI Function calling to extract information from text.
Chain n Other Tools When to Use
Uses OpenAI function calling to extract information from text into
Calling
create_extraction_chain_pydantic ✅ a Pydantic model. Compared to create_extraction_chain this has a
tighter integration with Pydantic.
OpenAPI
get_openapi_chain ✅ Uses OpenAI function calling to query an OpenAPI.
Spec
Uses OpenAI function calling to do question answering over text
create_qa_with_structure_chain ✅
and respond in a specific format.
create_qa_with_sources_chain ✅ Uses OpenAI function calling to answer questions with citations.
Creates both questions and answers from documents. Can be
QAGenerationChain used to generate question/answer pairs for evaluation of retrieval
projects.
Does question answering over retrieved documents, and cites it
sources. Use this when you want the answer response to have
sources in the text response. Use this over
RetrievalQAWithSourcesChain Retriever
load_qa_with_sources_chain when you want to use a retriever to fetch
the relevant document as part of the chain (rather than pass them
in).
Does question answering over documents you pass in, and cites
it sources. Use this when you want the answer response to have
load_qa_with_sources_chain Retriever sources in the text response. Use this over
RetrievalQAWithSources when you want to pass in the documents
directly (rather than rely on a retriever to get them).
This chain first does a retrieval step to fetch relevant documents,
RetrievalQA Retriever then passes those documents into an LLM to generate a
response.
This chain routes input between multiple prompts. Use this when
MultiPromptChain you have multiple potential prompts you could use to respond and
want to route to just one.
This chain routes input between multiple retrievers. Use this when
MultiRetrievalQAChain Retriever you have multiple potential retrievers you could fetch relevant
documents from and want to route to just one.
EmbeddingRouterChain This chain uses embedding similarity to route incoming queries.
LLMRouterChain This chain uses an LLM to route between potential options.
load_summarize_chain
This chain constructs a URL from user input, gets data at that
LLMRequestsChain URL, and then summarizes the response. Compared to APIChain,
this chain is not focused on a single API spec but is more general

Help us out by providing feedback on this documentation page:

Previous
« Tools as OpenAI Functions
Next
[Beta] Memory »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesAgentsConcepts

On this page

Concepts
The core idea of agents is to use a language model to choose a sequence of actions to take. In chains, a sequence of
actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take
and in which order.

There are several key components here:

Schema

LangChain has several abstractions to make working with agents easy.

AgentAction

This is a dataclass that represents the action an agent should take. It has a tool property (which is the name of the tool that
should be invoked) and a tool_input property (the input to that tool)

AgentFinish

This represents the ﬁnal result from an agent, when it is ready to return to the user. It contains areturn_values key-value
mapping, which contains the ﬁnal agent output. Usually, this contains an output key containing a string that is the agent's
response.

Intermediate Steps

These represent previous agent actions and corresponding outputs from this CURRENT agent run. These are important to
pass to future iteration so the agent knows what work it has already done. This is typed as a List[Tuple[AgentAction, Any]] . Note
that observation is currently left as type Any to be maximally ﬂexible. In practice, this is often a string.

Agent

This is the chain responsible for deciding what step to take next. This is usually powered by a language model, a prompt, and
an output parser.

Different agents have different prompting styles for reasoning, different ways of encoding inputs, and different ways of
parsing the output. For a full list of built-in agents see agent types. You can also easily build custom agents, should you
need further control.

Agent Inputs

The inputs to an agent are a key-value mapping. There is only one required key:intermediate_steps, which corresponds to
Intermediate Steps as described above.

Generally, the PromptTemplate takes care of transforming these pairs into a format that can best be passed into the LLM.

Agent Outputs

The output is the next action(s) to take or the ﬁnal response to send to the userAgentAction
( s or AgentFinish ). Concretely, this
can be typed as Union[AgentAction, List[AgentAction], AgentFinish] .

The output parser is responsible for taking the raw LLM output and transforming it into one of these three types.
AgentExecutor

The agent executor is the runtime for an agent. This is what actually calls the agent, executes the actions it chooses, passes
the action outputs back to the agent, and repeats. In pseudocode, this looks roughly like:

next_action = agent.get_action(...)
while next_action != AgentFinish:
observation = run(next_action)
next_action = agent.get_action(..., next_action, observation)
return next_action

While this may seem simple, there are several complexities this runtime handles for you, including:

1. Handling cases where the agent selects a non-existent tool

2. Handling cases where the tool errors
3. Handling cases where the agent produces output that cannot be parsed into a tool invocation
4. Logging and observability at all levels (agent decisions, tool calls) to stdout and/or toLangSmith.

Tools

Tools are functions that an agent can invoke. The Tool abstraction consists of two components:

1. The input schema for the tool. This tells the LLM what parameters are needed to call the tool. Without this, it will not
know what the correct inputs are. These parameters should be sensibly named and described.
2. The function to run. This is generally just a Python function that is invoked.

Considerations

There are two important design considerations around tools:

1. Giving the agent access to the right tools

2. Describing the tools in a way that is most helpful to the agent

Without thinking through both, you won't be able to build a working agent. If you don't give the agent access to a correct set
of tools, it will never be able to accomplish the objectives you give it. If you don't describe the tools well, the agent won't know
how to use them properly.

LangChain provides a wide set of built-in tools, but also makes it easy to deﬁne your own (including custom descriptions). For
a full list of built-in tools, see the tools integrations section

Toolkits

For many common tasks, an agent will need a set of related tools. For this LangChain provides the concept of toolkits -
groups of around 3-5 tools needed to accomplish speciﬁc objectives. For example, the GitHub toolkit has a tool for searching
through GitHub issues, a tool for reading a ﬁle, a tool for commenting, etc.

LangChain provides a wide set of toolkits to get started. For a full list of built-in toolkits, see thetoolkits integrations section

Help us out by providing feedback on this documentation page:

Previous
« Quickstart
Next
Agent Types »

Community
Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesAgents

On this page

Agents
The core idea of agents is to use a language model to choose a sequence of actions to take. In chains, a sequence of
actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take
and in which order.

Quickstart

For a quick start to working with agents, please check outthis getting started guide. This covers basics like initializing an
agent, creating tools, and adding memory.

Concepts

There are several key concepts to understand when building agents: Agents, AgentExecutor, Tools, Toolkits. For an in depth
explanation, please check out this conceptual guide

Agent Types

There are many diﬀerent types of agents to use. For a overview of the diﬀerent types and when to use them, please check
out this section.

Tools

Agents are only as good as the tools they have. For a comprehensive guide on tools, please seethis section.

How To Guides

Agents have a lot of related functionality! Check out comprehensive guides including:

Building a custom agent

Streaming (of both intermediate steps and tokens
Building an agent that returns structured output
Lots functionality around using AgentExecutor, including: using it as an iterator, handle parsing errors, returning
intermediate steps, capping the max number of iterations, and timeouts for agents

Help us out by providing feedback on this documentation page:

Previous
« Indexing
Next
Quickstart »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangGraph

On this page

️LangGraph
⚡ Building language agents as graphs ⚡

Overview

LangGraph is a library for building stateful, multi-actor applications with LLMs, built on top of (and intended to be used with)
LangChain. It extends the LangChain Expression Language with the ability to coordinate multiple chains (or actors) across
multiple steps of computation in a cyclic manner. It is inspired by Pregel and Apache Beam. The current interface exposed is
one inspired by NetworkX.

The main use is for adding cycles to your LLM application. Crucially, this is NOT a DAG framework. If you want to build a
DAG, you should just use LangChain Expression Language.

Cycles are important for agent-like behaviors, where you call an LLM in a loop, asking it what action to take next.

Installation

pip install langgraph

Quick Start

Here we will go over an example of creating a simple agent that uses chat models and function calling. This agent will
represent all its state as a list of messages.

We will need to install some LangChain packages, as well asTavily to use as an example tool.

pip install -U langchain langchain_openai tavily-python

We also need to export some environment variables for OpenAI and Tavily API access.

export OPENAI_API_KEY=sk-...
export TAVILY_API_KEY=tvly-...

Optionally, we can set up LangSmith for best-in-class observability.

export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY=ls__...

Set up the tools

We will ﬁrst deﬁne the tools we want to use. For this simple example, we will use a built-in search tool via Tavily. However, it
is really easy to create your own tools - see documentation here on how to do that.

from langchain_community.tools.tavily_search import TavilySearchResults

tools = [TavilySearchResults(max_results=1)]

We can now wrap these tools in a simple LangGraphToolExecutor . This is a simple class that receives ToolInvocation objects,
calls that tool, and returns the output. ToolInvocation is any class with tool and tool_input attributes.
from langgraph.prebuilt import ToolExecutor

tool_executor = ToolExecutor(tools)

Set up the model

Now we need to load the chat model we want to use. Importantly, this should satisfy two criteria:

1. It should work with lists of messages. We will represent all agent state in the form of messages, so it needs to be able
to work well with them.
2. It should work with the OpenAI function calling interface. This means it should either be an OpenAI model or a model
that exposes a similar interface.

Note: these model requirements are not requirements for using LangGraph - they are just requirements for this one example.

from langchain_openai import ChatOpenAI

# We will set streaming=True so that we can stream tokens

# See the streaming section for more information on this.
model = ChatOpenAI(temperature=0, streaming=True)

After we've done this, we should make sure the model knows that it has these tools available to call. We can do this by
converting the LangChain tools into the format for OpenAI function calling, and then bind them to the model class.

from langchain.tools.render import format_tool_to_openai_function

functions = [format_tool_to_openai_function(t) for t in tools]

model = model.bind_functions(functions)

Deﬁne the agent state

The main type of graph in langgraph is the StatefulGraph. This graph is parameterized by a state object that it passes around to
each node. Each node then returns operations to update that state. These operations can either SET speciﬁc attributes on
the state (e.g. overwrite the existing values) or ADD to the existing attribute. Whether to set or add is denoted by annotating
the state object you construct the graph with.

For this example, the state we will track will just be a list of messages. We want each node to just add messages to that list.
Therefore, we will use a TypedDict with one key (messages) and annotate it so that the messages attribute is always added to.

from typing import TypedDict, Annotated, Sequence

import operator
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
messages: Annotated[Sequence[BaseMessage], operator.add]

Deﬁne the nodes

We now need to deﬁne a few diﬀerent nodes in our graph. Inlanggraph, a node can be either a function or a runnable. There
are two main nodes we need for this:

1. The agent: responsible for deciding what (if any) actions to take.
2. A function to invoke tools: if the agent decides to take an action, this node will then execute that action.

We will also need to deﬁne some edges. Some of these edges may be conditional. The reason they are conditional is that
based on the output of a node, one of several paths may be taken. The path that is taken is not known until that node is run
(the LLM decides).

1. Conditional Edge: after the agent is called, we should either:

a. If the agent said to take an action, then the function to invoke tools should be called

b. If the agent said that it was ﬁnished, then it should ﬁnish

2. Normal Edge: after the tools are invoked, it should always go back to the agent to decide what to do next

Let's deﬁne the nodes, as well as a function to decide how what conditional edge to take.
from langgraph.prebuilt import ToolInvocation
import json
from langchain_core.messages import FunctionMessage

# Deﬁne the function that determines whether to continue or not

def should_continue(state):
messages = state['messages']
last_message = messages[-1]
# If there is no function call, then we ﬁnish
if "function_call" not in last_message.additional_kwargs:
return "end"
# Otherwise if there is, we continue
else:
return "continue"

# Deﬁne the function that calls the model

def call_model(state):
messages = state['messages']
response = model.invoke(messages)
# We return a list, because this will get added to the existing list
return {"messages": [response]}

# Deﬁne the function to execute tools

def call_tool(state):
messages = state['messages']
# Based on the continue condition
# we know the last message involves a function call
last_message = messages[-1]
# We construct an ToolInvocation from the function_call
action = ToolInvocation(
tool=last_message.additional_kwargs["function_call"]["name"],
tool_input=json.loads(last_message.additional_kwargs["function_call"]["arguments"]),
)
# We call the tool_executor and get back a response
response = tool_executor.invoke(action)
# We use the response to create a FunctionMessage
function_message = FunctionMessage(content=str(response), name=action.tool)
# We return a list, because this will get added to the existing list
return {"messages": [function_message]}

Deﬁne the graph

We can now put it all together and deﬁne the graph!

from langgraph.graph import StateGraph, END
# Deﬁne a new graph
workﬂow = StateGraph(AgentState)

# Deﬁne the two nodes we will cycle between

workﬂow.add_node("agent", call_model)
workﬂow.add_node("action", call_tool)

# Set the entrypoint as `agent`

# This means that this node is the ﬁrst one called
workﬂow.set_entry_point("agent")

# We now add a conditional edge

workflow.add_conditional_edges(
# First, we define the start node. We use àgent`.
# This means these are the edges taken after the àgent` node is called.
"agent",
# Next, we pass in the function that will determine which node is called next.
should_continue,
# Finally we pass in a mapping.
# The keys are strings, and the values are other nodes.
# END is a special node marking that the graph should finish.
# What will happen is we will call `should_continue`, and then the output of that
# will be matched against the keys in this mapping.
# Based on which one it matches, that node will then be called.
{
# If `tools`, then we call the tool node.
"continue": "action",
# Otherwise we finish.
"end": END
}
)

# We now add a normal edge from `tools` to `agent`.

# This means that after `tools` is called, `agent` node is called next.
workﬂow.add_edge('action', 'agent')

# Finally, we compile it!

# This compiles it into a LangChain Runnable,
# meaning you can use it as you would any other runnable
app = workﬂow.compile()

Use it!

We can now use it! This now exposes thesame interface as all other LangChain runnables. This runnable accepts a list of
messages.

from langchain_core.messages import HumanMessage

inputs = {"messages": [HumanMessage(content="what is the weather in sf")]}

app.invoke(inputs)

This may take a little bit - it's making a few calls behind the scenes. In order to start seeing some intermediate results as they
happen, we can use streaming - see below for more information on that.

Streaming

LangGraph has support for several diﬀerent types of streaming.

Streaming Node Output

One of the beneﬁts of using LangGraph is that it is easy to stream output as it's produced by each node.

inputs = {"messages": [HumanMessage(content="what is the weather in sf")]}

for output in app.stream(inputs):
# stream() yields dictionaries with output keyed by node name
for key, value in output.items():
print(f"Output from node '{key}':")
print("---")
print(value)
print("\n---\n")
Output from node 'agent':
---
{'messages': [AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n "query": "weather in San Francisco"\n}', 'name': 'tavily_search_results_json'

---

Output from node 'action':

---
{'messages': [FunctionMessage(content="[{'url': 'https://weatherspark.com/h/m/557/2024/1/Historical-Weather-in-January-2024-in-San-Francisco-California-United-St

---

Output from node 'agent':

---
{'messages': [AIMessage(content="I couldn't ﬁnd the current weather in San Francisco. However, you can visit [WeatherSpark](https://weatherspark.com/h/m/557/202

---

Output from node 'end':

---
{'messages': [HumanMessage(content='what is the weather in sf'), AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n "query": "weather in S

---

Streaming LLM Tokens

You can also access the LLM tokens as they are produced by each node. In this case only the "agent" node produces LLM
tokens. In order for this to work properly, you must be using an LLM that supports streaming as well as have set it when
constructing the LLM (e.g. ChatOpenAI(model="gpt-3.5-turbo-1106", streaming=True) )

inputs = {"messages": [HumanMessage(content="what is the weather in sf")]}

async for output in app.astream_log(inputs, include_types=["llm"]):
# astream_log() yields the requested logs (here LLMs) in JSONPatch format
for op in output.ops:
if op["path"] == "/streamed_output/-":
# this is the output from .stream()
...
elif op["path"].startswith("/logs/") and op["path"].endswith(
"/streamed_output/-"
):
# because we chose to only include LLMs, these are LLM tokens
print(op["value"])
content='' additional_kwargs={'function_call': {'arguments': '', 'name': 'tavily_search_results_json'}}
content='' additional_kwargs={'function_call': {'arguments': '{\n', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': ' ', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': ' "', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': 'query', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': '":', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': ' "', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': 'weather', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': ' in', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': ' San', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': ' Francisco', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': '"\n', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': '}', 'name': ''}}
content=''
content=''
content='I'
content="'m"
content=' sorry'
content=','
content=' but'
content=' I'
content=' couldn'
content="'t"
content=' ﬁnd'
content=' the'
content=' current'
content=' weather'
content=' in'
content=' San'
content=' Francisco'
content='.'
content=' However'
content=','
content=' you'
content=' can'
content=' check'
content=' the'
content=' historical'
content=' historical'
content=' weather'
content=' data'
content=' for'
content=' January'
content=' '
content='202'
content='4'
content=' in'
content=' San'
content=' Francisco'
content=' ['
content='here'
content=']('
content='https'
content='://'
content='we'
content='athers'
content='park'
content='.com'
content='/h'
content='/m'
content='/'
content='557'
content='/'
content='202'
content='4'
content='/'
content='1'
content='/H'
content='istorical'
content='-'
content='Weather'
content='-in'
content='-Jan'
content='uary'
content='-'
content='202'
content='4'
content='-in'
content='-S'
content='an'
content='-F'
content='r'
content='anc'
content='isco'
content='-Cal'
content='ifornia'
content='-'
content='United'
content='-'
content='States'
content=').'
content=''

When to Use

When should you use this versus LangChain Expression Language?

If you need cycles.

Langchain Expression Language allows you to easily deﬁne chains (DAGs) but does not have a good mechanism for adding
in cycles. langgraph adds that syntax.

How-to Guides

These guides show how to use LangGraph in particular ways.

Async

If you are running LangGraph in async workﬂows, you may want to create the nodes to be async by default. For a
walkthrough on how to do that, see this documentation

Streaming Tokens
Sometimes language models take a while to respond and you may want to stream tokens to end users. For a guide on how
to do this, see this documentation

Persistence

LangGraph comes with built-in persistence, allowing you to save the state of the graph at point and resume from there. For a
walkthrough on how to do that, see this documentation

Human-in-the-loop

LangGraph comes with built-in support for human-in-the-loop workﬂows. This is useful when you want to have a human
review the current state before proceeding to a particular node. For a walkthrough on how to do that, see this documentation

Visualizing the graph

Agents you create with LangGraph can be complex. In order to make it easier to understand what is happening under the
hood, we've added methods to print out and visualize the graph. This can create both ascii art as well as pngs. For a
walkthrough on how to do that, see this documentation

Examples

ChatAgentExecutor: with function calling

This agent executor takes a list of messages as input and outputs a list of messages. All agent state is represented as a list
of messages. This speciﬁcally uses OpenAI function calling. This is recommended agent executor for newer chat based
models that support function calling.

Getting Started Notebook: Walks through creating this type of executor from scratch
High Level Entrypoint: Walks through how to use the high level entrypoint for the chat agent executor.

Modiﬁcations

We also have a lot of examples highlighting how to slightly modify the base chat agent executor. These all build oﬀ the
getting started notebook so it is recommended you start with that ﬁrst.

Human-in-the-loop: How to add a human-in-the-loop component

Force calling a tool first: How to always call a specific tool first
Respond in a specific format: How to force the agent to respond in a specific format
Dynamically returning tool output directly: How to dynamically let the agent choose whether to return the result of a tool
directly to the user
Managing agent steps: How to more explicitly manage intermediate steps that an agent takes

AgentExecutor

This agent executor uses existing LangChain agents.

Getting Started Notebook: Walks through creating this type of executor from scratch
High Level Entrypoint: Walks through how to use the high level entrypoint for the chat agent executor.

Modiﬁcations

We also have a lot of examples highlighting how to slightly modify the base chat agent executor. These all build oﬀ the
getting started notebook so it is recommended you start with that ﬁrst.

Human-in-the-loop: How to add a human-in-the-loop component

Force calling a tool first: How to always call a specific tool first
Managing agent steps: How to more explicitly manage intermediate steps that an agent takes

Planning Agent Examples

The following notebooks implement agent architectures prototypical of the "plan-and-execute" style, where an LLM planner
decomposes a user request into a program, an executor executes the program, and an LLM synthesizes a response (and/or
dynamically replans) based on the program outputs.

Plan-and-execute: a simple agent with a planner that generates a multi-step task list, an executor that invokes the
tools in the plan, and a replanner that responds or generates an updated plan. Based on thePlan-and-solve paper by
Wang, et. al.
Reasoning without Observation: planner generates a task list whose observations are saved asvariables. Variables
can be used in subsequent tasks to reduce the need for further re-planning. Based on the ReWOO paper by Xu, et. al.
LLMCompiler: planner generates a DAG of tasks with variable responses. Tasks arestreamed and executed eagerly to
minimize tool execution runtime. Based on the paper by Kim, et. al.

Reﬂection / Self-Critique

When output quality is a major concern, it's common to incorporate some combination of self-critique or reﬂection and
external validation to reﬁne your system's outputs. The following examples demonstrate research that implement this type of
design.

Basic Reflection: add a simple "reflect" step in your graph to prompt your system to revise its outputs.
Reflexion: critique missing and superflous aspects of the agent's response to guide subsequent steps. Based on
Reflexion, by Shinn, et. al.
Language Agent Tree Search: execute multiple agents in parallel, using reflection and environmental rewards to drive a
Monte Carlo Tree Search. Based on LATS, by Zhou, et. al.

Multi-agent Examples

Multi-agent collaboration: how to create two agents that work together to accomplish a task
Multi-agent with supervisor: how to orchestrate individual agents by using an LLM as a "supervisor" to distribute work
Hierarchical agent teams: how to orchestrate "teams" of agents as nested graphs that can collaborate to solve a
problem

Web Research

STORM: writing system that generates Wikipedia-style articles on any topic, applying outline generation (planning) +
multi-perspective question-answering for added breadth and reliability. Based on STORM by Shao, et. al.

Chatbot Evaluation via Simulation

It can often be tough to evaluation chat bots in multi-turn situations. One way to do this is with simulations.

Chat bot evaluation as multi-agent simulation: how to simulate a dialogue between a "virtual user" and your chat bot
Evaluating over a dataset: benchmark your assistant over a LangSmith dataset, which tasks a simulated customer to
red-team your chat bot.

Multimodal Examples

WebVoyager: vision-enabled web browsing agent that uses Set-of-marks prompting to navigate a web browser and
execute tasks

Chain-of-Table

Chain of Table is a framework that elicits SOTA performance when answering questions over tabular data.This
implementation by Github user CYQIQ uses LangGraph to control the ﬂow.

Documentation

There are only a few new APIs to use.

StateGraph

The main entrypoint is StateGraph.

from langgraph.graph import StateGraph

This class is responsible for constructing the graph. It exposes an interface inspired byNetworkX. This graph is
parameterized by a state object that it passes around to each node.

__init__
def __init__(self, schema: Type[Any]) -> None:

When constructing the graph, you need to pass in a schema for a state. Each node then returns operations to update that
state. These operations can either SET speciﬁc attributes on the state (e.g. overwrite the existing values) or ADD to the
existing attribute. Whether to set or add is denoted by annotating the state object you construct the graph with.

The recommended way to specify the schema is with a typed dictionary:from typing import TypedDict
You can then annotate the diﬀerent attributes using from typing imoport Annotated . Currently, the only supported annotation is import
operator; operator.add. This annotation will make it so that any node that returns this attribute ADDS that new result to the
existing value.

Let's take a look at an example:

from typing import TypedDict, Annotated, Union

from langchain_core.agents import AgentAction, AgentFinish
import operator

class AgentState(TypedDict):
# The input string
input: str
# The outcome of a given call to the agent
# Needs `None` as a valid type, since this is what this will start as
agent_outcome: Union[AgentAction, AgentFinish, None]
# List of actions and corresponding observations
# Here we annotate this with `operator.add` to indicate that operations to
# this state should be ADDED to the existing values (not overwrite it)
intermediate_steps: Annotated[list[tuple[AgentAction, str]], operator.add]

We can then use this like:

# Initialize the StateGraph with this state

graph = StateGraph(AgentState)
# Create nodes and edges
...
# Compile the graph
app = graph.compile()

# The inputs should be a dictionary, because the state is a TypedDict

inputs = {
# Let's assume this the input
"input": "hi"
# Let's assume agent_outcome is set by the graph as some point
# It doesn't need to be provided, and it will be None by default
# Let's assume ìntermediate_steps` is built up over time by the graph
# It doesn't need to provided, and it will be empty list by default
# The reason ìntermediate_steps` is an empty list and not `None` is because
# it's annotated with òperator.add`
}

.add_node
def add_node(self, key: str, action: RunnableLike) -> None:

This method adds a node to the graph. It takes two arguments:

key: A string representing the name of the node. This must be unique.
action : The action to take when this node is called. This should either be a function or a runnable.

.add_edge
def add_edge(self, start_key: str, end_key: str) -> None:

Creates an edge from one node to the next. This means that output of the ﬁrst node will be passed to the next node. It takes
two arguments.

start_key: A string representing the name of the start node. This key must have already been registered in the graph.
end_key: A string representing the name of the end node. This key must have already been registered in the graph.

.add_conditional_edges
def add_conditional_edges(
self,
start_key: str,
condition: Callable[..., str],
conditional_edge_mapping: Dict[str, str],
) -> None:

This method adds conditional edges. What this means is that only one of the downstream edges will be taken, and which one
that is depends on the results of the start node. This takes three arguments:

start_key: A string representing the name of the start node. This key must have already been registered in the graph.
condition: A function to call to decide what to do next. The input will be the output of the start node. It should return a
string that is present in conditional_edge_mapping and represents the edge to take.
conditional_edge_mapping: A mapping of string to string. The keys should be strings that may be returned bycondition. The
values should be the downstream node to call if that condition is returned.

.set_entry_point
def set_entry_point(self, key: str) -> None:

The entrypoint to the graph. This is the node that is ﬁrst called. It only takes one argument:

key: The name of the node that should be called ﬁrst.

.set_conditional_entry_point
def set_conditional_entry_point(
self,
condition: Callable[..., str],
conditional_edge_mapping: Optional[Dict[str, str]] = None,
) -> None:

This method adds a conditional entry point. What this means is that when the graph is called, it will call thecondition Callable
to decide what node to enter into ﬁrst.

condition: A function to call to decide what to do next. The input will be the input to the graph. It should return a string
that is present in conditional_edge_mapping and represents the edge to take.
conditional_edge_mapping: A mapping of string to string. The keys should be strings that may be returned bycondition. The
values should be the downstream node to call if that condition is returned.

.set_ﬁnish_point
def set_ﬁnish_point(self, key: str) -> None:

This is the exit point of the graph. When this node is called, the results will be the ﬁnal result from the graph. It only has one
argument:

key: The name of the node that, when called, will return the results of calling it as the ﬁnal output

Note: This does not need to be called if at any point you previously created an edge (conditional or normal) toEND

Graph
from langgraph.graph import Graph

graph = Graph()

This has the same interface as StateGraph with the exception that it doesn't update a state object over time, and rather relies
on passing around the full state from each step. This means that whatever is returned from one node is the input to the next
as is.

END

from langgraph.graph import END

This is a special node representing the end of the graph. This means that anything passed to this node will be the ﬁnal output
of the graph. It can be used in two places:

As the end_key in add_edge

As a value in conditional_edge_mapping as passed to add_conditional_edges

Prebuilt Examples

There are also a few methods we've added to make it easy to use common, prebuilt graphs and components.

ToolExecutor
from langgraph.prebuilt import ToolExecutor

This is a simple helper class to help with calling tools. It is parameterized by a list of tools:

tools = [...]
tool_executor = ToolExecutor(tools)

It then exposes a runnable interface. It can be used to call tools: you can pass in anAgentAction and it will look up the
relevant tool and call it with the appropriate input.
chat_agent_executor.create_function_calling_executor
from langgraph.prebuilt import chat_agent_executor

This is a helper function for creating a graph that works with a chat model that utilizes function calling. Can be created by
passing in a model and a list of tools. The model must be one that supports OpenAI function calling.

from langchain_openai import ChatOpenAI

from langchain_community.tools.tavily_search import TavilySearchResults
from langgraph.prebuilt import chat_agent_executor
from langchain_core.messages import HumanMessage

tools = [TavilySearchResults(max_results=1)]
model = ChatOpenAI()

app = chat_agent_executor.create_function_calling_executor(model, tools)

inputs = {"messages": [HumanMessage(content="what is the weather in sf")]}

for s in app.stream(inputs):
print(list(s.values())[0])
print("----")

chat_agent_executor.create_tool_calling_executor
from langgraph.prebuilt import chat_agent_executor

This is a helper function for creating a graph that works with a chat model that utilizes tool calling. Can be created by passing
in a model and a list of tools. The model must be one that supports OpenAI tool calling.

from langchain_openai import ChatOpenAI

from langchain_community.tools.tavily_search import TavilySearchResults
from langgraph.prebuilt import chat_agent_executor
from langchain_core.messages import HumanMessage

tools = [TavilySearchResults(max_results=1)]
model = ChatOpenAI()

app = chat_agent_executor.create_tool_calling_executor(model, tools)

inputs = {"messages": [HumanMessage(content="what is the weather in sf")]}

for s in app.stream(inputs):
print(list(s.values())[0])
print("----")

create_agent_executor
from langgraph.prebuilt import create_agent_executor

This is a helper function for creating a graph that works withLangChain Agents. Can be created by passing in an agent and
a list of tools.

from langgraph.prebuilt import create_agent_executor

from langchain_openai import ChatOpenAI
from langchain import hub
from langchain.agents import create_openai_functions_agent
from langchain_community.tools.tavily_search import TavilySearchResults

tools = [TavilySearchResults(max_results=1)]

# Get the prompt to use - you can modify this!

prompt = hub.pull("hwchase17/openai-functions-agent")

# Choose the LLM that will drive the agent

llm = ChatOpenAI(model="gpt-3.5-turbo-1106")

# Construct the OpenAI Functions agent

agent_runnable = create_openai_functions_agent(llm, tools, prompt)

app = create_agent_executor(agent_runnable, tools)

inputs = {"input": "what is the weather in sf", "chat_history": []}

for s in app.stream(inputs):
print(list(s.values())[0])
print("----")

Help us out by providing feedback on this documentation page:

Previous
« LangSmith Walkthrough

Community

Discord

Twitter
GitHub

Python

JS/TS
More

Homepage

Blog
YouTube
Token
ModulesMoreCallbackscounting

Token counting
LangChain oﬀers a context manager that allows you to count tokens.

import asyncio

from langchain.callbacks import get_openai_callback

from langchain_openai import OpenAI

llm = OpenAI(temperature=0)
with get_openai_callback() as cb:
llm("What is the square root of 4?")

total_tokens = cb.total_tokens
assert total_tokens > 0

with get_openai_callback() as cb:

llm("What is the square root of 4?")
llm("What is the square root of 4?")

assert cb.total_tokens == total_tokens * 2

# You can kick oﬀ concurrent runs from within the context manager
with get_openai_callback() as cb:
await asyncio.gather(
*[llm.agenerate(["What is the square root of 4?"]) for _ in range(3)]
)

assert cb.total_tokens == total_tokens * 3

# The context manager is concurrency safe

task = asyncio.create_task(llm.agenerate(["What is the square root of 4?"]))
with get_openai_callback() as cb:
await llm.agenerate(["What is the square root of 4?"])

await task
assert cb.total_tokens == total_tokens

Help us out by providing feedback on this documentation page:

Previous
« Tags
Next
️ LangServe »

Community

Discord
Twitter
GitHub
Python
JS/TS
More

Homepage

Blog
YouTube
Model
ModulesI/O LLMs

On this page

LLMs
Large Language Models (LLMs) are a core component of LangChain. LangChain does not serve its own LLMs, but rather
provides a standard interface for interacting with many diﬀerent LLMs. To be speciﬁc, this interface is one that takes as input
a string and returns a string.

There are lots of LLM providers (OpenAI, Cohere, Hugging Face, etc) - theLLM class is designed to provide a standard
interface for all of them.

Quick Start

Check out this quick start to get an overview of working with LLMs, including all the diﬀerent methods they expose

Integrations

For a full list of all LLM integrations that LangChain provides, please go to theIntegrations page

How-To Guides

We have several how-to guides for more advanced usage of LLMs. This includes:

How to write a custom LLM class

How to cache LLM responses
How to stream responses from an LLM
How to track token usage in an LLM call

Help us out by providing feedback on this documentation page:

Previous
« Tracking token usage
Next
Quick Start »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Document
ModulesRetrievalloaders PDF

On this page

PDF
Portable Document Format (PDF), standardized as ISO 32000, is a ﬁle format developed by Adobe in 1992 to
present documents, including text formatting and images, in a manner independent of application software,
hardware, and operating systems.

This covers how to load PDF documents into the Document format that we use downstream.

Using PyPDF

Load PDF using pypdf into array of documents, where each document contains the page content and metadata withpage
number.

pip install pypdf

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("example_data/layout-parser-paper.pdf")
pages = loader.load_and_split()
pages[0]
Document(page_content='LayoutParser : A Uni\x0ced Toolkit for Deep\nLearning Based Document Image Analysis\nZejiang Shen1( \x00), Ruochen Zhang2, Meli

An advantage of this approach is that documents can be retrieved with page numbers.

We want to use OpenAIEmbeddings so we have to get the OpenAI API Key.

import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')

OpenAI API Key: ········
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

faiss_index = FAISS.from_documents(pages, OpenAIEmbeddings())

docs = faiss_index.similarity_search("How will the community be engaged?", k=2)
for doc in docs:
print(str(doc.metadata["page"]) + ":", doc.page_content[:300])
9: 10 Z. Shen et al.
Fig. 4: Illustration of (a) the original historical Japanese document with layout
detection results and (b) a recreated version of the document image that achieves
much better character recognition recall. The reorganization algorithm rearranges
the tokens based on the their detect
3: 4 Z. Shen et al.
Eﬃcient Data AnnotationC u s t o m i z e d M o d e l T r a i n i n gModel Cust omizationDI A Model HubDI A Pipeline SharingCommunity PlatformLa y out Detectio
T h e C o r e L a y o u t P a r s e r L i b r a r yOCR ModuleSt or age & VisualizationLa y ou

Extracting images

Using the rapidocr-onnxruntime package we can extract images as text as well:

pip install rapidocr-onnxruntime

loader = PyPDFLoader("https://arxiv.org/pdf/2103.15348.pdf", extract_images=True)
pages = loader.load()
pages[4].page_content
'LayoutParser : A Uniﬁed Toolkit for DL-Based DIA 5\nTable 1: Current layout detection models in the LayoutParser model zoo\nDataset Base Model1Large Model No
Using MathPix

Inspired by Daniel Gross's https://gist.github.com/danielgross/3ab4104e14faccc12b49200843adab21

from langchain_community.document_loaders import MathpixPDFLoader

loader = MathpixPDFLoader("example_data/layout-parser-paper.pdf")
data = loader.load()

Using Unstructured

from langchain_community.document_loaders import UnstructuredPDFLoader

loader = UnstructuredPDFLoader("example_data/layout-parser-paper.pdf")
data = loader.load()

Retain Elements

Under the hood, Unstructured creates diﬀerent "elements" for diﬀerent chunks of text. By default we combine those together,
but you can easily keep that separation by specifying mode="elements".

loader = UnstructuredPDFLoader("example_data/layout-parser-paper.pdf", mode="elements")

data = loader.load()
data[0]
Document(page_content='LayoutParser: A Uniﬁed Toolkit for Deep\nLearning Based Document Image Analysis\nZejiang Shen1 (�), Ruochen Zhang2, Melissa D

Fetching remote PDFs using Unstructured

This covers how to load online PDFs into a document format that we can use downstream. This can be used for various
online PDF sites such as https://open.umn.edu/opentextbooks/textbooks/ and https://arxiv.org/archive/

Note: all other PDF loaders can also be used to fetch remote PDFs, butOnlinePDFLoader is a legacy function, and works
speciﬁcally with UnstructuredPDFLoader.

from langchain_community.document_loaders import OnlinePDFLoader

loader = OnlinePDFLoader("https://arxiv.org/pdf/2302.03803.pdf")
data = loader.load()
print(data)
[Document(page_content='A WEAK ( k, k ) -LEFSCHETZ THEOREM FOR PROJECTIVE TORIC ORBIFOLDS\n\nWilliam D. Montoya\n\nInstituto de Matem´atica,

Using PyPDFium2

from langchain_community.document_loaders import PyPDFium2Loader

loader = PyPDFium2Loader("example_data/layout-parser-paper.pdf")
data = loader.load()

Using PDFMiner

from langchain_community.document_loaders import PDFMinerLoader

loader = PDFMinerLoader("example_data/layout-parser-paper.pdf")
data = loader.load()

Using PDFMiner to generate HTML text

This can be helpful for chunking texts semantically into sections as the output html content can be parsed viaBeautifulSoup to
get more structured and rich information about font size, page numbers, PDF headers/footers, etc.

from langchain_community.document_loaders import PDFMinerPDFasHTMLLoader

loader = PDFMinerPDFasHTMLLoader("example_data/layout-parser-paper.pdf")
data = loader.load()[0] # entire PDF is loaded as a single Document
from bs4 import BeautifulSoup
soup = BeautifulSoup(data.page_content,'html.parser')
content = soup.find_all('div')
import re
cur_fs = None
cur_text = ''
snippets = [] # first collect all snippets that have the same font size
for c in content:
sp = c.find('span')
if not sp:
continue
st = sp.get('style')
if not st:
continue
fs = re.findall('font-size:(\d+)px',st)
if not fs:
continue
fs = int(fs[0])
if not cur_fs:
cur_fs = fs
if fs == cur_fs:
cur_text += c.text
else:
snippets.append((cur_text,cur_fs))
cur_fs = fs
cur_text = c.text
snippets.append((cur_text,cur_fs))
# Note: The above logic is very straightforward. One can also add more strategies such as removing duplicate snippets (as
# headers/footers in a PDF appear on multiple pages so if we find duplicates it's safe to assume that it is redundant info)
from langchain.docstore.document import Document
cur_idx = -1
semantic_snippets = []
# Assumption: headings have higher font size than their respective content
for s in snippets:
# if current snippet's font size > previous section's heading => it is a new heading
if not semantic_snippets or s[1] > semantic_snippets[cur_idx].metadata['heading_font']:
metadata={'heading':s[0], 'content_font': 0, 'heading_font': s[1]}
metadata.update(data.metadata)
semantic_snippets.append(Document(page_content='',metadata=metadata))
cur_idx += 1
continue

# if current snippet's font size <= previous section's content => content belongs to the same section (one can also create
# a tree like structure for sub sections if needed but that may require some more thinking and may be data speciﬁc)
if not semantic_snippets[cur_idx].metadata['content_font'] or s[1] <= semantic_snippets[cur_idx].metadata['content_font']:
semantic_snippets[cur_idx].page_content += s[0]
semantic_snippets[cur_idx].metadata['content_font'] = max(s[1], semantic_snippets[cur_idx].metadata['content_font'])
continue

# if current snippet's font size > previous section's content but less than previous section's heading than also make a new
# section (e.g. title of a PDF will have the highest font size but we don't want it to subsume all sections)
metadata={'heading':s[0], 'content_font': 0, 'heading_font': s[1]}
metadata.update(data.metadata)
semantic_snippets.append(Document(page_content='',metadata=metadata))
cur_idx += 1
semantic_snippets[4]
Document(page_content='Recently, various DL models and datasets have been developed for layout analysis\ntasks. The dhSegment [22] utilizes fully convolution

Using PyMuPDF

This is the fastest of the PDF parsing options, and contains detailed metadata about the PDF and its pages, as well as
returns one document per page.

from langchain_community.document_loaders import PyMuPDFLoader

loader = PyMuPDFLoader("example_data/layout-parser-paper.pdf")
data = loader.load()
data[0]
Document(page_content='LayoutParser: A Uniﬁed Toolkit for Deep\nLearning Based Document Image Analysis\nZejiang Shen1 (�), Ruochen Zhang2, Melissa D

Additionally, you can pass along any of the options from thePyMuPDF documentation as keyword arguments in the load call,
and it will be pass along to the get_text() call.

PyPDF Directory

Load PDFs from directory

from langchain_community.document_loaders import PyPDFDirectoryLoader
loader = PyPDFDirectoryLoader("example_data/")
docs = loader.load()

Using PDFPlumber

Like PyMuPDF, the output Documents contain detailed metadata about the PDF and its pages, and returns one document per
page.

from langchain_community.document_loaders import PDFPlumberLoader

loader = PDFPlumberLoader("example_data/layout-parser-paper.pdf")
data = loader.load()
data[0]
Document(page_content='LayoutParser: A Uniﬁed Toolkit for Deep\nLearning Based Document Image Analysis\nZejiang Shen1 ((cid:0)), Ruochen Zhang2, Meliss

Using AmazonTextractPDFParser

The AmazonTextractPDFLoader calls the Amazon Textract Service to convert PDFs into a Document structure. The loader
does pure OCR at the moment, with more features like layout support planned, depending on demand. Single and multi-page
documents are supported with up to 3000 pages and 512 MB of size.

For the call to be successful an AWS account is required, similar to theAWS CLI requirements.

Besides the AWS conﬁguration, it is very similar to the other PDF loaders, while also supporting JPEG, PNG and TIFF and
non-native PDF formats.

from langchain_community.document_loaders import AmazonTextractPDFLoader

loader = AmazonTextractPDFLoader("example_data/alejandro_rosalez_sample-small.jpeg")
documents = loader.load()

Help us out by providing feedback on this documentation page:

Previous
« Markdown
Next
Text Splitters »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
OpenAI
ModulesAgentsAgent Typesassistants

On this page

OpenAI assistants
The Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions
and can leverage models, tools, and knowledge to respond to user queries. The Assistants API currently supports
three types of tools: Code Interpreter, Retrieval, and Function calling

You can interact with OpenAI Assistants using OpenAI tools or custom tools. When using exclusively OpenAI tools, you can
just invoke the assistant directly and get ﬁnal answers. When using custom tools, you can run the assistant and tool
execution loop using the built-in AgentExecutor or easily write your own executor.

Below we show the diﬀerent ways to interact with Assistants. As a simple example, let’s build a math tutor that can write and
run code.

Using only OpenAI tools

from langchain.agents.openai_assistant import OpenAIAssistantRunnable
interpreter_assistant = OpenAIAssistantRunnable.create_assistant(
name="langchain assistant",
instructions="You are a personal math tutor. Write and run code to answer math questions.",
tools=[{"type": "code_interpreter"}],
model="gpt-4-1106-preview",
)
output = interpreter_assistant.invoke({"content": "What's 10 - 4 raised to the 2.7"})
output
[ThreadMessage(id='msg_qgxkD5kvkZyl0qOaL4czPFkZ', assistant_id='asst_0T8S7CJuUa4Y4hm1PF6n62v7', content=[MessageContentText(text=Text(annotations

As a LangChain agent with arbitrary tools

Now let’s recreate this functionality using our own tools. For this example we’ll use theE2B sandbox runtime tool.

%pip install --upgrade --quiet e2b duckduckgo-search

import getpass

from langchain.tools import DuckDuckGoSearchRun, E2BDataAnalysisTool

tools = [E2BDataAnalysisTool(api_key=getpass.getpass()), DuckDuckGoSearchRun()]

agent = OpenAIAssistantRunnable.create_assistant(
name="langchain assistant e2b tool",
instructions="You are a personal math tutor. Write and run code to answer math questions. You can also search the internet.",
tools=tools,
model="gpt-4-1106-preview",
as_agent=True,
)

Using AgentExecutor

The OpenAIAssistantRunnable is compatible with the AgentExecutor, so we can pass it in as an agent directly to the
executor. The AgentExecutor handles calling the invoked tools and uploading the tool outputs back to the Assistants API.
Plus it comes with built-in LangSmith tracing.

from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools)

agent_executor.invoke({"content": "What's the weather in SF today divided by 2.7"})
{'content': "What's the weather in SF today divided by 2.7",
'output': "The search results indicate that the weather in San Francisco is 67 °F. Now I will divide this temperature by 2.7 and provide you with the result. Please note
'thread_id': 'thread_hcpYI0tfpB9mHa9d95W7nK2B',
'run_id': 'run_qOuVmPXS9xlV3XNPcfP8P9W2'}
LangSmith trace

Custom execution

Or with LCEL we can easily write our own execution loop for running the assistant. This gives us full control over execution.

agent = OpenAIAssistantRunnable.create_assistant(
name="langchain assistant e2b tool",
instructions="You are a personal math tutor. Write and run code to answer math questions.",
tools=tools,
model="gpt-4-1106-preview",
as_agent=True,
)
from langchain_core.agents import AgentFinish

def execute_agent(agent, tools, input):

tool_map = {tool.name: tool for tool in tools}
response = agent.invoke(input)
while not isinstance(response, AgentFinish):
tool_outputs = []
for action in response:
tool_output = tool_map[action.tool].invoke(action.tool_input)
print(action.tool, action.tool_input, tool_output, end="\n\n")
tool_outputs.append(
{"output": tool_output, "tool_call_id": action.tool_call_id}
)
response = agent.invoke(
{
"tool_outputs": tool_outputs,
"run_id": action.run_id,
"thread_id": action.thread_id,
}
)

return response
response = execute_agent(agent, tools, {"content": "What's 10 - 4 raised to the 2.7"})
print(response.return_values["output"])
e2b_data_analysis {'python_code': 'result = 10 - 4 ** 2.7\nprint(result)'} {"stdout": "-32.22425314473263", "stderr": "", "artifacts": []}

$ 10 - 4^{2.7} $ equals approximately -32.224.

Using existing Thread

To use an existing thread we just need to pass the “thread_id” in when invoking the agent.

next_response = execute_agent(
agent,
tools,
{"content": "now add 17.241", "thread_id": response.return_values["thread_id"]},
)
print(next_response.return_values["output"])
e2b_data_analysis {'python_code': 'result = 10 - 4 ** 2.7 + 17.241\nprint(result)'} {"stdout": "-14.983253144732629", "stderr": "", "artifacts": []}

$ 10 - 4^{2.7} + 17.241 $ equals approximately -14.983.

Using existing Assistant

To use an existing Assistant we can initialize the OpenAIAssistantRunnable directly with an assistant_id .

agent = OpenAIAssistantRunnable(assistant_id="<ASSISTANT_ID>", as_agent=True)

Help us out by providing feedback on this documentation page:

Previous
« Self-ask with search
Next
Custom agent »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesMoreCallbacksTags

Tags
You can add tags to your callbacks by passing atags argument to the call()/run()/apply() methods. This is useful for filtering your
logs, e.g. if you want to log all requests made to a specific LLMChain, you can add a tag, and then filter your logs by that tag.
You can pass tags to both constructor and request callbacks, see the examples above for details. These tags are then
passed to the tags argument of the "start" callback methods, ie. on_llm_start, on_chat_model_start, on_chain_start, on_tool_start.

Help us out by providing feedback on this documentation page:

Previous
« Multiple callback handlers
Next
Token counting »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Get startedQuickstart

On this page

Quickstart
In this quickstart we'll show you how to:

Get setup with LangChain, LangSmith and LangServe

Use the most basic and common components of LangChain: prompt templates, models, and output parsers
Use LangChain Expression Language, the protocol that LangChain is built on and which facilitates component chaining
Build a simple application with LangChain
Trace your application with LangSmith
Serve your application with LangServe

That's a fair amount to cover! Let's dive in.

Setup

Jupyter Notebook

This guide (and most of the other guides in the documentation) useJupyter notebooks and assume the reader is as well.
Jupyter notebooks are perfect for learning how to work with LLM systems because often times things can go wrong
(unexpected output, API down, etc) and going through guides in an interactive environment is a great way to better
understand them.

You do not NEED to go through the guide in a Jupyter Notebook, but it is recommended. Seehere for instructions on how to
install.

Installation

To install LangChain run:

Pip
Conda

pip install langchain

For more details, see our Installation guide.

LangSmith

Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these
applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain
or agent. The best way to do this is with LangSmith.

Note that LangSmith is not needed, but it is helpful. If you do want to use LangSmith, after you sign up at the link above,
make sure to set your environment variables to start logging traces:

export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="..."

Building with LangChain

LangChain enables building application that connect external sources of data and computation to LLMs. In this quickstart, we
will walk through a few diﬀerent ways of doing that. We will start with a simple LLM chain, which just relies on information in
the prompt template to respond. Next, we will build a retrieval chain, which fetches data from a separate database and
passes that into the prompt template. We will then add in chat history, to create a conversation retrieval chain. This allows
you to interact in a chat manner with this LLM, so it remembers previous questions. Finally, we will build an agent - which
utilizes an LLM to determine whether or not it needs to fetch data to answer questions. We will cover these at a high level,
but there are lot of details to all of these! We will link to relevant docs.

LLM Chain

We'll show how to use models available via API, like OpenAI, and local open source models, using integrations like Ollama.

OpenAI
Local (using Ollama)
Anthropic
Cohere

First we'll need to import the LangChain x OpenAI integration package.

pip install langchain-openai

Accessing the API requires an API key, which you can get by creating an account and headinghere. Once we have a key
we'll want to set it as an environment variable by running:

export OPENAI_API_KEY="..."

We can then initialize the model:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI()

If you'd prefer not to set an environment variable you can pass the key in directly via theopenai_api_key named parameter
when initiating the OpenAI LLM class:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(openai_api_key="...")

Once you've installed and initialized the LLM of your choice, we can try using it! Let's ask it what LangSmith is - this is
something that wasn't present in the training data so it shouldn't have a very good response.

llm.invoke("how can langsmith help with testing?")

We can also guide it's response with a prompt template. Prompt templates are used to convert raw user input to a better
input to the LLM.

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
("system", "You are world class technical documentation writer."),
("user", "{input}")
])

We can now combine these into a simple LLM chain:

chain = prompt | llm

We can now invoke it and ask the same question. It still won't know the answer, but it should respond in a more proper tone
for a technical writer!

chain.invoke({"input": "how can langsmith help with testing?"})

The output of a ChatModel (and therefore, of this chain) is a message. However, it's often much more convenient to work with
strings. Let's add a simple output parser to convert the chat message to a string.

from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

We can now add this to the previous chain:

chain = prompt | llm | output_parser

We can now invoke it and ask the same question. The answer will now be a string (rather than a ChatMessage).
chain.invoke({"input": "how can langsmith help with testing?"})

Diving Deeper

We've now successfully set up a basic LLM chain. We only touched on the basics of prompts, models, and output parsers -
for a deeper dive into everything mentioned here, see this section of documentation.

Retrieval Chain

In order to properly answer the original question ("how can langsmith help with testing?"), we need to provide additional
context to the LLM. We can do this via retrieval. Retrieval is useful when you have too much data to pass to the LLM
directly. You can then use a retriever to fetch only the most relevant pieces and pass those in.

In this process, we will look up relevant documents from aRetriever and then pass them into the prompt. A Retriever can be
backed by anything - a SQL table, the internet, etc - but in this instance we will populate a vector store and use that as a
retriever. For more information on vectorstores, see this documentation.

First, we need to load the data that we want to index. In order to do this, we will use the WebBaseLoader. This requires
installing BeautifulSoup:

pip install beautifulsoup4

After that, we can import and use WebBaseLoader.

from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide")

docs = loader.load()

Next, we need to index it into a vectorstore. This requires a few components, namely anembedding model and a vectorstore.

For embedding models, we once again provide examples for accessing via API or by running local models.

OpenAI (API)
Local (using Ollama)
Cohere (API)

Make sure you have the `langchain_openai` package installed an the appropriate environment variables set (these are the
same as needed for the LLM).
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

Now, we can use this embedding model to ingest documents into a vectorstore. We will use a simple local vectorstore,
FAISS, for simplicity's sake.

First we need to install the required packages for that:

pip install faiss-cpu

Then we can build our index:

from langchain_community.vectorstores import FAISS

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, embeddings)

Now that we have this data indexed in a vectorstore, we will create a retrieval chain. This chain will take an incoming
question, look up relevant documents, then pass those documents along with the original question into an LLM and ask it to
answer the original question.

First, let's set up the chain that takes a question and the retrieved documents and generates an answer.
from langchain.chains.combine_documents import create_stuﬀ_documents_chain

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

document_chain = create_stuﬀ_documents_chain(llm, prompt)

If we wanted to, we could run this ourselves by passing in documents directly:

from langchain_core.documents import Document

document_chain.invoke({
"input": "how can langsmith help with testing?",
"context": [Document(page_content="langsmith can let you visualize test results")]
})

However, we want the documents to ﬁrst come from the retriever we just set up. That way, for a given question we can use
the retriever to dynamically select the most relevant documents and pass those in.

from langchain.chains import create_retrieval_chain

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

We can now invoke this chain. This returns a dictionary - the response from the LLM is in theanswer key

response = retrieval_chain.invoke({"input": "how can langsmith help with testing?"})

print(response["answer"])

# LangSmith oﬀers several features that can help with testing:...

This answer should be much more accurate!

Diving Deeper

We've now successfully set up a basic retrieval chain. We only touched on the basics of retrieval - for a deeper dive into
everything mentioned here, see this section of documentation.

Conversation Retrieval Chain

The chain we've created so far can only answer single questions. One of the main types of LLM applications that people are
building are chat bots. So how do we turn this chain into one that can answer follow up questions?

We can still use the create_retrieval_chain function, but we need to change two things:

1. The retrieval method should now not just work on the most recent input, but rather should take the whole history into
account.
2. The ﬁnal LLM chain should likewise take the whole history into account

Updating Retrieval

In order to update retrieval, we will create a new chain. This chain will take in the most recent inputinput
( ) and the
conversation history (chat_history) and use an LLM to generate a search query.

from langchain.chains import create_history_aware_retriever

from langchain_core.prompts import MessagesPlaceholder

# First we need a prompt that we can pass into an LLM to generate this search query

prompt = ChatPromptTemplate.from_messages([
MessagesPlaceholder(variable_name="chat_history"),
("user", "{input}"),
("user", "Given the above conversation, generate a search query to look up in order to get information relevant to the conversation")
])
retriever_chain = create_history_aware_retriever(llm, retriever, prompt)

We can test this out by passing in an instance where the user is asking a follow up question.
from langchain_core.messages import HumanMessage, AIMessage

chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]

retriever_chain.invoke({
"chat_history": chat_history,
"input": "Tell me how"
})

You should see that this returns documents about testing in LangSmith. This is because the LLM generated a new query,
combining the chat history with the follow up question.

Now that we have this new retriever, we can create a new chain to continue the conversation with these retrieved documents
in mind.

prompt = ChatPromptTemplate.from_messages([
("system", "Answer the user's questions based on the below context:\n\n{context}"),
MessagesPlaceholder(variable_name="chat_history"),
("user", "{input}"),
])
document_chain = create_stuﬀ_documents_chain(llm, prompt)

retrieval_chain = create_retrieval_chain(retriever_chain, document_chain)

We can now test this out end-to-end:

chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]

retrieval_chain.invoke({
"chat_history": chat_history,
"input": "Tell me how"
})

We can see that this gives a coherent answer - we've successfully turned our retrieval chain into a chatbot!

Agent

We've so far create examples of chains - where each step is known ahead of time. The ﬁnal thing we will create is an agent -
where the LLM decides what steps to take.

NOTE: for this example we will only show how to create an agent using OpenAI models, as local models are not
reliable enough yet.

One of the ﬁrst things to do when building an agent is to decide what tools it should have access to. For this example, we will
give the agent access to two tools:

1. The retriever we just created. This will let it easily answer questions about LangSmith
2. A search tool. This will let it easily answer questions that require up to date information.

First, let's set up a tool for the retriever we just created:

from langchain.tools.retriever import create_retriever_tool

retriever_tool = create_retriever_tool(
retriever,
"langsmith_search",
"Search for information about LangSmith. For any questions about LangSmith, you must use this tool!",
)

The search tool that we will use is Tavily. This will require an API key (they have generous free tier). After creating it on their
platform, you need to set it as an environment variable:

export TAVILY_API_KEY=...

If you do not want to set up an API key, you can skip creating this tool.

from langchain_community.tools.tavily_search import TavilySearchResults

search = TavilySearchResults()

We can now create a list of the tools we want to work with:

tools = [retriever_tool, search]

Now that we have the tools, we can create an agent to use them. We will go over this pretty quickly - for a deeper dive into
what exactly is going on, check out the Agent's Getting Started documentation
Install langchain hub ﬁrst

pip install langchainhub

Now we can use it to get a predeﬁned prompt

from langchain_openai import ChatOpenAI

from langchain import hub
from langchain.agents import create_openai_functions_agent
from langchain.agents import AgentExecutor

# Get the prompt to use - you can modify this!

prompt = hub.pull("hwchase17/openai-functions-agent")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

We can now invoke the agent and see how it responds! We can ask it questions about LangSmith:

agent_executor.invoke({"input": "how can langsmith help with testing?"})

We can ask it about the weather:

agent_executor.invoke({"input": "what is the weather in SF?"})

We can have conversations with it:

chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]

agent_executor.invoke({
"chat_history": chat_history,
"input": "Tell me how"
})

Diving Deeper

We've now successfully set up a basic agent. We only touched on the basics of agents - for a deeper dive into everything
mentioned here, see this section of documentation.

Serving with LangServe

Now that we've built an application, we need to serve it. That's where LangServe comes in. LangServe helps developers
deploy LangChain chains as a REST API. You do not need to use LangServe to use LangChain, but in this guide we'll show
how you can deploy your app with LangServe.

While the ﬁrst part of this guide was intended to be run in a Jupyter Notebook, we will now move out of that. We will be
creating a Python ﬁle and then interacting with it from the command line.

Install with:

pip install "langserve[all]"

Server

To create a server for our application we'll make aserve.py ﬁle. This will contain our logic for serving our application. It consists
of three things:

1. The deﬁnition of our chain that we just built above

2. Our FastAPI app
3. A deﬁnition of a route from which to serve the chain, which is done withlangserve.add_routes
#!/usr/bin/env python
from typing import List

from fastapi import FastAPI

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.tools.retriever import create_retriever_tool
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI
from langchain import hub
from langchain.agents import create_openai_functions_agent
from langchain.agents import AgentExecutor
from langchain.pydantic_v1 import BaseModel, Field
from langchain_core.messages import BaseMessage
from langserve import add_routes

# 1. Load Retriever
loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide")
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
embeddings = OpenAIEmbeddings()
vector = FAISS.from_documents(documents, embeddings)
retriever = vector.as_retriever()

# 2. Create Tools
retriever_tool = create_retriever_tool(
retriever,
"langsmith_search",
"Search for information about LangSmith. For any questions about LangSmith, you must use this tool!",
)
search = TavilySearchResults()
tools = [retriever_tool, search]

# 3. Create Agent
prompt = hub.pull("hwchase17/openai-functions-agent")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# 4. App deﬁnition
app = FastAPI(
title="LangChain Server",
version="1.0",
description="A simple API server using LangChain's Runnable interfaces",
)

# 5. Adding chain route

# We need to add these input/output schemas because the current AgentExecutor

# is lacking in schemas.

class Input(BaseModel):
input: str
chat_history: List[BaseMessage] = Field(
...,
extra={"widget": {"type": "chat", "input": "location"}},
)

class Output(BaseModel):
output: str

add_routes(
app,
agent_executor.with_types(input_type=Input, output_type=Output),
path="/agent",
)

if __name__ == "__main__":
import uvicorn

uvicorn.run(app, host="localhost", port=8000)

And that's it! If we execute this ﬁle:

python serve.py

we should see our chain being served at localhost:8000.

Playground

Every LangServe service comes with a simple built-in UI for conﬁguring and invoking the application with streaming output
and visibility into intermediate steps. Head to http://localhost:8000/agent/playground/ to try it out! Pass in the same question
as before - "how can langsmith help with testing?" - and it should respond same as before.

Client

Now let's set up a client for programmatically interacting with our service. We can easily do this with the
[langserve.RemoteRunnable](/docs/langserve#client). Using this, we can interact with the served chain as if it were running client-side.

from langserve import RemoteRunnable

remote_chain = RemoteRunnable("http://localhost:8000/agent/")
remote_chain.invoke({
"input": "how can langsmith help with testing?",
"chat_history": [] # Providing an empty list as this is the ﬁrst call
})

To learn more about the many other features of LangServehead here.

Next steps

We've touched on how to build an application with LangChain, how to trace it with LangSmith, and how to serve it with
LangServe. There are a lot more features in all three of these than we can cover here. To continue on your journey, we
recommend you read the following (in order):

All of these features are backed by LangChain Expression Language (LCEL) - a way to chain these components
together. Check out that documentation to better understand how to create custom chains.
Model IO covers more details of prompts, LLMs, and output parsers.
Retrieval covers more details of everything related to retrieval
Agents covers details of everything related to agents
Explore common end-to-end use cases and template applications
Read up on LangSmith, the platform for debugging, testing, monitoring and more
Learn more about serving your applications with LangServe

Help us out by providing feedback on this documentation page:

Previous
« Installation
Next
Security »

Community

Discord

Twitter
GitHub

Python
JS/TS
More
Homepage

Blog
YouTube
LangChain Expression
Language

LangChain Expression Language (LCEL)

LangChain Expression Language, or LCEL, is a declarative way to easily compose chains together. LCEL was designed from
day 1 to support putting prototypes in production, with no code changes, from the simplest “prompt + LLM” chain to the
most complex chains (we’ve seen folks successfully run LCEL chains with 100s of steps in production). To highlight a few of
the reasons you might want to use LCEL:

Streaming support When you build your chains with LCEL you get the best possible time-to-ﬁrst-token (time elapsed until
the ﬁrst chunk of output comes out). For some chains this means eg. we stream tokens straight from an LLM to a streaming
output parser, and you get back parsed, incremental chunks of output at the same rate as the LLM provider outputs the raw
tokens.

Async support Any chain built with LCEL can be called both with the synchronous API (eg. in your Jupyter notebook while
prototyping) as well as with the asynchronous API (eg. in a LangServe server). This enables using the same code for
prototypes and in production, with great performance, and the ability to handle many concurrent requests in the same server.

Optimized parallel execution Whenever your LCEL chains have steps that can be executed in parallel (eg if you fetch
documents from multiple retrievers) we automatically do it, both in the sync and the async interfaces, for the smallest
possible latency.

Retries and fallbacks Conﬁgure retries and fallbacks for any part of your LCEL chain. This is a great way to make your
chains more reliable at scale. We’re currently working on adding streaming support for retries/fallbacks, so you can get the
added reliability without any latency cost.

Access intermediate results For more complex chains it’s often very useful to access the results of intermediate steps even
before the ﬁnal output is produced. This can be used to let end-users know something is happening, or even just to debug
your chain. You can stream intermediate results, and it’s available on every LangServe server.

Input and output schemas Input and output schemas give every LCEL chain Pydantic and JSONSchema schemas inferred
from the structure of your chain. This can be used for validation of inputs and outputs, and is an integral part of LangServe.

Seamless LangSmith tracing integration As your chains get more and more complex, it becomes increasingly important to
understand what exactly is happening at every step. With LCEL, all steps are automatically logged to LangSmith for
maximum observability and debuggability.

Seamless LangServe deployment integration Any chain created with LCEL can be easily deployed usingLangServe.

Help us out by providing feedback on this documentation page:

Previous
« Security
Next
Get started »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression
Language Interface

On this page

Interface
To make it as easy as possible to create custom chains, we’ve implemented a“Runnable” protocol. The Runnable protocol is
implemented for most components. This is a standard interface, which makes it easy to deﬁne custom chains as well as
invoke them in a standard way. The standard interface includes:

stream: stream back chunks of the response

invoke : call the chain on an input
batch: call the chain on a list of inputs

These also have corresponding async methods:

astream: stream back chunks of the response async

ainvoke: call the chain on an input async
abatch: call the chain on a list of inputs async
astream_log: stream back intermediate steps as they happen, in addition to the ﬁnal response
astream_events: beta stream events as they happen in the chain (introduced inlangchain-core 0.1.14)

The input type and output type varies by component:

Component Input Type Output Type

Prompt Dictionary PromptValue
Single string, list of chat messages or a
ChatModel ChatMessage
PromptValue
Single string, list of chat messages or a
LLM String
PromptValue
Depends on the
OutputParser The output of an LLM or ChatModel
parser
Retriever Single string List of Documents
Tool Single string or dictionary, depending on the tool Depends on the tool

All runnables expose input and output schemas to inspect the inputs and outputs: - input_schema: an input Pydantic model
auto-generated from the structure of the Runnable - output_schema: an output Pydantic model auto-generated from the structure
of the Runnable

Let’s take a look at these methods. To do so, we’ll create a super simple PromptTemplate + ChatModel chain.

%pip install –upgrade –quiet langchain-core langchain-community langchain-openai

from langchain_core.prompts import ChatPromptTemplate

from langchain_openai import ChatOpenAI

model = ChatOpenAI()
prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
chain = prompt | model

Input Schema

A description of the inputs accepted by a Runnable. This is a Pydantic model dynamically generated from the structure of any
Runnable. You can call .schema() on it to obtain a JSONSchema representation.

# The input schema of the chain is the input schema of its first part, the prompt.
chain.input_schema.schema()
{'title': 'PromptInput',
'type': 'object',
'properties': {'topic': {'title': 'Topic', 'type': 'string'}}}
prompt.input_schema.schema()
{'title': 'PromptInput',
'type': 'object',
'properties': {'topic': {'title': 'Topic', 'type': 'string'}}}
model.input_schema.schema()
{'title': 'ChatOpenAIInput',
'anyOf': [{'type': 'string'},
{'$ref': '#/definitions/StringPromptValue'},
{'$ref': '#/definitions/ChatPromptValueConcrete'},
{'type': 'array',
'items': {'anyOf': [{'$ref': '#/definitions/AIMessage'},
{'$ref': '#/definitions/HumanMessage'},
{'$ref': '#/definitions/ChatMessage'},
{'$ref': '#/definitions/SystemMessage'},
{'$ref': '#/definitions/FunctionMessage'},
{'$ref': '#/definitions/ToolMessage'}]}}],
'definitions': {'StringPromptValue': {'title': 'StringPromptValue',
'description': 'String prompt value.',
'type': 'object',
'properties': {'text': {'title': 'Text', 'type': 'string'},
'type': {'title': 'Type',
'default': 'StringPromptValue',
'enum': ['StringPromptValue'],
'type': 'string'}},
'required': ['text']},
'AIMessage': {'title': 'AIMessage',
'description': 'A Message from an AI.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'ai',
'enum': ['ai'],
'type': 'string'},
'example': {'title': 'Example', 'default': False, 'type': 'boolean'}},
'required': ['content']},
'HumanMessage': {'title': 'HumanMessage',
'description': 'A Message from a human.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'human',
'enum': ['human'],
'type': 'string'},
'example': {'title': 'Example', 'default': False, 'type': 'boolean'}},
'required': ['content']},
'ChatMessage': {'title': 'ChatMessage',
'description': 'A Message that can be assigned an arbitrary speaker (i.e. role).',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'chat',
'enum': ['chat'],
'type': 'string'},
'role': {'title': 'Role', 'type': 'string'}},
'required': ['content', 'role']},
'SystemMessage': {'title': 'SystemMessage',
'description': 'A Message for priming AI behavior, usually passed in as the first of a sequence\nof input messages.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'system',
'enum': ['system'],
'type': 'string'}},
'type': 'string'}},
'required': ['content']},
'FunctionMessage': {'title': 'FunctionMessage',
'description': 'A Message for passing the result of executing a function back to a model.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'function',
'enum': ['function'],
'type': 'string'},
'name': {'title': 'Name', 'type': 'string'}},
'required': ['content', 'name']},
'ToolMessage': {'title': 'ToolMessage',
'description': 'A Message for passing the result of executing a tool back to a model.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'tool',
'enum': ['tool'],
'type': 'string'},
'tool_call_id': {'title': 'Tool Call Id', 'type': 'string'}},
'required': ['content', 'tool_call_id']},
'ChatPromptValueConcrete': {'title': 'ChatPromptValueConcrete',
'description': 'Chat prompt value which explicitly lists out the message types it accepts.\nFor use in external schemas.',
'type': 'object',
'properties': {'messages': {'title': 'Messages',
'type': 'array',
'items': {'anyOf': [{'$ref': '#/definitions/AIMessage'},
{'$ref': '#/definitions/HumanMessage'},
{'$ref': '#/definitions/ChatMessage'},
{'$ref': '#/definitions/SystemMessage'},
{'$ref': '#/definitions/FunctionMessage'},
{'$ref': '#/definitions/ToolMessage'}]}},
'type': {'title': 'Type',
'default': 'ChatPromptValueConcrete',
'enum': ['ChatPromptValueConcrete'],
'type': 'string'}},
'required': ['messages']}}}

Output Schema

A description of the outputs produced by a Runnable. This is a Pydantic model dynamically generated from the structure of
any Runnable. You can call .schema() on it to obtain a JSONSchema representation.

# The output schema of the chain is the output schema of its last part, in this case a ChatModel, which outputs a ChatMessage
chain.output_schema.schema()
{'title': 'ChatOpenAIOutput',
'anyOf': [{'$ref': '#/definitions/AIMessage'},
{'$ref': '#/definitions/HumanMessage'},
{'$ref': '#/definitions/ChatMessage'},
{'$ref': '#/definitions/SystemMessage'},
{'$ref': '#/definitions/FunctionMessage'},
{'$ref': '#/definitions/ToolMessage'}],
'definitions': {'AIMessage': {'title': 'AIMessage',
'description': 'A Message from an AI.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'ai',
'enum': ['ai'],
'type': 'string'},
'example': {'title': 'Example', 'default': False, 'type': 'boolean'}},
'required': ['content']},
'HumanMessage': {'title': 'HumanMessage',
'description': 'A Message from a human.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'human',
'enum': ['human'],
'type': 'string'},
'example': {'title': 'Example', 'default': False, 'type': 'boolean'}},
'required': ['content']},
'ChatMessage': {'title': 'ChatMessage',
'description': 'A Message that can be assigned an arbitrary speaker (i.e. role).',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'chat',
'enum': ['chat'],
'type': 'string'},
'role': {'title': 'Role', 'type': 'string'}},
'required': ['content', 'role']},
'SystemMessage': {'title': 'SystemMessage',
'description': 'A Message for priming AI behavior, usually passed in as the first of a sequence\nof input messages.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'system',
'enum': ['system'],
'type': 'string'}},
'required': ['content']},
'FunctionMessage': {'title': 'FunctionMessage',
'description': 'A Message for passing the result of executing a function back to a model.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'function',
'enum': ['function'],
'type': 'string'},
'name': {'title': 'Name', 'type': 'string'}},
'required': ['content', 'name']},
'ToolMessage': {'title': 'ToolMessage',
'description': 'A Message for passing the result of executing a tool back to a model.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'tool',
'enum': ['tool'],
'type': 'string'},
'tool_call_id': {'title': 'Tool Call Id', 'type': 'string'}},
'required': ['content', 'tool_call_id']}}}

Stream

for s in chain.stream({"topic": "bears"}):

print(s.content, end="", ﬂush=True)
Sure, here's a bear-themed joke for you:

Why don't bears wear shoes?

Because they already have bear feet!

Invoke
chain.invoke({"topic": "bears"})
AIMessage(content="Why don't bears wear shoes? \n\nBecause they have bear feet!")

Batch

chain.batch([{"topic": "bears"}, {"topic": "cats"}])

[AIMessage(content="Sure, here's a bear joke for you:\n\nWhy don't bears wear shoes?\n\nBecause they already have bear feet!"),
AIMessage(content="Why don't cats play poker in the wild?\n\nToo many cheetahs!")]

You can set the number of concurrent requests by using themax_concurrency parameter

chain.batch([{"topic": "bears"}, {"topic": "cats"}], conﬁg={"max_concurrency": 5})

[AIMessage(content="Why don't bears wear shoes?\n\nBecause they have bear feet!"),
AIMessage(content="Why don't cats play poker in the wild? Too many cheetahs!")]

Async Stream

async for s in chain.astream({"topic": "bears"}):

print(s.content, end="", ﬂush=True)
Why don't bears wear shoes?

Because they have bear feet!

Async Invoke

await chain.ainvoke({"topic": "bears"})

AIMessage(content="Why don't bears ever wear shoes?\n\nBecause they already have bear feet!")

Async Batch

await chain.abatch([{"topic": "bears"}])

[AIMessage(content="Why don't bears wear shoes?\n\nBecause they have bear feet!")]

Async Stream Events (beta)

Event Streaming is a beta API, and may change a bit based on feedback.

Note: Introduced in langchain-core 0.2.0

For now, when using the astream_events API, for everything to work properly please:

Use async throughout the code (including async tools etc)

Propagate callbacks if deﬁning custom functions / runnables.
Whenever using runnables without LCEL, make sure to call.astream() on LLMs rather than .ainvoke to force the LLM to
stream tokens.

Event Reference

Here is a reference table that shows some events that might be emitted by the various Runnable objects. Deﬁnitions for
some of the Runnable are included after the table.

⚠️ When streaming the inputs for the runnable will not be available until the input stream has been entirely consumed This
means that the inputs will be available at for the corresponding end hook rather than start event.
event name chunk input output
{“messages”:
on_chat_model_start [model name] [[SystemMessage,
HumanMessage]]}
on_chat_model_stream [model name] AIMessageChunk(content=“hello”)
{“messages”:
{“generations”: […],
on_chat_model_end [model name] [[SystemMessage,
“llm_output”: None, …}
HumanMessage]]}
on_llm_start [model name] {‘input’: ‘hello’}
on_llm_stream [model name] ‘Hello’
on_llm_end [model name] ‘Hello human!’
on_chain_start format_docs
on_chain_stream format_docs “hello world!, goodbye world!”
“hello world!, goodbye
on_chain_end format_docs [Document(…)]
world!”
on_tool_start some_tool {“x”: 1, “y”: “2”}
on_tool_stream some_tool {“x”: 1, “y”: “2”}
on_tool_end some_tool {“x”: 1, “y”: “2”}
on_retriever_start [retriever name] {“query”: “hello”}
on_retriever_chunk [retriever name] {documents: […]}
on_retriever_end [retriever name] {“query”: “hello”} {documents: […]}
on_prompt_start [template_name] {“question”: “hello”}
ChatPromptValue(messages:
on_prompt_end [template_name] {“question”: “hello”}
[SystemMessage, …])

Here are declarations associated with the events shown above:

format_docs:

def format_docs(docs: List[Document]) -> str:

'''Format the docs.'''
return ", ".join([doc.page_content for doc in docs])

format_docs = RunnableLambda(format_docs)

some_tool:

@tool
def some_tool(x: int, y: str) -> dict:
'''Some_tool.'''
return {"x": x, "y": y}

prompt:

template = ChatPromptTemplate.from_messages(
[("system", "You are Cat Agent 007"), ("human", "{question}")]
).with_conﬁg({"run_name": "my_template", "tags": ["my_template"]})

Let’s deﬁne a new chain to make it more interesting to show oﬀ theastream_events interface (and later the astream_log interface).
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings

template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

vectorstore = FAISS.from_texts(
["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()

retrieval_chain = (
{
"context": retriever.with_conﬁg(run_name="Docs"),
"question": RunnablePassthrough(),
}
| prompt
| model.with_conﬁg(run_name="my_llm")
| StrOutputParser()
)

Now let’s use astream_events to get events from the retriever and the LLM.

async for event in retrieval_chain.astream_events(

"where did harrison work?", version="v1", include_names=["Docs", "my_llm"]
):
kind = event["event"]
if kind == "on_chat_model_stream":
print(event["data"]["chunk"].content, end="|")
elif kind in {"on_chat_model_start"}:
print()
print("Streaming LLM:")
elif kind in {"on_chat_model_end"}:
print()
print("Done streaming LLM.")
elif kind == "on_retriever_end":
print("--")
print("Retrieved the following documents:")
print(event["data"]["output"]["documents"])
elif kind == "on_tool_end":
print(f"Ended tool: {event['name']}")
else:
pass
/home/eugene/src/langchain/libs/core/langchain_core/_api/beta_decorator.py:86: LangChainBetaWarning: This API is in beta and may change in the future.
warn_beta(

--
Retrieved the following documents:
[Document(page_content='harrison worked at kensho')]

Streaming LLM:
|H|arrison| worked| at| Kens|ho|.||
Done streaming LLM.

Async Stream Intermediate Steps

All runnables also have a method .astream_log() which is used to stream (as they happen) all or part of the intermediate steps
of your chain/sequence.

This is useful to show progress to the user, to use intermediate results, or to debug your chain.

You can stream all steps (default) or include/exclude steps by name, tags or metadata.

This method yields JSONPatch ops that when applied in the same order as received build up the RunState.
class LogEntry(TypedDict):
id: str
"""ID of the sub-run."""
name: str
"""Name of the object being run."""
type: str
"""Type of the object being run, eg. prompt, chain, llm, etc."""
tags: List[str]
"""List of tags for the run."""
metadata: Dict[str, Any]
"""Key-value pairs of metadata for the run."""
start_time: str
"""ISO-8601 timestamp of when the run started."""

streamed_output_str: List[str]
"""List of LLM tokens streamed by this run, if applicable."""
final_output: Optional[Any]
"""Final output of this run.
Only available after the run has finished successfully."""
end_time: Optional[str]
"""ISO-8601 timestamp of when the run ended.
Only available after the run has finished."""

class RunState(TypedDict):
id: str
"""ID of the run."""
streamed_output: List[Any]
"""List of output chunks streamed by Runnable.stream()"""
ﬁnal_output: Optional[Any]
"""Final output of the run, usually the result of aggregating (`+`) streamed_output.
Only available after the run has ﬁnished successfully."""

logs: Dict[str, LogEntry]

"""Map of run names to sub-runs. If ﬁlters were supplied, this list will
contain only the runs that matched the ﬁlters."""

Streaming JSONPatch chunks

This is useful eg. to stream the JSONPatch in an HTTP server, and then apply the ops on the client to rebuild the run state
there. See LangServe for tooling to make it easier to build a webserver from any Runnable.

async for chunk in retrieval_chain.astream_log(

"where did harrison work?", include_names=["Docs"]
):
print("-" * 40)
print(chunk)
----------------------------------------
RunLogPatch({'op': 'replace',
'path': '',
'value': {'final_output': None,
'id': '82e9b4b1-3dd6-4732-8db9-90e79c4da48c',
'logs': {},
'name': 'RunnableSequence',
'streamed_output': [],
'type': 'chain'}})
----------------------------------------
RunLogPatch({'op': 'add',
'path': '/logs/Docs',
'value': {'end_time': None,
'final_output': None,
'id': '9206e94a-57bd-48ee-8c5e-fdd1c52a6da2',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:55.902+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}})
----------------------------------------
RunLogPatch({'op': 'add',
'path': '/logs/Docs/final_output',
'value': {'documents': [Document(page_content='harrison worked at kensho')]}},
{'op': 'add',
'path': '/logs/Docs/end_time',
'value': '2024-01-19T22:33:56.064+00:00'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ''},
{'op': 'replace', 'path': '/final_output', 'value': ''})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'H'},
{'op': 'replace', 'path': '/final_output', 'value': 'H'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'arrison'},
{'op': 'replace', 'path': '/final_output', 'value': 'Harrison'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' worked'},
{'op': 'replace', 'path': '/final_output', 'value': 'Harrison worked'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' at'},
{'op': 'replace', 'path': '/final_output', 'value': 'Harrison worked at'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' Kens'},
{'op': 'replace', 'path': '/final_output', 'value': 'Harrison worked at Kens'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'ho'},
{'op': 'replace',
'path': '/final_output',
'value': 'Harrison worked at Kensho'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '.'},
{'op': 'replace',
'path': '/final_output',
'value': 'Harrison worked at Kensho.'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ''})

Streaming the incremental RunState

You can simply pass diﬀ=False to get incremental values of RunState. You get more verbose output with more repetitive parts.

async for chunk in retrieval_chain.astream_log(

"where did harrison work?", include_names=["Docs"], diff=False
):
print("-" * 70)
print(chunk)
----------------------------------------------------------------------
RunLog({'final_output': None,
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {},
'name': 'RunnableSequence',
'streamed_output': [],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': None,
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': None,
'final_output': None,
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': [],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': None,
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': [],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': '',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': [''],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'H',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['', 'H'],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'Harrison',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['', 'H', 'arrison'],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'Harrison worked',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['', 'H', 'arrison', ' worked'],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'Harrison worked at',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['', 'H', 'arrison', ' worked', ' at'],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'Harrison worked at Kens',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['', 'H', 'arrison', ' worked', ' at', ' Kens'],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'Harrison worked at Kensho',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['', 'H', 'arrison', ' worked', ' at', ' Kens', 'ho'],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'Harrison worked at Kensho.',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['', 'H', 'arrison', ' worked', ' at', ' Kens', 'ho', '.'],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'Harrison worked at Kensho.',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['',
'H',
'arrison',
' worked',
' at',
' Kens',
'ho',
'.',
''],
'type': 'chain'})

Parallelism

Let’s take a look at how LangChain Expression Language supports parallel requests. For example, when using a
RunnableParallel (often written as a dictionary) it executes each element in parallel.

from langchain_core.runnables import RunnableParallel

chain1 = ChatPromptTemplate.from_template("tell me a joke about {topic}") | model

chain2 = (
ChatPromptTemplate.from_template("write a short (2 line) poem about {topic}")
| model
)
combined = RunnableParallel(joke=chain1, poem=chain2)
%%time
chain1.invoke({"topic": "bears"})
CPU times: user 18 ms, sys: 1.27 ms, total: 19.3 ms
Wall time: 692 ms
AIMessage(content="Why don't bears wear shoes?\n\nBecause they already have bear feet!")
%%time
chain2.invoke({"topic": "bears"})
CPU times: user 10.5 ms, sys: 166 µs, total: 10.7 ms
Wall time: 579 ms
AIMessage(content="In forest's embrace,\nMajestic bears pace.")
%%time
combined.invoke({"topic": "bears"})
CPU times: user 32 ms, sys: 2.59 ms, total: 34.6 ms
Wall time: 816 ms
{'joke': AIMessage(content="Sure, here's a bear-related joke for you:\n\nWhy did the bear bring a ladder to the bar?\n\nBecause he heard the drinks were on the hou
'poem': AIMessage(content="In wilderness they roam,\nMajestic strength, nature's throne.")}

Parallelism on batches

Parallelism can be combined with other runnables. Let’s try to use parallelism with batches.

%%time
chain1.batch([{"topic": "bears"}, {"topic": "cats"}])
CPU times: user 17.3 ms, sys: 4.84 ms, total: 22.2 ms
Wall time: 628 ms
[AIMessage(content="Why don't bears wear shoes?\n\nBecause they have bear feet!"),
AIMessage(content="Why don't cats play poker in the wild?\n\nToo many cheetahs!")]
%%time
chain2.batch([{"topic": "bears"}, {"topic": "cats"}])
CPU times: user 15.8 ms, sys: 3.83 ms, total: 19.7 ms
Wall time: 718 ms
[AIMessage(content='In the wild, bears roam,\nMajestic guardians of ancient home.'),
AIMessage(content='Whiskers grace, eyes gleam,\nCats dance through the moonbeam.')]
%%time
combined.batch([{"topic": "bears"}, {"topic": "cats"}])
CPU times: user 44.8 ms, sys: 3.17 ms, total: 48 ms
Wall time: 721 ms
[{'joke': AIMessage(content="Sure, here's a bear joke for you:\n\nWhy don't bears wear shoes?\n\nBecause they have bear feet!"),
'poem': AIMessage(content="Majestic bears roam,\nNature's strength, beauty shown.")},
{'joke': AIMessage(content="Why don't cats play poker in the wild?\n\nToo many cheetahs!"),
'poem': AIMessage(content="Whiskers dance, eyes aglow,\nCats embrace the night's gentle ﬂow.")}]

Help us out by providing feedback on this documentation page:

Previous
« Why use LCEL
Next
Streaming »

Community

Discord

Twitter
GitHub

Python

JS/TS
More

Homepage

Blog

YouTube
Memory Backed by a Vector
ModulesMoreMemorytypes Store

On this page

Backed by a Vector Store

VectorStoreRetrieverMemory stores memories in a vector store and queries the top-K most "salient" docs every time it is called.

This diﬀers from most of the other Memory classes in that it doesn't explicitly track the order of interactions.

In this case, the "docs" are previous conversation snippets. This can be useful to refer to relevant pieces of information that
the AI was told earlier in the conversation.

from datetime import datetime

from langchain_openai import OpenAIEmbeddings
from langchain_openai import OpenAI
from langchain.memory import VectorStoreRetrieverMemory
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate

Initialize your vector store

Depending on the store you choose, this step may look diﬀerent. Consult the relevant vector store documentation for more
details.

import faiss

from langchain.docstore import InMemoryDocstore

from langchain_community.vectorstores import FAISS

embedding_size = 1536 # Dimensions of the OpenAIEmbeddings

index = faiss.IndexFlatL2(embedding_size)
embedding_fn = OpenAIEmbeddings().embed_query
vectorstore = FAISS(embedding_fn, index, InMemoryDocstore({}), {})

Create your VectorStoreRetrieverMemory

The memory object is instantiated from any vector store retriever.

# In actual usage, you would set `k` to be a higher value, but we use k=1 to show that
# the vector lookup still returns the semantically relevant information
retriever = vectorstore.as_retriever(search_kwargs=dict(k=1))
memory = VectorStoreRetrieverMemory(retriever=retriever)

# When added to an agent, the memory object can save pertinent information from conversations or used tools
memory.save_context({"input": "My favorite food is pizza"}, {"output": "that's good to know"})
memory.save_context({"input": "My favorite sport is soccer"}, {"output": "..."})
memory.save_context({"input": "I don't the Celtics"}, {"output": "ok"}) #
print(memory.load_memory_variables({"prompt": "what sport should i watch?"})["history"])
input: My favorite sport is soccer
output: ...

Using in a chain

Let's walk through an example, again setting verbose=True so we can see the prompt.
llm = OpenAI(temperature=0) # Can be any valid LLM
_DEFAULT_TEMPLATE = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its cont

Relevant pieces of previous conversation:

{history}

(You do not need to use these pieces of information if not relevant)

Current conversation:
Human: {input}
AI:"""
PROMPT = PromptTemplate(
input_variables=["history", "input"], template=_DEFAULT_TEMPLATE
)
conversation_with_summary = ConversationChain(
llm=llm,
prompt=PROMPT,
memory=memory,
verbose=True
)
conversation_with_summary.predict(input="Hi, my name is Perry, what's up?")

> Entering new ConversationChain chain...

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know

Relevant pieces of previous conversation:

input: My favorite food is pizza
output: that's good to know

(You do not need to use these pieces of information if not relevant)

Current conversation:
Human: Hi, my name is Perry, what's up?
AI:

> Finished chain.

" Hi Perry, I'm doing well. How about you?"

# Here, the basketball related content is surfaced

conversation_with_summary.predict(input="what's my favorite sport?")

> Entering new ConversationChain chain...

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know

Relevant pieces of previous conversation:

input: My favorite sport is soccer
output: ...

(You do not need to use these pieces of information if not relevant)

Current conversation:
Human: what's my favorite sport?
AI:

> Finished chain.

' You told me earlier that your favorite sport is soccer.'

# Even though the language model is stateless, since relevant memory is fetched, it can "reason" about the time.
# Timestamping memories and data is useful in general to let the agent determine temporal relevance
conversation_with_summary.predict(input="Whats my favorite food")
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know

Relevant pieces of previous conversation:

input: My favorite food is pizza
output: that's good to know

(You do not need to use these pieces of information if not relevant)

Current conversation:
Human: Whats my favorite food
AI:

> Finished chain.

' You said your favorite food is pizza.'

# The memories from the conversation are automatically stored,

# since this query best matches the introduction chat above,
# the agent is able to 'remember' the user's name.
conversation_with_summary.predict(input="What's my name?")

> Entering new ConversationChain chain...

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know

Relevant pieces of previous conversation:

input: Hi, my name is Perry, what's up?
response: Hi Perry, I'm doing well. How about you?

(You do not need to use these pieces of information if not relevant)

Current conversation:
Human: What's my name?
AI:

> Finished chain.

' Your name is Perry.'

Help us out by providing feedback on this documentation page:

Previous
« Conversation Token Buﬀer
Next
[Beta] Memory »

Community

Discord
Twitter
GitHub
Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How RunnableBranch: Dynamically route logic based on
Language to input

On this page

Dynamically route logic based on input

This notebook covers how to do routing in the LangChain Expression Language.

Routing allows you to create non-deterministic chains where the output of a previous step deﬁnes the next step. Routing
helps provide structure and consistency around interactions with LLMs.

There are two ways to perform routing:

1. Conditionally return runnables from a RunnableLambda (recommended)

2. Using a RunnableBranch.

We’ll illustrate both methods using a two step sequence where the ﬁrst step classiﬁes an input question as being about
LangChain, Anthropic, or Other, then routes to a corresponding prompt chain.

Example Setup

First, let’s create a chain that will identify incoming questions as being aboutLangChain, Anthropic, or Other:

from langchain_community.chat_models import ChatAnthropic

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

chain = (
PromptTemplate.from_template(
"""Given the user question below, classify it as either being about `LangChain`, `Anthropic`, or `Other`.

Do not respond with more than one word.

<question>
{question}
</question>

Classiﬁcation:"""
)
| ChatAnthropic()
| StrOutputParser()
)

chain.invoke({"question": "how do I call Anthropic?"})

' Anthropic'

Now, let’s create three sub chains:

langchain_chain = (
PromptTemplate.from_template(
"""You are an expert in langchain. \
Always answer questions starting with "As Harrison Chase told me". \
Respond to the following question:

Question: {question}
Answer:"""
)
| ChatAnthropic()
)
anthropic_chain = (
PromptTemplate.from_template(
"""You are an expert in anthropic. \
Always answer questions starting with "As Dario Amodei told me". \
Respond to the following question:

Question: {question}
Answer:"""
)
| ChatAnthropic()
)
general_chain = (
PromptTemplate.from_template(
"""Respond to the following question:

Question: {question}
Answer:"""
)
| ChatAnthropic()
)

Using a custom function (Recommended)

You can also use a custom function to route between diﬀerent outputs. Here’s an example:

def route(info):
if "anthropic" in info["topic"].lower():
return anthropic_chain
elif "langchain" in info["topic"].lower():
return langchain_chain
else:
return general_chain
from langchain_core.runnables import RunnableLambda

full_chain = {"topic": chain, "question": lambda x: x["question"]} | RunnableLambda(

route
)
full_chain.invoke({"question": "how do I use Anthropic?"})
AIMessage(content=' As Dario Amodei told me, to use Anthropic IPC you ﬁrst need to import it:\n\n```python\nfrom anthroipc import ic\n```\n\nThen you can create a c

full_chain.invoke({"question": "how do I use LangChain?"})

AIMessage(content=' As Harrison Chase told me, to use LangChain you ﬁrst need to sign up for an API key at platform.langchain.com. Once you have your API key,

full_chain.invoke({"question": "whats 2 + 2"})

AIMessage(content=' 4', additional_kwargs={}, example=False)

Using a RunnableBranch

A RunnableBranch is a special type of runnable that allows you to deﬁne a set of conditions and runnables to execute based on
the input. It does not oﬀer anything that you can’t achieve in a custom function as described above, so we recommend using
a custom function instead.

A RunnableBranch is initialized with a list of (condition, runnable) pairs and a default runnable. It selects which branch by
passing each condition the input it’s invoked with. It selects the ﬁrst condition to evaluate to True, and runs the corresponding
runnable to that condition with the input.

If no provided conditions match, it runs the default runnable.

Here’s an example of what it looks like in action:

from langchain_core.runnables import RunnableBranch

branch = RunnableBranch(
(lambda x: "anthropic" in x["topic"].lower(), anthropic_chain),
(lambda x: "langchain" in x["topic"].lower(), langchain_chain),
general_chain,
)
full_chain = {"topic": chain, "question": lambda x: x["question"]} | branch
full_chain.invoke({"question": "how do I use Anthropic?"})
AIMessage(content=" As Dario Amodei told me, here are some ways to use Anthropic:\n\n- Sign up for an account on Anthropic's website to access tools like Claude

full_chain.invoke({"question": "how do I use LangChain?"})

AIMessage(content=' As Harrison Chase told me, here is how you use LangChain:\n\nLangChain is an AI assistant that can have conversations, answer questions, a

full_chain.invoke({"question": "whats 2 + 2"})

AIMessage(content=' 2 + 2 = 4', additional_kwargs={}, example=False)

Help us out by providing feedback on this documentation page:

Previous
« RunnableLambda: Run Custom Functions
Next
Bind runtime args »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Parent Document
ModulesRetrievalRetrieversRetriever

On this page

Parent Document Retriever

When splitting documents for retrieval, there are often conﬂicting desires:

1. You may want to have small documents, so that their embeddings can most accurately reﬂect their meaning. If too
long, then the embeddings can lose meaning.
2. You want to have long enough documents that the context of each chunk is retained.

The ParentDocumentRetriever strikes that balance by splitting and storing small chunks of data. During retrieval, it ﬁrst fetches the
small chunks but then looks up the parent ids for those chunks and returns those larger documents.

Note that “parent document” refers to the document that a small chunk originated from. This can either be the whole raw
document OR a larger chunk.

from langchain.retrievers import ParentDocumentRetriever

from langchain.storage import InMemoryStore
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
loaders = [
TextLoader("../../paul_graham_essay.txt"),
TextLoader("../../state_of_the_union.txt"),
]
docs = []
for loader in loaders:
docs.extend(loader.load())

Retrieving full documents

In this mode, we want to retrieve the full documents. Therefore, we only specify a child splitter.

# This text splitter is used to create the child documents

child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
# The vectorstore to use to index the child chunks
vectorstore = Chroma(
collection_name="full_documents", embedding_function=OpenAIEmbeddings()
)
# The storage layer for the parent documents
store = InMemoryStore()
retriever = ParentDocumentRetriever(
vectorstore=vectorstore,
docstore=store,
child_splitter=child_splitter,
)
retriever.add_documents(docs, ids=None)

This should yield two keys, because we added two documents.

list(store.yield_keys())
['cfdf4af7-51f2-4ea3-8166-5be208efa040',
'bf213c21-cc66-4208-8a72-733d030187e6']

Let’s now call the vector store search functionality - we should see that it returns small chunks (since we’re storing the small
chunks).

sub_docs = vectorstore.similarity_search("justice breyer")

print(sub_docs[0].page_content)
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

Let’s now retrieve from the overall retriever. This should return large documents - since it returns the documents where the
smaller chunks are located.

retrieved_docs = retriever.get_relevant_documents("justice breyer")

len(retrieved_docs[0].page_content)
38540

Retrieving larger chunks

Sometimes, the full documents can be too big to want to retrieve them as is. In that case, what we really want to do is to ﬁrst
split the raw documents into larger chunks, and then split it into smaller chunks. We then index the smaller chunks, but on
retrieval we retrieve the larger chunks (but still not the full documents).

# This text splitter is used to create the parent documents

parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)
# This text splitter is used to create the child documents
# It should create documents smaller than the parent
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
# The vectorstore to use to index the child chunks
vectorstore = Chroma(
collection_name="split_parents", embedding_function=OpenAIEmbeddings()
)
# The storage layer for the parent documents
store = InMemoryStore()
retriever = ParentDocumentRetriever(
vectorstore=vectorstore,
docstore=store,
child_splitter=child_splitter,
parent_splitter=parent_splitter,
)
retriever.add_documents(docs)

We can see that there are much more than two documents now - these are the larger chunks.

len(list(store.yield_keys()))
66

Let’s make sure the underlying vector store still retrieves the small chunks.

sub_docs = vectorstore.similarity_search("justice breyer")

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

retrieved_docs = retriever.get_relevant_documents("justice breyer")

len(retrieved_docs[0].page_content)
1849
print(retrieved_docs[0].page_content)
In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections.

We cannot let this happen.

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Bre

A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police oﬃcers. A consensus builder. Since

And if we are to advance liberty and justice, we need to secure the Border and ﬁx the immigration system.

We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.

We’ve set up joint patrols with Mexico and Guatemala to catch more human traﬃckers.

We’re putting in place dedicated immigration judges so families ﬂeeing persecution and violence can have their cases heard faster.

We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.

Help us out by providing feedback on this documentation page:

Previous
« MultiVector Retriever
Next
Self-querying »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesRetrievalRetrieversMultiQueryRetriever

On this page

MultiQueryRetriever
Distance-based vector database retrieval embeds (represents) queries in high-dimensional space and ﬁnds similar embedded
documents based on “distance”. But, retrieval may produce diﬀerent results with subtle changes in query wording or if the
embeddings do not capture the semantics of the data well. Prompt engineering / tuning is sometimes done to manually
address these problems, but can be tedious.

The MultiQueryRetriever automates the process of prompt tuning by using an LLM to generate multiple queries from diﬀerent
perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union
across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same
question, the MultiQueryRetriever might be able to overcome some of the limitations of the distance-based retrieval and get a
richer set of results.

# Build a sample vectorDB

from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load blog post

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)

# VectorDB
embedding = OpenAIEmbeddings()
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)

Simple usage

Specify the LLM to use for query generation, and the retriever will do the rest.

from langchain.retrievers.multi_query import MultiQueryRetriever

from langchain_openai import ChatOpenAI

question = "What are the approaches to Task Decomposition?"

llm = ChatOpenAI(temperature=0)
retriever_from_llm = MultiQueryRetriever.from_llm(
retriever=vectordb.as_retriever(), llm=llm
)
# Set logging for the queries
import logging

logging.basicConﬁg()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
unique_docs = retriever_from_llm.get_relevant_documents(query=question)
len(unique_docs)
INFO:langchain.retrievers.multi_query:Generated queries: ['1. How can Task Decomposition be approached?', '2. What are the diﬀerent methods for Task Decompos

Supplying your own prompt

You can also supply a prompt along with an output parser to split the results into a list of queries.
from typing import List

from langchain.chains import LLMChain

from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field

# Output parser will split the LLM result into a list of queries
class LineList(BaseModel):
# "lines" is the key (attribute name) of the parsed output
lines: List[str] = Field(description="Lines of text")

class LineListOutputParser(PydanticOutputParser):
def __init__(self) -> None:
super().__init__(pydantic_object=LineList)

def parse(self, text: str) -> LineList:

lines = text.strip().split("\n")
return LineList(lines=lines)

output_parser = LineListOutputParser()

QUERY_PROMPT = PromptTemplate(
input_variables=["question"],
template="""You are an AI language model assistant. Your task is to generate ﬁve
diﬀerent versions of the given user question to retrieve relevant documents from a vector
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search.
Provide these alternative questions separated by newlines.
Original question: {question}""",
)
llm = ChatOpenAI(temperature=0)

# Chain
llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)

# Other inputs
question = "What are the approaches to Task Decomposition?"
# Run
retriever = MultiQueryRetriever(
retriever=vectordb.as_retriever(), llm_chain=llm_chain, parser_key="lines"
) # "lines" is the key (attribute name) of the parsed output

# Results
unique_docs = retriever.get_relevant_documents(
query="What does the course say about regression?"
)
len(unique_docs)
INFO:langchain.retrievers.multi_query:Generated queries: ["1. What is the course's perspective on regression?", '2. Can you provide information on regression as dis

Help us out by providing feedback on this documentation page:

Previous
« Vector store-backed retriever
Next
Contextual compression »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
ModulesAgentsAgent Types

Agent Types
This categorizes all the available agents along a few dimensions.

Intended Model Type

Whether this agent is intended for Chat Models (takes in messages, outputs message) or LLMs (takes in string, outputs
string). The main thing this aﬀects is the prompting strategy used. You can use an agent with a diﬀerent type of model than it
is intended for, but it likely won't produce results of the same quality.

Supports Chat History

Whether or not these agent types support chat history. If it does, that means it can be used as a chatbot. If it does not, then
that means it's more suited for single tasks. Supporting chat history generally requires better models, so earlier agent types
aimed at worse models may not support it.

Supports Multi-Input Tools

Whether or not these agent types support tools with multiple inputs. If a tool only requires a single input, it is generally easier
for an LLM to know how to invoke it. Therefore, several earlier agent types aimed at worse models may not support them.

Supports Parallel Function Calling

Having an LLM call multiple tools at the same time can greatly speed up agents whether there are tasks that are assisted by
doing so. However, it is much more challenging for LLMs to do this, so some agent types do not support this.

Required Model Params

Whether this agent requires the model to support any additional parameters. Some agent types take advantage of things like
OpenAI function calling, which require other model parameters. If none are required, then that means that everything is done
via prompting

When to Use

Our commentary on when you should consider using this agent type.

Supports Supports
Intended Support Require
Agent Multi- Parallel
Model s Chat d Model When to Use API
Type Input Function
Type History Params
Tools Calling
OpenAI
Chat ✅ ✅ ✅ tools If you are using a recent OpenAI model (1106 onwards) Ref
Tools
If you are using an OpenAI model, or an open-source
OpenAI
Chat ✅ ✅ functions model that has been ﬁnetuned for function calling and Ref
Functions
exposes the same functions parameters as OpenAI
If you are using Anthropic models, or other models good
XML LLM ✅ Ref
at XML
Structured
Chat ✅ ✅ If you need to support tools with multiple inputs Ref
Chat
JSON
Chat ✅ If you are using a model good at JSON Ref
Chat
ReAct LLM ✅ If you are using a simple model Ref
Self Ask
If you are using a simple model and only have one
With LLM Ref
search tool
Search

Help us out by providing feedback on this documentation page:

Previous
« Concepts
Next
OpenAI functions »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Text
ModulesRetrievalSplitters MarkdownHeaderTextSplitter

On this page

MarkdownHeaderTextSplitter
Motivation

Many chat or Q+A applications involve chunking input documents prior to embedding and vector storage.

These notes from Pinecone provide some useful tips:

When a full paragraph or document is embedded, the embedding process considers both the overall context and the relationships between the sentences and phrase

As mentioned, chunking often aims to keep text with common context together. With this in mind, we might want to
specifically honor the structure of the document itself. For example, a markdown file is organized by headers. Creating
chunks within specific header groups is an intuitive idea. To address this challenge, we can use MarkdownHeaderTextSplitter. This
will split a markdown file by a specified set of headers.

For example, if we want to split this markdown:

md = '# Foo\n\n ## Bar\n\nHi this is Jim \nHi this is Joe\n\n ## Baz\n\n Hi this is Molly'

We can specify the headers to split on:

[("#", "Header 1"),("##", "Header 2")]

And content is grouped or split by common headers:

{'content': 'Hi this is Jim \nHi this is Joe', 'metadata': {'Header 1': 'Foo', 'Header 2': 'Bar'}}
{'content': 'Hi this is Molly', 'metadata': {'Header 1': 'Foo', 'Header 2': 'Baz'}}

Let’s have a look at some examples below.

%pip install -qU langchain-text-splitters

from langchain_text_splitters import MarkdownHeaderTextSplitter
markdown_document = "# Foo\n\n ## Bar\n\nHi this is Jim\n\nHi this is Joe\n\n ### Boo \n\n Hi this is Lance \n\n ## Baz\n\n Hi this is Molly"

headers_to_split_on = [
("#", "Header 1"),
("##", "Header 2"),
("###", "Header 3"),
]

markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
md_header_splits = markdown_splitter.split_text(markdown_document)
md_header_splits
[Document(page_content='Hi this is Jim \nHi this is Joe', metadata={'Header 1': 'Foo', 'Header 2': 'Bar'}),
Document(page_content='Hi this is Lance', metadata={'Header 1': 'Foo', 'Header 2': 'Bar', 'Header 3': 'Boo'}),
Document(page_content='Hi this is Molly', metadata={'Header 1': 'Foo', 'Header 2': 'Baz'})]
type(md_header_splits[0])
langchain.schema.document.Document

By default, MarkdownHeaderTextSplitter strips headers being split on from the output chunk’s content. This can be disabled by
setting strip_headers = False .

markdown_splitter = MarkdownHeaderTextSplitter(
headers_to_split_on=headers_to_split_on, strip_headers=False
)
md_header_splits = markdown_splitter.split_text(markdown_document)
md_header_splits
[Document(page_content='# Foo \n## Bar \nHi this is Jim \nHi this is Joe', metadata={'Header 1': 'Foo', 'Header 2': 'Bar'}),
Document(page_content='### Boo \nHi this is Lance', metadata={'Header 1': 'Foo', 'Header 2': 'Bar', 'Header 3': 'Boo'}),
Document(page_content='## Baz \nHi this is Molly', metadata={'Header 1': 'Foo', 'Header 2': 'Baz'})]
Within each markdown group we can then apply any text splitter we want.

markdown_document = "# Intro \n\n ## History \n\n Markdown[9] is a lightweight markup language for creating formatted text using a plain-text editor. John Gruber

headers_to_split_on = [
("#", "Header 1"),
("##", "Header 2"),
]

# MD splits
markdown_splitter = MarkdownHeaderTextSplitter(
headers_to_split_on=headers_to_split_on, strip_headers=False
)
md_header_splits = markdown_splitter.split_text(markdown_document)

# Char-level splits
from langchain_text_splitters import RecursiveCharacterTextSplitter

chunk_size = 250
chunk_overlap = 30
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=chunk_size, chunk_overlap=chunk_overlap
)

# Split
splits = text_splitter.split_documents(md_header_splits)
splits

[Document(page_content='# Intro \n## History \nMarkdown[9] is a lightweight markup language for creating formatted text using a plain-text editor. John Gruber crea
Document(page_content='Markdown is widely used in blogging, instant messaging, online forums, collaborative software, documentation pages, and readme ﬁles.', m
Document(page_content='## Rise and divergence \nAs Markdown popularity grew rapidly, many Markdown implementations appeared, driven mostly by the need fo
Document(page_content='#### Standardization \nFrom 2012, a group of people, including Jeﬀ Atwood and John MacFarlane, launched what Atwood characterised
Document(page_content='## Implementations \nImplementations of Markdown are available for over a dozen programming languages.', metadata={'Header 1': 'Intro

Help us out by providing feedback on this documentation page:

Previous
« Split code
Next
Recursively split JSON »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
OpenAI
ModulesAgentsAgent Typesfunctions

On this page

OpenAI functions
CAUTION

OpenAI API has deprecated functions in favor of tools. The diﬀerence between the two is that thetools API allows the model to
request that multiple functions be invoked at once, which can reduce response times in some architectures. It’s
recommended to use the tools agent for OpenAI models.

See the following links for more information:

OpenAI Tools

OpenAI chat create

OpenAI function calling

Certain OpenAI models (like gpt-3.5-turbo-0613 and gpt-4-0613) have been ﬁne-tuned to detect when a function should be
called and respond with the inputs that should be passed to the function. In an API call, you can describe functions and have
the model intelligently choose to output a JSON object containing arguments to call those functions. The goal of the OpenAI
Function APIs is to more reliably return valid and useful function calls than a generic text completion or chat API.

A number of open source models have adopted the same format for function calls and have also ﬁne-tuned the model to
detect when a function should be called.

The OpenAI Functions Agent is designed to work with these models.

Install openai, tavily-python packages which are required as the LangChain packages call them internally.

TIP

The functions format remains relevant for open source models and providers that have adopted it, and this agent is expected
to work for such models.

%pip install --upgrade --quiet langchain-openai tavily-python

Initialize Tools

We will ﬁrst create some tools we can use

from langchain import hub

from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI
tools = [TavilySearchResults(max_results=1)]

Create Agent

# Get the prompt to use - you can modify this!

prompt = hub.pull("hwchase17/openai-functions-agent")
prompt.messages
[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are a helpful assistant')),
MessagesPlaceholder(variable_name='chat_history', optional=True),
HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], template='{input}')),
MessagesPlaceholder(variable_name='agent_scratchpad')]
# Choose the LLM that will drive the agent
llm = ChatOpenAI(model="gpt-3.5-turbo-1106")

# Construct the OpenAI Functions agent

agent = create_openai_functions_agent(llm, tools, prompt)

Run Agent

# Create an agent executor by passing in the agent and tools

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "what is LangChain?"})

> Entering new AgentExecutor chain...

Invoking: `tavily_search_results_json` with `{'query': 'LangChain'}`

[{'url': 'https://www.ibm.com/topics/langchain', 'content': 'LangChain is essentially a library of abstractions for Python and Javascript, representing common steps and c

> Finished chain.

{'input': 'what is LangChain?',

'output': 'LangChain is a tool for building applications using large language models (LLMs) like chatbots and virtual agents. It simpliﬁes the process of programming a

Using with chat history

from langchain_core.messages import AIMessage, HumanMessage

agent_executor.invoke(
{
"input": "what's my name?",
"chat_history": [
HumanMessage(content="hi! my name is bob"),
AIMessage(content="Hello Bob! How can I assist you today?"),
],
}
)

> Entering new AgentExecutor chain...

Your name is Bob.

> Finished chain.

{'input': "what's my name?",
'chat_history': [HumanMessage(content='hi! my name is bob'),
AIMessage(content='Hello Bob! How can I assist you today?')],
'output': 'Your name is Bob.'}

Help us out by providing feedback on this documentation page:

Previous
« Agent Types
Next
OpenAI tools »

Community
Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Model Example Selector Select by
ModulesI/O Prompts Types similarity

Select by similarity
This object selects examples based on similarity to the inputs. It does this by ﬁnding the examples with the embeddings that
have the greatest cosine similarity with the inputs.

from langchain.prompts import FewShotPromptTemplate, PromptTemplate

from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

example_prompt = PromptTemplate(
input_variables=["input", "output"],
template="Input: {input}\nOutput: {output}",
)

# Examples of a pretend task of creating antonyms.

examples = [
{"input": "happy", "output": "sad"},
{"input": "tall", "output": "short"},
{"input": "energetic", "output": "lethargic"},
{"input": "sunny", "output": "gloomy"},
{"input": "windy", "output": "calm"},
]
example_selector = SemanticSimilarityExampleSelector.from_examples(
# The list of examples available to select from.
examples,
# The embedding class used to produce embeddings which are used to measure semantic similarity.
OpenAIEmbeddings(),
# The VectorStore class that is used to store the embeddings and do a similarity search over.
Chroma,
# The number of examples to produce.
k=1,
)
similar_prompt = FewShotPromptTemplate(
# We provide an ExampleSelector instead of examples.
example_selector=example_selector,
example_prompt=example_prompt,
preﬁx="Give the antonym of every input",
suﬃx="Input: {adjective}\nOutput:",
input_variables=["adjective"],
)
# Input is a feeling, so should select the happy/sad example
print(similar_prompt.format(adjective="worried"))
Give the antonym of every input

Input: happy
Output: sad

Input: worried
Output:
# Input is a measurement, so should select the tall/short example
print(similar_prompt.format(adjective="large"))
Give the antonym of every input

Input: tall
Output: short

Input: large
Output:
# You can add new examples to the SemanticSimilarityExampleSelector as well
similar_prompt.example_selector.add_example(
{"input": "enthusiastic", "output": "apathetic"}
)
print(similar_prompt.format(adjective="passionate"))
Give the antonym of every input

Input: enthusiastic
Output: apathetic

Input: passionate
Output:

Help us out by providing feedback on this documentation page:

Previous
« Select by n-gram overlap
Next
Example selectors »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Self-ask with
ModulesAgentsAgent Typessearch

On this page

Self-ask with search

This walkthrough showcases the self-ask with search agent.

from langchain import hub

from langchain.agents import AgentExecutor, create_self_ask_with_search_agent
from langchain_community.llms import Fireworks
from langchain_community.tools.tavily_search import TavilyAnswer

Initialize Tools

We will initialize the tools we want to use. This is a good tool because it gives usanswers (not documents)

For this agent, only one tool can be used and it needs to be named “Intermediate Answer”

tools = [TavilyAnswer(max_results=1, name="Intermediate Answer")]

Create Agent

# Get the prompt to use - you can modify this!

prompt = hub.pull("hwchase17/self-ask-with-search")
# Choose the LLM that will drive the agent
llm = Fireworks()

# Construct the Self Ask With Search Agent

agent = create_self_ask_with_search_agent(llm, tools, prompt)

Run Agent

# Create an agent executor by passing in the agent and tools

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke(
{"input": "What is the hometown of the reigning men's U.S. Open champion?"}
)

> Entering new AgentExecutor chain...

Yes.
Follow up: Who is the reigning men's U.S. Open champion?The reigning men's U.S. Open champion is Novak Djokovic. He won his 24th Grand Slam singles title by
So the ﬁnal answer is: Novak Djokovic.

> Finished chain.

{'input': "What is the hometown of the reigning men's U.S. Open champion?",
'output': 'Novak Djokovic.'}

Help us out by providing feedback on this documentation page:

Previous
« ReAct
Next
OpenAI assistants »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O Chat ModelsCaching

On this page

Caching
LangChain provides an optional caching layer for chat models. This is useful for two reasons:

It can save you money by reducing the number of API calls you make to the LLM provider, if you’re often requesting the same
completion multiple times. It can speed up your application by reducing the number of API calls you make to the LLM
provider.

from langchain.globals import set_llm_cache

from langchain_openai import ChatOpenAI

llm = ChatOpenAI()

In Memory Cache

%%time
from langchain.cache import InMemoryCache

set_llm_cache(InMemoryCache())

# The ﬁrst time, it is not yet in cache, so it should take longer

llm.predict("Tell me a joke")
CPU times: user 17.7 ms, sys: 9.35 ms, total: 27.1 ms
Wall time: 801 ms
"Sure, here's a classic one for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
%%time
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
CPU times: user 1.42 ms, sys: 419 µs, total: 1.83 ms
Wall time: 1.83 ms
"Sure, here's a classic one for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"

SQLite Cache

!rm .langchain.db
# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache

set_llm_cache(SQLiteCache(database_path=".langchain.db"))
%%time
# The ﬁrst time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")
CPU times: user 23.2 ms, sys: 17.8 ms, total: 40.9 ms
Wall time: 592 ms
"Sure, here's a classic one for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
%%time
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
CPU times: user 5.61 ms, sys: 22.5 ms, total: 28.1 ms
Wall time: 47.5 ms
"Sure, here's a classic one for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"

Help us out by providing feedback on this documentation page:

Previous
« Function calling
Next
Custom Chat Model »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Memory Conversation Summary
ModulesMoreMemorytypes Buﬀer

On this page

Conversation Summary Buﬀer

ConversationSummaryBufferMemory combines the two ideas. It keeps a buffer of recent interactions in memory, but rather than just
completely flushing old interactions it compiles them into a summary and uses both. It uses token length rather than number
of interactions to determine when to flush interactions.

Let’s ﬁrst walk through how to use the utilities.

Using memory with LLM

from langchain.memory import ConversationSummaryBuﬀerMemory

from langchain_openai import OpenAI

llm = OpenAI()
memory = ConversationSummaryBuﬀerMemory(llm=llm, max_token_limit=10)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
memory.load_memory_variables({})
{'history': 'System: \nThe human says "hi", and the AI responds with "whats up".\nHuman: not much you\nAI: not much'}

We can also get the history as a list of messages (this is useful if you are using this with a chat model).

memory = ConversationSummaryBuﬀerMemory(
llm=llm, max_token_limit=10, return_messages=True
)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

We can also utilize the predict_new_summary method directly.

messages = memory.chat_memory.messages
previous_summary = ""
memory.predict_new_summary(messages, previous_summary)
'\nThe human and AI state that they are not doing much.'

Using in a chain

Let’s walk through an example, again setting verbose=True so we can see the prompt.

from langchain.chains import ConversationChain

conversation_with_summary = ConversationChain(
llm=llm,
# We set a very low max_token_limit for the purposes of testing.
memory=ConversationSummaryBuﬀerMemory(llm=OpenAI(), max_token_limit=40),
verbose=True,
)
conversation_with_summary.predict(input="Hi, what's up?")
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know th

Current conversation:

Human: Hi, what's up?

AI:

> Finished chain.

" Hi there! I'm doing great. I'm learning about the latest advances in artiﬁcial intelligence. What about you?"
conversation_with_summary.predict(input="Just working on writing some documentation!")

> Entering new ConversationChain chain...

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know th

Current conversation:
Human: Hi, what's up?
AI: Hi there! I'm doing great. I'm spending some time learning about the latest developments in AI technology. How about you?
Human: Just working on writing some documentation!
AI:

> Finished chain.

' That sounds like a great use of your time. Do you have experience with writing documentation?'
# We can see here that there is a summary of the conversation and then some previous interactions
conversation_with_summary.predict(input="For LangChain! Have you heard of it?")

> Entering new ConversationChain chain...

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know th

Current conversation:
System:
The human asked the AI what it was up to and the AI responded that it was learning about the latest developments in AI technology.
Human: Just working on writing some documentation!
AI: That sounds like a great use of your time. Do you have experience with writing documentation?
Human: For LangChain! Have you heard of it?
AI:

> Finished chain.

" No, I haven't heard of LangChain. Can you tell me more about it?"
# We can see here that the summary and the buﬀer are updated
conversation_with_summary.predict(
input="Haha nope, although a lot of people confuse it for that"
)

> Entering new ConversationChain chain...

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know th

Current conversation:
System:
The human asked the AI what it was up to and the AI responded that it was learning about the latest developments in AI technology. The human then mentioned they
Human: For LangChain! Have you heard of it?
AI: No, I haven't heard of LangChain. Can you tell me more about it?
Human: Haha nope, although a lot of people confuse it for that
AI:

> Finished chain.

' Oh, okay. What is LangChain?'

Help us out by providing feedback on this documentation page:

Previous
« Conversation Summary
Next
Conversation Token Buﬀer »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression
Language CookbookAgents

Agents
You can pass a Runnable into an agent. Make sure you havelangchainhub installed: pip install langchainhub

from langchain import hub

from langchain.agents import AgentExecutor, tool
from langchain.agents.output_parsers import XMLAgentOutputParser
from langchain_community.chat_models import ChatAnthropic
model = ChatAnthropic(model="claude-2")
@tool
def search(query: str) -> str:
"""Search things about current events."""
return "32 degrees"
tool_list = [search]
# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/xml-agent-convo")
# Logic for going from intermediate steps to a string to pass into model
# This is pretty tied to the prompt
def convert_intermediate_steps(intermediate_steps):
log = ""
for action, observation in intermediate_steps:
log += (
f"<tool>{action.tool}</tool><tool_input>{action.tool_input}"
f"</tool_input><observation>{observation}</observation>"
)
return log

# Logic for converting tools to string to go in prompt

def convert_tools(tools):
return "\n".join([f"{tool.name}: {tool.description}" for tool in tools])

Building an agent from a runnable usually involves a few things:

1. Data processing for the intermediate steps. These need to be represented in a way that the language model can
recognize them. This should be pretty tightly coupled to the instructions in the prompt

2. The prompt itself

3. The model, complete with stop tokens if needed

4. The output parser - should be in sync with how the prompt speciﬁes things to be formatted.

agent = (
{
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: convert_intermediate_steps(
x["intermediate_steps"]
),
}
| prompt.partial(tools=convert_tools(tool_list))
| model.bind(stop=["</tool_input>", "</ﬁnal_answer>"])
| XMLAgentOutputParser()
)
agent_executor = AgentExecutor(agent=agent, tools=tool_list, verbose=True)
agent_executor.invoke({"input": "whats the weather in New york?"})

> Entering new AgentExecutor chain...

<tool>search</tool><tool_input>weather in New York32 degrees <tool>search</tool>
<tool_input>weather in New York32 degrees <ﬁnal_answer>The weather in New York is 32 degrees

> Finished chain.

{'input': 'whats the weather in New york?',
'output': 'The weather in New York is 32 degrees'}
Help us out by providing feedback on this documentation page:

Previous
« Querying a SQL DB
Next
Code writing »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How RunnableLambda: Run Custom
Language to Functions

On this page

Run custom functions

You can use arbitrary functions in the pipeline.

Note that all inputs to these functions need to be a SINGLE argument. If you have a function that accepts multiple
arguments, you should write a wrapper that accepts a single input and unpacks it into multiple argument.

%pip install –upgrade –quiet langchain langchain-openai

from operator import itemgetter

from langchain_core.prompts import ChatPromptTemplate

from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI

def length_function(text):
return len(text)

def _multiple_length_function(text1, text2):

return len(text1) * len(text2)

def multiple_length_function(_dict):
return _multiple_length_function(_dict["text1"], _dict["text2"])

prompt = ChatPromptTemplate.from_template("what is {a} + {b}")

model = ChatOpenAI()

chain1 = prompt | model

chain = (
{
"a": itemgetter("foo") | RunnableLambda(length_function),
"b": {"text1": itemgetter("foo"), "text2": itemgetter("bar")}
| RunnableLambda(multiple_length_function),
}
| prompt
| model
)
chain.invoke({"foo": "bar", "bar": "gah"})
AIMessage(content='3 + 9 equals 12.')

Accepting a Runnable Conﬁg

Runnable lambdas can optionally accept a RunnableConﬁg, which they can use to pass callbacks, tags, and other
conﬁguration information to nested runs.

from langchain_core.output_parsers import StrOutputParser

from langchain_core.runnables import RunnableConﬁg
import json

def parse_or_fix(text: str, config: RunnableConfig):

fixing_chain = (
ChatPromptTemplate.from_template(
"Fix the following text:\n\n```text\n{input}\n```\nError: {error}"
" Don't narrate, just respond with the fixed data."
)
| ChatOpenAI()
| StrOutputParser()
)
for _ in range(3):
try:
return json.loads(text)
except Exception as e:
text = fixing_chain.invoke({"input": text, "error": e}, config)
return "Failed to parse"
from langchain.callbacks import get_openai_callback

with get_openai_callback() as cb:

output = RunnableLambda(parse_or_ﬁx).invoke(
"{foo: bar}", {"tags": ["my-tag"], "callbacks": [cb]}
)
print(output)
print(cb)
{'foo': 'bar'}
Tokens Used: 65
Prompt Tokens: 56
Completion Tokens: 9
Successful Requests: 1
Total Cost (USD): $0.00010200000000000001

Help us out by providing feedback on this documentation page:

Previous
« RunnablePassthrough: Passing data through
Next
RunnableBranch: Dynamically route logic based on input »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model Few-shot prompt
ModulesI/O Prompts templates

On this page

Few-shot prompt templates

In this tutorial, we’ll learn how to create a prompt template that uses few-shot examples. A few-shot prompt template can be
constructed from either a set of examples, or from an Example Selector object.

Use Case

In this tutorial, we’ll conﬁgure few-shot examples for self-ask with search.

Using an example set

Create the example set

To get started, create a list of few-shot examples. Each example should be a dictionary with the keys being the input
variables and the values being the values for those input variables.
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

examples = [
{
"question": "Who lived longer, Muhammad Ali or Alan Turing?",
"answer": """
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
""",
},
{
"question": "When was the founder of craigslist born?",
"answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952
""",
},
{
"question": "Who was the maternal grandfather of George Washington?",
"answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
""",
},
{
"question": "Are both the directors of Jaws and Casino Royale from the same country?",
"answer": """
Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the final answer is: No
""",
},
]

Create a formatter for the few-shot examples

Conﬁgure a formatter that will format the few-shot examples into a string. This formatter should be aPromptTemplate object.

example_prompt = PromptTemplate(
input_variables=["question", "answer"], template="Question: {question}\n{answer}"
)

print(example_prompt.format(**examples[0]))
Question: Who lived longer, Muhammad Ali or Alan Turing?

Are follow up questions needed here: Yes.

Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the ﬁnal answer is: Muhammad Ali

Feed examples and formatter to FewShotPromptTemplate

Finally, create a FewShotPromptTemplate object. This object takes in the few-shot examples and the formatter for the few-shot
examples.
prompt = FewShotPromptTemplate(
examples=examples,
example_prompt=example_prompt,
suﬃx="Question: {input}",
input_variables=["input"],
)

print(prompt.format(input="Who was the father of Mary Ball Washington?"))

Question: Who lived longer, Muhammad Ali or Alan Turing?

Are follow up questions needed here: Yes.

Question: When was the founder of craigslist born?

Are follow up questions needed here: Yes.

Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the ﬁnal answer is: December 6, 1952

Question: Who was the maternal grandfather of George Washington?

Are follow up questions needed here: Yes.

Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the ﬁnal answer is: Joseph Ball

Question: Are both the directors of Jaws and Casino Royale from the same country?

Are follow up questions needed here: Yes.

Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the ﬁnal answer is: No

Question: Who was the father of Mary Ball Washington?

Using an example selector

Feed examples into ExampleSelector

We will reuse the example set and the formatter from the previous section. However, instead of feeding the examples directly
into the FewShotPromptTemplate object, we will feed them into an ExampleSelector object.

In this tutorial, we will use theSemanticSimilarityExampleSelector class. This class selects few-shot examples based on their
similarity to the input. It uses an embedding model to compute the similarity between the input and the few-shot examples, as
well as a vector store to perform the nearest neighbor search.
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

example_selector = SemanticSimilarityExampleSelector.from_examples(
# This is the list of examples available to select from.
examples,
# This is the embedding class used to produce embeddings which are used to measure semantic similarity.
OpenAIEmbeddings(),
# This is the VectorStore class that is used to store the embeddings and do a similarity search over.
Chroma,
# This is the number of examples to produce.
k=1,
)

# Select the most similar example to the input.

question = "Who was the father of Mary Ball Washington?"
selected_examples = example_selector.select_examples({"question": question})
print(f"Examples most similar to the input: {question}")
for example in selected_examples:
print("\n")
for k, v in example.items():
print(f"{k}: {v}")
Examples most similar to the input: Who was the father of Mary Ball Washington?

answer:
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the ﬁnal answer is: Joseph Ball

question: Who was the maternal grandfather of George Washington?

Feed example selector into FewShotPromptTemplate

Finally, create a FewShotPromptTemplate object. This object takes in the example selector and the formatter for the few-shot
examples.

prompt = FewShotPromptTemplate(
example_selector=example_selector,
example_prompt=example_prompt,
suﬃx="Question: {input}",
input_variables=["input"],
)

print(prompt.format(input="Who was the father of Mary Ball Washington?"))

Question: Who was the maternal grandfather of George Washington?

Are follow up questions needed here: Yes.

Question: Who was the father of Mary Ball Washington?

Help us out by providing feedback on this documentation page:

Previous
« Example selectors
Next
Few-shot examples for chat models »
Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Deﬁning Custom
ModulesAgentsToolsTools

On this page

Deﬁning Custom Tools

When constructing your own agent, you will need to provide it with a list of Tools that it can use. Besides the actual function
that is called, the Tool consists of several components:

name (str), is required and must be unique within a set of tools provided to an agent
description (str), is optional but recommended, as it is used by an agent to determine tool use
args_schema (Pydantic BaseModel), is optional but recommended, can be used to provide more information (e.g., few-
shot examples) or validation for expected parameters.

There are multiple ways to deﬁne a tool. In this guide, we will walk through how to do for two functions:

1. A made up search function that always returns the string “LangChain”

2. A multiplier function that will multiply two numbers by eachother

The biggest difference here is that the first function only requires one input, while the second one requires multiple. Many
agents only work with functions that require single inputs, so it’s important to know how to work with those. For the most part,
defining these custom tools is the same, but there are some differences.

# Import things that are needed generically

from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools import BaseTool, StructuredTool, tool

@tool decorator

This @tool decorator is the simplest way to deﬁne a custom tool. The decorator uses the function name as the tool name by
default, but this can be overridden by passing a string as the ﬁrst argument. Additionally, the decorator will use the function’s
docstring as the tool’s description - so a docstring MUST be provided.

@tool
def search(query: str) -> str:
"""Look up things online."""
return "LangChain"
print(search.name)
print(search.description)
print(search.args)
search
search(query: str) -> str - Look up things online.
{'query': {'title': 'Query', 'type': 'string'}}
@tool
def multiply(a: int, b: int) -> int:
"""Multiply two numbers."""
return a * b
print(multiply.name)
print(multiply.description)
print(multiply.args)
multiply
multiply(a: int, b: int) -> int - Multiply two numbers.
{'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}

You can also customize the tool name and JSON args by passing them into the tool decorator.
class SearchInput(BaseModel):
query: str = Field(description="should be a search query")

@tool("search-tool", args_schema=SearchInput, return_direct=True)

def search(query: str) -> str:
"""Look up things online."""
return "LangChain"
print(search.name)
print(search.description)
print(search.args)
print(search.return_direct)
search-tool
search-tool(query: str) -> str - Look up things online.
{'query': {'title': 'Query', 'description': 'should be a search query', 'type': 'string'}}
True

Subclass BaseTool

You can also explicitly deﬁne a custom tool by subclassing the BaseTool class. This provides maximal control over the tool
deﬁnition, but is a bit more work.

from typing import Optional, Type

from langchain.callbacks.manager import (

AsyncCallbackManagerForToolRun,
CallbackManagerForToolRun,
)

class SearchInput(BaseModel):
query: str = Field(description="should be a search query")

class CalculatorInput(BaseModel):
a: int = Field(description="ﬁrst number")
b: int = Field(description="second number")

class CustomSearchTool(BaseTool):
name = "custom_search"
description = "useful for when you need to answer questions about current events"
args_schema: Type[BaseModel] = SearchInput

def _run(
self, query: str, run_manager: Optional[CallbackManagerForToolRun] = None
) -> str:
"""Use the tool."""
return "LangChain"

async def _arun(

self, query: str, run_manager: Optional[AsyncCallbackManagerForToolRun] = None
) -> str:
"""Use the tool asynchronously."""
raise NotImplementedError("custom_search does not support async")

class CustomCalculatorTool(BaseTool):
name = "Calculator"
description = "useful for when you need to answer questions about math"
args_schema: Type[BaseModel] = CalculatorInput
return_direct: bool = True

def _run(
self, a: int, b: int, run_manager: Optional[CallbackManagerForToolRun] = None
) -> str:
"""Use the tool."""
return a * b

async def _arun(

self,
a: int,
b: int,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
"""Use the tool asynchronously."""
raise NotImplementedError("Calculator does not support async")
search = CustomSearchTool()
print(search.name)
print(search.description)
print(search.args)
custom_search
useful for when you need to answer questions about current events
{'query': {'title': 'Query', 'description': 'should be a search query', 'type': 'string'}}
multiply = CustomCalculatorTool()
print(multiply.name)
print(multiply.description)
print(multiply.args)
print(multiply.return_direct)
Calculator
useful for when you need to answer questions about math
{'a': {'title': 'A', 'description': 'ﬁrst number', 'type': 'integer'}, 'b': {'title': 'B', 'description': 'second number', 'type': 'integer'}}
True

StructuredTool dataclass

You can also use a StructuredTool dataclass. This methods is a mix between the previous two. It’s more convenient than
inheriting from the BaseTool class, but provides more functionality than just using a decorator.

def search_function(query: str):

return "LangChain"

search = StructuredTool.from_function(
func=search_function,
name="Search",
description="useful for when you need to answer questions about current events",
# coroutine= ... <- you can specify an async method if desired as well
)
print(search.name)
print(search.description)
print(search.args)
Search
Search(query: str) - useful for when you need to answer questions about current events
{'query': {'title': 'Query', 'type': 'string'}}

You can also deﬁne a custom args_schema to provide more information about inputs.

class CalculatorInput(BaseModel):
a: int = Field(description="ﬁrst number")
b: int = Field(description="second number")

def multiply(a: int, b: int) -> int:

"""Multiply two numbers."""
return a * b

calculator = StructuredTool.from_function(
func=multiply,
name="Calculator",
description="multiply numbers",
args_schema=CalculatorInput,
return_direct=True,
# coroutine= ... <- you can specify an async method if desired as well
)
print(calculator.name)
print(calculator.description)
print(calculator.args)
Calculator
Calculator(a: int, b: int) -> int - multiply numbers
{'a': {'title': 'A', 'description': 'ﬁrst number', 'type': 'integer'}, 'b': {'title': 'B', 'description': 'second number', 'type': 'integer'}}

Handling Tool Errors

When a tool encounters an error and the exception is not caught, the agent will stop executing. If you want the agent to
continue execution, you can raise a ToolException and set handle_tool_error accordingly.

When ToolException is thrown, the agent will not stop working, but will handle the exception according to thehandle_tool_error
variable of the tool, and the processing result will be returned to the agent as observation, and printed in red.
You can set handle_tool_error to True, set it a uniﬁed string value, or set it as a function. If it’s set as a function, the function
should take a ToolException as a parameter and return a str value.

Please note that only raising a ToolException won’t be eﬀective. You need to ﬁrst set thehandle_tool_error of the tool because its
default value is False.

from langchain_core.tools import ToolException

def search_tool1(s: str):

raise ToolException("The search tool1 is not available.")

First, let’s see what happens if we don’t sethandle_tool_error - it will error.

search = StructuredTool.from_function(
func=search_tool1,
name="Search_tool1",
description="A bad tool",
)

search.run("test")
ToolException: The search tool1 is not available.

Now, let’s set handle_tool_error to be True

search = StructuredTool.from_function(
func=search_tool1,
name="Search_tool1",
description="A bad tool",
handle_tool_error=True,
)

search.run("test")
'The search tool1 is not available.'

We can also deﬁne a custom way to handle the tool error

def _handle_error(error: ToolException) -> str:

return (
"The following errors occurred during tool execution:"
+ error.args[0]
+ "Please try another tool."
)

search = StructuredTool.from_function(
func=search_tool1,
name="Search_tool1",
description="A bad tool",
handle_tool_error=_handle_error,
)

search.run("test")
'The following errors occurred during tool execution:The search tool1 is not available.Please try another tool.'

Help us out by providing feedback on this documentation page:

Previous
« Toolkits
Next
Tools as OpenAI Functions »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Modules

On this page

Modules
LangChain provides standard, extendable interfaces and external integrations for the following main modules:

Model I/O

Interface with language models

Retrieval

Interface with application-speciﬁc data

Agents

Let chains choose which tools to use given high-level directives

Additional

Chains

Common, building block compositions

Memory

Persist application state between runs of a chain

Callbacks

Log and stream intermediate steps of any chain

Help us out by providing feedback on this documentation page:

Previous
« LangChain Expression Language (LCEL)
Next
Model I/O »

Community

Discord
Twitter
GitHub
Python
JS/TS
More

Homepage

Blog
YouTube
Model Example Selector
ModulesI/O Prompts Types

Example Selector Types

Name Description
Similarity Uses semantic similarity between inputs and examples to decide which examples to choose.
Uses Max Marginal Relevance between inputs and examples to decide which examples to
MMR
choose.
Length Selects examples based on how many can ﬁt within a certain length
Ngram Uses ngram overlap between inputs and examples to decide which examples to choose.

Help us out by providing feedback on this documentation page:

Previous
« Composition
Next
Select by length »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Model
ModulesI/O Chat Models

On this page

Chat Models
Chat Models are a core component of LangChain.

A chat model is a language model that uses chat messages as inputs and returns chat messages as outputs (as opposed to
using plain text).

LangChain has integrations with many model providers (OpenAI, Cohere, Hugging Face, etc.) and exposes a standard
interface to interact with all of these models.

LangChain allows you to use models in sync, async, batching and streaming modes and provides other features (e.g.,
caching) and more.

Quick Start

Check out this quick start to get an overview of working with ChatModels, including all the diﬀerent methods they expose

Integrations

For a full list of all LLM integrations that LangChain provides, please go to theIntegrations page

How-To Guides

We have several how-to guides for more advanced usage of LLMs. This includes:

How to cache ChatModel responses

How to use ChatModels that support function calling
How to stream responses from a ChatModel
How to track token usage in a ChatModel call
How to creat a custom ChatModel

Help us out by providing feedback on this documentation page:

Previous
« Pipeline
Next
Quick Start »

Community
Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Memory in
ModulesMoreMemoryLLMChain

On this page

Memory in LLMChain
This notebook goes over how to use the Memory class with anLLMChain.

We will add the ConversationBuﬀerMemory class, although this can be any memory class.

from langchain.chains import LLMChain

from langchain.memory import ConversationBuﬀerMemory
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI

The most important step is setting up the prompt correctly. In the below prompt, we have two input keys: one for the actual
input, another for the input from the Memory class. Importantly, we make sure the keys in the PromptTemplate and the
ConversationBuﬀerMemory match up (chat_history).

template = """You are a chatbot having a conversation with a human.

{chat_history}
Human: {human_input}
Chatbot:"""

prompt = PromptTemplate(
input_variables=["chat_history", "human_input"], template=template
)
memory = ConversationBuﬀerMemory(memory_key="chat_history")
llm = OpenAI()
llm_chain = LLMChain(
llm=llm,
prompt=prompt,
verbose=True,
memory=memory,
)
llm_chain.predict(human_input="Hi there my friend")

> Entering new LLMChain chain...

Prompt after formatting:
You are a chatbot having a conversation with a human.

Human: Hi there my friend

Chatbot:

> Finished chain.

' Hi there! How can I help you today?'
llm_chain.predict(human_input="Not too bad - how are you?")

> Entering new LLMChain chain...

Prompt after formatting:
You are a chatbot having a conversation with a human.

Human: Hi there my friend

AI: Hi there! How can I help you today?
Human: Not too bad - how are you?
Chatbot:

> Finished chain.

" I'm doing great, thanks for asking! How are you doing?"

Adding Memory to a chat model-based LLMChain

The above works for completion-style LLMs, but if you are using a chat model, you will likely get better performance using
structured chat messages. Below is an example.

from langchain.prompts import (

ChatPromptTemplate,
HumanMessagePromptTemplate,
MessagesPlaceholder,
)
from langchain_core.messages import SystemMessage
from langchain_openai import ChatOpenAI

We will use the ChatPromptTemplate class to set up the chat prompt.

The from_messages method creates a ChatPromptTemplate from a list of messages (e.g., SystemMessage, HumanMessage , AIMessage,
ChatMessage, etc.) or message templates, such as the MessagesPlaceholder below.

The conﬁguration below makes it so the memory will be injected to the middle of the chat prompt, in thechat_history key, and
the user’s inputs will be added in a human/user message to the end of the chat prompt.

prompt = ChatPromptTemplate.from_messages(
[
SystemMessage(
content="You are a chatbot having a conversation with a human."
), # The persistent system prompt
MessagesPlaceholder(
variable_name="chat_history"
), # Where the memory will be stored.
HumanMessagePromptTemplate.from_template(
"{human_input}"
), # Where the human input will injected
]
)

memory = ConversationBuﬀerMemory(memory_key="chat_history", return_messages=True)

llm = ChatOpenAI()

chat_llm_chain = LLMChain(
llm=llm,
prompt=prompt,
verbose=True,
memory=memory,
)
chat_llm_chain.predict(human_input="Hi there my friend")

> Entering new LLMChain chain...

Prompt after formatting:
System: You are a chatbot having a conversation with a human.
Human: Hi there my friend

> Finished chain.

'Hello! How can I assist you today, my friend?'
chat_llm_chain.predict(human_input="Not too bad - how are you?")

> Entering new LLMChain chain...

Prompt after formatting:
System: You are a chatbot having a conversation with a human.
Human: Hi there my friend
AI: Hello! How can I assist you today, my friend?
Human: Not too bad - how are you?

> Finished chain.

"I'm an AI chatbot, so I don't have feelings, but I'm here to help and chat with you! Is there something speciﬁc you would like to talk about or any questions I can assis

Help us out by providing feedback on this documentation page:

Previous
« [Beta] Memory
Next
Memory in the Multi-Input Chain »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O Prompts

On this page

Prompts
A prompt for a language model is a set of instructions or input provided by a user to guide the model's response, helping it
understand the context and generate relevant and coherent language-based output, such as answering questions,
completing sentences, or engaging in a conversation.

Quickstart

This quick start provides a basic overview of how to work with prompts.

How-To Guides

We have many how-to guides for working with prompts. These include:

How to use few-shot examples with LLMs

How to use few-shot examples with chat models
How to use example selectors
How to partial prompts
How to work with message prompts
How to compose prompts together
How to create a pipeline prompt

Example Selector Types

LangChain has a few diﬀerent types of example selectors you can use oﬀ the shelf. You can explore those typeshere

Help us out by providing feedback on this documentation page:

Previous
« Concepts
Next
Quick Start »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Text Split
ModulesRetrievalSplitters code

On this page

Split code
CodeTextSplitter allows you to split your code with multiple languages supported. Import enumLanguage and specify the
language.

%pip install -qU langchain-text-splitters

from langchain_text_splitters import (
Language,
RecursiveCharacterTextSplitter,
)
# Full list of supported languages
[e.value for e in Language]
['cpp',
'go',
'java',
'kotlin',
'js',
'ts',
'php',
'proto',
'python',
'rst',
'ruby',
'rust',
'scala',
'swift',
'markdown',
'latex',
'html',
'sol',
'csharp',
'cobol']
# You can also see the separators used for a given language
RecursiveCharacterTextSplitter.get_separators_for_language(Language.PYTHON)
['\nclass ', '\ndef ', '\n\tdef ', '\n\n', '\n', ' ', '']

Python

Here’s an example using the PythonTextSplitter:

PYTHON_CODE = """
def hello_world():
print("Hello, World!")

# Call the function

hello_world()
"""
python_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.PYTHON, chunk_size=50, chunk_overlap=0
)
python_docs = python_splitter.create_documents([PYTHON_CODE])
python_docs
[Document(page_content='def hello_world():\n print("Hello, World!")'),
Document(page_content='# Call the function\nhello_world()')]

JS
Here’s an example using the JS text splitter:

JS_CODE = """
function helloWorld() {
console.log("Hello, World!");
}

// Call the function

helloWorld();
"""

js_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.JS, chunk_size=60, chunk_overlap=0
)
js_docs = js_splitter.create_documents([JS_CODE])
js_docs
[Document(page_content='function helloWorld() {\n console.log("Hello, World!");\n}'),
Document(page_content='// Call the function\nhelloWorld();')]

Here’s an example using the TS text splitter:

TS_CODE = """
function helloWorld(): void {
console.log("Hello, World!");
}

// Call the function

helloWorld();
"""

ts_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.TS, chunk_size=60, chunk_overlap=0
)
ts_docs = ts_splitter.create_documents([TS_CODE])
ts_docs
[Document(page_content='function helloWorld(): void {'),
Document(page_content='console.log("Hello, World!");\n}'),
Document(page_content='// Call the function\nhelloWorld();')]

Markdown

Here’s an example using the Markdown text splitter:

markdown_text = """
# ️ LangChain

⚡ Building applications with LLMs through composability ⚡

## Quick Install

```bash
# Hopefully this code block isn't split
pip install langchain
```

As an open-source project in a rapidly developing ﬁeld, we are extremely open to contributions.

"""
md_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.MARKDOWN, chunk_size=60, chunk_overlap=0
)
md_docs = md_splitter.create_documents([markdown_text])
md_docs
[Document(page_content='# ️ LangChain'),
Document(page_content='⚡ Building applications with LLMs through composability ⚡'),
Document(page_content='## Quick Install\n\n```bash'),
Document(page_content="# Hopefully this code block isn't split"),
Document(page_content='pip install langchain'),
Document(page_content='```'),
Document(page_content='As an open-source project in a rapidly developing ﬁeld, we'),
Document(page_content='are extremely open to contributions.')]
Latex

Here’s an example on Latex text:

latex_text = """
\documentclass{article}

\begin{document}

\maketitle

\section{Introduction}
Large language models (LLMs) are a type of machine learning model that can be trained on vast amounts of text data to generate human-like language. In recent yea

\subsection{History of LLMs}
The earliest LLMs were developed in the 1980s and 1990s, but they were limited by the amount of data that could be processed and the computational power availab

\subsection{Applications of LLMs}
LLMs have many applications in industry, including chatbots, content creation, and virtual assistants. They can also be used in academia for research in linguistics, p

\end{document}
"""

latex_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.MARKDOWN, chunk_size=60, chunk_overlap=0
)
latex_docs = latex_splitter.create_documents([latex_text])
latex_docs
[Document(page_content='\\documentclass{article}\n\n\x08egin{document}\n\n\\maketitle'),
Document(page_content='\\section{Introduction}'),
Document(page_content='Large language models (LLMs) are a type of machine learning'),
Document(page_content='model that can be trained on vast amounts of text data to'),
Document(page_content='generate human-like language. In recent years, LLMs have'),
Document(page_content='made signiﬁcant advances in a variety of natural language'),
Document(page_content='processing tasks, including language translation, text'),
Document(page_content='generation, and sentiment analysis.'),
Document(page_content='\\subsection{History of LLMs}'),
Document(page_content='The earliest LLMs were developed in the 1980s and 1990s,'),
Document(page_content='but they were limited by the amount of data that could be'),
Document(page_content='processed and the computational power available at the'),
Document(page_content='time. In the past decade, however, advances in hardware and'),
Document(page_content='software have made it possible to train LLMs on massive'),
Document(page_content='datasets, leading to signiﬁcant improvements in'),
Document(page_content='performance.'),
Document(page_content='\\subsection{Applications of LLMs}'),
Document(page_content='LLMs have many applications in industry, including'),
Document(page_content='chatbots, content creation, and virtual assistants. They'),
Document(page_content='can also be used in academia for research in linguistics,'),
Document(page_content='psychology, and computational linguistics.'),
Document(page_content='\\end{document}')]

HTML

Here’s an example using an HTML text splitter:

html_text = """
<!DOCTYPE html>
<html>
<head>
<title> ️ LangChain</title>
<style>
body {
font-family: Arial, sans-serif;
}
h1 {
color: darkblue;
}
</style>
</head>
<body>
<div>
<h1> ️ LangChain</h1>
⚡ Building applications with LLMs through composability ⚡
</div>
<div>
As an open-source project in a rapidly developing ﬁeld, we are extremely open to contributions.
</div>
</body>
</html>
"""
html_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.HTML, chunk_size=60, chunk_overlap=0
)
html_docs = html_splitter.create_documents([html_text])
html_docs
[Document(page_content='<!DOCTYPE html>\n<html>'),
Document(page_content='<head>\n <title> ️ LangChain</title>'),
Document(page_content='<style>\n body {\n font-family: Aria'),
Document(page_content='l, sans-serif;\n }\n h1 {'),
Document(page_content='color: darkblue;\n }\n </style>\n </head'),
Document(page_content='>'),
Document(page_content='<body>'),
Document(page_content='<div>\n <h1> ️ LangChain</h1>'),
Document(page_content='⚡ Building applications with LLMs through composability ⚡'),
Document(page_content='\n </div>'),
Document(page_content='<div>\n As an open-source project in a rapidly dev'),
Document(page_content='eloping ﬁeld, we are extremely open to contributions.'),
Document(page_content='</div>\n </body>\n</html>')]

Solidity

Here’s an example using the Solidity text splitter:

SOL_CODE = """
pragma solidity ^0.8.20;
contract HelloWorld {
function add(uint a, uint b) pure public returns(uint) {
return a + b;
}
}
"""

sol_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.SOL, chunk_size=128, chunk_overlap=0
)
sol_docs = sol_splitter.create_documents([SOL_CODE])
sol_docs
[Document(page_content='pragma solidity ^0.8.20;'),
Document(page_content='contract HelloWorld {\n function add(uint a, uint b) pure public returns(uint) {\n return a + b;\n }\n}')]

Here’s an example using the C# text splitter:

C_CODE = """
using System;
class Program
{
static void Main()
{
int age = 30; // Change the age value as needed

// Categorize the age without any console output

if (age < 18)
{
// Age is under 18
}
else if (age >= 18 && age < 65)
{
// Age is an adult
}
else
{
// Age is a senior citizen
}
}
}
"""
c_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.CSHARP, chunk_size=128, chunk_overlap=0
)
c_docs = c_splitter.create_documents([C_CODE])
c_docs
[Document(page_content='using System;'),
Document(page_content='class Program\n{\n static void Main()\n {\n int age = 30; // Change the age value as needed'),
Document(page_content='// Categorize the age without any console output\n if (age < 18)\n {\n // Age is under 18'),
Document(page_content='}\n else if (age >= 18 && age < 65)\n {\n // Age is an adult\n }\n else\n {'),
Document(page_content='// Age is a senior citizen\n }\n }\n}')]

Help us out by providing feedback on this documentation page:

Previous
« Split by character
Next
MarkdownHeaderTextSplitter »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Text
ModulesRetrievalSplitters

On this page

Text Splitters
Once you've loaded documents, you'll often want to transform them to better suit your application. The simplest example is
you may want to split a long document into smaller chunks that can ﬁt into your model's context window. LangChain has a
number of built-in document transformers that make it easy to split, combine, ﬁlter, and otherwise manipulate documents.

When you want to deal with long pieces of text, it is necessary to split up that text into chunks. As simple as this sounds,
there is a lot of potential complexity here. Ideally, you want to keep the semantically related pieces of text together. What
"semantically related" means could depend on the type of text. This notebook showcases several ways to do that.

At a high level, text splitters work as following:

1. Split the text up into small, semantically meaningful chunks (often sentences).
2. Start combining these small chunks into a larger chunk until you reach a certain size (as measured by some function).
3. Once you reach that size, make that chunk its own piece of text and then start creating a new chunk of text with some
overlap (to keep context between chunks).

That means there are two diﬀerent axes along which you can customize your text splitter:

1. How the text is split

2. How the chunk size is measured

Types of Text Splitters

LangChain oﬀers many diﬀerent types of text splitters. These all live in thelangchain-text-splitters package. Below is a table
listing all of them, along with a few characteristics:

Name: Name of the text splitter

Splits On: How this text splitter splits text

Adds Metadata: Whether or not this text splitter adds metadata about where each chunk came from.

Description: Description of the splitter, including recommendation on when to use it.

Adds
Name Splits On Description
Metadata
A list of user Recursively splits text. Splitting text recursively serves the purpose of trying to
Recursive defined keep related pieces of text next to each other. This is the recommended way to
characters start splitting text.
HTML specific Splits text based on HTML-specific characters. Notably, this adds in relevant
HTML ✅
characters information about where that chunk came from (based on the HTML)
Markdown
Splits text based on Markdown-specific characters. Notably, this adds in relevant
Markdown specific ✅
information about where that chunk came from (based on the Markdown)
characters
Code (Python,
Splits text based on characters specific to coding languages. 15 different
Code JS) specific
languages are available to choose from.
characters
Token Tokens Splits text on tokens. There exist a few different ways to measure tokens.
A user defined
Character Splits text based on a user defined character. One of the simpler methods.
character
[Experimental]
First splits on sentences. Then combines ones next to each other if they are
Semantic Sentences
semantically similar enough. Taken from Greg Kamradt
Chunker

Evaluate text splitters

You can evaluate text splitters with theChunkviz utility created by Greg Kamradt. Chunkviz is a great tool for visualizing how your
text splitter is working. It will show you how your text is being split up and help in tuning up the splitting parameters.

Help us out by providing feedback on this documentation page:

Previous
« PDF
Next
HTMLHeaderTextSplitter »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression Routing by semantic
Language Cookbooksimilarity

Routing by semantic similarity

With LCEL you can easily add custom routing logic to your chain to dynamically determine the chain logic based on user
input. All you need to do is deﬁne a function that given an input returns a Runnable .

One especially useful technique is to use embeddings to route a query to the most relevant prompt. Here’s a very simple
example.

%pip install --upgrade --quiet langchain-core langchain langchain-openai

from langchain.utils.math import cosine_similarity
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

physics_template = """You are a very smart physics professor. \

You are great at answering questions about physics in a concise and easy to understand manner. \
When you don't know the answer to a question you admit that you don't know.

Here is a question:
{query}"""

math_template = """You are a very good mathematician. You are great at answering math questions. \
You are so good because you are able to break down hard problems into their component parts, \
answer the component parts, and then put them together to answer the broader question.

Here is a question:
{query}"""

embeddings = OpenAIEmbeddings()
prompt_templates = [physics_template, math_template]
prompt_embeddings = embeddings.embed_documents(prompt_templates)

def prompt_router(input):
query_embedding = embeddings.embed_query(input["query"])
similarity = cosine_similarity([query_embedding], prompt_embeddings)[0]
most_similar = prompt_templates[similarity.argmax()]
print("Using MATH" if most_similar == math_template else "Using PHYSICS")
return PromptTemplate.from_template(most_similar)

chain = (
{"query": RunnablePassthrough()}
| RunnableLambda(prompt_router)
| ChatOpenAI()
| StrOutputParser()
)
print(chain.invoke("What's a black hole"))
Using PHYSICS
A black hole is a region in space where gravity is extremely strong, so strong that nothing, not even light, can escape its gravitational pull. It is formed when a massiv

print(chain.invoke("What's a path integral"))

Using MATH
Thank you for your kind words! I will do my best to break down the concept of a path integral for you.

In mathematics and physics, a path integral is a mathematical tool used to calculate the probability amplitude or wave function of a particle or system of particles. It w

To understand the concept better, let's consider an example. Suppose we have a particle moving from point A to point B in space. Classically, we would describe this

The path integral formalism considers all possible paths that the particle could take and assigns a probability amplitude to each path. These probability amplitudes ar

To calculate a path integral, we need to deﬁne an action, which is a mathematical function that describes the behavior of the system. The action is usually expressed

Once we have the action, we can write down the path integral as an integral over all possible paths. Each path is weighted by a factor determined by the action and t

Mathematically, the path integral is expressed as:

∫ e^(iS/ħ) D[x(t)]

Here, S is the action, ħ is the reduced Planck's constant, and D[x(t)] represents the integration over all possible paths x(t) of the particle.

By evaluating this integral, we can obtain the probability amplitude for the particle to go from the initial state to the ﬁnal state. The absolute square of this amplitude g

Path integrals have proven to be a powerful tool in various areas of physics, including quantum mechanics, quantum ﬁeld theory, and statistical mechanics. They allo

I hope this explanation helps you understand the concept of a path integral. If you have any further questions, feel free to ask!

Help us out by providing feedback on this documentation page:

Previous
« Code writing
Next
Adding memory »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Logging to
ModulesMoreCallbacksﬁle

Logging to file
This example shows how to print logs to file. It shows how to use theFileCallbackHandler, which does the same thing as
StdOutCallbackHandler, but instead writes the output to file. It also uses theloguru library to log other outputs that are not captured
by the handler.

from langchain.callbacks import FileCallbackHandler

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI
from loguru import logger

logﬁle = "output.log"

logger.add(logﬁle, colorize=True, enqueue=True)

handler = FileCallbackHandler(logﬁle)

llm = OpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")

# this chain will both print to stdout (because verbose=True) and write to 'output.log'
# if verbose=False, the FileCallbackHandler will still write to 'output.log'
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler], verbose=True)
answer = chain.run(number=2)
logger.info(answer)

> Entering new LLMChain chain...

Prompt after formatting:
1+2=

> Finished chain.

2023-06-01 18:36:38.929 | INFO | __main__:<module>:20 -

Now we can open the ﬁle output.log to see that the output has been captured.

%pip install --upgrade --quiet ansi2html > /dev/null

from ansi2html import Ansi2HTMLConverter
from IPython.display import HTML, display

with open("output.log", "r") as f:

content = f.read()

conv = Ansi2HTMLConverter()
html = conv.convert(content, full=True)

display(HTML(html))
> Entering new LLMChain chain...
Prompt after formatting:
1+2=
> Finished chain.
2023-06-01 18:36:38.929 | INFO | __main__:<module>:20 -
3

Help us out by providing feedback on this documentation page:

Previous
« Custom callback handlers
Next
Multiple callback handlers »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Customizing Conversational
ModulesMoreMemoryMemory

On this page

Customizing Conversational Memory

This notebook walks through a few ways to customize conversational memory.

from langchain.chains import ConversationChain

from langchain.memory import ConversationBuﬀerMemory
from langchain_openai import OpenAI

llm = OpenAI(temperature=0)

AI preﬁx

The first way to do so is by changing the AI prefix in the conversation summary. By default, this is set to “AI”, but you can set
this to be anything you want. Note that if you change this, you should also change the prompt used in the chain to reflect this
naming change. Let’s walk through an example of that in the example below.

# Here it is by default set to "AI"

conversation = ConversationChain(
llm=llm, verbose=True, memory=ConversationBuﬀerMemory()
)
conversation.predict(input="Hi there!")

> Entering new ConversationChain chain...

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know th

Current conversation:

Human: Hi there!
AI:

> Finished ConversationChain chain.

" Hi there! It's nice to meet you. How can I help you today?"
conversation.predict(input="What's the weather?")

> Entering new ConversationChain chain...

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know th

Current conversation:

Human: Hi there!
AI: Hi there! It's nice to meet you. How can I help you today?
Human: What's the weather?
AI:

> Finished ConversationChain chain.

' The current weather is sunny and warm with a temperature of 75 degrees Fahrenheit. The forecast for the next few days is sunny with temperatures in the mid-70s.'
# Now we can override it and set it to "AI Assistant"
from langchain.prompts.prompt import PromptTemplate

template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI doe

Current conversation:
{history}
Human: {input}
AI Assistant:"""
PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
conversation = ConversationChain(
prompt=PROMPT,
llm=llm,
verbose=True,
memory=ConversationBuﬀerMemory(ai_preﬁx="AI Assistant"),
)

conversation.predict(input="Hi there!")

> Entering new ConversationChain chain...

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know th

Current conversation:

Human: Hi there!
AI Assistant:

> Finished ConversationChain chain.

" Hi there! It's nice to meet you. How can I help you today?"
conversation.predict(input="What's the weather?")

> Entering new ConversationChain chain...

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know th

Current conversation:

Human: Hi there!
AI Assistant: Hi there! It's nice to meet you. How can I help you today?
Human: What's the weather?
AI Assistant:

> Finished ConversationChain chain.

' The current weather is sunny and warm with a temperature of 75 degrees Fahrenheit. The forecast for the rest of the day is sunny with a high of 78 degrees and a lo

Human preﬁx

The next way to do so is by changing the Human preﬁx in the conversation summary. By default, this is set to “Human”, but
you can set this to be anything you want. Note that if you change this, you should also change the prompt used in the chain
to reﬂect this naming change. Let’s walk through an example of that in the example below.

# Now we can override it and set it to "Friend"

from langchain.prompts.prompt import PromptTemplate

template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI doe

Current conversation:
{history}
Friend: {input}
AI:"""
PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
conversation = ConversationChain(
prompt=PROMPT,
llm=llm,
verbose=True,
memory=ConversationBuﬀerMemory(human_preﬁx="Friend"),
)

conversation.predict(input="Hi there!")
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know th

Current conversation:

Friend: Hi there!
AI:

> Finished ConversationChain chain.

" Hi there! It's nice to meet you. How can I help you today?"
conversation.predict(input="What's the weather?")

> Entering new ConversationChain chain...

Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of speciﬁc details from its context. If the AI does not know th

Current conversation:

Friend: Hi there!
AI: Hi there! It's nice to meet you. How can I help you today?
Friend: What's the weather?
AI:

> Finished ConversationChain chain.

' The weather right now is sunny and warm with a temperature of 75 degrees Fahrenheit. The forecast for the rest of the day is mostly sunny with a high of 82 degree

Help us out by providing feedback on this documentation page:

Previous
« Message Memory in Agent backed by a database
Next
Custom Memory »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Custom callback
ModulesMoreCallbackshandlers

Custom callback handlers

You can create a custom handler to set on the object as well. In the example below, we’ll implement streaming with a custom
handler.

from langchain_core.callbacks import BaseCallbackHandler

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

class MyCustomHandler(BaseCallbackHandler):
def on_llm_new_token(self, token: str, **kwargs) -> None:
print(f"My custom handler, token: {token}")

# To enable streaming, we pass in `streaming=True` to the ChatModel constructor

# Additionally, we pass in a list with our custom handler
chat = ChatOpenAI(max_tokens=25, streaming=True, callbacks=[MyCustomHandler()])

chat.invoke([HumanMessage(content="Tell me a joke")])
My custom handler, token:
My custom handler, token: Why
My custom handler, token: don
My custom handler, token: 't
My custom handler, token: scientists
My custom handler, token: trust
My custom handler, token: atoms
My custom handler, token: ?
My custom handler, token:

My custom handler, token: Because

My custom handler, token: they
My custom handler, token: make
My custom handler, token: up
My custom handler, token: everything
My custom handler, token: .
My custom handler, token:
AIMessage(content="Why don't scientists trust atoms? \n\nBecause they make up everything.", additional_kwargs={}, example=False)

Help us out by providing feedback on this documentation page:

Previous
« Async callbacks
Next
Logging to ﬁle »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
LangChain Expression Why use
Language LCEL

On this page

Why use LCEL

We recommend reading the LCEL Get started section ﬁrst.

LCEL makes it easy to build complex chains from basic components. It does this by providing: 1.A unified interface: Every
LCEL object implements the Runnable interface, which defines a common set of invocation methods (invoke , batch, stream, ainvoke,
…). This makes it possible for chains of LCEL objects to also automatically support these invocations. That is, every chain of
LCEL objects is itself an LCEL object. 2. Composition primitives: LCEL provides a number of primitives that make it easy to
compose chains, parallelize components, add fallbacks, dynamically configure chain internal, and more.

To better understand the value of LCEL, it’s helpful to see it in action and think about how we might recreate similar
functionality without it. In this walkthrough we’ll do just that with our basic example from the get started section. We’ll take our
simple prompt + model chain, which under the hood already deﬁnes a lot of functionality, and see what it would take to
recreate all of it.

%pip install –upgrade –quiet langchain-core langchain-openai langchain-anthropic

Invoke

In the simplest case, we just want to pass in a topic string and get back a joke string:

Without LCEL

from typing import List

import openai

prompt_template = "Tell me a short joke about {topic}"

client = openai.OpenAI()

def call_chat_model(messages: List[dict]) -> str:

response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
)
return response.choices[0].message.content

def invoke_chain(topic: str) -> str:

prompt_value = prompt_template.format(topic=topic)
messages = [{"role": "user", "content": prompt_value}]
return call_chat_model(messages)

invoke_chain("ice cream")

LCEL

from langchain_openai import ChatOpenAI

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

prompt = ChatPromptTemplate.from_template(
"Tell me a short joke about {topic}"
)
output_parser = StrOutputParser()
model = ChatOpenAI(model="gpt-3.5-turbo")
chain = (
{"topic": RunnablePassthrough()}
| prompt
| model
| output_parser
)

chain.invoke("ice cream")
Stream

If we want to stream results instead, we’ll need to change our function:

Without LCEL

from typing import Iterator

def stream_chat_model(messages: List[dict]) -> Iterator[str]:

stream = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
stream=True,
)
for response in stream:
content = response.choices[0].delta.content
if content is not None:
yield content

def stream_chain(topic: str) -> Iterator[str]:

prompt_value = prompt.format(topic=topic)
return stream_chat_model([{"role": "user", "content": prompt_value}])

for chunk in stream_chain("ice cream"):

print(chunk, end="", ﬂush=True)

LCEL

for chunk in chain.stream("ice cream"):

print(chunk, end="", ﬂush=True)

Batch

If we want to run on a batch of inputs in parallel, we’ll again need a new function:

Without LCEL

from concurrent.futures import ThreadPoolExecutor

def batch_chain(topics: list) -> list:

with ThreadPoolExecutor(max_workers=5) as executor:
return list(executor.map(invoke_chain, topics))

batch_chain(["ice cream", "spaghetti", "dumplings"])

LCEL

chain.batch(["ice cream", "spaghetti", "dumplings"])

Async

If we need an asynchronous version:

Without LCEL

async_client = openai.AsyncOpenAI()

async def acall_chat_model(messages: List[dict]) -> str:

response = await async_client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
)
return response.choices[0].message.content

async def ainvoke_chain(topic: str) -> str:

prompt_value = prompt_template.format(topic=topic)
messages = [{"role": "user", "content": prompt_value}]
return await acall_chat_model(messages)
await ainvoke_chain("ice cream")
LCEL

chain.ainvoke("ice cream")

LLM instead of chat model

If we want to use a completion endpoint instead of a chat endpoint:

Without LCEL

def call_llm(prompt_value: str) -> str:

response = client.completions.create(
model="gpt-3.5-turbo-instruct",
prompt=prompt_value,
)
return response.choices[0].text

def invoke_llm_chain(topic: str) -> str:

prompt_value = prompt_template.format(topic=topic)
return call_llm(prompt_value)

invoke_llm_chain("ice cream")

LCEL

from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct")
llm_chain = (
{"topic": RunnablePassthrough()}
| prompt
| llm
| output_parser
)

llm_chain.invoke("ice cream")

Diﬀerent model provider

If we want to use Anthropic instead of OpenAI:

Without LCEL

import anthropic

anthropic_template = f"Human:\n\n{prompt_template}\n\nAssistant:"
anthropic_client = anthropic.Anthropic()

def call_anthropic(prompt_value: str) -> str:

response = anthropic_client.completions.create(
model="claude-2",
prompt=prompt_value,
max_tokens_to_sample=256,
)
return response.completion

def invoke_anthropic_chain(topic: str) -> str:

prompt_value = anthropic_template.format(topic=topic)
return call_anthropic(prompt_value)

invoke_anthropic_chain("ice cream")

LCEL

from langchain_anthropic import ChatAnthropic

anthropic = ChatAnthropic(model="claude-2")
anthropic_chain = (
{"topic": RunnablePassthrough()}
| prompt
| anthropic
| output_parser
)

anthropic_chain.invoke("ice cream")

Runtime conﬁgurability
If we wanted to make the choice of chat model or LLM conﬁgurable at runtime:

Without LCEL

def invoke_conﬁgurable_chain(
topic: str,
*,
model: str = "chat_openai"
) -> str:
if model == "chat_openai":
return invoke_chain(topic)
elif model == "openai":
return invoke_llm_chain(topic)
elif model == "anthropic":
return invoke_anthropic_chain(topic)
else:
raise ValueError(
f"Received invalid model '{model}'."
" Expected one of chat_openai, openai, anthropic"
)

def stream_conﬁgurable_chain(
topic: str,
*,
model: str = "chat_openai"
) -> Iterator[str]:
if model == "chat_openai":
return stream_chain(topic)
elif model == "openai":
# Note we haven't implemented this yet.
return stream_llm_chain(topic)
elif model == "anthropic":
# Note we haven't implemented this yet
return stream_anthropic_chain(topic)
else:
raise ValueError(
f"Received invalid model '{model}'."
" Expected one of chat_openai, openai, anthropic"
)

def batch_conﬁgurable_chain(
topics: List[str],
*,
model: str = "chat_openai"
) -> List[str]:
# You get the idea
...

async def abatch_conﬁgurable_chain(

topics: List[str],
*,
model: str = "chat_openai"
) -> List[str]:
...

invoke_conﬁgurable_chain("ice cream", model="openai")

stream = stream_conﬁgurable_chain(
"ice_cream",
model="anthropic"
)
for chunk in stream:
print(chunk, end="", ﬂush=True)

# batch_conﬁgurable_chain(["ice cream", "spaghetti", "dumplings"])

# await ainvoke_conﬁgurable_chain("ice cream")

With LCEL

from langchain_core.runnables import ConﬁgurableField

configurable_model = model.configurable_alternatives(
ConfigurableField(id="model"),
default_key="chat_openai",
openai=llm,
anthropic=anthropic,
)
configurable_chain = (
{"topic": RunnablePassthrough()}
| prompt
| configurable_model
| output_parser
)
configurable_chain.invoke(
"ice cream",
config={"model": "openai"}
)
stream = configurable_chain.stream(
"ice cream",
config={"model": "anthropic"}
)
for chunk in stream:
print(chunk, end="", flush=True)

conﬁgurable_chain.batch(["ice cream", "spaghetti", "dumplings"])

# await conﬁgurable_chain.ainvoke("ice cream")

Logging
If we want to log our intermediate results:

Without LCEL

We’ll print intermediate steps for illustrative purposes

def invoke_anthropic_chain_with_logging(topic: str) -> str:
print(f"Input: {topic}")
prompt_value = anthropic_template.format(topic=topic)
print(f"Formatted prompt: {prompt_value}")
output = call_anthropic(prompt_value)
print(f"Output: {output}")
return output

invoke_anthropic_chain_with_logging("ice cream")

LCEL

Every component has built-in integrations with LangSmith. If we set the following two environment variables, all chain traces are logged to LangSmith.
import os

os.environ["LANGCHAIN_API_KEY"] = "..."
os.environ["LANGCHAIN_TRACING_V2"] = "true"

anthropic_chain.invoke("ice cream")

Here’s what our LangSmith trace looks like: https://smith.langchain.com/public/e4de52f8-bcd9-4732-b950-deee4b04e313/r

Fallbacks

If we wanted to add fallback logic, in case one model API is down:

Without LCEL

def invoke_chain_with_fallback(topic: str) -> str:

try:
return invoke_chain(topic)
except Exception:
return invoke_anthropic_chain(topic)

async def ainvoke_chain_with_fallback(topic: str) -> str:

try:
return await ainvoke_chain(topic)
except Exception:
# Note: we haven't actually implemented this.
return ainvoke_anthropic_chain(topic)

async def batch_chain_with_fallback(topics: List[str]) -> str:

try:
return batch_chain(topics)
except Exception:
# Note: we haven't actually implemented this.
return batch_anthropic_chain(topics)

invoke_chain_with_fallback("ice cream")
# await ainvoke_chain_with_fallback("ice cream")
batch_chain_with_fallback(["ice cream", "spaghetti", "dumplings"]))

LCEL

fallback_chain = chain.with_fallbacks([anthropic_chain])

fallback_chain.invoke("ice cream")
# await fallback_chain.ainvoke("ice cream")
fallback_chain.batch(["ice cream", "spaghetti", "dumplings"])

Full code comparison

Even in this simple case, our LCEL chain succinctly packs in a lot of functionality. As chains become more complex, this
becomes especially valuable.

Without LCEL

from concurrent.futures import ThreadPoolExecutor

from typing import Iterator, List, Tuple
import anthropic
import openai

prompt_template = "Tell me a short joke about {topic}"

anthropic_template = f"Human:\n\n{prompt_template}\n\nAssistant:"
client = openai.OpenAI()
async_client = openai.AsyncOpenAI()
anthropic_client = anthropic.Anthropic()

def call_chat_model(messages: List[dict]) -> str:

response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
)
return response.choices[0].message.content

def invoke_chain(topic: str) -> str:

print(f"Input: {topic}")
prompt_value = prompt_template.format(topic=topic)
print(f"Formatted prompt: {prompt_value}")
messages = [{"role": "user", "content": prompt_value}]
output = call_chat_model(messages)
print(f"Output: {output}")
return output

def stream_chat_model(messages: List[dict]) -> Iterator[str]:

def stream_chain(topic: str) -> Iterator[str]:

print(f"Input: {topic}")
prompt_value = prompt.format(topic=topic)
print(f"Formatted prompt: {prompt_value}")
stream = stream_chat_model([{"role": "user", "content": prompt_value}])
for chunk in stream:
print(f"Token: {chunk}", end="")
yield chunk

def batch_chain(topics: list) -> list:

with ThreadPoolExecutor(max_workers=5) as executor:
return list(executor.map(invoke_chain, topics))

def call_llm(prompt_value: str) -> str:

response = client.completions.create(
model="gpt-3.5-turbo-instruct",
prompt=prompt_value,
)
return response.choices[0].text

def invoke_llm_chain(topic: str) -> str:

print(f"Input: {topic}")
prompt_value = promtp_template.format(topic=topic)
print(f"Formatted prompt: {prompt_value}")
output = call_llm(prompt_value)
print(f"Output: {output}")
return output

def call_anthropic(prompt_value: str) -> str:

response = anthropic_client.completions.create(
model="claude-2",
prompt=prompt_value,
max_tokens_to_sample=256,
)
return response.completion

def invoke_anthropic_chain(topic: str) -> str:

print(f"Input: {topic}")
prompt_value = anthropic_template.format(topic=topic)
print(f"Formatted prompt: {prompt_value}")
output = call_anthropic(prompt_value)
print(f"Output: {output}")
return output

async def ainvoke_anthropic_chain(topic: str) -> str:

...

def stream_anthropic_chain(topic: str) -> Iterator[str]:

...

def batch_anthropic_chain(topics: List[str]) -> List[str]:

...

def batch_conﬁgurable_chain(
topics: List[str],
*,
model: str = "chat_openai"
) -> List[str]:
...

async def abatch_conﬁgurable_chain(

topics: List[str],
*,
model: str = "chat_openai"
) -> List[str]:
...

def invoke_chain_with_fallback(topic: str) -> str:

try:
return invoke_chain(topic)
except Exception:
return invoke_anthropic_chain(topic)

async def ainvoke_chain_with_fallback(topic: str) -> str:

try:
return await ainvoke_chain(topic)
except Exception:
return ainvoke_anthropic_chain(topic)

async def batch_chain_with_fallback(topics: List[str]) -> str:

try:
return batch_chain(topics)
except Exception:
return batch_anthropic_chain(topics)

LCEL

import os

from langchain_anthropic import ChatAnthropic

from langchain_openai import ChatOpenAI
from langchain_openai import OpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, ConﬁgurableField

os.environ["LANGCHAIN_API_KEY"] = "..."
os.environ["LANGCHAIN_TRACING_V2"] = "true"

prompt = ChatPromptTemplate.from_template(
"Tell me a short joke about {topic}"
)
chat_openai = ChatOpenAI(model="gpt-3.5-turbo")
openai = OpenAI(model="gpt-3.5-turbo-instruct")
anthropic = ChatAnthropic(model="claude-2")
model = (
chat_openai
.with_fallbacks([anthropic])
.conﬁgurable_alternatives(
ConﬁgurableField(id="model"),
default_key="chat_openai",
openai=openai,
anthropic=anthropic,
)
)

chain = (
{"topic": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)

Next steps

To continue learning about LCEL, we recommend: - Reading up on the full LCELInterface, which we’ve only partially covered
here. - Exploring the How-to section to learn about additional composition primitives that LCEL provides. - Looking through
the Cookbook section to see LCEL in action for common use cases. A good next use case to look at would beRetrieval-
augmented generation.

Help us out by providing feedback on this documentation page:

Previous
« Get started
Next
Interface »

Community

Discord

Twitter
GitHub

Python

JS/TS
More

Homepage
Blog

YouTube
Model Few-shot examples for chat
ModulesI/O Prompts models

On this page

Few-shot examples for chat models

This notebook covers how to use few-shot examples in chat models. There does not appear to be solid consensus on how
best to do few-shot prompting, and the optimal prompt compilation will likely vary by model. Because of this, we provide few-
shot prompt templates like the FewShotChatMessagePromptTemplate as a ﬂexible starting point, and you can modify or
replace them as you see ﬁt.

The goal of few-shot prompt templates are to dynamically select examples based on an input, and then format the examples
in a ﬁnal prompt to provide for the model.

Note: The following code examples are for chat models. For similar few-shot prompt examples for completion models (LLMs),
see the few-shot prompt templates guide.

Fixed Examples

The most basic (and common) few-shot prompting technique is to use a ﬁxed prompt example. This way you can select a
chain, evaluate it, and avoid worrying about additional moving parts in production.

The basic components of the template are: -examples: A list of dictionary examples to include in the ﬁnal prompt. -
example_prompt: converts each example into 1 or more messages through itsformat_messages method. A common example
would be to convert each example into one human message and one AI message response, or a human message followed
by a function call message.

Below is a simple demonstration. First, import the modules for this example:

from langchain.prompts import (

ChatPromptTemplate,
FewShotChatMessagePromptTemplate,
)

Then, deﬁne the examples you’d like to include.

examples = [
{"input": "2+2", "output": "4"},
{"input": "2+3", "output": "5"},
]

Next, assemble them into the few-shot prompt template.

# This is a prompt template used to format each individual example.

example_prompt = ChatPromptTemplate.from_messages(
[
("human", "{input}"),
("ai", "{output}"),
]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
example_prompt=example_prompt,
examples=examples,
)

print(few_shot_prompt.format())
Human: 2+2
AI: 4
Human: 2+3
AI: 5

Finally, assemble your ﬁnal prompt and use it with a model.

ﬁnal_prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a wondrous wizard of math."),
few_shot_prompt,
("human", "{input}"),
]
)
from langchain_community.chat_models import ChatAnthropic

chain = ﬁnal_prompt | ChatAnthropic(temperature=0.0)

chain.invoke({"input": "What's the square of a triangle?"})

AIMessage(content=' Triangles do not have a "square". A square refers to a shape with 4 equal sides and 4 right angles. Triangles have 3 sides and 3 angles.\n\nThe

Dynamic few-shot prompting

Sometimes you may want to condition which examples are shown based on the input. For this, you can replace theexamples
with an example_selector. The other components remain the same as above! To review, the dynamic few-shot prompt template
would look like:

example_selector:responsible for selecting few-shot examples (and the order in which they are returned) for a given input.
These implement the BaseExampleSelector interface. A common example is the vectorstore-backed
SemanticSimilarityExampleSelector
example_prompt: convert each example into 1 or more messages through itsformat_messages method. A common example
would be to convert each example into one human message and one AI message response, or a human message
followed by a function call message.

These once again can be composed with other messages and chat templates to assemble your ﬁnal prompt.

from langchain.prompts import SemanticSimilarityExampleSelector

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

Since we are using a vectorstore to select examples based on semantic similarity, we will want to ﬁrst populate the store.

examples = [
{"input": "2+2", "output": "4"},
{"input": "2+3", "output": "5"},
{"input": "2+4", "output": "6"},
{"input": "What did the cow say to the moon?", "output": "nothing at all"},
{
"input": "Write me a poem about the moon",
"output": "One for the moon, and one for me, who are we to talk about the moon?",
},
]

to_vectorize = [" ".join(example.values()) for example in examples]

embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_texts(to_vectorize, embeddings, metadatas=examples)

Create the example_selector

With a vectorstore created, you can create the example_selector. Here we will isntruct it to only fetch the top 2 examples.

example_selector = SemanticSimilarityExampleSelector(
vectorstore=vectorstore,
k=2,
)

# The prompt template will load examples by passing the input do the `select_examples` method
example_selector.select_examples({"input": "horse"})
[{'input': 'What did the cow say to the moon?', 'output': 'nothing at all'},
{'input': '2+4', 'output': '6'}]

Create prompt template

Assemble the prompt template, using the example_selector created above.

from langchain.prompts import (
ChatPromptTemplate,
FewShotChatMessagePromptTemplate,
)

# Deﬁne the few-shot prompt.

few_shot_prompt = FewShotChatMessagePromptTemplate(
# The input variables select the values to pass to the example_selector
input_variables=["input"],
example_selector=example_selector,
# Deﬁne how each example will be formatted.
# In this case, each example will become 2 messages:
# 1 human, and 1 AI
example_prompt=ChatPromptTemplate.from_messages(
[("human", "{input}"), ("ai", "{output}")]
),
)

Below is an example of how this would be assembled.

print(few_shot_prompt.format(input="What's 3+3?"))
Human: 2+3
AI: 5
Human: 2+2
AI: 4

Assemble the ﬁnal prompt template:

ﬁnal_prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a wondrous wizard of math."),
few_shot_prompt,
("human", "{input}"),
]
)
print(few_shot_prompt.format(input="What's 3+3?"))
Human: 2+3
AI: 5
Human: 2+2
AI: 4

Use with an LLM

Now, you can connect your model to the few-shot prompt.

from langchain_community.chat_models import ChatAnthropic

chain = ﬁnal_prompt | ChatAnthropic(temperature=0.0)

chain.invoke({"input": "What's 3+3?"})

AIMessage(content=' 3 + 3 = 6', additional_kwargs={}, example=False)

Help us out by providing feedback on this documentation page:

Previous
« Few-shot prompt templates
Next
Types of `MessagePromptTemplate` »

Community

Discord
Twitter
GitHub
Python
JS/TS
More

Homepage
Blog
YouTube
Vector store-backed
ModulesRetrievalRetrieversretriever

On this page

Vector store-backed retriever

A vector store retriever is a retriever that uses a vector store to retrieve documents. It is a lightweight wrapper around the
vector store class to make it conform to the retriever interface. It uses the search methods implemented by a vector store, like
similarity search and MMR, to query the texts in the vector store.

Once you construct a vector store, it’s very easy to construct a retriever. Let’s walk through an example.

from langchain_community.document_loaders import TextLoader

loader = TextLoader("../../state_of_the_union.txt")
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(texts, embeddings)
retriever = db.as_retriever()
docs = retriever.get_relevant_documents("what did he say about ketanji brown jackson")

Maximum marginal relevance retrieval

By default, the vector store retriever uses similarity search. If the underlying vector store supports maximum marginal
relevance search, you can specify that as the search type.

retriever = db.as_retriever(search_type="mmr")
docs = retriever.get_relevant_documents("what did he say about ketanji brown jackson")

Similarity score threshold retrieval

You can also set a retrieval method that sets a similarity score threshold and only returns documents with a score above that
threshold.

retriever = db.as_retriever(
search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5}
)
docs = retriever.get_relevant_documents("what did he say about ketanji brown jackson")

Specifying top k

You can also specify search kwargs like k to use when doing retrieval.

retriever = db.as_retriever(search_kwargs={"k": 1})

docs = retriever.get_relevant_documents("what did he say about ketanji brown jackson")
len(docs)
1

Help us out by providing feedback on this documentation page:

Previous
« Retrievers
Next
MultiQueryRetriever »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Contextual
ModulesRetrievalRetrieverscompression

On this page

Contextual compression
One challenge with retrieval is that usually you don’t know the speciﬁc queries your document storage system will face when
you ingest data into the system. This means that the information most relevant to a query may be buried in a document with
a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer
responses.

Contextual compression is meant to ﬁx this. The idea is simple: instead of immediately returning retrieved documents as-is,
you can compress them using the context of the given query, so that only the relevant information is returned. “Compressing”
here refers to both compressing the contents of an individual document and ﬁltering out documents wholesale.

To use the Contextual Compression Retriever, you’ll need: - a base retriever - a Document Compressor

The Contextual Compression Retriever passes queries to the base retriever, takes the initial documents and passes them
through the Document Compressor. The Document Compressor takes a list of documents and shortens it by reducing the
contents of documents or dropping documents altogether.

Get started

# Helper function for printing docs

def pretty_print_docs(docs):
print(
f"\n{'-' * 100}\n".join(
[f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
)
)

Using a vanilla vector store retriever

Let’s start by initializing a simple vector store retriever and storing the 2023 State of the Union speech (in chunks). We can
see that given an example question our retriever returns one or two relevant docs and a few irrelevant docs. And even the
relevant docs have a lot of irrelevant information in them.

from langchain_community.document_loaders import TextLoader

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

documents = TextLoader("../../state_of_the_union.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
retriever = FAISS.from_documents(texts, OpenAIEmbeddings()).as_retriever()

docs = retriever.get_relevant_documents(
"What did the president say about Ketanji Brown Jackson"
)
pretty_print_docs(docs)
Document 1:

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Bre
----------------------------------------------------------------------------------------------------
Document 2:

A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police oﬃcers. A consensus builder. Since

And if we are to advance liberty and justice, we need to secure the Border and ﬁx the immigration system.

We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.

We’ve set up joint patrols with Mexico and Guatemala to catch more human traﬃckers.

We’re putting in place dedicated immigration judges so families ﬂeeing persecution and violence can have their cases heard faster.

We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.
----------------------------------------------------------------------------------------------------
Document 3:

And for our LGBTQ+ Americans, let’s ﬁnally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families

As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-give

While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-A

And soon, we’ll strengthen the Violence Against Women Act that I ﬁrst wrote three decades ago. It is important for us to show the nation that we can come together a

So tonight I’m oﬀering a Unity Agenda for the Nation. Four big things we can do together.

First, beat the opioid epidemic.

----------------------------------------------------------------------------------------------------
Document 4:

Tonight, I’m announcing a crackdown on these companies overcharging American businesses and consumers.

And as Wall Street ﬁrms take over more nursing homes, quality in those homes has gone down and costs have gone up.

That ends on my watch.

Medicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect.

We’ll also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not d

Let’s pass the Paycheck Fairness Act and paid leave.

Raise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty.

Let’s increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls America’s best-kept secret

Adding contextual compression with an LLMChainExtractor

Now let’s wrap our base retriever with a ContextualCompressionRetriever. We’ll add an LLMChainExtractor, which will iterate over the
initially returned documents and extract from each only the content that is relevant to the query.

from langchain.retrievers import ContextualCompressionRetriever

from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import OpenAI

llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.get_relevant_documents(
"What did the president say about Ketanji Jackson Brown"
)
pretty_print_docs(compressed_docs)
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(

Document 1:

I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson.

More built-in compressors: ﬁlters

LLMChainFilter

The LLMChainFilter is slightly simpler but more robust compressor that uses an LLM chain to decide which of the initially
retrieved documents to ﬁlter out and which ones to return, without manipulating the document contents.

from langchain.retrievers.document_compressors import LLMChainFilter

_ﬁlter = LLMChainFilter.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
base_compressor=_ﬁlter, base_retriever=retriever
)

Document 1:

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Bre

EmbeddingsFilter

Making an extra LLM call over each retrieved document is expensive and slow. TheEmbeddingsFilter provides a cheaper and
faster option by embedding the documents and query and only returning those documents which have suﬃciently similar
embeddings to the query.

from langchain.retrievers.document_compressors import EmbeddingsFilter

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
embeddings_ﬁlter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
compression_retriever = ContextualCompressionRetriever(
base_compressor=embeddings_ﬁlter, base_retriever=retriever
)

compressed_docs = compression_retriever.get_relevant_documents(
"What did the president say about Ketanji Jackson Brown"
)
pretty_print_docs(compressed_docs)
Document 1: