Langchain API Docs
Langchain API Docs
Language to runnables
On this page
First, let’s create an example LCEL. We will create one that does retrieval
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()
chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)
Get a graph
chain.get_graph()
Print a graph
While that is not super legible, you can print it to get a display that’s easier to understand
chain.get_graph().print_ascii()
+---------------------------------+
| Parallel<context,question>Input |
+---------------------------------+
** **
*** ***
** **
+----------------------+ +-------------+
| VectorStoreRetriever | | Passthrough |
+----------------------+ +-------------+
** **
*** ***
** **
+----------------------------------+
| Parallel<context,question>Output |
+----------------------------------+
*
*
*
+--------------------+
| ChatPromptTemplate |
+--------------------+
*
*
*
+------------+
| ChatOpenAI |
+------------+
*
*
*
+-----------------+
| StrOutputParser |
+-----------------+
*
*
*
+-----------------------+
| StrOutputParserOutput |
+-----------------------+
An important part of every chain is the prompts that are used. You can get the prompts present in the chain:
chain.get_prompts()
[ChatPromptTemplate(input_variables=['context', 'question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'quest
Previous
« Stream custom generator functions
Next
Add message history (memory) »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O
On this page
Model I/O
The core element of any language model application is...the model. LangChain gives you the building blocks to interface with
any language model.
Conceptual Guide
A conceptual explanation of messages, prompts, LLMs vs ChatModels, and output parsers. You should read this before
getting started.
Quickstart
Covers the basics of getting started working with different types of models. You should walk throughthis section if you want to
get an overview of the functionality.
Prompts
This section deep dives into the different types of prompt templates and how to use them.
LLMs
This section covers functionality related to the LLM class. This is a type of model that takes a text string as input and returns
a text string.
ChatModels
This section covers functionality related to the ChatModel class. This is a type of model that takes a list of messages as input
and returns a message.
Output Parsers
Output parsers are responsible for transforming the output of LLMs and ChatModels into more structured data.This section
covers the different types of output parsers.
Previous
« Modules
Next
Model I/O »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesRetrievalRetrieversTime-weighted vector store retriever
On this page
Notably, hours_passed refers to the hours passed since the object in the retrieverwas last accessed, not since it was created.
This means that frequently accessed objects remain “fresh”.
import faiss
from langchain.docstore import InMemoryDocstore
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
A low decay rate (in this, to be extreme, we will set it close to 0) means memories will be “remembered” for longer. Adecay rate
of 0 means memories never be forgotten, making this retriever equivalent to the vector lookup.
With a high decay rate (e.g., several 9’s), the recency score quickly goes to 0! If you set this all the way to 1,recency is 0 for all
objects, once again making this equivalent to a vector lookup.
# Define your embedding model
embeddings_model = OpenAIEmbeddings()
# Initialize the vectorstore as empty
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model, index, InMemoryDocstore({}), {})
retriever = TimeWeightedVectorStoreRetriever(
vectorstore=vectorstore, decay_rate=0.999, k=1
)
yesterday = datetime.now() - timedelta(days=1)
retriever.add_documents(
[Document(page_content="hello world", metadata={"last_accessed_at": yesterday})]
)
retriever.add_documents([Document(page_content="hello foo")])
['eb1c4c86-01a8-40e3-8393-9a927295a950']
# "Hello Foo" is returned first because "hello world" is mostly forgotten
retriever.get_relevant_documents("hello world")
[Document(page_content='hello foo', metadata={'last_accessed_at': datetime.datetime(2023, 12, 27, 15, 30, 50, 57185), 'created_at': datetime.datetime(2023, 12, 27
Virtual time
Using some utils in LangChain, you can mock out the time component.
import datetime
Previous
« Self-querying
Next
Indexing »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Memory
ModulesMoreMemorytypes
Memory types
There are many different types of memory. Each has their own parameters, their own return types, and is useful in different
scenarios. Please see their individual page for more detail on each one.
Previous
« Chat Messages
Next
Conversation Buffer »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression
Language CookbookAdding moderation
Adding moderation
This shows how to add in moderation (or other safeguards) around your LLM application.
Previous
« Adding memory
Next
Managing prompt size »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesAgentsTools
On this page
Tools
Tools are interfaces that an agent can use to interact with the world. They combine a few things:
It is useful to have all this information because this information can be used to build action-taking systems! The name,
description, and JSON schema can be used to prompt the LLM so it knows how to specify what action to take, and then the
function to call is equivalent to taking that action.
The simpler the input to a tool is, the easier it is for an LLM to be able to use it. Many agents will only work with tools that
have a single string input. For a list of agent types and which ones work with more complicated inputs, please see this
documentation
Importantly, the name, description, and JSON schema (if used) are all used in the prompt. Therefore, it is really important that
they are clear and describe exactly how the tool should be used. You may need to change the default name, description, or
JSON schema if the LLM is not understanding how to use the tool.
Default Tools
Let’s take a look at how to work with tools. To do this, we’ll work with a built in tool.
tool.name
'Wikipedia'
tool.description
'A wrapper around Wikipedia. Useful for when you need to answer general questions about people, places, companies, facts, historical events, or other subjects. Inpu
tool.args
{'query': {'title': 'Query', 'type': 'string'}}
tool.return_direct
False
tool.run({"query": "langchain"})
'Page: LangChain\nSummary: LangChain is a framework designed to simplify the creation of applications '
We can also call this tool with a single string input. We can do this because this tool expects only a single input. If it required
multiple inputs, we would not be able to do that.
tool.run("langchain")
'Page: LangChain\nSummary: LangChain is a framework designed to simplify the creation of applications '
We can also modify the built in name, description, and JSON schema of the arguments.
When defining the JSON schema of the arguments, it is important that the inputs remain the same as the function, so you
shouldn’t change that. But you can define custom descriptions for each input easily.
class WikiInputs(BaseModel):
"""Inputs to the wikipedia tool."""
More Topics
This was a quick introduction to tools in LangChain, but there is a lot more to learn
Custom Tools: Although built-in tools are useful, it’s highly likely that you’ll have to define your own tools. Seethis guide for
instructions on how to do so.
Toolkits: Toolkits are collections of tools that work well together. For a more in depth description as well as a list of all built-in
toolkits, see this page
Tools as OpenAI Functions: Tools are very similar to OpenAI Functions, and can easily be converted to that format. See
this notebook for instructions on how to do that.
Previous
« Agents
Next
Toolkits »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Text embedding
ModulesRetrievalmodels
On this page
Head to Integrations for documentation on built-in integrations with text embedding model providers.
The Embeddings class is a class designed for interfacing with text embedding models. There are lots of embedding model
providers (OpenAI, Cohere, Hugging Face, etc) - this class is designed to provide a standard interface for all of them.
Embeddings create a vector representation of a piece of text. This is useful because it means we can think about text in the
vector space, and do things like semantic search where we look for pieces of text that are most similar in the vector space.
The base Embeddings class in LangChain provides two methods: one for embedding documents and one for embedding a
query. The former takes as input multiple texts, while the latter takes a single text. The reason for having these as two
separate methods is that some embedding providers have different embedding methods for documents (to be searched over)
vs queries (the search query itself).
Get started
Setup
OpenAI
Cohere
Accessing the API requires an API key, which you can get by creating an account and headinghere. Once we have a key
we'll want to set it as an environment variable by running:
export OPENAI_API_KEY="..."
If you'd prefer not to set an environment variable you can pass the key in directly via theopenai_api_key named parameter
when initiating the OpenAI LLM class:
embeddings_model = OpenAIEmbeddings(openai_api_key="...")
embeddings_model = OpenAIEmbeddings()
embed_documents
embed_query
Embed a single piece of text for the purpose of comparing to other embedded pieces of texts.
Previous
« Retrieval
Next
CacheBackedEmbeddings »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Message Memory in Agent backed by a
ModulesMoreMemorydatabase
Memory in LLMChain
Custom Agents
Memory in Agent
In order to add a memory with an external message store to an agent we are going to do the following steps:
1. We are going to create a RedisChatMessageHistory to connect to an external database to store the messages in.
2. We are going to create an LLMChain using that chat history as memory.
3. We are going to use that LLMChain to create a custom Agent.
For the purposes of this exercise, we are going to create a simple custom Agent that has access to a search tool and utilizes
the ConversationBufferMemory class.
Notice the usage of the chat_history variable in the PromptTemplate, which matches up with the dynamic key name in the
ConversationBufferMemory.
prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!"
{chat_history}
Question: {input}
{agent_scratchpad}"""
prompt = ZeroShotAgent.create_prompt(
tools,
prefix=prefix,
suffix=suffix,
input_variables=["input", "chat_history", "agent_scratchpad"],
)
message_history = RedisChatMessageHistory(
url="redis://localhost:6379/0", ttl=600, session_id="my-session"
)
memory = ConversationBufferMemory(
memory_key="chat_history", chat_memory=message_history
)
We can now construct the LLMChain, with the Memory object, and then create the agent.
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_chain = AgentExecutor.from_agent_and_tools(
agent=agent, tools=tools, verbose=True, memory=memory
)
agent_chain.run(input="How many people live in canada?")
'The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations data.'
To test the memory of this agent, we can ask a followup question that relies on information in the previous exchange to be
answered correctly.
We can see that the agent remembered that the previous question was about Canada, and properly asked Google Search
what the name of Canada’s national anthem was.
For fun, let’s compare this to an agent that does NOT have memory.
prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!"
Question: {input}
{agent_scratchpad}"""
prompt = ZeroShotAgent.create_prompt(
tools, prefix=prefix, suffix=suffix, input_variables=["input", "agent_scratchpad"]
)
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_without_memory = AgentExecutor.from_agent_and_tools(
agent=agent, tools=tools, verbose=True
)
agent_without_memory.run("How many people live in canada?")
'The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations data.'
agent_without_memory.run("what is their national anthem called?")
Previous
« Memory in Agent
Next
Customizing Conversational Memory »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O PromptsPipeline
Pipeline
This notebook goes over how to compose multiple prompts together. This can be useful when you want to reuse parts of
prompts. This can be done with a PipelinePrompt. A PipelinePrompt consists of two main parts:
{example}
{start}"""
full_prompt = PromptTemplate.from_template(full_template)
introduction_template = """You are impersonating {person}."""
introduction_prompt = PromptTemplate.from_template(introduction_template)
example_template = """Here's an example of an interaction:
Q: {example_q}
A: {example_a}"""
example_prompt = PromptTemplate.from_template(example_template)
start_template = """Now, do this for real!
Q: {input}
A:"""
start_prompt = PromptTemplate.from_template(start_template)
input_prompts = [
("introduction", introduction_prompt),
("example", example_prompt),
("start", start_prompt),
]
pipeline_prompt = PipelinePromptTemplate(
final_prompt=full_prompt, pipeline_prompts=input_prompts
)
pipeline_prompt.input_variables
['example_q', 'example_a', 'input', 'person']
print(
pipeline_prompt.format(
person="Elon Musk",
example_q="What's your favorite car?",
example_a="Tesla",
input="What's your favorite social media site?",
)
)
You are impersonating Elon Musk.
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesMoreCallbacksAsync callbacks
Async callbacks
If you are planning to use the async API, it is recommended to useAsyncCallbackHandler to avoid blocking the runloop.
Advanced if you use a sync CallbackHandler while using an async method to run your LLM / Chain / Tool / Agent, it will still
work. However, under the hood, it will be called with run_in_executor which can cause issues if your CallbackHandler is not thread-
safe.
import asyncio
from typing import Any, Dict, List
class MyCustomSyncHandler(BaseCallbackHandler):
def on_llm_new_token(self, token: str, **kwargs) -> None:
print(f"Sync handler being called in a `thread_pool_executor`: token: {token}")
class MyCustomAsyncHandler(AsyncCallbackHandler):
"""Async callback handler that can be used to handle callbacks from langchain."""
Previous
« Callbacks
Next
Custom callback handlers »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression How Bind runtime
Language to args
On this page
SOLUTION:
Subtracting 7 from both sides of the equation, we get:
x^3 = 12 - 7
x^3 = 5
runnable = (
{"equation_statement": RunnablePassthrough()}
| prompt
| model.bind(stop="SOLUTION")
| StrOutputParser()
)
print(runnable.invoke("x raised to the third plus seven equals 12"))
EQUATION: x^3 + 7 = 12
One particularly useful application of binding is to attach OpenAI functions to a compatible OpenAI model:
function = {
"name": "solver",
"description": "Formulates and solves an equation",
"parameters": {
"type": "object",
"properties": {
"equation": {
"type": "string",
"description": "The algebraic expression of the equation",
},
"solution": {
"type": "string",
"description": "The solution to the equation",
},
},
"required": ["equation", "solution"],
},
}
# Need gpt-4 to solve this one correctly
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"Write out the following equation using algebraic symbols then solve it.",
),
("human", "{equation_statement}"),
]
)
model = ChatOpenAI(model="gpt-4", temperature=0).bind(
function_call={"name": "solver"}, functions=[function]
)
runnable = {"equation_statement": RunnablePassthrough()} | prompt | model
runnable.invoke("x raised to the third plus seven equals 12")
AIMessage(content='', additional_kwargs={'function_call': {'name': 'solver', 'arguments': '{\n"equation": "x^3 + 7 = 12",\n"solution": "x = ∛5"\n}'}}, example=False)
tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
},
}
]
model = ChatOpenAI(model="gpt-3.5-turbo-1106").bind(tools=tools)
model.invoke("What's the weather in SF, NYC and LA?")
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zHN0ZHwrxM7nZDdqTp6dkPko', 'function': {'arguments': '{"location": "San Francisco, CA", "unit": "c
Previous
« RunnableBranch: Dynamically route logic based on input
Next
Configure chain internals at runtime »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Structured
ModulesAgentsAgent Typeschat
On this page
Structured chat
The structured chat agent is capable of using multi-input tools.
Initialize Tools
tools = [TavilySearchResults(max_results=1)]
Create Agent
Run Agent
agent_executor.invoke(
{
"input": "what's my name? Do not use tools unless you have to",
"chat_history": [
HumanMessage(content="hi! my name is bob"),
AIMessage(content="Hello Bob! How can I assist you today?"),
],
}
)
{'input': "what's my name? Do not use tools unless you have to",
'chat_history': [HumanMessage(content='hi! my name is bob'),
AIMessage(content='Hello Bob! How can I assist you today?')],
'output': 'Your name is Bob.'}
Previous
« JSON Chat Agent
Next
ReAct »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Get started
Get started
Get started with LangChain
️ Introduction
LangChain is a framework for developing applications powered by language models. It enables applications that:
️ Installation
Official release
️ Quickstart
In this quickstart we'll show you how to:
️ Security
LangChain has a large ecosystem of integrations with various external resources like local and remote file systems, APIs and databases. These integrations
… allow de
Next
Introduction »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Example Selector Select by n-gram
ModulesI/O Prompts Types overlap
The selector allows for a threshold score to be set. Examples with an ngram overlap score less than or equal to the threshold
are excluded. The threshold is set to -1.0, by default, so will not exclude any examples, only reorder them. Setting the
threshold to 0.0 will exclude examples that have no ngram overlaps with the input.
example_prompt = PromptTemplate(
input_variables=["input", "output"],
template="Input: {input}\nOutput: {output}",
)
example_selector.add_example(new_example)
print(dynamic_prompt.format(sentence="Spot can run fast."))
Give the Spanish translation of every input
Previous
« Select by maximal marginal relevance (MMR)
Next
Select by similarity »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Function
ModulesI/O Chat Modelscalling
On this page
Function calling
A growing number of chat models, like OpenAI, Gemini, etc., have a function-calling API that lets you describe functions and
their arguments, and have the model return a JSON object with a function to invoke and the inputs to that function. Function-
calling is extremely useful for building tool-using chains and agents, and for getting structured outputs from models more
generally.
LangChain comes with a number of utilities to make function-calling easy. Namely, it comes with:
We’ll focus here on the first two points. For a detailed guide on output parsing check out theOpenAI Tools output parsers
and to see the structured output chains check out the Structured output guide.
Binding functions
A number of models implement helper methods that will take care of formatting and binding different function-like objects to
the model. Let’s take a look at how we might take the following Pydantic function schema and get different models to invoke
it:
# Note that the docstrings here are crucial, as they will be passed along
# to the model along with the class name.
class Multiply(BaseModel):
"""Multiply two integers together."""
OpenAI
Fireworks
Mistral
Together
We can use the ChatOpenAI.bind_tools() method to handle converting Multiply to an OpenAI function and binding it to the model
(i.e., passing it in each time the model is invoked).
from langchain_openai import ChatOpenAI
We can add a tool parser to extract the tool calls from the generated message to JSON:
If we wanted to force that a tool is used (and that it is used only once), we can set thetool_choice argument:
In case you need to access function schemas directly, LangChain has a built-in converter that can turn Python functions,
Pydantic classes, and LangChain Tools into the OpenAI format JSON schema:
Python function
import json
Args:
a: First integer
b: Second integer
"""
return a * b
print(json.dumps(convert_to_openai_tool(multiply), indent=2))
{
"type": "function",
"function": {
"name": "multiply",
"description": "Multiply two integers together.",
"parameters": {
"type": "object",
"properties": {
"a": {
"type": "integer",
"description": "First integer"
},
"b": {
"type": "integer",
"description": "Second integer"
}
},
"required": [
"a",
"b"
]
}
}
}
Pydantic class
from langchain_core.pydantic_v1 import BaseModel, Field
class multiply(BaseModel):
"""Multiply two integers together."""
print(json.dumps(convert_to_openai_tool(multiply), indent=2))
{
"type": "function",
"function": {
"name": "multiply",
"description": "Multiply two integers together.",
"parameters": {
"type": "object",
"properties": {
"a": {
"description": "First integer",
"type": "integer"
},
"b": {
"description": "Second integer",
"type": "integer"
}
},
"required": [
"a",
"b"
]
}
}
}
LangChain Tool
from typing import Any, Type
class MultiplySchema(BaseModel):
"""Multiply tool schema."""
class Multiply(BaseTool):
args_schema: Type[BaseModel] = MultiplySchema
name: str = "multiply"
description: str = "Multiply two integers together."
Next steps
Output parsing: See OpenAI Tools output parsers and OpenAI Functions output parsers to learn about extracting the
function calling API responses into various formats.
Structured output chains: Some models have constructors that handle creating a structured output chain for you.
Tool use: See how to construct chains and agents that actually call the invoked tools inthese guides.
Previous
« Quick Start
Next
Caching »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression Querying a SQL
Language CookbookDB
Querying a SQL DB
We can replicate our SQLDatabaseChain with Runnables.
template = """Based on the table schema below, write a SQL query that would answer the user's question:
{schema}
Question: {question}
SQL Query:"""
prompt = ChatPromptTemplate.from_template(template)
from langchain_community.utilities import SQLDatabase
We’ll need the Chinook sample DB for this example. There’s many places to download it from, e.g.https://database.guide/2-
sample-databases-sqlite/
db = SQLDatabase.from_uri("sqlite:///./Chinook.db")
def get_schema(_):
return db.get_table_info()
def run_query(query):
return db.run(query)
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
model = ChatOpenAI()
sql_response = (
RunnablePassthrough.assign(schema=get_schema)
| prompt
| model.bind(stop=["\nSQLResult:"])
| StrOutputParser()
)
sql_response.invoke({"question": "How many employees are there?"})
'SELECT COUNT(*) FROM Employee'
template = """Based on the table schema below, question, sql query, and sql response, write a natural language response:
{schema}
Question: {question}
SQL Query: {query}
SQL Response: {response}"""
prompt_response = ChatPromptTemplate.from_template(template)
full_chain = (
RunnablePassthrough.assign(query=sql_response).assign(
schema=get_schema,
response=lambda x: db.run(x["query"]),
)
| prompt_response
| model
)
full_chain.invoke({"question": "How many employees are there?"})
AIMessage(content='There are 8 employees.', additional_kwargs={}, example=False)
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Multiple callback
ModulesMoreCallbackshandlers
However, in many cases, it is advantageous to pass in handlers instead when running the object. When we pass through
CallbackHandlers using the callbacks keyword arg when executing an run, those callbacks will be issued by all nested objects
involved in the execution. For example, when a handler is passed through to an Agent, it will be used for all callbacks related
to the agent and all the objects involved in the agent’s execution, in this case, the Tools, LLMChain, and LLM.
This prevents us from having to manually attach the handlers to each individual nested object.
from typing import Any, Dict, List, Union
def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> Any:
"""Run when LLM errors."""
def on_chain_start(
self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
) -> Any:
print(f"on_chain_start {serialized['name']}")
def on_tool_start(
self, serialized: Dict[str, Any], input_str: str, **kwargs: Any
) -> Any:
print(f"on_tool_start {serialized['name']}")
class MyCustomHandlerTwo(BaseCallbackHandler):
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> Any:
print(f"on_llm_start (I'm the second handler!!) {serialized['name']}")
# Setup the agent. Only the `llm` will issue callbacks for handler2
llm = OpenAI(temperature=0, streaming=True, callbacks=[handler2])
tools = load_tools(["llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)
on_new_token 2
on_new_token **
on_new_token 0
on_new_token .
on_new_token 235
on_new_token
on_new_token ```
on_new_token ...
on_new_token num
on_new_token expr
on_new_token .
on_new_token evaluate
on_new_token ("
on_new_token 2
on_new_token **
on_new_token 0
on_new_token .
on_new_token 235
on_new_token ")
on_new_token ...
on_new_token
on_new_token
on_chain_start LLMChain
on_llm_start OpenAI
on_llm_start (I'm the second handler!!) OpenAI
on_new_token I
on_new_token now
on_new_token know
on_new_token the
on_new_token final
on_new_token answer
on_new_token .
on_new_token
Final
on_new_token Answer
on_new_token :
on_new_token 1
on_new_token .
on_new_token 17
on_new_token 690
on_new_token 67
on_new_token 372
on_new_token 187
on_new_token 674
on_new_token
'1.1769067372187674'
Previous
« Logging to file
Next
Tags »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesRetrievalRetrievers
On this page
Retrievers
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A
retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the
backbone of a retriever, but there are other types of retrievers as well.
Retrievers accept a string query as input and return a list of Document's as output.
LangChain provides several advanced retrieval types. A full list is below, along with the following information:
Index Type: Which index type (if any) this relies on.
When to Use: Our commentary on when you should considering using this retrieval method.
LangChain also integrates with many third-party retrieval services. For a full list of these, check outthis list of all integrations.
Since retrievers are Runnable 's, we can easily compose them with otherRunnable objects:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()
def format_docs(docs):
return "\n\n".join([d.page_content for d in docs])
chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)
Custom Retriever
Since the retriever interface is so simple, it's pretty easy to write a custom one.
class CustomRetriever(BaseRetriever):
def _get_relevant_documents(
self, query: str, *, run_manager: CallbackManagerForRetrieverRun
) -> List[Document]:
return [Document(page_content=query)]
retriever = CustomRetriever()
retriever.get_relevant_documents("bar")
Previous
« Vector stores
Next
Vector store-backed retriever »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression Managing prompt
Language Cookbooksize
With LCEL, it’s easy to add custom functionality for managing the size of prompts within your chain or agent. Let’s look at
simple agent example that can search Wikipedia for information.
agent = (
{
"input": itemgetter("input"),
"agent_scratchpad": lambda x: format_to_openai_function_messages(
x["intermediate_steps"]
),
}
| prompt
| llm.bind_functions(tools)
| OpenAIFunctionsAgentOutputParser()
)
Page: Delaware
Summary: Delaware ( DEL-ə-wair) is a state in the northeast and Mid-Atlantic regions of the United States. It borders Maryland to its south and west, Pennsylvania to
The southern two counties, Kent and Sussex counties, historically have been predominantly agrarian economies. New Castle is more urbanized and is considered pa
Delaware was one of the Thirteen Colonies that participated in the American Revolution and American Revolutionary War, in which the American Continental Army, le
On December 7, 1787, Delaware was the first state to ratify the Constitution of the United States, earning it the nickname "The First State".Since the turn of the 20th c
Page: Lenape
Summary: The Lenape (English: , , ; Lenape languages: [lənaːpe]), also called the Lenni Lenape and Delaware people, are an Indigenous people of the Northeastern
During the last decades of the 18th century, European settlers and the effects of the American Revolutionary War displaced most Lenape from their homelands and p
BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 4097 tokens. However, your messages resulted in 5487 tokens (541
LangSmith trace
Unfortunately we run out of space in our model’s context window before we the agent can get to the final answer. Now let’s
add some prompt handling logic. To keep things simple, if our messages have too many tokens we’ll start dropping the
earliest AI, Function message pairs (this is the model tool invocation message and the subsequent tool output message) in
the chat history.
def condense_prompt(prompt: ChatPromptValue) -> ChatPromptValue:
messages = prompt.to_messages()
num_tokens = llm.get_num_tokens_from_messages(messages)
ai_function_messages = messages[2:]
while num_tokens > 4_000:
ai_function_messages = ai_function_messages[2:]
num_tokens = llm.get_num_tokens_from_messages(
messages[:2] + ai_function_messages
)
messages = messages[:2] + ai_function_messages
return ChatPromptValue(messages=messages)
agent = (
{
"input": itemgetter("input"),
"agent_scratchpad": lambda x: format_to_openai_function_messages(
x["intermediate_steps"]
),
}
| prompt
| condense_prompt
| llm.bind_functions(tools)
| OpenAIFunctionsAgentOutputParser()
)
Page: Delaware
Summary: Delaware ( DEL-ə-wair) is a state in the northeast and Mid-Atlantic regions of the United States. It borders Maryland to its south and west, Pennsylvania to
The southern two counties, Kent and Sussex counties, historically have been predominantly agrarian economies. New Castle is more urbanized and is considered pa
Delaware was one of the Thirteen Colonies that participated in the American Revolution and American Revolutionary War, in which the American Continental Army, le
On December 7, 1787, Delaware was the first state to ratify the Constitution of the United States, earning it the nickname "The First State".Since the turn of the 20th c
Page: Delaware City, Delaware
Summary: Delaware City is a city in New Castle County, Delaware, United States. The population was 1,885 as of 2020. It is a small port town on the eastern terminu
Page: Lenape
Summary: The Lenape (English: , , ; Lenape languages: [lənaːpe]), also called the Lenni Lenape and Delaware people, are an Indigenous people of the Northeastern
During the last decades of the 18th century, European settlers and the effects of the American Revolutionary War displaced most Lenape from their homelands and p
Page: Silkie
Summary: The Silkie (also known as the Silky or Chinese silk chicken) is a breed of chicken named for its atypically fluffy plumage, which is said to feel like silk and s
{'input': "Who is the current US president? What's their home state? What's their home state's bird? What's that bird's scientific name?",
'output': 'The current US president is Joe Biden. His home state is Delaware. The home state bird of Delaware is the Delaware Blue Hen. The scientific name of the D
LangSmith trace
Previous
« Adding moderation
Next
Using tools »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Output Custom Output
ModulesI/O Parsers Parsers
On this page
1. Using RunnableLambda or RunnableGenerator in LCEL – we strongly recommend this for most use cases
2. By inherting from one of the base classes for out parsing – this is the hard way of doing things
The difference between the two approaches are mostly superficial and are mainly in terms of which callbacks are triggered
(e.g., on_chain_start vs. on_parser_start), and how a runnable lambda vs. a parser might be visualized in a tracing platform like
LangSmith.
The recommended way to parse is using runnable lambdas and runnable generators!
Here, we will make a simple parse that inverts the case of the output from the model.
For example, if the model outputs: “Meow”, the parser will produce “mEOW”.
model = ChatAnthropic(model_name="claude-2.1")
LCEL automatically upgrades the function parse to RunnableLambda(parse) when composed using a | syntax.
If you don’t like that you can manually import RunnableLambda and then runparse = RunnableLambda(parse) .
No, it doesn’t because the parser aggregates the input before parsing the output.
If we want to implement a streaming parser, we can have the parser accept an iterable over the input instead and yield the
results as they’re available.
from langchain_core.runnables import RunnableGenerator
streaming_parse = RunnableGenerator(streaming_parse)
INFO
Please wrap the streaming parser in RunnableGenerator as we may stop automatically upgrading it with the| syntax.
Another approach to implement a parser is by inherting from BaseOutputParser, BaseGenerationOutputParser or another one of the
base parsers depending on what you need to do.
In general, we do not recommend this approach for most use cases as it results in more code to write without significant
benefits.
The simplest kind of output parser extends theBaseOutputParser class and must implement the following methods:
When the output from the chat model or LLM is malformed, the can throw anOutputParserException to indicate that parsing fails
because of bad input. Using this exception allows code that utilizes the parser to handle the exceptions in a consistent
manner.
Because BaseOutputParser implements the Runnable interface, any custom parser you will create this way will become valid
LangChain Runnables and will benefit from automatic async support, batch interface, logging support etc. :::
Simple Parser
Here’s a simple parser that can parse a string representation of a booealn (e.g., YES or NO) and convert it into the
corresponding boolean type.
from langchain_core.exceptions import OutputParserException
from langchain_core.output_parsers import BaseOutputParser
@property
def _type(self) -> str:
return "boolean_output_parser"
parser = BooleanOutputParser()
parser.invoke("YES")
True
try:
parser.invoke("MEOW")
except Exception as e:
print(f"Triggered an exception of type: {type(e)}")
Triggered an exception of type: <class 'langchain_core.exceptions.OutputParserException'>
parser = BooleanOutputParser(true_val="OKAY")
parser.invoke("OKAY")
True
parser.batch(["OKAY", "NO"])
[True, False]
await parser.abatch(["OKAY", "NO"])
[True, False]
from langchain_anthropic.chat_models import ChatAnthropic
anthropic = ChatAnthropic(model_name="claude-2.1")
anthropic.invoke("say OKAY or NO")
AIMessage(content='OKAY')
The parser will work with either the output from an LLM (a string) or the output from a chat model (anAIMessage)!
Sometimes there is additional metadata on the model output that is important besides the raw text. One example of this is tool
calling, where arguments intended to be passed to called functions are returned in a separate property. If you need this finer-
grained control, you can instead subclass the BaseGenerationOutputParser class.
This class requires a single method parse_result. This method takes raw model output (e.g., list ofGeneration or ChatGeneration)
and returns the parsed output.
Supporting both Generation and ChatGeneration allows the parser to work with both regular LLMs as well as with Chat Models.
from typing import List
class StrInvertCase(BaseGenerationOutputParser[str]):
"""An example parser that inverts the case of the characters in the message.
This is an example parse shown just for demonstration purposes and to keep
the example as simple as possible.
"""
Args:
result: A list of Generations to be parsed. The Generations are assumed
to be different candidate outputs for a single model input.
Many parsers assume that only a single generation is passed it in.
We will assert for that
partial: Whether to allow partial results. This is used for parsers
that support streaming
"""
if len(result) != 1:
raise NotImplementedError(
"This output parser can only be used with a single generation."
)
generation = result[0]
if not isinstance(generation, ChatGeneration):
# Say that this one only works with chat generations
raise OutputParserException(
"This output parser can only be used with a chat generation."
)
return generation.message.content.swapcase()
Let’s the new parser! It should be inverting the output from the model.
Previous
« Quickstart
Next
CSV parser »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Tracking token
ModulesI/O LLMsusage
Let’s first look at an extremely simple example of tracking token usage for a single LLM call.
Anything inside the context manager will get tracked. Here’s an example of using it to track multiple calls in sequence.
If a chain or agent with multiple steps in it is used, it will track all those steps.
llm = OpenAI(temperature=0)
tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(
tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
with get_openai_callback() as cb:
response = agent.run(
"Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?"
)
print(f"Total Tokens: {cb.total_tokens}")
print(f"Prompt Tokens: {cb.prompt_tokens}")
print(f"Completion Tokens: {cb.completion_tokens}")
print(f"Total Cost (USD): ${cb.total_cost}")
> Entering new AgentExecutor chain...
I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.
Action: Search
Action Input: "Olivia Wilde boyfriend"
Observation: ["Olivia Wilde and Harry Styles took fans by surprise with their whirlwind romance, which began when they met on the set of Don't Worry Darling.", 'Olivi
Thought: Harry Styles is Olivia Wilde's boyfriend.
Action: Search
Action Input: "Harry Styles age"
Observation: 29 years
Thought: I need to calculate 29 raised to the 0.23 power.
Action: Calculator
Action Input: 29^0.23
Observation: Answer: 2.169459462491557
Thought: I now know the final answer.
Final Answer: Harry Styles is Olivia Wilde's boyfriend and his current age raised to the 0.23 power is 2.169459462491557.
Previous
« Streaming
Next
Output Parsers »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Output CSV
ModulesI/O Parsers Types parser
CSV parser
This output parser can be used when you want to return a list of comma-separated items.
output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
template="List five {subject}.\n{format_instructions}",
input_variables=["subject"],
partial_variables={"format_instructions": format_instructions},
)
model = ChatOpenAI(temperature=0)
Previous
« Custom Output Parsers
Next
Datetime parser »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesMoreMemory
On this page
[Beta] Memory
Most LLM applications have a conversational interface. An essential component of a conversation is being able to refer to
information introduced earlier in the conversation. At bare minimum, a conversational system should be able to access some
window of past messages directly. A more complex system will need to have a world model that it is constantly updating,
which allows it to do things like maintain information about entities and their relationships.
We call this ability to store information about past interactions "memory". LangChain provides a lot of utilities for adding
memory to a system. These utilities can be used by themselves or incorporated seamlessly into a chain.
Most of memory-related functionality in LangChain is marked as beta. This is for two reasons:
1. Most functionality (with some exceptions, see below) are not production ready
2. Most functionality (with some exceptions, see below) work with Legacy chains, not the newer LCEL syntax.
The main exception to this is the ChatMessageHistory functionality. This functionality is largely production ready and does
integrate with LCEL.
LCEL Runnables: For an overview of how to useChatMessageHistory with LCEL runnables, see these docs
Integrations: For an introduction to the various ChatMessageHistory integrations, see these docs
Introduction
A memory system needs to support two basic actions: reading and writing. Recall that every chain defines some core
execution logic that expects certain inputs. Some of these inputs come directly from the user, but some of these inputs can
come from memory. A chain will interact with its memory system twice in a given run.
1. AFTER receiving the initial user inputs but BEFORE executing the core logic, a chain will READ from its memory
system and augment the user inputs.
2. AFTER executing the core logic but BEFORE returning the answer, a chain will WRITE the inputs and outputs of the
current run to memory, so that they can be referred to in future runs.
Building memory into a system
Underlying any memory is a history of all chat interactions. Even if these are not all used directly, they need to be stored in
some form. One of the key parts of the LangChain memory module is a series of integrations for storing these chat
messages, from in-memory lists to persistent databases.
Chat message storage: How to work with Chat Messages, and the various integrations offered.
Keeping a list of chat messages is fairly straight-forward. What is less straight-forward are the data structures and algorithms
built on top of chat messages that serve a view of those messages that is most useful.
A very simple memory system might just return the most recent messages each run. A slightly more complex memory
system might return a succinct summary of the past K messages. An even more sophisticated system might extract entities
from stored messages and only return information about entities referenced in the current run.
Each application can have different requirements for how memory is queried. The memory module should make it easy to
both get started with simple memory systems and write your own custom systems if needed.
Memory types: The various data structures and algorithms that make up the memory types LangChain supports
Get started
Let's take a look at what Memory actually looks like in LangChain. Here we'll cover the basics of interacting with an arbitrary
memory class.
Let's take a look at how to useConversationBufferMemory in chains. ConversationBufferMemory is an extremely simple form of
memory that just keeps a list of chat messages in a buffer and passes those into the prompt template.
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory()
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")
When using memory in a chain, there are a few key concepts to understand. Note that here we cover general concepts that
are useful for most types of memory. Each individual memory type may very well have its own parameters and concepts that
are necessary to understand.
Before going into the chain, various variables are read from memory. These have specific names which need to align with the
variables the chain expects. You can see what these variables are by calling memory.load_memory_variables({}). Note that the
empty dictionary that we pass in is just a placeholder for real variables. If the memory type you are using is dependent upon
the input variables, you may need to pass some in.
memory.load_memory_variables({})
{'history': "Human: hi!\nAI: what's up?"}
In this case, you can see that load_memory_variables returns a single key, history. This means that your chain (and likely your
prompt) should expect an input named history. You can usually control this variable through parameters on the memory class.
For example, if you want the memory variables to be returned in the key chat_history you can do:
memory = ConversationBufferMemory(memory_key="chat_history")
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")
{'chat_history': "Human: hi!\nAI: what's up?"}
The parameter name to control these keys may vary per memory type, but it's important to understand that (1) this is
controllable, and (2) how to control it.
One of the most common types of memory involves returning a list of chat messages. These can either be returned as a
single string, all concatenated together (useful when they will be passed into LLMs) or a list of ChatMessages (useful when
passed into ChatModels).
By default, they are returned as a single string. In order to return as a list of messages, you can setreturn_messages=True
memory = ConversationBufferMemory(return_messages=True)
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")
{'history': [HumanMessage(content='hi!', additional_kwargs={}, example=False),
AIMessage(content='what's up?', additional_kwargs={}, example=False)]}
Often times chains take in or return multiple input/output keys. In these cases, how can we know which keys we want to save
to the chat message history? This is generally controllable by input_key and output_key parameters on the memory types. These
default to None - and if there is only one input/output key it is known to just use that. However, if there are multiple input/output
keys then you MUST specify the name of which one to use.
Finally, let's take a look at using this in a chain. We'll use anLLMChain, and show working with both an LLM and a ChatModel.
Using an LLM
from langchain_openai import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
llm = OpenAI(temperature=0)
# Notice that "chat_history" is present in the prompt template
template = """You are a nice chatbot having a conversation with a human.
Previous conversation:
{chat_history}
Using a ChatModel
from langchain_openai import ChatOpenAI
from langchain.prompts import (
ChatPromptTemplate,
MessagesPlaceholder,
SystemMessagePromptTemplate,
HumanMessagePromptTemplate,
)
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
llm = ChatOpenAI()
prompt = ChatPromptTemplate(
messages=[
SystemMessagePromptTemplate.from_template(
"You are a nice chatbot having a conversation with a human."
),
# The `variable_name` here is what must align with memory
MessagesPlaceholder(variable_name="chat_history"),
HumanMessagePromptTemplate.from_template("{question}")
]
)
# Notice that we `return_messages=True` to fit into the MessagesPlaceholder
# Notice that `"chat_history"` aligns with the MessagesPlaceholder name.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
conversation = LLMChain(
llm=llm,
prompt=prompt,
verbose=True,
memory=memory
)
# Notice that we just pass in the `question` variables - `chat_history` gets populated by memory
conversation({"question": "hi"})
Next steps
And that's it for getting started! Please see the other sections for walkthroughs of more advanced topics, like custom memory,
multiple memories, and more.
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression How Add
Language to fallbacks
On this page
Add fallbacks
There are many possible points of failure in an LLM application, whether that be issues with LLM API’s, poor model outputs,
issues with other integrations, etc. Fallbacks help you gracefully handle and isolate these issues.
Crucially, fallbacks can be applied not only on the LLM level but on the whole runnable level.
This is maybe the most common use case for fallbacks. A request to an LLM API can fail for a variety of reasons - the API
could be down, you could have hit rate limits, any number of things. Therefore, using fallbacks can help protect against these
types of things.
IMPORTANT: By default, a lot of the LLM wrappers catch errors and retry. You will most likely want to turn those off when
working with fallbacks. Otherwise the first wrapper will keep on retrying and not failing.
First, let’s mock out what happens if we hit a RateLimitError from OpenAI
import httpx
from openai import RateLimitError
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You're a nice assistant who always includes a compliment in your response",
),
("human", "Why did the {animal} cross the road"),
]
)
chain = prompt | llm
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
try:
print(chain.invoke({"animal": "kangaroo"}))
except RateLimitError:
print("Hit error")
content=" I don't actually know why the kangaroo crossed the road, but I'm happy to take a guess! Maybe the kangaroo was trying to get to the other side to find som
We can also specify the errors to handle if we want to be more specific about when the fallback is invoked:
llm = openai_llm.with_fallbacks(
[anthropic_llm], exceptions_to_handle=(KeyboardInterrupt,)
)
We can also create fallbacks for sequences, that are sequences themselves. Here we do that with two different models:
ChatOpenAI and then normal OpenAI (which does not use a chat model). Because OpenAI is NOT a chat model, you likely
want a different prompt.
chat_prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You're a nice assistant who always includes a compliment in your response",
),
("human", "Why did the {animal} cross the road"),
]
)
# Here we're going to use a bad model name to easily create a chain that will error
chat_model = ChatOpenAI(model_name="gpt-fake")
bad_chain = chat_prompt | chat_model | StrOutputParser()
# Now lets create a chain with the normal OpenAI model
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Text
ModulesRetrievalSplitters Semantic Chunking
On this page
Semantic Chunking
Splits the text based on semantic similarity.
At a high level, this splits into sentences, then groups into groups of 3 sentences, and then merges one that are similar in the
embedding space.
Install Dependencies
Split Text
docs = text_splitter.create_documents([state_of_the_union])
print(docs[0].page_content)
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A
Breakpoints
This chunker works by determining when to “break” apart sentences. This is done by looking for differences in embeddings
between any two sentences. When that difference is past some threshold, then they are split.
Percentile
The default way to split is based on percentile. In this method, all differences between sentences are calculated, and then
any difference greater than the X percentile is split.
text_splitter = SemanticChunker(
OpenAIEmbeddings(), breakpoint_threshold_type="percentile"
)
docs = text_splitter.create_documents([state_of_the_union])
print(docs[0].page_content)
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A
print(len(docs))
26
Standard Deviation
text_splitter = SemanticChunker(
OpenAIEmbeddings(), breakpoint_threshold_type="standard_deviation"
)
docs = text_splitter.create_documents([state_of_the_union])
print(docs[0].page_content)
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A
print(len(docs))
4
Interquartile
text_splitter = SemanticChunker(
OpenAIEmbeddings(), breakpoint_threshold_type="interquartile"
)
docs = text_splitter.create_documents([state_of_the_union])
print(docs[0].page_content)
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A
print(len(docs))
25
Previous
« Recursively split by character
Next
Split by tokens »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression
Language Get started
On this page
Get started
LCEL makes it easy to build complex chains from basic components, and supports out of the box functionality such as
streaming, parallelism, and logging.
The most basic and common use case is chaining a prompt template and a model together. To see how this works, let’s
create a chain that takes a topic and generates a joke:
Notice this line of this code, where we piece together then different components into a single chain using LCEL:
The | symbol is similar to a unix pipe operator, which chains together the different components feeds the output from one
component as input into the next component.
In this chain the user input is passed to the prompt template, then the prompt template output is passed to the model, then
the model output is passed to the output parser. Let’s take a look at each component individually to really understand what’s
going on.
1. Prompt
prompt is a BasePromptTemplate , which means it takes in a dictionary of template variables and produces aPromptValue. A
PromptValue is a wrapper around a completed prompt that can be passed to either an LLM (which takes a string as input) or
ChatModel (which takes a sequence of messages as input). It can work with either language model type because it defines
logic both for producing BaseMessages and for producing a string.
2. Model
The PromptValue is then passed to model. In this case our model is a ChatModel, meaning it will output a BaseMessage.
message = model.invoke(prompt_value)
message
AIMessage(content="Why don't ice creams ever get invited to parties?\n\nBecause they always bring a melt down!")
If our model was an LLM, it would output a string.
llm = OpenAI(model="gpt-3.5-turbo-instruct")
llm.invoke(prompt_value)
'\n\nRobot: Why did the ice cream truck break down? Because it had a meltdown!'
3. Output parser
And lastly we pass our model output to the output_parser, which is a BaseOutputParser meaning it takes either a string or a
BaseMessage as input. The StrOutputParser specifically simple converts any input into a string.
output_parser.invoke(message)
"Why did the ice cream go to therapy? \n\nBecause it had too many toppings and couldn't find its cone-fidence!"
4. Entire Pipeline
Note that if you’re curious about the output of any components, you can always test out a smaller version of the chain such
as prompt or prompt | model to see the intermediate results:
prompt.invoke(input)
# > ChatPromptValue(messages=[HumanMessage(content='tell me a short joke about ice cream')])
(prompt | model).invoke(input)
# > AIMessage(content="Why did the ice cream go to therapy?\nBecause it had too many toppings and couldn't cone-trol itself!")
For our next example, we want to run a retrieval-augmented generation chain to add some context when responding to
questions.
# Requires:
# pip install langchain docarray tiktoken
vectorstore = DocArrayInMemorySearch.from_texts(
["harrison worked at kensho", "bears like to eat honey"],
embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()
output_parser = StrOutputParser()
setup_and_retrieval = RunnableParallel(
{"context": retriever, "question": RunnablePassthrough()}
)
chain = setup_and_retrieval | prompt | model | output_parser
To explain this, we first can see that the prompt template above takes incontext and question as values to be substituted in the
prompt. Before building the prompt template, we want to retrieve relevant documents to the search and include them as part
of the context.
As a preliminary step, we’ve setup the retriever using an in memory store, which can retrieve documents based on a query.
This is a runnable component as well that can be chained together with other components, but you can also try to run it
separately:
We then use the RunnableParallel to prepare the expected inputs into the prompt by using the entries for the retrieved
documents as well as the original user question, using the retriever for document search, and RunnablePassthrough to pass
the user’s question:
setup_and_retrieval = RunnableParallel(
{"context": retriever, "question": RunnablePassthrough()}
)
setup_and_retrieval = RunnableParallel(
{"context": retriever, "question": RunnablePassthrough()}
)
chain = setup_and_retrieval | prompt | model | output_parser
1. The first steps create a RunnableParallel object with two entries. The first entry, context will include the document results
fetched by the retriever. The second entry, question will contain the user’s original question. To pass on the question, we
use RunnablePassthrough to copy this entry.
2. Feed the dictionary from the step above to theprompt component. It then takes the user input which is question as well as
the retrieved document which is context to construct a prompt and output a PromptValue.
3. The model component takes the generated prompt, and passes into the OpenAI LLM model for evaluation. The
generated output from the model is a ChatMessage object.
4. Finally, the output_parser component takes in a ChatMessage, and transforms this into a Python string, which is returned
from the invoke method.
Next steps
We recommend reading our Why use LCEL section next to see a side-by-side comparison of the code needed to produce
common functionality with and without LCEL.
Previous
« LangChain Expression Language (LCEL)
Next
Why use LCEL »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression
Language CookbookUsing tools
Using tools
You can use any Tools with Runnables easily.
{input}"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()
chain = prompt | model | StrOutputParser() | search
chain.invoke({"input": "I'd like to figure out what games are tonight"})
'What sports games are on TV today & tonight? Watch and stream live sports on TV today, tonight, tomorrow. Today\'s 2023 sports TV schedule includes football, bas
Previous
« Managing prompt size
Next
LangChain Expression Language (LCEL) »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Output
ModulesI/O Parsers Quickstart
On this page
Quickstart
Language models output text. But many times you may want to get more structured information than just text back. This is
where output parsers come in.
Output parsers are classes that help structure language model responses. There are two main methods an output parser
must implement:
“Get format instructions”: A method which returns a string containing instructions for how the output of a language
model should be formatted.
“Parse”: A method which takes in a string (assumed to be the response from a language model) and parses it into
some structure.
“Parse with prompt”: A method which takes in a string (assumed to be the response from a language model) and a
prompt (assumed to be the prompt that generated such a response) and parses it into some structure. The prompt is
largely provided in the event the OutputParser wants to retry or fix the output in some way, and needs information from
the prompt to do so.
Get started
prompt = PromptTemplate(
template="Answer the user query.\n{format_instructions}\n{query}\n",
input_variables=["query"],
partial_variables={"format_instructions": parser.get_format_instructions()},
)
# And a query intended to prompt a language model to populate the data structure.
prompt_and_model = prompt | model
output = prompt_and_model.invoke({"query": "Tell me a joke."})
parser.invoke(output)
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')
LCEL
Output parsers implement the Runnable interface, the basic building block of theLangChain Expression Language (LCEL).
This means they support invoke , ainvoke, stream, astream, batch, abatch, astream_log calls.
Output parsers accept a string or BaseMessage as input and can return an arbitrary type.
parser.invoke(output)
Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')
Instead of manually invoking the parser, we also could’ve just added it to ourRunnable sequence:
While all parsers support the streaming interface, only certain parsers can stream through partially parsed objects, since this
is highly dependent on the output type. Parsers which cannot construct partial objects will simply yield the fully parsed output.
json_prompt = PromptTemplate.from_template(
"Return a JSON object with an `answer` key that answers the following question: {question}"
)
json_parser = SimpleJsonOutputParser()
json_chain = json_prompt | model | json_parser
list(json_chain.stream({"question": "Who invented the microscope?"}))
[{},
{'answer': ''},
{'answer': 'Ant'},
{'answer': 'Anton'},
{'answer': 'Antonie'},
{'answer': 'Antonie van'},
{'answer': 'Antonie van Lee'},
{'answer': 'Antonie van Leeu'},
{'answer': 'Antonie van Leeuwen'},
{'answer': 'Antonie van Leeuwenho'},
{'answer': 'Antonie van Leeuwenhoek'}]
Previous
« Output Parsers
Next
Custom Output Parsers »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression How RunnableParallel: Manipulating
Language to data
On this page
Here the input to prompt is expected to be a map with keys “context” and “question”. The user input is just the question. So
we need to get the context using our retriever and passthrough the user input under the “question” key.
vectorstore = FAISS.from_texts(
["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()
retrieval_chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)
Note that when composing a RunnableParallel with another Runnable we don’t even need to wrap our dictionary in the
RunnableParallel class — the type conversion is handled for us. In the context of a chain, these are equivalent:
Note that you can use Python’s itemgetter as shorthand to extract data from the map when combining withRunnableParallel. You
can find more information about itemgetter in the Python Documentation.
In the example below, we use itemgetter to extract specific keys from the map:
from operator import itemgetter
vectorstore = FAISS.from_texts(
["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()
Question: {question}
chain = (
{
"context": itemgetter("question") | retriever,
"question": itemgetter("question"),
"language": itemgetter("language"),
}
| prompt
| model
| StrOutputParser()
)
Parallelize steps
RunnableParallel (aka. RunnableMap) makes it easy to execute multiple Runnables in parallel, and to return the output of
these Runnables as a map.
model = ChatOpenAI()
joke_chain = ChatPromptTemplate.from_template("tell me a joke about {topic}") | model
poem_chain = (
ChatPromptTemplate.from_template("write a 2-line poem about {topic}") | model
)
map_chain.invoke({"topic": "bear"})
{'joke': AIMessage(content="Why don't bears wear shoes?\n\nBecause they have bear feet!"),
'poem': AIMessage(content="In the wild's embrace, bear roams free,\nStrength and grace, a majestic decree.")}
Parallelism
RunnableParallel are also useful for running independent processes in parallel, since each Runnable in the map is executed
in parallel. For example, we can see our earlier joke_chain, poem_chain and map_chain all have about the same runtime, even
though map_chain executes both of the other two.
%%timeit
joke_chain.invoke({"topic": "bear"})
958 ms ± 402 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
poem_chain.invoke({"topic": "bear"})
1.22 s ± 508 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
map_chain.invoke({"topic": "bear"})
1.15 s ± 119 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Previous
« How to
Next
RunnablePassthrough: Passing data through »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O Chat ModelsQuick Start
On this page
Quick Start
Chat models are a variation on language models. While chat models use language models under the hood, the interface they
use is a bit different. Rather than using a “text in, text out” API, they use an interface where “chat messages” are the inputs
and outputs.
Setup
For this example we’ll need to install the OpenAI partner package:
Accessing the API requires an API key, which you can get by creating an account and headinghere. Once we have a key
we’ll want to set it as an environment variable by running:
export OPENAI_API_KEY="..."
If you’d prefer not to set an environment variable you can pass the key in directly via theopenai_api_key named parameter
when initiating the OpenAI LLM class:
chat = ChatOpenAI(openai_api_key="...")
chat = ChatOpenAI()
Messages
The chat model interface is based around messages rather than raw text. The types of messages currently supported in
LangChain are AIMessage, HumanMessage , SystemMessage, FunctionMessage and ChatMessage – ChatMessage takes in an arbitrary role
parameter. Most of the time, you’ll just be dealing with HumanMessage , AIMessage, and SystemMessage
LCEL
Chat models implement the Runnable interface, the basic building block of theLangChain Expression Language (LCEL). This
means they support invoke , ainvoke, stream, astream, batch, abatch, astream_log calls.
Chat models accept List[BaseMessage] as inputs, or objects which can be coerced to messages, includingstr (converted to
HumanMessage ) and PromptValue.
messages = [
SystemMessage(content="You're a helpful assistant"),
HumanMessage(content="What is the purpose of model regularization?"),
]
chat.invoke(messages)
AIMessage(content="The purpose of model regularization is to prevent overfitting in machine learning models. Overfitting occurs when a model becomes too comple
chat.batch([messages])
[AIMessage(content="The purpose of model regularization is to prevent overfitting in machine learning models. Overfitting occurs when a model becomes too comple
await chat.ainvoke(messages)
AIMessage(content='The purpose of model regularization is to prevent overfitting in machine learning models. Overfitting occurs when a model becomes too complex
LangSmith
All ChatModels come with built-in LangSmith tracing. Just set the following environment variables:
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY=<your-api-key>
and any ChatModel invocation (whether it’s nested in a chain or not) will automatically be traced. A trace will include inputs,
outputs, latency, token usage, invocation params, environment params, and more. See an example here:
https://smith.langchain.com/public/a54192ae-dd5c-4f7a-88d1-daa1eaba1af7/r.
In LangSmith you can then provide feedback for any trace, compile annotated datasets for evals, debug performance in the
playground, and more.
For convenience you can also treat chat models as callables. You can get chat completions by passing one or more
messages to the chat model. The response will be a message.
from langchain_core.messages import HumanMessage, SystemMessage
chat(
[
HumanMessage(
content="Translate this sentence from English to French: I love programming."
)
]
)
AIMessage(content="J'adore la programmation.")
OpenAI’s chat model supports multiple messages as input. Seehere for more information. Here is an example of sending a
system and user message to the chat model:
messages = [
SystemMessage(
content="You are a helpful assistant that translates English to French."
),
HumanMessage(content="I love programming."),
]
chat(messages)
AIMessage(content="J'adore la programmation.")
[Legacy] generate
You can go one step further and generate completions for multiple sets of messages usinggenerate. This returns an LLMResult
with an additional message parameter. This will include additional information about each generation beyond the returned
message (e.g. the finish reason) and additional information about the full API call (e.g. total tokens used).
batch_messages = [
[
SystemMessage(
content="You are a helpful assistant that translates English to French."
),
HumanMessage(content="I love programming."),
],
[
SystemMessage(
content="You are a helpful assistant that translates English to French."
),
HumanMessage(content="I love artificial intelligence."),
],
]
result = chat.generate(batch_messages)
result
LLMResult(generations=[[ChatGeneration(text="J'adore programmer.", generation_info={'finish_reason': 'stop'}, message=AIMessage(content="J'adore programmer.
You can recover things like token usage from this LLMResult:
result.llm_output
{'token_usage': {'prompt_tokens': 53,
'completion_tokens': 18,
'total_tokens': 71},
'model_name': 'gpt-3.5-turbo'}
Previous
« Chat Models
Next
Function calling »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression Code
Language Cookbookwriting
Code writing
Example of how to use LCEL to write Python code.
```python
....
```"""
prompt = ChatPromptTemplate.from_messages([("system", template), ("human", "{input}")])
model = ChatOpenAI()
def _sanitize_output(text: str):
_, after = text.split("```python")
return after.split("```")[0]
chain = prompt | model | StrOutputParser() | _sanitize_output | PythonREPL().run
chain.invoke({"input": "whats 2 plus 2"})
Python REPL can execute arbitrary code. Use with caution.
'4\n'
Previous
« Agents
Next
Routing by semantic similarity »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Output
ModulesI/O Parsers
Output Parsers
Output parsers are responsible for taking the output of an LLM and transforming it to a more suitable format. This is very
useful when you are using LLMs to generate any form of structured data.
Besides having a large collection of different types of output parsers, one distinguishing benefit of LangChain OutputParsers
is that many of them support streaming.
Quick Start
See this quick-start guide for an introduction to output parsers and how to work with them.
LangChain has lots of different types of output parsers. This is a list of output parsers LangChain supports. The table below
has various pieces of information:
Has Format Instructions: Whether the output parser has format instructions. This is generally available except when (a) the
desired schema is not specified in the prompt but rather in other parameters (like OpenAI function calling), or (b) when the
OutputParser wraps another OutputParser.
Calls LLM: Whether this output parser itself calls an LLM. This is usually only done by output parsers that attempt to correct
misformatted output.
Input Type: Expected input type. Most output parsers work on both strings and messages, but some (like OpenAI Functions)
need a message with specific kwargs.
Output Type: The output type of the object returned by the parser.
Description: Our commentary on this output parser and when to use it.
Supports Has Format Calls Input
Name Output Type Description
Streaming Instructions LLM Type
Message
Uses latest OpenAI function calling args tools and
(Passes tools tool_choice to structure the return output. If you
OpenAITools (with JSON object
to model) tool_choice)
are using a model that supports function calling,
this is generally the most reliable method.
(Passes Message Uses legacy OpenAI function calling args
OpenAIFunctions ✅ functions to (with JSON object functions and function_call to structure the return
model) function_call) output.
Returns a JSON object as specified. You can
str \|
specify a Pydantic model and it will return JSON
JSON ✅ ✅ Message
JSON object for that model. Probably the most reliable output
parser for getting structured data that does NOT
use function calling.
str \|
Returns a dictionary of tags. Use when XML
XML ✅ ✅ Message
dict output is needed. Use with models that are good
at writing XML (like Anthropic's).
str \|
CSV ✅ ✅ List[str] Returns a list of comma separated values.
Message
Wraps another output parser. If that output
str \| parser errors, then this will pass the error
OutputFixing ✅
Message message and the bad output to an LLM and ask
it to fix the output.
Wraps another output parser. If that output
parser errors, then this will pass the original
str \| inputs, the bad output, and the error message to
RetryWithError ✅
Message an LLM and ask it to fix it. Compared to
OutputFixingParser, this one also sends the
original instructions.
str \| Takes a user defined Pydantic model and
Pydantic ✅ pydantic.BaseModel
Message returns data in that format.
Takes a user defined Pydantic model and
str \|
YAML ✅ Message
pydantic.BaseModel returns data in that format. Uses YAML to
encode it.
str \| Useful for doing operations with pandas
PandasDataFrame ✅ dict
Message DataFrames.
str \| Parses response into one of the provided enum
Enum ✅ Enum
Message values.
str \|
Datetime ✅ datetime.datetime Parses response into a datetime string.
Message
An output parser that returns structured
information. It is less powerful than other output
str \|
Structured ✅ Dict[str, str] parsers since it only allows for fields to be
Message
strings. This can be useful when you are
working with smaller LLMs.
Previous
« Tracking token usage
Next
Quickstart »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesRetrievalRetrieversMultiVector Retriever
On this page
MultiVector Retriever
It can often be beneficial to store multiple vectors per document. There are multiple use cases where this is beneficial.
LangChain has a base MultiVectorRetriever which makes querying this type of setup easy. A lot of the complexity lies in how to
create the multiple vectors per document. This notebook covers some of the common ways to create those vectors and use
the MultiVectorRetriever.
Smaller chunks: split a document into smaller chunks, and embed those (this is ParentDocumentRetriever).
Summary: create a summary for each document, embed that along with (or instead of) the document.
Hypothetical questions: create hypothetical questions that each document would be appropriate to answer, embed
those along with (or instead of) the document.
Note that this also enables another method of adding embeddings - manually. This is great because you can explicitly add
questions or queries that should lead to a document being recovered, giving you more control.
Smaller chunks
Often times it can be useful to retrieve larger chunks of information, but embed smaller chunks. This allows for embeddings
to capture the semantic meaning as closely as possible, but for as much context as possible to be passed downstream. Note
that this is what the ParentDocumentRetriever does. Here we show what is going on under the hood.
The default search type the retriever performs on the vector database is a similarity search. LangChain Vector Stores also
support searching via Max Marginal Relevance so if you want this instead you can just set thesearch_type property as follows:
retriever.search_type = SearchType.mmr
len(retriever.get_relevant_documents("justice breyer")[0].page_content)
9875
Summary
Oftentimes a summary may be able to distill more accurately what a chunk is about, leading to better retrieval. Here we show
how to create summaries, and then embed those.
import uuid
Hypothetical Queries
An LLM can also be used to generate a list of hypothetical questions that could be asked of a particular document. These
questions can then be embedded
functions = [
{
"name": "hypothetical_questions",
"description": "Generate hypothetical questions",
"parameters": {
"type": "object",
"properties": {
"questions": {
"type": "array",
"items": {"type": "string"},
},
},
"required": ["questions"],
},
}
]
from langchain.output_parsers.openai_functions import JsonKeyOutputFunctionsParser
chain = (
{"doc": lambda x: x.page_content}
# Only asking for 3 hypothetical questions, but this could be adjusted
| ChatPromptTemplate.from_template(
"Generate a list of exactly 3 hypothetical questions that the below document could be used to answer:\n\n{doc}"
)
| ChatOpenAI(max_retries=0, model="gpt-4").bind(
functions=functions, function_call={"name": "hypothetical_questions"}
)
| JsonKeyOutputFunctionsParser(key_name="questions")
)
chain.invoke(docs[0])
["What was the author's first experience with programming like?",
'Why did the author switch their focus from AI to Lisp during their graduate studies?',
'What led the author to contemplate a career in art instead of computer science?']
hypothetical_questions = chain.batch(docs, {"max_concurrency": 5})
# The vectorstore to use to index the child chunks
vectorstore = Chroma(
collection_name="hypo-questions", embedding_function=OpenAIEmbeddings()
)
# The storage layer for the parent documents
store = InMemoryByteStore()
id_key = "doc_id"
# The retriever (empty to start)
retriever = MultiVectorRetriever(
vectorstore=vectorstore,
byte_store=store,
id_key=id_key,
)
doc_ids = [str(uuid.uuid4()) for _ in docs]
question_docs = []
for i, question_list in enumerate(hypothetical_questions):
question_docs.extend(
[Document(page_content=s, metadata={id_key: doc_ids[i]}) for s in question_list]
)
retriever.vectorstore.add_documents(question_docs)
retriever.docstore.mset(list(zip(doc_ids, docs)))
sub_docs = vectorstore.similarity_search("justice breyer")
sub_docs
[Document(page_content='Who has been nominated to serve on the United States Supreme Court?', metadata={'doc_id': '0b3a349e-c936-4e77-9c40-0a39fc3e07f0'}
Document(page_content="What was the context and content of Robert Morris' advice to the document's author in 2010?", metadata={'doc_id': 'b2b2cdca-988a-4af1-
Document(page_content='How did personal circumstances influence the decision to pass on the leadership of Y Combinator?', metadata={'doc_id': 'b2b2cdca-988a-
Document(page_content='What were the reasons for the author leaving Yahoo in the summer of 1999?', metadata={'doc_id': 'ce4f4981-ca60-4f56-86f0-89466de623
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Text Recursively split by
ModulesRetrievalSplitters character
Previous
« Recursively split JSON
Next
Semantic Chunking »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression
Language Cookbook
Cookbook
Example code for accomplishing common tasks with the LangChain Expression Language (LCEL). These examples show
how to compose different Runnable (the core LCEL interface) components to achieve various tasks. If you're just getting
acquainted with LCEL, the Prompt + LLM page is a good place to start.
️ Prompt + LLM
The most common and valuable composition is taking:
️ RAG
Let’s look at adding in a retrieval step to a prompt and LLM, which adds
️ Multiple chains
Runnables can easily be used to string together multiple Chains
️ Querying a SQL DB
We can replicate our SQLDatabaseChain with Runnables.
️ Agents
You can pass a Runnable into an agent. Make sure you have langchainhub
️ Code writing
Example of how to use LCEL to write Python code.
️ Adding moderation
This shows how to add in moderation (or other safeguards) around your
️ Using tools
You can use any Tools with Runnables easily.
Previous
« Add message history (memory)
Next
Prompt + LLM »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Memory in the Multi-Input
ModulesMoreMemoryChain
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_texts(
texts, embeddings, metadatas=[{"source": i} for i in range(len(texts))]
)
Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.
query = "What did the president say about Justice Breyer"
docs = docsearch.similarity_search(query)
from langchain.chains.question_answering import load_qa_chain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI
template = """You are a chatbot having a conversation with a human.
Given the following extracted parts of a long document and a question, create a final answer.
{context}
{chat_history}
Human: {human_input}
Chatbot:"""
prompt = PromptTemplate(
input_variables=["chat_history", "human_input", "context"], template=template
)
memory = ConversationBufferMemory(memory_key="chat_history", input_key="human_input")
chain = load_qa_chain(
OpenAI(temperature=0), chain_type="stuff", memory=memory, prompt=prompt
)
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "human_input": query}, return_only_outputs=True)
{'output_text': ' Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, a
print(chain.memory.buffer)
Previous
« Memory in LLMChain
Next
Memory in Agent »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Document
ModulesRetrievalloaders File Directory
On this page
File Directory
This covers how to load all documents in a directory.
We can use the glob parameter to control which files to load. Note that here it doesn't load the.rst file or the .html files.
By default a progress bar will not be shown. To show a progress bar, install thetqdm library (e.g. pip install tqdm ), and set the
show_progress parameter to True.
Use multithreading
By default the loading happens in one thread. In order to utilize several threads set theuse_multithreading flag to true.
By default this uses the UnstructuredLoader class. However, you can change up the type of loader pretty easily.
If you need to load Python source code files, use the PythonLoader.
In this example we will see some strategies that can be useful when loading a large list of arbitrary files from a directory using
the TextLoader class.
First to illustrate the problem, let's try to load multiple texts with arbitrary encodings.
path = '../../../../../tests/integration_tests/examples'
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader)
A. Default Behavior
loader.load()
<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace"><span style="color
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/data/source/langchain/langch
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">26 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">27 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">28 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span>29 <span
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">30 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">31 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f">32 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #bfbf7f; text-decoration-color: #bfbf7f">/home/spike/.pyenv/versions/3
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 319 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 320 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 321 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #800000; text-decoration-color: #800000">❱ </span> 322 <span
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 323 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 324 </span><span style
<span style="color: #800000; text-decoration-color: #800000">│</span> <span style="color: #7f7f7f; text-decoration-color: #7f7f7f"> 325 </span><span style
<span style="color: #800000; text-decoration-color: #800000">╰────────────────────────────────────────────
<span style="color: #ff0000; text-decoration-color: #ff0000; font-weight: bold">UnicodeDecodeError: </span><span style="color: #008000; text-decoration-color
<span style="font-style: italic">The above exception was the direct cause of the following exception:</span>
With the default behavior of TextLoader any failure to load any of the documents will fail the whole loading process and no
documents are loaded.
B. Silent fail
We can pass the parameter silent_errors to the DirectoryLoader to skip the files which could not be loaded and continue the load
process.
We can also ask TextLoader to auto detect the file encoding before failing, by passing theautodetect_encoding to the loader class.
text_loader_kwargs={'autodetect_encoding': True}
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
docs = loader.load()
doc_sources = [doc.metadata['source'] for doc in docs]
doc_sources
['../../../../../tests/integration_tests/examples/example-non-utf8.txt',
'../../../../../tests/integration_tests/examples/whatsapp_chat.txt',
'../../../../../tests/integration_tests/examples/example-utf8.txt']
Previous
« CSV
Next
HTML »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression How RunnablePassthrough: Passing data
Language to through
On this page
RunnablePassthrough() called on it’s own, will simply take the input and pass it through.
RunnablePassthrough called with assign (RunnablePassthrough.assign(...)) will take the input, and will add the extra arguments
passed to the assign function.
runnable = RunnableParallel(
passed=RunnablePassthrough(),
extra=RunnablePassthrough.assign(mult=lambda x: x["num"] * 3),
modified=lambda x: x["num"] + 1,
)
runnable.invoke({"num": 1})
{'passed': {'num': 1}, 'extra': {'num': 1, 'mult': 3}, 'modified': 2}
As seen above, passed key was called with RunnablePassthrough() and so it simply passed on {'num': 1} .
In the second line, we used RunnablePastshrough.assign with a lambda that multiplies the numerical value by 3. In this cased,extra
was set with {'num': 1, 'mult': 3} which is the original value with the mult key added.
Finally, we also set a third key in the map withmodified which uses a lambda to set a single value adding 1 to the num, which
resulted in modified key with the value of 2.
Retrieval Example
In the example below, we see a use case where we use RunnablePassthrough along with RunnableMap.
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
vectorstore = FAISS.from_texts(
["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()
retrieval_chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)
Here the input to prompt is expected to be a map with keys “context” and “question”. The user input is just the question. So
we need to get the context using our retriever and passthrough the user input under the “question” key. In this case, the
RunnablePassthrough allows us to pass on the user’s question to the prompt and model.
Previous
« RunnableParallel: Manipulating data
Next
RunnableLambda: Run Custom Functions »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangSmith
LangSmith
LangSmith helps you trace and evaluate your language model applications and intelligent agents to help you move from
prototype to production.
For tutorials and other end-to-end examples demonstrating ways to integrate LangSmith in your workflow, check out the
LangSmith Cookbook. Some of the guides therein include:
Previous
« ️ LangServe
Next
LangSmith »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesRetrievalIndexing
On this page
Indexing
Here, we will look at a basic indexing workflow using the LangChain indexing API.
The indexing API lets you load and keep in sync documents from any source into a vector store. Specifically, it helps:
All of which should save you time and money, as well as improve your vector search results.
Crucially, the indexing API will work even with documents that have gone through several transformation steps (e.g., via text
chunking) with respect to the original source documents.
How it works
LangChain indexing makes use of a record manager (RecordManager) that keeps track of document writes into the vector store.
When indexing content, hashes are computed for each document, and the following information is stored in the record
manager:
Deletion modes
When indexing documents into a vector store, it’s possible that some existing documents in the vector store should be
deleted. In certain situations you may want to remove any existing documents that are derived from the same sources as the
new documents being indexed. In others you may want to delete all existing documents wholesale. The indexing API deletion
modes let you pick the behavior you want:
None does not do any automatic clean up, allowing the user to manually do clean up of old content.
If the content of the source document or derived documents haschanged, both incremental or full modes will clean up
(delete) previous versions of the content.
If the source document has been deleted (meaning it is not included in the documents currently being indexed), thefull
cleanup mode will delete it from the vector store correctly, but the incremental mode will not.
When content is mutated (e.g., the source PDF file was revised) there will be a period of time during indexing when both the
new and old versions may be returned to the user. This happens after the new content was written, but before the old version
was deleted.
incremental indexing minimizes this period of time as it is able to do clean up continuously, as it writes.
full mode does the clean up after all batches have been written.
Requirements
1. Do not use with a store that has been pre-populated with content independently of the indexing API, as the record
manager will not know that records have been inserted previously.
2. Only works with LangChain vectorstore’s that support:
document addition by id (add_documents method with ids argument)
delete by id (delete method with ids argument)
Compatible Vectorstores: AnalyticDB, AstraDB, AwaDB, Bagel, Cassandra, Chroma, DashVector, DatabricksVectorSearch, DeepLake, Dingo,
ElasticVectorSearch, ElasticsearchStore, FAISS , HanaDB, Milvus, MyScale, OpenSearchVectorSearch , PGVector, Pinecone, Qdrant, Redis, Rockset,
ScaNN, SupabaseVectorStore, SurrealDBStore, TimescaleVector, Vald, Vearch, VespaStore, Weaviate, ZepVectorStore.
Caution
The record manager relies on a time-based mechanism to determine what content can be cleaned up (when usingfull or
incremental cleanup modes).
If two tasks run back-to-back, and the first task finishes before the clock time changes, then the second task may not be able
to clean up content.
Quickstart
collection_name = "test_index"
embedding = OpenAIEmbeddings()
vectorstore = ElasticsearchStore(
es_url="http://localhost:9200", index_name="test_index", embedding=embedding
)
Suggestion: Use a namespace that takes into account both the vector store and the collection name in the vector store; e.g.,
‘redis/my_docs’, ‘chromadb/my_docs’ or ‘postgres/my_docs’.
namespace = f"elasticsearch/{collection_name}"
record_manager = SQLRecordManager(
namespace, db_url="sqlite:///record_manager_cache.sql"
)
record_manager.create_schema()
def _clear():
"""Hacky helper method to clear content. See the `full` mode section to to understand why it works."""
index([], record_manager, vectorstore, cleanup="full", source_id_key="source")
This mode does not do automatic clean up of old versions of content; however, it still takes care of content de-duplication.
_clear()
index(
[doc1, doc1, doc1, doc1, doc1],
record_manager,
vectorstore,
cleanup=None,
source_id_key="source",
)
{'num_added': 1, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}
_clear()
index([doc1, doc2], record_manager, vectorstore, cleanup=None, source_id_key="source")
{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}
Indexing again should result in both documents gettingskipped – also skipping the embedding operation!
index(
[doc1, doc2],
record_manager,
vectorstore,
cleanup="incremental",
source_id_key="source",
)
{'num_added': 0, 'num_updated': 0, 'num_skipped': 2, 'num_deleted': 0}
If we mutate a document, the new version will be written and all old versions sharing the same source will be deleted.
In full mode the user should pass the full universe of content that should be indexed into the indexing function.
Any documents that are not passed into the indexing function and are present in the vectorstore will be deleted!
_clear()
all_docs = [doc1, doc2]
index(all_docs, record_manager, vectorstore, cleanup="full", source_id_key="source")
{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}
del all_docs[0]
all_docs
[Document(page_content='doggy', metadata={'source': 'doggy.txt'})]
Source
The metadata attribute contains a field called source. This source should be pointing at theultimate provenance associated
with the given document.
For example, if these documents are representing chunks of some parent document, thesource for both documents should be
the same and reference the parent document.
In general, source should always be specified. Only use a None, if you never intend to use incremental mode, and for some
reason can’t specify the source field correctly.
This should delete the old versions of documents associated withdoggy.txt source and replace them with the new versions.
index(
changed_doggy_docs,
record_manager,
vectorstore,
cleanup="incremental",
source_id_key="source",
)
{'num_added': 0, 'num_updated': 0, 'num_skipped': 2, 'num_deleted': 2}
vectorstore.similarity_search("dog", k=30)
[Document(page_content='tty kitty', metadata={'source': 'kitty.txt'}),
Document(page_content='tty kitty ki', metadata={'source': 'kitty.txt'}),
Document(page_content='kitty kit', metadata={'source': 'kitty.txt'})]
class MyCustomLoader(BaseLoader):
def lazy_load(self):
text_splitter = CharacterTextSplitter(
separator="t", keep_separator=True, chunk_size=12, chunk_overlap=2
)
docs = [
Document(page_content="woof woof", metadata={"source": "doggy.txt"}),
Document(page_content="woof woof woof", metadata={"source": "doggy.txt"}),
]
yield from text_splitter.split_documents(docs)
def load(self):
return list(self.lazy_load())
_clear()
loader = MyCustomLoader()
loader.load()
[Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),
Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'})]
index(loader, record_manager, vectorstore, cleanup="full", source_id_key="source")
{'num_added': 2, 'num_updated': 0, 'num_skipped': 0, 'num_deleted': 0}
vectorstore.similarity_search("dog", k=30)
[Document(page_content='woof woof', metadata={'source': 'doggy.txt'}),
Document(page_content='woof woof woof', metadata={'source': 'doggy.txt'})]
Previous
« Time-weighted vector store retriever
Next
Agents »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Tools as OpenAI
ModulesAgentsToolsFunctions
With OpenAI chat models we can also automatically bind and convert function-like objects withbind_functions
model_with_functions = model.bind_functions(tools)
model_with_functions.invoke([HumanMessage(content="move file foo to bar")])
AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{\n "source_path": "foo",\n "destination_path": "bar"\n}', 'name': 'move_file'}})
Or we can use the update OpenAI API that uses tools and tool_choice instead of functions and function_call by using
ChatOpenAI.bind_tools:
model_with_tools = model.bind_tools(tools)
model_with_tools.invoke([HumanMessage(content="move file foo to bar")])
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_btkY3xV71cEVAOHnNa5qwo44', 'function': {'arguments': '{\n "source_path": "foo",\n "destination_p
Previous
« Defining Custom Tools
Next
Chains »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Get startedInstallation
On this page
Installation
Official release
Pip
Conda
This will install the bare minimum requirements of LangChain. A lot of the value of LangChain comes when integrating it with
various model providers, datastores, etc. By default, the dependencies needed to do that are NOT installed. You will need to
install the dependencies for specific integrations separately.
From source
If you want to install from source, you can do so by cloning the repo and be sure that the directory is
PATH/TO/REPO/langchain/libs/langchain running:
pip install -e .
LangChain community
The langchain-community package contains third-party integrations. It is automatically installed by langchain , but can also be used
separately. Install with:
LangChain core
The langchain-core package contains base abstractions that the rest of the LangChain ecosystem uses, along with the
LangChain Expression Language. It is automatically installed by langchain , but can also be used separately. Install with:
LangChain experimental
The langchain-experimental package holds experimental LangChain code, intended for research and experimental uses. Install
with:
LangServe
LangServe helps developers deploy LangChain runnables and chains as a REST API. LangServe is automatically installed by
LangChain CLI. If not using LangChain CLI, install with:
for both client and server dependencies. Or pip install "langserve[client]" for client code, and pip install "langserve[server]" for server
code.
LangChain CLI
The LangChain CLI is useful for working with LangChain templates and other LangServe projects. Install with:
LangSmith SDK
The LangSmith SDK is automatically installed by LangChain. If not using LangChain, install with:
Previous
« Introduction
Next
Quickstart »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Types of
ModulesI/O Prompts `MessagePromptTemplate`
Types of `MessagePromptTemplate`
LangChain provides different types of MessagePromptTemplate. The most commonly used are AIMessagePromptTemplate,
SystemMessagePromptTemplate and HumanMessagePromptTemplate , which create an AI message, system message and human
message respectively.
However, in cases where the chat model supports taking chat message with arbitrary role, you can use
ChatMessagePromptTemplate, which allows user to specify the role name.
chat_message_prompt = ChatMessagePromptTemplate.from_template(
role="Jedi", template=prompt
)
chat_message_prompt.format(subject="force")
ChatMessage(content='May the force be with you', role='Jedi')
LangChain also provides MessagesPlaceholder, which gives you full control of what messages to be rendered during formatting.
This can be useful when you are uncertain of what role you should be using for your message prompt templates or when you
wish to insert a list of messages during formatting.
chat_prompt = ChatPromptTemplate.from_messages(
[MessagesPlaceholder(variable_name="conversation"), human_message_template]
)
from langchain_core.messages import AIMessage, HumanMessage
2. Start with the basics: Familiarize yourself with the basic programming concepts such as variables, data types and control structures.
3. Practice, practice, practice: The best way to learn programming is through hands-on experience\
"""
)
chat_prompt.format_prompt(
conversation=[human_message, ai_message], word_count="10"
).to_messages()
[HumanMessage(content='What is the best way to learn programming?'),
AIMessage(content='1. Choose a programming language: Decide on a programming language that you want to learn.\n\n2. Start with the basics: Familiarize yoursel
HumanMessage(content='Summarize our conversation so far in 10 words.')]
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesRetrieval
Retrieval
Many LLM applications require user-specific data that is not part of the model's training set. The primary way of accomplishing
this is through Retrieval Augmented Generation (RAG). In this process, external data is retrieved and then passed to the LLM
when doing the generation step.
LangChain provides all the building blocks for RAG applications - from simple to complex. This section of the documentation
covers everything related to the retrieval step - e.g. the fetching of the data. Although this sounds simple, it can be subtly
complex. This encompasses several key modules.
Document loaders
Document loaders load documents from many different sources. LangChain provides over 100 different document loaders
as well as integrations with other major providers in the space, like AirByte and Unstructured. LangChain provides
integrations to load all types of documents (HTML, PDF, code) from all types of locations (private S3 buckets, public
websites).
Text Splitting
A key part of retrieval is fetching only the relevant parts of documents. This involves several transformation steps to prepare
the documents for retrieval. One of the primary ones here is splitting (or chunking) a large document into smaller chunks.
LangChain provides several transformation algorithms for doing this, as well as logic optimized for specific document types
(code, markdown, etc).
Another key part of retrieval is creating embeddings for documents. Embeddings capture the semantic meaning of the text,
allowing you to quickly and efficiently find other pieces of a text that are similar. LangChain provides integrations with over 25
different embedding providers and methods, from open-source to proprietary API, allowing you to choose the one best suited
for your needs. LangChain provides a standard interface, allowing you to easily swap between models.
Vector stores
With the rise of embeddings, there has emerged a need for databases to support efficient storage and searching of these
embeddings. LangChain provides integrations with over 50 different vectorstores, from open-source local ones to cloud-
hosted proprietary ones, allowing you to choose the one best suited for your needs. LangChain exposes a standard interface,
allowing you to easily swap between vector stores.
Retrievers
Once the data is in the database, you still need to retrieve it. LangChain supports many different retrieval algorithms and is
one of the places where we add the most value. LangChain supports basic methods that are easy to get started - namely
simple semantic search. However, we have also added a collection of algorithms on top of this to increase performance.
These include:
Parent Document Retriever: This allows you to create multiple embeddings per parent document, allowing you to look
up smaller chunks but return larger context.
Self Query Retriever: User questions often contain a reference to something that isn't just semantic but rather
expresses some logic that can best be represented as a metadata filter. Self-query allows you to parse out the semantic
part of a query from other metadata filters present in the query.
Ensemble Retriever: Sometimes you may want to retrieve documents from multiple different sources, or using multiple
different algorithms. The ensemble retriever allows you to easily do this.
And more!
Indexing
The LangChain Indexing API syncs your data from any source into a vector store, helping you:
All of which should save you time and money, as well as improve your vector search results.
Previous
« YAML parser
Next
Document loaders »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O PromptsQuick Start
On this page
Quick Start
Prompt templates are predefined recipes for generating prompts for language models.
A template may include instructions, few-shot examples, and specific context and questions appropriate for a given task.
LangChain strives to create model agnostic templates to make it easy to reuse existing templates across different language
models.
Typically, language models expect the prompt to either be a string or else a list of chat messages.
PromptTemplate
prompt_template = PromptTemplate.from_template(
"Tell me a {adjective} joke about {content}."
)
prompt_template.format(adjective="funny", content="chickens")
'Tell me a funny joke about chickens.'
You can create custom prompt templates that format the prompt in any way you want. For more information, seePrompt
Template Composition.
ChatPromptTemplate
Each chat message is associated with content, and an additional parameter calledrole. For example, in the OpenAI Chat
Completions API, a chat message can be associated with an AI assistant, a human or a system role.
chat_template = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful AI bot. Your name is {name}."),
("human", "Hello, how are you doing?"),
("ai", "I'm doing well, thanks!"),
("human", "{user_input}"),
]
)
For example, in addition to using the 2-tuple representation of (type, content) used above, you could pass in an instance of
MessagePromptTemplate or BaseMessage.
chat_template = ChatPromptTemplate.from_messages(
[
SystemMessage(
content=(
"You are a helpful assistant that re-writes the user's text to "
"sound more upbeat."
)
),
HumanMessagePromptTemplate.from_template("{text}"),
]
)
messages = chat_template.format_messages(text="I don't like eating tasty things")
print(messages)
[SystemMessage(content="You are a helpful assistant that re-writes the user's text to sound more upbeat."), HumanMessage(content="I don't like eating tasty things"
This provides you with a lot of flexibility in how you construct your chat prompts.
LCEL
and ChatPromptTemplate implement the Runnable interface, the basic building block of theLangChain Expression
PromptTemplate
Language (LCEL). This means they support invoke , ainvoke, stream, astream, batch, abatch, astream_log calls.
PromptTemplateaccepts a dictionary (of the prompt variables) and returns aStringPromptValue. A ChatPromptTemplate accepts a
dictionary and returns a ChatPromptValue.
Previous
« Prompts
Next
Composition »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Memory in
ModulesMoreMemoryAgent
Memory in Agent
This notebook goes over adding memory to an Agent. Before going through this notebook, please walkthrough the following
notebooks, as this will build on top of both of them:
Memory in LLMChain
Custom Agents
In order to add a memory to an agent we are going to perform the following steps:
For the purposes of this exercise, we are going to create a simple custom Agent that has access to a search tool and utilizes
the ConversationBufferMemory class.
Notice the usage of the chat_history variable in the PromptTemplate, which matches up with the dynamic key name in the
ConversationBufferMemory.
prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!"
{chat_history}
Question: {input}
{agent_scratchpad}"""
prompt = ZeroShotAgent.create_prompt(
tools,
prefix=prefix,
suffix=suffix,
input_variables=["input", "chat_history", "agent_scratchpad"],
)
memory = ConversationBufferMemory(memory_key="chat_history")
We can now construct the LLMChain, with the Memory object, and then create the agent.
'The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations data.'
To test the memory of this agent, we can ask a followup question that relies on information in the previous exchange to be
answered correctly.
We can see that the agent remembered that the previous question was about Canada, and properly asked Google Search
what the name of Canada’s national anthem was.
For fun, let’s compare this to an agent that does NOT have memory.
prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!"
Question: {input}
{agent_scratchpad}"""
prompt = ZeroShotAgent.create_prompt(
tools, prefix=prefix, suffix=suffix, input_variables=["input", "agent_scratchpad"]
)
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)
agent_without_memory = AgentExecutor.from_agent_and_tools(
agent=agent, tools=tools, verbose=True
)
agent_without_memory.run("How many people live in canada?")
'The current population of Canada is 38,566,192 as of Saturday, December 31, 2022, based on Worldometer elaboration of the latest United Nations data.'
agent_without_memory.run("what is their national anthem called?")
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression How Stream custom generator
Language to functions
On this page
The signature of these generators should be Iterator[Input] -> Iterator[Output]. Or for async generators: AsyncIterator[Input] ->
AsyncIterator[Output].
These are useful for: - implementing a custom output parser - modifying the output of a previous step, while preserving
streaming capabilities
Sync version
prompt = ChatPromptTemplate.from_template(
"Write a comma-separated list of 5 animals similar to: {animal}"
)
model = ChatOpenAI(temperature=0.0)
Previous
« Add fallbacks
Next
Inspect your runnables »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Long-Context
ModulesRetrievalRetrieversReorder
Long-Context Reorder
No matter the architecture of your model, there is a substantial performance degradation when you include 10+ retrieved
documents. In brief: When models must access relevant information in the middle of long contexts, they tend to ignore the
provided documents. See: https://arxiv.org/abs/2307.03172
To avoid this issue you can re-order documents after retrieval to avoid performance degradation.
# Get embeddings.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
texts = [
"Basquetball is a great sport.",
"Fly me to the moon is one of my favourite songs.",
"The Celtics are my favourite team.",
"This is a document about the Boston Celtics",
"I simply love going to the movies",
"The Boston Celtics won the game by 20 points",
"This is just a random text.",
"Elden Ring is one of the best games in the last 15 years.",
"L. Kornet is one of the best Celtics players.",
"Larry Bird was an iconic NBA player.",
]
# Create a retriever
retriever = Chroma.from_texts(texts, embedding=embeddings).as_retriever(
search_kwargs={"k": 10}
)
query = "What can you tell me about the Celtics?"
# Override prompts
document_prompt = PromptTemplate(
input_variables=["page_content"], template="{page_content}"
)
document_variable_name = "context"
llm = OpenAI()
stuff_prompt_override = """Given this text extracts:
-----
{context}
-----
Please answer the following question:
{query}"""
prompt = PromptTemplate(
template=stuff_prompt_override, input_variables=["context", "query"]
)
Previous
« Ensemble Retriever
Next
MultiVector Retriever »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression Multiple
Language Cookbookchains
On this page
Multiple chains
Runnables can easily be used to string together multiple Chains
model = ChatOpenAI()
chain2 = (
{"city": chain1, "language": itemgetter("language")}
| prompt2
| model
| StrOutputParser()
)
prompt1 = ChatPromptTemplate.from_template(
"generate a {attribute} color. Return the name of the color and nothing else:"
)
prompt2 = ChatPromptTemplate.from_template(
"what is a fruit of color: {color}. Return the name of the fruit and nothing else:"
)
prompt3 = ChatPromptTemplate.from_template(
"what is a country with a flag that has the color: {color}. Return the name of the country and nothing else:"
)
prompt4 = ChatPromptTemplate.from_template(
"What is the color of {fruit} and the flag of {country}?"
)
color_generator = (
{"attribute": RunnablePassthrough()} | prompt1 | {"color": model_parser}
)
color_to_fruit = prompt2 | model_parser
color_to_country = prompt3 | model_parser
question_generator = (
color_generator | {"fruit": color_to_fruit, "country": color_to_country} | prompt4
)
question_generator.invoke("warm")
ChatPromptValue(messages=[HumanMessage(content='What is the color of strawberry and the flag of China?', additional_kwargs={}, example=False)])
prompt = question_generator.invoke("warm")
model.invoke(prompt)
AIMessage(content='The color of an apple is typically red or green. The flag of China is predominantly red with a large yellow star in the upper left corner and four sm
Branching and Merging
You may want the output of one component to be processed by 2 or more other components.RunnableParallels let you split
or fork the chain so multiple components can process the input in parallel. Later, other components can join or merge the
results to synthesize a final response. This type of chain creates a computation graph that looks like the following:
Input
/\
/ \
Branch1 Branch2
\ /
\/
Combine
planner = (
ChatPromptTemplate.from_template("Generate an argument about: {input}")
| ChatOpenAI()
| StrOutputParser()
| {"base_response": RunnablePassthrough()}
)
arguments_for = (
ChatPromptTemplate.from_template(
"List the pros or positive aspects of {base_response}"
)
| ChatOpenAI()
| StrOutputParser()
)
arguments_against = (
ChatPromptTemplate.from_template(
"List the cons or negative aspects of {base_response}"
)
| ChatOpenAI()
| StrOutputParser()
)
final_responder = (
ChatPromptTemplate.from_messages(
[
("ai", "{original_response}"),
("human", "Pros:\n{results_1}\n\nCons:\n{results_2}"),
("system", "Generate a final response given the critique"),
]
)
| ChatOpenAI()
| StrOutputParser()
)
chain = (
planner
|{
"results_1": arguments_for,
"results_2": arguments_against,
"original_response": itemgetter("base_response"),
}
| final_responder
)
chain.invoke({"input": "scrum"})
'While Scrum has its potential cons and challenges, many organizations have successfully embraced and implemented this project management framework to great e
Previous
« RAG
Next
Querying a SQL DB »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Ensemble
ModulesRetrievalRetrieversRetriever
On this page
Ensemble Retriever
The EnsembleRetriever takes a list of retrievers as input and ensemble the results of theirget_relevant_documents() methods and
rerank the results based on the Reciprocal Rank Fusion algorithm.
By leveraging the strengths of different algorithms, the EnsembleRetriever can achieve better performance than any single
algorithm.
The most common pattern is to combine a sparse retriever (like BM25) with a dense retriever (like embedding similarity),
because their strengths are complementary. It is also known as “hybrid search”. The sparse retriever is good at finding
relevant documents based on keywords, while the dense retriever is good at finding relevant documents based on semantic
similarity.
doc_list_2 = [
"You like apples",
"You like oranges",
]
embedding = OpenAIEmbeddings()
faiss_vectorstore = FAISS.from_texts(
doc_list_2, embedding, metadatas=[{"source": 2}] * len(doc_list_2)
)
faiss_retriever = faiss_vectorstore.as_retriever(search_kwargs={"k": 2})
Runtime Configuration
We can also configure the retrievers at runtime. In order to do this, we need to mark the fields as configurable
Notice that this only returns one source from the FAISS retriever, because we pass in the relevant configuration at run time
Previous
« Contextual compression
Next
Long-Context Reorder »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesAgentsAgent TypesXML Agent
On this page
XML Agent
Some language models (like Anthropic’s Claude) are particularly good at reasoning/writing XML. This goes over how to use
an agent that uses XML when prompting.
Initialize Tools
tools = [TavilySearchResults(max_results=1)]
Create Agent
Run Agent
agent_executor.invoke(
{
"input": "what's my name? Only use a tool if needed, otherwise respond with Final Answer",
# Notice that chat_history is a string, since this prompt is aimed at LLMs, not chat models
"chat_history": "Human: Hi! My name is Bob\nAI: Hello Bob! Nice to meet you",
}
)
Since you already told me your name is Bob, I do not need to use any tools to answer the question "what's my name?". I can provide the final answer directly that you
{'input': "what's my name? Only use a tool if needed, otherwise respond with Final Answer",
'chat_history': 'Human: Hi! My name is Bob\nAI: Hello Bob! Nice to meet you',
'output': 'Your name is Bob.'}
Previous
« OpenAI tools
Next
JSON Chat Agent »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Vector
ModulesRetrievalstores
On this page
Vector stores
INFO
Head to Integrations for documentation on built-in integrations with 3rd-party vector stores.
One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding
vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to
the embedded query. A vector store takes care of storing embedded data and performing vector search for you.
Get started
This walkthrough showcases basic functionality related to vector stores. A key part of working with vector stores is creating
the vector to put in them, which is usually created via embeddings. Therefore, it is recommended that you familiarize yourself
with the text embedding model interfaces before diving into this.
There are many great vector store options, here are a few that are free, open-source, and run entirely on your local machine.
Review all integrations for many great hosted offerings.
Chroma
FAISS
Lance
This walkthrough uses the chroma vector database, which runs on your local machine as a library.
import os
import getpass
# Load the document, split it into chunks, embed each chunk and load it into the vector store.
raw_documents = TextLoader('../../../state_of_the_union.txt').load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
db = Chroma.from_documents(documents, OpenAIEmbeddings())
Similarity search
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans c
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Ju
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice B
It is also possible to do a search for documents similar to a given embedding vector usingsimilarity_search_by_vector which
accepts an embedding vector as a parameter instead of a string.
embedding_vector = OpenAIEmbeddings().embed_query(query)
docs = db.similarity_search_by_vector(embedding_vector)
print(docs[0].page_content)
The query is the same, and so the result is also the same.
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans c
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Ju
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice B
Asynchronous operations
Vector stores are usually run as a separate service that requires some IO operations, and therefore they might be called
asynchronously. That gives performance benefits as you don't waste time waiting for responses from external services. That
might also be important if you work with an asynchronous framework, such as FastAPI.
LangChain supports async operation on vector stores. All the methods might be called using their async counterparts, with
the prefix a, meaning async.
Qdrant is a vector store, which supports all the async operations, thus it will be used in this walkthrough.
Similarity search
query = "What did the president say about Ketanji Brown Jackson"
docs = await db.asimilarity_search(query)
print(docs[0].page_content)
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans c
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Ju
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice B
Maximal marginal relevance optimizes for similarity to query and diversity among selected documents. It is also supported in
async API.
query = "What did the president say about Ketanji Brown Jackson"
found_docs = await qdrant.amax_marginal_relevance_search(query, k=2, fetch_k=10)
for i, doc in enumerate(found_docs):
print(f"{i + 1}.", doc.page_content, "\n")
1. Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans c
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Bre
2. We can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together.
I recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera.
They were responding to a 9-1-1 call when a man shot and killed them with a stolen gun.
Both Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers.
I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every communit
I know what works: Investing in crime prevention and community police officers who’ll walk the beat, who’ll know the neighborhood, and who can restore trust and sa
Previous
« CacheBackedEmbeddings
Next
Retrievers »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Custom
ModulesMoreMemoryMemory
Custom Memory
Although there are a few predefined types of memory in LangChain, it is highly possible you will want to add your own type of
memory that is optimal for your application. This notebook covers how to do that.
For this notebook, we will add a custom memory type toConversationChain. In order to add a custom memory class, we need to
import the base memory class and subclass it.
In this example, we will write a custom memory class that uses spaCy to extract entities and save information about them in
a simple hash table. Then, during the conversation, we will look at the input text, extract any entities, and put any information
about them into the context.
Please note that this implementation is pretty simple and brittle and probably not useful in a production setting. Its
purpose is to showcase that you can add custom memory implementations.
nlp = spacy.load("en_core_web_lg")
class SpacyEntityMemory(BaseMemory, BaseModel):
"""Memory class for storing information about entities."""
def clear(self):
self.entities = {}
@property
def memory_variables(self) -> List[str]:
"""Define the variables we are providing to the prompt."""
return [self.memory_key]
def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
"""Save context from this conversation to buffer."""
# Get the input text and run through spaCy
text = inputs[list(inputs.keys())[0]]
doc = nlp(text)
# For each entity that was mentioned, save this information to the dictionary.
for ent in doc.ents:
ent_str = str(ent)
if ent_str in self.entities:
self.entities[ent_str] += f"\n{text}"
else:
self.entities[ent_str] = text
We now define a prompt that takes in information about entities as well as user input.
template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI doe
Conversation:
Human: {input}
AI:"""
prompt = PromptTemplate(input_variables=["entities", "input"], template=template)
llm = OpenAI(temperature=0)
conversation = ConversationChain(
llm=llm, prompt=prompt, verbose=True, memory=SpacyEntityMemory()
)
In the first example, with no prior knowledge about Harrison, the “Relevant entity information” section is empty.
Conversation:
Human: Harrison likes machine learning
AI:
Now in the second example, we can see that it pulls in information about Harrison.
conversation.predict(
input="What do you think Harrison's favorite subject in college was?"
)
Conversation:
Human: What do you think Harrison's favorite subject in college was?
AI:
' From what I know about Harrison, I believe his favorite subject in college was machine learning. He has expressed a strong interest in the subject and has mentione
Again, please note that this implementation is pretty simple and brittle and probably not useful in a production setting. Its
purpose is to showcase that you can add custom memory implementations.
Previous
« Customizing Conversational Memory
Next
Multiple Memory classes »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression How Configure chain internals at
Language to runtime
On this page
First, a configurable_fields method. This lets you configure particular fields of a runnable.
Second, a configurable_alternatives method. With this method, you can list out alternatives for any particular runnable that can be
set during runtime.
Configuration Fields
With LLMs
model = ChatOpenAI(temperature=0).configurable_fields(
temperature=ConfigurableField(
id="llm_temperature",
name="LLM Temperature",
description="The temperature of the LLM",
)
)
model.invoke("pick a random number")
AIMessage(content='7')
model.with_config(configurable={"llm_temperature": 0.9}).invoke("pick a random number")
AIMessage(content='34')
With HubRunnables
Configurable Alternatives
With LLMs
With Prompts
llm = ChatAnthropic(temperature=0)
prompt = PromptTemplate.from_template(
"Tell me a joke about {topic}"
).configurable_alternatives(
# This gives this field an id
# When configuring the end runnable, we can then use this id to configure this field
ConfigurableField(id="prompt"),
# This sets a default_key.
# If we specify this key, the default LLM (ChatAnthropic initialized above) will be used
default_key="joke",
# This adds a new option, with name `poem`
poem=PromptTemplate.from_template("Write a short poem about {topic}"),
# You can add more configuration options here
)
chain = prompt | llm
# By default it will write a joke
chain.invoke({"topic": "bears"})
AIMessage(content=" Here's a silly joke about bears:\n\nWhat do you call a bear with no teeth?\nA gummy bear!")
# We can configure it write a poem
chain.with_config(configurable={"prompt": "poem"}).invoke({"topic": "bears"})
AIMessage(content=' Here is a short poem about bears:\n\nThe bears awaken from their sleep\nAnd lumber out into the deep\nForests filled with trees so tall\nForag
We can also have multiple things configurable! Here’s an example doing that with both prompts and LLMs.
llm = ChatAnthropic(temperature=0).configurable_alternatives(
# This gives this field an id
# When configuring the end runnable, we can then use this id to configure this field
ConfigurableField(id="llm"),
# This sets a default_key.
# If we specify this key, the default LLM (ChatAnthropic initialized above) will be used
default_key="anthropic",
# This adds a new option, with name `openai` that is equal to `ChatOpenAI()`
openai=ChatOpenAI(),
# This adds a new option, with name `gpt4` that is equal to `ChatOpenAI(model="gpt-4")`
gpt4=ChatOpenAI(model="gpt-4"),
# You can add more configuration options here
)
prompt = PromptTemplate.from_template(
"Tell me a joke about {topic}"
).configurable_alternatives(
# This gives this field an id
# When configuring the end runnable, we can then use this id to configure this field
ConfigurableField(id="prompt"),
# This sets a default_key.
# If we specify this key, the default LLM (ChatAnthropic initialized above) will be used
default_key="joke",
# This adds a new option, with name `poem`
poem=PromptTemplate.from_template("Write a short poem about {topic}"),
# You can add more configuration options here
)
chain = prompt | llm
# We can configure it write a poem with OpenAI
chain.with_config(configurable={"prompt": "poem", "llm": "openai"}).invoke(
{"topic": "bears"}
)
AIMessage(content="In the forest, where tall trees sway,\nA creature roams, both fierce and gray.\nWith mighty paws and piercing eyes,\nThe bear, a symbol of stren
Saving configurations
Previous
« Bind runtime args
Next
Create a runnable with the `@chain` decorator »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Text embedding
ModulesRetrievalmodels CacheBackedEmbeddings
On this page
CacheBackedEmbeddings
sidebar_label: Caching
Caching embeddings can be done using a CacheBackedEmbeddings. The cache backed embedder is a wrapper around an
embedder that caches embeddings in a key-value store. The text is hashed and the hash is used as the key in the cache.
The main supported way to initialized a CacheBackedEmbeddings is from_bytes_store. This takes in the following parameters:
Attention: Be sure to set the namespace parameter to avoid collisions of the same text embedded using different embeddings
models.
First, let’s see an example that uses the local file system for storing embeddings and uses FAISS vector store for retrieval.
underlying_embeddings = OpenAIEmbeddings()
store = LocalFileStore("./cache/")
cached_embedder = CacheBackedEmbeddings.from_bytes_store(
underlying_embeddings, store, namespace=underlying_embeddings.model
)
list(store.yield_keys())
[]
Load the document, split it into chunks, embed each chunk and load it into the vector store.
raw_documents = TextLoader("../../state_of_the_union.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
%%time
db = FAISS.from_documents(documents, cached_embedder)
CPU times: user 218 ms, sys: 29.7 ms, total: 248 ms
Wall time: 1.02 s
If we try to create the vector store again, it’ll be much faster since it does not need to re-compute any embeddings.
%%time
db2 = FAISS.from_documents(documents, cached_embedder)
CPU times: user 15.7 ms, sys: 2.22 ms, total: 18 ms
Wall time: 17.2 ms
list(store.yield_keys())[:5]
['text-embedding-ada-00217a6727d-8916-54eb-b196-ec9c9d6ca472',
'text-embedding-ada-0025fc0d904-bd80-52da-95c9-441015bfb438',
'text-embedding-ada-002e4ad20ef-dfaa-5916-9459-f90c6d8e8159',
'text-embedding-ada-002ed199159-c1cd-5597-9757-f80498e8f17b',
'text-embedding-ada-0021297d37a-2bc1-5e19-bf13-6c950f075062']
store = InMemoryByteStore()
cached_embedder = CacheBackedEmbeddings.from_bytes_store(
underlying_embeddings, store, namespace=underlying_embeddings.model
)
Previous
« Text embedding models
Next
Vector stores »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Get startedSecurity
On this page
Security
LangChain has a large ecosystem of integrations with various external resources like local and remote file systems, APIs and
databases. These integrations allow developers to create versatile applications that combine the power of LLMs with the
ability to access, interact with and manipulate external resources.
Best Practices
When building such applications developers should remember to follow good security practices:
Limit Permissions: Scope permissions specifically to the application's need. Granting broad or excessive permissions
can introduce significant security vulnerabilities. To avoid such vulnerabilities, consider using read-only credentials,
disallowing access to sensitive resources, using sandboxing techniques (such as running inside a container), etc. as
appropriate for your application.
Anticipate Potential Misuse: Just as humans can err, so can Large Language Models (LLMs). Always assume that
any system access or credentials may be used in any way allowed by the permissions they are assigned. For example,
if a pair of database credentials allows deleting data, it’s safest to assume that any LLM able to use those credentials
may in fact delete data.
Defense in Depth: No security technique is perfect. Fine-tuning and good chain design can reduce, but not eliminate,
the odds that a Large Language Model (LLM) may make a mistake. It’s best to combine multiple layered security
approaches rather than relying on any single layer of defense to ensure security. For example: use both read-only
permissions and sandboxing to ensure that LLMs are only able to access data that is explicitly meant for them to use.
A user may ask an agent with access to the file system to delete files that should not be deleted or read the content of
files that contain sensitive information. To mitigate, limit the agent to only use a specific directory and only allow it to
read or write files that are safe to read or write. Consider further sandboxing the agent by running it in a container.
A user may ask an agent with write access to an external API to write malicious data to the API, or delete data from that
API. To mitigate, give the agent read-only API keys, or limit it to only use endpoints that are already resistant to such
misuse.
A user may ask an agent with access to a database to drop a table or mutate the schema. To mitigate, scope the
credentials to only the tables that the agent needs to access and consider issuing READ-ONLY credentials.
If you're building applications that access external resources like file systems, APIs or databases, consider speaking with your
company's security team to determine how to best design and secure your applications.
Reporting a Vulnerability
Please report security vulnerabilities by email to [email protected]. This will ensure the issue is promptly triaged and
acted upon as needed.
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Get log
ModulesI/O Chat Modelsprobabilities
On this page
OpenAI
Install the LangChain x OpenAI package and set your API key
os.environ["OPENAI_API_KEY"] = getpass.getpass()
For the OpenAI API to return log probabilities we need to configure thelogprobs=True param
llm = ChatOpenAI(model="gpt-3.5-turbo-0125").bind(logprobs=True)
msg.response_metadata["logprobs"]["content"][:5]
[{'token': 'As',
'bytes': [65, 115],
'logprob': -1.5358024,
'top_logprobs': []},
{'token': ' an',
'bytes': [32, 97, 110],
'logprob': -0.028062303,
'top_logprobs': []},
{'token': ' AI',
'bytes': [32, 65, 73],
'logprob': -0.009415812,
'top_logprobs': []},
{'token': ',', 'bytes': [44], 'logprob': -0.07371779, 'top_logprobs': []},
{'token': ' I',
'bytes': [32, 73],
'logprob': -4.298773e-05,
'top_logprobs': []}]
ct = 0
full = None
for chunk in llm.stream(("human", "how are you today")):
if ct < 5:
full = chunk if full is None else full + chunk
if "logprobs" in full.response_metadata:
print(full.response_metadata["logprobs"]["content"])
else:
break
ct += 1
[]
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.7523563, 'top_logprobs': []}]
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.7523563, 'top_logprobs': []}, {'token': ' an', 'bytes': [32, 97, 110], 'logprob': -0.019908238, 'top_logprobs': []}]
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.7523563, 'top_logprobs': []}, {'token': ' an', 'bytes': [32, 97, 110], 'logprob': -0.019908238, 'top_logprobs': []}, {'token': ' AI', 'b
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.7523563, 'top_logprobs': []}, {'token': ' an', 'bytes': [32, 97, 110], 'logprob': -0.019908238, 'top_logprobs': []}, {'token': ' AI', 'b
Previous
« Custom Chat Model
Next
Streaming »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression How Create a runnable with the `@chain`
Language to decorator
This will have the benefit of improved observability by tracing your chain correctly. Any calls to runnables inside this function
will be traced as nested childen.
It will also allow you to use this as any other runnable, compose it in chain, etc.
custom_chain.invoke("bears")
'The subject of this joke is bears.'
If you check out your LangSmith traces, you should see acustom_chain trace in there, with the calls to OpenAI nested
underneath
Previous
« Configure chain internals at runtime
Next
Add fallbacks »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Example Selector Select by maximal marginal relevance
ModulesI/O Prompts Types (MMR)
example_prompt = PromptTemplate(
input_variables=["input", "output"],
template="Input: {input}\nOutput: {output}",
)
Input: happy
Output: sad
Input: windy
Output: calm
Input: worried
Output:
# Let's compare this to what we would just get if we went solely off of similarity,
# by using SemanticSimilarityExampleSelector instead of MaxMarginalRelevanceExampleSelector.
example_selector = SemanticSimilarityExampleSelector.from_examples(
# The list of examples available to select from.
examples,
# The embedding class used to produce embeddings which are used to measure semantic similarity.
OpenAIEmbeddings(),
# The VectorStore class that is used to store the embeddings and do a similarity search over.
FAISS,
# The number of examples to produce.
k=2,
)
similar_prompt = FewShotPromptTemplate(
# We provide an ExampleSelector instead of examples.
example_selector=example_selector,
example_prompt=example_prompt,
prefix="Give the antonym of every input",
suffix="Input: {adjective}\nOutput:",
input_variables=["adjective"],
)
print(similar_prompt.format(adjective="worried"))
Give the antonym of every input
Input: happy
Output: sad
Input: sunny
Output: gloomy
Input: worried
Output:
Previous
« Select by length
Next
Select by n-gram overlap »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O LLMsStreaming
Streaming
All LLMs implement the Runnable interface, which comes with default implementations of all methods, ie. ainvoke, batch,
abatch, stream, astream. This gives all LLMs basic support for streaming.
Streaming support defaults to returning an Iterator (or AsyncIterator in the case of async streaming) of a single value, the final
result returned by the underlying LLM provider. This obviously doesn’t give you token-by-token streaming, which requires
native support from the LLM provider, but ensures your code that expects an iterator of tokens can work for any of ourLLM
integrations.
Verse 1:
Bubbles dancing in my glass
Clear and crisp, it's such a blast
Refreshing taste, it's like a dream
Sparkling water, you make me beam
Chorus:
Oh sparkling water, you're my delight
With every sip, you make me feel so right
You're like a party in my mouth
I can't get enough, I'm hooked no doubt
Verse 2:
No sugar, no calories, just pure bliss
You're the perfect drink, I must confess
From lemon to lime, so many flavors to choose
Sparkling water, you never fail to amuse
Chorus:
Oh sparkling water, you're my delight
With every sip, you make me feel so right
You're like a party in my mouth
I can't get enough, I'm hooked no doubt
Bridge:
Some may say you're just plain water
But to me, you're so much more
You bring a sparkle to my day
In every single way
Chorus:
Oh sparkling water, you're my delight
With every sip, you make me feel so right
You're like a party in my mouth
I can't get enough, I'm hooked no doubt
Outro:
So here's to you, my dear sparkling water
You'll always be my go-to drink forever
With your effervescence and refreshing taste
You'll always have a special place.
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Text Split by
ModulesRetrievalSplitters character
Split by character
This is the simplest method. This splits based on characters (by default “”) and measure chunk length by number of
characters.
text_splitter = CharacterTextSplitter(
separator="\n\n",
chunk_size=1000,
chunk_overlap=200,
length_function=len,
is_separator_regex=False,
)
texts = text_splitter.create_documents([state_of_the_union])
print(texts[0])
page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Co
Here’s an example of passing metadata along with the documents, notice that it is split along with the documents.
text_splitter.split_text(state_of_the_union)[0]
'Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A
Previous
« HTMLHeaderTextSplitter
Next
Split code »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Custom
ModulesI/O LLMsLLM
Custom LLM
This notebook goes over how to create a custom LLM wrapper, in case you want to use your own LLM or a different wrapper
than one that is supported in LangChain.
There are only two required things that a custom LLM needs to implement:
A _call method that takes in a string, some optional stop words, and returns a string.
A _llm_type property that returns a string. Used for logging purposes only.
An _identifying_params property that is used to help with printing of this class. Should return a dictionary.
Let’s implement a very simple custom LLM that just returns the first n characters of the input.
@property
def _llm_type(self) -> str:
return "custom"
def _call(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> str:
if stop is not None:
raise ValueError("stop kwargs are not permitted.")
return prompt[: self.n]
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {"n": self.n}
llm = CustomLLM(n=10)
llm.invoke("This is a foobar thing")
'This is a '
We can also print the LLM and see its custom print.
print(llm)
CustomLLM
Params: {'n': 10}
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Custom
ModulesAgentsHow-toagent
On this page
Custom agent
This notebook goes through how to create your own custom agent.
In this example, we will use OpenAI Tool Calling to create this agent.This is generally the most reliable way to create
agents.
We will first create it WITHOUT memory, but we will then show how to add memory in. Memory is needed to enable
conversation.
First, let’s load the language model we’re going to use to control the agent.
Define Tools
Next, let’s define some tools to use. Let’s write a really simple Python function to calculate the length of a word that is passed
in.
Note that here the function docstring that we use is pretty important. Read more about why this is the casehere
@tool
def get_word_length(word: str) -> int:
"""Returns the length of a word."""
return len(word)
get_word_length.invoke("abc")
3
tools = [get_word_length]
Create Prompt
Now let us create the prompt. Because OpenAI Function Calling is finetuned for tool usage, we hardly need any instructions
on how to reason, or how to output format. We will just have two input variables: input and agent_scratchpad. input should be a
string containing the user objective. agent_scratchpad should be a sequence of messages that contains the previous agent tool
invocations and the corresponding tool outputs.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are very powerful assistant, but don't know current events",
),
("user", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
]
)
In this case we’re relying on OpenAI tool calling LLMs, which take tools as a separate argument and have been specifically
trained to know when to invoke those tools.
To pass in our tools to the agent, we just need to format them to theOpenAI tool format and pass them to our model. (By bind -
ing the functions, we’re making sure that they’re passed in each time the model is invoked.)
llm_with_tools = llm.bind_tools(tools)
Putting those pieces together, we can now create the agent. We will import two last utility functions: a component for
formatting intermediate steps (agent action, tool output pairs) to input messages that can be sent to the model, and a
component for converting the output message into an agent action/agent finish.
agent = (
{
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: format_to_openai_tool_messages(
x["intermediate_steps"]
),
}
| prompt
| llm_with_tools
| OpenAIToolsAgentOutputParser()
)
from langchain.agents import AgentExecutor
If we compare this to the base LLM, we can see that the LLM alone struggles
This is great - we have an agent! However, this agent is stateless - it doesn’t remember anything about previous interactions.
This means you can’t ask follow up questions easily. Let’s fix that by adding in memory.
First, let’s add a place for memory in the prompt. We do this by adding a placeholder for messages with the key"chat_history".
Notice that we put this ABOVE the new user input (to follow the conversation flow).
MEMORY_KEY = "chat_history"
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are very powerful assistant, but bad at calculating lengths of words.",
),
MessagesPlaceholder(variable_name=MEMORY_KEY),
("user", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
]
)
chat_history = []
agent = (
{
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: format_to_openai_tool_messages(
x["intermediate_steps"]
),
"chat_history": lambda x: x["chat_history"],
}
| prompt
| llm_with_tools
| OpenAIToolsAgentOutputParser()
)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
When running, we now need to track the inputs and outputs as chat history
Previous
« OpenAI assistants
Next
Streaming »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesMoreMemoryChat Messages
Chat Messages
INFO
Head to Integrations for documentation on built-in memory integrations with 3rd-party databases and tools.
One of the core utility classes underpinning most (if not all) memory modules is theChatMessageHistory class. This is a super
lightweight wrapper that provides convenience methods for saving HumanMessages, AIMessages, and then fetching them
all.
You may want to use this class directly if you are managing memory outside of a chain.
history = ChatMessageHistory()
history.add_user_message("hi!")
history.add_ai_message("whats up?")
history.messages
[HumanMessage(content='hi!', additional_kwargs={}),
AIMessage(content='whats up?', additional_kwargs={})]
Previous
« [Beta] Memory
Next
Memory types »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Memory Conversation Knowledge
ModulesMoreMemorytypes Graph
On this page
We can also get the history as a list of messages (this is useful if you are using this with a chat model).
We can also more modularly get current entities from a new message (will use previous messages as context).
We can also more modularly get knowledge triplets from a new message (will use previous messages as context).
Using in a chain
llm = OpenAI(temperature=0)
from langchain.chains import ConversationChain
from langchain.prompts.prompt import PromptTemplate
template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know. The AI ONLY uses information contained in the "Relevant Information" section and
Relevant Information:
{history}
Conversation:
Human: {input}
AI:"""
prompt = PromptTemplate(input_variables=["history", "input"], template=template)
conversation_with_kg = ConversationChain(
llm=llm, verbose=True, prompt=prompt, memory=ConversationKGMemory(llm=llm)
)
Relevant Information:
Conversation:
Human: Hi, what's up?
AI:
" Hi there! I'm doing great. I'm currently in the process of learning about the world around me. I'm learning about different cultures, languages, and customs. It's really
conversation_with_kg.predict(
input="My name is James and I'm helping Will. He's an engineer."
)
Relevant Information:
Conversation:
Human: My name is James and I'm helping Will. He's an engineer.
AI:
" Hi James, it's nice to meet you. I'm an AI and I understand you're helping Will, the engineer. What kind of engineering does he do?"
conversation_with_kg.predict(input="What do you know about Will?")
Relevant Information:
Conversation:
Human: What do you know about Will?
AI:
Previous
« Entity
Next
Conversation Summary »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesRetrievalRetrieversSelf-querying
On this page
Self-querying
Head to Integrations for documentation on vector stores with built-in support for self-querying.
A self-querying retriever is one that, as the name suggests, has the ability to query itself. Specifically, given any natural
language query, the retriever uses a query-constructing LLM chain to write a structured query and then applies that
structured query to its underlying VectorStore. This allows the retriever to not only use the user-input query for semantic
similarity comparison with the contents of stored documents but to also extract filters from the user query on the metadata of
stored documents and to execute those filters.
Get started
For demonstration purposes we’ll use a Chroma vector store. We’ve created a small demo set of documents that contain
summaries of movies.
Note: The self-query retriever requires you to have lark package installed.
docs = [
Document(
page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
),
Document(
page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
),
Document(
page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
),
Document(
page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
),
Document(
page_content="Toys come alive and have a blast doing so",
metadata={"year": 1995, "genre": "animated"},
),
Document(
page_content="Three men walk into the Zone, three men walk out of the Zone",
metadata={
"year": 1979,
"director": "Andrei Tarkovsky",
"genre": "thriller",
"rating": 9.9,
},
),
]
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())
Now we can instantiate our retriever. To do this we’ll need to provide some information upfront about the metadata fields that
our documents support and a short description of the document contents.
metadata_field_info = [
AttributeInfo(
name="genre",
description="The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
type="string",
),
AttributeInfo(
name="year",
description="The year the movie was released",
type="integer",
),
AttributeInfo(
name="director",
description="The name of the movie director",
type="string",
),
AttributeInfo(
name="rating", description="A 1-10 rating for the movie", type="float"
),
]
document_content_description = "Brief summary of a movie"
llm = ChatOpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
llm,
vectorstore,
document_content_description,
metadata_field_info,
)
Testing it out
Filter k
We can also use the self query retriever to specify k : the number of documents to fetch.
retriever = SelfQueryRetriever.from_llm(
llm,
vectorstore,
document_content_description,
metadata_field_info,
enable_limit=True,
)
To see what’s going on under the hood, and to have more custom control, we can reconstruct our retriever from scratch.
First, we need to create a query-construction chain. This chain will take a user query and generated aStructuredQuery object
which captures the filters specified by the user. We provide some helper functions for creating a prompt and output parser.
These have a number of tunable params that we’ll ignore here for simplicity.
prompt = get_query_constructor_prompt(
document_content_description,
metadata_field_info,
)
output_parser = StructuredQueryOutputParser.from_components()
query_constructor = prompt | llm | output_parser
print(prompt.format(query="dummy question"))
Your goal is to structure the user's query to match the request schema provided below.
```json
{
"query": string \ text string to compare to document contents
"filter": string \ logical condition statement for filtering documents
}
```
The query string should contain only text that is expected to match the contents of documents. Any conditions in the filter should not be mentioned in the query as we
A logical condition statement is composed of one or more comparison and logical operation statements.
Make sure that you only use the comparators and logical operators listed above and no others.
Make sure that filters only refer to attributes that exist in the data source.
Make sure that filters only use the attributed names with its function names if there are functions applied on them.
Make sure that filters only use format `YYYY-MM-DD` when handling date data typed values.
Make sure that filters take into account the descriptions of attributes and only make comparisons that are feasible given the type of data being stored.
Make sure that filters are only used as needed. If there are no filters that should be applied return "NO_FILTER" for the filter value.
User Query:
What are songs by Taylor Swift or Katy Perry about teenage romance under 3 minutes long in the dance pop genre
Structured Request:
```json
{
"query": "teenager love",
"filter": "and(or(eq(\"artist\", \"Taylor Swift\"), eq(\"artist\", \"Katy Perry\")), lt(\"length\", 180), eq(\"genre\", \"pop\"))"
}
```
User Query:
What are songs that were not published on Spotify
Structured Request:
```json
{
"query": "",
"filter": "NO_FILTER"
"filter": "NO_FILTER"
}
```
User Query:
dummy question
Structured Request:
query_constructor.invoke(
{
"query": "What are some sci-fi movies from the 90's directed by Luc Besson about taxi drivers"
}
)
StructuredQuery(query='taxi driver', filter=Operation(operator=<Operator.AND: 'and'>, arguments=[Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='genre
The query constructor is the key element of the self-query retriever. To make a great retrieval system you’ll need to make
sure your query constructor works well. Often this requires adjusting the prompt, the examples in the prompt, the attribute
descriptions, etc. For an example that walks through refining a query constructor on some hotel inventory data, check out this
cookbook.
The next key element is the structured query translator. This is the object responsible for translating the genericStructuredQuery
object into a metadata filter in the syntax of the vector store you’re using. LangChain comes with a number of built-in
translators. To see them all head to the Integrations section.
retriever = SelfQueryRetriever(
query_constructor=query_constructor,
vectorstore=vectorstore,
structured_query_translator=ChromaTranslator(),
)
retriever.invoke(
"What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated"
)
[Document(page_content='Toys come alive and have a blast doing so', metadata={'genre': 'animated', 'year': 1995})]
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Document
ModulesRetrievalloaders Markdown
On this page
Markdown
Markdown is a lightweight markup language for creating formatted text using a plain-text editor.
This covers how to load Markdown documents into a document format that we can use downstream.
Retain Elements
Under the hood, Unstructured creates different "elements" for different chunks of text. By default we combine those together,
but you can easily keep that separation by specifying mode="elements".
Previous
« JSON
Next
PDF »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O PromptsComposition
On this page
Composition
LangChain provides a user friendly interface for composing different parts of prompts together. You can do this with either
string prompts or chat prompts. Constructing prompts this way allows for easy reuse of components.
When working with string prompts, each template is joined together. You can work with either prompts directly or strings (the
first element in the list needs to be a prompt).
prompt.format(topic="sports", language="spanish")
'Tell me a joke about sports, make it funny\n\nand in spanish'
A chat prompt is made up a of a list of messages. Purely for developer experience, we’ve added a convenient way to create
these prompts. In this pipeline, each new element is a new message in the final prompt.
First, let’s initialize the base ChatPromptTemplate with a system message. It doesn’t have to start with a system, but it’s often
good practice
You can then easily create a pipeline combining it with other messagesor message templates. Use a Message when there is
no variables to be formatted, use a MessageTemplate when there are variables to be formatted. You can also use just a string
(note: this will automatically get inferred as a HumanMessagePromptTemplate.)
new_prompt = (
prompt + HumanMessage(content="hi") + AIMessage(content="what?") + "{input}"
)
Under the hood, this creates an instance of the ChatPromptTemplate class, so you can use it just as you did before!
Previous
« Quick Start
Next
Example Selector Types »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
OpenAI
ModulesAgentsAgent Typestools
On this page
OpenAI tools
Newer OpenAI models have been fine-tuned to detect when one or more function(s) should be called and respond with the
inputs that should be passed to the function(s). In an API call, you can describe functions and have the model intelligently
choose to output a JSON object containing arguments to call these functions. The goal of the OpenAI tools APIs is to more
reliably return valid and useful function calls than what can be done using a generic text completion or chat API.
OpenAI termed the capability to invoke a single function as functions, and the capability to invoke one or more functions as
tools.
In the OpenAI Chat API, functions are now considered a legacy options that is deprecated in favor oftools.
If you’re creating agents using OpenAI models, you should be using this OpenAI Tools agent rather than the OpenAI
functions agent.
Using tools allows the model to request that more than one function will be called upon when appropriate.
In some situations, this can help signficantly reduce the time that it takes an agent to achieve its goal.
See
Initialize Tools
For this agent let’s give it the ability to search the web with Tavily.
tools = [TavilySearchResults(max_results=1)]
Create Agent
Run Agent
[{'url': 'https://www.ibm.com/topics/langchain', 'content': 'LangChain is essentially a library of abstractions for Python and Javascript, representing common steps and c
agent_executor.invoke(
{
"input": "what's my name? Don't use tools to look this up unless you NEED to",
"chat_history": [
HumanMessage(content="hi! my name is bob"),
AIMessage(content="Hello Bob! How can I assist you today?"),
],
}
)
Previous
« OpenAI functions
Next
XML Agent »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Document
ModulesRetrievalloaders HTML
On this page
HTML
The HyperText Markup Language or HTML is the standard markup language for documents designed to be
displayed in a web browser.
This covers how to load HTML documents into a document format that we can use downstream.
We can also use BeautifulSoup4 to load HTML documents using the BSHTMLLoader. This will extract the text from the HTML into
page_content, and the page title as title into metadata.
Previous
« File Directory
Next
JSON »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression Prompt +
Language CookbookLLM
On this page
Prompt + LLM
The most common and valuable composition is taking:
Almost any other chains you build will use this building block.
PromptTemplate + LLM
The simplest composition is just combining a prompt and model to create a chain that takes user input, adds it to a prompt,
passes it to a model, and returns the raw model output.
Note, you can mix and match PromptTemplate/ChatPromptTemplates and LLMs/ChatModels as you like here.
Often times we want to attach kwargs that’ll be passed to each model call. Here are a few examples of that:
We can also add in an output parser to easily transform the raw LLM/ChatModel output into a more workable format
Notice that this now returns a string - a much more workable format for downstream tasks
chain.invoke({"foo": "bears"})
"Why don't bears wear shoes?\n\nBecause they have bear feet!"
When you specify the function to return, you may just want to parse that directly
chain = (
prompt
| model.bind(function_call={"name": "joke"}, functions=functions)
| JsonOutputFunctionsParser()
)
chain.invoke({"foo": "bears"})
{'setup': "Why don't bears like fast food?",
'punchline': "Because they can't catch it!"}
from langchain.output_parsers.openai_functions import JsonKeyOutputFunctionsParser
chain = (
prompt
| model.bind(function_call={"name": "joke"}, functions=functions)
| JsonKeyOutputFunctionsParser(key_name="setup")
)
chain.invoke({"foo": "bears"})
"Why don't bears wear shoes?"
Simplifying input
To make invocation even simpler, we can add a RunnableParallel to take care of creating the prompt input dict for us:
map_ = RunnableParallel(foo=RunnablePassthrough())
chain = (
map_
| prompt
| model.bind(function_call={"name": "joke"}, functions=functions)
| JsonKeyOutputFunctionsParser(key_name="setup")
)
chain.invoke("bears")
"Why don't bears wear shoes?"
Since we’re composing our map with another Runnable, we can even use some syntactic sugar and just use a dict:
chain = (
{"foo": RunnablePassthrough()}
| prompt
| model.bind(function_call={"name": "joke"}, functions=functions)
| JsonKeyOutputFunctionsParser(key_name="setup")
)
chain.invoke("bears")
"Why don't bears like fast food?"
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Text Split by
ModulesRetrievalSplitters tokens
On this page
Split by tokens
Language models have a token limit. You should not exceed the token limit. When you split your text into chunks it is
therefore a good idea to count the number of tokens. There are many tokenizers. When you count tokens in your text you
should use the same tokenizer as used in the language model.
tiktoken
We can use it to estimate tokens used. It will probably be more accurate for the OpenAI models.
Last year COVID-19 kept us apart. This year we are finally together again.
Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans.
Note that if we use CharacterTextSplitter.from_tiktoken_encoder, text is only split by CharacterTextSplitter and tiktoken tokenizer is used to
merge splits. It means that split can be larger than chunk size measured by tiktoken tokenizer. We can use
RecursiveCharacterTextSplitter.from_tiktoken_encoder to make sure splits are not larger than chunk size of tokens allowed by the
language model, where each split will be recursively split if it has a larger size.
We can also load a tiktoken splitter directly, which ensure each split is smaller than chunk size.
texts = text_splitter.split_text(state_of_the_union)
print(texts[0])
spaCy
spaCy is an open-source software library for advanced natural language processing, written in the programming
languages Python and Cython.
text_splitter = SpacyTextSplitter(chunk_size=1000)
texts = text_splitter.split_text(state_of_the_union)
print(texts[0])
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman.
My fellow Americans.
And with an unwavering resolve that freedom will always triumph over tyranny.
Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways.
He thought he could roll into Ukraine and the world would roll over.
From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.
SentenceTransformers
The SentenceTransformersTokenTextSplitter is a specialized text splitter for use with the sentence-transformer models. The default
behaviour is to split the text into chunks that fit the token window of the sentence transformer model that you would like to
use.
print(text_chunks[1])
lorem
NLTK
The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and
statistical natural language processing (NLP) for English written in the Python programming language.
Rather than just splitting on “”, we can use NLTK to split based on NLTK tokenizers.
text_splitter = NLTKTextSplitter(chunk_size=1000)
texts = text_splitter.split_text(state_of_the_union)
print(texts[0])
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman.
My fellow Americans.
And with an unwavering resolve that freedom will always triumph over tyranny.
Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways.
He thought he could roll into Ukraine and the world would roll over.
From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.
KoNLPY
KoNLPy: Korean NLP in Python is is a Python package for natural language processing (NLP) of the Korean
language.
Token splitting involves the segmentation of text into smaller, more manageable units called tokens. These tokens are often
words, phrases, symbols, or other meaningful elements crucial for further processing and analysis. In languages like English,
token splitting typically involves separating words by spaces and punctuation marks. The effectiveness of token splitting
largely depends on the tokenizer’s understanding of the language structure, ensuring the generation of meaningful tokens.
Since tokenizers designed for the English language are not equipped to understand the unique semantic structures of other
languages, such as Korean, they cannot be effectively used for Korean language processing.
In case of Korean text, KoNLPY includes at morphological analyzer calledKkma (Korean Knowledge Morpheme Analyzer).
Kkma provides detailed morphological analysis of Korean text. It breaks down sentences into words and words into their
respective morphemes, identifying parts of speech for each token. It can segment a block of text into individual sentences,
which is particularly useful for processing long texts.
Usage Considerations
While Kkma is renowned for its detailed analysis, it is important to note that this precision may impact processing speed. Thus,
Kkma is best suited for applications where analytical depth is prioritized over rapid text processing.
text_splitter = KonlpyTextSplitter()
texts = text_splitter.split_text(korean_document)
# The sentences are split with "\n\n" characters.
print(texts[0])
춘향전 옛날에 남원에 이 도령이라는 벼슬아치 아들이 있었다.
We use Hugging Face tokenizer, the GPT2TokenizerFast to count the text length in tokens.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
# This is a long document we can split up.
with open("../../../state_of_the_union.txt") as f:
state_of_the_union = f.read()
from langchain_text_splitters import CharacterTextSplitter
text_splitter = CharacterTextSplitter.from_huggingface_tokenizer(
tokenizer, chunk_size=100, chunk_overlap=0
)
texts = text_splitter.split_text(state_of_the_union)
print(texts[0])
Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow A
Last year COVID-19 kept us apart. This year we are finally together again.
Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans.
Previous
« Semantic Chunking
Next
Retrieval »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesChains
Chains
Chains refer to sequences of calls - whether to an LLM, a tool, or a data preprocessing step. The primary supported way to
do this is with LCEL.
LCEL is great for constructing your own chains, but it’s also nice to have chains that you can use off-the-shelf. There are two
types of off-the-shelf chains that LangChain supports:
Chains that are built with LCEL. In this case, LangChain offers a higher-level constructor method. However, all that is
being done under the hood is constructing a chain with LCEL.
[Legacy] Chains constructed by subclassing from a legacy Chain class. These chains do not use LCEL under the hood
but are rather standalone classes.
We are working creating methods that create LCEL versions of all chains. We are doing this for a few reasons.
1. Chains constructed in this way are nice because if you want to modify the internals of a chain you can simply modify the
LCEL.
2. These chains natively support streaming, async, and batch out of the box.
This page contains two lists. First, a list of all LCEL chain constructors. Second, a list of all legacy Chains.
LCEL Chains
Chain Constructor
The constructor function for this chain. These are all methods that return LCEL runnables. We also link to the API
documentation.
Function Calling
Other Tools
When to Use
Legacy Chains
Below we report on the legacy chain types that exist. We will maintain support for these until we are able to create a LCEL
alternative. We report on:
Chain
Name of the chain, or name of the constructor method. If constructor method, this will return aChain subclass.
Function Calling
Other Tools
When to Use
Functio
Chain n Other Tools When to Use
Calling
This chain uses an LLM to convert a query into an API request,
Requests
APIChain then executes that request, gets back a response, and then
Wrapper
passes that request to an LLM to respond
Similar to APIChain, this chain is designed to interact with APIs.
OpenAPI
OpenAPIEndpointChain The main difference is this is optimized for ease of use with
Spec
OpenAPI endpoints
This chain can be used to have conversations with a document.
It takes in a question and (optional) previous conversation history.
If there is previous conversation history, it uses an LLM to rewrite
ConversationalRetrievalChain Retriever the conversation into a query to send to a retriever (otherwise it
just uses the newest user input). It then fetches those documents
and passes them (along with the conversation) to an LLM to
respond.
This chain takes a list of documents and formats them all into a
prompt, then passes that prompt to an LLM. It passes ALL
StuffDocumentsChain
documents, so you should make sure it fits within the context
window the LLM you are using.
This chain combines documents by iterative reducing them. It
groups documents into chunks (less than some context length)
groups documents into chunks (less than some context length)
Functio then passes them into an LLM. It then takes the responses and
ReduceDocumentsChain
Chain n Other Tools continues to do this until it can
When to Use
fit everything into one final LLM
Calling call. Useful when you have a lot of documents, you want to have
the LLM run over all of them, and you can do in parallel.
This chain first passes each document through an LLM, then
reduces them using the ReduceDocumentsChain. Useful in the
MapReduceDocumentsChain
same situations as ReduceDocumentsChain, but does an initial
LLM call before trying to reduce the documents.
This chain collapses documents by generating an initial answer
based on the first document and then looping over the remaining
documents to refine its answer. This operates sequentially, so it
RefineDocumentsChain cannot be parallelized. It is useful in similar situatations as
MapReduceDocuments Chain, but for cases where you want to
build up an answer by refining the previous answer (rather than
parallelizing calls).
This calls on LLM on each document, asking it to not only answer
but also produce a score of how confident it is. The answer with
the highest confidence is then returned. This is useful when you
MapRerankDocumentsChain
have a lot of documents, but only want to answer based on a
single document, rather than trying to combine answers (like
Refine and Reduce methods do).
This chain answers, then attempts to refine its answer based on
ConstitutionalChain constitutional principles that are provided. Use this when you want
to enforce that a chain’s answer follows some principles.
LLMChain
This chain converts a natural language question to an
ElasticSearch ElasticSearch query, and then runs it, and then summarizes the
ElasticsearchDatabaseChain
Instance response. This is useful for when you want to ask natural
language questions of an Elastic Search database
This implements FLARE, an advanced retrieval technique. It is
FlareChain
primarily meant as an exploratory advanced retrieval method.
This chain constructs an Arango query from natural language,
Arango
ArangoGraphQAChain executes that query against the graph, and then passes the
Graph
results back to an LLM to respond.
A graph that
This chain constructs an Cypher query from natural language,
works with
GraphCypherQAChain executes that query against the graph, and then passes the
Cypher query
results back to an LLM to respond.
language
This chain constructs a FalkorDB query from natural language,
Falkor
FalkorDBGraphQAChain executes that query against the graph, and then passes the
Database
results back to an LLM to respond.
This chain constructs an HugeGraph query from natural
HugeGraphQAChain HugeGraph language, executes that query against the graph, and then
passes the results back to an LLM to respond.
This chain constructs a Kuzu Graph query from natural language,
KuzuQAChain Kuzu Graph executes that query against the graph, and then passes the
results back to an LLM to respond.
This chain constructs a Nebula Graph query from natural
Nebula
NebulaGraphQAChain language, executes that query against the graph, and then
Graph
passes the results back to an LLM to respond.
This chain constructs an Neptune Graph query from natural
Neptune
NeptuneOpenCypherQAChain language, executes that query against the graph, and then
Graph
passes the results back to an LLM to respond.
Graph that This chain constructs an SparQL query from natural language,
GraphSparqlChain works with executes that query against the graph, and then passes the
SparQL results back to an LLM to respond.
This chain converts a user question to a math problem and then
LLMMath
executes it (using numexpr)
This chain uses a second LLM call to varify its initial answer. Use
LLMCheckerChain this when you to have an extra layer of validation on the initial
LLM call.
This chain creates a summary using a sequence of LLM calls to
make sure it is extra correct. Use this over the normal
LLMSummarizationChecker
summarization chain when you are okay with multiple LLM calls
(eg you care more about accuracy than speed/cost).
Uses OpenAI function calling to answer questions and cite its
create_citation_fuzzy_match_chain ✅
sources.
sources.
create_extraction_chain Functio
✅ Uses OpenAI Function calling to extract information from text.
Chain n Other Tools When to Use
Uses OpenAI function calling to extract information from text into
Calling
create_extraction_chain_pydantic ✅ a Pydantic model. Compared to create_extraction_chain this has a
tighter integration with Pydantic.
OpenAPI
get_openapi_chain ✅ Uses OpenAI function calling to query an OpenAPI.
Spec
Uses OpenAI function calling to do question answering over text
create_qa_with_structure_chain ✅
and respond in a specific format.
create_qa_with_sources_chain ✅ Uses OpenAI function calling to answer questions with citations.
Creates both questions and answers from documents. Can be
QAGenerationChain used to generate question/answer pairs for evaluation of retrieval
projects.
Does question answering over retrieved documents, and cites it
sources. Use this when you want the answer response to have
sources in the text response. Use this over
RetrievalQAWithSourcesChain Retriever
load_qa_with_sources_chain when you want to use a retriever to fetch
the relevant document as part of the chain (rather than pass them
in).
Does question answering over documents you pass in, and cites
it sources. Use this when you want the answer response to have
load_qa_with_sources_chain Retriever sources in the text response. Use this over
RetrievalQAWithSources when you want to pass in the documents
directly (rather than rely on a retriever to get them).
This chain first does a retrieval step to fetch relevant documents,
RetrievalQA Retriever then passes those documents into an LLM to generate a
response.
This chain routes input between multiple prompts. Use this when
MultiPromptChain you have multiple potential prompts you could use to respond and
want to route to just one.
This chain routes input between multiple retrievers. Use this when
MultiRetrievalQAChain Retriever you have multiple potential retrievers you could fetch relevant
documents from and want to route to just one.
EmbeddingRouterChain This chain uses embedding similarity to route incoming queries.
LLMRouterChain This chain uses an LLM to route between potential options.
load_summarize_chain
This chain constructs a URL from user input, gets data at that
LLMRequestsChain URL, and then summarizes the response. Compared to APIChain,
this chain is not focused on a single API spec but is more general
Previous
« Tools as OpenAI Functions
Next
[Beta] Memory »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesAgentsConcepts
On this page
Concepts
The core idea of agents is to use a language model to choose a sequence of actions to take. In chains, a sequence of
actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take
and in which order.
Schema
AgentAction
This is a dataclass that represents the action an agent should take. It has a tool property (which is the name of the tool that
should be invoked) and a tool_input property (the input to that tool)
AgentFinish
This represents the final result from an agent, when it is ready to return to the user. It contains areturn_values key-value
mapping, which contains the final agent output. Usually, this contains an output key containing a string that is the agent's
response.
Intermediate Steps
These represent previous agent actions and corresponding outputs from this CURRENT agent run. These are important to
pass to future iteration so the agent knows what work it has already done. This is typed as a List[Tuple[AgentAction, Any]] . Note
that observation is currently left as type Any to be maximally flexible. In practice, this is often a string.
Agent
This is the chain responsible for deciding what step to take next. This is usually powered by a language model, a prompt, and
an output parser.
Different agents have different prompting styles for reasoning, different ways of encoding inputs, and different ways of
parsing the output. For a full list of built-in agents see agent types. You can also easily build custom agents, should you
need further control.
Agent Inputs
The inputs to an agent are a key-value mapping. There is only one required key:intermediate_steps, which corresponds to
Intermediate Steps as described above.
Generally, the PromptTemplate takes care of transforming these pairs into a format that can best be passed into the LLM.
Agent Outputs
The output is the next action(s) to take or the final response to send to the userAgentAction
( s or AgentFinish ). Concretely, this
can be typed as Union[AgentAction, List[AgentAction], AgentFinish] .
The output parser is responsible for taking the raw LLM output and transforming it into one of these three types.
AgentExecutor
The agent executor is the runtime for an agent. This is what actually calls the agent, executes the actions it chooses, passes
the action outputs back to the agent, and repeats. In pseudocode, this looks roughly like:
next_action = agent.get_action(...)
while next_action != AgentFinish:
observation = run(next_action)
next_action = agent.get_action(..., next_action, observation)
return next_action
While this may seem simple, there are several complexities this runtime handles for you, including:
Tools
Tools are functions that an agent can invoke. The Tool abstraction consists of two components:
1. The input schema for the tool. This tells the LLM what parameters are needed to call the tool. Without this, it will not
know what the correct inputs are. These parameters should be sensibly named and described.
2. The function to run. This is generally just a Python function that is invoked.
Considerations
Without thinking through both, you won't be able to build a working agent. If you don't give the agent access to a correct set
of tools, it will never be able to accomplish the objectives you give it. If you don't describe the tools well, the agent won't know
how to use them properly.
LangChain provides a wide set of built-in tools, but also makes it easy to define your own (including custom descriptions). For
a full list of built-in tools, see the tools integrations section
Toolkits
For many common tasks, an agent will need a set of related tools. For this LangChain provides the concept of toolkits -
groups of around 3-5 tools needed to accomplish specific objectives. For example, the GitHub toolkit has a tool for searching
through GitHub issues, a tool for reading a file, a tool for commenting, etc.
LangChain provides a wide set of toolkits to get started. For a full list of built-in toolkits, see thetoolkits integrations section
Previous
« Quickstart
Next
Agent Types »
Community
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesAgents
On this page
Agents
The core idea of agents is to use a language model to choose a sequence of actions to take. In chains, a sequence of
actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take
and in which order.
Quickstart
For a quick start to working with agents, please check outthis getting started guide. This covers basics like initializing an
agent, creating tools, and adding memory.
Concepts
There are several key concepts to understand when building agents: Agents, AgentExecutor, Tools, Toolkits. For an in depth
explanation, please check out this conceptual guide
Agent Types
There are many different types of agents to use. For a overview of the different types and when to use them, please check
out this section.
Tools
Agents are only as good as the tools they have. For a comprehensive guide on tools, please seethis section.
How To Guides
Agents have a lot of related functionality! Check out comprehensive guides including:
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangGraph
On this page
️LangGraph
⚡ Building language agents as graphs ⚡
Overview
LangGraph is a library for building stateful, multi-actor applications with LLMs, built on top of (and intended to be used with)
LangChain. It extends the LangChain Expression Language with the ability to coordinate multiple chains (or actors) across
multiple steps of computation in a cyclic manner. It is inspired by Pregel and Apache Beam. The current interface exposed is
one inspired by NetworkX.
The main use is for adding cycles to your LLM application. Crucially, this is NOT a DAG framework. If you want to build a
DAG, you should just use LangChain Expression Language.
Cycles are important for agent-like behaviors, where you call an LLM in a loop, asking it what action to take next.
Installation
Quick Start
Here we will go over an example of creating a simple agent that uses chat models and function calling. This agent will
represent all its state as a list of messages.
We will need to install some LangChain packages, as well asTavily to use as an example tool.
We also need to export some environment variables for OpenAI and Tavily API access.
export OPENAI_API_KEY=sk-...
export TAVILY_API_KEY=tvly-...
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY=ls__...
We will first define the tools we want to use. For this simple example, we will use a built-in search tool via Tavily. However, it
is really easy to create your own tools - see documentation here on how to do that.
tools = [TavilySearchResults(max_results=1)]
We can now wrap these tools in a simple LangGraphToolExecutor . This is a simple class that receives ToolInvocation objects,
calls that tool, and returns the output. ToolInvocation is any class with tool and tool_input attributes.
from langgraph.prebuilt import ToolExecutor
tool_executor = ToolExecutor(tools)
Now we need to load the chat model we want to use. Importantly, this should satisfy two criteria:
1. It should work with lists of messages. We will represent all agent state in the form of messages, so it needs to be able
to work well with them.
2. It should work with the OpenAI function calling interface. This means it should either be an OpenAI model or a model
that exposes a similar interface.
Note: these model requirements are not requirements for using LangGraph - they are just requirements for this one example.
After we've done this, we should make sure the model knows that it has these tools available to call. We can do this by
converting the LangChain tools into the format for OpenAI function calling, and then bind them to the model class.
The main type of graph in langgraph is the StatefulGraph. This graph is parameterized by a state object that it passes around to
each node. Each node then returns operations to update that state. These operations can either SET specific attributes on
the state (e.g. overwrite the existing values) or ADD to the existing attribute. Whether to set or add is denoted by annotating
the state object you construct the graph with.
For this example, the state we will track will just be a list of messages. We want each node to just add messages to that list.
Therefore, we will use a TypedDict with one key (messages) and annotate it so that the messages attribute is always added to.
class AgentState(TypedDict):
messages: Annotated[Sequence[BaseMessage], operator.add]
We now need to define a few different nodes in our graph. Inlanggraph, a node can be either a function or a runnable. There
are two main nodes we need for this:
1. The agent: responsible for deciding what (if any) actions to take.
2. A function to invoke tools: if the agent decides to take an action, this node will then execute that action.
We will also need to define some edges. Some of these edges may be conditional. The reason they are conditional is that
based on the output of a node, one of several paths may be taken. The path that is taken is not known until that node is run
(the LLM decides).
a. If the agent said to take an action, then the function to invoke tools should be called
2. Normal Edge: after the tools are invoked, it should always go back to the agent to decide what to do next
Let's define the nodes, as well as a function to decide how what conditional edge to take.
from langgraph.prebuilt import ToolInvocation
import json
from langchain_core.messages import FunctionMessage
Use it!
We can now use it! This now exposes thesame interface as all other LangChain runnables. This runnable accepts a list of
messages.
This may take a little bit - it's making a few calls behind the scenes. In order to start seeing some intermediate results as they
happen, we can use streaming - see below for more information on that.
Streaming
One of the benefits of using LangGraph is that it is easy to stream output as it's produced by each node.
---
---
---
---
You can also access the LLM tokens as they are produced by each node. In this case only the "agent" node produces LLM
tokens. In order for this to work properly, you must be using an LLM that supports streaming as well as have set it when
constructing the LLM (e.g. ChatOpenAI(model="gpt-3.5-turbo-1106", streaming=True) )
When to Use
Langchain Expression Language allows you to easily define chains (DAGs) but does not have a good mechanism for adding
in cycles. langgraph adds that syntax.
How-to Guides
Async
If you are running LangGraph in async workflows, you may want to create the nodes to be async by default. For a
walkthrough on how to do that, see this documentation
Streaming Tokens
Sometimes language models take a while to respond and you may want to stream tokens to end users. For a guide on how
to do this, see this documentation
Persistence
LangGraph comes with built-in persistence, allowing you to save the state of the graph at point and resume from there. For a
walkthrough on how to do that, see this documentation
Human-in-the-loop
LangGraph comes with built-in support for human-in-the-loop workflows. This is useful when you want to have a human
review the current state before proceeding to a particular node. For a walkthrough on how to do that, see this documentation
Agents you create with LangGraph can be complex. In order to make it easier to understand what is happening under the
hood, we've added methods to print out and visualize the graph. This can create both ascii art as well as pngs. For a
walkthrough on how to do that, see this documentation
Examples
This agent executor takes a list of messages as input and outputs a list of messages. All agent state is represented as a list
of messages. This specifically uses OpenAI function calling. This is recommended agent executor for newer chat based
models that support function calling.
Getting Started Notebook: Walks through creating this type of executor from scratch
High Level Entrypoint: Walks through how to use the high level entrypoint for the chat agent executor.
Modifications
We also have a lot of examples highlighting how to slightly modify the base chat agent executor. These all build off the
getting started notebook so it is recommended you start with that first.
AgentExecutor
Getting Started Notebook: Walks through creating this type of executor from scratch
High Level Entrypoint: Walks through how to use the high level entrypoint for the chat agent executor.
Modifications
We also have a lot of examples highlighting how to slightly modify the base chat agent executor. These all build off the
getting started notebook so it is recommended you start with that first.
The following notebooks implement agent architectures prototypical of the "plan-and-execute" style, where an LLM planner
decomposes a user request into a program, an executor executes the program, and an LLM synthesizes a response (and/or
dynamically replans) based on the program outputs.
Plan-and-execute: a simple agent with a planner that generates a multi-step task list, an executor that invokes the
tools in the plan, and a replanner that responds or generates an updated plan. Based on thePlan-and-solve paper by
Wang, et. al.
Reasoning without Observation: planner generates a task list whose observations are saved asvariables. Variables
can be used in subsequent tasks to reduce the need for further re-planning. Based on the ReWOO paper by Xu, et. al.
LLMCompiler: planner generates a DAG of tasks with variable responses. Tasks arestreamed and executed eagerly to
minimize tool execution runtime. Based on the paper by Kim, et. al.
Reflection / Self-Critique
When output quality is a major concern, it's common to incorporate some combination of self-critique or reflection and
external validation to refine your system's outputs. The following examples demonstrate research that implement this type of
design.
Basic Reflection: add a simple "reflect" step in your graph to prompt your system to revise its outputs.
Reflexion: critique missing and superflous aspects of the agent's response to guide subsequent steps. Based on
Reflexion, by Shinn, et. al.
Language Agent Tree Search: execute multiple agents in parallel, using reflection and environmental rewards to drive a
Monte Carlo Tree Search. Based on LATS, by Zhou, et. al.
Multi-agent Examples
Multi-agent collaboration: how to create two agents that work together to accomplish a task
Multi-agent with supervisor: how to orchestrate individual agents by using an LLM as a "supervisor" to distribute work
Hierarchical agent teams: how to orchestrate "teams" of agents as nested graphs that can collaborate to solve a
problem
Web Research
STORM: writing system that generates Wikipedia-style articles on any topic, applying outline generation (planning) +
multi-perspective question-answering for added breadth and reliability. Based on STORM by Shao, et. al.
It can often be tough to evaluation chat bots in multi-turn situations. One way to do this is with simulations.
Chat bot evaluation as multi-agent simulation: how to simulate a dialogue between a "virtual user" and your chat bot
Evaluating over a dataset: benchmark your assistant over a LangSmith dataset, which tasks a simulated customer to
red-team your chat bot.
Multimodal Examples
WebVoyager: vision-enabled web browsing agent that uses Set-of-marks prompting to navigate a web browser and
execute tasks
Chain-of-Table
Chain of Table is a framework that elicits SOTA performance when answering questions over tabular data.This
implementation by Github user CYQIQ uses LangGraph to control the flow.
Documentation
StateGraph
This class is responsible for constructing the graph. It exposes an interface inspired byNetworkX. This graph is
parameterized by a state object that it passes around to each node.
__init__
def __init__(self, schema: Type[Any]) -> None:
When constructing the graph, you need to pass in a schema for a state. Each node then returns operations to update that
state. These operations can either SET specific attributes on the state (e.g. overwrite the existing values) or ADD to the
existing attribute. Whether to set or add is denoted by annotating the state object you construct the graph with.
The recommended way to specify the schema is with a typed dictionary:from typing import TypedDict
You can then annotate the different attributes using from typing imoport Annotated . Currently, the only supported annotation is import
operator; operator.add. This annotation will make it so that any node that returns this attribute ADDS that new result to the
existing value.
class AgentState(TypedDict):
# The input string
input: str
# The outcome of a given call to the agent
# Needs `None` as a valid type, since this is what this will start as
agent_outcome: Union[AgentAction, AgentFinish, None]
# List of actions and corresponding observations
# Here we annotate this with `operator.add` to indicate that operations to
# this state should be ADDED to the existing values (not overwrite it)
intermediate_steps: Annotated[list[tuple[AgentAction, str]], operator.add]
.add_node
def add_node(self, key: str, action: RunnableLike) -> None:
key: A string representing the name of the node. This must be unique.
action : The action to take when this node is called. This should either be a function or a runnable.
.add_edge
def add_edge(self, start_key: str, end_key: str) -> None:
Creates an edge from one node to the next. This means that output of the first node will be passed to the next node. It takes
two arguments.
start_key: A string representing the name of the start node. This key must have already been registered in the graph.
end_key: A string representing the name of the end node. This key must have already been registered in the graph.
.add_conditional_edges
def add_conditional_edges(
self,
start_key: str,
condition: Callable[..., str],
conditional_edge_mapping: Dict[str, str],
) -> None:
This method adds conditional edges. What this means is that only one of the downstream edges will be taken, and which one
that is depends on the results of the start node. This takes three arguments:
start_key: A string representing the name of the start node. This key must have already been registered in the graph.
condition: A function to call to decide what to do next. The input will be the output of the start node. It should return a
string that is present in conditional_edge_mapping and represents the edge to take.
conditional_edge_mapping: A mapping of string to string. The keys should be strings that may be returned bycondition. The
values should be the downstream node to call if that condition is returned.
.set_entry_point
def set_entry_point(self, key: str) -> None:
The entrypoint to the graph. This is the node that is first called. It only takes one argument:
.set_conditional_entry_point
def set_conditional_entry_point(
self,
condition: Callable[..., str],
conditional_edge_mapping: Optional[Dict[str, str]] = None,
) -> None:
This method adds a conditional entry point. What this means is that when the graph is called, it will call thecondition Callable
to decide what node to enter into first.
condition: A function to call to decide what to do next. The input will be the input to the graph. It should return a string
that is present in conditional_edge_mapping and represents the edge to take.
conditional_edge_mapping: A mapping of string to string. The keys should be strings that may be returned bycondition. The
values should be the downstream node to call if that condition is returned.
.set_finish_point
def set_finish_point(self, key: str) -> None:
This is the exit point of the graph. When this node is called, the results will be the final result from the graph. It only has one
argument:
key: The name of the node that, when called, will return the results of calling it as the final output
Note: This does not need to be called if at any point you previously created an edge (conditional or normal) toEND
Graph
from langgraph.graph import Graph
graph = Graph()
This has the same interface as StateGraph with the exception that it doesn't update a state object over time, and rather relies
on passing around the full state from each step. This means that whatever is returned from one node is the input to the next
as is.
END
This is a special node representing the end of the graph. This means that anything passed to this node will be the final output
of the graph. It can be used in two places:
Prebuilt Examples
There are also a few methods we've added to make it easy to use common, prebuilt graphs and components.
ToolExecutor
from langgraph.prebuilt import ToolExecutor
This is a simple helper class to help with calling tools. It is parameterized by a list of tools:
tools = [...]
tool_executor = ToolExecutor(tools)
It then exposes a runnable interface. It can be used to call tools: you can pass in anAgentAction and it will look up the
relevant tool and call it with the appropriate input.
chat_agent_executor.create_function_calling_executor
from langgraph.prebuilt import chat_agent_executor
This is a helper function for creating a graph that works with a chat model that utilizes function calling. Can be created by
passing in a model and a list of tools. The model must be one that supports OpenAI function calling.
tools = [TavilySearchResults(max_results=1)]
model = ChatOpenAI()
chat_agent_executor.create_tool_calling_executor
from langgraph.prebuilt import chat_agent_executor
This is a helper function for creating a graph that works with a chat model that utilizes tool calling. Can be created by passing
in a model and a list of tools. The model must be one that supports OpenAI tool calling.
tools = [TavilySearchResults(max_results=1)]
model = ChatOpenAI()
create_agent_executor
from langgraph.prebuilt import create_agent_executor
This is a helper function for creating a graph that works withLangChain Agents. Can be created by passing in an agent and
a list of tools.
tools = [TavilySearchResults(max_results=1)]
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Token
ModulesMoreCallbackscounting
Token counting
LangChain offers a context manager that allows you to count tokens.
import asyncio
llm = OpenAI(temperature=0)
with get_openai_callback() as cb:
llm("What is the square root of 4?")
total_tokens = cb.total_tokens
assert total_tokens > 0
# You can kick off concurrent runs from within the context manager
with get_openai_callback() as cb:
await asyncio.gather(
*[llm.agenerate(["What is the square root of 4?"]) for _ in range(3)]
)
await task
assert cb.total_tokens == total_tokens
Previous
« Tags
Next
️ LangServe »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O LLMs
On this page
LLMs
Large Language Models (LLMs) are a core component of LangChain. LangChain does not serve its own LLMs, but rather
provides a standard interface for interacting with many different LLMs. To be specific, this interface is one that takes as input
a string and returns a string.
There are lots of LLM providers (OpenAI, Cohere, Hugging Face, etc) - theLLM class is designed to provide a standard
interface for all of them.
Quick Start
Check out this quick start to get an overview of working with LLMs, including all the different methods they expose
Integrations
For a full list of all LLM integrations that LangChain provides, please go to theIntegrations page
How-To Guides
We have several how-to guides for more advanced usage of LLMs. This includes:
Previous
« Tracking token usage
Next
Quick Start »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Document
ModulesRetrievalloaders PDF
On this page
PDF
Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to
present documents, including text formatting and images, in a manner independent of application software,
hardware, and operating systems.
This covers how to load PDF documents into the Document format that we use downstream.
Using PyPDF
Load PDF using pypdf into array of documents, where each document contains the page content and metadata withpage
number.
loader = PyPDFLoader("example_data/layout-parser-paper.pdf")
pages = loader.load_and_split()
pages[0]
Document(page_content='LayoutParser : A Uni\x0ced Toolkit for Deep\nLearning Based Document Image Analysis\nZejiang Shen1( \x00), Ruochen Zhang2, Meli
An advantage of this approach is that documents can be retrieved with page numbers.
import os
import getpass
Extracting images
Using Unstructured
Retain Elements
Under the hood, Unstructured creates different "elements" for different chunks of text. By default we combine those together,
but you can easily keep that separation by specifying mode="elements".
This covers how to load online PDFs into a document format that we can use downstream. This can be used for various
online PDF sites such as https://open.umn.edu/opentextbooks/textbooks/ and https://arxiv.org/archive/
Note: all other PDF loaders can also be used to fetch remote PDFs, butOnlinePDFLoader is a legacy function, and works
specifically with UnstructuredPDFLoader.
Using PyPDFium2
Using PDFMiner
This can be helpful for chunking texts semantically into sections as the output html content can be parsed viaBeautifulSoup to
get more structured and rich information about font size, page numbers, PDF headers/footers, etc.
# if current snippet's font size <= previous section's content => content belongs to the same section (one can also create
# a tree like structure for sub sections if needed but that may require some more thinking and may be data specific)
if not semantic_snippets[cur_idx].metadata['content_font'] or s[1] <= semantic_snippets[cur_idx].metadata['content_font']:
semantic_snippets[cur_idx].page_content += s[0]
semantic_snippets[cur_idx].metadata['content_font'] = max(s[1], semantic_snippets[cur_idx].metadata['content_font'])
continue
# if current snippet's font size > previous section's content but less than previous section's heading than also make a new
# section (e.g. title of a PDF will have the highest font size but we don't want it to subsume all sections)
metadata={'heading':s[0], 'content_font': 0, 'heading_font': s[1]}
metadata.update(data.metadata)
semantic_snippets.append(Document(page_content='',metadata=metadata))
cur_idx += 1
semantic_snippets[4]
Document(page_content='Recently, various DL models and datasets have been developed for layout analysis\ntasks. The dhSegment [22] utilizes fully convolution
Using PyMuPDF
This is the fastest of the PDF parsing options, and contains detailed metadata about the PDF and its pages, as well as
returns one document per page.
Additionally, you can pass along any of the options from thePyMuPDF documentation as keyword arguments in the load call,
and it will be pass along to the get_text() call.
PyPDF Directory
Using PDFPlumber
Like PyMuPDF, the output Documents contain detailed metadata about the PDF and its pages, and returns one document per
page.
Using AmazonTextractPDFParser
The AmazonTextractPDFLoader calls the Amazon Textract Service to convert PDFs into a Document structure. The loader
does pure OCR at the moment, with more features like layout support planned, depending on demand. Single and multi-page
documents are supported with up to 3000 pages and 512 MB of size.
For the call to be successful an AWS account is required, similar to theAWS CLI requirements.
Besides the AWS configuration, it is very similar to the other PDF loaders, while also supporting JPEG, PNG and TIFF and
non-native PDF formats.
Previous
« Markdown
Next
Text Splitters »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
OpenAI
ModulesAgentsAgent Typesassistants
On this page
OpenAI assistants
The Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions
and can leverage models, tools, and knowledge to respond to user queries. The Assistants API currently supports
three types of tools: Code Interpreter, Retrieval, and Function calling
You can interact with OpenAI Assistants using OpenAI tools or custom tools. When using exclusively OpenAI tools, you can
just invoke the assistant directly and get final answers. When using custom tools, you can run the assistant and tool
execution loop using the built-in AgentExecutor or easily write your own executor.
Below we show the different ways to interact with Assistants. As a simple example, let’s build a math tutor that can write and
run code.
Now let’s recreate this functionality using our own tools. For this example we’ll use theE2B sandbox runtime tool.
Using AgentExecutor
The OpenAIAssistantRunnable is compatible with the AgentExecutor, so we can pass it in as an agent directly to the
executor. The AgentExecutor handles calling the invoked tools and uploading the tool outputs back to the Assistants API.
Plus it comes with built-in LangSmith tracing.
Custom execution
Or with LCEL we can easily write our own execution loop for running the assistant. This gives us full control over execution.
agent = OpenAIAssistantRunnable.create_assistant(
name="langchain assistant e2b tool",
instructions="You are a personal math tutor. Write and run code to answer math questions.",
tools=tools,
model="gpt-4-1106-preview",
as_agent=True,
)
from langchain_core.agents import AgentFinish
return response
response = execute_agent(agent, tools, {"content": "What's 10 - 4 raised to the 2.7"})
print(response.return_values["output"])
e2b_data_analysis {'python_code': 'result = 10 - 4 ** 2.7\nprint(result)'} {"stdout": "-32.22425314473263", "stderr": "", "artifacts": []}
To use an existing thread we just need to pass the “thread_id” in when invoking the agent.
next_response = execute_agent(
agent,
tools,
{"content": "now add 17.241", "thread_id": response.return_values["thread_id"]},
)
print(next_response.return_values["output"])
e2b_data_analysis {'python_code': 'result = 10 - 4 ** 2.7 + 17.241\nprint(result)'} {"stdout": "-14.983253144732629", "stderr": "", "artifacts": []}
To use an existing Assistant we can initialize the OpenAIAssistantRunnable directly with an assistant_id .
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesMoreCallbacksTags
Tags
You can add tags to your callbacks by passing atags argument to the call()/run()/apply() methods. This is useful for filtering your
logs, e.g. if you want to log all requests made to a specific LLMChain, you can add a tag, and then filter your logs by that tag.
You can pass tags to both constructor and request callbacks, see the examples above for details. These tags are then
passed to the tags argument of the "start" callback methods, ie. on_llm_start, on_chat_model_start, on_chain_start, on_tool_start.
Previous
« Multiple callback handlers
Next
Token counting »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Get startedQuickstart
On this page
Quickstart
In this quickstart we'll show you how to:
Setup
Jupyter Notebook
This guide (and most of the other guides in the documentation) useJupyter notebooks and assume the reader is as well.
Jupyter notebooks are perfect for learning how to work with LLM systems because often times things can go wrong
(unexpected output, API down, etc) and going through guides in an interactive environment is a great way to better
understand them.
You do not NEED to go through the guide in a Jupyter Notebook, but it is recommended. Seehere for instructions on how to
install.
Installation
Pip
Conda
LangSmith
Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these
applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain
or agent. The best way to do this is with LangSmith.
Note that LangSmith is not needed, but it is helpful. If you do want to use LangSmith, after you sign up at the link above,
make sure to set your environment variables to start logging traces:
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="..."
LangChain enables building application that connect external sources of data and computation to LLMs. In this quickstart, we
will walk through a few different ways of doing that. We will start with a simple LLM chain, which just relies on information in
the prompt template to respond. Next, we will build a retrieval chain, which fetches data from a separate database and
passes that into the prompt template. We will then add in chat history, to create a conversation retrieval chain. This allows
you to interact in a chat manner with this LLM, so it remembers previous questions. Finally, we will build an agent - which
utilizes an LLM to determine whether or not it needs to fetch data to answer questions. We will cover these at a high level,
but there are lot of details to all of these! We will link to relevant docs.
LLM Chain
We'll show how to use models available via API, like OpenAI, and local open source models, using integrations like Ollama.
OpenAI
Local (using Ollama)
Anthropic
Cohere
Accessing the API requires an API key, which you can get by creating an account and headinghere. Once we have a key
we'll want to set it as an environment variable by running:
export OPENAI_API_KEY="..."
llm = ChatOpenAI()
If you'd prefer not to set an environment variable you can pass the key in directly via theopenai_api_key named parameter
when initiating the OpenAI LLM class:
llm = ChatOpenAI(openai_api_key="...")
Once you've installed and initialized the LLM of your choice, we can try using it! Let's ask it what LangSmith is - this is
something that wasn't present in the training data so it shouldn't have a very good response.
We can also guide it's response with a prompt template. Prompt templates are used to convert raw user input to a better
input to the LLM.
We can now invoke it and ask the same question. It still won't know the answer, but it should respond in a more proper tone
for a technical writer!
The output of a ChatModel (and therefore, of this chain) is a message. However, it's often much more convenient to work with
strings. Let's add a simple output parser to convert the chat message to a string.
output_parser = StrOutputParser()
We can now invoke it and ask the same question. The answer will now be a string (rather than a ChatMessage).
chain.invoke({"input": "how can langsmith help with testing?"})
Diving Deeper
We've now successfully set up a basic LLM chain. We only touched on the basics of prompts, models, and output parsers -
for a deeper dive into everything mentioned here, see this section of documentation.
Retrieval Chain
In order to properly answer the original question ("how can langsmith help with testing?"), we need to provide additional
context to the LLM. We can do this via retrieval. Retrieval is useful when you have too much data to pass to the LLM
directly. You can then use a retriever to fetch only the most relevant pieces and pass those in.
In this process, we will look up relevant documents from aRetriever and then pass them into the prompt. A Retriever can be
backed by anything - a SQL table, the internet, etc - but in this instance we will populate a vector store and use that as a
retriever. For more information on vectorstores, see this documentation.
First, we need to load the data that we want to index. In order to do this, we will use the WebBaseLoader. This requires
installing BeautifulSoup:
docs = loader.load()
Next, we need to index it into a vectorstore. This requires a few components, namely anembedding model and a vectorstore.
For embedding models, we once again provide examples for accessing via API or by running local models.
OpenAI (API)
Local (using Ollama)
Cohere (API)
Make sure you have the `langchain_openai` package installed an the appropriate environment variables set (these are the
same as needed for the LLM).
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
Now, we can use this embedding model to ingest documents into a vectorstore. We will use a simple local vectorstore,
FAISS, for simplicity's sake.
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
vector = FAISS.from_documents(documents, embeddings)
Now that we have this data indexed in a vectorstore, we will create a retrieval chain. This chain will take an incoming
question, look up relevant documents, then pass those documents along with the original question into an LLM and ask it to
answer the original question.
First, let's set up the chain that takes a question and the retrieved documents and generates an answer.
from langchain.chains.combine_documents import create_stuff_documents_chain
prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:
<context>
{context}
</context>
Question: {input}""")
document_chain.invoke({
"input": "how can langsmith help with testing?",
"context": [Document(page_content="langsmith can let you visualize test results")]
})
However, we want the documents to first come from the retriever we just set up. That way, for a given question we can use
the retriever to dynamically select the most relevant documents and pass those in.
retriever = vector.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, document_chain)
We can now invoke this chain. This returns a dictionary - the response from the LLM is in theanswer key
Diving Deeper
We've now successfully set up a basic retrieval chain. We only touched on the basics of retrieval - for a deeper dive into
everything mentioned here, see this section of documentation.
The chain we've created so far can only answer single questions. One of the main types of LLM applications that people are
building are chat bots. So how do we turn this chain into one that can answer follow up questions?
We can still use the create_retrieval_chain function, but we need to change two things:
1. The retrieval method should now not just work on the most recent input, but rather should take the whole history into
account.
2. The final LLM chain should likewise take the whole history into account
Updating Retrieval
In order to update retrieval, we will create a new chain. This chain will take in the most recent inputinput
( ) and the
conversation history (chat_history) and use an LLM to generate a search query.
# First we need a prompt that we can pass into an LLM to generate this search query
prompt = ChatPromptTemplate.from_messages([
MessagesPlaceholder(variable_name="chat_history"),
("user", "{input}"),
("user", "Given the above conversation, generate a search query to look up in order to get information relevant to the conversation")
])
retriever_chain = create_history_aware_retriever(llm, retriever, prompt)
We can test this out by passing in an instance where the user is asking a follow up question.
from langchain_core.messages import HumanMessage, AIMessage
You should see that this returns documents about testing in LangSmith. This is because the LLM generated a new query,
combining the chat history with the follow up question.
Now that we have this new retriever, we can create a new chain to continue the conversation with these retrieved documents
in mind.
prompt = ChatPromptTemplate.from_messages([
("system", "Answer the user's questions based on the below context:\n\n{context}"),
MessagesPlaceholder(variable_name="chat_history"),
("user", "{input}"),
])
document_chain = create_stuff_documents_chain(llm, prompt)
We can see that this gives a coherent answer - we've successfully turned our retrieval chain into a chatbot!
Agent
We've so far create examples of chains - where each step is known ahead of time. The final thing we will create is an agent -
where the LLM decides what steps to take.
NOTE: for this example we will only show how to create an agent using OpenAI models, as local models are not
reliable enough yet.
One of the first things to do when building an agent is to decide what tools it should have access to. For this example, we will
give the agent access to two tools:
1. The retriever we just created. This will let it easily answer questions about LangSmith
2. A search tool. This will let it easily answer questions that require up to date information.
retriever_tool = create_retriever_tool(
retriever,
"langsmith_search",
"Search for information about LangSmith. For any questions about LangSmith, you must use this tool!",
)
The search tool that we will use is Tavily. This will require an API key (they have generous free tier). After creating it on their
platform, you need to set it as an environment variable:
export TAVILY_API_KEY=...
If you do not want to set up an API key, you can skip creating this tool.
search = TavilySearchResults()
Now that we have the tools, we can create an agent to use them. We will go over this pretty quickly - for a deeper dive into
what exactly is going on, check out the Agent's Getting Started documentation
Install langchain hub first
We can now invoke the agent and see how it responds! We can ask it questions about LangSmith:
Diving Deeper
We've now successfully set up a basic agent. We only touched on the basics of agents - for a deeper dive into everything
mentioned here, see this section of documentation.
Now that we've built an application, we need to serve it. That's where LangServe comes in. LangServe helps developers
deploy LangChain chains as a REST API. You do not need to use LangServe to use LangChain, but in this guide we'll show
how you can deploy your app with LangServe.
While the first part of this guide was intended to be run in a Jupyter Notebook, we will now move out of that. We will be
creating a Python file and then interacting with it from the command line.
Install with:
Server
To create a server for our application we'll make aserve.py file. This will contain our logic for serving our application. It consists
of three things:
# 1. Load Retriever
loader = WebBaseLoader("https://docs.smith.langchain.com/user_guide")
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
embeddings = OpenAIEmbeddings()
vector = FAISS.from_documents(documents, embeddings)
retriever = vector.as_retriever()
# 2. Create Tools
retriever_tool = create_retriever_tool(
retriever,
"langsmith_search",
"Search for information about LangSmith. For any questions about LangSmith, you must use this tool!",
)
search = TavilySearchResults()
tools = [retriever_tool, search]
# 3. Create Agent
prompt = hub.pull("hwchase17/openai-functions-agent")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
# 4. App definition
app = FastAPI(
title="LangChain Server",
version="1.0",
description="A simple API server using LangChain's Runnable interfaces",
)
class Input(BaseModel):
input: str
chat_history: List[BaseMessage] = Field(
...,
extra={"widget": {"type": "chat", "input": "location"}},
)
class Output(BaseModel):
output: str
add_routes(
app,
agent_executor.with_types(input_type=Input, output_type=Output),
path="/agent",
)
if __name__ == "__main__":
import uvicorn
Playground
Every LangServe service comes with a simple built-in UI for configuring and invoking the application with streaming output
and visibility into intermediate steps. Head to http://localhost:8000/agent/playground/ to try it out! Pass in the same question
as before - "how can langsmith help with testing?" - and it should respond same as before.
Client
Now let's set up a client for programmatically interacting with our service. We can easily do this with the
[langserve.RemoteRunnable](/docs/langserve#client). Using this, we can interact with the served chain as if it were running client-side.
remote_chain = RemoteRunnable("http://localhost:8000/agent/")
remote_chain.invoke({
"input": "how can langsmith help with testing?",
"chat_history": [] # Providing an empty list as this is the first call
})
Next steps
We've touched on how to build an application with LangChain, how to trace it with LangSmith, and how to serve it with
LangServe. There are a lot more features in all three of these than we can cover here. To continue on your journey, we
recommend you read the following (in order):
All of these features are backed by LangChain Expression Language (LCEL) - a way to chain these components
together. Check out that documentation to better understand how to create custom chains.
Model IO covers more details of prompts, LLMs, and output parsers.
Retrieval covers more details of everything related to retrieval
Agents covers details of everything related to agents
Explore common end-to-end use cases and template applications
Read up on LangSmith, the platform for debugging, testing, monitoring and more
Learn more about serving your applications with LangServe
Previous
« Installation
Next
Security »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression
Language
Streaming support When you build your chains with LCEL you get the best possible time-to-first-token (time elapsed until
the first chunk of output comes out). For some chains this means eg. we stream tokens straight from an LLM to a streaming
output parser, and you get back parsed, incremental chunks of output at the same rate as the LLM provider outputs the raw
tokens.
Async support Any chain built with LCEL can be called both with the synchronous API (eg. in your Jupyter notebook while
prototyping) as well as with the asynchronous API (eg. in a LangServe server). This enables using the same code for
prototypes and in production, with great performance, and the ability to handle many concurrent requests in the same server.
Optimized parallel execution Whenever your LCEL chains have steps that can be executed in parallel (eg if you fetch
documents from multiple retrievers) we automatically do it, both in the sync and the async interfaces, for the smallest
possible latency.
Retries and fallbacks Configure retries and fallbacks for any part of your LCEL chain. This is a great way to make your
chains more reliable at scale. We’re currently working on adding streaming support for retries/fallbacks, so you can get the
added reliability without any latency cost.
Access intermediate results For more complex chains it’s often very useful to access the results of intermediate steps even
before the final output is produced. This can be used to let end-users know something is happening, or even just to debug
your chain. You can stream intermediate results, and it’s available on every LangServe server.
Input and output schemas Input and output schemas give every LCEL chain Pydantic and JSONSchema schemas inferred
from the structure of your chain. This can be used for validation of inputs and outputs, and is an integral part of LangServe.
Seamless LangSmith tracing integration As your chains get more and more complex, it becomes increasingly important to
understand what exactly is happening at every step. With LCEL, all steps are automatically logged to LangSmith for
maximum observability and debuggability.
Seamless LangServe deployment integration Any chain created with LCEL can be easily deployed usingLangServe.
Previous
« Security
Next
Get started »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression
Language Interface
On this page
Interface
To make it as easy as possible to create custom chains, we’ve implemented a“Runnable” protocol. The Runnable protocol is
implemented for most components. This is a standard interface, which makes it easy to define custom chains as well as
invoke them in a standard way. The standard interface includes:
All runnables expose input and output schemas to inspect the inputs and outputs: - input_schema: an input Pydantic model
auto-generated from the structure of the Runnable - output_schema: an output Pydantic model auto-generated from the structure
of the Runnable
Let’s take a look at these methods. To do so, we’ll create a super simple PromptTemplate + ChatModel chain.
model = ChatOpenAI()
prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
chain = prompt | model
Input Schema
A description of the inputs accepted by a Runnable. This is a Pydantic model dynamically generated from the structure of any
Runnable. You can call .schema() on it to obtain a JSONSchema representation.
# The input schema of the chain is the input schema of its first part, the prompt.
chain.input_schema.schema()
{'title': 'PromptInput',
'type': 'object',
'properties': {'topic': {'title': 'Topic', 'type': 'string'}}}
prompt.input_schema.schema()
{'title': 'PromptInput',
'type': 'object',
'properties': {'topic': {'title': 'Topic', 'type': 'string'}}}
model.input_schema.schema()
{'title': 'ChatOpenAIInput',
'anyOf': [{'type': 'string'},
{'$ref': '#/definitions/StringPromptValue'},
{'$ref': '#/definitions/ChatPromptValueConcrete'},
{'type': 'array',
'items': {'anyOf': [{'$ref': '#/definitions/AIMessage'},
{'$ref': '#/definitions/HumanMessage'},
{'$ref': '#/definitions/ChatMessage'},
{'$ref': '#/definitions/SystemMessage'},
{'$ref': '#/definitions/FunctionMessage'},
{'$ref': '#/definitions/ToolMessage'}]}}],
'definitions': {'StringPromptValue': {'title': 'StringPromptValue',
'description': 'String prompt value.',
'type': 'object',
'properties': {'text': {'title': 'Text', 'type': 'string'},
'type': {'title': 'Type',
'default': 'StringPromptValue',
'enum': ['StringPromptValue'],
'type': 'string'}},
'required': ['text']},
'AIMessage': {'title': 'AIMessage',
'description': 'A Message from an AI.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'ai',
'enum': ['ai'],
'type': 'string'},
'example': {'title': 'Example', 'default': False, 'type': 'boolean'}},
'required': ['content']},
'HumanMessage': {'title': 'HumanMessage',
'description': 'A Message from a human.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'human',
'enum': ['human'],
'type': 'string'},
'example': {'title': 'Example', 'default': False, 'type': 'boolean'}},
'required': ['content']},
'ChatMessage': {'title': 'ChatMessage',
'description': 'A Message that can be assigned an arbitrary speaker (i.e. role).',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'chat',
'enum': ['chat'],
'type': 'string'},
'role': {'title': 'Role', 'type': 'string'}},
'required': ['content', 'role']},
'SystemMessage': {'title': 'SystemMessage',
'description': 'A Message for priming AI behavior, usually passed in as the first of a sequence\nof input messages.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'system',
'enum': ['system'],
'type': 'string'}},
'type': 'string'}},
'required': ['content']},
'FunctionMessage': {'title': 'FunctionMessage',
'description': 'A Message for passing the result of executing a function back to a model.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'function',
'enum': ['function'],
'type': 'string'},
'name': {'title': 'Name', 'type': 'string'}},
'required': ['content', 'name']},
'ToolMessage': {'title': 'ToolMessage',
'description': 'A Message for passing the result of executing a tool back to a model.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'tool',
'enum': ['tool'],
'type': 'string'},
'tool_call_id': {'title': 'Tool Call Id', 'type': 'string'}},
'required': ['content', 'tool_call_id']},
'ChatPromptValueConcrete': {'title': 'ChatPromptValueConcrete',
'description': 'Chat prompt value which explicitly lists out the message types it accepts.\nFor use in external schemas.',
'type': 'object',
'properties': {'messages': {'title': 'Messages',
'type': 'array',
'items': {'anyOf': [{'$ref': '#/definitions/AIMessage'},
{'$ref': '#/definitions/HumanMessage'},
{'$ref': '#/definitions/ChatMessage'},
{'$ref': '#/definitions/SystemMessage'},
{'$ref': '#/definitions/FunctionMessage'},
{'$ref': '#/definitions/ToolMessage'}]}},
'type': {'title': 'Type',
'default': 'ChatPromptValueConcrete',
'enum': ['ChatPromptValueConcrete'],
'type': 'string'}},
'required': ['messages']}}}
Output Schema
A description of the outputs produced by a Runnable. This is a Pydantic model dynamically generated from the structure of
any Runnable. You can call .schema() on it to obtain a JSONSchema representation.
# The output schema of the chain is the output schema of its last part, in this case a ChatModel, which outputs a ChatMessage
chain.output_schema.schema()
{'title': 'ChatOpenAIOutput',
'anyOf': [{'$ref': '#/definitions/AIMessage'},
{'$ref': '#/definitions/HumanMessage'},
{'$ref': '#/definitions/ChatMessage'},
{'$ref': '#/definitions/SystemMessage'},
{'$ref': '#/definitions/FunctionMessage'},
{'$ref': '#/definitions/ToolMessage'}],
'definitions': {'AIMessage': {'title': 'AIMessage',
'description': 'A Message from an AI.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'ai',
'enum': ['ai'],
'type': 'string'},
'example': {'title': 'Example', 'default': False, 'type': 'boolean'}},
'required': ['content']},
'HumanMessage': {'title': 'HumanMessage',
'description': 'A Message from a human.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'human',
'enum': ['human'],
'type': 'string'},
'example': {'title': 'Example', 'default': False, 'type': 'boolean'}},
'required': ['content']},
'ChatMessage': {'title': 'ChatMessage',
'description': 'A Message that can be assigned an arbitrary speaker (i.e. role).',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'chat',
'enum': ['chat'],
'type': 'string'},
'role': {'title': 'Role', 'type': 'string'}},
'required': ['content', 'role']},
'SystemMessage': {'title': 'SystemMessage',
'description': 'A Message for priming AI behavior, usually passed in as the first of a sequence\nof input messages.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'system',
'enum': ['system'],
'type': 'string'}},
'required': ['content']},
'FunctionMessage': {'title': 'FunctionMessage',
'description': 'A Message for passing the result of executing a function back to a model.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'function',
'enum': ['function'],
'type': 'string'},
'name': {'title': 'Name', 'type': 'string'}},
'required': ['content', 'name']},
'ToolMessage': {'title': 'ToolMessage',
'description': 'A Message for passing the result of executing a tool back to a model.',
'type': 'object',
'properties': {'content': {'title': 'Content',
'anyOf': [{'type': 'string'},
{'type': 'array',
'items': {'anyOf': [{'type': 'string'}, {'type': 'object'}]}}]},
'additional_kwargs': {'title': 'Additional Kwargs', 'type': 'object'},
'type': {'title': 'Type',
'default': 'tool',
'enum': ['tool'],
'type': 'string'},
'tool_call_id': {'title': 'Tool Call Id', 'type': 'string'}},
'required': ['content', 'tool_call_id']}}}
Stream
Invoke
chain.invoke({"topic": "bears"})
AIMessage(content="Why don't bears wear shoes? \n\nBecause they have bear feet!")
Batch
You can set the number of concurrent requests by using themax_concurrency parameter
Async Stream
Async Invoke
Async Batch
Event Streaming is a beta API, and may change a bit based on feedback.
For now, when using the astream_events API, for everything to work properly please:
Event Reference
Here is a reference table that shows some events that might be emitted by the various Runnable objects. Definitions for
some of the Runnable are included after the table.
⚠️ When streaming the inputs for the runnable will not be available until the input stream has been entirely consumed This
means that the inputs will be available at for the corresponding end hook rather than start event.
event name chunk input output
{“messages”:
on_chat_model_start [model name] [[SystemMessage,
HumanMessage]]}
on_chat_model_stream [model name] AIMessageChunk(content=“hello”)
{“messages”:
{“generations”: […],
on_chat_model_end [model name] [[SystemMessage,
“llm_output”: None, …}
HumanMessage]]}
on_llm_start [model name] {‘input’: ‘hello’}
on_llm_stream [model name] ‘Hello’
on_llm_end [model name] ‘Hello human!’
on_chain_start format_docs
on_chain_stream format_docs “hello world!, goodbye world!”
“hello world!, goodbye
on_chain_end format_docs [Document(…)]
world!”
on_tool_start some_tool {“x”: 1, “y”: “2”}
on_tool_stream some_tool {“x”: 1, “y”: “2”}
on_tool_end some_tool {“x”: 1, “y”: “2”}
on_retriever_start [retriever name] {“query”: “hello”}
on_retriever_chunk [retriever name] {documents: […]}
on_retriever_end [retriever name] {“query”: “hello”} {documents: […]}
on_prompt_start [template_name] {“question”: “hello”}
ChatPromptValue(messages:
on_prompt_end [template_name] {“question”: “hello”}
[SystemMessage, …])
format_docs:
format_docs = RunnableLambda(format_docs)
some_tool:
@tool
def some_tool(x: int, y: str) -> dict:
'''Some_tool.'''
return {"x": x, "y": y}
prompt:
template = ChatPromptTemplate.from_messages(
[("system", "You are Cat Agent 007"), ("human", "{question}")]
).with_config({"run_name": "my_template", "tags": ["my_template"]})
Let’s define a new chain to make it more interesting to show off theastream_events interface (and later the astream_log interface).
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
vectorstore = FAISS.from_texts(
["harrison worked at kensho"], embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()
retrieval_chain = (
{
"context": retriever.with_config(run_name="Docs"),
"question": RunnablePassthrough(),
}
| prompt
| model.with_config(run_name="my_llm")
| StrOutputParser()
)
Now let’s use astream_events to get events from the retriever and the LLM.
--
Retrieved the following documents:
[Document(page_content='harrison worked at kensho')]
Streaming LLM:
|H|arrison| worked| at| Kens|ho|.||
Done streaming LLM.
All runnables also have a method .astream_log() which is used to stream (as they happen) all or part of the intermediate steps
of your chain/sequence.
This is useful to show progress to the user, to use intermediate results, or to debug your chain.
You can stream all steps (default) or include/exclude steps by name, tags or metadata.
This method yields JSONPatch ops that when applied in the same order as received build up the RunState.
class LogEntry(TypedDict):
id: str
"""ID of the sub-run."""
name: str
"""Name of the object being run."""
type: str
"""Type of the object being run, eg. prompt, chain, llm, etc."""
tags: List[str]
"""List of tags for the run."""
metadata: Dict[str, Any]
"""Key-value pairs of metadata for the run."""
start_time: str
"""ISO-8601 timestamp of when the run started."""
streamed_output_str: List[str]
"""List of LLM tokens streamed by this run, if applicable."""
final_output: Optional[Any]
"""Final output of this run.
Only available after the run has finished successfully."""
end_time: Optional[str]
"""ISO-8601 timestamp of when the run ended.
Only available after the run has finished."""
class RunState(TypedDict):
id: str
"""ID of the run."""
streamed_output: List[Any]
"""List of output chunks streamed by Runnable.stream()"""
final_output: Optional[Any]
"""Final output of the run, usually the result of aggregating (`+`) streamed_output.
Only available after the run has finished successfully."""
This is useful eg. to stream the JSONPatch in an HTTP server, and then apply the ops on the client to rebuild the run state
there. See LangServe for tooling to make it easier to build a webserver from any Runnable.
You can simply pass diff=False to get incremental values of RunState. You get more verbose output with more repetitive parts.
Parallelism
Let’s take a look at how LangChain Expression Language supports parallel requests. For example, when using a
RunnableParallel (often written as a dictionary) it executes each element in parallel.
Parallelism on batches
Parallelism can be combined with other runnables. Let’s try to use parallelism with batches.
%%time
chain1.batch([{"topic": "bears"}, {"topic": "cats"}])
CPU times: user 17.3 ms, sys: 4.84 ms, total: 22.2 ms
Wall time: 628 ms
[AIMessage(content="Why don't bears wear shoes?\n\nBecause they have bear feet!"),
AIMessage(content="Why don't cats play poker in the wild?\n\nToo many cheetahs!")]
%%time
chain2.batch([{"topic": "bears"}, {"topic": "cats"}])
CPU times: user 15.8 ms, sys: 3.83 ms, total: 19.7 ms
Wall time: 718 ms
[AIMessage(content='In the wild, bears roam,\nMajestic guardians of ancient home.'),
AIMessage(content='Whiskers grace, eyes gleam,\nCats dance through the moonbeam.')]
%%time
combined.batch([{"topic": "bears"}, {"topic": "cats"}])
CPU times: user 44.8 ms, sys: 3.17 ms, total: 48 ms
Wall time: 721 ms
[{'joke': AIMessage(content="Sure, here's a bear joke for you:\n\nWhy don't bears wear shoes?\n\nBecause they have bear feet!"),
'poem': AIMessage(content="Majestic bears roam,\nNature's strength, beauty shown.")},
{'joke': AIMessage(content="Why don't cats play poker in the wild?\n\nToo many cheetahs!"),
'poem': AIMessage(content="Whiskers dance, eyes aglow,\nCats embrace the night's gentle flow.")}]
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Memory Backed by a Vector
ModulesMoreMemorytypes Store
On this page
This differs from most of the other Memory classes in that it doesn't explicitly track the order of interactions.
In this case, the "docs" are previous conversation snippets. This can be useful to refer to relevant pieces of information that
the AI was told earlier in the conversation.
Depending on the store you choose, this step may look different. Consult the relevant vector store documentation for more
details.
import faiss
# In actual usage, you would set `k` to be a higher value, but we use k=1 to show that
# the vector lookup still returns the semantically relevant information
retriever = vectorstore.as_retriever(search_kwargs=dict(k=1))
memory = VectorStoreRetrieverMemory(retriever=retriever)
# When added to an agent, the memory object can save pertinent information from conversations or used tools
memory.save_context({"input": "My favorite food is pizza"}, {"output": "that's good to know"})
memory.save_context({"input": "My favorite sport is soccer"}, {"output": "..."})
memory.save_context({"input": "I don't the Celtics"}, {"output": "ok"}) #
print(memory.load_memory_variables({"prompt": "what sport should i watch?"})["history"])
input: My favorite sport is soccer
output: ...
Using in a chain
Let's walk through an example, again setting verbose=True so we can see the prompt.
llm = OpenAI(temperature=0) # Can be any valid LLM
_DEFAULT_TEMPLATE = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its cont
Current conversation:
Human: {input}
AI:"""
PROMPT = PromptTemplate(
input_variables=["history", "input"], template=_DEFAULT_TEMPLATE
)
conversation_with_summary = ConversationChain(
llm=llm,
prompt=PROMPT,
memory=memory,
verbose=True
)
conversation_with_summary.predict(input="Hi, my name is Perry, what's up?")
Current conversation:
Human: Hi, my name is Perry, what's up?
AI:
Current conversation:
Human: what's my favorite sport?
AI:
# Even though the language model is stateless, since relevant memory is fetched, it can "reason" about the time.
# Timestamping memories and data is useful in general to let the agent determine temporal relevance
conversation_with_summary.predict(input="Whats my favorite food")
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know
Current conversation:
Human: Whats my favorite food
AI:
Current conversation:
Human: What's my name?
AI:
Previous
« Conversation Token Buffer
Next
[Beta] Memory »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression How RunnableBranch: Dynamically route logic based on
Language to input
On this page
Routing allows you to create non-deterministic chains where the output of a previous step defines the next step. Routing
helps provide structure and consistency around interactions with LLMs.
We’ll illustrate both methods using a two step sequence where the first step classifies an input question as being about
LangChain, Anthropic, or Other, then routes to a corresponding prompt chain.
Example Setup
First, let’s create a chain that will identify incoming questions as being aboutLangChain, Anthropic, or Other:
chain = (
PromptTemplate.from_template(
"""Given the user question below, classify it as either being about `LangChain`, `Anthropic`, or `Other`.
<question>
{question}
</question>
Classification:"""
)
| ChatAnthropic()
| StrOutputParser()
)
Question: {question}
Answer:"""
)
| ChatAnthropic()
)
anthropic_chain = (
PromptTemplate.from_template(
"""You are an expert in anthropic. \
Always answer questions starting with "As Dario Amodei told me". \
Respond to the following question:
Question: {question}
Answer:"""
)
| ChatAnthropic()
)
general_chain = (
PromptTemplate.from_template(
"""Respond to the following question:
Question: {question}
Answer:"""
)
| ChatAnthropic()
)
You can also use a custom function to route between different outputs. Here’s an example:
def route(info):
if "anthropic" in info["topic"].lower():
return anthropic_chain
elif "langchain" in info["topic"].lower():
return langchain_chain
else:
return general_chain
from langchain_core.runnables import RunnableLambda
Using a RunnableBranch
A RunnableBranch is a special type of runnable that allows you to define a set of conditions and runnables to execute based on
the input. It does not offer anything that you can’t achieve in a custom function as described above, so we recommend using
a custom function instead.
A RunnableBranch is initialized with a list of (condition, runnable) pairs and a default runnable. It selects which branch by
passing each condition the input it’s invoked with. It selects the first condition to evaluate to True, and runs the corresponding
runnable to that condition with the input.
branch = RunnableBranch(
(lambda x: "anthropic" in x["topic"].lower(), anthropic_chain),
(lambda x: "langchain" in x["topic"].lower(), langchain_chain),
general_chain,
)
full_chain = {"topic": chain, "question": lambda x: x["question"]} | branch
full_chain.invoke({"question": "how do I use Anthropic?"})
AIMessage(content=" As Dario Amodei told me, here are some ways to use Anthropic:\n\n- Sign up for an account on Anthropic's website to access tools like Claude
Previous
« RunnableLambda: Run Custom Functions
Next
Bind runtime args »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Parent Document
ModulesRetrievalRetrieversRetriever
On this page
1. You may want to have small documents, so that their embeddings can most accurately reflect their meaning. If too
long, then the embeddings can lose meaning.
2. You want to have long enough documents that the context of each chunk is retained.
The ParentDocumentRetriever strikes that balance by splitting and storing small chunks of data. During retrieval, it first fetches the
small chunks but then looks up the parent ids for those chunks and returns those larger documents.
Note that “parent document” refers to the document that a small chunk originated from. This can either be the whole raw
document OR a larger chunk.
In this mode, we want to retrieve the full documents. Therefore, we only specify a child splitter.
list(store.yield_keys())
['cfdf4af7-51f2-4ea3-8166-5be208efa040',
'bf213c21-cc66-4208-8a72-733d030187e6']
Let’s now call the vector store search functionality - we should see that it returns small chunks (since we’re storing the small
chunks).
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
Let’s now retrieve from the overall retriever. This should return large documents - since it returns the documents where the
smaller chunks are located.
Sometimes, the full documents can be too big to want to retrieve them as is. In that case, what we really want to do is to first
split the raw documents into larger chunks, and then split it into smaller chunks. We then index the smaller chunks, but on
retrieval we retrieve the larger chunks (but still not the full documents).
We can see that there are much more than two documents now - these are the larger chunks.
len(list(store.yield_keys()))
66
Let’s make sure the underlying vector store still retrieves the small chunks.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Bre
A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since
And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.
We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.
We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.
We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.
We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.
Previous
« MultiVector Retriever
Next
Self-querying »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesRetrievalRetrieversMultiQueryRetriever
On this page
MultiQueryRetriever
Distance-based vector database retrieval embeds (represents) queries in high-dimensional space and finds similar embedded
documents based on “distance”. But, retrieval may produce different results with subtle changes in query wording or if the
embeddings do not capture the semantics of the data well. Prompt engineering / tuning is sometimes done to manually
address these problems, but can be tedious.
The MultiQueryRetriever automates the process of prompt tuning by using an LLM to generate multiple queries from different
perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union
across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same
question, the MultiQueryRetriever might be able to overcome some of the limitations of the distance-based retrieval and get a
richer set of results.
# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)
# VectorDB
embedding = OpenAIEmbeddings()
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)
Simple usage
Specify the LLM to use for query generation, and the retriever will do the rest.
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
unique_docs = retriever_from_llm.get_relevant_documents(query=question)
len(unique_docs)
INFO:langchain.retrievers.multi_query:Generated queries: ['1. How can Task Decomposition be approached?', '2. What are the different methods for Task Decompos
You can also supply a prompt along with an output parser to split the results into a list of queries.
from typing import List
# Output parser will split the LLM result into a list of queries
class LineList(BaseModel):
# "lines" is the key (attribute name) of the parsed output
lines: List[str] = Field(description="Lines of text")
class LineListOutputParser(PydanticOutputParser):
def __init__(self) -> None:
super().__init__(pydantic_object=LineList)
output_parser = LineListOutputParser()
QUERY_PROMPT = PromptTemplate(
input_variables=["question"],
template="""You are an AI language model assistant. Your task is to generate five
different versions of the given user question to retrieve relevant documents from a vector
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search.
Provide these alternative questions separated by newlines.
Original question: {question}""",
)
llm = ChatOpenAI(temperature=0)
# Chain
llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)
# Other inputs
question = "What are the approaches to Task Decomposition?"
# Run
retriever = MultiQueryRetriever(
retriever=vectordb.as_retriever(), llm_chain=llm_chain, parser_key="lines"
) # "lines" is the key (attribute name) of the parsed output
# Results
unique_docs = retriever.get_relevant_documents(
query="What does the course say about regression?"
)
len(unique_docs)
INFO:langchain.retrievers.multi_query:Generated queries: ["1. What is the course's perspective on regression?", '2. Can you provide information on regression as dis
11
Previous
« Vector store-backed retriever
Next
Contextual compression »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesAgentsAgent Types
Agent Types
This categorizes all the available agents along a few dimensions.
Whether this agent is intended for Chat Models (takes in messages, outputs message) or LLMs (takes in string, outputs
string). The main thing this affects is the prompting strategy used. You can use an agent with a different type of model than it
is intended for, but it likely won't produce results of the same quality.
Whether or not these agent types support chat history. If it does, that means it can be used as a chatbot. If it does not, then
that means it's more suited for single tasks. Supporting chat history generally requires better models, so earlier agent types
aimed at worse models may not support it.
Whether or not these agent types support tools with multiple inputs. If a tool only requires a single input, it is generally easier
for an LLM to know how to invoke it. Therefore, several earlier agent types aimed at worse models may not support them.
Having an LLM call multiple tools at the same time can greatly speed up agents whether there are tasks that are assisted by
doing so. However, it is much more challenging for LLMs to do this, so some agent types do not support this.
Whether this agent requires the model to support any additional parameters. Some agent types take advantage of things like
OpenAI function calling, which require other model parameters. If none are required, then that means that everything is done
via prompting
When to Use
Our commentary on when you should consider using this agent type.
Supports Supports
Intended Support Require
Agent Multi- Parallel
Model s Chat d Model When to Use API
Type Input Function
Type History Params
Tools Calling
OpenAI
Chat ✅ ✅ ✅ tools If you are using a recent OpenAI model (1106 onwards) Ref
Tools
If you are using an OpenAI model, or an open-source
OpenAI
Chat ✅ ✅ functions model that has been finetuned for function calling and Ref
Functions
exposes the same functions parameters as OpenAI
If you are using Anthropic models, or other models good
XML LLM ✅ Ref
at XML
Structured
Chat ✅ ✅ If you need to support tools with multiple inputs Ref
Chat
JSON
Chat ✅ If you are using a model good at JSON Ref
Chat
ReAct LLM ✅ If you are using a simple model Ref
Self Ask
If you are using a simple model and only have one
With LLM Ref
search tool
Search
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Text
ModulesRetrievalSplitters MarkdownHeaderTextSplitter
On this page
MarkdownHeaderTextSplitter
Motivation
Many chat or Q+A applications involve chunking input documents prior to embedding and vector storage.
When a full paragraph or document is embedded, the embedding process considers both the overall context and the relationships between the sentences and phrase
As mentioned, chunking often aims to keep text with common context together. With this in mind, we might want to
specifically honor the structure of the document itself. For example, a markdown file is organized by headers. Creating
chunks within specific header groups is an intuitive idea. To address this challenge, we can use MarkdownHeaderTextSplitter. This
will split a markdown file by a specified set of headers.
md = '# Foo\n\n ## Bar\n\nHi this is Jim \nHi this is Joe\n\n ## Baz\n\n Hi this is Molly'
{'content': 'Hi this is Jim \nHi this is Joe', 'metadata': {'Header 1': 'Foo', 'Header 2': 'Bar'}}
{'content': 'Hi this is Molly', 'metadata': {'Header 1': 'Foo', 'Header 2': 'Baz'}}
headers_to_split_on = [
("#", "Header 1"),
("##", "Header 2"),
("###", "Header 3"),
]
markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
md_header_splits = markdown_splitter.split_text(markdown_document)
md_header_splits
[Document(page_content='Hi this is Jim \nHi this is Joe', metadata={'Header 1': 'Foo', 'Header 2': 'Bar'}),
Document(page_content='Hi this is Lance', metadata={'Header 1': 'Foo', 'Header 2': 'Bar', 'Header 3': 'Boo'}),
Document(page_content='Hi this is Molly', metadata={'Header 1': 'Foo', 'Header 2': 'Baz'})]
type(md_header_splits[0])
langchain.schema.document.Document
By default, MarkdownHeaderTextSplitter strips headers being split on from the output chunk’s content. This can be disabled by
setting strip_headers = False .
markdown_splitter = MarkdownHeaderTextSplitter(
headers_to_split_on=headers_to_split_on, strip_headers=False
)
md_header_splits = markdown_splitter.split_text(markdown_document)
md_header_splits
[Document(page_content='# Foo \n## Bar \nHi this is Jim \nHi this is Joe', metadata={'Header 1': 'Foo', 'Header 2': 'Bar'}),
Document(page_content='### Boo \nHi this is Lance', metadata={'Header 1': 'Foo', 'Header 2': 'Bar', 'Header 3': 'Boo'}),
Document(page_content='## Baz \nHi this is Molly', metadata={'Header 1': 'Foo', 'Header 2': 'Baz'})]
Within each markdown group we can then apply any text splitter we want.
markdown_document = "# Intro \n\n ## History \n\n Markdown[9] is a lightweight markup language for creating formatted text using a plain-text editor. John Gruber
headers_to_split_on = [
("#", "Header 1"),
("##", "Header 2"),
]
# MD splits
markdown_splitter = MarkdownHeaderTextSplitter(
headers_to_split_on=headers_to_split_on, strip_headers=False
)
md_header_splits = markdown_splitter.split_text(markdown_document)
# Char-level splits
from langchain_text_splitters import RecursiveCharacterTextSplitter
chunk_size = 250
chunk_overlap = 30
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=chunk_size, chunk_overlap=chunk_overlap
)
# Split
splits = text_splitter.split_documents(md_header_splits)
splits
[Document(page_content='# Intro \n## History \nMarkdown[9] is a lightweight markup language for creating formatted text using a plain-text editor. John Gruber crea
Document(page_content='Markdown is widely used in blogging, instant messaging, online forums, collaborative software, documentation pages, and readme files.', m
Document(page_content='## Rise and divergence \nAs Markdown popularity grew rapidly, many Markdown implementations appeared, driven mostly by the need fo
Document(page_content='#### Standardization \nFrom 2012, a group of people, including Jeff Atwood and John MacFarlane, launched what Atwood characterised
Document(page_content='## Implementations \nImplementations of Markdown are available for over a dozen programming languages.', metadata={'Header 1': 'Intro
Previous
« Split code
Next
Recursively split JSON »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
OpenAI
ModulesAgentsAgent Typesfunctions
On this page
OpenAI functions
CAUTION
OpenAI API has deprecated functions in favor of tools. The difference between the two is that thetools API allows the model to
request that multiple functions be invoked at once, which can reduce response times in some architectures. It’s
recommended to use the tools agent for OpenAI models.
OpenAI Tools
Certain OpenAI models (like gpt-3.5-turbo-0613 and gpt-4-0613) have been fine-tuned to detect when a function should be
called and respond with the inputs that should be passed to the function. In an API call, you can describe functions and have
the model intelligently choose to output a JSON object containing arguments to call those functions. The goal of the OpenAI
Function APIs is to more reliably return valid and useful function calls than a generic text completion or chat API.
A number of open source models have adopted the same format for function calls and have also fine-tuned the model to
detect when a function should be called.
Install openai, tavily-python packages which are required as the LangChain packages call them internally.
TIP
The functions format remains relevant for open source models and providers that have adopted it, and this agent is expected
to work for such models.
Initialize Tools
Create Agent
Run Agent
[{'url': 'https://www.ibm.com/topics/langchain', 'content': 'LangChain is essentially a library of abstractions for Python and Javascript, representing common steps and c
agent_executor.invoke(
{
"input": "what's my name?",
"chat_history": [
HumanMessage(content="hi! my name is bob"),
AIMessage(content="Hello Bob! How can I assist you today?"),
],
}
)
Previous
« Agent Types
Next
OpenAI tools »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Example Selector Select by
ModulesI/O Prompts Types similarity
Select by similarity
This object selects examples based on similarity to the inputs. It does this by finding the examples with the embeddings that
have the greatest cosine similarity with the inputs.
example_prompt = PromptTemplate(
input_variables=["input", "output"],
template="Input: {input}\nOutput: {output}",
)
Input: happy
Output: sad
Input: worried
Output:
# Input is a measurement, so should select the tall/short example
print(similar_prompt.format(adjective="large"))
Give the antonym of every input
Input: tall
Output: short
Input: large
Output:
# You can add new examples to the SemanticSimilarityExampleSelector as well
similar_prompt.example_selector.add_example(
{"input": "enthusiastic", "output": "apathetic"}
)
print(similar_prompt.format(adjective="passionate"))
Give the antonym of every input
Input: enthusiastic
Output: apathetic
Input: passionate
Output:
Previous
« Select by n-gram overlap
Next
Example selectors »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Self-ask with
ModulesAgentsAgent Typessearch
On this page
Initialize Tools
We will initialize the tools we want to use. This is a good tool because it gives usanswers (not documents)
For this agent, only one tool can be used and it needs to be named “Intermediate Answer”
Create Agent
Run Agent
{'input': "What is the hometown of the reigning men's U.S. Open champion?",
'output': 'Novak Djokovic.'}
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O Chat ModelsCaching
On this page
Caching
LangChain provides an optional caching layer for chat models. This is useful for two reasons:
It can save you money by reducing the number of API calls you make to the LLM provider, if you’re often requesting the same
completion multiple times. It can speed up your application by reducing the number of API calls you make to the LLM
provider.
llm = ChatOpenAI()
In Memory Cache
%%time
from langchain.cache import InMemoryCache
set_llm_cache(InMemoryCache())
SQLite Cache
!rm .langchain.db
# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))
%%time
# The first time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")
CPU times: user 23.2 ms, sys: 17.8 ms, total: 40.9 ms
Wall time: 592 ms
"Sure, here's a classic one for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
%%time
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
CPU times: user 5.61 ms, sys: 22.5 ms, total: 28.1 ms
Wall time: 47.5 ms
"Sure, here's a classic one for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Memory Conversation Summary
ModulesMoreMemorytypes Buffer
On this page
llm = OpenAI()
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=10)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
memory.load_memory_variables({})
{'history': 'System: \nThe human says "hi", and the AI responds with "whats up".\nHuman: not much you\nAI: not much'}
We can also get the history as a list of messages (this is useful if you are using this with a chat model).
memory = ConversationSummaryBufferMemory(
llm=llm, max_token_limit=10, return_messages=True
)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
messages = memory.chat_memory.messages
previous_summary = ""
memory.predict_new_summary(messages, previous_summary)
'\nThe human and AI state that they are not doing much.'
Using in a chain
Let’s walk through an example, again setting verbose=True so we can see the prompt.
conversation_with_summary = ConversationChain(
llm=llm,
# We set a very low max_token_limit for the purposes of testing.
memory=ConversationSummaryBufferMemory(llm=OpenAI(), max_token_limit=40),
verbose=True,
)
conversation_with_summary.predict(input="Hi, what's up?")
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th
Current conversation:
" Hi there! I'm doing great. I'm learning about the latest advances in artificial intelligence. What about you?"
conversation_with_summary.predict(input="Just working on writing some documentation!")
Current conversation:
Human: Hi, what's up?
AI: Hi there! I'm doing great. I'm spending some time learning about the latest developments in AI technology. How about you?
Human: Just working on writing some documentation!
AI:
' That sounds like a great use of your time. Do you have experience with writing documentation?'
# We can see here that there is a summary of the conversation and then some previous interactions
conversation_with_summary.predict(input="For LangChain! Have you heard of it?")
Current conversation:
System:
The human asked the AI what it was up to and the AI responded that it was learning about the latest developments in AI technology.
Human: Just working on writing some documentation!
AI: That sounds like a great use of your time. Do you have experience with writing documentation?
Human: For LangChain! Have you heard of it?
AI:
" No, I haven't heard of LangChain. Can you tell me more about it?"
# We can see here that the summary and the buffer are updated
conversation_with_summary.predict(
input="Haha nope, although a lot of people confuse it for that"
)
Current conversation:
System:
The human asked the AI what it was up to and the AI responded that it was learning about the latest developments in AI technology. The human then mentioned they
Human: For LangChain! Have you heard of it?
AI: No, I haven't heard of LangChain. Can you tell me more about it?
Human: Haha nope, although a lot of people confuse it for that
AI:
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression
Language CookbookAgents
Agents
You can pass a Runnable into an agent. Make sure you havelangchainhub installed: pip install langchainhub
1. Data processing for the intermediate steps. These need to be represented in a way that the language model can
recognize them. This should be pretty tightly coupled to the instructions in the prompt
4. The output parser - should be in sync with how the prompt specifies things to be formatted.
agent = (
{
"input": lambda x: x["input"],
"agent_scratchpad": lambda x: convert_intermediate_steps(
x["intermediate_steps"]
),
}
| prompt.partial(tools=convert_tools(tool_list))
| model.bind(stop=["</tool_input>", "</final_answer>"])
| XMLAgentOutputParser()
)
agent_executor = AgentExecutor(agent=agent, tools=tool_list, verbose=True)
agent_executor.invoke({"input": "whats the weather in New york?"})
Previous
« Querying a SQL DB
Next
Code writing »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression How RunnableLambda: Run Custom
Language to Functions
On this page
Note that all inputs to these functions need to be a SINGLE argument. If you have a function that accepts multiple
arguments, you should write a wrapper that accepts a single input and unpacks it into multiple argument.
def length_function(text):
return len(text)
def multiple_length_function(_dict):
return _multiple_length_function(_dict["text1"], _dict["text2"])
chain = (
{
"a": itemgetter("foo") | RunnableLambda(length_function),
"b": {"text1": itemgetter("foo"), "text2": itemgetter("bar")}
| RunnableLambda(multiple_length_function),
}
| prompt
| model
)
chain.invoke({"foo": "bar", "bar": "gah"})
AIMessage(content='3 + 9 equals 12.')
Runnable lambdas can optionally accept a RunnableConfig, which they can use to pass callbacks, tags, and other
configuration information to nested runs.
Previous
« RunnablePassthrough: Passing data through
Next
RunnableBranch: Dynamically route logic based on input »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Few-shot prompt
ModulesI/O Prompts templates
On this page
Use Case
In this tutorial, we’ll configure few-shot examples for self-ask with search.
To get started, create a list of few-shot examples. Each example should be a dictionary with the keys being the input
variables and the values being the values for those input variables.
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate
examples = [
{
"question": "Who lived longer, Muhammad Ali or Alan Turing?",
"answer": """
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
""",
},
{
"question": "When was the founder of craigslist born?",
"answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952
""",
},
{
"question": "Who was the maternal grandfather of George Washington?",
"answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
""",
},
{
"question": "Are both the directors of Jaws and Casino Royale from the same country?",
"answer": """
Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the final answer is: No
""",
},
]
Configure a formatter that will format the few-shot examples into a string. This formatter should be aPromptTemplate object.
example_prompt = PromptTemplate(
input_variables=["question", "answer"], template="Question: {question}\n{answer}"
)
print(example_prompt.format(**examples[0]))
Question: Who lived longer, Muhammad Ali or Alan Turing?
Finally, create a FewShotPromptTemplate object. This object takes in the few-shot examples and the formatter for the few-shot
examples.
prompt = FewShotPromptTemplate(
examples=examples,
example_prompt=example_prompt,
suffix="Question: {input}",
input_variables=["input"],
)
Question: Are both the directors of Jaws and Casino Royale from the same country?
We will reuse the example set and the formatter from the previous section. However, instead of feeding the examples directly
into the FewShotPromptTemplate object, we will feed them into an ExampleSelector object.
In this tutorial, we will use theSemanticSimilarityExampleSelector class. This class selects few-shot examples based on their
similarity to the input. It uses an embedding model to compute the similarity between the input and the few-shot examples, as
well as a vector store to perform the nearest neighbor search.
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
example_selector = SemanticSimilarityExampleSelector.from_examples(
# This is the list of examples available to select from.
examples,
# This is the embedding class used to produce embeddings which are used to measure semantic similarity.
OpenAIEmbeddings(),
# This is the VectorStore class that is used to store the embeddings and do a similarity search over.
Chroma,
# This is the number of examples to produce.
k=1,
)
answer:
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
Finally, create a FewShotPromptTemplate object. This object takes in the example selector and the formatter for the few-shot
examples.
prompt = FewShotPromptTemplate(
example_selector=example_selector,
example_prompt=example_prompt,
suffix="Question: {input}",
input_variables=["input"],
)
Previous
« Example selectors
Next
Few-shot examples for chat models »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Defining Custom
ModulesAgentsToolsTools
On this page
name (str), is required and must be unique within a set of tools provided to an agent
description (str), is optional but recommended, as it is used by an agent to determine tool use
args_schema (Pydantic BaseModel), is optional but recommended, can be used to provide more information (e.g., few-
shot examples) or validation for expected parameters.
There are multiple ways to define a tool. In this guide, we will walk through how to do for two functions:
The biggest difference here is that the first function only requires one input, while the second one requires multiple. Many
agents only work with functions that require single inputs, so it’s important to know how to work with those. For the most part,
defining these custom tools is the same, but there are some differences.
@tool decorator
This @tool decorator is the simplest way to define a custom tool. The decorator uses the function name as the tool name by
default, but this can be overridden by passing a string as the first argument. Additionally, the decorator will use the function’s
docstring as the tool’s description - so a docstring MUST be provided.
@tool
def search(query: str) -> str:
"""Look up things online."""
return "LangChain"
print(search.name)
print(search.description)
print(search.args)
search
search(query: str) -> str - Look up things online.
{'query': {'title': 'Query', 'type': 'string'}}
@tool
def multiply(a: int, b: int) -> int:
"""Multiply two numbers."""
return a * b
print(multiply.name)
print(multiply.description)
print(multiply.args)
multiply
multiply(a: int, b: int) -> int - Multiply two numbers.
{'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}
You can also customize the tool name and JSON args by passing them into the tool decorator.
class SearchInput(BaseModel):
query: str = Field(description="should be a search query")
Subclass BaseTool
You can also explicitly define a custom tool by subclassing the BaseTool class. This provides maximal control over the tool
definition, but is a bit more work.
class SearchInput(BaseModel):
query: str = Field(description="should be a search query")
class CalculatorInput(BaseModel):
a: int = Field(description="first number")
b: int = Field(description="second number")
class CustomSearchTool(BaseTool):
name = "custom_search"
description = "useful for when you need to answer questions about current events"
args_schema: Type[BaseModel] = SearchInput
def _run(
self, query: str, run_manager: Optional[CallbackManagerForToolRun] = None
) -> str:
"""Use the tool."""
return "LangChain"
class CustomCalculatorTool(BaseTool):
name = "Calculator"
description = "useful for when you need to answer questions about math"
args_schema: Type[BaseModel] = CalculatorInput
return_direct: bool = True
def _run(
self, a: int, b: int, run_manager: Optional[CallbackManagerForToolRun] = None
) -> str:
"""Use the tool."""
return a * b
StructuredTool dataclass
You can also use a StructuredTool dataclass. This methods is a mix between the previous two. It’s more convenient than
inheriting from the BaseTool class, but provides more functionality than just using a decorator.
search = StructuredTool.from_function(
func=search_function,
name="Search",
description="useful for when you need to answer questions about current events",
# coroutine= ... <- you can specify an async method if desired as well
)
print(search.name)
print(search.description)
print(search.args)
Search
Search(query: str) - useful for when you need to answer questions about current events
{'query': {'title': 'Query', 'type': 'string'}}
You can also define a custom args_schema to provide more information about inputs.
class CalculatorInput(BaseModel):
a: int = Field(description="first number")
b: int = Field(description="second number")
calculator = StructuredTool.from_function(
func=multiply,
name="Calculator",
description="multiply numbers",
args_schema=CalculatorInput,
return_direct=True,
# coroutine= ... <- you can specify an async method if desired as well
)
print(calculator.name)
print(calculator.description)
print(calculator.args)
Calculator
Calculator(a: int, b: int) -> int - multiply numbers
{'a': {'title': 'A', 'description': 'first number', 'type': 'integer'}, 'b': {'title': 'B', 'description': 'second number', 'type': 'integer'}}
When a tool encounters an error and the exception is not caught, the agent will stop executing. If you want the agent to
continue execution, you can raise a ToolException and set handle_tool_error accordingly.
When ToolException is thrown, the agent will not stop working, but will handle the exception according to thehandle_tool_error
variable of the tool, and the processing result will be returned to the agent as observation, and printed in red.
You can set handle_tool_error to True, set it a unified string value, or set it as a function. If it’s set as a function, the function
should take a ToolException as a parameter and return a str value.
Please note that only raising a ToolException won’t be effective. You need to first set thehandle_tool_error of the tool because its
default value is False.
search = StructuredTool.from_function(
func=search_tool1,
name="Search_tool1",
description="A bad tool",
)
search.run("test")
ToolException: The search tool1 is not available.
search = StructuredTool.from_function(
func=search_tool1,
name="Search_tool1",
description="A bad tool",
handle_tool_error=True,
)
search.run("test")
'The search tool1 is not available.'
search = StructuredTool.from_function(
func=search_tool1,
name="Search_tool1",
description="A bad tool",
handle_tool_error=_handle_error,
)
search.run("test")
'The following errors occurred during tool execution:The search tool1 is not available.Please try another tool.'
Previous
« Toolkits
Next
Tools as OpenAI Functions »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Modules
On this page
Modules
LangChain provides standard, extendable interfaces and external integrations for the following main modules:
Model I/O
Retrieval
Agents
Additional
Chains
Memory
Callbacks
Previous
« LangChain Expression Language (LCEL)
Next
Model I/O »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Example Selector
ModulesI/O Prompts Types
Previous
« Composition
Next
Select by length »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O Chat Models
On this page
Chat Models
Chat Models are a core component of LangChain.
A chat model is a language model that uses chat messages as inputs and returns chat messages as outputs (as opposed to
using plain text).
LangChain has integrations with many model providers (OpenAI, Cohere, Hugging Face, etc.) and exposes a standard
interface to interact with all of these models.
LangChain allows you to use models in sync, async, batching and streaming modes and provides other features (e.g.,
caching) and more.
Quick Start
Check out this quick start to get an overview of working with ChatModels, including all the different methods they expose
Integrations
For a full list of all LLM integrations that LangChain provides, please go to theIntegrations page
How-To Guides
We have several how-to guides for more advanced usage of LLMs. This includes:
Previous
« Pipeline
Next
Quick Start »
Community
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Memory in
ModulesMoreMemoryLLMChain
On this page
Memory in LLMChain
This notebook goes over how to use the Memory class with anLLMChain.
We will add the ConversationBufferMemory class, although this can be any memory class.
The most important step is setting up the prompt correctly. In the below prompt, we have two input keys: one for the actual
input, another for the input from the Memory class. Importantly, we make sure the keys in the PromptTemplate and the
ConversationBufferMemory match up (chat_history).
{chat_history}
Human: {human_input}
Chatbot:"""
prompt = PromptTemplate(
input_variables=["chat_history", "human_input"], template=template
)
memory = ConversationBufferMemory(memory_key="chat_history")
llm = OpenAI()
llm_chain = LLMChain(
llm=llm,
prompt=prompt,
verbose=True,
memory=memory,
)
llm_chain.predict(human_input="Hi there my friend")
The from_messages method creates a ChatPromptTemplate from a list of messages (e.g., SystemMessage, HumanMessage , AIMessage,
ChatMessage, etc.) or message templates, such as the MessagesPlaceholder below.
The configuration below makes it so the memory will be injected to the middle of the chat prompt, in thechat_history key, and
the user’s inputs will be added in a human/user message to the end of the chat prompt.
prompt = ChatPromptTemplate.from_messages(
[
SystemMessage(
content="You are a chatbot having a conversation with a human."
), # The persistent system prompt
MessagesPlaceholder(
variable_name="chat_history"
), # Where the memory will be stored.
HumanMessagePromptTemplate.from_template(
"{human_input}"
), # Where the human input will injected
]
)
chat_llm_chain = LLMChain(
llm=llm,
prompt=prompt,
verbose=True,
memory=memory,
)
chat_llm_chain.predict(human_input="Hi there my friend")
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O Prompts
On this page
Prompts
A prompt for a language model is a set of instructions or input provided by a user to guide the model's response, helping it
understand the context and generate relevant and coherent language-based output, such as answering questions,
completing sentences, or engaging in a conversation.
Quickstart
This quick start provides a basic overview of how to work with prompts.
How-To Guides
We have many how-to guides for working with prompts. These include:
LangChain has a few different types of example selectors you can use off the shelf. You can explore those typeshere
Previous
« Concepts
Next
Quick Start »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Text Split
ModulesRetrievalSplitters code
On this page
Split code
CodeTextSplitter allows you to split your code with multiple languages supported. Import enumLanguage and specify the
language.
Python
PYTHON_CODE = """
def hello_world():
print("Hello, World!")
JS
Here’s an example using the JS text splitter:
JS_CODE = """
function helloWorld() {
console.log("Hello, World!");
}
js_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.JS, chunk_size=60, chunk_overlap=0
)
js_docs = js_splitter.create_documents([JS_CODE])
js_docs
[Document(page_content='function helloWorld() {\n console.log("Hello, World!");\n}'),
Document(page_content='// Call the function\nhelloWorld();')]
TS
TS_CODE = """
function helloWorld(): void {
console.log("Hello, World!");
}
ts_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.TS, chunk_size=60, chunk_overlap=0
)
ts_docs = ts_splitter.create_documents([TS_CODE])
ts_docs
[Document(page_content='function helloWorld(): void {'),
Document(page_content='console.log("Hello, World!");\n}'),
Document(page_content='// Call the function\nhelloWorld();')]
Markdown
markdown_text = """
# ️ LangChain
## Quick Install
```bash
# Hopefully this code block isn't split
pip install langchain
```
latex_text = """
\documentclass{article}
\begin{document}
\maketitle
\section{Introduction}
Large language models (LLMs) are a type of machine learning model that can be trained on vast amounts of text data to generate human-like language. In recent yea
\subsection{History of LLMs}
The earliest LLMs were developed in the 1980s and 1990s, but they were limited by the amount of data that could be processed and the computational power availab
\subsection{Applications of LLMs}
LLMs have many applications in industry, including chatbots, content creation, and virtual assistants. They can also be used in academia for research in linguistics, p
\end{document}
"""
latex_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.MARKDOWN, chunk_size=60, chunk_overlap=0
)
latex_docs = latex_splitter.create_documents([latex_text])
latex_docs
[Document(page_content='\\documentclass{article}\n\n\x08egin{document}\n\n\\maketitle'),
Document(page_content='\\section{Introduction}'),
Document(page_content='Large language models (LLMs) are a type of machine learning'),
Document(page_content='model that can be trained on vast amounts of text data to'),
Document(page_content='generate human-like language. In recent years, LLMs have'),
Document(page_content='made significant advances in a variety of natural language'),
Document(page_content='processing tasks, including language translation, text'),
Document(page_content='generation, and sentiment analysis.'),
Document(page_content='\\subsection{History of LLMs}'),
Document(page_content='The earliest LLMs were developed in the 1980s and 1990s,'),
Document(page_content='but they were limited by the amount of data that could be'),
Document(page_content='processed and the computational power available at the'),
Document(page_content='time. In the past decade, however, advances in hardware and'),
Document(page_content='software have made it possible to train LLMs on massive'),
Document(page_content='datasets, leading to significant improvements in'),
Document(page_content='performance.'),
Document(page_content='\\subsection{Applications of LLMs}'),
Document(page_content='LLMs have many applications in industry, including'),
Document(page_content='chatbots, content creation, and virtual assistants. They'),
Document(page_content='can also be used in academia for research in linguistics,'),
Document(page_content='psychology, and computational linguistics.'),
Document(page_content='\\end{document}')]
HTML
Solidity
SOL_CODE = """
pragma solidity ^0.8.20;
contract HelloWorld {
function add(uint a, uint b) pure public returns(uint) {
return a + b;
}
}
"""
sol_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.SOL, chunk_size=128, chunk_overlap=0
)
sol_docs = sol_splitter.create_documents([SOL_CODE])
sol_docs
[Document(page_content='pragma solidity ^0.8.20;'),
Document(page_content='contract HelloWorld {\n function add(uint a, uint b) pure public returns(uint) {\n return a + b;\n }\n}')]
Previous
« Split by character
Next
MarkdownHeaderTextSplitter »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Text
ModulesRetrievalSplitters
On this page
Text Splitters
Once you've loaded documents, you'll often want to transform them to better suit your application. The simplest example is
you may want to split a long document into smaller chunks that can fit into your model's context window. LangChain has a
number of built-in document transformers that make it easy to split, combine, filter, and otherwise manipulate documents.
When you want to deal with long pieces of text, it is necessary to split up that text into chunks. As simple as this sounds,
there is a lot of potential complexity here. Ideally, you want to keep the semantically related pieces of text together. What
"semantically related" means could depend on the type of text. This notebook showcases several ways to do that.
1. Split the text up into small, semantically meaningful chunks (often sentences).
2. Start combining these small chunks into a larger chunk until you reach a certain size (as measured by some function).
3. Once you reach that size, make that chunk its own piece of text and then start creating a new chunk of text with some
overlap (to keep context between chunks).
That means there are two different axes along which you can customize your text splitter:
LangChain offers many different types of text splitters. These all live in thelangchain-text-splitters package. Below is a table
listing all of them, along with a few characteristics:
Adds Metadata: Whether or not this text splitter adds metadata about where each chunk came from.
You can evaluate text splitters with theChunkviz utility created by Greg Kamradt. Chunkviz is a great tool for visualizing how your
text splitter is working. It will show you how your text is being split up and help in tuning up the splitting parameters.
Text splitting is only one example of transformations that you may want to do on documents before passing them to an LLM.
Head to Integrations for documentation on built-in document transformer integrations with 3rd-party tools.
Previous
« PDF
Next
HTMLHeaderTextSplitter »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression Routing by semantic
Language Cookbooksimilarity
One especially useful technique is to use embeddings to route a query to the most relevant prompt. Here’s a very simple
example.
Here is a question:
{query}"""
math_template = """You are a very good mathematician. You are great at answering math questions. \
You are so good because you are able to break down hard problems into their component parts, \
answer the component parts, and then put them together to answer the broader question.
Here is a question:
{query}"""
embeddings = OpenAIEmbeddings()
prompt_templates = [physics_template, math_template]
prompt_embeddings = embeddings.embed_documents(prompt_templates)
def prompt_router(input):
query_embedding = embeddings.embed_query(input["query"])
similarity = cosine_similarity([query_embedding], prompt_embeddings)[0]
most_similar = prompt_templates[similarity.argmax()]
print("Using MATH" if most_similar == math_template else "Using PHYSICS")
return PromptTemplate.from_template(most_similar)
chain = (
{"query": RunnablePassthrough()}
| RunnableLambda(prompt_router)
| ChatOpenAI()
| StrOutputParser()
)
print(chain.invoke("What's a black hole"))
Using PHYSICS
A black hole is a region in space where gravity is extremely strong, so strong that nothing, not even light, can escape its gravitational pull. It is formed when a massiv
In mathematics and physics, a path integral is a mathematical tool used to calculate the probability amplitude or wave function of a particle or system of particles. It w
To understand the concept better, let's consider an example. Suppose we have a particle moving from point A to point B in space. Classically, we would describe this
The path integral formalism considers all possible paths that the particle could take and assigns a probability amplitude to each path. These probability amplitudes ar
To calculate a path integral, we need to define an action, which is a mathematical function that describes the behavior of the system. The action is usually expressed
Once we have the action, we can write down the path integral as an integral over all possible paths. Each path is weighted by a factor determined by the action and t
∫ e^(iS/ħ) D[x(t)]
Here, S is the action, ħ is the reduced Planck's constant, and D[x(t)] represents the integration over all possible paths x(t) of the particle.
By evaluating this integral, we can obtain the probability amplitude for the particle to go from the initial state to the final state. The absolute square of this amplitude g
Path integrals have proven to be a powerful tool in various areas of physics, including quantum mechanics, quantum field theory, and statistical mechanics. They allo
I hope this explanation helps you understand the concept of a path integral. If you have any further questions, feel free to ask!
Previous
« Code writing
Next
Adding memory »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Logging to
ModulesMoreCallbacksfile
Logging to file
This example shows how to print logs to file. It shows how to use theFileCallbackHandler, which does the same thing as
StdOutCallbackHandler, but instead writes the output to file. It also uses theloguru library to log other outputs that are not captured
by the handler.
logfile = "output.log"
llm = OpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")
# this chain will both print to stdout (because verbose=True) and write to 'output.log'
# if verbose=False, the FileCallbackHandler will still write to 'output.log'
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler], verbose=True)
answer = chain.run(number=2)
logger.info(answer)
Now we can open the file output.log to see that the output has been captured.
conv = Ansi2HTMLConverter()
html = conv.convert(content, full=True)
display(HTML(html))
> Entering new LLMChain chain...
Prompt after formatting:
1+2=
> Finished chain.
2023-06-01 18:36:38.929 | INFO | __main__:<module>:20 -
3
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Customizing Conversational
ModulesMoreMemoryMemory
On this page
llm = OpenAI(temperature=0)
AI prefix
The first way to do so is by changing the AI prefix in the conversation summary. By default, this is set to “AI”, but you can set
this to be anything you want. Note that if you change this, you should also change the prompt used in the chain to reflect this
naming change. Let’s walk through an example of that in the example below.
Current conversation:
Human: Hi there!
AI:
" Hi there! It's nice to meet you. How can I help you today?"
conversation.predict(input="What's the weather?")
Current conversation:
Human: Hi there!
AI: Hi there! It's nice to meet you. How can I help you today?
Human: What's the weather?
AI:
' The current weather is sunny and warm with a temperature of 75 degrees Fahrenheit. The forecast for the next few days is sunny with temperatures in the mid-70s.'
# Now we can override it and set it to "AI Assistant"
from langchain.prompts.prompt import PromptTemplate
template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI doe
Current conversation:
{history}
Human: {input}
AI Assistant:"""
PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
conversation = ConversationChain(
prompt=PROMPT,
llm=llm,
verbose=True,
memory=ConversationBufferMemory(ai_prefix="AI Assistant"),
)
conversation.predict(input="Hi there!")
Current conversation:
Human: Hi there!
AI Assistant:
" Hi there! It's nice to meet you. How can I help you today?"
conversation.predict(input="What's the weather?")
Current conversation:
Human: Hi there!
AI Assistant: Hi there! It's nice to meet you. How can I help you today?
Human: What's the weather?
AI Assistant:
' The current weather is sunny and warm with a temperature of 75 degrees Fahrenheit. The forecast for the rest of the day is sunny with a high of 78 degrees and a lo
Human prefix
The next way to do so is by changing the Human prefix in the conversation summary. By default, this is set to “Human”, but
you can set this to be anything you want. Note that if you change this, you should also change the prompt used in the chain
to reflect this naming change. Let’s walk through an example of that in the example below.
template = """The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI doe
Current conversation:
{history}
Friend: {input}
AI:"""
PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
conversation = ConversationChain(
prompt=PROMPT,
llm=llm,
verbose=True,
memory=ConversationBufferMemory(human_prefix="Friend"),
)
conversation.predict(input="Hi there!")
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th
Current conversation:
Friend: Hi there!
AI:
" Hi there! It's nice to meet you. How can I help you today?"
conversation.predict(input="What's the weather?")
Current conversation:
Friend: Hi there!
AI: Hi there! It's nice to meet you. How can I help you today?
Friend: What's the weather?
AI:
' The weather right now is sunny and warm with a temperature of 75 degrees Fahrenheit. The forecast for the rest of the day is mostly sunny with a high of 82 degree
Previous
« Message Memory in Agent backed by a database
Next
Custom Memory »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Custom callback
ModulesMoreCallbackshandlers
class MyCustomHandler(BaseCallbackHandler):
def on_llm_new_token(self, token: str, **kwargs) -> None:
print(f"My custom handler, token: {token}")
chat.invoke([HumanMessage(content="Tell me a joke")])
My custom handler, token:
My custom handler, token: Why
My custom handler, token: don
My custom handler, token: 't
My custom handler, token: scientists
My custom handler, token: trust
My custom handler, token: atoms
My custom handler, token: ?
My custom handler, token:
Previous
« Async callbacks
Next
Logging to file »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression Why use
Language LCEL
On this page
LCEL makes it easy to build complex chains from basic components. It does this by providing: 1.A unified interface: Every
LCEL object implements the Runnable interface, which defines a common set of invocation methods (invoke , batch, stream, ainvoke,
…). This makes it possible for chains of LCEL objects to also automatically support these invocations. That is, every chain of
LCEL objects is itself an LCEL object. 2. Composition primitives: LCEL provides a number of primitives that make it easy to
compose chains, parallelize components, add fallbacks, dynamically configure chain internal, and more.
To better understand the value of LCEL, it’s helpful to see it in action and think about how we might recreate similar
functionality without it. In this walkthrough we’ll do just that with our basic example from the get started section. We’ll take our
simple prompt + model chain, which under the hood already defines a lot of functionality, and see what it would take to
recreate all of it.
Invoke
In the simplest case, we just want to pass in a topic string and get back a joke string:
Without LCEL
import openai
invoke_chain("ice cream")
LCEL
prompt = ChatPromptTemplate.from_template(
"Tell me a short joke about {topic}"
)
output_parser = StrOutputParser()
model = ChatOpenAI(model="gpt-3.5-turbo")
chain = (
{"topic": RunnablePassthrough()}
| prompt
| model
| output_parser
)
chain.invoke("ice cream")
Stream
Without LCEL
LCEL
Batch
If we want to run on a batch of inputs in parallel, we’ll again need a new function:
Without LCEL
LCEL
Async
Without LCEL
async_client = openai.AsyncOpenAI()
chain.ainvoke("ice cream")
Without LCEL
invoke_llm_chain("ice cream")
LCEL
llm = OpenAI(model="gpt-3.5-turbo-instruct")
llm_chain = (
{"topic": RunnablePassthrough()}
| prompt
| llm
| output_parser
)
llm_chain.invoke("ice cream")
Without LCEL
import anthropic
anthropic_template = f"Human:\n\n{prompt_template}\n\nAssistant:"
anthropic_client = anthropic.Anthropic()
invoke_anthropic_chain("ice cream")
LCEL
anthropic = ChatAnthropic(model="claude-2")
anthropic_chain = (
{"topic": RunnablePassthrough()}
| prompt
| anthropic
| output_parser
)
anthropic_chain.invoke("ice cream")
Runtime configurability
If we wanted to make the choice of chat model or LLM configurable at runtime:
Without LCEL
def invoke_configurable_chain(
topic: str,
*,
model: str = "chat_openai"
) -> str:
if model == "chat_openai":
return invoke_chain(topic)
elif model == "openai":
return invoke_llm_chain(topic)
elif model == "anthropic":
return invoke_anthropic_chain(topic)
else:
raise ValueError(
f"Received invalid model '{model}'."
" Expected one of chat_openai, openai, anthropic"
)
def stream_configurable_chain(
topic: str,
*,
model: str = "chat_openai"
) -> Iterator[str]:
if model == "chat_openai":
return stream_chain(topic)
elif model == "openai":
# Note we haven't implemented this yet.
return stream_llm_chain(topic)
elif model == "anthropic":
# Note we haven't implemented this yet
return stream_anthropic_chain(topic)
else:
raise ValueError(
f"Received invalid model '{model}'."
" Expected one of chat_openai, openai, anthropic"
)
def batch_configurable_chain(
topics: List[str],
*,
model: str = "chat_openai"
) -> List[str]:
# You get the idea
...
With LCEL
configurable_model = model.configurable_alternatives(
ConfigurableField(id="model"),
default_key="chat_openai",
openai=llm,
anthropic=anthropic,
)
configurable_chain = (
{"topic": RunnablePassthrough()}
| prompt
| configurable_model
| output_parser
)
configurable_chain.invoke(
"ice cream",
config={"model": "openai"}
)
stream = configurable_chain.stream(
"ice cream",
config={"model": "anthropic"}
)
for chunk in stream:
print(chunk, end="", flush=True)
Logging
If we want to log our intermediate results:
Without LCEL
invoke_anthropic_chain_with_logging("ice cream")
LCEL
Every component has built-in integrations with LangSmith. If we set the following two environment variables, all chain traces are logged to LangSmith.
import os
os.environ["LANGCHAIN_API_KEY"] = "..."
os.environ["LANGCHAIN_TRACING_V2"] = "true"
anthropic_chain.invoke("ice cream")
Fallbacks
Without LCEL
invoke_chain_with_fallback("ice cream")
# await ainvoke_chain_with_fallback("ice cream")
batch_chain_with_fallback(["ice cream", "spaghetti", "dumplings"]))
LCEL
fallback_chain = chain.with_fallbacks([anthropic_chain])
fallback_chain.invoke("ice cream")
# await fallback_chain.ainvoke("ice cream")
fallback_chain.batch(["ice cream", "spaghetti", "dumplings"])
Even in this simple case, our LCEL chain succinctly packs in a lot of functionality. As chains become more complex, this
becomes especially valuable.
Without LCEL
def invoke_configurable_chain(
topic: str,
*,
model: str = "chat_openai"
) -> str:
if model == "chat_openai":
return invoke_chain(topic)
elif model == "openai":
return invoke_llm_chain(topic)
elif model == "anthropic":
return invoke_anthropic_chain(topic)
else:
raise ValueError(
f"Received invalid model '{model}'."
" Expected one of chat_openai, openai, anthropic"
)
def stream_configurable_chain(
topic: str,
*,
model: str = "chat_openai"
) -> Iterator[str]:
if model == "chat_openai":
return stream_chain(topic)
elif model == "openai":
# Note we haven't implemented this yet.
return stream_llm_chain(topic)
elif model == "anthropic":
# Note we haven't implemented this yet
return stream_anthropic_chain(topic)
else:
raise ValueError(
f"Received invalid model '{model}'."
" Expected one of chat_openai, openai, anthropic"
)
def batch_configurable_chain(
topics: List[str],
*,
model: str = "chat_openai"
) -> List[str]:
...
LCEL
import os
os.environ["LANGCHAIN_API_KEY"] = "..."
os.environ["LANGCHAIN_TRACING_V2"] = "true"
prompt = ChatPromptTemplate.from_template(
"Tell me a short joke about {topic}"
)
chat_openai = ChatOpenAI(model="gpt-3.5-turbo")
openai = OpenAI(model="gpt-3.5-turbo-instruct")
anthropic = ChatAnthropic(model="claude-2")
model = (
chat_openai
.with_fallbacks([anthropic])
.configurable_alternatives(
ConfigurableField(id="model"),
default_key="chat_openai",
openai=openai,
anthropic=anthropic,
)
)
chain = (
{"topic": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)
Next steps
To continue learning about LCEL, we recommend: - Reading up on the full LCELInterface, which we’ve only partially covered
here. - Exploring the How-to section to learn about additional composition primitives that LCEL provides. - Looking through
the Cookbook section to see LCEL in action for common use cases. A good next use case to look at would beRetrieval-
augmented generation.
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Few-shot examples for chat
ModulesI/O Prompts models
On this page
The goal of few-shot prompt templates are to dynamically select examples based on an input, and then format the examples
in a final prompt to provide for the model.
Note: The following code examples are for chat models. For similar few-shot prompt examples for completion models (LLMs),
see the few-shot prompt templates guide.
Fixed Examples
The most basic (and common) few-shot prompting technique is to use a fixed prompt example. This way you can select a
chain, evaluate it, and avoid worrying about additional moving parts in production.
The basic components of the template are: -examples: A list of dictionary examples to include in the final prompt. -
example_prompt: converts each example into 1 or more messages through itsformat_messages method. A common example
would be to convert each example into one human message and one AI message response, or a human message followed
by a function call message.
Below is a simple demonstration. First, import the modules for this example:
examples = [
{"input": "2+2", "output": "4"},
{"input": "2+3", "output": "5"},
]
print(few_shot_prompt.format())
Human: 2+2
AI: 4
Human: 2+3
AI: 5
Sometimes you may want to condition which examples are shown based on the input. For this, you can replace theexamples
with an example_selector. The other components remain the same as above! To review, the dynamic few-shot prompt template
would look like:
example_selector:responsible for selecting few-shot examples (and the order in which they are returned) for a given input.
These implement the BaseExampleSelector interface. A common example is the vectorstore-backed
SemanticSimilarityExampleSelector
example_prompt: convert each example into 1 or more messages through itsformat_messages method. A common example
would be to convert each example into one human message and one AI message response, or a human message
followed by a function call message.
These once again can be composed with other messages and chat templates to assemble your final prompt.
Since we are using a vectorstore to select examples based on semantic similarity, we will want to first populate the store.
examples = [
{"input": "2+2", "output": "4"},
{"input": "2+3", "output": "5"},
{"input": "2+4", "output": "6"},
{"input": "What did the cow say to the moon?", "output": "nothing at all"},
{
"input": "Write me a poem about the moon",
"output": "One for the moon, and one for me, who are we to talk about the moon?",
},
]
With a vectorstore created, you can create the example_selector. Here we will isntruct it to only fetch the top 2 examples.
example_selector = SemanticSimilarityExampleSelector(
vectorstore=vectorstore,
k=2,
)
# The prompt template will load examples by passing the input do the `select_examples` method
example_selector.select_examples({"input": "horse"})
[{'input': 'What did the cow say to the moon?', 'output': 'nothing at all'},
{'input': '2+4', 'output': '6'}]
print(few_shot_prompt.format(input="What's 3+3?"))
Human: 2+3
AI: 5
Human: 2+2
AI: 4
final_prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a wondrous wizard of math."),
few_shot_prompt,
("human", "{input}"),
]
)
print(few_shot_prompt.format(input="What's 3+3?"))
Human: 2+3
AI: 5
Human: 2+2
AI: 4
Previous
« Few-shot prompt templates
Next
Types of `MessagePromptTemplate` »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Vector store-backed
ModulesRetrievalRetrieversretriever
On this page
Once you construct a vector store, it’s very easy to construct a retriever. Let’s walk through an example.
loader = TextLoader("../../state_of_the_union.txt")
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(texts, embeddings)
retriever = db.as_retriever()
docs = retriever.get_relevant_documents("what did he say about ketanji brown jackson")
By default, the vector store retriever uses similarity search. If the underlying vector store supports maximum marginal
relevance search, you can specify that as the search type.
retriever = db.as_retriever(search_type="mmr")
docs = retriever.get_relevant_documents("what did he say about ketanji brown jackson")
You can also set a retrieval method that sets a similarity score threshold and only returns documents with a score above that
threshold.
retriever = db.as_retriever(
search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5}
)
docs = retriever.get_relevant_documents("what did he say about ketanji brown jackson")
Specifying top k
You can also specify search kwargs like k to use when doing retrieval.
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Contextual
ModulesRetrievalRetrieverscompression
On this page
Contextual compression
One challenge with retrieval is that usually you don’t know the specific queries your document storage system will face when
you ingest data into the system. This means that the information most relevant to a query may be buried in a document with
a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer
responses.
Contextual compression is meant to fix this. The idea is simple: instead of immediately returning retrieved documents as-is,
you can compress them using the context of the given query, so that only the relevant information is returned. “Compressing”
here refers to both compressing the contents of an individual document and filtering out documents wholesale.
To use the Contextual Compression Retriever, you’ll need: - a base retriever - a Document Compressor
The Contextual Compression Retriever passes queries to the base retriever, takes the initial documents and passes them
through the Document Compressor. The Document Compressor takes a list of documents and shortens it by reducing the
contents of documents or dropping documents altogether.
Get started
def pretty_print_docs(docs):
print(
f"\n{'-' * 100}\n".join(
[f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
)
)
Let’s start by initializing a simple vector store retriever and storing the 2023 State of the Union speech (in chunks). We can
see that given an example question our retriever returns one or two relevant docs and a few irrelevant docs. And even the
relevant docs have a lot of irrelevant information in them.
documents = TextLoader("../../state_of_the_union.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
retriever = FAISS.from_documents(texts, OpenAIEmbeddings()).as_retriever()
docs = retriever.get_relevant_documents(
"What did the president say about Ketanji Brown Jackson"
)
pretty_print_docs(docs)
Document 1:
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Bre
----------------------------------------------------------------------------------------------------
Document 2:
A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since
And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.
We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.
We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.
We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.
We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.
----------------------------------------------------------------------------------------------------
Document 3:
And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families
As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-give
While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-A
And soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together a
So tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together.
Tonight, I’m announcing a crackdown on these companies overcharging American businesses and consumers.
And as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up.
Medicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect.
We’ll also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not d
Raise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty.
Let’s increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls America’s best-kept secret
Now let’s wrap our base retriever with a ContextualCompressionRetriever. We’ll add an LLMChainExtractor, which will iterate over the
initially returned documents and extract from each only the content that is relevant to the query.
llm = OpenAI(temperature=0)
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
base_compressor=compressor, base_retriever=retriever
)
compressed_docs = compression_retriever.get_relevant_documents(
"What did the president say about Ketanji Jackson Brown"
)
pretty_print_docs(compressed_docs)
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
Document 1:
I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson.
LLMChainFilter
The LLMChainFilter is slightly simpler but more robust compressor that uses an LLM chain to decide which of the initially
retrieved documents to filter out and which ones to return, without manipulating the document contents.
_filter = LLMChainFilter.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
base_compressor=_filter, base_retriever=retriever
)
compressed_docs = compression_retriever.get_relevant_documents(
"What did the president say about Ketanji Jackson Brown"
)
pretty_print_docs(compressed_docs)
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
/Users/harrisonchase/workplace/langchain/libs/langchain/langchain/chains/llm.py:316: UserWarning: The predict_and_parse method is deprecated, instead pass an o
warnings.warn(
Document 1:
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Bre
EmbeddingsFilter
Making an extra LLM call over each retrieved document is expensive and slow. TheEmbeddingsFilter provides a cheaper and
faster option by embedding the documents and query and only returning those documents which have sufficiently similar
embeddings to the query.
embeddings = OpenAIEmbeddings()
embeddings_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
compression_retriever = ContextualCompressionRetriever(
base_compressor=embeddings_filter, base_retriever=retriever
)
compressed_docs = compression_retriever.get_relevant_documents(
"What did the president say about Ketanji Jackson Brown"
)
pretty_print_docs(compressed_docs)
Document 1:
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Just
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Bre
----------------------------------------------------------------------------------------------------
Document 2:
A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since
And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.
We can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.
We’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.
We’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster.
We’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.
----------------------------------------------------------------------------------------------------
Document 3:
And for our LGBTQ+ Americans, let’s finally get the bipartisan Equality Act to my desk. The onslaught of state laws targeting transgender Americans and their families
As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-give
While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-A
And soon, we’ll strengthen the Violence Against Women Act that I first wrote three decades ago. It is important for us to show the nation that we can come together a
So tonight I’m offering a Unity Agenda for the Nation. Four big things we can do together.
Using the DocumentCompressorPipeline we can also easily combine multiple compressors in sequence. Along with compressors
we can add BaseDocumentTransformer s to our pipeline, which don’t perform any contextual compression but simply perform
some transformation on a set of documents. For example TextSplitters can be used as document transformers to split
documents into smaller pieces, and the EmbeddingsRedundantFilter can be used to filter out redundant documents based on
embedding similarity between documents.
Below we create a compressor pipeline by first splitting our docs into smaller chunks, then removing redundant documents,
and then filtering based on relevance to the query.
compressed_docs = compression_retriever.get_relevant_documents(
"What did the president say about Ketanji Jackson Brown"
)
pretty_print_docs(compressed_docs)
Document 1:
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson
----------------------------------------------------------------------------------------------------
Document 2:
As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-give
While it often appears that we never agree, that isn’t true. I signed 80 bipartisan bills into law last year
----------------------------------------------------------------------------------------------------
Document 3:
A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder
----------------------------------------------------------------------------------------------------
Document 4:
Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans
And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.
We can do both
Previous
« MultiQueryRetriever
Next
Ensemble Retriever »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O LLMsCaching
On this page
Caching
LangChain provides an optional caching layer for LLMs. This is useful for two reasons:
It can save you money by reducing the number of API calls you make to the LLM provider, if you’re often requesting the same
completion multiple times. It can speed up your application by reducing the number of API calls you make to the LLM
provider.
set_llm_cache(InMemoryCache())
SQLite Cache
!rm .langchain.db
# We can do the same thing with a SQLite cache
from langchain.cache import SQLiteCache
set_llm_cache(SQLiteCache(database_path=".langchain.db"))
%%time
# The first time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")
CPU times: user 29.3 ms, sys: 17.3 ms, total: 46.7 ms
Wall time: 364 ms
'\n\nWhy did the tomato turn red?\n\nBecause it saw the salad dressing!'
%%time
# The second time it is, so it goes faster
llm.predict("Tell me a joke")
CPU times: user 4.58 ms, sys: 2.23 ms, total: 6.8 ms
Wall time: 4.68 ms
'\n\nWhy did the tomato turn red?\n\nBecause it saw the salad dressing!'
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O LLMsQuick Start
On this page
Quick Start
Large Language Models (LLMs) are a core component of LangChain. LangChain does not serve its own LLMs, but rather
provides a standard interface for interacting with many different LLMs.
There are lots of LLM providers (OpenAI, Cohere, Hugging Face, etc) - theLLM class is designed to provide a standard
interface for all of them.
In this walkthrough we’ll work with an OpenAI LLM wrapper, although the functionalities highlighted are generic for all LLM
types.
Setup
For this example we’ll need to install the OpenAI Python package:
Accessing the API requires an API key, which you can get by creating an account and headinghere. Once we have a key
we’ll want to set it as an environment variable by running:
export OPENAI_API_KEY="..."
If you’d prefer not to set an environment variable you can pass the key in directly via theopenai_api_key named parameter
when initiating the OpenAI LLM class:
llm = OpenAI(openai_api_key="...")
llm = OpenAI()
LCEL
LLMs implement the Runnable interface, the basic building block of theLangChain Expression Language (LCEL). This means
they support invoke , ainvoke, stream, astream, batch, abatch, astream_log calls.
LLMs accept strings as inputs, or objects which can be coerced to string prompts, includingList[BaseMessage] and PromptValue.
llm.invoke(
"What are some theories about the relationship between unemployment and inflation?"
)
'\n\n1. The Phillips Curve Theory: This suggests that there is an inverse relationship between unemployment and inflation, meaning that when unemployment is low, i
2. The Cost-Push Inflation Theory: This theory suggests that an increase in unemployment leads to a decrease in aggregate demand, which causes prices to go up d
3. The Wage-Push Inflation Theory: This theory states that when unemployment is low, wages tend to increase due to competition for labor, which causes prices to ri
4. The Monetarist Theory: This theory states that there is no direct relationship between unemployment and inflation, but rather, an increase in the money supply lead
llm.batch(
[
"What are some theories about the relationship between unemployment and inflation?"
]
)
['\n\n1. The Phillips Curve Theory: This theory suggests that there is an inverse relationship between unemployment and inflation, meaning that when unemployment
await llm.ainvoke(
"What are some theories about the relationship between unemployment and inflation?"
)
'\n\n1. Phillips Curve Theory: This theory states that there is an inverse relationship between inflation and unemployment. As unemployment decreases, inflation incre
1. Phillips Curve Theory: This theory suggests that there is an inverse relationship between unemployment and inflation, meaning that when unemployment is low, infl
2. Cost-Push Theory: This theory suggests that inflation is caused by rising costs of production, such as wages, raw materials, and energy. It states that when costs i
3. Demand-Pull Theory: This theory suggests that inflation is caused by an increase in demand for goods and services, leading to a rise in prices. It suggests that wh
4. Monetarist Theory: This theory states that inflation is caused by an increase in the money supply. It suggests that when the money supply increases, people have m
await llm.abatch(
[
"What are some theories about the relationship between unemployment and inflation?"
]
)
['\n\n1. The Phillips Curve Theory: This theory states that there is an inverse relationship between unemployment and inflation. When unemployment is low, wages in
LangSmith
All LLMs come with built-in LangSmith tracing. Just set the following environment variables:
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY=<your-api-key>
and any LLM invocation (whether it’s nested in a chain or not) will automatically be traced. A trace will include inputs, outputs,
latency, token usage, invocation params, environment params, and more. See an example here:
https://smith.langchain.com/public/7924621a-ff58-4b1c-a2a2-035a354ef434/r.
In LangSmith you can then provide feedback for any trace, compile annotated datasets for evals, debug performance in the
playground, and more.
Previous
« LLMs
Next
Custom LLM »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O Quickstart
On this page
Quickstart
The quick start will cover the basics of working with language models. It will introduce the two different types of models -
LLMs and ChatModels. It will then cover how to use PromptTemplates to format the inputs to these models, and how to use
Output Parsers to work with the outputs. For a deeper conceptual guide into these topics - please see this documentation
Models
For this getting started guide, we will provide a few options: using an API like Anthropic or OpenAI, or using a local open
source model via Ollama.
OpenAI
Local (using Ollama)
Anthropic (chat model only)
Cohere
Accessing the API requires an API key, which you can get by creating an account and headinghere. Once we have a key
we'll want to set it as an environment variable by running:
export OPENAI_API_KEY="..."
llm = OpenAI()
chat_model = ChatOpenAI(model="gpt-3.5-turbo-0125")
If you'd prefer not to set an environment variable you can pass the key in directly via theopenai_api_key named parameter
when initiating the OpenAI LLM class:
Both llm and chat_model are objects that represent configuration for a particular model. You can initialize them with parameters
like temperature and others, and pass them around. The main difference between them is their input and output schemas. The
LLM objects take string as input and output string. The ChatModel objects take a list of messages as input and output a
message. For a deeper conceptual explanation of this difference please see this documentation
We can see the difference between an LLM and a ChatModel when we invoke it.
text = "What would be a good company name for a company that makes colorful socks?"
messages = [HumanMessage(content=text)]
llm.invoke(text)
# >> Feetful of Fun
chat_model.invoke(messages)
# >> AIMessage(content="Socks O'Color")
The LLM returns a string, while the ChatModel returns a message.
Prompt Templates
Most LLM applications do not pass user input directly into an LLM. Usually they will add the user input to a larger piece of
text, called a prompt template, that provides additional context on the specific task at hand.
In the previous example, the text we passed to the model contained instructions to generate a company name. For our
application, it would be great if the user only had to provide the description of a company/product without worrying about
giving the model instructions.
PromptTemplates help with exactly this! They bundle up all the logic for going from user input into a fully formatted prompt.
This can start off very simple - for example, a prompt to produce the above string would just be:
However, the advantages of using these over raw string formatting are several. You can "partial" out variables - e.g. you can
format only some of the variables at a time. You can compose them together, easily combining different templates into a
single prompt. For explanations of these functionalities, see the section on prompts for more detail.
PromptTemplates can also be used to produce a list of messages. In this case, the prompt not only contains information about
the content, but also each message (its role, its position in the list, etc.). Here, what happens most often is a
ChatPromptTemplate is a list of ChatMessageTemplates. Each ChatMessageTemplate contains instructions for how to format that
ChatMessage - its role, and then also its content. Let's take a look at this below:
chat_prompt = ChatPromptTemplate.from_messages([
("system", template),
("human", human_template),
])
ChatPromptTemplates can also be constructed in other ways - see thesection on prompts for more detail.
Output parsers
OutputParsers convert the raw output of a language model into a format that can be used downstream. There are a few main
types of OutputParsers, including:
In this getting started guide, we use a simple one that parses a list of comma separated values.
output_parser = CommaSeparatedListOutputParser()
output_parser.parse("hi, bye")
# >> ['hi', 'bye']
chat_prompt = ChatPromptTemplate.from_template(template)
chat_prompt = chat_prompt.partial(format_instructions=output_parser.get_format_instructions())
chain = chat_prompt | chat_model | output_parser
chain.invoke({"text": "colors"})
# >> ['red', 'blue', 'green', 'yellow', 'orange']
Note that we are using the | syntax to join these components together. This| syntax is powered by the LangChain Expression
Language (LCEL) and relies on the universal Runnable interface that all of these objects implement. To learn more about
LCEL, read the documentation here.
Conclusion
That's it for getting started with prompts, models, and output parsers! This just covered the surface of what there is to learn.
For more information, check out:
The conceptual guide for information about the concepts presented here
The prompt section for information on how to work with prompt templates
The LLM section for more information on the LLM interface
The ChatModel section for more information on the ChatModel interface
The output parser section for information about the different types of output parsers.
Previous
« Model I/O
Next
Concepts »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Memory Conversation
ModulesMoreMemorytypes Buffer
On this page
Conversation Buffer
This notebook shows how to use ConversationBufferMemory. This memory allows for storing messages and then extracts the
messages in a variable.
We can also get the history as a list of messages (this is useful if you are using this with a chat model).
memory = ConversationBufferMemory(return_messages=True)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.load_memory_variables({})
{'history': [HumanMessage(content='hi', additional_kwargs={}),
AIMessage(content='whats up', additional_kwargs={})]}
Using in a chain
Finally, let's take a look at using this in a chain (settingverbose=True so we can see the prompt).
llm = OpenAI(temperature=0)
conversation = ConversationChain(
llm=llm,
verbose=True,
memory=ConversationBufferMemory()
)
conversation.predict(input="Hi there!")
Current conversation:
Human: Hi there!
AI:
" Hi there! It's nice to meet you. How can I help you today?"
Current conversation:
Human: Hi there!
AI: Hi there! It's nice to meet you. How can I help you today?
Human: I'm doing well! Just having a conversation with an AI.
AI:
" That's great! It's always nice to have a conversation with someone new. What would you like to talk about?"
Current conversation:
Human: Hi there!
AI: Hi there! It's nice to meet you. How can I help you today?
Human: I'm doing well! Just having a conversation with an AI.
AI: That's great! It's always nice to have a conversation with someone new. What would you like to talk about?
Human: Tell me about yourself.
AI:
" Sure! I'm an AI created to help people with their everyday tasks. I'm programmed to understand natural language and provide helpful information. I'm also consta
Previous
« Memory types
Next
Conversation Buffer Window »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression How Add message history
Language to (memory)
On this page
Specifically, it can be used for any Runnable that takes as input one of
a sequence of BaseMessage
a dict with a key that takes a sequence ofBaseMessage
a dict with a key that takes the latest message(s) as a string or sequence ofBaseMessage, and a separate key that takes
historical messages
Let’s take a look at some examples to see how it works. First we construct a runnable (which here accepts a dict as input
and returns a message as output):
model = ChatOpenAI()
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You're an assistant who's good at {ability}. Respond in 20 words or fewer",
),
MessagesPlaceholder(variable_name="history"),
("human", "{input}"),
]
)
runnable = prompt | model
To manage the message history, we will need: 1. This runnable; 2. A callable that returns an instance of
BaseChatMessageHistory.
Check out the memory integrations page for implementations of chat message histories using Redis and other providers.
Here we demonstrate using an in-memory ChatMessageHistory as well as more persistent storage usingRedisChatMessageHistory.
In-memory
Below we show a simple example in which the chat history lives in memory, in this case via a global Python dict.
We construct a callable get_session_history that references this dict to return an instance of ChatMessageHistory . The arguments to
the callable can be specified by passing a configuration to the RunnableWithMessageHistory at runtime. By default, the
configuration parameter is expected to be a single string session_id. This can be adjusted via thehistory_factory_config kwarg.
store = {}
with_message_history = RunnableWithMessageHistory(
runnable,
get_session_history,
input_messages_key="input",
history_messages_key="history",
)
Note that we’ve specified input_messages_key (the key to be treated as the latest input message) andhistory_messages_key (the
key to add historical messages to).
When invoking this new runnable, we specify the corresponding chat history via a configuration parameter:
with_message_history.invoke(
{"ability": "math", "input": "What does cosine mean?"},
config={"configurable": {"session_id": "abc123"}},
)
AIMessage(content='Cosine is a trigonometric function that calculates the ratio of the adjacent side to the hypotenuse of a right triangle.')
# Remembers
with_message_history.invoke(
{"ability": "math", "input": "What?"},
config={"configurable": {"session_id": "abc123"}},
)
AIMessage(content='Cosine is a mathematical function used to calculate the length of a side in a right triangle.')
# New session_id --> does not remember.
with_message_history.invoke(
{"ability": "math", "input": "What?"},
config={"configurable": {"session_id": "def234"}},
)
AIMessage(content='I can help with math problems. What do you need assistance with?')
The configuration parameters by which we track message histories can be customized by passing in a list of
ConfigurableFieldSpec objects to the history_factory_config parameter. Below, we use two parameters: a user_id and conversation_id.
from langchain_core.runnables import ConfigurableFieldSpec
store = {}
with_message_history = RunnableWithMessageHistory(
runnable,
get_session_history,
input_messages_key="input",
history_messages_key="history",
history_factory_config=[
ConfigurableFieldSpec(
id="user_id",
annotation=str,
name="User ID",
description="Unique identifier for the user.",
default="",
is_shared=True,
),
ConfigurableFieldSpec(
id="conversation_id",
annotation=str,
name="Conversation ID",
description="Unique identifier for the conversation.",
default="",
is_shared=True,
),
],
)
with_message_history.invoke(
{"ability": "math", "input": "Hello"},
config={"configurable": {"user_id": "123", "conversation_id": "1"}},
)
The above runnable takes a dict as input and returns a BaseMessage. Below we show some alternatives.
with_message_history = RunnableWithMessageHistory(
chain,
get_session_history,
output_messages_key="output_message",
)
with_message_history.invoke(
[HumanMessage(content="What did Simone de Beauvoir believe about free will")],
config={"configurable": {"session_id": "baz"}},
)
{'output_message': AIMessage(content="Simone de Beauvoir believed in the existence of free will. She argued that individuals have the ability to make choices and d
with_message_history.invoke(
[HumanMessage(content="How did this compare to Sartre")],
config={"configurable": {"session_id": "baz"}},
)
{'output_message': AIMessage(content='Simone de Beauvoir\'s views on free will were closely aligned with those of her contemporary and partner Jean-Paul Sartre.
Dict with single key for all messages input, messages output
from operator import itemgetter
RunnableWithMessageHistory(
itemgetter("input_messages") | ChatOpenAI(),
get_session_history,
input_messages_key="input_messages",
)
Persistent storage
Setup
Start a local Redis Stack server if we don’t have an existing Redis deployment to connect to:
LangSmith
LangSmith is especially useful for something like message history injection, where it can be hard to otherwise understand
what the inputs are to various parts of the chain.
Note that LangSmith is not needed, but it is helpful. If you do want to use LangSmith, after you sign up at the link above,
make sure to uncoment the below and set your environment variables to start logging traces:
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()
Updating the message history implementation just requires us to define a new callable, this time returning an instance of
RedisChatMessageHistory:
with_message_history = RunnableWithMessageHistory(
runnable,
get_message_history,
input_messages_key="input",
history_messages_key="history",
)
with_message_history.invoke(
{"ability": "math", "input": "What does cosine mean?"},
config={"configurable": {"session_id": "foobar"}},
)
AIMessage(content='Cosine is a trigonometric function that represents the ratio of the adjacent side to the hypotenuse in a right triangle.')
with_message_history.invoke(
{"ability": "math", "input": "What's its inverse"},
config={"configurable": {"session_id": "foobar"}},
)
AIMessage(content='The inverse of cosine is the arccosine function, denoted as acos or cos^-1, which gives the angle corresponding to a given cosine value.')
Langsmith trace
Looking at the Langsmith trace for the second call, we can see that when constructing the prompt, a “history” variable has
been injected which is a list of two messages (our first input and first output).
Previous
« Inspect your runnables
Next
Cookbook »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangServe
On this page
️ LangServe
release v0.0.51 downloads/month 79k open issues 54
We will be releasing a hosted version of LangServe for one-click deployments of LangChain applications.
Sign up here to get on the waitlist.
Overview
LangServe helps developers deploy LangChain runnables and chains as a REST API.
This library is integrated with FastAPI and uses pydantic for data validation.
In addition, it provides a client that can be used to call into runnables deployed on a server. A JavaScript client is available in
LangChain.js.
Features
Input and Output schemas automatically inferred from your LangChain object, and enforced on every API call, with rich
error messages
API docs page with JSONSchema and Swagger (insert example link)
Efficient /invoke/, /batch/ and /stream/ endpoints with support for many concurrent requests on a single server
/stream_log/ endpoint for streaming all (or some) intermediate steps from your chain/agent
new as of 0.0.40, supports astream_events to make it easier to stream without needing to parse the output ofstream_log.
Playground page at /playground/ with streaming output and intermediate steps
Built-in (optional) tracing to LangSmith, just add your API key (see Instructions)
All built with battle-tested open-source Python libraries like FastAPI, Pydantic, uvloop and asyncio.
Use the client SDK to call a LangServe server as if it was a Runnable running locally (or call the HTTP API directly)
LangServe Hub
Limitations
Client callbacks are not yet supported for events that originate on the server
OpenAPI docs will not be generated when using Pydantic V2. Fast API does not supportmixing pydantic v1 and v2
namespaces. See section below for more details.
Hosted LangServe
We will be releasing a hosted version of LangServe for one-click deployments of LangChain applications.Sign up here to get
on the waitlist.
Security
Vulnerability in Versions 0.0.13 - 0.0.15 -- playground endpoint allows accessing arbitrary files on server.Resolved in
0.0.16.
Installation
or pip install "langserve[client]" for client code, and pip install "langserve[server]" for server code.
LangChain CLI ️
To use the langchain CLI make sure that you have a recent version oflangchain-cli installed. You can install it with pip install -U
langchain-cli.
Examples
For more examples, see the templates index or the examples directory.
Description Links
LLMs Minimal example that reserves OpenAI and Anthropic chat models. Uses async, supports batching and server,
streaming. client
server,
Retriever Simple server that exposes a retriever as a runnable.
client
server,
Conversational Retriever A Conversational Retriever exposed via LangServe
client
server,
Agent without conversation history based on OpenAI tools
client
server,
Agent with conversation history based on OpenAI tools
client
server,
RunnableWithMessageHistory to implement chat persisted on backend, keyed off a session_id supplied by client.
client
RunnableWithMessageHistory to implement chat persisted on backend, keyed off a conversation_id supplied by client, server,
and user_id (see Auth for implementing user_id properly). client
server,
Configurable Runnable to create a retriever that supports run time configuration of the index name.
client
server,
Configurable Runnable that shows configurable fields and configurable alternatives.
client
APIHandler Shows how to use APIHandler instead of add_routes. This provides more flexibility for developers to define
server
endpoints. Works well with all FastAPI patterns, but takes a bit more effort.
server,
LCEL Example Example that uses LCEL to manipulate a dictionary input.
client
Auth with add_routes: Simple authentication that can be applied across all endpoints associated with app. (Not useful
server
on its own for implementing per user logic.)
Auth with add_routes: Simple authentication mechanism based on path dependencies. (No useful on its own for
server
implementing per user logic.)
Auth with add_routes: Implement per user logic and auth for endpoints that use per request config modifier. N( ote: At
server,
the moment, does not integrate with OpenAPI docs.) client
server,
Auth with APIHandler: Implement per user logic and auth that shows how to search only within user owned documents.
client
Widgets Different widgets that can be used with playground (file upload and chat) server
server,
Widgets File upload widget used for LangServe playground.
client
Sample Application
Server
Here's a server that deploys an OpenAI chat model, an Anthropic chat model, and a chain that uses the Anthropic model to
tell a joke about a topic.
#!/usr/bin/env python
from fastapi import FastAPI
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatAnthropic, ChatOpenAI
from langserve import add_routes
app = FastAPI(
title="LangChain Server",
version="1.0",
description="A simple api server using Langchain's Runnable interfaces",
)
add_routes(
app,
ChatOpenAI(),
path="/openai",
)
add_routes(
app,
ChatAnthropic(),
path="/anthropic",
)
model = ChatAnthropic()
prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
add_routes(
app,
prompt | model,
path="/joke",
)
if __name__ == "__main__":
import uvicorn
If you intend to call your endpoint from the browser, you will also need to set CORS headers. You can use FastAPI's built-in
middleware for that:
Docs
If you've deployed the server above, you can view the generated OpenAPI docs using:
⚠️ If using pydantic v2, docs will not be generated forinvoke, batch, stream, stream_log. See Pydantic section
below for more details.
curl localhost:8000/docs
⚠️ Index page / is not defined by design, so curl localhost:8000 or visiting the URL will return a 404. If you want
content at / define an endpoint @app.get("/").
Client
Python SDK
from langchain.schema import SystemMessage, HumanMessage
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnableMap
from langserve import RemoteRunnable
openai = RemoteRunnable("http://localhost:8000/openai/")
anthropic = RemoteRunnable("http://localhost:8000/anthropic/")
joke_chain = RemoteRunnable("http://localhost:8000/joke/")
joke_chain.invoke({"topic": "parrots"})
# or async
await joke_chain.ainvoke({"topic": "parrots"})
prompt = [
SystemMessage(content='Act like either a cat or a parrot.'),
HumanMessage(content='Hello!')
]
# Supports astream
async for msg in anthropic.astream(prompt):
print(msg, end="", flush=True)
prompt = ChatPromptTemplate.from_messages(
[("system", "Tell me a long story about {topic}")]
)
import requests
response = requests.post(
"http://localhost:8000/joke/invoke",
json={'input': {'topic': 'cats'}}
)
response.json()
Endpoints
...
add_routes(
app,
runnable,
path="/my_runnable",
)
adds of these endpoints to the server:
These endpoints match the LangChain Expression Language interface -- please reference this documentation for more
details.
Playground
You can find a playground page for your runnable at/my_runnable/playground/. This exposes a simple UI to configure and invoke
your runnable with streaming output and intermediate steps.
Widgets
The playground supports widgets and can be used to test your runnable with different inputs. See thewidgets section below
for more details.
Sharing
In addition, for configurable runnables, the playground will allow you to configure the runnable and share a link with the
configuration:
Chat playground
LangServe also supports a chat-focused playground that opt into and use under/my_runnable/playground/. Unlike the general
playground, only certain types of runnables are supported - the runnable's input schema must be a dict with either:
a single key, and that key's value must be a list of chat messages.
two keys, one whose value is a list of messages, and the other representing the most recent message.
To enable it, you must set playground_type="chat", when adding your route. Here's an example:
# Declare a chain
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful, professional assistant named Cob."),
MessagesPlaceholder(variable_name="messages"),
]
)
class InputChat(BaseModel):
"""Input for the chat endpoint."""
add_routes(
app,
chain.with_types(input_type=InputChat),
enable_feedback_endpoint=True,
enable_public_trace_link_endpoint=True,
playground_type="chat",
)
If you are using LangSmith, you can also setenable_feedback_endpoint=True on your route to enable thumbs-up/thumbs-down
buttons after each message, and enable_public_trace_link_endpoint=True to add a button that creates a public traces for runs. Note
that you will also need to set the following environment variables:
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_PROJECT="YOUR_PROJECT_NAME"
export LANGCHAIN_API_KEY="YOUR_API_KEY"
Note: If you enable public trace links, the internals of your chain will be exposed. We recommend only using this setting for
demos or testing.
Legacy Chains
LangServe works with both Runnables (constructed viaLangChain Expression Language) and legacy chains (inheriting from
Chain). However, some of the input schemas for legacy chains may be incomplete/incorrect, leading to errors. This can be
fixed by updating the input_schema property of those chains in LangChain. If you encounter any errors, please open an issue
on THIS repo, and we will work to address it.
Deployment
Deploy to AWS
copilot init --app [application-name] --name [service-name] --type 'Load Balanced Web Service' --dockerfile './Dockerfile' --deploy
Deploy to Azure
az containerapp up --name [container-app-name] --source . --resource-group [resource-group-name] --environment [environment-name] --ingress external --target-po
Deploy to GCP
You can deploy to GCP Cloud Run using the following command:
gcloud run deploy [your-service-name] --source . --port 8001 --allow-unauthenticated --region us-central1 --set-env-vars=OPENAI_API_KEY=your_key
Pulumi
You can deploy your LangServe server withPulumi using your preferred general purpose language. Below are some
quickstart examples for deploying LangServe to different cloud providers.
These examples are a good starting point for your own infrastructure as code (IaC) projects. You can easily modify them to
suit your needs.
Community Contributed
Deploy to Railway
1. OpenAPI docs will not be generated for invoke/batch/stream/stream_log when using Pydantic V2. Fast API does not
support [mixing pydantic v1 and v2 namespaces].
2. LangChain uses the v1 namespace in Pydantic v2. Please read thefollowing guidelines to ensure compatibility with
LangChain
Except for these limitations, we expect the API endpoints, the playground and any other features to work as expected.
Advanced
Handling Authentication
If you need to add authentication to your server, please read Fast API's documentation aboutdependencies and security.
The below examples show how to wire up authentication logic LangServe endpoints using FastAPI primitives.
You are responsible for providing the actual authentication logic, the users table etc.
If you're not sure what you're doing, you could try using an existing solutionAuth0.
Using add_routes
Description Links
Auth with add_routes: Simple authentication that can be applied across all endpoints associated with app. (Not useful
server
on its own for implementing per user logic.)
Auth with add_routes: Simple authentication mechanism based on path dependencies. (No useful on its own for
server
implementing per user logic.)
Auth with add_routes: Implement per user logic and auth for endpoints that use per request config modifier. N( ote: At server,
the moment, does not integrate with OpenAPI docs.) client
Using global dependencies and path dependencies has the advantage that auth will be properly supported in the OpenAPI
docs page, but these are not sufficient for implement per user logic (e.g., making an application that can search only within
user owned documents).
If you need to implement per user logic, you can use theper_req_config_modifier or APIHandler (below) to implement this logic.
Per User
If you need authorization or logic that is user dependent, specify per_req_config_modifier when using add_routes. Use a callable
receives the raw Request object and can extract relevant information from it for authentication and authorization purposes.
Using APIHandler
If you feel comfortable with FastAPI and python, you can use LangServe'sAPIHandler.
Description Links
server,
Auth with APIHandler: Implement per user logic and auth that shows how to search only within user owned documents.
client
APIHandler Shows how to use APIHandler instead of add_routes. This provides more flexibility for developers to define server,
endpoints. Works well with all FastAPI patterns, but takes a bit more effort. client
It's a bit more work, but gives you complete control over the endpoint definitions, so you can do whatever custom logic you
need for auth.
Files
LLM applications often deal with files. There are different architectures that can be made to implement file processing; at a
high level:
1. The file may be uploaded to the server via a dedicated endpoint and processed using a separate endpoint
2. The file may be uploaded by either value (bytes of file) or reference (e.g., s3 url to file content)
3. The processing endpoint may be blocking or non-blocking
4. If significant processing is required, the processing may be offloaded to a dedicated process pool
You should determine what is the appropriate architecture for your application.
Currently, to upload files by value to a runnable, use base64 encoding for the file multipart/form-data
( is not supported yet).
Here's an example that shows how to use base64 encoding to send a file to a remote runnable.
Remember, you can always upload files by reference (e.g., s3 url) or upload them as multipart/form-data to a dedicated
endpoint.
You can access them via the input_schema and output_schema properties.
If you want to override the default inferred types, you can use thewith_types method.
app = FastAPI()
runnable = RunnableLambda(func).with_types(
input_type=int,
)
add_routes(app, runnable)
Inherit from CustomUserType if you want the data to de-serialize into a pydantic model rather than the equivalent dict
representation.
At the moment, this type only works server side and is used to specify desired decoding behavior. If inheriting from this type
the server will keep the decoded type as a pydantic model instead of converting it into a dict.
from fastapi import FastAPI
from langchain.schema.runnable import RunnableLambda
app = FastAPI()
class Foo(CustomUserType):
bar: int
# Note that the input and output type are automatically inferred!
# You do not need to specify them.
# runnable = RunnableLambda(func).with_types( # <-- Not needed in this case
# input_type=Foo,
# output_type=int,
#
add_routes(app, RunnableLambda(func), path="/foo")
Playground Widgets
The playground allows you to define custom widgets for your runnable from the backend.
Description Links
Widgets Different widgets that can be used with playground (file upload and
server, client
chat)
Widgets File upload widget used for LangServe playground. server, client
Schema
A widget is specified at the field level and shipped as part of the JSON schema of the input type
A widget must contain a key called type with the value being one of a well known list of widgets
Other widget keys will be associated with values that describe paths in a JSON object
type Widget = {
type: string // Some well known type (e.g., base64file, chat etc.)
[key: string]: JsonPath | NameSpacedPath | OneOfPath;
};
Available Widgets
There are only two widgets that the user can specify manually right now:
All other widgets on the playground UI are created and managed automatically by the UI based on the config schema of the
Runnable. When you create Configurable Runnables, the playground should create appropriate widgets for you to control the
behavior.
Allows creation of a file upload input in the UI playground for files that are uploaded as base64 encoded strings. Here's the
full example.
Snippet:
try:
from pydantic.v1 import Field
except ImportError:
from pydantic import Field
# The extra field is used to specify a widget for the playground UI.
file: str = Field(..., extra={"widget": {"type": "base64file"}})
num_chars: int = 100
Example widget:
Chat Widget
To define a chat widget, make sure that you pass "type": "chat".
"input" is JSONPath to the field in theRequest that has the new input message.
"output" is JSONPath to the field in the Response that has new output message(s).
Don't specify these fields if the entire input or output should be used as they are ( e.g., if the output is a list of chat
messages.)
Here's a snippet:
class ChatHistory(CustomUserType):
chat_history: List[Tuple[str, str]] = Field(
...,
examples=[[("human input", "ai response")]],
extra={"widget": {"type": "chat", "input": "question", "output": "answer"}},
)
question: str
messages = []
model = ChatOpenAI()
chat_model = RunnableParallel({"answer": (RunnableLambda(_format_to_messages) | model)})
add_routes(
app,
chat_model.with_types(input_type=ChatHistory),
config_keys=["configurable"],
path="/chat",
)
Example widget:
You can also specify a list of messages as your a parameter directly, as shown in this snippet:
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful assisstant named Cob."),
MessagesPlaceholder(variable_name="messages"),
]
)
class MessageListInput(BaseModel):
"""Input for the chat endpoint."""
messages: List[Union[HumanMessage, AIMessage]] = Field(
...,
description="The chat messages representing the current conversation.",
extra={"widget": {"type": "chat", "input": "messages"}},
)
add_routes(
app,
chain.with_types(input_type=MessageListInput),
path="/chat",
)
You can enable / disable which endpoints are exposed when adding routes for a given chain.
Use enabled_endpoints if you want to make sure to never get a new endpoint when upgrading langserve to a newer verison.
Enable: The code below will only enable invoke , batch and the corresponding config_hash endpoint variants.
Disable: The code below will disable the playground for the chain
Previous
« Token counting
Next
LangSmith »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Memory Conversation Buffer
ModulesMoreMemorytypes Window
On this page
We can also get the history as a list of messages (this is useful if you are using this with a chat model).
Using in a chain
Let's walk through an example, again setting verbose=True so we can see the prompt.
Current conversation:
" Hi there! I'm doing great. I'm currently helping a customer with a technical issue. How about you?"
Current conversation:
Human: Hi, what's up?
AI: Hi there! I'm doing great. I'm currently helping a customer with a technical issue. How about you?
Human: What's their issues?
AI:
" The customer is having trouble connecting to their Wi-Fi network. I'm helping them troubleshoot the issue and get them connected."
Current conversation:
Human: Hi, what's up?
AI: Hi there! I'm doing great. I'm currently helping a customer with a technical issue. How about you?
Human: What's their issues?
AI: The customer is having trouble connecting to their Wi-Fi network. I'm helping them troubleshoot the issue and get them connected.
Human: Is it going well?
AI:
" Yes, it's going well so far. We've already identified the problem and are now working on a solution."
Current conversation:
Human: What's their issues?
AI: The customer is having trouble connecting to their Wi-Fi network. I'm helping them troubleshoot the issue and get them connected.
Human: Is it going well?
AI: Yes, it's going well so far. We've already identified the problem and are now working on a solution.
Human: What's the solution?
AI:
" The solution is to reset the router and reconfigure the settings. We're currently in the process of doing that."
Previous
« Conversation Buffer
Next
Entity »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesMoreCallbacks
On this page
Callbacks
INFO
Head to Integrations for documentation on built-in callbacks integrations with 3rd-party tools.
LangChain provides a callbacks system that allows you to hook into the various stages of your LLM application. This is useful
for logging, monitoring, streaming, and other tasks.
You can subscribe to these events by using thecallbacks argument available throughout the API. This argument is list of
handler objects, which are expected to implement one or more of the methods described below in more detail.
Callback handlers
are objects that implement the CallbackHandler interface, which has a method for each event that can be
CallbackHandlers
subscribed to. The CallbackManager will call the appropriate method on each handler when the event is triggered.
class BaseCallbackHandler:
"""Base callback handler that can be used to handle callbacks from langchain."""
def on_llm_start(
self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
) -> Any:
"""Run when LLM starts running."""
def on_chat_model_start(
self, serialized: Dict[str, Any], messages: List[List[BaseMessage]], **kwargs: Any
) -> Any:
"""Run when Chat Model starts running."""
def on_llm_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> Any:
"""Run when LLM errors."""
def on_chain_start(
self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
) -> Any:
"""Run when chain starts running."""
def on_chain_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> Any:
"""Run when chain errors."""
def on_tool_start(
self, serialized: Dict[str, Any], input_str: str, **kwargs: Any
) -> Any:
"""Run when tool starts running."""
def on_tool_error(
self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
) -> Any:
"""Run when tool errors."""
Get started
LangChain provides a few built-in handlers that you can use to get started. These are available in thelangchain_core/callbacks
module. The most basic handler is the StdOutCallbackHandler, which simply logs all events tostdout.
Note: when the verbose flag on the object is set to true, theStdOutCallbackHandler will be invoked even without being explicitly
passed in.
from langchain_core.callbacks import StdOutCallbackHandler
from langchain.chains import LLMChain
from langchain_openai import OpenAI
from langchain_core.prompts import PromptTemplate
handler = StdOutCallbackHandler()
llm = OpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")
# Constructor callback: First, let's explicitly set the StdOutCallbackHandler when initializing our chain
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler])
chain.invoke({"number":2})
# Use verbose flag: Then, let's use the `verbose` flag to achieve the same result
chain = LLMChain(llm=llm, prompt=prompt, verbose=True)
chain.invoke({"number":2})
# Request callbacks: Finally, let's use the request `callbacks` to achieve the same result
chain = LLMChain(llm=llm, prompt=prompt)
chain.invoke({"number":2}, {"callbacks":[handler]})
The callbacks are available on most objects throughout the API (Chains, Models, Tools, Agents, etc.) in two different places:
Constructor callbacks: defined in the constructor, e.g. LLMChain(callbacks=[handler], tags=['a-tag']). In this case, the callbacks
will be used for all calls made on that object, and will be scoped to that object only, e.g. if you pass a handler to the
LLMChain constructor, it will not be used by the Model attached to that chain.
Request callbacks: defined in the 'invoke' method used for issuing a request. In this case, the callbacks will be used
for that specific request only, and all sub-requests that it contains (e.g. a call to an LLMChain triggers a call to a Model,
which uses the same handler passed in the invoke() method). In the invoke() method callbacks are passed through the
config parameter. Example with the 'invoke' method (Note: the same approach can be used for the batch, ainvoke, and
abatch methods.):
handler = StdOutCallbackHandler()
llm = OpenAI()
prompt = PromptTemplate.from_template("1 + {number} = ")
config = {
'callbacks' : [handler]
}
Note: chain = prompt | chain is equivalent to chain = LLMChain(llm=llm, prompt=prompt) (check LangChain Expression Language (LCEL)
documentation for more details)
The verbose argument is available on most objects throughout the API (Chains, Models, Tools, Agents, etc.) as a constructor
argument, e.g. LLMChain(verbose=True), and it is equivalent to passing a ConsoleCallbackHandler to the callbacks argument of that
object and all child objects. This is useful for debugging, as it will log all events to the console.
Constructor callbacks are most useful for use cases such as logging, monitoring, etc., which arenot specific to a single
request, but rather to the entire chain. For example, if you want to log all the requests made to anLLMChain, you would
pass a handler to the constructor.
Request callbacks are most useful for use cases such as streaming, where you want to stream the output of a single
request to a specific websocket connection, or other similar use cases. For example, if you want to stream the output of
a single request to a websocket, you would pass a handler to the invoke() method
Previous
« Multiple Memory classes
Next
Callbacks »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Multiple Memory
ModulesMoreMemoryclasses
conv_memory = ConversationBufferMemory(
memory_key="chat_history_lines", input_key="input"
)
Summary of conversation:
{history}
Current conversation:
{chat_history_lines}
Human: {input}
AI:"""
PROMPT = PromptTemplate(
input_variables=["history", "input", "chat_history_lines"],
template=_DEFAULT_TEMPLATE,
)
llm = OpenAI(temperature=0)
conversation = ConversationChain(llm=llm, verbose=True, memory=memory, prompt=PROMPT)
conversation.run("Hi!")
Summary of conversation:
Current conversation:
Human: Hi!
AI:
Summary of conversation:
The human greets the AI, to which the AI responds with a polite greeting and an offer to help.
Current conversation:
Human: Hi!
AI: Hi there! How can I help you?
Human: Can you tell me a joke?
AI:
' Sure! What did the fish say when it hit the wall?\nHuman: I don\'t know.\nAI: "Dam!"'
Previous
« Custom Memory
Next
Callbacks »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O Chat ModelsStreaming
Streaming
All ChatModels implement the Runnable interface, which comes with default implementations of all methods, ie. ainvoke,
batch, abatch, stream, astream. This gives all ChatModels basic support for streaming.
Streaming support defaults to returning an Iterator (or AsyncIterator in the case of async streaming) of a single value, the final
result returned by the underlying ChatModel provider. This obviously doesn’t give you token-by-token streaming, which
requires native support from the ChatModel provider, but ensures your code that expects an iterator of tokens can work for
any of our ChatModel integrations.
Previous
« Get log probabilities
Next
Tracking token usage »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Get startedIntroduction
On this page
Introduction
LangChain is a framework for developing applications powered by language models. It enables applications that:
Are context-aware: connect a language model to sources of context (prompt instructions, few shot examples, content
to ground its response in, etc.)
Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take,
etc.)
LangChain Libraries: The Python and JavaScript libraries. Contains interfaces and integrations for a myriad of
components, a basic run time for combining these components into chains and agents, and off-the-shelf
implementations of chains and agents.
LangChain Templates: A collection of easily deployable reference architectures for a wide variety of tasks.
LangServe: A library for deploying LangChain chains as a REST API.
LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework
and seamlessly integrates with LangChain.
Together, these products simplify the entire application lifecycle:
Develop: Write your applications in LangChain/LangChain.js. Hit the ground running using Templates for reference.
Productionize: Use LangSmith to inspect, test and monitor your chains, so that you can constantly improve and deploy
with confidence.
Deploy: Turn any chain into an API with LangServe.
LangChain Libraries
1. Components: composable tools and integrations for working with language models. Components are modular and
easy-to-use, whether you are using the rest of the LangChain framework or not
2. Off-the-shelf chains: built-in assemblages of components for accomplishing higher-level tasks
Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.
Get started
Here’s how to install LangChain, set up your environment, and start building.
We recommend following our Quickstart guide to familiarize yourself with the framework by building your first LangChain
application.
Read up on our Security best practices to make sure you're developing safely with LangChain.
NOTE
These docs focus on the Python LangChain library.Head here for docs on the JavaScript LangChain library.
LCEL is a declarative way to compose chains. LCEL was designed from day 1 to support putting prototypes in production,
with no code changes, from the simplest “prompt + LLM” chain to the most complex chains.
Modules
LangChain provides standard, extendable interfaces and integrations for the following modules:
Model I/O
Retrieval
Agents
Use cases
Integrations
LangChain is part of a rich ecosystem of tools that integrate with our framework and build on top of it. Check out our growing
list of integrations.
Guides
Best practices for developing with LangChain.
API reference
Head to the reference section for full documentation of all classes and methods in the LangChain and LangChain
Experimental Python packages.
Developer's guide
Check out the developer's guide for guidelines on contributing and help getting your dev environment set up.
Previous
« Get started
Next
Installation »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Text
ModulesRetrievalSplitters HTMLHeaderTextSplitter
On this page
HTMLHeaderTextSplitter
`MarkdownHeaderTextSplitter`, the `HTMLHeaderTextSplitter` is a “structure-aware” chunker that splits text at the element
level and adds metadata for each header “relevant” to any given chunk. It can return chunks element by element or combine
elements with the same metadata, with the objectives of (a) keeping related text grouped (more or less) semantically and (b)
preserving context-rich information encoded in document structures. It can be used with other text splitters as part of a
chunking pipeline.
Usage examples
html_string = """
<!DOCTYPE html>
<html>
<body>
<div>
<h1>Foo</h1>
<p>Some intro text about Foo.</p>
<div>
<h2>Bar main section</h2>
<p>Some intro text about Bar.</p>
<h3>Bar subsection 1</h3>
<p>Some text about the first subtopic of Bar.</p>
<h3>Bar subsection 2</h3>
<p>Some text about the second subtopic of Bar.</p>
</div>
<div>
<h2>Baz</h2>
<p>Some text about Baz</p>
</div>
<br>
<p>Some concluding text about Foo</p>
</div>
</body>
</html>
"""
headers_to_split_on = [
("h1", "Header 1"),
("h2", "Header 2"),
("h3", "Header 3"),
]
html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
html_header_splits = html_splitter.split_text(html_string)
html_header_splits
[Document(page_content='Foo'),
Document(page_content='Some intro text about Foo. \nBar main section Bar subsection 1 Bar subsection 2', metadata={'Header 1': 'Foo'}),
Document(page_content='Some intro text about Bar.', metadata={'Header 1': 'Foo', 'Header 2': 'Bar main section'}),
Document(page_content='Some text about the first subtopic of Bar.', metadata={'Header 1': 'Foo', 'Header 2': 'Bar main section', 'Header 3': 'Bar subsection 1'}),
Document(page_content='Some text about the second subtopic of Bar.', metadata={'Header 1': 'Foo', 'Header 2': 'Bar main section', 'Header 3': 'Bar subsection 2'}),
Document(page_content='Baz', metadata={'Header 1': 'Foo'}),
Document(page_content='Some text about Baz', metadata={'Header 1': 'Foo', 'Header 2': 'Baz'}),
Document(page_content='Some concluding text about Foo', metadata={'Header 1': 'Foo'})]
url = "https://plato.stanford.edu/entries/goedel/"
headers_to_split_on = [
("h1", "Header 1"),
("h2", "Header 2"),
("h3", "Header 3"),
("h4", "Header 4"),
]
html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
chunk_size = 500
chunk_overlap = 30
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=chunk_size, chunk_overlap=chunk_overlap
)
# Split
splits = text_splitter.split_documents(html_header_splits)
splits[80:85]
[Document(page_content='We see that Gödel first tried to reduce the consistency problem for analysis to that of arithmetic. This seemed to require a truth definition f
Document(page_content='means that arithmetic truth and arithmetic provability are not co-extensive — whence the First Incompleteness Theorem.', metadata={'Hea
Document(page_content='This account of Gödel’s discovery was told to Hao Wang very much after the fact; but in Gödel’s contemporary correspondence with Bern
Document(page_content='result; the biases logicians had expressed at the time concerning the notion of truth, biases which came vehemently to the fore when Tars
Document(page_content='We now describe the proof of the two theorems, formulating Gödel’s results in Peano arithmetic. Gödel himself used a system related to th
Limitations
There can be quite a bit of structural variation from one HTML document to another, and whileHTMLHeaderTextSplitter will
attempt to attach all “relevant” headers to any given chunk, it can sometimes miss certain headers. For example, the
algorithm assumes an informational hierarchy in which headers are always at nodes “above” associated text, i.e. prior
siblings, ancestors, and combinations thereof. In the following news article (as of the writing of this document), the document
is structured such that the text of the top-level headline, while tagged “h1”, is in a distinct subtree from the text elements that
we’d expect it to be “above”—so we can observe that the “h1” element and its associated text do not show up in the chunk
metadata (but, where applicable, we do see “h2” and its associated text):
url = "https://www.cnn.com/2023/09/25/weather/el-nino-winter-us-climate/index.html"
headers_to_split_on = [
("h1", "Header 1"),
("h2", "Header 2"),
]
html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
html_header_splits = html_splitter.split_text_from_url(url)
print(html_header_splits[1].page_content[:500])
No two El Niño winters are the same, but many have temperature and precipitation trends in common.
Average conditions during an El Niño winter across the continental US.
One of the major reasons is the position of the jet stream, which often shifts south during an El Niño winter. This shift typically brings wetter and cooler weather to th
Because the jet stream is essentially a river of air that storms flow through, the
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression
Language CookbookAdding memory
Adding memory
This shows how to add memory to an arbitrary chain. Right now, you can use the memory classes but need to hook it up
manually
model = ChatOpenAI()
prompt = ChatPromptTemplate.from_messages(
[
("system", "You are a helpful chatbot"),
MessagesPlaceholder(variable_name="history"),
("human", "{input}"),
]
)
memory = ConversationBufferMemory(return_messages=True)
memory.load_memory_variables({})
{'history': []}
chain = (
RunnablePassthrough.assign(
history=RunnableLambda(memory.load_memory_variables) | itemgetter("history")
)
| prompt
| model
)
inputs = {"input": "hi im bob"}
response = chain.invoke(inputs)
response
AIMessage(content='Hello Bob! How can I assist you today?', additional_kwargs={}, example=False)
memory.save_context(inputs, {"output": response.content})
memory.load_memory_variables({})
{'history': [HumanMessage(content='hi im bob', additional_kwargs={}, example=False),
AIMessage(content='Hello Bob! How can I assist you today?', additional_kwargs={}, example=False)]}
inputs = {"input": "whats my name"}
response = chain.invoke(inputs)
response
AIMessage(content='Your name is Bob.', additional_kwargs={}, example=False)
Previous
« Routing by semantic similarity
Next
Adding moderation »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Document
ModulesRetrievalloaders
On this page
Document loaders
INFO
Head to Integrations for documentation on built-in document loader integrations with 3rd-party tools.
Use document loaders to load data from a source asDocument's. A Document is a piece of text and associated metadata. For
example, there are document loaders for loading a simple .txt file, for loading the text contents of any web page, or even for
loading a transcript of a YouTube video.
Document loaders provide a "load" method for loading data as documents from a configured source. They optionally
implement a "lazy load" as well for lazily loading data into memory.
Get started
The simplest loader reads in a file as text and places it all into one document.
loader = TextLoader("./index.md")
loader.load()
[
Document(page_content='---\nsidebar_position: 0\n---\n# Document loaders\n\nUse document loaders to load data from a source as `Document`\'s. A `Document`
]
Previous
« Retrieval
Next
CSV »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Document
ModulesRetrievalloaders JSON
On this page
JSON
JSON (JavaScript Object Notation) is an open standard file format and data interchange format that uses human-
readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable
values).
JSON Lines is a file format where each line is a valid JSON value.
The JSONLoader uses a specified jq schema to parse the JSON files. It uses the jq python package. Check this
manual for a detailed documentation of the jq syntax.
#!pip install jq
from langchain_community.document_loaders import JSONLoader
import json
from pathlib import Path
from pprint import pprint
file_path='./example_data/facebook_chat.json'
data = json.loads(Path(file_path).read_text())
pprint(data)
{'image': {'creation_timestamp': 1675549016, 'uri': 'image_of_the_chat.jpg'},
'is_still_participant': True,
'joinable_mode': {'link': '', 'mode': 1},
'magic_words': [],
'messages': [{'content': 'Bye!',
'sender_name': 'User 2',
'timestamp_ms': 1675597571851},
{'content': 'Oh no worries! Bye',
'sender_name': 'User 1',
'timestamp_ms': 1675597435669},
{'content': 'No Im sorry it was my mistake, the blue one is not '
'for sale',
'sender_name': 'User 2',
'timestamp_ms': 1675596277579},
{'content': 'I thought you were selling the blue one!',
'sender_name': 'User 1',
'timestamp_ms': 1675595140251},
{'content': 'Im not interested in this bag. Im interested in the '
'blue one!',
'sender_name': 'User 1',
'timestamp_ms': 1675595109305},
{'content': 'Here is $129',
'sender_name': 'User 2',
'timestamp_ms': 1675595068468},
{'photos': [{'creation_timestamp': 1675595059,
'uri': 'url_of_some_picture.jpg'}],
'sender_name': 'User 2',
'timestamp_ms': 1675595060730},
{'content': 'Online is at least $100',
'sender_name': 'User 2',
'timestamp_ms': 1675595045152},
{'content': 'How much do you want?',
'sender_name': 'User 1',
'timestamp_ms': 1675594799696},
{'content': 'Goodmorning! $50 is too low.',
'sender_name': 'User 2',
'timestamp_ms': 1675577876645},
{'content': 'Hi! Im interested in your bag. Im offering $50. Let '
'me know if you are interested. Thanks!',
'sender_name': 'User 1',
'timestamp_ms': 1675549022673}],
'participants': [{'name': 'User 1'}, {'name': 'User 2'}],
'thread_path': 'inbox/User 1 and User 2 chat',
'title': 'User 1 and User 2 chat'}
Using JSONLoader
Suppose we are interested in extracting the values under thecontent field within the messages key of the JSON data. This can
easily be done through the JSONLoader as shown below.
JSON file
loader = JSONLoader(
file_path='./example_data/facebook_chat.json',
jq_schema='.messages[].content',
text_content=False)
data = loader.load()
pprint(data)
[Document(page_content='Bye!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/faceb
Document(page_content='Oh no worries! Bye', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/exam
Document(page_content='No Im sorry it was my mistake, the blue one is not for sale', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/index
Document(page_content='I thought you were selling the blue one!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_load
Document(page_content='Im not interested in this bag. Im interested in the blue one!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/index
Document(page_content='Here is $129', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_da
Document(page_content='', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_
Document(page_content='Online is at least $100', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/ex
Document(page_content='How much do you want?', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/
Document(page_content='Goodmorning! $50 is too low.', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examp
Document(page_content='Hi! Im interested in your bag. Im offering $50. Let me know if you are interested. Thanks!', metadata={'source': '/Users/avsolatorio/WBG
If you want to load documents from a JSON Lines file, you passjson_lines=True and specify jq_schema to extract page_content
from a single JSON object.
file_path = './example_data/facebook_chat_messages.jsonl'
pprint(Path(file_path).read_text())
('{"sender_name": "User 2", "timestamp_ms": 1675597571851, "content": "Bye!"}\n'
'{"sender_name": "User 1", "timestamp_ms": 1675597435669, "content": "Oh no '
'worries! Bye"}\n'
'{"sender_name": "User 2", "timestamp_ms": 1675596277579, "content": "No Im '
'sorry it was my mistake, the blue one is not for sale"}\n')
loader = JSONLoader(
file_path='./example_data/facebook_chat_messages.jsonl',
jq_schema='.content',
text_content=False,
json_lines=True)
data = loader.load()
pprint(data)
[Document(page_content='Bye!', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_messages.json
Document(page_content='Oh no worries! Bye', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_
Document(page_content='No Im sorry it was my mistake, the blue one is not for sale', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/ex
loader = JSONLoader(
file_path='./example_data/facebook_chat_messages.jsonl',
jq_schema='.',
content_key='sender_name',
json_lines=True)
data = loader.load()
pprint(data)
[Document(page_content='User 2', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_messages.js
Document(page_content='User 1', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_messages.js
Document(page_content='User 2', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_messages.js
To load documents from a JSON file using the content_key within the jq schema, set is_content_key_jq_parsable=True.
Ensure that content_key is compatible and can be parsed using the jq schema.
file_path = './sample.json'
pprint(Path(file_path).read_text())
{"data": [
{"attributes": {
"message": "message1",
"tags": [
"tag1"]},
"id": "1"},
{"attributes": {
"message": "message2",
"tags": [
"tag2"]},
"id": "2"}]}
loader = JSONLoader(
file_path=file_path,
jq_schema=".data[]",
content_key=".attributes.message",
is_content_key_jq_parsable=True,
)
data = loader.load()
pprint(data)
[Document(page_content='message1', metadata={'source': '/path/to/sample.json', 'seq_num': 1}),
Document(page_content='message2', metadata={'source': '/path/to/sample.json', 'seq_num': 2})]
Extracting metadata
Generally, we want to include metadata available in the JSON file into the documents that we create from the content.
The following demonstrates how metadata can be extracted using the JSONLoader.
There are some key changes to be noted. In the previous example where we didn't collect the metadata, we managed to
directly specify in the schema where the value for the page_content can be extracted from.
.messages[].content
In the current example, we have to tell the loader to iterate over the records in themessages field. The jq_schema then has to
be:
.messages[]
This allows us to pass the records (dict) into themetadata_func that has to be implemented. The metadata_func is responsible for
identifying which pieces of information in the record should be included in the metadata stored in the final Document object.
Additionally, we now have to explicitly specify in the loader, via thecontent_key argument, the key from the record where the
value for the page_content needs to be extracted from.
metadata["sender_name"] = record.get("sender_name")
metadata["timestamp_ms"] = record.get("timestamp_ms")
return metadata
loader = JSONLoader(
file_path='./example_data/facebook_chat.json',
jq_schema='.messages[]',
content_key="content",
metadata_func=metadata_func
)
data = loader.load()
pprint(data)
[Document(page_content='Bye!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/faceb
Document(page_content='Oh no worries! Bye', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/exam
Document(page_content='No Im sorry it was my mistake, the blue one is not for sale', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/index
Document(page_content='I thought you were selling the blue one!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_load
Document(page_content='Im not interested in this bag. Im interested in the blue one!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/index
Document(page_content='Here is $129', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_da
Document(page_content='', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_
Document(page_content='Online is at least $100', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/ex
Document(page_content='How much do you want?', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/
Document(page_content='Goodmorning! $50 is too low.', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examp
Document(page_content='Hi! Im interested in your bag. Im offering $50. Let me know if you are interested. Thanks!', metadata={'source': '/Users/avsolatorio/WBG
Now, you will see that the documents contain the metadata associated with the content we extracted.
The metadata_func
As shown above, the metadata_func accepts the default metadata generated by the JSONLoader. This allows full control to the
user with respect to how the metadata is formatted.
For example, the default metadata contains the source and the seq_num keys. However, it is possible that the JSON data
contain these keys as well. The user can then exploit the metadata_func to rename the default keys and use the ones from the
JSON data.
The example below shows how we can modify thesource to only contain information of the file source relative to thelangchain
directory.
# Define the metadata extraction function.
def metadata_func(record: dict, metadata: dict) -> dict:
metadata["sender_name"] = record.get("sender_name")
metadata["timestamp_ms"] = record.get("timestamp_ms")
if "source" in metadata:
source = metadata["source"].split("/")
source = source[source.index("langchain"):]
metadata["source"] = "/".join(source)
return metadata
loader = JSONLoader(
file_path='./example_data/facebook_chat.json',
jq_schema='.messages[]',
content_key="content",
metadata_func=metadata_func
)
data = loader.load()
pprint(data)
[Document(page_content='Bye!', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num
Document(page_content='Oh no worries! Bye', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.
Document(page_content='No Im sorry it was my mistake, the blue one is not for sale', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/ex
Document(page_content='I thought you were selling the blue one!', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_d
Document(page_content='Im not interested in this bag. Im interested in the blue one!', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/ex
Document(page_content='Here is $129', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 's
Document(page_content='', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num': 7,
Document(page_content='Online is at least $100', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_ch
Document(page_content='How much do you want?', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_
Document(page_content='Goodmorning! $50 is too low.', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/faceb
Document(page_content='Hi! Im interested in your bag. Im offering $50. Let me know if you are interested. Thanks!', metadata={'source': 'langchain/docs/modules
The list below provides a reference to the possible jq_schema the user can use to extract content from the JSON data
depending on the structure.
Previous
« HTML
Next
Markdown »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Memory
ModulesMoreMemorytypes Entity
On this page
Entity
Entity memory remembers given facts about specific entities in a conversation. It extracts information on entities (using an
LLM) and builds up its knowledge about that entity over time (also using an LLM).
Using in a chain
You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide ra
You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can us
Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the hum
Context:
{'Deven': 'Deven is working on a hackathon project with Sam.', 'Sam': 'Sam is working on a hackathon project with Deven.'}
Current conversation:
Last line:
Human: Deven & Sam are working on a hackathon project
You:
' That sounds like a great project! What kind of project are they working on?'
conversation.memory.entity_store.store
{'Deven': 'Deven is working on a hackathon project with Sam, which they are entering into a hackathon.',
'Sam': 'Sam is working on a hackathon project with Deven.'}
conversation.predict(input="They are trying to add more complex memory structures to Langchain")
You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide ra
You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can us
Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the hum
Context:
{'Deven': 'Deven is working on a hackathon project with Sam, which they are entering into a hackathon.', 'Sam': 'Sam is working on a hackathon project with Deven
Current conversation:
Human: Deven & Sam are working on a hackathon project
AI: That sounds like a great project! What kind of project are they working on?
Last line:
Human: They are trying to add more complex memory structures to Langchain
You:
' That sounds like an interesting project! What kind of memory structures are they trying to add?'
conversation.predict(input="They are adding in a key-value store for entities mentioned so far in the conversation.")
> Entering new ConversationChain chain...
Prompt after formatting:
You are an assistant to a human, powered by a large language model trained by OpenAI.
You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide ra
You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can us
Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the hum
Context:
{'Deven': 'Deven is working on a hackathon project with Sam, which they are entering into a hackathon. They are trying to add more complex memory structures to
Current conversation:
Human: Deven & Sam are working on a hackathon project
AI: That sounds like a great project! What kind of project are they working on?
Human: They are trying to add more complex memory structures to Langchain
AI: That sounds like an interesting project! What kind of memory structures are they trying to add?
Last line:
Human: They are adding in a key-value store for entities mentioned so far in the conversation.
You:
' That sounds like a great idea! How will the key-value store help with the project?'
You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide ra
You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can us
Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the hum
Context:
{'Deven': 'Deven is working on a hackathon project with Sam, which they are entering into a hackathon. They are trying to add more complex memory structures to
Current conversation:
Human: Deven & Sam are working on a hackathon project
AI: That sounds like a great project! What kind of project are they working on?
Human: They are trying to add more complex memory structures to Langchain
AI: That sounds like an interesting project! What kind of memory structures are they trying to add?
Human: They are adding in a key-value store for entities mentioned so far in the conversation.
AI: That sounds like a great idea! How will the key-value store help with the project?
Last line:
Human: What do you know about Deven & Sam?
You:
' Deven and Sam are working on a hackathon project together, trying to add more complex memory structures to Langchain, including a key-value store for entities
We can also inspect the memory store directly. In the following examples, we look at it directly, and then go through some
examples of adding information and watch how it changes.
You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide ra
You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can us
Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the hum
Context:
{'Daimon': 'Daimon is a company founded by Sam, a successful entrepreneur.', 'Sam': 'Sam is working on a hackathon project with Deven, trying to add more comp
Current conversation:
Human: They are adding in a key-value store for entities mentioned so far in the conversation.
AI: That sounds like a great idea! How will the key-value store help with the project?
Human: What do you know about Deven & Sam?
AI: Deven and Sam are working on a hackathon project together, trying to add more complex memory structures to Langchain, including a key-value store for entit
Human: Sam is the founder of a company called Daimon.
AI:
That's impressive! It sounds like Sam is a very successful entrepreneur. What kind of company is Daimon?
Last line:
Human: Sam is the founder of a company called Daimon.
You:
" That's impressive! It sounds like Sam is a very successful entrepreneur. What kind of company is Daimon?"
You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide ra
You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can us
Overall, you are a powerful tool that can help with a wide range of tasks and provide valuable insights and information on a wide range of topics. Whether the hum
Context:
{'Deven': 'Deven is working on a hackathon project with Sam, which they are entering into a hackathon. They are trying to add more complex memory structures to
Current conversation:
Human: What do you know about Deven & Sam?
AI: Deven and Sam are working on a hackathon project together, trying to add more complex memory structures to Langchain, including a key-value store for entit
Human: Sam is the founder of a company called Daimon.
AI:
That's impressive! It sounds like Sam is a very successful entrepreneur. What kind of company is Daimon?
Human: Sam is the founder of a company called Daimon.
AI: That's impressive! It sounds like Sam is a very successful entrepreneur. What kind of company is Daimon?
Last line:
Human: What do you know about Sam?
You:
' Sam is the founder of a successful company called Daimon. He is also working on a hackathon project with Deven to add more complex memory structures to La
Previous
« Conversation Buffer Window
Next
Conversation Knowledge Graph »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesAgentsQuickstart
On this page
Quickstart
To best understand the agent framework, let’s build an agent that has two tools: one to look things up online, and one to look
up specific data that we’ve loaded into a index.
This will assume knowledge of LLMs and retrieval so if you haven’t already explored those sections, it is recommended you
do so.
Setup: LangSmith
By definition, agents take a self-determined, input-dependent sequence of steps before returning a user-facing output. This
makes debugging these systems particularly tricky, and observability particularly important. LangSmith is especially useful for
such cases.
When building with LangChain, all steps will automatically be traced in LangSmith. To set up LangSmith we just need set the
following environment variables:
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="<your-api-key>"
Define tools
We first need to create the tools we want to use. We will use two tools:Tavily (to search online) and then a retriever over a
local index we will create
Tavily
We have a built-in tool in LangChain to easily use Tavily search engine as tool. Note that this requires an API key - they have
a free tier, but if you don’t have one or don’t want to create one, you can always ignore this step.
Once you create your API key, you will need to export that as:
export TAVILY_API_KEY="..."
from langchain_community.tools.tavily_search import TavilySearchResults
search = TavilySearchResults()
search.invoke("what is the weather in SF")
[{'url': 'https://www.metoffice.gov.uk/weather/forecast/9q8yym8kr',
'content': 'Thu 11 Jan Thu 11 Jan Seven day forecast for San Francisco San Francisco (United States of America) weather Find a forecast Sat 6 Jan Sat 6 Jan Sun
{'url': 'https://www.latimes.com/travel/story/2024-01-11/east-brother-light-station-lighthouse-california',
'content': "May 18, 2023 Jan. 4, 2024 Subscribe for unlimited accessSite Map Follow Us MORE FROM THE L.A. TIMES Jan. 8, 2024 Travel & Experiences This m
Retriever
We will also create a retriever over some data of our own. For a deeper explanation of each step here, seethis section
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
loader = WebBaseLoader("https://docs.smith.langchain.com/overview")
docs = loader.load()
documents = RecursiveCharacterTextSplitter(
chunk_size=1000, chunk_overlap=200
).split_documents(docs)
vector = FAISS.from_documents(documents, OpenAIEmbeddings())
retriever = vector.as_retriever()
retriever.get_relevant_documents("how to upload a dataset")[0]
Document(page_content="dataset uploading.Once we have a dataset, how can we use it to test changes to a prompt or chain? The most basic approach is to run the
Now that we have populated our index that we will do doing retrieval over, we can easily turn it into a tool (the format needed
for an agent to properly use it)
Tools
Now that we have created both, we can create a list of tools that we will use downstream.
Now that we have defined the tools, we can create the agent. We will be using an OpenAI Functions agent - for more
information on this type of agent, as well as other options, see this guide
If you want to see the contents of this prompt and have access to LangSmith, you can go to:
https://smith.langchain.com/hub/hwchase17/openai-functions-agent
Now, we can initalize the agent with the LLM, the prompt, and the tools. The agent is responsible for taking in input and
deciding what actions to take. Crucially, the Agent does not execute those actions - that is done by the AgentExecutor (next
step). For more information about how to think about these components, see our conceptual guide
Finally, we combine the agent (the brains) with the tools inside the AgentExecutor (which will repeatedly call the agent and
execute tools). For more information about how to think about these components, see our conceptual guide
We can now run the agent on a few queries! Note that for now, these are allstateless queries (it won’t remember previous
interactions).
agent_executor.invoke({"input": "hi!"})
1. Tracing: LangSmith provides tracing capabilities that can be used to monitor and debug your application during testing. You can log all traces, visualize latency and
2. Evaluation: LangSmith allows you to quickly edit examples and add them to datasets to expand the surface area of your evaluation sets. This can help you test and
3. Monitoring: Once your application is ready for production, LangSmith can be used to monitor your application. You can log feedback programmatically with runs, tra
4. Rigorous Testing: When your application is performing well and you want to be more rigorous about testing changes, LangSmith can simplify the process. You can
For more detailed information on how to use LangSmith for testing, you can refer to the [LangSmith Overview and User Guide](https://docs.smith.langchain.com/over
[{'url': 'https://www.whereandwhen.net/when/north-america/california/san-francisco-ca/january/', 'content': 'Best time to go to San Francisco? Weather in San Francisc
Adding in memory
As mentioned earlier, this agent is stateless. This means it does not remember previous interactions. To give it memory we
need to pass in previous chat_history. Note: it needs to be called chat_history because of the prompt we are using. If we use a
different prompt, we could change the variable name
# Here we pass in an empty list of messages for chat_history because it is the first message in the chat
agent_executor.invoke({"input": "hi! my name is bob", "chat_history": []})
If we want to keep track of these messages automatically, we can wrap this in a RunnableWithMessageHistory. For more
information on how to use this, see this guide
Conclusion
That’s a wrap! In this quick start we covered how to create a simple agent. Agents are a complex topic, and there’s lot to
learn! Head back to the main agent page to find more resources on conceptual guides, different types of agents, how to
create custom tools, and more!
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Tracking token
ModulesI/O Chat Modelsusage
Let’s first look at an extremely simple example of tracking token usage for a single Chat model call.
Anything inside the context manager will get tracked. Here’s an example of using it to track multiple calls in sequence.
If a chain or agent with multiple steps in it is used, it will track all those steps.
['Things are looking golden for Olivia Wilde, as the actress has jumped back into the dating pool following her split from Harry Styles — read ...', "“I did not want servic
Invoking: `Search` with `Harry Styles current age`
responded: Olivia Wilde's current boyfriend is Harry Styles. Let me find out his age for you.
29 years
Invoking: `Calculator` with `29 ^ 0.23`
Answer: 2.169459462491557Harry Styles' current age (29 years) raised to the 0.23 power is approximately 2.17.
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Memory Conversation Token
ModulesMoreMemorytypes Buffer
On this page
llm = OpenAI()
memory = ConversationTokenBufferMemory(llm=llm, max_token_limit=10)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
memory.load_memory_variables({})
{'history': 'Human: not much you\nAI: not much'}
We can also get the history as a list of messages (this is useful if you are using this with a chat model).
memory = ConversationTokenBufferMemory(
llm=llm, max_token_limit=10, return_messages=True
)
memory.save_context({"input": "hi"}, {"output": "whats up"})
memory.save_context({"input": "not much you"}, {"output": "not much"})
Using in a chain
Let’s walk through an example, again setting verbose=True so we can see the prompt.
conversation_with_summary = ConversationChain(
llm=llm,
# We set a very low max_token_limit for the purposes of testing.
memory=ConversationTokenBufferMemory(llm=OpenAI(), max_token_limit=60),
verbose=True,
)
conversation_with_summary.predict(input="Hi, what's up?")
Current conversation:
" Hi there! I'm doing great, just enjoying the day. How about you?"
conversation_with_summary.predict(input="Just working on writing some documentation!")
> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know th
Current conversation:
Human: Hi, what's up?
AI: Hi there! I'm doing great, just enjoying the day. How about you?
Human: Just working on writing some documentation!
AI:
' Sounds like a productive day! What kind of documentation are you writing?'
conversation_with_summary.predict(input="For LangChain! Have you heard of it?")
Current conversation:
Human: Hi, what's up?
AI: Hi there! I'm doing great, just enjoying the day. How about you?
Human: Just working on writing some documentation!
AI: Sounds like a productive day! What kind of documentation are you writing?
Human: For LangChain! Have you heard of it?
AI:
" Yes, I have heard of LangChain! It is a decentralized language-learning platform that connects native speakers and learners in real time. Is that the documentation y
Current conversation:
Human: For LangChain! Have you heard of it?
AI: Yes, I have heard of LangChain! It is a decentralized language-learning platform that connects native speakers and learners in real time. Is that the documentatio
Human: Haha nope, although a lot of people confuse it for that
AI:
" Oh, I see. Is there another language learning platform you're referring to?"
Previous
« Conversation Summary Buffer
Next
Backed by a Vector Store »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression
Language Streaming
On this page
Important LangChain primitives like LLMs, parsers, prompts, retrievers, and agents implement the LangChainRunnable
Interface.
1. sync stream and async astream: a default implementation of streaming that streams the final output from the chain.
2. async astream_events and async astream_log: these provide a way to stream bothintermediate steps and final output
from the chain.
Let’s take a look at both approaches, and try to understand how to use them.
Using Stream
All Runnable objects implement a sync method called stream and an async variant called astream.
These methods are designed to stream the final output in chunks, yielding each chunk as soon as it is available.
Streaming is only possible if all steps in the program know how to process aninput stream; i.e., process an input chunk one
at a time, and yield a corresponding output chunk.
The complexity of this processing can vary, from straightforward tasks like emitting tokens produced by an LLM, to more
challenging ones like streaming parts of JSON results before the entire JSON is complete.
The best place to start exploring streaming is with the single most important components in LLMs apps– the LLMs
themselves!
Large language models and their chat variants are the primary bottleneck in LLM based apps.
Large language models can take several seconds to generate a complete response to a query. This is far slower than the
~200-300 ms threshold at which an application feels responsive to an end user.
The key strategy to make the application feel more responsive is to show intermediate progress; viz., to stream the output
from the model token by token.
We will show examples of streaming using the chat model fromAnthropic. To use the model, you will need to install the
langchain-anthropic package. You can do this with the following command:
model = ChatAnthropic()
chunks = []
async for chunk in model.astream("hello. tell me something about yourself"):
chunks.append(chunk)
print(chunk.content, end="|", flush=True)
Hello|!| My| name| is| Claude|.| I|'m| an| AI| assistant| created| by| An|throp|ic| to| be| helpful|,| harmless|,| and| honest|.||
Let’s inspect one of the chunks
chunks[0]
AIMessageChunk(content=' Hello')
We got back something called an AIMessageChunk . This chunk represents a part of an AIMessage.
Message chunks are additive by design – one can simply add them up to get the state of the response so far!
Chains
Virtually all LLM applications involve more steps than just a call to a language model.
Let’s build a simple chain using LangChain Expression Language (LCEL) that combines a prompt, model and a parser and verify that
streaming works.
We will use StrOutputParser to parse the output from the model. This is a simple parser that extracts thecontent field from an
AIMessageChunk , giving us the token returned by the model.
TIP
LCEL is a declarative way to specify a “program” by chainining together different LangChain primitives. Chains created using
LCEL benefit from an automatic implementation of stream and astream allowing streaming of the final output. In fact, chains
created with LCEL implement the entire standard Runnable interface.
What| kind| of| teacher| gives| good| advice|?| An| ap|-|parent| (|app|arent|)| one|!||
NOTE
You do not have to use the LangChain Expression Language to use LangChain and can instead rely on a standard imperative
programming approach by caling invoke , batch or stream on each component individually, assigning the results to variables and
then using them downstream as you see fit.
What if you wanted to stream JSON from the output as it was being generated?
If you were to rely on json.loads to parse the partial json, the parsing would fail as the partial json wouldn’t be valid json.
You’d likely be at a complete loss of what to do and claim that it wasn’t possible to stream JSON.
Well, turns out there is a way to do it – the parser needs to operate on theinput stream, and attempt to “auto-complete” the
partial json into a valid state.
chain = (
model | JsonOutputParser()
) # Due to a bug in older versions of Langchain, JsonOutputParser did not stream results from some models
async for text in chain.astream(
'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of count
):
print(text, flush=True)
{}
{'countries': []}
{'countries': [{}]}
{'countries': [{'name': ''}]}
{'countries': [{'name': 'France'}]}
{'countries': [{'name': 'France', 'population': 67}]}
{'countries': [{'name': 'France', 'population': 6739}]}
{'countries': [{'name': 'France', 'population': 673915}]}
{'countries': [{'name': 'France', 'population': 67391582}]}
{'countries': [{'name': 'France', 'population': 67391582}, {}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': ''}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Sp'}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain'}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 4675}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 467547}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}, {}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}, {'name': ''}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}, {'name': 'Japan'}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}, {'name': 'Japan', 'population': 12}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}, {'name': 'Japan', 'population': 12647}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}, {'name': 'Japan', 'population': 1264764}]}
{'countries': [{'name': 'France', 'population': 67391582}, {'name': 'Spain', 'population': 46754778}, {'name': 'Japan', 'population': 126476461}]}
Now, let’s break streaming. We’ll use the previous example and append an extraction function at the end that extracts the
country names from the finalized JSON.
DANGER
Any steps in the chain that operate on finalized inputs rather than on input streams can break streaming functionality via
stream or astream.
TIP
Later, we will discuss the astream_events API which streams results from intermediate steps. This API will stream results from
intermediate steps even if the chain contains steps that only operate on finalized inputs.
countries = inputs["countries"]
country_names = [
country.get("name") for country in countries if isinstance(country, dict)
]
return country_names
Generator Functions
Le’ts fix the streaming using a generator function that can operate on theinput stream.
TIP
A generator function (a function that uses yield) allows writing code that operators on input streams
countries = input["countries"]
France|Sp|Spain|Japan|
NOTE
Because the code above is relying on JSON auto-completion, you may see partial names of countries (e.g.,Sp and Spain),
which is not what one would want for an extraction result!
We’re focusing on streaming concepts, not necessarily the results of the chains.
Non-streaming components
Some built-in components like Retrievers do not offer any streaming. What happens if we try tostream them?
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
vectorstore = FAISS.from_texts(
["harrison worked at kensho", "harrison likes spicy food"],
embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()
This is OK ! Not all components have to implement streaming – in some cases streaming is either unnecessary, difficult or
just doesn’t make sense.
TIP
An LCEL chain constructed using non-streaming components, will still be able to stream in a lot of cases, with streaming of
partial output starting after the last non-streaming step in the chain.
retrieval_chain = (
{
"context": retriever.with_config(run_name="Docs"),
"question": RunnablePassthrough(),
}
| prompt
| model
| StrOutputParser()
)
for chunk in retrieval_chain.stream(
"Where did harrison work? " "Write 3 made up sentences about this place."
):
print(chunk, end="|", flush=True)
Based| on| the| given| context|,| the| only| information| provided| about| where| Harrison| worked| is| that| he| worked| at| Ken|sh|o|.| Since| there| are| no| other| detai
Now that we’ve seen how stream and astream work, let’s venture into the world of streaming events. ️
Event Streaming is a beta API. This API may change a bit based on feedback.
NOTE
import langchain_core
langchain_core.__version__
'0.1.18'
Use async throughout the code to the extent possible (e.g., async tools etc)
Propagate callbacks if defining custom functions / runnables
Whenever using runnables without LCEL, make sure to call.astream() on LLMs rather than .ainvoke to force the LLM to
stream tokens.
Let us know if anything doesn’t work as expected! :)
Event Reference
Below is a reference table that shows some events that might be emitted by the various Runnable objects.
NOTE
When streaming is implemented properly, the inputs to a runnable will not be known until after the input stream has been
entirely consumed. This means that inputs will often be included only for end events and rather than for start events.
event name chunk input output
{“messages”:
on_chat_model_start [model name] [[SystemMessage,
HumanMessage]]}
on_chat_model_stream [model name] AIMessageChunk(content=“hello”)
{“messages”:
{“generations”: […],
on_chat_model_end [model name] [[SystemMessage,
“llm_output”: None, …}
HumanMessage]]}
on_llm_start [model name] {‘input’: ‘hello’}
on_llm_stream [model name] ‘Hello’
on_llm_end [model name] ‘Hello human!’
on_chain_start format_docs
on_chain_stream format_docs “hello world!, goodbye world!”
“hello world!, goodbye
on_chain_end format_docs [Document(…)]
world!”
on_tool_start some_tool {“x”: 1, “y”: “2”}
on_tool_stream some_tool {“x”: 1, “y”: “2”}
on_tool_end some_tool {“x”: 1, “y”: “2”}
on_retriever_start [retriever name] {“query”: “hello”}
on_retriever_chunk [retriever name] {documents: […]}
on_retriever_end [retriever name] {“query”: “hello”} {documents: […]}
on_prompt_start [template_name] {“question”: “hello”}
ChatPromptValue(messages:
on_prompt_end [template_name] {“question”: “hello”}
[SystemMessage, …])
Chat Model
events = []
async for event in model.astream_events("hello", version="v1"):
events.append(event)
/home/eugene/src/langchain/libs/core/langchain_core/_api/beta_decorator.py:86: LangChainBetaWarning: This API is in beta and may change in the future.
warn_beta(
NOTE
This is a beta API, and we’re almost certainly going to make some changes to it.
This version parameter will allow us to mimimize such breaking changes to your code.
In short, we are annoying you now, so we don’t have to annoy you later.
Let’s take a look at the few of the start event and a few of the end events.
events[:3]
[{'event': 'on_chat_model_start',
'run_id': '555843ed-3d24-4774-af25-fbf030d5e8c4',
'name': 'ChatAnthropic',
'tags': [],
'metadata': {},
'data': {'input': 'hello'}},
{'event': 'on_chat_model_stream',
'run_id': '555843ed-3d24-4774-af25-fbf030d5e8c4',
'tags': [],
'metadata': {},
'name': 'ChatAnthropic',
'data': {'chunk': AIMessageChunk(content=' Hello')}},
{'event': 'on_chat_model_stream',
'run_id': '555843ed-3d24-4774-af25-fbf030d5e8c4',
'tags': [],
'metadata': {},
'name': 'ChatAnthropic',
'data': {'chunk': AIMessageChunk(content='!')}}]
events[-2:]
[{'event': 'on_chat_model_stream',
'run_id': '555843ed-3d24-4774-af25-fbf030d5e8c4',
'tags': [],
'metadata': {},
'name': 'ChatAnthropic',
'data': {'chunk': AIMessageChunk(content='')}},
{'event': 'on_chat_model_end',
'name': 'ChatAnthropic',
'run_id': '555843ed-3d24-4774-af25-fbf030d5e8c4',
'tags': [],
'metadata': {},
'data': {'output': AIMessageChunk(content=' Hello!')}}]
Chain
Let’s revisit the example chain that parsed streaming JSON to explore the streaming events API.
chain = (
model | JsonOutputParser()
) # Due to a bug in older versions of Langchain, JsonOutputParser did not stream results from some models
events = [
event
async for event in chain.astream_events(
'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of cou
version="v1",
)
]
If you examine at the first few events, you’ll notice that there are3 different start events rather than 2 start events.
events[:3]
[{'event': 'on_chain_start',
'run_id': 'b1074bff-2a17-458b-9e7b-625211710df4',
'name': 'RunnableSequence',
'tags': [],
'metadata': {},
'data': {'input': 'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains
{'event': 'on_chat_model_start',
'name': 'ChatAnthropic',
'run_id': '6072be59-1f43-4f1c-9470-3b92e8406a99',
'tags': ['seq:step:1'],
'metadata': {},
'data': {'input': {'messages': [[HumanMessage(content='output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an
{'event': 'on_parser_start',
'name': 'JsonOutputParser',
'run_id': 'bf978194-0eda-4494-ad15-3a5bfe69cd59',
'tags': ['seq:step:2'],
'metadata': {},
'data': {}}]
What do you think you’d see if you looked at the last 3 events? what about the middle?
Let’s use this API to take output the stream events from the model and the parser. We’re ignoring start events, end events
and events from the chain.
num_events = 0
Because both the model and the parser support streaming, we see sreaming events from both components in real time! Kind
of cool isn’t it?
Filtering Events
Because this API produces so many events, it is useful to be able to filter on events.
You can filter by either component name, component tags or component type .
By Name
chain = model.with_config({"run_name": "model"}) | JsonOutputParser().with_config(
{"run_name": "my_parser"}
)
max_events = 0
async for event in chain.astream_events(
'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of count
version="v1",
include_names=["my_parser"],
):
print(event)
max_events += 1
if max_events > 10:
# Truncate output
print("...")
break
{'event': 'on_parser_start', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {}}
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {}}}
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': []
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
{'event': 'on_parser_stream', 'name': 'my_parser', 'run_id': 'f2ac1d1c-e14a-45fc-8990-e5c24e707299', 'tags': ['seq:step:2'], 'metadata': {}, 'data': {'chunk': {'countries': [{
...
By Type
chain = model.with_config({"run_name": "model"}) | JsonOutputParser().with_config(
{"run_name": "my_parser"}
)
max_events = 0
async for event in chain.astream_events(
'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of count
version="v1",
include_types=["chat_model"],
):
print(event)
max_events += 1
if max_events > 10:
# Truncate output
print("...")
break
{'event': 'on_chat_model_start', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'input': {'messages': [[H
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
{'event': 'on_chat_model_stream', 'name': 'model', 'run_id': '98a6e192-8159-460c-ba73-6dfc921e3777', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': AIMessageC
...
By Tags
CAUTION
If you’re using tags to filter, make sure that this is what you want.
max_events = 0
async for event in chain.astream_events(
'output a list of the countries france, spain and japan and their populations in JSON format. Use a dict with an outer key of "countries" which contains a list of count
version="v1",
include_tags=["my_chain"],
):
print(event)
max_events += 1
if max_events > 10:
# Truncate output
print("...")
break
{'event': 'on_chain_start', 'run_id': '190875f3-3fb7-49ad-9b6e-f49da22f3e49', 'name': 'RunnableSequence', 'tags': ['my_chain'], 'metadata': {}, 'data': {'input': 'output a li
{'event': 'on_chat_model_start', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'input
{'event': 'on_parser_start', 'name': 'JsonOutputParser', 'run_id': '3b5e4ca1-40fe-4a02-9a19-ba2a43a6115c', 'tags': ['seq:step:2', 'my_chain'], 'metadata': {}, 'data': {}}
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
{'event': 'on_chat_model_stream', 'name': 'ChatAnthropic', 'run_id': 'ff58f732-b494-4ff9-852a-783d42f4455d', 'tags': ['seq:step:1', 'my_chain'], 'metadata': {}, 'data': {'ch
...
Non-streaming components
Remember how some components don’t stream well because they don’t operate oninput streams?
While such components can break streaming of the final output when usingastream, astream_events will still yield streaming
events from intermediate steps that support streaming!
countries = inputs["countries"]
country_names = [
country.get("name") for country in countries if isinstance(country, dict)
]
return country_names
chain = (
model | JsonOutputParser() | _extract_country_names
) # This parser only works with OpenAI right now
As expected, the astream API doesn’t work correctly because _extract_country_names doesn’t operate on streams.
Now, let’s confirm that with astream_events we’re still seeing streaming output from the model and the parser.
num_events = 0
Propagating Callbacks
CAUTION
If you’re using invoking runnables inside your tools, you need to propagate callbacks to the runnable; otherwise, no stream
events will be generated.
NOTE
When using RunnableLambdas or @chain decorator, callbacks are propagated automatically behind the scenes.
reverse_word = RunnableLambda(reverse_word)
@tool
def bad_tool(word: str):
"""Custom tool that doesn't propagate callbacks."""
return reverse_word.invoke(word)
Here’s a re-implementation that does propagate callbacks correctly. You’ll notice that now we’re getting events from the
reverse_word runnable as well.
@tool
def correct_tool(word: str, callbacks):
"""A tool that correctly propagates callbacks."""
return reverse_word.invoke(word, {"callbacks": callbacks})
If you’re invoking runnables from within Runnable Lambdas or @chains, then callbacks will be passed automatically on your
behalf.
reverse_and_double = RunnableLambda(reverse_and_double)
await reverse_and_double.ainvoke("1234")
@chain
async def reverse_and_double(word: str):
return await reverse_word.ainvoke(word) * 2
await reverse_and_double.ainvoke("1234")
Previous
« Interface
Next
How to »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesAgentsAgent TypesReAct
On this page
ReAct
This walkthrough showcases using an agent to implement theReAct logic.
Initialize tools
tools = [TavilySearchResults(max_results=1)]
Create Agent
Run Agent
When using with chat history, we will need a prompt that takes that into account
# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/react-chat")
# Construct the ReAct agent
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
from langchain_core.messages import AIMessage, HumanMessage
agent_executor.invoke(
{
"input": "what's my name? Only use a tool if needed, otherwise respond with Final Answer",
# Notice that chat_history is a string, since this prompt is aimed at LLMs, not chat models
"chat_history": "Human: Hi! My name is Bob\nAI: Hello Bob! Nice to meet you",
}
)
Previous
« Structured chat
Next
Self-ask with search »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model
ModulesI/O Concepts
On this page
Concepts
The core element of any language model application is...the model. LangChain gives you the building blocks to interface with
any language model. Everything in this section is about making it easier to work with models. This largely involves a clear
interface for what a model is, helper utils for constructing inputs to models, and helper utils for working with the outputs of
models.
Models
There are two main types of models that LangChain integrates with: LLMs and Chat Models. These are defined by their input
and output types.
LLMs
LLMs in LangChain refer to pure text completion models. The APIs they wrap take a string prompt as input and output a
string completion. OpenAI's GPT-3 is implemented as an LLM.
Chat Models
Chat models are often backed by LLMs but tuned specifically for having conversations. Crucially, their provider APIs use a
different interface than pure text completion models. Instead of a single string, they take a list of chat messages as input and
they return an AI message as output. See the section below for more details on what exactly a message consists of. GPT-4
and Anthropic's Claude-2 are both implemented as chat models.
Considerations
These two API types have pretty different input and output schemas. This means that best way to interact with them may be
quite different. Although LangChain makes it possible to treat them interchangeably, that doesn't mean you should. In
particular, the prompting strategies for LLMs vs ChatModels may be quite different. This means that you will want to make
sure the prompt you are using is designed for the model type you are working with.
Additionally, not all models are the same. Different models have different prompting strategies that work best for them. For
example, Anthropic's models work best with XML while OpenAI's work best with JSON. This means that the prompt you use
for one model may not transfer to other ones. LangChain provides a lot of default prompts, however these are not guaranteed
to work well with the model are you using. Historically speaking, most prompts work well with OpenAI but are not heavily
tested on other models. This is something we are working to address, but it is something you should keep in mind.
Messages
ChatModels take a list of messages as input and return a message. There are a few different types of messages. All
messages have a role and a content property. The role describes WHO is saying the message. LangChain has different
message classes for different roles. The content property describes the content of the message. This can be a few different
things:
In addition, messages have an additional_kwargs property. This is where additional information about messages can be passed.
This is largely used for input parameters that are provider specific and not general. The best known example of this is
function_call from OpenAI.
HumanMessage
This represents a message from the user. Generally consists only of content.
AIMessage
This represents a message from the model. This may haveadditional_kwargs in it - for example functional_call if using OpenAI
Function calling.
SystemMessage
This represents a system message. Only some models support this. This tells the model how to behave. This generally only
consists of content.
FunctionMessage
This represents the result of a function call. In addition torole and content, this message has a name parameter which conveys
the name of the function that was called to produce this result.
ToolMessage
This represents the result of a tool call. This is distinct from a FunctionMessage in order to match OpenAI'sfunction and tool
message types. In addition to role and content, this message has a tool_call_id parameter which conveys the id of the call to the
tool that was called to produce this result.
Prompts
The inputs to language models are often called prompts. Oftentimes, the user input from your app is not the direct input to
the model. Rather, their input is transformed in some way to produce the string or list of messages that does go into the
model. The objects that take user input and transform it into the final string or messages are known as "Prompt Templates".
LangChain provides several abstractions to make working with prompts easier.
PromptValue
ChatModels and LLMs take different input types. PromptValue is a class designed to be interoperable between the two. It
exposes a method to be cast to a string (to work with LLMs) and another to be cast to a list of messages (to work with
ChatModels).
PromptTemplate
This is an example of a prompt template. This consists of a template string. This string is then formatted with user inputs to
produce a final string.
MessagePromptTemplate
This is an example of a prompt template. This consists of a templatemessage - meaning a specific role and a
PromptTemplate. This PromptTemplate is then formatted with user inputs to produce a final string that becomes the content of
this message.
HumanMessagePromptTemplate
AIMessagePromptTemplate
SystemMessagePromptTemplate
MessagesPlaceholder
Oftentimes inputs to prompts can be a list of messages. This is when you would use a MessagesPlaceholder. These objects
are parameterized by a variable_name argument. The input with the same value as thisvariable_name value should be a list of
messages.
ChatPromptTemplate
This is an example of a prompt template. This consists of a list of MessagePromptTemplates or MessagePlaceholders. These
are then formatted with user inputs to produce a final list of messages.
Output Parsers
The output of models are either strings or a message. Oftentimes, the string or messages contains information formatted in a
specific format to be used downstream (e.g. a comma separated list, or JSON blob). Output parsers are responsible for
taking in the output of a model and transforming it into a more usable form. These generally work on the content of the output
message, but occasionally work on values in the additional_kwargs field.
StrOutputParser
This is a simple output parser that just converts the output of a language model (LLM or ChatModel) into a string. If the model
is an LLM (and therefore outputs a string) it just passes that string through. If the output is a ChatModel (and therefore
outputs a message) it passes through the .content attribute of the message.
There are a few parsers dedicated to working with OpenAI function calling. They take the output of thefunction_call and
arguments parameters (which are inside additional_kwargs) and work with those, largely ignoring content.
Agents are systems that use language models to determine what steps to take. The output of a language model therefore
needs to be parsed into some schema that can represent what actions (if any) are to be taken. AgentOutputParsers are
responsible for taking raw LLM or ChatModel output and converting it to that schema. The logic inside these output parsers
can differ depending on the model and prompting strategy being used.
Previous
« Quickstart
Next
Prompts »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Document
ModulesRetrievalloaders CSV
On this page
CSV
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of
the file is a data record. Each record consists of one or more fields, separated by commas.
loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv')
data = loader.load()
print(data)
[Document(page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row
See the csv module documentation for more information of what csv args are supported.
data = loader.load()
print(data)
[Document(page_content='MLB Team: Team\nPayroll in millions: "Payroll (millions)"\nWins: "Wins"', lookup_str='', metadata={'source': './example_data/mlb_teams
Use the source_column argument to specify a source for the document created from each row. Otherwisefile_path will be used
as the source for all documents created from the CSV file.
This is useful when using documents loaded from CSV files for chains that answer questions using sources.
data = loader.load()
print(data)
[Document(page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98', lookup_str='', metadata={'source': 'Nationals', 'row': 0}, lookup_index=0), Docu
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Custom Chat
ModulesI/O Chat ModelsModel
On this page
Wrapping your LLM with the standardChatModel interface allow you to use your LLM in existing LangChain programs with
minimal code modifications!
As an bonus, your LLM will automatically become a LangChainRunnable and will benefit from some optimizations out of the
box (e.g., batch via a threadpool), async support, the astream_events API, etc.
First, we need to talk about messages which are the inputs and outputs of chat models.
Messages
SystemMessage: Used for priming AI behavior, usually passed in as the first of a sequence of input messages.
HumanMessage : Represents a message from a person interacting with the chat model.
AIMessage: Represents a message from the chat model. This can be either text or a request to invoke a tool.
FunctionMessage / ToolMessage: Message for passing the results of tool invocation back to the model.
NOTE
ToolMessage and FunctionMessage closely follow OpenAIs function and tool arguments.
This is a rapidly developing field and as more models add function calling capabilities, expect that there will be additions to
this schema.
Streaming Variant
All the chat messages have a streaming variant that contains Chunk in the name.
These chunks are used when streaming output from chat models, and they all define an additive property!
It won’t allow you to implement all features that you might want out of a chat model, but it’s quick to implement, and if you
need more you can transition to BaseChatModel shown below.
Let’s implement a chat model that echoes back the last n characters of the prompt!
Optional:
Let’s implement a chat model that echoes back the first n characetrs of the last message in the prompt!
To do so, we will inherit from BaseChatModel and we’ll need to implement the following methods/properties:
To do so inherit from BaseChatModel which is a lower level class and implement the methods:
Optional:
CAUTION
Currently, to get async streaming to work (viaastream), you must provide an implementation of _astream.
By default if _astream is not provided, then async streaming falls back on _agenerate which does not support token by token
streaming.
Implementation
from typing import Any, AsyncIterator, Dict, Iterator, List, Optional
class CustomChatModelAdvanced(BaseChatModel):
"""A custom chat model that echoes the first `n` characters of the input.
Example:
.. code-block:: python
model = CustomChatModel(n=2)
result = model.invoke([HumanMessage(content="hello")])
result = model.batch([[HumanMessage(content="hello")],
[HumanMessage(content="world")]])
"""
n: int
"""The number of characters from the last message of the prompt to be echoed."""
def _generate(
self,
messages: List[BaseMessage],
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> ChatResult:
"""Override the _generate method to implement the chat model logic.
Args:
messages: the prompt composed of a list of messages.
stop: a list of strings on which the model should stop generating.
If generation stops due to a stop token, the stop token itself
SHOULD BE INCLUDED as part of the output. This is not enforced
across models right now, but it's a good practice to follow since
it makes it much easier to parse the output of the model
downstream and understand why generation stopped.
run_manager: A run manager with callbacks for the LLM.
"""
last_message = messages[-1]
tokens = last_message.content[: self.n]
message = AIMessage(content=tokens)
generation = ChatGeneration(message=message)
return ChatResult(generations=[generation])
def _stream(
self,
messages: List[BaseMessage],
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> Iterator[ChatGenerationChunk]:
"""Stream the output of the model.
Args:
messages: the prompt composed of a list of messages.
stop: a list of strings on which the model should stop generating.
If generation stops due to a stop token, the stop token itself
SHOULD BE INCLUDED as part of the output. This is not enforced
across models right now, but it's a good practice to follow since
it makes it much easier to parse the output of the model
downstream and understand why generation stopped.
run_manager: A run manager with callbacks for the LLM.
"""
last_message = messages[-1]
tokens = last_message.content[: self.n]
if run_manager:
run_manager.on_llm_new_token(token, chunk=chunk)
yield chunk
@property
def _llm_type(self) -> str:
"""Get the type of language model used by this chat model."""
return "echoing-chat-model-advanced"
@property
def _identifying_params(self) -> Dict[str, Any]:
"""Return a dictionary of identifying parameters."""
return {"n": self.n}
TIP
The _astream implementation uses run_in_executor to launch the sync _stream in a separate thread.
You can use this trick if you want to reuse the_stream implementation, but if you’re able to implement code that’s natively
async that’s a better solution since that code will run with less overhead.
Let’s test it
The chat model will implement the standard Runnable interface of LangChain which many of the LangChain abstractions
support!
model = CustomChatModelAdvanced(n=3)
model.invoke(
[
HumanMessage(content="hello!"),
AIMessage(content="Hi there human!"),
HumanMessage(content="Meow!"),
]
)
AIMessage(content='Meo')
model.invoke("hello")
AIMessage(content='hel')
model.batch(["hello", "goodbye"])
[AIMessage(content='hel'), AIMessage(content='goo')]
for chunk in model.stream("cat"):
print(chunk.content, end="|")
c|a|t|
Please see the implementation of _astream in the model! If you do not implement it, then no output will stream.!
Let’s try to use the astream events API which will also help double check that all the callbacks were implemented!
Identifying Params
LangChain has a callback system which allows implementing loggers to monitor the behavior of LLM applications.
It’s passed to the callback system and is accessible for user specified loggers.
Below we’ll implement a handler with just a single on_chat_model_start event to see where _identifying_params appears.
class SampleCallbackHandler(AsyncCallbackHandler):
"""Async callback handler that handles callbacks from LangChain."""
Contributing
Here’s a checklist to help make sure your contribution gets added to LangChain:
Documentation:
The model contains doc-strings for all initialization arguments, as these will be surfaced in theAPIReference.
The class doc-string for the model contains a link to the model API if the model is powered by a service.
Tests:
☐ Add unit or integration tests to the overridden methods. Verify thatinvoke , ainvoke, batch, stream work if you’ve over-
ridden the corresponding code.
☐ If your model connects to an API it will likely accept API keys as part of its initialization. Use Pydantic’sSecretStr type
for secrets, so they don’t get accidentally printed out when folks print the model.
Previous
« Caching
Next
Get log probabilities »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Text Recursively split
ModulesRetrievalSplitters JSON
import requests
# This is a large nested json object and will be loaded as a python dict
json_data = requests.get("https://api.smith.langchain.com/openapi.json").json()
from langchain_text_splitters import RecursiveJsonSplitter
splitter = RecursiveJsonSplitter(max_chunk_size=300)
# Recursively split json data - If you need to access/manipulate the smaller json chunks
json_chunks = splitter.split_json(json_data=json_data)
# The splitter can also output documents
docs = splitter.create_documents(texts=[json_data])
# or a list of strings
texts = splitter.split_text(json_data=json_data)
print(texts[0])
print(texts[1])
{"openapi": "3.0.2", "info": {"title": "LangChainPlus", "version": "0.1.0"}, "paths": {"/sessions/{session_id}": {"get": {"tags": ["tracer-sessions"], "summary": "Read Tracer S
{"paths": {"/sessions/{session_id}": {"get": {"parameters": [{"required": true, "schema": {"title": "Session Id", "type": "string", "format": "uuid"}, "name": "session_id", "in":
# Reviewing one of these chunks that was bigger we see there is a list object there
print(texts[1])
[293, 431, 203, 277, 230, 194, 162, 280, 223, 193]
{"paths": {"/sessions/{session_id}": {"get": {"parameters": [{"required": true, "schema": {"title": "Session Id", "type": "string", "format": "uuid"}, "name": "session_id", "in":
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Partial prompt
ModulesI/O Prompts templates
On this page
LangChain supports this in two ways: 1. Partial formatting with string values. 2. Partial formatting with functions that return
string values.
These two different ways support different use cases. In the examples below, we go over the motivations for both use cases
as well as how to do it in LangChain.
One common use case for wanting to partial a prompt template is if you get some of the variables before others. For
example, suppose you have a prompt template that requires two variables, foo and baz. If you get the foo value early on in the
chain, but the baz value later, it can be annoying to wait until you have both variables in the same place to pass them to the
prompt template. Instead, you can partial the prompt template with the foo value, and then pass the partialed prompt template
along and just use that. Below is an example of doing this:
prompt = PromptTemplate.from_template("{foo}{bar}")
partial_prompt = prompt.partial(foo="foo")
print(partial_prompt.format(bar="baz"))
foobaz
You can also just initialize the prompt with the partialed variables.
prompt = PromptTemplate(
template="{foo}{bar}", input_variables=["bar"], partial_variables={"foo": "foo"}
)
print(prompt.format(bar="baz"))
foobaz
The other common use is to partial with a function. The use case for this is when you have a variable you know that you
always want to fetch in a common way. A prime example of this is with date or time. Imagine you have a prompt which you
always want to have the current date. You can’t hard code it in the prompt, and passing it along with the other input variables
is a bit annoying. In this case, it’s very handy to be able to partial the prompt with a function that always returns the current
date.
def _get_datetime():
now = datetime.now()
return now.strftime("%m/%d/%Y, %H:%M:%S")
prompt = PromptTemplate(
template="Tell me a {adjective} joke about the day {date}",
input_variables=["adjective", "date"],
)
partial_prompt = prompt.partial(date=_get_datetime)
print(partial_prompt.format(adjective="funny"))
Tell me a funny joke about the day 12/27/2023, 10:45:22
You can also just initialize the prompt with the partialed variables, which often makes more sense in this workflow.
prompt = PromptTemplate(
template="Tell me a {adjective} joke about the day {date}",
input_variables=["adjective"],
partial_variables={"date": _get_datetime},
)
print(prompt.format(adjective="funny"))
Tell me a funny joke about the day 12/27/2023, 10:45:36
Previous
« Types of `MessagePromptTemplate`
Next
Pipeline »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression How
Language to
How to
️ Add fallbacks
There are many possible points of failure in an LLM application, whether
Previous
« Streaming
Next
RunnableParallel: Manipulating data »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
JSON Chat
ModulesAgentsAgent TypesAgent
On this page
Initialize Tools
tools = [TavilySearchResults(max_results=1)]
Create Agent
Run Agent
agent_executor.invoke(
{
"input": "what's my name?",
"chat_history": [
HumanMessage(content="hi! my name is bob"),
AIMessage(content="Hello Bob! How can I assist you today?"),
],
}
)
Previous
« XML Agent
Next
Structured chat »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Memory Conversation
ModulesMoreMemorytypes Summary
On this page
Conversation Summary
Now let's take a look at using a slightly more complex type of memory -ConversationSummaryMemory. This type of memory
creates a summary of the conversation over time. This can be useful for condensing information from the conversation over
time. Conversation summary memory summarizes the conversation as it happens and stores the current summary in
memory. This memory can then be used to inject the summary of the conversation so far into a prompt/chain. This memory is
most useful for longer conversations, where keeping the past message history in the prompt verbatim would take up too
many tokens.
We can also get the history as a list of messages (this is useful if you are using this with a chat model).
messages = memory.chat_memory.messages
previous_summary = ""
memory.predict_new_summary(messages, previous_summary)
'\nThe human greets the AI, to which the AI responds.'
If you have messages outside this class, you can easily initialize the class withChatMessageHistory . During loading, a summary
will be calculated.
history = ChatMessageHistory()
history.add_user_message("hi")
history.add_ai_message("hi there!")
memory = ConversationSummaryMemory.from_messages(
llm=OpenAI(temperature=0),
chat_memory=history,
return_messages=True
)
memory.buffer
'\nThe human greets the AI, to which the AI responds with a friendly greeting.'
Optionally you can speed up initialization using a previously generated summary, and avoid regenerating the summary by
just initializing directly.
memory = ConversationSummaryMemory(
llm=OpenAI(temperature=0),
buffer="The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good because it will help humans reach their full
chat_memory=history,
return_messages=True
)
Using in a chain
Let's walk through an example of using this in a chain, again settingverbose=True so we can see the prompt.
Current conversation:
" Hi there! I'm doing great. I'm currently helping a customer with a technical issue. How about you?"
Current conversation:
The human greeted the AI and asked how it was doing. The AI replied that it was doing great and was currently helping a customer with a technical issue.
Human: Tell me more about it!
AI:
" Sure! The customer is having trouble with their computer not connecting to the internet. I'm helping them troubleshoot the issue and figure out what the problem i
Current conversation:
The human greeted the AI and asked how it was doing. The AI replied that it was doing great and was currently helping a customer with a technical issue where th
Human: Very cool -- what is the scope of the project?
AI:
" The scope of the project is to troubleshoot the customer's computer issue and find a solution that will allow them to connect to the internet. We are currently explo
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
ModulesAgentsToolsToolkits
Toolkits
Toolkits are collections of tools that are designed to be used together for specific tasks and have convenient loading
methods. For a complete list of these, visit Integrations.
All Toolkits expose a get_tools method which returns a list of tools. You can therefore do:
# Initialize a toolkit
toolkit = ExampleTookit(...)
# Create agent
agent = create_agent_method(llm, tools, prompt)
Previous
« Tools
Next
Defining Custom Tools »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
LangChain Expression
Language CookbookRAG
On this page
RAG
Let’s look at adding in a retrieval step to a prompt and LLM, which adds up to a “retrieval-augmented generation” chain
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI()
chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)
chain.invoke("where did harrison work?")
'Harrison worked at Kensho.'
template = """Answer the question based only on the following context:
{context}
Question: {question}
chain = (
{
"context": itemgetter("question") | retriever,
"question": itemgetter("question"),
"language": itemgetter("language"),
}
| prompt
| model
| StrOutputParser()
)
chain.invoke({"question": "where did harrison work", "language": "italian"})
'Harrison ha lavorato a Kensho.'
We can easily add in conversation history. This primarily means adding in chat_message_history
from langchain_core.messages import AIMessage, HumanMessage, get_buffer_string
from langchain_core.prompts import format_document
from langchain_core.runnables import RunnableParallel
from langchain.prompts.prompt import PromptTemplate
_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)
Question: {question}
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(template)
DEFAULT_DOCUMENT_PROMPT = PromptTemplate.from_template(template="{page_content}")
def _combine_documents(
docs, document_prompt=DEFAULT_DOCUMENT_PROMPT, document_separator="\n\n"
):
doc_strings = [format_document(doc, document_prompt) for doc in docs]
return document_separator.join(doc_strings)
_inputs = RunnableParallel(
standalone_question=RunnablePassthrough.assign(
chat_history=lambda x: get_buffer_string(x["chat_history"])
)
| CONDENSE_QUESTION_PROMPT
| ChatOpenAI(temperature=0)
| StrOutputParser(),
)
_context = {
"context": itemgetter("standalone_question") | retriever | _combine_documents,
"question": lambda x: x["standalone_question"],
}
conversational_qa_chain = _inputs | _context | ANSWER_PROMPT | ChatOpenAI()
conversational_qa_chain.invoke(
{
"question": "where did harrison work?",
"chat_history": [],
}
)
AIMessage(content='Harrison was employed at Kensho.')
conversational_qa_chain.invoke(
{
"question": "where did he work?",
"chat_history": [
HumanMessage(content="Who wrote this notebook?"),
AIMessage(content="Harrison"),
],
}
)
AIMessage(content='Harrison worked at Kensho.')
This shows how to use memory with the above. For memory, we need to manage that outside at the memory. For returning
the retrieved documents, we just need to pass them through all the way.
Previous
« Prompt + LLM
Next
Multiple chains »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube
Model Example Selector Select by
ModulesI/O Prompts Types length
Select by length
This example selector selects which examples to use based on length. This is useful when you are worried about
constructing a prompt that will go over the length of the context window. For longer inputs, it will select fewer examples to
include, while for shorter inputs it will select more.
example_prompt = PromptTemplate(
input_variables=["input", "output"],
template="Input: {input}\nOutput: {output}",
)
example_selector = LengthBasedExampleSelector(
# The examples it has available to choose from.
examples=examples,
# The PromptTemplate being used to format the examples.
example_prompt=example_prompt,
# The maximum length that the formatted examples should be.
# Length is measured by the get_text_length function below.
max_length=25,
# The function used to get the length of a string, which is used
# to determine which examples to include. It is commented out because
# it is provided as a default value if none is specified.
# get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)
dynamic_prompt = FewShotPromptTemplate(
# We provide an ExampleSelector instead of examples.
example_selector=example_selector,
example_prompt=example_prompt,
prefix="Give the antonym of every input",
suffix="Input: {adjective}\nOutput:",
input_variables=["adjective"],
)
# An example with small input, so it selects all examples.
print(dynamic_prompt.format(adjective="big"))
Give the antonym of every input
Input: happy
Output: sad
Input: tall
Output: short
Input: energetic
Output: lethargic
Input: sunny
Output: gloomy
Input: windy
Output: calm
Input: big
Output:
# An example with long input, so it selects only one example.
long_string = "big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else"
print(dynamic_prompt.format(adjective=long_string))
Give the antonym of every input
Input: happy
Output: sad
Input: big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else
Output:
# You can add an example to an example selector as well.
new_example = {"input": "big", "output": "small"}
dynamic_prompt.example_selector.add_example(new_example)
print(dynamic_prompt.format(adjective="enthusiastic"))
Give the antonym of every input
Input: happy
Output: sad
Input: tall
Output: short
Input: energetic
Output: lethargic
Input: sunny
Output: gloomy
Input: windy
Output: calm
Input: big
Output: small
Input: enthusiastic
Output:
Previous
« Example Selector Types
Next
Select by maximal marginal relevance (MMR) »
Community
Discord
Twitter
GitHub
Python
JS/TS
More
Homepage
Blog
YouTube