SummarizationMiddleware¶
Bases: SummarizationMiddleware
Summarization middleware with backend for conversation history offloading.
| METHOD | DESCRIPTION |
|---|---|
| `__init__` | Initialize summarization middleware with backend support. |
| `before_model` | Process messages before model invocation, with history offloading. |
| `abefore_model` | Process messages before model invocation, with history offloading (async). |
| `before_agent` | Logic to run before the agent execution starts. |
| `abefore_agent` | Async logic to run before the agent execution starts. |
| `after_model` | Logic to run after the model is called. |
| `aafter_model` | Async logic to run after the model is called. |
| `wrap_model_call` | Intercept and control model execution via handler callback. |
| `awrap_model_call` | Intercept and control async model execution via handler callback. |
| `after_agent` | Logic to run after the agent execution completes. |
| `aafter_agent` | Async logic to run after the agent execution completes. |
| `wrap_tool_call` | Intercept tool execution for retries, monitoring, or modification. |
| `awrap_tool_call` | Intercept and control async tool execution via handler callback. |
state_schema
class-attribute
instance-attribute
¶
The schema for state passed to the middleware nodes.
name
property
¶
name: str
The name of the middleware instance.
Defaults to the class name, but can be overridden for custom naming.
__init__
¶
__init__(
model: str | BaseChatModel,
*,
backend: BACKEND_TYPES,
trigger: ContextSize | list[ContextSize] | None = None,
keep: ContextSize = ("messages", _DEFAULT_MESSAGES_TO_KEEP),
token_counter: TokenCounter = count_tokens_approximately,
summary_prompt: str = DEFAULT_SUMMARY_PROMPT,
trim_tokens_to_summarize: int | None = _DEFAULT_TRIM_TOKEN_LIMIT,
history_path_prefix: str = "/conversation_history",
**deprecated_kwargs: Any,
) -> None
Initialize summarization middleware with backend support.
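| PARAMETER | DESCRIPTION |
|---|---|
| `model` | The language model to use for generating summaries. TYPE: `str \| BaseChatModel` |
| `backend` | Backend instance or factory for persisting conversation history. TYPE: `BACKEND_TYPES` |
| `trigger` | Threshold(s) that trigger summarization. TYPE: `ContextSize \| list[ContextSize] \| None` DEFAULT: `None` |
| `keep` | Context retention policy after summarization. Defaults to keeping the last 20 messages. TYPE: `ContextSize` DEFAULT: `("messages", _DEFAULT_MESSAGES_TO_KEEP)` |
| `token_counter` | Function to count tokens in messages. TYPE: `TokenCounter` DEFAULT: `count_tokens_approximately` |
| `summary_prompt` | Prompt template for generating summaries. TYPE: `str` DEFAULT: `DEFAULT_SUMMARY_PROMPT` |
| `trim_tokens_to_summarize` | Maximum number of tokens to include when generating the summary. Defaults to 4000. TYPE: `int \| None` DEFAULT: `_DEFAULT_TRIM_TOKEN_LIMIT` |
| `history_path_prefix` | Path prefix for storing conversation history. TYPE: `str` DEFAULT: `"/conversation_history"` |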
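To make `trigger` and `keep` more concrete, the sketch below shows one way a threshold check and a retention split could work. It is an illustration under assumptions, not the library's implementation: `should_summarize`, `split_for_summary`, and `count_chars_as_tokens` are hypothetical helpers, plain strings stand in for messages, and the `"tokens"` kind is assumed by analogy with the `("messages", N)` form shown in the signature.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Assumed shape of ContextSize: a (kind, value) tuple such as ("messages", 20).
ContextSize = tuple


def count_chars_as_tokens(messages: Sequence[str]) -> int:
    """Hypothetical token counter: roughly four characters per token."""
    return sum(len(m) for m in messages) // 4


def should_summarize(
    messages: Sequence[str],
    trigger: ContextSize | list[ContextSize] | None,
    token_counter: Callable[[Sequence[str]], int] = count_chars_as_tokens,
) -> bool:
    """Return True if any configured threshold has been reached."""
    if trigger is None:
        return False
    thresholds = trigger if isinstance(trigger, list) else [trigger]
    for kind, value in thresholds:
        if kind == "messages" and len(messages) >= value:
            return True
        if kind == "tokens" and token_counter(messages) >= value:
            return True
    return False


def split_for_summary(
    messages: Sequence[str], keep: ContextSize
) -> tuple[list[str], list[str]]:
    """Split into (to_summarize, to_keep) for a ("messages", N) policy."""
    kind, value = keep
    if kind != "messages":
        raise ValueError("this sketch only handles ('messages', N)")
    cut = max(len(messages) - int(value), 0)
    return list(messages[:cut]), list(messages[cut:])
```

With 25 messages and `keep=("messages", 20)`, the first 5 messages would be summarized and the last 20 retained.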
before_model
¶
Process messages before model invocation, with history offloading.
Overrides parent to offload messages to backend before summarization.
The summary message includes a reference to the file path where the full conversation history was stored.
| PARAMETER | DESCRIPTION |
|---|---|
| `state` | The agent state. |
| `runtime` | The runtime environment. |

| RETURNS | DESCRIPTION |
|---|---|
| `dict[str, Any] \| None` | Updated state with summarized messages if summarization was performed. |
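The offload-then-summarize flow described above might look roughly like the following sketch. Everything here is a stand-in chosen for illustration: `InMemoryBackend` and its `write` method, the `thread_id` file naming, the message-count thresholds, and the fixed summary string (a real implementation would ask the model for the summary and use the configured backend's own API).

```python
from __future__ import annotations

from typing import Any


class InMemoryBackend:
    """Hypothetical backend stand-in; real backends expose their own API."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def write(self, path: str, content: str) -> None:
        self.files[path] = content


def before_model_sketch(
    state: dict[str, Any],
    backend: InMemoryBackend,
    *,
    max_messages: int = 6,  # assumed trigger threshold
    keep_last: int = 2,  # assumed retention policy
    history_path_prefix: str = "/conversation_history",
    thread_id: str = "thread-1",  # assumed identifier used to name the file
) -> dict[str, Any] | None:
    messages: list[str] = state["messages"]
    if len(messages) < max_messages:
        return None  # below the threshold: leave the state unchanged

    cut = len(messages) - keep_last
    old, recent = messages[:cut], messages[cut:]

    # 1. Offload the evicted messages in full before they are summarized away.
    path = f"{history_path_prefix}/{thread_id}.md"
    backend.write(path, "\n".join(old))

    # 2. Placeholder summary that points back to the stored history.
    summary = f"Summary of {len(old)} earlier messages. Full history: {path}"
    return {"messages": [summary, *recent]}
```

The key design point is ordering: the history is written before the summary replaces it, so the path embedded in the summary always refers to a file that exists.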
abefore_model
async
¶
Process messages before model invocation, with history offloading (async).
Overrides parent to offload messages to backend before summarization.
The summary message includes a reference to the file path where the full conversation history was stored.
| PARAMETER | DESCRIPTION |
|---|---|
| `state` | The agent state. |
| `runtime` | The runtime environment. |

| RETURNS | DESCRIPTION |
|---|---|
| `dict[str, Any] \| None` | Updated state with summarized messages if summarization was performed. |
before_agent
¶
abefore_agent
async
¶
after_model
¶
aafter_model
async
¶
wrap_model_call
¶
wrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult
Intercept and control model execution via handler callback.
Async version is awrap_model_call.
The handler callback executes the model request and returns a ModelResponse. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request or response. When several middleware are configured, they compose with the first in the list as the outermost layer.
| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
| `handler` | Callback that executes the model request and returns a `ModelResponse`. Call this to execute the model. It can be called multiple times for retry logic, or skipped to short-circuit. TYPE: `Callable[[ModelRequest], ModelResponse]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | The model call result. |
Examples:
Retry on error
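A minimal sketch of the retry pattern, written as a standalone function so it runs without the library. `ModelRequest` and `ModelResponse` here are simplified stand-ins (the real classes carry more fields), and `max_attempts` is a hypothetical knob.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelRequest:  # stand-in for illustration, not the library class
    prompt: str


@dataclass
class ModelResponse:  # stand-in for illustration, not the library class
    text: str


def retry_wrap_model_call(
    request: ModelRequest,
    handler: Callable[[ModelRequest], ModelResponse],
    max_attempts: int = 3,  # hypothetical knob, not a library parameter
) -> ModelResponse:
    """Call the handler again after a failure, up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(request)
        except Exception:
            if attempt == max_attempts:
                raise  # attempts exhausted: let the last error propagate
    raise AssertionError("unreachable")
```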
Rewrite response
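Rewriting a response could be sketched as follows: the wrapper calls the handler once and returns a modified copy of its result. The dataclasses are simplified stand-ins, and the `" [reviewed]"` suffix is an arbitrary transformation chosen only to make the effect visible.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class ModelRequest:  # stand-in for illustration, not the library class
    prompt: str


@dataclass(frozen=True)
class ModelResponse:  # stand-in for illustration, not the library class
    text: str


def rewrite_wrap_model_call(
    request: ModelRequest,
    handler: Callable[[ModelRequest], ModelResponse],
) -> ModelResponse:
    """Post-process the model output before it reaches the agent."""
    response = handler(request)
    # Return a modified copy rather than mutating the handler's result.
    return replace(response, text=response.text.strip() + " [reviewed]")
```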
Error to fallback
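Converting an error into a fallback might look like the sketch below, which catches any exception from the handler and returns a canned response instead. The stand-in dataclasses and the `fallback_text` parameter are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelRequest:  # stand-in for illustration, not the library class
    prompt: str


@dataclass
class ModelResponse:  # stand-in for illustration, not the library class
    text: str


def fallback_wrap_model_call(
    request: ModelRequest,
    handler: Callable[[ModelRequest], ModelResponse],
    fallback_text: str = "The model is unavailable right now.",  # hypothetical
) -> ModelResponse:
    """Return a canned response instead of raising when the model call fails."""
    try:
        return handler(request)
    except Exception:
        return ModelResponse(text=fallback_text)
```

Catching a narrower exception type is usually preferable in practice, so that programming errors are not silently hidden behind the fallback.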
Cache/short-circuit
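Short-circuiting can be sketched with a small cache: on a hit the handler is never called, so no model request is made. `CachingMiddleware` is a hypothetical class, the dataclasses are stand-ins, and keying the cache on the prompt text is a simplification.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelRequest:  # stand-in for illustration, not the library class
    prompt: str


@dataclass(frozen=True)
class ModelResponse:  # stand-in for illustration, not the library class
    text: str


class CachingMiddleware:
    """Hypothetical middleware that reuses responses for repeated prompts."""

    def __init__(self) -> None:
        self._cache: dict[str, ModelResponse] = {}

    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        cached = self._cache.get(request.prompt)
        if cached is not None:
            return cached  # short-circuit: the handler is not called
        response = handler(request)
        self._cache[request.prompt] = response
        return response
```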
awrap_model_call
async
¶
awrap_model_call(
request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult
Intercept and control async model execution via handler callback.
The handler callback executes the model request and returns a ModelResponse.
Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request or response. When several middleware are configured, they compose with the first in the list as the outermost layer.
| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Model request to execute (includes state and runtime). TYPE: `ModelRequest` |
| `handler` | Async callback that executes the model request and returns a `ModelResponse`. Call this to execute the model. It can be called multiple times for retry logic, or skipped to short-circuit. TYPE: `Callable[[ModelRequest], Awaitable[ModelResponse]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ModelCallResult` | The model call result. |
Examples:
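An async retry with exponential backoff could be sketched as below. The handler is awaited rather than called directly; otherwise the pattern matches the synchronous case. The dataclasses are stand-ins, and `max_attempts` and `base_delay` are hypothetical parameters.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class ModelRequest:  # stand-in for illustration, not the library class
    prompt: str


@dataclass
class ModelResponse:  # stand-in for illustration, not the library class
    text: str


async def retry_awrap_model_call(
    request: ModelRequest,
    handler: Callable[[ModelRequest], Awaitable[ModelResponse]],
    max_attempts: int = 3,  # hypothetical knob
    base_delay: float = 0.01,  # hypothetical knob, in seconds
) -> ModelResponse:
    """Await the handler, backing off exponentially between failed attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await handler(request)
        except Exception:
            if attempt == max_attempts:
                raise  # attempts exhausted: let the last error propagate
            # Sleep without blocking the event loop: 1x, 2x, 4x base_delay.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```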
after_agent
¶
aafter_agent
async
¶
wrap_tool_call
¶
wrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], ToolMessage | Command[Any]],
) -> ToolMessage | Command[Any]
Intercept tool execution for retries, monitoring, or modification.
Async version is awrap_tool_call.
Multiple middleware compose automatically (first defined = outermost).
Exceptions propagate unless handle_tool_errors is configured on ToolNode.
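| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request, including the tool call being executed. State is accessible through the request. TYPE: `ToolCallRequest` |
| `handler` | Callable that executes the tool call and returns a `ToolMessage` or `Command`. TYPE: `Callable[[ToolCallRequest], ToolMessage \| Command[Any]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command[Any]` | The resulting `ToolMessage` or `Command`. |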
The handler Callable can be invoked multiple times for retry logic.
Each call to handler is independent and stateless.
Examples:
Modify request before execution
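Modifying a request before execution might look like the following sketch, which clamps a numeric argument and passes a new request to the handler. `ToolCallRequest` and `ToolMessage` are simplified stand-ins, the `tool_call` dictionary layout (`name`, `args`, `id`) is an assumption, and the `limit` argument is invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolCallRequest:  # stand-in for illustration, not the library class
    tool_call: dict[str, Any]  # assumed keys: "name", "args", "id"


@dataclass(frozen=True)
class ToolMessage:  # stand-in for illustration, not the library class
    content: str
    tool_call_id: str


def clamp_limit_wrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage],
    max_limit: int = 50,  # hypothetical cap
) -> ToolMessage:
    """Cap the (hypothetical) "limit" argument before the tool runs."""
    args = dict(request.tool_call.get("args", {}))
    args["limit"] = min(int(args.get("limit", 10)), max_limit)
    # Build a new request instead of mutating the one that was passed in.
    modified = ToolCallRequest(tool_call={**request.tool_call, "args": args})
    return handler(modified)
```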
Retry on error (call handler multiple times)
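A retry wrapper for tools could be sketched as below. In this variant only `ConnectionError` is treated as transient, and once attempts are exhausted the failure is reported as a tool message instead of being raised. The dataclasses are stand-ins, and `max_attempts` and the error wording are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ToolCallRequest:  # stand-in for illustration, not the library class
    tool_call: dict[str, Any]  # assumed keys: "name", "args", "id"


@dataclass(frozen=True)
class ToolMessage:  # stand-in for illustration, not the library class
    content: str
    tool_call_id: str


def retry_wrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage],
    max_attempts: int = 3,  # hypothetical knob
) -> ToolMessage:
    """Retry transient failures, then report the error as a tool message."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return handler(request)  # each call is an independent attempt
        except ConnectionError as exc:  # assumed to be the transient case
            last_error = exc
    return ToolMessage(
        content=f"Tool failed after {max_attempts} attempts: {last_error}",
        tool_call_id=request.tool_call["id"],
    )
```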
awrap_tool_call
async
¶
awrap_tool_call(
request: ToolCallRequest,
handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command[Any]]],
) -> ToolMessage | Command[Any]
Intercept and control async tool execution via handler callback.
The handler callback executes the tool call and returns a ToolMessage or
Command. Middleware can call the handler multiple times for retry logic, skip
calling it to short-circuit, or modify the request/response. Multiple middleware
compose with first in list as outermost layer.
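| PARAMETER | DESCRIPTION |
|---|---|
| `request` | Tool call request, including the tool call being executed. State is accessible through the request. TYPE: `ToolCallRequest` |
| `handler` | Async callable that executes the tool and returns a `ToolMessage` or `Command`. Call this to execute the tool. It can be called multiple times for retry logic, or skipped to short-circuit. TYPE: `Callable[[ToolCallRequest], Awaitable[ToolMessage \| Command[Any]]]` |

| RETURNS | DESCRIPTION |
|---|---|
| `ToolMessage \| Command[Any]` | The resulting `ToolMessage` or `Command`. |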
The handler Callable can be invoked multiple times for retry logic.
Each call to handler is independent and stateless.
Examples:
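One way to control async tool execution is to bound it with a timeout and short-circuit to an error message when the tool is too slow. The sketch below uses `asyncio.wait_for` for this. The dataclasses are stand-ins for the library types, and the `timeout` parameter and message wording are assumptions.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass(frozen=True)
class ToolCallRequest:  # stand-in for illustration, not the library class
    tool_call: dict[str, Any]  # assumed keys: "name", "args", "id"


@dataclass(frozen=True)
class ToolMessage:  # stand-in for illustration, not the library class
    content: str
    tool_call_id: str


async def timeout_awrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], Awaitable[ToolMessage]],
    timeout: float = 5.0,  # hypothetical knob, in seconds
) -> ToolMessage:
    """Await the tool, but give up and report an error after `timeout`."""
    try:
        return await asyncio.wait_for(handler(request), timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the pending call before raising.
        name = request.tool_call.get("name", "tool")
        return ToolMessage(
            content=f"{name} timed out after {timeout} seconds",
            tool_call_id=request.tool_call["id"],
        )
```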