SummarizationMiddleware

Bases: SummarizationMiddleware

Summarization middleware with backend for conversation history offloading.

METHOD DESCRIPTION
__init__

Initialize summarization middleware with backend support.

before_model

Process messages before model invocation, with history offloading.

abefore_model

Process messages before model invocation, with history offloading (async).

before_agent

Logic to run before the agent execution starts.

abefore_agent

Async logic to run before the agent execution starts.

after_model

Logic to run after the model is called.

aafter_model

Async logic to run after the model is called.

wrap_model_call

Intercept and control model execution via handler callback.

awrap_model_call

Intercept and control async model execution via handler callback.

after_agent

Logic to run after the agent execution completes.

aafter_agent

Async logic to run after the agent execution completes.

wrap_tool_call

Intercept tool execution for retries, monitoring, or modification.

awrap_tool_call

Intercept and control async tool execution via handler callback.

state_schema class-attribute instance-attribute

state_schema: type[StateT] = cast('type[StateT]', _DefaultAgentState)

The schema for state passed to the middleware nodes.

tools instance-attribute

Additional tools registered by the middleware.

name property

name: str

The name of the middleware instance.

Defaults to the class name, but can be overridden for custom naming.

__init__

__init__(
    model: str | BaseChatModel,
    *,
    backend: BACKEND_TYPES,
    trigger: ContextSize | list[ContextSize] | None = None,
    keep: ContextSize = ("messages", _DEFAULT_MESSAGES_TO_KEEP),
    token_counter: TokenCounter = count_tokens_approximately,
    summary_prompt: str = DEFAULT_SUMMARY_PROMPT,
    trim_tokens_to_summarize: int | None = _DEFAULT_TRIM_TOKEN_LIMIT,
    history_path_prefix: str = "/conversation_history",
    **deprecated_kwargs: Any,
) -> None

Initialize summarization middleware with backend support.

PARAMETER DESCRIPTION
model

The language model to use for generating summaries.

TYPE: str | BaseChatModel

backend

Backend instance or factory for persisting conversation history.

TYPE: BACKEND_TYPES

trigger

Threshold(s) that trigger summarization.

TYPE: ContextSize | list[ContextSize] | None DEFAULT: None

keep

Context retention policy after summarization.

Defaults to keeping last 20 messages.

TYPE: ContextSize DEFAULT: ('messages', _DEFAULT_MESSAGES_TO_KEEP)

token_counter

Function to count tokens in messages.

TYPE: TokenCounter DEFAULT: count_tokens_approximately

summary_prompt

Prompt template for generating summaries.

TYPE: str DEFAULT: DEFAULT_SUMMARY_PROMPT

trim_tokens_to_summarize

Max tokens to include when generating summary.

Defaults to 4000.

TYPE: int | None DEFAULT: _DEFAULT_TRIM_TOKEN_LIMIT

history_path_prefix

Path prefix for storing conversation history.

TYPE: str DEFAULT: '/conversation_history'

Example
from deepagents.middleware.summarization import SummarizationMiddleware
from deepagents.backends import StateBackend

middleware = SummarizationMiddleware(
    model="gpt-4o-mini",
    backend=lambda tool_runtime: StateBackend(tool_runtime),
    trigger=("tokens", 100000),
    keep=("messages", 20),
)
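
The token_counter parameter accepts any callable matching the TokenCounter signature. As a minimal sketch, assuming the counter receives a sequence of messages and returns an int (approx_token_counter and the ~4 characters-per-token heuristic are illustrative, not part of the library):

```python
def approx_token_counter(messages) -> int:
    # Rough heuristic: ~4 characters per token, with a floor of 1.
    # Accepts objects with a .content attribute or plain strings.
    total_chars = sum(len(str(getattr(m, "content", m))) for m in messages)
    return max(1, total_chars // 4)
```

A counter like this could then be passed as token_counter=approx_token_counter in place of the default count_tokens_approximately.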

before_model

before_model(state: AgentState[Any], runtime: Runtime) -> dict[str, Any] | None

Process messages before model invocation, with history offloading.

Overrides parent to offload messages to backend before summarization.

The summary message includes a reference to the file path where the full conversation history was stored.

PARAMETER DESCRIPTION
state

The agent state.

TYPE: AgentState[Any]

runtime

The runtime environment.

TYPE: Runtime

RETURNS DESCRIPTION
dict[str, Any] | None

Updated state with summarized messages if summarization was performed.

abefore_model async

abefore_model(state: AgentState[Any], runtime: Runtime) -> dict[str, Any] | None

Process messages before model invocation, with history offloading (async).

Overrides parent to offload messages to backend before summarization.

The summary message includes a reference to the file path where the full conversation history was stored.

PARAMETER DESCRIPTION
state

The agent state.

TYPE: AgentState[Any]

runtime

The runtime environment.

TYPE: Runtime

RETURNS DESCRIPTION
dict[str, Any] | None

Updated state with summarized messages if summarization was performed.

before_agent

before_agent(state: StateT, runtime: Runtime[ContextT]) -> dict[str, Any] | None

Logic to run before the agent execution starts.

PARAMETER DESCRIPTION
state

The current agent state.

TYPE: StateT

runtime

The runtime context.

TYPE: Runtime[ContextT]

RETURNS DESCRIPTION
dict[str, Any] | None

Agent state updates to apply before agent execution.

abefore_agent async

abefore_agent(state: StateT, runtime: Runtime[ContextT]) -> dict[str, Any] | None

Async logic to run before the agent execution starts.

PARAMETER DESCRIPTION
state

The current agent state.

TYPE: StateT

runtime

The runtime context.

TYPE: Runtime[ContextT]

RETURNS DESCRIPTION
dict[str, Any] | None

Agent state updates to apply before agent execution.

after_model

after_model(state: StateT, runtime: Runtime[ContextT]) -> dict[str, Any] | None

Logic to run after the model is called.

PARAMETER DESCRIPTION
state

The current agent state.

TYPE: StateT

runtime

The runtime context.

TYPE: Runtime[ContextT]

RETURNS DESCRIPTION
dict[str, Any] | None

Agent state updates to apply after model call.

aafter_model async

aafter_model(state: StateT, runtime: Runtime[ContextT]) -> dict[str, Any] | None

Async logic to run after the model is called.

PARAMETER DESCRIPTION
state

The current agent state.

TYPE: StateT

runtime

The runtime context.

TYPE: Runtime[ContextT]

RETURNS DESCRIPTION
dict[str, Any] | None

Agent state updates to apply after model call.

wrap_model_call

wrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult

Intercept and control model execution via handler callback.

The async version is awrap_model_call.

The handler callback executes the model request and returns a ModelResponse. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with first in list as outermost layer.

PARAMETER DESCRIPTION
request

Model request to execute (includes state and runtime).

TYPE: ModelRequest

handler

Callback that executes the model request and returns ModelResponse.

Call this to execute the model.

Can be called multiple times for retry logic.

Can skip calling it to short-circuit.

TYPE: Callable[[ModelRequest], ModelResponse]

RETURNS DESCRIPTION
ModelCallResult

The model call result.

Examples:

Retry on error

def wrap_model_call(self, request, handler):
    for attempt in range(3):
        try:
            return handler(request)
        except Exception:
            if attempt == 2:
                raise

Rewrite response

def wrap_model_call(self, request, handler):
    response = handler(request)
    ai_msg = response.result[0]
    return ModelResponse(
        result=[AIMessage(content=f"[{ai_msg.content}]")],
        structured_response=response.structured_response,
    )

Error to fallback

def wrap_model_call(self, request, handler):
    try:
        return handler(request)
    except Exception:
        return ModelResponse(result=[AIMessage(content="Service unavailable")])

Cache/short-circuit

def wrap_model_call(self, request, handler):
    if cached := get_cache(request):
        return cached  # Short-circuit with cached result
    response = handler(request)
    save_cache(request, response)
    return response

Simple AIMessage return (converted automatically)

def wrap_model_call(self, request, handler):
    response = handler(request)  # response is discarded; the AIMessage below replaces it
    # Can return AIMessage directly for simple cases
    return AIMessage(content="Simplified response")

awrap_model_call async

awrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult

Intercept and control async model execution via handler callback.

The handler callback executes the model request and returns a ModelResponse.

Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with first in list as outermost layer.

PARAMETER DESCRIPTION
request

Model request to execute (includes state and runtime).

TYPE: ModelRequest

handler

Async callback that executes the model request and returns ModelResponse.

Call this to execute the model.

Can be called multiple times for retry logic.

Can skip calling it to short-circuit.

TYPE: Callable[[ModelRequest], Awaitable[ModelResponse]]

RETURNS DESCRIPTION
ModelCallResult

The model call result.

Examples:

Retry on error

async def awrap_model_call(self, request, handler):
    for attempt in range(3):
        try:
            return await handler(request)
        except Exception:
            if attempt == 2:
                raise
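
The retry pattern above can be extended with exponential backoff between attempts. A sketch using a stand-in handler; awrap_model_call_with_backoff, retries, and base_delay are illustrative names, not library API:

```python
import asyncio


async def awrap_model_call_with_backoff(request, handler, retries=3, base_delay=0.5):
    # Retry the async handler, doubling the sleep after each failed attempt.
    for attempt in range(retries):
        try:
            return await handler(request)
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base_delay * 2**attempt)
```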

after_agent

after_agent(state: StateT, runtime: Runtime[ContextT]) -> dict[str, Any] | None

Logic to run after the agent execution completes.

PARAMETER DESCRIPTION
state

The current agent state.

TYPE: StateT

runtime

The runtime context.

TYPE: Runtime[ContextT]

RETURNS DESCRIPTION
dict[str, Any] | None

Agent state updates to apply after agent execution.

aafter_agent async

aafter_agent(state: StateT, runtime: Runtime[ContextT]) -> dict[str, Any] | None

Async logic to run after the agent execution completes.

PARAMETER DESCRIPTION
state

The current agent state.

TYPE: StateT

runtime

The runtime context.

TYPE: Runtime[ContextT]

RETURNS DESCRIPTION
dict[str, Any] | None

Agent state updates to apply after agent execution.

wrap_tool_call

wrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage | Command[Any]],
) -> ToolMessage | Command[Any]

Intercept tool execution for retries, monitoring, or modification.

The async version is awrap_tool_call.

Multiple middleware compose automatically (first defined = outermost).

Exceptions propagate unless handle_tool_errors is configured on ToolNode.

PARAMETER DESCRIPTION
request

Tool call request with call dict, BaseTool, state, and runtime.

Access state via request.state and runtime via request.runtime.

TYPE: ToolCallRequest

handler

Callable to execute the tool (can be called multiple times).

TYPE: Callable[[ToolCallRequest], ToolMessage | Command[Any]]

RETURNS DESCRIPTION
ToolMessage | Command[Any]

ToolMessage or Command (the final result).

The handler Callable can be invoked multiple times for retry logic.

Each call to handler is independent and stateless.

Examples:

Modify request before execution

def wrap_tool_call(self, request, handler):
    modified_call = {
        **request.tool_call,
        "args": {
            **request.tool_call["args"],
            "value": request.tool_call["args"]["value"] * 2,
        },
    }
    request = request.override(tool_call=modified_call)
    return handler(request)

Retry on error (call handler multiple times)

def wrap_tool_call(self, request, handler):
    for attempt in range(3):
        try:
            result = handler(request)
            if is_valid(result):
                return result
        except Exception:
            if attempt == 2:
                raise
    return result

Conditional retry based on response

def wrap_tool_call(self, request, handler):
    for _ in range(3):
        result = handler(request)
        if isinstance(result, ToolMessage) and result.status != "error":
            return result
    return result  # give up after 3 attempts and return the last error result
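
The docstring names monitoring as a use case but shows no example; a minimal timing sketch (wrap_tool_call_with_timing and log are illustrative names, not library API):

```python
import time


def wrap_tool_call_with_timing(request, handler, log=print):
    # Execute the tool once and report how long it took.
    start = time.monotonic()
    result = handler(request)
    log(f"tool completed in {time.monotonic() - start:.3f}s")
    return result
```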

awrap_tool_call async

awrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command[Any]]],
) -> ToolMessage | Command[Any]

Intercept and control async tool execution via handler callback.

The handler callback executes the tool call and returns a ToolMessage or Command. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with first in list as outermost layer.

PARAMETER DESCRIPTION
request

Tool call request with call dict, BaseTool, state, and runtime.

Access state via request.state and runtime via request.runtime.

TYPE: ToolCallRequest

handler

Async callable that executes the tool and returns a ToolMessage or Command.

Call this to execute the tool.

Can be called multiple times for retry logic.

Can skip calling it to short-circuit.

TYPE: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command[Any]]]

RETURNS DESCRIPTION
ToolMessage | Command[Any]

ToolMessage or Command (the final result).

The handler Callable can be invoked multiple times for retry logic.

Each call to handler is independent and stateless.

Examples:

Async retry on error

async def awrap_tool_call(self, request, handler):
    for attempt in range(3):
        try:
            result = await handler(request)
            if is_valid(result):
                return result
        except Exception:
            if attempt == 2:
                raise
    return result

Async cache/short-circuit

async def awrap_tool_call(self, request, handler):
    if cached := await get_cache_async(request):
        return ToolMessage(content=cached, tool_call_id=request.tool_call["id"])
    result = await handler(request)
    await save_cache_async(request, result)
    return result