SummarizationMiddleware

Bases: SummarizationMiddleware

Summarization middleware with backend for conversation history offloading.

METHOD DESCRIPTION
__init__

Initialize summarization middleware with backend support.

before_model

Process messages before model invocation, with history offloading.

abefore_model

Process messages before model invocation, with history offloading (async).

before_agent

Logic to run before the agent execution starts.

abefore_agent

Async logic to run before the agent execution starts.

after_model

Logic to run after the model is called.

aafter_model

Async logic to run after the model is called.

wrap_model_call

Intercept and control model execution via handler callback.

awrap_model_call

Intercept and control async model execution via handler callback.

after_agent

Logic to run after the agent execution completes.

aafter_agent

Async logic to run after the agent execution completes.

wrap_tool_call

Intercept tool execution for retries, monitoring, or modification.

awrap_tool_call

Intercept and control async tool execution via handler callback.

state_schema class-attribute instance-attribute

state_schema: type[StateT] = cast('type[StateT]', _DefaultAgentState)

The schema for state passed to the middleware nodes.

tools instance-attribute

Additional tools registered by the middleware.

name property

name: str

The name of the middleware instance.

Defaults to the class name, but can be overridden for custom naming.

__init__

__init__(
    model: str | BaseChatModel,
    *,
    backend: BACKEND_TYPES,
    trigger: ContextSize | list[ContextSize] | None = None,
    keep: ContextSize = ("messages", _DEFAULT_MESSAGES_TO_KEEP),
    token_counter: TokenCounter = count_tokens_approximately,
    summary_prompt: str = DEFAULT_SUMMARY_PROMPT,
    trim_tokens_to_summarize: int | None = _DEFAULT_TRIM_TOKEN_LIMIT,
    history_path_prefix: str = "/conversation_history",
    **deprecated_kwargs: Any,
) -> None

Initialize summarization middleware with backend support.

PARAMETER DESCRIPTION
model

The language model to use for generating summaries.

TYPE: str | BaseChatModel

backend

Backend instance or factory for persisting conversation history.

TYPE: BACKEND_TYPES

trigger

Threshold(s) that trigger summarization.

TYPE: ContextSize | list[ContextSize] | None DEFAULT: None

keep

Context retention policy after summarization.

Defaults to keeping last 20 messages.

TYPE: ContextSize DEFAULT: ('messages', _DEFAULT_MESSAGES_TO_KEEP)

token_counter

Function to count tokens in messages.

TYPE: TokenCounter DEFAULT: count_tokens_approximately

summary_prompt

Prompt template for generating summaries.

TYPE: str DEFAULT: DEFAULT_SUMMARY_PROMPT

trim_tokens_to_summarize

Max tokens to include when generating summary.

Defaults to 4000.

TYPE: int | None DEFAULT: _DEFAULT_TRIM_TOKEN_LIMIT

history_path_prefix

Path prefix for storing conversation history.

TYPE: str DEFAULT: '/conversation_history'

Example
from deepagents.middleware.summarization import SummarizationMiddleware
from deepagents.backends import StateBackend

middleware = SummarizationMiddleware(
    model="gpt-4o-mini",
    backend=lambda tool_runtime: StateBackend(tool_runtime),
    trigger=("tokens", 100000),
    keep=("messages", 20),
)
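
The token_counter parameter accepts any callable matching the TokenCounter signature. As a minimal sketch, assuming the counter receives a sequence of messages and returns an int (approx_token_counter and the ~4 characters-per-token heuristic are illustrative, not part of the library):

```python
def approx_token_counter(messages) -> int:
    # Rough heuristic: ~4 characters per token, with a floor of 1.
    # Accepts objects with a .content attribute or plain strings.
    total_chars = sum(len(str(getattr(m, "content", m))) for m in messages)
    return max(1, total_chars // 4)
```

A counter like this could then be passed as token_counter=approx_token_counter in place of the default count_tokens_approximately.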

before_model

before_model(state: AgentState[Any], runtime: Runtime) -> dict[str, Any] | None

Process messages before model invocation, with history offloading.

Overrides parent to offload messages to backend before summarization.

The summary message includes a reference to the file path where the full conversation history was stored.

PARAMETER DESCRIPTION
state

The agent state.

TYPE: AgentState[Any]

runtime

The runtime environment.

TYPE: Runtime

RETURNS DESCRIPTION
dict[str, Any] | None

Updated state with summarized messages if summarization was performed.

abefore_model async

abefore_model(state: AgentState[Any], runtime: Runtime) -> dict[str, Any] | None

Process messages before model invocation, with history offloading (async).

Overrides parent to offload messages to backend before summarization.

The summary message includes a reference to the file path where the full conversation history was stored.

PARAMETER DESCRIPTION
state

The agent state.

TYPE: AgentState[Any]

runtime

The runtime environment.

TYPE: Runtime

RETURNS DESCRIPTION
dict[str, Any] | None

Updated state with summarized messages if summarization was performed.

before_agent

before_agent(state: StateT, runtime: Runtime[ContextT]) -> dict[str, Any] | None

Logic to run before the agent execution starts.

PARAMETER DESCRIPTION
state

The current agent state.

TYPE: StateT

runtime

The runtime context.

TYPE: Runtime[ContextT]

RETURNS DESCRIPTION
dict[str, Any] | None

Agent state updates to apply before agent execution.

abefore_agent async

abefore_agent(state: StateT, runtime: Runtime[ContextT]) -> dict[str, Any] | None

Async logic to run before the agent execution starts.

PARAMETER DESCRIPTION
state

The current agent state.

TYPE: StateT

runtime

The runtime context.

TYPE: Runtime[ContextT]

RETURNS DESCRIPTION
dict[str, Any] | None

Agent state updates to apply before agent execution.

after_model

after_model(state: StateT, runtime: Runtime[ContextT]) -> dict[str, Any] | None

Logic to run after the model is called.

PARAMETER DESCRIPTION
state

The current agent state.

TYPE: StateT

runtime

The runtime context.

TYPE: Runtime[ContextT]

RETURNS DESCRIPTION
dict[str, Any] | None

Agent state updates to apply after model call.

aafter_model async

aafter_model(state: StateT, runtime: Runtime[ContextT]) -> dict[str, Any] | None

Async logic to run after the model is called.

PARAMETER DESCRIPTION
state

The current agent state.

TYPE: StateT

runtime

The runtime context.

TYPE: Runtime[ContextT]

RETURNS DESCRIPTION
dict[str, Any] | None

Agent state updates to apply after model call.

wrap_model_call

wrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], ModelResponse]
) -> ModelCallResult

Intercept and control model execution via handler callback.

The async version is awrap_model_call.

The handler callback executes the model request and returns a ModelResponse. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with first in list as outermost layer.

PARAMETER DESCRIPTION
request

Model request to execute (includes state and runtime).

TYPE: ModelRequest

handler

Callback that executes the model request and returns ModelResponse.

Call this to execute the model.

Can be called multiple times for retry logic.

Can skip calling it to short-circuit.

TYPE: Callable[[ModelRequest], ModelResponse]

RETURNS DESCRIPTION
ModelCallResult

The model call result.

Examples:

Retry on error

def wrap_model_call(self, request, handler):
    for attempt in range(3):
        try:
            return handler(request)
        except Exception:
            if attempt == 2:
                raise

Rewrite response

def wrap_model_call(self, request, handler):
    response = handler(request)
    ai_msg = response.result[0]
    return ModelResponse(
        result=[AIMessage(content=f"[{ai_msg.content}]")],
        structured_response=response.structured_response,
    )

Error to fallback

def wrap_model_call(self, request, handler):
    try:
        return handler(request)
    except Exception:
        return ModelResponse(result=[AIMessage(content="Service unavailable")])

Cache/short-circuit

def wrap_model_call(self, request, handler):
    if cached := get_cache(request):
        return cached  # Short-circuit with cached result
    response = handler(request)
    save_cache(request, response)
    return response

Simple AIMessage return (converted automatically)

def wrap_model_call(self, request, handler):
    response = handler(request)  # response is discarded; the AIMessage below replaces it
    # Can return AIMessage directly for simple cases
    return AIMessage(content="Simplified response")

awrap_model_call async

awrap_model_call(
    request: ModelRequest, handler: Callable[[ModelRequest], Awaitable[ModelResponse]]
) -> ModelCallResult

Intercept and control async model execution via handler callback.

The handler callback executes the model request and returns a ModelResponse.

Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with first in list as outermost layer.

PARAMETER DESCRIPTION
request

Model request to execute (includes state and runtime).

TYPE: ModelRequest

handler

Async callback that executes the model request and returns ModelResponse.

Call this to execute the model.

Can be called multiple times for retry logic.

Can skip calling it to short-circuit.

TYPE: Callable[[ModelRequest], Awaitable[ModelResponse]]

RETURNS DESCRIPTION
ModelCallResult

The model call result.

Examples:

Retry on error

async def awrap_model_call(self, request, handler):
    for attempt in range(3):
        try:
            return await handler(request)
        except Exception:
            if attempt == 2:
                raise
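
The retry pattern above can be extended with exponential backoff between attempts. A sketch using a stand-in handler; awrap_model_call_with_backoff, retries, and base_delay are illustrative names, not library API:

```python
import asyncio


async def awrap_model_call_with_backoff(request, handler, retries=3, base_delay=0.5):
    # Retry the async handler, doubling the sleep after each failed attempt.
    for attempt in range(retries):
        try:
            return await handler(request)
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base_delay * 2**attempt)
```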

after_agent

after_agent(state: StateT, runtime: Runtime[ContextT]) -> dict[str, Any] | None

Logic to run after the agent execution completes.

PARAMETER DESCRIPTION
state

The current agent state.

TYPE: StateT

runtime

The runtime context.

TYPE: Runtime[ContextT]

RETURNS DESCRIPTION
dict[str, Any] | None

Agent state updates to apply after agent execution.

aafter_agent async

aafter_agent(state: StateT, runtime: Runtime[ContextT]) -> dict[str, Any] | None

Async logic to run after the agent execution completes.

PARAMETER DESCRIPTION
state

The current agent state.

TYPE: StateT

runtime

The runtime context.

TYPE: Runtime[ContextT]

RETURNS DESCRIPTION
dict[str, Any] | None

Agent state updates to apply after agent execution.

wrap_tool_call

wrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage | Command[Any]],
) -> ToolMessage | Command[Any]

Intercept tool execution for retries, monitoring, or modification.

The async version is awrap_tool_call.

Multiple middleware compose automatically (first defined = outermost).

Exceptions propagate unless handle_tool_errors is configured on ToolNode.

PARAMETER DESCRIPTION
request

Tool call request with call dict, BaseTool, state, and runtime.

Access state via request.state and runtime via request.runtime.

TYPE: ToolCallRequest

handler

Callable to execute the tool (can be called multiple times).

TYPE: Callable[[ToolCallRequest], ToolMessage | Command[Any]]

RETURNS DESCRIPTION
ToolMessage | Command[Any]

ToolMessage or Command (the final result).

The handler Callable can be invoked multiple times for retry logic.

Each call to handler is independent and stateless.

Examples:

Modify request before execution

def wrap_tool_call(self, request, handler):
    modified_call = {
        **request.tool_call,
        "args": {
            **request.tool_call["args"],
            "value": request.tool_call["args"]["value"] * 2,
        },
    }
    request = request.override(tool_call=modified_call)
    return handler(request)

Retry on error (call handler multiple times)

def wrap_tool_call(self, request, handler):
    for attempt in range(3):
        try:
            result = handler(request)
            if is_valid(result):
                return result
        except Exception:
            if attempt == 2:
                raise
    return result

Conditional retry based on response

def wrap_tool_call(self, request, handler):
    for _ in range(3):
        result = handler(request)
        if isinstance(result, ToolMessage) and result.status != "error":
            return result
    return result  # give up after 3 attempts and return the last error result
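
The docstring names monitoring as a use case but shows no example; a minimal timing sketch (wrap_tool_call_with_timing and log are illustrative names, not library API):

```python
import time


def wrap_tool_call_with_timing(request, handler, log=print):
    # Execute the tool once and report how long it took.
    start = time.monotonic()
    result = handler(request)
    log(f"tool completed in {time.monotonic() - start:.3f}s")
    return result
```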

awrap_tool_call async

awrap_tool_call(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command[Any]]],
) -> ToolMessage | Command[Any]

Intercept and control async tool execution via handler callback.

The handler callback executes the tool call and returns a ToolMessage or Command. Middleware can call the handler multiple times for retry logic, skip calling it to short-circuit, or modify the request/response. Multiple middleware compose with first in list as outermost layer.

PARAMETER DESCRIPTION
request

Tool call request with call dict, BaseTool, state, and runtime.

Access state via request.state and runtime via request.runtime.

TYPE: ToolCallRequest

handler

Async callable that executes the tool and returns a ToolMessage or Command.

Call this to execute the tool.

Can be called multiple times for retry logic.

Can skip calling it to short-circuit.

TYPE: Callable[[ToolCallRequest], Awaitable[ToolMessage | Command[Any]]]

RETURNS DESCRIPTION
ToolMessage | Command[Any]

ToolMessage or Command (the final result).

The handler Callable can be invoked multiple times for retry logic.

Each call to handler is independent and stateless.

Examples:

Async retry on error

async def awrap_tool_call(self, request, handler):
    for attempt in range(3):
        try:
            result = await handler(request)
            if is_valid(result):
                return result
        except Exception:
            if attempt == 2:
                raise
    return result

Async cache/short-circuit

async def awrap_tool_call(self, request, handler):
    if cached := await get_cache_async(request):
        return ToolMessage(content=cached, tool_call_id=request.tool_call["id"])
    result = await handler(request)
    await save_cache_async(request, result)
    return result