Complete reference of all events available in Vision Agents. Events are emitted by components during agent execution and can be subscribed to using the `@agent.events.subscribe` decorator.

## Base Event Structure

All events inherit from `BaseEvent` and share these common fields:

| Field | Type | Description |
|---|---|---|
| type | str | Event type identifier (e.g., `"plugin.stt_transcript"`) |
| event_id | str | Unique UUID for this event instance |
| timestamp | datetime | When the event was created (UTC) |
| session_id | `str \| None` | Current session identifier |
| participant | `Participant \| None` | Participant metadata from the call |

Plugin events extend `PluginBaseEvent`, which adds:

| Field | Type | Description |
|---|---|---|
| plugin_name | `str \| None` | Name of the plugin that emitted the event |
| plugin_version | `str \| None` | Version of the plugin |

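These shared fields are available on every event a handler receives, alongside the event-specific fields documented below. A minimal sketch using the STT transcript event documented later on this page:

```python
from vision_agents.core.stt.events import STTTranscriptEvent

@agent.events.subscribe
async def log_event(event: STTTranscriptEvent):
    # Common BaseEvent fields, present on every event
    print(f"[{event.timestamp}] {event.type} ({event.event_id})")
    print(f"Session: {event.session_id}")
```
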
## Call Session Events

Events for participant activity on calls. These come from the Stream Video SDK.

Import: `from vision_agents.core.events import ...`

### CallSessionParticipantJoinedEvent

Emitted when a participant joins the call.

```python
from vision_agents.core.events import CallSessionParticipantJoinedEvent

@agent.events.subscribe
async def on_join(event: CallSessionParticipantJoinedEvent):
    user = event.participant.user
    print(f"{user.name} joined (id: {user.id})")
```

| Field | Type | Description |
|---|---|---|
| call_cid | str | Call channel ID |
| session_id | str | Session identifier |
| participant | CallParticipantResponse | Joined participant info |

### CallSessionParticipantLeftEvent

Emitted when a participant leaves the call.

```python
from vision_agents.core.events import CallSessionParticipantLeftEvent

@agent.events.subscribe
async def on_leave(event: CallSessionParticipantLeftEvent):
    print(f"{event.participant.user.name} left")
    print(f"Duration: {event.duration_seconds}s")
```

| Field | Type | Description |
|---|---|---|
| call_cid | str | Call channel ID |
| session_id | str | Session identifier |
| participant | CallParticipantResponse | Left participant info |
| duration_seconds | int | How long the participant was in the call |
| reason | `str \| None` | Why the participant left |

### Other Call Events

| Event | Description |
|---|---|
| CallCreatedEvent | Call was created |
| CallEndedEvent | Call ended |
| CallSessionStartedEvent | Session started |
| CallSessionEndedEvent | Session ended |
| CallSessionParticipantCountsUpdatedEvent | Participant count changed |
| CallUpdatedEvent | Call settings updated |
| CallMemberAddedEvent | Member added to call |
| CallMemberRemovedEvent | Member removed from call |
| CallRecordingStartedEvent | Recording started |
| CallRecordingStoppedEvent | Recording stopped |
| CallTranscriptionStartedEvent | Transcription started |
| CallTranscriptionStoppedEvent | Transcription stopped |
| ClosedCaptionEvent | Closed caption received |

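These events follow the same subscription pattern as above. A minimal sketch handling the recording lifecycle with a union subscription (handler name is illustrative):

```python
from vision_agents.core.events import (
    CallRecordingStartedEvent,
    CallRecordingStoppedEvent,
)

@agent.events.subscribe
async def on_recording(event: CallRecordingStartedEvent | CallRecordingStoppedEvent):
    if isinstance(event, CallRecordingStartedEvent):
        print("Recording started")
    else:
        print("Recording stopped")
```
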
## Speech-to-Text (STT) Events

Events from speech recognition.

Import: `from vision_agents.core.stt.events import ...`

### STTTranscriptEvent

Emitted when a complete transcript is available.

```python
from vision_agents.core.stt.events import STTTranscriptEvent

@agent.events.subscribe
async def on_transcript(event: STTTranscriptEvent):
    print(f"Text: {event.text}")
    print(f"Confidence: {event.confidence}")
    print(f"Language: {event.language}")
```

| Field | Type | Description |
|---|---|---|
| text | str | Transcribed text (required, non-empty) |
| confidence | `float \| None` | Recognition confidence (0.0-1.0) |
| language | `str \| None` | Detected language code |
| processing_time_ms | `float \| None` | Time to process audio |
| audio_duration_ms | `float \| None` | Duration of audio processed |
| model_name | `str \| None` | Model used for recognition |

### STTPartialTranscriptEvent

Emitted during speech for interim results.

| Field | Type | Description |
|---|---|---|
| text | str | Partial transcribed text |
| confidence | `float \| None` | Recognition confidence |
| language | `str \| None` | Detected language |

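Partial transcripts are useful for live captions; they are typically superseded by later partials and the final `STTTranscriptEvent`. A minimal sketch:

```python
from vision_agents.core.stt.events import STTPartialTranscriptEvent

@agent.events.subscribe
async def on_partial(event: STTPartialTranscriptEvent):
    # Interim text; expect it to be revised by later events
    print(f"Partial: {event.text}", flush=True)
```
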
### STTErrorEvent

Emitted when STT encounters an error.

```python
from vision_agents.core.stt.events import STTErrorEvent

@agent.events.subscribe
async def on_stt_error(event: STTErrorEvent):
    print(f"Error: {event.error_message}")
    print(f"Recoverable: {event.is_recoverable}")
```

| Field | Type | Description |
|---|---|---|
| error | `Exception \| None` | The exception that occurred |
| error_code | `str \| None` | Error code identifier |
| context | `str \| None` | Additional context |
| retry_count | int | Number of retry attempts |
| is_recoverable | bool | Whether the error is recoverable |
| error_message | str | Property: human-readable error message |

### STTConnectionEvent

Emitted when the STT connection state changes.

| Field | Type | Description |
|---|---|---|
| connection_state | ConnectionState | New state (CONNECTED, DISCONNECTED, RECONNECTING, ERROR) |
| provider | `str \| None` | STT provider name |
| details | `dict \| None` | Additional connection details |
| reconnect_attempts | int | Number of reconnection attempts |

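A handler can compare against the `ConnectionState` enum (documented at the end of this page), for instance to surface reconnection attempts. A minimal sketch:

```python
from vision_agents.core.events import ConnectionState
from vision_agents.core.stt.events import STTConnectionEvent

@agent.events.subscribe
async def on_stt_connection(event: STTConnectionEvent):
    if event.connection_state == ConnectionState.RECONNECTING:
        print(f"STT ({event.provider}) reconnecting, attempt {event.reconnect_attempts}")
```
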
## Text-to-Speech (TTS) Events

Events from speech synthesis.

Import: `from vision_agents.core.tts.events import ...`

### TTSAudioEvent

Emitted when TTS audio data is available.

```python
from vision_agents.core.tts.events import TTSAudioEvent

@agent.events.subscribe
async def on_audio(event: TTSAudioEvent):
    print(f"Chunk {event.chunk_index}, final: {event.is_final_chunk}")
```

| Field | Type | Description |
|---|---|---|
| data | `PcmData \| None` | Audio data |
| chunk_index | int | Index of this chunk |
| is_final_chunk | bool | Whether this is the last chunk |
| text_source | `str \| None` | Original text being synthesized |
| synthesis_id | `str \| None` | Unique ID for this synthesis |

### TTSSynthesisStartEvent

Emitted when TTS synthesis begins.

| Field | Type | Description |
|---|---|---|
| text | `str \| None` | Text being synthesized |
| synthesis_id | str | Unique ID for this synthesis |
| model_name | `str \| None` | TTS model name |
| voice_id | `str \| None` | Voice identifier |
| estimated_duration_ms | `float \| None` | Estimated audio duration |

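For example, to log what is being synthesized and with which voice, following the same pattern as the other handlers on this page (a minimal sketch):

```python
from vision_agents.core.tts.events import TTSSynthesisStartEvent

@agent.events.subscribe
async def on_synthesis_start(event: TTSSynthesisStartEvent):
    print(f"Synthesis {event.synthesis_id} started (voice: {event.voice_id})")
    print(f"Text: {event.text}")
```
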
### TTSSynthesisCompleteEvent

Emitted when TTS synthesis finishes.

```python
from vision_agents.core.tts.events import TTSSynthesisCompleteEvent

@agent.events.subscribe
async def on_complete(event: TTSSynthesisCompleteEvent):
    print(f"Synthesis took {event.synthesis_time_ms}ms")
    print(f"Audio duration: {event.audio_duration_ms}ms")
```

| Field | Type | Description |
|---|---|---|
| synthesis_id | `str \| None` | Unique ID for this synthesis |
| text | `str \| None` | Text that was synthesized |
| total_audio_bytes | int | Total bytes of audio |
| synthesis_time_ms | float | Processing time |
| audio_duration_ms | `float \| None` | Resulting audio duration |
| chunk_count | int | Number of chunks produced |
| real_time_factor | `float \| None` | Synthesis speed vs real-time |

### TTSErrorEvent

Emitted when TTS encounters an error.

| Field | Type | Description |
|---|---|---|
| error | `Exception \| None` | The exception that occurred |
| error_code | `str \| None` | Error code identifier |
| context | `str \| None` | Additional context |
| text_source | `str \| None` | Text being synthesized |
| synthesis_id | `str \| None` | Synthesis identifier |
| is_recoverable | bool | Whether the error is recoverable |

### TTSConnectionEvent

Emitted when the TTS connection state changes.

| Field | Type | Description |
|---|---|---|
| connection_state | ConnectionState | New connection state |
| provider | `str \| None` | TTS provider name |
| details | `dict \| None` | Additional details |

## LLM Events

Events from language model interactions.

Import: `from vision_agents.core.llm.events import ...`

### LLMResponseCompletedEvent

Emitted when the LLM finishes a response.

```python
from vision_agents.core.llm.events import LLMResponseCompletedEvent

@agent.events.subscribe
async def on_response(event: LLMResponseCompletedEvent):
    print(f"Response: {event.text}")
    print(f"Model: {event.model}")
    print(f"Tokens: {event.input_tokens} in, {event.output_tokens} out")
    print(f"Latency: {event.latency_ms}ms")
```

| Field | Type | Description |
|---|---|---|
| text | str | Complete response text |
| original | Any | Raw response from provider |
| item_id | `str \| None` | Response item identifier |
| latency_ms | `float \| None` | Total request-to-response time |
| time_to_first_token_ms | `float \| None` | Time to first token (streaming) |
| input_tokens | `int \| None` | Input/prompt tokens used |
| output_tokens | `int \| None` | Output tokens generated |
| total_tokens | `int \| None` | Total tokens used |
| model | `str \| None` | Model identifier |

### LLMResponseChunkEvent

Emitted for each chunk during streaming responses.

| Field | Type | Description |
|---|---|---|
| delta | `str \| None` | Text delta for this chunk |
| content_index | `int \| None` | Index of content part |
| item_id | `str \| None` | Response item identifier |
| output_index | `int \| None` | Output index |
| sequence_number | `int \| None` | Sequence number |
| is_first_chunk | bool | Whether this is the first chunk |
| time_to_first_token_ms | `float \| None` | Time to this first chunk |

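For example, deltas can be printed as they stream in, with `is_first_chunk` used to measure responsiveness (a minimal sketch):

```python
from vision_agents.core.llm.events import LLMResponseChunkEvent

@agent.events.subscribe
async def on_chunk(event: LLMResponseChunkEvent):
    if event.is_first_chunk:
        print(f"\nFirst token after {event.time_to_first_token_ms}ms")
    if event.delta:
        # Stream the text delta to the console as it arrives
        print(event.delta, end="", flush=True)
```
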
### LLMRequestStartedEvent

Emitted when an LLM request begins.

| Field | Type | Description |
|---|---|---|
| request_id | str | Unique request identifier |
| model | `str \| None` | Model being used |
| streaming | bool | Whether streaming is enabled |

### LLMErrorEvent

Emitted when a non-realtime LLM error occurs.

| Field | Type | Description |
|---|---|---|
| error | `Exception \| None` | The exception |
| error_code | `str \| None` | Error code |
| context | `str \| None` | Additional context |
| request_id | `str \| None` | Request identifier |
| is_recoverable | bool | Whether the error is recoverable |

## Realtime LLM Events

Events specific to realtime LLM connections (like the OpenAI Realtime API).

Import: `from vision_agents.core.llm.events import ...`

### RealtimeConnectedEvent

Emitted when the realtime connection is established.

| Field | Type | Description |
|---|---|---|
| provider | `str \| None` | Provider name |
| session_id | `str \| None` | Session identifier |
| session_config | `dict \| None` | Session configuration |
| capabilities | `list[str] \| None` | Available capabilities |

### RealtimeDisconnectedEvent

Emitted when the realtime connection closes.

| Field | Type | Description |
|---|---|---|
| provider | `str \| None` | Provider name |
| session_id | `str \| None` | Session identifier |
| reason | `str \| None` | Disconnection reason |
| was_clean | bool | Whether disconnect was clean |

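Connect and disconnect events can be handled together with a union subscription, as described at the end of this page. A minimal sketch:

```python
from vision_agents.core.llm.events import (
    RealtimeConnectedEvent,
    RealtimeDisconnectedEvent,
)

@agent.events.subscribe
async def on_realtime_connection(
    event: RealtimeConnectedEvent | RealtimeDisconnectedEvent,
):
    if isinstance(event, RealtimeConnectedEvent):
        print(f"Realtime session {event.session_id} connected ({event.provider})")
    else:
        print(f"Realtime session closed: {event.reason} (clean: {event.was_clean})")
```
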
### RealtimeUserSpeechTranscriptionEvent

Emitted when user speech is transcribed by the realtime API.

```python
from vision_agents.core.llm.events import RealtimeUserSpeechTranscriptionEvent

@agent.events.subscribe
async def on_user_speech(event: RealtimeUserSpeechTranscriptionEvent):
    print(f"User said: {event.text}")
```

| Field | Type | Description |
|---|---|---|
| text | str | Transcribed user speech |
| original | Any | Raw event from provider |

### RealtimeAgentSpeechTranscriptionEvent

Emitted when agent speech is transcribed by the realtime API.

```python
from vision_agents.core.llm.events import RealtimeAgentSpeechTranscriptionEvent

@agent.events.subscribe
async def on_agent_speech(event: RealtimeAgentSpeechTranscriptionEvent):
    print(f"Agent said: {event.text}")
```

| Field | Type | Description |
|---|---|---|
| text | str | Transcribed agent speech |
| original | Any | Raw event from provider |

### RealtimeAudioInputEvent

Emitted when audio is sent to the realtime session.

| Field | Type | Description |
|---|---|---|
| data | `PcmData \| None` | Audio data sent |

### RealtimeAudioOutputEvent

Emitted when audio is received from the realtime session.

| Field | Type | Description |
|---|---|---|
| data | `PcmData \| None` | Audio data received |
| response_id | `str \| None` | Response identifier |

### RealtimeResponseEvent

Emitted when the realtime session provides a response.

| Field | Type | Description |
|---|---|---|
| text | `str \| None` | Response text |
| original | `str \| None` | Raw response |
| response_id | str | Response identifier |
| is_complete | bool | Whether response is complete |
| conversation_item_id | `str \| None` | Conversation item ID |

### RealtimeConversationItemEvent

Emitted for conversation item updates.

| Field | Type | Description |
|---|---|---|
| item_id | `str \| None` | Item identifier |
| item_type | `str \| None` | Type: "message", "function_call", "function_call_output" |
| status | `str \| None` | Status: "completed", "in_progress", "incomplete" |
| role | `str \| None` | Role: "user", "assistant", "system" |
| content | `list[dict] \| None` | Item content |

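For example, to watch for completed function-call items, using the `item_type` and `status` values documented above (a minimal sketch):

```python
from vision_agents.core.llm.events import RealtimeConversationItemEvent

@agent.events.subscribe
async def on_item(event: RealtimeConversationItemEvent):
    if event.item_type == "function_call" and event.status == "completed":
        print(f"Function call item {event.item_id} completed")
```
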
### RealtimeErrorEvent

Emitted when a realtime error occurs.

| Field | Type | Description |
|---|---|---|
| error | `Exception \| None` | The exception |
| error_code | `str \| None` | Error code |
| context | `str \| None` | Additional context |
| is_recoverable | bool | Whether the error is recoverable |

## Function Calling Events

Events for function calling / tool use.

Import: `from vision_agents.core.llm.events import ...`

### ToolStartEvent

Emitted when tool execution begins.

```python
from vision_agents.core.llm.events import ToolStartEvent

@agent.events.subscribe
async def on_tool_start(event: ToolStartEvent):
    print(f"Calling {event.tool_name}")
    print(f"Args: {event.arguments}")
```

| Field | Type | Description |
|---|---|---|
| tool_name | str | Name of the tool being called |
| arguments | `dict \| None` | Arguments passed to the tool |
| tool_call_id | `str \| None` | Unique call identifier |

### ToolEndEvent

Emitted when tool execution completes.

```python
from vision_agents.core.llm.events import ToolEndEvent

@agent.events.subscribe
async def on_tool_end(event: ToolEndEvent):
    if event.success:
        print(f"{event.tool_name} returned: {event.result}")
        print(f"Took {event.execution_time_ms}ms")
    else:
        print(f"{event.tool_name} failed: {event.error}")
```

| Field | Type | Description |
|---|---|---|
| tool_name | str | Name of the tool |
| success | bool | Whether execution succeeded |
| result | Any | Return value (on success) |
| error | `str \| None` | Error message (on failure) |
| tool_call_id | `str \| None` | Unique call identifier |
| execution_time_ms | `float \| None` | Execution duration |

## VLM Events

Events for vision/multimodal language models.

Import: `from vision_agents.core.llm.events import ...`

### VLMInferenceStartEvent

Emitted when a VLM (Vision Language Model) inference starts.

Event Type: `plugin.vlm_inference_start`

| Field | Type | Description |
|---|---|---|
| inference_id | str | Unique identifier for this inference |
| model | `str \| None` | Model identifier |
| frames_count | int | Number of frames to process |

### VLMInferenceCompletedEvent

Emitted when a VLM inference completes. Contains timing metrics, token usage, and detection counts.

Event Type: `plugin.vlm_inference_completed`

```python
from vision_agents.core.llm.events import VLMInferenceCompletedEvent

@agent.events.subscribe
async def on_vlm_complete(event: VLMInferenceCompletedEvent):
    print(f"VLM response: {event.text}")
    print(f"Processed {event.frames_processed} frames")
    print(f"Detected {event.detections} objects")
```

| Field | Type | Description |
|---|---|---|
| inference_id | `str \| None` | Unique identifier for this inference |
| model | `str \| None` | Model identifier |
| text | str | Generated text response |
| latency_ms | `float \| None` | Total time from request to complete response |
| input_tokens | `int \| None` | Number of input tokens (text + image tokens) |
| output_tokens | `int \| None` | Number of output tokens generated |
| frames_processed | int | Number of video frames processed |
| detections | int | Number of objects/items detected |

This event is used by `MetricsCollector` to record VLM metrics. See Telemetry for details.

### VLMErrorEvent

Emitted when a VLM error occurs.

Event Type: `plugin.vlm_error`

| Field | Type | Description |
|---|---|---|
| error | `Exception \| None` | The exception that occurred |
| error_code | `str \| None` | Error code if available |
| context | `str \| None` | Additional context about the error |
| inference_id | `str \| None` | ID of the failed inference |
| is_recoverable | bool | Whether the error is recoverable |

## Video Processor Events

Events from video processing plugins (`roboflow`, `ultralytics`, etc.).

Import: `from vision_agents.core.events import VideoProcessorDetectionEvent`

### VideoProcessorDetectionEvent

Emitted when a video processor detects objects in a frame.

```python
from vision_agents.core.events import VideoProcessorDetectionEvent

@agent.events.subscribe
async def on_detection(event: VideoProcessorDetectionEvent):
    print(f"Detected {event.detection_count} objects")
    print(f"Inference took {event.inference_time_ms}ms")
```

| Field | Type | Description |
|---|---|---|
| model_id | `str \| None` | Identifier of the model used |
| inference_time_ms | `float \| None` | Time taken for inference |
| detection_count | int | Number of objects detected |

This event is used by `MetricsCollector` to record video processing metrics. See Telemetry for details.

## OpenAI Plugin Events

Events specific to the OpenAI plugin.

Import: `from vision_agents.plugins.openai.events import ...`

### OpenAIStreamEvent

Emitted when OpenAI provides a streaming chunk.

| Field | Type | Description |
|---|---|---|
| chunk | Any | Raw streaming chunk from OpenAI |

## VAD Events

Voice Activity Detection events.

Import: `from vision_agents.core.vad.events import ...`

### VADSpeechStartEvent

Emitted when VAD detects the start of speech.

| Field | Type | Description |
|---|---|---|
| timestamp | datetime | When speech started |

### VADSpeechEndEvent

Emitted when VAD detects the end of speech.

| Field | Type | Description |
|---|---|---|
| timestamp | datetime | When speech ended |
| duration_ms | `float \| None` | Duration of speech segment |

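Start and end events can be combined to track speech segments. A minimal sketch using a union subscription:

```python
from vision_agents.core.vad.events import VADSpeechEndEvent, VADSpeechStartEvent

@agent.events.subscribe
async def on_vad(event: VADSpeechStartEvent | VADSpeechEndEvent):
    if isinstance(event, VADSpeechEndEvent):
        print(f"Speech ended after {event.duration_ms}ms")
    else:
        print(f"Speech started at {event.timestamp}")
```
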
### VADErrorEvent

Emitted when VAD encounters an error.

| Field | Type | Description |
|---|---|---|
| error | `Exception \| None` | The exception that occurred |
| error_code | `str \| None` | Error code if available |
| context | `str \| None` | Additional context |

## Turn Detection Events

Events for detecting when speakers start and stop talking.

Import: `from vision_agents.core.turn_detection.events import ...`

### TurnStartedEvent

Emitted when a speaker starts their turn.

Event Type: `plugin.turn_detection.turn_started`

```python
from vision_agents.core.turn_detection.events import TurnStartedEvent

@agent.events.subscribe
async def on_turn_start(event: TurnStartedEvent):
    print(f"Turn started (confidence: {event.confidence})")
```

| Field | Type | Description |
|---|---|---|
| participant | `Participant \| None` | Who started speaking |
| participant_id | `str \| None` | ID of the participant speaking |
| confidence | `float \| None` | Detection confidence (0.0-1.0) |
| custom | `dict \| None` | Additional metadata |

### TurnEndedEvent

Emitted when a speaker completes their turn.

Event Type: `plugin.turn_detection.turn_ended`

```python
from vision_agents.core.turn_detection.events import TurnEndedEvent

@agent.events.subscribe
async def on_turn_end(event: TurnEndedEvent):
    print(f"Turn ended after {event.duration_ms}ms")
    print(f"Silence: {event.trailing_silence_ms}ms")
```

| Field | Type | Description |
|---|---|---|
| participant | `Participant \| None` | Who stopped speaking |
| participant_id | `str \| None` | ID of the participant |
| confidence | `float \| None` | Detection confidence |
| duration_ms | `float \| None` | Duration of the turn in milliseconds |
| trailing_silence_ms | `float \| None` | Silence duration before turn end |
| custom | `dict \| None` | Additional metadata |
| eager_end_of_turn | bool | Early end detection flag |

This event is used by `MetricsCollector` to record turn detection metrics. See Telemetry for details.

## xAI Plugin Events

Events specific to the xAI plugin.

Import: `from vision_agents.plugins.xai.events import ...`

### XAIChunkEvent

Emitted for xAI streaming response chunks.

| Field | Type | Description |
|---|---|---|
| chunk | Any | Raw streaming chunk from xAI |

## Qwen Plugin Events

Events specific to the Qwen plugin.

Import: `from vision_agents.plugins.qwen.events import ...`

### QwenLLMErrorEvent

Emitted when the Qwen LLM encounters an error.

| Field | Type | Description |
|---|---|---|
| error | `Exception \| None` | The exception that occurred |
| error_code | `str \| None` | Error code if available |
| context | `str \| None` | Additional context |

## ConnectionState Enum

Used in connection events to indicate state.

Import: `from vision_agents.core.events import ConnectionState`

| Value | Description |
|---|---|
| DISCONNECTED | Not connected |
| CONNECTING | Connection in progress |
| CONNECTED | Successfully connected |
| RECONNECTING | Attempting to reconnect |
| ERROR | Connection error |

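The enum values are compared against the `connection_state` field of the STT and TTS connection events documented above. A minimal sketch:

```python
from vision_agents.core.events import ConnectionState
from vision_agents.core.tts.events import TTSConnectionEvent

@agent.events.subscribe
async def on_tts_connection(event: TTSConnectionEvent):
    if event.connection_state == ConnectionState.ERROR:
        print(f"TTS connection error ({event.provider}): {event.details}")
```
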
## Subscribing to Events

All events can be subscribed to using the `@agent.events.subscribe` decorator:

```python
@agent.events.subscribe
async def my_handler(event: EventType):
    # Handle the event
    pass
```

Subscribe to multiple event types using union types:

```python
from vision_agents.core.stt.events import STTPartialTranscriptEvent, STTTranscriptEvent

@agent.events.subscribe
async def my_handler(event: STTTranscriptEvent | STTPartialTranscriptEvent):
    print(f"Transcript: {event.text}")
```

Event handlers must be async functions. Non-async handlers will raise a `RuntimeError`.