UVC 1.5 Class Specification
Revision 1.5
August 9, 2012
USB Device Class Definition for Video Devices
Contributors
Hans van Antwerpen Cypress Semiconductor
Eric Luttmann Cypress Semiconductor
David Roh Dolby Labs
Choon Chng Google Inc.
Pawel Osciak Google Inc.
Ville-Mikko Rautio Google Inc.
Van Duros Immedia Semiconductor Inc.
Abdul R. Ismail Intel Corp.
Bradley Saunders Intel Corp.
Ygal Blum Jungo
Yoav Nissim Jungo
Jean-Michel Chardon Logitech Inc.
Olivier Lechenne Logitech Inc.
Geraud Mudry Logitech Inc.
Chandrashekhar Rao Logitech Inc.
Remy Zimmermann Logitech Inc.
Chris Yokum MCCI Corporation
Stephen Cooper Microsoft Corp.
Maribel Figuera Microsoft Corp.
Richard Webb Microsoft Corp.
Anand Ganesh Microsoft Corp.
David Goll Microsoft Corp.
Hiro Kobayashi Microsoft Corp.
Bertrand Lee Microsoft Corp.
Jeff Zhu Microsoft Corp.
Andrei Jefremov Microsoft Corp.
Tim Vlaar Point Grey Research
Mark Bohm SMSC
John Sisto SMSC
Will Harris Texas Instruments
Grant Ley Texas Instruments
Anshuman Saxena Texas Instruments
Paul E. Berg USB-IF
All product names are trademarks, registered trademarks, or service marks of their respective owners.
Revision History
Table of Contents
1 Introduction
   1.1 Purpose
   1.2 Scope
   1.3 Related Documents
   1.4 Document Conventions
      1.4.1 Notations
   1.5 Terms and Abbreviations
2 Functional Characteristics
   2.1 Video Interface Class
   2.2 Video Interface Subclass and Protocol
   2.3 Video Function Topology
      2.3.1 Input Terminal
      2.3.2 Output Terminal
      2.3.3 Camera Terminal
      2.3.4 Selector Unit
      2.3.5 Processing Unit
      2.3.6 Encoding Unit
      2.3.7 Extension Unit
   2.4 Operational Model
      2.4.1 Video Interface Collection
      2.4.2 VideoControl Interface
         2.4.2.1 Control Endpoint
         2.4.2.2 Status Interrupt Endpoint
         2.4.2.3 Hardware Trigger Interrupts
         2.4.2.4 Still Image Capture
         2.4.2.5 Optical Zoom vs Digital Zoom
      2.4.3 VideoStreaming Interface
         2.4.3.1 Stream Bandwidth Selection
         2.4.3.2 Video and Still Image Samples
            2.4.3.2.1 Sample Bulk Transfers
            2.4.3.2.2 Sample Isochronous Transfers
         2.4.3.3 Video and Still Image Payload Headers
         2.4.3.4 Stream Synchronization and Rate Matching
            2.4.3.4.1 Latency
            2.4.3.4.2 Clock Reference
            2.4.3.4.3 Presentation Time
         2.4.3.5 Dynamic Frame Interval Support
         2.4.3.6 Device Initiated Dynamic Format Change Support
         2.4.3.7 Data Format Classes
      2.4.4 Control Transfer and Request Processing
3 Descriptors
   3.1 Descriptor Layout Overview
   3.2 Device Descriptor
   3.3 Device_Qualifier Descriptor (deprecated)
   3.4 Configuration Descriptor
List of Tables
Table 2-1 Status Packet Format
Table 2-2 Status Packet Format (VideoControl Interface as the Originator)
Table 2-3 Status Packet Format (VideoStreaming Interface as the Originator)
Table 2-4 Summary of Still Image Capture Methods
Table 2-5 Format of the Payload Header
Table 2-6 Extended Fields of the Payload Header
Table 3-1 Standard Video Interface Collection IAD
Table 3-2 Standard VC Interface Descriptor
Table 3-3 Class-specific VC Interface Header Descriptor
Table 3-4 Input Terminal Descriptor
Table 3-5 Output Terminal Descriptor
Table 3-6 Camera Terminal Descriptor
Table 3-7 Selector Unit Descriptor
Table 3-8 Processing Unit Descriptor
Table 3-10 Extension Unit Descriptor
Table 3-11 Standard VC Interrupt Endpoint Descriptor
Table 3-12 Class-specific VC Interrupt Endpoint Descriptor
Table 3-13 Standard VS Interface Descriptor
Table 3-14 Class-specific VS Interface Input Header Descriptor
Table 3-15 Class-specific VS Interface Output Header Descriptor
Table 3-16 Payload Format Descriptor
Table 3-17 Defined Video Frame Descriptor Resources
Table 3-18 Still Image Frame Descriptor
Table 3-19 Color Matching Descriptor
Table 3-20 Standard VS Isochronous Video Data Endpoint Descriptor
Table 3-21 Standard VS Bulk Video Data Endpoint Descriptor
Table 3-22 Standard VS Bulk Still Image Data Endpoint Descriptor
Table 4-1 Set Request
Table 4-2 Get Request
Table 4-3 Defined Bits Containing Capabilities of the Control
Table 4-4 Interface Control Requests
Table 4-5 Power Mode Control
Table 4-6 Device Power Mode
Table 4-7 Request Error Code Control
Table 4-8 Unit and Terminal Control Requests
Table 4-9 Scanning Mode Control
Table 4-10 Auto-Exposure Mode Control
Table 4-11 Auto-Exposure Priority Control
Table 4-12 Exposure Time (Absolute) Control
Table 4-13 Exposure Time (Relative) Control
Table 4-14 Focus (Absolute) Control
Table 4-15 Focus (Relative) Control
Table 4-17 Focus, Auto Control
Table 4-18 Iris (Absolute) Control
Table 4-19 Iris (Relative) Control
List of Figures
Figure 2-3 Input Terminal Icon
Figure 2-4 Output Terminal Icon
Figure 2-5 Selector Unit Icon (2 input pins)
Figure 2-6 Processing Unit Icon
Figure 2-8 Extension Unit Icon
Figure 2-9 Stream Bandwidth Selection
Figure 2-10 Protocol Layering and Abstraction
Figure 2-11 A Payload Transfer
Figure 2-12 Sample Bulk Read (Multiple Transfers per Sample)
Figure 2-13 Sample Bulk Read (Single Transfer per Sample)
Figure 2-14 Sample Bulk Write (Single Transfer per Sample)
Figure 2-15 Sample Isochronous Transfer, IN endpoint
Figure 2-16 Sample Isochronous Transfer, OUT endpoint
Figure 2-17 Sample Isochronous Transfer, IN endpoint
Figure 2-18 Sample Isochronous Transfer, OUT endpoint
Figure 2-19 Control Transfer Example (Case 1)
Figure 2-20 Control Transfer Example (Case 2)
Figure 2-21 Control Transfer Example (Case 3)
Figure 2-22 Control Transfer Example (Case 4)
Figure 2-23 Control Transfer Example (Case 5)
Figure 3-1 Video Camera Descriptor Layout Example
Figure 4-5 Successful USB Isochronous Bandwidth Negotiation
Figure 4-6 Failed USB Isochronous Bandwidth Negotiation
Figure 4-7 Dynamic Stream Settings Modification while Streaming
1 Introduction
1.1 Purpose
This document describes the minimum capabilities and characteristics that a video streaming
device must support to comply with the USB Video Class specification.
It defines and standardizes video streaming functionality on the USB, and contains all necessary
information for a designer to build a USB-compliant device that incorporates video streaming
functionality. It specifies the standard and class-specific descriptors that must be present in each
USB video function. It further explains the use of class-specific requests that allow for full video
streaming control. Finally, it explains how devices can be compliant with multiple versions of
this specification to enable backwards compatibility.
Devices that conform to this specification will be referred to as USB Video Class devices.
1.2 Scope
The USB Device Class Definition for Video Devices applies to all devices or functions within
composite devices that are used to manipulate video and video-related functionality. This would
include devices such as desktop video cameras (or "webcams"), digital camcorders, analog video
converters, analog and digital television tuners, and still-image cameras that support video
streaming. This specification also applies to Video Devices that compress video using temporal
encoders.
Shall/Must
keywords indicating a mandatory requirement. Designers are required to implement all such
mandatory requirements.
Should
a keyword indicating flexibility of choice with a strongly preferred alternative. Equivalent to
the phrase “is recommended”.
1.4.1 Notations
The following notations are used in this specification and all associated video payload and
example documents.
SET_INTERFACE(n)
   Indicates a SET_INTERFACE request, as defined in Section 9.4.10 of the USB 2.0 and
   USB 3.0 specifications, where wValue = n.

Control_Name(request_type)
   Indicates a specific request_type issued to a specific Control_Name.

EU_*_CONTROL(request_type)
   Indicates a specific request_type issued to any Encoding Unit.
2 Functional Characteristics
The video function is located at the interface level in the device class hierarchy. It consists of a
number of interfaces grouping related pipes that together implement the interface to the video
function.
Video functions are addressed through their video interfaces. Each video function has a single
VideoControl (VC) interface and can have several VideoStreaming (VS) interfaces. The
VideoControl (VC) interface is used to access the device controls of the function whereas the
VideoStreaming (VS) interfaces are used to transport data streams into and out of the function.
The collection of the single VideoControl interface and the VideoStreaming interfaces that
belong to the same video function is called the Video Interface Collection (VIC). An Interface
Association Descriptor (IAD) is used to describe the Video Interface Collection.
In fact, for a video function to be part of this class, the only requirement is that it exposes one
VideoControl Interface. No further interaction with the function is mandatory, although most
functions in the video interface class will support one or more optional VideoStreaming
interfaces for consuming or producing one or more video data streams.
The Video Interface class code is assigned by the USB-IF. For details, see section A.1 "Video
Interface Class Code".
The assigned codes can be found in sections A.2, "Video Interface Subclass Codes" and A.3,
"Video Interface Protocol Codes" of this specification. All other subclass codes are unused and
reserved except code 0xFF, which is reserved for vendor-specific extensions.
Units provide the basic building blocks to fully describe most video functions. Video functions
are built by connecting together several of these Units. A Unit has one or more Input Pins and a
single Output Pin, where each Pin represents a cluster of logical data streams inside the video
function.
Units are wired together by connecting their I/O Pins according to the required topology. A
single Output Pin can be connected to one or more Input Pins (fan-out allowed). However, a
single Input Pin can only be connected to one Output Pin (fan-in disallowed). Loops or cycles
within the graph topology are not allowed.
In addition, the concept of Terminal is introduced. There are two types of Terminals. An Input
Terminal (IT) is an entity that represents a starting point for data streams inside the video
function. An Output Terminal (OT) represents an ending point for data streams. From the video
function’s perspective, a USB endpoint is a typical example of an Input Terminal or Output
Terminal. It either provides data streams to the video function (IT) or consumes data streams
coming from the video function (OT). Likewise, a Charge-Coupled Device (CCD) sensor built
into the video function is represented as an Input Terminal in the video function’s model.
Connection to a Terminal is made through its single Input Pin or Output Pin.
Input Pins of a Unit are numbered starting from one up to the total number of Input Pins on the
Unit. The Output Pin number is always one. Terminals have one Input or Output Pin that is
always numbered one.
The information traveling over I/O Pins is not necessarily of a digital nature. It is possible to use
the Unit model to describe fully analog or even hybrid video functions. The mere fact that I/O
Pins are connected together guarantees (by construction) that the protocol and format used
over these connections (analog or digital) are compatible on both ends.
Every Unit in the video function is fully described by its associated Unit Descriptor (UD). The
Unit Descriptor contains all necessary fields to identify and describe the Unit. Likewise, there is
a Terminal Descriptor (TD) for every Terminal in the video function. In addition, these
descriptors provide all necessary information about the topology of the video function. They
fully describe how Terminals and Units are interconnected.
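As an illustrative, non-normative sketch of how a host might consume such descriptors, the following C fragment parses the fixed part of an Input Terminal Descriptor (field names follow Table 3-4; the CS_INTERFACE and VC_INPUT_TERMINAL literals are assumptions to be verified against the descriptor tables and Appendix A of this specification):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants; verify against this specification's Appendix A. */
#define CS_INTERFACE      0x24
#define VC_INPUT_TERMINAL 0x02

/* Fixed leading fields of an Input Terminal Descriptor (see Table 3-4).
 * On the wire the descriptor is byte-packed and little-endian; extended
 * Terminals (e.g. the Camera Terminal) append further fields after these. */
typedef struct {
    uint8_t  bLength;            /* total size of the descriptor in bytes  */
    uint8_t  bDescriptorType;    /* CS_INTERFACE                           */
    uint8_t  bDescriptorSubtype; /* VC_INPUT_TERMINAL                      */
    uint8_t  bTerminalID;        /* unique, non-zero ID of this Terminal   */
    uint16_t wTerminalType;      /* kind of Terminal (camera, USB, ...)    */
    uint8_t  bAssocTerminal;     /* ID of the associated Output Terminal   */
    uint8_t  iTerminal;          /* string descriptor index, 0 if none     */
} input_terminal_desc;

/* Parse only the fixed part; returns 0 on success, -1 if the buffer does
 * not hold a well-formed Input Terminal descriptor header. */
int parse_input_terminal(const uint8_t *buf, size_t len,
                         input_terminal_desc *out)
{
    if (len < 8 || buf[0] < 8)
        return -1;
    if (buf[1] != CS_INTERFACE || buf[2] != VC_INPUT_TERMINAL)
        return -1;
    out->bLength            = buf[0];
    out->bDescriptorType    = buf[1];
    out->bDescriptorSubtype = buf[2];
    out->bTerminalID        = buf[3];
    out->wTerminalType      = (uint16_t)(buf[4] | (buf[5] << 8));
    out->bAssocTerminal     = buf[6];
    out->iTerminal          = buf[7];
    return 0;
}
```

Because bLength reports the full descriptor size, a parser can skip over the Terminal-specific extension fields it does not understand while still recovering the topology information (bTerminalID, bAssocTerminal) that links entities together.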
This specification describes the following types of standard Units and Terminals that are
considered adequate to represent most video functions available today and in the near future:
Input Terminal
Output Terminal
Selector Unit
Processing Unit
Encoding Unit
Extension Unit
Also, there are certain special Terminals that extend the functionality of the basic Input and
Output Terminals. These special Terminals support additional Terminal Descriptor fields and
Requests that are specific to the extended features these Terminals provide. These include:
Media Transport Terminal (defined in USB Device Class Definition for Video Media
Transport Terminal specification)
Camera Terminal
The types of Units defined in this specification could be extended in future revisions, or via
companion specifications. For example, a Tuner Unit could be added as a companion
specification to accommodate devices with TV Tuners.
Inside a Unit or Terminal, functionality is further described through Video Controls. A Control
typically provides access to a specific video property. Each Control has a set of attributes that
can be manipulated or that present additional information about the behavior of the Control.
These attributes might include:
Current setting
Minimum setting
Maximum setting
Resolution
Size
Default
Consider a Brightness Control inside a Processing Unit. By issuing the appropriate requests, the
Host software can obtain values for the Brightness Control’s attributes and, for instance, use
them to correctly display the Control in a User Interface. Setting the Brightness Control’s current
setting attribute allows the Host software to change the brightness of the video that is being
streamed.
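As a non-normative sketch, the setup packet for such a request could be built as follows. The request and selector codes are defined later in this specification (section 4 and Appendix A); the literal values below, and the entity ID used in the usage note, are assumptions to be checked against those tables:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed codes; verify against section 4 and Appendix A. */
#define GET_CUR               0x81
#define GET_MIN               0x82
#define GET_MAX               0x83
#define PU_BRIGHTNESS_CONTROL 0x02

/* Standard 8-byte USB setup packet. */
typedef struct {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
} setup_packet;

/* Build a class-specific GET request addressed to a Control inside a
 * Unit or Terminal reachable through the VideoControl interface. */
setup_packet vc_get_request(uint8_t bRequest, uint8_t selector,
                            uint8_t entity_id, uint8_t vc_interface,
                            uint16_t wLength)
{
    setup_packet sp;
    sp.bmRequestType = 0xA1; /* device-to-host, class, interface recipient */
    sp.bRequest      = bRequest;
    sp.wValue        = (uint16_t)(selector << 8);  /* selector in high byte */
    sp.wIndex        = (uint16_t)((entity_id << 8) | vc_interface);
    sp.wLength       = wLength;
    return sp;
}
```

For example, `vc_get_request(GET_MIN, PU_BRIGHTNESS_CONTROL, 5, 0, 2)` would request the minimum brightness setting, assuming (hypothetically) that the Processing Unit has ID 5, the VideoControl interface is interface 0, and the brightness value is two bytes wide.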
The ensemble of Unit Descriptors, Terminal Descriptors, and Video Controls provides a full
description of the video function to the Host. A generic class driver shall be able to fully control
the video function. When functionality is represented by Extension Units, the class driver shall
permit access to vendor-specific extensions via a pass-through mechanism. The implementation
details of such a class driver are beyond the scope of this specification.
An Input Terminal can represent inputs to the video function other than USB OUT endpoints. A
CCD sensor on a video camera or a composite video input is an example of such a non-USB
input. However, if the video stream is entering the video function by means of a USB OUT
endpoint, there is a one-to-one relationship between that endpoint and its associated Input
Terminal. The class-specific Output Header descriptor contains a field that holds a direct
reference to this Input Terminal (see section 3.9.2.2, “Output Header Descriptor”). The Host
needs to use both the endpoint descriptors and the Input Terminal descriptor to get a full
understanding of the characteristics and capabilities of the Input Terminal. Stream-related
parameters are stored in the endpoint descriptors. Control-related parameters are stored in the
Terminal descriptor.
The symbol for the Input Terminal is depicted in the following figure.
An Output Terminal can represent outputs from the video function other than USB IN endpoints.
A Liquid Crystal Display (LCD) screen built into a video device or a composite video out
connector are examples of such an output. However, if the video stream is leaving the video
function by means of a USB IN endpoint, there is a one-to-one relationship between that
endpoint and its associated Output Terminal. The class-specific Input Header descriptor contains
a field that holds a direct reference to this Output Terminal (see section 3.9.2.1, “Input Header
Descriptor”). The Host needs to use both the endpoint descriptors and the Output Terminal
descriptor to fully understand the characteristics and capabilities of the Output Terminal. Stream-
related parameters are stored in the endpoint descriptors. Control-related parameters are stored in
the Terminal descriptor.
The symbol for the Output Terminal is depicted in the following figure.
Pan
Roll
Tilt
Digital Windowing
Region of Interest
Support for any particular control is optional. The Focus control can optionally provide support
for an auto setting (with an on/off state). If the auto setting is supported and set to the on state,
the device will provide automatic focus adjustment, and read requests will reflect the
automatically set value. Attempts to programmatically set the Focus control when in auto mode
shall result in a protocol STALL with an error code of bRequestErrorCode = “Wrong State”.
When leaving Auto-Focus mode (entering manual focus mode), the control shall remain at the
value that was in effect just before the transition.
The symbol for the Selector Unit is depicted in the following figure.
User Controls
Brightness
Hue
Saturation
Sharpness
Gamma
Digital Multiplier (Zoom)
Auto Controls
White Balance Temperature
White Balance Component
Backlight Compensation
Contrast
Other
Gain
Power Line Frequency
Analog Video Standard
Analog Video Lock Status
Support for any particular control is optional. In particular, if the device supports the White
Balance function, it shall implement either the White Balance Temperature control or the White
Balance Component control, but not both. The User Controls indicate properties that are
governed by user preference and not subject to any automatic adjustment by the device. The
Auto Controls will provide support for an auto setting (with an on/off state). If the auto setting
for a particular control is supported and set to the on state, the device will provide automatic
adjustment of the control, and read requests to the related control will reflect the automatically
set value. Attempts to programmatically set the related control when in auto mode shall result in
a protocol STALL with an error code of bRequestErrorCode = “Wrong State”. When leaving an
auto mode, the related control shall remain at the value that was in effect just before the
transition.
The symbol for the Processing Unit is depicted in the following figure.
Support for the Encoding Unit control is optional and only applicable to devices with onboard
video encoders. The Select Layer control also allows control of individual streams for devices
that support simulcast transport of more than one stream. Individual payloads may specialize the
behavior of each of these controls to align with the feature set defined by the associated encoder,
e.g. H.264. This specialized behavior is defined in the associated payload specification.
The symbol for the Encoding Unit is depicted in the following figure.
EU
Figure 2-7 Encoding Unit Icon
Although a generic host driver will not be able to determine what functionality is implemented in
the Extension Unit, it shall report the presence of these extensions to vendor-supplied client
software, and provide a method for sending control requests from the client software to the Unit,
and receiving status from the unit.
The symbol for the Extension Unit is depicted in the following figure.
functions that co-reside in the same composite device. Several independent video functions can
exist in the same device. Interfaces that belong to the same video function are grouped into a
Video Interface Collection described by an Interface Association Descriptor. If the device
contains multiple independent video functions, there must be multiple Video Interface
Collections (and hence multiple Interface Association Descriptors), each providing full access to
their associated video function.
Video Interface Collections can be dynamic in devices that support multiple operating modes.
Because the VideoControl interface, together with its associated VideoStreaming interface(s),
constitutes the ‘logical interface’ to the video function, they must all come into existence at the
same moment in time. Changing the operating mode of a device causes the previous Video
Interface Collection to be replaced with a new Video Interface Collection, followed by re-
initialization of the host software. This specification does not provide a mechanism for the host
to initiate such a mode change, which is typically initiated via a physical switch on the device.
As stated earlier, video functionality is located at the interface level in the device class hierarchy.
The following sections describe the Video Interface Collection, containing a single VideoControl
interface and optional VideoStreaming interfaces, together with their associated endpoints that
are used for video function control and for data stream transfer.
A control endpoint for manipulating Unit and Terminal settings and retrieving the state of
the video function. This endpoint is mandatory, and the default endpoint 0 is used for this
purpose.
An interrupt endpoint for status returns. This endpoint is optional, but may be mandatory
under certain conditions. See section 2.4.2.2, "Status Interrupt Endpoint" for further
information.
The VideoControl interface is the single entry point to access the internals of the video function.
All requests that are concerned with the manipulation of certain Video Controls within the video
function’s Units or Terminals must be directed to the VideoControl interface of the video
function. Likewise, all descriptors related to the internals of the video function are part of the
class-specific VideoControl interface descriptor.
This specification defines a single alternate setting for the VideoControl interface, the default
alternate setting zero.
The interrupt packet is a variable-size data structure whose layout depends on the originator of the interrupt
status. The bStatusType and bOriginator fields contain information about the originator of the
interrupt. The bEvent field contains information about the event triggering the interrupt. If the
originator is the Video Control interface, the bSelector field reports the Control Selector of the
control that issued the interrupt. Any addressable entity inside a video function can be the
originator.
The contents of the bOriginator field must be interpreted according to the code in D3..0 of the
bStatusType field. If the originator is the VideoControl interface, the bOriginator field contains
the Terminal ID or Unit ID of the entity that caused the interrupt to occur. If the bOriginator
field is set to zero, the originator is the VideoControl interface itself, acting as a virtual entity. This can be used to report global
VideoControl interface changes to the Host. If the originator is a VideoStreaming interface, the
bOriginator field contains the interface number of the VideoStreaming interface. This scheme is
unambiguous because Units and Terminals are not allowed to have an ID of zero.
If the originator is the VideoControl interface, the bAttribute field indicates the type of Control
change.
The contents of the bEvent field must also be interpreted according to the code in D3..0 of the
bStatusType field. If the originator is a VideoStreaming interface, there are additional button
press events defined as described in the table below.
For all originators, there is a Control Change event defined. Controls that support this event will
trigger an interrupt when a host-initiated or externally-initiated control change occurs. The
interrupt shall only be sent when the operation corresponding to the control change is completed
by the device.
A Control shall support Control Change events if any of the following is true:
The Control state can be changed independently of host control.
The Control can take longer than 10ms from the start of the Data stage through the
completion of the Status stage when transferring to the device (SET_CUR operations).
If a control is required to support Control Change events, the event shall be sent for all
SET_CUR operations, even if the operation can be completed within the 10ms limit. The device
indicates support for the Control Change event for any particular control via the GET_INFO
attribute (see section 4.1.2, "Get Request"). Section 2.4.4, "Control Transfer and Request
Processing" describes in detail the interaction of Control Transfers (Requests) and Control
Change events.
When the originator is a VideoControl interface, the rest of the structure is:
Table 2-2 Status Packet Format (VideoControl Interface as the Originator)
Offset Field Size Value Description
2 bEvent 1 Number 0x00: Control Change
0x01 – 0xFF: Reserved
3 bSelector 1 Number Control Change
Report the Control Selector of the control that
issued the interrupt.
4 bAttribute 1 Number Specify the type of control change:
0x00: Control value change
0x01: Control info change
0x02: Control failure change
0x03: Control min change
0x04: Control max change
0x05 – 0xFF: Reserved
5 bValue n See control request description in section 4.2
"VideoControl Requests".
bAttribute: Description:
0x00: Equivalent to the result of a GET_CUR request
0x01: Equivalent to the result of a GET_INFO request
0x02: Equivalent to the result of a GET_CUR request on VC_REQUEST_ERROR_CODE_CONTROL
0x03: Equivalent to the result of a GET_MIN request
0x04: Equivalent to the result of a GET_MAX request
When the originator is a VideoStreaming interface, the rest of the structure is:
Table 2-3 Status Packet Format (VideoStreaming Interface as the Originator)
Offset Field Size Value Description
2 bEvent 1 Number All originators:
0x00 = Button Press
0x01 – 0xFF = Stream Error
3 bValue n Number Button Press: (n=1)
0x00: Button released
0x01: Button pressed
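Host software must therefore branch on the originator type before interpreting the rest of the packet. The following non-normative sketch shows one way to classify a received status packet; the originator codes in D3..0 of bStatusType are assumptions to be checked against Table 2-1:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed originator codes carried in D3..0 of bStatusType (Table 2-1). */
#define STATUS_VIDEOCONTROL   0x01
#define STATUS_VIDEOSTREAMING 0x02

typedef enum {
    EVT_MALFORMED,
    EVT_VC_CONTROL_CHANGE, /* bSelector/bAttribute follow in pkt[3..4] */
    EVT_VC_RESERVED,
    EVT_VS_BUTTON_RELEASE,
    EVT_VS_BUTTON_PRESS,
    EVT_VS_STREAM_ERROR
} status_event;

/* Classify a Status Interrupt packet according to its originator. */
status_event classify_status_packet(const uint8_t *pkt, size_t len)
{
    if (len < 3)
        return EVT_MALFORMED;

    uint8_t originator_type = pkt[0] & 0x0F; /* bStatusType, D3..0 */
    uint8_t bEvent          = pkt[2];

    if (originator_type == STATUS_VIDEOCONTROL) {
        /* pkt[1] (bOriginator) is a Unit or Terminal ID, or zero for
         * the VideoControl interface itself. */
        if (bEvent == 0x00 && len >= 5)
            return EVT_VC_CONTROL_CHANGE;
        return EVT_VC_RESERVED;
    }
    if (originator_type == STATUS_VIDEOSTREAMING) {
        /* pkt[1] (bOriginator) is the VS interface number. */
        if (bEvent == 0x00 && len >= 4)
            return pkt[3] ? EVT_VS_BUTTON_PRESS : EVT_VS_BUTTON_RELEASE;
        return EVT_VS_STREAM_ERROR;
    }
    return EVT_MALFORMED;
}
```

Note that the scheme is unambiguous precisely because, as stated above, Units and Terminals may not use ID zero, so a zero bOriginator can safely denote the interface itself.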
The device will have to specify whether it supports hardware triggers, and how the Host software
should respond to hardware trigger events. These are specified in the class-specific descriptors
within the relevant VideoStreaming interface. See section 3, "Descriptors".
Depending on the method used, the still image frame may have to be the same size as the video
frames that are being streamed. There are several supported methods of capturing the still image,
and the device will have to specify which method it supports in the class-specific descriptors
within the relevant VideoStreaming interface.
Method 1 - The host software will extract the next available video frame from the active video
pipe in the relevant VideoStreaming interface upon receiving the hardware trigger event. The
hardware does not interrupt or alter the video stream in this case. For this method, the still image
frame is always the same size as the video frames being streamed.
Method 2 – If the device supports higher-quality still images, it has the option of streaming still-
image-specific packets across the active video pipe. In this case, the host software will
temporarily suspend video streaming, select the optimal bandwidth alternate setting based on the
still probe/commit negotiation (subject to bandwidth availability), send a
VS_STILL_IMAGE_TRIGGER_CONTROL Set request with the "Transmit still image" option
(see section 4.3.1.4, "Still Image Trigger Control"), and prepare to receive the still image data.
The device transmits the still image data marked as such in the payload header (see section
2.4.3.2.2, "Sample Isochronous Transfers"). Once the complete still image is received, the host
software will then revert to the original alternate setting and resume video streaming.
Method 3 – This method enables the capture of higher-quality still images from a dedicated bulk
still image pipe. By doing so, the active streams would continue uninterrupted. There are two
cases covered by this method.
In the first case, the host software initiates the still image capture from the device. It does so by
issuing a VS_STILL_IMAGE_TRIGGER_CONTROL Set request with the "Transmit still image
via dedicated bulk pipe" option (see section 4.3.1.4, "Still Image Trigger Control"). In this case,
after issuing the request, the host will start receiving the still image from the bulk still image
endpoint of the relevant VideoStreaming interface. The device captures the high-quality still
image and transmits the data to the bulk still image endpoint. While transmission is occurring,
the bTrigger field of the VS_STILL_IMAGE_TRIGGER_CONTROL control shall remain as
"Transmit still image via dedicated bulk pipe". After transmission is complete, the device shall
reset the control to "Normal operation" and trigger a control change interrupt via the Status
Interrupt endpoint.
In the second case, the device initiates the still image transmission after detecting a hardware
trigger. When the hardware detects a button press, the Status Interrupt endpoint will issue an
interrupt originating from the relevant VideoStreaming interface. If the bTriggerUsage field of
the selected Format descriptor is set as initiating still image capture, the device shall set the
bTrigger field of the VS_STILL_IMAGE_TRIGGER_CONTROL control to “Transmit still
image via dedicated bulk pipe”. Upon receiving the interrupt, the host software should begin
receiving the still image data captured by the device. After transmission is complete, the
device shall reset the bTrigger field to “Normal operation”. The host software can abort data
transmission by issuing a VS_STILL_IMAGE_TRIGGER_CONTROL request with the “Abort
still image transmission” option. In either case, the device shall trigger a control change interrupt
via the Status Interrupt endpoint.
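The device-side bTrigger sequencing for Method 3 can be sketched as a small model (the bTrigger values follow section 4.3.1.4, but the class and method names here are illustrative, not part of the specification):

```python
# Illustrative model of the Method 3 bTrigger sequence.
TRIGGER_NORMAL = 0x00         # "Normal operation"
TRIGGER_STILL = 0x01          # "Transmit still image" (Method 2)
TRIGGER_STILL_BULK = 0x02     # "Transmit still image via dedicated bulk pipe"
TRIGGER_ABORT = 0x03          # "Abort still image transmission"

class StillImageDevice:
    def __init__(self):
        self.b_trigger = TRIGGER_NORMAL
        self.interrupts = []  # models Status Interrupt endpoint activity

    def start_bulk_still(self):
        # Entered either by a host SET_CUR request or, when bTriggerUsage
        # selects still image capture, by a hardware button press.
        self.b_trigger = TRIGGER_STILL_BULK

    def finish_transmission(self):
        # After the complete still image has been sent on the bulk still
        # pipe, reset the control and raise a control change interrupt.
        self.b_trigger = TRIGGER_NORMAL
        self.interrupts.append("VS_STILL_IMAGE_TRIGGER_CONTROL changed")

dev = StillImageDevice()
dev.start_bulk_still()
assert dev.b_trigger == TRIGGER_STILL_BULK  # stays set while transmitting
dev.finish_transmission()
```

The key invariant shown is that bTrigger remains at "Transmit still image via dedicated bulk pipe" for the whole transmission and only returns to "Normal operation" together with the control change interrupt.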
The following table summarizes endpoint usage for the various methods of still image capture.
Table 2-4 Summary of Still Image Capture Methods

           Isochronous video data pipe    Bulk video data pipe
Method 1   1 Isochronous (Video)          1 Bulk (Video)
Method 2   1 Isochronous (Video/Still)    1 Bulk (Video/Still)
Method 3   1 Isochronous (Video) and      1 Bulk (Video) and
           1 Bulk (Still)                 1 Bulk (Still)
There is a one-to-one relationship between the VideoStreaming interface and the single data
stream related to the endpoint.
A VideoStreaming interface with isochronous endpoints must have alternate settings that can be
used to change certain characteristics of the interface and underlying endpoint(s). A typical use
of alternate settings is to provide a way to change the bandwidth requirements an active
isochronous pipe imposes on the USB. All devices that transfer isochronous video data must
incorporate a zero-bandwidth alternate setting for each VideoStreaming interface that has an
isochronous video endpoint, and it must be the default alternate setting (alternate setting zero). A
device offers to the Host software the option to temporarily relinquish USB bandwidth by
switching to this alternate setting. The zero-bandwidth alternate setting does not contain a
VideoStreaming isochronous data endpoint descriptor.
A VideoStreaming interface containing a bulk endpoint for streaming shall support only alternate
setting zero. Additional alternate settings containing bulk endpoints are not permitted in a device
that is compliant with the Video Class specification. This restriction does not prohibit the mix of
bulk and isochronous endpoints when the bulk endpoints are used solely for Still Image Transfer
Method 3. In that case, each alternate setting will include the descriptors for both an isochronous
endpoint and a bulk endpoint.
For device implementers, determining the number of alternate settings to provide, and the
maximum packet size for the video data endpoint in each alternate setting, is implementation
dependent; it depends on the bandwidth usage across the range of video parameter combinations
that the VideoStreaming interface is capable of supporting.
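As an illustration of that sizing exercise, a host might select the lowest-bandwidth alternate setting that can still carry the negotiated payload. The sketch below assumes a high-speed bus (8000 microframes per second) and a hypothetical set of alternate settings; the function name and numbers are illustrative:

```python
import math

def pick_alt_setting(alt_settings, frame_bytes, fps, microframes_per_second=8000):
    """Pick the smallest-capacity alternate setting whose per-microframe
    capacity covers the stream's average data rate (high-speed USB).
    alt_settings: list of (bAlternateSetting, bytes_per_microframe)."""
    needed = math.ceil(frame_bytes * fps / microframes_per_second)
    for alt, capacity in sorted(alt_settings, key=lambda a: a[1]):
        if capacity >= needed:
            return alt
    raise ValueError("no alternate setting has sufficient bandwidth")

# Hypothetical interface offering 1x, 2x and 3x 1024-byte transactions:
alts = [(1, 1024), (2, 2048), (3, 3072)]
# 640x480 YUY2 (2 bytes/pixel) at 30 fps needs 2304 bytes per microframe:
alt = pick_alt_setting(alts, 640 * 480 * 2, 30)
```

A real implementation would also account for payload header overhead and burst patterns; the point here is only that each alternate setting trades packet size against reserved bus bandwidth.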
(Figure: the USB bandwidth allocated to the stream is matched to the function bandwidth the stream requires.)
The optimal allocation of the USB bandwidth to match the function’s bandwidth requirement is
achieved via negotiation between the host and the device.
See section 4.3.1.1, "Video Probe and Commit Control" for a complete description of the
negotiation process.
The negotiation process allows the host to provide preferred stream parameters to the device,
while the device selects the best combination of streaming parameters and reports the maximum
bandwidth usage for those settings. The host will use the bandwidth information to identify the
optimal alternate interface. The device is responsible for choosing the live streaming parameters
once the bandwidth is allocated. These parameters may differ from those originally agreed upon
during the negotiation process; however, during negotiation the host provides hints to the device
indicating its preferred way to choose the live stream parameters.
Once bandwidth has been allocated and streaming started, further parameter negotiation between
the host and the device can be performed without disturbing the current stream. Streaming
parameters are set as a group so that the function will have all information available while it
attempts to determine a working set.
Still image Method 2 uses a similar mechanism (see section 2.4.2.4, “Still Image Capture”).
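The negotiation above can be sketched by modeling the device side of a PROBE round trip as a function that adjusts the host's proposal (the field names match the Probe/Commit layout of section 4.3.1.1, but the capability numbers and helper names are purely illustrative):

```python
# Illustrative device capability model (frame intervals in 100-ns units).
SUPPORTED_INTERVALS = [333333, 400000, 666666]   # 30, 25 and 15 fps
MAX_PAYLOAD_FOR_INTERVAL = {333333: 3072, 400000: 2600, 666666: 1600}

def device_probe(requested_interval):
    """Device side of a PROBE SET_CUR/GET_CUR round trip: snap the host's
    preferred frame interval to a supported value and report the resulting
    maximum payload transfer size (the bandwidth requirement)."""
    chosen = min(SUPPORTED_INTERVALS, key=lambda i: abs(i - requested_interval))
    return {"dwFrameInterval": chosen,
            "dwMaxPayloadTransferSize": MAX_PAYLOAD_FOR_INTERVAL[chosen]}

# Host proposes ~28 fps (357143 x 100 ns); the device snaps to 30 fps and
# reports its worst-case payload size, which the host then uses to select
# an alternate setting before issuing the COMMIT SET_CUR.
reply = device_probe(357143)
committed = reply
```

The essential flow is: host SET_CUR(PROBE) with preferences, device adjusts, host GET_CUR(PROBE) to read the adjusted set, then host SET_CUR(COMMIT) once the reported bandwidth fits an available alternate setting.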
A single video sample may require multiple class-defined Payload Transfers. Conversely, there
may be one or more video samples within a single Payload Transfer. In the latter case, there must
be an integral number of fixed size samples within each Payload Transfer.
The VideoStreaming endpoint(s) encapsulate data with the class-defined Payload Header. This
encapsulation is identical for Payload Transfers on both isochronous and bulk endpoint types,
and applies to both the streaming and still image endpoints.
The following block diagram details the protocol layering and abstraction used in Payload
Transfers.
(Block diagram: protocol layering for Payload Transfers, from the Video Codec at the top, through the video sample handler and the bulk/isochronous handlers, down to USB transfers.)
1. I/O Request Packet (IRP) requests from the client to the USB system software result in USB
transfers.
2. In response to IRP completion, the host software forwards the data in the form of payload
transfers. The bulk and isochronous handlers hide the transfer type differences from the
upper layers of the protocol stack.
3. The video sample handler accumulates the individual payload transfers to form a sample
transfer.
A Payload Transfer is composed of the class-defined payload header (see section 2.4.3.3 "Video
and Still Image Payload Headers") followed by the format-specific payload data.
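For example, a receiver can parse the class-defined payload header as follows. The bmHeaderInfo bit positions and the 4-byte PTS / 6-byte SCR field sizes follow section 2.4.3.3; the function name is illustrative:

```python
def parse_payload_header(buf):
    """Parse a UVC payload header (section 2.4.3.3): bHeaderLength,
    bmHeaderInfo, then optional PTS (4 bytes) and SCR (6 bytes)."""
    length = buf[0]                     # bHeaderLength
    info = buf[1]                       # bmHeaderInfo
    fields = {
        "FID": bool(info & 0x01),       # D0: Frame ID toggle
        "EOF": bool(info & 0x02),       # D1: End of Frame
        "ERR": bool(info & 0x40),       # D6: Error bit
        "EOH": bool(info & 0x80),       # D7: End of Header
    }
    offset = 2
    if info & 0x04:                     # D2: PTS present
        fields["PTS"] = int.from_bytes(buf[offset:offset + 4], "little")
        offset += 4
    if info & 0x08:                     # D3: SCR present
        fields["SCR"] = buf[offset:offset + 6]
        offset += 6
    assert offset == length, "bHeaderLength must match the present fields"
    return fields, buf[length:]         # header fields and the payload data

# Header with FID set and a PTS of 0x11223344 (little-endian), length 6:
hdr = bytes([6, 0b00000101, 0x44, 0x33, 0x22, 0x11]) + b"pixels"
fields, data = parse_payload_header(hdr)
```

Because the optional fields are packed in bit order, the PTS offset is fixed only relative to the flags that precede it, which is why the specification calls its offset "variable".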
(Figure: a sequence of IN transactions (DATA0/DATA1), each carrying a Payload Header followed by Payload Data; successive Payload Transfers accumulate into Video Samples.)
Figure 2-15 gives an example of a High Speed/High Bandwidth transfer over an IN endpoint.
(Figure 2-15: in each microframe, an SOF packet is followed by payload transactions, each consisting of a Payload Header and Payload Data.)
Figure 2-16 gives an example of a High Speed/High Bandwidth transfer over an OUT endpoint.
(Figure 2-16: in each microframe, an SOF packet is followed by payload transactions, each consisting of a Payload Header and Payload Data.)
Figure 2-17 gives an example of a Full or High Speed transfer over an IN endpoint.
(Figure 2-17: each (micro)frame begins with an SOF packet followed by an IN token and a DATA0 packet carrying a Payload Header and Payload Data; payloads across successive (micro)frames form a Video Sample.)
Figure 2-18 gives an example of a Full or High Speed transfer over an OUT endpoint.
(Figure 2-18: each (micro)frame begins with an SOF packet followed by an OUT token and a DATA0 packet carrying a Payload Header and Payload Data.)
The following fields may or may not be included in the header, depending on the bits set in the
bmHeaderInfo field above. When present, these fields appear in the order in which they are
specified in the bmHeaderInfo bitmap, least significant bit first. Because the header itself might
be extended in the future, the offset of dwPresentationTime is also variable. The device indicates
whether it supports these fields in the Payload Format Descriptor within the class-specific
VideoStreaming descriptor. See section 3.9.2.3 "Payload Format Descriptors".
Table 2-6 Extended Fields of the Payload Header

Offset    Field               Size  Value   Description
Variable  dwPresentationTime  4     Number  Presentation Time Stamp (PTS). The source clock
                                            time in native device clock units when the raw
                                            frame capture begins. This field may be repeated
                                            for multiple payload transfers comprising a
                                            single video frame, with the restriction that
                                            the value shall remain the same throughout that
                                            video frame.
Use of these time information fields is appropriate when all of the following conditions apply:
- The device has multiple video and/or audio source functions and is sending audio and
  video streams to the host.
- The video and/or audio streams are interrelated and therefore need to be kept
  synchronized.
- The stream format in use does not already contain timestamp and clock reference
  information (MPEG-2 TS is an example of a format that contains this information).
- The sample is part of a video frame (and not a still image frame).
For temporally encoded payloads, the dwPresentationTime and dwSourceClock fields may be
required for all video frames. See the appropriate payload specification for details.
These time information fields allow the host software to reconstruct the source clock to support
high-quality synchronization between separate data pipes (audio, video, etc.) and rate matching
between the data source and sink, as discussed in the following section.
2.4.3.4.1 Latency
The media source is required to report its internal latency (delay from data acquisition to data
delivery on the bus). This latency reflects the lag introduced by any buffering, compression,
decompression, or processing done by the stream source. Without latency information for each
stream, a media sink (or rendering device) cannot properly correlate the presentation times of
each stream.
In the case of a video source, this means that the source must guarantee that the portion of a
sample fully acquired as of SOFn (Start Of Frame n) will have been completely sent to the bus as
of SOF(n+wDelay), where latency (wDelay) is the source’s internal delay expressed in number of
USB frames (milliseconds). For high-speed endpoints, the resolution increases to 125
microseconds, but the delay will continue to be expressed in number of USB frames. Every
VideoStreaming interface
must report this latency value. See the description of the wDelay parameter in section 4.3.1.1,
"Video Probe and Commit Controls". By following these rules, phase jitter is limited to ±1
millisecond. It is up to the video sink to synchronize streams by scheduling the rendering of
samples at the correct moment, taking into account the internal delays of all media streams being
rendered.
To understand the problem of clocks running at slightly different rates, consider the following
example. For simplicity, assume that video buffers can be filled instantaneously, and that there is
one buffer available to be filled at any given time within the video frame interval. Also assume
that the two crystals governing the source and rendering clocks operate with 100ppm (parts per
million) accuracy. The accuracy value is a ratio that can be applied such that for every frame, the
clock will drift by a fraction of the frame that is equal to the ratio. In other words, two clocks
with accuracy of 100ppm could have a worst case drift relative to each other of 1/5,000th of a
frame (two clocks at opposite extremes of their valid operating range for a cumulative error ratio
of 2 * 100/1,000,000). Therefore, a frame glitch will occur once every 5,000 frames. At a frame
rate of 30 fps, this would equate to a glitch every 166.67 seconds. At a frame rate of 60 fps, it’s
worse, with one glitch every 83.3 seconds.
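The arithmetic in this example can be checked directly (a worked calculation illustrating the text above, not part of the specification):

```python
def glitch_interval_seconds(ppm_source, ppm_sink, fps):
    """Worst-case interval between frame glitches for two free-running
    clocks with the given accuracies (in parts per million)."""
    # Cumulative drift per frame as a fraction of a frame interval:
    drift_per_frame = (ppm_source + ppm_sink) / 1_000_000
    frames_per_glitch = 1 / drift_per_frame     # 5,000 frames at 2 * 100 ppm
    return frames_per_glitch / fps

# Two 100 ppm clocks at opposite extremes of their operating range:
at_30fps = glitch_interval_seconds(100, 100, 30)
at_60fps = glitch_interval_seconds(100, 100, 60)
```

With two 100 ppm clocks this reproduces the figures in the text: a glitch every 5,000 frames, i.e. roughly every 166.67 seconds at 30 fps and every 83.3 seconds at 60 fps.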
Frame glitches can be postponed, but not avoided, by adding additional buffers to hold video
frames before they are rendered. If the source clock is running slower than the rendering clock,
the buffer underrun could only be postponed by letting the extra buffers fill to a certain threshold
before rendering, resulting in unacceptable latency. Once the first glitch occurs, the extra buffers
are effectively useless, since the behavior will degrade to the single-buffer case from that point
onward.
This specification assumes that in all cases, the media sink has no control over the media source
clock, and that the source and sink do not "slave" to a common clock (the bus clock lacking
sufficient resolution). Also, due to cost constraints, additional isochronous endpoints to
communicate clock rate information will not be used. Therefore, this specification requires that a
video stream include clock reference information that can be used to adjust the rendering clock
rate. The clock reference information may be encapsulated in a transport stream, or it may be
provided via an optional field in each payload header. This field becomes required in the latter
case.
Because such streams vary in data rate with compression and encoding, they are not considered
fixed-rate streams and will require timestamps on the samples.
After bus bandwidth for the video data pipe of the corresponding VideoStreaming interface has
been allocated and streaming has commenced, the data source may dynamically vary the frame
interval (and the corresponding frame rate), as long as the new frame interval does not require
greater bus bandwidth than what was originally allocated. The data sink would determine the
new frame interval based on the Presentation Time Stamp (PTS) information included in the
video payload headers.
The device indicates its support for dynamic format change events through the bmInfo field of
the VideoStreaming Input Header. See section 3.9.2.1 "Input Header Descriptor".
When a dynamic format change event occurs, the following steps take place:
1. The device detects the dynamic format change (while streaming is occurring).
2. The device begins sending empty data payloads to the host with the Error bit set in the
   video stream payload header.
3. The device sets the Stream Error Code Control to "Format Change" (see section 4.3.1.7
   "Stream Error Code Control").
4. The host queries the new stream state through a VS_PROBE_CONTROL request with
   the GET_CUR attribute (see 4.3.1.1, “Video Probe and Commit Controls”).
5. If the new format is acceptable to the host, it issues a VS_COMMIT_CONTROL request
   with the SET_CUR attribute and, if necessary, reallocates the USB bandwidth through an
   alternate interface selection standard request. If the new format is not acceptable, the host
   negotiates a new format with the stream PROBE/COMMIT controls.
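The host's side of these steps can be sketched as a reaction to each received payload. The Error-bit and control-selector behavior follows sections 2.4.3.3 and 4.3.1.7, while the callable parameters stand in for the real control requests and are purely illustrative:

```python
FORMAT_CHANGE = "Format Change"   # Stream Error Code value (section 4.3.1.7)

def handle_payload(payload_err_bit, get_stream_error_code, probe_get_cur,
                   host_accepts, commit_set_cur, renegotiate):
    """Host reaction to one payload while streaming. The callables model
    GET/SET control requests to the VideoStreaming interface."""
    if not payload_err_bit:
        return "streaming"                 # normal payload, nothing to do
    if get_stream_error_code() != FORMAT_CHANGE:
        return "other-error"               # some other stream error
    new_params = probe_get_cur()           # VS_PROBE_CONTROL, GET_CUR
    if host_accepts(new_params):
        commit_set_cur(new_params)         # VS_COMMIT_CONTROL, SET_CUR
        return "committed"
    renegotiate()                          # fresh PROBE/COMMIT negotiation
    return "renegotiated"

result = handle_payload(
    payload_err_bit=True,
    get_stream_error_code=lambda: FORMAT_CHANGE,
    probe_get_cur=lambda: {"dwFrameInterval": 666666},
    host_accepts=lambda p: True,
    commit_set_cur=lambda p: None,
    renegotiate=lambda: None)
```

This mirrors the numbered steps: the Error bit in the payload header is the in-band signal, and the Stream Error Code Control disambiguates a format change from other error conditions.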
Frame-based video formats – These video formats require the frame/sample boundary
information to be transmitted out-of-band. Examples of such formats are uncompressed
video (formatted in various YUV variants), MJPEG, and DV. For these formats, the FID
(and optionally EOF) bits in the UVC payload headers must be supported.
Stream-based video formats – These video formats have the frame/sample boundary
information transmitted in-band. Examples of such formats are MPEG-2 TS, MPEG-2 PS
and MPEG-1 system streams. For these formats, the FID and EOF bits are optional. If
used, the bits allow the sender to identify codec-specific segment boundaries within the
stream. The receiver would typically use this information to provide data to a decoder
with lower latency than would be possible if buffer fullness alone was used to trigger
buffer completion (see section 4.3.1.1, “Video Probe and Commit Controls”).
Temporally Encoded video formats – While these video formats have the frame/sample
boundary information transmitted in-band, they are often managed as frames or sub-
frames by the host. The EOF and EOS bits are required to indicate these boundaries to
the host so it may generate time stamps and trigger buffer completion on these
boundaries. Examples of temporally encoded video formats are H.264 and VP8.
The following is determined by the format class under which the video format is classified:
- The default Incoming/Outgoing data processing algorithm
- The bit fields supported by default in the UVC payload header (BFH[0])
Control transfers minimally have two transaction stages: Setup and Status. A control transfer
may optionally contain a Data stage between the Setup and Status stages. The Setup stage
contains all information necessary to address a particular entity, specify the desired operation,
and prepare for an optional Data stage. A Data stage can be host-to-device (OUT transactions),
or device-to-host (IN transactions), depending on the direction and operation specified in the
Setup stage via the bmRequestType and bRequest fields.
In the context of the Video Class specification, SET_CUR requests will always involve a Data
stage from host to device, and GET_* requests will always involve a Data stage from device to
host. Although none are defined currently, an exception to this rule would be a SET_CUR
request where the bRequest field contains all information necessary to place the device into a
known state. However, “toggle” requests without a Data stage are explicitly disallowed.
The device shall use protocol STALL (not function stall) during the Data or Status stages if the
device is unable to complete the Control transfer (see section 8.5.3.4 of the USB Specification
Revision 2.0). Reasons for protocol STALL include unsupported operations, invalid target entity,
invalid control selector, unexpected Data length, or invalid Data content. The device shall update
the value of Request Error Code Control, and the host may use that control to determine the
reason for the protocol STALL (see section 4.2.1.2 "Request Error Code Control"). The device
must not NAK or STALL the SETUP transaction.
Typically, the host will serialize Control Transfers, meaning that the next Setup stage will not
begin until the previous Status stage has completed. However, in situations where the bus has
experienced errors, a Setup transaction may be sent before the completion of a previous control
transfer. The device must abandon the previous control transfer.
Due to this command serialization, it is important that the duration of control transfers (from
Setup stage through Status stage) be kept as short as possible. For this reason, as well as the
desire to avoid polling for device status, this specification defines an interrupt status mechanism
to convey status changes independently of the control transfers that caused the state change. This
mechanism is described in section 2.4.2.2, "Status Interrupt Endpoint". Any control that requires
more than 10ms to respond to a SET_CUR request (asynchronous control), or that can change
independently of any external SET_CUR request (Autoupdate control), must send a Control
Change status interrupt. These characteristics will be reflected in the GET_INFO response for
that control (see 4.1.2, “Get Request”).
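A host can inspect the GET_INFO bitmap to decide whether a Control Change interrupt should be expected after a SET_CUR. The sketch below assumes the Autoupdate and Asynchronous capabilities are reported in bits D3 and D4 of the GET_INFO response (per section 4.1.2); the function name is illustrative:

```python
def expects_control_change_interrupt(info):
    """True when a control is Autoupdate (D3) or Asynchronous (D4), i.e.
    when state changes are reported via the Status Interrupt endpoint."""
    AUTOUPDATE = 1 << 3
    ASYNCHRONOUS = 1 << 4
    return bool(info & (AUTOUPDATE | ASYNCHRONOUS))

sync_control = 0b00000011    # supports GET (D0) and SET (D1) only
async_control = 0b00010011   # additionally Asynchronous (D4)
```

For a synchronous control the host can treat the Status stage as the end of the operation; for the other two kinds it must wait for the interrupt to learn the final outcome.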
In the case of a SET_CUR request with valid parameters to an Asynchronous Control, the
Control Transfer operation shall enter the Status stage immediately after receiving the data
transferred during the Data stage. Once the Status stage has successfully completed, the device
shall eventually send a Control Change Interrupt that will reflect the outcome of the request:
If the request succeeded, the Control Change Interrupt will advertise the new value (see
section 2.4.2.2 “Status Interrupt Endpoint”).
If the request could not be executed, the device shall send a Control Change Interrupt
using the Control Failure Change mechanism to describe the reason for the failure (see
Table 2-1 in section 2.4.2.2 “Status Interrupt Endpoint” and Figure 2-23 in section 2.4.4
“Control Transfer and Request Processing”).
The amount of time between the end of a successful Status stage and the Control Change
interrupt is implementation specific. For instance, a tape transport might take 3-5 seconds to
completely change state, so the Control Change interrupt would be sent within 3-5 seconds.
The following flow diagrams show the Setup, Data and Status stages of SET_CUR Control
Transfers for controls supporting one of the two legal bit combinations with the D1 (SET) bit
enabled. These are described because they show the relationship between a SET_CUR request
and the resulting state change.
(Flow diagrams: for a control supporting SET/GET, the state change completes within 10 ms of the SET_CUR Status stage. For a control supporting SET/GET/Interrupt, the Status stage completes within 10 ms but the state change takes longer than 10 ms; while the change is pending, GET_CUR and SET_CUR requests to the same control are STALLed during their Data stage, requests to other controls complete normally, and a busy device STALLs the Data stage of a new SET_CUR. Each sequence ends with a Control Change interrupt once the state change completes.)
3 Descriptors
Descriptors are used by USB devices to report their attributes. A descriptor is a data structure
with a defined format. For information, see section 9.5 Descriptors of USB Specification
Revision 2.0.
The following sections describe the standard and class-specific USB descriptors for the Video
Interface Class.
For devices that contain a video function that only exposes a VideoControl Interface, the device
descriptor must indicate that class information is to be found at the interface level. Therefore, the
bDeviceClass field of the device descriptor must contain zero so that enumeration software
looks down at the interface level to determine the Interface Class. The bDeviceSubClass and
bDeviceProtocol fields must be set to zero.
Devices that expose one or more Video Interface Collections also indicate that class information
is to be found at the interface level. However, since the device uses an Interface Association
Descriptor in order to describe the Video Interface Collection, it must set the bDeviceClass,
bDeviceSubClass and bDeviceProtocol fields to 0xEF, 0x02 and 0x01, respectively. This set of
class codes is defined as the Multi-interface Function Class codes.
All other fields of the device descriptor must comply with the definitions in section 9.6.1
"Device" of the appropriate USB specification (USB Specification Revision 2.0 or USB
Specification Revision 3.0). There is no class-specific device descriptor.
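For instance, the first bytes of a device descriptor for a device exposing a Video Interface Collection would be laid out as below (a sketch; the vendor/product IDs, string indices and bcdDevice value are placeholders, and the field layout follows section 9.6.1 of the USB 2.0 specification):

```python
import struct

# Standard device descriptor (18 bytes, little-endian, USB 2.0 sec. 9.6.1).
device_descriptor = struct.pack(
    "<BBHBBBBHHHBBBB",
    18,       # bLength
    0x01,     # bDescriptorType: DEVICE
    0x0200,   # bcdUSB: USB 2.00
    0xEF,     # bDeviceClass: Miscellaneous (Multi-interface Function)
    0x02,     # bDeviceSubClass: Common Class
    0x01,     # bDeviceProtocol: Interface Association Descriptor
    64,       # bMaxPacketSize0
    0x0000,   # idVendor (placeholder)
    0x0000,   # idProduct (placeholder)
    0x0100,   # bcdDevice (placeholder)
    1,        # iManufacturer (placeholder)
    2,        # iProduct (placeholder)
    0,        # iSerialNumber
    1)        # bNumConfigurations
```

A device with only a VideoControl interface and no Video Interface Collection would instead set bytes 4 through 6 (bDeviceClass, bDeviceSubClass, bDeviceProtocol) to zero.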
If the VideoControl interface is part of a Video Interface Collection, the iFunction field in the
IAD and the iInterface field in the Standard VC interface descriptor for this Video Interface
Collection must be equal.
The total length of the class-specific VC interface descriptor depends on the number of Units and
Terminals in the video function. Therefore, the descriptor starts with a header that reflects the
total length in bytes of the entire class-specific VC interface descriptor in the wTotalLength
field. The bcdUVC field identifies the release of the Video Device Class Specification with
which this video function and its descriptors are compliant. The bInCollection field indicates
how many VideoStreaming interfaces there are in the Video Interface Collection to which this
VideoControl interface belongs. The baInterfaceNr() array contains the interface numbers of all
the VideoStreaming interfaces in the Collection. The bInCollection and baInterfaceNr() fields
together provide all necessary information to determine which interfaces together constitute the
entire USB interface to the video function, i.e., describe the Video Interface Collection.
The order in which the Unit and Terminal descriptors are reported is not important, because
every descriptor can be identified through its bDescriptorType and bDescriptorSubtype fields.
The bDescriptorType field identifies the descriptor as being a class-specific interface descriptor.
The bDescriptorSubtype field further qualifies the exact nature of the descriptor.
This header is followed by one or more Unit and/or Terminal Descriptors. The layout of the
descriptors depends on the type of Unit or Terminal they represent. There is a descriptor type for
each Unit and Terminal described in section 2.3, "Video Function Topology". They are
summarized in the following sections. The first four fields are common for all Unit and Terminal
Descriptors. They contain the Descriptor Length, Descriptor Type, Descriptor Subtype, and Unit
or Terminal ID.
Each Unit and Terminal within the video function is assigned a unique identification number, the
Unit ID (UID) or Terminal ID (TID), contained in the bUnitID or bTerminalID field of the
descriptor. The value 0x00 is reserved for undefined ID, effectively restricting the total number
of addressable entities in the video function (both Units and Terminals) to 255.
Besides uniquely identifying all addressable entities in a video function, the IDs also serve to
describe the topology of the video function; i.e., the bSourceID field of a Unit or Terminal
descriptor indicates to which other Unit or Terminal this Unit or Terminal is connected.
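Host software can recover the function's topology by following these source IDs upstream. The sketch below uses a hypothetical chain of Camera Input Terminal, Processing Unit and streaming Output Terminal; the entity names and IDs are illustrative:

```python
# Map of entity ID -> (description, bSourceID). ID 0 is reserved (undefined),
# so it conveniently terminates the walk at an Input Terminal.
entities = {
    1: ("ITT_CAMERA Input Terminal", 0),
    2: ("Processing Unit", 1),
    3: ("TT_STREAMING Output Terminal", 2),
}

def upstream_chain(entity_id):
    """Walk bSourceID links from an entity back to its ultimate source."""
    chain = []
    while entity_id != 0:
        chain.append(entity_id)
        entity_id = entities[entity_id][1]
    return chain

path = upstream_chain(3)   # [3, 2, 1]: Output Terminal back to the camera
```

Real parsers must also handle Units with multiple Input Pins (Selector and Extension Units use a baSourceID() array rather than a single bSourceID), in which case the topology is a graph rather than a simple chain.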
The Input Terminal is uniquely identified by the value in the bTerminalID field. No other Unit
or Terminal within the same video function may have the same ID. This value must be passed in
the bTerminalID field of each request that is directed to the Terminal.
The wTerminalType field provides pertinent information about the physical entity that the Input
Terminal represents. This could be a USB OUT endpoint, an external Composite Video In
connection, a camera sensor, etc. A complete list of Terminal Type codes is provided in section
B.2, "Input Terminal Types".
The bAssocTerminal field is used to associate an Output Terminal to this Input Terminal,
effectively implementing a bi-directional Terminal pair. An example of this would be a tape unit
on a camcorder, which would have Input and Output Terminals to sink and source video
respectively. If the bAssocTerminal field is used, both associated Terminals must belong to the
bi-directional Terminal Type group. If no association exists, the bAssocTerminal field must be
set to zero.
The Host software can treat the associated Terminals as being physically related. In many cases,
one Terminal cannot exist without the other. An index to a string descriptor is provided to
further describe the Input Terminal.
The Output Terminal is uniquely identified by the value in the bTerminalID field. No other Unit
or Terminal within the same video function may have the same ID. This value must be passed in
the bTerminalID field of each request that is directed to the Terminal.
The wTerminalType field provides pertinent information about the physical entity the Output
Terminal represents. This could be a USB IN endpoint, an external Composite Video Out
connection, a LCD display, etc. A complete list of Terminal Type codes is provided in section
B.3, "Output Terminal Types".
The bAssocTerminal field is used to associate an Input Terminal to this Output Terminal,
effectively implementing a bi-directional Terminal pair. If the bAssocTerminal field is used,
both associated Terminals must belong to the bi-directional Terminal Type group. If no
association exists, the bAssocTerminal field must be set to zero.
The bSourceID field is used to describe the connectivity for this Terminal. It contains the ID of
the Unit or Terminal to which this Output Terminal is connected via its Input Pin.
An index to a string descriptor is provided to further describe the Output Terminal.
The wTerminalType field provides pertinent information about the physical entity that the Input
Terminal represents. For the Camera Terminal, this field shall be set to ITT_CAMERA
(see section B.2, “Input Terminal Types”).
The bAssocTerminal field is used to associate an Output Terminal to this Input Terminal,
effectively implementing a bi-directional Terminal pair. An index to a string descriptor is
provided to further describe the Camera Terminal.
The bmControls field is a bitmap, indicating the availability of certain camera controls for the
video stream.
The layout of the Camera Terminal descriptor is detailed in the following table.
Table 3-6 Camera Terminal Descriptor
Offset  Field                     Size  Value     Description
0       bLength                   1     Number    Size of this descriptor, in bytes: 18
1       bDescriptorType           1     Constant  CS_INTERFACE descriptor type
2       bDescriptorSubtype        1     Constant  VC_INPUT_TERMINAL descriptor subtype
3       bTerminalID               1     Constant  A non-zero constant that uniquely identifies
                                                  the Terminal within the video function. This
                                                  value is used in all requests to address this
                                                  Terminal.
4       wTerminalType             2     Constant  Constant that characterizes the type of
                                                  Terminal. This is set to the ITT_CAMERA
                                                  value.
6       bAssocTerminal            1     Constant  ID of the Output Terminal to which this
                                                  Input Terminal is associated.
7       iTerminal                 1     Index     Index of a string descriptor that describes
                                                  the Camera Terminal.
8       wObjectiveFocalLengthMin  2     Number    The value of Lmin. If Optical Zoom is not
                                                  supported, this field shall be set to 0.
10      wObjectiveFocalLengthMax  2     Number    The value of Lmax. If Optical Zoom is not
                                                  supported, this field shall be set to 0.
12      wOcularFocalLength        2     Number    The value of Locular. If Optical Zoom is not
                                                  supported, this field shall be set to 0.
14      bControlSize              1     Number    Size in bytes of the bmControls field: 3
15      bmControls                3     Bitmap    A bit set to 1 indicates that the mentioned
                                                  Control is supported for the video stream.
                                                  D0: Scanning Mode
                                                  D1: Auto-Exposure Mode
The bNrInPins field contains the number of Input Pins (p) of the Selector Unit. The connectivity
of the Input Pins is described via the baSourceID() array that contains p elements. The index i
into the array is one-based and directly related to the Input Pin numbers. baSourceID(i) contains
the ID of the Unit or Terminal to which Input Pin i is connected.
The following table details the structure of the Selector Unit descriptor.
Table 3-7 Selector Unit Descriptor
Offset  Field               Size  Value     Description
0       bLength             1     Number    Size of this descriptor, in bytes: 6+p
1       bDescriptorType     1     Constant  CS_INTERFACE descriptor type
2       bDescriptorSubtype  1     Constant  VC_SELECTOR_UNIT descriptor subtype
3       bUnitID             1     Number    A non-zero constant that uniquely identifies
                                            the Unit within the video function. This
The bSourceID field is used to describe the connectivity for this Processing Unit. It contains the
ID of the Unit or Terminal to which this Processing Unit is connected via its Input Pin.
bSourceID must refer to a Unit or Terminal in the same video function. The bmControls field
is a bit-map, indicating the availability of certain processing Controls for the video stream.
The layout of the Processing Unit descriptor is detailed in the following table.
Table 3-8 Processing Unit Descriptor
Offset  Field               Size  Value     Description
0       bLength             1     Number    Size of this descriptor, in bytes: 13
1       bDescriptorType     1     Constant  CS_INTERFACE descriptor type
2       bDescriptorSubtype  1     Constant  VC_PROCESSING_UNIT descriptor subtype
3       bUnitID             1     Number    A non-zero constant that uniquely identifies
                                            the Unit within the video function. This value
                                            is used in all requests to address this Unit.
4       bSourceID           1     Constant  ID of the Unit or Terminal to which this Unit
                                            is connected.
5       wMaxMultiplier      2     Number    If the Digital Multiplier control is supported,
                                            this field indicates the maximum digital
                                            magnification, multiplied by 100. For example,
                                            for a device that supports 1-4.5X digital zoom
                                            (a multiplier of 4.5), this field would be set
                                            to 450. If the Digital Multiplier
D0: None
D1: NTSC – 525/60
D2: PAL – 625/50
D3: SECAM – 625/50
D4: NTSC – 625/50
D5: PAL – 525/60
D6-D7: Reserved. Set to zero.
The bSourceID field is used to describe the connectivity for this Encoding Unit. It contains the
ID of the Unit or Terminal to which this Encoding Unit is connected via its Input Pin.
bSourceID must refer to a Unit or Terminal in the same video function. The bmControls field is
a bit-map, indicating the availability of certain encoding Controls for the video stream.
An index to a string descriptor is provided to further describe the Encoding Unit.
The layout of the Encoding Unit descriptor is detailed in the following table.
The Encoding Unit Descriptor supports separate lists for bmControls and bmRuntimeControls.
The use here of two lists reflects the expectation that many UVC devices will not be able to
support the same features while streaming video as during initialization. This is partially due to
the inherent asynchronous nature of encoder control offered over USB.
The Extension Unit Descriptor allows the hardware designer to define any arbitrary set of
controls such that a class driver can act as an intermediary between vendor-supplied host
software and functionality of the device.
The guidExtensionCode field contains a vendor-specific code that further identifies the
Extension Unit.
The bNrInPins field contains the number of Input Pins (p) of the Extension Unit. The
connectivity of the Input Pins is described via the baSourceID() array that contains p elements.
The index i into the array is one-based and directly related to the Input Pin numbers.
baSourceID(i) contains the ID of the Unit or Terminal to which Input Pin i is connected.
The bmControls field is a bitmap, indicating the availability of certain video Controls in the
Extension Unit. For future expandability, the number of bytes occupied by the bmControls field
is indicated in the bControlSize field. All Controls are optional.
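Because bmControls is sized by bControlSize, a host parser has to treat it as a little-endian, multi-byte bitmap rather than a fixed integer. A minimal helper for testing one control bit might look as follows; it is an illustrative sketch, not an API defined by this specification:

```c
#include <stdbool.h>
#include <stdint.h>

/* Test one control bit in a multi-byte bmControls bitmap, as used by
 * the Extension Unit (and other unit) descriptors. Bit 0 of byte 0 is
 * D0, bit 0 of byte 1 is D8, and so on; bits beyond bControlSize*8
 * are treated as unsupported. */
static bool bm_control_supported(const uint8_t *bmControls,
                                 uint8_t bControlSize,
                                 unsigned bit)
{
    if (bit >= (unsigned)bControlSize * 8)
        return false;
    return (bmControls[bit / 8] >> (bit % 8)) & 1;
}
```

Range-checking against bControlSize is what gives the "future expandability" described above: a host built for an older, shorter bitmap simply reports newer bits as unsupported.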
The following table defines the class-specific VS interface Input Header descriptor.
Table 3-14 Class-specific VS Interface Input Header Descriptor
Offset Field Size Value Description
0 bLength 1 Number Size of this descriptor, in bytes: 13+(p*n).
1 bDescriptorType 1 Constant CS_INTERFACE descriptor type
2 bDescriptorSubtype 1 Constant VS_INPUT_HEADER descriptor subtype
3 bNumFormats 1 Number Number of video payload Format
descriptors following for this interface
(excluding video Frame descriptors): p
4 wTotalLength 2 Number Total number of bytes returned for the
class-specific VideoStreaming interface
descriptors including this header
descriptor.
6 bEndpointAddress 1 Endpoint The address of the isochronous or bulk
endpoint used for video data. The address
is encoded as follows:
D7: Direction
1 = IN endpoint
D6..4: Reserved, set to zero.
D3..0: The endpoint number, determined
by the designer.
7 bmInfo 1 Bitmap Indicates the capabilities of this
VideoStreaming interface:
D0: Dynamic Format Change supported
D7..1: Reserved, set to zero.
8 bTerminalLink 1 Constant The terminal ID of the Output Terminal to
which the video endpoint of this interface
is connected.
9 bStillCaptureMethod 1 Number Method of still image capture supported as
described in section 2.4.2.4, "Still Image
Capture":
0: None (Host software will not support
any form of still image capture)
1: Method 1
2: Method 2
3: Method 3
10 bTriggerSupport 1 Number Specifies if hardware triggering is
supported through this interface
0: Not supported
1: Supported
11 bTriggerUsage 1 Number Specifies how the host software shall
respond to a hardware trigger interrupt
event from this interface. This field is ignored if bTriggerSupport is zero.
The following table defines the class-specific VS interface output header descriptor:
Table 3-15 Class-specific VS Interface Output Header Descriptor
Offset Field Size Value Description
0 bLength 1 Number Size of this descriptor, in bytes: 9+(p*n)
1 bDescriptorType 1 Constant CS_INTERFACE descriptor type
2 bDescriptorSubtype 1 Constant VS_OUTPUT_HEADER descriptor
subtype
3 bNumFormats 1 Number Number of video payload Format
descriptors following for this interface
(excluding video Frame descriptors): p
4 wTotalLength 2 Number Total number of bytes returned for the
class-specific VideoStreaming interface
descriptors including this header
descriptor.
6 bEndpointAddress 1 Endpoint The address of the isochronous or bulk
endpoint used for video data. The address
is encoded as follows:
D7: Direction
0 = OUT endpoint
D6..4: Reserved, set to zero
D3..0: The endpoint number, determined
by the designer.
7 bTerminalLink 1 Constant The terminal ID of the Input Terminal to
which the video endpoint of this interface
is connected.
8 bControlSize 1 Number Size of each bmaControls(x) field, in
bytes: n
9 bmaControls(1) n Bitmap For bits D3..0, a bit set to 1 indicates that
the named field is supported by the Video
Probe and Commit Control when
bFormatIndex is 1:
D0: wKeyFrameRate
D1: wPFrameRate
D2: wCompQuality
D3: wCompWindowSize
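The wTotalLength field of the input and output header descriptors lets a host walk the entire class-specific VideoStreaming descriptor block without knowing each subtype in advance, since every descriptor begins with its own bLength. A hedged sketch of that walk (assuming the buffer holds at least wTotalLength bytes):

```c
#include <stddef.h>
#include <stdint.h>

/* Walk the class-specific VS descriptor block starting at the
 * input/output header. wTotalLength (offset 4, little-endian) covers
 * the header plus all following class-specific descriptors; each one
 * starts with its own bLength. Returns the number of descriptors
 * found. Illustrative sketch only. */
static int count_vs_descriptors(const uint8_t *hdr)
{
    uint16_t total = (uint16_t)(hdr[4] | (hdr[5] << 8));
    size_t   off = 0;
    int      count = 0;

    while (off < total && hdr[off] != 0) { /* bLength 0 would loop forever */
        count++;
        off += hdr[off];
    }
    return count;
}
```

The zero-bLength guard is defensive: a malformed descriptor table must not stall the enumerating host.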
DV USB_Video_Payload_DV
Vendor Defined USB_Video_Payload_Stream_Based or
USB_Video_Payload_Frame_Based
The Still Image Frame descriptor contains the range of image sizes available from the device,
which comprise the list of possible still image formats. To select a particular still image format,
host software sends control requests to the corresponding interface (see section 4.3.1.2, "Video
Still Probe Control and Still Commit Control").
The Still Image Frame descriptor is shown in Table 3-18 Still Image Frame Descriptor below.
The bEndpointAddress field contains the bulk endpoint address within the related VS interface
that is used for still image capture. The endpoint always functions as an IN-Endpoint.
The wWidth(x) and wHeight(x) fields form an array of image sizes supported by the device,
measured in pixels of an uncompressed image.
The bNumImageSizePatterns represents the number of wWidth and wHeight pairs in the
array.
The bCompression field represents the image quality that would be generated by the device.
The range of compression values is from 0 to 255. A small value indicates a low compression
ratio and high quality image. The default setting of this value depends on device implementation.
The bCompression(x) fields form an array of compression ratios supported by the device for all
image sizes. The bNumCompressionPatterns field represents the number of bCompression
fields in this array.
… … … … …
10+4*n-4+m-1 bCompression(m) 1 Number Compression of the still image in pattern m
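The offsets in Table 3-18 imply a total descriptor length of 5 fixed bytes, 4 bytes per wWidth/wHeight pair, 1 byte for bNumCompressionPatterns, and 1 byte per bCompression entry. A small sanity-check helper (an illustrative sketch, assuming at least one size pattern):

```c
/* Expected bLength of a Still Image Frame descriptor (Table 3-18) for
 * n wWidth/wHeight pairs and m bCompression entries:
 * 5 fixed bytes + 4*n + 1 count byte + m = 6 + 4*n + m, which matches
 * the table's last-field offset 10+4*n-4+m-1 plus one. */
static unsigned still_frame_desc_len(unsigned n, unsigned m)
{
    return 6 + 4 * n + m;
}
```

A host can compare this value against the descriptor's bLength before trusting the array bounds.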
For example, this descriptor would be used with Uncompressed Video, MJPEG and MPEG-1
formats. It would not be used in the case MPEG-2, DV or MPEG-4 because the information is
already available implicitly (DV) or explicitly (MPEG-2, MPEG-4). If a format requires this
descriptor, the corresponding payload specification must enforce this requirement.
In the absence of this descriptor, or in the case of “Unspecified” values within the descriptor,
color matching defaults will be assumed. The color matching defaults are compliant with sRGB
since the BT.709 transfer function and the sRGB transfer function are very similar.
The viewing conditions and monitor setup are implicitly based on sRGB and the device should
compensate for them (D50 ambient white, dim viewing or 64 lux ambient illuminance, 2.2
gamma reference CRT, etc).
Table 3-19 Color Matching Descriptor
Offset Field Size Value Description
0 bLength 1 Constant 6
1 bDescriptorType 1 Number CS_INTERFACE type
2 bDescriptorSubtype 1 Number VS_COLORFORMAT
3 bColorPrimaries 1 Number This defines the color primaries
and the reference white.
0: Unspecified (Image
characteristics unknown)
1: BT.709, sRGB (default)
2: BT.470-2 (M)
3: BT.470-2 (B, G)
4: SMPTE 170M
5: SMPTE 240M
6-255: Reserved
4 bTransferCharacteristics 1 Number This field defines the opto-
electronic transfer characteristic of
the source picture also called the
gamma function.
0: Unspecified (Image
characteristics unknown)
1: BT.709 (default)
2: BT.470-2 M
3: BT.470-2 B, G
4: SMPTE 170M
5: SMPTE 240M
6: Linear (V = Lc)
7: sRGB (very similar to BT.709)
8-255: Reserved
5 bMatrixCoefficients 1 Number Matrix used to compute luma and
chroma values from the color
primaries.
0: Unspecified (Image
characteristics unknown)
1: BT. 709
2: FCC
3: BT.470-2 B, G
4: SMPTE 170M (BT.601,
default)
5: SMPTE 240M
6-255: Reserved
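Putting the default values of Table 3-19 together gives the sRGB-compatible descriptor a typical webcam would report. The byte array below is a sketch; the CS_INTERFACE (0x24) and VS_COLORFORMAT (0x0D) constants are taken from the specification's appendix code tables and should be verified against Appendix A before use.

```c
#include <stdint.h>

/* A Color Matching descriptor filled with the sRGB-compatible
 * defaults from Table 3-19. Illustrative sketch; constants assumed
 * from Appendix A. */
static const uint8_t color_matching_srgb[6] = {
    6,    /* bLength */
    0x24, /* bDescriptorType: CS_INTERFACE */
    0x0D, /* bDescriptorSubtype: VS_COLORFORMAT */
    1,    /* bColorPrimaries: BT.709, sRGB (default) */
    1,    /* bTransferCharacteristics: BT.709 (default) */
    4     /* bMatrixCoefficients: SMPTE 170M / BT.601 (default) */
};
```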
This optional endpoint is only implemented by the device if it supports method 3 of still image
capture. If implemented, it should always follow the Video Data endpoint (where available) in
descriptor ordering and endpoint addressing.
Table 3-22 Standard VS Bulk Still Image Data Endpoint Descriptor
Offset Field Size Value Description
0 bLength 1 Number Size of this descriptor, in bytes: 7
If the VideoControl interface is part of a Video Interface Collection, the iFunction field in the
IAD and the iInterface field in the Standard VC interface descriptor for this Video Interface
Collection must be equal. See section 3.5.
Since the device must implement the device name string descriptor, it must also support String
Descriptor Zero which contains the list of LANGID codes supported by the device. This
descriptor, as well as the layout of a standard UNICODE String Descriptor, is defined in section
9.6.7 "String" of the USB Specification Revision 2.0.
4 Class-Specific Requests
Most class-specific requests are used to set and get video related Controls. These Controls fall
into two main groups: those that manipulate controls related to the video function, such as
brightness, exposure, selector position, etc. and those that influence data transfer over a video
data endpoint, such as the current frame rate.
Requests may be mandatory or optional and listed as such for every control. Where SET_CUR is
optional, its presence is determined via GET_INFO. If a video function does not support a
certain request, it must indicate this by stalling the control pipe when that request is issued to the
function.
The bmRequestType field specifies that this is a SET request (D7=0). It is a class-specific
request (D6..5=01), directed to either the VideoControl interface, or a VideoStreaming interface
of the video function (D4..0=00001), or the video data endpoint of a VideoStreaming interface
(D4..0=00010).
The bRequest field contains a constant that identifies which attribute of the addressed Control is
to be modified. Possible attributes for a Control are:
Current setting attribute (SET_CUR)
If the addressed Control or entity does not support modification of a certain attribute, the control
pipe must indicate a stall when an attempt is made to modify that attribute. Only the CUR
attribute is supported for the Set request. For the list of Request constants, refer to section A.8,
"Video Class-Specific Request Codes"
The wValue field interpretation is qualified by the value in the wIndex field. Depending on what
entity is addressed, the layout of the wValue field changes. The following paragraphs describe
the contents of the wValue field for each entity separately. In most cases, the wValue field
contains the Control Selector (CS) in the high byte. It is used to address a particular Control
within entities that can contain multiple Controls. If the entity only contains a single Control,
there is no need to specify a Control Selector and the wValue field can be used to pass additional
parameters.
The wIndex field specifies the interface or endpoint to be addressed in the low byte, and the
entity ID or zero in the high byte. In case an interface is addressed, the virtual entity "interface"
can be addressed by specifying zero in the high byte. The values in wIndex must be appropriate
to the recipient. Only existing entities in the video function can be addressed, and only
appropriate interface or endpoint numbers may be used. If the request specifies an unknown or
non-entity ID or an unknown interface or endpoint number, the control pipe must indicate a stall.
The actual parameter(s) for the Set request are passed in the data stage of the control transfer.
The length of the parameter block is indicated in the wLength field of the request. The layout of
the parameter block is qualified by both the bRequest and wIndex fields. Refer to the following
sections for a detailed description of the parameter block layout for all possible entities.
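The wValue/wIndex layout described above can be captured in a small setup-packet builder for the common case of a Unit or Terminal Control. This is an illustrative sketch; the SET_CUR request code (0x01) is assumed from Appendix A.8, and bmRequestType 0x21 encodes host-to-device, class, interface recipient.

```c
#include <stdint.h>

/* Standard USB setup packet fields. */
typedef struct {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
} usb_setup;

/* Build a class-specific SET_CUR aimed at a Unit or Terminal:
 * Control Selector in the high byte of wValue, entity ID in the high
 * byte of wIndex, interface number in the low byte. Illustrative
 * sketch. */
static usb_setup make_set_cur(uint8_t entity_id, uint8_t interface_num,
                              uint8_t control_selector, uint16_t len)
{
    usb_setup s;
    s.bmRequestType = 0x21; /* D7=0 SET, D6..5=01 class, D4..0 interface */
    s.bRequest = 0x01;      /* SET_CUR, per Appendix A.8 */
    s.wValue = (uint16_t)(control_selector << 8);
    s.wIndex = (uint16_t)((entity_id << 8) | interface_num);
    s.wLength = len;
    return s;
}
```

The parameter block itself travels in the data stage; only its length appears in the setup packet.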
The bmRequestType field specifies that this is a GET request (D7=1). It is a class-specific
request (D6..5=01), directed to either the VideoControl interface or a VideoStreaming interface
of the video function (D4..0=00001), or the video data endpoint of a VideoStreaming interface
(D4..0=00010).
The bRequest field contains a constant that identifies which attribute of the addressed Control or
entity is to be returned. Possible attributes for a Control are:
Current setting attribute (GET_CUR)
Minimum setting attribute (GET_MIN)
Maximum setting attribute (GET_MAX)
Default setting attribute (GET_DEF)
Resolution attribute (GET_RES)
Data length attribute (GET_LEN)
Information attribute (GET_INFO)
The GET_INFO request queries the capabilities and status of the specified control. When issuing
this request, the wLength field shall always be set to a value of 1 byte. The result returned is a
bit mask reporting the capabilities of the control. The bits are defined as:
Table 4-3 Defined Bits Containing Capabilities of the Control
Bit field Description Bit State
D0 1=Supports GET value requests Capability
D1 1=Supports SET value requests Capability
D2 1=Disabled due to automatic mode (under device control) State
D3 1=Autoupdate Control (see section 2.4.2.2, "Status Interrupt Endpoint") Capability
D4 1=Asynchronous Control (see sections 2.4.2.2, "Status Interrupt Endpoint" and 2.4.4, "Control Transfer and Request Processing") Capability
D5 1=Disabled due to incompatibility with Commit state. State
D7..D6 Reserved (Set to 0) --
The two bits in GET_INFO that reflect the state of the control are D2 (Disabled due to
Automatic Mode) and D5 (Disabled due to incompatibility with Commit state). The other bits
are capability bits. Capability bits should not change when state bits change. For example, when
a control is set in Automatic Mode (D2 set), the bit D1 must not be updated in GET_INFO.
If a control is implemented such that D2 can be set, the device needs to have the capability of
sending Control Change Interrupts, thus D3 (Autoupdate Control) must be set. If a control is
implemented such that D5 can be set, the device should have the capability of sending Control
Change Interrupts.
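The distinction between capability and state bits in Table 4-3 determines whether a SET is worth attempting at a given moment. A hedged helper illustrating that decision:

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode the GET_INFO byte of Table 4-3. D0/D1/D3/D4 are capability
 * bits; D2 and D5 are transient state bits. A SET request is worth
 * attempting only if D1 is set and neither disable state bit is
 * currently active. Illustrative helper. */
static bool control_settable_now(uint8_t info)
{
    bool can_set       = info & (1u << 1); /* D1: supports SET requests  */
    bool auto_disabled = info & (1u << 2); /* D2: disabled by auto mode  */
    bool commit_block  = info & (1u << 5); /* D5: disabled, Commit state */
    return can_set && !auto_disabled && !commit_block;
}
```

Note that D1 itself never changes when D2 or D5 toggles, which is why both kinds of bits must be consulted.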
If an Encoding Unit control is implemented such that the device may initiate a change in the
minimum and/or maximum setting attribute for that control, then the device should have the
capability of sending Control Change Interrupts to notify the host of the new GET_MIN and/or
GET_MAX settings; thus, D3 (Autoupdate Control) must be set.
The device indicates hardware default values for Unit, Terminal and Interface Controls through
their GET_DEF values. These values may be used by the host to restore a control to its default
setting.
If the addressed Control or entity does not support readout of a certain attribute, the control pipe
must indicate a stall when an attempt is made to read that attribute. For the list of Request
constants, refer to section A.8, "Video Class-Specific Request Codes".
The wValue field interpretation is qualified by the value in the wIndex field. Depending on what
entity is addressed, the layout of the wValue field changes. The following paragraphs describe
the contents of the wValue field for each entity separately. In most cases, the wValue field
contains the Control Selector (CS) in the high byte. It is used to address a particular Control
within entities that can contain multiple Controls. If the entity only contains a single Control,
there is no need to specify a Control Selector and the wValue field can be used to pass additional
parameters.
The wIndex field specifies the interface or endpoint to be addressed in the low byte, and the
entity ID or zero in the high byte. In case an interface is addressed, the virtual entity "interface"
can be addressed by specifying zero in the high byte. The values in wIndex must be appropriate
to the recipient. Only existing entities in the video function can be addressed, and only
appropriate interface or endpoint numbers may be used. If the request specifies an unknown or
non-entity ID, or an unknown interface or endpoint number, the control pipe must indicate a
stall.
The actual parameter(s) for the Get request are returned in the data stage of the control transfer.
The length of the parameter block to return is indicated in the wLength field of the request. If
the parameter block is longer than is indicated in the wLength field, only the initial bytes of the
parameter block are returned. If the parameter block is shorter than is indicated in the wLength
field, the device indicates the end of the control transfer by sending a short packet when further
data is requested. The layout of the parameter block is qualified by both the bRequest and
wIndex fields. Refer to the following sections for a detailed description of the parameter block
layout for all possible entities.
Each of the following control definitions specifies whether requests are mandatory or optional
for that control. Any implemented request must comply with the definition for that control. The
device manufacturer is free to implement any other requests, but the definition of those
unspecified requests shall be ignored by host implementations, with the exception of the
GET_LEN request. If the GET_LEN request is implemented, the host software will use the result
to determine the correct buffer length for Set and Get requests.
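When GET_LEN is implemented, the host sizes its SET_CUR/GET_CUR buffers from the reply. The sketch below assumes the 2-byte little-endian reply format used by Extension Unit controls; check the specific control's definition before relying on that width. The GET_LEN request code (0x85) is assumed from Appendix A.8.

```c
#include <stdint.h>

#define UVC_GET_LEN 0x85 /* request code, assumed from Appendix A.8 */

/* Decode a GET_LEN reply into the parameter block size to use for
 * later Set and Get requests on that control. A 2-byte little-endian
 * reply is assumed here. Illustrative sketch. */
static uint16_t parse_get_len_reply(const uint8_t reply[2])
{
    return (uint16_t)(reply[0] | (reply[1] << 8));
}
```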
The bRequest field indicates which attribute the request is manipulating. The MIN, MAX, and
RES attributes are not supported for the Set request.
The wValue field specifies the Control Selector (CS) in the high byte, and the low byte must be
set to zero. The Control Selector indicates which type of Control this request is manipulating. If
the request specifies an unknown CS to that endpoint, the control pipe must indicate a stall.
Vendor-dependent power mode: Device operates in low power mode. In this mode, the device
continues to operate, although not at full functionality.
For example, as the result of setting the device to this power
mode, the device will stop the Zoom function. To avoid
confusing the user, the device should issue an interrupt
(GET_INFO) to notify the user that the Zoom function is
disabled.
In this mode, the device can stream video data, the functionality
of USB is not affected, and the device can execute all requests
that it supports.
This mode is optional.
The power mode that is supported by the device must be passed to the host, as well as the power
source, since if the device is working with battery power, the host can change the device power
mode to “vendor-dependent power mode” to reduce power consumption.
Information regarding power modes and power sources is communicated through the following
bit fields. D7..D5 indicate which power source is currently used in the device. D4 indicates
that the device supports “vendor-dependent power mode”. Bits D7..D4 are set by the device and
are read-only. The host can change the device power mode by setting a combination of D3..D0.
The host can update the power mode during video streaming.
The D3..D0 value of 0000B indicates that the device is in, or should transition to, full power
mode. The D3..D0 value of 0001B indicates that the device is in, or should transition to, vendor-
dependent power mode.
The host must specify D3..D0 only when the power mode is required to switch, and the other
fields must be set to 0.
Table 4-6 Device Power Mode
Control selector VC_VIDEO_POWER_MODE_CONTROL
Mandatory Requests SET_CUR, GET_CUR, GET_INFO
wLength 1
Offset Field Size Value Description
0 bDevicePowerMode 1 Bitmap
Bit Description R W
D3..0 Power Mode setting o o
0000B: Full power mode
0001B: Vendor-dependent power mode (optional)
All other bits are reserved.
D4 Vendor-dependent power mode supported o x
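The asymmetry of Table 4-6 (D7..D4 read-only, D3..D0 host-writable) suggests two small helpers, shown here as an illustrative sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Helpers for the bDevicePowerMode bitmap of Table 4-6. D7..D4 are
 * set by the device and read-only; a host SET_CUR carries only D3..D0
 * (0000B = full power, 0001B = vendor-dependent power mode) with all
 * other bits zero. Illustrative sketch. */
static bool vendor_power_mode_supported(uint8_t cur)
{
    return cur & (1u << 4); /* D4 */
}

static uint8_t power_mode_request(bool vendor_dependent)
{
    return vendor_dependent ? 0x01 : 0x00; /* only D3..D0, rest zero */
}
```

A host would first issue GET_CUR, test D4, and only then request the vendor-dependent mode.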
Not ready: The device has not completed a previous operation. The device will recover from this
state as soon as the previous operation has completed.
Wrong State: The device is in a state that disallows the specific request. The device will remain
in this state until a specific action from the host or the user is completed.
Power: The actual Power Mode of the device is not sufficient to complete the Request.
Out of Range: Result of a SET_CUR Request when attempting to set a value outside of the MIN
and MAX range, or a value that does not satisfy the constraint on resolution (see section 4.2.2,
“Unit and Terminal Control Requests”).
Invalid value within range: Result of a SET_CUR Request when attempting to set a value that is
inside the MIN and MAX range but is not supported.
The bRequest field indicates which attribute the request is manipulating. The MIN, MAX and
RES attributes are not supported for the Set request.
The wValue field specifies the Control Selector (CS) in the high byte, and zero in the low byte.
The Control Selector indicates which type of Control this request is manipulating. When
processing all Controls as part of a batch request (GET_###_ALL), wValue is not needed and
must be set to 0. If the request specifies an unknown or unsupported CS to that Unit or Terminal,
the control pipe must indicate a protocol STALL.
The value of wLength must be calculated as follows. Use wIndex to determine the Unit or
Terminal of interest. For that Unit or Terminal, establish which Controls are supported using the
bmControls field of the associated Unit or Terminal Descriptor. wLength is the sum of the
length of all supported Controls for the target Unit or Terminal. The Data must be ordered
according to the order of the Controls listed in the bmControls field of the target Unit or
Terminal descriptor. If the Unit or Terminal supports batch requests, then each Control in the
Unit or Terminal must contribute to the Data field, even if it does not support the associated
single operation request.
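The wLength computation above amounts to a walk over the bmControls bitmap, summing per-control parameter sizes in bit order. In the sketch below the `ctrl_len` array is caller-supplied and hypothetical; the actual sizes come from each control's definition, not from this code.

```c
#include <stdint.h>

/* wLength of a GET_###_ALL batch request: walk the target entity's
 * bmControls bitmap in bit order and sum the parameter block sizes of
 * every supported control. ctrl_len[bit] maps each bit position to
 * that control's parameter size (caller-supplied, per the control
 * definitions). Illustrative sketch. */
static uint16_t batch_wlength(const uint8_t *bmControls, uint8_t bControlSize,
                              const uint8_t *ctrl_len, unsigned n_controls)
{
    uint16_t total = 0;
    for (unsigned bit = 0; bit < n_controls; bit++) {
        if (bit >= (unsigned)bControlSize * 8)
            break;
        if ((bmControls[bit / 8] >> (bit % 8)) & 1)
            total += ctrl_len[bit];
    }
    return total;
}
```

The bit order of the walk is what fixes the ordering of the Data field, matching the rule that the Data must follow the bmControls listing.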
If a Control supports GET_MIN, GET_MAX and GET_RES requests, the values of MAX, MIN
and RES shall be constrained such that (MAX-MIN)/RES is an integral number. Furthermore, the
CUR value (returned by GET_CUR, or set via SET_CUR) shall be constrained such that (CUR-
MIN)/RES is an integral number. The device shall indicate protocol STALL and update the
Request Error Code Control with 0x04 “Out of Range” if an invalid CUR value is provided in a
SET_CUR operation (see section 2.4.4, “Control Transfer and Request Processing”).
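The MIN/MAX/RES constraint reduces to a range check plus a divisibility check. A device-side validation sketch (illustrative, using signed 32-bit values):

```c
#include <stdbool.h>
#include <stdint.h>

/* Validate a candidate CUR value: it must lie in [MIN, MAX] and
 * (CUR - MIN) must be a whole multiple of RES. On failure a device
 * would reply with protocol STALL and set the Request Error Code
 * Control to 0x04, "Out of Range". Illustrative sketch. */
static bool cur_value_valid(int32_t cur, int32_t min, int32_t max, int32_t res)
{
    if (cur < min || cur > max)
        return false;
    return res != 0 && (cur - min) % res == 0;
}
```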
There are special Terminal types (such as the Camera Terminal and Media Transport Terminal)
that have type-specific Terminal Controls defined. The controls for the Media Transport
Terminal are defined in a companion specification (see the USB Device Class Definition for
Video Media Transport Terminal specification). The controls for the Camera Terminal are
defined in the following sections.
As this specification evolves, new controls in the Camera Terminal, Processing Unit, and
Encoding Unit are added to the list of associated Control Selectors at the end (Tables A-12
through A-14). However, in the sections below, the description of the functionality is placed next
to controls with associated functionality.
specific. A value of zero (0) indicates that the exposure time is set to its default value; the
default is implementation specific. When the Auto-Exposure Mode control is in Auto mode or
Aperture Priority mode, attempts to programmatically set this control shall result in a protocol
STALL and an error code of bRequestErrorCode = “Wrong state”.
If both Relative and Absolute Controls are supported, a SET_CUR to the Relative Control with a
value other than 0x00 shall result in a Control Change interrupt for the Absolute Control (see
section 2.4.2.2, “Status Interrupt Endpoint”).
Table 4-13 Exposure Time (Relative) Control
Control Selector CT_EXPOSURE_TIME_RELATIVE_CONTROL
Mandatory Requests SET_CUR, GET_CUR, GET_INFO
wLength 1
Offset Field Size Value Description
0 bExposureTimeRelative 1 Signed The setting for the attribute of the
Number addressed Exposure Time (Relative)
Control:
0: default
1: incremented by 1 step
0xFF: decremented by 1 step
The bFocusRelative field indicates whether the focus lens group is stopped or is moving in the
near or infinity direction. A value of 1 indicates that the focus lens group is moving in the near
direction. A value of 0 indicates that the focus lens group is stopped, and a value of 0xFF
indicates that the lens group is moving in the infinity direction. The GET_MIN, GET_MAX,
GET_RES and GET_DEF requests will return zero for this field.
The bSpeed field indicates the speed of the lens group movement. A low number indicates a
slow speed and a high number indicates a high speed. The GET_MIN, GET_MAX and
GET_RES requests are used to retrieve the range and resolution for this field. The GET_DEF
request is used to retrieve the default value for this field. If the control does not support speed
control, it will return the value 1 in this field for all these requests.
If both Relative and Absolute Controls are supported, a SET_CUR to the Relative Control with a
value other than 0x00 shall result in a Control Change interrupt for the Absolute Control at the
end of the movement (see section 2.4.2.2, “Status Interrupt Endpoint”). The end of movement
can be due to physical device limits, or due to an explicit request by the host to stop the
movement. If the end of movement is due to physical device limits (such as a limit in range of
motion), a Control Change interrupt shall be generated for this Relative Control. If there is no
limit in range of motion, a Control Change interrupt is not required.
When the Auto-Focus Mode control is enabled, attempts to programmatically set this control
shall result in a protocol STALL and an error code of bRequestErrorCode = “Wrong state”.
If both Relative and Absolute Controls are supported, a SET_CUR to the Relative Control with a
value other than 0x00 shall result in a Control Change interrupt for the Absolute Control (see
section 2.4.2.2, “Status Interrupt Endpoint”).
Table 4-19 Iris (Relative) Control
Control Selector CT_IRIS_RELATIVE_CONTROL
Mandatory Requests SET_CUR, GET_CUR, GET_INFO
wLength 1
Offset Field Size Value Description
0 bIrisRelative 1 Number The setting for the attribute of the
addressed Iris (Relative) Control:
0: Default
1: Iris is opened by 1 step.
0xFF: Iris is closed by 1 step.
The bZoom field indicates whether the zoom lens group is stopped or the direction in which it is
moving. A value of 1 indicates that the zoom lens is moving toward the telephoto direction. A
value of zero indicates that the zoom lens is stopped, and a value of 0xFF indicates that the zoom
lens is moving toward the wide-angle direction. The GET_MIN, GET_MAX, GET_RES and
GET_DEF requests will return zero for this field.
The bDigitalZoom field specifies whether digital zoom is enabled or disabled. If the device only
supports digital zoom, this field would be ignored. The GET_DEF request will return the default
value for this field. The GET_MIN, GET_MAX and GET_RES requests will return zero for this
field.
The bSpeed field indicates the speed of the control change. A low number indicates a slow speed
and a high number indicates a higher speed. The GET_MIN, GET_MAX and GET_RES
requests are used to retrieve the range and resolution for this field. The GET_DEF request is
used to retrieve the default value for this field. If the control does not support speed control, it
will return the value 1 in this field for all these requests.
If both Relative and Absolute Controls are supported, a SET_CUR to the Relative Control with a
value other than 0x00 shall result in a Control Change interrupt for the Absolute Control at the
end of the movement (see section 2.4.2.2, “Status Interrupt Endpoint”). The end of movement
can be due to physical device limits, or due to an explicit request by the host to stop the
movement. If the end of movement is due to physical device limits (such as a limit in range of
motion), a Control Change interrupt shall be generated for this Relative Control.
Table 4-21 Zoom (Relative) Control
Control Selector CT_ZOOM_RELATIVE_CONTROL
Mandatory Requests SET_CUR, GET_CUR, GET_INFO, GET_DEF, GET_MIN,
GET_MAX, GET_RES
wLength 3
Offset Field Size Value Description
0 bZoom 1 Signed The setting for the attribute of the
number addressed Zoom Control:
0: Stop
1: moving to telephoto direction
0xFF: moving to wide-angle direction
1 bDigitalZoom 1 Boolean 0: Digital Zoom OFF
1: Digital Zoom On
The bPanRelative field is used to specify the pan direction to move. A value of 0 indicates to
stop the pan, a value of 1 indicates to start moving in the clockwise direction, and a value of
0xFF indicates to start moving in the counterclockwise direction. The GET_DEF, GET_MIN,
GET_MAX and GET_RES requests will return zero for this field.
The bPanSpeed field is used to specify the speed of the movement for the Pan direction. A low
number indicates a slow speed and a high number indicates a higher speed. The GET_MIN,
GET_MAX and GET_RES requests are used to retrieve the range and resolution for this field.
The GET_DEF request is used to retrieve the default value for this field. If the control does not
support speed control for the Pan control, it will return the value 1 in this field for all these
requests.
The bTiltRelative field is used to specify the tilt direction to move. A value of zero indicates to
stop the tilt, a value of 1 indicates that the camera points the imaging plane up, and a value of
0xFF indicates that the camera points the imaging plane down. The GET_DEF, GET_MIN,
GET_MAX and GET_RES requests will return zero for this field.
The bTiltSpeed field is used to specify the speed of the movement for the Tilt direction. A low
number indicates a slow speed and a high number indicates a higher speed. The GET_MIN,
GET_MAX and GET_RES requests are used to retrieve the range and resolution for this field.
The GET_DEF request is used to retrieve the default value for this field. If the control does not
support speed control for the Tilt control, it will return the value 1 in this field for all these
requests.
If both Relative and Absolute Controls are supported, a SET_CUR to the Relative Control with a
value other than 0x00 shall result in a Control Change interrupt for the Absolute Control at the
end of the movement (see section 2.4.2.2, “Status Interrupt Endpoint”). The end of movement
can be due to physical device limits, or due to an explicit request by the host to stop the
movement. If the end of movement is due to physical device limits (such as a limit in range of
motion), a Control Change interrupt shall be generated for this Relative Control. If there is no
limit in range of motion, a Control Change interrupt is not required.
Table 4-23 PanTilt (Relative) Control
Control Selector CT_PANTILT_RELATIVE_CONTROL
Mandatory Requests SET_CUR, GET_CUR, GET_INFO, GET_DEF, GET_MIN,
GET_MAX, GET_RES
wLength 4
Offset Field Size Value Description
0 bPanRelative 1 Signed The setting for the attribute of the
number addressed Pan (Relative) Control:
0: Stop
1: moving to clockwise direction
0xFF: moving to counter clockwise
direction
1 bPanSpeed 1 Number Speed of the Pan movement
2 bTiltRelative 1 Signed The setting for the attribute of the
number addressed Tilt (Relative) Control:
0: Stop
1: point the imaging plane up
0xFF: point the imaging plane down
3 bTiltSpeed 1 Number Speed for the Tilt movement
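The 4-byte parameter block of Table 4-23 can be assembled as below. The helper is an illustrative sketch: direction fields take 0 (stop), 1, or 0xFF as described above, and the speed fields use whatever range GET_MIN/GET_MAX report for the device.

```c
#include <stdint.h>

/* Build the 4-byte PanTilt (Relative) SET_CUR parameter block of
 * Table 4-23. Directions: 0 = stop, 1 = clockwise / imaging plane up,
 * -1 (0xFF on the wire) = counterclockwise / imaging plane down.
 * Illustrative sketch. */
static void pantilt_relative_payload(uint8_t buf[4],
                                     int8_t pan_dir, uint8_t pan_speed,
                                     int8_t tilt_dir, uint8_t tilt_speed)
{
    buf[0] = (uint8_t)pan_dir;  /* bPanRelative  */
    buf[1] = pan_speed;         /* bPanSpeed     */
    buf[2] = (uint8_t)tilt_dir; /* bTiltRelative */
    buf[3] = tilt_speed;        /* bTiltSpeed    */
}
```

For devices that do not support speed control, GET_CUR and friends report 1 for the speed fields, so a host can simply echo that value back.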
The bRollRelative field is used to specify the roll direction to move. A value of 0 indicates to
stop the roll, a value of 1 indicates to start moving in a clockwise rotation of the camera along
the image viewing axis, and a value of 0xFF indicates to start moving in a counterclockwise
direction. The GET_DEF, GET_MIN, GET_MAX and GET_RES requests will return zero for
this field.
The bSpeed is used to specify the speed of the roll movement. A low number indicates a slow
speed and a high number indicates a higher speed. The GET_MIN, GET_MAX and GET_RES
requests are used to retrieve the range and resolution for this field. The GET_DEF request is
used to retrieve the default value for this field. If the control does not support speed control, it
will return the value 1 in this field for all these requests.
If both Relative and Absolute Controls are supported, a SET_CUR to the Relative Control with a
value other than 0x00 shall result in a Control Change interrupt for the Absolute Control at the
end of the movement (see section 2.4.2.2, “Status Interrupt Endpoint”). The end of movement
can be due to physical device limits, or due to an explicit request by the host to stop the
movement. If the end of movement is due to physical device limits (such as a limit in range of
motion), a Control Change interrupt shall be generated for this Relative Control. If there is no
limit in range of motion, a Control Change interrupt is not required.
Table 4-25 Roll (Relative) Control
Control Selector CT_ROLL_RELATIVE_CONTROL
Mandatory Requests SET_CUR, GET_CUR, GET_INFO, GET_DEF, GET_MIN,
GET_MAX, GET_RES
wLength 2
Offset  Field          Size  Value          Description
0       bRollRelative  1     Signed number  The setting for the attribute of the
                                            addressed Roll (Relative) Control:
                                            0: Stop
                                            1: move in clockwise rotation
                                            0xFF: move in counter-clockwise rotation
1       bSpeed         1     Number         Speed for the Roll movement
GET_MAX should return the sensor size as well as the maximum number of supported steps in the
units indicated by bmNumStepsUnits. If the bmNumStepsUnits has not been set, the default
value should be used. GET_CUR returns the current coordinates of the digital window used for
capture. If the device is moving between settings (e.g. wNumSteps > 1), GET_CUR references
the digital window of the current step.
The bmAutoControls bitmask determines which, if any, on-board features should track to the
region of interest. To detect if a device supports a particular Auto Control, use GET_MAX
which returns a mask indicating all supported Auto Controls.
GET_CUR returns the current Region of Interest (RoI) being employed by the device. This RoI
should be the same as specified in most recent SET_CUR except in the case where the ‘Auto
Detect and Track’ and/or ‘Image Stabilization’ bits have been set.
A Selector Unit represents a video stream source selector. The valid range for the CUR, MIN,
and MAX attributes is from one up to the number of Input Pins of the Selector Unit. This value
can be found in the bNrInPins field of the Selector Unit descriptor. The RES attribute can only
have a value of one.
Table 4-29 Selector Unit Control Requests
Control Selector SU_INPUT_SELECT_CONTROL
Mandatory Requests SET_CUR, GET_CUR, GET_MIN, GET_MAX, GET_RES,
GET_INFO
wLength 1
Offset Field Size Value Description
0 bSelector 1 Number The setting for the attribute of the
Selector Control.
The following paragraphs present a detailed description of all possible Controls a Processing
Unit can incorporate. For each Control, the layout of the parameter block together with the
appropriate Control Selector is listed for all forms of the Get/Set Processing Unit Control
request. All values are interpreted as unsigned unless otherwise specified.
This is used to specify the amount of Digital Zoom applied to the optical image. This is the
position within the range of possible values of multiplier m, allowing the multiplier resolution to
be described by the device implementation. The MIN and MAX values are sufficient to imply
the resolution, so the RES value must always be 1. The MIN, MAX and default values are
implementation dependent. If the Digital Multiplier Limit Control is supported, the MIN and
MAX values shall match the MIN and MAX values of the Digital Multiplier Control. The Digital
Multiplier Limit Control allows either the Device or the Host to establish a temporary upper limit
for the Zcur value, thus reducing dynamically the range of the Digital Multiplier Control. If
Digital Multiplier Limit is used to decrease the Limit below the current Zcur value, the Zcur value
will be adjusted to match the new limit and the Digital Multiplier Control shall send a Control
Change Event to notify the host of the adjustment.
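The interaction between the Digital Multiplier Limit and the current Zcur value can be sketched as follows; the function name and return convention are illustrative, not part of the specification:

```python
def apply_multiplier_limit(z_cur: int, new_limit: int):
    """Apply a new Digital Multiplier Limit to the current Zcur.

    If the limit is decreased below the current Zcur, the Zcur value is
    clamped to the new limit and the device must send a Control Change
    event for the Digital Multiplier Control to notify the host.
    Returns (adjusted_z_cur, control_change_needed).
    """
    if new_limit < z_cur:
        return new_limit, True   # Zcur adjusted; host must be notified
    return z_cur, False          # limit does not affect the current Zcur
```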
Table 4-45 Digital Multiplier Control
Control Selector PU_DIGITAL_MULTIPLIER_CONTROL
Mandatory Requests SET_CUR, GET_CUR, GET_MIN, GET_MAX, GET_RES,
GET_INFO, GET_DEF
wLength 2
Offset Field Size Value Description
0 wMultiplierStep 2 Number The value Zcur
enters a Commit state that is different from the previous Commit state, the device must update
the Encoding Unit state using the rules defined in Table 4-49 together with the new Commit data
structure.
Table 4-49 Default Encoding Unit State after VS_COMMIT_CONTROL(SET_CUR) Request
Method of Configuration
COMMIT(x): Indicates that the parameter is given by the VS_COMMIT_CONTROL(GET_CUR) structure field indicated
within parentheses.
Figure 4-1 illustrates three high level device states and the USB requests that can trigger
transitions between these device states when the requests succeed. USB requests not shown in
the Figure below, such as GET_CUR, GET_MIN, GET_MAX, GET_DEF among others, do not
trigger a state transition. State 0 represents the device in the USB Configured State (see Section
9.1.1.5 “Configured” of USB Specification Revision 2.0 and USB Specification Revision 3.0). At
this point, the Encoding Unit State is undefined. State 1 represents the active device state after a
successful VS_COMMIT_CONTROL(SET_CUR). Additional configuration of the encoder
through Encoding Units may occur before or during streaming. When in state 1 or 2, a
VS_COMMIT_CONTROL(SET_CUR) equal to the current
VS_COMMIT_CONTROL(GET_CUR) will have no effect on the device, meaning that the
Encoding Unit state will also remain the same. When in state 1 or 2, a
VS_COMMIT_CONTROL(SET_CUR) different than the current
VS_COMMIT_CONTROL(GET_CUR) will update both the Commit and the Encoding Unit
State according to the new Commit state and the rules defined in Table 4-49; this transition is
not shown in the state diagram.
Table 4-50 describes how Encoding Units interact with each of these states. This description
includes possible errors that may be logged by the device in response to USB Requests issued to
Encoding Unit controls. Errors may be retrieved by the host using the Request Error Code
Control (see section 4.2.1.2 "Request Error Code Control").
[Figure 4-1: Device states 0, 1 and 2. VS_COMMIT_CONTROL(SET_CUR) moves the device
from state 0 to state 1; SET_INTERFACE(>0) moves it from state 1 to state 2;
SET_INTERFACE(0) returns it from state 2 to state 1. EU_*_CONTROL(SET_CUR) requests are
accepted in states 1 and 2 without changing state.]
When in state 2, streaming, the only valid parameter value in a SET_INTERFACE request is
alternate setting 0.
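The state transitions described above can be sketched as a small table-driven state machine; the request labels are shorthand for the USB requests named in the text, and requests outside the table (GET_CUR, GET_MIN, EU_*_CONTROL(SET_CUR), and so on) leave the state unchanged:

```python
# States: 0 = USB Configured (Encoding Unit state undefined),
#         1 = committed but not streaming, 2 = streaming.
TRANSITIONS = {
    (0, "VS_COMMIT_CONTROL(SET_CUR)"): 1,
    (1, "SET_INTERFACE(>0)"): 2,
    (2, "SET_INTERFACE(0)"): 1,
}

def next_state(state: int, request: str) -> int:
    # Requests not listed (GET_* requests, EU controls, a commit equal to
    # the current commit) do not trigger a state transition.
    return TRANSITIONS.get((state, request), state)
```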
Table 4-50 Encoding Units, Devices States and Error Code Control Responses
State | Streaming State | Encoding Unit State | Error Code Control Response to USB Requests
issued to Encoding Unit Controls

State 0 — not streaming. Encoding Unit State: undefined, except GET_LEN.
Response: Protocol STALL with error code:
“Invalid Control” if the EU Control is not supported. Else,
“Invalid Request” if the USB request is not supported for this Control. Else,
“Wrong State” if the EU Control is supported after the initial
VS_COMMIT_CONTROL(SET_CUR).

State 1 — not streaming. Encoding Unit State: defined by the default values given in
Table 4-49, together with the values set in a successful SET_CUR request to the Encoding Unit.
Response: Protocol STALL with error code:
“Invalid Control” if the EU Control is not supported. Else,
“Invalid Request” if the USB request is not supported for this Control. Else,
“Wrong State” if the EU is supported only while streaming or if the active
wLayerOrView is not valid. Else,
“Out of Range” if any of the input
[Sequence, host to device: PROBE_CONTROL(SET_CUR), PROBE_CONTROL(GET_CUR),
COMMIT_CONTROL(GET_CUR), then EU_RESOLUTION_CONTROL(SET_CUR); streaming
then starts.]
appropriate bitrate and CPB size. At this point the host starts the stream selecting alternate
interface number 1. At a later time, while still streaming, the host increases the resolution to
720p. To do this, the host first increases the CPB size and then increases the bit rate, staying
within all bit rate and buffer size limits when making these changes. Finally, the host changes
the resolution to 720p.
The example below shows a failed Encoding Unit request to set the frame rate to a higher value
than was negotiated.
[Sequence, host to device: PROBE_CONTROL(SET_CUR), PROBE_CONTROL(GET_CUR),
COMMIT_CONTROL(GET_CUR), SELECT_ALTERNATE_INTERFACE(1); streaming starts,
and the subsequent Encoding Unit request fails.]
The dependency_id, quality_id, temporal_id, and view_id of each layer of a stream are determined
by the scaling capability mode negotiated in Probe and Commit. The stream_id parameter is used to
differentiate between streams when simulcast of two or more streams is enabled. In the case of a
single stream, stream_id is always zero. Each additional stream is given a unique stream_id by
incrementing the stream_id by 1. If the stream supports simulcast transport and the stream_id does
not exist, the device should protocol STALL with error code “Out of Range”. If wLayerOrViewID
specifies a layer or view that is not defined by the bmLayoutPerStream established during
VS_COMMIT the device shall protocol STALL with error code “Out of Range”. If the stream does
not support simulcast transport and the stream_id is not zero, the device should ignore the stream_id
and proceed with the Select Layer request for the first stream.
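The stream_id and wLayerOrViewID validation rules above can be sketched as follows; the function name and parameter shapes are illustrative, and the error strings stand in for the Request Error Code Control values:

```python
def check_select_layer(stream_id: int, layer_id: int,
                       simulcast_streams: int, defined_layers: set):
    """Sketch of the Select Layer validation rules.

    simulcast_streams: number of simulcast streams (1 if simulcast disabled).
    defined_layers: layer/view IDs defined by the bmLayoutPerStream
    established during VS_COMMIT.
    Returns the error-code string to STALL with, or None on success.
    """
    if simulcast_streams > 1 and stream_id >= simulcast_streams:
        return "Out of Range"      # stream_id does not exist
    if simulcast_streams == 1:
        stream_id = 0              # ignore stream_id, use the first stream
    if layer_id not in defined_layers:
        return "Out of Range"      # layer/view not in bmLayoutPerStream
    return None
```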
4.2.2.4.2.2 Multicast
This specification supports multicast, i.e., multiple encoded video streams from a single video
function. The device must offer a separate Encoding Unit for each Video Streaming interface that
delivers encoded video.
[Diagram: each Video Streaming interface (Interface 2, Interface 3) has its own Encoding Unit
(EU) and Output Terminal (OT) feeding a dedicated USB IN endpoint (Endpoint 2, Endpoint 3).]
Probe control must accurately reflect the capabilities of the device, including what has already been
negotiated using VS_COMMIT_CONTROL on other streaming interfaces.
For variable and constant bit rate buffer modeling, this document specifies rate control operation
in terms of a leaky bucket model. The bits used to encode each picture are analogous to cups of
water being dumped into the top of the bucket when each picture is encoded (after re-ordering
the pictures as necessary for bitstream orders that differ from display order); the level of water in
the bucket indicates the number of bits waiting to be sent to the decoder; and the water leaking
out of a hole in the bottom of the bucket corresponds to bits flowing into the decoder through a
transmission channel. The leaky bucket is a traffic meter that contains two parameters:
RP = dwPeakBitRate (bits per second), which is the peak bit rate at which bits can flow
out from the bottom of the bucket
B = dwCPBsize (bits), which is the coded picture buffer (CPB) capacity.
[Diagram: pictures are dumped into the top of a leaky bucket of capacity B (the buffer); the water
level represents the bits waiting to be sent, which drain out of the bottom of the bucket.]
The buffer serves to smooth out local bit rate fluctuations while limiting the total bit usage that is
possible over longer durations and limiting the buffering capacity necessary for a decoder to be
able to decode the video content.
The leaky bucket model at the encoder has a corresponding mirror-image model that operates
from the decoder perspective. As bits leak out of the encoder buffer, they conceptually enter into
a corresponding decoder input buffer, which continues to fill up until the decoding time of a
picture arrives – at which time the bits for that picture are removed from the decoder’s CPB.
If too many bits are dumped into the bucket too quickly, the buffer capacity B would be
exceeded before enough bits have time to drain out of the hole in the bottom of the bucket, and
the buffer is said to “overflow” from the encoder perspective. From the decoder perspective, an
overflow could occur if the removal of pictures from the decoder CPB at the decoding times of
those pictures is not fast enough to keep up with the amount of bits that have been flowing into it
from the encoder.
The encoder shall ensure that the leaky bucket never overflows.
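The overflow condition can be checked with a simple simulation of the encoder-side leaky bucket; this is a sketch, not a normative algorithm, and assumes a fixed frame interval in seconds:

```python
def leaky_bucket_overflows(picture_bits, peak_bit_rate, cpb_size,
                           frame_interval):
    """Simulate the encoder-side leaky bucket.

    Each picture's bits are dumped into the bucket; before the next picture,
    up to peak_bit_rate * frame_interval bits drain out of the bottom.
    Returns True if the level ever exceeds cpb_size (overflow).
    """
    level = 0.0
    for bits in picture_bits:
        level += bits                      # picture dumped into the bucket
        if level > cpb_size:
            return True                    # buffer capacity B exceeded
        drained = peak_bit_rate * frame_interval
        level = max(0.0, level - drained)  # bits flow to the decoder
    return False
```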
Table 4-58 Rate Control Mode Control
Control Selector EU_RATE_CONTROL_MODE_CONTROL
Mandatory Requests SET_CUR, GET_CUR, GET_DEF, GET_INFO, GET_LEN
wLength 1
Offset Field Size Value Description
0 bRateControlMode 1 Number 0: Reserved
1: Variable Bit Rate low delay (VBR)
2: Constant bit rate (CBR)
3: Constant QP
4: Global VBR low delay (GVBR)
5: Variable bit rate non-low delay (VBRN)
6: Global VBR non-low delay (GVBRN)
7-255: Reserved
It is allowed for the bit rate produced by the video encoder to exceed dwAverageBitRate
(e.g., when there is an exceptionally high degree of activity in the video scene).
It is also allowed for the bit rate produced by the video encoder to be less than
dwAverageBitRate (e.g., when there is very little activity in the video scene or when
lighting conditions are poor).
In contrast, the peak bit rate RP = dwPeakBitRate bps and the total buffer capacity B,
which are the operating parameters of the leaky bucket model, correspond to a mandatory
maximum not to be exceeded by the encoder on a long-term basis (i.e., the leaky bucket model
shall not overflow).
The average bit rate parameter dwAverageBitRate shall be set to a value less than or equal to peak
bit rate = dwPeakBitRate x 64.
For multi-layer bitstreams, the VBR-control model applies to the currently selected sub-bitstream
as defined in section 4.2.2.4.2.1. For single-layer bitstreams, this rate control model applies to the
entire bitstream, because there is only one layer in the bitstream.
The dwAverageBitRate value returned by the device upon a GET_MIN request specifies the
minimum average bit rate for the sub-bitstream at the sub-bitstream frame interval. For
GET_MAX, the device returns the maximum capability for the overall stream at the current
frame interval for that stream, summing up bit rates across all layers in the sub-bitstream. These
values are for the active resolution as specified by
EU_VIDEO_RESOLUTION_CONTROL(GET_CUR).
Table 4-59: Average Bitrate Control
Control Selector EU_AVERAGE_BITRATE_CONTROL
Mandatory Requests SET_CUR, GET_CUR, GET_MIN, GET_MAX, GET_DEF,
GET_INFO, GET_LEN, GET_RES
wLength 4
Offset Field Size Value Description
0 dwAverageBitRate 4 Number Average bit rate, in bits per second. Must be
less than or equal to dwPeakBitRate x 64.
Applies for all rate control modes except
Constant QP.
This control should be reported as an AutoUpdate control. When this control is reported as
AutoUpdate, any changes to the encoder that alter the values of GET_MIN and GET_MAX must
be reported via AutoUpdate.
for the given bLevelIDC specified in the Frame descriptor. The host should set the current
dwCPBsize appropriately for the dwPeakBitRate.
the rate control mode changes to something other than Constant QP. GET_MIN, GET_MAX,
and GET_RES shall return values for each field.
If a slice type (I, P, B) is not supported by the format or device, or if it is not supported for the
active wLayerOrViewID, then
For a SET_CUR request to the EU_QUANTIZATION_PARAMS_CONTROL the
device will ignore the wQpPrime field for that slice type.
wLength 2
Offset Field Size Value Description
0 bMinQp 1 Number Minimum quantization parameter to use for the frame
(both luma and chroma).
1 bMaxQp 1 Number Maximum quantization parameter to use for the frame
(both luma and chroma).
This control can be used to narrow the range of possible quantization parameter values (for both
luma and chroma) to use in the encoding process. GET_MIN returns the highest bMinQp and
lowest bMaxQp for the given rate control mode and negotiated bit rate. GET_MAX returns the
lowest bMinQp and highest bMaxQp for the given rate control mode and negotiated bit rate.
The bMinQp and bMaxQp values reflect the global minimum and maximum Qp, including all
offsets.
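The window reported by GET_MAX (lowest bMinQp, highest bMaxQp) bounds any window the host may request; a validity check can be sketched as follows (function name and tuple convention are illustrative):

```python
def qp_window_valid(req_min: int, req_max: int, widest: tuple) -> bool:
    """Check a host-requested [bMinQp, bMaxQp] window against the widest
    window the device reports via GET_MAX (lowest bMinQp, highest bMaxQp)."""
    lo, hi = widest
    # The requested window must be well-ordered and lie inside the
    # device's widest supported window.
    return lo <= req_min <= req_max <= hi
```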
This control should be reported as an AutoUpdate control. When this control is reported as
AutoUpdate, any changes to the encoder that alter the values of GET_MIN and GET_MAX must
be reported via AutoUpdate.
wLayerOrViewID.
3: Generate a non-IDR random-access I frame
for the associated dependency layers of the
current wLayerOrViewID.
4: Generate a non-IDR random-access I frame
that is a long-term reference frame for the
associated dependency layers of the current
wLayerOrViewID.
5: Generate a P frame that is a long-term
reference frame for the associated dependency
layers of the current wLayerOrViewID.
6: Gradual Decoder Refresh (GDR)
7-255: Reserved.
1 wSyncFrameInterval 2 Number In milliseconds. This field indicates the
periodic recurrences of the selected bSyncFrameType.
A value of wSyncFrameInterval = 0 indicates
a single bSyncFrameType with no requirement
for periodic recurrence.
3 bGradualDecoderRefresh 1 Number Indicates a count of frames over which the
gradual decoder refresh occurs. Only valid
when bSyncFrameType = 6 (GDR). When
bSyncFrameType is not 6, this field must be 0.
From a recovery point of view,
bGradualDecoderRefresh + 1 represents the
number of frames required to completely
refresh the picture.
Bits:
0-6: recovery_frame_cnt
7: Reserved
GET_MIN and GET_MAX can be used to determine the minimum and maximum
recovery_frame_cnt over which the encoder can implement GDR. GET_MIN and GET_MAX
can also be used to determine the minimum and maximum wSyncFrameInterval of the current
bSyncFrameType.
GET_MIN and GET_MAX may be used to determine if the device supports changes to
wSyncFrameInterval. If GET_MIN and GET_MAX return the same wSyncFrameInterval
value as GET_CUR, then the device does not support changes to this value.
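The adjustability rule above reduces to a simple comparison of the three attribute values; this sketch assumes the values have already been retrieved from the device:

```python
def interval_is_adjustable(get_min: int, get_cur: int, get_max: int) -> bool:
    """Per the rule above: if GET_MIN and GET_MAX return the same
    wSyncFrameInterval value as GET_CUR, the device does not support
    changing it."""
    return not (get_min == get_cur == get_max)
```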
The Long Term Reference controls defined in this specification support two different trust
models:
Trust Until (only notify when something has gone wrong)
Don’t Trust Until (notify all results, success or failure)
The host can decide which Trust Model to use by setting bTrustMode to the desired mode.
To implement a “Don’t Trust Until” solution, set bTrustMode to 0. If an LTR has been
validated by the host, then the associated position in the list of LTRs can be set to 1 using the
EU_LTR_VALIDATION_CONTROL. To implement a “Trust Until” model, set bTrustMode
to 1. If an LTR is confirmed as no longer valid by the host, then the associated position in the list
of LTRs can be set to 0 using the EU_LTR_VALIDATION_CONTROL.
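The two trust models differ only in the initial validity assumed for each host-controlled LTR position; a bookkeeping sketch (the function name and list representation are illustrative):

```python
def initial_ltr_valid_flags(trust_mode: int, num_ltrs: int) -> list:
    """Sketch of the two LTR trust models.

    bTrustMode = 0 ("Don't Trust Until"): every LTR starts invalid (0);
    the host sets a position to 1 via EU_LTR_VALIDATION_CONTROL once it
    has validated that LTR.
    bTrustMode = 1 ("Trust Until"): every LTR starts valid (1); the host
    sets a position to 0 once it confirms the LTR is no longer valid.
    """
    default = 0 if trust_mode == 0 else 1
    return [default] * num_ltrs
```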
The EU_LTR_BUFFER_CONTROL allows for discovery and allocation of long term reference
(LTR) frames on the device. It also allows the host to set the device behavior when inserting new
LTR frames using EU_LTR_PICTURE_CONTROL. If the encoder on the device does not have
enough memory to enable long term reference frames at the current resolution, then the
GET_MAX shall return bNumHostControlLTRBuffers equal to 0. If the encoder allows the
host to manage the LTR buffers, it shall assign continuous index space starting from index 1.
GET_MAX returns the maximum number of LTR buffers available for host control.
When bLTRMode is 0, the new LTR generated by this control must only reference host
controlled LTRs that have been validated. When bLTRMode is 1, the new LTR generated by
this control must only reference valid host or encoder controlled LTRs. When bLTRMode is 2,
the encoder is free to use any long or short term references it wishes when creating the new
LTR. When a request for a new LTR frame is still pending, the device shall protocol STALL
any new requests to this control for new LTR frames with bRequestErrorCode = “Not Ready”,
D36: parallel_decoding_info
D37: mvc_scalable_nesting
D38: view_scalability_info
D39: multiview_scene_info
D40: multiview_acquisition_info
D41: non_required_view_component
D42: view_dependency_change
D43: operation_points_not_present
D44: base_view_temporal_hrd
D45: frame_packing_arrangement
D63..D46: Reserved. Set to 0
Each bit in bmSEIMessages represents a different SEI message and when the associated bit is 1
the SEI message is enabled. Multiple types of SEI messages can be enabled/disabled
simultaneously with this control. Bits set in the GET_CUR response will indicate the SEI
messages that are currently enabled. Bits set in the GET_MAX response will indicate the SEI
messages that the device supports. Bits set in the GET_MIN response will indicate which SEI
messages are enabled and cannot be disabled by the host.
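The bmSEIMessages semantics above imply that the set of bits the host may actually toggle is the supported set minus the forced-on set; a bitmask sketch (function names are illustrative):

```python
def sei_toggleable(get_max_mask: int, get_min_mask: int) -> int:
    """Bits the host may enable or disable: supported by the device
    (GET_MAX) but not forced-on (GET_MIN)."""
    return get_max_mask & ~get_min_mask

def enable_sei(cur_mask: int, bit: int) -> int:
    """Enable one SEI message type in the bmSEIMessages bitmap."""
    return cur_mask | (1 << bit)
```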
depending on it.
1: Start streaming the current layer (and all layers it
depends on) once the device starts streaming. If this
control is not issued before streaming is enabled
(before SET_INTERFACE), the encoder shall
stream every layer when streaming is enabled.
GET_CUR returns the current state of the layer (0
if stopped, or 1 if started).
Note that the host must adjust the current average bit rate and current CPB size prior to changing
the level_idc to guarantee those do not violate the new level_idc.
When this value changes, the device must adjust the values returned by
EU_AVERAGE_BITRATE_CONTROL(GET_MAX) and
EU_CPB_SIZE_CONTROL(GET_MAX) to satisfy the new level_idc.
In response to a GET_RES request, the device shall set the bits for supported error resiliency
features to 1. All other bits should be set to 0. In response to a GET_DEF request, the device shall
set the bits to 1 for the tools that are enabled in the device default configuration. All other bits shall
be set to 0. While the exact meaning of bmErrorResiliencyFeatures is established in each
payload specification, several possible examples are given below.
The bRequest field indicates which attribute the request is manipulating. The MIN, MAX, and
RES attributes are not supported for the Set request.
The wValue field specifies the Control Selector (CS) in the high byte and zero in the low byte.
The Control Selector indicates which vendor-defined control within the Extension Unit that this
request is manipulating. If the request specifies an unknown or unsupported CS to that Unit, the
control pipe must indicate a stall. However, if the request specifies an available control, the
request should succeed.
The range of CS values supported by the Extension Unit is dictated by the number of controls
specified by the bNumControls field in the Extension Unit descriptor. See section 3.7.2.7,
"Extension Unit Descriptor". The range shall be [1..bNumControls].
The GET_LEN request queries for the length of the parameter block of the specified control.
When issuing the GET_LEN request, the wLength field shall always be set to a value of 2 bytes.
The result returned shall be the length specified for all other requests on the same control.
All controls supported by the Extension Unit must support the following requests:
GET_CUR, GET_MIN, GET_MAX, GET_RES, GET_INFO, GET_DEF, GET_LEN.
The following request(s) are optional, depending on the control usage and behavior:
SET_CUR
All Extension Unit controls are vendor-defined. The vendor must provide the relevant host
software to program these controls. The generic host driver will not have knowledge of the
control semantics, but acts as a control transport between the vendor-provided host software and
the device.
However, by using the GET_LEN request, the host driver would be able to query the length and
raw data stored in the vendor-defined controls. While it would not be able to interpret this data, it
would be capable of saving and restoring these control settings if required.
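A generic save routine of that kind can be sketched as follows. The transfer callbacks are hypothetical stand-ins for the host driver's GET_LEN and GET_CUR transfers; only the request semantics come from the specification:

```python
def save_xu_controls(read_len, get_cur, num_controls: int) -> dict:
    """Save the raw parameter blocks of all vendor-defined Extension Unit
    controls without interpreting them.

    read_len(cs)   -> parameter-block length of control CS (a GET_LEN transfer)
    get_cur(cs, n) -> n raw bytes of the control's current value (GET_CUR)
    CS values run from 1 to bNumControls inclusive.
    """
    saved = {}
    for cs in range(1, num_controls + 1):
        n = read_len(cs)          # GET_LEN: length of this control's block
        saved[cs] = get_cur(cs, n)  # GET_CUR: opaque vendor-defined data
    return saved
```

Restoring is the mirror image: issue a SET_CUR with each saved blob for controls that support it.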
The wValue field specifies the Control Selector (CS) in the high byte, and the low byte must be
set to zero. The CS indicates the type of Control that this request is manipulating. If the request
specifies an unknown CS to that endpoint, the control pipe must indicate a stall.
The VideoStreaming interface controls allow the host software to query and set parameters
related to the video stream format and the video stream encoder. These parameters include the
format, frame size and frame rate of the video stream, as well as the format and frame size of still
images captured by the device that are associated with the video stream. For devices that support
host-adjustable video stream encoder parameters, controls allowing the adjustment of the key
frame rate and compression quality, among other parameters, are also supported. Only Stream
Error Code Control supports interrupt with VideoStreaming interface.
This negotiation model is supported by the Video Probe and Commit controls. The Probe control
allows retrieval and negotiation of streaming parameters. When an acceptable combination of
streaming parameters has been obtained, the Commit control is used to configure the hardware
with the negotiated parameters from the Probe control.
Additional Encoding Units may be used to finalize the configuration of the video streaming
interface after Probe and Commit but before streaming starts. This hybrid model of Descriptor
plus Encoding Unit was chosen as the best model to navigate the complex space of encoder
configuration.
Table 4-75 Video Probe and Commit Controls
Control Selector VS_PROBE_CONTROL
VS_COMMIT_CONTROL
Mandatory Requests See tables below
wLength 48
Offset Field Size Value Description
0 bmHint 2 Bitmap Bitfield control indicating to the function
what fields shall be kept fixed (indicative
only):
D0: dwFrameInterval
D1: wKeyFrameRate
D2: wPFrameRate
D3: wCompQuality
D4: wCompWindowSize
D15..5: Reserved (0)
During Probe and Commit, the following fields, if supported, shall be negotiated in order of
decreasing priority:
bFormatIndex
bFrameIndex
dwMaxPayloadTransferSize
bUsage
bmLayoutPerStream
Fields set to zero by the host with their associated bmHint bit set to 1
All the remaining fields set to zero by the host
For simplicity when streaming temporally encoded video, the required bandwidth for each
streaming interface shall be estimated using the maximum bit rate for the selected
profile/resolution and the number of simulcast streams. The USB bandwidth reserved shall be
calculated by the host as the advertised dwMaxBitRate from the selected Frame Descriptor
multiplied by the number of simulcast streams as defined in the bmLayoutPerStream field.
The interface descriptor for the video function should have multiple alternate settings that
support the required bandwidths calculated in the manner above.
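The bandwidth estimate described above is a single multiplication; as a sketch (function name illustrative, rates in bits per second):

```python
def required_bandwidth(dw_max_bit_rate: int, num_simulcast_streams: int) -> int:
    """Estimate the USB bandwidth to reserve for one streaming interface:
    the selected Frame descriptor's dwMaxBitRate multiplied by the number
    of simulcast streams defined in bmLayoutPerStream."""
    return dw_max_bit_rate * num_simulcast_streams
```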
This request shall protocol STALL in the case where the device would
be placed into an unsupported state, or where the value of a
negotiated field is out of range. For exact errors to register, see section
4.2.1.2 "Request Error Code Control".
This request shall protocol STALL in the case where the device would
be placed into an unsupported state, or where the value of a
negotiated field is out of range. For exact errors to register, see section
4.2.1.2 "Request Error Code Control".
[Sequence, host to device: VS_PROBE_CONTROL(SET_CUR),
VS_PROBE_CONTROL(GET_CUR), VS_COMMIT_CONTROL(SET_CUR),
SET_INTERFACE(1).]
[Sequence, host to device: VS_PROBE_CONTROL(SET_CUR),
VS_PROBE_CONTROL(GET_CUR), VS_COMMIT_CONTROL(SET_CUR),
SET_INTERFACE(1), which fails; the host then repeats the Probe/Commit negotiation
and issues SET_INTERFACE(1) again.]
[Sequence, host to device: VS_PROBE_CONTROL(SET_CUR),
VS_PROBE_CONTROL(GET_CUR), VS_COMMIT_CONTROL(SET_CUR),
SET_INTERFACE(1); while streaming, the host issues a further
VS_COMMIT_CONTROL(SET_CUR).]
Only devices capable of video streaming with adjustable delay latency parameters support this
control. The control notifies the video application buffer memory manager on the device to
manage an internal latency by controlling the output timing of the video data to its endpoint.
It is the responsibility of the host (video sink) to synchronize streams by scheduling the
rendering of samples at the correct moment, taking into account the internal delays of all media
streams being rendered.
Table 4-81 Synch Delay Control
Control Selector VS_SYNCH_DELAY_CONTROL
Mandatory Requests SET_CUR, GET_CUR, GET_MIN, GET_MAX, GET_RES,
GET_INFO, GET_DEF
wLength 2
Offset Field Size Value Description
0 wDelay 2 Number Delay from the time that the packet
should be sent. wDelay is expressed in
microsecond units.
The Generate Key Frame Control is used to notify the video encoder on the device to generate a
key frame in the device stream at its earliest opportunity. After the key frame has been generated,
the device shall reset the control to the “Normal Operation” mode. This control is only applicable
to video formats that support temporal compression (such as MPEG-2 Video), and while
streaming is occurring. In all other cases, the device shall respond to requests by indicating a stall
on the control pipe.
Table 4-83 Generate Key Frame Control
Control Selector VS_GENERATE_KEY_FRAME_CONTROL
Mandatory Requests SET_CUR, GET_CUR, GET_INFO
wLength 1
Offset Field Size Value Description
0 bGenerateKeyFrame 1 Number The setting for the attribute of the
addressed Generate Key Frame control:
0: Normal operation
1: Generate Key Frame
The Update Frame Segment Control is used to notify the video encoder on the device to encode
the specified range of video frame segments with intra coding (no dependency on surrounding
frames) at its earliest opportunity. A video frame segment corresponds to a group of macroblocks
that can be decoded independently, such as a slice in MPEG Video, or a Group of Blocks in
H.26x Video. This control is only applicable to video formats that support the concept of a video
frame segment, and while streaming is occurring. In all other cases, the device shall respond to
requests by indicating a stall on the control pipe.
The device will indicate the number of frame segments that it supports through the GET_MAX
request, for which the device will indicate the maximum frame segment index supported in both
the bStartFrameSegment and bEndFrameSegment fields. The minimum value for these fields
shall always be zero. The resolution for this control shall always be set to 1.
Table 4-84 Update Frame Segment Control
Control Selector VS_UPDATE_FRAME_SEGMENT_CONTROL
Mandatory Requests SET_CUR, GET_CUR, GET_MIN, GET_MAX, GET_RES,
GET_INFO, GET_DEF
wLength 2
Offset Field Size Value Description
0 bStartFrameSegment 1 Number The zero-based index of the first frame
segment in the range to update
1 bEndFrameSegment 1 Number The zero-based index of the last frame
segment in the range to update
The host software should send a GET_CUR request to this control to determine the error when
one of the following events occurs:
- The Error bit in the video or still image payload header is set by the device (see section
2.4.3.2.2, “Sample Isochronous Transfers”).
- The device issues a “Stream Error” interrupt to the host, with the source being the Stream
Error Code Control (see section 2.4.2.2, “Status Interrupt Endpoint”).
- A bulk video endpoint returns a STALL packet to the host in the data or handshake stage
of the transaction.
For scenarios where the host is transmitting video data to the device, the host cannot use the
Error bit in the payload header to detect a device error. Therefore, in order to determine when a
streaming error occurs, the host must rely on either a Control Change interrupt from the device
or a bulk endpoint stall.
Table 4-85 Stream Error Code Control
Control Selector VS_STREAM_ERROR_CODE_CONTROL
Mandatory Requests GET_CUR, GET_INFO
wLength 1
Offset Field Size Value Description
0 bStreamErrorCode 1 Number 0: No Error.
VS_FORMAT_VP8 0x16
VS_FRAME_VP8 0x17
VS_FORMAT_VP8_SIMULCAST 0x18
CT_ROLL_RELATIVE_CONTROL 0x10
CT_PRIVACY_CONTROL 0x11
CT_FOCUS_SIMPLE_CONTROL 0x12
CT_WINDOW_CONTROL 0x13
CT_REGION_OF_INTEREST_CONTROL 0x14
EU_AVERAGE_BITRATE_CONTROL 0x07
EU_CPB_SIZE_CONTROL 0x08
EU_PEAK_BIT_RATE_CONTROL 0x09
EU_QUANTIZATION_PARAMS_CONTROL 0x0A
EU_SYNC_REF_FRAME_CONTROL 0x0B
EU_LTR_BUFFER_CONTROL 0x0C
EU_LTR_PICTURE_CONTROL 0x0D
EU_LTR_VALIDATION_CONTROL 0x0E
EU_LEVEL_IDC_LIMIT_CONTROL 0x0F
EU_SEI_PAYLOADTYPE_CONTROL 0x10
EU_QP_RANGE_CONTROL 0x11
EU_PRIORITY_CONTROL 0x12
EU_START_OR_STOP_LAYER_CONTROL 0x13
EU_ERROR_RESILIENCY_CONTROL 0x14
Optical and digital zoom are functionally independent, so each will be discussed separately in the
following sections. Although functionally independent, users will expect a single zoom control
that integrates both.
The objective lens is the one nearest the subject, while the ocular lens is the one nearest the
viewer, or in our case, the camera sensor. A zoom lens varies the objective focal length.
Since magnification is a ratio of the objective and ocular focal lengths, the Units used to specify
these focal lengths can be of any resolution supported by the device. In other words, these Units
do not need to be specified in real physical units (millimeters or fractions of inches). The only
requirement is that the two focal lengths are specified in the same units.
Note that when Lobjective < Locular, the lenses are at a wide-angle setting. The subject will appear
smaller than life, and the field of view will be wider.
Locular will be a device-specific constant value for each camera implementation, so it will be
specified within the static Camera Terminal descriptor. If a camera implements an optical zoom
function, Lobjective can vary within a specified range. In order to properly represent the range of
magnification, Lobjective will be specified as a range Lmin to Lmax, which will also be specified
within the static Camera Descriptor.
Finally, the variable position within the range of possible Lobjective values will be specified via a
dynamic Camera Zoom Control, as integral values Zmin, Zmax, Zstep, and Zcur. See sections
4.2.2.1.12, "Zoom (Absolute) Control" and 4.2.2.1.13, "Zoom (Relative) Control". This allows
the Units of the objective lens focal length to be de-coupled from the Units used to control zoom.
For simplicity, Zstep will be constrained to equal one (1). Values of Lmin and Lmax are constrained
to be non-zero integral numbers; however, for the purpose of the following calculations, Lcur will
be a real number.
Note: A typical choice for Locular would be half the length of a diagonal line of the imager
(CCD, etc.); however, there is no requirement for this value to be a direct physical measurement.
Given a known Zcur, the current objective focal length (Lcur) can be calculated as follows:
Lcur = ((Zcur – Zmin) * (Lmax – Lmin)) / (Zmax – Zmin) + Lmin
Working from the opposite direction, given a known magnification (M), Lcur can be calculated as
follows:
Lcur = M * Locular
From this, the current Zoom control value (Zcur) can be calculated as follows:
Zcur = ((Lcur – Lmin) * (Zmax – Zmin)) / (Lmax – Lmin) + Zmin
To further simplify the calculations, Zmin can be constrained to be zero (0). The camera designer
will choose the values and ranges of the remaining variables according to the capabilities of the
device.
Lmin = 800
Lmax = 10000
Zmin = 0
Zmax = 255
Lcur = (Zcur * 9200) / 255 + 800
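The mapping between the Zoom control value and the objective focal length is a linear interpolation, consistent with the example values above. The following is a minimal sketch; the function names and the rounding choice in the inverse mapping are illustrative, not part of this specification.

```python
def zoom_to_focal_length(z_cur, z_min, z_max, l_min, l_max):
    """Map a Zoom (Absolute) control value onto the objective focal
    length range [l_min, l_max] (device-chosen units)."""
    return (z_cur - z_min) * (l_max - l_min) / (z_max - z_min) + l_min

def focal_length_to_zoom(l_cur, z_min, z_max, l_min, l_max):
    """Inverse mapping: objective focal length back to a control value.
    Rounding to the nearest step is an implementation choice."""
    return round((l_cur - l_min) * (z_max - z_min) / (l_max - l_min)) + z_min

def magnification(l_cur, l_ocular):
    """Optical magnification is the ratio of objective to ocular focal length."""
    return l_cur / l_ocular
```

With the example values (Lmin = 800, Lmax = 10000, Zmin = 0, Zmax = 255), Zcur = 255 yields Lcur = 10000 and Zcur = 0 yields Lcur = 800.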
When choosing a camera sensor to match a lens system, the camera designer may need to
consider a multiplier effect caused by a sensor that is smaller than the exit pupil of the ocular
lens. This multiplier will not be represented explicitly in the USB Video Class specification,
since its effect can be represented via adjustments to the Lobjective values.
Note: The Zcur value can be mapped to the physical lens position sensor control/status register.
Digital zoom is represented as a multiplier of the current optical magnification of the captured
image. In order to change the amount of digital zoom, the multiplier is changed through a range
from 1 to some maximum value mmax, and mmax will be specified in the Processing Unit
Descriptor. The position within the range of possible values of multiplier m will be expressed via
a Processing Unit Digital Multiplier Control, as Zmin, Zmax, Zstep, and Zcur. See section
4.2.2.3.16, "Digital Multiplier Control". This allows the multiplier resolution to be described by
the device implementation. Zstep will be constrained to equal one (1).
Given a known Zcur, the current multiplier mcur can be calculated as follows:
mcur = ((Zcur – Zmin) * (mmax – 1)) / (Zmax – Zmin) + 1
From this, and referring to the optical zoom values of Lmax and Locular described in the previous
section, the total magnification M can be calculated as follows:
M = mcur * (Lmax / Locular)
Working from the opposite direction, given a known magnification M, the multiplier mcur can be
calculated as follows:
mcur = M * (Locular / Lmax)
From this, the current Digital Multiplier Control value (Zcur) can be calculated as follows:
Zcur = ((mcur – 1) * (Zmax – Zmin)) / (mmax – 1) + Zmin
For simplicity, Zmin can be constrained to be zero (0). The camera designer will choose the
values and ranges of the remaining variables according to the capabilities of the device.
mmax = 40
Zmin = 0
Zmax = 255
mcur = (Zcur * 39) / 255 + 1
The current Digital Zoom control value (Zcur) can be calculated as follows:
Zcur = ((mcur – 1) * 255) / 39
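The digital multiplier conversions follow the same linear pattern as the optical zoom mapping. A minimal sketch, consistent with the example values above (function names and rounding are illustrative):

```python
def zcur_to_multiplier(z_cur, z_min, z_max, m_max):
    """Map the Digital Multiplier control value onto the range [1, m_max]."""
    return (z_cur - z_min) * (m_max - 1) / (z_max - z_min) + 1

def multiplier_to_zcur(m_cur, z_min, z_max, m_max):
    """Inverse mapping: multiplier back to a control value (rounded)."""
    return round((m_cur - 1) * (z_max - z_min) / (m_max - 1)) + z_min

def total_magnification(m_cur, l_max, l_ocular):
    """Digital zoom multiplies the full-telephoto optical magnification
    Lmax / Locular, per the relation M = mcur * (Lmax / Locular)."""
    return m_cur * l_max / l_ocular
```

With mmax = 40, Zmin = 0, and Zmax = 255, the endpoints map to mcur = 1 at Zcur = 0 and mcur = 40 at Zcur = 255.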
In addition to the Digital Multiplier Control, devices may optionally support a Digital Multiplier
Limit control, allowing either the camera or the host to establish a temporary upper limit for the
Zcur value. This control may be read-only if the limit can only be established via physical camera
configuration. If this control is used to decrease the limit below the current Zcur value, the Zcur
value will be adjusted to match the new limit.
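The limit-adjustment rule above amounts to a clamp of Zcur against the current limit. A minimal sketch of the required device behavior (the function name is illustrative):

```python
def apply_multiplier_limit(z_cur, z_limit):
    """When the Digital Multiplier Limit is set below the current Zcur,
    the device must adjust Zcur down to the new limit; otherwise Zcur
    is left unchanged."""
    return min(z_cur, z_limit)
```

For example, lowering the limit to 128 while Zcur is 200 forces Zcur to 128, while a Zcur of 100 is unaffected.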
The following diagram illustrates the relationship between optical and digital zoom, and the
constraints on the zoom control variables:
(Diagram: within the Optical Range, Zmin <= Zcur <= Zmax; digital zoom takes effect once Zcur == Zmax.)