WebRTC For The Curious
https://github.com/webrtc-for-the-curious/webrtc-for-the-curious
Contents

WebRTC For The Curious
    Who is this book for
    Designed for multiple readings
    Non-commercial and privacy-respecting
    Get Involved!
    License
What is WebRTC?
    Why should I learn WebRTC?
    WebRTC Protocol is a collection of other technologies
        Signaling: How peers find each other in WebRTC
        Connecting and NAT Traversal with STUN/TURN
        Securing the transport layer with DTLS and SRTP
        Communicating with peers via RTP and SCTP
    WebRTC, a collection of protocols
    How does WebRTC (the API) work
    Secure E2E Communication
    How does it work?
    Networking real-world constraints
        Not in the same network
        Protocol Restrictions
        Firewall/IDS Rules
    NAT Mapping
        Creating a Mapping
        Mapping Creation Behaviors
        Mapping Filtering Behaviors
        Mapping Refresh
    STUN
        Protocol Structure
        Create a NAT Mapping
        Determining NAT Type
    TURN
        TURN Lifecycle
        TURN Usage
    ICE
        Creating an ICE Agent
        Candidate Gathering
        Connectivity Checks
        Candidate Selection
        Restarts
    Extensions
    Mapping Payload Types to Codecs
    RTCP
        Packet Format
        Full INTRA-frame Request
        Negative ACKnowledgements
        Sender/Receiver Reports
        Generic RTP Feedback
    How RTP/RTCP solve problems
        Negative Acknowledgment
        Forward Error Correction
        Congestion Control
        JitterBuffer
Applied WebRTC
    By Use Case
        Conferencing
        Broadcasting
        Remote Control
        File-Transfer
        Distributed CDN
        IoT
        Protocol Bridging
    WebRTC Topologies
        Client-Server
        Peer-To-Peer
Debugging
    Isolate The Problem
    Tools of the trade
History
    RTP
SDP
ICE
SRTP
SCTP
DTLS
FAQ
Contributing
• How do we solve it?
• Technical details about the solution.
Each chapter doesn’t assume prior knowledge. You can start at any point in the
book and begin learning.
Get Involved!
We need your help! This book is entirely developed on GitHub and is still being
written. We encourage readers to open issues with questions on things we didn’t
do a good job of covering yet.
License
This book is available under the CC0 license. The authors have waived all their
copyright and related rights in their works to the fullest extent allowed by law.
You may use this work however you want and no attribution is required.
What is WebRTC?
WebRTC, short for Web Real-Time Communication, is both an API and a
Protocol. The WebRTC protocol is a set of rules for two WebRTC agents to
negotiate bi-directional secure real-time communication. The WebRTC API then
allows developers to use the WebRTC protocol. The WebRTC API is specified
only for JavaScript.
A similar relationship would be HTTP and the fetch API. WebRTC the protocol
would be HTTP, and WebRTC the API would be the fetch API.
The WebRTC protocol is available in other APIs/languages besides JavaScript.
You can find servers and domain-specific tools as well for WebRTC. All of these
implementations use the WebRTC protocol so they can interact with each other.
If you don’t know some of these terms yet, this book will teach them to you along
the way.
• Open Standard
• Multiple Implementations
• Available in Browsers
• Mandatory Encryption
• NAT Traversal
• Repurposed existing technology
• Congestion Control
• Sub-second Latency
• Values used while securing (certificate fingerprint)
Note that signaling typically happens “out-of-band”; that is, applications gen-
erally don’t use WebRTC itself to trade signaling messages. Any architecture
suitable for sending messages can be used to relay the SDPs between the con-
necting peers, and many applications will use their existing infrastructure (like
REST endpoints, websocket connections, or authentication proxies) to facilitate
easy trading of SDPs between the proper clients.
Communicating with peers via RTP and SCTP
We now have two WebRTC Agents with secure bi-directional communication.
Let’s start communicating! Again, we use two pre-existing protocols: RTP (Real-
time Transport Protocol), and SCTP (Stream Control Transmission Protocol).
SRTP is used to encrypt media exchanged over RTP, and SCTP is used to send
DataChannel messages encrypted with DTLS.
RTP is quite minimal but provides what is needed to implement real-time
streaming. The important thing is that RTP gives flexibility to the developer
so they can handle latency, loss, and congestion as they please. We will discuss
this further in the media chapter.
The final protocol in the stack is SCTP. SCTP allows many different delivery
options for messages. You can optionally choose unreliable, out-of-order delivery to get the latency needed for real-time systems.
addTrack addTrack creates a new RTP stream. A random SSRC will be
generated for this stream. This stream will then be inside the Session Description
generated by createOffer inside a media section. Each call to addTrack will
create a new SSRC and media section.
Immediately after a SRTP Session is established these media packets will start
being sent via ICE after being encrypted using SRTP.
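As a rough sketch of what this looks like from the JavaScript API (written in TypeScript here; the publishWebcam name is made up for illustration):

```typescript
// A minimal sketch: each addTrack call below produces a new SSRC and a
// new media section in the offer created afterwards.
async function publishWebcam(): Promise<void> {
  const pc = new RTCPeerConnection();

  const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: true,
  });
  for (const track of stream.getTracks()) {
    pc.addTrack(track, stream);
  }

  const offer = await pc.createOffer();
  console.log(offer.sdp); // one m= section per added track
}
```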
oniceconnectionstatechange oniceconnectionstatechange is a callback that fires when the state of the ICE Agent changes. This is how you are notified when you gain network connectivity or when you become disconnected.
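A small sketch of wiring up this callback; the state strings are the ones defined for RTCPeerConnection.iceConnectionState:

```typescript
const pc = new RTCPeerConnection();

pc.oniceconnectionstatechange = () => {
  // States include "new", "checking", "connected", "completed",
  // "disconnected", "failed" and "closed".
  console.log("ICE connection state:", pc.iceConnectionState);
};
```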
How to read the SDP
Every line in a Session Description starts with a single character: this is your key. The key is followed by an equals sign, and everything after that equals sign is the value. After the value is complete, you will have a newline.
The Session Description Protocol defines all the keys that are valid. You can
only use letters for keys as defined in the protocol. These keys all have significant
meaning, which will be explained later.
Take this Session Description excerpt.
a=my-sdp-value
a=second-value
You have two lines, each with the key a. The first line has the value my-sdp-value, the second line has the value second-value.
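The grammar is simple enough that a toy parser fits in a few lines. A sketch (ignoring carriage returns and all the validation a real parser needs):

```typescript
// Split a Session Description into [key, value] pairs: one-character
// key, an equals sign, then the value, one entry per line.
function parseSessionDescription(sdp: string): Array<[string, string]> {
  return sdp
    .trim()
    .split("\n")
    .map((line) => [line[0], line.slice(2)] as [string, string]);
}

parseSessionDescription("a=my-sdp-value\na=second-value");
// => [["a", "my-sdp-value"], ["a", "second-value"]]
```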
You have two Media Descriptions, one of type audio with fmt 111 and one of
type video with fmt 96. The first Media Description has only one attribute. This
attribute maps the Payload Type 111 to Opus. The second Media Description
has two attributes. The first attribute maps the Payload Type 96 to be VP8,
and the second attribute is just my-sdp-value.
Full Example
The following brings all the concepts we have talked about together. These are
all the features of the Session Description Protocol that WebRTC uses. If you
can read this you can read any WebRTC Session Description!
v=0
o=- 0 0 IN IP4 127.0.0.1
s=-
c=IN IP4 127.0.0.1
t=0 0
m=audio 4000 RTP/AVP 111
a=rtpmap:111 OPUS/48000/2
m=video 4002 RTP/AVP 96
a=rtpmap:96 VP8/90000
• v, o, s, c, t are defined but they do not affect the WebRTC session.
• You have two Media Descriptions. One of type audio and one of type
video.
• Each of those has one attribute. This attribute configures details of the
RTP pipeline, which is discussed in the ‘Media Communication’ chapter.
Each Media Description becomes a Transceiver. Every time you create a Transceiver, a new Media Description is added to the local Session Description.
Each Media Description in WebRTC will have a direction attribute. This allows
a WebRTC Agent to declare ‘I am going to send you this codec, but I am not
willing to accept anything back’. There are four valid values:
• sendonly
• recvonly
• sendrecv
• inactive
group:BUNDLE Bundling is the act of running multiple types of traffic over one
connection. Some WebRTC implementations use a dedicated connection per
media stream. Bundling should be preferred.
setup: This controls the DTLS Agent behavior. This determines if it runs as
a client or server after ICE has connected.
• setup:active - Run as DTLS Client
• setup:passive - Run as DTLS Server
• setup:actpass - Ask other WebRTC Agent to choose
ice-ufrag This is the user fragment value for the ICE Agent. Used for the
authentication of ICE Traffic.
ice-pwd This is the password for the ICE Agent. Used for authentication of
ICE Traffic.
rtpmap This value is used to map a specific codec to a RTP Payload Type.
Payload Types are not static, so for every call the Offerer decides the Payload Type for each codec.
fmtp Defines additional values for one Payload Type. This is useful to commu-
nicate a specific video profile or encoder setting.
candidate This is an ICE Candidate that comes from the ICE Agent. This is
one possible address that the WebRTC Agent is available on. These are fully
explained in the next chapter.
a=rtcp-rsize
a=rtpmap:96 VP8/90000
a=ssrc:2180035812 cname:XHbOTNRFnLtesHwJ
a=ssrc:2180035812 msid:XHbOTNRFnLtesHwJ JgtwEhBWNEiOnhuW
a=ssrc:2180035812 mslabel:XHbOTNRFnLtesHwJ
a=ssrc:2180035812 label:JgtwEhBWNEiOnhuW
a=msid:XHbOTNRFnLtesHwJ JgtwEhBWNEiOnhuW
a=sendrecv
a=candidate:foundation 1 udp 2130706431 192.168.1.1 53165 typ host generation 0
a=candidate:foundation 2 udp 2130706431 192.168.1.1 53165 typ host generation 0
a=candidate:foundation 1 udp 1694498815 1.2.3.4 57336 typ srflx raddr 0.0.0.0 rport 57336 generation 0
a=candidate:foundation 2 udp 1694498815 1.2.3.4 57336 typ srflx raddr 0.0.0.0 rport 57336 generation 0
a=end-of-candidates
This is what we know from this message
• We have two media sections, one audio, and one video
• Each of those is a sendrecv Transceiver. We are getting two streams, and
we can send two back.
• We have ICE Candidates and Authentication details so we can attempt to
connect
• We have a certificate fingerprint, so we can have a secure call
Further Topics
In later versions of this book, the following topics will also be addressed. If you
have more questions please submit a Pull Request!
• Renegotiation
• Simulcast
Reduced Bandwidth Costs
Since media communication happens directly between peers you don’t have to
pay for transporting it.
Lower Latency
Communication is faster when it is direct! When a user has to run everything
through your server it makes things slower.
For the hosts in the same network, it is very easy to connect. Communication
between 192.168.0.1 -> 192.168.0.2 is easy to do! These two hosts can
connect to each other without any outside help.
However, a host using Router B has no way to directly access anything behind
Router A. A host using Router B could send traffic directly to Router A, but
the request would end there. How does Router A know which host it should
deliver the message to?
Protocol Restrictions
Some networks don’t allow UDP traffic at all, or maybe they don’t allow TCP.
Some networks have very low MTU. There are lots of variables that network
administrators can change that can make communication difficult.
Firewall/IDS Rules
Another problem is ‘Deep Packet Inspection’ and other intelligent filtering. Some network administrators run software that tries to process every packet. Many times this software doesn’t understand WebRTC, so it blocks the traffic because it doesn’t know what to do with it.
NAT Mapping
NAT(Network Address Translation) Mapping is the magic that makes the connec-
tivity of WebRTC possible. This is how WebRTC allows two peers in completely
different subnets to communicate. It doesn’t use a relay, proxy, or server.
Again we have Agent 1 and Agent 2, in different networks, but this time traffic is flowing completely through. Visualized, that looks like this:
```mermaid
graph TB
  subgraph netb ["Network B (IP Address 5.0.0.2)"]
    b2["Agent 2 (IP 192.168.0.1)"]
    routerb["Router B"]
  end
  subgraph neta ["Network A (IP Address 5.0.0.1)"]
    routera["Router A"]
    a1["Agent 1 (IP 192.168.0.1)"]
  end
  pub{Public Internet}
  a1 -.-> routera
  routera -.-> pub
  pub -.-> routerb
  routerb -.-> b2
```
To make this communication happen you establish a NAT Mapping. Creating
a NAT mapping will feel like an automated/config-less version of doing port
forwarding in your router.
The downside to NAT Mapping is that network behavior is inconsistent
between networks. ISPs and hardware manufacturers may do it in different ways.
In some cases, network administrators may even disable it. The full range of
behaviors is understood and observable, so an ICE Agent is able to confirm it
created a NAT Mapping, and the attributes of the mapping.
The document that describes these behaviors is RFC 4787.
Creating a Mapping
Creating a mapping is the easiest part. When you send a packet to an address
outside your network, a mapping is created! A NAT Mapping is just a temporary
public IP/Port that is allocated by your NAT. The outbound message will be
rewritten to have its source address be the newly created mapping. If a message is then sent to the mapping, it will be automatically routed back to the host inside the NAT that created it.
The details around mappings are where it gets complicated.
Address Dependent Filtering Only the host the mapping was created for
can use the mapping. If you send a packet to host A it can respond back with as
many packets as it wants. If host B attempts to send a packet to that mapping,
it will be ignored.
Address and Port Dependent Filtering Only the host and port for which the mapping was created can use that mapping. If you send a packet to host
A:5000 it can respond back with as many packets as it wants. If host A:5001
attempts to send a packet to that mapping, it will be ignored.
Mapping Refresh
It is recommended that if a mapping is unused for 5 minutes it should be
destroyed. This is entirely up to the ISP or hardware manufacturer.
STUN
STUN (Session Traversal Utilities for NAT) is a protocol that was created just for
working with NATs. This is another technology that pre-dates WebRTC (and
ICE!). It is defined by RFC 5389 which also defines the STUN packet structure.
The STUN protocol is also used by ICE/TURN.
STUN is useful because it allows the programmatic creation of NAT Mappings.
Before STUN, we were able to create NAT Mappings, but we had no idea what
the IP/Port of it was! STUN not only gives you the ability to create a mapping,
but you also get the details so you can share it with others so they can send
traffic to you via the mapping you created.
Let’s start with a basic description of STUN. Later, we will expand on TURN
and ICE usage. For now, we are just going to describe the Request/Response
flow to create a mapping. Then we talk about how we get the details of it to
share with others. This is the process that happens when you have a stun:
server in your ICE URLs for a WebRTC PeerConnection.
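In the browser, that is just configuration. A sketch using Google's well-known public STUN server (substitute your own for production use):

```typescript
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
});
```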
Protocol Structure
Every STUN packet has the following structure:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0| STUN Message Type | Message Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Magic Cookie |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Transaction ID (96 bits) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
STUN Message Type Each STUN packet has a type. For now, we only care
about the following:
• Binding Request - 0x0001
• Binding Response - 0x0101
To create a NAT Mapping we make a Binding Request. Then the server
responds with a Binding Response.
Message Length This is how long the Data section is. This section contains arbitrary data that is defined by the Message Type.
Magic Cookie The fixed value 0x2112A442 in network byte order. It helps distinguish STUN traffic from other protocols.
Data Data will contain a list of STUN attributes. A STUN Attribute has the
following structure.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type | Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Value (variable) ....
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The STUN Binding Request uses no attributes. This means a STUN Binding
Request contains only the header.
The STUN Binding Response uses a XOR-MAPPED-ADDRESS (0x0020). This at-
tribute contains an IP/Port. This is the IP/Port of the NAT Mapping that is
created!
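As a concrete illustration, here is a hedged sketch (not a full RFC 5389 implementation) that builds the 20-byte Binding Request header in Node.js/TypeScript:

```typescript
import { randomBytes } from "node:crypto";

// Build a STUN Binding Request: type 0x0001, zero-length body,
// the fixed Magic Cookie, and a random 96-bit Transaction ID.
function buildBindingRequest(): Buffer {
  const header = Buffer.alloc(20);
  header.writeUInt16BE(0x0001, 0); // STUN Message Type: Binding Request
  header.writeUInt16BE(0x0000, 2); // Message Length: no attributes
  header.writeUInt32BE(0x2112a442, 4); // Magic Cookie
  randomBytes(12).copy(header, 8); // Transaction ID (96 bits)
  return header;
}
```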
Determining NAT Type
Unfortunately, the Mapped Address might not be useful in all cases. If it is Address Dependent, only the STUN server can send traffic back to you. If you shared it and another peer tried to send messages in, they would be dropped. This makes it useless for communicating with others.
RFC5780 defines a method for running a test to determine your NAT Type. This
would be useful because you would know ahead of time if direct connectivity
was possible.
TURN
TURN (Traversal Using Relays around NAT), defined in RFC 5766, is the solution when direct connectivity isn’t possible. It could be because the two NAT Types are incompatible, or maybe the two hosts can’t speak the same protocol!
TURN is also important for privacy purposes. By running all your communication
through TURN you obscure the client’s actual address.
TURN uses a dedicated server. This server acts as a proxy for a client. The client connects to a TURN Server and creates an Allocation. By creating an Allocation, a client gets a temporary IP/Port/Protocol that others can send traffic to, and that traffic is relayed back to the client. This new listener is known as the Relayed Transport Address. Think of it as a forwarding address: you give this out so others can send you traffic via TURN! For each peer you give the Relayed Transport Address to, you must create a Permission to allow communication with you.
When you send outbound traffic via TURN it is sent via the Relayed Transport
Address. When a remote peer gets traffic they see it coming from the TURN
Server.
TURN Lifecycle
The following is everything that a client who wishes to create a TURN allocation
has to do. Communicating with someone who is using TURN requires no changes.
The other peer gets an IP/Port and they communicate with it like any other
host.
If the request succeeds, you get a response from the TURN Server with the following STUN attributes in the Data section:
• XOR-MAPPED-ADDRESS - Mapped Address of the TURN Client. When someone sends data to the Relayed Transport Address, this is where it is forwarded to.
• XOR-RELAYED-ADDRESS - This is the address that you give out to other clients. If someone sends a packet to this address, it is relayed to the TURN client.
• LIFETIME - How long until this TURN Allocation is destroyed. You can extend the lifetime by sending a Refresh request.
TURN Usage
TURN Usage exists in two forms. Usually, you have one peer acting as a ‘TURN
Client’ and the other side communicating directly. In some cases you might have
TURN Usage on both sides, for example because both clients are in networks
that block UDP and therefore the connection to the respective TURN servers
happens via TCP.
These diagrams help illustrate what that would look like.
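From the client's point of view, using TURN is just configuration. A sketch with a placeholder hostname and credentials (real deployments usually provision short-lived credentials per client):

```typescript
const pc = new RTCPeerConnection({
  iceServers: [
    {
      urls: "turn:turn.example.com:3478?transport=tcp", // placeholder host
      username: "demo-user", // placeholder
      credential: "demo-pass", // placeholder
    },
  ],
});
```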
ICE
ICE (Interactive Connectivity Establishment) is how WebRTC connects two
Agents. Defined in RFC8445, this is another technology that pre-dates WebRTC!
ICE is a protocol for establishing connectivity. It determines all the possible
routes between the two peers and then ensures you stay connected.
These routes are known as Candidate Pairs, which is a pairing of a local and
remote address. This is where STUN and TURN come into play with ICE. These
addresses can be your local IP Address, NAT Mapping, or Relayed Transport
Address. Each side gathers all the addresses they want to use, exchanges them,
and then attempts to connect!
Two ICE Agents communicate using the STUN Protocol. They send STUN
packets to each other to establish connectivity. After connectivity is established
they can send whatever they want. It will feel like using a normal socket.
Creating an ICE Agent
An ICE Agent is either Controlling or Controlled. The Controlling Agent
is the one that decides the selected Candidate Pair. Usually, the peer sending
the offer is the controlling side.
Each side must have a user fragment and password. These two values must
be exchanged before connectivity checks can even begin. The user fragment
is sent in plain text and is useful for demuxing multiple ICE Sessions. The
password is used to generate a MESSAGE-INTEGRITY attribute. At the end of
each STUN packet, there is an attribute that is a hash of the entire packet using
the password as a key. This is used to authenticate the packet and ensure it
hasn’t been tampered with.
For WebRTC, all these values are distributed via the Session Description as
described in the previous chapter.
Candidate Gathering
We now need to gather all the possible addresses we are reachable at. These
addresses are known as candidates. These candidates are also distributed via
the Session Description.
Host A Host candidate is listening directly on a local interface. This can either
be UDP or TCP.
Peer Reflexive A Peer Reflexive candidate is when you get an inbound request
from an address that isn’t known to you. Since ICE is an authenticated protocol
you know the traffic is valid. This just means the remote peer is communicating
with you from an address it didn’t know about.
This commonly happens when a Host Candidate communicates with a Server
Reflexive Candidate. A new NAT Mapping was created because you are com-
municating outside your subnet.
Connectivity Checks
We now know the remote agent’s user fragment, password, and candidates.
We can now attempt to connect! Every local candidate is paired with every remote candidate. So if you have 3 candidates on each side, you now have 9 candidate pairs.
Visually, it looks like this:

```mermaid
graph LR
  subgraph agentA ["ICE Agent A"]
    hostA{Host Candidate}
    serverreflexiveA{Server Reflexive Candidate}
    relayA{Relay Candidate}
  end
  subgraph agentB ["ICE Agent B"]
    hostB{Host Candidate}
    serverreflexiveB{Server Reflexive Candidate}
    relayB{Relay Candidate}
  end
  hostA --- hostB
  hostA --- serverreflexiveB
  hostA --- relayB
  serverreflexiveA --- hostB
  serverreflexiveA --- serverreflexiveB
  serverreflexiveA --- relayB
  relayA --- hostB
  relayA --- serverreflexiveB
  relayA --- relayB
```
Candidate Selection
The Controlling and Controlled Agents both start sending traffic on each pair. This is needed because, if one Agent is behind an Address Dependent Mapping, this traffic will cause a Peer Reflexive Candidate to be created.
Each Candidate Pair that saw network traffic is then promoted to a Valid
Candidate pair. The Controlling Agent then takes one Valid Candidate pair
and nominates it. This becomes the Nominated Pair. The Controlling and
Controlled Agent then attempt one more round of bi-directional communication.
If that succeeds, the Nominated Pair becomes the Selected Candidate Pair!
This is used for the rest of the session.
Restarts
If the Selected Candidate Pair stops working for any reason (the NAT Mapping expires, the TURN Server crashes) the ICE Agent will go to the Failed state. Both agents can be restarted and do the whole process over again.
Security 101
To understand the technology presented in this chapter you will need to under-
stand these terms first. Cryptography is a tricky subject, so it is worth consulting other sources as well!
Cipher A Cipher is a series of steps that takes plaintext to ciphertext. The cipher can then be reversed, so you can take your ciphertext back to plaintext.
A Cipher usually has a key to change its behavior. Another term for this is
encrypting and decrypting.
A simple cipher is ROT13. Each letter is moved 13 characters forward. To undo
the cipher you move 13 characters backward. The plaintext HELLO would become
the ciphertext URYYB. In this case, the Cipher is ROT, and the key is 13.
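A few lines of TypeScript make this concrete; rot below is a toy written for illustration, not a real cipher:

```typescript
// Shift each uppercase letter forward by `key` positions; applying the
// same key again (13 + 13 = 26) brings the text back around.
function rot(text: string, key: number): string {
  return text.replace(/[A-Z]/g, (c) => {
    const shifted = (c.charCodeAt(0) - 65 + key) % 26;
    return String.fromCharCode(65 + shifted);
  });
}

console.log(rot("HELLO", 13)); // "URYYB"
console.log(rot("URYYB", 13)); // "HELLO"
```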
Nonce A nonce is additional input to a cipher. This is so you can get different
output from the cipher, even if you are encrypting the same message multiple
times.
If you encrypt the same message 10 times, the cipher will give you the same ciphertext 10 times. By using a nonce you can get different ciphertext, while still using the same key. It is important that you use a different nonce for each message! Otherwise it negates much of the value.
DTLS
DTLS (Datagram Transport Layer Security) allows two peers to establish se-
cure communication with no pre-existing configuration. Even if someone is
eavesdropping on the conversation they will not be able to decrypt the messages.
For a DTLS Client and a Server to communicate, they need to agree on a cipher
and the key. They determine these values by doing a DTLS handshake. During
the handshake, the messages are in plaintext. When a DTLS Client/Server has
exchanged enough details to start encrypting it sends a Change Cipher Spec.
After this message each subsequent message is encrypted!
Packet Format
Every DTLS packet starts with a header
Epoch The epoch starts at 0, but becomes 1 after a Change Cipher Spec.
Any message with a non-zero epoch is encrypted.
```mermaid
sequenceDiagram
  participant C as Client
  participant S as Server
  C->>S: ClientHello
  Note over C,S: Flight 1
  S->>C: HelloVerifyRequest
  Note over C,S: Flight 2
  C->>S: ClientHello
  Note over C,S: Flight 3
  S->>C: ServerHello
  S->>C: Certificate
  S->>C: ServerKeyExchange
  S->>C: CertificateRequest
  S->>C: ServerHelloDone
  Note over C,S: Flight 4
  C->>S: Certificate
  C->>S: ClientKeyExchange
  C->>S: CertificateVerify
  C->>S: ChangeCipherSpec
  C->>S: Finished
  Note over C,S: Flight 5
  S->>C: ChangeCipherSpec
  S->>C: Finished
  Note over C,S: Flight 6
```
Certificate Certificate contains the certificate for the Client or Server. This is
used to uniquely identify who we were communicating with. After the handshake
is over, we will make sure this certificate, when hashed, matches the fingerprint in the Session Description.
Key Generation
After the Handshake is complete you can start sending encrypted data. The
Cipher was chosen by the server and is in the ServerHello. How was the key
chosen though?
First we generate the Pre-Master Secret. To obtain this value Diffie–Hellman is
used on the keys exchanged by the ServerKeyExchange and ClientKeyExchange.
The details differ depending on the chosen Cipher.
Next, the Master Secret is generated. Each version of DTLS has a defined Pseudorandom Function. For DTLS 1.2, the PRF takes the Pre-Master Secret and random values in the ClientHello and ServerHello. The output from running the Pseudorandom Function is the Master Secret, the value that is used for the Cipher.
Exchanging ApplicationData
The workhorse of DTLS is ApplicationData. Now that we have an initialized Cipher, we can start encrypting and sending values.
ApplicationData messages use the DTLS header as described before. The Payload is populated with ciphertext. You now have a working DTLS Session and can communicate securely.
DTLS has many more interesting features, like renegotiation. These aren’t used by WebRTC, so they are not covered here.
SRTP
SRTP is a protocol designed just for encrypting RTP packets. To start an SRTP session, you specify your keys and cipher. Unlike DTLS, it has no handshake mechanism; all the configuration/keys were set up during the DTLS handshake.
DTLS provides a dedicated API to export keys to be used by another process. This is defined in RFC 5705.
Session Creation
SRTP defines a Key Derivation Function that is used on the inputs. When creating an SRTP Session, the inputs are run through this to generate the keys for our SRTP Cipher. After this, you can move on to processing media.
Exchanging Media
Each RTP packet has a 16-bit Sequence Number. These Sequence Numbers are used to keep packets in order, like a Primary Key. During a call, these will roll over. SRTP keeps track of this and calls it the rollover counter.
When encrypting a packet, SRTP uses the rollover counter and sequence number as a nonce. This ensures that even if you send the same data twice, the ciphertext will be different. This is important so that an attacker can’t identify patterns or attempt a replay attack.
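A simplified sketch of that rollover bookkeeping; a real SRTP implementation must also handle reordering around the wrap point, which this ignores:

```typescript
class RolloverCounter {
  private roc = 0;
  private lastSeq: number | null = null;

  // Returns the extended sequence number used when forming the nonce.
  observe(seq: number): number {
    // Treat a large backwards jump as the 16-bit counter wrapping.
    if (this.lastSeq !== null && this.lastSeq - seq > 0x8000) {
      this.roc++;
    }
    this.lastSeq = seq;
    return this.roc * 0x10000 + seq;
  }
}
```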
Latency vs Quality
Real-time media is about making trade-offs between latency and quality. The
more latency you are willing to tolerate, the higher quality video you can expect.
Real World Limitations
These constraints are all caused by the limitations of the real world. These are
all characteristics of your network that you will need to overcome.
Jitter Jitter is the fact that transmission time may vary. Sometimes you will see packets arrive in bursts. Any piece of hardware along the network path can introduce issues.
Packet Loss Packet Loss is when messages are lost in transmission. The loss
could be steady, or it could come in spikes. This isn’t an uncommon occurrence
either!
Media 101
Codec
Frame Types
RTP
Packet Format
Every RTP packet has the following structure:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X| CC |M| PT | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Timestamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Synchronization Source (SSRC) identifier |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| Contributing Source (CSRC) identifiers |
| .... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Payload |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Padding (P) Padding is a bool that controls if the payload has padding.
The last byte of the payload contains a count of how many padding bytes were
added.
Extension (X) If set the RTP header will have extensions. This is described
in greater detail below.
CSRC count (CC) The number of CSRC identifiers that follow after the SSRC,
and before the payload.
Marker (M) The marker bit has no pre-set meaning and is up to the user. In some cases it is set when a user is speaking. It is also commonly used to mark a keyframe.
Payload Type (PT) Payload Type is the unique identifier for what codec
is being carried by this packet.
For WebRTC, the Payload Type is dynamic. VP8 in one call may be different than in another. The Offerer in the call determines the mapping of Payload Types
to codecs in the Session Description.
Timestamp The sampling instant for this packet. This is not a global clock,
but how much time has passed in the media stream.
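As a worked example, using the 90 kHz video clock from the rtpmap line shown earlier:

```typescript
const clockRate = 90_000; // from "a=rtpmap:96 VP8/90000"
const fps = 30;
const ticksPerFrame = clockRate / fps; // each frame advances the timestamp by 3000
```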
Contributing Source (CSRC) A list that communicates which SSRCes contributed to this packet.
This is commonly used for talking indicators. Let’s say that, server side, you combined multiple audio feeds into a single RTP stream. You could then use this field to say ‘Input streams A and C were talking at this moment’.
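To make the header layout concrete, here is a hedged sketch that extracts the fixed fields described above from a received packet (ignoring extensions and CSRCs):

```typescript
function parseRtpHeader(packet: Uint8Array) {
  const view = new DataView(packet.buffer, packet.byteOffset);
  return {
    version: packet[0] >> 6, // V, always 2
    padding: (packet[0] & 0x20) !== 0, // P
    extension: (packet[0] & 0x10) !== 0, // X
    csrcCount: packet[0] & 0x0f, // CC
    marker: (packet[1] & 0x80) !== 0, // M
    payloadType: packet[1] & 0x7f, // PT
    sequenceNumber: view.getUint16(2),
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
  };
}
```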
Extensions
Mapping Payload Types to Codecs
RTCP
Packet Format
Every RTCP packet has the following structure:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P| RC | PT | length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Payload |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Padding (P) Padding is a bool that controls if the payload has padding.
The last byte of the payload contains a count of how many padding bytes were
added.
Packet Type (PT) Unique Identifier for what type of RTCP Packet this is.
A WebRTC Agent doesn’t need to support all these types, and support between
Agents can be different. These are the ones you may commonly see though.
• Full INTRA-frame Request (FIR) - 192
• Negative ACKnowledgements (NACK) - 193
• Sender Report - 200
• Receiver Report - 201
• Generic RTP Feedback - 205
The significance of these packet types will be described in greater detail below.
Full INTRA-frame Request
This RTCP message notifies the sender that it needs to send a full image. This is
for when the encoder is giving you partial frames, but you aren’t able to decode
them.
This could happen because you had lots of packet loss, or maybe the decoder
crashed.
Negative ACKnowledgements
A NACK requests that a sender re-transmits a single RTP Packet. This is usually
caused when a RTP Packet is lost, but could also happen because it is late.
NACKs are much more bandwidth efficient than requesting that the whole frame be sent again. Since RTP breaks media up into very small packets, you are really just requesting one small missing piece.
Sender/Receiver Reports
These reports are used to send statistics between agents. This communicates
the amount of packets actually received and jitter.
The reports can be used for general diagnostics or for basic Congestion Control.
Negative Acknowledgment
Also known as a NACK. This is one method of dealing with packet loss with
RTP.
A NACK is a RTCP message sent back to a sender to request re-transmission.
The receiver crafts a RTCP message with the SSRC and Sequence Number. If
the sender does not have this RTP packet available to re-send it just ignores the
message.
Congestion Control
Congestion Control is the act of adjusting the media depending on the attributes
of the network. If you don’t have a lot of bandwidth, you need to send lower-quality video.
Congestion Control is all about making trade-offs.
JitterBuffer
Question: How is the latency compared to that of TCP?
Short Answer: Even though WebRTC’s SCTP lives in “user space”, performance is comparable to that of TCP. Data channels support a “partial-reliability” option which can reduce the latency caused by data retransmissions over a lossy connection.
Related sections for more details: Partial Reliability
Question: What technologies are used in data channels?
Short Answer: WebRTC (primarily) uses UDP. On top of UDP there’s SCTP, then DCEP, which controls the establishment of data channels.
Related sections for more details: Protocol Stack
Question: How do I know if I am sending too much?
Short Answer: Calls to the send method on a data channel always return immediately. That does not mean the data was sent on the wire; it is first written to a buffer. When you call send() faster than your network can process, the buffered size will grow. RTCDataChannel.bufferedAmount is your friend.
Related sections for more details: Flow Control API
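A sketch of that backpressure pattern; HIGH_WATER_MARK is an arbitrary threshold made up for illustration:

```typescript
const HIGH_WATER_MARK = 1024 * 1024; // 1 MiB

function sendWithBackpressure(dc: RTCDataChannel, chunks: Uint8Array[]) {
  dc.bufferedAmountLowThreshold = HIGH_WATER_MARK / 2;

  const pump = () => {
    while (chunks.length > 0 && dc.bufferedAmount < HIGH_WATER_MARK) {
      dc.send(chunks.shift()!);
    }
    if (chunks.length > 0) {
      // Wait for the buffer to drain before queueing more.
      dc.addEventListener("bufferedamountlow", pump, { once: true });
    }
  };
  pump();
}
```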
Question: Do messages arrive in the order they were sent?
Short Answer: By default, yes. Optionally, you can disable ordering to receive messages as they arrive.
Related sections for more details: Data Channel API, SCTP
Question: When do you use the unordered delivery option?
Short Answer: When newer information obsoletes the old, such as the positional information of an object, or when each message is independent from the others and needs to avoid head-of-line blocking delay.
Related sections for more details: Data Channel Options, SCTP
Question: Can we send audio or video via data channel?
Short Answer: You can send any data over a data channel. In the browser case, it is your responsibility to decode the data and pass it to a media player for rendering; with media channels this is done automatically.
Related sections for more details: Audio and Video Communication
Functional Overview
A data channel can deliver any type of data. If you wish, you can send audio or video data over the data channel too, but if you need to play the media back in real-time, media channels (see the Media Communication chapter), which use the RTP/RTCP protocols, are the better option.
SCTP Protocol Layer This is the heart of the data channel. What it does
includes:
• Channel multiplexing (In SCTP, channels are called “streams”)
• Reliable delivery with TCP-like retransmission mechanism
• Partial-reliability options
• Congestion Avoidance
• Flow Control
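These SCTP features surface in the browser API as data channel options. A sketch of an unordered channel with no retransmissions, trading reliability for latency (the “game-state” label is just an example):

```typescript
const pc = new RTCPeerConnection();
const dc = pc.createDataChannel("game-state", {
  ordered: false, // allow out-of-order delivery
  maxRetransmits: 0, // partial reliability: drop rather than retransmit
});

dc.onmessage = (event) => {
  console.log("got message:", event.data);
};
```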
Data Channel API
Connection / Teardown
Data Channel Options
Flow Control API
Keep-alive mechanism
Congestion avoidance
Selective ACK
Fast retransmission/recovery
Partial Reliability
DCEP
Open/ACK handshake
PPID
Parameter exchange
Reference to RFCs
Applied WebRTC
Now that you know how WebRTC works it is time to build with it. This chapter
explores what people are building with WebRTC, and how they are building
it. You will learn all the interesting things that are happening with WebRTC.
The power of WebRTC comes at a cost: building production-grade WebRTC services is challenging. This chapter will try to explain those challenges before you hit them.
By Use Case
The technologies behind WebRTC aren’t just for video chatting – since WebRTC
is a generic real-time framework, applications are limitless. Some of the most
common categories include:
Conferencing
Conferencing was the use case that WebRTC was originally designed for. You
can have two users directly connect to each other. Once they are connected they
can share their webcams, or maybe their desktop. Participants can send and
receive as many streams as they want. They can also add and remove those
streams at any time.
Going beyond just media, DataChannels are very useful for building a conferencing experience. Users can send metadata or share documents. You can create multiple streams and have multiple conversations going at once.
Conferencing becomes more difficult as more users join the call. How you scale
is entirely up to you. The WebRTC topologies section covers this further.
Broadcasting
WebRTC can also be used to broadcast video streams one-to-many.
Remote Control
File-Transfer
Distributed CDN
IoT
Protocol Bridging
WebRTC Topologies
Regardless of whether you use WebRTC for voice, video or DataChannel capa-
bilities, everything starts with a PeerConnection. How peers connect to one
another is a key design consideration in any WebRTC application, and there are
many established approaches.
Client-Server
The low-latency nature of the WebRTC protocol is great for calls, and it’s common to see conferences arranged in a p2p mesh configuration (for low latency), or peering through an SFU (Selective Forwarding Unit) to improve call quality. Since codec support varies by browser, many conferencing servers allow browsers to broadcast using proprietary or non-free codecs like H264, and then re-encode to an open standard like VP8 at the server level; when the SFU performs an encoding task beyond just forwarding packets, it is called an MCU (Multi-point Conferencing Unit). While SFUs are notoriously fast and efficient and great for conferences, MCUs can be very resource intensive! Some conferencing servers even perform heavy tasks like compositing (combining together) A/V streams, customized for each caller, to minimize client bandwidth use by sending only a single stream of all the other callers.
Peer-To-Peer
One-To-One
P2P Mesh
Debugging
Debugging WebRTC can be a daunting task. There are a lot of moving parts,
and they all can break independently. If you aren’t careful you can lose weeks
of time looking at the wrong things. When you do finally find the part that is
broken you will need to learn a bit to understand it.
This chapter will get you in the mindset to debug WebRTC. It will show you how to break down the problem. Once we know the problem, we will give a quick tour of the popular debugging tools.
Signaling Failure
Networking Failure
Security Failure
Media Failure
Data Failure
Tools of the trade
tcpdump
wireshark
webrtc-internals
History
This section is a work in progress and we don’t have all the facts yet. We are conducting interviews and building a history of digital communication.
RTP
RTP and RTCP are the protocols that handle all media transport for WebRTC. They were defined in RFC 1889 in January 1996. We are very lucky to have one of the authors, Ron Frederick, talk about it himself. Ron recently uploaded the Network Video tool, a project that informed RTP.
In his own words:
In October of 1992, I began to experiment with the Sun VideoPix frame grabber
card, with the idea of writing a network videoconferencing tool based upon IP
multicast. It was modeled after “vat” – an audioconferencing tool developed
at LBL, in that it used a similar lightweight session protocol for users joining
into conferences, where you simply sent data to a particular multicast group and
watched that group for any traffic from other group members.
In order for the program to really be successful, it needed to compress the video
data before putting it out on the network. My goal was to make an acceptable
looking stream of data that would fit in about 128 kbps, or the bandwidth
available on a standard home ISDN line. I also hoped to produce something
that was still watchable that fit in half this bandwidth. This meant I needed
approximately a factor of 20 in compression for the particular image size and
frame rate I was working with. I was able to achieve this compression and filed
for a patent on the techniques I used, later granted as patent US5485212A:
Software video compression for teleconferencing.
In early November of 1992, I released the videoconferencing tool “nv” (in binary
form) to the Internet community. After some initial testing, it was used to
videocast parts of the November Internet Engineering Task Force all around the
world. Approximately 200 subnets in 15 countries were capable of receiving this
broadcast, and approximately 50-100 people received video using “nv” at some
point in the week.
Over the next couple of months, three other workshops and some smaller meet-
ings used “nv” to broadcast to the Internet at large, including the Australian
NetWorkshop, the MCNC Packet Audio and Video workshop, and the MultiG
workshop on distributed virtual realities in Sweden.
A source code release of “nv” followed in February of 1993, and in March I
released a version of the tool where I introduced a new wavelet-based compression
scheme. In May of 1993, I added support for color video.
The network protocol used for “nv” and other Internet conferencing tools became
the basis of the Realtime Transport Protocol (RTP), standardized through the
Internet Engineering Task Force (IETF), first published in RFCs 1889-1890 and
later revised in RFCs 3550-3551 along with various other RFCs that covered
profiles for carrying specific formats of audio and video.
Over the next couple of years, work continued on “nv”, porting the tool to a
number of additional hardware platforms and video capture devices. It continued
to be used as one of the primary tools for broadcasting conferences on the Internet
at the time, including being selected by NASA to broadcast live coverage of
shuttle missions online.
In 1994, I added support in “nv” for supporting video compression algorithms
developed by others, including some hardware compression schemes such as the
CellB format supported by the SunVideo video capture card. This also allowed
“nv” to send video in CUSeeMe format, to send video to users running CUSeeMe
on Macs and PCs.
The last publicly released version of “nv” was version 3.3beta, released in July of
1994. I was working on a “4.0alpha” release that was intended to migrate “nv”
over to version 2 of the RTP protocol, but this work was never completed due
to my moving on to other projects. A copy of the 4.0 alpha code is included in
the Network Video tool archive for completeness, but it is unfinished and there
are known issues with it, particularly in the incomplete RTPv2 support.
The framework provided in “nv” later went on to become the basis of video
conferencing in the “Jupiter multi-media MOO” project at Xerox PARC, which
eventually became the basis for a spin-off company “PlaceWare”, later acquired
by Microsoft. It was also used as the basis for a number of hardware video
conferencing projects that allowed sending of full NTSC broadcast quality video
over high-bandwidth Ethernet and ATM networks. I also later used some of this
code as the basis for “Mediastore”, which was a network-based video recording
and playback service.
Do you remember the motivations/ideas of the other people on the
draft?
We were all researchers working on IP multicast, and helping to create the
Internet multicast backbone (aka MBONE). The MBONE was created by Steve
Deering (who first developed IP multicast), Van Jacobson, and Steve Casner.
Steve Deering and I had the same advisor at Stanford, and Steve ended up
going to work at Xerox PARC when he left Stanford, I spent a summer at Xerox
PARC as an intern working on IP multicast-related projects and continued to
work for them part time while at Stanford and later full time. Van Jacobson
and Steve Casner were two of the four authors on the initial RTP RFCs, along
with Henning Schulzrinne and myself. We all had MBONE tools that we were
working on that allowed for various forms of online collaboration, and trying to
come up with a common base protocol all these tools could use was what led to
RTP.
Multicast is super fascinating. WebRTC is entirely unicast, mind
expanding on that?
Before getting to Stanford and learning about IP multicast, I had a long history
working on ways to use computers as a way for people to communicate with one
another. This started in the early 80s for me where I ran a dial-up bulletin board
system where people could log on and leave messages for one another, both
private (sort of the equivalent of e-mail) and public (discussion groups). Around
the same time, I also learned about the online service provider CompuServe.
One of the cool features on CompuServe was something called a “CB Simulator”
where people could talk to one another in real-time. It was all text-based, but it
had a notion of “channels” like a real CB radio, and multiple people could see
what others typed, as long as they were in the same channel. I built my own
version of CB which ran on a timesharing system I had access to which let users
on that system send messages to one another in real-time, and over the next few
years I worked with friends to develop more sophisticated versions of real-time
communication tools on several different computer systems and networks. In
fact, one of those systems is still operational, and I use it to talk every day to folks I went to college with 30+ years ago!
All of those tools were text based, since computers at the time generally didn’t
have any audio/video capabilities, but when I got to Stanford and learned about
IP multicast, I was intrigued by the notion of using multicast to get something
more like a true “radio” where you could send a signal out onto the network
that wasn’t directed at anyone in particular, but everyone who tuned to that
“channel” could receive it. As it happened, the computer I was porting the
IP multicast code to was the first generation SPARCstation from Sun, and it
actually had built-in telephone-quality audio hardware! You could digitize sound
from a microphone and play it back over built-in speakers (or via a headphone
output). So, my first thought was to figure out how to send that audio out onto
the network in real-time using IP multicast, and see if I could build a “CB radio”
equivalent with actual audio instead of text.
There were some tricky things to work out, like the fact that the computer could
only play one audio stream at a time, so if multiple people were talking you
needed to mathematically “mix” multiple audio streams into one before you
could play it, but that could all be done in software once you understood how
the audio sampling worked. That audio application led me to working on the
MBONE and eventually moving from audio to video with “nv”.
Anything that got left out of the protocol that you wish you had
added? Anything in the protocol you regret?
I wouldn’t say I regret it, but one of the big complaints people ended up having
about RTP was the complexity of implementing RTCP, the control protocol
that ran in parallel with the main RTP data traffic. I think that complexity
was a large part of why RTP wasn’t more widely adopted, particularly in the
unicast case where there wasn’t as much need for some of RTCP’s features. As
network bandwidth became less scarce and congestion wasn’t as big a problem,
a lot of people just ended up streaming audio & video over plain TCP (and later
HTTP), and generally speaking it worked “well enough” that it wasn’t worth
dealing with RTP.
Unfortunately, using TCP or HTTP meant that multi-party audio and video
applications had to send the same data over the network multiple times, to
each of the peers that needed to receive it, making it much less efficient from
a bandwidth perspective. I sometimes wish we had pushed harder to get IP
multicast adopted beyond just the research community. I think we could have
seen the transition from cable and broadcast television to Internet-based audio
and video much sooner if we had.
What things did you imagine being built with RTP? Do you have any
cool RTP projects/ideas that got lost to time?
One of the fun things I built was a version of the classic “Spacewar” game which
used IP multicast. Without having any kind of central server, multiple clients
could each run the spacewar binary and start broadcasting their ship’s location,
velocity, the direction it was facing, and similar information for any “bullets”
it had fired, and all of the other instances would pick up that information and
render it locally, allowing users to all see each other’s ships and bullets, with
ships “exploding” if they crashed into each other or bullets hit them. I even
made the “debris” from the explosion a live object that could take out other
ships, sometimes leading to fun chain reactions!
In the spirit of the original game, I rendered it using simulated vector graphics, so
you could do things like zooming your view in & out and everything would scale
up/down. The ships themselves were a bunch of line segments in vector form that some of my colleagues at PARC helped me design, so everyone’s ship had a unique look to it.
Basically, anything that could benefit from a real-time data stream that didn’t
need perfect in-order delivery could benefit from RTP. So, in addition to audio
& video we could build things like a shared whiteboard. Even file transfers could
benefit from RTP, especially in conjunction with IP multicast.
Imagine something like BitTorrent but where you didn’t need all the data going
point-to-point between peers. The original seeder could send a multicast stream
to all of the leeches at once, and any packet losses along the way could be quickly
cleaned up by a retransmission from any peer that successfully received the data.
You could even scope your retransmission requests so that some peer nearby
delivered the copy of the data, and that too could be multicast to others in that
region, since a packet loss in the middle of the network would tend to mean a
bunch of clients downstream of that point all missed the same data.
Why did you have to roll your own video compression? Was nothing else available at the time?
At the time I began to build “nv”, the only systems I know of that did video-
conferencing were very expensive specialized hardware. For instance, Steve
Casner had access to a system from BBN that was called “DVC” (and later
commercialized as “PictureWindow”). The compression required specialized
hardware but the decompression could be done in software. What made “nv”
somewhat unique was that both compression and decompression was being done
in software, with the only hardware requirement being something to digitize an
incoming analog video signal.
Many of the basic concepts about how to compress video existed by then, with
things like the MPEG-1 standard appearing right around the same time “nv”
did, but real-time encoding with MPEG-1 was definitely NOT possible at the
time. The changes I made were all about taking those basic concepts and
approximating them with much cheaper algorithms, where I avoided things like
cosine transforms and floating point, and even avoided integer multiplications
since those were very slow on SPARCstations. I tried to do everything I could
with just additions/subtractions and bit masking and shifting, and that got back
enough speed to still feel somewhat like video.
Within a year or two of the release of “nv”, there were many different audio
and video tools to choose from, not only on the MBONE but in other places
like the CU-SeeMe tool built on the Mac. So, it was clearly an idea whose time
had come. I actually ended up making “nv” interoperate with many of these
tools, and in a few cases other tools picked up my “nv” codecs so they could
interoperate when using my compression scheme.
SDP
ICE
SRTP
SCTP
DTLS
FAQ
Contributing