Flash Player and the corresponding ActionScript programming language have the NetConnection/NetStream programming APIs for developing communicating web applications. The emerging WebRTC has the PeerConnection/MediaStream APIs to enable real-time multimedia communication in the browser. This article compares and contrasts these two approaches to designing application programming interfaces (APIs). I have attempted to compare the API styles rather than the specific implementations in Flash Player or browsers.
Background
In Flash Player + ActionScript, four categories of objects (or classes) are needed to build a communicating web application such as a video chat. First, you need a connection to the server, which may be a media server processing the media path in the client-server approach, or just a rendezvous server for establishing a peer-to-peer media path in the peer-to-peer approach. Second, you create named streams, one for publishing and one for playing. Third, you attach the publish-side stream to your capture devices for audio and/or video. And fourth, you attach the play-side stream to a display element for playback of the remote media. The corresponding classes are NetConnection, NetStream, Camera and/or Microphone, and Video or VideoDisplay [1] (pp. 16-18). The diagram below shows a two-party video call between our regular actors, Alice and Bob, each creating a named stream to publish, and each playing the other party's named stream for display. On each terminal, a NetConnection object is used to create two named NetStream objects: one is attached to the Camera and Microphone, and another to a Video or VideoDisplay element. Additionally, for displaying the local preview, the Camera object is also attached directly to a Video or VideoDisplay.
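To make these steps concrete, here is a minimal ActionScript sketch of Alice's terminal, written as a frame script for brevity; the server URL "rtmp://server/chat" and the stream names "alice" and "bob" are hypothetical, and error handling and device-permission checks are omitted.

```actionscript
import flash.net.NetConnection;
import flash.net.NetStream;
import flash.media.Camera;
import flash.media.Microphone;
import flash.media.Video;
import flash.events.NetStatusEvent;

// First: connect to the media or rendezvous server (hypothetical URL).
var nc:NetConnection = new NetConnection();
nc.addEventListener(NetStatusEvent.NET_STATUS, function(e:NetStatusEvent):void {
    if (e.info.code != "NetConnection.Connect.Success") return;

    // Second and third: create the publish-side named stream and attach
    // the capture devices to it.
    var pub:NetStream = new NetStream(nc);
    pub.attachCamera(Camera.getCamera());
    pub.attachAudio(Microphone.getMicrophone());
    pub.publish("alice");            // Alice publishes her named stream

    // Fourth: create the play-side named stream and attach it to a Video
    // object to display Bob's media.
    var play:NetStream = new NetStream(nc);
    var remoteView:Video = new Video();
    remoteView.attachNetStream(play);
    addChild(remoteView);
    play.play("bob");                // ...and plays Bob's named stream

    // Local preview: attach the Camera directly to another Video object.
    var localView:Video = new Video();
    localView.attachCamera(Camera.getCamera());
    addChild(localView);
});
nc.connect("rtmp://server/chat");
```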
In a WebRTC-capable browser + Javascript, again, four categories of objects (or classes) are needed. First, a LocalMediaStream is created via the getUserMedia function to represent the capture stream from the user's camera and microphone devices. Second, a PeerConnection (actually called RTCPeerConnection) object is created to establish a peer-to-peer path between the two browser instances. Third, the LocalMediaStream object is attached (or added) to the PeerConnection on one end, and appears as a remote MediaStream object at the other end. Fourth, this remote MediaStream is attached to an audio or video element to play back or display the received media. The LocalMediaStream is also attached to another video element to display the local preview. Unlike the publish vs. play modes of NetStream, I put the LocalMediaStream vs. remote MediaStream into separate categories, due to the way they are created and what they represent, even though a LocalMediaStream is also a MediaStream.
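The corresponding steps in Javascript may look roughly like the sketch below, assuming <video id="local"> and <video id="remote"> elements on the page; the exact names have evolved across WebRTC drafts (the older addStream/onaddstream versus the newer addTrack/ontrack), so this is a sketch rather than a definitive listing.

```javascript
// Second: create the peer connection (offer-answer exchange handled elsewhere).
const pc = new RTCPeerConnection();

// First: capture the local stream from the camera and microphone devices.
navigator.mediaDevices.getUserMedia({ audio: true, video: true })
  .then((localStream) => {
    // Local preview: attach the captured stream to a video element.
    document.getElementById('local').srcObject = localStream;

    // Third: add the local stream (its tracks) to the peer connection.
    localStream.getTracks().forEach((track) => pc.addTrack(track, localStream));
  });

// Fourth: the remote MediaStream appears at this end; attach it to another
// video element to display the received media.
pc.ontrack = (event) => {
  document.getElementById('remote').srcObject = event.streams[0];
};
```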
For comparison purposes, let us refer to the Flash and WebRTC sets of APIs as NetConnection and PeerConnection, respectively.
Level of abstraction: named stream vs. offer-answer
Clearly, NetConnection presents a higher level of abstraction than PeerConnection. With PeerConnection, each media path is explicitly created via a new RTCPeerConnection object on each side. It uses the offer-answer model, where one side sends a session offer with its capabilities and the other side responds with an answer carrying its own capabilities. The offer-answer messages are exchanged between the two browsers outside the WebRTC APIs. In the case of NetConnection, when a participant publishes a named stream (NetStream), zero or more other participants may play that named stream. The implementation of these connection and stream classes hides the details of the peer-to-peer or client-server media path establishment. Internally, some variation of offer-answer and transport candidate messages is exchanged between the Flash Player instances, but it is not exposed at the programming interface, thus making it a higher level of abstraction.
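A rough Javascript sketch of the offer-answer exchange is shown below; the signaling object with send() and onmessage is hypothetical and stands for whatever out-of-band channel the application uses, and each browser runs this code with its own RTCPeerConnection.

```javascript
const pc = new RTCPeerConnection();

// Caller: create an offer describing local capabilities and send it out
// over the external signaling channel.
async function call() {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send({ type: 'offer', sdp: pc.localDescription });
}

// Callee: apply the received offer and respond with an answer describing
// its own capabilities; the caller applies the answer when it arrives.
signaling.onmessage = async (msg) => {
  if (msg.type === 'offer') {
    await pc.setRemoteDescription(msg.sdp);
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    signaling.send({ type: 'answer', sdp: pc.localDescription });
  } else if (msg.type === 'answer') {
    await pc.setRemoteDescription(msg.sdp);
  }
};
```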
Topology creation: client-server vs. peer-to-peer
A NetConnection is a client-server abstraction, whereas a PeerConnection is a peer-to-peer path abstraction. A NetConnection object can be used to create client-server as well as peer-to-peer media paths, whereas PeerConnection is targeted only at a peer-to-peer media path. To further clarify: a NetConnection is between a client and a server, which may be using RTMP, in which case the server is a media server for the client-server media path, or RTMFP, in which case the server may just be a rendezvous server for establishing peer-to-peer media streams between the two Flash Player instances. Note that with RTMFP it is also possible to do a client-server media path. Additionally, it also allows application-level multicast for use cases such as large event broadcasts. With PeerConnection, the media path is intended to be peer-to-peer, but it is also possible to implement PeerConnection at a server (outside the browser), in which case it is transparently used to create a client-server media path, because the server behaves as another browser endpoint of the WebRTC media path. With WebRTC, once MediaStream proxying is implemented in the browser, it may be possible, though not trivial, to create application-level multicast for large group event broadcasts. Proxying a media stream means that a remote MediaStream received from one peer may be attached (or added) to another PeerConnection to another peer, thus proxying the media path via this peer.
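The proxying idea could look roughly like the following Javascript sketch, where pcFromAlice and pcToBob are hypothetical names for the relaying peer's two connections; whether browsers actually allow re-sending a received stream this way has varied across versions.

```javascript
// Forward each track received from Alice into the connection toward Bob,
// so the media path is relayed via this peer.
pcFromAlice.ontrack = (event) => {
  pcToBob.addTrack(event.track, event.streams[0]);
};
```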
Signaling channel: built-in vs. external
As explained in the previous points, NetConnection is the signaling channel used to establish the client-server or peer-to-peer media streams. Due to the built-in nature of this client-server signaling channel, it is non-trivial for third-party vendors to build such a signaling server without implementing the full RTMP and/or RTMFP protocol. On the other hand, a PeerConnection generates the offer-answer and other signaling messages, which must be exchanged via another channel between the two browser instances, e.g., via a notification service over WebSocket, using Ajax, or over other third-party channels. Such a notification service does not need to understand the details of the WebRTC protocols. This gives developers more flexibility and freedom in how the signaling channel for a particular communicating web application is implemented.
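For example, the external channel could be a plain WebSocket relay, as in the sketch below; the wss://example.com/notify URL is hypothetical, pc is the RTCPeerConnection created earlier, and the relay only forwards opaque JSON messages (here, trickled ICE candidates) without understanding WebRTC.

```javascript
const ws = new WebSocket('wss://example.com/notify');

// Trickle locally gathered transport (ICE) candidates to the peer as they
// are found, via the external notification service.
pc.onicecandidate = (event) => {
  if (event.candidate) {
    ws.send(JSON.stringify({ type: 'candidate', candidate: event.candidate }));
  }
};

// Apply candidates received from the peer; offers and answers would be
// relayed over the same socket in the same way.
ws.onmessage = async (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === 'candidate') {
    await pc.addIceCandidate(msg.candidate);
  }
};
```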
Server dependent vs. independent
As explained in the previous points, a NetConnection is between a client and a server, and is thus dependent on the server. Additionally, any server-side media processing, such as mixing or gateway'ing to VoIP, can only be done by extending or modifying this server. If the server is vendor dependent, which until recently seemed to be the case for RTMFP, then these media processing and gateway'ing features also become vendor dependent. On the other hand, a PeerConnection does not care about the server side, but an implementation of a web application that uses this object does. Hence, in practice, these types of servers may be needed: a notification server for exchanging offer-answer and other messages, a media server for doing things like server-side recording or mixing, a media gateway for translating to VoIP or other networks, and a media relay for gathering reflexive or relayed transport candidates and for actually relaying the media path across network middle boxes.
Summary and further reading
In summary, the higher-level NetConnection-style interface with a built-in signaling channel makes it easier for programmers to create applications, ranging from two-party video calls to multi-party conferences to large-scale event broadcasts. On the other hand, the greater flexibility (and freedom) of the PeerConnection-style interface allows building a wide range of system architectures using third-party components such as notification servers, media servers, media gateways and media relays.
In my past research on flash-videoio [2] and aRtisy [3], I have proposed another higher-level abstraction for the programming APIs of communicating web applications. In fact, [3] describes how to implement the NetConnection-style interface on top of WebRTC's PeerConnection-style interface, using an external resource server to provide the notification service. It extends the named stream model to a named resource model to implement several other applications such as text chat or a shared whiteboard or notepad. Then, [2] and [3] describe how to build a widget-oriented interface, which consists of a single video-io widget used in either publish or play mode to create a wide range of communicating web applications, such as two-party calls, multi-party conferences and video presence, for Flash and WebRTC, respectively.