Kundan Singh: Communication

Showing posts with label Communication. Show all posts

Named stream abstraction for WebRTC notification with P2P media

This is a continuation of my earlier article on WebRTC notification system and signaling paradigms which presents, among others, a named stream abstraction for WebRTC signaling. That article also mentioned my light weight notification server notify.py and sample client code webrtc.html as part of my open source rtclite project.

In this article I present another light weight notification server streams.py and sample client code streams.html as part of that project. This enables named stream abstraction described in my earlier article. The named stream abstraction is closer to the traditional Flash Player's NetStream based publish-subscribe media, and facilitates wide range of application use cases.

Writing coherent dialogs using Twilio

This article shows the complexity of writing a multistep interactive voice dialog for cloud telephony. It presents our dialog package to create such a dialog as a single and simple coherent script in Tcl. The script performs programmed interactive voice or messaging interaction using the popular cloud telephony platform, Twilio.

It should also work, after trivial modifications, with other similar systems such as Restcomm or Somleng that use TwiML or similar markups. The package can further be extended to support other popular dialog languages such as VoiceXML. The core idea can be re-implemented using other scripting languages such as Python.

A journey through real time...

I did my first project on real-time communications in 1997. In the past two decades I did numerous projects in this general area. In particular, these were related to voice over IP (VoIP), multimedia and web communication. As 2016 comes to an end, I reminisce my 20-year journey. I attempt to capture the gist of my projects. How my project themes and ideas evolved over time?

What are endpoint driven communication systems?

A lot of my work in the past decade has focussed on endpoint driven systems, e.g., peer-to-peer Internet telephony (P2P-SIP) [1,2] for inherent scalability/robustness, Rich Internet Applications for web video conferencing [3,4], and more recently, resource-based software architecture [5,6]. In this article, I emphasize the importance of such systems, and differentiate them from other system architectures in the context of real-time communication.

Impress.js 3D presentations on cloud, communication, WebRTC and mobile

I am really impressed by impress.js presentation framework based on CSS3. It is pretty low level, requires knowledge of HTML and CSS, and is small enough that it can stay out of your way. Others have created very impressive 3D presentations. I also did my share of 3D presentations in 2015, as various conference paper presentations, while at Avaya Labs.

click here to see all my presentations

Below, I list my presentations and associated demo videos largely based on impress.js framework. When viewing a presentation, use an HTML5 capable browser such as Chrome or Safari, and please wait for it to loads completely, and the browser's loading icon to stop spinning, before doing the slide show.

Introducing rtclite

We have attempted to unify our diverse Python based projects related to SIP, RTP, XMPP, RTMP, REST, etc., into a single theme of real-time communication (RTC). In particular, we have migrated the relevant source code from other projects to the new "rtclite" project after cleanup and refactoring. Moving forward, any new functionality related to Python implementations of real-time communication protocols and applications will be done in this new project.

Over the next few weeks, we will continue to evolve this project, migrate or fix issues/comments, and deprecate the previous older projects.

I will also write blog posts here to introduce various specific modules and how they are used in real world in the new few weeks.

From the project page at http://www.rtclite.com (currently forwards to https://github.com/theintencity/rtclite)

"Light-weight implementations of real-time communication protocols and application in Python

This project aims to create an open source repository of light weight implementations of real-time communication (RTC) protocols and applications. In a nutshell, it contains reference implementations, prototypes and sample applications, mostly written in Python, of various RTC standards defined in the IETF, W3C and elsewhere, e.g., RFC 3261 (SIP), RFC 3550 (RTP), RFC 3920 (XMPP), RTMP, etc."

We encourage all users of my open source projects to visit this new project, and if possible migrate your applications to use this new project.

Irony in Internet Communications

We live in a world where

first, we build Internet Protocol over phone network (modem), and then we build a phone system over Internet Protocol (VoIP),
first, we build a phone system to run on Internet (VoIP), and then drive the internet from our phones (smart phones),
first, we build open communication on top of monopolies (Internet over PSTN), and then build closed communication on top of that to promote monopolies (Facetime, Skype, or you-name-it-here).

Wouldn't it be easier if we start afresh?

NetConnection or PeerConnection?

Flash Player and the corresponding ActionScript programming language have NetConnection/NetStream programming APIs for developing communicating web applications. Emerging WebRTC has PeerConnection/MediaStream APIs to enable real-time multimedia communication in the browser. This article compares and contrasts these two approaches to designing application programming interfaces (API). I have attempted to compare the API-styles rather than the specific implementations in Flash Player or browsers.

Background

In Flash Player + ActionScript, four categories of objects (or classes) are needed to build a communicating web application such as a video chat. First, you need a connection to the server which may be a media server processing the media path in a client server approach or just a rendezvous server for establishing a peer-to-peer media path in the peer-to-peer approach. Second, you create named streams, for publishing and for playing. Third, you attach the publish side stream with your capture devices for audio and/or video. And fourth, you attach the play side stream with display or playback of remote media. The corresponding classes for these are: NetConnection, NetStream, Camera and/or Microphone, and Video or VideoDisplay [1] (pp.16-18). The diagram below shows a two party video call between our regular actors, Alice and Bob, each creating a named stream to publish, and each playing the other party's named stream to display. On each terminal, a NetConnection object is used to create two named NetStream objects, one is attached to Camera and Microphone and another to a Video or VideoDisplay element. Additionally, for displaying local preview, the Camera object is also attached directly to a Video or VideoDisplay.

In WebRTC capable browser + Javascript, again, four categories of objects (or classes) are needed. First, a LocalMediaStream created via the getUserMedia function to represent the capture stream from the user's camera and microphone devices. Second, an PeerConnection (actually called RTCPeerConnection) object is created to establish a peer-to-peer path between the two browser instances. Third, the LocalMediaStream object is attached (or added) to the PeerConnection on one end, and appears as remote MediaStream object at the other end. Fourth, this remote MediaStream is attached to an audio or video element to playback or display the received media. The LocalMediaStream is also attached to another video element to display local preview. Unlike publish vs. play mode of NetStream, I put the LocalMediaStream vs. remote MediaStream into separate categories, due to the way they are created and what they represent, even though LocalMediaStream is also a MediaStream.

For comparison purpose, let us call these Flash vs WebRTC set of APIs as NetConnection vs PeerConnection.

Level of abstraction: named stream vs. offer-answer

Clearly NetConnection presents a higher level of abstraction than the PeerConnection. In PeerConnection, each media path is explicitly created via a new RTCPeerConnection object on each side. It uses offer-answer model - where one side sends a session offer with its capabilities and the other side responds back with an answer with its side's capabilities. The offer-answer messages are exchanged between the two browsers outside the WebRTC APIs. In case of NetConnection, when a participant publishes a named stream (MediaStream), zero or more other participants may play that named stream. The implementation of these connection and stream classes hide the details of the peer-to-peer or client-server media path establishment. Internally, some variation of offer-answer and transport candidate messages are exchanged between the Flash Player instances, but are not exposed to the programming interface - thus making it a higher level of abstraction.

Topology creation: client-server vs. peer-to-peer

A NetConnection is a client-server abstraction where as a PeerConnection is a peer-to-peer path abstraction. A NetConnection object can be used to create client-server as well as peer-to-peer media paths, whereas PeerConnection is targeted only towards a peer-to-peer media path. To further clarify - a NetConnection is between a client and a server, which may be using RTMP in which case the server is a media server for the client-server media path, or may be RTMFP in which case the server may just be a rendezvous server to establish peer-to-peer MediaStreams between the two browsers. Note that with RTMFP it is also possible to do client-server media path. Additionally, it also allows application level multicast for use cases such as large event broadcast. With PeerConnection, the media path is intended to be peer-to-peer, but it is also possible to implement PeerConnection at a server (outside the browser), in which case it gets transparently used to create a client-server media path because the server behaves as another browser endpoint of the WebRTC media path. With WebRTC, once MediaStream proxying is implemented in the browser, it may be possible, though not trivial, to create application level multicast for large group event broadcast. Proxying a media stream means that a received remote MediaStream from one peer may be attached (or attached) to another PeerConnection to another peer, thus proxying the media path via this peer.

Signaling channel: built-in vs. external

As explained in the previous points, NetConnection is the signaling channel to establish the client-server or peer-to-peer MediaStream objects. Due to the built-in nature of the client-server signaling channel, it is non-trivial for third-party vendors to build such a signaling server without implementing the full RTMP and/or RTMFP protocol. On the other hand, a PeerConnection generates the offer-answer and other signaling messages, which must be exchanged via another channel between the two browser instances, e.g., via a notification service over WebSocket, by using Ajax, or over other third-party channels. Such notification service does not need to understand the details of the WebRTC protocol. This gives more flexibility and freedom to the developers on how the signaling channel for a particular communicating web application is implemented.

Server dependent vs. independent

As explained in the previous points, a NetConnection is between a client and a server, thus dependent on the server. Additionally, any server side media processing such as mixing or gateway'ing to VoIP can only be done by extending or modifying this server. If the server is vendor dependent, which seems to be the case for RTMFP, until recently, then it makes these media processing and gateway'ing features also vendor dependent. On the other hand, a PeerConnection does not care about the server side. But an implementation of a web application that uses this object does. Hence, in practice, these types of servers may be needed - notification server for exchanging offer-answer and other messages, media server for doing things like server side recording or mixing, media gateway for translating to VoIP or other networks, and media relay for gathering reflexive or relayed transport candidates and for actually relaying media path across network middle boxes.

Summary and further reading

In summary, the higher level NetConnection-style interface with built-in signaling channel makes it easier for programmers to create applications, ranging from two-party video calls to multi-party conferences to large scale event broadcasts. On the other hand, greater flexibility (and freedom) of the PeerConnection-style interface allows building wide range of system architectures using third-party components such as notification servers, media servers, media gateways and media relays.

In my past research on flash-videoio [2] and aRtisy [3], I have proposed another higher level abstraction for the programming APIs for communicating web applications. In fact [3] describes how to implement the NetConnection-style interface on top of WebRTC's PeerConnection-style interface and using an external resource server to provide the notification service. It extends the named stream model to named resource model to implement several other applications such as text chat or shared whiteboard or notepad. Then, [2] and [3] describe how to build a widget-oriented interface which consists of a single video-io widget used in either publish or play mode to create a wide range of communicating web applications such as two-party calls, multi-party conferences and video presence for Flash and WebRTC, respectively.

My WebRTC related papers from 2013

This post gives an overview of my past and ongoing research on web-based multimedia communication. I wrote four published co-authored research papers last year answering these questions - How does WebRTC interoperate with existing SIP/VoIP systems and how does SIP in JavaScript compare with other approaches? What challenges lie ahead of enterprises in adopting WebRTC and what can they do? How can developers take advantage of WebRTC and HTML5 in creating rich internet applications in the cloud? How does emerging HTML5 technology allow better enterprise collaboration using rich application mash-ups? Read on if one or more of these questions interest you...

A Case for SIP in JavaScript

WebRTC (Web Real Time Communications) promises easy interaction with others using just your web browser on a website. SIP (Session Initiation Protocol) is the backbone of Internet communications interconnecting many devices, servers and systems you use daily for your call. Connecting WebRTC with SIP/RTP raises more questions than what we already know, e.g., does interoperability really work? what changes do you need in the existing devices and systems? does the conversion happen at the server or in the browser? what are the trade-offs of doing the translation? what are the risks if you go full throttle with WebRTC in the browser? and many more. There is a need for a fair comparison to evaluate not only what is theoretically possible but also practically achievable with existing systems.

"This paper presents the challenges and compares the alternatives to interoperate between the Session Initiation Protocol (SIP)-based systems and the emerging standards for the Web Real-Time Communication (WebRTC). We argue for an end-point and web-focused architecture, and present both sides of the SIP in JavaScript approach. Until WebRTC has ubiquitous cross-browser availability, we suggest a fall back strategy for web developers — detect and use HTML5 if available, otherwise fall back to a browser plugin."

read >>

Taking on WebRTC in an enterprise

On one hand WebRTC enables the end-users to communicate with others directly from their browsers, on the other it opens ~~a can of worms~~ a door to many novel communicating applications that scares your enterprise IT guy. Existing enterprise policies and tools are designed to work with existing communication systems such as emails and VoIP, and employ bunch of edge devices such as firewalls, reverse proxies, email filters and SBCs (session border controllers) to protect the enterprise data and interaction from the outside world. Unfortunately, session-less WebRTC media flows and, more importantly, end-to-end encrypted data channels, pose the same threat to enterprise security that peer-to-peer applications did. The first reaction from your IT guy will likely be to block all such peer-to-peer flows in the firewall. There is a need to systematically understand the risk and propose potential solutions so that the pearls of WebRTC can co-exist with the chains-and-locks of enterprise security.

"WebRTC will have a major impact on enterprise communications, as well as consumer communications and their interactions with enterprises. This article illustrates and discusses a number of issues that are specific to WebRTC enterprise usage. Some of these relate to security: firewall traversal, access control, and peer-to-peer data flows. Others relate to compliance: recording, logging, and enforcing enterprise policies. Additional enterprise considerations relate to integration and interoperation with existing communication infrastructure and session-centric telephony systems."

read>>

Building communicating web applications leveraging endpoints and cloud resource service

One problem with the growth of SIP was that there were not many applications that demonstrated its capabilities beyond replacing a telephone call. This resulted in telephony mindset dominating the evolution in both standards and deployments. At the core of it, WebRTC essentially does most of what SIP aims for - peer-to-peer media flows and control in the end-points. Unfortunately, telephony influence in existing SIP devices, servers and providers have hidden that benefit from the end-user. There is a need to avoid such dominating telephony influence on WebRTC by demonstrating how easy it is to build communicating web applications, without relying on existing (and expensive) VoIP infrastructure.

Secondly, the open web is threatened by closed social websites that tend to lock the user data in their ecosystem, and force the user to only use what they build, go where they exist and talk to who they allow. This developer focused research is an attempt to go against them, by showing a rich internet application architecture focused on application logic in the end-point and application mash-ups at the data-level. It shows my 15+ crucial web widgets written in few hundred lines of code each, covering wide rand of use cases from video click-to-call to automatic-call-distribution, call-queue and shared white-boards, and my web applications such as multiparty video collaboration, instant messaging/communicator, video presence and personal social wall, all written using HTML5 without depending on "the others" (imagine tune from lost played here) - the others are the existing SIP, XMPP or Flash Player based systems.

"We describe a resource-based architecture to quickly and easily build communicating web applications. Resources are structured and hierarchical data stored in the server but accessed by the endpoint via the application logic running in the browser. The architecture enables deployments that are fully cloud based, fully on-premise or hybrid of the two. Unlike a single web application controlling the user's social data, this model allows any application to access the authenticated user's resources promoting application mash-ups. For example, user contacts are created by one application but used by another based on the permission from the user instead of the first application."

read>>

Private overlay of enterprise social data and interactions in the public web context

Continuing the previous research theme, this research focuses on enterprise collaboration and enterprise social interactions. The goal is to define a few architectural principles to facilitate rich user interactions, both real-time and offline in the form of digital trail, in an enterprise environment. In particular, my proof-of-concept project described in the paper shows how to do web annotations, virtual presence, co-browsing, click-to-call from corporate directory, and internal context sensitive personal wall. The basic idea is to separate the application logic from the user data, keep the user data protected within the enterprise network, use public web as a context for storing the user interactions and build a generic web application framework running the browser.

"We describe our project, living-content, that creates a private overlay of enterprise social data and interactions on public websites via a browser extension and HTML5. It enables starting collaboration from and storing private interactions in the context of web pages. It addresses issues such as lack of adoption, privacy concerns and fragmented collaboration seen in enterprise social networks. In a data-centric approach, we show application scenarios for virtual presence, web annotations, interactions and mash-ups such as showing a user's presence on linked-in pages or embedding a social wall in corporate directory without help from those websites. The project enables new group interactions not found in existing social networks."

read>>

Internet Video Communication: Past, Present and Future

I gave a presentation last month titled Hello 1 2 3, can you ~~hear~~ see me now? highlighting my point of view on the origins of Internet video communication technologies we see today. The full text of the presentation can be found at Internet video communication: past, present and future.

Modern video communication systems have roots in several technologies: transporting video over phone lines, using multicast on Internet2's Mbone, adding video to voice-over-IP (VoIP), and adding interactivity in existing streaming applications. Although the Internet telephony and multimedia communication protocols have matured over the last fifteen years, they are largely being used for interconnectivity among closed networks of telecom services. Recently, the world wide web has evolved as a popular platform for everything we do on the Internet including email, text chat, voice calls, discussions, enterprise applications and multi-party collaboration. Unfortunately, there is a disconnect between the web and traditional Internet telephony protocols as they have ignored the constraints and requirements of each other. Consequently, Adobe's Flash Player is being used as a web browser plugin by many developers for voice and video calls over the web.

Learning from the mistakes of the past and knowing where we stand at present will help us build the Internet video communication systems of the future. I present my point of view on the evolution, challenges and mistakes of the past, and, moving forward, describe the challenges in bridging the gap between web and VoIP. I highlight my contributions at various stages in the journey of Internet audio/video communication protocols.