Using WebRTC for real-time communication

Joel Oliveira

Jun 4 2021

Posted in Engineering & Technology

Building powerful peer-to-peer applications

If you are not familiar with WebRTC, we would like to introduce you to one of the most powerful sets of APIs available for real-time communication. It is built on top of an open standard and available as a JavaScript API. Although it has been around for some time, it only became widely available in all major browsers a couple of years ago. The WebRTC project is open-source and supported by companies like Apple, Google, Microsoft and Mozilla.

The technologies behind WebRTC allow applications to send video, audio and generic data between peers, enabling developers to create powerful applications, similar to Google Meet, Zoom or Microsoft Teams. In this post we will cover how you access media devices, discover and connect peers, stream video and audio and send arbitrary data.

If you are new to WebRTC, we strongly recommend that you take a look at the code examples available here.

At a high level, the WebRTC standard covers two different technologies: media capture and peer-to-peer connectivity. Basically, a WebRTC application captures a video or audio stream and relays it over a peer-to-peer connection. In the sections below, we will quickly cover these steps and hopefully demonstrate how you can build such applications.

Media Devices

Media devices include video cameras, microphones and screen capturing devices. For cameras and microphones, you will use navigator.mediaDevices.getUserMedia(). This is the first step towards a peer-to-peer connection. We start by requesting access to all media devices:

try {
    // Keep a reference to the resolved MediaStream so it can be used below
    const stream = await navigator.mediaDevices.getUserMedia({'video':true,'audio':true});
    console.log('MediaStream:', stream);
} catch(error) {
    console.error('Could not access user media:', error);
}

The first time you invoke this code, the browser prompts the user with a permission dialogue to use the camera and microphone. If the user accepts, the promise resolves with a MediaStream. If the user denies the request, the promise rejects with a NotAllowedError (called PermissionDeniedError in older browsers), and if no matching device is found, it rejects with a NotFoundError.
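As a minimal sketch, the rejection can be turned into a readable message by inspecting the name of the error. The function name describeMediaError and the message strings are illustrative assumptions, not part of any API. The error names are the ones browsers report for a denied permission and a missing device.

```javascript
// Hypothetical helper: maps a getUserMedia() rejection to a readable message.
function describeMediaError(error) {
    switch (error && error.name) {
        case 'NotAllowedError':
        case 'PermissionDeniedError': // legacy name used by older browsers
            return 'Permission to use the camera or microphone was denied.';
        case 'NotFoundError':
            return 'No camera or microphone was found.';
        default:
            return 'Could not access user media.';
    }
}

// Usage inside the catch block shown above:
// } catch (error) {
//     console.error(describeMediaError(error));
// }
```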

Once the user allows access to their camera and microphone, you can also query them. This is useful, for example, if you want to display a list of all available hardware:

const devices = await navigator.mediaDevices.enumerateDevices();

const videoCameras = devices.filter(device => device.kind === 'videoinput');
console.log('Cameras:', videoCameras);

const microphones = devices.filter(device => device.kind === 'audioinput');
console.log('Microphones:', microphones);

It is also possible that new devices are added at runtime. Most computers allow plug-and-play cameras, microphones, headsets and speakers, so you will also want to listen to these changes.

To do that, you will implement the following event listener:

navigator.mediaDevices.addEventListener('devicechange', event => {
    // Update list of devices
});
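One way to fill in that listener is to re-run enumerateDevices() and regroup the result. The sketch below assumes a hypothetical helper called groupDevicesByKind and a hypothetical renderDeviceList function in your own UI code; only the kind values (videoinput, audioinput, audiooutput) come from the browser API.

```javascript
// Hypothetical helper: groups MediaDeviceInfo-like objects by their `kind`.
function groupDevicesByKind(devices) {
    const groups = { videoinput: [], audioinput: [], audiooutput: [] };
    for (const device of devices) {
        // Ignore anything that is not one of the three known kinds.
        if (Array.isArray(groups[device.kind])) {
            groups[device.kind].push(device);
        }
    }
    return groups;
}

// Browser-only usage (renderDeviceList is an assumed function of your own):
// navigator.mediaDevices.addEventListener('devicechange', async () => {
//     const devices = await navigator.mediaDevices.enumerateDevices();
//     renderDeviceList(groupDevicesByKind(devices));
// });
```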

It is also important to understand MediaStreamConstraints. Constraints let us open devices that meet certain requirements. They can define which kind of device you want to open, target specific devices, enable echo cancellation and specify a minimum width and height for a camera input.

You use it by invoking the following:

await navigator.mediaDevices.getUserMedia({
    'audio': {'echoCancellation': true},
    'video': {
        // Minimum resolution, expressed as numbers of pixels
        'width': {'min': 1280},
        'height': {'min': 720}
    }
});
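To target a specific device, the constraints can also carry a deviceId taken from enumerateDevices(). The following is a sketch under assumptions: buildConstraints and its parameter names are made up for illustration, while deviceId, exact, width, height and echoCancellation are standard constraint names.

```javascript
// Hypothetical helper: builds a MediaStreamConstraints object.
function buildConstraints({ cameraId, minWidth = 1280, minHeight = 720 } = {}) {
    const video = {
        width: { min: minWidth },
        height: { min: minHeight }
    };
    if (cameraId) {
        // `exact` makes the request fail instead of falling back to another camera.
        video.deviceId = { exact: cameraId };
    }
    return { audio: { echoCancellation: true }, video };
}

// Browser-only usage, inside an async function:
// const stream = await navigator.mediaDevices.getUserMedia(
//     buildConstraints({ cameraId: videoCameras[0].deviceId })
// );
```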

Once you have access to a MediaStream, you can assign it to an HTML video element and play the stream locally:

try {
    // Keep a reference to the resolved MediaStream and attach it to the video element
    const stream = await navigator.mediaDevices.getUserMedia({'video':true,'audio':true});
    const videoElement = document.querySelector('video#myVideo');
    videoElement.srcObject = stream;
} catch(error) {
    console.error('Could not access user media:', error);
}

Peer Connections

This part of a WebRTC application is responsible for connecting users using a peer-to-peer protocol. The connection can carry video, audio or binary data. In order for peers to discover each other, they need to provide an ICE (Interactive Connectivity Establishment) server configuration. This can be either a STUN (Session Traversal Utilities for NAT) or a TURN (Traversal Using Relays around NAT) server, and they are used to provide ICE candidates to each client, which are then transferred to the remote peer. The transfer of these ICE candidates is part of what is commonly referred to as signaling.

There are a couple of solutions out there that you can use free of charge. For example, Google maintains a bunch of STUN servers at the following addresses:

stun:stun.l.google.com:19302
stun:stun1.l.google.com:19302
stun:stun2.l.google.com:19302
stun:stun3.l.google.com:19302
stun:stun4.l.google.com:19302

But in some cases, a direct socket between peers is not possible, and for those cases, you will want to work around this issue using a TURN server, which relays the traffic between peers. There are some free cloud services out there that you can use, like this one, or you can build one yourself, in your own infrastructure, using an open source project like this one.
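STUN and TURN servers can be combined in a single configuration. The sketch below assumes a hypothetical buildIceConfiguration helper, and the TURN URL, username and credential are placeholders rather than a real service. The urls, username and credential fields are the standard ones for an ICE server entry.

```javascript
// Hypothetical helper: builds an RTCConfiguration from STUN URLs and an optional TURN server.
function buildIceConfiguration(stunUrls, turn) {
    const iceServers = [{ urls: stunUrls }];
    if (turn) {
        // TURN servers normally require credentials.
        iceServers.push({ urls: turn.url, username: turn.username, credential: turn.credential });
    }
    return { iceServers };
}

const configuration = buildIceConfiguration(
    ['stun:stun.l.google.com:19302', 'stun:stun1.l.google.com:19302'],
    // Placeholder values: replace with your own TURN server and credentials.
    { url: 'turn:turn.example.com:3478', username: 'user', credential: 'secret' }
);

// Browser-only usage:
// const peerConnection = new RTCPeerConnection(configuration);
```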

Signaling

Although the WebRTC standard includes APIs for communication with ICE servers, signaling is not part of it. You need signaling so that all peers know how they should connect. It is usually solved by implementing a REST API or using a protocol like WebSockets. In this post we will not go into detail on how to do this, and will assume you have a pseudo signaling service as follows:

const signalingChannel = new MySignalingChannel("wss://mysignalingserver.com");
signalingChannel.addEventListener('message', message => {
    // Handle a message received from the remote peer
});

signalingChannel.send({"signal": "Hello There"});
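To make the pseudo service a little more concrete, here is one possible shape for such a channel, assuming JSON messages over a WebSocket. The class name JsonSignalingChannel is hypothetical, and the socket is passed in rather than created from a URL so the wrapper stays small. Listeners receive the parsed message object, which matches how message is used in the snippets in this post.

```javascript
// Hypothetical signaling channel: wraps a WebSocket-like object and exchanges JSON messages.
class JsonSignalingChannel {
    constructor(socket) {
        this.socket = socket;
        this.listeners = [];
        // Parse each incoming frame and hand the resulting object to every listener.
        this.socket.onmessage = event => {
            const message = JSON.parse(event.data);
            this.listeners.forEach(listener => listener(message));
        };
    }

    addEventListener(type, listener) {
        // Only 'message' is supported in this sketch.
        if (type === 'message') {
            this.listeners.push(listener);
        }
    }

    send(message) {
        this.socket.send(JSON.stringify(message));
    }
}

// Browser usage (the URL is the placeholder used in the example above):
// const signalingChannel = new JsonSignalingChannel(new WebSocket('wss://mysignalingserver.com'));
```

A real implementation would also wait for the socket to open before sending, handle reconnects, and rely on the server to route each message to the right peer.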

Start a Peer Connection

For each peer, you will need to create a peer connection. This is done using the RTCPeerConnection object. Its constructor takes a single parameter, the RTCConfiguration object, which should contain the information about the ICE servers. Depending on which peer we are (the caller or the receiver), we need to create an SDP (Session Description Protocol) offer or answer. These are then exchanged using the signaling server.

For example if you are the initiator, you will create a peer connection as follows:

const configuration = {'iceServers': [{'urls': 'stun:mystunserver.com'}]}
const peerConnection = new RTCPeerConnection(configuration);
signalingChannel.addEventListener('message', async message => {
  if (message.answer) {
    const remoteDesc = new RTCSessionDescription(message.answer);
    await peerConnection.setRemoteDescription(remoteDesc);
  }
});
const offer = await peerConnection.createOffer();
await peerConnection.setLocalDescription(offer);
signalingChannel.send({'offer': offer});

In much the same way, the receiving end must implement the following peer connection:

const configuration = {'iceServers': [{'urls': 'stun:mystunserver.com'}]}
const peerConnection = new RTCPeerConnection(configuration);
signalingChannel.addEventListener('message', async message => {
    if (message.offer) {
        await peerConnection.setRemoteDescription(new RTCSessionDescription(message.offer));
        const answer = await peerConnection.createAnswer();
        await peerConnection.setLocalDescription(answer);
        signalingChannel.send({'answer': answer});
    }
});

ICE Candidates

Before peers can communicate using WebRTC, they need to exchange information about their connectivity. As mentioned above, you will use an ICE server to achieve this. After a peer connection is created, WebRTC uses the ICE server configuration to gather information about the connectivity (candidates) of each peer.

The icegatheringstatechange event of the RTCPeerConnection object lets your application know the current state of the gathering (new, gathering or complete). Although it is possible to wait until ICE gathering is complete, it is common to use a technique called Trickle ICE. Basically, you transmit the ICE candidates to the remote peer as they are discovered, allowing a video session to start much faster.
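For comparison, the non-trickle approach can be sketched as a small helper that waits for the gathering state to become complete. The name onIceGatheringComplete is a hypothetical one chosen for this example. The iceGatheringState property and the icegatheringstatechange event belong to RTCPeerConnection.

```javascript
// Hypothetical helper: runs `callback` once ICE gathering has finished.
function onIceGatheringComplete(peerConnection, callback) {
    if (peerConnection.iceGatheringState === 'complete') {
        callback();
        return;
    }
    const listener = () => {
        if (peerConnection.iceGatheringState === 'complete') {
            // Stop listening so the callback runs only once.
            peerConnection.removeEventListener('icegatheringstatechange', listener);
            callback();
        }
    };
    peerConnection.addEventListener('icegatheringstatechange', listener);
}

// Browser-only usage, after setLocalDescription() has been called:
// onIceGatheringComplete(peerConnection, () => {
//     // localDescription now includes every gathered candidate
//     signalingChannel.send({'offer': peerConnection.localDescription});
// });
```

The price of this simplicity is latency: nothing is sent until every candidate has been gathered, which is the delay that Trickle ICE avoids.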

Using the RTCPeerConnection object, you can listen for new local candidates and send them to the remote peer using your signaling server:

peerConnection.addEventListener('icecandidate', event => {
    if (event.candidate) {
        signalingChannel.send({'iceCandidate': event.candidate});
    }
});

At the same time, you can use the signaling server to listen for candidates from the remote peer and add them to the connection:

signalingChannel.addEventListener('message', async message => {
    if (message.iceCandidate) {
        try {
            await peerConnection.addIceCandidate(message.iceCandidate);
        } catch (e) {
            console.error('Error: ', e);
        }
    }
});

Once the connection is established, the RTCPeerConnection object will trigger the following event:

peerConnection.addEventListener('connectionstatechange', event => {
    if (peerConnection.connectionState === 'connected') {
        // Peers connected!
    }
});
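The same event also reports the other states a connection can go through. As a rough sketch, an application might translate each state into an action. The function name nextActionForState and the action labels are assumptions made for this example, while the state names are the values of connectionState.

```javascript
// Hypothetical helper: decides what the application should do for each connection state.
function nextActionForState(connectionState) {
    switch (connectionState) {
        case 'connected':
            return 'start';    // peers are connected, media and data can flow
        case 'disconnected':
            return 'wait';     // connectivity was lost, it may recover on its own
        case 'failed':
            return 'restart';  // e.g. call peerConnection.restartIce() and renegotiate
        case 'closed':
            return 'cleanup';  // release streams and UI resources
        default:
            return 'none';     // 'new' or 'connecting'
    }
}

// Browser-only usage:
// peerConnection.addEventListener('connectionstatechange', () => {
//     console.log(nextActionForState(peerConnection.connectionState));
// });
```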

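At this point, the peers are connected and the signaling server is no longer needed unless the session has to be renegotiated. You can now stream data from any camera and microphone you retrieved earlier using navigator.mediaDevices.getUserMedia(). Tracks added after the connection is established trigger a renegotiation over the signaling channel, so they are typically added right after the peer connection is created, before the offer is sent. A stream usually consists of at least one media track, and each track is added to the connection individually as follows: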

const localStream = await navigator.mediaDevices.getUserMedia({'video':true,'audio':true});

...

const configuration = {'iceServers': [{'urls': 'stun:mystunserver.com'}]}
const peerConnection = new RTCPeerConnection(configuration);

...

localStream.getTracks().forEach(track => {
    peerConnection.addTrack(track, localStream);
});

Again, in much the same way, we listen for remote tracks on the connection and add them to the stream assigned to the HTML video element reserved for the remote peer:

// MediaStream is a constructor and must be called with `new`
const remoteStream = new MediaStream();
const remoteVideo = document.querySelector('video#aRemoteVideo');
remoteVideo.srcObject = remoteStream;

peerConnection.addEventListener('track', event => {
    // addTrack() on a MediaStream takes the track as its only argument
    remoteStream.addTrack(event.track);
});

Data Channels

The WebRTC standard also includes an API to send data over the RTCPeerConnection object. You can send any kind of data you want, from text to binary data, and use it to take your application to a whole different level. For example, you can use it to create a chat feature or even a multi-user web-based game.

You create a data channel by invoking the following:

const configuration = {'iceServers': [{'urls': 'stun:mystunserver.com'}]}
const peerConnection = new RTCPeerConnection(configuration);

// createDataChannel() requires a label; 'myDataChannel' is just an example name
const dataChannel = peerConnection.createDataChannel('myDataChannel');

And a remote peer can receive data channels by listening to the following event:

peerConnection.addEventListener('datachannel', event => {
    const dataChannel = event.channel;
});

But before you can start transmitting data over this channel, you must wait until it is ready. This is done by listening to the open and close events:

dataChannel.addEventListener('open', event => {
    // Allow data to be sent
});


dataChannel.addEventListener('close', event => {
    // Prevent data from being sent
});
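One way to act on these two events is to queue outgoing messages until the channel is open. The sketch below assumes a hypothetical createBufferedSender wrapper. It relies only on the readyState property, the open event and the send() method of the data channel.

```javascript
// Hypothetical wrapper: queues messages until the data channel reports 'open'.
function createBufferedSender(dataChannel) {
    const pending = [];
    dataChannel.addEventListener('open', () => {
        // Flush everything that was queued while the channel was still connecting.
        while (pending.length > 0) {
            dataChannel.send(pending.shift());
        }
    });
    return function send(data) {
        if (dataChannel.readyState === 'open') {
            dataChannel.send(data);
        } else {
            pending.push(data);
        }
    };
}

// Browser-only usage:
// const send = createBufferedSender(dataChannel);
// send('Hello World'); // delivered immediately, or as soon as the channel opens
```

This sketch never discards queued messages, so after a close event they simply stay in the queue. A real application would probably drop them or report an error.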

Once the data channel is open, you can start sending data. The send() method accepts strings and binary data, so objects need to be serialized first:

dataChannel.send(JSON.stringify({"data": "Hello World"}));

And receive data using:

dataChannel.addEventListener('message', event => {
    console.log(event.data);
});
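Since a data channel carries strings and binary data rather than JavaScript objects, a chat feature usually needs a small message format on top of it. The following is a sketch under assumptions: the function names encodeMessage and decodeMessage and the type and payload fields are made up for illustration.

```javascript
// Hypothetical message format: {type, payload} serialized as JSON.
function encodeMessage(type, payload) {
    return JSON.stringify({ type, payload });
}

function decodeMessage(data) {
    // Only strings are treated as JSON; binary data (ArrayBuffer, Blob) is passed through.
    if (typeof data !== 'string') {
        return { type: 'binary', payload: data };
    }
    try {
        const parsed = JSON.parse(data);
        if (parsed !== null && typeof parsed === 'object') {
            return parsed;
        }
    } catch (error) {
        // Not JSON: fall through and treat it as plain text.
    }
    return { type: 'text', payload: data };
}

// Browser-only usage:
// dataChannel.send(encodeMessage('chat', 'Hello World'));
// dataChannel.addEventListener('message', event => {
//     const message = decodeMessage(event.data);
//     console.log(message.type, message.payload);
// });
```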

Congratulations!

If you've made it this far, you are now ready to start building your own WebRTC application. Of course, there is much more to it than this basic introduction to WebRTC, but this is a good starting point. At Notificare, we are developing new products using this technology. A year ago, we even released a small PoC using this same technology. This project, called AirMink, allows you to start a 1-on-1 private conversation without installing any app, signing up for any service or paying any fees.

As always, we hope you liked this post and feel free to contact us if you have any questions, corrections or suggestions via our Support Channel.