
Getting started with WebRTC

This tutorial assumes knowledge of the following concepts/tools:

  • Network Address Translator (NAT)
  • JavaScript

What is WebRTC?

WebRTC (Web Real-Time Communication) is a collection of standards, protocols, and JavaScript APIs that enable peer-to-peer audio, video, and data sharing between browsers, without requiring any third-party plug-ins or libraries.
Delivering rich, high-quality, real-time content is a complex problem. Thankfully, the browser hides most of that complexity behind a few simple JavaScript APIs, making it possible to write a teleconferencing web application in a few dozen lines of JavaScript code. This tutorial will show you, step by step, how to create a sample WebRTC P2P video chat system.

What are the challenges in WebRTC?

P2P means the peers will directly communicate with each other. Doing this is not as straightforward as it seems. There are a few problems that we need to solve here:

  1. The other peer might not be listening for incoming traffic. We must notify the other peer of the intent to open a peer-to-peer connection, such that it knows to start listening for incoming packets.
  2. We need to capture, process, and optimize the media streams locally.
  3. The two peers must agree on the necessary information about the parameters of the different media and data streams—protocols, encodings used, and so on.
  4. We must identify potential routing paths for the peer-to-peer connection on both sides of the connection and relay this information between peers. Note that the peers might not be directly reachable from each other because they might be located behind their own NATs.

Do keep these problems in mind while reading the rest of the tutorial. We will solve them one by one.

Do not fret, though. The WebRTC client in your browser encapsulates most of the complex tasks into a few simple APIs. We only need to understand how they work and piece a few components together.

P2P chat app with WebRTC using Node.js and WebSocket

We will be building a P2P video/audio chat application using WebRTC. The app will allow two users to join and video chat with each other.

Setting up the project

This tutorial will use Node.js to build the application, so install it if it's not installed yet. Navigate to the project folder and execute the following two commands:

npm init
npm install websocket --save # we will use this later

For socket.io

npm init
npm install socket.io express --save #we will use this later

Signaling server

What we need is an out-of-band channel that allows the two peers to communicate before they have a direct channel for communication. After the session is established, the peers will then be able to communicate directly.

In theory, any channel (including shouting across the room) could work. This allows interoperability with a variety of other signaling protocols used in existing communication infrastructure (e.g., SIP, Jingle, ISUP). If you have a server that already tracks which users are logged in, then you can simply use that server as the signalling server. We can also use a dedicated server to relay messages between the peers. The server does not need to understand the content of the messages, but it needs to reside on the public internet where both peers can reach it.

In this tutorial, we'll implement a custom signalling server that uses WebSocket to relay information between the peers.

Create a new directory src containing a new file index.js. This file will contain the implementation for the server. Start by setting up a WebSocket server which listens on an HTTP port.

const http = require('http');
const server = require('websocket').server;

// Plain HTTP server; the WebSocket server below attaches to it.
const httpServer = http.createServer(() => {});
const port = 1337;
httpServer.listen(port, () => {
  console.log('Server listening at port ' + port);
});

const wsServer = new server({ httpServer });
let clients = [];

wsServer.on('request', request => {
  const connection = request.accept();
  const id = Math.floor(Math.random() * 100);
  clients.push({ connection, id });

  // Relay every message to all other clients, tagged with the sender's ID.
  // The field names `id` and `content` are the ones the client-side handler reads.
  connection.on('message', message => {
    clients
      .filter(client => client.id !== id)
      .forEach(client => client.connection.send(JSON.stringify({
        id: id,
        content: message.utf8Data,
      })));
  });

  // Tell the remaining clients that this client has disconnected.
  connection.on('close', () => {
    clients = clients.filter(client => client.id !== id);
    clients.forEach(client => client.connection.send(JSON.stringify({
      id: id,
      content: JSON.stringify({ message_type: 'disconnect', content: null })
    })));
  });
});

On each request to connect, the server creates a WebSocket, assigns a random ID to it and keeps it in an array. When this socket receives a message from the client, the server simply forwards it to all other connected clients together with the ID of the client who sent the message. When the socket is closed, the server informs all remaining clients of the closure.


For socket.io

Create a new directory src containing a new file index.js. This file will contain the implementation for the server. Start by setting up a socket.io server which listens on an HTTP port.

const express = require('express');
const http = require('http');

const app = express();
const httpServer = http.createServer(app);
const port = 1337;
httpServer.listen(port, () => {
  console.log('Server listening at port ' + port);
});

// Serve the client files from the public/ directory.
app.use(express.static('public'));
app.get('/', (req, res) => {
  res.sendFile(__dirname + '/index.html');
});

const io = require('socket.io')(httpServer);
let clients = [];

io.on('connection', socket => {
  const id = (Math.random() * 100).toString().replace('.', '');
  clients.push({ connection: socket, id: id });

  // The same relay handler is defined for each event used in this tutorial.
  ['videoOffer', 'videoAnswer', 'candidate', 'hangup'].forEach(eventName => {
    socket.on(eventName, message => {
      clients
        .filter(client => client.id !== id)
        .forEach(client => client.connection.emit(eventName, {
          id: id,
          content: message,
        }));
    });
  });

  // Tell the remaining clients that this client has disconnected.
  socket.on('disconnect', () => {
    clients = clients.filter(client => client.id !== id);
    clients.forEach(client => client.connection.emit('clientDisconnect', {
      id: id,
      content: null
    }));
  });
});

On each request to connect, the server creates a socket.io socket, assigns a random ID to it and keeps it in an array. When this socket receives a message from the client, the server simply forwards it to all other connected clients together with the ID of the client who sent the message. When the socket is closed, the server informs all remaining clients of the closure.


Note that in a more realistic example, the message from the client should include the other peer's ID and the server should only forward the message or send the closure notification to the intended peer.
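
As a rough illustration of that more realistic behaviour, the sketch below forwards a message only to the peer it names. It is a sketch under assumptions, not part of the server above: the `target` field on the message and the `routeMessage` helper are hypothetical, and the fake clients exist only so the snippet can run on its own. The `clients` array has the same `{ connection, id }` shape as before.

```javascript
// Hypothetical helper: forward a signalling message only to the peer named
// in its `target` field (an assumed field, not part of the tutorial's protocol).
const routeMessage = (clients, senderId, rawMessage) => {
  let parsed;
  try {
    parsed = JSON.parse(rawMessage);
  } catch (err) {
    return false; // not valid JSON, nothing to forward
  }
  if (!parsed || typeof parsed !== 'object') {
    return false; // valid JSON, but not a message object
  }
  const recipient = clients.find(
    client => client.id === parsed.target && client.id !== senderId
  );
  if (!recipient) {
    return false; // unknown target, or the sender addressed itself
  }
  recipient.connection.send(JSON.stringify({ id: senderId, content: rawMessage }));
  return true;
};

// Usage with stand-in connections that simply record what they are sent.
const makeFakeClient = id => {
  const sent = [];
  return { id, sent, connection: { send: data => sent.push(data) } };
};
const demoClients = [makeFakeClient(1), makeFakeClient(2), makeFakeClient(3)];
const delivered = routeMessage(
  demoClients,
  1,
  JSON.stringify({ target: 2, message_type: 'videoOffer', content: null })
);
console.log(delivered, demoClients[1].sent.length, demoClients[2].sent.length);
```

The server still does not need to understand the signalling payload; it only reads the one routing field and passes the rest through untouched.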

With our signalling server in place, our peers can use it to communicate with each other to exchange routing information and negotiate the session parameters. Note that the signalling server does not transfer the video, audio, or data of our application. It simply serves as a starting point for the peers to find a direct route and establish a session. Once these are done, the video, audio, and data are sent between the peers directly.

In essence, the signalling server solves the first problem we listed above.

Getting the media streams

The client side is slightly more complicated compared to the server side, since this is where most of the complexity of WebRTC lies. We'll build this part step by step and explain what we are doing at each step.

Since we're building a video chat application, we first have to obtain streams from the user's webcam and microphone. Thankfully, browsers provide the Media Capture and Streams API, which lets a JavaScript application capture, manipulate, and process video and audio streams from the underlying platform. All the audio and video processing, such as noise cancellation, equalization, image enhancement, and more, is automatically handled by the browser's audio and video engines.

Navigate to the public/ directory, where the client scripts will be located. Let's first create an HTML document for the client page. It's just a simple one, containing a start button and two video streams. We'll focus on the self-view in this section. You can safely ignore remote-view for now.

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>WebRTC Chat</title>
    <script src="index.js"></script>
  </head>
  <body>
    <h1 class="cover-heading">Demo Video Chat</h1>
    <div id="chat-room">
      <div id="videos">
        <video id="self-view" autoplay></video>
        <video id="remote-view" autoplay></video>
      </div>
      <button id="start">Start Video Chat</button>
    </div>
  </body>
</html>

In our index.js file, we will define a listener startChat() for the start button we created. For now, it displays the user's video stream (and also plays the user's audio).

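const startChat = async () => {
  try {
    // Get both video and audio from the system
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
    // Display the stream in `self-view`
    document.getElementById('self-view').srcObject = stream;
    showChatRoom();
  } catch (err) {
    console.error(err);
  }
};

const showChatRoom = () => {
  document.getElementById('start').style.display = 'none';
  // The stop button is only introduced in the "Hanging up" section, so it may not exist yet.
  const stopButton = document.getElementById('stop');
  if (stopButton) {
    stopButton.style.display = 'inline';
  }
  document.getElementById('chat-room').style.display = 'block';
};

// Attach the listener once the document has been parsed.
window.addEventListener('DOMContentLoaded', () => {
  document.getElementById('start').onclick = startChat;
});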

The browser might prompt you for permission to use your webcam and microphone. After you give permission, you should be able to see yourself on the left-hand side of the document.

Our problem 2 is also out of the way.

With the video stream, we can then discuss how to send it over WebRTC to our remote peer.

Creating a P2P Connection

Assuming the signalling server is in place, the peers can now use it to establish a P2P session.

Session Negotiation

The two peers first need to agree on the session parameters. This includes the types of media to be exchanged (audio, video, and application data), used codecs and their settings, bandwidth information, and other metadata. These data are collectively called the session description and should be encoded in the Session Description Protocol (SDP) format.
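
To make the session description a little less abstract, here is a minimal sketch that lists the media sections found in an SDP blob. The sample SDP is hand-written and heavily trimmed for illustration (a description generated by a browser is much longer), and `listMediaSections` is a hypothetical helper, not a WebRTC API.

```javascript
// Hand-written, trimmed-down SDP used only for illustration.
const sampleSdp = [
  'v=0',
  'o=- 4611731400430051336 2 IN IP4 127.0.0.1',
  's=-',
  't=0 0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=rtpmap:111 opus/48000/2',
  'm=video 9 UDP/TLS/RTP/SAVPF 96',
  'a=rtpmap:96 VP8/90000',
].join('\r\n');

// Hypothetical helper: collect each m= section and the codecs declared in it.
const listMediaSections = sdp => {
  const sections = [];
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith('m=')) {
      // m=<media> <port> <protocol> <payload types...>
      const [kind, port, protocol, ...formats] = line.slice(2).split(' ');
      sections.push({ kind, port: Number(port), protocol, formats, codecs: [] });
    } else if (line.startsWith('a=rtpmap:') && sections.length > 0) {
      // a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
      const codec = line.slice('a=rtpmap:'.length).split(' ')[1];
      sections[sections.length - 1].codecs.push(codec);
    }
  }
  return sections;
};

const mediaSections = listMediaSections(sampleSdp);
console.log(mediaSections.map(section => section.kind + ': ' + section.codecs.join(', ')));
```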

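Each peer will have a local description. To establish a connection, the peers need to exchange their respective descriptions. The session initiator will send an 'offer' containing its local description and the callee must send an 'answer' containing its own local description.
This happens in a symmetric manner: the initiator sets the offer as its local description and the callee sets it as its remote description; the callee then sets the answer as its local description and the initiator sets it as its remote description.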

The result of the offer/answer process is that both peers are aware of each other's descriptions. This solves our third problem listed above.

To start, we establish a WebSocket connection with the signalling server and create an RTCPeerConnection object. As we'll see, almost all the WebRTC functionalities are encapsulated in the RTCPeerConnection API.

var signalingChannel = new WebSocket("ws://127.0.0.1:1337");
var peerConnection = new RTCPeerConnection();

To help us send messages over to other peers, we define the following helper function:

const sendToServer = (message_type, content) => {
  signalingChannel.send(JSON.stringify({
    message_type: message_type,
    content: content
  }));
}

On the receiving side, we unwrap the message relayed by the server to obtain its sender ID, message type and content.

signalingChannel.onmessage = async (message) => {
  const wrappedData = JSON.parse(message.data);
  const id = wrappedData.id;
  const data = JSON.parse(wrappedData.content);
  const message_type = data.message_type;
  const content = data.content;
  
  //Message handlers here
}

When there is a need to start a session negotiation, the negotiationneeded event will be fired on the RTCPeerConnection. We attach an onnegotiationneeded event handler that sends a connection offer to the signalling server, which should be relayed to the remote peer.

peerConnection.onnegotiationneeded = async () => {
  console.log("Negotiation needed, send offer");
  await createAndSendOffer(signalingChannel, peerConnection);
};
const createAndSendOffer = async (signalingChannel, peerConnection) => {
  try {
    const offer = await peerConnection.createOffer();
    await peerConnection.setLocalDescription(offer);
    sendToServer("videoOffer", offer);
  } catch (err) {
    console.log(err);
  }
};

When the remote peer receives the offer through the signaling server, it will set its remote description, and send an answer containing its own session description.

//In message handler
if (message_type === 'videoOffer') {
  console.log("SDP Offer received from " + id);
  await peerConnection.setRemoteDescription(content);
  const answer = await peerConnection.createAnswer();
  await peerConnection.setLocalDescription(answer);
  sendToServer("videoAnswer", answer);
}

When the answer signal is received by the initiating peer, it will set its remote description to the received description.

//In message handler
if (message_type === 'videoAnswer') {
  console.log("SDP answer received from " + id);
  await peerConnection.setRemoteDescription(content);
}

The more careful readers might have noticed that a potential race condition can occur here. If two clients send offers to each other at the same time, the RTCPeerConnection object might try to establish two connections. To avoid this, there is a technique known as perfect negotiation. For a detailed discussion on this issue, see this Mozilla developer guide article.
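
The core of perfect negotiation is a small decision rule: one peer is designated "polite" and the other "impolite", and when two offers collide, only the impolite peer ignores the incoming offer. The sketch below isolates that rule as a pure function. It assumes the application itself tracks the `polite` role and a `makingOffer` flag (true while this peer is in the middle of creating and sending an offer); `checkCollision` is a hypothetical name, not a WebRTC API.

```javascript
// Hypothetical helper capturing the collision rule of perfect negotiation.
// `state.signalingState` would come from peerConnection.signalingState.
const checkCollision = (state, description) => {
  const offerCollision =
    description.type === 'offer' &&
    (state.makingOffer || state.signalingState !== 'stable');
  return {
    offerCollision,
    // Only the impolite peer drops a colliding offer; the polite peer
    // abandons its own offer and accepts the incoming one instead.
    ignoreOffer: !state.polite && offerCollision,
  };
};

// Both peers sent an offer at the same time:
const impoliteResult = checkCollision(
  { polite: false, makingOffer: true, signalingState: 'have-local-offer' },
  { type: 'offer' }
);
const politeResult = checkCollision(
  { polite: true, makingOffer: true, signalingState: 'have-local-offer' },
  { type: 'offer' }
);
console.log(impoliteResult.ignoreOffer, politeResult.ignoreOffer);
```

How the roles are assigned is up to the application; any rule works as long as the two peers end up with different roles.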


For socket.io

To start, we establish a socket.io connection with the signalling server and create an RTCPeerConnection object. As we'll see, almost all the WebRTC functionalities are encapsulated in the RTCPeerConnection API.

var signalingChannel = io.connect("http://127.0.0.1:1337");    
var peerConnection = new RTCPeerConnection();

When there is a need to start a session negotiation, the negotiationneeded event will be fired on the RTCPeerConnection. We attach an onnegotiationneeded event handler that sends a connection offer to the signalling server, which should be relayed to the remote peer.

peerConnection.onnegotiationneeded = async () => {
  console.log("Negotiation needed, send offer");
  await createAndSendOffer(signalingChannel, peerConnection);
};
const createAndSendOffer = async (signalingChannel, peerConnection) => {
  try {
    const offer = await peerConnection.createOffer();
    await peerConnection.setLocalDescription(offer);
    signalingChannel.emit("videoOffer", offer);
  } catch (err) {
    console.log(err);
  }
};

When the remote peer receives the offer through the signaling server, it will set its remote description, and send an answer containing its own session description.

signalingChannel.on('videoOffer', async (data) => {
  console.log("SDP Offer received from " + data.id);
  await peerConnection.setRemoteDescription(data.content);
  const answer = await peerConnection.createAnswer();
  await peerConnection.setLocalDescription(answer);
  signalingChannel.emit("videoAnswer", answer);
});

When the answer signal is received by the initiating peer, it will set its remote description to the received description.

signalingChannel.on('videoAnswer', async (data) => {
  console.log("SDP answer received from " + data.id);
  await peerConnection.setRemoteDescription(data.content);
});

The more careful readers might have noticed that a potential race condition can occur here. If two clients send offers to each other at the same time, the RTCPeerConnection object might try to establish two connections. To avoid this, there is a technique known as perfect negotiation. For a detailed discussion on this issue, see this Mozilla developer guide article.


Interactive Connectivity Establishment (ICE)

In order to establish a peer-to-peer connection, the peers must be able to route packets to each other. This sounds trivial, but is very hard to achieve in practice. There are a few distinct scenarios that can happen: none, one, or both of them can be located behind a NAT; they can be behind the same NAT, or distinct NATs; there can be numerous layers of NATs between them; worse still, they can be located behind address- and port-dependent NATs. So how can we find a route between the two peers?

The ICE protocol was designed to solve the problem of finding a route between the peers. In short, each peer generates a list of transport candidates and sends it to the other peer. There are three types of candidates:

  1. A local IP address and port
  2. A translated address and port on the public side of a NAT (the 'server-reflexive' address) discovered using a STUN server.
  3. A transport address allocated from a TURN server. A TURN server is one that relays all the data between the two peers.

Relaying through a TURN server is less than optimal, since the traffic no longer flows directly between the peers, so it is only used as a last resort when all other candidates fail.

Upon receiving the candidates from the other peer, each peer should perform connectivity checks to determine which candidate is viable and use that to establish the direct connection.

Thankfully, the RTCPeerConnection object has an ICE agent. All we need to do is to send the local candidates to the remote peer through the signalling channel, and receive the candidates sent by the remote peer. Similar to session description, the ICE candidates are also encoded in SDP format.

This solves our problem 4.
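
To see how the three candidate types show up in practice, the following sketch parses candidate strings of the kind found in the `candidate` field of an ICE candidate. The sample lines and their addresses are made up for illustration, and `parseCandidate` is a hypothetical helper; in the candidate syntax the three types listed above appear as `host`, `srflx` and `relay`.

```javascript
// Hypothetical helper: split a candidate line into its leading fields.
// Format: candidate:<foundation> <component> <transport> <priority>
//         <address> <port> typ <type> [raddr <address> rport <port>]
const parseCandidate = candidateLine => {
  const fields = candidateLine
    .replace(/^a=/, '')
    .replace(/^candidate:/, '')
    .split(' ');
  const [foundation, component, transport, priority, address, port, typ, type] = fields;
  if (typ !== 'typ') {
    throw new Error('Unexpected candidate format: ' + candidateLine);
  }
  return {
    foundation,
    component: Number(component),
    transport: transport.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type,
  };
};

// Made-up sample candidates, one of each type.
const sampleCandidates = [
  'candidate:1 1 udp 2122260223 192.168.1.5 54321 typ host',
  'candidate:2 1 udp 1686052607 203.0.113.7 46154 typ srflx raddr 192.168.1.5 rport 54321',
  'candidate:3 1 udp 41885439 198.51.100.20 61000 typ relay raddr 203.0.113.7 rport 46154',
];
const parsedCandidates = sampleCandidates.map(parseCandidate);
console.log(parsedCandidates.map(candidate => candidate.type + ' ' + candidate.address));
```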

For this demo, we will use a public STUN server. You can also include a TURN server if you have one available.
Change the definition of the peerConnection object to the following:

const configuration = {iceServers: [{ urls: 'stun:stun1.l.google.com:19302' }]};
const peerConnection = new RTCPeerConnection(configuration);

After the session description is set, the RTCPeerConnection object will start to gather ICE candidates. Each time it finds a candidate, it will emit an icecandidate event. We need to listen to this event and send the candidate to the other peer through the signalling server.

peerConnection.onicecandidate = (iceEvent) => {
  console.log("ICE candidate requested");
  if (iceEvent && iceEvent.candidate) {
    sendToServer("candidate", iceEvent.candidate);
  }
};

For socket.io

peerConnection.onicecandidate = (iceEvent) => {
  console.log("ICE candidate requested");
  if (iceEvent && iceEvent.candidate) {
    signalingChannel.emit("candidate", iceEvent.candidate);
  }
};

In response to receiving an ICE candidate, the remote client should add the received candidate to the RTCPeerConnection object, which will automatically start connectivity checks.

//In message handler
if (message_type === 'candidate') {
  console.log("ICE candidate received from " + id);
  await peerConnection.addIceCandidate(content);
}

For socket.io

signalingChannel.on('candidate', async (data) => {
  console.log("ICE candidate received from " + data.id);
  await peerConnection.addIceCandidate(data.content);
});

Sending and Receiving Streams

Now that we have exchanged session descriptions and ICE candidates, the RTCPeerConnection will establish a working P2P connection. But wait, we are not sending any media! However, most of the battle is already won. All we need to do is to pass our local stream to the RTCPeerConnection object and display the remote stream in our remote-view video tag.

Modify the startChat function to pass our stream to the RTCPeerConnection.

const startChat = async () => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
    document.getElementById('self-view').srcObject = stream;
    showChatRoom();
    // Add this line
    stream.getTracks().forEach(track => peerConnection.addTrack(track, stream));
  } catch (err) {
    console.error(err);
  }
};

Adding local tracks will fire a negotiationneeded event, which invokes the onnegotiationneeded event handler, which in turn sends our media description to the remote peer.

In turn, when we receive a remote description (either offer or answer), and it contains tracks, our RTCPeerConnection object will receive a track event.
We listen to this event to display the stream sent by our remote peer.

peerConnection.ontrack = (event) => {
  const video = document.getElementById('remote-view');
  if (!video.srcObject) {
    video.srcObject = event.streams[0];
  }
};

When the media is sent over WebRTC, it will be automatically optimized, encoded, and decoded by the WebRTC audio and video engines based on the session description agreed between the peers.

Flowcharts of Various Steps/States in WebRTC

Steps to Establish an RTCPeerConnection

  1. The negotiationneeded event is most commonly triggered when a send MediaStreamTrack is first added to the RTCPeerConnection while signalingState is stable. The actual steps to check whether negotiation is needed are quite complex.
  2. According to the specs (step 4.18), the track event is triggered after setting the remote description, if the description contains tracks.
    The Chromium implementation honours this. However, Pion (Go) seems to fire the ontrack event in SetLocalDescription too, as long as the description is an answer.

ICE Candidate Discovery

ICE candidate discovery is started once a description (local or remote) is set on the peer connection. It is fully automated by the internal ICE agent of the peer connection and runs asynchronously to the rest of the code. We only need to listen for the icecandidate event, which is fired when a new ICE candidate is found.

Changes in Signaling State

The state of the SDP negotiation is represented by the signaling state.
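The standard defines six states: stable, have-local-offer, have-remote-offer, have-local-pranswer, have-remote-pranswer, and closed. The two pranswer states are usually only employed by legacy hardware. In this demo, we only use stable, have-local-offer, and have-remote-offer (plus closed once the call ends).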
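
The signaling states visited during a plain offer/answer exchange can be sketched as a tiny state machine. This is a deliberate simplification: it covers only stable, have-local-offer and have-remote-offer, leaves out provisional answers, rollback and closing, and `nextSignalingState` is a hypothetical helper rather than anything the browser exposes.

```javascript
// Transitions of a plain offer/answer exchange (no provisional answers,
// no rollback). Keys are "<action>:<description type>".
const signalingTransitions = {
  'stable': {
    'setLocal:offer': 'have-local-offer',
    'setRemote:offer': 'have-remote-offer',
  },
  'have-local-offer': { 'setRemote:answer': 'stable' },
  'have-remote-offer': { 'setLocal:answer': 'stable' },
};

// Hypothetical helper: return the next state or throw on an invalid step.
const nextSignalingState = (state, action, type) => {
  const next = (signalingTransitions[state] || {})[`${action}:${type}`];
  if (!next) {
    throw new Error(`Invalid transition from ${state} via ${action}(${type})`);
  }
  return next;
};

// Caller: setLocalDescription(offer), then setRemoteDescription(answer).
let callerState = 'stable';
callerState = nextSignalingState(callerState, 'setLocal', 'offer');
const callerMidState = callerState;
callerState = nextSignalingState(callerState, 'setRemote', 'answer');

// Callee: setRemoteDescription(offer), then setLocalDescription(answer).
let calleeState = 'stable';
calleeState = nextSignalingState(calleeState, 'setRemote', 'offer');
const calleeMidState = calleeState;
calleeState = nextSignalingState(calleeState, 'setLocal', 'answer');

console.log(callerMidState, callerState, calleeMidState, calleeState);
```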

Changes in ICE Connection State


The W3C specification recommends attempting an ICE restart when the state changes to failed.

When the network gets disrupted, the ICE connection will enter the disconnected state, and the ICE agent will attempt to self-recover. If it fails to recover, the ICE connection will enter the failed state, in which an ICE restart has to be initiated. (See Network Interruption below).

An ICE restart is very similar to a fresh offer/answer exchange.

Hanging up

To end an RTCPeerConnection, we can call RTCPeerConnection.close(). It is recommended to set all event handlers of the RTCPeerConnection to null before doing so, so that these handlers will not trigger erroneously during the disconnection process. As an example, the following function will close the peer connection and reset the video elements.

const stopChat = () => {
  // Detach the handlers first so they cannot fire while the connection is closing.
  peerConnection.ontrack = null;
  peerConnection.onicecandidate = null;
  peerConnection.onnegotiationneeded = null;

  // Stop the local tracks and clear the self-view.
  const selfVideo = document.getElementById('self-view');
  if (selfVideo.srcObject) {
    selfVideo.pause();
    selfVideo.srcObject.getTracks().forEach(track => track.stop());
    selfVideo.srcObject = null;
  }

  // Stop the remote tracks and clear the remote-view.
  const remoteVideo = document.getElementById('remote-view');
  if (remoteVideo.srcObject) {
    remoteVideo.pause();
    remoteVideo.srcObject.getTracks().forEach(track => track.stop());
    remoteVideo.srcObject = null;
  }

  // A closed RTCPeerConnection cannot be reused; create a new one to start another call.
  peerConnection.close();
};

Of course, it is important to inform the other peer to hang up as well. This can be done through the signalling server. When a peer hangs up the call, a "hangup" message is sent to the signaling server, which forwards it to the other peer. When this message is received, the peer connection is closed.

For the initiating side (assume there is a button with ID stop for hanging up),

document.getElementById('stop').onclick = () => {
  sendToServer('hangup', null);
  stopChat();
};

For the receiving side,

//In message handler
if (message_type === 'hangup') {
  console.log("Client " + id + " disconnected");
  stopChat();
}

For socket.io

document.getElementById('stop').onclick = () => {
  signalingChannel.emit('hangup');
  stopChat();
};

For the receiving side,

signalingChannel.on('hangup', (data) => {
  console.log("Client " + data.id + " disconnected");
  stopChat();
});

Additional logging

There are some other events you might want to listen to in order to understand the inner workings of the WebRTC process. For example, to log each new state whenever it changes, we can use the following code:

peerConnection.onicegatheringstatechange = (event) => {
  console.log('ICE gathering state changed to ' + peerConnection.iceGatheringState);
}

peerConnection.oniceconnectionstatechange = (event) => {
  console.log('ICE connection state changed to ' + peerConnection.iceConnectionState);
}

peerConnection.onsignalingstatechange = (event) => {
  console.log('Signaling state changed to ' + peerConnection.signalingState);
}

peerConnection.onconnectionstatechange = (event) => {
  console.log('Connection state changed to ' + peerConnection.connectionState);
}

Network Interruption

WebRTC is also resilient to network interruptions and changes. RTCPeerConnection can self-recover from a temporary network interruption without renegotiation. If a previously exchanged ICE candidate is still valid, e.g. in the case of a temporary disconnection, the client only needs to re-establish a connection with the signaling server.

However, if a peer's network interface changes, e.g. switching from Wi-Fi to 4G, RTCPeerConnection can no longer self-recover since the candidates exchanged previously are no longer valid. If this happens, we can initiate an "ICE restart" to renegotiate the session. The following code snippet illustrates how to do so (only works in Chrome):

peerConnection.oniceconnectionstatechange = (event) => {
  console.log('ICE connection state changed to ' + peerConnection.iceConnectionState);
  if (peerConnection.iceConnectionState === 'failed') {
    peerConnection.restartIce();
  }
};

Adapting to bandwidth fluctuation

The RTP Control Protocol (RTCP) tracks the number of sent and lost bytes and packets, last received sequence number, inter-arrival jitter for each RTP packet, and other RTP statistics. Then, periodically, both peers exchange this data and use it to adjust the sending rate, encoding quality, and other parameters of each stream.
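
As an illustration of the kind of statistics involved, the sketch below computes two of the values carried in RTCP reports: the running inter-arrival jitter estimate, using the formula from RFC 3550, and the fraction of packets lost. The helper names and the sample numbers are made up for this example, and a browser's actual rate control is considerably more involved than this.

```javascript
// Running inter-arrival jitter estimate as defined in RFC 3550:
//   D = (R_j - R_i) - (S_j - S_i)
//   J = J + (|D| - J) / 16
// `arrival` and `rtpTimestamp` must be expressed in the same units.
const updateJitter = (previousJitter, previousPacket, currentPacket) => {
  const d =
    (currentPacket.arrival - previousPacket.arrival) -
    (currentPacket.rtpTimestamp - previousPacket.rtpTimestamp);
  return previousJitter + (Math.abs(d) - previousJitter) / 16;
};

// Fraction of expected packets that never arrived during an interval.
const fractionLost = (expectedPackets, receivedPackets) =>
  expectedPackets > 0
    ? Math.max(0, expectedPackets - receivedPackets) / expectedPackets
    : 0;

// Made-up numbers: the second packet arrives 16 units later than its
// timestamp spacing suggests, and 5 of 100 expected packets are missing.
const jitterAfterSecondPacket = updateJitter(
  0,
  { arrival: 0, rtpTimestamp: 0 },
  { arrival: 176, rtpTimestamp: 160 }
);
const lossFraction = fractionLost(100, 95);
console.log(jitterAfterSecondPacket, lossFraction);
```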

WebRTC requires that all traffic be encrypted, so the secure versions of RTP and RTCP (SRTP and SRTCP) are actually used. The browser encrypts the packets before sending them, which means that packets captured with Wireshark cannot be read directly.

However, Firefox supports logging RTCP packets before they are encrypted. Check out this Mozilla blog post.

Modifications for Multiparty Communications

Additional Tools and Resources

Chrome WebRTC inspector

Google Chrome provides a handy tool to inspect the internal workings of WebRTC. You can go to chrome://webrtc-internals in the address bar, and view the current WebRTC events and statistics.

Anatomy of an SDP

https://webrtchacks.com/sdp-anatomy/
Breaks down every single line of an SDP to help you understand what is going on under the hood.

SDP Visualizer

https://sdp.garyliu.dev
Visualize long SDPs. Quickly find the part you're interested in and collapse the rest.