Signaling and video calling

This is an experimental technology
Because this technology's specification has not stabilized, check the compatibility table for usage in various browsers. Also note that the syntax and behavior of an experimental technology are subject to change in future versions of browsers as the specification changes.

Summary

WebRTC allows real-time, peer-to-peer media exchange between two devices. The connection is established through a discovery and negotiation process called signaling. This tutorial will guide you through building a two-way video call.

WebRTC is a fully peer-to-peer technology for the real-time exchange of audio, video, and data, with one central caveat. A form of discovery and media format negotiation must take place, as discussed elsewhere, in order for two devices on different networks to locate one another. This process is called signaling and involves both devices connecting to a third, mutually agreed-upon server. Through this third server, the two devices can locate one another, and exchange negotiation messages.

In this article, we will further enhance the WebSocket chat first created as part of our continuing WebSocket documentation (this article link is forthcoming; it isn't actually online yet) to support opening a two-way video call between users. You can try out the sample, and you can look at the full project on GitHub.

The code on GitHub and on our test server is currently newer (and better) than the code shown below. This article is in the process of being updated right now; that update should be done soon (you'll know it's done because this note will vanish).

This example uses promises. If you're not already familiar with them, you should read up on them.

The signaling server

Establishing a WebRTC connection between two devices requires the use of a signaling server to resolve how to connect them over the internet. How do we create this server and how does the signaling process actually work?

First we need the signaling server itself. WebRTC doesn't specify a transport mechanism for the signaling information. You could use anything you like, from WebSocket to XMLHttpRequest to carrier pigeon for sending the signaling information between the two peers.

It's important to note that the server doesn't need to know the signaling data content. Although it's SDP, even this doesn't matter so much: the content of the message going through the signaling server is, in effect, a black box. What does matter is when the ICE subsystem instructs you to send signaling data to the other peer, you do so, and the other peer knows how to receive this information and deliver it to its own ICE subsystem.

Readying the chat server for signaling

Our chat server uses the WebSocket API to send information as JSON strings between each client and the server. The server supports several message types to handle tasks such as registering new users, setting usernames, and sending public chat messages. To allow the server to support signaling and ICE negotiation, we need to update the code. We'll have to allow directing messages to one specific user instead of broadcasting to all logged-in users, and ensure unrecognized message types are passed through and delivered without the server needing to know what they are. This lets us send signaling messages using this same server, instead of needing a separate server.

Let's take a look at the changes we need to make so the chat server supports WebRTC signaling. This is in the file chatserver.js.

First up is the addition of the function sendToOneUser(). As the name suggests, this sends a stringified JSON message to a particular username.

function sendToOneUser(target, msgString) {
  var i;
  // Stop at the first connection whose username matches the target.
  for (i=0; i<connectionArray.length; i++) {
    if (connectionArray[i].username === target) {
      connectionArray[i].sendUTF(msgString);
      break;
    }
  }
}

This function iterates over the list of connected users until it finds one matching the specified username, then sends the message to that user. The msgString parameter is a stringified JSON object. We could have made the function accept the original message object instead, but passing the string is more efficient here: the message has already been stringified by the time we reach this point, so it can be sent along without further processing.

Our original chat demo didn't support sending messages to a specific user. By modifying the main WebSocket message handler, we can now support this. Doing so involves a change towards the end of the connection.on() function:

if (sendToClients) {
  var msgString = JSON.stringify(msg);
  var i;
  // If the message specifies a target username, only send the
  // message to them. Otherwise, send it to every user.
  if (msg.target && msg.target.length !== 0) {
    sendToOneUser(msg.target, msgString);
  } else {
    for (i=0; i<connectionArray.length; i++) {
      connectionArray[i].sendUTF(msgString);
    }
  }
}

This code now looks at our pending message to check whether it has a target property. This property can be included to specify the username of the person intended to receive the outgoing message. If a target property is present, the message is sent only to that user by calling sendToOneUser(). Otherwise, the message is broadcast to all users by iterating over the connection list and sending the message to each user in it.

As the existing code allows the sending of arbitrary message types, no additional changes are required. Our clients can now send messages of unknown types to any specific user, letting them send signaling messages back and forth as desired.
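The routing rule described above can be tried in isolation. The following is a minimal sketch, assuming each connection object exposes a username and a sendUTF() method as in the server code; the routeMessage() and makeFakeConnection() names and the returned delivery count are illustrative and not part of chatserver.js. Unlike sendToOneUser(), this version does not stop at the first match.

```javascript
// Hypothetical helper: deliver msg to msg.target if one is set,
// otherwise broadcast. Returns how many connections received it.
function routeMessage(connections, msg) {
  var msgString = JSON.stringify(msg);
  var delivered = 0;

  connections.forEach(function(conn) {
    // With no target every connection matches; with a target, only that user.
    if (!msg.target || conn.username === msg.target) {
      conn.sendUTF(msgString);
      delivered++;
    }
  });
  return delivered;
}

// Stand-in connection objects that record what they were sent.
function makeFakeConnection(username) {
  return {
    username: username,
    sent: [],
    sendUTF: function(text) { this.sent.push(text); }
  };
}

var fakeConnections = [makeFakeConnection("naomi"), makeFakeConnection("priya")];

// A targeted signaling message reaches only priya.
routeMessage(fakeConnections, { type: "video-offer", target: "priya", sdp: "(placeholder)" });

// A message without a target reaches everyone.
routeMessage(fakeConnections, { type: "message", text: "Hello, everyone" });
```

Note that the server never inspects the type field here, which is what lets unrecognized signaling messages pass straight through.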

Designing the signaling protocol

Now that we've built a mechanism for exchanging messages, we need a protocol defining how those messages will look. This can be done in a number of ways; what's demonstrated here is just one possible way to structure signaling messages.

Our server uses stringified JSON objects to communicate with its clients. This means our signaling messages will be in JSON format, with contents which specify what kind of messages they are, and instructions needed to handle the message appropriately.

Exchanging session descriptions

When starting the signaling process, an offer is created by the user initiating the call. This offer includes a session description, in SDP format, and needs to be delivered to the receiving user, or callee. This callee responds to this offer, with an answer message, also containing an SDP description. Our offer messages use the type "video-offer", and the answer messages use the type "video-answer". These messages have the following fields:

type: The message type; either "video-offer" or "video-answer".
name: The sender's username.
target: The username of the person to receive the description (if the caller is sending the message, this specifies the callee, and vice-versa).
sdp: The SDP (Session Description Protocol) string describing the local end of the connection (that is, from the point of view of the recipient, the SDP describes the remote end of the connection).
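To make the field list concrete, here is a small sketch of how such messages could be built and passed through JSON. The makeDescriptionMessage() helper is hypothetical, and the SDP text is a placeholder; a real description comes from the browser.

```javascript
// Hypothetical helper that builds an offer or answer message with the
// four fields listed above, rejecting any other message type.
function makeDescriptionMessage(type, name, target, sdp) {
  if (type !== "video-offer" && type !== "video-answer") {
    throw new Error("Unexpected description message type: " + type);
  }
  return { type: type, name: name, target: target, sdp: sdp };
}

// Placeholder SDP text; a real description is generated by the browser.
var offerMsg = makeDescriptionMessage("video-offer", "naomi", "priya", "v=0 (placeholder)");

// The callee replies with the name and target fields swapped.
var answerMsg = makeDescriptionMessage("video-answer", offerMsg.target, offerMsg.name, "v=0 (placeholder)");

// Messages travel as JSON strings and are parsed again on arrival.
var receivedOffer = JSON.parse(JSON.stringify(offerMsg));
```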

At this point, the two participants know which codecs and video parameters are to be used for this call. They still don't know how to transmit the media data itself though. This is where Interactive Connectivity Establishment (ICE) comes in.

Exchanging ICE candidates

After exchanging session descriptions (SDP), the two peers start exchanging ICE candidates. Each ICE candidate describes a method by which the originating peer is able to communicate. Each peer sends candidates in the order of discovery, and keeps sending until it runs out of suggestions, even if media has already started streaming. Once the two peers suggest a compatible candidate, media begins to flow. If they later agree on a better pairing (usually higher-performance), the stream may change formats as needed.

Though not currently supported, this technique could theoretically be used in downgrading to a lower-bandwidth connection if needed.

The message we'll be sending through the signaling server, carrying these ICE candidates, has the type "new-ice-candidate". Such messages include the fields:

type: The message type: "new-ice-candidate".
target: The username of the person with whom negotiation is underway; the server will direct the message to this user only.
candidate: The SDP candidate string, describing the proposed connection method.

Each ICE message suggests a communication protocol (TCP or UDP), IP address, port number, and connection type (for example, whether the specified IP is the peer itself or a relay server), along with other information needed to link the two computers together, even when NAT or other networking complexity sits between them.
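To see where the protocol, address, port, and connection type live in a candidate, the sketch below splits a candidate line into its leading fields. It assumes the candidate-attribute layout from RFC 5245; parseCandidateLine() is a hypothetical helper for inspection only, and the sample values are made up (the IP address is from a documentation-only range). Your signaling code never needs to parse candidates in order to relay them.

```javascript
// Splits an ICE candidate line into its leading fields. Extension fields
// after the type (raddr, rport, and so on) are ignored here.
function parseCandidateLine(line) {
  var parts = line.trim()
    .replace(/^a=/, "")
    .replace(/^candidate:/, "")
    .split(/\s+/);

  if (parts.length < 8 || parts[6] !== "typ") {
    return null; // Not a candidate line in the expected layout
  }
  return {
    foundation: parts[0],
    component: Number(parts[1]),
    protocol: parts[2].toLowerCase(), // "udp" or "tcp"
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7] // "host", "srflx", "prflx", or "relay"
  };
}

// Made-up example values for illustration.
var parsedCandidate = parseCandidateLine(
  "candidate:842163049 1 udp 1677729535 192.0.2.10 46154 typ srflx raddr 10.0.0.5 rport 46154"
);
```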

The important thing to note is this: during ICE negotiation, your code is responsible for only two things. First, it accepts outgoing candidates from the ICE layer and sends them across the signaling connection to the other peer when your onicecandidate handler is executed. Second, it receives ICE candidate messages from the signaling server (when the "new-ice-candidate" message is received) and delivers them to your ICE layer by calling RTCPeerConnection.addIceCandidate(). That's it. Avoid the temptation to try to make it more complicated than that until you really know what you're doing. That way lies madness.

All your signaling server now needs to do is send the messages it's asked to. Your workflow may also demand login/authentication functionality, but such details will vary.

Signaling transaction flow

This section describes how signaling information is transmitted between the two peers to be connected: at a basic level, which messages are sent by whom, to whom, and why.

The signaling process involves this exchange of messages among a number of points: each user's instance of the chat system's web application, each user's browser, the signaling server, and the hosting Web server.

Imagine that Naomi and Priya are engaged in a discussion using the chat software, and Naomi decides to open a video call between the two. Here's the expected sequence of events:

We'll see this detailed more over the course of this article.

ICE candidate exchange process

When each peer's ICE layer begins to send candidates, it enters into an exchange:

Each side sends candidates to the other and processes received candidates when it is ready. Candidates keep flowing in both directions until both sides agree, allowing media to flow. Note that "ICE exchange" doesn't mean the two sides take turns making suggestions. Each side sends suggested candidates whenever appropriate, and continues to do so until it runs out of candidates or an agreement is reached.

If conditions change, for example the network connection deteriorates, one or both peers might suggest switching to a lower-bandwidth media resolution, or alternative codec. In a following candidate exchange, another media format and/or codec change may take place, when the two peers come to agreement on a new format.

Optionally, see RFC 5245: Interactive Connectivity Establishment, section 2.6 ("Concluding ICE") if you want a greater understanding of how this process is completed inside the ICE layer. Note that candidates are exchanged and media starts to flow as soon as the ICE layer is satisfied. This is all taken care of behind the scenes. Our role is simply to send the candidates back and forth through the signaling server.

The client application

Now let's apply these concepts to our sample code.

The core of any signaling process is its message handling. It's not necessary to use WebSockets for signaling, but it is a common solution. Other transports that fit your workflow can achieve the same outcome.

Updating the HTML

The HTML for our client needs a location for video to be presented. This requires video elements, and a button to hang up the call:

      <div class="flexChild" id="camera-container">
        <div class="camera-box">
          <video id="received_video" autoplay></video>
          <video id="local_video" autoplay muted></video>
          <button id="hangup-button" onclick="hangUpCall();" disabled>
            Hang Up
          </button>
        </div>
      </div>

The page structure defined here uses <div> elements, which gives us full control over the page layout through CSS. We'll skip layout detail in this guide, but take a look at the CSS on GitHub to see how we handled it. Take note of the two <video> elements, one for your self-view and one for the connection, and the <button> element.

The <video> element with the id "received_video" will present video received from the connected user. We specify the autoplay attribute, ensuring that once the video starts arriving, it immediately plays. This removes any need to explicitly handle playback in our code. The "local_video" <video> element presents a preview of the user's camera; it specifies the muted attribute, as we don't need to hear local audio in this preview panel.

Finally, the "hangup-button" <button>, which disconnects from a call, is defined and configured to start disabled (this is the default for when no call is connected) and to call the function hangUpCall() on click. This function's role is to close the call and send a notification through the signaling server to the other peer, requesting that it also close.

The JavaScript code

We'll divide this code into functional areas to more easily describe how it works. The main body of this code is found in the connect() function: it opens a WebSocket connection to the server on port 6503, and establishes a handler to receive messages in JSON object format. This code generally handles text chat messages as it did previously.
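The message handler set up in connect() is not shown in this article. As a rough sketch of the idea, incoming JSON can be parsed and routed by its type field. The dispatchServerMessage() function and the handler table below are assumptions for illustration, with recording stubs standing in for the real handlers.

```javascript
// Parses one incoming WebSocket frame and passes it to the handler
// registered for its type. Returns true if a handler ran.
function dispatchServerMessage(rawData, handlers) {
  var msg;
  try {
    msg = JSON.parse(rawData);
  } catch (err) {
    return false; // Ignore frames that are not valid JSON
  }
  if (!msg || typeof msg.type !== "string") {
    return false;
  }
  // Only look at the table's own entries, not inherited properties.
  if (!Object.prototype.hasOwnProperty.call(handlers, msg.type)) {
    return false;
  }
  handlers[msg.type](msg);
  return true;
}

// Recording stubs in place of the real handler functions.
var seenMessages = [];
var messageHandlers = {
  "userlist": function(msg) { seenMessages.push("userlist:" + msg.users.length); },
  "new-ice-candidate": function(msg) { seenMessages.push("candidate"); }
};

dispatchServerMessage('{"type":"userlist","users":["naomi","priya"]}', messageHandlers);
dispatchServerMessage('{"type":"unknown-type"}', messageHandlers); // No handler, ignored
dispatchServerMessage("not json", messageHandlers);                // Invalid, ignored
```

In a browser, a function like this would typically be called from the WebSocket's onmessage handler with the event's data property.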

Sending messages to the signaling server

Throughout our code, we call sendToServer() in order to send messages to the signaling server. This function uses the WebSocket connection to do its work:

function sendToServer(msg) {
  var msgJSON = JSON.stringify(msg);
  connection.send(msgJSON);
}

The message object passed-in is converted into a JSON string by calling JSON.stringify(), then we call the WebSocket connection's send() function to transmit the message to the server.
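The function above assumes the connection is already open. One possible refinement, sketched below under that assumption, is to hold messages until the socket reports that it is open. The createServerSender() wrapper and the fake socket are hypothetical and are not part of the sample; the only WebSocket features relied on are send() and readyState, where a value of 1 means OPEN.

```javascript
var WEBSOCKET_OPEN = 1; // Same value as WebSocket.OPEN

// Hypothetical wrapper: sends immediately when the socket is open,
// otherwise holds the message until flush() is called.
function createServerSender(connection) {
  var pending = [];

  function flush() {
    while (pending.length > 0 && connection.readyState === WEBSOCKET_OPEN) {
      connection.send(pending.shift());
    }
  }

  function sendToServer(msg) {
    pending.push(JSON.stringify(msg));
    flush();
  }

  return { sendToServer: sendToServer, flush: flush };
}

// Stand-in for a WebSocket that records what it was asked to send.
var fakeSocket = {
  readyState: 0, // CONNECTING
  sent: [],
  send: function(text) { this.sent.push(text); }
};

var sender = createServerSender(fakeSocket);
sender.sendToServer({ type: "message", text: "queued while connecting" });

fakeSocket.readyState = WEBSOCKET_OPEN; // In a browser, the open event signals this
sender.flush();
sender.sendToServer({ type: "message", text: "sent immediately" });
```

In a browser, flush() would be called from the connection's onopen handler.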

UI to start a call

The code which handles the "userlist" message calls handleUserlistMsg(). Here we set up the handler for each connected user in the user list displayed to the left of the chat panel. This function receives a message object whose users property is an array of usernames for every user online. We will look at this code in sections to make the explanation easier to follow.

function handleUserlistMsg(msg) {
  var i;
  var listElem = document.getElementById("userlistbox");
  while (listElem.firstChild) {
    listElem.removeChild(listElem.firstChild);
  }
  // …

We get a reference to the <ul>, which contains the list of usernames, into the variable listElem. Then we empty this list, removing each child element one by one.

Obviously, it would be more efficient to update the list by adding and removing individual users instead of rebuilding the whole list every time it changes, but this is good enough for the purposes of this example.

We then build the new user list:

  // …
  for (i=0; i < msg.users.length; i++) {
    var item = document.createElement("li");
    item.appendChild(document.createTextNode(msg.users[i]));
    item.addEventListener("click", invite, false);
    listElem.appendChild(item);
  }
}

Next, we create and insert <li> elements into the DOM, one for each user currently connected to the chat server. Then we add a listener to each of them so invite() is called when the username is clicked; this initiates the process of calling the user.
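As noted earlier, rebuilding the whole list is the simple approach. If you wanted to update it incrementally instead, the first step would be working out which usernames changed. The following is a small sketch of that step only; diffUserLists() is a hypothetical helper, and the DOM updates are left out.

```javascript
// Compares the previous and current username arrays and reports which
// names need a new <li> and which existing <li> elements can be removed.
function diffUserLists(previousUsers, currentUsers) {
  var previous = new Set(previousUsers);
  var current = new Set(currentUsers);
  return {
    added: currentUsers.filter(function(name) { return !previous.has(name); }),
    removed: previousUsers.filter(function(name) { return !current.has(name); })
  };
}

var userChanges = diffUserLists(["naomi", "priya"], ["priya", "sam"]);
// userChanges.added is ["sam"] and userChanges.removed is ["naomi"]
```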

Starting a call

When the user clicks on a username they want to call, the invite() function is invoked as the event handler for that click event:

var mediaConstraints = {
  audio: true, // We want an audio track
  video: true // ...and we want a video track
};
function invite(evt) {
  if (myPeerConnection) {
    alert("You can't start a call because you already have one open!");
  } else {
    var clickedUsername = evt.target.textContent;
    if (clickedUsername === myUsername) {
      alert("I'm afraid I can't let you talk to yourself. That would be weird.");
      return;
    }
    targetUsername = clickedUsername;
    createPeerConnection();
    navigator.mediaDevices.getUserMedia(mediaConstraints)
    .then(function(localStream) {
      document.getElementById("local_video").srcObject = localStream;
      myPeerConnection.addStream(localStream);
    })
    .catch(handleGetUserMediaError);
  }
}

The first thing we do is run a couple of quick sanity checks: is there already a call open? Did the user click on their own username? In either case, we don't want to start a new call, so alert() is invoked to explain why the call can't be opened.

Then we pull the name of the user we're calling into the variable targetUsername and call createPeerConnection(), a function which will create and do basic configuration of the RTCPeerConnection.

Once the RTCPeerConnection has been created, we request access to the user's camera and microphone by calling MediaDevices.getUserMedia(), which is exposed to us through the Navigator.mediaDevices property. When this succeeds, fulfilling the returned promise, our then() handler is executed. It receives, as input, a MediaStream object representing the stream from the user's camera and microphone.

We set our local video preview's srcObject property to the stream and, since the <video> element is configured to automatically play incoming video, the stream begins playing in our local preview box.

Then we call myPeerConnection.addStream() to add the stream to the RTCPeerConnection. This starts feeding our stream to the WebRTC connection, even though this is not yet fully set up. This stream needs adding to the connection before ICE negotiation can occur, since the ICE layer will use information from the stream when negotiating the connection. That action happens when we receive the negotiationneeded event.

If an error occurs while trying to get the local media stream, our catch clause calls handleGetUserMediaError(), which displays an appropriate error to the user as required.

Handling getUserMedia() errors

If the promise returned by getUserMedia() is rejected, our handleGetUserMediaError() function is called.

function handleGetUserMediaError(e) {
  switch(e.name) {
    case "NotFoundError":
      alert("Unable to open your call because no camera and/or microphone " +
            "were found.");
      break;
    case "SecurityError":
    case "PermissionDeniedError":
      // Do nothing; this is the same as the user canceling the call.
      break;
    default:
      alert("Error opening your camera and/or microphone: " + e.message);
      break;
  }
  closeVideoCall();
}

An error message is displayed in all cases but one. In this example, we ignore "SecurityError" and "PermissionDeniedError" results, treating refusal to grant permission to use the media hardware like canceling the call.

Regardless of why an attempt to get the stream fails, we call our closeVideoCall() function to shut down the RTCPeerConnection, and release any resources already allocated by the process of attempting the call. This code is designed to safely handle partially-started calls.
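The decision about which message to show can be separated from the alert() and cleanup calls, which makes it easy to check without a browser. This is a sketch of that idea; describeGetUserMediaError() is a hypothetical helper, and the "NotAllowedError" case is an addition reflecting the name current browsers use when permission is refused.

```javascript
// Returns the text to show for a failed getUserMedia() call, or null
// when the failure should be treated as the user canceling the call.
function describeGetUserMediaError(e) {
  switch (e.name) {
    case "NotFoundError":
      return "Unable to open your call because no camera and/or microphone " +
             "were found.";
    case "SecurityError":
    case "PermissionDeniedError":
    case "NotAllowedError": // Addition: name used by current browsers for a refusal
      return null;
    default:
      return "Error opening your camera and/or microphone: " + e.message;
  }
}

// Plain objects stand in for the error objects a browser would supply.
var notFoundText = describeGetUserMediaError({ name: "NotFoundError", message: "" });
var refusedText = describeGetUserMediaError({ name: "NotAllowedError", message: "" });
var otherText = describeGetUserMediaError({ name: "AbortError", message: "device busy" });
```

A caller would show the returned text only when it is not null, then call closeVideoCall() in every case, as the function above does.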

Creating the peer connection

The createPeerConnection() function is used by both the caller and the callee to construct their RTCPeerConnection objects, representing each end of their WebRTC connection. It's called by invite() on the caller side, and by handleVideoOfferMsg() on the callee side.

It's fairly straightforward:

var myHostname = window.location.hostname;
function createPeerConnection() {
  myPeerConnection = new RTCPeerConnection({
      iceServers: [     // Information about ICE servers - Use your own!
        {
          urls: "turn:" + myHostname,  // A TURN server
          username: "webrtc",
          credential: "turnserver"
        }
      ]
  });
// …

Since we're running a STUN/TURN server on the same host as the Web server, we get its domain name using location.hostname.

When we call the RTCPeerConnection constructor, we specify parameters which configure the call; the most important one is iceServers, a list of STUN and/or TURN servers for the ICE layer to use when trying to establish a route between the caller and the callee. WebRTC uses STUN and/or TURN to find a route and protocol to use to communicate between the two peers, even if they're behind a firewall or using NAT.

You should always use STUN/TURN servers which you own, or which you have specific authorization to use.

The iceServers parameter is an array of objects, each of which contains at least a urls field giving the URLs at which the server can be reached. In our example, we provide a single server for the ICE layer to use when trying to locate and link to the other peer: a TURN server running on the same hostname as the Web server. Note the inclusion of username and password information, through the username and credential fields, in the TURN server's description.
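A configuration can list more than one server. The sketch below builds a configuration object with both a STUN and a TURN entry for the same host. The buildIceConfiguration() helper, its parameter names, the STUN entry, and the example.com hostname are all assumptions for illustration; only the TURN entry mirrors what createPeerConnection() uses.

```javascript
// Hypothetical helper that builds the RTCPeerConnection configuration.
function buildIceConfiguration(hostname, turnUsername, turnCredential) {
  return {
    iceServers: [
      { urls: "stun:" + hostname }, // STUN entries need no credentials
      {
        urls: "turn:" + hostname,   // TURN entries carry login details
        username: turnUsername,
        credential: turnCredential
      }
    ]
  };
}

// Placeholder hostname; use servers you own or are authorized to use.
var iceConfig = buildIceConfiguration("example.com", "webrtc", "turnserver");
```

In a browser, the resulting object would be passed directly to the RTCPeerConnection constructor.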

Set up event handlers

Once the RTCPeerConnection is created, we need to set up handlers for the events that matter to us:

// …
  myPeerConnection.onicecandidate = handleICECandidateEvent;
  myPeerConnection.onaddstream = handleAddStreamEvent;
  myPeerConnection.onremovestream = handleRemoveStreamEvent;
  myPeerConnection.oniceconnectionstatechange = handleICEConnectionStateChangeEvent;
  myPeerConnection.onicegatheringstatechange = handleICEGatheringStateChangeEvent;
  myPeerConnection.onsignalingstatechange = handleSignalingStateChangeEvent;
  myPeerConnection.onnegotiationneeded = handleNegotiationNeededEvent;
}

The first two of these event handlers are required; you have to handle them to do anything involving streamed media with WebRTC. The removestream event is useful for detecting the cessation of streaming, so you'll probably use that too. The remaining four handlers are not mandatory, but have uses that we'll explore. There are a few other events available to us, but we're not using them in this example. Here's a summary of each:

RTCPeerConnection.onicecandidate: The local ICE layer calls your icecandidate event handler, when it needs you to transmit an ICE candidate to the other peer, through your signaling server. See Sending ICE candidates for more information and to see the code for this example.
RTCPeerConnection.onaddstream: This handler for the addstream event is called by the local WebRTC layer, to let you know when a remote stream has been added to your connection. This lets you connect the incoming stream to an element to display it, for example. See Receiving new streams for details.
RTCPeerConnection.onremovestream: This counterpart to onaddstream is called to handle the removestream event, when the remote peer removes a stream from your connection. See Handling the removal of streams.
RTCPeerConnection.oniceconnectionstatechange: The iceconnectionstatechange event is sent by the ICE layer to let you know about changes to the state of the ICE connection. This can help you know when the connection has failed, or been lost. We'll look at the code for this example in ICE connection state below.
RTCPeerConnection.onicegatheringstatechange: The ICE layer sends you the icegatheringstatechange event, when the ICE agent's process of collecting candidates shifts, from one state to another (such as starting to gather candidates or completing negotiation). See ICE gathering state below.
RTCPeerConnection.onsignalingstatechange: The WebRTC infrastructure sends you the signalingstatechange message when the state of the signaling process changes (or if the connection to the signaling server changes). See Signaling state to see our code.
RTCPeerConnection.onnegotiationneeded: This function is called whenever the WebRTC infrastructure needs you to start the session negotiation process anew. Its job is to create and send an offer, to the callee, asking it to connect with us. See Starting negotiation to see how we handle this.

Starting negotiation

Once the caller has created its RTCPeerConnection, created a media stream, and added it to the connection as shown in Starting a call, the browser fires a negotiationneeded event when it's ready for a connection with another peer. Here's our code for handling such an event:

function handleNegotiationNeededEvent() {
  myPeerConnection.createOffer().then(function(offer) {
    return myPeerConnection.setLocalDescription(offer);
  })
  .then(function() {
    sendToServer({
      name: myUsername,
      target: targetUsername,
      type: "video-offer",
      sdp: myPeerConnection.localDescription
    });
  })
  .catch(reportError);
}

To start the negotiation process, we need to create and send an SDP offer to the peer we want to connect to. This offer will include a list of supported configurations for the connection, including information about the media stream we've added to the connection locally (that is, the video we want to send to the other end of the call), and any ICE candidates gathered by the ICE layer already. We create this offer by calling myPeerConnection.createOffer(). When this succeeds (fulfilling the promise), we pass the created offer information into myPeerConnection.setLocalDescription(), which configures the connection and media configuration state for the caller's end of the connection.

Technically speaking, the blob returned by createOffer() is an RFC 3264 offer.

We know the description is valid, and has been set, when the promise returned by setLocalDescription() is fulfilled. This is when we send our offer to the other peer, by creating a new "video-offer" message, containing the local description (now the same as the offer), and sending it through our signaling server to the callee. The offer has the following members:

type: The message type: "video-offer".
name: The caller's username.
target: The name of the user we wish to call.
sdp: The SDP blob describing the offer.

If an error occurs, either in the initial createOffer() or in any of the fulfillment handlers that follow, an error is reported by invoking our reportError() function.

Once setLocalDescription()'s fulfillment handler has run, the ICE agent begins firing icecandidate events to be handled.
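The same three steps (create the offer, set it as the local description, send it) can also be written with async/await. This is a sketch and not the sample's code: sendVideoOffer() takes the peer connection, the send function, and the usernames as parameters, and a stand-in object with placeholder SDP replaces RTCPeerConnection so the flow can be followed outside a browser.

```javascript
// Same steps as handleNegotiationNeededEvent(), written with async/await.
async function sendVideoOffer(peerConnection, send, myUsername, targetUsername) {
  var offer = await peerConnection.createOffer();
  await peerConnection.setLocalDescription(offer);

  // Only reached once the local description has been set successfully.
  send({
    name: myUsername,
    target: targetUsername,
    type: "video-offer",
    sdp: peerConnection.localDescription
  });
}

// Stand-in for RTCPeerConnection with just enough behavior for the demo.
var fakeCaller = {
  localDescription: null,
  createOffer: function() {
    return Promise.resolve({ type: "offer", sdp: "v=0 (placeholder)" });
  },
  setLocalDescription: function(desc) {
    this.localDescription = desc;
    return Promise.resolve();
  }
};

var sentOffers = [];
var offerDone = sendVideoOffer(
  fakeCaller,
  function(msg) { sentOffers.push(msg); },
  "naomi",
  "priya"
);
```

Errors thrown by either awaited call reject the promise returned by sendVideoOffer(), so a caller would attach a catch handler to it, much as the promise chain above ends with one.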

Session negotiation

Now that we've started negotiating with the other peer, it receives our offer and passes it to its handleVideoOfferMsg() function. We continue the story with the arrival of the "video-offer" message at the callee.

Handling the invitation

When the offer arrives, the callee's handleVideoOfferMsg() function is called and given the "video-offer" message containing the offer. This code needs to do two things. First, it needs to create its own RTCPeerConnection and media stream. Second, it needs to process the received offer, then construct and send its answer.

function handleVideoOfferMsg(msg) {
  var localStream = null;
  targetUsername = msg.name;
  createPeerConnection();
  var desc = new RTCSessionDescription(msg.sdp);
  myPeerConnection.setRemoteDescription(desc).then(function () {
    return navigator.mediaDevices.getUserMedia(mediaConstraints);
  })
  .then(function(stream) {
    localStream = stream;
    document.getElementById("local_video").srcObject = localStream;
    return myPeerConnection.addStream(localStream);
  })
// …

This code is very similar to what we did in the invite() function back in Starting a call. It starts by creating and configuring an RTCPeerConnection using our createPeerConnection() function. Then it takes the SDP offer, from the received "video-offer" message, using it to create a new RTCSessionDescription object representing the caller's session description.

The session description is then passed into myPeerConnection.setRemoteDescription(). This establishes the received offer as the caller's session information. If this is successful, the promise fulfillment handler (in the then() clause) starts the process of getting access to the callee's camera and microphone, setting up the stream, and so forth, as we saw previously in invite().

Once the local stream is up and running, it's time to create an SDP answer, and send it to the caller:

  .then(function() {
    return myPeerConnection.createAnswer();
  })
  .then(function(answer) {
    return myPeerConnection.setLocalDescription(answer);
  })
  .then(function() {
    var msg = {
      name: myUsername,
      target: targetUsername,
      type: "video-answer",
      sdp: myPeerConnection.localDescription
    };
    sendToServer(msg);
  })
  .catch(handleGetUserMediaError);
}

Once RTCPeerConnection.addStream() has completed successfully, the next fulfillment handler is executed. We call myPeerConnection.createAnswer() to construct an SDP answer string. This is passed to myPeerConnection.setLocalDescription() to establish the resulting SDP as the description of the callee's local end of the connection.

The final answer is sent to the caller, so it knows how to reach the callee. This is achieved by constructing a "video-answer" message whose sdp property contains the callee's answer.

Any errors are caught and passed to handleGetUserMediaError(), described in Handling getUserMedia() errors.

As is the case with the caller, once the setLocalDescription() fulfillment handler has run, the browser begins firing icecandidate events that the callee must handle.

Sending ICE candidates

You might think everything is complete once the caller receives an answer from the callee, but it's not. Behind the scenes, the ICE agents of each peer need to exchange ICE candidate messages. Each peer sends candidates to the other, repeatedly, until it has told the other peer about every way it can be contacted for each media transport. These candidates must be sent through your signaling server; since ICE doesn't know about your signaling server, your code handles the transmission of each candidate, which the ICE layer hands to you by calling your handler for the icecandidate event.

Your onicecandidate handler receives an event, whose candidate property is the SDP describing the candidate (or null to mark the end of candidates); this is what's needed to transmit to the other peer through your signaling server. Here's our example's implementation:

function handleICECandidateEvent(event) {
  if (event.candidate) {
    sendToServer({
      type: "new-ice-candidate",
      target: targetUsername,
      candidate: event.candidate
    });
  }
}

This builds an object containing the candidate and sends it to the other peer. The sendToServer() function is described in Sending messages to the signaling server. The message properties are:

target: The username the ICE candidate needs sending to. This enables the signaling server to route the message.
type: The message type: "new-ice-candidate".
candidate: The candidate object the ICE layer wants to transmit to the other peer.

The format of this message (as is the case with everything you do when handling signaling) is entirely up to you, depending on your needs; you can provide other information as required.

It's important to keep in mind that the icecandidate event is not sent when ICE candidates arrive from the other end of the call. Instead, they're sent by your own end of the call so that you can take on the job of transmitting the data over whatever channel you choose. This can be confusing when you're new to WebRTC.

Receiving ICE candidates

The signaling server delivers each ICE candidate to the destination peer using whatever method it chooses; in our example this is as JSON objects, with the type "new-ice-candidate". Our handleNewICECandidateMsg() function is called to handle these messages:

function handleNewICECandidateMsg(msg) {
  var candidate = new RTCIceCandidate(msg.candidate);
  myPeerConnection.addIceCandidate(candidate)
    .catch(reportError);
}

We construct an RTCIceCandidate object by passing the received SDP into its constructor, then pass the new object into myPeerConnection.addIceCandidate(). This hands the fresh ICE candidate to the local ICE layer, and our role in handling this candidate is complete.

Each peer sends the other a candidate for each connection method it expects to work. The two sides come to agreement and open their connection. Note that candidates can keep coming and going after the conversation has begun, either because the peers are still trying to find a better connection method, or simply because the candidates were already underway when the peers finished establishing their connection.
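To show how an incoming message might reach handleNewICECandidateMsg(), here is a minimal sketch of a dispatcher that routes messages by their type property. The createMessageRouter() function and the stand-in handler are hypothetical; the sample project's own dispatch code may look different.

```javascript
// Hypothetical dispatcher: maps each message type to a handler function.
function createMessageRouter(handlers) {
  return function routeMessage(rawText) {
    const msg = JSON.parse(rawText);
    // Only look at the handlers' own keys, so that a type such as
    // "constructor" can never match an inherited property.
    if (!Object.prototype.hasOwnProperty.call(handlers, msg.type)) {
      return false; // Unknown message type; the caller decides what to do.
    }
    handlers[msg.type](msg);
    return true;
  };
}

// Stand-in for handleNewICECandidateMsg(); it just records what it receives.
const receivedCandidates = [];
const routeMessage = createMessageRouter({
  "new-ice-candidate": msg => receivedCandidates.push(msg.candidate)
});

const handled = routeMessage(JSON.stringify({
  type: "new-ice-candidate",
  target: "bob",
  candidate: { candidate: "candidate:1 1 udp 1 192.0.2.1 9 typ host", sdpMid: "0" }
}));
const ignored = routeMessage(JSON.stringify({ type: "not-a-real-type" }));
```

In a page, routeMessage() would be called with the text of each message arriving from the signaling server, and the handler table would list the real functions for each message type.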

Receiving new streams

When the remote peer adds a new stream to the connection (either by expressly calling RTCPeerConnection.addStream(), or automatically due to a renegotiation of the stream format), an addstream event is triggered. Here's how our sample handles these:

function handleAddStreamEvent(event) {
  document.getElementById("received_video").srcObject = event.stream;
  document.getElementById("hangup-button").disabled = false;
}

This function assigns the incoming stream to the "received_video" <video> element, and enables the "Hang Up" button so the user can hang up the call.

Once this code has completed, the video being sent by the other peer is finally displayed in the local browser window!
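In current versions of the WebRTC specification, the addstream event and addStream() are deprecated in favor of the track event and addTrack(). The following is a sketch of an equivalent track handler, not the sample's code. The makeTrackHandler() factory is a hypothetical helper that takes the two elements as parameters so that the logic can be run outside a page; the plain objects at the end are stand-ins for DOM elements and a MediaStream.

```javascript
// Hypothetical factory: returns a handler suitable for the "track" event.
function makeTrackHandler(videoElement, hangupButton) {
  return function handleTrackEvent(event) {
    // event.streams lists the streams the incoming track belongs to.
    // It can be empty if the sender did not associate the track with one.
    if (event.streams.length > 0) {
      videoElement.srcObject = event.streams[0];
    }
    hangupButton.disabled = false;
  };
}

// Stand-ins for the <video> element, the button, and a MediaStream.
const fakeVideo = { srcObject: null };
const fakeButton = { disabled: true };
const fakeStream = { id: "remote-stream" };

const handleTrackEvent = makeTrackHandler(fakeVideo, fakeButton);
handleTrackEvent({ streams: [fakeStream] });
```

In a page, the two arguments would come from document.getElementById(), and the returned function would be assigned to the connection's ontrack property.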

Handling the removal of streams

Your code receives a similar removestream event when the remote peer removes a stream from the connection by calling RTCPeerConnection.removeStream(). Our implementation is very simple:

function handleRemoveStreamEvent(event) {
  closeVideoCall();
}

This invokes our closeVideoCall() function to hang up, which ensures the call is closed and leaves our interface ready to start another connection. See Ending the call to understand how that code works.

Ending the call

There are many reasons why calls may end. A call might have completed, with one or both sides having hung up. Perhaps a network failure has occurred. Or one user might have quit their browser, or had a system crash.

Hanging up

When the user clicks the "Hang Up" button to end the call, the hangUpCall() function is called:

function hangUpCall() {
  closeVideoCall();
  sendToServer({
    name: myUsername,
    target: targetUsername,
    type: "hang-up"
  });
}

hangUpCall() calls closeVideoCall(), which shuts down and resets the connection and related resources. We then build a "hang-up" message and send it to the other end of the call so that the other peer can shut down neatly.

Ending the call

The closeVideoCall() function, shown below, is responsible for stopping the streams, cleaning up, and disposing of the RTCPeerConnection object:

function closeVideoCall() {
  var remoteVideo = document.getElementById("received_video");
  var localVideo = document.getElementById("local_video");
  if (myPeerConnection) {
    if (remoteVideo.srcObject) {
      remoteVideo.srcObject.getTracks().forEach(track => track.stop());
      remoteVideo.srcObject = null;
    }
    if (localVideo.srcObject) {
      localVideo.srcObject.getTracks().forEach(track => track.stop());
      localVideo.srcObject = null;
    }
    myPeerConnection.close();
    myPeerConnection = null;
  }
  document.getElementById("hangup-button").disabled = true;
  targetUsername = null;
}

After pulling references to the two <video> elements, we check if a WebRTC connection exists; if it does, we proceed to disconnect and close the call:

For both the remote and local video streams, iterate over each track, calling the MediaStreamTrack.stop() method.
Set both videos' HTMLMediaElement.srcObject properties to null, releasing all references to the streams.
Close the RTCPeerConnection by calling myPeerConnection.close().
Set myPeerConnection to null, so that the rest of our code can tell there's no ongoing call; this is useful when the user clicks a name in the user list.

Finally, we set the disabled property to true on the "Hang Up" button, making it unclickable while there is no call; then we set targetUsername to null since we're no longer talking to anyone. This allows the user to call another username, or to receive an incoming call.
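The first two steps are identical for both videos, so they could be factored into a small helper. The sketch below is one way to do that; stopAndDetach() is a hypothetical name, not a function in the sample, and the objects at the end are stand-ins for a <video> element and its stream so the logic can run outside a page.

```javascript
// Hypothetical helper: stops every track of the stream attached to a media
// element, detaches the stream, and returns the number of tracks stopped.
function stopAndDetach(videoElement) {
  const stream = videoElement.srcObject;
  if (!stream) {
    return 0; // Nothing attached, so there is nothing to stop.
  }
  const tracks = stream.getTracks();
  tracks.forEach(track => track.stop());
  videoElement.srcObject = null;
  return tracks.length;
}

// Stand-ins for MediaStreamTrack objects and a <video> element.
function makeFakeTrack() {
  return { stopped: false, stop() { this.stopped = true; } };
}
const fakeTracks = [makeFakeTrack(), makeFakeTrack()];
const fakeVideoElement = { srcObject: { getTracks: () => fakeTracks } };

const stoppedCount = stopAndDetach(fakeVideoElement);
const secondCount = stopAndDetach(fakeVideoElement);
```

With such a helper, closeVideoCall() could call it once for the remote video and once for the local video before closing the connection.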

Dealing with state changes

There are a number of events you can set listeners for that notify your code of a variety of state changes. We use three of them: iceconnectionstatechange, icegatheringstatechange, and signalingstatechange.

ICE connection state

iceconnectionstatechange events are sent to us by the ICE layer when the connection state changes (such as when the call is terminated from the other end).

function handleICEConnectionStateChangeEvent(event) {
  switch(myPeerConnection.iceConnectionState) {
    case "closed":
    case "failed":
    case "disconnected":
      closeVideoCall();
      break;
  }
}

Here, we call our closeVideoCall() function when the ICE connection state changes to "closed", "failed", or "disconnected". This shuts down our end of the connection and returns us to a state in which we can start or accept a call.
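The decision made by the switch statement can also be written as a small predicate, which is easy to adjust and to test on its own. This is only a sketch: the names CALL_ENDING_STATES and shouldCloseCall() are hypothetical, and the set of states simply mirrors the sample's choice. The "disconnected" state can be temporary on an unreliable network, so some applications may prefer to leave it out of the set.

```javascript
// ICE connection states that the sample treats as the end of the call.
const CALL_ENDING_STATES = new Set(["closed", "failed", "disconnected"]);

// Hypothetical predicate: true if the given state should end the call.
function shouldCloseCall(iceConnectionState) {
  return CALL_ENDING_STATES.has(iceConnectionState);
}

// Check every value that iceConnectionState can take.
const allStates = [
  "new", "checking", "connected", "completed",
  "failed", "disconnected", "closed"
];
const endingStates = allStates.filter(shouldCloseCall);
console.log(endingStates.join(", "));
```

An iceconnectionstatechange handler could then call closeVideoCall() whenever shouldCloseCall(myPeerConnection.iceConnectionState) returns true.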

ICE signaling state

Similarly, we watch for signalingstatechange events. If the signaling state changes to "closed", we shut down the call completely:

  myPeerConnection.onsignalingstatechange = function(event) {
    switch(myPeerConnection.signalingState) {
      case "closed":
        closeVideoCall();
        break;
    }
  };

ICE gathering state

icegatheringstatechange events let you know when the state of the ICE candidate gathering process changes. Our example doesn't use this for anything, but we implement the handler for logging, so you can observe in the console how the whole process works.

function handleICEGatheringStateChangeEvent(event) {
  // Our sample just logs information to console here,
  // but you can do whatever you need.
}
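As a sketch of what such logging might look like, the code below turns each gathering state into a readable message. The helpers describeGatheringState() and makeGatheringLogger() are hypothetical and are not taken from the sample; the connection object at the end is a stand-in so the logic can run outside a browser.

```javascript
// Hypothetical helper: a readable description of each iceGatheringState value.
function describeGatheringState(state) {
  switch (state) {
    case "new":
      return "ICE gathering has not started";
    case "gathering":
      return "ICE candidates are being gathered";
    case "complete":
      return "ICE gathering is complete";
    default:
      return "Unknown ICE gathering state: " + state;
  }
}

// Hypothetical factory: returns a handler that logs the connection's current
// gathering state using the supplied log function (for example, console.log).
function makeGatheringLogger(peerConnection, log) {
  return function handleICEGatheringStateChangeEvent(event) {
    log("*** " + describeGatheringState(peerConnection.iceGatheringState));
  };
}

// Stand-in for an RTCPeerConnection, with the log collected in an array.
const logLines = [];
const fakeConnection = { iceGatheringState: "gathering" };
const logGatheringState = makeGatheringLogger(fakeConnection, line => logLines.push(line));

logGatheringState({});
fakeConnection.iceGatheringState = "complete";
logGatheringState({});
```

In a page, you would pass the real connection and console.log, then assign the returned function to the connection's onicegatheringstatechange property.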

Next steps

You can now play with this sample to see it in action. Open the Web console on both devices and look at the logged output. Although you don't see it in the code shown above, the code on the server (and on GitHub) has a lot of console output so you can see the signaling and connection processes at work.
