What is WebRTC and introduction to how it works!

Hey 👋🏻, today we are going to get a glimpse of a powerful video and audio (🎥 📞) calling technology that powers Google Meet, Facebook Messenger and Discord.

  • WebRTC stands for Web Real-Time Communication.
  • It is a web standard (published by the W3C and the IETF) that enables real-time communication between two peers. A peer can be a browser or a mobile app.
  • It supports video, voice and generic data transfer between peers.
  • It is supported by all modern browsers as well as native clients.
  • It is open source and backed by companies such as Apple, Google, Microsoft and Mozilla.
  • It provides a high-level JavaScript API that makes it easy to build applications for the browser.
  • It is not limited to browsers: SDKs are available that let it be used natively on mobile devices.

In the peer-to-peer mechanism, a connection has to be established between two peers (devices). For the peers to locate each other on the network and agree on how to connect, they go through a discovery and negotiation mechanism called signalling.

In signalling, the two peers locate each other by connecting to a mutually agreed third device (usually a signalling server). The only objective of this third-party server is to help the two peers locate each other in a secure manner. The data required to set up a connection includes:

  • Messages that signal the opening or closing of a connection.
  • Metadata about the media, e.g. codecs, bandwidth and protocols.
  • Error messages.
  • Network data such as IP addresses.
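
To make the signalling idea concrete, here is a minimal in-memory sketch of what such a relay does. WebRTC itself does not define how signalling messages are transported, so everything here is an illustrative assumption: the `SignalingRelay` class, the room name, the peer ids and the message `type` values are made up, and a real deployment would typically carry these messages over something like WebSockets.

```javascript
// Hypothetical in-memory relay: forwards each message to every other peer
// that joined the same room. It never inspects or stores the media itself.
class SignalingRelay {
  constructor() {
    this.rooms = new Map(); // room name -> Map(peerId -> onMessage callback)
  }

  join(room, peerId, onMessage) {
    if (!this.rooms.has(room)) this.rooms.set(room, new Map());
    this.rooms.get(room).set(peerId, onMessage);
  }

  leave(room, peerId) {
    const peers = this.rooms.get(room);
    if (!peers) return;
    peers.delete(peerId);
    if (peers.size === 0) this.rooms.delete(room);
  }

  // Deliver `message` to everyone in the room except the sender.
  // Returns how many peers received it.
  send(room, fromId, message) {
    const peers = this.rooms.get(room);
    if (!peers) return 0;
    let delivered = 0;
    for (const [peerId, onMessage] of peers) {
      if (peerId === fromId) continue;
      onMessage({ from: fromId, ...message });
      delivered += 1;
    }
    return delivered;
  }
}

// Usage: two peers meet in a room and exchange connection messages.
const relay = new SignalingRelay();
const inboxA = [];
const inboxB = [];
relay.join('demo-room', 'alice', (msg) => inboxA.push(msg));
relay.join('demo-room', 'bob', (msg) => inboxB.push(msg));
relay.send('demo-room', 'alice', { type: 'offer', sdp: '<session description>' });
relay.send('demo-room', 'bob', { type: 'bye' });
console.log(inboxB[0].type, inboxA[0].type); // offer bye
```

The relay only passes messages along; once the peers have what they need, the media flows between them directly (or through a relay server, as described later).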

In the real world, each device sits behind firewalls, ISPs, routers, access points and various other network barriers. To obtain the public IP address of a machine, we need a way to get past NAT (Network Address Translation).

NAT.jpg

When we get a connection from an ISP, the router in our home is assigned a public IP address by the ISP, and this address is reachable from the internet.

When a device connects to the router inside the home network, the router assigns that device a private IP address.

According to the standards set forth in the Internet Engineering Task Force (IETF) document RFC 1918, the following IPv4 address ranges are reserved for private networks and are not publicly routable on the global internet:

  • 10.0.0.0/8 IP addresses: 10.0.0.0 – 10.255.255.255
  • 172.16.0.0/12 IP addresses: 172.16.0.0 – 172.31.255.255
  • 192.168.0.0/16 IP addresses: 192.168.0.0 – 192.168.255.255
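
As a quick illustration of these ranges, the sketch below checks whether an IPv4 address falls inside one of the three RFC 1918 blocks. The function names `ipv4ToInt` and `isPrivateIPv4` are made up for this example; the block list comes straight from the ranges above.

```javascript
// Convert a dotted-quad IPv4 string to a number; returns null if malformed.
function ipv4ToInt(address) {
  const parts = address.split('.');
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// RFC 1918 private blocks as [network address, prefix length].
const PRIVATE_BLOCKS = [
  ['10.0.0.0', 8],
  ['172.16.0.0', 12],
  ['192.168.0.0', 16],
];

function isPrivateIPv4(address) {
  const ip = ipv4ToInt(address);
  if (ip === null) return false;
  return PRIVATE_BLOCKS.some(([network, prefix]) => {
    const blockSize = 2 ** (32 - prefix); // number of addresses in the block
    const start = ipv4ToInt(network);
    return ip >= start && ip < start + blockSize;
  });
}

console.log(isPrivateIPv4('192.168.1.2'));   // true
console.log(isPrivateIPv4('172.32.0.1'));    // false
console.log(isPrivateIPv4('46.206.39.125')); // false
```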

Now, for example, assume Device 1 wants to connect to google.com and Device 2 to youtube.com. Device 1 sends a packet with the source IP and port 192.168.1.2:5692 to the router. The router replaces the source IP with the public IP provided by the ISP (46.206.39.125 in our case) and sends the modified packet over the internet. Before modifying the packet, it saves the relevant mapping in a table, as shown below.

NATTable.png

So when a packet comes in, the router looks up this table and routes the packet to the intended device.
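
The lookup described above can be sketched as a small simulation. This is a deliberately simplified model, not how any particular router is implemented: the `NatRouter` class is hypothetical, the external port numbers starting at 40000 are an arbitrary assumption, and the server address 203.0.113.10 is a documentation-range placeholder rather than a real google.com address. Real NATs also track the protocol and usually expire idle mappings.

```javascript
// Hypothetical port-translating NAT: remembers which private ip:port
// sits behind each external port it hands out.
class NatRouter {
  constructor(publicIp, firstExternalPort = 40000) {
    this.publicIp = publicIp;
    this.nextPort = firstExternalPort;
    this.byInternal = new Map(); // "privateIp:port" -> external port
    this.byExternal = new Map(); // external port -> { ip, port }
  }

  // Outbound: rewrite the source to the public IP and record the mapping.
  translateOutbound(packet) {
    const key = `${packet.srcIp}:${packet.srcPort}`;
    let externalPort = this.byInternal.get(key);
    if (externalPort === undefined) {
      externalPort = this.nextPort++;
      this.byInternal.set(key, externalPort);
      this.byExternal.set(externalPort, { ip: packet.srcIp, port: packet.srcPort });
    }
    return { ...packet, srcIp: this.publicIp, srcPort: externalPort };
  }

  // Inbound: look up the destination port; null means "no mapping, drop it".
  translateInbound(packet) {
    const target = this.byExternal.get(packet.dstPort);
    if (!target || packet.dstIp !== this.publicIp) return null;
    return { ...packet, dstIp: target.ip, dstPort: target.port };
  }
}

const router = new NatRouter('46.206.39.125');
const outbound = router.translateOutbound({
  srcIp: '192.168.1.2', srcPort: 5692, dstIp: '203.0.113.10', dstPort: 443,
});
const reply = router.translateInbound({
  srcIp: '203.0.113.10', srcPort: 443, dstIp: outbound.srcIp, dstPort: outbound.srcPort,
});
console.log(outbound.srcIp, outbound.srcPort); // 46.206.39.125 40000
console.log(reply.dstIp, reply.dstPort);       // 192.168.1.2 5692
```

Note that an inbound packet for a port with no table entry is dropped, which is exactly why a peer outside the network cannot simply open a connection to a device behind the NAT.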

Note:

  • The IP addresses of google.com and youtube.com are resolved by the browser through public DNS servers.
  • NAT was introduced to tackle the limited number of available IPv4 addresses (max: approx. 4.3 billion).

Now that we understand the basics of NAT, let's move to the next step, i.e. finding the public IP address. To solve this, WebRTC uses a protocol called ICE (Interactive Connectivity Establishment).

ICE is a protocol for Network Address Translator (NAT) traversal for UDP-based multimedia sessions established with the offer/answer model. It finds an efficient path for the peers to connect. ICE makes use of the Session Traversal Utilities for NAT (STUN) protocol and its extension, Traversal Using Relays around NAT (TURN).

STUN.png

ICE first tries to create a connection using the host address obtained from the device's OS and network card. If this is not possible (for a device behind a NAT, the host address is a private IP that cannot be reached from the public internet, so this attempt fails), a STUN server is used to discover the device's public IP address. If STUN also fails, for example because of deeply nested NATs or strict firewalls, ICE connects through a TURN server, which relays the traffic between the peers.
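
In the browser API, the STUN and TURN servers are passed in through the `iceServers` option of `RTCPeerConnection`, and each address ICE discovers shows up as a "candidate" whose `typ` field says where it came from. The sketch below shows both, under some loud assumptions: the STUN URL is a commonly used public server, the TURN URL and credentials are placeholders, and the three candidate lines are hand-written samples, not real output.

```javascript
// Example configuration. The TURN entry is a placeholder, not a real server.
const rtcConfig = {
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    { urls: 'turn:turn.example.org:3478', username: 'demo-user', credential: 'demo-pass' },
  ],
};
// In a browser: const pc = new RTCPeerConnection(rtcConfig);

// Pull the "typ" field out of an ICE candidate line.
function candidateType(candidateLine) {
  const match = /\styp\s(\S+)/.exec(candidateLine);
  return match ? match[1] : null;
}

// How each candidate type relates to the steps described above.
const TYPE_MEANING = {
  host: 'address read from the local OS / network card',
  srflx: 'public address discovered via STUN',
  relay: 'address allocated on a TURN server',
};

// Hand-written sample candidates (made up for illustration).
const samples = [
  'candidate:1 1 udp 2122260223 192.168.1.2 5692 typ host',
  'candidate:2 1 udp 1686052607 46.206.39.125 40000 typ srflx raddr 192.168.1.2 rport 5692',
  'candidate:3 1 udp 41885439 203.0.113.20 50000 typ relay raddr 46.206.39.125 rport 40000',
];

const types = samples.map(candidateType);
console.log(types.join(', ')); // host, srflx, relay
for (const type of types) {
  console.log(`${type}: ${TYPE_MEANING[type]}`);
}
```

In practice ICE gathers all of these candidates and checks them in priority order, so the relay path acts as the fallback when the more direct ones do not work.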

TURN.png

WebRTC primarily uses three JavaScript APIs:

  1. Media Devices & Media Stream
  2. RTC Peer Connection
  3. RTC Data Channel

Media Devices and Stream

Cameras, speakers and microphones are called media devices. The browser's permission prompt for accessing them is triggered by getUserMedia(MediaStreamConstraints), based on the constraints provided. The simplest constraints request both kinds of media: { video: true, audio: true }.
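
A minimal sketch of that request, including what happens when the user declines the prompt, might look like this. The helper name `openUserMedia` and its return shape are assumptions made for this example; `getUserMedia` and the `NotAllowedError` / `NotFoundError` error names are part of the browser API. The `mediaDevices` object is passed in as a parameter only so the function can be tried outside a browser too.

```javascript
// Ask for a media stream and report failures instead of throwing.
async function openUserMedia(mediaDevices, constraints = { video: true, audio: true }) {
  try {
    const stream = await mediaDevices.getUserMedia(constraints);
    return { ok: true, stream };
  } catch (error) {
    // NotAllowedError: the user (or a browser policy) denied permission.
    // NotFoundError: no device matches the requested constraints.
    return { ok: false, reason: error.name };
  }
}

// In a browser page:
// const result = await openUserMedia(navigator.mediaDevices, {
//   video: { width: 1280, height: 720 }, // constraints can be more specific than `true`
//   audio: true,
// });
// if (!result.ok) console.warn('Could not open media:', result.reason);
```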

mediapermissions.png

Let us look at a simple stream example with simple code.

Loopback UI.png

In the above image, we can see that my camera is being streamed to my browser. Let's have a look at the code below.

LoopBack.png

  • I require both video and audio, hence I have put them both in constraints.
  • The stream obtained from mediaDevices.getUserMedia(constraints) will contain respective video and audio tracks.
  • Optionally, if you want the list of available media devices, you can use mediaDevices.enumerateDevices(). It returns the devices that are available to the browser.

mediaDevicesList.png

  • Based on the kind key, we can differentiate and filter the devices.
  • Now, the above stream obtained is passed as the source to the video tag of the page with the help of a query selector.
  • That's it, a simple video + audio capture in the browser frame is ready.
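
Since the code above is shown as a screenshot, here is a rough text sketch of the same steps. It is not the exact code from the image: the function names `groupDevicesByKind` and `startLoopback` and the `#localVideo` selector are assumptions, while `getUserMedia`, `enumerateDevices`, the `kind` values and `srcObject` are the browser API.

```javascript
// Group MediaDeviceInfo-like objects by their `kind`.
function groupDevicesByKind(devices) {
  const groups = { audioinput: [], audiooutput: [], videoinput: [] };
  for (const device of devices) {
    if (groups[device.kind]) {
      // Labels can be empty until permission is granted, so fall back to the id.
      groups[device.kind].push(device.label || device.deviceId);
    }
  }
  return groups;
}

// Capture camera + microphone, show the stream in a <video> element,
// and return the available devices grouped by kind.
async function startLoopback(mediaDevices, videoElement) {
  const stream = await mediaDevices.getUserMedia({ video: true, audio: true });
  videoElement.srcObject = stream; // the stream becomes the source of the video tag
  const devices = await mediaDevices.enumerateDevices();
  return groupDevicesByKind(devices);
}

// In a browser page, assuming <video id="localVideo" autoplay playsinline muted></video>:
// startLoopback(navigator.mediaDevices, document.querySelector('#localVideo'))
//   .then((groups) => console.log(groups));
```

Muting the local video element is a common choice for a loopback like this, since playing your own microphone back through the speakers tends to cause echo.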

Once the required tracks are ready, the caller creates an RTCPeerConnection and creates an offer using the tracks obtained from user media. This offer is saved locally and sent to the recipient via the signalling server. The candidate addresses gathered by ICE are exchanged over the same signalling channel.

The receiver receives the offer and saves it as the remote peer's description, then captures its own local media stream and adds it to its connection. The receiver then creates an answer to the offer and sends it back to the caller via the signalling server. The caller saves the answer as the remote peer's description. At this point, both peers know about each other, and the media is transferred as per the configuration.
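
The exchange described above can be sketched with three small functions. The function names and the `sendSignal` callback are assumptions standing in for whatever signalling channel is used; the methods called on `pc` (`addTrack`, `createOffer`, `createAnswer`, `setLocalDescription`, `setRemoteDescription`) are the `RTCPeerConnection` API. ICE candidate exchange and error handling are left out to keep the sketch short.

```javascript
// Caller: attach local tracks, create the offer, save it locally, send it.
async function startCall(pc, localStream, sendSignal) {
  for (const track of localStream.getTracks()) pc.addTrack(track, localStream);
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendSignal({ type: 'offer', sdp: offer.sdp });
}

// Receiver: save the offer as the remote description, attach local tracks,
// create the answer, save it locally, send it back.
async function answerCall(pc, localStream, offer, sendSignal) {
  await pc.setRemoteDescription(offer);
  for (const track of localStream.getTracks()) pc.addTrack(track, localStream);
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  sendSignal({ type: 'answer', sdp: answer.sdp });
}

// Caller again: save the answer as the remote description.
async function acceptAnswer(pc, answer) {
  await pc.setRemoteDescription(answer);
}

// In a browser, each side would create its own connection first, e.g.:
// const pc = new RTCPeerConnection({ iceServers: [/* STUN / TURN servers */] });
// pc.ontrack = (event) => { remoteVideo.srcObject = event.streams[0]; };
```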

The whole process described above is called the offer/answer model, and the information is exchanged in SDP (Session Description Protocol) format.
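
To give a feel for what SDP looks like, the sketch below parses a heavily trimmed, hand-written session description and lists its media sections. The sample text and the `summarizeSdp` helper are made up for illustration; a real offer produced by the browser contains many more lines (ICE credentials, fingerprints, many codecs).

```javascript
// A trimmed, hand-written session description (not real browser output).
const sampleSdp = [
  'v=0',
  'o=- 4611731400430051336 2 IN IP4 127.0.0.1',
  's=-',
  't=0 0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=rtpmap:111 opus/48000/2',
  'm=video 9 UDP/TLS/RTP/SAVPF 96',
  'a=rtpmap:96 VP8/90000',
].join('\r\n') + '\r\n';

// List each media section ("m=" line) with the codecs named in its
// "a=rtpmap:<payload type> <codec>/<clock rate>" lines.
function summarizeSdp(sdp) {
  const sections = [];
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith('m=')) {
      const [kind, port, protocol] = line.slice(2).split(' ');
      sections.push({ kind, port: Number(port), protocol, codecs: [] });
    } else if (line.startsWith('a=rtpmap:') && sections.length > 0) {
      const codec = line.split(' ')[1].split('/')[0];
      sections[sections.length - 1].codecs.push(codec);
    }
  }
  return sections;
}

for (const section of summarizeSdp(sampleSdp)) {
  console.log(`${section.kind}: ${section.codecs.join(', ')} over ${section.protocol}`);
}
// audio: opus over UDP/TLS/RTP/SAVPF
// video: VP8 over UDP/TLS/RTP/SAVPF
```

This is the "metadata about the media" mentioned in the signalling section: each peer describes which kinds of media and which codecs it can handle, and the other side answers with its own description.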

A more detailed explanation is provided here.

Finally, that's the introduction to the WebRTC standard. I hope you will love 😍 it and find it insightful 😄.

References & Useful resources 🔗

https://webrtc.org/getting-started/media-devices

https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API

https://www.html5rocks.com/en/tutorials/webrtc/basics/

https://hpbn.co/webrtc/

https://tools.ietf.org/html/rfc5245