Customers often ask whether our server can do “mp4 streaming in HTML5”. In 99% of cases the person asking doesn’t quite know what they are asking for. It’s hard to blame them: with the terminology confusion, the technical complexity, and the sheer variety of streaming options, it’s very easy to get lost.
In this article we’ll explain what HTML5 streaming actually is, which options are good, and why the hell you can’t say “mp4 streaming”.
▍Terms
HTML5 video is when you insert a <video> tag into a web page and point its src at something. HTML5 streaming is the same HTML5 video, except that src is not a finished file but a constantly updated video stream. A video on YouTube is HTML5 video; a live broadcast is HTML5 streaming.
The <video> tag doesn’t care how the video stream is generated and delivered, or whether the browser can play it at all. The only thing that matters is that src points to some kind of video stream. Strictly speaking, the specification says nothing about which protocols, transports, and codecs HTML5 video must support.
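For example, in plain HTML it looks something like this (both URLs here are made-up placeholders):

```html
<!-- HTML5 video: src points at a finished file. -->
<video src="/media/movie.mp4" controls></video>

<!-- HTML5 streaming: the same tag, but src points at a constantly updated
     stream (an HLS playlist in this illustrative example). Whether the
     browser can actually play it is a separate question. -->
<video src="/live/channel.m3u8" controls></video>
```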
A protocol is the way two parties in a video exchange (almost always a client and a server) talk to each other in order to transfer data. The client is the one who connects to the server and initiates the session. The video stream can flow from the server to the client (that’s ordinary playback) or from the client to the server (that’s publishing). Even when a huge server cabinet that eats as much electricity as an apartment building connects to a tiny IP camera, the camera is the server and the cabinet is the client.
A protocol usually defines at least a Play command (start playback), and sometimes extended operations as well: pause, resume, publish, seek, and so on.
Examples of protocols: RTSP, RTMP, HTTP, HLS, IGMP.
A transport, also called a transport container or simply a container, is how compressed video is packed into bytes for transmission from one party to the other (over some protocol).
Examples of containers: MPEG-TS, RTMP, RTP.
Note that RTMP appears both among the protocols and among the transports. That’s because the RTMP specification describes both what the parties have to send each other to get the video going (i.e. the protocol) and how to package the video (i.e. the transport). This is not always the case: in the RTSP protocol, for example, the video is packed into the RTP transport.
Codec is an overloaded term. Here it means a way of compressing raw video. The difference between a codec and a transport is that a codec is about preparing the video, while a transport is about carrying it over a protocol. Video compressed with one codec can be sent over different protocols and in different transports. Most streaming servers never dig deeper than the encoded video and operate only on protocols and transports.
Examples of codecs: h264, aac, mp3.
Because the term is overloaded, the names get confusing. For example, H.264 is a standard describing how to squeeze a stream of huge raw video frames into very few bytes, libx264 is a library that compresses video according to that standard, and there is also Windows software of the same name that can decode h264 and play it on screen.
So: the HTML5 specification does not describe protocols, transports, or codecs. Browser authors therefore choose for themselves what to support, and by “HTML5 streaming” different people mean different things.
Still, there are combinations supported by a significant share of browsers. Let’s look at the most promising ones.
▍HLS
HLS is h264 video and aac or mp3 audio packed into the MPEG-TS transport. The stream is split into segments described in m3u8 playlists and delivered over HTTP. HLS supports multibitrate streams and both Live and VOD. The approach is very simple, but it has a lot of small details, which is why it behaves differently on different devices.
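For illustration, a live media playlist looks roughly like this (segment names, sequence numbers, and durations are invented):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:5
#EXT-X-MEDIA-SEQUENCE:264
#EXTINF:4.800,
segment264.ts
#EXTINF:5.000,
segment265.ts
#EXTINF:4.960,
segment266.ts
```

For a live stream the server keeps rewriting this playlist, adding new segments at the bottom and dropping old ones from the top; the player polls it over plain HTTP.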
HLS was developed at Apple, so at first it worked only in Safari on iOS and macOS. Even Safari on Windows (back when a Windows version still existed) couldn’t play HLS.
Nevertheless, HLS can now be played by virtually all set-top boxes and almost all Android devices.
But it’s not all smooth. Third-party player vendors ignored Apple’s standard when it comes to announcing multiple audio tracks, and added support for playing everything that can occur in plain MPEG-TS: mpeg2 video, mpeg2 audio, and so on. Because of this, you end up having to serve different playlist formats to different players.
▍MPEG-DASH
MPEG-DASH is usually h264/h265 video with aac audio packed into the mp4 transport, or vp8/vp9 packed into WebM, although the standard is not tied to specific codecs, protocols, or transports. As in HLS, the stream can be split into segments, though this is optional. Instead of playlists there is an MPD manifest in XML.
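For a rough idea, a heavily trimmed live manifest might look like this; every attribute value below is illustrative rather than taken from a real stream:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     type="dynamic" availabilityStartTime="1970-01-01T00:00:00Z"
     minimumUpdatePeriod="PT5S" minBufferTime="PT2S"
     profiles="urn:mpeg:dash:profile:isoff-live:2011">
  <Period id="1" start="PT0S">
    <AdaptationSet mimeType="video/mp4" segmentAlignment="true">
      <!-- 5-second segments, addressed by template instead of a playlist -->
      <SegmentTemplate timescale="90000" duration="450000"
                       initialization="init-$RepresentationID$.mp4"
                       media="chunk-$RepresentationID$-$Number$.m4s"/>
      <Representation id="video-hi" codecs="avc1.64001f"
                      width="1280" height="720" bandwidth="2000000"/>
      <Representation id="video-lo" codecs="avc1.42c01e"
                      width="640" height="360" bandwidth="600000"/>
    </AdaptationSet>
  </Period>
</MPD>
```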
MPEG-DASH is much like HLS. Perhaps it is even more popular, because such giants as YouTube and Netflix have been using it as the main way to distribute content for several years.
MPEG-DASH is good because in most browsers it works natively via MSE (more on what that is just below). It doesn’t even have a Flash implementation: this is honest, uncompromising HTML5.
MPEG-DASH is definitely the real HTML5 streaming, and the future belongs to it.
▍MSE
When it became clear that Flash would die anyway (after hundreds of premature burials), the question of what would replace it became pressing. It would be nice for browsers to gain video playback close to Flash in quality and convenience (because, say what you will, Flash does it well).
Flash long ago gained a very convenient mechanism for universal playback of all sorts of streams: appendBytes. The idea is that user code downloads the compressed video frames itself, in whatever way it likes, packs them into a prescribed container (flv in the case of Flash), and pushes them into the video player. In other words, the protocol and the transport are implemented in user code running in the browser.
MSE (Media Source Extensions) is an extension to the HTML5 specification that lets you do the same thing appendBytes does in Flash. Unfortunately, MSE is much harder both to understand and to implement.
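A minimal sketch of the idea, assuming a fragmented-mp4 stream whose segment URLs and codec string are placeholders:

```javascript
// Minimal MSE sketch: fetch fragmented-mp4 segments ourselves and feed them
// to the <video> element. URLs and the codec string are illustrative placeholders.
const video = document.querySelector('video');
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');

  // Init segment first, then media segments: the "protocol and transport"
  // logic lives entirely in our JavaScript, just like with appendBytes.
  for (const url of ['init.mp4', 'segment1.m4s', 'segment2.m4s']) {
    const data = await fetch(url).then(r => r.arrayBuffer());
    await new Promise(resolve => {
      sourceBuffer.addEventListener('updateend', resolve, { once: true });
      sourceBuffer.appendBuffer(data);
    });
  }
  mediaSource.endOfStream();
});
```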
MPEG-DASH, built on top of it, is even trickier, so working with the pair is a sheer pleasure: tons of XML, parsing binary containers in JavaScript, segmentation questions nobody thought through at the design stage — just what we love, everything you need for a single, bug-free implementation in every browser.
Interestingly, MSE works not only with MPEG-DASH but also with HLS. There is hls.js, which downloads HLS playlists and MPEG-TS segments, repacks them into the format MSE needs, and plays them through MSE. Apple has even taken a step towards compatibility with MPEG-DASH by allowing mp4 containers in HLS.
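Wiring up hls.js looks roughly like this (the playlist URL is a placeholder):

```javascript
// Sketch of playing HLS through MSE with hls.js; the playlist URL is a placeholder.
const video = document.querySelector('video');
const url = 'https://example.com/live/stream.m3u8';

if (Hls.isSupported()) {
  // Browsers with MSE: hls.js fetches the playlist and segments,
  // repacks them, and appends them to the video element.
  const hls = new Hls();
  hls.loadSource(url);
  hls.attachMedia(video);
} else if (video.canPlayType('application/vnd.apple.mpegurl')) {
  // Safari plays HLS natively, no MSE needed.
  video.src = url;
}
```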
By the end of 2017 Flash will most likely be completely dead, and today you can safely start a new project with MPEG-DASH.
▍WebRTC
Flash made a decent attempt to cover both real-time communication and mass broadcasting with a single technology. Unfortunately, that didn’t happen in HTML5: we have MSE for watching broadcasts and WebRTC for video calls.
WebRTC is SIP in a browser: a way to organize an audio and video channel and a data channel between two browsers through a server.
The technology is not meant for streaming, but in principle it can do it, so it would be wrong to leave it out. WebRTC is also considered HTML5, because it seems to require nothing but JavaScript in the browser. However, it needs fresh versions of the popular browsers, and it isn’t compatible with Edge at all.
Extra confusion around WebRTC comes from its use in torrent-like TV delivery. The idea is that browsers use WebRTC to build a mesh of data channels, HLS or MSE segments of the video are distributed over that mesh, and playback happens via Flash or MSE. So WebRTC handles delivery and MSE handles playback; it is important not to confuse this with using WebRTC itself to play video.
▍So what about mp4 streaming?
Any modern browser can most likely request, over HTTP, a file packed into the mp4 transport containing video compressed with h264 and aac audio — and even try to play it. This is the most convenient, understandable, and standard way of playing files: the file just sits on disk and nginx serves it. The mp4 playback code in browsers is quite good; for example, it can download pieces of the video on demand (unlike a Flash player, which downloads the whole file).
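A minimal nginx sketch of that setup (paths are made up; the mp4 directive from ngx_http_mp4_module is optional and only adds pseudo-streaming, i.e. seeking by time offset):

```nginx
# Minimal sketch: plain progressive download of mp4 files over HTTP.
# Paths are illustrative; the "mp4" directive (ngx_http_mp4_module) is optional
# and only enables seeking via ?start=N query arguments.
server {
    listen 80;

    location /videos/ {
        root /var/www;
        mp4;
    }
}
```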
There has been a lot of noise around h264 being “closed” and “non-free”, so there is an “open” alternative pushed by Google: the vp8 and vp9 video codecs packed into the WebM transport. WebM is a subset of the mkv transport (a.k.a. Matroska), which is very similar to mp4 in essence but differs from it at the binary level.
This is where the phenomenon called “mp4 streaming”, which is structured much like WebM, comes from. The thing is, a regular mp4 states the size of the whole container right at the beginning, so if we try to broadcast live over plain mp4, we will fail. To get around this and produce an mp4 without a fixed end, the following trick was invented: first an mp4 header is written without any frames, and then fragments containing a few seconds of frames each are appended to the end. This is called fragmented mp4, or “mp4 streaming”.
In reality this is not streaming but a crutch that creates the appearance of it. Mp4 is a great format for downloading videos, but it is not suited to streaming, so you can simply forget about it and never use the term “mp4 streaming”.
▍ Conclusions
- Good HTML5 streaming options: MPEG-DASH and HLS. They work on mobile devices, PCs, and set-top boxes alike.
- Flash will die anyway, and MSE is already taking its place.
- WebRTC is an HTML5 technology, but it is primarily for communication, not for TV broadcasting.
- Don’t drag old codecs onto the web or try to deliver mpeg2 video and mpeg2 audio over HLS, even if your player can handle them.
- Never say “mp4 streaming”. You’re welcome.