This article tells the old story of how I implemented audio track switching for a Flash player using the Wowza Media Server 2 RTMP server.
Back in 2011, I was researching the capabilities of streaming servers for Adobe Flash Player. My task was to find a way to play video files with several audio tracks, and the switch between tracks had to happen without any jumps in the video being played. Searching the Internet for ready-made solutions turned up nothing at the time. Moreover, it turned out that Adobe Flash Player itself cannot switch tracks and uses only the first one.
An advertisement for Adobe Flash Media Server came to the rescue. Among the examples shipped with this server was a player that supported adaptive streaming. It could seamlessly switch the video stream from one bitrate to another and back. After a little digging, I found out the following:
- the video must be pre-encoded at several bitrates;
- data is transferred over RTMP;
- quality switching is performed by the Flash application using the NetStream.play2 function.
I tried the same trick on files that contained the same video but different audio tracks. The experiment succeeded: when I switched streams I heard different audio tracks, and the transition from one video file to another was visually imperceptible. But it was too early to celebrate, because N audio tracks would require storing N copies of the video, which is too expensive.
After analyzing the data that the server sends to the Flash player over RTMP, I found that audio and video travel in separate packets, and that the extra audio tracks are not transmitted at all. In other words, the RTMP server itself selects the necessary tracks from the container (demuxing). This encouraged me, and I began to study RTMP servers that support adaptive streaming in more detail. One of them was Wowza Media Server version 2.
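The separation of audio and video can be illustrated with the message type IDs defined by the RTMP specification: 8 for audio, 9 for video and 18 for AMF0 data messages. The following is only a minimal sketch; the class name, method names and the sample sequence of type IDs are made up for illustration and are not taken from any server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not Wowza or Flash Media Server code.
class RtmpMessageKinds {
    // Message type IDs as defined by the RTMP specification.
    static final int TYPE_AUDIO = 8;
    static final int TYPE_VIDEO = 9;
    static final int TYPE_DATA_AMF0 = 18;

    // Maps a message type ID to a readable kind.
    static String kindOf(int messageTypeId) {
        switch (messageTypeId) {
            case TYPE_AUDIO: return "audio";
            case TYPE_VIDEO: return "video";
            case TYPE_DATA_AMF0: return "data";
            default: return "other";
        }
    }

    // Counts how many messages of each kind appear in a captured sequence.
    static Map<String, Integer> countByKind(int[] messageTypeIds) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int id : messageTypeIds) {
            counts.merge(kindOf(id), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented sample: one metadata message, then interleaved video and audio.
        int[] observed = {18, 9, 8, 9, 8, 9};
        System.out.println(countByKind(observed)); // {data=1, video=3, audio=2}
    }
}
```

Because each message carries exactly one kind of payload, a server can drop or substitute the audio messages without touching the video ones.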
A distinctive feature of Wowza Media Server is that it lets you create your own classes for playing arbitrary media files: all you need to do is implement the IMediaReader interface and declare your class in the server configuration. But instead of writing my own mp4 container decoder, I began to reverse-engineer the server classes.
Having decompiled the MediaReaderH264 and QTMediaContainer classes from the wms-mediareader-h264.jar file, I noticed the following:
First, MediaReaderH264 evidently has access to the moov atom. Second, since the reference to the container is a protected field, it can be accessed by inheriting from this class.
What is a moov atom? According to the mp4 container specification, the moov atom holds all the information about the frame rate, duration, frame layout, decoder configuration and so on. It also contains a set of trak atoms that describe the audio and video tracks, and this is exactly what we need.
Decompiling the QTAtommoov class reveals the following:
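To make the structure concrete, here is a minimal sketch of walking the atoms (boxes) of an mp4 file. It relies only on the basic layout from the container specification: each box starts with a 4-byte big-endian size followed by a 4-byte type. The class and helper names are hypothetical and unrelated to Wowza's QTAtom classes, 64-bit and open-ended box sizes are deliberately not handled, and the sample "file" is a synthetic moov box with empty header boxes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; unrelated to Wowza's own QTAtom classes.
class Mp4BoxWalker {
    static final class Box {
        final String type;
        final ByteBuffer payload;
        Box(String type, ByteBuffer payload) { this.type = type; this.payload = payload; }
    }

    // Lists the boxes found directly inside buf (32-bit sizes only).
    static List<Box> children(ByteBuffer buf) {
        List<Box> boxes = new ArrayList<>();
        ByteBuffer in = buf.duplicate(); // independent position, big-endian
        while (in.remaining() >= 8) {
            long size = in.getInt() & 0xFFFFFFFFL;
            byte[] name = new byte[4];
            in.get(name);
            // Stop on 64-bit, open-ended or truncated boxes: not handled here.
            if (size < 8 || size - 8 > in.remaining()) break;
            ByteBuffer payload = in.slice();
            payload.limit((int) (size - 8));
            boxes.add(new Box(new String(name, StandardCharsets.ISO_8859_1), payload));
            in.position(in.position() + (int) (size - 8));
        }
        return boxes;
    }

    // Returns the payload of the first child of the given type, or null.
    static ByteBuffer child(ByteBuffer parent, String type) {
        if (parent == null) return null;
        for (Box b : children(parent)) {
            if (b.type.equals(type)) return b.payload;
        }
        return null;
    }

    // For every trak in moov, reports its media header type (vmhd, smhd, ...).
    static List<String> trackKinds(ByteBuffer file) {
        List<String> kinds = new ArrayList<>();
        ByteBuffer moov = child(file, "moov");
        if (moov == null) return kinds;
        for (Box trak : children(moov)) {
            if (!trak.type.equals("trak")) continue;
            ByteBuffer minf = child(child(trak.payload, "mdia"), "minf");
            String kind = "unknown";
            if (minf != null) {
                for (Box b : children(minf)) {
                    if (b.type.endsWith("mhd")) { kind = b.type; break; }
                }
            }
            kinds.add(kind);
        }
        return kinds;
    }

    // Test helper: builds a box of the given type around the given parts.
    static byte[] box(String type, byte[]... parts) {
        int size = 8;
        for (byte[] p : parts) size += p.length;
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putInt(size).put(type.getBytes(StandardCharsets.ISO_8859_1));
        for (byte[] p : parts) out.put(p);
        return out.array();
    }

    public static void main(String[] args) {
        byte[] video = box("trak", box("mdia", box("minf", box("vmhd"))));
        byte[] audio1 = box("trak", box("mdia", box("minf", box("smhd"))));
        byte[] audio2 = box("trak", box("mdia", box("minf", box("smhd"))));
        byte[] file = box("moov", video, audio1, audio2);
        System.out.println(trackKinds(ByteBuffer.wrap(file))); // [vmhd, smhd, smhd]
    }
}
```

A file with one video track and two audio tracks therefore shows up as one trak with a vmhd header and two with smhd headers, which is the information needed to pick a particular audio track.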
    public class QTAtommoov extends QTAtom {
        public QTAtomtrak getTrackByMinf(String s) {
            QTAtomtrak qtatomtrak = null;
            Iterator iterator = traks.iterator();
            do {
                if(!iterator.hasNext())
                    break;
                QTAtomtrak qtatomtrak1 = (QTAtomtrak)iterator.next();
                if(qtatomtrak1 == null || !qtatomtrak1.getMinfType().equals(s))
                    continue;
                qtatomtrak = qtatomtrak1;
                break;
            } while(true);
            return qtatomtrak;
        }

        public QTAtomtrak getAudioTrack() {
            QTAtomtrak qtatomtrak = getTrackByMinf("smhd");
            try {
                QTAtomstbl qtatomstbl = qtatomtrak != null ? qtatomtrak.getMdiaAtom().getMinfAtom().getStblAtom() : null;
                if(!qtatomstbl.isValidAudioFormat())
                    qtatomtrak = null;
            } catch(Exception exception) { }
            return qtatomtrak;
        }
        . . .
    }
When asked for an audio track, the server walks through all the trak atoms and picks the first one whose media header type is smhd (sound media header). In other words, the very first audio track is always selected.
To test my guess, I decided to patch the code of the Wowza Media Server library. At first I planned to tweak the decompiled code of the QTAtommoov class, recompile it and simply replace the file in the jar archive. But, to my surprise, everything turned out to be much simpler. In the source tree of my server application, I created the com.wowza.wms.mediareader.h264.atom package and put a QTAtommoov.java file there with the following content:
    public class QTAtommoov extends QTAtom {
        public int aTrackNum = 2;
        . . .
        public QTAtomtrak getTrackByMinf(String s, int count) {
            QTAtomtrak qtatomtrak = null;
            Iterator iterator = traks.iterator();
            do {
                if(!iterator.hasNext())
                    break;
                QTAtomtrak qtatomtrak1 = (QTAtomtrak)iterator.next();
                if(qtatomtrak1 == null || !qtatomtrak1.getMinfType().equals(s))
                    continue;
                if (--count <= 0) {
                    qtatomtrak = qtatomtrak1;
                    break;
                }
            } while(true);
            return qtatomtrak;
        }

        public QTAtomtrak getTrackByMinf(String s) {
            return getTrackByMinf(s, 1);
        }

        public QTAtomtrak getAudioTrack() {
            QTAtomtrak qtatomtrak = getTrackByMinf("smhd", aTrackNum);
            try {
                QTAtomstbl qtatomstbl = qtatomtrak != null ? qtatomtrak.getMdiaAtom().getMinfAtom().getStblAtom() : null;
                if(!qtatomstbl.isValidAudioFormat())
                    qtatomtrak = null;
            } catch(Exception exception) { }
            return qtatomtrak;
        }
    }
Thus, a small modification was made: the second audio track is now returned instead of the first.
Having compiled and deployed the server in this form, I was pleasantly surprised: my class was picked up in place of the one inside the jar library, and the Flash player played the second audio track of the file. I didn't even have to rebuild the jar library.
All that remained to complete the prototype of audio track switching was to implement the extended class MediaReaderH264ext and declare it in the server configuration.
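The countdown used in the patched getTrackByMinf can be tried in isolation. The sketch below reproduces only the selection logic on a plain list of header types; the class and method names are hypothetical, and track numbers are 1-based as in the patch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone version of the "n-th matching track" selection.
class NthMatch {
    // Returns the index of the count-th element equal to wanted, or -1 if
    // there are fewer than count matches. A count of 1 or less yields the
    // first match, mirroring the "--count <= 0" test in the patch.
    static int indexOfNth(List<String> kinds, String wanted, int count) {
        for (int i = 0; i < kinds.size(); i++) {
            if (wanted.equals(kinds.get(i)) && --count <= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> kinds = Arrays.asList("vmhd", "smhd", "smhd");
        System.out.println(indexOfNth(kinds, "smhd", 1)); // 1
        System.out.println(indexOfNth(kinds, "smhd", 2)); // 2
        System.out.println(indexOfNth(kinds, "smhd", 3)); // -1
    }
}
```

The last call shows a property worth keeping in mind: when the requested track number exceeds the number of audio tracks in the file, nothing is selected. In the patched class this corresponds to getTrackByMinf returning null, so a production version would probably want a fallback to the first track.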
    public class MediaReaderH264ext extends MediaReaderH264 implements IMediaReader {
        private String filename;
        private int aTrackNum;

        private void init(String basePath, String mediaName) {
            HashMap<String, String> params = new HashMap<String, String>();
            String[] query = mediaName.split(":", 2);
            if (query.length > 1) {
                String[] args = query[1].split("&");
                for (String arg : args) {
                    String[] keyvalue = arg.split("=", 2);
                    params.put(keyvalue[0], keyvalue.length > 1 ? keyvalue[1] : "");
                }
            }
            filename = query[0];
            aTrackNum = "rus".equals(params.get("lang")) ? 2 : 1;
            WMSLoggerFactory.getLogger(MediaReaderH264ext.class).info("filename: " + filename);
            WMSLoggerFactory.getLogger(MediaReaderH264ext.class).info("aTrackNum: " + aTrackNum);
        }

        @Override
        public void init(IApplicationInstance iapplicationinstance, IMediaStream imediastream, String ext, String basePath, String name) {
            WMSLoggerFactory.getLogger(MediaReaderH264ext.class).info("init: " + name);
            this.init(basePath, name);
            super.init(iapplicationinstance, imediastream, ext, basePath, filename);
        }

        @Override
        public void open(String basePath, String name) {
            WMSLoggerFactory.getLogger(MediaReaderH264ext.class).info("open: " + name);
            super.open(basePath, name);
            if (container != null && container.getMoovAtom() != null)
                container.getMoovAtom().aTrackNum = this.aTrackNum;
        }
    }
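The stream name convention used by this class can be checked without a running server. The sketch below extracts the same parsing into a hypothetical standalone helper with no Wowza dependencies. It assumes, as the code above implies, that the name reaching the reader looks like "video.mp4:lang=rus", with any media reader prefix already removed by the server.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone helper mirroring the parsing in MediaReaderH264ext.
class StreamNameParser {
    // Everything before the first ':' is treated as the file name.
    static String fileNameOf(String mediaName) {
        return mediaName.split(":", 2)[0];
    }

    // Everything after the first ':' is parsed as key=value pairs joined by '&'.
    static Map<String, String> paramsOf(String mediaName) {
        Map<String, String> params = new HashMap<>();
        String[] query = mediaName.split(":", 2);
        if (query.length > 1) {
            for (String arg : query[1].split("&")) {
                String[] keyValue = arg.split("=", 2);
                params.put(keyValue[0], keyValue.length > 1 ? keyValue[1] : "");
            }
        }
        return params;
    }

    // lang=rus selects the second audio track; anything else selects the first.
    static int audioTrackOf(String mediaName) {
        return "rus".equals(paramsOf(mediaName).get("lang")) ? 2 : 1;
    }

    public static void main(String[] args) {
        String[] names = {"video.mp4", "video.mp4:lang=rus", "video.mp4:lang=eng&x"};
        for (String name : names) {
            System.out.println(name + " -> file " + fileNameOf(name)
                    + ", audio track " + audioTrackOf(name));
        }
    }
}
```

Encoding the parameters in the stream name keeps the client side simple: the player only has to request a different name, and the file on disk stays the same.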
To switch the audio, the Flash player called the following code:
    var opt = new NetStreamPlayOptions();
    opt.transition = NetStreamPlayTransitions.SWITCH;
    opt.streamName = "mp4e:video.mp4:lang=rus";
    ns.play2(opt);