
Audio conferences come in different kinds, as do the tasks they solve: centralized (on the server), client-side, and distributed. Here we consider the first two: centralized, on the side of the VoxImplant cloud, and client-side, done directly in the browser using Web Audio and WebRTC (yes, this is already possible!). Both options have their pros and cons, which we examine in more detail below, along with how to use them and the pitfalls (there are always some!).
Server conferences
As the name suggests, the audio streams are mixed on the server side. Each conference participant gets their own mix that contains all participants except themselves (nobody wants to listen to their own echo). In addition, conferences have a number of parameters that affect the sound quality, for example the sampling rate at which the mixer works. In the case of VoxImplant there are 2 options: regular and HD. Regular conferences run at 8 kHz and are best suited for combining calls from the telephone network, where nothing above 8 kHz is available anyway. For HD we aimed for maximum quality, so in this case we mix at 48 kHz (the maximum for WebRTC in the browser). Since server resources are used, it is difficult to make such conferences free: the hardware and traffic still cost something :)
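The per-participant mix described above is often called "mix-minus". As a minimal illustration, and not the actual VoxImplant mixer, the sketch below sums one frame of samples from every participant and then subtracts each participant's own signal. The function name `mixMinus` and the frame layout (a Float32Array of samples in [-1, 1] per participant id, all of equal length and sampling rate) are assumptions made for this example.

```javascript
// Mix-minus sketch: every participant hears the sum of all the others.
// `frames` maps a participant id to one frame of PCM samples
// (hypothetical layout: Float32Array, same length and rate for everyone).
function mixMinus(frames) {
  const ids = Object.keys(frames);
  if (ids.length === 0) return {};
  const length = frames[ids[0]].length;

  // Sum all inputs once instead of re-adding them for every listener.
  const total = new Float32Array(length);
  for (const id of ids) {
    const frame = frames[id];
    for (let i = 0; i < length; i++) total[i] += frame[i];
  }

  const mixes = {};
  for (const id of ids) {
    const own = frames[id];
    const mix = new Float32Array(length);
    for (let i = 0; i < length; i++) {
      // Remove the listener's own voice, then clamp to [-1, 1]
      // so that loud overlapping speech does not overflow.
      mix[i] = Math.max(-1, Math.min(1, total[i] - own[i]));
    }
    mixes[id] = mix;
  }
  return mixes;
}
```

Summing once and subtracting keeps the cost linear in the number of participants; a real mixer would also have to decode, resample and re-encode every stream around this step.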

While building server conferences we had to use various technologies that suppress noise well (noise reduction, NR), reliably identify who is speaking (voice activity detection, VAD), and so on. All of this directly affects both sound quality and scalability, and the streams still have to be decoded and encoded (mixing and resampling are not the hardest parts). We focus primarily on WebRTC, so the main codec is Opus, but you can also connect from SIP with any of the following: G.711, Speex (and Opus).
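To give a feel for what VAD means in practice, here is a deliberately naive energy-based detector. It is only an illustration of the idea and is not the algorithm used in VoxImplant; the function name `isSpeech` and the default threshold of -40 dBFS are assumptions chosen for this sketch.

```javascript
// Naive energy-based VAD for one frame of PCM samples in [-1, 1].
// Returns true when the frame level is above `thresholdDb` (in dBFS).
function isSpeech(frame, thresholdDb = -40) {
  if (frame.length === 0) return false;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) sumSquares += frame[i] * frame[i];
  const rms = Math.sqrt(sumSquares / frame.length);
  // Pure silence gives rms === 0, i.e. -Infinity dB, which is never speech.
  const levelDb = 20 * Math.log10(rms);
  return levelDb > thresholdDb;
}
```

A mixer can use such a flag to skip silent participants. Production detectors are considerably more robust, since a plain energy threshold also fires on loud noise.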
A conference on the VoxImplant side is created by a VoxEngine script. Calls are sent there using the callConference function, so you will need a separate script that forwards calls to the conference from different sources (PSTN, WebSDK, MobileSDK or SIP), as well as a corresponding rule (Pattern) in the application. More details about working with conferences in VoxImplant can be found at this link.
What are server conferences good for? Many participants (up to 100 by default in the case of VoxImplant), server-side conference management (quite useful in some cases), and better sound quality. The disadvantage has already been mentioned: they are not free, since server resources are required.
Poor man's conferencing: client-side conferences
We are all familiar with Skype and its excellent audio conferencing. That is exactly this kind of client-side conferencing: the host is the user who creates the conference, and accordingly everything is mixed on that user's computer. If the host's Internet connection or hardware is not very good, everyone suffers, but it's free! :)

After recent significant updates to WebRTC and Web Audio in Chrome and Firefox, it became possible to implement the same scenario right in the browser. I was very excited when I started implementing this idea. My enthusiasm cooled a little once I had to tinker quite a lot to make it all work without unwanted side effects and regardless of the participants' browsers (WebRTC is still available only in Chrome / Chromium and Firefox). Let's start with the theory...
RTCPeerConnection
This excellent class (hereinafter PC) from WebRTC lets us transmit sound (and video, but not this time) in real time: it takes a stream from the microphone (the local stream), sends it over the network to someone at the other end, and receives another stream from there (the remote stream). Initially everything in WebRTC revolved around MediaStream (the local stream from the microphone is an object of this class), but the standard has since evolved and everything has moved towards audio and video tracks (for better video conferencing, but more on that another time). That does not rule out working with the MediaStream class once we move into Web Audio territory. We will not cover how to make a P2P call using WebRTC: there are many other articles about it, and with VoxImplant it is quite simple. So, what do we need to do to mix the sound from different PCs and our microphone? Let's start simple:
To combine different streams we need a ChannelMergerNode; this is our mixer. We need as many of them as there are participants in the conference, and each participant receives a mix of everyone else, excluding themselves. It looks like this:
// Use the prefixed constructor in browsers that still require it.
window.AudioContext = window.AudioContext || window.webkitAudioContext;
var audioContext = new AudioContext();
// Wrap the microphone stream and every remote stream in a Web Audio source node.
var mediaStreamSource = audioContext.createMediaStreamSource( local_stream ),
    participant1 = audioContext.createMediaStreamSource( participant1_stream ),
    participantN = audioContext.createMediaStreamSource( participantN_stream );
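The snippet above only creates the source nodes. One possible way to wire them into per-participant mixes is sketched below; it is an assumption about how the graph could be assembled, not the exact code of the demo. The function name `buildMixes` and the shape of `remoteStreams` (an object mapping a participant id to that participant's remote MediaStream) are hypothetical; the Web Audio calls themselves (`createMediaStreamSource`, `createChannelMerger`, `createMediaStreamDestination`, `connect`) are standard.

```javascript
// Build one outgoing mix per remote participant.
// `audioContext`  - an AudioContext
// `localStream`   - the microphone MediaStream
// `remoteStreams` - hypothetical map: participant id -> remote MediaStream
function buildMixes(audioContext, localStream, remoteStreams) {
  const micSource = audioContext.createMediaStreamSource(localStream);
  const sources = {};
  for (const id of Object.keys(remoteStreams)) {
    sources[id] = audioContext.createMediaStreamSource(remoteStreams[id]);
  }

  const mixes = {};
  for (const id of Object.keys(sources)) {
    const merger = audioContext.createChannelMerger();
    const destination = audioContext.createMediaStreamDestination();
    // connect() targets input 0 by default, and connections that
    // fan in to the same input are summed by Web Audio.
    micSource.connect(merger);
    for (const otherId of Object.keys(sources)) {
      if (otherId !== id) sources[otherId].connect(merger);
    }
    merger.connect(destination);
    // This MediaStream is what gets sent to participant `id`.
    mixes[id] = destination.stream;
  }
  return mixes;
}
```

Each resulting stream can then be handed to the PC of the corresponding participant. Because any node input sums its incoming connections, a GainNode could probably serve as the summing bus as well.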
Nothing ingenious, but believe me, browser developers had to tinker quite a lot to make this work. Doesn't it all seem too simple? :) That's what I thought too, until it came to testing. Sending a mix from Chrome to Firefox revealed that only one of the media streams fed into the mix is played, while Chrome -> Chrome, Firefox -> Chrome and Firefox -> Firefox all work fine. We have not yet been able to find the cause of this behavior; we have written to colleagues at Google and Mozilla, but at the time of writing we have not received an answer. As soon as we understand the problem or find a way to solve it, we will write about it in a P.S.
Demos
Finally, we suggest that you try the demos that we quickly put together on VoxImplant. The first one uses the client approach (+ github): in it you choose whom to call and connect to the client conference. The second one uses server conferences (+ github): here everyone simply connects to one conference. We are always happy to read your thoughts and comments. Successful conferences to all!