Internet radio with hosts in different cities and live call-ins

From May 1 to May 4, 2014, the annual All-Russian Festival of Japanese Animation will be held in Voronezh for the fifteenth time. The festival has become a tradition for us, and visitors come from many cities across Russia.

Participants and visitors have a lot of questions and ideas about the festival. They already have many ways to ask us and get answers, but the organizers came up with one more: an Internet radio broadcast where listeners can call in live and ask their question on air.

This is complicated by the fact that the festival organizing committee is itself geographically distributed. We are located in different cities, including Kobe (Japan), Moscow, Rostov-on-Don, Waldkirch (Germany), Krasnodar and, of course, Voronezh, and gathering physically in one place takes a long time and costs a lot. (It is enough that everyone gathers at the festival itself.) We also needed to take incoming calls on air, preferably free of charge for the caller. And the instructions for callers had to be as simple and safe as possible, for example by relying on software already installed on the listeners' computers.
The organizers already run the festival successfully through Skype voice conferences. The natural idea was to gather in a conference and somehow route it to the radio. To receive calls, a second Skype instance runs on the same computer under a different account; at the right moment, an accepted call is routed to the radio and to the conference (and the conference is routed to the caller).

Everything below applies to Linux. I deliberately do not give exact package names, since they differ between distributions. I should also warn you right away that I have not worked with Windows for many years and have no idea how to do the same there.

A note about the screenshots: they were not taken at the very last moment. They convey the idea, but not the exact settings that were used. Where they differ, trust the text, not the pictures.

Software

JACK (the recursive acronym “JACK Audio Connection Kit”) is a synchronous, low-latency sound server. It was controlled with qjackctl
Skype, two instances
Virtual sound cards, created with the ALSA snd-aloop module
The alsa_in and alsa_out programs that come with JACK
Switching hub - jack_mixer
Gate and compressor for the microphone, and the output limiter before the broadcast - from the Calf Studio Gear package
Icecast - broadcast server
JACK-to-Icecast feeder - Darkice, with Darksnow as its frontend
Player - Audacious

Switching and configuration

So, at the center of the system is the JACK server, which runs on my external sound card. A microphone and headphones are connected to this card. Nothing special here.

Skype

Skype supports ALSA and PulseAudio (may it drop dead) and does not support JACK. To get Skype into JACK, you have to resort to a workaround based on the ALSA virtual sound card module, snd-aloop.

If you simply load this module, it creates one virtual sound card, Loopback, with two devices (pcm0 and pcm1), each with eight substreams (sub0..sub7). Audio played to the first substream of the first device (pcm0p/sub0) can be recorded from the first substream of the second device (pcm1c/sub0). The data format is set by whichever application opens the device first: if, say, you start capturing on pcm1c/sub0 at 44100 Hz, signed 32-bit little endian, then you must play to pcm0p/sub0 in exactly the same format. The module cannot convert anything; it just pretends to be a sound card and shuffles data from one end to the other.
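To make the pairing rule concrete, here is a minimal sketch in shell. loopback_peer is a hypothetical helper (it is not part of ALSA) that only does string manipulation: given one end of an snd-aloop pair in hw:CARD,DEVICE,SUBSTREAM form, it prints the other end.

```shell
#!/bin/sh
# Sketch: map one end of an snd-aloop pair to the other end.
# loopback_peer is a hypothetical helper, not an ALSA tool.
# Argument: "hw:CARD,DEVICE[,SUBSTREAM]" where DEVICE is 0 or 1.
loopback_peer() {
    spec=${1#hw:}            # strip the "hw:" prefix
    card=${spec%%,*}         # text before the first comma
    rest=${spec#*,}          # DEVICE or DEVICE,SUBSTREAM
    case $rest in
        *,*) dev=${rest%%,*}; sub=${rest#*,} ;;
        *)   dev=$rest; sub=0 ;;   # substream defaults to 0
    esac
    case $dev in
        0) peer=1 ;;
        1) peer=0 ;;
        *) echo "snd-aloop has only devices 0 and 1" >&2; return 1 ;;
    esac
    echo "hw:$card,$peer,$sub"
}

loopback_peer hw:Loopback,0,0    # prints hw:Loopback,1,0
```

With it, an aplay -D hw:Loopback,0,0 on one side would be matched by arecord -D "$(loopback_peer hw:Loopback,0,0)" on the other, with identical -f, -r and -c values in both commands, since the module converts nothing.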

Skype for Linux in ALSA mode is a rather capricious application. It insists on opening the sound card as 16 bit, 48 kHz, mono for capture and 16 bit, 48 kHz, stereo for playback. If the card refuses to work in this mode, Skype either reports that it cannot initialize the sound device, or the sound (usually the captured one) is severely distorted.

We need two Skype instances. For convenience, each should work with its own card, which makes the switching easier to set up. Reading the output of modinfo snd_aloop shows that the module accepts several very useful parameters, among them the card index numbers (yes, there can be several), the names of the virtual cards and the number of substreams in each. So we load the module like this:

modprobe snd-aloop enable=1,1 index=2,3 id=Chat,Incoming pcm_substreams=1,1

This makes the system create two virtual cards, numbered 2 and 3 and named Chat and Incoming respectively, each with a single substream. You can inspect these cards in /proc/asound/card2 and .../card3. (These index numbers were dictated by my system, which already has a built-in card at index 0 and a USB card at index 1.)
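If the indices on another machine are unknown, the card can be looked up by its id instead. The sketch below assumes the usual layout of /proc/asound/cards (the index, then the id in square brackets). card_index is a hypothetical helper, and its optional second argument exists only so that the parser can be tried on a saved copy of that file.

```shell
#!/bin/sh
# Sketch: find the index of an ALSA card by its id string.
# card_index is a hypothetical helper.
# Usage: card_index ID [FILE]   (FILE defaults to /proc/asound/cards)
card_index() {
    awk -v id="$1" '
        # Header lines look like: " 2 [Chat           ]: Loopback - Loopback"
        # A full-width id is followed directly by "]:".
        $2 == "[" id || $2 == "[" id "]:" { print $1; found = 1 }
        END { exit !found }     # non-zero status when the id is absent
    ' "${2:-/proc/asound/cards}"
}

# Example (not run here): card_index Chat    would print 2 on the setup above
```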

The first Skype starts as usual. To run the second, you need a special command line:

screen skype --dbpath=~/.Skype.vrnfest --secondary

I ran it under screen to detach it from the terminal. Here --dbpath is the path to the Skype profile (the default is ~/.Skype), and --secondary allows a second instance to start.

Each Skype is, naturally, configured to use its own virtual sound card, device 0. There is a subtle point in how Skype works with ALSA: it can use different devices for the ringtone and for the audio coming from the network. If you select the same device for “ringing” and “speakers” in the settings, Skype opens the device twice and lands on two different substreams. A hardware card usually mixes them, but snd-aloop does not. Besides, I limited the virtual cards to one substream each, so pressing the “accept call” button drops the call with the message “failed to initialize the sound device.” Therefore, in both Skype instances I selected the physical built-in sound card for ringing.

Now that Skype is configured, the other ends of the virtual cards have to be brought into JACK, which is what the alsa_in and alsa_out programs are for. Since Skype is capricious, it must open the sound devices first so that it sets their format. So we call from one Skype to the other and accept the call. While the call is up, start the bridges:

screen -dmS chat_in alsa_in -d hw:2,1 -j chat_in
screen -dmS chat_out alsa_out -d hw:2,1 -r 48000 -q 1 -c 1 -j chat_out
screen -dmS incoming_in alsa_in -d hw:3,1 -j incoming_in
screen -dmS incoming_out alsa_out -d hw:3,1 -r 48000 -q 1 -c 1 -j incoming_out

As you can see, the JACK client names are given explicitly here, so as not to confuse which alsa_in and alsa_out belongs to what. We use device 1 of each virtual card.
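The "Skype first, bridges second" ordering can be checked rather than remembered. The sketch below is not what was used on air; it assumes that ALSA reports an unopened substream as the single word closed in its status file, and it prints the bridge commands instead of running them unless DRY_RUN is set to an empty value. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: start the alsa_in/alsa_out pair for one loopback card only after
# Skype has opened its side (device 0). Helper names are hypothetical.

# Print the command by default; run it when DRY_RUN is set to an empty value.
run() {
    if [ -n "${DRY_RUN-1}" ]; then echo "$@"; else "$@"; fi
}

# True if the given ALSA status file shows an opened substream.
# An unused substream reads as the single line "closed".
substream_open() {
    [ -r "$1" ] && ! grep -qx 'closed' "$1"
}

# start_bridge NAME CARD: the same two commands as above, parameterised.
start_bridge() {
    run screen -dmS "$1_in" alsa_in -d "hw:$2,1" -j "$1_in"
    run screen -dmS "$1_out" alsa_out -d "hw:$2,1" -r 48000 -q 1 -c 1 -j "$1_out"
}

# Example (not run here):
#   substream_open /proc/asound/card2/pcm0p/sub0/status && start_bridge chat 2
```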

So, Skype is now wrapped in JACK, and:

from chat_in:capture_1 we take the sound of the organizers' conference (there is also capture_2, but it is not used)
to chat_out:playback_1 we send sound into the organizers' conference
from incoming_in:capture_1 we take the sound of the incoming call
to incoming_out:playback_1 we send the sound intended for the caller

Player, microphone, headphones (monitor) and broadcast

The headphones are simple: they show up as two channels, system:playback_1 and system:playback_2. The microphone is physically connected to the first input of the card JACK runs on, so it appears as system:capture_1. However, you cannot just take the microphone and use it as is. First, I want a gate to cut off unwanted sounds; second, I want to compress the rest so that its dynamics resemble the signal coming from Skype (Skype itself compresses the dynamic range quite strongly). To do this, run calfjackhost and add the Gate and Compressor plugins to it. For my case, I settled on the following settings:

Gate: threshold -36 dB, ratio 3, knee 9 dB, attack 20 ms, release 450 ms, max reduction -inf dB
Compressor: gain 10 dB, threshold -18 dB, ratio 3.5, attack 20 ms, release 250 ms, knee 9 dB
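To get a feel for what the compressor numbers do, here is a rough worked example in shell. It is a simplification, not a model of the Calf plugin: it assumes a hard knee (the 9 dB soft knee is ignored), ignores attack and release, and treats the 10 dB gain as make-up gain applied after compression. compress_db is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: static gain curve of a hard-knee compressor with make-up gain.
# compress_db is a hypothetical helper, not part of Calf.
# Arguments: input level, threshold, ratio, make-up gain (levels in dB).
compress_db() {
    awk -v in_db="$1" -v thr="$2" -v ratio="$3" -v gain="$4" 'BEGIN {
        # Above the threshold, every "ratio" dB of input yields 1 dB of output
        out = (in_db > thr) ? thr + (in_db - thr) / ratio : in_db
        printf "%.1f\n", out + gain
    }'
}

compress_db -4 -18 3.5 10     # loud speech:  prints -4.0
compress_db -30 -18 3.5 10    # quiet speech: prints -20.0
```

Under these assumptions, a 26 dB spread between quiet and loud speech shrinks to 16 dB, which is the kind of narrowing needed to sit next to the already compressed Skype signal.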

The broadcast goes to Icecast through darkice, which is configured via darksnow. There is nothing special there, except for the output limiter in front of darkice, which evens out the signal level. This is the “Limiter” plugin in calfjackhost, with these settings:

Limiter: input gain: 12 dB, lookahead: 10 ms, limit: -1 dB, release: 300 ms

The player appears in JACK when the first track starts and stays there. It connects itself to the physical output (system:playback), and that connection has to be removed.

jack_mixer and switching

What remains is to connect everything together, with the ability to quickly turn sound routes on and off. The jack_mixer program was used for this. It is rather primitive and has an odd-looking interface, but its functionality turned out to be (almost) sufficient.

Run it and add the following via the menu:

Three incoming mono channels: mic, chat_in, incoming_in
Incoming stereo channel: player
Four outgoing channels (they are always stereo): monitor, chat_out, incoming_out, radio

This will cause jack_mixer to declare a bunch of inputs and outputs in JACK:

Inputs (client jack_mixer): chat_in, incoming_in, mic, player_L, player_R
Outputs (client jack_mixer): chat_in Out, chat_out L, chat_out R, incoming_in Out, incoming_out L, incoming_out R, MAIN L, MAIN R, mic Out, Monitor L, Monitor R, monitor L, monitor R, player Out L, player Out R, radio L, radio R

The setting can be saved to a file for reuse.

Each input has an output with the same name and the suffix “Out”. This is the same signal as the input, but taken after the fader. All stereo channels have L and R outputs. There are two special stereo outputs, MAIN and Monitor (with a capital letter). MAIN is a regular stereo output, and Monitor receives a copy of whichever output has its Mon button pressed in the interface. I used neither: I did not need the monitor functionality, and I would have used MAIN if it could be renamed to something more specific.

Each input channel also gets a Mute button for every output. Together, these mute buttons form the mixer's switching matrix. This is the main feature the program is used for.

In general, the following has to be connected (“patched”):

system:capture_1 -> calf:gate_in_l, calf:gate_out_l -> calf:compressor_in_l, calf:compressor_out_l -> jack_mixer:mic
chat_in:capture_1 -> jack_mixer:chat_in, jack_mixer:chat_out L -> chat_out:playback_1
incoming_in:capture_1 -> jack_mixer:incoming_in, jack_mixer:incoming_out L -> incoming_out:playback_1
jack_mixer:radio L -> calf:limiter_in_l, calf:limiter_out_l -> darkice:left, and the same for the right channel
jack_mixer:monitor L -> system:playback_1, and the same for the right channel
audacious_jack:out_0 -> jack_mixer:player_L, and the same for the right channel
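The same patching can be scripted with jack_connect instead of being clicked together in qjackctl. The following is a sketch, not the script used on air. The left-channel port names are taken from the list above, while the right-channel names (calf:limiter_in_r, calf:limiter_out_r, darkice:right, audacious_jack:out_1) are assumptions by analogy and should be checked against jack_lsp output. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: replay the connection list with jack_connect.
# Dry run by default; set DRY_RUN to an empty value to really connect.
# Right-channel port names are assumptions (see the text above).
wire() {
    if [ -n "${DRY_RUN-1}" ]; then
        printf 'jack_connect "%s" "%s"\n' "$1" "$2"
    else
        jack_connect "$1" "$2"
    fi
}

wire_all() {
    # Microphone chain: gate, then compressor, then the mixer input
    wire system:capture_1 calf:gate_in_l
    wire calf:gate_out_l calf:compressor_in_l
    wire calf:compressor_out_l jack_mixer:mic
    # The two Skype instances (mono in both directions)
    wire chat_in:capture_1 jack_mixer:chat_in
    wire "jack_mixer:chat_out L" chat_out:playback_1
    wire incoming_in:capture_1 jack_mixer:incoming_in
    wire "jack_mixer:incoming_out L" incoming_out:playback_1
    # Broadcast path through the limiter
    wire "jack_mixer:radio L" calf:limiter_in_l
    wire calf:limiter_out_l darkice:left
    wire "jack_mixer:radio R" calf:limiter_in_r
    wire calf:limiter_out_r darkice:right
    # Headphones
    wire "jack_mixer:monitor L" system:playback_1
    wire "jack_mixer:monitor R" system:playback_2
    # Player
    wire audacious_jack:out_0 jack_mixer:player_L
    wire audacious_jack:out_1 jack_mixer:player_R
}

wire_all
```

Port names containing spaces, such as jack_mixer:chat_out L, must stay quoted, which is why the dry run prints them in quotes.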

The radio bridge in operation

So, we gather the conference in one of the Skype instances and make sure that nobody's audio crackles, nobody produces feedback, and so on. Everyone is warned that the broadcast is live and that they had better not make extraneous noise.

In the player, a playlist is put together that also contains background music (over which the conversations between hosts and callers will be heard). For this I used Audacious' support for multiple playlists (they are shown as tabs that can be renamed).

Conference participants and callers must not listen to the broadcast itself. First, it would confuse them, because it is delayed by all the buffering, by 10-30 seconds. Second, it is highly advisable that they use headphones, so that there is no feedback (radio -> microphone).

Therefore the broadcast content, that is, the player output, has to be copied into the conference. The caller has to receive the sound of the conference; background music is optional for the caller.

In the end I arrived at the following model. Publicly, our system can be in one of three basic states: the hosts talking, a conversation with a caller, and music playing. In practice, a call can come in while the hosts are talking or while music is playing; the operator then explains to the caller what is and is not allowed and makes sure the caller does not create feedback either. After that the operator tells the hosts (by text or voice) that there is a caller and, when the time comes, patches the caller in.

These states are in fact states of the switching matrix. It is immediately obvious that some buttons of the matrix are set once and never change, while others are switched on and off. Always on:

mic->monitor - to hear yourself in the headphones
mic->incoming_out - either there is no caller yet and snd-aloop discards the signal, or the caller needs to hear the operator
incoming_in->monitor - either there is no caller yet and the input is silent, or the operator needs to hear the caller
player->radio - the player always plays something, be it a track or background music for talking
player->monitor - for the operator to hear the music
player->chat_out - music into the conference

The player input level is adjusted depending on whether it is playing background music or a track.

Always off:

chat_in->chat_out - the conference does not need feedback
incoming_in->incoming_out - the caller does not need feedback
player->incoming_out - the caller does not need to hear the background music

During the broadcast, the operator can pass remarks to the hosts by voice or text, and the voice must not go on air (mic->radio muted).

Listening to music

In this state only music goes to the radio. The operator and the conference can talk to each other if they manage to shout over the music, because the music is also sent to the conference. In addition to the always-on routes, these are enabled:

mic->chat_out - for the conference to hear the operator
chat_in->monitor - for the operator to hear the conference

Some time before the end of the track, the hosts need to be warned about it. If a call comes in, switch to receiving it: turn off the conference and the music in the monitor, take the call, then tell the conference about it and wait for the track to end.

Talk on air

Here the music (quietly) and the conference go on air. The routing is as follows:

mic->chat_out
mic->radio - if the operator takes part in the discussion
chat_in->radio

If a call comes in, the microphone is disconnected from the conference and from the broadcast, and the conference is disconnected from the monitor. The conference must be warned that there is a caller, so that the hosts do not announce a new track.

Receiving an incoming call

This can happen while music is playing or while a conversation is on air (the image shows the second case). The operator cuts his audio link to the conference and to the radio and accepts the Skype call. Having briefed the caller, he informs the conference about the call and, when the hosts are ready to bring the caller in, patches the caller into the conference and onto the radio.
The following routes are enabled (in addition to those that are always on):

chat_in->radio, if the call arrived during talk on air

Conversation with the caller

In this mode the conversation with the caller goes on air. The following routes are enabled:

chat_in->radio
chat_in->monitor
chat_in->incoming_out - for the caller to hear the conference
incoming_in->radio
incoming_in->chat_out - for the conference to hear the caller
mic->chat_out
mic->radio - if the operator also takes part in the conversation
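The states described above are small enough to write down as data. The sketch below transcribes the route lists from the text as they are; the state names and the routes_for helper are made up for this example, and nothing here talks to jack_mixer. It only prints which input->output pairs should be un-muted in a given state. The always-off routes simply never appear in any list.

```shell
#!/bin/sh
# Sketch: the switching matrix as data. routes_for STATE prints the routes
# that are un-muted in that state, one "input->output" per line.
# State names (music, talk, ...) are hypothetical labels for this example.
ALWAYS_ON="mic->monitor mic->incoming_out incoming_in->monitor
player->radio player->monitor player->chat_out"

routes_for() {
    case $1 in
        music)           extra="mic->chat_out chat_in->monitor" ;;
        # mic->radio only if the operator takes part in the talk
        talk)            extra="mic->chat_out mic->radio chat_in->radio" ;;
        receiving_music) extra="" ;;
        receiving_talk)  extra="chat_in->radio" ;;
        caller)          extra="chat_in->radio chat_in->monitor
                                chat_in->incoming_out incoming_in->radio
                                incoming_in->chat_out mic->chat_out mic->radio" ;;
        *) echo "unknown state: $1" >&2; return 1 ;;
    esac
    # Unquoted on purpose: word splitting turns the lists into one route per line
    printf '%s\n' $ALWAYS_ON $extra
}

routes_for music
```

A control program of the kind mentioned in the conclusions could compare two such lists to decide which mute buttons to toggle when moving from one state to another.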

Working environment during the broadcast

This is easiest to explain with an illustration:

All other windows, which are not needed while running the broadcast, sit on another KDE desktop.

The conference can be assembled by any participant. It is easier if that is not the radio bridge operator, to take some load off him. From the conference's point of view, callers are present in it under the operator's name (for example, Skype highlights the operator's icon when the caller speaks).

The radio bridge held on April 19

Using this scheme, we held a pre-festival radio bridge. It began on April 19 at 20:00 and was planned to last three hours, but along the way it was decided to extend it by one hour.

Before the broadcast, the tracks to be played were selected and a rough script was written (as an ordered list of tracks). The broadcast turned out to be more talkative than intended, so we did not get to the end of the track script.

The broadcast drew one hundred simultaneous listeners. The Skype account “for incoming calls” was handled by two people during the broadcast, because text questions were sent to it as well. A Skype text chat was also set up for the listeners. The load on the operator (that was me) turned out to be lower than expected, so I managed to talk to people in that chat too.

The broadcast was recorded, and afterwards the recording was published for online listening as a podcast (after a little cleanup of the moments where there were gaps). So far it has been downloaded by three hundred people.

The operator's Internet connection dropped three or four times over the whole broadcast. Each time, the administrator of the relay server promptly switched in a backup stream or started the track that was next in the script. The logic of the built-in fallback in Icecast did not suit us.

During the first hour there was not a single incoming call. Then people started calling, and those who got through sometimes could not believe that they had. By the way, it is quite a funny feeling when a caller recognizes you by your voice over Skype, and you do not know that person.

Conclusions and directions for further development

Going on air was new to us, and several mistakes were made. The following points are worth noting:

Control was somewhat more complicated than it needed to be. This problem is solvable: one should write a program with five toggle buttons that drives the mixer over OSC or MIDI.
Audacious is not the most convenient player for this job. It is lightweight and free of the harmful media library features, but the useful feature of a simple crossfade was missing. It is also awkward to control programmatically, should a separate broadcast control program be developed.
While a track is playing, it goes into the conference at full volume, which makes it hard to talk there (off air). By the end of the broadcast I had figured out how to fix this, but it was too late. One more stereo input has to be added, a “loud” one, which is sent on air only and only for the duration of a track; the existing input becomes the “quiet” one and is sent constantly and everywhere (to the conference, the radio and the monitor). JACK is a sample-accurate server, so there is no need to fear comb-filter effects from summing two copies of the same signal.
The operator losing the Internet connection was the most noticeable of the problems. After the broadcast, a way to set up the fallback logic so that it behaves appropriately was worked out and tested.
When darkice stopped, for some reason jackd crashed as well. This is very unpleasant, because all JACK applications then have to be restarted and the switching set up again. And darkice stopped whenever the connection to the Icecast server on the Internet broke. So that an Internet outage would not bring down the whole system, a local Icecast was set up, and the main one was configured as a relay of the local one.
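For the last point, the relay on the main server is declared in its icecast.xml. The sketch below is an illustration, not our actual configuration: it prints a relay section using element names from the Icecast 2 documentation, while the host name, port and mount point are placeholders, and relay_fragment is a hypothetical helper. It assumes the main server can reach the operator's local Icecast over the network.

```shell
#!/bin/sh
# Sketch: print a <relay> section for the main server's icecast.xml.
# relay_fragment is a hypothetical helper; HOST, PORT and MOUNT describe
# the operator's local Icecast and are placeholders in the example call.
relay_fragment() {
    cat <<EOF
<relay>
    <server>$1</server>
    <port>$2</port>
    <mount>$3</mount>
    <local-mount>$3</local-mount>
    <on-demand>0</on-demand>
</relay>
EOF
}

relay_fragment operator.example.org 8000 /live.ogg
```

With on-demand set to 0 the main server keeps pulling the stream whether or not anyone is listening, and darkice only ever talks to the local Icecast, so a broken Internet link no longer stops it.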

On the whole, however, we coped with the task. Both the organizers and the audience liked it, and they ask us to repeat the broadcast and even to make such radio bridges regular.

Thanks for your attention!

Source: https://habr.com/ru/post/220123/
