Console .wav player for the PC speaker in Linux

I have long wanted to write a player for the PC speaker that could handle more than notes and monophonic melodies. But back when it was relevant (DOS forever!) I had neither the knowledge, nor the ability, nor the ideas. Later I could not get to it through the Windows DDK and went on squeaking quietly in the style of QBASIC SOUND. Meanwhile the PC speaker's relevance as a sound device dropped to zero: the proud speaker turned into a beeper and buzzer. Yet it has not disappeared from the PC (having outlived all the disk drives in the meantime) and still makes itself known at power-on and when reporting errors. So is it possible, in a modern user-space software and hardware environment, to play a polyphonic melody or a voice on the PC speaker? Of course it is, and C and Linux will help us with this.
Dedicated to the man in a hat and glasses who sends everyone off in a certain direction (the author is unknown to me; everything works fine in DOSBox).
What is a speaker from a programmer's point of view? It is a device with two states, on and off; through them we control the membrane (or perhaps some other active element) that produces the sound. According to biologists, human hearing ranges from about 20 Hz at the bottom to about 22 kHz at the top, so we have to switch states very quickly. Usually the speaker is driven by an interval timer, but that way we can only play sounds of a given frequency and duration. For that there are ready-made programming interfaces (no need to access the timer directly) and user interfaces. For example, in the Linux console, the note C ("Do") of the first octave with a duration of 1 second:

 echo -en "\e[10;263;11;1000]\a"
 echo -en "\ec"

The second echo command returns the duration and frequency settings to the default state.
To control the speaker directly, you talk to port 0x61 (hexadecimal 61), specifically its bits 0 and 1. Bit 0 controls whether the speaker is tied to the timer: if it is 1, the timer drives the speaker. Bit 1 switches the speaker state: 1 is on, 0 is off. This is the method we will use.
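The bit layout just described can be sketched as two small helpers. This is only an illustration: the names SPK_TIMER_GATE, SPK_DATA, spk_detach_timer and spk_set_state are made up for this example and do not come from the program. The helpers only compute the byte to write; the actual port access is shown further on.

```c
#include <stdint.h>

/* Bit layout of port 0x61 as described in the text (names are hypothetical). */
#define SPK_TIMER_GATE 0x01u /* bit 0: 1 = the timer drives the speaker */
#define SPK_DATA       0x02u /* bit 1: speaker state, 1 = on, 0 = off   */

/* Clear bit 0 so the timer no longer drives the speaker; keep other bits. */
uint8_t spk_detach_timer(uint8_t port_value)
{
    return (uint8_t)(port_value & (uint8_t)~SPK_TIMER_GATE);
}

/* Return the port value with bit 1 set (on != 0) or cleared (on == 0). */
uint8_t spk_set_state(uint8_t port_value, int on)
{
    if (on)
        return (uint8_t)(port_value | SPK_DATA);
    return (uint8_t)(port_value & (uint8_t)~SPK_DATA);
}
```

All other bits of the port are preserved, which is why the value read from the port is used as the starting point.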
Now, to understand how to play, we must decide what to play. We will play .WAV files. The data format inside a .WAV file can vary, but usually the term means files containing pulse-code modulation (PCM) data. This is a sequence of values taken from an ADC at a fixed rate (the sampling frequency) and written to the file as is, for each channel. From the point of view of sound, each value is the loudness at a given moment; from the point of view of the speaker, it is the position of the membrane relative to its rest point. The maximum value the ADC can deliver determines the bit depth. The set of values for all channels at one moment in time is a sample. Speaking of sound, we should recall the Kotelnikov theorem (known in English as the Nyquist-Shannon sampling theorem) and say that from these data the original wave can be restored without loss (given a sufficiently high sampling frequency and bit depth). In our case this will not work, because the speaker has no notion of loudness, or rather it has one, but it cannot be changed: either full volume (on) or none (off). Given that the ADC also quantizes loudness with finite accuracy, the value of "warm tube sound" becomes clear.
The recorded data represents positive and negative values (the wave crossing the abscissa axis), but in a slightly different format: half of the maximum is taken as zero (silence). If samples are one byte each, zero is 256 / 2 = 128, where 256 is the number of distinct values one byte can hold, that is 255 (0xFF, the largest value) + 1. Likewise, for two bytes it is (65535 + 1) / 2 = 32768. Values above this number are positive and grow from smaller to larger; values below it are negative, and the smaller the raw value, the more negative the result. Strictly speaking, standard PCM .WAV files store only 8-bit samples this way; 16-bit and wider samples are signed two's-complement numbers, and treating them as described here merely inverts the polarity of the resulting square wave, which is inaudible. For example, converting single-byte values to ordinary signed numbers:

 245 (ADC value) => 117 = 245 - 128
  93 (ADC value) => -35 = 93 - 128
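The same conversion, written as a small C function. This is a sketch under the convention stated in the text (zero is half of the value range); the name sample_to_signed is hypothetical and the function is not taken from the program.

```c
#include <stdint.h>

/*
 * Convert a raw sample of `bytes` bytes (1 to 4) to a signed value,
 * taking half of the value range as zero, as described in the text.
 */
int64_t sample_to_signed(uint32_t raw, unsigned bytes)
{
    int64_t zero = (int64_t)1 << (8 * bytes - 1); /* 128 for 1 byte, 32768 for 2 */
    return (int64_t)raw - zero;
}
```

With single-byte input this reproduces the two examples above: 245 gives 117 and 93 gives -35.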

As mentioned above, we can only turn the speaker on and off, so from the data we will form a square wave: 1 where the value is greater than zero, 0 in all other cases. More details on how to play sound through the speaker can be found here; that article was the starting point for this implementation, many thanks to its author.
Let's get down to it. I should say up front that only one channel is played and that the maximum sample size analyzed is 4 bytes per channel (given that one bit is enough for the speaker, even that is 32 times more than needed). From the very beginning I saw two big problems:
First, access to the ports. As simple as it was in DOS, that is how difficult it was for me in Windows. Linux, however, provides a perfectly sensible way to access ports directly: you only need to know the root password or obtain superuser rights in some other way, or more precisely the CAP_SYS_RAWIO capability for the process. The function is called ioperm, and it lets you open access to any ports in the range from 0 to 0x3FF. If you need more, there is another call, iopl, but we will not need it.
Once permission is granted, the ports are accessed directly through macros that wrap inline assembler. You can write (out) and read (in) bytes (b), words (w), double words (l) and strings (s), optionally with a pause (_p) after the operation. The man page contains very little information, mostly warnings that you should not do this, so it is best to look at the header file itself (<sys/io.h>). If you use the _p macros, you also need to open access to port 0x80, because the delay is implemented by writing a byte to that port.
In the program we first set up port access and save the value already present in port 0x61, so that it can be restored when the program finishes:
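A minimal sketch of this thresholding for single-byte samples, assuming the zero level of 128 discussed earlier. The function name samples_to_square is invented for the example; the real program works on a raw buffer with masks, as shown later.

```c
#include <stddef.h>
#include <stdint.h>

/* Map each 8-bit sample to a speaker state: 1 above the zero level, else 0. */
void samples_to_square(const uint8_t *samples, size_t count, uint8_t *states)
{
    const uint8_t zero_level = 128; /* silence for single-byte samples */
    for (size_t i = 0; i < count; i++)
        states[i] = samples[i] > zero_level ? 1 : 0;
}
```

Note that a sample exactly at the zero level maps to 0, matching the rule "0 in all other cases".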

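 #define SPKPORT 0x61

 static unsigned char old61 = 0;   // original value of port 0x61
 unsigned char out61 = 0;          // value we write to the port

 if (!ioperm(SPKPORT, 1, 1)) {     // request access to port 0x61
     old61 = inb(SPKPORT);         // remember the current value
     out61 = old61 & 0xFE;         // clear bit 0: detach the speaker from the timer
     outb(out61, SPKPORT);         // write the new value to the port
 }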

At the end we put everything back as it was:

 outb(old61, SPKPORT);    // restore the original value of port 0x61
 ioperm(SPKPORT, 1, 0);   // give up access to port 0x61

The second problem is time. Linux is a multitasking, multi-user environment, and you cannot count on exclusive, uninterrupted ownership of a resource (the processor in particular). To play a sound as close to the original as possible (for a PC speaker, that is), we must output data at strictly fixed intervals; any deviation instantly distorts the sound. If the interval is steady but longer or shorter than it should be, the sounds come out correspondingly lower or higher than the original. If the interval differs every time, the sound simply becomes unrecognizable. All of this is complicated by the fact that the minimum sampling frequency is 8000 Hz, so the longest interval we can afford is 1/8000 s = 125 microseconds. At 22 kHz it is already about 45 microseconds. According to the man pages, such delays are possible with usleep or nanosleep. But first, here is where the delay has to go:

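 char wavdata[0x10000];         // buffer with data read from the file
 unsigned int *curdata;         // pointer to the current sample in the buffer
 unsigned int bufsize;          // number of bytes in the buffer
 unsigned int cursampleraw;     // raw value of the current sample
 unsigned int datamask;         // mask that keeps only the bytes of one channel
 short int onechannelinc;       // step to the next sample = size of one sample
 unsigned int samplezero;       // zero (silence) level

 for (i = 0; i < bufsize; i += onechannelinc) {
     curdata = (void*)(wavdata + i);        // address of the current sample
     cursampleraw = *curdata & datamask;    // value of the channel being played
     if (cursampleraw > samplezero) {       // above the zero level
         out61 |= 0x2;                      // speaker on: bit 1 = 1
     } else {
         out61 &= 0xFD;                     // speaker off: bit 1 = 0
     }
     outb(out61, SPKPORT);                  // send the state to the port
     // the delay between samples belongs here
 }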

This is practically the whole program; everything else is reading the file and preliminary analysis of the data.
So what about the delays? usleep and nanosleep gave no usable result, or rather they worked only for pauses shorter than 10 microseconds. With longer pauses the sound fell apart completely, and the problem was not the pitch but the fact that the pause could not be held steady: there was no rhythm, every pass of the loop took a different time. Suspecting that the process priority was too low, I tried nice, but neither the utility nor the library call solved the problem. What remained was to try changing the scheduling policy:
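The interval arithmetic from the timing discussion (125 microseconds at 8000 Hz, about 45 at 22 kHz) can be written down as a one-line helper. The name sample_interval_us is hypothetical; the program itself uses a tunable multiplier rather than this calculation.

```c
/* Interval between samples in whole microseconds, rounded down.
 * Returns 0 for a zero sample rate instead of dividing by zero. */
unsigned sample_interval_us(unsigned sample_rate_hz)
{
    if (sample_rate_hz == 0)
        return 0;
    return 1000000u / sample_rate_hz;
}
```

For a 44100 Hz file the same formula gives 22 microseconds, which shows how little time there is per sample.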

 struct sched_param schedio;
 sched_getparam(0, &schedio);                                   // read the current parameters
 schedio.sched_priority = sched_get_priority_max(SCHED_FIFO);   // highest priority for this policy
 sched_setscheduler(0, SCHED_FIFO, &schedio);                   // switch to SCHED_FIFO

I am not sure I used it correctly, but in this form it did not help (the code is left commented out in the program; if it turns out to help after all, it can be re-enabled, just like nice). All of this was tried following the recommendations from this page. The only thing left to try was an empty loop instead of usleep... and it worked: I could hear not only music but speech as well.
Everything worked. That inspired hope but promised trouble when moving to other machines. The page mentioned above had a couple of paragraphs saying that a write to a port takes about 1 microsecond; I was skeptical, although the outb_p macro (port output with a delay) relies on the same principle. So a pause was added to the main loop:

 short int pause;    // number of port writes per sample; sets the delay

 for (i = 0; i < bufsize; i += onechannelinc) {
     curdata = (void*)(wavdata + i);
     cursampleraw = *curdata & datamask;
     if (cursampleraw > samplezero) {
         out61 |= 0x2;
     } else {
         out61 &= 0xFD;
     }
     for (k = 0; k < pause; k++)
         outb(out61, SPKPORT);    // repeated port writes serve as the delay
 }

After that the program was tested on a production server and, to my surprise, nothing broke and everything worked. The possible problem of reading from the file during playback (dropouts between reads) did not show up on any of the tested computers either: the file is read through a 64 KB buffer with no noticeable distortion. Extra code in the main loop does not affect the sound quality at all, as long as that code makes no system calls. As a result I came to the conclusion that, paradoxically, the more powerful the computer, the better our speaker sounds. If you go toward less powerful machines, at some point everything will break on multitasking systems, but by switching to a single-tasking system we will get the result back.
Honestly, I did not expect a positive outcome, since for DOS there were two normal ways to write this program. First: disable all interrupts (becoming the sole owner of everything) and run the code head-on as described above, counting the delays in clock ticks inside an infinite loop. Second: set the timer to its maximum frequency (the minimum interval is about 1 microsecond), hook interrupt 0x8 (IRQ0) and output data to the speaker at intervals we calculate ourselves, letting nobody interfere with the process. Neither option works in Linux user space, but I am glad that everything turned out so strangely well.
Now a few lines about the rest of the code. Mostly it parses the .WAV file header. Having found this description, I parsed and checked all the fields, but the first file I downloaded from the Internet turned out to have a different format. Then, turning to this document, I simplified the analysis to take the new data into account and downloaded a second file, which stumped me: it lacked a 2-byte field before the fact chunk. In the end I had to reduce the checks so that a reasonable range of arbitrary files could be played.
The .WAV header analysis includes a check of data type sizes using sizeof: it assumes that int is 4 bytes, short int is 2 bytes and char is 1 byte. In this form the program can also be compiled for 64-bit systems. If the sizes differ, the check fails and the program reports an unsupported .WAV format.
The code also contains an attempt to visualize the process, but combined with sound it yields only croaking and crackling, so the graph can be viewed only without sound. Incidentally, it lets you estimate how much slower playback is when usleep is used as the delay (or compare your visual and auditory impressions).
This is my first program written specifically for Linux in a compiled language. For those looking for Borland C's conio.h: it is not here, but things are much better. Escape sequences (see man console_codes) replace almost everything except kbhit (which is just one of the input modes) and querying the console screen size; for the latter you have to talk to the device directly using ioctl (man console_ioctl):
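For comparison, here is a different way to build a busy-wait delay: spinning on the monotonic clock until a deadline passes. This is not what the program does (it repeats the port write instead), and I have not measured how it sounds; the names now_ns and spin_until are hypothetical. It is offered only as a sketch of a delay that never sleeps and does not depend on how long a port write takes.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Spin without sleeping until the monotonic clock reaches deadline_ns. */
void spin_until(uint64_t deadline_ns)
{
    while (now_ns() < deadline_ns)
        ; /* busy-wait: burns CPU instead of asking the kernel to sleep */
}
```

In a playback loop one would advance the deadline by a fixed step per sample (for example 125000 ns at 8000 Hz) rather than measuring each pause separately, so that small overshoots do not accumulate.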
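One way to cope with optional chunks and differing field layouts is to walk the RIFF chunks by ID and size instead of relying on fixed offsets. The sketch below is not the parser from the program; read_le32 and find_chunk are hypothetical names. It also reads numbers byte by byte, so it does not depend on the sizes of int or short on the host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value byte by byte. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Find the chunk with the given 4-character ID inside a RIFF/WAVE image.
 * Returns the offset of the chunk payload and stores its size in *size,
 * or returns 0 if the buffer is not RIFF/WAVE or the chunk is missing.
 */
size_t find_chunk(const uint8_t *buf, size_t len, const char *id, uint32_t *size)
{
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0)
        return 0;
    size_t pos = 12;
    while (pos + 8 <= len) {
        uint32_t chunk_size = read_le32(buf + pos + 4);
        if (memcmp(buf + pos, id, 4) == 0) {
            *size = chunk_size;
            return pos + 8;
        }
        /* Skip this chunk; chunks are padded to an even number of bytes. */
        pos += 8 + (size_t)chunk_size + (chunk_size & 1u);
    }
    return 0;
}
```

With this approach a fact chunk, or a format chunk that is a couple of bytes longer or shorter, is simply stepped over on the way to the data chunk.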

 struct winsize scrsize;
 ioctl(STDOUT_FILENO, TIOCGWINSZ, &scrsize);
 // scrsize.ws_col - number of columns
 // scrsize.ws_row - number of rows
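A slightly fuller sketch of the same call with the headers it needs and a fallback for the case when standard output is not a terminal (for example, when it is redirected to a file). The function name get_console_size and the 80x24 fallback are assumptions for this example, not part of the program.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Query the terminal size; fall back to 80x24 when the query fails. */
void get_console_size(unsigned *cols, unsigned *rows)
{
    struct winsize ws;
    if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == 0 &&
        ws.ws_col > 0 && ws.ws_row > 0) {
        *cols = ws.ws_col;
        *rows = ws.ws_row;
    } else {
        *cols = 80; /* assumed default, not taken from the program */
        *rows = 24;
    }
}
```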

You can get the program here: playwav.zip (several .WAV files are included in the archive). It compiles without trouble and has been tested on 64-bit systems as well:

 gcc playwav64.c -o playwav64
 chmod +x playwav64

The program takes up to three parameters. The first is the file to play:

 sudo ./playwav64 file.wav

The second is the time multiplier: the larger it is, the lower the pitch. If the second parameter is missing or is not a number, the default value of 650000 is used:

 sudo ./playwav64 file.wav 500000

The third, with any value, tells the program to draw the graph on the screen (without sound):

 ./playwav64 file.wav sw
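The rule for the second parameter (a number, otherwise the default of 650000) could be implemented roughly as follows. This is a guess at the logic, not the code from the program: the name parse_multiplier is hypothetical, and treating zero or negative numbers as invalid is my assumption.

```c
#include <stdlib.h>

#define DEFAULT_MULTIPLIER 650000L /* default value stated in the text */

/* Parse the time multiplier; fall back to the default on any bad input. */
long parse_multiplier(const char *arg)
{
    if (arg == NULL)
        return DEFAULT_MULTIPLIER;
    char *end;
    long value = strtol(arg, &end, 10);
    /* Reject empty input, trailing junk and non-positive values (assumption). */
    if (end == arg || *end != '\0' || value <= 0)
        return DEFAULT_MULTIPLIER;
    return value;
}
```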

Sound is produced only with root privileges. It is better to run without X, although it works under KDE too. If you run it in an SSH session, the sound comes from the physical machine you are connected to.
I tested it as thoroughly as I could, on every computer available to me (it worked everywhere). I may well have messed up a lot in the code, and I apologize for that.

Source: https://habr.com/ru/post/138144/
