
Measuring speech intelligibility: a modulation approach

In previous articles on speech intelligibility, I covered objective methods and the formant approach. This article is the last in the series: it looks at the modulation approach to measuring speech intelligibility, both in communication systems and in rooms.

The modulation method can be dated to 1970. T. Houtgast and H. Steeneken developed a system whose test signal was noise, amplitude-modulated by a fixed-frequency signal with a rectangular envelope. The spectrum of the carrier noise resembled the long-term average spectrum of speech. This made it possible to account for the influence of noise, clipping, and reverberation when assessing intelligibility. Later, a specially designed device made it possible to measure the speech transmission index (STI).

STI


STI quantifies how a transmission path affects speech intelligibility. It is closely related to a channel characteristic called the modulation transfer function (MTF). The MTF measures how well the amplitude modulation of a signal is preserved as the signal passes through the path from input to output.

We will not wade into the physical justification and theoretical derivations. I think it is enough to give the expression for calculating the MTF:

$$ m(F) = \frac{1}{\sqrt{1 + \left( \dfrac{2 \pi F T}{13.8} \right)^2}} \cdot \frac{1}{1 + 10^{-(S/N)/10}} $$

F is the modulation frequency, Hz;
T is the early reverberation time, s;
S/N is the signal-to-noise ratio, dB.
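
A minimal Python sketch of this kind of closed-form MTF estimate: a reverberation factor that depends on F and T, multiplied by a noise factor that depends on S/N. The function name `mtf_analytic` and its argument names are hypothetical, chosen only for illustration, and the constant 13.8 assumes T is given in seconds.

```python
import math


def mtf_analytic(F, T, snr_db):
    """Closed-form MTF estimate.

    F      -- modulation frequency, Hz
    T      -- reverberation time, s (assumption: seconds)
    snr_db -- signal-to-noise ratio, dB
    """
    # Reverberation factor: falls as F * T grows.
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * F * T / 13.8) ** 2)
    # Noise factor: tends to 1 at high S/N and to 0 at low S/N.
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    for F in (0.63, 2.0, 12.5):
        print(F, round(mtf_analytic(F, T=0.5, snr_db=10.0), 3))
```

With no reverberation (T = 0) and S/N = 0 dB, only the noise factor remains and the function returns 0.5.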

The first factor accounts for reverberation, the second for noise. Despite its simplicity, I do not like this formula, if only because it was derived from mathematical models. I think it makes more sense to calculate the MTF using the Schröder formula:

$$ m_k(F) = \frac{\left| \int_0^{\infty} \left[ h_e(t) * h_k(t) \right]^2 e^{-j 2 \pi F t} \, dt \right|}{\int_0^{\infty} \left[ h_e(t) * h_k(t) \right]^2 \, dt} $$

h_e(t) is the impulse response of the system;
h_k(t) is the impulse response of the k-th octave filter;
* denotes convolution.
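
A rough Python sketch of estimating the MTF from an impulse response in the Schröder manner: the magnitude of the Fourier transform of the squared response at the modulation frequency, normalized by the total energy. It assumes the response `h` has already been passed through the octave filter of interest and is sampled at `fs` Hz; the function name `mtf_from_impulse_response` is hypothetical, and the integral is replaced by a plain sum over samples.

```python
import cmath
import math


def mtf_from_impulse_response(h, fs, F):
    """MTF at modulation frequency F (Hz) from a sampled impulse response.

    h  -- band-filtered impulse response samples (assumption: the octave
          filtering was done beforehand)
    fs -- sampling rate, Hz
    """
    energy = [x * x for x in h]           # squared impulse response
    total = sum(energy)                   # denominator: total energy
    if total == 0.0:
        raise ValueError("impulse response has zero energy")
    # Numerator: Fourier transform of the squared response at frequency F.
    acc = sum(e * cmath.exp(-2j * math.pi * F * n / fs)
              for n, e in enumerate(energy))
    return abs(acc) / total


if __name__ == "__main__":
    fs, T = 1000.0, 0.5
    # Synthetic exponential decay whose energy drops by 60 dB in T seconds.
    h = [math.exp(-6.9 * n / (fs * T)) for n in range(int(2 * fs))]
    print(round(mtf_from_impulse_response(h, fs, 2.0), 3))
```

For an ideal exponential decay like the synthetic one in the usage example, the result should come close to the reverberation factor of the closed-form expression, which is a convenient sanity check before feeding in measured responses.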

Now we have everything to evaluate STI using a simplified method in a small room:

  1. We estimate 98 MTF values: 14 modulation frequencies (F = 0.63; 0.8; 1; 1.25; 1.6; 2; 2.5; 3.15; 4; 5; 6.3; 8; 10; 12.5 Hz) in each of the seven octave bands with center frequencies of 125; 250; 500; 1k; 2k; 4k; 8k Hz. Several modulation frequencies are used because the vocal apparatus of each person is unique.
  2. Each MTF value is converted into an effective signal-to-noise ratio (SNR), limited to the range from -15 to +15 dB:

$$ (S/N)_{ef\,k,i} = 10 \log_{10} \frac{m_k(F_i)}{1 - m_k(F_i)} $$

  3. We average the SNR estimates within each octave band:

$$ (S/N)_{ef\,k} = \frac{1}{14} \sum_{i=1}^{14} (S/N)_{ef\,k,i} $$

  4. We calculate the weighted average over the bands:

$$ \overline{(S/N)}_{ef} = \sum_{k=1}^{7} w_k \, (S/N)_{ef\,k} $$

    w_k = 0.13; 0.14; 0.11; 0.12; 0.19; 0.17; 0.14.
  5. We calculate the STI using the ratio below:

$$ STI = \frac{\overline{(S/N)}_{ef} + 15}{30} $$
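
The five steps can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the names `effective_snr` and `sti_from_mtf` are hypothetical, the input is assumed to be a 7 x 14 table of MTF values (one row per octave band, in the band order listed in step 1), and the effective SNR is limited to the range from -15 to +15 dB, as is usual for STI.

```python
import math

# Band weights w_k from step 4, for 125 Hz ... 8 kHz.
WEIGHTS = [0.13, 0.14, 0.11, 0.12, 0.19, 0.17, 0.14]


def effective_snr(m):
    """Step 2: convert one MTF value into an effective SNR in dB."""
    # Keep m strictly inside (0, 1) so the logarithm stays finite.
    m = min(max(m, 1e-12), 1.0 - 1e-12)
    snr = 10.0 * math.log10(m / (1.0 - m))
    # Assumption: limit the result to [-15, +15] dB.
    return max(-15.0, min(15.0, snr))


def sti_from_mtf(mtf):
    """Steps 3-5: mtf is a list of 7 rows, one per octave band."""
    if len(mtf) != len(WEIGHTS):
        raise ValueError("expected one row of MTF values per octave band")
    # Step 3: average the effective SNR within each band.
    band_snr = [sum(effective_snr(m) for m in row) / len(row) for row in mtf]
    # Step 4: weighted average over the bands.
    mean_snr = sum(w * s for w, s in zip(WEIGHTS, band_snr))
    # Step 5: map the range [-15, +15] dB onto [0, 1].
    return (mean_snr + 15.0) / 30.0


if __name__ == "__main__":
    # Illustrative input: the same MTF value everywhere.
    table = [[0.5] * 14 for _ in range(7)]
    print(sti_from_mtf(table))
```

An MTF of 0.5 everywhere corresponds to an effective SNR of 0 dB in every band, which lands in the middle of the STI scale.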






RASTI & STITEL


RASTI (rapid STI) is a simplified version of the STI method that takes into account the contribution to modulation transfer of only two octave bands, with center frequencies of 500 Hz and 2 kHz. The modulation frequencies are 1; 2; 4; 8 Hz for the 500 Hz octave band and 0.7; 1.4; 2.8; 5.6; 11.2 Hz for the 2 kHz octave band. Once the MTF has been calculated for these frequencies and bands, the rest of the calculation follows the algorithm above.
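
A short Python sketch of how the RASTI variant might be computed from nine MTF values. The text does not say how the two bands are combined, so this sketch assumes the nine effective SNR values are simply averaged with equal weight; the names `RASTI_MOD_FREQS`, `effective_snr`, and `rasti` are hypothetical.

```python
import math

# Modulation frequencies (Hz) per octave band center frequency (Hz).
RASTI_MOD_FREQS = {
    500: [1.0, 2.0, 4.0, 8.0],
    2000: [0.7, 1.4, 2.8, 5.6, 11.2],
}


def effective_snr(m):
    """Effective SNR in dB for one MTF value, limited to [-15, +15] dB."""
    m = min(max(m, 1e-12), 1.0 - 1e-12)
    return max(-15.0, min(15.0, 10.0 * math.log10(m / (1.0 - m))))


def rasti(mtf_by_band):
    """mtf_by_band maps a band center frequency to its list of MTF values."""
    snrs = []
    for band, freqs in RASTI_MOD_FREQS.items():
        values = mtf_by_band[band]
        if len(values) != len(freqs):
            raise ValueError(f"band {band} Hz needs {len(freqs)} MTF values")
        snrs.extend(effective_snr(m) for m in values)
    # Assumption: equal-weight mean of all nine values.
    mean_snr = sum(snrs) / len(snrs)
    return (mean_snr + 15.0) / 30.0


if __name__ == "__main__":
    # Illustrative MTF values only.
    print(rasti({500: [0.5] * 4, 2000: [0.5] * 5}))
```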

STITEL (STI for telecommunication systems) is a simplified version of STI in which only one modulation frequency is used in each of the seven octave bands. The carrier noise for each octave band is half an octave wide (to avoid affecting adjacent bands), and all bands are emitted simultaneously. Because of these simplifications, the method does not account for the effects of reverberation and non-linear distortion.

Almost end


And now the catch: everything above applies to Western languages, English in particular. The reason is that STI results agree well with numerous subjective intelligibility assessments for English speech. For Russian and Ukrainian speech, the agreement is poor. It is therefore better to use the following technique:



(S/N)_ef,k is the averaged estimate of the effective signal-to-noise ratio in the k-th frequency band;
p_k is the probability that formants occur in the k-th frequency band.

The remaining steps are covered in some detail in the article on the formant approach, which also describes some measures for obtaining more accurate results.

Now exactly the end


Basement

  1. Didkovsky V.S., Didkovskaya M.V., Prodeus A.N. Acoustic examination of speech communication channels: monograph. Kiev, 2008. 420 p.
  2. There is a possibility to give a quote to the DBC, Techron, Div. Crown International, Inc., Elkhart, Indiana, 46517, USA


Source: https://habr.com/ru/post/130682/

