I. Signal preprocessing
Applying a window transform
The original digitized data is divided into frames. Each frame is multiplied by a scaled Hann window according to formula (2). The Hann window function is:

(1)
The scaled version of the Hann window function is:

(2)
The transition to the frequency domain is carried out by applying the discrete Fourier transform (DFT):

(3)
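The formula images for (1) to (3) did not survive, so the following is only a minimal sketch of the windowing and transform steps. It assumes the textbook symmetric Hann window and an unnormalized DFT. The function names and the `scale` parameter are hypothetical; the actual scale factor of formula (2) and any DFT normalization have to be taken from the original formulas.

```python
import cmath
import math


def hann_window(n_frame, scale=1.0):
    """Symmetric Hann window of length n_frame, multiplied by `scale`.

    `scale` stands in for the scaling of formula (2); its value is an
    assumption and must be taken from the original formula.
    """
    return [
        scale * 0.5 * (1.0 - math.cos(2.0 * math.pi * n / (n_frame - 1)))
        for n in range(n_frame)
    ]


def windowed_dft(frame, scale=1.0):
    """Window one frame and return its DFT (naive O(N^2), for illustration)."""
    n_frame = len(frame)
    window = hann_window(n_frame, scale)
    xw = [w * x for w, x in zip(window, frame)]
    return [
        sum(xw[n] * cmath.exp(-2j * math.pi * k * n / n_frame)
            for n in range(n_frame))
        for k in range(n_frame)
    ]
```

For real frame sizes an FFT routine would replace the naive double loop; the loop is used here only to keep the sketch dependency-free.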
Model of the outer and middle ear
The frequency response of the outer and middle ear is calculated using the following formula:

(4)
From formula (4), the vector of weighting coefficients is calculated as follows:

(5)
Using the weights from (5), the weighted DFT energy is calculated:

(6)
Decomposition into critical bands
Below are the formulas for the conversion to the Bark scale (7) and the inverse conversion (8):

(7)
where z is measured in Barks.

(8)
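As an illustration of conversions (7) and (8), here is a minimal sketch that assumes the Bark mapping used in ITU-R BS.1387, z = 7 asinh(f / 650). Since the formula images are missing from this text, the constants 7 and 650 are an assumption rather than a quotation, and the function names are hypothetical.

```python
import math


def hz_to_bark(f_hz):
    """Frequency in Hz -> critical-band rate z in Barks (assumed mapping)."""
    return 7.0 * math.asinh(f_hz / 650.0)


def bark_to_hz(z_bark):
    """Inverse of hz_to_bark: critical-band rate in Barks -> frequency in Hz."""
    return 650.0 * math.sinh(z_bark / 7.0)
```

The two functions are exact inverses of each other, so band edges defined on the Bark scale can be mapped back to Hz without accumulating error.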
Frequency bands
Frequency bands are determined by specifying the lower, center and upper frequencies of each band. On the Bark scale these values are given as follows:

(9)
The inverse transformation is performed by the following formulas:

(10)
The index i takes the values 1, 2, ...,

.
Band energy
For the i-th frequency band, the energy contribution of the k-th DFT bin is calculated by the following formula:

(11)
Then the energy of the i-th frequency band is equal to:

(12)
Below is the final formula for the energy of the i-th frequency band:

(13)
Internal ear noise
To account for the internal noise of the ear itself, an additive offset is introduced into the energy of each frequency band:

(14)
where the internal noise is modeled as follows:

(15)
The energies

are referred to below as pitch patterns.
Energy spreading within one frame
The spreading of energy along the Bark scale is calculated as follows:

(16)
where

(17)
The function S(i, l, E) has the following form:

(18)
where

(19)
Below are the formulas for calculating the terms

and

:

(20)
and

(21)
The energies

are the unsmeared excitation patterns.
Energy filtering
Let n be the frame index (frames are indexed starting from n = 0). The energy of the n-th frame given by formula (16) is then denoted as:

Energy filtering is performed according to the following formula:

(22)
where

is the time constant of the energy decay. The initial filtering condition is:

The resulting values

are the excitation patterns.
Time constants
The time constant for filtering the i-th band is calculated as follows:

(23)

can be calculated as:

(24)
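The formula images for (23) and (24) are missing, so the sketch below relies on an assumption: it uses the forms given in ITU-R BS.1387, where the time constant shrinks with the band center frequency and is then turned into a one-pole filter coefficient at the frame rate. The parameter names `tau_100` and `tau_min`, the function name, and the numeric values in the usage lines are hypothetical placeholders, not values quoted from this text.

```python
import math


def smoothing_coefficient(fc_hz, tau_100, tau_min, frame_rate):
    """One-pole filter coefficient for a band with center frequency fc_hz.

    Assumed forms (not quoted from this text):
        tau   = tau_min + (100 / fc) * (tau_100 - tau_min)
        alpha = exp(-1 / (frame_rate * tau))
    """
    tau = tau_min + (100.0 / fc_hz) * (tau_100 - tau_min)
    return math.exp(-1.0 / (frame_rate * tau))


# Usage with placeholder values: a 1024-sample frame step at 48 kHz.
frame_rate = 48000 / 1024
alpha_low = smoothing_coefficient(100.0, 0.030, 0.008, frame_rate)
alpha_high = smoothing_coefficient(10000.0, 0.030, 0.008, frame_rate)
```

Under these assumptions low-frequency bands get a coefficient closer to 1, meaning slower decay, than high-frequency bands.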
II. Pattern processing
Figure 4 below shows the scheme of the preprocessing calculations described in the previous chapter.

Figure 4. Signal preprocessing scheme
The subscripts R and T denote the original and recovered audio signals, respectively. The index k is the frequency band index (there are 109 bands in total), and the index n is the frame number. For the recursive formulas at this stage and the next one (stage III), zero initial conditions are always used.
Processing of excitation patterns
The inputs to this stage are the excitation patterns

and

calculated by formula (22) for the original and test audio signals, respectively.
Correction of excitation patterns
First, both audio signals are filtered by the formula:

(25)
The time constant

is calculated by formulas (23) and (24), but with

,

. The initial condition for filtering is set to 0.
Next, the correction factor is calculated:

(26)
The excitation patterns are then corrected as follows:

(27)
Adaptation of excitation patterns
Using the same time constants and initial conditions as in the correction of the excitation patterns, the output signals calculated by formula (27) are smoothed according to the following formulas:

(28)
Based on the ratio between the values calculated in (28), a pair of auxiliary signals is calculated:

(29)
If both the numerator and the denominator in formula (29) are equal to zero, the following rule applies:

.
If k = 0, then

To form the pattern correction factors, the auxiliary signals are filtered using the same time constants and initial conditions as in (25):

(30)
where

(31)

(32)
As the final result of this stage, the spectrally adapted patterns are obtained on the basis of formula (30):

(33)
Processing of modulation patterns
The inputs to this stage are the unsmeared excitation patterns

and

calculated by formula (16) for the original and test audio signals, respectively. The purpose of this stage is to calculate the modulation measures of the spectral envelopes.
First, the average loudness is calculated:

(34)
Next, the following differences are calculated:

(35)
The time constants and initial conditions are the same as in the previous section.
The modulation measures of the spectral envelopes are calculated as follows:

(36)
Loudness calculation
Loudness patterns are calculated according to the following formulas:

(37)
Where

(38)
and

(39)
The parameter c = 1.07664.
The total loudness values for both signals are calculated as follows:

(40)
III. Calculation of the output values of the psychoacoustic model
The outputs of chapter I are used to calculate the outputs of chapter II according to the diagram below (see Figure 5).

Figure 5. Pattern processing scheme
In turn, the values from chapter II are used to calculate the output variables of the psychoacoustic model (see Table 1 and Figure 6).

Figure 6. Scheme for calculating the output variables of the psychoacoustic model
In total, the values of 11 variables of the psychoacoustic model are calculated. They are listed in Table 2.

Table 2. Output variables of the psychoacoustic model
For two-channel audio signals, the variable values for each channel are calculated separately, and then averaged. The values of all variables (except for the values of the ADBB and MFPDB variables) for each signal channel are calculated independently of the second channel.
General description of the parameter calculation
Each output variable of the model is a scalar obtained by averaging, over all frames, the functions of time and frequency computed at the previous step.
The values to be averaged must lie within the limits determined by the following condition: the beginning or end of the data to be averaged is the first position, counted from the beginning or from the end of the sequence of audio signal amplitudes, at which the sum of five consecutive absolute amplitude values exceeds 200 in any of the audio channels. Frames that lie outside these bounds are ignored when averaging. The threshold value 200 applies when the amplitudes of the input audio signals are normalized to the range from -32768 to +32767. Otherwise, the threshold value

is calculated as follows:
(41)
where

is the maximum amplitude of the audio signal.
From here on, the frame index n starts from zero at the first frame that satisfies the boundary condition with the threshold

and N is the number of frames up to the last frame that satisfies this condition.
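The boundary condition above can be sketched as follows. This is a minimal illustration assuming 16-bit style amplitudes and the default threshold of 200. The function name and return convention are hypothetical, and mapping the sample positions onto frame indices is left out.

```python
def find_data_bounds(channels, threshold=200.0, run=5):
    """Return (first_sample, last_sample) of the data to be averaged.

    `channels` is a list of equal-length amplitude sequences. A position
    qualifies when the sum of `run` consecutive absolute amplitudes
    exceeds `threshold` in any channel. Returns None if no position does.
    """
    n = min(len(ch) for ch in channels)

    def exceeds(pos):
        return any(
            sum(abs(v) for v in ch[pos:pos + run]) > threshold
            for ch in channels
        )

    hits = [pos for pos in range(n - run + 1) if exceeds(pos)]
    if not hits:
        return None
    # The end bound is the last sample of the last qualifying run.
    return hits[0], hits[-1] + run - 1


# Usage: silence, a burst of amplitude 100, silence again.
signal = [0] * 10 + [100] * 10 + [0] * 10
bounds = find_data_bounds([signal])
```

For signals that are not normalized to the 16-bit range, the `threshold` argument would be replaced by the value from formula (41).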
Windowed modulation difference 1 (WinModDiff1B)
Below is the formula for calculating the instantaneous modulation difference:

(42)
The instantaneous modulation difference is averaged over all frequency bands

in accordance with the following formula:

(43)
The final value of the output variable is obtained by averaging the result of formula (43) with a sliding window of length L = 4 (85 ms, since each frame step is 1024 samples):

(44)
Here the so-called delay averaging is used: the first 0.5 seconds of the signal are excluded from the calculation. The number of skipped frames is:

(45)
In formula (45), the rounding operation discards the fractional part.
Thus, in formula (44), the frame index covers only the frames that come after the 0.5-second delay.
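The frame arithmetic behind the delay and the sliding window can be checked with a short sketch. The 48 kHz sampling rate is an assumption; it is not stated in the text but is consistent with the quoted 85 ms for four steps of 1024 samples. The function names are hypothetical.

```python
import math


def delay_frames(delay_s=0.5, sample_rate=48000, step=1024):
    """Number of frames skipped by delay averaging (fractional part dropped)."""
    return math.floor(delay_s * sample_rate / step)


def window_duration_ms(length=4, sample_rate=48000, step=1024):
    """Duration in milliseconds of a sliding window of `length` frame steps."""
    return 1000.0 * length * step / sample_rate


skipped = delay_frames()          # 0.5 * 48000 / 1024 = 23.4375 -> 23
window_ms = window_duration_ms()  # 4 * 1024 / 48000 s, about 85.3 ms
```

With a different sampling rate both numbers change, so they are best computed rather than hard-coded.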
Average modulation difference 1 (AvgModDiff1B)
The value of this output variable of the psychoacoustic model is calculated by the following formula:

(46)
Where

(47)
Delay averaging is also used to calculate this value.
Average modulation difference 2 (AvgModDiff2B)
First, the instantaneous modulation difference is calculated by the formula:

(48)
Then, the modulation difference averaged over the frequency bands is calculated:

(49)
The final variable value of the psychoacoustic model is calculated as follows:

(50)
Where

(51)
Delay averaging is also used to calculate this value.
Noise loudness (RmsNoiseLoudB)
Below is the formula for the instantaneous noise loudness:

(52)
Where

(53)
Where:

(54)

(55)

(56)
but
Further, if the instantaneous loudness is less than 0, it is set to 0:

(57)
The final value of the output variable of the psychoacoustic model is obtained by averaging the instantaneous loudness:

(58)
Delay averaging is used to calculate this value. In addition, a loudness threshold determines the instantaneous noise loudness value from which averaging starts. Averaging therefore starts from the first value that satisfies the loudness threshold condition, but no later than 0.5 seconds from the beginning of the signal (in accordance with delay averaging).
Loudness threshold condition
The instantaneous noise loudness values at the beginning of both signals (original and test) are ignored until 50 ms have passed after the total loudness of one of the signals exceeds the threshold value of 0.1.
The condition of exceeding the threshold can be represented as:

(59)
The following formula is used to calculate the number of frames to be skipped after exceeding the threshold:

(60)
Bandwidth of the original and recovered audio signals (BandwidthRefB and BandwidthTestB)
The bandwidth calculation for the original and recovered audio signals is described in terms of operations on the DFT outputs expressed in decibels (dB). For each frame, the following operations are performed:
• For the recovered signal: the largest component above 21.6 kHz is found. This value is called the threshold level.
• For the original signal: searching downwards from 21.6 kHz, the first value that is 10 dB above the threshold level is found. The frequency corresponding to this value is called the bandwidth of the original signal.
• For the recovered signal: searching downwards from the bandwidth of the original signal, the first value that exceeds the threshold level by 5 dB is found. The frequency corresponding to this value is called the bandwidth of the recovered signal.
If the frequency found for the original signal does not exceed 8.1 kHz, the bandwidth for this frame is ignored.
The bandwidths for all frames are expressed as DFT bins. The DFT bin for the n-th frame is denoted as

for the original signal and as

for the recovered signal. The final values of the two output variables, the bandwidths of the original and recovered signals, are calculated by the following formulas, respectively:

(61)

(62)
where the summation is carried out only over those frames in which the DFT bin found corresponds to a frequency above 8.1 kHz.
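The per-frame search and the final averaging might look like the sketch below. It works on DFT magnitudes in dB indexed by bin. The names `k_top` and `k_min` (the bins corresponding to 21.6 kHz and 8.1 kHz) are hypothetical parameters, and returning bin 0 when nothing is found is an assumption. A plain mean over the kept frames is also assumed, since formulas (61) and (62) are not reproduced in this text.

```python
def frame_bandwidths(ref_db, test_db, k_top):
    """Return (bw_ref, bw_test) as DFT bin indices for one frame."""
    # Threshold level: largest test-signal component at or above k_top.
    threshold = max(test_db[k_top:])

    bw_ref = 0
    for k in range(k_top - 1, -1, -1):      # search downwards from k_top
        if ref_db[k] >= threshold + 10.0:
            bw_ref = k
            break

    bw_test = 0
    for k in range(bw_ref, -1, -1):         # search downwards from bw_ref
        if test_db[k] >= threshold + 5.0:
            bw_test = k
            break
    return bw_ref, bw_test


def mean_bandwidths(per_frame, k_min):
    """Average (bw_ref, bw_test) over frames whose bw_ref exceeds k_min."""
    kept = [(r, t) for r, t in per_frame if r > k_min]
    if not kept:
        return None
    n = len(kept)
    return sum(r for r, _ in kept) / n, sum(t for _, t in kept) / n


# Usage on a toy 8-bin frame with k_top = 6.
ref = [50, 50, 50, 40, -5, -5, -5, -5]
test = [50, 50, 20, -30, -30, -30, -10, -20]
bw = frame_bandwidths(ref, test, k_top=6)
```

In the toy frame the threshold level is -10 dB, so the original-signal search stops at bin 3 and the recovered-signal search at bin 2.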
Ratio of the noise level to the masking threshold (TotalNMRB)
The masking threshold is calculated using the following formula:

(63)
Where

(64)
The noise level is calculated as follows:

(65)
where k denotes the DFT bin index.
The ratio of the noise level to the masking threshold in the k-th frequency band is expressed by the following formula:

(66)
The final ratio of the noise level to the masking threshold (in decibels) is calculated as:

(67)
Relative number of distorted frames (RelDistFramesB)
The maximum noise-to-mask ratio of a frame is calculated as follows:

(68)
A frame is considered distorted if its maximum noise-to-mask ratio exceeds 1.5 dB.
The final value of the output variable of the psychoacoustic model is the ratio of the number of distorted frames to the total number of frames.
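A minimal sketch of this count, assuming the per-band noise-to-mask ratios of formula (66) are available as positive linear power ratios and that formula (68) converts the per-frame maximum to decibels with 10 log10. Both points are assumptions because the formula images are missing, and the function name is hypothetical.

```python
import math


def rel_dist_frames(nmr_frames, threshold_db=1.5):
    """Fraction of frames whose maximum noise-to-mask ratio exceeds threshold_db.

    `nmr_frames` is a list of frames, each a list of positive linear
    noise-to-mask ratios, one per frequency band.
    """
    distorted = 0
    for bands in nmr_frames:
        peak_db = 10.0 * math.log10(max(bands))
        if peak_db > threshold_db:
            distorted += 1
    return distorted / len(nmr_frames)


# Usage: frame maxima of about 3.0 dB, -3.0 dB and 1.8 dB.
ratio = rel_dist_frames([[1.0, 2.0], [0.1, 0.5], [1.5, 1.0]])
```

Two of the three toy frames exceed 1.5 dB, so the ratio is 2/3.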
Maximum probability of detecting distortion (MFPDB)
First, the asymmetric excitation is calculated:
(69)
Where

(70)
Next, the detection step size is calculated:

(71)
Where

(72)
The probability of detection is calculated as follows:
(73)
where b is calculated as:

(74)
The number of steps above the detection probability threshold is calculated:
(75)
Quantities (73) and (75) are calculated for each channel of the signal. For each frequency band and frame, the total detection probability and the total number of steps above the threshold are taken as the larger of the values over all channels:

(76)
where indices 1 and 2 denote the channel number.
For single-channel signals, the above values are calculated as:

(77)
The following computational procedure is performed:

(78)
Where

and the initial condition is zero.
The maximum probability of detecting distortion is calculated by the recurrence formula:

(79)
The final value of the output variable of the psychoacoustic model is calculated as follows:

(80)
Average block distortion (ADBB)
First, the sum of the total number of steps above the detection threshold is calculated:

(81)
The summation is carried out over all values for which

The final characteristic is:

(82)
Harmonic structure of the error (EHSB)
The DFT outputs for the original and recovered signals are denoted as

and

respectively.
The following quantity is calculated:

(83)
A vector of length M is formed from the values of D [k]:

(84)
Normalized autocorrelation is calculated by the formula:

(85)
Where
Let C[l] = C[l, 0]. Next, the following is calculated:

(86)
When calculating (85), if the signals are identical, the normalized autocorrelation is set to one to avoid division by zero.
A window function of the following form is introduced:

(87)
The window transform (87) is applied to the normalized autocorrelation:

(88)
Where

(89)
The power spectrum is calculated by the formula:

(90)
The search for the maximum peak of the power spectrum starts with k = 1 and ends when

or

The maximum peak value found is denoted as

Then the final value of the output variable of the psychoacoustic model is calculated using the following formula:

(91)
Low-energy frames are excluded when calculating this value. To identify low-energy frames, a threshold value is introduced:

(92)
Where

for amplitudes stored as a 16-bit integer.
The frame energy is estimated using the following formula:

(93)
When calculating the harmonic structure of the error, the frame is ignored if:

(94)
IV. Normalization of the output variables of the psychoacoustic model
The normalization of the output variables of the psychoacoustic model obtained at the previous step is performed in accordance with the following formula:

(95)
where

is the value of the i-th output variable of the psychoacoustic model, and the values

and

are shown in Table 3 below.

Table 3. Constants for normalizing the values of the output variables of the psychoacoustic model
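Since the text describes (95) in terms of a per-variable pair of constants from Table 3, the normalization is presumably a linear min-max scaling. The sketch below assumes exactly that. The names `a_min` and `a_max` and the numeric values in the usage lines are hypothetical placeholders and are not the contents of Table 3.

```python
def normalize_movs(movs, a_min, a_max):
    """Linearly scale each model output variable with its own constants.

    Assumed form of (95): (M[i] - a_min[i]) / (a_max[i] - a_min[i]).
    The result is not clipped, so values outside [a_min, a_max] map
    outside [0, 1].
    """
    return [
        (m - lo) / (hi - lo)
        for m, lo, hi in zip(movs, a_min, a_max)
    ]


# Usage with made-up constants for three variables.
scaled = normalize_movs([5.0, 0.5, -2.0], [0.0, 0.0, -4.0], [10.0, 1.0, 0.0])
```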
V. Evaluation of the quality of the recovered signal using an artificial neural network
(96)
where bmin = −3.98 and bmax = 0.22, and the function sig(x) is an asymmetric sigmoid:

(97)
The value

is calculated as follows:

(98)
where

is the normalized value of the i-th output variable, I is the number of output variables (equal to 11), J is the number of neurons in the hidden layer (equal to 3), and

are the weights and biases of the neural network, shown in Tables 4-6 below.

Table 4. Neural network weights

Table 5. Neural network biases

Table 6. Weights and biases of the neural network
The resulting metric value (PEAQ) is a real number in the interval [-3.98; 0.22].
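The mapping from the normalized variables to the final score can be sketched as a network with one hidden layer, following the description of (96) to (98): I inputs, J hidden neurons with a sigmoid activation, a linear output, and a final sigmoid scaled to [bmin, bmax]. The logistic form of sig(x), the function and argument names, and the all-zero weights in the usage lines are assumptions for illustration; the real weights come from Tables 4-6, which are not reproduced in this text.

```python
import math


def sig(x):
    """Logistic sigmoid, assumed here as the form of formula (97)."""
    return 1.0 / (1.0 + math.exp(-x))


def peaq_score(mov_scaled, w_in, b_in, w_out, b_out, b_min=-3.98, b_max=0.22):
    """Map normalized output variables to the final quality score.

    w_in[i][j]: weight from input i to hidden neuron j
    b_in[j]:    bias of hidden neuron j
    w_out[j]:   weight from hidden neuron j to the output
    b_out:      output bias
    """
    n_in, n_hidden = len(mov_scaled), len(w_out)
    hidden = [
        sig(b_in[j] + sum(w_in[i][j] * mov_scaled[i] for i in range(n_in)))
        for j in range(n_hidden)
    ]
    distortion_index = b_out + sum(w_out[j] * hidden[j] for j in range(n_hidden))
    return b_min + (b_max - b_min) * sig(distortion_index)


# Usage with I = 11 inputs, J = 3 hidden neurons and placeholder zero weights.
I, J = 11, 3
score = peaq_score([0.5] * I, [[0.0] * J for _ in range(I)], [0.0] * J, [0.0] * J, 0.0)
```

With all weights set to zero the distortion index is 0, so the score lands at the midpoint of the interval, -1.88.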