Correlation analysis of voice communications in Russian language in the airspace of the Russian Federation

The spectral characteristics of audio recordings of typical radio exchange phrases used by air traffic controllers and civil aviation pilots are investigated. Based on the obtained spectral characteristics, experimental covariance functions are constructed that show the statistical relationships within a single voice message in the time domain. The main focus is on the possibility of describing speech messages with mathematical models of random sequences.


Introduction
The rapid development of transport systems, including air, land, and water vehicles, is currently accompanied by the active use of automated and automatic systems of various kinds. However, these systems are mostly oriented toward autonomous vehicles. Today, such vehicles are used only for highly specialized tasks and usually operate fully autonomously, i.e. with no person on board. Consequently, they a priori have lower safety requirements.
The situation is completely different in passenger transportation, where radio communication remains an important element of flight safety. In accordance with the Federal Aviation Rules of Russia [1], the Russian language may be used on the territory of the Russian Federation. On the one hand, this facilitates mutual understanding between pilots and controllers; on the other hand, it requires both pilots and controllers to be able to switch quickly from one language to another. In noisy environments, voice recognition accuracy can be improved by dedicated digital signal processing systems. In this paper, we propose methods for describing the voice messages of some typical phrases using mathematical models of random processes [2][3][4], which can also be used in various processing algorithms.

Modern mathematical models for representing time sequences
The heterogeneity and multidimensionality inherent in real big data remain outside the scope of most mathematical models, which either describe spatially homogeneous signals or are too complex to serve as a basis for subsequent processing and prediction algorithms. The most relevant approach to describing and processing multidimensional inhomogeneous random fields and their time sequences is a mixture of deep Gaussian processes. These processes have a characteristic nested structure in which the model parameters are themselves realizations of a lower-level process. Good approximation quality was demonstrated in [5][6][7] when real processes were described by deep Gaussian models, and [5] showed that non-stationary random processes can be described with this approach. Unfortunately, the published work did not resolve how to describe the correlation, and the dynamics of that correlation, within a single model layer. As a result, a sample of a deep Gaussian process depends only on the hidden nested random variables at that point and on auxiliary random terms, but not on neighboring samples. This significantly narrows the scope of such models. Therefore, at the initial stage of studying mathematical models suitable for describing speech messages, we use autoregressive models, including autoregressive models with multiple roots of the characteristic equation [8,9]. These models are well studied, have properties that are important for implementation (for example, quasi-isotropy and a minimal number of required parameters), and can describe not only random processes but also random fields of arbitrary dimension. They also allow the construction of subsequent processing algorithms that are qualitatively superior to algorithms for most random fields based on conventional autoregressions.
* Corresponding author: nikita-and-nov@mail.ru
In addition, more complex doubly stochastic models, which are described in [10][11][12], can be used to describe inhomogeneous signals.
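To make the idea of a multiple root concrete, the following Python sketch builds the coefficients of an m-th order autoregression x_i = a_1 x_{i-1} + … + a_m x_{i-m} + b ξ_i whose characteristic equation has a single root ρ of multiplicity m. It is an illustrative construction under that assumption, not necessarily the exact parametrization used in [8,9]; the function name and the parameters m and rho are hypothetical.

```python
from math import comb


def multiple_root_ar_coefficients(m: int, rho: float) -> list[float]:
    """Coefficients a_1..a_m of x_i = sum_k a_k x_{i-k} + b*xi_i whose
    characteristic polynomial 1 - sum_k a_k z^{-k} equals (1 - rho z^{-1})^m,
    i.e. whose characteristic equation has one root rho of multiplicity m."""
    if not 0.0 < abs(rho) < 1.0:
        raise ValueError("|rho| must lie in (0, 1) for a stationary model")
    # Expanding (1 - rho z^{-1})^m = sum_k C(m, k) (-rho)^k z^{-k} and matching
    # it term by term with 1 - sum_k a_k z^{-k} gives a_k = -C(m, k) (-rho)^k.
    return [-comb(m, k) * (-rho) ** k for k in range(1, m + 1)]


# Example: a second-order model with a double root rho = 0.9
print(multiple_root_ar_coefficients(2, 0.9))
```

For m = 2 this reduces to the familiar model x_i = 2ρ x_{i-1} − ρ² x_{i-2} + b ξ_i, so a single parameter ρ controls the whole correlation shape, which illustrates the small number of parameters mentioned above.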

Spectral characteristics of typical radio messages
The results presented in this paper were obtained by recording 20 typical radio exchange phrases in Russian with the Audacity editor. Table 1 lists the phrases and the distribution of the recordings among the male speakers. Figure 1 shows the spectra of the phrase "Preflight communication check" (transliterated as "Predpoletnaya proverka svyazi"), which were also computed in Audacity. The signal level in [dB] is plotted along the Y axis, and the frequency in [Hz] along the X axis. Analysis of Figure 1 shows that the same phrase recorded by different speakers generally has different characteristics in the frequency domain. However, the spectra also show pronounced similarities: harmonic components are present in several words, the spectral components generally attenuate with increasing frequency, and the peaks and valleys of the spectra lie close to one another.
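A rough programmatic counterpart of the spectra in Figure 1 can be sketched as follows. This is an illustration under stated assumptions: it uses a single Hann-windowed FFT frame over the whole recording, whereas Audacity's Plot Spectrum typically averages windowed segments of a user-selected size, so its curves will differ in detail. The function name and the synthetic two-tone signal are hypothetical stand-ins for the real recordings.

```python
import numpy as np


def magnitude_spectrum_db(signal: np.ndarray, fs: float, ref: float = 1.0):
    """Return (frequencies in Hz, magnitude spectrum in dB relative to `ref`)
    of a real-valued recording, using one Hann-windowed FFT frame."""
    x = np.asarray(signal, dtype=float)
    window = np.hanning(len(x))
    spectrum = np.fft.rfft(x * window)
    # Scale so that a sinusoid of amplitude A centred on a bin peaks at about A
    magnitude = 2.0 * np.abs(spectrum) / window.sum()
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # A small floor avoids log10(0) in empty bins
    level_db = 20.0 * np.log10(np.maximum(magnitude, 1e-12) / ref)
    return freqs, level_db


# Synthetic stand-in for a recording: 440 Hz and 1200 Hz tones, 1 s at 16 kHz
fs = 16000.0
t = np.arange(int(fs)) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1200 * t)
freqs, level_db = magnitude_spectrum_db(x, fs)
print(freqs[np.argmax(level_db)])
```

Replacing the synthetic signal with samples read from an exported recording would give curves comparable in shape, though not identical, to those in Figure 1.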

Table 1 (rows 17-20)
No.   Phrase                              Distribution of records among speakers
17    Course capture                      3-4-3
18    Landing allowed                     3-3-4
19    Parking lot one one on RD three     4-3-3
20    Aeroflot one five four              3-4-3

All phrases in Table 1 were recorded in Russian. Thus, what was recorded, processed, and studied is the original Russian phrase, not the English translation given in the table. This is because radio communication on domestic flights in the airspace of the Russian Federation is often conducted in Russian. The graphs in Figure 2 were compared with the source data for each realization. This comparison shows that, on the whole, averaging did not severely distort the shape of the signal spectra, which is explained by the features common to each sound; in the future, this may serve as a basis for developing machine learning algorithms that recognize these speech messages.
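The text does not state how the spectra of different recordings were averaged. One plausible sketch, assuming all spectra share a frequency grid and are averaged in the power domain rather than directly in dB, is shown below, together with an RMS deviation in dB as a hypothetical measure of how far each recording's spectral shape departs from the average. Both function names are hypothetical.

```python
import numpy as np


def average_spectra_db(spectra_db: np.ndarray) -> np.ndarray:
    """Average several dB spectra of one phrase (one row per recording, all
    on the same frequency grid) by averaging their power, then converting
    the result back to dB."""
    levels = np.asarray(spectra_db, dtype=float)
    mean_power = np.mean(10.0 ** (levels / 10.0), axis=0)
    return 10.0 * np.log10(mean_power)


def rms_deviation_db(spectra_db: np.ndarray, average_db: np.ndarray) -> np.ndarray:
    """Hypothetical shape-comparison metric: RMS difference in dB between
    each recording's spectrum and the averaged spectrum."""
    diff = np.asarray(spectra_db, dtype=float) - np.asarray(average_db, dtype=float)
    return np.sqrt(np.mean(diff ** 2, axis=-1))


# Two toy "recordings" on a two-bin frequency grid
spectra = np.array([[-10.0, -20.0], [-10.0, -30.0]])
avg = average_spectra_db(spectra)
print(avg, rms_deviation_db(spectra, avg))
```

Averaging directly in dB is an equally simple alternative; it weights quiet bins more heavily, so the choice affects how strongly deep spectral valleys survive in the average.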

Correlation representation of typical phrases and identification of parameters
However, all the values considered so far characterize the signal level in [dB], whereas changes of the signal over time are more conveniently considered in amplitude units, for example, [V]. Given the reference voltage level U_0 = 1 mV, we use the expression

$$ S = 20 \lg \frac{U}{U_0}, \tag{1} $$

where S is the spectrum in [dB], U is the spectrum in [V], and U_0 is the reference voltage.
Using formula (1), the signal level can be expressed in [V]:

$$ U = U_0 \cdot 10^{S/20}. \tag{2} $$

Next, we use the following relationship between the amplitude spectrum of a signal and its energy spectrum (power spectral density):

$$ G(F) = \left| U(F) \right|^{2}, \tag{3} $$

where G(F) is the signal energy spectrum, |·| denotes the modulus, and U(F) is the signal amplitude spectrum in [V] obtained from (2). Finally, the inverse Fourier transform gives the covariance function (CF):

$$ R_x(\tau) = \frac{1}{\sigma^{2}} \int_{-\infty}^{\infty} G(F)\, e^{j 2\pi F \tau}\, dF, \tag{4} $$

where σ² is the maximum value of the transform (attained at τ = 0), so that R_x(0) = 1. Analysis of the curves in Figure 3 shows that most of the obtained CFs decay. The numerous extrema are associated with the quasi-periodicity that may be present in a speech signal. Note also that autoregressive sequences with similar CFs can be formed. To identify the model parameters, we write an autoregressive model of arbitrary order:

$$ x_i = a_1 x_{i-1} + a_2 x_{i-2} + \dots + a_m x_{i-m} + b\,\xi_i, \tag{5} $$
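where m is the order of autoregression, a_1, …, a_m are the model coefficients, b is a coefficient that sets the variance of the process, and ξ_i is a white noise sequence with zero mean and unit variance.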
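The chain (2)-(4) can be sketched numerically as follows, assuming the spectrum is available in dB as a one-sided array on a uniform frequency grid from 0 Hz to half the sampling rate (for example, exported from Audacity). The discrete inverse FFT stands in for the integral in (4); because the result is divided by its value at zero lag, the reference voltage U_0 cancels and is kept only to mirror (2). The function name is hypothetical.

```python
import numpy as np

U0 = 1e-3  # reference voltage U_0 = 1 mV, as in the text


def covariance_from_db_spectrum(level_db: np.ndarray, max_lag: int) -> np.ndarray:
    """Normalized covariance function R_x(k), k = 0..max_lag, computed from a
    one-sided spectrum given in dB on a uniform grid from 0 Hz to fs/2."""
    u = U0 * 10.0 ** (np.asarray(level_db, dtype=float) / 20.0)  # (2): dB -> V
    g = np.abs(u) ** 2                                            # (3): G = |U|^2
    r = np.fft.irfft(g)  # (4): inverse transform of the Hermitian-extended G
    # For G >= 0 the lag-0 value is the maximum, so this plays the role of sigma^2
    return r[: max_lag + 1] / r[0]


# A spectrum falling from 0 dB to -60 dB gives a slowly decaying CF
level_db = np.linspace(0.0, -60.0, 257)
print(covariance_from_db_spectrum(level_db, 5))
```

A flat spectrum returns a CF equal to 1 at lag 0 and 0 elsewhere (white noise), which is a convenient sanity check before applying the function to real recordings.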
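Like sequence (5) itself, its CF obeys the recurrence

$$ R_x(k) = a_1 R_x(k-1) + a_2 R_x(k-2) + \dots + a_m R_x(k-m), \quad k \ge 1. \tag{6} $$

Substituting k = 1, 2, …, m into (6) and using R_x(-k) = R_x(k) leads to the well-known system of Yule-Walker equations [13], which, for example, for a second-order model takes the form

$$ \begin{cases} R_x(1) = a_1 R_x(0) + a_2 R_x(1), \\ R_x(2) = a_1 R_x(1) + a_2 R_x(0). \end{cases} \tag{7} $$

In the general case, the system allows the coefficients a_1, …, a_m to be found from the CF values R_x(0), R_x(1), R_x(2), …, R_x(m). Using the previously found CF values, we compose a system of equations with 12 unknowns:

$$ R_x(k) = \sum_{j=1}^{12} a_j R_x(k-j), \quad k = 1, 2, \dots, 12. \tag{8} $$

An analysis of the obtained dependences shows that the higher the order of the model, the more accurately it conveys the correlation properties of speech messages. However, in some cases (Figure 4 a), lower-order models also provide CFs that are close to the real ones while significantly reducing computational costs.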
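A minimal NumPy sketch of the identification step is given below. It solves the Yule-Walker system for an AR(m) model from CF values R_x(0), …, R_x(m), estimates the noise coefficient b from b² = R_x(0) − Σ a_j R_x(j) (a standard Yule-Walker relation for unit-variance ξ_i that the text does not state), and extends the model CF beyond lag m with recurrence (6) so that it can be compared with the experimental CF. Function names are hypothetical.

```python
import numpy as np


def yule_walker(r: np.ndarray, m: int):
    """Solve R(k) = sum_j a_j R(k - j), k = 1..m, for the AR(m) coefficients
    a_1..a_m given CF values R(0)..R(m). Also return b, the noise scale in
    x_i = sum_j a_j x_{i-j} + b*xi_i for unit-variance xi_i."""
    r = np.asarray(r, dtype=float)
    # Toeplitz matrix with entries R(|k - j|), using R(-k) = R(k)
    idx = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    a = np.linalg.solve(r[idx], r[1 : m + 1])
    b2 = r[0] - a @ r[1 : m + 1]  # innovation variance b^2
    return a, float(np.sqrt(max(b2, 0.0)))


def model_cf(a: np.ndarray, r: np.ndarray, n_lags: int) -> np.ndarray:
    """CF of the identified AR model for lags 0..n_lags: it matches R(0)..R(m)
    exactly and is continued beyond lag m with recurrence (6)."""
    m = len(a)
    out = list(np.asarray(r, dtype=float)[: m + 1])
    for k in range(m + 1, n_lags + 1):
        # R(k) = a_1 R(k-1) + ... + a_m R(k-m)
        out.append(sum(a[j] * out[k - 1 - j] for j in range(m)))
    return np.array(out[: n_lags + 1])


# Exact CF of the model x_i = 0.5 x_{i-1} + 0.2 x_{i-2} + b*xi_i
r = np.array([1.0, 0.625, 0.5125])
a, b = yule_walker(r, 2)
print(a, b, model_cf(a, r, 5))
```

Running yule_walker for m = 3, 6, and 12 on the same experimental CF and comparing model_cf with the measured values over a longer lag range supports the kind of order comparison shown in Figure 4. np.linalg.solve is used for clarity; the Levinson-Durbin recursion would exploit the Toeplitz structure more efficiently.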

Simulation of voice messages using autoregressive models
Let us implement autoregressive models of different orders with the parameters identified for typical message No. 1, i.e., with the obtained solutions of system (8). Figure 5 shows the random processes generated by the 12th-order (solid line), 6th-order (dash-dotted line), and 3rd-order (dashed line) models. The X axis shows the sample numbers, and the Y axis shows the values of the random process. Thus, it is possible to simulate speech messages whose CFs are close enough to the CFs obtained from real data. However, in some cases the models may be unstable, so that the simulated process diverges. In such cases, either the model order must be changed or additional constraints must be imposed.
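The simulation step and the divergence problem can be illustrated with the sketch below. It checks stability by requiring all roots of the characteristic equation z^m − a_1 z^{m−1} − … − a_m = 0 to lie inside the unit circle (a standard criterion for autoregressive models; the text only notes that models may diverge), and it generates samples of model (5) from zero initial conditions with Gaussian ξ_i, which is an assumption. Function names and the seed parameter are hypothetical.

```python
import numpy as np


def is_stable(a) -> bool:
    """True if all roots of z^m - a_1 z^{m-1} - ... - a_m = 0 lie strictly
    inside the unit circle, i.e. the AR model does not diverge."""
    roots = np.roots(np.concatenate(([1.0], -np.asarray(a, dtype=float))))
    return bool(np.all(np.abs(roots) < 1.0))


def simulate_ar(a, b: float, n: int, seed: int = 0) -> np.ndarray:
    """Generate n samples of x_i = sum_j a_j x_{i-j} + b*xi_i with independent
    standard Gaussian xi_i, starting from zero initial conditions."""
    a = np.asarray(a, dtype=float)
    m = len(a)
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(n)
    x = np.zeros(n + m)  # the first m entries hold the zero initial state
    for i in range(n):
        # x[i : i + m][::-1] holds x_{i-1}, ..., x_{i-m} (newest first)
        x[i + m] = a @ x[i : i + m][::-1] + b * xi[i]
    return x[m:]


a = [0.5, 0.2]
if is_stable(a):
    x = simulate_ar(a, b=1.0, n=1000, seed=1)
    print(x[:5])
else:
    print("unstable: change the model order or constrain the coefficients")
```

Because the recursion starts from zero, the initial samples form a transient and are commonly discarded before the CF of the simulated process is compared with the experimental one.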

Conclusion
A statistical analysis of typical voice messages of radio exchange has been carried out. The use of autoregressive models of random processes to describe such messages is proposed, and the possibility of simulating signals with close correlation properties is established. The main results of the work are as follows:
1) 20 typical radio exchange phrases were recorded in different voices, 10 recordings per phrase, which provided the sample needed for the study.
2) On the basis of this sample, spectral characteristics were constructed for each studied voice message, the data were averaged, and the CF values of the studied voice messages were obtained.
3) Using the Yule-Walker equations, the parameters of autoregressive models were identified from the calculated CF values of the studied voice messages. It is shown that, as the model order increases, this identification method yields CFs that approach the real ones more closely.
4) Signals were generated using autoregressive models of different orders whose parameters were identified from the experimental CF values.
Thus, the study indicates that mathematical modeling of typical phrases can increase the efficiency of voice message processing in civil aviation radio communications.