Issue | ITM Web Conf., Volume 59, 2024: II International Workshop “Hybrid Methods of Modeling and Optimization in Complex Systems” (HMMOCS-II 2023)
---|---
Article Number | 04003
Number of page(s) | 10
Section | Adaptive Intelligence: Exploring Learning in Evolutionary Algorithms and Neural Networks
DOI | https://doi.org/10.1051/itmconf/20245904003
Published online | 25 January 2024
Speech enhancement augmentation for robust speech recognition in noisy environments
Department of Information Security, Institute of Digital Technology, Electronics and Physics, Altai State University, 61 Lenina pr, Barnaul, 656049, Russia
* Corresponding author: andrey.lependin@gmail.com
Abstract. Augmentation as a data enrichment method has become an important element in improving the performance of automatic speech recognition (ASR) systems. To make recognition work effectively in noisy conditions, augmentation is usually used to simulate the presence of background noise. However, recognition quality on samples pre-processed by noise reduction models does not increase. This paper proposes a new approach to speech data augmentation for training ASR systems that are intended to be used jointly with speech enhancement models. The approach is based on creating several additional data samples containing speech processed by the enhancement model. It was tested on the E-Branchformer neural network model using data from the LibriSpeech corpus. The quality of speech samples was assessed using the DNSMOS metric. On a 100-hour subset of clean speech, the proposed augmentation improved the word error rate (WER) by more than 4% absolute compared with the generally accepted approach based on adding noisy speech samples. Experiments on the 960-hour data showed that the approach remains robust as the training set size increases.
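The augmentation idea described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline: the function names, the SNR choices, the set of variants returned, and the `enhance` callable (a stand-in for a real speech enhancement model) are all assumptions made for illustration. Waveforms are plain lists of floats, and the noise is assumed to be non-silent.

```python
import math
import random


def power(signal):
    """Mean power of a waveform given as a list of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB."""
    # Loop the noise so that it covers the whole utterance.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    # Gain that brings the noise to the power implied by the target SNR.
    gain = math.sqrt(power(speech) / (power(tiled) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, tiled)]


def build_augmented_samples(speech, noise, enhance,
                            snr_db_choices=(5.0, 10.0, 20.0), rng=None):
    """Return (tag, waveform) training variants of one clean utterance.

    `enhance` is a hypothetical callable standing in for an enhancement
    model; which variants to keep is an assumption, not the paper's recipe.
    """
    rng = rng if rng is not None else random.Random()
    snr_db = rng.choice(snr_db_choices)
    noisy = mix_at_snr(speech, noise, snr_db)
    return [
        ("clean", list(speech)),
        ("noisy", noisy),                    # conventional noise augmentation
        ("enhanced_clean", enhance(speech)),  # enhancement artifacts on clean speech
        ("enhanced_noisy", enhance(noisy)),   # what the ASR model sees after denoising
    ]
```

In practice `enhance` would wrap a trained enhancement network, and the resulting variants would be added to the ASR training set alongside the original utterances, so that the recognizer also sees the kind of input it receives when deployed behind an enhancement front end.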
© The Authors, published by EDP Sciences, 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.