Open Access
ITM Web Conf., Volume 59 (2024)
II International Workshop “Hybrid Methods of Modeling and Optimization in Complex Systems” (HMMOCS-II 2023)
Article Number: 04003
Number of pages: 10
Section: Adaptive Intelligence: Exploring Learning in Evolutionary Algorithms and Neural Networks
DOI: https://doi.org/10.1051/itmconf/20245904003
Published online: 25 January 2024
References

  1. N. Jaitly, G. E. Hinton, Vocal tract length perturbation (VTLP) improves speech recognition, in Proceedings of the International Conference on Machine Learning, ICML, Workshop on Deep Learning for Audio, Speech, and Language Processing, June 2013, Atlanta, USA (2013)
  2. T. Ko, V. Peddinti, D. Povey, S. Khudanpur, Audio Augmentation for Speech Recognition, in Proceedings of the Interspeech, 6-10 September 2015, Dresden, Germany (2015)
  3. D. S. Park, W. Chan, Y. Zhang, C. Chiu, B. Zoph, E. D. Cubuk, SpecAugment: A simple data augmentation method for automatic speech recognition, in Proceedings of the Interspeech, 15-19 September 2019, Graz, Austria (2019)
  4. V. Panayotov, G. Chen, D. Povey, S. Khudanpur, LibriSpeech: An ASR corpus based on public domain audio books, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, 19-24 April 2015, Brisbane, Australia (2015)
  5. A. Rosenberg, Y. Zhang, B. Ramabhadran, Y. Jia, P. Moreno, Y. Wu, Z. Wu, Speech recognition with augmented synthesized speech, in Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, ASRU, 14-18 December 2019, Sentosa, Singapore (2019)
  6. A. Hannun, C. Case, J. Casper, B. Catanzaro, Deep Speech: Scaling up end-to-end speech recognition, arXiv preprint arXiv:1412.5567 (2014)
  7. Y. Koizumi, S. Karita, A. Narayanan, SNRi Target Training for Joint Speech Enhancement and Recognition, in Proceedings of the Interspeech, 18-22 September 2022, Incheon, Korea (2022)
  8. Y. Yang, A. Pandey, D. Wang, Time-Domain Speech Enhancement for Robust Automatic Speech Recognition, in Proceedings of the Interspeech, 20-24 August 2023, Dublin, Ireland (2023)
  9. A. Gulati, J. Qin, C. Chiu, N. Parmar, Conformer: Convolution-augmented Transformer for Speech Recognition, in Proceedings of the Interspeech, 25-29 October 2020, Shanghai, China (2020)
  10. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, Attention Is All You Need, in Proceedings of the Conference on Neural Information Processing Systems, NIPS, 4-9 December 2017, Long Beach, USA (2017)
  11. K. Kim, F. Wu, Y. Peng, E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition, in Proceedings of the IEEE Spoken Language Technology Workshop (SLT), 9-12 January 2023, Doha, Qatar (2023)
  12. Y. Luo, N. Mesgarani, Conv-TasNet: surpassing ideal time-frequency magnitude masking for speech separation, IEEE/ACM Trans. Audio Speech Lang. Process. 27, 8, 1256–1266 (2019)
  13. X. Hao, X. Su, R. Horaud, X. Li, FullSubNet: a full-band and sub-band fusion model for real-time single-channel speech enhancement, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, 6-11 June 2021, Toronto, Canada (2021)
  14. K. Tan, D. Wang, A convolutional recurrent neural network for real-time speech enhancement, in Proceedings of the Interspeech, 2-6 September 2018, Hyderabad, India (2018)
  15. P. Loizou, Speech Enhancement: Theory and Practice, Second Edition (CRC Press, 2013)
  16. R. Nasretdinov, I. Ilyashenko, A. Lependin, Two-Stage Method of Speech Denoising by Long Short-Term Memory Neural Network, CCIS, 1526, Springer, Cham (2022)
  17. Y. Hu, Y. Liu, S. Lv, M. Xing, DCCRN: deep complex convolution recurrent network for phase-aware speech enhancement, in Proceedings of the Interspeech, 25-29 October 2020, Shanghai, China (2020)
  18. C. Zheng, X. Peng, Y. Zhang, Interactive Speech and Noise Modeling for Speech Enhancement, arXiv preprint arXiv:2012.09408 (2020)
  19. R. Nasretdinov, I. Ilyashenko, J. Filin, A. Lependin, Hierarchical Encoder-Decoder Neural Network with Self-Attention for Single-Channel Speech Denoising, CCIS, 1733, Springer, Cham (2023)
  20. C. Reddy, H. Dubey, K. Koishida, A. Nair, INTERSPEECH 2021 Deep Noise Suppression Challenge, in Proceedings of the Interspeech, 30 August - 3 September 2021, Brno, Czechia (2021)
  21. A. Ali, S. Renals, Word Error Rate Estimation for Speech Recognition: e-WER, in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 15-20 July 2018, Melbourne, Australia (2018)
  22. ITU-T Recommendation P.800: Methods for subjective determination of transmission quality (1998)
  23. C. K. A. Reddy, V. Gopal, R. Cutler, DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, 6-11 June 2021, Toronto, Canada (2021)
  24. R. Sennrich, B. Haddow, A. Birch, Neural Machine Translation of Rare Words with Subword Units, in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL, 7-12 August 2016, Berlin, Germany (2016)
  25. S. Watanabe, T. Hori, S. Karita, T. Hayashi, ESPnet: End-to-End Speech Processing Toolkit, in Proceedings of the Interspeech, 2-6 September 2018, Hyderabad, India (2018)
  26. S. Kim, T. Hori, S. Watanabe, Joint CTC-attention based end-to-end speech recognition using multi-task learning, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, 5-9 March 2017, New Orleans, USA (2017)
