Cell & Cellular Life Sciences Journal (CCLSJ)

ISSN: 2578-4811

Research Article

Application of Hybrid CTC/2D-Attention End-to-End Model in Speech Recognition during the COVID-19 Pandemic

Authors: Bin Zhao*, Mingzhe E and Xia Jiang

DOI: 10.23880/cclsj-16000163

Abstract

Recent research in speech recognition has shown that end-to-end frameworks have greater potential than traditional frameworks. To address the problem of unstable decoding performance in end-to-end speech recognition, a hybrid end-to-end model combining connectionist temporal classification (CTC) and multi-head attention is proposed. The CTC criterion is introduced to constrain 2D-attention, and the implicit constraint of CTC on the 2D-attention distribution is realized by adjusting the weight ratio of the loss functions of the two criteria. On the 178-hour Aishell open-source dataset, a word error rate of 7.237% was achieved. Experimental results show that the proposed model has a higher recognition rate than a general end-to-end model and makes some progress on Mandarin speech recognition.
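
The weighting of the two loss criteria mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact formula, so the linear interpolation below is an assumption (it is a form commonly used in hybrid CTC/attention training), and the names hybrid_loss and ctc_weight are hypothetical.

```python
def hybrid_loss(ctc_loss: float, attention_loss: float, ctc_weight: float) -> float:
    """Combine a CTC loss and an attention loss with one weight.

    Assumption: the two criteria are linearly interpolated; the abstract
    only says the weight ratio of the two loss functions is adjusted.
    ctc_weight is a hypothetical parameter name and must lie in [0, 1].
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    # A larger ctc_weight puts more emphasis on the CTC criterion;
    # the remainder goes to the attention criterion.
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


# Example: a quarter of the weight on CTC, the rest on attention.
combined = hybrid_loss(ctc_loss=2.0, attention_loss=4.0, ctc_weight=0.25)
```

Under this assumed form, setting ctc_weight to 0 would recover a pure attention objective and setting it to 1 a pure CTC objective; intermediate values trade off the two.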

Keywords: Speech Recognition; 2-Dimensional Multi-Head Attention; Connectionist Temporal Classification; COVID-19
