Sequence-to-sequence (seq2seq) models are designed to learn a mapping from an arbitrarily sized input sequence to an output sequence. Although these models are versatile enough to have been applied successfully to a variety of domains, their adaptation to speech recognition should be reconsidered, because the implicit alignment between a speech signal and its output sequence differs from that in other domains. Moreover, a speech signal is usually much longer than its corresponding text label sequence.
In this thesis, I modified the attention mechanism of sequence-to-sequence models so that they perform better on speech recognition. The revised model uses a double attention mechanism instead of the conventional single attention mechanism so that it can attend to the relevant part of the input sequence more easily. Moreover, I generalized the existing hybrid score function and achieved the best results with a multiplicative score function.
Experimental results on the TIMIT dataset showed that the proposed modifications achieve fast convergence and improved recognition performance.
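For readers unfamiliar with the term, a multiplicative score function can be sketched as below. This is a minimal illustration of the generic bilinear form score(q, k_t) = q^T W k_t followed by a softmax over encoder time steps. It is an assumption about the general family, not the generalized hybrid score function or the double attention mechanism developed in this thesis. The function names, the weight matrix `weight`, and the toy inputs are hypothetical.

```python
import math


def multiplicative_scores(query, keys, weight):
    """Return score_t = query^T W key_t for every encoder state key_t.

    query:  decoder state, a list of n floats (assumed shape)
    keys:   encoder states, a list of T lists of m floats (assumed shape)
    weight: n x m matrix W as a list of rows (hypothetical learned parameter)
    """
    # Project the query once (q' = W^T q) so each score is a plain dot product.
    projected = [
        sum(query[i] * weight[i][j] for i in range(len(query)))
        for j in range(len(weight[0]))
    ]
    return [sum(p * k for p, k in zip(projected, key)) for key in keys]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, weight):
    """Return attention weights and the weighted sum of encoder states."""
    weights = softmax(multiplicative_scores(query, keys, weight))
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return weights, context
```

Calling `attend(query, keys, weight)` with one decoder state and a list of encoder states returns one weight per encoder time step together with the context vector. Because speech inputs are much longer than their label sequences, the softmax here runs over many more time steps than in typical text-to-text settings.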