American Journal of Signal Processing
p-ISSN: 2165-9354 e-ISSN: 2165-9362
2016; 6(1): 19-23
doi:10.5923/j.ajsp.20160601.03
Yuki Takashima 1, Toru Nakashika 2, Tetsuya Takiguchi 1, Yasuo Ariki 1
1Graduate School of System Informatics, Kobe University, Kobe, Japan
2Graduate School of Information Systems, The University of Electro-Communications, Chofu, Japan
Correspondence to: Yuki Takashima, Graduate School of System Informatics, Kobe University, Kobe, Japan.
Copyright © 2016 Scientific & Academic Publishing. All Rights Reserved.
This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/
In this paper, we discuss speech recognition for persons with articulation disorders resulting from athetoid cerebral palsy. Because the speech style of a person with this type of articulation disorder differs greatly from that of a physically unimpaired person, a conventional speaker-independent acoustic model trained on unimpaired speech is of little use for recognizing it. A speaker-dependent model for a person with an articulation disorder is therefore necessary. In our previous work, a feature extraction method using a convolutional neural network was proposed to deal with the small local fluctuations of dysarthric speech, and its effectiveness was shown in a word recognition task. The neural network needs a training label (teaching signal) to be trained with back-propagation, and the previous method used the results of HMM-based forced alignment as the training label. However, because the phoneme boundaries in an utterance by a dysarthric speaker are ambiguous, it is difficult to obtain a correct alignment, and if a wrong alignment is used, the network may be trained inadequately. Therefore, we propose a probabilistic phoneme labeling method based on the Gaussian distribution. In contrast to the conventional approach, we treat the phoneme label as a soft label; that is, the proposed label takes continuous values. This approach is effective for dysarthric speech, in which the phoneme boundaries are ambiguous. The effectiveness of the method has been confirmed by comparing it with forced alignment.
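To make the soft-labeling idea concrete, the following is a minimal sketch (not the authors' code; the function name, the choice of Gaussian center at the segment midpoint, and the variance tied to segment length are all assumptions for illustration) of how per-frame Gaussian soft labels could be built from approximate phoneme boundaries, so that frames near an ambiguous boundary receive probability mass from both neighboring phonemes:

```python
import numpy as np

def gaussian_soft_labels(boundaries, num_frames):
    """Build per-frame soft phoneme labels from approximate boundaries.

    boundaries: list of (start_frame, end_frame) pairs, one per phoneme.
    Each phoneme contributes a Gaussian centered on its segment midpoint;
    the per-frame values are then normalized into a distribution, so a
    frame near a fuzzy boundary is shared between adjacent phonemes
    instead of being assigned a hard 0/1 label.
    """
    num_phonemes = len(boundaries)
    labels = np.zeros((num_frames, num_phonemes))
    t = np.arange(num_frames)
    for k, (start, end) in enumerate(boundaries):
        mu = (start + end) / 2.0                # segment midpoint (assumed center)
        sigma = max((end - start) / 4.0, 1.0)   # spread tied to segment length (assumed)
        labels[:, k] = np.exp(-0.5 * ((t - mu) / sigma) ** 2)
    # normalize each frame so the soft labels sum to one
    labels /= labels.sum(axis=1, keepdims=True)
    return labels

# Example: a two-phoneme utterance over 20 frames with a fuzzy boundary
# near frame 10; frame 10 ends up split between the two phonemes.
soft = gaussian_soft_labels([(0, 10), (10, 20)], 20)
```

A hard (one-hot) label from forced alignment would flip abruptly at frame 10; the soft labels instead transition gradually, which is the property exploited for dysarthric speech.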
Keywords: Articulation disorders, Feature extraction, Convolutional neural network, Bottleneck feature, Phoneme labelling
Cite this paper: Yuki Takashima, Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki, Phone Labeling Based on the Probabilistic Representation for Dysarthric Speech Recognition, American Journal of Signal Processing, Vol. 6 No. 1, 2016, pp. 19-23. doi: 10.5923/j.ajsp.20160601.03.
Figure 1. Example of a spectrogram for /ikioi/ spoken by a physically unimpaired person
Figure 2. Example of a spectrogram for /ikioi/ spoken by a person with an articulation disorder using forced alignment (top) and manual alignment (bottom)
Figure 3. Illustration of the proposed Gaussian labeling for an utterance /ki/ in Japanese |
Figure 5. Word recognition accuracy for utterances of “speaker A” using each phoneme labeling method and conventional MFCC features |
Figure 6. Word recognition accuracy for utterances of “speaker B” using each phoneme labeling method and conventional MFCC features |