American Journal of Signal Processing
p-ISSN: 2165-9354 e-ISSN: 2165-9362
2012; 2(5): 92-97
doi: 10.5923/j.ajsp.20120205.02
Kiran Bhuvanagirir, Sunil Kumar Kopparapu
TCS Innovation Labs, Mumbai, Tata Consultancy Services, Thane (West), 400601, India
Correspondence to: Sunil Kumar Kopparapu, TCS Innovation Labs, Mumbai, Tata Consultancy Services, Thane (West), 400601, India.
Copyright © 2012 Scientific & Academic Publishing. All Rights Reserved.
The use of mixed language in day-to-day speech is becoming common and is accepted as syntactically correct. However, machine recognition of mixed language speech is a challenge for conventional speech recognition engines. Several studies have addressed how to enable recognition of mixed language speech. At one end of the spectrum, recognition uses acoustic models covering the complete phone set of the mixed languages; at the other end, a language identification module is followed by language-dependent speech recognition engines that perform the recognition. Each approach has its own implications. In this paper, we address mixed language speech recognition using available resources and show that, by constructing an appropriate pronunciation dictionary and modifying the language model to handle mixed language, one can achieve good recognition accuracy on spoken mixed language.
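The abstract does not say how the language model is modified. As a minimal sketch, assuming a word bigram model with add-one smoothing (our choice for illustration, not necessarily the authors' recipe), training on code-mixed sentences lets a transition from a Hindi word to an English word, such as `mera` → `account`, receive a probability estimated from data rather than only a smoothing floor. The toy corpus `TOY_CORPUS` and the helper names `train_bigram` and `sentence_logprob` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus of romanized Hindi-English (code-mixed) sentences.
TOY_CORPUS = [
    "mera account balance batao".split(),
    "mera mobile number update karo".split(),
    "account balance kya hai".split(),
]

def train_bigram(sentences):
    """Estimate an add-one-smoothed word bigram model from tokenized sentences.

    Returns (prob, vocab), where prob(prev, cur) approximates P(cur | prev).
    """
    history = Counter()  # how often each token occurs as a predecessor
    bigrams = Counter()  # counts of (prev, cur) pairs
    vocab = set()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigrams[(prev, cur)] += 1
    size = len(vocab)

    def prob(prev, cur):
        # Add-one smoothing: unseen transitions still get a small probability.
        return (bigrams[(prev, cur)] + 1) / (history[prev] + size)

    return prob, vocab

def sentence_logprob(prob, words):
    """Natural-log probability of a sentence under the bigram model."""
    tokens = ["<s>"] + words + ["</s>"]
    return sum(math.log(prob(p, c)) for p, c in zip(tokens, tokens[1:]))

if __name__ == "__main__":
    prob, vocab = train_bigram(TOY_CORPUS)
    # The Hindi -> English transition "mera" -> "account" occurs in training.
    print(prob("mera", "account"))
    print(sentence_logprob(prob, "mera account balance batao".split()))
```

Add-one smoothing only keeps the sketch short; a practical recognizer would more likely load a language model built with a dedicated toolkit and better smoothing.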
Keywords: Speech Recognition, Mixed-language Speech, Language Identification, Phoneme Set
Cite this paper: Kiran Bhuvanagirir, Sunil Kumar Kopparapu, "Mixed Language Speech Recognition without Explicit Identification of Language", American Journal of Signal Processing, Vol. 2 No. 5, 2012, pp. 92-97. doi: 10.5923/j.ajsp.20120205.02.
Figure 1. Mixed language sentence
Figure 2. Multi-pass approach for mixed language ASR
Figure 3. One-pass approach for mixed language ASR
Figure 4. Proposed approach
Figure 5. Sample lexicon constructions: (a) using the CMU toolkit; (b) using the Hindi phoneme set; (c) using APM (from Hindi to English); (d) both CMU and APM phonetic representations in the same lexicon.
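Figure 5(d) places both CMU and APM phonetic representations in the same lexicon. The sketch below illustrates that idea under loud assumptions: the pronunciation tables `CMU_PRON` and `HINDI_PRON`, the Hindi phone symbols, and the Hindi-to-English phone map `HI_TO_EN` are hypothetical placeholders, not the paper's data or phone sets. Each word receives every distinct pronunciation available to it, and alternates follow the `WORD(2)` convention used in CMU Sphinx pronunciation dictionaries.

```python
# Hypothetical English-side pronunciations (ARPAbet, stress marks removed),
# standing in for entries looked up in the CMU pronouncing dictionary.
CMU_PRON = {
    "ACCOUNT": ["AH", "K", "AW", "N", "T"],
    "BALANCE": ["B", "AE", "L", "AH", "N", "S"],
}

# Hypothetical transcriptions in an illustrative Hindi phone set (not the
# paper's), for Hindi words and for Hindi-accented renderings of English words.
HINDI_PRON = {
    "MERA": ["m", "e", "r", "aa"],
    "BATAO": ["b", "a", "t", "aa", "o"],
    "ACCOUNT": ["e", "k", "a", "u", "n", "t"],
}

# Hypothetical Hindi-to-English phone map (one English phone per Hindi phone).
HI_TO_EN = {
    "m": "M", "e": "EH", "r": "R", "aa": "AA", "b": "B", "a": "AH",
    "t": "T", "o": "OW", "k": "K", "u": "UW", "n": "N",
}

def map_phones(phones, table):
    """Rewrite a Hindi-phone transcription using English phones."""
    return [table[p] for p in phones]

def build_lexicon(words):
    """Return Sphinx-style dictionary lines listing every distinct
    pronunciation of each word; alternates are written WORD(2), WORD(3), ...
    Words found in neither table produce no lines."""
    lines = []
    for word in words:
        variants = []
        if word in CMU_PRON:
            variants.append(CMU_PRON[word])
        if word in HINDI_PRON:
            mapped = map_phones(HINDI_PRON[word], HI_TO_EN)
            if mapped not in variants:
                variants.append(mapped)
        for i, phones in enumerate(variants, start=1):
            label = word if i == 1 else f"{word}({i})"
            lines.append(f"{label} {' '.join(phones)}")
    return lines

if __name__ == "__main__":
    for line in build_lexicon(["MERA", "ACCOUNT", "BALANCE", "BATAO"]):
        print(line)
    # MERA M EH R AA
    # ACCOUNT AH K AW N T
    # ACCOUNT(2) EH K AH UW N T
    # BALANCE B AE L AH N S
    # BATAO B AH T AA OW
```

Listing both variants lets the decoder pick whichever pronunciation better matches the audio, so no explicit language decision is needed at the lexicon level.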