Robust Automatic Speech Recognition Features using Complex Wavelet Packet Transform Coefficients

Authors

  • Tjong Wan Sen Bandung Institute of Technology, Jl. Ganesha 10, Bandung 40132, Indonesia
  • Bambang Riyanto Trilaksono Bandung Institute of Technology, Jl. Ganesha 10, Bandung 40132, Indonesia
  • Arry Akhmad Arman Bandung Institute of Technology, Jl. Ganesha 10, Bandung 40132, Indonesia
  • Rila Mandala Bandung Institute of Technology, Jl. Ganesha 10, Bandung 40132, Indonesia

DOI:

https://doi.org/10.5614/itbj.ict.2009.3.2.4

Abstract

To improve the performance of phoneme-based Automatic Speech Recognition (ASR) in noisy environments, we developed a new technique that adds robustness to clean phoneme features. These robust features are obtained from Complex Wavelet Packet Transform (CWPT) coefficients. Because the CWPT coefficients represent all the frequency bands of the input signal, decomposing the input signal into a complete CWPT tree also covers all frequencies involved in the recognition process. For signals that overlap in time but differ in frequency content, e.g. a phoneme signal with noise, the CWPT coefficients of the mixture are the combination of the CWPT coefficients of the phoneme signal and the CWPT coefficients of the noise. The coefficients observed for the phoneme are therefore altered according to the frequency components contained in the noise. Because the number of phonemes in any language is relatively small and already well known, principal component vectors can easily be derived from a clean training dataset using Principal Component Analysis (PCA). These principal component vectors can then be used in the testing phase to add robustness and minimize the effects of noise. Simulation results, using Alpha Numeric 4 (AN4) from Carnegie Mellon University and NOISEX-92 examples from Rice University, showed that this new technique can be used as a feature extractor that improves the robustness of phoneme-based ASR systems in various adverse noisy conditions while still preserving performance in clean environments.
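The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a real Haar wavelet packet as a simple stand-in for the CWPT, synthetic sinusoid mixtures in place of AN4 phoneme frames, and white Gaussian noise in place of NOISEX-92 recordings. All function and variable names (`haar_packet`, `fit_pca`, `project`, `make_frames`) are hypothetical. The sketch checks that the transform of a noisy frame is the sum of the transforms of its parts, then projects noisy features onto principal components learned from clean data.

```python
import numpy as np

def haar_packet(signal, levels):
    """Full real Haar wavelet packet tree (a stand-in for the CWPT).
    Returns the leaf-node coefficients concatenated into one vector."""
    nodes = [np.asarray(signal, dtype=float)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            even, odd = node[0::2], node[1::2]
            next_nodes.append((even + odd) / np.sqrt(2.0))  # low-pass branch
            next_nodes.append((even - odd) / np.sqrt(2.0))  # high-pass branch
        nodes = next_nodes
    return np.concatenate(nodes)

def fit_pca(features, n_components):
    """Return the mean and the leading principal directions of the rows."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(features, mean, components):
    """Project feature vectors onto the principal subspace and map back."""
    return (features - mean) @ components.T @ components + mean

rng = np.random.default_rng(0)
frame_len, levels, n_basis = 64, 3, 4
t = np.arange(frame_len)
# Assumption: "clean phoneme" frames are random mixes of a few sinusoids.
basis = np.stack([np.sin(2 * np.pi * f * t / frame_len) for f in (2, 5, 9, 14)])

def make_frames(count):
    return rng.normal(size=(count, n_basis)) @ basis

def features(frames):
    return np.stack([haar_packet(frame, levels) for frame in frames])

clean_train = make_frames(200)
clean_test = make_frames(50)
noise = 0.5 * rng.normal(size=clean_test.shape)  # assumed white Gaussian noise

clean_feat = features(clean_test)
noisy_feat = features(clean_test + noise)

# The transform is linear: coefficients of the mixture are the sum of the parts.
lin_ok = np.allclose(noisy_feat, clean_feat + features(noise))

# Learn principal components from clean training features only.
mean, components = fit_pca(features(clean_train), n_basis)
denoised = project(noisy_feat, mean, components)

err_before = np.linalg.norm(noisy_feat - clean_feat)
err_after = np.linalg.norm(denoised - clean_feat)
print("linear:", lin_ok)
print("error before projection:", err_before)
print("error after projection: ", err_after)
```

In this toy setting the clean features lie in a low-dimensional subspace, so the projection leaves clean inputs essentially unchanged and discards the part of the noise outside that subspace. Real phoneme features are not exactly low-rank, so the number of retained components would need to be chosen empirically.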

How to Cite

Sen, T. W., Trilaksono, B. R., Arman, A. A., & Mandala, R. (2013). Robust Automatic Speech Recognition Features using Complex Wavelet Packet Transform Coefficients. Journal of ICT Research and Applications, 3(2), 123-134. https://doi.org/10.5614/itbj.ict.2009.3.2.4
