Unsupervised Detection of Anomalous Sound for Machine Condition Monitoring using Fully Connected U-Net

Hoang Van Truong; Nguyen Chi Hieu; Pham Ngoc Giao; Nguyen Xuan Phong

doi:10.5614/itbj.ict.res.appl.2021.15.1.3

Authors

Hoang Van Truong, FPT Software Company Limited
Nguyen Chi Hieu, FPT Software Company Limited
Pham Ngoc Giao, FPT Software Company Limited
Nguyen Xuan Phong, FPT Software Company Limited

Keywords:

anomaly detection, anomalous sound, auto-encoder, spectrogram, U-Net

Abstract

Detecting anomalies in machine sounds is an important task in machine condition monitoring. The conventional approach in this domain is an autoencoder that takes log-Mel spectrogram features as input and detects anomalies from the reconstruction error. However, because some target machines emit non-stationary sounds, this conventional approach does not perform well in those cases. In this paper, we propose a new choice of features and a new autoencoder architecture. We introduce the Mixed Feature, a mixture of different sound representations, and a new deep learning method called Fully-Connected U-Net, a form of autoencoder architecture. In experiments on the same dataset as the baseline system, using the same architecture for all machine types, our methods outperformed the baseline system in terms of the AUC and pAUC evaluation metrics. The optimized model achieved 83.38% AUC and 64.51% pAUC on average over all machine types on the development dataset, outperforming the published baseline by 13.43% AUC and 8.13% pAUC.
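
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes that the anomaly score of a clip is the mean squared reconstruction error of its feature vector, that AUC is the fraction of (normal, anomalous) clip pairs in which the anomalous clip scores higher, and that pAUC is the same quantity restricted to the highest-scoring fraction p of the normal clips, with p = 0.1. The abstract does not state p or the exact score definition, so these choices, along with the function names anomaly_score, auc, and pauc, are illustrative assumptions rather than the authors' implementation.

```python
import math


def anomaly_score(features, reconstruction):
    """Mean squared error between a feature vector and its reconstruction.

    Assumption: the score is a plain MSE; the abstract only says the
    approach is based on the reconstruction error.
    """
    squared = [(a - b) ** 2 for a, b in zip(features, reconstruction)]
    return sum(squared) / len(squared)


def auc(normal_scores, anomalous_scores):
    """Fraction of (normal, anomalous) pairs where the anomalous score is strictly higher."""
    hits = sum(1 for n in normal_scores for a in anomalous_scores if a > n)
    return hits / (len(normal_scores) * len(anomalous_scores))


def pauc(normal_scores, anomalous_scores, p=0.1):
    """AUC over the floor(p * N) highest-scoring normal clips only.

    Assumption: p = 0.1; this restricts the metric to the low
    false-positive-rate region.
    """
    k = math.floor(p * len(normal_scores))
    if k == 0:
        raise ValueError("too few normal clips for the chosen p")
    hardest_normals = sorted(normal_scores, reverse=True)[:k]
    return auc(hardest_normals, anomalous_scores)


# Toy scores (made up for illustration, not results from the paper).
normal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
anomalous = [0.95, 1.5]
toy_auc = auc(normal, anomalous)
toy_pauc = pauc(normal, anomalous)
```

On these toy scores the pAUC comes out lower than the AUC because it only compares anomalous clips against the single normal clip with the highest score, which is why the metric rewards detectors that keep false alarms rare.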
