Talk by Postdoc Researcher György Kovács "Noise Robust Automatic Speech Recognition Based on Spectro-Temporal Techniques"
17-12-2018
Speech technology has a wide variety of existing and potential applications in many areas of our lives, from dictation systems to voice translation, and from digital assistants to telephone dialogue systems. Many of these applications have to rely on an Automatic Speech Recognition (ASR) component. This component not only has to perform well, it has to perform well in adverse environments. After all, a personal assistant that one can only use in a sound-insulated room is not of much use. For this reason, noise-robust ASR has been a topic of intensive research. Yet human-equivalent performance has not been achieved. This has motivated many to search for ways to improve the robustness of ASR based on human speech perception. One popular method, inspired by the examination of the receptive fields of auditory neurons, is spectro-temporal processing. Here, the aim is to simultaneously capture the spectral and temporal modulations of the speech signal. One simple way to do so is to extract features from spectro-temporal patches, and then use the resulting spectro-temporal features in the same manner as one would use traditional features. There is more than one way to shoe a horse, however, and in our case this is true twice over. For one, there are various ways to extract useful features from the patches. For another, there are other, more sophisticated ways to incorporate the concept of spectro-temporal processing into a speech recognition system. I will discuss a selection of these methods, some briefly, some more extensively, and examine their effect on the noise robustness of ASR solutions.
BIO
György Kovács recently received his PhD from the University of Szeged in Hungary, and he is currently a postdoc researcher in Marcus Liwicki's group at EISLAB Machine Learning, Luleå University of Technology, Sweden.