7 – Acoustic Models and the Trouble with Time

We’ve got our data now. With feature extraction, we’ve addressed noise problems due to environmental factors as well as variability between speakers. Phonetics gives us a representation for sounds and language that we can map to. That mapping, from the sound representation to the phonetic representation, is the task of our acoustic model.

We still haven’t solved the problem of matching variable lengths of the same word. We’ve talked about this problem before in another lesson, when the dynamic time warping (DTW) algorithm was introduced. To recap, DTW calculates the similarity between two signals, even if their time lengths differ. This can be used in speech recognition, for instance, to align the sequence data of a new word to its most similar counterpart in a dictionary of word examples.

As we’ll soon see, hidden Markov models (HMMs) are also well suited to this kind of time-series sequence matching within an acoustic model. This characteristic explains their popularity in speech recognition solutions for the past 30 years. If we choose to use deep neural networks (DNNs) for our acoustic model, the sequencing problem reappears. We can address the problem with a hybrid HMM/DNN system, or we can solve it another way. Later, we’ll talk about how we can solve the problem in DNNs with connectionist temporal classification (CTC). First though, we’ll review HMMs and how they’re used in speech recognition.
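As a refresher on the DTW idea recapped in this lesson, here is a minimal sketch in Python. It assumes each signal is a plain list of numbers and uses absolute difference as the local cost. Real systems would compare feature vectors such as MFCC frames instead. The function names `dtw_distance` and `closest_word`, and the tiny template dictionary, are made up for illustration and do not come from the lesson.

```python
def dtw_distance(a, b):
    """Return the DTW alignment cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i items of a
    # with the first j items of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed moves:
            # advance in a only, advance in b only, or advance in both
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def closest_word(sample, templates):
    """Pick the template word whose sequence has the lowest DTW cost."""
    return min(templates, key=lambda word: dtw_distance(sample, templates[word]))


# Hypothetical one-dimensional "word" templates, for illustration only
templates = {"yes": [1, 3, 1], "no": [2, 2, 2]}
slow_yes = [1, 1, 3, 3, 1]  # same shape as "yes", spoken more slowly
print(closest_word(slow_yes, templates))
```

The stretched sample still matches the "yes" template because DTW lets one frame in the template align with several frames in the sample, which is exactly the variable-length matching problem described above.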
