1 – Introduction to Speech Recognition

Hello again. In the last module, we built a VUI application that used a commercial implementation of speech recognition. Like magic, the words we speak into a voice-enabled device are converted into text. In this module, we take a closer look at how speech recognition really works.

Now, when we say speech recognition, we’re really talking about ASR, or automatic speech recognition. With ASR, the goal is simply to take any continuous audio speech as input and output the text equivalent. We want our ASR to be speaker independent and have high accuracy. Such a system has long been a core goal of AI, and in the 1980s and 1990s, advances in probabilistic models began to make ASR a reality.

We’ll start by asking the question, what makes speech recognition hard? Like many other AI problems we’ve seen, ASR can be implemented by gathering a large pool of labelled data, training a model on that data, and then deploying the trained model to accurately label new data. The twist is that speech is structured in time and has a lot of variability. We’ll identify specific challenges we face when decoding spoken words and sentences into text.

To understand how these challenges can be met, we’ll take a deeper dive into the sound signal itself as well as various speech models. The sound signal is our data. We’ll get into signal analysis, phonetics, and how to extract features to represent speech data.

Models in speech recognition can conceptually be divided into an acoustic model and a language model. The acoustic model solves the problem of turning sound signals into some kind of phonetic representation. The language model houses the domain knowledge of words, grammar, and sentence structure for the language. These conceptual models can be implemented as probabilistic models using machine learning algorithms. Hidden Markov models have been refined for ASR over a few decades now, and are considered the traditional ASR solution.
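
To make the acoustic model and language model split a little more concrete, here is a toy sketch of how the two scores might be combined to pick a transcript. It is only an illustration under stated assumptions: the candidate phrases, the probability values, and the `decode` helper are all made up for this example and do not come from the course or from any real ASR library.

```python
import math

# Hypothetical acoustic scores, standing in for P(audio | words).
# The two phrases sound alike, so their scores are close.
acoustic_prob = {
    "recognize speech": 0.40,
    "wreck a nice beach": 0.45,
}

# Hypothetical language-model scores, standing in for P(words).
# The first phrase is assumed to be far more likely as English text.
language_prob = {
    "recognize speech": 0.010,
    "wreck a nice beach": 0.001,
}


def decode(acoustic, language):
    """Return the candidate with the highest combined log score.

    The combined score is log P(audio | words) + log P(words).
    """
    def score(words):
        return math.log(acoustic[words]) + math.log(language[words])

    return max(acoustic, key=score)


best = decode(acoustic_prob, language_prob)
print(best)  # recognize speech
```

In this toy setup the acoustic scores alone slightly favor the wrong phrase, and the language model's knowledge of likely word sequences tips the decision the other way.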
Meanwhile, the cutting edge of ASR today is end-to-end deep neural network models. We’ll talk about both. When you’ve absorbed all that, you’ll be ready to build your own ASR system in the final project. You’ll be using deep learning tools you’ve already been introduced to, with a few new twists. Let’s get started.
