In this lesson, you will learn about Temporal Difference, or TD, learning. To understand TD learning, it helps to consider what it would actually mean to solve this problem of learning from interaction. The solution lies many years in the future, when we've developed artificially intelligent agents that interact with the world much the way humans do. To accomplish this, the agents would need to learn from the kind of online, streaming data that we learn from every day.

Real life is far from an episodic task; it requires its agents, it requires us, to constantly make decisions all day, every day. We get no break in our interaction with the world. Remember that Monte Carlo learning needed those breaks: it needed the episode to end so that the return could be calculated and then used as an estimate of the action values. So, we'll need to come up with something else if we want to handle more realistic learning in a real-world setting.

The main idea is this: if an agent is playing chess, instead of waiting until the end of an episode to see whether it has won the game, it will at every move be able to estimate the probability that it is winning. Likewise, a self-driving car at every turn will be able to estimate whether it is likely to crash and, if necessary, amend its strategy to avoid disaster. To emphasize the point, the Monte Carlo approach would have this car crash every time it wants to learn anything, and this is too expensive and also quite dangerous.

TD learning will solve these problems. Instead of waiting to update values when the interaction ends, it will amend its predictions at every step, and you'll be able to use it to solve both continuing and episodic tasks. It's also widely used in reinforcement learning and lies at the heart of many state-of-the-art algorithms that you see in the news today. So, let's jump right in.
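The step-at-a-time updating described here can be sketched in code. Below is a minimal TD(0) prediction example on a simple random-walk task; the task, the function name `td0_prediction`, and the parameter values are illustrative choices, not something specified in the lesson. The key line is the update, which adjusts the value of the current state toward the reward plus the discounted estimate of the next state, without waiting for the episode to end:

```python
import random

def td0_prediction(num_episodes=5000, alpha=0.1, gamma=1.0, seed=0):
    """TD(0) prediction on a toy random walk (illustrative example).

    States 1..5 are non-terminal; the agent starts in state 3 and moves
    left or right with equal probability. State 0 is terminal with
    reward 0, state 6 is terminal with reward 1.
    """
    rng = random.Random(seed)
    V = {s: 0.0 for s in range(7)}  # value estimates; terminals stay at 0
    for _ in range(num_episodes):
        s = 3
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0
            # The TD(0) update: nudge V(s) toward r + gamma * V(s_next)
            # at every single step of the interaction.
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V

V = td0_prediction()
```

With enough episodes, the estimates approach the true values of this walk (1/6, 2/6, ..., 5/6 for states 1 through 5), even though no individual update ever waited for a full return.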