6 – L4 17 HS Outro V2

In this lesson, we learned about a class of ensemble methods that uses random subsets of rows and columns to create a forest of different decision trees. We also learned a few reasons why they are a good fit for some of the types of problems we encounter in finance. You may notice that certain …

5 – L4 15 HS Outofbag Score V4

Let’s discuss a useful performance metric that you can use whenever your ensemble algorithm uses bagging or random row selection. Remember that bagging involves drawing a sample of the original dataset’s rows, with replacement, for each tree in the ensemble. It can be shown that, on average, each tree makes use of about two-thirds …
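
To see where the "about two-thirds" figure comes from, here is a small simulation sketch. The function names, the 1,000-row dataset size, and the 200 trees are illustrative assumptions, not part of the lesson. For a dataset with n rows, the chance that a given row is never picked in n draws with replacement is (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows, so roughly 63% of the rows land in each tree's sample and the rest are left out of the bag.

```python
import random

def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def in_bag_fraction(n_rows, n_trees, seed=0):
    """Average fraction of distinct rows that each tree's sample contains."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_trees):
        sample = bootstrap_indices(n_rows, rng)
        # Rows drawn more than once only count once.
        fractions.append(len(set(sample)) / n_rows)
    return sum(fractions) / n_trees

# Hypothetical sizes chosen only for illustration.
avg_in_bag = in_bag_fraction(n_rows=1000, n_trees=200)
print(f"in-bag: {avg_in_bag:.3f}, out-of-bag: {1 - avg_in_bag:.3f}")
```

The rows a tree never saw are the ones an out-of-bag score would use to evaluate that tree.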

4 – L4 011 HS Random Forests V5

Random forests take advantage of perturbations applied to both columns and rows. Let’s take a moment to describe how to generate a random forest model. Let’s say we’re working with this dataset. We know we’re going to generate many trees and combine their predictions. We first choose the number of trees to generate. This …
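
The row and column perturbations described here can be sketched as a sampling plan, one entry per tree. This is a minimal sketch under stated assumptions: `forest_sampling_plan`, `cols_per_tree`, and the dataset dimensions are hypothetical names and values, and the column subset is drawn once per tree for simplicity, whereas many implementations draw a fresh column subset at every split. No trees are actually trained here.

```python
import random

def forest_sampling_plan(n_rows, n_cols, n_trees, cols_per_tree, seed=0):
    """For each tree, pick bootstrap rows and a random subset of columns."""
    rng = random.Random(seed)
    plan = []
    for _ in range(n_trees):
        # Rows: sampled with replacement, so duplicates are expected.
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]
        # Columns: sampled without replacement (assumption: once per tree).
        cols = sorted(rng.sample(range(n_cols), cols_per_tree))
        plan.append({"rows": rows, "cols": cols})
    return plan

# Hypothetical dataset shape: 8 rows, 5 columns, 3 trees.
plan = forest_sampling_plan(n_rows=8, n_cols=5, n_trees=3, cols_per_tree=2)
for i, tree in enumerate(plan):
    print(f"tree {i}: rows={tree['rows']} cols={tree['cols']}")
```

Each tree would then be fit on its own rows and columns, and the forest's prediction would combine the individual trees' outputs.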

3 – MLND SL EM 02 Bagging V1 MAIN V1

So let’s start with bagging. Here’s our data in the form of some red and blue points, and for simplicity, we’ll say that our weak learners will be the simplest possible learner: a decision tree of one node. So, all of them are either a horizontal or a vertical line that says on this side everything is …
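
Here is one way the idea could look in code: each weak learner is a one-node tree (a single horizontal or vertical line), each is fit on its own bootstrap sample, and the ensemble predicts by majority vote. The toy points, the function names, and the choice of 11 learners are all assumptions made for illustration, not the data or method shown in the video.

```python
import random
from collections import Counter

def fit_stump(points, labels):
    """Find the axis-aligned line with the fewest training errors."""
    best = None
    for axis in (0, 1):  # 0 = vertical line (split on x), 1 = horizontal (split on y)
        for threshold in sorted({p[axis] for p in points}):
            for below, above in (("red", "blue"), ("blue", "red")):
                errors = sum(
                    (below if p[axis] <= threshold else above) != y
                    for p, y in zip(points, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, axis, threshold, below, above)
    return best[1:]

def predict_stump(stump, point):
    axis, threshold, below, above = stump
    return below if point[axis] <= threshold else above

def fit_bagged_stumps(points, labels, n_stumps, seed=0):
    """Fit each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(points)
    stumps = []
    for _ in range(n_stumps):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([points[i] for i in idx],
                                [labels[i] for i in idx]))
    return stumps

def predict_bagged(stumps, point):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(s, point) for s in stumps)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: red points near the origin, blue points far from it.
points = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = ["red"] * 4 + ["blue"] * 4

stumps = fit_bagged_stumps(points, labels, n_stumps=11)
prediction_low = predict_bagged(stumps, (1, 1))
prediction_high = predict_bagged(stumps, (9, 9))
print(prediction_low, prediction_high)
```

An odd number of learners is used so that a two-class vote cannot tie.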

2 – MLND SL DT 13 Random Forests MAIN V2

Now, here’s a potential problem with decision trees. Let’s say we have a humongous table with lots and lots of columns. So, we create our decision tree and let’s say it looks like this. This is not a realistic tree though, just an example. We end up with answers like the following. If a client …

1 – L4 01 HS Intro V2

You’ve already learned about decision trees. Decision trees are very useful predictive models that have a number of advantages, including the fact that they are simple to understand and interpret, require little data preparation, can handle many data types, and work well with large datasets. A number of properties of tree-based methods are particularly appealing for …