9 – M7L7 09 Gini Solution V1

Okay. So, when both classes are 50 percent, what is the impurity? The frequency is 0.5 for both. Let’s type out this formula, which is actually the same one that you see right here. So, negative one times frequency one, times one minus frequency one, plus negative one times …
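As a sketch of the arithmetic being typed out here, assuming the lesson’s two-class impurity reduces to the common form Σᵢ fᵢ(1 − fᵢ) (the sign conventions read aloud in the video may differ from the notebook):

```python
# Hypothetical sketch: Gini impurity for two classes that each occur
# with frequency 0.5.
f1, f2 = 0.5, 0.5

# Common form of Gini impurity: sum of f_i * (1 - f_i) over classes
# (equivalently, 1 minus the sum of squared frequencies).
impurity = f1 * (1 - f1) + f2 * (1 - f2)
print(impurity)  # 0.5, the maximum impurity for two classes
```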

8 – M7L7 08 Gini Intro V1

Now, let’s look at Gini impurity and practice calculating it in code. Gini impurity, like entropy, is a way to measure how disorganized the observations are before and after splitting them using a feature. So, there’s an impurity value calculated for each node. In the formula, frequency sub i, over here, is the frequency of …
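For reference, the usual definition of Gini impurity at a node, with fᵢ the frequency of class i (presumably the formula the video is pointing at):

```latex
\text{Gini} = \sum_i f_i \,(1 - f_i) = 1 - \sum_i f_i^2
```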

7 – M7L7 07 Sklearn Code Intro Part 4 V1

Okay. So let’s look at the features used for splitting at each node. Here, again, is documentation that I pasted from the source code that’s over here; I just copied it into the notebook. So if we look at it, feature one is used to split at node zero. So at …
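A minimal sketch of reading off the splitting feature per node, assuming a fitted scikit-learn decision tree (the toy iris model stands in for the notebook’s model):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Fit a small tree so we have a tree_ structure to inspect.
X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# tree_.feature[i] is the index of the feature used to split node i;
# leaf nodes are marked with -2.
print(model.tree_.feature)
```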

66 – L7 Outro V1

In this lesson, you’ve learned about the need to interpret complex models and to identify which features are important to a model’s predictions. You’ve also learned how feature importance is implemented in scikit-learn, as well as how Shapley Additive Explanations are implemented. Finally, you also practiced ranking features by their importance to a …

65 – M7L7 66 Discussion V1

So, let’s discuss the sector feature a bit. Random forests can still work with categorical features that are encoded as numbers. For instance, to filter on sector five, it’s possible for a tree to learn to split on sector less than six and then, below that, split on sector greater than four. So, one of the …
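A toy illustration of that two-split trick, with made-up integer-coded sector values: splitting on sector < 6 and then sector > 4 isolates exactly sector == 5.

```python
import numpy as np

# Hypothetical integer-coded sector values.
sector = np.array([3, 4, 5, 5, 6, 7])

# First split: sector < 6; second split below it: sector > 4.
mask = (sector < 6) & (sector > 4)
print(sector[mask])  # [5 5], i.e. only sector 5 remains
```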

64 – M7L7 65 Rank Shap Solution V1

So let’s implement the model SHAP importances with the starter code. First, we’ll calculate the SHAP values. If you recall, let’s go back up to see how we call that function. The first step was: we take the shap library, which has a TreeExplainer; we create that object, passing in the model, and then …
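A minimal sketch of those two steps, with a toy random forest standing in for the notebook’s model:

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

# Toy data and model in place of the notebook's fitted model.
X = np.random.rand(100, 3)
y = (X[:, 0] > 0.5).astype(int)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Step 1: create the TreeExplainer, passing in the model.
explainer = shap.TreeExplainer(model)

# Step 2: calculate the SHAP values; for a classifier this typically
# returns one (n_samples, n_features) array per class (the exact
# return shape depends on the shap version).
shap_values = explainer.shap_values(X)
```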

63 – M7L7 64 Rank Shap Intro V1

Rank features using SHAP. There are a couple of classes, one for each quintile. So the list returned by the shap_values function has one element for each of those classes. We’ll explore how to take the absolute values and then the average of those absolute values for each of the features. Then, we can put that code …
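Continuing from the shap_values list in the earlier sketch, that aggregation could look like this (absolute value, then average over samples, then over classes):

```python
import numpy as np

# shap_values is assumed to be the per-class list returned by
# explainer.shap_values(X) above.
per_class = [np.abs(sv).mean(axis=0) for sv in shap_values]

# Average across classes to get one global score per feature.
global_importance = np.mean(per_class, axis=0)
print(global_importance)
```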

62 – M7L7 63 Local Global V1

Local to global feature importance. SHAP calculates local feature importance for every training observation, so for every single row. To calculate global feature importance, take the absolute values of the local feature importances, and then take the average across all the samples. So, that’s described in this formula here where we have N samples and …
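The formula being described is presumably of this form, where φⱼ⁽ⁿ⁾ is the local SHAP value of feature j for sample n and N is the number of samples:

```latex
I_j = \frac{1}{N} \sum_{n=1}^{N} \left| \phi_j^{(n)} \right|
```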

61 – M7L7 62 Shap V1

We’ll also use the shap library to determine feature importance. Let’s import the shap library, and we’ll also initialize JavaScript so that when we make plots, we can see them. Now, here’s some documentation for the function that we’ll use. If you go here, you can look for the shap_values function, and I’ve also …
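The two setup lines being described; initjs enables shap’s JavaScript-based plots in a notebook:

```python
import shap

# Enable JavaScript rendering so shap's interactive plots display
# inside the notebook.
shap.initjs()
```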

60 – M7L7 61 Rank Sklearn Solution V1

Okay. So first, let’s get the feature importances from the model; here the variable m is the model. We can get the feature importances like so, and then sort the importances in descending order and store the indices of that sort. So, if you did numpy.argsort of the importances, that would be in …
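A minimal sketch of that ranking step, with a toy model standing in for the notebook’s variable m:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data and model in place of the notebook's m.
X = np.random.rand(200, 4)
y = (X[:, 2] > 0.5).astype(int)
m = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

importances = m.feature_importances_

# np.argsort sorts ascending, so reverse the indices to rank features
# from most to least important.
sorted_idx = np.argsort(importances)[::-1]
print(sorted_idx)  # feature 2 should come first for this toy target
```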

6 – M7L7 07 Sklearn Code Intro Part 3 V1

Okay. So first, let’s get familiar with exploring the tree data structure in scikit-learn, or rather the Tree class. The source code for Tree is here, and it has useful comments about the attributes in the Tree class. Once we click on this link to the source code, I’m going to ask you …
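A minimal sketch of poking at those attributes on a toy fitted tree (the attribute names below are the documented ones on the Tree class):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

tree = model.tree_
print(tree.node_count)      # total number of nodes
print(tree.children_left)   # left child of each node (-1 for leaves)
print(tree.children_right)  # right child of each node
print(tree.threshold)       # split threshold at each internal node
```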

59 – M7L7 60 Rank Sklearn V1

Rank features by feature importance with sklearn. We’ll define a function that uses the built-in sklearn feature importances and sorts the features by their feature importance. Note that there’s a numpy.argsort function that returns a list of the original index locations of a list, in the order that would make the list sorted in ascending order. …
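A quick illustration of that np.argsort behavior:

```python
import numpy as np

values = np.array([0.3, 0.1, 0.6])

# argsort returns the index locations that would sort the list
# ascending: values[1]=0.1, values[0]=0.3, values[2]=0.6.
print(np.argsort(values))  # [1 0 2]
```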

58 – M7L7 59 Run Starter Code V1

So now let’s rank the features by their importance. The creator of Shapley Additive Explanations, Scott Lundberg, has written an efficient implementation that we can install and use. We’ll be able to use this to determine both local feature importance, which means for a single observation, and also global feature importance, which is for all …

57 – M7L7 56 Test3 V1

Okay. So test three is the case where features zero and one are one, and feature two is zero (it can actually be either a zero or a one). We want to find the prediction. So we can first double-check that the shap function we wrote gives the same result as the shap library. So you …
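A sketch of that double-check, reusing the hand-written helper and shap objects from earlier in the notebook (shap_tree_explainer, tree, and explainer are the lesson’s names, assumed to be defined):

```python
import numpy as np

# Test 3: features zero and one are 1, feature two is 0.
x_test = np.array([1, 1, 0])

ours = np.array(shap_tree_explainer(tree, x_test))               # hand-written
theirs = np.array(explainer.shap_values(x_test.reshape(1, -1))).ravel()  # library

# The two calculations should agree up to floating-point error.
print(np.allclose(ours, theirs))
```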

55 – M7L7 54 Test1 V1

Okay. So, now we’re just going to make sure that the values we computed by hand match the shap library. We’re testing the case where we input a simple observation where all the features have a value of zero. All right, and then here is our calculation of the SHAP values. …

54 – M7L7 53 Additive Feature Att Part 2 V1

Okay. So, for more detail, let’s go over an example, which will help make this a little more concrete. Let’s say we’ve trained a complex model on three features. If it’s given no inputs to make a prediction, then its prediction would be the equal-weighted average of all of its training samples. We …
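In other words, with no features switched on, the attribution model’s base value φ₀ is just the average of the model’s outputs over the training set. A tiny numeric sketch (the predictions are made up):

```python
import numpy as np

# Hypothetical predictions of the complex model on its training samples.
train_predictions = np.array([0.2, 0.4, 0.9, 0.5])

# With no inputs, the explanation's base value is their equal-weighted
# average.
phi_0 = train_predictions.mean()
print(phi_0)  # 0.5
```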

53 – M7L7 53 Additive Feature Att Part 1 V2

Additive feature attribution. So, additive feature attribution methods are simple models that are used to explain complex models. You can see the formula that I pasted here, from page three of the paper. So, if we go to the paper itself, let’s go to page three. I’m just showing this formula …
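The formula on page three of the paper (Lundberg and Lee, 2017) defines an explanation model g over simplified binary inputs z′ with M simplified features:

```latex
g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i,
\qquad z' \in \{0, 1\}^M
```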

52 – M7L7 52 Shap All V1

Okay. So now, I’m just going to explain some code that I’ve already written for you, just to tie things up. Notice that we only calculated the feature importance for a single feature. So we’ll create this function, shap_tree_explainer; it will take that tree object, it will take …
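A minimal sketch of how shap_tree_explainer might wrap the single-feature calculation; the shap_feature_i signature below is an assumption based on the exercise:

```python
# Sketch: compute the Shapley value for every feature by calling the
# single-feature function once per feature index.
def shap_tree_explainer(tree, x):
    n_features = len(x)
    return [shap_feature_i(tree, x, i) for i in range(n_features)]
```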

51 – M7L7 51 Shap One Solution V2

Okay. So now let’s fill in this starter code for shap_feature_i. First, for s_list: generate all subsets of S and put them into s_list. So if we go back up in our notebook, we can see where we defined the function for that. So this generate_all_subsets, and it …
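One way the generate_all_subsets helper could look, as a sketch (the notebook’s own implementation may differ):

```python
from itertools import combinations

def generate_all_subsets(features):
    """Return every subset of the given feature indices, including
    the empty set and the full set."""
    subsets = []
    for size in range(len(features) + 1):
        subsets.extend(combinations(features, size))
    return subsets

print(generate_all_subsets([0, 2]))
# [(), (0,), (2,), (0, 2)]
```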

50 – M7L7 50 Shap One Intro V1

Calculate the Shapley value for one feature. Implement a function that calculates the Shapley value for a single feature by iterating across all subsets S. So, this is the formula you may be familiar with from the previous exercise; we’re going to implement it in code here. So, as a bonus, you can try to implement …
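For reference, the classic Shapley value formula from the paper, where F is the set of all features and the sum runs over all subsets S of F that exclude feature i:

```latex
\phi_i = \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}
  \left[ f_{S \cup \{i\}}\!\left(x_{S \cup \{i\}}\right) - f_S\!\left(x_S\right) \right]
```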