How does treeinterpreter's method compare to Shapley values?
How are the contributions calculated in the tree interpreter? Just to understand it at a higher level…
Look here
Jeremy said this question would be better for the advanced thread.
Question for Jeremy from Larry D. of the TWiML study group:
How do the feature importance values relate to correlation?
The end result is more or less the same, at least in terms of interpretation.
They come from 2 completely different approaches though.
treeinterpreter just looks at relative differences of the metric of interest across splits, from the root to the leaf.
Shapley values come from game theory. They have a strong theoretical background (and I prefer them), compared to what treeinterpreter spits out.
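If it helps to see the two side by side, here is a minimal sketch (not from the lecture) of the two libraries applied to an already-fitted sklearn random forest; `m` and `row` are placeholder names for your model and a batch of feature rows:

```python
from treeinterpreter import treeinterpreter as ti
import shap

# `m` is an already-fitted sklearn RandomForestRegressor and `row` is a
# 2-D array / DataFrame of feature rows (placeholder names).

# treeinterpreter: decomposes each prediction into bias + per-feature
# contributions, accumulated along the root-to-leaf path of every tree.
prediction, bias, contributions = ti.predict(m, row)
# prediction ≈ bias + contributions.sum(axis=1)

# SHAP: game-theoretic attributions; TreeExplainer computes Shapley values
# exactly for tree ensembles.
explainer = shap.TreeExplainer(m)
shap_values = explainer.shap_values(row)
# prediction ≈ explainer.expected_value + shap_values.sum(axis=1)
```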
BTW, this approach of trying to determine whether a row comes from the training or the validation data is known as adversarial validation. It is often done at the beginning of Kaggle competitions to check whether the test data has a similar distribution to the training/validation data.
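A rough sketch of the idea, assuming `train_df` and `valid_df` are DataFrames with the same already-numeric/encoded feature columns (the names are illustrative):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

# Label each row by which set it came from, then try to predict that label.
df = pd.concat([train_df.assign(is_valid=0), valid_df.assign(is_valid=1)])
X, y = df.drop(columns='is_valid'), df['is_valid']

clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
probs = cross_val_predict(clf, X, y, cv=5, method='predict_proba')[:, 1]
print('AUC:', roc_auc_score(y, probs))
# AUC ~0.5 -> the two sets look alike; AUC near 1.0 -> they are easy to
# tell apart, i.e. the distributions differ.
```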
Question from Jiwon of the TWiML Study group:
Why don’t we use a y_range of -0.5 to 5.5?
What about replacing the sale ID and machine ID with non-ordered IDs (in contrast to removing them to fix the problem with time orderings)?
Could you please point to resources for this? @ilovescience
How is this done for image data?
Feature importance accounts for various feature interactions in the tree, whereas correlation just considers two variables.
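A toy illustration of that difference (not from the lecture): when the target depends on an interaction of two features, each feature alone has roughly zero linear correlation with the target, yet the forest still ranks both as important:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({'x1': rng.normal(size=5000),
                  'x2': rng.normal(size=5000),
                  'noise': rng.normal(size=5000)})
y = X['x1'] * X['x2']          # pure interaction: no linear relationship

print(X.assign(y=y).corr()['y'].round(3))   # x1 and x2 correlate ~0 with y

m = RandomForestRegressor(n_estimators=50, n_jobs=-1).fit(X, y)
print(dict(zip(X.columns, m.feature_importances_.round(3))))
# x1 and x2 get high importance, `noise` stays near 0
```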
Not certain, but I think that “0” was not actually a choice that users could rate. It seemed to me like it was used as a placeholder for “no rating” in some of Jeremy’s models.
That means the lowest value you would ever need to predict is “1”, so starting the range at 0 already provides sufficient buffer.
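For reference, here is roughly how the range gets passed in (a hedged sketch; `ratings` is assumed to be a DataFrame of user/item/rating rows as in the MovieLens example):

```python
from fastai.collab import CollabDataLoaders, collab_learner

# `ratings` is assumed to have user, item and rating (1-5) columns.
dls = CollabDataLoaders.from_df(ratings, rating_name='rating', bs=64)

# Ratings only go down to 1, so a lower bound of 0 already leaves the
# sigmoid some room below the smallest target; 5.5 gives headroom above 5.
learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))
```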
Check out some of Bojan Tunguz’s kernels on Kaggle.
Here is an example where he accurately predicted the “shakeup” of a tabular competition using this method:
https://www.kaggle.com/tunguz/lanl-adversarial-validation-shakeup-is-coming
If 2 features are highly correlated, their relative feature importance will be reduced compared to keeping just one of the two.
Here's why.
A random forest selects features randomly at each split (in general).
If 2 variables are correlated, they more or less carry the same signal wrt the dependent variable.
Hence you can expect a tree to split on either of the 2 evenly.
As an end result, your 2 features will have much less importance, just because they are carrying the same information. They hide each other.
I generally remove correlated features even when it is not strictly needed, just to be able to uncover these kinds of hidden relationships and spot truly important variables.
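You can see the dilution with a toy experiment (not from the lecture): duplicate a useful feature and its importance gets shared between the two copies:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 3 * x + rng.normal(scale=0.1, size=5000)

X_single = pd.DataFrame({'x': x, 'noise': rng.normal(size=5000)})
X_dup = X_single.assign(x_copy=x + rng.normal(scale=0.01, size=5000))

for X in (X_single, X_dup):
    # max_features=1 forces a random feature choice at each split
    m = RandomForestRegressor(n_estimators=50, max_features=1,
                              n_jobs=-1).fit(X, y)
    print(dict(zip(X.columns, m.feature_importances_.round(2))))
# Without the copy, `x` takes nearly all the importance; with the
# near-duplicate present, that importance is split between `x` and `x_copy`.
```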
You would just train on the images directly. Here’s an example:
https://www.kaggle.com/konradb/adversarial-validation-quick-fast-ai-approach
Same way as Jeremy is doing for tabular.
Just create a dataset with a new dependent variable (valid vs train) and train a model on that.
If your model is good, you are in trouble.
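A hedged fastai sketch of what that could look like for images (the `train_files` / `valid_files` lists and the overall setup are assumptions, not from the lecture):

```python
from fastai.vision.all import *

# `train_files` and `valid_files` are assumed to be lists of image paths
# from the two sets you want to compare.
files = train_files + valid_files
labels = ['train'] * len(train_files) + ['valid'] * len(valid_files)
lbl_map = dict(zip(map(str, files), labels))

dls = ImageDataLoaders.from_path_func(
    '.', files, lambda p: lbl_map[str(p)],
    valid_pct=0.2, seed=42, item_tfms=Resize(224))

learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.fine_tune(1)
# Accuracy around 50% -> the two image sets look alike; much higher ->
# the model can tell them apart, i.e. there is a distribution shift.
```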
So dropping features in a model is a way to reduce the complexity of the model and thus reduce overfitting? Is this better than adding some regularization like weight decay to tabular models?
Is there a good heuristic for picking the number of linear layers in the tabular model?