Lesson 7 - Official topic

There are implementations of RF that are optimized for the right hardware as well, for example see https://github.com/rapidsai/cuml

4 Likes
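For example, here is a minimal sketch of cuML's sklearn-style random forest API running on the GPU. The toy data and hyperparameters below are placeholders, not a recommended setup:

```python
import cudf
from cuml.ensemble import RandomForestClassifier

# toy data loaded onto the GPU as a cuDF dataframe (made-up columns)
df = cudf.DataFrame({
    "feat1": [0.1, 0.4, 0.35, 0.8],
    "feat2": [1.0, 0.0, 1.0, 0.0],
    "label": [0, 1, 0, 1],
})
X = df[["feat1", "feat2"]].astype("float32")
y = df["label"].astype("int32")

# sklearn-like interface, but the trees are built on the GPU
rf = RandomForestClassifier(n_estimators=100, max_depth=8)
rf.fit(X, y)
preds = rf.predict(X)
```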

Regarding Kaggle: I’m trying to use fastai2 on TPUs (the PyTorch version for TPUs came out March 25) as part of Kaggle’s “Flower Classification with TPUs” competition, in case anyone wants to join me: https://www.kaggle.com/c/flower-classification-with-tpus/overview

Jeremy, I heard that you won every Kaggle competition for 5 years straight. Is this true? Do you have any favorite stories of Kaggle competitions you were involved in?

1 Like

fastai2 won’t work directly with TPUs at this point (even with the PyTorch TPU library). There is ongoing development for this though.

1 Like

You’ll find a few answers here: https://youtu.be/205j37G1cxw :tea:

6 Likes

Here is Fastai’s competition using GPUs:
https://forums.fast.ai/t/fastgarden-a-new-imagenette-like-competition-just-for-fun/65909

2 Likes

It seems this algorithm is only for categorical variables, correct?

If I understand correctly, decision trees work with continuous (numeric) variables too. Is this true? If so, how does that work?

1 Like

If we are splitting only on categorical variables, then what do we do with the continuous variables?

2 Likes

We are just talking about the cleaning for now.

We split on some values: less than something or greater than something.

Oh I was referring to the section describing “The basic steps to train a decision tree can be written down very easily:”

Did I miss something?

Oh sorry, see my other answer above.

1 Like

Does fastai use any default data augmentation or create synthetic data for tabular datasets?
Do such techniques exist?

1 Like

I don’t know if such a technique exists, and there is nothing in fastai for this. Such a thing is probably domain-dependent.

1 Like

@ilovescience I’m giving it a try and seeing what happens. If it doesn’t work, I’ll either join the GPU competition or try to get the best of both worlds, i.e., data augmentation with fastai2 and training with TensorFlow.

OK, feel free to do so. I have worked on this for a couple of months though (first with fastai and now with fastai2), so I know it’s not trivial. Just a fair warning :slight_smile:
I would love to see how it goes for you though, and please do share your progress! :slight_smile:

2 Likes

I believe every candidate value in the range of that continuous variable is checked, and the best split value is chosen based on the metric used. Generally, decision trees look to maximize the information gain (i.e., reduce the entropy). See the sketch below.

1 Like
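To make that concrete, here is a minimal sketch (not fastai or sklearn code) of that brute-force search over candidate thresholds for a single continuous feature, using entropy/information gain. The `best_split` helper and the toy data are just for illustration:

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a vector of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_split(x, y):
    """Try every observed value of the continuous feature `x` as a threshold
    and return the one with the highest information gain on labels `y`."""
    base = entropy(y)
    best_gain, best_thresh = 0.0, None
    for thresh in np.unique(x):
        left, right = y[x <= thresh], y[x > thresh]
        if len(left) == 0 or len(right) == 0:
            continue
        # weighted average entropy of the two child nodes
        child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        gain = base - child
        if gain > best_gain:
            best_gain, best_thresh = gain, thresh
    return best_thresh, best_gain

# toy example: split a continuous "age" column against a binary target
age    = np.array([22, 25, 31, 35, 41, 48, 52, 60])
target = np.array([ 0,  0,  0,  1,  1,  1,  1,  1])
print(best_split(age, target))  # picks the threshold 31, which separates the classes perfectly
```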

@jeremy, do you have any thoughts on what data augmentation for tabular might look like?

7 Likes

This leads me to a follow-up question: is there a “resolution” for how the split value is adjusted during training? For example, adjusting the split from 0.1 to 0.15 vs. from 0.1 to 0.11?

Does fastai distinguish between ordered (example: “low”, “medium”, “high”) and unordered categorical variables (“red”, “green”, “blue”)?

4 Likes
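For context on the distinction itself, here is how it can be expressed in pandas before the dataframe reaches fastai’s tabular preprocessing (the column names and values are made up, and whether fastai uses the ordering is exactly the question above):

```python
import pandas as pd

df = pd.DataFrame({"size":  ["low", "high", "medium", "low"],
                   "color": ["red", "green", "blue", "red"]})

# unordered categorical: categories have no meaningful order
df["color"] = df["color"].astype("category")

# ordered categorical: pandas stores the order, so the integer codes
# respect low < medium < high
df["size"] = pd.Categorical(df["size"],
                            categories=["low", "medium", "high"],
                            ordered=True)
print(df["size"].cat.codes.tolist())  # [0, 2, 1, 0]
```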