CV is only meant for training your model, not production/inference. If you wanted to use all ‘n’ fold models, you’d want to save the Learners in an array and export them all.
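If it helps: using all n exported Learners at inference usually just means averaging their predictions. Here is a plain-Python sketch of that averaging step (the `ensemble` helper and the numbers are made up for illustration; in practice each inner list would come from one fold's `get_preds`):

```python
# Hypothetical sketch: average the predictions of n fold models.
# Each inner list holds one model's predictions for the same test items.
def ensemble(preds_per_model):
    n = len(preds_per_model)
    # zip(*...) groups the predictions item-by-item across models
    return [sum(ps) / n for ps in zip(*preds_per_model)]

fold_preds = [[0.8, 0.1],   # fold 1's predictions for two test items
              [0.6, 0.3],   # fold 2
              [0.7, 0.2]]   # fold 3
print(ensemble(fold_preds))  # averaged predictions per test item
```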
It would depend on the block you use. I can’t guarantee it, but it’s what I’ve found for what I’ve been trying.
Wow! This support works! What time is it in Florida? ;=) Many thanks!
@muellerzr I’m amazed by this walkthrough you are doing. I just watched lesson 2 and in regards to deployments, I do know a little about aws and deploying on aws. If anyone has questions about that, I’m glad to help if I can.
@muellerzr I found the bug! It was numpy that was causing the problem. When I created the mask the datatype was dtype=np.uint, which created an array of uint32 numbers on my Windows machine, while on colab it was uint64. PIL fromarray works only on a uint32 array. That is why it worked on Windows and not on colab.
In the Unknown Labels notebook - “I’m choosing a very high threshold for our metrics as we want only super confident answers (as we only have one label)” - shouldn’t we be setting the thresh on the loss function, loss_func=BCEWithLogitsLossFlat(thresh=0.1), for us to get confident answers?
Hey,
I had a few questions related to putting fastai models in production.
- Do you have experience with putting fastai Tabular models in production?
- How did you handle the preprocessing of the tabular data for the test set?
@navneetkrch for 2, our test_dl (which you can still use with an exported learner) will apply the preprocessing for you. What I mean is: assume ‘df’ is some test dataframe I loaded into pandas:
learn = load_learner(myModel)  # load the exported Learner
dl = learn.dls.test_dl(df)     # builds a test DataLoader, applying the training-time preprocessing
preds, _ = learn.get_preds(dl=dl)
I see the point in the question - I cannot properly answer it, BUT what we are doing in our case in accuracy_multi is this:
return ((inp>thresh)==targ.bool()).float().mean()
So basically, for each potential class that is returned for an image, we only take it as a candidate if the model was at least as confident as the threshold. Then we check it against the actual target to see if we were right or not. So I believe @muellerzr probably meant 0.9 instead of 0.1 if we want to be super sure, as I believe he corrected himself later in the video.
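A plain-Python sketch of that thresholding logic, with made-up confidences, shows why a high thresh means “super confident” (the real accuracy_multi operates on tensors):

```python
# Toy version of the thresholding in accuracy_multi.
# preds: model confidences per class; targs: 1 if the class is present.
def accuracy_multi(preds, targs, thresh):
    hits = [(p > thresh) == bool(t) for p, t in zip(preds, targs)]
    return sum(hits) / len(hits)

preds = [0.95, 0.40, 0.20]  # made-up confidences for three classes
targs = [1, 0, 0]           # only the first class is actually present

# low thresh: 0.40 and 0.20 also count as positives, hurting accuracy
print(accuracy_multi(preds, targs, 0.1))
# high thresh: only the very confident 0.95 prediction survives
print(accuracy_multi(preds, targs, 0.9))
```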
@muellerzr I’m working on Tabular data, and it looks like it doesn’t work without the labels for the test set. I ran your notebook dropping “sales” in the test set and get_preds throws an error. Probably a bug?
It should. Those tabular notebooks are all severely outdated, so it doesn’t surprise me it’s not working. I’d recommend the ones from the course folder in the fastai2 repo under nbs.
Ok, I will try that. Thanks.
Once all the notebooks are done for our Image block, I’ll be moving onto tabular. So probably here in the next month at most.
Hmmm. Yes, probably. You are right. Adjusting the metric lets us see what it really is, but I’d imagine the model would also fit faster and better with this adjustment. I’m wondering if MultiCategoryBlock allows this threshold (and if not, this would be a great PR).
Hi @mgloria, I think this is in a different part of the video. That was in the Multi Label notebook; this is in the Unknown Label notebook.
It’s not - this would be a good PR if you feel up to figuring it out @barnacl (we’ll help along the way). I’m thinking it’s simply a parameter we can pass to MultiCategoryBlock.
It’s being assigned here:
Ah, wouldn’t just setting the thresh in BCEWithLogitsLossFlat’s thresh parameter work?
Let me look at it a little more, I still have a few questions.
I’m trying to run a loop with thresh varying from 0.1 to 0.9 at intervals of 0.05 and seeing how the accuracy varies. I’m not seeing much change at all - not sure if the dataset is easy enough or if I’m doing something wrong. I remember Jeremy said he chose thresh=0.2 for the planet dataset; I’m guessing a sweep like this was how he chose it.
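For what it’s worth, here is a plain-Python version of that sweep on toy numbers (the preds/targs are made up; the real loop would call get_preds once and then evaluate accuracy_multi at each thresh):

```python
# Toy threshold sweep, mirroring the 0.1 -> 0.9 loop at 0.05 intervals.
def accuracy_multi(preds, targs, thresh):
    hits = [(p > thresh) == bool(t) for p, t in zip(preds, targs)]
    return sum(hits) / len(hits)

preds = [0.92, 0.15, 0.40, 0.05, 0.88, 0.55]  # made-up confidences
targs = [1,    0,    0,    0,    1,    1]     # made-up labels

threshes = [round(0.10 + 0.05 * i, 2) for i in range(17)]  # 0.1 .. 0.9
for t in threshes:
    print(t, round(accuracy_multi(preds, targs, t), 3))
```

One thing this makes visible: if the model’s confidences mostly sit near 0 or 1 (i.e. the dataset is easy), the metric barely moves across the sweep, which may be exactly what you’re seeing.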
Yes, but we’d want to bring it in when we initialize our DataBlock (hence a param to MultiCategoryBlock)