I think most of the infrastructure for that should be in place already - a DataSource can have more than 2 subsets, as can DataBunch. It might need some callback to be modified to get additional metrics to show in fit, and there might be other changes needed too.
What use cases have you heard of for labeled test sets?
I know the general rule of thumb for evaluating neural networks is to keep a third, held-out test set. In my research I've been evaluating my model on that third set: a labeled 10% subset the model was never trained on, i.e. 'real' data, to get a fair judgement.
There are methods to do this currently with the v1 library, but it requires a bit of hacking.
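For anyone wanting to set this up by hand in the meantime, here is a rough sketch of the kind of three-way split I mean (plain numpy, nothing fastai-specific; the 80/10/10 ratio and names are just examples):

import numpy as np

def three_way_split(n_items, valid_pct=0.1, test_pct=0.1, seed=42):
    "Return shuffled train/valid/test index arrays, e.g. an 80/10/10 split."
    rng = np.random.default_rng(seed)
    idxs = rng.permutation(n_items)
    n_test, n_valid = int(n_items*test_pct), int(n_items*valid_pct)
    test  = idxs[:n_test]
    valid = idxs[n_test:n_test+n_valid]
    train = idxs[n_test+n_valid:]
    return train, valid, test

train_idx, valid_idx, test_idx = three_way_split(1000)
# The labeled test split is only touched once, for the final evaluation.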
Is anybody else seeing issues trying to open links in the docs? The one I'm trying now is in nb 00_test. It links to /test.html, but I think it should be in /docs/test.html
The actual command is:
show_doc(test_eq)
And the link to test underneath is what doesn't take me to the correct place. I'm digging into it still, but just want to see if anybody else can confirm it isn't just me.
Is there a specific level that a user is expected to run these notebooks from? Inside dev or from the top level?
We havenāt got much inference/prediction functionality yet - but once we do, that should be trivial, since all the data functionality is designed to handle as many datasets as you like.
I'm slowly going further along. I'm trying to train my model with mixup (as the cnn_learner tutorial has it), and I get a runtime error: _thnn_conv2d_forward not supported on CPUType for Half
After investigating, it looks like for some reason Python is not picking up my GPU, despite !python -c 'import torch; print(torch.cuda.is_available())' returning True…
(Training without mixup gives an epoch time of 10 min/epoch; on v1 this is 1:30.)
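For reference, here is the quick sanity check I'm running to see where things actually live (plain PyTorch, the conv layer and batch are just stand-ins; the Half error shows up when fp16 tensors end up running on the CPU):

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)                                  # expect 'cuda' here

model = nn.Conv2d(3, 8, 3).to(device)          # stand-in for the real model
xb = torch.randn(4, 3, 32, 32, device=device)  # stand-in for a batch
print(next(model.parameters()).device)         # should also say 'cuda', otherwise forward() runs on the CPU
out = model(xb)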
Also ds_tfms seems to have been removed from DataSource?
So I think I figured it out, and I think it is a mistake! If you look at the tests, the bool doesn't matter: x refers to the self object of the instance being patched (here an int), not an extra argument.
@patch
def func(x:_T3, a:bool):
    "test"
    return x+2

t = _T3(1)
test_eq(t.func(1), 3) # the value passed for `a` doesn't matter
test_eq(t.func.__qualname__, '_T3.func')
We can make this a bit more sane, which is consistent with later in the core notebook:
@patch
def func(self:_T3):
    "test"
    return self+2

@patch
def func2(self:_T3, a:int):
    "test"
    return self+2+a

t = _T3(1)
test_eq(t.func(), 3) # here the method takes no arguments
test_eq(t.func.__qualname__, '_T3.func')

t2 = _T3(1)
test_eq(t2.func2(2), 5) # here we can pass in an argument
test_eq(t2.func2.__qualname__, '_T3.func2')
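For anyone else trying to follow along, my rough mental model of what @patch does is something like this: it reads the annotation of the first parameter and attaches the function to that class. To be clear, this is only a simplified sketch for intuition, not the actual fastai implementation (which also copies the function and fixes up more metadata):

def patch(f):
    "Simplified sketch: attach `f` as a method of the class annotated on its first parameter."
    cls = next(iter(f.__annotations__.values()))     # e.g. _T3 from `self:_T3`
    f.__qualname__ = f'{cls.__name__}.{f.__name__}'  # so the qualname tests above pass
    setattr(cls, f.__name__, f)
    return f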
Been reading through the core notebook today myself! It's been a wild ride. These metaclasses are really melting my brain.
I agree with this change unless there is something happening that I'm missing with the a:bool. Pretty sure this is the correct interpretation though.
This stuff is very brain melty for sure. Just going to try understanding enough of it so I can work on adding audio capabilities.
Sounds good! I'll go in and submit a pull request and fix the other instances of that bool creeping in.
And yeah! Each fastai revision reminds me how little I know about python. Love it though. Understanding the core.ipynb was like trying to stay on one of those mechanical bulls. Gonna go through it a few more times, especially for the latter portion. Favorite part so far is definitely building up to getting rid of the super call in nn.Module.
class Module(nn.Module, metaclass=PrePostInitMeta):
    "Same as `nn.Module`, but no need for subclasses to call `super().__init__`"
    def __pre_init__(self): super().__init__()
    def __init__(self): pass
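My mental model of PrePostInitMeta, in case it helps anyone else, is roughly the following. It's only a simplified sketch of the idea, not the notebook's actual implementation, but it shows why subclasses never need the super().__init__ call: the metaclass runs __pre_init__ (which calls nn.Module.__init__) before your own __init__ body.

class PrePostInitMeta(type):
    "Simplified sketch: run `__pre_init__` and `__post_init__` around `__init__`."
    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls)
        if hasattr(obj, '__pre_init__'):  obj.__pre_init__()
        obj.__init__(*args, **kwargs)
        if hasattr(obj, '__post_init__'): obj.__post_init__()
        return obj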
I think it would be better to write documentation on every class mentioned in 01_core.ipynb. That way it would help every enthusiast understand what is going on, and also help us understand Python better. Let me see what I can do.
I agree that documentation is a key issue. I would include comments in the code, which appears to be entirely comment-free, just to give a hint as to the purpose of the often obscure lines.
The videos are absolutely fantastic but I do not consider them to be easily accessible documentation.
I have had difficulty with how completely tqdm is incorporated (perhaps because I often do not use notebooks, and it can get mangled in displays). It has a disable flag, but that doesn't seem to work here, and having to override huge classes just to turn it off is a tragedy. I would argue that tqdm is a bit too high-level to bake into a development package. But this could just be me…
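For reference, this is the sort of usage I expected the disable flag to support, straight from my reading of tqdm itself (nothing fastai-specific here):

from tqdm import tqdm

for _ in tqdm(range(100), disable=True):  # expected: no bar at all; disable=None auto-disables on non-TTY output
    pass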
I also think that the code is generally rather abstract. I can dive quite quickly into almost anything in PyTorch and easily understand how to make it work, but in fastai I have spent hours in some parts without really understanding much of anything. And I recall painfully learning the data classes only to have them replaced from scratch.
Having said all that, the code rarely (if ever) fails when used properly, and I really like that new research is quickly incorporated - that was greatly helpful in doing an (admittedly less pristine) port to pure PyTorch. For example, 1cycle was easy to put into pure PyTorch after it was celebrated by fastai.
Sorry for the negativity. Even with that, I realize that fastai has had great positive impact and I am glad it exists.
The reasoning for this is that, ideally, the documentation should explain line by line what's going on, which, yes, needs to be worked on immensely. However, the new notebook structure allows that quite easily and is a step in the right direction compared to v1.
That being said, some decisions are just 'well, it just works' moments that don't necessarily have a principled explanation but do work in practice. For example, the embedding sizes for tabular models: to some degree the idea took inspiration from word2vec, but there isn't a strong argument for why it works, aside from experimentation (see the sketch below).
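As a concrete example of that kind of choice: if I remember the current rule of thumb correctly, the embedding width for a categorical variable is picked from its cardinality with something like the formula below; the constants came from experimentation rather than theory, so treat them as approximate:

def emb_sz_rule(n_cat):
    "Rough rule of thumb: embedding width grows slowly with cardinality, capped at 600."
    return min(600, round(1.6 * n_cat**0.56))

print([emb_sz_rule(n) for n in (10, 1000, 100_000)])  # small categories get a handful of dims; huge ones hit the cap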