I think most of the infrastructure for that should be in place already - a DataSource can have more than 2 subsets, as can DataBunch. It might need some callback to be modified to get additional metrics to show in fit, and there might be other changes needed too.
What use cases have you heard of for labeled test sets?
I know the general rule of thumb for evaluating neural networks is to keep a third, held-out test set. In my research I've been evaluating my model on that third set: a labeled 10% subset the model was never trained on, i.e. 'real' data, to get a fair judgement.
There are methods to do this currently with the v1 library, but it requires a bit of hacking.
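For anyone wanting to set this up by hand in the meantime, here is a rough sketch of the kind of three-way split I mean (plain numpy, nothing fastai-specific; the 80/10/10 ratio and names are just examples):

import numpy as np

def three_way_split(n_items, valid_pct=0.1, test_pct=0.1, seed=42):
    "Return shuffled train/valid/test index arrays, e.g. an 80/10/10 split."
    rng = np.random.default_rng(seed)
    idxs = rng.permutation(n_items)
    n_test, n_valid = int(n_items*test_pct), int(n_items*valid_pct)
    test  = idxs[:n_test]
    valid = idxs[n_test:n_test+n_valid]
    train = idxs[n_test+n_valid:]
    return train, valid, test

train_idx, valid_idx, test_idx = three_way_split(1000)
# The labeled test split is only touched once, for the final evaluation.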
Is anybody else seeing issues trying to open links in the docs? The one I'm trying now is in nb 00_test. It links to /test.html, but I think it should be in /docs/test.html
The actual command is:
show_doc(test_eq)
And the link to test underneath is what doesn't take me to the correct place. I'm digging into it still, but just want to see if anybody else can confirm it isn't just me.
Is there a specific level that a user is expected to run these notebooks from? Inside dev or from the top level?
We havenāt got much inference/prediction functionality yet - but once we do, that should be trivial, since all the data functionality is designed to handle as many datasets as you like.
I'm slowly going further along. I'm trying to train my model with mixup (as the cnn_learner tutorial has it), and I get a runtime error: _thnn_conv2d_forward not supported on CPUType for Half
After investigating, it looks like for some reason Python is not picking up my GPU, despite !python -c 'import torch; print(torch.cuda.is_available())' returning True…
(Training without mixup gives an epoch time of 10 min/epoch; on v1 this is 1:30.)
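For reference, here is the quick sanity check I'm running to see where things actually live (plain PyTorch, the conv layer and batch are just stand-ins; the Half error shows up when fp16 tensors end up running on the CPU):

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)                                  # expect 'cuda' here

model = nn.Conv2d(3, 8, 3).to(device)          # stand-in for the real model
xb = torch.randn(4, 3, 32, 32, device=device)  # stand-in for a batch
print(next(model.parameters()).device)         # should also say 'cuda', otherwise forward() runs on the CPU
out = model(xb)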
Also ds_tfms seems to have been removed from DataSource?
So I think I figured it out, and I think it is a mistake! If you look at the tests, the bool doesn't matter: x refers to the self object of the instance being patched (here an int), not an extra argument.
@patch
def func(x:_T3, a:bool):
    "test"
    return x+2

t = _T3(1)
test_eq(t.func(1), 3) # the value passed for `a` doesn't matter
test_eq(t.func.__qualname__, '_T3.func')
We can make this a bit more sane, which is consistent with later in the core notebook:
@patch
def func(self:_T3):
    "test"
    return self+2

@patch
def func2(self:_T3, a:int):
    "test"
    return self+2+a

t = _T3(1)
test_eq(t.func(), 3) # here the method takes no arguments
test_eq(t.func.__qualname__, '_T3.func')

t2 = _T3(1)
test_eq(t2.func2(2), 5) # here we can pass in an argument
test_eq(t2.func2.__qualname__, '_T3.func2')
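For anyone else trying to follow along, my rough mental model of what @patch does is something like this: it reads the annotation of the first parameter and attaches the function to that class. To be clear, this is only a simplified sketch for intuition, not the actual fastai implementation (which also copies the function and fixes up more metadata):

def patch(f):
    "Simplified sketch: attach `f` as a method of the class annotated on its first parameter."
    cls = next(iter(f.__annotations__.values()))     # e.g. _T3 from `self:_T3`
    f.__qualname__ = f'{cls.__name__}.{f.__name__}'  # so the qualname tests above pass
    setattr(cls, f.__name__, f)
    return f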
Been reading through the core notebook today myself! It's been a wild ride. These metaclasses are really melting my brain.
I agree with this change unless there is something happening that I'm missing with the a:bool. Pretty sure this is the correct interpretation though.
This stuff is very brain melty for sure. Just going to try understanding enough of it so I can work on adding audio capabilities.
Sounds good! I'll go in and submit a pull request and fix the other instances of that bool creeping in.
And yeah! Each fastai revision reminds me how little I know about python. Love it though. Understanding the core.ipynb was like trying to stay on one of those mechanical bulls. Gonna go through it a few more times, especially for the latter portion. Favorite part so far is definitely building up to getting rid of the super call in nn.Module.
class Module(nn.Module, metaclass=PrePostInitMeta):
    "Same as `nn.Module`, but no need for subclasses to call `super().__init__`"
    def __pre_init__(self): super().__init__()
    def __init__(self): pass
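My mental model of PrePostInitMeta, in case it helps anyone else, is roughly the following. It's only a simplified sketch of the idea, not the notebook's actual implementation, but it shows why subclasses never need the super().__init__ call: the metaclass runs __pre_init__ (which calls nn.Module.__init__) before your own __init__ body.

class PrePostInitMeta(type):
    "Simplified sketch: run `__pre_init__` and `__post_init__` around `__init__`."
    def __call__(cls, *args, **kwargs):
        obj = cls.__new__(cls)
        if hasattr(obj, '__pre_init__'):  obj.__pre_init__()
        obj.__init__(*args, **kwargs)
        if hasattr(obj, '__post_init__'): obj.__post_init__()
        return obj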
I think it would be better to write documentation on every class mentioned in 01_core.ipynb. That way it would help every enthusiast understand what is going on, and also help us understand Python better. Let me see what I can do.
I agree that documentation is a key issue. I would include comments in the code, which appears to be entirely comment-free, just to give a hint as to the purpose of the often obscure lines.
The videos are absolutely fantastic but I do not consider them to be easily accessible documentation.
I have had difficulty with how completely tqdm is incorporated (perhaps because I often do not use notebooks, and it can get mangled in displays). It has a disable flag, but that doesn't seem to work here, and having to override huge classes just to turn it off is a tragedy. I would argue that tqdm is a bit too high-level to bake into a development package. But this could just be me…
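For reference, this is the sort of usage I expected the disable flag to support, straight from my reading of tqdm itself (nothing fastai-specific here):

from tqdm import tqdm

for _ in tqdm(range(100), disable=True):  # expected: no bar at all; disable=None auto-disables on non-TTY output
    pass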
I also think that the code is generally rather abstract. I can dive quite quickly into almost anything in PyTorch and easily understand how to make it work, but in fastai I have spent hours in some parts without really understanding much of anything. And I recall painfully learning the data classes only to have them replaced from scratch.
Having said all that, the code rarely (if ever) fails when used properly, and I really like that new research is quickly incorporated - that was greatly helpful in doing an (admittedly less pristine) port to pure PyTorch. For example, 1cycle was easy to put into pure PyTorch after it was celebrated by fastai.
Sorry for the negativity. Even with that, I realize that fastai has had great positive impact and I am glad it exists.
The reasoning for this is that, ideally, the documentation should explain line by line what's going on, which, yes, needs to be worked on immensely. However, the new notebook structure allows that quite easily and is a step in the right direction compared to v1.
That being said, some decisions are just 'well, it just works' moments that don't necessarily have a principled explanation but do work in practice. For example, the embedding sizes for tabular models: to some degree the idea took inspiration from word2vec, but there isn't a strong argument for why it works, aside from experimentation (see the sketch below).
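As a concrete example of that kind of choice: if I remember the current rule of thumb correctly, the embedding width for a categorical variable is picked from its cardinality with something like the formula below; the constants came from experimentation rather than theory, so treat them as approximate:

def emb_sz_rule(n_cat):
    "Rough rule of thumb: embedding width grows slowly with cardinality, capped at 600."
    return min(600, round(1.6 * n_cat**0.56))

print([emb_sz_rule(n) for n in (10, 1000, 100_000)])  # small categories get a handful of dims; huge ones hit the cap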