I haven't tried anything yet; I was just sharing how I thought I could do it, because I have this datablock with a bunch of callbacks that almost does it anyway. I didn't actually know there was a sampler until he mentioned it, because I haven't used it.
I'll understand his way first and try it when I get there, after I get the predictions working. Appreciate the redirect.
The best way to understand DataLoader and TfmdDL is to read their source code and tests. They're extremely simple in terms of implementation, but extremely flexible. We'll need lots of tutorial examples to show how to take full advantage of this. Examples created for Kaggle competitions will be fantastic role models for others to follow - I'm happy to code review any draft approaches that you all come up with. It's possible you'll find things that aren't easy to do, in which case we can modify the library until it is easy.
I'll address more things during the day and will share whatever code I end up writing on sampling / inference. Part of the benefit of putting in writing what I struggled with yesterday was that I think I came up with ways to work around those issues. But the problem is that I would not like to work around the framework. Not in the sense that I don't want to put in the time or that I don't know how to - I've done that before and would be happy to do it again. It's just that I feel there is less value for everyone in this approach.
For inference in a Flask app, I think I would probably want to call the model on a tensor directly? That's what I have done in the past on a project for a fashion startup. Even if not a permanent solution, maybe at least in the interim that would be OK. I would still use the building blocks that v2 provides, the context managers, etc. I would also learn a bit more about the framework, so that would be nice as well. We probably want to forgo the whole DataLoader mechanism for inference anyhow.
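To make the "call the model on a tensor directly" idea concrete, here is a minimal sketch of the pattern. Everything in it (`preprocess`, `toy_model`, `decode_pred`, the normalization stats, the class names) is a stand-in I made up for illustration, not fastai v2 API; with a real PyTorch model you'd also put the model in eval mode and wrap the call in `torch.no_grad()`.

```python
# Hedged sketch: direct-model inference, skipping the DataLoader machinery.
# All names here are illustrative stand-ins, not fastai v2 API.

def preprocess(pixels, mean=0.5, std=0.25):
    """Normalize raw values the same way the training pipeline did."""
    return [(p - mean) / std for p in pixels]

def toy_model(x):
    """Stand-in for `model(x)`: returns per-class scores."""
    s = sum(x)
    return [s, -s]  # two-class scores

def decode_pred(scores, classes=("cat", "dog")):
    """Map the argmax score back to a class label."""
    return classes[scores.index(max(scores))]

def predict(pixels):
    # the whole "inference endpoint" in one call:
    # normalize -> forward pass -> decode
    return decode_pred(toy_model(preprocess(pixels)))

print(predict([0.9, 0.8, 0.7]))  # -> cat
```

The point of the sketch is the shape of the code path: whatever a Flask handler receives gets preprocessed exactly as during training, passed to the model once, and decoded, with no batching or DataLoader involved.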
For inference in a Kaggle competition, I think I could figure out how to get the augmented tensors and manually iterate over the dataloader using context managers, setting the model to eval, stopping to recalculate BN values, etc. I could do the TTA manually, if it even makes sense in the context of this competition. Maybe going this route is not a bad idea? I am just thinking out loud, but it's a bit hard for me to draw the line between what the framework could do out of the box vs piecing functionality together from components. I think the answer is fairly simple though - whatever the framework doesn't do, and is not easily achieved using callbacks / subclassing and overriding a simple method, would probably best be built from individual 'lego' bricks and shared if possible, so others have a chance of referencing it.
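The manual TTA loop described above reduces to a small pattern: run the model on several augmented views of each item and average the predictions. Here is a hedged sketch of just that pattern; `flip` and `linear_model` are toy stand-ins (assumptions for illustration), not fastai v2's TTA implementation.

```python
# Hedged sketch of manual test-time augmentation (TTA):
# predict on the original item plus each augmented copy, then average.

def flip(x):
    """Toy augmentation: reverse the sequence (stand-in for a real transform)."""
    return list(reversed(x))

def linear_model(x, weights=(0.2, 0.3, 0.5)):
    """Stand-in for an eval-mode model forward pass: a fixed linear scorer."""
    return sum(w * xi for w, xi in zip(weights, x))

def tta_predict(item, tfms):
    """Average predictions over the identity view plus each augmentation."""
    views = [item] + [t(item) for t in tfms]
    preds = [linear_model(v) for v in views]
    return sum(preds) / len(preds)

print(tta_predict([1.0, 2.0, 3.0], tfms=[flip]))  # averages 2.3 and 1.7 -> 2.0
```

In a real competition pipeline the loop would iterate over a dataloader with the model in eval mode and gradients disabled, but the averaging logic is exactly this.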
Those are just some random thoughts, but I am trying to orient myself in how to get things done with v2 while still being a good member of the community. In particular, I hope that neither Jeremy nor Sylvain mind what I write - I do respect your time, and I neither want to impede your work in whatever order you deem best (seems text models have been getting a lot of love recently!), nor is it my intention to waste your time on some Radek goose chases. BTW, you seem very good at not letting people on the Internet impede your work, so maybe there is nothing there to worry about.
Could I please ask if the Adam optimizer in v2 is of the AdamW variety? Also, does it handle setting the eps for us?
Is there some description you could please point me to of the TTA trick where you train for a little while and then search for useful transforms? Probably the biggest question on my mind - would training a little mean training for a couple of epochs, until the model just starts doing something useful? Roughly how much training should one do here? Or is it training to the point where the model achieves 80%-90% of its final accuracy - a point that can be reached rather quickly, but where the model can still benefit from further training?
Adam in v2 is AdamW unless you pass true_wd=False. You can also set the eps, which defaults to 1e-5 (larger than the default value in PyTorch, but we found it led to more stable training and allowed slightly bigger learning rates).
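To illustrate what `true_wd` changes, here is a hedged single-parameter sketch of the difference between classic L2 weight decay and AdamW's decoupled decay. This is a from-scratch illustration of the standard AdamW idea, not fastai v2's actual optimizer code; the default `eps=1e-5` below just mirrors the value mentioned above.

```python
import math

# Hedged sketch: with L2 decay, wd*p is folded into the gradient before
# the Adam moments are computed; with decoupled (AdamW-style) decay,
# lr*wd*p is subtracted from the parameter directly.

def adam_step(p, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.99,
              eps=1e-5, wd=0.01, true_wd=True, t=1):
    if not true_wd:
        grad = grad + wd * p          # classic L2: decay goes through the gradient
    m = beta1 * m + (1 - beta1) * grad        # first moment
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    if true_wd:
        p = p - lr * wd * p           # decoupled (AdamW-style) weight decay
    return p, m, v
```

Because Adam rescales the gradient by the second moment, folding the decay into the gradient (L2) and applying it directly to the parameter (AdamW) give genuinely different updates, which is the point of the `true_wd` flag.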
The second one above is thanks to the new Tuple class that I recently added, which supports broadcasted arithmetic ops on tuples - really helpful for arithmetic on array shapes, as you see here.
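For readers who haven't seen the source, here is a hedged minimal sketch of what a broadcasting tuple can look like. The class name `BTuple` and its three ops are my own illustration, not fastai v2's actual Tuple implementation, which is richer.

```python
# Hedged sketch of a tuple with broadcasted arithmetic: an op with a
# scalar applies elementwise; an op with an equal-length sequence pairs
# elements up. Not fastai v2's real Tuple class.

class BTuple(tuple):
    def _op(self, other, f):
        if isinstance(other, (list, tuple)):
            assert len(self) == len(other), "length mismatch"
            return BTuple(f(a, b) for a, b in zip(self, other))
        return BTuple(f(a, other) for a in self)  # scalar broadcast

    def __add__(self, other): return self._op(other, lambda a, b: a + b)
    def __mul__(self, other): return self._op(other, lambda a, b: a * b)
    def __floordiv__(self, other): return self._op(other, lambda a, b: a // b)

# handy for arithmetic on array shapes:
shape = BTuple((64, 128))
print(shape // 2)      # -> (32, 64)
print(shape + (8, 8))  # -> (72, 136)
```

Note that `__add__` here deliberately replaces tuple concatenation with elementwise addition, which is exactly the behavior change that makes shape arithmetic convenient.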
In v2, is there a way to create the X dataset using a Pipeline from source items, but the y labels from a CSV file? I am trying to convert .dcm images into the X dataset, while the y labels come from a .csv file in the RSNA Kaggle challenge.
Easiest would be to combine everything you need (filenames and labels) into a DataFrame, and then follow any of the sample approaches for Planet in nb 50. Let us know how you go!
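The core of the suggestion is the join: pair each filename with its label before building anything else. Here is a hedged stdlib sketch of that step; in practice you'd do this with a pandas DataFrame as suggested above, and the column names `ID` and `Label` are assumptions for illustration, not the RSNA csv's real schema.

```python
import csv, io

# Hedged sketch: join filenames to labels from a CSV using only the stdlib.
# Column names `ID` / `Label` and the filenames are made up for illustration.
labels_csv = io.StringIO("ID,Label\nimg_001,1\nimg_002,0\n")
label_of = {row["ID"]: int(row["Label"]) for row in csv.DictReader(labels_csv)}

items = ["img_001.dcm", "img_002.dcm"]   # e.g. from a folder listing
pairs = [(fn, label_of[fn.split(".")[0]]) for fn in items]
print(pairs)  # -> [('img_001.dcm', 1), ('img_002.dcm', 0)]
```

Once filenames and labels live in one table, any of the standard labelling approaches (like the Planet examples in nb 50) can read both columns from the same row.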
I guess that’s more of a feature request:
And please let me know if that’s the right place to ask or if there is another forum / discussion for that.
Are there any plans to add a SOTA NLP zoo similar to huggingface/transformers, which ships 32+ pre-trained PyTorch models including (Ro)BERTa, GPT-2, and XLM?
Maybe as a “contrib” branch?
They already have some Swift / CoreML ports, so on the surface that looks like a nice complement to the recent Swift.AI work, but I guess that might need to be examined a bit more.
I am well aware that fast.ai already comes with ULMFiT and similar models, but it would be wonderful to add GPT-2 and BERT to enable new kinds of application development. For example, the folks at TabNine use the GPT-2 model to train a predictive code-autocomplete / code-snippet generator for several programming languages, and that is a pretty useful application.