I’ve just pushed changes based on my recent work in NLP. I haven’t had time to thoroughly test whether they’ve broken any of the lessons, so apologies in advance if you find anything not working. Please either let me know here of problems, or if you know how to fix them feel free to send a PR (and let us know here that you’ve fixed it).
One key change is that metrics now take pytorch tensors/variables, not numpy arrays. The reason is that (to my surprise!) calculating accuracy was taking a significant percentage of training time. Moving it from numpy to pytorch has fixed that problem.
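To illustrate the idea (not the exact fastai implementation), an accuracy metric that stays on torch tensors avoids the tensor→numpy copy on every batch; the function name and shapes here are just for the sketch:

```python
import torch

def accuracy(logits, targets):
    """Accuracy computed entirely on torch tensors -- no numpy round-trip."""
    preds = torch.max(logits, dim=1)[1]          # predicted class index per row
    return (preds == targets).float().mean()     # fraction of correct predictions

logits = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
targets = torch.tensor([1, 0, 0])
print(accuracy(logits, targets))  # 2 of 3 correct -> ~0.6667
```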
The other key changes are around the NLP modules. They are designed to continue to work much the same way as before, but there have been internal changes. (FYI, you’ll see significant overlap between fastai.nlp and fastai.text. fastai.nlp is the old module that’s designed to work with torchtext. fastai.text is a new module designed to replace torchtext. You should stick with fastai.nlp since the new module isn’t documented, unless you’re interested in getting involved in development of this module.)
I want to work with the new code in the fastai.text namespace, as it seems more friendly for both multi-class and multi-label problems.
In the .nlp namespace there is a helper method that builds the DataLoaders, but it is heavily dependent on torchtext. In the .text namespace, there is just a constructor that accepts the dataloaders as arguments. My question is:
What should these datasets/dataloaders look like and is there a recommended way to build them?
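For context, here is my working assumption of what such a dataset/dataloader pair might look like in plain pytorch: items are (numericalized token sequence, label), with a collate function that pads each batch. The class and function names are my own, not fastai.text API:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TextDataset(Dataset):
    """Assumed minimal shape: each item is (token-id sequence, label)."""
    def __init__(self, token_ids, labels):
        self.token_ids = token_ids  # list of lists of ints, already numericalized
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        return torch.tensor(self.token_ids[i]), self.labels[i]

def pad_collate(batch, pad_idx=1):
    """Pad variable-length sequences in a batch to a common length."""
    seqs, labels = zip(*batch)
    maxlen = max(len(s) for s in seqs)
    padded = torch.full((len(seqs), maxlen), pad_idx, dtype=torch.long)
    for i, s in enumerate(seqs):
        padded[i, :len(s)] = s
    return padded, torch.tensor(labels)

ds = TextDataset([[2, 3, 4], [5, 6]], [1, 0])
dl = DataLoader(ds, batch_size=2, collate_fn=pad_collate)
xb, yb = next(iter(dl))
print(xb.shape)  # torch.Size([2, 3]) -- shorter sequence padded to length 3
```

Whether the fastai.text constructor expects exactly this shape is what I'm asking.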
(Attached is what I did for the toxic comp. using the nlp namespace; any feedback on what is good, bad, or could be improved would be appreciated, in addition to how to translate it into something that will work with the text namespace.)
I'm also going to submit a PR for a text_labels_from_dataframes() helper once I get things working as is (I still find working with dataframes so much more pleasant and flexible than the reading-files approach).
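Roughly what I have in mind for that helper, with a signature I've made up for illustration (column names and return shape are assumptions, not the final PR):

```python
import pandas as pd

def text_labels_from_dataframes(df, text_col, label_cols):
    """Hypothetical sketch: pull texts and (multi-)label rows straight from a DataFrame."""
    texts = df[text_col].astype(str).tolist()
    labels = df[label_cols].values.tolist()  # one row of labels per text
    return texts, labels

df = pd.DataFrame({
    "comment_text": ["you are great", "awful stuff"],
    "toxic":  [0, 1],
    "insult": [0, 1],
})
texts, labels = text_labels_from_dataframes(df, "comment_text", ["toxic", "insult"])
print(labels)  # [[0, 0], [1, 1]]
```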