@rachel @jeremy I just listened to the first lesson in the new fastai course “A Code-First Introduction to Natural Language Processing”. I’m both grateful and excited to use this course to learn about NLP! Thanks so much for creating the course and making it available online!
Is there a “Category” in the Fastai Forum for this course? If not, could you please create one, or let me know the appropriate category for posts relating to the new course? Again, thank you so much!
edit from Jeremy: let’s use this thread for discussion, and if it’s popular, we’ll create a separate category
I second that suggestion.
I tried executing the first few Jupyter notebooks and there are a few bugs. In the past, with the machine learning and deep learning courses, there was a specific forum for reporting those bugs and discussing their resolution.
I’ll assume that you have a working installation of fastai v1.0 under the latest Anaconda (version 2019.03, build channel py37_0), and that you have activated the environment you created for fastai.
Follow these steps to prepare your environment to run the first course notebook:
(1) Install the course materials from GitHub: git clone https://github.com/fastai/course-nlp.git
(2) Install nltk, the Natural Language Toolkit, a Natural Language Processing library widely used for teaching and research: conda install -c anaconda nltk
(3) Install spaCy, a library for “Industrial-Strength Natural Language Processing”: conda install -c conda-forge spacy
(4) Download an English language model for spaCy: python -m spacy download en_core_web_sm
(5) Install fbpca, a library for “fast computations of PCA/SVD/eigendecompositions via randomized methods”: pip install fbpca
After this, you should be able to run the first code notebook of the course, 2-svd-nmf-topic-modeling.ipynb.
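As a quick sanity check that the installs above worked, the following should run without errors in the activated environment (a minimal sketch; the sample sentence is just an illustration):

```python
# All three libraries from the steps above should be importable.
import nltk   # noqa: F401 (imported only to confirm the install)
import fbpca  # noqa: F401
import spacy

# Load the small English model downloaded in step (4).
nlp = spacy.load('en_core_web_sm')
doc = nlp('The fastai course teaches NLP with a code-first approach.')
print([(token.text, token.pos_) for token in doc])
```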
I will continue to update this post in case the infrastructure needs to be extended in order to run subsequent notebooks in the course.
I’ve had good experiences with Render. Here’s a workshop I did for my class/study group this week on deployment; NLP is in there too. Deployment on Render UWF
Hi Zachary, thank you very much for sharing; this is very valuable to me. I am trying to follow your example for “Cats vs Dogs”: when I run learn.export('pets.pkl') and get a pkl file, where can I find that “pets.pkl” file? I am running this notebook on Google Colab. Thanks again!
Hey Anthony, the notebook is just there as a general guide to show where to modify the serve.py file (or server.py). I had my students run lesson 1 in the background. Use it as a reference rather than something to plug and run in the notebook; in particular, you should see where I change the learner’s path. I believe you may have skipped that.
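In case it helps: in fastai v1, export() writes the file under learn.path, so printing that path on Colab shows where to look. A minimal runnable sketch (using the small MNIST sample so it builds quickly; 'pets.pkl' is just the filename from your post):

```python
from fastai.vision import *

# Small dataset so the sketch builds quickly, even on CPU.
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
learn = cnn_learner(data, models.resnet18)

learn.export('pets.pkl')            # written to learn.path/'pets.pkl'
print(learn.path / 'pets.pkl')      # on Colab, this is where the file lives

# For serving, it can be loaded back with:
# learn = load_learner(learn.path, 'pets.pkl')
```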
Though in the future I’d direct questions like this to a separate thread, so as not to lead away from the thread’s original direction.
In the notebook, the vocab is truncated to max_vocab and then padded with 'xxfake' tokens until its length is a multiple of 8, for fast mixed-precision training:

```python
itos = itos[:max_vocab]
if len(itos) < max_vocab:  # make sure vocab size is a multiple of 8 for fast mixed precision training
    while len(itos) % 8 != 0: itos.append('xxfake')
```
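To see the effect, here is a self-contained toy version (the token list is made up):

```python
# Toy vocab of 5 tokens; max_vocab is far larger, so we only pad.
itos = ['xxunk', 'xxpad', 'the', 'movie', 'rocks']
max_vocab = 60000

itos = itos[:max_vocab]
if len(itos) < max_vocab:
    while len(itos) % 8 != 0:
        itos.append('xxfake')

print(len(itos))  # 8 -> padded up to the next multiple of 8
print(itos[-3:])  # ['xxfake', 'xxfake', 'xxfake']
```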
Hi, in the notebook 3-logreg-nb-imdb.ipynb from video 4, Rachel introduces the coefficient b to get the predictions on the validation set in the naive Bayes sentiment classifier (see screenshot): why?
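If I read that cell correctly (treat the details below as my recollection of the notebook, not a quote), b is the log of the ratio of the class priors. The naive Bayes log-odds of a document decompose into the per-token log-count ratios (x @ r) plus that prior term, so dropping b would implicitly assume the two classes are equally frequent in the training set:

```python
import numpy as np

# Toy labels: 3 positive and 2 negative training reviews.
y = np.array([1, 1, 1, 0, 0])

# b = log of the ratio of the class priors, as in the notebook.
b = np.log((y == 1).mean() / (y == 0).mean())
print(b)  # log(0.6 / 0.4) ≈ 0.405

# The naive Bayes score of a document vector x is x @ r + b;
# the review is predicted positive when the score is > 0.
```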
Hello, I’m wondering what platform Jeremy used to train the notebook vietnamese-nn.ipynb with an epoch time under 30 minutes?
Since training an LM from scratch (for languages other than English) requires a lot of resources (because of the corpus size), it would be nice to explain what GPU configuration is needed on AWS, GCP, etc. (for example, in the fastai GPU tutorials: https://course.fast.ai/gpu_tutorial.html).
If the answer has already been given on the fastai forum, please share a link to the post (cc @jeremy).
I was wondering the same (while trying to apply those great NLP notebooks to a bidirectional German LM). I found that Vietnamese Wikipedia has about a third fewer articles than French or German Wikipedia - but in the end I still downsized my ambition to just 20% of those 2.3 million German Wikipedia articles on a GCP P4…
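In case it helps others hitting the same wall, the downsizing itself is simple - a hedged sketch, assuming the corpus has already been split into one text file per article (the folder names are my own):

```python
import random, shutil
from pathlib import Path

random.seed(42)
src = Path('dewiki/docs')            # assumed layout: one file per article
dst = Path('dewiki/docs_sample')
dst.mkdir(exist_ok=True)

files = sorted(src.iterdir())
keep = random.sample(files, int(0.2 * len(files)))  # keep ~20% of articles
for f in keep:
    shutil.copy(f, dst / f.name)

print(f'kept {len(keep)} of {len(files)} articles')
```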
Hello @jolackner. I think this is an important point if Rachel’s NLP course is to be truly open to everyone (I mean: so that everyone can run all the notebooks from the course-nlp GitHub repo).
For my part, I have spent a lot of time on the GCP platform trying to get a fast but inexpensive instance to train a language model from scratch (in French and Portuguese, i.e. with large corpora). But every time I think I have found the right configuration, a problem occurs during training (the latest one: the SSH connection dropped by GCP).
From experienced fastai users, I would love to get a tutorial on the instance configuration needed to train an LM from scratch on GCP or AWS, for example, with a corpus similar to the English one (that is: a huge corpus!).
I totally agree, @pierreguillou. It would be great to get expert info on which instance/memory type is appropriate for successfully training a full Wikipedia LM - I ran into quite a few memory errors on GCP, and yes, my preemptible instance was terminated a couple of times before I managed to finish training.
On a side note, I think that the Language Model Zoo should be more populated. Let all those beautiful (and smaller) languages roar and be made amenable to NLP tasks thanks to fastai!
Thanks Jeremy. I imagine that using an NVIDIA V100 on GCP would give a similar result, but the problem is the stability of the SSH connection to the cloud instance (using a university network helps with that). Do you have any tips for training an LM from scratch on GCP (with a Wikipedia corpus close in size to the English one, like the French one)?