A Code-First Introduction to Natural Language Processing 2019

I just got a little distracted by the sparsity pattern of the doc-term matrix in 3-logreg-nb-imdb:
image
I think we see the density ripples because “Elements with equal counts are ordered in the order first encountered”, so tokens are ordered by usage frequency and then by when they were first encountered.

If we zoom right in ↓ we can see that new tokens added by the first few docs get assigned consecutive indexes (in the plot above ↑ this is what makes the edge of the ripple dense):
image
If we remove doc order from the plot with np.random.shuffle(A), we lose the ripples:
image
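
The ordering effect can be reproduced on a toy corpus. This is only a minimal sketch, assuming the vocab is built with `collections.Counter.most_common()` (the quoted sentence matches its documentation); the three tiny documents and the names `itos`, `stoi` and `rows` are made up for illustration.

```python
from collections import Counter

# Three made-up tokenized documents, in corpus order.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran", "far"],
]

# Count every token across the corpus; Counter remembers first-seen order.
counts = Counter(tok for doc in docs for tok in doc)

# most_common() sorts by count, and ties keep first-encountered order.
itos = [tok for tok, _ in counts.most_common()]
stoi = {tok: i for i, tok in enumerate(itos)}

# Column indexes used by each document, i.e. the nonzero pattern per row.
rows = [sorted({stoi[tok] for tok in doc}) for doc in docs]

print(itos)
for i, cols in enumerate(rows):
    print(i, cols)
```

Among the tokens that occur once ("sat", "dog", "far"), the index grows with the document that introduced the token, so later rows reach further right. That seems to be the same mechanism as the dense ripple edge.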

I’m trying to run the nn-vietnamese notebook, but the get_wiki function won’t work. It returns the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/root/.fastai/data/viwiki/text/AA/wiki_00'

This is after “extracting…” gets printed, so I’d assume this is the step:
shutil.move(str(path/'text/AA/wiki_00'), str(path/name))

I’m using Google Colab, GPU-enabled.

Any idea what the error might be? I’d suspect it’s Colab running out of space, but the download and unzipping seem to work fine.
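
One way to narrow this down is to look at what the extraction step actually wrote before the move is attempted. A minimal sketch, assuming the data path from the error message; `list_extracted_files` is a hypothetical helper, not part of fastai or nlputils.

```python
from pathlib import Path


def list_extracted_files(text_dir):
    """Return every regular file under text_dir, sorted, or [] if it is missing."""
    text_dir = Path(text_dir)
    if not text_dir.exists():
        return []
    return sorted(p for p in text_dir.rglob("*") if p.is_file())


# Assumed location, taken from the error message above.
path = Path("/root/.fastai/data/viwiki")
found = list_extracted_files(path / "text")
if not found:
    print("No extracted files under", path / "text")
for f in found:
    print(f)
```

If nothing is listed, the extractor produced no output at all. If files show up under a different name or folder than `text/AA/wiki_00`, the extractor's output layout differs from what the move expects.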

There is an issue with the (latest) wikiextractor version.
I manually downloaded wikiextractor from this commit: attardi/wikiextractor at e4abb4cbd019b0257824ee47c23dd163919b731b (github.com)
Just replace the files created by the nlputils script.

Hi! If I want to run a notebook (5-nn-imdb) in the cloud, how should I deal with “Unzip it into the .fastai/data/ folder on your computer.”? Where can I put wikitext?
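
On a cloud machine the same folder normally lives under the home directory of whichever user runs the notebook. A minimal sketch, assuming the notebook looks in `~/.fastai/data`; `fastai_data_dir` is a hypothetical helper written for this post, not a fastai function.

```python
from pathlib import Path


def fastai_data_dir(home=None):
    """Return <home>/.fastai/data, creating it if needed (hypothetical helper)."""
    home = Path(home) if home is not None else Path.home()
    data_dir = home / ".fastai" / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir
```

Calling `fastai_data_dir()` with no argument resolves against the current user's home, so for the root user on Colab it would be `/root/.fastai/data`, the same prefix that appears in the error message earlier in this thread. The dataset would then be unzipped inside that folder.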

The wikiextractor is not working.

This does not work.

Thanks for the list. I just started running the notebooks in course materials and it seems some modules are deprecated (or their names have changed). For example, in 2-svd-nmf-topic-modeling.ipynb

from sklearn.feature_extraction import stop_words

is now

from sklearn.feature_extraction import _stop_words

Is it possible to know the version of all the libraries mentioned above?
P.S. I’m using scikit-learn 0.24.2
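
For comparing environments, the versions installed locally can be printed with the standard library. A minimal sketch: the package list below is an assumption about what the notebooks need, not an official requirements list, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata

# Distribution names as published on PyPI; adjust the list to the notebook in use.
PACKAGES = ["fastai", "scikit-learn", "numpy", "spacy", "torch"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}=={version}" if version else f"{name}: not installed")
```

Posting this output alongside an error makes it easier for others to tell whether a renamed module is the cause.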

Take a look at the kaggle dataset giga-fren | Kaggle

This dataset has a file called giga-fren.csv which I believe is the same as questions_easy.csv

Hope this helps!

Is this course deprecated, obsolete and/or unsupported?

Just sharing a paper explanation that was only a bit above my level of understanding, so perhaps broadly interesting to others (and summarising it helps me remember better).

RWKV: Reinventing RNNs for the Transformer Era (Paper Explained)

TLDR; Introduction - 5 min

  • Transformers suffer from compute/memory that is quadratic in sequence length
  • RNNs have linear compute/memory, but struggle to match Transformer performance due to parallelizability/scaling issues
  • RWKV strikes a balance, combining the efficient parallelizability of Transformers with the efficient inference of RNNs.
  • intro ends with this table…
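
The quadratic versus linear contrast in the first two bullets can be made concrete with a toy count. This is not RWKV or a real model, only a sketch that counts how many query-key scores full self-attention computes against how many state updates a recurrent pass makes; both function names are made up for illustration.

```python
def attention_score_count(seq_len):
    """Full self-attention scores every (query, key) pair of positions."""
    return sum(1 for _q in range(seq_len) for _k in range(seq_len))


def recurrent_update_count(seq_len):
    """A recurrent pass updates one fixed-size state per position."""
    return sum(1 for _t in range(seq_len))


for n in (128, 256, 512):
    print(n, attention_score_count(n), recurrent_update_count(n))
```

Doubling the sequence length quadruples the first count and doubles the second.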

[TLDR; Conclusion, 1 minute clips]

I’m in the 3-logreg-nb-imdb.ipynb portion of the lecture. Most of the code raises errors. I know the tutorial asks you to install fastai v1, but most of the fastai v1 methods it uses have since changed.

I’m trying to fix all the errors and run the notebook, but I’m spending more time fixing errors than understanding the concepts. I know this is good debugging practice. Still, I’d rather not rewrite the whole notebook against the new fastai version. This is only Lesson 3, I’m a newbie to NLP, and that would take a lot of time.

One solution is to install the required packages locally and run the notebooks there. But my laptop has no GPU, so I wanted to run all the notebooks on Google Colab. The other problem is that the Python installed on Colab is 3.10.something, while most of fastai’s components work on Python versions < 3.10. In other words, some of fastai’s modules are not compatible with Python 3.10.
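
Before installing anything, it can help to confirm which Python the runtime is actually using. A minimal sketch; the 3.10 cut-off is the one reported in this post, not a documented fastai requirement, and `is_older_than` is a hypothetical helper.

```python
import sys


def is_older_than(version, limit=(3, 10)):
    """True if a (major, minor, ...) version tuple sorts before limit."""
    return tuple(version[:2]) < tuple(limit)


print("Running Python", ".".join(map(str, sys.version_info[:3])))
if is_older_than(sys.version_info):
    print("Older than 3.10")
else:
    print("3.10 or newer; this post reports trouble with fastai v1 here")
```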

So what I wanted to ask is:

  • Is there a way to run the notebook with minimal errors (especially errors from core packages like fastai and its sub-modules such as DataLoaders, TextDataLoaders, etc.), so that I can focus on learning NLP concepts rather than working out how the new methods differ from the old ones? FYI, the notebook uses methods from old versions.
  • If not, what would you suggest?
  • Are we going to have a new version of A Code-First Introduction to Natural Language Processing 2019 in the near future?

Can anybody please help me?