A Code-First Introduction to Natural Language Processing 2019

pete88b · July 16, 2020, 3:53pm

just got a little distracted with the sparsity pattern of the doc-term matrix in 3-logreg-nb-imdb:

I think we see the density ripples because “Elements with equal counts are ordered in the order first encountered” - so tokens are ordered by usage frequency then by when they were 1st encountered.

If we zoom right in ↓ we can see new tokens added by the 1st few docs get consecutively assigned indexes - (in the plot above ↑ this is what makes the edge of the ripple dense):

If we remove doc order from the plot with np.random.shuffle(A), we loose the ripples:

aaaa · July 31, 2020, 6:16pm

I’m trying to run the nn-vietnamese notebook, but the get_wiki function won’t work. It returns the following error:
FileNotFoundError: [Errno 2] No such file or directory: ‘/root/.fastai/data/viwiki/text/AA/wiki_00’

This is after “extracting…” gets printed, so I’d assume this is the step:
shutil.move(str(path/‘text/AA/wiki_00’), str(path/name))

I’m using Google Colab, GPU-enabled.

Any idea what the error might be? I’d suspect it’s Colab running out of space, but the download and upzipping seem to work fine.

FelixRe · December 7, 2020, 9:35am

There is an issue with the (latest) wikiextractor version.
I manually downloaded wikiextractor from this commit: attardi/wikiextractor at e4abb4cbd019b0257824ee47c23dd163919b731b (github.com)
Just replace the files created by the nlputils script.

mariannak · December 16, 2020, 1:53pm

Hi! If I want to run a notebook (5-nn-imdb) on a cloud, how should I deal with this “Unzip it into the .fastai/data/ folder on your computer.”? Where I can put wikitext?

Qazqa · March 24, 2021, 9:22am

The wikiextractor is not working.

Qazqa · March 24, 2021, 11:05am

This does not work.

tanwarkml · August 13, 2021, 4:16am

Thanks for the list. I just started running the notebooks in course materials and it seems some modules are deprecated (or their names have changed). For example, in 2-svd-nmf-topic-modeling.ipynb

from sklearn.feature_extraction import stop_words

is now

from sklearn.feature_extraction import _stop_words

Is it possible to know the version of all the libraries mentioned above?
P.S. I’m using scikit-learn 0.24.2

sinhak · January 14, 2022, 1:29pm

Take a look at the kaggle dataset giga-fren | Kaggle

This dataset has a file called giga-fren.csv which I believe is the same as questions_easy.csv

Hope this helps!

willisjs · November 23, 2022, 3:46pm

Is this course deprecated, obsolete and/or unsupported?

bencoman · June 3, 2023, 6:28am

Just sharing a paper explaination that was only a bit above my level of understanding, so perhaps broadly interesting to others (and summarising it helps me remember better)

RWKV: Reinventing RNNs for the Transformer Era (Paper Explained)

TLDR; Introduction - 5 min

Transformers suffer from quadratic compute/memory
RNN have linear compute/memory, but struggle to match Transformer performance due to paralizability/scaling issues
RWKV strikes a balance combining the efficient parallizability of Transformers with the efficient inference of RNN.
intro ends with this table…

image747×360 54.2 KB

[TLDR; Conculsion, 1 minute cliips]

pkn · June 29, 2023, 5:59pm

I’m in the 3-logreg-nb-imdb.ipynb portion of the lecture. Most of the code is giving error. I know the tutorial asked to install the fastaiv1, but most of the method used in fastai_v1 are changed.

I’m trying to solve all the errors and run the notebook, although it is taking too much time in solving the errors rather than understanding the concept. I know this would be a practice for me to learn to debug. Still, I’m not in favour of changing the whole notebook implementing the new methods from the new fastai version. This is just Lesson 3 As such I’m a newbie to NLP and it would take a lot of time to do that.

Now one solution is to install the required packages in local and start running the notebooks. But I don’t have GPU in my laptop and wanted to run all the notebooks on Google coLab. Now Another problem in google colab is that the python installed ther eis 3.10.something. While most of the fastai’s components are valid on python versions<3.10. Basically what I mean to say is some of the fastai’s modules are not compatible with python 3.10

So what I wanted to ask is :

Is there a way to run the notebook with minimal errors( errors especially related to core packages like fastai and it’s sub-modules like DataLoaders, TextDataLoaders etc) so that I can focus on learning the concepts of NLP rather than understanding what the new methods do as compared to older methods. FYI the notebook uses methods from old versions ?
If not, what would your suggestion about ?
are we going to have a new version of the A Code-First Introduction to Natural Language Processing 2019 in the near future ?

Can anybody please help me?