Will that be used later to determine the labels of the test file?
I was thinking this was like the last course's NLP lesson (part I), since it was imdb.ipynb?
Number of records.
In comparison to the part 1 IMDB notebook. That one took a while to train!
If it is used that way, why not just assign the label instead of setting them all to zero?
My bad, just did another git pull, forgot to update -_-.
Why can’t we simply add a new dimension in the embedding for each word denoting whether the word was uppercase or not, and fine-tune that as well?
Applause on the t_up trick.
So, are we essentially learning a (semantic) markup along with the plain text?
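For reference, a minimal sketch of the t_up idea (the exact marker literal and rules in the real preprocessing code may differ): uppercase words are lowercased and preceded by a marker token, so case information stays in the token stream as learnable "markup" rather than a separate feature:

```python
TOK_UP = "t_up"  # marker token; illustrative, not necessarily the library's literal

def add_case_markers(tokens):
    """Lowercase tokens, inserting a marker before fully-uppercase words."""
    out = []
    for tok in tokens:
        if tok.isupper() and len(tok) > 1:
            out.append(TOK_UP)
        out.append(tok.lower())
    return out

print(add_case_markers(["I", "am", "SHOUTING", "now"]))
# ['i', 'am', 't_up', 'shouting', 'now']
```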
What happens if we load a new dataset and it contains words that were removed because they had 2 or fewer occurrences?
Are numbers in the text also converted to another value (numericalized)? If not, how does the model know whether a value represents a word's token ID or an actual number originally written in the text?
Is there a way to introduce context from outside of the corpus?
Always use the same tokenizer + vocab for future text inputs.
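A minimal sketch of what that looks like in practice (the function names here are illustrative, not the library's API): build the vocab once on the training corpus, persist it, and map all future text through the same mapping, with unseen tokens falling back to a fixed unknown-token ID:

```python
import pickle
from collections import Counter, defaultdict

def build_vocab(texts, min_freq=2):
    """Index tokens seen at least min_freq times; id 0 is reserved for <unk>."""
    counts = Counter(tok for t in texts for tok in t.split())
    itos = ["<unk>"] + [tok for tok, c in counts.items() if c >= min_freq]
    stoi = defaultdict(int, {tok: i for i, tok in enumerate(itos)})
    return itos, stoi

def numericalize(text, stoi):
    """Map each token to its id; unknown tokens become 0 (<unk>)."""
    return [stoi[tok] for tok in text.split()]

train = ["the movie was good", "the movie was bad", "good good bad"]
itos, stoi = build_vocab(train)

# Persist the vocab so future inputs get the exact same ids.
with open("vocab.pkl", "wb") as f:
    pickle.dump(itos, f)

print(numericalize("the movie was terrible", stoi))  # 'terrible' maps to 0 (<unk>)
```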
What if the numpy array does not fit in memory? Is it possible to write a pytorch dataloader directly from a large csv file?
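One common answer is to stream the file instead of materializing an array. A stdlib-only sketch of the idea (in real code you would wrap a generator like this in a `torch.utils.data.IterableDataset` so a `DataLoader` can consume it; the filenames and batch size are just for the demo):

```python
import csv

def stream_rows(path, batch_size=2):
    """Yield batches of rows from a CSV without loading the whole file."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # final partial batch
            yield batch

# Demo with a small file standing in for the "too big for memory" case.
with open("demo.csv", "w", newline="") as f:
    f.write("text,label\nfoo,0\nbar,1\nbaz,0\n")

for batch in stream_rows("demo.csv"):
    print([r["label"] for r in batch])
# ['0', '1']
# ['0']
```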
Totally valid too. I’d be interested in pros/cons.
Is it weird that I got an error when doing a recent git pull on this repo? Seemed odd. My guess is it's because I'm doing this locally on CPU, as opposed to at home with my GPU. Things were fine last week during class.
NoPackagesFoundError: Package missing in current osx-64 channels: - cuda90
For those who haven’t executed the previous IMDB notebook for some reason: you need to download the SpaCy English language model (GitHub issue link):
python -m spacy download en
You might have to run conda env update -f environment-cpu.yml on your MacBook.