Lesson 4 In-Class Discussion ✅

jeremy · November 17, 2018, 12:18am

I suspect regression should work fine now, although I haven’t tried it for NLP.

jeremy · November 17, 2018, 12:19am

I think it almost never makes sense to start from scratch. Old English and Modern English have some similarities, so pre-training should help.

jeremy · November 17, 2018, 12:25am

Well spotted. The reason is that they’re different. You can label with multiple columns, so it’s called cols, but you can only split on one column, so it’s called col.

jeremy · November 17, 2018, 12:29am

I don’t think changing bs after creation will make any difference, since the dataloaders are already created by then. Try passing that to load as a param.

And try looking at a batch of your data (e.g. with data.train_dl.one_batch()) and check the shape, to make sure you have the size you expected.
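
A toy sketch of why changing bs afterwards may have no effect, assuming (as suggested above) that the dataloaders capture the batch size when they are built. ToyLoader and ToyBunch are hypothetical stand-ins written for illustration, not fastai classes.

```python
from typing import Iterator, List, Sequence


class ToyLoader:
    """Hypothetical dataloader: the batch size is fixed at construction."""

    def __init__(self, items: Sequence[int], bs: int) -> None:
        self.items = list(items)
        self.bs = bs

    def __iter__(self) -> Iterator[List[int]]:
        # Yield consecutive slices of length bs (the last one may be shorter).
        for start in range(0, len(self.items), self.bs):
            yield self.items[start:start + self.bs]

    def one_batch(self) -> List[int]:
        return next(iter(self))


class ToyBunch:
    """Hypothetical container that builds its loader once, in __init__."""

    def __init__(self, items: Sequence[int], bs: int) -> None:
        self.bs = bs
        self.train_dl = ToyLoader(items, bs)


data = ToyBunch(range(100), bs=64)
data.bs = 16  # only changes the attribute on the container

first = data.train_dl.one_batch()
print(len(first))  # the loader still uses the size it was built with
```

Checking the length (or shape) of one batch, as suggested above, is the quickest way to see which size is actually in use.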

jcatanza · November 17, 2018, 1:18am

Has anyone encountered this issue, or does anyone know how to resolve it?

In the lesson3-imdb notebook, the exception

NameError: name 'TextFilesList' is not defined

is returned by the command

data_lm = (TextFilesList.from_folder(path)
           .filter_by_folder(include=['train', 'test'])
           .random_split_by_pct(0.1)
           .label_for_lm()
           .databunch())

I can find no documentation for a TextFilesList object in fastai.

Note: I have refreshed the repo as recommended before running the notebook:
git pull in the /notebooks/course-v3 folder, and
pip install fastai --upgrade

chans.best · November 17, 2018, 3:22am

I was thinking about the language model and how it is able to predict the next word. The idea that struck me: would it be possible to get a score for a sentence out of the model, for use in sentence comparison?

ideally
sentence[w1…wn] ->language model-> wn+1
and
sentence[w1…wn] ->language model-> classifier+sigmoid ->0,1

could it be something like
sentence[w1…wn] ->language model-> +??? -> sentence representation[1212,1521515,0212,451]

I know this is an advanced topic, and I found the link below in the advanced forum, but I would like advanced users to share ideas about it in
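
One possible way to fill in the "???" step, as a minimal sketch: average the per-word hidden states into a single vector per sentence, then compare two sentences with cosine similarity. This assumes you can already get one hidden-state vector per word out of the language model; the vectors below are made up for illustration, and mean_pool and cosine_similarity are hypothetical helper names, not part of any library.

```python
import math
from typing import List, Sequence


def mean_pool(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Average per-word vectors into one fixed-size sentence vector."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(h[d] for h in hidden_states) / n for d in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up hidden states: one short vector per word (assumption, not model output).
sentence_a = [[1.0, 0.0], [0.0, 1.0]]
sentence_b = [[2.0, 2.0]]

vec_a = mean_pool(sentence_a)
vec_b = mean_pool(sentence_b)
print(vec_a, vec_b, cosine_similarity(vec_a, vec_b))
```

Mean pooling is only one choice; taking the last hidden state or a max over positions are alternatives with the same interface.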

MaheshKhatri · November 17, 2018, 4:11am

The same steps will need to be done in neural nets too.

MaheshKhatri · November 17, 2018, 4:27am

Yes. Of course.

MaheshKhatri · November 17, 2018, 4:47am

https://pytorch.org/docs/0.3.1/torch.html

lesscomfortable · November 17, 2018, 5:54am

I don’t know what this might be. I assume you ran .fit already?

ymittal23 · November 17, 2018, 6:54am

So it has a hidden meaning. It will be helpful while writing code. Thanks for clearing that up.

edwardjross · November 17, 2018, 9:11am

I’ve run it successfully on 16GB cards (P5000 in Paperspace and a P100 in GCP) on the cloud as is.

Have you tried decreasing bptt on the learner? This helped me in an earlier version of the course. Good luck.
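
A rough back-of-the-envelope sketch of why lowering bptt can help with memory: the number of tokens processed per batch is bs × bptt, and the activations kept for backpropagation grow roughly in proportion to it. The numbers below are illustrative assumptions, not measurements from the notebook.

```python
def tokens_per_batch(bs: int, bptt: int) -> int:
    """Number of tokens in one language-model batch of shape (bs, bptt)."""
    return bs * bptt


# Illustrative values only (assumed, not taken from the notebook).
baseline = tokens_per_batch(bs=48, bptt=70)
reduced = tokens_per_batch(bs=48, bptt=35)

print(baseline, reduced, reduced / baseline)
```

Halving bptt halves the tokens per batch, which is the same effect on this count as halving bs.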

sgugger · November 17, 2018, 2:47pm

Pull the latest version of the course notebooks. TextFilesList has now disappeared and we always use TextList.
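
To see which of the two names an installed library actually exposes, something like the following sketch may help. has_name is a hypothetical helper written for illustration; it only uses the standard library and returns False if the module cannot be imported.

```python
import importlib


def has_name(module_name: str, attr: str) -> bool:
    """Return True if module_name imports and exposes an attribute called attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


# Demonstrated on a standard-library module so it runs anywhere.
print(has_name("math", "sqrt"))
print(has_name("math", "TextFilesList"))
```

In a notebook, has_name("fastai.text", "TextList") and has_name("fastai.text", "TextFilesList") would show which class the installed version provides, assuming the class is exported from fastai.text.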

kofi · November 17, 2018, 3:11pm

Yes, I did.

jcatanza · November 17, 2018, 6:18pm

Thanks @sgugger

I did refresh the repo before running the notebook, running
git pull in the /notebooks/course-v3 folder, and
pip install fastai --upgrade

Is this what you mean? If not, what do you mean?

balnazzar · November 17, 2018, 10:48pm

Try using them as a single 22 GB card, with DataParallel.

FourMoBro · November 18, 2018, 1:03am

DataParallel is one of the first things I add to the notebook. I have had great success with it for images/camvid, but I am afraid it does not work for NLP. I noted this before, perhaps in a different thread.

Here is the error it throws on 1.0.27, which is the same as in previous versions:

~/anaconda3/envs/course1018/lib/python3.6/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
    516                 return modules[name]
    517         raise AttributeError("'{}' object has no attribute '{}'".format(
--> 518             type(self).__name__, name))
    519 
    520     def __setattr__(self, name, value):

AttributeError: 'DataParallel' object has no attribute 'reset'
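
The traceback suggests that something calls reset() on the model, and the DataParallel wrapper does not forward that call to the model it wraps. With the real nn.DataParallel, the wrapped model is available as .module. The toy sketch below shows the mechanism and one possible workaround (delegating unknown attributes to the wrapped object) in plain Python. ToyModel, PlainWrapper and PassthroughWrapper are hypothetical stand-ins, and whether the same trick is enough to train an RNN across several GPUs is untested here.

```python
class ToyModel:
    """Stand-in for an RNN model that exposes a reset() method."""

    def reset(self) -> str:
        return "hidden state reset"


class PlainWrapper:
    """Stand-in for a wrapper that keeps the model in .module and forwards nothing."""

    def __init__(self, module: ToyModel) -> None:
        self.module = module


class PassthroughWrapper(PlainWrapper):
    """Same wrapper, but unknown attributes fall through to the wrapped model."""

    def __getattr__(self, name: str):
        # __getattr__ runs only after normal attribute lookup has failed.
        if name == "module":
            # Guard against infinite recursion if .module is not set yet.
            raise AttributeError(name)
        return getattr(self.module, name)


plain = PlainWrapper(ToyModel())
passthrough = PassthroughWrapper(ToyModel())

print(hasattr(plain, "reset"))        # lookup stops at the wrapper
print(hasattr(passthrough, "reset"))  # lookup falls through to the model
print(plain.module.reset())           # explicit access via .module
print(passthrough.reset())            # delegated access
```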

nikhil.ikhar · November 18, 2018, 4:10am

Yes. When people say AI is biased, it also means the data on which the AI was trained is biased.

balnazzar · November 18, 2018, 10:29am

I think fastai has nothing to do with this: it is pytorch stuff.

However, it could be worth asking the developers (Jeremy, Sgugger, etc.) about such issues. AWD-LSTM is truly a beast of an RNN, and it would be a shame not to be able to use parallelization: that could completely hinder its use on non-enterprise hardware.

bluesky314 · November 18, 2018, 10:33am

When I run

data_clas = (TextList.from_folder(path, vocab=data_lm.vocab) # vocab is passed in from our pretrained model so that the numericalization of each word is exactly the same
         #grab all the text files in path
         .split_by_folder(valid='test')
         #split by train and valid folder (that only keeps 'train' and 'test' so no need to filter)
         .label_from_folder(classes=['neg', 'pos'])
         #remove docs with labels not in above list (i.e. 'unsup')
         .filter_missing_y()
         #label them all with their folders
         .databunch(bs=bs))

data_clas.save('tmp_clas')

I get

TypeError                                 Traceback (most recent call last)
<ipython-input-25-ef1d6c6e4867> in <module>
  3              .split_by_folder(valid='test')
  4              #split by train and valid folder (that only keeps 'train' and 'test' so no need to filter)
----> 5              .label_from_folder(classes=['neg', 'pos'])
  6              #remove docs with labels not in above list (i.e. 'unsup')
  7              .filter_missing_y()

TypeError: 'bool' object is not callable

It loads for a bit then throws this. Any fix? I am using the latest version.