Lesson 4 In-Class Discussion ✅

I suspect regression should work fine now, although I haven’t tried it for NLP.

I think it almost never makes sense to start from scratch. Old English and Modern English have some similarities, so pre-training should help.

Well spotted. The reason is that they’re different. You can label with multiple columns, so it’s called cols, but you can only split on one column, so it’s called col. :slight_smile:
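To make the distinction concrete, here is a pandas-only sketch (no fastai needed) with a made-up DataFrame: several label columns can together form a multi-label target, while the train/valid split is decided by one boolean column. As far as I recall, fastai v1's `label_from_df` takes `cols` and `split_from_df` takes `col` for this reason, but the column names below are hypothetical.

```python
import pandas as pd

# Hypothetical multi-label dataset: each label lives in its own 0/1 column,
# and a single boolean column says whether a row belongs to the validation set.
df = pd.DataFrame({
    "text":     ["good film", "bad plot", "great cast", "dull"],
    "positive": [1, 0, 1, 0],
    "funny":    [0, 1, 1, 0],
    "is_valid": [False, False, True, True],
})

# Labels can come from several columns at once (hence "cols")...
label_cols = ["positive", "funny"]
labels = df[label_cols].values  # shape (n_rows, n_label_columns)

# ...but each row is either in the training set or the validation set,
# so a single column is enough to describe the split (hence "col").
split_col = "is_valid"
train_df = df[~df[split_col]]
valid_df = df[df[split_col]]

print(labels.shape, len(train_df), len(valid_df))  # (4, 2) 2 2
```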

I don’t think changing bs after creation will make any difference, since the dataloaders are already created by then. Try passing bs to load as a parameter instead.

And try looking at a batch of your data (e.g. with data.train_dl.one_batch()) and check the shape, to make sure you have the size you expected.
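For reference, here is the same idea in plain PyTorch rather than fastai, using made-up random tensors: the batch size is fixed when the DataLoader is built, so to change it you build a new one, and grabbing a single batch is the quickest way to confirm the shape you actually get.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Made-up data: 100 samples of 3x32x32 "images" with binary labels.
x = torch.randn(100, 3, 32, 32)
y = torch.randint(0, 2, (100,))
ds = TensorDataset(x, y)

# The batch size is baked in when the DataLoader is created.
dl = DataLoader(ds, batch_size=16, shuffle=True)

# To use a different batch size, create a new DataLoader
# instead of trying to modify the existing one.
dl_small = DataLoader(ds, batch_size=8, shuffle=True)

# Grab one batch and check its shape.
xb, yb = next(iter(dl_small))
print(xb.shape, yb.shape)  # torch.Size([8, 3, 32, 32]) torch.Size([8])
```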

Has anyone encountered this issue, or does anyone know how to resolve it?

In the lesson3-imdb notebook, the exception

NameError: name 'TextFilesList' is not defined

is returned by the command

data_lm = (TextFilesList.from_folder(path)
           .filter_by_folder(include=['train', 'test'])
           .random_split_by_pct(0.1)
           .label_for_lm()
           .databunch())

I can find no documentation for a TextFilesList object in fastai.

Note: I have refreshed the repo as recommended before running the notebook:
git pull in the /notebooks/course-v3 folder, and
pip install fastai --upgrade

I was thinking about the language model and how it is able to predict the next word. An idea that struck me: would it be possible to get a score for a sentence out of the model, to use for sentence comparison?

Ideally:
sentence[w1…wn] ->language model-> wn+1
and
sentence[w1…wn] ->language model-> classifier+sigmoid ->0,1

Could it be something like this?
sentence[w1…wn] ->language model-> +??? -> sentence representation[1212,1521515,0212,451]

I know this is an advanced topic, and I found the link below in the advanced forum, but I would like advanced users to share ideas about it in
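One common way to get the fixed-size vector in the last diagram is to pool the language model encoder's per-token hidden states, for example by concatenating the last state with max- and mean-pooling over time (similar to the concat pooling the ULMFiT classifier head uses), and then to compare two sentences with cosine similarity. Below is a numpy sketch under that assumption; the random arrays only stand in for real encoder outputs of shape (seq_len, hidden_size), and extracting those from a trained model is not shown.

```python
import numpy as np

def sentence_vector(hidden_states: np.ndarray) -> np.ndarray:
    """Concat-pool per-token hidden states (seq_len, hidden) into one vector.

    Returns [last state, max over time, mean over time], length 3 * hidden.
    """
    last = hidden_states[-1]
    max_pool = hidden_states.max(axis=0)
    mean_pool = hidden_states.mean(axis=0)
    return np.concatenate([last, max_pool, mean_pool])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for encoder outputs of two sentences of different lengths.
rng = np.random.default_rng(0)
hidden_size = 400
sent_a = rng.standard_normal((7, hidden_size))   # 7 tokens
sent_b = rng.standard_normal((12, hidden_size))  # 12 tokens

vec_a = sentence_vector(sent_a)
vec_b = sentence_vector(sent_b)
print(vec_a.shape)                      # (1200,)
print(cosine_similarity(vec_a, vec_a))  # ≈ 1.0
print(cosine_similarity(vec_a, vec_b))
```

Pooling over time is what makes sentences of different lengths comparable: both end up as vectors of the same size.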

The same steps will need to be done in neural nets too.

Yes. Of course.

https://pytorch.org/docs/0.3.1/torch.html

I don’t know what this might be. I assume you ran .fit already?

So it has a hidden meaning. That will be helpful while writing code. Thanks for clearing that up :slight_smile:

I’ve run it successfully on 16GB cards (P5000 in Paperspace and a P100 in GCP) on the cloud as is.

Have you tried decreasing bptt on the learner? This helped me in an earlier version of the course. Good luck.
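A rough way to see why this helps: with truncated backpropagation through time, each batch covers about bs × bptt token positions, and the activations kept for the backward pass grow roughly linearly with that number, so halving bptt roughly halves that part of the memory. The values below are only an example, not the course defaults.

```python
def tokens_per_batch(bs: int, bptt: int) -> int:
    """Number of token positions the RNN unrolls over in one batch."""
    return bs * bptt

# Example values only; plug in your own bs and bptt.
print(tokens_per_batch(64, 70))  # 4480
print(tokens_per_batch(64, 35))  # 2240
```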

Pull the latest version of the course notebooks. TextFilesList has now disappeared and we always use TextList.

Yes, I did.

Thanks @sgugger

I did refresh the repo before running the notebook by running
git pull in the /notebooks/course-v3 folder, and
pip install fastai --upgrade

Is this what you mean? If not, what do you mean?

Try using them as a single 22GB card with DataParallel.

DataParallel is one of the first things I add to the notebook. I have had great success with it for images/camvid, but I am afraid it does not work for NLP. I noted this before, perhaps in a different thread.

Here is the error it throws on 1.0.27, which was the same for previous versions:

~/anaconda3/envs/course1018/lib/python3.6/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
    516                 return modules[name]
    517         raise AttributeError("'{}' object has no attribute '{}'".format(
--> 518             type(self).__name__, name))
    519 
    520     def __setattr__(self, name, value):

AttributeError: 'DataParallel' object has no attribute 'reset'
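The error comes from the fact that nn.DataParallel wraps the model, so custom methods such as reset live on the wrapped model (.module), not on the wrapper. Here is a minimal plain-PyTorch sketch of one workaround, assuming you can swap in your own wrapper class: fall back to the inner module for attributes the wrapper does not have. DataParallelPassthrough and ToyRNN are hypothetical names, and this only removes the AttributeError; it does not address whether AWD-LSTM's stored hidden state behaves correctly when replicated across GPUs.

```python
import torch
import torch.nn as nn

class DataParallelPassthrough(nn.DataParallel):
    """Hypothetical DataParallel subclass that forwards unknown attributes."""

    def __getattr__(self, name):
        try:
            # Parameters, buffers and submodules (including .module itself).
            return super().__getattr__(name)
        except AttributeError:
            # Anything else, e.g. a custom reset() method, comes from the model.
            return getattr(self.module, name)

class ToyRNN(nn.Module):
    """Hypothetical stand-in for a model that has a reset() method."""

    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
        self.was_reset = False

    def reset(self):
        self.was_reset = True

    def forward(self, x):
        out, _ = self.rnn(x)
        return out

# DataParallel expects the model on the first GPU when one is available;
# without a GPU it simply calls the wrapped module.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = DataParallelPassthrough(ToyRNN().to(device))

model.reset()                   # forwarded to model.module, no AttributeError
print(model.module.was_reset)   # True
out = model(torch.randn(2, 5, 4, device=device))
print(out.shape)                # torch.Size([2, 5, 8])
```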

Yes. When people say AI is biased, it also means the data the AI was trained on is biased.

I think fastai has nothing to do with this: it is pytorch stuff.

However, it could be worth asking the developers (Jeremy, Sgugger, etc.) about such issues. AWD-LSTM is truly a beast of an RNN; it would be a shame not to be able to parallelize it, since that could seriously limit its use on non-enterprise hardware.

When I run

data_clas = (TextList.from_folder(path, vocab=data_lm.vocab) # vocab is passed in from our pretrained model so that the numericalization is exactly the same for the same words
         #grab all the text files in path
         .split_by_folder(valid='test')
         #split by train and valid folder (that only keeps 'train' and 'test' so no need to filter)
         .label_from_folder(classes=['neg', 'pos'])
         #remove docs with labels not in above list (i.e. 'unsup')
         .filter_missing_y()
         #label them all with their folders
         .databunch(bs=bs))

data_clas.save('tmp_clas')

I get

TypeError                                 Traceback (most recent call last)
<ipython-input-25-ef1d6c6e4867> in <module>
  3              .split_by_folder(valid='test')
  4              #split by train and valid folder (that only keeps 'train' and 'test' so no need to filter)
----> 5              .label_from_folder(classes=['neg', 'pos'])
  6              #remove docs with labels not in above list (i.e. 'unsup')
  7              .filter_missing_y()

TypeError: 'bool' object is not callable

It loads for a bit, then throws this. Any fix? I am using the latest version.
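The message itself means that something called with () turned out to be a plain True/False value. One plausible explanation, which I have not verified against the fastai source, is that in the installed version filter_missing_y is no longer a method but a boolean attribute, so .filter_missing_y() fails. Note that in a multi-line chained expression the traceback arrow may not point at the call that actually failed. The stdlib-only sketch below uses a hypothetical FakeLabelLists class to reproduce the message and shows a quick way to check whether an attribute is callable before chaining it.

```python
class FakeLabelLists:
    """Hypothetical stand-in: filter_missing_y is a flag, not a method."""

    def __init__(self):
        self.filter_missing_y = True

    def databunch(self, bs=64):
        return f"databunch(bs={bs})"

ll = FakeLabelLists()

# Calling the flag reproduces the error from the traceback above.
try:
    ll.filter_missing_y()
except TypeError as e:
    message = str(e)
print(message)  # 'bool' object is not callable

# Quick check before chaining: is the attribute actually a method?
print(callable(getattr(ll, "filter_missing_y", None)))  # False
print(callable(getattr(ll, "databunch", None)))         # True
```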
