I suspect regression should work fine now, although I haven’t tried it for NLP.
I think it almost never makes sense to start from scratch. Old English and Modern English have some similarities, so pre-training should help.
Well spotted. The reason is that they’re different. You can label with multiple columns, so it’s called cols, but you can only split on one column, so it’s called col.
I don’t think changing bs after creation will make any difference, since the dataloaders are already created by then. Try passing that to load as a param.
And try looking at a batch of your data (e.g. with data.train_dl.one_batch()) and check the shape, to make sure you have the size you expected.
Has anyone encountered this issue, or does anyone know how to resolve it?
In the lesson3-imdb notebook, the exception
NameError: name 'TextFilesList' is not defined
is returned by the command
data_lm = (TextFilesList.from_folder(path)
           .filter_by_folder(include=['train', 'test'])
           .random_split_by_pct(0.1)
           .label_for_lm()
           .databunch())
I can find no documentation for a TextFilesList object in fastai.
Note: I have refreshed the repo as recommended before running the notebook:
git pull
in the /notebooks/course-v3 folder, and
pip install fastai --upgrade
I was thinking about the language model and how it is able to predict the next word. The idea that struck me was: would it be possible to get a score for a sentence out of the model, for use in sentence comparison?
Ideally:
sentence[w1…wn] -> language model -> wn+1
and
sentence[w1…wn] -> language model -> classifier + sigmoid -> 0,1
Could it be something like:
sentence[w1…wn] -> language model -> + ??? -> sentence representation[1212,1521515,0212,451]
I know this is an advanced topic, and I found the link below in the advanced forum, but I would like advanced users to share ideas about it in
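One way to fill in the "???" step is to pool the per-token hidden states of the language model into a single fixed-size vector and compare sentences with cosine similarity. The sketch below uses concat pooling (last state, max and mean over tokens), which is the idea behind the ULMFiT classifier head. The hidden states here are made-up numbers and the function names are hypothetical; in practice the vectors would come from the encoder of your trained model.

```python
import math


def concat_pool(hidden_states):
    """Collapse per-token vectors into one fixed-size sentence vector.

    hidden_states: list of equal-length lists of floats, one per token.
    Returns last-token vector + element-wise max + element-wise mean.
    """
    if not hidden_states:
        raise ValueError("need at least one token vector")
    dims = list(zip(*hidden_states))  # transpose: one tuple per hidden unit
    last = list(hidden_states[-1])
    max_pool = [max(d) for d in dims]
    mean_pool = [sum(d) / len(d) for d in dims]
    return last + max_pool + mean_pool


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up hidden states (2 units per token) standing in for encoder outputs.
sent_a = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5]]  # 3 tokens
sent_b = [[0.2, 0.8], [0.7, 0.6]]              # 2 tokens

vec_a = concat_pool(sent_a)
vec_b = concat_pool(sent_b)
score = cosine_similarity(vec_a, vec_b)
print(len(vec_a), len(vec_b), score)
```

Both sentences end up as vectors of the same length regardless of how many tokens they have, which is what makes the comparison possible. Whether the resulting score is meaningful depends on the model and the data, so treat this as a starting point for experiments.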
The same steps will need to be done in neural nets too.
Yes. Of course.
I don’t know what this might be, I assume you ran .fit already?
So it has a hidden meaning. That will be helpful while writing code, thanks for clearing that up.
I’ve run it successfully on 16GB cards (P5000 in Paperspace and a P100 in GCP) on the cloud as is.
Have you tried decreasing bptt on the learner? This helped me in an earlier version of the course. Good luck.
Pull the latest version of the course notebooks. TextFilesList has now disappeared and we always use TextList.
yes I did
Thanks @sgugger
I did refresh the repo before running the notebook, running
git pull
in the /notebooks/course-v3 folder, and
pip install fastai --upgrade
Is this what you mean? If not, what do you mean?
Try using them as a single 22 GB card, with DataParallel.
DataParallel is one of the first things I add to the notebook. I have had great success with it for images/camvid, but I am afraid it does not work for NLP. I noted this before, perhaps in a different thread.
Here is the error it throws on 1.0.27, which was the same for previous versions:
~/anaconda3/envs/course1018/lib/python3.6/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
516 return modules[name]
517 raise AttributeError("'{}' object has no attribute '{}'".format(
--> 518 type(self).__name__, name))
519
520 def __setattr__(self, name, value):
AttributeError: 'DataParallel' object has no attribute 'reset'
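The traceback shows the general mechanism: the wrapper only exposes its own attributes, and the wrapped model (kept in .module in the case of DataParallel) is where reset actually lives. The sketch below reproduces that with plain Python classes so it runs without a GPU. Model, Wrapper and ForwardingWrapper are hypothetical stand-ins, not torch or fastai classes. Forwarding attribute lookups this way only addresses the AttributeError; it is an assumption that it would be enough to make multi-GPU training of the RNN behave correctly.

```python
class Model:
    """Hypothetical stand-in for a model that has a reset() method."""

    def __init__(self):
        self.was_reset = False

    def reset(self):
        self.was_reset = True


class Wrapper:
    """Stand-in for a wrapper that keeps the real model in .module."""

    def __init__(self, module):
        self.module = module


class ForwardingWrapper(Wrapper):
    """Same wrapper, but unknown attributes are looked up on .module."""

    def __getattr__(self, name):
        # __getattr__ is only called when normal lookup has already failed.
        if name == "module":
            # Avoid infinite recursion if .module itself is not set yet.
            raise AttributeError(name)
        return getattr(self.module, name)


model = Model()

plain = Wrapper(model)
try:
    plain.reset()
    plain_failed = False
except AttributeError:
    plain_failed = True  # same kind of failure as in the traceback

forwarding = ForwardingWrapper(model)
forwarding.reset()  # found on the wrapped model instead

print(plain_failed, model.was_reset)
```

The same effect is available without any subclass by calling the method on the wrapped object directly, i.e. going through .module.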
Yes. When people say AI is biased, it also means the data on which the AI was trained is biased.
I think fastai has nothing to do with this: it is PyTorch stuff.
However, it could be worth asking the developers (Jeremy, Sgugger, etc…) about such issues. AWD-LSTM is truly a beast of an RNN, and it would be a shame not to be able to parallelize it: that could completely hinder its usage on non-enterprise hardware.
When I run
data_clas = (TextList.from_folder(path, vocab=data_lm.vocab) # vocab is passed in from our pretrained model so that the numericalization of the same words is exactly the same
#grab all the text files in path
.split_by_folder(valid='test')
#split by train and valid folder (that only keeps 'train' and 'test' so no need to filter)
.label_from_folder(classes=['neg', 'pos'])
#remove docs with labels not in above list (i.e. 'unsup')
.filter_missing_y()
#label them all with their folders
.databunch(bs=bs))
data_clas.save('tmp_clas')
I get
TypeError Traceback (most recent call last)
<ipython-input-25-ef1d6c6e4867> in <module>
3 .split_by_folder(valid='test')
4 #split by train and valid folder (that only keeps 'train' and 'test' so no need to filter)
----> 5 .label_from_folder(classes=['neg', 'pos'])
6 #remove docs with labels not in above list (i.e. 'unsup')
7 .filter_missing_y()
TypeError: 'bool' object is not callable
It loads for a bit then throws this. Any fix? I am using the latest version.
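The message itself says that one of the names being called in that chain resolved to a bool instead of a method. Which one is not established in this thread, so the sketch below only reproduces the generic mechanism with a hypothetical class (Step and is_ready are made up and have nothing to do with fastai).

```python
class Step:
    """Hypothetical object where a name is a flag rather than a method."""

    def __init__(self):
        self.is_ready = True  # plain bool attribute, not a method


step = Step()

try:
    step.is_ready()  # calling a bool raises TypeError
    message = None
except TypeError as err:
    message = str(err)

print(message)
```

A quick way to narrow it down in the real chain is to build it one call at a time and check type() of each attribute before calling it.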