KevinB
(Kevin Bird)
November 19, 2018, 5:53pm
I’m having issues with the new data block API.
Here is the call I’m trying to use:
data_clas = (TextList.from_folder(path, vocab=data_lm.vocab)
.split_by_folder(valid='valid')
.label_from_folder(classes=['neg','pos'])
.filter_missing_y()
.databunch(bs=bs))
And this is my error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-138-cffcd242252b> in <module>
1 data_clas = (TextList.from_folder(path, vocab=data_lm.vocab)
2 .split_by_folder(valid='valid')
----> 3 .label_from_folder(classes=['neg','pos'])
4 .filter_missing_y()
5 .databunch(bs=bs))
TypeError: 'bool' object is not callable
This is what my directory structure looks like:
Inside the neg and pos directories are .txt files.
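For readers without the screenshot: the layout described (train and valid folders, each containing neg and pos subfolders of .txt files) is what split_by_folder and label_from_folder work from. As a rough, library-free illustration (not fastai's implementation), this sketch takes the split from each file's grandparent folder and the label from its parent folder. split_and_label is a hypothetical helper, and the file names are made up for the demo.

```python
import tempfile
from pathlib import Path

def split_and_label(root, train="train", valid="valid"):
    """Hypothetical helper (not fastai code): return (train_items, valid_items).

    Each item is (file name, label). The split comes from the grandparent
    folder and the label from the parent folder: root/<split>/<label>/<name>.txt.
    Files under any other top-level folder are skipped.
    """
    splits = {train: [], valid: []}
    for path in sorted(root.glob("*/*/*.txt")):
        split_name = path.parent.parent.name  # e.g. 'train' or 'valid'
        label = path.parent.name              # e.g. 'neg' or 'pos'
        if split_name in splits:
            splits[split_name].append((path.name, label))
    return splits[train], splits[valid]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build a tiny tree shaped like the layout described in the post.
    for split, label, name in [("train", "neg", "a.txt"), ("train", "pos", "b.txt"),
                               ("valid", "neg", "c.txt"), ("valid", "pos", "d.txt")]:
        folder = root / split / label
        folder.mkdir(parents=True, exist_ok=True)
        (folder / name).write_text("some review text")
    train_items, valid_items = split_and_label(root)

print(train_items)  # [('a.txt', 'neg'), ('b.txt', 'pos')]
print(valid_items)  # [('c.txt', 'neg'), ('d.txt', 'pos')]
```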
I’m not sure how to debug this further, since the error is raised inside the data block pipeline. Should I go back to the older way of loading data?
Here is my version information:
=== Software ===
python version : 3.6.6
fastai version : 1.0.27
torch version : 1.0.0.dev20181029
nvidia driver : 396.37
torch cuda ver : 9.2.148
torch cuda is : available
torch cudnn ver : 7104
torch cudnn is : enabled
=== Hardware ===
nvidia gpus : 1
torch available : 1
- gpu0 : 16270MB | Quadro P5000
=== Environment ===
platform : Linux-3.10.0-862.11.6.el7.x86_64-x86_64-with-centos-7.5.1804-Core
distro : #1 SMP Tue Aug 14 21:49:04 UTC 2018
conda env : kbird
python : /home/kbird/.conda/envs/kbird/bin/python
sys.path :
/home/kbird/.conda/envs/kbird/lib/python36.zip
/home/kbird/.conda/envs/kbird/lib/python3.6
/home/kbird/.conda/envs/kbird/lib/python3.6/lib-dynload
/home/kbird/.local/lib/python3.6/site-packages
/home/kbird/.conda/envs/kbird/lib/python3.6/site-packages
/home/kbird/.conda/envs/kbird/lib/python3.6/site-packages/IPython/extensions
/home/kbird/.ipython
jm0077
(Jesús Pérez)
November 20, 2018, 11:47am
Hello, I’m getting the same error even with the updated fastai version (1.0.28). Did you find a solution?
BTW, how do you get the versions of all the libraries installed in your VM instance?
KevinB
(Kevin Bird)
November 20, 2018, 2:09pm
First, import the utils module from fastai:
from fastai.utils import *
Then the command to see the versions is:
show_install()
This also has an optional show_nvidia_smi parameter that shows your nvidia-smi output as well.
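show_install() is the fastai helper to use here. If fastai itself won't import, a rough stdlib-only stand-in can still report the basics. This is a hypothetical sketch, not part of fastai; installed_versions is an illustrative name, and it assumes Python 3.8+ for importlib.metadata (the environment above is on 3.6).

```python
import platform
from importlib import metadata  # requires Python 3.8+

def installed_versions(packages):
    """Hypothetical helper: map each distribution name to its version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions

print("python version :", platform.python_version())
print("platform       :", platform.platform())
for name, version in installed_versions(["fastai", "torch"]).items():
    print(f"{name:<15}: {version or 'not installed'}")
```

It covers only what you ask for explicitly. The GPU, CUDA and cuDNN details in show_install() output come from torch and nvidia-smi, which this sketch does not query.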
I haven’t found a fix for this yet. Are you running your own dataset or the IMDB one?
jm0077
(Jesús Pérez)
November 20, 2018, 2:12pm
Thanks for the info. As for the issue, I’m running it on the IMDB dataset with no modifications on my side. I think this post should be moved to the fastai course v3 section to get more visibility.
KevinB
(Kevin Bird)
November 20, 2018, 2:13pm
You’re probably correct. I was going to leave it here since I was only having issues on my own dataset, but if you are seeing the same thing with the IMDB data, it’s probably best to post it over there to see if other people are hitting the same issue.
Hi Kevin,
is this resolved? Thank you!
KevinB
(Kevin Bird)
November 20, 2018, 6:02pm
Not yet. I might try to dig into it, but I don’t really know how to step through the issue.
I used an earlier version of the notebook that doesn’t have filter_missing_y, and it works. I’m not sure what that step does; classes=['neg', 'pos'] already seems to select only neg/pos and not unsup.
What do you think?
data_clas = (TextList.from_folder(path, vocab=data_lm.vocab)
#grab all the text files in path
.split_by_folder(valid='test')
#split by train and valid folder (that only keeps 'train' and 'test' so no need to filter)
.label_from_folder(classes=['neg', 'pos'])
#label them all with their folders
.filter_missing_y()
#remove docs with labels not in above list (i.e. 'unsup')
.databunch(bs=bs))
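The notebook comment says the filter step removes "docs with labels not in above list (i.e. 'unsup')". One plausible reading, which is an assumption about fastai internals rather than something confirmed here, is that classes=['neg', 'pos'] does not drop out-of-list documents by itself but leaves them with a missing label, and the filter step then removes them. In plain Python, with hypothetical helpers label_with_classes and filter_missing (not fastai functions):

```python
from typing import List, Optional, Sequence, Tuple

def label_with_classes(items: Sequence[Tuple[str, str]],
                       classes: Sequence[str]) -> List[Tuple[str, Optional[str]]]:
    """Keep the folder label only if it is one of `classes`; otherwise mark it missing (None)."""
    allowed = set(classes)
    return [(doc, label if label in allowed else None) for doc, label in items]

def filter_missing(labelled: Sequence[Tuple[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Drop documents whose label is missing, as the notebook comment describes for 'unsup'."""
    return [(doc, label) for doc, label in labelled if label is not None]

items = [("a.txt", "neg"), ("b.txt", "pos"), ("c.txt", "unsup")]
labelled = label_with_classes(items, classes=["neg", "pos"])
print(labelled)                  # [('a.txt', 'neg'), ('b.txt', 'pos'), ('c.txt', None)]
print(filter_missing(labelled))  # [('a.txt', 'neg'), ('b.txt', 'pos')]
```

If the split already keeps only folders containing neg and pos (as the split_by_folder comment suggests for 'train' and 'test'), there is nothing left for this step to remove, which fits with the pipeline working once the call is commented out.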
KevinB
(Kevin Bird)
November 20, 2018, 8:41pm
I think you’re onto something. When I comment out .filter_missing_y(), it doesn’t error.
jm0077
(Jesús Pérez)
November 20, 2018, 9:09pm
I was searching the library’s GitHub repo and found that filter_missing_y is set as an attribute (True) in the TextList class, so it can’t be called, right?
class TextList(ItemList):
    _bunch = TextClasDataBunch
    _processor = [TokenizeProcessor, NumericalizeProcessor]
    def __init__(self, items:Iterator, vocab:Vocab=None, **kwargs):
        self.filter_missing_y = True
        super().__init__(items, **kwargs)
        self.vocab = vocab
GitHub for data.py
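If a parent class also defines a filter_missing_y() method (the excerpt above only shows the TextList side, so that part is an assumption), the error is ordinary Python attribute lookup: an instance attribute hides a method of the same name defined on the class. A minimal sketch with hypothetical stand-in classes, not fastai code:

```python
class BaseListSketch:
    """Hypothetical stand-in for a base class that provides a filter method."""

    def filter_missing_y(self):
        # A real implementation would drop unlabelled items; here it just returns self.
        return self

class TextListSketch(BaseListSketch):
    """Hypothetical subclass that, like the excerpt above, stores a flag under the same name."""

    def __init__(self):
        # Methods are non-data descriptors, so this instance attribute wins
        # attribute lookup and hides the inherited filter_missing_y method.
        self.filter_missing_y = True

texts = TextListSketch()
print(type(texts.filter_missing_y))  # <class 'bool'>
try:
    texts.filter_missing_y()
except TypeError as err:
    print(err)  # 'bool' object is not callable
```

Assigning to the name, by contrast, only rebinds the flag, so it never triggers the error.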
fkautz
(Frederick Kautz)
November 21, 2018, 10:44pm
I just tested this and it worked:
text_list = TextList.from_folder(path, vocab=data_lm.vocab)
text_list_split_by_folder = text_list.split_by_folder(valid='test')
text_list_labeled_by_folder = text_list_split_by_folder.label_from_folder(classes=['neg', 'pos'])
text_list_labeled_by_folder.filter_missing_y = True
data_clas = text_list_labeled_by_folder.databunch(bs=bs)
data_clas.save('tmp_clas')
I’ve been looking for the updated pretrained wt103 weights in .pth format. Here’s where I found them:
http://files.fast.ai/models/wt103_v1/
That’s correct. That’s also why the version that puts each operation on its own line works (as suggested by @fkautz): it sets filter_missing_y = True as an attribute instead of calling filter_missing_y().