'bool' object is not callable for TextList data block

KevinB · November 19, 2018, 5:53pm

I’m having issues with the new data block API.

Here is the call I’m trying to use:

data_clas = (TextList.from_folder(path, vocab=data_lm.vocab)
            .split_by_folder(valid='valid')
            .label_from_folder(classes=['neg','pos'])
            .filter_missing_y()
            .databunch(bs=bs))

and this is my error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-138-cffcd242252b> in <module>
      1 data_clas = (TextList.from_folder(path, vocab=data_lm.vocab)
      2             .split_by_folder(valid='valid')
----> 3             .label_from_folder(classes=['neg','pos'])
      4             .filter_missing_y()
      5             .databunch(bs=bs))

TypeError: 'bool' object is not callable

This is what my directory structure looks like:

and inside of the neg and pos directory are .txt files.

I’m not really sure how to debug this further, since the error is raised inside the data block API. Should I try going back to the other way of grabbing data?

Here is my version information:

=== Software === 
python version  : 3.6.6
fastai version  : 1.0.27
torch version   : 1.0.0.dev20181029
nvidia driver   : 396.37
torch cuda ver  : 9.2.148
torch cuda is   : available
torch cudnn ver : 7104
torch cudnn is  : enabled

=== Hardware === 
nvidia gpus     : 1
torch available : 1
  - gpu0        : 16270MB | Quadro P5000

=== Environment === 
platform        : Linux-3.10.0-862.11.6.el7.x86_64-x86_64-with-centos-7.5.1804-Core
distro          : #1 SMP Tue Aug 14 21:49:04 UTC 2018
conda env       : kbird
python          : /home/kbird/.conda/envs/kbird/bin/python
sys.path        : 
/home/kbird/.conda/envs/kbird/lib/python36.zip
/home/kbird/.conda/envs/kbird/lib/python3.6
/home/kbird/.conda/envs/kbird/lib/python3.6/lib-dynload
/home/kbird/.local/lib/python3.6/site-packages
/home/kbird/.conda/envs/kbird/lib/python3.6/site-packages
/home/kbird/.conda/envs/kbird/lib/python3.6/site-packages/IPython/extensions
/home/kbird/.ipython

jm0077 · November 20, 2018, 11:47am

Hello, I’m getting the same error even with the updated fastai version (1.0.28). Did you find any solution?
BTW, how do you obtain all the versions of the libraries installed in your VM instance?

KevinB · November 20, 2018, 2:09pm

First, import the utils library from fastai:

from fastai.utils import *

Then the command to see the versions is:

show_install()

This also has an optional show_nvidia_smi parameter that will show your nvidia-smi output as well.

I haven’t found a solution to this yet. Are you running your own dataset or the imdb one?

jm0077 · November 20, 2018, 2:12pm

Thanks for the info. About the issue: I’m running on the IMDB dataset (with no modifications on my side). I think this post should be moved to the fastai course v3 section to get more visibility.

KevinB · November 20, 2018, 2:13pm

You’re probably correct. I was going to leave it here since I was just having issues on my own dataset, but if you are seeing the same thing with the imdb data, it probably is best to put it over there to see if other people are seeing the same issue.

angelinayy · November 20, 2018, 6:00pm

hi Kevin,

is this resolved? thank you!

KevinB · November 20, 2018, 6:02pm

Not yet, I might try to dig into it but I don’t really know how to walk through the issue
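One general way to walk through a chained call like the one in the first post is to apply each link separately and check that the attribute is actually a method before calling it. The sketch below is plain Python with no fastai dependency. `Toy`, `run_chain`, and the `ready` attribute are hypothetical names made up for illustration; they are not part of any library.

```python
class Toy:
    """Hypothetical stand-in for an object with chainable methods."""

    def __init__(self):
        self.log = []
        self.ready = True  # a plain bool attribute, not a method

    def split(self, valid):
        self.log.append(f"split:{valid}")
        return self

    def label(self, classes):
        self.log.append(f"label:{classes}")
        return self


def run_chain(obj, steps):
    """Apply (method_name, kwargs) steps one at a time.

    Raises a TypeError that names the first step that is not callable.
    """
    for name, kwargs in steps:
        attr = getattr(obj, name)
        if not callable(attr):
            raise TypeError(f"{name!r} is a {type(attr).__name__}, not a method")
        obj = attr(**kwargs)
    return obj


steps = [
    ("split", {"valid": "valid"}),
    ("label", {"classes": ["neg", "pos"]}),
    ("ready", {}),  # this link is a bool, so the chain stops here
]

try:
    run_chain(Toy(), steps)
except TypeError as err:
    failure = str(err)
    print(failure)
```

Running the links one by one this way points at the offending name directly, which a traceback over a multi-line chained expression does not always do.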

angelinayy · November 20, 2018, 7:31pm

I used an earlier version of the notebook that does not have filter_missing_y, and it works. I’m not sure what it does; classes=['neg', 'pos'] already seems to select only neg/pos and not unsup.

what do you think?

data_clas = (TextList.from_folder(path, vocab=data_lm.vocab)
             #grab all the text files in path
             .split_by_folder(valid='test')
             #split by train and valid folder (that only keeps 'train' and 'test' so no need to filter)
             .label_from_folder(classes=['neg', 'pos'])
             #remove docs with labels not in above list (i.e. 'unsup')
             .filter_missing_y()
             #label them all with their folders

KevinB · November 20, 2018, 8:41pm

I think you’re onto something. When I comment out .filter_missing_y(), it doesn’t error.

jm0077 · November 20, 2018, 9:09pm

I was searching the library’s GitHub repo and found that filter_missing_y is set to a boolean attribute in the TextList constructor, so it can’t be called, right?

class TextList(ItemList):
    _bunch = TextClasDataBunch
    _processor = [TokenizeProcessor, NumericalizeProcessor]

    def __init__(self, items:Iterator, vocab:Vocab=None, **kwargs):
        self.filter_missing_y = True
        super().__init__(items, **kwargs)
        self.vocab = vocab

GitHub for data.py
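The effect described above can be reproduced without fastai. This is a minimal sketch with made-up stand-in classes (`Base` and `Child` are hypothetical, not fastai's real classes). It assumes a method of the same name is defined somewhere the lookup would otherwise reach; the point is only that once an instance stores a bool under that name, calling it calls the bool.

```python
class Base:
    def filter_missing_y(self):
        # Stand-in for a chainable method; returns self so calls can be chained.
        return self


class Child(Base):
    def __init__(self):
        # Instance attribute with the same name as the method above.
        # Attribute lookup finds this bool first, so the method is hidden.
        self.filter_missing_y = True


item = Child()
print(callable(item.filter_missing_y))  # False

try:
    item.filter_missing_y()  # this is really True(), which cannot be called
except TypeError as err:
    message = str(err)
    print(message)  # 'bool' object is not callable

# Assigning to the attribute is fine, because nothing is being called.
item.filter_missing_y = True
```

The last line mirrors the difference between calling `.filter_missing_y()` in a chain and setting the value as a plain attribute.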

fkautz · November 21, 2018, 10:44pm

I just tested this and it worked:

text_list = TextList.from_folder(path, vocab=data_lm.vocab)
text_list_split_by_folder = text_list.split_by_folder(valid='test')
text_list_labeled_by_folder = text_list_split_by_folder.label_from_folder(classes=['neg', 'pos'])

text_list_labeled_by_folder.filter_missing_y = True

data_clas = text_list_labeled_by_folder.databunch(bs=bs)
data_clas.save('tmp_clas')

angelinayy · November 26, 2018, 1:50am

I’ve been looking for the updated version of the pretrained wt103 model with the .pth files. I found them here:

http://files.fast.ai/models/wt103_v1/

Carte_Blanche · November 29, 2018, 11:19am

That’s correct. That’s why the version suggested by @fkautz works: it runs each operation on its own line and sets filter_missing_y = True as an attribute instead of calling it.