Lesson 3 In-Class Discussion ✅

Yikes, we both made a simple mistake here! I just noticed that last night you had suggested:

data_lm = (TextList.from_df(df, cols=[0])

and

data_lm = (TextList.from_df(df, cols=['label'])

…but I think you meant to refer to the text field, like cols=[1] or cols=['text']. When I do that it works fine, with either 1 or 'text'! So here is a shorter syntax that works well:

data_lm = (TextList.from_df(df, cols=['text'])
                .random_split_by_pct(0.2)
                .label_for_lm()
                .databunch())
data_lm.save('tmp_lm')

So this is working fine for me now - but it’s weird that the platform makes a difference. I’m planning to change platform. Do you run mainly on GCP?

Oh, I did it on purpose, since we had already tried cols='text'. However, the difference here is you are sending a list instead of a string :sunglasses:. I run GCP and it rocks :fire: but I am running out of credits and will turn back to my home server soon.

@lesscomfortable,

Can I have all my images in a single folder, with the image names (without suffix) and labels (one label per image) in a labels.csv file, and build a databunch like this?

src = (ImageItemList.from_csv('./', 'labels.csv', folder='images', suffix='.jpg')
       .random_split_by_pct(0.2)
       .label_from_df(sep=',')
       .transform(get_transforms(), size=64)
       .databunch())

I tried.

Curiously, there were no errors (how did the code know the name of the DataFrame?), but

data.train_ds[0] >>>(Image (3, 64, 64), MultiCategory boston_bull)

So is the combination of ImageItemList.from_csv & label_from_df meant only for MultiCategory?

Update

       .label_from_df(sep=None)

nicely results in

data.train_ds[0] >>>(Image (3, 64, 64), Category boston_bull)

Thanks Francisco
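
The difference between the two results above is consistent with the separator deciding whether each CSV cell is treated as a list of labels or as a single label. The following is a minimal sketch of that idea in plain Python. It is not fastai's implementation, and `parse_label` is a hypothetical helper introduced only for illustration.

```python
from typing import List, Optional, Union


def parse_label(raw: str, sep: Optional[str] = None) -> Union[str, List[str]]:
    """Hypothetical helper: split the cell on `sep` if one is given,
    otherwise keep the cell as a single label."""
    if sep is None:
        return raw
    return raw.split(sep)


# With a separator, even a one-label cell becomes a list (multi-label style).
print(parse_label("boston_bull", sep=","))   # ['boston_bull']

# Without a separator, the cell stays a single label (single-category style).
print(parse_label("boston_bull", sep=None))  # boston_bull
```

Under this reading, a list of labels per item would lead to a MultiCategory target and a single string to a Category target, which matches the behaviour reported above.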

I’m also having similar problems when trying to run data.show_batch(). I’m working with a really small image data set (only 12 images), and they’re all different sizes. Not sure if this plays any part in this error.

I’m getting the same problem. Did you figure out what was causing it?

Argh - I’m trying to move ahead and am now hitting another error, and it may be related. After:

learn.unfreeze()
learn.fit_one_cycle(10, 3e-3, moms=(0.8,0.7))
...
learn.save_encoder('fine_tuned_enc')
...
data_clas = (TextList.from_df(df, cols='text', vocab=data_lm.vocab)
            .random_split_by_pct(valid_pct=0.2)
            .label_from_df(cols='label')
            .databunch(bs=bs))
data_clas.save('tmp_clas')
data_clas = TextClasDataBunch.load(path, 'tmp_clas', bs=50)
learn_c = text_classifier_learner(data_clas, drop_mult=0.5)
learn_c.load_encoder('fine_tuned_enc')

the last line throws:

RuntimeError: Error(s) in loading state_dict for MultiBatchRNNCore:
	size mismatch for encoder.weight: copying a param of torch.Size([33080, 400]) from checkpoint, where the shape is torch.Size([33387, 400]) in current model.
	size mismatch for encoder_dp.emb.weight: copying a param of torch.Size([33080, 400]) from checkpoint, where the shape is torch.Size([33387, 400]) in current model.

I notice that fine_tuned_enc.pth appears to be unchanged from 2 days ago:


That seems really weird but I don’t know if that’s what’s causing the error.
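
The error message reports two embedding matrices whose row counts differ (33080 vs 33387), and each row of the embedding corresponds to one vocabulary entry. A small diagnostic like the one below can list such mismatches before calling load_encoder. This is only a sketch: `find_shape_mismatches` is a hypothetical helper, the shapes are copied from the message above, and it is an assumption that the real shapes could be collected with something like `{k: tuple(v.shape) for k, v in state_dict.items()}` on the saved and current state dicts.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    first: Dict[str, Shape], second: Dict[str, Shape]
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, first_shape, second_shape) for every parameter that is
    present in both dicts but has a different shape."""
    return [
        (name, first[name], second[name])
        for name in sorted(first.keys() & second.keys())
        if first[name] != second[name]
    ]


# Shapes copied from the error message; which side is the checkpoint and
# which is the current model is deliberately left open here.
shapes_a = {"encoder.weight": (33080, 400), "encoder_dp.emb.weight": (33080, 400)}
shapes_b = {"encoder.weight": (33387, 400), "encoder_dp.emb.weight": (33387, 400)}

for name, a, b in find_shape_mismatches(shapes_a, shapes_b):
    print(f"{name}: {a} vs {b} (row difference: {abs(a[0] - b[0])})")
```

If the row counts differ, one possible cause is that the saved encoder and the classifier data were built from different vocabularies, for example when the encoder file is older than the current data_lm.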

The same problem is happening for me: a size mismatch error when loading the language model's encoder into text_classifier_learner.

NVM! I started up GCP today and this runs fine there - so this appears to be a Gradient problem! Weird…

Are you running on Gradient?

Were you able to fix the problem? I’m having the same problem when I’m trying to build language model using different dataset.

Yikes - yes I’m pretty sure I fixed it but it’s been weeks so I’ll have to go back to that nb and see if I have notes on it. I think it may have been fixed with a newer fastai version but don’t recall - but you should try that!

Yeah, I tried that already. I’ve updated to the latest version. I’m also getting another error saying "AttributeError: 'float' object has no attribute 'replace'".

@hxiao0909 were you able to solve this issue? I am facing the same one. @joshfp @sgugger could you please help me with this?

I just updated the Zeit script to the latest fastai (1.0.40) and it’s working properly. Be sure to pull the latest version of the corresponding notebook, as it uses the new, fast way to do inference.

A new type of error is coming up now, @sgugger:
client.js:34 POST https://fastai-1v3.appspot.com/analyze 500
analyze @ client.js:34
onclick @ (index):28
VM173:1 Uncaught SyntaxError: Unexpected token I in JSON at position 0
at JSON.parse (<anonymous>)
at XMLHttpRequest.xhr.onload (client.js:26)
xhr.onload @ client.js:26
load (async)
analyze @ client.js:24
onclick @ (index):28


@sgugger
The above happened when I was trying to use Google App Engine.
When I first tried to use Zeit, I got this error:


Then I changed now.json to contain only
version: 2 and removed all the remaining lines.

But now, when I open the deployed app's site,
it shows only the directory structure, as you can see here:
https://zeit-6js94ceor.now.sh/

Actually, what Jeremy once suggested was not to throw the 4th channel out, because that is like throwing away more information.

There is an excellent code example by @wdhorton, written for the Human Protein Atlas competition, that modifies 3-channel pretrained models to accept 4 input channels (or even more if you wish) here.
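
To make the idea of going from 3 to 4 input channels concrete, here is a minimal NumPy sketch of the weight surgery on a first convolution layer. It is not @wdhorton's code. The function name `expand_input_channels`, the (64, 3, 7, 7) weight shape, and the choice to initialise the extra channel with the mean of the existing filters are all assumptions made for illustration; copying one existing channel or starting from zeros are other options.

```python
import numpy as np


def expand_input_channels(weight: np.ndarray, new_in: int) -> np.ndarray:
    """Expand conv weights of shape (out, in, kh, kw) to `new_in` input channels.

    The existing filters are kept unchanged; every extra channel is
    initialised with the mean of the existing input-channel filters.
    """
    in_ch = weight.shape[1]
    if new_in < in_ch:
        raise ValueError("new_in must be >= the current number of input channels")
    mean_filter = weight.mean(axis=1, keepdims=True)          # (out, 1, kh, kw)
    extra = np.repeat(mean_filter, new_in - in_ch, axis=1)    # (out, new_in - in, kh, kw)
    return np.concatenate([weight, extra], axis=1)


# Random stand-in for the pretrained first-layer weights (assumed shape).
rng = np.random.default_rng(0)
rgb_weight = rng.standard_normal((64, 3, 7, 7)).astype(np.float32)

four_channel_weight = expand_input_channels(rgb_weight, 4)
print(four_channel_weight.shape)  # (64, 4, 7, 7)
```

In a real model the expanded array would be copied into a new first convolution layer that accepts 4 input channels, so the pretrained RGB filters are preserved and the 4th channel contributes from the start.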

I’m working with a Kaggle fake news dataset. When I use:

data_lm = (TextList.from_df(df, cols=['text','type'])
                .random_split_by_pct()
                .label_for_lm()
                .databunch(bs=bs))

I get:

AttributeError                            Traceback (most recent call last)
<ipython-input-8-632209f31b16> in <module>
----> 1 data_lm = (TextList.from_df(df, cols=['text','type'])
      2                 .random_split_by_pct()
      3                 .label_for_lm()
      4                 .databunch(bs=bs))

AttributeError: 'float' object has no attribute 'replace'

and when I try:

data_lm = (TextList.from_df(df, cols=['text','type'])
                .random_split_by_pct(0.2)
                .label_for_lm()
                .databunch(bs=bs))

I get:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-223d106d8d5b> in <module>
      1 data_lm = (TextList.from_df(df, cols=['text','type'])
----> 2                 .random_split_by_pct(0.2)
      3                 .label_for_lm()
      4                 .databunch(bs=bs))

AttributeError: 'float' object has no attribute 'replace'

In other words, same error on the next line. Does anyone know what’s going on here?

Thanks!

NVM! It was caused by NaNs in the text column… I filtered the NaNs out of the df and it works fine!
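
For anyone hitting the same message, here is a small sketch of the cause and the fix, assuming a pandas DataFrame with a 'text' column as in the snippets above. The toy rows are made up for illustration. A missing value in a text column is a float NaN, and a float has no string methods, which is where the 'replace' error comes from.

```python
import pandas as pd

# A float NaN has no string methods, which reproduces the message.
try:
    float("nan").replace("a", "b")
except AttributeError as err:
    print(err)  # 'float' object has no attribute 'replace'

# Toy data with one missing text value (illustrative only).
df = pd.DataFrame({
    "text": ["first article", float("nan"), "third article"],
    "type": ["fake", "real", "fake"],
})

# Count the missing values in the text column.
print(df["text"].isna().sum())  # 1

# Drop rows whose text is missing before building the TextList.
clean_df = df.dropna(subset=["text"]).reset_index(drop=True)
print(len(clean_df))  # 2
```

If other columns are passed in cols as well, they can be added to the subset list so that their missing values are dropped too.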
