Lesson 3 In-Class Discussion ✅

ricknta · December 5, 2018, 1:11am

Yikes, we both made a simple mistake here! I just noticed that last night you had suggested:

data_lm = (TextList.from_df(df, cols=[0])

and

data_lm = (TextList.from_df(df, cols=['label'])

…but I think you meant to refer to the text field, like cols=[1] or cols=['text']. When I do that it works fine, with either 1 or 'text'! So here is a shorter syntax that works well:

data_lm = (TextList.from_df(df, cols=['text'])
                .random_split_by_pct(0.2)
                .label_for_lm()
                .databunch())
data_lm.save('tmp_lm')

So this is working fine for me now - but it’s weird that the platform makes a difference. I’m planning to change platform. Do you run mainly on GCP?

lesscomfortable · December 5, 2018, 1:12am

Oh, I did it on purpose, since we had already tried cols='text'. The difference here is that you are passing a list instead of a string. I run on GCP and it rocks, but I am running out of credits and will switch back to my home server soon.

sam2 · December 6, 2018, 2:14am

@lesscomfortable,

Can I have all my images in a single folder, with the image names (without suffix) and labels (one label per image) in a labels.csv file, and build a databunch like this?

src = (ImageItemList.from_csv('./', 'labels.csv', folder='images', suffix='.jpg')
       .random_split_by_pct(0.2)
       .label_from_df(sep=',')
       .transform(get_transforms(), size=64)
       .databunch())

I tried.

Curiously, there were no errors (how did the code know the name of the DataFrame?), but

data.train_ds[0] >>>(Image (3, 64, 64), MultiCategory boston_bull)

So is the combination of ImageItemList.from_csv & label_from_df meant only for MultiCategory?

Update

       .label_from_df(sep=None)

nicely results in

data.train_ds[0] >>>(Image (3, 64, 64), Category boston_bull)

Thanks Francisco
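
For anyone wondering why sep changes the label type, here is a minimal sketch. It assumes the labelling step treats any non-None separator as a request to split each label string into a list, which is what makes an item multi-label. parse_label is a hypothetical stand-in for illustration, not fastai's actual code.

```python
def parse_label(raw, sep=None):
    """Hypothetical stand-in for the labelling step, not fastai's implementation."""
    if sep is None:
        return raw             # one string per item -> single Category
    return raw.split(sep)      # one list per item -> MultiCategory, even with a single entry


single = parse_label("boston_bull")
multi = parse_label("boston_bull", sep=",")

print(type(single).__name__, single)
print(type(multi).__name__, multi)
```

Under that assumption, a CSV with one label per image should be read with sep=None, and a separator is only needed when one cell holds several labels.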

waydegg · December 6, 2018, 4:05am

I’m also having similar problems when trying to run data.show_batch(). I’m working with a really small image data set (only 12 images), and they’re all different sizes. Not sure if this plays any part in this error.

ricknta · December 7, 2018, 12:52am

I’m getting the same problem. Did you figure out what was causing it?

ricknta · December 7, 2018, 4:29am

Argh - I’m trying to move ahead and am now hitting another error, and it may be related. After:

learn.unfreeze()
learn.fit_one_cycle(10, 3e-3, moms=(0.8,0.7))
...
learn.save_encoder('fine_tuned_enc')
...
data_clas = (TextList.from_df(df, cols='text', vocab=data_lm.vocab)
            .random_split_by_pct(valid_pct=0.2)
            .label_from_df(cols='label')
            .databunch(bs=bs))
data_clas.save('tmp_clas')
data_clas = TextClasDataBunch.load(path, 'tmp_clas', bs=50)
learn_c = text_classifier_learner(data_clas, drop_mult=0.5)
learn_c.load_encoder('fine_tuned_enc')

the last line throws:

RuntimeError: Error(s) in loading state_dict for MultiBatchRNNCore:
	size mismatch for encoder.weight: copying a param of torch.Size([33080, 400]) from checkpoint, where the shape is torch.Size([33387, 400]) in current model.
	size mismatch for encoder_dp.emb.weight: copying a param of torch.Size([33080, 400]) from checkpoint, where the shape is torch.Size([33387, 400]) in current model.

I notice that fine_tuned_enc.pth appears to be unchanged from 2 days ago. That seems really weird, but I don’t know if that’s what’s causing the error.
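
A note on reading the error: the two shapes differ only in the first dimension (33080 vs 33387), and for an embedding that dimension is the vocabulary size. The sketch below is a toy illustration of one way such a mismatch can arise, namely when the encoder was saved against one vocab and the classifier is built against another (for example from a stale checkpoint or a vocab rebuilt from different data). This thread does not confirm that this was the cause here, and build_vocab and embedding_shape are hypothetical helpers, not fastai functions.

```python
from collections import Counter

EMB_DIM = 400  # second dimension in the error message above


def build_vocab(texts):
    """Toy vocab builder (hypothetical): one entry per distinct whitespace token."""
    counts = Counter(tok for text in texts for tok in text.split())
    return sorted(counts)


def embedding_shape(vocab):
    # An embedding matrix has one row per vocab entry.
    return (len(vocab), EMB_DIM)


lm_texts = ["the movie was great", "the plot was thin"]
clas_texts = ["the movie was great", "the acting was superb indeed"]

lm_vocab = build_vocab(lm_texts)
saved_shape = embedding_shape(lm_vocab)                    # shape stored with the saved encoder
rebuilt_shape = embedding_shape(build_vocab(clas_texts))   # classifier that built its own vocab
reused_shape = embedding_shape(lm_vocab)                   # classifier that reuses the LM vocab

print(saved_shape, rebuilt_shape, reused_shape)
```

If the row counts differ, the encoder weights cannot be copied in. Checking len(data_lm.vocab.itos) against len(data_clas.vocab.itos) before load_encoder is a quick way to see whether the two vocabs actually match.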

gshashank84 · December 7, 2018, 6:53am

The same problem is happening for me: a size mismatch error when loading the language model’s encoder into text_classifier_learner.

ricknta · December 8, 2018, 4:16am

NVM! I started up GCP today and this runs fine there - so this appears to be a Gradient problem! Weird…

ricknta · December 8, 2018, 4:17am

Are you running on Gradient?

avinregmi · December 12, 2018, 10:19pm

Were you able to fix the problem? I’m having the same problem when I try to build a language model using a different dataset.

ricknta · December 13, 2018, 5:28am

Yikes - yes I’m pretty sure I fixed it but it’s been weeks so I’ll have to go back to that nb and see if I have notes on it. I think it may have been fixed with a newer fastai version but don’t recall - but you should try that!

avinregmi · December 13, 2018, 5:31am

Yeah, I tried that already. I’ve updated to the latest version. I’m also getting another error saying “AttributeError: 'float' object has no attribute 'replace'”

tabshaikh · January 17, 2019, 8:03pm

@hxiao0909 were you able to solve this issue? I am facing the same one too. @joshfp @sgugger could you please help me with this one?

sgugger · January 17, 2019, 8:06pm

I just updated the Zeit script to the latest fastai (1.0.40) and it’s working properly. Be sure to pull the latest version of the corresponding notebook, as it’s using the new fast way to do inference.

tabshaikh · January 18, 2019, 4:24am

Then there is a new type of error coming up, @sgugger:
client.js:34 POST https://fastai-1v3.appspot.com/analyze 500
analyze @ client.js:34
onclick @ (index):28
VM173:1 Uncaught SyntaxError: Unexpected token I in JSON at position 0
at JSON.parse ()
at XMLHttpRequest.xhr.onload (client.js:26)
xhr.onload @ client.js:26
load (async)
analyze @ client.js:24
onclick @ (index):28
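
A note on reading this log: the 500 comes first, and the "Unexpected token I" is most likely just the client trying to parse the error page as JSON. The sketch below shows that pattern in Python rather than in the app's JavaScript. It assumes the 500 response body is plain text beginning with "I" (such as "Internal Server Error"), which the log suggests but does not show. parse_analyze_response and the sample bodies are made up for illustration.

```python
import json


def parse_analyze_response(status, body):
    """Hypothetical client-side helper: only parse JSON when the request succeeded."""
    if status != 200:
        return {"ok": False, "error": f"HTTP {status}: {body[:60]}"}
    try:
        return {"ok": True, "data": json.loads(body)}
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"invalid JSON: {exc}"}


# Assumed body of the failing response; the real one is not shown in the log.
failed = parse_analyze_response(500, "Internal Server Error")
# Made-up successful payload.
succeeded = parse_analyze_response(200, '{"result": "example"}')

print(failed)
print(succeeded)
```

Either way, the thing to debug is the server-side 500 (the App Engine logs should show the Python traceback), not the JSON parsing in client.js.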

tabshaikh · January 18, 2019, 5:04am

@sgugger
The above was when I was trying to use Google App Engine.
When I tried to use Zeit, I first got this error

Then I changed the now.json to
version:2 and removed all remaining lines.

But now when I go to the deployed app site,
it shows only the directory structure, which is here:
https://zeit-6js94ceor.now.sh/

Preka · January 23, 2019, 10:39am

Actually, what Jeremy once suggested was not to throw the 4th channel out, because that is like throwing away more information.

hwasiti · January 23, 2019, 5:11pm

There is an excellent code example by @wdhorton, written for the Human Protein Atlas competition, for modifying 3-channel-input pretrained models to take 4 channels (or even more if you wish) here.
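
A minimal sketch of the general idea, not @wdhorton's code: the first conv layer's weights have one slice per input channel, so going from 3 to 4 channels means keeping the pretrained RGB slices and appending one new slice. Plain nested lists stand in for the weight tensor here so the sketch runs without a framework, and initialising the new slice to the mean of the existing ones is an assumption made for illustration.

```python
def add_input_channel(weight):
    """Return a copy of conv weights [out][in][kh][kw] with one extra input channel.

    The new channel is the mean of the existing ones (an assumed initialisation).
    """
    extended = []
    for filt in weight:
        n_in = len(filt)
        kh, kw = len(filt[0]), len(filt[0][0])
        extra = [[sum(filt[c][i][j] for c in range(n_in)) / n_in
                  for j in range(kw)]
                 for i in range(kh)]
        # Copy the pretrained slices unchanged, then append the new one.
        extended.append([[row[:] for row in ch] for ch in filt] + [extra])
    return extended


# One output filter, 3 input channels (RGB), 1x2 kernel.
rgb_weight = [[[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]]
rgby_weight = add_input_channel(rgb_weight)

print(len(rgb_weight[0]), len(rgby_weight[0]))
```

In a real model the same operation would be applied to the first conv layer's weight tensor, and the rest of the pretrained network stays as it is.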

ricknta · January 26, 2019, 4:12am

I’m working with a Kaggle fake news dataset. When I use:

data_lm = (TextList.from_df(df, cols=['text','type'])
                .random_split_by_pct()
                .label_for_lm()
                .databunch(bs=bs))

I get:

AttributeError                            Traceback (most recent call last)
<ipython-input-8-632209f31b16> in <module>
----> 1 data_lm = (TextList.from_df(df, cols=['text','type'])
      2                 .random_split_by_pct()
      3                 .label_for_lm()
      4                 .databunch(bs=bs))

AttributeError: 'float' object has no attribute 'replace'

and when I try:

data_lm = (TextList.from_df(df, cols=['text','type'])
                .random_split_by_pct(0.2)
                .label_for_lm()
                .databunch(bs=bs))

I get:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-223d106d8d5b> in <module>
      1 data_lm = (TextList.from_df(df, cols=['text','type'])
----> 2                 .random_split_by_pct(0.2)
      3                 .label_for_lm()
      4                 .databunch(bs=bs))

AttributeError: 'float' object has no attribute 'replace'

In other words, same error on the next line. Does anyone know what’s going on here?

Thanks!

ricknta · January 26, 2019, 5:50am

NVM! It was caused by NaNs in the text column… Filtered the NaNs from the df and it works fine!
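
For anyone landing here with the same error, a minimal sketch of why it happens. An empty cell in a text column is stored as a float NaN, and text preprocessing then calls string methods on every item. Plain Python rows stand in for the DataFrame here, and clean is a hypothetical stand-in for a preprocessing rule, not fastai code.

```python
rows = [
    {"text": "Breaking: markets rally", "type": "fake"},
    {"text": float("nan"), "type": "real"},   # how an empty cell ends up in the column
    {"text": "Local team wins", "type": "real"},
]


def clean(text):
    # Hypothetical preprocessing rule that assumes every item is a str.
    return text.replace("\n", " ")


def first_error(rows):
    """Return the message of the first AttributeError raised while cleaning, or None."""
    try:
        for row in rows:
            clean(row["text"])
    except AttributeError as exc:
        return str(exc)
    return None


error = first_error(rows)

# Keep only rows whose text is really a string.
kept = [row for row in rows if isinstance(row["text"], str)]

print(error)
print(len(kept), first_error(kept))
```

With pandas, the equivalent filter is df = df.dropna(subset=['text']) before building the TextList.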