label_from_df is stripping leading zeros

asparamancer · December 5, 2019, 1:15am

I have a df. When I do df.head() I get the following:

   name        labels
0  000001.jpg  short-sleeve-top;trousers
1  000002.jpg  short-sleeve-top;short-sleeve-top
2  000003.jpg  long-sleeve-dress
3  000004.jpg  long-sleeve-dress
4  000005.jpg  long-sleeve-dress

But when I do:

src = (ImageList.from_df(df, path, folder='train')
       .split_by_rand_pct(0.2)
       .label_from_df(cols='labels', label_delim=';')
       .databunch().normalize(imagenet_stats)
      )

I get the following warning: UserWarning: There seems to be something wrong with your dataset, for example, in the first batch can't access any element of self.train_ds. Tried: 25411,63210,83328,49358,42785...

That is because those elements do not exist; the actual names all have leading zeros, i.e. 025411, 063210, 083328, 049358, 042785.
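One way to test that explanation is to check directly whether every name in the df has a matching file on disk. This matters because the numbers after "Tried:" may be item positions in the training set rather than file names. Below is a minimal sketch using only the standard library; `find_missing` is a hypothetical helper, not part of fastai, and it assumes the images sit directly inside the folder passed to it.

```python
from pathlib import Path

def find_missing(names, folder):
    """Return the names that have no matching file inside folder.

    Hypothetical helper for illustration; not a fastai function.
    """
    folder = Path(folder)
    # str(n) guards against names that were parsed as numbers.
    return [n for n in names if not (folder / str(n)).is_file()]
```

With the variables from the post, this would be called as find_missing(df['name'], path/'train'). An empty list would suggest the file names themselves are not the problem.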

How do I stop label_from_df from stripping leading zeros? I’ve tried adding converters={'name': lambda x: str(x)} to df = pd.read_csv(path/"labels.csv"), and the df still shows the image names with their leading zeros. But I get the same error, so they seem to be stripped somewhere, either between the df and label_from_df or by _from_df.
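For reference, here is a small sketch of how pandas itself treats leading zeros when reading a CSV. The CSV text is made up (the real labels.csv is not shown here), and it assumes the name column holds bare numbers. Names such as 000001.jpg are not numeric, so pandas should already read them as strings without any converter.

```python
import io
import pandas as pd

# Made-up CSV contents with bare numeric IDs (an assumption for illustration).
csv_text = "name,labels\n000001,short-sleeve-top;trousers\n000002,long-sleeve-dress\n"

# Default parsing infers an integer column, so the leading zeros are lost.
df_default = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to str keeps the text exactly as written in the file.
df_str = pd.read_csv(io.StringIO(csv_text), dtype={"name": str})

default_names = df_default["name"].tolist()
str_names = df_str["name"].tolist()
```

If the names survive in the df, as they do in the df.head() output above, the read_csv step is probably not where anything gets stripped.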

asparamancer · December 5, 2019, 4:28am

Redid the dataset to prepend 'train_' to the file names and updated the df to match, which seems to have resolved it. Still not sure what the underlying issue is, though.
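For anyone wanting to try the same workaround, the renaming step could look roughly like the sketch below. `add_prefix` is a hypothetical helper, and the 'train_' prefix and *.jpg pattern are assumptions taken from the description above. It renames files in place, so it is worth trying on a copy of the folder first.

```python
from pathlib import Path

def add_prefix(folder, prefix="train_", pattern="*.jpg"):
    """Rename files matching pattern in folder so each name starts with prefix.

    Hypothetical helper for illustration. Returns the new file names.
    """
    renamed = []
    # sorted() lists the matches up front, before any file is renamed.
    for p in sorted(Path(folder).glob(pattern)):
        if p.name.startswith(prefix):
            continue  # already prefixed, so re-running is harmless
        target = p.with_name(prefix + p.name)
        p.rename(target)
        renamed.append(target.name)
    return renamed
```

The df would then need the same change, for example df['name'] = 'train_' + df['name'], so that the names in the df still match the files.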