Lesson 3 Planet: dataset error, can't access self.train_ds

Hi friends

Using Google Colab, I downloaded the new notebooks from GitHub (fastai/course-v3) and followed
https://course-v3.fast.ai/start_colab.html
and it's working.

I got the Kaggle data etc., following the instructions, and all was good until this line, where I get the error below. Any tips? Thank you :wink:

data = (src.transform(tfms, size=128)
        .databunch().normalize(imagenet_stats))

You can deactivate this warning by passing no_check=True.
/usr/local/lib/python3.6/dist-packages/fastai/basic_data.py:205: UserWarning: There seems to be something wrong with your dataset, can't access self.train_ds[i] for all i in [12055, 18692, 2100, 23398, 10386, 21287, 7103, 3395, 22337, 15219, 30742, 12772, 13602, 23094, 5016, 29438, 1820, 5728, 20708, 4628, 29504, 22855, 10699, 24698, 27910, 31950, 12095, 17326, 10178, 25479, 9811, 23495, 4600, 29706, 7890, 2902, 21275, 15381, 10512, 7632, 31732, 32081, 19886, 543, 7920, 24887, 12894, 32221, 15135, 3343, 17710, 5169, 28317, 31561, 24068, 29490, 6354, 4988, 7926, 6202, 19658, 11343, 15324, 15448]
warn(f"There seems to be something wrong with your dataset, can't access self.train_ds[i] for all i in {idx}")
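For reference, the warning just means that indexing the training set raised an exception for those indices. A quick, fastai-free way to narrow down which items actually fail (sketch only; `find_bad_indices` is a hypothetical helper, and `ToyDataset` just stands in for any indexable dataset like `train_ds`):

```python
def find_bad_indices(dataset, indices):
    """Return (index, error message) pairs for items that can't be loaded."""
    bad = []
    for i in indices:
        try:
            dataset[i]  # the same access the fastai sanity check performs
        except Exception as e:
            bad.append((i, str(e)))
    return bad

# Toy stand-in: a "dataset" where every 3rd item is broken
class ToyDataset:
    def __getitem__(self, i):
        if i % 3 == 0:
            raise FileNotFoundError(f"missing file for item {i}")
        return i

print(find_bad_indices(ToyDataset(), range(6)))
```

Running this against your real `train_ds` with the indices from the warning should print the underlying exception (bad path, corrupt file, label mismatch, etc.) instead of swallowing it.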


The data is all there:

!ls -l data/planet/train-jpg

total 701152
-rw-rw-r-- 1 1003 1004 12441 Apr 19 2017 train_0.jpg
-rw-rw-r-- 1 1003 1004 17261 Apr 19 2017 train_10000.jpg
-rw-rw-r-- 1 1003 1004 14961 Apr 19 2017 train_10001.jpg
-rw-rw-r-- 1 1003 1004 17215 Apr 19 2017 train_10002.jpg
-rw-rw-r-- 1 1003 1004 17265 Apr 19 2017 train_10003.jpg
-rw-rw-r-- 1 1003 1004 12424 Apr 19 2017 train_10004.jpg
-rw-rw-r-- 1 1003 1004 16303 Apr 19 2017 train_10005.jpg
-rw-rw-r-- 1 1003 1004 12225 Apr 19 2017 train_10006.jpg
-rw-rw-r-- 1 1003 1004 21537 Apr 19 2017 train_10007.jpg
-rw-rw-r-- 1 1003 1004 25553 Apr 19 2017 train_10008.jpg
-rw-rw-r-- 1 1003 1004 19478 Apr 19 2017 train_10009.jpg
etc etc
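To rule out corrupt downloads (the files can exist but still be truncated), a stdlib-only check of the JPEG magic bytes can help. This is just a sketch; `broken_jpegs` is a made-up helper name, and the folder path is the one from the listing above:

```python
from pathlib import Path

def broken_jpegs(folder):
    """Return names of .jpg files that don't start with the JPEG magic bytes FF D8."""
    bad = []
    for f in sorted(Path(folder).glob('*.jpg')):
        with open(f, 'rb') as fh:
            if fh.read(2) != b'\xff\xd8':
                bad.append(f.name)
    return bad

# broken_jpegs('data/planet/train-jpg')  # expect [] if the download is intact
```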


Hello friends, any tips on this error? Thank you :wink:

Hi @javismiles! I was having a similar problem earlier with another Kaggle dataset and seem to have fixed the issue.

What was causing the issue for me was the paths given to ImageItemList.

I was also using Google Colab and had everything in my working directory, so here is the code that worked for me:

sz = 128 # our initial size will be 128*128
data = (ImageItemList.from_csv('/content', 'train_v2.csv', folder='train-jpg', suffix='.jpg')
        .random_split_by_pct(0.2) # will split our data for us (train/valid) 20% of it will go into valid
        .label_from_df(sep=' ')
        .transform(tfms, size=sz)
        .databunch().normalize(imagenet_stats)) 

Hope that helps, and Happy New Year! :slight_smile:


Thanks, Diego! I'm going to try your suggestion and see if it works :wink: Happy New Year :wink:


@javismiles np! I truly hope it does help! You can also find great documentation on the datablock API here https://docs.fast.ai/data_block.html

Cheers! :slight_smile:

You know what, I just tried it again and it all worked directly, without having to add or change any code! Odd! Exact same code as the other day, but this time it ran well… puzzled, oops.

I am having the same problem.
fastai version 1.0.39

data = (ObjectItemList.from_df(pd.DataFrame(data=list(fn2bbox.keys())), path='data/train')
        .split_by_valid_func(lambda path: path2fn(path) in val_fns)                         
        .label_from_func(get_y_func, label_cls=StubbedObjectCategoryList)
        .transform(get_transforms(max_zoom=1, max_warp=0.05, max_rotate=0.05, max_lighting=0.2), tfm_y=True, size=(SZ,SZ), resize_method=ResizeMethod.SQUISH)
        .databunch(bs=BS, num_workers=NUM_WORKERS)
        .normalize(imagenet_stats))

I'm also having a similar issue:

tfms = get_transforms(flip_vert=False, max_rotate=10, max_zoom=1, max_warp=0)

data = (ImageItemList.from_folder(path)
        .random_split_by_pct()
        .label_from_func(get_ctr, label_cls=PointsItemList)
#         .transform(tfms, tfm_y=True)
        .databunch()
       )

You can deactivate this warning by passing no_check=True.
/home/waydegg/anaconda3/lib/python3.7/site-packages/fastai/basic_data.py:205: UserWarning: There seems to be something wrong with your dataset, can't access self.train_ds[i] for all i in [58, 102, 22, 3, 68, 40, 46, 87, 35, 77, 76, 73, 16, 14, 95, 9, 1, 57, 42, 18, 39, 60, 65, 53, 85, 26, 81, 7, 24, 62, 54, 19, 31, 34, 56, 51, 89, 64, 67, 83, 17, 43, 28, 13, 55, 98, 32, 37, 80, 100, 103, 15, 93, 45, 30, 69, 96, 82, 4, 12, 61, 49, 90, 99]
warn(f"There seems to be something wrong with your dataset, can't access self.train_ds[i] for all i in {idx}")

I haven't used Google Colab before, but did you just restart your Jupyter notebook instance and everything was working again? Also, what version of the fastai library are you using?

Well, I figured out the problem I posted earlier. I believe the fastai library was changed/updated: I built my datablock based on the one in the BIWI head-pose notebook, which, after checking, has since changed. After changing ImageItemList to PointsItemList and removing label_cls=PointsItemList, I can load all my images just fine again.

Friends, I believe the best plan now is to wait for the new v3 videos and start fresh with the v3 videos and v3 code; it seems to run very smoothly in Google Colab.

I am also facing the same error. Please suggest how to fix it.
Name: fastai
Version: 1.0.40.dev0
Windows 10

Thanks,
Ritika

My error got fixed by using the code below, but now I am facing an error with the CamVid dataset:

data = (ImageItemList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
        .random_split_by_pct(0.2) # will split our data for us (train/valid); 20% of it will go into valid
        .label_from_df(sep=' ')
        .transform(tfms, size=128, padding_mode='zeros')
        .databunch(bs=32).normalize(imagenet_stats))

Try this:

tfms = get_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
data = ImageDataBunch.from_csv(path, folder='train-jpg', csv_labels='train_v2.csv', size=224, suffix='.jpg', sep=' ', ds_tfms=tfms)


Hello All,

I'm also having a similar problem with Part 1 Lesson 1 when I run it.

…Not sure if that image will show.

This is the block of code I'm running:

np.random.seed(42)
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)

And this is the error being thrown:

/home/ubuntu/anaconda3/lib/python3.7/site-packages/fastai/basic_data.py:259: UserWarning: There seems to be something wrong with your dataset, for example, in the first batch can't access these elements in self.train_ds: 1027,152,1207,1286,6…
warn(warn_msg)

I've been struggling with this for a while now and I'm having trouble diagnosing the issue. I hope I'm in the right thread.

I look forward to hearing from someone.

Regards,

Hey Carl, I am new to fast.ai, but I was having a very similar problem to yours. I changed my batch size from 64 to 16 and it worked great: just add the keyword argument bs=16 to the ImageDataBunch.from_folder() call. Hope this gets you there!

Hi Steve,

Apologies for the late reply. I've not been on here for a while; I was brushing up on my knowledge via Kaggle and have been busy working. I initially thought it could be an issue with the directory, but I will give your method a try and see if it resolves the problem.

Thank you for your response.

Carl