SageMaker & S3 Bucket Images

Hi,

I have just recently prepared a custom image dataset in S3 and have taken the following steps:

  1. Separated the train and test images
  2. Created two files, ‘train-annotations.csv’ and ‘test-annotations.csv’
  3. Uploaded the images and annotation files to S3; they are all at the same level in the folder:
    e.g.

When I am in my SageMaker notebook (conda_pytorch_p36), I am able to access the contents of the train-annotations.csv file, but I am struggling to access the images for my model.

I would appreciate any advice on how to get this working. I took a similar approach recently with tabular data, but that was just a single CSV file (i.e. no need to handle hundreds of images). I am open to creating folder structures if needed. I am sure this is down to my inexperience with the platform (I just switched over from Colab/Google Drive to AWS SageMaker/S3).

@matt.mcclean, would this be something you have come across?

Thanks
Andrew

Hi Andrew,
I do not know if you are still stuck on this, but I ran into the same problem and wrote up a post with a few explanations: My S3 ImageList version

Basically, as the error message points out, the problem comes from the fact that os.scandir() expects a file system path, while S3 URIs are URLs. So you need to rewrite get_image_files() to browse your hosted files through the AWS API instead. Boto3 is the Python library where you will find all the methods you need for that.
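
For illustration, here is a minimal sketch of that idea (not the exact code from my post): it lists the image keys under a prefix with boto3 and opens one image with PIL, which is essentially what a get_image_files() replacement for S3 comes down to. The function names, bucket, prefix and file extensions below are placeholders to adapt to your own layout.

```python
import io

import boto3
from PIL import Image

# Placeholder extensions -- adjust to whatever you uploaded
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def get_s3_image_files(bucket, prefix=''):
    """List the keys of all images under `prefix`, paginating past the 1000-object limit."""
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            if obj['Key'].lower().endswith(IMAGE_EXTENSIONS):
                keys.append(obj['Key'])
    return keys


def open_s3_image(bucket, key):
    """Read one object into memory and open it with PIL, without downloading the whole bucket."""
    s3 = boto3.client('s3')
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    return Image.open(io.BytesIO(body))


# Hypothetical usage -- replace with your own bucket name and prefix:
# keys = get_s3_image_files('my-bucket', 'train/')
# img = open_s3_image('my-bucket', keys[0])
```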

Hope this helps.