Help needed: PIL corrupt EXIF data

I am working on the Kaggle Cervical Cancer competition and following this topic to do some transfer learning with ResNet.

When I used the VGG16 model with Keras and Theano (part 1 environment), the train and test images loaded just fine. Now that I am using Keras 2.0 with TensorFlow as the backend (part 2 environment), some of the train images seem to have corrupted EXIF data and are being ignored. Specifically, when I ran:

# precompute convolutional output
trn_conv_features_resnet = resnet_model_conv.predict_generator(trn_batches, trn_batches.samples)

I got a bunch of messages like:
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 524288 bytes but only got 0. Skipping tag 3
  "Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 393216 bytes but only got 0. Skipping tag 3
  "Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 33554432 bytes but only got 0. Skipping tag 4

I manually counted and found out that 258 images were ignored, out of the total 1281 training images. My questions are:

  1. Is there a way to fix this EXIF corruption issue? I searched but had no luck so far…
  2. How can I figure out which 258 images are ignored? In the worst case, I could at least remove those 258 images from the train data manually.

Thank you!
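On the second question, one way to find the affected files is to open each image yourself and record any warning it triggers. Below is a minimal sketch, not something tested on this dataset: `find_flagged_files` and `load_with_pil` are names made up for illustration, and it assumes the warning is raised when Pillow opens the file or parses its EXIF block (`Image.getexif()` needs a reasonably recent Pillow).

```python
import warnings


def find_flagged_files(paths, load):
    """Return {path: [messages]} for every path whose load() warned or failed.

    `load` is any callable that takes a path; both names are illustrative.
    """
    flagged = {}
    for path in paths:
        messages = []
        # record=True collects warnings in a list instead of printing them
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")  # do not let repeats be suppressed
            try:
                load(path)
            except Exception as exc:  # unreadable files are worth listing too
                messages.append(f"failed to load: {exc}")
        messages = [str(w.message) for w in caught] + messages
        if messages:
            flagged[path] = messages
    return flagged


def load_with_pil(path):
    """Open one image with Pillow and parse its EXIF (assumed warning trigger)."""
    from PIL import Image  # imported lazily so the helper above has no dependency

    with Image.open(path) as img:
        img.getexif()

# Example call ("train" is a placeholder folder name):
#   from pathlib import Path
#   flagged = find_flagged_files(sorted(Path("train").rglob("*.jpg")), load_with_pil)
#   print(len(flagged), "files raised warnings")
```

The keys of the returned dictionary are the files to inspect, fix, or drop.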

Check out the Kaggle thread on this topic. Removing the EXIF data and then creating new EXIF data for all of the files worked for me:


I used piexif to remove all the EXIF data from the files and it worked pretty well. Thank you @kangway!

However, that only works for .jpg files. What about png, gif and other file types…?
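For anyone who wants to try the piexif route on .jpg files, here is a rough sketch of what it can look like. It assumes piexif is installed and relies on `piexif.remove(filename)`, which rewrites the file without its EXIF block. The function name `strip_exif_from_jpegs` and its `strip` parameter are made up for this example; `strip` is only there so the EXIF-removal call can be swapped out.

```python
from pathlib import Path


def strip_exif_from_jpegs(folder, strip=None):
    """Strip EXIF in place from each .jpg/.jpeg under folder.

    Returns (done, failed) lists of paths. Illustrative helper, not a library API.
    """
    if strip is None:
        import piexif  # third-party package: pip install piexif

        strip = piexif.remove
    done, failed = [], []
    for path in sorted(Path(folder).rglob("*")):
        # Only JPEG files are handled here; everything else is skipped
        if not path.is_file() or path.suffix.lower() not in {".jpg", ".jpeg"}:
            continue
        try:
            strip(str(path))  # pass a plain string rather than a Path object
            done.append(path)
        except Exception:
            # Keep going and report the files that could not be processed
            failed.append(path)
    return done, failed

# Example call ("train" is a placeholder folder name):
#   done, failed = strip_exif_from_jpegs("train")
```

This overwrites the files, so it is safer to run it on a copy of the data first.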

I’m new to fastai and I’m training a resnet50 on dls built from images I downloaded from Bing image search. I’m getting the same error. What is the issue? I’d like to read more about it.

In case this helps anyone in the future, here’s how I removed all EXIF data from my dataset, which removed the PIL warnings.

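# Remove corrupt EXIF data by rebuilding each image from its pixel data only.
from PIL import Image

# get_image_files is fastai's helper (available after `from fastai.vision.all import *`);
# path is the dataset folder.
file_names = get_image_files(path)

def remove_exif(image_name):
    with Image.open(image_name) as image:
        if not image.getexif():
            return  # no EXIF data, nothing to strip
        print('removing EXIF from', image_name, '...')
        # Copy only the pixels into a fresh image so no metadata is carried over.
        image_without_exif = Image.new(image.mode, image.size)
        image_without_exif.putdata(list(image.getdata()))
        if image.mode == 'P':
            # Palette images also need their palette copied to keep their colors.
            image_without_exif.putpalette(image.getpalette())
    # Overwrite the original file with the EXIF-free copy.
    image_without_exif.save(image_name)

for file in file_names:
    remove_exif(file)
print('done')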