Help needed: PIL corrupt EXIF data

(segovia) #1

I am working on the Kaggle Cervical Cancer competition and following this topic to do some transfer learning with ResNet.

When I used the VGG16 model with Keras and Theano (the part 1 environment), the train and test images were fine. Now that I am using Keras 2.0 with TensorFlow as the backend (the part 2 environment), some of the train images seem to have corrupted EXIF data and are being ignored. Specifically, when I ran:

# precompute convolutional output
trn_conv_features_resnet = resnet_model_conv.predict_generator(trn_batches, trn_batches.samples)

I got a bunch of messages like:
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 524288 bytes but only got 0. Skipping tag 3
  "Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 393216 bytes but only got 0. Skipping tag 3
  "Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 33554432 bytes but only got 0. Skipping tag 4

I counted manually and found that 258 of the 1281 training images were ignored. My questions are:

  1. Is there a way to fix this EXIF corruption issue? I searched but had no luck so far…
  2. How can I figure out which 258 images are ignored? In the worst case, I can at least remove those 258 images from the training data manually.
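
On the second question, here is a minimal sketch of one way to list the files that trigger the warning. It assumes Pillow 6.0 or later is installed and that the images are JPEGs somewhere under a single directory. The function name `find_exif_warnings` and its `pattern` argument are made up for illustration. It opens each file, forces the EXIF block to be parsed, and records any `UserWarning` raised along the way.

```python
import warnings
from pathlib import Path

from PIL import Image


def find_exif_warnings(image_dir, pattern="*.jpg"):
    """Return (path, message) pairs for files that raise a UserWarning when read.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    flagged = []
    for path in sorted(Path(image_dir).rglob(pattern)):
        with warnings.catch_warnings(record=True) as caught:
            # Record every warning instead of only the first one per location.
            warnings.simplefilter("always")
            with Image.open(path) as img:
                img.getexif()  # force EXIF parsing (assumes Pillow >= 6.0)
                img.load()     # decode the pixel data as well
        for w in caught:
            if issubclass(w.category, UserWarning):
                flagged.append((str(path), str(w.message)))
    return flagged
```

Calling `find_exif_warnings("train")` (the directory name is an assumption) returns one entry per warning, so a file can appear more than once if several tags are skipped.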

Thank you!


(kangway) #2

Check out the Kaggle thread on this topic. Removing the EXIF data and then creating new EXIF data for all of the files worked for me:


(segovia) #3

I used piexif to remove all the EXIF data from the files and it worked pretty well. Thank you @kangway!


(Karl Mason) #4

However, that only works for .jpg files. What about PNG, GIF, and other file types?
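
One format-independent option, offered as a sketch rather than a tested fix for this dataset, is to re-encode only the pixel data with Pillow so that no metadata is carried into the new file. The helper name `resave_without_metadata` is made up for illustration. It assumes an RGB copy is acceptable, which means transparency is dropped and only the first frame of an animated file is kept.

```python
from PIL import Image


def resave_without_metadata(src_path, dst_path):
    """Write a pixel-only RGB copy of src_path to dst_path.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    with Image.open(src_path) as img:
        rgb = img.convert("RGB")  # decodes the first frame into RGB pixels
    # Rebuild the image from raw bytes so nothing from img.info is carried over.
    clean = Image.frombytes(rgb.mode, rgb.size, rgb.tobytes())
    clean.save(dst_path)  # output format is inferred from the file extension
    return dst_path
```

Writing to a separate `dst_path` keeps the originals untouched, which makes it easy to compare the two sets before retraining.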
