What's the difference between training on jpeg and png?

I trained a CNN classifier on JPEG images. The accuracy dropped by 20% when predicting on PNG images. Has anyone come across this problem?

JPEG compression adds artefacts to an image, while PNG is lossless and adds none. Your model was trained on images with artefacts and expects them.

Thanks for your reply. How do I deal with it?

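You can either convert the PNGs to JPEG before prediction or retrain the NN on PNG images. But do not try to convert the training images from JPEG to PNG: the artefacts are already baked into the pixels, and changing the container does not remove them.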
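A minimal sketch of the first option, assuming Pillow is installed. The helper name `jpeg_roundtrip` and the quality value of 90 are illustrative assumptions, not something from this thread; ideally the quality would match whatever the training JPEGs were saved with, which is unknown here.

```python
import io
import random

from PIL import Image  # Pillow, assumed to be installed


def jpeg_roundtrip(img, quality=90):
    """Encode `img` as JPEG in memory and decode it again.

    The result has the same size as the input but carries JPEG
    compression artefacts. `quality` is an assumed value.
    """
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    out = Image.open(buf)
    out.load()  # decode now, while the buffer is still alive
    return out


# Synthetic noisy image standing in for a decoded PNG.
rng = random.Random(0)
png_like = Image.new("RGB", (32, 32))
png_like.putdata(
    [(rng.randrange(256), rng.randrange(256), rng.randrange(256))
     for _ in range(32 * 32)]
)

with_artifacts = jpeg_roundtrip(png_like, quality=90)
```

The returned image can then go through the same preprocessing as the training data. Whether this recovers the lost accuracy depends on whether the artefacts are really the cause.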


JPEG is a lossy compression format. Although you may not be able to see the difference, the network might when applying hundreds of filters. Still, a 20% drop in accuracy is pretty extreme.

Does anybody have first-hand experience of how float16 and float32 compare? I usually use float16 to save space, but it may be a good idea to up my game.

Is there any way to design a Keras layer that converts PNG to JPEG? Both kinds of images occur at inference time, so I could put such a layer on top of the trained model for PNG images.

It is not part of Keras. I suggest converting all PNGs to JPG from the command line: https://superuser.com/questions/71028/batch-converting-png-to-jpg-in-linux

Seems like you're suffering from neural-networks-will-solve-all-my-problems syndrome.

JPG and PNG are just image formats. They vanish once you read the data; all images will be uncompressed tensors by then. Why would you convert PNGs to JPGs anyway if we just told you this will hurt accuracy?

It is necessary because the NN was trained on images with JPEG artefacts.

While in theory you could be right, I don't think that will make a difference.

Yes, most likely you are right. I just re-read the original question. I think PNG vs. JPG was a false lead. Most likely it is over-fitting or an incorrect training set.

Did you try predicting on JPEG, or did you use JPEG only for training? In other words, is the 20% difference between JPG and PNG prediction, or between a JPG test set and PNG real data?

I only use JPEG for training and validation, and I test the model on JPG and PNG images from two different test datasets.
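Since the two test sets differ in both content and format, one way to separate the two effects is to score the same images once as JPEG and once as PNG. The sketch below only shows the bookkeeping; `accuracy_by_group`, the group names, and the stub `predict` are hypothetical stand-ins, and a real run would pass decoded images and the trained model's prediction function.

```python
def accuracy_by_group(samples, predict):
    """Return {group: accuracy} for samples of (group, image, label).

    `predict` is any callable mapping an image to a predicted label.
    """
    correct, total = {}, {}
    for group, image, label in samples:
        total[group] = total.get(group, 0) + 1
        if predict(image) == label:
            correct[group] = correct.get(group, 0) + 1
    return {g: correct.get(g, 0) / total[g] for g in total}


# Toy stand-ins: integers play the role of images, and the
# "model" labels them by parity.
def predict(x):
    return "odd" if x % 2 else "even"


samples = [
    ("same_images_as_jpeg", 3, "odd"),
    ("same_images_as_jpeg", 4, "even"),
    ("same_images_as_png", 3, "odd"),
    ("same_images_as_png", 4, "odd"),
]

scores = accuracy_by_group(samples, predict)
# scores == {"same_images_as_jpeg": 1.0, "same_images_as_png": 0.5}
```

If accuracy is about the same for both encodings of identical images, the gap most likely comes from the datasets themselves rather than from the file format.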

Do you have any reference for this?