Am I doing the right steps; trouble with training


I’m having a bit of trouble understanding what I’m doing wrong.
As part of the Part 1 course, I’d like to train a model to classify whether the long edges of a rectangle are parallel.

I create these pictures with Pillow, so I know which rectangles have parallel edges (within a threshold). I save them in a folder and, like the cats and dogs notebook, put the labels in the filename.
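For anyone who wants to reproduce the setup, here is a minimal sketch of how such images could be generated. The actual generation code is not shown in this thread, so the image size, the 2-degree threshold, the `parallel`/`skewed` label names and every function name below are assumptions.

```python
import math
import random


def make_quad(parallel, rng, size=224, min_skew_deg=5.0, max_skew_deg=15.0):
    """Return four (x, y) corners of a long quadrilateral.

    The long edges are the top and the bottom one. When `parallel` is False,
    the bottom edge is tilted around its midpoint by a random angle.
    """
    left, right = size * 0.15, size * 0.85
    top, bottom = size * 0.40, size * 0.60
    skew = 0.0
    if not parallel:
        skew = math.radians(rng.uniform(min_skew_deg, max_skew_deg)) * rng.choice([-1, 1])
    dy = math.tan(skew) * (right - left) / 2
    return [(left, top), (right, top), (right, bottom + dy), (left, bottom - dy)]


def long_edges_angle_deg(corners):
    """Angle in degrees between the top edge and the bottom edge."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners
    a_top = math.atan2(y1 - y0, x1 - x0)
    a_bottom = math.atan2(y2 - y3, x2 - x3)
    return abs(math.degrees(a_top - a_bottom))


def label_for(corners, tol_deg=2.0):
    """Hypothetical labelling rule: parallel if the edges differ by at most tol_deg."""
    return "parallel" if long_edges_angle_deg(corners) <= tol_deg else "skewed"


def render(corners, path, size=224):
    """Draw the shape and save it. Pillow is imported here so that the
    geometry helpers above can be used without it."""
    from PIL import Image, ImageDraw
    img = Image.new("L", (size, size), color=40)      # dark grey background
    ImageDraw.Draw(img).polygon(corners, fill=200)    # light grey shape
    img.save(path)


if __name__ == "__main__":
    rng = random.Random(0)
    for i in range(4):
        corners = make_quad(parallel=(i % 2 == 0), rng=rng)
        # The label goes in the filename, as in the cats and dogs notebook.
        print(f"{label_for(corners)}_{i:05d}.png")
```

Calling `render(corners, path)` with one of these filenames writes the image to disk. Computing the label from the corners, rather than from the flag passed to `make_quad`, keeps the label consistent with the threshold.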

No matter what I try, I almost always get a plot after learn.lr_find() like so:

Then fitting one cycle gives me error rates of about 50%. I’ve tried different things: colouring, PNGs, JPEGs, greyscale PNGs. I’ve tried 4,000 images, 16,000 images, and today I let the PC chug and made 100,000 images. No change.

The only differently shaped plot (starting high on the left at a very low lr, with a valley in the middle, and going up on the right at high lr) was one I got when I ran the notebook cells while the pictures were still being copied from a shared disk to my local disk. I had around 50,000 images at that point. Error rates were pretty low then, like 0.06.
But I cannot reproduce this result.
I think I’m missing a simple thing here. Any thoughts?


First, try a simpler task, like circles vs. squares. That way you can see whether your code, data setup, and steps are the right ones.
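One quick check of the data setup is to parse the labels back out of the filenames and count them. With two balanced classes, an error rate near 50% is what random guessing gives. If the pattern puts every file into one class, or fails to match at all, the problem is in the labelling step rather than in the model. This is only a sketch: the `<label>_<index>.png` naming scheme and the regular expression are assumptions, not taken from the original notebook.

```python
import re
from collections import Counter

# Hypothetical naming scheme: "<label>_<index>.<ext>", e.g. "circle_00012.png".
LABEL_PATTERN = re.compile(r"([^/\\]+)_\d+\.(?:png|jpe?g)$")


def label_from_name(filename):
    """Return the label encoded in the filename, or None if it does not match."""
    match = LABEL_PATTERN.search(filename)
    return match.group(1) if match else None


def label_counts(filenames):
    """Count files per label; unmatched files are counted under None."""
    return Counter(label_from_name(name) for name in filenames)


def majority_baseline_error(counts):
    """Error rate of a model that always predicts the most common label."""
    total = sum(counts.values())
    return 1.0 - max(counts.values()) / total


if __name__ == "__main__":
    names = ["data/circle_00000.png", "data/square_00001.png",
             "data/circle_00002.png", "data/square_00003.png"]
    counts = label_counts(names)
    print(counts)
    print(majority_baseline_error(counts))
```

If the trained model's error rate is no better than `majority_baseline_error`, it has not learned anything beyond the class frequencies.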

Second, it may be that your chosen task is one that ResNet is not capable of. It’s good at detecting a set of features in an image, but notoriously bad at detecting spatial relationships between features. If you discover this to be the case, it would make a great blog post.

Here’s an interesting article, perhaps pertinent:

Please let us know what you discover!

Also, note that the default transforms apply perspective warping which… breaks parallelism. You should deactivate it if you want your model to train.
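To see why, here is a small sketch in plain Python. It maps the corners of a rectangle through an affine transform and through a perspective (projective) transform, then measures the angle between the two long edges. The matrix values are made up for illustration and are not the ones fastai samples.

```python
import math


def apply_homography(h, point):
    """Map (x, y) through a 3x3 projective matrix given as nested lists."""
    x, y = point
    xn = h[0][0] * x + h[0][1] * y + h[0][2]
    yn = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (xn / w, yn / w)


def edge_angle_deg(p, q):
    """Direction of the edge p -> q in degrees."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def long_edge_divergence_deg(corners):
    """Angle between edge 0 -> 1 and edge 3 -> 2, the two long edges."""
    return abs(edge_angle_deg(corners[0], corners[1])
               - edge_angle_deg(corners[3], corners[2]))


# A wide rectangle in normalised coordinates, corners in drawing order.
rect = [(-0.8, -0.2), (0.8, -0.2), (0.8, 0.2), (-0.8, 0.2)]

# Affine: last row is (0, 0, 1), so parallel lines stay parallel.
affine = [[1.0, 0.3, 0.1], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Perspective: a non-zero entry in the last row makes the divisor depend on x.
perspective = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.2, 0.0, 1.0]]

affine_rect = [apply_homography(affine, p) for p in rect]
warped_rect = [apply_homography(perspective, p) for p in rect]

print("affine divergence (deg):", long_edge_divergence_deg(affine_rect))
print("perspective divergence (deg):", long_edge_divergence_deg(warped_rect))
```

The affine case leaves the long edges parallel, while the perspective case tilts them towards each other by a few degrees. A warped "parallel" image can therefore look exactly like a "not parallel" one, which makes the labels unreliable.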

Thanks for that link. It seems to cover what I would like to do. Ultimately I would like to recognize the position and rotation of different objects. Since I generate the images with Pillow, I can provide the (x,y) of the center of the shape and the rotation of the shape (z-rot).
I’ll skip parallelism for now and start out by first discovering shapes and sizes (square, rectangle, circle, triangle, big, small).
I’ll look at positions later.
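Since the shapes are generated, the pose labels come for free. Below is a rough sketch of how the corner points and a label record might be produced for a rotated rectangle. The function and field names are hypothetical; the resulting corner list could be passed to Pillow's `ImageDraw.polygon`.

```python
import math


def rotated_rect_corners(cx, cy, width, height, angle_deg):
    """Corners of a width x height rectangle centred on (cx, cy).

    The rectangle is rotated by angle_deg around its centre. In image
    coordinates y points down, so a positive angle appears clockwise.
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    half_w, half_h = width / 2, height / 2
    offsets = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in offsets]


def make_label(shape, cx, cy, angle_deg):
    """Hypothetical label record for one generated image."""
    return {"shape": shape, "x": cx, "y": cy, "rot_deg": angle_deg % 360}


if __name__ == "__main__":
    corners = rotated_rect_corners(100, 100, 40, 20, 30)
    print(corners)
    print(make_label("rectangle", 100, 100, 30))
```

Note that a rectangle looks identical after a 180-degree turn, so for this shape the rotation label is only meaningful modulo 180.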

Thanks. I’ve tried avoiding transforms since I can generate as many images as I want. I’ve used ds_tfms=get_transforms(do_flip=False, max_warp=0.). Is that the correct setting?

That’s the correct setting indeed.

@Pomo @sgugger thanks for the tips.

I’ve been playing around and I’ve made data that consists of squares, rectangles, circles and ellipses. I started out with squares and rectangles, using random background and foreground colours per image (from 8 colours), and in the end switched to a dark grey background and light grey foreground, which gave much better results.
After that I added circles and ellipses and more images, and used a resnet50. I now have satisfying results which I can live with for this tryout. Great to see that the faults occur between rectangle<->square and ellipse<->circle. That makes sense if you take into account that the original images get scaled down quite a lot.
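A back-of-the-envelope sketch of that effect: when an image is scaled down, the difference between the two side lengths of a shape shrinks by the same factor and can drop below one pixel. The sizes below are invented for illustration, since the thread does not state the actual image or shape sizes.

```python
def side_difference_after_resize(width, height, source_size, target_size):
    """Difference between the two side lengths, in pixels, after the image
    is resized from source_size to target_size (square images assumed)."""
    scale = target_size / source_size
    return abs(width - height) * scale


if __name__ == "__main__":
    # Hypothetical numbers: a 120 x 116 rectangle in a 1024-pixel image,
    # resized to 224 pixels for the network.
    print(side_difference_after_resize(120, 116, 1024, 224))
```

Once that difference is below a pixel, a near-square rectangle and a true square are close to indistinguishable in the resized input, and the same holds for a near-circular ellipse and a circle.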