CNN architecture for really simple images?

i’ve been playing with an idea today for encoding tabular data into images so i can throw them at a CNN (mainly because fastai.tabular seems to hate me). it’s based off a paper i saw where they were literally putting the variable values into the images, but i don’t see the point in wasting training time getting the model to understand values i can encode as colors, so i use a greyscale image with colored blocks. i’m personally getting better results with blocks than with numbers.
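
to make the block idea concrete, here is a minimal sketch in plain Python. it is not the code from the repo: the `encode_row` helper, the blue-to-red color ramp, the 8-pixel block size and the value ranges for age, fare and pclass are all assumptions made up for illustration.

```python
def encode_row(values, ranges, block=8):
    """Encode one tabular row as a strip of coloured square blocks.

    values: numeric feature values for a single row.
    ranges: (low, high) pair per feature, used to scale each value to 0..1.
    Returns the image as a list of pixel rows, each pixel an (R, G, B) tuple.
    """
    colors = []
    for value, (low, high) in zip(values, ranges):
        # Scale to 0..1 and clamp so out-of-range values stay valid colours.
        frac = 0.0 if high == low else (value - low) / (high - low)
        frac = min(1.0, max(0.0, frac))
        level = round(frac * 255)
        # Hypothetical colour ramp: low values are blue, high values are red.
        colors.append((level, 0, 255 - level))

    # One pixel row: each feature colour repeated `block` times side by side.
    pixel_row = [color for color in colors for _ in range(block)]
    # Stack `block` copies of that row so every feature becomes a square.
    return [list(pixel_row) for _ in range(block)]


# Illustrative Titanic-style row: age, fare, passenger class.
img = encode_row([22.0, 7.25, 3], [(0, 80), (0, 512), (1, 3)])
print(len(img), "rows x", len(img[0]), "columns")
```

the nested list can be converted to a real image with whatever imaging library is at hand before it goes to the CNN.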

i was playing around with titanic on kaggle and so far i’ve managed 0.796 with resnet18. i seem to be getting my best results with simpler architectures and minimal training (0.796 came from 2 lots of 5 epochs without unfreezing).

i’m wondering if there are other architectures or training tricks i should be looking at for something so simple. my data looks like this.


there will be a blog post and a repo with some code when i’m done but google drive and colab are throwing a hissy-fit so i’m done for today.

any suggestions?


Very interesting approach. I know exactly the paper you are talking about, and it was discussed on the forum before (can’t find the link right now). I looked into it too, including embedding colors directly into the images, but did not see much improvement.

A few ideas I’ve thought about trying: yes, a simple architecture seems like the better way to go, probably one that isn’t pre-trained, from what I found. Other options would be the xresnets, possibly MobileNet, and of course EfficientNet.

i tried pretrained=False and unfreezing right away but it didn’t make a lot of difference for me. in 5 epochs i was in pretty much the same place, it just took bigger steps to get there with the untrained model.

i was looking at the models in the zoo, and i can probably just try them all, but i wondered if there was an understood approach for simpler problems.

the paper is here:


I think when i tried my own version of SuperTML I wound up integrating color into it too (not just B/W), using a color gradient of sorts for the cardinalities
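
One way to read "a color gradient for the cardinalities" is to give each category level its own hue, spaced evenly around the color wheel. The sketch below is only a guess at that idea, not the code I actually used: `category_colors` is a hypothetical helper, and the three embarkation codes are just an example column.

```python
import colorsys


def category_colors(categories):
    """Map each category to an RGB colour with evenly spaced hues."""
    n = len(categories)
    palette = {}
    for i, category in enumerate(categories):
        # Dividing by n (not n - 1) keeps the last hue from wrapping back to red.
        hue = i / n
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        palette[category] = (round(r * 255), round(g * 255), round(b * 255))
    return palette


# Illustrative categorical column with three levels.
palette = category_colors(["C", "Q", "S"])
print(palette)
```

For columns with many levels the hues get close together, so an ordered gradient over a single hue may be easier for the network to separate; that is untested here.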

quick question, off the top of your head, does anyone have any idea why i start with this:
[image: download (3)]
then do this:

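from fastai.vision import Image  # fastai v1 image wrapper
import torchvision.transforms as tfms

img_pil = normal_img_from_my_code()    # PIL image produced by my encoder
img_tensor = tfms.ToTensor()(img_pil)  # channels-first float tensor
img_fastai = Image(img_tensor)         # wrap the tensor as a fastai Image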

and end up with this?
[image: download (4)]
i feel like i’m missing out on an opportunity to encode something more into the images but i don’t understand why it’s going from A to B.

You’re not specifying it to be Black and White, or it’s not being interpreted as a one-channel picture
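
A quick way to check is to look at the PIL image’s mode before it goes through `ToTensor`, and convert it explicitly if it isn’t what you expect. This is a minimal sketch using Pillow only; `ensure_mode` is a hypothetical helper, and the 4x4 grey test image stands in for whatever your encoder produces.

```python
from PIL import Image as PILImage


def ensure_mode(img, mode="RGB"):
    """Return `img` in the requested mode, converting only when needed."""
    return img if img.mode == mode else img.convert(mode)


# Stand-in for the encoder output: a single-channel ("L") grey image.
grey = PILImage.new("L", (4, 4), color=128)
rgb = ensure_mode(grey, "RGB")

print(grey.mode, grey.getbands())  # mode and channel names before conversion
print(rgb.mode, rgb.getbands())    # mode and channel names after conversion
```

If the mode printed for your image is not the one you intended, converting it this way before the `ToTensor` call should at least make the channel count explicit.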

i’m top 9% on the titanic leaderboard with a CNN. this is hilarious.


if anyone wants to play, the image encoder repo is here:

blog post here:

Off to do the rest of the course now…