Transfer learning and conv nets for tabular data

Interesting paper using convolutional networks and transfer learning on tabular data.


This is absolutely fascinating (and really well written)! I’m going to have to read it through a few more times, but this sounds like a fun project to develop. I did a (very) quick Google search and came up blank. Do you know if there’s a code example floating around out there?

Correct me if I’m wrong, but did they not just take the important variables they wanted to use, chuck them into a picture, and classify that? I’ve been slowly reading it for the past few weeks trying to make sense of it in my head and want to be sure I’m reading that right.

I think I may be able to do an implementation of this. I’ll try it on the Adult dataset and I’ll post my results when I have them.

That’s how I’m reading it.

Awesome. I’m starting it now. I’ll post my results here and on a separate thread, including whether I manage to beat Jeremy’s score.


The use of images for tabular classification is discussed extensively in the time series analysis group here:

There is a whole library here dedicated to time series to image transformations:

I’m pretty skeptical of the authors’ approach because they don’t provide enough details to replicate the image creation process and they don’t provide code. They allude to the fact that they use feature importance to govern the font size used in the image creation, but I don’t see any way to reproduce their results from their paper.

Considering the transformation of tabular data to images is the key innovation in the paper, this is the only discussion of how this is done:

Algorithm 2 SuperTML EF: SuperTML method with Equal Fontsize for embedding.
Input: Tabular data training set
Parameter: Imagesize of the generated SuperTML images
Output: Finetuned CNN model
1: for each sample in the tabular data do
2: for each feature of the sample do
3: Draw the feature in the same fontsize without overlapping, such that the total features of the sample will occupy the imagesize as much as possible.
4: end for
5: end for
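
Since the paper gives no code, here is one possible reading of step 3, as a minimal sketch rather than the authors' method. It assumes a near-square grid with one cell per feature and a single font height chosen so the longest value still fits its cell. The function names, the 224-pixel image size and the `char_aspect` glyph ratio are all made up for illustration; the actual drawing (for example with Pillow) is left out.

```python
import math


def equal_font_layout(n_features, image_size):
    """Split a square image into a near-square grid, one cell per feature.

    Returns (cell_w, cell_h, boxes); boxes[i] is the (left, top, right, bottom)
    pixel box reserved for feature i. The grid layout is an assumption, the
    paper does not say how features are placed.
    """
    cols = math.ceil(math.sqrt(n_features))
    rows = math.ceil(n_features / cols)
    cell_w = image_size // cols
    cell_h = image_size // rows
    boxes = []
    for i in range(n_features):
        r, c = divmod(i, cols)
        boxes.append((c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h))
    return cell_w, cell_h, boxes


def font_height_for(texts, cell_w, cell_h, char_aspect=0.6):
    """Largest single font height (in pixels) at which every string fits its cell.

    char_aspect is an assumed width/height ratio for one glyph of a
    monospace font; a real implementation would measure the rendered text.
    """
    longest = max(len(t) for t in texts)
    return min(cell_h, int(cell_w / (longest * char_aspect)))


# Example: one sample with four numeric features.
sample = ["5.1", "3.5", "1.4", "0.2"]
cell_w, cell_h, boxes = equal_font_layout(len(sample), image_size=224)
font_px = font_height_for(sample, cell_w, cell_h)
for text, box in zip(sample, boxes):
    print(text, box, font_px)
```

Each value would then be drawn inside its box at `font_px`, which keeps the font size equal across features and avoids overlap because the boxes are disjoint.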


They allude to the fact that they use feature importance to govern the font size used in the image creation, but I don’t see any way to reproduce their results from their paper.

In fairness, they do mention that the version using feature importance to govern font size wasn’t any more predictive than the version without. I’m having a hard time conceptualizing why this would work better than the traditional approach to tabular data, but am hopeful I (or more likely a better programmer like @muellerzr) will be able to prove/disprove. :slight_smile:

I found this confusing as well. I’m trying my best to recreate what they describe as closely as possible. It looks like the bottom two rows essentially turn into 4x4 boxes of text, if you will. However, I ran into issues with their feature selection and choices. Because of this, I may deviate slightly from the paper in that regard, along with a few others.

I share @whamp’s skepticism about this paper. At the beginning it even seemed to me like a kind of joke. Don’t get me wrong: the idea of converting tabular data to images and using pre-trained models to classify them is very interesting and promising. However, the conversion to images that they propose does not make any sense to me. Why convert nice numeric data, which can be (relatively) easily used by a model, to a bunch of noisy letters and digits, and force the model first to understand these arbitrary characters and then predict an answer?

What if the letters were converted to Hebrew or Russian characters? The model should still work, since it doesn’t understand English any better than these languages. So if it works, it means that the model first has to understand the representation of a language (a hard task indeed) and then solve the original task. Similarly, using a different font, or color, or whatever should also work, and that means that the model must obtain very high-level knowledge about the world.

A simpler conversion idea, in my view, would be to map each feature value to a different color pixel in the image (and, if there is temporal data involved, use time as one of the image dimensions).

Also, the paper’s general writing level is pretty poor, with many typos (not that I write so well myself, but I expect some level from a published paper). They also cite, in 2019, outdated information from 2015-2016, for example the claim about XGBoost being the winning model for every structured competition on Kaggle, which was correct in 2016 but I don’t think is true in 2019.

And a positive ending - thanks (Will) for the links to the time series discussions on the forum - they are very enriching!


I read the paper. It is a fascinating idea if it really works. I can’t fully understand their learning mechanism. CNNs are designed for spatial relationships, but their generated images don’t have any relationship between pixels. Can anyone interpret this model? Many thanks.

A little update on this: I have developed a successful prototype that has equaled the tabular model’s accuracy. I will replicate the Adult dataset results to compare this to a benchmark.


I’m interested to know if anyone has managed to reproduce some of the paper’s results. I’ve been working on the Higgs ML challenge for a while now, and a score of 3.979 is amazing (if that is on the test set). I’m hoping to get some time to try out their method soon, so I will report back.

I agree with @yonatan365. There is no sense in converting numeric values into images and asking a CNN to convert them back into values. I have also looked at the time series image transformations suggested by @whamp, but I feel that, from the perspective of information theory, these transformations share the same problem as the one proposed in the SuperTML paper: they convert the original dataset into a highly inefficient expression of the same information using many more bits, just in order to create a 2D image. In Figure 1 of the SuperTML paper, a four-dimensional vector is converted into a 229 x 229 image matrix. And a Gramian field converts a time series of length n into an n x n image.
As Jeremy explains in Lesson 4, it makes more sense to train the network to learn the embeddings than to prescribe them.
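
To put rough numbers on the size comparison above, here is a small sketch. It assumes the tabular values are stored as 32-bit floats and the image as a single 8-bit channel (both are assumptions of mine, not stated in the paper), and it uses the standard summation form of the Gramian angular field, where each value is rescaled to [-1, 1], mapped to an angle with arccos, and the entry (i, j) is cos(phi_i + phi_j). The function names are hypothetical.

```python
import math


def expansion_ratio(n_values, bits_per_value, height, width,
                    channels=1, bits_per_pixel=8):
    """Ratio of bits in the image to bits in the original feature vector."""
    image_bits = height * width * channels * bits_per_pixel
    return image_bits / (n_values * bits_per_value)


def gramian_angular_field(series):
    """Summation Gramian angular field: a length-n series becomes n x n."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant series
    scaled = [2 * (x - lo) / span - 1 for x in series]
    # Clamp before arccos to guard against floating-point drift outside [-1, 1].
    phi = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]


# Four 32-bit features rendered as a 229 x 229 single-channel 8-bit image.
print(expansion_ratio(4, 32, 229, 229))  # 3277.5625

# A series of length n turns into n * n values.
field = gramian_angular_field([0.0, 1.0, 2.0, 3.0, 4.0])
print(len(field), len(field[0]))  # 5 5
```

With three colour channels the first ratio would be three times larger, so the exact figure depends on those storage assumptions, but the order of magnitude is the point.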

The only advantage I can think of is that the benefit of transfer learning outweighs the cost of the inefficient embedding. However, transfer learning can only bring a benefit if the original problem was similar, and in their examples there seems to be such a gap between ImageNet and the tabular data that I don’t see how they manage to get such good results.

I’ve nonetheless applied their method to the Higgs ML dataset (using ResNet34 rather than SE-net) and found that whilst it does appear to be learning something, the final score is around 2.8, much worse than a traditional ML approach and far from their 3.979.

Additionally, I tried some of the interpretation techniques taught in lesson 6 of V3 of the DL course, as I assumed they would highlight the features being used. However, they seem to mostly highlight the blank spaces of the image.

The code is here if anyone’s interested or has suggestions. Formatting the data and evaluating the predictions requires an external package to deal with the physics, but it’s available from pip.


Converting the numeric and categorical values of a tabular dataset to images is very inefficient (it took me 4 hours to create 1M images). The Adult dataset took me 1h to generate… but it definitely works, as I managed to match average tabular models without working on optimizing the image generation component.


Nice that you got it to work! Do you have a link to your code?

Apologies for the double-post. I’ve been trying out encoding the data as solid grey-scale blocks by feeding normalised and standardised values through a sigmoid and multiplying the result by 255. I figured this would provide direct access to the numerical information rather than an abstract representation of it. Image creation turns out to be ~3 times quicker, and the resulting dataset is about three quarters of the file size. The model shows mild improvement during training, and the final score also improves considerably (2.83 -> 3.36), but that’s still worse than a single 4-layer FCNN trained directly on the tabular data.
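
For anyone wanting to try the grey-scale block encoding described above, a minimal sketch of the idea might look like the following. It is not the code from the post: the helper names, the 8-pixel block size and the choice to lay the blocks out as a single horizontal strip are all assumptions for illustration.

```python
import math


def standardise(column):
    """Zero-mean, unit-variance version of one feature column."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var) or 1.0  # constant column: avoid dividing by zero
    return [(x - mean) / std for x in column]


def to_grey(z):
    """Squash a standardised value through a sigmoid and scale to 0..255."""
    z = max(-50.0, min(50.0, z))  # clamp so math.exp cannot overflow
    return round(255 / (1 + math.exp(-z)))


def encode_row(z_values, block=8):
    """Render one standardised sample as a strip of solid grey squares.

    Returns a list of `block` pixel rows, each block * len(z_values) wide.
    The strip layout is an assumption; any non-overlapping tiling would do.
    """
    levels = [to_grey(z) for z in z_values]
    pixel_row = [level for level in levels for _ in range(block)]
    return [list(pixel_row) for _ in range(block)]


# Example: standardise one feature column, then encode three values as one strip.
z = standardise([1.0, 2.0, 3.0])
image = encode_row(z, block=8)
print(len(image), len(image[0]))  # 8 24
```

The resulting nested list can be written out with any image library; each feature's grey level is a direct, monotonic function of its value, which is the point of this encoding.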

Additionally, I tried data augmentation during training by slightly adjusting the brightness and contrast, but didn’t find significant improvement.

Pixel encoding:
Model trained on new images:
