An Attempt to Find the Right Hidden Layer Size for Your Tabular Learner

Hi all! I decided to make this post as there is not much information on this here on the forums (or really anywhere), and I wanted to share what I found in the hope that it may help improve your tabular projects.

While exploring various tabular-related problems, I began to wonder what the right hidden layer size really was. In the ADULT notebook, Jeremy uses [200,100] without much explanation as to why; it just works. Hoping to find a more equation-esque answer, I found this Stack Overflow post:

Within it is found this formula:
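Written out from the variable definitions in this post and from what the function further down computes (so treat this as a reconstruction, not a quote), the formula reads:

$$
N_h = \frac{N_s}{\alpha \cdot (N_i + N_o)}
$$

Here N_h is the total number of hidden neurons that the formula suggests.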

As the guide suggests, Ni is the number of input variables, No is the number of outputs, Ns is the training set size, and a is the alpha value. From here, I usually pass to layers an array of this output divided by 2, splitting the total hidden amount into two separate layers. Here is my function for doing so:

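def calcHiddenLayer(data, alpha, numHiddenLayers):
  # data: tabular databunch, alpha: the alpha value, numHiddenLayers: how many layers to split across
  tempData = data.train_ds
  # i = number of entries in the input classes, o = number of output classes
  i, o = len(tempData.x.classes), len(tempData.y.classes)
  io = i+o
  # total hidden size Nh = Ns / (alpha * (Ni + No)); int() keeps it a whole number if alpha is a float
  nh = int(len(tempData)//(alpha*io))
  return [nh//numHiddenLayers]*numHiddenLayers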

For example, with an alpha value of 3 on the ADULT dataset, I achieved an accuracy of 84.5-85% within 5 epochs. If anyone has questions about this, please do not hesitate to ask. I have had success with this method and wanted to share it with others. This is absolutely not a be-all and end-all answer to what the best hidden layer sizes actually are, but it is what I have found in my research.
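If you want to check the arithmetic without a databunch, here is a minimal plain-Python sketch of the same calculation. The function name `hidden_layer_sizes` and the numbers passed in are made up for illustration; they are not the ADULT dataset's real counts.

```python
def hidden_layer_sizes(n_train, n_in, n_out, alpha, n_layers):
    """Split Nh = Ns / (alpha * (Ni + No)) evenly across n_layers."""
    if alpha <= 0 or n_layers <= 0:
        raise ValueError("alpha and n_layers must be positive")
    # Total hidden units suggested by the formula, floored to an int.
    n_hidden = int(n_train // (alpha * (n_in + n_out)))
    # Same width for every layer, as in calcHiddenLayer.
    return [n_hidden // n_layers] * n_layers


# Hypothetical numbers: 10,000 training rows, 8 inputs, 2 outputs.
sizes = hidden_layer_sizes(n_train=10_000, n_in=8, n_out=2, alpha=3, n_layers=2)
print(sizes)
```

A smaller alpha gives wider layers and a larger alpha gives narrower ones, so alpha is the knob to try first.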

If you wish to see the notebook, it is found here:

Thank you for reading!


Edit: I simplified the above function so that you only need to pass in the databunch, the alpha, and the number of layers.


Zack, so far as I can tell you are the king of the tabular hill. A few days ago, I saw a video interview with you. I hope I can persuade you to give my problem a look. I’ve been working on a very large CSV of Alzheimer’s data since last July, trying to get a good learn.predict for the age of onset of dementia. I’m a newbie, but I made a little progress and wanted to show it to some other people, so I installed fastai on my laptop.

I had so much trouble with accuracy that I went back to the original lesson4-tabular (the adult.csv).
Now I can’t get that to run correctly either. The learn.predict(row) gives a wrong category and the tensor results are backward. On both my machines.

Backward means learn.predict(row) predicts the wrong category. The tensor values are reversed:
(Category <50k, tensor(0), tensor([0.5398, 0.4602]))

The original example notebook in DL1 has a correct example:
(Category >=50k, tensor(1), tensor([0.4402, 0.5598]))

Both runs are for row 0.
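For anyone reading along, here is a small plain-Python sketch of how the three parts of those tuples relate to each other. The class order `["<50k", ">=50k"]` is an assumption inferred from the index/category pairs printed above, and `read_prediction` is a hypothetical helper, not a fastai function.

```python
# Assumed class order, inferred from tensor(0) <-> <50k and tensor(1) <-> >=50k above.
classes = ["<50k", ">=50k"]


def read_prediction(probs, classes):
    """Return (class name, class index, probability) for the most likely class."""
    idx = max(range(len(probs)), key=lambda k: probs[k])
    return classes[idx], idx, probs[idx]


first = read_prediction([0.5398, 0.4602], classes)
second = read_prediction([0.4402, 0.5598], classes)
print(first)
print(second)
```

In both tuples the category, the index, and the larger probability line up with each other, so each output is internally consistent; the two runs simply land on different sides of 0.5 for this row.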

I found your post about the HiddenLayer size. Thanks for explaining and providing the USEFUL (extremely) notebook and code. It gave me another chance at the model, so I downloaded and ran it without change. My results are backward. I got this running your notebook.
(Category <50k, tensor(0), tensor([0.5236, 0.4764]))

Your posted notebook shows results in line with the original example notebook:
(Category >=50k, tensor(1), tensor([0.4402, 0.5598]))

I have two environments and both give me this backward tensor.

Both are Win10 with freshly installed Anaconda 3.7 and fastai. One has a GPU; the other has a CPU only. Everything else seems to work normally. No problem with other notebooks.

I mostly work on the gpu. I wanted to share, so added Anaconda and fastai to my laptop. That’s when I noticed the problem. My assumption is that I’ve gotten something wrong with the environment. Twice. If that is so, however, it should be a glaring error.

I have some analytic chops from dinosaur times. I did problem determination on mainframe software. But I’m new to DL. Saw my first line of Python last year.

I sometimes have to bang my head against a wall until the scales are knocked from my eyes, but I’m baffled. I’m beginning to suspect something isn’t right in the tabular folder, but the code is beyond my skill level.

I’m in this because I am determined to take a hard look at medical data on Alzheimer’s disease. A metaphorical Niagara falls of money pours into Alz research, but instead of forming a river of results it disappears into a metaphorical black hole: tables of records that don’t seem to show anything.

Please pity the dinosaur and give me a clue.

Mike Hawkins


Hi @mike00! There could be a number of reasons why it’s not stacking up. A better way to judge how it’s doing would be to run it on a batch of predictions (say 50-100 entries) and see how it stacks up. You could also have been overfitting your data: how many epochs did you train for? How did the results look (validation and training loss)? Since you were predicting age, did you set it up as a regression problem, like Rossmann in lesson 6? That would certainly be fitting. Let me know :slight_smile:
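Here is a rough sketch of what I mean by checking a batch instead of a single row. `batch_accuracy` and `toy_predict` are hypothetical names, and the toy rule and rows are made up so the snippet runs on its own. With a real learner you could pass in something like `lambda row: str(learn.predict(row)[0])`, assuming predict returns the category first, as in the tuples you posted.

```python
def batch_accuracy(predict_fn, rows, labels):
    """Fraction of rows where predict_fn(row) matches the known label."""
    if not rows or len(rows) != len(labels):
        raise ValueError("rows and labels must be non-empty and the same length")
    hits = sum(1 for row, label in zip(rows, labels) if predict_fn(row) == label)
    return hits / len(rows)


# Toy stand-in for a trained model (assumption: not a real learner).
def toy_predict(row):
    return ">=50k" if row["hours"] > 45 else "<50k"


rows = [{"hours": 40}, {"hours": 50}, {"hours": 60}, {"hours": 30}]
labels = ["<50k", ">=50k", "<50k", "<50k"]
acc = batch_accuracy(toy_predict, rows, labels)
print(acc)
```

One wrong row out of 50-100 tells you much less than the overall fraction does.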


@muellerzr Thanks for getting back to me.

I experience the problem when only running one epoch. I had tried as many as 8, but saw over-fitting and rarely saw accuracy improve after 4 epochs. Based on your USEFUL link, I’ll see how it goes with 5.

It’s so obvious to run more predictions, I don’t know why I didn’t think of it. I reran the class lesson with 1 epoch and got 5 correct predictions in a row. I’ll run more, but that makes me think of another issue, reproducibility. I know I’m not going to get identical results, but it’s disheartening to get results that were so far from the mark.

I’ve got a long way to go. Thanks for the regression recommendation, too. The age range is from 15 to 110. I haven’t gotten past 54% accuracy. I am looking forward to implementing your hidden layer size finder. By the way, its layers are specified as [200,100], but it looks like your formula for Nh yields a single number. Can I just write layers=[Nh]?


I usually take whatever the formula gives and then either use 2-3 layers of that size, or halve it each time (usually this), so layers=[NH, NH/2, NH/4] (the /4 layer only sometimes).
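As a quick sketch of the halving option (the helper name `halving_layers` is made up, and 200 is just an example value for NH):

```python
def halving_layers(n_hidden, n_layers=3):
    """Build [NH, NH/2, NH/4, ...] with integer sizes, never going below 1."""
    return [max(1, n_hidden // (2 ** k)) for k in range(n_layers)]


three = halving_layers(200)     # NH, NH/2, NH/4
two = halving_layers(200, 2)    # drop the /4 layer
print(three, two)
```

Integer division is used because layer sizes have to be whole numbers.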

Hi muellerzr, hope you’re having an awesome day!
Thanks for an awesome post :smiley: Knowing how to calculate hidden layers has always been a dark art to me. Now I can at least bring a bit more clarity to the subject.

Many Thanks mrfabulous1 :smiley: :smiley:

If we use calcHiddenLayer, the hidden layer size will be roughly proportional to the size of the training dataset.

So as the dataset grows, the hidden layer size grows too. For tabular data, do you recommend larger hidden layers?
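To make the proportionality concrete, here is a small sketch with made-up numbers (8 inputs, 2 outputs, alpha of 3). `total_hidden_units` is a hypothetical helper that only mirrors the arithmetic in calcHiddenLayer before the split across layers.

```python
def total_hidden_units(n_train, n_in, n_out, alpha):
    """Nh = Ns / (alpha * (Ni + No)), floored to an int."""
    return int(n_train // (alpha * (n_in + n_out)))


# Hypothetical training set sizes, each 10x the previous one.
growth = {
    n: total_hidden_units(n, n_in=8, n_out=2, alpha=3)
    for n in (1_000, 10_000, 100_000, 1_000_000)
}
for n_train, n_hidden in growth.items():
    print(f"{n_train:>9} rows -> {n_hidden} hidden units")
```

Every 10x increase in rows gives roughly 10x the hidden units unless alpha is raised to compensate.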