Hi, let’s say you want to design a neural network. You know the number of input neurons, the application is general-purpose image classification, the activation function is sigmoid, and the loss function is cross-entropy. With all of these specifications, how many hidden layers would be needed, as a rule of thumb?

An image or drawing would be helpful to all of us here.

Although I’m not an instructor, I think I have an answer that, while it may not be definitive, should help guide you.

The number of hidden layers helps determine how many parameters there are in your model to train: the more hidden layers you have, the more parameters (weights) there are to train as you go through the network.
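To make that concrete, here is a minimal sketch of how parameter count grows with depth in a fully connected network. The layer sizes are hypothetical (784 inputs, as for a 28×28 grayscale image, and 10 output classes), chosen just to illustrate the counting.

```python
def count_params(layer_sizes):
    """Total trainable parameters in a fully connected network:
    one weight matrix plus one bias vector per pair of adjacent layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weights + biases
    return total

# 784 inputs, 10 output classes, hidden layers of width 128 (all hypothetical)
print(count_params([784, 10]))            # no hidden layer: 7,850
print(count_params([784, 128, 10]))       # one hidden layer: 101,770
print(count_params([784, 128, 128, 10]))  # two hidden layers: 118,282
```

Note that the first hidden layer adds the most parameters here, since it connects to the wide input; each extra 128-unit layer after that adds a comparatively modest 128×128 + 128 weights.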

That being said, the number of layers needs to be enough to capture the underlying complexity of the problem. I would say that for general image classification you need at least 1 hidden layer (otherwise you can’t even compute XOR) to learn meaningful things about images, but there is no single correct number.
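The XOR point can be shown directly: XOR is not linearly separable, so a network with no hidden layer cannot compute it, but a single hidden layer with two units can. The weights below are hand-picked for illustration (using a step activation for clarity rather than sigmoid).

```python
def step(x):
    """Hard threshold activation: 1 if the pre-activation is positive."""
    return 1 if x > 0 else 0

def xor_net(a, b):
    # Hidden layer: one unit computes OR, the other computes AND
    h_or = step(a + b - 0.5)    # fires when a OR b
    h_and = step(a + b - 1.5)   # fires only when a AND b
    # Output: "OR but not AND" is exactly XOR
    return step(h_or - h_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))  # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0
```

With no hidden layer, the output would be a single threshold on a weighted sum of `a` and `b`, i.e. one line through the input plane, and no line separates {(0,1), (1,0)} from {(0,0), (1,1)}.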

Also, the activation function and loss function shouldn’t affect the number of hidden layers you need.

Look into the EfficientNet paper. They start with a base model, B0, and then use a compound-scaling rule of thumb to increase the capacity of their model by increasing depth, width, and input image size all at the same time. They end up with models B1-B7 that are each deeper, wider, and take larger input than the preceding model. They also get better accuracy as the models get bigger.
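A minimal sketch of that compound-scaling rule: a single coefficient phi scales depth, width, and resolution together. The constants alpha, beta, and gamma below are the ones reported in the EfficientNet paper (found by grid search on the B0 baseline); the exact per-model configurations of B1-B7 are not reproduced here.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases from the paper

def compound_scale(phi):
    """Return the (depth, width, resolution) multipliers for scaling
    coefficient phi, relative to the B0 baseline."""
    depth_mult = ALPHA ** phi   # multiplier on number of layers
    width_mult = BETA ** phi    # multiplier on channels per layer
    res_mult = GAMMA ** phi     # multiplier on input image resolution
    return depth_mult, width_mult, res_mult

# Per unit of phi, FLOPs grow roughly by alpha * beta^2 * gamma^2,
# which the paper constrains to be about 2 (i.e. each step ~doubles compute).
print(ALPHA * BETA**2 * GAMMA**2)  # ~1.92

for phi in range(4):
    print(phi, compound_scale(phi))
```

The reason width and resolution enter the FLOPs estimate squared is that convolution cost scales with both input and output channel counts and with both spatial dimensions of the feature maps.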

Is B7 better than B0? In terms of accuracy on ImageNet, yes. But B0 trains far faster than B7.