Thought process for designing my own network architecture

Models like ResNet perform very well on image classification tasks. If I want to build a model from scratch instead of using transfer learning, how do I decide which layers to use and where to place them? I understand this can be done by trial and error. Is there a more principled way to design models that perform well?