Set of questions after Part 1 is completed

Hey there, I completed Part 1 a few weeks ago and it was an amazing course. I made notes for every lesson and wrote down questions whenever I had one. I didn't ask them here right away, and most of them were answered later in the course. After I watched lecture 7, I decided to spend some time learning from various articles. Some of my questions are still unanswered, though, and I would appreciate it if someone could help me with them.

  1. Now that there is SGD, do people still use regular GD?
  2. What is the difference between the various optimization algorithms? Which algorithm should I use?
  3. How similar are Kaggle competitions to the real-life problems that data scientists solve on a daily basis?
  4. How often do you create a new DL model instead of using a pretrained one?
  5. If almost any problem can be solved by a neural network, why do we need ML algorithms?
  6. If you create a new model, how do you decide what layers your model needs?
  7. How do you decide what layers should be frozen?

I have about half a year of experience and I have almost finished the first part, so I might be wrong sometimes.

  1. I don't think people use plain full-batch GD much anymore. You can test whether it works better, but in 99.99% of cases SGD converges faster (there is a small PyTorch sketch after this list that also touches points 2 and 6).
  2. Use Adam. It is the best default in most cases and that is all you need to know.
  3. I don't have a lot of knowledge about this, but I think the main difference is that in real life you probably need to build your own dataset.
  4. I always make a new one because I don't know very well how to use the existing ones. But I think for images and text you should use pretrained models.
  5. A neural network is an ML algorithm. If you mean things like linear regression and random forests, then the answer is: on simple problems they might sometimes get better results, but I think we will use them less and less in the future.
  6. Try to find similar code and use the same kind of layers. For example, if researchers in some paper built a cat classifier, you can start with the same number of layers. Often you have to choose your layer count somewhat arbitrarily, but I usually start with 2 hidden layers, each with as many nodes as the input. This seems to work well for most problems.
  7. I have not used freezing much, so I can't answer this one. I hope someone else can.
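Here is a rough, minimal PyTorch sketch of points 1, 2 and 6 (the data, layer sizes and learning rates are made up just for illustration, not a recommendation): full-batch GD versus mini-batch SGD is only a question of how much data goes into each optimizer step, switching to Adam is a one-line change, and the model follows the "2 hidden layers of input width" starting point.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data, purely for illustration.
X = torch.randn(1000, 20)
y = (X.sum(dim=1) > 0).long()
n_in, n_out = 20, 2

# Point 6: a simple starting architecture, 2 hidden layers of input width.
model = nn.Sequential(
    nn.Linear(n_in, n_in), nn.ReLU(),
    nn.Linear(n_in, n_in), nn.ReLU(),
    nn.Linear(n_in, n_out),
)

# Point 2: the optimizer is a one-line choice; Adam is a common default.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-1)

# Point 1: batch_size=len(X) would be full-batch GD (one step per epoch);
# a small batch_size gives mini-batch SGD, i.e. many cheap, noisy steps.
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

loss_fn = nn.CrossEntropyLoss()
for epoch in range(5):
    for xb, yb in loader:
        loss = loss_fn(model(xb), yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```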

Thanks for the answers!
Could anyone give me more information on questions 6 and 7?

To stay generic about model architecture: it is usually a compromise between the model's representational capacity, the complexity of the problem, the amount of data available, and the variability of that data.

Many things affect representational capacity, but the most important ones, at least in computer vision, are the model depth and the number of trainable parameters.
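As a quick way to get a feel for this, you can count the trainable parameters of a model directly. This is just a generic PyTorch sketch with made-up layer sizes, not tied to any particular architecture.

```python
import torch.nn as nn

def count_trainable(model: nn.Module) -> int:
    """Number of parameters the optimizer will actually update."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

shallow = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
deeper = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
print(count_trainable(shallow))  # 7,850
print(count_trainable(deeper))   # 269,322
```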

Problem complexity is related to the complexity of the hierarchical relations between features needed to solve the problem. For example, it is easier to detect a yellow triangle on a white background than an irregular spiculated mass against a similarly fuzzy background. Relations between features can be spatial (better handled by convolutional layers) or non-spatial (better handled by fully connected layers).
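A small PyTorch sketch of that distinction (the tensor shapes here are arbitrary): a convolution looks at local spatial neighbourhoods and shares weights across positions, while a fully connected layer flattens the input and ignores where each value came from.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # one small RGB image

# Spatially related features: the convolution preserves the 2D layout.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
print(conv(x).shape)           # torch.Size([1, 8, 32, 32])

# Non-spatial relations: the fully connected layer works on a flat vector.
fc = nn.Linear(3 * 32 * 32, 8)
print(fc(x.flatten(1)).shape)  # torch.Size([1, 8])
```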

The amount of available data also helps you decide how many trainable parameters you need, which in turn helps you decide how many layers to freeze in a pretrained deep network. Trying to train too many network parameters for the amount of data you have will likely overfit very rapidly. The challenge is to find the optimum.
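For question 7 in practice, freezing usually just means turning off gradients for the layers you want to keep. Here is a minimal sketch assuming a recent torchvision; the ResNet-34 backbone and the 2-class head are only an example, not a specific recommendation.

```python
import torch
from torchvision import models

# Load a pretrained backbone and freeze all of its parameters.
model = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the head with a new, trainable layer for a 2-class problem.
model.fc = torch.nn.Linear(model.fc.in_features, 2)

# The optimizer only needs the parameters that are still trainable.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
# With more data, unfreeze some of the later layers and train them too,
# typically with a lower learning rate than the new head.
```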

The variability of the data also influences the number of parameters that you need and that you are actually able to train. Higher variability will likely require more parameters to capture a valid statistical representation of the data, and consequently more data to properly train those parameters. Lower variability will need fewer parameters and most probably less data.

I tried to stay generic in this answer, because 6 and 7 can have more specialized answers depending on the type of problem (NLP, CV, etc.).

I hope it helps.
