Do you have any recommendations for a GPU that is good value for money at the moment? Is the 24GB of memory on an RTX 3090 Ti really necessary if you use techniques like gradient accumulation?
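For context on the question: gradient accumulation splits one large effective batch into several small ones, trading compute time for memory so a cheaper card can go further. A minimal PyTorch sketch of the idea (the toy model and data below are illustrative, not from the lecture):

```python
import torch
import torch.nn as nn

# Toy setup; substitute your real model, loss, and dataloader.
model = nn.Linear(10, 2)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batches = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(16)]

accum_steps = 4  # effective batch size = 8 * 4 = 32

optimizer.zero_grad()
for step, (xb, yb) in enumerate(batches):
    loss = loss_fn(model(xb), yb)
    # Scale the loss so the accumulated gradient matches one big batch.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()       # update only every accum_steps mini-batches
        optimizer.zero_grad()
```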
If you have a well-functioning but large model, do you think it can make sense to train a smaller model to produce the same final activations as the large model?
Yeah you want to look into knowledge distillation. It’s a very useful and common technique for making models smaller and faster, especially for practical inference applications.
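For a concrete picture, here is a minimal distillation sketch in PyTorch, assuming a trained `teacher` and a smaller `student` (the toy models and hyperparameters below are illustrative): the student is trained to match the teacher's softened output distribution via a KL term, mixed with the usual cross-entropy loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins; in practice the teacher is the big trained model.
teacher = nn.Linear(10, 5)
student = nn.Linear(10, 5)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

T, alpha = 2.0, 0.5  # softmax temperature and distillation weight

for xb, yb in [(torch.randn(8, 10), torch.randint(0, 5, (8,))) for _ in range(16)]:
    with torch.no_grad():
        teacher_logits = teacher(xb)  # teacher is frozen
    student_logits = student(xb)
    # KL between temperature-softened distributions, scaled by T^2
    # as in Hinton et al.'s distillation formulation.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * T * T
    ce = F.cross_entropy(student_logits, yb)  # normal supervised loss
    loss = alpha * kd + (1 - alpha) * ce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```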
Question:
Would it be better to split the data by disease instead of splitting it randomly? Would that give a better split? How would we do something like that?
Ah, I see what you are saying.
I don’t think this would be wise.
You want the model to get to know all the diseases you care about.
On another note, if we had a plant_id, i.e. some sort of identifier for a single plant, then it would be very wise to split on that: NOT having the same plants in both training and validation.
That would ensure your model learns what the disease looks like rather than what a specific plant looks like.
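To answer the "how would we do something like that" part: a group-aware split keeps all photos of one plant on the same side. A sketch using scikit-learn's GroupShuffleSplit, assuming a hypothetical table with plant_id and disease columns:

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy data; assume your real table has an image path, a plant_id, and a label.
df = pd.DataFrame({
    "image": [f"img_{i}.jpg" for i in range(12)],
    "plant_id": [i // 3 for i in range(12)],  # 3 photos per plant
    "disease": ["rust", "scab", "healthy"] * 4,
})

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, valid_idx = next(splitter.split(df, groups=df["plant_id"]))

train_df, valid_df = df.iloc[train_idx], df.iloc[valid_idx]
# No plant appears in both sets.
assert set(train_df["plant_id"]).isdisjoint(valid_df["plant_id"])
```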