Can I skip the validation set with infinite labeled data?

ZoVoS · July 9, 2019, 3:49pm

Quick question to make sure I understand what is going on here:

If as a premise I generated an infinite set of image batches one at a time (say 32-64 RGB images) each with labels to perform a regression on floats, after I tuned the network parameters for basic training, would it be fine to just feed a never ending stream of 100% training set data into the model?

As far as I can see, the validation set is only there for human comprehension: to check that the model we are training isn’t overfitting, and to produce unbiased metrics.

There would be no epochs in this model, as each training batch would be generated live, one at a time. Each would be pseudo-unique, and therefore it should be almost impossible to overfit my data (if enough consideration is put into the generation side).

This data would have 0 transforms applied apart from normalization on a per-batch basis, as the random crops, skews, angles, flips etc. are all taken into account when generating. Although generation is not prohibitively costly, generating extra data that does not contribute to training the model would definitely add unwanted overhead.

My gut instinct says this should be fine. However, is it maybe worth generating a validation set once every 20-30 batches to be sure that everything is running smoothly?

Side question for bonus points
If I generate an image of a face and, for example, wanted to regress a float to represent the face’s width, could I (after training is completed) input 5-10 faces and simply take the mean of the outputs, giving me the best chance of getting the number right? Again, intuition tells me this is fine.

kushaj · July 9, 2019, 4:20pm

Maybe. But if your model starts overfitting, then you would not be able to tell.

The most important step.

It can be true provided you have a large variety in your dataset. If you have only 1 unique image, then even after doing all the transforms you would not be able to generalize. As you start increasing the number of unique images, the gap starts decreasing, and possibly after a threshold it becomes 0.

If the 5-10 faces are transformations of the 1 face you want the output for, then you would also have to add code that adjusts the width value depending on the transform applied. Say you take an image as the reference and get the width value for it. If you now rotate that image and get a width value for the rotated version, then you would need to adjust this value so that it corresponds to your reference image. This becomes quite difficult to get right in some cases, so in theory it could work, but a lot of work would be needed.
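To make the averaging idea concrete, here is a minimal sketch. It assumes the model's errors on the different inputs are independent and zero-mean, and that every input shares the same true width (so no label adjustment is needed). `noisy_predict` is a hypothetical stand-in for calling the trained model, not a fastai function, and the numbers are made up for illustration.

```python
import random
import statistics


def noisy_predict(true_width, rng, noise_std=1.0):
    """Hypothetical stand-in for model(image): true value plus independent zero-mean error."""
    return true_width + rng.gauss(0.0, noise_std)


def averaged_prediction(true_width, n_views, rng):
    """Mean of n_views independent predictions for the same underlying face."""
    return statistics.fmean([noisy_predict(true_width, rng) for _ in range(n_views)])


def rmse(errors):
    """Root mean squared error of a list of signed errors."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


def compare(n_views=10, trials=2000, true_width=14.0, seed=0):
    """Return (RMSE of single predictions, RMSE of averaged predictions)."""
    rng = random.Random(seed)
    single = [noisy_predict(true_width, rng) - true_width for _ in range(trials)]
    averaged = [
        averaged_prediction(true_width, n_views, rng) - true_width
        for _ in range(trials)
    ]
    return rmse(single), rmse(averaged)


if __name__ == "__main__":
    single_rmse, averaged_rmse = compare()
    print(f"single: {single_rmse:.3f}  averaged over 10: {averaged_rmse:.3f}")
```

Under those assumptions the error of the mean shrinks roughly as 1/sqrt(n). A systematic bias in the model would not be averaged away, and if the inputs are transformed copies of one face, the label adjustment described above still applies.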

marii · July 9, 2019, 4:28pm

Generally, the only way to have infinite labeled data that I can think of off the top of my head is a GAN. One issue with your example is “pseudo-unique”, as we need to know what that means. Batch size would also have to be large because of the batch statistics (probably 64, depending on classes). We also tend to turn off features such as dropout in validation.

Still I would run a validation set against the model to get metrics for the human. It also gives you a metric for when to reasonably run your model against the test set, which should stay constant and it is how the final validity of the model is determined.

I imagine that without a validation set this would be much like validating a GAN. You wouldn’t know how it was doing until someone looked at the results, i.e. until you ran it against the test set.

I think it is sort of a moot point, because running a validation set every 1000 batches (or any larger number) would have VERY minimal performance overhead for the insight it would give into your model.
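A rough sketch of what that could look like: train on a never-ending stream of freshly generated batches, and every `validate_every` steps score the model on one fixed batch that was generated once up front. Everything here is an assumption for illustration: `generate_batch` is a hypothetical stand-in for the real image generator, and a two-parameter linear model replaces the network so the example stays self-contained.

```python
import random


def generate_batch(rng, batch_size=32):
    """Hypothetical generator: fresh (x, y) pairs with y = 3x + 0.5 + noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
    ys = [3.0 * x + 0.5 + rng.gauss(0.0, 0.05) for x in xs]
    return xs, ys


def mse(w, b, xs, ys):
    """Mean squared error of the linear model w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_stream(n_batches=2000, validate_every=1000, lr=0.1, seed=0):
    """Single-pass SGD on generated batches with periodic validation."""
    rng = random.Random(seed)
    # The validation batch is generated once and never trained on.
    val_xs, val_ys = generate_batch(random.Random(seed + 1), batch_size=256)
    w, b = 0.0, 0.0
    history = []
    for step in range(1, n_batches + 1):
        xs, ys = generate_batch(rng)  # each training batch is seen exactly once
        n = len(xs)
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
        if step % validate_every == 0:
            history.append((step, mse(w, b, val_xs, val_ys)))
    return w, b, history


if __name__ == "__main__":
    w, b, history = train_stream()
    for step, val_loss in history:
        print(f"step {step}: validation MSE {val_loss:.5f}")
```

Because the validation batch is fixed, its loss is comparable from one check to the next, and the only extra generation cost is that single batch.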

ZoVoS · July 9, 2019, 5:02pm

So for better background info, the input data is pictures of faces dynamically generated via randomized shape-key deformations. I have integrated FastAI directly into my Blender install so that I can render and train a network from there. There are NO TRANSFORMS applied to this data in the conventional sense; all of this data is a 100% true-to-life representation.

Textures are currently static. I have a few to cycle through, but I’m a dab hand with Substance Designer, so I will eventually just create a bunch of procedural face textures that are generated from a set of floats, much in the same way as I’m generating the random shape keys.

The background and lighting is also dynamically generated by using HDRI images and overlapping them to (much like in the section 2 videos of fuzzy label training) give an almost limitless set of backdrop and lighting conditions.

The camera position, angle, slide from origin and focal length are generated to give all possible permutations of the end user’s facial input. I haven’t taken into consideration multiple face inputs or “non face objects” yet, but I thought I’d just train a mini classifier for that at some point.

The reason the variance is so high is because I have a desire to generalise this from the artificial world to the real world.

I am in essence generating a limitless, full variation data set of faces, based on minimum and maximum bounds that a human face can express.

[Image: tempface]

It should be, by very definition, impossible to overfit my data, as the data never gets sent twice to the network; it will only ever see each image once. <— This is what I need to check, because this is just a scrap project outside the remit of what has been taught to us, to ensure that I understand the underlying fundamental concepts that are being taught.

All of my tests seem to show this works fine. I’m just wanting to make sure that the validation set is just for me. As I’m never going to bother looking at the results of training epochs after the initial setup and tweaking of hyperparams, I wanted to be sure there is nothing I’m missing when removing the code involving the validation set.

Second bonus question
Am I right in saying the purpose of interception of the network in lesson 7 using the vgg16_bn is because the loss function doesn’t adequately express the loss in fine feature detail for the pets, but that also means that in theory we can directly use the cnn kernel from the resnet to perform a loss on the kernels in respect to the generator network?

Without the use of GANs shouldn’t we be able to train an inverse convolution that simply estimates the output kernel feature space using the kernels as a direct loss?

EDIT: one last thing, I’m using a simple resnet50 with transfer learning from the ImageNet dataset, as the feature space is already encoded for faces. Also, the model is only designed to regress shape-keys, so it’s a far simpler process than the task first looks.

ZoVoS · July 9, 2019, 5:03pm

Thanks for the catch on pseudo vs sudo >_< too much linux!

marii · July 9, 2019, 5:15pm

Thank you for the example on how you were getting your limitless training data. This is actually a technique I have seen used in automated driving.

The technique you are using would have to adequately model a “real” scenario. As in same distribution of classes, and the features (what you are changing) that are available. I would actually be MORE interested in having a validation set in your case, to make sure the image generation technique was appropriately modeling reality.

“It should be, by very definition impossible to over fit my data as the data never gets sent twice to the network, it will only ever see each image once” -> this part is true, under this very specific scenario. Though the model’s effectiveness is completely dependent on your generation algorithm at this point.
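A toy illustration of that dependence, with everything in it assumed for the sake of the example: `synthetic_sample` is a hypothetical generator, `real_sample` is hypothetical "real" data that the generator models slightly wrong (a different offset), and a fitted line stands in for the network. The fit looks perfect on held-out generated data while still being off on the "real" data, which only a real validation set would reveal.

```python
import random


def synthetic_sample(rng, n):
    """Hypothetical generator: labels follow y = 3x + 0.5."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return xs, [3.0 * x + 0.5 for x in xs]


def real_sample(rng, n):
    """Hypothetical real data the generator models imperfectly: y = 3x + 0.8."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return xs, [3.0 * x + 0.8 for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = w*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, mean_y - w * mean_x


def mean_abs_error(w, b, xs, ys):
    """Mean absolute error of the fitted line on (xs, ys)."""
    return sum(abs(w * x + b - y) for x, y in zip(xs, ys)) / len(xs)


def sim_to_real_gap(seed=0):
    """Return (error on held-out generated data, error on 'real' data)."""
    rng = random.Random(seed)
    w, b = fit_line(*synthetic_sample(rng, 1000))
    synthetic_err = mean_abs_error(w, b, *synthetic_sample(rng, 200))
    real_err = mean_abs_error(w, b, *real_sample(rng, 200))
    return synthetic_err, real_err


if __name__ == "__main__":
    synthetic_err, real_err = sim_to_real_gap()
    print(f"generated hold-out error: {synthetic_err:.4f}, real error: {real_err:.4f}")
```

Never repeating an image rules out memorizing individual samples, but it says nothing about whether the generated distribution matches the real one.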

ZoVoS · July 9, 2019, 5:25pm

The problem is that to get a validation set of real images I would have to eyeball the outputs anyway. Any real image I would have to replicate in a 100-300 dimension feature space by moving shape-keys, which is more time-prohibitive than eyeballing a result, versus running a single image through at random intervals of the training step and having it email me a rendered image of the same shape-key variety for a human comparison.

In essence this is an experiment to test if I understand the process well enough to train and forget, then come back in a large amount of time and see if it works.

The gradients shrinking/exploding are a fear but apart from that Jeremy’s waterfall model showed the only real limitation to training is data which is why I designed this workflow.

I can see what you mean though and that does add a few extra considerations to my training process so thanks for your input

marii · July 9, 2019, 6:08pm

If you are in part 1, I wouldn’t worry too much about shrinking or exploding gradients. There is enough regularization, and the parameters are picked well enough, to make it not an issue in your case.

In essence, yes, your model should be able to train forever, as you only have 1 epoch. This is assuming that the features have infinite permutations, as you could also generate the same image over and over if the set were finite.

Also, since we are assuming the generation algorithm is absolute, let us say that it only generated a finite number of images, say 5, over and over. The training, validation and test sets, all generated by the same algorithm, would each be an endless stream of the same 5 images. Therefore we can say you have actually removed the idea of overfitting entirely from your model: if it can accurately detect the 5 images, it gets 100% accuracy in training, validation, and test. The model would never overfit, as there is just no difference between what is in the training, validation, and test sets. It is completely okay in this case for the “model” to be a pixel-by-pixel comparison of the images, and we would still accurately predict the validation and test sets. Such a pixel-by-pixel comparison would generally be considered a strong level of overfitting.
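This thought experiment can be sketched in a few lines. The setup is entirely hypothetical: five random "images" (tuples of 16 pixel values) with a float label each, and a `LookupModel` that does nothing but exact pixel-by-pixel matching. Since train, validation and test are all drawn from the same five images, pure memorization scores perfectly on all three.

```python
import random


def make_finite_generator(n_unique=5, seed=0):
    """Hypothetical generator that can only ever produce n_unique distinct images."""
    rng = random.Random(seed)
    pool = [
        (tuple(rng.random() for _ in range(16)), rng.random())  # (pixels, label)
        for _ in range(n_unique)
    ]

    def sample(k, sample_rng):
        return [sample_rng.choice(pool) for _ in range(k)]

    return sample


class LookupModel:
    """Pure memorization: exact pixel-by-pixel match against stored images."""

    def __init__(self):
        self.table = {}

    def fit(self, data):
        for image, label in data:
            self.table[image] = label

    def predict(self, image):
        # Unseen images fall back to 0.0; that never happens in this setup.
        return self.table.get(image, 0.0)


def mean_abs_error(model, data):
    """Mean absolute error of the model's predictions on (image, label) pairs."""
    return sum(abs(model.predict(image) - label) for image, label in data) / len(data)


def run(seed=0):
    """Return the (train, validation, test) errors of the memorizing model."""
    sample = make_finite_generator(seed=seed)
    rng = random.Random(seed + 1)
    train, val, test = sample(200, rng), sample(50, rng), sample(50, rng)
    model = LookupModel()
    model.fit(train)
    return tuple(mean_abs_error(model, split) for split in (train, val, test))


if __name__ == "__main__":
    print(run())
```

The same lookup table would fail on any image outside the pool, which is why this only counts as "not overfitting" while the generator really is the whole world the model will ever see.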

ZoVoS · July 9, 2019, 7:22pm

Great that’s exactly what I was hoping to hear, I did finish all the videos but I did crush them into a week or so. This makes me a little nervous that all the self evident things in my head are actually wrong and I am fooling myself to think I understand it, but that’s just general anxiety I suppose, I don’t think anything will work until it does!

Thanks again for the confirmations, I’m not good at asking other people if my assumptions are right and would rather just smash my head on the problem until I can confirm it from 100 different angles. But the lessons say get involved and ask questions on the forum, and these have been the most informative and concise examples I have ever seen on practical deep learning so the guys are doing something right, and it’s in my interests to take their advice.

kushaj · July 10, 2019, 12:09pm

I have just started getting into automated robotics/cars, so I don’t have much knowledge in this field. Can you please guide me on what I should focus on? Currently, I am using Coursera for various courses related to this. Also, can you explain how you are using the above technique in automated driving?

marii · July 10, 2019, 7:50pm

“Seen used”: I went to a conference (GTC) and multiple groups were presenting research on the topic. I had a general interest, though I am not particularly working with automated cars. I was mostly trying to get an idea of what they were doing in the space. It seems that, to a certain extent, 3D simulation is very important because real-world tests are fairly expensive and much slower than what can be done by simulating the environment. You can also apply various “augmentations” to the environment, allowing for more varied training data.

I am not working within that field, or plan to in the near future, so I can’t really judge how impactful this research was, but it seemed very useful in the way that they were framing it. I mostly have an interest in possibly doing it as a hobby after I get my first deep/machine learning job.

kushaj · July 11, 2019, 6:17am

Ok thanks.