Incredible paper. I made a DataLoader but I can’t figure out the cross_entropy part of the loss function, since y is continuous. fastai’s mixup has a stack_y attribute, which might be the better approach.
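For what it’s worth, here is the standard way to handle a continuous (soft) target with cross-entropy — this is just the generic soft-label formulation, not the fastai stack_y mechanism, and the function name is my own:

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(pred, soft_targets):
    """Cross-entropy against continuous (soft) targets.

    pred:         raw logits, shape (N, C)
    soft_targets: per-class probabilities, rows summing to 1, shape (N, C)
    """
    return -(soft_targets * F.log_softmax(pred, dim=1)).sum(dim=1).mean()

# Example: a mixup-style target that blends two classes
pred = torch.randn(4, 10)
y = torch.zeros(4, 10)
y[:, 3] = 0.7   # 70% weight on class 3
y[:, 5] = 0.3   # 30% weight on class 5
loss = soft_cross_entropy(pred, y)
```

When the targets happen to be one-hot, this reduces exactly to the usual `F.cross_entropy` with integer labels.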
I find it fascinating that a network can be pruned so significantly and still produce the same results. It suggests that starting with large networks is very wasteful. The extra capacity helps with faster convergence because there is more memory available to model the space, but pruning shows that the resulting network is far from optimal.
Here is a thought.
Why don’t we start with smaller networks that, through annealing, grow and shrink repeatedly until we see no additional benefit during training? Has anyone seen any experiments using this approach?
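To make the idea concrete, here is a rough sketch of what one grow/shrink step could look like on a weight mask — entirely hypothetical, not from the paper; regrown weights restart at zero, as in some sparse-training methods:

```python
import torch
import torch.nn as nn

# Hypothetical grow/shrink schedule (not from the paper): alternately prune
# low-magnitude weights and re-enable a fraction of pruned ones, stopping
# once validation loss no longer improves.
def grow_shrink_step(weight, mask, prune_frac=0.2, grow_frac=0.1):
    # Shrink: zero out the smallest-magnitude currently-active weights.
    active = weight[mask.bool()].abs()
    k = int(prune_frac * active.numel())
    if k > 0:
        thresh = active.kthvalue(k).values
        mask[(weight.abs() <= thresh) & mask.bool()] = 0.0
    # Grow: randomly re-enable a fraction of currently pruned weights.
    pruned = (mask == 0).nonzero(as_tuple=False)
    n_grow = int(grow_frac * pruned.size(0))
    if n_grow > 0:
        idx = pruned[torch.randperm(pruned.size(0))[:n_grow]]
        mask[idx[:, 0], idx[:, 1]] = 1.0
    return mask

layer = nn.Linear(100, 100)
mask = torch.ones_like(layer.weight)
for _ in range(5):                       # stand-in for the annealing loop
    mask = grow_shrink_step(layer.weight.data, mask)
    layer.weight.data *= mask            # apply the sparsity pattern
```

In a real experiment the prune/grow fractions would presumably be annealed over training rather than held fixed.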
This is an interesting paper that gets at a few things I’ve been thinking about for a while. The authors first train a VAE model using a paired input-to-output framework. While doing so, the model learns priors related to classes in the dataset. After training, the priors can be used with just the decoder of the model to generate new samples from the training distribution.
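The decoder-only sampling step looks roughly like this — a minimal sketch using a plain VAE with a standard Gaussian prior, not the paper’s exact setup:

```python
import torch
import torch.nn as nn

# Minimal illustration: once training has shaped the latent space to match
# the prior, the encoder can be discarded and novel samples drawn from the
# prior alone, then pushed through the decoder.
decoder = nn.Sequential(
    nn.Linear(32, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Sigmoid(),   # e.g. 28x28 images, flattened
)

with torch.no_grad():
    z = torch.randn(16, 32)        # sample from the N(0, I) prior
    samples = decoder(z)           # 16 novel images, no image input needed
```

In the VQ-VAE case the prior is learned (e.g. an autoregressive model over discrete codes) rather than a fixed Gaussian, but the generation path is the same: sample from the prior, decode.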
In fastai we see two types of models that output images. There are the WGAN-type models used for bedroom generation and whatnot, and the paired input-to-output UNet models used for things like super resolution, deoldify, decrappify, etc. Generative models like WGAN are a pain to train, but the end model is able to generate new images from a latent vector. The UNet models get around the training issues by using paired training data and the NoGAN training technique, but they can’t be used to generate novel images because the model requires an image input.
I’ve previously tried to adapt NoGAN to generative models using an encoder/decoder architecture without skip connections. The idea was that during training I’d track mean/variance stats of the vector input to the decoder. To generate new samples, I would sample vectors from that distribution and pass them through the decoder. The results were pretty bad - I couldn’t get the sampling right, and the generated images didn’t look like anything.
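For anyone curious, the tracking/sampling idea above can be sketched like this (all names hypothetical; the statistics are per-dimension, which ignores correlations between latent dimensions and is likely part of why naive sampling fails):

```python
import torch

# Sketch: track per-dimension mean/std of the bottleneck vectors seen during
# training, then sample from a diagonal Gaussian with those statistics.
class LatentStats:
    def __init__(self, dim):
        self.n = 0
        self.sum = torch.zeros(dim)
        self.sumsq = torch.zeros(dim)

    def update(self, z):                 # z: (batch, dim) bottleneck vectors
        self.n += z.size(0)
        self.sum += z.sum(dim=0)
        self.sumsq += (z ** 2).sum(dim=0)

    def sample(self, n_samples):
        mean = self.sum / self.n
        var = self.sumsq / self.n - mean ** 2
        std = var.clamp(min=1e-8).sqrt()
        return mean + std * torch.randn(n_samples, mean.numel())

stats = LatentStats(dim=128)
for _ in range(10):                      # stand-in for a training loop
    stats.update(torch.randn(64, 128))   # would be encoder outputs in practice

z = stats.sample(16)                     # decoder(z) would give new images
```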
I think the techniques in the VQ-VAE paper for learning the correct prior would allow for adapting NoGAN training to generative models. The result would be generative models that are much, much easier to train.
Update:
Fixed link
I just watched https://m.youtube.com/watch?v=s7DqRZVvRiQ, a talk by the lottery ticket paper’s author, and found it really helpful before going into the details of the paper.
No, that URL links to a celebrity interview show. Don’t get me wrong, it’s still an interesting video! But @Kasianenko, could you please post the correct link? Thanks!
Nothing to implement here, but some things worth thinking about. I posted about it on the forum already (without much response), but now this paper was picked up in Andrew Ng’s weekly newsletter, so maybe some more people will think it’s worth a look:
I have tons of interesting papers that I want to go through, but I hesitate to post them because I don’t want to flood the thread.
I was wondering, however, if anyone else is interested in domain adaptation (supervised or unsupervised). I am currently focusing on this area, and for that reason I collected the latest CVPR papers on it that looked promising. I am going over them as we speak, but I would love to work with others on trying to implement some of their ideas.
Let me know if this sounds interesting to any of you and we can maybe do a working group!
p.s. If the category of domain adaptation is of interest to this wiki, let me know and I will post my review so far.
Sure!! Domain adaptation could be interesting to others, so please feel free to post the papers you liked most, preferably the most recent ones, and add your review as well. If they get many “likes” we will put them in the wiki.