Research collaboration opportunity with Leslie Smith


(Leslie N. Smith) #1

Hello everyone,

I am a researcher in the area of deep learning. Jeremy and I have been discussing various topics over the past several months. In our most recent conversation, we came up with some interesting ideas on what might work and how well. We would like a curious and enterprising individual or team to empirically test and compare our ideas. Your reward for your effort would be co-authoring a publication with us, if all goes well.

Here is how I described my idea to Jeremy: “In addition, I can suggest a new idea for faster training. You mentioned progressive resizing. I am toying with a different approach. What if one trained in stages where the first stage includes only one image per class for a couple of hundred iterations/epochs, the second stage included 10 images per class, third stage 100 images, and finally all the images. This should be very fast so the question is does it perform well. One can even choose the first image to be iconic for the class so the network learns a good initial set of weights to initialize the next stage’s weights.”
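For anyone who wants a starting point, here is a minimal sketch of how the staged subsets could be built. It is only an illustration under assumptions that are not in the post: the function name `staged_subsets` is hypothetical, images are picked at random rather than being "iconic", and the stages are nested so that each stage contains the previous one.

```python
import random
from collections import defaultdict


def staged_subsets(labels, per_class_stages=(1, 10, 100, None), seed=0):
    """Return one sorted list of sample indices per training stage.

    labels: sequence of class labels, one per sample.
    per_class_stages: images per class in each stage; None means "all images".
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Shuffle once per class so that a longer prefix is always a superset
    # of a shorter one (assumption: random picks, not hand-chosen iconic images).
    for indices in by_class.values():
        rng.shuffle(indices)
    stages = []
    for k in per_class_stages:
        subset = []
        for label in sorted(by_class):
            indices = by_class[label]
            subset.extend(indices if k is None else indices[:k])
        stages.append(sorted(subset))
    return stages


# Toy example: 3 classes with 200 samples each (made-up labels, not a real dataset).
toy_labels = [i % 3 for i in range(600)]
stages = staged_subsets(toy_labels)
stage_sizes = [len(s) for s in stages]
print(stage_sizes)
```

Each list of indices could then be used to build the training subset for its stage, with the weights from one stage carried over to initialize the next.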

Jeremy replied with the following idea: “So I guess another alternative would be: in the first stage just include a small number of very different classes (e.g. one type of fish, one type of plant, one type of vehicle). Then gradually add more classes, and towards the end of training add more similar classes (e.g. different breeds of dog). My intuition is that the latter approach might be more successful, especially when trying to train with large learning rates. But I’d be interested to see!”
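One possible way to turn this into a class schedule is sketched below. It assumes the classes have already been sorted into coarse groups by hand (the `toy_groups` dictionary and the function name `class_curriculum` are hypothetical). The first stage takes one class from each group, so the classes are very different; each later stage adds one more class per group, so similar classes arrive towards the end.

```python
from itertools import zip_longest


def class_curriculum(groups):
    """Return the cumulative list of active classes for each stage.

    groups: dict mapping a coarse group name to a list of fine-grained classes.
    """
    active = []
    stages = []
    # zip_longest walks the groups in parallel: the first "layer" holds the
    # first class of every group, the second layer the second class, and so on.
    for layer in zip_longest(*groups.values()):
        active.extend(c for c in layer if c is not None)
        stages.append(list(active))
    return stages


# Made-up grouping, purely for illustration.
toy_groups = {
    "fish": ["goldfish"],
    "vehicle": ["truck", "car"],
    "dog": ["beagle", "poodle", "terrier"],
}
curriculum = class_curriculum(toy_groups)
for stage_number, classes in enumerate(curriculum, start=1):
    print(stage_number, classes)
```

How to group the classes, and how many to add per stage, are open choices that the experiments would have to explore.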

Ah, the art of science. We have two different hypotheses on ways to potentially speed up training. Which one is right? Or neither? Or both? Experiments must be run!

I believe that trying this on ImageNet will be definitive since it is so computationally intensive, but my approach is to always start as simple as possible to get the bugs out. So start with a small dataset (e.g., MNIST, CIFAR-10, …) and, when the process is working, try it on ImageNet.

I will be checking on replies to this post periodically but infrequently. If you need to contact me directly, my email address is leslie.smith@nrl.navy.mil.

As a postscript, Jeremy and I were also discussing that it would be interesting to have a study that compares transfer learning with various initialization methods (e.g., Gaussian, msra, LSUV, etc.) to determine whether or not one should always start training by transfer learning. It would also be illuminating to compare which source datasets are best for transfer learning for which target datasets.

I am also thinking of a new initialization that is similar to LSUV (see https://arxiv.org/abs/1511.06422) but, instead of decomposing Gaussian noise into an orthonormal basis, uses an orthonormal basis such as Gabor filters, as indicated by papers such as https://arxiv.org/abs/1411.1792.

One more thing - you might want to get organized among yourselves and split up the work. There is plenty to do and this can be a team effort. Enjoy and best of luck to you on this exploration and adventure!

Best,
Leslie


#2

this is truly amazing :slight_smile:

I’ll have to rearrange what I am working on :slight_smile: I should have a docker container with CIFAR10 (thanks @hamelsmu :wink: ) ready soon-ish that others will be able to use along with preliminary results on this:

Oh, this will be fun! :slight_smile:


(Cedric Chee) #3

Just in case anyone missed it, there’s an interview with Leslie N. Smith, PhD, Senior Research Scientist at the US Naval Research Laboratory, by @reshama. We can learn more about him in that interview as well.


(Vishal Pandey) #4

It would be really amazing and fun to experiment with all these ideas…!!


(Wayne Nixalo) #5

This sounds very interesting. I’ll get started with CIFAR-10. @radek, that Docker idea sounds like a good way to replicate experiments.


(Mohammad Saad) #6

I’ll also take a crack at it!


(Reza Sohrabi) #7

Very interesting! It would be weird if no one has addressed this yet to some extent.


(Abi Komma) #8

I’d be super interested to contribute on this task as well.

To get a bit organized, should we start a spreadsheet (like this: https://goo.gl/Bk9An8) and list all the potential tasks/work-streams that we need to experiment with, so volunteers can put their names beside a task and update their progress?

I think this effort will be useful to compete in general kaggle competitions as well. I have been struggling to get organized with a good framework for running multiple experiments and keeping track when trying things for kaggle.

Open to other ideas for streamlining within the group.


(deep narain singh) #9

I am interested in getting involved in this. Are there any plans for collaboration and distribution of tasks?

Thanks,
Deep


(nirant) #10

I am working on this idea from Jeremy using CIFAR-10, then CIFAR-100:

“So I guess another alternative would be: in the first stage just include a small number of very different classes (e.g. one type of fish, one type of plant, one type of vehicle). Then gradually add more classes, and towards the end of training add more similar classes (e.g. different breeds of dog). My intuition is that the latter approach might be more successful, especially when trying to train with large learning rates. But I’d be interested to see!”

If you are interested in collaborating, please reply or @ mention me :slight_smile:


(Vitaly Bushaev) #11

Very interesting ideas. With no real justification, I intuitively like Jeremy’s idea better. I would love to take part in this project as well! What baseline results would we want to compare our results to?


(Kevin Bird) #12

When you mention progressive resizing, do you mean starting at sz=32, then 64, 128, etc.? I have used this approach and it is very helpful in getting you going in the correct direction without taking a really long time. I’m wondering if there could be a further improvement that takes the original image size and generates the resized versions automatically, so the model would take whatever your original images are and then use the higher-quality images less frequently or at a later step, so that less training time is spent on the slow, higher-quality images. I really like the idea you are proposing as well, Leslie. It seems similar to me in that it will start by pointing your model in the correct direction, but will go faster.
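As a rough sketch of the automatic schedule described above: the helper below doubles the image size from a starting value until it reaches the original size and splits a fixed epoch budget across the sizes, giving any leftover epochs to the smallest (fastest) sizes. The function name `resize_schedule`, the doubling rule, and the even split are assumptions for illustration, not something taken from fastai or from the thread.

```python
def resize_schedule(full_size, start_size=32, total_epochs=30):
    """Return a list of (image_size, epochs) pairs, smallest size first."""
    sizes = []
    size = start_size
    # Double the size until the original resolution is reached.
    while size < full_size:
        sizes.append(size)
        size *= 2
    sizes.append(full_size)
    # Split the epochs evenly; leftover epochs go to the smallest sizes,
    # so the slow, full-resolution images never get more than their share.
    base, extra = divmod(total_epochs, len(sizes))
    return [(s, base + (1 if i < extra else 0)) for i, s in enumerate(sizes)]


schedule = resize_schedule(224)
print(schedule)
```

A schedule that spends even fewer epochs at full resolution would be a natural variant to test.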


(Michael) #13

Hello everybody,

I would also be interested in joining this project to learn and get new insights! :slight_smile:

Best regards
Michael


(Michael Skinner) #14

Count me in as well! @jeremy is there a way to “RSVP” for this collaboration?


(Iacopo Poli) #15

Hi all, here are a few pointers to related work for people interested in this project. Someone already pointed them out on Twitter, but I think it is useful to collect them here:

  • Curriculum Learning, Bengio et al., ICML 2009
  • DCNNs on a Diet: Sampling Strategies for Reducing the Training Set Size, Kabkab, Alavi, and Chellappa, 2016

The introduction of the second paper contains pointers to other relevant work. To find more, the keywords are exemplar selection (or equivalently active selection) and active learning.


(Iacopo Poli) #16

I’m interested. I’m a full-time R&D engineer in ML at a company in France, so I can’t offer full-time availability; I would be working on this in my free time.


(Rajat Gupta) #17

Hi @iacolippo and @nirantk, I would also love to collaborate and learn with you guys, so please keep me in the loop. I have started analysing the problem a bit and will share my progress once it gets to something good.

Thanks


(Leslie N. Smith) #18

Hi all,

I am completely overwhelmed and gratified by the response.

It sure looks like this will be a team effort. I’d like to see this develop organically, so you all need to figure out what exactly needs to be done (e.g., baselines, code modifications, full suites of experiments, solving problems as they arise, etc.). I like that Abi Komma created a spreadsheet for the tasks, so please use it. Perhaps someone needs to step forward as the leader who defines the tasks. Or perhaps you all individually state what you want to do - let’s see what works best for the team.

I will be watching and will comment only when needed. For me this project is an experiment and if it works well, I’ll want to do this again.

I’d like to point out another paper that is relevant to my idea. Take a look at:
Hestness, Joel, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, and Yanqi Zhou. “Deep Learning Scaling is Predictable, Empirically.” arXiv preprint arXiv:1712.00409 (2017).
This is a nice paper by a team from Baidu Research. Section 5 is particularly practical. Section 5.3 starts with:
“Model Exploration using Small Data: It may seem counterintuitive, but an implication of predictable scaling is that model architecture exploration should be feasible with small training data sets. Consider starting with a training set that is known to be large enough that current models show accuracy in the power-law region of the learning curve. Since we expect model accuracy to improve proportionally for different models, growing the training set and models is likely to result in the same relative gains across the models.”
This says one can choose a model with a small dataset. I am wondering whether, given a model, one can also get reasonable weights by training on the small dataset, as a form of “transfer learning” that provides a starting point for training on a larger dataset. Hence my stages.
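To check whether a set of staged runs sits in the “power-law region” mentioned in the quote, one option is to fit error ≈ a · m^b to (training set size, validation error) pairs with a straight-line fit in log-log space. The sketch below is only an illustration: `fit_power_law` is a hypothetical helper, and the data points are synthetic values generated from a known curve, not results from any experiment.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(a) + b * log(size); returns (a, b)."""
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept in log-log space.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic points generated from error = 2.0 * m ** -0.3 (NOT real results).
toy_sizes = [1_000, 10_000, 100_000]
toy_errors = [2.0 * m ** -0.3 for m in toy_sizes]
a, b = fit_power_law(toy_sizes, toy_errors)
print(a, b)
```

If the measured points fall close to a straight line in log-log space, the fitted exponent b gives a rough idea of how much each tenfold increase in data is expected to reduce the error.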

BTW, the paper also says “We found that larger training sets and larger models become harder to optimize.” So it might not be so simple as I state.

My deepest appreciation to all of you for your interest and enthusiasm. Perhaps we are discovering a new method for doing science. It should be fun.

Best regards,
Leslie


(Pranjal Yadav) #19

I’m interested, PM me and we can discuss the details


#20

Hi @iacolippo and @nirantk, I’m also interested. I can give 2 hrs on weekdays (India time) and more on weekends.