Research collaboration opportunity with Leslie Smith

iacolippo · May 17, 2018, 11:49am

@nachiket273 @boxreb14 @mcskinner @MicPie @bushaev @nirantk @deepnarainsingh @abi @radek @Borz

Let’s create a shared GitHub repository to collaborate. What do you say? It would make collaboration much easier. We can link to the Google spreadsheet in the readme to distribute the work, or use the Projects feature of GitHub repos to assign tasks.

I can create it and make it either private or public; I just need your GitHub usernames.

If I forgot anyone, @ me.

Nubbinsonfire · May 17, 2018, 11:53am

Hi, I would be interested in helping too!

diazandr3s · May 17, 2018, 12:00pm

@iacolippo That sounds great. I would be happy to collaborate in that way. Each collaborator could have their own repository or branch.
What do you guys think?

Srividya22 · May 17, 2018, 12:55pm

Hi, I am interested in contributing to this project. Please add me as well. Thanks!

emilmelnikov · May 17, 2018, 1:13pm

Count me in too, if possible.

MasonSu · May 17, 2018, 1:29pm

@iacolippo Using GitHub should be a good way to collaborate. I am also interested in this project. Please count me in. Thanks.

ravivijay · May 17, 2018, 1:31pm

I’m in too!

abi · May 17, 2018, 4:33pm

Great idea. I think you should create the GitHub repo; public should be OK for now?

Once people are making progress, they can check in their notebooks and add a line in the spreadsheet, or we can evolve to the GitHub project features as needed.

Let’s start lightweight first: a spreadsheet and a public repo with a readme for now?

~Abi

MicPie · May 17, 2018, 4:58pm

My GitHub name is also MicPie (https://github.com/MicPie).
Thank you for the setup!

nachiket273 · May 17, 2018, 5:15pm

Sure, my GitHub username is nachiket273 (https://github.com/nachiket273).

iacolippo · May 17, 2018, 5:21pm

I created a GitHub organization to make things even easier, provisional name: theresizers (https://github.com/theresizers). I’m open to suggestions.

I’ve created the repository: https://github.com/theresizers/smart-dataset-growth. Same thing: open to suggestions for the name.

I’ve put the Google spreadsheet in the readme.

To add you as collaborators in the organization I need your usernames. I’ve added two people already; the rest of you, send me your username in a PM.

For those of you I already added, you should have an invitation in your inbox!

gokkulnath · May 17, 2018, 5:54pm

I would also like to participate! My GitHub name is Gokkulnath (https://github.com/Gokkulnath).
Looking forward to uncovering exciting results, and thanks for the opportunity!

iacolippo · May 17, 2018, 7:07pm

Added everyone who gave me their GitHub username. We can use the repo wiki to share ideas and common practices so the experiments stay uniform.

Alkahwaji · May 17, 2018, 9:50pm

Yes, I am interested.

jeremy · May 17, 2018, 11:48pm

FYI gang, something I’ve noticed is that these kinds of joint projects work best when individuals decide to just go ahead and implement stuff. Then, as they go, they can provide updates on progress and make specific requests for additional bits of work that need to be done, which others can then get to work on.

My suggestion: don’t wait for someone to organize you all into a group and distribute work to you, since I’ve noticed in practice this rarely happens at all, and even when it does it tends to be slower than just enthusiastic individuals jumping in and getting to work!

jeremy · May 18, 2018, 12:19am

Leslie:

“Model Exploration using Small Data: It may seem counterintuitive, but an implication of predictable scaling is that model architecture exploration should be feasible with small training data sets. Consider starting with a training set that is known to be large enough that current models show accuracy in the power-law region of the learning curve. Since we expect model accuracy to improve proportionally for different models, growing the training set and models is likely to result in the same relative gains across the models.”

This says one can choose a model with a small dataset. I am wondering, given a model, whether one can also get reasonable weights by training on the small dataset, doing “transfer learning” from them as a starting point for training on a larger dataset. Hence my stages.

This is a great point. And of course, remember the idea we discussed in class: using smaller images for your experiments is a great way to run experiments more quickly, and the insights are likely to be similar to what you’d get with larger images.
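To make “the power-law region of the learning curve” concrete, here is a minimal sketch, assuming test error scales with training-set size n as error(n) ≈ a · n^(−b). In that region, log(error) is a straight line in log(n), so a few runs on small subsets are enough to fit a and b and extrapolate to larger data. The numbers below are synthetic (generated from a = 2.0, b = 0.3), not results from this project, and `fit_power_law` / `predict_error` are hypothetical helper names.

```python
import numpy as np


def fit_power_law(train_sizes, errors):
    """Fit error ~= a * n**(-b) by linear regression in log-log space.

    Assumes every error is positive and that all points lie in the
    power-law region of the learning curve. Returns (a, b).
    """
    log_n = np.log(np.asarray(train_sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    # log(error) = log(a) - b * log(n), i.e. a straight line in log-log space
    slope, intercept = np.polyfit(log_n, log_e, 1)
    return float(np.exp(intercept)), float(-slope)


def predict_error(a, b, n):
    """Extrapolate the fitted curve to a training set of size n."""
    return a * n ** (-b)


# Synthetic learning curve (not measured data): a = 2.0, b = 0.3
sizes = np.array([1_000, 2_000, 4_000, 8_000, 16_000])
errors = 2.0 * sizes ** -0.3

a, b = fit_power_law(sizes, errors)
print(f"a={a:.3f}, b={b:.3f}")  # prints a=2.000, b=0.300
print(f"predicted error at n=64000: {predict_error(a, b, 64_000):.4f}")
```

If two candidate models share roughly the same exponent b, the ratio of their errors stays constant as n grows, which is why a ranking found on a small subset is expected to carry over to the full dataset. Real learning curves are noisy and flatten out at small n, so only points believed to be in the power-law region should go into the fit.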

Vishucyrus · May 18, 2018, 11:21am

Nice idea…
My GitHub username is VishuCyrus (https://github.com/VishuCyrus).

jeremy · May 18, 2018, 1:45pm

Sorry @Leslie, the people have spoken.

narvind2003 · May 19, 2018, 2:53am

I am trying to find existing literature around this “incremental learning” problem. I could find work related to class-incremental learning (adding new classes as we run more batches) and some ways to avoid catastrophic forgetting, which is closer to Jeremy’s idea. I have yet to see anything where the same classes are used in each epoch but with a stage-wise increase in batch sizes, which is Leslie’s idea.

Some links which might be useful to folks working on this problem:

radek · May 19, 2018, 4:41am

Please find the Docker setup here. It can help get your experiments off the ground.

There are two paths you can take. You can either use this as a blueprint for creating your own environment locally on your machine (most of the commands you might need are in the Dockerfile), or you can fork the repo and do your work in it as is. That means that as you make changes or create new notebooks, they should show up in the workspace folder of the repository. You should then be able to make a git commit and push it to GitHub (for sharing results with others or for storing your own work).

I haven’t had a chance to use it like this extensively (this is the first Docker image I have defined), so your mileage may vary.

I went for relative cleanliness, but the issue with this setup is that the fastai library is hard to get to: it doesn’t live in the workspace folder, only in the container. For the type of work we are setting out to do here, this can be limiting. If there is interest, I can create a separate branch that uses a fastai repo living in the workspace folder, but this comes with a couple of rough edges.

I haven’t had a chance to test this with the most recent version of the library, so it pulls down a specific tag from my own fork of fastai. Once I have had a chance to test that it works, I’ll point it to fastai master (or to an earlier commit, should that be needed).