FastGarden - A new ImageNette like competition (just for fun)

This is a bit silly, but could we clarify “sample”? Is that the number of times you read an image off the hard drive?

See the point about five epochs total: that is five epochs total across any model or models, but you are not allowed to begin with any pretrained weights whatsoever.

I believe we tried to address this, but to restate: you are only allowed ~12,000 images (the size of the dataset) in one epoch. You can over/undersample etc. however you want, but you must keep the overall dataset size exactly the same.

Let’s say I had an image, made a copy of it, and then zoomed into two different places. Would this be one or two samples? (within the same batch)

Two. (You made a copy, and they’re both in a batch)

Thanks, I thought it would be one! Thank you for the clarification! I guess it wasn’t silly to ask.


A better way to think about it is the total number of individual data points the model will see during one epoch :slight_smile: (I hope that’s a clearer way to put it?)

Yeah, my thought was that data augmentations were just mathematical transformations on the data. So it would be the same as passing the same image to a model that took two inputs, and therefore it was only one data point. I got the clarification though.

They are; however, the augmentation adjusts the raw data in place, and it’s still one sample in the dataset (no copies are made, etc.). You can test this quickly on the PETs dataset: load in augmentations and compare A) how many filenames get_image_files returns with B) total batches in train and validation * batch size. You’ll notice they line up almost perfectly (the dataloader total may be a little less, since we drop the last incomplete batch of the training data). A quick sketch of that check is below.
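A minimal sketch of that sanity check, assuming fastai v2 and the standard PETs filename pattern (the labelling regex is only there so the dataloaders build; any labeller gives the same counts):

from fastai.vision.all import *

path = untar_data(URLs.PETS)/'images'
files = get_image_files(path)

# Build dataloaders with augmentations turned on
dls = ImageDataLoaders.from_name_re(
    path, files, pat=r'(.+)_\d+.jpg$',
    item_tfms=Resize(224), batch_tfms=aug_transforms(), bs=64)

# A) filenames on disk vs B) roughly the items iterated per epoch;
# the train loader drops its last incomplete batch, so B can be a touch smaller
print(len(files))
print(len(dls.train)*dls.train.bs + len(dls.valid)*dls.valid.bs)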


Wow, you set up that repo very quickly! Is it that easy to get started with nbdev, or are you just really fast at getting projects up and running?

It’s really that easy :wink: (source: I’ve made 3 repos now via nbdev)


For anybody who is trying to run tests outside of Colab (and perhaps GCP), it’s worth noting that tfrecord will throw errors due to inconsistencies between TensorFlow v1 and v2. I’ve found that you can remedy this (at least on Paperspace, using the fastai v4 course VM) by doing the following:

  1. Install tfrecord from the repo above:
git clone https://github.com/pgmmpk/tfrecord.git
  2. Go into tfrecord/__init__.py and edit the following (see the rename mapping sketched after these steps):
  • change the first two instances of tf.python_io to tf.io (lines 38 and 54)
  • change the final instance of tf.python_io to tf.compat.v1.io (line 80)
  3. Run your code. It will throw a warning:
WARNING:tensorflow:From /notebooks/FastGarden/nbs/tfrecord/tfrecord/__init__.py:81: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
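For reference, the edits in step 2 boil down to the following rename mapping (the exact surrounding code in pgmmpk/tfrecord may differ, so treat this as illustrative):

# tf.python_io was removed in TensorFlow 2; most of it now lives under tf.io,
# while the record iterator only survives in the v1 compatibility namespace:
#
#   tf.python_io.<...>               ->  tf.io.<...>                          (lines 38 and 54)
#   tf.python_io.tf_record_iterator  ->  tf.compat.v1.io.tf_record_iterator   (line 80)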

Hope this is helpful to anyone else stuck on getting tfrecord to behave!


@muellerzr

I ran a few tests, adding RandomResizedCrop(224, min_scale=0.7) to the individual item transforms and slightly lowering the learning rate: fit_flat_cos(5, lr=6e-3). This gave me the best results (accuracy = 71.70% ± 0.37%, which is statistically consistent with your single baseline example). Obviously more data augmentation will help, and I’ll be doing more tests later!
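For anyone wanting to reproduce this, here’s a rough sketch of the two tweaks in fastai v2. The FastGarden dataloaders aren’t reproduced here, so Imagenette-160 stands in as a placeholder dataset and resnet34 is illustrative; swap in the actual competition pipeline where noted:

from fastai.vision.all import *

path = untar_data(URLs.IMAGENETTE_160)                  # placeholder data, not the competition set
dls = ImageDataLoaders.from_folder(
    path, train='train', valid='val',
    item_tfms=RandomResizedCrop(224, min_scale=0.7),    # the item transform tried above
    bs=64)

# No pretrained weights, per the competition rules
learn = cnn_learner(dls, resnet34, pretrained=False, metrics=accuracy)
learn.fit_flat_cos(5, lr=6e-3)                          # 5 epochs, slightly lowered lr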

Notebook can be viewed here.


I believe you mean fit_flat_cos? (looked at the notebook, unless I missed something :slight_smile: )


Indeed! Whoops, edited.


BTW, once you guys are getting comparable results, go ahead and edit the second post with them. I’ll show an example with @jwuphysics’ result momentarily :slight_smile: We’ll keep the leaderboard on the honor system: just move individuals down a spot when it needs updating :slight_smile:

If we decide a different format is more readable, please post ideas!


I’m following the baseline notebook and trying to get the Kaggle file set with !wget 'URL' -O 'name.zip', but my download seems to stop after a minute or so. Does anyone else get this issue? I just copied the download link from the developer console. Thanks -Daniel

I have not, but perhaps try again? And could you post all the steps? (the exact place you copied the URL from in the console, the browser used, etc.)


The value in limiting the batch size is that smaller batches go through the system more quickly and with less variability, which fosters faster learning.

Going to give this a shot. But for the setup instructions, would it be easier to just suggest people use the Kaggle CLI? Then it’s as easy as `kaggle competitions download -c flower-classification-with-tpus` to get the data, and you learn something that’ll be useful for future competitions.
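For example, a rough notebook-cell sketch of that route (it assumes you’ve created an API token from your Kaggle account page, which gives you a kaggle.json to upload; paths and flags are the usual CLI defaults):

!pip install -q kaggle
!mkdir -p ~/.kaggle && cp kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
!kaggle competitions download -c flower-classification-with-tpus
!unzip -q flower-classification-with-tpus.zip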