FastGarden - A new ImageNette like competition (just for fun)

This is a bit silly, but could we clarify “sample”? Is that the number of times you read an image off the hard drive?

See the point about five epochs total: that is five epochs total across any model or models, but you are not allowed to begin with any pretrained weights whatsoever.

I believe we tried to address this, but to restate: you are only allowed ~12,000 images (the size of the dataset) in one epoch. You can over/undersample etc. however you want, but you must keep the overall dataset size exactly the same.

Let’s say I had an image, made a copy of it, and then zoomed into two different places. Would this be one or two samples? (within the same batch)

Two. (You made a copy, and they’re both in a batch)

Thanks, I thought it would be one! Thank you for the clarification! I guess it wasn’t silly to ask.


A better way to think about it is the total number of individual data points the model will see during one epoch :slight_smile: (I hope that’s a clearer way to put it?)

Yeah, my thought was that data augmentations were just mathematical transformations on the data. So it would be the same as passing the same image to a model that took two inputs, and therefore it was only one data point. I got the clarification though.

They are; however, the augmentation adjusts the raw data in place, and it’s still one sample in the dataset (no copies are made, etc.). You can test this quickly on the PETs dataset: load in augmentations and compare A) how many filenames get_image_files returns with B) total batches in train and validation * batch size. You’ll notice they line up almost perfectly (the dataloader total may be a little less, since we drop the last incomplete batch of the training data). A quick sketch of that check is below.
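A minimal sketch of that sanity check, assuming fastai v2 and the standard PETs filename pattern (the labelling regex is only there so the dataloaders build; any labeller gives the same counts):

from fastai.vision.all import *

path = untar_data(URLs.PETS)/'images'
files = get_image_files(path)

# Build dataloaders with augmentations turned on
dls = ImageDataLoaders.from_name_re(
    path, files, pat=r'(.+)_\d+.jpg$',
    item_tfms=Resize(224), batch_tfms=aug_transforms(), bs=64)

# A) filenames on disk vs B) roughly the items iterated per epoch;
# the train loader drops its last incomplete batch, so B can be a touch smaller
print(len(files))
print(len(dls.train)*dls.train.bs + len(dls.valid)*dls.valid.bs)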


Wow, you set up that repo very quickly! Is it that easy to get started with nbdev, or are you just really fast at getting projects up and running?

It’s really that easy :wink: (source: I’ve made 3 repos now via nbdev)


For anybody who is trying to run tests outside of Colab (and perhaps GCP), it’s worth noting that tfrecord will throw errors due to inconsistencies between TensorFlow v1 and v2. I’ve found that you can remedy this (at least on Paperspace, using the fastai v4 course VM) by doing the following:

  1. Install tfrecord from the repo above:
git clone https://github.com/pgmmpk/tfrecord.git
  2. Go into tfrecord/__init__.py and edit the following (see the rename mapping sketched after these steps):
  • change the first two instances of tf.python_io to tf.io (lines 38 and 54)
  • change the final instance of tf.python_io to tf.compat.v1.io (line 80)
  3. Run your code. It will throw a warning:
WARNING:tensorflow:From /notebooks/FastGarden/nbs/tfrecord/tfrecord/__init__.py:81: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
`tf.data.TFRecordDataset(path)`
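For reference, the edits in step 2 boil down to the following rename mapping (the exact surrounding code in pgmmpk/tfrecord may differ, so treat this as illustrative):

# tf.python_io was removed in TensorFlow 2; most of it now lives under tf.io,
# while the record iterator only survives in the v1 compatibility namespace:
#
#   tf.python_io.<...>               ->  tf.io.<...>                          (lines 38 and 54)
#   tf.python_io.tf_record_iterator  ->  tf.compat.v1.io.tf_record_iterator   (line 80)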

Hope this is helpful to anyone else stuck on getting tfrecord to behave!


@muellerzr

I ran a few tests, adding RandomResizedCrop(224, min_scale=0.7) to the individual item transforms and slightly lowering the learning rate: fit_flat_cos(5, lr=6e-3). This gave me the best results (accuracy = 71.70% ± 0.37%, which is statistically consistent with your single baseline example). Obviously more data augmentation will help, and I’ll be doing more tests later!
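For anyone wanting to reproduce this, here’s a rough sketch of the two tweaks in fastai v2. The FastGarden dataloaders aren’t reproduced here, so Imagenette-160 stands in as a placeholder dataset and resnet34 is illustrative; swap in the actual competition pipeline where noted:

from fastai.vision.all import *

path = untar_data(URLs.IMAGENETTE_160)                  # placeholder data, not the competition set
dls = ImageDataLoaders.from_folder(
    path, train='train', valid='val',
    item_tfms=RandomResizedCrop(224, min_scale=0.7),    # the item transform tried above
    bs=64)

# No pretrained weights, per the competition rules
learn = cnn_learner(dls, resnet34, pretrained=False, metrics=accuracy)
learn.fit_flat_cos(5, lr=6e-3)                          # 5 epochs, slightly lowered lr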

Notebook can be viewed here.


I believe you mean fit_flat_cos? (looked at the notebook, unless I missed something :slight_smile: )


Indeed! Whoops, edited.


BTW, once you guys are getting comparable results, go ahead and edit the second post with them. I’ll show an example with @jwuphysics’ result momentarily :slight_smile: We’ll keep the leaderboard on the honor system: just move individuals down a spot when it needs updating :slight_smile:

If we decide a different format is more readable, please post ideas!


I’m following the baseline notebook and trying to get the Kaggle file set with !wget 'URL' -O 'name.zip', but my download seems to stop after a minute or so. Does anyone else get this issue? I just copied the download link from the developer console. Thanks -Daniel

I have not, but perhaps try again? And could you post all the steps? (the exact place you copied the URL from in the console, the browser used, etc.)


The value in limiting the batch size is that smaller batches go through the system more quickly and with less variability, which fosters faster learning.

Going to give this a shot. But for the setup instructions, would it be easier to just suggest people use the Kaggle CLI? Then it’s as easy as `kaggle competitions download -c flower-classification-with-tpus` to get the data, and you learn something that’ll be useful for future competitions.
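For example, a rough notebook-cell sketch of that route (it assumes you’ve created an API token from your Kaggle account page, which gives you a kaggle.json to upload; paths and flags are the usual CLI defaults):

!pip install -q kaggle
!mkdir -p ~/.kaggle && cp kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
!kaggle competitions download -c flower-classification-with-tpus
!unzip -q flower-classification-with-tpus.zip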