I should’ve led with this, but: thank you for setting this up! It’s a fun way for fastai veterans to get acquainted with the new library, with a little less pressure than a full Kaggle competition
***UPDATE:
Tried to figure this out. It seems like I get intermittent byte errors, and then the wget freezes. To get wget to restart itself after these intermittent byte errors, I used the following line:
!wget -T 15 -c "YOUR_URL" -O "flowers.zip"
-T 15 = time out if no data arrives for 15 seconds, then retry (the default is 900 secs)
-c = continue the download from where it left off if the file was already partially downloaded
***END
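If you'd rather stay inside the notebook, the same restart-on-failure idea can be sketched in plain Python. This is just an illustrative helper (the function name, attempt count, and delay are my own, not from any library), and unlike `wget -c` a plain re-run restarts the transfer from scratch rather than resuming:

```python
import time

def with_retries(fn, attempts=5, delay=2.0):
    """Re-run fn() after transient failures, similar in spirit to
    what `wget -T 15 -c` does: give up on a stalled attempt and try again.
    `attempts` and `delay` are illustrative defaults."""
    for tries_left in range(attempts - 1, -1, -1):
        try:
            return fn()
        except OSError:
            if tries_left == 0:
                raise          # out of retries: surface the real error
            time.sleep(delay)
```

For example, `with_retries(lambda: urllib.request.urlretrieve(url, "flowers.zip"))` would retry a download that dies with an intermittent network error.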
I ended up just downloading from the site instead of through the Jupyter notebook.
But here is what I was trying:
Get the link from Kaggle. Note: the copied link is much longer than what the picture shows.
I’m not 100% sure what the errors could be; I’d try the recommendations from others in the thread too. It could also be dataset-dependent and/or environment-dependent (from Kaggle’s side, e.g. a firewall standpoint, etc.). Sorry about the issues.
I figured out a solution to my problem (updated in my post): I just added a “-T 15” flag to my !wget command. During my Kaggle download I’d get a byte error that stopped the download, and the timeout flag makes wget retry instead of hanging.
From what I understand, MaxBlurPool is designed to help convnets (re-)learn translational invariance. Translational (or shift) invariance can be partially learned via image augmentation, but aliasing can cause problems. R. Zhang showed in his ICLR paper that a BlurPool (= blur + downsampling) layer after the MaxPool operation can help anti-alias the network and recover translational invariance. See also the official GitHub repo for neat examples.
As demonstrated in the imagenette leaderboards (see the post by @muellerzr), MaxBlurPool can be used as a drop-in replacement for all instances of MaxPool in the xresnet architecture.
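To make the idea concrete, here is a minimal single-channel NumPy sketch of the MaxBlurPool operation (stride-1 max pool, binomial blur, then subsample by 2). This is only an illustration of the mechanism: the real fastai/PyTorch layer operates per channel on batched tensors, and the 3-tap [1, 2, 1] kernel is just one of the binomial filters from Zhang’s paper:

```python
import numpy as np

def max_blur_pool(x, blur=(1.0, 2.0, 1.0)):
    """MaxBlurPool on a single 2-D feature map `x`.

    Steps (after R. Zhang, "Making Convolutional Networks
    Shift-Invariant Again", ICLR 2019):
      1. dense 2x2 max pool with stride 1,
      2. separable binomial blur (the anti-aliasing step),
      3. subsample by a factor of 2.
    """
    # 1. stride-1 2x2 max pool: elementwise max over the four shifts
    mp = np.maximum.reduce(
        [x[:-1, :-1], x[:-1, 1:], x[1:, :-1], x[1:, 1:]]
    )
    # 2. separable blur along rows, then columns
    k = np.asarray(blur) / np.sum(blur)
    mp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, mp)
    mp = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, mp)
    # 3. stride-2 subsampling does the actual downscaling
    return mp[::2, ::2]
```

The key point is that the max is taken densely (stride 1), so no information is discarded before the blur; only the final subsampling reduces resolution, which is what restores shift invariance.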
EDIT: Based on my very limited number of tests, MaxBlurPool hasn’t helped me top the FastGarden leaderboard… Wait, actually I’m starting to get good results using MaxBlurPool with a lower learning rate. Unfortunately Paperspace gave me a Quadro M4000 for this session, which means it takes 7 minutes to train one epoch…
Optimizer: Adam, ranger (ranger seems faster for 5 epochs)
Fit function: fit_one_cycle, fit_flat_cos (both are similar with ranger; Adam is less accurate)
Self-attention seemed to help by maybe ~4-5% (only tried on the xresnet)
Stratified training/validation data: downsampled to stratify, then augmented back up to ~12000 (similar results to using the original training set with the splitter). Just as a data point: across all the data (~16000 examples), labels 6, 34, and 44 have the fewest examples at only 23, while label 67 has the most at 1010.
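A per-class split is one simple way to make sure those rare labels land on both sides of the split. A plain-Python sketch (the function name, defaults, and index-based return are my own choices; the resulting valid indices are the kind of thing you could hand to an index-based splitter):

```python
import random
from collections import defaultdict

def stratified_split_idx(labels, valid_pct=0.2, seed=42):
    """Split item indices class by class, so even a 23-example label
    contributes to both the training and validation sets.
    Returns (train_idx, valid_idx) as sorted lists."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    train_idx, valid_idx = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        # at least one example of every class goes to validation
        n_valid = max(1, round(len(idxs) * valid_pct))
        valid_idx += idxs[:n_valid]
        train_idx += idxs[n_valid:]
    return sorted(train_idx), sorted(valid_idx)
```

A random global split, by contrast, can easily leave a 23-example class entirely out of the validation set.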
Here’s what the data looks like if you use “splitter = IndexSplitter(range(12753, len(data)))” from Zachary’s baseline notebook.