Share your work here ✅

hkristen · October 24, 2018, 10:58am

Hi everybody!

While watching lesson 1 of the new course, I was wondering where to get a big forest/nature-related dataset to build an image classifier on, as this is the domain I come from.

I finally found the ImageCLEF Plant Identification Challenge 2013, which provides an already labeled training dataset of 10485 images covering 250 plant species (25GB). Most of the images show leaves, but there are also images of flowers, fruit, stems & the entire plant.

For the classifier I used the images with a uniform background (category=SheetAsBAckground) which only contain leaves: 4921 samples and 124 classes.
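
A minimal sketch of how such a selection could be done, assuming each image comes with a small XML metadata file. The tag names (`Type`, `FileName`, `Content`) and the sample records are assumptions made up for illustration and are not taken from the notebook; adjust them to whatever the dataset actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata records, made up for illustration only.
# The real files may use different tag names and carry more fields.
SAMPLE_XML = [
    "<Image><FileName>1.jpg</FileName><Type>SheetAsBackground</Type><Content>Leaf</Content></Image>",
    "<Image><FileName>2.jpg</FileName><Type>NaturalBackground</Type><Content>Flower</Content></Image>",
    "<Image><FileName>3.jpg</FileName><Type>SheetAsBackground</Type><Content>Leaf</Content></Image>",
]


def read_metadata(xml_text):
    """Return {tag: text} for the direct children of the root element."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def select_by_category(records, category, field="Type"):
    """Keep records whose `field` equals `category` (case-insensitive)."""
    wanted = category.lower()
    return [r for r in records if r.get(field, "").lower() == wanted]


records = [read_metadata(x) for x in SAMPLE_XML]
# The comparison ignores case, so the exact capitalisation does not matter.
sheet_only = select_by_category(records, "SheetAsBackground")
print([r["FileName"] for r in sheet_only])
```

The same helper could be reused later for the NaturalBackground images by changing the `category` argument.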

I started by training a pretrained resnet34 and already got the error rate down to ~3% after 17 epochs. Interestingly, fine-tuning didn't improve accuracy/loss drastically.

What to do next:

Train network for category=NaturalBackground
Maybe exclude classes with fewer than 10 samples?
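
Excluding the rare classes could look roughly like this. It is only a sketch, assuming the samples are available as `(path, label)` pairs; the file names and species labels below are placeholders, and `drop_rare_classes` is a hypothetical helper, not something from the notebook.

```python
from collections import Counter


def drop_rare_classes(samples, min_count=10):
    """Drop all samples whose label occurs fewer than `min_count` times.

    `samples` is a list of (path, label) pairs. Returns the filtered list
    and the per-class counts computed on the unfiltered input.
    """
    counts = Counter(label for _, label in samples)
    kept = [(path, label) for path, label in samples if counts[label] >= min_count]
    return kept, counts


# Placeholder data: 12 images of one class and 3 of another.
samples = [(f"a{i}.jpg", "Acer campestre") for i in range(12)]
samples += [(f"q{i}.jpg", "Quercus robur") for i in range(3)]

kept, counts = drop_rare_classes(samples, min_count=10)
print(len(samples), "->", len(kept), "samples")
print(sorted(label for label, n in counts.items() if n < 10), "would be excluded")
```

Printing the excluded classes first makes it easy to check how much data the threshold actually throws away before retraining.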

Below you can find a gist of the notebook I used. I am looking forward to your feedback on what I can improve or what else could be done with this dataset.

https://gist.github.com/hkristen/696d5111692c25c9ae0d6949501e9215

lesson1-imageclef2013_plant_leaf_types.ipynb

Cheers,
Harald