Create an image dataset from scratch

(Ben Love) #22

I ran it with my structure above and it worked on most cells but not all.
When I ran it, it did create the models and tmp directories on the same level as train/valid/test though.
I haven’t investigated the errors, but I got a ValueError on
# 2. A few incorrect labels at random
plot_val_with_title(rand_by_correct(False), "Incorrectly classified")

and cells like

plot_val_with_title(most_by_correct(0, False), "Most incorrect cats")
plot_val_with_title(most_by_correct(1, False), "Most incorrect dogs")

resulted in <Figure size 1152x576 with 0 Axes>, i.e. an empty figure. So I’ll have to look into what’s up there. But I got some pretty cool results otherwise.

Looking at that example from @reshama it looks like I need a sample folder too. Thanks!

(Ronnyronay) #23

@benlove do you know what data should be inside sample directory?

(Ben Love) #24

@chadst88, I can’t answer that. I don’t know if we actually need anything there for our own images. The sample directory in dogscats has train and valid directories, each with a cats and a dogs directory. There are also a couple of np array files (cached or compiled? idk). Maybe someone else can shed some light on what the sample directory is for. It looked like the video from lesson 1 was about to discuss each directory but then moved on.

(Jeremy Howard) #25

The samples dir is just if you want to work on a subset of the data for some reason.
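
For anyone who does want such a subset, here is a minimal sketch of one way to build it, assuming the layout described earlier in the thread (train and valid folders, each with one subfolder per class). The function name `make_sample` and its parameters are made up for illustration and are not part of the fastai library.

```python
import random
import shutil
from pathlib import Path


def make_sample(data_dir, n_per_class=20, splits=("train", "valid"), seed=0):
    """Copy up to n_per_class random files per class into data_dir/sample.

    Hypothetical helper: assumes data_dir/<split>/<class>/<image files>.
    """
    data_dir = Path(data_dir)
    rng = random.Random(seed)  # fixed seed so the same subset is picked each run
    for split in splits:
        split_dir = data_dir / split
        class_dirs = sorted(d for d in split_dir.iterdir() if d.is_dir())
        for class_dir in class_dirs:
            files = sorted(p for p in class_dir.iterdir() if p.is_file())
            chosen = rng.sample(files, min(n_per_class, len(files)))
            # Mirror the original layout under sample/<split>/<class>/
            dest = data_dir / "sample" / split / class_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for src in chosen:
                shutil.copy2(src, dest / src.name)


# Example call (the path is an assumption):
# make_sample("data/dogscats", n_per_class=20)
```

The files are copied rather than moved, so the full dataset is left untouched.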

However, it’s more flexible to just use a CSV, as we do for the Planet dataset.
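
If your images are already sorted into one folder per class, a labels CSV can be generated from that structure. The following is only a sketch: the helper name `write_labels_csv` and the column names `id` and `label` are assumptions, and the exact format your loader expects (for example, whether the file extension belongs in the id column) should be checked against the Planet notebook.

```python
import csv
from pathlib import Path


def write_labels_csv(train_dir, csv_path):
    """Write one (id, label) row per file, using the parent folder name as the label.

    Hypothetical helper: assumes train_dir/<class>/<image files>.
    Returns the number of rows written (excluding the header).
    """
    train_dir = Path(train_dir)
    rows = []
    for class_dir in sorted(d for d in train_dir.iterdir() if d.is_dir()):
        for f in sorted(p for p in class_dir.iterdir() if p.is_file()):
            rows.append((f.name, class_dir.name))
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["id", "label"])  # header names are an assumption
        writer.writerows(rows)
    return len(rows)


# Example call (paths are assumptions):
# write_labels_csv("data/mydata/train", "data/mydata/labels.csv")
```

With a CSV, working on a subset or changing a label means editing rows instead of copying or moving image files.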

(Ronnyronay) #26

Hi @jeremy, if I want to replicate the model with different datasets, should I fill in the samples dir or just leave it blank? Thanks

(Amogelang Moloko) #27

Hi Ben,

Were you able to figure out why you got the errors listed above? I got similar errors too.


(Ben Love) #28

I haven’t had time to investigate those errors yet, sorry. Best of luck.

(HN) #29

I had a similar problem, where I downloaded lots of images and wanted to assign labels myself. There wasn’t a good solution for this, so I created my own and open-sourced it.
Maybe you’ll find it useful in some way! :slight_smile:

Here is a short description
From Idea to Open-Source in 12 days – holger – Medium
and here is the Github project
GitHub - hellno/kono_data: kono data - the human way to annotate a dataset


(Amogelang Moloko) #30

Thanks Holger!

Awesome work :smiley: