Create an image dataset from scratch

I ran it with my structure above and it worked on most cells but not all.
When I ran it, it did create the models and tmp directories on the same level as train/valid/test though.
I haven’t investigated the errors, but I got a ValueError on
# 2. A few incorrect labels at random
plot_val_with_title(rand_by_correct(False), "Incorrectly classified")

and cells like

plot_val_with_title(most_by_correct(0, False), "Most incorrect cats")
and
plot_val_with_title(most_by_correct(1, False), "Most incorrect dogs")

resulted in <Figure size 1152x576 with 0 Axes>, i.e. an empty figure with nothing plotted. So I’ll have to look into what’s up there. But I got some pretty cool results otherwise.

Looking at that example from @reshama it looks like I need a sample folder too. Thanks!

@benlove do you know what data should be inside the sample directory?

@chadst88, I can’t answer that. I don’t know if we actually need anything there for our own images. The sample directory in dogscats has train and valid directories, each with a cats and a dogs directory. There are also a couple of np array files (cached or compiled? idk). Maybe someone else can shed some light on what the sample directory is for. It looked like the video from lesson 1 was about to discuss each directory but then moved on.

The samples dir is just if you want to work on a subset of the data for some reason.

However, it’s more flexible to just use a CSV, as we do for the Planet dataset.
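If you do want a sample folder for your own images, here is a minimal sketch of one way to build it, assuming the same layout as dogscats (train and valid directories, each with one subfolder per class). The function name `make_sample` and its parameters are made up for this illustration; it uses only the Python standard library and is not part of fastai.

```python
import random
import shutil
from pathlib import Path


def make_sample(data_dir, sample_dir, per_class=10, splits=("train", "valid"), seed=0):
    """Copy up to `per_class` random files from each class folder into `sample_dir`.

    Hypothetical helper: assumes data_dir/<split>/<class>/<image files>.
    Returns the number of files copied.
    """
    data_dir, sample_dir = Path(data_dir), Path(sample_dir)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    copied = 0
    for split in splits:
        split_dir = data_dir / split
        if not split_dir.is_dir():
            continue  # skip splits that do not exist
        for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            files = sorted(p for p in class_dir.iterdir() if p.is_file())
            chosen = rng.sample(files, min(per_class, len(files)))
            dest = sample_dir / split / class_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for f in chosen:
                shutil.copy2(f, dest / f.name)
                copied += 1
    return copied
```

For example, `make_sample("data/dogscats", "data/dogscats/sample", per_class=8)` would mirror the train/valid/class layout with at most 8 images per class (the paths here are placeholders for your own).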


Hi @jeremy, if I want to replicate the model with different datasets, should I fill in the samples dir or just leave it blank? Thanks

Hi Ben,

Were you able to figure out why you got the errors listed above? I got similar errors too.

Thanks!

I haven’t had time to investigate those errors yet, sorry. Best of luck.

Hi,
I had a similar problem, where I downloaded lots of images and wanted to assign labels myself. There wasn’t a good solution for this, so I created my own and open-sourced it.
Maybe you’ll find it useful! :slight_smile:

Here is a short description
From Idea to Open-Source in 12 days – holger – Medium
and here is the Github project
GitHub - hellno/kono_data: kono data - the human way to annotate a dataset

Best,
Holger


Thanks Holger!

Awesome work :smiley:

:rocket:


I am a newbie to this forum. So I am putting my request here. Let me explain the scenario.

I am trying to load images using ImageDataBunch.from_folder(). The images load for the training set, but for the validation set I get the following error message.

Please help to fix this.

I have given below my code and error message.

Code:

np.random.seed(41)
data = ImageDataBunch.from_folder(path, train="training", valid="kanna",
                                  ds_tfms=get_transforms(), size=(256,256), bs=32, num_workers=4).normalize()

ERROR:

C:…/…Anaconda3\lib\site-packages\fastai\data_block.py:541: UserWarning: You are labelling your items with CategoryList.
Your valid set contained the following unknown labels, the corresponding items have been discarded.
training
  if getattr(ds, 'warn', False): warn(ds.warn)
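
One way to narrow this down: the warning says the validation set contains a label ("training") that the training set does not have. A rough diagnostic sketch follows, assuming each image is labelled by the name of its immediate parent folder (my understanding of how from_folder labels items, not something confirmed in this thread). The helper names `folder_labels` and `unknown_valid_labels` are made up for this example and use only the standard library.

```python
from pathlib import Path


def folder_labels(split_dir):
    """Return the set of immediate parent-folder names of all files under `split_dir`."""
    split_dir = Path(split_dir)
    return {p.parent.name for p in split_dir.rglob("*") if p.is_file()}


def unknown_valid_labels(path, train="train", valid="valid"):
    """Labels that appear under the valid folder but not under the train folder."""
    path = Path(path)
    return folder_labels(path / valid) - folder_labels(path / train)
```

Calling `unknown_valid_labels(path, train="training", valid="kanna")` should list any folder names that exist only on the validation side. If it returns "training", there may be a folder named training nested inside kanna, in which case the directory layout rather than the code would be the thing to check.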