Not a Directory error in CIFAR10 exercise

get_data(32,4) brings up an error
NotADirectoryError: [Errno 20] Not a directory: ‘data/cifar10/train/0_frog.png’
How do I fix this?


Can you be a bit more specific? Which notebook and location? Any screenshots?

Setup: Are you using Local system (git clone?) or Paperspace Fast.ai or Crestle or any other environment?

Try a few things:

  1. !pwd - to see what is the current working directory
  2. !ls or !dir to see what’s in your current working dir. Do you see a folder called data? Then do the same for subfolders.
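
If you prefer to check this from Python rather than with shell commands, here is a minimal sketch. The helper name summarize_dir is made up for illustration, and the path data/cifar10/train is only assumed from the error message above, so adjust it to your setup.

```python
import os

def summarize_dir(path):
    """Return (subfolder names, number of plain files) found directly inside path."""
    entries = sorted(os.listdir(path))
    subdirs = [e for e in entries if os.path.isdir(os.path.join(path, e))]
    n_files = len(entries) - len(subdirs)
    return subdirs, n_files

print(os.getcwd())  # same information as !pwd

# Assumed path, taken from the error message; change it if your data lives elsewhere.
train_dir = 'data/cifar10/train'
if os.path.isdir(train_dir):
    print(summarize_dir(train_dir))
```

If the result shows no subfolders and thousands of plain files, the images are sitting directly in train rather than in one folder per class.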

I am in the dl1 folder. I downloaded and unzipped the dataset in the /data directory. I'm not sure where the error lies. I ran the notebook on CPU as well as on a GPU cluster, but the error persists.

What could be the reason? Is it specific to fastai?

Seeing this as well. I solved it by making folders for classes in “train” and “test”.

Remember how we did cats and dogs in lesson 1?
train/cats and train/dogs

I wanted to see what classes we had:
cd train && find . | grep -o '[a-z]*\.png' | sort -u && cd ..

We have: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
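
The same class listing can be sketched in Python. This assumes file names shaped like 0_frog.png, as in the error message; the function name classes_from_filenames is hypothetical.

```python
import os

def classes_from_filenames(filenames):
    """Collect the class part of names shaped like '<index>_<class>.png'."""
    found = set()
    for name in filenames:
        stem, ext = os.path.splitext(name)
        # Skip anything that is not a .png with an underscore in its name.
        if ext.lower() == '.png' and '_' in stem:
            found.add(stem.split('_', 1)[1])
    return sorted(found)

print(classes_from_filenames(['0_frog.png', '1_truck.png', '2_frog.png']))
```

On the real data you would pass os.listdir('train') instead of the short example list.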

I made new folders
mkdir train_ test_

I went into one of them to make our classes, created the fn to organize the files, and executed it:

  1. cd train_

  2. mkdir airplane automobile bird cat deer dog frog horse ship truck

  3. cd ..

  4. function copytrain { for arg in $@; do cp $(find train -name '*'$arg'.png') train_/$arg/; done; };

  5. copytrain $(ls train_ | grep -o "[a-z]*")

It took a few minutes to run. Lots of files.

Then repeat 1-5, but with test and test_ instead of train and train_

Now it all works. This is because the from_paths method expects one folder per class.

Make sure the new folders you created match the names you provide to from_paths val_name and trn_name.


Thanks, worked like a charm!


Oh wow. This didn’t come up when I searched for the same error at about the same time.

If someone wants to do this in Python (I went with Python since I’m on Windows), this is the code I used. It assumes your .py file or notebook is located in the courses/dl1 directory:

import os
import glob
import shutil
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
cwd = os.getcwd()
train_path = cwd + '/data/cifar/train/'
# go through classes and make a directory for each one
for class_now in classes:
    path_now = train_path + class_now
    if not os.path.exists(path_now):
        os.makedirs(path_now)
# go through classes and match them with file names
# file names are e.g. '123_frog.png' so glob picks out all the e.g. frog files
for class_now in classes:
    identifier = train_path + '*' + class_now + '.png'
    class_files = glob.glob(identifier)
    file_destination = train_path + class_now
    # move all frog files to proper class directory
    for file_to_move in class_files:
        shutil.move(file_to_move, file_destination)

# do all the same but now for the test data
test_path = cwd + '/data/cifar/test/'
for class_now in classes:
    path_now = test_path + class_now
    if not os.path.exists(path_now):
        os.makedirs(path_now)
for class_now in classes:
    identifier = test_path + '*' + class_now + '.png'
    class_files = glob.glob(identifier)
    file_destination = test_path + class_now
    for file_to_move in class_files:
        shutil.move(file_to_move, file_destination)

Thank you! This is so helpful

I’m also on Windows. I use Git Bash to be able to execute bash scripts.

If you are on 64-bit Windows 10, search for Windows Subsystem for Linux (WSL).

Hi @jsonm
Thanks a lot. This was very helpful. But I run into a different error after following your instructions.
This is the error I get when I run data = get_data(32,4) :

ValueError                                Traceback (most recent call last)
<ipython-input-47-6a185ac353fc> in <module>()
----> 1 data = get_data(32,4)

<ipython-input-45-88c9e0487857> in get_data(sz, bs)
      1 def get_data(sz,bs):
      2     tfms = tfms_from_stats(stats, sz, aug_tfms=[RandomFlip()], pad=sz//8)
----> 3     return ImageClassifierData.from_paths(PATH, val_name='test', tfms=tfms, bs=bs)

~/fastai/courses/dl1/fastai/dataset.py in from_paths(cls, path, bs, tfms, trn_name, val_name, test_name, test_with_labels, num_workers)
    423             test = folder_source(path, test_name) if test_with_labels else read_dir(path, test_name)
    424         else: test = None
--> 425         datasets = cls.get_ds(FilesIndexArrayDataset, trn, val, tfms, path=path, test=test)
    426         return cls(path, datasets, bs, num_workers, classes=trn[2])
    427 

~/fastai/courses/dl1/fastai/dataset.py in get_ds(fn, trn, val, tfms, test, **kwargs)
    362         res = [
    363             fn(trn[0], trn[1], tfms[0], **kwargs), # train
--> 364             fn(val[0], val[1], tfms[1], **kwargs), # val
    365             fn(trn[0], trn[1], tfms[1], **kwargs), # fix
    366             fn(val[0], val[1], tfms[0], **kwargs)  # aug

~/fastai/courses/dl1/fastai/dataset.py in __init__(self, fnames, y, transform, path)
    259         self.y=y
    260         assert(len(fnames)==len(y))
--> 261         super().__init__(fnames, transform, path)
    262     def get_y(self, i): return self.y[i]
    263     def get_c(self):

~/fastai/courses/dl1/fastai/dataset.py in __init__(self, fnames, transform, path)
    235     def __init__(self, fnames, transform, path):
    236         self.path,self.fnames = path,fnames
--> 237         super().__init__(transform)
    238     def get_sz(self): return self.transform.sz
    239     def get_x(self, i): return open_image(os.path.join(self.path, self.fnames[i]))

~/fastai/courses/dl1/fastai/dataset.py in __init__(self, transform)
    154         self.transform = transform
    155         self.n = self.get_n()
--> 156         self.c = self.get_c()
    157         self.sz = self.get_sz()
    158 

~/fastai/courses/dl1/fastai/dataset.py in get_c(self)
    265 
    266 class FilesIndexArrayDataset(FilesArrayDataset):
--> 267     def get_c(self): return int(self.y.max())+1
    268 
    269 

~/anaconda3/envs/fastai/lib/python3.6/site-packages/numpy/core/_methods.py in _amax(a, axis, out, keepdims)
     24 # small reductions
     25 def _amax(a, axis=None, out=None, keepdims=False):
---> 26     return umr_maximum(a, axis, None, out, keepdims)
     27 
     28 def _amin(a, axis=None, out=None, keepdims=False):

ValueError: zero-size array to reduction operation maximum which has no identity
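
The final ValueError is what NumPy raises when max() is called on an empty array, and the traceback points at the validation dataset (line 364, the "# val" entry), so the label array for the val_name='test' folder appears to be empty. One possible cause, which is a guess and not confirmed in this thread, is that the test folder has no class subfolders or that they contain no images. A minimal sketch to check this; the helper name images_per_class is hypothetical and the path is assumed from the earlier error message:

```python
import os

def images_per_class(split_dir):
    """Map each class subfolder of split_dir to the number of .png files in it."""
    counts = {}
    for entry in sorted(os.listdir(split_dir)):
        sub = os.path.join(split_dir, entry)
        if os.path.isdir(sub):
            counts[entry] = sum(
                1 for f in os.listdir(sub) if f.lower().endswith('.png')
            )
    return counts

# Assumed location of the validation images; adjust to match your PATH and val_name.
val_dir = 'data/cifar10/test'
if os.path.isdir(val_dir):
    counts = images_per_class(val_dir)
    if any(counts.values()):
        print(counts)
    else:
        print('no labelled images found in ' + val_dir)
```

If every count is zero, or the dictionary is empty, the files were probably not moved into the class folders for test, or val_name points at a different folder than the one you reorganized.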

Hi, would you mind telling me whether this code should be run in a Jupyter notebook or somewhere else?

That’s written in bash.

So if you’re on a Unix machine (Linux / macOS), you can just run it from the target directory in the terminal.

If you’re on Windows, you’ll need a bash emulator; Git Bash works well.

Very useful, thank you. Bash is very handy and I want to learn it. Do you have any tutorials to recommend?

Bash is useful, but it’s just one way to interact with Unix (which is what you’ll actually want to learn).

Unlike learning a language, it’s not vital to be fluent in bash to get some incredibly useful things done.

Learning basic syntax and how to use some tools like grep and awk is helpful for most things you’ll want to do, but honestly, it’s very task specific.

Depending on what you’re trying to do, you’ll be using dramatically different tools/binaries which will have their own docs/usage.


I’d say the most useful things to learn would be (loosely in this order):

This is a pretty good resource.

It’s one of those things that is easiest to learn by doing.


Learned a lot, thank you.

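import os

# same classes as listed earlier in the thread
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# move every training image into a per-class folder under cifar/train1
images_name = os.listdir('cifar/train')
os.mkdir('cifar/train1')
for x in classes:
    os.mkdir('cifar/train1/' + x)
for x in images_name:
    # file names are e.g. '123_frog.png', so this picks out 'frog'
    dir_name = x.split('_')[1][:-4]
    # os.renames also removes cifar/train once it has been emptied
    os.renames('cifar/train/' + x, 'cifar/train1/' + dir_name + '/' + x)

# same again for the test images
images_name = os.listdir('cifar/test')
os.mkdir('cifar/test1')
for x in classes:
    os.mkdir('cifar/test1/' + x)
for x in images_name:
    dir_name = x.split('_')[1][:-4]
    os.renames('cifar/test/' + x, 'cifar/test1/' + dir_name + '/' + x)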

Your method, using Python rather than bash commands, is cool too. Thank you very much!