get_data(32,4) brings up an error:
NotADirectoryError: [Errno 20] Not a directory: 'data/cifar10/train/0_frog.png'
How do I fix this?
Can you be a bit more specific? Which notebook and location? Any screenshots?
Setup: Are you using Local system (git clone?) or Paperspace Fast.ai or Crestle or any other environment?
Try a few things:
- !pwd to see what the current working directory is
- !ls or !dir to see what’s in your current working directory. Do you see a folder called data? Then do the same for subfolders.
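If the shell escapes are not available, the same checks can be done from Python. This is only a minimal sketch: the helper name describe_dir is made up for illustration, and it assumes the dataset sits in a data folder relative to the notebook.

```python
import os

def describe_dir(path, limit=10):
    """Return up to `limit` sorted entry names in `path`; folders get a trailing slash."""
    entries = sorted(os.listdir(path))[:limit]
    return [e + "/" if os.path.isdir(os.path.join(path, e)) else e for e in entries]

if __name__ == "__main__":
    print("Working directory:", os.getcwd())
    # 'data' is an assumed location; adjust to wherever the dataset was unzipped.
    if os.path.isdir("data"):
        print(describe_dir("data"))
    else:
        print("No 'data' folder in the working directory")
```

Calling describe_dir on each subfolder in turn shows whether the images sit directly under train or inside class folders.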
I am in the dl1 folder. I downloaded and unzipped the dataset in the /data directory. Not sure where the error lies. I ran the notebook on CPU as well as on a GPU cluster, but the error persists.
What could be the reason? Is it specific to fastai?
Seeing this as well. I solved it by making folders for classes in “train” and “test”.
Remember how we did cats and dogs in lesson 1? We had train/cats and train/dogs.
I wanted to see what classes we had:
cd train && find . | grep -o '[a-z]*\.png' | sort -u && cd ..
We have: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
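For anyone without a shell, the class names can also be collected in Python. A rough sketch, assuming the files are named like 0_frog.png as in the error message above; classes_from_filenames is a made-up helper name.

```python
import os

def classes_from_filenames(folder):
    """Collect the unique class names from files named like '<index>_<class>.png'."""
    classes = set()
    for name in os.listdir(folder):
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".png" and "_" in stem:
            # Everything after the first underscore is treated as the class name.
            classes.add(stem.split("_", 1)[1])
    return sorted(classes)
```

Pointing it at the unzipped train folder should give back the same ten names listed above, if the naming assumption holds.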
I made new folders
mkdir train_ test_
I went into one of them to make our classes, created the fn to organize the files, and executed it:
1. cd train_
2. mkdir airplane automobile bird cat deer dog frog horse ship truck
3. cd ..
4. function copytrain { for arg in $@; do cp $(find train -name '*'$arg'.png') train_/$arg/; done; };
5. copytrain $(ls train_ | grep -o "[a-z]*")
It took a few minutes to run. Lots of files.
Then repeat steps 1-5, but with test and test_ instead of train and train_.
Now it all works. This is because the from_paths method expects one folder per class.
Make sure the new folders you created match the names you pass to from_paths as val_name and trn_name.
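A quick way to check a folder before calling from_paths is to look for anything directly under it that is not a directory. The NotADirectoryError in the first post is consistent with an image file sitting where a class folder was expected. This is a sketch only, and stray_files is a hypothetical helper, not part of fastai.

```python
import os

def stray_files(split_dir):
    """Return names directly under `split_dir` that are files rather than class folders."""
    return sorted(
        name for name in os.listdir(split_dir)
        if not os.path.isdir(os.path.join(split_dir, name))
    )
```

An empty list for both the training and validation folders means every top-level entry is a folder, which is the layout described above.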
Thanks, worked like a charm!
oh wow. this didn’t come up when I searched on the same error at about the same time.
If someone wants to do this in python (went with python since I’m on windows) this was the code I used (assumes either your .py file or notebook file is located in the courses/dl1 directory):
import os
import glob
import shutil
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
cwd = os.getcwd()
train_path = cwd + '/data/cifar/train/'
# go through classes and make a directory for each one
for class_now in classes:
    path_now = train_path + class_now
    if not os.path.exists(path_now):
        os.makedirs(path_now)
# go through classes and match them with file names
# file names are e.g. '123_frog.png' so glob picks out all the e.g. frog files
for class_now in classes:
    identifier = train_path + '*' + class_now + '.png'
    class_files = glob.glob(identifier)
    file_destination = train_path + class_now
    # move all frog files to proper class directory
    for file_to_move in class_files:
        shutil.move(file_to_move, file_destination)
# do all the same but now for the test data
test_path = cwd + '/data/cifar/test/'
for class_now in classes:
    path_now = test_path + class_now
    if not os.path.exists(path_now):
        os.makedirs(path_now)
for class_now in classes:
    identifier = test_path + '*' + class_now + '.png'
    class_files = glob.glob(identifier)
    file_destination = test_path + class_now
    for file_to_move in class_files:
        shutil.move(file_to_move, file_destination)
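Since the train and test passes are identical, the script above could be folded into one function. This is a possible variant rather than the code the poster ran: sort_into_class_folders is a made-up name, and it derives the class from the file name instead of using a fixed class list, which assumes every image is named like 123_frog.png.

```python
import shutil
from pathlib import Path

def sort_into_class_folders(split_dir):
    """Move '<index>_<class>.png' files in `split_dir` into '<split_dir>/<class>/'.

    Returns the number of files moved.
    """
    split_dir = Path(split_dir)
    moved = 0
    # sorted() builds the full list first, so moving files does not disturb the iteration.
    for image in sorted(split_dir.glob("*_*.png")):
        if not image.is_file():
            continue
        # Assumed naming scheme: the class is everything after the first underscore.
        class_name = image.stem.split("_", 1)[1]
        target = split_dir / class_name
        target.mkdir(exist_ok=True)
        shutil.move(str(image), str(target / image.name))
        moved += 1
    return moved
```

It would be called once per split, for example sort_into_class_folders('data/cifar/train') and then sort_into_class_folders('data/cifar/test'), with the paths adjusted to your own data folder. Running it a second time moves nothing, because no images are left at the top level.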
Thank you! This is so helpful
I’m also on windows. I use GitBash to be able to execute bash scripts.
If you are on 64-bit Windows 10, look into the Windows Subsystem for Linux (WSL).
Hi @jsonm
Thanks a lot. This was very helpful. But I run into a different error after following your instructions.
This is the error I get when I run data = get_data(32,4):
ValueError Traceback (most recent call last)
<ipython-input-47-6a185ac353fc> in <module>()
----> 1 data = get_data(32,4)
<ipython-input-45-88c9e0487857> in get_data(sz, bs)
1 def get_data(sz,bs):
2 tfms = tfms_from_stats(stats, sz, aug_tfms=[RandomFlip()], pad=sz//8)
----> 3 return ImageClassifierData.from_paths(PATH, val_name='test', tfms=tfms, bs=bs)
~/fastai/courses/dl1/fastai/dataset.py in from_paths(cls, path, bs, tfms, trn_name, val_name, test_name, test_with_labels, num_workers)
423 test = folder_source(path, test_name) if test_with_labels else read_dir(path, test_name)
424 else: test = None
--> 425 datasets = cls.get_ds(FilesIndexArrayDataset, trn, val, tfms, path=path, test=test)
426 return cls(path, datasets, bs, num_workers, classes=trn[2])
427
~/fastai/courses/dl1/fastai/dataset.py in get_ds(fn, trn, val, tfms, test, **kwargs)
362 res = [
363 fn(trn[0], trn[1], tfms[0], **kwargs), # train
--> 364 fn(val[0], val[1], tfms[1], **kwargs), # val
365 fn(trn[0], trn[1], tfms[1], **kwargs), # fix
366 fn(val[0], val[1], tfms[0], **kwargs) # aug
~/fastai/courses/dl1/fastai/dataset.py in __init__(self, fnames, y, transform, path)
259 self.y=y
260 assert(len(fnames)==len(y))
--> 261 super().__init__(fnames, transform, path)
262 def get_y(self, i): return self.y[i]
263 def get_c(self):
~/fastai/courses/dl1/fastai/dataset.py in __init__(self, fnames, transform, path)
235 def __init__(self, fnames, transform, path):
236 self.path,self.fnames = path,fnames
--> 237 super().__init__(transform)
238 def get_sz(self): return self.transform.sz
239 def get_x(self, i): return open_image(os.path.join(self.path, self.fnames[i]))
~/fastai/courses/dl1/fastai/dataset.py in __init__(self, transform)
154 self.transform = transform
155 self.n = self.get_n()
--> 156 self.c = self.get_c()
157 self.sz = self.get_sz()
158
~/fastai/courses/dl1/fastai/dataset.py in get_c(self)
265
266 class FilesIndexArrayDataset(FilesArrayDataset):
--> 267 def get_c(self): return int(self.y.max())+1
268
269
~/anaconda3/envs/fastai/lib/python3.6/site-packages/numpy/core/_methods.py in _amax(a, axis, out, keepdims)
24 # small reductions
25 def _amax(a, axis=None, out=None, keepdims=False):
---> 26 return umr_maximum(a, axis, None, out, keepdims)
27
28 def _amin(a, axis=None, out=None, keepdims=False):
ValueError: zero-size array to reduction operation maximum which has no identity
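The trace fails at self.y.max() while building the validation dataset (the line marked # val), which suggests the label array was empty, i.e. no images were found in class subfolders under the folder passed as val_name ('test' here). That reading is an inference from the trace, not something confirmed in the thread. A small sketch to check it; images_per_class is a hypothetical helper and the extension list is an assumption.

```python
import os

def images_per_class(split_dir, extensions=(".png", ".jpg", ".jpeg")):
    """Map each class subfolder of `split_dir` to the number of image files it contains."""
    counts = {}
    for name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, name)
        if os.path.isdir(class_dir):
            counts[name] = sum(
                1 for f in os.listdir(class_dir) if f.lower().endswith(extensions)
            )
    return counts
```

If this returns an empty dict, or only zeros, for the folder named by val_name, then the sorted images are probably in a differently named folder (test_ in the bash steps above) or were never moved.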
Hi, would you mind telling me whether this code should be run in a Jupyter notebook or somewhere else?
That’s written in bash.
So if you’re on a unix machine (linux / mac os), you can just run it from the target directory in the terminal.
If you’re on Windows, you’ll need a bash emulator; Git Bash works well.
Very useful, thank you. Bash is very handy and I want to learn it. Do you have any tutorials to recommend?
Bash is useful, but it’s just one way to interact with Unix (which is what you’ll actually want to learn).
Unlike learning a language, it’s not vital to be fluent in bash to get some incredibly useful things done.
Learning basic syntax and how to use tools like grep and awk is helpful for most things you’ll want to do, but honestly, it’s very task specific.
Depending on what you’re trying to do, you’ll be using dramatically different tools/binaries which will have their own docs/usage.
I’d say the most useful things to learn would be (loosely in this order):
- Variables (and inline execution)
- Piping / Redirection
- Popular unix commands
- Regex
- Conditions
- Loops
This is a pretty good resource.
It’s one of those things that is easiest to learn by doing.
Learned a lot, thank you.
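import os
# the ten CIFAR-10 class names, same list as in the earlier Python post
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
# training set: create cifar/train1/<class>/ and move each image into its class folder
images_name = os.listdir('cifar/train')
os.mkdir('cifar/train1')
for x in classes:
    os.mkdir('cifar/train1/' + x)
for x in images_name:
    # file names look like '123_frog.png': take the part after '_' and drop '.png'
    dir_name = x.split('_')[1][:-4]
    os.renames('cifar/train/' + x, 'cifar/train1/' + dir_name + '/' + x)
# test set: same steps, into cifar/test1
images_name = os.listdir('cifar/test')
os.mkdir('cifar/test1')
for x in classes:
    os.mkdir('cifar/test1/' + x)
for x in images_name:
    dir_name = x.split('_')[1][:-4]
    os.renames('cifar/test/' + x, 'cifar/test1/' + dir_name + '/' + x)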
Your method using Python rather than bash commands is cool too. Thank you very much!