Does anyone know if there is a way to exclude images from a databunch on the fly, if the image file itself is missing? I created a databunch from a datablock, where the label metadata came from a csv file.
But the csv file turned out to include a few filenames for files that didn't exist, which caused my model to crash in the middle of training. Checking beforehand turns out to be slow (I used Path.is_file() from pathlib).
I used os.path.isfile and it isn't very fast - a couple of minutes for a million rows or so - but I feel this should be a one-time clean-up rather than something you hit during training. I'd be interested if there is a better way.
Yeah, I guess the question is whether it's better to clean up the data on the fly by adding something like:
if not path.is_file(): skip this one (and possibly warn)
to the DataLoader or perhaps the databunch creation step, or whether you are asking for trouble by trying to continue with a csv metadata file that doesn't match the file directory, and as you say it's therefore better to clean it all up beforehand, even if the process is slower. (I think I convinced myself while writing this that the pre-cleaning is better.)
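The one-time pre-cleaning pass could be sketched roughly like this (a minimal sketch, not fastai code; the labels.csv layout and the "filename" column name are assumptions about your metadata):

```python
import csv
from pathlib import Path

def drop_missing(csv_in, csv_out, img_dir, fname_col="filename"):
    """Copy csv_in to csv_out, keeping only rows whose image file exists.

    Returns the number of rows dropped, so you can warn about them."""
    img_dir = Path(img_dir)
    with open(csv_in, newline="") as fin, open(csv_out, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        dropped = 0
        for row in reader:
            if (img_dir / row[fname_col]).is_file():
                writer.writerow(row)
            else:
                dropped += 1
    return dropped
```

You pay the is_file() cost exactly once, before building the databunch, and training never sees the bad rows.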
I used it similar to how the image regression one was done (I pointed to the image folder), IIRC. I don't have the code in front of me at the moment, though.
Same concept in the end, though. If you still can't get it, let me know and I'll try to find when I did it. Finally jumping back into the code now that the holidays are done!
Also @s.s.o is the dataset publicly available? I'd be interested in that for the study group
@muellerzr Currently the dataset is not public; we are still collecting. It's dental data (not my domain though). I'm trying to convince my colleagues to make it public.
ERROR: albumentations 0.1.12 has requirement imgaug<0.2.7,>=0.2.5, but you'll have imgaug 0.2.9 which is incompatible.
ERROR: gql 0.2.0 has requirement graphql-core<2,>=0.5.0, but you'll have graphql-core 2.2.1 which is incompatible.
Second, after running from fastai2.vision.all import *:
ImportError Traceback (most recent call last)
<ipython-input-1-533e7442bc6c> in <module>()
1 from fastai2.basics import *
2 from fastai2.callback.all import *
----> 3 from fastai2.vision.all import *
4 from fastai2.notebook.showdoc import *
5
7 frames
/usr/local/lib/python3.6/dist-packages/torchvision/transforms/functional.py in <module>()
3 import sys
4 import math
----> 5 from PIL import Image, ImageOps, ImageEnhance, PILLOW_VERSION
6 try:
7 import accimage
ImportError: cannot import name 'PILLOW_VERSION'
Today is Friday, 1/3/2020.
Is there any way to fix this? Thanks.
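For what it's worth: Pillow 7.0 (released 2020-01-02) removed the PILLOW_VERSION constant, while the torchvision build in that environment still imports it. Pinning Pillow below 7 (pip install "pillow<7.0.0", then restarting the runtime) should fix it; as a stopgap, a shim run before the fastai2/torchvision imports also works (a sketch, not an official fix):

```python
import PIL

# Pillow 7.0 dropped PILLOW_VERSION; older torchvision still does
# `from PIL import ..., PILLOW_VERSION`. Restoring the attribute on the
# PIL module before torchvision is imported lets that lookup succeed.
if not hasattr(PIL, "PILLOW_VERSION"):
    PIL.PILLOW_VERSION = PIL.__version__
```

Downgrading Pillow is the cleaner route; the shim is only useful when you can't change the installed packages.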
I have a fastai2 model producing promising results and I am wondering about a remark Jeremy made in class: that he did well by training a model on small images and then scaling up to larger images. Two questions:
How do you decide when it's time to scale up? Is there any useful indicator, or is it just a matter of how much time you have left? I haven't tried it yet, so I don't have a gut feeling for how much larger images will slow the training, or how long it will take to get the model back to the same level of accuracy with larger images; and
How big to go? Is there a limit either to the image size or to the size increment where you lose advantage from going any larger? The main advantage I see is that the center crop on the test images will be larger.
I know I can figure this out the hard way, but I'm hoping some of you who are much more experienced than I am will have words of wisdom.
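Not an answer to "when", but on "how big": the pattern from class steps the size up a few times rather than jumping straight to full resolution, retraining the same model at each stage. A framework-agnostic sketch of such a schedule (the doubling factor and the specific sizes are my assumptions, not a rule from the lesson):

```python
def resize_schedule(start=128, final=448, factor=2):
    """Yield image sizes from `start` up to `final`, multiplying by
    `factor` each step; each stage fine-tunes from the previous weights."""
    size = start
    while size < final:
        yield size
        size = min(size * factor, final)
    yield final

# e.g. rebuild the DataLoaders and fine-tune at each size in turn:
print(list(resize_schedule(128, 448)))  # [128, 256, 448]
```

The practical ceiling is usually the native resolution of your images (upscaling past it adds cost without new information) and what fits in GPU memory at a usable batch size.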
@sgugger just a minor tidbit I want to be sure of. In the most recent version you adjusted predict, and it seems we can no longer do something like the following:
learn.predict('image1.png')
Instead I need to make a path first (i.e. path_im = Path('image1.png')). Is this a permanent adjustment?
I have no idea what your dataset looks like, but as the error message should have warned you, the predict method expects one of the types encountered while processing your training/validation data.
This is permanent, yes.
It was an image in the local directory, and the warning was that it was not a Path type or an image in the dataset. We used to be able to just pass in a string for the file location and it would be converted to a path object (PathOrStr, IIRC). Got it, thanks!
Ah wait, I totally misread that. Looking at the lesson 2 example: pred_class,pred_idx,outputs = learn.predict(path/'black'/'00000021.jpg'), it still used a path. Sorry!