wjsheng
(jswong)
February 25, 2019, 3:27am
1
Hi guys,
I am working on an image classification problem, and I have a master image folder with 20 subfolders, one per label:
ford (00001.png, 00002.png, …, 00150.png)
honda (00151.png, 00152.png, …, 00300.png)
…
I also have a valid.txt containing:
ford/00125
ford/00127
…
ford/00150
honda/00276
honda/00277
…
What split should I be using, and how should I use it to parse the validation set given in valid.txt?
Thanks!
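A first step that does not depend on fastai is reading valid.txt into a set of entries. The sketch below assumes the format shown above: one {folder_name}/{image_id} entry per line, no extension, possibly with blank lines. read_valid_entries is a hypothetical helper name, not a fastai function.

```python
import tempfile
from pathlib import Path


def read_valid_entries(valid_file):
    """Return the non-empty lines of valid.txt as a set of 'label/image_id' strings."""
    lines = Path(valid_file).read_text().splitlines()
    # Strip surrounding whitespace (including a stray carriage return) and skip blank lines.
    return {line.strip() for line in lines if line.strip()}


# Demo on a throwaway file in the same format as valid.txt.
with tempfile.TemporaryDirectory() as tmp:
    valid_txt = Path(tmp) / "valid.txt"
    valid_txt.write_text("ford/00125\nford/00127\n\nhonda/00276\n")
    entries = read_valid_entries(valid_txt)

print(sorted(entries))  # ['ford/00125', 'ford/00127', 'honda/00276']
```

A set is used because the later step is a membership test for each image, which a set answers in constant time.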
jls
(Junlin)
February 25, 2019, 3:36am
2
wjsheng
(jswong)
February 25, 2019, 3:50am
3
Thanks for the reply!
Do you think I need to pass a further function to retrieve only the file name (e.g. 00001.png), since the txt contains {folder_name}/{image_id}?
I tried split_by_fname_file and called data.valid_ds.x, but unfortunately the result was ImageItemList (0 items).
yeldarb
(Brad)
February 25, 2019, 3:57am
4
Sounds like you could add .png to the end of each line in valid.txt.
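Brad's suggestion can be scripted rather than done by hand. A minimal sketch, assuming the images are all .png as in the first post; it writes a new file (valid_png.txt is a hypothetical name) instead of editing valid.txt in place, and skips entries that already carry the extension. Whether fastai then matches on the full ford/00125.png path or only on the bare file name is a separate question that comes up further down the thread.

```python
import tempfile
from pathlib import Path


def add_extension(src, dst, ext=".png"):
    """Write a copy of src to dst with ext appended to every entry that lacks it."""
    entries = []
    for line in Path(src).read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue  # skip blank lines rather than emitting a bare '.png'
        entries.append(entry if entry.endswith(ext) else entry + ext)
    Path(dst).write_text("\n".join(entries) + "\n")
    return entries


# Demo with one entry lacking the extension and one already having it.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "valid.txt"
    dst = Path(tmp) / "valid_png.txt"  # hypothetical output name
    src.write_text("ford/00125\nhonda/00276.png\n")
    written = add_extension(src, dst)
    dst_text = dst.read_text()

print(written)  # ['ford/00125.png', 'honda/00276.png']
```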
wjsheng
(jswong)
February 25, 2019, 4:08am
5
Thanks for the reply.
I just tried that, so valid.txt now contains lines like ford/00001.png. I still suspect it's a parsing problem I haven't quite gotten right. Here is the source code:
path_val = PATH + "valid.txt"
src = (ImageItemList.from_folder(path_images)
       .split_by_fname_file(path_val)
       .label_from_folder())
data = (src.transform(get_transforms())
        .databunch(bs=bs)
        .normalize(imagenet_stats))
data.valid_ds.x
ImageItemList (0 items)
Any idea what might be wrong?
wjsheng
(jswong)
February 25, 2019, 4:28am
6
I had a look at the source code:
def split_by_fname_file(...):
    ...
    return self.split_by_files(valid_names)
It ends up calling split_by_files, which does not seem like the split I need?
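As far as I can tell from the fastai v1 source of that time, split_by_files puts an item in the validation set when its file name alone (Path.name, i.e. the last path component) appears in valid_names. The plain-Python stand-in below mirrors that check; is_valid_by_basename is a hypothetical name, not a fastai function. It shows why entries that include the folder yield an empty validation set.

```python
from pathlib import Path


def is_valid_by_basename(item, valid_names):
    # Only the last path component, e.g. '00125.png', is compared.
    return Path(item).name in valid_names


items = [Path("images/ford/00125.png"), Path("images/ford/00001.png")]

# 'label/image_id.png' entries never equal a bare file name ...
with_folder = [is_valid_by_basename(p, ["ford/00125.png"]) for p in items]
# ... while bare file names do match.
bare = [is_valid_by_basename(p, ["00125.png"]) for p in items]

print(with_folder)  # [False, False]
print(bare)         # [True, False]
```

If this reading is right, valid.txt would need bare file names such as 00125.png for split_by_fname_file to work, which only helps when file names are unique across folders.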
yeldarb
(Brad)
February 25, 2019, 4:35am
7
First off, are you sure the path is right and that it is loading the images into your training set OK?
wjsheng
(jswong)
February 25, 2019, 4:37am
8
Yes, by doing data.train_ds.x I am getting ImageItemList (1000 items).
jls
(Junlin)
February 25, 2019, 8:26am
9
Hey! Maybe you should try .split_by_fname_file('valid.txt').
wjsheng
(jswong)
February 25, 2019, 6:08pm
10
Thanks for the suggestion, Junlin! My argument already includes valid.txt, via path_val = PATH + "valid.txt". I tried passing only the file name and got [Errno 2] No such file or directory.
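My reading of the Errno 2: in fastai v1, split_by_fname_file(fname, path=None) appears to join fname onto self.path (the folder given to from_folder) unless path is passed, so a bare 'valid.txt' is looked up inside the images folder. If that is right, .split_by_fname_file('valid.txt', path=PATH) should find the file. The pathlib sketch below only illustrates the join rule; the folder names are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical layout: valid.txt lives in PATH, the label folders in PATH/images.
PATH = PurePosixPath("/data/cars")
path_images = PATH / "images"

# A bare file name is resolved against the base folder, i.e. inside images/.
relative = path_images / "valid.txt"
# An absolute right-hand operand replaces the base folder entirely.
absolute = path_images / (PATH / "valid.txt")

print(relative)  # /data/cars/images/valid.txt
print(absolute)  # /data/cars/valid.txt
```

This would also explain why the full path from post 5 raised no error: assuming PATH is absolute, the join simply keeps it.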
wjsheng
(jswong)
February 25, 2019, 6:12pm
11
Oh no, I just realized that I might have ford/00001 and also benz/00001, which means I need to split on the folder name as well.
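One way around duplicate file names is to compare each image's path relative to the image root, minus the extension, against the valid.txt entries. A minimal sketch under that assumption: make_valid_func is a hypothetical helper, and the predicate it returns is the kind of function fastai v1's split_by_valid_func expects (True means validation). The root must be the same folder passed to from_folder, otherwise relative_to raises ValueError.

```python
from pathlib import Path


def make_valid_func(root, valid_entries):
    """Build a predicate that is True when an image's path relative to root,
    without extension, is listed in valid_entries (e.g. 'ford/00125')."""
    root = Path(root)
    wanted = {e.strip() for e in valid_entries if e.strip()}

    def is_valid(item):
        # 'images/ford/00001.png' -> 'ford/00001', using forward slashes on any OS.
        rel = Path(item).relative_to(root).with_suffix("")
        return rel.as_posix() in wanted

    return is_valid


is_valid = make_valid_func("images", ["ford/00001", "honda/00276"])
flags = [is_valid(p) for p in ["images/ford/00001.png",
                               "images/benz/00001.png",   # same file name, other folder
                               "images/honda/00276.png"]]
print(flags)  # [True, False, True]
```

Plugged into the code from post 5, this would read roughly ImageItemList.from_folder(path_images).split_by_valid_func(make_valid_func(path_images, open(path_val).read().splitlines())), with valid.txt left in its original extension-free form.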
wjsheng
(jswong)
February 26, 2019, 1:29am
12
Okay, I tried using split_by_list instead.
I now have train_list = ['ford/00001.jpg', 'ford/00002.jpg', ...]
and valid_list = ['ford/00125.jpg', 'ford/00126.jpg', ...]
After calling
src = (ImageItemList.from_folder(path_images)
       .split_by_list(train_list, valid_list)
       .label_from_folder())
the error is now:
'list' object has no attribute 'ignore_empty'
I suspect it comes down to my lack of understanding of the source code, i.e. not knowing the correct way to pass the arguments (for example, whether to include the path /ford etc.).
I'd highly appreciate any help. Thank you!
Your path_images may be incorrect.
wjsheng
(jswong)
February 26, 2019, 1:59am
14
I think it is correct, because I get results with the default split.
Running !ls path_images shows all the folders:
ford, honda, ...
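Listing the folders confirms the root, but not that each valid.txt entry resolves to an actual file. A small check along these lines would catch both a wrong root and an extension mismatch; note that the lists in post 12 use .jpg while the first post lists .png files. missing_entries is a hypothetical helper, and the extension is a parameter because which one applies is an assumption.

```python
import tempfile
from pathlib import Path


def missing_entries(root, valid_entries, ext=".png"):
    """Return the valid.txt entries that have no matching image file under root."""
    root = Path(root)
    return [e for e in valid_entries if not (root / (e + ext)).is_file()]


# Demo: a root containing only ford/00125.png.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ford").mkdir()
    (Path(tmp) / "ford" / "00125.png").touch()
    missing_png = missing_entries(tmp, ["ford/00125", "honda/00276"])
    missing_jpg = missing_entries(tmp, ["ford/00125"], ext=".jpg")

print(missing_png)  # ['honda/00276']
print(missing_jpg)  # ['ford/00125']
```

An empty result for the real path_images and valid.txt would rule out the path and the extension as the cause.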
Is that the full error? What is the full traceback?
wjsheng
(jswong)
February 26, 2019, 3:07pm
16
Thanks @ilovescience, yes, that is the full error; there is nothing further in the stack. Any ideas?
enemni
(enemni)
January 22, 2021, 11:19am
17
Same error here:
train_fnames=np.loadtxt('train__fnames.csv',delimiter=",", dtype=str).tolist()
valid_fnames=np.loadtxt('valid_fnames.csv',delimiter=",", dtype=str).tolist()
src = (SegmentationItemList.from_folder(path_img))
tfms = get_transforms(flip_vert=True, max_warp=0.1, max_rotate=20, max_zoom=2, max_lighting=0.3)
src = (src.split_by_list(train_fnames, valid_fnames)
       .label_from_func(get_y_fn, classes=codes))
data = (src.transform(tfms, size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))
Error:
'list' object has no attribute 'ignore_empty'
Quick Fix
The easiest solution is to use split_by_files(valid_fnames) and keep only the train and valid tiles in path_img: whatever does not go into valid goes into train. My problem is that my test-set tiles are also in path_img.
Question
@sgugger I think I am looking for something like .split_by_idxs(train_idx=test_idx, valid_idx=test_idx), but using file names instead of indices. Is this implemented somehow? Is there a workaround?
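As far as I can tell from the fastai v1 source, split_by_list(train, valid) expects two ItemList objects rather than Python lists, which would explain the 'list' object has no attribute 'ignore_empty' error in both posts. A possible workaround is to translate the file names into positions and use split_by_idxs(train_idx, valid_idx), which, if I read the source correctly, indexes the ItemList itself, so items in neither index (such as test tiles) are simply left out. The helper below is a hypothetical sketch and assumes the CSVs hold bare file names.

```python
from pathlib import Path


def names_to_idxs(items, train_names, valid_names):
    """Map file names to positions in items for an index-based split.

    Items whose name is in neither list (e.g. test tiles) get no index,
    so a split built from these indices leaves them out.
    """
    train_set, valid_set = set(train_names), set(valid_names)
    overlap = train_set & valid_set
    if overlap:
        raise ValueError(f"names in both train and valid: {sorted(overlap)}")
    train_idx, valid_idx = [], []
    for i, item in enumerate(items):
        name = Path(item).name  # compare bare file names, as the CSVs are assumed to hold
        if name in train_set:
            train_idx.append(i)
        elif name in valid_set:
            valid_idx.append(i)
    return train_idx, valid_idx


items = ["tiles/a.png", "tiles/b.png", "tiles/c.png", "tiles/t1.png"]  # t1 is a test tile
train_idx, valid_idx = names_to_idxs(items, ["a.png", "c.png"], ["b.png"])
print(train_idx, valid_idx)  # [0, 2] [1]
```

In the pipeline above this might look like train_idx, valid_idx = names_to_idxs(src.items, train_fnames, valid_fnames) followed by src.split_by_idxs(np.array(train_idx), np.array(valid_idx)). If the CSVs store relative paths instead, compare relative paths rather than Path(item).name.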
enemni
(enemni)
June 30, 2021, 2:22pm
18
Hello @sgugger, the above is still very relevant for us. Have you found a solution, or do you have a suggestion? Thanks a lot!