Does anyone know how I can NOT split my dataset? I am making a test-set dataloader. It has ground-truth data for evaluating the model. I can make the DataBlock and the DataLoader. However, the default is a 0.2 split. How can I remove this splitting?
You should do dls.test_dl(yourfnames, with_labels=True). I'm not entirely sure of your file structure etc., so that’s the closest to a recommendation I can give ATM.
Thanks. My file structure is essentially a CSV, with a column for image filepaths, 4 columns for bounding box coordinates, and a column for classnumber.
For my test DataBlock, I created a DataBlock (Image, Category) that takes a row's filename, creates an image, extracts a crop based on the bounding box in that row, and returns this crop as the image.
I used a DataBlock because I was not sure how to modify the DataLoader effectively. What method would work for a DataBlock, rather than the DataLoader?
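For reference, the row-to-crop step described above might look roughly like this. It's a minimal sketch: the column names (path, x1, y1, x2, y2, cls) and the helper row_to_crop_spec are made up for illustration, since the actual CSV header isn't shown in the thread. The box is ordered (left, upper, right, lower), which is the order Pillow's Image.crop expects.

```python
import csv
import io

# Hypothetical CSV layout: the real column names are not given in the thread.
SAMPLE_CSV = """path,x1,y1,x2,y2,cls
images/a.jpg,10,20,110,220,3
images/b.jpg,0,0,50,60,1
"""

def row_to_crop_spec(row):
    """Turn one CSV row into (image path, crop box, class number).

    The box is (left, upper, right, lower), the order used by PIL's Image.crop.
    """
    box = tuple(int(float(row[k])) for k in ("x1", "y1", "x2", "y2"))
    return row["path"], box, int(row["cls"])

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
specs = [row_to_crop_spec(r) for r in rows]

# With Pillow installed, the crop itself would be something like:
#   Image.open(path).crop(box)
for path, box, cls in specs:
    print(path, box, cls)
```

A function like this could serve as the basis for the get_x / get_y pair of the DataBlock, with the actual image loading and cropping done where the comment indicates.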
I’m assuming then that your test setup isn’t the same as your training setup, so a new DataBlock is fine. I wrote a little nosplit function a while ago; use this for your splitter:
def nosplit(o): return L(int(i) for i in range(len(o))), L()
Literally just pass splitter=nosplit when you build the DataBlock.
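And then when calling .dataloaders(), pass shuffle_train=False and drop_last=False so every item comes through in order. (On newer fastai versions shuffle_train is deprecated in favor of shuffle, so shuffle=False should be the equivalent there.)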
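To make the effect of that concrete, here's a small sketch. nosplit is the function from above; the batches helper is a hypothetical stand-in for what a dataloader does with the indices (not fastai code), and the fallback to a plain list is only there so the snippet runs without fastcore installed.

```python
try:
    from fastcore.foundation import L  # fastai's list type
except ImportError:
    L = list  # assumption: a plain list is close enough for this illustration

def nosplit(o):
    # Every index goes to the training split; the validation split stays empty.
    return L(int(i) for i in range(len(o))), L()

def batches(idxs, bs, drop_last):
    # Hypothetical helper: chunk indices in order, the way an unshuffled loader would.
    out = [list(idxs[i:i + bs]) for i in range(0, len(idxs), bs)]
    if drop_last and out and len(out[-1]) < bs:
        out = out[:-1]  # discard the final partial batch
    return out

items = [f"img_{i}.jpg" for i in range(10)]
train_idxs, valid_idxs = nosplit(items)

kept = batches(train_idxs, bs=4, drop_last=False)
dropped = batches(train_idxs, bs=4, drop_last=True)
print(sum(len(b) for b in kept), sum(len(b) for b in dropped))
```

With drop_last=True the final partial batch is thrown away, so two of the ten items here would never be evaluated. That's why it matters for a test set.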
Absolute worst case (not in front of a computer to test), make the splitter:
def nosplit(o): return L(int(i) for i in range(len(o))), L(int(i) for i in range(len(o)))
And then the dl you’d want to use is the .valid one (so test_dl = newDataBlock.dataloaders(yoursource).valid, where yoursource is whatever you normally pass to .dataloaders()).
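A quick sanity check of that fallback, again as a sketch rather than tested fastai code: both splits come back with every index, so the validation loader covers the whole set. The function is named nosplit_both here only to keep it apart from the first version, and the list fallback is an assumption so the snippet runs without fastcore.

```python
try:
    from fastcore.foundation import L  # fastai's list type
except ImportError:
    L = list  # assumption: a plain list behaves the same for this check

def nosplit_both(o):
    # Train and valid both get every index, so .valid sees the full set.
    idxs = L(int(i) for i in range(len(o)))
    return idxs, L(idxs)

items = ["a.jpg", "b.jpg", "c.jpg"]
train_idxs, valid_idxs = nosplit_both(items)
print(list(train_idxs), list(valid_idxs))
```

As far as I know, the validation loader is not shuffled and does not drop the last batch by default, which is why .valid is the convenient one to grab in this setup.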