SOLVED: NOT splitting a DataBlock

nchukaobah · January 18, 2021, 2:26am

Hi,

Does anyone know how I can NOT split my dataset? I am making a test-set DataLoader, and it has ground-truth data for evaluating the model. I can make the DataBlock and the DataLoaders. However, the default is a 0.2 (20%) validation split. How can I remove this splitting?
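For context, when no splitter is given, fastai's DataBlock falls back to a random splitter that holds out 20% of the items for validation. The plain-Python sketch below only imitates that behavior (it is not fastai's code, and `random_split` is a hypothetical name); it shows that a 0.2 split of 10 items leaves 8 for training and 2 for validation.

```python
import random

def random_split(items, valid_pct=0.2, seed=None):
    """Hypothetical stand-in for a random train/valid splitter.

    Returns (train_idxs, valid_idxs) as lists of integer indices into `items`.
    """
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)   # reproducible shuffle when a seed is given
    cut = int(valid_pct * len(items))   # number of items held out for validation
    return idxs[cut:], idxs[:cut]

train, valid = random_split(list("abcdefghij"), seed=42)
print(len(train), len(valid))  # 8 2
```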

Thanks

muellerzr · January 18, 2021, 3:30am

You should do dls.test_dl(yourfnames, with_labels=True). I'm not entirely sure of your file structure etc., so that's the closest recommendation I can give ATM.

nchukaobah · January 18, 2021, 3:45am

Thanks. My file structure is essentially a CSV, with a column for image file paths, 4 columns for bounding-box coordinates, and a column for the class number.

For my test set, I created a DataBlock (Image, Category) that takes a row's filename, opens the image, extracts a crop based on that row's bounding box, and returns the crop as the image.
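A sketch of the per-row logic described above, assuming hypothetical column names `fname, x0, y0, x1, y1, label` (the real CSV headers aren't given in the thread). It uses only the standard library and stops at computing the crop box; in the actual get_x, the image would then be opened and cropped, e.g. with Pillow's `Image.crop`, which takes a `(left, upper, right, lower)` tuple.

```python
import csv
import io

# Hypothetical CSV layout: image path, 4 bounding-box coordinates, class number
CSV_TEXT = """fname,x0,y0,x1,y1,label
imgs/a.jpg,10,20,110,220,3
imgs/b.jpg,0,0,50,40,1
"""

def row_to_item(row):
    """Turn one CSV row into (image path, crop box, class label)."""
    box = tuple(int(row[k]) for k in ("x0", "y0", "x1", "y1"))  # (left, upper, right, lower)
    return row["fname"], box, int(row["label"])

items = [row_to_item(r) for r in csv.DictReader(io.StringIO(CSV_TEXT))]
for path, box, label in items:
    print(path, box, label)
```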

I used a DataBlock because I was not sure how to modify the DataLoader effectively. Would this method work for a DataBlock, rather than the DataLoader?

muellerzr · January 18, 2021, 3:50am

I'm assuming, then, that your test setup isn't the same as your training setup, so a new DataBlock is fine. I wrote a little nosplit function a while ago; use this as your splitter:

def nosplit(o): return L(int(i) for i in range(len(o))), L()

Literally, splitter=nosplit.
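A fastai splitter is just a callable that takes the items and returns a pair of index collections, `(train_idxs, valid_idxs)`. The sketch below mirrors `nosplit` using plain Python lists in place of fastcore's `L`, so it runs without fastai installed; every index lands in training and none in validation.

```python
def nosplit(items):
    """Plain-list version of the nosplit splitter: everything goes to 'train'."""
    return list(range(len(items))), []

train_idxs, valid_idxs = nosplit(["r0", "r1", "r2", "r3", "r4"])
print(train_idxs, valid_idxs)  # [0, 1, 2, 3, 4] []
```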

And then, when calling .dataloaders(), pass shuffle_train=False and drop_last=False.
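Why those two flags matter: fastai's training DataLoader shuffles by default and, when shuffling, also drops the last incomplete batch. That is fine for training but would silently skip test rows. The plain-Python batching sketch below (a hypothetical `batches` helper, not fastai's DataLoader) uses 10 items with batch size 4: dropping the last batch loses 2 items, while keeping it covers all 10 in order.

```python
import random

def batches(items, bs, shuffle=False, drop_last=False, seed=0):
    """Hypothetical helper that splits items into mini-batches like a simple data loader."""
    order = list(items)
    if shuffle:
        random.Random(seed).shuffle(order)
    n_full = len(order) // bs                    # number of complete batches
    stop = n_full * bs if drop_last else len(order)
    return [order[i:i + bs] for i in range(0, stop, bs)]

rows = list(range(10))
print(batches(rows, 4, drop_last=True))   # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(batches(rows, 4, drop_last=False))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```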

As an absolute worst case (I'm not in front of a computer to test this), make the splitter:

def nosplit(o): return L(int(i) for i in range(len(o))), L(int(i) for i in range(len(o)))

And then the DataLoader you'd want to use is .valid (so test_dls = newDataBlock.dataloaders(your_df).valid, where your_df is whatever source you pass to the DataBlock).
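This fallback works because fastai's validation DataLoader is not shuffled and does not drop the last batch by default, so `.valid` visits every item once, in the original order. A plain-list sketch of that splitter (hypothetical name `nosplit_both`, lists instead of fastcore's `L`):

```python
def nosplit_both(items):
    """Plain-list version of the fallback splitter: every index in both train and valid."""
    idxs = list(range(len(items)))
    return idxs, list(idxs)

rows = ["r0", "r1", "r2"]
_, valid_idxs = nosplit_both(rows)
test_items = [rows[i] for i in valid_idxs]  # unshuffled order, as the .valid loader would see it
print(test_items)  # ['r0', 'r1', 'r2']
```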

nchukaobah · January 18, 2021, 3:54am

That works. Thank you so much.

rahat · June 12, 2023, 6:34pm

@muellerzr @nchukaobah While creating the DataBlock, you can pass IndexSplitter with an empty list, something like this:

from fastai.vision.all import *

dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                   get_x=ColReader('path', pref=base_dir),  # base_dir: the folder containing your images
                   get_y=ColReader('Label'),
                   #splitter=RandomSplitter(seed=42),
                   splitter=IndexSplitter([]),  # empty validation list: every row goes to training
                   item_tfms=Resize(192, method='squish'))
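IndexSplitter(valid_idx) puts the listed indices in the validation set and everything else in training, so an empty list means nothing is held out, the same outcome as the nosplit splitter earlier in the thread. The sketch below reproduces that logic in plain Python for illustration only; it approximates, but is not, fastai's implementation, and `index_splitter` is a hypothetical name.

```python
def index_splitter(valid_idx):
    """Return a splitter that sends `valid_idx` to validation and the rest to training."""
    def _inner(items):
        valid = set(valid_idx)
        train_idxs = [i for i in range(len(items)) if i not in valid]
        return train_idxs, list(valid_idx)
    return _inner

train_idxs, valid_idxs = index_splitter([])(["a", "b", "c", "d"])
print(train_idxs, valid_idxs)  # [0, 1, 2, 3] []
```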