I’m trying to reproduce the bounding box selection from Lesson 8 using the Google Open Images dataset.
That dataset provides bounding boxes as values in the [0,1] range, and the images vary in size, so there’s no easy conversion to TfmType.COORD coordinates.
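For reference, converting one relative box to pixel coordinates only needs that image's width and height. The sketch below is a plain helper, not part of fastai; the function name and the [top, left, bottom, right] output order are assumptions chosen for illustration.

```python
def rel_to_pixel_bbox(xmin, xmax, ymin, ymax, width, height):
    """Scale a relative [0, 1] box to pixel coordinates for one image.

    Hypothetical helper: the input order follows the XMin, XMax, YMin, YMax
    columns of Open Images; the output order [top, left, bottom, right]
    is an assumption, adjust it to whatever the training code expects.
    """
    return [ymin * height, xmin * width, ymax * height, xmax * width]


# Example: a box on a 640x480 image
box = rel_to_pixel_bbox(0.25, 0.75, 0.25, 0.5, width=640, height=480)
print(box)
```

The catch is that the width and height have to be looked up for every image before the conversion can be applied.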
I’m creating a dataset that requests no alterations to the dependent Y tensor containing the bounding boxes, since fastai has no support for relative bounding box values. I’ll probably need to implement my own Rotate and Flip transforms.
    f_model = resnet34
    bs = 64
    sz = 224
    num_workers = 8
    tfm_y = TfmType.NO
    augs = [RandomRotate(5, p=0.5, tfm_y=tfm_y),
            RandomLighting(0.05, 0.05, tfm_y=tfm_y)]
    tfms = tfms_from_model(f_model=resnet34, sz=sz, aug_tfms=augs,
                           crop_type=CropType.NO, tfm_y=tfm_y, norm_y=False)
    datasets = ImageClassifierData.get_ds(FilesIndexArrayRegressionDataset,
                                          trn_bbox_ds, val_bbox_ds, tfms, path=PATH)
    md = ImageClassifierData(PATH, datasets, bs, num_workers, classes=[])
    head_reg4 = nn.Sequential(Flatten(), nn.Linear(25088, 4))
    learn = ConvLearner.pretrained(f_model, md, custom_head=head_reg4)
    learn.opt_fn = optim.Adam
    learn.crit = nn.L1Loss()
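As a starting point for a custom flip, the box arithmetic for relative coordinates is simple: a horizontal flip maps x to 1 - x and swaps the left and right edges. This is only a sketch of the coordinate math; the function name and the (xmin, ymin, xmax, ymax) order are assumptions, and wiring it into fastai's transform pipeline is not shown.

```python
def hflip_rel_bbox(box):
    """Horizontally flip a relative bounding box.

    box is (xmin, ymin, xmax, ymax) with all values in [0, 1]
    (this ordering is an assumption for the example).
    The image itself must be flipped with the same decision.
    """
    xmin, ymin, xmax, ymax = box
    # The old right edge becomes the new left edge, and vice versa.
    return (1.0 - xmax, ymin, 1.0 - xmin, ymax)


flipped = hflip_rel_bbox((0.25, 0.25, 0.5, 0.75))
print(flipped)
```

Because the coordinates are relative, the flip needs no image size. Rotation is harder: the rotated corners have to be recomputed and a new axis-aligned box fitted around them.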
If I leave the coordinates in the [0,1] range, the loss never drops below 0.35 and the predicted bounding boxes hardly make any sense. However, if I multiply the training values by 1e3, the loss drops to 50 (which is 5% of the [0,1e3] range) and the predictions look rather accurate. Are there any requirements on the value range of regression targets?
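To compare the two runs on equal footing, note that L1 loss scales linearly with the targets, so a loss of 50 on the [0,1e3] scale corresponds to 0.05 on the [0,1] scale. A minimal pure-Python check with made-up numbers (the values below are illustrative, not from my data):

```python
def l1_loss(pred, target):
    """Mean absolute error, the same quantity nn.L1Loss averages by default."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Illustrative values only
pred = [0.25, 0.5, 0.5, 0.75]
target = [0.5, 0.5, 0.25, 1.0]
scale = 1000

loss_unit = l1_loss(pred, target)
loss_scaled = l1_loss([p * scale for p in pred], [t * scale for t in target])

# Dividing the scaled loss by the scale recovers the [0, 1] loss.
print(loss_unit, loss_scaled, loss_scaled / scale)
```

So the scaled run really is reaching a much lower relative error, rather than just reporting a different number for the same fit.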
Also, what is the reason for generating class labels for bounding boxes in the regression case? I see this case is handled specifically in fastai’s dataset.py, although it doesn’t seem to have any effect later on: if I supply an empty class label list, I get the same results for the regression case.
    def dict_source(folder, fnames, csv_labels, suffix='', continuous=False):
        all_labels = sorted(list(set(p for o in csv_labels.values() for p in ([o] if type(o) == float else o))))
        full_names = [os.path.join(folder,str(fn)+suffix) for fn in fnames]
        if continuous:
            label_arr = np.array([np.array(csv_labels[i]).astype(np.float32)
                                  for i in fnames])
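To illustrate what that first line produces for bounding box data, here is the same expression run on two made-up entries (the file names and values are hypothetical, and the `[o]` in the expression is my reading of the source line):

```python
# Hypothetical labels: file name -> four relative box coordinates
csv_labels = {
    "a.jpg": [0.25, 0.25, 0.5, 0.75],
    "b.jpg": [0.25, 0.5, 0.75, 1.0],
}

# Same expression as in dict_source, assuming the float branch wraps o as [o]
all_labels = sorted(list(set(
    p for o in csv_labels.values() for p in ([o] if type(o) == float else o)
)))

print(all_labels)
```

For continuous targets the "classes" are therefore just the unique coordinate values across all boxes, which would explain why passing an empty list changes nothing in the regression case. This sketch does not show what the rest of fastai does with the list.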