For a multi-label image classification problem, I was wondering whether colorful backgrounds were responsible for my poor results, so I wanted to convert all images to grayscale before training and classification. I've managed to get this working, but I'm having trouble figuring out whether there's a way to tell the DataBlock API the order in which I'd like the transforms to run.
```python
from fastai.vision.all import *
from PIL import Image
import numpy as np

class ToGrayscaleTransform(Transform):
    """fastai transforms are a piece of work.

    item_tfms decide whether to run based on the type,
    so if we didn't have that type annotation on x, it would
    get run on both the image and the category label (!).
    """
    def encodes(self, x: Image.Image):
        grey_img = x.convert("L")           # collapse to a single luminance channel
        a = np.asarray(grey_img)
        rgb = np.stack([a, a, a], axis=-1)  # replicate it back out to three channels
        print(f"shape of rgb data: {rgb.shape}")
        return Image.fromarray(rgb, mode="RGB")
```
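As a quick standalone sanity check of the channel-stacking trick (numpy only, no fastai or PIL; the array here just stands in for the single-channel "L"-mode image data):

```python
import numpy as np

# stand-in for the single-channel ("L" mode) grayscale data, shape (H, W)
a = np.arange(12, dtype=np.uint8).reshape(3, 4)

# stack the gray channel three times along a new last axis,
# giving an RGB-shaped array whose three channels are identical
rgb = np.stack([a, a, a], axis=-1)

print(rgb.shape)                                 # (3, 4, 3)
print(np.array_equal(rgb[..., 0], rgb[..., 2]))  # True
```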
```python
breeds = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=[Resize(128), ToGrayscaleTransform()],
)
```
I would like (for efficiency) to first `Resize` down to thumbnail size and then convert to grayscale. However, whether I put `item_tfms=[Resize(128), ToGrayscaleTransform()]` or `item_tfms=[ToGrayscaleTransform(), Resize(128)]`, `summary()` and the learners always seem to run `ToGrayscaleTransform()` first and `Resize(128)` second:
```python
breeds.summary(path, bs=3, show_batch=False)
```

outputs:
```
Building one sample
  Pipeline: PILBase.create
    starting from
      images/dogs/pointer/00000104.jpg
    applying PILBase.create gives
      PILImage mode=RGB size=600x900
...
Building one batch
Applying item_tfms to the first sample:
  Pipeline: ToGrayscaleTransform -> Resize -- {'size': (128, 128), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0} -> ToTensor
    starting from
      (PILImage mode=RGB size=600x900, TensorCategory(2))
shape of rgb data: (900, 600, 3)
    applying ToGrayscaleTransform gives
      (PILImage mode=RGB size=600x900, TensorCategory(2))
    applying Resize -- {'size': (128, 128), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0} gives
      (PILImage mode=RGB size=128x128, TensorCategory(2))
    applying ToTensor gives
      (TensorImage of size 3x128x128, TensorCategory(2))
```
Is it intended that `item_tfms` can't convey ordering? I noticed that type dispatch means the transforms get routed separately to the input or to the labels, so maybe the list ordering gets lost somewhere in fastcore/fastai?
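For what it's worth, from skimming fastcore it looks to me like `Pipeline` sorts transforms by an `order` attribute rather than by list position (I believe `Resize` has `order = 1` and plain `Transform` defaults to `order = 0`, which would explain the behavior above). If that's right, setting `order = 2` on `ToGrayscaleTransform` should push it after `Resize`. A minimal plain-Python sketch of that sorting behavior (no fastai; the classes are just stand-ins I made up to illustrate):

```python
# stand-ins mimicking how fastcore's Pipeline (as I understand it)
# sequences transforms: by an `order` attribute, not by list position
class FakeResize:
    order = 1  # fastai's Resize reportedly has order 1

class FakeToGrayscale:
    order = 0  # Transform's default order is 0, so it runs before Resize

class FakeToGrayscaleFixed:
    order = 2  # bumping order past Resize should make it run after

def pipeline_order(tfms):
    # sort by the `order` attribute, like I believe Pipeline does
    return [type(t).__name__ for t in sorted(tfms, key=lambda t: t.order)]

# list position is ignored; `order` wins either way
print(pipeline_order([FakeResize(), FakeToGrayscale()]))
# -> ['FakeToGrayscale', 'FakeResize']
print(pipeline_order([FakeToGrayscaleFixed(), FakeResize()]))
# -> ['FakeResize', 'FakeToGrayscaleFixed']
```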