New PyPI package `fastai-rawpy` to use RAW image files

Hey folks,

just wanted to share a new package, fastai-rawpy, that I've developed and uploaded to PyPI.

fastai-rawpy on PyPI:

TL;DR:
If you want to use RAW image files for vision tasks,
install fastai-rawpy and use RawImageBlock(),
a drop-in replacement for ImageBlock() that works with RAW image files.

Why use RAW image files?
Depending on your task, it can be useful. RAW files are much more detailed than JPG, PNG, BMP, etc. (the formats PIL mostly works with), and those formats are in any case derived from the RAW files.

Here are the results I got after comparing training on two datasets (RAW vs JPG) of the same photos, with all other variables fixed, for a regression vision task. The SSIM metric measures how similar two images are to each other [0 = not at all, 1 = identical].

Using RawPy (which is based on LibRaw) lets you benefit from its flexible postprocess() options, in addition to the many other methods RawPy offers.
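
For reference, here is a minimal standalone rawpy sketch (outside fastai-rawpy); the file name and the postprocess() parameters below are just common choices, not necessarily what the package uses internally:

import rawpy

# Demosaic a RAW file into a 16-bit RGB numpy array
with rawpy.imread('photo.ARW') as raw:
    rgb = raw.postprocess(use_camera_wb=True, no_auto_bright=True, output_bps=16)

print(rgb.dtype, rgb.shape)  # uint16, (height, width, 3)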

How to use?

Explanations here:

Practical example:
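
To give the flavour, here is a minimal sketch of dropping RawImageBlock() in where ImageBlock() would normally go. The import path, the folder layout, and the labelling function are assumptions for illustration only; check the docs linked above for the exact usage:

from functools import partial
from fastai.vision.all import *
from fastairawpy import RawImageBlock  # assumed import path; see the package docs

# Classify RAW (.ARW) files arranged in per-class folders; the only change from
# a standard fastai DataBlock is that the input block reads files through rawpy
# instead of PIL.
dblock = DataBlock(
    blocks=(RawImageBlock(), CategoryBlock),
    get_items=partial(get_files, extensions='.ARW'),
    get_y=parent_label,
    splitter=RandomSplitter(seed=42),
)
dls = dblock.dataloaders('path/to/raw_photos', bs=4)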

Special thanks
to the many members here who have supported and guided me through technical as well as conceptual issues. Implicitly or explicitly, they made a huge contribution to getting this package over the finish line.
These are: (forgive me if I've missed somebody; you're welcome to contact me and I'll add your name here)

Note
This is my first ever published package, so I'd be glad to improve and learn more. If you have an idea, issue, or suggestion, please contact me and I'll gladly make improvements. If there are bugs or errors, please contact me and I'll try to fix them…

Thanks!


Congratulations on your first Python package. Well done!


Nicely done. One suggestion, though. Rather than recreating all the fastai item augmentations for RAWImage, you can use fastcore’s type dispatch and patching to add a new encodes method for RAWImage to existing augmentations.

An example of this in practice from fastxtend:

@RandomCrop
def encodes(self, x:TensorImage|TensorMask):
    'Extends RandomCrop to `TensorImage` & `TensorMask`'
    return x.crop_pad(self.size, self.tl, orig_sz=self.orig_sz)

This adds support for TensorImage and TensorMask to the existing RandomCrop item augmentation, and lets users apply RandomCrop to the original PILImage types as well as TensorImage & TensorMask, rather than needing to keep track of which RandomCrop they imported.

For fastai batch transforms, if you define TensorRawImage to inherit from TensorImage like fastai’s TensorImageBW:

class TensorImageBW(TensorImage): _show_args = ArrayImageBW._show_args

and override the show method like so:

class TensorRawImage(TensorImage):
    def show(self, ctx=None, **kwargs):
        return show_raw_image(self, ctx=ctx, verbose=True, **kwargs)

then you shouldn't need to redefine any batch transforms, as they will apply to TensorRawImage thanks to it inheriting from TensorImage. I have an example of this when defining TensorImageGeo here. If I recall correctly, you also don't need to redefine the register_func due to inheritance.

If you do need to do something different for a TensorRawImage batch transform, you can patch in a new encodes.
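
For instance, something along these lines; it's only a sketch, and it assumes the TensorRawImage carries an output_bps attribute, which may not match how fastai-rawpy actually stores that information:

# Give the existing IntToFloatTensor batch transform a TensorRawImage-specific
# encodes, so 16-bit data is scaled by 65535 instead of the default 255.
@IntToFloatTensor
def encodes(self, x:TensorRawImage):
    div = 65535. if getattr(x, 'output_bps', 8) == 16 else 255.  # output_bps attribute is assumed
    return x.float().div_(div)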


Wow, thanks for the brilliant ideas. Great job on TensorImageGeo. I will try to implement TensorRawImage in the same way you did.

Some transforms take TensorImage as an argument, so they can be applied to TensorRawImage as well, but some transforms only accept Image.Image objects as arguments, like RandomResizedCrop, which is why I needed to adapt it to rawpy differently. That said, your idea is brilliant for the transforms that take TensorImage as an argument.

That's what the @RandomCrop type-dispatch patch is for:

@RandomCrop
def encodes(self, x:TensorImage|TensorMask):
    'Extends RandomCrop to `TensorImage` & `TensorMask`'
    return x.crop_pad(self.size, self.tl, orig_sz=self.orig_sz)

As you can see below, RandomCrop in fastai doesn’t have an encodes for TensorImage, but the above code adds one using fastcore patching (source links in my first post).

@delegates()
class RandomCrop(RandTransform):
    "Randomly crop an image to `size`"
    split_idx,order = None,1
    def __init__(self,  size:int|tuple,  **kwargs):
        size = _process_sz(size)
        store_attr()
        super().__init__(**kwargs)

    ...

    def encodes(self, x:Image.Image|TensorBBox|TensorPoint):
        return x.crop_pad(self.size, self.tl, orig_sz=self.orig_sz)

I will do some learning about fastcore's type dispatch and patching, as it seems like a tool I could make better use of. Thank you!

I'd kindly ask for your help in understanding the proper use of the transforms:

The RandomCrop encodes in [1] can be applied to Image.Image objects before the batching step, i.e. at item_tfms, but the one in [2] applies to TensorImage objects, which are only created after item_tfms. That means the RandomCrop in [2] can only be applied to a TensorImage (and, alternatively, a TensorRawImage) at batch_tfms.

So in other words, if I wanted to use the RandomCrop from [2] at item_tfms, it wouldn't really work, since the TensorRawImage hasn't been created yet at that point, right?

[1]

@delegates()
class RandomCrop(RandTransform):
    ...

    def encodes(self, x:Image.Image|TensorBBox|TensorPoint):
        return x.crop_pad(self.size, self.tl, orig_sz=self.orig_sz)

[2]

@RandomCrop
def encodes(self, x:TensorImage|TensorMask):
    'Extends RandomCrop to `TensorImage` & `TensorMask`'
    return x.crop_pad(self.size, self.tl, orig_sz=self.orig_sz)

If I'm correct about this, then that's why I needed to adjust the RandomResizedCrop transform to accept x:RAWPYobj. That transform was originally designed only for Image.Image objects, and I need the random resized crop to run on the items (the data-holder objects, RAWPYobj, analogous to Image.Image) before they become tensors. That way, by the time they are tensors, they are already at their smaller size, which the GPU RAM can hold without hitting its limit.

I hope I haven't missed the point of your idea; that's why I explained my motivation for adapting the RandomResizedCrop transform to RAWPYobj rather than using [2].

So if I've got things right (after looking through the typedispatch explanation), I could write just the encodes() and put @RandomResizedCrop above it, so that later, when I use RandomResizedCrop, the Python interpreter will know to use the encodes() that matches a RAWPYobj argument. Right? That would save me from overriding the whole of RandomResizedCrop.

Let's say, something like this:

@RandomResizedCrop
def encodes(self, x:RAWPYobj):
    ...
    return x

Yeah, like that.


I will keep this topic alive for anyone who is interested in further discussion, debugging, or support for this package. I will also share my progress, or seek some guidance, while improving it…

I'm now trying to solve a bug that showed up while running a regression-training test in which the blocks in the DataBlock are two TransformBlocks with different bit depths, like this:

(input block: 16-bit image, target block: 8-bit image)

I noticed that these combinations of blocks work perfectly fine:
(in: 8-bit, target: 16-bit)
(in: 8-bit, target: 8-bit)
(in: 16-bit, target: 16-bit)

The TensorRawImage gets the values of each image exactly as rawpy extracts them.

The only combination that doesn't work is:
(in: 16-bit, target: 8-bit)

You can see it in detail in example [1] below, with each image and its pixel array.

Explanation of the problem:
Somewhere after the image file is opened and its values are saved into the ndarr attribute of the RAWPYobj, and before it becomes the final TensorRawImage (before going through IntToFloatTensor), something goes wrong:

Looking at the 16-bit input TransformBlock of the pair, every pixel value above 255 is clipped down to 255.

It's as if the dtype of the TensorBase was set to uint8 (or similar) and applied to both TransformBlocks, although only the second TransformBlock needed uint8 for its 8-bit depth.

[1]

# This is the input image of the block/batch (x)
# Here `rawpy` opens and reads the image file in 16-bit mode.
RAWPYobj: fn =  /content/drive/MyDrive/SID/Short/0.033/00214_00_0.033s.ARW
RAWPYobj: output_bps = 16
RAWPYObj: ndarr.max() = 4499.0  # this is the maximum pixel value of the 16-bit image
RAWPYObj: ndarr = [[[379. 247.  23.] # This is the array of all pixels of that image
  [486. 216.   0.]
  [410.   0.   4.]
  ...
  [ 52.   0.   0.]
  [  0.   0.   0.]
  [  0.   0.   0.]]

 [[  0.   0. 430.]
  [ 75. 240.   0.]
  [  0.   0.   0.]
  ...
  [596.   0.   9.]
  [ 75.  72.  19.]
  [262.   0.  97.]]

 [[  0.  16. 179.]
  [ 14. 161.   0.]
  [ 85.  56. 184.]
  ...
  [171. 115.   0.]
  [677.   0.   8.]
  [249.   0.   3.]]

 ...

 [[  0.   0. 600.]
  [  0.   0.   0.]
  [661.   0.  11.]
  ...
  [108.   0. 481.]
  [ 77. 255.   0.]
  [423.  29. 235.]]

 [[575.  84.   0.]
  [  0.   0. 540.]
  [ 62.   0. 555.]
  ...
  [892.   0.   8.]
  [  0.   0. 338.]
  [ 42. 153. 107.]]

 [[125.   0. 501.]
  [  6. 112.  46.]
  [448.  99. 178.]
  ...
  [ 14. 151.   0.]
  [249.  36.  74.]
  [611. 123.   0.]]]

# This is the target image of the block/batch (y)
RAWPYobj: fn =  /content/drive/MyDrive/SID/Long/10/00214_00_10s.ARW 
RAWPYobj: output_bps = 8  # Here the file is opened and read at 8-bit depth
RAWPYObj: ndarr.max() = 255.0  # The maximum pixel value at 8-bit depth
RAWPYObj: ndarr = [[[106. 104.  85.]  # The image array
  [107.  91. 103.]
  [113. 102. 105.]
  ...
  [ 42.  34.  32.]
  [ 33.  39.  39.]
  [ 39.  49.  32.]]

 [[111. 108.  87.]
  [115. 112.  70.]
  [107. 107.  84.]
  ...
  [ 38.  36.  41.]
  [ 39.  37.  38.]
  [ 39.  28.  39.]]

 [[107. 103. 101.]
  [110. 102.  90.]
  [112. 107.  70.]
  ...
  [ 36.  47.  38.]
  [ 39.  27.  48.]
  [ 51.  32.  60.]]

 ...

 [[  1.   3.  21.]
  [  9.   0.   0.]
  [  9.   0.   0.]
  ...
  [ 66.  44.   0.]
  [ 70.  42.   8.]
  [ 66.  52.  10.]]

 [[  2.   1.   0.]
  [  8.   0.   0.]
  [ 16.   0.   0.]
  ...
  [ 66.  33.   9.]
  [ 76.  46.   3.]
  [ 70.  42.  23.]]

 [[  0.   0.   1.]
  [  0.   0.   0.]
  [  5.   0.  25.]
  ...
  [ 70.  37.  11.]
  [ 74.  32.   3.]
  [ 73.  24.   8.]]]


# Here are the Tensors of each of the images above:

# Note: the maximum value of the input tensor is 255, although it should go up to 4499.

# Every pixel value above 255 was clipped to 255.


# This is the input (x) Tensor of the batch, built from the input image above

TensorRawImage([[[255, 255, 255,  ...,  52,   0,   0], 
                 [  0,  75,   0,  ..., 255,  75, 255],
                 [  0,  14,  85,  ..., 171, 255, 249],
                 ...,
                 [  0,   0, 255,  ..., 108,  77, 255],
                 [255,   0,  62,  ..., 255,   0,  42],
                 [125,   6, 255,  ...,  14, 249, 255]],

                [[247, 216,   0,  ...,   0,   0,   0],
                 [  0, 240,   0,  ...,   0,  72,   0],
                 [ 16, 161,  56,  ..., 115,   0,   0],
                 ...,
                 [  0,   0,   0,  ...,   0, 255,  29],
                 [ 84,   0,   0,  ...,   0,   0, 153],
                 [  0, 112,  99,  ..., 151,  36, 123]],

                [[ 23,   0,   4,  ...,   0,   0,   0],
                 [255,   0,   0,  ...,   9,  19,  97],
                 [179,   0, 184,  ...,   0,   8,   3],
                 ...,
                 [255,   0,  11,  ..., 255,   0, 235],
                 [  0, 255, 255,  ...,   8, 255, 107],
                 [255,  46, 178,  ...,   0,  74,   0]]])
im.max() = TensorRawImage(255)

# This is the target (y) Tensor of the target image above

# Here the image file was read at 8-bit depth, so every pixel is within the 0-255 range anyway

TensorRawImage([[[106, 107, 113,  ...,  42,  33,  39],
                 [111, 115, 107,  ...,  38,  39,  39],
                 [107, 110, 112,  ...,  36,  39,  51],
                 ...,
                 [  1,   9,   9,  ...,  66,  70,  66],
                 [  2,   8,  16,  ...,  66,  76,  70],
                 [  0,   0,   5,  ...,  70,  74,  73]],

                [[104,  91, 102,  ...,  34,  39,  49],
                 [108, 112, 107,  ...,  36,  37,  28],
                 [103, 102, 107,  ...,  47,  27,  32],
                 ...,
                 [  3,   0,   0,  ...,  44,  42,  52],
                 [  1,   0,   0,  ...,  33,  46,  42],
                 [  0,   0,   0,  ...,  37,  32,  24]],

                [[ 85, 103, 105,  ...,  32,  39,  32],
                 [ 87,  70,  84,  ...,  41,  38,  39],
                 [101,  90,  70,  ...,  38,  48,  60],
                 ...,
                 [ 21,   0,   0,  ...,   0,   8,  10],
                 [  0,   0,   0,  ...,   9,   3,  23],
                 [  1,   0,  25,  ...,  11,   3,   8]]])
im.max() = TensorRawImage(255)

Here's where I'm stuck. Any idea how to make TensorBase pick the dtype according to each image's bit depth?
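
For reference, this is roughly the kind of per-image cast I'm after (just a sketch; raw_to_tensor is a hypothetical helper, not code from the package). Since PyTorch has no uint16 dtype, the 16-bit array has to land in a wider type such as int32 rather than being squeezed through uint8:

import numpy as np
import torch

def raw_to_tensor(ndarr, output_bps):
    # Pick the numpy dtype from the requested bit depth before wrapping the
    # array in a tensor; 16-bit data must not be narrowed down to uint8.
    np_dtype = np.uint8 if output_bps == 8 else np.int32
    return torch.from_numpy(ndarr.astype(np_dtype))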

I tried to dig under the hood of the TensorBase. Found this:

# %% ../nbs/00_torch_core.ipynb 94
class TensorBase(Tensor):
    "A `Tensor` which support subclass pickling, and maintains metadata when casting or after methods"
    debug,_opt = False,defaultdict(list)
    def __new__(cls, x, **kwargs):
        res = cast(tensor(x), cls)
        for k,v in kwargs.items(): setattr(res, k, v)
        return res

    @classmethod
    def _before_cast(cls, x): return tensor(x)
    def __repr__(self): return re.sub('tensor', self.__class__.__name__, super().__repr__())

    def __reduce_ex__(self,proto):
        torch.utils.hooks.warn_if_has_hooks(self)
        args = (self.storage(), self.storage_offset(), tuple(self.size()), self.stride())
        if self.is_quantized: args = args + (self.q_scale(), self.q_zero_point())
        args = args + (self.requires_grad, OrderedDict())
        f = torch._utils._rebuild_qtensor if self.is_quantized else  torch._utils._rebuild_tensor_v2
        return (_rebuild_from_type, (f, type(self), args, self.__dict__))

    @classmethod
    def register_func(cls, func, *oks): cls._opt[func].append(oks)

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if cls.debug and func.__name__ not in ('__str__','__repr__'): print(func, types, args, kwargs)
        if _torch_handled(args, cls._opt, func): types = (torch.Tensor,)
        res = super().__torch_function__(func, types, args, ifnone(kwargs, {}))
        dict_objs = _find_args(args) if args else _find_args(list(kwargs.values()))
        if issubclass(type(res),TensorBase) and dict_objs: res.set_meta(dict_objs[0],as_copy=True)
        return res

    def new_tensor(self, size, dtype=None, device=None, requires_grad=False):
        cls = type(self)
        return self.as_subclass(Tensor).new_tensor(size, dtype=dtype, device=device, requires_grad=requires_grad).as_subclass(cls)

    def new_ones(self, data, dtype=None, device=None, requires_grad=False):
        cls = type(self)
        return self.as_subclass(Tensor).new_ones(data, dtype=dtype, device=device, requires_grad=requires_grad).as_subclass(cls)

    def new(self, x=None):
        cls = type(self)
        res = self.as_subclass(Tensor).new() if x is None else self.as_subclass(Tensor).new(x)
        return res.as_subclass(cls)
    
    def requires_grad_(self, requires_grad=True):
        # Workaround https://github.com/pytorch/pytorch/issues/50219
        self.requires_grad = requires_grad
        return self

I thought about using [2] below, but I'm not sure how to reach it from fastai's higher-level API.

[2]

   def new_tensor(self, size, dtype=None, device=None, requires_grad=False):
        cls = type(self)
        return self.as_subclass(Tensor).new_tensor(size, dtype=dtype, device=device, requires_grad=requires_grad).as_subclass(cls)

If you made it this far, thank you :wink:

When you convert (in: 8-bit, target: 16-bit), does the value stay the same, or does it stretch/scale up to the larger range?
If the former, then the MSB is being padded with zeros, so it would make sense that converting back (in: 16-bit, target: 8-bit) might simply drop the MSB byte.
What you seem to be looking for is: for 8=>16, pad zeros into the LSB; for 16=>8, shift so the MSB survives and the LSB is dropped.
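
In numpy terms, roughly (just a sketch of the two conversions, not code from your package):

import numpy as np

arr8  = np.array([0, 1, 128, 255], dtype=np.uint8)
arr16 = arr8.astype(np.uint16) << 8      # 8 -> 16: pad zeros into the LSB (255 becomes 65280)
back8 = (arr16 >> 8).astype(np.uint8)    # 16 -> 8: keep the MSB, drop the LSB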

You might want to consider tracing cast() (fastdispatch/core.py at 29f7a86a978f620a8c489bf6c18c90679a7e13a6 · fastai/fastdispatch · GitHub).
I can't follow what it's doing just by looking at the code.

Not sure if this is related, but interesting nonetheless: Tips Tricks 26 - How to properly convert 16 bit to 8 bit images in python - YouTube


Hey guys,
It has been a year already.

I've created a video showing a bar-chart race of the downloads per day, per country, over the span of a year.

I used the data from the PyPI database. I can't tell whether a single download record indicates the country the user is from, or the country where the mirroring server is located. Which is more likely?
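
For anyone who wants to pull similar numbers, the public BigQuery dataset of PyPI download logs exposes a country_code on each download record; roughly like this (an illustrative query, not necessarily the exact one I ran):

from google.cloud import bigquery

# Per-country download counts for the package over the last year,
# from the public PyPI download-logs dataset.
client = bigquery.Client()
query = """
    SELECT country_code, COUNT(*) AS downloads
    FROM `bigquery-public-data.pypi.file_downloads`
    WHERE file.project = 'fastai-rawpy'
      AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 365 DAY)
    GROUP BY country_code
    ORDER BY downloads DESC
"""
for row in client.query(query).result():
    print(row.country_code, row.downloads)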

I used gifyu to upload this 26 MB GIF file.