I will keep this topic alive for anyone who is interested in further discussion, debugging, or support for this package. I will also share my progress here, and may ask for guidance while improving it.
I'm now trying to solve a bug I hit while running a regression-training test of a model where the `blocks` in the `DataBlock` get two `TransformBlock`s with different `bit_depth` values, like this: (input block: 16-bit image, target block: 8-bit image).
I noticed that these combinations of blocks work perfectly fine:

(in: 8-bit, trgt: 16-bit)
(in: 8-bit, trgt: 8-bit)
(in: 16-bit, trgt: 16-bit)

In those cases the `TensorRawImage` gets the values of each image exactly as `rawpy` extracts them. The only combination that doesn't work is:

(in: 16-bit, trgt: 8-bit)

You can see it in detail in example [1] below, with each image and its pixel array.
Explanation of the problem: somewhere after the image file is opened and its values are saved into the `ndarr` attribute of the `RAWPYobj`, and before this becomes a final `TensorRawImage` (i.e. before going through `IntToFloatTensor`), something goes wrong. Looking at the 16-bit input `TransformBlock` part of the block, every pixel value above 255 is clipped down to 255. It's as if the `dtype` of the `TensorBase` had been set to `uint8` and applied to both `TransformBlock`s, although only the second `TransformBlock` needed a `uint8` type for its 8-bit depth.
[1]
```
# This is the input image of the block/batch (x)
# Here `rawpy` opens and reads the image file in 16-bit mode.
RAWPYobj: fn = /content/drive/MyDrive/SID/Short/0.033/00214_00_0.033s.ARW
RAWPYobj: output_bps = 16
RAWPYObj: ndarr.max() = 4499.0 # the maximum pixel value of the 16-bit image
RAWPYObj: ndarr = [[[379. 247. 23.] # the array of all pixels of that image
[486. 216. 0.]
[410. 0. 4.]
...
[ 52. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
[[ 0. 0. 430.]
[ 75. 240. 0.]
[ 0. 0. 0.]
...
[596. 0. 9.]
[ 75. 72. 19.]
[262. 0. 97.]]
[[ 0. 16. 179.]
[ 14. 161. 0.]
[ 85. 56. 184.]
...
[171. 115. 0.]
[677. 0. 8.]
[249. 0. 3.]]
...
[[ 0. 0. 600.]
[ 0. 0. 0.]
[661. 0. 11.]
...
[108. 0. 481.]
[ 77. 255. 0.]
[423. 29. 235.]]
[[575. 84. 0.]
[ 0. 0. 540.]
[ 62. 0. 555.]
...
[892. 0. 8.]
[ 0. 0. 338.]
[ 42. 153. 107.]]
[[125. 0. 501.]
[ 6. 112. 46.]
[448. 99. 178.]
...
[ 14. 151. 0.]
[249. 36. 74.]
[611. 123. 0.]]]
```
```
# This is the target image of the block/batch (y)
RAWPYobj: fn = /content/drive/MyDrive/SID/Long/10/00214_00_10s.ARW
RAWPYobj: output_bps = 8 # Here the file is opened and read into an 8-bit array
RAWPYObj: ndarr.max() = 255.0 # the maximum pixel value at 8-bit depth
RAWPYObj: ndarr = [[[106. 104. 85.] # the image array
[107. 91. 103.]
[113. 102. 105.]
...
[ 42. 34. 32.]
[ 33. 39. 39.]
[ 39. 49. 32.]]
[[111. 108. 87.]
[115. 112. 70.]
[107. 107. 84.]
...
[ 38. 36. 41.]
[ 39. 37. 38.]
[ 39. 28. 39.]]
[[107. 103. 101.]
[110. 102. 90.]
[112. 107. 70.]
...
[ 36. 47. 38.]
[ 39. 27. 48.]
[ 51. 32. 60.]]
...
[[ 1. 3. 21.]
[ 9. 0. 0.]
[ 9. 0. 0.]
...
[ 66. 44. 0.]
[ 70. 42. 8.]
[ 66. 52. 10.]]
[[ 2. 1. 0.]
[ 8. 0. 0.]
[ 16. 0. 0.]
...
[ 66. 33. 9.]
[ 76. 46. 3.]
[ 70. 42. 23.]]
[[ 0. 0. 1.]
[ 0. 0. 0.]
[ 5. 0. 25.]
...
[ 70. 37. 11.]
[ 74. 32. 3.]
[ 73. 24. 8.]]]
```
```
# Here are the Tensors of each of the images above.
# Note the maximum value of the input tensor is 255, although it should go up to 4499:
# every pixel value above 255 was clipped to 255.
# This is the input (x) Tensor of the batch, from the input image above
TensorRawImage([[[255, 255, 255, ..., 52, 0, 0],
[ 0, 75, 0, ..., 255, 75, 255],
[ 0, 14, 85, ..., 171, 255, 249],
...,
[ 0, 0, 255, ..., 108, 77, 255],
[255, 0, 62, ..., 255, 0, 42],
[125, 6, 255, ..., 14, 249, 255]],
[[247, 216, 0, ..., 0, 0, 0],
[ 0, 240, 0, ..., 0, 72, 0],
[ 16, 161, 56, ..., 115, 0, 0],
...,
[ 0, 0, 0, ..., 0, 255, 29],
[ 84, 0, 0, ..., 0, 0, 153],
[ 0, 112, 99, ..., 151, 36, 123]],
[[ 23, 0, 4, ..., 0, 0, 0],
[255, 0, 0, ..., 9, 19, 97],
[179, 0, 184, ..., 0, 8, 3],
...,
[255, 0, 11, ..., 255, 0, 235],
[ 0, 255, 255, ..., 8, 255, 107],
[255, 46, 178, ..., 0, 74, 0]]])
im.max() = TensorRawImage(255)
# This is the target (y) Tensor, from the target image above.
# Here the image file was read at 8-bit depth, so every pixel is within 0-255 anyway
TensorRawImage([[[106, 107, 113, ..., 42, 33, 39],
[111, 115, 107, ..., 38, 39, 39],
[107, 110, 112, ..., 36, 39, 51],
...,
[ 1, 9, 9, ..., 66, 70, 66],
[ 2, 8, 16, ..., 66, 76, 70],
[ 0, 0, 5, ..., 70, 74, 73]],
[[104, 91, 102, ..., 34, 39, 49],
[108, 112, 107, ..., 36, 37, 28],
[103, 102, 107, ..., 47, 27, 32],
...,
[ 3, 0, 0, ..., 44, 42, 52],
[ 1, 0, 0, ..., 33, 46, 42],
[ 0, 0, 0, ..., 37, 32, 24]],
[[ 85, 103, 105, ..., 32, 39, 32],
[ 87, 70, 84, ..., 41, 38, 39],
[101, 90, 70, ..., 38, 48, 60],
...,
[ 21, 0, 0, ..., 0, 8, 10],
[ 0, 0, 0, ..., 9, 3, 23],
[ 1, 0, 25, ..., 11, 3, 8]]])
im.max() = TensorRawImage(255)
```
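A related thing I'll need to double-check downstream (this is my assumption based on fastai's defaults, not something the dump above shows): `IntToFloatTensor` divides by `div=255.` by default, so even once the dtype issue is fixed, a 16-bit block would need a larger divisor to land in [0, 1]. A minimal sketch of just that scaling step:

```python
import torch

def int_to_float(t, div):
    # Mirrors only the scaling; the real IntToFloatTensor is a fastai
    # Transform with more behaviour (e.g. mask handling via div_mask).
    return t.float() / div

x16 = torch.tensor([0, 4499, 65535])
print(int_to_float(x16, 65535.).max())  # tensor(1.)

x8 = torch.tensor([0, 106, 255])
print(int_to_float(x8, 255.).max())     # tensor(1.)
```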
Here's where I'm stuck. Any idea how to make `TensorBase` adjust the `dtype` according to the bit depth of each image? I tried to dig under the hood of `TensorBase` and found this:
```python
# %% ../nbs/00_torch_core.ipynb 94
class TensorBase(Tensor):
    "A `Tensor` which support subclass pickling, and maintains metadata when casting or after methods"
    debug,_opt = False,defaultdict(list)
    def __new__(cls, x, **kwargs):
        res = cast(tensor(x), cls)
        for k,v in kwargs.items(): setattr(res, k, v)
        return res

    @classmethod
    def _before_cast(cls, x): return tensor(x)

    def __repr__(self): return re.sub('tensor', self.__class__.__name__, super().__repr__())

    def __reduce_ex__(self,proto):
        torch.utils.hooks.warn_if_has_hooks(self)
        args = (self.storage(), self.storage_offset(), tuple(self.size()), self.stride())
        if self.is_quantized: args = args + (self.q_scale(), self.q_zero_point())
        args = args + (self.requires_grad, OrderedDict())
        f = torch._utils._rebuild_qtensor if self.is_quantized else torch._utils._rebuild_tensor_v2
        return (_rebuild_from_type, (f, type(self), args, self.__dict__))

    @classmethod
    def register_func(cls, func, *oks): cls._opt[func].append(oks)

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if cls.debug and func.__name__ not in ('__str__','__repr__'): print(func, types, args, kwargs)
        if _torch_handled(args, cls._opt, func): types = (torch.Tensor,)
        res = super().__torch_function__(func, types, args, ifnone(kwargs, {}))
        dict_objs = _find_args(args) if args else _find_args(list(kwargs.values()))
        if issubclass(type(res),TensorBase) and dict_objs: res.set_meta(dict_objs[0],as_copy=True)
        return res

    def new_tensor(self, size, dtype=None, device=None, requires_grad=False):
        cls = type(self)
        return self.as_subclass(Tensor).new_tensor(size, dtype=dtype, device=device, requires_grad=requires_grad).as_subclass(cls)

    def new_ones(self, data, dtype=None, device=None, requires_grad=False):
        cls = type(self)
        return self.as_subclass(Tensor).new_ones(data, dtype=dtype, device=device, requires_grad=requires_grad).as_subclass(cls)

    def new(self, x=None):
        cls = type(self)
        res = self.as_subclass(Tensor).new() if x is None else self.as_subclass(Tensor).new(x)
        return res.as_subclass(cls)

    def requires_grad_(self, requires_grad=True):
        # Workaround https://github.com/pytorch/pytorch/issues/50219
        self.requires_grad = requires_grad
        return self
```
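To narrow down where the clipping happens, here is a minimal stand-in for the relevant line of `__new__` (`MiniTensorBase` is my own name for the sketch, and I'm assuming fastai's `tensor` helper behaves like `torch.as_tensor` here). Since `torch.as_tensor` keeps the dtype of the source ndarray, a float array with 16-bit values survives this step unclipped, which suggests the clipping happens before the array ever reaches `TensorBase.__new__`:

```python
import numpy as np
import torch
from torch import Tensor

class MiniTensorBase(Tensor):
    "Stand-in for TensorBase; mirrors only `res = cast(tensor(x), cls)`."
    def __new__(cls, x, **kwargs):
        res = torch.as_tensor(x).as_subclass(cls)
        for k, v in kwargs.items(): setattr(res, k, v)
        return res

arr = np.array([379., 4499., 23.])  # float64, like rawpy's ndarr above
t = MiniTensorBase(arr)
print(float(t.max()))  # 4499.0 -- the 16-bit values survive this step
```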
I thought about using this method here [2], but I'm not sure how to reach it from fastai's higher-level API.
[2]

```python
def new_tensor(self, size, dtype=None, device=None, requires_grad=False):
    cls = type(self)
    return self.as_subclass(Tensor).new_tensor(size, dtype=dtype, device=device, requires_grad=requires_grad).as_subclass(cls)
```
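Rather than reaching `new_tensor` from the high-level API, one workaround I'm considering (a sketch under my own naming: `to_tensor_for_depth` and its `bit_depth` argument are hypothetical, not fastai API) is to pick the dtype from the bit depth inside each block's own transform, so the two `TransformBlock`s never share a `uint8` default:

```python
import numpy as np
import torch

def to_tensor_for_depth(ndarr, bit_depth):
    # uint8 only fits 8-bit data; anything wider needs int32 (or float32)
    dtype = torch.uint8 if bit_depth == 8 else torch.int32
    return torch.from_numpy(np.rint(ndarr).astype(np.int64)).to(dtype)

x = to_tensor_for_depth(np.array([[379., 4499., 23.]]), bit_depth=16)
print(x.max())  # tensor(4499, dtype=torch.int32)

y = to_tensor_for_depth(np.array([[106., 255., 85.]]), bit_depth=8)
print(y.max())  # tensor(255, dtype=torch.uint8)
```

Each `TransformBlock` could then call this with its own `bit_depth`, keeping the 16-bit input and 8-bit target independent.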
If you made it this far, thank you