What is the logic behind the freqhist_bins function of the medical.imaging module?

I have been trying to understand the logic behind the freqhist_bins function of the medical.imaging module. The function can be found in fastai/60_medical.imaging.ipynb at master · fastai/fastai · GitHub; the Git permalink is fastai/imaging.py at f1977193eb21742f72c72199a52f862a471c4bf5 · fastai/fastai · GitHub.

def freqhist_bins(self:Tensor, n_bins=100):
    "A function to split the range of pixel values into groups, such that each group has around the same number of pixels"
    imsd = self.view(-1).sort()[0]
    t = torch.cat([tensor([0.001]),
                   torch.arange(n_bins).float()/n_bins+(1/2/n_bins),
                   tensor([0.999])])
    t = (len(imsd)*t).long()
    return imsd[t].unique()
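
To check my reading of it, here is a trace of what the function computes on a made-up tensor (the input and the variable names below are my own; `tensor` in the source is fastai's helper, so I just use `torch.tensor` here):

import torch

# A made-up "image": 10,000 pixel values from a skewed distribution
px = torch.randn(100, 100).abs() ** 2

n_bins = 100
imsd = px.view(-1).sort()[0]                        # all pixel values, sorted ascending
t = torch.cat([torch.tensor([0.001]),               # a point just above the minimum
               torch.arange(n_bins).float()/n_bins + 1/(2*n_bins),  # 0.005, 0.015, ..., 0.995
               torch.tensor([0.999])])              # a point just below the maximum
idx = (len(imsd) * t).long()                        # fractions of the way through the sorted pixels -> indices
bins = imsd[idx].unique()                           # pixel values at those approximate quantiles
# Roughly the same number of pixels falls between any two consecutive values in `bins`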

The algorithm above seems closely related to torch.linspace, so I am puzzled about why these particular values were chosen. In my experience, torch.linspace works better than the algorithm above. Also, is there any specific reason why a uniform transformation is preferred in the medical module over a normal (Gaussian) transformation? And if a uniform transformation does perform better than a normal one, why not use QuantileTransformer from sklearn?
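
To show what I mean by the torch.linspace alternative, here is a rough sketch of what I compared against (my own code, not anything from fastai):

import torch

def linspace_bins(x, n_bins=100):
    "Sample the sorted pixel values at n_bins evenly spaced quantile positions between 0 and 1"
    imsd = x.view(-1).sort()[0]
    t = torch.linspace(0., 1., n_bins)
    idx = (len(imsd) * t).long().clamp(max=len(imsd) - 1)  # clamp so the t=1.0 endpoint stays in range
    return imsd[idx].unique()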

Thanks

The ideas are discussed here:

We can’t use sklearn since it’s not GPU accelerated and doesn’t use PyTorch tensors.
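
The uniform mapping itself is straightforward to keep entirely on torch tensors. As a very rough sketch of the idea (not the actual fastai implementation), given bins like the ones freqhist_bins returns:

import torch

def uniform_scale(x, bins):
    "Map pixel values to [0,1] so the (roughly equal-population) bins land at evenly spaced outputs"
    xf = x.flatten().contiguous()
    ys = torch.linspace(0., 1., len(bins), device=bins.device)
    idx = torch.searchsorted(bins, xf).clamp(1, len(bins) - 1)
    x0, x1 = bins[idx - 1], bins[idx]    # bracketing bin values for each pixel
    y0, y1 = ys[idx - 1], ys[idx]        # the evenly spaced outputs for those bins
    out = y0 + (xf - x0) * (y1 - y0) / (x1 - x0)    # piecewise-linear interpolation of the empirical CDF
    return out.clamp(0., 1.).view(x.shape)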

Yes, I have read that. However, freqhist_bins seems to be used there directly, without any insight into how the values were chosen (e.g. why 1/(2*n_bins) is added after dividing by n_bins). Could you please shed some light on these? As far as sklearn goes, was the lack of GPU support the only constraint?
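
To make that concrete, for a small n_bins the middle term works out to the midpoints of n_bins equal-probability buckets, and it is the motivation for this choice that I am asking about:

import torch

n_bins = 4
mid = torch.arange(n_bins).float()/n_bins + 1/(2*n_bins)
print(mid)   # tensor([0.1250, 0.3750, 0.6250, 0.8750]) -> the midpoint of each quartile
# With the default n_bins=100 the added term is 1/(2*100) = 0.005, giving 0.005, 0.015, ..., 0.995

Thanks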

I’m afraid I don’t recall the details any more!

I didn’t look into sklearn much since I was writing a lib targeting PyTorch, so I couldn’t say if there are any particular issues there.
