Preprocessing issue with OpenCV


I’m using OpenCV for a preprocessing step.
I followed the last comment in the following thread:

  • Opencv based image pre-processing step
  • The code is

        import cv2
        import numpy as np
        from fastai.vision import Image, pil2tensor

        def load_format(path, convert_mode, after_open) -> Image:
            image = cv2.imread(path)
            image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            image = crop_image(image)  # call my own image transformation
            return Image(pil2tensor(image, np.float32).div(255))  #return fastai Image format = _load_format

However, my own “load_format” is computationally heavy.
The cost itself is not a big deal for me.

When I checked the loaded image after preprocessing, I could see the correct preprocessed image with

  • temp =
  • = load_format
  • (data loading)
  • learn = cnn_learner(data, arch, metrics=[accuracy])
  • test1 =[15][0]

However, when I run

  • = temp
  • learn.fit_one_cycle(1, slice(lr))
  • test1 =[15][0]

the preprocessed image data disappeared and the original image data came back.

So I guess “learn.fit_one_cycle” is using “”.

I want to spend the computing time for preprocessing only once, before training.

I don’t understand why “learn.fit_one_cycle” needs to open each image again.

What is the best solution for this problem?

If I have understood you correctly, you want to preprocess the images and use the new images instead of the old ones? Then just save the processed images to a folder and use them as input to your ImageDataBunch. If you do not want the images to be transformed any further, pass [] to tfms. I hope that helps.
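The "save once, then load" suggestion above could look roughly like the sketch below. It is only an illustration: `preprocess_folder` and `to_gray` are hypothetical helper names, not part of fastai or OpenCV, and the grayscale conversion stands in for whatever heavy step is actually needed.

```python
from pathlib import Path
from typing import Callable


def preprocess_folder(src_dir, dst_dir, process_file: Callable[[Path, Path], None]) -> int:
    """Run process_file(src, dst) once per file, mirroring src_dir into dst_dir.

    Files that already exist in dst_dir are skipped, so re-running is cheap.
    Returns the number of files processed in this call.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    processed = 0
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_dir / src.relative_to(src_dir)
        if dst.exists():
            continue  # already preprocessed on an earlier run
        dst.parent.mkdir(parents=True, exist_ok=True)
        process_file(src, dst)
        processed += 1
    return processed


def to_gray(src: Path, dst: Path) -> None:
    """Example step: read with OpenCV, convert to grayscale, write the result."""
    import cv2  # imported here so the helper above works without OpenCV installed

    image = cv2.imread(str(src))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cv2.imwrite(str(dst), image)
```

A call such as `preprocess_folder("data/raw", "data/processed", to_gray)` (both paths are made up) would then be run once, and the ImageDataBunch pointed at the processed folder. The destination should not sit inside the source folder, otherwise the already processed files would be picked up again.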

Thanks for replying!

I was looking for a way to use the processed images directly.
I want to avoid the save-and-load step.

I see. Then I think you have to change the “open” method of the DataBunch. The default loads an image from disk; you could change it to load and preprocess the image, for example.

As I checked,

  • “DataBunch” is using “ImageList”.
  • “ImageList” is using “open”
  • “open” is using “open_image(fn, convert_mode=self.convert_mode, after_open=self.after_open)”

In the code above, I changed “open_image” like this:

  • = load_format

Is this what you mean?


Ok, now I understand what you did. If I look at your code, you set

before fitting. But that is the original method and not the one you changed/implemented, isn’t it?
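A toy sketch of why restoring the original function before fitting brings the unprocessed images back. Nothing here is fastai code: `fake_module`, `LazyList`, and the two opener functions are stand-ins, and the assumption is only that the item list stores paths and calls the opener again on every access.

```python
import types


def open_original(path):
    return f"original:{path}"


def open_preprocessed(path):
    return f"preprocessed:{path}"


# Stand-in for a module whose function gets patched (hypothetical, not fastai).
fake_module = types.SimpleNamespace(open_image=open_original)


class LazyList:
    """Stores only paths; the opener is looked up again on every item access."""

    def __init__(self, paths):
        self.paths = list(paths)

    def __getitem__(self, i):
        return fake_module.open_image(self.paths[i])


items = LazyList(["a.png", "b.png"])

saved = fake_module.open_image
fake_module.open_image = open_preprocessed
during_patch = items[0]    # opened through the patched function

fake_module.open_image = saved
after_restore = items[0]   # same index, opened again through the original function
```

Under that assumption no preprocessed data is ever stored by the list, so whichever function is installed at access time decides what comes out.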

I was pointing out that “open_image” is called again while “learn.fit_one_cycle(1, slice(lr))” is running.

If I do “ = load_format”, of course I have no problem.
However, my heavy preprocessing then runs again inside “learn.fit_one_cycle(1, slice(lr))”, even though I already spent that time when I loaded the data with preprocessing.

In summary,

  • The image data is already preprocessed with “ = load_format” when I load data.
  • When I call “fit”, it seems to run the preprocessing with “ = load_format” again.
  • In my opinion, the preprocessing step should run once, not on every “fit”.

Maybe you could introduce a callback “preprocess_at_first_epoch” that is only called during the first epoch. But it is probably still best to preprocess independently of the fitting and put the preprocessed data in the DataBunch. Loading the images while fitting is the standard procedure and makes absolute sense because of memory limitations. Hope I could still help you a little bit :slight_smile:
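For comparison, here is a minimal sketch of the in-memory alternative, so the trade-off with memory is visible. It uses only the standard library; `heavy_preprocess`, `load_cached`, and `run_epoch` are hypothetical names, and the string result stands in for a real image.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the expensive step really runs


def heavy_preprocess(path: str) -> str:
    """Placeholder for the expensive step (e.g. imread + crop)."""
    calls["count"] += 1
    return f"preprocessed:{path}"


@lru_cache(maxsize=None)  # unbounded: every result stays in RAM
def load_cached(path: str) -> str:
    return heavy_preprocess(path)


def run_epoch(paths):
    """Stand-in for one pass over the data during fitting."""
    return [load_cached(p) for p in paths]


paths = ["a.png", "b.png", "c.png"]
run_epoch(paths)  # first epoch: the heavy step runs once per file
run_epoch(paths)  # second epoch: served from the cache
```

The heavy step runs once per file, but every preprocessed result is kept in memory, which is exactly the limitation mentioned above. A bounded `maxsize` limits memory use at the cost of recomputing evicted items.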

Then the best solution seems to be the one you suggested:

  • saving the processed data and loading it.

Thank you very much!
