Preprocessing issue with opencv

sijang · July 18, 2019, 5:26am

Hi.

I’m using OpenCV for the preprocessing step.
I followed the last comment at the following URL:

Opencv based image pre-processing step
The code is
###############
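import cv2
import numpy as np
from fastai import vision
from fastai.vision import Image, pil2tensor

def load_format(path, convert_mode, after_open)->Image:
    image = cv2.imread(str(path))  # str() because path may be a pathlib.Path
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    image = crop_image(image)  # call my own image transformation (defined elsewhere)
    return Image(pil2tensor(image, np.float32).div(255))  # return fastai Image format
vision.data.open_image = load_format  # patch the loader that ImageList.open calls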
#############

But my own “load_format” is computationally heavy.
The cost itself is not a big deal for me.

When I checked the loaded image after preprocessing, I could see the correct preprocessed image with

temp = vision.data.open_image
vision.data.open_image = load_format
(data loading)
learn = cnn_learner(data, arch, metrics=[accuracy])
test1 = learn.data.train_ds[15][0]

However, when I run

vision.data.open_image = temp
learn.fit_one_cycle(1, slice(lr))
test1 = learn.data.train_ds[15][0]

the preprocessed image is gone and I get the original image back.

So I guess “learn.fit_one_cycle” is using “vision.data.open_image”.

I want to spend the computing time for preprocessing only once, before training.

I don’t understand why “learn.fit_one_cycle” needs to open each image again.

What is the best solution for this problem?

ptrampert · July 18, 2019, 6:11am

If I have understood you correctly, you want to preprocess the images and use the new images instead of the old ones? Then just put the processed images into a folder and use them as input to your ImageDataBunch; if you do not want the images to be transformed, pass [] to tfms. I hope that helps.
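A minimal sketch of such a one-off preprocessing pass, in case it helps. The names preprocess_folder and opencv_gray, the folder layout, and the extension list are assumptions made up for illustration, not fastai functions; the heavy custom step (crop_image in the first post) would go where the comment indicates.

```python
from pathlib import Path
from typing import Callable


def opencv_gray(src: Path, dst: Path) -> None:
    """Example step: read with OpenCV, convert to grayscale, write to dst."""
    import cv2  # imported lazily so the folder walk below works without it

    image = cv2.imread(str(src))  # returns None if the file cannot be read
    if image is None:
        raise IOError(f"could not read {src}")
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # the heavy custom transformation would be applied here
    if not cv2.imwrite(str(dst), image):
        raise IOError(f"could not write {dst}")


def preprocess_folder(src_root, dst_root,
                      step: Callable[[Path, Path], None] = opencv_gray,
                      exts=(".png", ".jpg", ".jpeg")) -> int:
    """Mirror src_root into dst_root, running `step` once per image.

    Returns the number of images processed in this call.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    done = 0
    for src in sorted(src_root.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in exts:
            continue
        dst = dst_root / src.relative_to(src_root)  # keep the sub-folder (label) layout
        if dst.exists():  # already processed in an earlier run, skip
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        step(src, dst)
        done += 1
    return done
```

Calling preprocess_folder("data/raw", "data/processed") (hypothetical paths) once before training and then building the data from the processed folder means the heavy step is paid a single time; a second call skips files that already exist.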

sijang · July 18, 2019, 6:20am

Thanks for replying!

I was looking for a way to use the processed images directly.
I want to avoid the extra step of saving and loading.

ptrampert · July 18, 2019, 6:25am

I see. Then I think you have to change the “open” method of the DataBunch. The default is loading an image from disk; you could change this to loading and preprocessing it, for example.

sijang · July 18, 2019, 6:39am

As I checked,

“DataBunch” is using “ImageList”.
“ImageList” is using “open”
“open” is using “open_image(fn, convert_mode=self.convert_mode, after_open=self.after_open)”

In the above, I was changing “open_image” like

vision.data.open_image = load_format

Is this what you mean?

Thanks!

ptrampert · July 18, 2019, 6:43am

Ok, now I understand what you did. If I look at your code, you set

vision.data.open_image = temp

before fitting. But that is the original method and not the one you changed/implemented, isn’t it?

sijang · July 18, 2019, 6:58am

I was pointing out that “open_image” is called again while “learn.fit_one_cycle(1, slice(lr))” is running.

If I keep “vision.data.open_image = load_format”, of course I have no problem.
However, my heavy preprocessing then runs again inside “learn.fit_one_cycle(1, slice(lr))”, although I already spent that time when I loaded the data with preprocessing.

In summary,

The image data is already preprocessed via “vision.data.open_image = load_format” when I load the data.
When I call “fit”, it seems to run the preprocessing through “vision.data.open_image = load_format” again.
In my opinion, the preprocessing step should run once, not on every “fit”.

ptrampert · July 18, 2019, 7:06am

Maybe you could introduce a callback, say “preprocess_at_first_epoch”, that is only called during the first epoch. But it is probably still best to preprocess independently of the fitting and put the preprocessed data in the DataBunch. Loading the images while fitting is the standard procedure and makes perfect sense because of memory limitations. I hope I could still help you a little bit.
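To make the trade-off concrete, here is a toy sketch in plain Python. It is not fastai's actual code: LazyItems and CachedItems are made-up classes that only mimic a dataset that opens an item on every access versus one that keeps opened items in memory, and slow_open is a stand-in for the heavy preprocessing.

```python
class LazyItems:
    """Toy dataset that calls `opener` on every access (hypothetical, not fastai)."""

    def __init__(self, paths, opener):
        self.paths = list(paths)
        self.opener = opener

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        return self.opener(self.paths[i])  # the heavy step runs each time


class CachedItems(LazyItems):
    """Same interface, but each item is opened once and then kept in memory."""

    def __init__(self, paths, opener):
        super().__init__(paths, opener)
        self._cache = {}

    def __getitem__(self, i):
        if i not in self._cache:
            self._cache[i] = self.opener(self.paths[i])
        return self._cache[i]


calls = []

def slow_open(path):
    calls.append(path)   # record every time the "heavy" step runs
    return path.upper()  # stand-in for a preprocessed image

lazy = LazyItems(["a.png", "b.png"], slow_open)
lazy[0]; lazy[0]
print(len(calls))  # 2: opened twice

calls.clear()
cached = CachedItems(["a.png", "b.png"], slow_open)
cached[0]; cached[0]
print(len(calls))  # 1: opened once, then served from memory
```

The cached variant runs the heavy step once per item, but every processed image then stays in RAM, which is exactly the memory limitation mentioned above; saving the processed images to disk avoids both costs.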

sijang · July 18, 2019, 7:10am

Then the best solution looks like the one you suggested:

saving the processed data and loading it.

Thank you very much!