I’m wondering if anyone has pointers to tutorials or other resources on training a new model from scratch. I’m working with a corpus of 320x256 grayscale images. Each file is the raw output from a camera that allocates 16 bits per pixel in a single layer. While I could scale the 16 bits down into an 8-bit image, I’d likely lose a lot of information in the process.
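For context, here’s a sketch of how I’m loading one of these raw frames into a float array (the little-endian uint16 dtype and row-major 256x320 layout are assumptions about this particular camera’s output; adjust for your sensor):

```python
import numpy as np

HEIGHT, WIDTH = 256, 320  # assumed row-major layout for the 320x256 sensor


def load_raw_frame(path):
    """Read one raw frame (16 bits per pixel, single layer) as float32."""
    data = np.fromfile(path, dtype="<u2")  # assumption: little-endian uint16
    return data.reshape(HEIGHT, WIDTH).astype(np.float32)
```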
I’ve built other CNN-based classifiers in the past, though they’ve always dealt with color RGB images and have been based on a previously trained model. Starting from the beginning is entirely new for me, and I’d appreciate any tips people might have!
We use floats as inputs to our models, so the 16-bit data should work fine. Have a look at the threads about the iceberg competition for examples of how people are handling non-3-channel data at the moment.
Ultimately it will be a classifier, yes, though as a preceding step I was hoping to extract a region of interest (i.e. a CNN outputting a bounding box around the ROI) that would then in turn be classified.
To visualize the data I’d been collecting from our sensors, I had been scaling it in much the same way as in the “Kaggle Iceberg Challenge Starter Kit” thread, except that I was outputting a 1-channel grayscale image. To make it easier to use existing models that expect 3-channel RGB images, I suppose I could set each of the RGB channels to the same scaled value? This wouldn’t add any new information, but it would allow me to reuse an existing model as a starting point.
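Replicating the single scaled channel into three can be done with a plain NumPy stack (a sketch; `scaled` stands in for whatever 1-channel float array my scaling step produces):

```python
import numpy as np


def gray_to_3channel(scaled):
    """Duplicate a (H, W) single-channel array into (H, W, 3).

    Adds no information (the three channels are identical), but matches
    the input shape a 3-channel pretrained model expects."""
    return np.stack([scaled] * 3, axis=-1)
```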
I am concerned that squashing 16 bits of info (though I believe the sensor actually outputs only 14 bits) down into 8 bits will lose some granularity, though perhaps this is a “good enough” starting point…
Yes, that’s a common approach for reusing a 3-channel model with one channel. There are others, but we won’t cover them until part 2.
Great, I’ll start there then.
There’s no reason to squash down to 8 bits. What makes you think that might be needed? We use floats as our inputs.
I must have missed something… I thought that the models we had been using thus far (i.e. the dogs and cats) took 3-channel RGB images consisting of 8-bit values as input? Can you point me towards one that takes floats?
If you feed in floats between 0 and 255 all the models we’ve seen should work fine with no code changes. Let me know if you find this not to be the case - I haven’t tried it, but I know that it’s all floats internally.
Ah, ok! I can continue doing the scaling I’d been using in the past then, but instead of converting the final output to a uint8, I’ll just leave it as floating point. That’s great, thanks!
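The scaling step then just drops the final uint8 cast. A sketch, assuming the sensor really does use 14 of the 16 stored bits (so the maximum code is 2**14 - 1):

```python
import numpy as np

SENSOR_BITS = 14  # assumption: the camera uses 14 of the 16 stored bits


def scale_to_float(raw):
    """Map raw sensor counts onto [0, 255] as float32.

    No uint8 cast at the end, so the full 14-bit granularity survives."""
    max_code = float(2 ** SENSOR_BITS - 1)
    return raw.astype(np.float32) * (255.0 / max_code)
```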
@yinterian Just curious: would it also be OK/effective to use OpenCV’s function to convert grayscale to RGB, like below? img = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
BGR is the channel order OpenCV uses internally; doing a conversion to RGB would also work.
I have a feeling, however (unsubstantiated, as I haven’t tested it), that OpenCV will expect all of the values of the NumPy array to be uint8, as opposed to the floats Jeremy and I were discussing above.
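One way to sidestep the dtype question entirely is to skip cv2.cvtColor for this step and replicate the channel directly in NumPy, which doesn’t care about the dtype (a sketch, doing the same thing as GRAY2RGB, i.e. copying the one channel three times):

```python
import numpy as np


def gray_to_rgb_float(img):
    """cvtColor-free equivalent of GRAY2RGB that also accepts float arrays."""
    return np.repeat(img[..., np.newaxis], 3, axis=-1)
```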