I’m wondering if anyone has pointers to tutorials or other resources on training a new model from scratch. I’m working with a corpus of 320x256 grayscale images. Each file is the raw output from a camera that allocates 16 bits per pixel in a single layer. While I could scale the 16 bits down into an 8-bit image, I’d likely lose a lot of information in the process.
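For context, here’s a sketch of how I’m loading one of these raw frames into a float array (the little-endian uint16 dtype and row-major 256x320 layout are assumptions about this particular camera’s output; adjust for your sensor):

```python
import numpy as np

HEIGHT, WIDTH = 256, 320  # assumed row-major layout for the 320x256 sensor


def load_raw_frame(path):
    """Read one raw frame (16 bits per pixel, single layer) as float32."""
    data = np.fromfile(path, dtype="<u2")  # assumption: little-endian uint16
    return data.reshape(HEIGHT, WIDTH).astype(np.float32)
```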
I’ve built other CNN-based classifiers in the past, though they’ve always dealt with color RGB images and have been based on a previously trained model. Starting from the beginning is entirely new for me, and I’d appreciate any tips people might have!
We use floats as inputs to our models, so the 16-bit data should work fine. Have a look at the threads about the iceberg competition for examples of how people are handling non-3-channel data at the moment.
Ultimately it will be a classifier, yes, though as a preceding step I was hoping to extract a region of interest (i.e. a CNN outputting a bounding box around the ROI) that would then in turn be classified.
To visualize the data I’d been collecting from our sensors, I had been scaling it in much the same way as in the “Kaggle Iceberg Challenge Starter Kit” thread, except that I was outputting a 1-channel grayscale image. To make it easier to use existing models that expect 3-channel RGB images, I suppose I could set each of the RGB channels to the same scaled value? This wouldn’t add any new information, but it would allow me to reuse an existing model as a starting point.
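Replicating the single scaled channel into three can be done with a plain NumPy stack (a sketch; `scaled` stands in for whatever 1-channel float array my scaling step produces):

```python
import numpy as np


def gray_to_3channel(scaled):
    """Duplicate a (H, W) single-channel array into (H, W, 3).

    Adds no information (the three channels are identical), but matches
    the input shape a 3-channel pretrained model expects."""
    return np.stack([scaled] * 3, axis=-1)
```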
I am concerned that squashing 16 bits of info (though I believe the sensor actually outputs only 14 bits) down into 8 bits will lose some granularity, though perhaps this is a “good enough” starting point…
Yes, that’s a common approach for reusing a 3-channel model with one channel. There are others, but we won’t cover them until part 2.
Great, I’ll start there then.
There’s no reason to squash down to 8 bits. What makes you think that might be needed? We use floats as our inputs.
I must have missed something… I thought that the models we had been using thus far (i.e. the dogs and cats) took 3-channel RGB images consisting of 8-bit values as input? Can you point me towards one that takes floats?
If you feed in floats between 0 and 255 all the models we’ve seen should work fine with no code changes. Let me know if you find this not to be the case - I haven’t tried it, but I know that it’s all floats internally.
Ah, ok! I can continue doing the scaling I’d been using in the past then, but instead of converting the final output to a uint8, I’ll just leave it as floating point. That’s great, thanks!
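The scaling step then just drops the final uint8 cast. A sketch, assuming the sensor really does use 14 of the 16 stored bits (so the maximum code is 2**14 - 1):

```python
import numpy as np

SENSOR_BITS = 14  # assumption: the camera uses 14 of the 16 stored bits


def scale_to_float(raw):
    """Map raw sensor counts onto [0, 255] as float32.

    No uint8 cast at the end, so the full 14-bit granularity survives."""
    max_code = float(2 ** SENSOR_BITS - 1)
    return raw.astype(np.float32) * (255.0 / max_code)
```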
@yinterian Just curious: would it also be OK/effective to use OpenCV’s function to convert grayscale to RGB, like below? img = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)
BGR is the channel order OpenCV uses internally; doing a conversion to RGB would also work.
I have a feeling, however (unsubstantiated, as I haven’t tested it), that OpenCV will expect all of the values of the NumPy array to be uint8, as opposed to the floats Jeremy and I were discussing above.
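One way to sidestep the dtype question entirely is to skip cv2.cvtColor for this step and replicate the channel directly in NumPy, which doesn’t care about the dtype (a sketch, doing the same thing as GRAY2RGB, i.e. copying the one channel three times):

```python
import numpy as np


def gray_to_rgb_float(img):
    """cvtColor-free equivalent of GRAY2RGB that also accepts float arrays."""
    return np.repeat(img[..., np.newaxis], 3, axis=-1)
```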