3D convolution examples

(Charles Lu) #1

Does anyone know of any good examples of 3D convolution in Keras? I haven’t been able to find any code that handles video data stored as 3D numpy arrays. Much appreciated!

(David Gutman) #2

Are you sure you want to do a 3D convolution? You’ll need to train your weights from scratch; otherwise the concept is the same. FYI, your input tensors will actually be 5D, (batch, frames, height, width, channels), just as Conv2D inputs are 4D.
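To make the 5D point concrete, here is a minimal sketch assuming grayscale clips of 16 frames at 64x64 pixels (sizes and filter counts are made up for illustration) and Keras' default channels_last layout. Each clip starts as a 3D numpy array; stacking clips and adding a channel axis gives the 5D tensor Conv3D expects, while Conv2D works on 4D per-frame input:

```python
import numpy as np
import keras
from keras import layers

# Hypothetical data: 2 grayscale clips, each a 3D array of shape (frames, height, width).
clips = [np.random.rand(16, 64, 64).astype("float32") for _ in range(2)]

# Stack into a batch and add a trailing channel axis:
# (batch, frames, height, width, channels), the channels_last default.
video_batch = np.stack(clips)[..., np.newaxis]
print(video_batch.shape)  # (2, 16, 64, 64, 1)

# Conv3D takes the 5D tensor and convolves over frames, height and width at once.
conv3d = layers.Conv3D(8, kernel_size=(3, 3, 3), padding="same")
out3d = conv3d(video_batch)
print(tuple(out3d.shape))  # (2, 16, 64, 64, 8)

# Conv2D takes 4D input, (batch, height, width, channels), e.g. the first frame of each clip.
conv2d = layers.Conv2D(8, kernel_size=(3, 3), padding="same")
out2d = conv2d(video_batch[:, 0])
print(tuple(out2d.shape))  # (2, 64, 64, 8)
```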

If you use 2D convolutions with the TimeDistributed layer wrapper, you can use a network pretrained on ImageNet. That might be the better approach unless you have a lot of resources and data.


(Charles Lu) #3

The video data (medical) I have is unlike ImageNet data, so I think it’s best to train the weights from scratch.

(Charles Lu) #4

I found this Kaggle script, but it’s in TensorFlow, not Keras. If anyone knows of any good Keras examples, I’d be very appreciative.

(David Gutman) #5

Is your video data 2D video or 3D video (e.g. Cardiac MRI)?

If it’s 2D, you may still want to use TimeDistributed Conv2D layers.
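TimeDistributed applies one wrapped layer, with one shared set of weights, to every frame independently and keeps the time axis. A quick check of that behavior, assuming a random batch of two 5-frame 32x32 grayscale clips (sizes are arbitrary):

```python
import numpy as np
import keras
from keras import layers

# Hypothetical clip batch: (batch, frames, height, width, channels)
clips = np.random.rand(2, 5, 32, 32, 1).astype("float32")

conv = layers.Conv2D(4, (3, 3), padding="same")
td_conv = layers.TimeDistributed(conv)  # wraps this same Conv2D instance

# TimeDistributed runs conv on each frame and stacks the results along the time axis.
td_out = np.asarray(td_conv(clips))  # (2, 5, 32, 32, 4)

# Equivalent manual loop: the shared Conv2D applied frame by frame.
manual = np.stack([np.asarray(conv(clips[:, t])) for t in range(clips.shape[1])], axis=1)
```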

Keras has built-in Conv3D, MaxPooling3D, and GlobalAveragePooling3D layers that all work like their 2D counterparts.
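A minimal from-scratch Conv3D classifier using those three layers, assuming 16-frame 64x64 grayscale clips and a hypothetical two-class label. build_conv3d_classifier is an illustrative helper, and the filter counts and pooling sizes are untuned guesses. Pooling only over space in the first block keeps full temporal resolution for the early layers, which is a common but optional choice:

```python
import numpy as np
import keras
from keras import layers

def build_conv3d_classifier(input_shape=(16, 64, 64, 1), num_classes=2):
    """Hypothetical small 3D CNN. Input: (frames, height, width, channels)."""
    inputs = keras.Input(shape=input_shape)
    x = layers.Conv3D(16, (3, 3, 3), activation="relu", padding="same")(inputs)
    x = layers.MaxPooling3D((1, 2, 2))(x)   # pool space only: (16, 32, 32, 16)
    x = layers.Conv3D(32, (3, 3, 3), activation="relu", padding="same")(x)
    x = layers.MaxPooling3D((2, 2, 2))(x)   # pool time and space: (8, 16, 16, 32)
    x = layers.Conv3D(64, (3, 3, 3), activation="relu", padding="same")(x)
    x = layers.GlobalAveragePooling3D()(x)  # one 64-d vector per clip
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage with random stand-in data (replace with real clips and integer labels):
model = build_conv3d_classifier()
x = np.random.rand(4, 16, 64, 64, 1).astype("float32")
y = np.random.randint(0, 2, size=(4,))
model.fit(x, y, epochs=1, batch_size=2, verbose=0)
```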

(Charles Lu) #6

It’s a 2D rotation video of ciliary motion. Do you know of any examples of time-distributed 2D convolutions in Keras or TensorFlow?

(William Minshew) #7

Check out the last two examples here (pasted below):

import keras
from keras.layers import Conv2D, MaxPooling2D, Flatten
from keras.layers import Input, LSTM, Embedding, Dense
from keras.models import Model, Sequential

# First, let's define a vision model using a Sequential model.
# This model will encode an image into a vector.
# Shapes use Keras' default channels_last layout: (height, width, channels).
vision_model = Sequential()
vision_model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(224, 224, 3)))
vision_model.add(Conv2D(64, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
vision_model.add(Conv2D(128, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
vision_model.add(Conv2D(256, (3, 3), activation='relu'))
vision_model.add(Conv2D(256, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Flatten())  # turn the final feature maps into a single vector

# Now let's get a tensor with the output of our vision model:
image_input = Input(shape=(224, 224, 3))
encoded_image = vision_model(image_input)

# Next, let's define a language model to encode the question into a vector.
# Each question will be at most 100 words long,
# and we will index words as integers from 1 to 9999.
question_input = Input(shape=(100,), dtype='int32')
# The sequence length (100) comes from question_input, so no input_length is needed.
embedded_question = Embedding(input_dim=10000, output_dim=256)(question_input)
encoded_question = LSTM(256)(embedded_question)

# Let's concatenate the question vector and the image vector:
merged = keras.layers.concatenate([encoded_question, encoded_image])

# And let's train a logistic regression over 1000 words on top:
output = Dense(1000, activation='softmax')(merged)

# This is our final model:
vqa_model = Model(inputs=[image_input, question_input], outputs=output)

# The next stage would be training this model on actual data.

from keras.layers import TimeDistributed

# Each video: 100 frames, each frame (224, 224, 3) like the images above.
video_input = Input(shape=(100, 224, 224, 3))
# This is our video encoded via the previously trained vision_model (weights are reused)
encoded_frame_sequence = TimeDistributed(vision_model)(video_input)  # the output will be a sequence of vectors
encoded_video = LSTM(256)(encoded_frame_sequence)  # the output will be a vector

# This is a model-level representation of the question encoder, reusing the same weights as before:
question_encoder = Model(inputs=question_input, outputs=encoded_question)

# Let's use it to encode the question:
video_question_input = Input(shape=(100,), dtype='int32')
encoded_video_question = question_encoder(video_question_input)

# And this is our video question answering model:
merged = keras.layers.concatenate([encoded_video, encoded_video_question])
output = Dense(1000, activation='softmax')(merged)
video_qa_model = Model(inputs=[video_input, video_question_input], outputs=output)
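For plain video classification (no question input), the video branch above can be cut down to a frame encoder wrapped in TimeDistributed, followed by an LSTM and a classifier. A minimal from-scratch sketch; build_frame_lstm_classifier is a hypothetical helper, and the layer sizes, the 10-frame 64x64 grayscale input and the two classes are assumptions, with random arrays standing in for real clips:

```python
import numpy as np
import keras
from keras import layers

def build_frame_lstm_classifier(num_frames=10, size=64, channels=1, num_classes=2):
    """Hypothetical helper: small per-frame CNN (trained from scratch) plus an LSTM."""
    # Frame encoder in the spirit of vision_model above, but much smaller.
    frame_encoder = keras.Sequential([
        keras.Input(shape=(size, size, channels)),
        layers.Conv2D(16, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.GlobalAveragePooling2D(),  # one 32-d vector per frame
    ])
    video_input = keras.Input(shape=(num_frames, size, size, channels))
    encoded_frames = layers.TimeDistributed(frame_encoder)(video_input)  # (batch, frames, 32)
    encoded_video = layers.LSTM(64)(encoded_frames)  # one vector per clip
    output = layers.Dense(num_classes, activation="softmax")(encoded_video)
    model = keras.Model(video_input, output)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage with random stand-in data (replace with real clips and integer labels):
model = build_frame_lstm_classifier()
x = np.random.rand(4, 10, 64, 64, 1).astype("float32")
y = np.random.randint(0, 2, size=(4,))
model.fit(x, y, epochs=1, batch_size=2, verbose=0)
```

Using GlobalAveragePooling2D instead of Flatten keeps each per-frame vector small, so the LSTM stays cheap compared with feeding it flattened 224x224 feature maps.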

(Jeremy Howard) #8

Most medical imaging data, funnily enough, actually does benefit from fine-tuning from ImageNet weights. The early layers learn to recognize generic geometric shapes and patterns that are useful for most types of image data.

Here’s some (old) Keras Conv3D code: https://gist.github.com/albertomontesg/d8b21a179c1e6cca0480ebdf292c34d2