3D convolution examples

Does anyone know of any good examples using 3D convolution in Keras? I haven’t been able to find any code that deals with video data as 3D numpy arrays. I’d much appreciate the help.

Are you sure you want to do a 3D convolution? You’ll need to train your weights from scratch; otherwise the concept is the same as 2D convolution (FYI, your input tensors will actually be 5D, just as Conv2D inputs are 4D).
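
To make the 5D vs. 4D point concrete, here is a minimal NumPy sketch. It assumes grayscale frames and the channels-last layout (batch, frames, height, width, channels); the clip size and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical clip: 24 grayscale frames of 100x100 pixels.
frames = [np.zeros((100, 100), dtype=np.float32) for _ in range(24)]

# Stacking the 2D frames gives one video as a 3D array: (frames, height, width).
video = np.stack(frames, axis=0)

# Conv3D also expects a channel axis: (frames, height, width, channels).
video = video[..., np.newaxis]

# A batch of videos is therefore 5D: (batch, frames, height, width, channels).
video_batch = np.stack([video, video], axis=0)

# For comparison, a batch of single images for Conv2D is 4D:
# (batch, height, width, channels).
image_batch = video_batch[:, 0]

print(video_batch.shape)  # (2, 24, 100, 100, 1)
print(image_batch.shape)  # (2, 100, 100, 1)
```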

If you use 2D convolutions with the TimeDistributed layer wrapper, you can use a network pretrained on ImageNet. That might be the better approach unless you have a lot of resources and data.



The video data (medical) I have is unlike the data in ImageNet, so I think it’s best to train the weights from scratch.

I found this Kaggle script, but it’s in TensorFlow, not Keras. If anyone knows of any good Keras examples, I’d be very appreciative.


Is your video data 2D video or 3D video (e.g. Cardiac MRI)?

If it’s 2D, you may still want to use Conv2D layers wrapped in TimeDistributed.
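
Roughly speaking, wrapping a 2D model in TimeDistributed means applying the same per-image model to every frame and getting back one feature vector per frame. The NumPy sketch below only illustrates that idea; it is not Keras's implementation, and `encode_frame_batch` is a made-up stand-in for a real 2D CNN.

```python
import numpy as np

def encode_frame_batch(images):
    """Stand-in for a 2D CNN: maps (n, height, width, channels) to (n, features).

    Here the "features" are just the per-channel spatial means.
    """
    return images.mean(axis=(1, 2))

def time_distributed(encoder, videos):
    """Apply a per-image encoder to every frame of every video."""
    batch, timesteps = videos.shape[:2]
    # Fold the time axis into the batch axis so the encoder sees plain images.
    folded = videos.reshape((batch * timesteps,) + videos.shape[2:])
    encoded = encoder(folded)
    # Unfold into one feature vector per frame: (batch, timesteps, features).
    return encoded.reshape((batch, timesteps) + encoded.shape[1:])

# (batch, frames, height, width, channels)
videos = np.random.rand(2, 10, 64, 64, 3)
sequence = time_distributed(encode_frame_batch, videos)
print(sequence.shape)  # (2, 10, 3)
```

The resulting (batch, frames, features) sequence is the kind of input a recurrent layer such as an LSTM expects.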

Keras has built-in Conv3D, MaxPooling3D, and GlobalAveragePooling3D layers that all work like their 2D counterparts.

It’s 2D rotation video of ciliary motion. Do you know of any time-distributed 2D examples in Keras or TensorFlow?

Check out the last two examples here (pasted below)

import keras
from keras.layers import Conv2D, MaxPooling2D, Flatten
from keras.layers import Input, LSTM, Embedding, Dense
from keras.models import Model, Sequential

# First, let's define a vision model using a Sequential model.
# This model will encode an image into a vector.
# Note: input_shape=(3, 224, 224) assumes the 'channels_first' image data format.
vision_model = Sequential()
vision_model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(3, 224, 224)))
vision_model.add(Conv2D(64, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
vision_model.add(Conv2D(128, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
vision_model.add(Conv2D(256, (3, 3), activation='relu'))
vision_model.add(Conv2D(256, (3, 3), activation='relu'))
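vision_model.add(MaxPooling2D((2, 2)))
# Flatten the feature maps so that each image is encoded as a single vector.
vision_model.add(Flatten())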

# Now let's get a tensor with the output of our vision model:
image_input = Input(shape=(3, 224, 224))
encoded_image = vision_model(image_input)

# Next, let's define a language model to encode the question into a vector.
# Each question will be at most 100 words long,
# and we will index words as integers from 1 to 9999.
question_input = Input(shape=(100,), dtype='int32')
embedded_question = Embedding(input_dim=10000, output_dim=256, input_length=100)(question_input)
encoded_question = LSTM(256)(embedded_question)

# Let's concatenate the question vector and the image vector:
merged = keras.layers.concatenate([encoded_question, encoded_image])

# And let's train a logistic regression over 1000 words on top:
output = Dense(1000, activation='softmax')(merged)

# This is our final model:
vqa_model = Model(inputs=[image_input, question_input], outputs=output)

# The next stage would be training this model on actual data.

from keras.layers import TimeDistributed

video_input = Input(shape=(100, 3, 224, 224))
# This is our video encoded via the previously trained vision_model (weights are reused)
encoded_frame_sequence = TimeDistributed(vision_model)(video_input)  # the output will be a sequence of vectors
encoded_video = LSTM(256)(encoded_frame_sequence)  # the output will be a vector

# This is a model-level representation of the question encoder, reusing the same weights as before:
question_encoder = Model(inputs=question_input, outputs=encoded_question)

# Let's use it to encode the question:
video_question_input = Input(shape=(100,), dtype='int32')
encoded_video_question = question_encoder(video_question_input)

# And this is our video question answering model:
merged = keras.layers.concatenate([encoded_video, encoded_video_question])
output = Dense(1000, activation='softmax')(merged)
video_qa_model = Model(inputs=[video_input, video_question_input], outputs=output)

Most medical imaging data, funnily enough, actually does benefit from fine-tuning from ImageNet. The early layers learn to recognize various geometric shapes and patterns that are useful for most types of image data.

Here’s some (old) Keras Conv3D code: https://gist.github.com/albertomontesg/d8b21a179c1e6cca0480ebdf292c34d2


from keras.models import Sequential
from keras.layers import Conv3D, MaxPooling3D, Dropout, Flatten, Dense

def get_liveness_model():
    # Input: 24 frames of 100x100 single-channel images (channels-last ordering).
    model = Sequential()
    model.add(Conv3D(32, kernel_size=(3, 3, 3), activation='relu', input_shape=(24, 100, 100, 1)))
    model.add(Conv3D(64, (3, 3, 3), activation='relu'))
    model.add(MaxPooling3D(pool_size=(2, 2, 2)))
    model.add(Conv3D(64, (3, 3, 3), activation='relu'))
    model.add(MaxPooling3D(pool_size=(2, 2, 2)))
    model.add(Conv3D(64, (3, 3, 3), activation='relu'))
    model.add(MaxPooling3D(pool_size=(2, 2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    # Classifier head: two output classes.
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2, activation='softmax'))
    return model
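
To see how the liveness model above shrinks its input, the shapes can be traced by hand. This is a plain-Python sketch, assuming the Keras defaults for these layers (unpadded 'valid' convolutions with stride 1, and pooling with a stride equal to the pool size); the helper names are made up.

```python
def conv_valid(size, kernel=3):
    """Output length of an unpadded ('valid'), stride-1 convolution along one axis."""
    return size - kernel + 1

def pool(size, window=2):
    """Output length of non-overlapping pooling along one axis."""
    return size // window

# (frames, height, width) taken from input_shape=(24, 100, 100, 1);
# the channel count is tracked separately.
shape = (24, 100, 100)
layers = ["conv", "conv", "pool", "conv", "pool", "conv", "pool"]

for layer in layers:
    step = conv_valid if layer == "conv" else pool
    shape = tuple(step(n) for n in shape)
    print(layer, shape)

filters = 64  # the last Conv3D layer has 64 filters
flattened = shape[0] * shape[1] * shape[2] * filters
print(flattened)  # 6400
```

The frame axis ends up at length 1, so under these assumptions 24 frames is the shortest clip this stack of layers can accept.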

Any idea how to do transfer learning with face images? Any suggestions or related code?