CNN with RNN for Video Classification

I’ve been trying to do some video/action classification by extracting a sequence of frames from each video. The first thing I did was just run the frames through a ResNet model, but I think that throws away the sequence aspect of the video/action, since each frame gets classified on its own.

Something I've seen that might fix that is to remove the output layer from the ResNet model and feed its features into an RNN. I'm wondering if anyone has any input on this, and also on how the data should be organized. Should the training folder be, say, 10 folders (one per action), each with however many images?

I was trying to stick with models I've worked with before, but I have also seen 3D models (3D CNNs). I'm not sure which would be better or easier to start with.
