Classification of video


I was wondering if there is any project/code I can use to classify video snippets.
What I want to do is mark up a juggling video by identifying the individual juggling tricks in it. Does anyone have an idea how to do this?
I’m interested in the theory side (which architecture) and the practical side as well (if there’s any code out there that I can use to build a prototype of this quickly).

(alex) #2

I’m also interested in this. I’ve started building it myself.

I’m running a truncated CNN on each frame of the video to get a vector of ~25k numbers from the final bottleneck layer representing that frame. Then I run an LSTM over the sequence of frame vectors to compute a single vector for the snippet, and feed the LSTM’s final hidden state into a softmax layer over the classes you’re predicting. I’m also working on stacking LSTMs.
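
A minimal sketch of that pipeline, assuming PyTorch. The `FrameEncoder` here is a tiny made-up stand-in for the truncated pretrained CNN (so the feature size is 128, not ~25k), and all class names, layer sizes, and the class count of 5 are illustrative assumptions, not the setup actually used in this thread.

```python
import torch
import torch.nn as nn


class FrameEncoder(nn.Module):
    """Hypothetical stand-in for a truncated pretrained CNN: one vector per frame."""

    def __init__(self, feature_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(32, feature_dim)

    def forward(self, frames):
        # frames: (N, 3, H, W) -> (N, feature_dim)
        return self.proj(self.conv(frames).flatten(1))


class SnippetClassifier(nn.Module):
    """Per-frame CNN features -> LSTM over time -> linear layer producing class logits."""

    def __init__(self, num_classes, feature_dim=128, hidden_dim=64, num_layers=1):
        super().__init__()
        self.encoder = FrameEncoder(feature_dim)
        # num_layers > 1 gives stacked LSTMs.
        self.lstm = nn.LSTM(feature_dim, hidden_dim, num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips):
        # clips: (B, T, 3, H, W), i.e. B snippets of T frames each
        b, t = clips.shape[:2]
        feats = self.encoder(clips.flatten(0, 1))  # (B*T, feature_dim)
        feats = feats.view(b, t, -1)               # (B, T, feature_dim)
        _, (h_n, _) = self.lstm(feats)             # h_n: (num_layers, B, hidden_dim)
        return self.head(h_n[-1])                  # logits from the top layer's final state


model = SnippetClassifier(num_classes=5)
clips = torch.randn(2, 8, 3, 64, 64)  # random data standing in for two 8-frame snippets
logits = model(clips)
probs = logits.softmax(dim=-1)        # per-snippet class probabilities
```

For training you would normally pass the raw `logits` to `nn.CrossEntropyLoss`, which applies the softmax internally. To get closer to the description above, the stand-in encoder could be swapped for a pretrained network with its classification layer removed.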


Does it work so far?


@markovbling: Any progress?

(alex) #5

Yeah, it works! Initially I just ran ResNet on each frame and then applied a sort of filter over the outputs to smooth the predictions. Adding the LSTM gave a decent improvement too. I’m writing a paper on it and will post when the draft is done…
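
The post doesn't say which filter was used. One simple possibility is a sliding-window majority vote over the per-frame labels, sketched below in plain Python. The function names, the window size, and the sample labels are all made up for illustration. The second helper collapses the smoothed labels into runs, which is roughly the "mark the tricks in the video" output asked about at the top of the thread.

```python
from collections import Counter


def smooth_labels(labels, window=5):
    """Replace each per-frame label with the most common label in a centered window."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        counts = Counter(labels[lo:hi])
        # On a tie, most_common returns the label seen first in the window.
        smoothed.append(counts.most_common(1)[0][0])
    return smoothed


def to_segments(labels):
    """Collapse a per-frame label sequence into (label, start_frame, end_frame) runs."""
    segments = []
    for i, label in enumerate(labels):
        if segments and segments[-1][0] == label:
            segments[-1] = (label, segments[-1][1], i)
        else:
            segments.append((label, i, i))
    return segments


# Made-up noisy per-frame predictions for a 10-frame clip.
raw = ["cascade", "cascade", "shower", "cascade", "cascade",
       "shower", "shower", "cascade", "shower", "shower"]
smoothed = smooth_labels(raw, window=3)
segments = to_segments(smoothed)
# segments == [("cascade", 0, 4), ("shower", 5, 9)]
```

A larger window removes more flicker but also blurs the boundaries between tricks, so the window size would need tuning against the frame rate and typical trick length.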


I would be very interested in seeing a video with the results from this!