I’m working on a project where we’re trying to count the occurrences of a certain event in video footage.
These events sometimes happen on overlapping time intervals, and we’re interested in how many occurrences happen in a given unit of time.
For now, I think I’m going to formulate the problem as a regression problem, where the network tries to guess the density with which the event occurs with respect to time.
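To make the regression target concrete, here is a rough sketch of what I have in mind: turning a list of labelled event intervals into a per-frame count of simultaneously active events (the names and the fixed frame rate below are just placeholders for illustration, not our actual setup).

```python
import numpy as np

def density_target(events, n_frames, fps=25):
    """events: list of (start_sec, end_sec) tuples; returns a per-frame
    count of simultaneously active events -- the 'density' to regress."""
    target = np.zeros(n_frames, dtype=np.float32)
    for start, end in events:
        a = int(start * fps)
        b = min(int(end * fps), n_frames)
        target[a:b] += 1.0  # overlapping events stack up in the count
    return target

# Two events that overlap between t=2s and t=3s:
t = density_target([(0.0, 3.0), (2.0, 5.0)], n_frames=150, fps=25)
```

The network would then try to predict this (possibly smoothed) curve from the video frames.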
The problem is, we don’t have a lot of labelled data to train the network on our specific problem.
It would be nice to have access to a pre-trained model, or at least a dataset containing videos with a variety of annotated “events” (which sometimes happen on overlapping time intervals) on which I could pre-train my network before I fine-tune on our task. However, I’ve been googling for a few days and I can’t find anything on this subject.
Any suggestions on a dataset we could use or adapt to this purpose?
How much labeled data do you have? One option is to train a model on that, get predictions on your unlabeled data, go through them, and correct those predictions to gradually label your new data, then retrain and repeat. Ideally you should be adding ~30% or so of ‘new’ data every cycle and slowly work your way through the whole set. One thought.
This is semi-supervised learning.
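Specifically, self-training. A minimal toy sketch of the label / predict / fold-in cycle, using a simple nearest-centroid classifier on 1-D data just to show the loop structure (everything here is illustrative, not tied to the video task):

```python
import numpy as np

def fit_centroids(X, y):
    """One centroid per class -- stand-in for 'train a model'."""
    return np.array([X[y == c].mean() for c in (0, 1)])

def predict(cents, X):
    """Return predicted class and a crude confidence in [0, 1]."""
    d = np.abs(X[:, None] - cents[None, :])     # distance to each centroid
    conf = 1.0 - d.min(axis=1) / d.sum(axis=1)  # near one centroid => high
    return d.argmin(axis=1), conf

rng = np.random.default_rng(0)
X_lab = np.array([0.0, 0.2, 1.0, 1.2])          # tiny labeled seed set
y_lab = np.array([0, 0, 1, 1])
X_unl = rng.normal([0.1] * 30 + [1.1] * 30, 0.1)  # unlabeled pool

for _ in range(3):  # each cycle folds confidently-labeled data back in
    cents = fit_centroids(X_lab, y_lab)
    yhat, conf = predict(cents, X_unl)
    keep = conf > 0.8                # only trust confident predictions
    X_lab = np.concatenate([X_lab, X_unl[keep]])
    y_lab = np.concatenate([y_lab, yhat[keep]])
    X_unl = X_unl[~keep]
```

In practice you would replace the centroid model with your density-regression network and review the pseudo-labels by hand, as suggested above, rather than trusting a confidence threshold blindly.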
I have about 88 hours of labelled footage, but the event we need to detect is quite rare: we have only about 770 events of about 10 seconds each to train on. Of course, those would make up only a fraction of the training set, the rest being random clips sampled from parts of the footage where nothing was happening (event density = 0).
That sounds like something we could do; however, I doubt whether the quality of our annotations is good enough for such an approach.
If you need to, start with less labeled data. Go through maybe 10% of it and make sure those labels are good, then run the semi-supervised approach on the labeled data you’re doubting.