Batch predict from video frames

I’m trying to classify video frames, and for that I need to create a batch from a sequence of frames and pass it to the model. Currently I am classifying frame by frame, as follows:

>     cap = cv2.VideoCapture(video_path)
>     _, frame = cap.read()
>     frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
>     img_t = pil2tensor(frame, np.float32)
>     img_t.div_(255.0)
>     image = Image(img_t)
>     pred = model.predict(image)

Is there a way to create a batch directly from frames, or to add them to a dataset and get the batch from it?

Thanks in advance

The way to go would be to create a custom Dataset that loads your data in __init__ (if it fits in memory) and whose __getitem__ returns one video and its label.

Once you have done that, a standard DataLoader can be set up with this Dataset object and your batch size. PyTorch/FastAi will then automatically take care of generating your batches.

One example below:
Note: my dataset doesn’t fit in memory (~1.5TB), so I load the files in __getitem__.

import numpy as np
import torch
from torch.utils.data import Dataset


class SequenceDataset(Dataset):
    def __init__(self,
                 file_list: np.ndarray = None,
                 list_labels: np.ndarray = None) -> None:
        super().__init__()
        self._list_sequence_file = file_list
        self.labels = list_labels

    def __getitem__(self, index):
        # Raise instead of returning (None, None), which the default
        # collate function of DataLoader cannot batch
        if index >= len(self._list_sequence_file):
            raise IndexError(index)

        # Load one sequence from disk on demand
        sequence = np.load(self._list_sequence_file[index])
        sequence = sequence.astype(np.float32)

        # Any needed normalization

        # Add a leading axis of size 1, then convert to a float tensor
        sequence = np.expand_dims(sequence, axis=0)
        sequence = torch.from_numpy(sequence).float()
        return sequence, self.labels[index]

    def __len__(self) -> int:
        return len(self._list_sequence_file)
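
To show how a Dataset like this plugs into a DataLoader, here is a minimal sketch. It uses a hypothetical in-memory stand-in (ToySequenceDataset) with random data instead of files on disk, and the sequence sizes and batch size are made up for illustration:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ToySequenceDataset(Dataset):
    """Hypothetical in-memory stand-in for a file-backed sequence dataset."""

    def __init__(self, sequences: np.ndarray, labels: np.ndarray) -> None:
        self.sequences = sequences
        self.labels = labels

    def __getitem__(self, index: int):
        # Convert one sequence to a float tensor and return it with its label
        sequence = torch.from_numpy(self.sequences[index]).float()
        return sequence, int(self.labels[index])

    def __len__(self) -> int:
        return len(self.sequences)


# 10 fake sequences of 8 frames, each frame 3x32x32 (made-up sizes)
sequences = np.random.rand(10, 8, 3, 32, 32).astype(np.float32)
labels = np.random.randint(0, 2, size=10)

loader = DataLoader(ToySequenceDataset(sequences, labels),
                    batch_size=4, shuffle=False)

# The default collate function stacks the samples along a new first axis
batch_shapes = []
for batch, batch_labels in loader:
    batch_shapes.append((tuple(batch.shape), tuple(batch_labels.shape)))
```

With 10 samples and a batch size of 4, the last batch is smaller than the others unless drop_last=True is passed to the DataLoader.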

@bennnun thanks for your answer, I’ll try this.
But I was wondering: since I’m using an exported model, where should I (if needed) apply the transformation function tfms (for example, the one that normalizes the tensor)?

@kyis, I don’t think (not sure though) that the FastAi library is able to apply transformations to videos through the data bunch.

I think your best option is to apply the needed transformations directly in the __getitem__ function before returning the two tensors (video + label).
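
A minimal sketch of doing the normalization by hand, assuming RGB uint8 frames and ImageNet mean/std. Both are assumptions: the statistics must match whatever the exported model was trained with, and frames_to_batch is a hypothetical helper, not a FastAi function:

```python
import numpy as np
import torch

# Assumed ImageNet statistics; replace with the ones used at training time
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)


def frames_to_batch(frames):
    """Stack RGB uint8 frames (H, W, 3) into a normalized (N, 3, H, W) batch."""
    stacked = np.stack(frames)                      # (N, H, W, 3)
    batch = torch.from_numpy(stacked).permute(0, 3, 1, 2).float()
    batch = batch / 255.0                           # scale to [0, 1]
    return (batch - MEAN) / STD                     # per-channel normalization


# Five dummy all-white frames stand in for frames read from a video
frames = [np.full((4, 6, 3), 255, dtype=np.uint8) for _ in range(5)]
batch = frames_to_batch(frames)
```

The same three steps (channels first, scale to [0, 1], subtract mean and divide by std) can be applied to a single sequence inside __getitem__ before it is returned.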

@bennnun OK, I’ll go this way, many thanks :slight_smile: