[Q] Best way to preprocess image + text sequences for feeding into an image captioning model?

lobvh · September 27, 2021, 5:35pm

This is my first time doing image captioning (IC) modeling. I want to extract images and their corresponding text sequences, but I don’t know how to organise them. What is the best way to organise those files for training an image captioning model?