I was wondering: is it possible to use a transformer such as FashionCLIP for image classification?
```python
from transformers import AutoProcessor, AutoModelForZeroShotImageClassification

# Load the FashionCLIP processor (tokenizer + image preprocessing) and model from the Hugging Face Hub
processor = AutoProcessor.from_pretrained("patrickjohncyh/fashion-clip")
model = AutoModelForZeroShotImageClassification.from_pretrained("patrickjohncyh/fashion-clip")
```
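Here is a minimal sketch of zero-shot classification with the FashionCLIP processor and model loaded above. It assumes the standard CLIP outputs: the processor tokenizes a list of free-text class prompts and preprocesses the images, and `logits_per_image` holds image-to-prompt similarity scores that a softmax turns into class probabilities. The helper names `load_fashion_clip` and `zero_shot_classify`, the prompt wording, and the file name in the usage comment are hypothetical.

```python
import torch
from transformers import AutoProcessor, AutoModelForZeroShotImageClassification


def load_fashion_clip(checkpoint="patrickjohncyh/fashion-clip"):
    """Download (or load from cache) the FashionCLIP processor and model."""
    processor = AutoProcessor.from_pretrained(checkpoint)
    model = AutoModelForZeroShotImageClassification.from_pretrained(checkpoint)
    model.eval()  # inference only: disables dropout
    return processor, model


def zero_shot_classify(images, labels, processor, model):
    """Return a (num_images, num_labels) tensor of class probabilities.

    `images` is a list of PIL images; `labels` is a list of text prompts,
    one per class, e.g. "a photo of a red dress".
    """
    # The processor tokenizes the prompts and resizes/normalizes the images.
    inputs = processor(text=labels, images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image[i, j] is the scaled similarity of image i to prompt j;
    # a softmax over the prompts turns each row into class probabilities.
    return outputs.logits_per_image.softmax(dim=-1)


# Example usage (hypothetical prompts and file name; needs Pillow and network access):
# from PIL import Image
# processor, model = load_fashion_clip()
# labels = ["a photo of a t-shirt", "a photo of a dress", "a photo of sneakers"]
# probs = zero_shot_classify([Image.open("example.jpg").convert("RGB")], labels, processor, model)
# print(labels[probs[0].argmax().item()])
```

Because nothing is trained here, accuracy depends heavily on how the class prompts are phrased, so it is worth trying a few templates per class. FashionCLIP is a full CLIP model and considerably larger than a 40 MB ResNet, so on its own it does not meet a lightweight deployment budget.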
Is there any way to incorporate this model to get better results?
I would like to use a custom transformer together with MobileNet or YOLO classification models to achieve better results while staying lightweight. I am currently using ResNet for image classification, but the models are around 40 MB.
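One common way to keep the accuracy benefit of a large model while deploying a small one is knowledge distillation. FashionCLIP acts as a frozen teacher, and a lightweight student (for example a MobileNet) is trained to match both the teacher's softened class distribution and the ground-truth labels. The following is a sketch of the standard Hinton-style distillation loss in plain PyTorch. It assumes the teacher logits are FashionCLIP's `logits_per_image` computed over one prompt per student class. The function names and the default `temperature` and `alpha` values are illustrative assumptions, not tuned settings. A small helper for checking a model's parameter size is included as well.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, targets, temperature=4.0, alpha=0.5):
    """Hinton-style knowledge distillation loss.

    student_logits: (batch, num_classes) from the small model, e.g. a MobileNet
    teacher_logits: (batch, num_classes); ASSUMPTION: FashionCLIP logits_per_image
                    over one text prompt per student class
    targets:        (batch,) ground-truth class indices
    """
    # Soft-target term: KL(teacher || student) on temperature-softened distributions.
    # Multiplying by T^2 keeps its gradient scale comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard-target term: ordinary cross-entropy against the real labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard


def model_size_mb(model: torch.nn.Module) -> float:
    """Approximate size of a model's parameters in megabytes (1 MB = 1e6 bytes)."""
    return sum(p.numel() * p.element_size() for p in model.parameters()) / 1e6
```

The teacher is only needed during training. If you train without per-epoch random augmentation, its logits can be precomputed once per training image and stored, so the deployed model is just the student. For sizing, fp32 weights take 4 bytes per parameter, so a ~40 MB checkpoint corresponds to roughly 10 million parameters. `model_size_mb` lets you compare candidate students against that budget.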