This question is mostly for @sgugger and @etremblay, as they worked on MixedItemList, but I hope others familiar with multi-input networks can help me as well, since this doesn’t appear to be a fully supported scenario in v1 yet.
I’m working on a project that takes frames from the webcam using OpenCV, compares each one against a reference image, and then outputs two floats: (refImage + webcamFrame) => (x, y). I built a custom PyTorch network that takes two input images and, after some Googling, figured out how to use the experimental MixedItemList, using this thread as well as the source code put up by @etremblay, to get training going (which appears to be working fine).
```python
from fastai.vision import *  # fastai v1; also brings in torch, nn, pd and models

class YNet(nn.Module):
    def __init__(self):
        super(YNet, self).__init__()
        # One ResNet-18 body per input image
        self.left = create_body(models.resnet18)
        self.right = create_body(models.resnet18)
        # 512 channels per body, two bodies, doubled again by the head's concat pooling
        self.head = create_head(512 * 2 * 2, 2)

    def forward(self, x, y):
        z1 = self.left(x)
        z2 = self.right(y)
        # Concatenate the two feature maps along the channel dimension
        z3 = torch.cat([z1, z2], dim=1)
        return self.head(z3)

df = pd.DataFrame(data, columns=['reference', 'frame', 'x', 'y'])
refImageList = ImageList.from_df(df, cols="reference", path=".")
frameImageList = ImageList.from_df(df, cols="frame", path=".")
transforms = get_transforms(do_flip=False, max_rotate=0, max_zoom=0, max_warp=0)

data = (MixedItemList([refImageList, frameImageList], path=".", inner_df=refImageList.inner_df)
        .split_by_rand_pct(0.2, 42)
        .label_from_df(cols=[2, 3], label_cls=FloatList)
        .transform([transforms, transforms], size=(150, 200))
        .databunch(bs=100))

learn = Learner(data, YNet(), metrics=root_mean_squared_error)
learn.fit_one_cycle(30)
```
Now I’m ready to move the model into my application, which runs an infinite loop that pulls frames from the camera, passes the reference image and the current frame to the network, and then uses the prediction. Ideally my loop would look like this:
```python
import cv2
from fastai.vision import *

capture = cv2.VideoCapture(0)
referenceImage = open_image("/path/to/ref/image.png")
learn = load_learner(".", "export.pkl")

while True:
    ret, frame = capture.read()

    # convert to fastai format: BGR -> RGB, then a float tensor scaled to [0, 1]
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    p2t = pil2tensor(frame, dtype=np.float32) / 255
    frameImage = Image(p2t)

    # ideally, something like this (a two-input predict call is what I wish existed)
    preds = learn.predict(referenceImage, frameImage)

    # do work with preds
    # ...
```
However, MixedItemList is an experimental feature introduced in 1.0.46 and hasn’t been updated to be exported as part of the learner.export process, as documented here. Given that the issue is marked as closed, I am unsure how to proceed. Looking over @etremblay’s code, it appears they loaded the test data in as a validation set and then predicted on that. I’ve attempted something similar, but I hit several errors along the way.
Current Code, Not Working
```python
capture = cv2.VideoCapture(0)
referenceImagePath = "/path/to/ref/image.png"
learn = load_learner(".", "export.pkl")

while True:
    ret, frame = capture.read()

    # write frame to disk
    cv2.imwrite("temp.png", frame)

    # reconstruct temporary dataframe (labels are placeholders)
    df = pd.DataFrame([[referenceImagePath, "temp.png", 0, 0]],
                      columns=['calibration', 'sample', 'x', 'y'])
    print(df)

    # reconstruct MixedItemList
    referenceImageList = ImageList.from_df(df, cols="calibration", path=".")
    frameImageList = ImageList.from_df(df, cols="sample", path=".")
    transforms = get_transforms(do_flip=False, max_rotate=0, max_zoom=0, max_warp=0)
    data = (MixedItemList([referenceImageList, frameImageList], path=".", inner_df = referenceImageList.inner_df)
            .split_none()
            .label_from_df(cols=[2, 3], label_cls=FloatList)
            .transform([transforms, transforms], size=(150, 200))
            .databunch(bs=1))

    # recreate learner
    learn = Learner(data, YNet(), metrics=root_mean_squared_error)
    learn.load("export.pkl")

    # not sure if this works, code never makes it here.
    preds = learn.get_preds()

    # do work with preds
    # ...
```
```
  File "app.py", line 226, in Run
    data = (MixedItemList([referenceImageList, frameImageList], path=".", inner_df = referenceImageList.inner_df)
  File "/opt/conda/lib/python3.7/site-packages/fastai/data_block.py", line 784, in __init__
    items = range_of(item_lists) if len(item_lists) >= 1 else 
  File "/opt/conda/lib/python3.7/site-packages/fastai/core.py", line 231, in range_of
    return list(range(len(x)))
  File "/opt/conda/lib/python3.7/site-packages/fastai/data_block.py", line 71, in __len__
    def __len__(self)->int: return len(self.items) or 1
TypeError: len() of unsized object
```
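A guess at where the `TypeError` comes from, which I have not confirmed against the fastai source: my temporary DataFrame has a single row, so each `ImageList` is built from a single path. If that one-element column is squeezed down to a zero-dimensional NumPy array somewhere along the way (an assumption on my part), `len()` fails with exactly this message. The NumPy behaviour on its own is easy to reproduce:

```python
import numpy as np

# One row, one column, like the single-frame DataFrame column above.
column = np.array([["temp.png"]])

# Squeezing removes every length-1 axis, leaving a 0-d array.
squeezed = np.squeeze(column)
print(squeezed.shape)  # ()

try:
    len(squeezed)
    message = None
except TypeError as err:
    message = str(err)
print(message)  # len() of unsized object

# Keeping the array one-dimensional keeps len() well defined.
kept = column.reshape(-1)
print(len(kept))  # 1
```

If this really is the cause, a DataFrame with two or more rows should get past this line, though I have not verified that.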
In addition to crashing, this isn’t ideal code - I’m saving the frame to disk, then recreating a dataset for fastai and then fastai will reload from the disk. As I am trying to do realtime inference with a webcam, skipping the save to disk step would be preferred. Unfortunately, I’m not entirely sure how to handle this and would love to hear any approaches to solving it (I’m at the 99% mark for this project, the final step being getting the predictions from the learner).
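To make the "skip the disk" idea concrete, here is a minimal sketch of the in-memory path I have in mind, written with NumPy only so it runs without a camera or a trained model. `frame_to_chw`, `predict_pair` and `dummy_model` are hypothetical names of my own; `dummy_model` merely stands in for the trained network. With the real model I would presumably convert the arrays to torch tensors and call `learn.model` in eval mode under `torch.no_grad()`, and the frame would also need resizing to the 150x200 training size, which this sketch does not do.

```python
import numpy as np

def frame_to_chw(frame_bgr):
    """Convert an OpenCV-style frame (H, W, 3), BGR, uint8, into a float32
    array of shape (3, H, W) in RGB order with values in [0, 1]."""
    rgb = frame_bgr[..., ::-1]            # BGR -> RGB
    chw = np.transpose(rgb, (2, 0, 1))    # HWC -> CHW
    return np.ascontiguousarray(chw, dtype=np.float32) / 255.0

def predict_pair(model_fn, ref_chw, frame_chw):
    """Add a batch dimension of 1 to both inputs, call the model,
    and return the single prediction row."""
    out = model_fn(ref_chw[None], frame_chw[None])
    return out[0]

def dummy_model(ref_batch, frame_batch):
    """Stand-in for the network: returns the mean of each input as (x, y)."""
    return np.stack([ref_batch.mean(axis=(1, 2, 3)),
                     frame_batch.mean(axis=(1, 2, 3))], axis=1)

# The reference image is prepared once, outside the capture loop.
reference = frame_to_chw(np.full((150, 200, 3), 255, dtype=np.uint8))

# A fake "webcam frame": pure blue, which sits in channel 0 in BGR order.
frame = np.zeros((150, 200, 3), dtype=np.uint8)
frame[..., 0] = 255
frame_chw = frame_to_chw(frame)

x, y = predict_pair(dummy_model, reference, frame_chw)
print(frame_chw.shape, x, y)
```

The point of the design is that nothing touches the disk or the data block API inside the loop: only the per-frame conversion and one forward pass remain.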
How can I best complete this code so that the learner can be used to generate predictions using realtime webcam frames? Any advice for direction will be appreciated.
I see that @etremblay’s code uses a collate function for some text input. As I don’t have any text input and my refImage and frameImage are 1:1, do I need to worry about collating?
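My current understanding, which may be wrong, is that a custom collate function matters for text because sequences of different lengths have to be padded before they can be batched, whereas same-sized images can simply be stacked. A rough sketch of what collating would amount to in my case (`collate_pairs` is a hypothetical helper, not fastai's actual collate code):

```python
import numpy as np

def collate_pairs(samples):
    """Stack a list of ((ref, frame), target) samples into batched arrays.
    This only works because every image has the same shape."""
    refs = np.stack([s[0][0] for s in samples])
    frames = np.stack([s[0][1] for s in samples])
    targets = np.stack([s[1] for s in samples])
    return (refs, frames), targets

# Four fake samples shaped like my data: two 3x150x200 images and an (x, y) target.
samples = [
    ((np.zeros((3, 150, 200), np.float32), np.ones((3, 150, 200), np.float32)),
     np.array([0.1, 0.2], np.float32))
    for _ in range(4)
]

(refs, frames), targets = collate_pairs(samples)
print(refs.shape, frames.shape, targets.shape)  # (4, 3, 150, 200) (4, 3, 150, 200) (4, 2)
```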
Since MixedItemList is an experimental feature, is there a better solution in fastai for this? Any method that gets those two images to my network is acceptable, as long as I can stay within the fastai training pipeline.