Hello all.
This question is mostly for @sgugger and @etremblay, as they worked on MixedItemList, but I hope others familiar with multi-input networks can help as well, since this doesn’t appear to be a fully supported scenario in v1 yet.
I’m working on a project that takes frames from the webcam using OpenCV, compares each frame against a reference image, and then outputs two floats. So (refImage + webcamFrame) => (x, y). I built a custom PyTorch network that takes two input images and, after some Googling, figured out how to use the experimental MixedItemList using this thread as well as the source code put up by @etremblay to get training going (which appears to be working fine).
Training Code
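class YNet(nn.Module):
    def __init__(self):
        super(YNet, self).__init__()
        # one ResNet-18 body per input image
        self.left = create_body(models.resnet18)
        self.right = create_body(models.resnet18)
        # 512 channels per body, two bodies, doubled again by the head's concat pooling
        self.head = create_head(512 * 2 * 2, 2)

    def forward(self, x, y):
        z1 = self.left(x)
        z2 = self.right(y)
        # concatenate the two feature maps along the channel dimension
        z3 = torch.cat([z1, z2], dim=1)
        return self.head(z3)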
df = pd.DataFrame(data, columns = ['reference', 'frame', 'x', 'y'])
refImageList = ImageList.from_df(df, cols="reference", path=".")
frameImageList = ImageList.from_df(df, cols="frame", path=".")
transforms = get_transforms(do_flip = False, max_rotate = 0, max_zoom = 0, max_warp = 0)
data = (MixedItemList([refImageList, frameImageList], path=".", inner_df = refImageList.inner_df)
        .split_by_rand_pct(0.2, 42)
        .label_from_df(cols=[2, 3], label_cls=FloatList)
        .transform([transforms,transforms], size=(150,200))
        .databunch(bs=100))
learn = Learner(data, YNet(), metrics=root_mean_squared_error)
learn.fit_one_cycle(30)
Now I’m ready to move the model into my application, which has an infinite loop pulling frames from the camera and will pass the reference image and the current frame to the network for prediction and use the prediction. Ideally my loop would look like this:
Ideal Code
capture = cv2.VideoCapture(0)
referenceImage = open_image("/path/to/ref/image.png")
learn = load_learner(".", "export.pkl")
while True:
    ret, frame = capture.read()
    # convert to fastai format
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    p2t = pil2tensor(frame, dtype=np.float32)/255
    frameImage = Image(p2t)
    # ideally, something like this (a two-input predict call, which is what I'm missing)
    preds = learn.predict(referenceImage, frameImage)
    # do work with preds
    # ...
However, MixedItemList is an experimental feature introduced in 1.0.46 and hasn’t been updated to be exported as part of the learner.export process, as documented here. Given that the issue is marked as closed, I am unsure how to proceed. Looking over @etremblay’s code, it appears they loaded the test data in as a validation set and then predicted on that. I’ve attempted something similar, but I receive several errors along the way.
Current Code, Not Working
capture = cv2.VideoCapture(0)
referenceImagePath = "/path/to/ref/image.png"
learn = load_learner(".", "export.pkl")
while True:
    ret, frame = capture.read()
    # write frame to disk
    cv2.imwrite("temp.png", frame)
    # reconstruct temporary dataframe
    df = pd.DataFrame([[referenceImagePath, "temp.png", 0, 0]], columns = ['calibration', 'sample', 'x', 'y'])
    print(df)
    # reconstruct MixedItemList
    referenceImageList = ImageList.from_df(df, cols="calibration", path=".")
    frameImageList = ImageList.from_df(df, cols="sample", path=".")
    transforms = get_transforms(do_flip = False, max_rotate = 0, max_zoom = 0, max_warp = 0)
    data = (MixedItemList([referenceImageList, frameImageList], path=".", inner_df = referenceImageList.inner_df)
            .split_none()
            .label_from_df(cols=[2, 3], label_cls=FloatList)
            .transform([transforms,transforms], size=(150,200))
            .databunch(bs=1))
    # recreate learner
    learn = Learner(data, YNet(), metrics=root_mean_squared_error)
    learn.load("export.pkl")
    # not sure if this works, code never makes it here.
    preds = learn.get_preds()
    # do work with preds
    # ...
Current Error
  File "app.py", line 226, in Run
    data = (MixedItemList([referenceImageList, frameImageList], path=".", inner_df = referenceImageList.inner_df)
  File "/opt/conda/lib/python3.7/site-packages/fastai/data_block.py", line 784, in __init__
    items = range_of(item_lists[0]) if len(item_lists) >= 1 else []
  File "/opt/conda/lib/python3.7/site-packages/fastai/core.py", line 231, in range_of
    return list(range(len(x)))
  File "/opt/conda/lib/python3.7/site-packages/fastai/data_block.py", line 71, in __len__
    def __len__(self)->int: return len(self.items) or 1
TypeError: len() of unsized object
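One possible explanation for this error, offered as a guess rather than a confirmed diagnosis: the temporary DataFrame has a single row, and if `ImageList.from_df` squeezes the selected column values (an assumption about fastai's internals that I have not verified), the items would collapse to a zero-dimensional NumPy array, which has no length. The sketch below reproduces only the NumPy behaviour, with made-up file names:

```python
import numpy as np

# One row, one column: the shape a single-row DataFrame column selection would have.
values = np.array([["/path/to/ref/image.png"]])   # shape (1, 1)
squeezed = np.squeeze(values)                       # shape (): zero-dimensional

try:
    len(squeezed)
    error = None
except TypeError as exc:
    error = str(exc)                                # NumPy's "unsized object" complaint

# With two rows, squeezing leaves a 1-D array that does have a length.
two_rows = np.squeeze(np.array([["a.png"], ["b.png"]]))   # shape (2,)
```

If this is indeed the cause, building the temporary DataFrame with at least two rows might sidestep the crash, though that would not address the disk round-trip.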
In addition to crashing, this isn’t ideal code: I’m saving the frame to disk and recreating a dataset for fastai, which then reloads the frame from disk. As I am trying to do realtime inference with a webcam, skipping the save-to-disk step would be preferred. Unfortunately, I’m not entirely sure how to handle this and would love to hear any approaches to solving it (I’m at the 99% mark for this project, the final step being getting the predictions from the learner).
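The disk round-trip can probably be avoided by converting the captured array in memory. Below is a minimal sketch, assuming frames arrive as OpenCV's usual height x width x 3 `uint8` arrays in BGR order. `frame_to_chw` is a hypothetical helper name, and resizing to the training size of (150, 200) is not included and would have to be done separately (for example with `cv2.resize`).

```python
import numpy as np

def frame_to_chw(frame_bgr):
    """Convert an HxWx3 uint8 BGR frame to a 3xHxW float32 RGB array in [0, 1]."""
    rgb = frame_bgr[:, :, ::-1]               # reverse the channel axis: BGR -> RGB
    chw = np.transpose(rgb, (2, 0, 1))        # channels-last -> channels-first
    return np.ascontiguousarray(chw, dtype=np.float32) / 255.0

# Stand-in for a captured frame: pure blue in OpenCV's BGR layout.
fake_frame = np.zeros((4, 6, 3), dtype=np.uint8)
fake_frame[:, :, 0] = 255
converted = frame_to_chw(fake_frame)
```

The resulting array can be handed to `torch.from_numpy` to obtain a tensor without touching the filesystem.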
Question 1
How can I best complete this code so that the learner can be used to generate predictions using realtime webcam frames? Any advice for direction will be appreciated.
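One direction that might work is to skip the data block API at inference time and call the underlying PyTorch module directly. The sketch below is only an illustration under assumptions: `TwoInputNet` is a small hypothetical stand-in for YNet (so the example runs without fastai or pretrained weights), `predict_pair` is a made-up helper name, and both inputs are assumed to be already resized 3 x H x W float tensors in [0, 1].

```python
import torch
import torch.nn as nn

class TwoInputNet(nn.Module):
    """Hypothetical stand-in for YNet: two branches whose features are concatenated."""
    def __init__(self):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 8, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
            )
        self.left = branch()
        self.right = branch()
        self.head = nn.Linear(16, 2)

    def forward(self, x, y):
        z = torch.cat([self.left(x), self.right(y)], dim=1)
        return self.head(z)

def predict_pair(model, ref, frame):
    """Run one (reference, frame) pair through a two-input model; returns a tensor of shape (2,)."""
    model.eval()                                   # disable dropout / batchnorm updates
    with torch.no_grad():                          # no gradients needed for inference
        out = model(ref.unsqueeze(0), frame.unsqueeze(0))   # add a batch dimension of 1
    return out.squeeze(0)

model = TwoInputNet()
ref = torch.rand(3, 150, 200)
frame = torch.rand(3, 150, 200)
xy = predict_pair(model, ref, frame)
```

With the real network, the trained module should be reachable as `learn.model` on the training Learner. Both tensors would need to be on the same device as the model, and any preprocessing applied during training would have to be reproduced by hand.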
Question 2
I see that @etremblay’s code uses a collate function for some text input. As I don’t have any text input and my refImage and frameImage are 1:1, do I need to worry about collating?
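For context on what collating would involve here: as far as I can tell, fastai v1's default collate function defers to PyTorch's `default_collate` (an assumption worth checking against the source). That function already handles samples whose input is a list of two same-sized image tensors, as this sketch with random stand-in data suggests:

```python
import torch
from torch.utils.data.dataloader import default_collate

# Four fake samples, each ([reference, frame], target) with fixed-size images.
samples = [
    ([torch.rand(3, 150, 200), torch.rand(3, 150, 200)], torch.tensor([0.1, 0.2]))
    for _ in range(4)
]

# default_collate stacks each position separately, preserving the nesting.
(ref_batch, frame_batch), targets = default_collate(samples)
```

A custom collate function tends to matter when items have variable size, such as text sequences that need padding. With two images resized to the same shape, the default behaviour may be sufficient.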
Question 3
Since MixedItemList is an experimental feature, is there a better solution in fastai for this? Any method that gets those two images to my network is acceptable as long as I can stay within the fastai training pipeline.