Productionizing MixedItemList - Alternatives for Mixed Input Networks?

Hello all.

This question is mostly for @sgugger and @etremblay, as they worked on MixedItemList, but I hope others familiar with multi-input networks can help as well, since this doesn’t appear to be a fully supported scenario in v1 yet.

I’m working on a project that takes frames from the webcam using OpenCV, compares each frame against a reference image and then outputs two floats. So (refImage + webcamFrame) => (x,y). I built a custom PyTorch network that takes two input images and, after some Googling, figured out how to use the experimental MixedItemList using this thread as well as the source code put up by @etremblay to get training going (which appears to be working fine).

Training Code

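import torch
from torch import nn
from fastai.vision import *  # create_body, create_head, models, ImageList, ...

class YNet(nn.Module):
    def __init__(self):
        super(YNet, self).__init__()
        # One resnet18 body per input image
        self.left = create_body(models.resnet18)
        self.right = create_body(models.resnet18)
        # 512 channels per body * 2 bodies * 2 for the concat pooling in create_head
        self.head = create_head(512 * 2 * 2, 2)
    def forward(self, x, y):
        z1 = self.left(x)
        z2 = self.right(y)
        # Concatenate the two feature maps along the channel dimension
        z3 = torch.cat([z1, z2], dim=1)
        return self.head(z3)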

df = pd.DataFrame(data, columns = ['reference', 'frame', 'x', 'y'])

refImageList = ImageList.from_df(df, cols="reference", path=".")
frameImageList = ImageList.from_df(df, cols="frame", path=".")
transforms = get_transforms(do_flip = False, max_rotate = 0, max_zoom = 0, max_warp = 0)

data = (MixedItemList([refImageList, frameImageList], path=".", inner_df = refImageList.inner_df)
      .split_by_rand_pct(0.2, 42)
      .label_from_df(cols=[2, 3], label_cls=FloatList)
      .transform([transforms,transforms], size=(150,200))
      .databunch())  # Learner expects a DataBunch

learn = Learner(data, YNet(), metrics=root_mean_squared_error)
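
For readers wondering where the 512 * 2 * 2 passed to create_head comes from, here is a small worked calculation. It assumes create_head is used with its default concat pooling (average and max pooling stacked together), and the helper name head_input_features is made up for illustration.

```python
# Worked arithmetic for the head's input size.
# Assumption: create_head keeps its default concat pooling, which doubles the channel count.
RESNET18_BODY_CHANNELS = 512   # channels produced by one resnet18 body
NUM_BODIES = 2                 # left and right branches of YNet
CONCAT_POOL_FACTOR = 2         # average-pooled and max-pooled features side by side


def head_input_features(body_channels, num_bodies, pool_factor):
    """Number of features reaching the first linear layer of the head (hypothetical helper)."""
    concatenated = body_channels * num_bodies   # after concatenating along dim=1
    return concatenated * pool_factor           # after concat pooling and flattening


nf = head_input_features(RESNET18_BODY_CHANNELS, NUM_BODIES, CONCAT_POOL_FACTOR)
print(nf)  # 2048, i.e. 512 * 2 * 2
```

If a different backbone were swapped in, only the per-body channel count would need to change.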

Now I’m ready to move the model into my application, which has an infinite loop pulling frames from the camera and will pass the reference image and the current frame to the network for prediction and use the prediction. Ideally my loop would look like this:

Ideal Code

capture = cv2.VideoCapture(0)
referenceImage = load_image("/path/to/ref/image.png")
learn = load_learner(".", "export.pkl")

while True:
    ret, frame = capture.read()
    #convert to fastai format
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    p2t = pil2tensor(frame, dtype=np.float32)/255
    frameImage = Image(p2t)

    #ideally, something like this
    preds = learn.predict(referenceImage, frameImage)

    #do work with preds
    # ...

However, MixedItemList is an experimental feature introduced in 1.0.46 and hasn’t been updated to be exported as part of the learner.export process, as documented here. Given that the issue is marked as closed, I am unsure how to proceed. Looking over @etremblay’s code, it appears they loaded the test data in as a validation set and then predicted on that. I’ve attempted something similar, but I receive several errors along the way.

Current Code, Not Working

capture = cv2.VideoCapture(0)
referenceImagePath = "/path/to/ref/image.png"
learn = load_learner(".", "export.pkl")

while True:
    ret, frame = capture.read()

    #write frame to disk
    cv2.imwrite("temp.png", frame) 

    #reconstruct temporary dataframe
    df = pd.DataFrame([[referenceImagePath, "temp.png", 0, 0]], columns = ['calibration', 'sample', 'x', 'y'])

    #reconstruct MixedItemList
    referenceImageList = ImageList.from_df(df, cols="calibration", path=".")
    frameImageList = ImageList.from_df(df, cols="sample", path=".")
    transforms = get_transforms(do_flip = False, max_rotate = 0, max_zoom = 0, max_warp = 0)

    data = (MixedItemList([referenceImageList, frameImageList], path=".", inner_df = referenceImageList.inner_df)
        .label_from_df(cols=[2, 3], label_cls=FloatList)
        .transform([transforms,transforms], size=(150,200)))

    #recreate learner
    learn = Learner(data, YNet(), metrics=root_mean_squared_error)

    #not sure if this works, code never makes it here.
    preds = learn.get_preds()

    #do work with preds
    # ...

Current Error

  File "", line 226, in Run
    data = (MixedItemList([referenceImageList, frameImageList], path=".", inner_df = referenceImageList.inner_df)
  File "/opt/conda/lib/python3.7/site-packages/fastai/", line 784, in __init__
    items = range_of(item_lists[0]) if len(item_lists) >= 1 else []
  File "/opt/conda/lib/python3.7/site-packages/fastai/", line 231, in range_of
    return list(range(len(x)))
  File "/opt/conda/lib/python3.7/site-packages/fastai/", line 71, in __len__
    def __len__(self)->int: return len(self.items) or 1
TypeError: len() of unsized object
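
A possible explanation for this traceback, offered as an assumption rather than a confirmed diagnosis: with a one-row DataFrame, the single selected column may end up squeezed into a zero-dimensional NumPy array when the ImageList is built, and len() fails on such an array. The sketch below only reproduces that NumPy behaviour in isolation; it does not call fastai.

```python
import numpy as np

# A 1-row, 1-column selection, like the values of one column of a single-row DataFrame.
values = np.array([["temp.png"]])        # shape (1, 1)
squeezed = np.squeeze(values)            # shape (), a 0-d array

try:
    len(squeezed)
    error_message = None
except TypeError as exc:
    # len() is undefined for 0-d arrays, so NumPy raises TypeError here
    error_message = str(exc)
print(error_message)

# Keeping at least one dimension avoids the problem.
safe = np.atleast_1d(squeezed)           # shape (1,)
print(len(safe))
```

If this is indeed the cause, any approach that keeps the items one-dimensional should get past this particular error, though that is untested against fastai here.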

In addition to crashing, this isn’t ideal code - I’m saving the frame to disk, then recreating a dataset for fastai and then fastai will reload from the disk. As I am trying to do realtime inference with a webcam, skipping the save to disk step would be preferred. Unfortunately, I’m not entirely sure how to handle this and would love to hear any approaches to solving it (I’m at the 99% mark for this project, the final step being getting the predictions from the learner).
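
As a rough sketch of skipping the disk round trip, the frame can be converted in memory. The example uses NumPy only and a synthetic array in place of a real webcam frame; bgr_frame_to_chw is a hypothetical helper, not a fastai or OpenCV function, and it is meant to mirror the cvtColor + pil2tensor steps from the Ideal Code above.

```python
import numpy as np


def bgr_frame_to_chw(frame):
    """Convert an HxWx3 uint8 BGR frame to a 3xHxW float32 RGB array in [0, 1]."""
    rgb = frame[..., ::-1]                    # reverse the channel order: BGR -> RGB
    chw = np.transpose(rgb, (2, 0, 1))        # HxWxC -> CxHxW
    return np.ascontiguousarray(chw, dtype=np.float32) / 255.0


# Synthetic 150x200 frame standing in for a webcam capture: pure blue in BGR order.
fake_frame = np.zeros((150, 200, 3), dtype=np.uint8)
fake_frame[..., 0] = 255

converted = bgr_frame_to_chw(fake_frame)
print(converted.shape, converted.dtype)
```

The resulting array could then be turned into a tensor and wrapped in a fastai Image, assuming the learner accepts in-memory items for prediction.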

Question 1
How can I best complete this code so that the learner can be used to generate predictions using realtime webcam frames? Any advice for direction will be appreciated.

Question 2
I see that @etremblay’s code uses a collate function for some text input. As I don’t have any text input and my refImage and frameImage are 1:1, do I need to worry about collating?

Question 3
Since MixedItemList is an experimental feature, is there a better solution in fastai for this? Any method that gets those two images to my network is acceptable as long as I can stay within the fastai training pipeline.

Since your inputs are tuples of images, you should use a custom ItemList, as in this tutorial, instead of MixedItemList. That has the advantage of being fully supported and exportable with Learner.export.

Note that in your code, the line learn.load("export.pkl") can’t work as you’re trying to load the exported file, not the model.


Thanks @sgugger, I’ll get started on implementing ImageTuple - it looks like that’ll help me solve this. Thanks for the note on learn.load, it was a forum post typo in this case but I’ll ensure it’s fixed in code.

As a question for future me: I’m planning on working on a project next which is more of a true mixed input model, something like (frame:Image + state:List[float] + text:string) => predClass. Would the process be the same? Implement a custom ItemList and return the data as a tuple?

If you want full library support, yes. MixedItemList won’t be developed further; v2 will provide better functionality for this.

Thanks @sgugger for the points.
Here’s how I solved it. It appears to be working correctly for both training and inference.

from fastai.vision import *  # ItemBase, ItemList, ImageList, Image, Tensor

class ImageTuple(ItemBase):
    def __init__(self, imageRef, imageFrame):
        self.imageRef = imageRef
        self.imageFrame = imageFrame
        self.obj = (imageRef, imageFrame)
        # Rescale both images from [0, 1] to [-1, 1]
        self.data = [-1+2*imageRef.data, -1+2*imageFrame.data]

    def apply_tfms(self, tfms, **kwargs):
        # Transform both images, then refresh the tensors handed to the model
        self.imageRef = self.imageRef.apply_tfms(tfms, **kwargs)
        self.imageFrame = self.imageFrame.apply_tfms(tfms, **kwargs)
        self.data = [-1+2*self.imageRef.data, -1+2*self.imageFrame.data]
        return self
    def __str__(self):
        return "([1] %s, [2] %s)" % (str(self.obj[0]), str(self.obj[1]))
    def to_one(self):
        return (self.imageRef, self.imageFrame)

class ImageTupleList(ItemList):
    def __init__(self, items, **kwargs):
        super().__init__(items, **kwargs)
    def get(self, i):
        item = self.items[i]
        return ImageTuple(item[0], item[1])
    def reconstruct(self, t:Tensor):
        # Undo the [-1, 1] rescaling so the images display correctly
        return ImageTuple(Image(t[0]/2+0.5), Image(t[1]/2+0.5))
    @classmethod
    def from_df(cls, df, cols, path, **kwargs):
        referenceImageList = ImageList.from_df(df, cols=cols[0], path=path)
        frameImageList = ImageList.from_df(df, cols=cols[1], path=path)
        # Pair each reference image with its frame, row by row
        zipped = list(zip(referenceImageList, frameImageList))
        res = ImageTupleList(zipped)
        res.path = path
        res.inner_df = df
        return res
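
As a quick sanity check on the scaling used above, the -1+2*x step in ImageTuple and the t/2+0.5 step in reconstruct should be inverses of each other. The sketch below checks that on plain Python floats; the function names are made up for illustration and nothing here touches fastai.

```python
def to_model_range(x):
    """Map a pixel value from [0, 1] to [-1, 1], as in -1 + 2 * x."""
    return -1 + 2 * x


def to_display_range(t):
    """Map a value from [-1, 1] back to [0, 1], as in t / 2 + 0.5."""
    return t / 2 + 0.5


pixels = [0.0, 0.25, 0.5, 1.0]
scaled = [to_model_range(p) for p in pixels]
restored = [to_display_range(s) for s in scaled]
print(scaled)    # [-1.0, -0.5, 0.0, 1.0]
print(restored)  # [0.0, 0.25, 0.5, 1.0]
```

If the two steps ever drift apart (for example, after changing the normalization in one place only), reconstructed images would look washed out or clipped.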