Tuplify bug with iterables

This is going to be a long post, but I want to show what is happening, why it is happening, and discuss some fixes at the end.

For this I’m going to do something similar to the Siamese example; the core issue is the creation of custom types that are iterable.

So let’s start. I’ll provide the most minimal example that I can, starting with the creation of the custom type:

class ImageNumber(Tuple):
  def show(self, ctx=None, **kwargs):
    return show_titled_image(self, ctx=ctx, **kwargs)
  
  @classmethod
  def create(cls, fn, num): return cls(PILImage.create(fn), num)

Something very simple, basically just an image and a number together.
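To make the structural issue concrete without needing fastai installed, here is a minimal stand-in (a plain `tuple` subclass rather than fastai's `Tuple`, and a string standing in for the image): a single `ImageNumber` item is itself an iterable of length two, which is exactly what trips things up later.

```python
# Minimal stand-in for the fastai type, for illustration only:
# ImageNumber subclasses tuple, so ONE item is itself an iterable of TWO elements.
class ImageNumber(tuple):
    @classmethod
    def create(cls, img, num):
        # `img` stands in for PILImage.create(fn) in the real example
        return cls((img, num))

item = ImageNumber.create("<image>", 1.0)
print(isinstance(item, tuple))  # True: one logical item, but it unpacks into two
print(list(item))               # ['<image>', 1.0]
```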
Now let’s grab some items:

source = untar_data(URLs.COCO_SAMPLE)
fns = get_image_files(source)

I’ll modify fns a bit to contain a file name and a number:

fns_pairs = fns.map(lambda o: [o, 1.])

Now we can create our dataset:

dset = Datasets(fns_pairs, tfms=[[lambda o: ImageNumber.create(*o)],
                                 [itemgetter(0), PILImage.create]])

Calling show will properly display our item:


Now, let’s create the dls:

dls = dset.dataloaders(after_item=[ToTensor(), Resize(128)],
                       after_batch=[IntToFloatTensor()])

Let’s also register a show_batch method for our types:

@typedispatch
def show_batch(x:ImageNumber, y:TensorImage, samples, **kwargs):
  return show_batch.funcs[TensorImage][TensorImage](x, y, samples, **kwargs)

We can use show_batch to check that everything is working as expected:

The setup for the description of the problem is now complete. The problem happens in decode_batch; I first encountered it when calling learn.predict and traced it back to this line:

dec = self.dls.decode_batch((*tuplify(inp),*tuplify(dec_preds)))[0]

Let’s start by calling decode_batch on a generated batch:

b = dls.one_batch()
dls.decode_batch(b)

In the above everything runs correctly. Now, let’s try doing:

xb,yb = dls.one_batch()
dls.decode_batch((*tuplify(xb),*tuplify(yb)))

This fails with a very cryptic error message:

~/libs/fastcore/fastcore/dispatch.py in retain_type(new, old, typ)
    160         typ = old if isinstance(old,type) else type(old)
    161     # Do nothing the new type is already an instance of requested type (i.e. same type)
--> 162     if typ==NoneType or isinstance(new, typ): return new
    163     return retain_meta(old, cast(new, typ))
    164 

TypeError: isinstance() arg 2 must be a type or tuple of types

The problem happens when tuplify is called on xb. This is what tuplify does:

def tuplify(o, use_list=False, match=None):
    "Make `o` a tuple"
    return tuple(L(o, use_list=use_list, match=match))

Because xb is an iterable, L will not “wrap” it, so when we do:

*tuplify(xb)

we end up having *(xb[0], xb[1]) instead of the expected *(xb,), and this is what causes the error!
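The flattening can be reproduced in plain Python with a simplified mimic of `tuplify`'s wrapping rule (the real `L` handles many more cases; this one-liner only distinguishes lists/tuples from everything else, which is enough to show the effect):

```python
# Simplified mimic of fastcore's tuplify, for illustration only:
# iterables pass through as their elements; non-iterables get wrapped in a 1-tuple.
def tuplify(o):
    return tuple(o) if isinstance(o, (list, tuple)) else (o,)

xb = ("img_tensor", 1.0)   # an ImageNumber-like batch element: itself a tuple
print(tuplify(xb))         # -> ('img_tensor', 1.0): flattened, not (xb,)
print(tuplify(5))          # -> (5,): a non-iterable is wrapped as expected
```

So star-unpacking `tuplify(xb)` yields two separate elements where decode_batch expected a single input, and retain_type later receives something that is not a type, hence the `isinstance()` TypeError.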


Okay, now that I have explained what is happening and why, I would like to propose some solutions.

What we need to happen there is for L to wrap xb; currently it does not, because xb is an iterable.

  • Solution 1: Create a custom type that L always wraps; then I can do class ImageNumber(CustomTuple):
  • Solution 2: Make it so that L always wraps tuples? I’m afraid this would break other interactions in the library.
  • Solution 3: When creating my ImageNumber I can make it non-iterable, but this is the same as just creating a non-iterable CustomTuple and inheriting from that.
  • Solution 4: You will probably have better solutions than me :sweat_smile:

I hope this is helpful!


Ok, the problem is not in tuplify or L, which both work as expected, but in the way the input and target are concatenated to form a tuple in

dec = self.dls.decode_batch((*tuplify(inp),*tuplify(dec_preds)))[0]

inside learn.predict. We need a proper method that does this, respects n_inp when it’s there, and does not try to star it. For instance

dls.decode_batch((*xb,yb))

fails too, because xb is a tuple: the library assumes it represents two inputs, except we know it does not. It might need a bit of time to fix this properly, but I will try to have something ready this evening or this weekend.
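One way to picture what "respects n_inp" could mean is a helper that uses the dataset's input count to decide whether the batch element is one (possibly iterable) input or a tuple of several. This is a hypothetical sketch for discussion, not the actual fastai fix, and `join_for_decode` is a made-up name:

```python
# Hypothetical sketch (NOT the actual fastai fix): use n_inp to decide how many
# leading elements are inputs, instead of star-unpacking blindly.
def join_for_decode(inp, preds, n_inp=1):
    # With a single input, `inp` IS the input, even if it is an iterable
    # type like ImageNumber; with n_inp > 1, `inp` is a tuple of inputs.
    inps = (inp,) if n_inp == 1 else tuple(inp)
    return inps + (preds,)

xb = ("img", 1.0)                   # ImageNumber-like single input
print(join_for_decode(xb, "pred"))  # -> (('img', 1.0), 'pred')
```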


Do you think this would work?

dls.decode_batch((*tuplify([xb]),*tuplify([yb])))

It works for the described experiment; I’ll try it with other examples.

EDIT: No, it would fail if xb were two inputs

I understand you came across the problem because predict did not work. Does show_results work?

Excellent catch, yes! It does work. (Although the items were not being displayed properly, but I think that might be a separate issue.)

Ok, I thought it would from what I see. Let me try to write a fix.


Pushed something; tell me if that fixes your issue with predict.


It does, yeah! Thank you =)

Surprisingly simple solution as well.
