Fastai v2 chat

I don’t think your second one in the above time test is actually flattening.

Here it is with flattening - still slower, but not so bad

1 Like

Yes, you’re right, not sure what happened there in my code :man_shrugging::slightly_smiling_face:

Look here: https://mathieularose.com/how-not-to-flatten-a-list-of-lists-in-python/

In a nutshell: sum(lst, []) has quadratic complexity, whereas list(itertools.chain.from_iterable(lst)) has linear complexity, so sum(lst, []) is very inefficient.
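To make the difference concrete, here is a toy sketch of the two approaches (my own illustration, not taken from the article):

import itertools

lst = [[1, 2], [3, 4], [5]]

# sum() builds a brand-new list at every step, re-copying everything seen so far -> quadratic
flat_slow = sum(lst, [])

# chain.from_iterable yields each element exactly once -> linear
flat_fast = list(itertools.chain.from_iterable(lst))

assert flat_slow == flat_fast == [1, 2, 3, 4, 5]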

3 Likes

Thank you very much Fabrizio for sharing this, this is good to know!

Oh that makes way more sense - thanks for that!

1 Like

BTW we do already have this function in fastai v2. It’s called concat. I just changed it from using the nested comprehension approach to the chain approach (I think both are linear complexity, but chain is clearer to me).
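Roughly, the chain-based version looks like this (just a sketch of the idea, not necessarily the exact concat in the repo):

import itertools

def concat(*ls):
    "Concatenate several lists into a single flat list (sketch)"
    return list(itertools.chain.from_iterable(ls))

concat([1, 2], [3], [4, 5])
>>> [1, 2, 3, 4, 5]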

1 Like

FYI we’re currently working on merging TfmdDS and DataSource - which means that TfmdDS doesn’t exist any more. DataSource has all of its functionality. Code is looking quite a bit simpler and easier to understand now. Not everything works though - will be fixing all the nbs over the next day or two

6 Likes

From Data pipeline and Tfms:

tfms = [Transform(math.sqrt)]
compose_tfms(4., tfms=tfms, is_enc=True)
>>> 2.0


tfms = [Transform(math.sqrt)]
compose_tfms(4., tfms=tfms, is_enc=False)
>>> 4.0

Uh… the pipeline works fine for encoding, but not when is_enc=False.

This causes problems for me when I do something like:

class A(Transform):
    def encodes(self, x:float):  return Float(x+1)
    def decodes(self, x): return x-1
    
tfms = [A(), Transform(math.sqrt)]
test_eq(compose_tfms(4., tfms=tfms, reverse=True, is_enc=False), 1.0)

But that is not what happens and we get 3 instead.

I think the root cause is inside Transform: as part of TypeDispatch, the encoder gets something like {object: 'sqrt'}, but there is nothing corresponding for the decoder.

And since math.sqrt is a built-in function, it doesn't come with a decoder.

This can be seen by doing:

t = Transform(math.sqrt)
t(4.)

>> 2.0

t.decode(4.)
>> 4.0

Just wondering if the behaviour is such on purpose?

Personally, I was expecting math.sqrt to be called anyway, since it is a built-in function.

I’m not sure what behavior you expected. You didn’t pass a decoding function, so decode did nothing. If you want decode to do something, pass a 2nd arg to Transform (or override decodes).
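For example (a hypothetical sketch, using squaring as the decoder just to show the mechanics):

import math

def square(x): return x*x

t = Transform(math.sqrt, square)   # 2nd arg is the decoding function
t(4.)
>>> 2.0

t.decode(2.)
>>> 4.0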

2 Likes

My bad, I misunderstood the behaviour.

I was expecting the pipeline to work without passing a decoder.

Thanks for the reply!

No problem - happy to answer any questions! :slight_smile:

1 Like

Could you please give us a heads up when the new DataSource is in a state where one can start playing around with it? I looked at the repo just now, and it seems both the DataSource class defined in 03_data_pipeline and the one in _06_data_source still rely on TfmdList at some point?

I understand that things can still change but would be really super happy to put my hands on whatever the latest iteration of the DataSource might be, even if there should be breaking changes just around the corner :slight_smile:

Pardon my newbness, but if I were to create a DataBunch using the low-level APIs, in what order would I create things?

I see that a DataLoader accepts a dataset, but what could serve as a dataset? DataSource is something more than a dataset, since it can split itself and has (had?) a databunch method on it that went directly to a DataBunch?

I think the functionality of DataSource may have changed a little bit in its new incarnation (not sure), which is one of the places along the way where I might be getting confused :).

Either way, if you would be so kind as to provide the low-level progression of steps to go from items to a DataBunch, and give me a heads up when it is in a state where I could start playing with it, it would be greatly appreciated :slight_smile:

It seems nbs 21 and 23 use the DataBlock API; I could probably jump into using that, but would really appreciate a chance to play with the low-level building blocks.

Thank you so much!

I found what seems to be an incorrect implementation of resize_max() in 07_vision_core.
When max_px (the maximum number of pixels) is specified and is less than n_px (the current number of pixels), h and w must be scaled by math.sqrt(max_px/x.n_px), not simply by (max_px/x.n_px) as they are now: since n_px = h*w, scaling both dimensions by a factor k scales n_px by k², so k has to be the square root of the target ratio.
To make sure the resulting n_px <= max_px, I suggest using math.floor on h and w, like this:

if max_px and x.n_px>max_px: k=math.sqrt(max_px/x.n_px); h,w = math.floor(h*k),math.floor(w*k)

This leads to the correct result:

In:  im.n_px
Out: 600
In:  im.resize_max(max_px=300).n_px
Out: 294
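As a sanity check with hypothetical dimensions that give the same n_px (a 30x20 image, so n_px = 600):

import math
h, w, max_px = 30, 20, 300
k = math.sqrt(max_px/(h*w))                 # ~0.707
math.floor(h*k), math.floor(w*k), math.floor(h*k)*math.floor(w*k)
>>> (21, 14, 294)

Scaling by max_px/n_px = 0.5 directly would instead give 15*10 = 150, shrinking the image far more than needed.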

However, I'm not sure how to write a correct test for this case, since the resulting n_px is not equal to the specified max_px. Any ideas?

This will do:

import operator

def test_le(a, b):
    "`test` that `a<=b`"
    test(a, b, operator.le, '<=')   # `test` is fastai v2's generic comparison helper
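So the test for the example above could be written as (assuming im is loaded as before):

test_le(im.resize_max(max_px=300).n_px, 300)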

Note that any nb starting with _ isn’t exported. We use that naming either to temporarily keep around old nbs that we will probably delete soon (as in this case) or for nbs that we haven’t gotten working yet so don’t want exported.

We are keeping TfmdList, BTW. It’s just TfmdDS that’s going away.

2 Likes

Sorry @radek, I accidentally wrote TfmdList instead of TfmdDS in the earlier post announcing the merge. Edited it now.

No worries at all :slight_smile: I think I should be able to figure out how things click together now, thank you!

Thanks for flagging this, it’s fixed now.

I just pushed some broken code for @sgugger to fix while I’m at breakfast, so don’t git pull for the next couple of hours

4 Likes

OK looks like it’s all working again. There’s been a bit of module and nb renaming too.

1 Like