Fastai v2 chat

I don’t think your second one in the above time test is actually flattening.

Here it is with flattening - still slower, but not so bad

1 Like

Yes, you’re right, not sure what happened there in my code :man_shrugging::slightly_smiling_face:

Look here: https://mathieularose.com/how-not-to-flatten-a-list-of-lists-in-python/

In a nutshell: sum(lst, []) has quadratic complexity, whereas list(itertools.chain.from_iterable(lst)) has linear complexity, so sum(lst, []) is very inefficient.
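To make the difference concrete, here is a toy sketch of the two approaches (my own illustration, not taken from the article):

import itertools

lst = [[1, 2], [3, 4], [5]]

# sum() builds a brand-new list at every step, re-copying everything seen so far -> quadratic
flat_slow = sum(lst, [])

# chain.from_iterable yields each element exactly once -> linear
flat_fast = list(itertools.chain.from_iterable(lst))

assert flat_slow == flat_fast == [1, 2, 3, 4, 5]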

3 Likes

Thank you very much Fabrizio for sharing this, this is good to know!

Oh that makes way more sense - thanks for that!

1 Like

BTW we do already have this function in fastai v2. It’s called concat. I just changed it from using the nested comprehension approach to the chain approach (I think both are linear complexity, but chain is clearer to me).
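Roughly, the chain-based version looks like this (just a sketch of the idea, not necessarily the exact concat in the repo):

import itertools

def concat(*ls):
    "Concatenate several lists into a single flat list (sketch)"
    return list(itertools.chain.from_iterable(ls))

concat([1, 2], [3], [4, 5])
>>> [1, 2, 3, 4, 5]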

1 Like

FYI we’re currently working on merging TfmdDS and DataSource - which means that TfmdDS doesn’t exist any more. DataSource has all of its functionality. Code is looking quite a bit simpler and easier to understand now. Not everything works though - will be fixing all the nbs over the next day or two

6 Likes

From Data pipeline and Tfms:

tfms = [Transform(math.sqrt)]
compose_tfms(4., tfms=tfms, is_enc=True)
>>> 2.0


tfms = [Transform(math.sqrt)]
compose_tfms(4., tfms=tfms, is_enc=False)
>>> 4.0

Uh… the pipeline works fine for encoding, but not when is_enc=False.

This causes problems for me when I do something like:

class A(Transform):
    def encodes(self, x:float):  return Float(x+1)
    def decodes(self, x): return x-1
    
tfms = [A(), Transform(math.sqrt)]
test_eq(compose_tfms(4., tfms=tfms, reverse=True, is_enc=False), 1.0)

But that is not what happens and we get 3 instead.

I think the root cause is inside Transform: as part of TypeDispatch, the encoder gets something like {object: 'sqrt'}, but there is nothing corresponding for the decoder.

And since math.sqrt is a built-in function, it doesn't come with a decoder.

This can be seen by doing:

t = Transform(math.sqrt)
t(4.)

>> 2.0

t.decode(4.)
>> 4.0

Just wondering if the behaviour is such on purpose?

Personally, I was expecting math.sqrt to be called anyway, since it is a built-in function.

I’m not sure what behavior you expected. You didn’t pass a decoding function, so decode did nothing. If you want decode to do something, pass a 2nd arg to Transform (or override decodes).
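For example (a hypothetical sketch, using squaring as the decoder just to show the mechanics):

import math

def square(x): return x*x

t = Transform(math.sqrt, square)   # 2nd arg is the decoding function
t(4.)
>>> 2.0

t.decode(2.)
>>> 4.0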

2 Likes

My bad, I misunderstood the behaviour.

I was expecting the pipeline to work without passing a decoder.

Thanks for the reply!

No problem - happy to answer any questions! :slight_smile:

1 Like

Could you please give us a heads up when the new DataSource is in a state where one can start playing around with it? I looked at the repo just now, and it seems both the DataSource class defined in 03_data_pipeline and the one in _06_data_source still rely on TfmdList at some point?

I understand that things can still change but would be really super happy to put my hands on whatever the latest iteration of the DataSource might be, even if there should be breaking changes just around the corner :slight_smile:

Pardon my newbness, but if I were to create a DataBunch using the low-level APIs, in what order would I create things?

I see that a DataLoader accepts a dataset, but what could serve as a dataset? DataSource is something more than a dataset, since it can split itself and has (had?) a databunch method on it that went directly to a DataBunch?

I think the functionality of DataSource may have changed a little bit in its new incarnation (not sure), which is one of the places along the way where I might be getting confused :).

Either way, if you would be so kind as to provide the low-level progression of steps to go from items to a DataBunch, and give me a heads up when it is in a state where I could start playing with it, it would be greatly appreciated :slight_smile:

It seems nbs 21 and 23 use the DataBlock API; I could probably jump into using that, but would really appreciate a chance to play with the low-level building blocks.

Thank you so much!

I found what seems to be an incorrect implementation of resize_max() in 07_vision_core.
When max_px (the maximum number of pixels) is specified and is less than n_px (the current number of pixels), h and w must be scaled by math.sqrt(max_px/x.n_px), not simply by (max_px/x.n_px) as they are now: since n_px = h*w, scaling both dimensions by a factor k scales n_px by k², so k has to be the square root of the target ratio.
To make sure the resulting n_px <= max_px, I suggest using math.floor on h and w, like this:

if max_px and x.n_px>max_px: k=math.sqrt(max_px/x.n_px); h,w = math.floor(h*k),math.floor(w*k)

This leads to the correct result:

In:  im.n_px
Out: 600
In:  im.resize_max(max_px=300).n_px
Out: 294
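As a sanity check with hypothetical dimensions that give the same n_px (a 30x20 image, so n_px = 600):

import math
h, w, max_px = 30, 20, 300
k = math.sqrt(max_px/(h*w))                 # ~0.707
math.floor(h*k), math.floor(w*k), math.floor(h*k)*math.floor(w*k)
>>> (21, 14, 294)

Scaling by max_px/n_px = 0.5 directly would instead give 15*10 = 150, shrinking the image far more than needed.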

However, I'm not sure how to write a correct test for this case, since the resulting n_px is not equal to the specified max_px. Any ideas?

This will do:

import operator

def test_le(a, b):
    "`test` that `a<=b`"
    test(a, b, operator.le, '<=')   # `test` is fastai v2's generic comparison helper
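So the test for the example above could be written as (assuming im is loaded as before):

test_le(im.resize_max(max_px=300).n_px, 300)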

Note that any nb starting with _ isn’t exported. We use that naming either to temporarily keep around old nbs that we will probably delete soon (as in this case) or for nbs that we haven’t gotten working yet so don’t want exported.

We are keeping TfmdList, BTW. It’s just TfmdDS that’s going away.

2 Likes

Sorry @radek, I accidentally wrote TfmdList instead of TfmdDS in the earlier post announcing the merge. Edited it now.

No worries at all :slight_smile: I think I should be able to figure out how things click together now, thank you!

Thanks for flagging this, it’s fixed now.

I just pushed some broken code for @sgugger to fix while I’m at breakfast, so don’t git pull for the next couple of hours

4 Likes

OK looks like it’s all working again. There’s been a bit of module and nb renaming too.

1 Like