Abbreviations for train/validation/test variables

Heh, I just suggested t in your PR :slight_smile: I kinda hate using x.

We’ll be adding type annotations to all params.

Ah! Fantastic!

I strongly agree with @jsa169

Abbreviations make things a lot harder to read. I’ve been coding professionally for about five years now and I always see beginners make the mistake of being too cryptic with their code. The purpose of using a programming language is to make it easier to talk to the computer, not harder.

I just started watching Part 1’s lessons recently, and so far the hardest things for me in the code have been the huge number of global imports and the abbreviations.

Zen of Python

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.

Imports

When it comes to imports, it’s better to import the library and then call its functions as Library.some_function() rather than a cryptic some_function(). Knowing which library is being called makes the intent much easier to understand, and it tells you where to look up the documentation when you need it.
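A minimal illustration of the two styles, using the standard library's math module as a stand-in for Library (the choice of module is just an example):

```python
# Style 1: import the module and qualify every call.
# Reading math.sqrt(...) tells you where sqrt comes from,
# and therefore where to look for its documentation.
import math

qualified_result = math.sqrt(16.0)

# Style 2: import the bare name. A reader seeing sqrt(...) further down the
# file has to scroll up (or grep) to find out which library provides it.
# A star import (from math import *) is harder still, because no line in
# the file mentions sqrt by name at all.
from math import sqrt

unqualified_result = sqrt(16.0)

# Both styles call the very same function object; only readability differs.
assert math.sqrt is sqrt
```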

Abbreviations

Jason’s example is perfect (copy-pasting it here for readability):

def seq2seq_loss(input, target):
    sl,bs = target.size()
    sl_in,bs_in,nc = input.size()
    ...  # (truncated)

def sequence2sequence_loss(input, target):
    target_sequence_length, target_batch_size = target.size()
    input_sequence_length, input_batch_size, num_unique_tokens = input.size()
    ...  # (truncated)

The second example is readable, the first isn’t.
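To make the naming contrast concrete with something runnable, here is a hypothetical pure-Python sequence loss written in the descriptive style. It is not the truncated function above. It assumes logits given as nested lists shaped [sequence_length][batch_size][num_unique_tokens], target token ids shaped [sequence_length][batch_size], and an input at least as long as the target (extra input steps are ignored):

```python
import math
from typing import List


def sequence2sequence_loss(input_logits: List[List[List[float]]],
                           target_tokens: List[List[int]]) -> float:
    """Mean cross-entropy over every (time step, batch item) pair.

    Assumed shapes (nested lists, no padding handling):
      input_logits:  [input_sequence_length][input_batch_size][num_unique_tokens]
      target_tokens: [target_sequence_length][target_batch_size]
    """
    target_sequence_length = len(target_tokens)
    target_batch_size = len(target_tokens[0])
    input_sequence_length = len(input_logits)
    input_batch_size = len(input_logits[0])
    num_unique_tokens = len(input_logits[0][0])

    if input_batch_size != target_batch_size:
        raise ValueError("input and target batch sizes differ")
    if input_sequence_length < target_sequence_length:
        raise ValueError("this sketch assumes the input is at least as long as the target")

    total_loss = 0.0
    # Input time steps beyond the target length are simply ignored.
    for time_step in range(target_sequence_length):
        for batch_index in range(target_batch_size):
            token_logits = input_logits[time_step][batch_index]
            correct_token = target_tokens[time_step][batch_index]
            if not 0 <= correct_token < num_unique_tokens:
                raise ValueError(f"token id {correct_token} out of range")
            # Cross-entropy = log-sum-exp(logits) - logit of the correct token;
            # subtracting the max logit keeps exp() from overflowing.
            max_logit = max(token_logits)
            log_sum_exp = max_logit + math.log(
                sum(math.exp(logit - max_logit) for logit in token_logits))
            total_loss += log_sum_exp - token_logits[correct_token]
    return total_loss / (target_sequence_length * target_batch_size)


# Usage: with all-zero logits over 2 tokens, every prediction is 50/50,
# so the loss is log(2) regardless of the targets.
uniform_logits = [[[0.0, 0.0] for _ in range(3)] for _ in range(4)]
targets = [[0, 1, 0], [1, 1, 0]]
loss = sequence2sequence_loss(uniform_logits, targets)
```

Every intermediate name says what it holds, so a reader can follow the shape bookkeeping without reverse-engineering sl, bs or nc.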

Hope this feedback helps.

Matan

Please don’t re-cover things that have been extensively discussed on the forum, in the style guide, and in the lessons.

I have to apologize for starting this. You’ve definitely covered this, and I think I’m one of the programmers you’re referring to in the quote below. I’m going to make a conscious effort not to let the knee-jerk reactions to short names that I’ve developed over the years get in the way of understanding the benefits. (I’ll try!) :grinning: Anyway, it’s definitely a new approach to me, and I know it’s going to be for many others.

Everyone has strong opinions about coding style, except perhaps some very experienced coders who have used many languages and realize there are lots of different, perfectly acceptable approaches. The Python community has particularly strongly held views, on the whole. I suspect this is related to Python being a language targeted at beginners, and therefore there are a lot of users with limited experience in other languages; however, this is just a guess.

I think it’ll be helpful to link to what you’re referring to. Yes, we’re beating a dead horse.

Forum discussion

Style guide

Thanks, good reads.

It makes more sense now why you’d want more condensed code for math-related stuff, especially when contributing code as opposed to just using the library.
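As a hypothetical illustration (not code from the library), here is mean squared error written both ways. The terse version can be checked symbol by symbol against the formula MSE = (1/n) * sum_i (y_hat_i - y_i)**2; the descriptive version is easier to read cold but harder to match against the math:

```python
from typing import Sequence


def mse(y_hat: Sequence[float], y: Sequence[float]) -> float:
    # Short names mirror the formula: n items, predictions y_hat, targets y.
    n = len(y)
    return sum((yh - yi) ** 2 for yh, yi in zip(y_hat, y)) / n


def mean_squared_error(predictions: Sequence[float], targets: Sequence[float]) -> float:
    # Same computation with descriptive names, for comparison.
    number_of_items = len(targets)
    squared_errors = ((prediction - target) ** 2
                      for prediction, target in zip(predictions, targets))
    return sum(squared_errors) / number_of_items
```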

@stas FYI @rachel has just started adding prose to the first notebook. To avoid annoying problems with merging, you probably want to know about this:

Yes, nbdime is great. Thank you.

Perhaps @rachel could notify me when her changes have been merged, so that we don’t step on each other’s toes; I’ll resume then.

(I’ve been migrating to Kubuntu 18.04, so I’ve been away.)

Actually the toe-stepping may be a problem, since @313V has started working on adding prose to the other notebooks now too. So I’ve suggested to him that maybe it’s easier for him to do the renaming as he goes. If you’ve already made some changes, @stas, it might be a good idea to push them.

No, not yet. I was just finishing the setup of the new 18.04 env.

Actually the toe-stepping may be a problem, since @313V has started working on adding prose to the other notebooks now too.

OK, then I’ll occupy myself with other things for now.

And there are two outstanding naming questions waiting for your attention, @jeremy:

Thanks

Python module names use _, so let’s do that in notebook filenames too.

I don’t think so, because this pattern nearly always appears in a loop like for xb,yb in dl, where we don’t want long names. I know in the notebooks we have this pattern separated out, but I’d like to keep it consistent with how people will generally see it in the code.
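A minimal sketch of that loop, assuming a toy data loader written as a plain generator (the batches helper and the data are hypothetical stand-ins for a real DataLoader and dataset). Because xb and yb only live for one iteration, the short names keep the loop header compact:

```python
from typing import Iterator, List, Sequence, Tuple


def batches(x: Sequence[float], y: Sequence[float],
            bs: int) -> Iterator[Tuple[List[float], List[float]]]:
    """Hypothetical minimal data loader: yields (xb, yb) minibatches of size bs."""
    for i in range(0, len(x), bs):
        yield list(x[i:i + bs]), list(y[i:i + bs])


train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.0, 3.0, 5.0, 7.0, 9.0]

batch_sizes = []
# The idiom under discussion: xb/yb are "x batch" and "y batch".
# In a real training loop the body would be something like loss_fn(model(xb), yb).
for xb, yb in batches(train_x, train_y, bs=2):
    batch_sizes.append(len(xb))
```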

@313V please let me know if I can help you with adding prose to the notebooks. I have been following the discussions closely and have free time available.

Help would be great! Technical writing isn’t necessarily my strength anyway, so at minimum someone to help review and edit would be awesome, and I’m sure you could add a lot more than that.
What time zone are you in? Are you on any video chat, like Skype, Gchat, FaceTime, etc.? It would be cool to chat face to face and form a plan of attack.

Sorry, this got miscommunicated, quoting again:

loss_fn(model(x_valid[0:bs]), y_valid[0:bs])
xb, yb = next(iter(valid_dl))

Should x_valid and y_valid be valid_x and valid_y? I quoted the second line to show the mismatch with valid_dl.
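Under the proposed prefix convention, the two quoted lines would line up. A minimal pure-Python sketch; the toy data and the generator standing in for valid_dl are assumptions for illustration, and model/loss_fn are left out:

```python
bs = 4
# Prefix naming: the split comes first, matching valid_dl.
valid_x = list(range(10))
valid_y = [value % 2 for value in valid_x]

# Hypothetical unshuffled "data loader": a generator of (xb, yb) batches.
valid_dl = ((valid_x[i:i + bs], valid_y[i:i + bs])
            for i in range(0, len(valid_x), bs))

# With an unshuffled loader, the first batch is the same data as slicing the
# first bs items, i.e. what loss_fn(model(valid_x[0:bs]), valid_y[0:bs]) would use.
xb, yb = next(iter(valid_dl))
```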

More examples of inconsistencies from the current codebase:

data = DataBunch            (train_ds, valid_ds,
data = DataBunch.from_arrays(x_train,y_train,x_valid,y_valid, x_tfms=mnist2image)

Should these be train_x, train_y, valid_x, valid_y? It would then look like:

data = DataBunch            (train_ds, valid_ds,
data = DataBunch.from_arrays(train_x,train_y,valid_x,valid_y, x_tfms=mnist2image)

Except x_tfms is now a mismatch with this group. But it’s _tfms everywhere in the current code, so it’s probably fine.

Good. Could one of you with commit access do the filename renaming? Doing it via a PR would complicate things if the files change in the meantime.

Also, as I mentioned, consistent casing would be awesome too. Most files so far are all lowercase, so renaming the rest to lowercase seems like a sensible choice.

If you have https://stackoverflow.org/wiki/Rename.pl it takes two seconds:

rename.pl 's|-|_|g' *ipynb
rename.pl '$_=lc $_' *ipynb
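For anyone without rename.pl, roughly the same two renames can be done with Python's standard pathlib. This is a sketch: the function names are hypothetical, it assumes you point it at the notebook directory, and it applies both steps (hyphens to underscores, then lowercasing) in one pass:

```python
from pathlib import Path
from typing import List


def normalized_name(name: str) -> str:
    """Hyphens to underscores, then lowercase (like the two rename.pl calls)."""
    return name.replace("-", "_").lower()


def normalize_notebook_names(directory: str = ".") -> List[str]:
    """Rename every *ipynb file in `directory` to its normalized name.

    Returns the resulting file names, sorted. Refuses to overwrite a
    different file that already has the target name.
    """
    new_names = []
    # sorted() builds the full list first, so renaming inside the loop is safe.
    for path in sorted(Path(directory).glob("*ipynb")):
        target = path.with_name(normalized_name(path.name))
        if target.name != path.name:
            # On case-insensitive filesystems the target may "exist" because it
            # is the same file; only refuse when it is genuinely a different file.
            if target.exists() and not target.samefile(path):
                raise FileExistsError(f"{target} already exists")
            path.rename(target)
        new_names.append(target.name)
    return sorted(new_names)
```

Running normalize_notebook_names() from the notebooks folder would, for example, turn 01-Intro.ipynb into 01_intro.ipynb.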

Thank you.

Done.

Sorry, my fault for not reading carefully. Yes, I guess they should.

Except x_tfms is now a mismatch with this group. But it’s _tfms everywhere in the current code, so it’s probably fine.

Ugh. Hmm… I’m having trouble coming up with reasons why it shouldn’t be tfms_x.

I think it’s correct for it to be _tfms; it’s like _dl, _ds, etc. And in the current v1 code base it is _tfms everywhere. Moreover, if you do decide to flip it, then what do we do here?

DataBunch(train_tds, valid_ds, bs=bs, train_tfms=batch_transforms, valid_tfms=batch_transforms)

How do we deal with valid_ and train_ being prefixes?

I just meant that, locally, it breaks the flow: something_x followed by x_something.
Perhaps the x in x_tfms should be something else instead; that would fix the issue while leaving _tfms as is?

Well the good news is that I removed that parameter today, so nothing to worry about now :wink: