Abbreviations for train/validation/test variables


(Jeremy Howard) #21
  1. Yes I think valid_ here
  2. That’s my concern too. Another option would be to use _ as the lambda variable. And if there are multiple, _1 and _2. Is that too weird?
  3. I don’t think we have anything with multiple LRs yet, do we? If we do, then I think lrs is important too, so you know you’ve got a list of them.

(Jeremy Howard) #22
  1. Yes, that’s what I meant. dir is a commonly used built-in function in Python, so maybe we should stick with folder.
  2. class is a reserved word.
  3. I really meant cls to be used just for the argument to class methods, so maybe we should use clas here.
  4. Yes
  5. I’m finding it useful to clarify my own thinking. And it’s nice to have a clear role model for standards. But when people contribute in the future I don’t want us to be too strict about this kind of thing - we can always refactor/rename PRs ourselves if we want to.

(Stas Bekman) #23

Yes, I meant a pandas DataFrame. Many notebooks seem to use dep:

train_proc_df, y, nas, mapper = proc_df(train_df, dep, do_scale=True)

Hmm, this forum software is not super-friendly for this kind of parallel multi-issue discussion. Is there a better workflow to follow? It’s so much easier to do that over email with automatic quoting. Internet "progress" keeps dumping things that work so well and reinventing the wheel (poorly). :frowning:


(Stas Bekman) #24

_ is already used as a convention to discard return values, so it’s probably not the best choice. _1 is weird.

Perhaps it’s OK if we continue using x in very short lambdas, or perhaps l for lambda?
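To make the clash concrete: in everyday Python, `_` already names a value being thrown away, so reusing it as a lambda argument would give one symbol two unrelated jobs. A minimal sketch with made-up variables; it uses `o`, the lambda name the thread settles on further down, though `x` would read the same way:

```python
# `_` conventionally marks a value we intend to discard
quotient, _ = divmod(17, 5)  # the remainder (2) is ignored

pairs = [("b", 2), ("a", 1), ("c", 3)]

# A short, neutral lambda argument reads as "the element".
# Writing `lambda _: _[1]` instead would suggest the argument is discarded.
by_value = sorted(pairs, key=lambda o: o[1])

print(quotient, by_value)
```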

I don’t think we have them at the moment; the v0 API uses lrs and wds in fit().

For better clarity in communications, how should we refer to the pre-v1 codebase? v0?


(Stas Bekman) #25

Ah, yes! All those Python reserved words.

clas is weird. If we don’t want to collide with anything, perhaps cl?

Another alternative: adding a short prefix, e.g. img_class? It also makes it more specific.

Thank you for the feedback, Jeremy. That’s helpful to me.


#26

It does, but it accepts either an array of lrs/wds (for differential learning rates/weight decays) or just a single value; it works in both cases.
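One way a fit()-style function can accept either form is to expand a single value into one value per layer group. This is a minimal sketch only, not the fastai implementation; the helper name `to_lrs` and the `n_groups` parameter are hypothetical:

```python
from typing import Sequence, Union

def to_lrs(lrs: Union[float, Sequence[float]], n_groups: int) -> list:
    """Hypothetical helper: turn a single lr into one lr per layer group,
    or check that a list of differential lrs has one entry per group."""
    if isinstance(lrs, (int, float)):
        return [float(lrs)] * n_groups  # same lr shared by every group
    lrs = [float(lr) for lr in lrs]
    if len(lrs) != n_groups:
        raise ValueError(f"expected {n_groups} learning rates, got {len(lrs)}")
    return lrs  # differential learning rates, one per group

print(to_lrs(1e-2, 3))
print(to_lrs([1e-4, 1e-3, 1e-2], 3))
```

The same pattern would apply to wds. Naming the parameter lrs signals that a list is accepted even when a single value is passed.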


(Jeremy Howard) #27

> Hmm, this forum software is not super-friendly for this kind of parallel multi-issue discussion. Is there a better workflow to follow? It’s so much easier to do that over email with automatic quoting.

You can actually just prefix lines with >, just like with email, and it turns them into a quote; that’s what I’ve done above. Or, whilst composing, you can select something from a previous message and it pops up a ‘quote’ button, like so:

OK I’m back to using o then :slight_smile:

OK.

Much better! Actually, maybe we should stop calling them ‘classes’ and start calling them ‘categories’, which quite naturally gives cat and cats. I think in v0 I might have used these terms interchangeably…


(Stas Bekman) #28

You’re correct about how they were passed as positional arguments in the notebooks:

learn.fit(lr, 3, cycle_len=1, cycle_mult=2)
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)

but internally the named argument is lrs. Perhaps the new API could be consistent?

v0:

def get_layer_opt(self, lrs, wds):
def fit(self, lrs, n_cycle, wds=None, **kwargs):

Yet in v1 we currently have:

def fit(self, epochs, lr, opt_fn=optim.SGD):

I’m talking about s/lr/lrs/ in v1. I hope we are on the same page now.


(Stas Bekman) #29

>> Hmm, this forum software is not super-friendly for this kind of parallel multi-issue discussion. Is there a better workflow to follow? It’s so much easier to do that over email with automatic quoting.

> You can actually just prefix lines with >, just like with email, and it turns them into a quote; that’s what I’ve done above. Or, whilst composing, you can select something from a previous message and it pops up a ‘quote’ button, like so:

Yes, that I figured out :slight_smile:, but when quoting, it doesn’t respect the hierarchy of quotes: it flattens all quotes to the same level, making it impossible to distinguish who said what, so I have to go back and re-add >>. Not user-friendly at all.

>> Perhaps it’s OK if we continue using x in very short lambdas, or perhaps l for lambda?

> OK I’m back to using o then :slight_smile:

Did you mean ‘back to using x’, not o?

>> Another alternative: adding a short prefix, e.g. img_class? It also makes it more specific.

> Much better! Actually, maybe we should stop calling them ‘classes’ and start calling them ‘categories’, which quite naturally gives cat and cats. I think in v0 I might have used these terms interchangeably…

That’s even better. I wasn’t sure whether ‘categories’ was already taken by cat_vars. So can you be more specific, Jeremy? Do you suggest:

s/cat_vars/category_vars/
s/cats/categories/
s/cat/category/
s#folder/cls#folder/category#

Yes? So, for example, in nb_002.py it would look like this:

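from torch.utils.data import Dataset

class FilesDataset(Dataset):
    # Expects one subfolder per category: folder/<category>/<image files>.
    # folder is a pathlib.Path; get_image_files comes from the notebook code (not shown).
    def __init__(self, folder, categories):
        self.fns, self.y = [], []
        self.categories = categories
        for i, category in enumerate(categories):
            fnames = get_image_files(folder/category)
            self.fns += fnames
            self.y += [i] * len(fnames)  # label = index of the category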

Continuing this line of thought, cat_vars should really be cat_cols (or, in the new style, category_cols), as they are columns in the DataFrame and not really variables. Thoughts?

And if so, expanding further:

dep_col
category_cols
contin_cols

Perhaps there’s a better, more unified word for cont/contin?


#30

In v1, we don’t have differential lrs at the moment, which is why it’s written lr for now. I don’t know yet how we will deal with differential learning rates, so we will see whether that lr becomes lrs or not :wink:


(Jeremy Howard) #31

@stas I mean cat not category. But thinking about it more, that’s a problem because “cat” could mean “category” or it could mean “categorical”, and they’re really different things that are likely to appear in the same method, so that’ll be confusing! So I think we should say for cl in classes after all…

I really did mean o for lambdas, since we could well have situations where we have a tensor in the outer scope called x, and I try never to have anything in the outer scope called o.
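A small illustration of the concern, in plain Python with a list standing in for a tensor (the names are illustrative). Both lambdas compute the same thing; the difference is only in how easy they are to read:

```python
x = [1.0, 2.0, 3.0]  # an outer-scope "tensor" called x

# Confusing: inside the lambda, `x` is a single element, not the outer x.
scaled_confusing = list(map(lambda x: x * 2, x))

# Clearer: `o` cannot be mistaken for the outer x.
scaled = list(map(lambda o: o * 2, x))

print(scaled)
```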

BTW, I’m not sure I’ve seen anyone else on the forum using multi-level quoting. Personally I don’t find it that necessary, because when you quote with the UI (i.e. not just with >) you get a hyperlink back to the original post, so it’s easy to see the whole context that way. I use that a lot to navigate threads that I haven’t previously been involved in.


(Stas Bekman) #32

> I mean cat not category. But thinking about it more, that’s a problem because “cat” could mean “category” or it could mean “categorical”, and they’re really different things that are likely to appear in the same method, so that’ll be confusing! So I think we should say for cl in classes after all…

OK, so cl/classes for classes/categories,
and cat reserved exclusively for categorical.

> BTW, I’m not sure I’ve seen anyone else on the forum using multi-level quoting. Personally I don’t find it that necessary, because when you quote with the UI (i.e. not just with >) you get a hyperlink back to the original post, so it’s easy to see the whole context that way. I use that a lot to navigate threads that I haven’t previously been involved in.

I’m old school: keeping relevant context the way it was done 20 years ago is way more efficient, and it requires users to think a little to keep what’s important and trim what’s not. Skipping back and forth between a mix of flattened messages on various topics is so inefficient. But oh well, the lazy-manager, reply-on-top email style won, and the geeks lost. I’m fine with the new new thing.

At the very least, the quoting feature could keep the quoted text’s markdown intact; instead it strips all markdown, and you either have to put it back or accept less clear communication.


(Stas Bekman) #33

So here is the summary of what has been discussed (agreed on?) so far:

1) data 

prefixes:

train
valid
test

suffixes:

w/o   DataBunch object (name with no suffix)
df    DataFrame
ds    Dataset
dl    DataLoader

2) tensors

x     generic parameter name for tensors (forward(x) in nn.Module)
indep independent variable tensor
dep   dependent variable tensor 

3) loops

b     batch (from a dataloader)
xb    x parts of the batch
yb    y parts of the batch

4) lambdas

o     lambda arg

5) pandas

dep_col   name of the dependent column passed to proc_df
cat_col   single categorical column
cat_cols  multiple categorical columns   

6) classes

cl       single class/category (class is a reserved word; cls is kept for classmethod arguments)
classes  list of classes/categories

7) categorical vars
   
cat   single categorical var
cats  list of categorical vars

If I missed anything, please let me know.

Once confirmed/agreed on, I can merge it into abbr.md.
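To see how the summarized names read together, here is a minimal sketch in plain Python. Lists of (x, y) pairs stand in for real Dataset/DataLoader objects, and `make_dl` and `bs` are hypothetical names used only for illustration:

```python
def make_dl(ds, bs):
    """Hypothetical stand-in for a DataLoader: yield (xb, yb) batches."""
    for i in range(0, len(ds), bs):
        b = ds[i:i + bs]          # b: one batch of (x, y) pairs
        xb = [x for x, _ in b]    # xb: the x parts of the batch
        yb = [y for _, y in b]    # yb: the y parts of the batch
        yield xb, yb

# Prefixes name the split (train/valid), suffixes name the kind of object.
train_ds = [(0.5, 0), (1.5, 1), (2.5, 1), (3.5, 0), (4.5, 1)]
valid_ds = [(0.2, 0), (2.2, 1)]
train_dl = list(make_dl(train_ds, bs=2))  # materialized so it can be re-iterated
valid_dl = list(make_dl(valid_ds, bs=2))

for xb, yb in train_dl:
    print(xb, yb)

n_pos = sum(map(lambda o: o[1], train_ds))  # o: the lambda argument
```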


(Jeremy Howard) #34

FYI me pressing “like” on that means “confirmed” :wink:


(Stas Bekman) #35

Thank you for clarifying that, Jeremy.


(Jeremy Howard) #36

I assume by ‘var’ here you mean a Pandas Series. In which case I imagine we’ll be using cat_col and cat_cols. Although it’ll be a while before we get to Pandas stuff so this might change.


(Stas Bekman) #37

Right, I adjusted the summary in Abbreviations for train/validation/test variables. Thank you, @jeremy


(s.s.o) #38

Instead of def normalize(mean,std,x) and def denorm(), normalize/denormalize or norm/denorm would be nicer.


(Jeremy Howard) #39

@stas, one problem I’ve noticed in the new notebooks is that I’m sometimes using ‘x’ for a tensor in a transform and sometimes ‘img’. Really, these should all be ‘img’ if they’re specifically transforms for images; I think it’s helpful to know what a tensor represents, where that’s possible.
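A minimal sketch of that naming: a transform that only makes sense for images takes `img` rather than a generic `x`. The transform `flip_lr` and the nested-list image are hypothetical stand-ins for a real image tensor:

```python
def flip_lr(img):
    """Hypothetical image transform: mirror an image left-to-right.
    img is a list of rows (H x W), standing in for an image tensor."""
    return [row[::-1] for row in img]

img = [[1, 2, 3],
       [4, 5, 6]]
print(flip_lr(img))  # [[3, 2, 1], [6, 5, 4]]
```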


(Jeremy Howard) #40

denormalize is better. norm has a specific linear algebra meaning so we shouldn’t use that for normalization.
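A minimal sketch of the normalize/denormalize pair, keeping the (mean, std, x) argument order from def normalize(mean,std,x) above. The bodies are an assumption (standard mean/std normalization), shown with plain floats; the same expressions should also work elementwise on arrays or tensors that support broadcasting:

```python
def normalize(mean, std, x):
    """Shift and scale x using the given mean and std."""
    return (x - mean) / std

def denormalize(mean, std, x):
    """Inverse of normalize: map normalized values back to the original scale."""
    return x * std + mean

mean, std = 0.5, 0.25
x = 1.0
x_norm = normalize(mean, std, x)  # (1.0 - 0.5) / 0.25 = 2.0
print(x_norm, denormalize(mean, std, x_norm))
```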