Lesson 6 In-Class Discussion ✅

lesscomfortable · November 28, 2018, 2:51am

It is always positive as long as the information is relevant and you regularize accordingly.

Lothar · November 28, 2018, 2:52am

Don’t all these date parts add a ton of co-linearity among features?

mkolodny · November 28, 2018, 2:53am

Do you have any advice for combining image data with tabular data?

For example, say you have images of stores, and you also have tabular information about those stores like their locations’ coordinates.

rachel · November 28, 2018, 2:54am

Co-linearity isn’t really a problem, in that you are not trying to get a model with the fewest features, but rather just to make accurate predictions.
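Rachel's point can be checked numerically. The sketch below uses NumPy least squares on synthetic data, which is an assumption: fastai's tabular model is a neural net, not a linear regression. One feature is repeated as perfectly collinear copies. The individual weights are no longer unique, which matters if you want to interpret coefficients. The predictions, however, are identical.

```python
import numpy as np

# Synthetic data for illustration only: one informative feature plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)

# Intercept + the single feature.
X_single = np.column_stack([np.ones_like(x), x])
# Intercept + the same feature three times (perfectly collinear columns).
X_collinear = np.column_stack([np.ones_like(x), x, x, 2 * x])

# lstsq returns the minimum-norm solution when the design matrix is rank-deficient.
w_single, *_ = np.linalg.lstsq(X_single, y, rcond=None)
w_collinear, *_ = np.linalg.lstsq(X_collinear, y, rcond=None)

pred_single = X_single @ w_single
pred_collinear = X_collinear @ w_collinear

print(w_single)       # two weights
print(w_collinear)    # four weights, spread across the duplicated columns
print(np.allclose(pred_single, pred_collinear))  # True: same predictions
```

Both fits project `y` onto the same column space, which is why the predictions agree even though the weight vectors differ.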

apparle · November 28, 2018, 2:54am

Is Categorify a fastai-specific feature, or is it part of pandas itself?

sgugger · November 28, 2018, 2:54am

It’s a fastai processor.
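Although Categorify belongs to fastai, the core idea can be approximated in plain pandas. The sketch below is not fastai's implementation. It learns the category list from the training set and reuses it for the test set so that the same value always maps to the same integer code. The column name `StoreType` and the data are illustrative only.

```python
import pandas as pd

train = pd.DataFrame({"StoreType": ["a", "c", "a", "b"]})
test = pd.DataFrame({"StoreType": ["b", "a", "c"]})

# Learn the categories from the training set only (sorted unique values).
categories = pd.Categorical(train["StoreType"]).categories

# Apply the same category list to both sets so their codes agree.
train["StoreType"] = pd.Categorical(train["StoreType"], categories=categories)
test["StoreType"] = pd.Categorical(test["StoreType"], categories=categories)

print(list(categories))                          # ['a', 'b', 'c']
print(train["StoreType"].cat.codes.tolist())     # [0, 2, 0, 1]
print(test["StoreType"].cat.codes.tolist())      # [1, 0, 2]
```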

KevinB · November 28, 2018, 2:55am

Would explaining fastai processors be part of this or an advanced discussion?

crostino · November 28, 2018, 2:55am

He answered that question in lesson 4 (not 100% sure). What you do is have two models and combine them. He answered this for combining NLP tokenized data with metadata.
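One way to read "two models, combined" for the images-plus-store-data question is a single network with one branch per input type whose features are concatenated before a shared head. The sketch below assumes PyTorch. The class name `ImageTabularModel`, the tiny CNN, and all layer sizes are hypothetical choices, not fastai's API. In practice the image branch would usually be a pretrained backbone.

```python
import torch
import torch.nn as nn


class ImageTabularModel(nn.Module):
    """Hypothetical two-branch model: image features + tabular features -> one head."""

    def __init__(self, n_tab_features: int, n_out: int = 1):
        super().__init__()
        # Tiny CNN standing in for an image backbone.
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (batch, 8, 1, 1)
            nn.Flatten(),             # -> (batch, 8)
        )
        # Small MLP for the tabular inputs (e.g. store coordinates).
        self.tab_branch = nn.Sequential(
            nn.Linear(n_tab_features, 16),
            nn.ReLU(),
        )
        # Shared head on the concatenated feature vectors.
        self.head = nn.Linear(8 + 16, n_out)

    def forward(self, image: torch.Tensor, tab: torch.Tensor) -> torch.Tensor:
        img_feats = self.image_branch(image)
        tab_feats = self.tab_branch(tab)
        return self.head(torch.cat([img_feats, tab_feats], dim=1))


model = ImageTabularModel(n_tab_features=2)
images = torch.randn(4, 3, 64, 64)  # batch of 4 RGB images
coords = torch.randn(4, 2)          # e.g. latitude/longitude per store
out = model(images, coords)
print(out.shape)  # torch.Size([4, 1])
```

An alternative reading is to train the two models separately and combine only their predictions. Concatenating features, as above, lets the head learn interactions between the two inputs.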

sgugger · November 28, 2018, 2:55am

Processors in general are an advanced topic. If you have a question about any of the ones Jeremy is explaining right now, go ahead!

rohitr · November 28, 2018, 2:56am

Is day 365 that much different from day 1 of the next year? On this scale, though, they will look very far apart.

sgugger · November 28, 2018, 2:56am

One thing Jeremy didn’t mention about FillNA is that it only modifies continuous variables. For categorical variables, NaNs get a special code (-1), so we don’t need to create a new column for them.
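The behaviour sgugger describes can be sketched in pandas. The continuous-variable part assumes the processor fills NaNs with the training-set median and adds a boolean `<col>_na` flag column. That is a labelled assumption here, not something stated in the thread. The categorical part relies only on pandas assigning code -1 to missing values. Column names and data are illustrative.

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({
    "CompetitionDistance": [100.0, np.nan, 300.0, 500.0],
    "StoreType": ["a", None, "b", "a"],
})
test = pd.DataFrame({
    "CompetitionDistance": [np.nan, 200.0],
    "StoreType": [None, "b"],
})

cont_cols = ["CompetitionDistance"]

for col in cont_cols:
    # Assumed fill value: median of the training set, reused for the test set.
    median = train[col].median()
    for df in (train, test):
        df[col + "_na"] = df[col].isna()  # record which rows were missing
        df[col] = df[col].fillna(median)

# Categorical NaNs need no extra column: pandas gives them the code -1.
cats = pd.Categorical(train["StoreType"]).categories
test_codes = pd.Categorical(test["StoreType"], categories=cats).codes

print(train["CompetitionDistance"].tolist())  # [100.0, 300.0, 300.0, 500.0]
print(test_codes.tolist())                    # [-1, 1]
```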

KevinB · November 28, 2018, 2:57am

OK, my questions are more general: how everything is handled as it goes through the parameters. I will put them in a new post if I still have questions after this lecture.

sgugger · November 28, 2018, 2:57am

No, because it’s a categorical variable, so code 365 might not be that different from code 1. The model will learn this.
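The contrast between the two views of day-of-year can be made concrete. As a continuous number, 31 December and 1 January are 364 units apart. As a categorical variable, each day is just an index into an embedding table, so the numeric gap between codes carries no meaning and training decides how similar the rows end up. The embedding table below (366 rows, 4 dimensions, random values) is a hypothetical stand-in; fastai chooses embedding sizes with its own rule.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2014-12-31", "2015-01-01"])
day_of_year = pd.Series(dates.dayofyear)  # 365 and 1

# Continuous view: two consecutive days look 364 units apart.
gap = abs(int(day_of_year[0]) - int(day_of_year[1]))
print(gap)  # 364

# Categorical view: each day is only an index into an embedding table.
codes = pd.Categorical(day_of_year, categories=list(range(1, 367))).codes
embedding = np.random.default_rng(0).normal(size=(366, 4))  # untrained, illustrative
vectors = embedding[codes]  # one learned vector per day after training
print(codes.tolist())  # [364, 0]
print(vectors.shape)   # (2, 4)
```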

nbharatula · November 28, 2018, 2:58am

Can Jeremy explain how/which categorical variables and how/which continuous variables become the independent variables?

crostino · November 28, 2018, 2:58am

How do you deal with categorical variables that have high cardinality? Or with the situation where categories exist in the test set but not in the training set?

bholmer · November 28, 2018, 2:58am

What if the test set has a value of a categorical variable that is not in your training set?

techjoey · November 28, 2018, 2:58am

Where does “procs” come from when adding it to an item list creator to run pre-processors on your data? How do you specify which pre-processors to include in procs for your data set?

rachel · November 28, 2018, 2:58am

Here the dependent variable is sales.

sgugger · November 28, 2018, 2:59am

Categories in your validation/test set that don’t exist in your training set will be set to unknown (or -1 in pandas).
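The unseen-category case can be reproduced in plain pandas, assuming Categorify behaves like reusing the training categories on the test set. This is a sketch, not fastai's code. A test value that never appeared in training becomes NaN, and its integer code is -1.

```python
import pandas as pd

train = pd.Series(["Jan", "Feb", "Mar"])
test = pd.Series(["Feb", "Dec"])  # "Dec" never appears in training

train_cat = train.astype("category")
# Reuse the training categories for the test set.
test_cat = test.astype(pd.CategoricalDtype(categories=train_cat.cat.categories))

print(train_cat.cat.categories.tolist())  # ['Feb', 'Jan', 'Mar']
print(test_cat.cat.codes.tolist())        # [0, -1]
```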

dotkay · November 28, 2018, 2:59am

Does Categorify remember sub-categories (e.g. Jan, Feb being a subset of Jan, Feb, Mar, Apr)? Does it matter at all?