Lesson 6 In-Class Discussion ✅

It is always positive as long as the information is relevant and you regularize accordingly.

Don’t all these date parts add a ton of co-linearity among features?

Do you have any advice for combining image data with tabular data?

For example, say you have images of stores, and you also have tabular information about those stores like their locations’ coordinates.

Co-linearity isn’t really a problem, in that you are not trying to get a model with the fewest features, but rather just to make accurate predictions.

Is Categorify a fastai-specific feature, or is it part of pandas itself?

It’s a fastai processor, not part of pandas.

Would explaining fastai processors be part of this or an advanced discussion?

He answered that question in lesson 4, I think (not 100% sure). What you do is build two models and combine them. He answered this for combining NLP tokenized data with metadata.

Processors in general are an advanced topic. If you have a question on any of the ones Jeremy is explaining right now, go ahead!

Is day 365 really that different from day 1 of the next year? On this scale it will look like it is.

One thing Jeremy didn’t mention about FillMissing is that it only modifies continuous variables. For categorical variables, NaNs get a special code (-1), so we don’t need to create a new column for them.
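The missing-value handling described above can be sketched in plain pandas. This is only an illustration of the idea, not fastai's actual implementation: the column names `CompetitionDistance` and `StoreType`, the `_na` suffix, and the choice of the median as the fill value are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Illustrative data: one continuous and one categorical column, each with a gap.
df = pd.DataFrame({
    "CompetitionDistance": [570.0, np.nan, 14130.0, 620.0],  # continuous
    "StoreType": ["a", None, "c", "a"],                       # categorical
})

# Continuous variable: remember where values were missing in a new boolean
# column, then fill the gaps (here with the median, as an assumption).
df["CompetitionDistance_na"] = df["CompetitionDistance"].isna()
median = df["CompetitionDistance"].median()
df["CompetitionDistance"] = df["CompetitionDistance"].fillna(median)

# Categorical variable: no extra column is needed, because pandas already
# encodes a missing value with the special category code -1.
store_codes = df["StoreType"].astype("category").cat.codes

print(df)
print(store_codes.tolist())
```

The extra boolean column lets the model learn from the fact that a value was missing, while the categorical column carries that information in the code itself.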

OK, my questions are more general, about how everything is handled as it passes through the parameters. I will put them in a new post if I still have questions after this lecture.

No, because it’s a categorical variable. So the code 365 might not be that different from the code 1. The model will learn this.
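To make this concrete, here is a small pandas sketch of a day-of-year date part treated as a categorical variable. The dates are made up for illustration. The point is that each day becomes an arbitrary integer code (which a model can use to look up a learned embedding), so the numeric gap between 365 and 1 is not something the model sees directly.

```python
import pandas as pd

# Made-up dates straddling a year boundary (2014 is not a leap year).
dates = pd.Series(pd.to_datetime(["2014-12-31", "2015-01-01", "2015-01-02"]))

# Date part as a plain number: 365, then 1, then 2.
day_of_year = dates.dt.dayofyear

# Treated as a category, each distinct day is just a label with an integer code.
day_cat = day_of_year.astype("category")
codes = day_cat.cat.codes

print(day_of_year.tolist())
print(list(day_cat.cat.categories))
print(codes.tolist())
```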

Can Jeremy explain how/which categorical variables and how/which continuous variables become the independent variables?

How do you deal with categorical variables that have high cardinality? And how do you deal with categories that exist in the test set but not in the training set?

What if the test set has a value of a categorical variable that is not in your training set?

Where does “procs” come from when adding it to an item list creator to run pre-processors on your data? How do you specify which pre-processors to include in procs for your data set?

Here the dependent variable is sales.

Categories that appear in your validation/test set but don’t exist in your training set will be set to unknown (code -1 in pandas).
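A minimal pandas sketch of that behaviour, assuming the category list is learned from the training data and then reused for the validation/test data. The month values are made up, and this shows pandas' own encoding rather than fastai's internals.

```python
import pandas as pd

train = pd.Series(["Jan", "Feb", "Mar", "Apr"])
valid = pd.Series(["Feb", "May", "Jan"])  # "May" never appears in train

# Learn the category list from the training data only.
categories = train.astype("category").cat.categories

# Mark values unseen in training as missing, then encode the validation data
# against the training categories. Missing values get the code -1.
valid_known = valid.where(valid.isin(categories))
valid_codes = pd.Categorical(valid_known, categories=categories).codes

print(list(categories))
print(valid_codes.tolist())
```

Because the validation data is encoded against the training categories, values seen in both sets keep the same code, and anything new falls into the single unknown bucket.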

Does Categorify remember sub-categories (for example, that Jan, Feb is a subset of Jan, Feb, Mar, Apr)? Does it matter at all?
