It is always positive as long as the information is relevant and you regularize accordingly.
Don’t all these date parts add a ton of co-linearity among features?
Do you have any advice for combining image data with tabular data?
For example, say you have images of stores, and you also have tabular information about those stores like their locations’ coordinates.
Co-linearity isn’t really a problem, in that you are not trying to get a model with the fewest features, but rather just to make accurate predictions.
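For anyone wondering what "all these date parts" look like in practice, here is a minimal sketch using only the Python standard library. The function name `date_parts` and the exact set of keys are my own assumptions for illustration, not the fastai helper itself. The point is that one date column becomes several correlated features, and that is fine when the goal is prediction.

```python
from datetime import date

def date_parts(d: date) -> dict:
    """Expand one date into several derived features (illustrative subset)."""
    return {
        "Year": d.year,
        "Month": d.month,
        "Week": d.isocalendar()[1],          # ISO week number
        "Day": d.day,
        "Dayofweek": d.weekday(),            # Monday = 0
        "Dayofyear": d.timetuple().tm_yday,
        "Is_month_start": d.day == 1,
    }

parts = date_parts(date(2015, 7, 31))
print(parts)
```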
Is Categorify a fastai-specific feature, or is it part of pandas itself?
It’s a processor of fastai.
Would explaining fastai processors be part of this or an advanced discussion?
He answered that question in lesson 4 (not 100% sure). What you do is build two models and combine them. He answered this for combining NLP tokenized data with metadata.
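To make "two models and combine them" a bit more concrete, here is a rough PyTorch sketch. It is not the approach shown in the lesson. The class name `ImageTabularNet`, the layer sizes and the random inputs are all assumptions for illustration. One branch turns the image into a feature vector, another does the same for the tabular columns, and a final layer sees both concatenated.

```python
import torch
import torch.nn as nn

class ImageTabularNet(nn.Module):
    """Hypothetical two-branch model: image features + tabular features."""
    def __init__(self, n_tab_features: int, n_out: int = 1):
        super().__init__()
        # Tiny CNN branch; in practice this would be a pretrained backbone.
        self.image_branch = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),                      # -> (batch, 8)
        )
        # Small MLP branch for the tabular columns (e.g. store coordinates).
        self.tab_branch = nn.Sequential(nn.Linear(n_tab_features, 8), nn.ReLU())
        # The head sees both feature vectors side by side.
        self.head = nn.Linear(8 + 8, n_out)

    def forward(self, image, tab):
        feats = torch.cat([self.image_branch(image), self.tab_branch(tab)], dim=1)
        return self.head(feats)

model = ImageTabularNet(n_tab_features=2)
images = torch.randn(4, 3, 32, 32)   # made-up batch of 4 RGB images
coords = torch.randn(4, 2)           # made-up (lat, lon) per store
out = model(images, coords)
print(out.shape)
```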
Processors in general are an advanced topic. If you have a question on any of the ones Jeremy is explaining right now, go ahead!
Is day 365 that much different from day 1 of next year? But on this scale it will be.
One thing Jeremy didn’t say about FillMissing is that it only modifies continuous variables. For categorical variables, NaNs get a special code (-1), so we don’t need to create a new column for them.
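Here is a small pandas-only sketch of that difference. It is not fastai's own implementation. The column names and values are made up, and filling with the median plus an `_na` indicator column is the strategy I am assuming for the continuous case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "CompetitionDistance": [570.0, np.nan, 14130.0, 620.0],  # continuous, made-up values
    "StoreType": ["a", None, "c", "a"],                       # categorical, made-up values
})

# Continuous column: record where values were missing, then fill with the median.
df["CompetitionDistance_na"] = df["CompetitionDistance"].isna()
median = df["CompetitionDistance"].median()
df["CompetitionDistance"] = df["CompetitionDistance"].fillna(median)

# Categorical column: pandas gives a missing value the code -1,
# so no extra indicator column is needed.
codes = df["StoreType"].astype("category").cat.codes

print(df)
print(codes.tolist())
```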
Ok, my questions are more general and about how everything is handled going through the parameters. I will put them in a new post if I still have questions after this lecture.
No, because it’s a categorical variable. So the code 365 might not be that different from the code 1. The model will learn this.
Can Jeremy explain how/which categorical variables and how/which continuous variables become the independent variables?
How do you deal with categorical variables that have high cardinality? And how do you deal with categories that exist in the test set but not in the training set?
What if the test set has a value of a categorical variable that is not in your training set?
Where does “procs” come from when adding it to an item list creator to run pre-processors on your data? How do you specify which pre-processors to include in procs for your data set?
Here the dependent variable is sales.
Categories in your validation/test set that don’t exist in the training set will be set to unknown (code -1 in pandas).
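You can see the pandas side of this with a quick sketch. The data is made up, and this only shows plain pandas behaviour rather than what fastai does internally. The key is to reuse the training categories when encoding the test column.

```python
import pandas as pd

train = pd.Series(["a", "b", "c", "a"], dtype="category")
test_raw = pd.Series(["b", "d", "a"])   # "d" never appears in training

# Reuse the training categories for the test column so codes stay consistent.
test = pd.Series(pd.Categorical(test_raw, categories=train.cat.categories))

# The unseen value "d" becomes missing, which pandas encodes as -1.
test_codes = test.cat.codes.tolist()
print(test_codes)
```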
Does Categorify remember sub-categories (for example, Jan, Feb would be a subset of Jan, Feb, Mar, Apr)? Does it matter at all?