Can you explain what Normalize does in this result? Wouldn’t the neural net be able to just have different weights that handle non-normalized data if you didn’t normalize it beforehand?
Are binary variables worth being represented by embeddings?
can someone remind me how we handle rows that don’t have data for some features? Like there’s no marital status information for a few of the adults. Do we ignore those data points, average over similar ones, or average that feature over all the other data points?
can data augmentation be used for NLP?
Neural nets always love normalized data.
Have you ever tried using embeddings for continuous variables?
Well, we first have to think of how to augment them. Image transformations are often just matrix multiplications. Text is not that easy.
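To make that concrete, here’s a toy sketch (plain Python with made-up coordinates, not an actual augmentation library) of a geometric image augmentation expressed as a matrix multiplication on a pixel coordinate:

```python
import math

# A 2-D rotation as matrix multiplication: rotating an image
# just multiplies each pixel coordinate by this 2x2 matrix.
theta = math.pi / 2                      # rotate 90 degrees
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

def apply(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

x, y = apply(R, [1.0, 0.0])              # point (1, 0) -> (0, 1)
print(round(x, 6), round(y, 6))
```

There is no equally mechanical transformation for a sentence, which is why text augmentation is harder.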
For all of those who were asking questions about tokenization and numericalization in NLP, those are processors like the ones Jeremy is talking about right now.
That’s a great start. But we have terabytes of tabular data. Can we use something like tf.data.Dataset that can read from disk in batches?
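For illustration, batch reading doesn’t need anything framework-specific; a stdlib sketch (hypothetical column names, not fastai’s API) that streams a CSV in fixed-size batches instead of loading it all at once:

```python
import csv
import io
from itertools import islice

# Stand-in for a large on-disk CSV (toy data; a real run would
# pass an open file handle instead of a StringIO).
f = io.StringIO("age,salary\n" + "\n".join(f"{20 + i},{i % 2}" for i in range(100)))

reader = csv.DictReader(f)
n_rows = 0
while True:
    batch = list(islice(reader, 32))  # pull one batch of 32 rows
    if not batch:
        break
    n_rows += len(batch)              # train on this batch here

print(n_rows)  # 100
```

pandas offers the same pattern via `pd.read_csv(..., chunksize=...)`, which returns an iterator of DataFrames.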
@sgugger, are these processors that Jeremy is talking about right now in tabular the same kind of processors we have in text for setting the tokenizer to another language, max_vocab, etc.?
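Not fastai’s actual implementation, but here’s a toy sketch of what a tokenizer + numericalizer processor pair does, including a `max_vocab` cap (all names here are made up for illustration):

```python
from collections import Counter

def build_vocab(texts, max_vocab=4):
    """Map the most frequent tokens to ids, reserving id 0 for unknowns."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    itos = ["xxunk"] + [w for w, _ in counts.most_common(max_vocab - 1)]
    return {w: i for i, w in enumerate(itos)}

def numericalize(text, stoi):
    """Tokenize by whitespace, then replace each token with its id."""
    return [stoi.get(tok, stoi["xxunk"]) for tok in text.lower().split()]

stoi = build_vocab(["the cat sat", "the dog sat"], max_vocab=4)
print(numericalize("the bird sat", stoi))  # "bird" falls back to xxunk
```

The point is that the processor is fitted once on the training data (the vocab) and then applied identically everywhere, which is exactly the role Categorify and Normalize play for tabular columns.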
Is there a limit to the cardinality of a categorical feature?
such as adding noise, or replacing some words with synonyms?
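Synonym replacement can be sketched in a few lines. The synonym table below is a made-up toy; a real augmenter might draw from WordNet or learned embeddings instead:

```python
import random

# Hypothetical synonym table for illustration only.
SYNONYMS = {"quick": ["fast", "speedy"], "happy": ["glad", "joyful"]}

def synonym_augment(sentence, p=0.5, seed=None):
    """Randomly replace words that have known synonyms with probability p."""
    rng = random.Random(seed)
    words = []
    for w in sentence.split():
        if w in SYNONYMS and rng.random() < p:
            words.append(rng.choice(SYNONYMS[w]))
        else:
            words.append(w)
    return " ".join(words)

print(synonym_augment("the quick fox is happy", p=1.0, seed=0))
```

Unlike flipping an image, though, this can silently change meaning, which is part of why NLP augmentation is trickier.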
So it looks like tabular data is the first example we’ve looked at where we don’t use transfer learning at all - we’re building our model from scratch.
Ah, that definitely warrants such a feature. I will leave the question to people who understand the library better.
Only because we haven’t found a huge dataset that we could learn from
URLs.ADULT_SAMPLE
‘http://files.fast.ai/data/examples/adult_sample’
file does not exist when you run the example
Why do the continuous variables end up negative after normalization - I thought the range was 0 to 1?
Can I use K-nearest neighbors to impute missing values?
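Yes, that’s a common strategy. A minimal pure-Python sketch with toy data (sklearn’s `KNNImputer` does this properly at scale): find the k complete rows closest to the row with the hole, measured on the columns it does have, and fill the hole with their mean.

```python
import math

# Toy rows: (age, hours_per_week); None marks a missing value.
rows = [
    (25.0, 40.0),
    (27.0, 42.0),
    (60.0, 20.0),
    (26.0, None),   # impute hours_per_week for this row
]

def knn_impute(rows, target_idx, col, k=2):
    """Fill rows[target_idx][col] with the mean of that column over
    the k nearest complete rows (distance on the non-missing columns)."""
    target = rows[target_idx]
    known = [i for i, r in enumerate(rows)
             if i != target_idx and all(v is not None for v in r)]
    def dist(i):
        return math.dist(
            [v for j, v in enumerate(rows[i]) if target[j] is not None],
            [v for v in target if v is not None],
        )
    nearest = sorted(known, key=dist)[:k]
    return sum(rows[i][col] for i in nearest) / k

print(knn_impute(rows, 3, 1))  # mean of the two age-nearest rows: 41.0
```

Here ages 25 and 27 are closest to 26, so the missing hours value becomes (40 + 42) / 2 = 41.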
No, it’s mean 0 and std 1, not a 0-to-1 range, so anything below the mean comes out negative.
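A quick sketch with a made-up ages column showing why that happens:

```python
import statistics

ages = [22, 38, 26, 35, 50]               # toy continuous column
mu = statistics.mean(ages)                # 34.2
sigma = statistics.pstdev(ages)           # population std dev
normed = [(a - mu) / sigma for a in ages]

# Values below the mean become negative; the result is centered
# at 0 with std 1, not squashed into [0, 1].
print(normed[0] < 0)    # True: 22 is below the mean of 34.2
print(max(normed) > 1)  # True: 50 is more than one std above the mean
```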