Lesson 4 In-Class Discussion ✅

Can you explain what Normalize does in this result? Wouldn’t the neural net just be able to learn different weights that handle non-normalized data if you didn’t normalize it beforehand?

2 Likes

Are binary variables worth representing with embeddings?

2 Likes

Can someone remind me how we handle rows that don’t have data for some features? For example, there’s no marital status information for a few of the adults. Do we ignore those data points, average out similar ones, or average that feature over all the other data points?

2 Likes
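Not from the course notebook, but a minimal pandas sketch of the usual approach: fill the missing continuous values with the median and keep a flag column recording which rows were missing (I believe fastai’s FillMissing processor does something along these lines by default).

```python
import pandas as pd
import numpy as np

# toy frame with one missing continuous value (made-up data)
df = pd.DataFrame({'age': [39, 50, np.nan, 28]})

# record which rows were missing, then fill with the column median
df['age_na'] = df['age'].isna()
df['age'] = df['age'].fillna(df['age'].median())
print(df)
```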

can data augmentation be used for NLP?

8 Likes

Neural nets always love normalized data.

2 Likes

Have you ever tried using embeddings for continuous variables?

1 Like

Well, we first have to think about how to augment it. Image transformations are often just matrix multiplications; text is not that easy.
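To make the “matrix multiplication” point concrete, here is a rough numpy sketch (my own illustration, not course code): a rotation augmentation is just a 2x2 matrix applied to pixel coordinates, and there is no equally simple linear operator for a sentence.

```python
import numpy as np

theta = np.deg2rad(10)                            # small rotation, a typical augmentation
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])

coords = np.array([[10.0, 20.0], [30.0, 5.0]])    # (x, y) pixel coordinates
rotated = coords @ rot.T                          # the augmentation is just a matrix multiply
print(rotated)
```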

For all of those who were asking questions about tokenization and numericalization in NLP, those are processors like the ones Jeremy is talking about right now.

3 Likes
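For anyone who wants to see where those processors show up in code, this is roughly the fastai v1 tabular setup from the lesson notebook (the column names and validation split here are illustrative):

```python
from fastai.tabular import *   # fastai v1, as used in the course

path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')

procs = [FillMissing, Categorify, Normalize]   # the processors being discussed
cat_names = ['workclass', 'education', 'marital-status',
             'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']

data = (TabularList.from_df(df, path=path, cat_names=cat_names,
                            cont_names=cont_names, procs=procs)
        .split_by_idx(list(range(800, 1000)))
        .label_from_df(cols='salary')
        .databunch())
```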

That’s a great start, but we have terabytes of tabular data. Can we use something like tf.data.Dataset that can read from disk in batches?
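No fastai-specific answer here, but for the generic “read from disk in batches” part, plain pandas can already stream a large CSV in fixed-size chunks (a sketch with a placeholder file name, not a fastai feature):

```python
import pandas as pd

n_rows = 0
# stream a large CSV in chunks instead of loading it all into RAM
# ('big_table.csv' is a placeholder path)
for chunk in pd.read_csv('big_table.csv', chunksize=100_000):
    n_rows += len(chunk)          # replace with real per-batch preprocessing
print(n_rows)
```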

@sgugger, are these processors that Jeremy is talking about right now in tabular the same kind of processors we use in text to set the tokenizer to another language, max_vocab, etc.?

2 Likes

Is there a limit to the cardinality of a categorical feature?

2 Likes

Such as adding noise, or replacing some words with synonyms?

So it looks like tabular data is the first example we’ve looked at where we don’t use transfer learning at all - we’re building our model from scratch.

10 Likes

Ah, that definitely warrants such a feature. I’ll leave the question to people who understand the library better.

Only because we haven’t found a huge dataset that we could learn from :wink:

1 Like

URLs.ADULT_SAMPLE
http://files.fast.ai/data/examples/adult_sample

The file does not exist when you run the example.

@sgugger any thoughts?

Why do the continuous variables end up negative after normalization - I thought the range was 0 to 1?

Can I use K-nearest neighbors to impute missing values?

No, it’s mean 0 and std 1.

1 Like
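To tie the last two posts together: Normalize standardizes each continuous column to mean 0 and standard deviation 1, not to the range 0 to 1, which is why roughly half the values come out negative. A quick sketch with made-up numbers:

```python
import numpy as np

age = np.array([25., 38., 52., 61., 90.])
normalized = (age - age.mean()) / age.std()   # mean 0, std 1 -- not scaled to [0, 1]
print(normalized)                             # values below the mean become negative
```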