Lesson 4 In-Class Discussion

My guess is that both are embeddings. Instead of using Word2Vec embeddings, we create our own from scratch.
Per @yinterian's question, could we use both, e.g. by merging them somehow?

1 Like

When you use multiple languages, does this mean you have to use multiple models and select between them? Is there a way to use one big model?

1 Like

Interesting. As Jeremy said, the model learnt English. Maybe, when we have multiple languages, it will learn to distinguish between them? Thought: start with one model, feed it data in multiple languages to teach it all of them, and use that as a pre-trained model?

Word2vec learns an embedding matrix as well. There are multiple ways of learning embeddings.

3 Likes

@Jeremy, if you don't mind going into it, how do the embeddings created by this method differ from those you would get by running GloVe or word2vec on the same custom dataset?

Is this method superior because it leverages some of the new techniques you've covered, like stochastic gradient descent with restarts and (I'm assuming) bi-LSTMs with dropout, or is there something else going on under the hood?

1 Like

@jeremy Is there any rule of thumb for choosing "categorical" or "continuous" when it comes to discrete variables? What are the pros and cons of each? For example, "year" could be treated as either categorical or continuous. Thanks in advance.
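
To make the example concrete, here is a rough pandas sketch of the two treatments (hypothetical dataframe):

```python
import pandas as pd

# Hypothetical dataframe with a discrete 'year' column
df = pd.DataFrame({'year': [2014, 2015, 2015, 2016, 2017]})

# Continuous: keep it numeric (optionally standardized)
df['year_cont'] = (df['year'] - df['year'].mean()) / df['year'].std()

# Categorical: each distinct year becomes its own category
# (and, in an embedding model, gets its own learned vector)
df['year_cat'] = df['year'].astype('category').cat.codes
```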

Yes, set to zero. So no gradients are propagated back from that unit for that mini-batch.
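
A minimal PyTorch sketch of the effect (hypothetical tensor sizes): dropped activations become exactly zero for that mini-batch, so no gradient flows back through them, while the survivors are scaled up by 1/(1-p).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)   # drop each activation with probability 0.5
drop.train()               # dropout is only active in training mode

x = torch.randn(4, 8, requires_grad=True)
y = drop(x)                # roughly half the activations are set to exactly 0
y.sum().backward()

print((y == 0).float().mean())     # ~0.5 of the activations were dropped
print(x.grad[y == 0].abs().sum())  # tensor(0.) - no gradient through dropped units
```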

Not with the fastai lib, unless you make a copy of it.

1 Like

Since they are binary, it doesn't really matter where they go.

n*1 concat 4*1 --> (n+4)*1
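
For example, a rough PyTorch sketch with made-up sizes (n = 10 embedding activations plus 4 binary columns):

```python
import torch

emb_out  = torch.randn(1, 10)                  # n = 10 activations from the embeddings
binaries = torch.tensor([[0., 1., 1., 0.]])    # 4 binary variables, used as-is

x = torch.cat([emb_out, binaries], dim=1)      # concatenate along the feature axis
print(x.shape)                                 # torch.Size([1, 14]), i.e. n + 4
```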

Make sense?

2 Likes

With different types of datasets you can need very different amounts of dropout. The normal amounts for image CNNs are very different from those for structured data, for instance.

1 Like

Not really - you have to try a few amounts. But somewhere around this number should be OK. If you have less data, you'll need fewer activations, or you'll overfit.

1 Like

Yes it works great. Remind me to show you next week if I forget!

5 Likes

The point is to learn the embedding of ‘Sunday’. We don’t know up front what an appropriate encoding is.
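
A rough sketch of the idea with PyTorch's nn.Embedding (hypothetical sizes): rather than fixing an encoding for 'Sunday' up front, each day of the week gets a small vector of weights that the network learns by backprop like any other parameter.

```python
import torch
import torch.nn as nn

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
day_emb = nn.Embedding(num_embeddings=7, embedding_dim=4)  # 7 days, 4-dim vectors

sunday_idx = torch.tensor([days.index('Sun')])
print(day_emb(sunday_idx))  # starts out random; gets trained along with the model
```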

Conceptually, yes, you can use the same approach, but it’s not in the library yet (or any other software). Would make for a good project!

3 Likes

Great point. I’d suggest adding columns for locality, area, city, county, and state, and then make them all (including zip code) separate categorical variables!
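
Something like this pandas sketch, with a hypothetical zip-to-region lookup table:

```python
import pandas as pd

# Hypothetical lookup table mapping zip codes to higher-level regions
zip_lookup = pd.DataFrame({
    'zip':    ['94103', '94107', '10001'],
    'city':   ['San Francisco', 'San Francisco', 'New York'],
    'county': ['San Francisco', 'San Francisco', 'New York'],
    'state':  ['CA', 'CA', 'NY'],
})

df = pd.DataFrame({'zip': ['94103', '10001'], 'sales': [100, 250]})
df = df.merge(zip_lookup, on='zip', how='left')

# Treat zip and each derived column as its own categorical variable
for col in ['zip', 'city', 'county', 'state']:
    df[col] = df[col].astype('category')
```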

7 Likes

Yeah, I guess SMOTE is, strictly speaking, a data augmentation technique for structured data. But I haven't seen it used with these types of models, and I don't know whether it's helpful. It seems tricky to implement with embeddings - not quite sure how it would work…

It’s just what NLP researchers have empirically found to work. Going above 600 dimensions hasn’t been found helpful so far. It seems the implicit dimensionality of words in English just isn’t higher than that.

2 Likes

bump! @jeremy

I am using a Random Forest model for a project where we are analyzing URLs and content. But the features look like this:
http://kase.twemail.com.de/contact/dmB4L3686825_frIV688.iTBV516_92qwH.html 0.0,0.0,2570.0,783.0,0.0,0.0,0.0,1.0,0.0,0.0,20.0,6.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,1.0,1.0,1.0,3.0,0.0,1.0,103.0,1.0,2

Here the 2 at the end is the label and the rest are features. Valid values for most of the features are -1, 0, or 1. It seems like the features are already categorical, with three categories each. I'm not sure how to add more dimensions using embeddings to leverage deep learning.
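
For illustration, here is roughly what I have in mind (a hypothetical PyTorch sketch, not from the lesson): one small embedding table per column, over its three possible values {-1, 0, 1}.

```python
import torch
import torch.nn as nn

n_features = 160   # hypothetical number of three-valued columns
emb_dim = 2        # small embedding per feature

# One embedding table per feature, each over the 3 categories {-1, 0, 1}
embs = nn.ModuleList([nn.Embedding(3, emb_dim) for _ in range(n_features)])

def embed_row(row):
    # row: tensor of values in {-1, 0, 1}; shift them to indices {0, 1, 2}
    idx = (row + 1).long()
    return torch.cat([emb(idx[i:i + 1]) for i, emb in enumerate(embs)], dim=1)

row = torch.randint(-1, 2, (n_features,)).float()
print(embed_row(row).shape)  # torch.Size([1, 320]), i.e. n_features * emb_dim
```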