Lesson 3 In-Class Discussion ✅

I actually have cases where these single-appearance keywords are important and cannot be ignored.

The exact rule is a maximum of 60k words, each appearing at least twice (otherwise there's nothing to learn), but your general statement is true.
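If it helps, that cutoff logic is easy to sketch in plain Python (`build_vocab` and the toy token list are made up for illustration; if I remember right, fastai exposes these as the `max_vocab` and `min_freq` arguments):

```python
from collections import Counter

def build_vocab(tokens, max_vocab=60_000, min_freq=2):
    """Keep the most frequent tokens, dropping anything seen fewer
    than min_freq times, capped at max_vocab entries."""
    counts = Counter(tokens)
    return [tok for tok, c in counts.most_common(max_vocab) if c >= min_freq]

tokens = ["the", "cat", "sat", "the", "mat", "cat", "the"]
print(build_vocab(tokens))  # ['the', 'cat'] -- "sat" and "mat" appear only once
```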

Thanks! Now this all makes sense. I have a multi-label CSV with 26 label fields, which must be why it's picking those up.

Correct! It's basically a way to signal to the neural network that this field is different from the others.

Since there is a convolution operation behind the scenes, no new weights are added.

Do you think the universal approximation theorem is something similar to Fourier series for functions? Like, you can decompose any function into a bunch of sin/cos terms, and likewise here you can build any N-dimensional surface from these "rectangular plateaus"?

I think the other key is figuring out how many (and which) layers to include or not.

Jeremy will cover NLP and ULMFiT in much more detail in a future lesson. This was just a brief example.

Very similar, yes. I haven't read the details of the proof, but I'm pretty sure both of them use the same mathematical theorem behind the scenes.

Yes, I think it’s the same concept.

I understand how fully connected layers relate to linear models, but don’t convolution layers do something sort of different?

Use SentencePiece Byte-Pair-Encoding (BPE): https://github.com/google/sentencepiece ?

Some satellite images have 4 channels. How can we deal with 4-channel or 2-channel datasets using pretrained models?

Character-level language models are interesting too: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

For 2 channels, you could create a dummy 3rd channel that is the average of the 2 channels.
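A quick sketch of that idea, with plain nested lists standing in for an image tensor (`add_mean_channel` is just a hypothetical helper name):

```python
def add_mean_channel(img2):
    """img2: 2 channels as [2][H][W] nested lists. Returns a 3-channel
    image whose third channel is the pixelwise mean of the first two,
    so it fits a 3-channel pretrained model."""
    c0, c1 = img2
    mean = [[(a + b) / 2 for a, b in zip(r0, r1)] for r0, r1 in zip(c0, c1)]
    return [c0, c1, mean]

img = [[[0, 2]], [[4, 6]]]       # 2 channels, each 1x2 pixels
print(add_mean_channel(img)[2])  # [[2.0, 4.0]]
```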

Like if another channel is from a depth sensor, like in the head pose data from the Kinect?

Are certain shapes, or classification problems, more suited to, say, ReLU? For example, Fourier transforms take a ton of terms to build a step change, but only one term to build a sine wave. Is there something analogous for ReLU and the size of the architecture, or the size of the middle layers?
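Worth noting that ReLU is the opposite case: a step change is exactly where ReLUs are cheap. A tiny illustration in plain Python (the function names are just for this sketch):

```python
def relu(x):
    return max(0.0, x)

def abs_via_relu(x):
    # |x| is exactly relu(x) + relu(-x): two hinges make a V shape
    return relu(x) + relu(-x)

def step_via_relu(x, eps=0.01):
    # a steep ramp of width 2*eps approximating a unit step at 0,
    # built from just two ReLU terms (vs. many Fourier terms)
    return (relu(x + eps) - relu(x - eps)) / (2 * eps)

print(abs_via_relu(-3.0))   # 3.0
print(step_via_relu(1.0))   # ~1.0, well past the ramp
print(step_via_relu(-1.0))  # 0.0
```

So the analogy roughly holds in reverse: piecewise-linear or step-like targets need few ReLU units, while very smooth, wavy targets are where ReLU needs many pieces and sines are the cheap basis.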

For 4 channels, you could try to do some kind of dimensionality reduction (linear combinations of channels?) to transform to 3 channels.
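Something like this, sketched with nested lists and a made-up 3x4 mixing matrix (the weights folding the 4th channel into each output are arbitrary, just to show the shape of the operation):

```python
def reduce_channels(img4, weights):
    """img4: [4][H][W] nested lists; weights: [3][4] mixing matrix.
    Each output channel is a linear combination of the 4 inputs."""
    h, w = len(img4[0]), len(img4[0][0])
    return [[[sum(wk * img4[k][i][j] for k, wk in enumerate(row))
              for j in range(w)] for i in range(h)]
            for row in weights]

# e.g. keep 3 channels and fold a little of the 4th into each
W = [[1, 0, 0, 0.1],
     [0, 1, 0, 0.1],
     [0, 0, 1, 0.1]]
img = [[[10]], [[20]], [[30]], [[40]]]  # 4 channels, 1x1 pixel
print(reduce_channels(img, W))  # [[[14.0]], [[24.0]], [[34.0]]]
```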

Does the predict function return multiple labels for multi-label classification?
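For multi-label, the usual approach is a sigmoid per class plus a threshold, so several labels (or none) can come back at once. A minimal sketch, assuming you already have per-class probabilities (the label names are from the planet dataset used in the lesson):

```python
def multilabel_predict(probs, labels, thresh=0.5):
    """Return every label whose probability clears the threshold --
    unlike single-label softmax, zero or several labels can fire."""
    return [lab for lab, p in zip(labels, probs) if p > thresh]

labels = ["haze", "primary", "water", "road"]
print(multilabel_predict([0.9, 0.7, 0.2, 0.6], labels))  # ['haze', 'primary', 'road']
```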

Is it possible to create a model that predicts both a class and a regression number just by creating the correct DataBunch?
