Lesson 3 In-Class Discussion ✅

How can we handle noisy images? Please let us know if there are any standard denoising techniques.

2 Likes

Not sure, but we’re still not using torchtext.

1 Like

I get a bunch of consecutive xxfld tokens at the beginning of my text when I create a TextLMDataBunch. What do they stand for?

xxfld 1 0 xxfld 2 0 xxfld 3 0 xxfld 4 0 xxfld 5 0 xxfld 6 1 xxfld 7 0 xxfld 8 0 xxfld 9 0 xxfld 10 0 xxfld 11 0 xxfld 12 0 xxfld 13 0 xxfld 14 0 xxfld 15 0 xxfld 16 0 xxfld 17 0 xxfld 18 0 xxfld 19 0 xxfld 20 0 xxfld 21 0 xxfld 22 0 xxfld 23 0 xxfld 24 0 xxfld 25 0 xxfld 26

1 Like

I believe rare words are considered to be words which appear only once in the entire dataset - if a word appears twice or more I don’t think it gets replaced by xxunk. If a word is so rare that it only appears once, it can’t be used for training (as it is guaranteed to over-fit to just that single record).

1 Like

xxfld means Field 1, Field 2, etc. Somehow you’re feeding the data in a way that makes it think each column is a separate field. When I feed data to the text API it’s just label, text (not tokenized).

I actually have cases where these single-appearance keywords are important and cannot be ignored.

The exact rule is a maximum of 60k words, each appearing at least twice (otherwise there’s nothing to learn), but your general statement is true.
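For anyone curious, here is a rough plain-Python sketch of that rule (the max_vocab=60000 and min_freq=2 defaults below match what fastai uses; the helper itself is just for illustration):

```python
from collections import Counter

def build_vocab(tokens, max_vocab=60_000, min_freq=2):
    """Keep the max_vocab most frequent tokens that occur at least min_freq times."""
    counts = Counter(tokens)
    return [tok for tok, freq in counts.most_common(max_vocab) if freq >= min_freq]

tokens = "the cat sat on the mat the cat".split() + ["aardvark"]
print(build_vocab(tokens))  # ['the', 'cat'] - 'aardvark' appears once, so it would map to xxunk
```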

Thanks! Now this all makes sense. I have a multi-label CSV with 26 label columns, which must be why it’s picking those up.

Correct! It’s basically a way to signal to the neural network that this field is different from the others.
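For reference, here is a minimal sketch of how that can happen with the fastai v1 text API (the CSV name and column names below are hypothetical). If the label columns get treated as text fields, each one gets prefixed with its own xxfld marker, so pointing text_cols at the actual text column avoids it:

```python
from fastai.text import TextLMDataBunch

# If many columns are treated as text fields, each field is prefixed with
# its own "xxfld N" marker during tokenization. Selecting the single text
# column explicitly keeps the labels out of the language-model input.
data_lm = TextLMDataBunch.from_csv(
    'data', 'train.csv',
    text_cols='comment_text',            # the one column that actually holds text
    label_cols=['label_1', 'label_2'],   # the (many) label columns
)
```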

Since there is a convolution operation behind the scenes, no new weights are added.
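As a quick sanity check of that point (plain PyTorch, nothing fastai-specific): the number of weights in a conv layer depends only on the kernel size and channel counts, so feeding larger images adds no parameters.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
print(sum(p.numel() for p in conv.parameters()))  # 3*16*3*3 + 16 = 448 weights, fixed

print(conv(torch.randn(1, 3, 64, 64)).shape)      # torch.Size([1, 16, 64, 64])
print(conv(torch.randn(1, 3, 128, 128)).shape)    # torch.Size([1, 16, 128, 128]), same 448 weights
```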

1 Like

Do you think the universal approximation theorem is something similar to Fourier series for functions? Like, you can decompose any function into a bunch of sin/cos terms, and likewise here you can build any N-dimensional surface from these “rectangular plateaus”?

4 Likes

I think the other key is figuring out how many (and which) layers to include or not

Jeremy will cover NLP and ULMFiT in much more detail in a future lesson. This was just a brief example.

7 Likes

Very similar yes. I haven’t read the details of the proof, but I’m pretty sure both of them use the same mathematical theorem behind the scenes.
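If it helps, here is a toy sketch of the idea: a single hidden layer of ReLUs fit to sin(x). With ReLUs the pieces are linear ramps rather than rectangles, but the intuition of adding up simple pieces to build an arbitrary shape is the same (the hyperparameters below are arbitrary):

```python
import torch
import torch.nn as nn

# Target: approximate sin(x) on [-pi, pi] with one hidden layer of ReLUs.
x = torch.linspace(-3.14, 3.14, 200).unsqueeze(1)
y = torch.sin(x)

model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(2000):
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(loss.item())  # close to 0: the piecewise-linear pieces add up to ~sin(x)
```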

2 Likes

Yes, I think it’s the same concept.

1 Like

I understand how fully connected layers relate to linear models, but don’t convolution layers do something sort of different?
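They are still linear, though: a convolution is equivalent to a matrix multiply with shared (and mostly zero) weights. A quick numerical check of the linearity property:

```python
import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3, bias=False)
x1, x2 = torch.randn(1, 3, 16, 16), torch.randn(1, 3, 16, 16)
a, b = 2.0, -0.5

# conv(a*x1 + b*x2) == a*conv(x1) + b*conv(x2), i.e. convolution is a linear map
lhs = conv(a * x1 + b * x2)
rhs = a * conv(x1) + b * conv(x2)
print(torch.allclose(lhs, rhs, atol=1e-5))  # True
```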

1 Like

Use SentencePiece Byte-Pair-Encoding (BPE): https://github.com/google/sentencepiece ?
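For example, a minimal sketch of training and using a SentencePiece BPE model (the corpus path and vocab size here are hypothetical):

```python
import sentencepiece as spm

# Train a BPE model on a plain-text corpus (one sentence per line).
spm.SentencePieceTrainer.Train(
    '--input=corpus.txt --model_prefix=bpe --vocab_size=8000 --model_type=bpe'
)

sp = spm.SentencePieceProcessor()
sp.Load('bpe.model')
print(sp.EncodeAsPieces('unbelievably rare words get split into subword pieces'))
```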

4 Likes

Some satellite images have 4 channels. How can we deal with 4-channel or 2-channel datasets using pretrained models?
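One common workaround (a sketch in plain PyTorch, not fastai-specific) is to swap the pretrained model’s first conv layer for a 4-channel one, copy across the pretrained RGB weights, and initialise the extra channel, e.g. with their mean; for 2-channel data you could similarly keep or average two of the pretrained channel slices:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet34(pretrained=True)
old_conv = model.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

new_conv = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    new_conv.weight[:, :3] = old_conv.weight                            # reuse RGB filters
    new_conv.weight[:, 3:] = old_conv.weight.mean(dim=1, keepdim=True)  # init the 4th channel

model.conv1 = new_conv
print(model(torch.randn(2, 4, 224, 224)).shape)  # torch.Size([2, 1000])
```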

17 Likes

Character-level language models are interesting too: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

1 Like