Lesson 3 In-Class Discussion ✅

How can we handle noisy images? Please let us know if there are any standard denoising techniques.

2 Likes

Not sure, but we’re still not using torchtext.

1 Like

I get a bunch of consecutive xxfld tokens at the beginning of my text when I create a TextLMDataBunch. What do they stand for?

xxfld 1 0 xxfld 2 0 xxfld 3 0 xxfld 4 0 xxfld 5 0 xxfld 6 1 xxfld 7 0 xxfld 8 0 xxfld 9 0 xxfld 10 0 xxfld 11 0 xxfld 12 0 xxfld 13 0 xxfld 14 0 xxfld 15 0 xxfld 16 0 xxfld 17 0 xxfld 18 0 xxfld 19 0 xxfld 20 0 xxfld 21 0 xxfld 22 0 xxfld 23 0 xxfld 24 0 xxfld 25 0 xxfld 26

1 Like

I believe rare words are considered to be words which appear only once in the entire dataset - if a word appears twice or more I don’t think it gets replaced by xxunk. If a word is so rare that it only appears once, it can’t be used for training (as it is guaranteed to over-fit to just that single record).

1 Like

xxfld means Field 1, Field 2, etc. Somehow you’re feeding the data in a way that makes it think each column is a separate field. When I feed data to the text API it’s just label, text (not tokenized).

I actually have cases where these single-appearance keywords are important and cannot be ignored.

The exact rule is a maximum of 60k words, each appearing at least twice (otherwise there’s nothing to learn), but your general statement is true.
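For anyone curious, here is a rough plain-Python sketch of that rule (the max_vocab=60000 and min_freq=2 defaults below match what fastai uses; the helper itself is just for illustration):

```python
from collections import Counter

def build_vocab(tokens, max_vocab=60_000, min_freq=2):
    """Keep the max_vocab most frequent tokens that occur at least min_freq times."""
    counts = Counter(tokens)
    return [tok for tok, freq in counts.most_common(max_vocab) if freq >= min_freq]

tokens = "the cat sat on the mat the cat".split() + ["aardvark"]
print(build_vocab(tokens))  # ['the', 'cat'] - 'aardvark' appears once, so it would map to xxunk
```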

Thanks! Now this all makes sense. I have a multi-label CSV with 26 label columns, which must be why it’s picking those up.

Correct! It’s basically a way to signal to the neural network that this field is different from the others.
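For reference, here is a minimal sketch of how that can happen with the fastai v1 text API (the CSV name and column names below are hypothetical). If the label columns get treated as text fields, each one gets prefixed with its own xxfld marker, so pointing text_cols at the actual text column avoids it:

```python
from fastai.text import TextLMDataBunch

# If many columns are treated as text fields, each field is prefixed with
# its own "xxfld N" marker during tokenization. Selecting the single text
# column explicitly keeps the labels out of the language-model input.
data_lm = TextLMDataBunch.from_csv(
    'data', 'train.csv',
    text_cols='comment_text',            # the one column that actually holds text
    label_cols=['label_1', 'label_2'],   # the (many) label columns
)
```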

Since there is a convolution operation behind the scenes, no new weights are added.
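As a quick sanity check of that point (plain PyTorch, nothing fastai-specific): the number of weights in a conv layer depends only on the kernel size and channel counts, so feeding larger images adds no parameters.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
print(sum(p.numel() for p in conv.parameters()))  # 3*16*3*3 + 16 = 448 weights, fixed

print(conv(torch.randn(1, 3, 64, 64)).shape)      # torch.Size([1, 16, 64, 64])
print(conv(torch.randn(1, 3, 128, 128)).shape)    # torch.Size([1, 16, 128, 128]), same 448 weights
```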

1 Like

Do you think the universal approximation theorem is something similar to Fourier series for functions? Like, you can decompose any function into a bunch of sin/cos terms, and likewise here you can build any N-dimensional surface from these “rectangular plateaus”?

4 Likes

I think the other key is figuring out how many (and which) layers to include or not

Jeremy will cover NLP and ULMFiT in much more detail in a future lesson. This was just a brief example.

7 Likes

Very similar yes. I haven’t read the details of the proof, but I’m pretty sure both of them use the same mathematical theorem behind the scenes.
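If it helps, here is a toy sketch of the idea: a single hidden layer of ReLUs fit to sin(x). With ReLUs the pieces are linear ramps rather than rectangles, but the intuition of adding up simple pieces to build an arbitrary shape is the same (the hyperparameters below are arbitrary):

```python
import torch
import torch.nn as nn

# Target: approximate sin(x) on [-pi, pi] with one hidden layer of ReLUs.
x = torch.linspace(-3.14, 3.14, 200).unsqueeze(1)
y = torch.sin(x)

model = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(2000):
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(loss.item())  # close to 0: the piecewise-linear pieces add up to ~sin(x)
```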

2 Likes

Yes, I think it’s the same concept.

1 Like

I understand how fully connected layers relate to linear models, but don’t convolution layers do something sort of different?
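They are still linear, though: a convolution is equivalent to a matrix multiply with shared (and mostly zero) weights. A quick numerical check of the linearity property:

```python
import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3, bias=False)
x1, x2 = torch.randn(1, 3, 16, 16), torch.randn(1, 3, 16, 16)
a, b = 2.0, -0.5

# conv(a*x1 + b*x2) == a*conv(x1) + b*conv(x2), i.e. convolution is a linear map
lhs = conv(a * x1 + b * x2)
rhs = a * conv(x1) + b * conv(x2)
print(torch.allclose(lhs, rhs, atol=1e-5))  # True
```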

1 Like

Use SentencePiece Byte-Pair-Encoding (BPE): https://github.com/google/sentencepiece ?
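For example, a minimal sketch of training and using a SentencePiece BPE model (the corpus path and vocab size here are hypothetical):

```python
import sentencepiece as spm

# Train a BPE model on a plain-text corpus (one sentence per line).
spm.SentencePieceTrainer.Train(
    '--input=corpus.txt --model_prefix=bpe --vocab_size=8000 --model_type=bpe'
)

sp = spm.SentencePieceProcessor()
sp.Load('bpe.model')
print(sp.EncodeAsPieces('unbelievably rare words get split into subword pieces'))
```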

4 Likes

Some satellite images have 4 channels. How can we deal with 4-channel or 2-channel datasets using pretrained models?
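One common workaround (a sketch in plain PyTorch, not fastai-specific) is to swap the pretrained model’s first conv layer for a 4-channel one, copy across the pretrained RGB weights, and initialise the extra channel, e.g. with their mean; for 2-channel data you could similarly keep or average two of the pretrained channel slices:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet34(pretrained=True)
old_conv = model.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

new_conv = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    new_conv.weight[:, :3] = old_conv.weight                            # reuse RGB filters
    new_conv.weight[:, 3:] = old_conv.weight.mean(dim=1, keepdim=True)  # init the 4th channel

model.conv1 = new_conv
print(model(torch.randn(2, 4, 224, 224)).shape)  # torch.Size([2, 1000])
```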

17 Likes

Character-level language models are interesting too: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

1 Like