Thanks for the reply.
After thinking about this for a while, I realize that the . is used here to represent the end of a ‘sentence’(only a word in this case), hence representing that idea of a pause from one thing to another.
Has anyone worked on or read about abstractive summarization in NLP using fastai or PyTorch? Any resources (papers, blog posts, etc.) would also be helpful!
@wgpubs’ blurr library has an example of using BART for text summarisation with fastai and Hugging Face.
Hi friends! I am getting a weird error when trying to load a model:
learn_inf = load_learner(model_path)
File "C:\Users\maciamug\.conda\envs\imdb\lib\site-packages\fastai2\learner.py", line 520, in load_learner
res = torch.load(fname, map_location='cpu' if cpu else None)
File "C:\Users\maciamug\.conda\envs\imdb\lib\site-packages\torch\serialization.py", line 593, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "C:\Users\maciamug\.conda\envs\imdb\lib\site-packages\torch\serialization.py", line 773, in _legacy_load
result = unpickler.load()
File "vocab.pyx", line 580, in spacy.vocab.unpickle_vocab
File "C:\Users\maciamug\.conda\envs\imdb\lib\site-packages\srsly\_pickle_api.py", line 23, in pickle_loads
return cloudpickle.loads(data)
AttributeError: Can't get attribute 'cluster' on <module 'spacy.lang.lex_attrs'
This was working last week. Have there been any changes?
When I generate a DataBlock for classification for sentiment analysis of the IMDB dataset, I find that my first few batches are almost entirely padding (xxpad). But when I train a classifier, I still get an OK accuracy (0.86 after one round of fit_one_cycle).
I’ve looked at as many as 50 batches with show_batch, and all but the first batch are entirely padding. Has anyone encountered this? Does anyone have any thoughts on how I can best investigate this further?
I do not know the answer myself. But …
Chapter 10 has information about why this is the case:
We will expand the shortest texts to make them all the same size. To do this, we use a special padding token that will be ignored by our model. Additionally, to avoid memory issues and improve performance, we will batch together texts that are roughly the same lengths (with some shuffling for the training set). We do this by (approximately, for the training set) sorting the documents by length prior to each epoch. The result of this is that the documents collated into a single batch will tend to be of similar lengths. We won’t pad every batch to the same size, but will instead use the size of the largest document in each batch as the target size.
The sorting and padding are automatically done by the data block API for us when using a TextBlock with is_lm=False. (We don’t have this same issue for language model data, since we concatenate all the documents together first, and then split them into equally sized sections.)
My understanding is that there is a sample in this dataset that is very large, which causes these additional padding tokens. But I do not know how to validate this because of my minimal understanding of SortedDL.
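For what it’s worth, here’s a minimal stdlib sketch (hypothetical lengths, not the real IMDB data) of the length-sorted batching described above. It also shows how a single outsized document makes its batch mostly padding while the remaining batches need very little:

```python
import random

# Hypothetical lengths (not the real IMDB data): 63 ordinary reviews plus one
# very long outlier, mimicking the situation suspected above.
random.seed(0)
lengths = [random.randint(50, 200) for _ in range(63)] + [3000]

def padding_fraction_per_batch(lengths, bs):
    """Sort docs by length (as SortedDL roughly does), batch them, pad each
    batch to its longest doc, and return the padding fraction per batch."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    fractions = []
    for start in range(0, len(order), bs):
        batch = order[start:start + bs]
        target = max(lengths[i] for i in batch)        # pad to longest in batch
        pad = sum(target - lengths[i] for i in batch)  # total padding tokens
        fractions.append(pad / (target * len(batch)))
    return fractions

fractions = padding_fraction_per_batch(lengths, bs=8)
# The batch holding the 3000-token outlier is mostly padding; the other
# length-sorted batches need very little.
```

This is only a toy model of what SortedDL does (it adds shuffling noise for the training set), but it shows why one very long review is enough to produce a padding-heavy batch.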
@init_27 @muellerzr - Could you please assist with this question?
It’s a bug; however, it seems no one has opened a bug report on it. (@jeremy, I know you were working on text for a while; I presume you saw this issue?)
Doesn’t this mean that most of the sequences inside this batch are far shorter than the default of 72? As you pointed out, I also suspect one sample in the dataset is much longer than the others.
IIRC it’s actually that there’s one review that’s much longer than the others, so the others have a lot of padding. And only the first few tokens are shown, so the non-padding bit isn’t visible.
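To illustrate with a toy example (plain Python; the padding is placed at the front here, since that’s where the xxpad tokens appear in show_batch):

```python
# A toy batch: one long review and two short ones, padded at the front so all
# rows share the long review's length.
docs = [
    ["xxbos"] + [f"tok{i}" for i in range(20)],  # one much longer review
    ["xxbos", "good", "movie"],
    ["xxbos", "bad", "movie"],
]
target = max(len(d) for d in docs)
batch = [["xxpad"] * (target - len(d)) + d for d in docs]

# show_batch truncates the display; "showing" only the first 10 tokens of each
# row makes the short documents look like pure padding, even though their real
# tokens are still there further along in the row.
shown = [row[:10] for row in batch]
```

So the data can be fine even when the truncated display looks like nothing but xxpad.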
I get exactly the same error; I replicated the code on two machines with the same results.
However, my classification results are not good; they are basically totally off.
This is from lesson #8, running Notebook #10 (NLP), in the “Creating the Classifier Data Loader” section.
In Notebook #10 everything was perfect up to the language model; then, when moving into the classifier data loader, I see the xxpad problem many times.
I understand your explanation of the padding; however, the results after training are completely off, which leads me to believe it is not a padding issue but a real problem with the data, and thus the notebook.
I ran the notebook on two different machines and get the same problem.
I appreciate your help here.
Did you solve it? I am also getting the same error.
Hi Rachel,
If I have processed my data with SubwordTokenizer() and created a tmp directory with spm.model and spm.vocab files, how would I pass this to TextBlock.from_folder() if I want to use it for another model without retraining the tokenizer?
Thanks!
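Not a definitive answer, but here’s an untested sketch of what I’d try, assuming your fastai version’s SubwordTokenizer accepts a pre-trained SentencePiece model via its sp_model argument (worth checking the signature) and that TextBlock.from_folder forwards tok to the tokenizer. The path "path/to/texts" is a placeholder:

```python
from pathlib import Path

model_path = Path("tmp/spm.model")  # the SentencePiece model trained earlier

try:
    from fastai.text.all import SubwordTokenizer, TextBlock

    # Reuse the already-trained model so no retraining happens.
    tok = SubwordTokenizer(sp_model=str(model_path))
    block = TextBlock.from_folder("path/to/texts", tok=tok, is_lm=False)
except Exception:
    pass  # fastai not installed, or the sketch's placeholder paths don't exist
```

If sp_model isn’t accepted in your version, the cache_dir argument pointing at your existing tmp directory may be worth a look too.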
Hi everybody,
I’m not sure if I understand batch size and sequence length for RNNs. In the example from chapter 10, bs is 128 and seq_len is 80. My understanding was that the concatenated text is split into 128 mini-streams, and the first batch consists of the first 80 tokens of every mini-stream. (See the toy example on page 340 in the print version.)
When I print the batch, however, it looks like it’s the same line repeated 128 times. Can somebody explain this? It is not the case when creating the DataLoader manually.
Thank you!
Alright, new day, new try. I could not reproduce this today; all lines of the batch are different now (as they should be, following the explanation in the chapter).
I don’t know what caused this, but I’m now pretty sure my understanding of the batches is correct.
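For anyone checking their understanding the same way, the split can be sketched with toy numbers (bs=4 and seq_len=5 instead of the chapter’s 128 and 80):

```python
# Toy version of LM batching: concatenate all tokens, cut the stream into
# `bs` mini-streams, and let each batch take `seq_len` tokens per mini-stream.
tokens = list(range(120))  # stand-in for the concatenated corpus
bs, seq_len = 4, 5

stream_len = len(tokens) // bs  # 30 tokens per mini-stream
streams = [tokens[i * stream_len:(i + 1) * stream_len] for i in range(bs)]

# First batch: the first seq_len tokens of every mini-stream.
first_batch = [s[:seq_len] for s in streams]
# Every row is different, because each mini-stream starts at a different
# offset into the corpus.
```

If all rows of a batch look identical, something upstream went wrong; the offsets alone guarantee distinct rows.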
Hi everyone, quick question. In the implementation of the Dropout layer, why is the mask defined in this complicated way?
class Dropout(Module):
    def __init__(self, p): self.p = p
    def forward(self, x):
        if not self.training: return x
        # note: the original wrote bare `p`, which is undefined in forward;
        # it should be `self.p`
        mask = x.new(*x.shape).bernoulli_(1-self.p)
        return x * mask.div_(1-self.p)
Why not just use the bernoulli method, which is not in-place?
mask = x.bernoulli(1-p)
Thank you for any ideas.
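Not an authoritative answer, but two things stand out to me: x.new(*x.shape) guarantees a fresh tensor with x’s dtype and device, and (if I read the PyTorch docs correctly) Tensor.bernoulli() without a probability argument treats the tensor’s own values as probabilities, so it wouldn’t do what’s wanted here. Independent of the PyTorch API, the inverted-dropout arithmetic the snippet implements looks like this in plain Python:

```python
import random

def dropout(xs, p, training=True, seed=None):
    """Inverted dropout on a list of floats: zero each element with probability
    p, scale survivors by 1/(1-p) so the expected value is unchanged."""
    if not training or p == 0.0:
        return list(xs)
    rng = random.Random(seed)
    keep = 1.0 - p
    # mask entries are 1 with probability `keep`, 0 otherwise (Bernoulli draws)
    mask = [1.0 if rng.random() < keep else 0.0 for _ in xs]
    return [x * m / keep for x, m in zip(xs, mask)]

out = dropout([1.0, 2.0, 3.0, 4.0], p=0.5, seed=0)
# with p=0.5 each surviving element is doubled; dropped elements become 0.0
```

The division by 1-p is the part that keeps the layer’s expected activation the same at train and test time.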
One more question: how does weight tying in the AWD-LSTM work, even though the layers seem to have different dimensions?
self.i_h = nn.Embedding(vocab_sz, n_hidden)
self.h_o = nn.Linear(n_hidden, vocab_sz)
self.h_o.weight = self.i_h.weight
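As far as I can tell, the dimensions actually match: nn.Linear stores its weight with shape (out_features, in_features), so Linear(n_hidden, vocab_sz).weight has shape (vocab_sz, n_hidden), exactly like Embedding(vocab_sz, n_hidden).weight. A tiny illustration of that shape bookkeeping (plain Python, assuming standard PyTorch conventions):

```python
# Shape bookkeeping only (assuming standard PyTorch conventions):
# nn.Embedding(num_embeddings, embedding_dim).weight -> (num_embeddings, embedding_dim)
# nn.Linear(in_features, out_features).weight        -> (out_features, in_features)
vocab_sz, n_hidden = 1000, 64

embedding_weight_shape = (vocab_sz, n_hidden)   # self.i_h = Embedding(vocab_sz, n_hidden)
in_features, out_features = n_hidden, vocab_sz  # self.h_o = Linear(n_hidden, vocab_sz)
linear_weight_shape = (out_features, in_features)

# The shapes coincide, which is why `self.h_o.weight = self.i_h.weight` is legal.
```

So the tying works because Linear transposes the weight internally when it applies it, not because the two layers share dimensions in their constructor arguments.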
h = F.relu(self.h_h(self.i_h(x[:,0])))
h = h + self.i_h(x[:,1])
h = F.relu(self.h_h(h))
h = h + self.i_h(x[:,2])
h = F.relu(self.h_h(h))
return self.h_o(h)
Hey guys, I’m a little confused by this diagram and the corresponding code. In the diagram there are only two orange arrows (which represent self.h_h() + ReLU()), but in the code there are three lines of F.relu(self.h_h(h)). So shouldn’t there be another orange arrow before the blue one going to the output?
Hi, I think if we view the “circles” as representations of F.relu(self.h_h(...)), the rectangles as representations of self.i_h(x), and the triangle as the representation of self.h_o(h), then it makes sense. The first self.h_h is applied to the input only; that’s why the arrow is green.
But I agree: the text says the colored arrows indicate identical weight matrices, so there should be a third orange arrow somewhere for the image to match the text…