Part 2 Lesson 10 wiki

All you have to do is make sure you include:
from fastai import *

Everything you need will be imported automatically!

Can the ULMFiT pre-trained model be useful for text summarization? I’ve seen it work for text classification. Are there any examples of summarization using this pre-trained model? If so, please point me there. Thanks.

seq2seq is not part of fastai yet (afaik).

I’m getting this error when running the lesson 10 notebook: ‘Tokenizer’ object has no attribute ‘proc_all_mp’.
I’ve looked at the code and there is no “proc_all_mp” implemented. Has that code been changed?
How can I solve this? Please help.

I was using the latest version of fastai before; this got resolved by downgrading to fastai 0.7.

I don’t understand the first issue: why does the range run from n_lbls+1 to len(df.columns)?
Did you manage to figure that out? If you did, please explain the reason.

It seems like the loop for i in range(n_lbls+1, len(df.columns)) only does anything when you have more than one text column.

Suppose you have 4 labels, df[0] to df[3], followed by 3 text columns, df[4] to df[6]:

n_lbls=4
for i in range(n_lbls+1, len(df.columns)) becomes for i in range(5,7), which adds df[5] and df[6] to the text. The first text column, df[4] (i.e. df[n_lbls]), is already added to the text before the loop.

In the most common case, with 1 label followed by 1 text column (n_lbls=1),
the loop becomes for i in range(2,2), which is empty, so nothing extra is added to the text.
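To make the column arithmetic concrete, here is a minimal sketch of the same split-and-join logic applied to a single row (a plain list instead of a DataFrame). It is a simplified re-creation, not the notebook’s exact code: the helper name join_text_columns is hypothetical, and the "xbos"/"xfld" marker tokens and the 1, 2, 3 field numbering are assumptions chosen for illustration.

```python
# Hypothetical helper illustrating how label and text columns are separated.
# The "xbos"/"xfld" markers and field numbering are assumptions, not the notebook's code.
BOS, FLD = "xbos", "xfld"

def join_text_columns(row, n_lbls=1):
    """Return (labels, text) for one row: the first n_lbls entries are labels, the rest are text."""
    labels = row[:n_lbls]
    # The first text column, row[n_lbls], is always added, before the loop runs.
    text = f"{BOS} {FLD} 1 " + str(row[n_lbls])
    # The loop only covers the *extra* text columns; range(n_lbls + 1, len(row))
    # is empty when there is a single text column.
    for i in range(n_lbls + 1, len(row)):
        text += f" {FLD} {i - n_lbls + 1} " + str(row[i])
    return labels, text

# 4 labels (indices 0-3) followed by 3 text columns (indices 4-6): the loop visits 5 and 6.
print(join_text_columns(["l0", "l1", "l2", "l3", "t4", "t5", "t6"], n_lbls=4))
# (['l0', 'l1', 'l2', 'l3'], 'xbos xfld 1 t4 xfld 2 t5 xfld 3 t6')

# 1 label followed by 1 text column: range(2, 2) is empty, so only the first text is used.
print(join_text_columns([0, "great movie"], n_lbls=1))
# ([0], 'xbos xfld 1 great movie')
```

Tagging each field with its own marker keeps the boundaries between text columns visible to the model after they are concatenated into one string.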


I hadn’t considered the case where there is more than one label.
But your reply helps me understand that this code can be applied not just to IMDb, but to other cases too.
Thx!!

I am getting the same error “ValueError: not enough values to unpack (expected 2, got 1)”. Did you find any solution for this error?

Has anyone tried making predictions on a large dataset? I have a 100 GB dataset with 1 billion rows and I want the fastai model to predict the sentiment of each row. Although I have a server and the dataset would probably fit into RAM, the kernel keeps dying when I use the fastai dataloader.

Another thing I tried was using Parquet files. However, I am unable to generate an iterator from them.

Any suggestions on how to proceed? Any advice is good advice 🙂
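One general approach is to never hold the whole dataset (or all the predictions) in memory at once: stream rows in fixed-size batches, predict one batch at a time, and write results out as you go. The sketch below shows that pattern in plain Python. predict_batch is a hypothetical placeholder for whatever function wraps your trained fastai model, and fake_predict is a stand-in so the example runs. For Parquet input, pyarrow’s ParquetFile.iter_batches(batch_size=...) yields record batches that could serve as the row source, though how you convert them to text depends on your schema.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def batched(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most batch_size rows without materialising the whole input."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def predict_streaming(rows: Iterable[T], predict_batch: Callable[[List[T]], List[R]],
                      batch_size: int = 10_000) -> Iterator[R]:
    """Call predict_batch on one chunk at a time and yield predictions in input order."""
    for batch in batched(rows, batch_size):
        yield from predict_batch(batch)

# Stand-in for a real model (hypothetical): labels any text containing "good" as positive.
def fake_predict(texts: List[str]) -> List[str]:
    return ["pos" if "good" in t else "neg" for t in texts]

rows = (f"row {i} good" if i % 2 else f"row {i} bad" for i in range(5))
print(list(predict_streaming(rows, fake_predict, batch_size=2)))
# ['neg', 'pos', 'neg', 'pos', 'neg']
```

The demo calls list() only because the input is tiny; with a billion rows you would iterate over predict_streaming and append each prediction to a file instead.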

Have you found anything for text summarization? Thanks.

Has anyone gotten Colab to run the lesson 10 IMDb notebook without crashing due to RAM issues?

I know the solution involves processing the data in batches to reduce how much is loaded into RAM, but I’m not sure how to implement it; I just wanted to play with NLP rather than debug and fix the data loader.
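A minimal sketch of the batching idea: read the CSV a chunk of rows at a time, so only one chunk (plus whatever you derive from it, such as tokens) is in memory before you save it and move on. It uses only the standard csv module; read_csv_in_chunks is a hypothetical helper, and the file name train.csv and the chunk size are assumptions. pandas’ read_csv also accepts a chunksize argument that returns an iterator of DataFrames, which is the more common way to do this if you already use pandas.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, TextIO, Tuple

def read_csv_in_chunks(f: TextIO, chunksize: int,
                       n_lbls: int = 1) -> Iterator[Tuple[List[List[str]], List[str]]]:
    """Yield (labels, texts) for at most `chunksize` rows at a time, so only one chunk sits in RAM."""
    reader = csv.reader(f)  # parses rows lazily from the open file
    while True:
        rows = list(islice(reader, chunksize))
        if not rows:  # end of file
            return
        labels = [row[:n_lbls] for row in rows]
        # Everything after the label columns is treated as text and joined with spaces.
        texts = [" ".join(row[n_lbls:]) for row in rows]
        yield labels, texts

# Demo with an in-memory CSV; for a real file, use open("train.csv", newline="") instead.
demo = io.StringIO("0,great movie\n1,awful film\n0,meh\n")
for labels, texts in read_csv_in_chunks(demo, chunksize=2):
    print(labels, texts)
# [['0'], ['1']] ['great movie', 'awful film']
# [['0']] ['meh']
```

Inside the loop you would tokenize each chunk’s texts and save the result to disk before the next chunk is read.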

I had a similar issue. Basically, the save-and-load functions of fast.ai were somewhat broken, so I was working with text I had loaded several cells earlier. It was loaded using data_lm = TextDataBunch.from_csv(path, 'texts.csv'), which eventually caused my GPU to run out of memory. Changing the code to data_lm = TextLMDataBunch.from_csv(path, 'for_lm.csv') instead seemed to do the trick, and I was able to proceed with the notebook.

I did it by loading the WikiText model, i.e. learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3), and then skipping straight to the language generation cell without running the model training cell. The predictions look very much like Wikipedia text.


I hope this isn’t a double post, but the highlighted link leads to an ‘Object not found’ error on the Stanford website. Does anyone know where it is supposed to lead?

I think this is the link http://web.stanford.edu/class/cs224n/


Can we train this model to do Hindi-English transliteration?

Just to add my 2¢ here: you can think of word2vec as a one-layer network that you use as a feature extractor. If you had multiple layers, it would be better.
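As a rough illustration of “word2vec as a feature extractor”: each word maps to a fixed vector from a single learned lookup table, and a downstream model only ever sees those vectors. The sketch below uses made-up toy vectors (not real word2vec output), and averaging the word vectors into one sentence vector is just one common pooling choice, assumed here for simplicity.

```python
from typing import Dict, List

# Toy "embedding matrix": one fixed vector per word. These numbers are made up
# for illustration, not real word2vec output.
EMBEDDINGS: Dict[str, List[float]] = {
    "good": [1.0, 0.0],
    "movie": [0.0, 1.0],
    "bad": [-1.0, 0.0],
}

def sentence_features(sentence: str, emb: Dict[str, List[float]], dim: int = 2) -> List[float]:
    """Average the vectors of known words: the single lookup layer acts as a frozen feature extractor."""
    vectors = [emb[w] for w in sentence.lower().split() if w in emb]
    if not vectors:
        return [0.0] * dim  # no known words: fall back to a zero vector
    return [sum(column) / len(vectors) for column in zip(*vectors)]

print(sentence_features("Good movie indeed", EMBEDDINGS))  # [0.5, 0.5]
```

Whatever classifier sits on top only gets this one layer of word-level features, which is the limitation the multi-layer point is about.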
