Remote NLP Study Group meets Saturdays at 8 AM PST, starting 12/14/2019

Hi,

I won’t be able to join the group today, but let me share with you folks a great summary of the NLP scene in 2019: https://medium.com/dair-ai/nlp-year-in-review-2019-fb8d523bcb19

The revised and annotated notebook 2b_odds_and_ends_jcat.ipynb that we worked from during the 1/04/2020 meetup is now available in our study group’s git repository.

I guarantee that working through it will help with your understanding of lesson 2!

By the way, if your New Year’s Resolution is to learn NLP, now is a great time to make a start!

Just Do It! Join the NLP Study Group. We are making our way through the lessons at a leisurely pace – there is plenty of time to catch up!

@jcatanza take 2 : sorry for the poorly worded question

I’m looking at lecture 4 (7:24) https://youtu.be/hp2ipC5pW4I and Rachel talks about the PAD special token. I copied the definition below.

PAD (xxpad) is the token used for padding, if we need to regroup several texts of different lengths in a batch

I can’t follow the definition and my googling is not helping to find another explanation. Is there another way to explain what this token means?

Hi @foobar8675 could you please restate your question a bit more clearly? I think I have the gist, but I need clarification. Thanks!

The Fastai NLP Study Group will meet
Saturday January 11, at 8 AM PST, 11 AM EST, 5 PM CET, 9:30 PM IST

Join the Zoom Meeting when it’s time!

Topic: Sentiment Classification with Naïve Bayes

Suggested homework / preparation:

1. Watch NLP video #4

Video playlist is here

2. Read and work through the notebook Sentiment Classification of Movie Reviews (using Naive Bayes, Logistic Regression, and Ngrams) up to but not including the Naïve Bayes section

Course notebooks are available on github

To join via Zoom phone
Dial US: +1 669 900 6833 or +1 646 876 9923
Meeting ID: 832 034 584

The current meetup schedule is here.

Sign up here to receive meetup announcements via email.

see if this explanation by Rachel from video 18 helps. She explained the padding (watch about 15 seconds of it).
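Here is a rough sketch of the padding idea, in case it helps. A batch has to be a rectangular array, so when the texts in it have different lengths, the shorter ones are filled out with the xxpad token until they all match the longest one. The helper name pad_batch and the toy reviews are made up for illustration, and this sketch pads at the end of each text; where the padding goes (start or end) is a choice the library makes.

```python
PAD = "xxpad"  # the padding token from the lecture's definition

def pad_batch(texts, pad_token=PAD):
    """Hypothetical helper: pad every tokenized text to the longest length in the batch."""
    max_len = max(len(t) for t in texts)
    # Append as many pad tokens as each text is short of max_len.
    return [t + [pad_token] * (max_len - len(t)) for t in texts]

# Three toy tokenized reviews of different lengths (made-up data).
batch = [
    ["xxbos", "i", "loved", "this", "movie"],
    ["xxbos", "terrible"],
    ["xxbos", "not", "bad"],
]

padded = pad_batch(batch)
for row in padded:
    print(row)
```

After padding, every row has the same number of tokens, so the batch can be stacked into a single tensor. The model is meant to ignore the xxpad positions.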

That is very helpful. Thanks @wyquek!

I wasn’t sure what an empty row would look like in the Compressed Sparse Row format (https://youtu.be/hp2ipC5pW4I), so I googled it, found this: https://stackoverflow.com/questions/43771387/compressed-sparse-row-csr-how-do-you-store-empty-rows and wrote an example.

so given that, if the first row in Rachel’s example, 22, 23, 25 were all empty, then would the first 4 RowPtrs look like 0,3,3,6?

just thought I’d share - since I wasn’t sure myself.

Brilliant. Using a GPT-2 language model to play chess.
This is why I love AI.

No, you’re not stupid if you were confused by the explanation of CSR (Compressed Sparse Row) representation given by the Emory University website that was discussed in video #4.

The reason is that the authors gave a sloppy and incomplete definition of CSR!!!

So, here is a proper explanation of CSR based on material from this Wikipedia article.

Given a full matrix A with m rows, n columns, and N nonzero values, the CSR (Compressed Sparse Row) representation is stored using three arrays as follows:

  1. Val[0:N] contains the values of the N nonzero elements of A, read in row-major order (left to right, top to bottom).
  2. Col[0:N] contains the column indices in A of those N nonzero elements, in the same order.
  3. RowPointer[0:m+1]: for each row i of A, RowPointer[i] is the index in Val (and Col) where the entries of row i begin, so row i occupies positions RowPointer[i] up to but not including RowPointer[i+1]. If there are no nonzero values in the ith row, then RowPointer[i] = RowPointer[i+1]. By convention, an extra entry RowPointer[m] = N is tacked on at the end.
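To make the three arrays concrete, here is a minimal pure-Python sketch that builds them from a small dense matrix. The function name to_csr and the matrix A below are made up for illustration; a real project would more likely use a sparse-matrix library. Note how the empty row (row 1) shows up: its row pointer is simply equal to the next row's pointer.

```python
def to_csr(dense):
    """Hypothetical helper: convert a list-of-lists matrix to CSR arrays."""
    val, col, row_ptr = [], [], [0]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0:
                val.append(x)   # the nonzero value itself
                col.append(j)   # its column index
        # Once a row is done, record where the next row will start in val/col.
        row_ptr.append(len(val))
    return val, col, row_ptr

# Made-up 4 x 4 example with N = 4 nonzero values; row 1 is entirely zero.
A = [
    [5, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 8, 3, 0],
    [0, 6, 0, 0],
]

val, col, row_ptr = to_csr(A)
print(val)      # [5, 8, 3, 6]
print(col)      # [0, 1, 2, 1]
print(row_ptr)  # [0, 1, 1, 3, 4]
```

The nonzero values of row i are val[row_ptr[i]:row_ptr[i+1]], which is an empty slice for row 1 because row_ptr[1] and row_ptr[2] are both 1.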

Question: How many floats and ints does it take to store the matrix A in CSR format?

I pushed an updated version of 3-logreg-nb-imdb_jcat.ipynb to the NLP Study Group repo. If you missed Saturday’s meetup (and even if you didn’t!), you can catch up with or review the material in video #4 (even if you haven’t watched it yet!) by reading, running, and playing with this notebook, down to but not including Section 8: Naive Bayes.

looks like it starts next week

Sorry I didn’t understand – what starts next week?

getting an error in 3-logreg-nb-imdb with this cell in colab

m = LogisticRegression(C=0.1, dual=True)
m.fit(x, y.items.astype(int))
preds = m.predict(val_term_doc)
(preds==val_y).mean()

first error i got was

ValueError: Solver lbfgs supports only dual=False, got dual=True

so i naively set to False, since per the docs, that sounds like the right thing to do anyways

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

which is when i get a very different error

/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/_logistic.py:940: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
0.655

@jcatanza have you seen this error?

Hi @foobar8675
I obtained good results with the liblinear and newton-cg solvers. The other solvers got poorer results and failed to converge.
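For anyone hitting the same error, here is a minimal sketch of the cell with the solver named explicitly. In scikit-learn the dual formulation is only implemented for the liblinear solver with the l2 penalty, so dual=True needs solver="liblinear". The x and y arrays below are synthetic stand-ins (an assumption for illustration), not the notebook's IMDb term-document matrix, so the accuracy printed here says nothing about the real data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (assumption): 200 "documents" with 20 count-like features.
rng = np.random.RandomState(0)
x = rng.randint(0, 3, size=(200, 20)).astype(float)
y = (x[:, 0] > x[:, 1]).astype(int)  # made-up binary labels

# Same hyperparameters as the notebook cell, but with the solver set explicitly,
# because dual=True is only supported by liblinear (with the default l2 penalty).
m = LogisticRegression(C=0.1, dual=True, solver="liblinear")
m.fit(x, y)
preds = m.predict(x)
print((preds == y).mean())
```

If you would rather stay with the default lbfgs solver, keep dual=False and try raising max_iter, as the ConvergenceWarning itself suggests.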

Hi all. Due to a family emergency, I must cancel today’s (Saturday 1/18) NLP class. The class will resume next week as usual.

The good news is that I have refactored and annotated the 3-logreg-nb-imdb.ipynb notebook and pushed it to github here https://github.com/jcatanza/Fastai-A-Code-First-Introduction-To-Natural-Language-Processing-TWiML-Study-Group/blob/master/3-logreg-nb-imdb_jcat.ipynb

3-logreg-nb-imdb_jcat.ipynb is a self-contained tutorial on Naive Bayes and Logistic Regression applied to the IMDb data. I think you’ll find it useful!

Today’s assignment: please get the 3-logreg-nb-imdb_jcat.ipynb notebook and use the 1.5 class hours to read, run, play with, and learn from it!

Have a great weekend, and I’ll see you next week.

that makes sense. thanks!

The Fastai NLP Study Group will meet
Saturday January 25, at 8 AM PST, 11 AM EST, 5 PM CET, 9:30 PM IST

Join the Zoom Meeting when it’s time!

Topic: Sentiment Classification with Naïve Bayes and Logistic Regression

Suggested homework / preparation:

  1. Watch video #5; Video playlist is here

  2. Read and work through my extensively refactored and annotated version of the 3-logreg-nb-imdb.ipynb notebook

  3. Note: in order to run my version of the notebook you’ll need to fork or clone the study group repository

To join via Zoom phone
Dial US: +1 669 900 6833 or +1 646 876 9923
Meeting ID: 832 034 584

The current meetup schedule is here.

Sign up to receive meetup announcements via email.

You can read (but not run) the notebook for this week’s discussion in nbviewer.