Hi,
I won't make it today to join the group, but let me share with you folks a great summary of the NLP scene in 2019: https://medium.com/dair-ai/nlp-year-in-review-2019-fb8d523bcb19
The revised and annotated notebook 2b_odds_and_ends_jcat.ipynb that we worked from during the 1/04/2020 meetup is now available in our study group's git repository.
I guarantee that working through it will help with your understanding of lesson 2!
By the way, if your New Year's Resolution is to learn NLP, now is a great time to make a start!
Just Do It! Join the NLP Study Group. We are making our way through the lessons at a leisurely pace, so there is plenty of time to catch up!
@jcatanza take 2 : sorry for the poorly worded question
I'm looking at lecture 4 (7:24) https://youtu.be/hp2ipC5pW4I and Rachel talks about the PAD special token. I copied the definition below.
PAD (xxpad) is the token used for padding, if we need to regroup several texts of different lengths in a batch
I can't follow the definition and my googling is not helping to find another explanation. Is there another way to explain what this token means?
Hi @foobar8675 could you please restate your question a bit more clearly? I think I have the gist, but I need clarification. Thanks!
The Fastai NLP Study Group will meet
Saturday January 11, at 8 AM PST, 11 AM EST, 5 PM CET, 9:30 PM IST
Join the Zoom Meeting when it's time!
Topic: Sentiment Classification with Naïve Bayes
Suggested homework / preparation:
1. Watch NLP video #4
Video playlist is here
2. Read and work through the notebook Sentiment Classification of Movie Reviews (using Naive Bayes, Logistic Regression, and Ngrams) up to but not including the Naïve Bayes section
Course notebooks are available on github
To join via Zoom phone
Dial US: +1 669 900 6833
or +1 646 876 9923
Meeting ID: 832 034 584
The current meetup schedule is here.
Sign up here to receive meetup announcements via email.
see if this explanation by Rachel from video 18 helps. She explained the padding (watch about 15 seconds of it).
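In case it helps to see it in code: texts in a batch have different lengths, but a batch has to be a rectangle, so the shorter texts get filled up with the xxpad token until they all match the longest one. Here is a minimal sketch in plain Python. The function name pad_batch and the sample reviews are made up for illustration, and padding at the end is an assumption (a library may pad at the front instead).

```python
PAD = "xxpad"  # the padding token from the definition quoted above


def pad_batch(texts, pad_token=PAD):
    """Pad every tokenized text to the length of the longest one (hypothetical helper)."""
    max_len = max(len(t) for t in texts)
    # Append as many pad tokens as each text is short of max_len.
    return [t + [pad_token] * (max_len - len(t)) for t in texts]


# Three made-up tokenized reviews of different lengths.
batch = pad_batch([
    ["xxbos", "i", "loved", "this", "movie"],
    ["xxbos", "terrible"],
    ["xxbos", "not", "bad"],
])

for row in batch:
    print(row)
```

After padding, every row has the same number of tokens, so the batch can be stacked into a single tensor. The model is expected to learn that xxpad carries no meaning.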
I wasn't sure what an empty row would look like in Compressed Sparse Row format (https://youtu.be/hp2ipC5pW4I), so I googled it, found this: https://stackoverflow.com/questions/43771387/compressed-sparse-row-csr-how-do-you-store-empty-rows and wrote an example.
So given that, if the first row in Rachel's example, 22, 23, 25, were all empty, would the first 4 RowPtrs then look like 0,3,3,6?
Just thought I'd share, since I wasn't sure myself.
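One way to check how an empty row gets stored is to let SciPy build the CSR arrays and look at them. This is a small sketch with a made-up matrix (not the one from the video); scipy.sparse.csr_matrix exposes the three arrays as data, indices, and indptr.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Made-up 4x3 matrix in which rows 0 and 2 are completely empty.
dense = np.array([
    [0, 0, 0],
    [1, 0, 2],
    [0, 0, 0],
    [0, 3, 0],
])

A = csr_matrix(dense)

print("values:      ", A.data)     # the nonzero values, row by row
print("column idx:  ", A.indices)  # the column of each nonzero value
print("row pointers:", A.indptr)   # where each row starts in the two arrays above
```

An empty row shows up as a repeated entry in the row pointers: its start and end positions are the same, so it covers no values.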
No, you're not stupid if you were confused by the explanation of CSR (Compressed Sparse Row) representation given by the Emory University website that was discussed in video #4.
The reason is that the authors gave a sloppy and incomplete definition of CSR!!!
So, here is a proper explanation of CSR based on material from this Wikipedia article.
Given a full matrix A with m rows, n columns, and N nonzero values, the CSR (Compressed Sparse Row) representation is stored using three arrays as follows:
- Val[0:N] contains the values of the N nonzero elements of A, stored row by row.
- Col[0:N] contains the column indices in A of the N nonzero elements.
- RowPointer[0:m+1]: for each row i of A, RowPointer[i] contains the index in Val (and Col) of the first nonzero value in row i. If there are no nonzero values in the ith row, then RowPointer[i] = RowPointer[i+1], so the row covers an empty range. And, by convention, an extra entry RowPointer[m] = N is tacked on at the end.
Question: How many floats and ints does it take to store the matrix A in CSR format?
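To make the three arrays concrete, here is a minimal sketch that builds them by hand in plain Python. The function name to_csr and the example matrix are made up for illustration, and it assumes the Wikipedia convention in which an empty row i has RowPointer[i] equal to RowPointer[i+1].

```python
def to_csr(A):
    """Convert a dense matrix (list of rows) into the three CSR arrays (hypothetical helper)."""
    val, col, row_pointer = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                val.append(a)   # the nonzero value
                col.append(j)   # its column index
        # The next row starts where this one ended; an empty row repeats the entry.
        row_pointer.append(len(val))
    return val, col, row_pointer


# Made-up matrix with m = 3 rows, n = 4 columns, N = 4 nonzeros; row 1 is empty.
A = [
    [5.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 3.0, 0.0],
]

val, col, row_pointer = to_csr(A)
print(val)          # [5.0, 1.0, 2.0, 3.0]
print(col)          # [0, 3, 1, 2]
print(row_pointer)  # [0, 2, 2, 4]

n_floats = len(val)                  # N
n_ints = len(col) + len(row_pointer) # N + (m + 1)
print(n_floats, n_ints)              # 4 8
```

Counting the array lengths gives the storage cost: N floats for Val, plus N ints for Col and m+1 ints for RowPointer.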
I pushed an updated version of 3-logreg-nb-imdb_jcat.ipynb to the NLP Study Group repo. If you missed Saturday's meetup (and even if you didn't!), you can catch up with/review the material in video #4 (even if you haven't watched it yet!) by reading, running, and playing with this notebook, down to but not including Section 8: Naive Bayes.
looks like it starts next week
Sorry, I didn't understand. What starts next week?
getting an error in 3-logreg-nb-imdb
with this cell in colab
m = LogisticRegression(C=0.1, dual=True)
m.fit(x, y.items.astype(int))
preds = m.predict(val_term_doc)
(preds==val_y).mean()
first error i got was
ValueError: Solver lbfgs supports only dual=False, got dual=True
so i naively set to False, since per the docs, that sounds like the right thing to do anyways
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
which is when i get a very different error
/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/_logistic.py:940: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
0.655
@jcatanza have you seen this error?
Hi @foobar8675
I obtained good results with the liblinear and newton-cg solvers. The other solvers got poorer results and failed to converge.
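For anyone who wants to try this, here is a minimal sketch of picking the solver explicitly. Recent scikit-learn versions default to lbfgs, which only supports dual=False; dual=True is only implemented for the liblinear solver with an L2 penalty, so the original cell can be kept by adding solver="liblinear". The data below is synthetic and made up purely so the snippet runs on its own; it is not the IMDb term-document matrix from the notebook.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a sparse term-document matrix (hypothetical data).
rng = np.random.default_rng(0)
x = csr_matrix(rng.integers(0, 3, size=(200, 50)))
w = rng.normal(size=50)
scores = x @ w
y = (scores > np.median(scores)).astype(int)

# Keep dual=True by asking for the liblinear solver explicitly.
m_dual = LogisticRegression(C=0.1, dual=True, solver="liblinear")
m_dual.fit(x, y)

# newton-cg works on the primal problem, so dual must be False.
m_primal = LogisticRegression(C=0.1, dual=False, solver="newton-cg")
m_primal.fit(x, y)

acc_dual = (m_dual.predict(x) == y).mean()
acc_primal = (m_primal.predict(x) == y).mean()
print(acc_dual, acc_primal)
```

In the notebook, the same solver argument would go into the cell that builds m, with x, y, and val_term_doc left as they are.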
Hi all. Due to a family emergency, I must cancel today's (Saturday 1/18) NLP class. The class will resume next week as usual.
The good news is that I have refactored and annotated the 3-logreg-nb-imdb.ipynb notebook and pushed it to github here: https://github.com/jcatanza/Fastai-A-Code-First-Introduction-To-Natural-Language-Processing-TWiML-Study-Group/blob/master/3-logreg-nb-imdb_jcat.ipynb
3-logreg-nb-imdb_jcat.ipynb is a self-contained tutorial on Naive Bayes and Logistic Regression applied to the IMDb data. I think you'll find it useful!
Today's assignment: please get the 3-logreg-nb-imdb_jcat.ipynb notebook and use the 1.5 class hours to read, run, play with, and learn from it!
Have a great weekend, and I'll see you next week.
that makes sense. thanks!
The Fastai NLP Study Group will meet
Saturday January 25, at 8 AM PST, 11 AM EST, 5 PM CET, 9:30 PM IST
Join the Zoom Meeting when it's time!
Topic: Sentiment Classification with Naïve Bayes and Logistic Regression
Suggested homework / preparation:
- Read and work through my extensively refactored and annotated version of the 3-logreg-nb-imdb.ipynb notebook
- Note: in order to run my version of the notebook you'll need to fork or clone the study group repository
To join via Zoom phone
Dial US: +1 669 900 6833
or +1 646 876 9923
Meeting ID: 832 034 584
The current meetup schedule is here.
Sign up to receive meetup announcements via email.