Chit Chat Thread

How does "ensure your fastai library is up to date" work?

If I do

conda update -c fastai fastai

it returns

# All requested packages already installed.

I understand from fastai docs that I should have used

conda install -c fastai fastai

Which is the correct update behaviour?

I do

conda install -c pytorch -c fastai fastai

to update everything :slight_smile:
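
For what it's worth, after updating I usually sanity-check what the env actually picked up from inside Python (a quick check, nothing fastai-specific beyond the version attribute):

import fastai
print(fastai.__version__)  # e.g. 1.0.50; confirms the update actually landed in this env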

2 Likes

Or if you want to be at the bleeding edge, there are instructions to set it up such that you can git pull each time you want to update.
Please check the course repo for the instructions to set it up.

Or if you don't want the bleeding edge, you keep it conda installed.
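
And if you're ever unsure which of the two installs a notebook is actually importing, a quick check from Python tells you (the path resolves into the conda env for the conda install, or into your clone for the dev setup):

import fastai
print(fastai.__file__)  # shows whether the import comes from the conda env or the cloned repo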

I have both installations in two different conda envs, and I do an update every day when I wake up. (It might be a better idea to automate it :thinking:)

Ah sorry yes! I cloned the repo locally, then added it to my Google Drive, and opened it from there.

1 Like

Got it. That makes sense. Thank you!

1 Like

SeqToSeq with transformers. I've been watching the YouTube online version of the Stanford CS224N: NLP with Deep Learning | Winter 2019 course and was curious about a PyTorch model they mentioned: The Annotated Transformer by Alexander Rush at the Harvard NLP group. That PyTorch model is an annotated Jupyter notebook implementing the Attention Is All You Need paper by a team from Google Brain.

Since I'm running the latest and greatest PyTorch (1.0.1), the code didn't work right away and I had to do some minor porting. I have only two GPUs (he used 8), so I had to make a few minor changes to the multi-GPU attention code, but eventually got it working fairly well. I was curious how this would compare to the RNN-with-attention model from 2018 Part 2 Lesson 11, so I got that notebook running again (it required only one minor change, but uses fastai 0.7). I was about to port the Alexander Rush notebook when I decided to first check whether Jeremy planned to cover this in an upcoming lesson, and sure enough found the translation-transformer notebook in the course notebooks.
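
For anyone who hasn't read the paper yet, the core of it is scaled dot-product attention; here's a minimal PyTorch sketch of just that piece (my own names and shapes, not the notebook's code):

import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # query/key similarity
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))  # hide padding / future positions
    return torch.softmax(scores, dim=-1) @ v  # attention-weighted sum of the values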

I know Jeremy doesn't like to get ahead and that we will be covering this in class, but for right now, all I need to move forward are the csv question files, which will be in the giga_fren.tar file when it's released. If anyone knows the correct format for those files, I'd be grateful.

If anybody wants the updated Alexander Rush notebook, let me know how I can post it and it's yours.

1 Like

I use the Slip Box method. You can learn about it in How to Take Smart Notes by Sönke Ahrens. Basically, during class, I shut off my laptop and write manual notes in my own words in a notebook. Then I transfer those notes into a permanent store (I write to a synced Dropbox volume using iA Writer and a simple indexing system). Later, I'll curate and link those notes. I may watch the video a few times until I can express the ideas in a context-free, clear way. This is how I know I have a deep grasp of the conversation.

I also work a lot with Jupyter notebooks. I'll write my own notebooks, applying the same code with small tweaks. This is advanced-beginner work in the Dreyfus model. Andy Hunt talks about this in Pragmatic Thinking and Learning. Basically, a novice can read tutorials and attend classes. An advanced beginner can tweak those ideas a little while they experiment. Someone competent can see the whole picture and decide which steps to take, finishing their own projects with errors and support. A proficient person has done this enough times to fix errors as they make them. Finally, people seek advice from someone who has become an expert. Learning to be OK with not knowing, and working on a practice when I feel out of my depth, is the important thing. It's a lot like playing the whole game, as Jeremy encourages us to do.

This course is a bit hard because Jeremy can see things we need to know before we realize we need to know them, so we don't quite follow naturally into the next step; we just trust that this makes the code more beautiful, or more performant, or more flexible for things I didn't know I was going to need. What I've done with my own notebooks is explore around the areas where I'm least confident, until I can repeat cold, to myself or in my notes, why something is interesting and how it fits into a larger context.

One final mode thatā€™s worked for me is to organize all my notes into a structured summary that has:

  • Citation/Title: just the course number, or a citation for an article I'm reading related to the course.
  • Terms and Phrases: I write my own definitions.
  • Questions and Discussion: these are my questions, the main points distilled.
  • Relation of Material: how this material complements or contradicts other things I've worked on.
  • Application: ways I could use this information.
  • General Response: what I think about the lesson, chapter, or article.

I write these final notes from my permanent notes after I've had plenty of time to practice with the code and work on each idea context-free. The value is in the writing itself, because writing (code or notes) is a form of thinking that reinforces my ability to make decisions, see patterns, and retain a useful grasp on the material. It's an iterative process. I don't rush any of it. Sometimes an article can be summarized the same afternoon I read it, but something as rich and open as these lessons takes me longer to appreciate and internalize.

6 Likes

It can be very useful to try doing things a different way to what I show, if you think there's another way that might be easier - and then you either find that later on, yes, there is now a problem and you can really see why I did it that way. Or… you find my way was kinda stupid, and you can let me know and I can learn something new! :slight_smile:

1 Like

All you need to download and prepare the questions is in the translation notebook.

2 Likes

I have a similar approach to notes. Handwritten notes are very important for me to organize my ideas, and I use different colors to break them up. My categories are:

  • HW
  • Practice
  • Sharing
  • MISC
  • Summary: at the end

Here is an example of my table of contents for week 1.

I use lots of Jupyter notebooks to steal information, move items over, and test ideas. I also use lots of OneNote to track links and items I need to go back to.

2 Likes

Thanks Sylvain!

I know this notebook is not done yet and probably will be changed before being presented in class. So here are a couple of notes:

In the "PreTrained embeddings" cell, to install fastText, you might want to encourage the use of the pip install git+ form, as:
pip install git+https://github.com/facebookresearch/fastText.git
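
(After that install, a quick sanity check that the bindings load; the .bin path below is just a placeholder for whatever pretrained vectors the notebook downloads:)

try:
    import fasttext              # newer builds expose a lowercase module name
except ImportError:
    import fastText as fasttext  # older builds from the same repo import as fastText

ft = fasttext.load_model('cc.fr.300.bin')   # placeholder path to downloaded vectors
print(ft.get_word_vector('bonjour').shape)  # (300,)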

Also, to run the nb I had to download the latest and greatest fastai (1.0.51.dev0), otherwise the model = Seq2SeqARNN cell gave an error, and that version does not include the get_bleu() function or method. This might be a TBD item.
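
In the meantime, a stopgap that works for eyeballing results is NLTK's corpus_bleu, assuming the predictions and targets are lists of token lists (the variable names here are mine):

from nltk.translate.bleu_score import corpus_bleu

preds = [['le', 'chat', 'est', 'noir']]    # one tokenised prediction per sentence
targs = [[['le', 'chat', 'est', 'noir']]]  # each target wrapped in a list (multiple references allowed)
print(corpus_bleu(targs, preds))           # references come first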

The NB ran well, thanks!

#PerthMLGroup … it's pretty active

EDIT: I'm going to create a new topic on the fastai users forum about this now that I've dug up a bit more about it.

I could use some help figuring out why my model is training so slowly.

I am comparing a vast.ai RTX 2080 Ti instance to my P5000 Paperspace machine.

The main reason I need to switch from Paperspace is that my model needs 64 GB of RAM to train. I only have 30 GB on Paperspace, so half of my data is loaded from disk, which is slow.

I would expect the vast.ai training to be a lot faster (better GPU, more RAM), and indeed the GPU seems to run through the batches much faster. However, there is a 20-second delay at the beginning of each epoch, and another delay right before we go through the validation data at the end of each epoch.

I have a similar delay on Paperspace, but about half as long.

What could be the bottleneck here? CPU speed?

Thanks

To add to my post above:

Within the fit function, we get to this part of the code:

for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
    xb, yb = cb_handler.on_batch_begin(xb, yb)
    loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
    if cb_handler.on_batch_end(loss): break

The first iteration of this loop takes forever to start (~20 seconds), whereas the next iterations are much faster. So it seems that something is happening with the dataloader the first time we use it.
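
My hunch is that it's the DataLoader spinning up its worker processes and fetching the first batch. A quick way to confirm, assuming the same learn object (names as in the code above):

import time

it = iter(learn.data.train_dl)
t0 = time.time(); xb, yb = next(it)
print(f'first batch:  {time.time() - t0:.1f}s')   # includes worker startup + first fetch
t0 = time.time(); xb, yb = next(it)
print(f'second batch: {time.time() - t0:.3f}s')   # steady-state fetch time

If the first fetch dominates, the usual levers are num_workers on the DataBunch and how much preprocessing each item needs.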

Pre-training BERT in 76 minutes: https://arxiv.org/abs/1904.00962 :grinning:

… on a pod of TPUs :neutral_face:

1 Like

I've hesitated to organize a study group since I live remotely and fly in for class every week. Are there other remote people taking this course, or groups amenable to remote participation? I'm working on a question-and-answer NLP system, if that supports/inspires next steps.

Is lesson 10 done?

About to begin in 25 minutes.

@drichards, a question-and-answer NLP system is an area I'm also working on. We could discuss this further.