Or, if you want to be on the bleeding edge, there are instructions for setting it up so that you can git pull each time you want to update. Please check the course repo for those instructions. Or, if you don't want the bleeding edge, you can keep the conda install.
I have both installations in two different conda envs, and I do an update every day when I wake up. (It might be a better idea to automate it.)
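The daily update could be automated with a small script. Here is a minimal sketch, assuming an editable (`pip install -e`) git clone of fastai; the path and setup are my assumptions, so check the course repo for the official instructions:

```python
import subprocess
from pathlib import Path

def update_repo(repo_dir):
    """Run `git pull` inside a cloned repo (e.g. an editable fastai install).

    Returns (succeeded, combined output). Could be run from cron or a
    login script to keep the bleeding-edge install fresh.
    """
    result = subprocess.run(
        ["git", "pull"],
        cwd=str(Path(repo_dir)),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr
```

With an editable install, a successful pull updates the importable package in place, so no reinstall step is needed afterwards.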
Since I'm running the latest and greatest PyTorch (1.0.1), the code didn't work right away; I had to do some minor porting. I have only two GPUs (he used 8), so I had to make a few minor changes to the multi-GPU attention code, but eventually got it working fairly well. I was curious how this would compare to the RNN-with-attention model from 2018 Part 2 Lesson 11, so I got that notebook running again (it required only one minor change, but uses fastai 0.7). I was about to port the Alexander Rush notebook when I decided to first check whether Jeremy planned to cover this in an upcoming lesson, and sure enough found the translation-transformer notebook in the course notebooks.
I know Jeremy doesn't like to get ahead and that we will be covering this in class, but for right now, all I need to move forward are the CSV question files, which will be in the giga_fren.tar file when it's released. If anyone knows the correct format for those files, I'd be grateful.
If anybody wants the updated Alexander Rush notebook, let me know how I can post it and it's yours.
I use the slip-box method. You can learn about it in How to Take Smart Notes by Sönke Ahrens. Basically, during class, I shut off my laptop and write manual notes in my own words in a notebook. Then I transfer those notes into a permanent store (I write to a synced Dropbox volume using iA Writer and a simple indexing system). Later, I'll curate and link those notes. I may watch the video a few times until I can express the ideas in a context-free, clear way. This is how I know I have a deep grasp of the material.
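One simple indexing system for permanent notes is a timestamp id in each filename. A minimal sketch of that idea in Python; the id scheme and folder layout are illustrative guesses, not the actual iA Writer setup described above:

```python
from datetime import datetime
from pathlib import Path

def new_note(title, body, notes_dir="notes"):
    """Write a permanent note as a Markdown file with a timestamp id.

    The id (YYYYMMDDHHMMSS) sorts chronologically and gives each note a
    stable address that other notes can link to, slip-box style.
    """
    note_id = datetime.now().strftime("%Y%m%d%H%M%S")
    folder = Path(notes_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{note_id} {title}.md"
    path.write_text(f"# {title}\n\n{body}\n")
    return path
```

Pointing `notes_dir` at a synced Dropbox folder would reproduce the "permanent store" part of the workflow.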
I also work a lot with Jupyter notebooks: I'll write my own notebooks and apply the same code with small tweaks. This is advanced-beginner work in the Dreyfus model; Andy Hunt talks about this in Pragmatic Thinking and Learning. Basically, a novice can read tutorials and attend classes. An advanced beginner can tweak those ideas a little while they experiment. Someone competent can see the whole picture and decide which steps to take, finishing their own projects with errors and support. A proficient person has done this enough times to fix errors as they make them. Finally, people seek advice from someone who has become an expert. Learning to be OK with not knowing, and working on a practice when I feel out of my depth, is the important thing. It's a lot like playing the whole game, as Jeremy encourages us to do.
This course is a bit hard because Jeremy can see things we need to know before we realize we need to know them, so we don't quite follow naturally into the next step; we just trust that this makes the code more beautiful, more performant, or more flexible for things I didn't know I was going to need. What I've done with my own notebooks is explore around the areas where I'm least confident, until I can repeat cold, to myself or in my notes, why something is interesting and how it fits into a larger context.
One final mode that's worked for me is to organize all my notes into a structured summary that has:
Citation/Title: just the course number, or a citation for an article I'm reading related to the course.
Terms and Phrases: I write my own definitions.
Questions and Discussion: these are my questions, the main points distilled.
Relation of Material: how this material complements or contradicts other things I've worked on.
Application: ways I could use this information.
General Response: what I think about the lesson, chapter, or article.
I write these final notes from my permanent notes after I've had plenty of time to practice with the code and work on each idea context-free. The value is in the writing itself, because writing (code or notes) is a form of thinking that reinforces my ability to make decisions, see patterns, and retain a useful grasp of the material. It's an iterative process; I don't rush any of it. Sometimes an article can be summarized the same afternoon I read it, but something as rich and open as these lessons takes me longer to appreciate and internalize.
It can be very useful to try doing things a different way from what I show, if you think there's another way that might be easier. Then either you find that, later on, yes, there is now a problem, and you can really see why I did it that way. Or… you find my way was kinda stupid, and you can let me know and I can learn something new!
I have a similar approach to notes. Handwritten notes are very important for me for organizing my ideas, and I use different colors to break them up. My categories are:
HW
Practice
Sharing
MISC
Summary: at the end
Here's an example of my table of contents for week 1.
I know this notebook is not done yet and probably will be changed before being presented in class. So here are a couple of notes:
In the "PreTrained embeddings" (to install fastText) cell: you might want to encourage the use of the pip install git+ form, as in pip install git+https://github.com/facebookresearch/fastText.git
Also, to run the notebook I had to download the latest and greatest fastai (1.0.51.dev0) (otherwise the model= Seq2SeqARNN cell gave an error), and that version does not include the get_bleu() function or method. This might be a TBD item.
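In the meantime, a rough stand-in for the missing BLEU metric can be written with the standard library alone. This is my own sketch of sentence-level BLEU (uniform n-gram weights, brevity penalty, crude smoothing), not fastai's get_bleu() implementation:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU between two token lists.

    Uses modified n-gram precision up to max_n with uniform weights,
    a brevity penalty, and a tiny floor to smooth zero counts.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        overlap = sum((cand & ref).values())   # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Identical sentences score 1.0, and the score drops sharply as n-gram overlap disappears; a real implementation would average over a corpus rather than single sentences.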
EDIT: I'm going to create a new topic on the fastai users forum about this now that I've dug up a bit more about it.
I could use some help figuring out why my model is training so slowly.
I am comparing a vast.ai RTX 2080 Ti instance to my P5000 Paperspace machine.
The main reason I need to switch from Paperspace is that my model needs 64 GB of RAM to train. I have only 30 GB on Paperspace, so half of my data is loaded from disk, which is slow.
I would expect the vast.ai training to be a lot faster (better GPU, more RAM), and indeed the GPU seems to run through the batches much faster. However, there is a 20-second delay at the beginning of each epoch, and another delay right before we go through the validation data at the end of each epoch. I see a similar delay on Paperspace, but it is about half as long.
Within the fit function, we get to this part of the code:
for xb, yb in progress_bar(learn.data.train_dl, parent=pbar):
    xb, yb = cb_handler.on_batch_begin(xb, yb)
    loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
    if cb_handler.on_batch_end(loss): break
The first iteration of this loop takes forever to start (~20 seconds), whereas the next iterations are much faster. So it seems that something is happening with the dataloader the first time we use it.
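A likely culprit (an assumption on my part, not a confirmed diagnosis) is dataloader worker startup: with num_workers > 0, PyTorch spawns worker processes when iteration begins each epoch, which shows up as one very slow first batch. A small timing harness can confirm whether the cost is concentrated there:

```python
import time

def time_first_batches(dl, n=3):
    """Time fetching the first n batches from an iterable dataloader.

    If the first number dwarfs the rest, the per-epoch delay is paid
    once at iterator startup (e.g. worker process spawning) rather
    than spread across batches.
    """
    it = iter(dl)
    times = []
    for _ in range(n):
        t0 = time.perf_counter()
        next(it)
        times.append(time.perf_counter() - t0)
    return times
```

Running something like `time_first_batches(learn.data.train_dl)` on both machines would show whether the vast.ai delay is all in batch 0; if so, lowering num_workers, or caching/preprocessing the dataset so workers start faster, might shrink it.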
I've hesitated to organize a study group since I live remotely and fly in for class every week. Are there other remote people taking this course, or groups amenable to remote participation? I'm working on a question-answering NLP system, if that supports/inspires next steps.