Thank you very much @jeremy and @rachel and @yinterian. Applying the fastai library to a real-life structured dataset problem was extremely useful. The idea of using embeddings instead of simple dummy variables is amazing.
Lesson 3 (Rossmann notebook – structured data sets) alone is worth the course. Besides all that, we have clear explanations on tons of useful topics like image classification, CNNs, convolutions, working on Kaggle competitions, AWS (with a decent amount of credits for free!), etc. In parallel, I started watching the Machine Learning material and discovered another treasure!
While waiting for part2, I will be gaining experience on all this material and applying whatever I can in real projects and Kaggle competitions.
Grateful for this experience, and please count on me to help if you need it.
I have taken many courses online, but @jeremy, your teaching approach and style are different and very practical. At first it feels like: is it really this simple, just run a few lines and you get the world’s best classifier? And actually it is, once you understand the theory, the reasons and the optimizations behind them. I still have to catch up on a lot of topics and then try to solve some real-world problems; by then it will be time for part2…
Just want to say thank you to you and your team for your time in creating this course and offering it to the general public. Hats off to your patience and dedication in this field. There is lots to learn from you, not just ML/DL. Glad I found your course.
My journey begins…
Does anyone recall if we covered how to merge e.g. categorical features with, say, image or textual data into a single model? Looking at p1v1 of this course, I see that in Keras you would use the merge function. I’m wondering how this is done in fastai / PyTorch. I tried to do it here in PredictHappinessDataset, but I don’t know whether that was correct. (That code runs without error, I’m just not convinced it helps the results.)
It seems to me you could also use this sort of mechanism to add features from pretrained models, such as word2vec, etc. You might not want to train them further, but they could be additional features.
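In case it helps the discussion, here is a minimal sketch of one common way to do this in plain PyTorch: embed each categorical column, then concatenate the embeddings with a feature vector from the image side before a small fully connected head. Everything here is an assumption for illustration, not the fastai API and not the code from the PredictHappinessDataset notebook: the class name `MixedInputNet`, the sizes, and the random `img_feats` tensor, which stands in for the output of a pretrained network whose weights you might leave frozen.

```python
import torch
import torch.nn as nn

class MixedInputNet(nn.Module):
    """Hypothetical model: categorical embeddings + precomputed image features."""
    def __init__(self, cat_sizes, emb_dim, n_img_feats, n_out):
        super().__init__()
        # One embedding table per categorical column
        self.embs = nn.ModuleList([nn.Embedding(c, emb_dim) for c in cat_sizes])
        n_in = emb_dim * len(cat_sizes) + n_img_feats
        self.head = nn.Sequential(nn.Linear(n_in, 64), nn.ReLU(), nn.Linear(64, n_out))

    def forward(self, x_cat, img_feats):
        # x_cat: (batch, n_cat_columns) integer codes; img_feats: (batch, n_img_feats)
        embs = [emb(x_cat[:, i]) for i, emb in enumerate(self.embs)]
        # The "merge" step: concatenate everything along the feature dimension
        x = torch.cat(embs + [img_feats], dim=1)
        return self.head(x)

model = MixedInputNet(cat_sizes=[7, 12], emb_dim=4, n_img_feats=512, n_out=2)
x_cat = torch.randint(0, 7, (8, 2))   # 8 rows, 2 categorical columns
img_feats = torch.randn(8, 512)       # stand-in for pretrained CNN features
out = model(x_cat, img_feats)
print(out.shape)
```

The same concatenation would work for text features, e.g. averaged word2vec vectors, passed in place of `img_feats`.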
This is the last batch of “Video Timelines for Part 1 V2”, I hope this will help those of you who like to review specific parts of the lessons.
@Jeremy: I remember that you edited (shortened) the 2016/2017 videos captured in-class before posting them in the public version of the MOOC. As a result, my current timeline posts for Part 1 V2 may become out-of-sync.
Video timelines for Lesson 7
(Updated for the final video version, thanks to @hiromi )
00:03:04 Review of last week’s lesson on RNNs, Part 1, what to expect in Part 2 (start date: 19/03/2018)
00:08:48 Building the RNN model with ‘self.init_hidden(bs)’ and ‘self.h’, the “back prop through time (BPTT)” approach
00:17:50 Creating mini-batches, “split in 64 equal size chunks” not “split in chunks of size 64”, questions on data augmentation and choosing a BPTT size, PyTorch QRNN
00:23:41 Using the data formats for your API, changing your data format vs creating a new dataset class, ‘data.Field()’
00:24:45 How to create Nietzsche training/validation data
00:35:43 Dealing with PyTorch not accepting a “Rank 3 Tensor”, only Rank 2 or 4, ‘F.log_softmax()’
00:44:05 Question on ‘F.tanh()’, tanh activation function, replacing the ‘RNNCell’ by ‘GRUCell’
00:47:15 Intro to GRU cell (RNNCell has gradient explosion problem - i.e. you need to use low learning rate and small BPTT)
00:53:40 Long Short Term Memory (LSTM), ‘LayerOptimizer()’, Cosine Annealing ‘CosAnneal()’
01:01:57 Back to Computer Vision with CIFAR 10 and ‘lesson7-cifar10.ipynb’ notebook, Why study research on CIFAR 10 vs ImageNet vs MNIST ?
01:08:54 Looking at a Fully Connected Model, based on a notebook from student ‘Kerem Turgutlu’, then a CNN model (with Excel demo)
01:21:54 Refactored the model with new class ‘ConvLayer()’ and ‘padding’
01:25:40 Using Batch Normalization (BatchNorm) to make the model more resilient, ‘BnLayer()’ and ‘ConvBnNet()’
01:36:02 Previous bug in ‘Mini net’ in ‘lesson5-movielens.ipynb’, and many questions on BatchNorm, Lesson 7 Cifar10, AI/DL researchers vs practitioners, ‘Yann Lecun’ & ‘Ali Rahimi talk at NIPS 2017’ rigor/rigueur/theory/experiment.
01:52:43 Replace the model with ResNet, class ‘ResnetLayer()’, using ‘boosting’
01:58:38 ‘Bottleneck’ layer with ‘BnLayer()’, ‘ResNet 2’ with ‘Resnet2()’, Skip Connections.
02:02:01 ‘lesson7-CAM.ipynb’ notebook, an intro to Part #2 using ‘Dogs v Cats’.
02:08:55 Class Activation Maps (CAM) of ‘Dogs v Cats’.
02:14:27 Questions to Jeremy: “Your journey into Deep Learning” and “How to keep up with important research for practitioners”, “If you intend to come to Part 2, you are expected to master all the techniques in Part 1”, Jeremy’s advice to master Part 1 and help new students in the incoming MOOC version to be released in January 2018.
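A small aside on the 00:17:50 entry, since the wording is easy to mix up: “split in 64 equal size chunks” means the number of chunks is the batch size, and each chunk is as long as the data allows. Below is a rough plain-Python sketch of that idea using a toy sequence and a batch size of 4 instead of 64. The helper name `batchify` and the toy numbers are my own assumptions, not the fastai implementation.

```python
def batchify(tokens, bs):
    """Split a token sequence into `bs` equal-size contiguous chunks."""
    chunk_len = len(tokens) // bs
    trimmed = tokens[: chunk_len * bs]  # drop the leftover tail so chunks are equal
    return [trimmed[i * chunk_len:(i + 1) * chunk_len] for i in range(bs)]

tokens = list(range(20))       # toy "text" of 20 token ids
chunks = batchify(tokens, 4)   # 4 chunks of length 5, NOT chunks of size 4

# One mini-batch takes the first `bptt` tokens from every chunk in parallel
bptt = 2
first_batch = [c[:bptt] for c in chunks]
print(chunks)
print(first_batch)
```

Each row of a mini-batch then continues where the same row left off in the previous mini-batch, which is what lets the hidden state carry over between batches.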
@EricPB I post the edited video the day after each class and link it from the wiki. It looks like, at least for lesson 7, you’ve used the automatically saved version of the live stream? Did you use that for the other timelines too? If so, as you say, we’ll need to redo them.
I may have made a mistake on this Lesson 7 session indeed.
Right now, with Christmas time arriving and family & friends coming over to Stockholm, timing is pretty tight.
What is the deadline to fix those timelines before open/public release ?
That would be great! If you edit the original wiki post you can see the markdown format of the timeline - so the idea would be to replicate that, but with the corrected times using the posted lesson video.
Just to confirm, nn.Parameter(torch.zeros(nf,1,1)) in BnLayer (Lesson 7 CIFAR-10 nb) can also be replaced by Variable(torch.zeros(nf,1,1), requires_grad = True). They will have the same functionality. Right?
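One difference that is easy to check: an `nn.Parameter` assigned as a module attribute is registered with the module, so it shows up in `parameters()` (which is what you normally hand to the optimizer), while a tensor that merely has `requires_grad=True` is not registered. The sketch below is a stripped-down stand-in, not the actual `BnLayer` from the notebook. The two class names are hypothetical, and I use a plain tensor with `requires_grad=True`, which in recent PyTorch versions plays the role the `Variable` wrapper used to.

```python
import torch
import torch.nn as nn

class WithParameter(nn.Module):
    def __init__(self, nf):
        super().__init__()
        # nn.Parameter attributes are registered on the module automatically
        self.a = nn.Parameter(torch.zeros(nf, 1, 1))

class WithPlainTensor(nn.Module):
    def __init__(self, nf):
        super().__init__()
        # Tracks gradients, but the module does not know about it
        self.a = torch.zeros(nf, 1, 1, requires_grad=True)

n_registered = len(list(WithParameter(4).parameters()))
n_plain = len(list(WithPlainTensor(4).parameters()))
print(n_registered, n_plain)
```

So with the second version, an optimizer built from `model.parameters()` would never update `a`, and it would not appear in the module's `state_dict()` either.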
Are RNNs only used in NLP, or in other areas of research as well?
Please let me know if I am right or wrong. I am thinking of an RNN as a model for predicting over a sequence of events.
For instance, we could predict the sequence of moves Rafael Nadal will make in a live match, given his first 3 moves (for a single rally). Or we could predict the trajectory followed by a taxi given only its first 4-5 coordinates (like in the taxi trajectory competition).
Is it right to think of these examples? If not, what other uses of RNNs can there be?
So, I tried to search for an answer to my own question. This post turned out to be helpful.
I think it is better to think of an RNN as a model for an unknown/varying number of inputs and outputs (rather than just a sequence of events).
Summary of a few use cases –
• 1 image as input --> varying-length English sentence describing the image as output (1 to many mapping)
• 1 video as input --> varying-length English sentences describing different frames of the video as output (many to many)
• 1 varying-length sentence as input --> 1 output telling +ve or -ve sentiment (many to 1)
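To make the last case (many to 1) concrete, here is a minimal PyTorch sketch, assuming a toy vocabulary of 100 token ids and untrained random weights. The class name `ManyToOne` and all sizes are made up for illustration. The point is only that the same model accepts sentences of different lengths and always returns one fixed-size output, because it reads out the final hidden state.

```python
import torch
import torch.nn as nn

class ManyToOne(nn.Module):
    """Hypothetical sentiment-style model: token sequence in, one score vector out."""
    def __init__(self, vocab_size, emb_dim, hidden, n_classes):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, tokens):
        # tokens: (batch, seq_len); h: (num_layers, batch, hidden)
        _, h = self.rnn(self.emb(tokens))
        # Use only the last hidden state, whatever the sequence length was
        return self.out(h[-1])

model = ManyToOne(vocab_size=100, emb_dim=16, hidden=32, n_classes=2)
short_out = model(torch.randint(0, 100, (1, 5)))    # 5-token sentence
long_out = model(torch.randint(0, 100, (1, 40)))    # 40-token sentence
print(short_out.shape, long_out.shape)
```

The 1-to-many and many-to-many cases follow the same pattern, except that an output is read at every step instead of only at the end.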