A slightly more detailed note for Live Coding 16 (built from the video timeline by @fmussari)
00:00 - Start
01:04 - About weighting (WeightedDL)
01:50 - Nick has been applying all the techniques learnt from chapter 2 to the Paddy competition. Jeremy has not practised curriculum learning himself.
03:08 - Distribution of the test set vs the training set. Why don't we want a balanced dataset? Because we want our training set to resemble the test set. When is it appropriate to use WeightedDL? When the distribution of the training set differs from that of the test set.
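A minimal sketch of the weighted-sampling idea behind WeightedDL, using only the Python standard library rather than fastai's actual API (the toy labels here are invented): rare-class samples get proportionally larger weights, so each class is drawn about equally often.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical toy dataset: class "a" is rare, class "b" is common.
labels = ["a"] * 10 + ["b"] * 90

# Weight each sample by the inverse of its class frequency, so the
# rare class is oversampled toward balance.
counts = Counter(labels)
weights = [1.0 / counts[y] for y in labels]

# Draw one epoch's worth of indices with replacement, weighted.
epoch = random.choices(range(len(labels)), weights=weights, k=1000)
drawn = Counter(labels[i] for i in epoch)
print(drawn)  # roughly balanced between "a" and "b"
```

As the note says, this only makes sense when you deliberately want the training distribution to differ from the raw data, e.g. to match the test set.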
03:35 - Is curriculum learning related to boosting? What is boosting, and what is its pitfall if you are not careful? What is curriculum learning? Sampling more often from the subset of data on which the model did poorly.
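The curriculum-learning idea described above can be sketched as sampling in proportion to each example's previous loss. The per-sample losses below are made-up numbers, and fastai has no single built-in callback for this; it is just the principle.

```python
import random

random.seed(1)

# Hypothetical per-sample losses from the previous epoch; a higher
# loss means the model did poorly on that sample.
losses = [0.1, 0.2, 2.5, 3.0, 0.15, 2.8]

# Sample the next epoch in proportion to loss, so hard samples are
# seen more often.
picks = random.choices(range(len(losses)), weights=losses, k=600)
hard = sum(1 for i in picks if losses[i] > 1.0)
print(hard / len(picks))  # the vast majority of draws are high-loss samples
```

The boosting pitfall mentioned in the session applies here too: if some of those "hard" samples are actually mislabelled, upweighting them makes the model fit the noise harder.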
04:38 - Are the labels ever wrong, by accident or through real-life complexity? Read the techniques in chapter 2 and experiment with them; that should be enough for this case.
06:40 - Image annotation issues: Paddy Kaggle discussion
Don't knock out hard samples, and don't knock out wrongly labelled samples either, since the test set may be consistently wrongly labelled too. Again, review chapter 2.
08:23 - UNIFESP X-ray Body Part Classifier Competition
10:20 - Medical images / DICOM images
What are the difficulties of working with this type of medical image?
10:57 - fastai for medical imaging
There is a small sublibrary, fastai.medical.imaging, which can handle DICOM files directly.
11:40 - JPEG 2000 compression; a fastai medical imaging tutorial is available
12:40 - ConvNet paper and Sylvain's AdamW blog post
13:50 - On the research field
15:30 - When is a paper worth reading?
A paper from a Kaggle competition, or one with good experimental results achieved with less data or less time.
Papers on transfer learning, and papers by people whose work you have read and liked before, and by their colleagues.
17:14 - Quoc V. Le
17:50 - What to do when your model is trained on a dataset that is not quite the same as the data it sees during deployment? Try to capture data during deployment, because that is the real data you want to train your model on. Also use semi-supervised learning and transfer learning to get the most out of the data you collect during deployment.
20:30 - What would you do when some of the data has been updated or changed to some extent, e.g. a new piece of medical equipment is producing new images for your dataset? Use fine-tuning; it won't take much time or data to fine-tune your model. Training on the entire dataset for longer won't solve this problem.
21:33 - What if you don't have enough data for some category? Don't use the model for that category. Use a binary sigmoid as the last layer instead of softmax, and have a human review in the loop.
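A toy illustration of why a binary sigmoid head helps here (the logits are invented): softmax outputs are forced to sum to 1, so some class always "wins" even when nothing matches, whereas independent sigmoids can all stay low and be thresholded to trigger human review.

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Toy logits for three classes, all weakly negative: the model is
# not confident about any of them.
logits = [-2.0, -2.1, -1.9]

# Softmax sums to 1, so one class is picked anyway.
probs = softmax(logits)
print(max(probs))  # ~0.37: looks like a (weak) prediction regardless

# Independent sigmoids can all stay low, which allows answering
# "none of the above" with a simple threshold.
indep = [sigmoid(x) for x in logits]
print(max(indep))  # ~0.13: below a 0.5 threshold, so flag for a human
```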
23:50 - Question about submitting to Kaggle
Creating a good validation set is very important.
24:50 - Always have a validation set. When is a random split appropriate? What should the validation set look like? It should be as similar to the test set and the deployment data as possible. Check whether the training set and the test set have similar distributions; if the test set was not randomly selected from the same source as the training set, you should be alarmed.
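A small sketch of the distribution check suggested above, comparing a hypothetical metadata field between the training and test sets; the camera names and counts are invented for illustration.

```python
from collections import Counter

# Hypothetical metadata for train and test images, e.g. which camera
# each image came from.
train_meta = ["cam1"] * 80 + ["cam2"] * 20
test_meta = ["cam1"] * 20 + ["cam2"] * 80

def fractions(xs):
    """Share of each category in a list of labels."""
    counts = Counter(xs)
    n = len(xs)
    return {k: v / n for k, v in counts.items()}

# If the two distributions differ a lot like this, a purely random
# validation split will mislead you; build the validation set to
# resemble the test set instead.
print(fractions(train_meta))  # {'cam1': 0.8, 'cam2': 0.2}
print(fractions(test_meta))   # {'cam1': 0.2, 'cam2': 0.8}
```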
27:19 - Radek comments on comparing your validation results (as many as you like, since they are stored locally) with public leaderboard results (only 2 submissions per day).
29:30 - Where did we get to in the last lesson?
31:20 - GradientAccumulation in Jeremy's "Scaling Up: Road to the Top, Part 3" notebook
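A plain-Python sketch of what gradient accumulation does (not fastai's GradientAccumulation callback, and with a made-up one-parameter model): summing gradients over small micro-batches before a single optimizer step is numerically equivalent to one large batch, which is how a big effective batch size fits in limited GPU memory.

```python
def grad(w, x, y):
    # d/dw of the squared error (w*x - y)^2 for one sample
    return 2 * (w * x - y) * x

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# One big batch: average gradient over all four samples.
big = sum(grad(w, x, y) for x, y in zip(xs, ys)) / len(xs)

# Two micro-batches of two: accumulate the sums, divide once at the end.
data = list(zip(xs, ys))
acc = 0.0
for chunk in (data[:2], data[2:]):
    acc += sum(grad(w, x, y) for x, y in chunk)
acc /= len(xs)

print(big, acc)  # identical values
```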
37:20 - “Save & Run” a Kaggle notebook
38:55 - Plans for the next lessons: what the outputs (multi-target loss, softmax, cross-entropy loss, binary sigmoid) and inputs (embeddings in collaborative filtering) of a model look like
40:55 - Plans for the next lessons: what the "middle" (the convnet) of a model looks like
Planned for the next lesson, or lesson 8
41:32 - How to debug the middle layers? Advanced debugging of the middle layers will be a deep dive in Part 2, as in the previous Part 2; collaborative filtering will lead us into it.
42:53 - The ethical side also deserves more attention; see Rachel's 2020 lecture videos.
44:30 - fastai1/courses/dl1/excel/ : how underappreciated Excel has been, and how useful and helpful it actually is.