Lesson 1 - Official topic

akashpalrecha · March 18, 2020, 6:23am

@harish3110, the error rate is calculated on the validation data itself, and not on the training data.

Also, if you tune your training too much to get a good score on one fixed validation set, it can still be seen as overfitting to that validation set, even if the validation data was never used to update the model weights. The reasoning is that your choices were optimized for that particular validation set all along, so they may not work as well on other data. For example, what if your validation set contains very few examples of a class that is hard to predict, but that class is much more common in the test set and in other real-world data? In that case your model would perform poorly in real use, and that is what we would call overfitting.
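
One way to spot the mismatch described above is to compare how often each class appears in the validation set with how often it appears in data that better reflects real use (for example the test set, if you have its labels). A minimal sketch in plain Python; the function names and the "easy"/"hard" labels are hypothetical and not part of fastai:

```python
from collections import Counter


def class_proportions(labels):
    """Fraction of items belonging to each class label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def proportion_gaps(valid_labels, reference_labels):
    """Per-class share in the reference data minus share in the validation set.

    A large positive gap means the class is under-represented in the validation set.
    """
    valid_p = class_proportions(valid_labels)
    ref_p = class_proportions(reference_labels)
    classes = sorted(set(valid_p) | set(ref_p))
    return {c: ref_p.get(c, 0.0) - valid_p.get(c, 0.0) for c in classes}


# Hypothetical example: the hard class is rare in validation but common in the test data.
valid_labels = ["easy"] * 95 + ["hard"] * 5
test_labels = ["easy"] * 70 + ["hard"] * 30
gaps = proportion_gaps(valid_labels, test_labels)
print(gaps)
```

A gap of 0.25 for "hard" would say that the validation score mostly reflects the easy class, so a good validation error rate says little about the hard one.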

EDIT:
One more thing: the validation loss isn’t calculated on every forward pass. At the end of each epoch on the training data, the model is evaluated on the full validation set, and that is when the validation loss is calculated.
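
To make the per-batch vs per-epoch distinction concrete, here is a minimal sketch in plain Python (no fastai or PyTorch), assuming a toy 1-D linear model trained with mini-batch SGD; train and mse are hypothetical helpers, and the learning rate, batch size and data are arbitrary. The weights are updated after every training mini-batch, while the validation loss is computed once per epoch over the whole validation set and never used to update the weights:

```python
import random


def mse(w, b, data):
    """Mean squared error of the line y = w*x + b over (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(train_data, valid_data, epochs=5, batch_size=10, lr=0.1, seed=0):
    """Mini-batch SGD on a 1-D linear model, validating once per epoch."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    valid_losses = []
    n_updates = 0
    for _ in range(epochs):
        data = list(train_data)
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Forward pass on one training mini-batch, then one weight update.
            errors = [(w * x + b - y, x) for x, y in batch]
            grad_w = sum(2 * e * x for e, x in errors) / len(batch)
            grad_b = sum(2 * e for e, _ in errors) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
            n_updates += 1
        # End of epoch: evaluate on the whole validation set; weights are not touched.
        valid_losses.append(mse(w, b, valid_data))
    return w, b, valid_losses, n_updates


# Toy data from y = 3x + 1; the first 1000 points train, the last 200 validate.
data_rng = random.Random(42)
points = [(x, 3 * x + 1) for x in (data_rng.uniform(-1, 1) for _ in range(1200))]
w, b, valid_losses, n_updates = train(points[:1000], points[1000:])
print(len(valid_losses), n_updates)  # 5 500
```

With 1000 training points and a batch size of 10, each epoch makes 100 weight updates but records only one validation loss.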

0tist · March 18, 2020, 7:01am

Will this course discuss the concept of adversarial machine learning?

rsomani95 · March 18, 2020, 7:02am

To add to that, one forward pass processes batch_size images. So if your dataset has 1000 images and batch_size=10, one epoch takes 100 forward passes.
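
The arithmetic is a ceiling division, because a dataset size that isn't a multiple of batch_size leaves a smaller final batch. Whether that last partial batch is used depends on the data loader's settings (PyTorch's DataLoader, for instance, has a drop_last argument). A small sketch; the function name is hypothetical:

```python
def forward_passes_per_epoch(n_items, batch_size, drop_last=False):
    """Number of mini-batches (one forward pass each) in one pass over the data."""
    if drop_last:
        # The final, incomplete batch is skipped.
        return n_items // batch_size
    # The final batch may be smaller than batch_size but is still used (ceiling division).
    return (n_items + batch_size - 1) // batch_size


print(forward_passes_per_epoch(1000, 10))                  # 100
print(forward_passes_per_epoch(1005, 10))                  # 101
print(forward_passes_per_epoch(1005, 10, drop_last=True))  # 100
```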

akashpalrecha · March 18, 2020, 7:17am

The current course notebooks don’t seem to be dealing with this, so my guess would be a No.

Lankinen · March 18, 2020, 9:47am

I’m experimenting this year with Notion as the platform for my notes. It is faster to write in and, imo, it looks cleaner than Medium.

The notes are part of my Notion notebook, as I’m a paying member, and I’m not sure the free plan would cover the whole course. These notes aren’t indexed by any search engine, and there shouldn’t be any public links to my Notion. In the past I have shared some other notes with maybe 2-4 people, but as far as I know no one is actively checking my notebook for anything new.

Please let me know if there is something I have missed or if it’s not safe enough, and I will take the necessary action. I certainly don’t want to leak my notes before the course is publicly available to anyone outside this group.

abhikjha · March 18, 2020, 11:17am

Absolutely agree! I had never coded in my life, nor is it required in my daily job, but I guess passion and interest in learning something surpasses everything…

init_27 · March 18, 2020, 11:19am

I think you’ve just put into words what brings the fastai family together and makes it great

Antoine.C · March 18, 2020, 11:26am

Following the “color red” game as described by Jeremy, is there any research on the following strategy?

Wash your hands.
Wash your face.
Wash your hands again.

A quick search on the web didn’t return anything particularly relevant.

jeremy · March 18, 2020, 2:08pm

I’ve just added the covid-19 video to the top post. Here it is:

Please help us share this with anyone you think might find it useful.

clck10 · March 18, 2020, 2:15pm

Thank you so much Jeremy! I have a whole list of people who need to see this…

jamesrequa · March 18, 2020, 5:43pm

You are correct that the validation loss isn’t used to update the weights, and the validation metrics are a good guide to whether your model is overfitting during training. However, since you typically tune your hyperparameters iteratively to improve the validation metrics, your model can still end up overfitting to the validation set. This is why we have yet another portion of data, called the test set, which is not used in any part of the training process.

For production, how the model performs on the test set is what ultimately matters most. If you find that your validation metrics are highly inconsistent with your test set results, you should take more care in selecting the validation set so that it more closely represents the test set.
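
The three-way split can be sketched in plain Python as below, assuming a simple random split is appropriate for your data (for time series or data collected differently from the test set, a random split may not represent the test set well, which is the situation described above). train_valid_test_split is a hypothetical helper, not a fastai API; the fixed seed makes the split reproducible:

```python
import random


def train_valid_test_split(items, valid_pct=0.2, test_pct=0.1, seed=None):
    """Shuffle items reproducibly and cut them into train, valid and test lists."""
    rng = random.Random(seed)  # private RNG: the split depends only on `seed`
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_pct)
    n_valid = int(len(shuffled) * valid_pct)
    test = shuffled[:n_test]                    # set aside; evaluated only at the very end
    valid = shuffled[n_test:n_test + n_valid]   # used to compare models and hyperparameters
    train = shuffled[n_test + n_valid:]         # the only data used to update weights
    return train, valid, test


train, valid, test = train_valid_test_split(range(1000), valid_pct=0.2, test_pct=0.1, seed=42)
print(len(train), len(valid), len(test))  # 700 200 100
```

The key design point is discipline rather than code: the test list is created once and not looked at while you iterate on the validation metrics.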

sinsji · March 18, 2020, 6:39pm

ICU beds occupied in the Netherlands due to COVID-19:

16 March: 96 (8.3% or 16.7%)
17 March: 135 (11.7% or 23.5%)
18 March: 171 (14.9% or 29.7%)
25 March: approximately 600 beds occupied (the total number of beds is changing with the nationwide upscaling)

Total capacity: 1150 beds, of which 575 are reserved for COVID-19 patients.
The Dutch parliament decided today that a ‘lock-down’ is not needed.
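
The two percentages given for each date appear to be the occupied beds divided by the total capacity (1150) and by the 575 beds reserved for COVID-19 patients. This quick check reproduces them from the numbers in the post (the 25 March estimate is left out because the total is changing):

```python
# Occupied ICU beds per date, as listed above.
occupied = {"16 March": 96, "17 March": 135, "18 March": 171}
TOTAL_CAPACITY = 1150
COVID_RESERVED = 575

percentages = {}
for date, beds in occupied.items():
    pct_total = round(100 * beds / TOTAL_CAPACITY, 1)   # share of all ICU beds
    pct_covid = round(100 * beds / COVID_RESERVED, 1)   # share of beds reserved for COVID-19
    percentages[date] = (pct_total, pct_covid)
    print(f"{date}: {beds} beds = {pct_total}% of all ICU beds, {pct_covid}% of COVID-19 beds")
```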

Quote by chair of Dutch ICU association: “When we feel that the increase is going too fast, we have consultations with the RIVM (Dutch public health institution), which is in contact with the Cabinet again.”

This is while the virus has not yet spread to Amsterdam and other major cities in the west of the Netherlands, as it already has in the southern provinces.

Data from Johns Hopkins University (a second hump in the curve?):

(Excellent) Link

Edward · March 18, 2020, 8:38pm

A great first lesson that feels more polished and better paced than the previous course (and that was a great course in itself). It might be because I studied the previous course, but as a student you feel “safe”, and the calm, concise style is comfortable while keeping your interest primed. I really liked the conceptual breakdown and description of machine learning, and the historical background gave context. Thanks @jeremy, @sgugger, @rachel and everyone else involved for all the work you have put into creating another great course, and this time with a book!

barnacl · March 18, 2020, 10:15pm

Is it as simple as this: “We have studied the flu for a long time and still can’t really contain it and prevent deaths, and Covid-19 is NOVEL!!!”? Or is this logic flawed?

Albertotono · March 18, 2020, 10:39pm

Dear All,
where can I find the documentation for ImageDataLoaders.from_name_func in order to understand it better?
Furthermore, playing with it is fun, but it takes some time (about 5 minutes) every time I run it in Colab.

I changed the seed to 115 and got these results.

lgvaz · March 18, 2020, 10:43pm

You can run this in the notebook:

doc(ImageDataLoaders.from_name_func)

barnacl · March 18, 2020, 11:05pm

github.com/fastai/fastai2/blob/ac5233c33423ce6ea7f1bfac79e774531a9b55e5/fastai2/vision/data.py#L79

PointBlock.__doc__ = "A `TransformBlock` for points in an image"
BBoxBlock.__doc__  = "A `TransformBlock` for bounding boxes in an image"


# Cell
def BBoxLblBlock(vocab=None, add_na=True):
    "A `TransformBlock` for labeled bounding boxes, potentially with `vocab`"
    return TransformBlock(type_tfms=MultiCategorize(vocab=vocab, add_na=add_na), item_tfms=BBoxLabeler)


# Cell
class ImageDataLoaders(DataLoaders):
    "Basic wrapper around several `DataLoader`s with factory methods for computer vision problems"
    @classmethod
    @delegates(DataLoaders.from_dblock)
    def from_folder(cls, path, train='train', valid='valid', valid_pct=None, seed=None, vocab=None, item_tfms=None,
                    batch_tfms=None, **kwargs):
        "Create from imagenet style dataset in `path` with `train` and `valid` subfolders (or provide `valid_pct`)"
        splitter = GrandparentSplitter(train_name=train, valid_name=valid) if valid_pct is None else RandomSplitter(valid_pct, seed=seed)
        dblock = DataBlock(blocks=(ImageBlock, CategoryBlock(vocab=vocab)),
                           get_items=get_image_files,
                           splitter=splitter,

??ImageDataLoaders.from_name_func or ImageDataLoaders.from_name_func?? should also show you the source code, and the last line of the output tells you the file path.
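
The ?? syntax is an IPython/Jupyter feature. Outside a notebook, the standard library's inspect module gives roughly the same two pieces of information (source code and defining file). In this sketch json.dumps is only a stand-in so the snippet runs without fastai installed; passing ImageDataLoaders.from_name_func instead should work the same way once fastai is imported:

```python
import inspect
import json

# Stand-in target; swap in ImageDataLoaders.from_name_func if fastai is installed.
target = json.dumps

source = inspect.getsource(target)           # the function's source code, like `??` shows
source_file = inspect.getsourcefile(target)  # path of the file that defines it

print(source.splitlines()[0])
print(source_file)
```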

jeremy · March 19, 2020, 2:36am

Not really - the population size of a country isn’t really a factor in spread until the % infected is a significant proportion of the total population.
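
One way to see this point is a toy SIR (susceptible, infected, recovered) model, swapped in here purely for illustration: new infections depend on the fraction of people still susceptible, so while that fraction is close to 1 the curve grows at the same rate whatever the population size. The parameters (beta, gamma, starting counts) are arbitrary and are not estimates for COVID-19:

```python
def simulate_sir(population, initial_infected, beta=0.3, gamma=0.1, steps=60):
    """Toy discrete-time SIR model; returns the number currently infected at each step."""
    susceptible = population - initial_infected
    infected = float(initial_infected)
    history = [infected]
    for _ in range(steps):
        # New infections scale with the *fraction* still susceptible, not with population size.
        new_infections = beta * infected * susceptible / population
        recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - recoveries
        history.append(infected)
    return history


small = simulate_sir(population=1_000_000, initial_infected=100)
large = simulate_sir(population=100_000_000, initial_infected=100)

# Early on, while almost everyone is still susceptible, both curves are nearly identical.
print(small[20] / large[20])
# Later, the smaller population runs low on susceptible people and its curve bends first.
print(small[60] < large[60])
```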

jeremy · March 19, 2020, 2:41am

IIRC around 90% go to O’Reilly.

jeremy · March 19, 2020, 2:48am

I’m using our university’s computer. It has Titan RTX cards. (Not recommended unless you’re lucky enough, like us, to have a generous donor paying for them!)