Can crontab be used for incremental training, since you can schedule code to run at different intervals? I suppose it depends on how you wrote your code.
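A minimal sketch of what a cron-driven incremental update could look like, in case it helps. Everything here (the schedule, the paths, and the dataloader step) is an illustrative assumption, not something from the course:

```python
# retrain.py -- hypothetical script a crontab entry could run, e.g. weekly:
#   0 2 * * 0  /usr/bin/python3 /path/to/retrain.py
from fastai.vision.all import load_learner, ImageDataLoaders, Resize

learn = load_learner('export.pkl')   # model exported by the previous run
dls = ImageDataLoaders.from_folder(  # 'new_data/' is a made-up folder of fresh images
    'new_data/', valid_pct=0.2, item_tfms=Resize(224))
learn.dls = dls                      # point the learner at the new data
learn.fine_tune(1)                   # short incremental update
learn.export('export.pkl')           # overwrite the snapshot for the next scheduled run
```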
Do you think Multi-Armed Bandits help break feedback loops?
Jeremy's notebook is not shared on the screen. Is this on purpose?
Any feedback affects and biases future data.
That's awesome @rachel
But you can, if you save your previous models regularly, along with the data used to train them. Andrej Karpathy gave a very good talk about that at the PyTorch dev summit in 2018. I'll try to find the link to it tomorrow.
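Not the talk itself, but as a rough illustration of the "save your models and the data" point, something like this (all names and paths are made up):

```python
# A minimal sketch of snapshotting each training run so you can compare models later.
# `learn` is assumed to be a trained fastai Learner; paths are illustrative.
from datetime import datetime
from pathlib import Path
import shutil

def snapshot_run(learn, train_data_path, out_dir=Path('snapshots')):
    stamp = datetime.now().strftime('%Y%m%d-%H%M%S')
    out_dir.mkdir(exist_ok=True)
    # Learner.export saves relative to learn.path, so resolve to an absolute path
    learn.export((out_dir/f'model-{stamp}.pkl').resolve())
    shutil.copy(train_data_path, out_dir/f'data-{stamp}.csv')  # the data that trained it
```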
Would that rule out deployment in situations where review is illegal/unethical? E.g. security cams for indoor spaces, or call/speech privacy?
Would it be possible to break a feedback loop by adding some noise? This problem seems similar to overfitting, at least to me.
My favourite blog post on why you, yes YOU, should blog
(Will add to Top wiki during break)
For my pathology project, I let the assessors annotate the same set of images individually, then use the Intraclass Correlation Coefficient in R, a mean-differences graph, and an overlay of all annotations for each image. We did this at the beginning of the project and identified individual bias prior to mass annotation. We are also preparing a gold-standard guide that includes exceptions found during the process.
I would like to hear your feedback on our approach.
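If it's useful to others, here is a rough Python equivalent of that agreement check (the original analysis was in R; pingouin and the column names here are my assumptions, not the poster's actual code):

```python
# A minimal sketch of an inter-rater agreement check using the Intraclass
# Correlation Coefficient; the toy data stands in for per-image annotation scores.
import pandas as pd
import pingouin as pg

# long format: one row per (image, assessor) score
df = pd.DataFrame({
    'image':    ['img1']*3 + ['img2']*3 + ['img3']*3 + ['img4']*3,
    'assessor': ['A', 'B', 'C'] * 4,
    'score':    [3, 3, 2,  1, 1, 1,  4, 5, 4,  2, 2, 3],
})

icc = pg.intraclass_corr(data=df, targets='image', raters='assessor', ratings='score')
print(icc[['Type', 'ICC', 'CI95%']])
```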
You can't train a model without labeled data, so you need a process that anonymizes the data in a way that lets humans see it. Never trust a deep learning model that hasn't been properly validated on labeled data.
Yes, either classifying disease/non-diseased tissue or grading/staging severity.
When reporting the effects of interventions sometimes multiple observers are used and they are (double) blinded to control vs treated to minimize bias.
I'm just wondering if we should be following this general principle during the manual phase?
Advice for Better Blog Posts: slightly more detailed advice and avoiding common pitfalls
How to set up fastpages:
If you struggle with the questionnaire, check out the solutions here:
It would if you don't use that data wisely.
This is what semi-supervised learning (SSL) is all about, and it is a huge thing!
I work in fintech on application credit scoring, and I can tell you that SSL (we call it reject inference in the financial domain) is super important.
In a nutshell, the general idea (sketched in code below) is to:
- build a solid model
- run inference on unlabeled data
- pick only the predictions the model is VERY confident about, according to a threshold you set; e.g. in binary classification, predictions with very low/high probabilities
- add these new data points to the originally labeled dataset and train a new model
- keep iterating
EDIT: Look at this paper for context. I implemented it at work and it works really well!
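To make the loop concrete, a minimal sketch of that recipe with scikit-learn; the model, the 0.95 threshold, and the iteration count are illustrative choices, not the paper's exact setup:

```python
# Pseudo-labeling: grow the labeled set using only the most confident predictions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label(X_lab, y_lab, X_unlab, threshold=0.95, n_iters=5):
    for _ in range(n_iters):
        model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)  # 1. solid model
        if len(X_unlab) == 0:
            break
        probs = model.predict_proba(X_unlab)[:, 1]                   # 2. inference on unlabeled
        confident = (probs >= threshold) | (probs <= 1 - threshold)  # 3. keep very low/high only
        if not confident.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[confident]])               # 4. add pseudo-labeled points
        y_lab = np.concatenate([y_lab, (probs[confident] >= 0.5).astype(int)])
        X_unlab = X_unlab[~confident]                                # 5. iterate
    return model
```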
In terms of blogging, I have always wondered: even though there is a larger beginner audience, there are probably also more beginner posts, right? That is why I am unsure about writing beginner tutorial posts in particular.
I'm having trouble viewing relevant docs in fastai. I want to know what the `unique` parameter does when I run:

`dls.train.show_batch(max_n=8, nrows=2, unique=True)`

I've tried `doc(show_batch)`, but I'm not getting info; I can't even ctrl+f "unique" and find anything. Does anyone have pointers on how to do this?
Updating fastai
This was added yesterday by @lgvaz and is in the new release from today. In general, it's a good idea to run an update just before the course, since we make a release each Tuesday during the period it runs.
`unique` will plot a batch of the same image. It is used for checking how your transforms look on a single item.
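For example, something along these lines (the PETS setup is just the book's standard example, not what was on screen):

```python
# A minimal sketch: unique=True repeats one item across the batch so you can
# eyeball the effect of your random augmentations.
from fastai.vision.all import *

path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=lambda f: f.name[0].isupper(),          # cats have capitalised filenames
    item_tfms=Resize(224), batch_tfms=aug_transforms())
dls.train.show_batch(max_n=8, nrows=2, unique=True)    # 8 augmented views of one image
```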
I did a git pull an hour ago. `doc()` takes me to the GitHub source, but I still can't trace the code; I'm assuming it's in the kwargs of a method call within this method?
Looking for general advice on going from reading the docs to actually understanding things.
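A couple of things that have worked for me, as a sketch. I believe the method version of `show_batch` lives on `TfmdDL` (worth checking in your version), and you can dump the source directly rather than chasing it on GitHub:

```python
# Ways to dig past doc() into the actual source.
from fastai.vision.all import *
import inspect

doc(TfmdDL.show_batch)                       # doc() on the method, not the dispatched function
print(inspect.getsource(TfmdDL.show_batch))  # dump the source, then search it for 'unique'

# interactively in Jupyter, this shows the same source:
# TfmdDL.show_batch??
```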