Lesson 3 - Official Topic

For my pathology project, I had the assessors annotate the same set of images independently, then compared their annotations using the Intraclass Correlation Coefficient in R, a mean-differences graph, and an overlay of all annotations for each image. We did this at the beginning of the project to identify individual bias prior to mass annotation. We are also preparing a gold-standard guide that covers the exceptions we run into during the process.
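
As a rough illustration of the agreement check described above, here is a minimal Python sketch (not the R code used in the project). It assumes the annotations have been reduced to one numeric score per image and per assessor, and it computes the two-way random-effects, absolute-agreement, single-rater form ICC(2,1). The function name `icc2_1` and the example scores are made up for illustration.

```python
import numpy as np

def icc2_1(ratings):
    """ICC(2,1) for a (n_images, n_assessors) array of numeric scores."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between images
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between assessors
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical scores: assessor B always scores one point higher than A.
scores = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
icc_biased = icc2_1(scores)

# Mean difference between the two assessors (the centre line of a
# mean-differences plot); a non-zero value points at systematic bias.
mean_diff = (scores[:, 0] - scores[:, 1]).mean()

# Identical scores from three assessors give perfect agreement.
icc_perfect = icc2_1([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])
print(icc_biased, mean_diff, icc_perfect)
```

Because this form measures absolute agreement, a constant offset between assessors lowers the ICC even when their rankings match exactly, which is one way individual bias can show up.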

I would like to hear your feedback on our approach. :blush:

2 Likes

You can’t train a model without labeled data, so you need a process that anonymizes the data in a way that lets humans look at it. Never trust a deep learning model if it hasn’t been properly validated on labeled data.

2 Likes

Yes, either classifying disease/non-diseased tissue or grading/staging severity.

When reporting the effects of interventions, multiple observers are sometimes used, and they are (double-)blinded to control vs. treated to minimize bias.

I’m just wondering if we should be following this general principle during the manual phase?

Advice for Better Blog Posts: slightly more detailed advice and avoiding common pitfalls

16 Likes

How to set up fastpages:

6 Likes

If you struggle with the questionnaire, check out the solutions here:

1 Like

It would if you don’t use that data wisely.
This is what semi-supervised learning (SSL) is all about, and it is a huge thing!
I work at a fintech on application credit scoring, and I can tell you that SSL (we call it reject inference in the financial domain) is super important.
In a nutshell, the general idea is to:

  1. build a solid model
  2. run inference on unlabeled data
  3. pick only the predictions the model is VERY confident about (according to a threshold you set), e.g. for binary classification, predictions with very low or very high probabilities
  4. add these new data points to the originally labeled dataset and train a new model
  5. keep iterating
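
The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation from the post or from the paper: the tiny NumPy logistic regression stands in for the "solid model", and the names `fit_logreg`, `predict_proba`, `self_train`, the 0.05/0.95 thresholds, and the synthetic data are all made up for the example.

```python
import numpy as np

def fit_logreg(X, y, lr=0.1, epochs=500):
    """Fit a tiny logistic regression with plain gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    """Predicted probability of the positive class."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def self_train(X_lab, y_lab, X_unlab, low=0.05, high=0.95, max_rounds=5):
    X_lab, y_lab = X_lab.copy(), y_lab.astype(float)
    w = fit_logreg(X_lab, y_lab)                      # 1. build a model
    for _ in range(max_rounds):                       # 5. keep iterating
        if len(X_unlab) == 0:
            break
        p = predict_proba(w, X_unlab)                 # 2. inference on unlabeled data
        confident = (p <= low) | (p >= high)          # 3. keep only confident predictions
        if not confident.any():
            break
        pseudo = (p[confident] >= high).astype(float)
        X_lab = np.vstack([X_lab, X_unlab[confident]])  # 4. add them and retrain
        y_lab = np.concatenate([y_lab, pseudo])
        X_unlab = X_unlab[~confident]
        w = fit_logreg(X_lab, y_lab)
    return w, X_lab, y_lab, X_unlab

# Synthetic two-class data: 20 labeled points, 100 unlabeled ones.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=2.0, size=(60, 2))
X_neg = rng.normal(loc=-2.0, size=(60, 2))
X_start = np.vstack([X_pos[:10], X_neg[:10]])
y_start = np.array([1] * 10 + [0] * 10)
X_pool = np.vstack([X_pos[10:], X_neg[10:]])

w, X_aug, y_aug, X_left = self_train(X_start, y_start, X_pool)
print(f"labeled set grew from {len(X_start)} to {len(X_aug)}; {len(X_left)} still unlabeled")
```

The threshold is the main knob: set it too loosely and wrong pseudo-labels get baked into the next model, so it is worth checking each round against a held-out set that has real labels.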

EDIT: Look at this paper for context. I implemented it at work and it works really well!

9 Likes

In terms of blogging, I have always wondered: even though there is a larger beginner audience, there are probably more beginner posts too, right? This is why I am unsure, especially about writing beginner tutorial blog posts.

2 Likes

I’m having trouble viewing relevant docs in fastai.

I want to know what the unique parameter does when I run:
dls.train.show_batch(max_n=8, nrows=2, unique=True)

I’ve tried doc(show_batch), but I’m not getting any info, and I can’t even Ctrl+F “unique” and find anything. Does anyone have pointers on how to do this?

Updating fastai :wink:
This was added yesterday by @lgvaz and is in the new release from today. In general, it’s a good thing to run an update just before the course as we make a release each Tuesday during the period it runs :slight_smile:

4 Likes

With unique=True, show_batch plots a batch made up of the same image repeated. This is used for checking how your transforms look on a single item.

4 Likes

I git pulled an hour ago. doc() takes me to the GitHub source, but I still can’t trace the code. I’m assuming it’s in the kwargs of a method called within this method?

Looking for general advice on reading the docs -> understanding things

1 Like

This is a good question. I think it’s helpful to be even more specific about your audience: not just beginners, but, say, beginners with X particular background or Y previous experience. The more specific you can be about your audience, the better. It’s also helpful to think about what specific things were missing or difficult for you 6 months ago. Was there a topic or concept for which you couldn’t find an explanation that really made sense (even if there were plenty of posts about it)? An alternate way of approaching this is to spend time answering questions for others (potentially here on the forums) to see what is missing in the way of learning materials.

10 Likes

I don’t know if anyone has noticed that the fileupload widget is in version 7.5.0 of ipywidgets. I can’t seem to find it in newer versions.

No, it’s where it should be in TfmdDL.show_batch. Note that dls.train is a TfmdDL.

2 Likes

Yes, you need to pin that version.

2 Likes

On a CPU as well, is a PyTorch tensor more efficient than a NumPy array for mathematical computations?

How can we display the number of pixels in our image?

On most functions, it’s the same speed.

1 Like

tensor.shape usually gives you that. Multiply the height and width dimensions to get the number of pixels.
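
A small sketch of that calculation, assuming a channels-first layout (channels, height, width), which is the usual one for PyTorch image tensors. A NumPy array stands in for the tensor here so the snippet runs without PyTorch; `.shape` is read the same way on a tensor. The 3×224×224 size is just an example.

```python
import math
import numpy as np

# Stand-in for an image tensor: 3 colour channels, 224 x 224 pixels.
img = np.zeros((3, 224, 224))

channels, height, width = img.shape
n_pixels = height * width          # pixels in the image
n_values = math.prod(img.shape)    # all stored values, channels included

print(n_pixels, n_values)
```

Multiplying every dimension counts each colour channel separately, so it gives three times the pixel count for an RGB image.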

3 Likes