For my pathology project, I had the assessors annotate the same set of images independently, and then compared their annotations using the Intraclass Correlation Coefficient (in R), a mean-differences plot, and an overlay of all annotations for each image. We did this at the beginning of the project to identify individual bias before mass annotation. We are also preparing a gold-standard guide that records the exceptions we come across during the process.
I would like to hear your feedback on our approach.
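For anyone who wants to try the same agreement check outside R, here is a minimal sketch in plain Python. It assumes each assessor gives one numeric score per image, and it assumes the ICC(2,1) variant (two-way random effects, absolute agreement, single rater); the post does not say which ICC form was used, so treat that choice, the function names, and the example scores as hypothetical.

```python
from statistics import mean, stdev


def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per image, one column per assessor.
    """
    n = len(ratings)       # number of images
    k = len(ratings[0])    # number of assessors
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares: images (rows), assessors (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def mean_difference(a, b):
    """Mean difference between two assessors with 95% limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    md = mean(diffs)
    sd = stdev(diffs)
    return md, md - 1.96 * sd, md + 1.96 * sd


# Hypothetical scores: 4 images (rows) scored by 3 assessors (columns).
scores = [
    [2.0, 2.5, 2.0],
    [4.0, 4.5, 3.5],
    [1.0, 1.5, 1.0],
    [3.0, 3.5, 3.5],
]
print("ICC(2,1):", icc2_1(scores))
print("Assessor 1 vs 2:", mean_difference([r[0] for r in scores], [r[1] for r in scores]))
```

A mean difference that sits consistently away from zero for one assessor is the kind of individual bias the mean-differences plot is meant to expose, while the ICC summarizes overall agreement in a single number.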
You can’t train a model without labeled data, so you need a process to anonymize the data in a way that lets humans look at it. Never trust a deep learning model if it hasn’t been properly validated on labeled data.
Yes, either classifying disease/non-diseased tissue or grading/staging severity.
When reporting the effects of interventions, multiple observers are sometimes used, and they are (double-)blinded to control vs. treated to minimize bias.
I’m just wondering if we should be following this general principle during the manual phase?
It would if you don’t use that data wisely.
This is what semi-supervised learning (SSL) is all about, and it is a huge thing!
I work at a fintech on application credit scoring, and I can tell you that SSL (we call it reject inference in the financial domain) is super important.
In a nutshell, the general idea is to:
1. build a solid model
2. run inference on unlabeled data
3. pick only the predictions the model is VERY confident about (according to a threshold you set), e.g., in the case of binary classification, predictions with very low/high probabilities
4. add these new data points to the originally labeled dataset and train a new model
5. keep iterating
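A minimal sketch of the loop above, in plain Python so it runs anywhere. The "model" is a toy 1-D nearest-centroid classifier standing in for a real one, and the function names, thresholds, and data are all hypothetical; this is not the implementation from the paper mentioned below.

```python
import math


def fit_centroids(xs, ys):
    """Toy 1-D 'model': the mean of each class (stand-in for a real classifier)."""
    neg = [x for x, y in zip(xs, ys) if y == 0]
    pos = [x for x, y in zip(xs, ys) if y == 1]
    return sum(neg) / len(neg), sum(pos) / len(pos)


def predict_proba(model, x):
    """P(y=1 | x): sigmoid of the difference in squared distance to the centroids."""
    c0, c1 = model
    return 1.0 / (1.0 + math.exp((x - c1) ** 2 - (x - c0) ** 2))


def self_train(xs, ys, unlabeled, low=0.05, high=0.95, max_rounds=10):
    """Pseudo-label confident points, retrain, and repeat until nothing qualifies."""
    xs, ys, pool = list(xs), list(ys), list(unlabeled)
    model = fit_centroids(xs, ys)                 # build an initial model
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:                            # run inference on unlabeled data
            p = predict_proba(model, x)
            if p >= high:                         # very confident positive
                confident.append((x, 1))
            elif p <= low:                        # very confident negative
                confident.append((x, 0))
            else:                                 # uncertain: leave unlabeled
                remaining.append(x)
        if not confident:
            break                                 # nothing new to learn from
        xs += [x for x, _ in confident]           # add pseudo-labeled points
        ys += [y for _, y in confident]
        pool = remaining
        model = fit_centroids(xs, ys)             # train a new model
    return model, xs, ys, pool


# Hypothetical data: two labeled points and five unlabeled ones.
model, xs, ys, pool = self_train([0.0, 4.0], [0, 1], [-0.5, 0.3, 2.0, 3.8, 4.4])
print("still unlabeled:", pool)
```

The thresholds are the part to tune carefully: set them too loosely and the model's own mistakes get fed back in as labels. If you use scikit-learn, its `SelfTrainingClassifier` wraps the same idea around any classifier that exposes `predict_proba`.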
EDIT: Look at this paper for context. I implemented it at work and it works really well!
In terms of blogging, I have always wondered: even though there is a larger beginner audience, there are probably more beginner posts too, right? This is why I am especially unsure about writing beginner tutorial blog posts.
Updating fastai
This was added yesterday by @lgvaz and is in today's new release. In general, it’s a good idea to run an update just before the course, as we make a release each Tuesday during the period it runs.
I ran git pull an hour ago. doc() takes me to the GitHub source, but I still can’t trace the code; I’m assuming it’s passed through the kwargs of a method call within this method?
Looking for general advice on reading the docs -> understanding things
This is a good question. I think it’s helpful to be even more specific about your audience: not just beginners, but say, beginners with X particular background or Y previous experience. The more specific you can be about your audience, the better. It’s also helpful to think about what specific things were missing or difficult for you 6 months ago. Was there a topic or concept for which you couldn’t find an explanation that really made sense (even if there were plenty of posts about it)? An alternate way of approaching this is to spend time answering questions for others (potentially here on the forums) to see what is missing in the way of learning materials.