How to use the library with the ChestX-ray14 dataset

Hi folks, hope you are enjoying Christmas. I recently heard about and read some articles on the ChestX-ray14 dataset, so I was wondering how to use the library to produce some decent results on it.

I humbly request that the experienced practitioners here share feedback on how I should approach the ChestX-ray14 dataset. Should I start with ResNet34, VGG16, or some other architecture? What image size should I use with that architecture? And, most importantly, which data augmentation transforms should I use? Will the side-on transforms work with this dataset too?

Once again, Merry Christmas and a happy New Year in advance.


I am not sure what the dataset is, but I think you should try both of them out. You may also need to freeze some layers and fine-tune the network.

Thank you for your response. To get a better intuition for the dataset, please have a look at it through this link.

I’ve been working on this same dataset as well, starting yesterday. I’m comparing the Ng et al. paper with the paper from NIH. Some people have noted problems with the dataset, as in the article above.

My approach so far is to use Lesson 3 as a starting point.

I’m not sure how to deal with the bounding boxes. It seems some areas of particular images are of more interest, and there’s a CSV file noting these regions of interest. I’m gonna visit Lesson 7 and try to get it down.
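
As a starting point for the regions-of-interest CSV mentioned above, here is a minimal sketch of reading such a file and converting each box from (x, y, width, height) to corner coordinates. The column names (`image`, `label`, `x`, `y`, `w`, `h`), the sample rows, and the `load_boxes` helper are all made up for illustration; check the header of the actual file before adapting it.

```python
import csv
import io

# Hypothetical sample data: the real CSV's column names and values may differ.
SAMPLE_CSV = """image,label,x,y,w,h
img_001.png,Atelectasis,100.0,200.0,50.0,80.0
img_002.png,Cardiomegaly,300.5,150.0,120.0,90.5
"""


def load_boxes(csv_file):
    """Return {image name: [(label, x_min, y_min, x_max, y_max), ...]}."""
    boxes = {}
    for row in csv.DictReader(csv_file):
        x, y = float(row["x"]), float(row["y"])
        w, h = float(row["w"]), float(row["h"])
        # Convert top-left corner plus size into two opposite corners.
        box = (row["label"], x, y, x + w, y + h)
        boxes.setdefault(row["image"], []).append(box)
    return boxes


boxes = load_boxes(io.StringIO(SAMPLE_CSV))
for image, image_boxes in boxes.items():
    print(image, image_boxes)
```

To read a file on disk instead of the inline sample, pass an open file handle, for example `load_boxes(open(path, newline=""))`, where `path` points at the CSV.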

I also noticed @jeremy responded through a tweet to the author who was pointing out issues with the dataset, but I still think it’s a relevant dataset to try. I’m personally invested in an imaging tool like this being useful in regions of the world where radiologists are scarce.

Any luck so far?

Bounding boxes aren’t something we covered in this year’s part 1 course, so you’d be best off using the keras lesson 7 you referred to - unless @yinterian is able to help, since she wrote the bounding box stuff for fastai.


The library has the functionality you need to work with bounding boxes. Here is a notebook that has some of it.

Here is a notebook that shows that the transforms work with bounding boxes.

I am quite interested in this paper. Unfortunately at the moment I am preparing to teach a machine learning class and I won’t have time to work on it.


Nope, just trying to understand this paper.


Mainly, I’m unable to decide what to do when I read these lines, because we always try to get good predictions in order to achieve high performance. Maybe my inexperience in this domain is keeping me from taking any further steps.
Could anyone guide me or share their expertise in the context of the lines I mentioned above?

Here I found another link regarding the same dataset; it might be helpful as well.

As an addendum:

There’s a notebook on Kaggle that uses Keras to get an accuracy of around 73%.

I’m not able to make the progress I thought I would using the FastAI library as I’m stuck on something that should be minor to solve. I think the Keras option might be a good starting point for anyone stuck.

Kind of a noob question:

Is 70%-ish currently about as good as it gets for CXR detection of cancer, lesions, etc.? I wonder about this because, based on the 2017 Data Science Bowl, most kernels I’ve seen on Kaggle seem to split the data in such a way that the validation set almost always has about 2/3 cancer and 1/3 no cancer. So I wonder if there’s a skew there.

Are there any models well above 90% that anyone knows of?
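
One way to put the skew question in perspective is to compare any reported accuracy against a majority-class baseline, i.e. the accuracy of always predicting the most common label. The sketch below assumes a made-up validation set with the 2/3 vs 1/3 split described above; the label names, the counts, and the `majority_baseline_accuracy` helper are illustrative only, not taken from any real kernel.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical validation set: 2/3 "cancer", 1/3 "no cancer".
val_labels = ["cancer"] * 200 + ["no cancer"] * 100

baseline = majority_baseline_accuracy(val_labels)
print(f"majority-class baseline accuracy: {baseline:.3f}")
```

With a split like this, a model that learns nothing already scores about two thirds, so an accuracy around 70% would be only slightly better than guessing the majority class. On skewed data, per-class metrics such as sensitivity and specificity are usually more informative than raw accuracy.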

Has anyone been able to get around 90% accuracy for the X-Ray images?