Apparently things aren't quite working: no errors, but the network isn't training. It's a busy clinical week for me, so I probably won't get to revisit until sometime this weekend.
@neuradai Thank you for the update.
I didn't get anywhere in terms of unzipping the dataset (if someone can post it, or even a link to a subset, I would appreciate it).
However… I did find something exciting and have tested it out on MNIST today with really great results. I'm rewriting some of the fastai internals to use it for Imagenette next. It requires the loss to be passed into the step function in order to compute how to step, so unlike Adam and friends it doesn't just plug into the current framework.
This is what I am testing:
and this is why it’s exciting:
"We present experiments on the CIFAR and SNLI data sets, where we demonstrate the significant superiority of our method over Adam, Adagrad, as well as the recently proposed BPGrad and AMSGrad. "
More importantly, in my first testing today it just smashed through a subset of MNIST:
I'll test it on Imagenette next and see if this continues; if so, it might be a very nice advantage for us in training.
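To make the "doesn't just plug in" point concrete, here is a minimal sketch of an optimizer whose step() needs the current loss value. The class name and update rule are placeholders of my own, not the method from the paper:

```python
import torch

class LossAwareOptim(torch.optim.Optimizer):
    "Hypothetical optimizer: step() requires the current loss value."
    def __init__(self, params, lr=1e-3):
        super().__init__(params, dict(lr=lr))

    def step(self, loss):  # extra argument, unlike Adam's step()
        # Placeholder rule: scale the update by the current loss.
        for group in self.param_groups:
            scale = group['lr'] / (1.0 + loss.item())
            for p in group['params']:
                if p.grad is not None:
                    p.data.add_(p.grad, alpha=-scale)

# The training loop then has to change from opt.step() to:
#   loss = loss_func(model(xb), yb)
#   loss.backward()
#   opt.step(loss)
```

That extra argument is exactly why it can't be dropped into fastai's stock fit loop without rewriting the part of the loop that calls the optimizer.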
Not sure what your exact problem is with Windows and unzip. In the past I have installed Cygwin on Windows and used its bash terminal to unzip. For what it's worth, I am unzipping the data from the command line (Ubuntu 18.04, Zip v3.0) and it takes a few tens of seconds. Note that there are some warnings during the unzip process (shown below), but it seems I have matched pierreguillou's results. It may be a zip version issue, although when I run file MRNet-v1.0.zip it reports "at least v1.0 to extract", which is a very early version.
Note:
Earlier versions of zip/unzip could not handle files larger than 2 or 4 GB.
The latest sources and executables are here: http
Try zipinfo on the archive.
The (shortened) output from unzip and zipinfo:
dl@DL1:~/mrnet-fastai$ unzip MRNet-v1.0.zip
Archive: MRNet-v1.0.zip
warning [MRNet-v1.0.zip]: 4294967296 extra bytes at beginning or within zipfile
(attempting to process anyway)
file #1: bad zipfile offset (local header sig): 4294967296
(attempting to re-compensate)
creating: MRNet-v1.0/
inflating: MRNet-v1.0/valid-abnormal.csv
creating: MRNet-v1.0/valid/
creating: MRNet-v1.0/valid/axial/
inflating: MRNet-v1.0/valid/axial/1139.npy
inflating: MRNet-v1.0/valid/axial/1138.npy
inflating: MRNet-v1.0/valid/axial/1249.npy
inflating: MRNet-v1.0/valid/axial/1248.npy
inflating: MRNet-v1.0/valid/axial/1207.npy
inflating: MRNet-v1.0/valid/axial/1213.npy
…
inflating: MRNet-v1.0/valid/.DS_Store
creating: __MACOSX/
creating: __MACOSX/MRNet-v1.0/
creating: __MACOSX/MRNet-v1.0/valid/
inflating: __MACOSX/MRNet-v1.0/valid/._.DS_Store
creating: MRNet-v1.0/valid/coronal/
…
inflating: MRNet-v1.0/train/axial/.DS_Store
creating: __MACOSX/MRNet-v1.0/train/
creating: __MACOSX/MRNet-v1.0/train/axial/
inflating: __MACOSX/MRNet-v1.0/train/axial/._.DS_Store
inflating: MRNet-v1.0/train/axial/0593.npy
…
inflating: MRNet-v1.0/train/sagittal/0216.npy
file #2547: bad zipfile offset (local header sig): 1353202
(attempting to re-compensate)
inflating: MRNet-v1.0/train/sagittal/1108.npy
…
dl@DL1:~/mrnet-fastai$ info zipinfo
dl@DL1:~/mrnet-fastai$ zipinfo -h MRNet-v1.0.zip
Archive: MRNet-v1.0.zip
Zip file size: 6087523606 bytes, number of entries: 3784
warning [MRNet-v1.0.zip]: 4294967296 extra bytes at beginning or within zipfile
(attempting to process anyway)
dl@DL1:~/mrnet-fastai$ du -h MRNet-v1.0
2.2G MRNet-v1.0/train/sagittal
2.1G MRNet-v1.0/train/coronal
2.4G MRNet-v1.0/train/axial
6.6G MRNet-v1.0/train
230M MRNet-v1.0/valid/sagittal
222M MRNet-v1.0/valid/coronal
258M MRNet-v1.0/valid/axial
709M MRNet-v1.0/valid
7.3G MRNet-v1.0
The count of files inflated in MRNet-v1.0:
ls -lR MRNet-v1.0 | wc -l == 3790
ls -laR __MACOSX | wc -l == 46 (mainly empty directories in the .zip file's directory)
zipinfo with the -l option (list in long Unix format) gives:
zipinfo -l MRNet-v1.0 | wc -l
warning [MRNet-v1.0.zip]: 4294967296 extra bytes at beginning or within zipfile
(attempting to process anyway)
3789
I don't understand the 3790 vs. 3789 difference; perhaps it's not an issue!
Please be more specific about your problem if none of this enlightens you. Cheers
Oh, I am from across the pond (GMT/UTC/BST depending on the time of year), so my sleep and play (as I don't work) are nearly opposite to yours.
I have implemented a case-centered custom ItemList for the MRNet data. See the PR on GitHub.
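For anyone curious what that involves, here is a rough sketch of the shape such an ItemList can take in fastai v1; the class name and details are illustrative guesses, and the PR is the authoritative version:

```python
import numpy as np
import torch
from fastai.vision import ItemList, Image

class MRNetCaseList(ItemList):
    "One item per case: load the .npy volume and wrap its middle slice."
    def get(self, i):
        fn = super().get(i)            # a path like train/axial/0593.npy
        vol = np.load(str(fn))         # shape (n_slices, 256, 256)
        mid = vol[vol.shape[0] // 2].astype(np.float32) / 255.
        t = torch.from_numpy(mid).unsqueeze(0).repeat(3, 1, 1)
        return Image(t)                # 3-channel fastai Image
```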
Great work everyone!
Hey, Less, sorry for missing your request for the data. Do you have the dataset or are you still in need?
I have posted to github two notebooks:
- save_middle_slices_as_images.ipynb
- MRNet_Baseline_Models.ipynb
The first is a batch preprocessing step that takes the .npy array files and saves them as three-channel images in separate directories, using either just the middle slice, or three slices centered on and including the middle slice (possibly skipping one or two slices between them).
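Not the notebook itself, but the core of the single-slice variant is only a few lines; the function name and the uint8 assumption are mine:

```python
import numpy as np
from pathlib import Path
from PIL import Image

def save_middle_slice(npy_path, out_dir):
    vol = np.load(npy_path)                # shape (n_slices, 256, 256)
    mid = vol[vol.shape[0] // 2]           # the middle slice
    rgb = np.stack([mid] * 3, axis=-1).astype(np.uint8)  # grey -> 3 channels
    Image.fromarray(rgb).save(Path(out_dir) / (Path(npy_path).stem + '.png'))
```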
The “baseline models” notebook is a scaffold for fitting a set of models on the images from each plane prepared in the “save middle slices” notebook.
Baseline models can be extended in many ways, including the following:
- Unfreeze the weights and retrain
- Adjust the learning rate, add scheduling
- Use a ResNet variant or another vision model
- Add data augmentation
- Add layers/a model to aggregate predictions from the separate planes (the paper uses logistic regression; see the sketch after this list)
- Add code to process the model metrics files so we can keep track of performance improvements
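As a sketch of that aggregation step (with dummy data standing in for the real per-plane predictions), logistic regression over the three plane probabilities is just:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Dummy stand-ins for per-exam probabilities from the three plane models
# and binary labels for one task; in practice these come from the learners.
rng = np.random.default_rng(0)
preds_axial, preds_coronal, preds_sagittal = rng.random((3, 120))
y = rng.integers(0, 2, 120)

X = np.stack([preds_axial, preds_coronal, preds_sagittal], axis=1)
clf = LogisticRegression().fit(X, y)    # learns one weight per plane
combined = clf.predict_proba(X)[:, 1]   # fused probability per exam
```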
Let me know if you’ve gotten the notebooks running.
Let everyone know if you’re attacking one of the steps above, so we avoid duplication.
For reference, average AUC across tasks with a basic setup is about 0.78 on the validation set, whereas the competition baseline model got 0.92. Lots of room for improvement…
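In case it helps with the metrics-tracking item above, the headline number is just the mean of the three per-task AUCs; with hypothetical label/score dicts:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Dummy validation labels and fused predictions per task (hypothetical names).
rng = np.random.default_rng(0)
labels = {t: rng.integers(0, 2, 120) for t in ('abnormal', 'acl', 'meniscus')}
scores = {t: rng.random(120) for t in ('abnormal', 'acl', 'meniscus')}

aucs = {t: roc_auc_score(labels[t], scores[t]) for t in labels}
print(aucs, 'average:', sum(aucs.values()) / len(aucs))
```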
Also, the “dilations” idea of taking three slices from each scan results in blurry RGB images. As expected, there’s no benefit from the additional information (from 3 instead of 1 slice) without unfreezing the weights.
Basic setup:
- using pretrained AlexNet
- with LR defaults
- trained for a small number of epochs (10)
- on a single slice or three slices from the center of each scan, from each plane
- and using only the plane model with the highest performance for each target label
Unfortunately I am still in need. I've spent over an hour on Windows, and then tried to get it over to FloydHub with no luck either. If it's possible for you to post even a link to a small subset, that would let me start looking at things and get going.
Thanks very much!
I just tested downloading and unzipping the files and it seemed to work.
What I did was this:
- Start my training server instance
- Open a Jupyter notebook (this is not necessary if you can just ssh to it)
- Start a terminal
- Register on the MRNet page to get the download URL
- Go to your data folder
- Run wget http://download.cs.stanford.edu/deep/the-real-filename.zip
- Run unzip the-real-filename.zip

You could run these commands via a notebook if you wish, but I like to use the terminal.
Worked like a champ! Thanks for the explicit instructions, exactly what was needed.
Now that I’ve got actual data can start moving forward with it.
Thanks again Christoffer!
Hi, here is a notebook on Colab to get started with the competition using fastai!
Hope it helps.
https://colab.research.google.com/drive/10GoEbF6FuKtVibXHFc7UjwwdHzX5_pvE
I wonder if it could be a good idea to generate data augmentations by stacking the slice images into a volume, rotating the volume by a random number of degrees, and then re-generating new slices from that rotated volume?
Kind of like this https://youtu.be/f4IPsdTn7c8?t=285 but with code, and then generating slices.
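Purely as a sketch of that idea (scipy's rotate standing in for the video's demo; the function name and angle range are mine):

```python
import numpy as np
from scipy.ndimage import rotate

def rotated_slices(volume, max_deg=15, rng=None):
    "Rotate the full (n_slices, H, W) volume, then cut fresh slices from it."
    rng = rng or np.random.default_rng()
    angle = rng.uniform(-max_deg, max_deg)
    # axes=(0, 1) rotates *across* slices (out of plane), so the central
    # slices of the result are genuinely new views; axes=(1, 2) would only
    # rotate each slice within its own plane.
    rotated = rotate(volume, angle, axes=(0, 1), reshape=False, order=1)
    mid = rotated.shape[0] // 2
    return rotated[mid - 1 : mid + 2]   # three new central slices
```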
I have taken this competition on as part of my Master's thesis, and I have tested various models.
Please let me know if anyone is interested in working with my code.
That's a pretty great idea. I also watched the video.
I implemented it and imported a sample file from the dataset into Photoshop. The main problem I ran into was that the image quality is not good and it is very tough to interpret.
Hi! I am a data science student at UVA trying to work with this knee data set as a basis for other medical applications. I would love to take a look at your code if possible.
Did you get anyone? I did my EDA, but I need more help. Could you help?
Hello sir, could you help me with this dataset?
