Lung cancer detection: convolutional features + gradient boosting

That’s handy! For the one problem subject, does it raise an exception, or do we have to check for the problem ourselves?

I’m downloading the Kaggle data using “kg download”; so far it’s quite slow. Does anyone know a better way to speed up the download?

I can’t remember if it threw an exception or just logged something to STDOUT, but it mentioned unequal slice spacing.

Subject ID is b8bb02d229361a623a4dc57aa0e5c485; it looks like they repeated the study under the same series (or maybe it’s with and without contrast, I haven’t looked at the images).

Slices go from one side of the patient to the other, then start over again. By instance number, each slice steps -2.5 mm and then jumps back 260 mm before hitting the same positions again.
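
If you want to screen a series for this, here is a quick check with pydicom (dcmread and ImagePositionPatient are standard pydicom/DICOM names; the path is just illustrative):

import glob
import pydicom

# Load every slice for the subject and sort by physical z position.
files = glob.glob("stage1/b8bb02d229361a623a4dc57aa0e5c485/*.dcm")
slices = sorted((pydicom.dcmread(f) for f in files),
                key=lambda s: float(s.ImagePositionPatient[2]))

# In a clean series the consecutive z differences are one constant spacing;
# after sorting, repeated positions show up as zero-sized gaps.
zs = [float(s.ImagePositionPatient[2]) for s in slices]
print(sorted({round(b - a, 3) for a, b in zip(zs, zs[1:])}))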

That’s all I did, but yeah, it was slow! You can also use the torrent; instructions are on the competition page.

Thank you, I will look it up.

I ran out of space on a p2 instance while unzipping stage1, and that’s just the Kaggle data. Can anyone tell me the total unzipped size of the Kaggle + LUNA data?

The LUNA data is about the same magnitude as the Kaggle data: 60 GB zipped, around 100 GB unzipped.


Have people stopped seeding the LUNA dataset? All options for getting it have failed for me so far.

You can try using Google Drive; see my post from yesterday.


You’re right, Drive works. I had only tried Dropbox and the torrent (both of which failed). Thanks.

Now that I finally have the LUNA dataset, it looks like SimpleITK can read MHD files and convert to ‘.nii’ or ‘.nii.gz’ files automatically (with ReadImage and WriteImage).

conda install -c simpleitk simpleitk
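
For example, a minimal conversion sketch (ReadImage and WriteImage are the actual SimpleITK calls; the filenames are placeholders):

import SimpleITK as sitk

# SimpleITK picks the reader/writer from the file extension,
# so MHD -> NIfTI is just read + write.
img = sitk.ReadImage("scan.mhd")
sitk.WriteImage(img, "scan.nii.gz")

# Spacing and origin metadata carry over.
print(img.GetSize(), img.GetSpacing())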

Forgot to mention there are several nice 3D viewers for NIfTI files; I particularly like Mango (http://ric.uthscsa.edu/mango/).

EDIT: Someone made Mango’s JS library (Papaya) notebook compatible.


Once the images are processed, how do I input them to a model? I am getting stuck here. Do I average all the slices for a patient into one single array and label that array cancerous or non-cancerous?

You could, but I doubt that would work well. (You can easily check what that would look like once you have a 3D array, with plt.imshow(img.mean(axis=-1)).)

You can use 3D layers on the whole volume, or apply 2D layers to each slice and combine them with some kind of merge or recurrent layer.

I’ve created labeled masks for the nodules, so I will be trying a 3D SegNet approach to find candidates for further processing.

Using a 3D conv net on the whole scan, cropped and resized to 128x128x128, got me into the top 33%, but that’s not very good, and most of my predictions were near the overall cancer rate. (My score was ~0.58, which is better than the 0.69 you get from predicting 0.5 for everything, but I’m not sure how much better it is than a submission of all ~0.25.)
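
In case it helps, here is a minimal sketch of that whole-volume setup in Keras. This is not my exact architecture; the layer sizes are arbitrary and it assumes a 128x128x128 single-channel input:

from keras.models import Sequential
from keras.layers import Conv3D, MaxPooling3D, GlobalAveragePooling3D, Dense

# One scan in, one cancer probability out.
model = Sequential([
    Conv3D(16, 3, activation='relu', input_shape=(128, 128, 128, 1)),
    MaxPooling3D(pool_size=(2, 2, 2)),
    Conv3D(32, 3, activation='relu'),
    MaxPooling3D(pool_size=(2, 2, 2)),
    Conv3D(64, 3, activation='relu'),
    GlobalAveragePooling3D(),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')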

I have a question about converting brain tumor contour data (in DICOM format), represented as [x1, y1, z1, x2, y2, z2, …, xn, yn, zn] for a given slice of a 3D MR scan, into pixel positions (i, j).

So from DICOM data I’ve extracted these fields:

ContourData:

This field has x, y, z coordinates (in mm) with respect to the origin for a given slice, where z is constant since we are dealing with a single slice.

ImagePositionPatient:

This is the center position of the upper-left voxel. Again, z is the same as the z we have in ContourData since it’s the same single slice. We can treat this as the origin and find pixel positions (i, j) with respect to it.

e.g. (-129.746, -167.203, z) in mm

Pixel Spacing:

This is the distance between the centers of two adjacent pixels in the x and y directions (in mm).

So in theory this information should be enough to determine the pixel position of each contour coordinate, given the math:

x: contour coordinate on x-axis in mm
y: contour coordinate on y-axis in mm

origin_x: origin on x-axis
origin_y: origin on y-axis

x_spacing: spacing on x-axis
y_spacing: spacing on y-axis

pixel position (i, j):

(x - origin_x) / x_spacing, (y - origin_y) / y_spacing

But with this formula I’m getting the numbers below, when I expect integers only. Since this will be the ground truth for a brain tumor segmentation project, I’m treating numerical stability very carefully.

282.4990153603781 253.99960614415122
282.9992122883025 253.49940921622687
287.0007877116975 253.49940921622687
287.4990153603781 253.99960614415122
287.9992122883025 254.49980307207562
288.99960614415124 254.49980307207562

Yes, I’ve tried many online sources to figure out this problem, but only one person in a Google group had a similar issue, and it still didn’t answer my particular case. This is a great community for finding experts in both medical imaging and deep learning, so I hope you can help me out, or even better, guide me on how to extract X and y from DICOM in the most numerically stable way.

Thanks in advance

I think you should use dcm.pixel_array for pixel values. Please see the competition page for sample preprocessing. Also use NumPy arrays with the right value types, e.g. ((x - origin_x) // x_spacing).
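
E.g., for a regular image slice (dcmread, pixel_array, and the RescaleSlope/RescaleIntercept tags are standard pydicom/DICOM names, though the rescale tags may be absent in some MR files; the filename is a placeholder):

import pydicom

ds = pydicom.dcmread("slice.dcm")  # an image slice, not the contour file

# pixel_array holds the raw stored values; RescaleSlope/Intercept
# map them to real units (HU for CT) when present.
slope = float(getattr(ds, "RescaleSlope", 1))
intercept = float(getattr(ds, "RescaleIntercept", 0))
img = ds.pixel_array * slope + intercept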

The problem is that the contour .dcm file includes 7 different contour datasets (different measures) and it doesn’t have a pixel_array field, whereas regular slice .dcm files do. Thanks.

Your numbers are off by 0.5 or 1.0; my guess is it’s a +/- 1 (pixel center vs. corner) convention issue rather than a problem with the DICOM data itself.

It seems like we will end up rounding to get pixel positions. I learned that different commercial products sometimes give different contours, probably due to this issue. Just wanted to make sure. Thanks again!
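
For reference, the rounded conversion might look like this; the variable values below are hypothetical placeholders for the fields extracted above:

import numpy as np

# Hypothetical example values; substitute the fields extracted above.
# Note DICOM PixelSpacing is ordered (row, column), i.e. (y, x), and this
# assumes an axis-aligned ImageOrientationPatient.
contour_data = [11.5, 12.6, 0.0, 12.3, 12.6, 0.0]   # ContourData: flat (x, y, z) triples in mm
image_position_patient = (-129.746, -167.203, 0.0)  # from the post above
pixel_spacing = (0.5, 0.5)                          # (row, column) spacing in mm

pts = np.array(contour_data, dtype=float).reshape(-1, 3)
origin_x, origin_y = map(float, image_position_patient[:2])
row_spacing, col_spacing = map(float, pixel_spacing)

# Round to the nearest pixel center instead of truncating with //,
# which would bias positions toward the origin.
i = np.round((pts[:, 0] - origin_x) / col_spacing).astype(int)
j = np.round((pts[:, 1] - origin_y) / row_spacing).astype(int)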

Hi, I am working on NIfTI (CT) image segmentation and would like to apply Hounsfield windowing to the images. How do I access or set window/level values in a NIfTI image? In DICOM it is possible to see the values using print(image), but I’m unable to get them from NIfTI.

Thanks,
Priya

You need the slope and intercept values, which may be stored in the NIfTI header; otherwise you will need the original DICOM header.
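
A minimal sketch with nibabel, assuming the converter preserved the scaling (nib.load, get_slope_inter, and get_fdata are real nibabel APIs; the lung window numbers below are just a common choice, not read from the file):

import numpy as np
import nibabel as nib

img = nib.load("scan.nii.gz")
print(img.header)                    # NIfTI analogue of print(image) in pydicom
print(img.header.get_slope_inter()) # (scl_slope, scl_inter), may be (None, None)

# get_fdata() applies scl_slope/scl_inter, so this should already be in HU
# if the DICOM -> NIfTI converter preserved the scaling.
hu = img.get_fdata()

# NIfTI stores no window/level tags, so window manually,
# e.g. level -600 / width 1500 for lung:
level, width = -600, 1500
windowed = np.clip(hu, level - width / 2, level + width / 2)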