I want to store NumPy arrays as cell values in my DataFrame. Is there any way to do this? Basically, I have pixel data in a (512, 512) NumPy array that I want to save as the value in the pixel_data column for its corresponding ID in the ID column of my DataFrame. How can I do this?
What I’m trying to do is extract pixel arrays from a bunch of DICOM files. The folder is extremely big and I’m running into RAM issues with other methods too. I thought I’d save all the NumPy arrays using np.save, but again I ran into RAM issues. Is there any way I can extract the pixel arrays to be used later? I just need to extract the pixel array from each DICOM file.
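On the literal question of putting an array in a DataFrame cell: pandas can hold arbitrary Python objects in an object-dtype column, so one array per row is possible. Here is a minimal sketch, assuming made-up IDs and random data in place of real pixel arrays. Note that every array stays in RAM this way, so it does not help with the memory problem on a large folder.

```python
import numpy as np
import pandas as pd

# Hypothetical IDs and random stand-ins for real (512, 512) pixel data.
ids = ["scan_a", "scan_b", "scan_c"]
arrays = [
    np.random.randint(0, 4096, size=(512, 512), dtype=np.uint16) for _ in ids
]

# Build a 1-D object array explicitly so each cell holds one whole array.
column = np.empty(len(arrays), dtype=object)
for i, arr in enumerate(arrays):
    column[i] = arr

df = pd.DataFrame({"ID": ids, "pixel_data": column})

# Look up the array stored for a given ID.
pixels = df.loc[df["ID"] == "scan_b", "pixel_data"].iloc[0]
print(df["pixel_data"].dtype, pixels.shape)
```

If memory is the constraint, storing a file path per ID in the DataFrame and loading each array on demand is probably the more practical layout.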
You can load the DICOMs directly into your dataloader. It might need adapting for your needs (especially div, which should be True if the pixel data is integers and False if it is floats; I used the default True, but it might depend on the DICOMs), but I’ve used:
You just use it like a normal ImageList, calling DicomImageList.from_folder (or from_df, etc.).
If you are pre-processing, you need to save the arrays one image at a time (you might also be better off converting to tensors and saving in PyTorch format, as it might be a bit faster to load, though likely not by much). You also need to be careful not to leave any variables pointing to images after you process and save them. You can also do del some_var to explicitly delete a variable.
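A rough sketch of the one-image-at-a-time approach, so that only a single array is in memory at any point. The load_pixels helper is a hypothetical stand-in that fabricates data; with real files you would presumably read each one with something like pydicom.dcmread(path).pixel_array instead. The output directory and file naming are also just assumptions for the example.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_pixels(index):
    """Hypothetical stand-in for reading one DICOM file's pixel array."""
    return np.full((512, 512), index, dtype=np.uint16)


def extract_all(n_files, out_dir):
    """Save each pixel array to its own .npy file and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for i in range(n_files):
        pixels = load_pixels(i)
        out_path = out_dir / f"{i:05d}.npy"
        np.save(out_path, pixels)
        saved.append(out_path)  # keep only the path, never the array
        del pixels  # drop the reference before loading the next image
    return saved


out_dir = tempfile.mkdtemp()
paths = extract_all(4, out_dir)

# Later, load a single array back when it is needed.
restored = np.load(paths[2])
print(len(paths), restored.shape, restored.dtype)
```

The key point is that the loop never appends the arrays themselves to a list; only the paths are kept.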
Hey, this is really useful, thanks.
I was wondering whether using the NumPy arrays directly in my conv net, by converting them to torch tensors, would be better than converting to images (which I assume are byte arrays that get converted to tensors). Do you have any idea about this?
How do you mean, converting to images? Do you mean writing them out as, say, JPEG? Or converting to a fastai Image as above? The fastai Image is just a wrapper around a tensor with various operations, so it is very fast.
In terms of writing to images, storing the tensors should generally be faster (or NumPy arrays: torch.from_numpy() shares memory with the NumPy array so is very fast, whereas other conversion functions such as torch.tensor() copy the data). That avoids the work needed to decode the image files. Though with uncompressed formats this won’t be much work, the saved tensors will be optimised for loading. But then you can’t use standard tools on them, so it’s a tradeoff.
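To illustrate the memory-sharing point, here is a small sketch comparing torch.from_numpy() with torch.tensor(), which copies. The array contents are arbitrary and only there to make the difference visible.

```python
import numpy as np
import torch

arr = np.zeros((2, 3), dtype=np.float32)

shared = torch.from_numpy(arr)  # no copy: the tensor views the same buffer
copied = torch.tensor(arr)      # copies the data into new storage

# Modify the NumPy array after creating both tensors.
arr[0, 0] = 7.0

print(shared[0, 0].item())  # 7.0, the change is visible through the shared tensor
print(copied[0, 0].item())  # 0.0, the copy is unaffected
```

Because the memory is shared, changes made through the tensor also show up in the NumPy array, so it is worth copying explicitly if you need the two to be independent.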
I’m not sure how fast the above is. The pydicom library could be slow, so it might be better to extract the images from the DICOMs first. I didn’t test that. That will all happen in worker processes anyway, though, so unless it’s really slow it shouldn’t affect training speed too much.