Creating your own dataset

MikaelN · July 21, 2020, 8:11am

Hi there!
This is my first post and I’m new to coding with Python, so apologies for my ignorance. I’m a physician working on my own DL project on cardiac imaging as a PhD thesis.
I try to minimize redundant work because I have limited time to put into this project, since I’m doing it alongside my day job.
I have done a small pilot project which proved that the concept works, and I managed to accumulate a small dataset (n=90), but now it’s time to scale up and complicate the problem a bit. I have previously created a workflow that automatically processes DICOM files into ground truth and mask files and does some augmentation, and it worked fine with U-Net.
The problem: I’m now adding multiple numerical parameters to the images and expanding the dataset, so I figured that building a properly organized dataset would be really beneficial for the project. So far I have only worked with a dataset of image files, and that was fairly simple. I tried to browse for tips on building a dataset, but I only found very general advice. Are there any “good practices” for building a dataset, and are there any great resources for learning to build your own multilabel dataset from scratch with different feature types?
Maybe a more accurate question would be: what would be an efficient/smart way to integrate my automatic DICOM processing script into a database-generating script, and what methods should I look into to create such a script?
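
To make the question concrete: one possible shape for such a script would be a plain CSV "manifest" with one row per study, linking the processed image and mask files to the numerical parameters. The sketch below is only an illustration under assumptions, not my actual pipeline. `process_dicom`, the output file naming, and the parameter names `ejection_fraction` and `heart_rate` are all hypothetical placeholders.

```python
import csv
from dataclasses import dataclass, asdict, fields
from pathlib import Path
from typing import Dict, List, Tuple


@dataclass
class Sample:
    """One row of the manifest: file locations plus numerical parameters."""
    sample_id: str
    image_path: str
    mask_path: str
    ejection_fraction: float  # hypothetical numerical parameter
    heart_rate: float         # hypothetical numerical parameter


def process_dicom(dicom_path: Path, out_dir: Path) -> Tuple[Path, Path]:
    """Hypothetical stand-in for an existing DICOM processing step.

    A real version would read the DICOM file and write the image and mask;
    here it only returns the paths where those files would be stored.
    """
    stem = dicom_path.stem
    return out_dir / f"{stem}_image.png", out_dir / f"{stem}_mask.png"


def build_manifest(
    dicom_paths: List[Path],
    params: Dict[str, Dict[str, float]],
    out_dir: Path,
    manifest_path: Path,
) -> List[Sample]:
    """Process each DICOM file and write one CSV row per sample."""
    rows = []
    for dicom_path in dicom_paths:
        image_path, mask_path = process_dicom(dicom_path, out_dir)
        numeric = params[dicom_path.stem]  # parameters keyed by file stem
        rows.append(
            Sample(dicom_path.stem, str(image_path), str(mask_path), **numeric)
        )

    with open(manifest_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[fld.name for fld in fields(Sample)])
        writer.writeheader()
        for row in rows:
            writer.writerow(asdict(row))
    return rows
```

The idea would be that the training code only ever reads the manifest, so adding a new numerical parameter means adding one column rather than reorganizing the files.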

Thank you in advance!

edit: I would like to emphasise that I don’t have any experience with programming languages other than Python, and I have only worked with Python for 4 months now, so I’m very much a novice in programming, but I’m quick to learn.