Anyone want real world data? Have possible collaboration with Scripps grad students

I have a friend at Scripps Oceanography, a research institution in Southern California. She’s put me in touch with several research groups whose work depends on image classification to investigate important scientific hypotheses.

Is anyone interested in connecting with one of these groups? I see it as an opportunity to get access to real world datasets while furthering scientific progress.

Reply here or shoot me a message if you’re interested. Please make sure you can commit – these are real people who need results to complete their research.

Project Descriptions:

Physics / Biotech Imaging:
“My work relies heavily on image processing, primarily rendering and segmentation of large tomographic datasets but also particle detection etc. At the moment I am using platforms such as MATLAB and ImageJ, but I have a strong interest in making use of the current advancements in deep learning and AI. It could be a perfect collaborative effort.”

Fishery Research:
“I’m a PhD student studying fisheries, and one of my thesis chapters is about the dispersal and survival of eggs from a Nassau Grouper spawning aggregation in the Cayman Islands. Instead of traditional plankton net tows, we’ve used a new method to look at the fine-scale egg distribution: towing a fancy underwater microscope. The challenge is then to classify the plankton images into meaningful categories: fish egg, fish larvae, copepod, chaetognath, etc.

In 2016 we did a test tow following spawning for about 4 hours and saved ~80k images. I manually classified 49k of these, and it would be nice to classify the remaining 30k. More pressing, though, are the images from Feb 2017. We did a much larger experiment, following the egg spawn cloud for 16 hours on night 1 and 36 hours on night 2. There are about 230k images, of which I manually classified 18k from night 1 and 23k from night 2 (18 classes, including “blurry/unknown”).”

Oceanographic Research:
“Our lab has a large amount of image data (~6 million images in one data set, ~50k in the other). Both data sets have subsets that have been labeled. Our lab spends a large amount of time tracing and categorizing what is in the photos for the large data set, and hence has seen the importance of machine learning for our data. We have taken some steps towards machine learning, but the effort is still in its infancy.”

Antibiotic Resistance Research:
“We use fluorescence microscopy to identify the mechanism of action of natural products and antibiotics.

It takes far too much time to put together our images, and I really think we could benefit from your expertise. Here’s an example of the kinds of data we collect and the methods we use for analyses.

In one microscopy session we have anywhere from 1 to 30 samples and anywhere from 1 to 5 time points. During each session we take upwards of 100 images, each consisting of 4 different color channels across 8-12 stacks. When we’re done, we deconvolve our images and import the d3d.dv images into Fiji. From there we choose the stack that is best in focus for each color and create a .jpg for each color, which multiplies our image count by 4. Next, we import the now-focused color images into Photoshop as a stack. From here we save the image, then crop and scale the cells that represent the observation.”
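
The "choose the stack best in focus for each color" step looks like one piece that could be automated. Below is a minimal sketch, not the lab's actual method. It assumes each color channel's z-stack has already been loaded as a list of 2D grayscale planes (plain nested lists here, to stay dependency-free), and it uses variance of the Laplacian as the sharpness score, a common heuristic that may not match what the researchers judge by eye. The function names, channel names, and toy data are all hypothetical.

```python
from statistics import pvariance


def focus_score(plane):
    """Variance of a 4-neighbour Laplacian over the interior pixels.

    Higher values suggest more fine detail, i.e. a sharper plane.
    Assumes `plane` is a rectangular list of rows, at least 3x3.
    """
    rows, cols = len(plane), len(plane[0])
    responses = [
        plane[r - 1][c] + plane[r + 1][c] + plane[r][c - 1] + plane[r][c + 1]
        - 4 * plane[r][c]
        for r in range(1, rows - 1)
        for c in range(1, cols - 1)
    ]
    return pvariance(responses)


def best_focus_index(stack):
    """Return the index of the plane with the highest focus score."""
    return max(range(len(stack)), key=lambda z: focus_score(stack[z]))


# Toy data (made up): a flat plane, a high-contrast checkerboard,
# and a low-contrast checkerboard standing in for a defocused plane.
flat = [[5] * 6 for _ in range(6)]
sharp = [[10 * ((r + c) % 2) for c in range(6)] for r in range(6)]
faint = [[5 + ((r + c) % 2) for c in range(6)] for r in range(6)]

# Hypothetical channel names; one z-stack per color channel.
channels = {
    "channel_a": [flat, sharp, faint],
    "channel_b": [faint, flat, sharp],
}

# Pick the sharpest plane independently for each channel.
best_planes = {name: best_focus_index(stack) for name, stack in channels.items()}
print(best_planes)
```

On real data the planes would come from the deconvolved image files, and the scoring would more likely be done with an array library than nested lists. The selection logic would stay the same: score every plane, keep the maximum per channel.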



Is it possible that they could upload their data to Kaggle Datasets? That way we could collaborate here on the forum on the projects.


I think that’s a great idea; it never crossed my mind!

I’m going to follow up with them in the next day or two and will propose it to them.