@davecg, how about this scenario:
Let’s say we have a big stadium, and we distribute microphones evenly across its area. The spectrogram of each microphone's output is an image of sound (frequency x time). Of course, each microphone is related to all the others: loud sounds are picked up by nearly every microphone, while small groups chit-chatting are only detected by nearby microphones.
So each microphone's 1-second spectrogram will be a square image that is correlated with the others, especially with those of nearby microphones.
So, unlike the X-ray model, where we only had to add a convolutional layer with a (2x7) input to capture the rough spatial relationship between the two views, here we should add a 3D CNN to capture the 2D spatial layout of the microphones on the stadium floor? If we have 6 x 8 microphones, the 3D-CNN input should be something like 6 x 8 x 7, right? What do you think?
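Here is a minimal sketch of the layout idea in Python with NumPy. Everything in it is an illustrative assumption. The helper names `make_mic_grid` and `conv3d_valid` are hypothetical. Each microphone's 1-second spectrogram is assumed to be already reduced to a 7-dimensional feature vector, mirroring the 7 in 6 x 8 x 7. Microphones are assumed to be indexed row-major over the floor. The sketch builds the 6 x 8 x 7 grid tensor and applies a single 3 x 3 x 3 kernel. Because that kernel only mixes neighbouring microphones, a loud event affects every output, while chit-chat at one corner stays local.

```python
import numpy as np

def make_mic_grid(mic_features, rows, cols):
    """Arrange per-microphone feature vectors into a (rows, cols, F) grid.

    Hypothetical assumption: mic_features is ordered row-major over the
    stadium floor, i.e. index r * cols + c is the microphone at cell (r, c).
    """
    stacked = np.stack(mic_features)          # (rows * cols, F)
    assert stacked.shape[0] == rows * cols
    return stacked.reshape(rows, cols, -1)    # (rows, cols, F)

def conv3d_valid(x, kernel):
    """Single-channel 3D cross-correlation with 'valid' padding (what a CNN layer computes)."""
    D, H, W = x.shape
    kd, kh, kw = kernel.shape
    out = np.zeros((D - kd + 1, H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                # Each output only sees a 3 x 3 patch of neighbouring microphones
                out[i, j, k] = np.sum(x[i:i + kd, j:j + kh, k:k + kw] * kernel)
    return out

# 6 x 8 microphones, each summarised by a 7-dim feature vector (hypothetical sizes)
rows, cols, feat = 6, 8, 7
kernel = np.ones((3, 3, 3))

# A loud event reaches every microphone
loud = make_mic_grid([np.ones(feat) for _ in range(rows * cols)], rows, cols)

# A small chit-chat is picked up only by the microphone at cell (0, 0)
chat_feats = [np.zeros(feat) for _ in range(rows * cols)]
chat_feats[0] = np.ones(feat)
chat = make_mic_grid(chat_feats, rows, cols)

print(conv3d_valid(loud, kernel).shape)              # (4, 6, 5)
print(np.count_nonzero(conv3d_valid(chat, kernel)))  # 5: only outputs covering cell (0, 0)
```

In practice you would use a framework layer such as PyTorch's `torch.nn.Conv3d` rather than this loop. An alternative design is to treat the 7 features as input channels of a 2D convolution over the 6 x 8 grid. Which one fits better depends on whether that last axis has local structure worth convolving over.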
This reminds me of an interesting project that used microphones in a forest:
The fight against illegal deforestation with TensorFlow
I think Jeremy mentioned that Sara Hooker, one of the best fastai students, was involved in this project. Here is her talk about the project.
Thank you guys for the amazing insight and discussions!