In Lesson 7, we saw how to add non-spatial meta features to a ConvNet by concatenating them in at the penultimate (conv) layer before feeding them into a dense layer. What about FCNs, where the task is segmentation or per-pixel / per-voxel labeling? If we have useful metadata such as voltage, altitude, speed, or some other equivalent scalar that we'd like to add into the network, how can this be accomplished?
So far, my only thought is to simply encode the data as a w*h image, where every pixel value equals the single scalar feature I'd like to add, and then concatenate it in as a channel: e.g. suppose my last conv layer's input looks like (None, 5, 256, 256); merge in the feature so that the input becomes (None, 6, 256, 256), then proceed as normal. However, that seems like a very suboptimal and hacky way to go about it, since I'd be adding the same scalar in 256**2 times.
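For concreteness, the tile-as-channel idea above can be sketched in a few lines of PyTorch (the function name and shapes here are illustrative, not from any particular library):

```python
import torch

def tile_scalar_as_channel(x, scalar):
    """Broadcast one scalar per sample to a constant H x W plane
    and concatenate it onto the feature map as an extra channel."""
    # x: (N, C, H, W) feature maps; scalar: (N,) one value per sample
    n, _, h, w = x.shape
    plane = scalar.view(n, 1, 1, 1).expand(n, 1, h, w)  # (N, 1, H, W)
    return torch.cat([x, plane], dim=1)                 # (N, C+1, H, W)

x = torch.randn(4, 5, 256, 256)            # e.g. the last conv layer's input
meta = torch.tensor([3.3, 5.0, 1.2, 9.1])  # one scalar per sample
out = tile_scalar_as_channel(x, meta)
print(out.shape)  # torch.Size([4, 6, 256, 256])
```

Note that `expand` only creates a broadcast view rather than copying the scalar 256**2 times in memory, though the redundancy the post worries about is still there conceptually: every pixel of the new channel carries the same value.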
Right? So that kind of goes back to my original idea, where I create a spatial layer (per your suggestion, at the bottom of the network, where the features are densest and the spatial dimensions are smallest), but even there I'd have to create a WW x HH 2D array, with each element equal to the metadata scalar, and merge it in as a separate channel.
Is that the only acceptable way to get non-spatial information into an FCN?
I'm looking at this now with the application being adding a patient's age and other metadata about the patient from the DICOM into a segmentation U-Net. I've been trying to add it into the middle of the U-Net (mid-network, in my kernel) but am struggling to get the shapes to work. Has anyone seen this sort of thing implemented anywhere else, or any object detection algorithms that use more than just the image? Thanks!
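One way to get the shapes to line up at the U-Net bottleneck is to embed the metadata with a small MLP, tile the embedding over the bottleneck's spatial grid, and project back to the original channel count with a 1x1 conv so the decoder sees the shapes it expects. A minimal PyTorch sketch, with all module names and sizes being my own assumptions rather than anything from the lesson:

```python
import torch
import torch.nn as nn

class BottleneckMetaFusion(nn.Module):
    """Hypothetical fusion block: embed metadata scalars, tile them
    over the bottleneck feature map, concatenate, and use a 1x1 conv
    to restore the channel count so the decoder is unchanged."""
    def __init__(self, feat_ch=512, n_meta=3, meta_ch=16):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(n_meta, meta_ch), nn.ReLU(),
            nn.Linear(meta_ch, meta_ch),
        )
        # 1x1 conv maps (feat_ch + meta_ch) back to feat_ch
        self.fuse = nn.Conv2d(feat_ch + meta_ch, feat_ch, kernel_size=1)

    def forward(self, feats, meta):
        # feats: (N, feat_ch, h, w) bottleneck activations
        # meta:  (N, n_meta) normalized scalars (age, etc.)
        n, _, h, w = feats.shape
        m = self.embed(meta).view(n, -1, 1, 1).expand(n, -1, h, w)
        return self.fuse(torch.cat([feats, m], dim=1))

fusion = BottleneckMetaFusion(feat_ch=512, n_meta=3)
feats = torch.randn(2, 512, 16, 16)   # bottleneck of a 256x256 input
meta = torch.randn(2, 3)              # e.g. normalized age + 2 others
print(fusion(feats, meta).shape)      # torch.Size([2, 512, 16, 16])
```

Because the output shape matches the input, this block can be dropped between the encoder and decoder of an existing U-Net without touching the skip connections. Learning an embedding per scalar (rather than tiling the raw value) also lets the network weight the metadata instead of treating it as a constant channel.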