Practical questions on how to structure different input data types

cardinal · April 7, 2017, 12:50am

hey guys, I’m wondering how to structure the input for a problem and hope I can get some tips. I have a bunch of attributes that describe an entity, and the attributes are of different data types (string, constant, int, float). I understand I need to one-hot encode the strings and constants, so my input for one entity would be something like [[0,0,1,0], 1359283, [1,0], 45.1, 2.1, 0.17, [0.43,0.63,0.11,0.35]].
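A minimal plain-Python sketch of one-hot encoding the categorical attributes and assembling one entity's feature groups. The vocabularies (`COLOR_VOCAB`, `FLAG_VOCAB`) and the helper names are hypothetical, chosen only to reproduce the shape of the example above.

```python
def one_hot(value, vocabulary):
    """Return a one-hot list with a 1 at the position of `value` in `vocabulary`."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1  # raises ValueError for values not in the vocabulary
    return vec

# Hypothetical vocabularies for two categorical attributes.
COLOR_VOCAB = ["red", "green", "blue", "yellow"]
FLAG_VOCAB = ["yes", "no"]

def encode_entity(color, count, flag, a, b, c, embedding):
    """Encode one entity as a list of feature groups (not yet flattened)."""
    return [
        one_hot(color, COLOR_VOCAB),  # categorical -> 4-wide one-hot
        count,                        # integer kept as a single value
        one_hot(flag, FLAG_VOCAB),    # categorical -> 2-wide one-hot
        a, b, c,                      # floats kept as-is
        list(embedding),              # an existing 4-wide float vector
    ]

entity = encode_entity("blue", 1359283, "yes", 45.1, 2.1, 0.17, [0.43, 0.63, 0.11, 0.35])
print(entity)
# [[0, 0, 1, 0], 1359283, [1, 0], 45.1, 2.1, 0.17, [0.43, 0.63, 0.11, 0.35]]
```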

So do I just flatten this into something like [0,0,1,0,1359283,1,0,45.1,2.1,0.17,0.43,0.63,0.11,0.35] and feed that as input? It feels like doing it this way loses a lot of information: (0,0,1,0) is supposed to stay together as one tuple, and a tuple like (1,0) is completely independent of an unrelated value like 1359283.
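Flattening doesn't have to throw the grouping away: if you record where each group starts and ends, the flat vector can always be sliced back into its tuples. A minimal stdlib sketch, assuming the nested layout from the example above; `flatten_with_offsets` and `unflatten` are hypothetical helpers, not library functions.

```python
def flatten_with_offsets(groups):
    """Flatten a list of scalars/lists, returning (flat, spans).

    spans[i] is the (start, end) slice of group i inside flat.
    """
    flat, spans = [], []
    for g in groups:
        start = len(flat)
        if isinstance(g, (list, tuple)):
            flat.extend(g)   # multi-value group, e.g. a one-hot vector
        else:
            flat.append(g)   # single scalar attribute
        spans.append((start, len(flat)))
    return flat, spans

def unflatten(flat, spans):
    """Recover each group from the flat vector using the recorded spans."""
    return [flat[s:e] for s, e in spans]

entity = [[0, 0, 1, 0], 1359283, [1, 0], 45.1, 2.1, 0.17, [0.43, 0.63, 0.11, 0.35]]
flat, spans = flatten_with_offsets(entity)
print(flat)
# [0, 0, 1, 0, 1359283, 1, 0, 45.1, 2.1, 0.17, 0.43, 0.63, 0.11, 0.35]
print(spans)
# [(0, 4), (4, 5), (5, 7), (7, 8), (8, 9), (9, 10), (10, 14)]
```

Since each position always holds the same attribute, a fully connected layer has position-specific weights for every entry, so the grouping is still implicit in the flat vector; the spans mostly matter if you later want to treat some groups differently.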

Or do I separate them into different input layers and merge them? Basically merge([Input(shape=(x,4,1)), Input(shape=(x,1), dtype='int64'), Input(shape=(x,2,1)), Input(shape=(x,1), dtype='float64'), Input(shape=(x,1), dtype='float64'), Input(shape=(x,1), dtype='float64'), Input(shape=(x,4,1), dtype='float64')]), with x being the number of samples.
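If you go the multi-input route, each input receives a batch laid out as (number of samples, group width). Note that in Keras the `shape` given to `Input` excludes the sample (batch) dimension, so the 4-wide one-hot group would be declared as `Input(shape=(4,))` without x; the sample count comes from the arrays passed to `fit`, one per input. The sketch below only does the data-side bookkeeping in plain Python; `split_into_inputs`, `concatenate_groups`, and the second entity's values are made up for illustration.

```python
def as_vector(g):
    """Wrap scalars so every group is a list."""
    return list(g) if isinstance(g, (list, tuple)) else [g]

def split_into_inputs(entities):
    """Turn a list of entities (each a list of groups) into one batch per group.

    batches[i] holds n_samples rows of width len(group i), i.e. the
    (n_samples, width) layout a single model input would receive.
    """
    batches = [[] for _ in entities[0]]
    for entity in entities:
        for i, g in enumerate(entity):
            batches[i].append(as_vector(g))
    return batches

def concatenate_groups(rows):
    """Concatenate one sample's per-group rows, as a concatenation layer would."""
    return [x for row in rows for x in row]

entities = [
    [[0, 0, 1, 0], 1359283, [1, 0], 45.1, 2.1, 0.17, [0.43, 0.63, 0.11, 0.35]],
    [[1, 0, 0, 0], 42, [0, 1], 3.0, 0.5, 0.9, [0.1, 0.2, 0.3, 0.4]],  # made-up sample
]
batches = split_into_inputs(entities)
print([(len(b), len(b[0])) for b in batches])
# [(2, 4), (2, 1), (2, 2), (2, 1), (2, 1), (2, 1), (2, 4)]

first_sample = concatenate_groups([b[0] for b in batches])
print(first_sample == [0, 0, 1, 0, 1359283, 1, 0, 45.1, 2.1, 0.17, 0.43, 0.63, 0.11, 0.35])
# True
```

Merging the raw inputs straight away produces exactly the flattened vector, so separate inputs only pay off when a group gets its own layers (for example an `Embedding` for a categorical attribute) before the merge.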

Another big problem I’m running into is that I’m not sure how to deal with optional attributes. I remember that in our word2vec class, the words outside the 500-word range were all assigned index 500. But for integer and float attributes that are optional, I don’t know what I should do. Do I assign them an arbitrary number like 0 or -1? Or do I turn each one into a vector, like [1, 0.63] meaning the attribute has a value for this entity and [0,0] meaning it doesn’t?
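Both ideas can be sketched in plain Python. For an optional categorical attribute, one common choice mirrors the word2vec trick: reserve one extra slot in the one-hot vector for "missing/unknown". For an optional number, a presence flag plus a value lets the network tell "missing" apart from a genuine 0, which a bare placeholder like 0 or -1 cannot. Filling missing values with the mean of the observed ones is one common default; the helper names and the `None`-means-missing convention are assumptions.

```python
def one_hot_with_missing(value, vocabulary):
    """One-hot over vocabulary plus a final slot for missing/unknown values."""
    vec = [0] * (len(vocabulary) + 1)
    if value in vocabulary:
        vec[vocabulary.index(value)] = 1
    else:
        vec[-1] = 1  # None or an unseen category -> reserved last slot
    return vec

def encode_optional(value, fill):
    """Return [present_flag, value], using `fill` when the value is missing."""
    if value is None:
        return [0.0, fill]
    return [1.0, float(value)]

# Hypothetical column of an optional float attribute; None marks a missing value.
column = [0.5, None, 0.25, None]
observed = [v for v in column if v is not None]
mean_fill = sum(observed) / len(observed)  # mean of observed values: 0.375

encoded = [encode_optional(v, mean_fill) for v in column]
print(encoded)
# [[1.0, 0.5], [0.0, 0.375], [1.0, 0.25], [0.0, 0.375]]
print(one_hot_with_missing(None, ["red", "green", "blue", "yellow"]))
# [0, 0, 0, 0, 1]
```

With `fill=0.0` this is exactly the [0,0] encoding from the question; the flag column carries the "missing" information either way.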

Hope I can get some help here, thanks a lot!