How to train with many observations to one label


I have an interesting problem: I’m trying to predict a person’s age from their web browsing history, where age is one of 10 mutually exclusive buckets (18-20, 21-24, etc.).

My data comes in batches of n people. For every person my training data is a 1xp binary vector recording whether they’ve visited each of p websites, but the only training label is per batch: the share of the group that falls in each bucket (18-20: 5%, 21-24: 9%, etc.).
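To make the shape mismatch concrete, here is a minimal sketch of one batch as described above: an n x p matrix of 0/1 visit flags paired with a single length-10 vector of bucket shares. The `Batch` class, its field names, and the toy numbers are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

NUM_BUCKETS = 10  # the 10 age buckets (18-20, 21-24, ...)


@dataclass
class Batch:
    # visits[j][i] == 1 if person j visited site i, else 0 (an n x p matrix)
    visits: list[list[int]]
    # bucket_shares[k] == fraction of the whole batch in age bucket k
    bucket_shares: list[float]

    def validate(self) -> None:
        """Check the shapes and value ranges described in the question."""
        p = len(self.visits[0])
        if any(len(row) != p for row in self.visits):
            raise ValueError("every person needs a length-p visit vector")
        if any(v not in (0, 1) for row in self.visits for v in row):
            raise ValueError("visit entries must be 0 or 1")
        if len(self.bucket_shares) != NUM_BUCKETS:
            raise ValueError("label must have one share per age bucket")
        if abs(sum(self.bucket_shares) - 1.0) > 1e-6:
            raise ValueError("bucket shares must sum to 1")


# Hypothetical batch: 3 people, 4 sites, and ONE label for the whole group.
batch = Batch(
    visits=[[1, 0, 1, 0],
            [0, 0, 1, 1],
            [1, 1, 0, 0]],
    bucket_shares=[0.05, 0.09, 0.20, 0.16, 0.15, 0.12, 0.10, 0.06, 0.04, 0.03],
)
batch.validate()
```

There are n rows of inputs but only one target row, which is why the records have to be combined somewhere, either before the network or inside it.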

It seems clear to me that I’ll need to somehow combine the person records into a single “observation” so I can train it against the batch label. Also, even though this is a classification problem at heart, the target is a vector of percentages rather than a single class, so it seems I need a loss defined on continuous vectors, such as cosine distance, to predict those breakdowns.
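As a sketch of what such a loss could look like, the snippet below compares a softmax output with a target vector of bucket shares in two ways: the cosine distance mentioned above, and soft-label cross-entropy. The cross-entropy version is a swapped-in alternative, not something from the question; it is commonly used when the target is itself a probability distribution, and it differs from the KL divergence KL(target || pred) only by the entropy of the target, which is constant for a given batch. Both are minimized when the prediction equals the target. All function names are illustrative.

```python
import math


def softmax(logits):
    """Turn 10 raw scores into a probability vector over the age buckets."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_distance(pred, target):
    """1 - cos(angle) between the predicted and target share vectors."""
    dot = sum(a * b for a, b in zip(pred, target))
    norm = math.sqrt(sum(a * a for a in pred)) * math.sqrt(sum(b * b for b in target))
    return 1.0 - dot / norm


def soft_cross_entropy(pred, target, eps=1e-12):
    """-sum_k target_k * log(pred_k); smallest when pred == target."""
    return -sum(t * math.log(p + eps) for p, t in zip(pred, target))


# Hypothetical usage: a model's 10 logits vs. the batch's bucket shares.
target = [0.05, 0.09, 0.20, 0.16, 0.15, 0.12, 0.10, 0.06, 0.04, 0.03]
pred = softmax([0.1, 0.3, 1.2, 0.9, 0.8, 0.6, 0.4, -0.1, -0.4, -0.6])
print(cosine_distance(pred, target), soft_cross_entropy(pred, target))
```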

So far, to combine the records I’ve just averaged the n 1xp vectors into a single 1xp vector, where entry i is the mean of column i across the batch (i.e. the fraction of people who visited site i), but this doesn’t seem to work very well. Projecting the records into an embedding space might help, but I think building that space would need some kind of cleaner labels.
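The averaging step described above can be sketched as column-wise mean pooling. The toy batches below are hypothetical; they illustrate one thing this pooling discards: which sites were visited by the same person. Two quite different groups can collapse to an identical input vector.

```python
def mean_pool(visits):
    """Average an n x p 0/1 visit matrix column-wise into one length-p vector.

    Entry i is the fraction of the n people in the batch who visited site i.
    """
    n = len(visits)
    p = len(visits[0])
    return [sum(row[i] for row in visits) / n for i in range(p)]


# Two hypothetical batches with different people but the same pooled vector:
batch_a = [[1, 1], [0, 0]]  # one person visits both sites, one visits neither
batch_b = [[1, 0], [0, 1]]  # each person visits exactly one site
print(mean_pool(batch_a))  # [0.5, 0.5]
print(mean_pool(batch_b))  # [0.5, 0.5]
```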

My question is this: how could I combine my person records into an NN-friendly input, or, alternatively, what kinds of network architecture could take these independent person records and train against a single label for the group?