I am trying to generate embeddings for a continuous variable (price).
Right now I am doing this by creating buckets of 500 euros and then training as if the bucket were a categorical variable.
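For reference, the bucketing amounts to roughly the following. This is a minimal sketch rather than my actual code; the names `BUCKET_WIDTH` and `price_to_bucket` are made up for illustration.

```python
BUCKET_WIDTH = 500  # euros per bucket, as described above


def price_to_bucket(price):
    """Map a price in euros to a fixed-width bucket id.

    Bucket 0 covers [0, 500), bucket 1 covers [500, 1000), and so on.
    The bucket id is then treated as a category when training embeddings.
    """
    return int(price // BUCKET_WIDTH)


for p in (0, 499, 500, 2000, 2500, 40000):
    print(p, price_to_bucket(p))
```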
There are many items sold in the 0-20000 euro price range, and far fewer items that are more expensive than that. This causes some issues:
Many buckets will be undefined. I plan to use the embeddings in a lookup table, so not having values is problematic.
Where buckets do exist, the embeddings are frequently all over the place because so few items are sold at that price point.
Someone looking for a 2000 euro product might be interested in a 2500 euro product but not in a 4000 euro product, because they simply do not have the money to spend. Someone looking for a 40000 euro product, on the other hand, might also be interested in a much more expensive product, say a 50000 euro one.
One way I thought of to fix this would be to create variable-size bins, which are narrow at the start of the price range and get wider as the price goes up.
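To make the idea concrete, here is a rough sketch of what I mean by variable-size bins, where each bin is wider than the previous one by a constant factor. The helper names, the 500 euro starting width and the growth factor of 2 are placeholders I picked for illustration, not tuned values.

```python
import bisect


def geometric_bin_edges(first_width, growth, max_price):
    """Build bin edges starting at 0, each bin `growth` times wider than the last.

    Edges are added until the last edge reaches or exceeds `max_price`.
    """
    edges = [0.0]
    width = float(first_width)
    while edges[-1] < max_price:
        edges.append(edges[-1] + width)
        width *= growth
    return edges


def price_to_bin(price, edges):
    """Return the index of the bin that contains `price`.

    Prices below the first edge or above the last edge are clamped
    to the first or last bin, so every price maps to a defined bin.
    """
    idx = bisect.bisect_right(edges, price) - 1
    return min(max(idx, 0), len(edges) - 2)


# Assumed parameters, for illustration only.
edges = geometric_bin_edges(first_width=500, growth=2, max_price=10_000)
print(edges)
for p in (2000, 2500, 4000):
    print(p, price_to_bin(p, edges))
```

With these placeholder values, 2000 and 2500 euros land in the same bin while 4000 euros lands in the next one. Because out-of-range prices are clamped, the lookup table never hits an undefined bin.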
Can anyone suggest a way to come up with the optimal bin sizes, or propose another solution to this problem?