The issue I face is that I want to match properties that are similar to each other (e.g. longitude and latitude (numerical), bedrooms (numerical), district (categorial), condition (categorical) etc.) using deep learning. The data is heterogenous because we mix numerical and categorical data and the problem is unsupervised because we don’t use any labels.
My goal is to get a measure for how similar properties are so I can find the top matches for each target property. I could use KNN, but I want to use something that allows me to find embeddings and that uses deep learning.
I suppose I could determine a mixed distance measure such as the Gower Distance as the loss function, but how would I go about setting up a model that determines the, say, the top 10 matches for each target property in my sample?
Any help or points to similar problem sets (Kaggle, notebooks, github) would be very appreciated.