Tabular data with many missing features

I have tabular data with many records, and I want to generalize to categorical feature values I don't have records for (for example, a different city).
I have some relations between the feature values I have seen and the feature values I haven't seen (some other tabular data with info on both kinds of cities).
Does anyone have a good idea how to use these relations to improve prediction and shorten training?
Should I just add many columns to the model?
Maybe encode the relations (the 2nd table) and feed them as input features?

As I understand it, the standard approach is to denormalize your tables: make one big table where every record contains all the info (e.g. if you have a person from a city, you put all the info you have about that person, as well as about the city, in the same record).
That way, if you have no data for a particular part of a record (e.g. you don't know which city the person is from), you just leave those cells blank (NaNs).
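With pandas this denormalization is just a left join; unknown cities automatically become NaNs. A minimal sketch (all table and column names here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical per-user records; city is unknown for user 3
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "city": ["amsterdam", "berlin", None],
    "user_buy": [1, 0, 1],
})

# Hypothetical per-city demographic table
cities = pd.DataFrame({
    "city": ["amsterdam", "berlin"],
    "population": [870_000, 3_700_000],
    "avg_income": [38_000, 35_000],
})

# Denormalize: one wide table; cities without info get NaN cells
wide = users.merge(cities, on="city", how="left")
print(wide)
```

`how="left"` keeps every user record even when there is no matching city row, which is exactly the "leave the cells blank" behaviour described above.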

Although, after reading your initial question a couple more times, I'm starting to doubt that I've understood it correctly. Maybe an example can help with that.

Thanks for the help.
I was thinking about a solution like “Predicting missing entries” in http://web.stanford.edu/~boyd/papers/pdf/glrm.pdf
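The "predicting missing entries" idea from that paper boils down to fitting a low-rank factorization on the observed entries only, then reading the imputed values off the reconstruction. Here is a toy sketch with plain gradient descent on a rank-1 factorization (the GLRM formulation in the paper is far more general, with regularizers and other losses; the matrix here is made up):

```python
import numpy as np

# Hypothetical data matrix with missing entries (NaNs)
A = np.array([[1.0, 2.0, np.nan],
              [2.0, np.nan, 6.0],
              [np.nan, 4.0, 6.0],
              [1.0, 2.0, 3.0]])
mask = ~np.isnan(A)

# Rank-1 factorization A ≈ X @ Y, fitted only on observed entries
k = 1
X = np.full((A.shape[0], k), 1.0)
Y = np.full((k, A.shape[1]), 1.0)
lr = 0.01
for _ in range(3000):
    R = np.where(mask, X @ Y - A, 0.0)   # residual on observed entries only
    X -= lr * R @ Y.T                    # gradient of 0.5 * ||mask * (XY - A)||^2
    Y -= lr * X.T @ R

A_filled = np.where(mask, A, X @ Y)      # keep observed values, impute the rest
```

The key point is the mask: the loss (and hence the gradients) only ever sees the observed cells, so the missing cells are free to be whatever the low-rank structure implies.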

@Pak
I am trying to predict user_buy on a website (which could be 0).
I was able to determine that the city can have a big effect.
An example record contains:
ts, city, referrer, user_id, user_visit_history, user_buy_history, user_buy

I also have 2 other resources,

  1. Aggregation data of:
    day, hour, city, total buy (for almost all cities)
  2. Demographic info per city

I have a few ideas:

  1. Denormalize the tables
  2. Encode the cities with some feature-reduction algorithm and feed them as input features
  3. Cluster the cities, then use the average of the nearest cities in the cluster as the embedding value for a new city (until I have enough data)
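The third idea could be sketched roughly like this: for a new city, find its nearest known cities in demographic space and average their learned embeddings. All vectors below are made-up toy values, and `embedding_for_new_city` is a hypothetical helper, not an existing API:

```python
import numpy as np

# Hypothetical demographic vectors for 3 known cities (2 features each)
demo = np.array([[0.9, 0.1],
                 [0.8, 0.2],
                 [0.1, 0.9]])
# Their learned 3-d embeddings, taken from the main model
emb = np.array([[1.0, 0.0, 0.0],
                [0.9, 0.1, 0.0],
                [0.0, 0.0, 1.0]])

def embedding_for_new_city(new_demo, demo, emb, k=2):
    """Average the embeddings of the k demographically nearest known cities."""
    dists = np.linalg.norm(demo - new_demo, axis=1)
    nearest = np.argsort(dists)[:k]
    return emb[nearest].mean(axis=0)

# Unseen city: we only have its demographics, no purchase records yet
new_city_demo = np.array([0.85, 0.15])
print(embedding_for_new_city(new_city_demo, demo, emb))
```

The same scheme works with proper clustering (e.g. k-means on the demographic table) instead of raw nearest neighbours; averaging embeddings within the new city's cluster is the same idea with a different neighbourhood definition.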

Would love to get your feedback and ideas

So, you have info on how much users buy in many cities.
Then you want to predict how well another user_id will buy in some other city (for which you don't have any buying data).
You want the system to determine how user_x (which the system has never seen) will buy in city_x (which is also unknown to the system)?
Have I understood your task correctly?

Yes, a new user and a new city (or one with very few records)

Ok, I think it will be hard for the system to solve this.
I think you're right: you could attach to the city name all the information you have about it (geographic data, population, average sales there, etc., maybe even some feature engineering here), and try feeding this denormalized data to a neural net (hoping this info is enough for the system to describe the city).
The other (pretty radical) way could be:
If you have some other data for all the cities you need (data related to sales of your product), you could try to train a new neural network on it (e.g. try to predict the sales of each store in a city, if you have that info; you definitely want as many rows of data per city as you can get), then take the embeddings for the cities from this net and use them in the first net. So it's something similar to transfer learning, just for embeddings. Once more, it's just an idea, and it's pretty radical :slight_smile: I haven't tried it yet, but it could be good for your case (just watch the cities' embedding size, it must be equal in both nets)
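A minimal PyTorch sketch of that embedding transfer, assuming made-up sizes and a placeholder training step (the auxiliary task, feature counts, and both architectures are purely illustrative):

```python
import torch
import torch.nn as nn

N_CITIES, EMB_DIM = 100, 8  # embedding size must match in both nets

# Auxiliary net: predict city-level aggregate sales from the city id alone
aux = nn.Sequential(nn.Embedding(N_CITIES, EMB_DIM), nn.Linear(EMB_DIM, 1))

# ... train `aux` on the per-city aggregate table here ...

# Main net: same city-embedding size, plus the other record features
class MainNet(nn.Module):
    def __init__(self, n_other_features):
        super().__init__()
        self.city_emb = nn.Embedding(N_CITIES, EMB_DIM)
        self.head = nn.Linear(EMB_DIM + n_other_features, 1)

    def forward(self, city_id, other):
        x = torch.cat([self.city_emb(city_id), other], dim=1)
        return torch.sigmoid(self.head(x))

main = MainNet(n_other_features=5)

# Transfer the trained city embeddings, and freeze them initially so the
# main net can't wash them out before its other weights settle
main.city_emb.weight.data.copy_(aux[0].weight.data)
main.city_emb.weight.requires_grad = False
```

Unfreezing `city_emb` later for fine-tuning is the usual transfer-learning follow-up; the hard constraint is only that `EMB_DIM` is identical in both nets, as noted above.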