I would suggest creating an indicator matrix with one column for each of a-z. For example, row 1 has a 1 in columns a and b and zeros everywhere else; similarly, row 2 has a 1 in columns a, c, d, and e and zeros everywhere else. You can then have an embedding layer for each of those columns.
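As a rough sketch of the indicator-matrix idea, the snippet below builds the matrix for the two example rows and combines it with an embedding table. The variable names, the embedding size of 4, and the random table are assumptions for illustration; in a real model the table would be learned, and "an embedding layer for each column" is read here as one vector per column, summed over the active columns.

```python
import string
import numpy as np

# Hypothetical data: the two rows described above.
rows = [["a", "b"], ["a", "c", "d", "e"]]

# One column per letter a-z.
vocab = list(string.ascii_lowercase)
col_index = {letter: i for i, letter in enumerate(vocab)}

# Indicator (multi-hot) matrix: 1 where the row contains the letter, else 0.
indicator = np.zeros((len(rows), len(vocab)), dtype=np.float32)
for r, values in enumerate(rows):
    for v in values:
        indicator[r, col_index[v]] = 1.0

# Assumption: one embedding vector per column, random here as a
# stand-in for weights that would be learned during training.
rng = np.random.default_rng(0)
embedding_dim = 4
embeddings = rng.normal(size=(len(vocab), embedding_dim)).astype(np.float32)

# Multiplying by the indicator matrix sums the vectors of the active columns.
row_vectors = indicator @ embeddings
```

Each row of `row_vectors` is then a fixed-length representation of a variable-length set of categories, which can be fed to the rest of the model.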
I created an identical row for every value in the categorical column, so that every row holds only one value instead of multiple values. I then trained an embedding vector on this larger dataset.
When scoring the model on the original data, aggregate the vectors of the values belonging to each original row using mean, sum, etc.
This approach results in information loss, but it is the best I have for now.
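The row-duplication approach can be sketched roughly as follows. The data, the `aggregate` helper, the embedding size of 3, and the random table are all hypothetical; the table stands in for embeddings that would be trained on the expanded data.

```python
import numpy as np

# Hypothetical data: each original row holds several categorical values.
rows = [["a", "b"], ["a", "c", "d", "e"]]

# Step 1: duplicate each row once per value, giving (row_id, value) pairs.
exploded = [(row_id, v) for row_id, values in enumerate(rows) for v in values]

# Step 2: one embedding vector per distinct value. Random here as a
# stand-in for vectors trained on the expanded data.
vocab = sorted({v for _, v in exploded})
index = {v: i for i, v in enumerate(vocab)}
rng = np.random.default_rng(0)
table = rng.normal(size=(len(vocab), 3))

# Step 3: at scoring time, aggregate the vectors of each original row.
def aggregate(values, how="mean"):
    vecs = table[[index[v] for v in values]]
    return vecs.mean(axis=0) if how == "mean" else vecs.sum(axis=0)

row_vectors = np.stack([aggregate(values) for values in rows])
```

With `how="mean"` rows of different lengths stay on a comparable scale, while `how="sum"` keeps a signal about how many values a row had; which one works better is an empirical question.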