Architecture for Nonstandard Abbreviation Detection

As a toy project and learning exercise, I am thinking about the following problem.
Given a word w, a set W_abv of its possible abbreviations (generated by fairly arbitrary rules, not by a regexp), and a set W_arb of other arbitrary words, where W_abv and W_arb are disjoint, I want to learn a function M such that
M(w, a) = 1 for every a in W_abv, whereas M(w, b) = 0 for every b in W_arb.
For example:
M(flight, flt) = 1, M(flight, flght) = 1, and M(flight, attendant) = 0, because flt and flght can be seen as abbreviations of flight, whereas attendant is not an abbreviation of flight.
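To make the target concrete, here is a minimal sketch of the labelled pairs together with a rule-based baseline to compare a learned M against. It assumes that an abbreviation is shorter than the word, starts with the same letter, and keeps its letters in the original order. The examples above satisfy that, but the arbitrary generation rules may not, so `baseline_m` and `is_subsequence` are hypothetical helpers and not a definition of M.

```python
def is_subsequence(abbr: str, word: str) -> bool:
    """Return True if the letters of abbr appear in word in the same order."""
    remaining = iter(word)
    # `ch in remaining` consumes the iterator up to and including the match,
    # so later letters can only match later positions in word.
    return all(ch in remaining for ch in abbr)


def baseline_m(word: str, candidate: str) -> int:
    """Hypothetical rule-based stand-in for M(word, candidate)."""
    shorter = len(candidate) < len(word)
    same_start = candidate[:1] == word[:1]  # assumption: first letter is kept
    return int(shorter and same_start and is_subsequence(candidate, word))


# (word, candidate, label) triples taken from the example above.
pairs = [
    ("flight", "flt", 1),
    ("flight", "flght", 1),
    ("flight", "attendant", 0),
]

for word, candidate, label in pairs:
    print(word, candidate, "label:", label, "baseline:", baseline_m(word, candidate))
```

If a learned model cannot beat a rule like this on held-out pairs, the extra machinery is probably not paying for itself.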
This looks somewhat like a Siamese network problem for text, but it could also be treated as an encoder-decoder problem with a classifier on top. Am I missing something, and are there other options (such as transformers, or something simpler)? Do the Siamese and encoder-decoder approaches make sense here? Is there a clear favourite before trying them both? Are you aware of any papers doing something similar?
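As a rough illustration of the "same encoder on both inputs, then compare" structure behind the Siamese idea, here is an untrained stand-in that uses only the standard library. The shared encoder is just a count of character unigrams and bigrams, and the comparison is cosine similarity. A real Siamese network would replace `encode` with a learned character-level encoder and train the comparison on labelled pairs. The names `encode`, `cosine`, and `similarity` are assumptions made for this sketch.

```python
import math
from collections import Counter


def encode(word: str) -> Counter:
    """Shared encoder: counts of character unigrams and bigrams."""
    grams = list(word) + [word[i:i + 2] for i in range(len(word) - 1)]
    return Counter(grams)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity(word: str, candidate: str) -> float:
    """Apply the same encoder to both inputs, then compare the encodings."""
    return cosine(encode(word), encode(candidate))


for candidate in ["flt", "flght", "attendant"]:
    print(candidate, round(similarity("flight", candidate), 3))
```

On the three example pairs the abbreviations score higher than attendant, but turning the score into M needs a cut-off, and that cut-off would have to be tuned on held-out pairs rather than picked by hand.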