You have probably thought of this and rejected it for some reason, but my first thought would be to convert your SVG images into small bitmap images (say 68 x 68 or 126 x 126 pixels). You could then feed these into your model using all the established CNN tooling, and perhaps also benefit from transfer learning.
This conversion can be included in a custom pipeline, using a library that converts SVG to bitmap.
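In practice a library such as CairoSVG would do the rasterizing. As a dependency-free sketch of the idea only, the hypothetical `rasterize` function below handles just `<rect>` and `<circle>` fills. It assumes the root element has a `viewBox` and ignores strokes, transforms, paths and colours. It produces a size x size binary grid that could be stacked into a CNN input tensor.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def rasterize(svg_text: str, size: int = 68) -> list[list[int]]:
    """Toy rasterizer (hypothetical helper): fills <rect> and <circle> elements
    onto a size x size binary grid, where 1 means the pixel centre is covered.
    Strokes, transforms, paths and colours are ignored."""
    root = ET.fromstring(svg_text)
    # Map user coordinates to pixels via the viewBox (assumed to be present).
    min_x, min_y, vb_w, vb_h = (float(v) for v in root.get("viewBox").split())
    grid = [[0] * size for _ in range(size)]
    for py in range(size):
        for px in range(size):
            # Sample at the pixel centre, expressed in user coordinates.
            x = min_x + (px + 0.5) * vb_w / size
            y = min_y + (py + 0.5) * vb_h / size
            for el in root.iter():
                tag = el.tag.replace(SVG_NS, "")
                if tag == "rect":
                    rx, ry = float(el.get("x", 0)), float(el.get("y", 0))
                    w, h = float(el.get("width")), float(el.get("height"))
                    if rx <= x < rx + w and ry <= y < ry + h:
                        grid[py][px] = 1
                        break
                elif tag == "circle":
                    cx, cy, r = (float(el.get(k, 0)) for k in ("cx", "cy", "r"))
                    if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                        grid[py][px] = 1
                        break
    return grid


svg = ('<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">'
       '<rect x="0" y="0" width="5" height="10"/></svg>')
img = rasterize(svg, size=10)
```

With a real library the output would be a full-colour PNG rather than a binary mask, but the pipeline shape is the same: SVG in, fixed-size pixel array out, then into the CNN.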
Yes, your idea makes perfect sense, and it was my first step. But it is insufficient for what I am aiming for: I need the model to be end-to-end (SVG to SVG), for a number of reasons, e.g.:
(a) The performance of tools for converting bitmaps to SVGs is very poor, even Deep Learning powered ones. This is especially true if the PNGs were generated without subsequent vectorisation (to SVG) in mind.
(b) I want the composition of the SVG, as a graph of layers, groups and elements, to be taken into account.
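One way to expose that composition to a model, sketched under assumptions: the hypothetical `svg_to_graph` below parses the SVG with Python's standard `xml.etree.ElementTree` and emits one node per element plus two edge kinds. "child" edges follow the group/element nesting, and "next" edges link consecutive siblings, because SVG paints siblings in document order, so sibling order encodes stacking. Turning node attributes into numeric features (e.g. for a graph neural network) is not shown.

```python
import xml.etree.ElementTree as ET


def svg_to_graph(svg_text: str):
    """Hypothetical helper: return (nodes, edges) for an SVG document.

    nodes[i] = (tag, attribute dict), in pre-order (document) order.
    edges    = (src, dst, kind) with kind "child" (nesting) or "next"
               (consecutive siblings, i.e. painting order)."""
    root = ET.fromstring(svg_text)
    nodes, edges = [], []

    def visit(el, parent_idx):
        idx = len(nodes)
        # Strip the XML namespace, e.g. '{http://www.w3.org/2000/svg}g' -> 'g'.
        tag = el.tag.split("}", 1)[-1]
        nodes.append((tag, dict(el.attrib)))
        if parent_idx is not None:
            edges.append((parent_idx, idx, "child"))
        prev = None
        for child in el:
            child_idx = visit(child, idx)
            if prev is not None:
                # Later siblings are painted over earlier ones.
                edges.append((prev, child_idx, "next"))
            prev = child_idx
        return idx

    visit(root, None)
    return nodes, edges


doc = ('<svg xmlns="http://www.w3.org/2000/svg">'
       '<g id="layer1"><rect/><circle/></g><path/></svg>')
nodes, edges = svg_to_graph(doc)
```

Keeping the tree and the sibling order as separate edge kinds lets a model distinguish "is part of" from "is drawn on top of".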
I am aware of DeepSVG and Im2Vec.
Happy to discuss this in detail on a separate Discord if you are interested.