How would you represent an image/picture of a doodle for classification?

I was looking at https://quickdraw.withgoogle.com/ and wondering how you would send a doodle like that to a backend. Looking at the network calls in my browser console, the payload seems to use their own "quickdraw" format rather than a plain image.
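From what I could tell, the drawing seems to be sent as stroke data rather than pixels, similar to the stroke-based format in the public Quick, Draw! dataset. A rough sketch of what I imagine the client could send is below (the exact field names and the URL are just my guesses, not their actual API):

```python
import json
import requests  # any HTTP client would do

# Hypothetical example: a doodle captured as a list of strokes,
# where each stroke is a list of x coords, y coords, and timestamps
# sampled while the pointer was down.
doodle = {
    "width": 400,
    "height": 400,
    "strokes": [
        # stroke 1: [x coords], [y coords], [timestamps in ms]
        [[10, 40, 90], [20, 25, 30], [0, 16, 33]],
        # stroke 2
        [[120, 160], [200, 210], [850, 900]],
    ],
}

# Sending it is just a POST with a JSON body (URL is made up).
response = requests.post(
    "https://example.com/api/classify-doodle",
    data=json.dumps(doodle),
    headers={"Content-Type": "application/json"},
)
print(response.status_code)
```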

Does anyone have ideas about how we could represent such data? Sending it is just a POST request away, but the representation is what really matters. I'm asking because I've been looking for more fun projects to try, and since we have been working on image recognition this seemed like a good fit. I was thinking of doing this first and then going for the "Original Snapchat lenses" project described here https://blog.statsbot.co/data-scientist-resume-projects-806a74388ae6 as a more challenging follow-up.

I was thinking the doodles could simply be represented in grayscale. Would it make sense to do the same for the second use case? Sending and receiving faces could take time, so maybe a desktop app would be a better fit there?
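If going the grayscale route, one option I had in mind is to rasterize the strokes client- or server-side into a small grayscale array and feed that to a classifier. A minimal sketch, assuming the hypothetical stroke format from above and using Pillow:

```python
import numpy as np
from PIL import Image, ImageDraw  # Pillow

def strokes_to_grayscale(strokes, canvas_size=256, out_size=28):
    """Render stroke coordinates onto a blank canvas and downscale
    to a small grayscale array, e.g. as input to a CNN."""
    img = Image.new("L", (canvas_size, canvas_size), color=0)  # black background
    draw = ImageDraw.Draw(img)
    for xs, ys, *_ in strokes:                 # ignore timestamps if present
        points = list(zip(xs, ys))
        if len(points) > 1:
            draw.line(points, fill=255, width=3)  # white strokes
    img = img.resize((out_size, out_size), Image.BILINEAR)
    return np.asarray(img, dtype=np.float32) / 255.0  # normalize to [0, 1]

# Example with the made-up strokes from earlier.
strokes = [
    [[10, 40, 90], [20, 25, 30], [0, 16, 33]],
    [[120, 160], [200, 210], [850, 900]],
]
pixels = strokes_to_grayscale(strokes)
print(pixels.shape)  # (28, 28)
```

Rasterizing to something small like 28x28 also keeps the payload tiny, so round-tripping it over HTTP shouldn't be a problem for doodles; faces at full resolution would obviously be heavier.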