Detecting the exact shape of a license plate (e.g. at tilted angles)

I'm trying to find the exact shape of license plates (tilted, rotated, and so on) in user-uploaded images, where I have no control over lighting, resolution, etc., so that I can replace them with another image through a geometric perspective transformation. Something like what the company is doing here:

Or like in the following image:

So I'm clearly not just interested in the bounding box that object detection frameworks like YOLO or SSD provide. In fact, I have already trained a YOLOv3 model that detects the bounding box around all sorts of plates pretty robustly. However, getting from these bounding boxes to the actual license plate shape (or its coordinates) seems to be the tricky part.
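To make the end goal concrete: once the four plate corners are known, the replacement is a standard four-point perspective transform (homography). The following is a minimal NumPy sketch of that step only. The function names and all coordinates are hypothetical and for illustration; in OpenCV the same job is done by `cv2.getPerspectiveTransform` and `cv2.warpPerspective`.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve for the 3x3 matrix H that maps each of 4 src points to its dst point.

    Uses the direct linear formulation with H[2, 2] fixed to 1, which gives
    8 equations in 8 unknowns for 4 point pairs.
    """
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), same for v
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, points):
    """Map an (N, 2) array of points through H, including the perspective divide."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical values: corners of a 520x110 replacement image and the
# four plate corners in the photo, both ordered TL, TR, BR, BL.
replacement_corners = [(0, 0), (520, 0), (520, 110), (0, 110)]
plate_corners = [(212, 148), (401, 171), (396, 224), (208, 196)]

H = homography_from_corners(replacement_corners, plate_corners)
```

The whole problem therefore reduces to obtaining four reliable corner coordinates; the bounding box alone does not determine them.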

I tried passing the detected bounding box to a semantic segmentation network (DeepLab v3+) to carve out the plate, but the success rate is not really acceptable. Even when the segmentation mask is correctly formed, the next step would be to find the four corners of the resulting tetragon, which are needed for the perspective transformation (and which could be a challenge of its own).
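As an illustration of that corner-finding step, here is one simple heuristic, sketched under the assumption that the mask is a clean binary blob and the plate is rotated by well under 45 degrees: take the mask pixels that are extreme along the two image diagonals. The function name is hypothetical, and this is not a claim about what works on real, noisy masks. A common alternative is polygon approximation of the mask contour (`cv2.findContours` followed by `cv2.approxPolyDP` in OpenCV).

```python
import numpy as np

def corners_from_mask(mask):
    """Estimate four corners of a binary mask, ordered TL, TR, BR, BL.

    Heuristic: the top-left corner minimizes x + y, the bottom-right
    maximizes it; the top-right maximizes x - y, the bottom-left minimizes it.
    Returns None for an empty mask.
    """
    ys, xs = np.nonzero(mask)  # row indices are y, column indices are x
    if xs.size == 0:
        return None
    s = xs + ys
    d = xs - ys
    idx = [np.argmin(s), np.argmax(d), np.argmax(s), np.argmin(d)]
    return [(int(xs[i]), int(ys[i])) for i in idx]
```

Because each corner comes from a single extreme pixel, a stray false-positive pixel in the mask moves the corner directly, so some cleanup of the mask (for example keeping only the largest connected component) would be needed first.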

So you could ask: why not just regress the four vertices of the plate directly instead of all that fuss? That is exactly my question: what architecture would you recommend, and why does the literature seem so empty of successful polygon-vertex regression models? Possibly this is at least partly due to instability caused by the accumulation of errors in the estimate of each vertex (too many free parameters…).
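To pin down what "regressing the four vertices" would mean in practice, here is a sketch of one possible target encoding: the eight corner coordinates are expressed relative to the detected bounding box, so a regression head would predict eight numbers roughly in [0, 1]. This encoding and the function names are assumptions for illustration, not a recommendation of a specific architecture.

```python
def encode_corners(corners, box):
    """Turn four (x, y) pixel corners into 8 box-relative regression targets.

    box is (x_min, y_min, x_max, y_max) from the detector.
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    target = []
    for x, y in corners:
        target.extend([(x - x0) / w, (y - y0) / h])
    return target

def decode_corners(target, box):
    """Inverse of encode_corners: 8 predicted values back to pixel corners."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    return [(x0 + target[i] * w, y0 + target[i + 1] * h) for i in range(0, 8, 2)]
```

With this encoding the corners must be labeled in a fixed order (for example TL, TR, BR, BL); otherwise the same plate can produce different targets.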

So, any thoughts on this one? Do you have any idea how the company mentioned above might be doing it?

P.S. 1: I decided not to use Mask R-CNN from the beginning because of its low frame rate. The closer to real time, the better for me.

P.S. 2: I also tried all sorts of pure image processing techniques on the detected bounding box to carve out the plate (edge detection, contour detection and filtering, Hough transform, etc.), but these were not very successful either.