Describing an electronic circuit diagram

I want to generate text descriptions from an electronic circuit diagram. I’ve been struggling with some circuits in my side projects, and I think it’d be a fun project to try. So far, this is the structure I have in mind:

  1. Segment the image into components (and figure out how they are connected)
  2. Use a CV model to recognize each of the components
  3. Generate a text representation
  4. Feed the representation to an LLM

Parts 2, 3 and 4 are fairly straightforward. I might struggle to build the classifier in step 2, but I know how to tackle that problem. The text format for step 3 could be something like the Mermaid format:

graph LR
    A[Battery] -- Current --> B[(Resistor)]
    B -- Current --> C[(Capacitor)]
    C -- Current --> D[(Inductor)]
    D -- Current --> A
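
To make step 3 concrete, here is a rough sketch of how I imagine emitting that text. It assumes steps 1 and 2 hand me a dict of labelled components and a list of connections between them. The `to_mermaid` helper and its input format are made up for illustration, and the `graph LR` header is there because Mermaid needs a diagram declaration to render a flowchart.

```python
def to_mermaid(components, connections, edge_label="Current", header="graph LR"):
    """Render a circuit graph as Mermaid flowchart text.

    components: dict mapping node id -> component name (hypothetical format)
    connections: list of (source id, target id) pairs (hypothetical format)
    """
    declared = set()

    def node(node_id):
        # Spell out the label only the first time a node appears;
        # later mentions use the bare id, as in the snippet above.
        if node_id in declared:
            return node_id
        declared.add(node_id)
        # Plain square brackets for every node, to keep the sketch simple.
        return f"{node_id}[{components[node_id]}]"

    lines = [header]
    for src, dst in connections:
        left = node(src)
        right = node(dst)
        lines.append(f"    {left} -- {edge_label} --> {right}")
    return "\n".join(lines)


# Example input that steps 1 and 2 might produce for a simple series loop.
components = {"A": "Battery", "B": "Resistor", "C": "Capacitor", "D": "Inductor"}
connections = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(to_mermaid(components, connections))
```

With a structure like this, the hard part stays in step 1: once the components and connections are known, producing the text is just a loop over the edges.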

However, I am not sure how to approach segmenting the image. I have tried some frameworks like YOLO, but with little success. Any ideas on how to do this?