Image to array of classes to solve a CAPTCHA

I want to train a CNN to solve CAPTCHAs with a fixed length and a fixed alphabet.
Using the material from lessons 1, 2, and 3, I’ve managed to build a good (though pretty useless) classifier for one-letter CAPTCHAs like this:
However, I have no idea how to make it work with longer CAPTCHAs:

  1. Is it achievable with a CNN only?
  2. What kind of label should I use? Should it be a tensor of classes?
  3. What algorithm should I use? Should it be regression, classification, or something else?
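
To make questions 2 and 3 concrete, here is a minimal sketch of one possible framing: treat each character position as its own classification problem, so the label becomes a list of class indices, one per position. This is only an illustration, not a confirmed solution. `ALPHABET`, `LENGTH`, and the helpers `encode`, `decode`, and `cross_entropy` are hypothetical names I made up, and the score rows stand in for whatever a network with one output group per position would produce.

```python
import math

# Hypothetical alphabet and CAPTCHA length, chosen only for illustration.
ALPHABET = "0123456789"
LENGTH = 4


def encode(text):
    """Turn a CAPTCHA string into one class index per character position."""
    if len(text) != LENGTH:
        raise ValueError(f"expected {LENGTH} characters, got {len(text)}")
    return [ALPHABET.index(ch) for ch in text]


def decode(scores):
    """Pick the highest-scoring class at each position and join the characters."""
    return "".join(
        ALPHABET[max(range(len(row)), key=row.__getitem__)] for row in scores
    )


def cross_entropy(scores, label):
    """Mean softmax cross-entropy over positions (raw scores vs. class indices)."""
    total = 0.0
    for row, target in zip(scores, label):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum - row[target]
    return total / len(label)


label = encode("4071")  # [4, 0, 7, 1]
```

Under this framing the loss is ordinary classification loss averaged over the positions, and a prediction is read back with `decode` by taking the best class at each position.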