Thanks! Yes, ABC is a common notation format for folk music. Each letter represents a musical note, so C is ‘do’, D is ‘re’, and so on. Lower case means a higher octave. The Wikipedia article isn’t that good at explaining how it works, and I found this one better.
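To make the letter-to-note idea concrete, here is a rough Python sketch of how a single ABC note could be read. The function name `describe_note` and the `SOLFEGE` table are made up for illustration, and I'm assuming fixed-do naming with ‘ti’ for B (some traditions say ‘si’). It also handles the `'` and `,` octave marks from the ABC standard, but ignores accidentals, note lengths, and key signatures.

```python
# Fixed-do solfège names for the seven note letters.
# Assumption: "ti" is used for B; some traditions use "si" instead.
SOLFEGE = {"C": "do", "D": "re", "E": "mi", "F": "fa",
           "G": "sol", "A": "la", "B": "ti"}


def describe_note(token: str) -> tuple[str, int]:
    """Return (solfège name, octave offset) for one ABC note such as "C", "d" or "c'".

    The offset is relative to the octave written with plain upper-case letters.
    Hypothetical helper for illustration only; it is not part of any ABC library.
    """
    letter, marks = token[0], token[1:]
    if letter.upper() not in SOLFEGE:
        raise ValueError(f"not an ABC note letter: {letter!r}")
    # Lower case means one octave higher than upper case.
    octave = 1 if letter.islower() else 0
    # Each apostrophe raises the note by an octave, each comma lowers it.
    octave += marks.count("'") - marks.count(",")
    return SOLFEGE[letter.upper()], octave
```

So `describe_note("C")` gives `("do", 0)` and `describe_note("d")` gives `("re", 1)`, which matches the upper/lower case rule above.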
I think some of the decisions made during cleaning are worth digging into more deeply, for sure. This will be a side project for part 2. I happen to play Irish music, so it would be particularly interesting to see how ‘good’ the model is by how it sounds, in addition to the common metrics.