Has anyone tried ULMFiT for genomics, proteins, or any other sequence tagging task on non-natural text?
I tried to play with it but had to give up because of the lack of GPU power.
This is something I am very interested in and maybe I will give it another try when I have more GPU power available.
@MicPie, did anything interesting come out of the SentencePiece model? Did the tokens map nicely onto codons (amino acids) in the coding regions? Were there any other tokens of interest (e.g. promoter regions, etc.)?
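In case it helps to make the codon question concrete, here is a rough sketch of one way to check it. It assumes the learned tokens for a coding sequence are available as plain strings in order, with any SentencePiece `▁` marker already stripped, and that the sequence starts in reading frame 0. The function names `split_codons` and `codon_aligned_fraction` are made up for this example and are not part of any library.

```python
def split_codons(seq):
    """Split a coding sequence into non-overlapping 3-base codons (frame 0).

    Trailing bases that do not fill a complete codon are dropped.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def codon_aligned_fraction(tokens):
    """Fraction of tokens that both start and end on a codon boundary.

    `tokens` is the tokenization of one coding sequence, in order.
    Assumes frame 0, i.e. the first token starts at the first base of a codon.
    """
    pos = 0
    aligned = 0
    for tok in tokens:
        start, end = pos, pos + len(tok)
        if start % 3 == 0 and end % 3 == 0:
            aligned += 1
        pos = end
    return aligned / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    print(split_codons("ATGGCCTAA"))
    print(codon_aligned_fraction(["ATG", "GCCTAA"]))
    print(codon_aligned_fraction(["ATGGCC", "TA", "A"]))
```

A fraction close to 1 on coding regions would suggest the tokens respect codon boundaries; a value near what you would get by chance would suggest they do not.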
It was quite some time ago and I don’t remember anything special, but I didn’t look in detail at whether the SentencePiece model captured special (longer) sequences.
The results looked much like the example in the sentencepiece repo.
When fastai v2 is out I want to look into it again. If you are interested in this topic too, we can join forces and discuss our results.