I’ve played around with raw Oxford Nanopore (ONT) data a little bit. In case you haven’t seen them, there are a couple of tools that work directly on the raw signal data, in addition to Oxford Nanopore’s own basecaller, Albacore (which I believe uses an RNN under the hood).
DeepBinner is a tool that de-multiplexes barcoded ONT runs using a CNN to classify the reads. It’s written in Keras, and has a published model with weights (and unusually good documentation).
Chiron is a neural net basecalling tool which achieves roughly the same accuracy as Albacore (I think Albacore changed to a Chiron-style architecture recently). It’s particularly interesting because it uses a CTC (connectionist temporal classification) layer to do sequence-to-sequence learning, i.e. the squiggle data doesn’t have to be pre-segmented into chunks before the network sees it. I think this is a very promising approach and something I want to read more about.
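To make the CTC idea a bit more concrete, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. This is not Chiron’s actual code: the label layout (0 = blank, 1 to 4 = A/C/G/T), the function name and the toy scores are all my own assumptions for illustration. The point is that the network emits one label distribution per signal frame, and the decoder merges repeats and drops blanks, so nobody has to decide up front where one base ends and the next begins.

```python
# Assumed label layout for illustration only: 0 is the CTC blank, 1-4 are bases.
BLANK = 0
ALPHABET = {1: "A", 2: "C", 3: "G", 4: "T"}


def ctc_greedy_decode(frame_scores):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring label index for every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]

    bases = []
    previous = BLANK
    for label in best_path:
        # Emit a base only when the label changes and is not the blank.
        if label != previous and label != BLANK:
            bases.append(ALPHABET[label])
        previous = label
    return "".join(bases)


# Toy per-frame scores (made up): columns are blank, A, C, G, T.
frames = [
    [0.10, 0.80, 0.05, 0.03, 0.02],  # A
    [0.10, 0.70, 0.10, 0.05, 0.05],  # A again (merged with the previous frame)
    [0.90, 0.04, 0.03, 0.02, 0.01],  # blank separates the two A calls
    [0.20, 0.60, 0.10, 0.05, 0.05],  # A (a new base, because of the blank)
    [0.10, 0.10, 0.70, 0.05, 0.05],  # C
    [0.10, 0.10, 0.60, 0.10, 0.10],  # C again (merged)
]
print(ctc_greedy_decode(frames))
```

The blank is what lets the model call a homopolymer like "AA": without a blank between them, repeated A frames collapse into a single base. Real basecallers usually use a beam search rather than this greedy pass, but the collapsing rule is the same.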
Not a deep learning tool, but SquiggleKit is a handy package for querying & manipulating the signal-level data, which might be a useful reference if you’re building your own stuff.
I’m also very interested in working with the raw signal coming off the sequencing devices. It seems likely that there are all kinds of information inherent in the signal that get lost when translating to FASTQ. It also seems that wide 1D CNN or LSTM networks would have a better chance of picking up surrounding context in the raw “squiggle” form than in a basecalled form. That’s just a hunch, though.
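As a rough way to reason about how "wide" a 1D CNN actually is, here is a small sketch that computes the receptive field of a stack of 1D convolutions, i.e. how many raw signal samples feed into a single output position. The helper name and the layer settings are hypothetical, picked only to show the arithmetic, and not taken from any of the tools above.

```python
def receptive_field(layers):
    """Receptive field (in input samples) of stacked 1D convolutions.

    `layers` is a list of (kernel_size, stride, dilation) tuples, ordered
    from the layer nearest the input to the layer nearest the output.
    """
    field = 1  # a single output position starts out seeing one sample
    jump = 1   # distance in input samples between adjacent positions
    for kernel_size, stride, dilation in layers:
        # Each layer widens the view by (kernel_size - 1) dilated steps,
        # measured in units of the current jump.
        field += (kernel_size - 1) * dilation * jump
        jump *= stride
    return field


# Hypothetical stacks, for illustration only.
plain = [(11, 1, 1), (11, 1, 1), (11, 1, 1)]
dilated = [(11, 1, 1), (11, 1, 2), (11, 1, 4)]

print(receptive_field(plain))    # 31
print(receptive_field(dilated))  # 71
```

So the same three layers see more than twice as much of the squiggle once dilation is added, which is one cheap way to get more surrounding context without more parameters.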