I have some time series sampled at 25600 Hz (25600 samples/second), and each signal is 60 seconds long, which means 60 × 25600 = 1,536,000 samples per signal. I’m working on a classification problem (only two classes) and I don’t know how to handle this huge number of points. Does anyone have an idea how to tackle this? A second question: does anyone know of a baseline model that would be a reasonably good starting point?
Thanks!
You can try computing the cepstrum. It’s a standard way to extract features from speech data. The trick is to move to the frequency domain first with an FFT, take the log of the magnitude spectrum, and then reduce the dimensionality by applying a DCT and keeping only the low-order coefficients, which carry most of the energy.
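For example, a minimal NumPy/SciPy sketch of that pipeline (frame length, hop size, and the number of kept coefficients are illustrative choices, not tuned values):

```python
import numpy as np
from scipy.fftpack import dct

def cepstral_features(signal, frame_len=1024, hop=512, n_coeffs=20):
    """Frame the signal and return n_coeffs cepstral coefficients per
    frame: FFT -> log magnitude -> DCT, keeping low-order terms."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * np.hanning(frame_len)
        spectrum = np.abs(np.fft.rfft(frame))
        log_spectrum = np.log(spectrum + 1e-10)       # avoid log(0)
        cepstrum = dct(log_spectrum, type=2, norm='ortho')
        frames.append(cepstrum[:n_coeffs])            # low-order coefficients
    return np.array(frames)                           # (n_frames, n_coeffs)
```

At 25600 Hz, a 1024-sample frame covers 40 ms, so a 60-second signal collapses to roughly 3000 frames of 20 coefficients each.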
I’m sorry, I don’t think I made my question clear enough. I don’t want to go with signal processing; I’d like to use deep learning, and I want to know if anyone has a good architecture in mind that I could use as a shortcut, because I don’t have enough time to do a good amount of tweaking.
Since your data is sequential in nature, I’d guess an RNN should fit the bill. You can also put a 1D CNN before the RNN to extract more meaningful features first, as in the DeepSense paper.
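Something along these lines, for example (a rough tf.keras sketch; layer sizes and strides are placeholders, not tuned values):

```python
from tensorflow.keras import layers, models

# assumes input shape (1_536_000, 1): 60 s at 25600 Hz, single channel
model = models.Sequential([
    layers.Conv1D(16, kernel_size=64, strides=16, activation='relu',
                  input_shape=(1_536_000, 1)),
    layers.MaxPooling1D(4),
    layers.Conv1D(32, kernel_size=32, strides=4, activation='relu'),
    layers.MaxPooling1D(4),
    layers.GRU(64),                         # summarize the downsampled sequence
    layers.Dense(1, activation='sigmoid'),  # two-class output
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
```

The strided convolutions and pooling cut the 1.5M-sample input down to about 1500 steps before the RNN sees it, which is what makes the recurrent part tractable.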
If we’re talking fully deep learning here, maybe also try experimenting with dilated convolutions before the RNN that @emilmelnikov points out (see the sketch after this post). WaveNet uses convolutions exclusively for generative audio, but dilated convolutions also work well as general-purpose feature extractors.
Although, if you only have two classes (and unless they’re very close), it’s worth looking into signal processing for feature extraction.
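Here’s the dilated-convolution sketch mentioned above (a hedged tf.keras example in the spirit of WaveNet’s feature extractor; depths and widths are guesses):

```python
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(1_536_000, 1))           # 60 s at 25600 Hz
# strided conv to downsample the raw waveform first
x = layers.Conv1D(16, kernel_size=64, strides=16, activation='relu')(inputs)
# dilated stack: the receptive field grows exponentially with depth
for rate in (1, 2, 4, 8, 16):
    x = layers.Conv1D(32, kernel_size=3, dilation_rate=rate,
                      padding='causal', activation='relu')(x)
x = layers.GlobalAveragePooling1D()(x)                # collapse the time axis
outputs = layers.Dense(1, activation='sigmoid')(x)    # two-class output
model = models.Model(inputs, outputs)
model.compile(optimizer='adam', loss='binary_crossentropy')
```

You could also feed the dilated stack’s output into an RNN instead of global pooling, which is closer to what was suggested above.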
Thank you. Yeah, I thought that would be a good way to go. Any idea about stacking many Conv1D layers (with everything that goes with them, like pooling, etc.), flattening at the end, and going for a sigmoid? Something like the sketch below. If you have any experience with this architecture, would it work as well as Conv1D + RNN?
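Roughly this (a hedged tf.keras sketch with illustrative sizes):

```python
from tensorflow.keras import layers, models

# same assumed (1_536_000, 1) input as above
model = models.Sequential([
    layers.Conv1D(16, kernel_size=64, strides=16, activation='relu',
                  input_shape=(1_536_000, 1)),
    layers.MaxPooling1D(4),
    layers.Conv1D(32, kernel_size=16, strides=4, activation='relu'),
    layers.MaxPooling1D(4),
    layers.Conv1D(64, kernel_size=8, strides=2, activation='relu'),
    layers.MaxPooling1D(4),
    layers.Flatten(),                       # flatten instead of an RNN
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),  # two-class output
])
```

One difference to keep in mind: the Flatten + Dense head fixes the input length the model accepts, while the RNN and global-pooling variants are length-agnostic.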
I’ve already tried feature extraction using signal processing and got quite good scores on the chosen metrics. I just want to try new architectures to do some kind of benchmark. Thank you for the ideas!
I think DeepSense uses convolutions in the first layer because it’s more computationally efficient, and it allows you to merge sequences from multiple channels into one. Of course, the only way to know for sure is to try it and see the results!
You can also try dilated convolutions, as suggested by @irhumshafkat, and maybe QNNs.
Thank you very much, both of you, @emilmelnikov and @irhumshafkat, for your help!