Hey Hassan, great to have you on this forum! Just scanned your InceptionTime paper, it looks very promising. And thanks for creating reproducible science - unfortunately that is not always the case, especially in the TS area.
Kernel Size: One question regarding kernel sizes: in your paper you now use kernel sizes of 40 - 20 - 10, which are even numbers. I have always wondered what the thinking behind Wang's 8-5-3 FCN and ResNet was, since I have hardly ever seen even kernel sizes in image models. All of those implementations were in Keras, where you just stick a padding='same' in there and the issue is hidden away, but with an even kernel that actually leads to uneven padding (which in PyTorch you have to create manually, so it becomes obvious). So why is that?
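For what it's worth, here is roughly what I mean by the manual padding, a minimal sketch assuming stride 1 (the SameConv1d wrapper is just my own name for it, not from any of those repos):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SameConv1d(nn.Module):
    """Emulates Keras padding='same' for stride-1 1D convs, even kernel sizes included."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size)
        total = kernel_size - 1            # total padding needed to keep the length
        self.left = total // 2             # e.g. kernel_size=40 -> 19 on the left
        self.right = total - self.left     #                     -> 20 on the right (uneven!)

    def forward(self, x):
        return self.conv(F.pad(x, (self.left, self.right)))

x = torch.randn(8, 1, 96)                  # (batch, channels, time)
print(SameConv1d(1, 32, 40)(x).shape)      # torch.Size([8, 32, 96]) - length preserved
```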
Transfer Learning: While in vision a network trained on ImageNet seems almost universally usable, the same definitely does not apply to TS models in my experiments. So I also think you have to choose a model pretrained on a very similar dataset/domain, but even that sometimes does not help much. So I have not made much use of pretrained TS models so far. This is also one explanation for
Imaging Time Series I think: by transferring the time series problem to the image domain one can make use of pretrained models and well-tuned architectures, and vision seems years ahead of TS in this regard (which seems to be changing thanks to you now! ). So make something an image - be it time series, audio etc. - and you can easily reuse everything done for vision. Having said that, I always thought it is kind of a huge waste of resources to convert e.g. a 96-step energy time series into an e.g. 224x224x3 image and then run it through huge models. The information is contained in 96 ordered numbers, so a much smaller 1D model should be much more performant… (if the right model can be found).
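Just to put rough numbers on that blow-up (back-of-the-envelope, my own figures, not from any paper):

```python
ts_len = 96                  # e.g. a daily energy profile at 15-min resolution
img = 224 * 224 * 3          # a standard ImageNet-sized input
print(img, img / ts_len)     # 150528 values, i.e. roughly a 1500x larger input,
                             # before even counting the cost of the 2D convs on top
```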
LSTM-FCN and its successor GRU-FCN (an 11-page paper about replacing the four letters LSTM with GRU in the same Keras code and gaining even more stellar - yet unreproducible - results): Your comment made me very happy. After trying to reimplement their model in PyTorch I kept thinking my dimensions were wrong, but after rereading the paper I found that their "dimension shuffle" seems very strange, to say the least. They swap the dimensions of the univariate time series and then pass it through an RNN. But that means the RNN only sees one time-step, and a 1-timestep RNN is just a regular NN (the multivariate case is even stranger). And then they add a dropout of 80% to the result of that. I could never confirm my "hunch" that this made little sense until now (or was I just not getting it?)
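A quick shape check of what I mean, with my own toy numbers rather than their exact code:

```python
import torch
import torch.nn as nn

seq_len, n_units = 96, 8
x = torch.randn(32, seq_len, 1)    # (batch, time, features) - a univariate series

# "dimension shuffle": swap the time and feature axes
x_shuffled = x.transpose(1, 2)     # (32, 1, 96) -> a single time-step with 96 "features"

lstm = nn.LSTM(input_size=seq_len, hidden_size=n_units, batch_first=True)
out, _ = lstm(x_shuffled)
print(out.shape)                   # torch.Size([32, 1, 8]) - only ONE step, so no recurrence:
                                   # effectively a gated dense layer, with 80% dropout on top in their setup
```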
UCR Dataset / Metrics:
You are publishing in this area, so maybe you can enlighten me (or even change something about it). I am aware that in order to compare results, researchers try to use the same datasets etc. But one third of the UCR archive (85 datasets) consists of artificial image time series (image outlines converted into time series). This may have made sense at some point in time, but with today's vision models this use case is kind of obsolete, right? So why continue benchmarking on it? Shouldn't today's benchmarks reflect more sensor data and more multivariate series (industrial/medical sensors, motion, spectrograms, sound etc.) in order to actually be relevant in the real world? (More multivariate data was made available with the 2018 version of the UCR archive, but hardly anybody seems to use it?!)
Why is accuracy the metric everyone compares on? From my own experience (you could call it stupidity) on e.g. the Earthquakes dataset, it is easy to see that accuracy is a very bad metric for many of the datasets (some binary, very imbalanced). Why not "update the metric" to something more useful?
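To illustrate with made-up numbers (the ~4:1 class ratio is just my assumption to mimic an imbalanced binary set like Earthquakes): a "classifier" that always predicts the majority class already looks respectable on accuracy while being completely useless.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, matthews_corrcoef

y_true = np.array([0] * 80 + [1] * 20)     # imbalanced binary labels (4:1)
y_majority = np.zeros(100, dtype=int)      # always predict the majority class

print(accuracy_score(y_true, y_majority))           # 0.80 - looks decent
print(balanced_accuracy_score(y_true, y_majority))  # 0.50 - no better than chance
print(matthews_corrcoef(y_true, y_majority))        # 0.0  - no skill at all
```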