So I’ve been thinking through a structured data model I’ve built using the fast.ai library that appears to be working. One thing I noticed and pointed out in a separate thread was that changing the loss criterion mid-training seems to help the model a lot. Discussion here:
One hypothesis for why this works so well is that it helps get around distributional issues in my data. This got me thinking about my pre-processing steps and what I could do to improve them so the model can learn without my having to change the learning criterion mid-training.
So the "core" of my continuous data is 16 features that have an intrinsic relationship to one another. These 16 features are also used as the targets (when forward-lagged in time). 14 of them range over [0, large] and the other 2 range over [-very large, +very large], but with 90% of observations lying between [-medium, +medium]. For the 14 features that are always positive, taking the log works really well to make them look much more normally distributed. What is stumping me is what to do with the other 2 features, which can and do take on negative values. Is it OK to take the log of some features and not others? Won’t that destroy any information contained in the correlation between the series that have been logged and those that haven’t? Maybe I’m overthinking it and should just try a bunch of stuff?
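For reference, here is a minimal sketch of the log transform I’m applying to the positive features (the arrays and values are hypothetical stand-ins, not my real data). I use `np.log1p` so that exact zeros don’t blow up; for strictly positive data plain `np.log` would work too:

```python
import numpy as np

# Hypothetical stand-in for a few of the 14 strictly positive features:
# rows = observations, cols = features (values are made up)
positive_feats = np.array([
    [0.5, 120.0],
    [3.2, 4500.0],
    [0.0, 8.0],
])

# log1p(x) = log(1 + x): defined at x = 0, compresses the long right tail
logged = np.log1p(positive_feats)

# The transform is invertible, so no information in each series is lost
recovered = np.expm1(logged)
```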
FYI, the pre-processing I’m doing in the working model is to just scale all features by a single relevant scalar value and then use sklearn’s StandardScaler on all of the features.
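A rough sketch of that current pipeline, with a made-up array and a placeholder scale factor standing in for my actual data and scalar:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: rows = observations, cols = the 16 features
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(200, 16))

scale_factor = 100.0  # placeholder for the "relevant scalar value"
X_scaled = X / scale_factor

# Per-column standardization: subtract mean, divide by std
scaler = StandardScaler()
X_std = scaler.fit_transform(X_scaled)
```

One thing worth noting: if the scalar really is a single global value applied uniformly to every feature, StandardScaler’s per-column mean/std normalization undoes it anyway, so the two steps together are equivalent to StandardScaler alone.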
Any ideas would be appreciated!