Hello everyone!
I have a question about how to engineer features so that they give the best results in my neural networks.
Let me start with two examples from my current projects.
In one of my NNs I use wind direction, measured in degrees (0-360), as an input. It was brought to my attention that, from an input perspective, this might be a source of bias: if the wind shifts from NW to NE, the value jumps from the high 300s to below 50, even though the actual change in direction is much smaller. This numerical representation can create strange discontinuities in my data when training the RNN.
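To make the wraparound problem concrete, here is a small sketch comparing the raw numeric gap a network sees with the true angular gap between two compass directions. The function names and the example values (350° and 10°, a shift across north) are illustrative choices, not taken from the original project.

```python
def raw_difference(a_deg: float, b_deg: float) -> float:
    """Absolute difference of the raw degree values, as the network sees them."""
    return abs(a_deg - b_deg)


def circular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest angle between two compass directions, in degrees (0-180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


# Example values: a small shift across north, from 350 deg to 10 deg.
before, after = 350.0, 10.0
print(raw_difference(before, after))       # 340.0
print(circular_difference(before, after))  # 20.0
```

The raw encoding reports a 340-degree change for what is physically a 20-degree shift, which is the discontinuity described above.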
- Part of my solution was to split wind direction into 8 columns, each a binary flag for whether or not the wind is blowing from that direction.
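A minimal sketch of the 8-column binary split, assuming the sectors are the eight compass points (N, NE, E, ..., NW), each covering 45° centred on its point. The sector boundaries and helper names are assumptions for illustration. The post does not specify exactly how the directions were binned.

```python
# Assumed sector layout: 8 compass points, each covering 45 degrees centred on it.
SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def sector_index(direction_deg: float) -> int:
    """Return the index (0-7) of the compass sector a direction falls in."""
    # Shift by half a sector (22.5 deg) so "N" covers [337.5, 360) and [0, 22.5).
    return int(((direction_deg % 360.0) + 22.5) // 45.0) % 8


def wind_sector_one_hot(direction_deg: float) -> list[int]:
    """Encode one direction as 8 binary columns with exactly one set to 1."""
    row = [0] * len(SECTORS)
    row[sector_index(direction_deg)] = 1
    return row


for d in [350.0, 10.0, 95.0, 300.0]:
    print(d, SECTORS[sector_index(d)], wind_sector_one_hot(d))
```

With this layout, 350° and 10° both land in the "N" column, so the jump across north disappears. The trade-off is that neighbouring sectors (e.g. N and NE) are treated as unrelated columns, and any variation within a 45° sector is lost.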
The other example involves labels I am applying in a loan-default prediction NN. If I transform text labels such as “Credit Card Refinancing” or “Car Loan” into numerical representations, wouldn’t this introduce bias based on which label gets which number? For example, “Car Loan” (mapped to 9) could look somehow “better” than “Credit Card Refinancing” (mapped to 2).
- I had a similar idea: turn each of these loan purposes into its own binary column that acts as an input.
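The idea of giving each purpose its own binary column can be sketched in plain Python as follows. Only “Credit Card Refinancing” and “Car Loan” come from the post; “Home Improvement” and the helper name are hypothetical additions for illustration.

```python
# Example loan purposes; "Home Improvement" is a hypothetical extra category.
purposes = [
    "Credit Card Refinancing",
    "Car Loan",
    "Credit Card Refinancing",
    "Home Improvement",
]

# Integer (label) encoding: the numbers are arbitrary, but a network
# can still treat a larger number as "more" of something.
categories = sorted(set(purposes))
label_to_int = {c: i for i, c in enumerate(categories)}


def one_hot(label: str, categories: list[str]) -> list[int]:
    """One binary column per known category; exactly one is 1."""
    return [1 if label == c else 0 for c in categories]


int_encoded = [label_to_int[p] for p in purposes]
rows = [one_hot(p, categories) for p in purposes]
print(categories)
print(int_encoded)
print(rows)
```

In practice, library helpers such as `pandas.get_dummies` or scikit-learn's `OneHotEncoder` do the same job. The encoder has the advantage of remembering the category list from training and, with `handle_unknown="ignore"`, can map a purpose never seen during training to all zeros.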
My main question: what is the name of this data problem, and what are some of the more recent thoughts on converting text labels into numerical representations without inadvertently adding bias to my network?