I have a dataset where 99% of the transactions are non-fraudulent and only 1% are fraudulent, with more than 150 features. Can I leverage deep learning techniques? If yes, how?
@saurabhjha21 I would say it depends on how much data you have. If your data is big enough, then yes, as long as you deal with the unbalanced-classes problem. (You can search this forum for threads about unbalanced data and best practices.)
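One common way to "solve the unbalanced classes problem" in a neural network is to weight the loss per class. Below is a minimal NumPy sketch, assuming binary labels with 1 = fraud. It uses the same inverse-frequency formula as scikit-learn's `class_weight="balanced"` (n_samples / (n_classes * count_per_class)). The function names are my own, illustrative only.

```python
import numpy as np

def balanced_class_weights(y):
    """Inverse-frequency weights: n_samples / (n_classes * count_per_class)."""
    y = np.asarray(y).astype(int)
    counts = np.bincount(y, minlength=2)
    return len(y) / (2 * counts)

def weighted_bce(y_true, p_pred, weights, eps=1e-7):
    """Binary cross-entropy where each example is scaled by its class weight."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    w = np.where(y_true == 1, weights[1], weights[0])
    losses = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.mean(w * losses))

# Toy labels with the 99:1 split from the question (1 = fraud).
y = np.array([0] * 99 + [1])
w = balanced_class_weights(y)
print(w)  # roughly 0.505 for class 0 and 50.0 for class 1
```

With these weights both classes contribute the same total weight (99 × 0.505 ≈ 1 × 50), so the network can't get a low loss just by predicting "not fraud" everywhere. In practice you would pass the equivalent to your framework, e.g. `class_weight` in Keras `Model.fit` or `pos_weight` in PyTorch's `BCEWithLogitsLoss`.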
At the other extreme, if you have really small data… then I don't think you can use deep learning, and maybe not even machine learning, because it will be really difficult to have enough data to properly tune your models and validate their performance. Your best chance in this case would be to try an anomaly detection model.
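The post doesn't name a specific anomaly detection model, so here is one simple option I'm swapping in as an illustration: fit a Gaussian to the non-fraud rows only and flag rows whose squared Mahalanobis distance exceeds a high percentile of the training distances. The 99th-percentile cutoff and the small ridge term are assumptions you'd tune; scikit-learn's `IsolationForest` is a common alternative.

```python
import numpy as np

def fit_gaussian(X_normal):
    """Fit mean and (regularised) inverse covariance on non-fraud rows only."""
    mu = X_normal.mean(axis=0)
    cov = np.cov(X_normal, rowvar=False)
    # Small ridge keeps the covariance invertible when features are correlated.
    cov += 1e-6 * np.eye(cov.shape[0])
    return mu, np.linalg.inv(cov)

def mahalanobis_scores(X, mu, cov_inv):
    """Squared Mahalanobis distance of each row from the normal-data centre."""
    d = X - mu
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

rng = np.random.default_rng(0)
X_normal = rng.normal(size=(1000, 5))  # synthetic stand-in for "good" transactions
mu, cov_inv = fit_gaussian(X_normal)
# Assumed cutoff: anything further out than 99% of the normal rows is suspicious.
threshold = np.percentile(mahalanobis_scores(X_normal, mu, cov_inv), 99)

X_new = np.vstack([np.zeros(5), np.full(5, 6.0)])  # one typical row, one extreme row
flags = mahalanobis_scores(X_new, mu, cov_inv) > threshold
print(flags)  # the typical row should not be flagged, the extreme one should
```

The appeal for tiny datasets is that it only needs the plentiful non-fraud rows to fit; the few fraud cases you do have can be kept entirely for checking the threshold.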
In between these two cases, with medium-sized data… I would say ML yes, but DL is probably difficult.
I would be interested in how this turns out! I have a similar problem I will be looking into at work.
I have heard of cyber-security groups using a three-pass check in production:
- Known bad things (bank account #s, bad emails, etc.)
- Known strange things on a single feature (these overseas banks are riskier, generated)
- Then run a deep learning model at the end
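The three passes above could be wired together roughly like this. It's a sketch under loud assumptions: the blocklists, the "risky country" rule, the block/review/allow outcomes, and `dummy_model` are all hypothetical placeholders, not what those security groups actually run.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical lists; real ones would come from your own investigations.
KNOWN_BAD_ACCOUNTS = {"ACC-0001", "ACC-0666"}
KNOWN_BAD_EMAILS = {"fraudster@example.com"}
RISKY_BANK_COUNTRIES = {"XX", "YY"}

@dataclass
class Transaction:
    account: str
    email: str
    bank_country: str
    features: List[float]  # numeric features fed to the model in pass 3

def screen(tx: Transaction, model_score: Callable[[List[float]], float],
           threshold: float = 0.5) -> Tuple[str, str]:
    # Pass 1: exact matches against known-bad entities.
    if tx.account in KNOWN_BAD_ACCOUNTS or tx.email in KNOWN_BAD_EMAILS:
        return ("block", "pass 1: known bad entity")
    # Pass 2: simple rule on a single feature.
    if tx.bank_country in RISKY_BANK_COUNTRIES:
        return ("review", "pass 2: risky bank country")
    # Pass 3: learned model on whatever survived the rules.
    score = model_score(tx.features)
    if score >= threshold:
        return ("review", f"pass 3: model score {score:.2f}")
    return ("allow", "passed all checks")

def dummy_model(features: List[float]) -> float:
    """Hypothetical stand-in for a trained network's fraud probability."""
    return min(1.0, sum(features) / 10)

print(screen(Transaction("ACC-1234", "a@b.com", "US", [9.0]), dummy_model))
```

One nice property of this ordering is that the cheap, explainable rules catch the obvious cases, so the model only has to learn the subtler patterns.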
Can you suggest some links/papers on imbalanced-data classification problems?
In my empirical experience with ML-based breach detection, one of the easiest things to try is simply cloning fraud records until their count is comparable to that of the “good” records.
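Cloning fraud records like this is usually called random oversampling. A minimal NumPy sketch, assuming `X` is a feature matrix and `y` a 0/1 label array; the function name is my own, and imbalanced-learn's `RandomOverSampler` does the same job if you'd rather use a library.

```python
import numpy as np

def oversample_minority(X, y, minority_label=1, seed=0):
    """Randomly duplicate minority rows until both classes have equal counts."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == minority_label)
    majority_count = int(np.sum(y != minority_label))
    n_extra = majority_count - len(minority_idx)
    if n_extra <= 0:
        return X, y  # already balanced (or minority is larger)
    # Sample existing fraud rows with replacement to create the clones.
    extra = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    rng.shuffle(keep)  # mix clones in with the original rows
    return X[keep], y[keep]

X = np.random.default_rng(1).normal(size=(1000, 150))  # 150 features, as in the question
y = np.array([0] * 990 + [1] * 10)
X_bal, y_bal = oversample_minority(X, y)
print(X_bal.shape, np.bincount(y_bal))
```

Only oversample the training split, after the train/validation split; otherwise copies of the same fraud record end up on both sides and the validation scores look better than they really are.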