[Kaggle] TalkingData AdTracking Fraud Detection Challenge

miwojc · May 8, 2018, 12:25am

Now that the competition has ended, I just wondered if anyone used a DNN for this structured data problem, as taught in Part 1 with the Rossmann example?
Did the DNN score well for you?
It would be great if you could share a notebook or kernel. Thanks!
I couldn’t get proc_df to work, and I also struggled to load the whole 200 million rows of data; the kernel kept restarting, due to running out of memory I believe.
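One way to work around the memory limit is to never load the full file. Instead, stream the CSV row by row and keep only the slice you plan to train on, for example a few million rows from a single day. The sketch below uses only the Python standard library. It assumes that the competition's `train.csv` has a `click_time` column formatted like `2017-11-08 09:30:38` and that the other kept columns are non-negative integers. The `load_day` helper, the `INT_COLS` list and the `max_rows` cap are illustrative assumptions and are not part of fastai. With pandas, a common alternative is to pass `read_csv` a `dtype` mapping with small integer types together with a `usecols` list.

```python
import csv
from array import array
from typing import Dict, TextIO

# Assumed integer columns of the TalkingData train.csv; check against your file.
INT_COLS = ["ip", "app", "device", "os", "channel", "is_attributed"]


def load_day(f: TextIO, day: str, max_rows: int = 3_000_000) -> Dict[str, array]:
    """Stream an open CSV file and keep only rows whose click_time starts with
    `day` (e.g. "2017-11-08"), stopping after `max_rows` matching rows.

    Rows are read one at a time, so only the kept subset is held in memory.
    Each column is stored in an array('I') (unsigned int, typically 4 bytes),
    which is far more compact than a list of Python int objects.
    """
    reader = csv.DictReader(f)
    out = {c: array("I") for c in INT_COLS}
    kept = 0
    for row in reader:
        if not row["click_time"].startswith(day):
            continue  # row from another day: skip it without storing anything
        for c in INT_COLS:
            out[c].append(int(row[c]))
        kept += 1
        if kept >= max_rows:
            break  # stop reading once the subset is full
    return out
```

You would call it as `with open("train.csv", newline="") as f: day8 = load_day(f, "2017-11-08")`. The function still reads past rows from other days, so it saves memory at the cost of time. It does not reduce how much of the file has to be scanned.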

quan.tran · May 13, 2018, 6:44am

I tried the fastai DNN with a custom loss function to deal with this imbalanced dataset, but due to time constraints I couldn’t experiment with it much more. My validation ROC AUC score is 0.973 (trained on 3 million records from day 8 and validated on 2 million records from day 9, with a bunch of new features). You can take a look at my repo here:

github.com/anhquan0412/talkingdata_clickfraud/blob/master/fastai.ipynb

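The post does not say which custom loss was used. The sketch below is therefore only an illustration of one common choice for an imbalanced binary target, and it is not the loss from the notebook. It implements binary cross-entropy in which each positive example's term is multiplied by `pos_weight`, a hypothetical parameter here. A reasonable starting value is the negative-to-positive ratio. The sketch also computes ROC AUC, the validation metric reported above, using the rank-based (Mann-Whitney U) formulation. In that formulation, AUC is the probability that a random positive is scored above a random negative, with ties counting one half. The code is written in plain Python for clarity. In PyTorch, `nn.BCEWithLogitsLoss` takes a `pos_weight` argument that applies the same weighting to logits.

```python
import math
from typing import Sequence


def weighted_bce(y_true: Sequence[int], p_pred: Sequence[float],
                 pos_weight: float = 1.0, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy where each positive example's loss term is
    multiplied by `pos_weight`, so the rare class contributes more."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip probabilities to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic: the probability that a random
    positive is scored higher than a random negative (ties count as 1/2)."""
    # Sort indices by score and give tied groups the average of their ranks.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs both positive and negative labels")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # U statistic for positives
    return u / (n_pos * n_neg)
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75. AUC depends only on how the scores are ordered. Up-weighting positives in the loss therefore changes training, but the metric itself does not need any reweighting.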

miwojc · May 17, 2018, 1:09pm

@quan.tran
Thanks for sharing!