Interpreting Tabular Data Results

peter_stuart_turner · April 1, 2020, 1:22pm

I am currently busy with Part 1, Lesson 4 - the tabular data section.

I was wondering, are there factory methods or standard ways to interpret the results?

Specifically, I want to know which variables are most useful for predicting (or correlate most strongly with) the dependent variable. Apologies in advance if this is a dumb question!

muellerzr · April 1, 2020, 1:30pm

What you want is feature importance, or permutation importance (the algorithm). I have a notebook here detailing it in fastai version 1, and there’s a very long forum thread with a good discussion on it; a quick search should turn it up.

github.com

muellerzr/Practical-Deep-Learning-For-Coders/blob/master/04c_Permutation_Importance.ipynb
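
For readers who want the gist without opening the notebook, here is a minimal sketch of permutation importance in plain NumPy. It is not the notebook's code: the toy data, the least-squares stand-in model, and the names `permutation_importance` and `r2` are assumptions made up for illustration. The idea is to shuffle one column at a time and measure how much the score drops; a bigger drop suggests the model leans on that column more. In practice you would score on a held-out validation set rather than the training data used here.

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean drop in `metric` when each column of X is shuffled (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling breaks the link between this column and the target.
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            drops.append(baseline - metric(y, predict(X_perm)))
        importances[col] = np.mean(drops)
    return importances

def r2(y_true, y_pred):
    """Coefficient of determination; higher is better."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy data (assumption): column 0 matters most, column 1 a little, column 2 not at all.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + data_rng.normal(scale=0.1, size=500)

# Stand-in model (assumption): ordinary least squares with an intercept.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(X_new):
    return np.column_stack([X_new, np.ones(len(X_new))]) @ coef

importances = permutation_importance(predict, X, y, r2)
ranking = np.argsort(importances)[::-1]  # most important column first
print(ranking, importances)
```

Any model that exposes a prediction function can be dropped in place of `predict`, including a trained tabular learner, as long as the metric is one where higher means better.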

peter_stuart_turner · April 1, 2020, 1:42pm

Ah this is awesome, will look into it. Thanks a bunch!

AjayStark · April 1, 2020, 1:42pm

Hi @muellerzr,
How should I deal with very little data?
Example: a Covid-19 dataset.
Suppose I have barely 30 rows. In that case deep learning can’t be used for the tabular data.
What are the other options? XGBoost, random forest, or maybe SGD?

Thanks,

Edit: XGBoost also couldn’t get a decent accuracy.