Low accuracy when using tabular_learner for Kaggle Titanic Competition

f47il · September 29, 2020, 2:13am

This is the result I’m getting: the accuracy always gets stuck at around 60%.

Maybe I’ve chosen the wrong categories?

joedockrill · September 29, 2020, 5:27am

Part of the point of Titanic as a beginner’s competition is learning to clean up the data and engineer features. They’ve messed up the data on purpose, so you can’t just throw it at a model and get a good result.

Take a look at the Titanic forum on Kaggle and some of the notebooks to see what other people are doing.

Start by looking at each column and asking yourself what useful info you can currently get from it, and what info you could get from it. E.g., the ID isn’t going to help you, and the name probably isn’t terribly helpful as is, but extracting the title from it might be.
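As a rough illustration of the title idea, here is a minimal sketch in plain Python. It assumes the standard Kaggle column names `PassengerId` and `Name`, and that names follow the usual "Surname, Title. Given names" pattern. The helpers `extract_title` and `add_title_feature` are hypothetical names made up for this example, not part of any library.

```python
import re

# Titanic names look like "Braund, Mr. Owen Harris": surname, title, given names.
# Assumption: the title is the text between the first comma and the next period.
TITLE_PATTERN = re.compile(r",\s*([^.]+)\.")


def extract_title(name):
    """Return the title found after the comma, or None if the pattern is absent."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else None


def add_title_feature(rows):
    """Return copies of the rows without PassengerId and with a new Title field."""
    cleaned = []
    for row in rows:
        # Drop the ID column, since it carries no information about survival.
        new_row = {key: value for key, value in row.items() if key != "PassengerId"}
        new_row["Title"] = extract_title(row["Name"])
        cleaned.append(new_row)
    return cleaned


# A few rows in the shape of the Kaggle training data (only two columns shown).
sample_rows = [
    {"PassengerId": 1, "Name": "Braund, Mr. Owen Harris"},
    {"PassengerId": 2, "Name": "Cumings, Mrs. John Bradley (Florence Briggs Thayer)"},
    {"PassengerId": 3, "Name": "Heikkinen, Miss. Laina"},
]

features = add_title_feature(sample_rows)
print([row["Title"] for row in features])
```

The resulting `Title` field could then be treated as one more categorical column. Rare titles may need to be grouped into a single bucket before training, which this sketch does not attempt.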

muellerzr · September 29, 2020, 11:20am

I have an intro to pandas notebook which covers some feature generation on the Titanic dataset (but as @joedockrill mentioned, all of it is inspired by Kaggle kernels I read).

github.com

muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Tabular Notebooks/01_Pandas.ipynb


f47il · September 29, 2020, 4:33pm

Thanks for the responses @muellerzr and @joedockrill