Using CNNs for diverse datasets

How can I use CNNs for a general tabular dataset like Titanic?

You could use a CNN with any data, but in general convolutions would not buy you much. The C in CNN refers to the convolutions, which are usually performed over pixels. Pixels have two useful properties: their values all come from the same fixed range, and an object is still the same object regardless of where in the picture it appears (positional, or translation, invariance). I am not sure either of those would be useful for the Titanic dataset.

If what you are after is going from lower-order features to higher-order features, you can just skip the convolutional part and go with a set of fully connected layers, where the input layer connects directly to all the values in an example.
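To make that concrete, here is a minimal sketch of such a fully connected net in plain NumPy. It only shows the forward pass of an untrained network; the shapes (8 examples, 10 encoded features, 16 hidden units, 2 classes) are made up for illustration and are not taken from the actual Titanic data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 8 examples, 10 tabular features (after encoding), 2 classes.
n_examples, n_features, n_hidden, n_classes = 8, 10, 16, 2
X = rng.normal(size=(n_examples, n_features))

# Layer 1: every hidden unit connects to every input feature.
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)

# Layer 2: every output unit connects to every hidden unit.
W2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)


def forward(inputs):
    """Forward pass of a two-layer fully connected net (weights are untrained)."""
    hidden = np.maximum(0.0, inputs @ W1 + b1)  # ReLU activation
    logits = hidden @ W2 + b2
    # Softmax, shifted by the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


probs = forward(X)
print(probs.shape)  # one row of class probabilities per example
```

In practice you would build this with a framework and train it, but the structure is the same: no convolutions, just dense layers on the raw feature vector.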

One other thing that a CNN gives you is weight sharing: if you have a filter of size 3 x 3, then every filter has 9 trainable parameters (10 with bias), assuming a single input channel. If you wanted to connect directly to each pixel in an image (without the weight sharing provided by convolutions), every unit would need image_x_dim * image_y_dim parameters! That is a lot of parameters :) But we do not face this problem with the Titanic dataset, as the number of inputs there is very small (and each feature is qualitatively different from the others, unlike in a set of pixels).
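A quick back-of-the-envelope version of that count, as a sketch. The 224 x 224 image size and the 10 tabular features are assumptions picked for illustration, and the helper names are made up.

```python
def conv_filter_params(kernel_h, kernel_w, in_channels=1, bias=True):
    """Trainable parameters in one convolutional filter (weights are shared across positions)."""
    return kernel_h * kernel_w * in_channels + (1 if bias else 0)


def dense_unit_params(n_inputs, bias=True):
    """Trainable parameters in one fully connected unit (one weight per input)."""
    return n_inputs + (1 if bias else 0)


# One 3 x 3 filter on a single-channel image.
conv = conv_filter_params(3, 3)

# One dense unit wired to every pixel of a hypothetical 224 x 224 grayscale image.
dense_image = dense_unit_params(224 * 224)

# One dense unit wired to a hypothetical 10-feature tabular row.
dense_tabular = dense_unit_params(10)

print(conv, dense_image, dense_tabular)  # 10 50177 11
```

So for a small tabular input, the dense layer is already about as cheap per unit as a conv filter, which is why weight sharing does not buy much here.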

Just a small confirmation: skipping the convolutional part and going with fully connected layers for classification is equivalent to directly applying the ML algorithm on the data as it is, isn't it?

Yes. CNNs are mainly used for image classification. For the Titanic dataset you can use a dense (fully connected) neural network, i.e. a multi-layer perceptron with multiple hidden layers, just like you would use other ML algorithms. Neural nets in general have a lot more parameters to train compared to other ML algorithms.

Hi Sakiran,
could you provide some more details on how you solved this issue?
I am not sure I follow what you mean by applying the ML as it is.

Check out this recently released paper on how to apply CNNs to graph-structured data (and from there to standard regression or classification problems):

https://arxiv.org/abs/1704.08165

Answering my own question:
In the Titanic dataset there are several categorical variables.

In order to use them, I had to one-hot encode them before feeding the resulting dataset to my NN.
There are a few more data prep steps to go through as well:

  • Fill null values
  • Remove non-helpful categorical variables and ids
  • Separate the labels from the features
  • Convert the DataFrame to NumPy arrays and matrices
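For anyone following along, those steps might look roughly like this in pandas. This is only a sketch on a tiny made-up sample with Titanic-style column names, not the real dataset, and the choices of fill values and dropped columns are assumptions rather than exactly what I ran.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample with Titanic-style column names (not the real data).
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Name": ["A", "B", "C", "D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Embarked": ["S", "C", None, "S"],
})

# Fill null values: median for the numeric column, most frequent value for the categorical one.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Remove ids and categorical variables that are unlikely to help as-is.
df = df.drop(columns=["PassengerId", "Name"])

# Separate the labels from the features.
y = df.pop("Survived").to_numpy()

# One-hot encode the remaining categorical variables.
features = pd.get_dummies(df, columns=["Sex", "Embarked"])

# Convert the DataFrame to a NumPy array the network can consume.
X = features.to_numpy(dtype=np.float32)

print(list(features.columns))
print(X.shape, y.shape)
```

The resulting X and y can then be fed to whatever dense network you build.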

Lesson 14 shows how to do this FYI.

Thanks Jeremy.
Have part 2 lessons been released yet?

I don’t seem to be able to find them online.

Found it on YouTube: https://youtu.be/6lTyqrrWVQ0

It looks like the part 2 videos are on a different channel.

They’re linked from #part2
