Non NLP LSTM in fastai v2

JakobV · October 9, 2020, 6:15pm

Hello!
TL;DR: How can I implement an RNN architecture to solve a non-NLP binary classification problem, using the highest-level fastai API possible? Each sample is a series of 8001 integers.

Data examples, formatted in three different ways, can be found here.

Link to notebook here. (You might have to navigate to ‘Untitled.ipynb’ when you open the link)

Introduction
I have been studying machine learning for about a year now through some classes at my university, and this semester I decided that I would try out fastai. I’m currently through the “Practical deep learning for coders” course, and I really like the ideas of the fastai library. I’m looking forward to becoming more skilled in using it.

My question is about advice in one of my projects, and I would be very grateful for any help on how to best go about it.

Question
I’m trying to make a binary classifier that handles one-dimensional series of 8001 integers. Data examples, formatted in three different ways, can be found here. My first goal is to test an LSTM architecture.

I wanted to use the high-level API of fastai, so I tried modifying this example.

from fastai.text.all import *  # also brings pandas in as pd

df = pd.read_csv('dummy_data_like_IMDB.csv')
dls = TextDataLoaders.from_df(df, text_col='text', label_col='label', valid_col='is_valid')
learn = text_classifier_learner(dls, AWD_LSTM)

Q1 I formatted the data to be similar to the example, but I’m getting the error “IndexError: single positional indexer is out-of-bounds” on the TextDataLoaders line. Any suggestions? See the notebook for the full code. (You might have to navigate to ‘Untitled.ipynb’ when you open the link)

Q2 TextDataLoaders builds a vocabulary, which isn’t quite right for this non-NLP task. Any suggestions? The integers in each sample will be within a range of roughly -100 to 10,000.

Q3 I’m not quite comfortable with the lower API levels of fastai yet; honestly, I’m just getting to know the upper levels. If I need to dig deeper to accomplish a non-NLP RNN task, do you have any suggestions on where to start?

text_vocab should include all numbers.

Pomo · October 9, 2020, 6:52pm

Hi Jakob!

This sounds more like sequence classification than text classification. (Unless these integers are tokens with a grammar rather than numbers.)
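
To make the distinction concrete, here is a rough sketch of what treating the samples as numeric sequences could look like in plain PyTorch: each value is fed to the LSTM as a float, so no vocabulary or tokenizer is involved. The class name, hidden size, standardization step, and random stand-in data are all assumptions for illustration; only the series length of 8001 and the approximate value range come from the question.

```python
import torch
import torch.nn as nn

SEQ_LEN = 8001  # series length from the question; everything else is assumed


class SeqLSTMClassifier(nn.Module):
    """Hypothetical model: an LSTM over a 1-D numeric series plus a linear head."""

    def __init__(self, hidden_size=64, num_layers=1, n_classes=2):
        super().__init__()
        # input_size=1 because each time step carries a single number
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # x: (batch, seq_len) floats -> (batch, seq_len, 1)
        x = x.unsqueeze(-1)
        _, (h_n, _) = self.lstm(x)
        # h_n[-1] is the last layer's final hidden state: (batch, hidden_size)
        return self.head(h_n[-1])


def standardize(x, mean, std):
    """Scale raw values to roughly zero mean and unit variance."""
    return (x - mean) / std


torch.manual_seed(0)
# Random stand-in for real samples, drawn from the value range in the question
raw = torch.randint(-100, 10_000, (4, SEQ_LEN)).float()
x = standardize(raw, raw.mean(), raw.std())

model = SeqLSTMClassifier()
logits = model(x)
print(logits.shape)  # torch.Size([4, 2])
```

A model like this should be usable from fastai by wrapping the tensors in PyTorch datasets, building `DataLoaders.from_dsets(train_ds, valid_ds, bs=...)`, and passing both to `Learner(dls, model, loss_func=CrossEntropyLossFlat(), metrics=accuracy)`, though that path is untested against the data in this thread. Note that 8001 time steps is long for an LSTM, so downsampling or windowing the series first may be worth trying.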

A better place to find answers would be

JakobV · October 10, 2020, 8:22am

Thank you! I’ll tag my post to see if the admins want to remove it from this part of the forum, and I’ll post my question in the thread you suggested.