Time series / sequential data study group

JakobV · October 12, 2020, 8:13am

Hello!

I’m trying to use the AWD-LSTM on my own dataset, but I’m having problems with TextDataLoaders. I studied this example and formatted my own data to match it.

In my Paperspace notebook, I first test everything with a slightly modified version of the example, and it works fine.

But when I generate my own data, formatted in the same way, I get “Could not do one pass in your dataloader, there is something wrong in it”.

dls.one_batch() results in “AttributeError: ‘tuple’ object has no attribute ‘shape’”.

Any suggestions?

# Imports
import pandas as pd
from fastbook import *
from fastai.text.all import *

# Create a local CSV copy of the IMDB sample
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')
df = df.drop(['is_valid'], axis=1)
df.to_csv('imdb_sample.csv', sep=',', index=False)

# Working example
df = pd.read_csv('imdb_sample.csv')
dls = TextDataLoaders.from_df(df, text_col='text', label_col='label')
learn = text_classifier_learner(dls, AWD_LSTM)

# Functions for dataset generation
import random

def int_series_as_str(n, a, b):
    # Return n space-separated random integers drawn from [a, b]
    return ' '.join(str(random.randint(a, b)) for i in range(n))


def create_csv_data(n_samples, n_ints, rng_frm, rng_to, classes):
    # Header row, then one 'label,"data_start <ints> data_end"' row per sample
    rows = ['label,text']
    for i in range(n_samples):
        label = random.choice(classes)
        series = int_series_as_str(n_ints, rng_frm, rng_to)
        rows.append(label + ',"data_start ' + series + ' data_end"')
    # Join the rows without a trailing newline
    return '\n'.join(rows)

# Generate data and load it
n_samples = 1000
n_ints = 100
rng_frm = -10
rng_to = 100
classes = ['negative', 'positive']

d = create_csv_data(n_samples, n_ints, rng_frm, rng_to, classes)
file_name = "data.csv"
with open(file_name, "w") as text_file:
    text_file.write(d)

df = pd.read_csv(file_name)
dls = TextDataLoaders.from_df(df, text_col='text', label_col='label')
learn = text_classifier_learner(dls, AWD_LSTM)

# Try one batch (this is the call that raises the AttributeError)
dls.one_batch()
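To rule out a plain CSV formatting problem, here is a minimal sketch that checks the generated data with pandas alone, before fastai is involved. It only confirms that the file parses into the expected `label` and `text` columns, so it does not explain the error by itself. The helper name `check_generated_csv` is hypothetical, and the generator is a compact restatement of the one above so the snippet runs without the rest of the notebook.

```python
import io
import random

import pandas as pd


def int_series_as_str(n, a, b):
    # n space-separated random integers drawn from [a, b]
    return ' '.join(str(random.randint(a, b)) for _ in range(n))


def create_csv_data(n_samples, n_ints, rng_frm, rng_to, classes):
    # Same 'label,text' layout as the generator in this post
    rows = ['label,text']
    for _ in range(n_samples):
        series = int_series_as_str(n_ints, rng_frm, rng_to)
        rows.append(f'{random.choice(classes)},"data_start {series} data_end"')
    return '\n'.join(rows)


def check_generated_csv(csv_text, classes):
    # Hypothetical helper: parse the CSV text the way pd.read_csv would
    # parse the saved file, then summarise what pandas sees.
    df = pd.read_csv(io.StringIO(csv_text))
    return {
        'shape': df.shape,
        'columns': list(df.columns),
        'labels_ok': set(df['label']) <= set(classes),
        'text_is_str': all(isinstance(t, str) for t in df['text']),
        'tokens_per_row': sorted(set(len(t.split()) for t in df['text'])),
    }


classes = ['negative', 'positive']
report = check_generated_csv(create_csv_data(1000, 100, -10, 100, classes), classes)
print(report)
```

If every field in the report looks as expected, the CSV itself is probably fine and the issue is more likely in how the text is processed after loading.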

I originally posted this here with more background information, but flagged that post after it was suggested that I move it to this thread. I edited the post to focus on the main issue I’m having, but I’ll gladly elaborate on the project here if my other post gets removed and that would be helpful.