Hello!
I’m trying to use the AWD-LSTM on my own dataset, but I’m having problems with TextDataLoaders. I studied this example and formatted my own data to match it.
In my Paperspace notebook, I first tested everything with a slightly modified version of the example, and it worked fine.
But when I generate my own data, formatted the same way, I get “Could not do one pass in your dataloader, there is something wrong in it”.
dls.one_batch() raises “AttributeError: ‘tuple’ object has no attribute ‘shape’”.
Any suggestions?
# Imports
import pandas as pd
from fastbook import *
from fastai.text.all import *
# Create a local csv file of the IMDB example
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')
df = df.drop(['is_valid'], axis=1)
df.to_csv('imdb_sample.csv', sep=',', index=False)
# Working example
df = pd.read_csv('imdb_sample.csv')
dls = TextDataLoaders.from_df(df, text_col='text', label_col='label')
learn = text_classifier_learner(dls, AWD_LSTM)
# Functions for dataset generation
import random

def int_series_as_str(n, a, b):
    # Space-separated random integers in [a, b]; note this yields n + 1 values
    r = str(random.randint(a, b))
    for i in range(n):
        r += ' ' + str(random.randint(a, b))
    return r

def create_csv_data(n_samples, n_ints, rng_frm, rng_to, classes):
    # Header row, then rows of: label,"data_start <ints> data_end"
    data = 'label,text\n'
    for i in range(n_samples-1):
        data += random.choice(classes) + ','
        data += '"data_start ' + int_series_as_str(n_ints, rng_frm, rng_to) + ' data_end"' + '\n'
    # Last row: written without quotes and without a trailing newline
    data += random.choice(classes) + ','
    data += 'data_start ' + int_series_as_str(n_ints, rng_frm, rng_to) + ' data_end'
    return data
# Generate data and load it
n_samples = 1000
n_ints = 100
rng_frm = -10
rng_to = 100
classes = ['negative', 'positive']
d = create_csv_data(n_samples, n_ints, rng_frm, rng_to, classes)
file_name = "data.csv"
with open(file_name, "w") as text_file:
    text_file.write(d)
df = pd.read_csv(file_name)
dls = TextDataLoaders.from_df(df, text_col='text', label_col='label')
learn = text_classifier_learner(dls, AWD_LSTM)
# Try one batch
dls.one_batch()
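To rule out the CSV file itself, one option is to generate it with Python’s standard csv module and read it back with a few consistency checks. This is only a sketch under my own assumptions: write_csv_data and check_csv_data are hypothetical helper names (not part of fastai or pandas), this version writes exactly n_ints integers per row (the loop above writes n_ints + 1), and a clean result only shows the file is well-formed. It does not explain the one_batch() error.

```python
import csv
import io
import random


def int_series_as_str(n, a, b):
    """Return exactly n random integers in [a, b], separated by single spaces."""
    return ' '.join(str(random.randint(a, b)) for _ in range(n))


def write_csv_data(f, n_samples, n_ints, rng_frm, rng_to, classes):
    """Write a 'label,text' header and n_samples rows to the file-like object f.

    csv.writer handles quoting, so every row is formatted the same way.
    """
    writer = csv.writer(f)
    writer.writerow(['label', 'text'])
    for _ in range(n_samples):
        text = ('data_start '
                + int_series_as_str(n_ints, rng_frm, rng_to)
                + ' data_end')
        writer.writerow([random.choice(classes), text])


def check_csv_data(f, n_samples, n_ints, classes):
    """Read the CSV back from f and return a list of problems (empty if none)."""
    reader = csv.DictReader(f)
    if reader.fieldnames != ['label', 'text']:
        return [f'unexpected header: {reader.fieldnames}']
    problems = []
    rows = list(reader)
    if len(rows) != n_samples:
        problems.append(f'expected {n_samples} rows, got {len(rows)}')
    for i, row in enumerate(rows):
        if row['label'] not in classes:
            problems.append(f'row {i}: unknown label {row["label"]!r}')
        # n_ints numbers plus the data_start and data_end markers
        n_tokens = len(row['text'].split())
        if n_tokens != n_ints + 2:
            problems.append(f'row {i}: {n_tokens} tokens, expected {n_ints + 2}')
    return problems


if __name__ == '__main__':
    classes = ['negative', 'positive']
    buf = io.StringIO()
    write_csv_data(buf, 1000, 100, -10, 100, classes)
    buf.seek(0)  # rewind before reading the buffer back
    print(check_csv_data(buf, 1000, 100, classes))  # → []
```

To write a real file instead of an in-memory buffer, open it with open('data.csv', 'w', newline='') (the csv module documentation recommends newline='' for file objects) and pass that file object to write_csv_data; pd.read_csv can then load it as before.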
I originally posted this here with more background information, but it was suggested that I move it to this thread, so I flagged the original post. I edited this version to focus on the main issue I’m having, but I’m happy to share more details about the project here if my other post gets removed and that would help.