I thought I’d check to see if there was a way of creating the requisite Datasets/DataLoaders via the Data Block API before embarking on a custom approach.
Thanks
I’m also trying to wrap my head around this. No working code at the moment, but I guess that the target sequence should end up in the labels. Still digging through the ItemList and fastai.text.* code, trying to understand how things should fit together.
Yah. A LabelList derives from Dataset and accepts its x and y arguments as ItemList objects … so I think you’re right to look there.
I’m going to play with this today though I fear there will probably be some easier way to do this in a few weeks :). Essentially I’m going to try the following:
Pre-split my training data into two separate DataFrames, one for training and one for validation.
Create two TextList objects for each dataset, one for my input sequences and one for my output sequences.
Create a LabelList for each pair of TextList objects, from which I’ll create a LabelLists object.
From there, I’m hoping to use the DataBlock API mechanism to build my DataLoaders via the call to .databunch on my LabelLists object.
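To make the shape of that plan concrete, here is a minimal framework-free sketch in plain Python. It is not fastai code and does not reproduce the linked solution: PairedDataset only plays the role a LabelList would (parallel x and y sequences), and the toy rows, the whitespace tokenizer, the "xxpad" padding string and the helper names are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

# Hypothetical toy rows: (input text, target text, is_valid flag).
ROWS = [
    ("hello world", "bonjour le monde", False),
    ("good morning", "bonjour", False),
    ("thank you", "merci", True),
]


@dataclass
class PairedDataset:
    """Stand-in for a LabelList: x and y are parallel lists of token sequences."""
    x: List[List[str]]
    y: List[List[str]]

    def __len__(self) -> int:
        return len(self.x)

    def __getitem__(self, i: int) -> Tuple[List[str], List[str]]:
        return self.x[i], self.y[i]


def tokenize(text: str) -> List[str]:
    # Naive whitespace tokenizer, used here only to keep the sketch short.
    return text.lower().split()


def build_dataset(rows) -> PairedDataset:
    # One list for the input sequences, one for the output sequences.
    return PairedDataset(
        x=[tokenize(src) for src, _, _ in rows],
        y=[tokenize(tgt) for _, tgt, _ in rows],
    )


def pad_batch(seqs: List[List[str]], pad: str = "xxpad") -> List[List[str]]:
    # Right-pad every sequence to the length of the longest one in the batch.
    width = max(len(s) for s in seqs)
    return [s + [pad] * (width - len(s)) for s in seqs]


def batches(ds: PairedDataset, bs: int) -> Iterator[Tuple[List[List[str]], List[List[str]]]]:
    # Yield (inputs, targets) batches; each side is padded independently.
    for start in range(0, len(ds), bs):
        items = [ds[i] for i in range(start, min(start + bs, len(ds)))]
        xs, ys = zip(*items)
        yield pad_batch(list(xs)), pad_batch(list(ys))


# Step 1: pre-split into training and validation rows.
train_rows = [r for r in ROWS if not r[2]]
valid_rows = [r for r in ROWS if r[2]]

# Steps 2 and 3: one paired dataset per split.
train_ds, valid_ds = build_dataset(train_rows), build_dataset(valid_rows)

# Step 4: batches over a split, the job a DataLoader would do.
first_x, first_y = next(batches(train_ds, bs=2))
```

The point of the sketch is only that inputs and targets travel together through splitting and batching, and that each side is padded to its own longest sequence, since source and target lengths generally differ in seq2seq tasks.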
Will update when I find success.
SOLVED
There may be a better way already, or else something in the framework’s pipeline, to make the process of preparing datasets/dataloaders for sequence-to-sequence tasks more generic (mine here is specifically for text), but this seems to work just fine.
Create seq2seq friendly datasets using the fast.ai DataBlock API
If folks find any issues or can suggest any improvements, I’d love to hear them. I’m sure such insights would be beneficial to all here on the forums as well.
Thanks, will check this