Using Text Siamese Networks for SNLI

Hi All,

I am currently attempting to use fastai v2 to train a Siamese network on the SNLI task. I am using the Siamese tutorial, which uses images, as a template. However, I have hit a couple of roadblocks in transferring the tutorial to textual data. The current issue I am facing is getting the data into the (premise, hypothesis, label) format that the SNLI task comes in.

The most successful attempt is located in this google colab notebook: https://drive.google.com/file/d/1XmANd5SOWUpx0BAyCRyUpXABd6HCVSm6/view?usp=sharing

The issue comes when attempting to generate a batch using the dataloaders.one_batch function, which gives the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-97-ccb93b9fbe07> in <module>()
----> 1 dls.one_batch()

12 frames
/usr/local/lib/python3.6/dist-packages/fastai2/torch_core.py in _f(self, *args, **kwargs)
    269         def _f(self, *args, **kwargs):
    270             cls = self.__class__
--> 271             res = getattr(super(TensorBase, self), fn)(*args, **kwargs)
    272             return retain_type(res, self)
    273         return _f

ValueError: only one element tensors can be converted to Python scalars

I’ve tried to localize exactly which call is triggering this issue and what data is being produced, but I haven’t been able to track it down.
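For what it’s worth, the ValueError itself can be reproduced in isolation: it is what PyTorch raises when a tensor with more than one element is coerced to a Python scalar, which suggests something in the batching/collation path is treating a whole sequence tensor as a single value. A minimal sketch (not from the notebook, just an illustration of the message):

```python
import torch

# A multi-element tensor, like one of the tokenized sentences.
t = torch.tensor([11, 72, 27])

try:
    int(t)  # same as t.item() here: only valid for 1-element tensors
except ValueError as e:
    print(e)  # only one element tensors can be converted to Python scalars
```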

I see that the underlying data is correct by looking at the dataset within the dataloader:
((TensorText([11, 72, 27, 14, 10, 34, 73, 15, 45, 18, 74, 16, 29, 37, 35, 9]),
TensorText([19, 0, 17, 24, 10, 34, 9])),
TensorCategory(0))

which shows the two sentences and the label (though I would have expected the two TensorTexts to be of type TextTuple; see the Google Colab).

I’ve also looked through previous attempts to use siamese networks for textual data in fastai, but they are using a previous fastai version.

Any help would be greatly appreciated!

Hi Nathan,

I have something that mostly “works”. I’m getting a weird CUDA error during training that I’ve asked for some help on here. I haven’t been able to figure it out, but it does train on CPU, and I am trying to get it working on TPU based on this thread.

The Colab notebook is available here: https://colab.research.google.com/drive/1uK-Wna2Uo9JRx7RqCIrl_W_yVt2tDiBD#offline=true&sandboxMode=true

So far the results are not great, only about 68-70% accuracy. I’m hoping that once I get the training sorted out, I can work on different methods to improve the accuracy.

Hope it can be of some help to you.
Cheers,
Norman

Hey @nsecord, thanks for the help with your version, I will definitely check it out! I went ahead and requested access to the Google Colab notebook, so please accept the request whenever you have time.

I’ll also look into getting the GPU option working and will keep you updated on any progress I make.

@nsecord, I got your example working! Thanks again, Norman. I’m not 100% sure what changed when I slightly modified your version to work with the way I load the SNLI data, but I was able to get it training on a GPU without changing any of your code (it could also be that I am installing fastai v2 from source instead of pip). Here is a version that trained very quickly on a GPU: https://drive.google.com/file/d/1s8QhcwugSQd1ruqOcP6W5bc0JNuUxXc1/view?usp=sharing

Hi Nathan, that’s interesting. I’ll have to check what might have been the problem with the data. The error rate is still not very good; I’m not sure what to try next to improve the model.
Cheers, Norman

Hi Nathan,

I checked your code and you have an error at the very beginning, when you create the dataset. Your code assumes that when you run the loop over the splits of the dataset, what you get out will be the train, validation, and test sets, in that order. However, it doesn’t actually come out that way. If you add a print statement to see which split is which, you will see that the order is test, validation, and then train (at least that is what I got).

dataset = nlp.load_dataset('snli')
dfs = []
for split in dataset:  # iteration order is not train/validation/test!
    dfs.append(pd.DataFrame(
        zip(dataset[split]['premise'], dataset[split]['hypothesis'],
            dataset[split]['label']),
        columns=['premise', 'hypothesis', 'label']))

trn_df, val_df, tst_df = dfs
trn_df.head()

This is why the training goes so quickly: you are only using 10,000 samples for training instead of the actual ~550k samples provided.
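To make the pitfall concrete, here is a plain-Python sketch; the dict is just a stand-in for the loaded dataset, with split names and rough sizes mimicking SNLI:

```python
# Stand-in for the loaded dataset: iteration order here is
# test/validation/train, not the train/validation/test you might expect.
dataset = {
    "test": ["row"] * 10_000,
    "validation": ["row"] * 10_000,
    "train": ["row"] * 550_000,
}

# Positional unpacking silently mislabels the splits:
trn_wrong, val_wrong, tst_wrong = [dataset[split] for split in dataset]
print(len(trn_wrong))  # 10000 -- this is really the test split

# Indexing by split name is order-independent:
print(len(dataset["train"]))  # 550000
```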

You should probably do something like the following to ensure you are picking up the correct dataset:

trn_df = pd.DataFrame(dataset['train'], columns=['premise', 'hypothesis', 'label'])
val_df = pd.DataFrame(dataset['validation'], columns=['premise', 'hypothesis', 'label'])
tst_df = pd.DataFrame(dataset['test'], columns=['premise', 'hypothesis', 'label'])

If you do that, unfortunately, you should end up with the same CUDA error I have been experiencing. I’m going to ask on the Pytorch forum if anyone has any idea what might be causing this.

Norman

@nsecord Ah, good catch, thanks Norman! I fixed that problem with the dataset, and then the CUDA error did appear. Have you had any luck figuring it out? And if you don’t mind, could you link me to your PyTorch forum post so I can track it?

@nsecord I figured out the issue: a text input was longer than the seq_len that was defined. I guess the code doesn’t automatically trim off longer sequences. This caused issues when pushing the batches and the model to the GPU, which resulted in the strange error. I fixed it by just increasing the seq_len variable from 72 to 85, because I couldn’t figure out how to force fastai to trim the batches.
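For reference, the trimming I was looking for amounts to something like this; truncate_batch is just an illustrative helper, not a fastai function:

```python
def truncate_batch(seqs, seq_len):
    # Trim each token-id sequence to at most seq_len tokens before batching,
    # so no input exceeds the model's expected sequence length.
    return [seq[:seq_len] for seq in seqs]

batch = [[11, 72, 27, 14, 10, 34, 73], [19, 0, 17, 24]]
print(truncate_batch(batch, 5))  # [[11, 72, 27, 14, 10], [19, 0, 17, 24]]
```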

Here is a complete colab that worked for me: https://drive.google.com/file/d/1vrJbRoidEktoBvNTkW16Qz7G-6YIm7vv/view?usp=sharing

Hello Nathan,

Thanks for the update. I haven’t had much time to devote to this lately, but I will try it out.

Hi Norman,

Did you ever happen to find a fix for this? I’m having the same issue, and I’ve tried increasing the seq_len to no avail.