XNLI English Dataset classification using Fastai v1

The XNLI dataset (https://github.com/facebookresearch/XNLI) covers 15 languages, including English, with 2,490 dev samples and 5,010 test samples per language. While testing an Arabic ULMFiT model on the Arabic rows, I found that the learning rate did not behave well with the classification learner. I then tested the English rows and hit the same issue (less than 50% accuracy). Is this due to the nature of the dataset? Has anyone tried the WT103_1 ULMFiT model on this dataset?
Apparently, combining the two columns (premise, hypothesis) with a plain space, which is what I did, is not the best approach, as discussed here.
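For illustration, here is a minimal sketch of one alternative to the plain-space join: putting an explicit marker token between premise and hypothesis so the model can see where one sentence ends and the other begins. This is only an assumption about what might help, not a confirmed fix. The token `xxsep`, the helper `combine_pair`, and the example rows are all made up for the sketch.

```python
# Hypothetical separator token (an assumption, not part of any library);
# any string that is unlikely to appear in the sentences themselves would do.
SEP = "xxsep"


def combine_pair(premise, hypothesis, sep=SEP):
    """Join premise and hypothesis with an explicit boundary marker."""
    return f"{premise.strip()} {sep} {hypothesis.strip()}"


# Made-up rows in the (premise, hypothesis, label) layout described above.
rows = [
    {"premise": "A man is playing a guitar.",
     "hypothesis": "A person makes music.",
     "label": "entailment"},
    {"premise": "The shop is closed today.",
     "hypothesis": "The shop is open.",
     "label": "contradiction"},
]

# One combined text per row, ready to feed to a text classifier pipeline.
texts = [combine_pair(r["premise"], r["hypothesis"]) for r in rows]
labels = [r["label"] for r in rows]
```

The combined strings would then go into the single text column used to build the classifier data, in place of the space-joined version.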
Update: I uploaded the kernel xnli_en classification. Its performance is very poor for reasons I don't understand. Most likely I am missing something: I expected much better results, though not very high ones, given the difficulty of the task and the limited training set.