Hi,
I want to fit an LSTM for intent classification using the pretrained Dutch RoBERTa model.
I installed fastai v2.3.0 on AWS and loaded the Dutch RoBERTa language model:
#Step 1: download the Dutch RoBERTa model
from transformers import RobertaTokenizer, RobertaForSequenceClassification
dtokenizer = RobertaTokenizer.from_pretrained("pdelobelle/robbert-v2-dutch-base")
dmodel = RobertaForSequenceClassification.from_pretrained("pdelobelle/robbert-v2-dutch-base")
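Step 1 runs fine. For example, a quick sanity check like the following (with an arbitrary Dutch sentence) prints the sub-word tokens:

#quick check that the downloaded tokenizer works
print(dtokenizer.tokenize("Dit is een voorbeeldzin."))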
When I then follow the steps in the article Using RoBERTa with fast.ai for NLP | by Dev Sharma | Analytics Vidhya | Medium to build a FastAI wrapper around the Transformers RobertaTokenizer, it gives me a NameError.
#Step 2: build a FastAI wrapper around the Transformers RobertaTokenizer (from: Using RoBERTa with fast.ai for NLP | by Dev Sharma | Analytics Vidhya | Medium)
class FastAiRobertaTokenizer(BaseTokenizer):
    def __init__(self, tokenizer: RobertaTokenizer, max_seq_len: int=128, **kwargs):
        self._pretrained_tokenizer = tokenizer
        self.max_seq_len = max_seq_len
    def __call__(self, *args, **kwargs):
        return self
    def tokenizer(self, t: str) -> List[str]:
        #wrap each sequence in RoBERTa's <s> ... </s> special tokens
        return ["<s>"] + self._pretrained_tokenizer.tokenize(t)[:self.max_seq_len - 2] + ["</s>"]
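(For context: the article then plugs this wrapper into fastai's v1 Tokenizer, roughly like below, but I never get that far because defining the class already fails.)

#next step from the article (fastai v1 API), for context only
fastai_tokenizer = Tokenizer(tok_func=FastAiRobertaTokenizer(dtokenizer, max_seq_len=128), pre_rules=[], post_rules=[])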
Error message:

NameError                                 Traceback (most recent call last)
<ipython-input-...> in <module>
      1 #Step 2: build a FastAI wrapper around the Transformers RobertaTokenizer
----> 2 class FastAiRobertaTokenizer(BaseTokenizer):
      3     def __init__(self, tokenizer: RobertaTokenizer, max_seq_len: int=128, **kwargs):
      4         self._pretrained_tokenizer = tokenizer
      5         self.max_seq_len = max_seq_len

<ipython-input-...> in FastAiRobertaTokenizer()
      6     def __call__(self, *args, **kwargs):
      7         return self
----> 8     def tokenizer(self, t: str) -> List[str]:
      9         return ["<s>"] + self._pretrained_tokenizer.tokenize(t)[:self.max_seq_len - 2] + ["</s>"]

NameError: name 'List' is not defined
How can I solve this? Should I install version 1 of FastAI, or can I solve it differently?
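My own guess is that List comes from Python's standard typing module, so adding an import like this at the top might already fix the NameError:

#possible fix: List is part of the standard typing module
from typing import List

But I'm not sure whether the rest of the article's fastai v1 code (e.g. BaseTokenizer) will then still work on fastai v2.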
Thanks,
Wendy