I’m looking through https://docs.fast.ai/ to figure out how to use the fastai utilities we’ve been learning about with my own PyTorch model (e.g., how to turn it into a Learner), and I’m not finding anything on that.
I see the occasional forum post using code that no longer works with v1 of fastai…
…I feel like I’m missing something big by not turning up anything significant in my searches. I really did search before posting, so apologies if I’ve overlooked it; presumably people do this all the time. Can anyone point me to the right place in the docs, or to an up-to-date explanation? Thanks.
Update 1 See the docs for the Learner class: “Train model using data to minimize loss_func with optimizer opt_func.” To use your own PyTorch Module with a Learner, it looks like you just pass it in as the second argument when you instantiate the Learner class.
from fastai import *
from fastai.text import *
# using the BertModel class with Google AI's pre-trained BERT base uncased model;
# see the PyTorch BERT README for details
from pytorch_pretrained_bert import BertModel
# load the pre-trained model (weights); BertModel subclasses torch.nn.Module
model = BertModel.from_pretrained('bert-base-uncased')
# create a DataBunch (here, path points at your text dataset folder)
data = TextDataBunch.from_folder(path)
# define a Learner object; this is where the basic training loop is defined
learn = Learner(data, model)
# train for 10 epochs
learn.fit(10)
I still have to play around with BERT, but I suspect that for BERT the standard text dataset and/or training loop won’t do it, since BERT is trained with masked language modeling and next-sentence prediction objectives, which are quite different from the traditional approaches.
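To make the difference concrete, here is a toy sketch (plain Python, not fastai or BERT code) of what the masked-language-model objective looks like: a fraction of tokens is replaced with a [MASK] token, and the model is trained to predict the originals at exactly those positions. The function and masking rate below are illustrative only.

import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Toy BERT-style masking: replace a fraction of tokens with [MASK]
    and keep the originals as the prediction targets."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model must recover this token
        else:
            masked.append(tok)
            labels.append(None)  # no loss is computed at this position
    return masked, labels

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=42)
print(masked)
print(labels)

This is quite different from the next-word-prediction loss that the standard language-model training loop assumes, which is why the stock pipeline probably won’t transfer directly.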
Right now it looks like only the Wikitext-103 and the OpenAI_Transformer (not sure whether this is GPT-1 or GPT-2) pre-trained models are available. TransformerXL does not have any saved model.
The models are pulled in via the get_language_model method. I just don’t know how much of the logic in that method would need to change for different pre-trained models… or maybe it is as simple as saving a new pre-trained model (e.g. bert_base_uncased) and updating the dictionary?
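If the dictionary in question is a registry mapping architecture names to pre-trained weights and configs, then adding a new entry might look something like the sketch below. All names, keys, and URLs here are hypothetical placeholders illustrating the pattern, not fastai’s actual internals.

# hypothetical registry mapping an architecture name to its pre-trained
# weights and config; illustrates the pattern, not fastai's real internals
model_meta = {
    'wikitext103': {'url': 'https://example.com/wt103', 'config': {'emb_sz': 400}},
    'openai_transformer': {'url': 'https://example.com/openai', 'config': {'emb_sz': 768}},
}

def register_pretrained(name, url, config):
    """Add a new pre-trained model entry to the registry."""
    model_meta[name] = {'url': url, 'config': config}

# e.g. registering BERT base uncased alongside the existing entries
register_pretrained('bert_base_uncased',
                    'https://example.com/bert_base_uncased',
                    {'emb_sz': 768})

print(sorted(model_meta))

Whether this is actually enough would depend on how much of get_language_model assumes an AWD-LSTM-style architecture.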