How to do generation with forward and backward wiki103 without fine-tuning?

I want to use both the forward and backward wiki103 models without fine-tuning to do generation, something similar to the example below:

TEXT = "I liked this movie because"
N_WORDS = 40
N_SENTENCES = 2
preds = [learn.predict(TEXT, N_WORDS, temperature=0.75) 
         for _ in range(N_SENTENCES)]

In https://github.com/fastai/fastbook/blob/master/02_production.ipynb it mentions that exporting a model “even saves the definition of how to create the Dataloaders [which] (…) is important because otherwise you’d have to redefine how to transform your data in order to use your model in production.”
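For reference, the round trip that notebook describes looks like this (export.pkl is fastai’s default filename):

learn.export()  # serializes the Learner together with the DataLoaders definition
learn_inf = load_learner(path/'export.pkl')  # restores it for inference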

So my expectation was that, after downloading the WIKI103_BWD and WIKI103_FWD models, I’d only have to load them with something like

learn_inference_bwd = load_learner(<path to wiki103 bwd model>)
learn_inference_fwd = load_learner(<path to wiki103 fwd model>)

and then use them for generation as in the code shown above. Is there a way of making this work as simply as it sounds? So far, I haven’t managed…

I’ve seen examples (e.g., ULMFit without finetuning) where the suggested way to get the learner is something like:

preTrainedWt103Path = DATA_PATH/'models/wt103'
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5, pretrained=False)
learn.load_pretrained(wgts_fname=preTrainedWt103Path/'fwd_wt103.h5', itos_fname=preTrainedWt103Path/'itos_wt103.pkl', strict=False)

Somewhere else, in the current docs (https://docs.fast.ai/tutorial.text.html#The-ULMFiT-approach), I can find something similar, where I’d need a ‘language_model_learner’:

learn = language_model_learner(dls_lm, AWD_LSTM, metrics=[accuracy, Perplexity()], path=path, wd=0.1).to_fp16()

But, to get any of these, I’d need dataloaders, which from the page above would be something like:

dls_lm = TextDataLoaders.from_folder(path, is_lm=True, valid_pct=0.1)

These seem like what you’d do when you want to (further) train an existing model on more data, so my question is: why do I need DataLoaders (pointing to a path) when I’m just using a pre-trained model for some basic generation?

If I do need DataLoaders, what should I point them to when I’m not really going to do any fine-tuning? Do you have a snippet of code that loads, for instance, the WIKI103_BWD model and does straight generation from a text prompt?

Any help would be very welcome!

You don’t. You really shouldn’t use the fastai API for this, or at least not the Learner class. What you need to do is get the vocab associated with those original weights and build a simple TextDataLoaders with that vocab. We need a DataLoader of some form here because text models expect the data to be tokenized, numericalized, and turned into a tensor: I can’t simply pass “My name is” to my model; it has to be something like tensor([102, 100, 90]) (just as an example). From there you would grab a batch of data and pass it to your loaded model. Below is some pseudo code of what that would look like, followed by a more concrete sketch:

# pseudo code: load your pretrained weights however you stored them
net = torch.load(mypretrainedweights)

# use the high-level DataBlock API
block = DataBlock(blocks=TextBlock.from_blah...)
# make sure that your splitter of choice is something like a no-split
dls = block.dataloaders(blah)

(See no split code here: SOLVED:NOT splitting datablock)
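To make that concrete (and to show the tokenization point from above), here is a minimal sketch. Everything about the download is an assumption: I’m guessing that untar_data(URLs.WT103_FWD) unpacks to a folder containing an itos_wt103.pkl vocab file, so check the actual filenames you get:

import pickle
import pandas as pd
from fastai.text.all import *

# Assumption: the WT103 download contains the vocab as a pickled itos list;
# adjust the path/filename to whatever untar_data actually gives you.
model_path = untar_data(URLs.WT103_FWD)
vocab = pickle.load(open(model_path/'itos_wt103.pkl', 'rb'))  # index -> token

# A one-row dataframe holding the prompt: we only need it so fastai will
# tokenize and numericalize with the *pretrained* vocab.
df = pd.DataFrame({'text': ["I liked this movie because"]})

block = DataBlock(
    # pass the pretrained vocab so the indices match the weights;
    # a short seq_len so a single short prompt still yields a batch
    blocks=TextBlock.from_df('text', is_lm=True, vocab=vocab, seq_len=4),
    get_x=ColReader('text'),
    splitter=IndexSplitter([]))  # empty validation set, i.e. a “no split”
dls = block.dataloaders(df, bs=1)

x, _ = dls.one_batch()
print(x)                                  # a tensor of vocab indices
print(' '.join(vocab[i] for i in x[0]))   # and back to (tokenized) text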
Then, for doing inference, use raw PyTorch:

x, _ = dls.one_batch()
net.eval()   # disable dropout
net.cuda()   # move the model to the GPU
with torch.no_grad():
    pred = net(x.cuda())  # the batch must be on the same device as the model

You’d still need to decode that output, but that should get you most of the way there.
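Filling in the remaining gaps, here is a hedged sketch of that decoding step, continuing from the names above (model_path, vocab, dls). Two assumptions to verify against your download: the .pth file is a state dict (possibly nested under a 'model' key) rather than a pickled module, and the raw AWD-LSTM language model returns a tuple whose first element holds the logits:

# Assumption: build the architecture first, then load the state dict into it,
# since the published weights are (most likely) not a pickled nn.Module.
net = get_language_model(AWD_LSTM, len(vocab))
wgts = torch.load(model_path/'lstm_fwd.pth', map_location='cpu')  # hypothetical filename
if 'model' in wgts: wgts = wgts['model']  # some fastai checkpoints nest the weights
net.load_state_dict(wgts)

net.eval(); net.cuda()
idxs = dls.one_batch()[0].cuda()  # the prompt as vocab indices, shape (1, seq_len)

with torch.no_grad():
    for _ in range(40):  # generate 40 tokens greedily
        net.reset()      # clear the LSTM hidden state before re-feeding the full sequence
        out = net(idxs)
        logits = out[0] if isinstance(out, tuple) else out  # logits in the first element
        next_idx = logits[0, -1].argmax()  # greedy: most likely next token
        idxs = torch.cat([idxs, next_idx.view(1, 1)], dim=1)

print(' '.join(vocab[i] for i in idxs[0]))  # indices back to tokens

Swapping the argmax for multinomial sampling over the logits divided by a temperature would get you closer to what learn.predict(TEXT, N_WORDS, temperature=0.75) does.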