Custom Corpus for a custom language model

NavneetSajwan · September 13, 2020, 1:35pm

So, fastai uses the WikiText corpus for language modeling. I want to extract the text of Wikipedia articles from one specific category so that I can fine-tune my language model on them. Has anyone done something like this before? Or is there a package that does this?

mrfabulous1 · September 15, 2020, 12:47pm

Hi NavneetSajwan, hope all is well!

Wikipedia offers free copies of all available content to interested users

Maybe the page above can help you extract the category you are looking for.
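If a full dump is more than you need, a lighter option for a single category might be the MediaWiki API. Below is a rough sketch, not a tested pipeline: it assumes the English Wikipedia endpoint, uses `list=categorymembers` to list the pages in a category and `prop=extracts` with `explaintext` to get plain text. The function names, the `fetch` parameter and the User-Agent string are made up for illustration, and it only looks at direct members of the category (no subcategories).

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; swap the host for another language edition.
API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Send a GET request to the API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent; replace it with something that identifies you.
    request = urllib.request.Request(
        url, headers={"User-Agent": "corpus-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_category_titles(category, fetch=fetch_json):
    """Yield the titles of articles that are direct members of a category."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmtype": "page",   # skip subcategories and files
        "cmlimit": "500",
    }
    while True:
        data = fetch(params)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        # Feed the continuation values back in to get the next batch.
        params = {**params, **data["continue"]}


def page_text(title, fetch=fetch_json):
    """Return the plain-text extract of one article."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "explaintext": "1",
        "titles": title,
    }
    pages = fetch(params)["query"]["pages"]
    return "\n".join(page.get("extract", "") for page in pages.values())


def build_corpus(category, fetch=fetch_json):
    """Map each article title in the category to its plain text."""
    return {
        title: page_text(title, fetch)
        for title in iter_category_titles(category, fetch)
    }
```

Something like `build_corpus("Machine learning")` would then give you a dict of title to text that you can write out to files or a DataFrame for fastai. It makes one request per article, so for a very large category the dump is probably still the better route.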

Cheers mrfabulous1