So, fastai uses the WikiText corpus for language modeling. I want to extract text from a specific category of Wikipedia articles so that I can fine-tune my language model on them. Has anyone done something like this before? Or is there a package that does this?
Hi NavneetSajwan, hope all is well!
Wikipedia offers free copies of all available content to interested users.
Maybe the above post can help you extract the category you are looking for.
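
If the full dumps are more than you need, another option is to ask the MediaWiki API for the pages in one category and pull their plain text. Below is a minimal sketch of that idea, not a tested pipeline. It assumes the English Wikipedia endpoint and the `list=categorymembers` and `prop=extracts` queries. The function names and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; swap the host for another language edition.
API_URL = "https://en.wikipedia.org/w/api.php"


def category_params(category, cont=None):
    """Query parameters for listing the article pages in one category."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmnamespace": "0",  # main namespace only: skips subcategories and files
        "cmlimit": "500",
    }
    if cont:
        params["cmcontinue"] = cont  # resume token from the previous response
    return params


def extract_params(title):
    """Query parameters for the plain-text content of a single page."""
    return {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "explaintext": "1",  # plain text instead of HTML
        "titles": title,
    }


def member_titles(response):
    """Pull page titles out of a categorymembers response."""
    members = response.get("query", {}).get("categorymembers", [])
    return [m["title"] for m in members]


def page_text(response):
    """Pull the extract text out of an extracts response."""
    pages = response.get("query", {}).get("pages", {})
    return "\n".join(p.get("extract", "") for p in pages.values())


def call_api(params):
    """Send one GET request to the API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent; put your own project name and contact here.
    req = urllib.request.Request(url, headers={"User-Agent": "category-text-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def category_texts(category):
    """Yield (title, text) for every article directly in the category."""
    cont = None
    while True:
        data = call_api(category_params(category, cont))
        for title in member_titles(data):
            yield title, page_text(call_api(extract_params(title)))
        cont = data.get("continue", {}).get("cmcontinue")
        if not cont:
            break
```

You would loop over something like `category_texts("Machine learning")` and write each text to a file for fine-tuning. This only covers pages directly in the category, not its subcategories, and it makes one request per article, so for a large category the dumps are probably the better route.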