How do you save large language datasets?

Beginner questions. First task: I want to create a dataset composed of the text in PDF files linked on a web page. Is there a way to save the text directly to a database, given that it might be cumbersome to keep a large number of PDF files on my computer, so that I can do NLP training on it? Second task: the same context applies, except that each link leads to plain text on the website rather than to a PDF. In both cases I would like to go to a specific website, crawl it page by page, follow the links, download the text from each link, and save the text to a database. Any guidance is appreciated.

Note: I am a total beginner. Thanks.
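
For illustration, here is a minimal sketch of two pieces of the workflow described above: collecting the links from a page and saving extracted text to a database. It assumes Python with only the standard library and uses SQLite as the database. The table name `documents`, its columns, and the helper names `extract_links`, `init_db`, and `save_document` are all hypothetical choices made for this example. Downloading pages and extracting text from PDFs are deliberately left out; PDF extraction would need a third-party library (for example `pypdf`).

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all link targets found in an HTML string, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def init_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "url TEXT PRIMARY KEY, kind TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def save_document(conn, url, kind, body):
    """Store one document's text; re-saving the same URL overwrites it."""
    conn.execute(
        "INSERT OR REPLACE INTO documents (url, kind, body) VALUES (?, ?, ?)",
        (url, kind, body),
    )
    conn.commit()


if __name__ == "__main__":
    # Hard-coded page standing in for a downloaded one (assumption).
    page = '<a href="/report.pdf">Report</a> <a href="notes.html">Notes</a>'
    conn = init_db()  # pass a file name such as "corpus.db" to keep the data
    for link in extract_links(page, "https://example.com/docs/"):
        kind = "pdf" if link.lower().endswith(".pdf") else "html"
        save_document(conn, link, kind, "extracted text would go here")
    for row in conn.execute("SELECT url, kind FROM documents"):
        print(row)
```

Storing only the extracted text, keyed by URL, means the PDF files themselves never need to be kept on disk, and a page that is crawled twice does not produce duplicate rows.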