How would you prepare data if it's in a large document?

I am trying to prep data to fine-tune a model, but the document is a confusing mix of text, tables, and pictures. I could copy and paste the words, but having it all as one large cell in my CSV seems wrong.

I have linked a PDF: 35k words and 60 pages. It is a good example of the kind of data source I am working with.

I know I will summarize the text before feeding it into my extractive QA model, but I do not even know how to prep for summarization. How would you guys do it?
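
For context, the only alternative to the one-giant-cell CSV I have come up with so far is splitting the copied text into overlapping word chunks and writing one chunk per row. This is just a rough sketch of that idea, not something I know to be the right approach: `chunk_words`, `chunks_to_csv`, the 200-word chunk size, and the 50-word overlap are all my own placeholder choices, and it only handles plain text, so tables and pictures would still need separate treatment.

```python
import csv
import io


def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into overlapping chunks of at most chunk_size words.

    chunk_size and overlap are arbitrary starting values (assumptions),
    not tuned for any particular summarization or QA model.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def chunks_to_csv(chunks):
    """Return CSV text with one row per chunk (columns: chunk_id, text)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["chunk_id", "text"])
    for i, chunk in enumerate(chunks):
        writer.writerow([i, chunk])
    return buffer.getvalue()
```

Each row would then be summarized on its own, with the overlap there so a sentence cut at a chunk boundary still appears whole in one of the two neighbors.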