Here is the code from the lecture.
How can I use my own document instead of the IMDB dataset to tokenise it?
My objective is to create a summary of a legal case instead of a movie review, as shown in the lecture.
You need to save your text(s) as `.txt` files, put them in a folder under `path`, and pass the relevant folder name(s) to `get_text_files`. Something like:
from fastai.text.all import *
files = get_text_files('projects/data', folders=['legal_texts'])
If your files aren't `.txt` files, you can use `get_files()` in a similar manner.
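To check which files would actually be picked up before tokenising, here is a rough standard-library sketch of the same kind of filtering. `list_text_files` is a hypothetical helper, not part of fastai. It assumes the layout from the `get_text_files` call (`projects/data/legal_texts/*.txt`) and only approximates fastai's behaviour: it searches recursively, matches extensions case-insensitively, and restricts the search to the named top-level folders.

```python
from pathlib import Path

def list_text_files(path, folders=None, extensions=(".txt",)):
    """Hypothetical helper: recursively collect files under `path` whose suffix
    is in `extensions`, optionally only inside the named top-level `folders`."""
    root = Path(path)
    roots = [root / f for f in folders] if folders else [root]
    exts = {e.lower() for e in extensions}
    found = []
    for r in roots:
        if not r.is_dir():
            continue  # silently skip folder names that don't exist
        found.extend(p for p in r.rglob("*") if p.is_file() and p.suffix.lower() in exts)
    return sorted(found)
```

For example, `list_text_files('projects/data', folders=['legal_texts'])` should list the same `.txt` files you expect `get_text_files` to return. An empty result usually means the path or folder name is wrong.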
Do I have to convert the text into a particular structure before using it, or can I use the legal case text as it is?
So you just have a single text? IMDB is broken up into many reviews. Test what happens when you pass your single text in, then try breaking it up into smaller sections and see what happens.
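One way to try the "break it up" experiment is sketched below. It splits the single legal text on blank lines (paragraphs), greedily groups consecutive paragraphs into chunks of at most roughly `max_words` words, and writes each chunk to its own `.txt` file so the folder looks more like IMDB's many short reviews. The helper names (`chunk_paragraphs`, `write_chunks`), the 200-word default, and paragraph-based splitting are all assumptions. Section headings or numbered clauses in a judgment may make better split points.

```python
import re
from pathlib import Path

def chunk_paragraphs(text, max_words=200):
    """Hypothetical helper: split `text` on blank lines, then group paragraphs
    so each chunk has at most `max_words` words. A single paragraph longer than
    `max_words` is kept whole rather than cut mid-sentence."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current, count = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        if current and count + n > max_words:
            # adding this paragraph would exceed the budget: close the chunk
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

def write_chunks(chunks, out_dir, stem="case"):
    """Hypothetical helper: write each chunk to out_dir/<stem>_0000.txt, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunks):
        p = out / f"{stem}_{i:04d}.txt"
        p.write_text(chunk, encoding="utf-8")
        paths.append(p)
    return paths
```

For example, read your case with `Path(...).read_text(encoding="utf-8")`, pass it to `chunk_paragraphs`, and write the result into `projects/data/legal_texts` so the `get_text_files` call picks up the chunks. Comparing results for the whole text and for the chunks is the experiment suggested here.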