I’ve been trying to figure out how to use RandomSplitter instead of sklearn’s train_test_split, but I can’t work out how to use it outside of a data block. How can I use it to create a test set to hide from the model? I’m currently attempting this with the MovieLens movie reviews data set. Any tips on where to look?
I would probably start here: fastai - Data transformations. There is also TrainTestSplitter, which is similar to scikit-learn’s. Try it out and let us know if you face any issues.
It actually is sklearn.model_selection.train_test_split (it just wraps the resulting splits in L objects).
So the answer might be: just use that, since it does exactly the task you want done. If there is a particular reason you don’t want to use it, please share why, what your use case is, and what you are trying to achieve, so we can give you a more specific answer.
Thank you both @msivanes and @benkarr .
I was a little confused about how to use RandomSplitter() as an object, but figured it out using the code below (for posterity):
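from fastai.data.transforms import RandomSplitter

df = ratings  # the ratings DataFrame
splitter = RandomSplitter()  # defaults to valid_pct=0.2
splits = splitter(df)  # tuple of (train indices, validation indices)
splits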
returning this:
((#80000) [22860,5725,2547,15138,92058,52871,84848,43409,93068,95541…],
(#20000) [86556,31812,3798,42698,75080,75170,60071,27463,91757,22442…])
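For anyone landing here with the original question (a test set that stays hidden from the model), one way to get there is to split twice: first carve off the test rows, then split what is left into train and validation. Below is a minimal sketch of that idea in plain Python. The helper random_split_indices is hypothetical and only imitates the shuffle-and-cut behaviour seen in the output above; it is not fastai's implementation. With fastai installed, RandomSplitter(valid_pct=0.2, seed=42) should be usable in its place.

```python
import random

def random_split_indices(n_items, holdout_pct=0.2, seed=None):
    """Hypothetical stand-in for a random splitter: shuffle range(n_items)
    and cut it into (kept, held_out) lists of indices."""
    rng = random.Random(seed)
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(holdout_pct * n_items)
    return idxs[cut:], idxs[:cut]

n_rows = 100_000  # row count implied by the 80000/20000 split above

# Step 1: set aside 20% of all rows as the hidden test set.
rest_idx, test_idx = random_split_indices(n_rows, holdout_pct=0.2, seed=42)

# Step 2: split the remaining rows into train and validation.
# The second call returns positions into rest_idx, so map them back to row numbers.
train_pos, valid_pos = random_split_indices(len(rest_idx), holdout_pct=0.2, seed=42)
train_idx = [rest_idx[i] for i in train_pos]
valid_idx = [rest_idx[i] for i in valid_pos]

print(len(train_idx), len(valid_idx), len(test_idx))
```

With a pandas DataFrame, df.iloc[test_idx] would then give the rows to keep away from training, and df.iloc[train_idx] / df.iloc[valid_idx] the rest. Passing a fixed seed keeps the test set the same between runs, which matters if it is supposed to stay unseen.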