Trying to use RandomSplitter for Test set (not validation set)

I’ve been trying to figure out how to use RandomSplitter instead of sklearn’s train_test_split, but I can’t work out how to use it outside of a DataBlock. How can I use it to create a test set to hide from the model? I’m currently attempting this with the MovieLens movie ratings data set. Any tips on where to look?

I would probably start here: fastai - Data transformations. There is also TrainTestSplitter, which is similar to scikit-learn’s train_test_split. Try it out and let us know if you face any issues.

2 Likes

TrainTestSplitter actually is sklearn.model_selection.train_test_split :smiley: (it just wraps the resulting splits in Ls).
So the answer might be: just use that, it’s the thing that does the task you want done. If there is a particular reason you don’t want to use it, please share why, or what your use case is and what you are trying to achieve, so we can give you a more specific answer :slight_smile:.

1 Like

Thank you both, @msivanes and @benkarr.

I was a little confused about how to use RandomSplitter() as an object, but figured it out using the code below (for posterity):

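from fastai.data.transforms import RandomSplitter

df = ratings  # the ratings data loaded earlier
splitter = RandomSplitter()  # returns a function; valid_pct defaults to 0.2
splits = splitter(df)  # tuple of two index lists (80% and 20% of the rows)
splits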

returning this:
((#80000) [22860,5725,2547,15138,92058,52871,84848,43409,93068,95541…],
(#20000) [86556,31812,3798,42698,75080,75170,60071,27463,91757,22442…])
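For anyone wondering how to get from those two index lists to a hidden test set plus a train/validation split, here is a rough sketch of the idea. It does not use fastai itself: random_split is a hypothetical helper that imitates a shuffled index split in plain Python, and rows is a made-up stand-in for the ratings table, so treat it as an illustration of the two-step approach rather than the library's implementation.

```python
import random

def random_split(n_items, held_out_pct=0.2, seed=None):
    """Hypothetical helper: shuffle range(n_items) and return (kept_idxs, held_out_idxs)."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(held_out_pct * n_items)
    return idxs[cut:], idxs[:cut]

# Made-up stand-in for the ratings table: 100 rows identified by position.
rows = [f"rating_{i}" for i in range(100)]

# Step 1: carve off a test set that the model never sees during training.
dev_idxs, test_idxs = random_split(len(rows), held_out_pct=0.2, seed=42)
test_rows = [rows[i] for i in test_idxs]
dev_rows = [rows[i] for i in dev_idxs]

# Step 2: split the remaining rows into train and validation sets.
train_idxs, valid_idxs = random_split(len(dev_rows), held_out_pct=0.2, seed=42)
train_rows = [dev_rows[i] for i in train_idxs]
valid_rows = [dev_rows[i] for i in valid_idxs]

print(len(train_rows), len(valid_rows), len(test_rows))  # 64 16 20
```

With a pandas DataFrame, something like df.iloc[test_idxs] would presumably take the place of the list comprehensions, and the index lists returned by RandomSplitter (or TrainTestSplitter) could be used in the same way.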

2 Likes