Validation set used for structured data like MovieLens and IMDb


#1

In lectures 4 and 5, Jeremy used get_cv_idxs(len(ratings)) to build the validation set. Can anyone shed more light on how rows are selected for the validation set? I mean, what happens inside get_cv_idxs() when it is called?


(Theodoros Galanos) #2

You can find the code inside dataset.py:

np.random.seed(seed)
n_val = int(val_pct*n)
idx_start = cv_idx*n_val
idxs = np.random.permutation(n)
return idxs[idx_start:idx_start+n_val]

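So it permutes all n indices of your input data and then takes a slice of n_val = int(val_pct*n) of them, where val_pct is the validation fraction set in the options (default 0.2, i.e. 20%). Because the seed is set before the permutation, the same rows are picked every time you call it with the same arguments, and cv_idx selects which slice of the permutation is used.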
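To see this in action, here is a minimal sketch that wraps the lines above in a standalone function. The name get_cv_idxs_sketch is made up for illustration, and the defaults cv_idx=0 and seed=42 are my assumptions; only the 0.2 default for val_pct comes from the explanation above.

```python
import numpy as np

def get_cv_idxs_sketch(n, cv_idx=0, val_pct=0.2, seed=42):
    # Hypothetical wrapper around the snippet above.
    # cv_idx=0 and seed=42 are assumed defaults, not confirmed by the thread.
    np.random.seed(seed)                      # fixed seed -> reproducible split
    n_val = int(val_pct * n)                  # number of validation rows
    idx_start = cv_idx * n_val                # offset of the chosen fold
    idxs = np.random.permutation(n)           # shuffled row indices 0..n-1
    return idxs[idx_start:idx_start + n_val]  # one contiguous slice of the shuffle

n = 10
val_idxs = get_cv_idxs_sketch(n)                   # 2 of the 10 row indices
train_idxs = np.setdiff1d(np.arange(n), val_idxs)  # the remaining 8 rows

print("validation rows:", val_idxs)
print("training rows:  ", train_idxs)
```

Calling it again with the same arguments gives the same validation rows, and a different cv_idx gives a different, non-overlapping slice of the same shuffle.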

Regards,
Theodore.


#3

Thanks, Theodore.