Test set evaluation - how?

See the discussion here: