Yes, I find it has many gems (i.e. tricks for doing things faster and/or better) that Jeremy has personally collected over 25 years of ML practice and that you could not find in textbooks or elsewhere.
At 30:35 of lesson 2, Jeremy gets a random sample of 30k rows. He then says the validation set should not change, and that the training set should not overlap with the dates (not sure which dates he is referring to).
The original validation set is made up of the last 12k rows. Since proc_df is run on a random subset of 30k rows, isn’t it possible that some of the new, smaller training data consists of rows from the validation set? Furthermore, I would think that the smaller training set is not necessarily ordered by date any longer, since rows were picked at random.
Edit: I checked the source code, and get_sample returns the data in sorted order, so that addresses that question. I still think it’s possible that the training data could overlap with the original validation set.
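To make the overlap question concrete, here is a rough sketch in plain Python. It is not the library's code: sample_sorted_positions is a hypothetical stand-in for what get_sample is described as doing above (random rows, returned in sorted order), and the total of 400,000 rows is an assumed figure for illustration only. It counts how many sampled rows land in the last 12k, then shows one possible way to rule out overlap by keeping only sampled rows that come before the validation range.

```python
import random

def sample_sorted_positions(n_rows, n_sample, seed=42):
    """Pick n_sample distinct row positions at random and return them in ascending order.

    Hypothetical helper that mimics the behaviour described in the post;
    it is not the actual get_sample implementation.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_rows), n_sample))

n_rows = 400_000                 # assumed dataset size, for illustration only
n_valid = 12_000                 # validation set = last 12k rows
valid_start = n_rows - n_valid   # first row position that belongs to the validation set

sample = sample_sorted_positions(n_rows, 30_000)

# Sampled rows that fall inside the validation range.
in_valid = [i for i in sample if i >= valid_start]
print(f"{len(in_valid)} of {len(sample)} sampled rows fall in the validation range")

# One way to guarantee no overlap: drop sampled rows at or after valid_start.
train = [i for i in sample if i < valid_start]
```

Because the positions stay sorted, the filtered training rows are still in date order if the original data was, and every one of them comes before the first validation row.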
Change in line 15 of text.py:
texts.append(open(fname, 'r', encoding='utf-8').read())
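For context, a minimal sketch of the same idea outside the library. The function name read_texts is hypothetical and this is not the actual contents of text.py. It reads each file with an explicit UTF-8 encoding so the result does not depend on the platform's default encoding, and it uses a with block so each file is closed after reading.

```python
def read_texts(fnames):
    """Read each file as UTF-8 text and return the contents as a list of strings.

    Hypothetical helper for illustration; not the library's own function.
    """
    texts = []
    for fname in fnames:
        # An explicit encoding avoids relying on the platform default,
        # which is not always UTF-8 (for example on some Windows setups).
        with open(fname, 'r', encoding='utf-8') as f:
            texts.append(f.read())
    return texts
```

If a file is not valid UTF-8, this will raise a UnicodeDecodeError rather than silently misreading the text.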
@jeremy I am confused about whether to do the machine learning course or the deep learning course first. Which do you think would be better to do first?
They are different. If you don’t have any experience with dataset manipulation, cleaning, and validation set creation, do the ML1 course first, because that knowledge is assumed in the DL1 course. Personally, I felt it worked well for me to do ML1 followed by DL1. Also, FYI, Jeremy has requested not to be personally tagged in posts unless he is the only person who can answer the question.