Lesson 11 discussion and wiki

Wasn’t the first layer’s kernel size supposed to be bigger than 3x3?

3 Likes

It depends on where you are in your model. At the beginning, I think they still use a MaxPool. Later on though, it’s average pooling.

1 Like

Average pooling retains more information than max pooling.
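For intuition, here is a minimal sketch (plain PyTorch, not from the lesson notebooks) comparing the two on a toy feature map: max pooling keeps only one activation per window, while average pooling blends every activation in the window.

```python
import torch
import torch.nn.functional as F

# Toy 1x1x4x4 "feature map" (batch, channels, height, width).
x = torch.tensor([[[[ 1.,  2.,  3.,  4.],
                    [ 5.,  6.,  7.,  8.],
                    [ 9., 10., 11., 12.],
                    [13., 14., 15., 16.]]]])

# Max pooling keeps only the largest value in each 2x2 window.
print(F.max_pool2d(x, kernel_size=2))
# tensor([[[[ 6.,  8.],
#           [14., 16.]]]])

# Average pooling blends all four values, so every input activation
# contributes to the output.
print(F.avg_pool2d(x, kernel_size=2))
# tensor([[[[ 3.5000,  5.5000],
#           [11.5000, 13.5000]]]])
```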

2 Likes

Can Jeremy give a quick peek into how he looks at a tensor, breaks it down, and analyzes it?

1 Like

Jeremy mentioned recording videos for topics that we run out of time for… will those recordings be available only along with the MOOC, or will they be released before that?

2 Likes

So what does one do with high-res images (typically the case)? Resize to 224 by 224 (information loss)? Or to any appropriate size that can fit on the GPU?

This was more about being explicit instead of implicit, to make it easier for others to reason about the code (assuming it doesn’t cost you readability):

def label_by_func(sd, f):
    # Building the processor from sd.train makes the dependency explicit,
    # so it's clear a CategoryProcessor can't be reused for another dataset.
    proc = CategoryProcessor.from_dataset(sd.train)
    train = LabeledData.label_by_func(sd.train, f, proc)
    valid = LabeledData.label_by_func(sd.valid, f, proc)
    return SplitData(train, valid)

That way you don’t need the assert in CategoryProcessor.deprocess, and you let your users know that they can’t reuse a CategoryProcessor between different datasets.

1 Like

I have a general question. Not sure what the right forum is, but posting here anyway. I found when working on my initial Kaggle competitions that, most of the time, well feature-engineered kernels tend to perform better than using deep learning techniques and tuning the associated hyperparameters. Is feature engineering a must for every problem we look at, or am I doing something wrong with deep learning?

2 Likes

Example competition? I know this happens with tabular data a bit.

What was the link to that BERT paper Jeremy just showed? I don’t think it’s in the first post.

I guess it depends on the type of competition. For tabular data, it is very often true (at least in my experience). But for images or text, the automatic feature extraction you get with deep learning shows good results.

8 Likes

It has been my experience mostly with tabular data.

I just added it to the first post.

3 Likes

When Jeremy covers the Rossmann competition (tabular data), you’ll see it uses a mix of necessary feature engineering together with deep learning.

Edited to add:
Here are the links to the relevant lessons:
https://course18.fast.ai/lessons/lesson3.html
https://course18.fast.ai/lessons/lesson4.html
https://course18.fast.ai/lessonsml1/lesson10.html
https://course18.fast.ai/lessonsml1/lesson11.html
https://course18.fast.ai/lessonsml1/lesson12.html

6 Likes

Thanks. Can someone give a high level intuition for what “pre-training” is? How is it different from regular training? And is this something we’ve done before?

1 Like

When you use an ImageNet model, you use a pretrained model. Same for transfer learning in NLP.

From the paper:

| Solver | Batch Size | Iterations | F1 score on dev set | Hardware | Time |
| --- | --- | --- | --- | --- | --- |
| Our Method | 64k/32k | 8599 | 90.584 | 1024 TPUs | 76.19m |

1024 TPUs

6 Likes

In Part 1 we used a pre-trained LSTM for classifying movie reviews. Pretraining involved going through WikiText and training the model to predict the next word.

1 Like

Pre-training is the first stage in transfer learning. It is when you train a model on a dataset that is not the dataset you are ultimately interested in (perhaps because your dataset is too small). With transfer learning, you next “fine-tune” the model on your particular dataset.
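To make that concrete, here is a minimal sketch of the two stages, assuming torchvision’s ResNet-34 as the pretrained ImageNet model (an illustration, not the course code):

```python
import torch.nn as nn
from torchvision import models

# Stage 1: pre-training. Someone has already trained this model on
# ImageNet, so we simply download the learned weights.
model = models.resnet34(pretrained=True)

# Stage 2: fine-tuning. Replace the ImageNet head with one sized for
# our own task (say, 10 classes) and train on our much smaller dataset.
model.fc = nn.Linear(model.fc.in_features, 10)

# A common first step: freeze the pretrained body so only the new head
# learns, then later unfreeze and train the whole model at a lower LR.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
```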

3 Likes