Lesson 5 Advanced Discussion ✅

jeremy · November 21, 2018, 6:21pm

I think they’re actually reporting MSE, based on their SVD++ number.

joshfp · November 21, 2018, 10:14pm

Following my previous post about the embedding vectors for unseen data, after training a tabular model, I manually set the weights of the embedding vectors of the unseen data equal to the mean of the others embedding vectors, achieving an increase in the accuracy (I don’t think it is just by chance).

I’d like to get some feedback, specially if any of you try it on your tabular model (specially with unseen categories in the valid set). It is just one line of code to update the weights of the embedding.

jeremy · November 22, 2018, 4:36am

I think that’s a great idea!

joshfp · November 22, 2018, 6:01am

What I’d like to try next is to apply a kind of mix between dropout and data augmentation for tabular data. Currently the embedding vector for unseen data seems not being trained… What if categorical variables are set randomly to 0 (unknown) during training? I guess this would help the model to learn to predict even with some unseen categories. I think this is different from regular dropout in three ways:

Unlike embedding dropout that randomly drops the output of the embedding layer, the idea here is to randomly drop input values from categorical features, forcing the model to train using the embedding vector for unseen data instead.
For each categorical variable, add a column to flag when the value of the category is unknown, either by “data augmentation/dropout” during training or actual unknown categories during validation/inference. The idea is that the model receives information when a value for the category is unknown, on top of the embedding vector for unseen data (similarly to “_na” fields when filling missing values).
The “p” value (probability) of dropping out a categorical variable, would be eventually link to the probability of that variable being unknown in the validation set (it’d probably worth to train the model better to deal with categorical variables that are often missed in the validation set). Like Jeremy did when training language model, I hope this is not cheating using information from the features of the validation set for training.

I will try to implement it and will share the results.

oguiza · November 22, 2018, 8:59am

Hi José,
I think this is a very interesting idea, and look forward to seeing your results.

I have a question though: how different would this be to applying dropout to the input layer?
If I understand correctly what you are proposing is to kind of apply dropout to inputs to the embedding layer, to make those embeddings more robust. You might be interested in comparing 3 approaches: 1) no dropout applied, 2) apply dropout to all inputs to the NN, or 3) apply dropout to categorical data only. Just a thought.

joshfp · November 22, 2018, 5:53pm

Thank you Ignacio for your feedback.
As you mentioned it is a kind of dropout to inputs, however some tweaks were required to make it work with categorical variables, including:

Applying regular dropout to a categorical variable, I think it’s kind of too aggressive, since it is not just making zero some inputs, but changing the category itself, and so using completely different values from the embedding layer. In order to mitigate it, I include a flag for each category indicating whether the category is unknown or not, so the model is receiving extra information to deal with it.
Dropout was applied discriminatively by variable, based on the % of unknown values in the validation set, to put more effort on variables that are often unknown.
Regular pythorch dropout seems not to work with integer values, so a custom dropout function was required to make it work with embeddings.

The final result seems slightly better than training the regular model. However, it is not clear if it is just due to the training stochasticity. My toy dataset is quite simple and doesn’t have significant unknown values. Once Jeremy introduces Rossmann dataset in class, I will benchmark on it, anyway I am not aiming to invent anything, but taking these experiments as a way to learn and practice.

oguiza · November 22, 2018, 8:45pm

Thanks a lot for the detailed response, José!
I think it’s a very creative idea.
It’ll be interesting to see the results you get.
Un saludo!

jeremy · November 23, 2018, 5:03am

Very sensible. We already use category code 0 for #na, so you can actually do this right now!

Lankinen · November 23, 2018, 1:55pm

I’m making model where accuracy is top_3_accuracy. So i’m taking top 3 predictions as a result and if one of those is correct then it is same as predicting correct accuracy metric. Currently my predictions are like [0.02,0.04,0.93,0.01] because of cross entropy. How I can do this for three numbers. Like three bigger numbers and others less. I think that might make model better.

joshfp · November 23, 2018, 10:50pm

You can try:

pred_top3 = pred.topk(k=3, dim=1)[1]
acc = torch.stack([(p==y).any() for p,y in zip(pred_top3, y)]).float().mean()

Hope this helps.

myltykritik · November 25, 2018, 10:29am

Hello! I tried your code, but it fails with error:
TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not bool

I have the latest fastai 1.0.28

techjoey · November 27, 2018, 3:23am

Sounds like a good research topic to me, especially since we seem to have established that tabular models could use more research. Thank you for sharing this meaningful contribution!

pbanavara · November 27, 2018, 7:08am

Sorry to chime in late. Don’t you also lose what part of the feature set is lost. For instance if you reduce 10 features to 2 - you don’t know which of the 10 were dropped right ?

jcatanza · November 27, 2018, 6:34pm

Actually @pbanavara you do know; PCA (principal components analysis) forms a set of new features that are independent linear combinations of the original input features, and keeps only those new features that have the strongest effect. The idea is, you gain a simpler model at the expense of some information loss. If you want to, you can find out exactly which new features have been dropped. However, in most cases it would be difficult to interpret these dropped features.

Keld · December 5, 2018, 2:15am

In this paper they use attention mechanism to outperform factorization machines.

Happy to discuss – https://arxiv.org/abs/1810.11921

Tripti · December 8, 2018, 12:41am

Hi,
I am looking for resources to get started on processing 3D images. As a simple 3D image categorization task but I am not sure where to get started. I found 3D mnist data. Will the same principles that we have been using to identify the right learning rate work for 3D images as well.

nazmiasri95 · December 13, 2018, 6:31am

I’m still stuck at how should I use fastai collaborative filtering learner to recommend in production. anyone have any end-to-end example ? I already trained it but I’m not sure how should I use it

hwasiti · January 13, 2019, 1:16pm

Update: I could make it work. My next post will describe how.

=================================================

I am trying to import models from the Cadene Pytorch model libraries.
I am trying for 2 days to make create_cnn working, but seems only learn = Learner(data, model) works. create_cnn is much better, but there is something I am missing. I understand that create_cnn expects a callable function and tried the following examples.

Here are working examples:

# this works
def get_model(pretrained=True, model_name = 'resnet50', **kwargs ): # pretrained key should be the first
    arch = models.resnet50(pretrained, **kwargs )
    return arch

learn = Learner(data, get_model(), metrics=[accuracy])

# this works
def get_model(pretrained=True, model_name = 'resnet50', **kwargs ): # pretrained key should be the first
    arch = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained='imagenet')
    return arch

learn = Learner(data, get_model(), metrics=[accuracy])

Here are not working examples:

# this does not work
def get_model(pretrained=True, model_name = 'resnet50', **kwargs ): # pretrained key should be the first
    arch = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained='imagenet')
    return arch

learn = Learner(data, get_model, metrics=[accuracy])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-45-814e6d5c8795> in <module>
      4     return arch
      5 
----> 6 learn = Learner(data, get_model, metrics=[accuracy])

<string> in __init__(self, data, model, opt_func, loss_func, metrics, true_wd, bn_wd, wd, train_bn, path, model_dir, callback_fns, callbacks, layer_groups)

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/basic_train.py in __post_init__(self)
    145         self.path = Path(ifnone(self.path, self.data.path))
    146         (self.path/self.model_dir).mkdir(parents=True, exist_ok=True)
--> 147         self.model = self.model.to(self.data.device)
    148         self.loss_func = ifnone(self.loss_func, self.data.loss_func)
    149         self.metrics=listify(self.metrics)

AttributeError: 'function' object has no attribute 'to'

# this does not work
def get_model(pretrained=True, model_name = 'resnet50',  **kwargs ): # pretrained key should be the first
    arch = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained='imagenet')
    return arch

learn = create_cnn(data, get_model(), metrics=[accuracy])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-74cb1e5cde9c> in <module>
      4     return arch
      5 
----> 6 learn = create_cnn(data, get_model(), metrics=[accuracy])

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/vision/learner.py in create_cnn(data, arch, cut, pretrained, lin_ftrs, ps, custom_head, split_on, bn_final, **kwargs)
     53     "Build convnet style learners."
     54     meta = cnn_config(arch)
---> 55     body = create_body(arch, pretrained, cut)
     56     nf = num_features_model(body) * 2
     57     head = custom_head or create_head(nf, data.c, lin_ftrs, ps=ps, bn_final=bn_final)

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/vision/learner.py in create_body(arch, pretrained, cut, body_fn)
     30 def create_body(arch:Callable, pretrained:bool=True, cut:Optional[int]=None, body_fn:Callable[[nn.Module],nn.Module]=None):
     31     "Cut off the body of a typically pretrained `model` at `cut` or as specified by `body_fn`."
---> 32     model = arch(pretrained)
     33     if not cut and not body_fn: cut = cnn_config(arch)['cut']
     34     return (nn.Sequential(*list(model.children())[:cut]) if cut

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/pretrainedmodels/models/torchvision_models.py in forward(self, input)
    338 
    339     def forward(self, input):
--> 340         x = self.features(input)
    341         x = self.logits(x)
    342         return x

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/pretrainedmodels/models/torchvision_models.py in features(self, input)
    320 
    321     def features(self, input):
--> 322         x = self.conv1(input)
    323         x = self.bn1(x)
    324         x = self.relu(x)

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/torch/nn/modules/conv.py in forward(self, input)
    318     def forward(self, input):
    319         return F.conv2d(input, self.weight, self.bias, self.stride,
--> 320                         self.padding, self.dilation, self.groups)
    321 
    322 

TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not bool

# this does not work
def get_model(pretrained=True, model_name = 'resnet50',  **kwargs ): # pretrained key should be the first
    arch = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained='imagenet')
    return arch

learn = create_cnn(data, get_model, metrics=[accuracy])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-44-86a48fcf2401> in <module>
      4     return arch
      5 
----> 6 learn = create_cnn(data, get_model, metrics=[accuracy])

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/vision/learner.py in create_cnn(data, arch, cut, pretrained, lin_ftrs, ps, custom_head, split_on, bn_final, **kwargs)
     54     meta = cnn_config(arch)
     55     body = create_body(arch, pretrained, cut)
---> 56     nf = num_features_model(body) * 2
     57     head = custom_head or create_head(nf, data.c, lin_ftrs, ps=ps, bn_final=bn_final)
     58     model = nn.Sequential(body, head)

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/callbacks/hooks.py in num_features_model(m)
    115 def num_features_model(m:nn.Module)->int:
    116     "Return the number of output features for `model`."
--> 117     return model_sizes(m)[-1][1]
    118 
    119 def total_params(m:nn.Module)->int:

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/callbacks/hooks.py in model_sizes(m, size)
    110     "Pass a dummy input through the model `m` to get the various sizes of activations."
    111     with hook_outputs(m) as hooks:
--> 112         x = dummy_eval(m, size)
    113         return [o.stored.shape for o in hooks]
    114 

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/fastai/callbacks/hooks.py in dummy_eval(m, size)
    105 def dummy_eval(m:nn.Module, size:tuple=(64,64)):
    106     "Pass a `dummy_batch` in evaluation mode in `m` with `size`."
--> 107     return m.eval()(dummy_batch(m, size))
    108 
    109 def model_sizes(m:nn.Module, size:tuple=(64,64))->Tuple[Sizes,Tensor,Hooks]:

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/torch/nn/modules/container.py in forward(self, input)
     90     def forward(self, input):
     91         for module in self._modules.values():
---> 92             input = module(input)
     93         return input
     94 

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    487             result = self._slow_forward(*input, **kwargs)
    488         else:
--> 489             result = self.forward(*input, **kwargs)
    490         for hook in self._forward_hooks.values():
    491             hook_result = hook(self, input, result)

~/anaconda3/envs/fastai-v1/lib/python3.7/site-packages/torch/nn/modules/pooling.py in forward(self, input)
    563     def forward(self, input):
    564         return F.avg_pool2d(input, self.kernel_size, self.stride,
--> 565                             self.padding, self.ceil_mode, self.count_include_pad)
    566 
    567 

RuntimeError: Given input size: (2048x2x2). Calculated output size: (2048x-4x-4). Output size is too small at /opt/conda/conda-bld/pytorch_1544202130060/work/aten/src/THNN/generic/SpatialAveragePooling.c:48

A working example shared here to import any Cadene model is very helpful. It will expand the usable models in fastai for everybody. I heard in Kaggle competitions that there are people could make it work with fastai v1, but could not do it myself. Here is an 8th Kaggle winner in Human Protein Atlas comp. that just did that!

hwasiti · January 13, 2019, 6:59pm

Here is my code importing resnet50 from Cadene Pytorch pretrained models repo. I used resnet50 to make sure that I did not make any mistake which can be validated by comparing it with a normal fastai resnet50 method. The below code can give you access to ~45 models most of them still not available in fastai by simply changing the model’s name.

pretrainedmodels should be installed by:

pip install pretrainedmodels

Our lesson-1 example using pretrained models import:

%reload_ext autoreload
%autoreload 2
%matplotlib inline

from fastai import *
from fastai.vision import *
import pretrainedmodels

path = untar_data(URLs.PETS); path
path_anno = path/'annotations'
path_img = path/'images'
fnames = get_image_files(path_img)
np.random.seed(2)
pat = re.compile(r'/([^/]+)_\d+.jpg$')

bs = 64

data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(),
                                   size=299, bs=bs//2).normalize(imagenet_stats)

def get_model(pretrained=True, model_name = 'resnet50', **kwargs ): 
    if pretrained:
        arch = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained='imagenet')
    else:
        arch = pretrainedmodels.__dict__[model_name](num_classes=1000, pretrained=None)
    return arch

custom_head = create_head(nf=2048*2, nc=37, ps=0.5, bn_final=False) 

# Below you can change the imported model into any of the models available in the `pretrainedmodels` 
# which can be shown by: pretrainedmodels.model_names
fastai_resnet50=nn.Sequential(*list(children(get_model(model_name = 'resnet50'))[:-2]),custom_head) 

def get_fastai_model(pretrained=True, **kwargs ): 
    return fastai_resnet50

learn = create_cnn(data, get_fastai_model, metrics=error_rate)
learn.fit_one_cycle(5) 
learn.unfreeze()
learn.fit_one_cycle(1, max_lr=slice(1e-6,1e-4))

Update-1:
Maybe it is safer to create the custom_head as the following:

custom_head = create_head(nf=num_features_model(self.cnn)*2, nc=data.c, ps=0.5, bn_final=False)

Just in case the output features of the body of a model that you will use is not 2048

Update-2:
Jeremy tweeted an even better implementation for importing architectures from cadene. What I missed in the above code, the points of split for the discriminative learning.

gautik · January 26, 2019, 9:50pm

Looking for an answer to your point 2. That makes me wonder, the weights would not change much if a level in a variable doesn’t appear many times in the dataset.