Developer chat

I like the warning idea. At least you would know that something might not be correct in your preds

No, putting activations inside the model isn’t the way PyTorch intended things, so we follow that convention. It’s also more numerically stable that way.
This could be better documented; I’m not sure about the warning, since it would be very annoying.

Most of all, I’m not really sure why this surprises you: you have a custom loss function, so you know what you’re doing with the final activations of your model. get_preds returns you the same thing, so why is that unexpected?

@kcturgutlu Were you able to get the plot?

@sgugger I wasn’t suggesting to add softmax layer to the model, that would be unstable as you say.
I meant something like this

    def get_preds(self, ds_type:DatasetType=DatasetType.Valid, with_loss:bool=False, n_batch:Optional[int]=None,
                  pbar:Optional[PBar]=None) -> List[Tensor]:
        # look up an optional activation attached to the model (or to the learner)
        activ = getattr(self.model, 'activation', None) # or getattr(self, 'activation', None)
        lf = self.loss_func if with_loss else None
        return get_preds(self.model, self.dl(ds_type), cb_handler=CallbackHandler(self.callbacks),
                         activ=activ, loss_func=lf, n_batch=n_batch, pbar=pbar)

But now as I think about it maybe the warning will be enough.

You might think I’m a bit inflexible, but this way of thinking was formed by my work with functional and strongly typed languages. When I reason about code, I often think of methods as functions where the dependencies are either obvious or explicit, with minimal changes to internal state. This is why PyTorch has the *_ naming convention (a trailing underscore) to indicate that a method changes state in place.

Somehow in my mind, training is independent of inference, and the model’s behavior in production should not depend on the loss function. TensorFlow, even though it has an odd API, follows this model: serving does not depend on the loss function.

Think about the production use case: what will happen to a pickled model that is loaded into a new version of the library with a changed loss? Is it expected that changes to the training routine affect an already trained model? Or should we try to minimize this kind of side effect?


I’m not too sure I follow your logic. The model produces the same thing at training and inference: the activations just before the last non-linearity (softmax, sigmoid or whatever). The methods get_preds and predict provide convenience helpers that add those non-linearities when you’re using a standard loss, but that’s just for help. If you do learn.model(x), whether at training or inference, you get exactly the same thing, which may need to be softmaxed or sigmoided before being used.
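
For example, a minimal sketch of applying the non-linearity yourself (the tensor names here are just placeholders):

    import torch.nn.functional as F

    raw_out = learn.model(x)             # raw activations, same at training and inference
    probs = F.softmax(raw_out, dim=-1)   # apply the final non-linearity yourself
    # for a binary/multi-label setup you would use torch.sigmoid(raw_out) instead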

I’m bad at explaining potential issues, and the possible improvement isn’t worth the effort, especially since you pickle and unpickle the whole learner, so the loss isn’t going to change even if the default changes.

torch-1.0.1.post2 was causing the issue in my case. Installing torch-1.0.1 fixed it.

Would there be any interest in porting some of the 0.7 machine learning tools into v1? Or are those best left as course material for the machine learning MOOC? I understand that there are other fully supported decision-tree ML frameworks (xgboost, etc.), but some of the tools, like proc_df(), I found very useful and very “fastai-ish.”

Everything proc_df does is normally handled by the tabular processors. If there is a specific part missing, please suggest a PR for it.
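
For reference, a minimal sketch of the processor-based equivalent (the DataFrame, column names and dependent variable below are made up for illustration):

    from fastai.tabular import TabularList, FillMissing, Categorify, Normalize

    # the processors cover what proc_df used to do: fill missing values,
    # encode categorical columns and normalize continuous ones
    procs = [FillMissing, Categorify, Normalize]
    data = (TabularList.from_df(df, cat_names=['shop', 'weekday'],
                                cont_names=['price', 'qty'], procs=procs)
            .split_by_rand_pct(0.2)
            .label_from_df(cols='sales')
            .databunch())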


I have noticed that with a very large corpus, you need a lot of RAM to generate your LMDataBunch, because all the strings are put into a List[List[str]] and then we run Counter(p for o in tokens for p in o) in Vocab.create after we have the full list stored in ds.items.

I did some testing and found that you can build that frequency count incrementally while going through the tokens, using Counter.update(), and then pass it along to the Vocab with a slight change to create. A few tests are failing, but before working out the remaining details I just wanted to check whether reducing the maximum memory footprint of the tokenization has value for others.
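
A minimal sketch of the idea (the chunked token generator below is hypothetical, not the actual fastai pipeline):

    from collections import Counter

    freq = Counter()
    for doc_tokens in tokenized_docs():   # hypothetical generator yielding one List[str] at a time
        freq.update(doc_tokens)           # accumulate counts without holding every document in memory
    # freq can then be handed to the Vocab creation step instead of recounting over the full List[List[str]]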

I can put out the PR as soon as I am done resolving the failing tests, but please come back with feedback if this is a bad approach/idea for some reason.


We can look at any PR that reduces memory usage, as long as it doesn’t hurt performance too much.


I intend to modify learn.recorder.plot() to save the plot to a file by adding a dest_file parameter. Any opinions / guidance / reasons not to? I’m currently having to train models on a PC where I cannot install Jupyter, and I need to see the output of lr_find().

There is already a return_fig argument you can use to get the figure, then you can save it to any file you like.
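
For example (a quick sketch; the filename is arbitrary):

    fig = learn.recorder.plot(return_fig=True)   # returns the matplotlib figure instead of just displaying it
    fig.savefig('lr_find.png')                   # save it wherever you need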


Apologies if I’m misunderstanding the code, but it looks like there is an error in CNNLearner.has_pool_type:

    def has_pool_type(m):
        if is_pool_type(m): return True
        for l in m.children(): return has_pool_type(l)
        return False

It looks like the for loop should only return if the recursive call is true. Currently it will only check the first child at each level.
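
A sketch of the fix described above (only returning early when a child actually contains a pooling layer):

    def has_pool_type(m):
        if is_pool_type(m): return True
        for l in m.children():
            if has_pool_type(l): return True
        return False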

Yes, there is a bug indeed. Any fix accepted :wink:

Hey! I’m dealing with radiographic images that have a bit depth higher than 8 bits. Would it make sense to make the divisor in the open_image function an argument, with a default of div_int:int=255?
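
As a rough sketch of the kind of workaround this would replace (open_image_16bit and the 65535 divisor are my own illustration, assuming single-channel 16-bit images):

    import numpy as np, PIL.Image, torch
    from fastai.vision import Image

    def open_image_16bit(fn):
        "Load a 16-bit image and scale it to [0,1] by dividing by 65535 instead of 255."
        arr = np.array(PIL.Image.open(fn), dtype=np.float32) / 65535.
        return Image(torch.from_numpy(arr)[None])   # add a channel dimension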

Hi, everyone! Is there any sample code on how to prepare data for a bidirectional AWD_LSTM? I’m encountering this error, but I don’t know how to change data_lm to make it work properly.

RuntimeError: Expected hidden[0] size (2, 32, 575), got (1, 32, 575)

Related question was asked here: Correct way to use bidirectional awd lstm language model learner?

Hi Sylvain,

I have submitted a PR (my first!) with a fix for this, and added a test for it. The test creates a dummy model with a Resnet34 architecture, and the bug-fixed has_pool_type should return True for it (it returns False with the bug that @TomB reported).

Hopefully the PR has followed the contribution guidelines. Thanks.

Yijin

I ran into this again. Should we add this to the Transformer config so that alpha=0. is set by default?

It’s not in the config for now.