You need to call reset on your model first, if it’s an RNN, when debugging it in this way. That’s what creates the initial hidden state.
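To illustrate why the call is needed, here is a minimal sketch with a toy, hypothetical class (ToyRNN is not fastai code, and real models hold tensors rather than lists). The only point is that the hidden state does not exist until reset() has run, so a manual forward pass fails without it.

```python
class ToyRNN:
    """Hypothetical stand-in for an RNN encoder; not fastai code."""

    def __init__(self, n_hid):
        self.n_hid = n_hid
        # Note: no self.hidden here. It only exists after reset().

    def reset(self):
        # Create (or re-create) the initial hidden state.
        self.hidden = [0.0] * self.n_hid

    def forward(self, batch):
        # Reads self.hidden, so this fails if reset() was never called.
        for x in batch:
            self.hidden = [h + x for h in self.hidden]
        return sum(self.hidden)


m = ToyRNN(n_hid=2)

try:
    m.forward([1.0, 2.0])
    failed_without_reset = False
except AttributeError:
    failed_without_reset = True

m.reset()                     # creates the initial hidden state
out = m.forward([1.0, 2.0])   # now the manual forward pass works
print(failed_without_reset, out)
```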
How can I make a custom metric that returns more than 1 number?
Oh my! I actually spent a lot of time going through the code but could not find the solution… I don’t know why I didn’t see any notification… Thanks! It works like a charm.
Why is the reset function called after looping over self.hidden? If so, how is the hidden state created before this when I am doing fit()? I tried reading through the source code but I cannot understand the flow exactly…
Sorry I don’t understand your question - can you give more detail please?
Sorry for not being clear.
From my understanding, self.hidden is created by RNN_Encoder.reset(). However, in MultiBatchRNN.forward(), the loop over self.hidden comes earlier than super().forward(). So how is self.hidden created in the first place? This is essentially equivalent to the question: why do I have to call m.reset() manually when I pass a batch to the model, but I don’t have to do so when calling fit() or lr_find()? (It would be great if you could give me some advice on how to use pdb to answer this question. So far I have only printed things inside functions to see the call stack, and I don’t know how to use the debugger to find out how self.hidden is created.)
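One way to find out where an attribute gets created is to intercept the assignment and look at the call stack at that moment. The sketch below uses toy, hypothetical classes (TraceHidden, ToyEncoder, ToyStepper and begin_epoch are made up for illustration and are not fastai code). It is an assumption that the same mixin would work on a real PyTorch module; there it would at least need to come first in the list of base classes.

```python
import traceback


class TraceHidden:
    """Hypothetical mixin: records the call stack whenever `hidden` is assigned."""

    def __setattr__(self, name, value):
        if name == "hidden":
            # Drop the last frame, which is this __setattr__ itself.
            stack = traceback.extract_stack()[:-1]
            super().__setattr__("_hidden_set_by", [frame.name for frame in stack])
            # Uncomment to stop right here and inspect with `where` / `up`:
            # import pdb; pdb.set_trace()
        super().__setattr__(name, value)


class ToyEncoder(TraceHidden):
    def reset(self):
        self.hidden = [0.0]


class ToyStepper:
    def __init__(self, m):
        self.m = m

    def begin_epoch(self):
        # Some outer training code calls reset() on the model for you.
        self.m.reset()


enc = ToyEncoder()
ToyStepper(enc).begin_epoch()

# The two innermost callers at the moment `hidden` was assigned.
callers = enc._hidden_set_by[-2:]
print(callers)
```

Inside pdb, `where` prints the same stack and `up` moves to the calling frame, which is usually enough to see which piece of training code triggered the assignment.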
Stepper for RNNs handles calling reset for you.
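Roughly, the idea can be sketched like this. Everything here is a toy (ToyRNN, ToyStepper and toy_fit are hypothetical names, not the fastai classes), and resetting once per epoch is an assumption of the sketch rather than a statement about the exact fastai code path.

```python
class ToyRNN:
    """Hypothetical model whose hidden state only exists after reset()."""

    def __init__(self):
        self.reset_calls = 0

    def reset(self):
        self.hidden = 0.0
        self.reset_calls += 1

    def forward(self, x):
        self.hidden += x
        return self.hidden


class ToyStepper:
    """Hypothetical training helper that owns the model during fitting."""

    def __init__(self, m):
        self.m = m

    def reset(self):
        # Only models that define reset() (e.g. RNNs) get it called.
        if hasattr(self.m, "reset"):
            self.m.reset()

    def step(self, x):
        return self.m.forward(x)


def toy_fit(model, batches, n_epochs):
    stepper = ToyStepper(model)
    outputs = []
    for _ in range(n_epochs):
        stepper.reset()  # hidden state is (re)created before any forward()
        for x in batches:
            outputs.append(stepper.step(x))
    return outputs


model = ToyRNN()
outs = toy_fit(model, [1.0, 2.0], n_epochs=2)
print(outs, model.reset_calls)
```

When you pass a batch to the model by hand, nothing plays the role of the stepper, which is why the manual reset call is needed in that case.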
I’m running the latest version of the notebook from the lesson resources on my Windows laptop with a 960M GPU, which is not supported by pytorch 0.3, so I’m using the latest conda env with pytorch 0.4.0.
I’m running into the same NotImplementedError problem while fitting the language model, in rnn_reg.py:
learner.fit(lrs/2, 1, wds=wd, use_clr=(32,2), cycle_len=1)
debugging proves that there is no Embedding in
fastai\lib\site-packages\torch\nn\backends\backend.py when calling
Is there a path forward to solve this? Should I open an issue for this on the pytorch GitHub?
P.S. Is it OK for pytorch not to have backward compatibility? E.g. functions that worked in 0.3 don’t work in 0.4.
You’ll need to git pull to get the latest version, which fixes this.
That fixed the error. I was running out of memory on my 4GB GPU, so I decreased the batch size to bs=16, and now there is another error when running learner.fit:
RuntimeError Traceback (most recent call last)
<ipython-input-20-b544778ca021> in <module>()
----> 1 learner.fit(lrs/2, 1, wds=wd, use_clr=(32,2), cycle_len=1)
C:\Users\Developer\fastai\courses\dl2\fastai\learner.py in fit(self, lrs, n_cycle, wds, **kwargs)
285 self.sched = None
286 layer_opt = self.get_layer_opt(lrs, wds)
--> 287 return self.fit_gen(self.model, self.data, layer_opt, n_cycle, **kwargs)
288
289 def warm_up(self, lr, wds=None):
C:\Users\Developer\fastai\courses\dl2\fastai\learner.py in fit_gen(self, model, data, layer_opt, n_cycle, cycle_len, cycle_mult, cycle_save_name, best_save_name, use_clr, use_clr_beta, metrics, callbacks, use_wd_sched, norm_wds, wds_sched_mult, use_swa, swa_start, swa_eval_freq, **kwargs)
232 metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, fp16=self.fp16,
233 swa_model=self.swa_model if use_swa else None, swa_start=swa_start,
--> 234 swa_eval_freq=swa_eval_freq, **kwargs)
235
236 def get_layer_groups(self): return self.models.get_layer_groups()
C:\Users\Developer\fastai\courses\dl2\fastai\model.py in fit(model, data, n_epochs, opt, crit, metrics, callbacks, stepper, swa_model, swa_start, swa_eval_freq, **kwargs)
130 batch_num += 1
131 for cb in callbacks: cb.on_batch_begin()
--> 132 loss = model_stepper.step(V(x),V(y), epoch)
133 avg_loss = avg_loss * avg_mom + loss * (1-avg_mom)
134 debias_loss = avg_loss / (1 - avg_mom**batch_num)
C:\Users\Developer\fastai\courses\dl2\fastai\model.py in step(self, xs, y, epoch)
55 if self.loss_scale != 1: assert(self.fp16); loss = loss*self.loss_scale
56 if self.reg_fn: loss = self.reg_fn(output, xtra, raw_loss)
---> 57 loss.backward()
58 if self.fp16: update_fp32_grads(self.fp32_params, self.m)
59 if self.loss_scale != 1:
C:\Users\Developer\Anaconda3\envs\fastai\lib\site-packages\torch\tensor.py in backward(self, gradient, retain_graph, create_graph)
91 products. Defaults to ``False``.
92 """
---> 93 torch.autograd.backward(self, gradient, retain_graph, create_graph)
94
95 def register_hook(self, hook):
C:\Users\Developer\Anaconda3\envs\fastai\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
87 Variable._execution_engine.run_backward(
88 tensors, grad_tensors, retain_graph, create_graph,
---> 89 allow_unreachable=True) # allow_unreachable flag
90
91
RuntimeError: inconsistent range for TensorList output
@jeremy Is there documentation available of the exact preprocessing and training steps for the language model (WikiText-103)? I looked at the Hindi repository, but I’m not sure whether the “t_up” trick is used there.
So it would be great to have some kind of reference implementation/documentation for the WikiText-103 language model.
This is the script we used: https://github.com/fastai/fastai/blob/master/courses/dl2/imdb_scripts/train_tri_wt.py
Interesting, this error is tied to the freezing of layers. The model trains without error if you unfreeze it completely. I don’t know where this one comes from.
I’ll try to look into it.
Hi all. I’m using Google Colaboratory and I’m getting an AttributeError when trying to import scipy sparse - module ‘scipy’ has no attribute ‘sparse’. I’ve searched the forums and noticed others having the same issue.
Not a fast.ai problem but, just wondering if anyone has found a patch…
I have installed scipy-1.1.0 and also imported it directly using - from scipy import sparse as sp
Any help would be greatly appreciated!
AttributeError Traceback (most recent call last)
----> 1 from fastai.text import *
2 import html
/usr/local/lib/python3.6/dist-packages/fastai/text.py in <module>()
----> 1 from .core import *
2 from .learner import *
3 from .lm_rnn import *
4 from torch.utils.data.sampler import Sampler
5 import spacy
/usr/local/lib/python3.6/dist-packages/fastai/core.py in <module>()
----> 1 from .imports import *
2 from .torch_imports import *
4 def sum_geom(a,r,n): return a*n if r==1 else math.ceil(a*(1-r**n)/(1-r))
/usr/local/lib/python3.6/dist-packages/fastai/imports.py in <module>()
3 import pandas as pd, pickle, sys, itertools, string, sys, re, datetime, time, shutil, copy
4 import seaborn as sns, matplotlib
----> 5 import IPython, graphviz, sklearn_pandas, sklearn, warnings, pdb
6 import contextlib
7 from abc import abstractmethod
/usr/local/lib/python3.6/dist-packages/sklearn_pandas/__init__.py in <module>()
1 __version__ = '1.6.0'
----> 3 from .dataframe_mapper import DataFrameMapper # NOQA
4 from .cross_validation import cross_val_score, GridSearchCV, RandomizedSearchCV # NOQA
5 from .categorical_imputer import CategoricalImputer # NOQA
/usr/local/lib/python3.6/dist-packages/sklearn_pandas/dataframe_mapper.py in <module>()
5 import numpy as np
6 from scipy import sparse
----> 7 from sklearn.base import BaseEstimator, TransformerMixin
9 from .cross_validation import DataWrapper
/usr/local/lib/python3.6/dist-packages/sklearn/__init__.py in <module>()
133 from . import __check_build
--> 134 from .base import clone
135 __check_build # avoid flakes unused variable error
/usr/local/lib/python3.6/dist-packages/sklearn/base.py in <module>()
11 from scipy import sparse
12 from .externals import six
---> 13 from .utils.fixes import signature
14 from . import __version__
/usr/local/lib/python3.6/dist-packages/sklearn/utils/__init__.py in <module>()
10 from .murmurhash import murmurhash3_32
---> 11 from .validation import (as_float_array,
13 check_random_state, column_or_1d, check_array,
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in <module>()
14 import numpy as np
---> 15 import scipy.sparse as sp
17 from ..externals import six
AttributeError: module 'scipy' has no attribute 'sparse'
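Not a fix, but a small diagnostic sketch that may help narrow this kind of error down: it asks the import system whether a dotted module name can be found and which file it would be loaded from, without relying on attribute access like scipy.sparse. The helper name describe_module is made up for this example; it uses only the standard library.

```python
import importlib.util


def describe_module(name):
    """Return (found, location) for a dotted module name.

    Hypothetical helper for diagnosis only. Looking up a dotted name
    imports its parent package as a side effect.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package of `name` is itself missing.
        return False, None
    if spec is None:
        return False, None
    return True, spec.origin


# Show which copy of each module the interpreter would pick up.
for mod in ("scipy", "scipy.sparse"):
    print(mod, describe_module(mod))
```

If the location printed for scipy is not the install you expect, or scipy is found but scipy.sparse is not, that points at the installed package rather than at fastai.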
Is it just me, or are the files no longer available at files.fast.ai?
For me, the files are still there.
Thanks, they’re back online now.
Is there a definitive answer to this question anywhere?
Based on the ULMFiT paper, the recommendation is to fine-tune “only the last layer” (section 3.2) before unfreezing and applying discriminative learning rates to the other layers. As such, why is there the line learner.unfreeze() immediately before fitting the model begins? It seems that it should be learner.freeze(-1), unless I’m missing something (which is typically the case).
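To make the difference between the two calls concrete, here is a toy sketch of freezing layer groups. The function freeze_to and the group names are hypothetical and only model the idea; this is not the fastai implementation, and it assumes that a negative index counts groups from the end.

```python
def freeze_to(layer_groups, n):
    """Toy model of freezing: groups before index n are frozen,
    groups from n onward are trainable. Negative n counts from the end."""
    if n < 0:
        n = n % len(layer_groups)
    return {name: (i >= n) for i, name in enumerate(layer_groups)}


# Hypothetical layer groups, lowest layer first.
groups = ["embedding", "rnn_1", "rnn_2", "rnn_3", "classifier_head"]

# Stage 1: train only the last layer group.
last_only = freeze_to(groups, -1)

# Stage 2: unfreeze everything, i.e. freeze nothing.
all_trainable = freeze_to(groups, 0)

print(last_only)
print(all_trainable)
```

With the first setting only the head receives updates, which matches the “only the last layer” step described above; the second setting corresponds to fitting with the whole model unfrozen.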
Hey Christine, I’m starting to take a look at something similar and I’m curious what your results were here?
I thought it was a typo but didn’t get round to reporting it. I do the same, unfreezing just the final layer.
Did you have any luck with this? Can you not simply add more classes?
Hi - Unfortunately I wasn’t able to get better results with the different losses. (I got everything up and running, but the accuracy was always lower.) I’m sure I didn’t exhaust every possibility though, so let me know if you have better success than I did!