Human numbers notebook broken?

With the "Multi fully connected" model, I get:

RuntimeErrorTraceback (most recent call last)
<ipython-input-56-795d2412635c> in <module>()
----> 1 learn.fit_one_cycle(10, 1e-4, pct_start=0.1)

/usr/local/lib/python3.6/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, 
div_factor, pct_start, wd, callbacks, **kwargs)
 19     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
 20                                         pct_start=pct_start, **kwargs))
---> 21     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
 22 
 23 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, **kwargs:Any):

/usr/local/lib/python3.6/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
164         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
165         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 166             callbacks=self.callbacks+callbacks)
167 
168     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/usr/local/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
 92     except Exception as e:
 93         exception = e
---> 94         raise e
 95     finally: cb_handler.on_train_end(exception)
 96 

/usr/local/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
 82             for xb,yb in progress_bar(data.train_dl, parent=pbar):
 83                 xb, yb = cb_handler.on_batch_begin(xb, yb)
---> 84                 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)
 85                 if cb_handler.on_batch_end(loss): break
 86 

/usr/local/lib/python3.6/site-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
 24     if opt is not None:
 25         loss = cb_handler.on_backward_begin(loss)
---> 26         loss.backward()
 27         cb_handler.on_backward_end()
 28         opt.step()

/usr/local/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
100                 products. Defaults to ``False``.
101         """
--> 102         torch.autograd.backward(self, gradient, retain_graph, create_graph)
103 
104     def register_hook(self, hook):

/usr/local/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
 88     Variable._execution_engine.run_backward(
 89         tensors, grad_tensors, retain_graph, create_graph,
---> 90         allow_unreachable=True)  # allow_unreachable flag
 91 
 92 

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

when I try to run:

learn.fit_one_cycle(10, 1e-4, pct_start=0.1)

I am on fastai 1.0.37. Has something changed in this version?


Same here. Curiously, the rest of the notebook works fine.

I updated CUDA from 9.1 to 9.2 in my Docker image, which made other code in Lesson 7 work, but it doesn't resolve this one.

The models following Model 2 work, though.

I haven't run the notebook, but the error is coming from the use of an in-place operation. It might be the line h += self.i_h(xi) in the forward function.

I notice the next model has this line written as h = h + self.i_h(xi). Try that change and see if it fixes things.
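
For reference, the forward loop would look roughly like this with the out-of-place add. This is just a sketch of the lesson's multi fully connected model; nv (vocab size), nh (hidden size) and the layer names are assumptions based on the notebook:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Model1(nn.Module):
        def __init__(self, nv, nh):
            super().__init__()
            self.i_h = nn.Embedding(nv, nh)   # input to hidden
            self.h_h = nn.Linear(nh, nh)      # hidden to hidden
            self.h_o = nn.Linear(nh, nv)      # hidden to output
            self.bn = nn.BatchNorm1d(nh)

        def forward(self, x):
            h = torch.zeros(x.shape[0], self.h_h.in_features, device=x.device)
            for i in range(x.shape[1]):
                # out-of-place add: creates a new tensor each step instead of
                # mutating one that autograd may still need for the backward pass
                h = h + self.i_h(x[:, i])
                h = F.relu(self.h_h(h))
            return self.h_o(self.bn(h))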


Interesting; I wonder if it's a newer PyTorch or CUDA version that lets it work for me. Anyway, I've removed those in-place ops now to avoid the problem.
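
To see the failure mode in isolation, here is a tiny standalone repro (not from the notebook, just an illustration): sigmoid saves its output for the backward pass, so modifying that output in place triggers the same RuntimeError.

    import torch

    x = torch.randn(3, requires_grad=True)
    y = x.sigmoid()      # sigmoid's backward pass reuses its output y
    y += 1               # in-place op modifies a tensor saved for backward
    y.sum().backward()   # RuntimeError: one of the variables needed for gradient
                         # computation has been modified by an inplace operation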

Confirmed, this fixes it.

Only Model2 works without the in-place operation (using commit f6bc052); however, the accuracy is very bad after 10 epochs. The rest of the models are still broken.

epoch  train_loss  valid_loss  accuracy
10     3.541940    3.634251    0.057940

My environment is:
CUDA: 9.0.176
fastai: 1.0.38
PyTorch: 1.0.0
conda: 4.5.12
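
In case it helps to compare setups, one way to print these versions from inside the notebook (assuming a standard fastai v1 install) is:

    import torch, fastai

    print('PyTorch:', torch.__version__)
    print('CUDA:   ', torch.version.cuda)
    print('fastai: ', fastai.__version__)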