Human numbers notebook broken?

With the "Multi fully connected" model, I get:

RuntimeErrorTraceback (most recent call last)
<ipython-input-56-795d2412635c> in <module>()
----> 1 learn.fit_one_cycle(10, 1e-4, pct_start=0.1)

/usr/local/lib/python3.6/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, 
div_factor, pct_start, wd, callbacks, **kwargs)
 19     callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
 20                                         pct_start=pct_start, **kwargs))
---> 21     learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
 22 
 23 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, **kwargs:Any):

/usr/local/lib/python3.6/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
164         callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
165         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 166             callbacks=self.callbacks+callbacks)
167 
168     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/usr/local/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
 92     except Exception as e:
 93         exception = e
---> 94         raise e
 95     finally: cb_handler.on_train_end(exception)
 96 

/usr/local/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
 82             for xb,yb in progress_bar(data.train_dl, parent=pbar):
 83                 xb, yb = cb_handler.on_batch_begin(xb, yb)
---> 84                 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)
 85                 if cb_handler.on_batch_end(loss): break
 86 

/usr/local/lib/python3.6/site-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
 24     if opt is not None:
 25         loss = cb_handler.on_backward_begin(loss)
---> 26         loss.backward()
 27         cb_handler.on_backward_end()
 28         opt.step()

/usr/local/lib/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
100                 products. Defaults to ``False``.
101         """
--> 102         torch.autograd.backward(self, gradient, retain_graph, create_graph)
103 
104     def register_hook(self, hook):

/usr/local/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
 88     Variable._execution_engine.run_backward(
 89         tensors, grad_tensors, retain_graph, create_graph,
---> 90         allow_unreachable=True)  # allow_unreachable flag
 91 
 92 

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

when I try to run:

learn.fit_one_cycle(10, 1e-4, pct_start=0.1)

I am on fastai 1.0.37. Has something changed in this version?


Same here. Curiously, the rest of the notebook works fine.

I updated CUDA from 9.1 to 9.2 in my Docker image, which made other code in Lesson 7 work, but it doesn't resolve this one.

The models following Model 2 work, though.

I haven't run the notebook, but the error is coming from the use of an in-place operation. It might be the line h += self.i_h(xi) in the forward function.

I notice the next model has this line written as h = h + self.i_h(xi). Try that change and see if it fixes things.
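
For reference, the forward loop would look roughly like this with the out-of-place add. This is just a sketch of the lesson's multi fully connected model; nv (vocab size), nh (hidden size) and the layer names are assumptions based on the notebook:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Model1(nn.Module):
        def __init__(self, nv, nh):
            super().__init__()
            self.i_h = nn.Embedding(nv, nh)   # input to hidden
            self.h_h = nn.Linear(nh, nh)      # hidden to hidden
            self.h_o = nn.Linear(nh, nv)      # hidden to output
            self.bn = nn.BatchNorm1d(nh)

        def forward(self, x):
            h = torch.zeros(x.shape[0], self.h_h.in_features, device=x.device)
            for i in range(x.shape[1]):
                # out-of-place add: creates a new tensor each step instead of
                # mutating one that autograd may still need for the backward pass
                h = h + self.i_h(x[:, i])
                h = F.relu(self.h_h(h))
            return self.h_o(self.bn(h))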


Interesting; I wonder if it's a newer PyTorch or CUDA version that lets it work for me. Anyway, I've removed those in-place ops now to avoid the problem.
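
To see the failure mode in isolation, here is a tiny standalone repro (not from the notebook, just an illustration): sigmoid saves its output for the backward pass, so modifying that output in place triggers the same RuntimeError.

    import torch

    x = torch.randn(3, requires_grad=True)
    y = x.sigmoid()      # sigmoid's backward pass reuses its output y
    y += 1               # in-place op modifies a tensor saved for backward
    y.sum().backward()   # RuntimeError: one of the variables needed for gradient
                         # computation has been modified by an inplace operation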

Confirmed, this fixes it.

Only Model2 works without the in-place operation (using commit f6bc052); however, the accuracy is very bad after 10 epochs. The rest of the models are still broken.

epoch  train_loss  valid_loss  accuracy
10     3.541940    3.634251    0.057940

My environment is:
CUDA: 9.0.176
fastai: 1.0.38
PyTorch: 1.0.0
conda: 4.5.12
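
In case it helps to compare setups, one way to print these versions from inside the notebook (assuming a standard fastai v1 install) is:

    import torch, fastai

    print('PyTorch:', torch.__version__)
    print('CUDA:   ', torch.version.cuda)
    print('fastai: ', fastai.__version__)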