I decided to branch this conversation out from another thread.
Basically, I have a regression problem to solve. This is how I set up my data and my model:
procs = [Categorify]

data = TabularDataBunch.from_df(path=path,
                                df=df,
                                dep_var=dep_var,
                                valid_idx=valid_idx,
                                procs=procs,
                                cat_names=cat_col,
                                cont_names=con_col,
                                c=1)

# Scale the observed target extremes by 20% when setting y_range.
range_scale = 1.2
y_range = (
    float(df.iloc[train_idx].Vfinal_min_Vf50.min() * range_scale),
    float(df.iloc[train_idx].Vfinal_min_Vf50.max() * range_scale)
)

def rmse(pred, targ):
    "RMSE between `pred` and `targ`."
    return torch.sqrt(((targ - pred)**2).mean())

emb_szs = {'weekday': 3}
learn = get_tabular_learner(data,
                            layers=[200, 100],
                            emb_szs=emb_szs,
                            y_range=y_range) #, metrics=rmse)
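For what it's worth, the metric itself behaves as expected on plain tensors:

import torch
pred = torch.tensor([1.0, 2.0, 3.0])
targ = torch.tensor([1.5, 2.0, 2.5])
rmse(pred, targ)  # tensor(0.4082)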
I cannot use my own metric (RMSE): it fails with an error indicating that fastai is treating this as a classification problem (i.e. it thinks my validation set has 96 levels while the model predicts 89):
RuntimeError: The size of tensor a (96) must match the size of tensor b (89) at non-singleton dimension 1
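A quick way to see that the task was mislabelled (a sketch; data.c is the attribute fastai uses to size the final layer, so given the model below I would expect it to report 89 here rather than 1):

print(data.c)  # expected 1 for regression; instead, one output per distinct target level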
Moreover, when I run learn.layer_groups, I get the following output:
[Sequential(
(0): Embedding(6, 3)
(1): Dropout(p=0.0)
(2): BatchNorm1d(7, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): Linear(in_features=10, out_features=200, bias=True)
(4): ReLU(inplace)
(5): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(6): Linear(in_features=200, out_features=100, bias=True)
(7): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): Linear(in_features=100, out_features=89, bias=True)
)]
Notice that the last layer outputs 89 features, while I only need one: a single real number.
FastAI version 1.0.22 had a “hidden” parameter c one could pass to TabularDataBunch, as discussed in the parent thread, but in the current version 1.0.24 that parameter has no effect on the kind of network being constructed.
Update
I noticed that kwargs do not get passed on to the data created by TabularDataBunch.from_df (see the code). Instead, you need to set c manually after the data is created, e.g.
data = TabularDataBunch.from_df(path=path,
                                df=df,
                                dep_var=dep_var,
                                valid_idx=valid_idx,
                                procs=procs,
                                cat_names=cat_names,
                                cont_names=cont_names)
data.c = 1
...
learn = get_tabular_learner(data,
                            layers=[200, 100],
                            emb_szs=emb_szs,
                            y_range=y_range)
The resulting network looks like it may be capable of regression, with only one output in the last layer:
[Sequential(
(0): Embedding(6, 3)
(1): Dropout(p=0.0)
(2): BatchNorm1d(7, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): Linear(in_features=10, out_features=200, bias=True)
(4): ReLU(inplace)
(5): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(6): Linear(in_features=200, out_features=100, bias=True)
(7): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(8): Linear(in_features=100, out_features=1, bias=True)
)]
Unfortunately, that breaks something down the line when I try to train the model with learn.fit_one_cycle(1, 1e-2). I believe fastai still thinks I am training a classification task, this time with a single output (i.e. 1 or 0), and it trips on the fact that my targets lie outside the [0, 1] range:
RuntimeError Traceback (most recent call last)
<ipython-input-20-3ea49add0339> in <module>
----> 1 learn.fit_one_cycle(1, 1e-2)
/opt/conda/lib/python3.6/site-packages/fastai/train.py in fit_one_cycle(learn, cyc_len, max_lr, moms, div_factor, pct_start, wd, callbacks, **kwargs)
18 callbacks.append(OneCycleScheduler(learn, max_lr, moms=moms, div_factor=div_factor,
19 pct_start=pct_start, **kwargs))
---> 20 learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
21
22 def lr_find(learn:Learner, start_lr:Floats=1e-7, end_lr:Floats=10, num_it:int=100, stop_div:bool=True, **kwargs:Any):
/opt/conda/lib/python3.6/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
160 callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
161 fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 162 callbacks=self.callbacks+callbacks)
163
164 def create_opt(self, lr:Floats, wd:Floats=0.)->None:
/opt/conda/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
92 except Exception as e:
93 exception = e
---> 94 raise e
95 finally: cb_handler.on_train_end(exception)
96
/opt/conda/lib/python3.6/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
82 for xb,yb in progress_bar(data.train_dl, parent=pbar):
83 xb, yb = cb_handler.on_batch_begin(xb, yb)
---> 84 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)
85 if cb_handler.on_batch_end(loss): break
86
/opt/conda/lib/python3.6/site-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
23
24 if opt is not None:
---> 25 loss = cb_handler.on_backward_begin(loss)
26 loss.backward()
27 cb_handler.on_backward_end()
/opt/conda/lib/python3.6/site-packages/fastai/callback.py in on_backward_begin(self, loss)
219 def on_backward_begin(self, loss:Tensor)->None:
220 "Handle gradient calculation on `loss`."
--> 221 self.smoothener.add_value(loss.detach().cpu())
222 self.state_dict['last_loss'], self.state_dict['smooth_loss'] = loss, self.smoothener.smooth
223 for cb in self.callbacks:
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generic/THCTensorCopy.cpp:70
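One thing I have not tried systematically yet is forcing a regression loss explicitly. A minimal sketch, assuming the learner's loss function can simply be reassigned after construction:

import torch.nn.functional as F

# Swap out the cross-entropy loss fastai picked for a plain MSE loss.
# The model outputs shape (bs, 1) while the targets come in as (bs,),
# so flatten the predictions before comparing.
def mse_flat(pred, targ):
    return F.mse_loss(pred.view(-1), targ.float())

learn.loss_func = mse_flat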
Any suggestions?
Update 2
I believe one of the culprits is the choice of cross-entropy as the loss function. The switch occurs within data_block.py, inside the label_cls function. My fastai code base did not have the if isinstance(it, (float, np.float32)): return FloatList line; when I add it, another error pops up, so the solution is not there yet.
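If label_cls really dispatches on the type of the first label value, a workaround worth trying (a sketch, not yet verified against the error above) is to make sure the dependent variable is a float dtype before the DataBunch is built, so that the FloatList branch is taken:

import numpy as np

# Cast the target column to float so label_cls resolves it to FloatList
# (regression) rather than CategoryList (classification).
df[dep_var] = df[dep_var].astype(np.float32)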