Oh my bad then, sorry about that
I got an error in basic_train.py line 270 because it is not recognizing table=True. Changing the code to remove this from the call solves the problem.
Is it a bug?
Another potential bug:
When I create a DataBunch from a TensorDataset, as in lesson 5, and then try creating a ClassificationInterpretation, it breaks.
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9, figsize=(7,7))
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-33-f4ec02bb4041> in <module>()
1 interp = ClassificationInterpretation.from_learner(learn)
----> 2 interp.plot_top_losses(9, figsize=(7,7))
~/Code/fastai/fastai/vision/learner.py in plot_top_losses(self, k, largest, figsize)
96 "Show images in `top_losses` along with their prediction, actual, loss, and probability of predicted class."
97 tl_val,tl_idx = self.top_losses(k,largest)
---> 98 classes = self.data.classes
99 rows = math.ceil(math.sqrt(k))
100 fig,axes = plt.subplots(rows,rows,figsize=figsize)
~/Code/fastai/fastai/basic_data.py in __getattr__(self, k)
99 return cls(*dls, path=path, device=device, tfms=tfms, collate_fn=collate_fn)
100
--> 101 def __getattr__(self,k:int)->Any: return getattr(self.train_dl, k)
102 def dl(self, ds_type:DatasetType=DatasetType.Valid)->DeviceDataLoader:
103 "Returns appropriate `Dataset` for validation, training, or test (`ds_type`)."
~/Code/fastai/fastai/basic_data.py in __getattr__(self, k)
22
23 def __len__(self)->int: return len(self.dl)
---> 24 def __getattr__(self,k:str)->Any: return getattr(self.dl, k)
25
26 @property
~/Code/fastai/fastai/basic_data.py in DataLoader___getattr__(dl, k)
6 __all__ = ['DataBunch', 'DeviceDataLoader', 'DatasetType']
7
----> 8 def DataLoader___getattr__(dl, k:str)->Any: return getattr(dl.dataset, k)
9 DataLoader.__getattr__ = DataLoader___getattr__
10
AttributeError: 'TensorDataset' object has no attribute 'classes'
That conversation has already been had on another topic: if you’re not using fastai to create your dataset, don’t expect all fastai functionalities to work on it. Here the problem is that your TensorDataset doesn’t have the classes attribute that ClassificationInterpretation requires.
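For readers following the traceback, here is a minimal sketch of the delegation chain it shows: each wrapper forwards unknown attributes to the object it wraps, so the lookup of classes ends up on the dataset itself. The class and function names below (Delegator, TensorDatasetLike, build_chain, lookup_classes) are hypothetical stand-ins written in plain Python, not fastai or PyTorch code.

```python
class TensorDatasetLike:
    """Hypothetical stand-in for a dataset not built by fastai (no `classes`)."""
    def __init__(self, items):
        self.items = items


class Delegator:
    """Hypothetical wrapper that forwards unknown attributes to `inner`."""
    def __init__(self, inner):
        self.inner = inner

    def __getattr__(self, k):
        # Only called when normal attribute lookup on the wrapper fails.
        return getattr(self.inner, k)


def build_chain(dataset):
    # Three wrapper levels, mirroring the three __getattr__ frames
    # in the traceback above.
    return Delegator(Delegator(Delegator(dataset)))


def lookup_classes(data):
    """Return `data.classes`, or the error message if the lookup fails."""
    try:
        return data.classes
    except AttributeError as e:
        return f"AttributeError: {e}"


ds = TensorDatasetLike([1, 2, 3])
data = build_chain(ds)
print(lookup_classes(data))   # the error surfaces from the innermost object

ds.classes = ["3", "7"]       # assumption: attach the missing attribute by hand
print(lookup_classes(data))   # now every wrapper level can resolve it
```

The sketch only shows why the AttributeError names the dataset; whether plot_top_losses works end to end after attaching classes by hand is not something it demonstrates.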
After experimenting a bit, and going back and forth, we finally settled on adding a MAJ token: each word that begins with a capital is lower cased (as before) but we add xxmaj in front of it to tell the model. It appears to help a little bit.
There is a new pretrained model to match that change: you’ll find it in URLs.WT103_1.
The text example notebook has been updated to use it (and went from 79% to 84.5% accuracy in the process!)
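As a rough illustration of the rule described above, here is a minimal sketch. The function name mark_capitalized is hypothetical, and this is not the library's tokenizer code; in particular, how fully upper-case words are handled is an assumption (they are treated like any other capitalized word here).

```python
TK_MAJ = "xxmaj"  # marker token name taken from the post above


def mark_capitalized(tokens):
    """Lower-case every token; put TK_MAJ before tokens that began with a capital."""
    out = []
    for t in tokens:
        if t[:1].isupper():
            out.append(TK_MAJ)   # tell the model a capital was here
        out.append(t.lower())    # the word itself is lower-cased, as before
    return out


print(mark_capitalized("The movie was Great".split()))
```

The point of the design is that the vocabulary only needs the lower-cased form of each word, while the capitalization signal is kept as a separate token.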
Sorry, I didn’t see the question in another topic before posting here.
A lot of stuff aimed at unifying the API across applications just merged:
- every type of item now has a reconstruct method that does the opposite of .data: taking the tensor data and creating the object back.
- show_batch has been internally modified to actually grab a batch and then show it.
- show_results now works across applications.
- introducing data.export(), which will save the internal information (classes, vocab in text, processors in tabular etc.) needed for inference in a file named ‘export.pkl’. You can then create an empty_data object by using DataBunch.load_empty(path) (where path points to where this ‘export.pkl’ file is). This also works across applications.
Breaking change:
As a result, ImageDataBunch.single_from_classes has been removed, as the previous method is more general.
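To make the export idea above concrete, here is a minimal sketch of the pattern using only the standard library: the state needed for inference is pickled into a file named export.pkl, and later read back without the training data. The helper names export_state and load_empty_state are hypothetical; this is not how data.export() or DataBunch.load_empty are implemented, only the general idea.

```python
import pickle
from pathlib import Path

EXPORT_NAME = "export.pkl"  # file name mentioned in the post above


def export_state(path, state):
    """Pickle the inference-time state (e.g. classes, vocab) under `path`."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    target = path / EXPORT_NAME
    with open(target, "wb") as f:
        pickle.dump(state, f)
    return target


def load_empty_state(path):
    """Read the state back; no training data is needed at this point."""
    with open(Path(path) / EXPORT_NAME, "rb") as f:
        return pickle.load(f)
```

Usage would look like export_state("models", {"classes": ["3", "7"]}) at training time and load_empty_state("models") at inference time, where "models" is just an example directory.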
Awesome! Sylvain, can you point me to the scripts you are using to create the pre-trained model? I’d like to see if I can get some improvements using BiLM training and QRNN.
I wrote a small Tensorboard callback to visualize the metrics and the parameter/gradient distributions and histograms: https://nbviewer.jupyter.org/github/MicPie/fastai_course_v3/blob/master/TBLogger_v2.ipynb
It is still a work in progress, because the code needs to be polished and is only tested with the network in the notebook.
Could this be interesting for the library? If so, how would I best incorporate the needed Logger class (with the Copyleft license)?
Feedback, suggestions, tips, etc. are highly appreciated!
PS: I don’t know if switching to TensorboardX would be a better choice. Maybe somebody has already worked with TensorboardX and can share their experience?
I put the latest notebook I used to pretrain a QRNN here. Didn’t fully test the true_wd=False so you’ll have to add that.
I think the latest to_detach change broke the RNNCore’s forward method. I am getting a RuntimeError letting me know that input and hidden tensors are not on the same device. Since this wasn’t marked as a breaking change, I guess it is a bug. How to proceed?
Will look into that later today. It’s definitely a bug!
Is there anyone here using fastai-v1 with macOS who can help us reproduce and debug a fastai test suite failure on that system? (segfault in tests/test_vision_data_block.py)
https://dev.azure.com/fastdotai/fastai/_build/results?buildId=1930&view=logs
Most likely it’s related to this pytorch issue. And we would need to first reproduce this problem, and then reduce it to a simple test we could then file an issue with against pytorch.
Thanks.
I have OS X and can help
fwiw - i’m running python 3.7 and with the latest pull of fastai i’m not getting any failures in tests/test_vision_data_block.py
i was at first but once i deleted an old copy of mnist that was missing a test folder it worked fine
( osx 10.14.1 - no gpu )
Thank you for testing this Fred,
I’ve now updated our CI to run the correct up-to-date conda package on MacOS. They confusingly renamed pytorch-nightly-cpu to pytorch-nightly some weeks back. But this build works fine.
So it’s still something related to the pypi build, and other than potential nuances in the 2 different package builds, the main difference is that the conda and pypi install targets seem to be on different drives on the CI build. That’s why I thought it could be related to this pytorch issue. Is there a chance you could try to reproduce it so that the env and the data are on different mount points? Basically, moving the test suite to another /mnt/ point. See: https://github.com/pytorch/pytorch/issues/4969#issuecomment-381132009
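When setting up such a reproduction, it may help to confirm that two directories really do sit on different mount points. The sketch below is one way to check that with the standard library; the helper names same_device and mount_point are hypothetical, and the example paths in the usage note are placeholders to replace with the actual env and data locations.

```python
import os


def same_device(path_a, path_b):
    """True if both paths live on the same filesystem device (st_dev match)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def mount_point(path):
    """Walk up from `path` until a mount point is reached and return it."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path
```

For example, same_device(env_dir, data_dir) returning False, with mount_point giving two different results, would suggest the setup matches the "different drives" situation described above. Here env_dir and data_dir stand for whatever directories hold the Python environment and the test data.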
And for the sake of searchers the error is:
=================================== FAILURES ===================================
______________________ test_image_to_image_different_tfms ______________________
def test_image_to_image_different_tfms():
get_y_func = lambda o:o
mnist = untar_data(URLs.COCO_TINY)
x_tfms = get_transforms()
y_tfms = [[t for t in x_tfms[0]], [t for t in x_tfms[1]]]
y_tfms[0].append(flip_lr())
data = (ImageItemList.from_folder(mnist)
.random_split_by_pct()
.label_from_func(get_y_func)
.transform(x_tfms)
.transform_y(y_tfms)
.databunch(bs=16))
> x,y = data.one_batch()
tests/test_vision_data_block.py:96:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
fastai/basic_data.py:115: in one_batch
try: x,y = next(iter(dl))
fastai/basic_data.py:47: in __iter__
for b in self.dl:
/Users/vsts/hostedtoolcache/Python/3.6.5/x64/lib/python3.6/site-packages/torch/utils/data/dataloader.py:631: in __next__
idx, batch = self._get_batch()
/Users/vsts/hostedtoolcache/Python/3.6.5/x64/lib/python3.6/site-packages/torch/utils/data/dataloader.py:610: in _get_batch
return self.data_queue.get()
/Users/vsts/hostedtoolcache/Python/3.6.5/x64/lib/python3.6/multiprocessing/queues.py:94: in get
res = self._recv_bytes()
/Users/vsts/hostedtoolcache/Python/3.6.5/x64/lib/python3.6/multiprocessing/connection.py:216: in recv_bytes
buf = self._recv_bytes(maxlength)
/Users/vsts/hostedtoolcache/Python/3.6.5/x64/lib/python3.6/multiprocessing/connection.py:407: in _recv_bytes
buf = self._recv(4)
/Users/vsts/hostedtoolcache/Python/3.6.5/x64/lib/python3.6/multiprocessing/connection.py:379: in _recv
chunk = read(handle, remaining)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
signum = 20, frame = <frame object at 0x1050e8048>
def handler(signum, frame):
# This following call uses `waitid` with WNOHANG from C side. Therefore,
# Python can still get and update the process status successfully.
> _error_if_any_worker_fails()
E RuntimeError: DataLoader worker (pid 1201) is killed by signal: Unknown signal: 0.
/Users/vsts/hostedtoolcache/Python/3.6.5/x64/lib/python3.6/site-packages/torch/utils/data/dataloader.py:274: RuntimeError
----------------------------- Captured stderr call -----------------------------
ERROR: Unexpected segmentation fault encountered in worker.
just to be clear, the bug you point to mentions /mnt which is a linux thing
(on osx there is /Volumes)
Am I still working on OSX, or is it ok to use different mounts on linux?
Oh, sorry, I don’t know osx, I assumed it’s the same as linux (mount-points-wise), but perhaps it’s not. I guess you need to go backwards from this solution, to reproduce the problem. Does it make sense?
I am not sure yet about testing it on linux - I will do that shortly myself. The CIs on linux and osx are configured identically, and only osx fails. But the original bug report is on linux, so I will certainly test that to rule it out.