09 is currently broken due to a change in PyTorch. @sgugger is working on fixing it.
The bug was coming from the change of default in affine_coord/grid_sample; I fixed it by passing the proper value for now. We can decide later whether we want the value of align_corners to be True or False; for now I set it to True (so the opposite of PyTorch 1.4.0 but the same as the earlier behavior). It’s an arg, so the user can change it as they like.
This required pinning the min version of PyTorch to 1.3.0. Also note that we have now pinned torchvision to 0.5.0 or later to deal with the bug that comes with recent versions of PIL.
I understand, thank you both for the support
Great! Thanks for the updates! I’ll be trying again today. By the way, I always wondered why you were fixing the PyTorch version, so thanks for that explanation
I’ve just created https://github.com/fastai/fastcore/pull/10 to explain this in the docs but … _xtra is optional. So if _xtra is None, all attr access is passed down to self.default
There was a bit of overly complicated magic in Transform, with all the as_item and as_item_force attributes, that resulted in hard-to-read code and subtle bugs (one took Jeremy and me a whole afternoon of pairing to fix…) so I’ve done a bit of clean up there. This is the behavior of Transform now:
- a Transform is always applied to each element of a tuple when it gets a tuple (subclasses of tuples too, but not named tuples, not lists, not Ls); otherwise it is applied to the element it receives
- if you don’t want that default behavior, use an ItemTransform, which will always be applied to the thing it receives, without looking at tuples (for instance, ItemGetter is such an ItemTransform because it takes a whole tuple and returns the i-th element)
This is in master of fastcore and fastai2, all tests are passing AFAICT. Next will be adding some support to easily debug the pipelines of transforms in TfmdLists and Datasets (it’s already there for DataBlock).
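To make the rule above concrete, here is a minimal pure-Python sketch of the dispatch it describes. ToyTransform, ToyItemTransform and is_plain_tuple are hypothetical stand-ins written for illustration only; they are not the fastcore classes and do not reproduce their implementation (no type dispatch, no decodes, no setup).

```python
from collections import namedtuple

def is_plain_tuple(x):
    "True for tuples and tuple subclasses, except named tuples (which define `_fields`)."
    return isinstance(x, tuple) and not hasattr(x, "_fields")

class ToyTransform:
    "Toy stand-in: applies `fn` to each element of a plain tuple, else to the whole input."
    def __init__(self, fn): self.fn = fn
    def __call__(self, x):
        if is_plain_tuple(x): return tuple(self.fn(o) for o in x)
        return self.fn(x)

class ToyItemTransform(ToyTransform):
    "Toy stand-in: always applies `fn` to the whole input, tuple or not."
    def __call__(self, x): return self.fn(x)

Row = namedtuple("Row", ["a", "b"])          # hypothetical named tuple
double = ToyTransform(lambda o: o * 2)
second = ToyItemTransform(lambda t: t[1])

print(double((1, 2)))     # applied per element: (2, 4)
print(double([1, 2]))     # list handled as one object: [1, 2, 1, 2]
print(double(Row(1, 2)))  # named tuple handled as one object: (1, 2, 1, 2)
print(second((1, 2)))     # whole tuple in, second element out: 2
```

Passing a list instead of a tuple is therefore the simple way to have a collection treated as a single object in this sketch, which mirrors the behavior described for the real Transform.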
A helpful little note (didn’t think it needed its own thread):
To get results similar to with progress_disabled_ctx(learn) as learn: back in v1, do:
with learn.no_logging():
    with learn.no_bar():
        learn.fit()
Note that you can put them both on the same line:
with learn.no_logging(), learn.no_bar():
    learn.fit(...)
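The single-line form works because Python treats several context managers in one with statement exactly like nested with blocks: they are entered left to right and exited in reverse order. A small standard-library sketch, where tagged is a hypothetical stand-in for learn.no_logging() and learn.no_bar() (it only records events and does not touch fastai):

```python
from contextlib import contextmanager

events = []

@contextmanager
def tagged(name):
    "Hypothetical context manager that records when it is entered and exited."
    events.append(f"enter {name}")
    try:
        yield
    finally:
        events.append(f"exit {name}")

# Same semantics as nesting `with tagged("no_logging"):` around `with tagged("no_bar"):`
with tagged("no_logging"), tagged("no_bar"):
    events.append("fit")

print(events)
# ['enter no_logging', 'enter no_bar', 'fit', 'exit no_bar', 'exit no_logging']
```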
Hi @jeremy, I am very eager to join the March course online. Although I am not very active on the forums here, I have been trying to learn fastai v2 from @muellerzr’s notebooks. I would be grateful if you could allow me to view the course online. If there is something I can do now to qualify, please let me know.
Is there an easy explanation for why Transform is applied to all elements of a tuple but not of a list or named tuple?
I am still working on this problem. I have tried it a couple of times on another machine (no GPU, cheaper) and a similar thing happens (I updated the repositories as well). For some reason it also causes one of the AWS checks to fail, so I have to reset the machine.
Now I am trying with n_workers set to half of those available, to see if this helps. I know I should try to disable parallel processing for better debugging, but with this many documents (remember, about 3.7 million) I am not sure that’s really an option.
Does anyone have an issue with the latest fastai2 version (0.0.11… I was using version 0.0.8 before) when using this import:
from fastai2.vision.all import *
When switching to torch 1.4.0 this error disappears, but new issues occur with RNNs.
(I can put these issues on the fastai2 GitHub repo, but I’d prefer some quick feedback first in case I’m missing something)
PyTorch & co versions (I’ve created a new docker image from the official pytorch tags on docker hub):
torch (1.4.0) - Tensors and Dynamic neural networks in Python with strong GPU acceleration
INSTALLED: 1.3.0
torchvision (0.5.0) - image and video datasets and models for torch deep learning
INSTALLED: 0.5.0 (latest)
Error:
File "/root/workspace/fastai2/fastai2/vision/all.py", line 3, in <module>
from .augment import *
File "/root/workspace/fastai2/fastai2/vision/augment.py", line 91, in <module>
from torchvision.transforms.functional import pad as tvpad
File "/opt/conda/lib/python3.6/site-packages/torchvision/__init__.py", line 3, in <module>
from torchvision import models
File "/opt/conda/lib/python3.6/site-packages/torchvision/models/__init__.py", line 12, in <module>
from . import detection
File "/opt/conda/lib/python3.6/site-packages/torchvision/models/detection/__init__.py", line 1, in <module>
from .faster_rcnn import *
File "/opt/conda/lib/python3.6/site-packages/torchvision/models/detection/faster_rcnn.py", line 13, in <module>
from .rpn import AnchorGenerator, RPNHead, RegionProposalNetwork
File "/opt/conda/lib/python3.6/site-packages/torchvision/models/detection/rpn.py", line 11, in <module>
from . import _utils as det_utils
File "/opt/conda/lib/python3.6/site-packages/torchvision/models/detection/_utils.py", line 19, in <module>
class BalancedPositiveNegativeSampler(object):
File "/opt/conda/lib/python3.6/site-packages/torch/jit/__init__.py", line 1219, in script
_compile_and_register_class(obj, _rcb, qualified_name)
File "/opt/conda/lib/python3.6/site-packages/torch/jit/__init__.py", line 1076, in _compile_and_register_class
_jit_script_class_compile(qualified_name, ast, rcb)
File "/opt/conda/lib/python3.6/site-packages/torch/jit/_recursive.py", line 222, in try_compile_fn
return torch.jit.script(fn, _rcb=rcb)
File "/opt/conda/lib/python3.6/site-packages/torch/jit/__init__.py", line 1226, in script
fn = torch._C._jit_script_compile(qualified_name, ast, _rcb, get_default_args(obj))
RuntimeError:
builtin cannot be used as a value:
at /opt/conda/lib/python3.6/site-packages/torchvision/models/detection/_utils.py:14:56
def zeros_like(tensor, dtype):
    # type: (Tensor, int) -> Tensor
    return torch.zeros_like(tensor, dtype=dtype, layout=tensor.layout,
                                                        ~~~~~~~~~~~~~ <--- HERE
                            device=tensor.device, pin_memory=tensor.is_pinned())
'zeros_like' is being compiled since it was called from '__torch__.torchvision.models.detection._utils.BalancedPositiveNegativeSampler.__call__'
at /opt/conda/lib/python3.6/site-packages/torchvision/models/detection/_utils.py:72:12
# randomly select positive and negative examples
perm1 = torch.randperm(positive.numel(), device=positive.device)[:num_pos]
perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]
pos_idx_per_image = positive[perm1]
neg_idx_per_image = negative[perm2]
# create binary mask from indices
pos_idx_per_image_mask = zeros_like(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~... <--- HERE
matched_idxs_per_image, dtype=torch.uint8
)
neg_idx_per_image_mask = zeros_like(
matched_idxs_per_image, dtype=torch.uint8
)
pos_idx_per_image_mask[pos_idx_per_image] = torch.tensor(1, dtype=torch.uint8)
neg_idx_per_image_mask[neg_idx_per_image] = torch.tensor(1, dtype=torch.uint8)
The pipelines in the mid-level API will produce tuples, which is why we need this type to be special. We excluded other collections like lists or Ls so that people have an easy way to apply Transforms to a collection as a whole (just pass a list instead of a tuple). For instance, a LabeledBBox subclasses L (it is a list of boxes and their labels) and it will always be considered as one object for the transforms.
Named tuples are excluded for now as a hack around the iteration over rows of a dataframe. That iteration produces a special pandas class that subclasses named tuple, but we need those rows to be considered as one item. Once we have fixed at the low level how that iteration is done, we can remove them from being an exception (since they are a subclass of tuple).
Cool, thanks for the insight!
…and the reason for this is that PyTorch datasets are expected to create tuples, and DataLoader batches are tuples.
I am experiencing a problem with DataBlock.datasets() when using a list of tuples (numpy.ndarray, str) as a source. Using fastai2 v0.0.10, I was able to create dataloaders using both the Datasets and DataBlock classes. When I upgraded to v0.0.11 this issue appeared.
I updated all my transforms that were using as_item=True to ItemTransform. By doing so, I was able to create a dataset using the Datasets class. Datasets.vocab works fine and I can plot a timeseries from my dataset using show_at.
My items are basically a list of tuples. Each tuple holds a numpy.ndarray and a string. The error occurs in Categorize.setups() (in fastai2.data.transforms.py). It says TypeError: unhashable type: 'numpy.ndarray'. It looks like the numpy array is passed in instead of the strings. I suspect that it has to do with DataBlock.getters(), but none of my attempts were successful.
The source code and the error stack are below
class TensorTS(TensorBase):
    "Transform a 2D array into a Tensor"
    def show(self, ctx=None, title=None, chs=None, leg=True, **kwargs):
        ....

class ToTensorTS(ItemTransform):
    "xy : tuple representing (2D numpy array, Label)"
    def encodes(self, xy): x,y=xy; return TensorTS(x)

def TSBlock():
    "`TransformBlock` for timeseries : Transform np array to TensorTS type"
    return TransformBlock(type_tfms=ToTensorTS())

class LabelTS(ItemTransform):
    "x : tuple representing (2D numpy array, Label)"
    def encodes(self, xy): x,y=xy; return y

getters = [lambda xy: xy[0], lambda xy: xy[1]]
tsdb = DataBlock(blocks=(TSBlock, CategoryBlock),
                 get_items=get_ts_items,
                 getters=getters,
                 splitter=RandomSplitter(seed=seed),
                 batch_tfms = batch_tfms)
tsdb.datasets(fnames, verbose=True)
=======================
Collecting items from [Path('C:/Users/fh/.fastai/data/NATOPS/NATOPS_TRAIN.arff'), Path('C:/Users/fh/.fastai/data/NATOPS/NATOPS_TEST.arff')]
Found 360 items
2 datasets of sizes 288,72
Setting up Pipeline: <lambda> -> ToTensorTSBlock
TYPE of xy : <class 'list'>
LENTGH of xy : 2
[array([-0.540579, -0.54101 , -0.540603, -0.540807, -0.540564, -0.540681,
-0.540665, -0.541065, -0.540593, -0.540723, -0.54044 , -0.540232,
-0.540191, -0.540209, -0.53981 , -0.539904, -0.540259, -0.540194,
-0.54002 , -0.54033 , -0.546852, -0.551316, -0.553834, -0.558104,
-0.560738, -0.558948, -0.559712, -0.565038, -0.560855, -0.55511 ,
-0.553586, -0.55047 , -0.555965, -0.554922, -0.55378 , -0.557211,
-0.556262, -0.558439, -0.560581, -0.560134, -0.55657 , -0.564141,
-0.56727 , -0.568937, -0.572611, -0.570396, -0.569147, -0.564811,
-0.56305 , -0.566314, -0.553712], dtype=float32), '2']
TYPE of x : <class 'numpy.ndarray'>
TYPE of y : <class 'str'>
x : [-0.540579 -0.54101 -0.540603 -0.540807 -0.540564 -0.540681 -0.540665
-0.541065 -0.540593 -0.540723 -0.54044 -0.540232 -0.540191 -0.540209
-0.53981 -0.539904 -0.540259 -0.540194 -0.54002 -0.54033 -0.546852
-0.551316 -0.553834 -0.558104 -0.560738 -0.558948 -0.559712 -0.565038
-0.560855 -0.55511 -0.553586 -0.55047 -0.555965 -0.554922 -0.55378
-0.557211 -0.556262 -0.558439 -0.560581 -0.560134 -0.55657 -0.564141
-0.56727 -0.568937 -0.572611 -0.570396 -0.569147 -0.564811 -0.56305
-0.566314 -0.553712]
y : 2
Setting up Pipeline: <lambda> -> Categorize
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
----> 1 tsdb.datasets(fnames, verbose=True)
c:\users\fh\dev\fastai2\fastai2\fastai2\data\block.py in datasets(self, source, verbose)
93 splits = (self.splitter or RandomSplitter())(items)
94 pv(f"{len(splits)} datasets of sizes {','.join([str(len(s)) for s in splits])}", verbose)
---> 95 return Datasets(items, tfms=self._combine_type_tfms(), splits=splits, dl_type=self.dl_type, n_inp=self.n_inp, verbose=verbose)
96
97 def dataloaders(self, source, path='.', verbose=False, **kwargs):
c:\users\fh\dev\fastai2\fastai2\fastai2\data\core.py in __init__(self, items, tfms, tls, n_inp, dl_type, **kwargs)
259 def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):
260 super().__init__(dl_type=dl_type)
--> 261 self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
262 self.n_inp = (1 if len(self.tls)==1 else len(self.tls)-1) if n_inp is None else n_inp
263
c:\users\fh\dev\fastai2\fastai2\fastai2\data\core.py in <listcomp>(.0)
259 def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):
260 super().__init__(dl_type=dl_type)
--> 261 self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
262 self.n_inp = (1 if len(self.tls)==1 else len(self.tls)-1) if n_inp is None else n_inp
263
c:\users\fh\dev\fastai2\fastcore\fastcore\foundation.py in __call__(cls, x, *args, **kwargs)
39 return x
40
---> 41 res = super().__call__(*((x,) + args), **kwargs)
42 res._newchk = 0
43 return res
c:\users\fh\dev\fastai2\fastai2\fastai2\data\core.py in __init__(self, items, tfms, use_list, do_setup, split_idx, train_setup, splits, types, verbose)
200 if do_setup:
201 pv(f"Setting up {self.tfms}", verbose)
--> 202 self.setup(train_setup=train_setup)
203
204 def _new(self, items, **kwargs): return super()._new(items, tfms=self.tfms, do_setup=False, types=self.types, **kwargs)
c:\users\fh\dev\fastai2\fastai2\fastai2\data\core.py in setup(self, train_setup)
213
214 def setup(self, train_setup=True):
--> 215 self.tfms.setup(self, train_setup)
216 if len(self) != 0:
217 x = super().__getitem__(0) if self.splits is None else super().__getitem__(self.splits[0])[0]
c:\users\fh\dev\fastai2\fastcore\fastcore\transform.py in setup(self, items, train_setup)
177 tfms = self.fs[:]
178 self.fs.clear()
--> 179 for t in tfms: self.add(t,items, train_setup)
180
181 def add(self,t, items=None, train_setup=False):
c:\users\fh\dev\fastai2\fastcore\fastcore\transform.py in add(self, t, items, train_setup)
180
181 def add(self,t, items=None, train_setup=False):
--> 182 t.setup(items, train_setup)
183 self.fs.append(t)
184
c:\users\fh\dev\fastai2\fastcore\fastcore\transform.py in setup(self, items, train_setup)
76 def setup(self, items=None, train_setup=False):
77 train_setup = train_setup if self.train_setup is None else self.train_setup
---> 78 return self.setups(getattr(items, 'train', items) if train_setup else items)
79
80 def _call(self, fn, x, split_idx=None, **kwargs):
c:\users\fh\dev\fastai2\fastcore\fastcore\dispatch.py in __call__(self, *args, **kwargs)
96 if not f: return args[0]
97 if self.inst is not None: f = MethodType(f, self.inst)
---> 98 return f(*args, **kwargs)
99
100 def __get__(self, inst, owner):
c:\users\fh\dev\fastai2\fastai2\fastai2\data\transforms.py in setups(self, dsets)
184
185 def setups(self, dsets):
--> 186 if self.vocab is None and dsets is not None: self.vocab = CategoryMap(dsets, add_na=self.add_na)
187 self.c = len(self.vocab)
188
c:\users\fh\dev\fastai2\fastai2\fastai2\data\transforms.py in __init__(self, col, sort, add_na)
169 if not hasattr(col,'unique'): col = L(col, use_list=True)
170 # `o==o` is the generalized definition of non-NaN used by Pandas
--> 171 items = L(o for o in col.unique() if o==o)
172 if sort: items = items.sorted()
173 self.items = '#na#' + items if add_na else items
c:\users\fh\dev\fastai2\fastcore\fastcore\foundation.py in unique(self)
372 return self._new(i for i,o in enumerate(self) if f(o))
373
--> 374 def unique(self): return L(dict.fromkeys(self).keys())
375 def enumerate(self): return L(enumerate(self))
376 def val2idx(self): return {v:k for k,v in self.enumerate()}
TypeError: unhashable type: 'numpy.ndarray'
======================
Thank you in advance for your help
You should replace the two lambdas in the getters list by ItemGetter(0) and ItemGetter(1).
@sgugger can you please check whether this is the right way to decide when to use get_items, get_x and get_y?
I am having a hard time coming up with rules for when to use them. Thank you.
1. If you have paths, always use get_items. You cannot use get_x to pass in the x. get_y can be used to get the labels.
2. If we are working with a df:
2a. you can pass get_items like this:
def _planet_items(x): return (
    f'{planet_source}/train/'+x.image_name+'.jpg', x.tags.str.split())
so you can pass in both the x and y with just get_items.
2b. or you can use get_x and get_y to get the column names.
Basically:
- get_y is some way to get the labels from the items (paths), or can be a col name in the case of a df.
- you cannot pass get_x and get_y instead of get_items and expect it to become a tuple “(get_x, get_y)” for paths; it works for a df.
The most common situation I have seen is to use get_items to get your paths and then use get_y to extract the label from each one.
This is the first example I have seen with getters; I still need to figure this out (any tips on when to use it are much appreciated).
get_items is completely decoupled from get_x and get_y: it is there to return all your items from the source. You can pass get_x and get_y (or a list of getters) to explain how to get your x and y from the result of get_items, and they both default to noop (which is why, when get_items returns filenames, we don’t pass a get_x).
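As a rough mental model of that decoupling, here is a pure-Python sketch. toy_samples and parent_label are hypothetical helpers made up for illustration; this is not how DataBlock is implemented (there is no splitting, no transforms, no type dispatch), it only shows the two-step flow: collect items first, then derive x and y from each item.

```python
from pathlib import PurePosixPath

def noop(x):
    "Identity function, used as the default for every getter."
    return x

def toy_samples(source, get_items=noop, get_x=noop, get_y=noop):
    "Toy stand-in for the source -> items -> (x, y) flow."
    items = get_items(source)                     # step 1: collect all items from the source
    return [(get_x(o), get_y(o)) for o in items]  # step 2: derive x and y from each item

def parent_label(path):
    "Hypothetical labeller: use the parent folder name as the label."
    return PurePosixPath(path).parent.name

# Items are filenames: x is the item itself (get_x stays noop), y comes from the path.
files = ["train/cat/1.jpg", "train/dog/2.jpg"]
samples = toy_samples(files, get_y=parent_label)
print(samples)  # [('train/cat/1.jpg', 'cat'), ('train/dog/2.jpg', 'dog')]

# Items already hold both parts: one getter picks x, the other picks y.
rows = [("1.jpg", "cat"), ("2.jpg", "dog")]
pairs = toy_samples(rows, get_x=lambda r: r[0], get_y=lambda r: r[1])
print(pairs)    # [('1.jpg', 'cat'), ('2.jpg', 'dog')]
```

Note that inside a real DataBlock the getters are wrapped as transforms, which is why the earlier reply recommends ItemGetter(0) and ItemGetter(1) over plain lambdas when the items are tuples.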