Fastai v2 chat

Does anyone have an issue with the latest fastai2 version (0.0.11; I was using version 0.0.8 before) when using this import:

from fastai2.vision.all import *

When I switch to torch 1.4.0 this error disappears, but new issues occur with RNNs.

(I can file these issues on the fastai2 GitHub repo, but I’d prefer some quick feedback first in case I’m missing something.)

PyTorch and related package versions (I created a new Docker image from the official PyTorch tags on Docker Hub):

torch (1.4.0) - Tensors and Dynamic neural networks in Python with strong GPU acceleration
INSTALLED: 1.3.0
torchvision (0.5.0) - image and video datasets and models for torch deep learning
INSTALLED: 0.5.0 (latest)

Error:

  File "/root/workspace/fastai2/fastai2/vision/all.py", line 3, in <module>
    from .augment import *
  File "/root/workspace/fastai2/fastai2/vision/augment.py", line 91, in <module>
    from torchvision.transforms.functional import pad as tvpad
  File "/opt/conda/lib/python3.6/site-packages/torchvision/__init__.py", line 3, in <module>
    from torchvision import models
  File "/opt/conda/lib/python3.6/site-packages/torchvision/models/__init__.py", line 12, in <module>
    from . import detection
  File "/opt/conda/lib/python3.6/site-packages/torchvision/models/detection/__init__.py", line 1, in <module>
    from .faster_rcnn import *
  File "/opt/conda/lib/python3.6/site-packages/torchvision/models/detection/faster_rcnn.py", line 13, in <module>
    from .rpn import AnchorGenerator, RPNHead, RegionProposalNetwork
  File "/opt/conda/lib/python3.6/site-packages/torchvision/models/detection/rpn.py", line 11, in <module>
    from . import _utils as det_utils
  File "/opt/conda/lib/python3.6/site-packages/torchvision/models/detection/_utils.py", line 19, in <module>
    class BalancedPositiveNegativeSampler(object):
  File "/opt/conda/lib/python3.6/site-packages/torch/jit/__init__.py", line 1219, in script
    _compile_and_register_class(obj, _rcb, qualified_name)
  File "/opt/conda/lib/python3.6/site-packages/torch/jit/__init__.py", line 1076, in _compile_and_register_class
    _jit_script_class_compile(qualified_name, ast, rcb)
  File "/opt/conda/lib/python3.6/site-packages/torch/jit/_recursive.py", line 222, in try_compile_fn
    return torch.jit.script(fn, _rcb=rcb)
  File "/opt/conda/lib/python3.6/site-packages/torch/jit/__init__.py", line 1226, in script
    fn = torch._C._jit_script_compile(qualified_name, ast, _rcb, get_default_args(obj))
RuntimeError:
builtin cannot be used as a value:
at /opt/conda/lib/python3.6/site-packages/torchvision/models/detection/_utils.py:14:56
def zeros_like(tensor, dtype):
    # type: (Tensor, int) -> Tensor
    return torch.zeros_like(tensor, dtype=dtype, layout=tensor.layout,
                                                        ~~~~~~~~~~~~~ <--- HERE
                            device=tensor.device, pin_memory=tensor.is_pinned())
'zeros_like' is being compiled since it was called from '__torch__.torchvision.models.detection._utils.BalancedPositiveNegativeSampler.__call__'
at /opt/conda/lib/python3.6/site-packages/torchvision/models/detection/_utils.py:72:12

        # randomly select positive and negative examples
        perm1 = torch.randperm(positive.numel(), device=positive.device)[:num_pos]
        perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]

        pos_idx_per_image = positive[perm1]
        neg_idx_per_image = negative[perm2]

        # create binary mask from indices
        pos_idx_per_image_mask = zeros_like(
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~...  <--- HERE
            matched_idxs_per_image, dtype=torch.uint8
        )
        neg_idx_per_image_mask = zeros_like(
            matched_idxs_per_image, dtype=torch.uint8
        )

        pos_idx_per_image_mask[pos_idx_per_image] = torch.tensor(1, dtype=torch.uint8)
        neg_idx_per_image_mask[neg_idx_per_image] = torch.tensor(1, dtype=torch.uint8)

The pipelines in the mid-level API produce tuples, which is why this type needs to be special. We excluded other collections like lists or Ls so that people have an easy way to apply a Transform to a collection as a whole (just pass a list instead of a tuple). For instance, LabeledBBox subclasses L (it is a list of boxes and their labels), and it will always be considered as one object by the transforms.

Named tuples are excluded for now as a workaround for iterating over the rows of a dataframe. That iteration produces a special pandas class that subclasses namedtuple, but we need each of those rows to be considered as one item. Once we have fixed how that iteration is done at the low level, we can stop treating them as an exception (since they are a subclass of tuple).
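
To make the rule above concrete, here is a minimal pure-Python sketch of the dispatch behaviour being described. It is not fastcore's implementation: `apply_transform` and `Row` are hypothetical names, and checking for `_fields` is just one assumed way of spotting a named tuple.

```python
from collections import namedtuple


def apply_transform(fn, item):
    """Hypothetical stand-in for the rule described above (not fastcore code)."""
    # Plain tuples are treated as (x, y, ...) and transformed element by element.
    # Lists and named tuples (detected here via `_fields`) count as one item.
    is_plain_tuple = isinstance(item, tuple) and not hasattr(item, "_fields")
    if is_plain_tuple:
        return tuple(fn(o) for o in item)
    return fn(item)


def describe(o):
    "Toy transform: report the type name of whatever it receives."
    return type(o).__name__


Row = namedtuple("Row", ["image_name", "tags"])  # stand-in for a dataframe row

as_tuple = apply_transform(describe, (1, "a"))        # applied to each element
as_list = apply_transform(describe, [1, "a"])         # applied to the whole list
as_row = apply_transform(describe, Row("img", "sky"))  # applied to the whole row
```

Passing a list instead of a tuple is the whole switch: the same function sees either the individual elements or the collection itself.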

Cool, thanks for the insight!

…and the reason for this is that PyTorch datasets are expected to create tuples, and DataLoader batches are tuples.

I am experiencing a problem with DataBlock.datasets() when using a list of tuples (numpy.ndarray, str) as a source. With fastai2 v0.0.10, I was able to create dataloaders using both the Datasets and DataBlock classes. When I upgraded to v0.0.11, this issue appeared.

I updated all my transforms that were using as_item=True to ItemTransform. By doing so, I was able to create a dataset using the Datasets class. Datasets.vocab works fine and I can plot a timeseries from my dataset using show_at.

My items are basically a list of tuples. Each tuple holds a numpy.ndarray and a string. The error occurs in Categorize.setups() (in fastai2/data/transforms.py). It says TypeError: unhashable type: 'numpy.ndarray'. It looks like the numpy array is passed in instead of the strings. I suspect that it has to do with the DataBlock getters, but none of my attempts were successful.

The source code and the stack trace are below.

class TensorTS(TensorBase):
    "Transform a 2D array into a Tensor"
    def show(self, ctx=None, title=None, chs=None, leg=True, **kwargs):
....
class ToTensorTS(ItemTransform):
    "xy : tuple representing (2D numpy array, Label)"
    def encodes(self, xy): x,y=xy; return TensorTS(x)

def TSBlock():
    "`TransformBlock` for timeseries : Transform np array to TensorTS type"
    return TransformBlock(type_tfms=ToTensorTS())

class LabelTS(ItemTransform):
    "x : tuple representing (2D numpy array, Label)"
    def encodes(self, xy): x,y=xy; return y
getters = [lambda xy: xy[0], lambda xy: xy[1]]
tsdb = DataBlock(blocks=(TSBlock, CategoryBlock),
                   get_items=get_ts_items,
                   getters=getters,
                   splitter=RandomSplitter(seed=seed),
                   batch_tfms = batch_tfms)

tsdb.datasets(fnames, verbose=True)

=======================
Collecting items from [Path('C:/Users/fh/.fastai/data/NATOPS/NATOPS_TRAIN.arff'), Path('C:/Users/fh/.fastai/data/NATOPS/NATOPS_TEST.arff')]
Found 360 items
2 datasets of sizes 288,72
Setting up Pipeline: <lambda> -> ToTensorTSBlock
TYPE of xy : <class 'list'>
LENTGH of xy : 2
[array([-0.540579, -0.54101 , -0.540603, -0.540807, -0.540564, -0.540681,
       -0.540665, -0.541065, -0.540593, -0.540723, -0.54044 , -0.540232,
       -0.540191, -0.540209, -0.53981 , -0.539904, -0.540259, -0.540194,
       -0.54002 , -0.54033 , -0.546852, -0.551316, -0.553834, -0.558104,
       -0.560738, -0.558948, -0.559712, -0.565038, -0.560855, -0.55511 ,
       -0.553586, -0.55047 , -0.555965, -0.554922, -0.55378 , -0.557211,
       -0.556262, -0.558439, -0.560581, -0.560134, -0.55657 , -0.564141,
       -0.56727 , -0.568937, -0.572611, -0.570396, -0.569147, -0.564811,
       -0.56305 , -0.566314, -0.553712], dtype=float32), '2']
TYPE of x :  <class 'numpy.ndarray'>
TYPE of y :  <class 'str'>
x :  [-0.540579 -0.54101  -0.540603 -0.540807 -0.540564 -0.540681 -0.540665
 -0.541065 -0.540593 -0.540723 -0.54044  -0.540232 -0.540191 -0.540209
 -0.53981  -0.539904 -0.540259 -0.540194 -0.54002  -0.54033  -0.546852
 -0.551316 -0.553834 -0.558104 -0.560738 -0.558948 -0.559712 -0.565038
 -0.560855 -0.55511  -0.553586 -0.55047  -0.555965 -0.554922 -0.55378
 -0.557211 -0.556262 -0.558439 -0.560581 -0.560134 -0.55657  -0.564141
 -0.56727  -0.568937 -0.572611 -0.570396 -0.569147 -0.564811 -0.56305
 -0.566314 -0.553712]
y :  2
Setting up Pipeline: <lambda> -> Categorize
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in 
----> 1 tsdb.datasets(fnames, verbose=True)

c:\users\fh\dev\fastai2\fastai2\fastai2\data\block.py in datasets(self, source, verbose)
     93         splits = (self.splitter or RandomSplitter())(items)
     94         pv(f"{len(splits)} datasets of sizes {','.join([str(len(s)) for s in splits])}", verbose)
---> 95         return Datasets(items, tfms=self._combine_type_tfms(), splits=splits, dl_type=self.dl_type, n_inp=self.n_inp, verbose=verbose)
     96 
     97     def dataloaders(self, source, path='.', verbose=False, **kwargs):

c:\users\fh\dev\fastai2\fastai2\fastai2\data\core.py in __init__(self, items, tfms, tls, n_inp, dl_type, **kwargs)
    259     def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):
    260         super().__init__(dl_type=dl_type)
--> 261         self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
    262         self.n_inp = (1 if len(self.tls)==1 else len(self.tls)-1) if n_inp is None else n_inp
    263 

c:\users\fh\dev\fastai2\fastai2\fastai2\data\core.py in <listcomp>(.0)
    259     def __init__(self, items=None, tfms=None, tls=None, n_inp=None, dl_type=None, **kwargs):
    260         super().__init__(dl_type=dl_type)
--> 261         self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
    262         self.n_inp = (1 if len(self.tls)==1 else len(self.tls)-1) if n_inp is None else n_inp
    263 

c:\users\fh\dev\fastai2\fastcore\fastcore\foundation.py in __call__(cls, x, *args, **kwargs)
     39             return x
     40 
---> 41         res = super().__call__(*((x,) + args), **kwargs)
     42         res._newchk = 0
     43         return res

c:\users\fh\dev\fastai2\fastai2\fastai2\data\core.py in __init__(self, items, tfms, use_list, do_setup, split_idx, train_setup, splits, types, verbose)
    200         if do_setup:
    201             pv(f"Setting up {self.tfms}", verbose)
--> 202             self.setup(train_setup=train_setup)
    203 
    204     def _new(self, items, **kwargs): return super()._new(items, tfms=self.tfms, do_setup=False, types=self.types, **kwargs)

c:\users\fh\dev\fastai2\fastai2\fastai2\data\core.py in setup(self, train_setup)
    213 
    214     def setup(self, train_setup=True):
--> 215         self.tfms.setup(self, train_setup)
    216         if len(self) != 0:
    217             x = super().__getitem__(0) if self.splits is None else super().__getitem__(self.splits[0])[0]

c:\users\fh\dev\fastai2\fastcore\fastcore\transform.py in setup(self, items, train_setup)
    177         tfms = self.fs[:]
    178         self.fs.clear()
--> 179         for t in tfms: self.add(t,items, train_setup)
    180 
    181     def add(self,t, items=None, train_setup=False):

c:\users\fh\dev\fastai2\fastcore\fastcore\transform.py in add(self, t, items, train_setup)
    180 
    181     def add(self,t, items=None, train_setup=False):
--> 182         t.setup(items, train_setup)
    183         self.fs.append(t)
    184 

c:\users\fh\dev\fastai2\fastcore\fastcore\transform.py in setup(self, items, train_setup)
     76     def setup(self, items=None, train_setup=False):
     77         train_setup = train_setup if self.train_setup is None else self.train_setup
---> 78         return self.setups(getattr(items, 'train', items) if train_setup else items)
     79 
     80     def _call(self, fn, x, split_idx=None, **kwargs):

c:\users\fh\dev\fastai2\fastcore\fastcore\dispatch.py in __call__(self, *args, **kwargs)
     96         if not f: return args[0]
     97         if self.inst is not None: f = MethodType(f, self.inst)
---> 98         return f(*args, **kwargs)
     99 
    100     def __get__(self, inst, owner):

c:\users\fh\dev\fastai2\fastai2\fastai2\data\transforms.py in setups(self, dsets)
    184 
    185     def setups(self, dsets):
--> 186         if self.vocab is None and dsets is not None: self.vocab = CategoryMap(dsets, add_na=self.add_na)
    187         self.c = len(self.vocab)
    188 

c:\users\fh\dev\fastai2\fastai2\fastai2\data\transforms.py in __init__(self, col, sort, add_na)
    169             if not hasattr(col,'unique'): col = L(col, use_list=True)
    170             # `o==o` is the generalized definition of non-NaN used by Pandas
--> 171             items = L(o for o in col.unique() if o==o)
    172             if sort: items = items.sorted()
    173         self.items = '#na#' + items if add_na else items

c:\users\fh\dev\fastai2\fastcore\fastcore\foundation.py in unique(self)
    372         return self._new(i for i,o in enumerate(self) if f(o))
    373 
--> 374     def unique(self): return L(dict.fromkeys(self).keys())
    375     def enumerate(self): return L(enumerate(self))
    376     def val2idx(self): return {v:k for k,v in self.enumerate()}

TypeError: unhashable type: 'numpy.ndarray'
======================

Thank you in advance for your help
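
The last frame of the stack trace shows L.unique calling dict.fromkeys, which only accepts hashable items. The sketch below reproduces that failure mode with the standard library alone. The `unique` helper and the sample values are illustrative, and a plain list stands in for numpy.ndarray (both are unhashable) so that the snippet runs without numpy.

```python
def unique(items):
    "Order-preserving de-duplication, using the dict.fromkeys idiom from the traceback."
    return list(dict.fromkeys(items))


# String labels are hashable, so building a vocab from them works.
labels = ["2", "1", "2", "3", "1"]
unique_labels = unique(labels)

# If the x values reach this step instead of the labels, hashing fails.
samples = [[-0.54, -0.55], [-0.56, -0.57]]  # lists standing in for arrays
try:
    unique(samples)
    error_message = None
except TypeError as err:
    error_message = str(err)  # mentions "unhashable type"
```

So the TypeError is a symptom that the arrays, not the string labels, were handed to Categorize during setup.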

You should replace the two lambdas in the getters list by ItemGetter(0) and ItemGetter(1).

Thank you very much @sgugger. That was super helpful.

@sgugger, could you please check whether this is the right way to decide when to use get_items, get_x, and get_y?
I am having a hard time coming up with rules for when to use them. Thank you.

  1. If you have paths, always use get_items. You cannot use get_x to pass in the x. get_y can be used to get the labels.

  2. If we are working with a df:
    2a. You can pass a get_items function like this:

def _planet_items(x): return (
    f'{planet_source}/train/'+x.image_name+'.jpg', x.tags.str.split())

so you can pass in both the x and y with just get_items.
    2b. Or you can use get_x and get_y to get the column names.

Basically:

  • get_y is some way to get the labels from the items (paths), or it can be a column name in the case of a df.
  • You cannot pass get_x and get_y instead of get_items and expect the result to become a tuple “(get_x, get_y)” for paths; that works for a df.

The most common situation I have seen is to use get_items to get your paths and then use get_y to extract the label from each one.
This is the first example I have seen that uses getters, so I still need to figure them out (any tips on when to use them are much appreciated).

get_items is completely decoupled from get_x and get_y: it is there to return all your items from the source. You can pass get_x and get_y (or a list of getters) to explain how to get your x and y from the result of get_items. They both default to noop (which is why we don’t pass a get_x when get_items returns filenames).
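
The flow described above can be sketched in plain Python. This is only an illustration of the decoupling, not DataBlock's actual code: `build_samples` is a hypothetical helper, and the file names are made up.

```python
def noop(x):
    "Default getter: hand the item back unchanged."
    return x


def build_samples(source, get_items, get_x=noop, get_y=noop):
    "Hypothetical helper: collect the items once, then derive x and y from each item."
    items = get_items(source)
    return [(get_x(item), get_y(item)) for item in items]


# Made-up file names; the label is taken from the parent folder.
source = ["cats/001.jpg", "dogs/002.jpg"]

samples = build_samples(
    source,
    get_items=lambda src: list(src),         # returns every item from the source
    get_y=lambda path: path.split("/")[0],   # label derived from each item
    # get_x is left as noop: the item (a file name) already is the x
)
```

Because get_x defaults to noop, only get_y has to be supplied when the items are already the inputs.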

Any idea about this?

Hi,
I reinstalled fastai2 yesterday in a WSL environment.
I’m trying to go over the library notebooks, but when I open “08_vision.data” I get this error:
“Could not find a kernel matching env37”.
I opened other notebooks (such as 07_vision.core) and had no issue.

I used the editable install: pip install -e ".[dev]"

My current Python version is 3.7 (according to conda list).
Running “which jupyter notebook” shows it runs the correct version from the fastai2 environment.

Might it make sense to hide from fastai users the fact that Pillow v7 removed PILLOW_VERSION?

I put some ideas together here: https://github.com/pete88b/data-science/blob/master/fix-PILLOW_VERSION.ipynb
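
One common shape for such a workaround is to re-create the removed attribute from `__version__` before any code that still imports it runs. The sketch below shows that pattern on a stand-in module so it runs without Pillow installed. `add_legacy_version_alias` and `fake_pil` are hypothetical names, and the linked notebook may take a different approach.

```python
import types


def add_legacy_version_alias(module, legacy_name="PILLOW_VERSION"):
    "Expose `module.__version__` under `legacy_name` when the old attribute is missing."
    if not hasattr(module, legacy_name) and hasattr(module, "__version__"):
        setattr(module, legacy_name, module.__version__)
    return module


# Stand-in for the real `PIL` package (an assumption for this demo only).
fake_pil = types.ModuleType("fake_pil")
fake_pil.__version__ = "7.0.0"

add_legacy_version_alias(fake_pil)
alias_value = fake_pil.PILLOW_VERSION
```

With Pillow itself, the equivalent would be to `import PIL` and pass it to the helper before importing the package that still expects `PILLOW_VERSION`.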

I am using untar_data() to download zipped timeseries files. In these archives the files sit at the root (there is no folder inside, unlike planet_tiny.tgz for example). When I use untar_data(), the uncompressed timeseries files are stored directly in the .fastai/data folder, which pollutes it. I tried the dest argument, but that ended up storing the uncompressed files in my current folder. Is there an option to uncompress files into a separate folder under .fastai/data?

Just for testing, and since I am using the editable install of fastai v2, I temporarily replaced dest.parent with dest (see the commented-out line at the end of the code snippet below). That fixed the problem: the files are stored in their own folder, named after the zip file. Would it be possible to add an argument to untar_data() (for example use_parent_dest, defaulting to True to stay compatible with current behaviour, and set to False in my case)?

def untar_data(url, fname=None, dest=None, c_key='data', force_download=False, extract_func=file_extract):
    "Download `url` to `fname` if `dest` doesn't exist, and un-tgz to folder `dest`."
    default_dest = URLs.path(url, c_key=c_key).with_suffix('')
    dest = default_dest if dest is None else Path(dest)/default_dest.name
    fname = Path(fname or URLs.path(url))
    if fname.exists() and _get_check(url) and _check_file(fname) != _get_check(url):
        print("A new version of this dataset is available, downloading...")
        force_download = True
    if force_download:
        if fname.exists(): os.remove(fname)
        if dest.exists(): shutil.rmtree(dest)
    if not dest.exists():
        fname = download_data(url, fname=fname, c_key=c_key)
        if _get_check(url) and _check_file(fname) != _get_check(url):
            print(f"File downloaded is broken. Remove {fname} and try again.")
        # extract_func(fname, dest.parent)
        extract_func(fname, dest)
    return dest

If you pass an absolute path as dest it should work.

Thank you, Jeremy, for your fast reply. Following your suggestion, I tried these two options:

Option 1 : set dest to the fully qualified path. It stores the uncompressed files in the right folder but it returns the wrong path:

dsname =  'NATOPS' #'NATOPS', 'LSST', 'Wine', 'Epilepsy', 'HandMovementDirection'
path_data = Config().data
dest = path_data/dsname
dest

Path('C:/Users/fh/.fastai/data/NATOPS')
url = 'http://www.timeseriesclassification.com/Downloads/NATOPS.zip'
path = untar_data(url, dest=dest)
path

Path('C:/Users/fh/.fastai/data/NATOPS/NATOPS') <--- doesn't exist

Option 2 : set dest to the .fastai/data path. It doesn’t store the uncompressed files in a separate folder but it returns the intended path:

For both options, the returned paths don’t exist (see <--- arrows in the code snippets)

dsname =  'NATOPS' #'NATOPS', 'LSST', 'Wine', 'Epilepsy', 'HandMovementDirection'
path_data = Config().data
dest = path_data
dest

Path('C:/Users/fh/.fastai/data')
url = 'http://www.timeseriesclassification.com/Downloads/NATOPS.zip'
path = untar_data(url, dest=dest)
path

Path('C:/Users/fh/.fastai/data/NATOPS') <--- doesn't exist either

I think this comes down to the fact that the timeseries zip files don’t have a top-level folder inside, while untar_data() assumes there is one.

Thank you for your help anyway. If it isn’t possible to modify untar_data(), it’s fine; I will continue using a method that I called unzip_data() that looks like untar_data()

Yeah unless we actually look inside the file we can’t really know where it will extract to, so we have to make an assumption about what to return.
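
For reference, "looking inside the file" can be done cheaply by listing member names without extracting anything. The sketch below uses the standard `zipfile` module on in-memory archives. `top_level_entries`, `has_single_root_folder`, and `make_zip` are hypothetical helpers, and the member names under `planet_tiny/` are made up for the demo.

```python
import io
import zipfile
from pathlib import PurePosixPath


def top_level_entries(zf):
    "Sorted first path components of every member in an open ZipFile."
    return sorted({PurePosixPath(name).parts[0] for name in zf.namelist() if name})


def has_single_root_folder(zf):
    "True when every member lives under one shared top-level folder."
    names = zf.namelist()
    if len(top_level_entries(zf)) != 1:
        return False
    # A lone root-level file also yields one top-level entry, so require every
    # member to be the folder entry itself or to sit beneath it.
    return all(n.endswith("/") or len(PurePosixPath(n).parts) > 1 for n in names)


def make_zip(member_names):
    "Build a small in-memory zip so the demo needs no files on disk."
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name in member_names:
            zf.writestr(name, "placeholder")
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


flat = make_zip(["NATOPS_TRAIN.arff", "NATOPS_TEST.arff"])
nested = make_zip(["planet_tiny/labels.csv", "planet_tiny/train/img.jpg"])

flat_has_root = has_single_root_folder(flat)      # files sit at the archive root
nested_has_root = has_single_root_folder(nested)  # everything under one folder
```

A check like this could tell the two layouts apart, at the cost of opening the archive before deciding which path to return.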

Hi there.

Been trying tirelessly to get fastai working under Windows with GPU. It’s installed, working perfectly fine on CPU, but absolutely refuses to use GPU seemingly whatever I try. I have tried using what I assume are “up to date” conda versions of things and also going back to pytorch 1.0.0 with cuda 9.0 and cudnn 7005. Also tried cuda 10.1 and 10.2 (and pytorch 1.4.0 I think it was…?). GPU driver is 441.66 - this is the constant I haven’t changed essentially because I wouldn’t know what version to try going backwards.

I don’t get any obvious errors that are exposed via the Anaconda console or in the notebook itself. I just get CPU activity with zilch happening on the GPU. “torch.cuda.is_available()” returns true, my GPU model is returned when I do “torch.cuda.get_device_name(0)”. I’m kind of at a loss at the moment.

Is this a known issue or a me related weirdness?

I had this running on Windows a while back @Russbo, but have been running more recently in Linux. I was planning on trying Windows again before the course starts (and do a write up of anything special needed) - so I’ll let you know how it goes. I seem to remember the GPU use was very spiky and overall slower than Linux, but for what I’d expect we’d be doing on the course it should be ok.

6 posts were merged into an existing topic: Fastai2 on Windows

I had the same problem while developing the audio module, where the files in some datasets are directly at the root of the tar file. The solution that I found was to modify the extract function, so that it extracts the contents to a folder with the same name as the compressed file.

The modified function is:

def tar_extract_at_filename(fname, dest):
    "Extract `fname` to `dest`/`fname`.name folder using `tarfile`"
    dest = Path(dest)/Path(fname).with_suffix('').name
    tarfile.open(fname, 'r:gz').extractall(dest)

And I use it like:

url = "https://public-datasets.fra1.digitaloceanspaces.com/250-speakers.tar"
path = untar_data(url, extract_func=tar_extract_at_filename)
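
The same idea carries over to the zip archives discussed earlier in the thread. Below is a sketch of a zip analogue using the standard `zipfile` module. `zip_extract_at_filename` is a hypothetical name, returning the folder is a choice made for this demo, and the demo runs in a temporary directory rather than going through untar_data.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_extract_at_filename(fname, dest):
    "Extract `fname` into `dest`/<archive name without suffix> using `zipfile`."
    dest = Path(dest) / Path(fname).stem
    with zipfile.ZipFile(fname) as zf:
        zf.extractall(dest)  # creates `dest` if it does not exist yet
    return dest


with tempfile.TemporaryDirectory() as tmp:
    # Build a small archive whose only file sits at the root, as in the NATOPS case.
    archive = Path(tmp) / "NATOPS.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("NATOPS_TRAIN.arff", "placeholder")

    out_dir = zip_extract_at_filename(archive, Path(tmp) / "data")
    out_name = out_dir.name
    extracted = sorted(p.name for p in out_dir.iterdir())
```

The archive contents end up in their own folder named after the zip file instead of being spilled into the shared data folder.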