Developer chat

Great ! Thanks Andrea :slight_smile:

1 Like

Does anybody here have a good understanding of Python internals? Currently we have an issue with the IPython autoreloader: at the very least, the Learner object doesn’t get updated to the address space of the newly reloaded modules.

It all started with getting:

PicklingError: Can't pickle <class 'fastai.basic_train.Recorder'>: 
it's not the same object as fastai.basic_train.Recorder

after I called learn.export() right after editing fastai/basic_train.py and having jupyter autoreload it via the usual:

%reload_ext autoreload
%autoreload 2

I started digging into why pickle was failing. I reduced its main verification function, which in pickle is much longer and handles exceptions and other cases, to a simplified version that is relevant just to our situation:

import sys

def pickle_get_class(obj):
    # Look the class up by name in its module, the way pickle does before saving it
    name = obj.__class__.__qualname__
    module_name = obj.__class__.__module__
    obj2 = sys.modules[module_name]
    for subpath in name.split('.'): obj2 = getattr(obj2, subpath)
    return obj2

#obj = learn.recorder
obj = learn

class1 = obj.__class__
class2 = pickle_get_class(obj)

print(f"class 1: {hex(id(class1))}")
print(f"class 2: {hex(id(class2))}")
print(class1 is class2)

When the notebook is run the first time after the kernel was restarted both print the same address, i.e. pointing to the same version of the class.

class 1: 0x5592541235a8
class 2: 0x5592541235a8
True

Then I’d modify, say, fastai/basic_train.py and rerun the cell and now the addresses are not the same, and class 1 hasn’t changed.

class 1: 0x5592541235a8
class 2: 0x5592541245b8
False

So the IPython reload magic failed to update the existing objects, a failure mode it describes in its caveats.

Actually, you can ignore pickle_get_class. If reload were to work correctly hex(id( learn.__class__ )) should be different after each autoreload (if learner class was reloaded - directly or as a dependency).
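
The mismatch can be reproduced outside of fastai and Jupyter. The following is a minimal sketch, assuming a throwaway module (the names reload_demo_mod and demo_stale_class are made up for illustration) and using plain importlib.reload rather than the autoreload extension. An instance created before the reload keeps pointing at the old class object, and pickle then refuses to serialize it:

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path


def demo_stale_class():
    """Return (same_before, same_after, error_name) around a module reload."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module written to a temp dir, standing in for basic_train.py
        Path(tmp, "reload_demo_mod.py").write_text("class Recorder:\n    pass\n")
        sys.path.insert(0, tmp)
        try:
            mod = importlib.import_module("reload_demo_mod")
            obj = mod.Recorder()
            same_before = obj.__class__ is mod.Recorder

            # Re-executing the module body creates a brand new class object
            importlib.reload(mod)
            same_after = obj.__class__ is mod.Recorder

            try:
                pickle.dumps(obj)
                error_name = None
            except pickle.PicklingError as e:
                error_name = type(e).__name__
        finally:
            # Clean up so the demo leaves no trace in the interpreter
            sys.path.remove(tmp)
            sys.modules.pop("reload_demo_mod", None)
    return same_before, same_after, error_name


if __name__ == "__main__":
    print(demo_stale_class())
```

This only shows the plain reload behaviour; whether autoreload should have patched the existing instance in this case is the open question.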

You can see from my code that I first started with learn.recorder as reported by pickle, but then I noticed learn had the same issue.

This situation sucks, since it’s no longer possible to use autoreload with at least learner objects if their modules are modified, and the failure is silent, so you could still be working with the old version and wasting hours debugging the wrong thing. Sure, a kernel restart will remedy it, but it makes debugging much slower in situations that require pre-running extra steps.

When a new fastai function is being developed, it’s easy to make a fast-running notebook, so autoreload is not a necessity in such cases. But when a user is debugging something that fails in fastai code deep in their notebook, the restart approach doesn’t cut it.

So we need to understand, and hopefully fix, why the IPython reload magic fails to reload one or more of the fastai class objects.

The reload magic functionality (the actual update of objects) in ipython is here: https://github.com/ipython/ipython/blob/master/IPython/extensions/autoreload.py#L253 From skimming through the code I don’t see anything that would suggest that it silently ignores any parts of the reload.

I think I forgot to mention I developed a little tool to compare conda envs. Get it at https://github.com/stas00/conda-tools.

I needed it to compare one working env vs. one failing env to see which packages were different. Here is an example of its output. You just pass the names of the 2 environments you want to compare:

$ conda-env-compare.pl work27 work36
Comparing installed packages in environments: work27 and work36


********************************************** Match: Differ **********************************************
                       environment       work27       work36                  work27                  work36
                      package name      version      version                  source                  source
------------------------------------------------------------------------------------------------------------
                         ipykernel       4.10.0        5.1.0         py27_0/anaconda py36h39e3cac_0/anaconda
                            python       2.7.15        3.6.8     h9bab390_6/anaconda     h0371630_0/anaconda


********************************************** Match: Missing **********************************************
                       environment       work27       work36                  work27                  work36
                      package name      version      version                  source                  source
------------------------------------------------------------------------------------------------------------
                         backports          1.0                      py27_1/anaconda                        
                     backports-abc          0.5                      py27_0/anaconda                        
backports.shutil-get-terminal-size        1.0.0                      py27_2/anaconda                        
                      configparser        3.5.0                      py27_0/anaconda                        
                            enum34        1.1.6                      py27_1/anaconda                        
                       functools32      3.2.3.2                      py27_1/anaconda                        
                           futures        3.2.0                      py27_0/anaconda                        
                 get-terminal-size        1.0.0                  haa9412d_0/anaconda                        
                         ipaddress       1.0.22                      py27_0/anaconda                        
                          pathlib2        2.3.3                      py27_0/anaconda                        
                           scandir        1.9.0              py27h14c3975_0/anaconda                        
                    singledispatch      3.4.0.3                      py27_0/anaconda                        
                                xz                     5.2.4                             h14c3975_4/anaconda


*********************************************** Match: Same ***********************************************
                       environment       work27       work36                  work27                  work36
                      package name      version      version                  source                  source
------------------------------------------------------------------------------------------------------------
                            bleach        3.1.0        3.1.0         py27_0/anaconda         py36_0/anaconda
                   ca-certificates     2018.3.7     2018.3.7              0/anaconda              0/anaconda
                           certifi   2018.11.29   2018.11.29         py27_0/anaconda         py36_0/anaconda
                         decorator        4.3.0        4.3.0         py27_0/anaconda         py36_0/anaconda
                       entrypoints        0.2.3        0.2.3         py27_2/anaconda         py36_2/anaconda
                               gmp        6.1.2        6.1.2     h6c8ec71_1/anaconda     h6c8ec71_1/anaconda
                           ipython        5.1.0        5.1.0         py27_0/anaconda         py36_0/anaconda
                  ipython-genutils        0.2.0        0.2.0         py27_0/anaconda         py36_0/anaconda
                            jinja2         2.10         2.10         py27_0/anaconda         py36_0/anaconda
                        jsonschema        2.6.0        2.6.0         py27_0/anaconda         py36_0/anaconda
                    jupyter-client        5.2.4        5.2.4         py27_0/anaconda         py36_0/anaconda
                      jupyter-core        4.4.0        4.4.0         py27_0/anaconda         py36_0/anaconda
                           libedit 3.1.20170329 3.1.20170329     h6b74fdf_2/anaconda     h6b74fdf_2/anaconda
                            libffi        3.2.1        3.2.1     hd88cf55_4/anaconda     hd88cf55_4/anaconda
                         libgcc-ng        8.2.0        8.2.0     hdf63c60_1/anaconda     hdf63c60_1/anaconda
                         libsodium       1.0.16       1.0.16     h1bed415_0/anaconda     h1bed415_0/anaconda
                      libstdcxx-ng        8.2.0        8.2.0     hdf63c60_1/anaconda     hdf63c60_1/anaconda
                        markupsafe        1.1.0        1.1.0 py27h7b6447c_0/anaconda py36h7b6447c_0/anaconda
                           mistune        0.8.4        0.8.4 py27h7b6447c_0/anaconda py36h7b6447c_0/anaconda
                         nbconvert        5.3.1        5.3.1         py27_0/anaconda         py36_0/anaconda
                          nbformat        4.4.0        4.4.0         py27_0/anaconda         py36_0/anaconda
                           ncurses          6.1          6.1     he6710b0_1/anaconda     he6710b0_1/anaconda
                          notebook        5.7.4        5.7.4         py27_0/anaconda         py36_0/anaconda
                           openssl       1.1.1a       1.1.1a     h7b6447c_0/anaconda     h7b6447c_0/anaconda
                            pandoc      2.2.3.2      2.2.3.2              0/anaconda              0/anaconda
                     pandocfilters        1.4.2        1.4.2         py27_1/anaconda         py36_1/anaconda
                           pexpect        4.6.0        4.6.0         py27_0/anaconda         py36_0/anaconda
                       pickleshare        0.7.5        0.7.5         py27_0/anaconda         py36_0/anaconda
                               pip         18.1         18.1         py27_0/anaconda         py36_0/anaconda
                 prometheus-client        0.5.0        0.5.0         py27_0/anaconda         py36_0/anaconda
                    prompt-toolkit       1.0.15       1.0.15         py27_0/anaconda         py36_0/anaconda
                        ptyprocess        0.6.0        0.6.0         py27_0/anaconda         py36_0/anaconda
                          pygments        2.3.1        2.3.1         py27_0/anaconda         py36_0/anaconda
                   python-dateutil        2.7.5        2.7.5         py27_0/anaconda         py36_0/anaconda
                             pyzmq       17.1.2       17.1.2 py27h14c3975_0/anaconda py36h14c3975_0/anaconda
                          readline          7.0          7.0     h7b6447c_5/anaconda     h7b6447c_5/anaconda
                        send2trash        1.5.0        1.5.0         py27_0/anaconda         py36_0/anaconda
                        setuptools       40.6.3       40.6.3         py27_0/anaconda         py36_0/anaconda
                     simplegeneric        0.8.1        0.8.1         py27_2/anaconda         py36_2/anaconda
                               six       1.12.0       1.12.0         py27_0/anaconda         py36_0/anaconda
                            sqlite       3.26.0       3.26.0     h7b6447c_0/anaconda     h7b6447c_0/anaconda
                         terminado        0.8.1        0.8.1         py27_1/anaconda         py36_1/anaconda
                          testpath        0.4.2        0.4.2         py27_0/anaconda         py36_0/anaconda
                                tk        8.6.8        8.6.8     hbc83047_0/anaconda     hbc83047_0/anaconda
                           tornado        5.1.1        5.1.1 py27h7b6447c_0/anaconda py36h7b6447c_0/anaconda
                         traitlets        4.3.2        4.3.2         py27_0/anaconda         py36_0/anaconda
                           wcwidth        0.1.7        0.1.7         py27_0/anaconda         py36_0/anaconda
                      webencodings        0.5.1        0.5.1         py27_1/anaconda         py36_1/anaconda
                             wheel       0.32.3       0.32.3         py27_0/anaconda         py36_0/anaconda
                            zeromq        4.2.5        4.2.5     hf484d3e_1/anaconda     hf484d3e_1/anaconda
                              zlib       1.2.11       1.2.11     h7b6447c_3/anaconda     h7b6447c_3/anaconda
3 Likes

You’re such a fantastic toolsmith - fearless and thorough - we are lucky you are here.

1 Like

Since there was a lot of confusion, in DataBunch I’ve renamed the tfms argument to dl_tfms (people often used it for ds_tfms in computer vision).

1 Like

I have built a small version of Beam Search that seems promising. In the process, I looked carefully at the LanguageLearner.predict() method. I am not sure if this is a bug or I am misunderstanding how it works.

When you call predict(), you begin with an initial self.model.reset() that sets the hidden states to zero. Then you pass through the sample text and continue to append a new token each time to your list of generated tokens. However, your text is now the full set of tokens you have generated from the start, but you have not reset the state, so you are predicting from the end of the last prediction state.

What am I missing here?
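
To make the concern concrete, here is a toy sketch of the behaviour described above. It is not the fastai implementation: ToyStatefulModel and both generate functions are made-up stand-ins, and the "hidden state" is simply the list of tokens the model has consumed since the last reset. Feeding the full text at every step without a reset makes the state absorb the prompt more than once, while feeding only the newest token does not:

```python
class ToyStatefulModel:
    """Stand-in for a stateful RNN: hidden is every token consumed since reset()."""

    def __init__(self):
        self.hidden = []

    def reset(self):
        self.hidden = []

    def forward(self, tokens):
        # A real RNN would update tensors; here we just record what was fed in
        self.hidden.extend(tokens)


def generate_full_text_each_step(model, prompt, n_new):
    """Reset once, then feed the whole growing text at every step."""
    model.reset()
    text = list(prompt)
    for _ in range(n_new):
        model.forward(text)   # state already contains the earlier tokens
        text.append("tok")    # placeholder for the sampled token
    return list(model.hidden)


def generate_last_token_only(model, prompt, n_new):
    """Reset once, feed the prompt, then feed only each new token."""
    model.reset()
    text = list(prompt)
    model.forward(text)
    for _ in range(n_new):
        text.append("tok")
        model.forward(text[-1:])
    return list(model.hidden)


if __name__ == "__main__":
    m = ToyStatefulModel()
    print(generate_full_text_each_step(m, ["a", "b"], 2))
    print(generate_last_token_only(m, ["a", "b"], 2))
```

If predict() really behaves like the first function, either a reset before each forward pass or feeding only the last token would avoid the duplication. Which of the two matches the intended design is for the library authors to say.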

I think we should add a cut parameter to unet_learner so that custom models can be used.
The current signature of the function is unet_learner(data:DataBunch, arch:Callable, pretrained:bool=True, blur_final:bool=True, norm_type:Optional[NormType]='NormType', split_on:Union[Callable, Collection[ModuleList], NoneType]=None, blur:bool=False, self_attention:bool=False, y_range:OptRange=None, last_cross:bool=True, bottle:bool=False, **kwargs:Any)

I propose unet_learner(data:DataBunch, arch:Callable, pretrained:bool=True, blur_final:bool=True, norm_type:Optional[NormType]=NormType, split_on:Optional[SplitFuncOrIdxList]=None, blur:bool=False, self_attention:bool=False, y_range:Optional[Tuple[float,float]]=None, last_cross:bool=False, bottle:bool=False, cut:Union[int,Callable]=None, **kwargs:Any)->None, with the parameter passed on to create_body as in create_cnn.
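
As a rough illustration of what a cut:Union[int,Callable] parameter could mean, here is a sketch on a plain list of layer names instead of a real model. The helper apply_cut is hypothetical and is not the fastai create_body code. In particular, treating cut=None as "keep everything" is an assumption made only for this example:

```python
from typing import Callable, List, Optional, Union


def apply_cut(layers: List[str],
              cut: Optional[Union[int, Callable[[List[str]], List[str]]]] = None) -> List[str]:
    """Return the body of a model described as a list of layer names."""
    if cut is None:
        # Assumption for this sketch only: no cut means the layers are kept as they are
        return list(layers)
    if isinstance(cut, int):
        # An int keeps the layers before that index (negative values count from the end)
        return list(layers[:cut])
    if callable(cut):
        # A callable receives the layers and decides what the body is
        return cut(layers)
    raise TypeError("cut must be None, an int, or a callable")


if __name__ == "__main__":
    resnet_like = ["conv", "bn", "pool", "fc"]
    print(apply_cut(resnet_like, -2))
    print(apply_cut(resnet_like, lambda ls: [l for l in ls if l != "fc"]))
```

A callable gives users of custom architectures full control over where the encoder ends, which an index alone cannot always express.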

I found this setup to debug PyTorch memory leaks on the Pyro forums: https://forum.pyro.ai/t/a-clever-trick-to-debug-tensor-memory/556
Maybe this is interesting for the library development. :slight_smile:

1 Like

That’s a nice version, @MicPie! It’s incomplete, though, and should be merged with this version: https://discuss.pytorch.org/t/how-to-debug-causes-of-gpu-memory-leaks/6741/24
We should put it somewhere in the docs for sure.

If you find other goodies please share!

1 Like

Hi @sgugger, cutout for data augmentation was implemented in the previous fastai (before v1) but not in v1. Do you plan to add it to vision.transform, or is it not a relevant technique and will not be implemented? Thanks.

We just forgot. Will implement it when I have a bit of time next week, send me a PM if I forget!

1 Like

Thanks Sylvain !

@sgugger @pierreguillou I was procrastinating (instead of training LMs) and implemented it: https://github.com/fastai/fastai/pull/1489

Figured Sylvain’s busy with text stuff and I’ll help out a bit.

@pierreguillou, you can test if this works for you by using my fork, or wait till it’s merged (or till Sylvain implements it himself if my code sucks). Ping me if you decide to try it out now or if you have any questions about it!

2 Likes

Thanks @xnutsive and @sgugger :slight_smile: (Just the letter t is missing at the end of the following phrase in fastai/docs_src/vision.transform.ipynb: “The normalization technique described in this paper: Improved Regularization of Convolutional Neural Networks with Cutou”).

Hello, this message concerns 2 issues with show_batch().

Note: I’m linking it to my previous messages about plot_top_losses(), as both functions share one issue with the DatasetType they use.

1) DatasetType: only train ?

The function show_batch() takes ds_type (i.e., the DatasetType) as an argument, with DatasetType.Train as the default, right? But in its code (see below), self.train_ds is hard-coded. Does that mean we can’t use show_batch() to display a validation batch (wouldn’t we need self.valid_ds in that case)?

def show_batch(self, rows:int=5, ds_type:DatasetType=DatasetType.Train, **kwargs)->None:
    "Show a batch of data in `ds_type` on a few `rows`."
    x,y = self.one_batch(ds_type, True, True)
    if self.train_ds.x._square_show: rows = rows ** 2
    xs = [self.train_ds.x.reconstruct(grab_idx(x, i)) for i in range(rows)]
    #TODO: get rid of has_arg if possible
    if has_arg(self.train_ds.y.reconstruct, 'x'):
        ys = [self.train_ds.y.reconstruct(grab_idx(y, i), x=x) for i,x in enumerate(xs)]
    else : ys = [self.train_ds.y.reconstruct(grab_idx(y, i)) for i in range(rows)]
    self.train_ds.x.show_xys(xs, ys, **kwargs)

2) When batch size is one (bs=1), show_batch() does not work.

When the batch size is one (bs=1), data.show_batch() (data is an ImageDataBunch) gives the following error, which is expected since the function tries by default to display 5x5=25 images from a train batch:
IndexError: index 1 is out of bounds for dimension 0 with size 1

However, data.show_batch(rows=1), which should display 1 image, also gives an error:
TypeError: 'AxesSubplot' object is not iterable

And, even if the batch size is > 1, data.show_batch(rows=1) gives the same error.

So the minimum needed to make show_batch() work is bs=4 with data.show_batch(rows=2).
How can this be solved so that show_batch() works even for bs=1?

Thanks.

Hm, I can look into that tomorrow. I saw that show_batch doesn’t work for small batches (if you do show_batch(1) it’ll work): it just tries to show rows*cols elements, and a batch of 1 doesn’t have enough elements, since the default rows is 5.

That’s an easy fix, and I could look into valid_ds issue too.
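
One possible shape for such a fix, sketched here under assumptions rather than taken from the eventual PR: clamp the requested rows so that the number of items to display never exceeds the batch size. The helper clamp_rows is hypothetical, and its square flag mirrors the rows ** 2 branch in the show_batch code quoted above:

```python
import math


def clamp_rows(rows: int, batch_size: int, square: bool = True) -> int:
    """Largest row count <= rows whose items fit in a single batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # With a square grid, rows ** 2 items are shown, otherwise just rows items
    max_rows = math.isqrt(batch_size) if square else batch_size
    return max(1, min(rows, max_rows))


if __name__ == "__main__":
    for bs in (1, 4, 64):
        print(bs, clamp_rows(5, bs))
```

This addresses only the IndexError. The TypeError with rows=1 looks like a separate problem: matplotlib's plt.subplots returns a single Axes object rather than an array for a 1x1 grid unless squeeze=False is passed, so that may be what needs handling there.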

I agree with you. Thanks if you can fix it.

I think you wanted to write “it will not work”.

Great :slight_smile: Thank you.

1 Like

Yeah, got ahead of myself and sent the reply and then re-read the original message. Not my best Monday /shrug.

Thanks for the detailed investigation. I’ll work on it tomorrow and hopefully get back to you guys with a PR.

1 Like

train_ds is only hard-coded when we are looking at the class of either the inputs or the labels, to call things like reconstruct or show_xys. Those are the same for all the datasets in your DataBunch. The data is actually accessed in the first line, when we call one_batch (and there we pass ds_type).

2 Likes