Developer chat

So if I’m reading this correctly, testing for gpu mem leaks should be one of the top priorities for the test suite? (Improving/Expanding Tests).

I’d like to help. On the tests front, I’ll play around and ping you on the dev project thread. If you have a specific scenario or part of work in mind that would be helpful in the next few days, let me know, I’ll focus on that.

I won’t say it’s the top priority, since the current fastai code base doesn’t have too many issues with that. At times it just doesn’t utilize all the available memory, because it doesn’t manage it tightly: (1) due to cyclic references, and (2) due to fragmentation, caused by GPU memory being allocated before no-longer-needed memory is freed in some situations. Ideally, the code should be free of cyclic references, so that when any object is removed it is instantly reclaimed and, if the GPU is involved, its memory freed. But that’s not the case at the moment.
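
To illustrate what “instant reclamation” would look like in practice, here is a minimal sketch (plain pytorch, nothing fastai-specific) that frees a GPU tensor by hand; the same idea applies to a whole Learner:

import gc
import torch

x = torch.randn(1024, 1024, device='cuda')  # some GPU allocation
print(torch.cuda.memory_allocated())        # bytes held by live tensors

del x                     # drop the reference ...
gc.collect()              # ... and break any leftover reference cycles
torch.cuda.empty_cache()  # hand the cached blocks back to the driver
print(torch.cuda.memory_allocated())        # back to (near) zero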

Thank you for the offer, @xnutsive. My plan is to add a few tests for the core functionality (the create-learner-train-save-load sequence and its parts), and develop useful utils to make it easy to write them quickly. Then we can start expanding it to other parts. I know a few people are actively working on getting the ‘text’ classes to use less memory, e.g. LanguageModelLoader.

Have a look at https://github.com/fastai/fastai/blob/master/tests/test_vision_train.py#L87 (test_model_load_mem_leak) for a basic model. It’s now trivial to write leak tests: you just measure the used memory before and after, though you need to understand how to measure the real used memory.
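
A rough sketch of that pattern with plain torch calls (the real test_model_load_mem_leak exercises an actual model load; the tensor below just stands in for the work under test, and gpu_mem_used is a hypothetical helper):

import gc
import torch

def gpu_mem_used() -> int:
    "Bytes currently held by live tensors, after clearing caches."
    gc.collect()
    torch.cuda.empty_cache()
    return torch.cuda.memory_allocated()

def test_no_gpu_leak():
    before = gpu_mem_used()
    x = torch.randn(2048, 2048, device='cuda')  # stand-in for load/predict work
    del x
    after = gpu_mem_used()
    assert after - before < 2**20, f"leaked {after - before} bytes"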

I think the dev_nb folder in fastai_docs is probably the best place to share development notebooks. Thanks for investigating this!


Perfect. Thank you, @sgugger.

@stas do you know if we can instruct the GPU to stop keeping a mem cache, so that we can see the timeline for mem allocations?

When I restart the PC and/or the jupyter notebook, I can see a surge in GPU memory when starting to train a language model. I think it happens when backprop starts. This would be easier to pin down without the cache hiding the amount of used memory.

do you know if we can instruct the GPU to stop keeping a mem cache, so that we can see the timeline for mem allocations?

I don’t know, perhaps there is a way to compile pytorch w/ caching disabled? Ask at http://discuss.pytorch.org/ and report back your findings?

Until then, try to run torch.cuda.empty_cache() at strategic points.
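
For example (memory_cached was the name of the cache-size query in pytorch 1.0; newer versions rename it to memory_reserved):

import torch

# allocate, then free, a big tensor: nvidia-smi will still show the memory as
# used, because pytorch's caching allocator keeps the freed blocks around
x = torch.randn(4096, 4096, device='cuda')
del x
print(torch.cuda.memory_allocated())  # 0 -- no live tensors left
print(torch.cuda.memory_cached())     # > 0 -- blocks kept in the cache

torch.cuda.empty_cache()              # release the cached blocks
print(torch.cuda.memory_cached())     # near 0, and nvidia-smi now agrees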

And you might find this cell-by-cell gpu memory logger that I have just released useful: https://github.com/stas00/ipygpulogger - it’s totally new so I’m still tweaking the interface (and feedback is welcome!). But the main reason I mentioned it to you is that it runs empty_cache() automatically for you before and after each cell is run to measure the gpu memory usage correctly. (and gc.collect() but that can be turned off)

I have also just discovered this pytorch CUDA memory profiler, which perhaps can be useful to you. https://gist.github.com/dojoteef/26cd46f7cc38b38e6f443c5f62411aa3

I have just started a new thread GPU Optimizations Central - let’s have that discussion over there and use that thread for compiling all the knowledge we collectively discover.

When I restart the PC and/or the jupyter notebook, I can see a surge in GPU memory when starting to train a language model. I think it happens when backprop starts.

It could be this too: https://github.com/stas00/ipygpulogger#framework-preloading - if it’s the first 0.5GB then it certainly is the case.

thx
I have installed it and removed a ton of statements that measure used memory from my notebook/.py files.
Good idea to preload pytorch.


New: we can now directly export the Learner, which avoids having to redefine the model at inference time (it’s saved with the data). You can check the inference tutorial for all the details, but basically you say learn.export() when you are ready, then learn = load_learner(path) when you want to load your inference learner.
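
Roughly, the workflow looks like this (a sketch using the small MNIST_SAMPLE dataset so it runs end to end; any trained Learner works the same way):

from fastai.vision import *

# train something small, then export it
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
learn = create_cnn(data, models.resnet18, metrics=accuracy)
learn.fit_one_cycle(1)
learn.export()                   # writes 'export.pkl' into learn.path

# later / elsewhere: no need to redefine the model or rebuild the DataBunch
learn = load_learner(path)       # path = folder containing export.pkl
img = data.valid_ds[0][0]        # any image will do
print(learn.predict(img))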

Breaking change: the adult_sample dataset has been updated; you need to manually delete the copy you have to trigger a new download.


learn.summary() seems to have a bug, right?

I encounter it in the latest version of fastai.


@jeremy changed it not to print() the output but to return the data. And ipython/jupyter doesn’t interpret special characters like newlines when displaying a raw string; you have to run the data through print().

In [1]: y = "\n".join(["a","b"])

In [2]: y
Out[2]: 'a\nb'

In [3]: print(y)
a
b

So basically you now have to do this:

print(learn.summary())

The function could probably be smart and detect whether it’s running in an ipython shell, and print() instead of returning the data in that case, but perhaps that would be inconsistent behavior.
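
A rough sketch of what that detection could look like (show_summary is a hypothetical helper, not part of fastai; get_ipython is only defined when running under ipython/jupyter):

def in_ipython() -> bool:
    "True when running under ipython/jupyter."
    try:
        get_ipython()      # injected into the namespace by ipython
        return True
    except NameError:
        return False

def show_summary(learn):
    "Print the summary in a notebook/ipython shell, return it otherwise."
    s = learn.summary()
    if in_ipython(): print(s)
    else: return s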

Currently, fastai has an inconsistent mix of some functions returning data, others printing.

My guess is that the change was done to better support the use of fastai outside of jupyter environment, where “unsolicited” printing is not a correct function behavior.

Of course, the other solution is to have a set of ipython wrappers, e.g.:

def summary_p(self): print(self.summary()) 

and then you use the wrapper:

learn.summary_p()

or something like that.


Is SWA already implemented in fastai v1? I found nothing in the docs.

Not yet. @wdhorton said he was working on it, I believe.

Ok thank you. @wdhorton I am available if you need help.

Just finished:

  • LanguageModelLoader (used behind the scenes by TextLMDataBunch) has now been replaced by LanguageModelPreLoader which isn’t a DataLoader but an intermediate between the dataset and a pytorch DataLoader. It’s a Dataset and a Callback at the same time, and is responsible for reading a portion of the stream created by all the texts concatenated.
  • Which means we can now have pre-loaders that are Callbacks (see the sketch after this list). The only events we can use are on_epoch_begin or on_epoch_end, since the multiprocessing in pytorch’s DataLoader (with num_workers>=1) makes a copy of the underlying dataset that is only synchronized at the end of the iteration.
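
A toy sketch of the shape of such an object: not the real LanguageModelPreLoader, just a pytorch Dataset that also exposes a Callback-style on_epoch_begin hook, which has to run in the main process before the DataLoader workers are forked:

import numpy as np
from torch.utils.data import Dataset

class TinyLMPreLoader(Dataset):
    "Illustrative only: serves (x, y) slices of one concatenated token stream."
    def __init__(self, tokens, bptt=10):
        self.tokens, self.bptt = np.asarray(tokens), bptt
        self.on_epoch_begin()

    def on_epoch_begin(self, **kwargs):
        "Callback-style hook: runs in the main process, before workers copy us."
        self.offset = np.random.randint(0, self.bptt)

    def __len__(self):
        return (len(self.tokens) - self.offset - 1) // self.bptt

    def __getitem__(self, i):
        s = self.offset + i * self.bptt
        x = self.tokens[s : s + self.bptt].astype(np.int64)
        y = self.tokens[s + 1 : s + self.bptt + 1].astype(np.int64)
        return x, y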

this is really nice and memory usage is down. THX

Here is a small suggestion for def __getitem__(self, k:int): inserting the line below just before the comment “#Returning the right portion” will allow users to provide the token ids in a format that matches the vocab, e.g. np.uint16 for a vocab of size 64k.

    if concat.dtype != np.int64: concat = concat.astype(np.int64)
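
For context, a quick illustration of why the cast is needed (pytorch has no uint16 tensor type, and embedding layers expect int64 indices):

import numpy as np
import torch

ids = np.array([3, 1, 4, 1, 5], dtype=np.uint16)  # compact storage for a 64k vocab

# torch.from_numpy(ids) would raise a TypeError, since uint16 isn't supported;
# casting to int64 keeps the compact storage on the numpy side while producing
# the LongTensor that an embedding layer expects
x = torch.from_numpy(ids.astype(np.int64))
print(x.dtype)  # torch.int64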

Will add.
Also note I removed the varying bptt because it doesn’t add anything now that we shuffle the texts at each batch (tested on wikitext-2).


Agree, I could not measure any difference using p_bptt nor my own uniform distribution.

I believe there is a ±1 offset issue between batches in the new version.
I had so many problems making my own indexing of the jagged array work that I created a test to generate jagged arrays with continuous numbers but a random layout. I feel confident when I can handle 10000 different layouts.

Here is a result with the ±1 issue from running on the newest version of LanguageModelPreLoader in fastai dev: https://github.com/kasparlund/nlp/blob/master/test_languagemodelloader.ipynb

When I tried again, it failed because start became greater than end in __getitem__.

I can create an issue if you agree that there is one.
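
For reference, here is a stripped-down sketch of the kind of check I mean (the real test notebook is at the link above; make_jagged_docs and check_batches are just illustrative helpers, not part of fastai):

import numpy as np

def make_jagged_docs(n_tokens=1000, seed=0):
    "Documents of random lengths whose concatenation is 0..n_tokens-1."
    rng = np.random.RandomState(seed)
    lengths = []
    while sum(lengths) < n_tokens: lengths.append(rng.randint(1, 50))
    lengths[-1] -= sum(lengths) - n_tokens
    return np.split(np.arange(n_tokens), np.cumsum(lengths)[:-1])

def check_batches(batches):
    "Every row must be contiguous and continue exactly where it left off."
    prev_last = None
    for x, y in batches:                        # x, y: 2-D arrays of token ids
        assert np.all(np.diff(x, axis=1) == 1), "gap inside a batch row"
        assert np.all(y == x + 1), "off-by-one between x and its target y"
        if prev_last is not None:
            assert np.all(x[:, 0] == prev_last + 1), "offset between batches"
        prev_last = x[:, -1]

# usage: feed make_jagged_docs() to the loader under test, collect its (x, y)
# batches as numpy arrays, and run check_batches() over them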

get_transforms() with default settings threw the error below for histology images

"RuntimeError: B should have at least 2 dimensions, but has 1 dimensions instead"

I’m using images from this Kaggle competition:

First, I set all the arguments for get_transforms() to zero to get a baseline:

tfms = get_transforms(do_flip=False, 
                      flip_vert=False, 
                      max_rotate=0., 
                      max_zoom=0., 
                      max_lighting=0., 
                      max_warp=0., 
                      p_affine=0., 
                      p_lighting=0.)

Everything I did below worked:

data = (ImageItemList.from_df(df=df, path=path, cols='fpaths')
                     .random_split_by_pct(valid_pct=0.2, seed=10)
                     .label_from_df(cols='class_label')
                     .transform(tfms, size=49)
                     .databunch(bs=128))

data.show_batch(rows=3, figsize=(7,7), hide_axis=False)

learn = create_cnn(data, models.resnet34, metrics=[error_rate, accuracy])

learn.fit_one_cycle(6)

Then I tried using the default options for get_transforms() and got the error:

tfms = get_transforms()

data = (ImageItemList.from_df(df=df, path=path, cols='fpaths')
                     .random_split_by_pct(valid_pct=0.2, seed=10)
                     .label_from_df(cols='class_label')
                     .transform(tfms, size=49)
                     .databunch(bs=128))

data.show_batch(rows=3, figsize=(7,7), hide_axis=False)

"RuntimeError: B should have at least 2 dimensions, but has 1 dimensions instead"

Finally, I narrowed the cause of the problem down to max_warp by manually entering all of the defaults and changing each one to zero, one at a time:

tfms = get_transforms(do_flip=True, 
                      flip_vert=False, 
                      max_rotate=10., 
                      max_zoom=1.1, 
                      max_lighting=0.2, 
                      max_warp=0., 
                      p_affine=0.75, 
                      p_lighting=0.75)

Thought I’d share this in case anyone else ran into the same issue.

It is impressive that fastai knows that these images should not have max_warp applied! Is this a bug?

I wonder if this same error will be thrown when I want to look at 3D images of cells and organelles? … TBD

The full error text was:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-25-0a63e3fd5550> in <module>
----> 1 data.show_batch(rows=3, figsize=(7,7), hide_axis=False)

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/basic_data.py in show_batch(self, rows, ds_type, **kwargs)
    151     def show_batch(self, rows:int=5, ds_type:DatasetType=DatasetType.Train, **kwargs)->None:
    152         "Show a batch of data in `ds_type` on a few `rows`."
--> 153         x,y = self.one_batch(ds_type, True, True)
    154         if self.train_ds.x._square_show: rows = rows ** 2
    155         xs = [self.train_ds.x.reconstruct(grab_idx(x, i, self._batch_first)) for i in range(rows)]

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/basic_data.py in one_batch(self, ds_type, detach, denorm)
    134         w = self.num_workers
    135         self.num_workers = 0
--> 136         try:     x,y = next(iter(dl))
    137         finally: self.num_workers = w
    138         if detach: x,y = to_detach(x),to_detach(y)

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/basic_data.py in __iter__(self)
     68     def __iter__(self):
     69         "Process and returns items from `DataLoader`."
---> 70         for b in self.dl:
     71             y = b[1][0] if is_listy(b[1]) else b[1]
     72             yield self.proc_batch(b)

~/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/utils/data/dataloader.py in __next__(self)
    466                 self.reorder_dict[idx] = batch
    467                 continue
--> 468             return self._process_next_batch(batch)
    469 
    470     next = __next__  # Python 2 compatibility

~/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
    487         self._put_indices()
    488         if isinstance(batch, _utils.ExceptionWrapper):
--> 489             raise batch.exc_type(batch.exc_msg)
    490         return batch
    491 

RuntimeError: Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/data_block.py", line 486, in __getitem__
    x = x.apply_tfms(self.tfms, **self.tfmargs)
  File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/image.py", line 113, in apply_tfms
    else: x = tfm(x)
  File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/image.py", line 498, in __call__
    return self.tfm(x, *args, **{**self.resolved, **kwargs}) if self.do_run else x
  File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/image.py", line 445, in __call__
    if args: return self.calc(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/image.py", line 450, in calc
    if self._wrap: return getattr(x, self._wrap)(self.func, *args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/image.py", line 167, in coord
    self.flow = func(self.flow, *args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/transform.py", line 227, in symmetric_warp
    return _perspective_warp(c, targ_pts, invert)
  File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/transform.py", line 213, in _perspective_warp
    return _apply_perspective(c, _find_coeffs(_orig_pts, targ_pts))
  File "/home/ubuntu/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/transform.py", line 194, in _find_coeffs
    return torch.gesv(B,A)[0][:,0]
RuntimeError: B should have at least 2 dimensions, but has 1 dimensions instead

It’s a bug with the new version of pytorch; it has been fixed in master, I believe.


split from Fastai v1 install issues thread

Trouble with tests/test_vision_data.py

The second problem is that when I run the tests on the latest pull ($ make test), I get an error related to pulling in some data using mnist = untar_data(URLs.COCO_TINY).

Looking at the ~/.fastai/data/ directory, it seems that sometimes the HTTP call fails to pull in a .tgz, and that leads to the untar failing. If I delete the empty .tgz, then I (sometimes) get success the second time around. I believe this is due to a socket.timeout.

My solution is to remove all the files in that directory (rm -r ~/.fastai/data/*) and then try again. After a few tries, everything seems to come in and I can run all the tests. I don’t know why the timeout is an issue, and an intermittent one at that.
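
Until the root cause is understood, a rough workaround sketch (untar_with_retries is just an illustrative wrapper; it assumes the fastai v1 layout where archives are downloaded as <name>.tgz under the directory returned by Config.data_path()):

import socket, tarfile, time
from fastai.datasets import untar_data, URLs, Config

def untar_with_retries(url, retries=3):
    "Retry untar_data, cleaning up a truncated .tgz left behind by a timeout."
    for attempt in range(retries):
        try:
            return untar_data(url)
        except (socket.timeout, tarfile.ReadError):
            # remove the partial archive so the next attempt re-downloads it
            partial = Config.data_path()/(url.split('/')[-1] + '.tgz')
            if partial.exists(): partial.unlink()
            time.sleep(2 ** attempt)            # back off before retrying
    raise RuntimeError(f'could not fetch {url} after {retries} attempts')

coco = untar_with_retries(URLs.COCO_TINY)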

Show install information:

=== Software === 
python        : 3.7.1
fastai        : 1.0.40.dev0
fastprogress  : 0.1.18
torch         : 1.0.0
nvidia driver : 396.51
torch cuda    : 9.0.176 / is available
torch cudnn   : 7401 / is enabled

=== Hardware === 
nvidia gpus   : 2
torch devices : 2
  - gpu0      : 12194MB | TITAN Xp
  - gpu1      : 12196MB | TITAN Xp

=== Environment === 
platform      : Linux-4.15.0-32-generic-x86_64-with-debian-stretch-sid
distro        : Ubuntu 16.04 Xenial Xerus
conda env     : fai_v1_dev
python        : /home/farzin/anaconda3/envs/fai_v1_dev/bin/python
sys.path      : 
/home/farzin/anaconda3/envs/fai_v1_dev/lib/python37.zip
/home/farzin/anaconda3/envs/fai_v1_dev/lib/python3.7
/home/farzin/anaconda3/envs/fai_v1_dev/lib/python3.7/lib-dynload
/home/farzin/anaconda3/envs/fai_v1_dev/lib/python3.7/site-packages
/home/farzin/fast_ai/fastai-fork

Fri Jan 11 14:32:56 2019    
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.51                 Driver Version: 396.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:03:00.0 Off |                  N/A |
| 30%   47C    P8    21W / 250W |     12MiB / 12194MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:04:00.0  On |                  N/A |
| 23%   36C    P8    17W / 250W |    979MiB / 12196MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      1433      G   /usr/lib/xorg/Xorg                           661MiB |
|    1      2423      G   compiz                                       302MiB |
|    1     22300      G   /usr/lib/firefox/firefox                       3MiB |
+-----------------------------------------------------------------------------+