PyTorch v1.0 stable now works on Windows, but fastai v1 needs some tweaks to get it working there.

You’re a boss :smiley:

Thanks to @313V this limitation is now removed in master :slight_smile:


@hwasiti

Having a fastai v1 environment on Windows is great. You should be aware, however, that with v0.7 I always found the same notebooks ran more slowly on Windows (by a substantial 50-80%). The same held for TensorFlow/Keras.

I don’t know why, maybe a matter of OS primitives.


I remember a tweet from Jeremy saying that performance on Windows was good with PyTorch 0.3. Wait, I will search for it.

Here it is:


Anyone have a brief sketch of how to build the lib from source?

I’ve done it in R (my ‘native’ language), but never in Python/Anaconda…

Excited about this because I need to be on Windows during the workday, and my GPU is basically sitting idle…

I’ve been testing lesson1-pets.ipynb on Windows 10 and comparing against Ubuntu 18.04. Windows 10 seems to give the same results as Ubuntu, but unfortunately it is 5x slower (60m vs 14m) with identical hardware (GPU) and software. CPU-only is impossibly slow on both systems. Windows monitors show that the hardware is heavily underutilized (20%), whereas on Ubuntu it is often at 100%.

Is anyone else experiencing 5x slowness? Is it due to pickle or another library not running multi-threaded?


I’m able to reproduce the slowness compared to Ubuntu on my machine too (10:12 vs 1:54). Most of the time is consumed between two epochs or before validation starts, so it is either the data preprocessing or the initial transfer of the data to the GPU. Over the next few days, I will test whether TensorFlow shows the same slowdown compared to Ubuntu.


I was able to confirm this as well. Note that to run the lesson1-pets notebook on Windows, the regular expression for parsing the filenames must be changed from:

pat = re.compile(r'/([^/]+)_\d+.jpg$')

to

pat = re.compile(r'\\([^\\]+)_\d+.jpg$')

because Linux uses slashes to separate file path levels and Windows uses backslashes.
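For anyone who wants one pattern for both systems, here is a minimal sketch. It is not from the notebook: `PAT` and `label_from_path` are names made up for illustration and the sample paths are invented. The pattern accepts either separator, and it also escapes the dot before `jpg`, which the notebook pattern leaves unescaped.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical pattern: accept "/" or "\" as the last path separator.
PAT = re.compile(r'[\\/]([^\\/]+)_\d+\.jpg$')

def label_from_path(path):
    """Return the label encoded in a pets-style filename, or None if no match."""
    match = PAT.search(str(path))
    return match.group(1) if match else None

# Pure*Path classes let us build both styles of path on any OS.
linux_name = PurePosixPath('/data/images/great_pyrenees_173.jpg')
windows_name = PureWindowsPath(r'C:\data\images\great_pyrenees_173.jpg')

print(label_from_path(linux_name))
print(label_from_path(windows_name))
```

Returning None instead of calling `.group(1)` directly makes a non-matching filename easy to spot before it turns into an AttributeError.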


Not sure if this issue ("in windows, DataLoader with num_workers > 0 is extremely slow (50 times slower)") is the reason, but it would appear to be about process creation.


Agreed, though I haven’t looked at the code. The solution would be a vastly improved process/thread lifecycle manager, with changes to process/thread communication, similar to a service architecture.

I was able to install fastai v1 and PyTorch on Windows 10 using conda. I’m using Python 3.7.1. However, when I run through lesson1-pets I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-15e5d1d9602d> in <module>
----> 1 data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs
      2                                   ).normalize(imagenet_stats)

C:\anaconda\envs\v3\lib\site-packages\fastai\vision\data.py in from_name_re(cls, path, fnames, pat, valid_pct, **kwargs)
    153         pat = re.compile(pat)
    154         def _get_label(fn): return pat.search(str(fn)).group(1)
--> 155         return cls.from_name_func(path, fnames, _get_label, valid_pct=valid_pct, **kwargs)
    156 
    157     @staticmethod

C:\anaconda\envs\v3\lib\site-packages\fastai\vision\data.py in from_name_func(cls, path, fnames, label_func, valid_pct, **kwargs)
    146         "Create from list of `fnames` in `path` with `label_func`."
    147         src = ImageItemList(fnames, path=path).random_split_by_pct(valid_pct)
--> 148         return cls.create_from_ll(src.label_from_func(label_func), **kwargs)
    149 
    150     @classmethod

C:\anaconda\envs\v3\lib\site-packages\fastai\data_block.py in _inner(*args, **kwargs)
    386         assert isinstance(fv, Callable)
    387         def _inner(*args, **kwargs):
--> 388             self.train = ft(*args, **kwargs)
    389             assert isinstance(self.train, LabelList)
    390             kwargs['label_cls'] = self.train.y.__class__

C:\anaconda\envs\v3\lib\site-packages\fastai\data_block.py in label_from_func(self, func, **kwargs)
    240     def label_from_func(self, func:Callable, **kwargs)->'LabelList':
    241         "Apply `func` to every input to get its label."
--> 242         return self.label_from_list([func(o) for o in self.items], **kwargs)
    243 
    244     def label_from_folder(self, **kwargs)->'LabelList':

C:\anaconda\envs\v3\lib\site-packages\fastai\data_block.py in <listcomp>(.0)
    240     def label_from_func(self, func:Callable, **kwargs)->'LabelList':
    241         "Apply `func` to every input to get its label."
--> 242         return self.label_from_list([func(o) for o in self.items], **kwargs)
    243 
    244     def label_from_folder(self, **kwargs)->'LabelList':

C:\anaconda\envs\v3\lib\site-packages\fastai\vision\data.py in _get_label(fn)
    152         "Create from list of `fnames` in `path` with re expression `pat`."
    153         pat = re.compile(pat)
--> 154         def _get_label(fn): return pat.search(str(fn)).group(1)
    155         return cls.from_name_func(path, fnames, _get_label, valid_pct=valid_pct, **kwargs)
    156 

AttributeError: 'NoneType' object has no attribute 'group'

See my post above about the regular expression used to parse the filenames
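The traceback ends in `pat.search(str(fn)).group(1)`: when the pattern does not match, `search` returns None and `.group` fails. A quick way to check a pattern before building the data bunch is sketched below. `unmatched` is a hypothetical helper, not part of fastai, and the filenames are invented.

```python
import re
from pathlib import PureWindowsPath

def unmatched(fnames, pat):
    """Return the filenames for which `pat` finds no label (hypothetical helper)."""
    regex = re.compile(pat)
    return [fn for fn in fnames if regex.search(str(fn)) is None]

# Invented Windows-style paths; PureWindowsPath works on any OS.
fnames = [PureWindowsPath(r'C:\data\images\Bombay_12.jpg'),
          PureWindowsPath(r'C:\data\images\pug_7.jpg')]

posix_pat = r'/([^/]+)_\d+.jpg$'        # pattern from the notebook
windows_pat = r'\\([^\\]+)_\d+.jpg$'    # pattern from the post above

print(len(unmatched(fnames, posix_pat)), 'files fail with the notebook pattern')
print(len(unmatched(fnames, windows_pat)), 'files fail with the Windows pattern')
```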

I’ve completed lesson one on Windows 10 (CUDA 10, GTX 1070), but the performance is indeed very poor. Most of the time is spent on the CPU. I ran a profiler, and the largest share of the time is spent in grid_sampler. The GPU usage is very low.

command:
%prun learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-4))

result:
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
 14732   55.815    0.004   55.815    0.004  {built-in method grid_sampler}
 32074   41.458    0.001   41.458    0.001  {method 'decode' of 'ImagingDecoder' objects}
   234   37.108    0.159   37.108    0.159  {method 'cpu' of 'torch._C._TensorBase' objects}
 14732   29.377    0.002   29.377    0.002  {method 'clone' of 'torch._C._TensorBase' objects}
 20436   17.061    0.001   17.061    0.001  {built-in method addmm}
 11010   15.233    0.001   15.233    0.001  {method 'sigmoid_' of 'torch._C._TensorBase' objects}
 29696   12.904    0.000   12.904    0.000  {method 'contiguous' of 'torch._C._TensorBase' objects}
 49653   12.548    0.000   12.548    0.000  {method 'mul_' of 'torch._C._TensorBase' objects}
 14732   10.539    0.001   10.539    0.001  {method 'astype' of 'numpy.ndarray' objects}
   941    7.764    0.008    7.764    0.008  {built-in method torch._C._nn.adaptive_avg_pool2d}
 14732    7.158    0.000    8.627    0.001  image.py:517(affine_grid)
 30006    5.989    0.000    5.989    0.000  {method 'add_' of 'torch._C._TensorBase' objects}
 10672    3.987    0.000    3.987    0.000  {method 'zero_' of 'torch._C._TensorBase' objects}
 55206    3.357    0.000  104.106    0.002  image.py:116(refresh)
   118    3.313    0.028    3.313    0.028  {built-in method stack}
 14732    2.889    0.000    2.889    0.000  {method 'max' of 'torch._C._TensorBase' objects}
 14732    2.883    0.000    2.883    0.000  {method 'min' of 'torch._C._TensorBase' objects}
 14732    2.773    0.000  255.430    0.017  data_block.py:481(__getitem__)
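If you want the same kind of listing outside a notebook, the standard-library `cProfile` and `pstats` modules can produce it. This is only a sketch: `profile_top` is a made-up helper and `busy_work` is a stand-in for the real training call, which is not reproduced here.

```python
import cProfile
import io
import pstats

def profile_top(func, *args, limit=10, **kwargs):
    """Run `func` under cProfile and return (result, report sorted by tottime)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats('tottime').print_stats(limit)  # keep only the top `limit` rows
    return result, buf.getvalue()

def busy_work(n):
    """Stand-in workload; replace with the call you actually want to profile."""
    return sum(i * i for i in range(n))

result, report = profile_top(busy_work, 100_000)
print(report)
```

Sorting by `tottime` puts the functions where time is spent directly at the top, while `cumtime` includes the time spent in everything a function calls.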

CPU usage is likely loading and augmenting the images.

As for the GPU usage, did you run this after unfreezing the model? If the model is still frozen, you’re only training the last linear layer, which is computationally light. Try unfreezing the model and increasing the batch size to the largest your GPU can hold.

I tried both the frozen and the unfrozen model and the behaviour is consistent: about 35-40% CPU usage (i7-8700K) and some spikes of GPU usage. The input data is on an SSD.

I’m getting results consistent with the previous comments. It takes over 10 minutes, not 1.5 min as in the video. Adjusting the batch size doesn’t change much.

It seems the slowdown is a result of running the data loader with num_workers=0. GPU usage goes up when resnet50 is used. I’ve spent a long time googling it, and it seems we have to live with it until PyTorch is fixed for the Windows platform.
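Some background on why Windows differs here: Python’s multiprocessing can only start workers with the "spawn" method on Windows, so every worker is a fresh interpreter rather than a forked copy of the parent. The sketch below shows how a script could pick a worker count per platform. `suggested_num_workers` is a hypothetical helper, and the cap of 8 is an arbitrary assumption, not a fastai or PyTorch default.

```python
import multiprocessing as mp
import os
import sys

def suggested_num_workers(platform=sys.platform, cpu_count=None):
    """Hypothetical helper: choose a DataLoader num_workers value per platform."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    # Windows has no fork(); workers are spawned as new interpreters,
    # which is the expensive process creation discussed in this thread.
    if platform.startswith('win'):
        return 0
    return min(cpu_count, 8)  # arbitrary cap, chosen only for illustration

if __name__ == '__main__':
    print('start methods available here:', mp.get_all_start_methods())
    print('suggested num_workers:', suggested_num_workers())
```

With num_workers=0 all image loading and augmentation runs in the main process, which would fit the low GPU utilization reported above.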


With the old course (fastai 0.7 and its older PyTorch), my GPU utilization was at 100% at times. After upgrading to PyTorch 1.0 and course v3, the GPU usage shows only spikes.

@sgugger might also want to fix this for a future version of the notebooks by adding

fn_paths = [item.as_posix() for item in fn_paths]

to the https://github.com/fastai/course-v3/tree/master/nbs/dl1 notebooks that operate on file names. This would make Jeremy’s regexp work on both Windows and POSIX without us poor Windows users having to figure out REs on our own :wink:
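To see what that line does, here is a small self-contained sketch with invented Windows paths. `PureWindowsPath` is used so it behaves the same on any OS, and the pattern is the unmodified one from the notebook.

```python
import re
from pathlib import PureWindowsPath

# Unmodified pattern from the lesson notebook (forward slashes only).
pat = re.compile(r'/([^/]+)_\d+.jpg$')

# Invented sample paths in Windows style.
fn_paths = [PureWindowsPath(r'C:\data\images\Bombay_12.jpg'),
            PureWindowsPath(r'C:\data\images\pug_7.jpg')]

# as_posix() returns the path as a string with forward slashes.
fn_paths = [item.as_posix() for item in fn_paths]

labels = [pat.search(fn).group(1) for fn in fn_paths]
print(fn_paths)
print(labels)
```

Normalizing the paths once keeps the regular expression identical across platforms, instead of maintaining two patterns.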


as_posix is in the library now. It was suggested as a bug fix and will be in v1.0.43 when it’s released.


Setting pat = r'\\([^\\]+)_\d+.jpg$' will fix it. But I then ran into another issue, a 'PicklingError', which I haven’t been able to fix so far.
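One possible cause, assuming the error appears when data loader workers are started: on Windows the objects handed to a spawned worker are sent via pickle, and pickle cannot serialize lambdas or functions defined inside other functions, such as `_get_label` in the traceback earlier in the thread. The sketch below only demonstrates that pickle behaviour. `can_pickle` and `make_local_label` are made-up names, and this is not a confirmed diagnosis of the error above.

```python
import pickle
import re

def can_pickle(obj):
    """Return True if `obj` survives pickle.dumps."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

def make_local_label(pat):
    """Build a label function as a closure, similar in shape to `_get_label`."""
    regex = re.compile(pat)
    def _get_label(fn):
        return regex.search(str(fn)).group(1)
    return _get_label

local_label = make_local_label(r'/([^/]+)_\d+.jpg$')

print('built-in function:', can_pickle(len))
print('nested function:  ', can_pickle(local_label))
print('lambda:           ', can_pickle(lambda fn: str(fn)))
```

If this is the cause, running with num_workers=0 avoids the pickling step entirely, at the cost of the slowdown discussed above.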
