Windows 10 Installation Notes (Windows command and WSL bash)

Chris_Palmer · November 9, 2017, 3:43pm

Hi Robert. The model is just lesson 1 from our current fastai class, in the new version that uses PIL (installed as Pillow) rather than OpenCV. I followed your instructions for setting up my PC, then cloned the fastai GitHub repository.

BTW, I didn’t create a fastai environment. Or rather, I tried to create one but it hung at some point, so this time I just ran it from my vanilla Python 3.6 setup as you instructed. It wasn’t clear from your instructions if or when to load the packaged environment as part of the setup.

How do I tell PyTorch to run in CPU-only mode? Do I need to hack the fastai library, e.g. alter this in core.py?

def to_gpu(x, *args, **kwargs):
    if torch.cuda.is_available():
        return x.cuda(*args, **kwargs)
    else:
        return x

And this in model.py?

    if isinstance(input_size[0], (list, tuple)):
        x = [Variable(torch.rand(1,*in_size)).cuda() for in_size in input_size]
    else: x = [Variable(torch.rand(1,*input_size)).cuda()]
    m(*x)
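
As a possible alternative to editing the library, here is a minimal sketch that hides the GPU from the process before PyTorch is imported. It assumes PyTorch honours the CUDA runtime's CUDA_VISIBLE_DEVICES variable, so that torch.cuda.is_available() returns False and guarded helpers such as to_gpu above fall back to the CPU. Unguarded .cuda() calls like the ones in the model.py snippet would still fail and would need a guard of their own. The helper name hide_gpus is made up for illustration.

```python
import os


def hide_gpus():
    """Hide all CUDA devices from libraries imported after this call."""
    # "-1" is not a valid device index, so no devices should be exposed.
    # An empty string is avoided here because on Windows an empty value
    # may be treated as unsetting the variable.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Assumption: the variable is read when CUDA is first initialised,
# so this has to run before the first `import torch`.
hide_gpus()
```

On a CPU-only PyTorch build, such as the one running under the Ubuntu subsystem, none of this should be needed.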

ecdrid · November 11, 2017, 3:24am

conda install peterjc123 is broken…
Do have a backup of the scripts folder in the Anaconda3 directory…

Chris_Palmer · November 11, 2017, 10:25am

Yes, the peterjc123 PyTorch build has a massive memory leak under Windows. I ran the regular PyTorch from my Windows 10 Ubuntu subsystem, and lesson 1 runs without gobbling up all of my memory. Obviously it is really slow, as it’s a CPU-only installation, but it does run. So there is no point in going any further down the road of trying to run PyTorch directly in Windows 10.

Why did you ask if I have a backup of the scripts folder in the Anaconda3 directory? (I don’t, BTW.) Do I need to undo something to remove the dysfunctional libraries?

ecdrid · November 11, 2017, 11:40am

Chris_Palmer · November 11, 2017, 11:54am

Thanks @ecdrid

It talks about installing from the peterjc123 tar file. I installed it using conda install. So far I am not aware of any malfunction in my Python under Windows 10 - do you have a suggestion of a test or an examination of files I could do to see if everything is OK?

ecdrid · November 11, 2017, 2:55pm

Even if you do conda install…
Anaconda does download the tar file…

Chris_Palmer · November 11, 2017, 6:31pm

OK (gulp), do you have any idea how I might check the status of my Python?

ecdrid · November 11, 2017, 6:33pm

Actually what happened was that after the installation it had removed

activate.bat
activate
deactivate.bat
deactivate

from the Scripts directory…
Apart from that, as far as I have used it,
it has worked fine so far…
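
One rough way to check for this, sketched under the assumption that Anaconda on Windows keeps these four files in the Scripts folder under the installation prefix. The function name missing_conda_scripts is hypothetical, and the check covers only the files listed above, not the health of the whole installation.

```python
import sys
from pathlib import Path

# File names taken from the list above.
EXPECTED_SCRIPTS = ("activate.bat", "activate", "deactivate.bat", "deactivate")


def missing_conda_scripts(scripts_dir):
    """Return the expected activation scripts that are absent from scripts_dir."""
    scripts_dir = Path(scripts_dir)
    return [name for name in EXPECTED_SCRIPTS if not (scripts_dir / name).is_file()]


if __name__ == "__main__":
    # Assumption: the interpreter running this lives in the Anaconda3 prefix.
    scripts = Path(sys.prefix) / "Scripts"
    print(scripts, "is missing:", missing_conda_scripts(scripts) or "nothing")
```

An empty result means all four files are present; any names printed are the ones to restore from a backup or a fresh install.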

Chris_Palmer · November 11, 2017, 6:45pm

Oh well, that’s not too bad - so far I have seen these as used to link Keras to either Theano or Tensorflow, but in any case it should all be well documented in articles on the internet!

Chris_Palmer · November 25, 2017, 7:02pm

In case anyone gets stuck getting OpenCV working, with messages like this

ModuleNotFoundError: No module named 'cv2'

or like this

ImportError: libSM.so.6: cannot open shared object file: No such file or directory

I am adding here advice I got from another post, that you may have to update some dependencies:

sudo apt-get install libsm6 libxrender1 libfontconfig1

I am not sure how relevant this is, since this post is about setting up your own installation (especially on Windows 10), but another user advised using the provided fastai environment. It’s not clear from the instructions here how to work with the fastai environment…

conda env create -f environment.yml
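
To tell the two errors quoted above apart before launching the notebook, a small check along these lines may help. It is only a sketch: try_import is a made-up helper, and the module names cv2 and PIL are the import names for OpenCV and Pillow.

```python
import importlib


def try_import(module_name):
    """Try to import a module and report (succeeded, detail)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as err:
        # ModuleNotFoundError (package absent) is a subclass of ImportError;
        # a missing shared library such as libSM.so.6 raises plain ImportError.
        return False, f"{type(err).__name__}: {err}"
    return True, getattr(module, "__version__", "version unknown")


for name in ("cv2", "PIL"):
    ok, detail = try_import(name)
    print(f"{name}: {'OK' if ok else 'FAILED'} ({detail})")
```

A ModuleNotFoundError suggests the package is not installed in the active environment, while a plain ImportError that mentions a .so file points at the system libraries from the apt-get line.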

bsalita · December 20, 2017, 8:53pm

Seems that the original post can no longer be edited (60 day limit?). I’ve attached a PDF of a run of Lesson 1 to show that Windows + GPU is working. There are some issues towards the end, but they’re the same issues reported on non-Windows systems.

fastai Lesson 1 Jupyter Notebook.pdf (2.4 MB)

As of this time, I see no Windows-specific issues, at least not with Lesson 1.

Chris_Palmer · December 27, 2017, 6:58am

Hi Robert

I am getting the following error when I call learn.fit.

Did you make a change to the fastai library to overcome this? (I updated to the latest version before running this today.)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-38-688cebf597d5> in <module>()
----> 1 learn.fit(0.01, 3)

D:\FASTAI\fastai\fastai\learner.py in fit(self, lrs, n_cycle, wds, **kwargs)
    211         self.sched = None
    212         layer_opt = self.get_layer_opt(lrs, wds)
--> 213         self.fit_gen(self.model, self.data, layer_opt, n_cycle, **kwargs)
    214 
    215     def lr_find(self, start_lr=1e-5, end_lr=10, wds=None):

D:\FASTAI\fastai\fastai\learner.py in fit_gen(self, model, data, layer_opt, n_cycle, cycle_len, cycle_mult, cycle_save_name, metrics, callbacks, use_wd_sched, norm_wds, wds_sched_mult, **kwargs)
    158         n_epoch = sum_geom(cycle_len if cycle_len else 1, cycle_mult, n_cycle)
    159         fit(model, data, n_epoch, layer_opt.opt, self.crit,
--> 160             metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, **kwargs)
    161 
    162     def get_layer_groups(self): return self.models.get_layer_groups()

D:\FASTAI\fastai\fastai\model.py in fit(model, data, epochs, opt, crit, metrics, callbacks, **kwargs)
     84             batch_num += 1
     85             for cb in callbacks: cb.on_batch_begin()
---> 86             loss = stepper.step(V(x),V(y))
     87             avg_loss = avg_loss * avg_mom + loss * (1-avg_mom)
     88             debias_loss = avg_loss / (1 - avg_mom**batch_num)

D:\FASTAI\fastai\fastai\model.py in step(self, xs, y)
     41         if isinstance(output,(tuple,list)): output,*xtra = output
     42         self.opt.zero_grad()
---> 43         loss = raw_loss = self.crit(output, y)
     44         if self.reg_fn: loss = self.reg_fn(output, xtra, raw_loss)
     45         loss.backward()

D:\Anaconda3\lib\site-packages\torch\nn\functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce)
   1047         weight = Variable(weight)
   1048     if dim == 2:
-> 1049         return torch._C._nn.nll_loss(input, target, weight, size_average, ignore_index, reduce)
   1050     elif dim == 4:
   1051         return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)

RuntimeError: Expected object of type Variable[torch.cuda.LongTensor] but found type Variable[torch.cuda.IntTensor] for argument #1 'target'

bsalita · December 27, 2017, 6:02pm

@Chris_Palmer, I have a couple of patches for that issue. It’s possible that it’s unique to Windows, for any of a variety of reasons. The patches will make lesson1 run until you hit a ValueError that occurs on all platforms (probably a fastai library error). It appears to me that more patches are needed, as the problematic pattern occurs elsewhere in model.py. I’m guessing this is an issue with Python because it lacks static type checking. TypePython anyone? I’ve opened an issue/PR on fastai’s GitHub for the patches: https://github.com/fastai/fastai/issues/71

In fastai/fastai-master/fastai/model.py:

Original: loss = raw_loss = self.crit(output, y)
Fix: loss = raw_loss = self.crit(output, y.long()) # patch

Original: return preds, self.crit(preds,y)
Fix: return preds, self.crit(preds,y.long()) # patch

After applying the patches, if you see the error “ValueError: Found input variables with inconsistent numbers of samples: [2000, 5]”, you’ve reached the same error as on other platforms (https://github.com/fastai/fastai/issues/70). Move on to something else until fastai fixes it.
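
To see what the y.long() patch does in isolation, here is a minimal sketch. It is written against a recent PyTorch, where tensors no longer need the Variable wrapper shown in the traceback, and the batch values are made up; it is not the fastai code itself.

```python
import torch
import torch.nn.functional as F

# Made-up batch: 4 samples, 2 classes.
log_probs = F.log_softmax(torch.randn(4, 2), dim=1)

# Labels stored as 32-bit integers, as in the error message above.
y = torch.tensor([0, 1, 1, 0], dtype=torch.int32)

# nll_loss expects 64-bit (Long) class indices, so convert before calling it.
# .long() keeps the tensor on whatever device it already lives on.
loss = F.nll_loss(log_probs, y.long())
print(y.dtype, "->", y.long().dtype, "loss:", loss.item())
```

Because .long() changes only the dtype and not the device, the same line should work whether the labels sit on the CPU or the GPU.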

Chris_Palmer · December 27, 2017, 8:17pm

Thanks Robert!

bsalita · December 29, 2017, 9:40pm

@Chris_Palmer, Per @jeremy’s advice, I’ve changed the above patches to use y.long() so they’ll work with both cpu and gpu processing.

Chris_Palmer · December 29, 2017, 10:54pm

Thanks Robert, again

bsalita · December 30, 2017, 10:34pm

The latest fixes on fastai’s github allow lesson1 to complete without error. Previously lesson1 would error out at the 80% mark. True for all platforms.

git clone https://github.com/fastai/fastai

Windows users will need to slightly modify the lesson1 file, replacing the non-portable bash shell commands with cross-platform Python code. I meant to attach a PDF of a run of lesson1 on Windows + GPU showing the replacements for the bash commands, but the file is too large. I’ll have to post it some other way.
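
The post does not say which bash commands are involved. Assuming they are directory listings along the lines of ls piped through head, a cross-platform replacement might look like the sketch below. list_dir is a hypothetical helper, not part of fastai.

```python
import os


def list_dir(path, limit=None):
    """Return sorted entry names in path, roughly like `ls path | head -n limit`."""
    names = sorted(os.listdir(path))
    return names if limit is None else names[:limit]


# Example: show the first five entries of the current directory.
print(list_dir(".", limit=5))
```

Building paths with os.path.join rather than hard-coded slashes keeps the same notebook cell working on both Windows and Linux.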

bsalita · December 30, 2017, 11:01pm

pytorch for Windows has been updated to use CUDA 9.0. tensorflow-gpu still requires CUDA 8.0. I’ve installed CUDA 8 and CUDA 9 side-by-side without problem. I can now use CUDA 9 with pytorch and CUDA 8 with tensorflow-gpu.

conda install -c peterjc123 pytorch cuda90

The next version of pytorch, version 0.4.0, is supposed to officially support Windows.

jeremy · December 30, 2017, 11:39pm

I’ve just pushed the change to add long() when calling self.crit. Let me know if you see any problems.

bsalita · December 31, 2017, 9:20am

I reran Lesson1 on Windows-gpu. It completed without errors. Thanks for committing the changes.

N.B. I use these commands to bring my local copy up to date with master at https://github.com/fastai/fastai. They update from master without deleting files not tracked by git (e.g. courses/dl1/data). Important: any local changes to files tracked by git will be lost. With or without the --hard option, any local commits that haven’t been pushed will be lost.

git fetch --all
git reset --hard origin/master

There is a good Stack Overflow discussion of git commands for updating. Many alternative commands are suggested; which one to use depends on your git situation.

How do I force git pull to overwrite local files?