Windows 10 Installation Notes (Windows command and WSL bash)

gerardo · November 8, 2017, 2:56pm

@vrajjshah I thought I had the two partitions set up until I restarted, and Windows 10 is now dead on my PC.
My Ubuntu configuration is up and running.

Two cards are showing up, and PyTorch seems to be responsive to the GPUs.

Maybe during Thanksgiving break I will go back and re-install the whole thing again and see if I can get the dual boot that I need.

Chris_Palmer · November 9, 2017, 2:57am

Thanks @bsalita

Looking at this for the first time. In the fast.ai system we load an environment file, but these instructions make no mention of it that I could see. Can you please clarify whether these installation requirements should be fulfilled before or after loading the fastai environment, or whether they supersede loading it?

Also, you specify conda install pytorch first, then suggest conda install -c peterjc123 pytorch. Can we clarify, please? I suspect we only need the peterjc123 version for GPU support directly under Windows 10. On the other hand, if we are running from the WSL subsystem we can’t access our GPU from there, so standard pytorch should be sufficient. Is this correct?

jeremy · November 9, 2017, 3:01am

Heh. What you’re doing is great practice, but note that if you use Crestle there’s nothing to set up at all!

Chris_Palmer · November 9, 2017, 5:42am

I get “execstack: cannot open ELF file: invalid file descriptor” when I try this step…

bsalita · November 9, 2017, 8:12am

I have not explored the environment file. The instructions show how to manually set up the requirements for lesson1 on Windows or WSL, and they are helpful for Linux too. Please note the showstopper issues. I believe getting all of Lesson 1 to run on Windows or WSL can be done, but it requires changes to the fastai library. I haven’t had time to work on Windows pull requests for the fastai library. There are possibly some non-fastai issues too.

Yes, peterjc123 is specifically for the Windows command prompt; others can use the standard pytorch.
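
For anyone unsure which build ended up installed, here is a minimal sketch of a check that reports whether PyTorch imports and whether it can see a GPU. The helper name describe_torch is made up for illustration; it only uses torch.__version__, torch.cuda.is_available() and torch.cuda.device_count().

```python
def describe_torch():
    """Return a one-line summary of the PyTorch install, or the import error."""
    try:
        import torch  # imported lazily so the check also runs where torch is absent
    except ImportError as err:
        return f"torch could not be imported: {err}"
    if torch.cuda.is_available():
        count = torch.cuda.device_count()
        return f"torch {torch.__version__}: {count} CUDA device(s) visible"
    return f"torch {torch.__version__}: CPU only"


print(describe_torch())
```

Under WSL this would be expected to report CPU only, given that the GPU is not reachable from there as discussed above.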

Chris_Palmer · November 9, 2017, 10:57am

Hi Robert

Thought I would give this a try. You are right, I can get PyTorch working directly under Windows 10, and I even started running the code with arch=resnet34, but after a number of epochs (87, then 360, then on the 32nd of 32), I got a memory issue - see snapshot below.

While executing from torch._C import * the error is: DLL load failed: The paging file is too small for this operation to complete.

Perhaps you could let me know if this makes sense to you? I have 16GB RAM in my PC, but my GPU is rather inadequate - an NVIDIA GeForce GTX 650 Ti.

bsalita · November 9, 2017, 2:44pm

Do you have a link to the model that you’re running? I can test on my systems. Are you able to run CPU only?

Chris_Palmer · November 9, 2017, 3:43pm

Hi Robert. The model is just lesson 1 from our current fastai class, the new version that uses PIL (installed as Pillow) rather than OpenCV. I followed your instructions for setting up my PC, then cloned the fastai GitHub repository.

BTW, I didn’t create a fastai environment - or rather I should say I tried to create one but it hung at some point, so this time I just ran it from my vanilla Python 3.6 setup as you instructed. It wasn’t clear from your instructions if or when to load the packaged environment as part of the setup.

How do I tell PyTorch to run on CPU only? Do I need to hack the fastai library - e.g. alter this in core.py?

def to_gpu(x, *args, **kwargs):
    if torch.cuda.is_available():
        return x.cuda(*args, **kwargs)
    else:
        return x

And this in model.py?

    if isinstance(input_size[0], (list, tuple)):
        x = [Variable(torch.rand(1,*in_size)).cuda() for in_size in input_size]
    else: x = [Variable(torch.rand(1,*input_size)).cuda()]
    m(*x)
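
A note on the two snippets: to_gpu already returns x unchanged when torch.cuda.is_available() is False, so only the unconditional .cuda() calls in the model.py snippet would need attention. A minimal sketch of one way to force CPU-only without editing the library is to hide the GPUs through the CUDA_VISIBLE_DEVICES environment variable before torch is imported. The helper names hide_cuda_devices and to_device are made up for illustration and are not part of fastai or PyTorch.

```python
import os


def hide_cuda_devices():
    """Hide all CUDA devices from libraries that initialize CUDA after this call."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


def to_device(x, use_cuda):
    """Call x.cuda() only when use_cuda is true; otherwise return x unchanged."""
    return x.cuda() if use_cuda else x


# Assumption: this runs before the first `import torch` in the process,
# so that torch.cuda.is_available() then reports False.
hide_cuda_devices()
```

With the devices hidden, to_gpu takes its else branch. The .cuda() calls in model.py would still fail on a machine without working CUDA, so they would need a guard along the lines of to_device(x, torch.cuda.is_available()).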

ecdrid · November 11, 2017, 3:24am

conda install peterjc123 is broken…
Do keep a backup of the Scripts folder in the Anaconda3 directory…

Chris_Palmer · November 11, 2017, 10:25am

Yes, peterjc123 pytorch has a massive memory leak under Windows. I ran the regular PyTorch from my Windows 10 Ubuntu subsystem, and lesson 1 runs without gobbling up all of my memory. Obviously really slowly, as it’s a CPU-only installation, but it does run. So there is no point in going any further down the road of trying to run PyTorch directly in Windows 10.

Why did you ask if I have a backup of the Scripts folder in the Anaconda3 directory? (I don’t, BTW.) Do I need to undo something to remove the dysfunctional libraries?

ecdrid · November 11, 2017, 11:40am

Chris_Palmer · November 11, 2017, 11:54am

Thanks @ecdrid

It talks about installing from the peterjc123 tar file. I installed it using conda install. So far I am not aware of any malfunction in my Python under Windows 10 - do you have a suggestion of a test or an examination of files I could do to see if everything is OK?

ecdrid · November 11, 2017, 2:55pm

Even if you do conda install…
Anaconda does download the tar file…

Chris_Palmer · November 11, 2017, 6:31pm

OK (gulp), do you have any idea how I might check the status of my Python?

ecdrid · November 11, 2017, 6:33pm

Actually what happened was that after the installation it had removed

activate.bat
activate
deactivate.bat
deactivate

from the Scripts directory…
Apart from that, as far as I have used it,
it has worked fine so far…
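
One way to check for this is a small script that lists which of those four files are missing from a given Scripts directory. This is only a sketch: the function name missing_scripts is made up for illustration, and the path you pass in depends on where Anaconda is installed on your machine.

```python
from pathlib import Path

# The four files reported above as removed after the install.
EXPECTED_SCRIPTS = ("activate.bat", "activate", "deactivate.bat", "deactivate")


def missing_scripts(scripts_dir, expected=EXPECTED_SCRIPTS):
    """Return the expected file names that are not present in scripts_dir."""
    scripts_dir = Path(scripts_dir)
    return [name for name in expected if not (scripts_dir / name).is_file()]
```

For example, missing_scripts(r"D:\Anaconda3\Scripts") returns an empty list when all four files are still there (the path is just an example).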

Chris_Palmer · November 11, 2017, 6:45pm

Oh well, that’s not too bad - so far I have seen these as used to link Keras to either Theano or Tensorflow, but in any case it should all be well documented in articles on the internet!

Chris_Palmer · November 25, 2017, 7:02pm

In case anyone gets stuck getting OpenCV to work, with messages like this

ModuleNotFoundError: No module named 'cv2'

or like this

ImportError: libSM.so.6: cannot open shared object file: No such file or directory

I am adding here advice I got from another post, that you may have to update some dependencies:

sudo apt-get install libsm6 libxrender1 libfontconfig1

I am not sure how relevant this is, since this post is about setting up your own installation (especially on Windows 10), but advice from another user is to use the provided fastai environment. It’s not clear from the instructions here how to work with the fastai environment…

conda env create -f environment.yml
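
To see which of the two errors above you are hitting, a minimal sketch like the following tries each import and prints the message. The helper name check_imports is made up for illustration, and the module list is just an example.

```python
import importlib


def check_imports(names):
    """Try to import each module; map its name to None on success or the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as err:  # also covers ModuleNotFoundError
            results[name] = str(err)
    return results


for name, error in check_imports(["cv2", "PIL"]).items():
    print(f"{name}: {'ok' if error is None else error}")
```

A "No module named" message means the package itself is not installed in the active environment, while a "cannot open shared object file" message points at a missing system library such as the ones in the apt-get line above.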

bsalita · December 20, 2017, 8:53pm

Seems that the original post can no longer be edited (60 day limit?). I’ve attached a PDF of a run of Lesson 1 to show that Windows + GPU is working. There are some issues towards the end, but they’re the same issues reported on non-Windows systems.

fastai Lesson 1 Jupyter Notebook.pdf (2.4 MB)

As of this time, I see no Windows specific issues, at least not with Lesson 1.

Chris_Palmer · December 27, 2017, 6:58am

Hi Robert

I am getting the following error when I call learn.fit.

Did you make a change to the fastai library to overcome this? (I updated to the latest version before running this today.)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-38-688cebf597d5> in <module>()
----> 1 learn.fit(0.01, 3)

D:\FASTAI\fastai\fastai\learner.py in fit(self, lrs, n_cycle, wds, **kwargs)
    211         self.sched = None
    212         layer_opt = self.get_layer_opt(lrs, wds)
--> 213         self.fit_gen(self.model, self.data, layer_opt, n_cycle, **kwargs)
    214 
    215     def lr_find(self, start_lr=1e-5, end_lr=10, wds=None):

D:\FASTAI\fastai\fastai\learner.py in fit_gen(self, model, data, layer_opt, n_cycle, cycle_len, cycle_mult, cycle_save_name, metrics, callbacks, use_wd_sched, norm_wds, wds_sched_mult, **kwargs)
    158         n_epoch = sum_geom(cycle_len if cycle_len else 1, cycle_mult, n_cycle)
    159         fit(model, data, n_epoch, layer_opt.opt, self.crit,
--> 160             metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, **kwargs)
    161 
    162     def get_layer_groups(self): return self.models.get_layer_groups()

D:\FASTAI\fastai\fastai\model.py in fit(model, data, epochs, opt, crit, metrics, callbacks, **kwargs)
     84             batch_num += 1
     85             for cb in callbacks: cb.on_batch_begin()
---> 86             loss = stepper.step(V(x),V(y))
     87             avg_loss = avg_loss * avg_mom + loss * (1-avg_mom)
     88             debias_loss = avg_loss / (1 - avg_mom**batch_num)

D:\FASTAI\fastai\fastai\model.py in step(self, xs, y)
     41         if isinstance(output,(tuple,list)): output,*xtra = output
     42         self.opt.zero_grad()
---> 43         loss = raw_loss = self.crit(output, y)
     44         if self.reg_fn: loss = self.reg_fn(output, xtra, raw_loss)
     45         loss.backward()

D:\Anaconda3\lib\site-packages\torch\nn\functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce)
   1047         weight = Variable(weight)
   1048     if dim == 2:
-> 1049         return torch._C._nn.nll_loss(input, target, weight, size_average, ignore_index, reduce)
   1050     elif dim == 4:
   1051         return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)

RuntimeError: Expected object of type Variable[torch.cuda.LongTensor] but found type Variable[torch.cuda.IntTensor] for argument #1 'target'

bsalita · December 27, 2017, 6:02pm

@Chris_Palmer, I have a couple of patches for that issue. It’s possible that it’s unique to Windows, for any of a variety of reasons. The patches will make lesson1 run until you hit the ValueError seen on all platforms (probably a fastai library error). It appears to me that more patches are needed, as the problematic pattern occurs elsewhere in the model.py file. I’m guessing this is an issue with Python because it lacks static type checking. TypePython anyone? I’ll open an issue/PR on fastai’s GitHub for the patches (done - https://github.com/fastai/fastai/issues/71).

In fastai/fastai-master/fastai/model.py:

Original: loss = raw_loss = self.crit(output, y)
Fix: loss = raw_loss = self.crit(output, y.long()) # patch

Original: return preds, self.crit(preds,y)
Fix: return preds, self.crit(preds,y.long()) # patch

After applying the patches, if you see this error “ValueError: Found input variables with inconsistent numbers of samples: [2000, 5]” you’ve reached the same error as other platforms (https://github.com/fastai/fastai/issues/70). Move on to something else until fastai fixes it.
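
If you would rather not edit model.py, the same cast can be sketched as a wrapper around the loss function. This assumes the learner’s crit attribute (the self.crit seen in the traceback) can be reassigned from the notebook, which I have not verified; the name with_long_targets is made up for illustration.

```python
def with_long_targets(crit):
    """Wrap a loss function so the target is cast with .long() before each call."""
    def wrapped(output, target, *args, **kwargs):
        # .long() converts an IntTensor target to the LongTensor that nll_loss expects
        return crit(output, target.long(), *args, **kwargs)
    return wrapped
```

Usage would be something like learn.crit = with_long_targets(learn.crit) before calling learn.fit, again assuming that assignment is picked up by the training loop.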