Unofficial pytorch 0.4 support

jeremy · May 13, 2018, 10:33pm

Thanks to @sgugger for fixing the last issues in AWD LSTM, you should now find that all of parts 1 and 2 of the course run fine under pytorch 0.4 (the new version that was just released). There’s no need to upgrade however, and we’re not updating environment.yml to push the new version - but if you have to upgrade for some other reason, things should work fine.

(If they don’t, feel free to ask for help here, but I can’t promise to make fixing it a priority, since I’m not considering 0.4 “officially supported” yet. Also, PRs to fix 0.4 issues are welcome, but be sure they’re tested in 0.3 first.)

sgugger · May 13, 2018, 10:56pm

Let’s hope I haven’t broken the library again

TheShadow29 · May 20, 2018, 8:11pm

@jeremy I tried with pytorch 0.4, and the line learn = ConvLearner.pretrained(arch, data, precompute=True) always gives a CUDA out-of-memory error. I tried this multiple times and made sure the GPU was completely free. The GPU is a 1080 Ti.

Unfortunately, I haven’t figured out the bug yet. I will report back if I find something.

willismar · July 24, 2018, 9:16pm

That’s very strange, @TheShadow29. It ran normally on my old GTX 690 (2x2 GB RAM).

TheShadow29 · July 25, 2018, 3:40am

I haven’t checked recently. Maybe some changes have fixed this since. I have stuck with version 0.3.1 for the time being.

Pomo · August 25, 2018, 6:39am

Please help me install Pytorch 0.4.1 correctly

Hello coders. I would like to experiment with the pure Pytorch version of the movielens example (from Lesson 5), found here:

This notebook requires Pytorch 0.4, and I read that the course notebooks are now compatible, so I attempted to install Pytorch 0.4.1, using my limited understanding of Linux. After a couple of false starts that utterly failed, I think I managed to revert back to the original fastai conda environment.

From there I edited environment.yml, the line ‘- pytorch<0.4’ into ‘- pytorch<0.5’. “conda env update” then seemed to download and install Pytorch 0.4.1.

However, now I get a Python error:
`ImportError Traceback (most recent call last)
in <module>()
----> 1 import torch

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/__init__.py in <module>()
78 pass
79
---> 80 from torch._C import *
81
82 __all__ += [name for name in dir(_C)

ImportError: libcudart.so.9.0: cannot open shared object file: No such file or directory`

FYI:
/usr/local/cuda/lib64 contains links to
libcudart.so
libcudart.so.9.2
libcudart.so.9.2.148
But not for libcudart.so.9.0.
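
The same check can be scripted. This is only a minimal sketch using the Python standard library; the helper names `cudart_versions` and `has_cudart` are made up for illustration, and the directory is the one listed above.

```python
import glob
import os


def cudart_versions(lib_dir):
    """Return the sorted libcudart file names found in lib_dir."""
    pattern = os.path.join(lib_dir, "libcudart.so*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def has_cudart(lib_dir, version):
    """True if libcudart.so.<version> exists in lib_dir (e.g. version='9.0')."""
    return os.path.exists(os.path.join(lib_dir, "libcudart.so." + version))


if __name__ == "__main__":
    lib_dir = "/usr/local/cuda/lib64"  # directory named in this post
    print(cudart_versions(lib_dir))
    print("9.0 present:", has_cudart(lib_dir, "9.0"))
```

It only looks at one directory, so it says nothing about libraries that the dynamic loader might find elsewhere.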

Further adventures with Stackoverflow suggested:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64 && sudo ldconfig
which did not fix the problem and might have created some new ones.

Certainly, “A little knowledge is a dangerous thing”, but I trust this situation can be fixed. I also have a full record of the Terminal session that got me into it, in case that would be helpful.

Will some kind expert please show me how to fix what’s wrong here and how to correctly install Pytorch 0.4?

Thanks so much!

Malcolm

Ubuntu 16.04 LTS. i5/nvidia 1070.

RogerS49 · August 28, 2018, 8:25am

Hi

I had some issues running the test suite for fastai (http://forums.fast.ai/t/error-running-pytest/21476). I managed to get the tests to run, but my method is perhaps not what is wanted. Using the PYTHONPATH environment variable could make it hard to keep a view consistent with the source activate environment. I took my paths from a Python session by importing sys and printing sys.path, then quit the session. This gave me the main Anaconda environment paths, not those of the activated environment, so I edited them by adding envs/fastai-cpu in the right places. There are probably more paths in that group than are required. Do you have any suggestions for how to get the right paths automatically? Thanks.
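
One possible way to avoid hand-editing the paths, sketched here on the assumption that the script is run by the activated environment's own interpreter (for example after source activate fastai-cpu), is to let that interpreter report its own paths. The helper name `pythonpath_for_current_env` is made up for illustration.

```python
import os
import sys


def pythonpath_for_current_env():
    """Return a PYTHONPATH-style string built from this interpreter's sys.path.

    Empty entries (the placeholder for the current directory) are skipped.
    """
    return os.pathsep.join(p for p in sys.path if p)


if __name__ == "__main__":
    # sys.prefix is the environment root (e.g. .../envs/fastai-cpu) only
    # when this script is run by that environment's own python.
    print("interpreter:", sys.executable)
    print("environment:", sys.prefix)
    print("PYTHONPATH :", pythonpath_for_current_env())
```

If the printed interpreter is the base Anaconda python rather than the one under envs/, the environment was not active when the script ran.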

borowis · September 23, 2018, 9:42pm

Hi, I think I hit a minor issue that might be related to pytorch 0.4. The thread is “Lesson 4 IMDB Test Part Fails”, but basically it seems that n[0].unsqueeze(0) on a single number does not add a dimension.

wdhorton · September 24, 2018, 12:41am

If you still haven’t solved it, try this: conda install pytorch torchvision cuda92 -c pytorch. It seems like the version you have is looking for CUDA 9.0 and not 9.2.

Pomo · September 24, 2018, 6:42am

Hi William. I really appreciate your responding to my problem. I did get PyTorch 0.4 set up and working in a conda environment. However, it ran slower, so I went back to the current fastai environment. See the post “Using PyTorch 0.4 and resultant slowdown” for details.

Trying your suggested command updates PyTorch, torchvision, and cuda. But afterward torch.cuda.is_available() returns False. Ugh - more problems.

I have concluded that fixing these kinds of configuration problems is an irrelevant distraction from learning the course and not worth the time to master. So I’m just going to stay in the standard configuration and try not to break anything!

willismar · September 25, 2018, 5:07pm

Hi @Pomo, sorry, I only just saw your messages. I see you are having problems configuring PyTorch with CUDA.
To be honest, the CUDA packages in Anaconda/Miniconda are tightly constrained (depending on which channel you install them from), and this can cause conflicts between Python packages that leave PyTorch with the wrong idea of what it actually needs. That leaves two options:
1- stay on the exact environment Professor Howard set up for the class
2- build your own environment for your CUDA version and packages, and deal with PyTorch’s requirements yourself.

I personally found this tedious, so I decided to build my own containers using plain Python 3.6, installing my own CUDA version and building my own PyTorch. I am now using PyTorch 1.0.0 and CUDA 10 with the earlier fastai classes and it works fine, with just minor changes as I run into differences between the old and new APIs. If you have some time and want directions for doing the same, please contact me. Maybe I can help you solve your CUDA problems; otherwise, as you said earlier, stick with the version that is available and stable for the class. I am an explorer by nature, so for me this kind of thing is fun and gives me an opportunity to learn more.

Pomo · September 26, 2018, 6:11am

Olá Willismar,

Thanks for responding to my situation. I appreciate that you enjoy exploring the “bleeding edge” of PyTorch development. And it is nice to hear that the anaconda installation has real problems… so I don’t feel so incompetent.

I think that v. 1.0.0 is too large a jump for me to take, with its instabilities and requiring revisions to the lessons. However, I’ve found some code examples from papers and posts that require v. 0.4.1, the latest stable release. Would you be willing to help me set up a conda environment for running PyTorch 0.4.1? I would like to start directly from a clone of the standard fastai environment in order also to experiment with the lesson notebooks.

If this is too much to ask, or not fun for you to explore, no problem!

Tchau,
Malcolm

willismar · September 26, 2018, 12:41pm

Sure … no problem !

I will contact you on inbox.

willismar · September 26, 2018, 4:37pm

Hello again @Pomo, it seems our schedules are not in sync, or you are very busy, so I went ahead and configured a machine with PyTorch 0.4.1 so you can see how it goes:

1- First, cloning the repository:

git clone https://github.com/fastai/fastai
cd fastai
conda env update
exec bash   # Linux only
conda activate fastai

2- Then I looked for the packages needed to upgrade pytorch to 0.4.1

defaults channel

The main channels have all these versions of pytorch and more, but here I am showing just the ones you are interested in.

Note: the defaults channel can be omitted from the command.

conda search pytorch -c defaults
pytorch                   0.4.1  py27ha74772b_0  pkgs/main
pytorch                   0.4.1  py35ha74772b_0  pkgs/main
pytorch                   0.4.1  py36ha74772b_0  pkgs/main
pytorch                   0.4.1  py37ha74772b_0  pkgs/main

3- Then I looked in other channels too:

anaconda channel

conda search pytorch -c anaconda
pytorch                   0.4.1  py27ha74772b_0  anaconda
pytorch                   0.4.1  py35ha74772b_0  anaconda
pytorch                   0.4.1  py36ha74772b_0  anaconda
pytorch                   0.4.1  py37ha74772b_0  anaconda

pytorch channel

conda search pytorch -c pytorch
pytorch                   0.4.1 py27__9.0.176_7.1.2_2                 pytorch
pytorch                   0.4.1 py27_cuda8.0.61_cudnn7.1.2_1          pytorch
pytorch                   0.4.1 py27_cuda9.0.176_cudnn7.1.2_1         pytorch
pytorch                   0.4.1 py27_cuda9.2.148_cudnn7.1.4_1         pytorch
pytorch                   0.4.1 py35_cuda8.0.61_cudnn7.1.2_1          pytorch
pytorch                   0.4.1 py35_cuda9.0.176_cudnn7.1.2_1         pytorch
pytorch                   0.4.1 py35_cuda9.2.148_cudnn7.1.4_1         pytorch
pytorch                   0.4.1 py35_py27__9.0.176_7.1.2_2            pytorch
pytorch                   0.4.1 py36_cuda8.0.61_cudnn7.1.2_1          pytorch
pytorch                   0.4.1 py36_cuda9.0.176_cudnn7.1.2_1         pytorch
pytorch                   0.4.1 py36_cuda9.2.148_cudnn7.1.4_1         pytorch
pytorch                   0.4.1 py36_py35_py27__9.0.176_7.1.2_2       pytorch
pytorch                   0.4.1 py37_cuda8.0.61_cudnn7.1.2_1          pytorch
pytorch                   0.4.1 py37_cuda9.0.176_cudnn7.1.2_1         pytorch
pytorch                   0.4.1 py37_cuda9.2.148_cudnn7.1.4_1         pytorch
pytorch                   0.4.1 py37_py36_py35_py27__9.0.176_7.1.2_2  pytorch

4- Then I also decided to look for cudnn at the defaults and anaconda channels

anaconda and defaults channels

conda search cudnn -c anaconda
Loading channels: done
# Name                  Version           Build  Channel             
cudnn                       5.1               0  anaconda            
cudnn                       5.1               0  pkgs/free           
cudnn                    5.1.10       cuda7.5_0  anaconda            
cudnn                    5.1.10       cuda7.5_0  pkgs/free           
cudnn                    5.1.10       cuda8.0_0  anaconda            
cudnn                    5.1.10       cuda8.0_0  pkgs/free           
cudnn                       6.0               0  anaconda            
cudnn                       6.0               0  pkgs/free           
cudnn                    6.0.21       cuda7.5_0  anaconda            
cudnn                    6.0.21       cuda7.5_0  pkgs/free           
cudnn                    6.0.21       cuda8.0_0  anaconda            
cudnn                    6.0.21       cuda8.0_0  pkgs/free           
cudnn                     7.0.5       cuda8.0_0  anaconda            
cudnn                     7.0.5       cuda8.0_0  pkgs/main           
cudnn                     7.1.2       cuda9.0_0  anaconda            
cudnn                     7.1.2       cuda9.0_0  pkgs/main           
cudnn                     7.1.3       cuda8.0_0  anaconda            
cudnn                     7.1.3       cuda8.0_0  pkgs/main           
cudnn                     7.2.1       cuda9.2_0  anaconda            
cudnn                     7.2.1       cuda9.2_0  pkgs/main

5- Next I decided to install cudnn 7.2.1 and I got this installation plan

As you can see, it comes with cudnn 7.2.1 built for cuda 9.2. I accepted and it installed!

conda install cudnn cudnn==7.2.1
## Package Plan ##

  environment location: /home/ubuntu/miniconda3/envs/fastai

  added / updated specs: 
    - cudnn


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.0.2p             |       h14c3975_0         3.5 MB  anaconda
    ca-certificates-2018.03.07 |                0         124 KB  anaconda
    cudnn-7.2.1                |        cuda9.2_0       322.8 MB  anaconda
    certifi-2018.8.24          |           py36_1         140 KB  anaconda
    ------------------------------------------------------------
                                           Total:       326.5 MB

The following packages will be UPDATED:

    ca-certificates: 2018.03.07-0      --> 2018.03.07-0      anaconda
    certifi:         2018.8.24-py36_1  --> 2018.8.24-py36_1  anaconda
    cudnn:           7.2.1-cuda9.2_0   --> 7.2.1-cuda9.2_0   anaconda
    openssl:         1.0.2p-h14c3975_0 --> 1.0.2p-h14c3975_0 anaconda

Proceed ([y]/n)? y


Downloading and Extracting Packages
openssl-1.0.2p       |  3.5 MB | ################################### | 100% 
ca-certificates-2018 |  124 KB | ################################### | 100% 
cudnn-7.2.1          | 322.8 MB | ################################## | 100% 
certifi-2018.8.24    |  140 KB | ################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

6- Next I tried to install pytorch for cudnn 7.2.1

But if you remember, no pytorch build has been released for cuda 9.2 with cudnn 7.2, so was the previous step useless? Yes…
When you ask conda to install pytorch, it looks in the channels for the best available build it can install, and it will also downgrade the cuda and cudnn packages in your anaconda/miniconda environment to match. Let’s see:
I executed the line below and got the following installation plan:

conda install pytorch pytorch
## Package Plan ##

  environment location: /home/ubuntu/miniconda3/envs/fastai

  added / updated specs: 
    - pytorch


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pytorch-0.4.1              |   py36ha74772b_0       215.8 MB
    openssl-1.0.2p             |       h14c3975_0         3.5 MB
    cudnn-7.1.2                |        cuda9.0_0       367.8 MB
    ------------------------------------------------------------
                                           Total:       587.0 MB

The following NEW packages will be INSTALLED:

    nccl:            1.3.5-cuda9.0_0                               

The following packages will be UPDATED:

    ca-certificates: 2018.03.07-0                          anaconda --> 2018.03.07-0        
    certifi:         2018.8.24-py36_1                      anaconda --> 2018.8.24-py36_1    
    openssl:         1.0.2p-h14c3975_0                     anaconda --> 1.0.2p-h14c3975_0   

The following packages will be DOWNGRADED:

    cudatoolkit:     9.2-0                                          --> 9.0-h13b8566_0      
    cudnn:           7.2.1-cuda9.2_0                       anaconda --> 7.1.2-cuda9.0_0     
    pytorch:         0.4.1-py36_py35_py27__9.0.176_7.1.2_2 pytorch  --> 0.4.1-py36ha74772b_0

Proceed ([y]/n)? y


Downloading and Extracting Packages
pytorch-0.4.1        | 215.8 MB | ################################## | 100% 
openssl-1.0.2p       |  3.5 MB | ################################### | 100% 
cudnn-7.1.2          | 367.8 MB | ################################## | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

As you can see, it downgraded cuda and cudnn. This is what I meant earlier about conda constraining your software or project. On top of that, if you have an external version of cuda on your machine and conda happens to find it, conda may or may not use that information when choosing the pytorch build for your environment, which may or may not break things along the way.

That’s why I use containers or cloud VMs: to have an environment isolated from contamination by unwanted packages.
Containers cannot do PCI passthrough on Windows, so you cannot create a Docker container there and pass your GPU to it. I am not saying it is impossible, but at least it is not as easy as clicking a button. I use Linux, so this comes naturally.

Let me know if you can follow this tutorial to set up your project with pytorch 0.4.1 for cuda 9.0 and cudnn 7.1.2.
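
The build strings in the conda search listings above encode the Python, cuda and cudnn versions each package was built for. As a rough sketch (the function name `parse_build` is made up, and only the pyXY, cudaA.B.C and cudnnA.B.C markers are recognised, so builds such as py36_py35_py27__9.0.176_7.1.2_2 come back without cuda or cudnn information), they can be picked apart like this:

```python
import re


def parse_build(build):
    """Extract python, cuda and cudnn versions from a conda build string.

    Parts that are not present in the string are returned as None.
    """
    py = re.match(r"py(\d)(\d+)", build)
    cuda = re.search(r"cuda(\d+(?:\.\d+)*)", build)
    cudnn = re.search(r"cudnn(\d+(?:\.\d+)*)", build)
    return {
        "python": f"{py.group(1)}.{py.group(2)}" if py else None,
        "cuda": cuda.group(1) if cuda else None,
        "cudnn": cudnn.group(1) if cudnn else None,
    }


if __name__ == "__main__":
    # Two build strings copied from the listings in this post.
    for build in ("py36_cuda9.2.148_cudnn7.1.4_1", "py36ha74772b_0"):
        print(build, "->", parse_build(build))
```

This makes it easier to spot, before installing, that a build such as py36ha74772b_0 from the defaults channel does not name its cuda version at all, while the pytorch channel builds do.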

borowis · September 27, 2018, 8:23pm

I solved it like this:

res,*_ = m(torch.tensor([n[0]]).unsqueeze(0).cuda())

Please let me know if this expression can be simplified as I’m completely new to pytorch.

Also, (not yet released) torchtext 0.3 is required for pytorch 0.4

willismar · September 28, 2018, 4:36pm

Hi @Pomo, you may find this interesting

conda search pytorch -c pytorch | grep cuda9.2
conda search cuda92 -c pytorch
conda install cuda92 -c pytorch

Then your pytorch will be upgraded to version 0.4.1, which uses cuda 9.2 and cudnn 7.1.4.

If not, you can run:

conda install pytorch=0.4.1=py36_cuda9.2.148_cudnn7.1.4_1

ninja16180 · May 12, 2020, 12:08pm

Hi there!

Thanks for starting this pytorch support thread.
I am new to forums.fast.ai and came here looking for a solution to an issue I am facing while training my model with the pytorch framework.

I am going to post my issue in detail below. I am not sure whether this is the right forum for it; pardon my ignorance, and I would appreciate it if you could kindly redirect me to the appropriate forum otherwise:

I am building a transformer model using pytorch, with a custom embedding layer made of two pre-trained embeddings: fasttext and glove.
Both pretrained embedding matrices have dimension 300:
crawl-300d-2M.vec
glove.6B.300d.txt

But I want to limit the dimension of my custom embedding to 256.

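I wrote this function to limit the embedding dimension to 256:

import numpy as np

def load_embedding(embedding_file):
    def get_coefs(word, *arr):
        # keep only the first 256 values of each vector
        return word, np.asarray(arr, dtype='float32')[:256]

    # lines of 100 characters or fewer (e.g. a header line) are skipped;
    # rstrip() drops the trailing newline before splitting
    with open(embedding_file, encoding="utf8", errors='ignore') as f:
        embeddings_index = dict(get_coefs(*o.rstrip().split(" ")) for o in f if len(o) > 100)

    return embeddings_index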

glove_file = '/glove.6B.300d.txt'
fasttext_file = '/crawl-300d-2M.vec'

glove_embeddings_index = load_embedding(glove_file)

fasttext_embeddings_index = load_embedding(fasttext_file)

# custom embedding

# creating a placeholder embedding matrix first

all_embs = np.stack(list(fasttext_embeddings_index.values()))  # using fasttext embedding as base
emb_mean,emb_std = all_embs.mean(), all_embs.std()
embed_size = all_embs.shape[1]

nb_words = len(word_index)
embedding_matrix = np.random.normal(emb_mean, emb_std, (nb_words, embed_size))

print(embedding_matrix.shape[1]) #output: 256

# custom embedding creation: a dictionary mapping each word to its word vector

cust_embedding = {}

for word, indx in word_index.items():
    if indx < nb_words:
        embedding_vector = fasttext_embeddings_index.get(word)
        if embedding_vector is None:
            embedding_vector = glove_embeddings_index.get(word)
        if embedding_vector is None:
            embedding_vector = embedding_matrix[indx]

        cust_embedding[word] = embedding_vector

# saving custom embedding in .txt file which will be later used during preprocessing using torchtext

with open('/custom_embeddings.txt', 'w+') as f:
    for token, vector in cust_embedding.items():
        vector_str = ' '.join([str(v) for v in vector])
        f.write(f'{token} {vector_str}\n')
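
Before handing the file to torchtext, the dimension actually written can be checked directly. This is only a sketch using the standard library; the helper name `vector_dims` is made up, and it assumes one token followed by space-separated values per line, with no spaces inside tokens.

```python
def vector_dims(path, encoding="utf8"):
    """Return the set of vector lengths found in a 'token v1 v2 ...' text file."""
    dims = set()
    with open(path, encoding=encoding) as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) > 1:            # skip blank or token-only lines
                dims.add(len(parts) - 1)  # first field is the token
    return dims

# Example call (path as used in this post):
# vector_dims('/custom_embeddings.txt')
```

If the returned set contains anything other than the intended size, the mismatch is already present in the file rather than introduced later.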

Next I am trying to create a torchtext.vocab.Vectors object

import torchtext.vocab as vocab

custom_embeddings = vocab.Vectors(name='/custom_embeddings.txt', max_size=256)

Here, using the max_size argument throws an error:

TypeError: __init__() got an unexpected keyword argument 'max_size'

But I checked the torchtext.vocab.Vectors documentation, where I could see that this max_size argument is present:
class torchtext.vocab.Vocab(counter, max_size=None, min_freq=1, specials=['<pad>'], vectors=None, unk_init=None, vectors_cache=None, specials_first=True)

And I need to set the size of my custom embedding to 256, or else I get a runtime error later when training my model.

Code snippet of vocabulary building for encoder (ENC) and decoder (DEC) input using the custom embedding:

ENC_TEXT.build_vocab(train_data, vectors = custom_embeddings)
DEC_TEXT.build_vocab(train_data, vectors = custom_embeddings)

model.embedding.weight.data.copy_(ENC_TEXT.vocab.vectors)

model.embedding.weight.data.copy_(DEC_TEXT.vocab.vectors)

Below are the parameter settings and the runtime error I receive while training the model if I do not change the custom embedding dimension to 256:

INPUT_DIM = len(ENC_TEXT.vocab)
OUTPUT_DIM = len(DEC_TEXT.vocab)
HIDDEN_DIM = 256  # size of each pretrained word vector in the embedding matrix, i.e. size[1] of the embedding matrix
ENC_LAYERS = 3
DEC_LAYERS = 3
ENC_HEADS = 10
DEC_HEADS = 10
ENC_PF_DIM = 512
DEC_PF_DIM = 512
ENC_DROPOUT = 0.1
DEC_DROPOUT = 0.1

enc = Encoder(INPUT_DIM,
              HIDDEN_DIM,
              ENC_LAYERS,
              ENC_HEADS,
              ENC_PF_DIM,
              ENC_DROPOUT,
              device)

dec = Decoder(OUTPUT_DIM,
              HIDDEN_DIM,
              DEC_LAYERS,
              DEC_HEADS,
              DEC_PF_DIM,
              DEC_DROPOUT,
              device)

RuntimeError Traceback (most recent call last)
in <module>()
20 ENC_PF_DIM,
21 ENC_DROPOUT,
---> 22 device)
23
24 dec = Decoder(OUTPUT_DIM,

in __init__(self, input_dim, hid_dim, n_layers, n_heads, pf_dim, dropout, device, max_length)
18
19 # step added for custom embedding
---> 20 self.tok_embedding.weight.data.copy_(SRC.vocab.vectors)
21
22 self.pos_embedding = nn.Embedding(max_length, hid_dim)

RuntimeError: The size of tensor a (256) must match the size of tensor b (300) at non-singleton dimension 1

The main issue I am facing is this runtime error about a size mismatch between tensors.
Looking into the cause, I found that the size of the custom embedding ends up fixed at 300 (instead of 256).

I would really appreciate your help in resolving this issue.