File missing in 6B.50d compressed file

After untarring 6B.50d, I cannot find glove.6B.50d.txt.
The only files are 6B.50d_idx.pkl, 6B.50d_words.pkl, and 6B.50d.dat.

Kindly advise.


Are you trying to preprocess the 6B.50d.tgz file from http://files.fast.ai/models/glove/6B.50d.tgz?
Those are already preprocessed and ready to load using the load_glove function!

If you want to preprocess the original file you can download it here: https://nlp.stanford.edu/projects/glove/. I think the 50d, 100d, 200d and the 300d models come as a single file (~822MB).

If you just want to see what the file looks like, you can check out the first 100 lines here: https://github.com/davidBelanger/torch-util/blob/master/glove.6B.50d.txt.first100
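If you do go the route of preprocessing the original text file yourself, here is a rough sketch of the parsing step. It assumes the standard GloVe text layout (one word per line followed by its space-separated vector components). The function name `parse_glove` and the sample values are made up for illustration; this is not the course's own preprocessing code, and it does not reproduce the exact .pkl/.dat files in the tarball.

```python
import io

def parse_glove(lines):
    """Parse GloVe-style text lines into (vectors, words, word_to_idx).

    Hypothetical helper: assumes each line is a word followed by
    space-separated floats.
    """
    words, vectors = [], []
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])
    # Map each word to its row position in `vectors`.
    word_to_idx = {w: i for i, w in enumerate(words)}
    return vectors, words, word_to_idx

# Tiny made-up sample standing in for the real file. With the real data
# you would pass an open file handle instead, e.g.
#   with open("glove.6B.50d.txt", encoding="utf-8") as f: parse_glove(f)
sample = io.StringIO("the 0.1 -0.2 0.3\ncat 0.4 0.5 -0.6\n")
vectors, words, word_to_idx = parse_glove(sample)
```

From there, `words` and `word_to_idx` could be pickled and `vectors` stored as an array, which is presumably roughly what the three preprocessed files contain.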


The wordvectors workbook requires this.

Given that the wordvectors workbook has an error (see another thread on this: a newline character on line 51 needs to be deleted), that forum discussion of it is limited, and that the git repo has not been fixed, I see this as a minor, non-essential exercise.

Pity. I usually like to explore the extras to understand the problem fully.

Kind of surprised there has not been more discussion of this.
