After untarring 6B.50d, I do not find glove.6B.50d.txt.
The only files are 6B.50d_idx.pkl, 6B.50d_words.pkl and 6B.50d.dat.
Kindly advise.
Are you trying to preprocess the 6B.50d.tgz file from http://files.fast.ai/models/glove/6B.50d.tgz?
Those files are already preprocessed and ready to load using the load_glove function!
If you want to preprocess the original file you can download it here: GloVe: Global Vectors for Word Representation. I think the 50d, 100d, 200d and the 300d models come as a single file (~822MB).
If you just want to see what the file looks like, you can check out the first 100 lines here: torch-util/glove.6B.50d.txt.first100 at master · davidBelanger/torch-util · GitHub
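For anyone who wants to start from the original text file: each line is a word followed by its space-separated vector components. The sketch below parses that format into a vector array, a word list and a word-to-index dict. Judging only by the file names, this is roughly what 6B.50d.dat, 6B.50d_words.pkl and 6B.50d_idx.pkl hold, but that is an assumption, and `parse_glove_text` is a hypothetical helper, not the course's load_glove.

```python
import io
import numpy as np

def parse_glove_text(lines):
    """Hypothetical helper: parse GloVe text lines of the form
    'word v1 v2 ... vN' into (vectors, words, word2idx)."""
    words = []
    vectors = []
    for line in lines:
        parts = line.rstrip('\r\n').split(' ')
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])
    vecs = np.array(vectors, dtype=np.float32)       # shape: (n_words, dim)
    word2idx = {w: i for i, w in enumerate(words)}   # word -> row in vecs
    return vecs, words, word2idx

# Tiny in-memory sample in the same format as glove.6B.50d.txt (3 dims instead of 50)
sample = io.StringIO("the 0.1 0.2 0.3\n, -0.5 0.0 0.25\n")
vecs, words, idx = parse_glove_text(sample)
print(vecs.shape)  # (2, 3)
print(idx[','])    # 1
```

On the real file you would pass an open handle instead, e.g. `with open('glove.6B.50d.txt', encoding='utf-8') as f: vecs, words, idx = parse_glove_text(f)`, and then save the results however you like (pickle for the list and dict, `np.save` for the array).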
The wordvectors workbook requires this.
Given that the wordvectors workbook has an error (see another thread on this: a newline character on line 51 needs to be deleted), and given the limited forum discussion on this (plus the git repo not being fixed), I see this as a minor, non-essential exercise.
Pity. I usually like to explore the extras to understand the problem fully.
Kind of surprised there has not been more discussion on this.