This all depends on what it is that you are trying to achieve. The approach will differ depending on whether you are trying to build a language model, do sentiment analysis, etc.
I have not done anything with language translation, so it is hard for me to provide specifics. I think there might be something on translation in part 2 v2 of the course, and IIRC there was also a lecture on it in part 2 v1, somewhere towards the beginning of the second half.
If vocabulary size is a problem, maybe you can limit it by considering all words with frequency below some threshold n (say 10 or 50) to be a single unknown token (e.g. `<unk>`).
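A minimal sketch of that idea in plain Python (the function names and the choice of `<unk>` as the placeholder token are my own, not anything from the course):

```python
from collections import Counter

def build_vocab(tokenized_sentences, min_freq=10, unk_token="<unk>"):
    """Keep only tokens seen at least min_freq times;
    everything rarer maps to a single unknown token."""
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    vocab = {unk_token: 0}
    for tok, n in counts.most_common():
        if n >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

def numericalize(sentence, vocab, unk_token="<unk>"):
    """Map a tokenized sentence to integer ids, using the unk id for rare words."""
    unk_id = vocab[unk_token]
    return [vocab.get(tok, unk_id) for tok in sentence]
```

With `min_freq=2` on a tiny corpus, only the words appearing at least twice keep their own ids; everything else collapses to `<unk>`, which caps the embedding matrix size.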
I am not sure what the correct measures to take here would be, as I have never attempted to tackle translation, and I think the solutions would be quite task specific. I would probably start by searching for a walkthrough / Kaggle kernel that shows a simple translation example you could build on (mainly getting the data in and out using the technology of your choice).
If PyTorch is what you are planning on using, torchtext seems to include text/torchtext/datasets/translation.py, which might be a good starting point. Other than that, there is also a translation example here: https://github.com/pytorch/text/blob/master/test/translation.py that seems to demonstrate creating data iterators using the BucketIterator (very handy!) and building a vocabulary with a min word frequency and a capped max size, specifically for translation. This article is quite a nice overview of some of the torchtext functionality.
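The reason the BucketIterator is handy is that it batches examples of similar length together, so you waste far less compute on padding. A rough pure-Python sketch of the bucketing idea (this is my own illustration of the concept, not torchtext's actual implementation):

```python
import random

def bucket_batches(examples, batch_size, sort_key=len, pool_size=100):
    """Shuffle examples, sort within largish pools by length, then slice
    into batches -- so each batch contains similarly-sized sequences
    while batch order still varies between epochs."""
    pool = list(examples)
    random.shuffle(pool)
    batches = []
    for i in range(0, len(pool), pool_size):
        chunk = sorted(pool[i:i + pool_size], key=sort_key)
        for j in range(0, len(chunk), batch_size):
            batches.append(chunk[j:j + batch_size])
    random.shuffle(batches)
    return batches

def pad_batch(batch, pad_id=0):
    """Pad every sequence in a batch to the length of its longest member."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
```

Because each batch comes from a length-sorted slice, the longest and shortest sequence in a batch are close in length, so `pad_batch` adds very little padding compared to batching in random order.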