Fine-tuning a Model on a v3-32 TPU

Hi everyone,

Hope you are well. :slightly_smiling_face::wave:

I would like to find out if anyone has experience running a code base, specifically fine-tuning an LLM, on a v3-32 TPU. From my understanding, a v3-32 TPU has 4 hosts, and each host has 8 devices, so the entire TPU has a total of 32 devices. To run a model successfully across this TPU, the model and/or data needs to be sharded across the hosts, right?
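For what it's worth, here is a minimal sketch of how the sharding step might look in JAX (one common framework for multi-host TPU training). On a real v3-32 you would launch the same script on all 4 hosts and call `jax.distributed.initialize()` first, after which `jax.devices()` reports all 32 devices; the mesh shape and axis names below (`"data"`, `"model"`) are illustrative assumptions, and the sketch adapts to however many devices are available so it also runs on a single CPU:

```python
import jax
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# On a multi-host TPU slice you would first call
# jax.distributed.initialize() on every host. After that,
# jax.devices() returns the global device list (32 on a v3-32).
n = jax.device_count()

# Build a device mesh. On a v3-32 this could be reshaped to, e.g.,
# (4, 8) for a data axis over hosts and a model axis within hosts;
# here the shape simply adapts to the devices present.
devices = np.array(jax.devices()).reshape(n, 1)
mesh = Mesh(devices, axis_names=("data", "model"))

# Shard the training batch along the "data" axis and replicate it
# along the "model" axis (axis names are illustrative, not required).
batch = jax.device_put(
    np.zeros((32, 128), dtype=np.float32),
    NamedSharding(mesh, P("data", None)),
)
print(batch.sharding.spec)
```

Model weights would get their own `PartitionSpec` (e.g. `P(None, "model")` for a weight matrix split across devices within a host), and `jax.jit` then handles the cross-device communication automatically.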

If anyone has experience with this, or can point me to some useful resources, I would really appreciate it.
This is specifically a Google Cloud v3-32 TPU.

Many thanks in advance; much appreciated.

Kind Regards,
Zakia Salod