No, it is not exactly the same. Even though learning word embeddings is part of training an AWD-LSTM, the power of the language model comes from the ability to represent a whole sentence or text with the hidden state of the encoder. With plain word embeddings, the usual approach is to represent a sentence/text as a linear combination of its word embeddings (e.g. their average), which has far less representational power, not least because word order is lost.
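To make the difference concrete, here is a toy sketch (plain NumPy, not fastai code; the weights and the single tanh cell are stand-ins for a real AWD-LSTM): averaging embeddings ignores word order, while a recurrent encoder's final hidden state does not.

```python
import numpy as np

# Toy setup: a 3-word "sentence", one random embedding per word.
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 4))

# 1) Linear combination of word embeddings (here: the mean).
#    Permuting the words gives exactly the same vector.
mean_repr = emb.mean(axis=0)
assert np.allclose(mean_repr, emb[::-1].mean(axis=0))

# 2) A recurrent encoder (single tanh cell as a stand-in for an LSTM):
#    the final hidden state depends on the order of the words.
W_x = rng.normal(size=(4, 4))
W_h = rng.normal(size=(4, 4))

def encode(sequence):
    h = np.zeros(4)
    for x in sequence:          # process tokens left to right
        h = np.tanh(x @ W_x + h @ W_h)
    return h

# Reversing the sentence changes the hidden state but not the mean.
assert not np.allclose(encode(emb), encode(emb[::-1]))
```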
As to the second part of your question, you can swap the word tokenizer in the processor object for a char tokenizer (this will require some coding), but the training process remains the same. Although the original AWD-LSTM paper describes a word-level model, I don't see why it shouldn't work at the char level.
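The tokenizer swap itself is simple; here is a minimal sketch of what a char-level tokenizer could look like (the class and method names are hypothetical, not fastai's API — you would still need to adapt it to the processor's expected interface):

```python
# Hypothetical char-level tokenizer; adapt to your processor's interface.
class CharTokenizer:
    """Split text into characters instead of words."""

    def tokenize(self, text):
        # Replace spaces with a visible marker so they survive as tokens.
        return [c if c != " " else "<sp>" for c in text]

tok = CharTokenizer()
print(tok.tokenize("char lm"))
# ['c', 'h', 'a', 'r', '<sp>', 'l', 'm']
```

The vocabulary then shrinks from tens of thousands of words to a few dozen characters, while sequences get correspondingly longer, so you may want to adjust sequence length and batch size accordingly.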