Padding and masking issues with LSTM/Bi-LSTM in Keras

Hi, I’m really confused about the Masking layer/function in Keras.

  1. When we pad the input sequences and then feed them into the Embedding layer, does the choice between pre-padding and post-padding influence the result? If yes, how?
  2. After applying masking and then an LSTM layer with the `return_sequences` flag set to False, which time step does the returned hidden state come from? For example, if an input originally has 7 time steps and is padded to 10 with post-padding, is the result taken from time step 7 or time step 10?
  3. The same question as above, but for a bidirectional LSTM. I just found this post: which method should I choose for the Bi-LSTM, the direct Bi-LSTM API or the method in this post, or are they just the same?
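
To make the questions concrete, here is a toy NumPy sketch of what I would expect to happen. It is not Keras code: it uses a plain tanh RNN instead of an LSTM, and it assumes that masking simply skips all-zero time steps and carries the previous state forward. The helper names (`pad`, `rnn_last_state`) and the sizes are made up for illustration.

```python
import numpy as np

def pad(seq, maxlen, padding="post"):
    """Zero-pad a (steps, features) array to maxlen steps."""
    zeros = np.zeros((maxlen - len(seq), seq.shape[1]))
    return np.vstack([seq, zeros] if padding == "post" else [zeros, seq])

def rnn_last_state(x, W, U, b, use_mask=True):
    """Run a toy tanh RNN over x and return the final hidden state.

    Assumption: a masked (all-zero) step is skipped, so the state
    from the previous step is carried forward unchanged.
    """
    h = np.zeros(U.shape[0])
    for x_t in x:
        if use_mask and not x_t.any():
            continue  # masked step: leave the state as it is
        h = np.tanh(x_t @ W + h @ U + b)
    return h

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))    # input-to-hidden weights (hypothetical sizes)
U = rng.normal(size=(4, 4))    # hidden-to-hidden weights
b = rng.normal(size=4)         # bias
seq = rng.normal(size=(7, 3))  # 7 real time steps, 3 features

post = pad(seq, 10, "post")    # 7 real steps followed by 3 zero steps
pre = pad(seq, 10, "pre")      # 3 zero steps followed by 7 real steps

# Forward direction: reference is the unpadded sequence.
h_ref = rnn_last_state(seq, W, U, b)
h_post_masked = rnn_last_state(post, W, U, b, use_mask=True)
h_pre_masked = rnn_last_state(pre, W, U, b, use_mask=True)
h_post_unmasked = rnn_last_state(post, W, U, b, use_mask=False)

# Backward direction (as in a bidirectional RNN): read the steps in reverse.
# For simplicity the same weights are reused for both directions.
h_bwd_ref = rnn_last_state(seq[::-1], W, U, b)
h_bwd_post_masked = rnn_last_state(post[::-1], W, U, b, use_mask=True)

print("post-padded, masked == unpadded:  ", np.allclose(h_post_masked, h_ref))
print("pre-padded, masked == unpadded:   ", np.allclose(h_pre_masked, h_ref))
print("post-padded, unmasked == unpadded:", np.allclose(h_post_unmasked, h_ref))
print("backward, masked == unpadded:     ", np.allclose(h_bwd_post_masked, h_bwd_ref))
```

Under that assumption, the masked result for the post-padded input is the state at step 7, not step 10, and pre- versus post-padding makes no difference once the padding is masked. Whether the real Masking + LSTM/Bidirectional layers behave the same way is exactly what I would like to confirm.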

Does anyone have experience with this, or a toy example that illustrates it? Thanks a lot!