What is a `Bernoulli trial` and how is it used to implement dropout?

A Bernoulli trial is a random experiment with exactly two possible outcomes. You can think of it as a coin flip, with heads and tails representing 1 and 0, respectively. If prob(heads) = prob(tails) = 0.5, it’s a fair coin. Otherwise the coin is unfair: for example, if prob(heads) = 0.7, then prob(tails) = 1 - 0.7 = 0.3.
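
As a rough illustration of the coin flip described above, here is a minimal sketch using only Python's standard `random` module. The names `bernoulli_trial` and `p_heads` are made up for this example, and the seed is fixed only so that repeated runs give the same flips.

```python
import random

def bernoulli_trial(p_heads, rng):
    """Return 1 (heads) with probability p_heads, otherwise 0 (tails)."""
    # rng.random() is uniform on [0, 1), so it falls below p_heads
    # with probability p_heads.
    return 1 if rng.random() < p_heads else 0

rng = random.Random(0)  # fixed seed, for reproducibility only

# Flip an unfair coin with prob(heads) = 0.7 many times.
flips = [bernoulli_trial(0.7, rng) for _ in range(10_000)]
heads_fraction = sum(flips) / len(flips)
print(heads_fraction)  # should land close to 0.7
```

With enough flips, the fraction of heads should settle near the chosen probability, which is what makes the coin "unfair" in a controlled way.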

In the context of dropout, you essentially flip a coin (fair or unfair, depending on the chosen value of p, taken here to be prob(heads)) independently for each node to decide whether to drop it from the network. If the result is heads (1) you drop the node; if it’s tails (0) you leave the node in place.
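
The per-node coin flip can be sketched in plain Python as follows. This is only an illustration under stated assumptions: p is interpreted as the probability of dropping a node (heads = 1 = drop, matching the convention above), "dropping" is modeled as zeroing the node's activation, and `dropout_mask`, `apply_dropout`, and `p_drop` are hypothetical names chosen for this example.

```python
import random

def dropout_mask(num_nodes, p_drop, rng):
    """Run one Bernoulli trial per node: 1 means drop, 0 means keep."""
    return [1 if rng.random() < p_drop else 0 for _ in range(num_nodes)]

def apply_dropout(activations, p_drop, rng):
    """Zero out the activations of the nodes whose trial came up heads (1)."""
    mask = dropout_mask(len(activations), p_drop, rng)
    # A dropped node outputs 0; a kept node passes its value through unchanged.
    return [0.0 if dropped else a for a, dropped in zip(activations, mask)]

rng = random.Random(0)  # fixed seed, for reproducibility only
activations = [0.5, 1.2, -0.3, 2.0, 0.8]  # made-up example values
dropped_out = apply_dropout(activations, p_drop=0.5, rng=rng)
print(dropped_out)
```

A fresh mask would normally be drawn on every training pass, so different nodes are dropped each time. The sketch also leaves out the rescaling of kept activations by 1 / (1 - p) that many frameworks apply during training.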