You can think of a `Bernoulli trial` as a coin flip, with **heads** and **tails** representing 1 and 0, respectively. If prob(heads) = prob(tails) = 0.5, it’s a **fair** coin. Otherwise, the coin is **unfair**: for example, if prob(heads) = 0.7, then prob(tails) = 1 - 0.7 = 0.3.
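To make the coin-flip picture concrete, here is a minimal sketch that simulates an unfair coin with prob(heads) = 0.7 using only Python's standard library. The helper name `bernoulli_trial` and the seed value are assumptions chosen for illustration.

```python
import random


def bernoulli_trial(p_heads, rng):
    """Return 1 (heads) with probability p_heads, otherwise 0 (tails)."""
    # rng.random() is uniform on [0, 1), so it falls below p_heads
    # with probability p_heads.
    return 1 if rng.random() < p_heads else 0


rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
flips = [bernoulli_trial(0.7, rng) for _ in range(10_000)]

# The observed fraction of heads should be close to 0.7, though not exactly.
heads_fraction = sum(flips) / len(flips)
print(heads_fraction)
```

With many flips, the fraction of heads settles near 0.7 and the fraction of tails near 0.3, matching the probabilities above.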

In the context of dropout, you essentially flip a coin (fair or unfair, depending on the chosen value of p) for each node to decide whether to drop it from the network. If the result is **heads** (1), you drop the node; if it’s **tails** (0), you leave the node in place.
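The per-node coin flip can be sketched as follows. This is an illustration rather than any particular library's implementation: it assumes p is the probability of heads (that is, of dropping a node), and the names `dropout_mask`, `apply_dropout`, and `p_drop` are hypothetical.

```python
import random


def dropout_mask(num_nodes, p_drop, rng):
    """Flip one coin per node: 1 (heads) means drop, 0 (tails) means keep."""
    return [1 if rng.random() < p_drop else 0 for _ in range(num_nodes)]


def apply_dropout(activations, mask):
    """Zero out every node whose coin flip came up heads (1)."""
    return [0.0 if dropped else a for a, dropped in zip(activations, mask)]


rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
activations = [0.5, 1.2, -0.3, 2.0, 0.8]  # made-up example values

mask = dropout_mask(len(activations), p_drop=0.5, rng=rng)
dropped_out = apply_dropout(activations, mask)

print(mask)         # one 0/1 flip per node
print(dropped_out)  # dropped nodes are replaced by 0.0
```

Many practical implementations also rescale the surviving activations so that their expected sum is unchanged; that step is left out here to keep the focus on the coin flip.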