[NLP] (for dummies) Can't figure out the conditional probability decomposition over the corpus vocabulary

This must sound silly, but I could not find any explanation, or even a clue, on Google. It is probably obvious to 99.99…% of people, but in that case clearing it up should be very straightforward :slight_smile:

So the general assumption I can't explain is this: let's take the probability of a document to be the probability of having all its words/tokens. If D is the document and Wi are its words, then P(D) = P(W1 & … & Wd), with d the number of words in the document.
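
For reference, the decomposition in the title is, as far as I understand it, the chain rule applied to that joint probability. The subscript notation below is my own rewriting of Wi, so treat it as a sketch rather than the textbook statement:

```latex
P(D) = P(W_1 \,\&\, \dots \,\&\, W_d) = \prod_{i=1}^{d} P(W_i \mid W_1, \dots, W_{i-1})
```

If the words are additionally assumed to be independent, each factor reduces to P(W_i), which is the case my coin analogy corresponds to.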

To me, this looks just like the probability of successfully flipping d coins. But if we consider a given combination of d successful flips among n coin flips, isn't the probability of that combination the probability of the d successful flips together with the n-d failed flips?

So, if we consider the document within the entire vocabulary of n words, shouldn't we also account for the probability of the n-d words that did not appear?
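
To make the question concrete, here is a minimal sketch of the two quantities I am comparing. Everything in it is made up for illustration: the toy vocabulary, the per-word probabilities, the variable names, and the assumption that words behave like independent coins.

```python
from math import prod

# Hypothetical per-word "appears in the document" probabilities (toy numbers,
# not from any real corpus). The vocabulary has n = 4 words.
word_prob = {"cat": 0.5, "dog": 0.2, "sat": 0.1, "mat": 0.4}

# The document contains d = 2 of those words.
document = ["cat", "sat"]

# View 1: multiply only the probabilities of the words that appear,
# i.e. P(W1 & ... & Wd) under the independence assumption.
p_present_only = prod(word_prob[w] for w in document)

# View 2: the coin-flip view, which also multiplies in (1 - p)
# for each of the n - d vocabulary words that did not appear.
p_with_absent = prod(
    p if w in document else 1.0 - p
    for w, p in word_prob.items()
)

print(p_present_only)
print(p_with_absent)
```

With these numbers the first value is about 0.05 and the second about 0.024, so the two views really do give different answers, which is exactly what I am unsure about.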

Again, I might be missing something super obvious.