This must sound silly, but I could not find any explanation, or even a clue, on Google. It must be obvious to 99.99…% of people, but in that case making it explicit should be straightforward.
So the general assumption I can't explain is: let's take the probability of a document to be the probability of having all its words/tokens. If D is the document and Wi its words, then P(D) = P(W1 & … & Wd), with d the number of words in the document.
To me, this looks just like the probability of flipping d coins successfully. But if we consider one given combination with d successes among n coin flips, isn't the probability of that combination the product over the d successful *and* the n-d failed flips?
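To make the coin-flip intuition concrete, here is a minimal sketch: the probability of one *specific* sequence of n independent flips is the product of a factor per flip, p for each success and (1 - p) for each failure. The function name and the fair-coin example are mine, just for illustration.

```python
from fractions import Fraction

def sequence_prob(p, d, n):
    """Probability of one specific sequence of n independent coin flips
    containing exactly d successes (probability p each) and n - d failures."""
    return p**d * (1 - p)**(n - d)

# A fair coin, one specific sequence with 3 successes among 5 flips:
# every specific length-5 sequence has the same probability here.
prob = sequence_prob(Fraction(1, 2), 3, 5)
print(prob)  # 1/32
```

The point of the example: the n - d failures contribute factors too, which is exactly the intuition behind the question about absent words.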
So, if we consider the document against the entire vocabulary of n words, shouldn't we also account for the probability of the n-d words that did *not* appear?
Again, I might be missing something super obvious.
- Finally found someone who shares my concern. The answer seems pragmatic rather than mathematical: https://stats.stackexchange.com/questions/32614/including-missing-words-when-applying-naive-bayes-in-document-classification
- Digging further, something more substantial with good references: https://stackoverflow.com/questions/33720659/should-naive-bayes-multiple-all-the-word-in-the-vocabulary
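If I understand those links correctly, the distinction they hinge on is the choice of event model: the Bernoulli model treats every vocabulary word as a coin flip (absent words contribute 1 - p factors), while the multinomial model only multiplies over the words that actually occur. A minimal sketch with made-up per-word probabilities (the tiny vocabulary and its numbers are assumptions, purely for illustration):

```python
import math

# Hypothetical per-word probabilities over a tiny vocabulary (made-up values).
vocab_probs = {"cat": 0.3, "dog": 0.2, "fish": 0.1, "bird": 0.05}
doc_words = {"cat", "dog"}  # words present in the document

# Bernoulli event model: every vocabulary word contributes a factor,
# p(w) if present, 1 - p(w) if absent -- the "one coin flip per word" view.
log_p_bernoulli = sum(
    math.log(p) if w in doc_words else math.log(1 - p)
    for w, p in vocab_probs.items()
)

# Multinomial event model: only the words that actually occur contribute,
# so the n - d absent words are simply ignored.
log_p_multinomial = sum(math.log(vocab_probs[w]) for w in doc_words)

print(math.exp(log_p_bernoulli))   # includes (1 - p) factors for fish, bird
print(math.exp(log_p_multinomial))
```

So both views are coherent; they just model a different generative process for the document, and the multinomial one (which ignores absent words) is the common choice for text.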