You’re doing great! Here’s the thing to think about: regularization penalizes coeffs that are larger. By using NB features, we don’t have to use such large coeffs to get the same result, compared to using plain binary features.

Once you’ve understood that, you’ll soon realize that NB-SVM still isn’t ideal - since we’d really like a *zero* coeff to represent our prior expectation as to the behavior of that feature. At that point, we can start to talk about the extension I made to NB-SVM which is the current state of the art in linear models for sentiment analysis! (Which no-one has written up yet - so if you get to the point you understand this bit, you can be the first person to put it down in writing… You’re well on the way to being there.)