The short answer to your question is, "most of the time we don't know what P(cheese) is, and it is often (relatively) difficult to calculate."
The longer answer to why Bayes' Rule/Theorem is normally stated in the way that you wrote is that in Bayesian problems we have, sitting in our lap, a prior distribution (the P(B) above) and a likelihood (the P(A|B) and P(A|notB) above), and it is a relatively simple matter of multiplication to compute the posterior (the P(B|A)). Going to the trouble to re-express P(A) in its summarized, stand-alone form is effort that could be spent elsewhere.
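To make the "sitting in our lap" point concrete, here is a toy version of the spam calculation in Python. The numbers are made up purely for illustration; the point is that P(cheese) never has to be known in advance, it falls out of the prior and the two likelihoods via the law of total probability:

```python
# A minimal sketch of the spam example. The probabilities below are
# made up; only the prior and the two likelihoods are assumed known.
p_spam = 0.2                 # prior: P(B) = P(spam)
p_cheese_given_spam = 0.6    # likelihood: P(A|B) = P("cheese" | spam)
p_cheese_given_ham = 0.05    # likelihood: P(A|notB) = P("cheese" | not spam)

# Law of total probability: P(A) = P(A|B)P(B) + P(A|notB)P(notB)
p_cheese = p_cheese_given_spam * p_spam + p_cheese_given_ham * (1 - p_spam)

# Bayes' rule: P(B|A) = P(A|B)P(B) / P(A)
p_spam_given_cheese = p_cheese_given_spam * p_spam / p_cheese
print(p_spam_given_cheese)   # 0.75
```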
It might not seem so complicated in the context of an email because, as you rightly noted, it's just P(cheese), right? The trouble is that with more involved on-the-battlefield Bayesian problems the denominator is an unsightly integral, P(A) = &#8747; P(A|B)P(B) dB when the parameter is continuous, which may or may not have a closed-form solution. In fact, sometimes we need sophisticated Monte Carlo methods just to approximate the integral, and churning the numbers can be a real pain in the rear.
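To illustrate (with the naive version, not one of the sophisticated methods), here is a sketch of the simplest Monte Carlo estimate of that denominator: draw from the prior and average the likelihood. The normal prior and likelihood are made up so that the exact integral is known and the estimate can be checked against it:

```python
# A sketch of the naive Monte Carlo estimate of the denominator:
# P(A) = integral of P(A|theta) P(theta) dtheta, approximated by
# averaging the likelihood over draws from the prior.
import math
import random

random.seed(0)

y = 1.5  # the observed data (our "A")

def likelihood(theta):
    # P(A|theta): y ~ Normal(theta, 1)
    return math.exp(-0.5 * (y - theta) ** 2) / math.sqrt(2 * math.pi)

# Draw theta ~ Normal(0, 1) from the prior and average the likelihood.
n = 100_000
estimate = sum(likelihood(random.gauss(0, 1)) for _ in range(n)) / n

# Exact marginal for this toy model: y ~ Normal(0, 2)
exact = math.exp(-0.25 * y ** 2) / math.sqrt(4 * math.pi)
print(estimate, exact)  # the two should agree to a couple of decimals
```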
But more to the point, we usually don't even care what P(cheese) is. Bear in mind, we are trying to hone our belief regarding whether or not an email is spam, and couldn't care less about the marginal distribution of the data (the P(A) above). It is just a normalization constant anyway, which doesn't depend on the parameter; the act of summing (or integrating) over the parameter washes out whatever info we had about it. The constant is a nuisance to calculate and ultimately irrelevant when it comes to zeroing in on our beliefs. Sometimes we are obliged to calculate it, in which case the quickest way to do so is with the info we already have: the prior and the likelihood.
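In code terms (continuing the made-up toy model from above), the usual workflow is to carry around prior x likelihood, which already determines which parameter values are favored, and only divide by P(A) at the very end if a properly normalized posterior is actually needed:

```python
# A sketch of why the constant doesn't matter: work with
# prior x likelihood over a grid of parameter values, and only
# normalize at the end (the normalizer is P(A) itself).
import math

y = 1.5
thetas = [i / 100 for i in range(-400, 401)]  # grid over the parameter

def prior(theta):       # theta ~ Normal(0, 1)
    return math.exp(-0.5 * theta ** 2) / math.sqrt(2 * math.pi)

def likelihood(theta):  # y ~ Normal(theta, 1)
    return math.exp(-0.5 * (y - theta) ** 2) / math.sqrt(2 * math.pi)

unnormalized = [prior(t) * likelihood(t) for t in thetas]

# Relative beliefs -- which theta values are favored, and by how much --
# are settled before we ever divide by P(A):
best = max(range(len(thetas)), key=lambda i: unnormalized[i])
print(thetas[best])  # ~0.75, the posterior mode

# If we are obliged to produce P(A), it is (a grid approximation of)
# the sum/integral of what we already computed:
p_a = sum(unnormalized) * 0.01  # 0.01 is the grid spacing
posterior = [u / p_a for u in unnormalized]
print(p_a)
```

Note that the mode, and indeed every ratio between parameter values, is the same before and after dividing by p_a; that is the sense in which the constant washes out.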