Do I understand the difference between Bayesian and frequentist inference correctly?

regression hypothesis-testing bayesian frequentist definition
2022-03-24 13:33:06
  1. Given a series of independent experiments, each resulting in success or failure, where the probability of success is some number p between 0 and 1

A Bayesian would consider the results X1, X2, ..., Xk of the k experiments fixed and p random, would assume a prior distribution on p, and would then derive the posterior distribution of p given the results of the k experiments. Using the distribution of p, the Bayesian would use a credible interval.

A frequentist would consider p a fixed value but the results X1, X2, ..., Xk of the k experiments random, and would try to come up with a maximum likelihood estimate. The frequentist would use a confidence interval.
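
A minimal sketch of this contrast, assuming k = 100 trials with 37 successes and a flat Beta(1, 1) prior (both numbers are made up for illustration):

```python
import numpy as np
from scipy import stats

k, successes = 100, 37                       # hypothetical data

# Bayesian: Beta(1, 1) prior on p + binomial likelihood -> Beta posterior
posterior = stats.beta(1 + successes, 1 + (k - successes))
credible_interval = posterior.interval(0.95)

# Frequentist: MLE p_hat and a 95% Wald (normal-approximation) confidence interval
p_hat = successes / k
se = np.sqrt(p_hat * (1 - p_hat) / k)
confidence_interval = (p_hat - 1.96 * se, p_hat + 1.96 * se)

print("posterior mean:         ", posterior.mean())
print("95% credible interval:  ", credible_interval)
print("MLE:                    ", p_hat)
print("95% confidence interval:", confidence_interval)
```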

  2. Hypothesis testing

A frequentist would use a normal approximation to come up with a p-value, to see whether there is any reason to reject the null hypothesis H0 in favour of the alternative hypothesis H1. For the frequentist, P(H0), P(H1) ∈ {0, 1}.

A Bayesian would assign a prior probability to H0 and compute the posterior probability P(H0 | X1, ..., Xk). The Bayesian would not choose H0 or H1 absolutely. Instead, the Bayesian would lean towards H0 if P(H0 | X1, ..., Xk) > 1 - P(H0 | X1, ..., Xk) = P(H1 | X1, ..., Xk).
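
A minimal sketch of that Bayesian calculation, assuming a point null H0: p = 0.5, an alternative H1 under which p ~ Beta(1, 1), equal prior weights P(H0) = P(H1) = 1/2, and the same made-up 37 successes out of 100 trials (none of these choices come from the question itself):

```python
import numpy as np
from scipy import stats
from scipy.special import betaln, comb

k, successes = 100, 37

# Marginal likelihood of the data under H0: p = 0.5 exactly
m0 = stats.binom.pmf(successes, k, 0.5)

# Marginal likelihood under H1: integrate the binomial likelihood against the
# Beta(1, 1) prior on p (a Beta-Binomial model)
m1 = comb(k, successes) * np.exp(betaln(successes + 1, k - successes + 1) - betaln(1, 1))

# Posterior probabilities of the two hypotheses under equal prior weights
p_h0 = m0 / (m0 + m1)
p_h1 = 1 - p_h0
print("P(H0 | data) =", p_h0)
print("P(H1 | data) =", p_h1)
print("lean towards H0" if p_h0 > p_h1 else "lean towards H1")
```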

  3. Regression: Given the regression model g(E(Y)) = β0 + β1X1 + ... + βkXk

A frequentist would consider βi's and other parameters to be fixed and would use MLE or OLS.

A Bayesian would consider βi's and other parameters to be random, assign them prior distributions and then come up with posterior distributions for βi's and other parameters given the y's and X.
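
A minimal sketch of point 3 for ordinary linear regression (g the identity), with simulated data, the noise variance taken as known, and independent N(0, tau²) priors on the coefficients; all of these modelling choices are illustrative assumptions, not part of the question:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 covariates
beta_true = np.array([1.0, 2.0, -0.5])                      # hypothetical true coefficients
sigma2 = 1.0                                                # noise variance, assumed known here
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Frequentist: the betas are fixed but unknown; OLS/MLE gives a point estimate
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Bayesian: the betas are random; with independent N(0, tau2) priors and known
# sigma2 the posterior is normal, with the covariance and mean below
tau2 = 10.0
cov_post = np.linalg.inv(X.T @ X / sigma2 + np.eye(X.shape[1]) / tau2)
mean_post = cov_post @ (X.T @ y) / sigma2

print("OLS / MLE estimate:", beta_ols)
print("posterior mean:    ", mean_post)
print("posterior std dev: ", np.sqrt(np.diag(cov_post)))
```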


Anything wrong? Anything I missed?

2 Answers

This question is too broad, but I thought I would respond to a few points where your statements aren't accurate.

  1. Bayesians (typically) believe there is a fixed value for the parameters, but use a probability distribution to represent their uncertainty about what the true value is.

  2. A Bayesian is typically interested in the full posterior rather than a point or interval estimate of a particular parameter (although for simplicity in reporting results point or interval estimates are typically provided).

  3. A frequentist would not use a normal approximation for hypothesis testing with a point null in a binomial experiment (see the sketch after this list).

  4. Even if a frequentist "rejects a null hypothesis" that does not mean they choose the alternative.

  5. Bayesians will choose between hypotheses if forced to, but typically we would prefer model averaging.

  6. In a regression problem many frequentists use penalized likelihood methods, e.g. lasso, ridge regression, elastic net, etc. and therefore would not be using the MLE or OLS estimators.
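
On point 3, a rough sketch of what an exact binomial test looks like next to the normal approximation proposed in the question, reusing the hypothetical 37 successes in 100 trials (binomtest assumes scipy >= 1.7):

```python
import numpy as np
from scipy import stats

k, successes, p0 = 100, 37, 0.5              # hypothetical data, point null H0: p = 0.5

# Normal approximation (what the question proposes): two-sided z-test on p_hat
p_hat = successes / k
z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / k)
p_value_normal = 2 * stats.norm.sf(abs(z))

# Exact binomial test, which uses the binomial distribution directly
p_value_exact = stats.binomtest(successes, k, p0, alternative='two-sided').pvalue

print("normal-approximation p-value:", p_value_normal)
print("exact binomial test p-value: ", p_value_exact)
```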

A Bayesian would consider the results of the experiments fixed and treat the population parameters as random variables. This is in contrast to a frequentist, who sees the data as "just another sample in an endless stream of samples" and the population parameters as fixed (but unknown).

The logical Bayesian order would be: (1) define the prior distribution, (2) collect data, (3) use that data to update your prior distribution. After updating, it is called the posterior distribution.
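
A minimal sketch of that prior-to-posterior cycle for the success-probability setting in the question, assuming a Beta(1, 1) prior and two made-up batches of data; the posterior after the first batch acts as the prior for the second:

```python
from scipy import stats

# 1. Define the prior: Beta(1, 1), i.e. uniform over p
a, b = 1, 1

# 2. Collect data (hypothetical first batch: 20 trials, 7 successes)
successes, failures = 7, 13

# 3. Update the prior: Beta prior + binomial likelihood -> Beta posterior
a, b = a + successes, b + failures
print("after batch 1: Beta(%d, %d), mean %.3f" % (a, b, stats.beta(a, b).mean()))

# The posterior then serves as the prior for the next (hypothetical) batch
successes, failures = 30, 50
a, b = a + successes, b + failures
print("after batch 2: Beta(%d, %d), mean %.3f" % (a, b, stats.beta(a, b).mean()))
```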

Mind you, a confidence interval is really different from a credible interval. A confidence interval relates to the sampling procedure. If you were to take many samples and calculate a 95% confidence interval for each sample, you'd find that 95% of those intervals contain the population mean.

This is useful to, for instance, industrial quality departments. Those guys take many samples, and so they have the confidence that most of their estimates will be pretty close to reality. They know that 95% of their estimates are close, but they can't say that about any one specific estimate.
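
A small simulation of that coverage statement, assuming normally distributed measurements with a population mean of 10 (known to the simulation, not to the analyst):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mean, n, n_samples = 10.0, 25, 10_000

covered = 0
for _ in range(n_samples):
    sample = rng.normal(loc=true_mean, scale=2.0, size=n)
    half_width = stats.t.ppf(0.975, df=n - 1) * sample.std(ddof=1) / np.sqrt(n)
    covered += (sample.mean() - half_width <= true_mean <= sample.mean() + half_width)

# Roughly 95% of the 10,000 intervals should contain the true mean of 10
print("coverage:", covered / n_samples)
```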

Compare this to rolling dice: if you roll 600 (fair) dice, your best guess is that 1/6 of them, that is 100 dice, will show a six. But if someone has rolled 1 die and asks you, "What is the probability that this throw was a 6?", the answer "Well, that is 1/6, or about 16.7%" is wrong. The die shows either a 6 or some other figure. So the probability is 1, or 0.

When asked before the throw what the probability of throwing a 6 is, a Bayesian would say "1/6" (based on prior information: everybody knows that a die has 6 sides), but a frequentist would say "No idea", because frequentism is based solely on the data, not on priors.

Likewise, if you have only 1 sample (thus 1 confidence interval), you have no way to say how likely it is that the population mean is in that interval. It is either in it, or not. The probability is either 1, or 0.


If a frequentist rejects H0, this means that P(data|H0) is smaller than some threshold. He says "It is very unlikely to find this sort of data if H0 were true, therefore I assume that H0 is not true, thus H1 must be true". Therefore, in this framework, H0 and H1 must be mutually exclusive and cover all possibilities.

As far as I understand, some frequentists say that if H0 is rejected, this does not imply that H1 is formally accepted; others say that rejecting the one equals accepting the other.

Hypothesis testing in a Bayesian framework is slightly different. The method is to see how well the data are predicted by Hypothesis A, or B, or C (no need to limit this to 2 hypotheses). The researcher could say: "Hypothesis A explains the data 3 times better than Hypothesis B and 50 times better than Hypothesis C".
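
A minimal sketch of that kind of comparison for the binomial setting in the question, with three made-up point hypotheses about p and the hypothetical 37 successes out of 100; the likelihood ratios say how much better one hypothesis predicts the observed data than another:

```python
from scipy import stats

k, successes = 100, 37

# Three competing hypotheses about the success probability p (made-up values)
hypotheses = {"A: p = 0.40": 0.40, "B: p = 0.50": 0.50, "C: p = 0.25": 0.25}

# How well does each hypothesis predict the observed data?
likelihoods = {name: stats.binom.pmf(successes, k, p) for name, p in hypotheses.items()}

best = max(likelihoods, key=likelihoods.get)
for name, lik in likelihoods.items():
    ratio = likelihoods[best] / lik
    print(f"{name}: P(data | H) = {lik:.4g}  ({ratio:.1f}x less likely than under {best})")
```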