Chance: Is It Real? If copy-pasting the response in the other thread to you is what it takes to get you to stop being petulant, here it is:
This is a response to @Jeremiah from the 'Chance, Is It Real?' thread.
Preamble: I'm going to assume that the reader knows roughly what an 'asymptotic argument' is in statistics. I will also gloss over the technical specifics of estimation in Bayesian statistics, instead trying to convey their general properties in an intuitive manner. However, it is impossible to discuss the distinction between Bayesian and frequentist inference without some technical detail, so someone without a basic knowledge of statistics is unlikely to follow this post fully.
In contemporary statistics, there are two dominant interpretations of probability.
1) That probability is the long-run frequency of a specified event.
2) That probability is the quantification of uncertainty about the value of a parameter in a statistical model.
(1) is usually called the 'frequentist interpretation of probability' and (2) the 'Bayesian interpretation of probability', though there are others. Each of these philosophical positions has numerous consequences for how data are analysed. I will begin with a brief history of the two viewpoints.
The frequentist school of statistics owes much of its development to Ronald Fisher, who gained his reputation in part through the analysis of genetics in terms of probability - being a founding father of modern population genetics - and in part through the design and analysis of comparative experiments - developing the analysis of variance (ANOVA) method for their analysis. I will focus on the developments resulting from the latter, eliding technical detail. Bayesian statistics is named after Thomas Bayes, the discoverer of Bayes' Theorem, which arose in the analysis of games of chance. More technical details are provided later in the post. Suffice it to say for now that Bayes' Theorem is the driving force behind Bayesian statistics, and that it contains a quantity called the prior distribution, whose interpretation is incompatible with frequentist statistics.
The ANOVA is an incredibly commonplace method of analysis in applications today. It allows experimenters to ask questions about the variation of a quantitative observation over a set of categorical experimental conditions.
For example, in agricultural field experiments: 'Which of these fertilisers is the best?'
The application of fertilisers is termed a 'treatment factor'. Say there are two fertilisers, called 'Melba' and 'Croppa'; then the treatment factor has two levels (values it can take), 'Melba' and 'Croppa'. Assume we have one field treated with Melba and one with Croppa. Each field is divided into (say) 10 units, and after the crops are fully grown, the total mass of vegetation in each unit is recorded. An ANOVA allows us to (try to) answer the question 'Is Croppa better than Melba?'. This is done by computing the mean vegetation mass for each field and comparing the difference of these means with the observed variation in the masses. Roughly: if the difference in mean masses (Croppa - Melba) is large compared to how variable the masses are, we have evidence that Croppa is better than Melba. How?
This is done by means of a hypothesis test. At this point we depart from Fisher's original formulation and move to the more modern development by Neyman and Pearson (which is now the industry standard). A hypothesis test is a procedure that takes a statistic, like 'the difference between Croppa and Melba', and assigns a probability to it. This probability is obtained by assuming a base experimental condition, called 'the null hypothesis', several 'modelling assumptions', and an asymptotic argument.
In the case of this ANOVA, these are roughly:
A) Modelling assumptions: variation between treatments manifests only as variation in means; any measurement imprecision is Normally distributed (a bell curve).
B) Null hypothesis: There is no difference in mean yields between Croppa and Melba
C) Asymptotic argument: assume that B is true; then what is the probability of observing the difference in yields seen in the experiment, assuming we have an infinitely large sample or infinitely many repeated samples? We can find this using the Normal distribution (or, more specifically for ANOVAs, a derived F distribution, but this specificity doesn't matter here).
The combination of B and C is called a hypothesis test.
The frequentist interpretation of probability enters in C. This is because a probability is assigned to the observed difference by calculating 'what if we had an infinite sample size, or infinitely many repeated experiments of the same sort?' using the distribution derived for the problem (which defines the randomness in the model).
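The frequentist procedure above can be sketched in a few lines of code. This is a minimal illustration, not the full ANOVA/F-test machinery: the yield numbers are invented for the example, a two-sample statistic stands in for the ANOVA decomposition, and the p-value uses the large-sample Normal approximation described in C.

```python
import math

# Hypothetical yields (kg) for the 10 units of each field - invented
# numbers purely for illustration, not from any real experiment.
melba = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 4.3, 3.7, 4.0, 4.1]
croppa = [4.6, 4.9, 4.5, 4.8, 4.4, 4.7, 5.0, 4.6, 4.5, 4.8]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    # Sample variance (divisor n - 1).
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Difference in mean yields, compared with the pooled variability.
diff = mean(croppa) - mean(melba)
pooled_var = (var(croppa) + var(melba)) / 2
se = math.sqrt(pooled_var * (1 / len(croppa) + 1 / len(melba)))
t = diff / se

# Asymptotic argument: under the null hypothesis of no difference (B),
# the statistic is approximately standard Normal in large samples, so the
# two-sided p-value comes from the Normal distribution via math.erf.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
print(f"t = {t:.2f}, approximate p-value = {p_value:.2g}")
```

A small p-value is then read as evidence against the null hypothesis, i.e. evidence of a real difference between the fertilisers.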
An alternative, Bayesian analysis would keep the same modelling assumptions (A), but would base its conclusions on the following method:
A) the same as before
B) Define what is called a prior distribution on the error variance.
C) Fit the model using Bayes Theorem.
D) Calculate the odds ratio of the statement 'Croppa is better than Melba' to 'Croppa is worse than or equal to Melba' using the derived model.
I will elide the specifics of fitting a model using Bayes' Theorem, and instead provide a rough sketch of the general procedure below. It is more technical, but still only a sketch intended to give an approximate idea.
Bayes' Theorem says that for two events A and B and a probability evaluation P:
P(A|B) = P(B|A)P(A) / P(B)
where P(A|B) is the probability that A happens given that B has already happened - the conditional probability of A given B. In Bayesian inference, B is taken to be the observed data X, and the resulting P(A|X) is called the posterior distribution of A.
For our model, P(B|A) is the likelihood as obtained in frequentist statistics (the modelling assumptions) - in this case a Normal likelihood given the parameter A, the noise variance of the difference between the two quantities. P(A) is a distribution the analyst specifies without reference to the specific values observed in the data, intended to quantify the a priori uncertainty about the noise variance of the difference between Croppa and Melba. P(B) is simply a normalising constant ensuring that P(A|B) is indeed a probability distribution.
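A toy numerical instance of the theorem may help. All the numbers here are invented for illustration - a crude discrete version of the update, not the continuous model discussed above.

```python
# A minimal discrete illustration of Bayes' Theorem with invented numbers.
# A = "Croppa truly out-yields Melba"; B = "a large yield difference is observed".
p_A = 0.5              # prior P(A): no initial preference either way
p_B_given_A = 0.8      # assumed probability of the observation if A is true
p_B_given_not_A = 0.2  # assumed probability of the observation if A is false

# P(B) is the normalising constant: the total probability of the observation.
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)

# Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)  # 0.8
```

Observing data that is more probable under A than under not-A pushes the probability of A up from its prior value of 0.5 to a posterior of 0.8.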
Bayesian inference thus replaces assumptions B and C with a prior distribution, a likelihood, Bayes' Theorem, and an odds ratio. The prior distribution for the ANOVA is, roughly, a guess at how variable the measurements are, made without looking at the data (again, this is an approximate idea; there is a huge literature on specifying priors). The guess takes the form of a probability distribution over all sensible values of the measurement variability, called the prior distribution for the measurement variability. It is then combined with the modelling assumptions, via Bayes' Theorem, to produce the 'posterior distribution', which plays the same role in inference as the modelling assumptions and null hypothesis do in the frequentist analysis: the posterior distribution allows you to estimate how likely the hypothesis 'Croppa is better than Melba' is compared to 'Croppa is worse than or equal to Melba' - a quantity called an odds ratio.
The take-home message is this: in a frequentist hypothesis test we are trying to infer an unknown fixed value of a population parameter (the difference between the Croppa and Melba means), whereas in Bayesian inference we are trying to infer the posterior distribution of the parameters of interest (the difference between the Croppa and Melba mean weights, and the measurement variability). Furthermore, the assignment of an odds ratio in Bayesian statistics does not have to depend on an asymptotic argument relating the null and alternative hypotheses to the modelling assumptions. Also, it is impossible to specify a prior distribution in frequentist terms: it does not represent the long-run frequency of any event, nor an observation of one.
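The Bayesian odds ratio can also be sketched concretely. For simplicity this sketch places a Normal prior on the mean difference itself and treats the measurement variability as known - a deliberate simplification of the analysis discussed above, which puts the prior on the error variance. All numbers are invented for illustration.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    # Standardised Normal CDF via the error function.
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Invented summary of the experiment: observed mean yield difference
# (Croppa - Melba) and its standard error, with the measurement
# variability treated as known for the sake of a conjugate update.
observed_diff = 0.6
se = 0.1

# Normal prior on the true difference: centred at 0 (no prior preference
# between the fertilisers) and fairly vague.
prior_mean, prior_sd = 0.0, 1.0

# Conjugate Normal update: precisions (1/variance) add, and the posterior
# mean is a precision-weighted average of prior mean and observed difference.
prior_prec = 1 / prior_sd**2
data_prec = 1 / se**2
post_var = 1 / (prior_prec + data_prec)
post_mean = post_var * (prior_prec * prior_mean + data_prec * observed_diff)

# Posterior probability that 'Croppa is better than Melba', and the odds
# ratio against 'Croppa is worse than or equal to Melba'.
p_better = 1 - normal_cdf(0, post_mean, math.sqrt(post_var))
odds_ratio = p_better / (1 - p_better)
print(f"P(Croppa > Melba | data) = {p_better:.6f}, odds ratio = {odds_ratio:.3g}")
```

Note that the conclusion is a direct probability statement about the parameter, with no appeal to infinitely repeated experiments.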
Without arguing which is better, this should hopefully clear up (to some degree) my disagreement with @Jeremiah and perhaps provide something interesting to think about for the mathematically inclined.