Pattern Recognition, Machine Learning, and the Bayesian Approach

Author: Stephen J. Turnbull
Organization: Faculty of Engineering, Information, and Systems at the University of Tsukuba
Contact: Stephen J. Turnbull <turnbull@sk.tsukuba.ac.jp>
Date: August 9, 2018
Copyright: 2018, Stephen J. Turnbull
Topic: statistics

Broad outline based in part on Pattern Recognition and Machine Learning by Christopher Bishop (New York: Springer-Verlag, 2006; ISBN: 978-0-387-31073-2; probably copyright infringement: http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf). Bishop does not cover unsupervised learning at all. Unsupervised learning follows The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (New York: Springer-Verlag, 2001; ISBN: 978-0-387-95284-0). Detailed discussion is based on both books.

The modern practice of business and social science statistics has undergone a dramatic transformation in the past three decades. Originally, business statistics relied pragmatically on analysis of variance (ANOVA) for qualitative data, as practiced in experimental sciences such as biology and medicine, and on regression from economics, with an epistemological basis in the frequentist interpretation of probability. These two streams converged through the idea of latent variables for ANOVA (e.g., in structural equation modeling (SEM), also called covariance structure analysis) and instrumental variables modeling in econometrics, although the implementations continue to be tuned to qualitative data and cardinal data respectively.

However, in business planning and economic policy, the frequentist epistemology, which presumes an in-principle repeatable experiment, is not very persuasive and leads to difficult-to-interpret results. An example that Bishop gives is that of flipping a coin three times, where it comes up tails all three times. The minimal frequentist analysis of this experiment would estimate that the coin has probability 1 of coming up tails, with zero standard error of estimate. Unless your model is of a two-tailed coin or a "trick" coin, this is completely implausible. Unfortunately, the frequentist interpretation intentionally provides no guidance about resolving this difficulty, except to get more data -- and even then, because of the degenerate standard error, there's no measure of the confidence you should have in the results.
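The contrast can be made concrete with a short sketch. The frequentist maximum-likelihood estimate and its standard error follow directly from the three observed flips; the Bayesian alternative shown here assumes a uniform Beta(1, 1) prior over the tails probability (a standard textbook choice, not a calculation taken from Bishop's text), which yields a Beta posterior with a mean strictly between 0 and 1:

```python
from math import sqrt

# Bishop's example: three coin flips, all tails.
n, tails = 3, 3

# Frequentist maximum-likelihood estimate: p_hat = tails / n = 1,
# with the usual binomial standard error sqrt(p_hat * (1 - p_hat) / n),
# which degenerates to 0 here.
p_hat = tails / n
std_err = sqrt(p_hat * (1 - p_hat) / n)

# Bayesian sketch (assumed uniform Beta(1, 1) prior): the posterior is
# Beta(1 + tails, 1 + n - tails), whose mean is (tails + 1) / (n + 2),
# Laplace's "rule of succession" estimate.
a, b = 1 + tails, 1 + (n - tails)
posterior_mean = a / (a + b)

print(p_hat, std_err)       # 1.0 0.0  -- certainty of a two-tailed coin
print(posterior_mean)       # 0.8      -- tails-leaning, but not certain
```

The posterior also has positive variance, so unlike the degenerate frequentist standard error it quantifies how much confidence the three flips actually warrant.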