Under review by guest columnist Chris Edwards:
Clayton, Aubrey. (2021). Bernoulli’s Fallacy: Statistical Illogic and the Crisis of Modern Science. Columbia University Press.
Chivers, Tom. (2024). Everything is Predictable: How Bayesian Statistics Explain Our World. One Signal Publishers.
Carroll, Sean. (2024). The Biggest Ideas in the Universe: Quanta and Fields. Penguin.
Chris Edwards, EdD is a high school teacher, a frequent contributor to Skeptic, and the author of the forthcoming book The New Order: How AI Rewrites the Narrative of Science (Prometheus, 2025). He teaches AP world history and an English course on critical thinking at a public high school in the Midwest and is the author of To Explain it All: Everything You Wanted to Know about the Popularity of World History Today; Connecting the Dots in World History; Femocracy: How Educators Can Teach Democratic Ideals and Feminism; and Beyond Obsolete: How to Upgrade Classroom Practice and School Structure.
Statistics and theoretical physics, two intimately connected fields, have recently gone through a forced update driven by skepticism about traditional explanatory models. The results of these updated theories are detailed in three new books: Bernoulli’s Fallacy: Statistical Illogic and the Crisis of Modern Science (2021) by Aubrey Clayton, Everything is Predictable: How Bayesian Statistics Explain Our World (2024) by Tom Chivers, and The Biggest Ideas in the Universe: Quanta and Fields (2024) by Sean Carroll. Read together, they draw connections between Bayesian probability and wave function theory. This means that statisticians and physicists can model wave functions with probability functions more tightly than ever before, and in so doing avoid the logical problems that come from human-constructed analogies. Crucially, this is all arriving just in time for AI to take the process over.
In Bernoulli’s Fallacy, Aubrey Clayton writes about how the Swiss mathematician Jacob Bernoulli (1655-1705) created what became known as the frequentist method of statistical analysis. Clayton begins bluntly by stating, “The methods of modern statistics—the tools of data analysis routinely taught in high schools and universities, the nouns and verbs of the common language of statistical inference spoken in research labs and written in journals, the theoretical results of thousands of person-years’ worth of effort—are founded on a logical error” (p. 1).
Bernoulli drew a direct connection between large sample sizes and theoretical predictability. “What he was able to show,” Clayton writes, “was that…observed frequencies would necessarily converge to true probabilities as the number of trials got larger” (p. 27). This concept stuck because it made intuitive sense. Yet Bernoulli’s frequentist approach suffered from a theoretical constraint. If a statistician predicts that “heads” will come up 25 times in 50 trials, the statistician can explain away a result of 40 heads and 10 tails by stating that prediction and reality would align over a larger number of trials. “However,” Clayton writes, “this…presents a practical impossibility: we can’t flip a coin an infinite number of times.” Also, he adds, flip a coin that many times and it will become misshapen in a way that a theoretical coin will not.
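A quick simulation makes both Bernoulli’s claim and its practical limit concrete. This is only an illustrative sketch; the trial counts and the random seed below are arbitrary choices, not anything drawn from Clayton:

```python
import random

# Simulate fair-coin flips and watch the observed frequency of heads
# drift toward the true probability (0.5) as the number of trials grows.
random.seed(1)

def observed_frequency(n_flips: int, p_heads: float = 0.5) -> float:
    heads = sum(1 for _ in range(n_flips) if random.random() < p_heads)
    return heads / n_flips

for n in (50, 500, 5_000, 50_000):
    print(f"{n:>6} flips: observed frequency of heads = {observed_frequency(n):.3f}")

# The frequency converges only "in the long run"; at any finite number of flips,
# a lopsided result (say 40 heads in 50 flips) remains possible.
```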
At some level, this is a common enough conceit. Anyone who has ever seen a “poll of polls” during an election season might have thought, “what about a poll of polls of polls?” and realized that frequentist statistics might not converge on actual predictions in time for election night. Clayton then goes on to note that statistics hid the problem with p-values, a concept about the predictive veracity of a sample that is all but impossible to interpret correctly. As is often the case in science, a concept overexplained is a concept flawed. As Clayton states, “In the centuries following Bernoulli, as empiricism began to crescendo, some made the leap to claiming this long-run frequency of occurrence was actually what the probability was by definition” (p. 29).
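For readers who want to see what the frequentist machinery actually computes, here is a minimal sketch of a p-value for a lopsided result like 40 heads in 50 flips, written as an exact binomial tail sum. It is an invented illustration, not an example from the book:

```python
from math import comb

def binomial_p_value(heads: int, flips: int, p: float = 0.5) -> float:
    """Two-sided p-value: the probability, assuming the null hypothesis (a fair coin),
    of a result at least as far from the expected count as the one observed."""
    expected = flips * p
    deviation = abs(heads - expected)
    return sum(
        comb(flips, k) * p**k * (1 - p)**(flips - k)
        for k in range(flips + 1)
        if abs(k - expected) >= deviation
    )

print(binomial_p_value(40, 50))   # roughly 0.00002: "reject the null" at any usual threshold
```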
Untangle the p-value, and one finds only tautology. This is not unusual in theoretical mathematics, but statistical analysis matters more than theoretical math because the statistical analysis of experiments often informs practical applications in public policy. Researchers can’t just keep performing experiments over and over until theoretical predictions match experimental outcomes. Hence the problems that occur when experimental outcomes are not replicable. “These failures of replication,” Clayton writes, “are compounded by the fact that effect sizes even among the findings that do replicate are generally found to be substantially smaller than what the original studies claimed” (p. 275). This is likely because only experiments with substantial effect sizes are likely to get the attention and funding to be replicated, and the replication then yields a result that reverts toward the mean.
So what do we do about the problem? Tom Chivers answers this question by channeling the theories of the English philosopher and statistician Thomas Bayes (1702-1761). Chivers writes, “for Bayes, probability is subjective. It’s a statement about ignorance and our best guesses of the truth. It’s not a property of the world around us, but of our understanding of the world” (p. 65). Later, Chivers contrasts Bayesian analysis with Bernoulli’s approach by giving a nod to Clayton:
The nature of frequentist statistics requires that you either reject the null or you don’t. Either there’s a real effect, or there isn’t. And so, if you get a big enough sample size you’ll definitely find something. A Bayesian, instead, can make an estimate of the size of the effect and give a probability distribution (p. 149).
Because they are intuitive and practical, Bayesian probabilities don’t need to hide any underlying flaws with p-values and null hypotheses (although the latter has its uses). The core Bayesian formula is simply understood as a variation of conditional probability: given what you know, what are the odds of a certain outcome? The more you know, the more predictable your outcome becomes. A conditional probability is calculated by dividing the probability that two things occur together by the probability of the one you already know about, the “given.”
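In symbols, that variation is just Bayes’ theorem (standard textbook notation, not a formula quoted from Chivers):

$$P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}$$

where H is a hypothesis, D is the data (the “given”), P(H) is the prior belief, and P(H | D) is the updated, posterior belief.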
Late in the book, Chivers quotes a “superforecaster” named David Manheim, who treats statistical “base rates” (characteristics, by percentage, that exist in a population) as the core “given” in Bayesian analysis. While such an approach is logically sound, it relies upon updating the probability calculation as quickly as new information (the “given”) comes in. The problem, then, becomes practical, because “Most of us…don’t keep base rates in our mind like that, so our beliefs are swayed by every new bit of information,” writes Chivers. This leads to a new understanding of statistical analysis:
…there’s more to being a good forecaster than using base rates. For one thing, “using new data as a likelihood function” sounds nice and simple, but in most cases, you can’t just do the math—yet people are still using their judgement to decide how much to update from the base rate (p. 258).
While this might sound subjective, Chivers advises forecasters to keep track of their predictions and measure them against each other, so that over time the predictions become less susceptible to erroneous base rates. In stating this, Chivers turns forecasting into a process informed first by intuitions about base rates and then shaped into specific predictions by adding more information to the Bayesian “given.” To use an easy example, if a fifty-year-old woman walks into an oncologist’s office, the oncologist would begin with a base rate, the percentage of cancer cases among women fifty and over, and then shape the diagnosis with whatever data comes in from medical testing.
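A back-of-the-envelope version of that diagnostic reasoning can be sketched in a few lines of code. The base rate, sensitivity, and false-positive rate below are made-up round numbers for illustration, not figures from Chivers:

```python
def posterior(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """Bayes' theorem for a positive test result:
    P(disease | positive) = P(positive | disease) * P(disease) / P(positive)."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Hypothetical numbers: a 1% base rate of the cancer in this age group, and a test
# that catches 90% of true cases but falsely flags 8% of healthy patients.
print(posterior(prior=0.01, sensitivity=0.90, false_positive_rate=0.08))
# ≈ 0.10 -- even after a positive test, the probability is only about 10%,
# because the base rate (the Bayesian "given") is so low.
```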
While Clayton and Chivers both approach statistical analysis as connected to sample size, replication, diagnostics, and forecasting, their arguments ultimately lead to the branch of physics most closely tied to probability: quantum mechanics. This brings us to Sean Carroll’s book, dense with ideas and tightly packaged with strings of logic: The Biggest Ideas in the Universe: Quanta and Fields. The book jacket promises the reader an explanation of theoretical physics that “goes beyond analogies.” Reader beware: this means Carroll requires you to stretch up to his explanations; he will not shrink his theories to fit what you already find familiar. There are no references to rubber bed sheets and bowling balls, and he tells you right away that nothing in the quantum world acts much like a game of billiards.
This reviewer found the approach to be refreshing more often than it was confusing. To take one example, Carroll never tries to describe anything by analogy to a “year,” instead writing that “Nothing ‘travels’ through time, forward or backward; things exist at each moment of time and the laws of physics guarantee that there is some persistence of things from moment to moment. Certainly, any implication that we could use antiparticles to send signals backward in time, or somehow affect the past, is completely off base” (p. 107). Although Carroll does not explicitly say this, his approach sensibly does away with hypothetical time travel scenarios where we imagine a past state of the universe as compared to current measures of time.
Without analogies, and with incomplete information, physicists rely on probabilities. As Carroll notes, “Set up a quantum system with some wave function, let it evolve according to the Schrödinger equation, then calculate the probability of measurement outcomes using the Born rule. There’s a bit of mathematical heavy lifting along the way, but the procedure itself is clear enough” (p. 41). Taking a measurement of a system causes it to break from a probabilistic range of potential realities into a single actual reality; Carroll identifies this as the measurement problem.
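Mechanically, the Born rule is a one-line calculation: the probability of each measurement outcome is the squared magnitude of the wave function’s amplitude for that outcome. A minimal sketch, using an arbitrary two-state superposition invented for illustration rather than any example of Carroll’s:

```python
import math

# A toy quantum state written in some measurement basis: two complex amplitudes.
amplitudes = [complex(1, 0), complex(0, 1)]           # an arbitrary superposition
norm = math.sqrt(sum(abs(a) ** 2 for a in amplitudes))
state = [a / norm for a in amplitudes]                # normalize so probabilities sum to 1

# Born rule: probability of each outcome = |amplitude|^2
probabilities = [abs(a) ** 2 for a in state]
print(probabilities)  # ≈ [0.5, 0.5] for this equal superposition
```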
The act of measurement disrupts the coherence of wave functions. Carroll later explains: “Because the apparatus is a macroscopic object that cannot help but interact with the bath of photons around it, the environment becomes entangled too. This process is known as decoherence—when a quantum system in a superposition becomes entangled with its environment” (p. 71). Carroll is careful to state that a wave function is not a field. Later, he explains: “If the wave function is not a field, but we’re doing quantum field theory, what is the relationship between the two? The answer is that we have a wave function of field configurations, in exactly the same sense that a quantum theory of particles features a wave function of particle positions” (p. 91).
Quantum physicists, and therefore Carroll, think of particles as statistical expressions of a field. There’s not really a particle there, but a temporary knot of waves. Later, Carroll adds:
Different fields have different spins, are characterized by different symmetries, and correspondingly come with their own notations…fields interact in myriad ways, and calculating the physical effects of those interactions can be a bit of a mess (p. 98).
Connections with Bayesian probabilities can be found throughout Carroll’s book, such as when he writes about the theoretical physicist and computer scientist Ken Wilson (1936-2013). In the 1960s Wilson was one of the first to see the direct application of computer technology to the problems of physics. This passage by Carroll provides the direct connection to the new understanding of Bayesian probabilities offered by Clayton and Chivers:
Wilson was inspired by the newfangled device that was beginning to become useful to physicists: the digital computer. Imagine we try to do a numerical simulation of a quantum field theory. Inside a finite-sized computer memory, we cannot literally calculate what happens at every location in space, since there is an infinite number of such locations. Instead, we can follow the fields approximately by chunking space up into a lattice of points separated by some distance. What Wilson realized is that he could systematically study what happens as we take that separation distance to be smaller and smaller. And it’s not just that we’re doing calculus, taking the limit as the distance goes to zero. Rather, at any fixed lattice size we can construct an “effective” version of the theory that will be as accurate as we like. In particular, we can get sensible physical answers without ever worrying about—or even knowing—what happens when the distance is exactly zero (p. 129).
Put another way, quantum particles move a lot faster than we do. By the time a human observer can cognitively process the position of a particle, the particle has moved on. Human observers can only see the universe as it was, never as it is. A computer can mathematically model a wave function by treating new information as a Bayesian “given” and recalculating the probabilities of a particle’s position from it faster than any human could. The distance between an event and our understanding of that event can never be zero, but computers can process information and update probabilities at a rate that humans cannot comprehend.
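Wilson’s lattice idea, getting sensible answers at finite spacing without ever reaching zero, can be illustrated with a deliberately simple toy problem that has nothing to do with real quantum field theory: estimating the ground-state energy of a single quantum particle in a harmonic potential on grids of shrinking spacing. Everything below (the potential, the grid sizes, the units) is an invented illustration, not Wilson’s construction or Carroll’s:

```python
import numpy as np

def ground_state_energy(n_points: int, x_max: float = 8.0) -> float:
    """Finite-difference ("lattice") estimate of the ground-state energy of a
    one-dimensional harmonic oscillator, H = -1/2 d^2/dx^2 + 1/2 x^2, in natural units."""
    x, dx = np.linspace(-x_max, x_max, n_points, retstep=True)
    # Kinetic term: standard three-point approximation of -1/2 d^2/dx^2 on the grid.
    kinetic = (2 * np.eye(n_points)
               - np.eye(n_points, k=1)
               - np.eye(n_points, k=-1)) / (2 * dx**2)
    potential = np.diag(0.5 * x**2)                      # potential energy on the grid
    return np.linalg.eigvalsh(kinetic + potential)[0]    # lowest eigenvalue

# Shrinking the lattice spacing: the estimate settles toward the exact answer (0.5)
# long before the spacing gets anywhere near zero.
for n in (20, 40, 80, 160):
    print(n, round(ground_state_energy(n), 5))
```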
Synthesis and Analysis
Since Clayton and Chivers found something new by studying a couple of old mathematicians, it is worth dusting off one of James Clerk Maxwell’s most underappreciated ideas regarding probabilities and temperature. In Convergence: The Idea at the Heart of Science (2016), Peter Watson writes:
Maxwell saw that what was needed was a way of representing many motions in a single equation, a statistical law. He devised one that said nothing about individual molecules but accounted for the proportion that had the velocities within any given range. This was the first-ever statistical law in physics, and the distribution of velocities turned out to be bell-shaped, the familiar normal distribution of populations about a mean. But its shape varied with the temperature—the hotter the gas, the flatter the curve and the wider the bell (p. 39).
Maxwell’s theory connects quantum mechanics with probability theory. From a probability perspective, an object at absolute zero collapses the distribution to a sharp spike, because all the particles sit directly at the mean; adding heat to a system sends particles and waves flying off into a wider range of probability states. As Carroll notes, “Heavier particles decay into lighter ones. In special relativity, energy is conserved but ‘mass’ is not separately conserved; it is just one form of energy. A heavy particle can decay into a collection of particles with a lower mass, with the remaining energy being the kinetic energy of the offspring particles” (p. 257). He has, in effect, described entropy, the heat loss that comes from interactions. Any moving system, and there is no other kind, releases heat that can be translated directly into Maxwell’s probability field.
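For reference, the modern form of the bell curve Watson describes, the distribution of a single velocity component in Maxwell’s kinetic theory, can be written as a Gaussian whose width grows with temperature (standard textbook notation, not a formula quoted from Watson or Carroll):

$$f(v_x) = \sqrt{\frac{m}{2\pi k T}}\; e^{-\, m v_x^{2}/(2kT)}$$

Here m is the molecular mass, T the absolute temperature, and k Boltzmann’s constant; the variance kT/m increases with T, so the hotter the gas, the wider and flatter the bell.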
Vibrating strings can be understood, for example, in direct ratio to the heat lost by the vibration. Cooling an object to absolute zero produces the same “slowing time” effect as time dilation, which means that “time” as a concept can be understood as relative motion or as relative temperature; the temperature model fits more easily with our understanding of probability. Carroll states that “No matter how big our system is, there is only one wave function. Ultimately, there is just one quantum state for everything, the wave function of the universe” (p. 63). That function can be broken up into smaller systems and understood through probability, which means we can never get the absolute truth of the wave function, but we can get pretty close.
To take Carroll’s “no-analogies” conceit to its logical conclusion, AI will not need the esoteric names that we assign to the elements. Each element can just as well be understood as being in a probabilistic state of entropy, with stable elements decaying slowly and unstable elements radiating quickly. Because computers can turn information into Bayesian probabilities so quickly, an AI can adjust its base rate accordingly and close the gap between “given” information and probabilities almost instantly. AI will quickly go beyond human theoretical capacities, but thanks to the theoretical updates given to us by Clayton, Chivers, and Carroll, all driven by skepticism toward old methods, we have a chance to understand how AI does it.
I certainly enjoyed Clayton's book too.
The reason frequentist stats don't work (I think) is that in practice there are almost no realistic situations where they apply.
First, they are only applicable to formal experimental situations where you have worked to remove biases using randomization and controls. Most data is observational and packed with biases, so the objectivity of any p-value can be called into question.
Second, where there is a formal experiment, it requires the people analyzing the results to run only the pre-decided test, with no messing about. But people with an interest in the outcome are going to mess about with the data; only a very disinterested person is going to do it right.
So the case where frequentist stats work is one where a disinterested person is prepared to spend a lot of money and time setting up a formal experiment. Not a situation that occurs often!
Pre-registration can help correct for the biases that drive p-hacking, in my opinion. I push for RAD (replicable, aggregable, and data-supported) scholarship in my discipline (rhetoric, writing, and composition) so that orthodox social justice theorizing and metaphors can be proven or disproven rather than propagated by mere belief and nonreplicable case studies. Every few years a scholar makes the same RAD call…to no avail.
But I do teach it to my writing students. To extend this discussion, I recommend reading Science Fictions (2021) by Stuart Ritchie as a follow-up to this excellent 4-book review. He contends that incentives such as publication hype, along with negligence and fraud, contribute to the replication crisis, in which studies are rarely replicated, if ever. We need to fund replication and meta-analyses, not only genius original research.