Bayesian Frameworks in Cognitive Science and Computer Vision


A conversation about the free energy principle, predictive processing, and how these theoretical frameworks compare to practical engineering approaches in robotics and autonomous vehicles.


Q: Are you familiar with explanations and theories in cognitive science involving "free energy"?

Yes, I'm quite familiar with this area. The free energy principle (FEP) is a theoretical framework developed primarily by Karl Friston, a neuroscientist at UCL, that attempts to provide a unified account of brain function, perception, action, and learning.

The core idea is that biological systems—particularly brains—work to minimise "variational free energy," which is a mathematical quantity borrowed from statistical physics and machine learning. In practical terms, minimising free energy means reducing the difference between an organism's internal model of the world and the sensory inputs it actually receives.

There are a few key concepts:

Predictive processing — The brain is fundamentally a prediction machine. It maintains generative models of the world and constantly generates predictions about incoming sensory data. Prediction errors (mismatches between expectation and reality) propagate up the hierarchy to update the model.

Active inference — Organisms don't just passively update beliefs; they also act on the world to make their predictions come true. You can minimise surprise either by changing your model or by changing the world to match your expectations.

The Markov blanket — Friston uses this concept to define the boundary between a system and its environment, arguing that any self-organising system that persists over time must implicitly minimise free energy.

The framework is mathematically dense and has attracted both enthusiasm and criticism. Supporters see it as a potential "theory of everything" for cognition. Critics argue it's unfalsifiable, overly abstract, or that it redescribes known phenomena without adding predictive power.

A concrete example: the fish in murky water

To make this less abstract, consider a fish navigating murky water looking for food.

The classical picture of perception would say: photons hit the retina, signals travel to the brain, the brain processes them, builds a representation of the world, decides what to do, sends motor commands to the fins. Perception, then cognition, then action—a linear pipeline.

The free energy picture reframes this entirely.

The fish's brain maintains a generative model—a set of expectations about what the world is like and how sensory signals relate to that world. It "predicts" what its eyes, lateral line, and other sensors should be reporting right now, given its current best guess about where it is, what's nearby, and how it's moving.

When actual sensory signals arrive, the brain compares them to the predictions. If they match, nothing much happens—the model is confirmed. If they don't match, that prediction error is informative. Maybe there's a shadow moving in a way that doesn't fit the model. The brain updates its hypothesis: perhaps a predator, perhaps prey, perhaps just a drifting plant.

But here's the key move: the fish doesn't just sit there passively updating beliefs. It can also act to reduce prediction error. If the visual signal is ambiguous, the fish can swim closer to get a better look. If the model says "food is over there," the fish can move toward it—and if the model is correct, the sensory signals will unfold as predicted. Action and perception are two sides of the same coin: both are ways of minimising the mismatch between model and world.

The "free energy" being minimised is essentially a measure of surprise—how much the incoming signals deviate from what the model expected. A fish that chronically fails to minimise free energy is a fish that's constantly surprised by the world, which means its model is bad, which means it probably gets eaten or starves.

The same story for humans

The same framework applies, just with more elaborate models. When you walk into a familiar room, your brain is predicting what you'll see before you see it—the layout, the furniture, the lighting. You don't consciously experience this prediction; you just experience the room. But if something is wrong—a window broken, an unexpected person sitting there—you notice. That noticing is prediction error breaking through.

Your actions are similarly predictive. Reaching for a coffee cup, your brain predicts the sequence of proprioceptive and tactile signals that should occur if your arm moves correctly and the cup is where you think it is. If the cup is heavier than expected, you notice—prediction error again. If someone has moved the cup, your hand closes on empty air and the mismatch is jarring.

On this account, perception is not building a picture of the world from scratch each moment. It's maintaining and refining a model, with sensory data serving mainly to correct the model when it drifts. And action is not a separate system that kicks in after perception is done. It's part of the same loop: you act in ways that either test your model or make your predictions come true.


Q: Can you introduce the foundations of Bayesian/statistical frameworks of mind and cognition more broadly—stepping back from Friston specifically?

Sure, let me try to sketch the landscape.

The basic move

The foundational insight is treating the mind as a system that maintains and updates probabilistic models of the world. Perception becomes inference: given sensory data, what's the most probable state of the world that caused it? Action becomes a way of gathering informative data or achieving preferred states. Learning becomes model updating over longer timescales.

This isn't new. Helmholtz in the 1860s talked about perception as "unconscious inference"—the idea that what we see isn't raw sensation but a conclusion drawn from incomplete evidence. But formalising it properly required tools that came later.

Bayes as the normative framework

Bayes' theorem tells you how to update beliefs given evidence. If you take this seriously as a normative account of reasoning—this is how an ideal agent should update—then you can ask how well humans and animals approximate Bayesian inference, and what computational architecture could implement it.
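
Written out, the update rule is just Bayes' theorem:

$$
P(h \mid d) \;=\; \frac{P(d \mid h)\,P(h)}{P(d)}
$$

where $h$ ranges over hypotheses about the state of the world, $d$ is the sensory data, $P(h)$ is the prior, $P(d \mid h)$ the likelihood, and $P(h \mid d)$ the posterior that should guide perception and action.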

This spawned a research tradition in cognitive science through the 1990s and 2000s. Tenenbaum, Griffiths, Chater, Oaksford and others showed that many aspects of human cognition—language learning, causal reasoning, concept formation—look roughly Bayesian when you analyse them carefully. People don't just learn associations; they infer latent structure.

The appeal here is that you get why certain inferences are made, not just that they're made. A Bayesian model tells you what the rational solution to the problem is, given the available information and reasonable priors.

The problem of intractability

Pure Bayesian inference is computationally intractable for most realistic problems. The space of possible hypotheses is vast, exact posterior computation is NP-hard in general, and even approximating the posterior to a fixed accuracy is hard in the worst case.

So the question becomes: what approximations does the brain use? This is where different research programmes diverge.

Sampling-based approaches propose that the brain approximates posterior distributions by drawing samples—essentially running something like Monte Carlo methods. There's behavioural and neural evidence that human judgments show signatures of sampling: variability that looks like sampling noise, systematic biases that fall out of finite sampling, sequential effects that look like autocorrelated samples. This connects to work by Vul, Goodman, Gershman, and others.
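
To make the sampling idea concrete, here is a minimal sketch (a toy illustration, not a model from that literature, and silent on neural implementation): an agent approximates the posterior over a single latent quantity with a small number of weighted samples, so its estimates jitter from run to run in the way sampling accounts predict.

```python
import math
import random

# Toy sketch of sample-based posterior approximation: infer a latent position
# from one noisy observation using only a few samples, as a resource-limited
# agent might. All numbers here are illustrative.

def likelihood(obs, pos, noise_sd=1.0):
    """Gaussian likelihood of the observation given a hypothesised position."""
    return math.exp(-0.5 * ((obs - pos) / noise_sd) ** 2)

def approximate_posterior(obs, n_samples=20, prior_mean=0.0, prior_sd=3.0):
    """Importance sampling: draw hypotheses from the prior, weight by likelihood."""
    hypotheses = [random.gauss(prior_mean, prior_sd) for _ in range(n_samples)]
    weights = [likelihood(obs, h) for h in hypotheses]
    total = sum(weights)
    return [(h, w / total) for h, w in zip(hypotheses, weights)]

# With only 20 samples, the posterior-mean estimate varies from run to run:
# the kind of variability cited as a behavioural signature of sampling.
samples = approximate_posterior(obs=2.5)
print(sum(h * w for h, w in samples))
```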

Variational approaches propose that instead of sampling, the brain fits a tractable approximate distribution to the true posterior—minimising the divergence between them. This is where the "free energy" in Friston's framework comes from: variational free energy is the objective function you minimise when doing variational inference. It's an upper bound on surprise (negative log probability of data), and minimising it amounts to making your approximate posterior as close as possible to the true one.
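
In symbols, for a generic generative model $p(x, z)$ over data $x$ and latent causes $z$, and an approximate posterior $q(z)$:

$$
F[q] \;=\; \mathbb{E}_{q(z)}\big[\log q(z) - \log p(x, z)\big] \;=\; \mathrm{KL}\big[\,q(z)\,\|\,p(z \mid x)\,\big] \;-\; \log p(x) \;\ge\; -\log p(x)
$$

Because the KL term is non-negative, $F$ upper-bounds surprise ($-\log p(x)$), and minimising it with respect to $q$ pushes the approximate posterior toward the true one.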

Predictive coding as an implementation

One level down from the computational theory is the question of neural implementation. Predictive coding offers an answer.

The idea is that the brain is organised hierarchically, with each level trying to predict the activity of the level below. Only prediction errors propagate upward. Higher levels encode more abstract, slowly-changing features; lower levels encode fine-grained, fast-changing details.

This is computationally efficient (you only transmit what's unexpected) and biologically plausible (there's suggestive evidence from cortical microcircuitry, though it remains contested). Rao and Ballard's 1999 paper on predictive coding in visual cortex was influential here.
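
For a rough feel of the computation, here is a minimal inference-only sketch in that spirit (the sizes, weights, and learning rate are invented, and weight learning is omitted):

```python
import numpy as np

# Minimal sketch of predictive-coding-style inference: a higher level adjusts
# its estimate of latent causes so that its top-down prediction matches the
# sensory input, and only the prediction error is passed upward.

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(16, 4))            # generative weights: causes -> predicted input
true_causes = rng.normal(size=4)
x = W @ true_causes + 0.05 * rng.normal(size=16)   # sensory input generated from the model

r = np.zeros(4)                                    # higher level's estimate of the causes
for _ in range(200):
    prediction = W @ r                             # top-down prediction of the input
    error = x - prediction                         # prediction error (the upward signal)
    r += 0.05 * W.T @ error                        # nudge the estimate to reduce the error

print("initial prediction error:", np.linalg.norm(x))
print("remaining prediction error:", np.linalg.norm(x - W @ r))
```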

Friston's contribution was to show that predictive coding can be derived as a scheme for minimising variational free energy. So you get a story that connects the normative level (Bayesian inference), the algorithmic level (variational approximation), and the implementational level (predictive coding in neural hierarchies).

Active inference: closing the loop

Classical Bayesian perception is passive—you receive data and update beliefs. But organisms act, and their actions change what data they receive.

Active inference extends the framework to action. The idea is that you have beliefs not just about the current state of the world but about trajectories into the future. You select actions that minimise expected free energy—a quantity that balances reducing uncertainty (epistemic value) with achieving preferred outcomes (pragmatic value).
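
In one common formulation (notation varies across the active inference literature), the expected free energy of a policy $\pi$ at a future time $\tau$ decomposes roughly as:

$$
G(\pi, \tau) \;=\; -\,\mathbb{E}_{q(o_\tau \mid \pi)}\Big[\mathrm{KL}\big[\,q(s_\tau \mid o_\tau, \pi)\,\|\,q(s_\tau \mid \pi)\,\big]\Big] \;-\; \mathbb{E}_{q(o_\tau \mid \pi)}\big[\log p(o_\tau)\big]
$$

The first term is (minus) the expected information gain about hidden states $s_\tau$, the epistemic value; the second is (minus) the expected log probability of outcomes under the preference distribution $p(o_\tau)$, the pragmatic value. Policies that minimise $G$ are therefore expected both to be informative and to bring about preferred observations.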

This reframes motor control as inference. You don't compute actions directly; you infer them as the actions that a system with your beliefs and preferences would take. There's something philosophically elegant here: perception and action become two aspects of the same optimisation.

Where evolution fits in

One reading of the free energy framework is that natural selection has shaped organisms to be approximately Bayesian—because accurate inference about the world aids survival. Priors are inherited (shaped by evolution) and updated (shaped by learning). This connects to older ideas in evolutionary epistemology.

Friston makes stronger claims: that the free energy principle isn't just descriptive but somehow constitutive of what it means to be a self-organising system. Anything that persists against the second law of thermodynamics, he argues, must implicitly be minimising free energy. This is where the framework gets philosophically ambitious and, critics would say, potentially unfalsifiable.

The landscape of positions

Roughly speaking, people in this space differ on:

  1. How literally Bayesian is the brain? Strong Bayesians think the framework is more or less correct and the research programme is about filling in details. Weaker readings treat Bayesian models as useful normative benchmarks rather than process models.

  2. Variational vs sampling? These aren't mutually exclusive—the brain might use different approximation schemes for different problems—but there's genuine disagreement about which is primary.

  3. How unified should the theory be? Friston wants everything under one principle. Others are happy with a toolkit of models for different domains.

  4. Falsifiability and empirical content. Some view the free energy principle as a genuine empirical theory; others see it as a framework or even a tautology dressed up in mathematics.

Suggested reading

  • Clark, Surfing Uncertainty — Very readable book-length treatment of predictive processing, less mathematically dense, philosophically sophisticated. Probably the best single introduction to the conceptual landscape.

  • Hohwy, The Predictive Mind — Similar territory, slightly more technical, good on the philosophical implications.

  • Tenenbaum's work on Bayesian cognitive science — The papers on learning concepts, causality, intuitive physics. Shows the explanatory power of Bayesian models in specific domains.

  • Gershman & Daw on approximate inference — Good review of sampling vs variational approaches in the brain.


Q: I've seen the Helmholtz quote but I doubt he was using "inference" in the modern statistical sense?

You're almost certainly right to be suspicious. This is a case where a historical quote gets retrofitted into a modern framework in a way that flatters the framework.

Helmholtz was writing in the 1860s, decades before the formalisation of statistical inference as we know it. He wasn't thinking in terms of probability distributions, likelihood functions, or Bayes' theorem as a computational procedure. The German he used was "unbewusster Schluss"—unconscious conclusion or unconscious reasoning.

What Helmholtz was doing was arguing against the empiricist view that perception is a direct readout of sensation. He observed that perception is underdetermined by sensory data—the same retinal image is compatible with infinitely many 3D scenes—and concluded that the brain must be doing something constructive, drawing on prior experience to settle on an interpretation. He framed this in terms of inductive reasoning: we learn associations between sensations and their typical causes, and these learned associations shape perception automatically.

So the core insight—perception involves going beyond the data, informed by prior experience—is genuinely there. But the specific Bayesian machinery (priors as probability distributions, likelihoods, posterior inference) is a modern gloss. Helmholtz's "inference" was closer to the informal sense: drawing a conclusion from incomplete evidence, the way you might infer someone's mood from their expression.

The retrospective canonisation of Helmholtz as a proto-Bayesian is useful for giving the modern programme a distinguished lineage, but it papers over significant conceptual differences. It's a bit like crediting Democritus with modern atomic theory because he thought matter was made of indivisible particles.


Q: Is it profitable to compare the last decade's academic and commercial work on computer vision/robotics (e.g. self-driving cars) with the cognitive science literature? Those tasked with making robots navigate the 3D world have also employed Bayesian techniques. How do those compare with the computations hypothesised in the Friston/Clark literature?

This is a really interesting question and I think the answer is "yes, but the comparison is revealing in ways that might not favour the cognitive science literature."

The engineering side actually works

The self-driving car people have had to solve concrete versions of the problems the cognitive science literature theorises about. And they've had to solve them well enough to not kill people.

Take SLAM (simultaneous localisation and mapping). A robot moving through an unknown environment has to infer both where it is and what the environment looks like, from noisy sensor data, in real time. This is genuinely a Bayesian inference problem—you have a joint posterior over your pose and the map, and you update it as data comes in.

Early approaches (extended Kalman filters, particle filters) were explicitly probabilistic. They maintained uncertainty estimates, propagated them through time, and fused information from multiple sensors in principled ways. This was Bayesian reasoning operationalised, with actual priors, actual likelihoods, actual approximate posteriors.
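
For a flavour of what that looks like in code, here is a one-dimensional Kalman filter step (a deliberately toy version: real SLAM systems track full poses and maps jointly, and the numbers below are invented):

```python
def kalman_step(mean, var, control, process_var, measurement, measurement_var):
    """One predict/update cycle: prior belief -> motion model -> sensor fusion -> posterior."""
    # Predict: push the belief through the motion model, inflating uncertainty.
    pred_mean = mean + control
    pred_var = var + process_var
    # Update: fuse the noisy measurement, weighted by relative precision.
    gain = pred_var / (pred_var + measurement_var)
    post_mean = pred_mean + gain * (measurement - pred_mean)
    post_var = (1.0 - gain) * pred_var
    return post_mean, post_var

# Believe we are at 0.0 with variance 1.0, command a move of 1.0, then observe 1.2.
print(kalman_step(0.0, 1.0, control=1.0, process_var=0.5, measurement=1.2, measurement_var=0.4))
```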

And it worked, sort of. But it was brittle, slow, and didn't scale well. The representations were hand-engineered, the state spaces were low-dimensional, and the approximations were often crude.

What actually won

The deep learning revolution mostly swept this away. Modern perception stacks in autonomous vehicles are dominated by neural networks trained end-to-end on vast datasets. They don't maintain explicit probability distributions over hypotheses. They don't do anything that looks like variational inference or predictive coding in any literal sense.

What they do is learn very good function approximations: image in, bounding boxes and semantic labels out. The uncertainty handling is often bolted on after the fact (if at all), rather than being intrinsic to the computation.
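
One small illustration of "bolted on after the fact": a common pattern is to take a trained detector's raw scores and calibrate them post hoc, for example with temperature scaling, rather than having uncertainty built into the inference itself. The class names, scores, and temperature below are made up:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; a higher temperature spreads the mass out."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()                          # for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [8.0, 2.0, 1.0]                     # e.g. scores for {car, pedestrian, cyclist}
print(softmax(logits))                       # overconfident: ~0.997 on "car"
print(softmax(logits, temperature=3.0))      # after scaling: mass spread more honestly
```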

This is philosophically interesting. The engineering community tried the Bayesian approach, found it wanting for high-dimensional perception, and largely moved on. The representations that work are learned, distributed, and not readily interpretable as probability distributions over structured hypotheses.

Where the comparison gets awkward

The Friston/Clark literature makes claims about how biological perception works. It says the brain does something like variational inference, implemented via predictive coding. But:

  1. The engineering systems that actually achieve robust perception in the real world mostly don't work this way.

  2. When Bayesian methods are used in robotics (sensor fusion, state estimation, planning under uncertainty), they're applied to relatively low-dimensional, structured state spaces—not to raw high-dimensional perception.

  3. The scaling properties are unfavourable. Variational inference and predictive coding have been tried in machine learning. They work for some things, but they haven't been the path to the best perceptual systems.

This doesn't prove the brain doesn't do predictive coding. Maybe biological neural networks have tricks that make it work better than our artificial versions. Maybe the claims are at a level of abstraction where the specific implementation doesn't matter. But it does suggest some tension.

A more sympathetic reading

One could argue the comparison isn't quite fair. The cognitive science theories are about understanding a system that already works (the brain), not about engineering a system from scratch. The constraints are different. Evolution had billions of years and different objectives. The brain has to be built from proteins and run on 20 watts.

There's also the point that modern neural networks, while not explicitly Bayesian, might be implicitly approximating something like Bayesian inference. Some theoretical work suggests that certain training procedures and architectures converge to solutions that look Bayesian in some limit. So maybe the engineering and cognitive science traditions are describing the same thing at different levels.

And active inference—the action-selection part of Friston's framework—does connect to ideas in reinforcement learning and planning. The exploration/exploitation tradeoff, the idea of acting to reduce uncertainty, the integration of perception and control—these are live concerns in robotics.

What I actually think

The profitable comparison might be this: the engineering work provides a reality check on which computational ideas actually scale to real-world perception and action. The cognitive science work provides a normative framework and conceptual vocabulary that can be useful for thinking about what systems should be doing, even if the implementation looks different.

But if someone claims the brain is doing variational free energy minimisation via hierarchical predictive coding, it's worth asking: why hasn't that approach produced the best artificial perception systems? Either the brain is doing something quite different from what we've tried, or it's doing something similar but with crucial ingredients we're missing, or the theory is more metaphor than mechanism.
