It has been said that philosophy of science is as useful to scientists as ornithology is to birds. But perhaps it can be useful to metascientists?

State of Play


In the 20th century, philosophy of science attracted first-rate minds: scientists like Henri Poincaré, Pierre Duhem, and Michael Polanyi, as well as philosophers like Popper, Quine, Carnap, Kuhn, and Lakatos. Today the field is a backwater, lost in endless debates about scientific realism which evoke the malaise of medieval angelology.[1] Despite being part of philosophy, however, the field made actual progress, abandoning simplistic early models for more sophisticated approaches with greater explanatory power. Ultimately, philosophers reached one of two endpoints: some went full relativist,[2] while others (like Quine and Laudan) bit the bullet of naturalism and left the matter to metascientists and psychologists.[3] "It is an empirical question, which means promote which ends".


Did the metascientists actually pick up the torch? Sort of. There is some overlap, but (with the exception of the great Paul Meehl) they tend to focus on different problems. The current crop of metascientists is drawn, like sharks to blood, to easily quantifiable questions about the recent past (with all those p-values sitting around how could you resist analyzing them?). They focus on different fields, and therefore different problems. They seem hesitant to make normative claims. Less tractable questions about forms of progress, norms, theory selection, etc. have fallen by the wayside. Overall I think they underrate the problems posed by philosophers.

Rational Reconstruction

In "History of Science and Its Rational Reconstructions", Lakatos proposed that theories of scientific methodology function as historiographical theories and can be criticized or compared to each other by using the theories to create "rational historical reconstructions" of scientific progress. The idea is simple: if a theory fails to rationally explain the past successes of science, it's probably not a good theory, and we should not adopt its normative tenets. As Lakatos puts it, "if the rationality of science is inductive, actual science is not rational; if it is rational, it is not inductive." He applied this "Pyrrhonian machine de guerre" not only to inductivism and confirmationism, but also to Popper.

The main issue with falsification boils down to the problem of auxiliary hypotheses. On the one hand you have underdetermination (the Duhem-Quine thesis): testing hypotheses in isolation is not possible, so when a falsifying result comes out it's not clear where the modus tollens should be directed. On the other hand there is the possibility of introducing new auxiliary hypotheses to "protect" an existing theory from falsification. These are not merely abstract games for philosophers, but very real problems that scientists have to deal with. Let's take a look at a couple of historical examples from the perspective of naïve falsificationism.

First, Newton's laws. They were already falsified at the time of publication: they failed to correctly predict the motion of the moon. In the words of Newton, "the apse of the Moon is about twice as swift" as his predictions. Despite this falsification, the Principia attracted followers who worked to improve the theory. The moon was no small problem and took two decades to solve with the introduction of new auxiliary hypotheses.

A later episode involving Newton's laws illustrates how treacherous these auxiliary hypotheses can be. In 1846 Le Verrier (I have written about him before) solved an anomaly in the orbit of Uranus by hypothesizing the existence of a new planet. That planet was Neptune and its discovery was a wonderful confirmation of Newton's laws. A decade later Le Verrier tried to solve an anomaly in the orbit of Mercury using the same method. The hypothesized new planet was never found and Newton's laws remained at odds with the data for decades (yet nobody abandoned them). The solution was only found in 1915 with Einstein's general relativity: Newton should have been abandoned this time!

Second, Prout's hypothesis: in 1815 William Prout proposed that the atomic weights of all elements were multiples of the atomic weight of hydrogen. A decade later, chemists measured the atomic weight of chlorine at 35.45x that of hydrogen and Prout's hypothesis was clearly falsified. Except, a century after that, isotopes were discovered: variants of chemical elements with different neutron numbers. Turns out that natural chlorine is composed of 76% ³⁵Cl and 24% ³⁷Cl, hence the atomic weight of 35.45. Whoops! So here we have a case where falsification depends on an auxiliary hypothesis (no isotopes) which the experimenters have no way of knowing.[4]
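The chlorine figure is just an abundance-weighted average. Here is a minimal sketch of the arithmetic, using the rounded abundances quoted above and whole mass numbers (the names `isotopes` and `average_weight` are made up for illustration). The result lands near 35.48 rather than exactly 35.45, because real isotopic masses sit slightly below the mass numbers and the abundances are rounded.

```python
# Rounded figures from the text: chlorine mass numbers and natural abundances.
# Assumption: whole mass numbers stand in for the exact isotopic masses.
isotopes = {35: 0.76, 37: 0.24}


def average_weight(abundances):
    """Abundance-weighted mean of the isotope mass numbers."""
    return sum(mass * share for mass, share in abundances.items())


weight = average_weight(isotopes)
print(round(weight, 2))
```

Each isotope on its own is (almost) a whole multiple of hydrogen, which is why the mixture looked like a falsification while the underlying idea was close to right.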

Popper tried to rescue falsificationism through a series of unsatisfying ad-hoc fixes: exhorting scientists not to be naughty when introducing auxiliary hypotheses, and saying falsification only applies to "serious anomalies". When asked what a serious anomaly is, he replied: "if an object were to move around the Sun in a square"![5]

Problem, officer?

There are a few problems with rational reconstruction, and while I don't think any of them are fatal, they do mean we have to tread carefully.

External factors: no internal history of science can explain the popularity of Lysenkoism in the USSR—sometimes we have to appeal to external factors. But the line between internal and external history is unclear, and can even depend on your methodology of choice.

Meta-criterion choice: what criteria do you use to evaluate the quality of a rational reconstruction? Lakatos suggested using the criteria of each theory (eg use falsificationism to judge falsificationism) but he never makes a good case for that vs a standardized set of meta-criteria.

Case studies: philosophers tend to argue using case studies and it's easy to find one to support virtually any position, even if its normative suggestions are suboptimal. Lots of confirmation bias here. The illustrious Paul Meehl correctly argues for the use of "actuarial methods" instead. "Absent representative sampling, one lacks the database needed to best answer or resolve these types of inherently statistical questions." The metascientists obviously have a great methodological advantage here.

Fake history: the history of science as we read it today is sanitized if not fabricated.[6] Successes are remembered and failures thrown aside; chaotic processes of discovery are cleaned up for presentation. As Peter Medawar noted in "Is the scientific paper a fraud?", the "official record" of scientific progress contains few traces of the messy process that actually generated said progress.[7] He further argues that there is a desire to conform to a particular ideal of induction which creates a biased picture of how scientific discovery works.

Falsification in Metascience

Now, let's shift our gaze to metascience. There's a fascinating subgenre of psychology in which researchers create elaborate scientific simulations and observe subjects as they try to make "scientific discoveries". The results can help us understand how scientific reasoning actually happens, how people search for hypotheses, design experiments, create new concepts, and so on. My favorite of these is Dunbar (1993), which involved a bunch of undergraduate students trying to recreate a Nobel-winning discovery in biochemistry.[8]

Reading these papers one gets the sense that there is a falsificationist background radiation permeating everything. When the subjects don't behave like falsificationists, it's simply treated as an error or a bias. Klahr & Dunbar scold their subjects: "our subjects frequently maintained their current hypotheses in the face of negative information". And within the tight confines of these experiments it's usually true that it is an error. But this reflects the design of the experiment rather than any inherent property of scientific reasoning or progress, and extrapolating these results to real-world science in general would be a mistake.

Sociology offers a cautionary tale about what happens when you take this kind of reasoning to an extreme: the strong programme people started with an idealistic (and wrong) philosophy of science, then observed that real-world science does not actually operate like that, and concluded that it's all based on social forces and power relations, descending into an abyss of epistemological relativism. To reasonable people like you and me this looks like an excellent reductio ad absurdum, but sociologists are a special breed and one man’s modus ponens is another man’s modus tollens. The same applies to over-extensions of falsificationism. Lakatos:

...those trendy 'sociologists of knowledge' who try to explain the further (possibly unsuccessful) development of a theory 'falsified' by a 'crucial experiment' as the manifestation of the irrational, wicked, reactionary resistance by established authority to enlightened revolutionary innovation.

One could also argue that the current focus on replication is too narrow. The issue is obscured by the fact that in the current state of things the original studies tend to be very weak, the "theories" do not have track records of success, and the replications tend to be very strong, so the decision is fairly easy. But one can imagine a future scenario in which failed replications should be treated with far more skepticism.

There are also some empirical questions in this area that are ripe for the picking: at which point do scientists shift their beliefs to the replication over the original? What factors do they use? What do they view a falsification as actually refuting (ie where do they direct the modus tollens)? Longitudinal surveys, especially in the current climate of the social sciences, would be incredibly interesting.

Unit of Progress

One of the things philosophers of science are in agreement about is that individual scientists cannot be expected to behave rationally. Recall the example of Prout and the atomic weight of chlorine above: Prout simply didn't accept the falsifying results, and having obtained a value of 35.83 by experiment, rounded it to 36. To work around this problem, philosophers instead treated wider social or conceptual structures as the relevant unit of progress: "thinking style groups" (Fleck), "paradigms" (Kuhn), "research programmes" (Lakatos), "research traditions" (Laudan), etc. When a theory is tested, the implications of the result depend on the broader structure that theory is embedded in. Lakatos:

We have to study not the mind of the individual scientist but the mind of the Scientific Community. [...] Kuhn certainly showed that psychology of science can reveal important – and indeed sad – truths. But psychology of science is not autonomous; for the – rationally reconstructed – growth of science takes place essentially in the world of ideas, in Plato's and Popper's 'third world'.

Psychologists are temperamentally attracted to the individual, and this is reflected in their metascientific research methods which tend to focus on individual scientists' thinking, or isolated papers. Meehl, for example, simply views this as an opportunity to optimize individuals' cognitive performance:

The thinking of scientists, especially during the controversy or theoretical crises preceding Kuhnian revolutions, is often not rigorous, deep, incisive, or even fair-minded; and it is not "objective" in the sense of interjudge reliability. Studies of resistance to scientific discovery, poor agreement in peer review, negligible impact of most published papers, retrospective interpretations of error and conflict all suggest suboptimal cognitive performance.

Given the importance of broader structures, however, things that seem irrational from the individual perspective might make sense collectively. Institutional design is criminally under-explored, and the differences in attitudes both over time and over the cross section of scientists are underrated objects of study.

You might retort that this is a job for the sociologists, but look at what they have produced: on the one hand they gave us Robert Merton, and on the other hand the strong programme. They don't strike me as particularly reliable.

Fields & Theories

Almost all the scientists doing philosophy of science were physicists or chemists, and the philosophers stuck to those disciplines in their analyses. Today's metascientists on the other hand mostly come from psychology and medicine. Not coincidentally, they tend to focus on psychology and medicine. These fields tend to have different kinds of challenges compared to the harder sciences: the relative lack of theory, for example, means that today's metascientists tend to ignore some of the most central parts of philosophy of science, such as questions about Lakatos's "positive heuristic" and how to judge auxiliary hypotheses, questions about whether the logical or empirical content of theories is preserved during progress, questions about how principles of theory evaluation change over time, and so on.

That's not to say no work at all has been done in this area: for example, Paul Meehl[9] tried to construct a quantitative index of a theory's track record that could then be used to determine how to respond to a falsifying result. There's also some similar work from a Bayesian POV. But much more could be done in this direction, and much of it depends on going beyond medicine and the social sciences. "But Alvaro, I barely understand p-values, I could never do the math needed to understand physics!" If the philosophers could do it then so can the psychologists. But perhaps these problems require broader interdisciplinary involvement: not only specialists from other fields, but also involvement from neuroscience, computational science, etc.
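To make the Bayesian angle a bit more concrete, here is a toy sketch of how a failed prediction can be split between a theory T and an auxiliary hypothesis A. This is not Meehl's index or anyone's published model: the function `posterior`, the priors, and the likelihoods are all hypothetical numbers picked for illustration, and T and A are assumed independent a priori.

```python
from itertools import product


def posterior(prior_t, prior_a, likelihood):
    """Posterior over (theory true?, auxiliary true?) after an anomaly.

    likelihood[(t, a)] is the assumed probability of observing the anomaly
    under each combination. T and A are treated as independent a priori.
    """
    joint = {}
    for t, a in product([True, False], repeat=2):
        p_t = prior_t if t else 1 - prior_t
        p_a = prior_a if a else 1 - prior_a
        joint[(t, a)] = p_t * p_a * likelihood[(t, a)]
    total = sum(joint.values())
    return {combo: p / total for combo, p in joint.items()}


# Hypothetical likelihoods: the anomaly cannot happen if both T and A hold,
# and is a coin flip under every other combination.
likelihood = {
    (True, True): 0.0,
    (True, False): 0.5,
    (False, True): 0.5,
    (False, False): 0.5,
}

# A theory with a strong track record (0.9) and a shakier auxiliary (0.6).
post = posterior(prior_t=0.9, prior_a=0.6, likelihood=likelihood)
p_theory = post[(True, True)] + post[(True, False)]
p_aux = post[(True, True)] + post[(False, True)]
print(f"P(theory | anomaly) = {p_theory:.3f}")
print(f"P(auxiliary | anomaly) = {p_aux:.3f}")
```

With these made-up inputs most of the blame lands on the auxiliary hypothesis, which is the Neptune pattern. Lower the theory's prior and the modus tollens starts pointing the other way.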

What is progress?

One of the biggest questions the philosophers tried to answer was how progress is made, and how to even define it. Notions of progress as strictly cumulative (ie the new theory has to explain everything explained by the old one) inevitably lead to relativism, because theories are sometimes widely accepted at an "early" stage when they have limitations relative to established ones. But what is the actual process of consensus formation? What principles do scientists actually use? What principles should they use? Mertonian theories about agreement about standards/aims are clearly false, but we don't have anything better to replace them. This is another question that depends on looking beyond psychology, toward more theory-oriented fields.

Looking Ahead

Metascience can continue the work and actually solve important questions posed by philosophers:

  • Is there a difference between mature and immature fields? Should there be?
  • What guiding assumptions are used for theory choice? Do they change over time, and if yes how are they accepted/rejected? What is the best set of rules? Meehl's suggestions are a good starting point: "We can construct other indexes of qualitative diversity, formal simplicity, novel fact predictivity, deductive rigor, and so on. Multiple indexes of theoretical merit could then be plotted over time, intercorrelated, and related to the long-term fate of theories."
  • Can we tell, in real time, which fields are progressing and which are degenerating? If not, is this an opening for irrationalism? What factors should we use to decide whether to stick with a theory on shaky ground? What factors should we use to judge auxiliary hypotheses?[10] Meehl started doing good work in this area; let's build on it.
  • Does null hypothesis testing undermine progress in social sciences by focusing on stats rather than the building of solid theories, as Meehl thought?
  • Is it actually useful, as Mitroff suggests, to have a wide array of differently-biased scientists working on the same problems? (At least when there's lots of uncertainty?)
  • Gholson & Barker 1985 applied Lakatos and Laudan's theories to progress in physics and psychology (arguing that some areas of psychology do have a strong theoretical grounding), but this should be taken beyond case studies: comparative approaches with normative conclusions. Do strong theories really help with progress in the social sciences? Protzko et al. 2020 offer some great data with direct normative applications; much more could be done in this direction.
  • And hell, while I'm writing this absurd Christmas list let me add a cherry on top: give me a good explanation of how abduction works!

Recommended reading:

  • Imre Lakatos, The Methodology of Scientific Research Programmes

  1. Scientific realism is the view that the entities described by successful scientific theories are real.
  2. Never go full relativist.
  3. Quine abandoned the entirety of epistemology, "as a chapter of psychology".
  4. Prout's hypothesis ultimately turned out to be wrong for other reasons, but it was much closer to the truth than initially suggested by chlorine.
  5. The end-point of this line is the naked appeal to authority for deciding what is a serious anomaly and what is not.
  6. Fictions like the idea that Newton's laws were derived from and compatible with Kepler's laws abound. Even in a popular contemporary textbook for undergrads you can find statements like "Newton demonstrated that [Kepler's] laws are a consequence of the gravitational force that exists between any two masses." But of course the planets do not follow perfect elliptical orbits in Newtonian physics, and empirical deviations from Kepler were already known in Newton's time.
  7. Fleck is also good on this point.
  8. Klahr & Dunbar (1988) and Mynatt, Doherty & Tweney (1978) are also worth checking out. Also, these experiments could definitely be taken further, as a way of rationally reconstructing past advances in the lab.
  9. Did I mention how great he is?
  10. Lakatos: "It is very difficult to decide, especially since one must not demand progress at each single step, when a research programme has degenerated hopelessly or when one of two rival programmes has achieved a decisive advantage over the other."