How Many Undetected Frauds in Science?
0.04% of papers are retracted. At least 1.9% of papers have duplicate images "suggestive of deliberate manipulation". About 2.5% of scientists admit to fraud, and they estimate that 10% of other scientists have committed fraud. 27% of postdocs said they were willing to select or omit data to improve their results. More than 50% of published findings in psychology are false. The ORI, which makes about 13 misconduct findings per year, gives a conservative estimate of over 2000 misconduct incidents per year.
That's a wide range of figures, and all of them suffer from problems if we try to use them as estimates of the real rate of fraud. While the vast majority of false published claims are not due to fabrication, it's clear that there is a huge iceberg of undiscovered fraud hiding underneath the surface.
Part of the issue is that the boundaries of fraud are unclear. While fabrication/falsification are easy to adjudicate, there's a wide range of quasi-fraudulent but quasi-acceptable "Questionable Research Practices" (QRPs), such as HARKing (hypothesizing after the results are known), which result in false claims being presented as true. Publishing a claim that has a ~0%[1] chance of being true is the worst thing in the world, but publishing a claim that has a 15% chance of being true is a totally normal thing that perfectly upstanding scientists do. Thus the literature is inundated by false results that are nonetheless not "fraudulent". Personally I don't think there's much of a difference.
There are two main issues with QRPs: first, there's no clear line in the sand, which makes it difficult to single out individuals for punishment. Second, the majority of scientists engage in QRPs. In fact they have been steeped in an environment full of bad practices for so long that they are no longer capable of understanding that they are behaving badly:
Let him who is without QRPs cast the first stone.
The case of Brian Wansink (who committed both clear fraud and QRPs) is revealing: in the infamous post that set off his fall from grace, he brazenly admitted to extreme p-hacking. The notion that any of this was wrong had clearly never crossed his mind: he genuinely believed he was giving useful advice to grad students. When commenters pushed back, he justified himself by writing that "P-hacking shouldn’t be confused with deep data dives".
Anyway, here are some questions that might help us determine the size of the iceberg:
- Are uncovered frauds high-quality, or do we only have the ability to find low-hanging fruit?
- Are frauds caught quickly, or do they have long careers before anyone finds out?
- Are scientists capable of detecting fraud or false results in general (regardless of whether they are produced by fraud, QRPs, or just bad luck)?
- How much can we rely on whistleblowers?
Here's an interesting case recently uncovered by Elisabeth Bik: 8 different published, peer-reviewed papers, by different authors, on different subjects, with literally identical graphs. The laziness is astonishing! It would take just a few minutes to write an R script that generates random data so that each fake paper could at least have unique charts. But the paper mill that wrote these articles won't even do that. This kind of extreme sloppiness is a recurring theme when it comes to frauds that have been caught.
In general the image duplication that Bik uncovers tends to be rather lazy: people just copy-paste to their heart's content and hope nobody will notice (and peer reviewers and editors almost certainly won't).
The Bell Labs physicist Jan Hendrik Schön was found out because he used identical graphs for multiple, completely different experiments.
In another case, the author not only copy-pasted a ton of observations, he forgot to delete the Excel sheet he used to fake the data! He still managed to get three publications out of it.
Back to Wansink again: he was smart enough not to copy-paste charts, but he made other stupid mistakes. For example in one paper (The office candy dish) he reported impossible means and test statistics (detected through granularity testing). If he had just bothered to create a plausible sample instead of directly fiddling with summary statistics, there's a good chance he would not have been detected. (By the way, the paper has not been retracted, and continues to be cited. I Fucking Love Science!)
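To make the idea of granularity testing concrete: when responses are whole numbers (counts, Likert items), the sum of n responses must be an integer, so only certain means are possible for a given sample size. One widely used check of this kind is the GRIM test. The following is a minimal sketch, assuming integer-valued data and a mean rounded to a known number of decimals; `grim_consistent` is a hypothetical helper name, and nothing here reproduces the actual analysis of the candy dish paper.

```python
import math


def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """Return True if some integer sum of n responses rounds to reported_mean.

    Assumes every individual response is an integer and that the mean was
    rounded to `decimals` places before being reported.
    """
    # The true sum must be an integer, and the integer closest to
    # reported_mean * n is always either its floor or its ceiling.
    target = reported_mean * n
    # Half a unit in the last reported decimal place, plus float slack.
    tolerance = 0.5 * 10 ** -decimals + 1e-9
    for total in (math.floor(target), math.ceil(target)):
        if abs(total / n - reported_mean) <= tolerance:
            return True
    return False


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any real paper.
    for mean, n in [(3.48, 25), (3.47, 25), (5.19, 28)]:
        verdict = "possible" if grim_consistent(mean, n) else "impossible"
        print(f"mean={mean}, n={n}: {verdict}")
```

A fraudster who simulates a full sample and computes the mean from it passes this check automatically, which is exactly why editing summary statistics directly is such a sloppy way to cheat.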
In general Wansink comes across as a moron, yet he managed to amass hundreds of publications, 30k+ citations, and half a dozen books. What percentile of fraud competence do you think Wansink represents?
The point is this: generating plausible random numbers is not that difficult! Especially considering the fact that these are intelligent people with extensive training in science and statistics. It seems highly likely that there are more sophisticated frauds out there.
Do frauds manage to have long careers before they get caught? I don't think there's any hard data on this (though someone could probably compile it with the Retraction Watch database). Obviously the highest-profile frauds are going to be those with a long history, so we have to be careful not to be misled. Perhaps there's a vast number of fraudsters who are caught immediately.
Overall the evidence is mixed. On the one hand, a relatively small number of researchers account for a fairly large proportion of all retractions. So while these individuals managed to evade detection for a long time (Yoshitaka Fujii published close to 200 papers over a 25-year career), most frauds do not have such vast track records.
On the other hand, just because we haven't detected fraudulent papers doesn't necessarily mean they don't exist. And repeat fraud seems fairly common: simple image duplication checks reveal that "in nearly 40% of the instances in which a problematic paper was identified, screening of other papers from the same authors revealed additional problematic papers in the literature."
Even when fraud is clearly present, it can take ages for the relevant authorities to take action. The infamous Andrew Wakefield vaccine-autism paper, for example, took 12 years to be retracted.
I've been reading a lot of social science papers lately and a thought keeps coming up: "this paper seems unlikely to replicate, but how can I tell if it's due to fraud or just bad methods?" And the answer is that in general we can't tell. In fact things are even worse, as scientists seem to be incapable of detecting even really obviously weak papers (more on this in the next post).
In cases such as Wansink's, people went over his work with a fine-tooth comb after the infamous blog post and discovered all sorts of irregularities. But nobody caught those signs earlier. Part of the issue is that nobody's really looking for fraud when they casually read a paper. Science tends to work on a kind of honor system where everyone just assumes the best. Even if you are looking for fraud, it's time-consuming, difficult, and in many cases unclear. The evidence tends to be indirect: noticing that two subgroups are a bit too similar, or that the effects of an intervention are a bit too consistent. But these can be explained away fairly easily. So unless you have a whistleblower it's often difficult to make an accusation.
The case of the 5-HTTLPR gene is instructive: as Scott Alexander explains in his fantastic literature review, a huge academic industry was built up around what should have been a null result. There are literally hundreds of non-replicating papers on 5-HTTLPR—suppose there was one fraudulent article in this haystack, how would you go about finding it?
Some frauds (or are they simply errors?) are detected using statistical methods such as the granularity testing mentioned above, or with statcheck. But any sophisticated fraud would simply check their own numbers using statcheck before submitting, and correct any irregularities.
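The core of a statcheck-style check is simple: recompute the p-value from the reported test statistic and see whether it matches the reported p-value after rounding. The real statcheck is an R package that parses APA-formatted results and covers several test types; the sketch below is a simplified stand-in that only handles two-sided z-tests, so that it runs on the Python standard library alone. The function names and the rounding assumptions (statistic and p-value both reported to two decimals) are mine.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_value_consistent(z: float, reported_p: float, decimals: int = 2) -> bool:
    """Return True if reported_p is compatible with the reported z statistic.

    Assumes z itself was rounded to two decimals, so the true statistic
    lies somewhere within +/- 0.005 of the reported value.
    """
    half_step = 0.005
    # A smaller |z| gives a larger p, so these bracket the true p-value.
    p_max = two_sided_p_from_z(max(abs(z) - half_step, 0.0))
    p_min = two_sided_p_from_z(abs(z) + half_step)
    # Allow for the reported p-value having been rounded as well.
    tolerance = 0.5 * 10 ** -decimals
    return p_min - tolerance <= reported_p <= p_max + tolerance


if __name__ == "__main__":
    # Illustrative (z, reported p) pairs, not taken from any real paper.
    for z, p in [(1.96, 0.05), (2.58, 0.01), (1.96, 0.01)]:
        status = "consistent" if p_value_consistent(z, p) else "INCONSISTENT"
        print(f"z={z}, reported p={p}: {status}")
```

Since the check needs nothing but the published numbers, anyone fabricating results can run it on their own manuscript first, which is the weakness described above.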
Detecting weak research is easy. Detecting fraud and then prosecuting it is extremely difficult.
Some cases are brought to light by whistleblowers, but we can't rely on them for a variety of reasons. A survey of scientists finds that potential whistleblowers, especially those without job security, tend not to report fraud due to the potential career consequences. They understand that institutions will go to great lengths to protect frauds—do you want a career, or do you want to do the right thing?
Often there simply is no whistleblower available. Scientists are trusted to collect data on their own, and they often collaborate with people in other countries or continents who never have any contact with the data-gathering process. Under such circumstances we must rely on indirect means of detection.
South Korean celebrity scientist Hwang Woo-suk was uncovered as a fraud by a television program which used two whistleblower sources. But things only got rolling when image duplication was detected in one of his papers. Both whistleblowers lost their jobs and were unable to find other employment.
In some cases people blow the whistle and nothing happens. The report from the investigation into Diederik Stapel, for example, notes that "on three occasions in 2010 and 2011, the attention of members of the academic staff in psychology was drawn to this matter. The first two signals were not followed up in the first or second instance." By the way, these people simply noticed statistical irregularities; they never had direct evidence.
And let's turn back to Wansink once again: in the blog post that sank him, he recounted tales of instructing students to p-hack data until they found a result. Did those grad students ever blow the whistle on him? Of course not.
Let's say that about half of all published research findings are false. How many of those are due to fraud? As a very rough guess I'd say that for every 100 papers that don't replicate, 2.5 are due to fabrication/falsification, and 85 are due to lighter forms of methodological fraud. This would imply that about 1% of fraudulent papers are retracted.
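The back-of-the-envelope arithmetic behind this kind of guess can be written out explicitly. The sketch below uses only the figures quoted in this post (0.04% of papers retracted, half of findings false, 2.5 fabricated per 100 non-replicating papers). The one extra input, the share of retractions that actually involve fabricated papers, is not given in the text, so the values tried for it are labeled assumptions. It also treats "findings" and "papers" interchangeably, which is a simplification.

```python
# Figures quoted in the post.
RETRACTED_SHARE = 0.0004        # 0.04% of all papers are retracted
FALSE_SHARE = 0.5               # about half of published findings are false
FABRICATED_AMONG_FALSE = 0.025  # 2.5 per 100 non-replicating papers
QRP_AMONG_FALSE = 0.85          # 85 per 100 non-replicating papers

# Implied shares of the whole literature.
fabricated_share = FALSE_SHARE * FABRICATED_AMONG_FALSE
qrp_share = FALSE_SHARE * QRP_AMONG_FALSE


def retracted_fraction_of_fraud(fraud_share_of_retractions: float) -> float:
    """Fraction of fabricated papers that end up retracted.

    fraud_share_of_retractions is an ASSUMPTION: the fraction of all
    retractions that involve fabricated or falsified papers.
    """
    return RETRACTED_SHARE * fraud_share_of_retractions / fabricated_share


if __name__ == "__main__":
    print(f"fabricated papers: {fabricated_share:.2%} of the literature")
    print(f"QRP-driven false papers: {qrp_share:.1%} of the literature")
    # Hypothetical values for the unknown input, from "every retraction
    # is a fraud case" down to "a third of them are".
    for share in (1.0, 0.5, 0.33):
        fraction = retracted_fraction_of_fraud(share)
        print(f"if {share:.0%} of retractions are fraud: {fraction:.1%} retracted")
```

Even under the most generous assumption, that every single retraction is a fabricated paper, only about 3% of fabricated papers would be retracted, and the figure falls toward 1% as that share shrinks. Either way the conclusion is the same order of magnitude: the overwhelming majority of fraud is never retracted.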
This is both good and bad news. On the one hand, while most fraud goes unpunished, it only represents a small portion of published research. On the other hand, it means that we can't fix reproducibility problems by going after fabrication/falsification: if outright fraud completely disappeared tomorrow, it would be no more than an imperceptible blip in the replication crisis. A real solution needs to address the "questionable" methods used by the median scientist, not the fabrication used by the very worst of them.
[1] Sometimes fraudulent methods are used to defend a real result. Check out the tragicomic story of Newton and the Fudge Factor. Or the recent case of Stapel: "More than one social psychologist informally commenting on the Stapel case has suggested that some of the findings suspected of being fraudulently produced or embellished might well be true when tested properly."