Most Science Studies Appear to Be Tainted By Sloppy Analysis (wsj.com)
20 points by nickb on Sept 14, 2007 | 15 comments


It's worth pointing out that, while the studies are almost certainly wrong, they lead to less-wrong work that builds on the older, more-wrong work.

Take the Bohr atom: horribly wrong, but immensely valuable in its time (and still valuable pedagogically). Then you add sub-orbitals, orbital hybridization theory, and the quantum mechanical model, all almost certainly wrong.

Even in the social sciences you have results like "if we make charitable gifts tax deductible, revenue will go down, but gifts will go up by more than the lost revenue." Later studies came along that used people as their own controls, studying their behavior through time instead of over just one year, and found the opposite: tax revenue goes down more than giving goes up. So the original study was "wrong," but to get the right answer you have to start with the simple and build up.

There are certainly areas so entangled with inter-relations and data difficulties that empirical studies are worthless until someone comes up with decent models to untangle it all, but I think those are the exception rather than the rule.


I think I agree with your main point, but it's important to distinguish between various kinds of "wrongness".

The Bohr atom is "wrong" in the way that a cartoon is wrong.

Newtonian mechanics (and non-relativistic quantum mechanics, for that matter) are "wrong" because they are approximations: at certain length scales or time scales they work perfectly, but at other scales they break down and need to be replaced by something more precise.

The kind of "wrong" that this article is discussing is much different: biased experimental design and lousy statistics. Unlike the other two kinds of "wrong", this is actually a problem. I've seen thousands of hours wasted - some of them by me - due to lack of knowledge of the principles of statistical significance, or because someone found a way to fool himself into cherry-picking the most interesting data points.

Public service announcement: If you're still in school as you read this, take a stats course!


> If you're still in school as you read this, take a stats course!

If you're not, you can consult with a decent statistician at most institutions. Might save your career, in fact...


You raise a great point, but consider a different viewpoint.

Epidemiological studies are based on complex systems, systems with many variables and non-linearities. Atoms are a lot simpler (ha!). Here's what I think is going on: a hot paper comes out, presenting an interesting theory. People either like or don't like the theory based on preconceived notions, look at their data, plot every permutation of Xi versus Xj and publish the "juiciest plot."

OK, it sounds like I'm saying that the "scientific method" has broken down. I'm not. The preconceived notion is subtle, difficult, and hard to eradicate! So people aren't starting with the simple and building up. Not because they don't want to, but because they can't.


What we need is a massive meta-study. Here's how it would work: take 1000 randomly selected studies, each reporting a result at the 95% confidence level. Repeat all of them. You should then get some idea of which 50 of the 1000 studies saw noise and treated it as data. Then compare the media attention accorded to the 50 versus the 950 -- that should tell you what you probably already know: that since we're drawn to interesting results, and implausible truths are interesting, we all pay disproportionate attention to studies that happen to be wrong.
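
A rough way to see where the "50 of 1000" figure comes from is to simulate it. The sketch below is only an illustration, not anything taken from the article. It assumes that every one of the 1000 studies is testing an effect that does not exist, and it uses a made-up design (two groups of 50, known unit variance, a two-sided z-test at the 0.05 level). The function name and all parameters are hypothetical. In a real literature some of the tested effects are real, so the share of wrong studies need not be 5%.

```python
import random
from statistics import NormalDist, fmean


def simulate_null_studies(n_studies=1000, n_per_group=50, alpha=0.05, seed=0):
    """Count 'significant' results when no study has a real effect.

    Assumption: both groups in every study come from the same
    standard normal distribution, so any significant result is noise.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference in means when sigma = 1 is known.
    standard_error = (2 / n_per_group) ** 0.5

    false_positives = 0
    for _ in range(n_studies):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / standard_error
        # Two-sided p-value for the z statistic.
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            false_positives += 1
    return false_positives


if __name__ == "__main__":
    print(simulate_null_studies())
```

Under these assumptions the count should land in the neighborhood of 50 and will move around a little as the seed changes.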


This is rather expensive, and for human studies, would require both extremely fine control of bias (very, very difficult in meta-analyses due to publication bias) as well as extremely well-established investigators who could get a hugely negative study published in a highly visible journal.

Most academics know better than to challenge the norms. Corporate control of scholarly journals (in terms of the actual distribution and management of the content rights) has not served the actual practice of science very well.

In any event, this massive meta-replication study you speak of is done, very slowly and piecemeal, every day in every reasonable field. As new studies fail to replicate old studies, the conclusions of the irreproducible are deprecated, while those that consistently bear fruit are entrenched. Given complex models and expensive studies (e.g. massive multi-year cohort studies of genetic, environmental, and gene*environment interaction effects on cancer risk), that's about all you can hope for.

Science is simultaneously practiced in a manner worse than anyone's fears (on the level of individual studies) and better than anyone's hopes (on the level of accretion of knowledge over time vs. budgetary constraints). The fraudsters burn out quickly in most rigorous fields.


> In any event, this massive meta-replication study you speak of is done, very slowly and piecemeal, every day in every reasonable field. As new studies fail to replicate old studies, the conclusions of the irreproducible are deprecated, while those that consistently bear fruit are entrenched. Given complex models and expensive studies (e.g. massive multi-year cohort studies of genetic, environmental, and gene*environment interaction effects on cancer risk), that's about all you can hope for.

Sure, but in a very ad hoc way. It's not a study if you're just sucking up data when you find it -- there isn't a control group, and you haven't standardized it.


> there isn't a control group, and you haven't standardized it.

Be very, very careful walking down this path. For example, case-control studies are inherently non-representative of population rates and thus more appropriate for rare events (the tests reflect divergence from a large-sample approximation of the hypergeometric distribution to a chi-squared distribution). So if you want to test them against hypothetical nulls, it's better to permute existing data (provided it is in fact good data) or do your own replication study (focusing on the main effects) than to attempt to standardize a model that resists standardization. The question for many studies is not really 'how much' but 'whether', as in, whether a risk or a process or a mechanism explains a significant amount of the observed and unknown.
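
To make "permute existing data" concrete, here is a minimal sketch of a two-sample permutation test on a difference in means. It is an illustration under simple assumptions (two independent groups, exchangeable under the null, random rather than exhaustive permutations), not the procedure any particular study used. The function name, the default of 10,000 permutations, and the example data are all hypothetical.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Under the null, group labels are arbitrary, so we reshuffle the
    pooled data and ask how often a relabeled difference is at least
    as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Add one to numerator and denominator so the estimate is never zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    exposed = [10, 11, 12, 13, 14, 15]
    unexposed = [1, 2, 3, 4, 5, 6]
    print(permutation_p_value(exposed, unexposed))
```

No large-sample approximation is involved, which is the appeal for small or oddly distributed samples. The test is still only as good as the data being permuted.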

One of the worst fallacies in science is the widespread belief that large-sample asymptotic results are applicable to smaller samples, often extreme-value distributions. You can't know what you don't know in some of these studies, in other words. You can attempt to build a model of what could have happened given the constraints under the null, and then permute the events in a simulation -- but for a meta-analysis with replication, how would you pull this off under a finite budget?

And it would still be less expensive than your previous suggestion of attempting to directly replicate each of, say, 1000 studies, even setting aside researchers' intense reluctance to perform purely replication-based studies, which is amazingly powerful to behold.

So, while it is rather piecemeal and ad hoc, my contention is that the current jury-rigged methodology is actually a less expensive way (albeit much slower) to converge on the best approximation to the truth in complex fields.

You can disprove that a transcription factor will bind to a kinase after a functional mutation is introduced, because finding a control is easy -- you split a colony of cells, transfect one, don't transfect the other, and run everything in parallel. But when you are modeling genetic and environmental effects over a span of 25 years among 15,000 men and women, how do you go back in time and select an appropriate group of controls for meta-analysis?

The truth is that you don't. Barring craven and institutionalized misinterpretation, such as is performed in some clinical trials analysis (when we all know better, and most biostatisticians can easily pick apart the errors if the information is made public), the current process is a slow, iterative, but useful approximation to the infinite-budget approach you propose.

It might work for rinky-dink biochemistry or psychology experiments, or perhaps for microarray studies with expression signatures for things like tumor phenotypes. That's not too awful, and you could re-pilot the study for replication (happens a lot already). But the monster cohort studies that spawn sub-studies -- good luck with that! The big epidemiological studies are among the most mistreated of all, because carefully parametrized statements of summary results are then spun by talking heads to sound as dramatic as possible. Sometimes the principal investigators will get in on the game, but as often as not, calmly presented information ("we saw a 2.5x (95% ci: 1.4x-3.7x) increase in bladder cancer risk for GSTM1 null phenotypes exposed to N-nitrosamines") will be re-spun as "KILLER CANCER GENE FOUND BY PIONEERING RESEARCHERS AT THE UNIVERSITY OF SOUTH SCRANTON!!!1".

Don't shoot the message, shoot the messenger, for those...


From the piece:

> "The correction isn't the ultimate truth either," Prof. Kevles said.

No kidding! Folks, this is an iterative process. If you get as excited as the P.I. about their findings, stop for a second, look at the methods & materials, and ask if a different analysis would support their findings. In a 'hot' field, look very carefully at the figures and tables, for these fields are the most prone to shenanigans (cough stem cells cough). At the same time, when you have a critical mass of smart, motivated people in a field that's ripe for discovery, real advances can and do happen. What biologist would have taken you seriously in 1990 if you told them that, not only would we have a map of significant functional landmarks in the human genome by 2000, but by 2010, we'd have them broken down base-by-base into their patterns of variation among sub-populations? And the same thing is happening for everything from nematode worms to wine grapes.

Don't throw the baby out with the bathwater. But do expect more heavy-handed anti-intellectual undertones from the WSJ as Murdoch begins to insert his control into the editorial staff.


Humans are not rational beings; their judgments are driven by certain inbuilt heuristics and biases.

Excellent intro to the topic: http://singinst.org/Biases.pdf

Excellent blog: http://www.overcomingbias.com


Here's the essay, Why Most Published Research Findings Are False: [http://medicine.plosjournals.org/perlserv/?request=get-docum...]


I recall having read this piece before, and it is well written; but for a non-statistician, there's really just one thing you need to remember:

Extraordinary claims require extraordinary evidence.

If you're testing 500,000 hypotheses and you find that one of them is significant, it had better be extremely significant, and it had better survive independent replication. Otherwise... well, let's not belabor the point.
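
To put a number on "extremely significant," here is a small sketch using the Bonferroni correction, which is one common and conservative way to adjust for multiple tests. Choosing Bonferroni and a 0.05 family-wise level is an assumption made for illustration, and the function names are hypothetical.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of 'significant' results if every null is true."""
    return alpha * n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


def survives_bonferroni(p_values, alpha=0.05):
    """Return the p-values that stay significant after the correction."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [p for p in p_values if p < cutoff]


if __name__ == "__main__":
    n_tests = 500_000
    print(expected_false_positives(0.05, n_tests))
    print(bonferroni_threshold(0.05, n_tests))
```

With 500,000 tests at an uncorrected 0.05 level you would expect on the order of 25,000 hits from noise alone. The Bonferroni cutoff for a single test works out to about 1e-7.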

Just keep in mind that a wild and wacky theory needs some heavy-duty experimental evidence (i.e. replication) before you, or the PI, or the referees for Ye Olde Journal, have any reason to believe it. If everyone involved would keep this in mind, it would cut down hugely on publication bias and popular confirmation bias as well.


Why Most Published Research Findings Are False, AKA Medicine Recapitulates the Incompleteness Theorem Seventy Years On


In my field, I've watched as the statistics have been invented to deal with the data. Papers from the early neuroimaging days may not have the best, or even correct, stats, but we can tell whether those studies were on to something by whether the findings were replicated. I think that iteration in science is as good an approximation of 'truth' as we can get from the world around us. Without replication any individual finding is suspect, no matter how much play it gets in the popular press.


Medicine is not science!



