That is the scathing critique that economist Ed Leamer leveled at empirical research in his famed 1983 article “Let’s Take the Con Out of Econometrics”. At the time, he meant that researchers had learned not to put much trust in one another’s estimates, because those estimates were sensitive to arbitrary choices made throughout the research process. But for most of the decades since Leamer’s critique, the educated public has tended to take peer-reviewed studies seriously.
This started to change with physician John Ioannidis’ 2005 hit article “Why Most Published Research Findings Are False”. Concerns grew rapidly through the “replication crisis” of the 2010s, assisted by the growth of social media. Psychology was hit first and hardest, starting with the 2011 article “False-Positive Psychology”. But economics and the rest of the social sciences haven’t been spared.
A core premise of science is that research should be replicable. If one scientist creates an experiment to measure a physical constant like the speed of light, and they document their experiment well enough, other scientists should be able to perform the same experiment and find the same result. If one lab’s results can’t be replicated anywhere else, then like cold fusion, they probably aren’t real.
Outside of hard sciences like physics, we don’t expect the same precision. Perhaps one trial finds a drug reduces heart attacks by 17%, while another finds 14%. But for research to usefully inform our actions, it needs to be at least somewhat replicable. If one trial found a drug worked but every subsequent trial found it did nothing, people probably shouldn’t take the drug.
Social science research has spent decades producing the equivalent of studies hyping a drug that turns out to be useless or harmful. When a team led by Brian Nosek attempted in 2015 to replicate 100 experiments that had been published in top psychology journals, fewer than half of the replications produced statistically significant findings. A Federal Reserve discussion paper released the same year found similarly poor results for published economics papers.
If peer-reviewed studies published in top journals can’t be trusted, what can we trust? Since 2015 some popular answers have been “nothing”, or a mix of common sense and ideologically-informed prior beliefs. But scientific reforms undertaken in the wake of the replication crisis may finally be starting to bear fruit in the form of replicable, trustworthy research.
The US military was one of many institutions that had been relying on social science research to guide its decision-making. When the replication crisis led to doubts about this research, it decided to act. The Defense Advanced Research Projects Agency (DARPA), famed for funding hard-technology breakthroughs like the Internet and self-driving cars, funded Brian Nosek and the Center for Open Science to conduct a massive replication effort spanning the social sciences. The idea was both to test how reliable this research was and to see whether the research that turned out to be more trustworthy had anything in common.
The results of this effort were just published in a special issue of the journal Nature. Hundreds of researchers (of whom I was one) from across the social sciences attempted to replicate hundreds of claims from papers published in top social science journals. Overall, we found things improving from a poor start. For instance, most papers still don’t share the data or code that supposedly produced their results, but they are much more likely to do so than they were in 2009, the start of the period studied.
Figure 1: Data and code availability by year of publication
Source: Nature
Economics, along with political science, looks relatively good by this measure, with about half of articles sharing data or code, compared to fewer than one in ten articles in education. Economics likewise had relatively good “reproducibility”, meaning that if other researchers analyze the exact same dataset a published article says it used, in the exact same way the article says it analyzed it, they get the exact same result. Replicators reproduced the exact same result for 67% of economics papers, a higher rate than in any other field studied, so most economics articles cleared this low bar.
Figure 2: Reproducibility by Field
Source: Nature
I call this a low bar because it simply means that the original researchers documented what they did well enough that others could copy it, not that what they found was correct (conversely, if they didn’t document things well enough for others to copy, it wouldn’t necessarily mean they were wrong). How do we know if they were correct?
Other papers from the Nature issue test how sensitive results are to tweaks in the methods of analysis. If there are several reasonable methods of analyzing the data, did the original researchers happen (by coincidence or cherry-picking) to choose the only one that gives statistically significant results? Or would most reasonable methods reach more or less the same conclusion?
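To make that concrete, here is a minimal sketch of such a robustness check in Python. It is my own illustration, not code from the Nature papers: the data are simulated, and the variable names (degree, log_wage, ability) are hypothetical stand-ins tied to the wage example discussed below.

```python
# Toy robustness check: estimate the same effect under several reasonable
# specifications and see whether its sign and rough size survive.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1_000
ability = rng.normal(size=n)               # confounder: drives both degree and wages
degree = (ability + rng.normal(size=n) > 0).astype(float)
log_wage = 0.3 * degree + 0.5 * ability + rng.normal(size=n)  # true effect = 0.3

def effect(y, *regressors):
    """OLS coefficient on the first regressor (here, the degree dummy)."""
    X = sm.add_constant(np.column_stack(regressors))
    return sm.OLS(y, X).fit().params[1]

trim = np.abs(log_wage) < np.quantile(np.abs(log_wage), 0.95)  # drop extreme outcomes
print("spec 1, no controls:    ", effect(log_wage, degree))
print("spec 2, control ability:", effect(log_wage, degree, ability))
print("spec 3, trimmed sample: ", effect(log_wage[trim], degree[trim], ability[trim]))
```

In this simulation, all three specifications agree on the direction of the effect, but the no-controls specification exaggerates its size, the same pattern the replication studies keep finding in published work.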
Here most papers could be called “directionally correct”: 74% of the robustness tests found statistically significant results in the same direction as the original, but only 34% found an effect size very close to the original’s.
When researchers attempted to replicate claims in new datasets (not just using new methods with existing data), only half found statistically significant results in the same direction as the originals, and the effects they found were less than half as large.
Overall this suggests that published social science research usually exaggerates the size of effects, and often claims effects that may not exist. This is far from ideal, but relying on research is still much better than chance: robustness tests found significant effects in the opposite direction from the original paper only 2% of the time.
What does all this mean for consumers of research? It’s always been a good idea to trust whole literatures more than single papers. For economics, the Journal of Economic Perspectives does a great job summing up areas of research in a relatively accessible way.
As a new quick rule of thumb inspired by the Nature papers, you could do worse than “cut estimated effect sizes in half”. If a published paper says that a college degree raises wages 100%, then chances are the degree really does raise wages, but more like 40–50%. In 2005, John Ioannidis said that “most published research findings are false”. By 2026, we seem to have improved to “most published research findings are exaggerated.”