Is Most Published Research Wrong?

In 2011 an article was published in the reputable "Journal of Personality and Social Psychology". It was called "Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect" or, in other words, proof that people can see into the future. The paper reported on nine experiments. In one, participants were shown two curtains on a computer screen and asked to predict which one had an image behind it; the other just covered a blank wall. Once the participant made their selection, the computer randomly positioned an image behind one of the curtains, then the selected curtain was pulled back to show either the image or the blank wall. The images were randomly selected from one of three categories: neutral, negative, or erotic. If participants selected the curtain covering the image, this was considered a hit.

Now, with there being two curtains and the image positioned randomly behind one of them, you would expect the hit rate to be about fifty percent. And that is exactly what the researchers found, at least for negative and neutral images; for erotic images, however, the hit rate was fifty-three percent. Does that mean that we can see into the future? Is that slight deviation significant? Well, to assess significance scientists usually turn to p-values, a statistic that tells you how likely a result at least this extreme is if the null hypothesis is true. In this case the null hypothesis would just be that people couldn't actually see into the future and the 53-percent result was due to lucky guesses. For this study the p-value was .01, meaning there was just a one-percent chance of getting a hit rate of fifty-three percent or higher from simple luck. p-values less than .05 are generally considered significant and worthy of publication, but you might want to use a higher bar before you accept that humans can accurately perceive the future and, say, invite the study's author on your news program; but hey, it's your choice.
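To make the p-value idea concrete, here is a minimal Python sketch of a one-sided exact binomial test against pure guessing. The trial counts are hypothetical and chosen only for illustration: the transcript does not say how many trials the study ran, and the study's own analysis may well have used a different test. The function name `binomial_p_value` is also just an assumption of this sketch.

```python
from math import comb


def binomial_p_value(hits, trials):
    """Chance of at least `hits` successes in `trials` fair 50/50 guesses."""
    # Count every outcome with `hits` or more successes, then divide by
    # the total number of equally likely outcomes (2 ** trials).
    tail = sum(comb(trials, k) for k in range(hits, trials + 1))
    return tail / 2 ** trials


# Hypothetical trial counts (NOT from the study), each with a 53% hit rate.
for trials in (100, 1000, 2000):
    hits = trials * 53 // 100
    print(trials, hits, binomial_p_value(hits, trials))
```

The same 53 percent hit rate gives a smaller p-value as the number of trials grows, which is why a small deviation from chance can still come out as statistically significant in a large enough experiment.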

The .05 threshold, after all, was arbitrarily selected by Ronald Fisher in a book he published in 1925. But this raises the question: how much of the published research literature is actually false? The intuitive answer seems to be five percent. I mean, if everyone is using p less than .05 as a cut-off for statistical significance, you would expect five of every hundred results to be false positives, but that unfortunately grossly underestimates the problem, and here's why. Imagine you're a researcher in a field where there are a thousand hypotheses currently being investigated. Let's assume that ten percent of them reflect true relationships and the rest are false, but no one of course knows which are which; that's the whole point of doing the research. Now, assuming the experiments are pretty well designed, they should correctly identify around, say, 80 of the hundred true relationships. This is known as a statistical power of eighty percent, so 20 results are false negatives; perhaps the sample size was too small or the measurements were not sensitive enough. Now consider that of those 900 false hypotheses, using a p-value threshold of .05, forty-five will be incorrectly considered true.

As for the rest, they will be correctly identified as false, but most journals rarely publish null results: they make up just ten to thirty percent of papers depending on the field. That means the papers that eventually get published will include 80 true positive results, 45 false positive results and maybe 20 true negative results. Nearly a third of published results will be wrong even with the system working normally. Things get even worse if studies are underpowered (and analysis shows they typically are), if there is a higher ratio of false-to-true hypotheses being tested, or if the researchers are biased. All of this was pointed out in a 2005 paper entitled "Why Most Published Research Findings Are False". So, recently, researchers in a number of fields have attempted to quantify the problem by replicating some prominent past results. The Reproducibility Project repeated a hundred psychology studies but found only thirty-six percent had a statistically significant result the second time around, and the strength of measured relationships was on average half that of the original studies.
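The thousand-hypothesis arithmetic above can be reproduced in a few lines of Python. This is only a sketch of the back-of-the-envelope calculation: the function and parameter names are assumptions, and `published_nulls=20` simply encodes the "maybe 20 true negative results" from the text.

```python
def publication_mix(n_hypotheses=1000, share_true=0.10, power=0.80,
                    alpha=0.05, published_nulls=20):
    """Tally published results for a field under the stated assumptions."""
    n_true = n_hypotheses * share_true       # 100 real relationships
    n_false = n_hypotheses - n_true          # 900 false hypotheses
    true_pos = power * n_true                # 80 correctly detected
    false_pos = alpha * n_false              # 45 false alarms
    published = true_pos + false_pos + published_nulls
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        # 45 / (80 + 45 + 20), roughly 0.31
        "share_of_published_wrong": false_pos / published,
        # 45 / (80 + 45) = 0.36 when only positive results are counted
        "share_of_positives_wrong": false_pos / (true_pos + false_pos),
    }


for name, value in publication_mix().items():
    print(f"{name}: {value:.2f}")
```

Calling `publication_mix(power=0.4)` or `publication_mix(share_true=0.05)` shows how quickly the wrong share grows when studies are underpowered or when fewer of the tested hypotheses are true.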

An attempted verification of 53 studies considered landmarks in the basic science of cancer only managed to reproduce six, even working closely with the original studies' authors. These results are even worse than I just calculated. The reason for this is nicely illustrated by a 2015 study showing that eating a bar of chocolate every day can help you lose weight faster. In this case the participants were randomly allocated to one of three treatment groups: one went on a low-carb diet, another on the same low-carb diet plus a 1.5-ounce bar of chocolate per day, and the third group was the control, instructed just to maintain their regular eating habits. At the end of three weeks the control group had neither lost nor gained weight, but both low-carb groups had lost an average of five pounds per person. The group that ate chocolate, however, lost weight ten percent faster than the non-chocolate eaters. The finding was statistically significant, with a p-value less than .05.

As you might expect, this news spread like wildfire: to the front page of Bild, the most widely circulated daily newspaper in Europe, and into the Daily Star, the Irish Examiner, the Huffington Post and even Shape magazine. Unfortunately, the whole thing had been faked, kind of. I mean, researchers did perform the experiment exactly as they described, but they intentionally designed it to increase the likelihood of false positives: the sample size was incredibly small, just five people per treatment group, and for each person 18 different measurements were tracked, including weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, and so on. So if weight loss didn't show a significant difference, there were plenty of other factors that might have. The headline could have been "chocolate lowers cholesterol" or "increases sleep quality" or… something.

The point is: a p-value is only really valid for a single measure. Once you're comparing a whole slew of variables, the probability that at least one of them gives you a false positive goes way up, and this is known as "p-hacking". Researchers can make a lot of decisions about their analysis that can decrease the p-value. For example, let's say you analyze your data and you find it nearly reaches statistical significance, so you decide to collect just a few more data points to be sure. Then, if the p-value drops below .05, you stop collecting data, confident that these additional data points could only have made the result more significant if there were really a true relationship there. But numerical simulations show that relationships can cross the significance threshold by adding more data points even though a much larger sample would show that there really is no relationship.

In fact, there are a great number of ways to increase the likelihood of significant results, like having two dependent variables, adding more observations, controlling for gender, or dropping one of three conditions. Combining all of these strategies together increases the likelihood of a false positive to over sixty percent, and that is using p less than .05.

Now, if you think this is just a problem for psychology, neuroscience or medicine, consider the pentaquark, an exotic particle made up of five quarks, as opposed to the regular three for protons or neutrons.
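Before moving on, two of the p-hacking mechanisms described above can be put into rough numbers with a short Python sketch. It rests on simplifying assumptions that are not from the studies themselves: the 18 measurements are treated as independent, and the "collect a few more data points" scenario is modelled as a z-test with known variance on pure noise, starting at 20 observations and peeking after every 5 more up to 50. All function names and sample sizes are hypothetical.

```python
import math
import random

ALPHA = 0.05


def familywise_error_rate(n_tests, alpha=ALPHA):
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def two_sided_p(values):
    """Two-sided z-test p-value for mean == 0, assuming unit variance."""
    z = sum(values) / math.sqrt(len(values))
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_sims, start_n, max_n, step, peek, rng):
    """Share of pure-noise experiments that end up 'significant'."""
    hits = 0
    for _ in range(n_sims):
        data = [rng.gauss(0, 1) for _ in range(start_n)]
        if peek:
            # Optional stopping: add data only while the result is not
            # yet significant, and stop as soon as it is.
            while two_sided_p(data) >= ALPHA and len(data) < max_n:
                data.extend(rng.gauss(0, 1) for _ in range(step))
        else:
            # Honest design: the sample size is fixed in advance.
            data.extend(rng.gauss(0, 1) for _ in range(max_n - start_n))
        if two_sided_p(data) < ALPHA:
            hits += 1
    return hits / n_sims


rng = random.Random(0)
print("18 independent measures:", round(familywise_error_rate(18), 2))
print("fixed sample size:", false_positive_rate(4000, 20, 50, 5, False, rng))
print("stop when significant:", false_positive_rate(4000, 20, 50, 5, True, rng))
```

With 18 independent measures the chance of at least one spurious "hit" is 1 - 0.95^18, or about 60 percent. The simulation should show the fixed-sample design staying near the nominal five percent while the peeking design lands noticeably above it, even though there is no real effect in the data.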

Particle physics employs particularly stringent requirements for statistical significance, referred to as 5-sigma, or one chance in 3.5 million of getting a false positive. But in 2002 a Japanese experiment found evidence for the Theta-plus pentaquark, and in the two years that followed, 11 other independent experiments looked for and found evidence of that same pentaquark with very high levels of statistical significance. From July 2003 to May 2004 a theoretical paper on pentaquarks was published on average every other day. But alas, it was a false discovery, for further experimental attempts to confirm the Theta-plus pentaquark using greater statistical power failed to find any trace of its existence. The problem was that those first scientists weren't blind to the data: they knew how the numbers were generated and what answer they expected to get, and the way the data was cut and analyzed, or p-hacked, produced the false finding. Now, most scientists aren't p-hacking maliciously. There are legitimate decisions to be made about how to collect, analyze and report data, and these decisions have an impact on the statistical significance of results.
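The 5-sigma figure quoted above can be checked with a short calculation. This sketch assumes the usual convention of a one-sided tail of a normal distribution; the function name `one_sided_tail` is an assumption for illustration.

```python
import math


def one_sided_tail(n_sigma):
    """Probability that a standard normal value exceeds n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))


p_five_sigma = one_sided_tail(5)
print(f"5-sigma p-value: {p_five_sigma:.3g}")
print(f"about one chance in {1 / p_five_sigma:,.0f}")
```

On the same scale, the p less than .05 threshold corresponds to only about 1.6 sigma, which gives a sense of how much stricter the particle physics standard is.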

In one example, 29 different research groups were given the same data and asked to determine if dark-skinned soccer players are more likely to be given red cards. Using identical data, some groups found there was no significant effect while others concluded dark-skinned players were three times as likely to receive a red card. The point is that data doesn't speak for itself; it must be interpreted. Looking at those results, it seems that dark-skinned players are more likely to get red-carded, but certainly not three times as likely. Consensus helps in this case, but for most results only one research group provides the analysis, and therein lies the problem of incentives: scientists have huge incentives to publish papers; in fact, their careers depend on it. As one scientist, Brian Nosek, puts it: "There is no cost to getting things wrong, the cost is not getting them published". Journals are far more likely to publish results that reach statistical significance, so if a method of data analysis results in a p-value less than .05, then you're likely to go with that method. Publication is also more likely if the result is novel and unexpected. This encourages researchers to investigate more and more unlikely hypotheses, which further decreases the ratio of true to spurious relationships that are tested.

Now, what about replication? Isn't science meant to self-correct by having other scientists replicate the findings of an initial discovery? In theory yes, but in practice it's more complicated. Take the precognition study from the start of this video: three researchers attempted to replicate one of those experiments, and what did they find? Well, surprise surprise, the hit rate they obtained was not significantly different from chance. When they tried to publish their findings in the same journal as the original paper, they were rejected. The reason? The journal refuses to publish replication studies. So if you're a scientist, the successful strategy is clear: don't even attempt replication studies, because few journals will publish them, and there is a very good chance that your results won't be statistically significant anyway, in which case, instead of being able to convince colleagues of the lack of reproducibility of an effect, you will be accused of just not doing it right.

So a far better approach is to test novel and unexpected hypotheses and then p-hack your way to a statistically significant result. Now, I don't want to be too cynical about this, because over the past 10 years things have started changing for the better. Many scientists acknowledge the problems I've outlined and are starting to take steps to correct them. There have been more large-scale replication studies undertaken in the last 10 years; there's a site, Retraction Watch, dedicated to publicizing papers that have been withdrawn; there are online repositories for unpublished negative results; and there is a move towards submitting hypotheses and methods for peer review before conducting experiments, with the guarantee that research will be published regardless of results so long as the procedure is followed. This eliminates publication bias, promotes higher-powered studies and lessens the incentive for p-hacking. The thing I find most striking about the reproducibility crisis in science is not the prevalence of incorrect information in published scientific journals. After all, getting to the truth, we know, is hard, and mathematically not everything that is published can be correct.

What gets me is the thought that even trying our best to figure out what's true, using our most sophisticated and rigorous mathematical tools, peer review, and the standards of practice, we still get it wrong so often. So how frequently do we delude ourselves when we're not using the scientific method? As flawed as our science may be, it is far and away more reliable than any other way of knowing that we have.

This episode of Veritasium was supported in part by these fine people on Patreon and by Audible.com, the leading provider of audiobooks online, with hundreds of thousands of titles in all areas of literature including fiction, nonfiction and periodicals. Audible offers a free 30-day trial to anyone who watches this channel; just go to audible.com/veritasium so they know I sent you. A book I'd recommend is called "The Invention of Nature" by Andrea Wulf, which is a biography of Alexander von Humboldt, an adventurer and naturalist who actually inspired Darwin to board the Beagle. You can download that book, or any other of your choosing, for a one-month free trial at audible.com/veritasium. So as always, I want to thank Audible for supporting me, and I really want to thank you for watching.

5 Bad Reasons to Ditch the Paris Climate Agreement

Yesterday the President of the United States, Donald J. Trump, decided to remove the U.S. from the Paris climate agreement, something that was agreed to by basically every country on earth except for Syria and Nicaragua: Syria because it's at war, and Nicaragua because they didn't think it went far enough. Now this just baffles me. I'm trying to understand the reasons why you would do this, why withdraw from this agreement, but none of the stated reasons make any sense to me. So in this video I'm going to break down the top five bad reasons I've heard for why the U.S. is withdrawing from the Paris climate agreement. Okay, number one: because it is bad for the U.S. economy. The U.S. set a target of reducing its emissions from 2005 levels by 26 to 28 percent by 2025, and it has already reduced emissions by around 12 to 14 percent.

So maybe it's fair enough to say that if you wanted to implement some really strict policies and really curb emissions, there might be a way to do harm to the economy in the process. But here's the thing: the Paris agreement is completely non-binding. So if the president didn't want to implement any policies to curb emissions, that would be fine, and he's not going to be president in 2025 anyway, so, I mean, what does it matter? It's a non-binding agreement; there are no repercussions; no one has to do anything. It's mainly just a goal, a target, and that target in itself is not going to harm the U.S. economy. And all of this ignores the fact that the world is moving towards cleaner, greener tech innovation. There's going to be a lot of investment in that area, with estimates of multiple trillions of dollars being invested in this. So if you're a country that doesn't embrace reductions in emissions, then actually you might miss out on investment opportunities and new innovations, and you might lose the opportunity to be a world leader, and that might actually hurt GDP. If you look at the Canadian province of British Columbia, for example, they implemented a carbon tax and reduced per capita fossil fuel use by about 20 percent compared to the rest of Canada; meanwhile their GDP grew at the same rate as the rest of the country. So there isn't a lot of evidence to suggest that reducing emissions directly causes a downturn in the economy.

Which brings us to number two: the free market should decide what technologies take off and what innovations happen. The money, the smart money, should go where the good investment opportunities are; the government shouldn't be deciding who should win and who should lose, or that we should change to a cleaner, greener economy. That is a very American viewpoint on the world, and I like it. I like this idea that markets are smart and they'll put money where it pays returns. The problem is this market has never been fair, and the reason why is that CO2 has not really been considered a pollutant up until now. And to be fair, CO2 doesn't really seem like a pollutant; if you're just emitting a little bit of it, there's no problem. The problem comes when we totally change the amount of CO2 in the atmosphere, and only then because CO2 has this effect of trapping infrared radiation, something scientists figured out, you know, more than 100 years ago. So here's the problem: people have been emitting CO2, which in small amounts is really not a big deal but in large amounts can cause some damage, damage in the form of more intense storms and droughts, and people have to pay for that. So there is a cost actually associated with emitting CO2, except right now that cost is not being borne by the emitters of CO2; it's being borne by the whole world, and that means the market is not a level playing field.

I mean, the analogy for this would be: let's say there's one company that disposes of its pollution appropriately, and that costs some money, so paying this company is more expensive than paying another company which just dumps its pollution in a river and, you know, leaves the communities downstream to deal with it. In that market it's not fair, because people will go to the cheaper option, and it's only cheaper because that company is polluting for free. So in order for free markets to decide and make a fair decision, all I'm saying is we need to factor in the cost of the pollution. This makes cleaner technology way more competitive, so yeah, let's go for a free market solution, but let's make sure the market is truly fair first. Number three: China and India don't have to reduce their emissions, so why should the United States? OK, well, the truth about this is that China and India are setting targets under the Paris agreement to reduce their emissions, but that is per unit of GDP.

The idea is that these countries are still developing, they're still going to grow a lot, and so it seems pretty unfair to curb their emissions so strictly right now. Whereas the U.S. is the biggest historic emitter of carbon dioxide: it has emitted about 30% of the total excess carbon dioxide that is now in the atmosphere. Europe has also emitted about 30%, and that has made those countries very rich and very capable of changing their economies into less polluting economies. So the idea here is that what seems most fair is for the countries that contributed most to the problem to start to take action first, and also because their economies can deal with it; they're rich enough. Also, the economies of the U.S. and Europe don't depend very much on just a lot of energy. I mean, a lot of the sectors, like, you know, financial and technology and innovation, don't require tons of energy to get going, not like building the infrastructure that India and China are going to require in order to lift all of those populations out of poverty. So I think it seems pretty fair for the U.S. and Europe to go first. I don't think this is a case where you point to a country that hasn't really contributed much to the problem and say, well, why aren't they changing first before we do it?

If you created the problem, you need to be one of the first to try to fix it. Number four: the Paris agreement wouldn't do anything to help climate change anyway. Now, it's true that under the current emissions targets that have been set we're not guaranteed to limit warming to under two degrees Celsius, which is what most experts think is kind of a safe level, but it is an important starting point. It is virtually all the countries of the world coming together to agree to do something, and I think once people start taking action to try to achieve these goals, we're going to find that it just gets easier to lower our emissions. So I think the Paris agreement is really a floor, not a ceiling, on what we can do in terms of reducing our emissions; it's really an important first step, and I don't see how anything is gained by leaving it.

Number five: he had to withdraw from the Paris agreement because it's politically unpopular here in the U.S. That is actually just not true. Depending on what poll you look at, roughly seven out of ten Americans think that we should still be in the agreement, 60% of swing voters think that it's good to be part of the deal, and even half of Republicans wanted to stay in. So what really is gained here? I think there's certainly a portion of Trump's base that wanted to see him withdraw from this agreement. It's something he can point to as a campaign pledge that's been fulfilled, and it'll definitely energize that base, but beyond that it's hard to see how this is going to raise his approval ratings much, which currently sit around 39%. And that brings me to bonus reason number six, which perhaps is the real reason that he did this: it was to piss off the opposition.

He wanted a whole bunch of environmentalists whipped into a frenzy so that he could point at them and say, look how crazy these people are and how much they prefer the trees and birds and stuff like that over jobs and the economy and things that people really should care about. The problem is, I mean, that relies on people believing that, you know, these sorts of agreements would be bad for the economy, which I think you can demonstrate from the evidence that they're not. So I think the best response to this decision is not to get angry or inflamed or, you know, go nuts about it, because I think that's kind of maybe why he did it in the first place. I think the best reaction is one that we're already seeing: people around the U.S., cities, states, leaders and business leaders, are all agreeing to work with each other to make sure that the U.S. meets its responsibilities under the Paris climate agreement, whether the federal government actually, you know, signs it, ratifies it or not. And I think that might be the best outcome here: if Trump becomes marginalized and people no longer look to his leadership, that might just make him feel small, which is probably the thing he would hate the most.