
Thursday, 8 February 2018

A simple guide to statistics in the age of deception

Tim Harford in The Financial Times




“The best financial advice for most people would fit on an index card.” That’s the gist of an offhand comment in 2013 by Harold Pollack, a professor at the University of Chicago. Pollack’s bluff was duly called, and he quickly rushed off to find an index card and scribble some bullet points — with respectable results. 


When I heard about Pollack’s notion — he elaborated upon it in a 2016 book — I asked myself: would this work for statistics, too? There are some obvious parallels. In each case, common sense goes a surprisingly long way; in each case, dizzying numbers and impenetrable jargon loom; in each case, there are stubborn technical details that matter; and, in each case, there are people with a sharp incentive to lead us astray. 

The case for everyday practical numeracy has never been more urgent. Statistical claims fill our newspapers and social media feeds, unfiltered by expert judgment and often designed as a political weapon. We do not necessarily trust the experts — or more precisely, we may have our own distinctive view of who counts as an expert and who does not.  

Nor are we passive consumers of statistical propaganda; we are the medium through which the propaganda spreads. We are arbiters of what others will see: what we retweet, like or share online determines whether a claim goes viral or vanishes. If we fall for lies, we become unwittingly complicit in deceiving others. On the bright side, we have more tools than ever to help weigh up what we see before we share it — if we are able and willing to use them. 

In the hope that someone might use it, I set out to write my own postcard-sized citizens’ guide to statistics. Here’s what I learnt. 

Professor Pollack’s index card includes advice such as “Save 20 per cent of your money” and “Pay your credit card in full every month”. The author Michael Pollan offers dietary advice in even pithier form: “Eat Food. Not Too Much. Mostly Plants.” Quite so, but I still want a cheeseburger.  

However good the advice Pollack and Pollan offer, it’s not always easy to take. The problem is not necessarily ignorance. Few people think that Coca-Cola is a healthy drink, or believe that credit cards let you borrow cheaply. Yet many of us fall into some form of temptation or other. That is partly because slick marketers are focused on selling us high-fructose corn syrup and easy credit. And it is partly because we are human beings with human frailties. 

With this in mind, my statistical postcard begins with advice about emotion rather than logic. When you encounter a new statistical claim, observe your feelings. Yes, it sounds like a line from Star Wars, but we rarely believe anything because we’re compelled to do so by pure deduction or irrefutable evidence. We have feelings about many of the claims we might read — anything from “inequality is rising” to “chocolate prevents dementia”. If we don’t notice and pay attention to those feelings, we’re off to a shaky start. 

What sort of feelings? Defensiveness. Triumphalism. Righteous anger. Evangelical fervour. Or, when it comes to chocolate and dementia, relief. It’s fine to have an emotional response to a chart or shocking statistic — but we should not ignore that emotion, or be led astray by it. 

There are certain claims that we rush to tell the world, others that we use to rally like-minded people, still others we refuse to believe. Our belief or disbelief in these claims is part of who we feel we are. “We all process information consistent with our tribe,” says Dan Kahan, professor of law and psychology at Yale University. 

In 2005, Charles Taber and Milton Lodge, political scientists at Stony Brook University, New York, conducted experiments in which subjects were invited to study arguments around hot political issues. Subjects showed a clear confirmation bias: they sought out testimony from like-minded organisations. For example, subjects who opposed gun control would tend to start by reading the views of the National Rifle Association. Subjects also showed a disconfirmation bias: when the researchers presented them with certain arguments and invited comment, the subjects would quickly accept arguments with which they agreed, but devote considerable effort to disparaging opposing arguments.

Expertise is no defence against this emotional reaction; in fact, Taber and Lodge found that better-informed experimental subjects showed stronger biases. The more they knew, the more cognitive weapons they could aim at their opponents. “So convenient a thing it is to be a reasonable creature,” commented Benjamin Franklin, “since it enables one to find or make a reason for everything one has a mind to do.” 

This is why it’s important to face up to our feelings before we even begin to process a statistical claim. If we don’t at least acknowledge that we may be bringing some emotional baggage along with us, we have little chance of discerning what’s true. As the physicist Richard Feynman once commented, “You must not fool yourself — and you are the easiest person to fool.” 

The second crucial piece of advice is to understand the claim. That seems obvious. But all too often we leap to disbelieve or believe (and repeat) a claim without pausing to ask whether we really understand what the claim is. To quote Douglas Adams’s philosophical supercomputer, Deep Thought, “Once you know what the question actually is, you’ll know what the answer means.” 

For example, take the widely accepted claim that “inequality is rising”. It seems uncontroversial, and urgent. But what does it mean? Racial inequality? Gender inequality? Inequality of opportunity, of consumption, of educational attainment, of wealth? Within countries or across the globe?

Even given a narrower claim, “inequality of income before taxes is rising” (and you should be asking yourself, since when?), there are several different ways to measure this. One approach is to compare the income of people at the 90th percentile and the 10th percentile, but that tells us nothing about the super-rich, nor the ordinary people in the middle. An alternative is to examine the income share of the top 1 per cent — but this approach has the opposite weakness, telling us nothing about how the poorest fare relative to the majority.  
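To see how these two measures can tell different stories, here is a minimal sketch in Python. The incomes are invented for illustration, and the helper names `percentile`, `ratio_90_10` and `top_share` are assumptions of this sketch rather than any standard method.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the value at rank ceil(n * p / 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * p / 100))
    return ordered[rank - 1]

def ratio_90_10(incomes):
    """Income at the 90th percentile divided by income at the 10th."""
    return percentile(incomes, 90) / percentile(incomes, 10)

def top_share(incomes, fraction=0.01):
    """Share of total income received by the top `fraction` of earners."""
    ordered = sorted(incomes, reverse=True)
    k = max(1, math.ceil(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)

# Invented incomes for 100 people; only the top earner changes between years.
rest = [10_000] * 10 + [30_000] * 79 + [90_000] * 10
before = rest + [200_000]
after = rest + [2_000_000]

print(ratio_90_10(before), ratio_90_10(after))  # 9.0 9.0
print(round(top_share(before), 3), round(top_share(after), 3))
```

In this made-up example the 90/10 ratio does not move at all, while the top 1 per cent share jumps from under 6 per cent to about 37 per cent. Both statements about "inequality" are true of the same data.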

There is no single right answer — nor should we assume that all the measures tell a similar story. In fact, there are many true statements that one can make about inequality. It may be worth figuring out which one is being made before retweeting it. 

Perhaps it is not surprising that a concept such as inequality turns out to have hidden depths. But the same holds true of more tangible subjects, such as “a nurse”. Are midwives nurses? Health visitors? Should two nurses working half-time count as one nurse? Claims over the staffing of the UK’s National Health Service have turned on such details. 

All this can seem like pedantry — or worse, a cynical attempt to muddy the waters and suggest that you can prove anything with statistics. But there is little point in trying to evaluate whether a claim is true if one is unclear what the claim even means. 

Imagine a study showing that kids who play violent video games are more likely to be violent in reality. Rebecca Goldin, a mathematician and director of the statistical literacy project STATS, points out that we should ask questions about concepts such as “play”, “violent video games” and “violent in reality”. Is Space Invaders a violent game? It involves shooting things, after all. And are we measuring a response to a questionnaire after 20 minutes’ play in a laboratory, or murderous tendencies in people who play 30 hours a week? “Many studies won’t measure violence,” says Goldin. “They’ll measure something else such as aggressive behaviour.” Just like “inequality” or “nurse”, these seemingly common sense words hide a lot of wiggle room. 

Two particular obstacles to our understanding are worth exploring in a little more detail. One is the question of causation. “Taller children have a higher reading age,” goes the headline. This may summarise the results of a careful study about nutrition and cognition. Or it may simply reflect the obvious point that eight-year-olds read better than four-year-olds — and are taller. Causation is philosophically and technically a knotty business but, for the casual consumer of statistics, the question is not so complicated: just ask whether a causal claim is being made, and whether it might be justified. 
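The height and reading example can be made concrete with a small, deliberately artificial dataset. In this sketch every number is invented: height and reading score both depend on age, but within each age group they are constructed to be unrelated. The `pearson` helper is a plain implementation of the correlation coefficient.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical typical (height in cm, reading score) for each age.
BASE = {4: (102, 10), 6: (115, 30), 8: (128, 50)}
# Within an age group, deviations in height and reading are unrelated.
OFFSETS = [(-2, -1), (-2, 1), (2, -1), (2, 1)]

children = [
    (age, height + dh, reading + dr)
    for age, (height, reading) in BASE.items()
    for dh, dr in OFFSETS
]

overall = pearson([c[1] for c in children], [c[2] for c in children])
within = {
    age: pearson(
        [c[1] for c in children if c[0] == age],
        [c[2] for c in children if c[0] == age],
    )
    for age in BASE
}

print(round(overall, 2))  # strong correlation across all children
print(within)             # zero correlation inside every age group
```

Across all the children the correlation is close to 1, yet among children of the same age it is exactly zero. The headline would be true of this data, and the causal reading of it would be false.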

Returning to this study about violence and video games, we should ask: is this a causal relationship, tested in experimental conditions? Or is this a broad correlation, perhaps because the kind of thing that leads kids to violence also leads kids to violent video games? Without clarity on this point, we don’t really have anything but an empty headline.  

We should never forget, either, that all statistics are a summary of a more complicated truth. For example, what’s happening to wages? With tens of millions of wage packets being paid every month, we can only ever summarise — but which summary? The average wage can be skewed by a small number of fat cats. The median wage tells us about the centre of the distribution but ignores everything else. 
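A quick sketch with invented wages shows how far apart the two summaries can sit. The ten figures below are made up for illustration: nine ordinary earners and one fat cat.

```python
from statistics import mean, median

# Invented annual wages: nine ordinary earners and one very high earner.
wages = [22_000, 24_000, 25_000, 26_000, 27_000,
         28_000, 30_000, 32_000, 36_000, 2_000_000]

average_wage = mean(wages)    # 225,000: dragged upwards by a single person
median_wage = median(wages)   # 27,500: the middle of the distribution

print(average_wage, median_wage)
```

Nobody in this list earns anything close to the average, while the median says nothing about the person at the top.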

Or we might look at the median increase in wages, which isn’t the same thing as the increase in the median wage — not at all. In a situation where the lowest and highest wages are increasing while the middle sags, it’s quite possible for the median pay rise to be healthy while median pay falls.  
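Here is a minimal sketch of that situation, using five invented workers whose pay is tracked from one year to the next. The numbers are assumptions chosen to make the point, not real data.

```python
from statistics import median

# Invented pay for the same five workers in two consecutive years.
# The lowest and highest earners gain; the worker in the middle loses.
pay_before = [10_000, 20_000, 30_000, 40_000, 50_000]
pay_after = [13_000, 24_000, 27_000, 44_000, 55_000]

# Each worker's individual pay rise, then the median of those rises.
rises = [after - before for before, after in zip(pay_before, pay_after)]
median_rise = median(rises)

# The change in the median wage itself.
change_in_median = median(pay_after) - median(pay_before)

print(median_rise)       # 4000
print(change_in_median)  # -3000
```

The typical worker got a rise of 4,000, and yet the median wage fell by 3,000. Both are honest summaries of the same five pay packets.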

Sir Andrew Dilnot, former chair of the UK Statistics Authority, warns that an average can never convey the whole of a complex story. “It’s like trying to see what’s in a room by peering through the keyhole,” he tells me.  

In short, “you need to ask yourself what’s being left out,” says Mona Chalabi, data editor for The Guardian US. That applies to the obvious tricks, such as a vertical axis that’s been truncated to make small changes look big. But it also applies to the less obvious stuff — for example, why does a graph comparing the wages of African-Americans with those of white people not also include data on Hispanic or Asian-Americans? There is no shame in leaving something out. No chart, table or tweet can contain everything. But what is missing can matter. 

Channel the spirit of film noir: get the backstory. Of all the statistical claims in the world, this particular stat fatale appeared in your newspaper or social media feed, dressed to impress. Why? Where did it come from? Why are you seeing it?  

Sometimes the answer is little short of a conspiracy: a PR company wanted to sell ice cream, so paid a penny-ante academic to put together the “equation for the perfect summer afternoon”, pushed out a press release on a quiet news day, and won attention in a media environment hungry for clicks. Or a political donor slung a couple of million dollars at an ideologically sympathetic think-tank in the hope of manufacturing some talking points. 

Just as often, the answer is innocent but unedifying: publication bias. A study confirming what we already knew — smoking causes cancer — is unlikely to make news. But a study with a surprising result — maybe smoking doesn’t cause cancer after all — is worth a headline. The new study may have been rigorously conducted but is probably wrong: one must weigh it up against decades of contrary evidence. 

Publication bias is a big problem in academia. The surprising results get published, the follow-up studies finding no effect tend to appear in lesser journals if they appear at all. It is an even bigger problem in the media — and perhaps bigger yet in social media. Increasingly, we see a statistical claim because people like us thought it was worth a Like on Facebook. 

David Spiegelhalter, president of the Royal Statistical Society, proposes what he calls the “Groucho principle”. Groucho Marx famously resigned from a club — if they’d accept him as a member, he reasoned, it couldn’t be much of a club. Spiegelhalter feels the same about many statistical claims that reach the headlines or the social media feed. He explains, “If it’s surprising or counter-intuitive enough to have been drawn to my attention, it is probably wrong.”  

OK. You’ve noted your own emotions, checked the backstory and understood the claim being made. Now you need to put things in perspective. A few months ago, a horrified citizen asked me on Twitter whether it could be true that in the UK, seven million disposable coffee cups were thrown away every day.  

I didn’t have an answer. (A quick internet search reveals countless repetitions of the claim, but no obvious source.) But I did have an alternative question: is that a big number? The population of the UK is 65 million. If one person in 10 used a disposable cup each day, that would do the job.  
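The arithmetic is simple enough to do on the back of an envelope, but as a sketch it looks like this. The two inputs are the figures quoted above; the variable names are just for illustration.

```python
cups_per_day = 7_000_000       # the claim being checked
uk_population = 65_000_000     # population figure used in the text

cups_per_person_per_day = cups_per_day / uk_population
days_between_cups = uk_population / cups_per_day

print(round(cups_per_person_per_day, 3))  # 0.108
print(round(days_between_cups, 1))        # 9.3
```

That is roughly one cup per person every nine or ten days, which is far easier to judge than seven million.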

Many numbers mean little until we can compare them with a more familiar quantity. It is much more informative to know how many coffee cups a typical person discards than to know how many are thrown away by an entire country. And more useful still to know whether the cups are recycled (usually not, alas) or what proportion of the country’s waste stream is disposable coffee cups (not much, is my guess, but I may be wrong).  

So we should ask: how big is the number compared with other things I might intuitively understand? How big is it compared with last year, or five years ago, or 30? It’s worth a look at the historical trend, if the data are available.  

Finally, beware “statistical significance”. There are various technical objections to the term, some of which are important. But the simplest point to appreciate is that a number can be “statistically significant” while being of no practical importance. Particularly in the age of big data, it’s possible for an effect to clear this technical hurdle of statistical significance while being tiny. 

One study was able to demonstrate that unborn children exposed to a heatwave while in the womb went on to earn less as adults. The finding was statistically significant. But the impact was trivial: $30 in lost income per year. Just because a finding is statistically robust does not mean it matters; the word “significance” obscures that. 
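The gap between statistical and practical significance can be sketched with a simple two-sample z-test. Only the $30 difference comes from the text above. The standard deviation of income and both sample sizes are assumptions chosen for illustration, and this is not the method used in the study.

```python
import math

def two_sample_z(mean_diff, sd, n_per_group):
    """z statistic for a difference in means between two equal-sized groups."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return mean_diff / standard_error

def two_sided_p(z):
    """Two-sided p-value under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

effect = 30          # dollars per year, from the text
income_sd = 20_000   # assumed spread of annual incomes

z_big = two_sample_z(effect, income_sd, n_per_group=10_000_000)
z_small = two_sample_z(effect, income_sd, n_per_group=1_000)

print(round(z_big, 2), two_sided_p(z_big))      # clears the usual 0.05 hurdle
print(round(z_small, 2), two_sided_p(z_small))  # indistinguishable from noise
```

Under these assumptions the same $30 effect is "significant" with ten million people per group and invisible with a thousand. The size of the effect never changed; only the size of the dataset did.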

In an age of computer-generated images of data clouds, some of the most charming data visualisations are hand-drawn doodles by the likes of Mona Chalabi and the cartoonist Randall Munroe. But there is more to these pictures than charm: Chalabi uses the wobble of her pen to remind us that most statistics have a margin of error. A computer plot can confer the illusion of precision on what may be a highly uncertain situation. 

“It is better to be vaguely right than exactly wrong,” wrote Carveth Read in Logic (1898), and excessive precision can lead people astray. On the eve of the US presidential election in 2016, the political forecasting website FiveThirtyEight gave Donald Trump a 28.6 per cent chance of winning. In some ways that is impressive, because other forecasting models gave Trump barely any chance at all. But how could anyone justify the decimal point on such a forecast? No wonder many people missed the basic message, which was that Trump had a decent shot. “One in four” would have been a much more intuitive guide to the vagaries of forecasting.

Exaggerated precision has another cost: it makes numbers needlessly cumbersome to remember and to handle. So, embrace imprecision. The budget of the NHS in the UK is about £10bn a month. The national income of the United States is about $20tn a year. One can be much more precise about these things, but carrying the approximate numbers around in my head lets me judge pretty quickly when — say — a £50m spending boost or a $20bn tax cut is noteworthy, or a rounding error. 
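As a sketch of that mental check, using the round numbers above and assuming that both the spending boost and the tax cut are annual amounts:

```python
def share_of(amount, total):
    """Fraction of `total` represented by `amount`."""
    return amount / total

nhs_annual_budget = 10e9 * 12   # about £10bn a month
us_national_income = 20e12      # about $20tn a year

boost_share = share_of(50e6, nhs_annual_budget)
tax_cut_share = share_of(20e9, us_national_income)

print(f"{boost_share:.3%}")    # 0.042%
print(f"{tax_cut_share:.3%}")  # 0.100%
```

On these rough figures, a £50m boost is about one twenty-fifth of 1 per cent of the NHS budget, and a $20bn tax cut is one tenth of 1 per cent of US national income.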

My favourite rule of thumb is that since there are 65 million people in the UK and people tend to live a bit longer than 65, the size of a typical cohort — everyone retiring or leaving school in a given year — will be nearly a million people. Yes, it’s a rough estimate — but vaguely right is often good enough. 
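The rule of thumb is a single division. In this sketch the 65 million and the 65 years come from the text; the 80-year lifespan is an assumed round figure used only to show how "a bit longer than 65" pulls the estimate below a million.

```python
def cohort_size(population, years_of_life):
    """Rough size of a one-year age cohort, assuming a flat age distribution."""
    return population / years_of_life

upper_estimate = cohort_size(65_000_000, 65)    # 1,000,000
longer_lived = cohort_size(65_000_000, 80)      # 812,500 (assumed lifespan)

print(upper_estimate, longer_lived)
```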

Be curious. Curiosity is bad for cats, but good for stats. Curiosity is a cardinal virtue because it encourages us to work a little harder to understand what we are being told, and to enjoy the surprises along the way.  

This is partly because almost any statistical statement raises questions: who claims this? Why? What does this number mean? What’s missing? We have to be willing — in the words of UK statistical regulator Ed Humpherson — to “go another click”. If a statistic is worth sharing, isn’t it worth understanding first? The digital age is full of informational snares — but it also makes it easier to look a little deeper before our minds snap shut on an answer.  

While curiosity gives us the motivation to ask another question or go another click, it gives us something else, too: a willingness to change our minds. For many of the statistical claims that matter, we have already reached a conclusion. We already know what our tribe of right-thinking people believe about Brexit, gun control, vaccinations, climate change, inequality or nationalisation — and so it is natural to interpret any statistical claim as either a banner to wave, or a threat to avoid.  

Curiosity can put us into a better frame of mind to engage with statistical surprises. If we treat them as mysteries to be resolved, we are more likely to spot statistical foul play, but we are also more open-minded when faced with rigorous new evidence. 

In research with Asheley Landrum, Katie Carpenter, Laura Helft and Kathleen Hall Jamieson, Dan Kahan has discovered that people who are intrinsically curious about science — they exist across the political spectrum — tend to be less polarised in their response to questions about politically sensitive topics. We need to treat surprises as a mystery rather than a threat.  

Isaac Asimov is thought to have said, “The most exciting phrase in science isn’t ‘Eureka!’, but ‘That’s funny…’” The quip points to an important truth: if we treat the open question as more interesting than the neat answer, we’re on the road to becoming wiser.  

In the end, my postcard has 50-ish words and six commandments. Simple enough, I hope, for someone who is willing to make an honest effort to evaluate — even briefly — the statistical claims that appear in front of them. That willingness, I fear, is what is most in question.  

“Hey, Bill, Bill, am I gonna check every statistic?” said Donald Trump, then presidential candidate, when challenged by Bill O’Reilly about a grotesque lie that he had retweeted about African-Americans and homicides. And Trump had a point — sort of. He should, of course, have got someone to check a statistic before lending his megaphone to a false and racist claim. We all know by now that he simply does not care. 

But Trump’s excuse will have struck a chord with many, even those who are aghast at his contempt for accuracy (and much else). He recognised that we are all human. We don’t check everything; we can’t. Even if we had all the technical expertise in the world, there is no way that we would have the time. 

My aim is more modest. I want to encourage us all to make the effort a little more often: to be open-minded rather than defensive; to ask simple questions about what things mean, where they come from and whether they would matter if they were true. And, above all, to show enough curiosity about the world to want to know the answers to some of these questions — not to win arguments, but because the world is a fascinating place. 

Saturday, 22 October 2016

So much for scientific publications: Nonsense paper written by iOS autocomplete accepted for conference

Elle Hunt in The Guardian


A nonsensical academic paper on nuclear physics written only by iOS autocomplete has been accepted for a scientific conference.


Christoph Bartneck, an associate professor at the Human Interface Technology laboratory at the University of Canterbury in New Zealand, received an email inviting him to submit a paper to the International Conference on Atomic and Nuclear Physics in the US in November.

“Since I have practically no knowledge of nuclear physics I resorted to iOS autocomplete function to help me writing the paper,” he wrote in a blog post on Thursday. “I started a sentence with ‘atomic’ or ‘nuclear’ and then randomly hit the autocomplete suggestions.”

“The atoms of a better universe will have the right for the same as you are the way we shall have to be a great place for a great time to enjoy the day you are a wonderful person to your great time to take the fun and take a great time and enjoy the great day you will be a wonderful time for your parents and kids,” is a sample sentence from the abstract.

It concludes: “Power is not a great place for a good time.”

Bartneck illustrated the paper – titled, again through autocorrect, “Atomic Energy will have been made available to a single source” – with the first graphic on the Wikipedia entry for nuclear physics.

He submitted it under a fake identity: associate professor Iris Pear of the US, whose experience in atomic and nuclear physics was outlined in a biography using contradictory gender pronouns.

The nonsensical paper was accepted only three hours later, in an email asking Bartneck to confirm his slot for the “oral presentation” at the international conference.

“I know that iOS is a pretty good software, but reaching tenure has never been this close,” Bartneck commented in the blog post.

He did not have to pay to submit the paper, but the acceptance letter directed him to register for the conference as an academic speaker at a cost of US$1099 (also payable in euros or pounds).

“I did not complete this step since my university would certainly object to me wasting money this way,” Bartneck told Guardian Australia. “... My impression is that this is not a particularly good conference.”

The International Conference on Atomic and Nuclear Physics will be held on 17-18 November in Atlanta, Georgia, and is organised by ConferenceSeries: “an amalgamation of Open Access Publications and worldwide international science conferences and events”, established in 2007.

An organiser has been contacted by Guardian Australia for comment.

Bartneck said that given the quality of the review process and the steep registration fee, he was “reasonably certain that this is a money-making conference with little to no commitment to science.

“I did not yet reply to their email, but I am tempted to ask them about the reviewers’ comments. That might be a funny one.”

The conference’s call for abstracts makes only a little more sense than Bartneck’s paper.

“Nuclear and sub-atomic material science it the investigation of the properties, flow and collaborations of the essential (however not major) building pieces of matter.”

A bogus research paper reading only “Get me off Your Fucking Mailing List” repeated over and over again was accepted by the International Journal of Advanced Computer Technology, an open-access academic journal, in November 2014.

Monday, 16 July 2012

Free access to British scientific research within two years


Radical shakeup of academic publishing will allow papers to be put online and be accessed by universities, firms and individuals
Professor Dame Janet Finch's recommendations on open access publishing prompted the government's decision.
 
The government is to unveil controversial plans to make publicly funded scientific research immediately available for anyone to read for free by 2014, in the most radical shakeup of academic publishing since the invention of the internet.

Under the scheme, research papers that describe work paid for by the British taxpayer will be free online for universities, companies and individuals to use for any purpose, wherever they are in the world.

In an interview with the Guardian before Monday's announcement David Willetts, the universities and science minister, said he expected a full transformation to the open approach over the next two years.

The move reflects a groundswell of support for "open access" publishing among academics who have long protested that journal publishers make large profits by locking research behind online paywalls. "If the taxpayer has paid for this research to happen, that work shouldn't be put behind a paywall before a British citizen can read it," Willetts said.

"This will take time to build up, but within a couple of years we should see this fully feeding through."

He said he thought there would be "massive" economic benefits to making research open to everyone.

Though many academics will welcome the announcement, some scientists contacted by the Guardian were dismayed that the cost of the transition, which could reach £50m a year, must be covered by the existing science budget and that no new money would be found to fund the process. That could lead to less research and fewer valuable papers being published.

British universities now pay around £200m a year in subscription fees to journal publishers, but under the new scheme, authors will pay "article processing charges" (APCs) to have their papers peer reviewed, edited and made freely available online. The typical APC is around £2,000 per article.
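The figures quoted in this article can be cross-checked with a few lines of arithmetic. This sketch uses only numbers that appear in the article, including the £4.6bn science budget that Willetts cites; the variable names are illustrative.

```python
transition_cost = 50e6        # up to £50m a year
typical_apc = 2_000           # typical article processing charge, in pounds
subscriptions = 200e6         # annual journal subscription spend
science_budget = 4.6e9        # ring-fenced science budget

# How many typical APCs the transition money would cover in a year.
articles_funded = transition_cost / typical_apc

# The transition cost as a share of the science budget and of subscriptions.
share_of_budget = transition_cost / science_budget
share_of_subscriptions = transition_cost / subscriptions

print(articles_funded)                   # 25000.0
print(f"{share_of_budget:.2%}")          # 1.09%
print(f"{share_of_subscriptions:.0%}")   # 25%
```

On these figures, £50m a year would pay for about 25,000 articles at the typical charge, which is a little over 1 per cent of the science budget and a quarter of current subscription spending.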

Tensions between academics and the larger publishing companies have risen steeply in recent months as researchers have baulked at journal subscription charges their libraries were asked to pay.

More than 12,000 academics have boycotted the Dutch publisher Elsevier, as part of a broader campaign against the industry that has been called the "academic spring".

The government's decision is outlined in a formal response to recommendations made in a major report into open access publishing led by Professor Dame Janet Finch, a sociologist at Manchester University. Willetts said the government accepted all the proposals, except for a specific point on VAT that was under consideration at the Treasury.

Further impetus to open access is expected in coming days or weeks when the Higher Education Funding Council for England emphasises the need for research articles to be freely available when they are submitted for the Research Excellence Framework, which is used to determine how much research funding universities receive.

The Finch report strongly recommended so-called "gold" open access, which ensures the financial security of the journal publishers by essentially swapping their revenue from library budgets to science budgets. One alternative favoured by many academics, called "green" open access, allows researchers to make their papers freely available online after they have been accepted by journals. It is likely this would be fatal for publishers and also Britain's learned societies, which survive through selling journal subscriptions.

"There is a genuine value in academic publishing which has to be reflected and we think that is the case for gold open access, which includes APCs," Willetts told the Guardian. "There is a transitional cost to go through, but it's overall of benefit to our research community and there's general acceptance it's the right thing to do.

"We accept that some of this cost will fall on the ring-fenced science budget, which is £4.6bn. In Finch's highest estimation that will be 1% of the science budget going to pay for gold open access, at least before we get to a new steady state, when we hope competition will bring down author charges and universities will make savings as they don't have to pay so much in journal subscriptions," he added.

"The real economic impact is we are throwing open, to academics, researchers, businesses and lay people, all the high quality research that is publicly funded. I think there's a massive net economic benefit here way beyond any £50m from the science budget," Willetts said.

In making such a concerted move towards open access before other countries, Britain will be giving its research away free while still paying for access to articles from other countries.

Willetts said he hoped the EU would soon take the same path when it announced the next tranche of Horizon 2020 grants, which are available for projects that run from 2014. The US already makes research funded by its National Institutes of Health open access, and is expected to make more of its publicly funded research freely available online.

Professor Adam Tickell, pro-vice chancellor of research and knowledge transfer at Birmingham University, and a member of the Finch working group, said he was glad the government had endorsed the recommendations, but warned there was a danger of Britain losing research projects in the uncertain transition to open access publishing.

"If the EU and the US go in for open access in a big way, then we'll move into this open access world with no doubt at all, and I strongly believe that in a decade that's where we'll be. But it's the period of transition that's the worry. The UK publishes only 6% of global research, and the rest will remain behind a paywall, so we'll still have to pay for a subscription," Tickell said.

"I am very concerned that there are not any additional funds to pay for the transition, because the costs will fall disproportionately on the research intensive universities. There isn't the fat in the system that we can easily pay for that." The costs would lead to "a reduction in research grants, or an effective charge on our income" he said.

Another consequence of the shift could be a "rationing" of research papers from universities as competition for funds to publish papers intensifies. This could be harmful, Tickell said. For example, a study that finds no beneficial effect of a drug might be seen as a negative result and go unpublished, he said.

Stevan Harnad, professor of electronics and computer science at Southampton University, said the government was facing an expensive bill in supporting gold open access over the green open access model.

He said UK universities and research funders had been leading the world in the movement towards "green" open access that requires researchers to self-archive their journal articles on the web, and make them free for all.

"The Finch committee's recommendations look superficially as if they are supporting open access, but in reality they are strongly biased in favour of the interests of the publishing industry over the interests of UK research," he said.

"Instead of recommending that the UK build on its historic lead in providing cost-free green open access, the committee has recommended spending a great deal of extra money — scarce research money — to pay publishers for "gold" open access publishing. If the Finch committee recommendations are heeded, as David Willetts now proposes, the UK will lose both its global lead in open access and a great deal of public money — and worldwide open access will be set back at least a decade," he said.