
Showing posts with label correlation. Show all posts

Friday 1 June 2018

I can make one confident prediction: my forecasts will fail

Tim Harford in The Financial Times 

I am not one of those clever people who claims to have seen the 2008 financial crisis coming, but by this time 10 years ago I could see that the fallout was going to be bad. Banking crises are always damaging, and this was a big one. The depth of the recession and the long-lasting hit to productivity came as no surprise to me. I knew it would happen. 


Or did I? This is the story I tell myself, but if I am honest I do not really know. I did not keep a diary, and so must rely on my memory — which, it turns out, is not a reliable servant. 

In 1972, the psychologists Baruch Fischhoff and Ruth Beyth conducted a survey in which they asked for predictions about Richard Nixon’s imminent presidential visit to China and Russia. How likely was it that Nixon and Mao Zedong would meet? What were the chances that the US would grant diplomatic recognition to China? Professors Fischhoff and Beyth wanted to know how people would later remember their forecasts. Since their subjects had taken the unusual step of writing down a specific probability for each of 15 outcomes, one might have hoped for accuracy. But no — the subjects flattered themselves hopelessly. The Fischhoff-Beyth paper was titled, “I knew it would happen”. 

This is a reminder of what a difficult task we face when we try to make big-picture macroeconomic and geopolitical forecasts. To start with, the world is a complicated place, which makes predictions challenging. For many of the subjects that interest us, there is a substantial delay between the forecast and the outcome, and this delayed feedback makes it harder to learn from our successes and failures. Even worse, as Profs Fischhoff and Beyth discovered, we systematically misremember what we once believed. 

Small wonder that forecasters turn to computers for help. We have also known for a long time — since work in the 1950s by the late psychologist Paul Meehl — that simple statistical rules often outperform expert intuition. Meehl’s initial work focused on clinical cases — for example, faced with a patient suffering chest pains, could a two or three-point checklist beat the judgment of an expert doctor? The experts did not fare well. However, Meehl’s rules, like more modern machine learning systems, require data to work. It is all very well for Amazon to forecast what impact a price drop may have on the demand for a book — and some of the most successful hedge funds use algorithmically-driven strategies — but trying to forecast the chance of Italy leaving the eurozone, or Donald Trump’s impeachment, is not as simple. Faced with an unprecedented situation, machines are no better than we are. And they may be worse. 

Much of what we know about forecasting in a complex world, we know from the research of the psychologist Philip Tetlock. In the 1980s, Prof Tetlock began to build on the Fischhoff-Beyth research by soliciting specific and often long-term forecasts from a wide variety of forecasters — initially hundreds. The early results, described in Prof Tetlock’s book Expert Political Judgement, were not encouraging. Yet his idea of evaluating large numbers of forecasters over an extended period of time has blossomed, and some successful forecasters have emerged. 

The latest step in this research is a “Hybrid Forecasting Tournament”, sponsored by the US Intelligence Advanced Research Projects Activity, designed to explore ways in which humans and machine learning systems can co-operate to produce better forecasts. We await the results. If the computers do produce some insight, it may be because they can tap into data that we could hardly have imagined using before. Satellite imaging can now track the growth of crops or the stockpiling of commodities such as oil. Computers can guess at human sentiment by analysing web searches for terms such as “job seekers allowance”, mentions of “recession” in news stories, and positive emotions in tweets. 

And there are stranger correlations, too. A study by economists Kasey Buckles, Daniel Hungerman and Steven Lugauer showed that a few quarters before an economic downturn in the US, the rate of conceptions also falls. Conceptions themselves may be deducible by computers tracking sales of pregnancy tests and folic acid. 

Back in 1991, a psychologist named Harold Zullow published research suggesting that the emotional content of songs in the Billboard Hot 100 chart could predict recessions. Hits containing “pessimistic rumination” (“I heard it through the grapevine / Not much longer would you be mine”) tended to predict an economic downturn. 

His successor is a young economist named Hisam Sabouni, who reckons that a computer-aided analysis of Spotify streaming gives him an edge in forecasting stock market movements and consumer sentiment. Will any of this prove useful for forecasting significant economic and political events? Perhaps. But for now, here is an easy way to use a computer to help you forecast: open up a spreadsheet, note down what you believe today, and regularly revisit and reflect. The simplest forecasting tip of all is to keep score.

Monday 18 July 2016

A nine-point guide to spotting a dodgy statistic

 
Boris Johnson did not remove the £350m figure from the Leave campaign bus even after it had been described as ‘misleading’. Photograph: Stefan Rousseau/PA


David Spiegelhalter in The Guardian

I love numbers. They allow us to get a sense of magnitude, to measure change, to put claims in context. But despite their bold and confident exterior, numbers are delicate things and that’s why it upsets me when they are abused. And since there’s been a fair amount of number abuse going on recently, it seems a good time to have a look at the classic ways in which politicians and spin doctors meddle with statistics.

Every statistician is familiar with the tedious “Lies, damned lies, and statistics” gibe, but the economist, writer and presenter of Radio 4’s More or Less, Tim Harford, has identified the habit of some politicians as not so much lying – to lie means having some knowledge of the truth – as “bullshitting”: a carefree disregard of whether the number is appropriate or not.

So here, with some help from the UK fact-checking organisation Full Fact, is a nine-point guide to what’s really going on.

Use a real number, but change its meaning


There’s almost always some basis for numbers that get quoted, but it’s often rather different from what is claimed. Take, for example, the famous £350m, as in the “We send the EU £350m a week” claim plastered over the big red Brexit campaign bus. This is a true National Statistic (see Table 9.9 of the ONS Pink Book 2015), but, in the words of Sir Andrew Dilnot, chair of the UK Statistics Authority watchdog, it “is not an amount of money that the UK pays to the EU”. In fact, the UK’s net contribution is more like £250m a week when Britain’s rebate is taken into account – and much of that is returned in the form of agricultural subsidies and grants to poorer UK regions, reducing the figure to £136m. Sir Andrew expressed disappointment that this “misleading” claim was being made by Brexit campaigners but this ticking-off still did not get the bus repainted.
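The successive deductions are simple arithmetic; a minimal sketch using the figures quoted above:

```python
# Weekly figures from the article, at successive stages of adjustment.
gross = 350e6              # headline amount "sent to the EU" per week
net_after_rebate = 250e6   # after Britain's rebate is taken into account
net_after_returns = 136e6  # after EU spending returned to the UK

# The headline overstates the final net figure by roughly 2.6 times.
overstatement = gross / net_after_returns
print(f"Headline overstates the net figure by {overstatement:.1f}x")
```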


George Osborne quoted the Treasury’s projection of £4,300 as the cost per household of leaving the EU. Photograph: Matt Cardy/Getty Images


Make the number look big (but not too big) 

Why did the Leave campaign frame the amount of money as “£350m per week”, rather than the equivalent “£19bn a year”? They probably realised that, once numbers get large, say above 10m, they all start seeming the same – all those extra zeros have diminishing emotional impact. Billions, schmillions, it’s just a Big Number.

Of course they could have gone the other way and said “£50m a day”, but then people might have realised that this is equivalent to around a packet of crisps each, which does not sound so impressive.
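The framing arithmetic can be sketched quickly (the population figure here is a rough round-number assumption, not from the article):

```python
# The same sum, reframed at different timescales.
weekly = 350e6            # the bus claim, in pounds per week
yearly = weekly * 52      # ~ £18.2bn, the "£19bn a year" equivalent
daily = weekly / 7        # £50m a day

# Spread over a UK population of roughly 65 million (an assumed figure),
# the daily amount comes to well under a pound per person.
per_person_per_day = daily / 65e6

print(f"£{yearly/1e9:.1f}bn a year, £{daily/1e6:.0f}m a day, "
      f"£{per_person_per_day:.2f} per person per day")
```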

George Osborne, on the other hand, preferred to quote the Treasury’s projection of the potential cost of leaving the EU as £4,300 per household per year, rather than as the equivalent £120bn for the whole country. Presumably he was trying to make the numbers seem relevant, but perhaps he would have been better off framing the projected cost as “£2.5bn a week” so as to provide a direct comparison with the Leave campaign’s £350m. It probably would not have made any difference: the weighty 200-page Treasury report is on course to become a classic example of ignored statistics.



Recent studies confirmed higher death rates at weekends, but showed no relationship to weekend staffing levels. Photograph: Peter Byrne/PA


Casually imply causation from correlation

In July 2015 Jeremy Hunt said: “Around 6,000 people lose their lives every year because we do not have a proper seven-day service in hospitals….” and by February 2016 this had increased to “11,000 excess deaths because we do not staff our hospitals properly at weekends”. These categorical claims that weekend staffing was responsible for increased weekend death rates were widely criticised at the time, particularly by the people who had done the actual research. Recent studies have confirmed higher death rates at weekends, but these showed no relationship to weekend staffing levels.


Choose your definitions carefully

On 17 December 2014, Tom Blenkinsop MP said, “Today, there are 2,500 fewer nurses in our NHS than in May 2010”, while on the same day David Cameron claimed “Today, actually, there are new figures out on the NHS… there are 3,000 more nurses under this government.” Surely one must be wrong?

But Mr Blenkinsop compared the number of people working as nurses between September 2010 and September 2014, while Cameron used the full-time-equivalent number of nurses, health visitors and midwives between the start of the government in May 2010 and September 2014. So they were both, in their own particular way, right.


‘Indicator hopper’: Health secretary Jeremy Hunt. Photograph: PA


Use total numbers rather than proportions (or whichever way suits your argument)

In the final three months of 2014, less than 93% of attendances at Accident and Emergency units were seen within four hours, the lowest proportion for 10 years. And yet Jeremy Hunt managed to tweet that “More patients than ever being seen in less than four hours”. Which, strictly speaking, was correct, but only because more people were attending A&E than ever before. Similarly, when it comes to employment, an increasing population means that the number of employed can go up even when the employment rate goes down. Full Fact has shown how the political parties play “indicator hop”, picking whichever measure currently supports their argument.
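The trick works whenever attendance grows faster than the rate falls; a toy illustration with invented numbers (not the real A&E figures):

```python
# Assumed figures for illustration only: attendance rises enough that the
# count of patients seen within four hours goes up even as the rate drops.
year1_attendances, year1_rate = 5_000_000, 0.95
year2_attendances, year2_rate = 5_600_000, 0.93

seen_in_time_1 = year1_attendances * year1_rate
seen_in_time_2 = year2_attendances * year2_rate

# The proportion fell, yet the absolute number rose.
assert year2_rate < year1_rate
assert seen_in_time_2 > seen_in_time_1
print(f"Rate: {year1_rate:.0%} -> {year2_rate:.0%}, "
      f"count: {seen_in_time_1:,.0f} -> {seen_in_time_2:,.0f}")
```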


Is crime going up or down? Don’t ask Andy Burnham. Photograph: PA

Don’t provide any relevant context

Last September shadow home secretary Andy Burnham declared that “crime is going up”, and when pressed pointed to the police recording more violent and sexual offences than the previous year. But police-recorded crime data were de-designated as “official” statistics by the UK Statistics Authority in 2014 as they were so unreliable: they depend strongly on what the public choose to report, and how the police choose to record it.

Instead the Crime Survey for England and Wales is the official source of data, as it records crimes that are not reported to the police. And the Crime Survey shows a steady reduction in crime for more than 20 years, and no evidence of an increase in violent and sexual offences last year.

Exaggerate the importance of a possibly illusory change


Next time you hear a politician boasting that unemployment has dropped by 30,000 over the previous quarter, just remember that this is an estimate based on a survey. And that estimate has a margin of error of +/- 80,000, meaning that unemployment may well have gone down, but it may have gone up – the best we can say is that it hasn’t changed very much, but that hardly makes a speech. And to be fair, the politician probably has no idea that this is an estimate and not a head count.
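A quick sketch of why a margin of error of +/- 80,000 swallows a reported fall of 30,000:

```python
# The survey estimate and its margin of error, as quoted above.
change = -30_000
margin = 80_000

# The plausible range around the estimate spans zero, so the honest
# reading is "unemployment hasn't changed very much".
low, high = change - margin, change + margin
print(f"Plausible range: {low:+,} to {high:+,}")
assert low < 0 < high  # the interval is consistent with no change at all
```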

Serious youth crime has actually declined, but that’s not because of TKAP. Photograph: Action Press / Rex Features


Prematurely announce the success of a policy initiative using unofficial selected data

In June 2008, just a year after the start of the Tackling Knives Action Programme (TKAP), No 10 got the Home Office to issue a press release saying “the number of teenagers admitted to hospital for knife or sharp instrument wounding in nine… police force areas fell by 27% according to new figures published today”. But this used unchecked unofficial data, and was against the explicit advice of official statisticians. They got publicity, but also a serious telling-off from the UK Statistics Authority which accused No 10 of making an announcement that was “corrosive of public trust in official statistics”. The final conclusion about the TKAP was that serious youth violence had declined in the country, but no more in TKAP areas than elsewhere.


Donald Trump: ‘Am I going to check every statistic?’ Photograph: Robert F. Bukaty/AP


If all else fails, just make the numbers up

Last November, Donald Trump tweeted a recycled image that included the claim that “Whites killed by blacks – 81%”, citing “Crime Statistics Bureau – San Francisco”. The US fact-checking site Politifact identified this as completely fabricated – the “Bureau” did not exist, and the true figure is around 15%. When confronted with this, Trump shrugged and said, “Am I going to check every statistic?”

Not all politicians are so cavalier with statistics, and of course it’s completely reasonable for them to appeal to our feelings and values. But there are some serial offenders who conscript innocent numbers, purely to provide rhetorical flourish to their arguments.

We deserve to have statistical evidence presented in a fair and balanced way, and it’s only by public scrutiny and exposure that anything will ever change. There are noble efforts to dam the flood of naughty numbers. The BBC’s More or Less team take apart dodgy data, organisations such as Full Fact and Channel 4’s FactCheck expose flagrant abuses, the UK Statistics Authority write admonishing letters. The Royal Statistical Society offers statistical training for MPs, and the House of Commons library publishes a Statistical Literacy Guide: how to spot spin and inappropriate use of statistics.

They are all doing great work, but the shabby statistics keep on coming. Maybe these nine points can provide a checklist, or even the basis for a competition – how many points can your favourite minister score? In my angrier moments I feel that number abuse should be made a criminal offence. But that’s a law unlikely to be passed by politicians.

David Spiegelhalter is the Winton Professor of the Public Understanding of Risk at the University of Cambridge and president elect of the Royal Statistical Society

Wednesday 26 June 2013

Mickey's problem - The sacking of Australian cricket coach Mickey Arthur


Australia's recently replaced coach came up against an Australian cricketing culture struggling to come to terms with a new reality
Ed Smith
June 26, 2013

Mickey Arthur watches on from the balcony, Edgbaston, June 12, 2013
Arthur's track record of success with South Africa does not "prove" he is a brilliant coach any more than his track record of relative failure with Australia proves he is a bad one © AFP 
One of the questions asked of Australian cricketers during the Mickey Arthur era was, "How did you rate your sleep?" The idea was to encourage a holistic approach to match preparation, in which mind and body worked together in blissful harmony.
From today, if a player complains about a poor night's sleep under the new coaching regime of Darren Lehmann, he should expect the burly left-hander to reply: "Should have had an extra couple of beers last night then, mate." As for hydration, Rod Marsh used to say that if you had to take a toilet break during the hours of play then you obviously hadn't drunk enough the night before. Being a bit thirsty in the morning has its benefits.
In turning to Lehmann, there is a sense of Australian cricket coming home. He is naturally chatty and quick-witted, with a keen cricket brain and an earthy manner. When he was Yorkshire's overseas player, I remember a close four-day match between Yorkshire and Kent at Canterbury. Before the start of the final day's play, it was agreed that both teams would enjoy a few drinks in the home dressing room after the match. Lehmann was free and unguarded with his perceptions and insights, almost as though it was a responsibility of senior players to talk about the game. You could also tell he was absolutely in his element in a dressing-room environment.
Context is everything, as Mickey Arthur has found out. As coach of South Africa, Arthur enjoyed an established side, a resolute captain and an experienced group of senior players. That played to his strengths. An affable and undemonstrative man, Arthur could operate under the radar. Graeme Smith, one of the strongest captains in world cricket, already commanded plenty of authority and a clear sense of direction.
It has become fashionable in modern sport to waste a great deal of energy fretting about "job descriptions" and "lines of accountability". In real life, however, wherever the arrows may point on the flow charts, power finds itself in the hands of dominant personalities. The real determining factor in the distribution of power between a captain and a coach is their personal chemistry. A shrewd coach will empower a captain and the senior players as far as possible. And when Arthur was coach of South Africa, there was no shortage of alpha males out on the pitch.
Now transfer Arthur into a very different setting. Where South Africa had a settled side that was enjoying sustained success, Australia are adjusting - or failing to adjust - to leaner years, having gorged themselves on two decades of feasting on perpetual success. Where most of the South African team selected itself, Australia have had great difficulty identifying their best XI. That is not a criticism. You try selecting the same team during a sequence of defeats and listen in vain for the pundits shouting, "Well done on retaining consistency of selection." No, losing teams search for a new combination that will bring better results. The much-worshipped god "consistency of selection" is partly a privilege that follows from success as well as a cause of it. There is certainly a strong correlation between a settled side and a winning team, but as mathematicians learn in their first statistics class, correlation does not always imply straightforward causality.
Arthur faced another problem not of his own making: the expectations of the Australian cricketing culture. This has been an unpleasant hangover after a hell of a party. For 20 years Australian cricket celebrated a golden age that would have made Jay Gatsby blush. In terms of cricketing talent, the taps overflowed with vintage champagne. To understand how good Australia were, simply remember that Lehmann himself only played 27 Tests.
 
 
 
As any economist will tell you, the most dangerous aspect of any boom is the absurd way it is "explained" as a new and permanent paradigm shift (remember the view, just before the financial crisis, that modern banks had mastered "risk-free" methods?) We used to hear how Australian cricket was best because they were mates who played for each other; Australian cricket was best because they were tougher and "mentally stronger"; Australian cricket was best because they had fewer first-class teams; Australian cricket was best because it didn't have to endure the "mediocrity of county cricket"; Australian cricket was best because they knew how to enjoy a win and let their hair down; Australian cricket was best because they were "more professional". I heard all those theories put forward with huge confidence, often in tandem, even when the theories contradicted each other.
The difficulty, of course, came when results deteriorated, as they eventually had to. In a boom, you can have any explanation for why Australia were so good and still be proved "right". As a result, Australian cricket finds itself awash with voodoo doctors - convinced of their own prescience - rushing to pronounce the cure for a new and frightening malady called "average results". My own opinion is that the rise and fall of cricketing nations is harder to explain, let alone reverse, than most people seem to think.
Arthur's frustrating time with Australia reveals a broader problem. The whole notion of "a track record" is questionable, especially when the track record under discussion consists of a smallish sample size. Arthur's track record of success with South Africa does not "prove" he is a brilliant coach any more than his track record of relative failure with Australia proves he is a bad one.
Each phase of every management career is unique. The way any team functions can never be reduced to scientific analysis. As a result, credit and blame can never be exactly apportioned. We know for sure that some leaders experience success and failure. But exactly why, or to what extent they were responsible, will always remain partly a mystery. Coaches do not operate in a vacuum. What they inherit - the personnel, appetite for change, and attitude of the wider culture - matters at least as much as their methods.
Arthur encountered an Australian cricketing culture struggling to come to terms with a new reality. Quite simply, they aren't that good anymore. They may well get better under Darren Lehmann. But anxiously searching for miracles has a nasty habit of making them harder to find.

Tuesday 23 April 2013

Beware the nostrums of economists



T. T. RAM MOHAN
 

Politicians should not fall for the economic fad of the day. Policies should be subjected to democratic processes and be responsive to people’s aspirations

“The ideas of economists,” John Maynard Keynes famously wrote, “… are more powerful than is commonly understood. Indeed the world is ruled by little else.” He might have added that the ideas of economists can often be dangerous. Policies framed on the basis of the prevailing or dominant economic wisdom have often gone awry and the wisdom was later found to rest on shaky foundations.

A striking case in point is the debate on austerity in the Eurozone as an answer to rising public debt and faltering economic growth. One school has long argued that the way to reduce debt and raise the growth rate is through austerity, that is, steep cuts in public spending (and, in some cases, higher taxes). This school received a mighty boost from a paper published in 2010 by two economists, Carmen Reinhart and Kenneth Rogoff (RR). The paper is now at the centre of a roaring controversy amongst economists.

The RR paper showed that there is a correlation between an economy’s debt to GDP ratio and its rate of growth. As the ratio rises from one range to another, growth falls. Once the debt to GDP ratio rises beyond 90 per cent, growth falls sharply to -0.1 per cent. For some economists and also for policymakers in the Eurozone, this last finding provided an ‘aha’ moment.

CUTS IN SPENDING

Since public debt was clearly identified as the culprit, it needed to be brought down through cuts in spending. The IMF pushed this line in the bail-out packages it worked out for Greece and Portugal among others. The U.K. chose to become an exemplar of austerity of its own accord.

It now turns out that there was a computational error in the RR paper. Three economists at the University of Massachusetts at Amherst have produced a paper showing that the effect of rising public debt is nowhere near as drastic as RR made it out to be. At a debt to GDP ratio of 90 per cent, growth declines from an average of 3.2 per cent to 2.2 per cent, not from 2.8 per cent to -0.1 per cent, as RR had contended.

You could say that even the revised estimates show that growth does fall with a rising debt to GDP ratio. However, as many commentators have pointed out, correlation is not causation. We cannot conclude from the data that high debt to GDP ratios are the cause of low growth. It could well be the other way round, namely, that low growth results in a high debt to GDP ratio.
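Reverse causation can produce exactly the pattern RR observed. Here is a toy simulation (invented parameters, purely illustrative) in which, by construction, debt has no effect on growth at all, yet the two series are strongly negatively correlated:

```python
import random

random.seed(0)

# Causation runs one way only: growth is random, and the debt ratio is
# driven by growth (low growth -> higher debt), plus some noise.
growth = [random.gauss(2.0, 1.5) for _ in range(500)]
debt_ratio = [90 - 10 * g + random.gauss(0, 5) for g in growth]

# Pearson correlation, computed by hand.
n = len(growth)
mean_g = sum(growth) / n
mean_d = sum(debt_ratio) / n
cov = sum((g - mean_g) * (d - mean_d) for g, d in zip(growth, debt_ratio))
var_g = sum((g - mean_g) ** 2 for g in growth)
var_d = sum((d - mean_d) ** 2 for d in debt_ratio)
corr = cov / (var_g * var_d) ** 0.5

# Strongly negative, even though debt never influenced growth.
print(f"correlation = {corr:.2f}")
```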

There is a broad range of experience that suggests that high debt to GDP ratios are often self-correcting. Both the U.S. and the U.K. emerged from the Second World War with high debt to GDP ratios. These ratios fell as growth accelerated in the post-war years. India’s own debt to GDP ratio kept rising through the second half of the 1990s and the early noughties. As growth accelerated on the back of a global boom, the ratio fell sharply. The decline in the ratio did not happen because of expenditure compression, which the international agencies and some of our own economists had long urged.


NEEDED, RETHINK


The controversy over the RR paper should prompt serious rethinking on austerity in the Eurozone. Many economists have long argued that the sort of austerity that has been imposed on some of the Eurozone economies or that the U.K. has chosen to practise cannot deliver higher growth in the near future. It only condemns the people of those economies to a long period of pain.

The IMF itself has undergone a major conversion on this issue and is now pressing the U.K. to change course on austerity. Its chief economist, Olivier Blanchard, went so far as to warn that the U.K. Chancellor, George Osborne, was “playing with fire.” The IMF’s conversion came about late last year when it acknowledged that its own estimates of a crucial variable, the fiscal multiplier, had been incorrect. In its World Economic Outlook report published last October, the IMF included a box on the fiscal multiplier, which is the impact on output of a cut or increase in public spending (or an increase or reduction in taxes). The smaller the multiplier, the less costly, in terms of lost output, is fiscal consolidation. The IMF had earlier assumed a multiplier for 28 advanced economies of around 0.5. This would mean that for any cut in public spending of X, the impact on output would be less than X, so the debt to GDP ratio would fall.


REVISED ESTIMATE


The IMF now disclosed that, since the sub-prime crisis, the fiscal multipliers had been higher — in the range of 0.9 to 1.7. The revised estimate for the multiplier meant that fiscal consolidation would cause the debt to GDP ratio to rise — exactly the opposite of what policymakers in the Eurozone had blithely assumed. The people of Eurozone economies that have seen GDP shrink and unemployment soar are unlikely to be amused by the belated dawning of wisdom at the IMF.
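The direction of the effect follows from simple arithmetic. A deliberately simplified sketch (ignoring tax revenue feedback and other dynamics), starting with debt at 90 per cent of GDP:

```python
# Cut spending by 2% of GDP; the multiplier m determines how much
# output falls per unit of cuts. Debt falls by the amount saved, but
# GDP falls by m times the cut.
def debt_ratio_after_cut(debt=90.0, gdp=100.0, cut=2.0, m=0.5):
    return (debt - cut) / (gdp - m * cut)

# With the IMF's old multiplier of 0.5, the ratio falls below 90%.
print(debt_ratio_after_cut(m=0.5))
# With a multiplier in the revised 0.9-1.7 range, the ratio rises:
# austerity becomes self-defeating on its own terms.
print(debt_ratio_after_cut(m=1.3))
```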

This is not the first time the IMF has made a volte face on an important matter of economic policy. Before the East Asian crisis and for several years thereafter, the IMF was a strong votary of free flows of capital. During the East Asian crisis, many economists had pointed out that the case for free flows of capital lacked a strong economic foundation, unlike the case for free trade. This did not prevent the IMF from peddling its prescription to the developing world. India and China refused to go along.

In 2010, the IMF discarded its hostility to capital controls. It said that countries would be justified in responding to temporary surges in capital flows. A year later, it took the position that countries would be justified in responding to capital surges of a permanent nature as well. Last December, it came out with a paper that declared that there was “no presumption that full liberalisation is an appropriate goal for all countries at all times.” The IMF’s realisation was a little late in the day for the East Asian economies and others whose banking systems have been disrupted by volatile capital flows.

Capital account convertibility is one instance of a fad in policy catching on even when it lacked a strong economic foundation. Another is privatisation, for which Margaret Thatcher has been eulogised in recent weeks. Thatcher’s leap into privatisation in the U.K. was driven by her conviction that the state needed to be pushed back. After privatisation became something of a wave, economists sought to find theoretical and empirical grounds for it and initially came out overwhelmingly in favour.


GRADUATED APPROACH

It took major mishaps in privatisation in places such as Russia and Eastern Europe for the conclusions to become rather more nuanced. Privatisation works in some countries, in some industries, and under conditions in which law and order, financial markets and corporate governance are sound. Moreover, partial privatisation — or what is called disinvestment — can be as effective as full privatisation. As in the case of capital account convertibility, India’s graduated approach to liberalisation has been vindicated. It is, perhaps, no coincidence that the fastest growing economies in the world until recently, China and India, did not embrace the conventional wisdom on privatisation.

Other fads have fallen by the wayside or are seen as less than infallible since the sub-prime crisis, and these relate to the financial sector. ‘Principles-based’ regulation is superior to ‘rule-based’ regulation. The central bank must confine itself to monetary policy and regulatory powers must be vested in a separate authority. Monetary policy must focus on inflation alone and must not worry about asset bubbles and financial stability. One can add to this list.

What lessons for policymaking can we derive from the changes in fashion amongst economists? Certainly, one is that politicians and policymakers must beware the nostrums of economists, and they must not fall for the economic fad of the day. Economic policies must always be subject to democratic processes and be responsive to the aspirations of people. Broad acceptability in the electorate must be the touchstone of economic policies. Another important lesson is that gradualism is preferable to ‘big bang’ reforms.

India’s attempts at liberalisation, one would venture to suggest, have conformed to these principles better than many attempted elsewhere. Such an approach can mean frustrating delays in decision-making and the results may be slow in coming. However, social turbulence is avoided, as are nasty surprises, in economic outcomes. At the end of the day, economic performance turns out to be more enduring.

(The author is a professor at IIM Ahmedabad; ttr@iimahd.ernet.in)