The AI Delusion - Gary Smith *****

This is a very important little book ('little' isn't derogatory - it's just quite short and in a small format) - it gets to the heart of the problem with applying artificial intelligence techniques to large amounts of data and thinking that somehow this will result in wisdom.

Gary Smith is an economics professor who teaches statistics, understands numbers and, despite being a self-confessed computer addict, is well aware of the limitations of computer algorithms and big data. What he makes clear here is that we forget at our peril that computers do not understand the data they process, and as a result are very susceptible to GIGO - garbage in, garbage out. Yet we are increasingly dependent on decisions made by black box algorithms which mine vast quantities of data to find correlations and use these to make predictions. What's wrong with this? We don't know how the algorithms are making their predictions - and the algorithms don't know the difference between correlation and causality.

The scientist's (and statistician's) mantra is often 'correlation is not causality.' What this means is that if we measure two things happening in the world - let's call them A (it could be banana imports) and B (it could be the number of pregnancies in the country) - and B rises and falls as A does, it doesn't mean that B is caused by A. It could be that A is caused by B, that A and B are both caused by some third factor C, or that it's just a random coincidence. The banana import/pregnancy correlation actually held in the UK for a number of years after the second world war. Human statisticians would never think the pregnancies were caused by banana imports - but an algorithm would not know any better.

In the banana case there was probably a C linking the two, but because modern data mining systems handle vast quantities of data and look at hundreds or thousands of variables, it is almost inevitable that they will discover apparent links between two sets of information where the coincidence is totally random. The correlation happens to work for the data being mined, but is totally useless for predicting the future. 
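To see just how easy it is to 'discover' patterns in pure noise, here's a quick sketch in Python (my own toy illustration, not from the book): it generates a couple of hundred entirely random series and then mines every pair of them for the strongest correlation - exactly what a naive data-mining system does at scale.

```python
import random

random.seed(42)

def corr(xs, ys):
    # Pearson correlation coefficient, computed from scratch
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# 200 completely random 'variables', 20 observations each - no real
# relationships exist anywhere in this data by construction
series = [[random.gauss(0, 1) for _ in range(20)] for _ in range(200)]

# Mine all ~20,000 pairs for the strongest apparent correlation
best_r, best_i, best_j = max(
    ((corr(a, b), i, j)
     for i, a in enumerate(series)
     for j, b in enumerate(series[i + 1:], i + 1)),
    key=lambda t: abs(t[0]),
)
print(f"strongest 'pattern' found in pure noise: r = {best_r:.2f}")
```

With this many pairs to choose from, the winning correlation is reliably strong - yet it predicts nothing, because there is nothing to predict.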

This is the thesis at the heart of this book. Smith makes four major points that really should be drummed into all stock traders, politicians, banks, medics, social media companies... and anyone else who is tempted to think that letting a black box algorithm loose on vast quantities of data will make useful predictions. First, there are patterns in randomness. Given enough values, totally random data will have patterns embedded within it - it's easy to assume that these have a meaning, but they don't. Second, correlation is not causality. Third, cherry picking is dangerous. Often these systems pick the bits of the data that work and ignore the bits that don't - an absolute no-no in proper analysis. And finally, data without theory is treacherous. You need to have a theory and test it against the data - if you try to derive the theory from the data with no oversight, it will always fit that data, but is very unlikely to be correct.
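The cherry-picking and data-without-theory points can be demonstrated with the same sort of sketch (again my own illustration, not from the book): cherry-pick the one random 'predictor' out of hundreds that best fits a random target, and it looks impressive - until you measure against fresh data it wasn't mined from.

```python
import random

random.seed(1)

def corr(xs, ys):
    # Pearson correlation coefficient
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

N = 15  # observations per series

# A target and 500 candidate 'predictors' - all of them pure noise
target = [random.gauss(0, 1) for _ in range(N)]
candidates = [[random.gauss(0, 1) for _ in range(N)] for _ in range(500)]

# Cherry-pick: keep the candidate that best fits the data we already have
in_sample = [abs(corr(target, c)) for c in candidates]
best = max(in_sample)

# Out of sample: a noise 'predictor' against fresh noise data. Averaged
# over many fresh draws, the fit is merely noise-sized - the mined
# 'theory' carried no real signal to begin with.
out_of_sample = sum(
    abs(corr([random.gauss(0, 1) for _ in range(N)],
             [random.gauss(0, 1) for _ in range(N)]))
    for _ in range(200)
) / 200

print(f"best mined fit: {best:.2f}, typical fresh-data fit: {out_of_sample:.2f}")
```

The mined fit looks convincing; the fresh-data fit collapses to chance level. That gap is exactly what 'data without theory is treacherous' means.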

My only problems with the book are that Smith insists for some reason on making databases two words ('data bases' - I know, not exactly terrible), and that it can feel a bit repetitious, because most of it consists of repeated examples of how the four points above lead AI systems to make terrible predictions - from Hillary Clinton's system mistakenly telling her team where to focus canvassing effort to the stock trading systems produced by 'quants'. But I think that repetition is important here, because it shows just how much we are under the sway of these badly thought-out systems - and how much we need to insist that algorithms that affect our lives are transparent and work from knowledge, not through data mining.

As Smith points out, we regularly hear worries that AI systems are going to get so clever that they will take over the world. But actually the big problem is that our AI systems are anything but intelligent: 'In the age of Big Data, the real danger is not that computers are smarter than us, but that we think computers are smarter than us and therefore trust computers to make important decisions for us.'

This should be a big-selling book. A plea to the publisher: change the cover (it just looks like it's badly printed and smudged) and halve the price to give it wider appeal.


Review by Brian Clegg
