
Can Computers write Science Books? - Brian Clegg

The German academic publisher Springer has for some time been using automated editing software (with mixed results) - but recently has brought out a whole book written by a piece of AI software called Beta Writer. The book, Lithium-Ion Batteries: a machine generated summary of current research, can be downloaded free of charge as a PDF. But is this a serious challenge for science writers?

It's certainly interesting. If I'm honest, though, this is hardly a book at all - it's more the output of an automated abstract generator pulled together in book form, and frankly the information would be far better presented as a web page. However, there's no doubt that some interesting work is going on here, particularly in the introduction and conclusion sections of the 'book'.

The whole thing starts with a (human written) preface explaining the technology - by far the most readable part of the text. We then get four 'chapters' of machine-generated content, which each have the format introduction / set of abstracts / conclusion. Obviously it's the introduction and conclusion that provide the most interest.

I'll focus on the first introduction, though the same criticisms apply throughout. The first test of a piece of scientific writing meant to be readable is to take a step back and get an overview of a chunk of text - does it look like English or is it dominated by acronyms and numbers? A chunk out of the first page shows that this is very dense technical text, extremely low on readability:



The other two significant indicators of readability are whether the text is a collection of fact statements or is written using connectives and summary to give flow, and whether or not overall there is a structure that takes the reader by the hand and leads them through a communication process. On both tests, the book falls down in a big way. Pretty well every sentence is a standalone fact statement that could be a bullet point: there is no flow whatsoever. And although some attempt has been made to group these statements effectively, there is no sense of a thought-through structure. In the interminable-seeming introductions - the first one runs to 22 dense pages - there is no sense that we are going anywhere, just that we are experiencing randomly thrown together bits of data.
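For anyone who wants to try a crude version of these checks on a passage of their own, here is a minimal sketch in Python. It is purely my own illustration, not anything Springer or Beta Writer uses: the function names, the short `CONNECTIVES` list and the 'two or more capitals' rule for spotting acronyms are all assumptions, and it says nothing about overall structure, which needs a human reader.

```python
import re

# Hypothetical, deliberately short list of connectives (an assumption for
# illustration); a serious check would need a much fuller list.
CONNECTIVES = {"however", "because", "therefore", "although", "so", "but", "thus", "whereas"}


def dense_token_share(text):
    """Fraction of tokens that contain a digit or look like an acronym.

    'Acronym' here is a rough assumption: two or more capital letters,
    optionally followed by a plural 's'.
    """
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", text)
    if not tokens:
        return 0.0
    dense = [t for t in tokens if re.search(r"\d", t) or re.fullmatch(r"[A-Z]{2,}s?", t)]
    return len(dense) / len(tokens)


def connective_sentence_share(text):
    """Fraction of sentences that contain at least one connective word."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if CONNECTIVES & set(re.findall(r"[a-z]+", s.lower())))
    return hits / len(sentences)
```

A high value from `dense_token_share` and a low value from `connective_sentence_share` would point to the bullet-point feel described above, though neither number is a substitute for actually reading the text.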

Inevitably, an automated process will produce some sentences that don't quite work, so one essential here is to see whether these have been captured and fixed. A reasonably high percentage of the content does make grammatical sense, but there are regular hiccups - for example we get: 

  • 'That sort of research's principal aim...' - 'principal' is the right word, but the possessive is clumsy: 'The principal aim of that sort of research...' would read better.
  • 'Materials, a number of metal oxides with high theoretical capacity have aroused more and more attention including...' - that 'Materials,' start makes no sense.
  • 'Through Tang and others, mesoporous nanosheet is synthesized...' - sounds painful.
  • 'It is still maintained the huge capacity of 611 mAg-1... when utilized as an anode.' - doesn't make any sense.
  • 'Apart from, few-layer nanosheets enhance a fast insertion...' - apart from what?
  • And so on for many, many more examples.

Going on comments I've had from some Springer authors, the level of uncaught or automatic-editing-generated errors is fairly high in their human-authored publications - these books tend not to be heavily edited - but because they are starting with far more readable text, this is less of an issue.

So, should science writers be worried? Obviously, as a professional writer myself I'm biased, but I would say 'No' - at least, not yet. The text in the introductions and conclusions is nowhere near the readability of a decent technical science book, let alone the far higher writing quality required for a good popular science book. And the outcome also emphasises that even if, long-term, automated writing becomes more common, it is always likely to need a look over by a human editor to avoid errors creeping in. However, this is a fascinating experiment and Springer should be congratulated for getting this far.
