Non-Fiction Reviews

Big Data
A Very Short Introduction

(2017) Dawn Holmes, Oxford University Press, £7.99 / US$11.95, pbk, xix + 125pp, ISBN 978-0-198-77957-5


This is a very useful, concise introduction to the topic of big data: principally the data gathered on large numbers of people. The ground it covers includes: big data's useful properties; the problem of data storage and analysis; usage in medicine and business; and security as well as societal issues.

Statistician Dawn Holmes starts off by recounting an incident in the Sparta-Athenian war of 431BC. One party wanted to scale a defended wall, and for that they had to estimate its height so as to construct ladders of the appropriate length. They did not want to get needlessly close to measure the wall as they would come under fire, so they decided to count the number of bricks the wall was high. However, counting at a distance was fraught with difficulty as miscounts were common. So a large number of soldiers were each individually given the task. The average of their answers was multiplied by the height of an individual brick and so the wall's height was calculated.  This demonstrates the emergent properties of big data, whereby a trend detectable en masse is elucidated despite the vagaries of individual data points.
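The arithmetic behind the wall story can be sketched in a few lines of Python. This is only an illustration: the soldiers' counts, the 'true' figure of 50 courses and the 0.1 m brick height are all invented for the example and do not come from the book, and the function name estimate_wall_height is hypothetical.

```python
from statistics import mean


def estimate_wall_height(brick_counts, brick_height_m):
    """Estimate wall height from many independent, error-prone brick counts."""
    if not brick_counts:
        raise ValueError("need at least one count")
    # Average the counts first, then scale by the height of one brick.
    return mean(brick_counts) * brick_height_m


# Hypothetical counts from ten soldiers; the true figure is assumed to be 50.
counts = [50, 49, 51, 50, 48, 52, 50, 51, 49, 50]
TRUE_COUNT = 50        # assumption, for comparison only
BRICK_HEIGHT_M = 0.1   # assumed brick height in metres

height = estimate_wall_height(counts, BRICK_HEIGHT_M)

# Largest error, in metres, had the ladder been built from one soldier's count.
worst_single = max(abs(c - TRUE_COUNT) for c in counts) * BRICK_HEIGHT_M

print(f"Estimated height: {height:.2f} m")
print(f"Worst single-count error: {worst_single:.2f} m")
```

With these made-up numbers the individual miscounts largely cancel out in the average, which is the point of the anecdote: no single soldier needs to be right for the group estimate to be usable.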

Prior to the internet and the computer systems of the early 21st century able to both capture and handle large volumes of data, data handlers had to rely on comparatively small samples. This is fraught with potential error, as anyone who has followed voters' intentions in the run-up to an election will know all too well. With such systems, larger populations (or population samples) can be analysed. Yet, as Holmes later revisits in more detail, this is a double-edged sword: yes, there can be benefits from such data accruing both to data gatherers/users and to the individuals whose data is being gathered, but the data can also be used to the disadvantage of some of those from whom it is gathered.
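To put a rough number on why small samples are error-prone, here is a minimal sketch using the standard textbook approximation for the 95% margin of error of a sampled proportion. It is not taken from the book, and it assumes a simple random sample and the usual normal approximation; the function name margin_of_error and the sample sizes are chosen purely for illustration.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p seen in n responses.

    Assumes a simple random sample and the normal approximation.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1 - p) / n)


# A 50/50 split is the worst case for polling uncertainty.
for n in (100, 1_000, 10_000, 1_000_000):
    moe_points = margin_of_error(0.5, n) * 100
    print(f"n = {n:>9,}: +/- {moe_points:.2f} percentage points")
```

The margin shrinks only with the square root of the sample size, so very large data sets are needed for fine-grained estimates. Note that this covers random sampling error only: a large but unrepresentative sample can still mislead.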

Holmes illustrates the huge growth in computing that enables big data gathering and usage, noting that the NASA Apollo mission control mainframes each had just 8 MB of processing memory, while the onboard Apollo 11 computer had just 64 KB.  Today (early 21st century) even modest home computers have processing memory of a couple of GB, with hard drives offering terabytes (TB) of storage. This in turn will undoubtedly seem small even just a few years in the future.

We then get a whistle-stop summary tour of some of the technical detail behind big data, including: distributed file systems (DFS), NoSQL databases and the CAP theorem. Also covered is the use of big data in healthcare, including the fascinating example of how Google (successfully and unsuccessfully) followed flu trends.

In the chapter on business, the illustrative examples include Amazon, pay-per-click advertising, recommender systems and Netflix.

The chapter I found particularly pertinent was the one on security, which included a fair bit on the Snowden unauthorised data leak incident of the best part of half a million files of US national security (military and political) data. Nearly everyone who uses computers in the developed world is aware of the risk from hackers. This chapter is a treasure-trove of information which, distilled down, brings us back to big data's double-edged sword. For example, yes, we can store our pictures and files in the cloud with a near-negligible chance of them being accidentally deleted, but on the other hand it is very difficult to ensure that a file a user does want deleted will in fact have all its copies deleted: this is really in the hands of the data service provider, and individual users are sacrificing control over their data.  For me this chapter's most telling comment relates to the sociology lecturer Janet Vertesi of Princeton, who conducted a personal experiment to see if she could keep her pregnancy secret from online marketers. She avoided social media, used Tor (the anonymising network often associated with the dark web) and made related purchases only in-store and in cash. It was all perfectly legal, but ultimately she concluded that opting out was costly, time-consuming and made her look, she said, like a 'bad citizen'. Having said that, using the Tor browser does help protect you from trackers and is worth considering: there are many legitimate users of Tor, which is not, as some believe, solely the tool of criminals.

This decade we are witnessing a sea-change. Dawn Holmes concludes that big data is power and its potential for good is enormous. But it is also power and how we prevent its abuse is up to us.  I whole-heartedly agree with this sentiment. Not mentioned in this book are the lessons from science fiction (1984 and The Shockwave Rider to name but two examples). Yet this site's regulars, principally those into science and science fiction, can bring that to the table themselves and in doing so find this little book absorbing.  The machines really are taking over and we had better be aware of it.

Jonathan Cowie


[Updated: 18.1.15]