Big data issues


Big data origins

In the past, what is now included in the envelope of big data resided with just a few organisations. The story of big data started with the US government, which used punch card technology from a young company that would later become part of IBM to help tabulate its census data. Punch card technology started in the textile industry, where industrial revolution-era Jacquard looms manufactured complex fabric patterns. Punch cards also controlled fairground organs and related instruments. It was the early tabulating machines made by IBM and others that started to change the world as we know it.
When the mainframe came along, governments used it to manage tax collection and to run the draft for Vietnam. Destroying machine-readable draft cards became a key part of US anti-war protest. (The destruction didn’t affect the draft process, but burning a draft card was still an offence and some people were punished for it.)

Credit agencies

Also around this time, the credit agency was coming into its own in the US. Over a period of 60 years, it had gradually accumulated records on millions of Americans and Canadians. The New York Times in 1970 described the kind of records that were held by Retail Credit (now known as Equifax):

…may include ‘facts, statistics, inaccuracies and rumors’ … about virtually every phase of a person’s life; his marital troubles, jobs, school history, childhood, sex life, and political activities.

These records helped to vet people for job applications, bank loans and department store consumer credit. It was like a private sector version of the J. Edgar Hoover files. Equifax moved to computerise its records, in part to make its business more professional. Computerisation also had implications for the wider availability of credit information, and it led to the Fair Credit Reporting Act in the US. This legislation was designed to give consumers a measure of transparency and control over their data.

Forty years later, mainframe computers are still used to process tens of thousands of credit card transactions every second. New businesses, including social networks, search engines and online advertising companies, hold vast amounts of data, unlike anything a credit agency ever had.

The social, cultural & ethical dimensions of big data

The recent event The Social, Cultural & Ethical Dimensions of “Big Data”, held at New York University by the Data & Society Research Institute, was important. Events like these help society understand what changes to make in the face of rapid technological change.

Algorithmic accountability

The Algorithmic Accountability primer from the event highlights seemingly innocuous examples of how technology like Google’s search engine can have far-reaching consequences, including what the Data & Society Research Institute called ‘filter bubbles’. Personalisation of search changes what consumers see from individual to individual. This discrimination could also be applied to items like pricing. Staples produced an algorithm that based pricing on the location of the web user; better-off customers were provided with better prices. One of the problems of regulating this area is first of all defining what an algorithm actually is from a policy perspective.

Algorithmic systems are generally not static but are continually tweaked and refined, so they represent a moving target. During my time at Yahoo! we rolled out a major change to the search algorithm every two weeks on a Wednesday evening US west coast time. I imagine that the pace of change at the likes of Google and Facebook has only accelerated.

The problem with many rules-based systems now is that we no longer write the rules or teach the systems; instead we give the system access to large data sets and it starts to teach itself. The results generally work, but we don’t know why. This has been a leap forward for what would broadly be called artificial intelligence, but it makes these systems intrinsically hard to regulate.
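
To make that distinction concrete, here is a minimal Python sketch, not a description of any real system. The loan history, the income figures and every function name are invented for illustration. The first rule is written by a person, so anyone can read it and see why it fires. The second has the same shape, but its cut-off is picked from past data, which is the simplest possible case of a system that "teaches itself".

```python
# Hypothetical toy data: (annual_income, repaid_loan). Invented for illustration.
HISTORY = [
    (12_000, False),
    (18_000, False),
    (22_000, False),
    (27_000, True),
    (31_000, True),
    (45_000, True),
]


def hand_written_rule(income: int) -> bool:
    """A rule a person wrote down; the reason for the cut-off can be explained."""
    return income >= 30_000


def learn_threshold(history: list[tuple[int, bool]]) -> int:
    """Pick the income cut-off that misclassifies the fewest past cases."""
    candidates = sorted(income for income, _ in history)

    def errors(threshold: int) -> int:
        # Count cases where "income >= threshold" disagrees with what happened.
        return sum((income >= threshold) != repaid for income, repaid in history)

    return min(candidates, key=errors)


# The cut-off now comes from the data rather than from a person.
LEARNED_THRESHOLD = learn_threshold(HISTORY)


def learned_rule(income: int) -> bool:
    """Same shape of decision as the hand-written rule, with a learned cut-off."""
    return income >= LEARNED_THRESHOLD


if __name__ == "__main__":
    print("learned cut-off:", LEARNED_THRESHOLD)
    print("hand-written says:", hand_written_rule(28_000))
    print("learned says:", learned_rule(28_000))
```

Even in this toy case the two rules disagree about an applicant earning 28,000, and the only explanation for the learned cut-off is "that is what the data produced". Real systems learn from far more than one number, which is part of why they are so hard to inspect.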
Given all this, it is hardly surprising that research carried out by the White House on behalf of President Obama showed a high level of concern amongst US citizens.

More information

Jacquard Loom – National Museums Scotland
Separating Equifax from Fiction | Wired (Issue 3.05)
Data & Society | Algorithmic Accountability primer
This Landmark Study Could Reveal How The Web Discriminates Against You | Forbes
Websites Vary Prices, Deals Based on Users’ Information | WSJ
The 90-day review for Big Data | Whitehouse
Data & Society | Algorithmic Accountability Workshop Notes
Digital Me: Will the next Cringely be from Gmail? | I, Cringely