The current 'big data' era is not new. There have been other periods in human civilisation when we were overwhelmed by data. By looking at these periods we can understand how a shift from discrete to abstract methods demonstrates why the emphasis should be on algorithms, not code.
If the entire span of human history were a day, we would only have started talking after 11:07 PM. It would have been a quiet day. But in the short time that followed we quickly amassed a wealth of knowledge and stories that we needed to pass on to other people. You can imagine a time, initially, when it was possible to pass on all knowledge through dialogue, from generation to generation, parent to child, teacher to student. The volume of knowledge we produced as a society, however, quickly increased to the point at which there was too much to pass on in the form of dialogue.
We needed to codify this information in another form to save and distribute it. Codifying this knowledge in the form of writing would have represented a significant technological shift for those societies. Indeed, the character of Socrates in Plato's Phaedrus worried that this technological shift to writing and books was a much poorer medium than dialogue and would diminish our ability to develop true wisdom and knowledge. Needless to say, I don't think Socrates would have been a fan of TV.
The point is that a dialogue represents a discrete means of communicating information. You know the person you are talking to, and there is a direct interaction between the people involved in the dialogue through argument and counter-argument. A book, on the other hand, is an abstract means of communication in that there is no direct interaction between writer and reader. The writer cannot know who will read their book, how many people will read it, or when and where. We can have some idea of our audience and tailor the content to that, but in most cases it represents an abstract way to pass on knowledge or to learn new skills.
Another era of big data occurred when we moved from simple forms of calculation to the abstract system of theorems, symbols and algorithms we consider mathematics today. The first recorded calculations took place in Mesopotamia around 2500 BC. The Mesopotamians needed to work out how many people a barn full of grain could feed.
The Rise of Coding:
Coding or programming first came to prominence when people such as Grace Hopper worked on the Harvard Mark I computer in 1945. Before this, computers, if you could call them that, were simply calculating machines. During World War II, for example, artillery guns required tables to be aimed properly. The tables were the results of differential equations which took into consideration hundreds of different factors such as distance, elevation, wind speed, temperature, humidity and so on. Incidentally, the name computer comes from the term used to describe the women who operated these machines during the war. They were known as computers. The operators would have to use punch cards and crank handles to solve the equations. It could take 170 people a month to produce just one table.
Coding arose out of the need to find an easier way to carry out these calculations. If a way could be found to codify the instructions to tell the hardware what operations it needed to perform, then the manual operations could be eliminated. This would also allow for different instructions to be coded so that one piece of hardware could perform multiple different operations.
Does this seem similar to anything we discussed before? Whereas in Mesopotamia clay tablets were used to perform calculations, coding was the new medium used in the mid 20th century. While it seems more advanced, this is still a discrete operation since it is working on specific calculations. It is just a more efficient way to perform these specific calculations. Coding was a way to eliminate the need for manual operators and enabled people to process and calculate even more data.
Algorithms vs Code:
Algorithm: A sequence of steps that describes an idea for solving a problem, meeting the criteria of correctness and termination. An abstract recipe for a calculation, independent of any implementation.
Code: A set of instructions for a computer. A concrete implementation of a calculation on a specific platform in a specific programming language.
This new ability to codify instructions directly to computers enabled people to implement more complex sets of instructions in the form of algorithms. Algorithms have been around for much longer than the invention of coding. The Muslim mathematician Al-Khwarizmi described algorithms for solving linear and quadratic equations around 820 AD. The word algorithm comes from the Latinization of his name to Algoritmi, and the term algebra comes from "al-jabr", a name for one of the operations Al-Khwarizmi used to solve quadratic equations. Algorithms are a finite number of calculations or instructions that, when implemented, will yield a result. As we have seen, code is a way to provide instructions directly to a computer. It is this aspect which means it is well suited to implementing algorithms, which are, in essence, just a series of different operations to be performed in a certain order.
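Al-Khwarizmi's quadratic problem makes the algorithm/code distinction concrete. The algorithm here is the quadratic formula, an abstract recipe that predates computers by over a millennium; the function below is merely one hypothetical implementation of that recipe, sketched in Python for illustration.

```python
import math

def solve_quadratic(a, b, c):
    """Return the real roots of a*x^2 + b*x + c = 0, in ascending order.

    The algorithm (the quadratic formula) is the abstract recipe;
    this function is just one concrete implementation of it.
    """
    if a == 0:
        raise ValueError("not a quadratic equation")
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []  # no real roots
    root = math.sqrt(discriminant)
    # A set collapses the double root when the discriminant is zero.
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})

print(solve_quadratic(1, -3, 2))  # roots of x^2 - 3x + 2: [1.0, 2.0]
```

The same algorithm could just as well be implemented on a clay tablet, by hand, or in any other programming language; none of those choices changes the recipe itself.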
As with earlier periods of big data, the amount of information we were working with increased. Advances in coding, in terms of design and usage, when combined with Moore's law resulted in performance improvements that enabled us to deal with the increasing digitisation of our world. You could still write code to query a database in order to find a list of related resources. The discrete nature of the operation was maintained since humans were still writing the code to tell the hardware what operations to perform. While the operations became more and more complex it was still the human codifying instructions. But algorithms had already started to show their potential to create a new era of abstraction.
The Rise of the Algorithm:
In this way we can see that algorithms are very different from code. You can implement algorithms via code, and, it is true, the way an algorithm is implemented in code can affect its performance. Using a binary heap, for example, rather than repeatedly sorting a list, is more efficient when you want to find the smallest or largest element in a sequence. But you do not need to be able to code to create an algorithm any more than you need to know how to read music to write a song.
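The binary heap point can be sketched in a few lines using Python's standard heapq module: the heap gives us the smallest element without ever fully sorting the data.

```python
import heapq

data = [7, 2, 9, 4, 11, 1, 5]

# Building a binary heap is O(n); reading the minimum is then O(1),
# and each subsequent pop is O(log n). No full O(n log n) sort needed.
heap = list(data)
heapq.heapify(heap)
smallest = heap[0]

# The three smallest elements, again without sorting everything:
three_smallest = heapq.nsmallest(3, data)

print(smallest, three_smallest)  # 1 [1, 2, 4]
```

The choice of data structure changes how fast the code runs, but the underlying idea, keep a partially ordered structure so the extreme element is always on top, is the algorithmic insight, and it exists independently of Python or any other language.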
And while everyone knows about the magic of Moore's law in delivering the performance improvements we need to drive our digital economy, few know that in many areas algorithms generate greater performance improvements than any hardware gains. Indeed, a 2010 US federal report noted that algorithmic improvements have resulted in significant gains in areas such as speech recognition, natural language processing and logistics.
Even more remarkable and even less widely understood is that in many areas, performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed. - Report to the President and Congress: Designing a Digital Future
The amount of data we have available to us now means that we can no longer think in discrete terms. This is what big data forces us to do. It forces us to take a step back, an abstract step back, to find a way to cope with the tidal wave of data flooding our systems. Traditionally you would write code to search your data given a certain pattern or set of parameters. For example, you may want to search your database of customers for any who purchased more than 2 items and spent over 30 in the last two weeks, so that you can reach out to them with certain offers. You are looking for the data that matches this pattern. With big data it is the opposite: you have so much data that you are looking for the patterns that match the data.
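The discrete, pattern-to-data style of query described above might look something like this; the customer records and field names are invented purely for illustration.

```python
from datetime import date, timedelta

# Hypothetical customer records, invented for illustration.
today = date.today()
customers = [
    {"name": "Ann",  "items": 3, "spent": 45.0, "last_purchase": today - timedelta(days=3)},
    {"name": "Ben",  "items": 1, "spent": 12.0, "last_purchase": today - timedelta(days=5)},
    {"name": "Cara", "items": 4, "spent": 31.5, "last_purchase": today - timedelta(days=20)},
]

# We specify the pattern; the code finds the data that matches it.
cutoff = today - timedelta(weeks=2)
matches = [c["name"] for c in customers
           if c["items"] > 2 and c["spent"] > 30 and c["last_purchase"] >= cutoff]

print(matches)  # ['Ann'] - the only customer matching all three criteria
```

Notice that a human decided the pattern in advance: more than 2 items, over 30 spent, within two weeks. Big data is the regime where we can no longer write the pattern down ourselves.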
There is so much data that humans cannot find the patterns. You have to take another step back. This is the abstract step, where algorithms enable us to find patterns via clustering, classification, machine learning and any number of other techniques underpinned not by code but by algorithms. This step is needed to find the patterns you or I cannot see. Just as there are wavelengths of light the human eye cannot see, there are patterns we cannot see beyond certain volumes of data. The volume beyond which we cannot see the patterns is big data.
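As a minimal sketch of what "finding patterns we did not specify" means, here is a toy one-dimensional k-means clustering, written from scratch rather than with any particular library. We never tell the algorithm where the groups are; it discovers them.

```python
import random

def k_means_1d(points, k, iterations=20, seed=0):
    """Tiny 1-D k-means: repeatedly assign points to the nearest
    centre, then move each centre to the mean of its cluster."""
    random.seed(seed)
    centres = random.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # Keep the old centre if a cluster ever ends up empty.
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return sorted(centres)

# Two obvious groups; the algorithm recovers them without being told the pattern.
data = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]
centres = k_means_1d(data, 2)
print(centres)  # centres near 1.0 and 10.0
```

With six points we could have eyeballed the answer; the point of the abstract step is that the same recipe still works on millions of points in hundreds of dimensions, where our eyes fail.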
This abstract algorithmic step goes even further. Not only will it search for patterns but it will also create the code we need to do this. In his book The Master Algorithm, Pedro Domingos describes how learner algorithms can be used to create new algorithms which in turn can write the code we need: "with machine learning, computers write their own programs, so we don't have to". To achieve this we need to better understand how these algorithms work and how to tailor them to suit our needs. Otherwise we will be unable to fully unlock the potential of this abstract transition.
The Industrial Revolution automated manual work and the Information Revolution did the same for mental work, but machine learning automates automation itself. Without it, programmers become the bottleneck holding up progress. With it, the pace of progress picks up. - Pedro Domingos, The Master Algorithm
We will still need programmers no matter what form the discrete to abstract transition may take. That is not the point. The point is not to imply that coding is unimportant or that there are no improvements that better coding can realise. Instead, the point is that we need to start thinking about algorithms. They are not just for mathematicians or academics. There are algorithms all around us and you don't need to know how to code to use or understand them. Right now there are people coming up with new algorithms by applying evolutionary techniques, via genetic programming, to vast amounts of big data to find optimisations and improvements in different fields. People have even created better optimisation techniques by looking at how metal cools and creating an algorithm to model that (it is called simulated annealing, and it's a great example of the different ways we can start thinking about algorithms; check it out).
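Simulated annealing is simple enough to sketch in a few lines. The idea borrowed from cooling metal is this: early on, at high "temperature", the search sometimes accepts worse solutions so it can escape local traps; as the temperature cools it settles into a good solution. The objective function and all the parameters below are invented for illustration.

```python
import math
import random

def simulated_annealing(f, x0, steps=20000, temp=5.0, cooling=0.999, seed=1):
    """Minimise f starting from x0. Worse moves are accepted with
    probability exp(-delta/temp), which shrinks as the temperature cools."""
    random.seed(seed)
    x, best = x0, x0
    for _ in range(steps):
        candidate = x + random.uniform(-0.5, 0.5)
        delta = f(candidate) - f(x)
        # Always accept improvements; sometimes accept worse moves.
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x = candidate
            if f(x) < f(best):
                best = x
        temp *= cooling  # the metal cools
    return best

# A bumpy function: a plain downhill search from x0 = -5 can get stuck
# in one of the ripples, but annealing can shake itself free of them.
bumpy = lambda x: (x - 2) ** 2 + 0.5 * math.sin(5 * x)
best = simulated_annealing(bumpy, x0=-5.0)
print(best)
```

The algorithmic idea, a cooling schedule that trades exploration for exploitation, came from metallurgy, not from programming; the Python here is incidental.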
The current rise of coding as a key skill of the new digital economy has obscured our view of algorithms. Algorithms are increasingly part of our everyday lives, from recommending our films to filtering our news and finding our partners. We need to understand them better in order to better understand, and control, our own futures.