Big Data has become the IT industry's newest buzzword. Everyone is talking about it, and many people use the term to impress others even when they don't really know what it means. It is often used out of context, more as a marketing gimmick than as a technical term. This article explains what Big Data really is and how it can help solve problems.
Physics and mathematics can give us the distance from the East Coast of the USA to the West Coast accurate to about 1 yard. This is a phenomenal achievement, and it has been applied to many technologies in our daily lives. The challenge arises when data is not static: it changes constantly, at rates and in volumes far too large to measure by hand in real time. The only practical way to process such data is with computers.
IBM data scientists break Big Data into four dimensions: volume, variety, velocity, and veracity. There are other aspects as well, and Big Data is commonly described by the following characteristics:
Volume: the size of the data. It determines the value and potential of the data under consideration, and whether it can be considered Big Data at all.
Variety: the category the data belongs to. Data analysts need to know this so that the people who work closely with the data can use it effectively and get the most value out of it.
Velocity: how fast the data is generated and how quickly it must be processed to be useful.
Variability: the data can be inconsistent, which can also be a problem for analysts.
Veracity: the quality of the captured data. Accurate analysis depends on the veracity of the source data.
A post on the Tibco Blog offers a very simple analogy for understanding what Big Data really is:
"One analogy for Big Data analysis is to compare your data to a large lake... Trying to get an accurate size of this lake down to the last gallon or ounce is virtually impossible... Now let's assume that you have built a big water counting machine... You feed all of the water in the lake through your big water counting machine, and it tells you the number of ounces of water in the lake... for that point in time."
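The "counting machine" in this analogy can be pictured as a running total over a stream that never stops flowing. The sketch below is a minimal Python illustration, assuming each reading is a hypothetical number of ounces that passed through the machine in one tick; the function names `running_totals` and `snapshot` are made up for this example. Because the inflow never ends, the only answer the machine can give is the total up to a chosen point in time.

```python
import itertools
from typing import Iterable, Iterator, Tuple


def running_totals(inflow: Iterable[int]) -> Iterator[Tuple[int, int]]:
    """Yield (tick, total_ounces) after every reading.

    Each value in `inflow` is a hypothetical reading: the ounces that
    passed through the counting machine during one tick.
    """
    total = 0
    for tick, ounces in enumerate(inflow, start=1):
        total += ounces
        yield tick, total


def snapshot(inflow: Iterable[int], at_tick: int) -> int:
    """Return the total counted after `at_tick` ticks: a point-in-time answer."""
    if at_tick < 1:
        raise ValueError("at_tick must be at least 1")
    for tick, total in running_totals(inflow):
        if tick == at_tick:
            return total
    raise ValueError("stream ended before the requested tick")


if __name__ == "__main__":
    # Simulated inflow that never ends, like water that keeps moving.
    lake_inflow = itertools.cycle([3, 5, 2])
    print(snapshot(lake_inflow, 4))  # 13
```

Asking for a later tick gives a different answer, which is the point of the analogy: the count is only valid "for that point in time."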
Paul Lewis of Hitachi Data Systems offers a better, more visual analogy. He often explains Big Data by showing a cartoon filled with hundreds of busy-looking people, each doing something different. He explains:
"You need to find the person with the suitcase of money (Value)... but there are many people (Volume), all walking at various speeds running to work (Velocity), from all walks of life (Variety), some are crooks (Veracity)."
Importance and Benefits
Prediction and analysis are among the major reasons we need Big Data. One of the best examples of Big Data in action is the Large Hadron Collider (LHC) experiment, in which about 150 million sensors deliver data 40 million times per second. After filtering, and discarding more than 99.999% of these streams without recording them, about 100 collisions of interest remain per second. Another example is Facebook, which handles over 50 billion user photos.
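The filter-then-keep idea can be sketched in a few lines: events are examined one at a time and only the rare "interesting" ones are passed on, so the discarded ones are never stored. This is a minimal Python illustration, not how the LHC actually works; the real trigger systems are far more elaborate. The predicate, the simulated readings, and the function names `keep_interesting` and `fraction_discarded` are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def keep_interesting(events: Iterable[T], is_interesting: Callable[[T], bool]) -> Iterator[T]:
    """Lazily pass through only the events the predicate accepts.

    Events are handled one at a time, so the full stream never has to be
    held in memory; rejected events are simply never stored.
    """
    for event in events:
        if is_interesting(event):
            yield event


def fraction_discarded(total: int, kept: int) -> float:
    """Share of the stream that was thrown away."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 1 - kept / total


if __name__ == "__main__":
    total = 1_000_000
    # Hypothetical readings: a repeating pattern standing in for sensor output.
    readings = (i % 100_000 for i in range(total))
    kept = sum(1 for _ in keep_interesting(readings, lambda r: r == 99_999))
    print(kept)  # 10
    print(f"{fraction_discarded(total, kept):.5%}")  # 99.99900%
```

Using a generator rather than building a list keeps memory use constant regardless of how long the stream runs, which matters when data arrives faster than it could ever be stored.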
Healthcare is another area where Big Data can play a significant role. One of the most striking examples is Google Flu Trends, which analyzes search data from various locations to identify patterns of influenza epidemics and endemics around the world. Although this data is not necessarily accurate and may contain many false positives, it highlights the potential of what such data can reveal.
A key benefit of Big Data is that it does not have to be stored in any specific format. Crudely put, it is a raw dump of data; that is, it is unstructured. Big Data systems use complex algorithms to classify and process this data, which is what makes them so distinctive.