What can data mining and big data do?
In short, they give us the ability to forecast.
1. Our lives have been digitized
Today, almost everything we do every day can be recorded. Every credit card transaction is digitized and traceable; our public presence is constantly monitored by the CCTV cameras on every corner of the city; for businesses, most financial and operating data is stored in some kind of ERP system; and with the rise of wearable devices, every heartbeat and breath is captured as usable data. Because so much of our lives is now digitized, a computer can "understand" our world better than ever before.
2. If the pattern remains unchanged, past = future
Many things in our lives follow patterns. For example, a person is likely to travel between home and work on working days and to go on vacation or watch a movie on non-working days, and this pattern is unlikely to change. A store has its peak hours and slack periods on any given day, and this pattern is unlikely to change. A business needs more labor in certain months of the year, and this pattern is unlikely to change.
Combining points 1 and 2, we can conclude that a computer can plausibly predict the future when it is given the patterns of the past, because these patterns are likely to stay consistent over a prolonged period of time.
If a computer can predict people's lifestyles, it will know the best time to offer a promotion: a car wash promotion for someone who tends to get a car wash every Friday, or a hotel coupon for someone who tends to go on vacation every March. For a store, a computer can forecast sales throughout the day and then shape the business strategy to maximize total revenue. For enterprises, a computer can also design an operational plan with the most reasonable workforce arrangement.
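As a rough illustration of "past = future", the sketch below averages a store's past sales by hour of day and uses that average as the forecast for the same hour on a future day. The data is made up, and the function names (hourly_profile, forecast) are hypothetical; a real forecasting system would account for far more than this.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(history):
    """Average past sales for each hour of the day.

    history: iterable of (hour, sales) pairs observed on past days.
    """
    by_hour = defaultdict(list)
    for hour, sales in history:
        by_hour[hour].append(sales)
    return {hour: mean(values) for hour, values in by_hour.items()}


def forecast(profile, hour):
    """Predict sales for an hour, assuming the past pattern continues.

    Returns None when that hour was never observed.
    """
    return profile.get(hour)


# Made-up observations: (hour of day, sales) over three past days.
history = [
    (9, 10), (12, 40), (18, 30),
    (9, 12), (12, 44), (18, 28),
    (9, 14), (12, 42), (18, 32),
]

profile = hourly_profile(history)
peak_hour = max(profile, key=profile.get)  # hour with the highest average sales
print(peak_hour, forecast(profile, peak_hour))
```

The forecast is only as good as the assumption behind it: if the pattern changes, the averages stop being useful.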
As soon as the future becomes predictable, we can plan ahead and prepare the best possible move. Neo in "The Matrix" is able to dodge bullets because he can clearly see where they are coming from. According to Sherlock Holmes, "an advanced grasp of the mathematics of probability, mapped onto a thorough apprehension of human psychology, and the known dispositions of any given individual can reduce the number of variables considerably". In other words, big data gives us the power to predict the future. This is the power of data mining. Data mining is consistently tied to Big Data simply because Big Data makes massive datasets usable, and those datasets are the basis of every prediction.
So, what exactly are Big Data, Data Mining, and Machine Learning?
When the amount of data is tremendous, it cannot practically be handled on a single machine. Take an extremely large file, let's say 10 GB: chances are you won't be able to open it on an ordinary Windows system without bringing the whole machine down. Big data technology was developed for exactly this situation. You can think of it as special software that splits a big file into much smaller pieces, which can then be processed on many machines at once. The process of dividing the data, processing the pieces, and combining the partial results is known as MapReduce, and the software framework most commonly used for it is called Hadoop. Hadoop solves the basic problem, and a number of tools, such as Pig, ZooKeeper, and Hive, are used alongside it to make the process even easier. Hadoop together with its many associated tools is generally referred to as "Big Data technology".
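The divide-and-combine idea behind MapReduce can be sketched in a few lines of plain Python. This is a single-process illustration only, not Hadoop's actual API: the helper names (split, map_chunk, combine) are hypothetical, and in a real cluster each chunk would be processed on a different machine.

```python
from collections import Counter
from functools import reduce


def split(records, chunk_size):
    """Divide the full dataset into smaller chunks, one per hypothetical worker."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def combine(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


lines = ["big data", "data mining", "big data mining", "machine learning"]

chunks = split(lines, chunk_size=2)
partials = [map_chunk(chunk) for chunk in chunks]  # each could run on its own machine
totals = reduce(combine, partials, Counter())
print(totals.most_common(3))
```

Because each chunk is counted without looking at any other chunk, the map step can be spread across as many machines as there are chunks; only the small partial results need to be brought together at the end.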
We have just touched on how data can be processed. Suppose the data describes a group of shoppers' purchasing behavior, and we compute the total number of items purchased and the number of items purchased by each shopper. So far, this is simple statistical analysis. However, if our goal is to analyze the correlation between different types of shoppers, to infer the preferences of a specific type of shopper, or even to predict a shopper's gender or age, we need a much more complicated model, which we call an algorithm. Machine learning is most easily understood as the collection of algorithms developed for data mining purposes, such as logistic regression, decision trees, collaborative filtering, and many more.
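To make the difference concrete, here is a minimal sketch that first computes the simple statistics (total items, items per shopper) and then fits a tiny logistic regression to predict a shopper type from the number of items bought. Everything here is an assumption for illustration: the purchase data, the "bulk buyer" label, and the hand-written gradient descent, which stands in for what a real machine learning library would do.

```python
import math
from collections import Counter


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=1000):
    """Fit p(y=1 | x) = sigmoid(w * (x - center) + b) by batch gradient descent."""
    center = sum(xs) / len(xs)  # centering the feature keeps the updates well behaved
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * (x - center) + b) - y
            grad_w += error * (x - center)
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b, center


def predict_proba(model, x):
    """Probability that a shopper who bought x items is a bulk buyer."""
    w, b, center = model
    return sigmoid(w * (x - center) + b)


# Made-up purchase log: (shopper, items bought in one visit).
purchases = [("ann", 1), ("ann", 1), ("bob", 3), ("cat", 1),
             ("dan", 5), ("dan", 4), ("eve", 8), ("fay", 10)]

# Simple statistics: totals overall and per shopper.
items_per_shopper = Counter()
for shopper, items in purchases:
    items_per_shopper[shopper] += items
total_items = sum(items_per_shopper.values())

# Hypothetical labels: 1 = "bulk buyer", 0 = "occasional buyer".
labels = {"ann": 0, "bob": 0, "cat": 0, "dan": 1, "eve": 1, "fay": 1}
shoppers = sorted(items_per_shopper)
xs = [items_per_shopper[s] for s in shoppers]
ys = [labels[s] for s in shoppers]

model = fit_logistic(xs, ys)
print(total_items, predict_proba(model, 2) < 0.5, predict_proba(model, 9) > 0.5)
```

The counting part only describes what already happened. The fitted model goes one step further and gives a probability for a shopper it has never seen, which is the kind of prediction the statistics alone cannot provide.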
By applying machine learning algorithms, existing data can be used to predict unknowns, which is exactly why data mining is so closely connected to machine learning. Nevertheless, the strength of any machine learning algorithm depends heavily on the supply of large datasets. Keep in mind that no matter how sophisticated an algorithm is, no meaningful prediction can be made from a few lines of data. Big data technology is the foundation for machine learning, and machine learning lets us extract valuable insights from existing datasets. That is data mining.
The article was originally published here