There's a lot of social media and general internet buzz regarding Big Data, but what exactly is it? Here are 5 interesting things to know about Big Data.
1. What is it?
Simply put, Big Data refers to large data sets that are analysed computationally to reveal patterns and trends. There's no minimum amount of data needed for it to be categorised as Big Data, as long as there's enough to draw solid conclusions.
M-Brain explains the different facets of Big Data through the 8 V's.
Fig. 1: M-Brain - Big Data with 8 V's
2. How can I access Big Data?
Big Data is available in an endless number of places and it's only increasing as time goes on. A simple Google search will enable you to find a data repository for just about everything. A lot of people aren't aware of just how much data is already available for access and analysis. KD Nuggets has an extensive list of Datasets for Data Mining and Data Science available here - https://www.kdnuggets.com/datasets/index.html
Accessing and using this data can be split into six parts:
Before anything happens, some data is needed. This can be gained in a number of ways, normally via an API call to a company's web service.
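As a rough sketch of what such an API call might look like, the Python below builds a request URL and decodes a JSON response. The endpoint, the query parameters and the `fake_opener` stand-in are all hypothetical; a real web service will have its own URL scheme and will usually need an API key.

```python
import io
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, used purely for illustration.
BASE_URL = "https://api.example.com/v1/measurements"


def build_url(base_url, params):
    """Append query parameters to a base URL."""
    return f"{base_url}?{urlencode(params)}"


def fetch_records(url, opener=urlopen):
    """Request a URL and decode the JSON body into Python objects."""
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


def fake_opener(url):
    """Offline stand-in for urlopen so the sketch runs without a network."""
    return io.BytesIO(b'[{"city": "Leeds", "temp_c": 11.5}]')


url = build_url(BASE_URL, {"city": "Leeds", "limit": 2})
records = fetch_records(url, opener=fake_opener)
```

Dropping the `opener=fake_opener` argument makes `fetch_records` use the real `urlopen`, which is what you would do against a live service.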
The main difficulty with Big Data is managing how it will be stored. It all depends on the budget and expertise of the individual responsible for setting up the data storage as most providers will require some programming knowledge to implement. A good provider should allow you a safe, straight-forward place to store and query your data.
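To give a feel for "store and query", here is a minimal sketch that uses SQLite, which ships with Python, as a stand-in for whichever storage provider you choose. The table name, columns and sample rows are made up for illustration.

```python
import sqlite3


def store_and_query(rows):
    """Store (city, temp_c) rows, then return the average temperature per city."""
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
    conn.execute("CREATE TABLE readings (city TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT city, AVG(temp_c) FROM readings GROUP BY city ORDER BY city"
    ).fetchall()
    conn.close()
    return result


averages = store_and_query([("Leeds", 10.0), ("Leeds", 12.0), ("York", 9.0)])
```

A hosted data warehouse would replace the `sqlite3.connect` call, but the pattern of loading rows and then asking questions in SQL stays much the same.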
Like it or not, data sets come in all shapes and sizes. Before you can even think about how the data will be stored, you need to make sure it is in a clean and acceptable format.
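What "clean and acceptable" means depends on the data, but a small sketch might trim stray whitespace, standardise capitalisation, convert numbers stored as text, and drop incomplete or duplicate rows. The field names `city` and `temp_c` are assumptions chosen for the example.

```python
def clean_rows(raw_rows):
    """Return tidy, de-duplicated rows with a city name and a numeric temperature."""
    cleaned = []
    seen = set()
    for row in raw_rows:
        city = (row.get("city") or "").strip().title()
        try:
            temp_c = float(str(row.get("temp_c", "")).strip())
        except ValueError:
            continue  # missing or unparseable temperature
        if not city:
            continue  # missing city name
        key = (city, temp_c)
        if key in seen:
            continue  # exact duplicate after tidying
        seen.add(key)
        cleaned.append({"city": city, "temp_c": temp_c})
    return cleaned


raw = [
    {"city": " leeds ", "temp_c": "11.5"},
    {"city": "Leeds", "temp_c": 11.5},
    {"city": "York", "temp_c": "n/a"},
    {"city": "", "temp_c": "9"},
    {"city": "york", "temp_c": " 9 "},
]
tidy = clean_rows(raw)
```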
Data mining is the process of discovering insights within a database. The aim of this is to provide predictions and make decisions based on the data currently held.
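Data mining covers many techniques. As one simple illustration (not the only approach), the sketch below counts which pairs of items turn up together most often in a set of shopping baskets, a basic form of market-basket analysis. The baskets and the `min_support` threshold are invented for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Count item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) ignores repeats and gives each pair a stable order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "milk"],
]
pairs = frequent_pairs(baskets, min_support=2)
```

A retailer could use a result like this to decide which products to place or promote together.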
Once all the data has been collected it needs to be analysed to look for interesting patterns and trends. A good data analyst will spot something out of the ordinary, or something that hasn't been reported by anyone else.
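One simple way to flag "something out of the ordinary" is to measure how far each value sits from the average, in units of standard deviation (a z-score). The sketch below assumes a plain list of numbers and an arbitrary threshold of 2; real analyses often need something more robust.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


unusual = find_outliers([10, 11, 9, 10, 12, 10, 11, 50])
```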
Perhaps the most important part is the visualisation of the data. This step takes all the prior work and produces a visual that ideally anyone can understand. This can be done using libraries such as Plotly and D3.js or software such as Tableau.
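In practice you would reach for one of the tools named above. Purely to illustrate the idea of turning numbers into something visual, here is a plain-text bar chart that needs nothing beyond standard Python. The totals are invented.

```python
def text_bar_chart(totals, width=20):
    """Render a dict of label -> value as horizontal bars scaled to `width`."""
    peak = max(totals.values())
    if peak <= 0:
        raise ValueError("at least one value must be positive")
    lines = []
    for label, value in totals.items():
        bar = "#" * round(width * value / peak)
        lines.append(f"{label:<8}{bar} {value}")
    return "\n".join(lines)


chart = text_bar_chart({"Leeds": 40, "York": 20, "Hull": 10})
print(chart)
```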
3. Are there careers related to Big Data?
With the growing access to Big Data, it should come as no surprise that the number of related careers is on the rise as well. According to Data Motion, a Big Data Engineer would earn an average salary of $150,000 a year.
Fig. 2: Top 10 Big Data Jobs
4. Is it a growing industry?
In short, yes. General interest in, and access to, Big Data are on the rise. This Google Trends chart (https://g.co/trends/pxXJa) shows the increase in popularity of the search term 'Big Data' between 2004 and the present day.
Fig. 3: Google Trends for Big Data, 2004-2018
According to IDC, "Worldwide revenues for big data and business analytics (BDA) will reach $150.8 billion in 2017, an increase of 12.4 percent over 2016". The company goes on to estimate that by 2020, big data revenues could top $210 billion.
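To put those figures in context, a quick back-of-the-envelope calculation gives the 2016 revenue implied by the quote and the yearly growth needed to get from $150.8 billion in 2017 to $210 billion in 2020. The function names are my own, and the result is only as good as the quoted estimates.

```python
def implied_previous(current, growth_rate):
    """Work backwards from a value and its growth rate to the prior value."""
    return current / (1 + growth_rate)


def compound_annual_growth(start, end, years):
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


revenue_2016 = implied_previous(150.8, 0.124)                   # in $ billions
growth_to_2020 = compound_annual_growth(150.8, 210.0, 3)        # as a fraction

print(f"Implied 2016 revenue: ${revenue_2016:.1f} billion")
print(f"Growth needed per year, 2017 to 2020: {growth_to_2020:.1%}")
```

That works out at roughly $134 billion for 2016 and a little under 12 percent growth a year, so the 2020 estimate assumes growth continuing at about the same pace as 2017.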
5. How do I learn more?
Big Data is a broad subject, so learning it requires knowledge of several areas. Someone looking to work in the field would need a range of skills, including one or more of the following:
- A knowledge of a programming language that relates to data analysis, namely R, Python, SAS or SQL
- A good understanding of Maths and Statistics
- Experience of scraping webpages
- Basic Excel skills
- Mathematics for Big Data
- Business and scientific applications of Big Data
- Big databases and NoSQL including MongoDB, Cassandra and Neo4j
- Analytics, machine learning and data visualisation using Weka, R and scikit-learn
- Optimisation and heuristics for big problems
- Cluster computing with Hadoop, Spark, Hive and MapReduce
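
To give a flavour of the last item, the MapReduce idea can be sketched in ordinary Python as a word count: a map step emits (word, 1) pairs, a shuffle step groups them by word, and a reduce step adds up each group. This runs on a single machine and uses no Hadoop or Spark APIs; on a real cluster the framework spreads these steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data", "Big ideas"])
```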