We live in a world that is turning digital faster than we can keep up with. Amid this wave of digital transformation, big data is giving organizations a real edge: analyzing customer behavior and personalizing the customer experience improves satisfaction and, in turn, the company's revenue. The big data market has grown steadily as more and more enterprises adopt data-driven strategies. Over the last few years Apache Hadoop has been the most popular go-to tool for big data analysis, but it is far from the only one: many other big data tools are out there, and each of them comes with the promise of saving you money and time and uncovering business insights you have never seen before.
This post covers a few of the other well-known big data tools:
1. Avro: a data serialization system created by Doug Cutting. It encodes the schema of Hadoop files as JSON and stores it alongside the data, so any reader can decode the records.
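To make that concrete, here is a minimal sketch of what an Avro-style schema declaration looks like, using only the standard library. The "User" record and its fields are hypothetical, chosen just for illustration; real pipelines would feed this JSON to an Avro library.

```python
import json

# A hypothetical Avro schema for a "User" record, written as JSON.
# Avro stores this schema alongside the encoded data so readers
# always know how to decode it.
user_schema = json.dumps({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id",    "type": "long"},
        {"name": "name",  "type": "string"},
        # A union with "null" makes the field optional.
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
})

# Parsing it back confirms the declaration is well-formed JSON.
parsed = json.loads(user_schema)
field_names = [f["name"] for f in parsed["fields"]]
```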
2. Cassandra: a distributed, open source NoSQL database. It is designed to handle large amounts of data spread across commodity servers while providing a highly available service. Initially developed at Facebook, it is used by organizations such as Netflix, Cisco, and Twitter.
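One way to picture how Cassandra spreads rows across those commodity servers is hashing a partition key onto a ring of nodes. This is only a conceptual sketch in Python, not the driver API; real Cassandra uses Murmur3 tokens and virtual nodes, and the node names below are made up.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # a toy three-node cluster

def node_for(partition_key: str) -> str:
    """Map a partition key to a node by hashing it onto the ring.
    md5 is used here only to make the idea concrete and repeatable."""
    token = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return NODES[token % len(NODES)]

# Every replica computes the same mapping, so any node can route
# a request for this key to its owner without a central lookup.
owner = node_for("user:42")
```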
3. Drill: an open source distributed system for interactive analysis of large-scale datasets. It is similar to Google's Dremel and is managed by Apache.
4. Elasticsearch: an open source search engine built on Apache Lucene. Written in Java, it can power extremely fast searches to support your data discovery applications.
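Those fast searches rest on Lucene's inverted index: instead of scanning every document, the engine keeps a map from each term to the documents containing it. A toy Python version of that idea (the documents are invented; this is not the Elasticsearch API):

```python
from collections import defaultdict

docs = {
    1: "big data tools for search",
    2: "search engines index data",
    3: "data pipelines in hadoop",
}

# Build the inverted index: term -> set of document ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

# A query for documents containing both terms is then a set
# intersection over the index, not a scan of every document.
hits = index["data"] & index["search"]
```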
5. Flume: a framework for populating Hadoop with data from web servers, application servers, and mobile devices; it is the plumbing between your data sources and Hadoop.
6. HCatalog: a centralized metadata management and sharing service for Apache Hadoop. It provides a unified view of all data in Hadoop clusters and lets diverse tools, including Pig and Hive, process any data elements without needing to know where in the cluster the data is physically stored.
7. Impala: provides fast, interactive SQL queries directly on Apache Hadoop data stored in HDFS or HBase, using the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Hue Beeswax) as Apache Hive. This gives you a familiar, unified platform for both batch-oriented and real-time queries.
8. Kafka: a distributed publish-subscribe messaging system capable of handling all the data-flow activity of a consumer website and processing that data. This kind of data (page views, searches, and other user actions) is a key ingredient of the current social web.
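The core abstraction behind that publish-subscribe model is an append-only log: producers append messages, and each consumer reads from its own offset, which is what allows replay and fan-out of the same stream. A minimal in-memory sketch of the model (the class and events are invented; this is not the Kafka client API):

```python
class ToyLog:
    """An append-only topic log. Consumers track their own offsets,
    so many readers can consume the same stream independently."""

    def __init__(self):
        self.messages = []

    def produce(self, message):
        self.messages.append(message)

    def consume(self, offset):
        """Return all messages at or after `offset`."""
        return self.messages[offset:]

topic = ToyLog()
topic.produce({"event": "page_view", "page": "/home"})
topic.produce({"event": "search", "query": "big data"})

fresh_reader = topic.consume(0)  # a new consumer sees the full history
caught_up = topic.consume(2)     # a caught-up consumer sees nothing new
```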
9. MongoDB: an open source, document-oriented NoSQL database. It comes with full index support and the flexibility to index any attribute, and it scales horizontally without sacrificing functionality.
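Since the real client (pymongo) needs a running server, here is only a stdlib sketch of the document model: records are free-form documents, and "indexing any attribute" amounts to maintaining a lookup table on that field. The collection and field names are invented for illustration.

```python
# A hypothetical collection of JSON-like documents
# (MongoDB actually stores them as BSON).
collection = [
    {"_id": 1, "name": "alice", "city": "Oslo"},
    {"_id": 2, "name": "bob",   "city": "Lima"},
    {"_id": 3, "name": "carol", "city": "Oslo"},
]

# An index on the "city" attribute: value -> matching document ids.
# With it, a query by city is a dictionary lookup, not a full scan.
city_index = {}
for doc in collection:
    city_index.setdefault(doc["city"], []).append(doc["_id"])

oslo_ids = city_index["Oslo"]
```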
10. Neo4j: a graph database that boasts performance improvements of up to 1,000x or more compared with relational databases.
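The claimed speedup comes from storing relationships directly: traversing the graph is pointer chasing rather than repeated table joins. A toy adjacency-list traversal in Python makes the contrast visible (Neo4j itself is queried with Cypher; the social graph below is invented):

```python
# Hypothetical social graph: person -> set of direct friends.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann"},
    "dan": {"bob"},
}

def friends_of_friends(person):
    """Two-hop traversal. In a relational store this would be a
    self-join on a friendships table; in a graph store it is just
    following edges outward from the starting node."""
    direct = friends.get(person, set())
    two_hop = set()
    for friend in direct:
        two_hop |= friends.get(friend, set())
    # Exclude people already known directly, and the person themselves.
    return two_hop - direct - {person}

fof = friends_of_friends("ann")
```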
11. Oozie: a workflow processing system that lets users define a series of jobs written in multiple languages, such as MapReduce, Pig, and Hive, and then intelligently links them to one another. Oozie allows users to specify dependencies, so a job starts only once the jobs it depends on have finished.
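Real Oozie workflows are defined in XML, but the dependency idea boils down to a topological sort: run each job only after everything it depends on has completed. A minimal stdlib sketch, using an invented workflow of Pig, Hive, and MapReduce jobs:

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: a Pig cleanup job feeds two Hive jobs,
# which both feed a final MapReduce report.
deps = {
    "pig_clean": set(),
    "hive_agg":  {"pig_clean"},
    "hive_join": {"pig_clean"},
    "mr_report": {"hive_agg", "hive_join"},
}

# static_order() yields each job only after all of its
# prerequisites, which is exactly the scheduling guarantee
# a workflow engine provides.
run_order = list(TopologicalSorter(deps).static_order())
```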
12. Pig: a Hadoop-based language developed at Yahoo. It is relatively easy to learn and is adept at expressing very deep, very long data pipelines.
13. Storm: a free, open source system for real-time distributed computing. Storm makes it easy to reliably process unbounded streams of data in real time. It is fault-tolerant and works with nearly any programming language, though Java is typical. Originally developed at BackType and later acquired by Twitter, Storm is now part of the Apache family.
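Storm structures a computation as a topology: spouts emit streams of tuples and bolts transform them. A toy single-process imitation of that dataflow in Python (the real API is JVM-based, and the log lines below are invented):

```python
def spout():
    """Spout: the source of the stream. Here it emits fake log lines;
    in Storm it might read from a queue like Kafka."""
    yield from ["error disk full", "info ok", "error timeout"]

def filter_bolt(stream):
    """Bolt: pass through only the error tuples."""
    for line in stream:
        if line.startswith("error"):
            yield line

def count_bolt(stream):
    """Bolt: count the tuples that reach it."""
    n = 0
    for _ in stream:
        n += 1
    return n

# Wiring spout -> filter -> count mirrors a (very small) topology.
error_count = count_bolt(filter_bolt(spout()))
```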
14. Tableau: a data visualization tool with a primary focus on business intelligence. You can create maps, bar charts, scatter plots, and more without any programming. Tableau also offers a web connector that lets you connect to a database or API, giving you live data in your visualizations.