Programming is an integral part of data science. It is widely acknowledged that a person who understands programming logic, loops, and functions has a higher chance of becoming a successful data scientist. But what about those folks who never studied programming in their school or college days?
With the recent boom in data science, a lot of people are interested in getting into this domain but don't have the slightest idea about coding. In fact, I too was a member of your non-programming league until I joined my first job. Therefore, I understand how terrible it feels when something you have never learned haunts you at every step. The good news is that there is a way for you to become a data scientist, regardless of your programming skills. There are tools that largely remove the programming aspect and provide a user-friendly GUI (Graphical User Interface), so that anyone with minimal knowledge of algorithms can use them to build high-quality machine learning models.
Many companies, especially startups, have recently launched GUI-driven data science tools. List of tools for data science and machine learning for people who don't know programming:
RapidMiner (RM) was originally started in 2006 as open-source stand-alone software named Rapid-I. Over the years, it was renamed RapidMiner and the company raised USD 35 million in funding. Older versions of the tool (below v6) are open-source, but the latest versions come with a 14-day trial period and require a license after that.
RM covers the entire life cycle of predictive modeling, from data preparation to model building and finally validation and deployment. The GUI is based on a block-diagram approach, very similar to MATLAB Simulink. There are predefined blocks that act as plug-and-play components. You just have to connect them in the right order, and a large variety of algorithms can be run without a single line of code. On top of this, RM allows custom R and Python scripts to be integrated into the system.
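For readers curious about what a block-diagram tool does behind the scenes, the idea of connecting blocks can be sketched in a few lines of Python. This is only an illustration of the concept, not RapidMiner's actual API: the block names (`drop_missing`, `divide_column`, `column_mean`), the `connect` helper, and the sample rows are all hypothetical.

```python
from functools import reduce

def drop_missing(rows):
    """Block: discard rows that contain a missing (None) value."""
    return [row for row in rows if None not in row.values()]

def divide_column(name, divisor):
    """Block factory: return a block that rescales one column."""
    def block(rows):
        return [{**row, name: row[name] / divisor} for row in rows]
    return block

def column_mean(name):
    """Block factory: return a block that reduces the rows to one column's mean."""
    def block(rows):
        return sum(row[name] for row in rows) / len(rows)
    return block

def connect(*blocks):
    """Wire blocks together so each block's output feeds the next one."""
    return lambda data: reduce(lambda acc, block: block(acc), blocks, data)

# Hypothetical sample data; one row has a missing age.
rows = [
    {"age": 30, "income": 52000},
    {"age": None, "income": 41000},
    {"age": 45, "income": 68000},
]

# Data preparation -> transformation -> summary, connected left to right.
pipeline = connect(drop_missing, divide_column("income", 1000), column_mean("income"))
print(pipeline(rows))  # mean income, in thousands, of the complete rows
```

In a GUI tool, each function above would be a box on the canvas and `connect` would be the wires you draw between them.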
DataRobot (DR) is a highly automated machine learning platform built by some of the all-time best Kagglers, including Jeremy Achin, Thomas DeGodoy and Owen Zhang. The platform claims to have obviated the need for data scientists. This is evident from a statement on their website: "Data science requires math and stats aptitude, programming skills, and business knowledge. With DataRobot, you bring the business knowledge and data, and our cutting-edge automation takes care of the rest."
These processes will obviously iterate in different orders. The BigML platform provides nice visualizations of results and has algorithms for solving classification, regression, clustering, anomaly detection and association discovery problems. BigML offers several packages bundled together in monthly, quarterly and yearly subscriptions. There is even a free package, but the size of the dataset you can upload is limited to 16 MB.
Cloud AutoML is part of Google's machine learning suite and enables people with limited ML expertise to build high-quality models. The first product in the Cloud AutoML portfolio is Cloud AutoML Vision. This service makes it simpler to train image recognition models. It has a drag-and-drop interface that lets the user upload images, train models, and then deploy those models directly on Google Cloud.
Cloud AutoML Vision is built on Google's transfer learning and neural architecture search technologies. This tool is already being used by a lot of organizations.
Paxata is one of the few organizations that focus on data cleaning and preparation rather than the machine learning or statistical modeling part. It is an MS Excel-like application that is easy to use. It also provides visual guidance, making it easy to bring together data, find and fix dirty or missing data, and share and re-use data projects across teams. Like the other tools mentioned in this article, Paxata eliminates coding and scripting, thereby overcoming the technical barriers involved in handling data.
MLBase is an open-source project developed by the AMP (Algorithms, Machines, People) Lab at the University of California, Berkeley. The core idea behind it is to provide an easy solution for applying machine learning to large-scale problems.
It has three offerings:
MLlib: It works as the core distributed ML library in Apache Spark. It was originally developed as part of the MLBase project, but it is now supported by the Spark community.
MLI: An experimental API for feature extraction and algorithm development that introduces high-level ML programming abstractions.
ML Optimizer: This layer aims to automate the task of ML pipeline construction. The optimizer solves a search problem over the feature extractors and ML algorithms included in MLI and MLlib.
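To make the idea of "a search problem over feature extractors and ML algorithms" concrete, here is a minimal sketch in plain Python. It is not MLBase code: the toy dataset, the two feature extractors, the two learners, and the exhaustive `search` function are all assumptions made for illustration, and a real optimizer would search a far larger space far more cleverly.

```python
from itertools import product

# Hypothetical 1-D dataset: the label is 1 when |x| > 2, else 0.
train = [(-4.0, 1), (-3.0, 1), (-1.0, 0), (0.0, 0), (1.0, 0), (3.0, 1), (4.0, 1)]
valid = [(-3.5, 1), (-0.5, 0), (0.5, 0), (2.5, 1)]

# Candidate feature extractors (stand-ins for what a real system would offer).
feature_extractors = {
    "identity": lambda x: x,
    "absolute": lambda x: abs(x),
}

def fit_threshold(points):
    """Learner: pick the cut-off t that best separates the training labels."""
    values = sorted({f for f, _ in points})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_t = max(candidates, key=lambda t: sum((f > t) == bool(y) for f, y in points))
    return lambda f: int(f > best_t)

def fit_majority(points):
    """Learner: always predict the most common training label."""
    labels = [y for _, y in points]
    majority = int(sum(labels) * 2 >= len(labels))
    return lambda f: majority

learners = {"threshold": fit_threshold, "majority": fit_majority}

def accuracy(model, points):
    """Fraction of points whose label the model predicts correctly."""
    return sum(model(f) == y for f, y in points) / len(points)

def search(train, valid):
    """Try every (extractor, learner) pair and score it on held-out data."""
    results = []
    for (fname, extract), (lname, fit) in product(feature_extractors.items(), learners.items()):
        model = fit([(extract(x), y) for x, y in train])
        score = accuracy(model, [(extract(x), y) for x, y in valid])
        results.append((score, fname, lname))
    return max(results, key=lambda r: r[0]), results

best, results = search(train, valid)
print(best)  # (1.0, 'absolute', 'threshold')
```

The point of the sketch is that the user never chooses the pipeline by hand. The search tries each combination, scores it on held-out data, and returns the best one, which is the kind of work an automated layer takes off your plate.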
Auto-WEKA is built on WEKA, the data mining software written in Java and developed by the Machine Learning Group at the University of Waikato, New Zealand. It is a GUI-based tool that is very good for beginners in data science. The best part about it is that it is open-source and the developers have provided tutorials and papers to help you get started.
Driverless AI is a magical platform for enterprises from H2O.ai that supports automatic machine learning. A one-month trial version is available as a Docker image at this link. All you have to do is use simple dropdowns to select the train and test files and specify the metric you want to use to track model performance. Sit back and watch as the platform, with its intuitive interface, trains on your dataset and gives excellent results, on par with a good solution an experienced data scientist could come up with.
When there are so many big-name players in this field, how could Microsoft lag behind? Azure ML Studio is a simple yet powerful browser-based ML platform. It has a visual drag-and-drop environment that requires no coding. Microsoft has published comprehensive tutorials and sample experiments so newcomers can get the hang of the tool quickly. It employs a simple five-step process:
Import your dataset
Perform data cleaning and other pre-processing steps, if necessary
IBM is one of the most recognizable brands in the world. IBM Watson Studio provides a beautiful platform for building and deploying your machine learning and deep learning models. You can interactively discover, clean and transform your data, use familiar open-source tools such as Jupyter notebooks and RStudio, access the most popular libraries, and train deep neural networks, among a vast array of other things.
For people just starting out in this field, IBM has provided a bunch of videos to ease the introductory phase. You can choose to take a free trial and check out this awesome tool for yourself. The above video guides you through how to create a project in Watson Studio.