There is a wealth of material available for anyone interested in machine learning with Python. As we move into 2019, it is a good time to revisit the subject and lay out a fresh learning path for mastering machine learning with Python.
This path will be split into three parts, one each for basic, intermediate, and advanced topics. Treat these labels as relative: even after getting through the advanced post, do not expect to be a research-caliber machine learning engineer. Rather, this learning path is aimed at readers with some understanding of programming, computer science concepts, and/or machine learning in the abstract, who want to be able to use the implementations of machine learning algorithms in the prevalent Python libraries to build their own models.
Fear not if the steps seem mostly aimed at machine learning algorithms; along the way you will also encounter additional important concepts, such as data preprocessing, loss metrics, data visualization, and much more.
So grab a cup of your favorite beverage and settle in for the first installment of the series, and start mastering basic machine learning with Python in these 7 steps.
1. Mastering Python Basics
I looked for some updated materials for this section, beyond those I pointed out in previous iterations, both for variety's sake and to keep up with recent versions of Python.
As you will be needing a number of Python's more popular scientific libraries as we progress, I recommend using the Anaconda distribution, which you can download here, instead of installing components separately. Just launch the installer, and when it's done you will have Python, Jupyter Notebook, and everything else you will need moving forward.
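Once the installer finishes, a few lines in a notebook or Python shell will confirm everything is in place (the version numbers you see will depend on your Anaconda release):

```python
import sys

import matplotlib
import numpy as np
import pandas as pd

# If any of these imports fails, the installation did not complete correctly.
print("Python:", sys.version.split()[0])
print("NumPy:", np.__version__)
print("pandas:", pd.__version__)
print("Matplotlib:", matplotlib.__version__)
```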
2. Understanding the Python Scientific Computing Environment
So, you have Python and the scientific computing stack installed and ready to go. But why?
Before going too much further, it's a good idea to understand what the scientific computing stack is, what its most prominent and important components are, and how they will be used in a machine learning environment.
This article from Dataquest, aptly titled Jupyter Notebook for Beginners: A Tutorial, dives into why we are using Jupyter notebooks at all and introduces some of the most important Python libraries you will encounter along this path, namely pandas, NumPy, and Matplotlib.
The tutorial does not cover scikit-learn, one of the main engines of the actual machine learning process in the Python ecosystem, which contains implementations of dozens of algorithms for you to use in your own projects. However, the introductory article An introduction to machine learning with scikit-learn, directly from the maintainers of scikit-learn, will give you an overview of its basics in 5 minutes.
As an exercise left to the reader, I would suggest locating and becoming familiar with the contents of the documentation for pandas, NumPy, Matplotlib, and scikit-learn, and keeping the links handy as references moving forward. At any rate, make sure you are comfortable with the basics of these 4 tools specifically, as they are heavily used in basic Python machine learning.
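As a quick self-check before moving on, a few lines exercising each library will confirm you have the basics down (the column names and values here are arbitrary examples):

```python
import numpy as np
import pandas as pd

# NumPy: fast numeric arrays with vectorized operations
a = np.array([1.0, 2.0, 3.0, 4.0])
mean_a = a.mean()  # 2.5

# pandas: labeled, tabular data built on top of NumPy
df = pd.DataFrame({"height": [1.60, 1.75, 1.82], "weight": [55, 70, 80]})
tallest = df["height"].max()  # 1.82

# Matplotlib: plotting; pandas calls it under the hood
# df.plot.scatter(x="height", y="weight")  # uncomment inside a notebook
```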
3. Introduction to Classification
Classification is one of the main methods of supervised learning, and the way predictions are made for data with class labels. Classification involves finding a model which describes data classes, and which can then be used to classify instances of unseen data. The concept of training data versus testing data is of integral importance to classification. Popular classification algorithms for model building include (but are not limited to) decision trees, logistic regression, support vector machines, and neural networks.
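The training-versus-testing split mentioned above looks like this in scikit-learn (the iris dataset here is just a convenient labeled stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 labeled instances, 4 features each

# Hold out 25% of the labeled data so the fitted model can be evaluated
# on instances it never saw during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)
```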
First, watch MIT professor John Guttag's lecture on classification.
Then have a look at the following tutorials, each of which covers an elementary machine learning classification algorithm (how exciting, your first machine learning algorithm!).
Susan Li provides a detailed overview of implementing the most basic classifier, logistic regression, in Building A Logistic Regression in Python, Step by Step.
Once you have completed Susan's tutorial, follow Russell Brown's concise Creating and Visualizing Decision Trees with Python.
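Having worked through both tutorials, the core pattern they each build on can be sketched as follows; note that the iris data and the hyperparameter values here are illustrative choices, not the tutorials' own:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Every scikit-learn classifier follows the same fit/predict interface.
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print("Logistic regression accuracy:", logreg.score(X_test, y_test))
print("Decision tree accuracy:", tree.score(X_test, y_test))
```

The shared fit/predict interface is why learning one scikit-learn classifier largely teaches you all of them.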
4. Introduction to Regression
Regression is similar to classification, in that it is another dominant form of supervised learning and is useful for predictive analysis. The two differ in that classification is used to predict distinct, finite classes, while regression is used to predict continuous numeric values. As a form of supervised learning, the training/testing data concept is important in regression as well.
First, watch CMU professor Tom Mitchell's lecture on regression.
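To make the contrast with classification concrete, here is a minimal linear regression fit on synthetic data; the slope of 3 and intercept of 2 are arbitrary choices for the sake of the example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic continuous target: y = 3x + 2, plus a little Gaussian noise
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 2 + rng.normal(scale=0.5, size=100)

# Fit recovers the slope and intercept from the noisy data
model = LinearRegression().fit(X, y)
print("slope:", model.coef_[0])        # close to 3
print("intercept:", model.intercept_)  # close to 2
```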
5. Introduction to Clustering
Clustering is used to analyze data which does not have prelabeled classes. Data instances are grouped so as to maximize intra-class similarity and minimize inter-class similarity: the clustering algorithm identifies and groups instances which are very similar to one another, as opposed to instances in other groups which are much less similar. Clustering is a form of unsupervised learning, and as such does not require class labels.
k-means is the best-known clustering algorithm, but it is not the only one. There are several other clustering schemes, including hierarchical clustering, fuzzy clustering, and density-based clustering, as well as different takes on centroid-style clustering. For more on this, read Jake Huneycutt's An Introduction to Clustering Algorithms in Python.
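A minimal sketch of centroid-style clustering with scikit-learn's KMeans; the blob data is synthetic, and note that the true labels are never shown to the algorithm:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated synthetic clusters; labels are discarded with `_`
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# KMeans assigns each instance to its nearest of 3 learned centroids
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("cluster sizes:", [list(kmeans.labels_).count(i) for i in range(3)])
print("centroids:\n", kmeans.cluster_centers_)
```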
6. More Classification
Let's shift back to classification, this time with a more complex algorithm. First, watch CMU's Maria-Florina Balcan discuss support vector machines (SVMs) in this lecture video.
Then read Aakash Tandel's Support Vector Machines - A Brief Overview, a high-level treatment of SVMs. Follow this up with Support Vector Machine vs Logistic Regression by Georgios Drakos.
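After those readings, note that an SVM fits into the very same fit/predict pattern as the earlier classifiers; the iris data and the parameter values here are again just illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# SVC with an RBF kernel; C trades margin width against misclassification
svm = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print("SVM test accuracy:", svm.score(X_test, y_test))
```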
7. Ensemble Methods
Last up, we will learn about ensemble methods.
First, watch this video lecture by Peter Bloem of Vrije Universiteit Amsterdam.
Then read this pair of explanatory articles:
Gradient Boosting from scratch, by Prince Grover
Random Forest Simple Explanation, by Will Koehrsen
Finally, follow these tutorials to try your hand at ensemble methods.
Introduction to Python Ensembles, by Sebastian Flennerhag
CatBoost vs. Light GBM vs. XGBoost, by Alvira Swalin
Using XGBoost in Python, by Manish Pathak
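The tutorials above use dedicated boosting libraries (XGBoost, CatBoost, LightGBM), which require separate installs; the two core ensemble ideas they rest on can be sketched with scikit-learn's built-in estimators instead. The breast cancer dataset here is just a convenient stand-in:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Bagging-style ensemble: many decorrelated trees, majority vote
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Boosting-style ensemble: shallow trees fit sequentially to the errors
# of the trees before them
boost = GradientBoostingClassifier(random_state=0)
boost.fit(X_train, y_train)

print("Random forest accuracy:", forest.score(X_test, y_test))
print("Gradient boosting accuracy:", boost.score(X_test, y_test))
```

Both typically beat a single decision tree on the same split, which is the whole point of ensembling.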
Hopefully, you have benefited from these 7 steps to mastering basic machine learning with Python.