Machine Learning: Its Origins and Influence on Other Fields

By Jyoti Nigania | Jul 20, 2018

Machine Learning is changing the way we do things, and it has quickly become mainstream. While many factors have contributed to this rise, one reason is that it is becoming easier for developers to apply it, thanks to open source frameworks.

What are the origins of Machine Learning? How did machine learning develop into such a dominant and powerful tool?

Answered by Yisong Yue on Quora:
This is a tricky question to answer, because machine learning is one of those fields that combines and builds on results from many other fields. Some people would say machine learning started from statistics, and others would say that it started from computational neuroscience. Yet most would agree that these days the largest fraction of machine learning researchers come from computer science.

What can be recognized as modern machine learning really came into its own in the late 1980s and early 1990s. So rather than say that any one field is "the origin" of machine learning, here is a list of some fundamental contributions from various fields that helped form the early intellectual core of modern machine learning.

The fact that machine learning traces bits and pieces of its origins to many different fields is not surprising, given that learning is a fundamental problem that manifests itself in many different fields. In statistics, the problem is manifested primarily as one of estimation and inference (i.e. how to fit complex models to data). In computer science and artificial intelligence, the problem is manifested in how to train more robust versions of the brittle rule-based AI systems that led to the AI winter. In neuroscience, the problem is manifested in how to design operational models of the brain. And so on and so forth.

The three fields that arguably had the largest influence on early machine learning are statistics, neuroscience, and computer science. But many other fields had a significant influence as well.

Statistics & Probability Theory:
  • VC dimension: One of the most iconic theoretical foundations in machine learning is Vapnik-Chervonenkis theory, of which the VC dimension is the most well-known concept. VC theory is a way to quantify sample complexity, and it directly led to the development of the support vector machine (SVM). The modern version of SVMs, proposed by Cortes and Vapnik, was first published in 1995 ("Support-vector networks").
  • Bayesian inference: Bayesian inference is a hallmark of model estimation, and is used in models ranging from Gaussian mixture models to latent Dirichlet allocation. Bayesian inference can get very expensive once the model gets complicated, so care must be taken to ensure it can still be carried out efficiently. The two main approaches are sampling via Markov chain Monte Carlo (MCMC) and variational Bayesian methods. Both approaches can be augmented with computational techniques to make them more efficient.
  • Multi-armed bandit: The MAB problem characterizes a simple class of sequential experimental design problems, where the design of experiments must be coupled with some other goal, such as maximizing cumulative reward.
  • Bias-variance tradeoff: The bias-variance tradeoff is a central concept in the design and analysis of machine learning algorithms, and is often used to reason about issues such as overfitting. Optimizing the bias-variance tradeoff led to very popular ensemble methods such as bagging and random forests.
  • Principal component analysis: Dimensionality reduction can be thought of as possibly the simplest form of statistical learning, and PCA is the canonical dimensionality reduction approach. It is a workhorse in machine learning and data science these days.
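To make the last bullet concrete, here is a minimal PCA sketch in plain NumPy via the SVD of the centered data; the function name and toy data are illustrative, not from the original answer:

```python
import numpy as np

def pca(X, k):
    """Project X (n samples x d features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                        # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # k-dimensional projection

# toy data: 100 points that vary mostly along one 3-D direction, plus tiny noise
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
X = t @ np.array([[3.0, 1.0, 0.5]]) + 0.01 * rng.normal(size=(100, 3))
Z = pca(X, k=1)
print(Z.shape)  # (100, 1)
```

Because the data varies almost entirely along one direction, the single retained component captures nearly all of the variance.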

Computer Science & Artificial Intelligence:
  • Probably approximately correct learning: PAC learnability is an extremely important concept in the analysis of machine learning methods. It provides a framework for reasoning about the trade-offs between accuracy, reliability and sample efficiency of machine learning methods. Leslie Valiant won the 2010 Turing Award in part for his seminal contributions on PAC learning.
  • Online machine learning: Online learning is a setting where the learning algorithm receives data as a stream and must update its model as each example arrives. The concept of comparing batch versus online algorithms has been around for a while in computer science, and that way of thinking influenced early research in online machine learning.
  • Query complexity in active learning: Query complexity is an important concept in theoretical computer science. The basic idea is: given an oracle that represents a concept, how many queries to the oracle does it take to learn that concept from a concept class? The most recognizable everyday instantiation of this is Twenty Questions, where the oracle has some object in mind and answers yes-or-no questions. These concepts have had a strong influence on areas of machine learning such as active learning.
  • Bayesian networks & causal inference: One of the big early thrusts in AI was probabilistic and causal reasoning, of which Bayesian networks are among the most iconic models. Bayesian networks continue to be used today, and further breakthroughs in causality are always something that people hope for. Judea Pearl won the 2011 Turing Award for his seminal contributions to this field.
  • Viterbi algorithm: Probably the most widely used dynamic programming approach in machine learning, the Viterbi algorithm (and the related forward-backward algorithm) is used for inference in hidden Markov models and other bounded-treewidth graphical models.
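The Viterbi recursion in the last bullet is short enough to sketch directly. The toy rain/umbrella HMM below is a standard textbook example, not from the original answer:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden state path for an HMM.
    pi: initial state probs (S,), A: transitions (S, S), B: emissions (S, O)."""
    S, T = len(pi), len(obs)
    logp = np.full((T, S), -np.inf)          # best log-prob ending in each state
    back = np.zeros((T, S), dtype=int)       # backpointers for path recovery
    logp[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        for s in range(S):
            scores = logp[t - 1] + np.log(A[:, s])
            back[t, s] = np.argmax(scores)
            logp[t, s] = scores[back[t, s]] + np.log(B[s, obs[t]])
    path = [int(np.argmax(logp[-1]))]        # trace back the best path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# states 0=rainy, 1=sunny; observations 0=umbrella, 1=no umbrella
pi = np.array([0.5, 0.5])
A = np.array([[0.7, 0.3], [0.3, 0.7]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 0, 1], pi, A, B))  # [0, 0, 1]
```

Working in log space avoids underflow, which matters for long observation sequences.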

Neuroscience:
  • Perceptron: The study of computational models of the brain and neurons led to the invention of the perceptron by Frank Rosenblatt. The perceptron is one of the most important methods in early machine learning, due to its simplicity and relative ease of analysis. When researchers first study how to tackle a new type of machine learning problem, they often try to augment the perceptron method to get a handle on the new problem. For instance, the structured perceptron was the first machine learning method proposed for structured prediction problems.
  • Artificial neural network: In some ways, ANNs are just more extravagant versions of perceptrons. This has a very significant effect, of course, because ANNs can learn multiple layers of representation. I don't think I need to belabor the importance of neural networks in this day and age. Even back in the early days of machine learning, ANNs were used regularly, just not as much as they are today.
  • Neural coding: This is the study of how the brain encodes patterns and activities, and directly motivated methods such as Autoencoders, which are a type of artificial neural network.
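Rosenblatt's perceptron update from the bullets above fits in a few lines; the toy separable dataset below is illustrative:

```python
import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Rosenblatt's perceptron: labels y in {-1, +1}; returns weights w and bias b."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified (or on the boundary): update
                w += yi * xi
                b += yi
                mistakes += 1
        if mistakes == 0:                # a clean pass means the data is separated
            break
    return w, b

# linearly separable toy data, labeled by the sign of x0 + x1 - 1
X = np.array([[0.0, 0.0], [2.0, 2.0], [0.0, 2.0], [2.0, 0.0], [0.0, 0.5], [2.0, 1.5]])
y = np.array([-1.0, 1.0, 1.0, 1.0, -1.0, 1.0])
w, b = perceptron(X, y)
print(np.sign(X @ w + b))  # matches y
```

On linearly separable data the classical mistake bound guarantees the loop terminates with every point correctly classified.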

Functional Analysis:
  • Reproducing kernel Hilbert space: Given that many machine learning problems operate on continuous data and model spaces, functional analysis is quite useful for analyzing many learning problems. One such concept is the RKHS, which is used to derive non-linear Support vector machines and Gaussian processes.
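One way to see how an RKHS kernel turns a linear method nonlinear is kernel ridge regression with a Gaussian (RBF) kernel. This is a sketch; the hyperparameters and toy data are arbitrary choices, not from the original answer:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

# kernel ridge regression: solve (K + lam*I) alpha = y, predict f(x) = sum_i alpha_i k(x, x_i)
X = np.linspace(0.0, 2.0 * np.pi, 30)[:, None]
y = np.sin(X[:, 0])
lam = 1e-3                                   # small ridge term for numerical stability
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
y_hat = K @ alpha                            # predictions at the training points
print(float(np.mean((y_hat - y) ** 2)))      # small: sin is fit almost exactly
```

The model is linear in the RKHS induced by the kernel, yet nonlinear in the original input, which is exactly the trick behind nonlinear SVMs and Gaussian processes.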

Statistical Mechanics & Physics:
  • Markov random field: Originally used in statistical mechanics, MRFs are arguably the most fundamental type of probabilistic graphical model used in machine learning. The Hammersley-Clifford theorem characterizes which probability distributions can be represented as an MRF.
  • Mean field theory: I chose mean field as being representative of other methods for approximating a complicated distribution with a simpler one (cf. Variational Bayesian methods).
  • Gibbs sampling: The other way to approximate a complicated distribution is via sampling, of which Gibbs sampling (a special case of MCMC sampling) is the most popular method in machine learning.
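A minimal Gibbs sampler for a bivariate Gaussian shows the idea: each conditional is a simple 1-D distribution, so the sampler just alternates between them. The target distribution and parameters are illustrative:

```python
import numpy as np

def gibbs_bivariate_gaussian(rho, n_samples=5000, seed=0):
    """Gibbs-sample a zero-mean, unit-variance bivariate Gaussian with correlation rho.
    Each conditional is 1-D Gaussian: x | y ~ N(rho*y, 1-rho^2), and symmetrically."""
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    samples = np.empty((n_samples, 2))
    for i in range(n_samples):
        x = rng.normal(rho * y, np.sqrt(1.0 - rho**2))  # resample x given current y
        y = rng.normal(rho * x, np.sqrt(1.0 - rho**2))  # resample y given new x
        samples[i] = (x, y)
    return samples

s = gibbs_bivariate_gaussian(rho=0.8)
print(np.corrcoef(s[:, 0], s[:, 1])[0, 1])  # approximately 0.8
```

The empirical correlation of the chain's samples recovers the target correlation, which is the whole point: we never sampled the joint directly, only the conditionals.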

Operations Research & Optimization:
  • Markov decision process & hidden Markov model: MDPs are probably the most popular way to model reinforcement learning problems, and the HMM is the gateway model to more general structured prediction.
  • Nonlinear programming: Most machine learning problems reduce to some kind of optimization problem. In fact, I think that these days machine learning is probably the largest (or at least highest-profile) consumer of continuous optimization methods. Many machine learning papers appearing at ICML/NIPS/KDD look suspiciously like optimization papers. Learning the KKT conditions used to be an integral part of a machine learning education (nowadays the growth in data and the shift towards stochastic gradient descent have made things like the KKT conditions less central to machine learning).
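As a sketch of that optimization connection, here is stochastic gradient descent on a least-squares objective; the toy noiseless linear model is illustrative, not from the original answer:

```python
import numpy as np

def sgd_least_squares(X, y, lr=0.01, epochs=100, seed=0):
    """Stochastic gradient descent on the per-example loss 0.5 * (w.x - y)^2."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):        # shuffle each epoch
            grad = (X[i] @ w - y[i]) * X[i]      # gradient of one example's loss
            w -= lr * grad
    return w

# recover the true weights of a noiseless linear model y = 2*x0 - 3*x1
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0])
w = sgd_least_squares(X, y)
print(w)  # close to [2, -3]
```

Each update touches a single example, which is why SGD scales to datasets far too large for the batch methods that KKT-style analysis was built around.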

Behavioral Psychology:
  • Reinforcement learning: In many ways, reinforcement learning is a generalization of standard supervised machine learning to a generic interactive, closed-loop setting. Much of the initial formalization was inspired by behavioral psychology, and it has applications in robotics, planning, and modeling users with internal state. Reinforcement learning also generalizes the multi-armed bandit problem.
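The multi-armed bandit mentioned above admits a compact epsilon-greedy sketch, which already shows the explore/exploit tension at the heart of reinforcement learning. The arm probabilities below are made up for the example:

```python
import numpy as np

def epsilon_greedy_bandit(true_means, steps=10000, eps=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli multi-armed bandit: explore with probability eps,
    otherwise pull the arm with the highest estimated mean reward."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k)
    estimates = np.zeros(k)
    for _ in range(steps):
        if rng.random() < eps:
            arm = int(rng.integers(k))           # explore a random arm
        else:
            arm = int(np.argmax(estimates))      # exploit the current best arm
        reward = float(rng.random() < true_means[arm])   # Bernoulli reward
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
    return estimates, counts

est, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(int(np.argmax(counts)))  # the best arm (index 2) is pulled most often
```

Exploration keeps every arm's estimate honest, while exploitation concentrates pulls on the arm that looks best, the same closed-loop trade-off that full reinforcement learning generalizes to stateful environments.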

Information Theory & Signal Processing:
  • Entropy (information theory): Entropy is an important concept for reasoning about the uncertainty of a distribution. It can be used as a guiding principle for how uncertain a learned model is, how compressible a dataset is, and what hardness results hold for learning problems. More generally on the hardness issue, information-theoretic limits are an important component of learning theory. For instance, information-theoretic lower bounds tell us the minimum amount of data or signal needed to guarantee learnability in different types of learning problems. Entropy and related concepts such as mutual information are also used as decision criteria for active learning.
  • Data compression: Data compression is an important issue in machine learning. In fact, one could say that learning a model is just a fancier form of data compression. Principal component analysis is one example from a statistical and geometric point of view, but there are many other viewpoints as well. For instance, compressed sensing and random projection are other compression methods.
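Shannon entropy itself is a one-liner; as a quick sketch of the intuitions above (a fair coin is maximally uncertain, a certain outcome carries no surprise):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits: H(p) = -sum_i p_i * log2(p_i), with 0*log(0) := 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                    # drop zero-probability outcomes
    return float(-(nz * np.log2(nz)).sum())

print(entropy([0.5, 0.5]))    # 1.0 bit: a fair coin
print(entropy([1.0, 0.0]))    # 0.0 bits: a certain outcome
print(entropy([0.25] * 4))    # 2.0 bits: four equally likely outcomes
```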
In sum, machine learning draws on many fields, but most would agree that these days the largest fraction of machine learning researchers come from computer science.

Source: HOB