Importance of Different Maths in Data Science

By arvind | Sep 27, 2018

Mathematics is the bedrock of any contemporary discipline of science. It is no surprise, then, that almost all the techniques of modern data science (including all of machine learning) have some deep mathematical underpinning.
Sometimes, as a data scientist (or even as a junior analyst on the team), you have to learn the foundational mathematics thoroughly to use or apply the techniques properly; other times you can get by with an API or an out-of-the-box algorithm.
However, a solid understanding of the math behind the cool algorithm you use to create meaningful product recommendations for your users will never hurt you. More often than not, it will give you an edge over your peers and make you more confident. It always pays more to know the machinery under the hood (even at a high level) than to be just the guy behind the wheel who knows nothing about the car.
It goes without saying that you will absolutely need all the other pearls of knowledge (programming ability, some business acumen, and your own analytical and inquisitive mindset about the data) to function as a top data scientist. All I am trying to do here is gather pointers to the most essential math skills to help you in this endeavour.
Why and how is it different
Consider a web developer (or a business analyst). (S)he may deal with a lot of data and information daily, but there may be no emphasis on rigorous modelling of that data. Often there is immense time pressure, and the emphasis is on "use the data for your immediate need and move on" rather than on deep probing and scientific exploration. Whether you like it or not, data science should always be about the science (not the data), and following that thread, certain tools and techniques become indispensable. Most of them are hallmarks of a sound scientific process:

  • Modelling a process (physical or informational) by probing the underlying dynamics,
  • Constructing hypotheses,
  • Rigorously estimating the quality of the data source,
  • Quantifying the uncertainty around the data and predictions,
  • Training one's sense for identifying hidden patterns in a stream of information,
  • Clearly understanding the limitations of a model,
  • (Occasionally) trying to understand a mathematical proof and all the abstract logic behind it

Much of this training, the ability to think not in terms of dry numbers but of abstract mathematical entities (and their properties and interrelationships), is imparted as part of the standard curriculum of a four-year, college-level science degree. One does not need to graduate summa cum laude from a top university to have had access to this kind of mathematics, but unfortunately, that access pretty much languishes at that point in the road and often does not get carried forward into our mental processes :-)
And I am not talking about that differential calculus course back in freshman year. I am thinking of something as simple as the number 

Statistics
What: An absolute must-know to grow as a data scientist. The importance of a solid grasp of the essential concepts of statistics and probability cannot be overstated in any discussion of data science. Many practitioners in the field actually call classical (non-neural-network) machine learning nothing but statistical learning. The subject is vast and endless, so focused planning is critical to cover the most essential concepts.

  • Data summaries and descriptive statistics, central tendency, variance, covariance, correlation,
  • Basic probability: basic idea, expectation, probability calculus, Bayes' theorem, conditional probability,
  • Probability distribution functions: uniform, normal, binomial, chi-square, Student's t-distribution, central limit theorem,
  • Sampling, measurement, error, random number generation,
  • Hypothesis testing, A/B testing, confidence intervals, p-values,
  • ANOVA, t-test,
  • Linear regression, regularization
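Hypothesis testing, A/B testing, p-values and confidence intervals often meet in a single calculation. The sketch below runs a two-sided, two-proportion z-test on hypothetical conversion counts. The function name `two_proportion_z_test` and the numbers are illustrative assumptions, and the normal approximation it relies on is only reasonable when each group has plenty of both conversions and non-conversions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for the difference between two conversion rates.

    Returns the observed difference (B minus A), the z statistic, the
    two-sided p-value and a (1 - alpha) Wald confidence interval.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Pooled proportion under the null hypothesis that both rates are equal.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_null = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_null

    std_normal = NormalDist()  # mean 0, standard deviation 1
    p_value = 2 * (1 - std_normal.cdf(abs(z)))

    # Unpooled standard error for the confidence interval around the difference.
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return diff, z, p_value, ci


if __name__ == "__main__":
    # Hypothetical experiment: 200/5000 conversions for A, 260/5000 for B.
    diff, z, p, (lo, hi) = two_proportion_z_test(200, 5000, 260, 5000)
    print(f"diff={diff:.4f}  z={z:.2f}  p={p:.4f}  95% CI=({lo:.4f}, {hi:.4f})")
```

The pooled standard error is used for the test because it reflects the null hypothesis of equal rates, while the unpooled one is used for the interval around the observed difference; other choices (e.g. a Wilson interval) exist.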

Linear Algebra
What: Friend suggestions on Facebook. Song recommendations on Spotify. Turning your selfie into a Salvador Dali-style portrait using deep transfer learning. What do they have in common? Matrices and matrix algebra underlie all of them. This is an essential branch of mathematics to study for understanding how most machine learning algorithms work on a stream of data to create insight. Here are the essential topics to learn:
  • Basic properties of matrices and vectors: scalar multiplication, linear transformation, transpose, conjugate, rank, determinant,
  • Inner and outer products, matrix multiplication rule and various algorithms, matrix inverse,
  • Special matrices: square matrix, identity matrix, triangular matrix, sparse vs. dense matrices, unit vectors, symmetric matrix, Hermitian, skew-Hermitian and unitary matrices,
  • Matrix factorization concepts/LU decomposition, Gaussian/Gauss-Jordan elimination, solving the linear system of equations Ax = b,
  • Vector space, basis, span, orthogonality, orthonormality, linear least squares,
  • Eigenvalues, eigenvectors, and diagonalization, singular value decomposition (SVD)
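Gaussian elimination and solving Ax = b can be made concrete in a few lines. The sketch below is plain Python (no NumPy, so it stays self-contained) using partial pivoting followed by back substitution; the function name `solve_linear_system` and the 3-by-3 system are illustrative assumptions. In real work a library routine such as NumPy's `numpy.linalg.solve`, which calls LAPACK, would normally be preferred.

```python
def solve_linear_system(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting.

    `a` is a list of n rows (each a list of n numbers) and `b` a list of n numbers.
    The inputs are copied, so the caller's lists are left unchanged.
    Raises ValueError if A is singular or nearly singular.
    """
    n = len(a)
    # Build the augmented matrix [A | b] as floats.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Partial pivoting: bring the row with the largest |entry| into position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]

        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]

    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


if __name__ == "__main__":
    A = [[2, 1, -1],
         [-3, -1, 2],
         [-2, 1, 2]]
    b = [8, -11, -3]
    print(solve_linear_system(A, b))  # approximately [2.0, 3.0, -1.0]
```

Partial pivoting is included because dividing by a tiny pivot amplifies rounding errors; the elimination itself costs O(n^3) operations.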

Calculus
What: The original maverick is back! Whether you loved it or hated it in college, the concepts and applications of calculus pop up in numerous places in data science and machine learning. Calculus lurks behind the simple-looking analytical solution of the ordinary least squares problem in linear regression, and it is embedded in every back-propagation step your neural network takes to learn a new pattern. It is an extremely valuable skill to add to your repertoire. Here are the topics to learn:

  • Functions of single variable, limit, continuity and differentiability,
  • Mean value theorems, indeterminate forms and L'Hospital's rule,
  • Maxima and minima,
  • Product and chain rule,
  • Taylor series, infinite series summation/integration concepts,
  • Fundamental and mean value theorems of integral calculus, evaluation of definite and improper integrals,
  • Beta and Gamma functions,
  • Functions of multiple variables, limit, continuity, partial derivatives,
  • Basics of ordinary and partial differential equations (not too advanced)
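The claim that calculus is embedded in back-propagation comes down to the chain rule applied repeatedly. The sketch below differentiates the squared error of a single sigmoid neuron by hand with the chain rule, then checks the result against a central finite-difference estimate, a common sanity check for hand-written gradients. The one-neuron model, the function names and the parameter values are illustrative assumptions, not a full back-propagation implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron: L = (sigmoid(w*x + b) - y)**2."""
    return (sigmoid(w * x + b) - y) ** 2


def grad_chain_rule(w, b, x, y):
    """Analytical gradient (dL/dw, dL/db) obtained by applying the chain rule."""
    z = w * x + b           # pre-activation
    a = sigmoid(z)          # activation
    dL_da = 2.0 * (a - y)   # derivative of the squared error w.r.t. a
    da_dz = a * (1.0 - a)   # derivative of the sigmoid w.r.t. z
    dL_dz = dL_da * da_dz   # chain rule: dL/dz = dL/da * da/dz
    return dL_dz * x, dL_dz  # dz/dw = x and dz/db = 1


def grad_finite_difference(w, b, x, y, h=1e-6):
    """Central-difference approximation of the same gradient, for checking."""
    dw = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)
    db = (loss(w, b + h, x, y) - loss(w, b - h, x, y)) / (2 * h)
    return dw, db


if __name__ == "__main__":
    args = (0.5, -0.2, 1.5, 1.0)  # hypothetical w, b, x, y
    print("chain rule:       ", grad_chain_rule(*args))
    print("finite difference:", grad_finite_difference(*args))
```

A gradient-descent step would then move w and b a small distance against these gradients; back-propagation in a deep network repeats the same chain-rule bookkeeping layer by layer.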

Discrete Math
What: This is often a less-discussed topic in the scheme of "Math for Data Science", but the fact is that all modern data science is done with the help of computational systems, and discrete math is at the heart of such systems. A refresher in discrete math will imbue the learner with concepts critical to her daily use of algorithms and data structures in analytics projects. Some key topics to learn here:

  • Sets, subsets, power sets
  • Counting functions, combinatorics, countability
  • Basic proof techniques: induction, proof by contradiction
  • Basics of inductive, deductive, and propositional logic
  • Basic data structures: stacks, queues, graphs, arrays, hash tables, trees
  • Graph properties: connected components, degree, maximum flow/minimum cut concepts, graph coloring
  • Recurrence relations and equations
  • Growth of functions and big-O notation (e.g. O(n))
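Two of the graph properties listed above, connected components and vertex degree, take only a few lines once a graph is stored as an adjacency list built from a hash table. The sketch below uses breadth-first search with a queue; the function names and the sample graph are hypothetical, and the graph is assumed to be undirected with every edge listed in both directions.

```python
from collections import deque


def connected_components(adjacency):
    """Return the connected components of an undirected graph.

    `adjacency` maps each vertex to an iterable of its neighbours; every edge
    is assumed to be listed in both directions. Each component is returned as
    a sorted list, and components are ordered by their smallest vertex.
    """
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from an unvisited vertex collects one component.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            v = queue.popleft()
            component.append(v)
            for nbr in adjacency[v]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(sorted(component))
    return sorted(components)


def degrees(adjacency):
    """Degree of each vertex = number of incident edges (no self-loops assumed)."""
    return {v: len(nbrs) for v, nbrs in adjacency.items()}


if __name__ == "__main__":
    # Hypothetical graph: a path 1-2-3, a single edge 4-5, and an isolated vertex 6.
    graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
    print(connected_components(graph))  # [[1, 2, 3], [4, 5], [6]]
    print(degrees(graph))
```

Because each vertex enters the queue once and each edge is examined a constant number of times, the search itself runs in O(V + E) time; the final sorting, added only to make the output deterministic, costs extra.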

Optimization and operations research topics
What: These topics are a little different from the traditional discourse in applied mathematics, as they are mostly relevant to, and most widely used in, specialized fields of study: theoretical computer science, control theory, or operations research. However, a basic understanding of these powerful techniques can be immensely fruitful in the practice of machine learning. Virtually every machine learning algorithm/technique aims to minimize some kind of estimation error subject to various constraints. That, right there, is an optimization problem. Topics to learn:

  • Basics of optimization: how to formulate the problem
  • Maxima, minima, convex functions, global solutions
  • Linear programming, simplex algorithm
  • Integer programming
  • Constraint programming, knapsack problem
  • Randomized optimization techniques: hill climbing, simulated annealing, genetic algorithms
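The knapsack problem is a compact example of maximizing an objective subject to a constraint. The sketch below solves the 0/1 variant with dynamic programming, a different technique from constraint programming but exact when weights are integers; the function name `knapsack_01` and the three-item instance are hypothetical.

```python
def knapsack_01(values, weights, capacity):
    """0/1 knapsack by dynamic programming.

    Maximises the total value of the chosen items subject to their total
    (integer) weight not exceeding `capacity`; each item is taken at most once.
    Returns (best_value, sorted list of chosen item indices).
    """
    n = len(values)
    # best[i][c] = best value achievable using only the first i items with capacity c.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i-1
            if w <= c:
                # option 2: take item i-1 if it fits and improves the value
                best[i][c] = max(best[i][c], best[i - 1][c - w] + v)

    # Walk back through the table to recover which items were taken.
    chosen = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    return best[n][capacity], sorted(chosen)


if __name__ == "__main__":
    values = [60, 100, 120]  # hypothetical item values
    weights = [10, 20, 30]   # hypothetical integer weights
    print(knapsack_01(values, weights, capacity=50))  # (220, [1, 2])
```

The table has (n + 1) x (W + 1) entries for n items and capacity W, so the running time is O(nW). That is pseudo-polynomial: for very large capacities, integer-programming or constraint-programming solvers are the usual alternatives.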

Source: HOB