Google's hiring process for data scientists gives higher priority to candidates with a strong background in statistics and mathematics. And it's not just Google: other top companies, such as Amazon, Airbnb, and Uber, also prefer candidates with strong fundamentals over mere know-how in data science.
If you too aspire to work for such top companies in the future, it is essential for you to develop a mathematical understanding of data science. Data science is simply the evolved version of statistics and mathematics, combined with programming and business logic. I've met many data scientists who struggle to explain predictive models statistically.
Beyond computing accuracy, it is important to understand and interpret every metric and calculation behind that accuracy. Remember, every single 'variable' has a story to tell. So, if nothing else, try to become a great story explorer!
Here, I've compiled a list of must-read books on statistics and mathematics. I understand that mathematics is a boundless subject. Hence, I've listed only those books that will help you connect better with data science.
This is a highly recommended book for practicing data scientists. It focuses on connecting statistical concepts with machine learning, so you'll learn about the popular supervised and unsupervised machine learning algorithms. R users will have an advantage, since the practical aspects of the algorithms are demonstrated in R. In addition to theory, the book also emphasizes applying ML algorithms in real-life settings.
This book is a more advanced companion to the previous one. It is written by Trevor Hastie and Rob Tibshirani, Professors at Stanford University, together with Jerome Friedman. Their other book, 'An Introduction to Statistical Learning', covers the basics of statistics and machine learning. This book introduces you to more advanced methods such as neural networks, bagging and boosting, and kernel methods. The algorithms have been implemented in R.
The author of this book is Allen B. Downey. It is based on performing statistical analysis practically in Python, so make sure you have some basic knowledge of Python before buying it. It focuses entirely on understanding the real-life influence of statistics through popular case studies. Since statistics and mathematics are closely connected, it also has dedicated chapters on topics such as Bayesian estimation.
Did you know about the crucial role of statistics in programming? The author of this book is Norm Matloff, Professor at the University of California, Davis. The book explains how to apply probabilistic concepts and statistical measures in R. It teaches the art of working with probabilistic models and choosing the best one for final evaluation. It is highly recommended, especially as a practice source for R users.
This book is written by Andy Field, Jeremy Miles, and Zoe Field. I would highly recommend it to newcomers to data science. For getting started with statistics, it offers great content that covers its topics in depth. In addition, the statistical concepts are explained in conjunction with R, which makes it even more useful. It offers a step-by-step understanding, supported by interesting practice examples.