Most professionals, I'm sure, are looking ahead to a fresh start and want to improve their data analysis skills. So here is a collection of books that data scientists can use to sharpen their knowledge and skills.
Authors: Trevor Hastie, Robert Tibshirani, and Jerome Friedman
During the past decade, there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. Data analysis, which not long ago was primarily the domain of statistics, has evolved dramatically in the last few decades. This is almost entirely a consequence of the revolution in computing which has occurred over that period. At the start of this revolution, researchers were enabled to perform analyses that they might previously have balked at. But gradually things advanced so that nowadays tools can be applied which would be quite inconceivable without machine assistance. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with liberal use of color graphics.
The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization and spectral clustering. There is also a chapter on methods for wide data, including multiple testing and false discovery rates.
This is a beautiful book, not only in presentation, where it makes excellent use of color, but also in content and style. It would make a first-class text for an advanced undergraduate or an initial graduate course in modern statistical tools. A true forerunner of what is now called Data Science.
Author: Allen B. Downey
If you know how to program, you have the skills to turn data into knowledge using the tools of probability and statistics. This concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python. Think Stats emphasizes simple techniques you can use to explore real data sets and answer interesting questions, and you are encouraged to work on a project with real datasets.
If you have basic skills in Python, you can use them to learn concepts in probability and statistics, and many of the exercises use short programs to run experiments and help you develop understanding. You'll work with a case study throughout the book to help you learn the entire data analysis process from collecting data and generating statistics to identifying patterns and testing hypotheses. Along the way, you'll become familiar with distributions, the rules of probability, visualization, and many other tools and concepts.
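To give a flavor of what "statistical analysis computationally, rather than mathematically" can look like, here is a minimal sketch that tests a hypothesis by simulation instead of by formula. The coin-flip numbers and the function names (`simulate_heads`, `simulated_p_value`) are made up for illustration and are not taken from the book.

```python
import random

def simulate_heads(n_flips, rng):
    """Count heads in n_flips tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))

def simulated_p_value(observed_heads, n_flips, n_trials=2000, seed=0):
    """Estimate a two-sided p-value under the fair-coin hypothesis.

    We count how often a simulated fair coin deviates from the expected
    number of heads at least as much as the observed result does.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    expected = n_flips / 2
    observed_dev = abs(observed_heads - expected)
    extreme = 0
    for _ in range(n_trials):
        dev = abs(simulate_heads(n_flips, rng) - expected)
        if dev >= observed_dev:
            extreme += 1
    return extreme / n_trials

# Hypothetical data: 140 heads in 250 flips. Is the coin plausibly fair?
p = simulated_p_value(observed_heads=140, n_flips=250)
print(f"estimated p-value: {p:.3f}")
```

A small estimated p-value would suggest that a fair coin rarely produces a result this lopsided. The whole argument is a loop and a counter, with no distribution tables involved.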
Author: Sanjoy Mahajan
Traditional mathematics teaching is largely about solving exactly stated problems exactly, yet life often hands us partly defined problems needing only moderately accurate solutions. Street-Fighting Mathematics teaches us how to guess answers without needing a proof or an exact calculation. In Street-Fighting Mathematics, Sanjoy Mahajan describes six tools: dimensional analysis, easy cases, lumping, picture proofs, successive approximation, and reasoning by analogy. Illustrating each tool with numerous examples, he carefully separates the tool (the general principle) from the particular application so that the reader can most easily grasp the tool itself to use on problems of particular interest.
Given the title of this book (and its subtitle, The Art of Educated Guessing and Opportunistic Problem Solving), I was expecting a pop-maths book but instead found it to be a straight-up maths textbook. As a physicist and mathematician, I enjoyed it immensely. Learning to see problems the way Mahajan sees them takes deep thought, time, and practice, but that is what makes Street-Fighting Mathematics an enjoyable read that provides an enlightening look at solving problems. On the other hand, I would hesitate to recommend it to those who might have difficulties with maths. You definitely need a strong understanding of calculus, differential equations, statistics, and basic physics to get the best out of this book.
Author: Roger D. Peng
Data science has taken the world by storm. Every field of study and area of business has been affected as people increasingly realize the value of the incredible quantities of data being generated. This book covers the essential exploratory techniques for summarizing data with R.
Exploratory data analysis is a key part of the data science process because it allows you to sharpen your question and refine your modeling strategies to develop more complex statistical models. This book covers the plotting systems in R as well as some of the basic principles of constructing informative data graphics and some of the common multivariate statistical techniques used to visualize high-dimensional data. Some of the topics covered are making exploratory graphs, principles of analytic graphics, plotting systems and graphics devices in R, clustering methods, and dimension reduction techniques.
Author: Brian Caffo
Statistical inference is the process of drawing conclusions about populations or scientific truths from data. There are many modes of performing inference, including statistical modeling, data-oriented strategies, and explicit use of designs and randomization in analyses. Further complications include alternative broad statistical theories (frequentist, Bayesian, likelihood, design-based) and numerous complexities (missing data, observed and unobserved confounding, biases) for performing inference that can leave you in a debilitating maze of techniques, philosophies, and nuance.
This book, the accompaniment to the online Coursera course on Statistical Inference, presents the fundamentals of inference in a practical approach for getting things done. It is designed to help you understand the broad directions of statistical inference and use this information to make informed choices in analyzing data. Topics covered include probability, random variables, expectations, variability, distributions, limits and confidence intervals, testing, p-values, power, bootstrapping, and permutation tests.
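Bootstrapping, one of the topics listed above, is easy to try out for yourself. The sketch below computes a percentile bootstrap confidence interval for a mean using only the Python standard library. The sample values and the `bootstrap_ci` helper are hypothetical, and the book itself may use a different language and a different variant of the bootstrap.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic.

    Resample the data with replacement, recompute the statistic each
    time, and read the interval off the sorted resampled estimates.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(data)
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical measurements, invented for this illustration.
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.5, 4.9]
low, high = bootstrap_ci(sample)
print(f"approximate 95% interval for the mean: ({low:.2f}, {high:.2f})")
```

The appeal of this approach is that swapping `statistics.mean` for `statistics.median` (or any other function of the sample) requires no new theory, only a different `stat` argument.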
Author: Lee Baker
Correlation Is Not Causation explains how to systematically test for the five most common correlation-causation pitfalls that even the pros fall into (occasionally). We'll learn to create strategies to analyze the data and interpret the results in a way that is easy to understand.
Best of all, there is no technical or statistical jargon; it is written in plain English.
It is packed with visually intuitive examples and makes no assumptions about your previous experience with correlations. In short, it is perfect for beginners.
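To see why correlation alone proves so little, here is a small simulation of one widely cited pitfall, a hidden confounder. Whether it matches one of the book's five pitfalls is an assumption on my part. The variables `x`, `y`, and `z` and the `pearson` helper are invented for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# z is a hidden common cause; x and y never influence each other.
z = [rng.gauss(0, 1) for _ in range(5000)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson(x, y)

# Remove the confounder's contribution (its coefficient is 1 by
# construction in this simulation) and correlate what is left.
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
r_resid = pearson(x_resid, y_resid)

print(f"correlation of x and y: {r_xy:.2f}")
print(f"after removing z:       {r_resid:.2f}")
```

In this setup `x` and `y` are strongly correlated even though neither causes the other, and the correlation should all but vanish once the shared cause `z` is accounted for.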