Foundational R programming concepts such as data types, vectors arithmetic, and indexing

How to perform operations in R including sorting, data wrangling using dplyr, and making plots

Course description

The first in our Professional Certificate Program in Data Science, this course will introduce you to the basics of R programming. You can better retain R when you learn it to solve a specific problem, so you will use a real-world dataset about crime in the United States. You will learn the R skills needed to answer essential questions about differences in crime across the different states.

Rather than covering every R skill, you might need, you will build a strong foundation to prepare you for the more in-depth courses later in the series, where we cover concepts like probability, inference, regression, and machine learning. We help you develop a skill set that includes R programming, data wrangling with dplyr, data visualization with ggplot2, file organization with UNIX/Linux, version control with git and GitHub, and reproducible document preparation with RStudio.

The demand for skilled data science practitioners is rapidly growing, and this series prepares you to tackle real-world data analysis challenges.

The concepts necessary to refine estimates and margins of errors of populations, parameters, estimates and standard errors in order to make predictions about data

How to use models to aggregate data from different sources

The very basics of Bayesian statistics and predictive modeling

Course description

Statistical inference and modeling are indispensable for analyzing data affected by chance, and thus essential for data scientists. In this course, you will learn these key concepts through a motivating case study on election forecasting.

This course will show you how inference and modeling can be applied to develop the statistical approaches that make polls an effective tool and we'll show you how to do this using R. You will learn concepts necessary to define estimates and margins of errors and learn how you can use these to make predictions relatively well and also provide an estimate of the precision of your forecast.

How to perform cross-validation to avoid overtraining

Several popular machine learning algorithms

How to build a recommendation system

What is regularization and why it is useful

Course Description:

In this course, part of our Professional Certificate Program in Data Science, you will learn popular machine learning algorithms, principal component analysis, and regularization by building a movie recommendation system.

You will learn about training data, and how to use a set of data to discover potentially predictive relationships. As you build the movie recommendation system, you will learn how to train algorithms using training data so you can predict the outcome for future datasets. You will also learn about overtraining and techniques to avoid it such as cross-validation. All of these skills are fundamental to machine learning.

How linear regression was originally developed by Galton

What is confounding and how to detect it

How to examine the relationships between variables by implementing linear regression in R

In data science applications, it is very common to be interested in the relationship between two or more variables. The motivating case study we examine in this course relates to the data-driven approach used to construct baseball teams described in Moneyball. We will try to determine which measured outcomes best predict baseball runs by using linear regression.

We will also examine confounding, where extraneous variables affect the relationship between two or more other variables, leading to spurious associations. Linear regression is a powerful technique for removing confounders, but it is not a magical process. It is essential to understand when it is appropriate to use, and this course will teach you when to apply this technique.

Data Science: Visualization

Learn basic data visualization principles and how to apply them using ggplot2.

The weaknesses of several widely-used plots and why you should avoid them

We will also be looking at how mistakes, biases, systematic errors, and other unexpected problems often lead to data that should be handled with care. The fact that it can be difficult or impossible to notice a mistake within a dataset makes data visualization particularly important.

The growing availability of informative datasets and software tools has led to increased reliance on data visualizations across many areas. Data visualization provides a powerful way to communicate data-driven findings, motivate analyses, and detect flaws. This course will give you the skills you need to leverage data to reveal valuable insights and advance your career.