Let's say you have an amazing idea for a machine learning app. It's going to be brilliant. It's going to revolutionize the world of finance, mobile advertising, or... some other world, but it's definitely going to revolutionize something. And gosh darn it, it's going to be the smartest, most learned app the world has ever seen.
The only thing standing between you and glory is the small matter of actually coding your brilliant idea. The first question to ask yourself is which programming language to use for your app, with the two immediate candidates likely being R and Python.
Each of these languages has its pros, cons, and diehard fanbase. This article is meant to help developers choose between these two bitter rivals in the context of machine learning (for a more general, feature-by-feature comparison, you might want to check out this great infographic).
Let's get down to it then!
Round 1: Ease of Development
Python lets you hit the ground running... if you have programming experience.
While both Python and R are completely manageable and used by many developers in both business and academia, Python comes more easily to developers who have experience with other programming languages. Its syntax is more familiar than R's, and also closer to regular English text - making it easier to read and debug.
R is very popular with advanced business users - e.g. data analysts in fields such as retail, marketing or finance - who come from more of a statistics background, rather than programming or software development. Since you're developing a machine learning app, we're guessing you're closer to the latter group - in which case you might appreciate Python's flexibility, readability and similarity to the type of programming you already know and love.
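To make the readability point concrete, here is a minimal sketch of a toy nearest-neighbor classifier in plain Python. The data, the labels and the function names (`squared_distance`, `predict`) are made up for illustration; a real app would lean on a library rather than hand-rolled code like this.

```python
# Hypothetical toy data: (hours_studied, hours_slept) -> exam result.
training_data = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(point):
    """Return the label of the training example closest to `point`."""
    _, nearest_label = min(
        training_data,
        key=lambda example: squared_distance(example[0], point),
    )
    return nearest_label


print(predict((7.0, 6.5)))
print(predict((1.5, 4.5)))
```

Even without knowing Python, a developer coming from another language can probably follow what `predict` does: find the closest known example and reuse its label.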
Round 2: Robustness and Production Readiness
Python fits more naturally into a complex coding environment.
While applications of R in the business world are definitely on a growth trajectory, Python is still a more full-fledged programming language and is used for many types of web and other applications, in addition to its data science applications. R, on the other hand, is still mostly used for data analysis and advanced statistical modeling.
Hence, assuming you would want to integrate your machine learning algorithms into some kind of interface that's communicating with other code, written by other programmers, Python might be the better choice. R can be used for rapid prototyping or to solve a specific problem, but Python will be easier to maintain and scale in the long run (especially considering its versioning and documentation are far more consistent).
Round 3: External Libraries
While both languages have a breadth of external libraries that can be (relatively) easily used in a machine learning project, Python's are a bit more mature. Specifically, scikit-learn is an extremely popular, open-source machine learning package that is used in many commercial applications.
Meanwhile, R libraries such as caret are catching up, but are not quite there yet when it comes to breadth of functionality. With R you might be able to more quickly build and launch your first model - but mastering scikit-learn and similar libraries will provide you with a deeper and more complete toolset that you can feel safe using in your machine learning app.
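For a feel of what working with scikit-learn looks like, here is a minimal sketch that trains and evaluates a classifier on the small iris dataset bundled with the library. The choice of model (a random forest), the 25% test split and the fixed random seed are assumptions made for illustration, not recommendations for your app.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the bundled iris dataset as a feature matrix X and label vector y.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the model on the training rows only.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Predict labels for the held-out rows and measure accuracy.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators share the same `fit` and `predict` methods, so swapping in a different model usually means changing only the line that constructs it.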
Round 4: Performance with Big Data
R can provide better performance when performing large computations.
Machine learning will often involve working with massive datasets and highly complex computations to train and test your algorithms - so you'll want to make sure the programming language you use will perform well in these kinds of scenarios.
While both R and Python can integrate with Hadoop for big data, newer R packages utilize C to provide better performance for large-scale computation. Hence, you might get faster results when using R in these situations.
Round 5: Statistics and Data Visualization
While this would not be the core of your machine learning software, your app might very well include some elements of statistics, analytics and data visualization.
Here, R is the hands-down winner as a tool that's built from the ground up to provide a robust platform for advanced statistical analysis. Integrating ggplot2 will enable you to create some really nifty visualizations as well and, paired with companion packages, even interactive, browser-based graphs and charts.
While Python can be and is used for statistical analysis and data visualization, R will probably be the better choice for this type of functionality - especially when it comes to 'one-off' operations, prototyping and testing various hypotheses (versus creating reusable and extensible features).
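For comparison, here is a minimal sketch of basic descriptive statistics on the Python side, using only the standard library's `statistics` module. The `daily_returns` figures are made up for illustration; anything beyond summaries like these would typically call for a dedicated library.

```python
import statistics

# Hypothetical daily returns for one week of trading.
daily_returns = [0.012, -0.004, 0.007, 0.015, -0.009, 0.003, 0.010]

mean_return = statistics.mean(daily_returns)
median_return = statistics.median(daily_returns)
# Sample standard deviation, a simple proxy for volatility.
volatility = statistics.stdev(daily_returns)

print(f"mean={mean_return:.4f} median={median_return:.4f} stdev={volatility:.4f}")
```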
And the overall winner is...
Python. With the necessary caveats that every application, use case and business scenario is different, Python is the more mature, fully-fledged and flexible option for machine learning - and for creating complex coding projects in general. However, with R's rapid development and growing popularity, we won't be surprised if it catches up within a few years.
P.S.: if you're developing from scratch, it's probably neither
Our discussion above assumes you want to use an external library and build your machine learning app around it. Unless you've got a team of programming superstars, this is probably the direction you'd go.
However, if you want to start from scratch and rewrite the libraries themselves - either as a research project or because you have a truly brilliant idea for optimizing some of the under-the-hood processes - then you would probably use a compiled language (rather than an interpreted one), such as C or Java. In fact, the performance-critical parts of many of the external libraries you'll be using are written in such languages.