Why is Python more popular than R as a tool for data analysis? Most data science jobs ask for Python experience.
Because I've received messages about this, let me be forthright. I've hired DS/DAs, been one myself, and seen a lot of good and bad from both "types". The absolute worst are the people who approach things from a tools perspective first and/or assume that there is some optimal toolset. I can firmly tell you that a good recruiter or analytics team will almost always prefer "Occam's Razor"-type candidates, as they are likely to place a much lower draw on resources, spending, and system complexity. With that said, I hope my answer offers some insight from this biased POV.
This is partially because a lot of people do not differentiate between "Data Analyst" and "Data Scientist", AND because the latter is a sexy, undefined title that pays well and that everyone wants. To me, a Data Scientist is not necessarily an analyst. Many great DS I've met are basically developers who are really, really skilled at storing, processing, and manipulating massive amounts of data. The science part comes from a background that combines these skills with various economic, mathematical, and statistical analysis techniques that would otherwise be impossible, slow, or difficult to apply without a host of techniques to get data from unusable to usable. Python is a language that many have used for such operations, and it can be very, very fast in use cases that rely on memory-heavy operations, fast looping, or code that will eventually become production code. Please re-read this last point until you understand why it matters for various analytic tasks.
A Data Analyst is often less concerned with how this backend works and more with ways to provide fast, actionable analysis of a dataset. Personally, I deal with "Big Data", but I don't need to do much to my datasets to get them ready for processing, nor does my code feed into a production environment in any way. People call me a "Data Scientist" in the sense that I take largely unstructured swaths of data, produce smaller datasets from them, and then conduct various analytic tasks to help make sense of the data. I develop specific reports, and quite honestly most of my ML implementations run "fast enough" in R that I don't need to use much Python these days. I'm also not being paid to develop in-house solutions that rely on resource-heavy operations. I've used Python extensively in the past, but for what I currently do it's really unnecessary.
Unfortunately, everyone wants the "Data Scientist" title, so people tend to do Google searches for what the title entails and approach it from a tools perspective first. The problem with this is that recruiters and managers who know nothing about data analysis tend to think, "Lots of Data Scientists use Python, so we should require that." In reality, I know some phenomenal DS who use Postgres, Excel, and sometimes a bit of R (but rarely). On the other hand, I know guys who can develop a parallelized network solution for running advanced models in a flash, but can't actually tell you what about an analytic product is useful to the end constituents. They can run models, cite various correlation measures, etc., but can't tell anyone what to do with that information.
My general advice is to figure out what you are good at, what a job calls for, and what you want to do. There is nothing worse than being the guy who used an extensive technology stack only to be outshone by someone who might, on a good day, be an Excel superuser. The same goes for the reverse. Don't go by what the internet tells you; go by what you need to solve the problems you are hired to solve.
Python and R are the two programming languages most widely used by data analysts and data scientists. Both are free and open source, and both first appeared in the early 1990s. However, the following reasons give Python an edge over R and make it even more popular in the community.
Purpose and utility: R was developed to provide a user-friendly way to perform statistical analysis and graphical modelling, so it is better suited to statistical data analysis. Python, by contrast, was developed as a general-purpose programming language with an emphasis on code readability.
Libraries: The Comprehensive R Archive Network (CRAN) offers 10,033 packages for statistical analysis of data. The Python Package Index (PyPI) offers 102,199 packages, although these cover every kind of programming task, not only data analysis. On sheer numbers Python wins, even if the comparison is not strictly like for like.
Processing: R is generally considered slower than Python for data processing, and poorly written R code can take hours to run, since the language was originally designed for statistical analysis of small datasets.
Web integration: Because Python evolved alongside Unix web servers, it integrates easily with them. Django, one of the most common web application frameworks, is written in Python. Other Python web tools include Gunicorn, Python Paste, and Tornado.
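To make "easy integration" a little more concrete, here is a minimal sketch of a WSGI application. WSGI is the standard Python interface between web servers and applications; Gunicorn hosts WSGI apps, and Django projects expose one. The example serves a small summary statistic as JSON using only the standard library. The `app` and `serve` functions and the `SALES` numbers are hypothetical, made up for illustration only.

```python
import json
import statistics
from wsgiref.simple_server import make_server

# Hypothetical sample data standing in for the result of some analysis.
SALES = [120.0, 95.5, 143.25, 110.0]


def app(environ, start_response):
    """Minimal WSGI callable; any WSGI server (wsgiref, Gunicorn, ...) can host it."""
    summary = {
        "n": len(SALES),
        "mean": statistics.mean(SALES),
        "median": statistics.median(SALES),
    }
    body = json.dumps(summary).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(host="127.0.0.1", port=8000):
    """Run the app with the standard-library reference server (blocks until interrupted)."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` hosts the app locally for testing. Assuming the file were saved as `report_app.py` (a hypothetical name), running `gunicorn report_app:app` would serve the same callable with Gunicorn instead, with no code changes.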
IPython Notebook: The IPython Notebook (now Jupyter Notebook) makes it easier to work with Python and data. Notebooks can be shared easily with colleagues, without installation requirements. This drastically reduces the overhead of organizing separate code, output, and notes files.