I believe the job of data scientist as we know it today will be barely recognizable in five to 10 years. Instead, end users in all manner of economic sectors will work with data science software the way non-technical people work with Excel today. In fact, those data science tools might be just another tab in Excel 2029.
Financial analysts today rarely need to recruit data scientists to help them because the platforms they use already provide the data science tools they need. This will become common across many other fields, as a basic understanding of data science will become a required skill for many jobs. Meanwhile, much of today's data science work is being automated, and some observers warn that data scientists might be automating themselves out of a job.
Data Science's Soaring Popularity
Data science careers are experiencing a gold rush moment. A 2018 Bloomberg article hailed data science as "America's Hottest Job," citing a 75% increase in data scientist job postings on recruiting website Indeed.com from January 2015 to January 2018. Data science doctorates at some consulting firms are drawing salaries of $300,000, the article noted.
However, the people rushing into the field are entering a profession that may be unrecognizable a decade from now. While their data science skills will be a strong career asset, a surprisingly small proportion of them will likely be working as straight data scientists.
From Machine Code To Mass Coding To Data Automation
When I studied computer science back in the way-back-when, compiler design was a required course. We needed to know how to translate programs written in languages like C into machine language, the binary instructions (usually written out in hexadecimal) that computers execute directly. It was common to write pieces of commercial applications in machine language for faster performance.
Over the past few decades, successive layers of software functions have been abstracted into higher-level development tools. Most coding today is done in high-level, easy-to-learn languages like Python, and relatively few programmers need to know how to speak directly to the hardware.
Data science is quickly following the same progression. Over the next three to five years, higher-level tools will increasingly reduce the need for expertise in foundational technologies like high-performance computing (dividing a problem across CPUs), data munging (preparing raw data for analysis), the internals of machine-learning systems and low-level statistical methodologies. All this will be handled under the hood.
Today, dozens of companies -- including Trifacta, Element Analytics, and Kylo -- are introducing new data analytics tools, with many of them aimed at reducing tedious data preparation work and allowing data scientists to quickly get to the analytical work. Also emerging are data science frameworks that automate algorithm selection and parameter tuning (e.g., Auto-sklearn, DataRobot). These frameworks and tools are combined with platforms for data management to create large building blocks for the data consumer of the future.
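To make "automating algorithm selection and parameter tuning" concrete, here is a minimal sketch of the idea in plain Python. It is not how Auto-sklearn or DataRobot work internally; it only illustrates the basic loop such tools build on: try several candidate algorithms and settings, score each with cross-validation, and keep the winner. The toy data, the two candidate algorithms and every function name are made up for illustration.

```python
from collections import Counter


def fit_majority(train, _param):
    """Baseline 'algorithm': always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_knn(train, k):
    """k-nearest-neighbours classifier on a single numeric feature."""
    def predict(x):
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict


def cross_val_accuracy(fit, param, data, folds=4):
    """Mean accuracy over `folds` interleaved train/test splits."""
    scores = []
    for i in range(folds):
        test = data[i::folds]
        train = [row for j, row in enumerate(data) if j % folds != i]
        model = fit(train, param)
        correct = sum(1 for x, y in test if model(x) == y)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def auto_select(data, candidates):
    """Score every (name, fit, param) candidate and return the best one."""
    scored = [
        (cross_val_accuracy(fit, param, data), name, param)
        for name, fit, param in candidates
    ]
    return max(scored, key=lambda item: item[0])


# Toy data (hypothetical): the label is 1 when the feature exceeds 5.
data = [(float(x), int(x > 5)) for x in range(12)]

# The search space: one baseline plus k-NN with three settings of k.
candidates = [("majority", fit_majority, None)]
candidates += [("knn", fit_knn, k) for k in (1, 3, 5)]

best_score, best_name, best_param = auto_select(data, candidates)
print(f"best: {best_name} (param={best_param}), accuracy={best_score:.2f}")
```

The end user of such a tool sees only the last few lines: hand over the data, get back a model. The search itself stays under the hood.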
The Path Forward For Data Scientists
Over the coming years, I foresee data scientists dividing into at least five types of workers:
1. Generalists: The first group will be data science generalists, who will interpret data and make it usable. These generalists will focus on educating end users, helping users ask questions of the data rather than finding all the answers themselves. This will likely be a transitional role, more common in five years than in ten.
2. Industry specialists: The second and largest group will comprise industry specialists, who will apply data science techniques and tools in specific verticals like manufacturing, medical sciences, and finance. This is where I believe the bulk of the jobs will be. However, these won't be considered data science jobs. This worker won't be a data scientist who understands manufacturing but rather a manufacturing leader who understands data science. Today's equivalent is the researcher who is a statistics ace.
3. Deep specialists: The third and smallest group will be deep specialists in specific data science technologies. This is where the remaining pure data science jobs will be. Their role will be pursuing data science in the abstract, improving the performance of algorithms and designing new generalized approaches. They will be like today's computer scientists, building theoretical foundations rather than solving everyday problems.
4. Analytics developers: The fourth group will consist of data scientists who transition into analytics developers. These are software development specialists who handle data interaction and help people make inferences from data reports. Algorithm design will be a small part of their job, assisted by data platforms as well as by robust code libraries that do a lot of the work in turn-key fashion.
5. Data engineers: Finally, new jobs like the data engineer will emerge. Data engineers build pipelines that transform and deliver data into foundational platforms, where the analytics and visualization take place. While data scientists are usually recognized for their brilliant algorithms, up to 80% of a data scientist's time could be spent collecting, cleaning and organizing data.
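For a sense of what that cleaning and organizing work looks like, here is a small sketch using only the Python standard library. The sensor records, column names and cleaning rules are hypothetical; a real pipeline would be larger and would apply rules specific to its own data.

```python
import csv
import io

# Hypothetical raw export: stray spaces, inconsistent case,
# missing or non-numeric readings, and a duplicate row.
RAW = """machine_id,temperature_c,status
 M-01 ,71.5,OK
m-02,,ok
M-03,not available,FAULT
M-01,71.5,ok
"""


def clean_rows(text):
    """Parse CSV text, normalise fields, and drop unusable or duplicate rows."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        machine = row["machine_id"].strip().upper()
        status = row["status"].strip().lower()
        try:
            temperature = float(row["temperature_c"])
        except ValueError:
            continue  # skip rows with a missing or non-numeric reading
        record = (machine, temperature, status)
        if record in seen:
            continue  # skip exact duplicates after normalisation
        seen.add(record)
        cleaned.append(
            {"machine_id": machine, "temperature_c": temperature, "status": status}
        )
    return cleaned


rows = clean_rows(RAW)
print(rows)
```

Of the four raw rows, only one survives. That ratio is exaggerated here, but it hints at why preparation eats so much of the schedule.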
Within 10 years, data science will be so enmeshed within industry-specific applications and broad productivity tools that we may no longer think of it as a hot career. Just as generations of math and statistics students have gone on to fill all manner of roles in business and academia without thinking of themselves as mathematicians or statisticians, the newly minted data science grads will be tomorrow's manufacturing engineers, marketing leaders, and medical researchers.