The big data community has been addressing the data scientist shortage ever since big data became a thing. Now, we're learning that there is probably an even larger shortage of another type of data professional.
The data engineer is a relatively new position that is a hybrid of sorts between a data analyst and a data scientist. Whereas data scientists are at home creating and tuning sophisticated machine learning models and other kinds of analysis, data engineers excel at manipulating large amounts of data and ensuring the entire big data software stack can scale to support large workloads.
"A data engineer is the all-purpose everyman of a big data analytics operation, working between downstream analysts on the one hand, and upstream data scientists on the other. They will often come from programming backgrounds and are experts in big data frameworks, such as Hadoop. They're called on to ensure that data pipelines are scalable, repeatable, and secure, and can serve multiple constituents in the enterprise."
Trifacta co-founder and CTO Sean Kandel noted an uptick in demand for data engineers two years ago. "I'm definitely seeing that title pop up more, and seeing more postings for it," he told us back then.
Now, that little surge in demand seems to be blossoming into a full-blown data engineering shortage. According to a new report released by Stitch and Galvanize, an analysis of LinkedIn profiles found only 6,500 self-reported data engineers across the whole country, but more than 6,600 job openings for data engineers in the San Francisco Bay Area alone.
According to the study, the number of data engineers has doubled from 2013 to 2015, with the biggest concentration of data engineers in the information technology and services industries. The top five skills needed by data engineers are SQL, Java, Python, Hadoop, and Linux.
Stitch's analysis (conducted in SQL, Python, and Jupyter, naturally) found that while there is currently twice the number of data scientists as data engineers, the number of data engineers is growing much faster than any other position.
That gap spells trouble for digital companies hoping to hire big data engineering talent, including high-flying Silicon Valley firms like Uber, Airbnb, and Spotify. But it smells like an opportunity for folks like Galvanize CEO Jim Deters, who says his company is working to satisfy both sides of the house through boot camp-style programs for both data scientists and data engineers.
"There's huge demand on both sides," Deters tells Datanami during a recent interview. "For those who are just starting to dabble their toes into understanding they are a software company and a data company, they might not be as sophisticated with where they're going. They're graduating from different business intelligence tools and using R.
"But those who have gotten further along the maturity curve," he continues, "that's where we see the highest demand for data engineers. That's where we've retooled. We've taken our program and instead of making it consumer facing, we've addressed the enterprise and helping existing data analysts and data scientists become data engineers."
Data engineers should be skilled at moving and transforming vast amounts of data, which means they need to be familiar with ETL tools and techniques. In the Hadoop world, data transformation often happens when the data is read, not when it's initially loaded, and data engineers need to be on top of that.
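To make the read-time idea concrete, here is a minimal Python sketch of schema-on-read: raw records are stored untouched as JSON lines, and their structure is only discovered when they are read. The sample records, the field names, and the helper names `flatten` and `discover_schema` are all hypothetical, chosen for illustration; nothing here comes from the Stitch report or from any Hadoop API.

```python
import json
from collections import Counter, defaultdict

# Hypothetical raw log lines, stored exactly as they arrived (no schema enforced on write).
RAW_LOG_LINES = [
    '{"ts": "2016-09-01T12:00:00Z", "ad_id": 17, "user": {"id": "u1", "geo": "US"}}',
    '{"ts": "2016-09-01T12:00:01Z", "ad_id": "17", "clicked": true}',
    '{"ts": "2016-09-01T12:00:02Z", "ad_id": 42, "user": {"id": "u2"}}',
    'not valid json',
]


def flatten(record, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf value in a nested dict."""
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, prefix=path + ".")
        else:
            yield path, value


def discover_schema(lines):
    """At read time, count which fields appear and which Python types they carry."""
    field_types = defaultdict(Counter)
    bad_lines = 0
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad_lines += 1  # keep going; raw data is rarely clean
            continue
        if not isinstance(record, dict):
            bad_lines += 1
            continue
        for path, value in flatten(record):
            field_types[path][type(value).__name__] += 1
    return dict(field_types), bad_lines


schema, bad = discover_schema(RAW_LOG_LINES)
for path in sorted(schema):
    print(path, dict(schema[path]))
print("unparseable lines:", bad)
```

A field that shows up under more than one type, like `ad_id` in the sample, is the kind of surprise that a load-time schema would have rejected up front and that a read-time approach leaves for the engineer to find.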
Being able to find relevant information in the modern schema-on-read environment is a key skill that data engineers must possess, according to Will Smith, a principal data engineer and architect at MIT, who participated in the Stitch study.
"Imagine you have terabytes of log data from ad impressions in JSON," Smith writes in the report. "The data engineer has no idea what they will find in that data. The skillset now requires the developer to do data discovery and develop code, rather than just using straight SQL. This is a very different skill set than is needed in the schema-on-write environment."
Most data engineers live and work in the U.S., which isn't surprising. The San Francisco Bay Area is a hotbed of data engineering talent, thanks to the proximity to UC Berkeley and Silicon Valley Web giants like Google, Facebook, and LinkedIn, which train and employ people to do data engineering work.
What is interesting is that the majority of data engineers do not define themselves as "data engineers" at all. Instead, software engineer is the most common title for people doing data engineering work, followed by analyst, consultant, business analyst, data architect, and data analyst. Some data engineers at Silicon Valley companies earn up to $500,000 per year, though those kinds of salaries are rare, according to Stitch, which found most job openings for data engineers pay in the low six figures.
As the title suggests, engineering is at the heart of a data engineer's job. "A data engineer has to be much more sophisticated in actual software engineering at the end of the day," Galvanize's Deters says. "A data engineer is massively deep in the plumbing and understanding ETL tools, understanding Python and Python libraries, and getting more deeply into the software side and the software stack involved in managing and manipulating large amounts of data."
As the rate of change of big data technology continues to accelerate, so too do the skills required to make use of it. It's not easy to keep up. "That's a big trend that's going on," Deters says. "These technologies continue to evolve and get more complex and granular. As companies adopt more of these technologies, to actually get more value out of them, they have to continue to invest and build the skill sets."