What if companies managed their data like they manage their money? By definition, businesses manage money as a strategic asset, and the tools available to CFOs are well-defined. Ask any CFO how much money the company has, where it's coming from and going to, and they'll tell you right away -- down to the penny.
But CIOs and CDOs aren't so fortunate. The challenge goes deeper than simply putting the right systems in place. I spoke with Andy Palmer, Co-founder and CEO of data unification provider Tamr, to discuss the problem and examine how companies are working their way through it.
What do you believe is the crux of the issue?
Data is an asset - not exhaust. The core issue facing enterprise data leaders today -- Chief Information, Data and Analytics Officers -- is that their desire to monetize the enormous quantities of company data is colliding head-on with decades of treating data as operational exhaust, rather than managing it as a source of value-creating fuel. As a result, demand for clean, usable data far outstrips supply. To fuel that demand, organizations are learning to manage data -- like they do money -- as an asset. Forward-thinking organizations want to deliver transformative insight before they're asked and they need a new framework that allows them to do so. The suite of new technologies and processes that allow enterprises to deliver trusted analytics in a usable way is emerging as a discipline that I call "DataOps." DataOps is the framework by which organizations can reconcile the disconnected nature of enterprise data with the demands of their internal users -- surfacing clean, reliable and complete data and delivering it to the right people at the right time. I truly believe that the most successful companies over the next decade will be the ones that build strong DataOps capabilities.
Why have most enterprises essentially treated their data as a form of operational exhaust? What are the consequences of those choices?
Enterprise systems (and the data they create) are radically heterogeneous. It's completely understandable why data has been treated this way for the last three decades -- I was probably guilty of it too in my past roles running large IT organizations. In the past, for business units to be flexible and meet their goals, they needed, or at least felt they needed, a large degree of control over their systems and data. With business units driving things, these systems proliferated organically. Mergers and acquisitions, as well as huge investments in business process automation systems, have accelerated systems creep. My co-founder Mike Stonebraker works all the time with companies that have thousands of data systems. In our race to automate the back and front office of the enterprise, we created many idiosyncratic systems -- and, consequently, idiosyncratic data. For the most part, data was a by-product of doing business, and attempts to impose standards in a top-down, deterministic way rarely worked.
The myth of the single enterprise platform. Vendors have exacerbated the problem by selling a lot of systems that customers didn't need or even use -- systems that locked them into that vendor's suite -- and organizations simply started buying multiple instances for different business units. We know that one-platform-fits-all doesn't work for companies, and now these enterprises have the opportunity to engineer flexible, agile systems using best-of-breed software. Because those older platforms are both ingrained and not particularly good at working with other systems for analytics, DataOps technologies will need to work across these legacy platforms to help find the exhaust, clean it, and deliver it to users.
Do all big enterprises behave this way, or have some types of companies taken a different approach? What have been the implications of that different approach?
Don't put the AI cart before the data horse. The largest tech companies -- Google, Facebook, Tesla -- have a huge advantage in this regard. They were able to engineer their systems from scratch and aren't burdened with legacy IT investments. They also recognized from the start that they were either wholly or heavily in the business of data, and have been treating it like an asset from day one. Their data is relatively pristine; it's well-organized, well understood and treated as fuel for creating products and delighting customers. Well-organized data is absolutely foundational to gaining benefit from the latest and greatest tech. I talk to folks in the market all the time who are psyched about the buzziest new technologies, like the deep learning algos being pioneered by those newer companies. But a forty- or fifty-year-old Fortune 100 company isn't going to get transformative results out of the gate with some cutting-edge new algorithm. They have to first get their house in order at the data level -- know where their data is, what it represents, and organize it around their core entities like customers, suppliers and products. They need to create easier ways for their employees to access, use, curate and publish data. 80 to 90% of the benefits big companies will reap will come from relatively unsexy tasks like integrating CRM data to provide a 360-degree view of a customer. A really interesting question I got asked recently was, "What if companies managed their data like they manage their money?" This really got me thinking about the huge mindset changes that organizations are going to have to go through in the near future.
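To make the "unsexy" work concrete, here is a toy sketch of what integrating CRM data into a 360-degree customer view can look like at the smallest possible scale. The data, field names and `unify` helper are all illustrative assumptions, not any vendor's actual implementation; real systems face far messier keys than a normalized email address.

```python
# Toy sketch (hypothetical data): merging customer records from two CRM
# extracts into a single "360-degree" view, keyed on a normalized email.

def normalize_email(email):
    # Real-world matching needs far more than this, but normalization
    # is the first step in reconciling records across systems.
    return email.strip().lower()

crm_sales = [
    {"email": "Ann@Example.com", "name": "Ann Lee", "region": "EMEA"},
]
crm_support = [
    {"email": "ann@example.com ", "open_tickets": 2},
]

def unify(*sources):
    customers = {}
    for source in sources:
        for record in source:
            key = normalize_email(record["email"])
            merged = customers.setdefault(key, {"email": key})
            for field, value in record.items():
                if field != "email":
                    # First source wins on conflicts; real pipelines
                    # need explicit survivorship rules here.
                    merged.setdefault(field, value)
    return customers

view = unify(crm_sales, crm_support)
print(view["ann@example.com"])
# prints: {'email': 'ann@example.com', 'name': 'Ann Lee', 'region': 'EMEA', 'open_tickets': 2}
```

Even this trivial example surfaces the hard questions -- which key identifies a customer, and which system wins when fields conflict -- that make enterprise-scale unification a discipline rather than a one-off script.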
For companies that want to start treating their data like money, what are the alternatives for them to consider?
The solution is a mix of people, technology and process - there is no magic black box. It's important to realize that managing money in a business context means recognizing that money is a strategic asset which can be used to drive growth. The reason executives care about knowing about their money (aside from investor reporting) -- where it is, how much there is, and where it came from -- is so that they can strategically use that asset. Data ought to be understood the same way. An effective CDO should know where the data is, how much of it there is, and where it's coming from, and execs need to care so they can deploy that data in their own missions. Execs also need to think about the link between building great products and upselling customers and the supply of clean, trusted data. Thinking about data this way also makes data management and procurement a value-generating activity -- not a chore. Forward-thinking companies -- we do a lot of work with GE, for example -- have made the mental switch. It can be tough! CFOs have mature frameworks, tools and levers for managing their money, well established by a hundred years of standardized accounting practices; chief data officers don't have these yet.
How can Machine Learning and AI help solve this problem? Is the hype around machine learning and AI an asset or a liability?
Algorithms are relatively cheap - Data can be the primary differentiator. Tamr's Head of Data Science, Eliot Knudsen, is always quick to point out that machine learning and artificial intelligence are extremely useful and extremely limited. Any technology which can automate -- at scale -- tasks that used to require humans is going to be helpful. But organizations should look to deploy these technologies where they are best suited to the task at hand, not just for the sake of it. At Tamr, we use machine learning very strategically to build data integration models based on human feedback. It's a no-brainer use case for machine learning and there are lots of those types of use cases out there. I think if organizations are focused on the goal -- highly repeatable, rapid, trusted analytics distributed throughout an enterprise -- they'll lead themselves to the right technologies and tools to accomplish the mission. But the focus ought to be the mission, not the toolset.
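The idea of building data integration models from human feedback can be sketched very simply. In this illustrative example (not Tamr's actual method -- the data, names and the threshold-fitting approach are all assumptions for the sake of the sketch), humans label a few candidate record pairs as match or non-match, and a trivial "model" -- a similarity cutoff chosen from that feedback -- generalizes to unlabeled pairs:

```python
# Hedged sketch of ML-driven record matching: humans label candidate
# pairs, and the system learns a decision rule from those labels.

def jaccard(a, b):
    # Token-set similarity between two company-name strings.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

# Human feedback: (record_a, record_b, is_match) -- illustrative data.
labeled = [
    ("Acme Corp", "ACME Corporation", True),
    ("Acme Corp", "Apex Industries", False),
    ("Global Widgets Inc", "Global Widgets", True),
    ("Global Widgets Inc", "General Widgets", False),
]

def fit_threshold(pairs):
    # Pick the cutoff that classifies the labeled pairs most accurately;
    # a real system would learn over many features, not one score.
    candidates = sorted({jaccard(a, b) for a, b, _ in pairs})
    return max(candidates, key=lambda t: sum(
        (jaccard(a, b) >= t) == y for a, b, y in pairs))

threshold = fit_threshold(labeled)
print(jaccard("Global Widgets Inc", "Global Widgets Co") >= threshold)
# prints: True
```

The virtuous cycle described above lives in the `labeled` list: every correction a user supplies becomes training signal, so the matching improves as more people engage with the data.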
Is clean and unified data alone the answer? What role does organizational culture and human behavior play?
Clean, unified data is just the table stakes. Just as analytics have been democratized in the enterprise, unification of data will also be democratized. Clean and unified data, in my opinion, is almost the easy part. The same forces that caused system proliferation also cause territorial behavior within an enterprise, and breaking down those barriers can be the hardest part of a project. Users are going to get their hands on the data they want one way or another -- so the earlier organizations implement security and access policies in a way that recognizes that data will flow swiftly around the enterprise, the better. However, providing really good data, even for a small use case, can really open eyes. When users are involved in the process of policing data quality and see improvements that help them in their work, they champ at the bit to provide more feedback in order to get better data. It's a virtuous cycle that the best organizations lean into.
GE's digital transformation is being watched closely. Describe how they are paying down their 'data debt' and working through the problem.
It all starts with leadership - and Jeff Immelt set the stage for the success GE is going to have over the next 10 years with digital. GE's first step was to acknowledge the scale and extent of the legacy problem; they had too many systems to get their arms around using traditional approaches. The size of their data forced them to think about solutions that wouldn't require painful, years-long systems integrations. They were smart enough to recognize that an agile approach would serve them best. From an early stage they recognized that their next transformative wave of products was going to be data-driven, so they have a massive incentive to develop their organizational capacity for DataOps. We work with GE to clean their internal data, and it's clear that their outward focus on data has moved the needle internally -- the cobbler's son shouldn't go barefoot. They really treat data as an asset across the enterprise. It's been amazing to be involved.