Data is an unreliable friend, and hardly anything about it is actually scientific. So what, Data Science?
Over the past 5 years, I have interviewed more than 1,000 candidate data scientists for an apparently highly coveted set of jobs at Evo Pricing. In the process, I have learned that the media are telling a fundamental lie about this profession: throwing data at off-the-shelf algorithms is really not the point.
A fundamental rethink would be appropriate, and it is likely overdue.
At its heart, data science is a noble name for a broad set of number-crunching activities that were mostly invented long ago, but have recently received a new lease of life thanks to greatly enhanced technical means: more data, more processing power, better outcomes at a lower price.
As the cost of storing and processing data went down, the volume of data collected went up: a very simple law of supply and demand, or the Price Elasticity of Data, if you will. The price goes down, the volume goes up. Someone will then have to do something with all this stuff. Enter Data Science.
Common misunderstandings surround data science
What is data science?
According to Berkeley, it is one of the most promising and in-demand career paths for skilled professionals.
To me, the name 'data science' suggests a particular approach: a solution in search of a problem. Here is some data; can we do anything with it?
That sounds sub-optimal, not only as a foundation for one's career but also as a business strategy: let's invest big money in gathering all this data, and one day something good will come out of it.
Unfortunately, the industrial revolution in the 19th century gave us schools and universities that train large numbers of blue-collar workers to provide uniform answers to pre-packaged questions efficiently, and little has changed since.
What about training humans to ask the right questions instead, and letting the machines find the answers?
Data science can be a career dead-end
Even if many flavors of data science, like Artificial Intelligence and all the marketing hype that goes with it, are gaining new popularity, the profession is mostly good only for early-tenure learners.
An average salary of 80k+ a year may sound appealing, but averages hide the full complexity of the challenge. To truly succeed with data, one must excel at specific, impactful, and well-defined problems, rather than become a generalist expert in data or, even worse, in science, which is mostly old from an academic point of view, as the opening image shows.
Data and algorithms are powerful tools. But, like any tool, they can only be as good as the use one makes of them.
Developing Business Science to succeed
How can one become successful with data? Focus on the problem to be solved, the job-to-be-done, instead of the data.
For those who focus on for-profit use cases, Business Science suggests all the right ideas:
Business problem to be defined, researched & solved
The scientific, data-driven approach
Business impact: measurable, objective outcome
For not-for-profit and other use cases, the logic is nevertheless similar: start with a question or hypothesis, apply a rigorous methodology, then go back to the original question and learning criteria to validate whether any impact was proven. Rinse and repeat without distraction.
Now the problem is: how do we get the job done? We can learn a surprising amount about this from an apparently hilarious analogy.
Salmon lesson 1: start from the end
The humble salmon, besides tasting yummy, also gets a lot of things right during its 5-10 year lifespan: it first starts from the end (the river mouth) and only then goes back to the source (the river spring) to lay its eggs and spawn, before leaving space for the next generation of salmon.
The baby salmon is born next to the spring, then it swims downstream as it grows, learns about all the exciting stuff happening in the ocean, before turning back to the river, where it can assert its own reproductive claim.
The average data scientist could learn a lot from the humble salmon. Spending too long close to the data, and swimming down towards more and more data, may make for comfortable (intellectually lazy) swimming, but is a juvenile strategy that does not lead to long-term success.
A more mature salmon would instead start downstream, with a keen focus on establishing which question (river) it plans to address and the impact it wants to have, before starting to swim slowly and painfully upstream while progressively narrowing down the amount of data (water) it swims through.
Salmon lesson 2: reject the waterfall approach
I worked for 10 years at McKinsey & Company as a management consultant. Throughout my tenure, I rigorously followed the traditional waterfall approach: investing large amounts of time, effort, and client budget upfront. Researching everything in depth. Boiling the ocean, if you will - and, in the process, killing all the poor salmon!
By and large, back then my team would formulate an initial hypothesis and then look for appropriate data to prove or disprove it. This is called hypothesis-driven thinking. At its best, it is an efficient quasi-scientific approach; at its worst, an expensive example of confirmation bias, where data are used to justify a decision that had already reached consensus beforehand anyway.
Hypothesis-driven thinking may be appropriate for highly strategic, long-term plans, but it certainly leaves clients none the wiser about what to do tomorrow morning, and the day after, as the world becomes a faster, more complex, more chaotic place. Businesses run as a movie, not a picture, as my chairman Robert Diamond likes to say.
This approach risks answering the wrong question and certainly fails to create the self-learning feedback loop that is crucial to continued success in spite of constant market disruption. Today the data are the model!
At the end of the day, that is why the whole concept of Agile Development was invented: to allow for incremental adjustments.
Salmon lesson 3: 80/20 to avoid the bears
At the top of every respectable waterfall, even the nimble salmon must face its nemesis, the big hairy beast.
While swimming upstream, every salmon may face unexpected challenges, apparently insurmountable obstacles, scary predators. What used to look like a calm water flow becomes a tumbling waterfall, suddenly hard to navigate.
In a desperate attempt to leap beyond its strength, the salmon meets its nemesis, the big hairy bear waiting for its lunch.
Perfectionism is the Business Scientist's nemesis.
Perfectionism is a trait that makes life an endless report card on accomplishments or looks. It can be a fast and enduring track to unhappiness.
Water (data) can quickly become a hideout from which the bear bites: a place to drown in rather than a happy fluid to swim through. Better to take a different, more pragmatic approach.
80/20 is the cure - focus on what really matters and go around the obstacles rather than try to cut through them. Swim around and look for ways to avoid the bears. Edge cases very often have little to no business impact! So why bother?
Salmon lesson 4: less (data) is more (storytelling)
More time should be spent preparing how to present data-driven results than researching them. That's MORE time as in actually more time, not the last-minute-rush variant of 'more'.
The salmon is born in little water (data), formulating a narrow question; then it heads into the wide ocean of research, with big data and big water to help; but then it returns to little-water territory, because explaining a result means cherry-picking and designing for impact.
Data-driven work is bottom-up, but communication must be top-down to be effective.
At some point, the Business Scientist should stop boiling the ocean of data and start thinking about the message: starting from scratch, how can that message alone be conveyed? Switch from bottom-up to top-down mode.
The goal is not to create fancy data viz that are highly dynamic and confusing, and therefore only interesting to techies, but to actually distill the message. LESS data and MORE time planning communication.
Effective communication always starts from the so-what, and therefore from the end, before swimming backward into the million reasons why the claim is made and the million data points that support the conclusion.
I highly recommend reading The Pyramid Principle by Barbara Minto for great insight into effective logical storytelling using facts.
Salmon lesson 5: the proof is in the pudding
Starting from the end, from the tangible impact and how to measure it, is key to winning approval and gaining confidence in what could otherwise be considered obscure algorithmic prowess per se. Determining whether a black box works requires using it.
You can only learn that a SatNav works by actually using it, not just by studying it.
I am particularly passionate about pricing and supply chain applications, and in both cases the biggest prize very often lies in the strategic stuff done upstream: planning, designing. BUT there is a but. You never get to touch that type of stuff if you cannot first demonstrate tangible value quickly.
So I recommend starting with end-of-lifecycle things like reordering (for supply chain) and markdowns (for pricing). I am very well aware, for example, that the best markdown is the one you do NOT offer at all because the planning was accurate to begin with. But how hard would it be to disrupt planning without first having won hearts and minds on something tangible?
Be a Business Scientist at heart, the money will follow
Would it pay off more to learn the exact same data science techniques that everyone else is studying, or to invest instead in learning about specific, unloved business applications where gut feel cannot compete with data-driven impact?
Find your own niche first, and forget about machine learning until you have an exciting problem you want to solve. After quitting McKinsey, I taught myself all I needed to know about R in just an afternoon to get started implementing my Business Science idea; but I knew VERY well what I wanted to do!
It had probably taken me over ten years, before that day, to reach a level of expertise that made me comfortable with the specific problem I wanted to deal with.
To do a Ph.D., you need a research question first - unlike an MSc, where someone else gives you all the questions.
To become an entrepreneur, you need a unique business idea first - unlike a professional career, where someone else gives you all the questions.
It's a philosophy of life.
Dare to swim against the current and, rather than just becoming another data scientist, choose to pursue the more fulfilling and rewarding Business Scientist career.