Many companies want to use data science to advance their businesses. They recognize the need for it, since every organization's prime goal is to stay competitive and make use of its data, but many of them are unsure how to get started and don't even have a data science team.
Fortunately, because the modern data science ecosystem, in particular the Python stack, is so friendly to beginners, you can now make a lot of progress without a huge team of highly technical experts or advanced degrees in mathematics.
We recommend the following steps to bootstrap your way into data science:
Step 1: Pick a data set
Look at what data your organization currently collects or has access to. If you're an insurance company, you might have millions of claims records that you're not utilizing. If you're a manufacturer, you might have a database of past orders. Whatever the nature of your organization, identify the first data set that you'll want to use to advance the business. This will give a healthy focus to your data science mission.
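Once you have a candidate data set, a quick profile of its columns helps confirm it is usable before you commit to it. The sketch below assumes the data can be loaded into a pandas DataFrame; the `claims` table and the `profile` helper are hypothetical stand-ins for your own records.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: summarize each column's type, share of missing values and distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_pct": df.isna().mean().mul(100).round(1),
        "n_unique": df.nunique(),  # NaN/None are not counted as distinct values
    })


# Hypothetical claims records standing in for your real data set.
claims = pd.DataFrame({
    "claim_id": [1, 2, 3, 4],
    "claim_type": ["auto", "home", "auto", None],
    "amount": [1200.0, 5400.0, None, 800.0],
})

summary = profile(claims)
print(summary)
print(f"{len(claims)} rows, {claims.shape[1]} columns")
```

Columns with a high share of missing values or a single distinct value are early warnings that a data set may need cleanup before it can support your first goal.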
Step 2: Set an initial goal
Decide on the first thing you want to do with the data. Do you want to automate a tedious report that is currently produced manually? Do you want to set up an analytics dashboard on top of your database so your team can easily extract insights and generate data visualizations? Or do you want to build a machine learning model that predicts how likely a customer is to churn? Whatever goal you choose, what matters is that it is an achievable milestone that delivers real business value.
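If the first goal is automating a manual report, an initial version can be a small pandas function that rebuilds the report from raw records on demand. This is a minimal sketch under assumptions: the `orders` columns (`order_id`, `order_date`, `amount`) and the `monthly_revenue_report` function are hypothetical.

```python
import pandas as pd


def monthly_revenue_report(orders: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical report: total revenue and number of orders per calendar month."""
    # Derive a monthly period from the order date so orders can be grouped by month.
    orders = orders.assign(month=pd.to_datetime(orders["order_date"]).dt.to_period("M"))
    return (
        orders.groupby("month")
        .agg(revenue=("amount", "sum"), orders=("order_id", "count"))
        .reset_index()
    )


# Hypothetical raw order records.
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104],
    "order_date": ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-28"],
    "amount": [250.0, 100.0, 400.0, 50.0],
})

report = monthly_revenue_report(orders)
csv_text = report.to_csv(index=False)  # the same text could be written to a file or emailed
print(csv_text)
```

Running a script like this on a schedule (for example with cron) turns a recurring chore into a repeatable job.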
Step 3: Skill up your team
If what you want to do is more along the lines of automating reports, extracting insights and running queries on the data, that's more in the wheelhouse of data analytics. In that case, you'll want your team to be proficient in the following tools:
- SQL to easily interact with your database.
- Python to write programs to automate analyses of the data.
- Pandas to easily program more complex analyses and work with larger data sets.
- Plot.ly or something similar to visualize your data so you can communicate your findings.
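As a rough illustration of how these tools fit together, the sketch below runs an aggregation in SQL and continues the analysis in pandas. It assumes an in-memory SQLite database from Python's standard `sqlite3` module in place of your real database; the `orders` table and its columns are hypothetical.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for your production database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'North', 120.0), (2, 'South', 80.0),
        (3, 'North', 200.0), (4, 'South', 40.0);
""")

# SQL does the aggregation close to the data...
by_region = pd.read_sql_query(
    "SELECT region, SUM(amount) AS revenue, COUNT(*) AS n_orders "
    "FROM orders GROUP BY region ORDER BY region",
    conn,
)
conn.close()

# ...and pandas handles the follow-up analysis in Python.
by_region["share"] = by_region["revenue"] / by_region["revenue"].sum()
print(by_region)
```

The resulting `by_region` DataFrame is the kind of table you would then hand to Plotly or a similar library to chart your findings.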
Note: At Simple Fractal, we use the Python stack and find it easiest to teach, but using R and similar tools would be fine as well.
If you want to do something more in the realm of data science, such as building a predictive model for churn, forecasting sales or detecting fraudulent claims, then, in addition to the above-mentioned analytics skill stack, you'll also want your team to learn the following:
- The basic underpinnings of supervised and unsupervised learning.
- How to use the scikit-learn library.
- How to interface with Spark, Azure Machine Learning, Google Cloud Machine Learning or similar platforms.
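To make the supervised/unsupervised distinction concrete, here is a small scikit-learn sketch on synthetic data. The customer features (`tickets`, `tenure`) and the rule that generates the churn labels are invented for illustration; a real churn model would be trained on your own labeled history.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical customer features: support tickets per month and tenure in months.
n = 500
tickets = rng.poisson(2, n)
tenure = rng.integers(1, 60, n)
X = np.column_stack([tickets, tenure])

# Synthetic labels: more tickets and shorter tenure make churn more likely.
logit = 0.8 * tickets - 0.1 * tenure
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Supervised learning: fit on labeled examples, then score customers the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
churn_prob = clf.predict_proba(X_test)[:, 1]  # estimated probability of churn per customer

# Unsupervised learning: no labels, just group similar customers into segments.
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(f"test accuracy: {accuracy:.2f}")
print("customers per segment:", np.bincount(segments))
```

Logistic regression and k-means appear here only because they are simple, common starting points, not because they are the best choice for any particular problem.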
Some hardliners believe you must understand all of the mathematics behind data science models. At the other end of the spectrum, some believe you can treat the models as pure black boxes and use them without any deeper understanding of what goes on inside. We believe the right approach lies in the middle: learn to use the tool, and also learn some of the basic intuition behind the models. In my experience, that's typically enough to build useful things responsibly.
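One practical way to build that intuition without deriving the mathematics is to fit a simple, interpretable model and check whether what it learned matches your understanding of the business. In this sketch (synthetic data, hypothetical feature names), one feature is pure noise by construction, so its coefficient should come out near zero, while the two features that drive churn should receive clearly larger weights with the expected signs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 1000

# Hypothetical features; only the first two influence churn in this synthetic data.
features = ["support_tickets", "months_since_signup", "favorite_color_code"]
X = np.column_stack([
    rng.poisson(2, n),
    rng.integers(1, 60, n),
    rng.integers(0, 5, n),  # pure noise by construction
])
logit = 0.9 * X[:, 0] - 0.08 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Standardizing puts the coefficients on a comparable scale (effect per standard deviation).
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
coefs = dict(zip(features, model[-1].coef_[0]))

# Largest absolute coefficients first: positive raises churn odds, negative lowers them.
for name, c in sorted(coefs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>22}: {c:+.2f}")
```

If a model puts heavy weight on a feature that should not matter, that is a cue to investigate the data before trusting the predictions.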
Step 4: Hack on the data
Now that your team has the skills it needs, get hacking away on the data! See what your initial results are, and keep iterating.
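Iterating works best when every attempt is measured the same way and compared against a trivial baseline. The sketch below, again on synthetic data with hypothetical features, uses scikit-learn's `cross_val_score` to compare a majority-class `DummyClassifier` with two real models; a model that cannot beat the baseline is not yet adding value.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n = 600

# Hypothetical features (support tickets, tenure in months) and synthetic churn labels.
X = np.column_stack([rng.poisson(2, n), rng.integers(1, 60, n)])
logit = 0.9 * X[:, 0] - 0.08 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

candidates = {
    # Always predicts the most common class: the score any useful model must beat.
    "baseline (most frequent)": DummyClassifier(strategy="most_frequent"),
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean accuracy over 5 cross-validation folds, so every candidate is judged identically.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in candidates.items()}
for name, s in scores.items():
    print(f"{name:>26}: {s:.3f}")
```

Each later iteration (new features, more data, another model) can be added to `candidates` and judged against the same numbers.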
By following these steps, you will gain useful insights into your organization's data and can be proactive about using data science to drive the organization forward.