Here is the cheesy part (feel free to skip it): 2018 was a turning point in my life, and writing on Medium has proven to be one of my best decisions, letting me share my experience and learning journey with so many data science enthusiasts and like-minded people like YOU.
Thank you for being with me throughout my writing journey. I hope to share more with you in 2019!
So... Back to the topic of interest today.
In fact, writing production code was one of my struggles when I first started off in the data science field. Interestingly, it is also one of the most commonly debated topics among data scientists.
So now the question is: Should data scientists know how to write production code?
Here is my answer: YES.
Production code is well-tested, stable code that accounts for real-life scenarios and is robust enough to keep functioning when they occur.
The ability to write production-level code is also one of the most sought-after skills that companies look for in a data scientist.
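To make that definition a little more concrete, here is a minimal sketch in Python of what "accounting for real-life scenarios" and "well-tested" might look like for a tiny piece of analysis code. The function name `average_order_value`, the validation rules, and the sample data are all hypothetical and chosen only for illustration. Real production code would follow your own team's conventions.

```python
import math
from typing import Iterable


def average_order_value(amounts: Iterable[object]) -> float:
    """Return the mean of the valid order amounts.

    Hypothetical example: the name and the validation rules below are
    assumptions made for illustration, not a standard.
    """
    # Real-life data is messy: keep only finite, non-negative numbers and
    # silently skip missing values, text, NaN, and infinities.
    valid = [
        float(a)
        for a in amounts
        if isinstance(a, (int, float))
        and not isinstance(a, bool)
        and math.isfinite(a)
        and a >= 0
    ]
    # Fail loudly with a clear message instead of a ZeroDivisionError.
    if not valid:
        raise ValueError("no valid order amounts to average")
    return sum(valid) / len(valid)


def test_average_order_value() -> None:
    # Clean input behaves as expected.
    assert average_order_value([10, 20, 30]) == 20.0
    # Messy input: invalid entries are ignored.
    assert average_order_value([10, None, float("nan"), "n/a", 30]) == 20.0
    # Empty input raises a clear error.
    try:
        average_order_value([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")


test_average_order_value()
```

A quick notebook version of the same calculation would probably be a single line, `sum(amounts) / len(amounts)`, which works fine until the data contains a missing value or the list is empty. The extra checks and the small test are what separate the two.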
Good news for you if you're a software engineer turned data scientist: you have probably developed this skill set already by building production code for deployment in your previous roles.
If, like me, you are neither a former software engineer nor someone with a computer science background, I know how you feel, and that's why this article is for you.
Let's get started!
Why Production Code?
Why should we, as data scientists, care about writing production code in the first place?
Because this is where our analysis and models truly add value for our end users. If models are never deployed after agonizing months (or even years) of model development, they remain just models and bring no benefit to customers or end users.
All those long hours of data collection and cleaning, model building and optimization, and presentation are meant to show that your models can generate results and insights that meet business objectives.
Once you've successfully convinced stakeholders (provided your models are robust, the analysis makes sense from the business perspective, and the results are able to achieve the business goals), the deployment phase will not be far away, and that is when you need to put models into production by delivering production-level code.
To be brutally honest, your boss doesn't really care what models you used. What he or she cares about is simply the RESULTS.
Deliver the results and you're good to go.
Again, It Depends.
I'd say it is not strictly necessary to know how to write production-quality code to become a data scientist.
At the end of the day, it depends...
In my experience, some companies and clients that I've worked with needed data analytics and model building mainly for their internal analysis and usage.
This could mean your stakeholders simply want to know the performance of business metrics based on historical data, or how your models could make the business operations more efficient and cost-effective. In these cases, you may not need to write production code given the business objectives.
For some other companies, production-ready code is a must, because it has to be integrated into their existing systems before being deployed to end users.
This is precisely one of the reasons why some data scientist job descriptions list production coding as a preferred skill rather than a requirement: those companies may have a team of software engineers or IT people to help put models into production.
Thank you for reading.
I wouldn't say that I'm an expert in writing production code, and the learning journey is definitely not easy. But here I am, still learning every day to improve one step at a time.
I hope this article helps you, as a data scientist, understand the importance of writing production code and master this important skill, which is often not explicitly stated in job descriptions.
At the end of the day, you are defined by the value you have added and what you have contributed to help the company reach its business goals.