I recently joined the Enterprise Insight Studio team at Accenture's global center for innovation in Dublin as an Artificial Intelligence (AI) Software Engineer. Given that this is a new role in the team - and in the field in general - there are still a lot of questions around what an AI Software Engineer does and how the role fits within a Data Science team. To help answer some of those questions, I have highlighted in this post some of the key skills that an AI Software Engineer brings to the table and how they integrate within a Data Science team.
In practice, several people work on a Data Science team to build analyses and data products. The final product will only be as good as the team that is responsible for collecting, building and analyzing the underlying data. Not too long ago, Data Science teams were mainly composed of Data Scientists, Data Architects and Business Analysts. However, there was still a big gap to be filled to turn the Data Science work into scalable and stable products, and that is exactly where the AI Software Engineer comes into play.
The main role of an AI Software Engineer in a Data Science team is to productize the data science work so it can serve internal stakeholders or external customers. The AI Engineer must collaborate with the Data Scientists, Data Architects and Business Analysts to ensure alignment between the business objectives and the analytics back end. And, to justify the AI portion of the job title, an AI Software Engineer is responsible for staying up to date on breakthrough artificial intelligence technologies with the potential to transform business, the workforce or the consumer experience, and for working out how the Data Science team can leverage them. That all sounds very good, but in practical terms, what exactly does it mean? Put simply, it means that the AI Engineer is responsible for bringing a Software Engineering culture into the Data Science process. That is a massive task and involves things like:
Build Infrastructure as Code
This means automating the Data Science team's infrastructure. This important Software Engineering concept is a key part of a successful Data Science project. The AI Software Engineer is responsible for making sure that the environments created during model development and training can be easily managed and replicated for the final product. Tools such as Anaconda, for Python package management, and Docker or Vagrant, for the creation of easily transportable and self-contained environments, should be part of a Data Science team's process to help with collaboration between team members and to make models easy to deploy. It is the responsibility of the AI Software Engineer to build and manage that pipeline, allowing the Data Scientists to focus exclusively on the model's development.
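To make the idea of a replicable environment a bit more concrete, here is a minimal sketch of a check that compares the packages installed in the current Python environment against a set of pinned versions. The function name, the package names and the version numbers are all hypothetical and only there for illustration; in a real project the pins would come from the team's environment or requirements file, and tools like Anaconda or Docker would do the actual heavy lifting.

```python
from importlib.metadata import PackageNotFoundError, version


def find_environment_drift(pinned):
    """Compare installed package versions against pinned ones.

    `pinned` maps a distribution name to the exact version expected.
    Returns {name: (expected, installed_or_None)} for every mismatch.
    """
    drift = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is missing from this environment
        if installed != expected:
            drift[name] = (expected, installed)
    return drift


# Hypothetical pins, for illustration only; a real project would read them
# from its environment or requirements file.
PINNED_VERSIONS = {"numpy": "1.26.4", "pandas": "2.2.2"}

if __name__ == "__main__":
    for name, (expected, installed) in find_environment_drift(PINNED_VERSIONS).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

A check like this could run at the start of a training job or a deployment, so that a mismatch between a Data Scientist's laptop and the production environment shows up early instead of as a subtle difference in model behavior.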
Continuous Integration and Version Control
This is another important practice for a Software Engineer that can be easily missed in a Data Science team. Tools such as TFS or Git should be part of the daily process of a Data Science project. During model development, there are so many iterations and updates that it is impossible to keep track of everything that has been done without a proper version control system in place. Concepts such as release candidates and separate branches for different types of issues or user stories, all under one common location that is accessible to everyone in the team, are extremely important for transforming the Data Science work into an actual product. That also allows the introduction of practices such as code reviews, which help ensure that more Data Scientists are aware of how a piece of code or a model works, and which help improve the quality of the work that is created.
Any product, whether it is a model with a simple user interface or a fully integrated application, should be thoroughly tested. Obviously, from a Software Engineer's point of view, those tests should be fully automated. That means that unit testing, branch testing, integration testing, and security testing should be embedded in the core Data Science process. Of course, that is aside from A/B testing, which is a different case and is done at a different stage of development, but it is equally important and should not be discarded.
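As a small illustration of what automated unit tests can look like for data science code, here is a sketch built around a hypothetical feature-scaling helper. Neither the helper nor the test names come from a real project; they are assumptions made up for this example. The tests are plain functions using assert, written in the style that a test runner such as pytest collects.

```python
def min_max_scale(values):
    """Hypothetical preprocessing step: scale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)  # raises ValueError on empty input
    if hi == lo:
        # A constant column carries no information; avoid dividing by zero.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def test_values_are_scaled_into_unit_interval():
    assert min_max_scale([3.0, 7.0, 5.0]) == [0.0, 1.0, 0.5]


def test_constant_input_does_not_divide_by_zero():
    assert min_max_scale([2.0, 2.0]) == [0.0, 0.0]


def test_empty_input_is_rejected():
    try:
        min_max_scale([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

The edge cases (a constant column, an empty input) are exactly the kind of thing that tends to slip through in a notebook and then break a model in production, which is why they are worth pinning down in tests that run on every change.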
The next piece is the development of APIs to help integrate data products and sources into applications. The AI Software Engineer is responsible for building and maintaining a platform to easily "convert" the models into APIs that can be consumed by other applications. That means developing tools or custom APIs that follow a standard approach and a common language. That also means that the Data Science team can quickly spin a model into an API that is consumed by the "outside world". This is a key step in transforming the data science models into a product, and the AI Software Engineer should bring all of their expertise into play to make sure that the APIs created from the models are scalable, flexible and reliable.
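To give a rough idea of what "converting" a model into an API can mean, here is a minimal sketch that wraps a stand-in model in a small WSGI application using only the Python standard library. Everything specific in it is an assumption for illustration: the predict function is a fixed linear score rather than a trained model, and the /predict route and the JSON field names are invented. A real platform would more likely sit on top of a web framework and add authentication, logging and versioning.

```python
import json

# Hypothetical stand-in for a trained model: a fixed linear score.
WEIGHTS = [0.5, -0.25]


def predict(features):
    return sum(w * x for w, x in zip(WEIGHTS, features))


def _json_response(start_response, status, payload):
    body = json.dumps(payload).encode("utf-8")
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def model_api(environ, start_response):
    """WSGI app: POST /predict with {"features": [...]} returns a prediction."""
    if (environ.get("REQUEST_METHOD") != "POST"
            or environ.get("PATH_INFO") != "/predict"):
        return _json_response(start_response, "404 Not Found",
                              {"error": "not found"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        features = [float(x) for x in payload["features"]]
        if len(features) != len(WEIGHTS):
            raise ValueError("wrong number of features")
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing field, or badly shaped input.
        return _json_response(start_response, "400 Bad Request",
                              {"error": "invalid input"})
    return _json_response(start_response, "200 OK",
                          {"prediction": predict(features)})
```

For a quick local try-out, the app can be served with the standard library's wsgiref.simple_server.make_server("", 8000, model_api).serve_forever(). The point of the sketch is the shape of the contract: input validation, a stable route, and a JSON response that other applications can consume without knowing anything about the model behind it.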
Development of Pilots and MVP Applications
Although not always required, as some Data Science work can simply be presented through Jupyter or other data visualization tools, the development of Pilots and MVPs is still very important in a Data Science process. The MVP is the product that encapsulates all the other aspects I have mentioned so far, from the creation and testing of the models, through the API development, all the way to something that can be demoed to and consumed by the end users. The main point here is to have applications - either MVPs or final releases - that are so solid that the end users won't even realize there is a data product underneath.
The AI Software Engineer should also consider bringing other Software Engineering concepts into a Data Science team, such as continuous delivery, application monitoring, and auto-scaling, which should eventually all be part of the core process. However, before going further into the Engineering side, the points mentioned above, which I believe are the most important of the whole process, should already be implemented and fully integrated into the team's culture. Only then should more advanced Engineering concepts be brought to the table.
To conclude, we can think of an AI Software Engineer as the person responsible for making the life of the Data Scientists and Data Architects easier. They should be focusing on the important aspects of their work: analyzing data and creating models with high accuracy, or working on the overall architecture of the project. The AI Software Engineer comes in as the person responsible for creating the APIs, testing and deploying the models, building any user interfaces that might be needed to display a more relevant view of the models (model visualization, for example), automating the infrastructure, and bridging the gap between the Data Scientists and the Data Architects. In a nutshell, the AI Software Engineer is responsible for wrapping the data science work into a final product.
This is just a basic overview of the role of an AI Software Engineer in a Data Science team and the kind of contributions that person brings to the table. Hopefully, it will help you understand what the role entails and why it is important to have software engineering concepts behind data science work. Given that this is a new role within the Data Science scope, there are still a lot of questions to be answered, and the AI Software Engineer has to be flexible enough to implement their ideas within the team and to take action in areas that are not necessarily related to Software Engineering. Nonetheless, it is a very exciting position to be in, with lots of opportunities to learn and grow. It is also a great opportunity for a Software Engineer to step into the exciting and ever-growing Data Science field. If you are interested in talking more about what an AI Software Engineer does, or if you have any questions or suggestions, feel free to reach out. I would love to hear from you. In the meantime, stay curious and keep coding.