Short Description: ThoughtWorks is hiring a Data Engineer who will create complex data processing pipelines as part of diverse, high-energy teams and design scalable implementations of the models developed by Data Scientists.
- Creating complex data processing pipelines, as part of diverse, high-energy teams.
- Designing scalable implementations of the models developed by our Data Scientists.
- Hands-on programming based on TDD, usually in a pair programming environment.
- Deploying data pipelines in production based on Continuous Delivery practices.
- Advising clients on which distributed storage and computing technologies to use, given the many options available in the ecosystem.
- Minimum of 6 years of overall industry experience.
- 3+ years of experience building and deploying large-scale data processing pipelines in a production environment.
- Experience building data pipelines and data-centric applications in a production setting, using distributed storage platforms like HDFS, S3, and NoSQL databases (HBase, Cassandra, etc.) and distributed processing platforms like Hadoop, Spark, Hive, Oozie, Airflow, etc.
- Hands-on experience with MapR, Cloudera, Hortonworks, and/or cloud-based (AWS EMR, Azure HDInsight, Qubole, etc.) Hadoop distributions.
- Experience working with, or an interest in, Agile methodologies such as Extreme Programming (XP) and Scrum.
- Knowledge of software best practices, like Test-Driven Development (TDD) and Continuous Integration (CI).
- Strong communication and client-facing skills with the ability to work in a consulting environment are essential.
- Senior developers (6+ years) are expected to act as the architect on small and large enterprise projects. On larger projects, you are expected to work closely with fellow architects to define the architecture and carry it forward.
- The desire to contribute to the wider technical community through collaboration, coaching, and mentoring of other technologists.