Experience: 4+ years
We are looking to build out Data Engineering capabilities within the team to support the development of new features as well as the maintenance of the existing data pipelines.
The existing data pipelines use Apache Spark (deployed on AWS EMR) and are written in a mixture of Java, Python, and Scala. Our preferred language for new features is Python. Data usually arrives in CSV or JSON(L) format and is typically transformed into Apache Parquet.
If you are detail-oriented, with excellent organizational skills and experience in this field, we’d like to hear from you.
- Experience using Apache Spark (in any language)
- Proficiency in Python to support the development of new functionality
- Basic experience of Java and/or Scala is desirable to support the maintenance of existing code
- Strong understanding of the software development life cycle and best practices, with experience working in an Agile development environment using Scrum
- Strong leadership and communication skills