Work closely with data engineers, the product team, and other stakeholders to gather data requirements, and to design and build efficient data pipelines
Create and maintain algorithms and data processing code in Java / Groovy
Implement processes for data validation, cleansing, and transformation to ensure data accuracy and consistency
Develop Python scripts to automate data extraction from both new and existing sources
Monitor and troubleshoot the performance of data pipelines in Airflow, proactively addressing any issues or bottlenecks
Write SQL queries to extract data from BigQuery and develop reports using Google’s Looker Studio
Participate in daily stand-ups, sprint planning, and retrospective meetings
Engage in peer code reviews and knowledge sharing, and assist other engineers with their work
Introduce new technologies and best practices as needed to keep the product up to date
Assist in troubleshooting and resolving production escalations and issues
Requirements
Bachelor's degree or equivalent programming experience
4-5 years of overall experience as a backend software developer, including at least 2 years as a data engineer using Spark with Java / Groovy and/or Python
Strong coding skills, and knowledge of data structures, OOP principles, databases, and API design
Highly proficient in developing programs and data pipelines in Java / Groovy or Python
2+ years of professional experience with Apache Spark / Hadoop
Nice to have
Work experience with AWS (EMR, S3, Lambda, EC2, Glue, RDS)
Work experience with SQL (MySQL is a plus) and NoSQL databases
Experience with Elasticsearch
Experience with Python
Experience with Scala (Zeppelin)
Experience with Airflow or other ETL tools
Certification or verified training in one or more of the following technologies / products: AWS, Elasticsearch