We are seeking an experienced Data Engineer to design, develop, and optimize robust data pipelines, real-time processing solutions, and scalable ETL processes. In this role, you will collaborate with cross-functional teams to implement data-driven solutions, maintain high-quality data infrastructure, and mentor junior engineers.
Responsibilities:
Architect, develop, and maintain robust and scalable data pipelines and ETL processes to support business needs and ensure data accuracy and availability.
Implement real-time data processing solutions using streaming technologies such as Apache Kafka, Apache Flink, or Spark Streaming (an illustrative sketch follows this list).
Work closely with data scientists, analysts, and product managers to understand business requirements and translate them into scalable technical solutions.
Collaborate with the data engineering team to design and implement innovative data-driven solutions.
Engage with stakeholders to gather and document requirements, translating them into technical specifications and actionable tasks for the engineering team.
Lead technical design sessions and code reviews to ensure adherence to best practices and industry standards.
Identify, troubleshoot, and resolve issues related to data pipelines, ETL processes, and data infrastructure to maintain high data quality and reliability.
Implement monitoring solutions and alerts using tools such as Grafana, Prometheus, or CloudWatch to proactively detect and resolve issues.
Monitor and optimize data pipelines and infrastructure for performance, scalability, and cost-efficiency.
Implement automated testing and validation processes to ensure data integrity and quality (an illustrative validation sketch appears at the end of this posting).
Mentor junior engineers, providing guidance and support to help them grow their technical skills and understanding of data engineering best practices.
Lead projects and initiatives, ensuring successful delivery and alignment with business objectives.
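For a sense of what the real-time processing responsibility can involve in practice, the sketch below shows a minimal PySpark Structured Streaming job that reads events from Kafka and lands them in the data lake. It is an illustrative example only: the broker address, topic name, event schema, and storage paths are assumptions for this sketch, not details of our environment.

    # Illustrative sketch only; broker, topic, schema, and paths are assumed.
    # Requires the spark-sql-kafka connector package on the Spark classpath.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("orders-stream").getOrCreate()

    # Schema of the incoming JSON events (assumed for illustration).
    schema = (
        StructType()
        .add("order_id", StringType())
        .add("amount", DoubleType())
        .add("event_time", TimestampType())
    )

    # Read a Kafka topic as a streaming DataFrame.
    raw = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker address
        .option("subscribe", "orders")                      # assumed topic name
        .load()
    )

    # Parse the Kafka value payload from JSON into typed columns.
    orders = (
        raw.selectExpr("CAST(value AS STRING) AS json")
        .select(from_json(col("json"), schema).alias("o"))
        .select("o.*")
    )

    # Write the parsed stream to the lake with checkpointing for exactly-once sinks.
    query = (
        orders.writeStream
        .format("parquet")
        .option("path", "/data/lake/orders")                # assumed output path
        .option("checkpointLocation", "/data/checkpoints/orders")
        .outputMode("append")
        .start()
    )

    query.awaitTermination()

A production version of such a job would typically add handling for malformed records, schema evolution, and a managed table format (Delta Lake or Iceberg) rather than raw Parquet.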
Qualifications:
Bachelor's degree in Computer Science, Information Technology, or a related field.
Minimum of 5 years of experience in data engineering, with a proven track record of building and maintaining complex data systems.
Strong experience in designing, building, and deploying data lakes and data warehouses on cloud platforms.
Experience with the Azure data toolset and big data technologies such as Hadoop and Kafka.
Strong background in data warehousing and data lakes, including data modeling and data quality management.
Proficiency in SQL and NoSQL databases.
Experience with ETL tools such as Apache NiFi, Talend, or Informatica.
Familiarity with data orchestration tools such as Apache Airflow, Prefect, or Azure Data Factory (an illustrative DAG sketch follows this list).
Experience with Snowflake, Amazon Redshift, Google BigQuery, or Azure Synapse for data warehousing solutions.
Experience with cloud computing platforms such as AWS, Azure, and GCP.
Strong problem-solving and analytical skills, with a keen attention to detail.
Excellent communication and interpersonal skills, with the ability to work effectively in a collaborative team environment.
Extensive experience with Apache Spark, Databricks, Hadoop, and Kafka for batch and real-time data processing.
Familiarity with Spark's ecosystem including SparkSQL, SparkML, and PySpark.
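To illustrate the orchestration familiarity listed above, the following is a minimal Apache Airflow DAG sketch wiring a daily extract-transform-load sequence. The DAG id, schedule, and task bodies are placeholders assumed for this example; a real pipeline would invoke Spark jobs, warehouse loads, or ETL tooling rather than print statements.

    # Illustrative sketch only; DAG id, schedule, and task logic are assumed.
    # Uses the Airflow 2.4+ "schedule" argument (older versions use schedule_interval).
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # Pull raw records from a source system (placeholder logic).
        print("extracting source data")

    def transform():
        # Clean and model the extracted records (placeholder logic).
        print("transforming data")

    def load():
        # Load modeled data into the warehouse (placeholder logic).
        print("loading into warehouse")

    with DAG(
        dag_id="daily_etl",              # assumed DAG name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)

        # Define run order: extract, then transform, then load.
        extract_task >> transform_task >> load_task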
Key Skills:
Data pipelines & ETL processes
Real-time data processing (Kafka, Flink, Spark Streaming)
Cloud platforms (AWS, Azure, GCP)
Data lakes & warehouses (Snowflake, Synapse, BigQuery)
Big data tools (Apache Spark, Databricks, Hadoop)
SQL & NoSQL databases
ETL tools (NiFi, Talend, Informatica)
Data orchestration (Airflow, Prefect, ADF)
Monitoring tools (Grafana, Prometheus, CloudWatch)
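Finally, as an illustration of the automated testing and validation responsibility described earlier, the sketch below runs a few basic data quality rules against a freshly loaded table and fails loudly so the orchestrator can alert on it. The table path, column names, and rules are assumptions made purely for this example.

    # Illustrative sketch only; table path, columns, and rules are assumed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("orders-quality-check").getOrCreate()

    # Load the table the pipeline just produced (assumed path).
    orders = spark.read.parquet("/data/lake/orders")

    total = orders.count()
    null_keys = orders.filter(col("order_id").isNull()).count()   # rule: key never null
    negative_amounts = orders.filter(col("amount") < 0).count()   # rule: non-negative amounts

    failures = []
    if total == 0:
        failures.append("table is empty")
    if null_keys > 0:
        failures.append(f"{null_keys} rows with null order_id")
    if negative_amounts > 0:
        failures.append(f"{negative_amounts} rows with negative amount")

    # Fail the run loudly so downstream alerting (e.g. the orchestrator) can pick it up.
    if failures:
        raise ValueError("data quality check failed: " + "; ".join(failures))

    print(f"quality checks passed for {total} rows")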