Develop and maintain scalable data processing pipelines on AWS. This role involves hands-on coding and optimization of AWS-based data solutions while working closely with the team to ensure an efficient, high-performance architecture.
Key Responsibilities:
Develop and optimize AWS Glue, Lambda, and Athena workloads for data processing.
Use AWS CDK to define and deploy infrastructure as code in an automated manner (see the sketch after this list).
Work with AWS Lakehouse to manage structured and semi-structured data.
Assist in building serverless data architectures for batch and streaming data processing.
Collaborate with senior engineers and architects to improve performance and scalability.
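For illustration only, a minimal AWS CDK v2 sketch (Python) of the kind of automated infrastructure definition referenced above; the stack, bucket, and function names, the handler, and the local "lambda" asset directory are all hypothetical, not part of this posting:

```python
# Minimal, hypothetical AWS CDK v2 stack (Python): an S3 landing bucket plus a
# Lambda transform function for a serverless data pipeline. All names are illustrative.
from aws_cdk import App, Duration, Stack
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_s3 as s3
from constructs import Construct


class DataPipelineStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Landing bucket for raw, semi-structured input data
        raw_bucket = s3.Bucket(self, "RawDataBucket")

        # Lambda that transforms incoming objects before they are queried with Athena
        transform_fn = _lambda.Function(
            self,
            "TransformFunction",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="transform.handler",              # hypothetical module.function
            code=_lambda.Code.from_asset("lambda"),   # local directory with handler code
            timeout=Duration.minutes(5),
        )

        # Allow the function to read and write the landing bucket
        raw_bucket.grant_read_write(transform_fn)


app = App()
DataPipelineStack(app, "DataPipelineStack")
app.synth()
```

Deploying a stack like this with `cdk deploy` is what "automated" infrastructure deployment typically looks like in practice.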
Requirements:
5–8 years of experience as a Data Engineer in the AWS ecosystem
Strong hands-on experience in developing and optimizing AWS Glue, Lambda, and Athena workloads (a minimal Glue job sketch follows this list)
Expertise in building and managing AWS Lakehouse solutions for structured and semi-structured data
Proficiency in languages and frameworks such as Python, PySpark, Spark SQL, TypeScript, Scala, or Java
Practical knowledge of AWS CDK for Infrastructure-as-Code and automated deployments
Experience in designing and optimizing scalable, high-performance data pipelines
Ability to collaborate with senior engineers and architects to enhance performance and scalability
Familiarity with batch and streaming data processing in serverless architectures
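As a hedged illustration of a typical Glue workload, a minimal PySpark job skeleton that reads semi-structured JSON from S3 and writes date-partitioned Parquet for Athena to query; the source_path/target_path job arguments and the ingest_date partition column are illustrative assumptions, not part of this posting:

```python
# Minimal, hypothetical AWS Glue PySpark job: reads semi-structured JSON from S3
# and writes date-partitioned Parquet that Athena can query.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Job arguments (names are illustrative)
args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_path"])

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw JSON from the landing zone
raw = spark.read.json(args["source_path"])

# Stamp each record with its ingest date so Athena can prune partitions
curated = raw.withColumn("ingest_date", F.to_date(F.current_timestamp()))

# Write partitioned Parquet to the curated zone
(curated.write
    .mode("overwrite")
    .partitionBy("ingest_date")
    .parquet(args["target_path"]))

job.commit()
```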
Must-Have Skills:
AWS Cloud Services (Glue, Lambda, Athena, Lakehouse, Iceberg)
AWS CDK for Infrastructure-as-Code
Programming proficiency in Python, PySpark, Spark SQL, TypeScript, Scala, or Java
Experience with data lakes and data warehousing on AWS
Nice-to-Have Skills:
Real-time data streaming (Firehose, Kinesis, Kafka)
Hands-on experience with Apache Iceberg for data lake management (a brief Spark SQL sketch follows this list)
Experience with NoSQL databases (DynamoDB)
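As a brief, hedged sketch of Apache Iceberg usage via Spark SQL, assuming a Spark session already configured with an Iceberg-enabled Glue catalog; the catalog name glue_catalog, the analytics.events table, the S3 location, and the staged_events source are illustrative:

```python
# Hypothetical Iceberg table management through Spark SQL. Assumes the session's
# Iceberg catalog ("glue_catalog") and its Glue/S3 configuration are set externally.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Create an Iceberg table partitioned by day for efficient time-based queries
spark.sql("""
    CREATE TABLE IF NOT EXISTS glue_catalog.analytics.events (
        event_id   string,
        event_time timestamp,
        payload    string
    )
    USING iceberg
    PARTITIONED BY (days(event_time))
    LOCATION 's3://example-bucket/iceberg/events/'
""")

# Append staged records into the Iceberg table
spark.sql("""
    INSERT INTO glue_catalog.analytics.events
    SELECT event_id, event_time, payload
    FROM staged_events
""")
```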