Senior Data Engineer

Overview

We are looking for a dynamic and experienced Senior Data Engineer – Databricks to design, build, and optimize robust data pipelines on the Databricks Lakehouse Platform. The ideal candidate has strong hands-on skills in Apache Spark, PySpark, and cloud data services, along with a good grasp of Python and Java. This role involves close collaboration with architects, analysts, and developers to deliver scalable, high-performing data solutions across AWS, Azure, and GCP.

 

Job Description

Data Pipeline Development

Build scalable and efficient ETL/ELT workflows using Databricks and Spark for both batch and streaming data.

Leverage Delta Lake and Unity Catalog for structured data management and governance.

Optimize Spark jobs by tuning configurations, caching, partitioning, and serialization (see the sketch below).
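
For illustration, a minimal PySpark batch ETL sketch of the kind of work described above. The paths, schema fields, tuning values, and the three-level table name are hypothetical placeholders, not prescribed settings; on Databricks the SparkSession is normally pre-created, so the builder block is shown only for completeness.

```python
# Minimal PySpark batch ETL sketch (illustrative only).
# Paths, columns, and table names below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("orders_etl")
    # Example tuning knobs of the kind referenced above (values are illustrative).
    .config("spark.sql.shuffle.partitions", "200")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

# Read raw batch data (hypothetical S3 path).
raw = spark.read.json("s3://example-bucket/raw/orders/")

# Light transformation: filter, derive a partition column, deduplicate.
clean = (
    raw.filter(F.col("status").isNotNull())
       .withColumn("order_date", F.to_date("order_ts"))
       .dropDuplicates(["order_id"])
)

# Cache only if the DataFrame is reused by several downstream actions.
clean.cache()

# Write to a partitioned Delta table governed by Unity Catalog
# (three-level name catalog.schema.table is a placeholder).
(
    clean.write.format("delta")
         .mode("append")
         .partitionBy("order_date")
         .saveAsTable("main.sales.orders_clean")
)
```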

Cloud-Based Implementation

Develop and deploy data workflows on AWS (S3, EMR, Glue), Azure (ADLS, ADF, Synapse), and/or GCP (GCS, Dataflow, BigQuery).

Manage and optimize data storage, access control, and pipeline orchestration using native cloud tools.

Use tools like Databricks Auto Loader and Databricks SQL Warehouses for efficient data ingestion and querying (see the ingestion sketch below).
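
For illustration, a minimal Databricks Auto Loader ingestion sketch. The bucket paths, checkpoint location, and target table name are hypothetical, and the `spark` session is assumed to be the one Databricks provides in a notebook or job.

```python
# Auto Loader streaming-ingestion sketch (illustrative; paths and names are placeholders).
from pyspark.sql import functions as F

stream = (
    spark.readStream.format("cloudFiles")                 # Auto Loader source
         .option("cloudFiles.format", "json")
         .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/orders/")
         .load("s3://example-bucket/landing/orders/")
)

(
    stream.withColumn("ingest_ts", F.current_timestamp())
          .writeStream
          .option("checkpointLocation", "s3://example-bucket/_checkpoints/orders/")
          .trigger(availableNow=True)            # process available files, then stop
          .toTable("main.sales.orders_bronze")   # Unity Catalog table (placeholder name)
)
```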

Programming & Automation

Write clean, reusable, and production-grade code in Python and Java.

Automate workflows using orchestration tools (e.g., Airflow, ADF, or Cloud Composer); a sample DAG sketch follows this list.

Implement robust testing, logging, and monitoring mechanisms for data pipelines.
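
For illustration, a minimal Airflow orchestration sketch that triggers an existing Databricks job. It assumes Airflow 2.x with the Databricks provider package installed; the job ID, connection ID, and schedule are hypothetical placeholders.

```python
# Minimal Airflow DAG sketch (illustrative). The job ID, connection ID,
# and schedule are placeholders, not real values.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG(
    dag_id="orders_etl_daily",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",   # run daily at 02:00
    catchup=False,
) as dag:
    run_orders_job = DatabricksRunNowOperator(
        task_id="run_orders_job",
        databricks_conn_id="databricks_default",  # assumed Airflow connection
        job_id=12345,                             # hypothetical Databricks job ID
    )
```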

Collaboration & Support

Collaborate with data analysts, data scientists, and business users to meet evolving data needs.

Support production workflows, troubleshoot failures, and resolve performance bottlenecks.

Document solutions, maintain version control, and follow Agile/Scrum processes.

Required Skills

Technical Skills:

Databricks: Hands-on experience with notebooks, cluster management, Delta Lake, Unity Catalog, and job orchestration.

Spark: Expertise in Spark transformations, joins, window functions, and performance tuning (see the window-function sketch after this list).

Programming: Strong in PySpark and Java, with experience in data validation and error handling.

Cloud Services: Good understanding of AWS, Azure, or GCP data services and security models.

DevOps/Tools: Familiarity with Git, CI/CD, Docker (preferred), and data monitoring tools.
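
For illustration, a short window-function sketch of the kind referenced above (latest record per key). The DataFrame `orders_df` and its columns are hypothetical.

```python
# Window-function sketch (illustrative): keep the latest order per customer.
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Rank rows within each customer by recency.
w = Window.partitionBy("customer_id").orderBy(F.col("order_ts").desc())

latest_orders = (
    orders_df  # hypothetical DataFrame with customer_id, order_id, order_ts
      .withColumn("rn", F.row_number().over(w))
      .filter(F.col("rn") == 1)
      .drop("rn")
)
```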

Experience:

5–8 years of data engineering or backend development experience.

Minimum 1–2 years of hands-on work in Databricks with Spark.

Exposure to large-scale data migration, processing, or analytics projects.

Certifications (nice to have):

Databricks Certified Data Engineer Associate.

Working Conditions:

Hours of work: Full-time hours; flexible remote work, with availability required during US business hours.

Overtime expectations: Overtime is generally not required as long as commitments are met.

Work environment: Primarily remote; occasional on-site work may be needed during client visits.

Travel requirements: No travel required.

On-call responsibilities: On-call duties during deployment phases.

Special conditions or requirements: Not Applicable.

Workplace Policies and Agreements:

Confidentiality Agreement: Required to safeguard sensitive client data.

Non-Compete Agreement: Must be signed to ensure proprietary model security.

Non-Disclosure Agreement: Must be signed to ensure client confidentiality and security.

Skills & Requirements

Databricks, Apache Spark, PySpark, Delta Lake, Unity Catalog, AWS (S3, EMR, Glue), Azure (ADLS, ADF, Synapse), GCP (GCS, Dataflow, BigQuery), Python, Java, Databricks Auto Loader, Databricks SQL Warehouses, Airflow, Cloud Composer, Git, CI/CD, Docker, data monitoring tools, data validation, error handling, performance tuning, Agile, Scrum

Apply Now
