The Data Engineer will be responsible for designing, developing, and optimizing large-scale data pipelines using Databricks, Spark, and Python. This is a short-term, high-priority onsite engagement (6–9 months) supporting critical client data engineering initiatives. The engineer will work directly at the client location in Pennsylvania and collaborate closely with technical teams to deliver scalable and high-performing data solutions.
Key responsibilities
• Design, build, and maintain Databricks-based ETL/ELT pipelines (an illustrative sketch follows this list).
• Develop high-performance Spark (PySpark) workflows for data processing.
• Work with large-scale data in Lakehouse/Data Lake environments.
• Optimize and troubleshoot existing Databricks jobs and clusters.
• Collaborate with business and technical stakeholders to understand data requirements.
• Implement data quality checks, validation rules, and monitoring processes.
• Work with orchestration tools (ADF or equivalent) to schedule and automate workflows.
• Ensure best practices in version control, CI/CD, and documentation.
• Support production pipelines and resolve data-related issues proactively.
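For flavor only: a minimal PySpark sketch of the kind of Databricks ETL work this role covers. Every path, table name, and threshold below is a hypothetical placeholder, not a client specification.

# Illustrative only: all paths, table names, and thresholds are
# hypothetical placeholders, not client specifics.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # pre-provisioned on Databricks

# Extract: read raw JSON from a (hypothetical) landing zone.
raw = spark.read.format("json").load("/mnt/landing/orders/")

# Transform: normalize types and stamp the load date.
orders = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("decimal(12,2)"))
       .withColumn("load_date", F.current_date())
)

# Data quality gate: fail fast if too many rows lack the key field
# (the 1% threshold is an assumed rule, for illustration).
total = orders.count()
missing = orders.filter(F.col("order_id").isNull()).count()
if total and missing / total > 0.01:
    raise ValueError(f"DQ check failed: {missing}/{total} rows lack order_id")

# Load: append to a curated Delta table for downstream consumers.
orders.write.format("delta").mode("append").saveAsTable("curated.orders")

On Databricks the SparkSession is pre-provisioned, so getOrCreate() simply returns it; the fail-fast null check stands in for the fuller data quality and monitoring processes listed above.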
Key competencies
• Strong analytical and problem-solving mindset.
• Ability to work independently in a fast-paced, onsite environment.
• Excellent communication and cross-functional collaboration skills.
• Strong ownership and accountability for deliverables.
• Adaptability to dynamic project needs.
Key skills
• Databricks (advanced, hands-on), including the Databricks Lakehouse
• Python and Spark (SQL/PySpark)
• ETL/ELT pipeline development and data pipeline optimization
• Data engineering in Data Lake architectures; Big Data processing
• Azure Data Factory (ADF) and workflow orchestration
• Data quality and validation
• CI/CD practices and version control (Git)
• Performance tuning, data troubleshooting, and monitoring