Data Engineer

Overview

We are seeking Data Engineers to join the Data Onboarding Engineering team. The role focuses on building and operating robust, scalable data pipelines that ingest and process 30+ TB of data daily, primarily using Python on Google Cloud Platform (GCP).

The engineer will collaborate closely with business partners, researchers, and trading teams to onboard high-value datasets that directly power systematic trading and research workflows.

The ideal candidate is highly hands-on, production-focused, and comfortable operating in a high-performance, data-intensive environment.

Job Description

Key Responsibilities
Work closely with business stakeholders to understand data requirements and usage patterns
Collaborate with engineers, researchers, and portfolio managers to onboard new and complex datasets
Design, build, and support production-grade ETL and data ingestion pipelines using Python (a minimal sketch of this kind of pipeline follows this list)
Operate and scale data pipelines running on Google Cloud infrastructure
Ensure strong standards around data quality, reliability, monitoring, and operational support
Handle large-scale batch data ingestion (30+ TB per day)
Extend and enhance the existing data onboarding framework to support new data sources and formats
Troubleshoot and resolve pipeline failures and data quality issues in production
Contribute to documentation, operational runbooks, and engineering best practices
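
For illustration, the following is a minimal sketch of the kind of batch ingestion step described above, assuming the google-cloud-bigquery client library; the project, bucket, and table names are hypothetical:

    from google.cloud import bigquery

    # Hypothetical identifiers for illustration only.
    TABLE_ID = "my-project.market_data.daily_prices"
    SOURCE_URI = "gs://my-ingest-bucket/prices/2024-01-01/*.parquet"

    def load_daily_batch(table_id: str, source_uri: str) -> None:
        """Load one day's Parquet files from GCS into a BigQuery table."""
        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.PARQUET,
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        )
        load_job = client.load_table_from_uri(source_uri, table_id, job_config=job_config)
        load_job.result()  # Block until the load job succeeds or raises.
        table = client.get_table(table_id)
        print(f"{table.num_rows} total rows now in {table_id}")

A production pipeline would wrap this core step with retries, schema management, partitioning, and alerting.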

Desired Skills and Experience

Essential Skills
3+ years of professional experience as a Data Engineer or in a similar role
3+ years of hands-on experience building ETL pipelines in production environments
Strong Python programming skills for data processing and pipeline development
Practical experience with cloud-based data platforms, preferably Google Cloud Platform (GCP)
Solid understanding of data operations, including ingestion, processing, storage, quality, and lifecycle management
Strong SQL skills and familiarity with data modeling concepts (see the example quality-check query after this list)
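
As a sketch of the SQL and data quality side of the role, a post-load check might look like the following; the table, columns, and thresholds are hypothetical:

    from google.cloud import bigquery

    # Hypothetical table and columns for illustration only.
    QUALITY_CHECK_SQL = """
        SELECT
          COUNT(*) AS row_count,
          COUNTIF(price IS NULL) AS null_prices
        FROM `my-project.market_data.daily_prices`
        WHERE ingest_date = @ingest_date
    """

    def run_quality_check(ingest_date: str) -> None:
        """Fail loudly if a day's load is empty or contains null prices."""
        client = bigquery.Client()
        job_config = bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("ingest_date", "DATE", ingest_date),
            ]
        )
        row = next(iter(client.query(QUALITY_CHECK_SQL, job_config=job_config).result()))
        if row.row_count == 0 or row.null_prices > 0:
            raise ValueError(f"Quality check failed for {ingest_date}: {dict(row)}")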

Nice-to-Have Skills
Experience with Snowflake as a cloud data warehouse
Exposure to Spark or other distributed data processing frameworks
Familiarity with Lakehouse concepts (Delta Lake or similar formats)
Experience with event-driven or streaming data pipelines (a minimal consumer sketch follows this list)
Background working with financial, market, or alternative datasets
Knowledge of data observability, lineage, and governance tooling
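
For the streaming item above, a minimal event-driven consumer on GCP might look like this sketch, assuming the google-cloud-pubsub library; the project and subscription names are hypothetical:

    from concurrent.futures import TimeoutError
    from google.cloud import pubsub_v1

    # Hypothetical identifiers for illustration only.
    PROJECT_ID = "my-project"
    SUBSCRIPTION_ID = "market-data-events"

    def handle_message(message: pubsub_v1.subscriber.message.Message) -> None:
        """Process one event, then ack it so it is not redelivered."""
        print(f"Received: {message.data!r}")
        message.ack()

    def consume() -> None:
        subscriber = pubsub_v1.SubscriberClient()
        sub_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)
        streaming_pull = subscriber.subscribe(sub_path, callback=handle_message)
        with subscriber:
            try:
                streaming_pull.result(timeout=60)  # Listen for one minute.
            except TimeoutError:
                streaming_pull.cancel()
                streaming_pull.result()  # Wait for shutdown to finish.

    if __name__ == "__main__":
        consume()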

Behavioral Competencies
Strong problem-solving and analytical mindset
Excellent collaboration and communication skills
Ability to work effectively with cross-functional technical and non-technical teams
High ownership and accountability in a production environment
Comfortable working in a fast-paced, data-driven organization

Skills & Requirements

Python (data engineering focus)
ETL pipeline development
Data pipeline orchestration (Airflow/Prefect/Luigi)
SQL & data modeling
Google Cloud Platform (GCP)
Data quality & monitoring
Snowflake
Spark
Lakehouse concepts (Delta Lake or similar)
Event-driven or streaming data pipelines
Financial or market data handling
Data observability
Data lineage and governance tooling
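
Since the list above names orchestration tooling, here is a minimal sketch of how such a pipeline might be scheduled with Airflow (one of the tools mentioned); the DAG id and task callables are hypothetical, and the schedule argument assumes Airflow 2.4+:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Hypothetical callables standing in for the real ingestion and check steps.
    def ingest_batch():
        print("ingesting daily batch")

    def run_quality_check():
        print("running quality check")

    with DAG(
        dag_id="daily_dataset_onboarding",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        ingest = PythonOperator(task_id="ingest_batch", python_callable=ingest_batch)
        check = PythonOperator(task_id="quality_check", python_callable=run_quality_check)
        ingest >> check  # Run the quality check only after ingestion succeeds.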
