We are seeking Data Engineers to join the Data Onboarding Engineering team. The role focuses on building and operating robust, scalable data pipelines that ingest and process 30+ TB of data daily, primarily using Python on Google Cloud Platform (GCP).
Engineers in this role collaborate closely with business partners, researchers, and trading teams to onboard high-value datasets that directly power systematic trading and research workflows.
The ideal candidate is highly hands-on, production-focused, and comfortable operating in a high-performance, data-intensive environment.
Key Responsibilities
• Work closely with business stakeholders to understand data requirements and usage patterns
• Collaborate with engineers, researchers, and portfolio managers to onboard new and complex datasets
• Design, build, and support production-grade ETL and data ingestion pipelines using Python
• Operate and scale data pipelines running on Google Cloud infrastructure
• Uphold strong standards for data quality, reliability, monitoring, and operational support
• Handle large-scale batch data ingestion volumes (30+ TB per day)
• Extend and enhance the existing data onboarding framework to support new data sources and formats
• Troubleshoot and resolve pipeline failures and data quality issues in production
• Contribute to documentation, operational runbooks, and engineering best practices
Desired Skills and Experience
Essential Skills
• 3+ years of professional experience as a Data Engineer or in a similar role
• 3+ years of hands-on experience building ETL pipelines in production environments
• Strong Python programming skills for data processing and pipeline development
• Practical experience with cloud-based data platforms, preferably Google Cloud Platform (GCP)
• Solid understanding of data operations, including ingestion, processing, storage, quality, and lifecycle management
• Strong SQL skills and familiarity with data modeling concepts
Nice-to-Have Skills
• Experience with Snowflake as a cloud data warehouse
• Exposure to Spark or other distributed data processing frameworks
• Familiarity with lakehouse concepts (Delta Lake or similar table formats)
• Experience with event-driven or streaming data pipelines
• Background working with financial, market, or alternative datasets
• Knowledge of data observability, lineage, and governance tooling
Behavioral Competencies
• Strong problem-solving and analytical mindset
• Excellent collaboration and communication skills
• Ability to work effectively with cross-functional technical and non-technical teams
• High ownership and accountability in a production environment
• Comfortable working in a fast-paced, data-driven organization
Keywords
Python (data engineering focus), ETL pipeline development, Data pipeline orchestration (Airflow/Prefect/Luigi), SQL & data modeling, Google Cloud Platform (GCP), Data quality & monitoring, Snowflake, Spark, Lakehouse concepts (Delta Lake or similar), Event-driven or streaming data pipelines, Financial or market data handling, Data observability, lineage, and governance tooling.