We are seeking a highly skilled Data Engineer to design and implement efficient solutions for moving and managing large datasets (individual files of 60–100 GB). The ideal candidate will have strong experience building reliable, scalable data pipelines, developing user-friendly interfaces for data transfer, and ensuring robust error handling. This role requires excellent problem-solving skills, attention to detail, and the ability to deliver production-ready solutions with minimal downtime.
Key Responsibilities:
Design, develop, and maintain efficient data transfer pipelines for large files (60–100 GB); an illustrative sketch follows this list.
Build a user-friendly interface to simplify dataset movement and monitoring for end-users.
Implement strong error handling, logging, and automated recovery mechanisms.
Optimize file transfer performance while ensuring data integrity and security.
Collaborate with stakeholders to gather requirements and deliver tailored solutions.
Monitor, troubleshoot, and resolve issues in production environments.
Maintain clear and comprehensive documentation for workflows and tools.
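None of the tooling below is prescribed by the posting; purely as a minimal sketch of the kind of large-file transfer, retry, and logging work these responsibilities describe, the following uses Python with boto3 against AWS S3 (one of the storage services named under Required Skills). The bucket name, file path, part size, and retry policy are illustrative assumptions, not requirements of the role.

```python
# Illustrative sketch only: multipart upload of a very large file to S3 with
# basic logging and retries. Bucket, key, path, and tuning values are assumptions.
import logging
import time

import boto3
from boto3.exceptions import S3UploadFailedError
from boto3.s3.transfer import TransferConfig
from botocore.exceptions import ClientError

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("large_file_upload")

# 128 MB parts uploaded by several threads: a reasonable starting point for 60-100 GB files.
TRANSFER_CONFIG = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    multipart_chunksize=128 * 1024 * 1024,
    max_concurrency=8,
)


def upload_with_retries(path: str, bucket: str, key: str, attempts: int = 3) -> None:
    """Upload a large local file to S3 using multipart transfer and simple backoff."""
    s3 = boto3.client("s3")
    for attempt in range(1, attempts + 1):
        try:
            s3.upload_file(
                path,
                bucket,
                key,
                Config=TRANSFER_CONFIG,
                Callback=lambda sent: log.debug("transferred %d bytes", sent),
            )
            log.info("upload of %s succeeded on attempt %d", path, attempt)
            return
        except (ClientError, S3UploadFailedError) as exc:
            log.warning("attempt %d failed: %s", attempt, exc)
            if attempt == attempts:
                raise
            time.sleep(2 ** attempt)  # exponential backoff before retrying


if __name__ == "__main__":
    # Hypothetical bucket and object key, for illustration only.
    upload_with_retries("/data/exports/dataset.parquet", "example-data-bucket", "raw/dataset.parquet")
```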
Required Skills & Experience:
Proven experience in data engineering with large-scale datasets.
Strong proficiency in Python, Java, or other relevant programming languages.
Hands-on experience with data transfer protocols, APIs, and cloud storage services (AWS S3, Azure Blob, GCP Storage, etc.).
Experience with UI development for data tools (React, Angular, or similar frameworks).
Strong understanding of distributed computing, parallel processing, and data pipeline optimization.
Familiarity with ETL tools, workflow orchestration (Airflow, Luigi, etc.), and automation scripts; see the orchestration sketch after this list.
Knowledge of database systems (SQL and NoSQL).
Excellent debugging and performance tuning skills.
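Airflow is named above as one orchestration option; purely as a sketch, assuming Airflow 2.x, the DAG below shows how a transfer task and a follow-up validation task could be chained with scheduler-managed retries. The DAG id, schedule, and task callables are hypothetical placeholders, not part of the role's actual stack.

```python
# Illustrative sketch only: a minimal Airflow 2.x DAG chaining a transfer step
# to a validation step, with retries handled by the scheduler.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def transfer_dataset(**context):
    """Placeholder: invoke the transfer tool (e.g. the upload sketch above)."""


def validate_dataset(**context):
    """Placeholder: verify checksums or row counts after the transfer."""


with DAG(
    dag_id="large_dataset_transfer",   # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # assumes Airflow >= 2.4; older versions use schedule_interval
    catchup=False,
    default_args={"retries": 3, "retry_delay": timedelta(minutes=10)},
) as dag:
    transfer = PythonOperator(task_id="transfer_dataset", python_callable=transfer_dataset)
    validate = PythonOperator(task_id="validate_dataset", python_callable=validate_dataset)

    # Ordering enforces that validation only runs after a successful transfer.
    transfer >> validate
```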
Nice-to-Have Skills:
Experience with big data technologies (Hadoop, Spark).
Familiarity with containerization (Docker, Kubernetes).
Understanding of data encryption and security best practices.
Soft Skills:
Strong attention to detail and commitment to high-quality deliverables.
Ability to work independently and proactively solve problems.
Clear communication skills and a collaborative mindset.