Now in its eighth year since inception, Dataeaze has been a reliable partner to businesses large and small around the world on their data journey. We remain focused on making it easy for organizations to work with data and to build better decision-making mechanisms by incorporating our domain knowledge and leveraging our customized artificial intelligence and deep learning techniques.
KEY RESPONSIBILITIES:
To develop various components of our unified data pipeline framework in Python.
To contribute towards the establishment of best practices for the optimal and efficient usage of Airflow, DBT and Snowflake.
To assist with the testing and deployment of our data pipeline framework utilizing standard testing frameworks and CI/CD tooling.
To monitor the performance of queries and data loads and perform tuning as necessary.
To provide assistance and guidance during the QA & UAT phases to quickly confirm the validity of potential issues and to determine the root cause and best resolution of verified issues.
SKILLS / QUALIFICATIONS:
Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or related field required.
At least 7 years of experience in data development and solutions in highly complex data environments with large data volumes.
At least 7 years of SQL / PL/SQL experience with the ability to write ad-hoc and complex queries to perform data analysis.
At least 5 years of experience developing data pipelines and data warehousing solutions using Python and libraries such as Pandas, NumPy, PySpark, etc.
At least 3 years of experience developing solutions in a hybrid data environment (on-premises and cloud).
At least 3 years of experience developing Airflow DAGs to orchestrate data pipelines that utilize branching, dynamic DAG / task generation, and error handling.
Hands-on experience developing data pipelines for structured, semi-structured, and unstructured data, and experience integrating with their supporting stores (e.g. RDBMS, NoSQL DBs, document DBs, log files, etc.).
Hands-on experience with Snowflake is a must.
Hands-on experience with Apache Spark is a must.
Hands-on experience with DBT is preferred.
Experience with performance tuning of SQL queries, Spark jobs, and stored procedures.
An understanding of E-R data models (conceptual, logical, and physical).
Understanding of advanced data warehouse concepts (Factless Fact Tables, Temporal / Bi-Temporal models, etc.) is a plus.
Strong analytical skills, including a thorough understanding of how to interpret customer business requirements and translate them into technical designs and solutions.
Strong communication skills, both verbal and written. Capable of collaborating effectively across a variety of IT and business groups, regions, and roles, and able to interact effectively with all levels.
Self-starter. Proven ability to manage multiple concurrent projects with minimal supervision. Can manage a complex, ever-changing priority list and resolve conflicts between competing priorities.
Strong problem-solving skills. Ability to identify where focus is needed and bring clarity to business objectives, requirements, and priorities.
Snowflake, Python, Airflow Engineer