We are looking for an experienced backend or infrastructure-focused engineer with strong expertise in Kubernetes, cloud infrastructure, and automation. The role involves building and maintaining reliable systems, contributing to production-grade code and infrastructure, and driving long-term improvements in reliability and performance. The ideal candidate will have a systems-thinking mindset, hands-on cloud experience, and the ability to influence through high-quality code, documentation, and design reviews.
Key Responsibilities:
Write, review, and maintain production-grade code and infrastructure-as-code (Terraform, Helm, GitHub Actions).
Manage and optimize Kubernetes in production (preferably GKE), including autoscaling and ingress strategies.
Design, implement, and secure cloud infrastructure (IAM, VPCs, secrets, workload identity, CloudSQL optimization).
Apply systems thinking to ensure resilience, minimize cascading failures, and manage blast radius effectively.
Participate in incident mitigation and root cause analysis, and implement long-term reliability improvements.
Influence engineering decisions through well-crafted PRs, documentation, and design reviews.
(Desirable) Contribute to GitOps workflows with tools like ArgoCD/FluxCD and develop or integrate Kubernetes operators.
(Desirable) Implement and monitor SLIs, SLOs, and structured alerting to improve service reliability.
Requirements:
Have experience in backend or infrastructure-focused engineering roles (e.g., SRE, platform, DevOps, or full-stack).
Can confidently write or review production-grade code and infra-as-code (Terraform, Helm, GitHub Actions, etc.).
Have deep hands-on experience with Kubernetes in production, ideally on GKE, including workload autoscaling and ingress strategies.
Understand cloud concepts like IAM, VPCs, secret storage, workload identity, and CloudSQL performance characteristics.
Think in systems: you understand cascading failure, timeout boundaries, dependency health, and blast radius.
Regularly contribute to incident mitigation or long-term fixes (not just closing alerts).
Can influence through well-written PRs, documentation, and thoughtful design reviews.
Desirable Experience:
Exposure to GitOps tooling such as ArgoCD or FluxCD.
Experience developing or integrating Kubernetes operators.
Familiarity with service-level indicators (SLIs), service-level objectives (SLOs), and structured alerting.
Keywords: backend engineering, infrastructure engineering, SRE, platform engineering, DevOps, fullstack, production-grade coding, infrastructure-as-code, Terraform, Helm, GitHub Actions, Kubernetes, GKE, workload autoscaling, ingress strategies, IAM, VPCs, secret storage, workload identity, CloudSQL, systems thinking, incident response, mitigation, PR reviews, documentation, design reviews, GitOps, ArgoCD, FluxCD, Kubernetes operators, SLIs, SLOs, structured alerting