We are looking for an Operations & Reliability Tech Lead to drive the strategy and execution of our production operations, ensuring high availability, scalability, and resilience across cloud, hybrid, and on‑prem environments. This role is key to maintaining 24/7 system reliability, improving operational maturity, and enabling business growth through automation, observability, and strong engineering practices.
Responsibilities and duties
- Lead the operational lifecycle of production systems, ensuring high availability and performance.
- Define and improve monitoring, alerting, and observability frameworks.
- Drive deployment strategies (zero‑downtime, multi‑region, high availability).
- Lead incident response and root cause analysis, promoting continuous improvement.
- Collaborate with Engineering, Product, and Infrastructure teams.
- Improve automation, CI/CD pipelines, and Infrastructure as Code practice...