Senior Infrastructure/Site Reliability Engineer, SRE
Posted 3 days ago
Job Description
As a Senior SRE, you will ensure reliability across a multi-region, cloud-native platform for AI risk management, architecting infrastructure and improving performance while mentoring engineering teams.
Responsibilities:
- Architect and operate resilient cloud infrastructure (AWS, Pulumi, Kubernetes).
- Lead initiatives to improve availability, latency, and performance at scale.
- Design and evolve our CI/CD pipelines to optimize for speed, safety, and repeatability.
- Define the metrics, alerts, and runbooks that form our observability backbone.
- Run chaos experiments and failure simulations to harden the platform.
- Mentor engineers and set best practices for SRE across the company.
Requirements:
- Proven track record as a senior SRE or Infrastructure Engineer in high-scale environments.
- Expert-level skills in AWS and Infrastructure as Code (Pulumi, Terraform).
- Strong programming ability in Go or Python; we use Go.
- Deep understanding of distributed systems (Kafka, ClickHouse) and microservices architecture.
- Mastery of container orchestration (Kubernetes) and production debugging.
- Strong sense of ownership, and the judgment to balance velocity with reliability.
Benefits:
- Mission-driven teams: Work alongside industry veterans from Meta, Uber, Citi, and Confluent, all united by a shared goal to make the digital world safer.
- Ownership and impact: We believe in extreme ownership. You'll be empowered to take responsibility, move fast, and make decisions that drive our mission forward.
- Innovate at the cutting edge: Your work will shape how modern finance detects fraud and manages risk.