Senior Infrastructure/Site Reliability Engineer, SRE

Posted 3 days ago

Job Description

As a Senior SRE, you will ensure reliability across a multi-region, cloud-native platform for AI risk management. You will architect infrastructure, improve performance, and mentor engineering teams.

Responsibilities:

  • Architect and operate resilient cloud infrastructure (AWS, Pulumi, Kubernetes).
  • Lead initiatives to improve availability, latency, and performance at scale.
  • Design and evolve our CI/CD pipelines to optimize for speed, safety, and repeatability.
  • Define the metrics, alerts, and runbooks that form our observability backbone.
  • Run chaos experiments and failure simulations to harden the platform.
  • Mentor engineers and set best practices for SRE across the company.

Requirements:

  • Proven track record as a senior SRE or Infrastructure Engineer in high-scale environments.
  • Expert-level skills in AWS and Infrastructure as Code (Pulumi, Terraform).
  • Strong programming ability in Go or Python (we use Go).
  • Deep understanding of distributed systems (Kafka, ClickHouse) and microservices architecture.
  • Mastery of container orchestration (Kubernetes) and production debugging.
  • Strong sense of ownership, and the judgment to balance velocity with reliability.

Benefits:

  • Mission-driven teams: Work alongside industry veterans from Meta, Uber, Citi, and Confluent, all united by a shared goal to make the digital world safer.
  • Ownership and impact: We believe in extreme ownership. You'll be empowered to take responsibility, move fast, and make decisions that drive our mission forward.
  • Innovate at the cutting edge: Your work will shape how modern finance detects fraud and manages risk.