Senior/Staff Infrastructure, Site Reliability Engineer (SRE)
Posted 1 day ago
Job Description
As a Senior SRE, you will manage resilient cloud infrastructure for Oscilar's AI Risk Decisioning™ Platform, lead best practices, and mentor engineers in a remote-first culture.
Responsibilities:
- Architect and operate resilient cloud infrastructure (AWS, Pulumi, Kubernetes).
- Lead initiatives to improve availability, latency, and performance at scale.
- Design and evolve our CI/CD pipelines to optimize for speed, safety, and repeatability.
- Define the metrics, alerts, and runbooks that form our observability backbone.
- Run chaos experiments and failure simulations to harden the platform.
- Mentor engineers and set best practices for SRE across the company.
Requirements:
- Proven track record as a senior SRE or Infrastructure Engineer in high-scale environments.
- Expert-level skills in AWS and Infrastructure as Code (Pulumi, Terraform).
- Strong programming ability in Go or Python (we use Go).
- Deep understanding of distributed systems (Kafka, ClickHouse) and microservices architecture.
- Mastery of container orchestration (Kubernetes) and production debugging.
- Strong sense of ownership, and the judgment to balance velocity with reliability.
Benefits:
- Compensation: Competitive salary and equity packages, including a 401(k) plan
- Flexibility: Remote-first culture — work from anywhere
- Health: 100% employer-covered comprehensive health, dental, and vision insurance with a top-tier plan for you and your dependents (US)
- Balance: Unlimited PTO policy
- Technical: AI-first company; both co-founders are engineers at heart, and over 50% of the company is Engineering and Product
- Culture: Family-friendly environment; regular team events and offsites
- Development: Unparalleled learning and professional development opportunities
- Impact: Making the internet safer by protecting online transactions