Lead Site Reliability Engineer

Posted 5hrs ago

Job Description

Gifthealth is seeking a Lead Site Reliability Engineer to develop scalable Ruby on Rails applications. The role is responsible for embedding reliability, automation, and DevOps practices into software systems.

Responsibilities:

Designs, builds, and maintains reliable, scalable software systems supporting Ruby on Rails applications
Embeds reliability, performance, and operational best practices into application code and development workflows
Owns DevOps practices including CI/CD reliability, deployment strategies, and release safety
Leads incident response, debugging, and root cause analysis across application and platform layers
Implements and evolves observability (logging, metrics, tracing) within application and service code
Partners with engineering teams on architecture, capacity planning, and technical standards

Requirements:

Bachelor’s degree in computer science, engineering, or related field OR equivalent professional experience in software engineering, SRE, or DevOps roles (Required)
Cloud platform certifications (AWS, GCP, Azure) (Preferred)
SRE or DevOps-focused certifications (Preferred)
5+ years of experience in software engineering, SRE, or DevOps roles (Required)
Hands-on experience building and operating Ruby on Rails applications in production (Required)
Experience owning production incidents and application-level reliability (Required)
Experience in high-growth or scaling engineering organizations (Preferred)
Experience working in regulated or customer-impact–sensitive environments (Preferred)
Knowledge of Ruby on Rails application architecture and production operations; software reliability engineering principles (SLOs, SLIs, error budgets); and modern DevOps and CI/CD practices (Required)
Knowledge of security and compliance considerations in production systems (Preferred)
Strong software engineering skills (Ruby and/or comparable backend languages) (Required)
Skills in debugging and performance optimization of production applications (Required)
Skills in CI/CD pipelines, deployment automation, and release tooling (Required)
Skills in monitoring and observability tooling (Datadog, New Relic, Prometheus, etc.) (Required)
Skills in Infrastructure as Code (Terraform or similar) (Preferred)
Skills in containerization and orchestration (Docker) (Preferred)
Ability to write production-quality code that improves system reliability (Required)
Ability to collaborate with product and engineering teams to influence design decisions (Required)
Ability to troubleshoot complex, cross-system failures (Required)
Ability to mentor engineers on operational ownership and reliability practices (Preferred)
Ability to balance speed of delivery with long-term system health (Preferred)

