Staff Site Reliability Engineer

Posted 102 days ago

Job Description

The Staff Site Reliability Engineer is responsible for improving reliability across systems at RWS. The role leads technical investigations and defines engineering standards for cloud and on-prem infrastructure.

Responsibilities:

  • Lead technical investigations into reliability issues across cloud and on-prem systems
  • Define and drive adoption of SLIs, SLOs, and error budgets
  • Provide senior SRE guidance in incident reviews and long-term remediation
  • Shape the architecture of RWS’s new observability platform
  • Identify systemic reliability bottlenecks and design technical solutions

Requirements:

  • Senior-level SRE or platform engineer experience
  • Strong expertise in observability and in platforms such as Prometheus and Grafana
  • Experience with OpenTelemetry
  • Knowledge of AWS and/or GCP, and of Kubernetes/EKS
  • Ability to analyze complex system behavior
  • Strong technical communication skills

Benefits:

  • Health insurance
  • Professional development opportunities
  • Remote work options
  • Flexible work hours