Senior Site Reliability Engineer, SRE

Posted 1 day ago

Job Description

Site Reliability Engineer responsible for system reliability and performance at Compass UOL, collaborating with teams to automate processes and improve operational excellence.

Responsibilities:

Ensure the reliability, availability, scalability, and performance of production systems;
Define, monitor, and evolve SLIs, SLOs, SLAs, and Error Budgets;
Implement and enhance observability practices, including logs, metrics, tracing, and alerts;
Participate in critical incident response, conduct root cause analyses (RCA), and lead blameless post-mortems;
Automate operational processes to reduce manual work and increase efficiency;
Collaborate with Development, DevOps, and Architecture teams to prevent systemic failures;
Plan and validate strategies for high availability, scalability, capacity planning, and disaster recovery;
Support technical decisions through analysis of reliability, performance, and utilization metrics;
Contribute to the continuous evolution of a reliability culture and operational excellence.

Requirements:

Bachelor's degree in Computer Science, Software Engineering, Information Systems, or a related field;
Proven experience in SRE, IT Operations, Cloud, or Software Engineering;
Experience with critical, distributed, and high-availability environments;
Experience with monitoring, incident management, and operational reliability;
Experience with large-scale AWS environments;
Advanced knowledge of Docker and Kubernetes;
Experience with observability, monitoring, and troubleshooting tools;
Automation skills using Python and Shell scripting;
Knowledge of resilience concepts, disaster recovery, capacity planning, and security;
Experience with Chaos Engineering;
Knowledge of OpenTelemetry and distributed observability.
