Senior SRE/DevOps Engineer

Posted 125 days ago

Job Description

Site Reliability Engineer for Scentbird, focusing on the reliability and performance of Kubernetes and AWS infrastructure. Collaborates with engineering teams to improve operational efficiency and service availability.

Responsibilities:

Use your shift to prevent incidents from ever happening.
Run our infrastructure with AWS, Docker and Kubernetes.
Make monitoring and alerting fire on symptoms, not only on outages.
Document every action so your findings turn into repeatable procedures, and then into automation.
Improve the deployment process to make it as boring as possible.
Design, build and maintain core infrastructure pieces that allow Scentbird to scale.
Debug production issues across services and levels of the stack.
Plan the growth of Scentbird’s infrastructure.

Requirements:

Strong hands-on Kubernetes experience required (EKS preferred): cluster operations, workload design, networking, upgrades, and performance troubleshooting.
5+ years production application support experience in a high uptime environment
5+ years UNIX administration experience including diagnosis of performance issues, package management, load estimation, kernel tuning, networking configuration, etc.
5+ years hosting experience in a large heavy-traffic environment
3+ years software engineering experience (Java/TypeScript is a plus, but experience with any other programming language is welcome)
Strong understanding of networking fundamentals (VPCs, routing, load balancers, DNS, TCP/IP, TLS) and debugging service-to-service connectivity issues.
4+ years experience working with GitLab CI/CD or GitHub Actions
Hands-on AWS experience strongly required
Hands-on experience building monitoring/alerting/tracing systems (Grafana/Prometheus/ELK/OpenTelemetry).
Database experience is a plus (RDS/Aurora/Postgres/Redis/Elasticsearch), especially around scaling, replication, and performance troubleshooting.
Service Mesh experience is a plus
Security experience is a plus
Ability to work independently on large, complex projects with minimal guidance
Excellent troubleshooting and analytical skills with the ability to debug distributed systems under pressure.

Benefits:

Competitive base compensation
Bonus program
Paid Time Off
A fun, creative and energetic work environment.

