SRE Lead

Posted 146 days ago

Job Description

The SRE Lead is responsible for production reliability, overseeing blockchain infrastructure and team operations. You will lead the SRE team, manage incidents, and drive operational excellence for multi-region platforms.

Responsibilities:

Lead and grow the SRE team: hiring, onboarding, 1:1s, performance reviews, and career development.
Own SRE operating cadence: prioritization, planning, execution, and visibility of reliability work.
Maintain high standards for production readiness: runbooks, operational checklists, change management, and quality gates.
Own production reliability end-to-end across gateways, clusters, and blockchain node fleets.
Define and evolve SLIs/SLOs for uptime, response time, RPS, and time-to-resolve; partner with engineering teams to meet targets.
Own incident management standards: alerting strategy, escalation, incident coordination, and communications.
Run and improve postmortems: ensure follow-ups are executed and reliability debt is reduced over time.
Lead capacity planning and performance work across regions and chains; balance reliability, speed, and cost.
Lead design reviews and set engineering standards for reliability, scalability, and operational excellence.
Drive architecture decisions across Nomad and Kubernetes environments, gateways, and the observability stack.
Build and evolve internal tooling that improves reliability and operational efficiency (automation, health systems, diagnostics, self-service).

Requirements:

3+ years in SRE / infrastructure / production engineering, including 1+ year leading people
Strong Linux, networking, and production incident debugging skills
Experience running and scaling distributed, multi-region, high-load systems
Hands-on with orchestration (Nomad and/or Kubernetes) and modern gateways/proxies
Solid observability practices (metrics, logs, traces, alerting, incident response)
Experience using AI agents to improve operational efficiency and reliability automation
Strong communication and ability to lead technical decisions end to end

Nice to have:

Web3 / RPC infrastructure and blockchain node operations
HashiCorp stack (Nomad, Consul, Vault), Prometheus ecosystem
Terraform / IaC, capacity & cost modeling, DDoS and abuse protection
Building internal platforms: self-service tools, runbooks, reliability automation

Benefits:

20 days of annual leave, plus an additional 12 days off to use for your holidays or personal days.
Well-being programs to support your health and balance.
Coworking space compensation for a productive work environment.
Paid sick leave to ensure you can rest when needed.
A company that invests in your growth, with personalized roadmaps to guide your professional development.
An actively growing company with great opportunities for both horizontal and vertical career development.
Opportunity to shape the initiatives you’re working on and make a real impact.

GetBlock
