Senior Site Reliability Engineer, Node Platform

Posted 28 days ago

Job Description

Senior Site Reliability Engineer designing infrastructure primitives for decentralized networks. Collaborate on Kubernetes-based control planes and improve operational efficiency.

Responsibilities:

  • You will design and build the infrastructure primitives that define how Chainlink Decentralized Oracle Networks (DONs) scale across internal systems and the decentralized ecosystem.
  • You will help create the CRE (Kubernetes-based) control plane that enables:
      • Deterministic horizontal scaling of DONs
      • Safe and repeatable infrastructure expansion
      • Improved operational efficiency and scalability
  • You will develop the core infrastructure components, including Kubernetes Operators and scaling automation, that Product teams will adopt and that may later be distributed to external node operators to support decentralized scaling.

Requirements:

  • 6–9+ years in SRE / Platform / Infrastructure Engineering
  • Proven experience scaling Kubernetes in high-throughput production environments
  • Deep knowledge of:
      • Scheduler behavior
      • StatefulSets & persistent workloads
      • Autoscaling strategies (HPA, VPA, KEDA, custom scaling)
      • Resource management & performance tuning
      • Multi-cluster and multi-region architectures
  • Experience diagnosing production failures at cluster scale
  • Strong Terraform or Crossplane experience
  • GitOps workflows (ArgoCD / Flux) experience
  • CI/CD reliability experience
  • Automation-first mindset
  • AWS production experience
  • Proficiency in Go (strongly preferred) or an equivalent systems language

Benefits:

  • All roles with Chainlink Labs are global and remote-based.
  • We carefully review all applications and aim to provide a response to every candidate within two weeks after the job posting closes.
  • Commitment to Equal Opportunity