Senior Site Reliability Engineer, C#, .NET

Posted 12hrs ago

Job Description

The Senior Site Reliability Engineer at Climavision ensures the reliability of weather data services across cloud, colocation, and edge environments, with a focus on improving operational maturity and resolving complex production issues.

Responsibilities:

  • Own production reliability for Climavision’s customer-facing platform and radar-derived weather data services across Azure, colocation, and edge Kubernetes environments.
  • Contribute to the definition and improvement of SLIs, SLOs, alerting standards, and operational metrics used to measure platform reliability.
  • Support and coordinate production incident response efforts, including troubleshooting, mitigation, communication, and postmortem analysis.
  • Diagnose and resolve complex production issues across application services, Kubernetes infrastructure, storage, and distributed systems.
  • Drive multi-replica and multi-cluster high availability across Climavision’s .NET services.
  • Improve reliability and operational maturity of production platform services, including observability, autoscaling, ingress, and distributed storage.
  • Partner with software engineering teams to improve production readiness, resiliency patterns, deployment safety, and operational visibility before services reach production.
  • Support and evolve Climavision’s observability platform, including metrics, logging, distributed tracing, dashboarding, and alerting.

Requirements:

  • A bachelor’s degree in computer science, software engineering, or a related field; equivalent professional experience will be considered.
  • Minimum of 7 years of experience in Site Reliability Engineering, DevOps, Production Engineering, Platform Engineering, or a related infrastructure-focused role, with at least 4 years in a role formally titled Site Reliability Engineer or carrying explicit SLO / error-budget accountability.
  • Strong hands-on software engineering skills, including at least 3 years supporting and modifying C# / .NET applications in production environments.
  • Demonstrated experience refactoring production application code (preferably C# / .NET) to make services horizontally scalable across multiple replicas.
  • Experience designing or operating multi-cluster high-availability architectures, including failover behavior, traffic routing, and cross-cluster service deployment.
  • Strong hands-on experience operating production workloads in self-managed or highly customized Kubernetes environments.
  • Experience diagnosing and resolving production incidents across the application, platform, and Kubernetes infrastructure layers, including workload scheduling, storage, ingress, and cluster-level failures.
  • Strong written and verbal communication skills, including incident documentation and postmortem authoring.

Benefits:

  • Competitive compensation
  • Comprehensive benefits package
  • 401(k) Savings Plan
  • Medical/Dental/Vision Benefits
  • Health Savings Account (HSA) and Flexible Spending Account (FSA)
  • Unlimited Paid Time Off
  • 11 Paid Holidays
  • Paid Parental Leave
  • Company Paid Short-term Disability (STD)
  • Company Paid Long-term Disability (LTD)
  • Company Paid Life Insurance