Strategic Ops Engineer III

Posted 2 hours ago

Job Description

The Strategic Ops Engineer III at Backblaze focuses on IT operations management and incident resolution strategies, working with engineering teams to optimize system resilience and performance.

Responsibilities:

  • Lead and govern the end-to-end incident management lifecycle, including detection, triage, escalation, and resolution.
  • Drive major incident management (MIM) processes and communications.
  • Improve MTTR (Mean Time to Resolution) through automation and process optimization.
  • Establish and maintain incident response playbooks and runbooks.
  • Maintain and improve intelligent heatmaps leveraging AI/ML to identify recurring technical themes and prioritize long-term remediation.
  • Implement trend analysis and proactive problem identification using observability data and AI.
  • Track and manage problem records to closure.
  • Govern change management processes and lead the Change Advisory Board (CAB), ensuring safe, compliant, and low-risk deployments.
  • Define and enforce change policies, risk assessments, and approval workflows.
  • Drive continuous improvement in release and deployment practices.
  • Maintain a strong understanding of system architecture and monitoring strategies, identifying gaps and opportunities for improvement.
  • Partner with engineering teams to improve system resilience and performance.
  • Reduce alert fatigue by improving signal-to-noise ratio in monitoring systems.
  • Leverage AI/ML for anomaly detection, predictive alerting, and automated root cause analysis.
  • Implement AI-driven solutions to optimize incident response and operational workflows.
  • Analyze large-scale operational data to identify patterns and recommend improvements.

Requirements:

  • 5+ years of experience in IT Operations, SRE, or similar roles.
  • Strong expertise in Incident, Problem, and Change Management (ITIL or similar frameworks).
  • Proven experience in governing and optimizing operational processes.
  • AI & Data Expertise: Strong knowledge of AI/ML concepts, including anomaly detection, predictive analytics, and data modeling.
  • AIOps Experience: Hands-on experience with AIOps platforms or building AI-driven operational solutions (event correlation, alert prioritization).

Benefits:

  • Health insurance
  • Paid time off
  • Professional development opportunities
  • Remote work options