Service Reliability & Operations Manager

Posted 163 days ago

Job Description

The Service Reliability & Operations Manager oversees IT services for Kentro, ensuring service stability and operational excellence.

Responsibilities:

Lead teams responsible for Application Performance Monitoring (APM), observability, and “eyes on glass” 24/7 monitoring functions.
Ensure proactive detection of service degradation and performance anomalies.
Drive adoption of modern monitoring tools, dashboards, and alerting frameworks.
Oversee the major incident process, ensuring rapid triage, escalation, communication, and resolution.
Serve as the escalation point for Critical/High incidents and coordinate cross-functional response.
Conduct post-incident reviews and ensure corrective actions are implemented.
Manage sustainment of critical integrations, ensuring reliability, version alignment, and lifecycle management.
Partner with engineering teams to ensure smooth handoffs from project delivery to steady-state operations.
Maintain documentation, runbooks, and operational readiness standards.
Track and improve KPIs such as MTTR, service availability, alert fidelity, and incident volume trends.
Identify systemic issues and drive continuous improvement initiatives across operations.
Ensure alignment with ITIL processes, especially incident, problem, and change management.
Lead, mentor, and develop a team of analysts, engineers, and incident managers.
Foster a culture of accountability, collaboration, and operational discipline.
Build succession plans, training programs, and career pathways for operational staff.
Partner with other ESOM teams to ensure end-to-end service reliability.
Work closely with the PMO on readiness for new services, innovation pilots, and portfolio changes.
Provide clear, concise communication to leadership during incidents and operational reviews.

Requirements:

Bachelor's degree in computer science, electronics engineering, or another engineering or technical discipline.
10+ years in IT operations, service reliability, or incident management, including 5+ years managing managers and large teams.
Experience overseeing large teams while supporting a Federal client.
Proven experience leading multi-site IT operations and large-scale teams (400+ employees).
Strong background in ITIL practices, incident management, and customer support operations.
History of collaboration and flexibility, including devising innovative solutions to challenges facing geographically distributed teams.
Exceptional leadership, coaching, and interpersonal communication skills.
Strong analytical and problem-solving skills with a data-driven mindset.
Ability to build and maintain strong client relationships and manage escalations effectively.
Experience with APM, observability platforms, enterprise monitoring tools, and KPI reporting.
Ability to prioritize work and self-direct with minimal input.
Strong messaging skills that build team cohesion, focus, and sustained drive.
ITIL Certification (preferred)
Experience with end-user technologies and concepts (preferred)
Strategic thinking with a focus on operational excellence.
Ability to influence and inspire large teams.
Results-oriented with a track record of delivering high customer satisfaction.
Adaptability and resilience in a fast-paced, multi-client environment.
U.S. citizen or green card holder.
Willing and able to obtain a Public Trust Suitability clearance.
Must meet updated ID requirements: If you do not currently meet the ID requirements outlined, you must be willing and able to update your current forms of ID in a timely manner to complete the suitability process successfully.
