Reliability Engineer
Posted 97 days ago
Job Description
The Reliability Engineer manages facility infrastructure reliability across mission-critical data center systems, designing, implementing, and improving asset strategies to achieve uptime and safety objectives.
Responsibilities:
- The Reliability Engineer is accountable for facility infrastructure reliability across mission-critical data center systems (power, cooling, controls).
- You will design, implement, and continuously improve asset strategies and work management processes to achieve uptime, safety, and cost objectives.
- Core work includes reliability analytics, PM optimization, MOP/SOP governance, change management, root cause analysis (RCA), and program execution for critical spares, condition monitoring, and lifecycle asset management.
- Develop and maintain equipment strategies (criticality, failure modes, maintenance prescriptions) for power and cooling systems.
- Own PM quality and audit activities; eliminate ineffective tasks and deploy optimized prescriptions.
- Author, review, and govern SOPs/MOPs/EOPs and change packages; ensure adherence through training and approvals.
- Partner with site teams to maintain CMMS schedules and O&M plans; lead reliability investigations and corrective actions.
- Implement condition monitoring (oil/coolant analysis, thermography, vibration analysis, and battery monitoring) and trend the data to preempt failures.
- Establish and maintain critical spares lists and stocking strategies; track gaps and remedial actions.
- Support lifecycle asset management processes to guide replacements and capital planning.
- Lead post-incident RCAs and FMEAs; publish learnings and update procedures.
- Collaborate with CE leaders to uphold operator certification and training standards; mentor technicians on reliability methods.
Requirements:
- 7 years of experience in reliability, maintenance engineering, or facilities engineering within mission-critical environments.
- Expertise with RCM, FMEA, RCA, and maintenance optimization.
- Familiarity with UPS, generators, switchgear, chillers, cooling towers, CRAH/CRAC, and BMS/EPMS.
- Experience governing SOP/MOP/EOP, CMMS scheduling, and change management.
- Ability to analyze condition monitoring data and turn findings into actions.
- Proficiency in data analysis and visualization tools (Excel, Power BI, or similar) to mine CMMS, condition-monitoring, and operational data for trends, failure patterns, and predictive insights.
- Ability to apply statistical methods or reliability modeling to support decision-making.
- Strong communication skills; able to lead investigations and drive consensus.
Working Conditions:
- Support 24×7 operations
- Occasional on-call rotation and night/weekend work
- Ability to work in mechanical/electrical rooms around energized systems (following LOTO and NFPA 70E)
- Travel to supported sites (~25%)
















