Site Reliability Engineer

Posted 2 days ago

Job Description

Site Reliability Engineer at Hewlett Packard Enterprise, managing cloud systems and improving service reliability. The role spans the full service lifecycle, from design to operation.

Responsibilities:

  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
  • Support services from the planning phase before they go live, through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
  • Provide technical leadership and guidance to other team members on managing the availability and performance of mission-critical services, on building automation to prevent problems from recurring, and on building automated responses for non-exceptional service conditions.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Scale systems sustainably through mechanisms such as automation, and evolve systems by pushing for changes that improve reliability and velocity.
  • Plan capacity for the growth of cloud infrastructure.
  • Improve operational processes such as deployments and upgrades.
  • Manage execution of project priorities, deadlines, and deliverables.
  • Participate in an on-call rotation to respond to incidents that affect platform availability.
  • Use on-call shifts to prevent incidents from happening.
  • Apply incident-response experience, including conducting post-mortems and implementing lessons learned, to improve system reliability.

Requirements:

  • 10+ years of engineering or systems experience
  • Experience leveraging cloud architecture, applying site reliability principles, and/or demonstrating sensitivity to operational concerns
  • Strong understanding of network design and architecture
  • Experience scaling and managing distributed systems
  • Significant experience with monitoring and observability platforms
  • Demonstrated ability to debug, fix, and optimize code
  • Troubleshooting skills across network, application, and distributed services layers
  • Ability to learn quickly and adapt to new technologies
  • Excellent communication skills, both verbal and written

Benefits:

  • Health & Wellbeing
  • Personal & Professional Development
  • Unconditional Inclusion