T3 Operations & Support Specialist – Compute & OS

Posted 2hrs ago

Job Description

T3 Operations & Support Specialist for a cloud-native platform supporting a major energy transmission operator in Germany. The role is responsible for Compute & OS services within Local Production, handling complex incidents and ensuring compute/OS readiness for releases and changes.

Responsibilities:

  • Providing T3 operational ownership for Compute & OS services: handling complex incidents, troubleshooting and RCA, and driving permanent fixes and preventive measures
  • Ensuring compute/OS readiness for releases and changes: monitoring/alerting coverage, performance baselines, hardening, patch strategy, rollback and recovery procedures, and runbooks
  • Executing and improving standard operational procedures through automation to reduce toil and improve MTTR and stability
  • Coordinating with Kubernetes, Data, Network and Storage SMEs to resolve cross-domain production issues
  • Validating deployment artefacts from an operations perspective and enforcing quality assurance measures
  • Monitoring system health, performance metrics and service availability across multi-tenant environments
  • Identifying, analysing and resolving incidents to minimise service disruption, and triggering RCA and corrective actions
  • Implementing monitoring and logging strategies to support audit and compliance requirements
  • Performing routine security scans and remediating identified vulnerabilities

Requirements:

  • 5 to 10+ years of experience in IT operations, service delivery, or platform operations
  • Proven experience implementing and leading Incident, Problem, Change and Release governance in production
  • Hands-on experience with VMware virtualisation (vSphere 8)
  • Operating Systems: Red Hat Enterprise Linux and Ubuntu
  • OS tooling: Satellite, IPA, Certificate Server
  • ITSM/collaboration tooling: Jira Service Management, Jira, Confluence
  • Fundamental understanding of core operations processes (Incident, Change, Problem management, ITSM) and SRE concepts
  • Experience gathering operational insights from monitoring/observability including SLI/SLA/SLO management and tracking
  • Hands-on experience documenting procedures and enforcing clear runbooks and playbooks
  • Hands-on experience with monitoring and logging tools (e.g. Prometheus, Grafana, Datadog, Mimir, Loki)
  • Understanding of modern platform operations (Kubernetes/containers, automation, observability) sufficient to govern specialists
  • Fluent English and German (C1 minimum in both)

Benefits:

  • Flexible working hours
  • Freedom to choose projects
  • Access to exciting projects in various industries
  • Support in advancing your career
  • Competitive pay
  • Dedicated team for assistance