CD Operations Engineer

Posted 53 mins ago

Job Description

As a Site Reliability Engineer, you will manage and scale a production Kubernetes platform for innovative companies, with a focus on automation, CI/CD pipelines, and operational excellence.

Responsibilities:

  • Maintain and optimise CI/CD pipelines to ensure deployment readiness and validate all deployment artifacts from an operational perspective.
  • Define and enforce quality assurance measures, including standard operating procedures and successful test reporting.
  • Implement rollback strategies and comprehensive operational monitoring for all production deployments.
  • Handle monitoring as well as incident, problem, and change management within a multi-tenant managed Kubernetes environment.
  • Monitor system health, performance metrics, and service availability, resolving incidents to minimise service disruption.
  • Perform root cause analysis and implement corrective and preventive actions to enhance platform stability.
  • Automate recurring operational tasks and critical processes to reduce toil and improve service reliability.
  • Validate automated procedures through the full software development lifecycle, including staging and testing.
  • Implement logging and monitoring strategies to adhere to security and audit compliance standards.
  • Conduct routine security scans and remediate vulnerabilities across the platform.

Requirements:

  • Professional proficiency in both English and German (C1 level minimum)
  • At least 3 years of hands-on operational experience with self-managed Kubernetes clusters and production applications in on-premises environments
  • Deep understanding of networking concepts, including protocols, load balancing, and security
  • Extensive experience with CI/CD processes and tooling, such as GitLab, Jenkins, Tekton, or ArgoCD
  • Fundamental understanding of core operations processes including incident, change, and problem management (ITSM) alongside SRE concepts
  • Experience gathering operational insights from monitoring and observability tools, including managing SLIs, SLOs, and SLAs
  • Proven ability to document procedures and enforce clear runbooks or playbooks
  • Practical experience with monitoring and logging stacks such as Prometheus, Grafana, Mimir, or Loki

Benefits:

  • Flexible working hours
  • Freedom to choose your own projects
  • Access to exciting projects in various industries
  • Competitive pay
  • Dedicated team support