Senior Site Reliability Engineer
Posted 2hrs ago
Job Description
As a Senior Site Reliability Engineer at Kraken, you will manage infrastructure for Payward Services, collaborate with development teams, and improve operational excellence in a fully remote setup.
Responsibilities:
- Manage and support infrastructure for Payward Services, including Nomad, Kubernetes, databases, and third-party system integrations
- Provide operational support across multiple teams, helping debug issues in staging and production environments
- Participate in incident response and post-incident reviews to improve system resilience
- Consult with teams on performance, monitoring, and alerting best practices, with awareness of partner-facing SLA commitments
- Build tooling, automation, and dashboards to improve observability and empower development teams
- Maintain and troubleshoot CI pipelines, ensuring reliable and fast build, test, and deployment cycles
- Collaborate with developers, QA, and product managers to streamline development and release cycles
- Support a fully distributed team operating across multiple timezones
Requirements:
- 5+ years in a DevOps or SRE role
- Proficiency with hybrid-cloud infrastructure environments
- Proficiency with Git version control and CI/CD configuration
- Deep understanding of monitoring and alerting systems, preferably Prometheus and Grafana
- Ability to debug complex issues in distributed systems, networks, and Linux operating systems
- Containerization and orchestration experience (Docker, Nomad, Kubernetes a plus)
- Strong scripting skills (Bash, Python, or Go)
- Self-starter capable of thriving independently and remotely in fast-paced environments
Benefits:
- Kraken Culture page
- Global team
- Flexible remote work arrangements



















