Principal Site Reliability Engineer – Platform Tribe
Posted 65 days ago
Job Description
The Principal Site Reliability Engineer will manage infrastructure and the game experience for a leading iGaming supplier, with a focus on minimizing latency and providing 24x7 support for critical SaaS events.
Responsibilities:
- Manage day-to-day alerts and system checks, escalating issues as necessary.
- Provide 24x7 on-call support for critical SaaS events.
- Document issues and remediation steps.
- Proactively create monitors within the EKS/K8s ecosystem.
- Deploy to EKS/K8s clusters using Terraform and Helm/Flux.
- Enhance infrastructure health by implementing checks and scripts to address known issues.
- Maintain and develop deployment code.
- Implement and integrate new technologies into our cloud infrastructure.
- Collaborate with other teams to provide top-notch support and assistance.
- Put customers first when planning deployments and updates, ensuring minimal impact.
- Conduct root cause analysis (RCA) and take corrective action to prevent recurrence.
- Assign alert-related actions to the appropriate team after investigation.
- Handle support requests for environment-specific actions.
Requirements:
- Proficiency in Kubernetes (deployment, scaling, troubleshooting)
- Experience with GitOps continuous delivery tools such as FluxCD or ArgoCD
- Strong experience with incident analysis (RCA, postmortems)
- Familiarity with AWS, Terraform, Docker, CI/CD
- Experience with monitoring tools such as Datadog, Prometheus, and Grafana, and with logging solutions such as the ELK Stack (Elasticsearch, Logstash, Kibana) or AWS CloudWatch
- Strong understanding of networking concepts and protocols
- Proficiency in at least one scripting or programming language (e.g., Python, Node.js, Go)
- Proficiency in Git or another version control system
- Familiarity with incident response and management tools like PagerDuty, Opsgenie, or VictorOps
- Ownership, proactiveness, persistence, and passion for maintaining a high-traffic online platform.
Benefits:
- Competitive salary with annual performance and salary reviews
- Realistic and transparent bonus system (15-20%), paid quarterly
- Unlimited paid vacation leave & paid sick leave
- Flexible work schedule to accommodate your needs
- 100% Remote
- Medical insurance for you plus one additional person
- Financial Support for Life Events & Extended Parental Leave
- Paid professional development courses and trainings
- B2B contracts