Site Reliability Engineer
Posted 7hrs ago
Job Description
As a Site Reliability Engineer at ZigZag, you will design and maintain infrastructure and automation solutions, and collaborate with teams to ensure platform reliability and operational efficiency.
Responsibilities:
- Design, build, and maintain scalable and reliable infrastructure and platform services
- Develop and maintain infrastructure-as-code (e.g., CloudFormation, Terraform)
- Develop custom automation workflows and internal tools to support infrastructure provisioning, monitoring, and incident response
- Monitor system performance, availability, and capacity using observability tools (e.g., Sumo Logic, AWS CloudWatch)
- Create and maintain dashboards and monitoring solutions that offer deep insight into platform health and support rapid incident diagnosis
- Automate operational processes (e.g., deployments, failovers, scaling) to reduce toil and enhance system resilience
- Participate in incident response activities, including postmortems and root cause analysis, to drive continual improvement
- Continuously evolve and maintain SLOs and SLIs, ensuring a balance between development velocity and system reliability
- Design and implement robust CI/CD pipelines and zero-downtime deployment strategies
- Collaborate with engineering teams to embed reliability, scalability, performance, and security best practices into the SDLC
Requirements:
- 2+ years of experience in an SRE role or similar (e.g., DevOps Engineer)
- Experience managing an AWS environment and working in a SaaS business
- Strong knowledge of, and experience with, infrastructure-as-code
- Experience building and supporting robust CI/CD pipelines
- Strong problem-solving and analytical skills
- Excellent communication and collaboration skills
- Ability to work in a fast-paced, agile environment
- Experience with Buildkite
- Experience with distributed systems and microservice architecture
- Exposure to compliance frameworks (PCI DSS, ISO 27001)
Benefits:
- Health insurance
- Paid time off
- Professional development opportunities

















