Senior Site Reliability Engineer
Posted 1 hour ago
Job Description
As a Senior Site Reliability Engineer, you will enhance ScalePad's multi-cloud platform and developer experience. You will run infrastructure operations across AWS and Azure while mentoring fellow engineers.
Responsibilities:
- Operate production infrastructure across AWS and Azure, including networking, IAM, and cost management.
- Build and maintain Terraform modules and manage state at scale, keeping our infrastructure as code clean and reviewable.
- Run Kubernetes in production: upgrades, scaling, troubleshooting, and platform improvements.
- Operate and improve CI/CD pipelines that the entire engineering org depends on.
- Operationalize SLO/SLI frameworks and observability practices alongside the SRE team.
- Drive incident response practice, on-call tooling, and incident review follow-through.
- Reduce operational toil through automation across secret rotation, access management, and environment provisioning.
- Contribute to capacity planning, disaster recovery, and resilience work across critical systems.
- Build and maintain internal developer tooling that removes friction across engineering.
- Lead rollouts of AI-native tooling for code review, testing, and engineering productivity.
- Own migrations and consolidation of internal platforms such as Jira, Confluence, ticketing, and documentation systems.
- Mentor engineers and technical leads, fostering growth and knowledge-sharing within the organization.
- Evaluate and introduce new technologies, tools, and approaches to improve scalability and efficiency.
Requirements:
- 5+ years of experience in software engineering, infrastructure, or related technical disciplines, with a focus on Site Reliability Engineering (SRE), DevOps, Platform Engineering, or similar roles.
- Strong expertise in cloud infrastructure, distributed systems, networking, and observability practices.
- Experience designing and operating highly available, scalable production systems.
- Deep understanding of scripting, automation, infrastructure as code, CI/CD, and operational best practices.
- Experience implementing SLO/SLI frameworks and reliability engineering methodologies.
- Incident management, troubleshooting, and on-call experience in complex production environments.
- Passion for mentoring engineers and improving engineering culture.
Benefits:
- Share in our success through our Employee Stock Ownership Plan (ESOP) and RRSP matching.
- Parental leave programs are in place to support you and your family when it matters most.
- Join opt-in mentorship programs and learn directly from founders and senior leaders.
- Access an annual professional development budget to level up your skills, your career, and your impact.
- Work with brand new, top-of-the-line hardware and equipment.
- Receive a monthly stipend to help you create an effective hybrid or remote work environment.
- Take care of yourself with 100% employer-paid benefits.