Cloud Infrastructure Engineer
Posted 4 days ago
Job Description
The Senior Cloud Infrastructure Engineer drives operational excellence as part of the Infrastructure team at a tech company specializing in genomics workflows. This is a hands-on role focused on operational tasks, SRE work, and automation.
Responsibilities:
- Monitor, respond to, and resolve production incidents and operational issues
- Manage routine operational tasks that currently consume team capacity (deployments, configuration changes, access management, maintenance windows)
- Participate in the on-call rotation: Share responsibility for after-hours production support
- Work with support and development teams to troubleshoot and resolve platform issues
- Execute and validate infrastructure changes in production environments
- Update and improve documentation for operational procedures
- Handle patches, upgrades, certificate renewals, and other recurring operational tasks
- Monitor system health, respond to alerts, and maintain SLAs
- Spot repetitive manual work and build automation to eliminate it
- Create scripts, utilities, and self-service tools to reduce operational burden
- Improve observability to catch issues before they become incidents
- Reduce friction and manual steps in release and deployment workflows
- Enable developers to handle routine tasks without infrastructure team involvement
- Convert manual procedures into automated, repeatable infrastructure code (Terraform)
- Leave behind improved runbooks, automation, and processes
- Help quantify operational burden and demonstrate reduction over time
- Free up permanent team members for strategic initiatives
- Support development teams' infrastructure needs and unblock their work
- Capture operational knowledge and procedures that exist only in people's heads
- Provide clear documentation and knowledge transfer for systems and automation you build
- Participate in team rituals: Standups, retrospectives, and planning to stay aligned with team priorities
Requirements:
- 5-8 years of experience in infrastructure, platform, SRE, or DevOps engineering
- Strong operational background: Experience managing production systems and handling incidents
- Proven toil reduction skills: Track record of identifying repetitive work and automating it away
- Strong expertise with cloud infrastructure (AWS strongly preferred)
- Proficiency with infrastructure-as-code (Terraform required)
- Experience with container orchestration (Kubernetes, Nomad, or similar)
- Experience with service mesh and service discovery (Consul, Istio, or similar)
- Experience with secrets management (Vault, Secrets Manager, or similar)
- Strong understanding of monitoring, alerting, and observability
- Comfortable with on-call work: Experience with incident response and production support
- Proven ability to onboard quickly and become productive in new environments
- Strong troubleshooting skills: Can diagnose complex system issues under pressure
- Self-directed work style: Minimal supervision required for operational work
- Bias for automation: Natural instinct to eliminate manual work