Cloud Infrastructure Engineer
Posted 48 days ago
Job Description
Infrastructure Engineer at JetBrains, designing and maintaining cloud-native platforms for AI applications, with a focus on scalability and automated operations on Google Cloud Platform.
Responsibilities:
- Design, implement, and manage the core infrastructure powering Kineto's platform on Google Cloud Platform (GCP), including networking, security, and identity management.
- Build and operate resilient, highly available distributed systems using Kubernetes (GKE), Knative, Istio, and related cloud-native technologies.
- Automate the entire infrastructure lifecycle with infrastructure as code (IaC), using Terraform and Terragrunt to ensure secure, reproducible, and auditable environments.
- Implement and maintain CI/CD pipelines (e.g. GitHub Actions and TeamCity) and deployment tools like Flux and Helm for GitOps-driven application delivery.
- Optimize and manage the multi-tenant data layer on Postgres and Neon, focusing on robust tenant isolation, performance, backups, and safe schema management.
- Drive site reliability engineering (SRE) practices, including monitoring, alerting (Prometheus, Grafana), logging (Loki), and incident response.
- Solve complex operational challenges, such as optimizing scale-to-zero for cost efficiency, minimizing cold starts, improving autoscaling behavior, and managing queue backpressure.
- Implement platform-wide performance tuning (e.g. container resource limits, distributed locks, caching strategies, and GC configurations).
- Ensure platform security and compliance by implementing best practices for secrets management, network segmentation, and vulnerability scanning.
- Own major infrastructure roadmap items, including multi-region deployments, disaster recovery planning, advanced tenancy separation, and ephemeral preview environments.
- Champion DevOps and SRE principles across the engineering team, mentoring engineers on cloud-native best practices, operational readiness, and debugging complex distributed systems.
- Collaborate with product and engineering teams to define the long-term vision for the platform's architecture and operational model.
Requirements:
- Have five or more years of experience building and operating large-scale, commercial cloud-native infrastructure, with a strong focus on DevOps/SRE practices.
- Possess deep, hands-on expertise with GCP (or AWS/Azure) and Kubernetes administration and operations (GKE experience is a strong plus).
- Are proficient with infrastructure-as-code (IaC) tools, particularly Terraform, for managing complex environments.
- Have a solid understanding of Linux internals, networking (CNI and service mesh), security, and distributed system design.
- Are familiar with CI/CD tools, GitOps (e.g. Flux), monitoring stacks (Prometheus/Grafana), and logging systems.
- Thrive in cross-functional teams and excel at communicating complex infrastructure ideas clearly.
Benefits:
- Professional development opportunities
- Remote work options