Senior Infrastructure Engineer – AI/ML

Posted 114 days ago

Job Description

As a Senior Infrastructure Engineer specializing in AI/ML infrastructure at Mitratech, you will design and optimize AWS environments for complex workloads and collaborate with ML teams to improve model performance and security practices.

Responsibilities:

  • Design, deploy, and maintain scalable and secure infrastructure supporting AI and ML workloads.
  • Build and maintain AWS cloud environments for compute (EC2, ECS/EKS, Lambda), storage (S3, EFS, FSx), and networking (VPC, Transit Gateway, PrivateLink, Route 53, load balancers).
  • Implement security best practices using IAM, KMS, Secrets Manager, GuardDuty, and Security Hub.
  • Support and optimize AI/ML workloads across AWS services (SageMaker, Bedrock, Batch, Step Functions).
  • Develop and maintain Infrastructure as Code (IaC) using Terraform, AWS CDK, and CloudFormation.
  • Manage containerized workloads and orchestration platforms (Docker, EKS, Fargate), including GPU scheduling and scaling.
  • Set up and maintain monitoring and observability frameworks using CloudWatch and OpenTelemetry.
  • Build and manage CI/CD pipelines (CircleCI, GitHub Actions, GitLab CI) for infrastructure automation and ML/Gen AI deployments.
  • Collaborate with ML and Generative AI teams to scale models, optimize performance, and design efficient prompt or inference pipelines.
  • Develop runbooks and SOPs for AI service deployment, troubleshooting, and performance optimization.
  • Ensure security, compliance, and data protection across AI datasets and environments.

Requirements:

  • Strong proficiency in Linux administration and systems-level troubleshooting.
  • Deep expertise in AWS cloud services, with experience in compute, storage, networking, and security domains.
  • Proficiency in container orchestration (Kubernetes/EKS) and infrastructure automation tools.
  • Hands-on experience with IaC tools such as Terraform, AWS CDK, or CloudFormation.
  • Familiarity with monitoring, logging, and observability stacks (Prometheus, Grafana, OpenTelemetry).
  • Experience implementing CI/CD pipelines for automated deployment and testing.
  • Understanding of AI/ML concepts, including model deployment, inference scaling, and LLM performance tuning.
  • Working knowledge of security best practices, IAM roles, encryption, and compliance controls.
  • Excellent collaboration and communication skills to partner with ML engineers, data scientists, and product teams.

Benefits:

  • Equal-opportunity employer that values diversity at all levels
  • Professional development opportunities