AI Platform Engineer – ML Ops
Posted 97 days ago
Job Description
Toku is hiring an AI Platform Engineer to manage MLOps pipelines and deploy AI systems, focusing on cloud-native AI workloads with strong collaboration across engineering teams.
Responsibilities:
- Design, improve, and operate MLOps pipelines for training, deploying, and managing ML models in production.
- Build and maintain CI/CD-style workflows for model packaging, versioning, and deployment across environments.
- Operate and optimise AWS-based infrastructure for AI workloads, including compute, storage, and networking components.
- Manage GPU-enabled workloads, addressing scalability, reliability, and cost-efficiency for high-load AI applications.
- Implement monitoring and alerting for deployed models, focusing on system health, performance, and operational stability.
- Own and evolve shared tooling such as MLflow, Docker-based workflows, and deployment frameworks to improve developer productivity.
- Work closely with infrastructure, SRE, and engineering teams to align AI platform practices with broader system standards.
- Support live AI services by diagnosing deployment, scaling, and infrastructure-related issues impacting AI features.
- Ensure reproducibility, traceability, and governance across the full ML lifecycle, from experimentation to production.
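To give a flavour of the model-versioning and lifecycle duties above: tools such as MLflow provide a model registry where each trained model gets an immutable version and can be promoted through stages. The following is a toy, stdlib-only sketch of that pattern (hypothetical names throughout; this is not the MLflow API):

```python
"""Toy model registry illustrating versioning and stage promotion.
Hypothetical sketch of the pattern MLflow's registry implements;
none of these names come from MLflow itself."""

from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: int
    params: dict   # training hyperparameters, for reproducibility
    metrics: dict  # evaluation results recorded at registration time
    stage: str = "None"  # lifecycle: None -> Staging -> Production


@dataclass
class ModelRegistry:
    # model name -> ordered list of registered versions
    versions: dict = field(default_factory=dict)

    def register(self, name: str, params: dict, metrics: dict) -> ModelVersion:
        """Record a new immutable version with its params and metrics."""
        existing = self.versions.setdefault(name, [])
        mv = ModelVersion(version=len(existing) + 1, params=params, metrics=metrics)
        existing.append(mv)
        return mv

    def promote(self, name: str, version: int, stage: str) -> ModelVersion:
        """Move a version into a stage; archive whatever held it before."""
        for mv in self.versions[name]:
            if mv.stage == stage:
                mv.stage = "Archived"
        target = self.versions[name][version - 1]
        target.stage = stage
        return target


registry = ModelRegistry()
registry.register("churn-model", {"lr": 0.01}, {"auc": 0.81})
registry.register("churn-model", {"lr": 0.005}, {"auc": 0.84})
prod = registry.promote("churn-model", 2, "Production")
```

In a real deployment this state would live in a tracking server backed by durable storage, with each version tied to a specific artifact, git commit, and dataset snapshot so any production model can be traced back to its experiment.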
Requirements:
- Hands-on experience building and operating MLOps pipelines for production ML systems.
- Strong experience with AWS services used for AI workloads, including EC2, ECS, and SageMaker.
- Practical experience with Docker and container-based deployment of ML workloads.
- Experience with MLflow or similar tools for experiment tracking, model versioning, and lifecycle management.
- Experience managing GPU-based workloads and addressing performance and cost challenges at scale.
- Strong understanding of cloud infrastructure concepts as they apply to ML systems.
- Ability to work with Python-based ML codebases to support deployment and lifecycle needs.
- Working familiarity with LLMs, NLP models, and applied ML concepts sufficient to support deployment and monitoring (without owning core model development).
- Proven experience supporting live, production ML systems with real customer impact.
- Ability to work cross-functionally with applied AI engineers, backend engineers, and infra teams.
Benefits:
- Training and Development
- Discretionary Yearly Bonus & Salary Review
- Healthcare Coverage based on location
- 20 days Paid Annual Leave (excluding Bank holidays)