Senior Machine Learning Operations Engineer

Posted 10hrs ago

Job Description

As a Senior Machine Learning Operations Engineer at Mercury, you will build real-time inference systems for risk decisioning, lead the model deployment infrastructure, and ensure high availability for ML models in production.

Responsibilities:

  • Build and operate the real-time inference service that serves model scores to the risk decision engine, with low latency and high availability as first-class requirements
  • Own model deployment infrastructure — registry and versioning, CI/CD with performance, bias, and consistency checks, shadow mode, and staged rollouts
  • Build model observability: availability, latency, and error monitoring, plus drift detection as a retraining trigger
  • Partner with Risk Data Science to take models from a clean development-to-production handoff through to production operation under MLP ownership
  • Implement experimentation capabilities such as champion/challenger and canary routing, and explainability outputs like SHAP attributions
  • Feel a strong sense of product ownership and actively seek responsibility — we self-organize on small and medium projects, and we want someone excited to help shape and build a brand-new platform team

Requirements:

  • 5+ years in machine learning engineering, backend software engineering, MLOps, or a closely related field
  • Production ML service experience — deploying, serving, and operating models in low-latency, high-availability contexts
  • Strong backend engineering fundamentals in Python, with API frameworks like FastAPI or Flask
  • Experience with model deployment and lifecycle tooling: model registries, CI/CD for models, versioning, and staged rollout patterns (shadow, canary, champion/challenger)
  • Experience building observability and alerting for production services — latency, errors, and ideally model-specific signals like drift
  • Comfort with the data layer ML depends on: SQL, key-value/low-latency stores (Redis, DynamoDB, or equivalent), and streaming pipelines (Kafka, Kinesis, Redpanda, or equivalent)

Benefits:

  • Competitive salary
  • Equity
  • Health insurance plans
  • Paid time off
  • Remote work options