Senior Machine Learning Operations Engineer
Posted 10hrs ago
Job Description
Mercury is hiring a Senior Machine Learning Operations Engineer to build real-time inference systems for risk decisioning. The role leads deployment infrastructure and ensures high availability for ML models in production.
Responsibilities:
- Build and operate the real-time inference service that serves model scores to the risk decision engine, with low latency and high availability as first-class requirements
- Own model deployment infrastructure — registry and versioning, CI/CD with performance, bias, and consistency checks, shadow mode, and staged rollouts
- Build model observability: availability, latency, and error monitoring, plus drift detection as a retraining trigger
- Partner with Risk Data Science to take models from a clean development-to-production handoff through to production operation under MLP ownership
- Implement experimentation capabilities such as champion/challenger and canary routing, and explainability outputs like SHAP attributions
- Feel a strong sense of product ownership and actively seek responsibility — we self-organize on small and medium projects, and we want someone excited to help shape and build a brand-new platform team
Requirements:
- 5+ years in machine learning engineering, backend software engineering, MLOps, or a closely related field
- Production ML service experience — deploying, serving, and operating models in low-latency, high-availability contexts
- Strong backend engineering fundamentals in Python, with API frameworks like FastAPI or Flask
- Experience with model deployment and lifecycle tooling: model registries, CI/CD for models, versioning, and staged rollout patterns (shadow, canary, champion/challenger)
- Experience building observability and alerting for production services — latency, errors, and ideally model-specific signals like drift
- Comfort with the data layer ML depends on: SQL, key-value/low-latency stores (Redis, DynamoDB, or equivalent), and streaming pipelines (Kafka, Kinesis, Redpanda, or equivalent)
Benefits:
- Competitive salary
- Equity
- Health insurance plans
- Paid time off
- Remote work options