Staff Research Scientist – Reinforcement Learning

Job Description

Centific is seeking a Staff Research Scientist to lead its work on simulation environments and post-training AI solutions. You will collaborate with data scientists and engineers to build high-quality AI applications for enterprise workflows.

Responsibilities:

  • Design simulation environments and digital twins for enterprise workflows
  • Post-train LLM agents using RLHF, DPO, GRPO, PPO, and emerging methods
  • Build pipelines that convert human-labeled traces and verifiable signals into training data
  • Architect multi-turn, tool-using agents with closed learning loops
  • Design reward functions and verifiers that resist reward hacking and reflect real task outcomes
  • Set the technical bar across the team — architecture, code review, engineering standards
  • Mentor researchers and engineers; drive technical direction through influence
  • Translate research into production; contribute to publications

Requirements:

  • 7+ years in ML/AI research or engineering; 3+ years at senior/staff level
  • MS or PhD in Computer Science, Machine Learning, or related field (or equivalent)
  • 5+ years of hands-on RL experience (environment design, reward engineering, policy optimization), including at least one production deployment
  • 3+ years fine-tuning LLMs with hands-on RL post-training (RLHF, DPO, GRPO, PPO)
  • Expert-level experience implementing RLHF pipelines, reward models (Bradley-Terry), DPO, and KTO
  • Strong Python and software engineering skills — comfortable building production pipelines, not just notebooks
  • Deep expertise in MDPs, policy gradient methods (PPO, SAC), and temporal difference learning
  • Working knowledge of modern post-training and rollout-serving libraries (TRL, veRL, OpenRLHF, SkyRL)

Benefits:

  • Health insurance
  • 401(k) matching
  • Flexible work hours
  • Paid time off
  • Remote work options