Staff Research Scientist – Reinforcement Learning
Posted 1 hour ago
Job Description
Centific is seeking a Staff Research Scientist to lead work on simulation environments and post-training AI solutions. The role collaborates with data scientists and engineers to produce high-quality AI applications for enterprise workflows.
Responsibilities:
- Design simulation environments and digital twins for enterprise workflows
- Post-train LLM agents using RLHF, DPO, GRPO, PPO, and emerging methods
- Build pipelines that convert human-labeled traces and verifiable signals into training data
- Architect multi-turn, tool-using agents with closed learning loops
- Design reward functions and verifiers that resist reward hacking and reflect real task outcomes
- Set the technical bar across the team — architecture, code review, engineering standards
- Mentor researchers and engineers; drive technical direction through influence
- Translate research into production; contribute to publications
Requirements:
- 7+ years in ML/AI research or engineering; 3+ years at senior/staff level
- MS or PhD in Computer Science, Machine Learning, or related field (or equivalent)
- 5+ years of hands-on RL (environment design, reward engineering, policy optimization), with at least one production deployment
- 3+ years fine-tuning LLMs with hands-on RL post-training (RLHF, DPO, GRPO, PPO)
- Expert-level implementation of RLHF pipelines, reward modeling (Bradley-Terry), DPO, and KTO
- Strong Python and software engineering skills — comfortable building production pipelines, not just notebooks
- Deep expertise in MDPs, policy gradient methods (PPO, SAC), and temporal difference learning
- Working knowledge of modern post-training and rollout-serving libraries (TRL, veRL, OpenRLHF, SkyRL)
Benefits:
- Health insurance
- 401(k) matching
- Flexible work hours
- Paid time off
- Remote work options



















