Red Team Manager, Training, Quality, Roleplay Excellence

Posted 81 days ago

Job Description

The Red Team Manager leads evaluations and training for advanced AI systems at mpathic, focusing on the safety, reliability, and quality of adversarial roleplays.

Responsibilities:

  • Train & Lead Red Team Reviewers
      • Onboard new Red Team reviewers and run recurring calibration sessions to align on quality standards.
      • Set expectations and maintain consistency across reviewers for evaluation depth, writing quality, and reproducibility.
      • Build workflows for review (sampling, escalation, dispute resolution, feedback loops).
  • Train Experts on Roleplays, Model Behavior & Harm
      • Train red team experts on how to roleplay realistic user scenarios, including vulnerable users, without sensationalism.
      • Teach systematic adversarial techniques (prompt escalation, persistence strategies, boundary probing).
      • Help experts understand model failure modes: policy boundary drift, refusal weaknesses, hallucinations, unsafe compliance, and tone failures.
  • Create Training Materials & Resources
      • Build and maintain red team playbooks and rubrics; example libraries (“gold standard” roleplays + evaluations); a defect taxonomy (what counts as a meaningful finding vs noise); and brief modules for domain harm areas (self-harm, minors, extremism, medical, fraud, harassment, etc.).
      • Write clear guidance that enables new hires to become productive quickly.
  • Review & Evaluate Vulnerable User Roleplays
      • Review vulnerable-user roleplays produced by experts for realism, safety relevance, and correct targeting of failure modes.
      • Ensure roleplays are behaviorally plausible, ethically framed, actionable for model improvement, and consistent with internal policies and customer expectations.
  • Create Vulnerable User Roleplays
      • Personally produce high-quality vulnerable-user roleplays, including ambiguous edge cases, multi-turn scenarios, culturally nuanced or emotionally realistic interactions, and scenarios that stress safety, tone, and reliability.
  • Review Hiring Applicants
      • Own parts of the hiring loop for red team experts and reviewers: design work samples, evaluate candidate submissions, and provide structured feedback and hiring recommendations.
      • Help build a scalable standard for what “great” looks like in this role.

Requirements:

  • 4+ years in trust & safety, AI evaluation, red teaming, security testing, content integrity, or similar applied roles.
  • Strong experience building training programs, rubrics, or QA frameworks for human judgment work.
  • Ability to evaluate roleplays and adversarial scenarios with consistency and high signal-to-noise.
  • Excellent written communication—clear, structured, and test-case oriented.
  • Experience leading or mentoring teams in fast-moving environments.
  • Experience red teaming LLMs, agentic systems, or tool-using models (prompt injection, data exfiltration, policy probing).
  • Familiarity with evaluation methods: gold sets, inter-rater reliability (or strong proxy measurement instincts), sampling strategies, and quality gates.
  • Background in one or more harm domains (self-harm, medical, violence, fraud, extremism, harassment).
  • Experience scaling an operational team and improving productivity without quality loss.

Benefits:

  • Health insurance
  • Professional development