Red Team Manager, Training, Quality, Roleplay Excellence

Posted 126 days ago

Job Description

The Red Team Manager leads evaluations and training for advanced AI systems at mpathic, focusing on the safety, reliability, and quality of adversarial roleplays.

Responsibilities:

Train & Lead Red Team Reviewers
Onboard new Red Team reviewers and run recurring calibration sessions to align on quality standards.
Set expectations and maintain consistency across reviewers for evaluation depth, writing quality, and reproducibility.
Build workflows for review (sampling, escalation, dispute resolution, feedback loops).
Train Experts on Roleplays, Model Behavior & Harm
Train red team experts on how to roleplay realistic user scenarios—including vulnerable users—without sensationalism.
Teach systematic adversarial techniques (prompt escalation, persistence strategies, boundary probing).
Help experts understand model failure modes: policy boundary drift, refusal weaknesses, hallucinations, unsafe compliance, and tone failures.
Create Training Materials & Resources
Build and maintain red team playbooks and rubrics, example libraries (“gold standard” roleplays and evaluations), a defect taxonomy (what counts as a meaningful finding versus noise), and brief modules for domain harm areas (self-harm, minors, extremism, medical, fraud, harassment, etc.).
Write clear guidance that enables new hires to become productive quickly.
Review & Evaluate Vulnerable User Roleplays
Review vulnerable-user roleplays produced by experts for realism, safety relevance, and correct targeting of failure modes.
Ensure roleplays are behaviorally plausible, ethically framed, actionable for model improvement, and consistent with internal policies and customer expectations.
Create Vulnerable User Roleplays
Personally produce high-quality vulnerable-user roleplays, including ambiguous edge cases, multi-turn scenarios, culturally nuanced or emotionally realistic interactions, and scenarios that stress safety, tone, and reliability.
Review Hiring Applicants
Own parts of the hiring loop for red team experts and reviewers: design work samples, evaluate candidate submissions, and provide structured feedback and hiring recommendations.
Help build a scalable standard for what “great” looks like in this role.

Requirements:

4+ years in trust & safety, AI evaluation, red teaming, security testing, content integrity, or similar applied roles.
Strong experience building training programs, rubrics, or QA frameworks for human judgment work.
Ability to evaluate roleplays and adversarial scenarios with consistency and high signal-to-noise.
Excellent written communication—clear, structured, and test-case oriented.
Experience leading or mentoring teams in fast-moving environments.
Experience red teaming LLMs, agentic systems, or tool-using models (prompt injection, data exfiltration, policy probing).
Familiarity with evaluation methods: gold sets, inter-rater reliability (or strong proxy measurement instincts), sampling strategies, and quality gates.
Background in one or more harm domains (self-harm, medical, violence, fraud, extremism, harassment).
Experience scaling an operational team and improving productivity without quality loss.

Benefits:

Health insurance
Professional development
