AI Engineer, Quality – Evals

Posted 83 days ago

Job Description

AI Engineer developing evaluation platforms and pipelines for AI agents at a fintech startup. Collaborates across teams to ensure reliable AI performance and product quality.

Responsibilities:

Design and build a unified evaluation platform that serves as the single source of truth for all of our agentic systems and audit workflows
Build observability systems that surface agent behavior, execution traces, and failure modes in production, along with feedback loops that turn production failures into first-class evaluation cases
Own the evaluation infrastructure stack, including integration with LangSmith and LangGraph
Translate customer problems into concrete agent behaviors and workflows
Integrate and orchestrate LLMs, tools, retrieval systems, and logic into cohesive, reliable agent experiences
Build automated pipelines that evaluate new models against all critical workflows within hours of release
Design evaluation harnesses for our most complex agentic systems and workflows
Implement comparison frameworks that measure effectiveness, consistency, latency, and cost across model versions
Design guardrails and monitoring systems that catch quality regressions before they reach customers
Use AI as core leverage in how you design, build, test, and iterate
Prototype quickly to resolve uncertainty, then harden systems for enterprise-grade reliability
Build evaluations, feedback mechanisms, and guardrails so agents improve over time
Work with SMEs and ML engineers to create evaluation datasets by curating production traces
Design prompts, retrieval pipelines, and agent orchestration systems that perform reliably at scale
Define and document evaluation standards, best practices, and processes for the engineering organization
Advocate for evaluation-driven development and make it easy for the team to write and run evals
Partner with product and ML engineers to integrate evaluation requirements into agent development from day one
Take full ownership of large product areas rather than executing on narrow tasks

Requirements:

Multiple years of experience shipping production software in complex, real-world systems
Experience with TypeScript, React, Python, and Postgres
Built and deployed LLM-powered features serving production traffic
Implemented evaluation frameworks for model outputs and agent behaviors
Designed observability or tracing infrastructure for AI/ML systems
Worked with vector databases, embedding models, and RAG architectures
Experience with evaluation platforms (LangSmith, Langfuse, or similar)
Comfort operating in ambiguity and taking responsibility for outcomes
Deep empathy for professional-grade, mission-critical software (experience with audit and accounting workflows is not required)

Benefits:

Competitive compensation packages with meaningful ownership
Flexible PTO
401k
Wellness benefits, including a bundle of free therapy sessions
Technology & Work from Home reimbursement
Flexible work schedules

Fieldguide