Senior Software Engineer – AI Evaluation & Benchmarks, Python

Posted 4hrs ago

Job Description

Remote Senior Software Engineer role building the benchmarks and pipelines used to evaluate AI models. Requires expert Python and at least four years of professional software engineering experience.

Responsibilities:

Design and build the coding benchmarks and evaluation pipelines used to test frontier AI models on real software engineering work:
  • Design coding benchmarks that evaluate frontier models on real-world programming tasks — reasoning, debugging, and production-quality code
  • Build and maintain scalable data pipelines for evaluation workflows
  • Analyze model-generated code for correctness, reliability, and edge-case failures
  • Construct structured evaluation scenarios across large repos and multi-language environments
  • Provide detailed technical feedback on model performance and failure patterns
  • Contribute to evaluation frameworks that set the bar for how coding ability is measured
  • End result: benchmarks that meaningfully separate what frontier models can and can't do, and that shape how the next generation is trained and improved
  • AI coding evaluation in one line: Design task → build harness → run model → analyze failures → feed findings back into the benchmark → evaluations that actually distinguish strong models from weak ones.

Requirements:

  • 4+ years of professional software engineering experience (non-negotiable)
  • Expert Python — clean, performant, well-tested code
  • Hands-on experience working in large, complex codebases
  • Proven experience designing and implementing LLM coding benchmarks and evaluation data pipelines
  • Strong command of Git and modern development workflows
  • Track record at a high-growth tech company or top-tier software organization
  • Strong written English communication
  • Identity verification: applicants must verify their identity and hold valid documentation to work as an independent contractor

Benefits:

  • Identity verification is required for independent contractors in their country of residence
  • Weekly payments via PayPal or Stripe