Technical Staff Member – Data Intelligence

Posted 2hrs ago

Job Description

This is a Technical Staff role at Reka, working with researchers and engineers to ensure data quality for ML. The work includes building processing pipelines and optimizing data systems for model training.

Responsibilities:

  • Work with model researchers to define what “good data” means for our models, including quality metrics, validation checks, and acceptance thresholds
  • Explore open-source datasets and create internal ones best suited to building fundamental World Models
  • Build algorithms for automated data quality assessment, data domain mixtures, and domain adaptation from synthetic to real data
  • Track datasets, metadata, provenance, and versions so experiments are reproducible and it’s clear what data went into which training and evaluation runs
  • Own CI/CD and development tooling for the data stack (GitHub, Python, PyTorch), and automate repetitive workflows to reduce friction
  • Track and optimize throughput, storage, and compute utilization across pipelines and related assets

Requirements:

  • Strong ML and deep learning fundamentals with experience building and operating large-scale data and/or compute systems
  • Comfortable moving between research questions and production engineering: you can dig into data, run analyses, and also ship reliable systems
  • Demonstrated research experience with data compositions, quality, and dataset releases
  • Ability to design and execute experiments that produce convincing, unbiased results
  • Practical experience with distributed processing and orchestration (Spark, Ray, Airflow, or equivalents)
  • Solid Python skills, and familiarity with the tooling around modern model training workflows (datasets, checkpoints, experiment tracking)
  • Strong instincts around data quality: how to measure it, how to monitor it, and how to prevent regressions as things scale
  • Able to work in a fast-moving environment, prioritize what matters, and communicate clearly with both researchers and engineers
  • Bonus: experience with large video datasets, dataset curation for training, or building internal tooling for evaluation/analysis in ML environments

Benefits:

  • Flexible work arrangements