AI Research Engineer, Computer Vision

Posted 11 days ago

Job Description

We are hiring an AI Research Engineer at a social AI company focused on image and video generation. You will collaborate with researchers to implement cutting-edge machine learning models.

Responsibilities:

  • Build and maintain end-to-end data pipelines for large-scale image and video datasets: collection, filtering, augmentation, conditioning alignment, and efficient storage/sampling.
  • Implement model architectures (diffusion, autoregressive, flow-based, diffusion transformers, etc.) and maintain high-throughput PyTorch training loops for large-scale image and video diffusion models.
  • Run and manage large-scale training experiments on multi-GPU and multi-node setups (DDP, FSDP, DeepSpeed). Debug training instabilities, loss spikes, and convergence issues.
  • Apply quantization, pruning, and knowledge distillation techniques to compress models without sacrificing quality.
  • Collaborate with researchers and translate state-of-the-art research papers into working implementations in our internal codebase (e.g., new attention mechanisms, sampling schedules, or conditioning methods).
  • Build and maintain evaluation pipelines for image quality, video consistency, and perceptual metrics.
  • Set up and maintain human annotation and evaluation pipelines using services like Amazon SageMaker Ground Truth.
  • Profile and optimize training speed, GPU memory utilization, and iteration time. Implement inference optimizations to reduce latency and compute cost.
  • Work with acceleration toolchains such as torch.compile, Triton, TensorRT, or ONNX where appropriate.

Requirements:

  • 2–5 years of hands-on experience building and training ML systems, with strong ownership of results
  • Fluency in PyTorch: comfortable reading, writing, and debugging both training and inference code.
  • Experience training or fine-tuning generative models (diffusion models, transformers, VAEs, or similar) from scratch or near-scratch
  • Solid understanding of distributed training workflows and practical debugging of large training runs
  • Demonstrated ability to read and implement AI research papers in computer vision. Familiarity with cutting-edge computer vision models and research literature in the image and video domain.
  • Experience building data pipelines for large-scale image or video datasets
  • Strong debugging skills: comfortable diagnosing both engineering bugs and training failures
  • Strong engineering mindset: writing clean, reliable, debuggable code; profiling tools; handling numerical issues at scale.

Benefits:

  • Competitive salary and generous company equity
  • Medical, dental, and vision insurance – 99.99% of premiums covered by Cantina
  • 42 days of paid time off, including:
      • 15 PTO days
      • 10 sick days
      • 15 company holidays
      • 2 floating holidays
  • Generous parental leave & fertility support
  • 401(k) retirement savings plan
  • Lifestyle spending account – $500/month to use however you’d like
  • Complimentary lunch and snacks for in-office employees
  • One Medical membership, and more!