AI Research Engineer – Kernel & Inference Optimization

Posted 1 hour ago

Job Description

Tether is seeking an AI Research Engineer to focus on model serving and inference and to contribute to advancements in AI systems and architecture. The role involves collaborating with a global team in a dynamic fintech environment.

Responsibilities:

  • Drive innovation in model serving and inference architectures for advanced AI systems
  • Design and deploy state-of-the-art model serving architectures that deliver high throughput and low latency
  • Ensure pipelines run efficiently across diverse environments
  • Establish clear performance targets
  • Build, run, and monitor controlled inference tests
  • Identify and prepare high-quality test datasets and simulation scenarios
  • Analyze computational efficiency and diagnose bottlenecks in the serving pipeline
  • Work closely with cross-functional teams to integrate optimized serving and inference frameworks into production pipelines

Requirements:

  • A degree in Computer Science or a related field
  • Ideally a PhD in NLP, Machine Learning, or a related field
  • Must have knowledge of Metal Shading Language (MSL)
  • Proven experience in low-level kernel optimizations and inference optimization on mobile devices
  • A deep understanding of modern model serving architectures and inference optimization techniques
  • Strong expertise in writing GPU kernels for mobile devices
  • Practical experience in developing and deploying end-to-end inference pipelines
  • Demonstrated ability to apply empirical research to overcome challenges in model serving
  • Experience with distributed inference systems, including designing and optimizing high-performance inference engines

Benefits:

  • Work remotely from anywhere in the world
  • Opportunity to collaborate with a global team
  • Professional development opportunities to hone your skills