AI Research Engineer – Kernel & Inference Optimization
Posted 59mins ago
Job Description
AI Research Engineer responsible for optimizing model serving and inference architectures. Join Tether to innovate in the fintech space remotely from India.
Responsibilities:
- Design and deploy state-of-the-art model serving architectures that deliver high throughput and low latency while optimizing memory usage
- Ensure these pipelines run efficiently across diverse environments
- Establish clear performance targets such as reduced latency, improved token response, and minimized memory footprint
- Build, run, and monitor controlled inference tests in both simulated and live production environments
- Track key performance indicators such as response latency, throughput, memory consumption, and error rates
- Document iterative results and compare outcomes against established benchmarks
- Identify and prepare high-quality test datasets and simulation scenarios
- Analyze computational efficiency and diagnose bottlenecks in the serving pipeline
- Work closely with cross-functional teams to integrate optimized serving and inference frameworks into production pipelines
Requirements:
- A degree in Computer Science or a related field
- Ideally a PhD in NLP, Machine Learning, or a related field, complemented by a solid track record in AI R&D (with good publications at A* conferences)
- Must have knowledge of Metal Shading Language (MSL)
- Proven experience in low-level kernel optimizations and inference optimization on mobile devices is essential
- A deep understanding of modern model serving architectures and inference optimization techniques
- Must have strong expertise in writing GPU kernels for mobile devices (i.e., smartphones)
- Practical experience in developing and deploying end-to-end inference pipelines
- Demonstrated ability to apply empirical research to overcome challenges in model serving
- Experience with distributed inference systems: designing and optimizing high-performance inference engines
Benefits:
- Professional development opportunities
- Working remotely from every corner of the world