Senior System Software Engineer – AI Data Platform, Inference Factory Optimization

Posted 101 days ago

Job Description

As a Senior Software Engineer, you will design, build, and optimize automation systems for NVIDIA's AI and high-performance computing platforms. Your work will affect how AI models are delivered and deployed globally across diverse environments.

Responsibilities:

  • Develop efficient infrastructure and tools for automating complex software processes
  • Implement advanced test harnesses, benchmarking frameworks, and analytical tools to optimize the performance and efficiency of our software and hardware platforms
  • Apply deep knowledge of operating systems, kernel internals, device drivers, memory management, storage, networking, and high-speed interconnects to build and troubleshoot highly performant systems
  • Work with engineering teams to understand needs, define requirements, and deliver efficient solutions
  • Set performance goals, monitor feedback, analyze data, and make continuous improvements for system reliability
  • Contribute to defining technical strategies and roadmaps for our platform automation initiatives, ensuring alignment with company-wide goals and standard methodologies

Requirements:

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related technical field, or equivalent experience
  • 5+ years of industry experience in software development, focusing on infrastructure, distributed systems, automation, and/or performance engineering
  • Proven ability to develop robust tools and automation using programming languages such as C++, Python, or Go
  • Experience with operating system internals, device drivers, memory management, and debugging performance issues in complex compute applications
  • Experience in designing, building, and operating large-scale distributed systems, with knowledge of networking protocols, cluster management, and high-performance interconnects
  • Experience building and maintaining automated testing, benchmarking, and continuous integration/continuous deployment pipelines
  • Outstanding analytical, problem-solving, and debugging skills, with a track record of resolving complex technical challenges
  • Excellent interpersonal and communication skills, with the ability to articulate complex technical concepts to diverse audiences and collaborate effectively across teams

Benefits:

  • Health insurance
  • 401(k) retirement plans
  • Paid time off
  • Flexible work arrangements
  • Professional development opportunities