Director, Site Reliability and Software Engineering

Posted 2hrs ago

Job Description

Site Reliability and Software Engineering leader managing NVIDIA's DGX Cloud computing services, overseeing team operations and driving technical project success in an innovative environment.

Responsibilities:

  • Manage a team of Software and Site Reliability engineers, including program development, task planning and code reviews.
  • Define team strategy and roadmap, and drive adoption of scalable SDLC practices, test infrastructure, and modern engineering practices.
  • Drive technical projects and provide leadership in an innovative and fast-paced environment.
  • Be responsible for the overall planning, tracking and success of technical projects.
  • Work closely with project and product management teams to ensure best-in-class product development.
  • Contribute technically to projects for DGX Cloud computing services.
  • Interact with key internal stakeholders to provide operational and financial clarity on technical spend.
  • Lead executive reporting, dashboards, and operational CTO metrics, focusing on continuous improvement to support decision making and executive visibility.

Requirements:

  • 12+ years of overall experience in engineering management
  • 5+ years of leadership experience
  • Bachelor's or Master's degree in Computer Science, or equivalent experience
  • Experience in designing and implementing large-scale distributed systems
  • Experience with containers, virtualization environments, and cluster solutions
  • Experience in managing Technical Support / DevOps teams
  • Strong knowledge of Unix/Linux
  • Demonstrated people management and leadership skills, with a proven track record of mentoring and coaching team members

Benefits:

  • equity
  • benefits