Site Reliability Engineer, Monitoring and Control Engineering

Posted 8 days ago

Job Description

The Site Reliability Engineer is responsible for NBCU's Distribution Engineering monitoring and control systems, using automation and on-call support to ensure high availability.

Responsibilities:

  • Utilize scripting and automation to develop, customize and enhance monitoring/alerting tools for “on-air” environments
  • Interact with automated monitoring infrastructure to ensure healthy environments
  • Create system dashboards that help improve system availability and reliability
  • Query data stores to quantify the scope of reported issues
  • Create new metrics and identify monitoring deliverables to improve site reliability
  • Act as a Level 2 resource: drive and own investigations into broadcast issues and report findings to leadership and operations in a timely manner
  • Provide 24/7 on-call support on a rotating shift schedule
  • Follow up with team members and third-party vendors when issues cannot be resolved internally, and push vendors for root causes and solutions where possible
  • Create comprehensive documentation of encountered issues, explaining their root causes and the steps for effective resolution
  • Administer monitoring and control systems within the “on-air” environments
  • Develop proof of concept deployments for evaluation of products and architectures
  • Utilize modern frameworks and scripting languages to develop products and services for NBCU's IP video distribution environment

Requirements:

  • Bachelor’s degree in computer science or a related field
  • Experience with IP video and broadcast technologies
  • 3-5+ years of experience with monitoring and alerting tools, e.g., Grafana, Splunk, ELK Stack, DataMiner
  • Ability to develop end-to-end monitoring dashboards, alerts and reports for enterprise level environments
  • 3-5 years of SRE experience in the technology sector, supporting and maintaining production-quality software or software-defined infrastructure in a high-traffic cloud environment (AWS preferred)
  • Ability to collect data from various systems using commercial off-the-shelf (COTS) APIs
  • Experience with programming and scripting languages, e.g., C#, Python, Bash
  • Experience with modern frontend technologies such as Vite, React, Node.js, and TypeScript
  • Experience with configuration management tools, e.g., Ansible, Salt, and/or Chef
  • Experience with public cloud platforms such as AWS, GCP or Azure
  • Experience with networking and cloud-based network environments
  • Experience with containerization and orchestration (Docker, Kubernetes)
  • Experience with CI/CD builds (GitHub Actions), deployment practices, and Infrastructure as Code (Terraform)
  • Experience administering Linux and Windows environments
  • Ability to use Agile processes for project management, development, and tracking
  • Comfortable working in a fast-paced Agile environment; requirements change quickly and the team must adapt to moving targets

Benefits:

  • medical, dental, and vision insurance
  • 401(k)
  • paid leave
  • tuition reimbursement
  • various other discounts and perks

NBCUniversal

Entertainment Providers

Here you can create the extraordinary. Join us.