Site Reliability Engineer
Posted 1 day ago
Job Description
The Site Reliability Engineer ensures the resilience and performance of mission-critical defense systems, blending software engineering, automation, and operations expertise to build scalable platforms.
Responsibilities:
- Design, build, and maintain highly available, scalable systems across cloud and on-prem environments.
- Develop automation solutions that improve observability, speed recovery, and eliminate manual operational work.
- Implement monitoring, alerting, and performance tuning strategies that ensure system health.
- Collaborate with development and infrastructure teams to design reliable architectures and CI/CD pipelines.
- Conduct root cause analysis and drive systemic improvements to prevent future incidents.
- Champion SRE best practices such as SLIs/SLOs, error budgets, and automated incident response.
- Contribute subject matter expertise to proposal efforts: collaborate on solution elements and write narratives that describe the technical solution designed for a specific opportunity.
Requirements:
- Work Experience: 15+ years in system reliability, DevSecOps, cloud operations, or infrastructure engineering.
- Education: Bachelor's degree with 15 years of experience, or an additional 4 years of work experience in lieu of a degree.
- Strong scripting and automation skills (Python, Bash, PowerShell, etc.).
- Hands-on experience with monitoring tools (Prometheus, Grafana, Splunk, ELK, Datadog, etc.).
- Familiarity with Kubernetes, container orchestration, and modern CI/CD pipelines.
- Understanding of networking, Linux system internals, and distributed systems.
- Ability to troubleshoot complex technical issues across the stack.
- US Citizenship Required
- Candidate must possess an active Secret clearance to start and be able to obtain Top Secret/SCI.
- Preferred: Experience supporting DoD or other federal programs.
- Certifications such as Kubernetes (CKA/CKAD), AWS/Azure, or ITIL.
- Experience implementing SRE frameworks at scale.
Benefits:
- Our benefits package for all US-based employees includes a variety of medical plan options, some with Health Savings Accounts; dental plan options; a vision plan; and a 401(k) plan offering the ability to contribute both pre- and post-tax dollars up to the IRS annual limits and receive a company match.
- To encourage work/life balance, GDIT offers employees full flex work weeks where possible and a variety of paid time off plans, including vacation, sick and personal time, holidays, and paid parental, military, bereavement, and jury duty leave.
- GDIT typically provides new employees with 15 days of paid leave per calendar year to be used for vacations, personal business, and illness, plus an additional 10 paid holidays per year.
- Paid leave and paid holidays are prorated based on the employee’s date of hire.
- The GDIT Paid Family Leave program provides a total of up to 160 hours of paid leave in a rolling 12-month period for eligible employees.
- To ensure our employees are able to protect their income, other offerings such as short- and long-term disability benefits; life, accidental death and dismemberment, personal accident, and critical illness insurance; and business travel and accident insurance are provided or available.
- We regularly review our Total Rewards package to ensure our offerings are competitive and reflect what our employees have told us they value most.