Site Reliability Engineer

Posted 136 days ago

Job Description

As a Site Reliability Engineer at Upsun, you will optimize cloud-based systems and automate operations, collaborating across teams to improve system reliability and efficiency through SRE practices.

Responsibilities:

Refine Monitoring and Observability: Enhance system monitoring with tools like Prometheus, Grafana, and ELK Stack, ensuring visibility and alignment with business objectives.
Automate Deployments and Workflows: Transition manual processes to automated solutions using IaC tools (e.g., Terraform, Ansible) to streamline deployments and improve operational efficiency.
Optimize CI/CD Pipelines: Improve pipeline architecture for fast, reliable releases, ensuring scalability and resilience to handle high volumes of changes.
Cloud Infrastructure Management: Help scale cloud-based systems on platforms like AWS, GCP, and Azure while minimizing technical debt and operational complexity.
Incident Response and Post-Mortem: Support incident management and lead post-mortem analysis, ensuring continuous improvement and knowledge sharing.
Collaborate with Cross-Functional Teams: Work closely with engineering and product teams to integrate reliability practices into the development lifecycle and prioritize reliability efforts.
Drive Technical Innovation: Introduce and champion new tools, technologies, and practices that improve system reliability, performance, and scalability.

Requirements:

DevOps, Cloud Operations, or SRE Expertise: A solid understanding of DevOps, Cloud Operations, or SRE principles, with a focus on reliability and scalability.
Advanced Linux Internals Expertise: Hands-on experience with Linux systems, including performance tuning, kernel configurations, and troubleshooting.
Programming Languages: Proficiency in programming languages such as Go (preferred) or Python, with a focus on building tools and automating processes.
Scripting Skills: Strong skills in scripting languages like Python, Bash, or Go to automate workflows, streamline tasks, and manage infrastructure.
Cloud Infrastructure Knowledge: Extensive experience with cloud platforms like AWS, GCP, and Azure, along with expertise in monitoring/logging frameworks and CI/CD pipelines.
Containerization and Orchestration (nice to have): Hands-on experience with Docker, Kubernetes, and other containerization technologies for building and deploying scalable applications.
Problem-Solving and Collaboration: Strong problem-solving skills, system design experience, and the ability to collaborate effectively across teams.

Benefits:

Flexible PTO
Comprehensive healthcare coverage (UK, France, Spain)
Company stock options
Professional development budget
Office equipment budget
Wellness budget
Annual team gatherings
Internet reimbursement
Inclusive parental leave
Remote work travel program
