Cloud Reliability Engineer

Posted 173 days ago

Job Description

The Cloud Reliability Engineer is responsible for maintaining the cloud infrastructure behind supply chain software, collaborating with other teams to optimize performance through automation and reliability principles.

Responsibilities:

Operate, maintain, and improve cloud infrastructure in AWS, Azure, or GCP environments.
Manage and optimize Kubernetes clusters — deployment, scaling, patching, and upgrades.
Ensure system availability, scalability, and performance through proactive monitoring and optimization.
Maintain infrastructure-as-code (IaC) for consistent and repeatable deployments.
Identify opportunities for operational automation to eliminate manual processes (“reduce toil”).
Build and maintain automated pipelines for deployments, configuration, and remediation.
Develop self-healing mechanisms to automatically detect and resolve common service issues.
Design proactive monitoring, alerting, and observability dashboards (Dynatrace, DataDog).
Collaborate with DevOps and development teams to build reliable, observable, and resilient systems.
Monitor, troubleshoot, and resolve infrastructure and application issues.

Requirements:

Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
5+ years of experience in Cloud Engineering, DevOps, or Site Reliability roles.
Hands-on experience with cloud platforms (OCI, AWS, Azure, or GCP).
Strong knowledge of Kubernetes deployment, management, and troubleshooting.
Solid understanding of observability and monitoring (e.g., Dynatrace, DataDog) and incident management platforms.
Proficiency in scripting and automation (e.g., Python, Bash, Terraform, Ansible).
Strong troubleshooting and analytical skills across infrastructure and applications.
Experience with incident response, RCA, and postmortem processes.
A mindset of continuous improvement, reliability, and self-healing automation.
Understanding of SRE principles, SLAs/SLOs/SLIs, and chaos engineering practices.

Benefits:

Competitive salary
Flexible working hours
Professional development budget
Home office setup allowance
Global team events

Infios
