Platform/DevOps Engineer

Posted 143 days ago

Job Description

Own infrastructure reliability and cost optimization for a production platform serving diverse customers. This role emphasizes building resilient, secure, and cost-efficient cloud infrastructure.

Responsibilities:

Meet the 99.5% uptime SLA across all production services and customer environments.
Design and maintain multi-region deployments to support geographic redundancy.
Implement automated failover mechanisms for databases, load balancers, and critical services.
Build and manage disaster recovery strategies, including automated backups and point-in-time recovery.
Lead incident detection, response, and postmortems, meeting defined SLAs for P0 issues.
Develop real-time observability dashboards for uptime, latency, error rates, and system health.
Monitor application and infrastructure performance metrics across customers.
Implement alerting, on-call rotations, escalation policies, and PagerDuty integrations.
Manage log aggregation and retention using SIEM platforms such as Splunk or Sumo Logic.
Support SOC 2 Type II preparation through security controls, monitoring, and documentation.
Implement vulnerability scanning and DLP controls, and coordinate penetration testing.
Optimize cloud infrastructure costs through right-sizing, auto-scaling, and storage lifecycle policies.
Track and report infrastructure and API costs per customer, driving FinOps best practices.
Build automated runbooks and self-healing workflows for common incidents.

Requirements:

Strong experience as a Site Reliability Engineer, DevOps Engineer, or Platform Engineer.
Deep expertise in AWS cloud architecture (ECS, EKS, RDS, Lambda, S3, CloudFront).
Proven experience with Infrastructure as Code using Terraform or CloudFormation.
Hands-on production experience with Kubernetes and container orchestration.
Strong knowledge of observability and monitoring tools (Datadog, New Relic, Prometheus, Grafana).
Experience managing on-call rotations, incident response, and post-incident reviews.
Solid understanding of security practices including SIEM, vulnerability scanning, and SOC 2 compliance.
Demonstrated experience in cloud cost optimization and FinOps practices.
Ability to operate independently and prioritize reliability in high-availability environments.

Benefits:

Health insurance
Flexible work arrangements
Professional development opportunities

SMASH
