Senior Platform Engineer – DevOps, Infrastructure and Platform
Posted 14hrs ago
Job Description
Senior Platform Engineer at Ozmap responsible for AWS and Linux environments, troubleshooting, and building CI/CD pipelines for continuous delivery.
Responsibilities:
- Design, operate and evolve AWS (EC2) and on-premises environments with containers (Docker), ensuring availability, security and scalability;
- Operate and administer Linux production environments (systemd, kernel/network tuning, I/O, process troubleshooting);
- Build and evolve CI/CD pipelines from scratch, including quality and security gates;
- Develop end-to-end observability (instrumentation, exporters, PromQL, SLI/SLO, alerts);
- Lead advanced troubleshooting, root cause analysis and blameless post-mortems — driving structural change afterwards, not just producing a report;
- Implement automation using Infrastructure as Code;
- Analyze and optimize cloud costs: rightsizing, usage analysis and proposing data-driven alternatives;
- Act as a technical reference for developers and engineers, influencing architecture without relying on formal authority.
Requirements:
- AWS core primitives in production (~4+ years): EC2, VPC/networking, IAM and security, with hands-on operation and technical decision-making;
- Linux and networking (~4+ years): server administration and production troubleshooting (disk full, OOM killer, network diagnostics) across processes, memory and I/O;
- CI/CD built from scratch (~3+ years): pipelines created and evolved by you (GitHub Actions, Jenkins, self-hosted runners, secrets, caching, gates);
- End-to-end open-source observability (~2+ years): Prometheus, Grafana, Loki, VictoriaMetrics or equivalents, configured and operated by you, not just used, plus OpenTelemetry, including instrumentation, exporters, PromQL and SLI/SLO definition;
- Operating beneath managed layers: concrete experience with nginx/HAProxy/Envoy and the underlying Linux, and with leading the resolution of critical incidents;
- Docker in production (~3+ years): real operation of containers in critical environments — volumes, networking, resource management, graceful shutdown of services;
- High autonomy: takes an ambiguous problem ("our observability is weak") and delivers a solution end-to-end;
- Ownership and proactivity: anticipates problems before they become incidents;
- Clear communication and technical influence, connecting development, infrastructure and business teams;
- Conducts post-mortems focused on root cause, organizational learning and continuous improvement, without a blame culture;
- Maturity to self-manage while working remotely.
Benefits:
- 💻 Equipment allowance – to ensure a comfortable work setup;
- 💚 Health support – because your well-being matters;
- 📚 Education support – we support your continuous development journey;
- 🎂 Birthday gift – because we like to celebrate together;
- 🏅 Recognition for tenure – your time with us is valued;
- 🗣️ Language support – to help you go beyond borders;
- 🏋️ TotalPass (for employee use only);
- 🌴 Paid leave after 12 months of employment;
- 🎉 Online integration events and socials.