Site Reliability Engineer – Mobile and Internet Platform

Posted 102 days ago

Job Description

The Site Reliability Engineer ensures the stability and reliability of the Mobile and Internet Banking platforms for a leading financial institution. The role covers managing Kubernetes/OpenShift infrastructure and providing 24/7 operational support.

Responsibilities:

Monitor and maintain the reliability and performance of Mobile Banking and Internet Banking applications using Prometheus and Grafana dashboards
Manage and support OpenShift/Kubernetes infrastructure for containerized banking applications on on-premise servers
Respond to and resolve production incidents with minimal mean time to resolution (MTTR)
Implement and maintain centralized logging solutions using ELK Stack (Elasticsearch, Logstash, Kibana) for application troubleshooting
Develop and execute runbooks and automation scripts to reduce manual operational toil in OpenShift environments
Provide 24/7 production support and participate in the on-call rotation for critical banking services
Analyze logs and metrics from Prometheus and the ELK Stack to identify performance bottlenecks and reliability issues
Conduct root cause analysis (RCA) on incidents and implement preventive measures
Optimize on-premise Kubernetes/OpenShift deployments, pod management, and resource allocation
Implement alerting strategies and threshold management in Prometheus and Grafana
Support infrastructure scaling, capacity planning, and load balancing in production environments
Implement security best practices and compliance requirements for financial systems in containerized environments
Manage on-premise data center infrastructure and server resources
Document operational procedures and troubleshooting guides, and create knowledge base articles

Requirements:

BSc in Computer Science, Information Technology, Software Engineering, or related field
2+ years of hands-on experience in SRE, DevOps, or Production Engineering roles
Hands-on experience supporting production applications in Kubernetes/OpenShift environments
Strong experience with OpenShift container platform administration and troubleshooting on on-premise infrastructure
Proficiency with Prometheus for metrics collection and monitoring
Proficiency with Grafana for dashboard creation and visualization
Experience with ELK Stack (Elasticsearch, Logstash, Kibana) for centralized logging
Strong understanding of Linux/Unix operating systems and networking fundamentals
Practical experience with CI/CD tools and automation frameworks
Proficiency in at least one programming/scripting language (Python, Go, or Bash)
Experience managing on-premise SQL and NoSQL databases
Excellent troubleshooting and analytical skills for production support
Strong communication skills and ability to work in cross-functional teams
Experience in 24/7 production support environments
Experience with on-premise data center infrastructure management
Previous experience in the financial services or banking sector is a plus

Xenon Seven
