Site Reliability Engineer – Mobile and Internet Platform
Posted 102 days ago
Job Description
As a Site Reliability Engineer, you will ensure the stability and reliability of the Mobile and Internet Banking platforms for a leading financial institution, manage Kubernetes/OpenShift infrastructure, and provide 24/7 operational support.
Responsibilities:
- Monitor and maintain the reliability and performance of Mobile Banking and Internet Banking applications using Prometheus and Grafana dashboards
- Manage and support OpenShift/Kubernetes infrastructure for containerized banking applications on on-premise servers
- Respond to and resolve production incidents while minimizing mean time to resolution (MTTR)
- Implement and maintain centralized logging solutions using ELK Stack (Elasticsearch, Logstash, Kibana) for application troubleshooting
- Develop and execute runbooks and automation scripts to reduce manual operational toil in OpenShift environments
- Provide 24/7 production support and on-call rotation for critical banking services
- Analyze logs and metrics from Prometheus and the ELK Stack to identify performance bottlenecks and reliability issues
- Conduct root cause analysis (RCA) on incidents and implement preventive measures
- Optimize Kubernetes/OpenShift deployments, pod management, and resource allocation on-premise
- Implement alerting strategies and threshold management in Prometheus and Grafana
- Support infrastructure scaling, capacity planning, and load balancing in production environments
- Implement security best practices and compliance requirements for financial systems in containerized environments
- Manage on-premise data center infrastructure and server resources
- Document operational procedures and troubleshooting guides, and create knowledge base articles
Requirements:
- BSc in Computer Science, Information Technology, Software Engineering, or related field
- 2+ years of hands-on experience in SRE, DevOps, or Production Engineering roles
- Hands-on experience supporting production applications in Kubernetes/OpenShift environments
- Strong experience with OpenShift container platform administration and troubleshooting on on-premise infrastructure
- Proficiency with Prometheus for metrics collection and monitoring
- Proficiency with Grafana for dashboard creation and visualization
- Experience with ELK Stack (Elasticsearch, Logstash, Kibana) for centralized logging
- Strong understanding of Linux/Unix operating systems and networking fundamentals
- Practical experience with CI/CD tools and automation frameworks
- Proficiency in at least one programming/scripting language (Python, Go, or Bash)
- Experience managing on-premise databases (SQL and NoSQL)
- Excellent troubleshooting and analytical skills for production support
- Strong communication skills and ability to work in cross-functional teams
- Experience in 24/7 production support environments
- Experience with on-premise data center infrastructure management
- Previous experience in financial services or banking sector is a plus