Site Reliability Engineer

Groupe SII • Warsaw

30+ days ago

Job description

We are looking for an experienced Site Reliability Engineer to work in a professional software development hub, responsible for delivering software solutions used by our client’s external private and business customers across the globe.

Our customer is a global retail company with over 16,000 stores in 26 countries, serving more than 6 million customers a day and employing about 130,000 people in its stores and support offices.

You will be focused on the Next Generation Retail Platform, aiming to improve its reliability, telemetry, observability, performance, maintainability, and quality.

Your role

Understanding the business criticality of supported services
Overseeing the production environment by monitoring availability, setting up alerting, and taking a holistic view of system health
Defining, gathering, and analyzing metrics from all systems (operating systems, applications, and PaaS such as Azure and AWS) to assist in fault finding and performance tuning
Integrating monitoring, alerting, ticketing, and paging systems to provide instant information in case of an incident, with appropriate incident thresholds defined
Actively proposing and implementing improvements to processes, access management, change management, resource utilization, and the service lifecycle
Providing operational support for production issues: engaging Support and Product Teams and triaging P1 production incidents
Demand forecasting and capacity planning: ensuring the right balance of capacity, cost, and efficiency for running services, avoiding over- and under-provisioning
Defining SLIs, SLOs, and SLAs, and building dashboards to observe the underlying metrics
Driving or participating in postmortems
Leveraging automation to scale operations with load and to handle menial tasks (toil)
Building critical paths of products built on the platform in cooperation with Product Teams
Mapping critical paths to logical and physical resources in cooperation with architects and DevOps
Partnering with Product Teams in capacity planning
Cooperating with architects and development teams in influencing services design and permanent resolution of defects in line with architectural principles and development practices

Your skills

Bachelor’s degree in computer science, IT, engineering, system analysis, or a related field (or equivalent proven experience)

7 years of experience in software development in the IT industry

3 years of experience in supporting or maintaining a microservices platform

Proven experience in setting up monitoring and alerting, and in reliability engineering

Excellent communication, analytical, planning, organizational and technical skills

Motivated and driven by achieving long-term business outcomes

Ability to speak English at C1 level

Experience in Azure

Good understanding of product management, agile principles, and development methodologies

A proactive approach to identifying problems, performance bottlenecks, and areas for improvement

Technical knowledge of areas such as microservice architecture, Java development, reliability patterns, SQL and NoSQL databases, CI/CD, Azure, TDD, and BDD

Job no. 240614-PO7G3

