Senior Site Reliability Engineer - Observability

Moderna

Warsaw, Masovian Voivodeship, Poland

27 days ago

Job description

We are seeking a Senior Site Reliability Engineer – Observability with deep expertise in designing, building, and operating observability solutions across application, database, host, and container environments. In this role, you will lead the development of a modern, open-source observability platform (e.g., Grafana, Prometheus, or similar) that is scalable, resilient, and cost-effective. This platform will form the foundation for enterprise-wide monitoring and log management, empowering teams to gain actionable insights, optimize performance, and improve system reliability.

This is a high-impact role for a self-starter who takes initiative and drives outcomes, with ownership spanning observability platforms, governance, agent fleet management, automation, and FinOps practices – shaping how Moderna advances its observability strategy in a rapidly growing global enterprise.

Here’s What You’ll Do

Your Key Responsibilities Will Be

Platform Ownership & Operations
Manage and advance Moderna’s enterprise observability platform with a focus on open-source and SaaS observability technologies (Grafana, Prometheus, Loki, Tempo, Jaeger, OpenTelemetry, Dynatrace, Splunk, etc.).
Lead governance, agent fleet management, and FinOps optimization to ensure the platform is scalable, cost-effective, and compliant with enterprise requirements.
Balance hands-on engineering work (building, configuring, and operating the platform) with strategic ownership (roadmap influence, governance, cost optimization).
Collaborate with vendors and open-source communities to influence feature roadmaps and maximize platform value.

Observability Engineering
Design and build scalable, resilient, and cost-optimized observability architectures to support application, database, host, and container monitoring.
Implement telemetry pipelines for metrics, traces, and logs using Grafana, Prometheus exporters, Kubernetes instrumentation, distributed tracing, or similar technologies.
Establish and evolve best practices for monitoring, alerting, SLOs/SLIs, and incident detection across hybrid environments (cloud-native and on-prem).
Partner with application and infrastructure teams to enable self-service observability capabilities, accelerating troubleshooting and reliability improvements.

Log Management
Build and maintain enterprise-scale log management capabilities within the observability platform.
Evolve log management to serve as a scalable, cost-effective alternative to traditional log aggregation solutions.
Partner with security and infrastructure teams to ensure logging meets performance, compliance, and retention requirements.

Incident Response & Collaboration
Integrate observability solutions with incident management platforms such as PagerDuty to streamline escalation, response, and workflow automation.
Oversee and optimize on-call processes, ensuring alerts are actionable, routed effectively, and resolved quickly.
Provide real-time telemetry during incidents and support root cause analysis (RCA) backed by observability data.

Automation & Integration
Develop automation using Python, Terraform, Ansible, and CI/CD pipelines to streamline observability workflows.
Implement self-healing mechanisms and automated remediation for recurring reliability issues.
Ensure integrations with enterprise platforms, including PagerDuty, ServiceNow, and Jira, to enhance incident, change, and problem management.

Analytics & Reporting
Deliver dashboards and reporting that provide actionable visibility into system health, reliability, and costs.
Track and report key metrics such as MTTA, MTTR, error rates, and cost per workload.

Knowledge Sharing & Continuous Improvement
Create documentation, runbooks, and training to support adoption and consistency across engineering teams.
Participate in post-incident reviews, applying lessons learned to refine monitoring strategies and prevent recurrence.
Promote a culture of continuous learning, improvement, and observability adoption across the enterprise.

The key Moderna Mindsets you’ll need to succeed in the role:

We behave like owners. You’ll be a self-starter who takes initiative and drives outcomes, going beyond assigned tasks to deliver platforms that create long-term value.

We act with urgency. Action today compounds the lives saved tomorrow. You will proactively optimize observability tools and workflows to enhance system performance and reliability.

We obsess over learning. We don’t have to be the smartest – we have to learn the fastest. In this role, you will continuously refine monitoring strategies based on real-time data and incident response learnings.

Here’s What You’ll Need (Basic Qualifications)

7+ years of experience in site reliability engineering, observability, or platform engineering.

Extensive expertise in managing and administering SaaS (Dynatrace, Splunk, or similar) or open-source observability platforms, including governance, agent fleet management, and cost optimization.

Proven experience designing and building scalable, resilient, and cost-effective observability platforms using Prometheus, Grafana, Node/Blackbox Exporters, Kubernetes, or similar.

Strong knowledge of observability practices (metrics, logs, traces, SLO/SLI design) across complex, large-scale enterprise environments.

Hands-on experience with incident management platforms such as PagerDuty and ITSM integrations (ServiceNow, Jira).

Proficiency in automation and infrastructure-as-code (Python, Terraform, Ansible, Bash).

Experience monitoring and troubleshooting hybrid and cloud-native environments (AWS, Azure, or GCP).

Strong problem-solving skills and the ability to operate in a fast-paced, global environment.

Demonstrated ability to take initiative, work independently, and drive outcomes in complex enterprise environments.

Here’s What You’ll Bring to the Table (Preferred Qualifications)

Experience working in biotech, pharmaceutical, healthcare, or other regulated environments (e.g., GxP, HIPAA).

Experience with enterprise-scale log management (e.g., Loki, Elastic, Splunk) and retention/cost optimization.

Familiarity with ITSM processes and integrations with observability solutions.

Relevant certifications in AWS, Azure, Dynatrace, Splunk, or related observability technologies.

A proactive, innovative mindset with a passion for open-source adoption, continuous improvement, and automation.

At Moderna, we believe that when you feel your best, you can do your best work. Our global benefits and well-being resources are designed to support you—at work, at home, and everywhere in between. Benefits include quality healthcare, lifestyle spending accounts, fitness and mindfulness resources, family planning and adoption benefits, generous PTO and other perks. Benefits may vary by country, and Moderna is an equal opportunity employer committed to non-discrimination.

About Moderna: Since 2010, Moderna has aspired to build the leading mRNA technology platform and a world-class team. We value belonging, diversity, and inclusion, and are dedicated to creating a culture that cares for patients, employees, the environment, and communities. Moderna is committed to equal opportunity in employment and non-discrimination. A 70/30 in-office model supports collaboration and mentorship. For more information, visit modernatx.com/careers.

Moderna is a smoke-free, alcohol-free, and drug-free work environment. If you meet the Basic Qualifications and are excited to contribute to our mission, please apply. Moderna is dedicated to providing reasonable accommodations for applicants with disabilities; contact for accommodations during the hiring process.

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Engineering and Information Technology
