Centage is transforming how finance teams operate by providing modern, intuitive, and automated tools for budgeting, forecasting, and financial planning. As we embark on the next phase of scaling and platform transformation, we’re looking for an exceptional SRE to join our team and help future-proof our infrastructure.
Role Overview
As a Site Reliability Engineer at Centage, you'll partner closely with engineering to drive platform modernization, reliability, and data infrastructure evolution. This includes deep involvement in cloud, Kubernetes, and data workflows as well as leading initiatives around observability, security, and performance.
You’ll also play a critical role in supporting and re-architecting our legacy ETL pipelines (Jenkins + SSAS cubes) as we migrate to a modern MongoDB-backed approach. In addition, you’ll be expected to provide Tier 3 support during US hours, responding to production issues and contributing to root cause analysis and resolution.
Requirements
Must-Have:
5+ years of experience in SRE, DevOps, or Infrastructure roles in a SaaS environment.
Deep hands-on experience with AWS infrastructure and services.
Strong working knowledge of Kubernetes, Helm, containerized workloads, and microservice deployment strategies.
Experience managing and optimizing MongoDB Atlas for production.
Demonstrated ability to migrate legacy data pipelines, especially those involving Jenkins, to modern GitOps stacks.
Proficiency in scripting (e.g., Bash, Python) or backend coding (Go, Node.js).
Effective communicator, able to coordinate across support and product teams.
Comfortable providing Tier 3 support during core US hours.
Nice-to-Have:
Experience with Workato or other iPaaS platforms.
Familiarity with GitOps (ArgoCD), service meshes (Istio), or AWS serverless architecture.
Background in financial, planning, or BI-related software systems.
What You’ll Gain
The opportunity to modernize key infrastructure and data systems that power Centage’s next-generation planning tools.
High autonomy and the ability to influence engineering and platform direction.
A collaborative, remote-friendly culture that values expertise and continuous improvement.
Responsibilities
Design, implement, and maintain secure, scalable infrastructure in AWS, including cost and performance optimization.
Manage, monitor, and scale Kubernetes (EKS) clusters; handle container lifecycle, Helm charts, and service mesh configuration.
Lead the re-architecture of our legacy Jenkins-based ETL pipelines, currently relying on SSAS data cubes, toward a modern MongoDB-native approach.
Provide day-to-day Tier 3 support during US hours, ensuring incident response readiness and supporting root cause analysis in coordination with support and development teams.
Optimize and scale MongoDB Atlas environments, managing replication, performance tuning, and availability.
Drive observability through tools like Sumo Logic (preferred), Grafana, Prometheus, and Datadog.
Implement and manage Infrastructure-as-Code solutions (Terraform, Pulumi) and contribute to CI/CD automation.
Own operational metrics (SLAs/SLOs/SLIs) and continuously improve system reliability through automation and process refinement.
Document best practices, build runbooks, and ensure system resilience through testing and chaos engineering where appropriate.
Benefits
Site Reliability Engineer • Kraków, Poland