Open Systems AG

Zürich


Senior Observability Platform Engineer

Published: 20 September 2025
Workload: 100%
Contract type: Permanent
Location: Zürich

Job Summary

We are looking for a Senior Observability Platform Engineer to join Open Systems. You can expect an innovative working environment and exciting challenges.

Responsibilities

Ownership of our observability platforms and their optimization.
Collaboration with engineering teams to improve system performance.
Implementation of new tools and strategies to reduce costs.

Skills

At least 5 years of experience in platform or site reliability engineering.
Experience with observability stacks such as Thanos or Loki.
Knowledge of Kubernetes and GitOps for effective deployment processes.


About the Job

Senior Observability Platform Engineer

We are seeking a highly skilled and experienced Senior Observability Platform Engineer to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and efficiency of the core observability infrastructure that supports our engineering teams and our customer-facing portal. Your work will include evolving these systems and helping to drive the adoption of observability best practices across the organization.

You are excited by the prospect of managing more than 20 TB of telemetry data per day from a fleet of more than 10,000 nodes, including Linux hosts, Kubernetes clusters, and VMs.
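To give a sense of that scale, the two headline figures can be turned into average ingest rates with a few lines of Python. This is only a back-of-envelope sketch under stated assumptions: decimal units (1 TB = 10^12 bytes), exactly 20 TB per day and 10,000 nodes, and data spread evenly over the day and across the fleet. `ingest_profile` is a hypothetical helper for illustration, not part of any Open Systems tooling.

```python
# Back-of-envelope ingest rates for the scale quoted in the posting.
# Assumptions (not from the posting): decimal units (1 TB = 10**12 bytes),
# exactly 20 TB/day and 10,000 nodes, data spread evenly over time and nodes.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def ingest_profile(bytes_per_day: float, nodes: int) -> dict[str, float]:
    """Return average fleet-wide and per-node ingest rates (hypothetical helper)."""
    return {
        # Whole-fleet average throughput in megabytes per second.
        "fleet_mb_per_s": bytes_per_day / SECONDS_PER_DAY / 1e6,
        # Average volume each node contributes per day, in gigabytes.
        "node_gb_per_day": bytes_per_day / nodes / 1e9,
        # Average per-node throughput in kilobytes per second.
        "node_kb_per_s": bytes_per_day / nodes / SECONDS_PER_DAY / 1e3,
    }


if __name__ == "__main__":
    profile = ingest_profile(20e12, 10_000)
    for name, value in profile.items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions it reports roughly 231.5 MB/s for the whole fleet, 2.0 GB per node per day, and about 23.1 KB/s per node. Real telemetry traffic is bursty, so peak ingest would be higher than these averages.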

Key Responsibilities:

Observability Platform Operations

Configure, operate, and enhance our observability platforms and frameworks (ClickHouse, Thanos, Loki, Tempo, and the OpenTelemetry Collector with custom processors).

Continuously improve and drive organization-wide adoption of observability best practices, ensuring comprehensive monitoring, logging, and tracing.

Develop and maintain automated solutions for monitoring, alerting, and incident response.

System Optimization

Collaborate with engineering teams to understand their needs and provide robust, scalable solutions utilizing the observability platform.

Optimize system performance and ensure high availability through proactive monitoring and maintenance.

Develop and implement strategies for cost optimization, capacity planning, and performance tuning.

Innovation and Improvement

Stay up-to-date with the latest industry trends, tools, and technologies to drive continuous improvement.

Experiment with and implement new tools, especially around observability and telemetry, to enhance platform capabilities.

Evaluate and integrate the OpenTelemetry Collector where beneficial to improve telemetry data collection and analysis.

Essential/Required Skills:

Observability Platforms: Proven track record of managing at least one of the following observability stacks: Thanos, Mimir, Cortex, Tempo, Loki, or ClickHouse, with the ability to configure, operate, and improve it.

Kubernetes: Deep understanding of Kubernetes architecture and hands-on experience in managing resources on clusters.

Helm: Experience in writing and maintaining Helm charts, and understanding third-party charts to deploy and manage Kubernetes resources efficiently.

GitOps: Experience in continuous delivery and GitOps practices (version control, CI/CD pipelines).

Docker: Expertise in containerization, orchestration, and optimization of Docker workloads.

Linux: Proficiency in Linux system administration, including scripting and automation.

Desirable Skills

Coding Experience: Programming experience in Go (Golang) or a similar language.

Open Source: Contributions to open-source projects written in Go or a similar language.

OpenTelemetry Collector: Knowledge of the OpenTelemetry Collector or direct contributions to the project.

Soft Skills

Quick Learner: Ability to quickly grasp new concepts and technologies, adapting to the evolving needs of the organization.

Communication: Excellent communication skills, with the ability to convey complex technical concepts to both technical and non-technical stakeholders.

Customer Focus: Keen awareness of customer needs and the impact of platform operations on both internal engineering teams and external users.

Collaborative Mindset: Strong ability to work collaboratively in cross-functional teams, contributing to a culture of continuous improvement and innovation.

Education and Experience

Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent experience).

5+ years of experience in platform engineering, site reliability engineering, or a related role.

Demonstrated experience in managing large-scale infrastructures and observability platforms (such as Thanos, Mimir, Cortex, Tempo, Loki, or ClickHouse).

What we offer:

You’ll be among people who believe in:

Caring PASSIONATELY about keeping our customers safe – We’re dedicated to solving problems. Whatever it takes.

Thinking UNCONVENTIONALLY to stay ahead – The world never fails to surprise us. So let’s surprise it first.

Doing the hard work to make things SIMPLE – Craft and hone something that delights in its simplicity.

Working COLLABORATIVELY to build success – The power of the team will always make us faster and better.

As a testament to this, Open Systems has been recognized as an outstanding place to work. You'll be surrounded by smart teams who enrich your experience and provide the opportunities you need to develop your skills and advance your career.

We look forward to receiving your online application (please note that your application must be combined into two attachments).

Come as you are! We are looking for amazing people of diverse backgrounds, experiences, abilities, and perspectives. Open Systems welcomes and encourages diversity in the workplace regardless of race, gender, religion, age, sexual orientation, disability, or veteran status.

Only direct applications will be considered.

About Open Systems:

Backed by the Service Experience Promise, Open Systems simply and cost-effectively connects and secures hybrid environments, ensuring that your organization can meet its business objectives. Open Systems uniquely focuses on a superior user experience when helping organizations reduce risk, improve efficiency, and accelerate innovation. The Open Systems SASE Experience delivers on the promise of ZTNA with a comprehensive, unified SASE platform that is easy to implement and use, combining SD-WAN and Security Service Edge delivered as a service. We provide 24x7 operational management and engineering support from assigned engineering teams and ensure affordable and predictable costs.
