Why Pick Flagger for your Canary Deployments
In this clip, Dinesh Nithyanandam explains why they picked Flagger as their Canary release tool at Moller-Maersk. #observability #sre #engineering
Observability Engineering Community
Internet Publishing
A community for engineers to have conversations, share interesting ideas, and learn about all things observability.
About us
Welcome to the Observability Engineering community! This is a place for engineers to have great conversations, share interesting ideas, and learn about all things observability. It doesn't matter if you're a pro, just getting started, or somewhere in the middle – we'd love for you to come hang out with us, connect with other observability nerds, pick up some new knowledge, and stay on top of what's new in the industry. Let's level up our observability game together!
- Website: https://www.meetup.com/observability_engineering/
- Industry: Internet Publishing
- Company size: 2-10 employees
- Type: Privately Held
Updates
- Scaling Prometheus at Wise for Metrics Collection
  Toomas Ormisson explains how Wise scaled its Prometheus deployment to support a growing volume of metrics. #observability #sre #devops #engineering
- Collecting and storing terabytes of metrics a day at Wise
  How does a company with 50TB+ of telemetry data a day handle metrics collection and storage? In this clip, Toomas Ormisson gives an overview of how metrics are collected and stored at Wise. #observability #engineering #sre
- Managing alerts for 30 versus 1M SLOs at GCE
  Alex Palcuie explains the challenges of managing alerting when the number of SLOs grows from 30 to a few hundred to a few million. #observability #reliability #SRE #engineering
- The SLO Matrix GCE uses to track 30M SLOs
  Alex Palcuie, an SRE at Google, explains the SLO Matrix he built and the challenges he had to overcome to track 30 million SLOs. #observability #sre #engineering
- Tricks the GCE team uses for managing over 30 million Latency SLOs
  Latency SLOs can be tricky to deal with. In this clip, Alex Palcuie from Google discusses the strategy his team uses to simplify and accurately track latency SLOs for GCE. #observability #engineering #sre
- Managing 30M SLOs at GCE with the Rule of 5 Errors
  In this clip, Alex Palcuie explains the "Rule of Five Errors" used by the Google Compute Engine SRE team for measuring reliability. The rule adjusts alarm thresholds based on request volume to ensure accurate and fair SLOs for all users, not just large-scale ones. #observability #sre #engineering
- How does the Google Compute Engine (GCE) control plane work?
  In this clip, Alex Palcuie breaks down the role of the Google Compute Engine (GCE) control plane, detailing each step from user authentication to resource allocation and final delivery. #sre #googlecloud #observability
- Migration Superpowers
  In this clip, Will Sewell highlights elements he calls "migration superpowers," which enable large-scale changes across Monzo's services.
- Step 4 of Migrating to OpenTelemetry
  Will Sewell presents the fourth and final step in migrating from OpenTracing and Jaeger to OpenTelemetry: mass deployment and config-based features.