Karim Traiaia 💭's Post

Last Call! 📣 🚂 We're gearing up for our next Observability Engineering Community London meetup, happening this Wednesday! This time, we've got two great talks by Dinesh Nithyanandam, Lead for Observability, Performance, and Reliability at A.P. Moller - Maersk, and Carly Richmond, Principal Developer Advocate & Manager at Elastic.

👉 Dinesh will discuss how A.P. Moller - Maersk leverages Flagger to automate canary deployments across its extensive infrastructure: 120 clusters and over 1,000 microservices running in a multi-cloud environment with AKS and GKE. By integrating with monitoring tools like Prometheus, Flagger continuously evaluates key SLOs, ensuring that new versions are only fully deployed if they meet defined reliability and performance standards.

👉 Carly will draw on her experience building applications for investment banking to explore why validating long-term feedback on feature adoption is challenging. She will discuss how combining Real User Monitoring agents, such as Elastic RUM, with OpenTelemetry tracing for backend services and Elastic Observability can help quantify user satisfaction and feature adoption.

📅 Event Details:
Date: Wednesday, September 11th
Time: 6:00 PM
Location: 24 High Holborn · London

See you all there! #Observability #Engineering #SRE #SLO #devops
Observability Engineering Meetup | September Edition, Wed, Sep 11, 2024, 6:00 PM | Meetup
meetup.com
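For readers curious what the metrics-driven gate Flagger automates looks like in practice, here is a minimal Python sketch, assuming a Prometheus success-rate SLI, an illustrative metric name and labels, and a made-up threshold; it is not Maersk's or Flagger's actual configuration (Flagger performs this analysis natively from its Canary custom resource), just the shape of the check.

```python
# Minimal sketch of a metrics-driven canary gate (illustrative only).
# The Prometheus URL, metric name, labels, and SLO threshold are assumptions.
import requests

PROMETHEUS_URL = "http://prometheus:9090"  # assumed in-cluster address
SUCCESS_RATE_SLO = 0.99                    # assumed SLO: 99% non-5xx responses


def canary_success_rate(service: str, window: str = "5m") -> float:
    """Return the fraction of non-5xx requests served by the canary."""
    query = (
        f'sum(rate(http_requests_total{{service="{service}",track="canary",status!~"5.."}}[{window}]))'
        " / "
        f'sum(rate(http_requests_total{{service="{service}",track="canary"}}[{window}]))'
    )
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": query})
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0


def should_promote(service: str) -> bool:
    """Gate the rollout: promote only if the canary meets the SLO."""
    rate = canary_success_rate(service)
    print(f"{service}: canary success rate = {rate:.4f} (SLO {SUCCESS_RATE_SLO})")
    return rate >= SUCCESS_RATE_SLO


if __name__ == "__main__":
    print("promote" if should_promote("checkout") else "hold / roll back")
```

In a real Flagger setup the same decision is expressed declaratively as metric templates and thresholds on the Canary resource, and Flagger runs the loop for you during progressive traffic shifting.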
More Relevant Posts
-
Amazing. The best part is how some of these tools are fostering greater #democratization within engineering teams by enabling knowledge sharing, enhanced team collaboration, and transparency. It makes for a tremendous improvement in organizational work culture and productivity. #engineeringteams #trends
𝗧𝗿𝗲𝗻𝗱𝘀 𝗶𝗻 𝗣𝗹𝗮𝘁𝗳𝗼𝗿𝗺 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴: After interviewing some of the leading SRE, platform, and developer experience teams at growing startups and large enterprises, we identified the leading trends in Platform Engineering that teams should be thinking about:
> Transitioning to an open-source observability stack
> Optimising alerting
> Focus on developer experience
Read more on users' favourite OSS observability stacks, and more, in our recent publication here: https://lnkd.in/grYj3Sv9
Emerging Trends in Platform Engineering
notes.drdroid.io
-
Hey everyone! Welcome back to another exciting edition of the Observability Engineering London meetup! This time, we're diving deep into two critical aspects of engineering: dashboards and runbooks, and large-scale migrations. On Thursday, October 17th, we'll be joined by two fantastic speakers:

First up, we have Colin Douch, formerly the Observability Tech Lead at Cloudflare. Colin will explore the allure of creating hyper-specific dashboards and runbooks, and why this often does more harm than good in incident response. He'll share insights on how to avoid the common pitfalls of hyper-specialization and provide a roadmap for using these tools more effectively in SRE practices.

Next, Will Sewell, Platform Engineer at Monzo Bank, will take us behind the scenes of how Monzo runs migrations across a staggering 2,800 microservices. Will's talk will focus on Monzo's approach to centrally driven migrations, with a specific look at their recent move from OpenTracing to OpenTelemetry.

This is shaping up to be a great event for anyone working with observability, incident response, or large-scale infrastructure. See you there! #observability #opentelemetry #sre #devops #platform #engineering
Observability Engineering Meetup | October Edition, Thu, Oct 17, 2024, 6:00 PM | Meetup
meetup.com
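For anyone curious about the target API of an OpenTracing-to-OpenTelemetry migration like the one Will describes, here is a minimal Python sketch of OpenTelemetry span creation; the service, span, and attribute names are invented for illustration, and this is not Monzo's code or tooling.

```python
# Minimal OpenTelemetry tracing sketch (illustrative; not Monzo's setup).
# Service, span, and attribute names below are invented for the example.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Configure a tracer provider that prints finished spans to stdout.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("payments-service")


def process_payment(payment_id: str) -> None:
    # start_as_current_span replaces the old OpenTracing start_active_span pattern.
    with tracer.start_as_current_span("process-payment") as span:
        span.set_attribute("payment.id", payment_id)
        # ... business logic would go here ...


process_payment("pay_123")
```

OpenTelemetry also ships an OpenTracing compatibility shim, which is what typically lets a large estate migrate service by service rather than all at once.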
-
We're back with another edition of the Observability Engineering Community London Meetup, and it is shaping up to be one of the best ones so far, with great talks from Colin Douch and Will Sewell 🤩 Spots are limited, so make sure you save yours! #observability #engineering #sre #devops #platformengineering
-
The Observability Engineering Community London meetup is back for another edition! This time, we're diving deep into dashboards, runbooks, and large-scale migrations.

First up, we have Colin Douch, formerly the Observability Tech Lead at Cloudflare. Colin will explore the allure of creating hyper-specific dashboards and runbooks, and why this often does more harm than good in incident response. He'll share insights on how to avoid the common pitfalls of hyper-specialization and provide a roadmap for using these tools more effectively in SRE practices.

Next, Will Sewell, Platform Engineer at Monzo, will take us behind the scenes of how Monzo runs migrations across a staggering 2,800 microservices. Will's talk will focus on Monzo's approach to centrally driven migrations, with a specific look at their recent move from OpenTracing to OpenTelemetry.

Spots are limited, so make sure you save your spot! #observability #sre #monitoring #engineering #otel #devops
-
Microservices can be made more resilient by applying chaos engineering principles, which involve intentionally introducing faults, failures, and disruptions to test and strengthen the system. If you want to know more, check out this post below from Sayan Moitra!
🚀 Strengthening Microservices with Chaos Engineering in EKS and Istio 🚀

In the ever-evolving world of cloud-native applications, ensuring resilience isn't just a luxury: it's a necessity. This week, I dove deep into chaos engineering, leveraging Chaos Mesh, Istio, and Amazon EKS to simulate complex, real-world failure scenarios across interconnected microservices.

🔍 The Challenge: How do you ensure your services can handle, simultaneously:
Network latency between critical services.
Random pod failures in high-traffic components.
Resource contention leading to performance degradation.

💡 The Solution: Using Chaos Mesh, I orchestrated:
1. Simultaneous Network Chaos: Injected delays between the reviews and ratings services.
2. Pod Failures: Randomly terminated 50% of pods in the productpage service.
3. Resource Stress: Simulated CPU overload in the ratings service.

With Istio, we monitored traffic rerouting, retries, and circuit-breaking mechanisms in action. And with Grafana, we visualized how our system responded under pressure.

📊 Key Insights:
1️⃣ Chaos engineering helps uncover hidden dependencies and vulnerabilities.
2️⃣ Observability tools like Grafana and Istio's telemetry (Kiali) make analyzing resilience strategies easier.
3️⃣ Iterative testing fosters a culture of reliability and operational excellence.

🎯 Takeaway: Simulating failure isn't about breaking your system; it's about preparing it to succeed under the toughest conditions. By proactively identifying weak points, you build applications that users can trust, no matter what.

💡 Curious to know more? I've written a detailed blog with a step-by-step demo. Check it out here: 👉 https://lnkd.in/dnCESWwS

#CloudNative #ChaosEngineering #Kubernetes #AmazonEKS #Istio #Resilience #DevOps
Advanced Chaos Engineering: Chaos Mesh in EKS with Istio for Multi-Service Resilience Testing
moitrasayan007.medium.com
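To give a feel for what the first experiment looks like in practice, here is a hedged Python sketch that submits a Chaos Mesh NetworkChaos resource through the Kubernetes API to inject latency into one service; the namespace, label selectors, latency values, and duration are assumptions for illustration, not the exact manifest from the blog.

```python
# Illustrative sketch: create a Chaos Mesh NetworkChaos experiment that injects
# latency into the Bookinfo "reviews" service. Namespace, labels, delay, and
# duration are assumptions for the example, not the blog's exact manifest.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in-cluster

network_chaos = {
    "apiVersion": "chaos-mesh.org/v1alpha1",
    "kind": "NetworkChaos",
    "metadata": {"name": "reviews-latency", "namespace": "default"},
    "spec": {
        "action": "delay",
        "mode": "all",
        "selector": {
            "namespaces": ["default"],
            "labelSelectors": {"app": "reviews"},
        },
        "delay": {"latency": "500ms", "jitter": "100ms"},
        "duration": "5m",
    },
}

api = client.CustomObjectsApi()
api.create_namespaced_custom_object(
    group="chaos-mesh.org",
    version="v1alpha1",
    namespace="default",
    plural="networkchaos",
    body=network_chaos,
)
print("NetworkChaos experiment submitted; watch Grafana/Kiali for the impact.")
```

The same resource is usually applied as a YAML manifest with kubectl; driving it from the Kubernetes client is just a convenient way to script several simultaneous experiments, as the post describes.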
-
Great post by Tina Huang & Divanny Lamas. Thanks for sharing Gergely Orosz's article!

I agree with Tina's insightful point about the shift toward a more generalized engineering skill set, supported by AI, reflecting a broader trend in the tech industry toward adaptability, resilience, and deep integration of technology and human insight. This resonates strongly: AI can indeed augment us by handling specialized, granular tasks, allowing us to focus on the bigger, holistic picture. Divanny's argument for a return to simplicity is also well supported: spot on!

The article mentioned below raises some thought-provoking points. However, with the above in mind, I would respectfully disagree with the point about a shift back from microservices to monolithic architectures. Microservices were introduced to promote decoupling and simplification, not to add complexity. While managing distributed components does introduce some overhead, the key downsides mentioned in the article are not inherent to the microservices architecture itself. Rather, they stem from a lack of proper "architecture" thinking and domain modeling. Organizations that adopted microservices without a deep understanding of the big picture had to pay the price. Re-introducing proper "architecture" thinking and domain modeling, as Uber did in 2020, is a natural and necessary part of software development, not a shift away from microservices.

I would argue that maintaining large monolithic codebases is even harder when the original developers who understood the logic are no longer around. This often leads to band-aid solutions and spaghetti code, which makes the whole system even more vulnerable and unstable, and deepens the "on-call hell". While managing the proliferation of services is indeed challenging, that's where tools and automation can help tremendously. As the article rightly points out, "the decade of microservices has resulted in far better tooling to build, operate and manage microservices." In the current context, microservices have actually increased the importance of Site Reliability Engineering (SRE) and AI, presenting a huge opportunity (and need) in those areas.

Ultimately, I believe that automating the deployment and management of distributed (micro)services is a more surmountable challenge than maintaining overly complex monolithic logic, especially as tooling and AI capabilities continue to evolve.

#monoliths #architecture #microservices #SRE #tooling
Is SRE a zero interest rate phenomenon? In an era where operations has hit a crisis point, and our days of free money are behind us, we need a new approach that doesn't involve throwing more bodies at the problem.

Too many organizations I've worked with see SRE as a panacea for overly complex architecture. It's dehumanizing for the people stuck in constant fire-fighting mode, and unsustainable for the teams dependent on them. I'm personally excited to see a return to simplicity: focus less on tech debt, and more on the business requirements. Let's value breadth in engineering again; full-stack oriented teams and operational ownership by developers make for better products and better culture.

Perhaps my one nitpick is that I *do* think AI will play a big role in helping in this transitional period and beyond. It's impossible to maintain documentation, and no one person can map the entirety of these systems and their dependencies in their heads. The tools are early, but the days of engineering and operations knowledge bases that are always up to date and never need to be maintained are on the immediate horizon ;)

Great article, very much a must read for anyone running technical teams.
The end of 0% interest rates: what it means for software engineering practices
newsletter.pragmaticengineer.com