ServiceRank: Root cause identification of anomaly in large-scale microservice architectures
IEEE Transactions on Dependable and Secure Computing, 2021 (ieeexplore.ieee.org)
An increasing number of business applications running in the cloud are adopting the microservice architecture. This article presents the challenges and implications of diagnosing the root causes of anomalies in large-scale microservice architectures, drawing on real incidents in IBM Bluemix. To tackle these challenges, we propose ServiceRank, a novel framework for anomaly detection and root cause identification in microservice architectures. ServiceRank consists of an anomaly detector followed by a root cause analysis module, and it detects suspected abnormal services without predefined thresholds. To generalize our approach, we design a causal relationship extraction method that constructs impact graphs for investigating the root cause of a specific anomaly. To eliminate the impact of cloud design patterns on anomaly diagnosis, we propose a correlation calibration mechanism in ServiceRank and present a calibration algorithm for the circuit breaker, a typical protection pattern in microservice architectures. Finally, we design a heuristic investigation algorithm based on a second-order random walk to identify the root cause of an anomaly. Experimental results in a simulated environment and on the IBM Bluemix platform show that ServiceRank outperforms the compared approaches in accuracy and identifies the root cause service quickly when an anomaly occurs. Moreover, ServiceRank can be deployed rapidly and easily in various systems without any predefined knowledge.
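The abstract names a second-order random walk over an impact graph as the final ranking step but gives no formula. The sketch below is a hypothetical, node2vec-inspired illustration of that general idea, not ServiceRank's actual algorithm. It assumes an impact graph given as `deps` (service to the services it calls) and a per-service anomaly-correlation weight `score`, both of which in a real system would come from the detection and calibration stages. The function name `second_order_walk_rank` and the parameters `backward_weight` and `backtrack_bias` are invented for illustration. The walk is "second-order" because a hop back to the previously visited service is damped, so each transition depends on the pair (previous, current) rather than on the current service alone; services are then ranked by visit count.

```python
import random
from collections import Counter


def second_order_walk_rank(deps, score, start, steps=10_000,
                           backward_weight=0.5, backtrack_bias=0.5, seed=0):
    """Hypothetical sketch: rank candidate root-cause services by how often a
    second-order random walk over an impact graph visits them.

    deps:  service -> list of services it calls (assumed impact-graph edges).
    score: service -> non-negative anomaly-correlation weight (assumed input).
    """
    # Reverse edges so the walker can also step back toward callers.
    callers = {}
    for svc, targets in deps.items():
        for t in targets:
            callers.setdefault(t, []).append(svc)

    rng = random.Random(seed)
    visits = Counter()
    prev, cur = None, start  # prev = last *distinct* service visited
    for _ in range(steps):
        candidates, weights = [], []
        for nxt in deps.get(cur, []):        # forward hop toward a dependency
            candidates.append(nxt)
            weights.append(score.get(nxt, 0.0))
        for nxt in callers.get(cur, []):     # backward hop toward a caller, damped
            candidates.append(nxt)
            weights.append(backward_weight * score.get(nxt, 0.0))
        candidates.append(cur)               # self-loop: linger on anomalous nodes
        weights.append(score.get(cur, 0.0))

        # Second-order term: a hop back to the previous service is damped,
        # so the transition depends on (prev, cur), not on cur alone.
        weights = [w * backtrack_bias if c == prev else w
                   for c, w in zip(candidates, weights)]

        if sum(weights) <= 0:
            prev, cur = None, start          # no signal: restart at the entry point
            continue
        nxt = rng.choices(candidates, weights=weights, k=1)[0]
        if nxt != cur:
            prev = cur
        cur = nxt
        visits[cur] += 1

    return [svc for svc, _ in visits.most_common()]


# Toy impact graph with made-up weights: the front end calls api and cache,
# and api calls db, whose metrics correlate most with the anomaly.
deps = {"frontend": ["api", "cache"], "api": ["db"]}
score = {"frontend": 0.2, "api": 0.5, "db": 0.9, "cache": 0.1}
print(second_order_walk_rank(deps, score, start="frontend"))
```

With these made-up weights, `db` should rank first because its high score gives it a strong self-loop and the damped backtrack keeps the walker from drifting back to `api`. Visit-frequency ranking is only one plausible way to turn a walk into a root-cause list; the paper's own heuristic may weight or terminate the walk differently.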