Coroot, an open source observability tool powered by eBPF, went generally available with version 1.0 last week. As this tool is cloud-native, we were curious to know how it can help troubleshoot databases on Kubernetes.
In this blog post, we will see how to quickly debug PostgreSQL with Coroot and Percona Operator for PostgreSQL.
Prepare
Install Coroot
The easiest way here is to use a Helm chart. Add the repository:
    helm repo add coroot https://coroot.github.io/helm-charts
    helm repo update coroot
Install Coroot:
    helm install --namespace coroot --create-namespace coroot coroot/coroot

This will install Coroot, Prometheus with kube-state-metrics, coroot-node-agent, and ClickHouse. ClickHouse is required for profiling, logs, and tracing.
Check out the detailed installation instructions in the official documentation.
Once the pods are up and running, connect to Coroot UI through port-forwarding:
    kubectl port-forward -n coroot service/coroot 8080:8080

Now access it at http://localhost:8080/.
Deploy PostgreSQL cluster with Operator
For consistency, we will use Helm here as well. Add the repository:

    helm repo add percona https://percona.github.io/percona-helm-charts/
    helm repo update
Deploy the Operator and the cluster:
    helm install my-operator percona/pg-operator
    helm install cluster1 percona/pg-db
Verify that the database is up with kubectl get pg. It should be in a ready state.
More insights with PostgreSQL agent
Coroot provides agents for various applications, including PostgreSQL. With agents, users get more insights tailored to their application. To install the agent with Percona Operator, we need to add it as a sidecar and configure the user on the PostgreSQL side.
Under the instances[] section, add the following annotations and the sidecars entry:

    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '80'
    sidecars:
    - name: coroot-pg-agent
      image: "ghcr.io/coroot/coroot-pg-agent:latest"
      ports:
      - containerPort: 80
        protocol: TCP
      resources:
        requests:
          cpu: 100m
          memory: 100Mi
      env:
      - name: DSN
You can read more about user management in our Operator in the documentation.
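As a rough illustration of the user side, the fragment below sketches how a dedicated user for the agent might be declared in the cluster custom resource. The user name coroot and the postgres database are assumptions made for this example, not values from the setup above, and the exact placement of the users list (under spec in the custom resource, or in the Helm chart values) should be checked against the Operator documentation.

```yaml
# Sketch only: a dedicated PostgreSQL user for the monitoring agent.
# "coroot" and "postgres" are hypothetical names chosen for illustration.
spec:
  users:
    - name: coroot        # user the agent's DSN would connect as
      databases:
        - postgres        # database the user is given access to
```

The agent reads statistics views that cover all users, which in stock PostgreSQL usually means the user also needs membership in the predefined pg_monitor role. That grant is a separate step and is not shown here.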
Coroot in action
Discovery
Coroot automatically discovers applications in Kubernetes (as many other eBPF tools do). For specific applications it shows all the components that it interacts with. For example, for my PostgreSQL cluster it looked like this:
I don’t have a lot going on, so there are just a few other containers that interact with my cluster.
SLO
A Service Level Objective (SLO) is a target that quickly shows whether the application is meeting or breaking its service-level agreement. I was restarting my cluster and nodes a lot, and that immediately showed some issues with SLO:
Once Coroot detects an SLO budget breach, it highlights it on the graphs. This issue is visible across all the graphs for easier debugging.
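To make the idea of a budget breach more concrete, here is a small sketch of the arithmetic behind an availability objective. All numbers are hypothetical, and this is not how Coroot computes its SLOs internally. It only shows the general idea: the objective leaves a fixed allowance of failed requests, and going over it is a breach.

```shell
# Hypothetical numbers for illustration only; not taken from Coroot.
total_requests=100000   # requests observed in the SLO window
failed_requests=1500    # requests that failed in that window
target_percent=99       # availability objective, in percent

# Error budget: how many requests may fail while still meeting the objective.
error_budget=$(( total_requests * (100 - target_percent) / 100 ))

# The budget is breached when observed failures exceed the allowance.
if [ "$failed_requests" -gt "$error_budget" ]; then
  slo_status="breached"
else
  slo_status="ok"
fi

echo "allowed failures: $error_budget, observed: $failed_requests, status: $slo_status"
```

With these inputs, a 99% objective over 100000 requests allows 1000 failures, so 1500 failures put the service over budget.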
Logs
Built-in centralized logging is a must for a modern observability system. It definitely helps to debug complex applications with multiple components. A highly available PostgreSQL cluster has pgBouncer pods, primary and replica nodes, and backup containers. All these components are hard to debug without proper logging.
Profiling
There is nothing better for good debugging than a nice flame graph. Looking into CPU profiling can tell a lot about what PostgreSQL and its components are doing and where resources are going. Obviously, profiling your own applications might be even more valuable.
PostgreSQL-specific metrics
For PostgreSQL-specific metrics, Coroot relies on pg_stat_statements and pg_stat_activity. It shows basic information about queries and connections. From an infrastructure perspective, I liked how it quickly pointed out when specific instances were restarted and when the switchover between primary and replica happened. For example, around the time the incident was detected, it is clearly visible that there were some restarts of the nodes.
You will see the queries that consume the most resources, as well as how the load impacts the replication lag. I ran sysbench to generate some load, and this is how it impacted my database:
Coroot’s use of eBPF technology for observability offers a powerful, user-friendly solution that enhances visibility and simplifies troubleshooting. The addition of PostgreSQL-specific agents further tailors this tool to the needs of database administrators by providing precise metrics and logs. Whether you are managing service-level objectives, tracking application interactions, or maintaining database health, Coroot equips you with the necessary insights to efficiently diagnose and resolve issues, ensuring your database operations run smoothly and reliably. This synergy between advanced monitoring tools and Kubernetes operations paves the way for more resilient and performant database environments in cloud-native ecosystems.
Try out Percona Operator for PostgreSQL
Try out Coroot