Arcjet’s Post

How do we achieve our 25ms p95 response time SLA?

1. Local-first security

The first step to fast response times is processing as much as possible locally. Although your interaction with the Arcjet SDK is in JS, most of the analysis happens inside a WebAssembly module. We bundle this with the SDK, and it runs in-process without requiring any additional software like Redis or an agent (see the first sketch below).

2. A low-latency gRPC API in every cloud region

Not everything can be analyzed locally. Rate limits need to be tracked across requests, and we operate an IP reputation database. This is all handled via our API: if a decision can't be taken locally, our API augments the analysis with a final decision. Because this blocks requests to your application, we set ourselves a p95 latency goal of 20ms, plus a few milliseconds' allowance for network latency. This means the API needs to run in the same cloud region as your application, which is why we're currently deployed to 22 different regions.

3. Persistent HTTP/2 connections

The slowest part of making a request is often establishing the initial connection. A normal TCP handshake requires 1 round trip and the TLS handshake requires another 2, so if a round trip takes just 1ms, that's easily 3ms spent just opening the connection, before the API request itself takes yet another round trip. A few months ago, we introduced support for multiplexing requests over a persistent HTTP/2 connection: when the Arcjet SDK is instantiated, we open a connection to our API and keep it open across multiple requests (sketched below).

4. Smart caching

If a client request is denied, the chances are their next request should also be denied, but it depends on which rule caused the deny decision. This is where our smart caching comes in. When an email verification request results in a deny, that should not be cached, because the user may simply have made a typo; their next request should be freshly re-evaluated. But if a rate limit rule denies a request, we can look at the configuration to determine whether another request will also be denied. Not only do we consider whether the client is already over the rate limit, the time to cache the result is determined by the rate limit configuration: if the limit applies for a fixed window over the next 30 seconds, then we cache for 30 seconds (sketched below).

Results

The result is a p95 API response time of around 25ms, within our goal range of 20-30ms. Our p50 is around 4ms, but we pay more attention to the outliers revealed by the p95 because those are what cause real user experience degradation.
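For readers curious what "running the analysis in-process" can look like in practice, here is a minimal, illustrative TypeScript sketch of loading a bundled WebAssembly module and calling one of its exports. The module filename and the analyze export are hypothetical stand-ins, not Arcjet's actual Wasm interface.

```ts
import { readFile } from "node:fs/promises";

// Illustrative only: "analyzer.wasm" and its "analyze" export are hypothetical
// placeholders for whatever module and interface the real SDK bundles.
async function loadAnalyzer() {
  const bytes = await readFile(new URL("./analyzer.wasm", import.meta.url));
  const { instance } = await WebAssembly.instantiate(bytes, {});
  // The Wasm module runs inside the same process as the JS SDK - no Redis,
  // no sidecar agent, just a call across the JS/Wasm boundary.
  return instance.exports as unknown as { analyze: (input: number) => number };
}

const analyzer = await loadAnalyzer();
console.log(analyzer.analyze(42));
```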
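The persistent-connection idea from step 3 maps onto Node's built-in http2 module: open one session up front and multiplex every subsequent request over it, so no call after the first pays the TCP + TLS handshake cost. This is a simplified sketch of the session mechanics, not Arcjet's SDK code; the endpoint and path are placeholders, and the real API speaks gRPC over the connection.

```ts
import { connect, constants } from "node:http2";

// One long-lived session, opened once (e.g. when the SDK is instantiated).
// "https://decide.example.com" is a placeholder, not a real Arcjet endpoint.
const session = connect("https://decide.example.com");

// Each call reuses the session: no new TCP or TLS handshake, just another
// multiplexed HTTP/2 stream on the existing connection.
function decide(body: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const req = session.request({
      [constants.HTTP2_HEADER_METHOD]: "POST",
      [constants.HTTP2_HEADER_PATH]: "/v1/decide", // hypothetical path
      "content-type": "application/json",
    });
    let data = "";
    req.setEncoding("utf8");
    req.on("data", (chunk) => (data += chunk));
    req.on("end", () => resolve(data));
    req.on("error", reject);
    req.end(body);
  });
}
```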
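The caching logic from step 4 boils down to two questions: is this deny predictable, and for how long? A small sketch under made-up rule shapes (the type and field names below are illustrative, not Arcjet's actual types):

```ts
type Rule =
  | { type: "emailVerification" }
  | { type: "fixedWindowRateLimit"; windowSeconds: number };

interface Decision {
  conclusion: "allow" | "deny";
  rule: Rule;
}

// Illustrative only: decide whether a deny is worth caching, and for how long.
function cacheTtlSeconds(decision: Decision): number {
  if (decision.conclusion !== "deny") return 0;

  switch (decision.rule.type) {
    case "emailVerification":
      // A typo could be fixed on the next attempt, so re-evaluate every time.
      return 0;
    case "fixedWindowRateLimit":
      // The client stays over the limit until the window resets, so caching
      // for the window length is safe, e.g. a 30s fixed window means the
      // deny can be cached for up to 30s.
      return decision.rule.windowSeconds;
  }
}

const denyCache = new Map<string, number>(); // key -> expiry (ms since epoch)

function cacheDeny(key: string, decision: Decision) {
  const ttl = cacheTtlSeconds(decision);
  if (ttl > 0) denyCache.set(key, Date.now() + ttl * 1000);
}
```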
