DZone Spotlight

Sunday, September 22 View All Articles »

Low-Level Optimizations in ClickHouse: Utilizing Branch Prediction and SIMD To Speed Up Query Execution

By Taras Baranyuk

CORE

In data analysis, the need for fast query execution and data retrieval is paramount. Among numerous database management systems, ClickHouse stands out for its originality and, one could say, a specific niche, which, in my opinion, complicates its expansion in the database market. I’ll probably write a series of articles on different features of ClickHouse, and this article will be a general introduction with some interesting points that few people think about when using various databases. ClickHouse was created from scratch to process large volumes of data as part of analytical tasks, starting with the Yandex.Metrica project in 2009. The development was driven by the need to process many events generated by millions of websites to provide real-time analytical reports for Metrica’s clients. The requirements were very specific, and none of the existing databases at that time met the criteria. Let’s take a look at these requirements: Maximize query performance Real-time data processing Ability to store petabytes of data Fault tolerance in terms of data centers Flexible query language The list is pretty obvious, except perhaps for “fault tolerance in terms of data centers.” Let me expand on this point a bit more. Living in countries with unstable infrastructure and high infrastructure risks, ClickHouse developers face various unforeseen situations, such as accidental damage to cables, power outages, and flooding with water from a burst pipe that, for some reason, was near the servers. All of this can interrupt the work of data centers. Yandex strategically designs services, including the database for Metrics, to ensure continuous operation even under such extreme conditions. This requirement is especially true given the need to process and store petabytes of data in real time. It’s as if the database was designed to survive an “infrastructure apocalypse.” There was nothing suitable on the market at that time. Only a few databases could realize, at most, three out of five parameters and that with some pretensions, five was out of the question. Key Features ClickHouse focuses on interactive queries that run in a second or faster. This is important because a user won’t wait if a report takes longer to load. Analysts also benefit from instant query responses, allowing them to ask more queries and focus on working with the data, improving the quality of analysis. ClickHouse uses SQL; it’s obvious. The advantage is that SQL is known to all analysts. However, SQL is not flexible for arbitrary data transformations, so ClickHouse has added many extensions and features. It is rare for ClickHouse to aggregate data in advance to maintain the flexibility and accuracy of reports. Storing individual events avoids loss of aggregation. Developers working with ClickHouse should allocate event attributes in advance and pass them to the system in a structured form, avoiding using unstructured formats to preserve the interactivity of queries. How To Execute a Query Quickly Quick read: Only the required columns Read locality, i.e., index is needed Data compression 2. Fast processing: Block processing Low-level optimizations 1. Quick Read The easiest way to speed up a query in basic analytics scenarios is to use columnar data organization, i.e., storing data by column. This allows you to load only those columns needed for a particular query. When the number of columns reaches hundreds, loading all the data will slow down the system — and this is a scenario we need to avoid! Since the data usually does not fit into RAM, organizing local readings from disk is necessary. Full loading of the entire table is inefficient, so it is required to use an index to limit reading to only the essential parts of the data. However, even when reading this part of the data, access to the data must be localized — moving around the disk in search of the necessary data will significantly slow down the query execution. Finally, data must be compressed. This reduces their volume and significantly saves disk bandwidth, which is critical for high processing speeds. 2. Fast Processing And now, finally, I’m getting to the point where I’m summarizing the primary purpose of this article. Once the data has been read, it needs to be processed very quickly, and ClickHouse provides many mechanisms for this. The main advantage is the processing of data in blocks. A block is a small part of a table consisting of several thousand rows. This is important because ClickHouse works like an interpreter, and interpreters can be notoriously slow. However, if you spread the processing overhead over thousands of rows, this becomes imperceptible. Working with blocks allows using SIMD instructions, significantly speeding up data processing. When analyzing weblogs, a block may contain data on thousands of queries. These queries are processed simultaneously using SIMD instructions, providing high performance and minimal time consumption. Block processing also has a favorable effect on processor cache utilization. When a block of data is loaded into the cache, processing it in the cache is much faster than if the data were constantly unloaded and loaded from main memory. For example, when working with large analytics tables in ClickHouse, caching allows you to process data faster and minimize memory access costs. ClickHouse also uses many low-level optimizations. For example, data aggregation and filtering functions are designed to minimize the number of operations and maximize the capabilities of modern processors. SIMD Again, in ClickHouse, data is processed in blocks that include multiple columns with a set of rows. By default, the maximum block size is 65,505 rows. A block is an array of columns, each an array of primitive-type data. This approach to array processing in the engine provides several key benefits: Optimizes cache and CPU pipeline usage Allows the compiler to automatically vectorize code using SIMD instructions to improve performance Let’s start with the difficulties associated with SIMD implementation: There are many SIMD instruction sets, and each requires a different implementation. Not all processors, especially older or low-cost models, support modern SIMD instruction sets. Platform-dependent code is challenging to develop and maintain, which increases the likelihood of bugs. Incorporating platform-dependent code requires a specific approach for each compiler, making it difficult to use in different environments. Besides, you should consider that when developing code using SIMD, it is important to test it on different architectures to avoid compatibility and correctness problems. So, how were these challenges met? Briefly: Insertion and generation of platform-specific code are done through macros, simplifying the management of different architectures. All platform-specific objects and functions are in separate namespaces, which improves code organization and support. If the code is unsuitable for any architecture, it is automatically excluded, and the current platform is automatically determined. The optimal implementation is selected from the available options using the Bayesian multi-armed bandit method, which allows dynamically selecting the most efficient approach depending on the execution conditions. This approach allows you to consider different architectural features and customize your code for a specific platform without excessive complexity or the risk of bugs. A Little Bit of Code If you look at the code, the most crucial class that takes care of the basic functionality of implementation selection is ImplementationSelector. Let’s take a look at what this class is all about: template <typename FunctionInterface> class ImplementationSelector : WithContext { public: using ImplementationPtr = std::shared_ptr<FunctionInterface>; explicit ImplementationSelector(ContextPtr context_) : WithContext(context_) {} ColumnPtr selectAndExecute(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const { if (implementations.empty()) throw Exception(ErrorCodes::NO_SUITABLE_FUNCTION_IMPLEMENTATION, "There are no available implementations for function " "TODO(dakovalkov): add name"); bool considerable = (input_rows_count > 1000); ColumnPtr res; size_t id = statistics.select(considerable); Stopwatch watch; if constexpr (std::is_same_v<FunctionInterface, IFunction>) res = implementations[id]->executeImpl(arguments, result_type, input_rows_count); else res = implementations[id]->execute(arguments, result_type, input_rows_count); watch.stop(); if (considerable) { statistics.complete(id, watch.elapsedSeconds(), input_rows_count); } return res; } template <TargetArch Arch, typename FunctionImpl, typename ...Args> void registerImplementation(Args &&... args) { if (isArchSupported(Arch)) { const auto & choose_impl = getContext()->getSettingsRef().function_implementation.value; if (choose_impl.empty() || choose_impl == detail::getImplementationTag<FunctionImpl>(Arch)) { implementations.emplace_back(std::make_shared<FunctionImpl>(std::forward<Args>(args)...)); statistics.emplace_back(); } } } private: std::vector<ImplementationPtr> implementations; mutable detail::PerformanceStatistics statistics; }; It is this class that provides flexibility and scalability when working with different processor architectures, automatically selecting the most efficient function implementation based on statistics and system characteristics. The main points to look out for are: FunctionInterface: This is the interface of the function that is used in the implementation. This is usually IFunction or IExecutableFunctionImpl, but it can also be any interface with an execute method. This parameter specifies which particular implementation will be used to execute the function. context_: This is a pointer to a context (e.g., ContextPtr) that stores information about the current execution environment. This allows the implementer to choose an optimal strategy based on the context information. SelectAndExecute: This method selects the best implementation based on processor architecture and statistics of previous runs. Depending on the function, the interface calls either executeImpl or execute. The default selection will be made if there is not enough data to gather statistics (e.g., too few rows). registerImplementation: This is a method that registers a new function implementation for the specified architecture. If the architecture is supported by the processor, an instance of the implementation is created and added to the list of available implementations. std::vector<ImplementationPtr> implementations: This stores all registered implementations of the function. Each vector element is a smart pointer to a specific implementation, depending on the architecture. mutable detail::PerformanceStatistics statistics: Performance statistics collected from previous runs. It is protected by an internal mutex, which allows you to safely collect and analyze data about execution time and the number of processed rows. The code uses macros to generate platform-dependent code, making managing different implementations for different processor architectures easy. Example of Using ImplementationSelector As an example, let’s look at how UUID generation is implemented. And a little bit of code again: #include <DataTypes/DataTypeUUID.h> #include <Functions/FunctionFactory.h> #include <Functions/FunctionHelpers.h> #include <Functions/FunctionsRandom.h> namespace DB { #define DECLARE_SEVERAL_IMPLEMENTATIONS(...) \ DECLARE_DEFAULT_CODE (__VA_ARGS__) \ DECLARE_AVX2_SPECIFIC_CODE(__VA_ARGS__) DECLARE_SEVERAL_IMPLEMENTATIONS( class FunctionGenerateUUIDv4 : public IFunction { public: static constexpr auto name = "generateUUIDv4"; String getName() const override { return name; } size_t getNumberOfArguments() const override { return 0; } bool isDeterministic() const override { return false; } bool isDeterministicInScopeOfQuery() const override { return false; } bool useDefaultImplementationForNulls() const override { return false; } bool isSuitableForShortCircuitArgumentsExecution(const DataTypesWithConstInfo & /*arguments*/) const override { return false; } bool isVariadic() const override { return true; } DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override { FunctionArgumentDescriptors mandatory_args; FunctionArgumentDescriptors optional_args{ {"expr", nullptr, nullptr, "any type"} }; validateFunctionArguments(*this, arguments, mandatory_args, optional_args); return std::make_shared<DataTypeUUID>(); } ColumnPtr executeImpl(const ColumnsWithTypeAndName &, const DataTypePtr &, size_t input_rows_count) const override { auto col_res = ColumnVector<UUID>::create(); typename ColumnVector<UUID>::Container & vec_to = col_res->getData(); size_t size = input_rows_count; vec_to.resize(size); /// RandImpl is target-dependent and is not the same in different TargetSpecific namespaces. RandImpl::execute(reinterpret_cast<char *>(vec_to.data()), vec_to.size() * sizeof(UUID)); for (UUID & uuid : vec_to) { UUIDHelpers::getHighBytes(uuid) = (UUIDHelpers::getHighBytes(uuid) & 0xffffffffffff0fffull) | 0x0000000000004000ull; UUIDHelpers::getLowBytes(uuid) = (UUIDHelpers::getLowBytes(uuid) & 0x3fffffffffffffffull) | 0x8000000000000000ull; } return col_res; } }; ) // DECLARE_SEVERAL_IMPLEMENTATIONS #undef DECLARE_SEVERAL_IMPLEMENTATIONS class FunctionGenerateUUIDv4 : public TargetSpecific::Default::FunctionGenerateUUIDv4 { public: explicit FunctionGenerateUUIDv4(ContextPtr context) : selector(context) { selector.registerImplementation<TargetArch::Default, TargetSpecific::Default::FunctionGenerateUUIDv4>(); #if USE_MULTITARGET_CODE selector.registerImplementation<TargetArch::AVX2, TargetSpecific::AVX2::FunctionGenerateUUIDv4>(); #endif } ColumnPtr executeImpl(const ColumnsWithTypeAndName & arguments, const DataTypePtr & result_type, size_t input_rows_count) const override { return selector.selectAndExecute(arguments, result_type, input_rows_count); } static FunctionPtr create(ContextPtr context) { return std::make_shared<FunctionGenerateUUIDv4>(context); } private: ImplementationSelector<IFunction> selector; }; REGISTER_FUNCTION(GenerateUUIDv4) { factory.registerFunction<FunctionGenerateUUIDv4>(); } } The code above contains the generateUUIDv4 function, which generates a random UUID and can choose the best implementation depending on the processor architecture (e.g., using SIMD instructions on AVX2-enabled processors). How It Works Declaring Multiple Implementations The DECLARE_SEVERAL_IMPLEMENTATIONS macro declares multiple versions of a function depending on the processor architecture. In this case, two implementations are declared: the standard (default) and AVX2-enabled version for processors supporting the corresponding SIMD instructions. FunctionGenerateUUIDv4 Class This class inherits from the IFunction, which we have already met in the previous section, and implements the basic logic of the UUID generation function. getName(): Returns the name of the function — generateUUUIDv4 getNumberOfArguments(): Returns 0 since the function takes no arguments isDeterministic(): Returns false since the function’s result changes with each call getReturnTypeImpl(): Determines the function’s return data type, the UUID executeImpl(): This is the main part of the function where UUID generation is performed UUID Generation The executeImpl() method generates a vector of UUIDs for all rows (defined by the input_rows_count variable). randimpl::execute is used to generate random bytes that populate each entry in the column. Each UUID is then modified as required by RFC 4122 for UUID v4. This includes setting certain bits in the high and low parts of the UUID to indicate version and variant. Selecting the Optimal Implementation The second version of the FunctionGenerateUUIDv4 class uses ImplementationSelector, which allows you to select the optimal implementation of a function depending on the processor architecture. selector.registerImplementation(): The constructor registers two implementations: the default (Default) and for AVX2-enabled processors (if USE_MULTITARGET_CODE is enabled). selectAndExecute(): The executeImpl() method calls this method, which selects the most efficient implementation of the function based on the architecture and statistics of previous runs. Registering the Function At the end of the code, the function is registered in the function factory using the REGISTER_FUNCTION(GenerateUUIDv4) macro. This allows ClickHouse to use it in SQL queries. Now let’s look at the operation of the code step by step: When calling the generateUUIDv4 function, ClickHouse first checks which processor architecture is being used. Depending on the architecture (for example, whether the processor supports AVX2), the best function implementation is selected using ImplementationSelector. The function generates random UUIDs using the RandImpl::execute method and then modifies them according to the UUID v4 standard. The result is returned as a UUID column that is ready for queries. Thus, processors without AVX2 support will use the standard implementation, and processors with AVX2 support will use an optimized version with SIMD instructions to speed up UUID generation. Some Statistics For a query like this: SELECT count() FROM ( SELECT generateUUIDv4() AS uuid FROM numbers(100000000) ) … you get some nice speed gain numbers at the expense of SIMD. Opinion From my experience with ClickHouse, I can say that there are many things under the hood that greatly simplify the lives of analysts, data scientists, MLEs, and even DevOps. All this functionality is available absolutely free of charge and with a relatively low entry threshold. There is no perfect database, just as there is no ideal solution to any problem. But I believe that ClickHouse is close enough to this limit. And it would be a significant omission not to try it as one of the practical tools for creating large systems. More

Optimizing Cost and Carbon Footprint With Smart Scaling Using SQS Queue Triggers: Part 1

By Varun Dixit

The rising demand for sustainable and eco-friendly computing solutions has given rise to the area of green computing. According to Wikipedia, Green computing, green IT (Information Technology), or ICT sustainability, is the study and practice of environmentally sustainable computing or IT. With the rise of cloud computing in the last few decades, green computing has become a big focus in the design and architecture of software systems in the cloud. This focus on optimizing the resources for energy efficiency allows us to focus on cost optimization in addition to reducing the carbon footprint. AWS is one of the cloud providers leading the movement from the front. AWS is investing heavily in renewable energy and designing its data centers with efficiency in mind. According to the Amazon Climate Pledge, Amazon aims to achieve net-zero carbon emissions by 2040 and has already made significant strides, including powering its operations with 100% renewable energy by 2025. Additionally, AWS provides tools like the AWS Carbon Footprint Calculator, enabling customers to monitor and reduce their carbon emissions, thereby fostering a more sustainable digital ecosystem. In this article, we will explore a common approach for reducing carbon emissions: exploiting the temporal flexibility inherent to many cloud workloads by executing them in periods with the greenest energy and suspending them at other times. We will explore how we can use AWS SQS and AWS Autoscaling to optimize the use of EC2 compute capacity to reduce carbon emissions. What Is AWS Autoscaling? According to AWS, AWS Auto Scaling monitors your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost. Using AWS Auto Scaling, it’s easy to set up application scaling for multiple resources across multiple services in minutes. Figure 1: Illustration of Auto Scaling in AWS What Is Amazon SQS? Amazon Simple Queue Service (Amazon SQS) is a fully managed message queuing service offered by AWS. It enables the decoupling and scaling of microservices, distributed systems, and serverless applications. SQS allows you to send, store, and receive messages between software components without losing messages or requiring other services to be available. This decoupled architecture is excellent for the use case for event-based services or task queuing kind of workloads where one service sends a task to be executed to the queue and other services trigger some action based upon the message received from the queue. Figure 2: SQS Use case Illustration Creating SQS Metrics-Based Auto Scaling Policies in AWS Using Amazon EC2 with Amazon SQS for dynamic scaling not only optimizes cost and resource utilization but also contributes to reducing carbon emissions. By automatically scaling up only when demand increases and scaling down when it's low, the system ensures that you're not running more instances than necessary. This efficient use of resources minimizes the energy consumption associated with over-provisioned infrastructure. Additionally, by reducing the number of idle or underutilized servers, you lower the overall energy footprint of your application. In essence, this intelligent scaling approach not only supports a cost-effective and flexible application architecture but also aligns with sustainability goals by reducing unnecessary energy use and associated carbon emissions. Figure 3: Auto scaling with SQS Setup Step 1: Create an Amazon SQS Queue Sign in to AWS Management Console: Go to the Amazon SQS service. Create a New Queue: Click "Create queue." Choose between a Standard or FIFO queue based on your needs. Configure the queue name, settings (visibility timeout, message retention, etc.), and click "Create Queue." Figure 4: SQS creation pop-up Step 2: Create a Launch Template or Configuration Navigate to EC2 in the console: Go to the "Launch Templates" section or "Launch Configurations" (if using Launch Configurations). Create a new launch template/configuration: Click "Create Launch Template/Configuration." Configure the instance details (AMI, instance type, key pair, security groups, etc.). Save the launch template/configuration. Figure 5: Launch template creation page Step 3: Set Up a CloudWatch Alarm Based on SQS Metrics Alarm 1: Scaling Up Go to CloudWatch console: Navigate to the "Alarms" section. Create a new alarm: Click "Create alarm." Choose SQS as the metric source and select the metric "ApproximateNumberOfMessagesVisible." Set the condition for scaling up (e.g., when the number of messages exceeds a high threshold). Configure the alarm to trigger a scaling action to add instances to the Auto Scaling Group. Alarm 2: Scaling Down Create a second alarm: Repeat the steps to create another alarm. Choose the same SQS metric ("ApproximateNumberOfMessagesVisible"). Set a lower threshold for scaling down (e.g., when the number of messages drops below a certain level). Configure the alarm to trigger a scaling action to remove instances from the Auto Scaling Group. Figure 6: CloudWatch alarm creation page Step 4: Create an Auto Scaling Group 1. Go to the auto-scaling console: Navigate to "Auto Scaling Groups" under the EC2 service. 2. Create a new auto-scaling group: Click "Create Auto Scaling group." Select the launch template or configuration created in Step 2. Specify the Auto Scaling Group name and the VPC/subnet. Set the desired capacity, minimum, and maximum number of instances. Click "Next" to configure scaling policies. 3. Configure auto-scaling policies: In the Auto Scaling Group creation process, attach both CloudWatch alarms you created. Define the scaling actions: Scaling Up: Add instances when the first alarm triggers. Scaling Down: Remove instances when the second alarm triggers. Ensure that the scaling actions are set to match your desired thresholds for cost-effectiveness and performance. Figure 7: Auto Scaling Group Creation Page Step 6: Test the Auto Scaling Group Send messages to the SQS Queue: Manually send a few messages to the SQS queue to simulate a workload. Figure 8: Send SQS message page Monitor the scaling: Check the Auto Scaling Group to see if it adds instances as the queue size increases. Observe if the system scales down correctly when the number of messages decreases. Conclusion Implementing dynamic scaling based on Amazon SQS messages greatly improves the reliability, scalability, and cost-effectiveness of your AWS environment. By adjusting resources automatically according to real-time demand, you ensure that your infrastructure is always appropriately sized, avoiding both excessive and insufficient resource allocation. This approach not only enhances user experience but also supports business growth by adapting seamlessly to changing workloads. A major benefit of this strategy is its positive impact on energy efficiency and sustainability. As companies increasingly prioritize reducing their carbon footprint, focusing on energy efficiency becomes essential. By scaling resources up only when needed and scaling down during quieter periods, you cut down on unnecessary energy use. This not only helps lower costs but also reduces your environmental impact, aligning with broader goals of sustainability. In today's world, where being environmentally conscious is crucial, adopting energy-efficient practices is key. Dynamic scaling based on SQS messages helps your AWS environment run more sustainably by matching energy use closely with actual demand. This careful management of resources leads to reduced carbon emissions and supports the global effort toward greener technology solutions. Looking ahead to Part 2, we’ll tackle the challenges of using basic SQS metrics for dynamic scaling. Simple metrics, like the number of messages in the queue, often don’t fully capture the complexities of your workload. We’ll explore these limitations and discuss how to create more effective dynamic metrics. This involves using custom CloudWatch metrics, integrating multiple data sources, and applying advanced analytics to gain a clearer picture of your system's needs. By improving these metrics, you can refine your scaling strategies, optimize resource use, and enhance both cost efficiency and energy conservation. More

Trend Report

Enterprise Security

Security is everywhere: Behind every highly performant application, or even detected threat, there is a powerful security system and set of processes implemented. And in the off chance there are NOT such systems in place, that fact will quickly make itself known. We are living in an entirely new world, where bad actors are growing more and more sophisticated the moment we make ourselves "comfortable." So how do you remain hypervigilant in this ever so treacherous environment?DZone's annual Enterprise Security Trend Report has you covered. The research and expert articles explore the fastest emerging techniques and nuances in the security space, diving into key topics like CSPM, full-stack security practices and challenges, SBOMs and DevSecOps for secure software supply chains, threat hunting, secrets management, zero-trust security, and more. It's time to expand your organization's tactics and put any future attackers in their place as you hear from industry leaders and experts on how they are facing these challenges in everyday scenarios — because if there is one thing we know about the cyberspace, any vulnerabilities left to chance will always be exposed.

Refcard #398

Open-Source Data Management Practices and Patterns

By Abhishek Gupta

CORE

Open-Source Data Management Practices and Patterns

Refcard #370

Data Orchestration on Cloud Essentials

By Sudip Sengupta

CORE

DZone Annual Community Survey: What's in Your 2024 Tech Stack?

Are you a software developer or other tech professional? If you’re reading this, chances are pretty good that the answer is "yes." Long story short — we want DZone to work for you! We're asking that you take our annual community survey so we can better serve you! ^^ You can also enter the drawing for a chance to receive an exclusive DZone Swag Pack! The software development world moves fast, and we want to keep up! Across our community, we found that readers come to DZone for various reasons, including to learn about new development trends and technologies, find answers to help solve problems they have, connect with other peers, publish their content, and expand their personal brand's audience. In order to continue helping the DZone Community reach goals such as these, we need to know more about you, your learning preferences, and your overall experience on dzone.com and with the DZone team. For this year's DZone Community research, our primary goals are to: Learn about developer tech preferences and habits Identify content types and topics that developers want to get more information on Share this data for public consumption! To support our Community research, we're focusing on several primary areas in the survey: You, including your experience, the types of software you work on, and the tools you use How you prefer to learn and what you want to learn more about on dzone.com The ways in which you engage with DZone, your content likes vs. dislikes, and your overall journey on dzone.com As a community-driven site, our relationships with our members and contributors is invaluable, and we want to make sure that we continue to serve our audience to the best of our ability. If you're curious to see the report from the 2023 Community survey, feel free to check it out here! Thank you in advance for your participation!—Your favorite DZone Content and Community team

By Dominique Roller

Build an Advanced RAG App: Query Routing

In previous articles, we built a basic RAG application. We also learned to introduce more advanced techniques to improve a RAG application. Today, we will explore how to tie those advanced techniques together. Those techniques might do different — sometimes opposite — things. Still, sometimes we need to use all of them, to cover all possibilities. So let's see how we can link different techniques together. In this article, we will take a look at a technique called Query Routing. The Problem With Advanced RAG Applications When our Generative AI application receives a query, we have to decide what to do with it. For simple Generative AI applications, we send the query directly to the LLM. For simple RAG applications, we use the query to retrieve context from a single data source and then query the LLM. But, if our case is more complex, we can have multiple data sources or different queries that need different types of context. So do we build a one-size-fits-all solution, or do we make the application adapt to take different actions depending on the query? What Is Query Routing? Query routing is about giving our RAG app the power of decision-making. It is a technique that takes the query from the user and uses it to make a decision on the next action to take, from a list of predefined choices. Query routing is a module in our Advanced RAG architecture. It is usually found after any query rewriting or guardrails. It analyzes the input query and it decides the best tool to use from a list of predefined actions. The actions are usually retrieving context from one or many data sources. It could also decide to use a different index for a data source (like parent-child retrieval). Or it could even decide to search for context on the Internet. Which Are the Choices for the Query Router? We have to define the choices that the query router can take beforehand. We must first implement each of the different strategies, and accompany each one with a nice description. It is very important that the description explains in detail what each strategy does since this description will be what our router will base its decision on. The choices a query router takes can be the following: Retrieval From Different Data Sources We can catalog multiple data sources that contain information on different topics. We might have a data source that contains information about a product that the user has questions about. And another data source with information about our return policies, etc. Instead of looking for the answers to the user’s questions in all data sources, the query router can decide which data source to use based on the user query and the data source description. Data sources can be text stored in vector databases, regular databases, graph databases, etc. Retrieval From Different Indexes Query routers can also choose to use a different index for the same data source. For example, we could have an index for keyword-based search and another for semantic search using vector embeddings. The query router can decide which of the two is best for getting the relevant context for answering the question, or maybe use both of them at the same time and combine the contexts from both. We could also have different indexes for different retrieval strategies. For example, we could have a retrieval strategy based on summaries, a sentence window retrieval strategy, or a parent-child retrieval strategy. The query router can analyze the specificity of the question and decide which strategy is best to use to get the best context. Other Data Sources The decision that the query router takes is not limited to databases and indexes. It can also decide to use a tool to look for the information elsewhere. For example, it can decide to use a tool to look for the answer online using a search engine. It can also use an API from a specific service (for example, weather forecasting) to get the data it needs to get the relevant context. Types of Query Routers An important part of our query router is how it makes the decision to choose one or another path. The decision can vary depending on each of the different types of query routers. The following are a few of the most used query router types: LLM Selector Router This solution gives a prompt to an LLM. The LLM completes the prompt with the solution, which is the selection of the right choice. The prompt includes all the different choices, each with its description, as well as the input query to base its decision on. The response to this query will be used to programmatically decide which path to take. LLM Function Calling Router This solution leverages the function-calling capabilities (or tool-using capabilities) of LLMs. Some LLMs have been trained to be able to decide to use some tools to get to an answer if they are provided for them in the prompt. Using this capability, each of the different choices is phrased like a tool in the prompt, prompting the LLM to choose which one of the tools provided is best to solve the problem of retrieving the right context for answering the query. Semantic Router This solution uses a similarity search on the vector embedding representation of the user query. For each choice, we will have to write a few examples of a query that would be routed to this path. When a user query arrives, an embeddings model converts it to a vector representation and it is compared to the example queries for each router choice. The example with the nearest vector representation to the user query is chosen as the path the router must route to. Zero-Shot Classification Router For this type of router, a small LLM is selected to act as a router. This LLM will be finetuned using a dataset of examples of user queries and the correct routing for each of them. The finetuned LLM’s sole purpose will be to classify user queries. Small LLMs are more cost-effective and more than good enough for a simple classification task. Language Classification Router In some cases, the purpose of the query router will be to redirect the query to a specific database or model depending on the language the user wrote the query in. Language can be detected in many ways, like using an ML classification model or a Generative AI LLM with a specific prompt. Keyword Router Sometimes the use case is extremely simple. In this case, the solution could be to route one way or another depending on if some keywords are present in the user query. For example, if the query contains the word “return” we could use a data source with information useful about how to return a product. For this solution, a simple code implementation is enough, and therefore, no expensive model is needed. Single Choice Routing vs Multiple Choice Routing Depending on the use case, it will make sense for the router to just choose one path and run it. However, in some cases, it also can make sense to use more than one choice for answering the same query. To answer a question that spans many topics, the application needs to retrieve information from many data sources. Or the response might be different based on each data source. Then, we can use all of them to answer the question and consolidate them into a single final answer. We have to design the router taking these possibilities into account. Example Implementation of a Query Router Let’s get into the implementation of a query router within a RAG application. You can follow the implementation step by step and run it yourself in the Google Colab notebook. For this example, we will showcase a RAG application with a query router. The application can decide to answer questions based on two documents. The first document is a paper about RAG and the second is a recipe for chicken gyros. Also, the application can decide to answer based on a Google search. We will implement a single-source query router using an LLM function calling router. Load the Paper First, we will prepare the two documents for retrieval. Let's first load the paper about RAG: Load the Recipe We will also load the recipe for chicken gyros. This recipe from Mike Price is hosted in tasty.co. We will use a simple web page reader to read the page and store it as text. Save the Documents in a Vector Store After getting the two documents we will use for our RAG application, we will split them into chunks and we will convert them to embeddings using BGE small, an open-source embeddings model. We will store those embeddings in two vector stores, ready to be questioned. Search Engine Tool Besides the two documents, the third option for our router will be to search for information using Google Search. For this example, I have created my own Google Search API keys. If you want this part to work, you should use your own API keys. Create the Query Router Next, using the LlamaIndex library, we create a Query Engine Tool for each of the three options that the router will choose between. We provide a description for each of the tools, explaining what it is useful for. This description is very important since it will be the basis on which the query router decides which path it chooses. Finally, we create a Router Query Engine, also with Llama. We give the three query engine tools to this router. Also, we define the selector. This is the component that will make the choice of which tool to use. For this example, we are using an LLM Selector. It's also a single selector, meaning it will only choose one tool, never more than one, to answer the query. Run Our RAG Application! Our query router is now ready. Let's test it with a question about RAG. We provided a vector store loaded with information from a paper on RAG techniques. The query router should choose to retrieve context from that vector store in order to answer the question. Let's see what happens: Our RAG application answers correctly. Along with the answer, we can see that it provides the sources from where it got the information from. As we expected, it used the vector store with the RAG paper. We can also see an attribute "selector_result" in the result. In this attribute, we can inspect which one of the tools the query router chose, as well as the reason that the LLM gave to choose that option. Now let's ask a culinary question. The recipe used to create the second vector store is for chicken gyros. Our application should be able to answer which are the ingredients needed for that recipe based on that source. As we can see, the chicken gyros recipe vector store was correctly chosen to answer that question. Finally, let's ask it a question that can be answered with a Google Search. Conclusion In conclusion, query routing is a great step towards a more advanced RAG application. It allows us to set up a base for a more complex system, where our app can better plan how to best answer questions. Also, query routing can be the glue that ties together other advanced techniques for your RAG application and makes them work together as a whole system. However, the complexity of better RAG systems doesn't end with query routing. Query routing is just the first stepping stone for orchestration within RAG applications. The next stepping stone for making our RAG applications better reason, decide, and take actions based on the needs of the users are Agents. In later articles, we will be diving deeper into how Agents work within RAG and Generative AI applications in general.

By Roger Oriol

High-Load Systems: Overcoming Challenges in Social Network Development

I am Alexander Kolobov. I worked as a team lead at one of the biggest social networks, where I led teams of up to 10 members, including SEO specialists, analysts, and product manager. As a developer, I designed, developed, and maintained various features for the desktop and mobile web versions of a social network across backend, frontend, and mobile application APIs. My experience includes: Redesigning the social network interface for multiple user sections Completely rewriting network widgets for external sites Maintaining privacy settings for closed profiles and the content archiving function Overhauling the backend and frontend of the mail notification system, handling millions of emails daily Creating a system for conducting NPS/CSI surveys that covered the two largest Russian social networks In this article, I am going to talk about high-load systems and the challenges they bring. I want to touch upon the following aspects: What is high-load? High-load challenges and requirements Technologies vs challenges We’ll briefly discuss how to define if a system is high-load or not, and then we’ll talk about how high loads change system requirements. Based on my experience, I’ll highlight what approaches and technologies can help overcome high-load challenges. What Is High-Load? Let’s begin with the definition. What systems can we call high-load? A system is considered “high-load” if it meets several criteria: High request volume: Handles millions of requests daily Large user base: Supports millions of concurrent users Extensive data management: Manages terabytes or even petabytes of data Performance and scalability: Maintains responsiveness under increasing loads Complex operations: Performs resource-intensive calculations or data processing High reliability: Requires 99.9% or higher uptime Geographical distribution: Serves users across multiple locations with low latency Concurrent processing: Handles numerous concurrent operations Load balancing: Distributes traffic efficiently to avoid bottlenecks High-Load or Not? Basically, we can already call a system high-load if it meets these benchmarks: Resource utilization: >50% Availability: >99.99% Latency: 300ms RPS (Requests Per Second): >10K One more thing I want to mention is that if I were to give a one-sentence definition of what a high-load system is, I would say: it is when usual methods for processing requests, storing data, and managing infrastructure are no longer enough, and there is a need to create custom solutions. VK Social Network: A High-Load Example Let’s take a look at VK social network loads. Here is what the system had to process already a couple of years ago: 100 million monthly active users (MAU) 100 million posts and content creations per day 9 billion post views per day 20,000 servers These numbers result in the following performance metrics: Resource utilization: >60% Availability: >99.94% Latency: 120ms RPS: 3M So we can definitely call VK loads high. High-Load Challenges Let’s take a step further and look at the difficulties the management of such systems entails. The main challenges are: Performance: Maintaining fast response times and processing under high load conditions Data management: Storing, retrieving, and processing large volumes of data effectively Scalability: Providing that scalability is possible at any stage Reliability: Ensuring the system remains operational and available despite high traffic and potential failures Fault tolerance: Building systems that can recover from failures and continue to operate smoothly External Solutions Risks Apart from the challenges, high-load systems bring certain risks, and that is why we have to question some of the traditional tools. The main issues with external solutions are: They are designed for broad application, not highly specialized tasks. They may have vulnerabilities that are difficult to address quickly. They can fail under high loads. They offer limited control. They may have scalability limitations. The main issue with external solutions is that they are not highly specialized; instead, they are designed for broad market applicability. And it often comes at the expense of performance. There is also an issue with security: on the one hand, external solutions are usually well-tested due to their large user base, but on the other hand, fixing identified issues quickly and precisely is challenging. Updating to a fixed version might lead to compatibility problems. External solutions also require ongoing tweaking and fixing, which is very difficult (unless you are a committer of that solution). And finally, they may not scale effectively. High-Load Structure Requirements Naturally, with growing loads, reliability, data management, and scaling requirements are increasing: Downtime is unacceptable: In the past, downtime for maintenance was acceptable; users had lower expectations and fewer alternatives. Today, with the vast availability of online services and the high competition among them, even short periods of downtime can lead to significant user dissatisfaction and negatively affect Net Promoter Score. Zero data loss ensured by cloud services: Users previously kept backups, but now cloud services must ensure zero data loss. Linear scaling: While systems were once planned in advance, there’s now a need for them to scale linearly at any moment due to possible explosive audience growth. Ease of maintenance: In a competitive environment, it’s essential to launch features quickly and frequently. According to the "five nines" standard (99.999% uptime), which is often referenced in the tech industry, only about 5 minutes of downtime per year are considered acceptable. Technologies vs Challenges Further on, we’ll discuss some possible ways how to overcome these challenges and meet the high-load requirements. Let’s look at how VK's social network grew and gradually transformed its architecture and adopted or created technologies that suited the scale and new requirements. VK Architecture Evolution 2013 (55 million users): KPHP to C++ translator 2015 (76 million users): Hadoop 2017 (86 million users): CDN 2019-2020 (97 million users): Blob Storage, gRPC, microservices on Go/Java, KPHP language 2021-2022 (100 million users): Parallelism in KPHP, QUIC, ImageProcessor, AntiDDOS So, what happened? As the platform’s popularity grew, attracting a larger audience, numerous bottlenecks appeared, and optimization became a necessity: The databases could no longer keep up The project’s codebase became too large and slow The volume of user-generated content also increased, creating new bottlenecks Let’s dive into how we addressed these challenges. Data Storage Solutions In normal-sized projects, traditional databases like MySQL can meet all your needs. However, in high-load projects, each need often requires a separate data storage solution. As the load increased, it became crucial to switch to custom, highly specialized databases with data stored in simple, fast, low-level structures. In 2009, when relational databases couldn’t efficiently handle the growing load, the team started developing their own data storage engines. These engines function as microservices with embedded databases written in C and C++. Currently, there are about 800 engine clusters, each responsible for its own logic, such as messages, recommendations, photos, hints, letters, lists, logs, news, etc. For each task needing a specific data structure or unusual queries, the C team creates a new engine. Benefits of Custom Engines The custom engines proved to be much more efficient: Minimal structuring: Engines use simple data structures. In some cases, they store data as nearly bare indexes, leading to minimal structuring and processing at the reading stage. This approach increases data access and processing speed. Efficient data access: The simplified structure allows for faster query execution and data retrieval. Fast query execution: Custom-tailored queries can be optimized for specific use cases. Performance optimization: Each engine can be fine-tuned for its specific task. Scalability: We also get more efficient data replication and sharding. Reliance on master/slave replication and strict data-level sharding enables horizontal scaling without issues. Heavy Caching Another crucial aspect of our high-load system is caching. All data is heavily cached, often precomputed in advance. Caches are sharded, with custom wrappers for automatic key count calculation on the code level. In large systems like ours, caching moves from merely improving performance as the main goal to reducing load on the backend. The benefits of this caching strategy include: Precomputed data: Many results are calculated ahead of time, reducing response times. Automatic code-level scaling: Our custom wrappers help manage cache size efficiently. Reduces load on the backend: By serving pre-computed results, we significantly decrease the workload on our databases. KPHP: Optimizing Application Code The next challenge was optimizing the application code. It was written in PHP and became too slow, but changing the language was impossible with millions of lines of code in the project. This is where KPHP came into play. The goal of the KPHP compiler is to transform PHP code into C++. Simply put, the compiler converts PHP code to C++. This approach boosts performance without the extensive problems associated with rewriting the entire codebase. The team started improving the system from bottlenecks, and for them, it was the language, not the code itself. KPHP Performance 2-40 times faster in synthetic tests 10 times faster in production environments In real production environments, KPHP proved to be from 7 to 10 times faster than standard PHP. KPHP Benefits KPHP was adopted as the backend of VK. By now it supports PHP 7 and 8 features, making it compatible with modern PHP standards. Here are some key benefits: Development convenience: Allows fast compilation and efficient development cycles Support for PHP 7/8: Keeps up with modern PHP standards Open Source Features: Fast compilation Strict typing: Reduces bugs and improves code quality Shared memory: For efficient memory management Parallelization: Multiple processes can run simultaneously Coroutines: Enables efficient concurrent programming Inlining: Optimizes code execution NUMA support: Enhances performance on systems with Non-Uniform Memory Access Noverify PHP Linter To further enhance code quality and reliability, we implemented the Noverify PHP linter. This tool is specifically designed for large codebases and focuses on analyzing git diffs before they are pushed. Key features of Noverify include: Indexes approximately 1 million lines of code per second Analyzes about 100,000 lines of code per second Can also run on standard PHP projects By implementing Noverify, we’ve significantly improved our code quality and caught potential issues before they made it into production. Microservices Architecture As our system grew, we also partly transitioned to a microservices architecture to accelerate time to market. This shift allowed us to develop services in various programming languages, primarily Go and Java, with gRPC for communication between services. The benefits of this transition include: Improved time to market: Smaller, independent services can be developed and deployed more quickly. Language flexibility: We can develop services in different languages, choosing the best tool for each specific task. Greater development flexibility: Each team can work on their service independently, speeding up the development process. Addressing Content Storage and Delivery Bottlenecks After optimizing databases and code, we began breaking the project into optimized microservices, and the focus shifted to addressing the most significant bottlenecks in content storage and delivery. Images emerged as a critical bottleneck in the social network. The problem is that the same image needs to be displayed in multiple sizes due to interface requirements and different platforms: mobile with retina/non-retina, web, and so on. Image Processor and WebP Format To tackle this challenge, we implemented two key solutions: Image processor: We eliminated pre-cut sizes and instead implemented dynamic resizing. We introduced a microservice called Image Processor that generates required sizes on the fly. WebP format: We transitioned to serving images in WebP format. This change was very cost-effective. The results of switching from JPEG to WebP were significant: 40% reduction in photo size 15% faster delivery time (50 to 100 ms improvement) These optimizations led to significant improvements in our content delivery system. It’s always worth identifying and optimizing the biggest bottlenecks for better performance. Industry-Wide High-Load Solutions While the choice of technologies is unique for each high-load company, many approaches overlap and demonstrate effectiveness across the board. We’ve discussed some of VK’s strategies, and it’s worth noting that many other tech giants also employ similar approaches to tackle high-load challenges. Netflix: Netflix uses a combination of microservices and a distributed architecture to deliver content efficiently. They implement caching strategies using EVCache and have developed their own data storage solutions. Yandex: As one of Russia’s largest tech companies, Yandex uses a variety of in-house databases and caching solutions to manage its search engine and other services. I cannot but mention ClickHouse here, a highly specialized database developed by Yandex to meet its specific needs. This solution proved to be so fast and efficient that it is now widely used by others. Yandex created an open-source database management system that stores and processes data by columns rather than rows. Its high-performance query processing makes it ideal for handling large volumes of data and real-time analytics. LinkedIn: LinkedIn implements a distributed storage system called Espresso for its real-time data needs and leverages caching with Apache Kafka to manage high-throughput messaging. Twitter (X): X employs a custom-built storage solution called Manhattan, designed to handle large volumes of tweets and user data. Conclusion Wrapping up, let’s quickly revise what we’ve learned today: High-load systems are applications built to support a large number of users or transactions at the same time and they require excellent performance and reliability. The challenges of high-load systems include limits on scalability, reliability issues, performance slowdowns, and complicated integrations. High-load systems have specific requirements: preventing data loss, allowing fast feature updates, and keeping downtime to a minimum. Using external solutions can become risky under high loads, so often there is a need to go for custom solutions. To optimize a high-load system, you need to identify the key bottlenecks and then find ways to approach them. This is where the optimization begins. High-load systems rely on effective scalable data storage with good caching, compiled languages, distributed architecture, and good tooling. There are no fixed rules for creating a high-load application; it’s always an experimental process. Remember, building and maintaining high-load systems is a complex task that requires continuous optimization and innovation. By understanding these principles and being willing to develop custom solutions when necessary, you can create robust, scalable systems capable of handling millions of users and requests.

By Alexander Kolobov

Programming Language With No Syntax?

Is it possible to have a programming language that has no syntax? It sounds like a contradiction. Programming languages are all about syntax, plus a bit of code generation, optimization, run-time environment, and so on. But syntax is the most important part as far as programmers are concerned. When encountering a new programming language, it takes time to learn the syntax. Could we just make the syntax disappear or at least make it as simple as possible? Could we also make the syntax arbitrary so that the programmer writing the code can define it for themselves? Ouroboros is a programming language that tries to do just that. It has the simplest syntax ever. It is so simple that it does not even have a syntax analyzer. All it has is a lexical analyzer, which is 20 lines long. At the same time, you can write complex programs and even expressions with parentheses and operators of different precedence, assuming you write your own syntax for that in the program. That way, no syntax also means any syntax. This article is an introduction to Ouroboros, a programming language with no syntax. It is a toy, never meant to be used in production, but it is a fun toy to play with, especially if you have ever wanted to create your own programming language. There were programming languages with minimal syntax. One of the very first languages was LISP, which used only parentheses to group statements as lists. If you are familiar with TCL, you may remember how simple the language is. However, it still defines complex expressions and control structures as part of the language. Another simple language to mention is FORTH. It is a stack language. The syntax is minimal. You either put something on the stack or call a function that works with the values on the stack. FORTH was also famous for its minimal assembly core and for the fact that the rest of the compiler was written in FORTH itself. These languages inspired the design of Ouroboros. LISP is known for the simplest syntax. One might say that LISP has the simplest syntax of all programming languages, but it would be a mistake. True to its name, it uses parentheses to delimit lists, which can be either data or programming structures. As you may know, LISP stands for "Lots of Irritating Superfluous Parentheses." Ouroboros does not do that. It inherits the use of { and } from TCL, but unlike LISP, you are forced to use them only where they are really needed. Ouroboros, although being an interpreted language, can compile itself. Well, not really compile, but you can define syntax for the language in the language itself. However, it is not like in the case of compilers where the compiler is written in the source language. One of the first compilers was the PASCAL compiler written by Niklaus Wirth in PASCAL. The C compiler was also written in C, and more and more language compilers are written in the language they compile. In the case of an interpreted language, it is a bit different. It is not a separate program that reads the source code and generates machine code. It is the executing code, the application program itself, that becomes part of the interpreter. That way, you cannot look at it and say, "This code is not Ouroboros." Any code can be, depending on the syntax you define for it at the start of the code. The Name of the Game Before diving into what Ouroboros is, let’s talk about the name itself. Ouroboros coils around itself in an endless cycle of creation and recreation. The name "Ouroboros" is as multifaceted as the language itself, offering layers of meaning that reflect its unique nature and aspirations. The Eternal Cycle At its core, Ouroboros draws inspiration from the ancient symbol of a serpent consuming its own tail. This powerful image represents the cyclical nature of creation and destruction, perfectly encapsulating our language’s self-referential definition. Just as the serpent feeds upon itself to sustain its existence, Ouroboros the language is defined by its own constructs, creating a closed loop of logic and functionality. UR: The Essence of Simplicity Abbreviated as "UR," Ouroboros embraces the concept of fundamental simplicity. In German, "Ur—" signifies something primordial, primitive, or in its most basic form. This perfectly encapsulates the design philosophy behind Ouroboros: a language stripped down to its absolute essentials. By pushing the simplification of syntax to the extreme, Ouroboros aims to be the "ur-language" of programming — a return to the most elemental form of computation. Like the basic building blocks of life or the fundamental particles of physics, Ouroboros provides a minimal set of primitives from which complex structures can emerge. This radical simplicity is not a limitation but a feature. It challenges programmers to think at the most fundamental level, fostering a deep understanding of computational processes. In Ouroboros, every construct is essential, every symbol significant. It’s programming distilled to its purest form. Our Shared Creation The name begins with "Our-," emphasizing the collaborative nature of this language. Ouroboros is not just a tool but a shared endeavor that belongs to its community of developers and users. It’s a language crafted by us, for us, evolving through our collective efforts and insights. Hidden Treasures Delve deeper into the name, and you’ll uncover more linguistic gems: "Oro" in many Romance languages means "gold" or "prayer." Ouroboros can be seen as a golden thread of logic, or a prayer-like mantra of computational thought. "Ob-" as a prefix often means "toward" or "about," suggesting that Ouroboros is always oriented toward its own essence, constantly reflecting upon and refining itself. "Boros" could be playfully interpreted as a variation of "bytes," hinting at the language’s digital nature. Parsing the name as "our-ob-oros" reveals a delightful multilingual wordplay: "our way to the treasure." This blend of English ("our"), Latin ("ob" meaning "towards"), and Greek ("oros," which can be associated with "boundaries" or "definitions") mirrors the language’s eclectic inspirations. Just as Ouroboros draws from the diverse traditions of TCL, LISP, and FORTH, its name weaves together linguistic elements from different cultures. This multilingual, multi-paradigm approach guides us toward the treasures of computation, defining new boundaries along the way, much like how TCL offers flexibility, LISP promotes expressiveness, and FORTH emphasizes simplicity and extensibility. A Name That Bites Back Ultimately, Ouroboros is a name that challenges you to think recursively, to see the end in the beginning and the whole in every part. It’s a linguistic puzzle that mirrors the very nature of the programming language it represents — complex, self-referential, and endlessly fascinating. As you embark on your journey with Ouroboros, remember that you’re not just writing code; you’re participating in an ancient cycle of creation, where every end is a new beginning, and every line of code feeds into the greater whole of computational possibility. What Is Ouroboros? Ouroboros is a programming language that has no syntax. I have already said that, and now comes the moment of truth: it is a "lie." There is no programming language with absolutely no syntax. UR has a syntax, and it is defined with this sentence: You write the lexical elements of the language one after the other. Syntax That is all. When the interpreter starts to execute the code, it begins reading the lexical elements one after the other. It reads as many elements as it needs to execute some code and not more. To be specific, it reads exactly one lexical element before starting execution. When the execution triggered by the element is finished, it goes on reading the next element. The execution itself can trigger more reads if the command needs more elements. We will see it in the next example soon. A lexical element can be a number, a string, a symbol, or a word. Symbols and words can and should have an associated command to execute. For example, the command puts is borrowed shamelessly from TCL and is associated with the command that prints out a string. Plain Text puts "Hello, World!" It is the simplest program in Ouroboros. When the command behind puts starts to execute, it asks the interpreter to read the next element and evaluate it. In this example, it is a constant string, so it is not difficult to calculate. The value of a constant string is the string itself. The next example is a bit more complex: Plain Text puts add "Hello, " "World!" In this case, the argument to the command puts is another command: add. When puts asks the interpreter to get its argument, the interpreter reads the next element and then starts to execute. As add starts to execute, it needs two arguments, which it asks from the interpreter. Since these arguments are strings, add concatenates them and returns the result. Blocks There is a special command denoted by the symbol {. The lexical analyzer recognizing this character will ask the interpreter to read the following elements until it finds the closing }. This call is recursive in nature if there are embedded blocks. The resulting command is a block command. A block command executes all the commands in it and results in the last result of the commands in the block. Plain Text puts add {"Hello, " "World!"} If we close the two strings into a block, then the output will be a single World! without the `Hello, `. The block "executes" both strings, but the value of the block is only the second string. Commands The commands implemented are documented in the readme of the project on GitHub. The actual set of commands is not fascinating. Every language has a set of commands. The fascinating part is that in UR there is no difference between functions and commands. Are puts or add commands or functions? How about if and while? They are all commands, and they are not part of the language per se. They are part of the implementation. The command if asks the interpreter to fetch one argument, evaluated. It will use this as the condition. After this, it will fetch the next two elements without evaluation. Based on the boolean interpretation of the condition, it will ask the interpreter to evaluate one of the two arguments. Similarly, the command while will fetch two arguments without evaluation. It then evaluates the first as a condition, and if it is true, it will evaluate the second and then go back to the condition. It fetched the condition unevaluated because it will need to evaluate it again and again. In the case of the if command, the condition is evaluated only once, so we did not need a reference to the unevaluated version. Many commands use the unevaluated version of the arguments. This use makes it possible to use the "binary" operators as multi-argument operators. If you want to add up three numbers, you can write add add 1 2 3, or add* 1 2 3 {}, or {add* 1 2 3}. The command add fetches the first argument unevaluated and sees if it is a *. If it is *, then it will fetch the arguments until it encounters the end of the arguments or an empty block. This is a little syntactic sugar, which should be peculiar in the case of a language that has no syntax. It really is there to make the experiment and the playing with the language bearable. On the other side, it erodes the purity of the language. It is also only a technical detail, and I mention it only because we will need to understand it when we discuss the metamorphic nature of the language. It will be needed to understand the use of the first example there. Variables UR supports variables. Variables are strings with values associated with them. The value can be any object. When the interpreter sees a symbol or a bare word (identifier) to evaluate, it will check the value associated with it. If the value is a command, then it will execute the command. In other cases, it will return the value. The variables are scoped. If you set a variable in a block, then the variable is visible only in that block. If there are variables with the same name in the parent block, then the variable in the child block will shadow the variable in the parent block. Variable handling and scoping are implementation details and not strictly part of the language. The implementation as it is now supports boolean, long, double, big integer, big decimal, and string primitive values. It also supports lists and objects. A list is a list of values, and it can be created with the list command. The argument to the command is a block. The command list will ask the interpreter to fetch the argument unevaluated. Afterward, it evaluates the block from the start the same way as the block command does. However, instead of throwing away the resulting values and returning the last one, it returns a list of the results. An object is a map of values. It can be created with the object command. The argument to the command is the parent object. The fields of the parent object are copied to the new object. Objects also have methods. They are the fields that have a command as a value. Introspection The interpreter is open like a cracked safe after a heist. Nothing is hard-wired into the language. When I wrote that the language interpreter recognizes bare words, symbols, strings, etc., it was only true for the initial setup. The lexical analyzers implemented are UR commands, and they can be redefined. They are associated with the names $keyword, $string, $number, $space, $block, $blockClose, and $symbol. The interpreter uses the variable structures to find these commands. There is another variable named $lex that is a list of the lexical analyzers. The interpreter uses this list when it needs to read the next lexical element. It invokes the first, then the second, and so on until one of them returns a non-null value, a lexical element, which is a command. If you modify this list, then you can change the lexical analyzers, and that way you can change the syntax of the language. The simplest example is changing the interpretation of the end-of-line character. You may remember that we can use the binary operators using multiple arguments terminated with an empty block. It would be nice if we could omit the block and just write add* 1 2 3 simply adding a new-line at the end. We can do that by changing the lexical analyzer that recognizes the end-of-line character, and this is exactly what we are going to do in this example. Plain Text set q add* 3 2 1 {} puts q insert $lex 0 '{ if { eq at source 0 "\n"} {sets substring 1 length source source '{}} set q add* 3 2 1 {} puts q We insert a new lexical analyzer at the beginning of the list. If the very first character of the current state of the source code is a new-line character, then the lexical analyzer eats this character and returns an empty block. The command source returns the source code that was not parsed by the interpreter yet. The command sets sets the source code to the string value specified. The first puts q will print 6 because at the time of the first calculation, new-lines are just ignored, and that way the value of q is add* 3 2 1 {}. The second puts q will print 5 because the new-line is eaten by the lexical analyzer, and the value of q is add* 3 2 {}. Here, the closing {} was the result of the lexical analysis of the new-line character. The values 1 and {} on the next line are calculated, but they do not have any effect. This is a very simple example. If you want to see something more complex, the project file src/test/resources/samples/xpression.ur contains a script that defines a numerical expression parser. There is a special command called fixup. This command forces the interpreter to parse the rest of the source. After this point, the lexical analyzers are not used anymore. Executing this command does not give any performance benefit, and that is not the purpose. It is more like a declaration that all the codes that are part of the source code introspection and the metamorphic calculation are done. A special implementation of the command can also take the parsed code and generate an executable, turning the interpreter into a compiler. Technical Considerations The current version is implemented in Java. Ouroboros is not a JVM language, though. We do not compile the code to Java byte-code. The Java code interprets the source and executes it. The implementation is an MVP focusing on the metamorphic nature of the language. It is meant to be an experiment. This is the reason why there are no file, network, and other I/O operations except the single puts command that writes to the standard output. The Java service loader feature is used to load the commands and to register them with their respective names in the interpreter. It means that implementing extra commands is as simple as creating them, writing a class implementing a ContextAgent to register them (see the source code), and put them on the classpath. The whole code is open-source and available on GitHub. It is licensed under the Apache License 2.0 (see the license file in the repo). It is exactly 100 classes at the time of writing this article. It means that the source code is simple, short, and easy to understand. If you need some straightforward scripting language in your application, you can use it. It was not meant to be for production, though. Going Further There is no plan currently to extend the language and include more commands. We only plan to create more metamorphic code in the language. The reason for that is that we do not see the language as a practical tool as of today. If it proves to be useful and gains a user base and utilization, we certainly will incorporate more commands to support I/O, file handling, networking, and so on. We also have visions of implementing the interpreter in other languages, like in Rust and Go. Anyone suggesting or wanting to develop commands for better usability or adding features is welcome. It can be a parallel project, or it can be merged into the main project if that makes sense. Conclusion In exploring Ouroboros, we delved into the concept of a programming language that minimizes syntax to the point of almost non-existence. This radical approach challenges the conventional understanding of what a programming language should be, presenting a system where syntax is both absent and infinitely customizable. By drawing inspiration from languages like LISP, TCL, and FORTH, Ouroboros embodies simplicity and introspection, allowing programmers to define their syntax and commands within the language itself. While Ouroboros is not designed for practical production use, it serves as an intriguing experiment in language design and metaprogramming. Its self-referential nature and minimalistic design offer a playground for developers interested in the fundamentals of computation, syntax design, and language interpretation. Whether it evolves into a more robust tool or remains a fascinating intellectual exercise, Ouroboros pushes the boundaries of how we think about programming languages, inviting us to consider the possibility of a language where syntax is as mutable and recursive as the Ouroboros serpent itself.

By Peter Verhas

CORE

Founder Mode?

TL;DR: The Perils of Founder Mode This article delves into the darker aspects of Founder Mode, popularized by Paul Graham and others. It offers a critical perspective for agile practitioners, product leaders, startup founders, and managers who embrace this paradigm and probably fall victim to survivorship bias; the Jobs and the Cheskys are the exception, not the rule. The article explores how resulting tendencies, such as micromanagement, lack of strategic transparency, team devaluation, and reckless risk-taking, can undermine organizational health, stifle innovation, and conflict with agile principles. These can jeopardize long-term success while making work in organizations with a failed founder mode application miserable for everyone below the immediate leadership level and the founder himself. The Collateral Damage Caused by Founder Mode The Founder Mode concept suggests that founders should immerse themselves deeply in all facets of their organization, believing their direct involvement is essential for success. While this approach can bring about swift decision-making and maintain a strong vision, it also harbors several hidden pitfalls. These include tendencies toward: Survivorship Bias and the Myth of the Exceptional Founder Confirmation Bias and the Reinforcement of Echo Chambers Misaligned Investor Incentives and the Big Bet Mentality Micromanagement Obscured strategic objectives Taylorism Revisited: Viewing Teams as Replaceable Cogs Erosion of employee agency Normalization of reckless risk-taking Prioritization of short-term gains over sustainable culture Incompatibility with Agile Self-Management Principles Moreover, cognitive biases and misaligned investor incentives can exacerbate these issues: Survivorship Bias and the Myth of the Exceptional Founder The glorification of successful founders contributes to survivorship bias — the logical error of focusing on those who succeeded while overlooking those who did not. Celebrated figures who adopted Founder Mode are often exceptions rather than the rule. Their success stories create a narrative that this approach is universally practical, which is misleading. By emulating these outliers without acknowledging the unique circumstances that contributed to their success, other founders may adopt strategies unlikely to yield the same results. This perpetuates unrealistic expectations and disregards the importance of context in leadership effectiveness. Founder Mode: Confirmation Bias and the Reinforcement of Echo Chambers Founders deeply invested in their ideas are susceptible to confirmation bias — the tendency to search for, interpret, and recall information that confirms preexisting beliefs. This bias is amplified in Founder Mode as dissenting voices are often suppressed or removed, and supportive feedback is amplified. This creates an echo chamber where critical evaluation is lacking, but everyone tells the emperor what the emperor wants to hear. Without diverse perspectives, the organization becomes blind to potential flaws in its strategies, making it vulnerable to preventable mistakes. Misaligned Investor Incentives and the Big Bet Mentality Investors, particularly venture capitalists, may encourage Founder-Mode behaviors by prioritizing rapid growth and significant returns on investment. This “big bet” mentality values aggressive scaling and market domination, sometimes disregarding the negative consequences for organizational health and ethical standards; think Travis Kalanick or Elon Musk. Such misaligned incentives pressure founders to make bold moves without fully considering the risks or collateral damage. This can lead to decisions that harm the company’s long-term prospects in favor of short-term financial gains. Micromanagement Under the Guise of Engagement One of the most significant concerns with Founder Mode is the propensity for micromanagement. Driven by passion and a desire for perfection, founders may involve themselves excessively in day-to-day operations. This over-involvement can lead to a work environment where co-workers feel their expertise is undervalued. The constant oversight impedes their ability to exercise professional judgment, fostering a culture of dependency rather than empowerment. Micromanagement not only stifles creativity but also hampers efficiency. Teams spend valuable time seeking approvals for minor decisions, slowing down processes that “Agile” aims to streamline and accelerate. This contradicts the principles of self-managing teams, which are central to agile practices and are crucial for rapid adaptation in fast-paced markets. Obscured Strategic Goals and the Neglect of Commander’s Intent Clarity of purpose is paramount in complex and uncertain environments. Founder Mode often results in the obfuscation of strategic goals, with founders keeping their overarching plans opaque to maintain control. This lack of transparency prevents teams from understanding the broader context of their work, inhibiting their ability to make informed decisions that align with the company’s vision. The military concepts of “Auftragstaktik” and “commander’s intent” emphasize the importance of conveying the desired end state without dictating the exact means of achieving it. By ignoring these principles, Founder Mode diminishes the effectiveness of teams, as they cannot adapt their strategies in response to changing circumstances without a clear understanding of the ultimate objectives. Taylorism Revisited: Viewing Teams as Replaceable Cogs Founder Mode can inadvertently revive the outdated management philosophy of Taylorism, where workers are seen as interchangeable parts in a machine. This perspective reduces highly skilled professionals to mere executors of the founder’s directives, disregarding their potential contributions to innovation and problem-solving. Such an approach undermines team morale and engagement. Employees’ motivation wanes when they perceive their unique skills and insights as neither recognized nor valued. This affects individual performance and cascades on team dynamics and overall organizational effectiveness. It defies lessons learned on motivation, particularly in knowledge work — see Pink’s 2011 book “Drive: The Surprising Truth About What Motivates Us" — and results in employing financial reward schemes. Erosion of Employee Agency and Autonomy A hallmark of effective modern leadership is empowering employees to make decisions within defined boundaries. Founder Mode, however, often entails a top-down command structure where directives are issued without room for discussion or input. This diminishes employees’ sense of agency, leading to disengagement and a lack of ownership over their work. The suppression of autonomy is particularly detrimental in agile environments, where unpredictability requires responsiveness and adaptability. Teams that lack the authority to adjust their course of action in response to new information cannot effectively navigate the complexities of product development. Normalization of Reckless Risk-Taking Founder Mode can foster a culture where taking significant risks without adequate analysis becomes normalized. The founder’s confidence in their vision may lead them to make high-stakes decisions that jeopardize the company’s future. While bold moves can yield substantial rewards, they can also result in catastrophic failures. This “gambling” mentality, while in line with many venture capitalists’ return on investment expectations, overlooks the importance of calculated risk management. It places the company’s fate on the shoulders of singular, potentially impulsive choices rather than on a balanced strategy that weighs potential gains against possible losses. Prioritization of Immediate Results Over Sustainable Culture Another characteristic of the founder mode is an intense focus on achieving immediate results. While short-term successes can be gratifying, they may come at the expense of building a sustainable organizational culture. Practices that prioritize “getting things done at all costs” often lead to employee burnout, ethical oversights, hero worship, and a toxic work environment. A sustainable culture is vital for long-term success. It attracts and retains top talent, fosters innovation, and builds a strong brand reputation. Neglecting this aspect can have lasting negative impacts that outweigh any short-lived achievements. Incompatibility With Agile Self-Management Principles Agile practices emphasize collaboration, self-organization, and adaptability. Founder Mode’s control-oriented approach is inherently at odds with these principles. Micromanagement and lack of transparency hinder teams’ ability to respond swiftly to changes, collaborate effectively, and take initiative. The success of agile practices relies on trusting teams to manage their work and make decisions that best serve the project’s goals. Founder Mode undermines this trust, leading to rigid processes that stifle innovation and slow down progress. Founder Mode and the Necessity of Agility in a Product Operating Model In a fiercely competitive market, agility is not just beneficial but essential. Therefore, applying a product operating model must be flexible to accommodate shifting customer needs, technological advancements, and competitive pressures. Founder Mode’s centralized decision-making structure impedes this flexibility. Organizations that bottleneck decisions through the founder become less responsive to external changes. This can result in missed opportunities, decreased customer satisfaction, and an inability to stay ahead of competitors. Additional Considerations on Founder Mode While the article covers the key pitfalls associated with the Founder Mode management style, there are a few additional points that might be valuable: The importance of emotional intelligence: Developing emotional intelligence is crucial for leaders who wish to move beyond the limitations of Founder Mode. Leaders can foster a more collaborative and supportive work environment by being attuned to their teams’ emotions and motivations. This enhances team cohesion and encourages open communication, essential for innovation and problem-solving. Navigating organizational growth: As startups evolve into larger organizations, the leadership approach must adapt accordingly. Founder Mode may have been effective in the early stages due to the necessity for rapid decision-making and a clear vision. However, shifting towards more distributed leadership as the company grows can help manage increased complexity and promote scalability. Balancing vision with flexibility: While having a strong vision is essential, rigidity can be detrimental. Leaders should remain open to feedback and be willing to adjust their strategies in response to new information or changing market conditions. This flexibility allows the organization to stay competitive and responsive to external shifts. Cultivating a learning and failure culture: Encouraging continuous learning and development within the organization can counteract some of the negative effects of Founder Mode. By promoting a culture where experimentation is welcomed, and failures are seen as learning opportunities, teams are more likely to innovate and adapt. Ethical leadership and corporate responsibility: Leaders should also consider the ethical implications of their management style. Prioritizing ethical decision-making and corporate social responsibility can enhance the company’s reputation and build trust with stakeholders, including customers, employees, and investors. Alignment with stakeholders: Ensuring that the interests of all stakeholders are considered can help mitigate the misaligned incentives that sometimes arise with aggressive growth strategies. Open dialogue with investors, for instance, can lead to more sustainable expectations and reduce the pressure contributing to risky decision-making. Food for Thought As you reflect on the complexities and potential drawbacks of Founder Mode, consider the following questions to deepen your understanding and explore alternative approaches: Balancing vision and collaboration: How can founders maintain a strong, clear vision while also empowering their teams to contribute ideas and make decisions? Adaptive leadership: How might leaders need to adapt their management style as the organization grows and evolves? How can they recognize when a shift is necessary? Empowering teams: What practical steps can be taken to increase employee agency and autonomy without sacrificing alignment with the company’s strategic goals? Risk management: How can organizations encourage innovative thinking and calculated risk-taking while avoiding reckless or unethical decisions? Cultivating transparency: What mechanisms can be implemented to communicate strategic goals effectively throughout the organization? Mitigating cognitive biases: What strategies can leaders employ to identify and counteract confirmation bias within themselves and their teams? Investor relations: How can founders align investor expectations with sustainable business practices prioritizing long-term success over short-term gains? Learning from failure: Considering the influence of survivorship bias, how can organizations learn from both the successes and failures of others to inform their own strategies? Integrating Agile principles: What adjustments are necessary to reconcile the control-oriented tendencies of Founder Mode with the self-management and adaptability inherent in agile practices? Building sustainable culture: How can focusing on immediate results be balanced with the need to develop a healthy, sustainable organizational culture? Future leadership models: How might alternative leadership models combine founders’ passion and vision with the collaborative and adaptive practices needed in today’s complex business environment? Conclusion While compelling, the myth of the omnipotent founder often masks the underlying dysfunctions that "Founder Mode" leadership can introduce into an organization. This approach runs counter to the very principles that underpin agile practices. Organizations risk eroding the foundation of innovation and adaptability essential in a complex and unpredictable business environment by treating teams as mere executors of a singular vision and stifling their autonomy. Moreover, the normalization of reckless risk-taking and the influence of cognitive biases, such as confirmation bias and survivorship bias, further compound these issues. When combined with misaligned investor incentives that prioritize short-term gains over sustainable growth, the organization becomes vulnerable to strategic missteps that can jeopardize its long-term viability. It is imperative for agile practitioners, product leaders, startup founders, and managers to assess the adoption of "Founder Mode" leadership critically. Embracing a leadership style that values transparency, empowers teams, and promotes responsible risk management is not just preferable but essential. By fostering a culture of collaboration and continuous improvement, organizations can leverage the collective expertise of their teams, enhance their agility, and create sustainable value. By moving beyond the allure of "Founder Mode," organizations position themselves to navigate the market’s uncertainties more effectively. By aligning leadership practices with agile principles, they improve internal dynamics and strengthen their capacity to respond to external challenges. Ultimately, organizations can achieve enduring success and resilience through this deliberate shift towards empowerment and agility. Of course, these considerations take a backseat once an organization identifies its Steve Jobs. However, that is a very, very rare occasion. As Richard P. Feynman said: “The first principle is that you must not fool yourself, and you are the easiest person to fool.” (Source) What is your experience with the Founder Mode model? Please share with us in the comments. Recommended Reading Paul Graham: Founder Mode

By Stefan Wolpers

CORE

Leveling Up My GraphQL Skills: Real-Time Subscriptions

For a few years now, I’ve tried to identify frameworks, products, and services that allow technologists to maintain their focus on extending the value of their intellectual property. This continues to be a wonderful journey for me, filled with unique learning opportunities. The engineer in me recently wondered if there was a situation where I could find a secondary benefit for an existing concept that I’ve talked about before. In other words, could I identify another benefit with the same level of impact as the original parent solution previously recognized? For this article, I wanted to dive deeper into GraphQL to see what I could find. In my “When It’s Time to Give REST a Rest” article, I talked about how there are real-world scenarios when GraphQL is preferable to a RESTful service. We walked through how to build and deploy a GraphQL API using Apollo Server. In this follow-up post, I plan to level up my knowledge of GraphQL by walking through subscriptions for real-time data retrieval. We’ll also build a WebSocket service to consume the subscriptions. Recap: Customer 360 Use Case My prior article centered around a Customer 360 use case, where patrons of my fictional business maintain the following data collections: Customer information Address information Contact methods Credit attributes A huge win in using GraphQL is that a single GraphQL request can retrieve all the necessary data for a customer’s token (unique identity). JavaScript type Query { addresses: [Address] address(customer_token: String): Address contacts: [Contact] contact(customer_token: String): Contact customers: [Customer] customer(token: String): Customer credits: [Credit] credit(customer_token: String): Credit } Using a RESTful approach to retrieve the single (360) view of the customer would have required multiple requests and responses to be stitched together. GraphQL gives us a solution that performs much better. Level Up Goals In order to level up in any aspect of life, one has to achieve new goals. For my own goals here, this means: Understanding and implementing the subscriptions value proposition within GraphQL Using a WebSocket implementation to consume a GraphQL subscription The idea of using subscriptions over queries and mutations within GraphQL is the preferred method when the following conditions are met: Small, incremental changes to large objects Low-latency, real-time updates (such as a chat application) This is important since implementing subscriptions inside GraphQL isn’t trivial. Not only will the underlying server need to be updated, but the consuming application will require some redesign as well. Fortunately, the use case we’re pursuing with our Customer 360 example is a great fit for subscriptions. Also, we’ll be implementing a WebSocket approach to leveraging those subscriptions. Like before, I’ll continue using Apollo going forward. Leveling Up With Subscriptions Creds First, we need to install the necessary libraries to support subscriptions with my Apollo GraphQL server: Shell npm install ws npm install graphql-ws @graphql-tools/schema npm install graphql-subscriptions With those items installed, I focused on updating the index.ts from my original repository to extend the typedefs constant with the following: JavaScript type Subscription { creditUpdated: Credit } I also established a constant to house a new PubSub instance and created a sample subscription that we will use later: JavaScript const pubsub = new PubSub(); pubsub.publish('CREDIT_BALANCE_UPDATED', { creditUpdated: { } }); I cleaned up the existing resolvers and added a new Subscription for this new use case: JavaScript const resolvers = { Query: { addresses: () => addresses, address: (parent, args) => { const customer_token = args.customer_token; return addresses.find(address => address.customer_token === customer_token); }, contacts: () => contacts, contact: (parent, args) => { const customer_token = args.customer_token; return contacts.find(contact => contact.customer_token === customer_token); }, customers: () => customers, customer: (parent, args) => { const token = args.token; return customers.find(customer => customer.token === token); }, credits: () => credits, credit: (parent, args) => { const customer_token = args.customer_token; return credits.find(credit => credit.customer_token === customer_token); } }, Subscription: { creditUpdated: { subscribe: () => pubsub.asyncIterator(['CREDIT_BALANCE_UPDATED']), } } }; I then refactored the server configuration and introduced the subscription design: JavaScript const app = express(); const httpServer = createServer(app); const wsServer = new WebSocketServer({ server: httpServer, path: '/graphql' }); const schema = makeExecutableSchema({ typeDefs, resolvers }); const serverCleanup = useServer({ schema }, wsServer); const server = new ApolloServer({ schema, plugins: [ ApolloServerPluginDrainHttpServer({ httpServer }), { async serverWillStart() { return { async drainServer() { serverCleanup.dispose(); } }; } } ], }); await server.start(); app.use('/graphql', cors(), express.json(), expressMiddleware(server, { context: async () => ({ pubsub }) })); const PORT = Number.parseInt(process.env.PORT) || 4000; httpServer.listen(PORT, () => { console.log(`Server is now running on http://localhost:${PORT}/graphql`); console.log(`Subscription is now running on ws://localhost:${PORT}/graphql`); }); To simulate customer-driven updates, I created the following method to increase the credit balance by $50 every five seconds while the service is running. Once the balance reaches (or exceeds) the credit limit of $10,000, I reset the balance back to $2,500, simulating a balance payment being made. JavaScript function incrementCreditBalance() { if (credits[0].balance >= credits[0].credit_limit) { credits[0].balance = 0.00; console.log(`Credit balance reset to ${credits[0].balance}`); } else { credits[0].balance += 50.00; console.log(`Credit balance updated to ${credits[0].balance}`); } pubsub.publish('CREDIT_BALANCE_UPDATED', { creditUpdated: credits[0] }); setTimeout(incrementCreditBalance, 5000); } incrementCreditBalance(); The full index.ts file can be found here. Deploy to Heroku With the service ready, it’s time for us to deploy the service so we can interact with it. Since Heroku worked out great last time (and it’s easy for me to use), let’s stick with that approach. To get started, I needed to run the following Heroku CLI commands: Shell $ heroku login $ heroku create jvc-graphql-server-sub Creating jvc-graphql-server-sub... done https://jvc-graphql-server-sub-1ec2e6406a82.herokuapp.com/ | https://git.heroku.com/jvc-graphql-server-sub.git The command also automatically added the repository used by Heroku as a remote: Shell $ git remote heroku origin As I noted in my prior article, Apollo Server disables Apollo Explorer in production environments. To keep Apollo Explorer available for our needs, I needed to set the NODE_ENV environment variable to development. I set that with the following CLI command: Shell $ heroku config:set NODE_ENV=development Setting NODE_ENV and restarting jvc-graphql-server-sub... done, v3 NODE_ENV: development I was ready to deploy my code to Heroku: Shell $ git commit --allow-empty -m 'Deploy to Heroku' $ git push heroku A quick view of the Heroku Dashboard showed my Apollo Server running without any issues: In the Settings section, I found the Heroku app URL for this service instance: https://jvc-graphql-server-sub-1ec2e6406a82.herokuapp.com/ Please note: This link will no longer be in service by the time this article is published. For the time being, I could append graphql to this URL to launch Apollo Server Studio. This let me see the subscriptions working as expected: Notice the Subscription responses on the right-hand side of the screen. Leveling Up With WebSocket Skillz We can leverage WebSocket support and Heroku’s capabilities to create an implementation that consumes the subscription we’ve created. In my case, I created an index.js file with the following contents. Basically, this created a WebSocket client and also established a dummy HTTP service that I could use to validate the client was running: JavaScript import { createClient } from "graphql-ws"; import { WebSocket } from "ws"; import http from "http"; // Create a dummy HTTP server to bind to Heroku's $PORT const PORT = process.env.PORT || 3000; http.createServer((req, res) => res.end('Server is running')).listen(PORT, () => { console.log(`HTTP server running on port ${PORT}`); }); const host_url = process.env.GRAPHQL_SUBSCRIPTION_HOST || 'ws://localhost:4000/graphql'; const client = createClient({ url: host_url, webSocketImpl: WebSocket }); const query = `subscription { creditUpdated { token customer_token credit_limit balance credit_score } }`; function handleCreditUpdated(data) { console.log('Received credit update:', data); } // Subscribe to the creditUpdated subscription client.subscribe( { query, }, { next: (data) => handleCreditUpdated(data.data.creditUpdated), error: (err) => console.error('Subscription error:', err), complete: () => console.log('Subscription complete'), } ); The full index.js file can be found here. We can deploy this simple Node.js application to Heroku, too, making sure to set the GRAPHQL_SUBSCRIPTION_HOST environment variable to the Heroku app URL we used earlier. I also created the following Procfile to tell Heroku how to start up my app: Shell web: node src/index.js Next, I created a new Heroku app: Shell $ heroku create jvc-websocket-example Creating jvc-websocket-example... done https://jvc-websocket-example-62824c0b1df4.herokuapp.com/ | https://git.heroku.com/jvc-websocket-example.git Then, I set the the GRAPHQL_SUBSCRIPTION_HOST environment variable to point to my running GraphQL server: Shell $ heroku --app jvc-websocket-example \ config:set \ GRAPHQL_SUBSCRIPTION_HOST=ws://jvc-graphql-server-sub-1ec2e6406a82.herokuapp.com/graphql At this point, we are ready to deploy our code to Heroku: Shell $ git commit --allow-empty -m 'Deploy to Heroku' $ git push heroku Once the WebSocket client starts, we can see its status in the Heroku Dashboard: By viewing the logs within the Heroku Dashboard for jvc-websocket-example instance, we can see the multiple updates to the balance property of the jvc-graphql-server-sub service. In my demo, I was even able to capture the use case where the balance was reduced to zero, simulating that a payment was made: In the terminal, we can access those same logs with the CLI command heroku logs. Shell 2024-08-28T12:14:48.463846+00:00 app[web.1]: Received credit update: { 2024-08-28T12:14:48.463874+00:00 app[web.1]: token: 'credit-token-1', 2024-08-28T12:14:48.463875+00:00 app[web.1]: customer_token: 'customer-token-1', 2024-08-28T12:14:48.463875+00:00 app[web.1]: credit_limit: 10000, 2024-08-28T12:14:48.463875+00:00 app[web.1]: balance: 9950, 2024-08-28T12:14:48.463876+00:00 app[web.1]: credit_score: 750 2024-08-28T12:14:48.463876+00:00 app[web.1]: } Not only do we have a GraphQL service with a subscription implementation running, but we now have a WebSocket client consuming those updates. Conclusion My readers may recall my personal mission statement, which I feel can apply to any IT professional: “Focus your time on delivering features/functionality that extends the value of your intellectual property. Leverage frameworks, products, and services for everything else.” — J. Vester In this deep-dive into GraphQL subscriptions, we’ve successfully consumed updates from an Apollo Server running on Heroku by using another service also running on Heroku — a Node.js-based application that uses WebSockets. By leveraging lightweight subscriptions, we avoided sending queries for unchanging data but simply subscribed to receive credit balance updates as they occurred. In the introduction, I mentioned looking for an additional value principle inside a topic I’ve written about before. GraphQL subscriptions are an excellent example of what I had in mind because it allows consumers to receive updates immediately, without needing to make queries against the source data. This will make consumers of the Customer 360 data very excited, knowing that they can receive live updates as they happen. Heroku is another example that continues to adhere to my mission statement by offering a platform that enables me to quickly prototype solutions using a CLI and standard Git commands. This not only gives me an easy way to showcase my subscriptions use case but to implement a consumer using WebSockets too. If you’re interested in the source code for this article, check out my repositories on GitLab: graphql-server-customer-subscription websocket-example I feel confident when I say that I’ve successfully leveled up my GraphQL skills with this effort. This journey was new and challenging for me — and also a lot of fun! I plan to dive into authentication next, which hopefully provides another opportunity to level up with GraphQL and Apollo Server. Stay tuned! Have a really great day!

By John Vester

CORE

The Ultimate Database Scaling Cheatsheet: Strategies for Optimizing Performance and Scalability

As applications grow in complexity and user base, the demands on their underlying databases increase significantly. Efficient database scaling becomes crucial to maintain performance, ensure reliability, and manage large volumes of data. Scaling a database effectively involves a combination of strategies that optimize both hardware and software resources to handle increasing loads. This cheatsheet provides an overview of essential techniques for database scaling. From optimizing query performance with indexing to distributing data across multiple servers with horizontal scaling, each section covers a critical aspect of database management. Whether you're dealing with a rapidly growing application or preparing for future growth, understanding these strategies will help you make informed decisions to ensure your database remains robust and responsive. This guide will walk you through the key concepts and best practices for: Indexing: Enhancing query performance through efficient data retrieval methods Vertical scaling: Increasing the capacity of a single database server to handle more load Horizontal scaling/sharding: Distributing data across multiple servers to manage larger datasets and higher traffic Denormalization: Improving read performance by reducing the number of joins through strategic data redundancy Caching: Reducing database load by storing frequently accessed data in faster storage layers Replication: Enhancing availability and reliability by copying data across multiple databases By mastering these techniques, you can ensure that your database infrastructure scales efficiently and remains performant as your application and data grow. 1. Indexing What Is Indexing? Indexing is a technique used to improve the speed of data retrieval operations on a database table at the cost of additional storage space. An index creates a data structure (e.g., B-Tree, Hash Table) that allows the database to quickly locate rows without scanning the entire table. Key Concepts Primary index: Automatically created on the primary key of a table, it ensures uniqueness and speeds up query performance on that key. Secondary index: Created on columns that are frequently used in query conditions (WHERE clauses). It helps in speeding up searches but may slow down write operations due to the need to maintain the index. Composite index: An index on multiple columns. It is useful for queries that filter on multiple columns, but the order of columns in the index is crucial. Unique index: Ensures that the indexed columns have unique values, similar to a primary key but can be applied to non-primary columns. Best Practices Index selective columns: Columns with high cardinality (a large number of unique values) benefit most from indexing. Avoid over-indexing: While indexes speed up reads, they slow down writes (INSERT, UPDATE, DELETE) due to the additional overhead of maintaining the index. Use only necessary indexes. Monitor index performance: Regularly analyze query performance to ensure indexes are effectively used. Tools like EXPLAIN (in SQL) can help diagnose issues. Consider covering indexes: A covering index contains all the columns needed for a query, allowing the database to satisfy the query entirely from the index without accessing the table. Challenges Maintenance overhead: Indexes need to be updated as the data changes, which can introduce performance bottlenecks in write-heavy applications. Increased storage: Indexes consume additional disk space, which can be significant depending on the size of the data and the number of indexes. Complex queries: In some cases, complex queries may not benefit from indexes, especially if they involve functions or multiple table joins. Conclusion Indexing is a powerful tool for optimizing database performance, particularly for read-heavy workloads. However, it's essential to balance the benefits of fast data retrieval with the potential costs in terms of storage and write performance. Regularly review and optimize indexes to ensure your database scales effectively as your application grows. 2. Vertical Scaling What Is Vertical Scaling? Vertical scaling, also known as "scaling up," involves increasing the capacity of a single database server to handle a higher load. This can be achieved by upgrading the server's hardware, such as adding more CPU cores, increasing RAM, or using faster storage solutions like SSDs. The goal is to boost the server's ability to process more transactions, handle larger datasets, and improve overall performance. Key Concepts CPU upgrades: More powerful processors with higher clock speeds or additional cores can handle more concurrent queries, reducing latency and improving throughput. Memory expansion: Increasing the amount of RAM allows the database to cache more data in memory, reducing the need to access slower disk storage and speeding up query performance. Storage improvements: Moving from traditional hard drives to SSDs or even NVMe drives can drastically reduce data access times, leading to faster read and write operations. Database tuning: Beyond hardware upgrades, tuning the database configuration (e.g., adjusting buffer sizes, and cache settings) to take full advantage of the available resources is crucial for maximizing the benefits of vertical scaling. Advantages Simplicity: Vertical scaling is straightforward since it doesn't require changes to the application or database architecture. Upgrading hardware is often less complex than implementing horizontal scaling or sharding. Consistency: With a single server, there's no need to worry about issues like data consistency across multiple nodes or the complexities of distributed transactions. Maintenance: Managing a single server is simpler, as it involves fewer moving parts than a distributed system. Challenges Cost: High-performance hardware can be expensive, and there is often a diminishing return on investment as you approach the upper limits of server capacity. Single point of failure: Relying on a single server increases the risk of downtime if the server fails. Redundancy and failover mechanisms become critical in such setups. Scalability limits: There's a physical limit to how much you can scale up a single server. Once you reach the maximum hardware capacity, further scaling requires transitioning to horizontal scaling or sharding. Conclusion Vertical scaling is an effective solution for improving database performance in the short term, especially for applications that are not yet experiencing massive growth. However, it's important to recognize its limitations. As your application continues to grow, you may eventually need to combine vertical scaling with other strategies like horizontal scaling or replication to ensure continued performance and availability. Balancing the simplicity and power of vertical scaling with its potential limitations is key to maintaining a scalable database infrastructure. 3. Horizontal Scaling/Sharding What Is Horizontal Scaling? Horizontal scaling, often referred to as "scaling out," involves distributing your database across multiple servers to manage larger datasets and higher traffic. Unlike vertical scaling, where you improve a single server's capacity, horizontal scaling adds more servers to handle the load. This approach spreads the data and query load across multiple machines, allowing for virtually unlimited scaling as your application grows. Sharding Sharding is a specific technique used in horizontal scaling where the database is divided into smaller, more manageable pieces called "shards." Each shard is a subset of the overall data and is stored on a separate server. Queries are directed to the appropriate shard based on the data's partitioning logic (e.g., range-based, hash-based). Sharding helps distribute the load evenly across servers and can significantly improve performance and scalability. Key Concepts Partitioning: The process of dividing a database into smaller parts (shards) that can be spread across multiple servers. Partitioning logic determines how the data is divided (e.g., by user ID, geographic region). Replication: In conjunction with sharding, data can be replicated across shards to ensure availability and fault tolerance. Load balancing: Distributing incoming database queries evenly across multiple servers to prevent any single server from becoming a bottleneck. Consistency models: Ensuring data consistency across shards can be challenging. Different consistency models, such as eventual consistency or strong consistency, can be employed based on application requirements. Advantages Scalability: Horizontal scaling offers virtually unlimited scalability by adding more servers as needed. This allows your database infrastructure to grow with your application. Fault tolerance: By distributing data across multiple servers, the failure of a single server has less impact, as other servers can take over the load or provide data redundancy. Cost-effectiveness: Scaling out with multiple commodity servers can be more cost-effective than investing in increasingly expensive high-performance hardware for a single server. Challenges Complexity: Managing a sharded database is more complex than managing a single server. It requires careful planning of partitioning logic, replication strategies, and query routing. Consistency and availability: Ensuring consistency across shards can be difficult, especially in distributed environments. Trade-offs between consistency, availability, and partition tolerance (CAP theorem) need to be considered. Data redistribution: As your application grows, you may need to re-shard or redistribute data across servers, which can be a complex and resource-intensive process. Conclusion Horizontal scaling and sharding are powerful strategies for managing large-scale applications that require high availability and can handle massive amounts of data. While the complexity of managing a distributed system increases, the benefits of improved scalability, fault tolerance, and cost-effectiveness often outweigh the challenges. Proper planning and implementation of horizontal scaling can ensure your database infrastructure remains robust and scalable as your application continues to grow. 4. Denormalization What Is Denormalization? Denormalization is the process of intentionally introducing redundancy into a database to improve read performance. It involves restructuring a normalized database (where data is organized to minimize redundancy) by combining tables or adding duplicate data to reduce the number of joins required in queries. This can lead to faster query execution times at the cost of increased storage space and potential complexity in maintaining data consistency. Key Concepts Normalization vs. denormalization: Normalization organizes data to minimize redundancy and dependencies, typically through multiple related tables. Denormalization, on the other hand, merges these tables or adds redundant data to optimize query performance. Precomputed aggregates: Storing aggregated data (e.g., total sales per region) in a denormalized form can significantly speed up queries that require these calculations, reducing the need for complex joins or real-time computations. Data redundancy: By duplicating data across multiple tables or including commonly queried fields directly in related tables, denormalization reduces the need to join tables frequently, which can drastically improve query performance. Advantages Improved read performance: Denormalized databases can execute read-heavy queries much faster by eliminating the need for complex joins and reducing the computational overhead during query execution. Simplified queries: With fewer tables to join, queries become simpler and more straightforward, making it easier for developers to write and maintain efficient queries. Optimized for specific use cases: Denormalization allows you to tailor your database schema to optimize performance for specific, frequently executed queries, making it ideal for read-heavy applications. Challenges Data inconsistency: The primary trade-off in denormalization is the risk of data inconsistency. Since the same data might be stored in multiple places, ensuring that all copies of the data remain synchronized during updates can be challenging. Increased storage costs: Redundant data consumes additional storage space, which can be significant depending on the size of the database and the extent of denormalization. Complex updates: Updating data in a denormalized database can be more complex, as changes must be propagated across all redundant copies of the data, increasing the likelihood of errors and requiring more careful transaction management. Best Practices Selective denormalization: Only denormalize data that is frequently queried together or requires fast read performance. Avoid over-denormalizing, as it can lead to unmanageable complexity. Maintain a balance: Strive to balance the benefits of faster reads with the potential downsides of increased complexity and storage requirements. Regularly review your denormalization strategies as the application's needs evolve. Use case evaluation: Carefully evaluate the use cases where denormalization will have the most impact, such as in read-heavy workloads or where query performance is critical to user experience. Conclusion Denormalization is a powerful tool for optimizing read performance in databases, especially in scenarios where speed is critical. However, it comes with trade-offs in terms of data consistency, storage costs, and update complexity. By carefully applying denormalization where it makes the most sense, you can significantly enhance the performance of your database while managing the associated risks. Properly balancing normalization and denormalization is key to maintaining a scalable and performant database infrastructure. 5. Caching What Is Caching? Caching is a technique used to temporarily store frequently accessed data in a fast-access storage layer, such as memory, to reduce the load on the database and improve application performance. By serving data from the cache instead of querying the database, response times are significantly faster, and the overall system scalability is enhanced. Key Concepts In-memory cache: A cache stored in RAM, such as Redis or Memcached, which provides extremely fast data retrieval times. In-memory caches are ideal for storing small, frequently accessed datasets. Database query cache: Some databases offer built-in query caching, where the results of expensive queries are stored and reused for subsequent requests, reducing the need for repeated query execution. Object caching: Storing the results of expensive computations or database queries as objects in memory. This can be used to cache rendered pages, user sessions, or any other data that is expensive to generate or fetch. Cache expiration: A strategy to invalidate or refresh cached data after a certain period (time-to-live or TTL) to ensure that the cache doesn't serve stale data. Cache expiration policies can be time-based, event-based, or based on data changes. Advantages Improved performance: Caching can significantly reduce the load on the database by serving frequently accessed data from a faster cache layer, resulting in faster response times for users. Scalability: By offloading read operations to the cache, the database can handle more simultaneous users and queries, making the application more scalable. Cost efficiency: Reducing the number of database queries lowers the need for expensive database resources and can reduce the overall infrastructure costs. Challenges Cache invalidation: One of the most challenging aspects of caching is ensuring that the cached data remains fresh and consistent with the underlying database. Invalidation strategies must be carefully designed to prevent serving stale data. Cache misses: When data is not found in the cache (a cache miss), the application must fall back to querying the database, which can introduce latency. Proper cache population and management strategies are crucial to minimizing cache misses. Complexity: Implementing and maintaining a caching layer adds complexity to the application architecture. It requires careful planning and monitoring to ensure that the cache is effective and does not introduce additional issues, such as memory overuse or data inconsistency. Best Practices Use caching wisely: Cache data that is expensive to compute or frequently accessed. Avoid caching data that changes frequently unless you have a robust invalidation strategy. Monitor cache performance: Regularly monitor the cache hit rate (the percentage of requests served from the cache) and adjust cache size, expiration policies, and strategies to optimize performance. Layered caching: Consider using multiple layers of caching (e.g., in-memory cache for ultra-fast access and a distributed cache for larger datasets) to balance performance and resource utilization. Conclusion Caching is a critical component of a scalable database architecture, especially for read-heavy applications. It can dramatically improve performance and reduce the load on your database, but it must be implemented with careful consideration of cache invalidation, data consistency, and overall system complexity. By leveraging caching effectively, you can ensure that your application remains fast and responsive, even as the load increases. 6. Replication What Is Replication? Replication involves copying and maintaining database objects, such as tables, across multiple database servers. This process ensures that the same data is available across different servers, which can improve availability, fault tolerance, and load distribution. Replication can be set up in various configurations, such as master-slave, master-master, or multi-master, depending on the needs of the application. Key Concepts Master-slave replication: In this model, the master server handles all write operations, while one or more slave servers replicate the data from the master and handle read operations. This setup reduces the load on the master server and increases read performance. Master-master replication: In this configuration, multiple servers (masters) can accept write operations and replicate the changes to each other. This approach allows for high availability and load distribution but requires careful conflict resolution mechanisms. Synchronous vs. asynchronous replication: Synchronous replication ensures that data is written to all replicas simultaneously, providing strong consistency but potentially increasing latency. Asynchronous replication, on the other hand, allows for lower latency but introduces the risk of data inconsistency if a failure occurs before all replicas are updated. Failover and redundancy: Replication provides a failover mechanism where, if the master server fails, one of the slave servers can be promoted to master to ensure continuous availability. This redundancy is crucial for high-availability systems. Advantages High availability: By maintaining multiple copies of the data, replication ensures that the database remains available even if one or more servers fail. This is critical for applications that require 24/7 uptime. Load distribution: Replication allows read operations to be distributed across multiple servers, reducing the load on any single server and improving overall system performance. Fault tolerance: In the event of a hardware failure, replication provides a backup that can be quickly brought online, minimizing downtime and data loss. Challenges Data consistency: Ensuring that all replicas have consistent data can be challenging, especially in asynchronous replication setups where there might be a delay in propagating updates. Conflict resolution strategies are necessary for multi-master configurations. Increased complexity: Managing a replicated database system introduces additional complexity in terms of setup, maintenance, and monitoring. It requires careful planning and execution to ensure that replication works effectively and does not introduce new problems. Latency issues: Synchronous replication can introduce latency in write operations because the system waits for confirmation that all replicas have been updated before proceeding. This can affect the overall performance of the application. Best Practices Choose the right replication strategy: Select a replication model (master-slave, master-master, etc.) based on your application's specific needs for consistency, availability, and performance. Monitor and optimize: Regularly monitor replication lag (the delay between updates to the master and when those updates appear on the replicas) and optimize the replication process to minimize this lag. Plan for failover: Implement automated failover mechanisms to ensure that your system can quickly recover from failures without significant downtime. Conclusion Replication is a vital strategy for building a robust, high-availability database system. It enhances fault tolerance, improves read performance, and ensures data availability across multiple servers. However, it also introduces challenges related to data consistency and system complexity. By carefully selecting the right replication strategy and continuously monitoring and optimizing the replication process, you can build a scalable and reliable database infrastructure that meets the demands of modern applications.

By Suleiman Dibirov

Zero To AI Hero, Part 3: Unleashing the Power of Agents in Semantic Kernel

As I promised in Part 2, it is time to build something substantial with our Semantic Kernel so far. If you are new to Semantic Kernel and must dive into code/head first, I highly recommend starting with Part 1 of this series. There is a lot of theory out there, but we explore these articles with a GitHub sample you can easily download and play with to understand the core concepts. I wanted to use Agent Smith from The Matrix, but I can't seem to find one without copyrights. So, DALL-E 3 to the rescue. Semantic Kernel’s agents aren’t just your typical AI assistants — they’re the multitasking powerhouses that bring advanced automation to your fingertips. By leveraging AI models, plugins, and personas, these agents can perform complex tasks that go beyond mere question-answering and light automation. This article will guide you through building agents with Semantic Kernel, focusing on the key components and offering practical examples to illustrate how to create an agent that plans a trip using various plugins. In this part, we will start looking into AI agents, expand on our example from Part 2, and plan an entire day trip with our newly minted Agent. What Are Agents in Semantic Kernel? Agents in Semantic Kernel are intelligent orchestrators designed to handle complex tasks by interacting with multiple plugins and AI models. They work like a highly organized manager who knows exactly which team members (plugins) to call upon and when to get the job done. Whether it’s planning a road trip, providing weather updates, or even helping you pack for a vacation, agents can combine all these functionalities into a cohesive, efficient flow. Fundamental Building Blocks of an Agent AI Models: The core decision-making unit of an agent, AI models can be Large Language Models like OpenAI’s GPT-4/Mistral AI or small language models like Microsoft's Phi-3. The models interpret user input and generate appropriate responses or actions. Plugins: We explored these in Part 2. These specialized tools allow the agent to perform actions like data retrieval, computation, or API communication. Think of plugins as the agent’s Swiss Army knife, each tool ready for a specific purpose. Simply put, plugins are just existing code callable by an agent. Plans: Plans define the flow of tasks the agent should follow. They map out each step the agent takes, determining which plugins to activate and in what sequence — this part we haven't discussed yet. We will go over plans in this article. Personas: A persona is simply the agent's role in a given context. In the general AI world, it is often called a meta prompt or system prompt. These instructions set the tone for the Agent and give it ground rules for what to do when in doubt. Memory: Memory helps agents retain information across interactions, allowing them to maintain context and remember user preferences. In other words, a simple chat history is part of memory, giving the agent a conversation context. Even if you provide a simple input like "yes" to an Agent's question, the Agent can tie your "yes" to the rest of the conversation and understand what you are answering, much like the humans. There are a few more small components that belong to Agents, such as connectors, etc.; we will omit them here to focus on what matters. It’s Time To Plan for Our Spontaneous Day Trip Let's build an agent capable of planning a day trip by car. Where I live, I have access to the mountains by the Poconos, Jersey Shore beaches, and the greatest city of New York, all within an hour to two-hour drive. I want to build an Agent capable of planning my entire day trip, considering the weather, what to pack, whether my car is fully charged, etc. Let's dive code/head first onto our Agent. C# using Microsoft.SemanticKernel; using Microsoft.SemanticKernel.ChatCompletion; using Microsoft.SemanticKernel.Connectors.OpenAI; using System.ComponentModel; var builder = Kernel.CreateBuilder(); builder.AddAzureOpenAIChatCompletion( deploymentName: "<YOUR_DEPLOYMENT_NAME>", endpoint: "<YOUR_ENDPOINT>", apiKey: "<YOUR_AZURE_OPENAI_API_KEY>" ); builder.Plugins.AddFromType<TripPlanner>(); // <----- This is anew fellow on this Part 3 - TripPlanner. Let's add it to the Kernel builder.Plugins.AddFromType<TimeTeller>(); // <----- This is the same fellow plugin from Part 2 builder.Plugins.AddFromType<ElectricCar>(); // <----- This is the same fellow plugin from Part 2 builder.Plugins.AddFromType<WeatherForecaster>(); // <----- New plugin. We don't want to end up in beach with rain, right? var kernel = builder.Build(); IChatCompletionService chatCompletionService = kernel.GetRequiredService<IChatCompletionService>(); ChatHistory chatMessages = new ChatHistory(""" You are a friendly assistant who likes to follow the rules. You will complete required steps and request approval before taking any consequential actions. If the user doesn't provide enough information for you to complete a task, you will keep asking questions until you have enough information to complete the task. """); while (true) { Console.Write("User > "); chatMessages.AddUserMessage(Console.ReadLine()!); OpenAIPromptExecutionSettings settings = new() { ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions }; var result = chatCompletionService.GetStreamingChatMessageContentsAsync( chatMessages, executionSettings: settings, kernel: kernel); Console.Write("Assistant > "); // Stream the results string fullMessage = ""; await foreach (var content in result) { Console.Write(content.Content); fullMessage += content.Content; } Console.WriteLine("\n--------------------------------------------------------------"); // Add the message from the agent to the chat history chatMessages.AddAssistantMessage(fullMessage); } public class TripPlanner // <------------ Trip planner plugin. An expert on planning trips { [KernelFunction] [Description("Returns back the required steps necessary to plan a one day travel to a destination by an electric car.")] [return: Description("The list of steps needed to plan a one day travel by an electric car")] public async Task<string> GenerateRequiredStepsAsync( Kernel kernel, [Description("A 2-3 sentence description of where is a good place to go to today")] string destination, [Description("The time of the day to start the trip")] string timeOfDay) { // Prompt the LLM to generate a list of steps to complete the task var result = await kernel.InvokePromptAsync($""" I'm going to plan a short one day vacation to {destination}. I would like to start around {timeOfDay}. Before I do that, can you succinctly recommend the top 2 steps I should take in a numbered list? I want to make sure I don't forget to pack anything for the weather at my destination and my car is sufficiently charged before I start the journey. """, new() { { "destination", destination }, { "timeOfDay", timeOfDay } }); // Return the plan back to the agent return result.ToString(); } } public class TimeTeller // <------------ Time teller plugin. An expert on time, peak and off-peak periods { [KernelFunction] [Description("This function retrieves the current time.")] [return: Description("The current time.")] public string GetCurrentTime() => DateTime.Now.ToString("F"); [KernelFunction] [Description("This function checks if the current time is off-peak.")] [return: Description("True if the current time is off-peak; otherwise, false.")] public bool IsOffPeak() => DateTime.Now.Hour < 7 || DateTime.Now.Hour >= 21; } public class WeatherForecaster // <------------ Weather plugin. An expert on weather. Can tell the weather at a given destination { [KernelFunction] [Description("This function retrieves weather at given destination.")] [return: Description("Weather at given destination.")] public string GetTodaysWeather([Description("The destination to retrieve the weather for.")] string destination) { // <--------- This is where you would call a fancy weather API to get the weather for the given <<destination>>. // We are just simulating a random weather here. string[] weatherPatterns = { "Sunny", "Cloudy", "Windy", "Rainy", "Snowy" }; Random rand = new Random(); return weatherPatterns[rand.Next(weatherPatterns.Length)]; } } public class ElectricCar // <------------ Car plugin. Knows about states and conditions of the electric car. Also can charge the car. { private bool isCarCharging = false; private int batteryLevel = 0; private CancellationTokenSource source; // Mimic charging the electric car, using a periodic timer. private async Task AddJuice() { source = new CancellationTokenSource(); var timer = new PeriodicTimer(TimeSpan.FromSeconds(5)); while (await timer.WaitForNextTickAsync(source.Token)) { batteryLevel++; if (batteryLevel == 100) { isCarCharging = false; Console.WriteLine("\rBattery is full."); source.Cancel(); return; } //Console.WriteLine($"Charging {batteryLevel}%"); Console.Write("\rCharging {0}%", batteryLevel); } } [KernelFunction] [Description("This function checks if the electric car is currently charging.")] [return: Description("True if the car is charging; otherwise, false.")] public bool IsCarCharging() => isCarCharging; [KernelFunction] [Description("This function returns the current battery level of the electric car.")] [return: Description("The current battery level.")] public int GetBatteryLevel() => batteryLevel; [KernelFunction] [Description("This function starts charging the electric car.")] [return: Description("A message indicating the status of the charging process.")] public string StartCharging() { if (isCarCharging) { return "Car is already charging."; } else if (batteryLevel == 100) { return "Battery is already full."; } Task.Run(AddJuice); isCarCharging = true; return "Charging started."; } [KernelFunction] [Description("This function stops charging the electric car.")] [return: Description("A message indicating the status of the charging process.")] public string StopCharging() { if (!isCarCharging) { return "Car is not charging."; } isCarCharging = false; source?.Cancel(); return "Charging stopped."; } } We will dissect the code later. For now, let's ask our Agent to plan our day trip for us. Kinda cool, isn't it? We didn't tell the Agent we wanted to charge the electric car. We only told the Agent to plan a trip; it knows intuitively that: The electric car needs to be charged, and The weather needs to be checked. Cool, indeed! We have a small charging simulator using .NET's PeriodicTimer. It is irrelevant for SK, but it would give an exciting update on the console, showing that the charging and battery juice levels are ongoing. As you can see in the screenshot below, I asked the Agent to stop charging the car when the battery level was 91%, which is sufficient for the trip. Did you also notice an interesting thing? When I first asked the question, I only said to plan a trip to the beach. I didn't mention when I was planning to go or which beach. The Agent was aware of this and asked us clarifying questions to get answers to these questions. This is where the persona+memory and the planner come into the picture. Let's start dissecting the code sideways with the Planner first. Planner: The Manager of Everything Think of a planner as a manager of some sort. It can identify the course of action, or "simple steps," to achieve what the user wants. In the above example, planner identifies two steps. Check the weather and pack accordingly: This is where the WeatherForecaster plugin comes into play later. Ensure the car is ready for the trip: This is where the ElectricCar plugin comes into play later. C# public class TripPlanner // <------------ Trip planner plugin. An expert on planning trips { [KernelFunction] [Description("Returns back the required steps necessary to plan a one day travel to a destination by an electric car.")] [return: Description("The list of steps needed to plan a one day travel by an electric car")] public async Task<string> GenerateRequiredStepsAsync( Kernel kernel, [Description("A 2-3 sentence description of where is a good place to go to today")] string destination, [Description("The time of the day to start the trip")] string timeOfDay) { // Prompt the LLM to generate a list of steps to complete the task var result = await kernel.InvokePromptAsync($""" I'm going to plan a short one day vacation to {destination}. I would like to start around {timeOfDay}. Before I do that, can you succinctly recommend the top 2 steps I should take in a numbered list? I want to make sure I don't forget to pack anything for the weather at my destination and my car is sufficiently charged before I start the journey. """, new() { { "destination", destination }, { "timeOfDay", timeOfDay } }); // Return the plan back to the agent return result.ToString(); } } Look at the parameters of the GenerateRequiredStepsAsync KernelFunction. It also needs to take in destination and timeOfDay. These are necessary to plan the trip. Without knowing when and to where, there can be no trips. Now, take a closer look at the prompt. This is where we tell the planner that I want to plan for the following: A day trip To the given destination At the specified time I am using my electric car. I haven't packed for the weather at the destination. Now our Agent knows through the planner that we need to come up with steps to satisfy all of these to plan the trip. The Agent is also aware of available plugins and has the authority to invoke them to provide me with a pleasant trip. Persona: Who Am I? This is where we tell the Agent who it is. The agent's persona is important as it helps the model act within character and take instructions from the user to decide what to do in a dilemma, what steps are to be taken before an action etc. In short, personas define the ground rules of behavior of an Agent. C# ChatHistory chatMessages = new ChatHistory(""" You are a friendly assistant who likes to follow the rules. You will complete required steps and request approval before taking any consequential actions. If the user doesn't provide enough information for you to complete a task, you will keep asking questions until you have enough information to complete the task. """); Here, we clearly define the character and role of our agent. We told it that you: Are an assistant Will follow given rules Take steps. Ask for approval before any major actions. Get clarification if the user doesn't give enough input. Iterations and Memory A new CharHistory instance is created with meta prompt/persona instruction as the first message. This history, later added by the user's input and LLM's responses, serves as a context memory of the conversation. This helps the Agent choose the correct action based on the context derived from the conversation history. C# while (true) { Console.Write("User > "); chatMessages.AddUserMessage(Console.ReadLine()!); OpenAIPromptExecutionSettings settings = new() { ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions }; var result = chatCompletionService.GetStreamingChatMessageContentsAsync( chatMessages, executionSettings: settings, kernel: kernel); Console.Write("Assistant > "); // Stream the results string fullMessage = ""; await foreach (var content in result) { Console.Write(content.Content); fullMessage += content.Content; } Console.WriteLine("\n--------------------------------------------------------------"); // Add the message from the agent to the chat history chatMessages.AddAssistantMessage(fullMessage); } As you can see, we are setting ToolCallBehavior to ToolCallBehavior.AutoInvokeKernelFunctions. This gives our Agent enough authority to invoke plugins when necessary. Each user's input and the model's response are added to the chatMessages. This will help set the context for further interactions. When I say, "That's enough charging," the agent would know that the car is being charged based on previous conversations. An agent's memory gear is nothing but chat history here. Augmented data would also serve as memory (part of the fancy RAG); we wouldn't touch on that for now. Plugins: The Robotic Arms We have already discussed plugins in detail in Part 2. We have added a WeatherForecaster plugin to the mix to help us plan the trip. In a real-world scenario, we would call a real weather API to get the actual weather. We are picking a random weather pattern for this example, which should suffice. We have also added a batteryLevel variable into our ElectricCar plugin. This helps us simulate the charging behavior using a simple timer. We wouldn't be getting into the details of each of these plugins here. Please revisit Part 2 to have a deeper understanding of how plugins work. As usual, this article includes a working GitHub sample. Clone the code and enjoy playing with it. Wrap Up We started harnessing the power of the Semantic Kernel. Once we start mixing plugins with persona, planner, and memory, the resulting Agents can automate tasks, ask leading questions, take actions on your behalf, get confirmation before executing essential tasks, and more. Agents in Semantic Kernel are not just tools; they’re dynamic assistants that combine the power of AI, plugins, and orchestrated plans to solve complex problems. By understanding their building blocks — AI models, plugins, plans, memory, and connectors — you can create competent agents tailored to your specific needs. The possibilities are vast, from managing travel plans to automating tedious tasks, making Semantic Kernel a powerful ally in your AI toolkit. What's Next? Now that we have connected all the pieces of the Semantic Kernel puzzle through Part 1, Part 2, and Part 3, it is time to start thinking beyond a console application. In the following parts of our series, we will add an Agent to an ASP.NET Core API and use dependency injection to create more than one kernel instance to help us navigate our trip planning. We are not going to stop there. We will integrate Semantic Kernel to a locally downloaded Small Language Model (SLM) and make it work for us. Once that works, we aren't far from a .NET MAUI app that can do the AI dance without internet connectivity or GPT-4. I am not going to spoil most of the surprises, keep going through this series to learn more and more!

By Aneesh Gopalakrishnan

How To Solve OutOfMemoryError: Metaspace

There are 9 types of java.lang.OutOfMemoryErrors, each signaling a unique memory-related issue within Java applications. Among these, java.lang.OutOfMemoryError: Metaspace is a challenging error to diagnose. In this post, we’ll delve into the root causes behind this error, explore potential solutions, and discuss effective diagnostic methods to troubleshoot this problem. Let’s equip ourselves with the knowledge and tools to conquer this common adversary. JVM Memory Regions To better understand OutOfMemoryError, we first need to understand different JVM Memory regions. Here is a video clip that gives a good introduction to different JVM memory regions. But in a nutshell, JVM has the following memory regions: Figure 1: JVM memory regions Young Generation: Newly created application objects are stored in this region. Old Generation: Application objects that are living for a longer duration are promoted from the Young Generation to the Old Generation. Basically, this region holds long-lived objects. Metaspace: Class definitions, method definitions, and other metadata that are required to execute your program are stored in the Metaspace region. This region was added in Java 8. Before that metadata definitions were stored in the PermGen. Since Java 8, PermGen was replaced by Metaspace. Threads: Each application thread requires a thread stack. Space allocated for thread stacks, which contain method call information and local variables are stored in this region. Code cache: Memory areas where compiled native code (machine code) of methods is stored for efficient execution are stored in this region. Direct buffer: ByteBuffer objects are used by modern frameworks (i.e., Spring WebClient) for efficient I/O operations. They are stored in this region. GC (Garbage Collection): Memory required for automatic garbage collection to work is stored in this region. JNI (Java Native Interface): Memory for interacting with native libraries and code written in other languages is stored in this region. misc: There are areas specific to certain JVM implementations or configurations, such as the internal JVM structures or reserved memory spaces, they are classified as ‘misc’ regions. What Is java.lang.OutOfMemoryError: Metaspace? Figure 2: java.lang.OutOfMemoryError: Metaspace With a lot of class definitions, method definitions are created in the Metaspace region than the allocated Metaspace memory limit (i.e., -XX:MaxMetaspaceSize), JVM will throw java.lang.OutOfMemoryError: Metaspace. What Causes java.lang.OutOfMemoryError: Metaspace? java.lang.OutOfMemoryError: Metaspace is triggered by the JVM under the following circumstances: Creating a large number of dynamic classes: If your application uses Groovy kind of scripting languages or Java Reflection to create new classes at runtime Loading a large number of classes: Either your application itself has a lot of classes or it uses a lot of 3rd party libraries/frameworks which have a lot of classes in it. Loading a large number of class loaders: Your application is loading a lot of class loaders. Solutions for OutOfMemoryError: Metaspace The following are the potential solutions to fix this error: Increase Metaspace size: If OutOfMemoryError surfaced due to an increase in the number of classes loaded, then increased the JVM’s Metaspace size (-XX:MetaspaceSize and -XX:MaxMetaspaceSize). This solution is sufficient to fix most of the OutOfMemoryError: Metaspace errors, because memory leaks rarely happen in the Metaspace region. Fix memory leak: Analyze memory leaks in your application using the approach given in this post. Ensure that class definitions are properly dereferenced when they are no longer needed to allow them to be garbage collected. Sample Program That Generates OutOfMemoryError: Metaspace To better understand java.lang.OutOfMemoryError: Metaspace, let’s try to simulate it. Let’s leverage BuggyApp, a simple open-source chaos engineering project. BuggyApp can generate various sorts of performance problems such as Memory Leak, Thread Leak, Deadlock, multiple BLOCKED threads, etc. Below is the Java program from the BuggyApp project that simulates java.lang.OutOfMemoryError: Metaspace when executed. import java.util.UUID; import javassist.ClassPool; public class OOMMetaspace { public static void main(String[] args) throws Exception { ClassPool classPool = ClassPool.getDefault(); while (true) { // Keep creating classes dynamically! String className = "com.buggyapp.MetaspaceObject" + UUID.randomUUID(); classPool.makeClass(className).toClass(); } } } In the above program, the OOMMetaspace’ class’s ‘main() method contains an infinite while (true) loop. Within the loop, the thread uses open-source library javassist to create dynamic classes whose names start with com.buggyapp.MetaspaceObject. Class names generated by this program will look something like this: com.buggyapp.MetaspaceObjectb7a02000-ff51-4ef8-9433-3f16b92bba78. When so many such dynamic classes are created, the Metaspace memory region will reach its limit and the JVM will throw java.lang.OutOfMemoryError: Metaspace. How To Troubleshoot OutOfMemoryError: Metaspace To diagnose OutOfMemoryError: Metaspace, we need to inspect the contents of the Metaspace region. Upon inspecting the contents, you can figure out the leaking area of the application code. Here is a blog post that describes a few different approaches to inspecting the contents of the Metaspace region. You can choose the approach that suits your requirements. My favorite options are: 1. -verbose:class If you are running on Java version 8 or below, then you can use this option. When you pass the -verbose:class option to your application during startup, it will print all the classes that are loaded into memory. Loaded classes will be printed in the standard error stream (i.e., console, if you aren’t routing your error stream to a log file). Example: java {app_name} -verbose:class When we passed the -verbose:class flag to the above program, in the console we started to see the following lines to be printed: [Loaded com.buggyapp.MetaspaceObjecta97f62c5-0f71-4702-8521-c312f3668f47 from __JVM_DefineClass__] [Loaded com.buggyapp.MetaspaceObject70967d20-609f-42c4-a2c4-b70b50592198 from __JVM_DefineClass__] [Loaded com.buggyapp.MetaspaceObjectf592a420-7109-42e6-b6cb-bc5635a6024e from __JVM_DefineClass__] [Loaded com.buggyapp.MetaspaceObjectdc7d12ad-21e6-4b17-a303-743c0008df87 from __JVM_DefineClass__] [Loaded com.buggyapp.MetaspaceObject01d175cc-01dd-4619-9d7d-297c561805d5 from __JVM_DefineClass__] [Loaded com.buggyapp.MetaspaceObject5519bef3-d872-426c-9d13-517be79a1a07 from __JVM_DefineClass__] [Loaded com.buggyapp.MetaspaceObject84ad83c5-7cee-467b-a6b8-70b9a43d8761 from __JVM_DefineClass__] [Loaded com.buggyapp.MetaspaceObject35825bf8-ff39-4a00-8287-afeba4bce19e from __JVM_DefineClass__] [Loaded com.buggyapp.MetaspaceObject665c7c09-7ef6-4b66-bc0e-c696527b5810 from __JVM_DefineClass__] [Loaded com.buggyapp.MetaspaceObject793d8aec-f2ee-4df6-9e0f-5ffb9789459d from __JVM_DefineClass__] : : This is a clear indication that classes with the com.buggyapp.MetaspaceObject prefixes are loaded so frequently into the memory. This is a great clue/hint to let you know where the leak is happening in the application. 2. -Xlog:class+load If you are running on Java version 9 or above, then you can use this option. When you pass the -Xlog:class+load option to your application during startup, it will print all the classes that are loaded into memory. Loaded classes will be printed in the file path you have configured. Example: java {app_name} -Xlog:class+load=info:/opt/log/loadedClasses.txt If you are still unable to determine the origination of the leak based on the class name, then you can do a deep dive by taking a heap dump from the application. You can capture a heap dump using one of the 8 options discussed in this post. You might choose the option that fits your needs. Once a heap dump is captured, you need to use tools like HeapHero, JHat, etc. to analyze the dumps. What Is Heap Dump? Heap Dump is basically a snapshot of your application memory. It contains detailed information about the objects and data structures present in the memory. It will tell what objects are present in the memory, whom they are referencing, who is referencing, what is the actual customer data stored in them, what size of they occupy, whether they are eligible for garbage collection, etc. They provide valuable insights into the memory usage patterns of an application, helping developers identify and resolve memory-related issues. How To Analyze Metaspace Memory Leak Through Heap Dump HeapHero is available in two modes: Cloud: You can upload the dump to the HeapHero cloud and see the results. On-Prem: You can register here, get the HeapHero installed on your local machine, and then do the analysis.Note: I prefer using the on-prem installation of the tool instead of using the cloud edition because heap dump tends to contain sensitive information (such as SSN, Credit Card Numbers, VAT, etc.), and I don’t want the dump to be analyzed in external locations. Once the heap dump is captured, from the above program, we upload it to the HeapHero tool. The tool analyzed the dump and generated the report. In the report go to the ‘Histogram’ view. This view will show all the classes that are loaded into the memory. In this view, you will notice the classes with the prefix com.buggyapp.MetaspaceObject. Right-click on the … that is next to the class name. Then click on the List Object(s) with > incoming references as shown in the below figure. Figure 3: Histogram view of showing all the loaded classes in memory Once you do it, the tool will display all the incoming references of this particular class. This will show the origin point of these classes as shown in the below figure. It will clearly show which part of the code is creating these class definitions. Once we know which part of the code is creating these class definitions, then it will be easy to fix the problem. Figure 4: Incoming references of the class Video Summary Here’s a video summary of the article: Conclusion In this post, we’ve covered a range of topics, from understanding JVM memory regions to diagnosing and resolving java.lang.OutOfMemoryError: Metaspace. We hope you’ve found the information useful and insightful. But our conversation doesn’t end here. Your experiences and insights are invaluable to us and to your fellow readers. We encourage you to share your encounters with java.lang.OutOfMemoryError: Metaspace in the comments below. Whether it’s a unique solution you’ve discovered, a best practice you swear by, or even just a personal anecdote, your contributions can enrich the learning experience for everyone.

By Ram Lakshmanan

CORE

How To Implement Specific Distributed System Patterns Using Spring Boot: Introduction

Regarding contemporary software architecture, distributed systems have been widely recognized for quite some time as the foundation for applications with high availability, scalability, and reliability goals. When systems shifted from a centralized structure, it became increasingly important to focus on the components and architectures that support a distributed structure. Regarding the choice of frameworks, Spring Boot is a widely adopted framework encompassing many tools, libraries, and components to support these patterns. This article will focus on the specific recommendations for implementing various distributed system patterns regarding Spring Boot, backed by sample code and professional advice. Spring Boot Overview One of the most popular Java EE frameworks for creating apps is Spring. The Spring framework offers a comprehensive programming and configuration mechanism for the Java platform. It seeks to make Java EE programming easier and increase developers' productivity in the workplace. Any type of deployment platform can use it. It tries to meet modern industry demands by making application development rapid and straightforward. While the Spring framework focuses on giving you flexibility, the goal of Spring Boot is to reduce the amount of code and give developers the most straightforward approach possible to create web applications. Spring Boot's default codes and annotation setup lessen the time it takes to design an application. It facilitates the creation of stand-alone applications with minimal, if any, configuration. It is constructed on top of a module of the Spring framework. With its layered architecture, Spring Boot has a hierarchical structure where each layer can communicate with any layer above or below it. Presentation layer: The presentation layer converts the JSON parameter to an object, processes HTTP requests (from the specific Restful API), authenticates the request, and sends it to the business layer. It is made up, in brief, of views or the frontend section. Business layer: All business logic is managed by this layer. It employs services from data access layers and is composed of service classes. It also carries out validation and permission. Persistence layer: Using various tools like JDBC and Repository, the persistence layer translates business objects from and to database rows. It also houses all of the storage logic. Database layer: CRUD (create, retrieve, update, and delete) actions are carried out at the database layer. The actual scripts that import and export data into and out of the database This is how the Spring Boot flow architecture appears: Table 1: Significant differences between Spring and Spring Boot 1. Microservices Pattern The pattern of implementing microservices is arguably one of the most used designs in the current software world. It entails breaking down a complex, monolithic application into a collection of small, interoperable services. System-dependent microservices execute their processes and interconnect with other services using simple, lightweight protocols, commonly RESTful APIs or message queues. The first advantages of microservices include that they are easier to scale, separate faults well, and can be deployed independently. Spring Boot and Spring Cloud provide an impressive list of features to help implement a microservices architecture. Services from Spring Cloud include service registry, provided by Netflix Eureka or Consul; configuration offered by Spring Cloud config; and resilience pattern offered through either Hystrix or recently developed Resilience4j. Let’s, for instance, take a case where you’re creating an e-commerce application. This application can be split into several microservices covering different domains, for example, OrderService, PaymentService, and InventoryService. All these services can be built, tested, and implemented singularly in service-oriented systems. Java @RestController @RequestMapping("/orders") public class OrderController { @Autowired private OrderService orderService; @PostMapping public ResponseEntity<Order> createOrder(@RequestBody Order order) { Order createdOrder = orderService.createOrder(order); return ResponseEntity.status(HttpStatus.CREATED).body(createdOrder); } @GetMapping("/{id}") public ResponseEntity<Order> getOrder(@PathVariable Long id) { Order order = orderService.getOrderById(id); return ResponseEntity.ok(order); } } @Service public class OrderService { // Mocking a database call private Map<Long, Order> orderRepository = new HashMap<>(); public Order createOrder(Order order) { order.setId(System.currentTimeMillis()); orderRepository.put(order.getId(), order); return order; } public Order getOrderById(Long id) { return orderRepository.get(id); } } In the example above, OrderController offers REST endpoints for making and retrieving orders, while OrderService manages the business logic associated with orders. With each service operating in a separate, isolated environment, this pattern may be replicated for the PaymentService and InventoryService. 2. Event-Driven Pattern In an event-driven architecture, the services do not interact with each other in a request-response manner but rather in a loosely coupled manner where some services only produce events and others only consume them. This pattern is most appropriate when there is a need for real-time processing while simultaneously fulfilling high scalability requirements. It thus establishes the independence of the producers and consumers of events — they are no longer tightly linked. An event-driven system can efficiently work with large and unpredictable loads of events and easily tolerate partial failures. Implementation With Spring Boot Apache Kafka, RabbitMQ, or AWS SNS/SQS can be effectively integrated with Spring Boot, greatly simplifying the creation of event-driven architecture. Spring Cloud Stream provides developers with a higher-level programming model oriented on microservices based on message-driven architecture, hiding the specifics of different messaging systems behind the same API. Let us expand more on the e-commerce application. Consider such a scenario where the order is placed, and the OrderService sends out an event. This event can be consumed by other services like InventoryService to adjust the stock automatically and by ShippingService to arrange delivery. Java // OrderService publishes an event @Autowired private KafkaTemplate<String, String> kafkaTemplate; public void publishOrderEvent(Order order) { kafkaTemplate.send("order_topic", "Order created: " + order.getId()); } // InventoryService listens for the order event @KafkaListener(topics = "order_topic", groupId = "inventory_group") public void consumeOrderEvent(String message) { System.out.println("Received event: " + message); // Update inventory based on the order details } In this example, OrderService publishes an event to a Kafka topic whenever a new order is created. InventoryService, which subscribes to this topic, consumes and processes the event accordingly. 3. CQRS (Command Query Responsibility Segregation) The CQRS pattern suggests the division of the handling of commands into events that change the state from the queries, which are events that retrieve the state. This can help achieve a higher level of scalability and maintainability of the solution, especially when the read and write operations within an application are significantly different in the given area of a business domain. As for the support for implementing CQRS in Spring Boot applications, let’s mention the Axon Framework, designed to fit this pattern and includes command handling, event sourcing, and query handling into the mix. In a CQRS setup, commands modify the state in the write model, while queries retrieve data from the read model, which could be optimized for different query patterns. A banking application, for example, where account balances are often asked, but the number of transactions that result in balance change is comparatively less. By separating these concerns, a developer can optimize the read model for fast access while keeping the write model more consistent and secure. Java // Command to handle money withdrawal @CommandHandler public void handle(WithdrawMoneyCommand command) { if (balance >= command.getAmount()) { balance -= command.getAmount(); AggregateLifecycle.apply(new MoneyWithdrawnEvent(command.getAccountId(), command.getAmount())); } else { throw new InsufficientFundsException(); } } // Query to fetch account balance @QueryHandler public AccountBalance handle(FindAccountBalanceQuery query) { return new AccountBalance(query.getAccountId(), this.balance); } In this code snippet, a WithdrawMoneyCommand modifies the account balance in the command model, while a FindAccountBalanceQuery retrieves the balance from the query model. 4. API Gateway Pattern The API Gateway pattern is one of the critical patterns used in a microservices architecture. It is the central access point for every client request and forwards it to the right microservice. The following are the cross-cutting concerns: Authentication, logging, rate limiting, and load balancing, which are all handled by the gateway. Spring Cloud Gateway is considered the most appropriate among all the available options for using an API Gateway in a Spring Boot application. It is developed on Project Reactor, which makes it very fast and can work with reactive streams. Let us go back to our first e-commerce example: an API gateway can forward the request to UserService, OrderService, PaymentService, etc. It can also have an authentication layer and accept subsequent user requests to be passed to the back-end services. Java @Bean public RouteLocator customRouteLocator(RouteLocatorBuilder builder) { return builder.routes() .route("order_service", r -> r.path("/orders/**") .uri("lb://ORDER-SERVICE")) .route("payment_service", r -> r.path("/payments/**") .uri("lb://PAYMENT-SERVICE")) .build(); } In this example, the API Gateway routes requests to the appropriate microservice based on the request path. The lb://prefix indicates that these services are registered with a load balancer (such as Eureka). 5. Saga Pattern The Saga pattern maintains transactions across multiple services in a distributed transaction environment. With multiple microservices available, it becomes challenging to adjust data consistency in a distributed system where each service can have its own database. The Saga pattern makes it possible for all the operations across services to be successfully completed or for the system to perform compensating transactions to reverse the effects of failure across services. The Saga pattern can be implemented by Spring Boot using either choreography — where services coordinate and interact directly through events — or orchestration, where a central coordinator oversees the Saga. Each strategy has advantages and disadvantages depending on the intricacy of the transactions and the degree of service coupling. Imagine a scenario where placing an order involves multiple services: A few of them include PaymentService, InventoryService, and ShippingService. Every service has to be successfully executed for the order to be confirmed. If any service fails, compensating transactions must be performed to bring the system back to its initial status. Java public void processOrder(Order order) { try { paymentService.processPayment(order.getPaymentDetails()); inventoryService.reserveItems(order.getItems()); shippingService.schedule**process(order);** Figure 2: Amazon’s Saga Pattern Functions Workflow The saga pattern is a failure management technique that assists in coordinating transactions across several microservices to preserve data consistency and establish consistency in distributed systems. Every transaction in a microservice publishes an event, and the subsequent transaction is started based on the event's result. Depending on whether the transactions are successful or unsuccessful, they can proceed in one of two ways. As demonstrated in Figure 2, the Saga pattern uses AWS Step Functions to construct an order processing system. Every step (like "ProcessPayment") has a separate step to manage the process's success (like "UpdateCustomerAccount") or failure (like "SetOrderFailure"). A company or developer ought to think about implementing the Saga pattern if: The program must provide data consistency amongst several microservices without tightly connecting them together. Because some transactions take a long time to complete, they want to avoid the blocking of other microservices due to the prolonged operation of one microservice. If an operation in the sequence fails, it must be possible to go back in time. It is important to remember that the saga pattern becomes more complex as the number of microservices increases and that debugging is challenging. The pattern necessitates the creation of compensatory transactions for reversing and undoing modifications using a sophisticated programming methodology. 6. Circuit Breaker Pattern Circuit Breaker is yet another fundamental design pattern in distributed systems, and it assists in overcoming the domino effect, thereby enhancing the system's reliability. It operates so that potentially failing operations are enclosed by a circuit breaker object that looks for failure. When failures exceed the specified limit, the circuit "bends,and the subsequent calls to the operation simply return an error or an option of failure without performing the task. It enables the system to fail quickly and/or protects other services that may be overwhelmed. In Spring, you can apply the Circuit Breaker pattern with the help of Spring Cloud Circuit Breaker with Resilience4j. Here's a concise implementation: Java // Add dependency in build.gradle or pom.xml // implementation 'org.springframework.cloud:spring-cloud-starter-circuitbreaker-resilience4j' import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker; import org.springframework.stereotype.Service; @Service public class ExampleService { @CircuitBreaker(name = "exampleBreaker", fallbackMethod = "fallbackMethod") public String callExternalService() { // Simulating an external service call that might fail if (Math.random() < 0.7) { // 70% chance of failure throw new RuntimeException("External service failed"); } return "Success from external service"; } public String fallbackMethod(Exception ex) { return "Fallback response: " + ex.getMessage(); } } // In application.properties or application.yml resilience4j.circuitbreaker.instances.exampleBreaker.failureRateThreshold=50 resilience4j.circuitbreaker.instances.exampleBreaker.waitDurationInOpenState=5000ms resilience4j.circuitbreaker.instances.exampleBreaker.slidingWindowSize=10 In this instance of implementation: A developer adds the @CircuitBreaker annotation to the callExternalService function. When the circuit is open, the developer specifies a fallback method that will be called. Configure the application configuration file's circuit breaker properties. This configuration enhances system stability by eliminating cascade failures and allowing the service to handle errors gracefully in the external service call. Conclusion By applying the microservices pattern, event-driven pattern, command query responsibility segregation, API gateway pattern, saga pattern, and circuit breaker pattern with the help of Spring Boot, developers and programmers can develop distributed systems that are scalable, recoverable, easily maintainable, and subject to evolution. An extensive ecosystem of Spring Boot makes it possible to solve all the problems associated with distributed computing, which makes this framework the optimal choice for developers who want to create a cloud application. Essential examples and explanations in this article are constructed to help the reader begin using distributed system patterns while developing applications with Spring Boot. However, in order to better optimize and develop systems and make sure they can withstand the demands of today's complex and dynamic software environments, developers can investigate more patterns and sophisticated methodologies as they gain experience. References Newman, S. (2015). Building Microservices: Designing Fine-Grained Systems. O'Reilly Media. Richards, M. (2020). Software Architecture Patterns. O'Reilly Media. AWS Documentation. (n.d.). AWS Step Functions - Saga Pattern Implementation Nygard, M. T. (2007). Release It!: Design and Deploy Production-Ready Software. Pragmatic Bookshelf. Resilience4j Documentation. (n.d.). Spring Cloud Circuit Breaker with Resilience4j. Red Hat Developer. (2020). Microservices with the Saga Pattern in Spring Boot.

By Dhruv Seth