Building a Distributed Lock Management Service can be a very challenging problem! A lock manager serves locks to multiple concurrent requests that are trying to gain control (a lock) over a shared resource. For example: multiple processes trying to update a central table, where we want exactly one process to have full control over the table at any point in time.
Relational databases do guarantee ACID transactions, but under a huge number of concurrent requests that guarantee alone isn't enough: most databases run at a weaker isolation level by default (typically Read Committed rather than Serializable), so requests can still see stale or inconsistent data, and raising isolation to Serializable hurts throughput under heavy contention. What we really need here is not just atomicity between requests but pessimistic concurrency control. We essentially need our service to grant the resource to only one request at a time, and to do this very, very fast!
NoSQL databases like DynamoDB are well suited for this use case because:
1) They scale horizontally to virtually unlimited concurrent requests.
2) They provide features such as transaction APIs and conditional updates that let you truly serve only one writer at a time. This helps in building semaphores!
3) They offer single-digit millisecond response times.
I'll be writing a detailed article on this exact implementation soon! #dataengineering #systemdesign
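Until the detailed article is out, here is a minimal sketch of the conditional-write idea in Python with boto3. The `locks` table name, attribute names, and TTL handling are my own placeholder assumptions for illustration, not the final design:

```python
import time
import uuid
import boto3

dynamodb = boto3.client("dynamodb")

def try_acquire_lock(resource: str, ttl_seconds: int = 30):
    """Attempt to acquire a lock on `resource`; return an owner token or None."""
    owner = str(uuid.uuid4())
    expires_at = int(time.time()) + ttl_seconds
    try:
        # The conditional write succeeds only if no live lock item exists,
        # so exactly one concurrent caller wins the race.
        dynamodb.put_item(
            TableName="locks",  # hypothetical table name
            Item={
                "lock_key": {"S": resource},
                "owner": {"S": owner},
                "expires_at": {"N": str(expires_at)},
            },
            ConditionExpression="attribute_not_exists(lock_key) OR expires_at < :now",
            ExpressionAttributeValues={":now": {"N": str(int(time.time()))}},
        )
        return owner
    except dynamodb.exceptions.ConditionalCheckFailedException:
        return None  # someone else currently holds the lock
```

Releasing the lock would be a matching conditional delete keyed on the owner token, so a stale holder can never release someone else's lock.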
More Relevant Posts
-
CAP Theorem? Let's go! It states that a distributed system cannot simultaneously guarantee all three of these properties:
- Consistency
- Availability
- Partition Tolerance (the system keeps working even when part of it fails or the network gets partitioned)
Whichever property a system gives up shapes the nature of that system. An example? The difference between relational databases and non-relational (I mean NoSQL) databases is consistency. Remember that a relational database enforces ACID properties (Atomicity, Consistency, Isolation, Durability)? NoSQL databases, on the other hand, follow the BASE model (Basically Available, Soft state, Eventually consistent). This means that NoSQL databases, due to their distributed nature, often sacrifice strong consistency for availability. They do guarantee that eventually, if no new updates are made to a data object, all reads of that object will return its latest version. So...so... NoSQL databases tend to scale horizontally (easy replication, and replication enables availability) while relational databases traditionally scale vertically (which makes strong consistency easier to enforce). I will share more on this once I have explored this topic more. ——— Meanwhile, anticipate my first post on API integration for front-end engineers tomorrow.
-
Why do Most Distributed Systems Fail to Scale? A few years ago, I worked on a system that couldn't handle the pressure. The database slowed down, response times got worse, and scaling felt impossible. The problem? Reads and writes weren't separated. That's when I learned about CQRS (Command Query Responsibility Segregation) and Event Sourcing.
🔹 CQRS separates reads from writes so each side can scale independently.
🔹 Event Sourcing keeps a full history of changes for transparency.
🔹 Together, they make systems faster and more reliable.
In my latest blog, I explain how these patterns work and how to implement them using tools like Kafka and PostgreSQL. 👉 Read it here: https://lnkd.in/ga3kzeSa PS: Have you faced similar scaling issues? Let's chat in the comments! Sai Sankara Kesava Nath Panda - Siddhartha S #code #tech #database #networking #architecture
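Not from the linked blog, but as a rough, self-contained Python sketch of the CQRS + event-sourcing split: the in-memory event log stands in for a Kafka topic and the projection stands in for a PostgreSQL read model.

```python
from dataclasses import dataclass
from typing import Dict, List

# --- Write side: commands only append immutable events (event sourcing) ---

@dataclass
class AccountOpened:
    account_id: str

@dataclass
class MoneyDeposited:
    account_id: str
    amount: int

class CommandHandler:
    def __init__(self, event_log: List):
        self.event_log = event_log  # append-only history of every change

    def open_account(self, account_id: str) -> None:
        self.event_log.append(AccountOpened(account_id))

    def deposit(self, account_id: str, amount: int) -> None:
        self.event_log.append(MoneyDeposited(account_id, amount))

# --- Read side: a projection rebuilt from events, optimized for queries ---

class BalanceProjection:
    def __init__(self):
        self.balances: Dict[str, int] = {}

    def apply(self, event) -> None:
        if isinstance(event, AccountOpened):
            self.balances[event.account_id] = 0
        elif isinstance(event, MoneyDeposited):
            self.balances[event.account_id] += event.amount

# Usage: commands and queries never touch the same model.
log: List = []
commands = CommandHandler(log)
commands.open_account("a1")
commands.deposit("a1", 100)

view = BalanceProjection()
for e in log:          # in production this loop would be a Kafka consumer
    view.apply(e)
print(view.balances)   # {'a1': 100}
```

Because the read model is rebuilt purely from the event log, you can scale it out, reshape it, or replay history without touching the write path.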
-
"Important Features of MongoDB" * Document-Based Architecture * Flexible Schema Structure * High Performance * Horizontal Scalability with Sharding Support * Clustering and Automatic Failover * Full-Text Search Capabilities * Built-In Memory Management * MongoDB Query Language (MQL) * JSON and BSON Support * Query Optimization and Indexing * TTL (Time to Live) Index * Aggregation Framework * Data Compression * Data Auditing and Change Streams #softwaredevelopment
-
No. Fluvio is not Apache Kafka wire compatible. This comes up a lot in my conversations, for some legitimate reasons. A mature ecosystem like Kafka has a lot to offer in terms of connectors to integrate a number of systems. And who does not want flexibility and options? But there is a certain beauty in the constraints that force prioritization in a startup. The team at InfinyOn spent a few iterations a couple of years ago building Kafka wire compatibility. Sadly, it was slowing down the progress of Fluvio. So the team archived the project and continued to build Fluvio. And now we have come to a fascinating little break point. In a recent conversation I asked an architect what purpose it would serve if Fluvio had Kafka wire compatibility. The answer made me chuckle. The critical aspects of their flow are:
1. Data integrations write data to Kafka topics.
2. An in-memory key-value database deduplicates traffic.
3. A columnar database materializes aggregates over 1-minute tumbling windows.
They appreciate Kafka Connect because it lets them swap out different systems for 2 and 3, since there are hundreds of options. Where they have problems today: scaling 2 and 3. What they ultimately want is a reliable system that scales reasonably. That's it. Here is where it becomes complicated! All of their workflow already happens in Fluvio, without them needing the in-memory database, the columnar database, or Kafka for that matter. They even draw the diagram themselves saying they expect Fluvio to do all of this! But it would be great to have Kafka wire compatibility. 😇 (***And we would revisit this after we launch our Stateful Data Flows.) For me, the product problem pattern is: making nicotine chewing gum so that folks can stop smoking! 😂
-
What is up? ✌ Let us talk about distributed caches and how they handle data consistency. 🤳
Distributed caches are systems that store data across multiple nodes in a network, enhancing performance and scalability for applications. They reduce latency by caching frequently accessed data and alleviating the load on databases. To manage data consistency, distributed caches employ various strategies:
Replication: Data is replicated across multiple nodes, ensuring redundancy and fault tolerance. Changes made to one node are propagated to others, maintaining consistency.
Consensus Algorithms: Distributed caches use consensus algorithms like Paxos or Raft to ensure that updates are committed in a consistent order across nodes.
Conflict Resolution: When conflicts occur due to concurrent updates, distributed caches use conflict resolution mechanisms to reconcile conflicting changes and maintain consistency.
In addition to managing data consistency, distributed caches often use a Least Recently Used (LRU) eviction policy to optimize cache space usage. With LRU, the cache removes the least recently accessed entry when it reaches its capacity limit, making room for new data. This prioritizes frequently accessed data and keeps cache utilization efficient. 💻
Example: ✍ One popular example of a distributed cache is Apache Ignite. Apache Ignite is an in-memory computing platform that provides distributed caching capabilities. It allows users to store data in memory across multiple nodes in a cluster, providing high availability and scalability. Apache Ignite manages data consistency through replication and distributed transactions, ensuring that updates are applied consistently across all nodes in the cluster. It also supports conflict resolution mechanisms to handle concurrent updates and maintain data integrity. Overall, Apache Ignite demonstrates how distributed caches can effectively manage data consistency in distributed environments. #learningprogress #contributions
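As a quick illustration of the LRU policy mentioned above, here is a minimal single-node toy in Python built on OrderedDict. It is my own sketch of the eviction idea, not Apache Ignite's actual implementation:

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value) -> None:
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # drop the least recently used entry

# Usage
cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # "a" is now most recently used
cache.put("c", 3)     # evicts "b"
print(list(cache.items.keys()))  # ['a', 'c']
```

A distributed cache applies the same policy per node (or per partition), on top of the replication and consensus machinery described above.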
-
Exploring Apache Kafka: A Beginner's Guide to Real-Time Data Streaming
I recently delved into Apache Kafka, a powerful platform for handling real-time data streams. Kafka's distributed architecture of topics, partitions, producers, and consumers allows seamless scalability and fault tolerance.
Key Features:
Event-Driven: Kafka supports event-driven architectures, enabling real-time data processing and seamless integration between applications.
Scalability: With partitioning and replication, Kafka ensures data scalability and reliability across clusters of brokers.
Use Cases: It's used for real-time analytics, log aggregation, and integrating diverse data sources in data pipelines.
Getting Started: Start by setting up a Kafka cluster, understanding producers and consumers, and exploring its capabilities for real-time analytics and data integration.
Conclusion: Apache Kafka is pivotal in modern data architectures, empowering businesses with real-time data insights and operational efficiency.
Are you exploring Kafka too? Share your experiences!
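For anyone getting started, here is a minimal hello-world sketch using the kafka-python client. The library choice, broker address, and topic name are my own assumptions for illustration:

```python
from kafka import KafkaProducer, KafkaConsumer

BROKER = "localhost:9092"   # assumed local broker
TOPIC = "demo-events"       # hypothetical topic name

# Producer: publish a few messages to the topic.
producer = KafkaProducer(bootstrap_servers=BROKER)
for i in range(3):
    producer.send(TOPIC, value=f"event-{i}".encode("utf-8"))
producer.flush()

# Consumer: read the messages back from the beginning of the topic.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,   # stop iterating if no new messages arrive
)
for message in consumer:
    print(message.topic, message.partition, message.offset, message.value)
```

Even this toy example surfaces the core concepts: the topic is the durable log, the producer appends to it, and the consumer reads at its own pace from a tracked offset.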
-
Stream processing has transformed how data is managed in real-time applications. Among the leading technologies in this field, Apache Flink and ksqlDB shine with their distinct features and advantages. For data engineers and developers tackling the challenges of stream processing, mastering these tools is essential.
Apache Flink is a powerful and versatile open-source framework and distributed processing engine designed for real-time stream and batch data processing. It excels in handling large-scale, high-throughput, and low-latency data processing tasks, making it a preferred choice for many modern data-driven applications.
On the other hand, ksqlDB is a powerful, open-source event streaming database built on top of Apache Kafka. It provides a SQL-based interface for processing, transforming, and querying real-time data streams. By simplifying the complexities of working with Kafka, ksqlDB enables developers to build event-driven applications with minimal effort. Additionally, ksqlDB is tightly integrated with Kafka, leveraging its distributed architecture for scalability, fault tolerance, and durability. It directly reads and writes to Kafka topics, acting as a natural extension of the Kafka ecosystem.
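To give a feel for the Flink side, here is a tiny PyFlink DataStream sketch. It is my own toy example, not from the post: it assumes a local PyFlink installation and uses a bounded in-memory collection in place of a Kafka source.

```python
from pyflink.datastream import StreamExecutionEnvironment

# Local, bounded example just to show the flavor of the DataStream API;
# a real job would read from Kafka and run on a Flink cluster.
env = StreamExecutionEnvironment.get_execution_environment()

counts = (
    env.from_collection([("clicks", 1), ("views", 1), ("clicks", 1)])
       .key_by(lambda record: record[0])                 # group by event type
       .reduce(lambda a, b: (a[0], a[1] + b[1]))         # running count per key
)
counts.print()

env.execute("per-key-count-sketch")   # hypothetical job name
```

The ksqlDB equivalent of this logic would be a continuous SQL aggregation over a Kafka topic, expressed as a query rather than as application code.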
-
Apache Cassandra 5.0 is now Generally Available (GA)! 🎉 This major release significantly improves performance, usability, and capabilities for the world's most powerful distributed database. Key features include:
- Storage Attached Indexes (SAI) for improved query flexibility
- Trie Memtables and SSTables for enhanced efficiency
- JDK 17 support for better performance
- Unified Compaction Strategy for improved node density
- Vector Search capabilities for AI applications
- Dynamic Data Masking for improved security
🔗 and more at: https://lnkd.in/e3k35MTK
⚠ As a reminder, the Cassandra 5.0 launch marks the end of the 3.x series. Users are encouraged to plan their upgrade strategy soon.
🔗 Blog: https://lnkd.in/eJT-VsuF
⬇ Download: https://lnkd.in/dU__DRmQ
🤝 Learn more about Cassandra 5.0 at CommunityOverCode in Denver, October 7-10, 2024. https://lnkd.in/gpY6fUkN
-
In recent years, the landscape of data systems has evolved significantly. Traditional categories like databases, queues, and caches, once considered distinct, are now intermingling, offering hybrid solutions that blur the lines between them. This evolution arises from the emergence of new tools designed to cater to diverse use cases. For instance, Redis functions as a datastore and a message queue, while Apache Kafka offers durability akin to databases. This convergence is driven by the complex requirements of modern applications, which demand more than a single tool for efficient data processing and storage. Consequently, developers now often decompose tasks to leverage the strengths of various tools, integrating them through application code to meet specific needs effectively. As a result, data systems now encompass a broad spectrum of functionalities, reflecting the dynamic demands of contemporary applications and highlighting the need for adaptable, multi-faceted solutions.
-
We offer incredible speed in an extremely simple Java embedded database, with REST clients as well. Create an Entity-Attribute-Value structure for impressive nesting of data. Easily create LOBs, CLOBs, trees, documents, inversions, maps, sets, and tuples. Build a distributed network for any structure, whether within a domain, a corporation, or a specialized organization. Alternatively, create a stand-alone system with encryption of the database file. Try it out on AWS. Read more here:
Our Unique Boiler Bay Single-File Database Engine is the Core of our Products
link.medium.com