Apache Cassandra 5.0 has reached General Availability (GA)! 🎉 This major release significantly improves the performance, usability, and capabilities of the world's most powerful distributed database. Key features include:
- Storage Attached Indexes (SAI) for improved query flexibility
- Trie Memtables and SSTables for enhanced efficiency
- JDK 17 support for better performance
- Unified Compaction Strategy for improved node density
- Vector Search capabilities for AI applications
- Dynamic Data Masking for improved security
🔗 And more at: https://lnkd.in/e3k35MTK
⚠ As a reminder, the Cassandra 5.0 launch marks the end of the 3.x series. Users are encouraged to plan their upgrade strategy soon.
🔗 Blog: https://lnkd.in/eJT-VsuF
⬇ Download: https://lnkd.in/dU__DRmQ
🤝 Learn more about Cassandra 5.0 at CommunityOverCode in Denver, October 7-10, 2024. https://lnkd.in/gpY6fUkN
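For a taste of two of those features together (SAI plus vector search), here's a minimal sketch using the DataStax Python driver against a local node. The keyspace, table, column names, and embedding size are made up for illustration, and the index/ANN syntax reflects my reading of the 5.0 CQL, not an official example:

```python
# Minimal sketch: trying Cassandra 5.0's SAI and vector search from Python.
# Assumes a local node and the `cassandra-driver` package; keyspace, table,
# and column names are hypothetical.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.products (
        id int PRIMARY KEY,
        category text,
        embedding vector<float, 3>
    )
""")

# SAI indexes allow filtering on non-key columns and power ANN queries.
session.execute(
    "CREATE INDEX IF NOT EXISTS category_idx ON demo.products (category) USING 'sai'"
)
session.execute(
    "CREATE INDEX IF NOT EXISTS embedding_idx ON demo.products (embedding) USING 'sai'"
)

session.execute(
    "INSERT INTO demo.products (id, category, embedding) "
    "VALUES (1, 'shoes', [0.1, 0.2, 0.3])"
)

# Approximate-nearest-neighbor query over the vector column.
rows = session.execute(
    "SELECT id, category FROM demo.products "
    "ORDER BY embedding ANN OF [0.1, 0.2, 0.3] LIMIT 5"
)
for row in rows:
    print(row.id, row.category)
```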
More Relevant Posts
-
Let's make this 8 databases in 8 weeks and add Kùzu Inc. to the list 😉👇🏽 If I had all the time in the world, my pick from each of the categories to go DEEP into would be:
- Kùzu (duh)
- Postgres
- Meilisearch
- DuckDB
- DataFusion CLI
- InfluxDB
- TigerBeetle
- RedPanda
It's funny how all the databases I love working with are written in C, C++, Rust or Zig 🙃 #kuzu
7 databases in 7 weeks. Cool idea! :) I think you could get a bit more diverse than this, though! Pick one from each category:
1. Mainstream SQL (PostgreSQL, MySQL, SQLite)
2. New SQL (Cockroach, TiDB, Yugabyte, OceanBase)
3. Document (Mongo, Elasticsearch, FaunaDB, Meilisearch, Quickwit)
4. Analytics (DuckDB, ClickHouse, DataFusion CLI, Databend)
5. Time series (Timescale, QuestDB, Influx, VictoriaMetrics)
6. Sharded, distributed key-value (FoundationDB, TiKV)
7. Non-sharded, distributed (etcd, rqlite, TigerBeetle, ZooKeeper)
8. Embedded key-value (RocksDB, BerkeleyDB, LMDB)
9. Streaming (Kafka, RabbitMQ, RedPanda, RisingWave)
https://lnkd.in/efVca3DQ
-
CAP Theorem? Let's go! It states that it is impossible for a distributed system to guarantee all three of the following at once:
- Consistency
- Availability
- Partition tolerance (tolerating the failure of part of the system)
Whichever guarantee a system gives up shapes its nature. An example? A key difference between relational and non-relational (I mean NoSQL) databases is consistency. Remember that a relational database enforces the ACID properties (Atomicity, Consistency, Isolation, Durability)? NoSQL databases, on the other hand, follow the BASE model (Basically Available, Soft state, Eventually consistent). Due to their distributed nature, NoSQL databases sacrifice strong consistency for availability. They guarantee that, eventually, if no new updates are made to a data object, all reads of that object will return its latest version. So NoSQL databases choose to scale horizontally (easy replication, and replication enables availability) while relational databases tend to scale vertically (which makes strong consistency easier to enforce). I will share more on this once I have explored the topic further. ——— Meanwhile, anticipate my first post on API integration for front-end engineers tomorrow.
-
Building a distributed lock management service can be a very challenging problem! A lock manager serves locks to multiple concurrent requests trying to get control (a lock) over a shared resource. For example: multiple processes trying to update a central table, where we want one and only one process to have complete control over the table at any point in time. Relational databases do guarantee ACID transactions; however, if a huge number of concurrent requests hit your application, that guarantee alone won't be enough, since most databases run at an isolation level weaker than Serializable (often Read Committed), and requests can end up with stale or conflicting reads. What we really need here is not just atomicity between requests but pessimistic concurrency control. We essentially need our service to grant the lock to only one request at a time, and to do this very, very fast! NoSQL databases like DynamoDB are well suited to this use case because:
1) They scale horizontally to virtually unlimited concurrent requests.
2) They provide features such as transaction APIs and conditional updates that let you truly serve only one request at a time. This helps in building semaphores!
3) They offer single-digit-millisecond response times.
I'll be writing a detailed article on this exact implementation soon! #dataengineering #systemdesign
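The conditional update is the core trick here. Below is a minimal sketch of lock acquisition via a DynamoDB conditional write, assuming boto3 and a hypothetical `locks` table keyed on `lock_id`; it's an illustration of the pattern, not the article's implementation, and expiry handling is simplified:

```python
# Minimal sketch: acquiring a lock with a DynamoDB conditional write.
# Assumes boto3 and a hypothetical `locks` table keyed on `lock_id`.
import time
import uuid

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("locks")

def acquire_lock(lock_id: str, ttl_seconds: int = 30) -> str | None:
    """Try to take the lock; return an owner token on success, None if held."""
    owner = str(uuid.uuid4())
    try:
        table.put_item(
            Item={
                "lock_id": lock_id,
                "owner": owner,
                "expires_at": int(time.time()) + ttl_seconds,
            },
            # The write succeeds only if no item with this key exists,
            # so exactly one concurrent caller wins.
            ConditionExpression="attribute_not_exists(lock_id)",
        )
        return owner
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return None  # someone else holds the lock
        raise

def release_lock(lock_id: str, owner: str) -> None:
    """Release only if we still own the lock (guards against stale owners)."""
    table.delete_item(
        Key={"lock_id": lock_id},
        ConditionExpression="#o = :owner",
        ExpressionAttributeNames={"#o": "owner"},
        ExpressionAttributeValues={":owner": owner},
    )
```

Because every contender issues the same conditional write, the "exactly one winner" property comes from DynamoDB itself rather than from any coordination in the application.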
-
Mike Stonebraker, the Turing Award winner who created the Ingres and Postgres databases, has a new project (in his 80s, btw, but it's never too late): DBOS, or Database Operating System, puts the database at the center of the software stack, reducing the operating system to a small kernel of low-level functions. The idea, apparently, is to run everything within the OS using SQL queries. Can someone inform Mr. Stonebraker that the database is just a detail and we shouldn't use SQL at all, you know, because, you know, there's no way to prevent injection attacks... And also, those kinds of databases were invented in the '70s to save storage space, duh. Can't help myself. Thank you. https://lnkd.in/dYcJz9K4
New startup from Postgres creator puts the database at heart of software stack | TechCrunch
https://techcrunch.com
-
Add MongoDB operators like $lookup, $unionWith, and $match to your wordlist for testing. Any errors or hits on these might hint at a potential NoSQL injection. Shout out to Soroush Dalili for this research!
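As an illustration of the idea (my own sketch, not from the original research): one way to probe a JSON parameter is to swap a plain value for an operator-bearing object and watch for errors or behavior changes. The endpoint and parameter names below are hypothetical:

```python
# Minimal sketch: probing a JSON parameter with MongoDB operators to spot
# potential NoSQL injection. The URL and parameter name are hypothetical;
# only test systems you are authorized to assess.
import requests

TARGET = "https://example.com/api/search"  # hypothetical endpoint
PAYLOADS = [
    {"username": {"$ne": None}},        # classic operator-injection probe
    {"username": {"$regex": ".*"}},     # regex match-all probe
    {"$match": {"username": "admin"}},  # aggregation-stage keywords
    {"$lookup": {}},
    {"$unionWith": "users"},
]

baseline = requests.post(TARGET, json={"username": "admin"}, timeout=10)

for payload in PAYLOADS:
    resp = requests.post(TARGET, json=payload, timeout=10)
    # A 500, a parser error message, or a very different response size
    # compared to the baseline can all hint that operators were parsed.
    if resp.status_code != baseline.status_code or len(resp.text) != len(baseline.text):
        print("possible hit:", payload, resp.status_code)
```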
-
NoSQL databases play a crucial role in application success. Let's delve into a comparative overview of four primary types:
🔹 Document databases like MongoDB offer flexibility with JSON-like documents, ideal for content management and e-commerce, providing scalability and a schema-less structure.
🔹 Key-value databases such as Redis use simple key-value pairs for caching and session management, excelling in high performance with a straightforward data model.
🔹 Graph databases like Neo4j focus on nodes and relationships, perfect for social networks and fraud detection, handling complex relationships efficiently.
🔹 Vector databases like Pinecone utilize numerical representations for tasks like image search and recommendation systems, excelling in efficient similarity search.
Key considerations include data complexity, query patterns, performance needs, and scalability for informed decision-making. #database #nosql #dataengineering #datascience #technology #coding #development
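To make the key-value case concrete, here's a minimal caching sketch with redis-py; the key names, TTL, and stubbed database loader are purely illustrative:

```python
# Minimal sketch: the key-value pattern (read-through caching) with redis-py.
# Assumes a local Redis instance; key names and TTL are illustrative.
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def load_profile_from_primary_db(user_id: int) -> dict:
    # Stand-in for a real primary-database query.
    return {"id": user_id, "name": "example"}

def get_user_profile(user_id: int) -> dict:
    cache_key = f"user:{user_id}:profile"
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database round trip
    profile = load_profile_from_primary_db(user_id)
    r.setex(cache_key, 300, json.dumps(profile))  # cache for 5 minutes
    return profile
```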
-
I have completed the 'NoSQL Concepts' course on DataCamp by Miriam Antona, a software engineer. This course enhanced my understanding of NoSQL databases, including key concepts and their applications in modern software engineering.
Vamshi Krishna Bairoju - vb0465's Statement of Accomplishment | DataCamp
datacamp.com
-
There are so-called «narrow» and «wide» transformations in Apache Spark. Briefly, wide transformations are those in which Spark needs to collect all data belonging to one group into one partition, and so performs a data exchange, aka a shuffle. Any grouping or windowing transformation is wide, because to compute a group summary Spark needs to move all of the group's data into one partition. Narrow transformations, on the contrary, don't need data from other partitions; «filter» and «map» are examples. The key point for this topic: narrow transformations preserve existing partitions during execution; they don't shuffle data. But there is a special case, the «union» transformation. It doesn't produce a shuffle, but it «stacks» the source DataFrames. So the result DataFrame is made up of the sources' partitions, and its partition count equals the sum of the sources' partition counts. This effect can matter for the performance of further transformations. It also directly affects the number of files written to downstream storage. https://lnkd.in/g8VRq8qD
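A minimal PySpark sketch of the effect described above (the partition counts are illustrative):

```python
# Minimal sketch: `union` stacks partitions instead of shuffling.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("union-partitions").getOrCreate()

df1 = spark.range(1000).repartition(4)
df2 = spark.range(1000).repartition(6)

unioned = df1.union(df2)

# Narrow: no shuffle happened, so partitions are simply concatenated.
print(df1.rdd.getNumPartitions())      # 4
print(df2.rdd.getNumPartitions())      # 6
print(unioned.rdd.getNumPartitions())  # 10 = 4 + 6

# Writing `unioned` now produces ~10 files; coalesce or repartition first
# if fewer output files are wanted.
```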
-
Like all other data practitioners, I was intrigued by the open sourcing of #Databricks' #Unitycatalog. I cloned the repository, dove into the code, and wrote a blog post about what I found. Mostly I compared Unity Catalog to Apache Ranger, its closest open-source competitor from my point of view. tl;dr: I cannot call it a production-ready solution; it's more like an MVP. No Ranger-like access-audit features, currently no AI/ML governance, no external persistent RDBMS support (only a hardcoded path in /etc to an H2 database file), etc. My concern is also that Unity Catalog does not appear to support Hive-style partitioning. Nevertheless, it looks very promising. For example, there is nothing close to the Functions API in Apache Ranger. Thanks to #Databricks for following an open-source path! It was a cool experience to dive into the Unity code! Check the full blog post: https://lnkd.in/dMKe49kS
Unitycatalog: the first look
semyonsinchenko.github.io
-
There's a misconception that Apache Flink is only for real-time data needs. In reality, Flink offers a whole new approach to data processing that can significantly improve efficiency and reduce costs. It's a mindset shift that extends beyond just handling real-time data. Let's explore the difference between traditional processing and Flink's streaming approach. Imagine a scenario where you need to enrich two sets of data by joining them. In the old-school world, a join only happens IF matching data exists at query time: data either gets enriched or it doesn't. Flink, however, takes a different route. It continuously joins data (stream or batch) WHEN a match is found. This has a profound impact on data freshness and dependency handling. But more importantly, it opens doors to performance and cost optimization. Batch joins are like one-time state checks, while stream joins are like constantly building and querying a state, a powerful yet not always intuitive concept. Understanding how data behaves in a complex Apache Flink job can be challenging, since none of the players (data, state, and time) is steady. This is where tools like Datorios come in. By enabling step-by-step visualization of how data and state evolve, they make developing and testing your Flink jobs easier and unlock the full potential of stream processing.
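To make the continuous-join idea concrete, here's a minimal sketch using PyFlink's SQL API. The table names and schemas are invented, and the datagen/print connectors just keep it self-contained; Flink holds both join sides in state and emits a result WHEN a match arrives:

```python
# Minimal sketch: a continuous (streaming) join in PyFlink SQL.
# Table names and schemas are hypothetical; datagen/print connectors
# keep the example self-contained.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE orders (
        order_id BIGINT,
        customer_id BIGINT
    ) WITH (
        'connector' = 'datagen',
        'fields.customer_id.min' = '1',
        'fields.customer_id.max' = '10',
        'rows-per-second' = '5'
    )
""")
t_env.execute_sql("""
    CREATE TABLE customers (
        customer_id BIGINT,
        segment STRING
    ) WITH (
        'connector' = 'datagen',
        'fields.customer_id.min' = '1',
        'fields.customer_id.max' = '10',
        'rows-per-second' = '1'
    )
""")
t_env.execute_sql("""
    CREATE TABLE enriched (
        order_id BIGINT,
        segment STRING
    ) WITH ('connector' = 'print')
""")

# A regular streaming join: both sides are kept in Flink state, and an
# enriched row is emitted whenever a match shows up on either side.
t_env.execute_sql("""
    INSERT INTO enriched
    SELECT o.order_id, c.segment
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
""").wait()
```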