Apache Cassandra 5.0 has reached General Availability (GA)! 🎉 This major release significantly improves the performance, usability, and capabilities of the world's most powerful distributed database. Key features include:
- Storage Attached Indexes (SAI) for improved query flexibility
- Trie Memtables and SSTables for enhanced efficiency
- JDK 17 support for better performance
- Unified Compaction Strategy for improved node density
- Vector Search capabilities for AI applications
- Dynamic Data Masking for improved security
🔗 And more at: https://lnkd.in/e3k35MTK
⚠ As a reminder, the Cassandra 5.0 launch marks the end of the 3.x series. Users are encouraged to plan their upgrade strategy soon.
🔗 Blog: https://lnkd.in/eJT-VsuF
⬇ Download: https://lnkd.in/dU__DRmQ
🤝 Learn more about Cassandra 5.0 at CommunityOverCode in Denver, October 7-10, 2024. https://lnkd.in/gpY6fUkN
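For a taste of two of those features together (SAI plus vector search), here's a minimal sketch using the DataStax Python driver against a local node. The keyspace, table, column names, and embedding size are made up for illustration, and the index/ANN syntax reflects my reading of the 5.0 CQL, not an official example:

```python
# Minimal sketch: trying Cassandra 5.0's SAI and vector search from Python.
# Assumes a local node and the `cassandra-driver` package; keyspace, table,
# and column names are hypothetical.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.products (
        id int PRIMARY KEY,
        category text,
        embedding vector<float, 3>
    )
""")

# SAI indexes allow filtering on non-key columns and power ANN queries.
session.execute(
    "CREATE INDEX IF NOT EXISTS category_idx ON demo.products (category) USING 'sai'"
)
session.execute(
    "CREATE INDEX IF NOT EXISTS embedding_idx ON demo.products (embedding) USING 'sai'"
)

session.execute(
    "INSERT INTO demo.products (id, category, embedding) "
    "VALUES (1, 'shoes', [0.1, 0.2, 0.3])"
)

# Approximate-nearest-neighbor query over the vector column.
rows = session.execute(
    "SELECT id, category FROM demo.products "
    "ORDER BY embedding ANN OF [0.1, 0.2, 0.3] LIMIT 5"
)
for row in rows:
    print(row.id, row.category)
```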
More Relevant Posts
-
Let's make this 8 databases in 8 weeks and add Kùzu Inc. to the list 😉👇🏽 If I had all the time in the world, my pick from each of the categories to go DEEP into would be:
- Kùzu (duh)
- Postgres
- Meilisearch
- DuckDB
- DataFusion CLI
- InfluxDB
- TigerBeetle
- RedPanda
It's funny how all the databases I love working with are written in C, C++, Rust or Zig 🙃 #kuzu
7 databases in 7 weeks. Cool idea! :) I think you could get a bit more diverse than this, though! Pick one from each category:
1. Mainstream SQL (PostgreSQL, MySQL, SQLite)
2. New SQL (Cockroach, TiDB, Yugabyte, OceanBase)
3. Document (Mongo, Elasticsearch, FaunaDB, Meilisearch, Quickwit)
4. Analytics (DuckDB, ClickHouse, DataFusion CLI, Databend)
5. Time series (Timescale, QuestDB, Influx, VictoriaMetrics)
6. Sharded, distributed key-value (FoundationDB, TiKV)
7. Non-sharded, distributed (etcd, rqlite, TigerBeetle, ZooKeeper)
8. Embedded key-value (RocksDB, BerkeleyDB, LMDB)
9. Streaming (Kafka, RabbitMQ, RedPanda, RisingWave)
https://lnkd.in/efVca3DQ
-
CAP Theorem? Let's go! It states that it is impossible for a distributed system to guarantee all three of the following at once:
- Consistency
- Availability
- Partition tolerance (tolerating the failure of part of the system)
Whichever guarantee a system gives up shapes its nature. An example? A key difference between relational and non-relational (I mean NoSQL) databases is consistency. Remember that a relational database enforces the ACID properties (Atomicity, Consistency, Isolation, Durability)? NoSQL databases, on the other hand, follow the BASE model (Basically Available, Soft state, Eventually consistent). Due to their distributed nature, NoSQL databases sacrifice strong consistency for availability. They guarantee that, eventually, if no new updates are made to a data object, all reads of that object will return its latest version. So NoSQL databases choose to scale horizontally (easy replication, and replication enables availability) while relational databases tend to scale vertically (which makes strong consistency easier to enforce). I will share more on this once I have explored the topic further. ——— Meanwhile, anticipate my first post on API integration for front-end engineers tomorrow.
-
Building a distributed lock management service can be a very challenging problem! A lock manager serves locks to multiple concurrent requests trying to get control (a lock) over a shared resource. For example: multiple processes trying to update a central table, where we want one and only one process to have complete control over the table at any point in time. Relational databases do guarantee ACID transactions; however, if a huge number of concurrent requests hit your application, that guarantee alone won't be enough, since most databases run at an isolation level weaker than Serializable (often Read Committed), and requests can end up with stale or conflicting reads. What we really need here is not just atomicity between requests but pessimistic concurrency control. We essentially need our service to grant the lock to only one request at a time, and to do this very, very fast! NoSQL databases like DynamoDB are well suited to this use case because:
1) They scale horizontally to virtually unlimited concurrent requests.
2) They provide features such as transaction APIs and conditional updates that let you truly serve only one request at a time. This helps in building semaphores!
3) They offer single-digit-millisecond response times.
I'll be writing a detailed article on this exact implementation soon! #dataengineering #systemdesign
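The conditional update is the core trick here. Below is a minimal sketch of lock acquisition via a DynamoDB conditional write, assuming boto3 and a hypothetical `locks` table keyed on `lock_id`; it's an illustration of the pattern, not the article's implementation, and expiry handling is simplified:

```python
# Minimal sketch: acquiring a lock with a DynamoDB conditional write.
# Assumes boto3 and a hypothetical `locks` table keyed on `lock_id`.
import time
import uuid

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("locks")

def acquire_lock(lock_id: str, ttl_seconds: int = 30) -> str | None:
    """Try to take the lock; return an owner token on success, None if held."""
    owner = str(uuid.uuid4())
    try:
        table.put_item(
            Item={
                "lock_id": lock_id,
                "owner": owner,
                "expires_at": int(time.time()) + ttl_seconds,
            },
            # The write succeeds only if no item with this key exists,
            # so exactly one concurrent caller wins.
            ConditionExpression="attribute_not_exists(lock_id)",
        )
        return owner
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return None  # someone else holds the lock
        raise

def release_lock(lock_id: str, owner: str) -> None:
    """Release only if we still own the lock (guards against stale owners)."""
    table.delete_item(
        Key={"lock_id": lock_id},
        ConditionExpression="#o = :owner",
        ExpressionAttributeNames={"#o": "owner"},
        ExpressionAttributeValues={":owner": owner},
    )
```

Because every contender issues the same conditional write, the "exactly one winner" property comes from DynamoDB itself rather than from any coordination in the application.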
-
Mike Stonebraker, the Turing Award winner who created the Ingres and Postgres databases, has a new project (in his 80s, btw, but it's never too late): DBOS, or Database Operating System, puts the database at the center of the software stack, reducing the operating system to a small kernel of low-level functions. The idea, apparently, is to run everything within the OS using SQL queries. Can someone inform Mr. Stonebraker that the database is just a detail and we shouldn't use SQL at all, you know, because, you know, there's no way to prevent injection attacks... And also, those kinds of databases were invented in the '70s to save storage space, duh. Can't help myself. Thank you. https://lnkd.in/dYcJz9K4
New startup from Postgres creator puts the database at heart of software stack | TechCrunch
https://techcrunch.com
-
Add MongoDB operators like $lookup, $unionWith, and $match to your wordlist for testing. Any errors or hits on these might hint at a potential NoSQL injection. Shout out to Soroush Dalili for this research!
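As an illustration of the idea (my own sketch, not from the original research): one way to probe a JSON parameter is to swap a plain value for an operator-bearing object and watch for errors or behavior changes. The endpoint and parameter names below are hypothetical:

```python
# Minimal sketch: probing a JSON parameter with MongoDB operators to spot
# potential NoSQL injection. The URL and parameter name are hypothetical;
# only test systems you are authorized to assess.
import requests

TARGET = "https://example.com/api/search"  # hypothetical endpoint
PAYLOADS = [
    {"username": {"$ne": None}},        # classic operator-injection probe
    {"username": {"$regex": ".*"}},     # regex match-all probe
    {"$match": {"username": "admin"}},  # aggregation-stage keywords
    {"$lookup": {}},
    {"$unionWith": "users"},
]

baseline = requests.post(TARGET, json={"username": "admin"}, timeout=10)

for payload in PAYLOADS:
    resp = requests.post(TARGET, json=payload, timeout=10)
    # A 500, a parser error message, or a very different response size
    # compared to the baseline can all hint that operators were parsed.
    if resp.status_code != baseline.status_code or len(resp.text) != len(baseline.text):
        print("possible hit:", payload, resp.status_code)
```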
-
NoSQL databases play a crucial role in application success. Let's delve into a comparative overview of four primary types:
🔹 Document databases like MongoDB offer flexibility with JSON-like documents, ideal for content management and e-commerce, providing scalability and a schema-less structure.
🔹 Key-value databases such as Redis use simple key-value pairs for caching and session management, excelling in high performance with a straightforward data model.
🔹 Graph databases like Neo4j focus on nodes and relationships, perfect for social networks and fraud detection, handling complex relationships efficiently.
🔹 Vector databases like Pinecone utilize numerical representations for tasks like image search and recommendation systems, excelling in efficient similarity search.
Key considerations include data complexity, query patterns, performance needs, and scalability for informed decision-making. #database #nosql #dataengineering #datascience #technology #coding #development
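To make the key-value case concrete, here's a minimal caching sketch with redis-py; the key names, TTL, and stubbed database loader are purely illustrative:

```python
# Minimal sketch: the key-value pattern (read-through caching) with redis-py.
# Assumes a local Redis instance; key names and TTL are illustrative.
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def load_profile_from_primary_db(user_id: int) -> dict:
    # Stand-in for a real primary-database query.
    return {"id": user_id, "name": "example"}

def get_user_profile(user_id: int) -> dict:
    cache_key = f"user:{user_id}:profile"
    cached = r.get(cache_key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no database round trip
    profile = load_profile_from_primary_db(user_id)
    r.setex(cache_key, 300, json.dumps(profile))  # cache for 5 minutes
    return profile
```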
-
I have completed the 'NoSQL Concepts' course on DataCamp by Miriam Antona, a software engineer. This course enhanced my understanding of NoSQL databases, including key concepts and their applications in modern software engineering.
Vamshi Krishna Bairoju - vb0465's Statement of Accomplishment | DataCamp
datacamp.com
-
There are so-called «narrow» and «wide» transformations in Apache Spark. Briefly, wide transformations are those in which Spark needs to collect all data belonging to one group into one partition, and so performs a data exchange, aka a shuffle. Any grouping or windowing transformation is wide, because to compute a group summary Spark needs to move all of the group's data into one partition. Narrow transformations, on the contrary, don't need data from other partitions; «filter» and «map» are examples. The key point for this topic: narrow transformations preserve existing partitions during execution; they don't shuffle data. But there is a special case, the «union» transformation. It doesn't produce a shuffle, but it «stacks» the source DataFrames. So the result DataFrame is made up of the sources' partitions, and its partition count equals the sum of the sources' partition counts. This effect can matter for the performance of further transformations. It also directly affects the number of files written to downstream storage. https://lnkd.in/g8VRq8qD
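A minimal PySpark sketch of the effect described above (the partition counts are illustrative):

```python
# Minimal sketch: `union` stacks partitions instead of shuffling.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("union-partitions").getOrCreate()

df1 = spark.range(1000).repartition(4)
df2 = spark.range(1000).repartition(6)

unioned = df1.union(df2)

# Narrow: no shuffle happened, so partitions are simply concatenated.
print(df1.rdd.getNumPartitions())      # 4
print(df2.rdd.getNumPartitions())      # 6
print(unioned.rdd.getNumPartitions())  # 10 = 4 + 6

# Writing `unioned` now produces ~10 files; coalesce or repartition first
# if fewer output files are wanted.
```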
-
Like all other data practitioners, I was intrigued by the open sourcing of #Databricks' #Unitycatalog. I cloned the repository, dove into the code, and wrote a blog post about what I found. Mostly I compared Unity Catalog to Apache Ranger, its closest open-source competitor from my point of view. tl;dr: I cannot call it a production-ready solution; it's more like an MVP. No Ranger-like access-audit features, currently no AI/ML governance, no external persistent RDBMS support (only a hardcoded path in /etc to an H2 database file), etc. My concern is also that Unity Catalog does not appear to support Hive-style partitioning. Nevertheless, it looks very promising. For example, there is nothing close to the Functions API in Apache Ranger. Thanks to #Databricks for following an open-source path! It was a cool experience to dive into the Unity code! Check the full blog post: https://lnkd.in/dMKe49kS
Unitycatalog: the first look
semyonsinchenko.github.io
-
There's a misconception that Apache Flink is only for real-time data needs. In reality, Flink offers a whole new approach to data processing that can significantly improve efficiency and reduce costs. It's a mindset shift that extends beyond just handling real-time data. Let's explore the difference between traditional processing and Flink's streaming approach. Imagine a scenario where you need to enrich two sets of data by joining them. In the old-school world, a join only happens IF matching data exists at query time: data either gets enriched or it doesn't. Flink, however, takes a different route. It continuously joins data (stream or batch) WHEN a match is found. This has a profound impact on data freshness and dependency handling. But more importantly, it opens doors to performance and cost optimization. Batch joins are like one-time state checks, while stream joins are like constantly building and querying a state, a powerful yet not always intuitive concept. Understanding how data behaves in a complex Apache Flink job can be challenging, since none of the players (data, state, and time) is steady. This is where tools like Datorios come in. By enabling step-by-step visualization of how data and state evolve, they make developing and testing your Flink jobs easier and unlock the full potential of stream processing.
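To make the continuous-join idea concrete, here's a minimal sketch using PyFlink's SQL API. The table names and schemas are invented, and the datagen/print connectors just keep it self-contained; Flink holds both join sides in state and emits a result WHEN a match arrives:

```python
# Minimal sketch: a continuous (streaming) join in PyFlink SQL.
# Table names and schemas are hypothetical; datagen/print connectors
# keep the example self-contained.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE orders (
        order_id BIGINT,
        customer_id BIGINT
    ) WITH (
        'connector' = 'datagen',
        'fields.customer_id.min' = '1',
        'fields.customer_id.max' = '10',
        'rows-per-second' = '5'
    )
""")
t_env.execute_sql("""
    CREATE TABLE customers (
        customer_id BIGINT,
        segment STRING
    ) WITH (
        'connector' = 'datagen',
        'fields.customer_id.min' = '1',
        'fields.customer_id.max' = '10',
        'rows-per-second' = '1'
    )
""")
t_env.execute_sql("""
    CREATE TABLE enriched (
        order_id BIGINT,
        segment STRING
    ) WITH ('connector' = 'print')
""")

# A regular streaming join: both sides are kept in Flink state, and an
# enriched row is emitted whenever a match shows up on either side.
t_env.execute_sql("""
    INSERT INTO enriched
    SELECT o.order_id, c.segment
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
""").wait()
```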