Raghvendra Yadav’s Post

Software Engineer @StarTree.ai | Empowering real-time decisions with the speed of Apache Pinot with StarTree.

🤔 To use Mmap or Not? That's a hotly debated topic in the #DATABASE community!

📍 What is Mmap?
Mmap (memory-mapped files) is a system call that maps a file into a process's virtual address space, so the process can access the file's contents as if they were in memory. It's like creating a direct window from your application's memory to the file on disk.

✅ Pros of Mmap:
• Simplified code - treat file I/O like memory access
• Zero-copy I/O - potential performance benefits
• Automatic paging by OS
• Great for shared memory between processes
• Enables lock-free operations using atomic instructions

❌ Cons of Mmap:
• Less control over I/O patterns
• Page faults can cause unexpected stalls
• Can perform poorly with default OS settings
• Harder to handle errors (segfaults vs. explicit errors)
• Complex interaction with OS page cache

🔄 Alternatives:
• Direct I/O
• Buffered I/O with explicit buffer management
• Custom page cache implementations
• Async I/O systems (like io_uring)

🗄️ Who Uses What?
Using Mmap:
• LMDB
• Apache Cassandra
• Apache Pinot
• RocksDB (for certain operations)

Not Using Mmap:
• InfluxData https://lnkd.in/geNM4Qcs
• Oracle MySQL https://lnkd.in/gt2t2wMh

🤔 How to Decide?
Consider mmap if:
• You need simple shared memory between processes
• Your workload is mostly read-heavy
• You can tune OS parameters appropriately
• You understand the complexity of virtual memory management

Avoid mmap if:
• You need precise control over I/O
• Your workload is write-heavy
• You can't afford unexpected stalls
• You need predictable performance

On one side, we have Andy Pavlo's famous paper "Are You Sure You Want to Use MMAP in Your Database Management System?" (https://lnkd.in/gCyhiqKa).
On the other side, Howard Chu (LMDB creator) strongly advocates for mmap, offering compelling counterarguments in his response: https://lnkd.in/gp9ScUy9

The debate shows how complex this technical choice really is!

#databases #programming #technology #softwareengineering #backend

What's your experience with mmap in production systems? Let's discuss in the comments! 👇
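
To make the "treat file I/O like memory access" point concrete, here is a minimal sketch in Python that reads the same bytes two ways: through a memory mapping and through an explicit seek-and-read. The function names and the demo file are hypothetical and only for illustration. It is not how any of the databases mentioned above implement their storage layer.

```python
import mmap
import os
import tempfile


def read_with_mmap(path: str, offset: int, length: int) -> bytes:
    """Read bytes by mapping the file and slicing it like a byte array."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # No explicit read call here: the OS pages data in on access.
            return mapped[offset:offset + length]


def read_with_explicit_io(path: str, offset: int, length: int) -> bytes:
    """Read the same bytes with an explicit seek and read."""
    with open(path, "rb") as f:
        # The application decides exactly when and how much I/O happens.
        f.seek(offset)
        return f.read(length)


def make_demo_file() -> str:
    """Create a small temporary file (hypothetical demo data)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"0123456789" * 1000)
    return path


if __name__ == "__main__":
    demo_path = make_demo_file()
    try:
        via_mmap = read_with_mmap(demo_path, 10, 5)
        via_read = read_with_explicit_io(demo_path, 10, 5)
        print(via_mmap, via_read, via_mmap == via_read)
    finally:
        os.remove(demo_path)
```

Both functions return the same bytes. The difference is where the I/O happens: with the mapping it is hidden behind a memory access (and a possible page fault), while the explicit version makes the read, and any error it raises, visible at the call site.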

Are You Sure You Want to Use MMAP in Your DBMS?

symas.com

Robert Zych

Staff Software Engineer

1w

Great topic! I’ve discussed this briefly with a couple of other Pinot contributors and the opinions were mixed but mostly for implementing a Buffer Pool Manager (which Andy predicted Pinot would have “in the next few years”). To answer your question, I had a production situation where there was contention and the solution was to host real-time and offline tables on separate servers. I don’t have a strong opinion here, but I do think that cluster planning and management is more important.
