#Icelovers 💙 don't miss a beat of what happened to #Iceberg as we wrap up the year. 2024 was definitely the year that consolidated Apache Iceberg in the #Lakehouse landscape 🧊 Cross-vendor integrations, new features, acquisitions, we all witnessed an avalanche of fast-paced innovation like never before. And much more is coming. Help yourself to the big news that warmed up the weather a little bit this year 💥 Thanks for growing and contributing to this community 🙏 All of you will lead the Lakehouse era in the years ahead! Looking forward to a surprising 2025 🌟
- AWS S3 Tables - native support for Iceberg tables: https://lnkd.in/dEFNnKTN
- Snowflake Expands Partnership with Microsoft to Improve Interoperability Through Apache Iceberg: https://lnkd.in/d93AXBvA
- Dremio Integrates Apache Iceberg REST to Promote Vendor-Agnostic Ecosystem: https://lnkd.in/dZMB7-ve
- Snowflake introduces Polaris Catalog - An Open Source Catalog for Apache Iceberg: https://lnkd.in/dGHDidfD
- Databricks Agrees to Acquire Tabular, the Company Founded by the Original Creators of Apache Iceberg: https://lnkd.in/eFniN9t3
- Confluent Tableflow, Convert Kafka topics to Iceberg tables: https://lnkd.in/dmTBZ_VW
- Cloudera announced integration with Snowflake by extending its Open Data Lakehouse interoperability: https://lnkd.in/dWUNWrRn
This page isn't affiliated with the Apache Iceberg project and doesn't represent PMC opinions. For official news, please check the communication channels provided by the project: https://lnkd.in/dQ76H72K
About us
Apache Iceberg is a cloud-native, open table format for building Open Data Lakehouses. This page isn't affiliated with the Apache Iceberg project and doesn't represent PMC opinions. For official news, please check the communication channels provided by the project: https://iceberg.apache.org/community/
- Website: https://iceberg.apache.org/
- Industry: Software Development
- Company size: 1 employee
- Headquarters: California
- Type: Nonprofit
Locations
- Primary: California, US
Updates
-
[repost Ameena Ansari]
❄️ Iceberg Ahead! 🚢 I just dropped a fresh newsletter diving into Apache Iceberg—the ultimate upgrade for your data lakehouse! Here’s a sneak peek: ✅ File skipping = 3x faster queries (skip the gym, your data’s got it). ✅ WAP workflows—Git for your data (all correct data or none). ✅ Governance, security & the 🍒 on your OLAP cake 🎂. 💌 Subscribe now : https://lnkd.in/emrkZNBx — Ameena 💻✨ #DataEngineering #ApacheIceberg #DataLakehouse 👉 Read here : https://lnkd.in/eXHv7KKH
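For readers curious what the WAP (write-audit-publish) idea teased above looks like in practice, here is a minimal, hypothetical PySpark sketch using Iceberg's branch-based WAP. It assumes a Spark session already configured with the Iceberg runtime and SQL extensions, a catalog named demo, and a table demo.db.orders; the branch name, the audit check, and the landing path are illustrative placeholders, not taken from the newsletter.

```python
# Hypothetical write-audit-publish (WAP) flow on an Iceberg table using a branch.
# Assumes: SparkSession with the Iceberg runtime + SQL extensions and a catalog "demo".
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wap-sketch").getOrCreate()

# 1. WRITE: stage new data on an audit branch instead of main.
spark.sql("ALTER TABLE demo.db.orders CREATE BRANCH IF NOT EXISTS etl_audit")
spark.conf.set("spark.wap.branch", "etl_audit")  # session writes now target the branch
spark.read.parquet("s3://landing/orders/2024-12-31/") \
     .writeTo("demo.db.orders").append()

# 2. AUDIT: run checks against the branch; main is still untouched.
bad_rows = spark.sql(
    "SELECT count(*) AS c FROM demo.db.orders VERSION AS OF 'etl_audit' "
    "WHERE order_id IS NULL"
).first()["c"]

# 3. PUBLISH: only if the audit passes, fast-forward main to the audited branch.
if bad_rows == 0:
    spark.sql("CALL demo.system.fast_forward('db.orders', 'main', 'etl_audit')")
else:
    raise ValueError(f"Audit failed: {bad_rows} rows with NULL order_id; main not updated")
```

Until the fast-forward runs, readers of main never see the staged snapshot, which is the "all correct data or none" guarantee the post refers to.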
-
[repost Pejman Mazaheri]
🚀 Exploring Data Architectures: Data Lake, Warehouse, and Lakehouse 🚀 #DataWarehouse #DataLake #Lakehouse https://lnkd.in/dad5Qn9K
-
Apache Iceberg reposted this
If you haven’t already read my article on future developments to expect in the Apache Iceberg ecosystem, give it a read! https://lnkd.in/eyfRxi63
-
🚨🚨🚨 Breaking news from Las Vegas 🚨🚨🚨 Amazon Web Services (AWS) just announced new native S3 features to support Apache Iceberg and metadata management, making a big move to speed up #Lakehouse adoption in the public cloud. What a year for the Iceberg community 💫 Learn more about this new feature here: https://lnkd.in/dEFNnKTN
-
[repost Javier Ariza Batalloso]
What are Data Lakehouse Catalogs, and why are they so important in today's data platforms?
📍 A data catalog is a repository or inventory of an organization's data assets. Catalogs use metadata (technical, descriptive, operational, or governance-related) to manage data. Large companies today run broad, complex data ecosystems. Some of the capabilities these catalogs should support:
✅ Metadata management.
✅ Data discovery and profiling.
✅ Data lineage.
✅ Governance.
✅ UI/UX.
✅ Access control.
✅ Quality.
✅ Observability.
✅ Integrations.
📍 Over the last two years, many data companies have built or improved their data catalogs. In the Data Lakehouse space, with the arrival of OTFs (Iceberg, Delta Lake, and Hudi), we can classify them as follows:
1️⃣ Open-source data catalogs: community-maintained projects (no vendor lock-in), usually free, and extensible with new functionality. They typically lack official support. e.g.: #ApachePolaris, #ApacheGravitino, #UnityCatalog, #Nessie, #DataHub or #OpenMetadata.
2️⃣ Platform (cloud-based) data catalogs: they provide better distributed access and security and reduce management overhead. They may include additional functionality (usually built on top of the OSS version) and offer different deployment options: managed (Software as a Service), self-deployed, etc. Pricing varies widely. e.g.: #SnowflakeHorizon, #SnowflakeOpenCatalog, #DatabricksUnityCatalog, #FabricOneLakeCatalog, #GoogleDataplexCatalog, #AWSGlueCatalog or #MicrosoftPurview.
🔝 Apache Iceberg is, as of today, the most popular OTF in Data Lakehouses. It supports two kinds of catalogs: file-based catalogs and service-based catalogs.
1️⃣ File-based Catalogs: these work on any file system, whether Hadoop or object storage. They keep a reference to the latest metadata.json. Also known as the Hadoop Catalog.
2️⃣ Service-based Catalogs: backed by a service or server. Query engines obtain the metadata.json file through the service. e.g.: Hive Metastore Catalog, JDBC Catalog, AWS Glue Catalog or Nessie Catalog.
🔥🔥 The Iceberg REST Catalog is becoming the standard REST catalog specification, enabling interoperability across catalogs and platforms. Who supports it? Apache Gravitino, Apache Polaris, Nessie, Unity Catalog, Snowflake, Databricks...
#DataCatalog #DataManagement #IcebergRESTCatalog #DataLakehouses #DigitalTransformation #DataGovernance #DataEngineering
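As a rough illustration of the file-based vs. service-based distinction described above, here is a hedged PySpark configuration sketch wiring both kinds of Iceberg catalog into one Spark session. The catalog names, warehouse path, and REST endpoint URI are placeholders; the property keys follow the Iceberg Spark runtime conventions, but the exact jars and versions depend on your setup.

```python
# Hypothetical Spark session with two Iceberg catalogs side by side:
# a file-based (Hadoop) catalog and a service-based REST catalog.
# Catalog names, paths and the REST URI below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-catalogs-sketch")
    # File-based catalog: the latest metadata.json is tracked on the file system itself.
    .config("spark.sql.catalog.filecat", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.filecat.type", "hadoop")
    .config("spark.sql.catalog.filecat.warehouse", "s3a://my-bucket/warehouse")
    # Service-based catalog: a REST catalog server hands out table metadata locations.
    .config("spark.sql.catalog.restcat", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.restcat.type", "rest")
    .config("spark.sql.catalog.restcat.uri", "https://catalog.example.com/api/catalog")
    .getOrCreate()
)

# The same engine can now read tables registered in either catalog.
spark.sql("SHOW NAMESPACES IN filecat").show()
spark.sql("SHOW NAMESPACES IN restcat").show()
```

The REST option is what makes the interoperability story work: any engine speaking the REST catalog spec can point at the same endpoint and see the same tables.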
-
A Lakehouse is not open just because of its table format; compute services and catalogs must also enable interoperability and openness to a wider, more dynamic data ecosystem. Repost Dipankar Mazumdar, M.Sc 🥑.
Staff Data Engineer Advocate @Onehouse.ai | Apache Hudi, Iceberg Contributor | Author of "Engineering Lakehouses"
Open Lakehouse Architecture - But what's really open?
There's a noticeable shift in how users/customers are now consistently thinking about data architectures. You hear terms like 'Open lakehouse'. But what does "open" really mean? This is not easy to define. Still, there seems to be general agreement on one core idea: data should reside as an 'open and independent' tier, allowing all compatible compute engines to operate on that "single copy" based on workloads.
The most important thing to understand here is that just replacing proprietary storage formats with 'open table formats' doesn't automatically make everything open and interoperable. In reality, customers end up choosing a particular open table format (based on vendor support) while staying tied to proprietary services & tools for things like optimization and maintenance, among others. This confusion is created by the growing use of jargon like "open data lakehouse" and "open table formats". And no, this is not about build vs buy! You can still buy vendor solutions while maintaining an open & interoperable platform. The 'key' is that when new workloads arise, you should be able to integrate other tools or seamlessly switch between compute platforms.
I set out to answer some of the questions that have been on my mind in this blog (in comments). This is from the perspective of having worked with the 3 table formats (Apache Hudi, Apache Iceberg & Delta Lake) for the past couple of years of my career. Some questions that I ask:
✅ What are the differences between an open table format & an open data lakehouse platform?
✅ Is an open table format enough to realize a truly open data architecture?
✅ How seamlessly can we move across different platforms today?
Would love to hear any thoughts! #dataengineering #softwareengineering
-
[repost Starburst]
Apache Iceberg and Trino are revolutionizing enterprise data lake architectures by tackling cost, scalability, and interoperability challenges.
🔑 Key Highlights from SiliconANGLE:
- Iceberg delivers database-like functionality on object stores for seamless, cross-platform analytics.
- Trino, co-created by Starburst's Dain Sundstrom, powers fast, distributed query performance.
- Together, they fuel innovation in modern data stacks like Starburst's Icehouse, a fully managed Iceberg and Trino data lake.
💡 Explore how Starburst is advancing modern data strategies: https://okt.to/JVh6tB (see the query sketch after the article link below) #Trino #Iceberg #DataInnovation #OpenSource
Unlock efficient data processing with Iceberg - SiliconANGLE
siliconangle.com
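To make the Trino + Iceberg pairing above a bit more concrete, here is a small, hypothetical sketch using the trino Python client to query an Iceberg table through a Trino cluster's Iceberg connector. The host, user, catalog, schema, and table names are all placeholders and are not taken from the article.

```python
# Hypothetical query against an Iceberg table exposed through a Trino catalog.
# Host, user, catalog, schema and table names are placeholders.
import trino

conn = trino.dbapi.connect(
    host="trino.example.com",
    port=8080,
    user="analyst",
    catalog="iceberg",   # catalog backed by Trino's Iceberg connector
    schema="sales",
)
cur = conn.cursor()

# Plain ANSI SQL; Trino plans a distributed scan over the Iceberg table's data files.
cur.execute(
    "SELECT region, sum(amount) AS total "
    "FROM orders WHERE order_date >= DATE '2024-01-01' "
    "GROUP BY region ORDER BY total DESC"
)
for region, total in cur.fetchall():
    print(region, total)
```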
-
Great introduction to cloud-based Lakehouses. Repost Olena Yarychevska.
📊 Apache Iceberg Tables in Snowflake! 📊
In this brief presentation, I've compiled key information about Apache Iceberg functionality, including main features, setup, and integrations. You'll find answers to questions like:
- What is Apache Iceberg, and what is its role in data management?
- Key features, such as ACID transactions, schema evolution, and time-based snapshots
- Integrating and setting up Iceberg in Snowflake
- Use cases and limitations of Iceberg
If you're interested in learning more about this tool or its capabilities, there is a cool tutorial: Create your first Apache Iceberg table 👇 https://lnkd.in/dStt-cnJ
#ApacheIceberg #Snowflake #DataLake #DataEngineering #CloudStorage #Tech #DataManagement
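For a rough sense of what working with Iceberg tables from Snowflake can look like, here is a hedged Python sketch using the Snowflake connector. The account, credentials, external volume, and table definition are placeholders, and the exact DDL options vary by account setup; treat it as a sketch and check Snowflake's Iceberg documentation rather than as the presentation's own example.

```python
# Hypothetical sketch: create and query a Snowflake-managed Iceberg table.
# Account, credentials, external volume and names are placeholders; exact DDL
# options may differ by account/version, so verify against Snowflake's docs.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="ANALYTICS_WH", database="DEMO_DB", schema="PUBLIC",
)
cur = conn.cursor()

# Snowflake-managed Iceberg table: data and Iceberg metadata land on the external volume,
# so other engines can read the same files and metadata.
cur.execute("""
    CREATE ICEBERG TABLE IF NOT EXISTS customer_events (
        event_id   STRING,
        user_id    STRING,
        event_time TIMESTAMP_NTZ
    )
    CATALOG = 'SNOWFLAKE'
    EXTERNAL_VOLUME = 'iceberg_ext_vol'
    BASE_LOCATION = 'customer_events/'
""")

# Queried like any other Snowflake table.
cur.execute("SELECT count(*) FROM customer_events")
print(cur.fetchone()[0])
```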
-
Apache Iceberg reposted this
Apache Iceberg is our consensus.
📢 GET READY for the Iceberg Streaming Analytics Meetup on November 21st 🚀 Join RisingWave, MotherDuck and Confluent for an evening packed with the latest on streaming analytics, Apache Iceberg and modern data architectures ⚙️
💡 Agenda:
5:30 PM: Doors open! Grab some food, drink, and mingle
6:00 PM: Kafka Meets Iceberg – Real-time streaming into modern data lakes by Kasun Indrasiri ⚡
6:30 PM: Iceberg + Postgres Protocol – RisingWave's Iceberg power-up by Yingjun Wu 🌊
7:00 PM: When All You Have is a Hammer: Using SQL in Your Data Lake by Jacob Matson ⚒️
7:30 PM: Network and hang out with fellow data nerds!
📍 Location: MotherDuck HQ, 2811 Fairview Ave E, Suite 2000, Seattle
🎟️ Don't miss out! RSVP now 👉 https://lu.ma/1xokjeia
#DataStreaming #StreamProcessing #RealTimeAnalytics #SQL #Kafka