dltHub

Software development

Supporting a new generation of Python users when they create and use data in their organizations

Info

Since 2017, the number of Python users has been increasing by millions annually. The vast majority of these people leverage Python as a tool to solve problems at work. Our mission is to make them autonomous when they create and use data in their organizations. For this end, we are building an open source Python library called data load tool (dlt). Our users use dlt in their Python scripts to turn messy, unstructured data into regularly updated datasets. It empowers them to create highly scalable, easy to maintain, straightforward to deploy data pipelines without having to wait for help from a data engineer. We are dedicated to keeping dlt an open source project surrounded by a vibrant, engaged community. To make this sustainable, dltHub stewards dlt while also offering additional software and services that generate revenue (similar to what GitHub does with Git). dltHub is based in Berlin and New York City. It was founded by data and machine learning veterans. We are backed by Dig Ventures and many technical founders from companies such as Hugging Face, Instana, Matillion, Miro, and Rasa.

Industry
Software development
Size
11–50 employees
Headquarters
Berlin
Type
Privately held
Founded
2022

Locations

Employees at dltHub

Updates

  • View dltHub's company page

    7,165 followers

    A reddit user says: "Shift Left? I Hope So."

    Shift left means detecting and fixing problems earlier in the lifecycle (e.g., during coding rather than in production). In theory it sounds good, but "left" is an actual team, not a concept, and do you think they have time for your extra requirements?

    Some reddit users say:
    - "Validate data at the source. Catching bad data early saves you from chasing downstream problems later."
    - "Debug transformations as you build them: fewer surprises and faster delivery in the long run."
    - "Resolve inconsistencies early. Fixing issues upfront avoids expensive reprocessing later."

    Read the full reddit thread here: https://lnkd.in/ehdw8Vw3
    dltHub blog post: https://lnkd.in/evcqCq3J

    Please let us know what you think. Let's discuss. 🧠

    Shift YOURSELF Left

    dlthub.com
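    The "validate data at the source" advice from the thread can be sketched in plain Python. This is a minimal illustration, not dlt's API; the field names (`user_id`, `created_at`) and the split into clean/rejected records are made up for the example.

    ```python
    from datetime import datetime

    def validate(record: dict) -> list[str]:
        """Return a list of problems; an empty list means the record is clean."""
        problems = []
        if not record.get("user_id"):
            problems.append("missing user_id")
        try:
            datetime.fromisoformat(record.get("created_at", ""))
        except ValueError:
            problems.append("bad created_at timestamp")
        return problems

    def shift_left(records):
        """Split records into (clean, rejected) at the source, before loading."""
        clean, rejected = [], []
        for r in records:
            (rejected if validate(r) else clean).append(r)
        return clean, rejected

    clean, rejected = shift_left([
        {"user_id": "u1", "created_at": "2024-05-01T12:00:00"},
        {"user_id": "", "created_at": "not-a-date"},
    ])
    ```

    Rejected records can be routed to a quarantine table instead of silently dropped, so the fix happens where the data is produced rather than downstream.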

  • View dltHub's company page

    Will Delta Lake's tight integration with Databricks win out, or will Apache Iceberg quietly take the lead? 👇

    Databricks acquired Tabular, the company founded by Iceberg's original creators. Snowflake launched Polaris, its Iceberg-based catalog, backed by Starburst and Dremio.

    What's ahead for 2025 (Iceberg)?

    ✅ 1. RBAC catalog: better permissions at scale
    Iceberg is adding Role-Based Access Control (RBAC), like Databricks' Unity Catalog but open source.

    ✅ 2. Real-time updates: Iceberg's next step
    Row lineage tracks changes (insert, delete, update), improving analytics and real-time data handling.

    ✅ 3. Materialized views: finally here
    Iceberg now supports materialized views for faster, more efficient queries.

    What's next? Support for nanosecond timestamps with time zones. New tools for handling large-scale deletions.

    Full article here: https://lnkd.in/gFMRzBkh

    What's your bet for the future? #Iceberg #DataEngineering #OpenSource

    Apache Iceberg Won the Future — What’s Next for 2025?

    blog.det.life

  • View dltHub's company page

    Our consulting partnership program is open!

    View Adrian Brudaru's profile

    Open source pipelines - dlthub.com

    The dltHub Partnership program is now open to consultants and agencies. Read more here: https://dlthub.com/blog/consult

    Announcing Our Consulting Partnerships Program

    dlthub.com

  • View dltHub's company page

    What is Delta Sharing? (Simplified) 👇

    ● Open protocol for secure, scalable, and platform-agnostic data sharing.
    ● Share data across platforms and clouds without vendor lock-in.

    ✅ How does it work?
    ● Databricks-to-Databricks: advanced sharing with governance and AI tools.
    ● Open sharing: share with any platform using secure tokens.
    ● Customer-managed: host your own Delta Sharing server for complete control.

    ✅ Why use it?
    ● Flexible for multi-cloud and hybrid setups.
    ● Secure with token-based access and auditing.
    ● Cost-efficient: no data replication or high transfer fees.

    💡 How does your organization handle secure data sharing? Let's discuss! #DeltaSharing #DataCollaboration #SecureDataSharing
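    The token-based "open sharing" mode above works through a small profile file the provider hands to the recipient; tables are then addressed as `<profile-path>#<share>.<schema>.<table>`. A minimal sketch, assuming a hypothetical endpoint, token, and share/schema/table names (real values come from the data provider):

    ```python
    import json
    import tempfile

    # Profile fields follow the Delta Sharing open protocol; the endpoint
    # and bearer token here are placeholders, not real credentials.
    profile = {
        "shareCredentialsVersion": 1,
        "endpoint": "https://sharing.example.com/delta-sharing/",
        "bearerToken": "<token-from-provider>",
    }

    # Recipients save the profile to a file and pass a table URL of the
    # form "<profile-path>#<share>.<schema>.<table>" to a client library
    # such as the delta-sharing Python connector.
    with tempfile.NamedTemporaryFile("w", suffix=".share", delete=False) as f:
        json.dump(profile, f)
        profile_path = f.name

    table_url = f"{profile_path}#sales_share.retail.orders"
    print(table_url)
    ```

    Because only the profile file crosses organizational boundaries, revoking access is as simple as rotating the token on the provider side.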

  • View dltHub's company page

    🚀 Apache Iceberg: the next Apache Hadoop?

    📍 Hadoop vs. Iceberg: Hadoop solved the "big data explosion" of the 2010s. Iceberg fixes today's data lake headaches.
    📍 Adoption challenges: fast adoption often creates messy, overcomplicated setups.
    📍 Small files problem: Iceberg struggles with too many small files, just as Hadoop did. High-frequency writes = metadata overload.
    📍 Not one tool: Iceberg isn't a single tool. It's part of an ecosystem (query engines like Trino/Spark/Flink plus storage like S3/GCS). This demands a platform mindset.
    📍 Snapshot management: snapshots avoid single points of failure, but as they grow, metadata can get messy. Keep things clean with metadata purging and monitoring.
    📍 Self-hosting vs. managed: self-hosting = more control. Managed services = simpler but less flexible.
    📍 Community and competition: Iceberg has a strong open-source community, but competition (Delta Lake, Hudi) risks fragmentation. Standardization will be key.

    ⭐ Future trends to watch:
    1️⃣ One table format might dominate.
    2️⃣ Tools that make Iceberg easier to manage (e.g., Amazon S3 Tables).
    3️⃣ Growth in real-time use cases (streaming + ML).

    ⭐ The big picture: Iceberg is a big leap in data management. By learning from Hadoop, we can build better systems for the future.

    #DataEngineering #BigData
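    The snapshot-management point (purge old metadata before it piles up) boils down to a retention policy. Engines expose this as a maintenance procedure (Iceberg has an expire-snapshots operation), but the pruning logic can be sketched in a few lines; the snapshot IDs and the 7-day window below are invented for illustration:

    ```python
    from datetime import datetime, timedelta

    def expire_snapshots(snapshots, now, keep_days=7, keep_last=1):
        """Drop snapshots older than the retention window, always keeping
        the most recent `keep_last` so the table stays readable."""
        cutoff = now - timedelta(days=keep_days)
        ordered = sorted(snapshots, key=lambda s: s["committed_at"], reverse=True)
        return [
            s for i, s in enumerate(ordered)
            if i < keep_last or s["committed_at"] >= cutoff
        ]

    now = datetime(2025, 1, 15)
    snapshots = [
        {"id": 1, "committed_at": now - timedelta(days=30)},
        {"id": 2, "committed_at": now - timedelta(days=10)},
        {"id": 3, "committed_at": now - timedelta(days=1)},
    ]
    kept = expire_snapshots(snapshots, now)  # only the 1-day-old snapshot survives
    ```

    Running a policy like this on a schedule is what keeps metadata from becoming the new "small files" problem.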

  • View dltHub's company page

    Google introduced a new pipe syntax for BigQuery SQL. 👇

    - It's an 𝗲𝘅𝗽𝗲𝗿𝗶𝗺𝗲𝗻𝘁𝗮𝗹 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 aimed at improving SQL readability and design.
    - While it doesn't improve execution performance, it simplifies query structuring.
    - Pipe syntax enables 𝗹𝗶𝗻𝗲𝗮𝗿 𝗾𝘂𝗲𝗿𝗶𝗲𝘀: a series of transformations that are easy to read and maintain.

    Here's a traditional SQL query for calculating average order value:

    ```sql
    WITH customer_orders AS (
      SELECT
        c.customer_id,
        o.order_id,
        o.order_value
      FROM customers c
      JOIN orders o
        ON c.customer_id = o.customer_id
      WHERE c.customer_status = 'active'
    ),
    average_order_value AS (
      SELECT AVG(order_value) AS average_order_value
      FROM customer_orders
    )
    SELECT *
    FROM average_order_value
    ```

    And here's the same query using the new 𝗽𝗶𝗽𝗲 𝘀𝘆𝗻𝘁𝗮𝘅 (queries start from a table and chain `|>` operators):

    ```sql
    FROM customers
    |> JOIN orders ON customers.customer_id = orders.customer_id
    |> WHERE customer_status = 'active'
    |> SELECT customer_id, order_id, order_value
    |> AGGREGATE AVG(order_value) AS average_order_value
    ```

    Currently, pipe syntax is in preview and requires applying for access.

    Would you use this for a cleaner SQL design? Share your thoughts below! 👇 #BigQuery #SQLDesign #DataEngineering #DataQueries

  • dltHub reposted this

    View Amir Zohrenejad's profile

    Building Relta | YC Alum

    🚀 Excited to release github-assistant: an AI assistant for repository data from the GitHub API. Simon Farshid and I set out to build an AI assistant on a public dataset. What we built in a very short time is a testament to how far dev tools in data and AI have come in the past year.

    🌟 Try it out: https://lnkd.in/epcg3MYp
    📖 Learn how it works: https://lnkd.in/es-JQraG

    A shoutout to all the tools that made this possible:
    👉 Relta, which powers the semantic layer and text-to-SQL
    👉 assistant-ui for all interfaces
    👉 dltHub for all data pipelines from the GitHub API
    👉 LangChain for the agent infrastructure

  • View dltHub's company page

    AWS launched S3 Tables: a new S3 bucket type that optimizes storage and performance for Apache Iceberg tables. Native support for Iceberg in S3 is a big deal, with significant implications for data engineers, architects, and the broader data ecosystem.

    S3 Tables are storage-optimized buckets for managing Apache Iceberg tables. Standard S3 buckets require manual operations for compaction, snapshot cleanup, etc. S3 Tables automate this, resulting in better performance.

    Key features:
    - 3x faster queries vs. standard S3
    - Up to 10x higher transaction throughput (for high-volume workloads)
    - Built-in automated maintenance
    - Integration with AWS services (Amazon Athena, EMR, Glue, QuickSight)
    - Data stored in Iceberg-compatible formats (e.g., Parquet), accessible through any third-party query engine that supports Iceberg

    S3 Tables strengthen AWS's position in the lakehouse ecosystem. They also simplify Iceberg table operations, reducing engineering effort for metadata management and performance tuning. Because S3 Tables adhere to Iceberg standards, they interoperate with tools like Apache Spark, Flink, Dremio, Starburst, and Estuary Flow. This will likely increase Iceberg adoption, especially among AWS users.

    Adoption could also lead to:
    - Escalating costs for high-frequency or real-time workloads
    - Increased vendor lock-in: reliance on AWS-managed features could complicate future migrations to other Iceberg-compatible systems

    In conclusion, AWS S3 Tables fundamentally change how Iceberg tables are managed and queried. Read the official documentation here:

    Working with Amazon S3 Tables and table buckets - Amazon Simple Storage Service

    docs.aws.amazon.com

Similar pages

Funding

dltHub: 1 funding round in total

Last round

Pre-Seed

$1,500,000

Investors

Dig Ventures
More information on Crunchbase