Foundational

Data Infrastructure and Analytics

We're on a mission to help data and software engineering teams ship error-free code for the data stack.

About us

Data breaks too often. We make it easy to find, fix, and prevent issues in your code before you deploy and mess up the data. Use Foundational to proactively identify and prevent code and data issues, and to create controls and guardrails. Foundational can be set up in minutes, with no code changes required.

Industry
Data Infrastructure and Analytics
Company size
11-50 employees
Headquarters
San Francisco
Type
Privately Held
Specialties
Data Engineering, Analytics Engineering, Data, Cloud Data Warehouse, Code Analysis, CI/CD, Code Validation, Issue Detection, dbt, Spark, and SQL

Updates

• Foundational

Exciting release from Snowflake - column-level lineage! This certainly has a lot of people excited. Read our thoughts below 👀

Alon Nafta, CEO at Foundational:

Column-level Lineage is now officially supported by Snowflake! (Official link in the comments)

A few days ago Snowflake announced that column-level lineage is now available in Snowsight as a Preview Feature. This was long awaited and certainly got a few folks (including our team here at Foundational) very excited. IMHO it's interesting for a few reasons:

* There's now an "official" reference for the data lineage information you see in other tools or try to build yourself with open-source parsers. While most tools supporting data lineage parse the same information (query logs), there are big differences in accuracy, so it'll be interesting to see whether data catalogs will now pull this info directly instead of parsing query logs themselves, like they do for Databricks.
=> My take: We'll see most data catalogs switch to pulling Snowflake's lineage info, since it doesn't make sense to parse on your own and risk those differences. In that sense, this release also helps vendors, not just Snowflake customers.

* Many Snowflake customers today *without* column-level data lineage can now get it from a credible source, and don't need to buy a tool for lineage purposes only. But note that the big value of lineage is understanding cross-platform lineage (e.g., impact on dashboards), which is not supported here. Furthermore, this is for Snowflake Enterprise customers only, so I'd assume those larger companies already have a solution, but it's still a meaningful alternative now.
=> My take: This puts some pressure on tools like dbt and data catalogs to get better at BI coverage and provide the "end-to-end lineage" experience everyone needs.

Ultimately, lineage is still a facilitator for a specific use case, for example cataloging or CI/CD, which is our expertise at Foundational. It becomes more about what you do with lineage vs. whether you have lineage. What are your thoughts?

#datalineage #governance #snowflake
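For the curious, the query logs that lineage tools parse are exposed in Snowflake's documented ACCESS_HISTORY account-usage view. Here's a minimal sketch of deriving column-level write lineage from it; the target table name is hypothetical, and this is an illustration, not any particular vendor's implementation:

```sql
-- Each OBJECTS_MODIFIED entry lists the written columns and the source
-- columns they were derived from (directSources / baseSources).
SELECT
    ah.query_id,
    tgt.value:"objectName"::STRING AS target_table,
    col.value:"columnName"::STRING AS target_column,
    src.value:"objectName"::STRING AS source_table,
    src.value:"columnName"::STRING AS source_column
FROM snowflake.account_usage.access_history AS ah,
     LATERAL FLATTEN(input => ah.objects_modified)       AS tgt,
     LATERAL FLATTEN(input => tgt.value:"columns")       AS col,
     LATERAL FLATTEN(input => col.value:"directSources") AS src
WHERE tgt.value:"objectName"::STRING = 'ANALYTICS.PUBLIC.ORDERS';  -- hypothetical table
```

The accuracy differences between tools come from reconciling exactly this kind of raw output across dialects and engines.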

• Foundational

    Check out our CTO Barak Fargoun’s latest post on why enums create persistent challenges for data teams—especially when data warehouses fail to grasp their semantics.

Barak Fargoun, CTO at Foundational:

Data warehouses don't understand the semantics of your enum fields!

Many organizations encounter issues with enums because, although these fields are intended to have specific, limited values, data warehouses like Snowflake and BigQuery treat them as ordinary string columns. For example, a status column may contain values such as "pending", "in-progress", and "completed". However, since the warehouse sees it as a plain string, it can hold any text, which can lead to issues. Take a Tableau dashboard that filters on "status = 'In Progress'" while Snowflake stores "in-progress": this minor difference can cause costly errors and inconsistencies.

We frequently observe these semantic challenges with enums among our customers. At Foundational, we tackle these issues head-on by analyzing all code (dbt, Airflow, Spark) and SQL queries (from warehouse logs) to identify fields that function as enums and verify their allowed values. We then validate that all uses (whether filtering, joining, or applying "CASE WHEN" conditions) correctly handle these enums. Additionally, we monitor each pending change, alerting users in Pull Requests if new code references an invalid enum value, so issues like checking status for In Progress (when only in-progress is valid) are caught before they become problems.

While data warehouses may overlook enum semantics, Foundational makes sure these critical details are enforced consistently. 😊
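To make the failure mode concrete, here's a minimal warehouse-SQL sketch with hypothetical table and values; it's a generic hand-rolled check, not Foundational's implementation:

```sql
-- The dashboard's filter silently matches zero rows: the literal
-- doesn't exist in the data, and the warehouse can't flag it.
SELECT COUNT(*) FROM orders WHERE status = 'In Progress';

-- What the column actually holds:
SELECT DISTINCT status FROM orders;  -- pending, in-progress, completed

-- A hand-rolled guardrail: a scheduled check that fails when any row
-- drifts outside the allowed enum values.
SELECT COUNT(*) AS invalid_rows
FROM orders
WHERE status NOT IN ('pending', 'in-progress', 'completed');
```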

• Foundational

Exciting news! Julien Le Dem is joining Foundational as our newest advisor. Staying at the forefront of data lineage research is core to what we do, and we're excited to work with Julien and the OpenLineage community to introduce new ways of driving end-to-end, accurate lineage and improving collaboration between the different lineage platforms in the ecosystem. Learn more about the news here >> https://lnkd.in/eYtGc8xJ

  • Foundational reposted this

Barak Fargoun, CTO at Foundational:

What We Learned from Analyzing 50K Data Pipeline PRs

Foundational recently hit a major milestone: scanning 50,000 pull requests from data pipeline code. These PRs span a wide range of frameworks, including dbt, Airflow, Spark, Google Dataform, and ORMs like SQLAlchemy, TypeORM, and Ruby Active Record.

Here are 5 key insights we've gained:

1. Data pipeline PRs are merged faster than product code
On average, PRs for data pipeline code are merged in under a day, compared to 1.5 days for regular product code. This trend is most visible in dbt and LookML repositories, which might reflect the ease of coding in these modular frameworks.

2. Upstream schema changes occur more frequently than anticipated
These upstream changes originate from product code modifications that alter the schema of OLTP systems like Postgres, which are later synced to data warehouses or lakehouses via ETL tools like Fivetran. We observed that around 5% of product-code changes impact database schemas and directly influence downstream pipelines, a higher share than we initially expected.

3. Most changes to data pipelines have a small impact
Small-impact changes are about twice as common as large-impact ones.

4. Breaking schema changes: a common issue across teams
Schema-breaking changes occur when columns are renamed or removed while still being used downstream, disrupting the pipelines that depend on them. Although these changes are rare within a single framework (like a single dbt project or Airflow repository), they are quite common across teams or frameworks. For example, a change in one dbt project might break another dependent project, or an Airflow modification could affect dashboards in Looker or Tableau (see the sketch after this list).

5. SQL syntax errors are more common than we initially expected
When we introduced SQL syntax error detection into Foundational's analysis, we assumed these errors would be extremely rare. We quickly learned otherwise: frameworks like dbt, Airflow, and Spark often wrap SQL without proper validation, making it not uncommon for PRs to contain syntax errors. These errors can be difficult to spot, especially when the SQL is generated using complex logic, such as intricate dbt macros.
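A minimal sketch of the cross-team breakage in insight #4, with hypothetical table and column names:

```sql
-- Team A's PR renames a column in their model:
--   before: SELECT id, customer_name FROM raw.customers
SELECT id, customer_name AS customer_full_name
FROM raw.customers;

-- Team B's view (or a Looker/Tableau dashboard) in another project
-- still references the old name, and now fails at build time:
CREATE OR REPLACE VIEW analytics.customer_orders AS
SELECT
    c.customer_name,  -- error: this column no longer exists upstream
    o.order_total
FROM analytics.customers AS c
JOIN analytics.orders AS o
  ON o.customer_id = c.id;
```

Within one repo, a compiler or CI build catches this; across repos and BI tools, nothing does unless lineage connects the two.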

  • Foundational reposted this

Barak Fargoun, CTO at Foundational:

Classifying PII Data in Snowflake: Just Add 'PROPAGATE = TRUE'?

At the June Snowflake conference, I was thrilled to learn about an upcoming feature that could revolutionize how companies handle PII data. Snowflake is planning to introduce automatic label propagation for tags, directly addressing one of the most persistent challenges: identifying all instances of PII data. Once identified, enforcing the necessary data protection policies becomes much more straightforward.

Although specific details were scarce, the concept is straightforward and promising. You create a tag, set it to automatically propagate (check out the attached screenshot), and Snowflake takes care of the rest. Every table, view, or object that interacts with this data will automatically inherit the tag. Combined with Snowflake's existing classification capabilities, this could be a game changer for organizations grappling with PII data management.

However, it's important to note that this feature won't extend to data outside of Snowflake's environment. Companies will still face challenges with end-to-end governance of PII data, a space where other vendors, including Foundational, continue to innovate. Nonetheless, this represents a significant leap forward. Although Snowflake hasn't set an official timeline, they did mention that a private preview will be available 'soon.' I'm looking forward to the official release and the impact it will have on the industry.
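For context, here's what column tagging looks like with Snowflake's existing, generally available tag syntax (the object names are hypothetical). The PROPAGATE clause is the announced, unreleased addition, so its exact syntax is speculative until the preview ships:

```sql
-- Existing GA syntax: define a PII tag and attach it to a column.
CREATE TAG governance.tags.pii
  ALLOWED_VALUES 'email', 'name', 'phone';

ALTER TABLE crm.public.customers
  MODIFY COLUMN email_address
  SET TAG governance.tags.pii = 'email';

-- Announced (not yet released) propagation, per the screenshot;
-- the syntax below is speculative:
-- CREATE TAG governance.tags.pii PROPAGATE = TRUE;
```

Today the tag stops at the column it was applied to; the announced feature would carry it to every downstream object derived from that column.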

  • Foundational reposted this

Alon Nafta, CEO at Foundational:

    Excited to share that Barak and I will be coming to Big Data LDN! Looking forward to chatting about preventative data quality, data governance, and of course, what's next for data lineage! Please reach out to schedule some time ahead and learn more about Foundational, or schedule directly with us: https://lnkd.in/gj6qASYY We had such a great time last year and I'm really looking forward to another fantastic event!

  • Foundational reposted this

Foundational:

A few months ago Snowflake released its own native data quality monitoring framework called ... Data Quality Monitoring. 😊

Using Snowflake's native data quality monitoring can be very powerful - it's faster, more efficient, and a lot more secure - but it may also seem daunting to set up and maintain across hundreds or thousands of tables. So, we wrote a guide!

Read Barak Fargoun's Guide to Snowflake Data Quality to understand how to set up your first data metric functions (DMFs), allowing you to boost data quality for your Snowflake data. Find the guide here: https://lnkd.in/gi-qtkYk

#snowflake #dataquality #dataqualitymonitoring #datacontracts
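As a taste of what the guide covers, here's a minimal sketch of attaching DMFs to a table using Snowflake's documented syntax; the table, column, and custom-function names are hypothetical:

```sql
-- Give the table a schedule so its attached metrics actually run.
ALTER TABLE analytics.public.orders
  SET DATA_METRIC_SCHEDULE = '60 MINUTE';

-- Attach a built-in system DMF: count NULLs in a key column.
ALTER TABLE analytics.public.orders
  ADD DATA METRIC FUNCTION SNOWFLAKE.CORE.NULL_COUNT ON (customer_id);

-- Define a custom DMF, e.g. counting rows with a negative order total...
CREATE DATA METRIC FUNCTION analytics.public.negative_total_count(
    t TABLE(order_total NUMBER))
RETURNS NUMBER
AS 'SELECT COUNT_IF(order_total < 0) FROM t';

-- ...and attach it the same way.
ALTER TABLE analytics.public.orders
  ADD DATA METRIC FUNCTION analytics.public.negative_total_count
  ON (order_total);
```

Results land in the SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS view, so you can build alerts and dashboards on top of them.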

