The Amazon-based, cloud-native relational database is set to offer intercontinental data sharing and is preparing to run cross-cloud

Credit: Marc Ferranti/IDG

Fueled by a capital injection of $263 million, making it the first cloud-native data warehouse startup to achieve “unicorn” status, Snowflake is set this year to expand its global footprint, offer cross-regional data-sharing capabilities, and develop interoperability with a growing set of related tools.

With the new round of funding, announced Thursday, Snowflake has raised a total of $473 million at a valuation of $1.5 billion. Founded in 2012, the company has become a startup to watch because it has engineered its data warehouse from the ground up for the cloud, designing it to remove limits on how much data can be processed and how many concurrent queries can be handled.

What is Snowflake?

Snowflake at its core is a massively parallel processing (MPP) analytical relational database that is ACID (Atomicity, Consistency, Isolation, Durability) compliant, handling not only SQL natively but also semistructured data in formats like JSON by using a custom VARIANT datatype. The marriage of SQL and semistructured data is key, because enterprises today are awash in machine-generated, semistructured data.

With a unique three-layer architecture, Snowflake says it can run hundreds of concurrent queries on petabytes of data, while users take advantage of cloud cost-efficiency and elasticity, creating and terminating virtual warehouses as needed, and can even self-provision with no more than a credit card and about the same effort it takes to spin up an AWS EC2 instance.

Snowflake can store diverse datatypes. (Image: Snowflake)
While the Snowflake on Demand self-service option may be particularly enticing to smaller and medium-size businesses (SMBs), Snowflake is well-positioned to serve big enterprises, such as banks, that are moving to the cloud, says CEO Bob Muglia, a tech veteran who spent more than 20 years at Microsoft and two years at Juniper before joining Snowflake in 2014.

“It turns out that the data warehouse is one of the pivot points, a tentpole thing, that customers have to move because if the data warehouse continues to live on premises a massive number of systems surrounding that data warehouse will continue to live on premises,” Muglia says.

And even well-funded, big enterprises like banks are attracted to cloud cost-efficiencies, Muglia points out. “If you’ve got some quant guy who wants to run something and all of a sudden needs a thousand nodes and needs it for two hours, it’s kinda nice to be able to do that really quickly then have it go away versus paying for them 365 days a year.”

Snowflake will expand globally

At the moment Snowflake runs on Amazon in four regions: US West, US East, Frankfurt and Sydney. It will be running in another European region within weeks, Muglia says. The capital infusion will enable the company to add Asian and South American regions within a year, he added. Within that timeframe the company also plans to:

- Add the ability to do cross-region data replication. Right now, Snowflake’s Data Sharehouse allows for real-time data sharing among customers only within an Amazon region. The ability to replicate across continents should open up doors to global enterprises.
- Run on another cloud provider. Muglia has been coy about which provider it will be, but concedes it’s likely to be Microsoft Azure. Cross-provider replication is also in the works, Muglia says.
- Continue to work on the system’s ability to interoperate with various tools that its customers use.
Customers often use certain database tools and add-ons for years, even after vendors stop updating them, and customers want new systems to work with them.

The list of cloud data warehouse rivals grows

There is, however, a gaggle of players vying to be the online data warehouse of choice for enterprises. Snowflake must contend with, for example, Microsoft Azure’s SQL Data Warehouse, Google’s BigQuery and Cloud SQL (where users can run Oracle’s MySQL), as well as Redshift from Amazon itself.

But Muglia contends that Snowflake’s unique architecture allows it to scale well past traditional SQL databases, even when they are run in the cloud. In addition, it does not require special training or skills, as NoSQL alternatives like Hadoop do.

Most of the traditional databases, as well as Redshift and many NoSQL systems, use a shared-nothing architecture, which distributes subsets of data across all the processing nodes in a system, eliminating the communications bottleneck suffered by shared-disk systems. The problem with these systems is that compute cannot be scaled independently of storage, and many systems become overprovisioned, Snowflake notes. Also, no matter how many nodes are added, the RAM in the machines used in these systems limits the number of concurrent queries they can handle, Muglia says.

“The challenge that customers have today is that they have an existing system that’s out of capacity, it’s overtaxed; meanwhile they have a mandate to go to the cloud and they want to use the transition to the cloud to break free of their current limitations,” Muglia says.
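The coupling of compute and storage in a shared-nothing design can be illustrated with a minimal Python sketch. It is not drawn from any vendor's code: the simple modulo hashing, the key names and the helper functions are all assumptions made for illustration, and real systems use more elaborate distribution schemes. The point is only that when each node owns a slice of the data, changing the node count changes which node owns which rows.

```python
import zlib


def owner(key: str, node_count: int) -> int:
    """Pick the node that stores a row, using a stable hash of its key.

    Modulo placement is an assumption for this sketch, not a description
    of any particular product.
    """
    return zlib.crc32(key.encode("utf-8")) % node_count


def moved_fraction(keys, old_nodes: int, new_nodes: int) -> float:
    """Fraction of rows whose owning node changes when the cluster is resized."""
    moved = sum(1 for k in keys if owner(k, old_nodes) != owner(k, new_nodes))
    return moved / len(keys)


# Hypothetical row keys, invented for the example.
keys = [f"order-{i}" for i in range(1000)]

# Growing a four-node cluster to five nodes forces rows to change owners,
# so adding compute also means moving stored data.
print(f"rows relocated: {moved_fraction(keys, 4, 5):.0%}")
```

Under these assumptions, adding a single node to gain compute capacity relocates a share of the stored rows, which is one way to picture why such systems tend to be sized for peak load rather than resized on demand.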
Snowflake is designed to solve this problem by using a three-tier architecture:

- A data storage layer that uses Amazon S3 to store table data and query results;
- A virtual warehouse layer that handles query execution within elastic clusters of virtual machines that Snowflake calls virtual warehouses;
- A cloud services layer that manages transactions, queries, virtual warehouses, metadata such as database schemas, and access control.

This architecture lets multiple virtual warehouses work on the same data at the same time, allowing Snowflake to scale concurrency far beyond what its shared-nothing rivals can do, Muglia says.

One potential problem is that the three-tier architecture might lead to latency issues, but Muglia says that one way the system maintains performance is by having the query compiler in the services layer use the predicates in a SQL query together with the metadata to determine what data needs to be scanned. “The whole trick is to scan as little data as possible,” Muglia says.

But make no mistake: Snowflake is not an OLTP database and it’s only going to rival Oracle or SQL Server for work that is analytical in nature.

Meanwhile, though, it’s setting its sights on new horizons. “In terms of running and operating a global enterprise having a global database [is] a very good thing and that’s where we’re going,” Muglia says.

Snowflake’s latest venture capital round was led by ICONIQ Capital, Altimeter Capital and a newcomer to the company, Sequoia Capital.
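Muglia's remark about scanning as little data as possible can be pictured with a short Python sketch. The article does not say what metadata Snowflake keeps, so the per-partition minimum and maximum values used here are an assumption, as are the `PartitionMeta` type, the partition names and the date-like integers. The sketch shows the general idea of comparing a query predicate against stored metadata to skip data that cannot match.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionMeta:
    """Hypothetical metadata kept for one chunk of a table."""
    name: str
    min_value: int  # smallest value of the filtered column in this chunk
    max_value: int  # largest value of the filtered column in this chunk


def partitions_to_scan(parts: List[PartitionMeta], low: int, high: int) -> List[PartitionMeta]:
    """Keep only chunks whose [min, max] range overlaps low <= value <= high."""
    return [p for p in parts if p.max_value >= low and p.min_value <= high]


# Invented example: one chunk per calendar quarter, dates encoded as YYYYMMDD.
parts = [
    PartitionMeta("p0", 20170101, 20170331),
    PartitionMeta("p1", 20170401, 20170630),
    PartitionMeta("p2", 20170701, 20170930),
    PartitionMeta("p3", 20171001, 20171231),
]

# Predicate: order date between 2017-05-15 and 2017-07-20.
selected = partitions_to_scan(parts, 20170515, 20170720)
print([p.name for p in selected])  # ['p1', 'p2']
```

In this toy case the metadata check rules out half of the chunks before any table data is read, which is the effect the quote describes.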