The Open Format Movement Heats Up: Snowflake Embraces Apache Iceberg

The cloud data warehouse's support for Iceberg includes new features for query acceleration, data governance, data sharing, and more.

Apr 8th, 2025 6:00am by Jelani Harper

Today, data warehouse giant Snowflake announced a significant increase in its support for Apache Iceberg tables. The expanded integration with the open table format lets Snowflake customers access Iceberg data as though it were no different from other data contained in the popular cloud data platform.

As a result, there’s now a host of Snowflake-enabled features that work with Iceberg tables, making the latter more secure, easier to share, and more performant for certain workloads.

Snowflake’s enhancements to Iceberg tables include:

Data Governance: Organizations can apply column and row-level security to Iceberg tables via techniques like masking and encryption.
Data Sharing: Snowflake customers can securely share Iceberg tables and views of tables inside the platform — without replicating data. They can also publish Iceberg data to Snowflake’s marketplace, which supplies monetization opportunities.
Business Continuity: Snowflake is responsible for replicating, synchronizing, and backing up Iceberg tables across multiple clouds and cloud regions.
Computations: With features like Query Acceleration Service and Search Optimization Service, Snowflake can accelerate queries on Iceberg tables and lower the computational cost of using them. These features are in preview for Iceberg data and generally available for Snowflake’s proprietary storage format.

“We’ve spent the last 18 plus months really rebuilding a lot of the core of Snowflake so that Iceberg tables are now genuinely first class inside Snowflake,” commented Chris Child, Snowflake’s vice president of product for data engineering. “That means they support all the different capabilities of Snowflake.”

Snowflake APIs

The aforementioned capabilities strengthen Snowflake’s Iceberg support so that use of this data format is practically indistinguishable from the format native to Snowflake. Prior to today’s announcement, it was possible to connect Iceberg tables and query them in Snowflake, although the system treated them as separate from its core.

According to Child, “If you’re using the open source Iceberg APIs to access the data, you’re limited to the things that Iceberg itself supports. On the other hand, if you’re coming in and using it within Snowflake or through the Snowflake APIs, then you get access to all of these capabilities.”

Iceberg Table Storage

Storage of the actual Iceberg tables is incumbent on Snowflake’s customers. It’s not uncommon for users to employ Microsoft Azure Blob Storage or Amazon Web Services’ S3 buckets for this purpose. Snowflake then stores what amounts to “a mix of Parquet and Iceberg metadata files… directly in the customer bucket,” Child explained. “And then, you use a catalog to help govern access control and discovery and a handful of other things.”

Organizations can select their data catalog of choice for these tasks. Snowflake is championing Apache Polaris, which can run inside its platform as a managed service. Because of the catalog integration and the stored information about the Iceberg tables Child referred to, “When you’re inside Snowflake, those feel just like Snowflake,” he said.

Query Accelerations

Snowflake’s Search Optimization Service and Query Acceleration Service can profoundly impact certain workloads involving Iceberg tables. The former is particularly relevant for analyzing facets of time-series data and data for observability or security use cases.

Specifically, this feature is employed “when you’re doing point lookups or looking for specific pieces of data, as opposed to doing aggregate queries atop it,” Child explained. Enabling Search Optimization Service allows the system to store additional metadata to accelerate retrieving single rows or single types of data for workloads that are “traditionally slow on columnar format,” Child said.
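The trade-off Child describes, storing extra metadata so that point lookups avoid scanning a whole column, can be illustrated with a small sketch. This is not Snowflake's implementation; the function names and the in-memory dictionary are assumptions made purely for illustration.

```python
from collections import defaultdict


def build_lookup_index(column):
    """Build extra metadata: a map from each value to the row positions holding it."""
    index = defaultdict(list)
    for position, value in enumerate(column):
        index[value].append(position)
    return dict(index)


def scan_lookup(column, target):
    """Point lookup without extra metadata: every value in the column is inspected."""
    return [position for position, value in enumerate(column) if value == target]


def indexed_lookup(index, target):
    """Point lookup with extra metadata: a single dictionary probe."""
    return index.get(target, [])


# Hypothetical column of request IDs from an observability table.
request_ids = ["r-17", "r-42", "r-08", "r-42", "r-99"]
index = build_lookup_index(request_ids)
print(indexed_lookup(index, "r-42"))
```

Both lookups return the same row positions. The indexed version pays for storing and maintaining the extra map, and in return it skips the full scan.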

Scalable Compute for Queries

Snowflake’s Query Acceleration Service dynamically scales the compute for queries running in Snowflake. It’s viable for users who typically run small and medium-sized data warehouse workloads, but who occasionally have queries over larger amounts of data that run faster with more resources dedicated to them.

With this service, “We look at every query that comes in, and if it would run faster on a larger warehouse size, we go get more compute for that query specifically,” Child said. “So, we can tailor the amount of compute you have, not just at the warehouse level, but at the individual query level.” As a result, customers can decrease the size of their compute clusters to lower costs, but still get better average performance because the system can scale up when it’s beneficial.
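The idea of sizing compute per query rather than per warehouse can be sketched as a simple decision function. This is a minimal sketch under stated assumptions, not Snowflake's algorithm: the function name, the use of estimated scan size as the signal, and the cap on extra compute are all hypothetical.

```python
import math


def plan_compute(estimated_scan_gb, warehouse_capacity_gb, max_scale_factor=8):
    """Return how many warehouse-sized units of compute to give a single query.

    Assumptions (hypothetical): the planner has an estimate of how much data
    the query scans, and the warehouse comfortably handles
    `warehouse_capacity_gb` per query without extra help.
    """
    if estimated_scan_gb <= warehouse_capacity_gb:
        return 1  # the base warehouse is enough for this query
    needed = math.ceil(estimated_scan_gb / warehouse_capacity_gb)
    return min(needed, max_scale_factor)  # cap the extra compute to bound cost


# A small warehouse handles routine queries alone and borrows compute for outliers.
for scan_gb in (5, 35, 500):
    print(scan_gb, plan_compute(scan_gb, warehouse_capacity_gb=10))
```

The cap reflects the cost trade-off described above: the warehouse stays small for typical queries, and only the occasional large query draws on additional resources.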

Role-Based Access

Snowflake’s security and governance features are predominantly based on role-based access, which now works on Iceberg data. By applying row-level security controls, the system allows employees from different departments, for example, to get different query results “depending on the roles you have enforced,” Child said. With column-level security, organizations can remove columns that people don’t have access to when querying data, redact them, or tag them according to governance concerns such as PII. Users can implement rules that only certain roles can access entire credit card numbers, while others get the last four or none of the digits.

“You just define that once, and it gets applied then to every query that runs, every way that the data gets accessed, no matter how they’re coming at it,” Child said. “These things either don’t work or are very hard to implement in raw Iceberg.” Obfuscation methods include internal controls for tokenization and masking. Snowflake partners with vendors for external tokenization, in which data is tokenized before being accessed through the platform.
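The role-based rules described in this section can be sketched in plain Python. This is an illustrative sketch only, not Snowflake's policy syntax or API; the role names, the department mapping, and both function names are hypothetical.

```python
# Hypothetical role names, chosen only for illustration.
FULL_ACCESS_ROLES = {"fraud_analyst"}
PARTIAL_ACCESS_ROLES = {"support_agent"}


def mask_card_number(card_number, role):
    """Column-level rule: return the view of a card number that a role may see."""
    digits = "".join(ch for ch in card_number if ch.isdigit())
    if role in FULL_ACCESS_ROLES:
        return digits
    if role in PARTIAL_ACCESS_ROLES:
        keep = digits[-4:]  # last four digits stay visible
        return "*" * (len(digits) - len(keep)) + keep
    return "*" * len(digits)  # every other role sees no digits


def visible_rows(rows, role_departments, role):
    """Row-level rule: a role sees only rows from departments it is mapped to."""
    allowed = role_departments.get(role, set())
    return [row for row in rows if row["department"] in allowed]


print(mask_card_number("4111 1111 1111 1234", "support_agent"))
```

Because the rule is defined once as a function of the role, every caller gets the same treatment regardless of which query or tool reaches the data, which is the property Child highlights.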

Data Sharing

Organizations can now share specific facets of Iceberg data — whole tables, views, functions, and even applications — with one another without copying or moving data. Because Snowflake runs as a managed service, these facets of the data can be made available on demand to users from different organizations or departments. According to Child, this feature works with “all the governance capabilities of Snowflake.”

Snowflake also supports data clean rooms, in which it operates as a neutral third party for organizations (like a furniture manufacturer and a furniture retailer) to see which customers they might have in common. With this approach, the system allows both parties “the ability to look for matches in our customer lists, but not to run arbitrary queries against my data,” Child said. Thus, organizations only expose the data they want from their Iceberg tables. Iceberg data can also be shared, bought, and sold in Snowflake Marketplace, the vendor’s data marketplace.
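The clean-room idea, in which two parties learn about their overlap without either one querying the other's raw list, can be illustrated with a deliberately simplified sketch. It assumes both parties agree on a shared salt and hand only hashed identifiers to the neutral party. That is an assumption for illustration, not a description of how Snowflake's clean rooms work, and salted hashing alone is not a strong privacy guarantee.

```python
import hashlib


def blind(identifiers, shared_salt):
    """Each party hashes its own customer identifiers before sharing anything."""
    return {
        hashlib.sha256((shared_salt + ident.strip().lower()).encode("utf-8")).hexdigest()
        for ident in identifiers
    }


def overlap_count(blinded_a, blinded_b):
    """The neutral party sees only hashes and reports how many match."""
    return len(blinded_a & blinded_b)


# Hypothetical customer lists for a manufacturer and a retailer.
manufacturer = ["ann@example.com", "bob@example.com"]
retailer = ["BOB@example.com ", "cy@example.com"]
salt = "agreed-out-of-band"  # hypothetical shared secret

print(overlap_count(blind(manufacturer, salt), blind(retailer, salt)))
```

The only operation exposed to either side is the match count, which mirrors the restriction Child describes: matching is allowed, arbitrary queries are not.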

Business Continuity

The disaster recovery capabilities Snowflake enables add value for any user of Iceberg tables. They simply specify, through the UI, which tables they want replicated for business continuity and where. Then, Snowflake maintains a copy of that data in a customer’s location of choice, which can span clouds or cloud regions.

“As you make changes to the data, we will incrementally replicate those changes over to another region in a cost-effective way,” Child said. “And then, if you have a problem or an outage in that first region, you can failover to and transparently move your workloads, your pipelines, your clients, and it will all switch over to the other region with very little gap and very little downtime.”
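Incremental replication of the kind Child describes can be sketched as copying only what the secondary region lacks. The sketch assumes table files are immutable once written, so a file already present in the replica never needs to be copied again. The dictionaries standing in for storage buckets, the file paths, and the function names are hypothetical, and cleanup of files removed from the primary is left out.

```python
def files_to_replicate(primary_files, replica_files):
    """Return the files present in the primary region but missing from the replica."""
    return sorted(set(primary_files) - set(replica_files))


def replicate(primary, replica):
    """Copy only the missing files; `primary` and `replica` map path -> contents."""
    copied = files_to_replicate(primary, replica)
    for path in copied:
        replica[path] = primary[path]
    return copied


# Hypothetical bucket contents: the replica is one commit behind the primary.
primary = {"data/a.parquet": b"a", "data/b.parquet": b"b", "metadata/v2.json": b"{}"}
replica = {"data/a.parquet": b"a"}

print(replicate(primary, replica))  # only the new files are transferred
print(replicate(primary, replica))  # a second pass has nothing left to copy
```

Transferring only the difference is what keeps the replica close to current at low cost, so that a failover finds the other region nearly up to date.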

The Larger Point

Snowflake’s expanded support for Iceberg means more than increased governance, security, disaster recovery, data sharing, and query acceleration mechanisms for the open table format. It shows how capable open storage formats have become for contemporary data management, analytics, and AI use cases, and it suggests they will grow more capable still.

“We’re really excited about Iceberg in particular because it’s a truly community-driven open format,” Child said. “We’re really excited to be part of that, and to be contributing and helping drive the whole data ecosystem forward, and take a lot of the things that have really been incredible about Snowflake and bring them to Iceberg.”

Jelani Harper has worked as a research analyst, research lead, information technology editorial consultant, and journalist for over 10 years. During that time he has helped myriad vendors and publications in the data management space strategize, develop, compose, and place...