Streaming Data Pipelines: An Introduction to Snowflake’s Dynamic Tables 

Introduction 

Manually refreshing data pipelines in response to frequent data updates can be a cumbersome and error-prone process. This challenge becomes particularly significant when dealing with high-velocity data streams. Snowflake’s Dynamic Tables offer a powerful solution, enabling the creation of automated data pipelines through a declarative approach. 

Introduced in 2023 at the Snowflake Summit, Dynamic Tables represent a paradigm shift in data transformation. They behave much like materialized views: each dynamic table materializes the result of a user-defined SQL query, which eliminates the need for separate target tables and custom transformation code. Dynamic Tables refresh automatically as the underlying source data changes, ensuring that data pipelines stay continuously up to date with the latest information.

In contrast to traditional approaches, which involve creating streams and managing data flow through them upon data changes, Dynamic Tables simplify the process by leveraging declarative queries. This streamlines data pipeline development and reduces maintenance overhead.  

When to Use Dynamic Tables and When Not To

Dynamic tables are most useful in the following scenarios:

1. When data in dependent tables should update automatically, without having to track those dependencies manually.

2. When you want to avoid writing and managing streams and tasks, which are more complex objects to maintain.

3. When you need to materialize the results of a query over multiple source tables into one result table.

4. When you need to build ETL pipelines that create multiple tables and transform the data as it flows through them.

However, some functionality is not supported by dynamic tables, and they should be avoided in such scenarios: the defining query cannot use stored procedures, non-deterministic functions, or external functions, and the source cannot be an external table, a stream, or a materialized view.

Creating a Dynamic Table 

-- Create a landing table to store raw JSON data.
CREATE OR REPLACE TABLE raw (var VARIANT);

-- Create a dynamic table containing the names of office visitors from the raw data.
-- Try to keep the data up to date within 1 minute of real time.
CREATE OR REPLACE DYNAMIC TABLE names
TARGET_LAG = '1 minute'
WAREHOUSE = mywh
AS
SELECT var:id::int id,
       var:fname::string first_name,
       var:lname::string last_name
FROM raw;
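To see the pipeline in action, you can load a sample JSON row into the landing table and query the dynamic table once it has refreshed. This is a minimal usage sketch: the JSON fields match the query above, and the manual REFRESH is optional since the table refreshes on its own within the target lag.

-- Load a sample JSON document into the landing table.
INSERT INTO raw (var)
SELECT PARSE_JSON('{"id": 1, "fname": "Ada", "lname": "Lovelace"}');

-- Optionally trigger a refresh instead of waiting for the 1-minute target lag.
ALTER DYNAMIC TABLE names REFRESH;

-- Query the dynamic table once it has refreshed.
SELECT id, first_name, last_name FROM names;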

How does it work? 

In simple terms, imagine a process that maintains a table that updates itself, automatically detecting changes in other table(s). This happens through an automated process, like a self-updating spreadsheet. This table is called the dynamic table, and the table it depends on is called the base table.

When you set up a dynamic table, you get to decide how “fresh” you want the data to be. For instance, you might want the table to never be more than five minutes out of date compared to the base table. This freshness value is what we call the “target freshness” (also known as the target lag), and it sets a staleness threshold that the data should not exceed. Based on your freshness threshold, the automated process schedules refreshes to keep all the views and data within the desired time frame.

If you don’t require data that updates very quickly, you can set this target freshness to one hour. This is like watching a livestream delayed by an hour, or a rebroadcast one hour after the show finishes. Choosing a longer freshness target also incurs lower costs, as the system spends less effort maintaining the freshness of the data.
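If your freshness requirements change later, the target lag of an existing dynamic table can be adjusted in place. A minimal example, reusing the names table created above:

-- Relax the freshness requirement to one hour to reduce refresh costs.
ALTER DYNAMIC TABLE names SET TARGET_LAG = '1 hour';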

Downstream Functionality of Dynamic Tables 

Downstream, in the context of dynamic tables, refers to the parts of the process that rely on another table’s data. Thinking in terms of downstream dependencies helps us trace the flow of data and see how changes and updates affect everything that depends on it.

Now, let’s understand it with an example. 

1. Daily Sales Summary Table

This table is dynamic in nature, updating itself daily with that day’s sales data, broken down by product category. The schema is given below, followed by a sketch of how such a table might be defined.

2. DailySalesSummary’s Schema 

Date: The date of the sales data. 

CategoryID: An identifier for the product category. 

TotalSales: The total sales amount for the category on that day. 

TotalUnitsSold: The total units sold for the category on that day. 
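A minimal sketch of how DailySalesSummary could be defined as a dynamic table. The source table raw_sales and its columns (SaleDate, CategoryID, SaleAmount, UnitsSold), as well as the warehouse name, are hypothetical and only for illustration.

CREATE OR REPLACE DYNAMIC TABLE DailySalesSummary
TARGET_LAG = '1 day'
WAREHOUSE = mywh
AS
SELECT SaleDate AS Date,
       CategoryID,
       SUM(SaleAmount) AS TotalSales,
       SUM(UnitsSold) AS TotalUnitsSold
FROM raw_sales
GROUP BY SaleDate, CategoryID;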

3. Downstream Dependencies 

  • Views for Management Reporting 

A view named MonthlyCategorySales aggregates data from DailySalesSummary to provide monthly sales figures by category. If the schema of DailySalesSummary changes (e.g., adding a new column for Returns), this view may need to be updated to incorporate or adjust to the new data structure. 

CREATE OR REPLACE VIEW MonthlyCategorySales AS
SELECT
    DATE_TRUNC('MONTH', Date) AS Month,
    CategoryID,
    SUM(TotalSales) AS MonthlySales,
    SUM(TotalUnitsSold) AS MonthlyUnitsSold
FROM DailySalesSummary
GROUP BY Month, CategoryID;
  • ETL Processes for Data Enrichment 

An ETL process enriches the DailySalesSummary table with additional data from a third-party market research database. This process might need adjustments if, for example, the table’s schema changes or if the enrichment logic needs to incorporate new or changed data. 

  • Data Analysis Tools 

Business analysts use BI tools (e.g., Tableau, Power BI) to create dashboards based on the DailySalesSummary and its derived views. Any changes in the underlying data structure or semantics might require updates to these dashboards to ensure they reflect the correct information. 

  • Data Exports for Partners 

Weekly, a CSV export of aggregated sales data is sent to various business partners. If DailySalesSummary undergoes changes (such as a new column for promotional sales), the export scripts and possibly the partner agreements regarding data format might need to be reviewed and updated. 

4. Managing Downstream Updates 

To manage downstream updates effectively: 

  • Communication: Notify all stakeholders of planned changes in advance. 
  • Versioning: Consider versioning the table or view changes so that downstream users can transition at their own pace. 
  • Documentation: Keep thorough documentation of all dependencies and their nature. 
  • Automation: Automate testing of downstream processes to quickly identify and address issues caused by changes upstream. 

Other Important Features Supported by Dynamic Tables

1. Chaining together pipelines of dynamic tables 

A dynamic table is like having a robot that can check several sources to gather all the info it needs. 

For instance, consider a system that pulls in new data and sorts it into different categories like customer info, product details, and sale dates and times. It then combines all that sorted data to give a summary of total sales.

Here’s how dynamic tables can be used in this scenario:

  • First, set up tables for each category (customer, product, date/time) to automatically pull their data from a main collection spot, which we’ll call the staging table. 
  • Next, create another dynamic table that looks at all these category tables to sum up the sales data.

Think of it like setting up a series of dominoes. Each domino is a task: updating customer info, product details, etc. Once all the earlier dominoes have successfully toppled (meaning the individual tasks are done), the last domino—updating the sales summary table—gets to fall. This ensures everything happens in order and accurately. 
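A sketch of this chain in SQL. The staging table (staging), the category tables (product_info, sale_events), the summary table (total_sales_summary), the JSON field names, and the warehouse are all assumed names for illustration. The category tables use TARGET_LAG = 'DOWNSTREAM' so they refresh only when the summary table at the end of the chain needs them.

-- Category tables pull their slices of the raw JSON from the staging table.
CREATE OR REPLACE DYNAMIC TABLE product_info
TARGET_LAG = 'DOWNSTREAM'
WAREHOUSE = mywh
AS
SELECT var:product_id::int product_id, var:unit_price::number(10,2) unit_price
FROM staging;

CREATE OR REPLACE DYNAMIC TABLE sale_events
TARGET_LAG = 'DOWNSTREAM'
WAREHOUSE = mywh
AS
SELECT var:customer_id::int customer_id, var:product_id::int product_id,
       var:quantity::int quantity, var:sale_ts::timestamp sale_ts
FROM staging;

-- The summary table reads from the dynamic tables above, so its refresh runs
-- only after theirs have completed; its 10-minute target lag drives the chain.
CREATE OR REPLACE DYNAMIC TABLE total_sales_summary
TARGET_LAG = '10 minutes'
WAREHOUSE = mywh
AS
SELECT s.customer_id, SUM(s.quantity * p.unit_price) AS total_sales
FROM sale_events s
JOIN product_info p ON s.product_id = p.product_id
GROUP BY s.customer_id;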

Comparing to Task Graphs 

A task graph shows the order in which a set of tasks needs to be done. For example, if some tasks have other tasks as prerequisites, drawing the flow of these tasks gives you a simple task graph. In technical terms, we might call this a Directed Acyclic Graph (DAG).

If one dynamic table needs information from another dynamic table, then the system automatically figures out the optimized time to update each table so that all the information stays fresh and accurate. 

Let’s say you have three tables: a sales summary table (containing sales records), a customers table (containing customer records), and a products table (containing product records).

If your sales summary table needs some data from the customer and product tables, it’ll wait for them to be updated first. Only after they get updated will it update itself for proper synchronization. 

In other words, the system keeps the database synchronized wherever data dependencies exist between tables, avoiding duplication and similar issues, without you having to manage it manually.

2. Time travel feature 

Snowflake Time Travel enables accessing historical data (i.e. data that has been changed or deleted) at any point within a defined period. Time Travel behaves identically for Dynamic Tables as it does for traditional tables. 
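For example, you can query an earlier state of the names dynamic table created earlier using the standard AT and BEFORE clauses (the timestamp below is arbitrary):

-- State of the dynamic table as of five minutes ago.
SELECT * FROM names AT(OFFSET => -60*5);

-- State of the dynamic table just before a given point in time.
SELECT * FROM names BEFORE(TIMESTAMP => '2024-01-15 10:00:00'::TIMESTAMP_LTZ);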

3. Replication feature 

Replication support for dynamic tables enables one to copy data from a primary database to a secondary database for either disaster recovery or data sharing. It can serve as either a failover preparation strategy for disaster recovery or as a means of sharing data across deployments for read-only purposes. 

Replicated dynamic tables behave differently depending on whether the primary database that contains the dynamic table is replicated in a replication group or a failover group.
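A rough sketch of how a database containing dynamic tables might be replicated through a replication group; the group, database, organization, and account names are placeholders.

-- On the source account: create a replication group containing the database.
CREATE REPLICATION GROUP my_rg
  OBJECT_TYPES = DATABASES
  ALLOWED_DATABASES = demo
  ALLOWED_ACCOUNTS = myorg.target_account;

-- On the target account: create the secondary group, then refresh it.
CREATE REPLICATION GROUP my_rg
  AS REPLICA OF myorg.source_account.my_rg;

ALTER REPLICATION GROUP my_rg REFRESH;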

Now that we have some basic understanding about what dynamic tables are and why they are so useful, let’s try to look at some use cases. 

Use Cases 

1. UDTF with dynamic tables 

  1. Set Up Your Environment:
  • Select or switch to the schema you’ll be working in for this use case, for example, new_db.schema1.
  2. Create a Python UDTF in Snowflake:
  • Define the UDTF: Start by creating a Python-based UDTF in Snowflake that calculates the running total of customer account balances. This involves writing Python code within the Snowflake environment.
  • Your Python function (sum_table) will receive a number (e.g., an account balance) as input and output a table with the running total. The function uses a class to maintain the state of the running total and iterates over each input to compute and yield the new total.
  3. Integrate the UDTF into a Dynamic Table:
  • Create the Dynamic Table: Using the previously defined UDTF, create a dynamic table (cumulative_purchase) that computes the cumulative total of customer purchases. Specify attributes such as TARGET_LAG (to define data freshness) and WAREHOUSE (to designate the compute resources). A sketch of the full setup appears after this list.
  • Apply the UDTF: In the dynamic table creation query, apply your UDTF to the saleprice column of your sales report table. Partition the data by creationtime and customer_id to compute the running total in the context of each customer over time.
  • Select Columns: Ensure your dynamic table selects relevant columns (like month, year, customer ID, sale price, and the running total) to make the data meaningful and useful for analysis.
  4. Query the Dynamic Table:
  • After setting up the dynamic table, you can query it (e.g., SELECT * FROM cumulative_purchase LIMIT 10;) to view the cumulative totals and analyze customer spending patterns over time.
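A sketch of what these steps could look like. The names sum_table, cumulative_purchase, salesreport, saleprice, creationtime, and customer_id follow the walkthrough above; the warehouse name and Python runtime version are assumptions.

-- Python UDTF that emits a running total of the values it receives.
CREATE OR REPLACE FUNCTION sum_table(input_number NUMBER)
RETURNS TABLE (running_total NUMBER)
LANGUAGE PYTHON
RUNTIME_VERSION = '3.10'
HANDLER = 'SumTable'
AS $$
class SumTable:
    def __init__(self):
        self._total = 0
    def process(self, input_number):
        self._total += input_number
        yield (self._total,)
$$;

-- Dynamic table that applies the UDTF to each customer's sales over time.
CREATE OR REPLACE DYNAMIC TABLE cumulative_purchase
TARGET_LAG = '1 minute'
WAREHOUSE = mywh
AS
SELECT MONTH(creationtime) AS sale_month,
       YEAR(creationtime) AS sale_year,
       customer_id,
       saleprice,
       running_total
FROM salesreport,
     TABLE(sum_table(saleprice) OVER (PARTITION BY creationtime, customer_id));

-- Inspect the running totals.
SELECT * FROM cumulative_purchase LIMIT 10;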

2. Data Validation Using Dynamic Tables

  1. Prepare Your Environment
  • Begin by navigating to a new SQL worksheet within the Snowflake UI, ideally naming it something indicative of its purpose, like “04_Dynamic_Table_For_Data_Validation”.
  • Ensure you’re operating within the correct schema (DEMO.DT_DEMO).
  2. Create a Dynamic Table for Inventory Alerts
  • Define a Dynamic Table (PROD_INV_ALERT): This table will calculate and track the inventory levels of products, identifying those with less than 10% inventory remaining. It integrates data from sales reports and product inventory to calculate current inventory levels and identifies products at risk of running low.
  3. Query the Dynamic Table for Low Inventory Products
  • Identify Products with Low Inventory: Run a query against the PROD_INV_ALERT table to find products where the percentage of units left in inventory is below 10%. This operation helps in quickly pinpointing items that need immediate restocking.
  4. Set Up Alerts for Low Inventory
  • Create an Email Notification Integration: This step involves setting up an integration within Snowflake to send email alerts. Specify who receives these alerts and under what name the integration operates.
  • Define an Alert for Low Inventory: Create an alert (alert_low_inv) that triggers based on the condition of products having less than 10% inventory. This alert uses the dynamic table to evaluate the condition and, if true, sends an email notification to the specified recipients. A condensed sketch of these objects appears after this list.
  • Manage Alert Execution: Initially, alerts are paused. To start receiving notifications, you will need to resume the alert. Moreover, you can insert new records into your sales data to simulate inventory changes and test the alert system.
  5. Monitoring and Managing Alerts
  • Monitor Alerts: Snowflake provides functionalities to monitor and inspect the history of alert executions. Utilizing these features allows you to ensure your alert system is functioning as expected.
  • Pause or Resume Alerts: Depending on your operational needs, you may find it necessary to pause or resume alerts. Snowflake offers the flexibility to manage alert states easily, ensuring you’re only notified when necessary.
  • Suspend Alerts: To conserve warehouse credits and avoid unnecessary operations, it’s crucial to suspend alerts when they’re not needed. This action stops the alert from executing until you decide to resume it.
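A condensed, hypothetical sketch of the objects described above. The names prod_inv_alert, alert_low_inv, and DEMO.DT_DEMO come from the walkthrough; the source tables (product_inventory, salesreport), their columns, the integration name, the recipient address, and the warehouse are illustrative assumptions.

-- Simplified dynamic table tracking the share of inventory remaining per product.
CREATE OR REPLACE DYNAMIC TABLE prod_inv_alert
TARGET_LAG = '1 minute'
WAREHOUSE = mywh
AS
SELECT p.product_id,
       p.units_in_inventory,
       SUM(s.quantity) AS units_sold,
       ROUND((p.units_in_inventory - SUM(s.quantity)) / p.units_in_inventory * 100, 2) AS percent_unitleft
FROM product_inventory p
JOIN salesreport s ON s.product_id = p.product_id
GROUP BY p.product_id, p.units_in_inventory;

-- Email integration and an alert that fires when any product drops below 10% inventory.
CREATE OR REPLACE NOTIFICATION INTEGRATION my_email_int
  TYPE = EMAIL
  ENABLED = TRUE
  ALLOWED_RECIPIENTS = ('data.team@example.com');

CREATE OR REPLACE ALERT alert_low_inv
  WAREHOUSE = mywh
  SCHEDULE = '30 MINUTE'
  IF (EXISTS (SELECT 1 FROM prod_inv_alert WHERE percent_unitleft < 10))
  THEN CALL SYSTEM$SEND_EMAIL(
      'my_email_int',
      'data.team@example.com',
      'Low inventory alert',
      'One or more products have less than 10% inventory remaining.');

-- Alerts start out suspended; resume to begin receiving notifications.
ALTER ALERT alert_low_inv RESUME;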

3. Monitoring Dynamic Tables: Cost, DAG, and Dashboard

  1. Monitor Dynamic Table Refresh History: 
  • Use the function: 
INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY()  

This function gives you access to the refresh history of the dynamic tables in your account, so you can see both successful refreshes and any failures.

SELECT * FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY()) 
WHERE NAME IN ('SALESREPORT', 'CUSTOMER_SALES_DATA_HISTORY', 'PROD_INV_ALERT', 'CUMULATIVE_PURCHASE') 
ORDER BY DATA_TIMESTAMP DESC, REFRESH_END_TIME DESC LIMIT 10; 
  2. Visualize the Directed Acyclic Graph (DAG):
  • Navigate to Snowsight GUI to access the visual representation of your data pipeline’s DAG. This visualization aids in understanding the dependencies and execution flow among your dynamic tables. 

Path: Data > Databases > DEMO > DT_DEMO > Dynamic Tables 

  • Here, you can monitor refresh history, preview data, and examine refresh modes and table structures. 
  3. Utilize Information Schema for Detailed Insights:
  • For more granular monitoring of refresh issues and DAG, leverage the DYNAMIC_TABLE_REFRESH_HISTORY and DYNAMIC_TABLE_GRAPH_HISTORY functions available in the INFORMATION_SCHEMA. 
  • These functions provide detailed insights into the refresh cycles and the structure of your dynamic table dependencies. 
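For example, the graph-history function can be queried directly, just like the refresh-history query shown earlier in this section:

-- Inspect the dependency graph of the dynamic tables in the account.
SELECT * FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_GRAPH_HISTORY());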
  4. Suspend and Resume Dynamic Tables:

Dynamic tables can be suspended or resumed to manage credit consumption and refresh cycles effectively. 

-- Resume the data pipeline
ALTER DYNAMIC TABLE customer_sales_data_history RESUME;
ALTER DYNAMIC TABLE salesreport RESUME;
ALTER DYNAMIC TABLE prod_inv_alert RESUME;

-- Suspend the data pipeline
ALTER DYNAMIC TABLE customer_sales_data_history SUSPEND;
ALTER DYNAMIC TABLE salesreport SUSPEND;
ALTER DYNAMIC TABLE prod_inv_alert SUSPEND;
  5. Understand and Monitor Costs:
  • Dynamic tables incur costs in storage, cloud service compute, and warehouse compute. Monitoring these costs helps in optimizing resource utilization and managing expenses. 
  • It’s crucial to test dynamic tables with dedicated warehouses to grasp the associated costs better, which are influenced by the frequency of data refreshes and the specified LAG. 
  6. Review Refresh Modes and Operational Features:
  • Dynamic tables support both FULL and INCREMENTAL refresh modes. Use the SHOW DYNAMIC TABLES command (see the example after this list) or check the dynamic table dashboard to determine the refresh mode of your tables.
  • Additionally, dynamic tables offer support for Snowflake features like Time Travel, Replication, Data Governance, Masking, and Tagging, ensuring that they align with your data management policies and security requirements. 
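For example, to check the refresh mode of one of the dynamic tables used earlier in this guide:

-- The output includes a refresh_mode column showing FULL or INCREMENTAL.
SHOW DYNAMIC TABLES LIKE 'SALESREPORT' IN SCHEMA DEMO.DT_DEMO;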

Costing 

Stream and task

WAREHOUSE_NAME   CREDITS_USED   CREDITS_USED_COMPUTE   CREDITS_USED_CLOUD_SERVICES
STREAM_WH        0.437150833    0.436944444            0.000206389

Dynamic Table

WAREHOUSE_NAME   CREDITS_USED   CREDITS_USED_COMPUTE   CREDITS_USED_CLOUD_SERVICES
DYN_WH           0.350849445    0.350277778            0.000571667
DYN_WH           0.000748056    0.000748056
DYN_WH           0.000640556    0.000640556
Total            0.352238057

These results come from running each approach for 3 hours. As the numbers show, the dynamic table method consumes fewer credits and comes out ahead on both convenience and cost.
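Credit figures like these can be pulled from the ACCOUNT_USAGE schema. A rough sketch, assuming the two pipelines ran on warehouses named STREAM_WH and DYN_WH as in the tables above:

-- Compare credit consumption of the two warehouses over the test window.
SELECT warehouse_name,
       SUM(credits_used) AS credits_used,
       SUM(credits_used_compute) AS credits_used_compute,
       SUM(credits_used_cloud_services) AS credits_used_cloud_services
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE warehouse_name IN ('STREAM_WH', 'DYN_WH')
  AND start_time >= DATEADD(hour, -3, CURRENT_TIMESTAMP())
GROUP BY warehouse_name;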

Conclusion 

Dynamic tables in Snowflake represent a pivotal advancement in managing and utilizing data with greater efficiency and adaptability. By automating data refreshes and eliminating the need for manual query updates, dynamic tables offer a robust solution to the challenges posed by rapidly changing data landscapes. Their capability to self-update based on predefined queries ensures that data remains fresh and reflective of the latest changes, thereby facilitating more accurate and timely decision-making.

The use cases highlighted in this article—ranging from UDTFs, data validation, to comprehensive monitoring—underscore the versatility and power of dynamic tables. Whether it’s streamlining data transformation pipelines, enforcing data quality controls, or providing granular insights into data refresh dynamics, dynamic tables serve as a cornerstone for advanced data strategies in Snowflake. 

However, it’s important to recognize the scenarios where dynamic tables shine and where they might not be the best fit, especially considering their limitations with certain functionalities and the costs associated with their operation. By carefully evaluating these factors and leveraging dynamic tables where they offer the most value, organizations can optimize their data management practices, enhance operational efficiency, and unlock new levels of analytical capabilities. 

As data continues to grow in volume, variety, and velocity, tools like dynamic tables will become increasingly critical in harnessing its potential. By embracing these innovations, businesses can stay ahead of the curve, adapting to the ever-evolving data landscape with agility and confidence. 

About Authors 

Umesh Sharma – Umesh holds the position of Principal Engineering Manager at Zimetrics, bringing with him a wealth of experience in technology. He serves as an exemplary mentor and motivator within the organization, demonstrating a profound passion for continuous learning and guiding others in the adoption of cutting-edge technologies. In addition to his primary role, Umesh assumes the responsibilities of a Data Architect, leveraging his expertise to implement and spearhead numerous projects within the Snowflake ecosystem. His contributions are instrumental in steering various data-driven initiatives to success. 

Adarsh Anshul – Adarsh is an enthusiastic young engineer with a keen passion for acquiring and mastering cutting-edge technology. His primary focus lies within the realms of Data Science and Machine Learning. Demonstrating his expertise, he has effectively contributed to the solutioning of a data-driven project within the Snowflake environment. 
