Data Vault Architecture
Concepts and Implementation
1. Introduction
Data Vault is a data modeling methodology for building scalable, auditable, and flexible enterprise data warehouses. It separates business keys, relationships, and descriptive attributes into three table types: Hubs, Links, and Satellites.
2. Key Benefits
– Scalability & Flexibility: Incremental growth without disrupting existing structures.
– Complete Historical Tracking: Satellites keep every version of an entity's attributes, forming an immutable audit trail.
– Business-Aligned Organization: Hubs represent core entities; Links represent relationships.
– Rapid Development via Automation: Standardized patterns enable code generation.
– Built-in Audit & Compliance: Insert-only loading, with load_date and record_source on every row, keeps each record traceable to its origin.
– Resilience to Change: Decoupled architecture supports evolving business needs.
3. Core Components
– Hubs: Store unique business keys (e.g., CustomerID, OrderID).
– Links: Represent relationships between Hubs (e.g., Order-Customer).
– Satellites: Store the descriptive attributes of a Hub or Link, adding one row per change to preserve history.
4. Hash Keys
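– Hub Hash Key: Generated by hashing the normalized business key.
– Link Hash Key: Generated from the related Hub hash keys, combined in a fixed order.
– Purpose: Gives every system the same deterministic key for the same business key, so keys can be computed independently during loading without lookups.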
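The key derivation described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function names, the `|` delimiter, the trim-and-upper-case normalization, and the sample keys `C-1001` and `O-5001` are all assumptions made for the example.

```python
import hashlib

DELIMITER = "|"  # assumed separator between key parts


def normalize_key(value: str) -> str:
    """Trim and upper-case a key part so formatting differences do not change the hash."""
    return value.strip().upper()


def hub_hash_key(*business_key_parts: str) -> str:
    """Hash the normalized business key (one or more parts) with SHA-256."""
    joined = DELIMITER.join(normalize_key(p) for p in business_key_parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def link_hash_key(*hub_hash_keys: str) -> str:
    """Hash the related hub hash keys; callers must always pass them in the same order."""
    joined = DELIMITER.join(hub_hash_keys)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Hypothetical business keys, for illustration only.
customer_hk = hub_hash_key("C-1001")
order_hk = hub_hash_key("O-5001")
order_customer_hk = link_hash_key(order_hk, customer_hk)
```

Because the functions are deterministic, two systems that agree on the normalization rules and the key order will produce the same hash keys without consulting each other.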
5. Hash Diff
– Definition: A hash of all descriptive attributes in a Satellite.
– Purpose: Detect changes by comparing a single hash value instead of every attribute column.
– Creation Example:
SELECT SHA256(CONCAT_WS('|', name, email, address)) AS hash_diff FROM staging_customer;
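The same calculation can be sketched in Python to make the normalization and NULL handling explicit. In several SQL dialects CONCAT_WS skips NULL arguments, so a NULL in one column can produce the same input string as a NULL in another. The sketch below avoids that by replacing missing values with empty strings. The function name, the trimming rule, and the sample values are assumptions for illustration.

```python
import hashlib
from typing import Optional


def hash_diff(*attributes: Optional[str]) -> str:
    """SHA-256 over a satellite's descriptive attributes, passed in a fixed column order."""
    # Missing values become empty strings so every column keeps its position.
    normalized = ["" if a is None else a.strip() for a in attributes]
    return hashlib.sha256("|".join(normalized).encode("utf-8")).hexdigest()


# Hypothetical customer records, for illustration only.
stored = hash_diff("Ada Lovelace", "ada@example.com", "12 Main St")
unchanged = hash_diff("Ada Lovelace ", "ada@example.com", "12 Main St")  # trailing space only
changed = hash_diff("Ada Lovelace", "ada@example.org", "12 Main St")     # new email
```

Here `stored` equals `unchanged` because trimming removes the stray space, while `changed` differs and would trigger a new Satellite row.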
6. Best Practices
– Always include load_date and record_source.
– Use SHA-256 for hashing.
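– Normalize attribute values before hashing (e.g., trim whitespace, apply consistent casing, and handle NULLs the same way everywhere).
– Satellite primary key = (parent hash key, load_date).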
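As a rough sketch of how these practices combine during loading, the Python below models an insert-only Satellite load: a new row is appended only when the incoming hash diff differs from the latest stored one, and each row is identified by its hub hash key and load date. The in-memory list, the function names, and the sample values (`hk1`, `diffA`, `crm`) are hypothetical stand-ins for a real table and loader.

```python
from datetime import datetime, timezone

# In-memory stand-in for a satellite table; rows are identified by
# (customer_hub_hash_key, load_date) and are never updated or deleted.
sat_customer = []


def latest_row(hub_hash_key):
    """Return the most recent satellite row for a hub key, or None if there is none."""
    rows = [r for r in sat_customer if r["customer_hub_hash_key"] == hub_hash_key]
    return max(rows, key=lambda r: r["load_date"], default=None)


def load_satellite(hub_hash_key, hash_diff, attributes, load_date, record_source):
    """Append a new row only when the hash diff differs from the latest stored one."""
    current = latest_row(hub_hash_key)
    if current is not None and current["hash_diff"] == hash_diff:
        return False  # unchanged: nothing is written
    sat_customer.append({
        "customer_hub_hash_key": hub_hash_key,
        "load_date": load_date,
        "hash_diff": hash_diff,
        "attributes": attributes,
        "record_source": record_source,
    })
    return True


t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
t3 = datetime(2024, 1, 3, tzinfo=timezone.utc)

first = load_satellite("hk1", "diffA", {"email": "ada@example.com"}, t1, "crm")
repeat = load_satellite("hk1", "diffA", {"email": "ada@example.com"}, t2, "crm")
update = load_satellite("hk1", "diffB", {"email": "ada@example.org"}, t3, "crm")
```

The second load writes nothing because the hash diff is unchanged; the third appends a new version and leaves the first row intact.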
7. Example Model
– HubCustomer: customer_hub_hash_key, customerid, load_date, record_source
– LinkOrderCustomer: link_order_customer_hash_key, hub_order_hash_key, hub_customer_hash_key, load_date, record_source
– SatCustomer: customer_hub_hash_key, hash_diff, attributes, load_date, record_source
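One way to turn this model into tables is sketched below, using Python's built-in sqlite3 module so the DDL can be run as-is. Several details are assumptions rather than part of the model above: the snake_case table names, the hub_order table (implied by the Link but not listed), the name, email, and address columns standing in for "attributes", the TEXT column types, and load_date and record_source on every table as the best practices suggest.

```python
import sqlite3

DDL = """
CREATE TABLE hub_customer (
    customer_hub_hash_key TEXT PRIMARY KEY,
    customerid            TEXT NOT NULL UNIQUE,
    load_date             TEXT NOT NULL,
    record_source         TEXT NOT NULL
);
CREATE TABLE hub_order (  -- assumed: implied by the link, not listed in the model
    order_hub_hash_key TEXT PRIMARY KEY,
    orderid            TEXT NOT NULL UNIQUE,
    load_date          TEXT NOT NULL,
    record_source      TEXT NOT NULL
);
CREATE TABLE link_order_customer (
    link_order_customer_hash_key TEXT PRIMARY KEY,
    hub_order_hash_key    TEXT NOT NULL REFERENCES hub_order (order_hub_hash_key),
    hub_customer_hash_key TEXT NOT NULL REFERENCES hub_customer (customer_hub_hash_key),
    load_date             TEXT NOT NULL,
    record_source         TEXT NOT NULL
);
CREATE TABLE sat_customer (
    customer_hub_hash_key TEXT NOT NULL REFERENCES hub_customer (customer_hub_hash_key),
    load_date             TEXT NOT NULL,
    hash_diff             TEXT NOT NULL,
    name                  TEXT,  -- name, email, address stand in for "attributes"
    email                 TEXT,
    address               TEXT,
    record_source         TEXT NOT NULL,
    PRIMARY KEY (customer_hub_hash_key, load_date)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# List the tables that were created.
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

The composite primary key on sat_customer means a second row for the same customer and load date is rejected, while a later load date adds a new version.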
8. Adding New Sources Easily in Data Vault
One of the key advantages of Data Vault is its ability to integrate new data sources without disrupting existing structures. For example, if a company introduces a Payment System to track transactions, it can be added by creating a new Hub, Link, and Satellite.
New components for Payment System:
– HubPayment: Stores unique Payment IDs.
– LinkOrderPayment: Connects payments to orders.
– SatPayment: Stores payment attributes like method, status, amount.
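The additive nature of this change can be illustrated with a small sqlite3 sketch: the Payment tables are created alongside an existing Hub, and the existing table definitions are compared before and after. All table and column names here are assumptions chosen for the example, and the existing model is reduced to the one Hub that the new Link needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Existing model, reduced to the hub that the new link refers to (assumed columns).
conn.execute("""
CREATE TABLE hub_order (
    order_hub_hash_key TEXT PRIMARY KEY,
    orderid            TEXT NOT NULL UNIQUE,
    load_date          TEXT NOT NULL,
    record_source      TEXT NOT NULL
)""")


def schema(connection):
    """Snapshot of every table definition, keyed by table name."""
    rows = connection.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'")
    return dict(rows.fetchall())


before = schema(conn)

# New source: only CREATE statements, no ALTER on existing tables.
conn.executescript("""
CREATE TABLE hub_payment (
    payment_hub_hash_key TEXT PRIMARY KEY,
    paymentid            TEXT NOT NULL UNIQUE,
    load_date            TEXT NOT NULL,
    record_source        TEXT NOT NULL
);
CREATE TABLE link_order_payment (
    link_order_payment_hash_key TEXT PRIMARY KEY,
    hub_order_hash_key   TEXT NOT NULL REFERENCES hub_order (order_hub_hash_key),
    hub_payment_hash_key TEXT NOT NULL REFERENCES hub_payment (payment_hub_hash_key),
    load_date            TEXT NOT NULL,
    record_source        TEXT NOT NULL
);
CREATE TABLE sat_payment (
    payment_hub_hash_key TEXT NOT NULL REFERENCES hub_payment (payment_hub_hash_key),
    load_date            TEXT NOT NULL,
    hash_diff            TEXT NOT NULL,
    method               TEXT,
    status               TEXT,
    amount               NUMERIC,
    record_source        TEXT NOT NULL,
    PRIMARY KEY (payment_hub_hash_key, load_date)
);
""")

after = schema(conn)
existing_unchanged = all(after[name] == sql for name, sql in before.items())
new_tables = sorted(set(after) - set(before))
```

In this sketch `existing_unchanged` stays true because the new source is integrated purely by adding tables, which is the property the section describes.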
The diagram below shows how the new Payment source integrates into the existing Data Vault model: