Fuel Business Growth: Robust Data Modeling with Data Vault 2.0 and Informatica

Last Published: Jun 21, 2024
Karthikeyan Mani

Principal Product Manager

In an era of innovation powered by AI, the most essential asset for any organization is its data. Imagine it as the fuel that powers the engine of a company's growth, which makes storing and processing this data efficiently more crucial than ever. This is where the concept of a data vault comes into play: it helps businesses stay ready to make smart, quick decisions based on their data.

What Exactly Is a Data Vault?

Think of a data vault as a vast, secure and incredibly organized digital library in the cloud, where every piece of information your business collects is kept safe yet easily accessible. It's designed with today's AI-powered analytics and data management needs in mind, ensuring that businesses can not only store massive amounts of data but also use it in meaningful ways.

Simplifying Data Modeling

To understand the importance of a data vault, we first need to look at traditional data modeling techniques:

Third Normal Form is like organizing your closet to reduce clutter. This data modeling approach gets rid of duplicate data, avoids anomalies and makes sure every piece of data is in the right place, ensuring faster and more efficient data writing.

Dimensional Modeling focuses more on the business aspect, optimizing how data is stored in a data warehouse. Data is organized into two main types of tables, fact and dimension, for easier access and analysis, making it ideal for businesses that need quick insights from their data.

Enter Data Vault 2.0

Data Vault 2.0 takes the best parts of these traditional models and combines them into a single system for managing data. It's like building a resilient bridge that not only stands strong but can quickly adapt and expand as the business environment changes. You can add or remove parts of your data structure without causing a system-wide headache, so your business remains agile and responsive. No matter how rapidly the world changes or how much data your business accumulates, you remain prepared to harness the full value of your data for smarter, speedier business decisions.

Data Vault 2.0 Reference Business Architecture

Building on the foundational principles outlined above, let's delve into the specific aspects of Data Vault 2.0 reference business architecture as depicted in Figure 1.

Data from different sources is staged in a raw vault without any filtering. The raw vault serves as the initial landing area for ingested data before it undergoes any transformation. Here the data is stored in three kinds of components: hubs, links and satellites.

Hubs maintain a distinct list of business keys, along with metadata that identifies the source of each key. For instance, an employee hub might hold one row per employee, with the employee ID serving as the business key.

Links connect the keys, creating a relationship between hubs. This relationship is captured as data, offering flexibility and adaptability to changes, rather than relying on storing a foreign key. For instance, the association between employee and department is defined by a unique link, enhancing the model's efficiency and clarity.

Satellites contain descriptive data that adds context to business keys, while also keeping a history of each business key and its relationships. For instance, a satellite can hold the business context of an employee, such as name, date of birth, address and more.
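To make the three components concrete, here is a minimal Python sketch of one row of each, using the employee and department example above. The class names, field names and sample values are illustrative assumptions, not a schema taken from Data Vault 2.0 or from Informatica.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Tuple


@dataclass
class HubRow:
    hub_key: str        # surrogate key for the business key (hypothetical name)
    business_key: str   # e.g. the employee ID
    record_source: str  # metadata: where the business key came from
    load_ts: datetime


@dataclass
class LinkRow:
    link_key: str
    hub_keys: Tuple[str, ...]  # keys of the hubs being related
    record_source: str
    load_ts: datetime


@dataclass
class SatelliteRow:
    hub_key: str                # the hub row this context describes
    attributes: Dict[str, str]  # descriptive data such as name or address
    load_ts: datetime           # lets several versions coexist as history


# Made-up sample values, for illustration only.
now = datetime.now(timezone.utc)
employee = HubRow("hk_emp_1", "E1001", "hr_system", now)
department = HubRow("hk_dep_1", "D42", "hr_system", now)
works_in = LinkRow("lk_1", (employee.hub_key, department.hub_key), "hr_system", now)
profile = SatelliteRow(employee.hub_key, {"name": "Ada", "address": "1 Example Street"}, now)
```

Note that the employee-to-department relationship lives in its own row (`works_in`) rather than as a foreign key on the employee, which is what lets relationships change without restructuring the hubs.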

The business vault is an advanced layer built on top of the raw vault. At this stage, business rules are applied to the data before it moves on to the information mart or delivery phase. The process involves applying specific business rules, de-normalizing the data, identifying insights and adding query assistance structures to support rapid reporting.

Figure 1. Data Vault 2.0 architecture

Data Vault 2.0 Advantages

Key benefits include:

  • Adapts to changes in data sources, business requirements and organizational structures.
  • Provides agile and adaptive development through its flexible architecture.
  • Supports large data volumes, high throughput and responsiveness.
  • Facilitates robust data quality and governance practices.
  • Allows for compatibility with modern data technologies by providing scalable, adaptable and future-proof data architectures.

Data Vault 2.0 and Informatica Cloud Data Integration

Informatica Cloud Data Integration (CDI) is designed to support Data Vault 2.0. It enables agile data operations, allowing organizations to adapt quickly to evolving business requirements. The integration also facilitates improved scalability and better data quality. Data Vault's standard schemas maintain data consistency, making data easier to manage.

How to Implement Data Vault 2.0 Components in CDI

In Informatica CDI, you create a parameterized mapping that dynamically builds hub, link and satellite tables for different source tables or objects.
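The idea of parameterization can be sketched in plain Python: the load logic is written once, and each source object supplies only a set of parameters. This is an analogy, not Informatica's API. The class `VaultMappingParams`, the function `plan_targets` and the `HUB_`/`SAT_` naming convention are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VaultMappingParams:
    """Hypothetical parameter set supplied once per source object."""
    source_object: str
    business_key_columns: Tuple[str, ...]
    descriptive_columns: Tuple[str, ...]

    @property
    def hub_table(self):
        # Assumed naming convention, not an Informatica default.
        return "HUB_" + self.source_object.upper()

    @property
    def satellite_table(self):
        return "SAT_" + self.source_object.upper()


def plan_targets(params):
    """Return the target tables one generic mapping run would populate."""
    return {
        params.hub_table: list(params.business_key_columns),
        params.satellite_table: list(params.descriptive_columns),
    }


# The same logic is reused for two different source objects.
runs = [
    VaultMappingParams("employee", ("employee_id",), ("name", "address")),
    VaultMappingParams("department", ("department_id",), ("department_name",)),
]
plans = [plan_targets(params) for params in runs]
```

Adding a new source object then means adding one more parameter set rather than writing a new mapping.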

Here’s an explanation for each component of Data Vault 2.0:

Hub Template: In this model, the hub key, as illustrated in Figure 2, is a critical requirement. It is a surrogate hash key computed from the source system's attributes and the incoming business keys. The link and satellite tables later reference this key. To filter out records that already exist, join the landing table to the hub table on the calculated hub key together with the business key in the hub table. New records are inserted into the hub table and timestamped for reference.
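The hub load described above can be sketched in Python, with in-memory lists standing in for the landing and hub tables. The hash algorithm (SHA-256), the `||` delimiter, the upper-case normalization and all function and column names are assumptions made for illustration. The article does not specify them, and this is not how CDI implements the mapping internally.

```python
import hashlib
from datetime import datetime, timezone


def normalize(value):
    # Assumed normalization so that " e1001 " and "E1001" hash identically.
    return value.strip().upper()


def hub_hash_key(record_source, business_key):
    """Surrogate hash key built from the source system and the business key."""
    joined = "||".join([normalize(record_source), normalize(business_key)])
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def load_hub(hub_table, landing_rows, key_column, record_source):
    """Insert only business keys that are not already in the hub.

    Matching on (hub_key, business_key) mirrors the join on the calculated
    hub key together with the business key. Returns the number of rows added.
    """
    existing = {(row["hub_key"], row["business_key"]) for row in hub_table}
    inserted = 0
    for landing_row in landing_rows:
        business_key = normalize(landing_row[key_column])
        hub_key = hub_hash_key(record_source, business_key)
        if (hub_key, business_key) in existing:
            continue  # already known: nothing to insert
        hub_table.append({
            "hub_key": hub_key,
            "business_key": business_key,
            "record_source": record_source,
            "load_ts": datetime.now(timezone.utc),  # timestamp for reference
        })
        existing.add((hub_key, business_key))
        inserted += 1
    return inserted
```

Because the key is derived deterministically from its inputs, rerunning the load over the same landing rows inserts nothing, which makes the load safe to repeat.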

Figure 2. Hub architecture

Link Template: Illustrated in Figure 3, the link table represents the business relationship, maintaining connections between two or more hubs. It contains one row for each distinct combination of hub keys. The primary key in the link table is a hash of the concatenated integration keys from all participating hubs. The hash key is computed from the landing table, which is then joined with the hub tables to identify the hub keys of the interconnected hubs.
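A minimal Python sketch of the link load follows, under similar assumptions: SHA-256 as the hash, `||` as the delimiter, and plain dictionaries standing in for the hub tables. The names `link_hash_key`, `load_link` and `hub_lookups` are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def link_hash_key(business_keys):
    """Hash of the concatenated integration keys from all participating hubs."""
    joined = "||".join(key.strip().upper() for key in business_keys)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def load_link(link_table, landing_rows, hub_lookups):
    """Insert one link row per distinct combination of hub keys.

    hub_lookups maps a landing column name to a {business_key: hub_key}
    dictionary standing in for that column's hub table.
    """
    existing = {row["link_key"] for row in link_table}
    columns = sorted(hub_lookups)  # fixed column order keeps the hash stable
    inserted = 0
    for landing_row in landing_rows:
        link_key = link_hash_key(landing_row[column] for column in columns)
        if link_key in existing:
            continue
        # "Join" to each hub to resolve its hub key. A real load would also
        # need to handle business keys that are missing from a hub.
        hub_keys = {
            column: hub_lookups[column][landing_row[column].strip().upper()]
            for column in columns
        }
        link_table.append({
            "link_key": link_key,
            "hub_keys": hub_keys,
            "load_ts": datetime.now(timezone.utc),
        })
        existing.add(link_key)
        inserted += 1
    return inserted
```

Sorting the column names is a design choice of this sketch: hashing the same keys in a different order would yield a different link key, so the order has to be fixed somewhere.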

Figure 3. Link architecture 

Satellite Template: The satellite table, shown in Figure 4, captures all modifications related to the business key. For this purpose, both the hub key attributes and the change hash key are needed. An incoming record is classified as existing or new based on the hub key. If it's new, the record is added to the satellite table; otherwise, the change hash key is compared with the stored hash key to determine whether the change needs to be accepted. During the insertion or updating process, remember to add the current timestamp before pushing the data to the target.
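The new-versus-changed decision can be sketched in Python as follows. The change hash here is a SHA-256 over the sorted descriptive attributes, and a changed record is appended as a new version rather than updated in place. Both choices, along with the function names, are assumptions of this sketch rather than details stated in the article.

```python
import hashlib
from datetime import datetime, timezone


def change_hash(attributes):
    """Hash of the descriptive attributes, independent of dictionary order."""
    joined = "||".join(f"{name}={attributes[name]}" for name in sorted(attributes))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def load_satellite(satellite, hub_key, attributes):
    """Classify an incoming record and store it if it is new or changed.

    Returns "new", "changed" or "unchanged".
    """
    incoming_hash = change_hash(attributes)
    history = [row for row in satellite if row["hub_key"] == hub_key]
    if not history:
        outcome = "new"  # hub key never seen in this satellite
    elif history[-1]["change_hash"] == incoming_hash:
        return "unchanged"  # same hash as the latest version: skip
    else:
        outcome = "changed"
    satellite.append({
        "hub_key": hub_key,
        "change_hash": incoming_hash,
        "attributes": dict(attributes),
        "load_ts": datetime.now(timezone.utc),  # current timestamp on write
    })
    return outcome
```

Rows are appended in load order, so the last row for a hub key is its latest version and earlier rows remain available as history.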

Figure 4. Satellite table architecture

Data Vault 2.0 and Informatica for Improved Data Outcomes

Data Vault 2.0 offers a robust, adaptable and scalable structure for data warehousing, combining the advantages of several data modeling philosophies. It caters to the demands of large-scale data integration and historical data preservation, providing an approach that adapts to changing business requirements. Informatica, in turn, provides the dynamic mapping task component, which lets you reuse the parameterized mapping described earlier across a variety of tables.

By using Data Vault 2.0 and Informatica together, organizations can improve their data quality, accessibility and usability at speed. This can lead to better decision-making and improved business outcomes.

To learn more about how Informatica can help bring your data to life, visit www.informatica.com.

First Published: Jun 21, 2024