Skip to main content

Shapes and Forms of Structured Data: SCD Types, Master Full, Master Incremental, Unitemporal, and Bitemporal

· 10 min read
Matteo

Structured data forms the foundation of many data-driven systems and is crucial for effective data analysis and decision-making. Within the realm of structured data, there are different shapes and forms that enable organizations to manage and utilize data in diverse ways. In this blog post, we will explore several important concepts related to structured data, including SCD (Slowly Changing Dimensions) types, Master Full, Master Incremental, Unitemporal, and Bitemporal data.

Slowly Changing Dimensions (SCD)

Slowly Changing Dimensions refer to the nature of data that evolves over time in a data warehousing context. SCDs capture changes in dimensional attributes, such as customer addresses, product specifications, or employee roles, while maintaining historical records. There are different SCD types to manage these changes:

SCD Type 1

This type overwrites existing data with updated information, effectively losing historical details. It is suitable when historical data is not critical, such as in cases where only the most recent values are needed.

SCD Type 2

Type 2 preserves historical data by creating new records for each change, typically through the addition of an effective date and expiration date. This allows for accurate tracking of historical changes and provides a complete audit trail.

Let's consider an example of Slowly Changing Dimensions (SCD) Type 2 in the context of a customer database. Suppose we have a table that stores customer information, including their name, address, and membership status. Initially, a customer named John Smith signs up with the address "123 Main Street" and is assigned a membership status of "Standard."

Later, John moves to a new address, "456 Oak Avenue." Instead of updating the existing record, SCD Type 2 creates a new record with the updated address and assigns it a new effective date, indicating when the change took place. The previous record for John Smith remains in the database, capturing the historical address and its associated time frame. The updated record reflects John's current address and effective date.

By utilizing SCD Type 2, the customer database can maintain a complete audit trail of changes. It allows for tracking John's address history, which can be useful for analyzing customer behavior, understanding migration patterns, or generating accurate reports based on specific time periods.

Customer IDNameAddressMembership StatusEffective Start DateEffective End Date
1John Smith123 Main StreetStandard2021-01-012022-02-28
1John Smith456 Oak AvenueStandard2022-03-01(Current Record)

SCD Type 3

Type 3 retains limited historical information by adding additional columns to capture a limited number of changes. It sacrifices full historical tracking but can be useful in cases where only recent changes need to be analyzed.

Let's continue with the same customer database example, but this time, let's explore how Slowly Changing Dimensions (SCD) Type 3 would handle changes in customer information.

Initially, John Smith signs up with the address "123 Main Street" and is assigned a membership status of "Standard." Instead of creating a new record for each change, SCD Type 3 adds additional columns to capture limited historical information.

When John moves to a new address, "456 Oak Avenue," the existing record is updated with the new address, but the previous address is retained in a separate column designated for the previous value. In addition, a separate column captures the effective date of the change.

So, in the SCD Type 3 example, the customer record for John Smith would contain the following information: his current address, the previous address, and the date of the address change.

By utilizing SCD Type 3, the customer database retains some historical information while still allowing for efficient storage and retrieval. This approach is useful when limited historical tracking is required, and only a few key changes need to be captured and analyzed.

Customer IDNameCurrent AddressPrevious AddressAddress Change DateMembership Status
1John Smith456 Oak Avenue123 Main Street2022-03-01Standard

Master Full and Master Incremental

Master Full is a form of structured data management where a centralized "master" dataset contains all relevant information about a specific entity. In this approach, updates to the master dataset are performed in bulk, typically by replacing the entire dataset with a fresh copy. Master Full is suitable when the dataset is relatively small or when updates occur infrequently. It ensures consistency across systems that rely on the master data but may not be ideal for real-time or frequent updates.

Master Incremental is another form of structured data management where changes to the master dataset are made incrementally, without replacing the entire dataset. Instead, only the modified or new records are updated. This approach is efficient when dealing with large datasets or when frequent updates occur. Master Incremental allows for faster processing times and minimizes the need to process unchanged data, but it requires careful tracking of changes and synchronization between systems.

Let's illustrate the concepts of Master Full and Master Incremental using an example of a product catalog management system.

In the Master Full approach, the entire product catalog is replaced with a fresh copy when updates are made. The "Status" column represents the active state of each product. When a product is no longer available or needs to be deleted, the entire record is removed from the dataset. However, this approach does not retain any historical information about deleted products.

Product IDNameDescriptionPriceEffective DateStatus
1Product ADescription of Product A$10.992021-01-01Active
2Product BDescription of Product B$15.992021-01-01Active
3Product CDescription of Product C$8.992021-01-01Active

In the Master Incremental approach, updates are made by modifying only the relevant records. New products or modified details are added as new records with their respective effective dates. For representing deletes, a new record is added with an updated status indicating that the product is now inactive or deleted. In the example above, Product B's status is changed to "Inactive" with an effective date of June 15, 2022, indicating that it is no longer available.

By implementing the Master Incremental approach, the product catalog management system maintains a history of changes, including deletes, by adding new records rather than removing existing ones. This allows for accurate tracking and analysis of product data over time while efficiently managing updates and deletions.

Product IDNameDescriptionPriceEffective DateStatus
1Product ADescription of Product A$10.992021-01-01Active
2Product BDescription of Product B$15.992021-01-01Active
3Product CDescription of Product C$8.992021-01-01Active
4Product DDescription of Product D$12.992022-05-10Active
22022-06-15Inactive

Unitemporal and Bitemporal

Unitemporal data refers to structured data that incorporates a single valid time dimension. It captures not only the current state of the data but also the validity period during which each record was considered accurate. Unitemporal data is valuable in scenarios where the historical context of changes is crucial for analysis, compliance, or auditing purposes. It enables tracking changes and conducting retrospective analysis based on different time periods.

Bitemporal data combines two time dimensions: valid time and transaction time. Valid time represents the period during which a record is considered accurate, while transaction time captures when a change was made to the record. Bitemporal data is commonly used in scenarios where analyzing the temporal aspects of data changes is critical, such as in financial systems or legal applications. It allows for precise tracking of changes and their timing, providing a comprehensive historical view.

Let's consider a real estate property database that tracks the historical ownership and value of properties. We want to capture the current and past ownership details along with their validity periods.

Property IDOwnerStart DateEnd Date
1John Smith2010-01-012015-06-30
1Sarah Johnson2015-07-01(Current)
2Alex Williams2012-03-15(Current)
3Emma Thompson2014-05-102022-08-31

In this Unitemporal example, the data captures the current and historical ownership information for properties. Each property has a unique Property ID. For instance, Property ID 1 was owned by John Smith from January 1, 2010, to June 30, 2015. Then, the ownership was transferred to Sarah Johnson from July 1, 2015, until the current date, which is represented as "Current." Property ID 2 is currently owned by Alex Williams, and Property ID 3 was owned by Emma Thompson until August 31, 2022.

By utilizing Unitemporal data, the property database can maintain a complete history of ownership records, enabling analysis based on specific timeframes and generating accurate reports reflecting changes in property ownership over time.

Let's continue with the real estate property database example, but this time, we will illustrate the Bitemporal approach to track the historical ownership and value of properties, capturing both valid time and transaction time.

Property IDOwnerStart DateEnd DateTransaction DateProperty Value
1John Smith2010-01-012015-06-302010-01-05$100,000
1Sarah Johnson2015-07-01(Current)2015-07-10$150,000
2Alex Williams2012-03-15(Current)2012-03-20$200,000
3Emma Thompson2014-05-102022-08-312014-05-15$300,000

In this Bitemporal example, the data includes both the valid time and transaction time. Each property has a unique Property ID, and ownership records are associated with specific owners. Additionally, the transaction date represents when the ownership change or value update occurred.

For instance, Property ID 1 was initially owned by John Smith from January 1, 2010, to June 30, 2015, with a transaction date of January 5, 2010, and a property value of $100,000. The ownership was then transferred to Sarah Johnson, effective from July 1, 2015, until the current date, with a transaction date of July 10, 2015, and an updated property value of $150,000.

Similarly, Property ID 2 is currently owned by Alex Williams, with a transaction date of March 20, 2012, and a property value of $200,000. Property ID 3 was owned by Emma Thompson until August 31, 2022, with a transaction date of May 15, 2014, and a property value of $300,000.

By utilizing Bitemporal data, the property database can accurately track the historical ownership records and property values, considering both the valid time and transaction time. This enables precise analysis of property ownership and value changes over specific time periods, facilitating comprehensive historical reporting and audit trails.

Conclusion

Understanding the various shapes and forms of structured data is essential for effective data management and analysis. From managing changes in Slowly Changing Dimensions (SCDs) to adopting Master Full or Master Incremental approaches, and from leveraging the temporal aspects of Unitemporal and Bitemporal data, each concept offers unique benefits