Bitemporal Historisation and the SAS DDS

Does the SAS DDS support Bitemporal historisation?  Why yes indeed, with the right transform and extract logic, and updates to the primary keys.

The SAS Detailed Data Store (DDS) suite provides “out of the box” data models for industries such as Banking, Insurance and Telco.  They arrive with logical and physical models (eg in Erwin format), and a set of DDL files for your preferred database.  These assets help to accelerate the delivery and deployment of a company data warehouse that has the further advantage of standardised integration with SAS Solution offerings.  The model is also very large.  It is highly likely that much of the model will be unused, especially in industries like insurance – where corporate structures, regulatory environments, and product offerings can be quite diverse.

As a “baseline” model then, another way to utilise it is to take it as just that – a model, that provides guidance on putting together a warehouse that suits you, the customer.  There is nothing to stop you picking & choosing the parts you like, that make sense for your particular use case(s).

The Validity of SCD2

The historisation approach in the DDS is based on the “VALID_FROM_DTTM” and “VALID_TO_DTTM” columns.  These provide the open and close datetime pairs representing the ‘validity’ of the record.  This – is where the confusion begins.

What is ‘validity’?  Perhaps this represents the ‘truth’, eg the number of widgets we sold last month.  So what is this truth?  Is it the state of the database (transaction datetimes) between the 1st and the end of the month?  Or does it represent the current view of last months widget sales, which were finally loaded on, say, the 15th of the following month?

Dealing with this scenario (late arriving records) poses a number of challenges for the SCD2 model:

  • Reloading historical data (as records must be physically removed, in order to reload)
  • Loading corrections (again, records must first be removed)
  • Maintaining an audit history, or even the ability to run the same query twice and get the same result.
An emerging consensus from the datawarehousing domain is the use of bitemporal datetime ranges for managing such requirements. The below article borrows heavily from this excellent paper by Arnd Wussing, which explains the topic in much greater detail.

Background to Bitemporal

The concept of Bi-Temporal Historisation is not new – it was originally associated with a chap called Richard Snodgrass back in 1992. There is further info on Wikipedia and this blog, and a more recent article on medium.

Teradata have specifically implemented temporal features, which (interestingly) holds the datetime pairs in a single column (see attachment).   Notice the SAS DDS and Teradata nomenclature differences (for SAS DDS: VALID typically means Transaction Datetimes; for Teradata: VALID refers to Business Datetimes).

bitemporal

 Furthermore, this SUGI paper ( http://www2.sas.com/proceedings/sugi29/110-29.pdf) covers the issue.  Here is an extract (page 8): 

Versioning history (Type Two style) will always require at least a single updated date for the record, and two valid from / valid to dates if using a normalized data model. You will also require two dates in a star schema if past point in-time history queries are to be easily run in a single query.

Business history may also dictate a need for effective from / effective to dates when these may differ from the data warehouse versioning dates. This is especially true in certain sectors, like insurance, where value of business is counted over a period rather than a single time. It is also common when such changes are ‘forward-dated’ in operational systems.

So – enough of the background – what on earth is “Bitemporal Historisation” anyway?

Bitemporal Historisation – Overview

Once you ‘get it’, the approach is conceptually very simple. There are essentially just TWO datetime pairs to consider:

1 – Transaction datetimes. These from/to datetimes show when the warehouse table is populated. They effectively constitute a ‘version number’ for the data. If we want the latest ‘version’ of data, we query using a ‘high datetime’. If we want the version of data which existed yesterday, we query using yesterday’s date. This datetime-pair provides full auditability of results. Note that 99.999% of the time we will always query using the high datetime (current version, latest transactions). This is a standard SCD type 2 loading process. EVERY table must have these datetimes.  The rest of this document will use the term ‘Transaction’ datetimes, to denote when the record was actually transacted, or committed, to the database.

2 – Business datetimes. These from/to datetimes show the period to which the data actually relates. NOT every table will have these datetimes – for many queries we are happy to use the current version of, say, a mapping table, even when producing results for historical periods. The rest of this document will use the term ‘Business’ dates.

Bitemporal Historisation in Detail

The concept of bitemporal historisation relates to the approach of storing both both business (real world) history alongside transaction (version) history in the same table.  This approach aims to achieve the following goals:

  • Queries always produce the same result, even if the data changes
  • Data changes can be audited. Each change is traceable.
  • Existing queries can be easily adapted by adding temporal ranges to the WHERE clause.
  • Programming overhead is reasonable.
  • Historisation rules can be understood by business users.

The goal of query repeatability is particularly important in regulated environments.  In order to repeat results reliably, two time points must always be included in the query.  An example might be:

Which motorcyle coverage did Miss Careful have on April 1st, 2020 as we knew it on April 3rd, 2020?”

– The first date pair represents a business coverage period. This implies that each coverage must have a BusinessFrom and BusinessTo datetime to show when the coverage starts and finishes.

– The second date represents the point in time at which we made (or would have made) the query.  Being a snapshot of the database contents, this is termed the “transaction” date. The rest of this document will use TransactionFrom and TransactionTo for column names.

The above query translated into Bitemporal format might look like this:

SELECT coverage
FROM customer_coverage_table AS c
WHERE c.Contact_LName = 'Fudd'
AND (c.BusinessFrom le '2020-04-01:00:00:00'dt lt c.BusinessTo) 
AND (c.TransactionFrom le '2020-04-03:00:00:00'dt lt c.TransactionTo);

Why aren’t we using BETWEEN?   Because between is evil!

Bitemporal Prerequisites and Implications

Implementing a bitemporal approach requires a few principles to be adopted.

Records are Never Modified

With the exception of the TransactionTo datetime field (and maybe the PROCESSED_DTTM in the DDS model),  once loaded, a record must never be modified (or deleted).  This would violate the objective of query repeatability.

Matching Close / Open Dates

When a transaction is closed out and re-opened, or if business values are changing over time, the closing datetime must equal the opening datetime.  This is to prevent the “temporal gap” that can happen when you close out a record at, say, 23:59:59 and re-open it at 00:00:00.  What happens if you query at “23:59:59.5” ?  The data has disappeared!!

Note – not all ETL tools have this capability.  It’s common for an SCD2 load to add a second, or a day, when opening new records.

Business / Transaction FROM must be less than the TO value

Leading on from the previous point, FROM and TO dates cannot be equal, and it also follows that queries should be formed as follows:

SELECT * 
FROM sometable as t
WHERE t.pk = 'some key value'
AND (t.BusinessFrom le &BUSFROM lt t.BusinessTo) 
AND (t.TransactionFrom le &TXFROM lt c.TransactionTo);

The above query would always return either 0 or 1 records.  It’s imperative that there can only be a single record for a particular key value at a particular point in transaction + business time.

Simple Bitemporal Examples

Looking at the following (dummy) hierarchy, imagine we first loaded a table on 01JAN2019.

bitemporal

In the first case, consider the ERROR in the country code for XYZ Capital. This was spotted on 8th Feb 2019. The table is updated as follows:

In the second case, lets consider a business change in NAME from “Crypto Fund” to “Doge GmbH”. The need for this change was raised by the actuaries and performed by the IT team on 4th July 2020. However the actual (legal) change in name occurred on 20th April 2020. The data is updated as follows:

In this manner, the previous results can always be reproduced (audited), and an “up to date” version of past periods can also be generated.

Complex Bitemporal Example

It can be seen that iterative insertions in the bitemporal model are fairly straightforward, but how will it deal with historical restatements?

It is noted that ALL historical restatements deal with the scenario of overlapping ranges. The most complex of these is the situation below:

bitmporal overlapping

The solution is simply to remove the overlap and create three new records:

bitemporal ranges

Lets see how this would apply to our data. It is decided by the new CFO on 6th August to temporarily rename “Trust Us Provincial” to “So Very Solvent SA” for the IFRS17 year end results.  Who are we to argue!

The table is dutifully updated as follows:

We simply query the table (for the natural key “3”) where TechnicalTo equals high date. This gave us 1 record. The new record was inserted ‘from’ 31DEC2019. It applied ‘To’ 01JAN2020.  There are now 3 records of business history for the current transaction version of that natural key entry.

Summary

Bi-temporal historisation can solve  many date stamping woes and allow safe modifications to business history without affecting auditability (reproducability) of results.  This is far more efficient than taking snapshots of the database, and far easier to work with.

SAS does not ship with a Bitemporal transform, however – Data Controller does.  It also provides full Data Lineage (forwards & reverse, table & column level, including business logic applied).  DDS features such as retained keys, PROCESSED_DTTM columns, and of course – SCD2 is also supported.  A Data Catalog, Data Dictionary, and DDL generator are also included.  We’d love to assist you in your Data Warehousing project – feel free to get in touch.