Siemens Healthineers Smart Data Catalog and Data Controller

Data Controller was implemented at Siemens Healthineers to facilitate their SAS-Powered Smart Data Catalog and integrate with Data Lineage reporting.  We are grateful to Helen Stark (Power User) and Hans-Juergen Kopperger (SAS App Developer) for sharing their “before and after” experiences.

The previous article in this series is available here.


Helen Stark

Hi Helen, it’s great to meet with you today. Can you tell us – what is your role within the business? 

I am the portfolio manager. What I do is a lot of demand management. So when people have new requests – like they want new data in our data lake, or they want it structured, or they want it visualized – then I manage all that. For the Americas region, for North and South America. And I also do road shows, like a marketing kind of aspect, for the data scientists. And I do a smidge of project management here and there. 

Interesting. And what do you actually use Data Controller for? 

I use it to keep our Smart Data Catalog up to date. One of the things that I do is we have marketing posters, and so I put the links to those marketing posters in there so they are a part of our front end web design. And I also do the marketing videos, so adding those links. So really it’s just adding and deleting entries in Data Controller, so that our Smart Data Catalog is updated at all times. 

And that Smart Data Catalog is in SAS. Is there a reason it’s in SAS, and not say – Excel? Is it used by other parts of the business? 

Yes, it’s used by the entire business. 

I see. And what is the Smart Data Catalog? 

It’s a listing of all of the offerings that data governance has. So, I say data governance but it’s data governance and analytical services. So, we are the data owners for all of the data in the company. It’s the single source of truth for all Siemens Healthineers business data. And we use Data Controller to manage that. And Data Controller does some incredible things that I do not understand – such as being able to stage and preview the data before it’s made publicly available. I mean, I honestly don’t understand it, but it does some miraculous things. 

Nice feedback! Next question. How does Data Controller make your life easier? 

Because it’s so easy to use. Before, we used an Excel spreadsheet, and it was quite unwieldy and bulky, and it was so easy to make mistakes, just trying to remember where you were. With Data Controller I love that I can filter first, get to exactly where I want to be, and then edit. So it really lessens the chance of me making a mistake. I love that I am an editor, not an approver – so I make an edit, but then it goes to an approver. It’s like the four eyes principle, something we did not have before. You can track changes, which is amazing. Yeah, it made everything easier, so whereas before I would dread updating the Smart Data Catalog, now you go in and it’s done in like 3 to 5 minutes. 

Superb. Ok, before Data Controller came along, how did you get data into SAS? 

So again, it was all built on this spreadsheet. You would have to check it out, so there was some control over it. If somebody checked it out then nobody else could go in and make any changes, and you would have to wait for it to be checked back in, and then run a macro or some tool, and then you could upload it and update your smart data search. It was a process that depended on how careful your co-workers were about remembering to check it back in. In short, it used to often take days to update the SAS environment, and now it takes minutes. 

Wow – that’s great to hear! Ok, next up. What are your favourite Data Controller features, and why? 

The filter. It will always be my favourite; it will never change. That’s really helpful. I just like how it makes it so much harder to make a mistake – it’s less prone to human error. And the copy and paste is so easy. Yeah, there is nothing about it that I don’t love. 

Brilliant, I think that’s the best feedback we’ve ever had. 

Really, I mean it has made our lives so much easier, so much faster. Um yeah, I just love it. 

Thank you so much! 


Hans-Juergen Kopperger

Good morning, Hans-Juergen!  Can you tell us a little about your role within the business? 

Yes, I am working as a Data Integration Manager at Siemens Healthineers, providing BI and Analytical Services for our colleagues. My department is the DGA – Data Governance and Analytical Services.  At Siemens Healthineers we are analyzing “Big Data” from our systems – AT (Angiography and Therapy), CT (Computed Tomography), MR (Magnetic Resonance Tomography), and LD (Laboratory Diagnostics). 

As a consequence of our business strategy “from Onsite to Online”, our focus is to connect more and more systems to our BI backend.  With new services like “Condition-based Maintenance” or “Predictive Analysis”, we can now generate data-driven services that increase business value for our customers and even decrease our overall service costs. 

Our BI platform is based on the SAS technology stack.  To get our Business Analysts and Data Scientists closer to our Data Lake, I created a Smart Data Catalog, which is an interactive web application with a “Google-style” search facility. Now they can do their jobs more effectively without struggling to find and access accurate, complete, and trustworthy data. As a result, they spend less time searching for data and can actually focus on using data to generate analyses and impactful insights. 

Now that’s valuable!  And what do you use Data Controller for? 

To increase Data Quality while uploading backend data into our BI platform.  Data Controller provides “data version control” and full traceability of changes.  We have several control tables that provide data for web applications like the “Smart Data Catalog”, and with Data Controller it’s now an easy, controllable and manageable process to get these changes into the backend tables. 

In the past we had a custom Stored Process web app for uploading Excel files, based on an Excel template.  This process had negative consequences for Data Quality, because it often happened that many different versions of the Excel template were created, and in the end we didn’t know which was the latest version or which values we wanted to upload into the system. It could take a lot of time to clarify who had done which upload. 

I would often receive support tickets in relation to this upload, often caused by the diversity of our Excel templates and uncertainty over which was the right one. So, we would have a lot of discussions about how to bring data into the backend in a controlled manner. 

Then one day, I learned through SAS User Group Germany that you provide a solution with Data Controller. I was initially interested in the Data Lineage functionality, but then I understood the main concept behind Data Controller. And for me the main benefit is that I can save a lot of time – with out-of-the-box features like the web data editor, and the web upload facility with Excel spreadsheet drag and drop.  And there is the automatic workflow behind it, with the mandatory approval step.  Since we implemented Data Controller, we no longer get those support tickets. 

Fantastic.  If you had to pick your top features, what would they be? 

The main benefit is getting data controlled, and into the backend. The controlled process, and the approval process – those are the main benefits. But we’ve got other benefits. For instance, we have Data Lineage now.  The Data Lineage diagrams can also be linked directly from our Smart Data Catalog using URL parameters.  This easy integration means our users can open the relevant page in Data Controller with one click. 

The transparency of the history page is another benefit. I can look at every submit or approval request – what changes have been applied, what changes have been submitted, and what changes have been approved. This helps us a lot to achieve data transparency. 

The email alerts are a great feature.  For the communication of changes, we had previously created a Teams collaboration chat, e.g. if someone made a change and needed to request an approval.  But with email alerts, the notification of changes is now automatically sent to the responsible data owner, who can immediately click the email link and do his approval. This speeds up the whole process. 

Another advantage is the “database approach” for updates. So, while someone is changing one row in a table connected to his use case, another person can change other rows of the same table, nearly simultaneously. Because not everyone is changing the same rows – everyone has their own subset of rows, their own “workspace” within one table. In the past we would have one Excel template, and this would always override all values. We would have a lot of Excel templates going around our colleagues, so there were always conflicts of overrides and versioning, and stuff like that. With Data Controller, it’s now a simple, easy and transparent data capture process. 

Thank you very much! 


Smart Data Catalog

Sarbanes-Oxley and Data Controller for SAS®

The Sarbanes-Oxley Act of 2002 has applied to all publicly-traded companies doing business in the US since 2006.  The penalties can be severe – if Uncle Sam considers a corporate officer to have wilfully certified an inaccurate report, the fine is up to $5 million, with up to twenty years in prison for the individual(s).  Even a “knowing” mis-certification (or non-submission) carries up to $1 million and 10 years in prison.

There are many aspects to full Sarbanes-Oxley (SOX) compliance – the legislation runs to over 60 pages.  As with other regulatory obligations, the goal is to regularly provide enough evidence to satisfy the auditor that the requirements have been met.  As anyone running a compliance team knows, this is no small endeavour.  The ability to automate the generation of such evidence, or make it available automatically to auditors, can result in significant cost savings.

This article breaks down the areas where Data Controller can contribute to satisfying the requirements of the Sarbanes-Oxley Act.

Sarbanes-Oxley Act Section 404 – MANAGEMENT ASSESSMENT OF INTERNAL CONTROLS.

Data Controller facilitates internal controls through a “4 eyes” review & approve mechanism for data changes. This, combined with data validation and an integrated workflow feature, provides a mechanism to easily track and report on the number of internal controls (quality rules, signoffs, rejections), the frequency with which they are applied, which data items the controls relate to, and who is performing them.  Such metrics can be compared and contrasted with pre-existing and current quality measures to help determine control effectiveness.  Variations in the number of submit / approve cycles between reporting teams also provide objective and repeatable measurements to support the assessment of the effectiveness of internal controls.


Sec 404. (Sarbanes-Oxley)


Section 404 is widely considered the most onerous part of Sarbanes-Oxley, as the documentation and testing of all the controls requires significant time and effort.  To address this, the Public Company Accounting Oversight Board (PCAOB – a US non-profit created by the Sarbanes-Oxley Act itself) released additional guidance to assist management and auditors in producing their reports.  This is officially labeled “Auditing Standard No. 5 – An Audit of Internal Control Over Financial Reporting That Is Integrated with An Audit of Financial Statements”.

The guidance in this standard highlights a few points that are pertinent to users of Data Controller.

PCAOB AS5 Sec24 – Controls Over Management Override

Management Overrides (the freedom to simply “replace” reporting figures based on, presumably, sound judgement) are entity-level controls that can be easily captured (in a centralised manner) by Data Controller.  This, in fact, is the “core functionality” of the tool.  Data Stewards / Data Processors (Editors) make the change, then one or more Data Owners / Data Controllers (Approvers) sign it off before it is applied to the target table.  A copy of the original Excel file (if used) and a record of who made the change, when, what the change was, and why (if a reason is provided) is retained.  Data Validation rules can also be defined to ensure that inputs fit the desired pattern(s).


For fun, we made a short video for this part:


PCAOB AS5 Sec27 – Identifying Entity-Level Controls


In the area of documenting the inputs, transformations and outputs of data flows within an organisation, SAS particularly shines, especially in the version 9 world.  The table and column level lineage generated by SAS Data Integration provides a highly detailed view of the data lineage.  Below is an example of table level lineage, which colour codes each table according to its library and captures the detail of each SAS job along the way.  Clicking on a job will open the job in the metadata viewer.  Clicking the table will open the table in VIEW mode.  The lineage is shown all the way from source to target(s), or target to source(s), and can be exported in PNG, SVG, or CSV format.


SAS Table Level Lineage

Below is an example of column level lineage.  Like table level lineage, this can be performed forwards or backwards and exported in multiple formats.  Each arrow represents a SAS transform.  Where business logic is applied, this is additionally extracted and shown in red.


SAS Column Level Lineage


The ability to define additional data lineages, outside of SAS (eg between spreadsheets or other reporting systems) is in the product roadmap, along with lineage from SAS Viya.

PCAOB AS5 App B – Benchmarking of Automated Controls

The use of IT-secured financial controls can significantly reduce the cost of Sarbanes-Oxley compliance testing following the first year assessment, particularly where the source code is secured and cannot be modified by users.  The core programs (services) within the Data Controller application that perform data signoffs are mature, distinct and change tracked – so it is possible for Data Controller to be upgraded in-place without affecting the benchmarking strategy.  This contrasts with spreadsheet-based control mechanisms, which must be revalidated in each reporting period.


PCAOB Release 2007-005A, Appendix B

Sarbanes-Oxley Act Section 1102 – Tampering

Coming back to the original 2002 SOX Act, there is an additional stick being waved at those who destroy records.  This is, unfortunately, a common occurrence in data warehouse landscapes – poorly designed data models often result in frequent rebuilds of monthly datamarts when issues are found.

If your BI / ETL teams are routinely destroying / modifying database records as part of regular work efforts, you might wish to:  a) ensure there is a well documented ticketing system to make sure those individuals are protected from any accusations, or  b) implement a Bitemporal data model to ensure a full and transparent version history of data is always kept regardless of rebuilds.

IT-secured tools such as Data Controller enable auditors to see easily for themselves who has changed a record, when, why, and who signed it off – thereby vastly reducing the potential for unintentionally impeding an investigation.


SEC. 1102. (Sarbanes Oxley)

Sarbanes Oxley and SAS

We chose SAS as the platform on which to build Data Controller as it is very reliable, provides excellent support for data drivers (enabling our code to run inside almost any database), offers long-term customer support, and is very easy to deploy against.  The demo version of Data Controller can be deployed in under 30 seconds (on a SAS 9 platform).  With SAS there are no additional servers to provision, firewalls to configure, or scaling issues to address – everything works “out of the box”.  SAS also integrates nicely with existing enterprise authentication mechanisms such as LDAP, and the platform is typically fully secured under your existing IT policies at the backend.

Data Controller is built on SASjs and hence we have versions for both SAS 9 and Viya.  Do get in touch to learn more.

Data Controller – a BICC perspective

We caught up with Herbert Grossmann of DER Touristik to understand how Data Controller for SAS is used within the BICC and the types of challenges it solves. 

The previous article in this series can be found here.

Good day, Herby! Can you tell us about your role within DER Touristik?

Yes, I am working here as project manager for BI and Analytics and my department is the BICC (Business Intelligence Competence Centre), and we have an absolute focus on the SAS technology stack – so that’s my daily business.

Great. And, I understand you guys are using Data Controller for SAS. What do you use it for?

Well, mainly for managing control tables, of which we have a lot nowadays in the data warehouse. But we also implemented what we call an “early bird booking system”. There we have used the approval process within Data Controller, which is excellent, because users from business departments etc. can approve data that would normally only be accessible within the back-end warehouse itself. So now they have an interface which limits their access to specific views, and this is very useful – it was also highly commended by our management.

So, business users can approve modifications to secure warehouse tables without having direct write-access themselves?

Exactly.

Fantastic. Next question. How does having Data Controller make your life easier?

Well – there is the version control of course, which gives us much better traceability of changes, to see what was changed by whom, and at what time. And we have the immediate constraint checking, which is also very useful because some of the tables are sensitive to, let’s say, changes of the primary key. In the past, when we did it the “old fashioned way”, it was possible that someone could cause duplicate primary keys by mistake – that is now not possible anymore, which is very good. And like the example I mentioned before, we can now grant access to certain sensitive tables even for business users that would normally have no access – we can decide whether to give them at least the right to view these tables, or during special events to edit tables, or to approve edits of those tables. So this gives a lot of opportunities, and makes it much easier than it was in the past.

Nice! And so, talking about the past, before you had Data Controller, how did you manage modifications to data in SAS?

We classically used two approaches – on one hand using SAS Enterprise Guide to directly edit tables or do imports, such as imports of Excel sheets for example. On the other hand, we have some batch processes that also import Excel or CSV tables. So those were the classic and standard ways. And of course the batch one, especially, we are still using for some files, depending on the situation. But we do no editing of tables directly with Enterprise Guide anymore, because it is much safer and easier to use Data Controller.

Understood. So on the Data Controller side, what would you say were your favourite features and why?

I would say that I like the editor as a whole very much. I think it’s great that the moment you make a table editable, you can define the ways in which you would edit the tables – like whether there is some historic logging or not, and the fact that you can set constraints. And in the editor you then have a lot of Data Quality opportunities, such as defining drop-down lists for certain attributes, which really makes editing the tables easier and much more comfortable. It was a little bit of a pain in the past, but now it’s almost fun.

That’s great feedback! Is there anything else, any comments you would like to add?

Yes, I like the fact that Data Controller is really just a part of the SAS environment. It’s not a completely separate application that you have to install somewhere, but a kind of pluggable part of the SAS environment. I like that very much, because then you still have everything in your hands. I mean, I am not a developer, but my knowledge of SAS is already enough to be able to handle Data Controller as a whole – to even do the updates and/or to modify things. And it’s also easy to show others who have experience with SAS how the tool works and what is to be done when there are data issues. And yeah, I think that’s a big advantage.


Bitemporal Historisation and the SAS DDS

Does the SAS DDS support Bitemporal historisation?  Why yes indeed, with the right transform and extract logic, and updates to the primary keys.

The SAS Detailed Data Store (DDS) suite provides “out of the box” data models for industries such as Banking, Insurance and Telco.  They arrive with logical and physical models (eg in Erwin format), and a set of DDL files for your preferred database.  These assets help to accelerate the delivery and deployment of a company data warehouse, with the further advantage of standardised integration with SAS Solution offerings.  The model is also very large – it is highly likely that much of it will go unused, especially in industries like insurance, where corporate structures, regulatory environments, and product offerings can be quite diverse.

As a “baseline” model then, another way to utilise it is to take it as just that – a model that provides guidance on putting together a warehouse that suits you, the customer.  There is nothing to stop you picking & choosing the parts you like, the ones that make sense for your particular use case(s).

The Validity of SCD2

The historisation approach in the DDS is based on the “VALID_FROM_DTTM” and “VALID_TO_DTTM” columns.  These provide the open and close datetime pairs representing the ‘validity’ of the record.  This – is where the confusion begins.

What is ‘validity’?  Perhaps this represents the ‘truth’, eg the number of widgets we sold last month.  So what is this truth?  Is it the state of the database (transaction datetimes) between the 1st and the end of the month?  Or does it represent the current view of last month’s widget sales, which were finally loaded on, say, the 15th of the following month?

Dealing with this scenario (late arriving records) poses a number of challenges for the SCD2 model:

  • Reloading historical data (as records must be physically removed, in order to reload)
  • Loading corrections (again, records must first be removed)
  • Maintaining an audit history, or even the ability to run the same query twice and get the same result.
An emerging consensus from the data warehousing domain is the use of bitemporal datetime ranges for managing such requirements. The article below borrows heavily from this excellent paper by Arnd Wussing, which explains the topic in much greater detail.

Background to Bitemporal

The concept of Bi-Temporal Historisation is not new – it was originally associated with a chap called Richard Snodgrass back in 1992. There is further info on Wikipedia and this blog, and a more recent article on Medium.

Teradata has specifically implemented temporal features, which (interestingly) hold the datetime pairs in a single column (see attachment).   Note the nomenclature differences between the SAS DDS and Teradata (in the SAS DDS, VALID typically means Transaction Datetimes; in Teradata, VALID refers to Business Datetimes).


Furthermore, this SUGI paper (http://www2.sas.com/proceedings/sugi29/110-29.pdf) covers the issue.  Here is an extract (page 8): 

Versioning history (Type Two style) will always require at least a single updated date for the record, and two valid from / valid to dates if using a normalized data model. You will also require two dates in a star schema if past point in-time history queries are to be easily run in a single query.

Business history may also dictate a need for effective from / effective to dates when these may differ from the data warehouse versioning dates. This is especially true in certain sectors, like insurance, where value of business is counted over a period rather than a single time. It is also common when such changes are ‘forward-dated’ in operational systems.

So – enough of the background – what on earth is “Bitemporal Historisation” anyway?

Bitemporal Historisation – Overview

Once you ‘get it’, the approach is conceptually very simple. There are essentially just TWO datetime pairs to consider:

1 – Transaction datetimes. These from/to datetimes show when the warehouse table is populated. They effectively constitute a ‘version number’ for the data. If we want the latest ‘version’ of data, we query using a ‘high datetime’. If we want the version of data which existed yesterday, we query using yesterday’s date. This datetime-pair provides full auditability of results. Note that 99.999% of the time we will always query using the high datetime (current version, latest transactions). This is a standard SCD type 2 loading process. EVERY table must have these datetimes.  The rest of this document will use the term ‘Transaction’ datetimes, to denote when the record was actually transacted, or committed, to the database.

2 – Business datetimes. These from/to datetimes show the period to which the data actually relates. NOT every table will have these datetimes – for many queries we are happy to use the current version of, say, a mapping table, even when producing results for historical periods. The rest of this document will use the term ‘Business’ dates.
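To make the transaction datetime pair concrete, here is a minimal SAS sketch – the table name (example_table) and the ‘high’ datetime value are hypothetical, purely for illustration:

%let high_dttm='31DEC5999:00:00:00'dt;
proc sql;
/* current version: open records carry the 'high' transaction datetime */
select * from example_table
where TransactionTo = &high_dttm;

/* the version of the data as it stood at the start of yesterday */
%let asof=%sysfunc(intnx(dtday,%sysfunc(datetime()),-1));
select * from example_table
where TransactionFrom le &asof and TransactionTo gt &asof;
quit;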

Bitemporal Historisation in Detail

The concept of bitemporal historisation relates to the approach of storing business (real world) history alongside transaction (version) history in the same table.  This approach aims to achieve the following goals:

  • Queries always produce the same result, even if the data changes
  • Data changes can be audited. Each change is traceable.
  • Existing queries can be easily adapted by adding temporal ranges to the WHERE clause.
  • Programming overhead is reasonable.
  • Historisation rules can be understood by business users.

The goal of query repeatability is particularly important in regulated environments.  In order to repeat results reliably, two time points must always be included in the query.  An example might be:

“Which motorcycle coverage did Miss Careful have on April 1st, 2020, as we knew it on April 3rd, 2020?”

– The first date represents a point in the business coverage period. This implies that each coverage must have a BusinessFrom and BusinessTo datetime to show when the coverage starts and finishes. 

– The second date represents the point in time at which we made (or would have made) the query.  Being a snapshot of the database contents, this is termed the “transaction” date. The rest of this document will use TransactionFrom and TransactionTo for column names.

The above query translated into Bitemporal format might look like this:

SELECT coverage
FROM customer_coverage_table AS c
WHERE c.Contact_LName = 'Careful'
AND (c.BusinessFrom le '01APR2020:00:00:00'dt lt c.BusinessTo) 
AND (c.TransactionFrom le '03APR2020:00:00:00'dt lt c.TransactionTo);

Why aren’t we using BETWEEN?   Because between is evil!
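To illustrate with a hypothetical pair of queries: BETWEEN is inclusive at both ends, so with matching close / open datetimes a query at the boundary instant matches both the record that closes and the record that opens at that exact moment.  The closed-open form returns exactly one row:

proc sql;
/* closed-closed: BETWEEN matches BOTH the record closing at midnight
   AND the record opening at that same instant - a duplicate */
select * from sometable as t
where '01JAN2020:00:00:00'dt between t.BusinessFrom and t.BusinessTo;

/* closed-open: exactly one record matches at any given instant */
select * from sometable as t
where t.BusinessFrom le '01JAN2020:00:00:00'dt
and t.BusinessTo gt '01JAN2020:00:00:00'dt;
quit;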

Bitemporal Prerequisites and Implications

Implementing a bitemporal approach requires a few principles to be adopted.

Records are Never Modified

With the exception of the TransactionTo datetime field (and perhaps the PROCESSED_DTTM in the DDS model), once loaded, a record must never be modified (or deleted) – doing so would violate the objective of query repeatability.

Matching Close / Open Dates

When a transaction is closed out and re-opened, or if business values are changing over time, the closing datetime must equal the opening datetime.  This is to prevent the “temporal gap” that can happen when you close out a record at, say, 23:59:59 and re-open it at 00:00:00.  What happens if you query at “23:59:59.5”?  The data has disappeared! 

Note – not all ETL tools have this capability.  It’s common for an SCD2 load to add a second, or a day, when opening new records.
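A minimal sketch of a load step that honours this rule – the table name, key value and ‘high’ datetime below are illustrative assumptions:

%let high_dttm='31DEC5999:00:00:00'dt;
%let now=%sysfunc(datetime());
proc sql;
/* close the current version AT the load datetime */
update target_table
set TransactionTo=&now
where pk='some key value' and TransactionTo=&high_dttm;

/* open the replacement FROM exactly the same datetime - no gap */
insert into target_table
set pk='some key value', some_value='new value',
TransactionFrom=&now, TransactionTo=&high_dttm;
quit;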

Business / Transaction FROM must be less than the TO value

Leading on from the previous point, FROM and TO dates cannot be equal, and it also follows that queries should be formed as follows:

SELECT * 
FROM sometable as t
WHERE t.pk = 'some key value'
AND (t.BusinessFrom le &BUSFROM lt t.BusinessTo) 
AND (t.TransactionFrom le &TXFROM lt t.TransactionTo);

The above query would always return either 0 or 1 records.  It’s imperative that there can only be a single record for a particular key value at a particular point in transaction + business time.
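A simple integrity check for this rule might look as follows (same illustrative names as above) – any row returned indicates an overlap:

proc sql;
select pk, count(*) as hits
from sometable as t
where t.BusinessFrom le &BUSFROM and t.BusinessTo gt &BUSFROM
and t.TransactionFrom le &TXFROM and t.TransactionTo gt &TXFROM
group by pk
having count(*) > 1;
quit;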

Simple Bitemporal Examples

Looking at the following (dummy) hierarchy, imagine we first loaded a table on 01JAN2019.

(table: the dummy hierarchy as first loaded on 01JAN2019)

In the first case, consider the ERROR in the country code for XYZ Capital. This was spotted on 8th Feb 2019. The table is updated as follows:
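In SAS terms, the mechanics might look like this (the table name, column names and corrected country value are illustrative assumptions): the erroneous record is closed out in transaction time on the day the error was spotted, and a corrected record with the same business validity is inserted:

%let high_dttm='31DEC5999:00:00:00'dt;
proc sql;
/* retire the erroneous version in TRANSACTION time only */
update hierarchy_table
set TransactionTo='08FEB2019:00:00:00'dt
where name='XYZ Capital' and TransactionTo=&high_dttm;

/* insert the corrected record - business validity is unchanged */
insert into hierarchy_table
set name='XYZ Capital', country='US', /* the corrected (illustrative) code */
BusinessFrom='01JAN2019:00:00:00'dt, BusinessTo=&high_dttm,
TransactionFrom='08FEB2019:00:00:00'dt, TransactionTo=&high_dttm;
quit;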

In the second case, let’s consider a business change in NAME from “Crypto Fund” to “Doge GmbH”. The need for this change was raised by the actuaries and performed by the IT team on 4th July 2020. However, the actual (legal) change in name occurred on 20th April 2020. The data is updated as follows: 
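Again as a hedged sketch (the table name and business-from datetime are illustrative): the current record is retired in transaction time on 04JUL2020, and two records are inserted – the old name closing, and the new name opening, in business time on the legal change date of 20APR2020:

%let high_dttm='31DEC5999:00:00:00'dt;
proc sql;
/* retire the current version in TRANSACTION time */
update hierarchy_table
set TransactionTo='04JUL2020:00:00:00'dt
where name='Crypto Fund' and TransactionTo=&high_dttm;

insert into hierarchy_table
/* old name: business validity now closes on the legal change date */
set name='Crypto Fund',
BusinessFrom='01JAN2019:00:00:00'dt, BusinessTo='20APR2020:00:00:00'dt,
TransactionFrom='04JUL2020:00:00:00'dt, TransactionTo=&high_dttm
/* new name: opens in business time at exactly the same instant */
set name='Doge GmbH',
BusinessFrom='20APR2020:00:00:00'dt, BusinessTo=&high_dttm,
TransactionFrom='04JUL2020:00:00:00'dt, TransactionTo=&high_dttm;
quit;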

In this manner, the previous results can always be reproduced (audited), and an “up to date” version of past periods can also be generated.

Complex Bitemporal Example

It can be seen that iterative insertions in the bitemporal model are fairly straightforward, but how will it deal with historical restatements?

It is noted that ALL historical restatements deal with the scenario of overlapping ranges. The most complex of these is the situation below:

(diagram: a new business-time range overlapping several existing ranges)

The solution is simply to remove the overlap and create three new records:

(diagram: the overlap resolved into three new records)

Let’s see how this would apply to our data. It is decided by the new CFO on 6th August to temporarily rename “Trust Us Provincial” to “So Very Solvent SA” for the IFRS17 year end results.  Who are we to argue! 

The table is dutifully updated as follows:
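A hedged sketch of that restatement (assuming a natural key column ‘pk’, with the table name and datetimes not given above being illustrative): the single current business record for key 3 is retired in transaction time on 06AUG2020 and replaced by three records – before, during, and after the temporary rename:

%let high_dttm='31DEC5999:00:00:00'dt;
proc sql;
update hierarchy_table
set TransactionTo='06AUG2020:00:00:00'dt
where pk=3 and TransactionTo=&high_dttm;

insert into hierarchy_table
/* before the rename */
set pk=3, name='Trust Us Provincial',
BusinessFrom='01JAN2019:00:00:00'dt, BusinessTo='31DEC2019:00:00:00'dt,
TransactionFrom='06AUG2020:00:00:00'dt, TransactionTo=&high_dttm
/* the temporary rename, applying from 31DEC2019 to 01JAN2020 */
set pk=3, name='So Very Solvent SA',
BusinessFrom='31DEC2019:00:00:00'dt, BusinessTo='01JAN2020:00:00:00'dt,
TransactionFrom='06AUG2020:00:00:00'dt, TransactionTo=&high_dttm
/* after the rename */
set pk=3, name='Trust Us Provincial',
BusinessFrom='01JAN2020:00:00:00'dt, BusinessTo=&high_dttm,
TransactionFrom='06AUG2020:00:00:00'dt, TransactionTo=&high_dttm;
quit;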

We simply query the table (for the natural key “3”) where TransactionTo equals the high date. This gives us one record. The new record is inserted ‘from’ 31DEC2019 and applies ‘to’ 01JAN2020.  There are now 3 records of business history for the current transaction version of that natural key entry. 
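Expressed as a query (same illustrative names), the current business history for that key is:

%let high_dttm='31DEC5999:00:00:00'dt;
proc sql;
select * from hierarchy_table
where pk=3 and TransactionTo=&high_dttm
order by BusinessFrom;
quit;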

Summary

Bi-temporal historisation can solve many date stamping woes and allows safe modifications to business history without affecting the auditability (reproducibility) of results.  This is far more efficient than taking snapshots of the database, and far easier to work with.

SAS does not ship with a Bitemporal transform – however, Data Controller does.  It also provides full Data Lineage (forwards & reverse, table & column level, including business logic applied).  DDS features such as retained keys, PROCESSED_DTTM columns, and of course SCD2, are also supported.  A Data Catalog, Data Dictionary, and DDL generator are also included.  We’d love to assist you in your Data Warehousing project – feel free to get in touch.