The Business Intelligence Blog

November 4, 2009

Slowly Changing Dimensions in Data Warehouse


Some basic definitions
Natural key – Natural keys are identifiers carried over from source systems. Natural keys are also called as business Keys or source system keys.
Surrogate key – A surrogate key is a unique value, usually an integer, assigned to each row in the dimension table in the Data Warehouse and is the primary key of the dimension table.

Now, let’s get into the meet and potatoes of Slowly Changing Dimensions.

Dimensions with attributes that change slowly over time are called Slowly Changing Dimensions(SCD). Many a times, an attribute changes and the business should decide whether or not it wants to maintain history. Generally, there are three techniques to deal with Slowly Changing Dimensions based on the decision the business makes.

Type1: This technique is used to deal with attribute changes the business doesn’t care about. In this technique, the old attribute value in the dimension row is simply overwritten, replacing it with the current value. In doing so, the attribute always reflects the most recent value. This technique is easy to implement, but it destroys history.

Type2: This technique is used to deal with attribute changes the business wants to track. Type 2 is implemented by adding a new row to the dimension table; this row will have the same business key as the old row; however an artificial key called the surrogate key will be used to uniquely identify each record. The ETL process uses each business key in the fact table to look up the current associated surrogate key for every dimension table linked to the fact table. The natural key will act like glue that holds the multiple type 2 records for a single attribute together. The benefit of this technique is that it will preserver history, but will be little difficult to maintain or implement. Also, if several Type II changes take place, there will be several rows each with different surrogate key but with the same business key.

Type3: This is another technique (though less common because it involves changing the table structure) that is used to deal with attribute changes the business wants to track. This technique keeps separate columns for both the old and new attribute values.

I am going to talk about (or rather write aboutJ) Type2 and Type 1 implementation here using the Slowly Changing Dimension Wizard available in Microsoft SQL Server Integration Services. To start off, let’s assume that we have a table called Employee as shown below in our OLTP system and we want to load this table into our Data Warehouse. Please note that the Natural Key or Source system Key in this table is EmployeeNumber

untitled1

Let’s create an empty table called DimEmployee in our Data Warehouse to load the data from the OLTP system. Please see below for the table structure.

untitled2

This table has almost the same structure as that of the Employee table in the OLTP system. The StartDate, EndDate columns are used to identify current and expired records and the EmployeeKey column is used as a Surrogate key. Usually, the Surrogate key is an integer, identity column. Please see the screenshot above for more details on how to set the identity property.

Now let’s start creating a new SSIS package to load the data from Employee table into DimEmployee table, implementing DimEmployee as a Slowly Changing Dimension.
1. Drag a Data Flow Taskonto the Control Flow pane as shown in the figure below.

untitled3

2. Double-clicking the Data Flow Task> will open Data Flow tab.

3. Drag OLE DB Source, double-click to edit/configure the properties such as connection to the source table (in this case Employee table) as shown below.  Click ok to close the dialog box after filling in the required fields.

untitled5

4. Drag Slowly Changing Dimension and connect the OLE DB Source to Slowly Changing Dimension as shown below.

untitled6

5. Double-click Slowly Changing Dimension& to launch the Slowly Changing Dimension Wizard as shown below. In the Connection manager drop down, select the connection to the Data Warehouse table (in this case DimEmployee table). In the able or view drop down filed, select DimEmployee table. Map the columns in the source Employee table to columns in the Data Warehouse DimEmployee table. In the Key Typefor EmployeeNumber, selectBusiness Key. As mentioned earlier, StartDate and EndDate are used to identify current and expired records. Your screen should be as shown below.

untitled7

6. Click next; the <b>Slowly Changing Dimension Columns</b> dialog appears as shown below. For Type1 columns, select the Change Type as Changing Attribute, for Type2 columns, select the Change Type as Historical Attribute, for columns whose value should not change, select Change Type as Fixed Attribute.

untitled8

Now who will decide which attribute is Type1 and which is Type2? The Business, I mean the stake holders decide which columns are of which type.

7. Click next to see the Fixed and Changing Attribute Options dialog as shown below as select the appropriate options.

untitled9

8. Click next; Historical Attribute Options dialog box appears which will let you record historical attributes. There are two ways to do this, I prefer to use StartDate and EndDate as shown below. For an active attribute record, EndDate will be NULL and for expired attribute record, the EndDate will show the date of expiry.

untitled10

9. Now before we get to the next dialog box, lets discuss what Inferred Members are.

Many a times when loading fact data you may find that for a fact record, the corresponding dimension record is not available. To get around this, a new row is inserted into the dimension table with the surrogate key and natural key as a placeholder record. Slowly changing Dimension Wizard refers to this placeholder record as Inferred Member. Inferred Members are also known as late-arriving dimension rows.

Example: At my current work place, we process our facts everyday and dimensions once a week . Now guess what will happen when a new employee joins and starts working from the next day? Since we do our time entry on a daily basis, this new employee’s hours will show up in the fact table(WIP table) but the corresponding dimension record(employee name etc) will be missing.

To summarize, Inferred Members are

(i) added during fact processing when the business key from the fact source doesn’t have a match in the dimension table

(ii) updated during dimension processing when the dimension record comes in from the source

Click next; the Inferred Dimension Members dialog appears.

untitled11

If Enable inferred member support is checked, then you need to tell SCD how you want it to know whether a record is an inferred member. There are two options; the first one is to leave all the columns in the dimension table as NULL and the second one is to use an Inferred Member Boolean column in the table to indicate whether the current record is an inferred member. The recommended option is to define an inferred member column in the table and handle the identification by setting the Boolean value to True or False.

10. Click next to finish the wizard and then click finish.

untitled12

11. Your screen should be similar to the one shown below.
untitled13

12. Execute the package as shown below.

untitled14

13. You should now see all the records in DimEmployee table in the Data Warehouse.

14.  Let’s make some changes (updates) to the Employee table in the OLTP system and see how SSIS handles the changes. We will make changes to both Type1 and Type2 attributes as shown below. Execute the package again.

untitled15

15. Now let’s check these changes in the DimEmployeetable in our Data Warehouse.

untitled16

i. Gary Miler moved to ERP from BI – changed PracticeArea (Type2 change)
As shown in the screen shot above, a new record with a different surrogate key has been added to the table. The two records however, have same business key (or natural key) which holds the two records for that attribute together. The EndDate can be used to distinguish the active record from the expired record – EndDate is NULL for active record.

ii. William Campbell moved to Auditing from Consulting – changed LineOfService (Type1 change)
As expected, the old attribute value (i.e. consulting) has been replaced with new attribute value (i.e. auditing).

Hope this is helpful.

Blog at WordPress.com.