Hello friends, today we are going to learn about slowly changing dimension transformation in SSIS, which is also known as SCD (abbreviated name). Lets take an example to implement SCD in SSIS package.
To implement SCD, we will create two tables Dept and DimDept using below SQL statement.
Create table Dept(DeptId int,DeptName varchar(50),DeptLocation varchar(50))
insert into Dept values(101,’IT’,’Ground Floor’)
insert into Dept values(102,’Finance’,’First Floor’)
insert into Dept values(103,’Admin’,’Second Floor’)
insert into Dept values(104,’Marketing’,’Third Floor’)
insert into Dept values(105,’Sales’,’Fourth Floor’)
Create Dimension table using below SQL Script:
Create table DimDept(DeptId int,DeptName varchar(50),DeptLocation varchar(50),Status bit)
First of all we will take one data flow task in control flow region as shown below.
Now we will double click on the data flow task that will open the data flow region and add one “OLEDB Source” as shown below.
We will right click on “OLEDB Source” and click on “Edit” menu to configure the connection manager and select “Dept” table as shown below.
We will click on “OK” button to save the settings that action will close the “OLEDB Source editor” screen.Now lets add SCD transformation task in the data flow region and connect it with “OLEDB Source”(As shown below).
We will right click on SCD transformation task and click on “Edit” menu that will open Slowly Changing Dimension wizard screen(As shown below).
We will click on “Next” button that will transform the above screen into a new screen(As shown below).
In the above screen, we will select dimension table “DimDept” and we will declare “DeptId” column as “Business Key” in key type column. Business key is responsible for joining two tables “DimDept” and “Dept” .We will click on “Next” button that will transform the above screen into a new screen(As shown below).
Now we have to specify the columns for those we want to track the changes and replicate those changes to the dimension table. I have selected the two columns(other than business key column) which are “DeptLocation” and “DeptName”. In change type field, we need to specify the SCD attribute. There are three types of attributes in SCD transformation which are following.
Fixed attribute: If we specify this attribute for a column then the value of that column should not change and changes will be treated as errors.
Changing attribute: This is type one change and if we specify this attribute for a column then change values will overwrite the existing values of the columns in dimension table.
Historical Attribute: This is type two change and if we specify this attribute for a column then change values of that column will be inserted as a new record and existing record will be marked as outdated or expired record.
In our sample package, we have specified one column with Changing attribute and other column with historical attribute.
Now we will click on “Next” button that will transform the above screen into a new screen(As shown below).
In the above screen there is a check box(encircled with red color), if we check this option, changes occurring on the “Dept” table will be replicated to dimension table for outdated records as well in “DimDept” table. We will click on “Next” button that will transform the above screen into a new screen(As shown below).
In above screen, there are two option buttons, I have selected the first option “Use a single column to show current and expired records” and specify the column “Status” in field “Column to indicate current record”. After that we will specify two more fields “Value when current” =”True” and “Expiration value”=”False”. It means in status column we will store the true or false values which means if value is true then that record is active and if value is false then that record would be expired or outdated. We will click on “Next” and current screen will be transformed into a new screen(As shown below).
We will uncheck the check box “Enable inferred member support” and click on “Next” button that will transform the current screen into a new screen(As shown below).We will explore this option in forthcoming articles.
We will click on “Finish” button that will close the above screen and add few other tasks into the package automatically(As shown below).
Now we are done with our package, lets run it by pressing “F5” key and see the results.
Green color of all the tasks are indicating that our package ran successfully. Lets check the “DimDept” table, which was empty earlier.
We can see the Dimension table has got all five records which were in the “Dept” table with status indicator “1” which means all records are active records. Lets make a change in “DeptName” column where DeptId=5 using below SQL script and run the package one again.
Update Dept set deptname=’HR’ where Deptid=105
Lets run the package and check the “DimDept” table.
We can see in the above screenshot that we have 2 records for deptId=105 , one with status=1 and other with status=0 which means the record which has status=1 is the active record.
Limitation of SCD:
- This transformation is quite slow as there is no caching of the lookup data.
- SCD can be used for only SQL Server data source.
- It uses OLEDB command for row by row updates instead of batch update.
I hope you will find this article useful. In my next article we will use the two column concept where we can use two column instead of a single column to specify if the record is active or outdated.
967 total views, 2 views today