Home > SSIS > Data Profiling Task – SSIS

Data Profiling Task – SSIS


Data Cleaning

The data volume is growing day by day from Byte to Zebibyte. But, how many of us are really thinking about the data quality. Enterprises need quality in their data to make correct decision. Data management team keeps fighting with data quality in an enterprise. So, Data governance has been introduce to maintain data quality and with other features.

Data Profiling Task

Data profiling task helps data management team by providing various data profile details to maintain data quality.

Data profile task gives 8 data profiles. They are

Single column analysis

1. Column Length Distribution Profile – Columns of char type

It reports all the distinct lengths of string values in the selected column and the percentage of rows in the table that each length represents.

2. Column Null Ratio Profile – All columns except binary, xml, text and user defined types

It reports the percentage of null values in the selected columns.

3. Column Pattern Profile – Columns of char type

It reports a set of regular expressions that cover the specified percentage of values in a string column.

4. Column Statistics Profile – Columns of numeric and Date Time type

It reports statistics, such as minimum, maximum, average, and standard deviation for numeric columns, and minimum and maximum for datetime columns.

5. Column Value Distribution Profile

Multi-Column analysis

6. Candidate Key Profile – Columns of integer type, char type, and datetime type

It reports whether a column or set of columns is a key, or an approximate key, for the selected table.

7. Functional Dependency Profile – Columns of integer type, char type, and datetime type

It reports the extent to which the values in one column (the dependent column) depend on the values in another column or set of columns (the determinant column).

8. Value Inclusion Profile – Columns of integer type, char type, and datetime type

It computes the overlap in the values between two columns or sets of columns. This profile can determine whether a column or set of columns is appropriate to serve as a foreign key between the selected tables.

Data profiling task offers two types of profile configuration a quick profiling with single table and manual configuration with multiple tables.

Implementation

Step 1: Create a package and Add file connection manager with create XML file option and an ADO.NET connection with a database for data profiling.

Step 2: Add Data Profiling Task to the package and open task editor.

Step 3: Configure destination file path in the Destination property.

Note: if you like to reuse this XML file in another execution then change the OverwriteDestination property to true.

Step 4: Click Quick Profile button.

Configure Database connection, table and data profile options in this form for Data profiling. Click ok to complete this task.

Step 5: Now, you can see all selected profile types in the table. You can also select any type and change the property before data profile execution.

For instance, You can configure each data profile type to compute various profile information on different tables.

Note: This example goes with default selection. No change is required

Step 6: Now, Click ok and execute the task or package.

Step 7: To view data profile XML file, we should open the XML file in Data Profile Viewer utility.

(Edited on 10/16/2011)

This output XML file can be queried using programming or scripting language to add conditional flow in the SSIS package. You can download XSD for the Data Profiling output XML file here.

for instance, Statistic data such as mean, median output of a table can be a conditional value for the following package execution. So, use column statistics profile to achieve this task.

Thanks for reading.

Advertisements
  1. No comments yet.
  1. October 21, 2011 at 11:23 pm

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: