Home > SSIS > Get N sample rows from a huge dataset

Get N sample rows from a huge dataset


In my previous article I have discussed about Percentage Sampling Task. But, there is one one task similar to that in Integration Services. This task fetches only N sample rows from the given dataset.

Row Sampling Transformation

Row Sampling task can be used in a situation wherein you need only N random rows from the huge dataset. The special advantage is that, we can use random seed value for random row selection. Of course, we can do the sample with Percentage Sampling task.

But, Row Sampling allows developer to choose only required column(s) from the input dataset.

Scenario

Assume that I am running a business. I have a marketing strategy to choose 10 customers every month from special category for lucky draw.

Implementation

Step 1: Create a package and add data flow task.

Step 2: Add  a ADO.NET data source and connect with Adventure Works database table Sales.Customer.

Step3: Add Row sampling and connect data source task with it.

Step 4: Open Row Sampling editor and check random seed option and enter 2 in the seed value text box.

Step 5: Add a Union all task and connect Row sampling task’s selected rows output pipeline with it. To view the sample dataset during package execution, Add Data viewer component.

Step 6: Execute package.

MSDN article

Thanks for reading.

Advertisements
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: