Get N sample rows from a huge dataset
In my previous article I have discussed about Percentage Sampling Task. But, there is one one task similar to that in Integration Services. This task fetches only N sample rows from the given dataset.
Row Sampling Transformation
Row Sampling task can be used in a situation wherein you need only N random rows from the huge dataset. The special advantage is that, we can use random seed value for random row selection. Of course, we can do the sample with Percentage Sampling task.
But, Row Sampling allows developer to choose only required column(s) from the input dataset.
Assume that I am running a business. I have a marketing strategy to choose 10 customers every month from special category for lucky draw.
Step 1: Create a package and add data flow task.
Step 2: Add a ADO.NET data source and connect with Adventure Works database table Sales.Customer.
Step3: Add Row sampling and connect data source task with it.
Step 4: Open Row Sampling editor and check random seed option and enter 2 in the seed value text box.
Step 5: Add a Union all task and connect Row sampling task’s selected rows output pipeline with it. To view the sample dataset during package execution, Add Data viewer component.
Step 6: Execute package.
Thanks for reading.