Home > SQL Server > SQL Server Data Sampling or Limiting Result-set

SQL Server Data Sampling or Limiting Result-set


We may need a limited result set of a huge dataset .  Because, we may need to apply some statistical model with different sample data to check the output in various conditions or simply for testing the application or may be for other purpose. In these scenarios, we need to use the sampling commands available in SQL Server. Eventually, It will save some  time and resources.

Below given methods can be used to fetch limited number of rows from a result set.

In this blog post we are going to learn TableSample clause.

TABLESAMPLE

Table sample clause limits the number of rows returned from a table. We can specify the sampling data in row count or in percentage value. So, it will return only the specified number of rows instead all records.

But, we cannot use this with the following DML and DDL statements.

DML statements- Derived tables, Tables from linked servers, and Tables derived from table-valued functions, Rowset functions and OPENXML.

DDL statements – View and  Inline table-valued function.

 Syntax

TABLESAMPLE [SYSTEM] (sample_number [ PERCENT | ROWS ] ) [ REPEATABLE (repeat_seed) ]

SYSTEM is an ANSI SQL implementation and it is the only sampling method available in SQL Server. TABLESAMPLE SYSTEM returns an approximate percentage of rows and generates a random value for each physical 8-KB page in the table. It returns all the rows on approximately based on that calculation.

The REPEATABLE option causes a selected sample to be returned again with the given seed value. It returns the same output  as long as the table has no changes affect by DML commands or table maintenance operation (such as index rebuild, backup and etc).

Examples

Database used for the example is AdventureWorks2012

1. In this example SQL query we get a sample of  x percent in n rows. I am going to execute the following statements twice.

SELECT FirstName, LastName FROM Person.Person TABLESAMPLE (10 PERCENT) ;

Output: No. of Rows after execution 1 and 2 are 1550 and 1755 respectively.

2. In this example SQL query we are going to use option repeatable.

SELECT FirstName, LastName FROM Person.Person TABLESAMPLE (10 PERCENT)    REPEATABLE (205) ;

3. In this example we are going to use rows option.

SELECT FirstName, LastName FROM Person.Person TABLESAMPLE (100 ROWS) ;

4. In this method we are going to use NewID() guid generating function to get randomized rows.  But, it consumes some time for processing. So, use it with caution.

Select top 5 percent * from Person.Person order by newid()

There are other ways to work with large table using RAND(), newid() and Binary_Checksum() functions. Please read this article for more information.

Advertisements
Categories: SQL Server Tags: ,
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: