Contact Us

Data compression in large tables in MS SQL server

Do you know that you can compress data while using MS SQL 2008 or any later version? Compression is available for now in SQL Server 2008 or 2012 Enterprise and Developer edition.

This type of compression reduces the size of the table by at least of 25% and more. It also depends on how big your table is.

When you deal with TB(s) of data and you have a RAID setup in your dedicated server for your data with 5 SCSI HD you may want to make sure that you are not running out of disk space. For best performance I still use SCSI HD with 15000 RPM.

Call us for Free Consultation for Remote DBA services at our SQL Consulting Firm at: 732-536-4765

Compression can allow more rows to be stored on a page, but does not change the maximum row size of a table or index.

MS SQL data compression provides another great benefit, since compressed data is stored in fewer pages and therefore, queries need to read fewer pages from the disk and thus, improves the performance of I/O intensive workloads. However, some additional CPU resources are required on the database server to compress and decompress the data, while data is exchanged within the application. That additional resource based on my experience is not that significant at all. I am using 2 CPUs in my server and it works just fine.

In my case since I work with large amount of data and use PAGE compression. Row Compression helps to store data more efficiently while using 4 bits per compressed column to store the length of the data in the column. NULL and 0 take no additional space

Page compression works better for me since it is performing Row Compression plus optimizes the storage of multiple rows in a page, and this way minimizes the data redundancy.

To compress indexes, you must explicitly set the compression property of the indexes. By default, the compression setting for indexes will set to NONE when the index is created.

On a PAGE compression, a compressed index which could be clustered or non- clustered, the leaf level pages are compressed as PAGE, but no leaf pages are ROW compressed.

The leaf level is the data of the table. It is composed of data pages. All columns of the tables are stored for every row of the table.

No Leaf level - are composed of index pages.

Let's run the below stored procedure to see what will be the savings when compressing a 44,058.422 table which is 44GB.

--Using the PAGE compression 
USE Users 
GO 
EXEC sp_Estimate_Data_Compression_Savings NULL, 'Users', NULL, NULL, 'PAGE' ; 
GO

See this link: for the this stored procedure named: sp_estimate_data_compression_savings (Transact-SQL)

The results shows 1 record only and the column named: Sample_size_with_requested_compression_settings(KB) has the value of 27816 which was a savings of 16242.422 KB. That is %36.87 and it was calculated in here: 1 - (27816/44058.422) * 100 = %36.87

Now let's create a compressed clustered index on the Users table and see the actual size of the table. Pay attention that I added the the DATA_COMPRESSION = PAGE at the end of the index script.

CREATE UNIQUE CLUSTERED INDEX Users_PK ON dbo.Users 
(
    Users_PK
)   
WITH (
    PAD_INDEX  = OFF, 
    STATISTICS_NORECOMPUTE  = OFF, 
    SORT_IN_TEMPDB = OFF, 
    IGNORE_DUP_KEY = OFF, 
    DROP_EXISTING = OFF,
    ONLINE = OFF, 
    ALLOW_ROW_LOCKS  = ON, 
    ALLOW_PAGE_LOCKS  = ON,
    DATA_COMPRESSION = PAGE)

In this case I have no indexes on the Users table and therefore creating the clustered index would be relatively faster since no other indexes should be rebuilt on that table. Creating the compressed clustered index will compress the table as well, at the same time.

To create the index on 44GB table and 240 million records took about 23.27 min and the index size was just 54.750MB. The actual amount of kb for the Users table was reduced to 29361.078. The savings was: 1 - (29361.078/44058.422 * 100) = %33.358. For me it is a major savings also in disk space.

We can also use the below command in MS SQL 2008 or above to enable PAGE compression while trying to re build the HEAP (When the table does not have a clustered index) and requires all non-clustered indexes on the table to be rebuilt so that they have pointers to the new row locations in the heap.

ALTER TABLE Users 
REBUILD WITH (Data_Compression = PAGE)

For more info about the MS SQL data compression features see this subject SQL: Server Books Online

Conclusions:
-----------------

Using the compression capability of MS SQL is great and saves HD space and increases performance and reduces cost. It increases the query performance by reducing I/O.In addition, no application changes are required to implement table or index compression.

Call us for Free Consultation for Remote DBA services at our SQL Consulting Firm at: 732-536-4765

Usually if you need to create a clustered index I would remove first all non-clustered indexes since they will not be compressed when they will be re-built while a compressed clustered index is created.

Make sure to set up Compression as DATA_COMPRESSION = PAGE on non-clustered indexes after you created a compressed clustered index. This way you take full advantage of the compression capability of data and indexes which was added in MS SQL 2008 and above. In addition, see this article of how to remove duplicate records in a smart way.

Go Back to Our: Main Blog

Written by: Doron Farber - The Farber Consulting Group, Inc

There are 2 comments for "Data compression in large tables in MS SQL server"

avatar
By Naomi | September 09,2011

See also other articles on this topic
http://blogs.lessthandot.com/index.php/DataMgmt/DBAdmin/compressing-data-with-uniqueidentifier-columns

http://blogs.lessthandot.com/index.php/DataMgmt/DBAdmin/MSSQLServerAdmin/do-you-get-a-benefit-from-compressing-ba

Reply
avatar
By Matthew | January 01,0001

It's not just storage to be concerned with, it's the transmission of data. MSSQL needs user functions so compression can be managed by the developers, not by the server. The current feature is limited, at best, by compressing pages interally.

Reply

Add your comment