One of my biggest hot-button issues is data file shrinking. Even though I owned the shrink code when I worked at Microsoft, I never had a chance to rewrite it in a way that would make it less unpleasant. I really do not like shrink.
Please do not confuse shrinking the transaction log with shrinking data files. Shrinking the log may be necessary if it has grown beyond acceptable limits, or to remove excessive fragmentation of virtual log files (you can find Kimberly's excellent articles
here and
here). Even so, shrinking the transaction log should be a rare operation and should never be part of any regular maintenance you perform.
Shrinking data files should be done even more rarely, if at all. Here is why: shrinking a data file causes
massive index fragmentation. Let me demonstrate with a simple script you can run yourself. The script below will create a data file, create a 10 MB "filler" table at the start of the data file, create a 10 MB "production" table with a clustered index, and then analyze the fragmentation of the new clustered index.
USE [master];
GO

IF DATABASEPROPERTYEX (N'DBMaint2008', N'Version') IS NOT NULL
    DROP DATABASE [DBMaint2008];
GO

CREATE DATABASE DBMaint2008;
GO
USE [DBMaint2008];
GO

SET NOCOUNT ON;
GO
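The rest of the script did not survive the formatting here; a minimal sketch of the remaining steps described above (the 10 MB filler table, the 10 MB production table with its clustered index, and the fragmentation check) could look like the following. The table and index names, and the use of 1280 single-page rows to reach roughly 10 MB, are my assumptions, not the exact details of the original script.

-- Create the 10 MB 'filler' table at the start of the data file (assumed name)
CREATE TABLE [FillerTable] ([c1] INT IDENTITY, [c2] CHAR (8000) DEFAULT 'filler');
GO
INSERT INTO [FillerTable] DEFAULT VALUES;
GO 1280

-- Create the 10 MB 'production' table with a clustered index, placed after the filler table
CREATE TABLE [ProdTable] ([c1] INT IDENTITY, [c2] CHAR (8000) DEFAULT 'production');
CREATE CLUSTERED INDEX [prod_cl] ON [ProdTable] ([c1]);
GO
INSERT INTO [ProdTable] DEFAULT VALUES;
GO 1280

-- Check the logical fragmentation of the new clustered index
SELECT [avg_fragmentation_in_percent]
FROM sys.dm_db_index_physical_stats (
    DB_ID (N'DBMaint2008'), OBJECT_ID (N'ProdTable'), 1, NULL, 'LIMITED');
GO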
avg_fragmentation_in_percent
-----------------------------
0.390625
The logical fragmentation of the clustered index before the shrink is a near-perfect 0.4%.
Now I will drop the filler table, run a shrink to reclaim the space, and then check the fragmentation of the clustered index again:
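The shrink code itself was also lost in formatting; a minimal sketch, reusing the assumed table names from above and an arbitrary target of 10 percent free space, might be:

DROP TABLE [FillerTable];
GO

-- Shrink the whole database, asking for 10% free space to remain (assumed target)
DBCC SHRINKDATABASE ([DBMaint2008], 10);
GO

-- Re-check the logical fragmentation of the clustered index
SELECT [avg_fragmentation_in_percent]
FROM sys.dm_db_index_physical_stats (
    DB_ID (N'DBMaint2008'), OBJECT_ID (N'ProdTable'), 1, NULL, 'LIMITED');
GO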
DbId  FileId  CurrentSize  MinimumSize  UsedPages  EstimatedPages
----- ------- ------------ ------------ ---------- ---------------
6     1       1456         152          1448       1440
6     2       63           63           56         56

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

avg_fragmentation_in_percent
-----------------------------
99.296875
Wow! After the shrink, logical fragmentation is almost 100%. The shrink operation
completely fragmented the index, removing any chance of efficient range scans on it by ensuring that all read-ahead I/O for a range scan will be single-page I/O.
Why did this happen? A data file shrink operation works on a single file at a time and uses the GAM bitmaps (see the article
"Inside the Storage Engine: GAM, SGAM, PFS and other allocation maps") to find the highest page allocated in the file. It then moves that page as far toward the front of the file as possible, and repeats the operation over and over. In the case above, this completely reversed the order of the clustered index, taking it from perfectly defragmented to perfectly fragmented.
The same code underlies DBCC SHRINKFILE, DBCC SHRINKDATABASE, and auto-shrink; they are all equally bad. As well as fragmenting indexes, a data file shrink generates a lot of I/O, burns a lot of CPU time, and generates a
large amount of transaction log, since everything it does is fully logged.
Data file shrink should never be part of regular maintenance, and you should NEVER, NEVER have auto-shrink enabled. I tried to have it removed from SQL Server 2005 and SQL Server 2008 when I was in a position to do so; the only reason it is still there is backward compatibility. Do not fall into the trap of creating a maintenance plan that rebuilds all indexes and then tries to reclaim the space those rebuilds required by running a shrink: that is a zero-sum game, where all you do is generate a ton of transaction log for zero real gain in performance.
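If you want to check whether auto-shrink is enabled and switch it off, a simple sketch (substitute your own database name for DBMaint2008) is:

-- 1 in is_auto_shrink_on means auto-shrink is enabled
SELECT [name], [is_auto_shrink_on]
FROM sys.databases
WHERE [name] = N'DBMaint2008';
GO

ALTER DATABASE [DBMaint2008] SET AUTO_SHRINK OFF;
GO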
So when might
you need to run a shrink? For example, if you have deleted the majority of a very large database and the database is unlikely to grow again, or if you need to empty a file before removing it.
I recommend the following method:
- Create a new filegroup
- Move all affected tables and indexes into the new filegroup using the CREATE INDEX ... WITH (DROP_EXISTING = ON) ON <new filegroup> syntax, which moves the tables and removes their fragmentation at the same time (a sketch follows below)
- Drop the old filegroup that you were going to shrink anyway (or shrink it as far as possible if it is the primary filegroup)
Admittedly, this means you need extra free space available before you can shrink the old files, but it is a much cleaner mechanism.
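As a rough illustration of the approach (a sketch only; the filegroup, file path, table, and index names are hypothetical):

-- Add a new filegroup and a data file for it
ALTER DATABASE [DBMaint2008] ADD FILEGROUP [NewFG];
ALTER DATABASE [DBMaint2008] ADD FILE (
    NAME = N'DBMaint2008_NewFG', FILENAME = N'C:\SQLData\DBMaint2008_NewFG.ndf', SIZE = 50MB)
TO FILEGROUP [NewFG];
GO

-- Recreate the clustered index on the new filegroup; this moves the table
-- and removes its fragmentation in a single operation
CREATE CLUSTERED INDEX [prod_cl] ON [ProdTable] ([c1])
WITH (DROP_EXISTING = ON) ON [NewFG];
GO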
If you have absolutely no choice and must run a data file shrink, be aware that you will cause index fragmentation, and you should take steps to remove it afterwards if it is causing performance problems. The only way to remove index fragmentation without growing the data file again is to use DBCC INDEXDEFRAG or ALTER INDEX ... REORGANIZE. These commands only require a single extra 8 KB page, instead of needing to build a whole new index as a rebuild operation does.
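For example, after an unavoidable shrink you could reorganize the affected index (using the hypothetical names from the sketches above) and then re-check its fragmentation:

ALTER INDEX [prod_cl] ON [ProdTable] REORGANIZE;
GO

SELECT [avg_fragmentation_in_percent]
FROM sys.dm_db_index_physical_stats (
    DB_ID (N'DBMaint2008'), OBJECT_ID (N'ProdTable'), 1, NULL, 'LIMITED');
GO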
Bottom line: avoid running data file shrink at all costs!