
What to do if PK Identity runs out of values?

Sometimes, when designing a database, developers underestimate the scale of the project. Then the project takes off and becomes heavily loaded, and at some point someone notices that the primary key of a large table is an identity of type INT, with a limit of 2,147,483,647.

At first, 2 billion records sounds like a lot. But what if 10 million new entries are added daily, more than 1 billion values have already been spent, and the application runs 24/7? Then you have only about 114 days left (roughly 1.15 billion remaining values at 10 million per day) to fix the primary key type. That is not much time, especially if the key value is used in both the web application and the client.


If this situation sounds familiar, and you noticed too late that you are running out of primary key values, then this article is for you. The scripts below are written for a TableWithPKViolation table, in which the TableWithPKViolationId field causes the problem.

In the worst case, you have already hit the error "Arithmetic overflow error converting to data type int". This means the primary key values have run out and your application has stopped working. In that case, you can use the following solutions:
  1. Change the primary key type to BIGINT. Everyone understands that the best option would be to sit in a time machine and change INT to BIGINT back in the past. But you can still do it now: if the TableWithPKViolationId field is not used in the server and client applications, you can change the type quickly and painlessly. Do that and skip the rest of the article. Note that if your table has more than 1 billion records, the change will take a while, possibly more than 3 hours depending on the capacity of your server, and will require additional space in the transaction log (if you can, switch the database recovery model to SIMPLE). The script is as follows:

    ALTER TABLE TableWithPKViolation ALTER COLUMN TableWithPKViolationId BIGINT; 

    If this method is not available to you, you need to plan the transfer of the key to BIGINT as soon as possible.

  2. Use negative values. Usually the identity is created with the default IDENTITY(1,1). When the values approach 2 billion, you can reseed it into the negative range with the following command:

     DBCC CHECKIDENT ('TableWithPKViolation', RESEED, -2147483647);

    and thus buy much more time before moving to BIGINT. The only inconvenience of this solution is that the primary key values become negative. Check that your business logic allows negative values. This may well be the easiest solution.

  3. Create a table of unused values. Count the values that are still free and create a table listing them. This will buy you extra time to move to BIGINT.

    This method suits you if you do not rely on the order of records in the table, that is, if you do not use ORDER BY Id; or if there are few places with such a sort and you can switch the sorting to another field, for example the date the record was added.

    You can create a table of unused values in two ways:

    Method A. Missing values.

    When you use identity, you always have missing values, because values are reserved at the start of a transaction and, if it rolls back, the next transaction is assigned a new value following the reserved one. The value reserved for the cancelled transaction remains unused. These unused values can be collected into a separate table and handed out using the code below.

    Method B. Deleted values.

    If you regularly delete records from the table whose primary key values are running out, all the deleted values can be reused as free ones. An example of code for this option is given below.

    Source table: TableWithPKViolation.

     CREATE TABLE [dbo].[TableWithPKViolation](
         [TableWithPKViolationId] [int] IDENTITY(1,1) NOT NULL
     ) ON [PRIMARY]


    1. Create a table for storing free IDs.
    10-CreateNewId.sql

     CREATE TABLE [dbo].[NewIds](
         [NewId] [int] NOT NULL,
         [DateUsedUtc] [datetime] NULL
     ) ON [PRIMARY]

    Further, depending on the method:

    To generate a sequence using Method A. Missing values:

    2. Generate a sequence of missing identifiers
    20-GenerateGaps.sql

     CREATE PROCEDURE [dbo].[spNewIDPopulateInsertFromGaps]
         @batchsize INT = 10000,
         @startFrom INT = NULL
     AS
     BEGIN
         SET NOCOUNT ON;
         SET XACT_ABORT ON;

         IF @startFrom IS NULL
         BEGIN
             SELECT @startFrom = MAX([NewId]) FROM dbo.NewIds;
         END;

         DECLARE @startId INT = ISNULL(@startFrom, 0);
         DECLARE @rowscount INT = @batchsize;
         DECLARE @maxId INT;

         SELECT @maxId = MAX(TableWithPKViolationId) FROM dbo.TableWithPKViolation;

         WHILE @startId < @maxId
         BEGIN
             INSERT INTO dbo.NewIds ([NewId])
             SELECT id
             FROM (
                 SELECT TOP (@batchsize)
                     @startId + ROW_NUMBER() OVER (ORDER BY TableWithPKViolationId) AS id
                 FROM dbo.TableWithPKViolation -- any table where you have @batchsize rows
             ) AS genids
             WHERE id < @maxId
               AND NOT EXISTS (
                   SELECT 1
                   FROM [dbo].[TableWithPKViolation] AS Tb WITH (NOLOCK)
                   WHERE Tb.TableWithPKViolationId = genids.id
               );

             SET @rowscount = @@ROWCOUNT;
             SET @startId = @startId + @batchsize;

             PRINT CONVERT(VARCHAR(50), GETDATE(), 121) + ' ' + CAST(@startId AS VARCHAR(50));
         END
     END

    To generate a sequence with Method B. Deleted values:

    2. Create a table for generating a sequence and fill it with numbers from 1 to 2,147,483,647.
    15-CreateInt.sql

     CREATE TABLE [dbo].[IntRange](
         [Id] [int] NOT NULL
     ) ON [PRIMARY]

    20-GenerateInt.sql

     CREATE PROCEDURE [dbo].[spNewIDPopulateInsert]
         @batchsize INT = 10000,
         @startFrom INT = NULL
     AS
     BEGIN
         SET NOCOUNT ON;
         SET XACT_ABORT ON;

         IF @startFrom IS NULL
         BEGIN
             SELECT @startFrom = MAX(id) FROM dbo.IntRange;
         END;

         DECLARE @startId INT = ISNULL(@startFrom, 0);
         DECLARE @rowscount INT = @batchsize;
         DECLARE @maxId INT = 2147483647;

         WHILE @rowscount = @batchsize
         BEGIN
             INSERT INTO dbo.IntRange (id)
             SELECT id
             FROM (
                 SELECT TOP (@batchsize)
                     @startId + ROW_NUMBER() OVER (ORDER BY TableWithPKViolationId) AS id
                 FROM dbo.TableWithPKViolation -- any table where you have @batchsize rows
             ) AS genids
             WHERE id < @maxId;

             SET @rowscount = @@ROWCOUNT;
             SET @startId = @startId + @rowscount;

             PRINT CONVERT(VARCHAR(50), GETDATE(), 121) + ' ' + CAST(@startId AS VARCHAR(50));
         END
     END

    25-PopulateRange.sql

     exec dbo.spNewIDPopulateInsert @batchsize = 10000000 

    The script uses the TableWithPKViolation table to generate the sequence; you can use any method for this, including SQL Server's built-in SEQUENCE object. This approach was chosen because it worked faster than the others.
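    For comparison, a minimal sketch of the built-in SEQUENCE alternative mentioned above (the sequence name IntRangeSeq is illustrative, not part of the article's scripts):

    ```sql
    -- Hypothetical alternative: generate the 1..2,147,483,647 range from a
    -- SEQUENCE object instead of ROW_NUMBER() over an existing table.
    CREATE SEQUENCE dbo.IntRangeSeq
        AS INT
        START WITH 1
        INCREMENT BY 1
        MAXVALUE 2147483647
        CACHE 1000000;

    -- Fill one batch of dbo.IntRange from the sequence;
    -- the FROM table only needs to supply enough rows per batch.
    INSERT INTO dbo.IntRange (Id)
    SELECT NEXT VALUE FOR dbo.IntRangeSeq
    FROM dbo.TableWithPKViolation;
    ```

    In the author's tests the ROW_NUMBER() approach was faster, which is why the scripts below stick to it.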

    30-CreateIndexOnInt.sql

     ALTER TABLE [dbo].[IntRange] ADD PRIMARY KEY CLUSTERED
     (
         [Id] ASC
     ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
             IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF) ON [PRIMARY]

    and fill it up:
    50-GenerateNewId.sql

     CREATE PROCEDURE [dbo].[spNewIDPopulateInsertFiltered]
         @batchsize INT = 10000,
         @startFrom INT = NULL,
         @endTill INT = NULL
     AS
     BEGIN
         SET NOCOUNT ON;
         SET XACT_ABORT ON;

         IF @startFrom IS NULL
         BEGIN
             SELECT @startFrom = MAX([NewId]) FROM dbo.NewIds;
         END;

         DECLARE @startId INT = ISNULL(@startFrom, 0);
         DECLARE @rowscount INT = @batchsize;
         DECLARE @maxId INT = ISNULL(@endTill, 2147483647);
         DECLARE @endId INT = @startId + @batchsize;

         WHILE @startId < @maxId
         BEGIN
             INSERT INTO [NewIds] ([NewId])
             SELECT IR.id
             FROM [dbo].[IntRange] AS IR
             WHERE IR.id >= @startId
               AND IR.id < @endId
               AND NOT EXISTS (
                   SELECT 1
                   FROM [dbo].[TableWithPKViolation] AS Tb WITH (NOLOCK)
                   WHERE Tb.TableWithPKViolationId = IR.id
               );

             SET @rowscount = @@ROWCOUNT;
             SET @startId = @endId;
             SET @endId = @endId + @batchsize;
             IF @endId > @maxId SET @endId = @maxId;

             PRINT CONVERT(VARCHAR(50), GETDATE(), 121) + ' ' + CAST(@startId AS VARCHAR(50));
         END
     END

    55-ExecGeneration.sql

     ----- Run each part in a separate window, in parallel -----

     -- part 1
     exec dbo.spNewIDPopulateInsertFiltered @batchsize = 10000000, @startFrom = 1, @endTill = 500000000
     -- end of part 1

     -- part 2
     exec dbo.spNewIDPopulateInsertFiltered @batchsize = 10000000, @startFrom = 500000000, @endTill = 1000000000
     -- end of part 2

     -- part 3
     exec dbo.spNewIDPopulateInsertFiltered @batchsize = 10000000, @startFrom = 1000000000, @endTill = 1500000000
     -- end of part 3

     -- part 4
     DECLARE @maxId INT
     SELECT @maxId = MAX(TableWithPKViolationId) FROM dbo.TableWithPKViolation
     exec dbo.spNewIDPopulateInsertFiltered @batchsize = 10000000, @startFrom = 1500000000, @endTill = @maxId
     -- end of part 4

    3. The table of free identifiers generated by Method A or B is now ready. Create indexes on it.
    60-CreateIndex.sql

     ALTER TABLE [dbo].[NewIds] ADD CONSTRAINT [PK_NewIds] PRIMARY KEY CLUSTERED
     (
         [NewId] ASC
     ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
             IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
     GO

     CREATE NONCLUSTERED INDEX [IX_NewIds_DateUsedUtc] ON [dbo].[NewIds]
     (
         [DateUsedUtc] ASC
     ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
             DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF)
     GO

     ALTER TABLE [dbo].[NewIds] SET (LOCK_ESCALATION = DISABLE)
     GO

    We check that everything has been generated correctly: the NewIds table must not contain an ID that already exists in the main TableWithPKViolation table, nor any ID above its current maximum.
    70-CheckData.sql

     DECLARE @maxId INT
     SELECT @maxId = MAX(TableWithPKViolationId) FROM [dbo].[TableWithPKViolation]

     IF EXISTS (SELECT 1 FROM [dbo].[NewIds] WHERE [NewId] > @maxId)
     BEGIN
         PRINT 'PROBLEM. Wait for cleanup';
         DECLARE @batchsize INT = 10000
         DECLARE @rowcount INT = @batchsize;
         WHILE @rowcount = @batchsize
         BEGIN
             DELETE TOP (@batchsize) FROM [dbo].[NewIds]
             WHERE [NewId] > @maxId;
             SET @rowcount = @@ROWCOUNT;
         END;
     END
     ELSE
         PRINT 'OK';

    If you generate the sequence on another server (for example, on a server with a restored backup of the database), you can export the data to a file using this script:
    80-BulkOut.sql

     DECLARE @command VARCHAR(4096),
             @dbname VARCHAR(255),
             @path VARCHAR(1024),
             @filename VARCHAR(255),
             @batchsize INT

     SELECT @dbname = DB_NAME();
     SET @path = 'D:\NewIds\';
     SET @filename = 'NewIds-' + @dbname + '.txt';
     SET @batchsize = 10000000;

     SET @command = 'bcp "[' + @dbname + '].dbo.NewIds" out "' + @path + @filename
                  + '" -c -t, -S localhost -T -b ' + CAST(@batchsize AS VARCHAR(255));

     PRINT @command
     EXEC master..xp_cmdshell @command

    4. Create a procedure that marks the required number of available IDs and returns them as a result.
    90-GetNewId.sql

     CREATE PROCEDURE [dbo].[spGetTableWithPKViolationIds]
         @batchsize INT = 1
     AS
     BEGIN
         SET NOCOUNT ON;
         SET XACT_ABORT ON;

         DECLARE @rowcount INT,
                 @now DATETIME = GETUTCDATE();

         BEGIN TRAN

         UPDATE TOP (@batchsize) dbo.NewIds
         SET DateUsedUtc = @now
         OUTPUT inserted.[NewId]
         WHERE DateUsedUtc IS NULL;

         SET @rowcount = @@ROWCOUNT;

         IF @rowcount != @batchsize
         BEGIN
             DECLARE @msg NVARCHAR(2048);
             SET @msg = 'TableWithPKViolationId out of ids. sp spGetTableWithPKViolationIds, table NewIds. '
                      + 'Ids requested ' + CAST(@batchsize AS NVARCHAR(255))
                      + ', IDs available ' + CAST(@rowcount AS NVARCHAR(255));
             RAISERROR(@msg, 16, 1);
             ROLLBACK;
         END
         ELSE
         BEGIN
             COMMIT TRAN
         END;
     END

    5. Modify every procedure that inserts data into the table and returns SCOPE_IDENTITY() so that it calls the new procedure instead.

    If performance permits, or if your time is very expensive and there are many procedures to change, you can create an INSTEAD OF INSERT trigger instead.
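    The article does not include such a trigger; below is a minimal sketch of the idea, where the trigger name and the placeholder column comments are illustrative and would have to be adapted to the real table:

    ```sql
    -- Hypothetical INSTEAD OF INSERT trigger: instead of letting IDENTITY
    -- assign a key, take keys from the NewIds pool via the procedure above.
    CREATE TRIGGER trTableWithPKViolation_InsteadOfInsert
    ON dbo.TableWithPKViolation
    INSTEAD OF INSERT
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @cnt INT = (SELECT COUNT(*) FROM inserted);
        DECLARE @ids TABLE (rn INT IDENTITY(1,1), NewId INT);

        -- Reserve exactly as many free ids as rows being inserted
        INSERT INTO @ids (NewId)
        EXEC dbo.spGetTableWithPKViolationIds @batchsize = @cnt;

        SET IDENTITY_INSERT dbo.TableWithPKViolation ON;

        INSERT INTO dbo.TableWithPKViolation (TableWithPKViolationId /*, other columns */)
        SELECT i.NewId /*, src. other columns */
        FROM (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn /*, other columns */
              FROM inserted) AS src
        JOIN @ids AS i ON i.rn = src.rn;

        SET IDENTITY_INSERT dbo.TableWithPKViolation OFF;
    END
    ```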

    Here is an example of how you can use the procedure to issue new primary key values:

     CREATE TABLE #tmp_Id (Id INT);

     INSERT INTO #tmp_Id
     EXEC spGetTableWithPKViolationIds @batchsize = @IDNumber;

     SELECT @newVersionId = Id FROM #tmp_Id;

     SET IDENTITY_INSERT [dbo].[TableWithPKViolation] ON;
     -- ... insert into TableWithPKViolation with the explicit TableWithPKViolationId here ...
     SET IDENTITY_INSERT [dbo].[TableWithPKViolation] OFF;

    Note that SET IDENTITY_INSERT ON requires the user calling the procedure to have ALTER permission on the TableWithPKViolation table.
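    If the application connects under a restricted account, that permission can be granted explicitly; a one-line sketch, where AppUser is an illustrative principal name:

    ```sql
    -- Grant the ALTER permission needed for SET IDENTITY_INSERT
    -- (AppUser is a placeholder for the application's database principal).
    GRANT ALTER ON OBJECT::dbo.TableWithPKViolation TO AppUser;
    ```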

    6. You can then set up a job that clears out the used identifiers:
    95-SPsCleanup.sql

     CREATE PROCEDURE dbo.spCleanupNewIds
         @batchSize INT = 4999
     AS
     BEGIN
         SET NOCOUNT ON

         DECLARE @minId INT
         DECLARE @maxId INT

         SELECT @minId = MIN([NewId]), @maxId = MAX([NewId])
         FROM dbo.NewIds WITH (NOLOCK)
         WHERE DateUsedUtc IS NOT NULL;

         DECLARE @totRowCount INT = 0
         DECLARE @rowCount INT = @batchSize

         WHILE @rowcount = @batchsize
         BEGIN
             DELETE TOP (@batchsize) FROM dbo.NewIds
             WHERE DateUsedUtc IS NOT NULL
               AND [NewId] >= @minId
               AND [NewId] <= @maxId

             SET @rowcount = @@ROWCOUNT
             SET @totRowCount = @totRowCount + @rowcount
         END

         PRINT 'Total records cleaned up - ' + CAST(@totRowCount AS VARCHAR(100))
     END

    A job that deletes the used records once a day is optional. If you regularly delete records from the main table, you can also feed the deleted values back into this table.
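    One possible way to feed deleted values back into the pool is an AFTER DELETE trigger; a sketch (the trigger name is illustrative, and it assumes no deleted id can re-enter the table through another path):

    ```sql
    -- Hypothetical AFTER DELETE trigger: return freed key values to NewIds
    -- so they become available to spGetTableWithPKViolationIds again.
    CREATE TRIGGER trTableWithPKViolation_AfterDelete
    ON dbo.TableWithPKViolation
    AFTER DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO dbo.NewIds ([NewId])
        SELECT d.TableWithPKViolationId
        FROM deleted AS d;
    END
    ```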

    I would still recommend planning the transition to BIGINT.

    New identifiers will, of course, be issued in ascending order, but you need to think through the sorting logic so that new identifiers sort after previously issued ones, even if their arithmetic value is smaller.
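    One hedged way to do that is to sort by an insertion timestamp rather than by the key, assuming the table has (or can be given) a DateAddedUtc column, which is an assumption, not part of the original schema:

    ```sql
    -- Sort by insertion time, with the key only as a tie-breaker, so reused
    -- ids (arithmetically smaller) still appear after older records.
    SELECT TOP (100) *
    FROM dbo.TableWithPKViolation
    ORDER BY DateAddedUtc, TableWithPKViolationId;
    ```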

The temporary solutions described in this article for the moment when primary key values suddenly run out help you buy time and keep the system operational while you migrate the database and the application to a new data type.

The best solution is to monitor values approaching their type's limit and migrate to an appropriate data type in advance.

Archive with code

Source: https://habr.com/ru/post/329506/

