📜 ⬆️ ⬇️

Accelerate data insertion from MSSQL to remote PostgreSQL 800 times

In the process of project development, there is a periodic need for data exchange between database servers. Suppose we have a data source in the form of a SQL Server and a remote PostgreSQL server on which this data should appear. After adding a remote server as a linked server, you can make requests of the form:


INSERT INTO RemotePG...RemoteTable (RecordID, RecordName) VALUES (1,'Test string'); 

The problem is that such requests are very long. If we are faced with the task of unloading tens and hundreds of thousands of records, the execution time tends to infinity. Consider two and a half ways to insert data into the table on the linked server and compare the execution time.



Creating a new linked server:

To create a linked server, you must already have an ODBC data source. In my case, the name of the linked server and the ODBC source are the same.


 USE [master] GO --   linked server   declare @ServerName nvarchar(200) SET @ServerName=N'RemotePG' --    EXEC master.dbo.sp_addlinkedserver @server = @ServerName, @srvproduct=@ServerName, @provider=N'MSDASQL', @datasrc=@ServerName --      --            EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=@ServerName,@useself=N'False',@locallogin=NULL,@rmtuser=N'remote_user',@rmtpassword='password' --     EXEC master.dbo.sp_serveroption @server=@ServerName, @optname=N'rpc', @optvalue=N'true' EXEC master.dbo.sp_serveroption @server=@ServerName, @optname=N'rpc out', @optvalue=N'true' 


Method number 1


Suppose there is a simple table on the remote machine:


 CREATE TABLE RemoteTable (RecordID INT, RecordName VARCHAR(200)); 

Let's look at the execution time of a simple query


 --    SELECT getdate(); GO --     DECLARE @i INT; SET @i=0; --        WHILE @i<1000 BEGIN INSERT INTO RemotePG...RemoteTable (RecordID, RecordName) VALUES (1,'Test string'); SET @i=@i+1; END GO --    SELECT getdate(); GO 

In my case it took 2 minutes 52 seconds. Approximately 6 entries per second. Not fast. For reference: the remote machine is on the channel about 5 Mbit / s and an average ping of 16 ms.


Method number 2


In the case of a remote server, SQL Server allows the use of the following construction:


 EXECUTE ('sql    ') AT LinkedServerName; 

To make this possible, remote procedure calls must be enabled in the Linked Server settings (RPC = remote procedure call). In this case, the request is executed directly on the remote machine.
Let's see how this will affect the execution time:


 --    SELECT getdate(); GO --     DECLARE @i INT; SET @i=0; --       WHILE @i<1000 BEGIN EXECUTE ('INSERT INTO RemoteTable (RecordID, RecordName) VALUES (1,''Test string'');') AT RemotePG; SET @i=@i+1; END GO --    SELECT getdate(); GO 

The runtime of 17.25 seconds is better, but we will try to reduce this time.


Method Number 2.5


To do this, before executing the query, we will prepare the data to be inserted into the table.


 --    SELECT getdate(); GO --     DECLARE @sql VARCHAR(max); DECLARE @i INT; SET @i=0; --    SET @sql='INSERT INTO RemoteTable (RecordID, RecordName) VALUES '; --       WHILE @i<1000 BEGIN SELECT @sql=@sql+'(1,''Test string''),'; SET @i=@i+1; END --    ','  ';' SELECT @sql=SUBSTRING(@sql,1,LEN(@sql)-1)+';' --    EXECUTE (@sql) AT RemotePG; GO --    SELECT getdate(); GO 

The request will be a long string of the form:


 INSERT INTO RemoteTable (RecordID, RecordName) VALUES (1,'Test string'),(1,'Test string') ... (1,'Test string'); 

Such a request, under the same conditions, was executed in 217 milliseconds. Which is about 800 times faster than the original version.


PS The data for insertion into the table is specially simplified.


')

Source: https://habr.com/ru/post/305982/


All Articles