Transact SQL is an excellent language whose functionality is more than enough to solve most common problems. However, sometimes there are problems that can be solved with its help for a long time and / or inconveniently. Perhaps the most striking example is advanced line parsing, in which you have to use regular expressions or just a tricky and twisted algorithm. Starting with SQL Server 2005, this problem is solved by creating a stored procedure / CLR function. But this approach requires recompilation and deployment of the assembly when making changes. And so you want, without leaving Management Studio, to change the behavior of their procedures.
Naturally, there is a desire to build in T-SQL support for a scripting language in order to execute code on the fly. Thanks to the DLR (Dynamic Language Runtime) in .Net Framework 4, we have the opportunity. Exceptionally due to the author’s personal predilections, IronPython was chosen as such a language.
Under the cat step by step instructions and demonstration of the result.
What should be the result
I want to get a view function
select [dbo].[pyExecute]( ' import re re.findall("\d+"," 2013 !")[0] ' )
It would be nice to have the same aggregating function and a stored procedure using python code.
What we need
To implement our plans, we will use SQL Server 2008 R2, Visual Studio 2010 and IronPython 2.6.2. IronPython will have to be assembled from
source by fixing just one line of code (more on that later).
')
Server Tuning
To begin with, we will create a separate base for experiments. In the following examples, I use a database called CLR.
Building IronPython from Source
Initializing the python engine in the context of the Sql Server will cause an error that can be corrected by slightly correcting the source code. To do this, download the source code for IronPython 2.6.2 and open the project. In the IronPython project, we find the Modules \ sys.cs file and in the GetPrefix function we repeat the code used for building under Silverlight. Thus, the GetPrefix function will always return an empty string.
private static string GetPrefix() { string prefix; #if SILVERLIGHT prefix = String.Empty; #else // prefix prefix = String.Empty; /* try { prefix = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location); } catch (SecurityException) { prefix = String.Empty; } */ #endif return prefix; }
We build the project and get the assembly IronPython.dll, IronPython.Modules.dll, Microsoft.Dynamic.dll, Microsoft.Scripting.dll, Microsoft.Scripting.Core.dll, Microsoft.Scripting.Debugging.dll, Microsoft.Scripting.ExtensionAttribute.dll . I advise you to copy them into a separate folder, as we still need them in the future.
Creating our builds
Now we can safely open Visual Studio and create our assemblies. We need a solution with two projects. The first pyCore project is a class library that directly executes IronPython code. The second pySQL project is a database project for the CLR that uses the pyCore assembly and contains the code for our functions and stored procedures.
pyCore
Target framework choose .net 3.5. Add to the project links to assemblies IronPython.dll, IronPython.Modules.dll, Microsoft.Scripting.dll, Microsoft.Scripting.Core.dll. Let me remind you that we get these libraries after building IronPython from sources. Our project will contain only one static class pyCore, responsible for creating and initializing the IronPython engine, managing the context (scope) and executing the transferred script.
PyCore library code using System; using Microsoft.Scripting.Hosting; using IronPython.Hosting; namespace pyCore {
The main interest is the function py_exec, which accepts script text, execution context and arguments that must be passed to the script.
Now you need to create the pyCore assembly in the CLR database. To do this, run the following script:
CREATE ASSEMBLY PYCORE FROM N'< …>\pyCore.dll' WITH PERMISSION_SET = UNSAFE
Most likely, you will receive an error of the following form:
Assembly 'pyCore' references assembly 'system.runtime.remoting, version = 2.0.0.0, culture = neutral, public keytoken = b77a5c561934e089.',
In other words, not all the libraries necessary for pyCore to work are present in the database. In order not to bore the reader, I will cite immediately a script that loads everything you need. After the keyword FROM, you must specify the full path to the assembly. Most of the builds are obtained by building IronPython from source. The System.Runtime.Remoting.dll assembly can be found in C: \ Windows \ Microsoft.NET \ Framework \ v2.0.50727 \
The script to create all the necessary assemblies CREATE ASSEMBLY ExtensionAttribute FROM N'< …>\Microsoft.Scripting.ExtensionAttribute.dll' WITH PERMISSION_SET = UNSAFE CREATE ASSEMBLY ScriptingCore FROM N'< …>\Microsoft.Scripting.Core.dll' WITH PERMISSION_SET = UNSAFE CREATE ASSEMBLY Scripting FROM N'< …>\Microsoft.Scripting.dll' WITH PERMISSION_SET = UNSAFE CREATE ASSEMBLY SystemRuntimeRemoting FROM N'C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Runtime.Remoting.dll' WITH PERMISSION_SET = UNSAFE CREATE ASSEMBLY MicrosoftDynamic FROM N'< …>\Microsoft.Dynamic.dll' WITH PERMISSION_SET = UNSAFE CREATE ASSEMBLY ScriptingDebugging FROM N'< …>\Microsoft.Scripting.Debugging.dll' WITH PERMISSION_SET = UNSAFE CREATE ASSEMBLY IronPythonModules FROM N'< …>\IronPython.Modules.dll' WITH PERMISSION_SET = UNSAFE CREATE ASSEMBLY PYCORE FROM N'< …>\pyCore.dll' WITH PERMISSION_SET = UNSAFE
pySQL
The black work is done, and it's time to start implementing the procedures and functions available from SQL Server. Create a CLR database project and specify our database for tests in the connection string. Now you need to add a link to the pyCore assembly. If you correctly entered the database connection string in the project, then when you add a new link, you will see all the assemblies that exist in the database. Among them, choose pyCore, Scripting and ScriptingCore.
CLR function
Add a new element to the project - a custom function.
public partial class UserDefinedFunctions {
There is nothing interesting happening in functions - a direct call to py_exec and the redirection of arguments. There are two use cases here: passing parameters to the script when generating the script text and explicitly passing parameters through the args array. The second method, in my opinion, is more readable and safe.
When declaring a CLR function in SQL Server, a signature comparison occurs that does not understand or for some other reason does not take into account the params keyword. As a result, you have to declare several functions with a different number of parameters. In reality, the need to create functions with more than three or four parameters is rarely encountered, so this is not a very significant limitation.
CLR procedure
public partial class StoredProcedures {
The internal structure of the procedure is slightly different from the function. Additionally, we pass to the script a link to an instance of the SqlPipe object, so that we can return the table result and display messages.
Aggregate function
Aggregation function cannot be created using Transact SQL. The only option is to use CLR builds. Why this becomes so clear when you first look at the structure of the aggregating function of the CLR.
Aggregation Function Code
We implement the IBinarySerialize interface to provide the script with the ability to save its state and the intermediate result of calculations. Since the Init function takes no arguments, the initialization script has to be executed when the Accumulate function is first run. Our aggregation function accepts script texts to handle each event. The scripts texts themselves are stored in the internal fields of the object and serialized.
Creating a pySQL assembly in the database and declaring functions
Now that the build is ready, it needs to be deployed in the CLR database.
CREATE ASSEMBLY PYSQL FROM N'< …>\pySQL.dll' WITH PERMISSION_SET = UNSAFE
Now we will declare our functions and procedures.
results
If you have read this far, then you are entitled to reward yourself by seeing the results of your work.
Function
First of all, let's solve the problem with regular expressions - we will find email addresses in the line. The string itself will be inserted directly into the script during its formation.
select [dbo].[pyFunc]( ' import re mails = re.findall("[.\\-_a-z0-9]+@(?:[a-z0-9][\\-a-z0-9]+\\.)+[az]{2,6}","'+doc+'") result = ": " for mail in mails: result += mail + "," result[:-1] ' ) from ( select ' somebody@gmail.com' doc union select ' person1@mail.ru person2@list.ru' doc ) SAMPLE_DATA
Result:
Found: somebody@gmail.com
Found: person1 @ mail.ru, person2 @ list.ru
The same, but using parameters. In my opinion, a more beautiful way, but requiring a function declaration with the required number of arguments.
select [dbo].[pyFunc1]( ' import re mails = re.findall("[.\\-_a-z0-9]+@(?:[a-z0-9][\\-a-z0-9]+\\.)+[az]{2,6}",str(args[0])) result = ": " for mail in mails: result += mail + "," result[:-1] ', SAMPLE_DATA.doc ) from ( select ' somebody@gmail.com' doc union select ' person1@mail.ru person2@list.ru' doc ) SAMPLE_DATA
The result is naturally the same.
This function can also be used to calculate complex math functions not built into SQL Server and to dynamically evaluate expressions (this can be achieved with sp_execute).
declare @InputFromula as nvarchar(MAX) SET @InputFromula = 'math.log(math.cosh(int(args[0]))/math.e,int(args[1]))' select [dbo].[pyFunc2] ( 'import math '+@InputFromula, 100,5 )
Stored procedure
I will give a complete example at once. Here we write a text message using the SqlPipe object, carefully passed to our script, then we form a table, fill it with data and return it as a result.
exec pyProc ' import clr clr.AddReference("System.Data") from Microsoft.SqlServer.Server import * from System.Data import * from System import DateTime Pipe.Send(" : !") metadata = ( SqlMetaData(" ", SqlDbType.NVarChar, 12), SqlMetaData(" ", SqlDbType.Int), SqlMetaData(" ", SqlDbType.DateTime) ) record = SqlDataRecord(metadata) record.SetString(0, "bocharovf"); record.SetInt32(1, 1000000); record.SetDateTime(2, DateTime.Now);
Result of performance:
We write the message: let's go!
Employee name | Expected salary | Expected date of increase |
bocharovf | 1,000,000 | 2012-12-31 02: 39: 51.293 |
(1 row (s) affected)
Aggregate function
We list the languages used in the article, separated by commas, using our aggregating function. In the serialization and deserialization scripts we use references to instances of the BinaryReader and BinaryWriter classes passed to our script. The result is accumulated in the variable data.
select dbo.pyAggregate (
Result:
Languages | IsUsed |
C #, IronPython, T-SQL | Used in article |
Ada, Cobol, Fortran, Lisp | Not used in the article |
(2 row (s) affected)
Performance
As expected, the performance is low. For example, the right-to-left output rate of the standard T-SQL REVERSE function and using the cut operation in python is almost 80 times different.
Security
exec pyProc ' from System.Net import WebClient content = WebClient().DownloadString("http://habrahabr.ru/") Pipe.Send(content[:4000]) 1 '
Since our capabilities are not limited by anything, we can execute any code, including access to a resource on the Internet or delete a file from the disk. Therefore, it is necessary to distribute the rights with care and use mainly the transfer of parameters to the function instead of creating a script in parts.
Instead of conclusion
Each tool should only be used where it really makes sense. You should not try to rewrite all your stored procedures and functions using IronPython. Rather, functions on IronPython are suitable for implementing complex algorithms that use functionality that is missing in Transact SQL or processing data from external sources (file system, Internet). In addition to IronPython, anyone can embed IronRuby support or, for example, Javascript .NET.