⬆️ ⬇️

Using IronPython from Transact SQL

Transact SQL is an excellent language whose functionality is more than enough to solve most common problems. However, sometimes there are problems that can be solved with its help for a long time and / or inconveniently. Perhaps the most striking example is advanced line parsing, in which you have to use regular expressions or just a tricky and twisted algorithm. Starting with SQL Server 2005, this problem is solved by creating a stored procedure / CLR function. But this approach requires recompilation and deployment of the assembly when making changes. And so you want, without leaving Management Studio, to change the behavior of their procedures.

Naturally, there is a desire to build in T-SQL support for a scripting language in order to execute code on the fly. Thanks to the DLR (Dynamic Language Runtime) in .Net Framework 4, we have the opportunity. Exceptionally due to the author’s personal predilections, IronPython was chosen as such a language.

Under the cat step by step instructions and demonstration of the result.





What should be the result



I want to get a view function

select [dbo].[pyExecute]( ' import re re.findall("\d+","   2013 !")[0] ' ) 


It would be nice to have the same aggregating function and a stored procedure using python code.



What we need



To implement our plans, we will use SQL Server 2008 R2, Visual Studio 2010 and IronPython 2.6.2. IronPython will have to be assembled from source by fixing just one line of code (more on that later).

')

Server Tuning



To begin with, we will create a separate base for experiments. In the following examples, I use a database called CLR.

 --         ALTER DATABASE CLR SET TRUSTWORTHY ON GO sp_configure 'show advanced options', 1; GO RECONFIGURE; GO --     sp_configure 'clr enabled', 1; GO RECONFIGURE; GO 




Building IronPython from Source



Initializing the python engine in the context of the Sql Server will cause an error that can be corrected by slightly correcting the source code. To do this, download the source code for IronPython 2.6.2 and open the project. In the IronPython project, we find the Modules \ sys.cs file and in the GetPrefix function we repeat the code used for building under Silverlight. Thus, the GetPrefix function will always return an empty string.

 private static string GetPrefix() { string prefix; #if SILVERLIGHT prefix = String.Empty; #else // prefix      prefix = String.Empty; /* try { prefix = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location); } catch (SecurityException) { prefix = String.Empty; } */ #endif return prefix; } 


We build the project and get the assembly IronPython.dll, IronPython.Modules.dll, Microsoft.Dynamic.dll, Microsoft.Scripting.dll, Microsoft.Scripting.Core.dll, Microsoft.Scripting.Debugging.dll, Microsoft.Scripting.ExtensionAttribute.dll . I advise you to copy them into a separate folder, as we still need them in the future.



Creating our builds



Now we can safely open Visual Studio and create our assemblies. We need a solution with two projects. The first pyCore project is a class library that directly executes IronPython code. The second pySQL project is a database project for the CLR that uses the pyCore assembly and contains the code for our functions and stored procedures.



pyCore


Target framework choose .net 3.5. Add to the project links to assemblies IronPython.dll, IronPython.Modules.dll, Microsoft.Scripting.dll, Microsoft.Scripting.Core.dll. Let me remind you that we get these libraries after building IronPython from sources. Our project will contain only one static class pyCore, responsible for creating and initializing the IronPython engine, managing the context (scope) and executing the transferred script.

PyCore library code
 using System; using Microsoft.Scripting.Hosting; using IronPython.Hosting; namespace pyCore { /// <summary> ///      ,  . /// </summary> public static class pyCore { static ScriptEngine engine; static ScriptRuntime runtime; /// <summary> ///  ,         /// </summary> static pyCore() { engine = Python.CreateEngine(); runtime = engine.Runtime; } /// <summary> ///    /// </summary> /// <param name="scope"> </param> /// <param name="name">     </param> /// <param name="value"> </param> public static void py_setvar (ScriptScope scope,string name, object value) { scope.SetVariable(name,value); } /// <summary> ///   . /// </summary> /// <param name="cmd"> </param> /// <param name="scope"> .  null    .</param> /// <param name="args">    .     args</param> /// <returns></returns> public static object py_exec(string cmd, ScriptScope scope = null, params object [] args) { //     -   var CurrentScope = (scope ?? runtime.CreateScope()); //   ,     if (args != null) { CurrentScope.SetVariable("args",args); } //     var source = engine.CreateScriptSourceFromString(cmd, Microsoft.Scripting.SourceCodeKind.AutoDetect); //     return source.Execute(CurrentScope); } /// <summary> ///     /// </summary> /// <returns></returns> public static ScriptScope CreateScope() { return engine.CreateScope(); } } } 




The main interest is the function py_exec, which accepts script text, execution context and arguments that must be passed to the script.

Now you need to create the pyCore assembly in the CLR database. To do this, run the following script:

 CREATE ASSEMBLY PYCORE FROM N'<   …>\pyCore.dll' WITH PERMISSION_SET = UNSAFE 


Most likely, you will receive an error of the following form:

Assembly 'pyCore' references assembly 'system.runtime.remoting, version = 2.0.0.0, culture = neutral, public keytoken = b77a5c561934e089.',

In other words, not all the libraries necessary for pyCore to work are present in the database. In order not to bore the reader, I will cite immediately a script that loads everything you need. After the keyword FROM, you must specify the full path to the assembly. Most of the builds are obtained by building IronPython from source. The System.Runtime.Remoting.dll assembly can be found in C: \ Windows \ Microsoft.NET \ Framework \ v2.0.50727 \

The script to create all the necessary assemblies
 CREATE ASSEMBLY ExtensionAttribute FROM N'<   …>\Microsoft.Scripting.ExtensionAttribute.dll' WITH PERMISSION_SET = UNSAFE CREATE ASSEMBLY ScriptingCore FROM N'<   …>\Microsoft.Scripting.Core.dll' WITH PERMISSION_SET = UNSAFE CREATE ASSEMBLY Scripting FROM N'<   …>\Microsoft.Scripting.dll' WITH PERMISSION_SET = UNSAFE CREATE ASSEMBLY SystemRuntimeRemoting FROM N'C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.Runtime.Remoting.dll' WITH PERMISSION_SET = UNSAFE CREATE ASSEMBLY MicrosoftDynamic FROM N'<   …>\Microsoft.Dynamic.dll' WITH PERMISSION_SET = UNSAFE CREATE ASSEMBLY ScriptingDebugging FROM N'<   …>\Microsoft.Scripting.Debugging.dll' WITH PERMISSION_SET = UNSAFE CREATE ASSEMBLY IronPythonModules FROM N'<   …>\IronPython.Modules.dll' WITH PERMISSION_SET = UNSAFE CREATE ASSEMBLY PYCORE FROM N'<   …>\pyCore.dll' WITH PERMISSION_SET = UNSAFE 




pySQL


The black work is done, and it's time to start implementing the procedures and functions available from SQL Server. Create a CLR database project and specify our database for tests in the connection string. Now you need to add a link to the pyCore assembly. If you correctly entered the database connection string in the project, then when you add a new link, you will see all the assemblies that exist in the database. Among them, choose pyCore, Scripting and ScriptingCore.



CLR function


Add a new element to the project - a custom function.

 public partial class UserDefinedFunctions { /// <summary> ///      IronPython /// </summary> /// <param name="cmd"> </param> /// <returns>  </returns> [Microsoft.SqlServer.Server.SqlFunction] public static object pyFunc(string cmd) { return pyCore.pyCore.py_exec(cmd); } /// <summary> ///      1  /// </summary> /// <param name="cmd"> </param> /// <param name="arg1"> 1.     args[0]</param> /// <returns>  </returns> [Microsoft.SqlServer.Server.SqlFunction] public static object pyFunc1(string cmd, object arg1) { return pyCore.pyCore.py_exec(cmd,null,arg1); } /// <summary> ///      2  /// </summary> /// <param name="cmd"> </param> /// <param name="arg1"> 1.     args[0]</param> /// <param name="arg2"> 2.     args[1]</param> /// <returns>  </returns> [Microsoft.SqlServer.Server.SqlFunction] public static object pyFunc2(string cmd, object arg1, object arg2) { return pyCore.pyCore.py_exec(cmd,null,arg1,arg2); } // pyFunc3, 4, ... , N }; 


There is nothing interesting happening in functions - a direct call to py_exec and the redirection of arguments. There are two use cases here: passing parameters to the script when generating the script text and explicitly passing parameters through the args array. The second method, in my opinion, is more readable and safe.

 --      select dbo.pyFunc('"'+name+'".upper()') from sys.all_objects --    select dbo.pyFunc1('str(args[0]).upper()',name) from sys.all_objects 


When declaring a CLR function in SQL Server, a signature comparison occurs that does not understand or for some other reason does not take into account the params keyword. As a result, you have to declare several functions with a different number of parameters. In reality, the need to create functions with more than three or four parameters is rarely encountered, so this is not a very significant limitation.



CLR procedure


 public partial class StoredProcedures { /// <summary> ///   CLR /// </summary> /// <param name="cmd"> </param> /// <returns>  Int,</returns> [Microsoft.SqlServer.Server.SqlProcedure] public static int pyProc(string cmd) { var scope = pyCore.pyCore.CreateScope(); //     SqlPipe scope.SetVariable("Pipe", SqlContext.Pipe); return (int)pyCore.pyCore.py_exec(cmd,scope); } }; 


The internal structure of the procedure is slightly different from the function. Additionally, we pass to the script a link to an instance of the SqlPipe object, so that we can return the table result and display messages.



Aggregate function


Aggregation function cannot be created using Transact SQL. The only option is to use CLR builds. Why this becomes so clear when you first look at the structure of the aggregating function of the CLR.

Aggregation Function Code
 /// <summary> ///   /// </summary> [Serializable] [SqlUserDefinedAggregate( Format.UserDefined, //    ,  IBinarySerialize IsInvariantToNulls = false, IsInvariantToDuplicates = false, IsInvariantToOrder = false, MaxByteSize = 8000) ] public class pyAggregate : IBinarySerialize { /// <summary> ///   /// </summary> public ScriptScope scope; string init; string merge; string terminate; string accumulate; string save; string restore; /// <summary> ///      /// </summary> public void Init() { //    scope = pyCore.pyCore.CreateScope(); } /// <summary> ///  (    ) /// </summary> /// <param name="value">   </param> /// <param name="init"> </param> /// <param name="accumulate"> </param> /// <param name="merge">   </param> /// <param name="terminate"> </param> /// <param name="save"> </param> /// <param name="restore"> </param> public void Accumulate(object value, string init, string accumulate, string merge, string terminate, string save, string restore) { //   ,     scope.SetVariable("value", value); //      -           if (this.init == null) { this.init = init; this.merge = merge; this.terminate = terminate; this.accumulate = accumulate; this.save = save; this.restore = restore; pyCore.pyCore.py_exec(this.init, scope); } //    pyCore.pyCore.py_exec(this.accumulate, scope); } /// <summary> ///    /// </summary> /// <param name="other"></param> public void Merge(pyAggregate other) { pyCore.pyCore.py_setvar(scope, "otherdata", other); pyCore.pyCore.py_exec(this.merge, scope); } /// <summary> ///     /// </summary> /// <returns></returns> public object Terminate() { return pyCore.pyCore.py_exec(this.terminate, scope); } /// <summary> /// .   IBinarySerialize /// </summary> /// <param name="r"> </param> public void Read(BinaryReader r) { //    this.init = r.ReadString(); this.merge = r.ReadString(); this.accumulate = r.ReadString(); this.terminate = r.ReadString(); this.save = r.ReadString(); this.restore = r.ReadString(); //    scope = pyCore.pyCore.CreateScope(); //      BinaryReader, //       pyCore.pyCore.py_setvar(scope, "reader", r); pyCore.pyCore.py_exec(this.restore, scope); } /// <summary> /// .   IBinarySerialize /// </summary> /// <param name="w"> </param> public void Write(BinaryWriter w) { //    w.Write(this.init); w.Write(this.merge); w.Write(this.accumulate); w.Write(this.terminate); w.Write(this.save); w.Write(this.restore); //      BinaryWriter, //       pyCore.pyCore.py_setvar(scope, "writer", w); pyCore.pyCore.py_exec(this.save, scope); } } 




We implement the IBinarySerialize interface to provide the script with the ability to save its state and the intermediate result of calculations. Since the Init function takes no arguments, the initialization script has to be executed when the Accumulate function is first run. Our aggregation function accepts script texts to handle each event. The scripts texts themselves are stored in the internal fields of the object and serialized.



Creating a pySQL assembly in the database and declaring functions


Now that the build is ready, it needs to be deployed in the CLR database.

 CREATE ASSEMBLY PYSQL FROM N'<   …>\pySQL.dll' WITH PERMISSION_SET = UNSAFE 


Now we will declare our functions and procedures.

 --   ,      CREATE FUNCTION [dbo].[pyFunc] (@cmd nvarchar(MAX)) RETURNS sql_variant AS EXTERNAL NAME PYSQL.[UserDefinedFunctions].[pyFunc]; GO --      CREATE FUNCTION [dbo].[pyFunc1] (@cmd nvarchar(MAX), @arg1 sql_variant) RETURNS sql_variant AS EXTERNAL NAME PYSQL.[UserDefinedFunctions].[pyFunc1]; GO --      CREATE FUNCTION [dbo].[pyFunc2] (@cmd nvarchar(MAX), @arg1 sql_variant, @arg2 sql_variant) RETURNS sql_variant AS EXTERNAL NAME PYSQL.[UserDefinedFunctions].[pyFunc2]; GO --   CREATE PROCEDURE pyProc (@code nvarchar(MAX)) AS EXTERNAL NAME PYSQL.StoredProcedures.pyProc GO --   CREATE AGGREGATE [dbo].[pyAggregate] ( @value sql_variant, @init nvarchar(MAX), @accumulate nvarchar(MAX), @merge nvarchar(MAX), @terminate nvarchar(MAX), @save nvarchar(MAX), @restore nvarchar(MAX) ) RETURNS sql_variant EXTERNAL NAME PYSQL.[pyAggregate]; GO 




results



If you have read this far, then you are entitled to reward yourself by seeing the results of your work.



Function


First of all, let's solve the problem with regular expressions - we will find email addresses in the line. The string itself will be inserted directly into the script during its formation.

 select [dbo].[pyFunc]( ' import re mails = re.findall("[.\\-_a-z0-9]+@(?:[a-z0-9][\\-a-z0-9]+\\.)+[az]{2,6}","'+doc+'") result = ": " for mail in mails: result += mail + "," result[:-1] ' ) from ( select '     somebody@gmail.com' doc union select '     person1@mail.ru  person2@list.ru' doc ) SAMPLE_DATA 


Result:

Found: somebody@gmail.com

Found: person1 @ mail.ru, person2 @ list.ru



The same, but using parameters. In my opinion, a more beautiful way, but requiring a function declaration with the required number of arguments.

 select [dbo].[pyFunc1]( ' import re mails = re.findall("[.\\-_a-z0-9]+@(?:[a-z0-9][\\-a-z0-9]+\\.)+[az]{2,6}",str(args[0])) result = ": " for mail in mails: result += mail + "," result[:-1] ', SAMPLE_DATA.doc ) from ( select '     somebody@gmail.com' doc union select '     person1@mail.ru  person2@list.ru' doc ) SAMPLE_DATA 


The result is naturally the same.



This function can also be used to calculate complex math functions not built into SQL Server and to dynamically evaluate expressions (this can be achieved with sp_execute).

 declare @InputFromula as nvarchar(MAX) SET @InputFromula = 'math.log(math.cosh(int(args[0]))/math.e,int(args[1]))' select [dbo].[pyFunc2] ( 'import math '+@InputFromula, 100,5 ) 




Stored procedure


I will give a complete example at once. Here we write a text message using the SqlPipe object, carefully passed to our script, then we form a table, fill it with data and return it as a result.

 exec pyProc ' import clr clr.AddReference("System.Data") from Microsoft.SqlServer.Server import * from System.Data import * from System import DateTime Pipe.Send(" : !") metadata = ( SqlMetaData(" ", SqlDbType.NVarChar, 12), SqlMetaData("  ", SqlDbType.Int), SqlMetaData("  ", SqlDbType.DateTime) ) record = SqlDataRecord(metadata) record.SetString(0, "bocharovf"); record.SetInt32(1, 1000000); record.SetDateTime(2, DateTime.Now); #   Pipe.Send(record) 1 ' 


Result of performance:

We write the message: let's go!

Employee nameExpected salaryExpected date of increase
bocharovf1,000,0002012-12-31 02: 39: 51.293


(1 row (s) affected)



Aggregate function


We list the languages ​​used in the article, separated by commas, using our aggregating function. In the serialization and deserialization scripts we use references to instances of the BinaryReader and BinaryWriter classes passed to our script. The result is accumulated in the variable data.

 select dbo.pyAggregate ( --   SAMPLE_DATA.[language], --  'data = ""', --  'data += str(value) + ", "', '# nop', --   'data[:-2]', --  'writer.Write(str(data))', --  'data = reader.ReadString()' ) as Languages ,SAMPLE_DATA.IsUsed from ( select 'C#' [language], '  ' IsUsed union select 'T-SQL', '  ' union select 'IronPython', '  ' union select 'Cobol', '   ' union select 'Ada', '   ' union select 'Lisp', '   ' union select 'Fortran', '   ' ) SAMPLE_DATA group by SAMPLE_DATA.IsUsed 


Result:

LanguagesIsUsed
C #, IronPython, T-SQLUsed in article
Ada, Cobol, Fortran, LispNot used in the article


(2 row (s) affected)



Performance


As expected, the performance is low. For example, the right-to-left output rate of the standard T-SQL REVERSE function and using the cut operation in python is almost 80 times different.



Security


 exec pyProc ' from System.Net import WebClient content = WebClient().DownloadString("http://habrahabr.ru/") Pipe.Send(content[:4000]) 1 ' 


Since our capabilities are not limited by anything, we can execute any code, including access to a resource on the Internet or delete a file from the disk. Therefore, it is necessary to distribute the rights with care and use mainly the transfer of parameters to the function instead of creating a script in parts.



Instead of conclusion



Each tool should only be used where it really makes sense. You should not try to rewrite all your stored procedures and functions using IronPython. Rather, functions on IronPython are suitable for implementing complex algorithms that use functionality that is missing in Transact SQL or processing data from external sources (file system, Internet). In addition to IronPython, anyone can embed IronRuby support or, for example, Javascript .NET.

Source: https://habr.com/ru/post/164633/



All Articles