📜 ⬆️ ⬇️

Parsing with ANTLR 4 T-SQL snapshots to get connections between tables

Good day friends.

The task is designated as abstract, i.e. for mastering the methods of solving, although anyone can find practical application.

Initially, the idea arose when I came to the office of a SQL developer. There were hundreds of stored procedures, broken documentation, there were no links between the tables.
')
To the point. Further use of Antlr and code samples.

Thanks to colleagues with foxclub.ru and KvanTTT for criticism and this:
Yes, and I don’t understand what the accept method in the Visitor template is used for.
I use the current Visit *

Answer KvanTTT

- You will not need it, tk. used in generated code and calls the required Visit methods.

Antlr - this thing generates lexers and parsers in different languages, by default java.

Building the antlr itself

set PATH=c:\apache-maven-3.3.9\bin\;%PATH% rem mvn -DskipTests install mvn install -DskipTests=true 

build c: \ dev \ antlr4 \ runtime \ CSharp \ runtime \ CSharp \ Antlr4.vs2013.sln

Add to tsql.g4 (SQL grammar file for antlr):

  options { language = CSharp; } 

generating parser with gen_csharp.cmd

  java -jar c:\dev\antlr4\tool\target\antlr4-4.7-SNAPSHOT-complete.jar -o Generated_Csharp tsql.g4 rem java -jar c:\antlrworks2\antlrworks2\modules\ext\antlr4.jar -o Generated_Csharp tsql.g4 pause 

We tsqlBaseListener.cs, tsqlBaseVisitor.cs, tsqlLexer.cs, tsqlListener.cs, tsqlParser.cs, tsqlVisitor.cs, tsql.tokens, tsqlLexer.tokens eight files: tsqlBaseListener.cs, tsqlBaseVisitor.cs, tsqlLexer.cs, tsqlListener.cs, tsqlParser.cs, tsqlVisitor.cs, tsql.tokens, tsqlLexer.tokens

Create a project and add the received files and link to it.
c:\dev\antlr4\runtime\CSharp\runtime\CSharp\Antlr4.Runtime\bin\net35\Release\Antlr4.Runtime.dll

Well further, let's fix the parsers code ..., but

Antlr provides two design patterns to choose from: Listener and Visitor.

I tried to deal with Visitor first, but as it turned out, it is more often used for constantly changing code, which may contain syntax errors, and methods use return of the result of parsing.

And T-SQL stored procedures are compiled, i.e. are static and do not contain syntax errors. And the code using Listener to write less. Bypass code using Visitor I put at the end.

So, our program Program.cs with Listener

 using System; using System.IO; using Antlr4.Runtime; using Antlr4.Runtime.Misc; namespace tsql1 { class Program { static void Main(string[] args) { //try //{ string text = System.IO.File.ReadAllText(@"c:\dev\antlr4\grammars-v4-master\tsql\examples\dml_select.sql"); StringReader reader = new StringReader(text); //       ... AntlrInputStream input = new AntlrInputStream(reader); //      tsqlLexer lexer = new tsqlLexer(input); //       CommonTokenStream tokens = new CommonTokenStream(lexer); //   tsqlParser parser = new tsqlParser(tokens); // Specify our entry point //tsqlParser.Query_specificationContext tsqlParser.Tsql_fileContext Tsql_fileContext1 = parser.tsql_file(); Console.WriteLine("Tsql_fileContext1.ChildCount = " + Tsql_fileContext1.ChildCount.ToString()); // Walk it and attach our listener Antlr4.Runtime.Tree.ParseTreeWalker walker = new Antlr4.Runtime.Tree.ParseTreeWalker(); AntlrTsqListener listener = new AntlrTsqListener(); walker.Walk(listener, Tsql_fileContext1); //} // catch (Exception e) //{ // Console.WriteLine(e.Message); //} Console.ReadKey(); } } public class AntlrTsqListener : tsqlBaseListener { private enum JoinMode { Undefined, Where, Join }; private JoinMode mode; private enum BranchType { Select, Table_sources, Search_condition //Join }; private BranchType branch; private string alias = ""; public override void EnterQuery_specification(tsqlParser.Query_specificationContext ctx) { mode = JoinMode.Undefined; } public override void EnterTable_sources(tsqlParser.Table_sourcesContext ctx) { if (ctx.ChildCount > 1) mode = JoinMode.Where; branch = BranchType.Table_sources; } public override void EnterTable_source_item_joined([NotNull] tsqlParser.Table_source_item_joinedContext ctx) { if ((mode == JoinMode.Undefined & ctx.ChildCount == 1) || (mode == JoinMode.Where)) return; mode = JoinMode.Join; branch = BranchType.Table_sources; } public override void EnterTable_name_with_hint([NotNull] tsqlParser.Table_name_with_hintContext ctx) { if (mode == JoinMode.Undefined) return; if (branch == BranchType.Table_sources) Console.WriteLine(branch.ToString()); alias = ""; } public override void EnterTable_name([NotNull] tsqlParser.Table_nameContext ctx) { if (branch == BranchType.Search_condition || branch == BranchType.Select || mode == JoinMode.Undefined) return; Console.WriteLine(ctx.GetText()); } public override void EnterTable_alias([NotNull] tsqlParser.Table_aliasContext ctx) { if (branch == BranchType.Search_condition || branch == BranchType.Select | mode == JoinMode.Undefined) return; alias = ctx.GetChild(0).GetText(); Console.WriteLine("alias=" + alias); } public override void EnterSearch_condition([NotNull] tsqlParser.Search_conditionContext ctx) { if (mode == JoinMode.Undefined) return; branch = BranchType.Search_condition; Console.WriteLine("Search_condition"); Console.WriteLine(ctx.GetText()); return; } public override void EnterSelect_statement([NotNull] tsqlParser.Select_statementContext ctx) { Console.WriteLine("Select_statement"); branch = BranchType.Select; return; } 

Input data:

 select * from t1, t2 where t1.id = t2.id SELECT p.* FROM Production.Product AS p ORDER BY Name ASC; GO select * from zxc as t1 inner join qwe t2 on t1.id = t2.id inner join asd t3 on t3.id = t2.id ... 

Example output:

 Tsql_fileContext1.ChildCount = 105 Select_statement Table_sources t1 Table_sources t2 Search_condition t1.id=t2.id Select_statement Select_statement Table_sources zxc alias=t1 Table_sources qwe alias=t2 Search_condition t1.id=t2.id Table_sources asd alias=t3 Search_condition t3.id=t2.id ... 

This result can be expanded in Excel if you want.

Everything.

Please do not hit hard. Maybe something forgot, maybe something is outdated.

Grammar for antlr on github
Old grammar for pastebin t-sql
Antlr on antlr.org

Bypass Code Using Visitor
  using System; using System.IO; using Antlr4.Runtime; using Antlr4.Runtime.Misc; namespace tsql1 { class Program { static void Main(string[] args) { //try //{ string text = System.IO.File.ReadAllText(@"c:\dev\antlr4\grammars-v4-master\tsql\examples\dml_insert.sql"); StringReader reader = new StringReader(text); //       ... AntlrInputStream input = new AntlrInputStream(reader); //      tsqlLexer lexer = new tsqlLexer(input); //       CommonTokenStream tokens = new CommonTokenStream(lexer); //   tsqlParser parser = new tsqlParser(tokens); // Specify our entry point //tsqlParser.Query_specificationContext tsqlParser.Tsql_fileContext Tsql_fileContext1 = parser.tsql_file(); Console.WriteLine("Tsql_fileContext1.ChildCount = " + Tsql_fileContext1.ChildCount.ToString()); /* // Walk it and attach our listener Antlr4.Runtime.Tree.ParseTreeWalker walker = new Antlr4.Runtime.Tree.ParseTreeWalker(); AntlrTsqListener listener = new AntlrTsqListener(); walker.Walk(listener, Tsql_fileContext1);*/ AntlrTsqVisitor visitor = new AntlrTsqVisitor(); var result = visitor.Visit(Tsql_fileContext1); //} // catch (Exception e) //{ // Console.WriteLine(e.Message); //} Console.ReadKey(); } } /* public class AntlrTsqListener: tsqlBaseListener { public override void EnterQuery_specification(tsqlParser.Query_specificationContext ctx) { int ii = 0; //Console.WriteLine(ctx.ToStringTree()); Console.WriteLine("ctx.ChildCount" + ctx.ChildCount.ToString()); for (ii = 0; ii < ctx.ChildCount; ++ii) { Console.WriteLine("ii=" + ii.ToString()); Console.WriteLine(ctx.GetChild(ii).GetType().ToString()); Console.WriteLine(ctx.GetChild(ii).GetText()); } //Console.WriteLine(ctx.GetChild<tsqlParser.i>().ToString()); } } */ public class AntlrTsqVisitor : tsqlBaseVisitor<String> { /* public override string VisitSql_clauses(tsqlParser.Sql_clausesContext ctx) { Console.WriteLine("VisitSql_clauses"); return VisitChildren(ctx).ToString(); } */ public override string VisitSql_clause(tsqlParser.Sql_clauseContext ctx) { Console.WriteLine("VisitSql_clause"); try { return VisitDml_clause(ctx.dml_clause()).ToString(); } catch (Exception e) { return ""; } } /* public override string VisitDml_clause(tsqlParser.Dml_clauseContext ctx) { Console.WriteLine("VisitDml_clause"); return VisitChildren(ctx).ToString(); } */ public override string VisitSelect_statement([NotNull] tsqlParser.Select_statementContext ctx) { Console.WriteLine("VisitSelect_statement"); return VisitTable_sources(ctx.query_expression().query_specification().table_sources()).ToString(); } public override string VisitDelete_statement([NotNull] tsqlParser.Delete_statementContext ctx) { Console.WriteLine("VisitDelete_statement"); try { return VisitTable_sources(ctx.table_sources()); } catch (Exception e) { return ""; } } public override string VisitUpdate_statement([NotNull] tsqlParser.Update_statementContext ctx) { Console.WriteLine("VisitUpdate_statement"); try { return VisitTable_sources(ctx.table_sources()); } catch (Exception e) { return ""; } } public override string VisitInsert_statement([NotNull] tsqlParser.Insert_statementContext ctx) { Console.WriteLine("VisitInsert_statement"); try { return VisitTable_sources(ctx.insert_statement_value().derived_table().subquery().select_statement().query_expression().query_specification().table_sources()); } catch (Exception e) { return ""; } } /* public override string VisitTable_sources([NotNull] tsqlParser.Table_sourcesContext ctx) { Console.WriteLine("VisitTable_sources"); return VisitChildren(ctx).ToString(); } public override string VisitTable_source([NotNull] tsqlParser.Table_sourceContext ctx) { Console.WriteLine("VisitTable_source"); return VisitChildren(ctx).ToString(); } public override string VisitTable_source_item_joined([NotNull] tsqlParser.Table_source_item_joinedContext ctx) { Console.WriteLine("VisitTable_source_item_joined"); return VisitChildren(ctx).ToString(); } */ public override string VisitTable_source_item([NotNull] tsqlParser.Table_source_itemContext ctx) { Console.WriteLine("VisitTable_source_item"); int ii = 0; //Console.WriteLine(ctx.ToStringTree()); Console.WriteLine("ctx.ChildCount " + ctx.ChildCount.ToString()); for (ii = 0; ii < ctx.ChildCount; ++ii) { Console.WriteLine("ii=" + ii.ToString()); Console.WriteLine(ctx.GetChild(ii).GetType().ToString()); Console.WriteLine(ctx.GetChild(ii).GetText()); //if (ctx.GetChild(ii).GetType().ToString() == "tsqlParser+Table_sourcesContext") //{ // this.VisitTable_sources(ctx.table_sources()); //} } //Console.WriteLine(ctx.GetChild<tsqlParser.i>().ToString()); return ctx.ToString(); } public override string VisitJoin_part([NotNull] tsqlParser.Join_partContext ctx) { Console.WriteLine("VisitJoin_part"); int ii = 0; //Console.WriteLine(ctx.ToStringTree()); Console.WriteLine("ctx.ChildCount " + ctx.ChildCount.ToString()); for (ii = 0; ii < ctx.ChildCount; ++ii) { Console.WriteLine("ii=" + ii.ToString()); Console.WriteLine(ctx.GetChild(ii).GetType().ToString()); Console.WriteLine(ctx.GetChild(ii).GetText()); if (ctx.GetChild(ii).GetType().ToString() == "tsqlParser+Table_sourceContext") { this.VisitTable_source(ctx.table_source()); } } //Console.WriteLine(ctx.GetChild<tsqlParser.i>().ToString()); return ctx.ToString(); } } } 

Source: https://habr.com/ru/post/346038/


All Articles