Introduction
Databases came to the world in the year 70. Since then, they have become larger in size, with a more complex logic and continue to grow. Since then, many tools have appeared for parsing SQL and building dependencies. All of them break SQL into tokens using one or another parser and grammars.
But common parsers (ANTLR, for example) may encounter script parsing errors, each SQL dialect has its own peculiarities. Analysis time can also be significant on large databases.
I want to show a much simpler way: RegEx + a little trick,
so…
Home "chip"
- Get a single-file SQL script for all db objects
- Split the SQL script into text blocks using a regular expression that matches the DDL statements
- Find some text string among these text blocks.
- ...
- PROFIT !!! 1
Realization or how everything works
1. Create a script of all SQL objects. One file
Each RDBMS has the ability to generate drop / create SQL scripts. So, first we get a single SQL script for all database objects. Or we can take northwind.sql as an example.
')
2. Find in the script all drop / create / alter DDL using regular expressions
I use this expression
\b(create|alter)\s+(unique)*\s*(nonclustered)*\s*(procedure|proc|table|trigger|view|function|index)\b\s([^=<>\s\']+).([^=<>\s\'\(]+)[*\s\b\t]*
FastColoredTextBox provides excellent tools for working with regular expressions, so we use
var result = range.GetRangesByLines(regexStr, RegexOptions.IgnoreCase);
Learn more about regular expressions used.I wrote some useful regular expressions and collected them in one static class.
public static class RegexValues { public static string SqlCmdObjects = @"\[\$\(([^=<>\[\]\s\']+)\)\].\[[^=<>\s\']+\]"; public static string SqlCmdObjectsShort = @"\$\(([^=<>\[\]\s\']+)\)"; public static string DdlObjects = @"\b(create)\s+(procedure|proc|table|trigger|view|function)\b\s\[\$\(([^=<>\[\]\s\']+)\)\].\[[^=<>\s\']+\]"; public static string DdlObjectsPrepared = @"\b(create)\s+(procedure|proc|table|trigger|view|function)\b\s\[([^=<>\[\]\s\']+)\].(\[[^=<>\s\']+\])"; public static string DdlObjects_ = @"\b(create)\s+(procedure|proc|table|trigger|view|function)\b\s\$\(([^=<>\[\]\s\']+)\).[^=<>\s\']+"; public static string DdlObjectsPrepared_ = @"\b(create)\s+(procedure|proc|table|trigger|view|function)\b\s([^=<>\s\']+).([^=<>\s\'\(]+)[*\s\b\t]*"; public static string DdlObjectsPreparedProcViewTrigger = @"\b(create)\s+(procedure|proc|trigger|view|function)\b\s([^=<>\s\']+).([^=<>\s\'\(]+)[*\s\b\t]*"; public static string DdlObjectsPreparedWithIndex = @"\b(create)\s+(unique)*\s*(nonclustered)*\s*(procedure|proc|table|trigger|view|function|index)\b\s([^=<>\s\']+).([^=<>\s\'\(]+)[*\s\b\t]*"; public static string DdlIndexAll = @"\b(create|alter)\s+(procedure|proc|table|trigger|view|function)\b\s\[([^=<>\[\]\s\']+)\].\[[^=<>\s\']+\]"; public static string Variables = @"\@([^=<>\s\'\)\(\,]+)"; public static string SqlCmdVariables = @"\:SETVAR\s+([a-zA-Z_]+)\s+([a-zA-Z_]+)"; }
- SqlCmdObjects - finds lines of this type [$ (some_var)]. [Obj_name] or [$ (some_var)]. [Obj_schema]. [Obj_name]
- SqlCmdObjectsShort - corresponds to $ (some_var)
- DdlObjects - finds the same as SqlCmdObjects, but in conjunction with the instructions CREATE PROC / TABLE / VIEW / FUNC / TRIGGER
- DdlObjectsPrepared is the same as DdlObjects, but the constraint $ (sqlcmd) is removed
- DdlObjects_ - the same as DdlObjects, but the restriction on brackets [] is removed
- DdlObjectsPrepared_ - the same as DdlObjectsPrepared, but the bracket restriction [] is removed
- DdlObjectsPreparedProcViewTrigger - DDL statements for proc, views, triggers and functions only, i.e. only for "procedural" objects
- DdlObjectsPreparedWithIndex - the same as DdlObjectsPrepared_, but extended with indexing
- DdlIndexAll is the same as DdlObjectsPrepared_, but the ALTER statement is also taken into account.
- Variables is a regular expression that finds all SQL variables in a script.
- SqlCmdVariables - find SQL CMD variables (for example: SETVAR var1 val1)
The basis of this software is actually the following lines of code:
var range = Syncronized(() => fastColoredTextBox1.Selection.Clone()); range.Normalize(); range.Start = new Place(0, 0); range.End = Syncronized(() => new Place(tb.GetLineLength(tb.LinesCount - 1), tb.LinesCount - 1));
ie, breakdown of the SQL file into ranges (Range).
We simply load the SQL script into FastColoredTextBox (thanks to the authors!) And apply some regular expression to all its contents. As a result, we obtained the RESULT variable, which contains a list of the ranges found.
Range is a powerful class (thanks again to the author FastColoredTextBox). The range contains the row number and column number of the found fragment in the SQL script. We simply save the list of ranges in a virtual list (common method), and on SelectedIndexChanged we do the following
fastColoredTextBox1.Selection = range; fastColoredTextBox1.DoSelectionVisible();
This provides us with convenient and fast navigation through the SQL file.
3. Find all occurrences of the SQL object (or arbitrary string) in the file.
We create another list of ranges (the Range class from the FastColoredTextBox library), but instead of a regular expression string we put the name of some SQL object
4. The intersection of two lists of ranges - the same algorithmic focus.
The last step is to find entries from the second list of ranges to the first. This is done in the method
private void FindUsage(string regex)
see class
Form1.csThat's it, the rest of the work is just plain .NET coding to put all the components together.
TOTAL + small HOWTO1. find all create / alter instructions with regex. And remember these line numbers.
2. find all occurrences of the text (object, for example, the name of the table) using a regular expression. And also remember the line numbers.
3. we connect these two data sets, finding where the text line is located between the create / alter lines
As a result, we have a simple GUI designer, where you can:
- Open the SQL script or the entire folder (the scripts will be merged into a temporary file and opened as a single script)
- find all DDL statements (CREATE / ALTER) - the list will be created in the left pane with full navigation support.
- select parts of the text with the mouse (for example, some name of the SQL object)
- right-click and select "Find Usage" - the list will be created in the right pane - these are SQL objects containing the selected text
Also, you can search not only by the names of SQL objects, but also virtually any part of the code, text, comment, etc.
Important note
Because this way of analyzing SQL dependencies is not based on metadata, you should always remember that you are combining two sets of data from strings. Thus, there are some limitations or, better to say, “features” :)
Let's say we have a stored procedure
create proc test as declare @somevar int create table
If we use the regex instruction that matches the “CREATE TABLE”, we will get the #TMP table together with the CREATE PROC in the results of the match.
Then, if we try to find “COLUMN_WE_SEARCH”, it will be found inside this temporary table #TMP, and not in the TEST procedure.
It can be beat using a more accurate initial regular expression. For such cases, I wrote the regex DdlObjectsPreparedProcViewTrigger operator (see above) - matching DDL statements for proc, views, triggers and functions, i.e. only for SQL objects containing code.
Different and tasty
I would like in the future to add support for
MSAGL for visualization of dependencies and greater visibility!
FastColoredTextBox is a hi-end, complete breakthrough! I did not even expect him to have so many interesting features!
→
Sources→
Build version (archive with exe file)