
A simple JSON parser in PL/SQL

Just yesterday a problem suddenly came up: I needed to parse JSON data directly in an Oracle stored procedure. Of course, Java can be embedded in Oracle for tasks like this, but I wanted something of my own, written directly in PL/SQL. I am sharing the results of my effort with the public; perhaps they will come in handy for someone.

To start, it helps to decide what exactly we are going to do. Let the source data live in a CLOB column of a table:

 create table ae_spec (
   id   number not null,
   name varchar2(30) not null,
   json CLOB
 );

 alter table ae_spec add constraint pk_ae_spec primary key(id);

We will place the result of parsing into a tree stored in a temporary table, used within the same transaction in which we parse the data:

 create global temporary table ae_json (
   id        number not null,
   parent_id number,
   name      varchar2(1000),
   value     varchar2(1000)
 ) on commit delete rows;

By loading data into this table, we can use the full power of SQL to process it.
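For instance, once the tree is populated, a hierarchical query can reconstruct the nesting. This is a sketch of my own, not part of the original package; it relies only on the fact that the root row is inserted without a parent_id:

```sql
-- Walk the parsed tree from the root (parent_id is null),
-- indenting each name/value pair by its depth.
select lpad(' ', 2 * (level - 1)) || name as name,
       value
from   ae_json
start with parent_id is null
connect by prior id = parent_id
order siblings by id;
```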
Now everything is ready to develop our small package:

Package preparation
 create or replace package ae_spec_pkg as
   procedure compile(p_name in varchar2);
 end ae_spec_pkg;
 /

 create or replace package body ae_spec_pkg as
   procedure compile(p_name in varchar2) as
   begin
     -- parse the JSON
     load(p_name);
     -- TODO:
     commit;
   exception
     when others then
       rollback;
       raise;
   end;
 end ae_spec_pkg;
 /


It is convenient to split JSON parsing into two procedures. The task of the scanner (the load procedure) is to walk the source text and extract a stream of tokens from it:

Scanner
 create or replace package body ae_spec_pkg as

   g_spec_state constant number default 0;
   g_name_state constant number default 1;

   procedure lexem(p_state in number
                  ,p_value in varchar2) as
   begin
     insert into ae_script_log(id, tp, value)
     values (ae_script_log_seq.nextval, p_state, p_value);
   end;

   procedure load(p_name in varchar2) as
     l_lob CLOB;
     l_str varchar2(1000) default null;
     l_len number default null;
     l_pos number default 1;
     l_ix  number default 1;
     l_st  number default g_spec_state;
     l_ch  varchar2(1) default null;
     l_val varchar2(1000) default null;
     l_qt  varchar2(1) default null;
     l_bs  number default 0;
   begin
     select json into l_lob from ae_spec where name = p_name for update;
     dbms_lob.open(l_lob, dbms_lob.lob_readonly);
     l_len := dbms_lob.getlength(l_lob);
     while l_pos <= l_len loop
       l_str := dbms_lob.substr(l_lob, 1000, l_pos);
       l_ix := 1;
       while l_ix <= length(l_str) loop
         l_ch := substr(l_str, l_ix, 1);
         if not l_qt is null then
           if l_bs = 1 then
             if not l_ch in (chr(13), chr(10)) then
               l_val := l_val || l_ch;
               l_bs := 0;
             end if;
             goto l;
           end if;
           if l_ch = '\' then
             l_bs := 1;
             goto l;
           end if;
           if l_ch = l_qt then
             lexem(l_st, l_val);
             l_st := g_spec_state;
             l_qt := null;
           else
             l_val := l_val || l_ch;
           end if;
           goto l;
         end if;
         if l_ch in ('{', '}', '[', ']', ':', ',', ' ', chr(9), chr(13), chr(10)) then
           if l_st = g_name_state then
             lexem(l_st, l_val);
           end if;
           if l_ch in ('{', '}', '[', ']', ':', ',') then
             lexem(g_spec_state, l_ch);
           end if;
           l_st := g_spec_state;
           goto l;
         end if;
         if l_ch in ('''', '"') then
           l_val := null;
           l_qt := l_ch;
           l_st := g_name_state;
           l_bs := 0;
           goto l;
         end if;
         if l_st = g_name_state then
           l_val := l_val || l_ch;
         else
           l_val := l_ch;
           l_st := g_name_state;
         end if;
         <<l>>
         l_ix := l_ix + 1;
       end loop;
       l_pos := l_pos + 1000;
     end loop;
     if l_st = g_name_state then
       lexem(l_st, l_val);
     end if;
     dbms_lob.close(l_lob);
   exception
     when others then
       if dbms_lob.isopen(l_lob) = 1 then
         dbms_lob.close(l_lob);
       end if;
       raise;
   end;

   ...

 end ae_spec_pkg;
 /


Since JSON is a very simple format, our scanner needs only two states: g_spec_state (waiting for the next control character) and g_name_state (accumulating the characters of a name or a value).
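To make the two states concrete, here is the token stream the scanner would log for the input { a: 1 }. This is my own trace of the state machine above; tp 0 corresponds to g_spec_state tokens (control characters) and tp 1 to g_name_state tokens (names and values):

```sql
-- Expected rows in ae_script_log after scanning '{ a: 1 }':
--   tp | value
--   ---+------
--    0 | {        control character
--    1 | a        name, flushed by the ':' that follows it
--    0 | :
--    1 | 1        value, flushed by the trailing whitespace
--    0 | }
select tp, value from ae_script_log order by id;
```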

To verify that the analysis is correct, we will for now write the result into a log table. Having convinced ourselves on several test examples that everything works as intended, we then change lexem to store the received data in the tree (with small changes to load to make it all fit together):
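The article does not show the DDL for this debug log. Based on the columns referenced in lexem, it could look like this (a sketch; the names ae_script_log and ae_script_log_seq are taken from the scanner code, the column sizes are my assumption):

```sql
-- Assumed shape of the debug log table and its sequence,
-- matching the insert in lexem (id, tp, value).
create sequence ae_script_log_seq;

create table ae_script_log (
  id    number not null,
  tp    number,          -- token state: 0 = control character, 1 = name/value
  value varchar2(1000)   -- the token text itself
);
```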

The finished parser
 create or replace package body ae_spec_pkg as

   g_spec_state constant number default 0;
   g_name_state constant number default 1;

   e_syntax_error EXCEPTION;
   pragma EXCEPTION_INIT(e_syntax_error, -20001);

   procedure lexem(p_state in number
                  ,p_value in varchar2
                  ,p_node  in out NOCOPY number) as
     l_id number default null;
     l_vl ae_json.name%type;
   begin
     if p_state = g_spec_state then
       if p_value in ('}', ']', ',') then
         select parent_id into p_node from ae_json where id = p_node;
       end if;
       if p_value in ('{', '[', ',') then
         select max(id) + 1 into l_id from ae_json;
         insert into ae_json(id, parent_id) values (l_id, p_node);
         p_node := l_id;
       end if;
       if p_value = ':' then
         select name into l_vl from ae_json where id = p_node;
         if l_vl is null then
           RAISE_APPLICATION_ERROR(-20001, 'Syntax error');
         end if;
       end if;
     else
       select name into l_vl from ae_json where id = p_node;
       if l_vl is null then
         update ae_json set name = p_value where id = p_node;
       else
         select value into l_vl from ae_json where id = p_node;
         if not l_vl is null then
           RAISE_APPLICATION_ERROR(-20001, 'Syntax error');
         end if;
         update ae_json set value = p_value where id = p_node;
       end if;
     end if;
   end;

   procedure load(p_name in varchar2) as
     l_lob  CLOB;
     l_str  varchar2(1000) default null;
     l_len  number default null;
     l_pos  number default 1;
     l_ix   number default 1;
     l_st   number default g_spec_state;
     l_ch   varchar2(1) default null;
     l_val  varchar2(1000) default null;
     l_qt   varchar2(1) default null;
     l_bs   number default 0;
     l_node number default 0;
   begin
     insert into ae_json(id) values (l_node);
     select json into l_lob from ae_spec where name = p_name for update;
     dbms_lob.open(l_lob, dbms_lob.lob_readonly);
     l_len := dbms_lob.getlength(l_lob);
     while l_pos <= l_len loop
       l_str := dbms_lob.substr(l_lob, 1000, l_pos);
       l_ix := 1;
       while l_ix <= length(l_str) loop
         l_ch := substr(l_str, l_ix, 1);
         if not l_qt is null then
           if l_bs = 1 then
             if not l_ch in (chr(13), chr(10)) then
               l_val := l_val || l_ch;
               l_bs := 0;
             end if;
             goto l;
           end if;
           if l_ch = '\' then
             l_bs := 1;
             goto l;
           end if;
           if l_ch = l_qt then
             lexem(l_st, l_val, l_node);
             l_st := g_spec_state;
             l_qt := null;
           else
             l_val := l_val || l_ch;
           end if;
           goto l;
         end if;
         if l_ch in ('{', '}', '[', ']', ':', ',', ' ', chr(9), chr(13), chr(10)) then
           if l_st = g_name_state then
             lexem(l_st, l_val, l_node);
           end if;
           if l_ch in ('{', '}', '[', ']', ':', ',') then
             lexem(g_spec_state, l_ch, l_node);
           end if;
           l_st := g_spec_state;
           goto l;
         end if;
         if l_ch in ('''', '"') then
           l_val := null;
           l_qt := l_ch;
           l_st := g_name_state;
           l_bs := 0;
           goto l;
         end if;
         if l_st = g_name_state then
           l_val := l_val || l_ch;
         else
           l_val := l_ch;
           l_st := g_name_state;
         end if;
         <<l>>
         l_ix := l_ix + 1;
       end loop;
       l_pos := l_pos + 1000;
     end loop;
     if l_st = g_name_state then
       lexem(l_st, l_val, l_node);
     end if;
     if l_node <> 0 then
       RAISE_APPLICATION_ERROR(-20001, 'Syntax error');
     end if;
     dbms_lob.close(l_lob);
   exception
     when others then
       if dbms_lob.isopen(l_lob) = 1 then
         dbms_lob.close(l_lob);
       end if;
       raise;
   end;

   procedure compile(p_name in varchar2) as
   begin
     load(p_name);
     -- TODO:
     commit;
   exception
     when others then
       rollback;
       raise;
   end;

 end ae_spec_pkg;
 /


In this implementation I did not try to catch every possible syntax error in the JSON data, limiting myself to detecting errors that make parsing outright impossible. In those cases all changes made in the transaction are rolled back and the corresponding exception is raised.

In addition, the parser (quite deliberately on my part) is significantly more lenient about the formatting of the source data. For example, it can parse the following description, which, strictly speaking, is not valid JSON:

 {
   tables: {
     AD_ACTIVATION_TYPE: {
       ID:           { attribute: id },
       ACT_DATE:     { attribute: start_date, sql: "is null" },
       ACT_PRIORITY: { attribute: priority },
       TYPE_ID:      { attribute: subtype },
       ACT_STATE:    { attribute: state, sql: "= 1" }
     }
   },
   attributes: {
     id:         { type: integer, is_mandatory },
     start_date: { type: date },
     priority:   { type: integer },
     subtype:    { type: integer, is_mandatory },
     state:      { type: integer, is_mandatory }
   }
 }

Of course, if necessary, parsing can be tightened by adding further checks, but I do not see much point in that, since correct JSON data is parsed without any problems.
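To try the parser end to end, one could load a small document and call compile (a sketch; the name 'demo' and the sample JSON are my own, not from the article):

```sql
-- Hypothetical smoke test: store a relaxed-JSON document and parse it.
insert into ae_spec(id, name, json)
values (1, 'demo', '{ a: { b: 1, c: "x" } }');

begin
  ae_spec_pkg.compile('demo');
end;
/
-- Note: compile ends with a commit, and ae_json is declared
-- "on commit delete rows", so the parsed tree is only visible
-- inside compile (where the TODO processing belongs), not after
-- the call returns.
```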

That's all. I will be glad if my post proves useful to someone.

Source: https://habr.com/ru/post/195496/

