Parsing in Prolog

The publication of the first part ( habrahabr.ru/post/274603 ) caused a rather extensive and interesting discussion on various aspects of the PROLOG language.
The goal was to show experienced, and not so much to programmers, that there is nothing difficult in the Prologue, and everyone can use it in their work.
For some reason there were no questions directly on the text of the publication. I will think that everything is clear there.
Let us begin to consider the more practical aspects of programming in the Prolog language.
In this section, we consider the main aspects of the implementation of the PROLOG language of translation of context-free languages on the example of arithmetic expressions.
As you know, PROLOG originated from attribute grammars. Therefore, parsing is the first and natural application of this language.
Unlike a tool such as regular expressions, it is easy to write translators for PROLOG for more complex context-free languages.
For the translator, you must have a lexical analyzer, a parser and an object code generator.

1. Lexical analysis

The input lexical analyzer has a string of characters, the output has a list of lexical units. Consider a lexical analyzer for arithmetic expressions.
The original predicate lexr / 2 converts a string of characters into a list of character codes, removes spaces (delb / 2), and then goes to lexr1 / 2, which recursively scans the input list, extracting from it lexical units — service characters and integers — sequences of numbers .
The predicate lexr1 / 2 terminates when the input list of characters is exhausted.
The term / 4 predicate from the input list of characters highlights the next lexical unit Hs, which is written to the beginning of the output list of the calling predicate. The fourth argument is the remaining list, which will be processed in the next recursion step. The second argument is a service one, for accumulating a list of digits of an integer. In the beginning we set it to [].
The term / 4 predicate consists of four procedures for four situations that may arise when viewing an input list.
If there is a service symbol in the head of the list, put it in the output list, having previously converted it to a string. This is provided that the working variable contains an empty list.
If the next item in the list is a service symbol, but there are values in the working variable, we convert the accumulated list of digits from the second argument into a string and put it into the output list. In this case, the service symbol is entered into the output list for re-consideration at the next recursive step.
If the next character is a digit, then it is entered at the end of the second argument - the service list and the recursion continues.
The fourth procedure is enabled when the input list is exhausted when searching for the next digit of the number. Exhaustion of the list means the completion of viewing the whole number.
The service predicates spec / 1, digit / 1 provide verification of the character code for compliance with the service character or digit.
The member / 2 predicate checks for an item in the list.
The append / 3 predicate joins two lists.

lexr(S,R):- list_text(L,S), delb(L,L1),lexr1(L1,R),!.

lexr1(L,[Hs|T1]):-
        term(L,[],Hs,L1),
        lexr1(L1,T1).
lexr1([],[]).

term([H|T],[],Hs,T):- spec(H),list_text([H],Hs).
term([H|T],L,I,[H|T]):- 
       L\=[],        
        spec(H),
        list_text(L,Ls),
        int_text(I,Ls).
term([H|T],L,R,Lr):- 
       digit(H),        
        append(L,[H],L1),
        term(T,L1,R,Lr).
term([],L,I,[]):-
        L\=[],
        list_text(L,Ls),
        int_text(I,Ls).

delb([32|T],L):-delb(T,L).
delb([H|T],[H|T1]):-delb(T,T1).
delb([],[]).

spec(H):-member(H,"+-*/()").
digit(H):- H>47, H<58.
append([H|T],L,[H|L2]):-append(T,L,L2).
append([],L,L).
member(H,[H|T]).
member(H,[H1|T]):-H\=H1,member(H,T).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%   
lr:-
        A = $(10-237*65+3/837467)-(342-678)$,
        lexr(A,L),
        write(A),nl,
        write(L),nl.

ex -->  sterm. 
ex -->  sterm, adsign,ex.
sterm --> adsign, term.
sterm --> term.
term --> factor.
term --> factor,musign,term1.
term1 --> term.
term1 --> term, musign, term1.
factor --> [N],{number(N)}.
factor --> lb,ex,rb.
adsign --> ['+']. adsign --> ['-'].
musign --> ['*']. musign --> ['/'].
lb --> ['(']. rb --> [')'].

.
e:-
A=[12,’-‘,98,’*’,’(‘,19,’+’,34,’*’,’(‘,200,’-‘,23,’)’,’)’, ],
ex(A,[]).

, :

es:-
read_line(0,S),
lexr(S,L),
ex(L,[]).

, . expand_term/2, , DCG – .

expand_term((a--> b,c,d),L).
:
L = a(A,B):-b(A,C),c(C,D),d(D,B)

expand_term((a--> [‘+’]),L).

:
L= a([‘+’|T],T)

, – , . , , «» .
, , , , .
, DCG – , – .
, «» DCG – , . DCG , .
, , «Yes» «No».
– . .

3.

, . , , .

ex(R) -->  sterm(R). 
ex([S,R1,R2]) -->  sterm(R1), adsign(S),ex(R2).
sterm([S,R,[]]) --> adsign(S), term(R).
sterm(R) --> term(R).
term(R) --> factor(R).
term([S,R1,R2]) --> factor(R1),musign(S),term1(R2).
term1(R) --> term(R).
term1([S,R1,R2]) --> term(R1), musign(S), term1(R2).
factor(N) --> [N],{number(N)}.
factor(R) --> lb,ex(R),rb.
adsign('+') --> ['+']. adsign('-') --> ['-'].
musign('*') --> ['*']. musign('/') --> ['/'].
lb --> ['(']. rb --> [')'].

e3:-
        S='10-3-5+4',
        lexr(S,L),
        ex(R,L,[]),
        calc(R,Res),
        write(S),nl,
        write(R),nl,
        write(Res),nl.

:

[-,10,[-,3,[+,5,4]]]

calc/2 .

calc([S,A1,A2],Nr):-calc(A1,N1),calc(A2,N2),calc1(S,N1,N2,Nr),!.
calc(A1,A1):-A1\=[_|_].
calc1(*,N1,N2,Nr):- Nr is N1*N2.
calc1(/,N1,N2,Nr):- Nr is N1/N2.
calc1(+,N1,N2,Nr):- Nr is N1+N2.
calc1(-,N1,N2,Nr):- Nr is N1-N2.

– «16». , «6»! – – . , .
, , .

ex([S,R1,R2]) --> sterm(R1), adsign(S),ex1(R2).

ex([S,R1,R2]) --> ex(R1), adsign(S),term(R2).

. ? – «» , – !

ex(E)-->eterm(E).
ex([S,E1,E2])-->sterm(E1),sn(S),eterm(E2).

sterm(E)-->eterm(E).
sterm([S,E1,E2])-->eterm(E1),sn(S),eterm(E2).
sterm([S2,[S1,E1,E2],E3])--> eterm(E1),sn(S1),sterm(E2),sn(S2),eterm(E3).

eterm(E)-->fct(E).
eterm([S2,[S1,E1,E2],E3])--> fct(E1),sn2(S1),eterm(E2),sn2(S2),fct(E3).
eterm([S,E1,E2])-->fct(E1),sn2(S),fct(E2).
sn2(*)-->[*]. sn2(/)-->[/]. sn2(div)-->[div]. sn2(mod)-->[mod]. sn2(and)-->[and]. 
fct(E)-->number(E).
fct(E)-->lb,ex(E),rb.
fct(E)-->snot,fct(E).
number(X)-->[X],{number(X)}.
lb-->['('].
rb-->[')'].
sg(+)-->[+].
sg(-)-->[-].
sn(E)-->sg(E).

[+,[-,[-,10,3],5],4]

«6».

4.

, calc/2.
, , , .

ex(E)-->eterm(E).
ex(R)-->sterm(E1),sn(S),eterm(E2),{clc([S,E1,E2],R)}.

sterm(E)-->eterm(E).
sterm(R)-->eterm(E1),sn(S),eterm(E2),{clc([S,E1,E2],R)}.
sterm(R)-->eterm(E1),sn(S1),sterm(E2),sn(S2),eterm(E3),{clc([S1,E1,E2],N),
        clc([S2,N,E3],R)}.

eterm(E)-->fct(E).
eterm(R)-->fct(E1),sn2(S1),eterm(E2),sn2(S2),fct(E3),{clc([S1,E1,E2],N),
        clc([S2,N,E3],R)}.
eterm(R)-->fct(E1),sn2(S),fct(E2),{clc([S,E1,E2],R)}.
sn2(*)-->[*]. sn2(/)-->[/]. sn2(div)-->[div]. sn2(mod)-->[mod]. sn2(and)-->[and]. 
fct(E)-->number(E).
fct(E)-->lb,ex(E),rb.
number(X)-->[X],{number(X)}.
lb-->['('].
rb-->[')'].
sg(+)-->[+].
sg(-)-->[-].
sn(E)-->sg(E).
clc(A1,A1):-A1\=[_|_].
clc([*,N1,N2],Nr):- Nr is N1*N2.
clc([/,N1,N2],Nr):- Nr is N1/N2.
clc([+,N1,N2],Nr):- Nr is N1+N2.
clc([-,N1,N2],Nr):- Nr is N1-N2.

. «» , .. .
- , , HTML, XML.

— .

Source: https://habr.com/ru/post/274799/

All Articles

Parsing in Prolog

More articles: