Scala rule-based inference engine

Hello! I would like to present my open-source rule inference engine (forward chaining) with fuzzy logic support, under the working title Scala inference engine (sie) (code).

Update.
The library is now available in the Maven Central repository:

<dependency>
  <groupId>net.sf.brunneng.fusie</groupId>
  <artifactId>fusie</artifactId>
</dependency>

Where this engine stands among its peers

There are many good inference engines and expert systems for the JVM. Here is my short review of some of them (I make no claim to completeness or accuracy; these are only first impressions from a cursory look).
Drools: a mature, highly configurable open-source engine that uses forward chaining. The syntax for defining rules can be found here.

d3web: a fairly mature platform for building expert systems. It has its own wiki for editing rules and questions, building forms for input data, and unit testing. Its rule definition language is simple.

jColibry: as I understand it, this library is designed for interactively searching data among a large number of options.

InfoSapient: an inference engine that uses backward chaining and supports fuzzy logic. It allows rules to be described in a human-like language. In my opinion, however, it has several significant flaws, described here (pages 19-20):
“By current design, the rule base currently cannot access external data. This [...] goal during the session [...] solved, and all supporting information as well.”

“The rule syntax doesn’t permit the calculations, i.e. if ((a + b) is greater than m) then x; or executing programs for external objects.”

Jena: lets you represent data in the standard RDF format (a semantic network) and then extract the data of interest using the SPARQL query language.

mandarax: a rule compiler. Its disadvantage is that it is static: each rule set must be compiled into Java code, and this cannot be done dynamically.

And others.

As you can see, these engines are all very different: some use forward chaining, others backward chaining; some support fuzzy logic, others do not. Some have a simple syntax for defining rules, others a complicated one, and so on.

In sie, I tried to combine crisp and fuzzy inference, simple rule definition, and flexible configuration. Scala (rather than Java) was chosen from the start because it allows a functional style, which helps tame the expected complexity of the algorithms that had to be written. Nevertheless, the engine is built as a Maven artifact, so it can be added to any Java Maven project (with an additional dependency on scala-library), and everything will work.

Killer feature

Let us introduce the concept of "overlapping rules". These are rules that draw conclusions about the same variable. For example:

 when a > 300 then b = 5
 when a < 400 then b = 10

In this case, if 'a' takes a value between 300 and 400, both rules fire, and to continue inference the system must decide which way to go, since 'b' cannot be both 5 and 10. There are several ways to resolve such a conflict:

1) Choose the first or the last rule.
2) Assign rule priorities somewhere.
3) Choose the rule with the more complex condition (although in this example both conditions are equally complex), assuming that the simpler condition describes the general case and the more complex one describes special cases.

See the Drools conflict resolution strategies.
The current implementation takes a different route (one I have not seen anywhere else).
4) Assume that both rules are equally valid: the first holds with probability 0.5 and the second with probability 0.5 (or, in general, 1/N each if there are N overlapping rules).
From there, inference splits into branches and continues separately along each branch until the target variable is found. Afterwards, the total probability of each possible value of the target variable is computed over all branches.
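
The branching idea can be sketched in a few lines of Scala. This is only an illustration under simplifying assumptions, not the engine's actual code: `BranchingSketch`, `Rule`, `infer` and the three extra rules about a variable `c` are hypothetical names and data introduced here, all values are integers, and a branch that reaches a dead end simply contributes nothing.

```scala
object BranchingSketch {
  // Known variable values in one branch of the inference.
  type State = Map[String, Int]

  // Hypothetical simplified rule: fires when all `needs` are known and `condition` holds,
  // then assigns `value(state)` to `target`.
  final case class Rule(
      needs: Set[String],
      condition: State => Boolean,
      target: String,
      value: State => Int)

  // Returns the probability of each value of `goal`, splitting the weight
  // equally (1/N) between N overlapping rules and summing over all branches.
  def infer(rules: Seq[Rule], goal: String, state: State, weight: Double = 1.0): Map[Int, Double] =
    state.get(goal) match {
      case Some(found) => Map(found -> weight)
      case None =>
        // Rules that can fire now: inputs known, target still unknown, precondition true.
        val ready = rules.filter { r =>
          r.needs.subsetOf(state.keySet) && !state.contains(r.target) && r.condition(state)
        }
        ready.headOption match {
          case None => Map.empty // dead end: this branch cannot reach the goal
          case Some(first) =>
            // All ready rules that conclude the same variable overlap.
            val group = ready.filter(_.target == first.target)
            val share = weight / group.size
            group.foldLeft(Map.empty[Int, Double]) { (total, rule) =>
              val branch = infer(rules, goal, state + (rule.target -> rule.value(state)), share)
              branch.foldLeft(total) { case (acc, (v, p)) =>
                acc.updated(v, acc.getOrElse(v, 0.0) + p)
              }
            }
        }
    }

  val rules: Seq[Rule] = Seq(
    Rule(Set("a"), s => s("a") > 300, "b", _ => 5),   // when a > 300 then b = 5
    Rule(Set("a"), s => s("a") < 400, "b", _ => 10),  // when a < 400 then b = 10
    Rule(Set("b"), s => s("b") == 5, "c", _ => 1),    // hypothetical follow-up rules
    Rule(Set("b"), s => s("b") == 10, "c", _ => 2),
    Rule(Set("b"), s => s("b") == 10, "c", _ => 1)
  )
}
```

With a = 350, both rules for b fire, so each branch gets weight 0.5. The branch with b = 10 splits again into two branches of weight 0.25, so `BranchingSketch.infer(BranchingSketch.rules, "c", Map("a" -> 350))` yields probability 0.75 for c = 1 and 0.25 for c = 2.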

If we consider that rules can depend on each other in intricate ways through the variables used in their preconditions, and that conclusions can contain assignments to several variables (so that groups of overlapping rules are possible, for example X overlaps with Y on a, and Y overlaps with Z on b), then this simple verbal description of the idea turns into a rather non-trivial implementation. The main goal, which has been achieved, is the correct calculation of the probabilities of the values that the target variable can take.

Thus, the engine copes well with a base of possibly contradictory rules that overlap:

Unintentionally, when the rules were written by different experts, each with their own opinion.
Intentionally, when the same variable can be computed in different ways, for example:
```
 when graphicCardType == "Top" then graphicCard = "Nvidia super card"
 when graphicCardType == "Top" then graphicCard = "Radeon super card"
```
In this example, a graphics card adviser can recommend either an Nvidia card or a Radeon card with equal probability.

Thus, an element of fuzziness is preserved in the inference.

Problem Definition Language

Problem definition structure:

User variables: those whose values will be requested from the user during inference.
Inference rules, each consisting of preconditions, conclusions and, optionally, the probability that the rule holds (from 0 to 1, default 1).
The target: the name of the variable to be found.

The problem can be defined entirely programmatically, but it is much more convenient to use a special (non-XML) syntax that was designed to be concise and immediately understandable to anyone with programming experience.
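
To make the three-part structure concrete, here is a minimal data model in Scala. It is a sketch only: `UserVariable`, `Rule` and `Problem` are hypothetical names chosen for this illustration and are not the classes of the sie API, and preconditions and conclusions are kept as plain strings for brevity.

```scala
object ProblemModelSketch {
  // A variable whose value is requested from the user during inference.
  final case class UserVariable(typeName: String, name: String, question: Option[String] = None)

  // A rule: preconditions, conclusions and an optional probability (default 1.0).
  final case class Rule(
      preconditions: Seq[String],
      conclusions: Seq[String],
      probability: Double = 1.0) {
    require(probability >= 0.0 && probability <= 1.0, "probability must be between 0 and 1")
  }

  // The whole problem: user variables, rules, and the name of the variable to find.
  final case class Problem(userVariables: Seq[UserVariable], rules: Seq[Rule], target: String)

  // The two overlapping rules from the previous section, with 'a' asked from the user.
  val problem: Problem = Problem(
    userVariables = Seq(UserVariable("int", "a")),
    rules = Seq(
      Rule(Seq("a > 300"), Seq("b = 5")),
      Rule(Seq("a < 400"), Seq("b = 10"))
    ),
    target = "b"
  )
}
```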

Example 1: Financial Advisor

 int amountSaved <- "How many savings you have?"
 int earnings <- "What is you year income?"
 bool steady <- "Your year income is stable?"
 int dependents = min: 0 <- "How many dependents you have?"

 when true then minincome = 15000 + (4000 * dependents)
 when true then minsavings = 5000 * dependents

 when savingsAccount == "inadequate" then investment = "savings"
 when (savingsAccount == "adequate") && (income == "adequate") then investment = "stocks"
 when
   savingsAccount == "adequate"
   income == "inadequate"
 then investment = "combination"

 when amountSaved > minsavings then savingsAccount = "adequate"
 when amountSaved <= minsavings then savingsAccount = "inadequate"

 when
   steady
   earnings > minincome
 then income = "adequate"
 when
   steady
   earnings <= minincome
 then income = "inadequate"
 when !steady then income = "inadequate"

 find investment

First comes the variable definition block:

 int amountSaved <- "How many savings you have?"
 int earnings <- "What is you year income?"
 bool steady <- "Your year income is stable?"
 int dependents = min: 0 <- "How many dependents you have?"

These are variables that are not inferred but are used in the rules. First comes the type, then the variable name, then an optional constraint on the allowed values (min: 0 means that the value cannot be less than 0) and, optionally, after <-, the question put to the user when this variable is requested.
Supported types: bool, int, double, enum (aka string).

In preconditions and conclusions, expressions of any complexity can be used (the most complex one in the example is minincome = 15000 + (4000 * dependents), but this is far from the limit).
The semantics of arithmetic operations are the same as in Java. Implicit conversions from int to double are supported where needed. By default, calls to functions from java.lang.Math are supported, but you can register your own functions as well.
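
For readers less familiar with Java arithmetic, the snippet below shows what "the same as in Java" means in practice, together with one possible shape for a function registry. The registry (`functions`, `call`, and the `square` function) is a hypothetical illustration; the article does not show how functions are actually registered in sie.

```scala
object ExpressionSemanticsSketch {
  // Java-style arithmetic: int / int truncates, and int is widened to double when mixed.
  val minincome: Int = 15000 + (4000 * 3)  // 27000 for three dependents
  val intDivision: Int = 7 / 2             // 3
  val mixedDivision: Double = 7 / 2.0      // 3.5

  // Hypothetical registry of named functions (an assumption, not the sie API).
  val functions: Map[String, Seq[Double] => Double] = Map(
    "max" -> ((args: Seq[Double]) => java.lang.Math.max(args(0), args(1))),
    "sqrt" -> ((args: Seq[Double]) => java.lang.Math.sqrt(args(0))),
    "square" -> ((args: Seq[Double]) => args(0) * args(0)) // a user-registered function
  )

  // Looks a function up by name and applies it to the arguments.
  def call(name: String, args: Double*): Double = functions(name)(args)
}
```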

A rule of the form

 when true then minsavings = 5000 * dependents

defines a fact: its precondition is always satisfied.

The notations

 when (savingsAccount == "adequate") && (income == "adequate") then investment = "stocks"

 when
   savingsAccount == "adequate"
   income == "adequate"
 then investment = "stocks"

are equivalent, since the && (and) operation is implied between the lines of a precondition.

By the way, the problem can be defined partially, for example with rules only and no user variables, and the variable definitions can then be added programmatically. You can also plug in your own data source to supply the user variables.
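
To check your reading of the Financial Advisor rules, here is a hand translation of them into a plain Scala function. It does not use the engine at all; `FinancialAdviserSketch` and `investment` are names made up for this illustration. Since none of the rules in this example overlap for a given input, the result is a single value rather than a probability distribution.

```scala
object FinancialAdviserSketch {
  // Hand-written equivalent of the rules in Example 1; the parameters are the four user variables.
  def investment(amountSaved: Int, earnings: Int, steady: Boolean, dependents: Int): String = {
    require(dependents >= 0, "dependents must not be less than 0") // mirrors 'min: 0'

    // Facts: rules with 'when true'.
    val minincome = 15000 + (4000 * dependents)
    val minsavings = 5000 * dependents

    // Intermediate variables derived from the user variables.
    val savingsAccount = if (amountSaved > minsavings) "adequate" else "inadequate"
    val income = if (steady && earnings > minincome) "adequate" else "inadequate"

    // The target variable.
    if (savingsAccount == "inadequate") "savings"
    else if (income == "adequate") "stocks"
    else "combination"
  }
}
```

For example, with 22000 saved, a steady income of 25000 and 3 dependents, minsavings is 15000 and minincome is 27000, so the savings account is adequate, the income is not, and the advice is "combination".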

Parser features:

Syntax errors are reported with the line and the character where parsing failed.
Semantic errors are checked, such as defining several user variables with the same name, or assigning to a user variable in a conclusion.
Type checking: the user sees where a type conversion error occurred.
Support for calling overloaded functions.
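
The two semantic checks named in the list above could look roughly like this. It is a simplified sketch with hypothetical names (`SemanticCheckSketch`, `Assignment`, `check`), working on bare variable names instead of a parsed rule tree, and it is not taken from the sie sources.

```scala
object SemanticCheckSketch {
  // An assignment found in the conclusion of some rule.
  final case class Assignment(variable: String)

  // Returns human-readable error messages; an empty result means both checks passed.
  def check(userVariables: Seq[String], conclusions: Seq[Assignment]): Seq[String] = {
    // Check 1: several user variables defined with the same name.
    val duplicates = userVariables
      .groupBy(identity)
      .collect { case (name, occurrences) if occurrences.size > 1 =>
        s"user variable '$name' is defined more than once"
      }
      .toSeq
      .sorted

    // Check 2: a user variable is assigned in a conclusion.
    val declared = userVariables.toSet
    val illegal = conclusions
      .map(_.variable)
      .distinct
      .filter(declared.contains)
      .map(name => s"user variable '$name' must not be assigned in a conclusion")

    duplicates ++ illegal
  }
}
```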

The parser is implemented by extending scala.util.parsing.combinator.JavaTokenParsers.
I can only say that it was a pleasure to write, even though my experience with writing parsers was very modest. The power of this tool is that you can define parsing patterns and immediately map the results onto entities.

Testing

Testing has received special attention. Unit tests cover rule parsing (from simple constructs up to a complete problem definition) and the correctness of inference (from simple problems to complex ones with tangled overlapping rules).

Examples

The com.greentea.sie.examples package contains several classes with live examples of rule definitions for different subject areas: FinancialAdviser, ProgrammingLanguageAdviser, LoanarAdviser.

Further directions

The engine should not be a black box. The description of the inference process needs to be made more understandable to the user.
Support for fuzzy comparisons, of the form:
mood is Good
where mood is a numeric variable and Good is a fuzzy concept defined by a membership function.
Performance and memory usage testing on large rule sets.
Other conflict resolution options for overlapping rules.
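
As an illustration of the planned fuzzy comparison, a membership function for Good might look like the sketch below. The feature does not exist yet, so everything here is an assumption: the 0 to 10 mood scale, the thresholds 4 and 8, and the name `good` were chosen only for the example.

```scala
object FuzzyComparisonSketch {
  // Hypothetical membership function for the fuzzy concept Good on a 0..10 mood scale:
  // 0 up to 4, rising linearly to 1 at 8, and 1 from 8 upwards.
  def good(mood: Double): Double =
    if (mood <= 4.0) 0.0
    else if (mood >= 8.0) 1.0
    else (mood - 4.0) / 4.0
}
```

Under these assumptions, "mood is Good" would hold to degree 0.5 for a mood of 6, and that degree could then serve as the weight of the corresponding inference branch.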

Source: https://habr.com/ru/post/180869/
