ChatScript Intelligent Chat Bots: The Basics

Chat bots that communicate with people in natural language are popular and in demand. Today we are sharing the first part of a translation of an article on developing chat bots with ChatScript (CS), written by a WebbyLab employee and based on the experience he gained on one of the company's recent projects. This part focuses on the basics of working with CS from the point of view of a practicing programmer. ChatScript is attractive because it makes it relatively easy to develop large-scale systems with artificial intelligence capabilities, and because it is easy to integrate into projects written in JavaScript.

Problem statement

One day WebbyLab, the company where I work, was given the task of developing an intelligent chat bot for an insurance company in the United States. When work started, the client already had a chat user interface on Facebook. We needed to make sure that the bot “sitting” behind this interface would understand users' messages and respond to them intelligently, analyzing phrases and extracting the necessary information from the input. We decided to classify user statements by their possible intents and to implement a mechanism for recognizing these intents from a set of phrases. We also had to take into account that intents may carry various parameters (for example, the area of a house, a date, a car model) that the chat bot also needed to recognize. ChatScript was chosen as the platform for the project.

ChatScript

To get started, let's take a closer look at ChatScript. It is a chat bot engine, and programs built on it have won the Loebner Prize four times. It was developed by Sue and Bruce Wilcox. CS is rule-based, and working with it can resemble declarative programming, similar to writing a configuration file or a grammar for an interpreter. In practice, however, it is closer to imperative programming, because you also have to use commands to tell the engine how to respond to a particular message. CS is written in C++, and the engine has binary builds for Windows, Linux and macOS.

Development of a simple bot

The best way to master a software tool is to try it in practice by building something simple and easy to understand. That is the project we are going to do now. I also advise you to read this article and go through the official ChatScript tutorial.

First of all, set up a comfortable workplace. I advise installing development tools that support CS syntax highlighting. There are plugins for Sublime Text 3, Visual Studio Code and Atom. I used Sublime because I sometimes had to open huge files, and this editor handles them quickly, but you can choose whichever editor you like.

To clarify the basics, I will walk through a step-by-step example of developing a working chat bot. My example uses Ubuntu 16.04 and CS 7.4, but you can use any other supported platform.

1. Clone the CS repository from GitHub:

git clone https://github.com/bwilcox-1234/ChatScript.git

2. Go to the ChatScript/RAWDATA directory and create a folder for the chat bot with a file for the main topic (what topics are is discussed below), plus the filesfood.txt file, which contains the list of topic files included in the application:

 cd ChatScript/RAWDATA
 mkdir FOOD
 touch FOOD/food.top
 touch filesfood.txt

3. Copy the simplecontrol.top file from RAWDATA/HARRY to the FOOD folder. This is the script required to interact with the bot. Although it is not necessary, here you can change the value of the $botprompt variable on line 9 of simplecontrol.top to the string that will be printed before each bot message. You can also leave everything as it is (the default is HARRY ), since it does not affect the behavior of the bot. In my example, I use the following setting for this variable: $botprompt = ^"fastfood> "

 cp HARRY/simplecontrol.top FOOD/simplecontrol.top

4. Add the following code to the food.top file:

 topic: ~fastfood keep repeat []
 t: Hello in our online fastfood. Please make your order.
 u: BURGER (I [want need take] _[burger potato ice-cream])
     $order = _0
     Okay, you want $order . Something else?

5. Add to the filesfood.txt file a list of files that will be used when building the bot:

 RAWDATA/FOOD/simplecontrol.top
 RAWDATA/FOOD/food.top

6. And finally, build and run the bot. To do this, first run the following command from the ChatScript directory:

 ./BINARIES/ChatScript local

7. At this stage, you can choose any username. Next, in the CS console, execute two commands (the first builds the base level of the chat bot, the second builds the bot from the files listed in filesfood.txt):

 :build 0
 :build food

After completing all these steps, we have a working chat bot. At the moment it understands only a few phrases (like “I need a burger” and “I want ice-cream”), but it can be extended by adding new rules and topics. After any change to the food.top file, run :build food again. Below we look more closely at the syntax and constructs used in this example.

Basic ChatScript Constructs

Now you can get down to work with ChatScript. To start with, I advise you to learn a little more about the basic constructs of CS by reading the official documentation. Here I will talk about the basic mechanisms of CS, relying on my own experience with this environment for developing chat bots.

▍Topics

A topic is a set of rules that are used together. If we tell the system to use a specific topic, then for as long as it works with that topic, only the rules from this topic apply. A topic is declared with the keyword topic: , followed by a name beginning with the “ ~ ” sign ( ~fastfood ) and a list of flags ( keep repeat ) that apply to all the rules inside the topic (keep and repeat are needed so that the rules remain available every time we return to this topic after one of them has fired). The declaration ends with square brackets, [] :

 topic: ~fastfood keep repeat []

Each topic usually contains a set of related rules. Switching between topics, in other words calling one topic from another, is done with the ^respond function, which we discuss below.

▍Rules

A rule triggers an action when the pattern inside it matches the data sent to the chat bot. Rules must be placed after the topic declaration.

 u: BURGER (I want ari-burger) Okay, your order is hamburger

A rule description typically includes a type ( u: ), a label (for example, BURGER ; it is optional, but useful for debugging and for self-documenting code), a pattern (everything inside the parentheses) and the output that follows it. A rule description can also be split across several lines to improve readability: CS ignores line breaks, and a rule ends where the declaration of a new rule or topic begins. Inside a rule you can use a command to go to another topic, the ^respond function:

 u: BURGER (I want ari-burger)  ^respond(~answers)

In this case, the input is processed by the rules of the topic passed to the ^respond function. This approach lets us split CS scripts into parts or, for example, move the formatting of answers into separate topics.

▍Variables and working with memory

Variables in CS are a mechanism for storing user input. There are short-term variables, whose values are cleared once the pattern has been processed, and user variables, which are long-term storage and keep their values until they are cleared. Here is an example of a rule that uses a short-term variable:

 u: ORDER (I want _*1)
     $order = _0
     Okay, your order is $order

The underscore marks a short-term variable: it memorizes whatever the pattern element that follows it matches (you can specify how many words to remember, using, for example, the _* construct, which matches any number of words, or _*2 , which matches two words, and so on). As a result, when the rule above fires, the word that follows “I want” in the input is stored in short-term memory. To access it, the second line uses the _0 construct. A pattern can include as many of these variables as needed (up to about 20, which is usually enough), and their values are read through names made of an underscore and the sequence number of the match inside the pattern. In this example, $order is a user variable that receives the value stored in _0 .
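As a rough illustration of what memorizing words after a literal prefix means, here is a toy model in Python. It is only a sketch: the function match_capture and its arguments are invented for this example, and it ignores everything the real engine does beyond exact word comparison (word forms, concepts, sets).

```python
def match_capture(prefix, n_words, sentence):
    """Toy model: match literal prefix words, then memorize the next n_words.

    Returns the list of memorized words, or None when there is no match.
    This is NOT the ChatScript engine, only an illustration of the idea.
    """
    words = sentence.lower().split()
    prefix = [w.lower() for w in prefix]
    # Slide over the sentence looking for the literal prefix.
    for start in range(len(words) - len(prefix) + 1):
        end = start + len(prefix)
        if words[start:end] == prefix and len(words) - end >= n_words:
            return words[end:end + n_words]
    return None


captured = match_capture(["I", "want"], 1, "I want burger")
order = captured[0] if captured else None  # plays the role of $order = _0
print(order)
```

Asking for two words instead of one corresponds to the _*2 case mentioned above; when the input ends right after the prefix, the toy matcher reports no match at all.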

▍Patterns

A pattern describes the set and order of words expected in what the user enters when talking to the bot. In my opinion, a remarkable feature of CS patterns is that you do not need to list every form of each word. The engine has built-in support for this: a pattern that contains only one form of a word can match all forms of that word.

For example, take the verb “be”. Adding it to a pattern also covers all of its forms: am, is, are, was, were, been. However, to handle input containing auxiliary verbs (will, have, do), they must be added explicitly, because CS automatically handles different forms only for individual words. For this purpose, I advise using another feature of patterns, an optional word in curly braces:

 u: BURGER (I {will} take _burger)

The same applies to nouns and pronouns: it is enough to add a word in the nominative singular, and CS will look for matches with all variants of the word using its internal mechanisms. Patterns can also cover several variants of a phrase with the same meaning by using sets of words at certain positions. Expanding our example, we can add the following:

 u: BURGER (I {will} [want need take] [_burger hamburger potato ice-cream])
     $order = _0
     Okay, your order is $order

This rule matches all sentences entered by the user that combine the specified words (for example, “I want a hamburger”, “I will take a potato”, “I need ice-cream”). Another important detail of memorization, which I could not find described in the official CS documentation, is how short-term variables work in sets (if you find this topic in the documentation, please let me know). In this situation, whichever word from the set is matched is saved in the short-term variable _0 . So whenever a match is found, the user variable $order gets a value.

In addition, you can control the beginning and end of user-entered phrases using < and > signs:

 u: BURGER (< I {will} [want need take] [_burger hamburger potato ice-cream] >)

Patterns can do many other interesting things, for example check values against some criteria. Here is what can be done if we are only interested in numbers within a certain range:

 u: OLD_ENOUGH (I be _~number _0>21 _0<120)
     You are old enough for this.
 u: TOO_YOUNG (I be _~number _0<21)
     $missed_age = 21 - _0
     You are too young for this, come after $missed_age years.
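To see which ages each of the two rules above accepts, the same comparisons can be written out in Python. This is a sketch that assumes the > and < tests are strict, as the operators suggest; the function name age_reply is invented for the illustration.

```python
def age_reply(age):
    """Mirror the two rules: OLD_ENOUGH needs 21 < age < 120, TOO_YOUNG needs age < 21."""
    if 21 < age < 120:
        return "You are old enough for this."
    if age < 21:
        missed_age = 21 - age  # same arithmetic as $missed_age = 21 - _0
        return f"You are too young for this, come after {missed_age} years."
    return None  # neither rule matches


for age in (18, 21, 30, 120):
    print(age, "->", age_reply(age))
```

Under that assumption an input of exactly 21 (or of 120 and more) falls through both rules, so a real script would probably want one of the bounds to be inclusive or a fallback rule for such cases.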

▍Concepts

A concept is a collection of words or phrases associated with a single keyword (the concept name). A concept declaration is similar to a topic declaration. The list of words that belong to the concept is given in square brackets:

 concept: ~food_type [burger potato salad ice-cream]

After declaring a concept, you can use it as an alias in rules like this (now our template will find matches only for words defined in the ~food_type concept):

 u: BURGER (I want _~food_type) $order = _0 Okay, your order is $order

Moreover, some concepts can be nested in others, creating an additional level of abstraction, grouping different sets of words in one parent concept:

 concept: ~dessert [ice-cream sweets cookie]
 concept: ~burger [burger hamburger cheeseburger vegetarianburger]
 concept: ~food_type [~burger ~dessert potato salad]
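One way to picture nested concepts is as sets that may contain references to other sets. The Python sketch below flattens a shortened copy of the concepts above into the full list of words that ~food_type would cover. The dictionary layout and the expand function are assumptions made for this illustration and say nothing about how the engine stores concepts internally.

```python
CONCEPTS = {
    "~dessert": ["ice-cream", "sweets", "cookie"],
    "~burger": ["burger", "hamburger", "cheeseburger"],
    "~food_type": ["~burger", "~dessert", "potato", "salad"],
}


def expand(name, concepts, seen=None):
    """Return the flat set of words reachable from a concept, following nested concepts."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against concepts that reference each other
        return set()
    seen.add(name)
    words = set()
    for member in concepts[name]:
        if member.startswith("~"):
            words |= expand(member, concepts, seen)
        else:
            words.add(member)
    return words


print(sorted(expand("~food_type", CONCEPTS)))
```

The parent concept ends up covering every word of its children plus its own direct members, which is why a single _~food_type in a pattern is enough.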

If you need to find out which particular concept a matched word belongs to, you can use the pattern keyword. In the following example, the value stored in the $drink variable is checked for membership in the ~alcohol concept. To do this, the if expression uses the keyword pattern and the ? sign, which is placed between the value being checked and the target concept (CS supports if-else ):

 concept: ~drink_type [~alcohol ~non_alcohol]
 concept: ~alcohol [rum gin whiskey vodka]
 concept: ~non_alcohol [cola juice milk water]
 u: DRINK (^want(_~drink_type))
     $drink = _0
     if (pattern $drink?~alcohol) {
         ^respond(~age_checker)
     } else {
         Ok, take and drink your $drink .
     }

If you need to match not only individual words of the user input but also combinations of words, you can add them to a concept as well, either enclosed in quotes or with underscores between the words of a phrase (this approach is also used when you need to match punctuation marks):

 concept: ~vegburger ["vegetarian burger" "vegetarian's burger" vegan_burger vegan_'s_burger]

These approaches are similar, but I prefer to use quotes, as they make it easier to read the code.

Another interesting feature of CS is the set of standard concepts defined at the engine level. They provide ready-made sets of the most frequently used natural-language words and phrases with the same or similar meaning. Among them are the concepts ~yes and ~no , which include phrases that can be interpreted as affirmative and negative answers. For example, the ~yes concept contains words and phrases such as yes, yeah, ok, okay, sure, of_course, alright, and many others (183 in total). The ~no concept contains 138 words and phrases. Here are some other concepts that were useful to us when working on the project:

~number - helps to find matches with numbers.
~yearnumber - is a subset of ~number that contains only values from 999 to 10000.
~dateinfo - helps to find dates in the text, using the format of the record with a slash. For example, mm_dd_yy or mm_dd_yyyy will be recognized as the strings “mm / dd / yy” or “mm / dd / yyyy”.
~timeword - with this concept, for example, both “1 July 2017” and “July 1 2017” are returned as “July 1 2017”. In addition, this concept includes a huge set of time-related words, such as second, yesterday, already, and so on.

Here is the CS documentation section where you can find information about all embedded concepts.

If, for some reason, you need to extend the existing standard concepts, you can add entries to the file LIVEDATA/ENGLISH/SUBSTITUTES/interjections.txt :

 <roger_that> ~yes

After that, just restart the ChatScript engine and the phrase “roger that” will be added to the ~yes concept. The angle brackets at the beginning and at the end of a phrase mean that only these two words will be considered a match, and nothing more.

In addition, existing concepts can be expanded in a different way - new values can be added to them using the keyword MORE :

 concept: ~food [burger potato]
 concept: ~food MORE [ice-cream]

▍Macros

To support code reuse, CS has macros: functions written by the programmer that are called to generate output or to be used inside patterns. Our project uses JSON, so the output must be formatted properly before it can be passed to the JS wrapper class. For this purpose, I decided to prepare a string that can be easily parsed in JavaScript: every time the system generates some output, I convert it to a JSON string. CS has built-in functions for working with JSON, and combining them with outputmacro gives a simple and convenient way to format the output for the application's backend API:

 outputmacro: ^formated_in_json(^param_from_rule) {
     $_result = ^jsoncreate(object)
     $_result.first_level_param = ^param_from_rule
     $_result.nested_object = ^jsoncreate(object)
     ^jsonwrite($_result)
 }
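On the receiving side, the string produced by such a macro can be handed to any JSON parser. The project described here uses a JavaScript wrapper; the sketch below uses Python only to show the expected shape of the data. The sample string raw_reply is an assumption about what the macro would return for an input like “I want burger”, and the engine's exact whitespace and key order may differ.

```python
import json

# Assumed example of what the bot might return for "I want burger".
# The exact whitespace and key order produced by the engine may differ.
raw_reply = '{"first_level_param": "burger", "nested_object": {}}'

reply = json.loads(raw_reply)
order = reply["first_level_param"]        # the value passed in from the rule
details = reply.get("nested_object", {})  # empty until the macro fills it

print(order, details)
```

Keeping the bot's output in one predictable structure means the backend never has to parse free-form text replies.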

Using macros in the rules looks exactly the same as using the above described standard function ^respond :

 u: FOOD (I want _~food_type)  ^formated_in_json(_0)

Macros are not limited to formatting output. With patternmacro you can also create them for pattern fragments that are shared by a set of rules with similar constructions:

 patternmacro: ^want(^appendix)
     [i we] * [want need take] ^appendix

The part of the pattern returned by the macro can be reused in many other patterns, which simplifies the script:

 u: FOOD (^want(_~food_type))
     If you want _0, you should get _0 .
 u: DRINK (^want(_~drink_type))
     Ok, take and drink your _0 .
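The idea of a parameterized pattern fragment can be imitated with a function that builds a regular expression. The Python sketch below is a loose analogy, not a translation: the want function, the word lists and the regular expression are all invented for the illustration, and real CS patterns additionally handle word forms and concepts.

```python
import re


def want(appendix_words):
    """Toy stand-in for ^want: [i we] * [want need take] followed by a captured word."""
    subject = r"(?:i|we)"
    verb = r"(?:want|need|take)"
    captured = "(" + "|".join(re.escape(w) for w in appendix_words) + ")"
    # (?:\s+\S+)*? plays the role of the * wildcard (any words in between).
    return re.compile(
        rf"\b{subject}(?:\s+\S+)*?\s+{verb}\s+{captured}\b", re.IGNORECASE
    )


food = want(["burger", "potato", "salad"])
drink = want(["cola", "juice", "rum"])

m = food.search("We really want salad")
print(m.group(1) if m else None)
```

Both food and drink share the subject and verb part and differ only in the captured word list, which is the same kind of reuse the two rules above get from ^want .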

Chat bot scheme

The following diagram shows the location of each of the CS constructs that we talked about in a chat bot.

[Figure: chat bot scheme]

As you can see, topics, concepts and macros must be declared at the outer level of the bot. Rules are nested in topics. Each rule has a pattern as its entry point and something like a responder, or rule body, which is executed only if the pattern matches. Concepts and pattern macros declared outside the topic are used inside patterns together with short-term variables, which hold the values needed to form a response. A macro can also be called in the responder to process the output values. Long-term variables are used to pass data from short-term variables to other topics when the responder calls a command like ^respond(~another_topic) . In that case, the rules of the other topic take over processing the data that will then be shown to the user.

Summary

Today you learned the basics of ChatScript, which means that if you want to use this engine to develop your own bot, you can start planning the project. In the next part, you will learn about the CS environment, debugging CS projects, integrating CS with JavaScript, and dealing with the problems that can arise when developing ChatScript chat bots.

Dear readers! Do you create chat bots? If so, please tell us about the tools you use.

Source: https://habr.com/ru/post/344604/
