This article continues the series (one, two) on the main scenarios for using iKnow, the natural language processing tool from the InterSystems technology stack.
The previous posts dealt mainly with working with data after it had been placed in a domain (the place where all text analysis happens). This article is about how to load information into iKnow correctly and conveniently. As an example, we will load information about Vkontakte users: their personal data, posts, and so on.
The article assumes some basic background in InterSystems technologies (in particular, Caché ObjectScript).
Long road to the domain

According to the official documentation, there are two scenarios for loading data into an existing domain:
- An instance of the %iKnow.Source.Loader class is created. It is bound to a specific domain (the one whose id was passed to the constructor). Then an instance of a class that implements the lister interface is created. This instance calls the AddListToBatch method with arguments that describe what to load. This adds a new list to the current batch of the domain, and it can be done several times. To load the current batch into the domain, the loader calls the ProcessBatch method. This option is better suited to high-volume loads.
- An instance of a class that implements the lister interface is created, its ProcessList method is called with arguments that describe what to load, and the data is sent to the domain immediately. This option is better suited to low-volume loads.
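As a rough sketch of the two scenarios, here is how they might look with the VKReader.Lister class that is developed later in this article. The method name LoadDemo and the sample queries are hypothetical, and the two arguments (search query and number of posts) are the first two arguments that this lister accepts; other listers take different arguments.

```objectscript
/// Hypothetical helper; place it in any class of your own
ClassMethod LoadDemo(domainId As %Integer) As %Status
{
    // Scenario 1: collect several lists in a batch, then load the batch at once
    set loader = ##class(%iKnow.Source.Loader).%New(domainId)
    set lister = ##class(VKReader.Lister).%New(domainId)
    set sc = lister.AddListToBatch("InterSystems", 50)
    quit:$$$ISERR(sc) sc
    set sc = lister.AddListToBatch("iKnow", 50)
    quit:$$$ISERR(sc) sc
    set sc = loader.ProcessBatch()
    quit:$$$ISERR(sc) sc

    // Scenario 2: load a single small list directly, without a batch
    set lister = ##class(VKReader.Lister).%New(domainId)
    quit lister.ProcessList("Caché", 10)
}
```

In the batch scenario nothing reaches the domain until ProcessBatch is called, which is why it suits larger loads.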
Lister customization
The standard library offers many ready-made lister implementations (RSS lister, file lister, global lister). However, the developer can also write a custom implementation that fits their own needs.
Before writing a lister for Vkontakte posts, I wrote a COS wrapper for some of the Vkontakte API methods that operate on publicly available data. All the code is available on github in the VKReader package.
I decided it would be interesting if the lister could load the latest posts for a given keyword, plus a few other parameters. It turned out that this is not difficult to achieve.
The documentation chapter on customization says that to create your own lister, you need to inherit from the system class and override several methods.
So, in the same package, I created the VKReader.Lister class, which inherits from %iKnow.Source.Lister. If you are writing your own lister, it must also inherit from this class.
Each lister should be assigned a unique short name (alias) by which the iKnow system methods will access it. If this name is not specified, the full class name of this lister will be used instead.
To specify the alias, simply override the GetAlias class method in your class. For our Vkontakte lister I did it like this:
ClassMethod GetAlias() As %String
{
    Quit "VKAPI"
}
Every data source submitted for loading has an external id, which must contain the short name of the lister and the full reference. The full reference, in turn, consists of the name of the source group and the local reference.
For the lister to work, you need to override the BuildFullRef and SplitFullRef class methods. The first assembles the full reference from the group name and the local reference, and the second splits it back into these two parts.
In our case the external id looks like this:
VKAPI:searchQuery:::vkPostId
Here VKAPI is the short name of our lister, the search query plays the role of the name of the source group, and the Vkontakte post id is the local reference.
The code of the BuildFullRef and SplitFullRef methods:
ClassMethod SplitFullRef(domainId As %Integer, fullRef As %String, Output groupName As %String, Output localRef As %String) As %Status [ Private ]
{
    set delim = ":::"
    // The local reference is the last piece after the delimiter
    set localRef = $piece(fullRef, delim, $l(fullRef, delim))
    // The group name is everything before the last delimiter
    set groupName = $e(fullRef, 1, *-$l(localRef)-$l(delim))
    Quit $$$OK
}
ClassMethod BuildFullRef(domainId As %Integer, groupName As %String, localRef As %String) As %String [ Private ]
{
    quit groupName _ ":::" _ localRef
}
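To see why SplitFullRef takes the last piece instead of the second one, it may help to run the same expressions by hand. The snippet below is only an illustration with made-up values; it deliberately uses a search query that itself contains the delimiter.

```objectscript
set delim = ":::"
// What BuildFullRef would return for group "a:::b" and local reference "1#2#3"
set fullRef = "a:::b" _ delim _ "1#2#3"

// Same logic as in SplitFullRef
set localRef = $piece(fullRef, delim, $length(fullRef, delim))
set groupName = $extract(fullRef, 1, *-$length(localRef)-$length(delim))

write groupName, !   // a:::b
write localRef, !    // 1#2#3
```

Because the local reference is built from numeric ids and "#" signs, it never contains the delimiter, so splitting at the last occurrence recovers both parts even when the query does.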
You also need to specify which Processor will be the default for this lister. In iKnow, a Processor is the object that does the actual processing of the loaded data. There are several kinds of processors, but since in our case the data is kept only in memory, I decided to use the processor for temporary storage. The processor is also specified by overriding a method.
ClassMethod DefaultProcessor() As %String
{
    Quit "%iKnow.Source.Temp.Processor"
}
All the main loading work happens in one more method with the telling name ExpandList. This method expands the list of sources to be loaded into the domain. The ProcessList and AddListToBatch methods take the same arguments that you define for ExpandList.
Here is the complete code of the method for our case first.
It takes the following arguments (in order): the query word used to search for posts; the number of posts; a boolean value that says whether to check the load list for an existing source with the same local reference; and the limits on the post publication time (start date, start time, end date, end time).
Method ExpandList(listparams As %List) As %Status
{
    // Unpack the arguments; query and count are required
    set query = $li(listparams, 1)
    set count = $li(listparams, 2)
    set checkExists = +$lg(listparams, 3, 1)
    set startDate = $lg(listparams, 4)
    set startTime = $lg(listparams, 5)
    set endDate = $lg(listparams, 6)
    set endTime = $lg(listparams, 7)
    // Search for posts through the API wrapper
    #dim response As %ListOfObjects
    set tSC = ##class(VKReader.Requests.APIPublicMethodsCaller).NewsfeedSearch(.response, query, count,,, startDate, startTime, endDate, endTime)
    quit:$$$ISERR(tSC) tSC
    do ..RegisterMetadataKeys($lb("PostDate", "PostTime", "AuthorID", "AuthorCity", "AuthorCountry", "AuthorDOB", "AuthorSex"))
    // Collect the ids of the authors: groups have negative ids, users positive
    set userIds = "1"
    set groupIds = "1"
    for i = 1:1:response.Count() {
        if (response.GetAt(i).FromID < 0) {
            set groupIds = groupIds _ "," _ (-(response.GetAt(i).FromID))
        } else {
            set userIds = userIds _ "," _ response.GetAt(i).FromID
        }
    }
    set tSC = ##class(VKReader.Requests.APIPublicMethodsCaller).UsersGet(.responseUsers, userIds, "sex,city,bdate,country")
    quit:$$$ISERR(tSC) tSC
    set tSC = ##class(VKReader.Requests.APIPublicMethodsCaller).GroupsGetById(.responseGroups, groupIds, "city,country")
    quit:$$$ISERR(tSC) tSC
    for i = 1:1:response.Count() {
        set tPostDate = response.GetAt(i).Date
        set tPostTime = response.GetAt(i).Time
        set tOwnerID = response.GetAt(i).OwnerID
        set tFromID = response.GetAt(i).FromID
        set tID = response.GetAt(i).ID
        #dim tTextStream As %GlobalCharacterStream
        set tTextStream = response.GetAt(i).Text
        if (tFromID < 0) {
            set tAuthorCity = responseGroups.GetAt(-tFromID).City
            set tAuthorCountry = responseGroups.GetAt(-tFromID).Country
            set tAuthorDOB = ""
            set tAuthorSex = ""
        } else {
            set tAuthorCity = responseUsers.GetAt(tFromID).City
            set tAuthorCountry = responseUsers.GetAt(tFromID).Country
            set tAuthorDOB = responseUsers.GetAt(tFromID).DOB
            set tAuthorSex = responseUsers.GetAt(tFromID).Sex
        }
        set tLocalRef = tOwnerID _ "#" _ tFromID _ "#" _ tID
        if (checkExists) {
            continue:..RefExists(query, tLocalRef, checkExists - 1)
        }
        set tRef = $lb(i%ListerClassId, ..AddGroup(query), tLocalRef)
        do tTextStream.Rewind()
        if (tTextStream.Size = 0) {
            continue
        }
        // Store the post text in chunks of up to 32000 characters
        set len = 32000
        while (len = 32000) {
            do ..StoreTemp(tRef, tTextStream.Read(.len))
        }
        do ..SetMetadataValues(tRef, $lb(tPostDate, tPostTime, tFromID, tAuthorCity, tAuthorCountry, tAuthorDOB, tAuthorSex))
    }
    // Return a success status explicitly; an empty return value is treated as an error
    quit $$$OK
}
Let's go through the code in more detail.
First we unpack the arguments.
    set query = $li(listparams, 1)
    set count = $li(listparams, 2)
    set checkExists = +$lg(listparams, 3, 1)
    set startDate = $lg(listparams, 4)
    set startTime = $lg(listparams, 5)
    set endDate = $lg(listparams, 6)
    set endTime = $lg(listparams, 7)
Then we make a request to the Vkontakte API through our wrapper method. The method returns a list of objects of the VKReader.Data.Post class, which contains the fields that describe a Vkontakte post.
    #dim response As %ListOfObjects
    set tSC = ##class(VKReader.Requests.APIPublicMethodsCaller).NewsfeedSearch(.response, query, count,,, startDate, startTime, endDate, endTime)
    quit:$$$ISERR(tSC) tSC
We register the metadata keys so that the meta-information is easy to save later. In the metadata we want to store the publication date and time of the post, as well as the id, city, country, date of birth, and sex of the author.
    do ..RegisterMetadataKeys($lb("PostDate", "PostTime", "AuthorID", "AuthorCity", "AuthorCountry", "AuthorDOB", "AuthorSex"))
We collect comma-separated lists of the ids of the users and groups that authored the posts we found. In the Vkontakte API, group ids are negative integers and user ids are positive, so group ids are negated before they are added to the list.
    set userIds = "1"
    set groupIds = "1"
    for i = 1:1:response.Count() {
        if (response.GetAt(i).FromID < 0) {
            set groupIds = groupIds _ "," _ (-(response.GetAt(i).FromID))
        } else {
            set userIds = userIds _ "," _ response.GetAt(i).FromID
        }
    }
We get information about these users and groups using the wrapper methods. They return lists of objects of the VKReader.Data.User and VKReader.Data.Group classes, which contain fields specific to Vkontakte users and groups (such as city, country, and so on).
    set tSC = ##class(VKReader.Requests.APIPublicMethodsCaller).UsersGet(.responseUsers, userIds, "sex,city,bdate,country")
    quit:$$$ISERR(tSC) tSC
    set tSC = ##class(VKReader.Requests.APIPublicMethodsCaller).GroupsGetById(.responseGroups, groupIds, "city,country")
    quit:$$$ISERR(tSC) tSC
In the loop we process all the posts found. First, we copy the meta-information we received into local variables.
    set tPostDate = response.GetAt(i).Date
    set tPostTime = response.GetAt(i).Time
    set tOwnerID = response.GetAt(i).OwnerID
    set tFromID = response.GetAt(i).FromID
    set tID = response.GetAt(i).ID
    #dim tTextStream As %GlobalCharacterStream
    set tTextStream = response.GetAt(i).Text
    if (tFromID < 0) {
        set tAuthorCity = responseGroups.GetAt(-tFromID).City
        set tAuthorCountry = responseGroups.GetAt(-tFromID).Country
        set tAuthorDOB = ""
        set tAuthorSex = ""
    } else {
        set tAuthorCity = responseUsers.GetAt(tFromID).City
        set tAuthorCountry = responseUsers.GetAt(tFromID).Country
        set tAuthorDOB = responseUsers.GetAt(tFromID).DOB
        set tAuthorSex = responseUsers.GetAt(tFromID).Sex
    }
The local reference consists of the wall owner id, the sender id, and the post id, separated by a hash sign (#).
    set tLocalRef = tOwnerID _ "#" _ tFromID _ "#" _ tID
If necessary, we check whether a source with the same local reference already exists.
    if (checkExists) {
        continue:..RefExists(query, tLocalRef, checkExists - 1)
    }
The following code could look different if another processor had been chosen. I use the processor for temporary storage, so I need to expand the list using the StoreTemp method (for more information on each processor, see its documentation page). I also need to set the values obtained for the metadata fields.
    set tRef = $lb(i%ListerClassId, ..AddGroup(query), tLocalRef)
    do tTextStream.Rewind()
    if (tTextStream.Size = 0) {
        continue
    }
    set len = 32000
    while (len = 32000) {
        do ..StoreTemp(tRef, tTextStream.Read(.len))
    }
    do ..SetMetadataValues(tRef, $lb(tPostDate, tPostTime, tFromID, tAuthorCity, tAuthorCountry, tAuthorDOB, tAuthorSex))
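The loop around StoreTemp relies on the fact that the stream's Read method takes its length argument by reference and replaces it with the number of characters actually read. The standalone sketch below shows the same pattern outside the lister; the sample text is made up, and the write command stands in for the StoreTemp call.

```objectscript
set stream = ##class(%GlobalCharacterStream).%New()
do stream.Write("a short post text")
do stream.Rewind()

set len = 32000
while (len = 32000) {
    // After the call, len holds the number of characters actually read
    set chunk = stream.Read(.len)
    write "chunk of ", len, " characters", !
}
```

As long as a full chunk of 32000 characters comes back, there may be more text in the stream, so the loop continues; a shorter chunk means the end has been reached.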
That's it. The lister is written!
Let's test it.
Testing Lister
I wrote a small web application that uses the lister we implemented and lets you view records, search for similar ones, add records by query, and delete records from the domain. Here are some screenshots:
Initially, the domain is empty.

Click on the plus sign to add new posts.
In the form that appears, fill in the fields and click on the button to add entries.

We wait a little, and the records are added.

For users or groups that have made their data publicly available, our lister stores it in the metadata fields, and this small demo displays it in a not-too-elegant table.
Out of the box, iKnow can show similar entries: click the button with the target icon next to a post and see for yourself that it works.

Summary
In this article, we looked at how loading data into a domain works, discussed in detail how a typical lister works, and saw how to write a working lister of our own. We wrote a lister for Vkontakte data and made sure that it really works, modulo the fact that the domain and configuration were created somewhere behind the scenes.
If you would like to look behind the scenes, all the code that was shown, used, or mentioned in the article can be found on the project page on github.