InterSystems iKnow is an unstructured data analysis tool that provides access to data by indexing the sentences and entities contained in the text. To begin the analysis, you create a domain, which is a store for unstructured data, and load text into it. The process of creating a domain is well described here and here. The main ways of using iKnow are covered here; I would also recommend this article.
The iFind technology is a Caché DBMS module for performing full-text search operations using data from Caché classes. iFind uses many of the features of iKnow to provide intelligent text search. To use iFind in queries, you need to define a special iFind index in the Caché class.
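There are three types of iFind index. Each type provides all the functions of the previous type plus additional ones: the basic index (%iFind.Index.Basic) supports searching for words and phrases; the semantic index (%iFind.Index.Semantic) adds searching for iKnow entities; the analytic index (%iFind.Index.Analytic) adds information about paths and word proximity.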
XData Install [ XMLNamespace = INSTALLER ]
{
<Manifest>
  <!-- Use DOCSEARCH as the namespace name unless one was passed in -->
  <IfNotDef Var="Namespace">
    <Var Name="Namespace" Value="DOCSEARCH"/>
    <Log Text="Set namespace to ${Namespace}" Level="0"/>
  </IfNotDef>

  <!-- Nothing to do if the namespace already exists -->
  <If Condition='(##class(Config.Namespaces).Exists("${Namespace}")=1)'>
    <Log Text="Namespace ${Namespace} already exists" Level="0"/>
  </If>

  <!-- Otherwise create the namespace together with its database -->
  <If Condition='(##class(Config.Namespaces).Exists("${Namespace}")=0)'>
    <Log Text="Creating namespace ${Namespace}" Level="0"/>
    <Namespace Name="${Namespace}" Create="yes" Code="${Namespace}" Ensemble="" Data="${Namespace}">
      <Log Text="Creating database ${Namespace}" Level="0"/>
      <Configuration>
        <Database Name="${Namespace}" Dir="${MGRDIR}/${Namespace}" Create="yes" MountRequired="false" Resource="%DB_${Namespace}" PublicPermissions="RW" MountAtStartup="false"/>
        <!-- Map the documentation globals and classes from DOCBOOK -->
        <Log Text="Mapping DOCBOOK to ${Namespace}" Level="0"/>
        <GlobalMapping Global="Cache*" From="DOCBOOK" Collation="5"/>
        <GlobalMapping Global="D*" From="DOCBOOK" Collation="5"/>
        <GlobalMapping Global="XML*" From="DOCBOOK" Collation="5"/>
        <ClassMapping Package="DocBook" From="DOCBOOK"/>
        <ClassMapping Package="DocBook.UI" From="DOCBOOK"/>
        <ClassMapping Package="csp" From="DOCBOOK"/>
      </Configuration>
      <Log Text="End creating database ${Namespace}" Level="0"/>
    </Namespace>
    <Log Text="End creating namespace ${Namespace}" Level="0"/>
  </If>
</Manifest>
}
The domain that iKnow needs is built on the table containing the documentation. Since the data source is a table, we use the SQL Lister (%iKnow.Source.SQL.Lister). The content field holds the text of the documentation, so we specify it as the data field. The remaining fields are passed as metadata.
ClassMethod Domain(ByRef pVars, pLogLevel As %String, tInstaller As %Installer.Installer) As %Status
{
    #Include %IKInclude
    #Include %IKPublic
    // Remember the current namespace and switch to DOCSEARCH
    set ns = $Namespace
    znspace "DOCSEARCH"
    // Create the domain, or stop if it already exists
    set dname = "DocSearch"
    if (##class(%iKnow.Domain).Exists(dname) = 1) {
        write "The ",dname," domain already exists",!
        zn ns
        quit $$$OK
    } else {
        write "The ",dname," domain does not exist",!
        set domoref = ##class(%iKnow.Domain).%New(dname)
        do domoref.%Save()
    }
    set domId = domoref.Id
    // The Lister selects the sources; the Loader loads them into the domain
    set flister = ##class(%iKnow.Source.SQL.Lister).%New(domId)
    set myloader = ##class(%iKnow.Source.Loader).%New(domId)
    // Query that returns the documentation rows
    set myquery = "SELECT id, docKey, title, bookKey, bookTitle, content, textKey FROM SQLUser.DocBook"
    set idfld = "id"
    set grpfld = "id"
    // Data field (the indexed text) and metadata fields
    set dataflds = $LB("content")
    set metaflds = $LB("docKey", "title", "bookKey", "bookTitle", "textKey")
    // Add the query results to the Lister batch
    set stat = flister.AddListToBatch(myquery, idfld, grpfld, dataflds, metaflds)
    if stat '= 1 {
        write "The lister failed: ",$System.Status.GetErrorText(stat),!
        zn ns
        quit stat
    }
    // Load the batch into the domain
    set stat = myloader.ProcessBatch()
    if stat '= 1 {
        zn ns
        quit stat
    }
    set numSrcD = ##class(%iKnow.Queries.SourceQAPI).GetCountByDomain(domId)
    write "Done",!
    write "Domain contains ",numSrcD," source(s)",!
    // Return to the original namespace
    zn ns
    quit $$$OK
}
To search the documentation we use the %iFind.Index.Analytic index:
Index contentInd On (content) As %iFind.Index.Analytic(LANGUAGE = "en", LOWER = 1, RANKERCLASS = "%iFind.Rank.Analytic");
After adding and building such an index, it can be used, for example, in SQL queries. The general syntax for using iFind in SQL is:
SELECT * FROM TABLE WHERE %ID %FIND search_index(indexname,'search_items',search_option)
After the %iFind.Index.Analytic index is created with these parameters, several SQL procedures are generated. Their names follow the pattern [Table name]_[Index name][Procedure name], for example:
SELECT DocBook_contentIndRank(%ID, 'SearchString', 'SearchOption') Rank FROM DocBook WHERE %ID %FIND search_index(contentInd,'SearchString', 'SearchOption')
SELECT DocBook_contentIndHighlight(%ID, 'SearchString', 'SearchOption','Tags') Text FROM DocBook WHERE %ID %FIND search_index(contentInd,'SearchString', 'SearchOption')
When you enter text in the search bar, possible queries are suggested to help you find the information you need quickly. The suggestions are based on the word you entered (or on its initial part, if you have not finished typing it): the ten most similar words or phrases are shown.
These suggestions are produced by iKnow, using the %iKnow.Queries.Entity.GetSimilar method.
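To illustrate the idea of prefix-based suggestions, here is a minimal Python sketch. It is not the iKnow implementation: the `suggest` function, the sample entities and their counts are all hypothetical, and the ranking (most frequent first) is an assumption made for the example.

```python
def suggest(prefix, entity_counts, limit=10):
    """Return up to `limit` entities in which some word starts with `prefix`.

    `entity_counts` maps an entity (word or phrase) to how often it occurs.
    More frequent entities come first; ties are broken alphabetically.
    """
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [
        (entity, count)
        for entity, count in entity_counts.items()
        if any(word.startswith(prefix) for word in entity.lower().split())
    ]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [entity for entity, _ in matches[:limit]]


# Hypothetical entities with made-up frequencies
entities = {
    "ifind index": 12,
    "iknow domain": 9,
    "index": 30,
    "rest api": 7,
    "indexing": 4,
}

print(suggest("ind", entities))
```

The `limit=10` default mirrors the ten suggestions shown in the search bar.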
iFind supports fuzzy search, which finds words that almost match the search string. It works by comparing the Levenshtein distance between two words. The Levenshtein distance is the minimum number of single-character changes (insertions, deletions, or substitutions) needed to turn one word into another. Fuzzy search can be used to correct typos and to handle small spelling variations and different grammatical forms (singular and plural).
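The definition of the Levenshtein distance can be sketched in a few lines of Python. This is the standard dynamic-programming algorithm, shown only to illustrate the measure; it says nothing about how iFind computes it internally, and the `fuzzy_matches` helper and the sample vocabulary are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    or substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter word in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(word, vocabulary, max_distance=1):
    """Return the vocabulary words within `max_distance` edits of `word`."""
    return [v for v in vocabulary if levenshtein(word.lower(), v.lower()) <= max_distance]


print(levenshtein("ifind", "ifindd"))                         # 1
print(fuzzy_matches("ifindd", ["ifind", "iknow", "index"]))   # ['ifind']
```

With a maximum distance of one, the typo ifindd still matches ifind, while unrelated words such as iknow are rejected.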
In iFind SQL queries, the search_option parameter controls fuzzy search.
The value search_option = 3 means a Levenshtein distance of two.
To set the Levenshtein distance to n, specify search_option = '3:n'.
The documentation search uses a Levenshtein distance of one. Let us demonstrate how it works:
Enter the word ifind in the search:
Now let's try a fuzzy search with a misspelled word, ifindd. As we can see, the search corrected the typo and found the relevant articles.
Because iFind supports complex queries with parentheses and the AND, OR and NOT operators, we implemented an advanced search. In it you can specify a word, a phrase, any of several words, and words that must not occur. You can fill in one field, several fields, or all of them at once.
For example, let's find articles that contain the word iknow and the phrase rest api, and that also contain either of the words domain or UI.
We see that there are two such articles:
Note that the second article mentions Swagger UI. We can extend the query to search only for articles that do not contain the word Swagger.
As a result, only one article was found:
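How the advanced search fields could be combined into one search string is sketched below in Python. The `build_search_string` function and its parameter names are hypothetical, and the exact operator syntax produced here (explicit AND, quoted phrases, NOT before an excluded word) is an assumption that should be checked against the iFind documentation.

```python
def build_search_string(word="", phrase="", any_words=(), none_words=()):
    """Combine the advanced search fields into a single boolean search string.

    word       - a single word that must occur
    phrase     - an exact phrase that must occur (wrapped in double quotes)
    any_words  - at least one of these words must occur (joined with OR)
    none_words - none of these words may occur (each prefixed with NOT)
    Empty fields are skipped.
    """
    parts = []
    if word.strip():
        parts.append(word.strip())
    if phrase.strip():
        parts.append('"' + phrase.strip() + '"')
    alternatives = [w.strip() for w in any_words if w.strip()]
    if alternatives:
        parts.append("(" + " OR ".join(alternatives) + ")")
    parts.extend("NOT " + w.strip() for w in none_words if w.strip())
    return " AND ".join(parts)


# The example from the text: iknow, "rest api", domain or UI, without Swagger
print(build_search_string("iknow", "rest api", ["domain", "UI"], ["Swagger"]))
# iknow AND "rest api" AND (domain OR UI) AND NOT Swagger
```

The resulting string would then be passed as the search_items argument of search_index.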
As mentioned above, creating the iFind index generates the DocBook_contentIndHighlight procedure. With the query
SELECT DocBook_contentIndHighlight(%ID, 'search_items', '0', '<span class="Illumination">', 0) Text FROM DocBook
we get the text in which every match is wrapped in the tag
<span class="Illumination">
This allows you to visually highlight search results on the frontend.
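The effect of highlighting can be imitated with a short Python sketch. It only illustrates the idea of wrapping matches in a tag and is not how the Highlight procedure is implemented; the `highlight` function is hypothetical, and matching whole words case-insensitively is an assumption.

```python
import re


def highlight(text, term, open_tag='<span class="Illumination">', close_tag="</span>"):
    """Wrap every whole-word, case-insensitive occurrence of `term` in the given tags."""
    if not term:
        return text
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda match: open_tag + match.group(0) + close_tag, text)


print(highlight("Use iFind, not ifindd", "ifind"))
# Use <span class="Illumination">iFind</span>, not ifindd
```

On the frontend, a CSS rule for the Illumination class then decides how the matches look.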
iFind supports the ability to rank the results by the TF-IDF algorithm. The TF-IDF measure is often used in text analysis and information retrieval tasks, for example, as one of the criteria for the relevance of a document to a search query.
As a result of the SQL query, the Rank field contains the weight of the word, which is proportional to how often the word occurs in the article and inversely proportional to how often it occurs in other articles.
SELECT DocBook_contentIndRank(%ID, 'SearchString', 'SearchOption') Rank FROM DocBook WHERE %ID %FIND search_index(contentInd,'SearchString', 'SearchOption')
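To make the TF-IDF measure concrete, here is a minimal Python sketch of one common textbook variant (term frequency multiplied by the logarithm of the inverse document frequency). The formula used by the iFind ranker may differ; the `tf_idf` function and the tiny corpus are hypothetical.

```python
import math


def tf_idf(term, document, corpus):
    """TF-IDF weight of `term` in `document`, relative to `corpus` (a list of texts)."""
    term = term.lower()
    words = document.lower().split()
    if not words:
        return 0.0
    # Term frequency: share of the document's words that are the term
    tf = words.count(term) / len(words)
    # Document frequency: number of texts in the corpus that contain the term
    containing = sum(1 for text in corpus if term in text.lower().split())
    if containing == 0:
        return 0.0
    # Inverse document frequency: rarer terms get a higher weight
    idf = math.log(len(corpus) / containing)
    return tf * idf


corpus = [
    "ifind index search",
    "iknow domain",
    "index index ifind",
]

for text in corpus:
    print(text, "->", round(tf_idf("index", text, corpus), 3))
```

The third text uses the word index twice, so it receives a higher weight than the first one; the second text does not contain the word and gets zero.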
After installation, a “Search using iFind” button is added to the official documentation search.
If the Search words field is filled in, then after clicking on “Search using iFind”, the system will go to the search results page for the entered query.
do ##class(Docsearch.Installer).setup(.pVars)
Source: https://habr.com/ru/post/333582/