
Latent semantic analysis and artificial intelligence (LSA and AI)

I would like to write this post in a philosophical rather than a mathematical (more precisely, algebraic) vein: not "what a terrible beast this LSA is", but what benefit it may bring to "our collective farm", i.e. to AI.

It is no secret that AI consists of many overlapping or weakly overlapping areas: pattern recognition, speech recognition, realization of motor functions in space, and so on. But one of the main goals of AI is to teach hardware not only to understand information but also to generate new information: free or creative thinking. In this regard, the questions concern not so much the development of methods for training systems as reflection on the processes of thinking themselves and the possibility of implementing them.

As already mentioned, I will not dwell now on how LSA works under the hood (I plan to in the next post); for the moment I refer you to Wikipedia, preferably the English article on LSA. But I will try to put the main idea of the method into words.
Formally:
LSA identifies latent (hidden) associative-semantic links between terms by reducing the dimensionality of the terms-to-documents factor space. Terms can be single words or their combinations, so-called n-grams; documents are ideally collections of thematically homogeneous texts, or simply any sufficiently large text (several million word forms) split arbitrarily into pieces, for example into paragraphs.
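To make the starting point concrete, here is a minimal sketch of the terms-to-documents matrix that LSA begins with (the documents and vocabulary below are made up for illustration; a real pipeline would also apply stop-word filtering and a weighting scheme such as tf-idf):

```python
# Build a toy terms-to-documents count matrix: rows are terms,
# columns are documents (here, short "paragraphs").
from collections import Counter

documents = [
    "the phone battery lasts two days",
    "the phone camera is sharp",
    "battery life and camera quality matter",
]

# Tokenize naively and collect the vocabulary.
tokenized = [doc.split() for doc in documents]
vocab = sorted({w for doc in tokenized for w in doc})

# matrix[i][j] = how many times term i occurs in document j
matrix = [[Counter(doc)[term] for doc in tokenized] for term in vocab]

for term, row in zip(vocab, matrix):
    print(f"{term:10s} {row}")
```

Each row is the "vector" of a term across documents; it is this matrix whose rank LSA reduces.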

In plain terms:
The basic idea of latent semantic analysis is as follows: even if no dependence is observed between two words of two different vectors in the original space of word vectors (vector = sentence, paragraph, document, etc.), after a certain algebraic transformation of this vector space such a dependence may appear, and its magnitude will determine the strength of the associative-semantic link between the two words.

For example, consider two simple messages about some product "XXX" from different sources, a blog post and an advertisement (just an example for clarity).

Since the vocabularies of blogs and advertising do not overlap much, the synonyms "battery" and "accumulator" will get different weights: say, the first is small and the second, on the contrary, large. These messages can then be matched only by the product name "XXX" (a strong criterion), while the detail about the battery (call it a weak criterion) will be lost.
However, if we apply LSA, the weights of "battery" and "accumulator" will even out, and the messages can be matched on a criterion that is far from the least important for the product.
Thus, LSA "pulls together" words that differ in spelling but are close in meaning.
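As a toy illustration of this "pulling together", here is a pure-Python sketch (the vocabulary, documents, and product name are invented for this example, and a rank-1 power-iteration SVD stands in for a full decomposition): the synonyms "battery" and "accumulator" never co-occur, so their raw cosine similarity is zero, yet after a low-rank approximation of the term-document matrix their vectors align.

```python
import math

terms = ["xxx", "battery", "accumulator", "blog", "sale"]
# Term-document counts. Column 0 is a blog post, column 1 is an advertisement;
# both mention product "xxx", but each uses a different word for the battery.
A = [
    [1, 1],  # "xxx" occurs in both messages
    [1, 0],  # "battery" only in the blog post
    [0, 1],  # "accumulator" only in the ad
    [1, 0],  # "blog"
    [0, 1],  # "sale"
]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank1_approximation(A, iters=50):
    """Rank-1 SVD approximation via power iteration on A^T A."""
    m, n = len(A), len(A[0])
    v = [1.0] * n
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av]
    # Reconstruct sigma * u * v^T: every term row collapses onto one direction.
    return [[sigma * u[i] * v[j] for j in range(n)] for i in range(m)]

b, a = terms.index("battery"), terms.index("accumulator")
print("raw cosine:", cosine(A[b], A[a]))  # 0.0 -- the words never co-occur

A1 = rank1_approximation(A)
print("LSA cosine:", round(cosine(A1[b], A1[a]), 3))  # 1.0 -- pulled together
```

With only two documents a rank-1 approximation is an extreme case (all terms collapse onto a single latent "topic"), but it shows the mechanism: the reduced space transfers similarity through shared context ("xxx") rather than direct co-occurrence.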

The question is: why is all this necessary, and what do associative-semantic links have to do with AI? Let us turn to history.

One of the questions that the great thinkers of mankind have been asking since the time of Plato is the question of our ability to know the world. In the 20th century, the famous American linguist Noam Chomsky formulated the so-called Plato's problem: why is the amount of knowledge an individual possesses so much greater than what he can extract from his everyday experience? In other words, how can information obtained from a sequence of events of relatively small variability be correctly used and adapted to a potentially infinite number of situations?

For example, children's vocabulary grows by 3-8 words a day on average. At the same time, as linguists put it, a denotation does not always have a strictly defined referent; in plain words, not every word correlates with a real-life thing or action (abstract concepts, words carrying no informative load, and so on).
The question arises: how does a child determine each new word's meaning and its relation to other meanings, i.e. how do new "meanings" (denotations) form, and how do they relate to one another?

The work of “semantic” mechanisms can be conceptually compared with the processes of categorization or clustering. With this approach, the problem arises of determining the initial concepts or primary clusters, their boundaries and their number.

LSA, its varieties (PLSA, GLSA) and its relatives (such as LDA, the well-known latent Dirichlet allocation) make it possible to model associative-semantic links between words, which, on the one hand, allows us to abandon the rigid binding of a lexical unit to any single cluster and, on the other hand, to preserve a holistic system of connections between words.

This means that words in our brain are not classified by concepts (they do not sit in clusters) but form a complex system of connections among themselves, and these connections can change dynamically for many reasons: context, emotions, knowledge of the external world, and so on. Algorithms like LSA give us an opportunity to model the simplest elements of "understanding". But, I will be asked, how can one prove that the brain works on the principle of LSA? Most likely one cannot, and there is no need to: airplanes fly, too, without flapping their wings. LSA is just one of the methods that lets us model the simplest systems of "thinking", both for practical purposes (intelligent systems) and for further research into human cognitive functions.

An obvious disadvantage of LSA is the non-normality (non-Gaussianity) of the probability distribution of words in any natural language. This problem can be alleviated by smoothing the sample (for example, by using phonetic words: the distribution becomes more "normal"), or by using probabilistic LSA, the so-called PLSA, based on the multinomial distribution.
Other, less obvious drawbacks of LSA (and similar methods) for processing unstructured information include the vagueness of the method itself (in particular, the choice of how many singular values of the diagonal matrix to keep) and the interpretation of the result, not to mention the problem of balancing the training text.

As a rule, for a good model fewer than 1-2 percent of the total number of diagonal (singular) values are retained (after the SVD transformation, but more on that in the next post). And, as practice shows, increasing the number of factors degrades the result. Yet at around 10 percent of the total number of diagonal values there may again be a spike in quality, similar to the result obtained at 1%.
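The rule of thumb above can be sketched as follows; note that the decaying spectrum here is entirely synthetic, fabricated only to show the bookkeeping (real spectra depend on the corpus):

```python
# Keep roughly 1 percent of the singular values after an SVD and see how much
# of the spectrum's "energy" (sum of squared singular values) survives.
singular_values = [1000.0 / (i + 1) for i in range(500)]  # fake decaying spectrum

def truncate_spectrum(s, fraction=0.01):
    """Keep roughly `fraction` of the singular values (at least one)."""
    k = max(1, int(len(s) * fraction))
    return s[:k]

kept = truncate_spectrum(singular_values, 0.01)
energy = sum(x * x for x in kept) / sum(x * x for x in singular_values)
print(f"kept {len(kept)} of {len(singular_values)} values, "
      f"{energy:.0%} of the spectrum's energy")
```

Because singular values typically decay quickly, even a tiny fraction of them can carry most of the matrix's structure, which is why such aggressive truncation is workable at all.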

Corpus balance is an eternal problem with no good solution to date, so it is better to pass over it modestly in silence.

Interpretability of LSA results (as with LDA) is also difficult: a human can still work out what a topic produced by the analysis is about, but a machine cannot interpret (annotate) a topic without drawing on a large number of good and varied thesauri.

Thus, despite the laboriousness and opacity of LSA, it can be successfully applied to various tasks where it is important to capture the semantics of a message, or to generalize or expand the "meanings" of a search query.

Since this post was written in an ideological vein ("why is this needed?"), I would like to devote the next post to practical matters ("how does it work?").

Literature:
1. Landauer T. K., Dumais S. T. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge // Psychological Review. 1997. Vol. 104. P. 211-240.
2. Landauer T. K., Foltz P. W., Laham D. An Introduction to Latent Semantic Analysis // Discourse Processes. 1998. Vol. 25. P. 259-284.
3. www-timc.imag.fr/Benoit.Lemaire/lsa.html — Readings in Latent Semantic Analysis for Cognitive Science and Education (a collection of articles and references on LSA).
4. lsa.colorado.edu — a site dedicated to LSA modeling.

Source: https://habr.com/ru/post/230075/
