Neurotags

I. Introduction

In this article I want to consider one way of implementing what the W3C * (World Wide Web Consortium) began developing as the Semantic Web **.

The ideas presented here are not the Semantic Web concept itself, and I do not tie myself to the standards developed by the W3C: in my view, the most viable model of the future information network looks slightly different and has other needs. Still, this concept has much in common with the Semantic Web.

Much has been said about the theory of the Semantic Web, but a lot of time has passed and we still do not see these ideas implemented on the Internet.
I will focus on the conceptual basis of a framework that embodies everything the proponents of the Semantic Web dreamed of, and I will try to consider the practical aspects of developing such a system and dig into the problems that arise. First of all, these are the problems of adapting such a complex system to its users and of building a conceptual model in which users are motivated to interact with the system and can easily and conveniently get everything they need from it.

In this light, interaction with users is especially important, because the system is essentially self-learning and will have to draw its information from communication with users. The task should therefore be approached not only as a programmer but also as a sociologist, a psychologist and, most importantly, an inventor.

This framework can be regarded as the foundation of a network that treats the Internet simply as a means of transferring data. It is a set of protocols, and programs for working with them, abstracted one level above protocols such as HTTP, SMTP, SNMP, FTP, and Telnet.

It is a model of a management and self-organization system for a distributed, global-scale knowledge base. At the present stage of the Internet's development it can perform the functions of an ordinary CMS, only with significantly expanded functionality, and can successfully serve commercial purposes.

Practical aspects and concrete solutions to these tasks lie ahead; the first step is the theory.

* The World Wide Web Consortium (W3C) is an organization that develops and implements technology standards for the World Wide Web.

** The Semantic Web is the concept of a network in which every resource written in human language would be accompanied by a description understandable to a computer.

II. Neurotags

There are now many websites united by some general principles, which it is fashionable to call "Web 2.0".
One of the trends of this phenomenon is folksonomy *.

That is good. But what if we look further?

1. Theory of neurotags

What do ordinary tags (labels, keywords) lack?
Relationships and classification.

Let's introduce a new concept:
A neurotag is an ordinary keyword together with the system of relationships between this keyword and other keywords.
Such structures are often used in expert systems ** to form a knowledge base. In the literature on expert systems, this concept corresponds to semantic tags.

Any neurotag can have any number of links of any type to other neurotags.

Relationships can also be classified. For example, a linked object can be:
• Synonym
• Antonym
• Parent category (hypernym), e.g.: food = G (pie)
• Child category (descendant of the tag, hyponym), e.g.: stool = g (furniture)
• Related category (the tags share many parents), e.g.: guppy = p (goldfish) (the common hypernym is "aquarium fish")
• Part of this object (meronym), e.g.: engine = m (car)
• Object that includes this one (holonym), e.g.: house = x (roof)
• Translation of the word into another language (a subtype of synonym), e.g.: небо = t (sky)
This resembles an RDF triple ***, which has the form "subject - predicate - object".
(For example, the statement “green eyes” can be represented in RDF terms as follows: the subject is “eyes”, the predicate is “color”, the object is “green”.)
In this case, however, the set of predicates is limited.

On the one hand, a link type such as "related category" may seem redundant, since we can always determine whether two neurotags are related from their lists of parents. On the other hand, having this type of link between tags can significantly improve the process of compressing the neurotag network, which is discussed later.

These link types combine genus-species (hypernym/hyponym), synonym/antonym, and part/whole relations, and together they form a hierarchical structure.

In addition, each link has a parameter, its power, which shows how relevant the link is from the point of view of human logic.
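
The structure described so far (a keyword, typed links to other keywords, and a power on each link) can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `LinkType`, `Link`, and `NeurotagNetwork` are hypothetical, and a link (source, type, target) is read here as "target is a <type> of source".

```python
from dataclasses import dataclass
from enum import Enum


class LinkType(Enum):
    """The limited set of predicates from the list above (names are assumptions)."""
    SYNONYM = "synonym"
    ANTONYM = "antonym"
    HYPERNYM = "hypernym"        # target is a parent category of source
    HYPONYM = "hyponym"          # target is a child category of source
    RELATED = "related"          # source and target share parents
    MERONYM = "meronym"          # target is a part of source
    HOLONYM = "holonym"          # target includes source
    TRANSLATION = "translation"  # target is source in another language


@dataclass
class Link:
    source: str
    link_type: LinkType
    target: str
    power: float = 0.0  # how relevant the link is to human logic


class NeurotagNetwork:
    """Keywords plus weighted, typed links between them."""

    def __init__(self):
        self._links = {}  # (source, link_type, target) -> Link

    def link(self, source, link_type, target):
        """Return the link for this triple, creating it with power 0 if new."""
        key = (source, link_type, target)
        if key not in self._links:
            self._links[key] = Link(source, link_type, target)
        return self._links[key]

    def links_from(self, source):
        """Return all links whose source is the given keyword."""
        return [l for l in self._links.values() if l.source == source]


network = NeurotagNetwork()
network.link("pie", LinkType.HYPERNYM, "food").power += 1    # food = G (pie)
network.link("car", LinkType.MERONYM, "engine").power += 1   # engine = m (car)
```

Keying links by the whole triple means that two tags can be linked in several ways at once, each with its own power.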

This way of organizing information is called a semantic network **** with weighted (fuzzy) links. Such networks are often used in expert systems as a knowledge base.
Besides the link types above, there can be many other relations: functional (usually defined by verbs such as “produces”, “influences” ...), quantitative (greater than, less than, equal to ...), spatial (far from, close to, behind, under, above ...), temporal (earlier, later, during ...), attributive (to have a property, to have a value), and so on.
You can also go another, more interesting way: not classifying links at all. In theory, classifying links as described above is superfluous, but the step is most likely justified by the computing-power problems that arise when developing such systems. Even in a system with unclassified links, it is possible to determine how two neurotags are related.
For example, if only parent/child neurotags are linked, then of two linked neurotags in a large system, the parent will as a rule have more links.
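
As a rough sketch of that last heuristic, the Python below guesses which of two linked tags is the parent by comparing how many links each one has. The sample data and the function names are made up for illustration, and the guess is only as good as the assumption that parents accumulate more links.

```python
from collections import Counter

# Hypothetical untyped links: we only know that two tags are connected.
untyped_links = [
    ("furniture", "stool"),
    ("furniture", "table"),
    ("furniture", "sofa"),
    ("stool", "bar stool"),
]


def degree(links):
    """Count how many links each tag takes part in."""
    counts = Counter()
    for a, b in links:
        counts[a] += 1
        counts[b] += 1
    return counts


def likely_parent(a, b, links):
    """Guess the parent of a linked pair: the tag with more links.

    Returns None on a tie, where the heuristic cannot decide.
    """
    counts = degree(links)
    if counts[a] == counts[b]:
        return None
    return a if counts[a] > counts[b] else b


print(likely_parent("furniture", "stool", untyped_links))
print(likely_parent("stool", "bar stool", untyped_links))
```

On a small network ties and wrong guesses are likely, which is why the text restricts the claim to a large system.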

It is also possible to treat each individual link as a vertex (a neurotag) in its own right and to link it to other vertices.
For example, if there are two linked tags, "sky" and "blue", then the link between them is nothing other than "blue sky".
In this way we can organize genus-species relationships even without typed predicates.

In such a network, a large number of cycles can in theory form, so when traversing vertices recursively it is necessary to limit the recursion depth in order to avoid infinite loops and stack overflow.

For now I would like to consider the case of classified links, because IMHO it is more intuitive for human thinking.

We will learn about the benefits and problems of a system built on such a network of interconnected keywords as we reflect on the practical aspects of developing it.
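
One possible way to guard against cycles is sketched below. Instead of recursion it uses an iterative breadth-first traversal with a depth limit and a set of visited tags, which sidesteps stack overflow altogether. The adjacency-dictionary format and the name `reachable` are assumptions made for the example.

```python
from collections import deque


def reachable(graph, start, max_depth):
    """Return the tags reachable from `start` within `max_depth` links.

    The `seen` set stops cycles from being walked forever, and the
    explicit queue replaces recursion, so the call stack cannot overflow.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        tag, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached, do not expand further
        for neighbor in graph.get(tag, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


# Hypothetical network containing a cycle: sky -> blue -> sea -> sky
graph = {"sky": ["blue", "stars"], "blue": ["sea"], "sea": ["sky"], "stars": []}

print(reachable(graph, "sky", 1))
print(reachable(graph, "sky", 5))
```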

* Folksonomy (from the English folksonomy: folk + taxonomy) is a neologism denoting the practice of collaborative categorization by means of freely chosen keywords. In other words, it refers to the spontaneous cooperation of a group of people to organize information into categories, and it attracts attention because it is completely different from traditional formal methods of faceted classification. As a rule, this phenomenon occurs only in non-hierarchical communities, such as publicly accessible websites, and not in multi-level teams. Since the organizers of the information are usually also its main users, folksonomy produces results that more accurately reflect the collective conceptual model of information of the whole group.

** An expert system (ES) is an intelligent program that can stand in for a human expert in resolving a problem situation, drawing logical conclusions from knowledge in a particular subject area and providing solutions to specific tasks.
Artificial intelligence researchers began developing ES in the 1970s, and in the 1980s ES gained commercial backing.

*** The Resource Description Framework (RDF) is a model developed by the W3C for describing resources, in particular metadata about resources. The model is based on the idea of making statements of a special kind about a resource. One of the main goals of RDF is to represent statements in a form that machines and humans can read equally well. There are several syntaxes for representing RDF information, the most common being RDF/XML, triples, and the graph model.

**** A semantic network is one way of representing knowledge. The name combines terms from two sciences: semantics, which in linguistics studies the meaning of sentences, and the network, which in mathematics is a kind of graph. In a semantic network the vertices are the concepts of the knowledge base, and the (directed) arcs define the relations between them. Thus, a semantic network reflects the semantics of the subject domain in the form of concepts and relations.

III. Closer to the point

So, relationships. Where do they come from?
With ordinary labels (keywords) it is clear: the user simply enters them in the designated field.
But how does the system learn about the links between them?
First of all, you have to ask the users.
IMHO, the most convenient way is an unobtrusive and strictly rationed polling program, shown to the user alongside the blocks of content they are interested in.

In practice, while browsing the pages of some informational web resource that uses a neurotag system, the user will come across a small questionnaire. It uses Ajax so that a page reload does not pull the user away from the content, and it occupies a minimal area of the page. Something like:

Q: How are "milk" and "goat's milk" related?
1. "milk" is a child category of "goat's milk"
2. "goat's milk" is a child category of "milk"
3. they are synonyms
4. they are the same thing in different languages
5. there is no connection

But simply inserting such polls into the site's content would be ineffective.
It would merely get in the way of the content the user came for.
Therefore, an incentive is needed.

Here the most convenient incentive could be to let participation in polls affect the user's rating in the system, since user ratings have recently become a very fashionable and popular feature of web resources.

For this it is advisable to have a separate characteristic, a special rating, that affects the overall rating only within strict limits, so that a user's rating cannot grow through system polls alone. (From here on, the integrated questionnaire about the relationships between neurotags is called a system poll.)
A user's answer to a system poll either increases the power of the selected relationship by 1 or does nothing (a “no connection” link could also be recorded between the neurotags, but that is clearly redundant data).
The user's rating can also be taken into account when increasing the power of a relationship.
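
A minimal sketch of how one poll answer might update the network is given below. The function name `record_answer`, the `max_rating` parameter, and in particular the rule that scales the gain by the user's rating are assumptions for illustration; the text only says that the power grows by 1 and that the rating can be taken into account.

```python
from collections import defaultdict

# power[(tag_a, tag_b, relation)] -> accumulated power of that link
power = defaultdict(float)

NO_CONNECTION = "no connection"


def record_answer(tag_a, tag_b, relation, user_rating=None, max_rating=100.0):
    """Apply one system-poll answer to the network.

    Without a rating the chosen link gains exactly 1. With a rating the
    gain is scaled into the range [0, 1]; this scaling rule is an assumption.
    """
    if relation == NO_CONNECTION:
        return  # "no connection" answers are not stored (redundant data)
    gain = 1.0
    if user_rating is not None:
        gain = max(0.0, min(user_rating, max_rating)) / max_rating
    power[(tag_a, tag_b, relation)] += gain


record_answer("goat's milk", "milk", "child category")
record_answer("goat's milk", "milk", "child category", user_rating=50)
record_answer("goat's milk", "milk", NO_CONNECTION)
```

After these three answers the only stored link is the "child category" one, with the plain vote and the half-weight vote added together.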

Neurotags for polls are selected from the neurotag list of a single content unit.
For example, if a user uploaded a photo to the server and tagged it with the keywords
starry sky, sky, stars, night, it is logical to assume that these tags are related in some way.
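
Picking candidate pairs for polls from one content unit could look like the sketch below, where every unordered pair of distinct tags on the same photo becomes a possible question. The function name `candidate_pairs` is hypothetical.

```python
from itertools import combinations


def candidate_pairs(tags):
    """Return every unordered pair of distinct tags from one content unit."""
    unique = sorted(set(tags))  # drop duplicates, keep a stable order
    return list(combinations(unique, 2))


photo_tags = ["starry sky", "sky", "stars", "night"]
pairs = candidate_pairs(photo_tags)

for a, b in pairs:
    print(f'How are "{a}" and "{b}" related?')
```

The number of pairs grows quadratically with the number of tags, so a real system would presumably pick only a few of them per page view, in keeping with the "strictly rationed" polling described earlier.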

But the most interesting feature of a neurotag network is self-organization.
For example, if the following links exist:

“laws of Ukraine” is a child category of the neurotag “law”
“law” is a child category of the neurotag “politics”
“law on privatization” is a child category of the neurotag “laws of Ukraine”

it is logical to assume that

“laws of Ukraine” is a child category of the neurotag “politics”
“law on privatization” is a child category of the neurotag “politics”
“law on privatization” is a child category of the neurotag “law”
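
This kind of inference is a transitive closure over the child-category links. The sketch below derives the implied links from the direct ones; it ignores link power entirely, and the dictionary format and the name `infer_ancestors` are assumptions for the example.

```python
def infer_ancestors(child_of):
    """Derive every implied (child, ancestor) pair from the direct links.

    `child_of` maps a tag to the set of its direct parent categories.
    Only pairs that were not stated directly are returned.
    """
    inferred = set()
    for tag, direct_parents in child_of.items():
        stack = list(direct_parents)
        seen = set()
        while stack:
            parent = stack.pop()
            if parent in seen or parent == tag:
                continue  # already visited, or a cycle back to the tag
            seen.add(parent)
            stack.extend(child_of.get(parent, ()))
        inferred.update((tag, ancestor) for ancestor in seen - direct_parents)
    return inferred


child_of = {
    "laws of Ukraine": {"law"},
    "law": {"politics"},
    "law on privatization": {"laws of Ukraine"},
}

for child, ancestor in sorted(infer_ancestors(child_of)):
    print(f'"{child}" is a child category of "{ancestor}"')
```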

The system is able to derive new knowledge from existing knowledge, to find patterns in the knowledge base, to find contradictions and discrepancies in it, to monitor its correct organization (introspection), and to justify its conclusions by "explaining" its line of reasoning.
And on any disputed question (such questions will always arise, since we use weighted, fuzzy links) the system will generate unobtrusive polls for registered users.

As the neurotag database grows, there are more and more opportunities to establish links between tags without user participation. Their number grows exponentially as web resources grow, so large-scale web resources will see tangible benefits from such a system.

The mechanism of system polls and ratings is a fairly good way to handle knowledge-base maintenance, which is the bottleneck of expert systems and has long needed further development ... As we can see, though, it has found that development in a slightly different field.

The first and most basic reason knowledge bases in expert systems are hard to maintain is the rapid “aging” of knowledge: the rate of “obsolescence” often exceeds the speed at which the ES itself is built. The second reason is the shortage of specialists in knowledge engineering.
In a global social network of the new generation, however, every registered user is a knowledge engineering specialist, and one from whom, in essence, no engineering knowledge is required.
The problem of rapidly aging knowledge is also less pressing here because, unlike expert systems, we work with highly general, global data.

This is a great benefit for search engines, because it makes it possible to significantly widen the search scope and make search much more interactive.

It will also turn the chaotic “tag cloud”, so fashionable in Web 2.0, into an ordered tree structure.

Pragmatists may rightly say that such a system requires excessive computing power, that the neurotag database will very quickly grow to an enormous size, and that it demands more attention from the user, all of which may not be justified by the benefits of using a neurotag network.

But we have not yet actually reached the real advantages, since they are tied to the practical implementation of specific services. This is a good place to apply creative potential in developing fundamentally new approaches to the user interface, since here we are working with a fundamentally different organization of data.
Looking at pipes.yahoo.com, for example, you begin to realize that building really convenient interfaces for such complex data structures is quite a solvable task, and that it can be done far more conveniently than it may seem to someone not versed in the subtleties.

But the most important thing that this approach to developing web resources can give us, I will describe in another article. It is a slightly different topic and a different technology, but it cannot exist without a neurotag network and complements it. It allows a fundamentally new approach to developing many social services, including ones that are very new for today's Internet and that people have hardly considered building because of their apparent inefficiency ... and there the neurotag network will help us.

To be continued.
© 2007 Vitaly Stepanenko
That is, me :)

Source: https://habr.com/ru/post/3447/
