Using semantic annotation to identify requirements

Good afternoon, %userName%.

In my previous post on requirements management for IT projects, I touched on identifying requirements by means of concepts and on reusing requirements already implemented in one project in another. In this post I would like to develop that idea.

What follows is a little math, some theory, and a lot of text.

Requirements Management

Requirements management is one of the key processes throughout the software development life cycle. It covers not only collecting the customer's wishes directly, but also presenting them in a form accessible to all participants in the development process.

The use of modern methodologies and programming paradigms, such as object-oriented programming, allows you to create standalone complete modules that can be used in several projects. Reusability is achieved by adhering to the basic principles of object-oriented programming: encapsulation, inheritance, and polymorphism.

Many business processes in enterprises working in the same field run in a similar way. The differences are minor and stem from how each enterprise's processes took shape historically. This similarity gives rise to boxed (off-the-shelf) information systems that implement the most common process flows. To adapt such a system to the specific business processes of a particular enterprise, the vendor customizes the software product.

When an information system is customized for several enterprises of one subject area, modules developed for one enterprise can be used to customize the system for another. Modifying an existing module takes significantly less time than developing it from scratch. As the number of completed modifications grows, the need for new ones decreases, because existing modifications can be reused or adapted.
To reuse the developed modules, it is not enough to follow the principles of object-oriented programming; a technology is also needed that identifies modules suitable for reuse without an expert, or with the expert's minimal involvement.

In this case the expert is an analyst or project manager. Since an analyst or project manager cannot take part in every project of the organization or keep track of every modification made, a mechanism is needed that identifies completed modifications and makes them searchable for reuse. Semantic annotation is such a mechanism.

Mathematical apparatus of semantic annotation

Full-text analysis methods do not allow texts to be identified unambiguously, because the set of lexical structures used is limited. To solve this problem, it is proposed to use semantic annotation, which describes a requirement given as a longer text by a small set of short concepts.

We define the basic concepts:

A domain is a set of projects of one subject area.
A project is a set of requirements that implement a given functionality, as well as activities aimed at achieving results and creating a unique product or service.
A concept is an attribute that identifies a requirement from a particular point of view or subject area.
A category, or linguistic variable, is a set of concepts related to one subject area or point of view. A concept in this case is a term of that variable.

We present the requirement using the following model:

$$\text{Req} = \langle C, R \rangle$$

where
$C$ is the condition or capability that the requirement should represent,
$R$ is the implementation of this requirement in the system.

The same requirement in natural language can be identified by a set of concepts:

$$\text{Req} = \{C_1, C_2, \ldots, C_n\}$$

where
$C_i$ is a concept describing the requirement.

Each requirement must have concepts characterizing the requirement from the following points of view:

object,
subject,
event,
action.

Since requirements are an integral part of a project, and the project in turn belongs to a domain, each requirement within the domain also receives the set of categories defined for that domain.

Thus, the requirement can be represented as the following model:

$$\text{Req} = \langle C_O, C_S, C_E, C_A, \{C_D\} \rangle$$

where
$C_O$ is the concept describing the object of the requirement,
$C_S$ is the concept describing the subject of the requirement,
$C_E$ is the concept describing the event of the requirement,
$C_A$ is the concept describing the action,
$\{C_D\}$ is the set of concepts from the categories received from the domain.

As the measure for comparing two requirements we take the semantic distance, a real number in the range from 0 to 1, where 1 means the requirements are identical and 0 means they are completely unrelated. Note that, despite the name, a larger value here means greater similarity. The input data for the calculation are the concepts with which the requirements were annotated.

We introduce additional concepts:

An alphabet is an arbitrary non-empty finite set whose elements are called letters or symbols.

A word or chain in the alphabet V is an arbitrary tuple from the set $V^k$ (the k-th Cartesian power of the alphabet V) for some k = 0, 1, 2, ...

In this particular case, the alphabet is the totality of all the concepts available in the system; concepts are symbols of this alphabet. The set of concepts describing the requirement is a word whose length is determined by the number of categories of this domain. The position of each character in a word is determined by the category to which the concept belongs, as a result of which we have a finite set of words that can be composed of the symbols of this alphabet.
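As a minimal sketch of this encoding, the snippet below turns an annotation into a fixed-length word in which the position of each concept is fixed by its category. The category names, the `module` domain category, the function name `to_word`, and the sample concepts are all illustrative assumptions, not part of the original article.

```python
from typing import Dict, Tuple

# Hypothetical category order for one domain: the four mandatory viewpoints
# plus one illustrative domain-specific category ("module").
CATEGORIES: Tuple[str, ...] = ("object", "subject", "event", "action", "module")


def to_word(annotation: Dict[str, str]) -> Tuple[str, ...]:
    """Turn a category -> concept mapping into a fixed-length word.

    Position i of the word always holds the concept for CATEGORIES[i],
    so two words can later be compared position by position.
    """
    missing = [c for c in CATEGORIES if c not in annotation]
    if missing:
        raise ValueError(f"annotation lacks categories: {missing}")
    return tuple(annotation[c] for c in CATEGORIES)


# Illustrative annotation; the order of keys does not matter.
requirement = {
    "subject": "accountant",
    "object": "invoice",
    "action": "approve",
    "event": "invoice received",
    "module": "accounts payable",
}
word = to_word(requirement)
print(word)
```

Fixing the positions by category is what makes a purely positional comparison meaningful, since no insertions or transpositions can occur.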

Semantic distance can be determined based on the calculation of the following indicators:

Levenshtein distance, defined as the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into the other.
The Damerau-Levenshtein distance extends the Levenshtein distance by also counting transpositions of adjacent characters. Using it to find the semantic distance is unjustified, since each character occupies a strictly defined position in the word according to the category of its concept.
Hamming distance, defined as the number of positions in which two strings of equal length differ.

To find the semantic distance in this article, the Hamming distance is taken as the basis. In general form it is calculated by the following formula:

$$d = \sum_{i=1}^{n} H(a_{i1}, a_{i2})$$

where
$a_{i1}$ is the i-th character of the first word,
$a_{i2}$ is the i-th character of the second word,
$n$ is the length of the words.

$H$ is equal to one if the symbols $a_{i1}$ and $a_{i2}$ coincide and to zero in all other cases. With this convention the sum counts the positions in which the two words coincide.

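To calculate the semantic distance between requirements, we use the Hamming distance in the following form:

$$L = \frac{1}{N} \sum_{i=1}^{N} H(C_{i1}, C_{i2})$$

where
$L$ is the semantic distance,
$C_{i1}$ and $C_{i2}$ are the i-th concepts of the first and second requirement,
$N$ is the number of concepts in the requirement (the length of the requirement).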
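The calculation can be sketched in a few lines, assuming the measure is the share of positions in which the two requirements carry the same concept (the sum of H divided by N). The function names `match` and `semantic_distance` and the sample concepts are illustrative assumptions.

```python
from typing import Sequence


def match(a: str, b: str) -> int:
    """The indicator H from the text: 1 if the two concepts coincide, else 0."""
    return 1 if a == b else 0


def semantic_distance(first: Sequence[str], second: Sequence[str]) -> float:
    """Share of positions with the same concept: 1.0 identical, 0.0 unrelated."""
    if len(first) != len(second):
        raise ValueError("requirements must have the same number of concepts")
    if not first:
        raise ValueError("requirements must contain at least one concept")
    return sum(match(a, b) for a, b in zip(first, second)) / len(first)


# Concepts ordered as (object, subject, event, action); only the subject differs.
r1 = ("invoice", "accountant", "invoice received", "approve")
r2 = ("invoice", "manager", "invoice received", "approve")
print(semantic_distance(r1, r2))  # 0.75
```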

Categories within a domain can have different priorities, that is, different weights. A match in a category with a large weight should have a greater influence on the semantic distance. To reflect the importance of categories within the domain when calculating the semantic distance, each category is assigned a weight. Weights are determined by the system based on feedback from the expert:

The expert is offered a list of requirements that, according to the calculated semantic distance, are similar to the one entered (or selected from the existing ones).
The expert marks the requirements that, in their view, really are similar.
Each category in which a marked requirement coincides with the entered one has its weight increased by one.

Initially, all categories within the domain have a weight equal to one.
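The feedback loop above can be sketched as follows. The text does not say whether a weight grows once per feedback round or once per marked requirement; this sketch assumes the latter. The function name `update_weights` and the sample data are illustrative assumptions.

```python
from typing import Dict, Iterable, Sequence


def update_weights(
    weights: Dict[str, int],
    categories: Sequence[str],
    entered: Sequence[str],
    marked_similar: Iterable[Sequence[str]],
) -> Dict[str, int]:
    """Return new weights after one round of expert feedback.

    Assumption: for every requirement the expert marked as similar, each
    category whose concept coincides with the entered requirement gains 1.
    """
    updated = dict(weights)  # do not mutate the caller's dictionary
    for candidate in marked_similar:
        for category, a, b in zip(categories, entered, candidate):
            if a == b:
                updated[category] += 1
    return updated


categories = ("object", "subject", "event", "action")
weights = {c: 1 for c in categories}  # initial weights are all equal to one
entered = ("invoice", "accountant", "invoice received", "approve")
marked = [
    ("invoice", "manager", "invoice received", "approve"),
    ("invoice", "accountant", "payment due", "reject"),
]
new_weights = update_weights(weights, categories, entered, marked)
print(new_weights)
```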

We represent a category by the following model:

$$\text{Cat} = \langle T, W \rangle$$

where
$T$ is the category name,
$W$ is the weight of the category within the domain.

Then the semantic distance, taking into account the weights of the categories, is calculated by the following formula:

$$L = \frac{\sum_{i=1}^{N} W_i \, H(C_{i1}, C_{i2})}{N \cdot \max W}$$

where
$W_i$ is the weight of the i-th category within the domain,
$\max W$ is the weight of the category with the maximum weight within the domain.
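A minimal sketch of the weighted calculation follows. It assumes that the weighted matches are summed and normalized by N times the maximum weight, which is the reading the symbol list above suggests; the function name and the sample weights are illustrative assumptions.

```python
from typing import Sequence


def weighted_semantic_distance(
    first: Sequence[str],
    second: Sequence[str],
    weights: Sequence[float],
) -> float:
    """Weighted share of matching positions, normalized by N * max(weights)."""
    n = len(first)
    if n == 0 or len(second) != n or len(weights) != n:
        raise ValueError("requirements and weights must be non-empty and equally long")
    matched = sum(w for a, b, w in zip(first, second, weights) if a == b)
    return matched / (n * max(weights))


# Concepts ordered as (object, subject, event, action); only the subject differs.
r1 = ("invoice", "accountant", "invoice received", "approve")
r2 = ("invoice", "manager", "invoice received", "approve")

unit = weighted_semantic_distance(r1, r2, (1, 1, 1, 1))      # same as unweighted
weighted = weighted_semantic_distance(r1, r2, (3, 2, 2, 2))  # (3 + 2 + 2) / (4 * 3)
print(unit, weighted)
```

With all weights equal to one the result reduces to the unweighted value. Under this normalization two identical requirements score exactly 1 only when all weights are equal; dividing by the sum of the weights instead would keep the upper bound at 1.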

The Hamming method is sufficient for strings in which each character is independent of the others. Since concepts are terms expressed in natural language rather than plain binary values, semantic relations such as synonymy, antonymy, and meronymy can be established between them.

To calculate the semantic distance, taking into account the semantic relations between concepts, we introduce the following concept model:

$$C = \langle V, \{S\}, \{M\} \rangle$$

where
$C$ is the concept,
$V$ is the value of the linguistic variable describing the concept,
$\{S\}$ is the set of concepts synonymous with this one. The semantic distance between synonyms is 1.
$\{M\}$ is the set of meronyms of this concept. The semantic distance in this case is set by an expert on the basis of a dictionary of meronyms. The less closely related the terms, the smaller the semantic distance between them: it equals one if the terms are synonymous and tends to zero as the terms grow further apart.

Thus, the semantic distance, taking into account the semantic relations, can be calculated by a formula that, for each concept, also considers the set of concepts standing in a semantic relation to it.

In this case, not only the source concepts are compared, but also all the concepts semantically related to them. If the source concepts do not match, the related concepts are compared in the following sequence:

All synonymous concepts are compared. If no match is found among the synonyms, proceed to the next step.
All concepts related by meronymy are compared, in descending order of semantic distance. The distance between the source concept and its meronym lies in the interval [0..1].
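The comparison sequence above can be sketched for a single pair of concepts. This is a simplified reading: it looks each concept up directly in the other's synonym set and meronym dictionary and takes the highest expert-assigned value. The class `Concept`, the function `concept_closeness`, and all sample values are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Concept:
    value: str                                                # V
    synonyms: Set[str] = field(default_factory=set)           # {S}
    # {M}: meronym value -> expert-assigned semantic distance in [0, 1]
    meronyms: Dict[str, float] = field(default_factory=dict)


def concept_closeness(a: Concept, b: Concept) -> float:
    """Compare two concepts: exact match, then synonyms, then meronyms."""
    if a.value == b.value:
        return 1.0
    # Step 1: synonyms count as a full match.
    if b.value in a.synonyms or a.value in b.synonyms:
        return 1.0
    # Step 2: meronyms; keep the largest expert-assigned value, if any.
    candidates = []
    if b.value in a.meronyms:
        candidates.append(a.meronyms[b.value])
    if a.value in b.meronyms:
        candidates.append(b.meronyms[a.value])
    return max(candidates, default=0.0)


invoice = Concept("invoice", synonyms={"bill"}, meronyms={"invoice line": 0.6})
bill = Concept("bill")
line = Concept("invoice line")
contract = Concept("contract")

print(concept_closeness(invoice, bill))      # 1.0
print(concept_closeness(invoice, line))      # 0.6
print(concept_closeness(invoice, contract))  # 0.0
```

In the positional formulas, a value of this kind would take the place of the 0/1 indicator H for each category.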

Conclusion

Using semantic distance and semantic annotation allows you to:

Identify similar requirements at the entry stage and prevent re-entry.
Look for similar requirements among those already implemented and reuse their implementing code, test scripts, use cases and other design artifacts.
Perform cluster analysis of requirements for their grouping and subsequent analysis.
Predict requirements parameters.
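The first two uses in this list amount to a ranked search over stored requirements. The sketch below assumes the unweighted share-of-matches measure; the requirement identifiers, the threshold of 0.5, and the function names are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def similarity(first: Sequence[str], second: Sequence[str]) -> float:
    """Share of positions in which both requirements carry the same concept."""
    if len(first) != len(second) or not first:
        raise ValueError("requirements must be non-empty and equally long")
    return sum(a == b for a, b in zip(first, second)) / len(first)


def find_similar(
    new: Sequence[str],
    stored: Dict[str, Sequence[str]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (id, score) pairs at or above the threshold, best match first."""
    scored = [(req_id, similarity(new, word)) for req_id, word in stored.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical repository of already implemented requirements.
stored = {
    "REQ-1": ("invoice", "accountant", "invoice received", "approve"),
    "REQ-2": ("invoice", "manager", "invoice received", "approve"),
    "REQ-3": ("report", "manager", "month end", "generate"),
}
new_requirement = ("invoice", "accountant", "invoice received", "reject")
result = find_similar(new_requirement, stored)
print(result)  # [('REQ-1', 0.75), ('REQ-2', 0.5)]
```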

Predicting requirement parameters is the priority direction. It will be useful with agile development methodologies, for example Scrum, to predict the complexity of requirements.

PS: Please forgive the academic style of presentation; this is a first attempt at a text for publication in a VAK journal.

Source: https://habr.com/ru/post/126248/
