
Using semantic annotation to identify requirements

Good afternoon, %username%.

In my previous post on managing requirements for IT projects, I touched on identifying requirements with the help of concepts and on reusing requirements already implemented in one project in another. In this post, I would like to develop that idea further.

What follows contains a little math, some theoretical reasoning, and a lot of text.
Requirements Management

Requirements management is one of the key processes throughout the software development life cycle. It covers not only collecting the customer's explicit wishes but also presenting them in a form accessible to all participants in the development process.

Modern methodologies and programming paradigms, such as object-oriented programming, make it possible to create self-contained modules that can be used in several projects. Reusability is achieved by adhering to the basic principles of object-oriented programming: encapsulation, inheritance, and polymorphism.

Many business processes at enterprises in the same field of activity run in a similar way. The differences between them are minor and stem from historically established business process structures. This similarity has given rise to boxed (off-the-shelf) versions of information systems that implement the most common business process flows. To adapt such an information system to the specific business processes of a particular enterprise, the supplier customizes the software product.

When an information system is customized for several enterprises in one subject area, modules developed for one enterprise can be used to customize the system for another. Modifying an existing module takes significantly less time than developing it from scratch. As the number of completed modifications grows, fewer new ones are needed, because existing ones can be reused or adapted.

To reuse the developed modules, it is not enough to follow the principles of object-oriented programming; a technology is also needed that identifies modules for reuse without involving an expert, or with minimal expert participation.

Here, the expert is an analyst or a project manager. Since an analyst or project manager cannot take part in every project of the organization and be aware of every modification made, a mechanism is needed that identifies completed modifications so that they can be found and reused. Semantic annotation is such a mechanism.

Working with requirements

Working with requirements involves collecting them and then processing them. This requires a mechanism that uniquely identifies requirements and can search among the existing ones.

Most requirement descriptions are textual, written in natural language: constraints and required capabilities are described as text using domain terms.

When a new requirement is added to a project, the requirements already in the project must be searched to avoid duplicates. Two requirements are considered identical if the texts that express them correspond semantically, so a mechanism for measuring text similarity is needed to decide whether requirements match.

The most common method for determining text similarity is the shingles algorithm. It detects fuzzy duplicates of texts and can be used to cluster documents by similarity and to detect plagiarism.

Applying this algorithm and its modifications (the super-shingle and mega-shingle algorithms) does not give representative results: requirement descriptions use a limited set of lexical constructions, which prevents an exact result.
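To make the objection to shingles concrete, here is a minimal sketch of word-level shingling with Jaccard similarity. The shingle length k = 3 and the two requirement texts are our assumptions, not taken from the article. Two requirements that differ in object and output format still share most of their shingles because they follow the same wording template:

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows) of a text."""
    words = text.lower().split()
    if len(words) < k:
        # A text shorter than k words yields one shingle made of all its words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| of two shingle sets."""
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


# Two different requirements written with the same template phrasing.
r1 = "the system must allow the manager to export the report to pdf"
r2 = "the system must allow the manager to export the invoice to excel"

# 7 shingles are shared out of 13 distinct ones, so the score is 7/13.
print(jaccard(shingles(r1), shingles(r2)))
```

A score of about 0.54 for requirements about different documents suggests why template-like requirement texts make shingle overlap a weak signal of semantic identity.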

Mathematical apparatus of semantic annotation

Full-text analysis methods cannot identify texts unambiguously because the set of lexical constructions used is limited. To solve this problem, we propose semantic annotation, which describes a requirement given as a relatively long text by a set of short concepts.

We define the basic concepts:

We represent a requirement using the following model:

$$Req = \langle C, R \rangle$$

where
C is the condition or capability that the requirement should express,
R is the implementation of this requirement in the system.

In natural language, the same requirement can be identified by a set of concepts:

$$Req = \{C_1, C_2, \ldots, C_n\}$$

where
$C_i$ is a concept describing the requirement.

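Each requirement must have concepts characterizing it from the following points of view:
  1. the object of the requirement;
  2. the subject of the requirement;
  3. the event of the requirement;
  4. the action.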

Since requirements are an integral part of a project, and the project in turn belongs to a particular domain, each requirement also receives the set of categories defined for that domain.

Thus, a requirement can be represented by the following model:

$$Req = \langle C_O, C_S, C_E, C_A, \{C_D\} \rangle$$

where
$C_O$ is the concept describing the object of the requirement,
$C_S$ is the concept describing the subject of the requirement,
$C_E$ is the concept describing the event of the requirement,
$C_A$ is the concept describing the action,
$\{C_D\}$ is the set of concepts from the categories received from the domain.
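The model above maps naturally onto a record type. This is a minimal sketch, assuming concepts are plain strings; the class and field names (`Requirement`, `obj`, `domain`, and so on) are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Requirement:
    """Semantic annotation of a requirement: <C_O, C_S, C_E, C_A, {C_D}>."""
    obj: str        # C_O: concept describing the object of the requirement
    subject: str    # C_S: concept describing the subject of the requirement
    event: str      # C_E: concept describing the event of the requirement
    action: str     # C_A: concept describing the action
    domain: frozenset = field(default_factory=frozenset)  # {C_D}: domain category concepts

    def concepts(self) -> set:
        """Return every concept that annotates this requirement."""
        return {self.obj, self.subject, self.event, self.action} | set(self.domain)


req = Requirement(
    obj="report",
    subject="manager",
    event="end of month",
    action="export",
    domain=frozenset({"accounting"}),
)
print(sorted(req.concepts()))
```

Making the record immutable (`frozen=True`) lets annotated requirements be stored in sets or used as dictionary keys when searching for duplicates.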

Let us introduce a measure of the difference between two requirements, the semantic distance: an indicator of semantic difference expressed as a real number in the range from 0 to 1, where 1 means the requirements are identical and 0 means they are completely unrelated (as defined, larger values mean closer requirements, so the measure behaves like a similarity). The input data for the calculation are the concepts used to annotate the requirements.

We also need some additional definitions:

An alphabet is an arbitrary non-empty finite set whose elements are called letters or symbols.

A word (or chain) over the alphabet V is an arbitrary tuple from the set $V^k$ (the k-th Cartesian power of the alphabet V) for k = 0, 1, 2, ...

In our case, the alphabet is the set of all concepts available in the system, and the concepts are the symbols of this alphabet. The set of concepts describing a requirement is a word whose length equals the number of categories in the domain. The position of each symbol in the word is determined by the category to which the concept belongs; as a result, we obtain a finite set of words that can be composed of the symbols of this alphabet.
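Tying positions to categories can be sketched as a function that turns a category-to-concept annotation into a fixed-length tuple. The category list and the function name are hypothetical, and filling an unannotated category with `None` is our assumption (the article expects every category to be annotated):

```python
from collections.abc import Mapping, Sequence
from typing import Optional, Tuple

# Hypothetical category order for one domain: the index of a category is the
# position its concept occupies in every requirement word of that domain.
CATEGORIES = ("object", "subject", "event", "action", "module")


def to_word(annotation: Mapping, categories: Sequence = CATEGORIES) -> Tuple[Optional[str], ...]:
    """Turn a {category: concept} annotation into a fixed-length word.

    The word has one position per domain category; a category with no
    concept in the annotation is filled with None.
    """
    return tuple(annotation.get(category) for category in categories)


word = to_word({
    "action": "export",
    "object": "report",
    "subject": "manager",
    "event": "end of month",
    "module": "accounting",
})
print(word)
```

Because positions follow the category order rather than the order in which concepts were entered, two annotations of the same requirement produce the same word.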

Semantic distance can be determined based on the calculation of the following indicators:

To find the semantic distance, this article takes the Hamming distance as its basis. In its classical form, the Hamming distance between two words of equal length n is the number of positions at which their symbols differ:

$$d_H = \sum_{i=1}^{n} \left(1 - H(a_i^1, a_i^2)\right)$$

where
$a_i^1$ is the i-th character of the first string,
$a_i^2$ is the i-th character of the second string.

H equals one if the symbols $a_i^1$ and $a_i^2$ coincide and zero in all other cases.
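Here is a minimal sketch of the classical Hamming distance over arbitrary sequences, written in terms of the match indicator H so that it mirrors the definition above. The function names are our own:

```python
from collections.abc import Sequence


def match_indicator(x, y) -> int:
    """H: 1 if the two symbols coincide, 0 otherwise."""
    return 1 if x == y else 0


def hamming_distance(a: Sequence, b: Sequence) -> int:
    """Classical Hamming distance: the number of positions where the symbols differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is defined only for words of equal length")
    return sum(1 - match_indicator(x, y) for x, y in zip(a, b))


print(hamming_distance("karolin", "kathrin"))  # 3: positions 3 to 5 differ
```

The same function works on tuples of concepts, which is how requirement words are compared below.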

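To calculate the semantic distance between requirements, we use the Hamming distance in the following normalized form, counting coinciding concepts instead of differing ones:

$$L = \frac{1}{N} \sum_{i=1}^{N} H(C_i^1, C_i^2)$$

where
L is the semantic distance,
$C_i^1$, $C_i^2$ are the i-th concepts of the first and second requirements,
N is the number of concepts in a requirement (the requirement length).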
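The normalized formula amounts to the share of positions whose concepts coincide. This is a minimal sketch, assuming requirements are already encoded as equal-length tuples of concept strings; the function name and example data are hypothetical:

```python
from collections.abc import Sequence


def semantic_distance(req1: Sequence[str], req2: Sequence[str]) -> float:
    """Unweighted semantic distance L: share of positions whose concepts coincide.

    1.0 means every concept matches, 0.0 means none does.
    """
    if len(req1) != len(req2):
        raise ValueError("requirements must be words of the same length N")
    if not req1:
        raise ValueError("a requirement must contain at least one concept")
    matches = sum(1 for c1, c2 in zip(req1, req2) if c1 == c2)
    return matches / len(req1)


a = ("report", "manager", "end of month", "export", "accounting")
b = ("invoice", "manager", "end of month", "print", "accounting")
print(semantic_distance(a, b))  # 3 of 5 concepts coincide -> 0.6
```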

Categories within a domain can have different priorities, that is, different weights. A match in a category with a larger weight should have a greater influence on the semantic distance. To reflect the importance of categories within the domain when calculating the semantic distance, each category is assigned a weight. The weights are determined by the system based on feedback from the expert.

Initially, all categories within the domain have a weight equal to one.

We represent a category with the following model:

$$K = \langle T, W \rangle$$

where
T is the category name,
W is the weight of the category within the domain.

Then the semantic distance, taking the category weights into account, is calculated by the following formula:

$$L = \frac{1}{N} \sum_{i=1}^{N} \frac{W_i}{\max W} H(C_i^1, C_i^2)$$

where
$W_i$ is the weight of the i-th category within the domain,
$\max W$ is the weight of the category with the maximum weight within the domain.
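A minimal sketch of the weighted measure, assuming the weights are given in the same category order as the requirement words; the weights, names, and example data are hypothetical. The two comparisons show that a match in a heavier category contributes more:

```python
from collections.abc import Sequence


def weighted_semantic_distance(req1: Sequence[str], req2: Sequence[str],
                               weights: Sequence[float]) -> float:
    """Weighted semantic distance: matches scaled by W_i / max W, averaged over N."""
    n = len(req1)
    if n == 0 or len(req2) != n or len(weights) != n:
        raise ValueError("requirements and weights must have the same non-zero length")
    w_max = max(weights)
    if w_max <= 0:
        raise ValueError("at least one category weight must be positive")
    total = sum(w / w_max for c1, c2, w in zip(req1, req2, weights) if c1 == c2)
    return total / n


weights = (2, 1, 1, 1, 1)  # hypothetical: the first category counts double
base = ("report", "manager", "end of month", "export", "accounting")
heavy_match = ("report", "clerk", "on request", "print", "sales")
light_match = ("invoice", "manager", "on request", "print", "sales")
print(weighted_semantic_distance(base, heavy_match, weights))  # 0.2
print(weighted_semantic_distance(base, light_match, weights))  # 0.1
```

With all weights equal to one this reduces to the unweighted measure. Under this form, unequal weights keep even identical requirements below 1 (here 0.6 for `base` against itself); dividing by the sum of weights instead would avoid that, but that variant is our suggestion, not the article's.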

The Hamming method is sufficient for strings whose characters are independent of one another. Concepts, however, are natural-language terms rather than symbols that simply match or do not, so semantic relations such as synonymy, antonymy, and meronymy can exist between them.

To calculate the semantic distance while taking the semantic relations between concepts into account, we introduce the following concept model:

$$C = \langle V, \{S\}, \{M\} \rangle$$

where
C is the concept,
V is the value of the linguistic variable describing the concept,
{S} is the set of concepts synonymous with this one; the semantic distance between them is 1,
{M} is the set of meronyms of this concept. Here the semantic distance is determined by an expert on the basis of a dictionary of meronyms. The more loosely the terms are related, the smaller the semantic distance between them: it equals one if the terms are synonymous and tends to zero as they become more distant.
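This is a minimal sketch of the concept model. It assumes concepts are identified by their string value and that the expert's meronym dictionary is supplied as a mapping from term to distance. The class and method names are hypothetical, and antonyms are left out because the model does not include them:

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Concept model <V, {S}, {M}> with expert-assigned meronym distances."""
    value: str                                    # V: value of the linguistic variable
    synonyms: set = field(default_factory=set)    # {S}: distance to each synonym is 1
    meronyms: dict = field(default_factory=dict)  # {M}: meronym -> expert distance

    def __post_init__(self) -> None:
        # Meronym distances must lie in [0, 1]: close to 1 for tightly related
        # terms, approaching 0 as the terms grow further apart.
        for term, distance in self.meronyms.items():
            if not 0.0 <= distance <= 1.0:
                raise ValueError(f"distance for {term!r} must be in [0, 1]")

    def distance_to(self, term: str) -> float:
        """Semantic distance from this concept to a term (0.0 if unrelated)."""
        if term == self.value or term in self.synonyms:
            return 1.0
        return self.meronyms.get(term, 0.0)


invoice = Concept("invoice", synonyms={"bill"}, meronyms={"invoice line": 0.7})
print(invoice.distance_to("bill"), invoice.distance_to("invoice line"),
      invoice.distance_to("warehouse"))
```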

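Thus, the semantic distance that takes semantic relations into account can be calculated by the following formula:

$$L = \frac{1}{N} \sum_{i=1}^{N} \frac{W_i}{\max W} \max_{c \in R(C_i^1)} \left( d(C_i^1, c) \, H(c, C_i^2) \right)$$

where
$R(C_i^1)$ is the set consisting of the concept $C_i^1$ and the concepts in a semantic relation with it,
$d(C_i^1, c)$ is the semantic distance between $C_i^1$ and $c$: 1 for the concept itself and its synonyms, the expert-defined value for meronyms.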

In this case, not only the original concepts are compared but also all the semantic relations associated with them. If the original concepts do not match, the related concepts are compared in the following order:
  1. All synonymous concepts are compared. If none of the synonyms matches, go to step 2.
  2. All concepts related by meronymy are compared in descending order of semantic distance. The distance between the original concept and the meronym lies in the interval [0, 1].
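The two-step comparison can be sketched as follows and combined with the category weights. The thesaurus contents, function names, and example requirements are hypothetical. Relations are looked up only from the first requirement's concept, as in the procedure above, so the score is not necessarily symmetric:

```python
from collections.abc import Mapping, Sequence

# Hypothetical thesaurus: synonym sets and expert-assigned meronym distances.
SYNONYMS: Mapping[str, set] = {"invoice": {"bill"}}
MERONYMS: Mapping[str, Mapping[str, float]] = {
    "invoice": {"invoice line": 0.7, "invoice total": 0.4},
}


def concept_score(c1: str, c2: str) -> float:
    """Compare concept c1 of the first requirement with concept c2 of the second."""
    if c1 == c2:
        return 1.0
    # Step 1: synonyms of c1; the distance between synonyms is 1.
    if c2 in SYNONYMS.get(c1, set()):
        return 1.0
    # Step 2: meronyms of c1 in descending order of semantic distance,
    # so the first hit is also the largest distance available.
    for term, distance in sorted(MERONYMS.get(c1, {}).items(),
                                 key=lambda item: item[1], reverse=True):
        if term == c2:
            return distance
    return 0.0


def relational_semantic_distance(req1: Sequence[str], req2: Sequence[str],
                                 weights: Sequence[float]) -> float:
    """Weighted semantic distance where each position scores concept_score()."""
    n = len(req1)
    if n == 0 or len(req2) != n or len(weights) != n:
        raise ValueError("requirements and weights must have the same non-zero length")
    w_max = max(weights)
    if w_max <= 0:
        raise ValueError("at least one category weight must be positive")
    return sum(w / w_max * concept_score(c1, c2)
               for c1, c2, w in zip(req1, req2, weights)) / n


req1 = ("invoice", "accountant", "export")
req2 = ("bill", "accountant", "print")
req3 = ("invoice line", "accountant", "export")
print(relational_semantic_distance(req1, req2, (1, 1, 1)))  # (1 + 1 + 0) / 3
print(relational_semantic_distance(req1, req3, (1, 1, 1)))  # (0.7 + 1 + 1) / 3
```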

Conclusion

Using semantic distance and semantic annotation allows you to:
  1. Identify similar requirements at the input stage and prevent duplicate entry.
  2. Search for similar requirements among those already implemented and reuse their implementation code, test scripts, use cases, and other design artifacts.
  3. Perform cluster analysis of requirements to group them for subsequent analysis.
  4. Predict requirement parameters.

Predicting requirement parameters is a priority and will be useful with agile methodologies such as Scrum, for example to predict the complexity of requirements.

P.S. Please don't judge the academic style too harshly: this is a first attempt at writing for publication in a VAK-listed journal.

Source: https://habr.com/ru/post/126248/

