Macromedia: analysis and interpretation of multimedia information. M-lang

This article addresses general problems in the use and development of macromedia technologies. Building on well-known principles and methods of information analysis and processing, the author sets out to define the basic concepts and rules needed to develop a generative grammar and a language for describing the process of analyzing multimedia information.
Two approaches to the analysis of multimedia information are considered: content analysis and content interpretation. The article also outlines the basic rules of M-Lang, a language for describing graphical information analysis algorithms, and gives examples of its constructions and specifications.

Introduction

Observations of how the Internet has developed in recent years suggest that new technologies for storing, displaying, processing and searching multimedia and graphic information now outnumber those for its simplest form, text. The reasons are obvious: greater connection bandwidth, greater computing power and disk space, and strong competition among resource owners in a growing commercial environment (whose potential will not be exhausted soon).
In addition, thanks to a simple idea, good advertising, novelty and thoughtful execution, resources such as Facebook, Twitter, YouTube, and their Russian counterparts VKontakte and Odnoklassniki have become very popular. Together, these two conditions shaped the development of a new informational entity: macromedia technology.
In general, macromedia is a set of technologies for the transfer, storage, processing, display, analysis and interpretation of multimedia content. Macromedia is characterized by streaming display, that is, content is shown on the user's side while it is still being transmitted; by the predominantly multimedia (video/audio) nature of the information; and by processes of interpretation, analysis and comparison of this information. The tasks of transferring, storing, processing and displaying are well studied in themselves, but the question of analyzing multimedia information remains open.

1. Content analysis method.

In the most general case, analyzing multimedia information is similar to analyzing textual information. Text analysis, however, is a much simpler process, if only because the data does not first have to be converted into a representation suitable for analysis at another level of abstraction. The task is also simplified by the fact that text data is considerably smaller than graphical data.

Example 1
Two text files were uploaded to the server:

A.txt	B.txt
Grass in the yard, firewood on the grass.	Birch firewood, inexpensive. Pickup from the yard. Tel. 123-45-67

The simplest relevance analysis proceeds as follows:
The system splits each file into an array of tokens, removes auxiliary words and punctuation marks, and performs a two-way search for the tokens of one file in the other, taking morphology into account. As a result, a link between the files through the tokens “firewood” and “yard” is found and recorded in the search index.
Of course, modern text analysis technologies are far more complicated and operate on many more parameters, but their essence remains the same: interpret the files into structures and compare their contents.
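The steps of Example 1 can be sketched in Python. This is only a minimal illustration: the stop-word list and the crude `stem` helper are assumptions standing in for a real morphological analyzer, and a set intersection plays the role of the two-way token search.

```python
import re

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "the", "in", "on", "from", "of", "tel"}


def stem(token: str) -> str:
    """Crude stand-in for morphology: drop a plural 's' (but keep 'ss')."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def tokenize(text: str) -> set[str]:
    """Split text into lowercase word tokens, dropping punctuation,
    digits and auxiliary words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {stem(word) for word in words if word not in STOP_WORDS}


def shared_tokens(text_a: str, text_b: str) -> set[str]:
    """Tokens found in both texts; the intersection is symmetric,
    so it covers the search in both directions."""
    return tokenize(text_a) & tokenize(text_b)


a_txt = "Grass in the yard, firewood on the grass."
b_txt = "Birch firewood, inexpensive. Pickup from the yard. Tel. 123-45-67"

print(sorted(shared_tokens(a_txt, b_txt)))  # ['firewood', 'yard']
```

The resulting set is what would be stored in the search index as the link between A.txt and B.txt.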
Analyzing multimedia information is a more difficult task.
Example 2
Two images were uploaded to the server:

A.bmp	B.bmp

Comparing the file contents directly yields nothing: from the machine's point of view these are different images, although in reality they are not.
At present, there are two approaches to solving this problem.
The content analysis method is a relatively simple approach that makes it possible to determine and measure the similarity of data, but nothing more. Its essence is the same as in Example 1: break the data into component parts and compare them directly. For Example 2, it looks like this:

Thus, if transformations are ignored (here, a reflection of the third element along the horizontal), the images are 75% identical; if the transformation is taken into account, they are 100% identical. We write this as “{A, B} (75/100)” and will keep to this notation from here on (let's call it M-Lang). In the second case, we can conclude that both images belong to a certain set, for example “smile”. This is where the second approach comes in.
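A rough Python sketch of this comparison follows. The tiny text "bitmaps", the split into three parts and the 50/25/25 shares are assumptions made for illustration (the shares are borrowed from the proportions used later in the article), and both reflection directions are tried because the exact transformation depends on the missing figure.

```python
# A part is a tiny bitmap: a tuple of equal-length text rows.
Part = tuple[str, ...]


def flip_vertical(part: Part) -> Part:
    """Mirror about the horizontal axis (reverse the row order)."""
    return tuple(reversed(part))


def flip_horizontal(part: Part) -> Part:
    """Mirror about the vertical axis (reverse every row)."""
    return tuple(row[::-1] for row in part)


TRANSFORMS = (flip_vertical, flip_horizontal)


def similarity(image_a, image_b):
    """Return (percent identical ignoring transformations,
    percent identical allowing one reflection per part)."""
    raw = transformed = 0
    for (share, part_a), (_, part_b) in zip(image_a, image_b):
        if part_a == part_b:
            raw += share
            transformed += share
        elif any(t(part_b) == part_a for t in TRANSFORMS):
            transformed += share
    return raw, transformed


# Hypothetical decomposition: (share in percent, bitmap).
outline = ("####", "#..#", "####")
eyes = ("#.#",)
mouth_up = ("#...#", ".###.")
mouth_down = (".###.", "#...#")

image_a = [(50, outline), (25, eyes), (25, mouth_up)]
image_b = [(50, outline), (25, eyes), (25, mouth_down)]

raw, transformed = similarity(image_a, image_b)
print(f"{{A, B}} ({raw}/{transformed})")  # {A, B} (75/100)
```

The sketch assumes that both images are already split into the same number of parts in the same order; finding that decomposition is the hard part and is not covered here.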

2. Content-interpretation.

The essence of the approach is to interpret the components of multimedia data as concepts in a formal language and to build relationships between these concepts. The main advantage of this approach is that the system can learn. Note that composition and decomposition can be repeated an unlimited number of times; the same applies to the number of recursions and/or iterations in the analysis process.
Consider the simplest case.
To illustrate, let's expand example 2 with the concept of “tags” and build a general table of all the information we have.
Example 3

Next, using the concepts of weight, probability and statistics, together with predicate and Boolean logic, we describe the operation of the system in terms of M-Lang. M-Lang is a language developed by the author for describing algorithms, rules and specifications for the recognition, analysis and interpretation of graphical information.
The basic rules and constructions of M-Lang:

{O1, ..., On} - a joint set of objects.
{O1, ..., On} (K) - the objects are jointly K percent identical.
O1 [(N) “word”] - the object O1 has the tag “word”, and the weight of the object in that tag is N.
O1 = {P1, P2, P3} - the object O1 consists of the objects P1, P2, P3.
{O1 | ... | On} - the objects belong to the set.
Operators and transformations of Boolean logic and predicate logic are also admissible in the language.
The rules of fuzzy logic may be used as well.

It should be clarified that an object is an entity that is separate and unambiguous at the current level of abstraction, for example an image, a part of an image, or a statement describing something. An object consists of other objects and is itself part of another object. An image can be decomposed to any level of detail, but it is rational to continue only until ambiguous sets of objects stop appearing.
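One possible way to hold these constructions in a program is sketched below in Python. The `MObject` class and the rendering helpers are hypothetical names introduced only for this illustration, they are not part of M-Lang itself, and straight quotes are used in the rendered strings for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class MObject:
    """An object that may consist of parts and carry weighted tags."""
    name: str
    parts: list["MObject"] = field(default_factory=list)  # O1 = {P1, P2, P3}
    tags: dict[str, int] = field(default_factory=dict)    # tag -> weight N

    def render_composition(self) -> str:
        """Render the 'consists of' construction."""
        names = ", ".join(part.name for part in self.parts)
        return f"{self.name} = {{{names}}}"

    def render_tags(self) -> str:
        """Render the weighted-tag construction, one bracket per tag."""
        tags = " ".join(f'[({w}) "{t}"]' for t, w in self.tags.items())
        return f"{self.name} {tags}".strip()


def render_similarity(objects, raw: int, transformed: int) -> str:
    """Render the 'jointly identical' construction in the
    (without/with transformation) form used in the examples."""
    names = ", ".join(obj.name for obj in objects)
    return f"{{{names}}} ({raw}/{transformed})"


parts = [MObject("P1"), MObject("P2"), MObject("P3")]
o1 = MObject("O1", parts=parts, tags={"word": 7})

print(o1.render_composition())              # O1 = {P1, P2, P3}
print(o1.render_tags())                     # O1 [(7) "word"]
print(render_similarity(parts[:2], 75, 100))  # {P1, P2} (75/100)
```

Because parts are themselves `MObject` instances, the structure nests to any depth, which matches the idea that an object consists of other objects.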
Consider an example that describes the image recognition process, based on the figure from Example 3.

Formula	Interpretation
{A, B, C} (75/75)	Objects A, B and C are jointly 75% identical without taking the transformation into account and 75% identical taking it into account.
{A, B} (75/100)	Objects A and B are jointly 75% identical without taking the transformation into account and 100% identical taking it into account.
{A [“smile”], C [“smile”]} (75/75)	Object A has an explicit “smile” tag, object C has an explicit “smile” tag, and objects A and C are jointly 75% identical without taking the transformation into account and 75% identical taking it into account.
A = {a1 (50%); a2 (25%); a3 (25%)} or {a1 (50%); a2 (25%); a3 (25%)} = A	Object A consists of parts a1, a2 and a3 in the proportion 2:1:1, or, equivalently, parts a1, a2 and a3 in the proportion 2:1:1 make up object A.
{a1 (50%); a2 (25%); a3 (25%)} [“smile”] -> T0: ({a1} [(50) “smile”] | {a2} [(25) “smile”] | {a3} [(25) “smile”])	Parts a1, a2 and a3, combined in the proportion 2:1:1, have an explicit “smile” tag; hence statement T0: part a1 has weight 50 in the “smile” tag, a2 has weight 25 in the “smile” tag, and a3 has weight 25 in the “smile” tag.
{b1 (50%); b2 (25%); b3 (25%)} [“sadness”] -> T1: ({b1} [(50) “sadness”] | {b2} [(25) “sadness”] | {b3} [(25) “sadness”])	Parts b1, b2 and b3, combined in the proportion 2:1:1, have an explicit “sadness” tag; hence statement T1: part b1 has weight 50 in the “sadness” tag, b2 has weight 25 in the “sadness” tag, and b3 has weight 25 in the “sadness” tag.
{c1 (50%); c2 (25%); c3 (25%)} [“smile”] -> T2: ({c1} [(50) “smile”] | {c2} [(25) “smile”] | {c3} [(25) “smile”])	Parts c1, c2 and c3, combined in the proportion 2:1:1, have an explicit “smile” tag; hence statement T2: part c1 has weight 50 in the “smile” tag, c2 has weight 25 in the “smile” tag, and c3 has weight 25 in the “smile” tag.
T3: {a1, b1, c1} (100/100) T4: {a2, b2, c2} (100/100)	Statement T3: parts a1, b1 and c1 are 100% identical without taking the transformation into account and 100% identical taking it into account. Statement T4: parts a2, b2 and c2 are 100% identical without taking the transformation into account and 100% identical taking it into account.
T0 v T2 v T3 v T4 -> T5: ({a1 | b1 | c1} [(100) “smile”]) | ({a2 | b2 | c2} [(50) “smile”]) | ({a3 | b3 | c3} [(25) “smile”])	From the union of statements T0, T2, T3 and T4, statement T5 follows: a1, b1 and c1 have weight 100 in the “smile” tag, a2, b2 and c2 have weight 50 in the “smile” tag, and a3, b3 and c3 have weight 25 in the “smile” tag.
T5 v T3 v T4 -> b1 [(100) “smile”] v b2 [(50) “smile”] -> B [(100) “sadness”] [(150) “smile”]	From the combination of statements T5, T3 and T4 it follows that parts b1 and b2 have weights 100 and 50 in the “smile” tag, respectively; therefore object B has an explicit “sadness” tag and an implicit “smile” tag with weight 150.

Thus, object B was assigned the “smile” tag on the basis of comparing and interpreting the content.
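The arithmetic behind this derivation can be checked with a short Python sketch. The data layout and the function names are assumptions made for illustration; only the groups of parts declared identical in T3 and T4 pool their weights, and the third parts are treated as unmatched, which is the reading that reproduces the weight of 150 from the table.

```python
# Each part maps to (owner object, share of the owner in percent).
PARTS = {
    "a1": ("A", 50), "a2": ("A", 25), "a3": ("A", 25),
    "b1": ("B", 50), "b2": ("B", 25), "b3": ("B", 25),
    "c1": ("C", 50), "c2": ("C", 25), "c3": ("C", 25),
}

# Explicit tags of the whole objects.
EXPLICIT_TAGS = {"A": "smile", "B": "sadness", "C": "smile"}

# Groups of parts declared 100% identical (statements T3 and T4).
IDENTICAL = [("a1", "b1", "c1"), ("a2", "b2", "c2")]


def part_weight(part: str, tag: str) -> int:
    """Weight a part inherits from its owner's explicit tag (T0..T2)."""
    owner, share = PARTS[part]
    return share if EXPLICIT_TAGS[owner] == tag else 0


def group_weights(tag: str) -> list[int]:
    """Pooled weight of each identical group in the tag (as in T5)."""
    return [sum(part_weight(p, tag) for p in group) for group in IDENTICAL]


def implicit_weight(obj: str, tag: str) -> int:
    """Sum the pooled weights of every group that contains a part of obj."""
    total = 0
    for group, weight in zip(IDENTICAL, group_weights(tag)):
        if any(PARTS[p][0] == obj for p in group):
            total += weight
    return total


print(group_weights("smile"))  # [100, 50]
smile_b = implicit_weight("B", "smile")
print(f'B [(100) "sadness"] [({smile_b}) "smile"]')
# B [(100) "sadness"] [(150) "smile"]
```

The pooled weights are simple sums and are not normalized, which is why the implicit weight can exceed 100, just as it does in the table.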
This example is trivial, and the M-lang grammar is simplified. For example, it does not take into account the position of parts in space, color, quality, format, codecs or the compression mechanism of the input file, or, for video files, duration. Analyzing audio streams requires a separate language specification. Nevertheless, the author is confident that further development will turn M-lang into a powerful tool for modeling and creating rules for the analysis and interpretation of multimedia information.
The main advantages of M-lang:

The grammar of the language is simple.
It uses a wide range of universal concepts such as probabilities, weights and fuzzy logic.
The language is easy to translate into both formal and algorithmic languages.
The language can be used both to develop a complete algorithm or general rules and to verify existing systems, including those developed without an M-lang description.
It is easy to understand.

Conclusion

Because there are no standards and hardly any established open-source technologies in this area, there is an urgent need for a model for developing and describing algorithms that analyze and interpret multimedia information. M-lang, the language developed by the author, can serve as such a model: it uses elements of algorithmic and functional programming languages, methods of probability theory and mathematical statistics, Boolean algebra and predicate logic to describe rules and algorithms. Its main advantage is that it translates easily into both algorithmic and natural languages.

Source: https://habr.com/ru/post/112377/
