One Approach to intellectual image analysis

This study investigates a method of semantic image analysis that uses a set of neuron-like detectors of foreground objects. The method is intended to find different types of foreground objects and to determine their properties. The result of the semantic analysis is a semantic descriptor of the image: the set of foreground objects in the image together with a set of properties for each object. The distance between images is defined as the distance between their semantic descriptors. Using this concept of distance, the "semantic similarity" between images or videos is defined.

are used, making two-level images (binarization), or N-level images [1][2]. The artificial features are histograms of brightness distribution, spectra of spatial frequencies, and the results of processing by Haar cascades [3]. This feature set then goes to the symbolic representation block, which forms symbols from the features. For example, contour points are grouped into line segments or closed curves, elements with the same brightness are grouped together, and so on. The description of these objects already has a semantic representation.
A coherent semantic description of the object is formed from these semantic characteristics, which allows the detected objects to be categorized and identified. The information obtained is stored in a general database and can be used for further search. The proximity between images is then compared on the basis of these semantic descriptions.
But this approach raises a number of issues that make the procedure very difficult and sometimes essentially impossible. There are changes in brightness, scale, angle of view and the angle of the registered image. Serious limitations also appear when the camera moves relative to the scene objects. In this case the so-called semantic gap arises: the absence of adequate matching between the low-level descriptive features of the graphical object (color, brightness, relative positions and sizes of fragments, etc.) and its semantic description. This is presented schematically in Figure 1.

Key idea
In this paper we consider one possible approach to constructing semantic descriptions of images of registered scenes, for subsequent use in the effective analysis of graphic images and video scenes.

Video data signatures
The basic idea is to use a set of intelligent video detectors, which makes it possible to detect and identify particular classes of objects at the stage of primary video recording. An example of such a video detector is a video face detector, currently widely used in various applications: smartphones, authentication systems, etc. If we assume that a VMD has been set up for each unique object (a person, object, figure, angle, shape/s, action, movement, etc.), then the descriptors of individual objects and their relative positions would accurately reflect the specific scene.
Often, to help find the desired graphic information in large databases of images and videos, a formal external description of the data, called metadata, is used. Signatures of images and videos can in a sense also be considered metadata, since they are a formal description that is external to the source image information. Video data signatures, in the form of metadata, depend only on the algorithm for generating signatures and on the data.
In this work, we propose to overcome the semantic gap between the mathematical and semantic descriptions of video data using the technique of "semantic analysis of video information".
Applied to static images, semantic analysis consists of labeling certain areas, or the whole image, with units of a natural language; applied to video, it consists of labeling dynamic scenes with units of a natural language.
A tool that can automatically label a certain area of the image is a detector of objects of a specified type, together with a tracker that follows the found objects from frame to frame [3,4]. The nature of the detector can vary: a neural network, a cascade, color or contour analysis, texture analysis and so on. The same applies to the nature of the tracker. An important property of the detector is the spatial localization of the object in the image with which it is associated; for the tracker, the main property is the localization of the object in time. The detector indicates the presence of something described by the noun with which it is associated.
Analyzing the properties of the regions localized by the detectors significantly expands the functionality of the proposed system.
The properties of the image area found by the detector are represented by detector attributes [5], each of which describes some characteristic of the area by an adjective. Thus, we have a relationship between the noun and its properties on the one hand, and the detector, the detector attributes and the image region on the other. This relationship is the basis of semantic analysis of the static image. The localization of the found object in time, obtained with a tracker, can be described by a verb of a natural language.
The bank of detectors, in which several detectors analyze the image at the same time, corresponds to the controlled vocabulary of the semantic image analysis system. The size of the vocabulary determines the capabilities of the semantic analysis: the larger the vocabulary, the more precise and complete the analysis.
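The relationship between detectors, nouns and adjectives described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Detector`, `Detection`, `run_bank` and `vocabulary`, the `(x, y, width, height)` region format, and the stub "face" detector with a "smiling" attribute are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set, Tuple

# Hypothetical region format: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Detector:
    """One entry of the bank: a noun plus optional adjective tests."""
    noun: str
    find: Callable[[Any], List[Box]]  # image -> regions where the noun is present
    attributes: Dict[str, Callable[[Any, Box], bool]] = field(default_factory=dict)


@dataclass
class Detection:
    """A labeled region: the noun, where it is, and which adjectives hold."""
    noun: str
    box: Box
    adjectives: Dict[str, bool]


def run_bank(image: Any, bank: List[Detector]) -> List[Detection]:
    """Apply every detector in the bank to the same image."""
    found = []
    for detector in bank:
        for box in detector.find(image):
            adjectives = {name: test(image, box)
                          for name, test in detector.attributes.items()}
            found.append(Detection(detector.noun, box, adjectives))
    return found


def vocabulary(bank: List[Detector]) -> Set[str]:
    """The controlled vocabulary: all nouns and adjectives the bank can emit."""
    words: Set[str] = set()
    for detector in bank:
        words.add(detector.noun)
        words.update(detector.attributes)
    return words


# Stub detector standing in for a real face detector (illustration only).
face = Detector(
    noun="face",
    find=lambda image: [(10, 20, 64, 64)],
    attributes={"smiling": lambda image, box: True},
)
detections = run_bank(image=None, bank=[face])
```

In this sketch, adding a detector or an attribute test to the bank directly enlarges the vocabulary, which mirrors the statement that a larger dictionary gives a more complete analysis.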

The primary result
The primary result of the semantic analysis of an image is a table (Table 1), which is called the semantic signature of the image. Figure 2 shows an example of how a semantic signature is formed.

Fig. 2. The formation of the semantic signature of an image using a bank of object video detectors and attribute detectors.

Distance between the images
As shown above, images are described by so-called "semantic signatures", which can be represented in the form of a table. The semantic description of the image is a set $V = \{\vec{v}_1, \dots, \vec{v}_n\}$, in which every element $\vec{v}_i = \{x_1^i, \dots, x_k^i\}$, $i = 1, \dots, n$, describes the properties of a specific fragment of the image. The elements of these vectors describe the properties of the fragment obtained using the available object detectors and attribute detectors. To build the system, we need to define the distance between semantic descriptions of images; an example of such a description is presented in Figure 2. To solve the problem, we introduce the following definitions:
- Recoding. This encoding is used to represent each element of the vector $\vec{v} = \{x_1, \dots, x_k\}$ in the form of a number.
- The distance between two vectors $\vec{v} = \{x_1, \dots, x_k\}$, each of which describes some fragment of an image.
- The distance between the sets $V = \{\vec{v}_1, \dots, \vec{v}_n\}$, each of which describes an entire image and is its semantic description.

Recoding and comparison of codes
This encoding is used to represent each element of the vector $\vec{v} = \{x_1, \dots, x_k\}$ in the form of a code number corresponding to a particular property of the found object. Consider the elements of the vector in more detail. In general, they correspond to the columns of Table 1, in particular:
- Object type
- Object size
- Object location
- Object angle
- Object attributes
The recoding of an object type is presented in Table 2. The recoding of an object scale is presented in Table 3. The recoding of an object location is presented in Table 4. The recoding of an object angle is presented in Table 5. The recoding of an attribute is presented in Table 6.
Table 4. The recoding of an object location.

Object location
Table 6. The recoding of an attribute.

| Attribute availability | Code |
|---|---|
| It is impossible to determine attribute | -2 |
| Attribute is absent | -1 |
| Attribute is present | 1 |
| Attribute is not determined | 0 |

To compare corresponding codes that describe image fragments, we use the binary function (1):
$$ f(k_i, m_i) = \begin{cases} 0, & k_i = m_i \\ 1, & \text{otherwise} \end{cases} \qquad (1) $$
where $k_i$ and $m_i$ are corresponding codes from the semantic descriptions of image K and image M. Note that if the attribute cannot be computed, the function returns 1.
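As a minimal sketch, the attribute codes of Table 6 and a binary comparison of two codes could be written as follows. The names `AttributeCode` and `compare_codes` are hypothetical. The sketch also assumes that "cannot be computed" refers to code -2 on either side; the text does not state this explicitly.

```python
from enum import IntEnum


class AttributeCode(IntEnum):
    """Attribute codes as listed in Table 6."""
    IMPOSSIBLE_TO_DETERMINE = -2
    ABSENT = -1
    NOT_DETERMINED = 0
    PRESENT = 1


def compare_codes(k_i: int, m_i: int) -> int:
    """Binary comparison of two codes: 0 if they agree, 1 otherwise.

    Assumption: "cannot be computed" means code -2 on either side,
    in which case the codes are always treated as different.
    """
    impossible = AttributeCode.IMPOSSIBLE_TO_DETERMINE
    if k_i == impossible or m_i == impossible:
        return 1
    return 0 if k_i == m_i else 1
```

Under this reading, two fragments whose attribute could not be determined never count as matching on that attribute, even though their codes are equal.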

The distance between the fragments of images
As we already know, an image fragment is described by a vector $\vec{v}$, where each element is a code describing some property of this fragment (Tables 2-6). Being able to compare the elements of this vector using (1), we can introduce a distance function between the vectors $k$ and $m$, each of which describes a rectangular fragment of an image:
$$ d(k, m) = \sum_{i=1}^{N} f(k_i, m_i) \qquad (2) $$
where $f$ is the binary comparison function (1). According to (2), the distance between fragments that are identical in terms of their semantic descriptions is zero, and the maximum distance between fragments is N, the number of elements in the vector that describes any fragment of an image.
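The fragment distance can be illustrated with a short Python sketch that counts the positions at which two code vectors disagree. The function names and the example code values are hypothetical; `mismatch` is a simplified stand-in for the binary function (1) and ignores any special handling of undeterminable attributes.

```python
from typing import Callable, Sequence


def mismatch(k_i: int, m_i: int) -> int:
    """Stand-in for the binary function (1): 0 for equal codes, 1 otherwise."""
    return 0 if k_i == m_i else 1


def fragment_distance(
    k: Sequence[int],
    m: Sequence[int],
    compare: Callable[[int, int], int] = mismatch,
) -> int:
    """Sum the element-wise comparisons of two fragment vectors."""
    if len(k) != len(m):
        raise ValueError("fragment vectors must have the same length")
    return sum(compare(k_i, m_i) for k_i, m_i in zip(k, m))


# Hypothetical code vectors: (type, size, location, angle, attribute).
k = (3, 1, 5, 0, 1)
m = (3, 2, 5, 0, -1)
print(fragment_distance(k, m))  # 2: the size and attribute codes differ
```

The result ranges from 0 for identical vectors to the vector length N when every code differs, which matches the bounds stated in the text.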

A measure of the closeness between semantic descriptions of images
To establish the degree of "similarity" of images in terms of their semantic descriptions, we need a measure of closeness between them which, as in the case of image fragments, gives 0 for identical descriptions and a number > 0 for all others.
The Hausdorff measure. The Hausdorff distance (HD) [6] is a metric between two sets. As an example, consider two sets of points in the plane, $A = \{a_1, \dots, a_p\}$ and $B = \{b_1, \dots, b_q\}$. The Hausdorff distance between these two sets of points is defined as:
$$ H(A, B) = \max\bigl(h(A, B),\, h(B, A)\bigr) \qquad (3) $$
$$ h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert \qquad (4) $$
Here $h(A, B)$ is called the "directed Hausdorff distance" from set A to set B.
A modified Hausdorff measure. For many applications, including our task of measuring the closeness between semantic descriptions of images, it is possible to apply the so-called "modified Hausdorff measure" (5). It differs from the Hausdorff distance (3) in that the directed distance uses the average rather than the maximum distance from set A to set B:
$$ h(A, B) = \frac{1}{|A|} \sum_{a \in A} \min_{b \in B} \lVert a - b \rVert \qquad (5) $$
where $|A|$ is the number of elements in A. Substituting (2), which uses (1), into (5), we get expression (6), which is used in the proximity measure (3) between the semantic descriptions of images and can be used to assess the degree of similarity of two images:
$$ h(A, B) = \frac{1}{|A|} \sum_{a \in A} \min_{b \in B} \sum_{i=1}^{N} f(a_i, b_i) \qquad (6) $$
where $a_i$ and $b_i$ are the i-th codes of the vectors $a$ and $b$ describing fragments of two different images, and $f$ is the binary comparison function (1).
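A minimal sketch of this measure, assuming each image description is a non-empty list of equal-length code vectors, might look like the following. The function names and the example descriptions are hypothetical, and the fragment distance is a plain count of differing codes.

```python
from typing import Sequence

Fragment = Sequence[int]


def fragment_distance(a: Fragment, b: Fragment) -> int:
    """Number of positions where the codes differ (vectors assumed equal length)."""
    return sum(1 for a_i, b_i in zip(a, b) if a_i != b_i)


def directed_distance(A: Sequence[Fragment], B: Sequence[Fragment]) -> float:
    """Average, over fragments of A, of the distance to the closest fragment of B."""
    if not A or not B:
        raise ValueError("both descriptions must contain at least one fragment")
    return sum(min(fragment_distance(a, b) for b in B) for a in A) / len(A)


def image_distance(A: Sequence[Fragment], B: Sequence[Fragment]) -> float:
    """Symmetric measure: the larger of the two directed distances."""
    return max(directed_distance(A, B), directed_distance(B, A))


# Hypothetical descriptions with three codes per fragment.
image_k = [(1, 0, 1)]
image_m = [(1, 0, 1), (0, 0, 0)]

print(directed_distance(image_k, image_m))  # 0.0: the only fragment of K has an exact match in M
print(directed_distance(image_m, image_k))  # 1.0: average of 0 and 2
print(image_distance(image_k, image_m))     # 1.0
```

The example shows why both directions are needed: every fragment of the first image is found in the second, but the second image contains an extra fragment, and only the reverse directed distance reflects it.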

Fig. 1. The semantic gap in the analysis of graphic images.

Table 1. Semantic signature of an image.

Table 2. The recoding of an object type.

Table 3. The recoding of an object scale.

Table 5. The recoding of an object angle.