Processing Archive Information in Digital University

The article discusses methods of processing and storing the data archive used in the digital university and identifies their disadvantages. As a result, a fundamentally new method of processing and storing the information archive in a database with a constantly changing schema is proposed. This method uses mivar technologies. A multidimensional space structure has been developed to store the data archive; this space describes the temporal relational model. For processing the data archive, a scheme for selecting a subspace and converting it into relations is proposed. A method of transforming relational databases into the multidimensional mivar space for efficient execution of operations on temporal data with changing structure is also proposed. The transition to a multidimensional space allows us to describe the process of changing temporal data and their structure in a unified way. As a result, the time required to adapt the database schema and the redundancy of information storage are reduced. The results of this work are used in the human resource management database of BMSTU.


Introduction
The digital university is a complex, heterogeneous and constantly changing system that must consolidate and integrate a huge number of subsystems and modules into a single information space [1]. Most of the data stored in the digital university subsystems are temporal, i.e. they are relevant for a certain period of time [2,3,4]. Such data include passport data, the state of education and the educational program of students, information about teachers and their individual plans, etc. During the functioning and maintenance of the digital university, an archive is accumulated: new data is constantly entered into the system, and old data becomes out of date but is not deleted from the system. Because legislation and the requirements for the subsystems of the digital university are constantly changing, already implemented business processes and databases have to be modernized [5,6,7].
Currently, the relational model and its modifications are the most popular and widespread in databases [8,9,10]. The known temporal data models that extend the relational model, considered in [11,12,13,14,15,16], have a number of disadvantages that constrain their application in practice. Using these models in systems whose data structure changes over time increases the redundancy of information storage and makes the basic relational model grow. This complicates composing and executing queries and adapting the database to new tasks. The process of adapting a database with temporal data is shown in fig. 1.
Currently, work with the data archive accumulated in previous systems can be implemented in 2 ways:
1. Transfer the archive from several databases (DB) into one, the latest. In fig. 1, data from database1 at time T1 is moved to database1 at time T2, when the database schema changed. This approach damages data integrity because the database schemas differ. Data transfer also usually takes a long time to implement.
2. Increase the number of databases in the system, or increase the number of relations in the database schema when it is changed. This complicates composing and executing queries on temporal data, because the data have to be joined from different databases or from different relations.
Currently, there is a new mivar approach [17,18,19] to the description of changeable, dynamic subject areas, which makes it possible to eliminate data storage redundancy. The mivar approach is used for a class of learning systems whose task is to study and model complex dynamic subject areas. The basis of the mivar approach to data presentation is an integral, unified description of the subject area from different points of view through a multidimensional space. However, this approach cannot be applied directly to a temporal relational database, because a mechanism for processing the multidimensional representation of changing relations must first be implemented [20].
The use of mivar technologies for storage and processing of the information archive makes it possible to eliminate the above disadvantages. The development of a multidimensional model for storing and processing the temporal data archive is therefore a relevant task. The use of mivar technologies will extend the system life cycle by adapting the database of an information system that functions in a dynamically changing subject area with temporal data.

Storage of temporal data with variable structure
Consider a relation with temporal tuples. In this case, the relation consists of tuples that define the states of the domain objects. The dependency graph of the size of a relation with temporal tuples is shown in fig. 2. In the graph, the $n$ axis is the number of tuples in the relation. If the relation schema is unchanged, the size of the relation grows linearly with the number of tuples, $V_{\text{rel}} = kn$, where $k$ is the number of attributes in the relation schema. Here, changing the relation schema consists only in adding new attributes, so that the previously accumulated history is kept. Therefore, the relation size increases in steps, as space for the new attributes is allocated in the existing tuples. Changes of the relation schema are shown in the graph at $n = n_1$ and $n = n_2$. When the relation schema is changed at $n = n_1$, the relation size increases from $V_1$ to $V_2$, and the slope of the graph changes. The same happens at $n = n_2$.
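The growth pattern described above can be written in piecewise form. This is a sketch consistent with the description, not necessarily the article's own formula. It assumes that $V_{\text{rel}}$ denotes the relation size, that $k_i$ is the number of attributes after the $i$-th schema change (with $k_0$ the initial number), that $n_0 = 0$, and that every tuple occupies space for all current attributes.

```latex
V_{\text{rel}}(n) = k_i \, n, \qquad n_i < n \le n_{i+1}, \quad i = 0, 1, 2, \dots, \qquad k_0 < k_1 < k_2 < \dots
```

Under these assumptions the jump at $n = n_1$ is $V_2 - V_1 = (k_1 - k_0)\, n_1$, and the slope of the graph changes from $k_0$ to $k_1$.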
Mathematically, the dependence of the size of a relation with a changing schema on the number of temporal tuples is presented in formula 1.
Consider the structure of the mivar space for consolidation of relational databases. A relational data model is a set of normalized relations to which relational algebra operations apply. Each relation includes a set of attributes and a set of records that are identified by the relation key. Thus, to describe a relational data model in a mivar space, three axes must be introduced: the relation axis, the relation attribute axis, and the relation record identifier axis. A time axis is added to determine the state of the relational model.
Thus, the structure of the mivar space for consolidation of relational databases consists of four main axes:
V, the set of relations of the relational model;
the set of attributes of the relations;
the set of record identifiers of the relations;
T, the set of state change times of the relational database.
The multidimensional space is then built on these axes. The relation, attribute and record identifier subspace defines the state of the relational model that depends on the other axes. For each axis, a set of elements from the original relational model is generated. The Cartesian product of these sets generates a multidimensional space for a temporal relational data model.
In the multidimensional space, each attribute value of a tuple of a relation corresponds to a point with certain coordinates. The point corresponding to a selected attribute value is shown in fig. 3. The set of all points in the multidimensional space corresponds to the relational data model. In the mivar approach, the space points that store the corresponding attribute values of the relational model define the data structure. Thus, the data structure is determined by the data that is stored in the mivar space.
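The correspondence between attribute values and points can be illustrated with a minimal Python sketch. The axis contents, the relation name `employee` and the sample values are hypothetical and serve only as an illustration; the article does not prescribe a concrete storage format.

```python
from itertools import product

# Hypothetical axis contents for a tiny personnel database.
relations = ["employee"]
attributes = ["name", "position"]
record_ids = [1, 2]
times = ["T1", "T2"]

# The Cartesian product of the axis sets gives every possible coordinate.
space = list(product(relations, attributes, record_ids, times))

# Only coordinates that actually hold a value are stored as points:
# (relation, attribute, record id, time) -> attribute value.
points = {
    ("employee", "name", 1, "T1"): "Ivanov",
    ("employee", "position", 1, "T1"): "assistant",
    ("employee", "position", 1, "T2"): "lecturer",
}

# Every stored point lies inside the space spanned by the four axes.
assert all(coord in space for coord in points)
print(len(space), "possible coordinates,", len(points), "stored points")
```

In this sketch the set of stored keys itself describes the structure: an attribute exists for a record at a given time exactly when a point with those coordinates is present.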

Multidimensional space of relational model
[Figure: Changing the value of an attribute in multidimensional space]
The structure of relation $v_i$ and the mivar description of this relation are presented in fig. 2. When designing a relational model, the sets that describe the structure of the model (relations, attributes, tuples) depend on each other. When an element of a higher-level set is destroyed, all dependent sets are destroyed as well. When constructing a multidimensional data representation space, all sets are independent of each other, which creates new opportunities to change the database schema in the relational model and preconditions for the development of new methods of processing data with a variable structure.
In relational databases, the uniformity of filling and storing the data of all relation attributes leads to the fact that if no values are specified, then the database still stores a record (empty) of fixed length. For many domains, this leads to an unjustified waste of computational resources. In the multidimensional space, only the necessary attribute values are stored.
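The difference in storage can be illustrated by counting stored cells in both representations. The sketch below uses a hypothetical `staff` relation with mostly empty optional attributes and omits the time axis for brevity. It compares cell counts only and is not a measurement of any real DBMS.

```python
# Hypothetical relation in which most optional attributes are empty.
schema = ["name", "degree", "title", "phone"]
rows = [
    {"id": 1, "name": "Ivanov", "degree": "PhD", "title": None, "phone": None},
    {"id": 2, "name": "Petrov", "degree": None, "title": None, "phone": None},
]

# Relational storage reserves a cell for every attribute of every tuple.
relational_cells = len(rows) * len(schema)

# Point storage keeps only the attribute values that are actually set.
points = {
    ("staff", attr, row["id"]): row[attr]
    for row in rows
    for attr in schema
    if row[attr] is not None
}

print(relational_cells, "cells in the relation,", len(points), "stored points")
```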
Thus, the mivar representation of a relation is more general than the relational one and offers more opportunities for representing and changing both the data and their structure. The mivar space allows you to keep data from previous databases without changing their structure, which makes it possible to keep using existing SQL queries.

Transform a relational database into the multidimensional space
The transformation of the original relational model into the multidimensional space is performed to produce the initial state of the multidimensional space presented in section 3.
To perform this operation, we introduce a transformation operator φ from a relational data model into the multidimensional space describing a temporal relational model. Its definition uses the value of attribute $R_i.A_j$ in tuple $n$ of relation $r_i$, where $P_i$ is the number of attributes in relation scheme $R_i$.
The transition from a relational data model to a multidimensional space with the help of the introduced transformation φ allows us to describe the process of changing the model. The model changes by adding new points to the multidimensional space, and the introduced changes do not affect the original relational model. The coordinates of the points define the data structure. As a result, changing the data structure and changing the data in relations are performed in the same way in the multidimensional space. This representation of the relational model allows you to keep a change history for each attribute in the relations separately, which minimizes the total number of relations in the temporal relational model.
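The idea of the transformation can be sketched in Python as follows. This is an illustration under stated assumptions, not the article's formal definition of φ: the function name `phi`, the dictionary-based database layout, the key attribute `id` and the sample values are all hypothetical.

```python
def phi(database, t):
    """Map every attribute value of every tuple to a point at time t."""
    points = {}
    for rel_name, rows in database.items():
        for row in rows:
            rec_id = row["id"]  # assumed key attribute of each relation
            for attr, value in row.items():
                if attr != "id" and value is not None:
                    points[(rel_name, attr, rec_id, t)] = value
    return points


# Hypothetical source database: relation name -> list of tuples.
database = {"student": [{"id": 1, "name": "Sidorov", "group": "IU5-11"}]}

# Initial state of the multidimensional space at time 0.
space = phi(database, t=0)

# Later changes only add points; the source database is left untouched.
space[("student", "group", 1, 1)] = "IU5-21"         # data change
space[("student", "email", 1, 1)] = "s@example.org"  # new attribute (structure change)
```

Both a changed value and a new attribute are recorded by the same operation, adding a point, which is what makes the description of data and structure changes uniform in this sketch.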

Inverse transformation of point values from the multidimensional space to the relational data model
The transformation operator α obtains the relevant state of the relational data model from a certain set of point values in the multidimensional space. In the final, fourth step of this transformation, the relational data model is generated from the obtained schemes and sets of relation tuples R D.
Thus, with the help of the introduced α-transformation, certain states of the relational model can be obtained from the multidimensional representation of the temporal relational model for subsequent processing by standard SQL queries. This allows you to apply the queries that already exist in the information system to the relational database after it has been moved to a multidimensional space.
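The following Python sketch illustrates the idea of rebuilding a relation state and querying it with ordinary SQL. It is not the article's definition of α. It assumes that the state at time t is formed by the most recent value of each attribute with a time stamp not later than t; the function name `alpha`, the point layout and the sample data are hypothetical. The standard `sqlite3` module stands in for the target relational DBMS.

```python
import sqlite3


def alpha(points, relation, t):
    """Rebuild the schema and tuples of one relation as of time t."""
    latest = {}  # (record id, attribute) -> (time, value)
    for (rel, attr, rec_id, time), value in points.items():
        if rel == relation and time <= t:
            key = (rec_id, attr)
            if key not in latest or time > latest[key][0]:
                latest[key] = (time, value)
    attrs = sorted({attr for _, attr in latest})
    rows = {}
    for (rec_id, attr), (_, value) in latest.items():
        rows.setdefault(rec_id, {})[attr] = value
    tuples = [(rid, *(row.get(a) for a in attrs)) for rid, row in sorted(rows.items())]
    return attrs, tuples


# Hypothetical points: (relation, attribute, record id, time) -> value.
points = {
    ("student", "name", 1, 0): "Sidorov",
    ("student", "group", 1, 0): "IU5-11",
    ("student", "group", 1, 1): "IU5-21",
}

# Materialize the state at time 0 and run an unchanged SQL query on it.
attrs, tuples = alpha(points, "student", t=0)
conn = sqlite3.connect(":memory:")
cols = ", ".join(f'"{a}"' for a in attrs)
conn.execute(f'CREATE TABLE "student" (id INTEGER, {cols})')
marks = ", ".join("?" * (len(attrs) + 1))
conn.executemany(f'INSERT INTO "student" VALUES ({marks})', tuples)
result = conn.execute(
    'SELECT "group" FROM student WHERE name = ?', ("Sidorov",)
).fetchall()
```

Calling `alpha(points, "student", t=1)` instead yields the later state of the same relation, so the same SQL query can be run against any archived moment.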

Work with multidimensional space for processing the archive information
Working with a multidimensional space for processing the information archive consists of 3 steps.
Step 1. Transforming the original relational data model into a multidimensional space. The first step in processing the information archive is the transformation of the source database into the multidimensional representation proposed in section 3. The source database is broken down into sets of elements (relations, attributes, attribute values, and record identifiers). Based on the elements of these sets, a multidimensional representation of the relational database is generated.
Step 2. Changing the multidimensional space. Changes can be of 2 types:
1. "Analysis and formalization of changes in objects of the subject area". The result of this function is a change in the multidimensional space: either the structure of the space changes (new axes are added), or the sets that define the axes of the space change (new elements are added to these sets).
2. "Entering new data and changing data". The result of this function is the storage of the values of new points in the multidimensional space.
The process of changing the domain model is iterative (fig. 4), i.e. each next state of the domain model depends on the previous one: $C_i = \delta(C_{i-1})$, where $\delta$ is the function that specifies the change of the current model state relative to the previous one (the set of point values with coordinates that appeared in the space at time $t_i$).
Step 3. Executing queries against the multidimensional representation. Operations with point coordinates select the required subspace. Then the selected subspace is transformed into domain relations. Relational algebra operations are applied to the obtained relations to determine the result of the query (fig. 5).
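The iterative change of the model and the selection of a subspace by coordinates can be sketched as follows. This is an illustration only: the two-argument form of `delta`, the helper `select_subspace`, the point layout and the sample data are assumptions and are not taken from the article.

```python
from functools import reduce


def delta(state, new_points):
    """C_i from C_(i-1): the previous state plus the points that appeared at t_i."""
    return {**state, **new_points}


def select_subspace(state, relation=None, attribute=None, time=None):
    """Keep the points whose coordinates match the given axis values."""
    return {
        (rel, attr, rec, t): value
        for (rel, attr, rec, t), value in state.items()
        if (relation is None or rel == relation)
        and (attribute is None or attr == attribute)
        and (time is None or t == time)
    }


# Hypothetical sets of points that appeared at times 0, 1 and 2.
changes = [
    {("teacher", "name", 7, 0): "Orlova", ("teacher", "load", 7, 0): 600},
    {("teacher", "load", 7, 1): 720},        # changed data
    {("teacher", "rank", 7, 2): "docent"},   # new element on the attribute axis
]

# Apply the changes one after another, starting from an empty space.
C = reduce(delta, changes, {})

# Select the history of one attribute as a subspace.
load_history = select_subspace(C, relation="teacher", attribute="load")
```

The selected subspace would then be converted into relations before relational algebra operations are applied to it.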
Existing SQL queries to relational databases that are consolidated within the mivar space and contain the data archive from previous systems are used together with additional operators that convert a specific area of the mivar space into the relevant state of the relations. Thus, processing the mivar representation of the data archive includes a single initial transformation from the relational data model to the multidimensional space, followed by work with this multidimensional model: input of new data, modification of data structures, and execution of queries against the multidimensional representation of the temporal relational model with the data archive.

Conclusion
In the article, the structure of the space for the temporal relational data model was discussed. It stores archive information and consists of 4 main axes: the axis of relations, the axis of attributes, the axis of tuple identifiers, and the time axis. To work with such a multidimensional space, you need to select a part of the space and convert it into relations, against which you can run SQL queries to get the necessary data.
As a result, it became necessary to extend the standard SQL statements with new ones that allow selecting the required parts of a multidimensional space and transforming them into the relevant relation states. The proposed approach for processing archive data with a continuously changing structure makes it possible to transfer information from one database to another and to keep a relevant and non-redundant database schema, which simplifies storage and handling of archive information in the digital university.