New data mining technique for multidimensional aircraft trajectories analysis

Under conditions of growing airport workload, airspace sectorization is necessary for accident prevention. Sectorization should take into account the regular traffic of aircraft. A new data mining technique that solves this problem is described. It yields a stable partition of a sample of aircraft intent trajectories into groups (asymptotically converged beams) corresponding to approaches to the same runway. The method takes into account special geometric characteristics (curvature, torsion and multiple intersections) of the multidimensional space trajectories of aircraft.


Introduction
Currently, because of the increasing volume of data, designing methods and tools for the rapid and automatic processing of big data has become a problem of great importance. New approaches to the analysis of large data sets (such as unsupervised data mining and other machine learning techniques) are required that can recognize hidden patterns of motion and identify moving objects with similar characteristics and/or the same final targets.
In many areas, particularly in aviation, huge data sets of trajectories must be processed for monitoring, control or other purposes. An aircraft trajectory is a four-dimensional description of the aircraft's states over time, where a state may include the position of the aircraft's center of mass and other motion characteristics such as velocity, attitude and weight. Thus, in general, a space trajectory is a multidimensional description of a moving object's path.
According to statistics, a large number of aviation accidents take place in the extended airport area because of the increased workload of air traffic management (ATM) systems. Thus, as air traffic keeps growing, ATM systems must be modernized and optimized to maintain a high level of safety.
Data mining techniques are widely used in research on workload optimization in the extended airport area, in particular for airspace sectorization. Sectorization is the partitioning of airspace into sectors, or zones, for which different controllers are responsible.
Airspace sectorization should take into account regular traffic flows, which are described by samples of aircraft trajectories.
To date, the sectorization problem has been solved by clustering [1] 2D projections of aircraft 4D trajectories, or by partitioning space into regions such as Voronoi diagrams [2].
The development of 3D sectorization algorithms is at an initial stage and is associated with the problem of revealing trajectory beams in 3D space. Trajectory beams correspond to groups of 3D trajectories with similar characteristics. Methods used to partition a trajectory sample into beams include PCA [3], a nonparametric approach based on Dynamic Bayesian Networks [4] and spectral clustering [5]. These methods are based on reducing the dimensionality of the analyzed data space.
The complexity of the analyzed data space geometry, which has specific curvature, torsion and multiple spatial intersections, is demonstrated in Fig. 1a. An example of two intent trajectory beams on different runways is shown in Fig. 1b.
The present work describes in detail a new approach to airport airspace sectorization that was proposed earlier [6]. The technique is based on determining the geometric asymptote tangent to a multidimensional trajectory beam in a lower-dimensional space. The trajectory beam is then separated as a result of the reverse transition to the original data space. As will be shown below, when applied to the analysis of radar data, this approach yields stable results.

Assessment of beam's asymptote
… are the coordinates of the final trajectory points on a runway, $\|\cdot\|$ is the Euclidean distance metric in the three-dimensional coordinate space $\mathbb{R}^3$, and $\varepsilon$ is a cutoff parameter whose value is no more than the runway width.
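The exact form of condition (1) did not survive extraction. Reading it, as an assumption, as a bound $\varepsilon$ on the Euclidean distance between the final (runway) points of any two trajectories in one beam, a minimal check could look like the sketch below. The function name `is_converged` and the cutoff values are hypothetical.

```python
import math
from itertools import combinations

def is_converged(final_points, eps):
    """True if every pair of final points lies within eps of each other.

    Hypothetical reading of condition (1): the trajectories of one beam end
    within eps (Euclidean distance in R^3) of one another on the runway.
    """
    return all(math.dist(p, q) <= eps for p, q in combinations(final_points, 2))

# Final points (hypothetical metres) of three trajectories
ends = [(0.0, 0.0, 0.0), (10.0, 5.0, 0.0), (-8.0, 3.0, 1.0)]
print(is_converged(ends, eps=45.0))  # True: largest pairwise distance is about 18.1
print(is_converged(ends, eps=15.0))  # False
```

A single trajectory is trivially "converged", since there are no pairs to violate the bound.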
In the proposed approach to determining the number of converged beams, it should be taken into account that the trajectories in an asymptotically converged beam have a typical form (profile) and a specific geometric asymptote in the region of convergence (1). The geometric asymptote of a converged beam of multidimensional aircraft intent trajectories is a line in $\mathbb{R}^3$ that meets requirement (1). The trajectories of an asymptotically converged beam have a common tangent line in the neighborhood of their final points. Hence, asymptotically converged trajectory beams can be identified by determining their tangential geometric asymptotes at their focal points.
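Because the discrete points of a beam's trajectories are tightly clustered around its asymptote, to determine the number of asymptotically converged beams the trajectory sample is treated as a set of scattered points (2). This set has to be sorted by the values of one of the coordinates (in ascending or descending order); the other coordinates of the points representing a converged beam of a certain profile are then also ordered. Then, for the scattered three-dimensional data

$Z = \{(x_i, y_i, z_i),\ i = 1, \dots, LN\}$ (2)

orthogonal linear regression models are considered:

$M_\theta = \{(x, y, z) \in \mathbb{R}^3 : (a_1 x + b_1 y + c_1 z + d_1 = 0) \wedge (a_2 x + b_2 y + c_2 z + d_2 = 0)\}$ (3)

Here $\wedge$ denotes conjunction and $\theta = (a_1, b_1, c_1, d_1, a_2, b_2, c_2, d_2)$ is the vector of parameters of model (3), determined with the RANSAC (Random Sample Consensus) algorithm [8] under a given cutoff $\rho$ of the Euclidean distance; the distance $d(\mathbf{z}, M_\theta)$ is calculated by orthogonal projection of a point $\mathbf{z} = (x, y, z)$ from set (2) onto the line $M_\theta$. Defined in this way, model (3) is symmetric with respect to the coordinates $x$, $y$, $z$.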
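The preparation steps above (flattening the sample into set (2), ordering it by one coordinate, fitting an orthogonal regression line and measuring point-to-line distance by orthogonal projection) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the line is stored as a point plus a unit direction instead of the two-plane form (3), which describes the same line, and the fit is the total-least-squares principal axis. All function names are hypothetical.

```python
import math

def scattered_set(trajectories):
    """Flatten N trajectories of L points each into one list of 3-D points, as in (2)."""
    return [tuple(p) for traj in trajectories for p in traj]

def sort_by(points, axis, descending=False):
    """Order the scattered set by one coordinate: axis 0 = x, 1 = y, 2 = z."""
    return sorted(points, key=lambda p: p[axis], reverse=descending)

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _matvec(S, v):
    return [_dot(row, v) for row in S]

def orthogonal_line_fit(points, iters=100):
    """Total-least-squares line through 3-D points; returns (centroid, unit direction).

    The direction is the dominant eigenvector of the 3x3 scatter matrix, found by
    power iteration started from each basis vector; the candidate with the largest
    Rayleigh quotient is kept. x, y and z are treated symmetrically.
    """
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    S = [[sum((p[i] - c[i]) * (p[j] - c[j]) for p in points) for j in range(3)]
         for i in range(3)]
    candidates = []
    for k in range(3):
        v = [1.0 if i == k else 0.0 for i in range(3)]
        for _ in range(iters):
            w = _matvec(S, v)
            norm = math.sqrt(_dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        candidates.append(v)
    u = max(candidates, key=lambda v: _dot(v, _matvec(S, v)))
    return c, u

def point_line_distance(p, c, u):
    """Distance from p to the line {c + t*u} via orthogonal projection (|u| = 1)."""
    d = [p[k] - c[k] for k in range(3)]
    t = _dot(d, u)  # signed length of the projection of d onto u
    return math.sqrt(max(_dot(d, d) - t * t, 0.0))

# Two hypothetical trajectories lying on one straight line with direction (1, 2, -1)
trajs = [[(t, 2 * t, -t) for t in range(5)], [(t, 2 * t, -t) for t in range(5, 10)]]
Z = sort_by(scattered_set(trajs), axis=0)
c, u = orthogonal_line_fit(Z)
print(max(point_line_distance(p, c, u) for p in Z) < 1e-5)  # True
```

This least-squares fit depends only on the scatter matrix, so it is invariant to the order of the points; `sort_by` only produces the ordering the text calls for.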
Any pair of points from (2) is sufficient to put forward a hypothesis about the orthogonal linear regression model (3). The final model (3) is the one supported by the largest fraction (percentage) of the scattered data $Z$ (2). The MLESAC (Maximum Likelihood Estimation Sample Consensus) algorithm [9], a probabilistic modification of the RANSAC algorithm [8], may be used for this purpose. The algorithm estimates the likelihood of model (3) by representing the distribution of the distances from the scattered data $Z$ (2) to the line (3) as a mixture of distributions: one for the data that support model (3) (inliers) and one for the data that reject it (outliers). Treating the scattered data $Z$ (2) as independent, we obtain a relation (4) for the log-likelihood, where $\gamma$ is the mixing parameter. The distances to the data supporting model (3) follow a Gaussian distribution with standard deviation $\sigma$. The distances to the data rejecting model (3) follow a uniform distribution, where $\rho_{\max}$ is the maximal distance to the data (defined by the context). Maximizing the likelihood (4), i.e. minimizing the negative log-likelihood, yields estimates of the parameter vector $\theta$ and the mixing parameter $\gamma$ (see Eq. (21) in [9]). These parameters are traditionally estimated with the EM algorithm [10]. An example of a practical implementation of the algorithm in MATLAB can be found in [11].
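Relation (4) and the two component densities did not survive extraction. For orientation, the standard MLESAC-style mixture written for point-to-line distances $d_i = d(\mathbf{z}_i, M_\theta)$ is shown below, together with the EM update of $\gamma$ for fixed $\sigma$ and $\rho_{\max}$. Modeling the distance as a one-dimensional Gaussian for inliers and as uniform over a range of length $\rho_{\max}$ for outliers is an assumption; the notation of the original Eq. (4) may differ.

```latex
p_{\mathrm{in}}(d) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{d^{2}}{2\sigma^{2}}\right),
\qquad
p_{\mathrm{out}}(d) = \frac{1}{\rho_{\max}},

-\ln\mathcal{L}(\theta,\gamma)
  = -\sum_{i=1}^{LN}\ln\!\left[\gamma\,p_{\mathrm{in}}(d_i) + (1-\gamma)\,p_{\mathrm{out}}(d_i)\right],
\qquad d_i = d(\mathbf{z}_i, M_\theta),

w_i = \frac{\gamma\,p_{\mathrm{in}}(d_i)}{\gamma\,p_{\mathrm{in}}(d_i) + (1-\gamma)\,p_{\mathrm{out}}(d_i)},
\qquad
\gamma \leftarrow \frac{1}{LN}\sum_{i=1}^{LN} w_i .
```

Here $w_i$ is the posterior probability that point $i$ is an inlier (E-step), and the update of $\gamma$ is the M-step.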
The most likely orthogonal linear regression of the scattered data of the trajectory sample defines the geometric asymptote $M_{\theta[k]}$, $k = 1, \dots, K$ (3), of one of the sample beams under condition (1). The geometric asymptote obtained in this manner meets the requirement …
ICBDA 2016 ITM Web of Conferences itmconf/2016 8 0801001 1
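To illustrate the hypothesize-and-score loop behind the asymptote search, here is a small stdlib-only MLESAC-style sketch: each hypothesis is the line through a random pair of points from (2), the mixing parameter $\gamma$ is refined by a few EM steps with $\sigma$ and $\rho_{\max}$ held fixed, and the hypothesis with the smallest negative log-likelihood is kept. The parameter values, the number of trials and all function names are assumptions; this is not the authors' implementation (a MATLAB example is referenced as [11]).

```python
import math
import random

def _sub(a, b):
    return [x - y for x, y in zip(a, b)]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def line_from_pair(p, q):
    """Line hypothesis through two points as (anchor, unit direction), or None."""
    d = _sub(q, p)
    n = math.sqrt(_dot(d, d))
    return None if n == 0.0 else (list(p), [x / n for x in d])

def distance(z, line):
    """Orthogonal distance from point z to the line (anchor c, unit direction u)."""
    c, u = line
    d = _sub(z, c)
    t = _dot(d, u)
    return math.sqrt(max(_dot(d, d) - t * t, 0.0))

def neg_log_likelihood(dists, sigma, rho_max, em_iters=5):
    """Gaussian inliers + uniform outliers; gamma refined by EM with sigma fixed."""
    k = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    p_in = [k * math.exp(-d * d / (2.0 * sigma * sigma)) for d in dists]
    p_out = 1.0 / rho_max
    gamma = 0.5
    for _ in range(em_iters):
        # E-step: posterior probability that each point is an inlier
        w = [gamma * a / (gamma * a + (1.0 - gamma) * p_out) for a in p_in]
        # M-step: the mixing parameter is the mean posterior
        gamma = sum(w) / len(w)
    nll = -sum(math.log(max(gamma * a + (1.0 - gamma) * p_out, 1e-300)) for a in p_in)
    return nll, gamma

def mlesac_line(points, sigma, rho_max, trials=300, seed=0):
    """Keep the two-point hypothesis with the smallest negative log-likelihood."""
    rng = random.Random(seed)
    best_nll, best_line, best_gamma = math.inf, None, 0.0
    for _ in range(trials):
        p, q = rng.sample(points, 2)
        line = line_from_pair(p, q)
        if line is None:
            continue
        nll, gamma = neg_log_likelihood([distance(z, line) for z in points], sigma, rho_max)
        if nll < best_nll:
            best_nll, best_line, best_gamma = nll, line, gamma
    return best_line, best_gamma

# Hypothetical data: 20 collinear points plus 4 scattered outliers
inliers = [(float(t), 0.5 * t, 2.0) for t in range(20)]
outliers = [(0.0, 10.0, 0.0), (5.0, -8.0, 3.0), (15.0, 9.0, -4.0), (3.0, 3.0, 9.0)]
line, gamma = mlesac_line(inliers + outliers, sigma=0.1, rho_max=30.0)
```

On this toy input the selected line runs along the 20 collinear points and $\gamma$ settles close to the inlier share 20/24.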

Separation of trajectory beam
Beams of trajectories tangential to the corresponding geometric asymptotes are determined by minimizing a cost function in which $\{r[i;k]\}$, $i = 1, \dots, N$, is a set of binary indicator variables (i.e., $r[i;k] = 1$ if vector $\mathbf{x}[i]$ is attributed to beam $k$, and $r[i;k] = 0$ otherwise). The distance between each geometric asymptote and the trajectories of the sample is then calculated. After the points representing the trajectories of the separated beam are eliminated from the scattered data (2), the geometric asymptote detection procedure is repeated and the next trajectory beam is separated. In this case, the remaining scattered data (2) are sorted with respect to another spatial coordinate (different from the previous one), since model determination should be symmetric with respect to the coordinates $x$, $y$, $z$. Possible dependence of result (3) on the coordinate direction is eliminated by reversing the order of data (2), from ascending to descending or vice versa. The analysis of the trajectory sample is complete when all beams in the sample have been separated.
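If each trajectory must belong to exactly one beam, minimizing a cost of the assumed form $J = \sum_i \sum_k r[i;k]\,D[i][k]$ over binary indicators reduces to assigning every trajectory to its nearest asymptote. The sketch below assumes that form and a precomputed distance table `D` (the paper's own distance formula was lost in extraction). `sort_schedule` is one hypothetical way to change the sorting coordinate and direction between passes; both names are illustrative.

```python
def assign_to_beams(D):
    """Minimise J = sum_i sum_k r[i][k] * D[i][k] over binary r[i][k].

    D[i][k] is an (assumed, precomputed) distance between trajectory i and
    asymptote k. With each trajectory in exactly one beam, the optimum sets
    r[i][k] = 1 for the nearest asymptote and 0 elsewhere.
    """
    r, J = [], 0.0
    for row in D:
        k_best = min(range(len(row)), key=row.__getitem__)
        r.append([1 if k == k_best else 0 for k in range(len(row))])
        J += row[k_best]
    return r, J

def sort_schedule(n_passes):
    """Hypothetical ordering per pass: cycle x -> y -> z and alternate asc/desc."""
    return [(p % 3, p % 2 == 1) for p in range(n_passes)]

D = [[0.2, 1.5],
     [3.0, 0.1],
     [0.4, 0.3]]
r, J = assign_to_beams(D)
print(r)                 # [[1, 0], [0, 1], [0, 1]]
print(sort_schedule(4))  # [(0, False), (1, True), (2, False), (0, True)]
```

Because the coordinate cycle has period 3 and the direction flip has period 2, six consecutive passes visit every (coordinate, direction) combination once.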
In general, the approach described in the present work consists of two stages. At the first stage, a substantial reduction of data dimensionality simplifies the detection of specific features of the data: from the 2D projection of the scattered points of the 3D trajectories, the most likely orthogonal linear regression of the scattered data, which corresponds to the geometric asymptote of one of the beams in the analyzed trajectory sample, is determined. At the second stage, after the reverse transition to the original space, a beam is separated from the analyzed trajectory sample according to its proximity (by cosine measure) to the determined asymptote. Thus, with this approach no information about the original data is lost.
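The text separates a beam by cosine proximity to its asymptote but does not spell the measure out here. One plausible reading, sketched below purely as an assumption, compares the direction of each trajectory's final approach (the chord over its last `m` segments) with the asymptote's direction. The parameter `m`, the threshold and the helper names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is zero)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def tail_direction(traj, m):
    """Chord over the last m segments of a trajectory (requires len(traj) > m)."""
    p, q = traj[-1 - m], traj[-1]
    return [b - a for a, b in zip(p, q)]

def select_beam(trajectories, asymptote_dir, m=5, threshold=0.99):
    """Indices of trajectories whose final approach is nearly parallel to the asymptote.

    abs() is used because the sign of a fitted line direction is arbitrary.
    """
    return [i for i, tr in enumerate(trajectories)
            if abs(cosine(tail_direction(tr, m), asymptote_dir)) >= threshold]

A = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0.01, 0)]  # final approach along +x
B = [(0, 0, 0), (0, 1, 0), (0, 2, 0), (0, 3, 0)]     # final approach along +y
print(select_beam([A, B], asymptote_dir=(-1.0, 0.0, 0.0), m=2))  # [0]
```

Since the sign is ignored, approaches to the two opposite ends of a runway would look alike to this measure; the final-point condition (1) is what keeps them apart.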

Demonstration of the results
Earlier [7], a sample of multidimensional aircraft intent trajectories was divided into subsamples in 3D space using the method of polynomial regressions [12]. However, the resulting subsamples are not homogeneous, as each of them consists of several trajectory beams. Besides, the method of polynomial regressions requires the number of beams to be specified in advance, and its result is not stable. The method described in the present work is unsupervised, so no knowledge of the number of trajectory beams in the original sample is necessary; in addition, it gives stable results. Fig. 3 illustrates the stages of the technique applied to divide one of the subsamples obtained earlier in [7] into asymptotically converged beams (see Fig. 3d). Fig. 3a shows the two-dimensional projection of the scattered data (aligned with time) of the analyzed trajectory subsample. Fig. 3b illustrates the linear regression of the data obtained with the RANSAC algorithm [8]. The asymptote of the first beam (long blue line) is shown in Fig. 3c. The beam highlighted in blue in Fig. 3d corresponds to the asymptote of the same color in Fig. 3c. Note that in this case the results for scattered and full trajectory data coincide. The trajectories of the first beam (Fig. 3d) are separated from the subsample according to their proximity to the asymptote (Fig. 3c) by cosine measure.

Conclusions
The new data mining technique for the analysis of multidimensional trajectories of landing aircraft in 3D, described in this work, yields stable partitions of the analyzed trajectory sample into asymptotically converged trajectory beams. Thus, 3D airspace sectorization of the extended airport area can be performed.
Fig. 1. a) Intent aircraft trajectories; b) example of intent trajectory beams.

Description of analyzed data

Fig. 2a illustrates the analyzed sample of intent trajectories. The sample contains 116 trajectories of aircraft that landed at the airports of the San Francisco Bay Area on 1 January 2005 (the data are freely available at https://c3.nasa.gov/dashlink/resources/132/). The data were recorded by a TRACON radar system, and the reference point coincides with the radar location. The time interval between trajectory points is about 5 s, and each trajectory consists of 160 points, as described in [6,7].
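For scale, and assuming that $L$ in (2) denotes the number of points per trajectory and $N$ the number of trajectories, this sample gives the following figures (the 5 s step is approximate, so the duration is too):

```latex
LN = 116 \times 160 = 18\,560 \ \text{points in set (2)},
\qquad
(160 - 1) \times 5\ \text{s} = 795\ \text{s} \approx 13.3\ \text{min per trajectory}.
```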

Fig. 2. Analyzed sample of 4D aircraft intent trajectories shown in 3D space


Fig. 3. Results obtained at different stages of the algorithm: a) two-dimensional projection of scattered data (aligned with time); b) linear regression of the analyzed scattered data obtained with the RANSAC algorithm [8]; c) asymptote (long blue line) of the first separated trajectory beam; d) three finally separated trajectory beams in the original 3D space. The beam shown in blue corresponds to the asymptote represented in Fig. 3c.