A Driving Violation Index Modelling Study at Intersections in Urban Precincts Using Big Data

Driving violations at intersections are highly prevalent in urban precincts in China. These violations, including failing to stop for a red traffic light or arrow, driving across a single continuous dividing line, disobeying traffic lane markings, and illegal parking, have become a major issue that not only undermines the overall safety of cities but also causes traffic delays and accidents. This article focuses on driving violation behaviors at intersections and investigates their influencing factors and characteristics. A Driving Violation Index (DVI) model for these violations has been developed using a Big Data approach. Based on actual traffic data from intersections in the city of Tianjin, a relationship between driving violations and the influencing factors has been fitted, which in turn yields the model for calculating the DVI. This research attempts to provide a mechanism for forecasting driving violations at intersections and to explore new strategies for managing intersection safety and security.


Introduction
Road traffic crashes are a major public health concern, particularly in developing countries. With the recent economic boom in China, vehicle volume and the number of traffic accident fatalities have become the highest in the world. Traffic accidents have become the leading cause of death in China. In 2015, more than 200,000 crashes resulting in damage or injury were reported in China, and these crashes resulted in 67,759 deaths. Crashes at intersections account for about 30% of all crashes in China [1]. Crashes often occur at intersections when drivers "scramble" to gain the right of way in violation of traffic regulations [2]. In the United States in 2011, more than 2.2 million police-reported motor vehicle crashes occurred at intersections or were intersection related.
Previous studies have established that driving violations are one of the major risks threatening road safety in Western societies and in China [3,4]. Violations among Western drivers mainly include drink-driving and speeding [5][6][7]. In addition to these violations [8], there are distinctive violations in China, such as running a red light, driving across a continuous dividing line, disobeying traffic lane markings, using a non-motor lane during traffic congestion, and stopping on the road in prohibited areas. Although most drivers believe that red light running is problematic and dangerous, approximately one in five respondents reported running one or more red lights when entering the last ten signalized intersections [9].
Depending on the environment and the personality of the driver, a particular situation may place the driver in an aggressive disposition. Studies, mainly conducted in the West, have identified some attitudinal and personality factors that are related to driving violations. Medina suggests that driver violation or error contributes to as much as 75% of all roadway crashes, and attempts to construct a generic taxonomy of driver errors and their causal factors based on a synthesis of the available literature on human error [10]. Brosseau studied the impact of waiting time and other factors on violations at signalized intersections [11]. Akaateba investigated the influence of three distinct variables (driver educational attainment, driving experience, and form of driver training) on drivers' self-reported attitudes towards the frequency of commission of traffic safety violations [12].
In recent years, there has been considerable research on the factors influencing traffic safety. For example, Abdel-Aty developed a mathematical model that explains the relationship between accident frequency and highway geometric and traffic characteristics, and confirmed that shoulder and lane widths and sharp horizontal curves affect the safety of a roadway, while annual average daily traffic (AADT) is a significant factor [13]. Mehmood introduced the System Dynamics approach to simulate driver behavior in relation to law enforcement, traffic monitoring, and education [14].
McCartt and Hu studied the effects of camera enforcement on red light violations and found that red light violations at camera-enforced intersections declined significantly [15]. Al-Ataw assessed the characteristics of red light violations and analyzed the effect of intersection characteristics, such as geometric design, control system and location, on the number of violations [16]. Yau found that male drivers and young drivers are more likely to commit driving violations [17].

Traffic violations processing system
At present, more than 300 signal-controlled intersections in Tianjin are equipped with video detection devices, which record drivers' traffic violations in the intersection area together with related traffic flow data. The data on driving violation behaviors at intersections are obtained from the database of the Traffic Violations Processing System (TVPS), maintained by the Tianjin Traffic Management Bureau (TTMB). This database is used to process, store, and report information on traffic violations at intersections in Tianjin, China. The violation data used in this research come from the TVPS, which processes data covering all the signal-controlled intersections in Tianjin.

Traffic flow collection system
The traffic flow data come from the statistical database of the Traffic Flow Collection System, which is also maintained by the TTMB. Real-time traffic flow data are obtained through video analysis and geomagnetic detection. The traffic flow data in this study are collected by the hour.

Area coordinated system of signal control
The signal control data are derived from the statistical data of the Tianjin Area Coordinated System of Signal Control, which adopts the Sydney Coordinated Adaptive Traffic System (SCATS). At present, all urban intersections in Tianjin use SCATS. This study collects the system control mode, the signal cycle, and the red light time for analysis.
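The three data sources described above have to be brought to a common unit of analysis before modelling. The following is a minimal sketch of one way to do this, assuming (hypothetically) that each source can be exported as a list of records keyed by intersection, approach and date. The field names (`intersection`, `approach`, `date`, `vehicles`, `type`, `cycle_s`, `red_s`) and the function `build_daily_samples` are illustrative and are not taken from the TVPS, the Traffic Flow Collection System or SCATS.

```python
from collections import defaultdict

def build_daily_samples(violations, hourly_flows, signal_plans):
    """Combine the three sources into one record per (intersection, approach, date).

    All field names are hypothetical placeholders for whatever the real
    exports contain.
    """
    # Sum the hourly flow counts into daily totals.
    daily_flow = defaultdict(int)
    for rec in hourly_flows:
        key = (rec["intersection"], rec["approach"], rec["date"])
        daily_flow[key] += rec["vehicles"]

    # Count violations per key and per violation type.
    daily_violations = defaultdict(lambda: defaultdict(int))
    for rec in violations:
        key = (rec["intersection"], rec["approach"], rec["date"])
        daily_violations[key][rec["type"]] += 1

    samples = []
    for key in sorted(daily_flow):
        intersection, approach, date = key
        plan = signal_plans.get((intersection, approach))
        if plan is None:
            continue  # no signal data, so the sample would be incomplete
        samples.append({
            "intersection": intersection,
            "approach": approach,
            "date": date,
            "daily_flow": daily_flow[key],
            "violations": dict(daily_violations.get(key, {})),
            "cycle_s": plan["cycle_s"],
            "red_s": plan["red_s"],
        })
    return samples
```

Keying every record on the same triple keeps the join explicit, and approaches without signal data are skipped rather than filled with guessed values.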

Design
The data obtained from the above systems show that the number of traffic violations is related to objective conditions: some intersections record significantly more violations than others, and the number of violations changes over time following a certain pattern. Therefore, this study aims to find the relationship between traffic violations and various external factors.
Previous research has drawn on many kinds of data sources. Some scholars use questionnaires [12][18][19][20], some conduct observational studies [21], and some use historical statistics [22].
Drawing on previous research methods, this study combines quantitative and qualitative data, including static data and dynamic data. Intersections with differing characteristics were selected from across the city of Tianjin, China. Tianjin, one of China's four municipalities, is located southeast of Beijing. The city's population is more than 10 million, and it has more than 3 million motor vehicles. Traffic violations in Tianjin are prominent. According to statistics from the Tianjin Public Security Traffic Management Bureau, in 2015 the Traffic Violations Processing System detected and punished traffic violations at intersections 1000 times. Among them, running a red light accounted for 200 times, driving across a continuous dividing line for 250 times, and disobeying traffic lane markings for 550 times.

Data processing
To account for detection errors of the video detection devices, all the data used in this study have been processed through error analysis, data reduction, and additional treatment of incomplete data. Taking the statistics of one entrance lane at one intersection on one day as the sample unit, this study selected sample data for 300 days in 2015 at 10 selected intersections in Tianjin, China.
To derive the Driving Violation Index (DVI), the general idea is to analyze the relationship between traffic violations and the influencing factors. The DVI is intended as a measure of the propensity for driving violations to occur at a signalized intersection. Hamdar obtained an aggressiveness propensity index for driving behavior at signalized intersections using the SEM method [23]. Building on previous studies, this study establishes a DVI for intersections by analyzing the objective factors of urban intersections.
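The source does not specify how detection errors and incomplete records were treated. As a minimal sketch under stated assumptions, the code below treats missing, negative or implausibly large values as errors and replaces them with the median of the valid values for the same entrance lane. The rule, the function name `clean_samples` and the field names are hypothetical and only illustrate the kind of treatment described.

```python
from statistics import median

def clean_samples(samples, field, max_plausible):
    """Return copies of the samples with invalid values of `field` imputed.

    Assumed rule (not from the source): a value is invalid if it is missing,
    negative, or above `max_plausible`; invalid values are replaced by the
    median of the valid values for the same (intersection, approach).
    """
    def valid(value):
        return value is not None and 0 <= value <= max_plausible

    # Collect the valid values for each entrance lane.
    by_group = {}
    for s in samples:
        if valid(s[field]):
            by_group.setdefault((s["intersection"], s["approach"]), []).append(s[field])
    fill = {group: median(values) for group, values in by_group.items()}

    cleaned = []
    for s in samples:
        group = (s["intersection"], s["approach"])
        if valid(s[field]):
            cleaned.append(dict(s))
        elif group in fill:
            cleaned.append({**s, field: fill[group]})
        # Entrance lanes with no valid value at all are dropped.
    return cleaned
```

A median is used here because it is less sensitive than the mean to the occasional miscount from a video detector; other imputation rules would fit the same structure.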

Analytical model
The Structural Equation Model (SEM) approach can handle a large number of endogenous and exogenous variables simultaneously, as well as latent variables specified as linear combinations of the observed variables. SEM combines a confirmatory factor model and a causal model. The factor model within SEM is called the measurement model, and its equation, the measurement equation, describes the relationship between latent variables and indicators. The causal model within SEM is called the latent variable model, also known as the structural model, and its equation, the structural equation, describes the relationships among latent variables [24]. With the popularization of vehicles and the development of urban traffic, the number of influencing factors and the complexity of traffic behaviors have increased significantly, and SEM has therefore been applied in the traffic field by international scholars. As early as the 1980s, Golob applied SEM to research on travel demand models [25]. At the beginning of the 21st century, international scholars were the first to apply SEM to research on drivers' behavioral characteristics, with good results. Golob applied SEM to the study of travel behavior, and de Abreu e Silva used structural equation modeling to unravel the influence of land use patterns on the travel behavior of workers in Montreal [26][27]. Eboli explored land use and transport interaction through structural equation modelling [28].
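For reference, the measurement and structural equations described above are commonly written as follows in standard LISREL notation. The notation is an assumption of this illustration and is not taken from the source: x and y are the observed exogenous and endogenous indicators, ξ and η the exogenous and endogenous latent variables, Λ_x and Λ_y the factor loading matrices, B and Γ the path coefficient matrices, and δ, ε and ζ the error terms.

```latex
\begin{aligned}
x    &= \Lambda_x \xi + \delta        && \text{(measurement equation, exogenous indicators)} \\
y    &= \Lambda_y \eta + \varepsilon  && \text{(measurement equation, endogenous indicators)} \\
\eta &= B\eta + \Gamma\xi + \zeta     && \text{(structural equation)}
\end{aligned}
```

In a model with a single endogenous latent variable, such as an index driven by several impact factors, the term Bη vanishes and the structural equation reduces to a weighted sum of the exogenous latent variables plus an error term.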

Analysis of impact factors
Owing to the complexity of the objective conditions at urban road intersections, the objective factors that influence driving behaviors form a comprehensive and complex set. Tables 1-2 provide the classification of the factor variables, and the impact factors were analyzed using SPSS software. First, the data were tested with the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's test. The KMO value is greater than 0.7, which indicates that the data are suitable for factor analysis, and the Sig. value is less than 0.05, which shows that the test is significant. Factor analysis of the data then yielded a coefficient matrix of factor scores. According to the analysis result, the original five impact factors were reduced to four, excluding the environmental factor. Meanwhile, the factor scores of the angle of intersection, the width of lane, the design speed, and the one-way road are all less than 0.1, so these variables are excluded. This agrees with practical experience: because the differences in lane width and design speed among the intersections in this study are small, these two variables cannot be treated as significant impact factors.
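The study ran these tests in SPSS. As an illustration of what the two statistics measure, the following minimal sketch computes the overall KMO value and Bartlett's test of sphericity from textbook formulas, using only the Python standard library. The function names are hypothetical, and the sketch does not reproduce SPSS output or the study's data.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long data columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    devs = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(d * d for d in dev)) for dev in devs]
    p = len(columns)
    return [[sum(a * b for a, b in zip(devs[i], devs[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]

def inverse_and_det(matrix):
    """Gauss-Jordan inverse and determinant with partial pivoting."""
    p = len(matrix)
    a = [list(row) + [float(i == j) for j in range(p)] for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(p):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[p:] for row in a], det

def kmo_and_bartlett(columns):
    """Return (overall KMO, Bartlett chi-square, degrees of freedom)."""
    n, p = len(columns[0]), len(columns)
    r = correlation_matrix(columns)
    inv, det = inverse_and_det(r)
    sum_r2 = sum_q2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                # Partial correlation obtained from the inverse correlation matrix.
                q = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                sum_r2 += r[i][j] ** 2
                sum_q2 += q ** 2
    kmo = sum_r2 / (sum_r2 + sum_q2)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * math.log(det)
    return kmo, chi2, p * (p - 1) // 2
```

KMO compares ordinary correlations with partial correlations, so values near 1 indicate that the variables share enough common variance for factor analysis. Bartlett's statistic tests whether the correlation matrix differs from the identity matrix; its significance level would be read from a chi-square distribution with the returned degrees of freedom.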

Selected model variables
Based on the analysis of impact factors, this study selected four exogenous variables and one endogenous variable from the effective factors as the variables of the structural equation model. Detailed descriptions of the selected variables are shown in Table 4.

Model construction
This study uses LISREL 8.8 to program and establish the SEM. By applying the data samples from Tianjin intersections to model training, this study obtained the measurement equations of the exogenous latent variables and the structural equation model. After establishing and modifying the models in the software, this study obtained a path diagram of the structural equation model for calculating the driving violation index, including factor loadings and path coefficients, as shown in Figure 1.

Calculation model of driving violation index
The calculation model is obtained from the SEM results, where Err denotes the error term. It comprises the measurement model of the endogenous variables, the measurement model of the exogenous variables, and the calculation model of the violation propensity index. The results summarizing the model are presented in Table 5.

Analysis of influencing factors
By fitting the model to 300 data samples, this study uses the structural equation model to obtain the calculation model of the DVI. In the final fitted model, the Chi-Square value is 171.9, the RMSEA value is 0.089, the GFI value is 0.68, and the AGFI value is 0.76. The fitting results are in an acceptable range when the sample size is much larger than the number of observed variables. Because the model is fitted with real data, it is inevitable that the data may be uncertain and contain abrupt changes. Therefore, the result of model fitting can be regarded as acceptable.
The final structural equation model groups the exogenous variables X1, X2 and X3 under impact factor F1, which represents the characteristic index of intersections; groups X4, X6 and X7 under impact factor F2, which represents the characteristic index of entrance lanes at intersections; groups X9, X12, X13 and X14 under impact factor F3, which represents the characteristic index of traffic control and traffic management; and groups X15, X16 and X17 under impact factor F4, which represents the characteristic index of traffic flow. The exogenous variable X18 was excluded from model fitting because its correlation with the exogenous variable X15 is larger than 0.9. Comparing the path coefficients of the impact factors gives the following ranking of their effects on the DVI: F4 > F3 > F1 > F2. This shows that traffic flow characteristics have the greatest influence on traffic violation behaviors, while the entrance lane characteristics at intersections have the least.
Meanwhile, comparing the path coefficients of Y1, Y2 and Y3 shows that, among traffic violation behaviors at intersections, running a red light has the greatest influence on the DVI, followed by driving across the continuous dividing line, and then disobeying traffic lane markings. Illegal parking is eliminated by the model, which shows a low correlation between illegal parking at intersections and the driving violation index.
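To make the relationship between the reported fit statistics concrete, the sketch below computes the RMSEA point estimate from a chi-square value using the common textbook formula. The degrees of freedom of the fitted model are not reported in the text, so the value `df=51` in the usage line is purely an assumed figure chosen for illustration. Note also that some software divides by N rather than N - 1.

```python
import math

def rmsea(chi_square, df, n_samples):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (N - 1)))."""
    excess = max(chi_square - df, 0.0)
    return math.sqrt(excess / (df * (n_samples - 1)))

# df=51 is an assumption for illustration only; it is not reported in the source.
print(round(rmsea(171.9, 51, 300), 3))  # prints 0.089
```

Because the chi-square statistic grows with sample size, RMSEA rescales the excess of chi-square over its degrees of freedom by both the model complexity and the number of samples.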
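The exclusion of X18 because of its correlation with X15 is an instance of a simple collinearity screen. A minimal sketch of such a screen is given below, assuming the variables are held as named lists of observations. The function names and the order-dependent rule (keep the first variable of a highly correlated pair) are assumptions of this illustration, not the procedure used in LISREL.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def drop_collinear(variables, threshold=0.9):
    """Keep variables in order, dropping any whose |r| with a kept one exceeds threshold."""
    kept = {}
    for name, values in variables.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return list(kept)

# Made-up observations: X18 is nearly proportional to X15, X16 is unrelated.
example = {
    "X15": [100, 200, 300, 400, 500],
    "X16": [5, 3, 8, 1, 6],
    "X18": [11, 19, 32, 41, 49],
}
print(drop_collinear(example))  # prints ['X15', 'X16']
```

Removing one variable of a nearly collinear pair avoids unstable loadings, since the two variables carry almost the same information about the latent factor.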

Driving Violation Index Calculation
Using the model results obtained in this study, the DVI value of a given intersection can be obtained. The violation propensity index of each intersection is related to traffic flow, traffic conditions, and other dynamic variables, so the average traffic data of the intersection should be used when calculating its index. Likewise, the data of different entrance lanes should be averaged to calculate the DVI. According to the calculation model, the corresponding Fi value is obtained from the observed variables of each factor, and the Index value is then calculated. At the same time, the number of traffic violations can be estimated from the Index value. The results show that as the Index value increases, the tendency toward traffic violations also increases. Table 6 presents the calculation results for the typical intersections in Tianjin, China, examined in this study. The results may similarly be extended to other intersections or other Chinese cities with similar driver and traffic characteristics. When the standardized index is greater than 0.8, the intersection is considered more dangerous and countermeasures are recommended, as for intersections number 7 and number 8 in this study.
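The calculation procedure described above can be sketched as follows. The factor loadings and path coefficients of the fitted model (Table 5) are not reproduced in the text, so every numeric weight below is a made-up placeholder, chosen only to respect the reported ordering F4 > F3 > F1 > F2. The function names and the assumption that observed variables are already scaled to [0, 1] are likewise hypothetical.

```python
def factor_score(loadings, observed):
    """Weighted sum of the observed variables belonging to one impact factor."""
    return sum(weight * observed[name] for name, weight in loadings.items())

def dvi(approaches, loadings, path_coefficients):
    """DVI of one intersection: average the entrance lanes, then combine the factors."""
    names = {name for factor in loadings.values() for name in factor}
    averaged = {n: sum(a[n] for a in approaches) / len(approaches) for n in names}
    return sum(path_coefficients[f] * factor_score(loadings[f], averaged)
               for f in loadings)

def needs_countermeasures(index, threshold=0.8):
    """Flag an intersection whose index exceeds the 0.8 threshold used in the study."""
    return index > threshold

# Placeholder weights, NOT the values from Table 5.
LOADINGS = {
    "F1": {"X1": 1.0},
    "F2": {"X4": 1.0},
    "F3": {"X9": 1.0},
    "F4": {"X15": 0.5, "X16": 0.5},
}
PATHS = {"F1": 0.2, "F2": 0.1, "F3": 0.3, "F4": 0.4}

# Two entrance lanes of one hypothetical intersection, variables scaled to [0, 1].
lanes = [
    {"X1": 0.5, "X4": 1.0, "X9": 1.0, "X15": 1.0, "X16": 1.0},
    {"X1": 0.5, "X4": 1.0, "X9": 1.0, "X15": 0.5, "X16": 1.0},
]
index = dvi(lanes, LOADINGS, PATHS)
print(needs_countermeasures(index))  # prints True
```

Averaging the entrance lanes before forming the factor scores mirrors the order of operations in the text; with a linear model, averaging the per-lane indices afterwards would give the same result.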

Conclusions
In this paper, we propose the concept of the violation propensity index. By collecting and organizing data on illegal driving at intersections together with other relevant road traffic data, the study established an SEM-based calculation model of the violation propensity index. Using real data from Tianjin, it obtained the path coefficients and factor loadings of the SEM by programming with LISREL 8.8. By analyzing the impact factors of traffic violation behaviors at urban intersections in China, this study explored the relationship between traffic violation behaviors at intersections and intersection characteristics, road characteristics, traffic control characteristics, and traffic flow characteristics. It also derives an SEM-based calculation method for the driving violation index at intersections through a combination of qualitative analysis and quantitative calculation, which provides new ideas and methods for the safety evaluation of intersections. This study is not perfect, and further research in the following areas is suggested.

Figure 1. Final driving violation index structural model by LISREL 8.8.

Table 1. Description of observed exogenous variables in the model

Table 4. Selected model variables

Table 5. Results of SEM

Table 6. DVI of the intersections in this study