Change detection in non-stationary Hawkes processes through sequential testing

Detecting changes in an incoming data flow is crucial for understanding inherent dependencies, formulating new or adapting existing policies, and anticipating further changes. Distinct modeling constructs have triggered varied ways of detecting such changes, almost every one of which suffers from certain shortcomings. Parametric models based on time series objects, for instance, work well under distributional assumptions or when change detection in specific properties, such as the mean, variance, or trend, is of interest. Others rely heavily on the "at most one change-point" assumption, and implementing binary segmentation to discover subsequent changes comes at a hefty computational cost. This work offers an alternative that remains both versatile and untethered to such stifling constraints. Detection is done through a sequence of tests built on variations of certain trend permuted statistics. We study non-stationary Hawkes patterns which, with an underlying stochastic intensity, imply a natural branching process structure. Our proposals are shown to estimate changes efficiently in both the immigrant and the offspring intensity without sounding too many false positives. Comparisons with established competitors reveal smaller Hausdorff-based estimation errors and desirable inferential properties, such as asymptotic consistency and narrower bootstrapped margins. Four real data sets on NASDAQ price movements, crude oil prices, tsunami occurrences, and COVID-19 infections are analyzed. Forecasting methods are also touched upon.


Introduction
A rich class of natural and artificial phenomena may be described adequately through the language of point processes, a probabilistic description summarizing their salient features. These occurrences may evolve over time, space, their combinations, or over more abstract domains, and often exhibit complex dependencies. Examples include the arrival of spam emails in one's mailbox, the distribution of stars in our galaxy, and several others in between. Of interest in this work is a special subcategory termed Hawkes processes, through which the interconnectedness among events can be conveniently understood and modeled. This is in contrast to simpler alternatives such as the homogeneous Poisson processes, where the arrival shocks are assumed to be independent a priori. A sampling of relevant mathematical descriptions is offered in the following section.
*Corresponding author: mbhaduri@bentley.edu
Change-point detection, on the other hand, remains a fascinating problem confronting modern statistical inference: one wants to check whether a stable arrival flow has deviated in some fashion and estimate any point of departure, objectively and accurately. Mathematical tractability often demands simplified constructs, detecting changes only in means or variances, for instance. The purpose of this brief communication is to examine the applicability of some of the techniques proposed recently by the first author, originally built to detect general changes under a deterministic Poissonian framework, in a stochastic intensity-driven Hawkes process scenario.
Section 2 surveys the relevant literature and formalizes the mathematical structure, section 3 reports simulation results demonstrating how the proposed technique triumphs over its more established competitors, and section 4 implements the analyses on various data sets with different intricacies.

Hawkes process
A continuous time stochastic process {N(t)}t≥0 is frequently employed to model a set of ordered arrival points t1 < t2 < ... < tn < ... representing the global occurrence times of some random phenomenon of interest. {N(t)}t≥0 is also referred to as a counting process, with the understanding that N(t), at a given time t, counts the number of observations in (0,t]. Please notice the (almost sure) strict ordering in the arrival times, ensuring a simple point process, the type we are examining now, as opposed to an explosive one. Additionally, a function λ(.), termed the intensity, given through

λ(t) := lim_{Δt→0} P{N(t + Δt) − N(t) ≥ 1} / Δt,

is taken to exist, quantifying the instantaneous probability of observing at least one shock. A λ(.) free of time leads to a stationary point process. An increasing λ(.) leads to a deteriorating process (where shocks occur more and more frequently as time goes on), while a decreasing λ(.) implies an improving sequence (with shocks happening less and less frequently). Regularity conditions on λ(.) lead to the independent increment property and, ultimately, to a Poisson process, i.e., one where N(t) ∼ Pois(∫_0^t λ(u) du), with probability calculations done through

P{N(t) = k} = exp(−∫_0^t λ(u) du) (∫_0^t λ(u) du)^k / k!,  k = 0, 1, 2, ....

A detailed description can be had from Rigdon and Basu [1], for instance. Our change point detection proposals, elaborated in Bhaduri (2018) [2], were developed for processes of the above kind, with purely deterministic choices of λ(.). Such a choice, however, assumes the (conditional) intensity is independent of the history, a condition relaxed through a stochastic choice of λ(.). A random, data-dependent intensity enables one event to influence those following it, an apt requirement for modeling events like earthquakes, where one major shock inflates the occurrence probabilities of several aftershocks over a close neighbourhood.
This choice leads to a Hawkes process (Hawkes (1971) [3]), where the intensity

λ(t) := λ0(t) + Σ_{ti < t} ω(t − ti)

is composed of one (typically) data independent baseline λ0(.), modeling the rate of occurrence of the "major" events, and a data dependent memory kernel ω(.), modeling how much of an influence one shock has on the "minor" aftershocks to follow. Hawkes (1971) [3] chose the exponential memory of the form ω(u) := α exp(−βu), where β > 0 controls the rate of "forgetting the past". We stick to the exponential choice with α < β to ensure stability. Hawkes processes of this type have been used to analyze the arrival times of spam emails (Prince and Heard (2020) [4]), to model intensity bursts in financial data sets [5], and in various other contexts. Time independent choices of λ0(.) and ω(.) lead to a stationary Hawkes process. To create a non-stationary version for our analyses, we have corrupted both the baseline intensity and the memory kernel in our simulation studies (section 3 below) at pre-determined locations and measured the performance of our proposals against established competitors. Fig. 1 demonstrates the differences between the environments implied by deterministic and stochastic intensities. A prominent jump in the deterministic step intensity from 1 to 4 at t = 100 leads to an obviously heavier crowding of the arrival times, while changes in the stochastic intensity (in the baseline λ0(t) from 1 to 2 at t = 50 and in the memory kernel ω(t) from 0.8 to 2.8 at t = 75) lead to less discernible changes in the data, making for a more complex change detection exercise.
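To make the mechanism concrete, the following minimal sketch (our own illustration, not the paper's code; the function name `simulate_hawkes` and its arguments are assumptions) simulates a stationary Hawkes process with a constant baseline λ0 and the exponential kernel ω(u) = α exp(−βu) via Ogata's thinning algorithm:

```python
import numpy as np

def simulate_hawkes(T, lam0, alpha, beta, rng=None):
    """Event times on (0, T] for a Hawkes process with constant baseline lam0
    and exponential memory kernel alpha * exp(-beta * u), via Ogata's thinning."""
    rng = np.random.default_rng(rng)
    events, t, excite = [], 0.0, 0.0  # excite = sum of alpha * exp(-beta * (t - t_i))
    while True:
        lam_bar = lam0 + excite              # dominates lam(s) for s > t: excitation only decays
        w = rng.exponential(1.0 / lam_bar)   # candidate waiting time
        excite *= np.exp(-beta * w)          # decay the self-excitation to the candidate point
        t += w
        if t > T:
            break
        if rng.uniform() * lam_bar <= lam0 + excite:  # accept with prob lam(t) / lam_bar
            events.append(t)
            excite += alpha                  # each accepted shock adds a jump of size alpha
    return np.array(events)
```

With α < β the branching ratio α/β stays below 1, so the expected count over (0, T] is roughly T·λ0/(1 − α/β); a non-stationary version can be obtained by switching α, β, or λ0 at pre-determined times, as in the simulations of section 3.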

Known tools
For a thorough discussion on change detection in a longitudinal context, we direct interested readers to some of our previous work: Bhaduri (2018) [2] and Bhaduri (2020) [16]. With a given sample size n, some statistic D_{k,n} is constructed that measures the "difference" between the pre- and post-change blocks for an arbitrary choice of an initial change estimate at k. These D_{k,n}s, in turn, lead to

D_n := max_{k=2,3,...,n−1} D_{k,n},

and a change is signaled through D_n > h, where the threshold h is estimated from the null distribution of D_n, with the estimated change location at

k̂ := argmax_{k=2,3,...,n−1} D_{k,n}.

Different choices of D_{k,n} lead to different options, working well under different assumptions (changes, for instance, only in the mean or the variance structure, the trend, etc.). The tables below lay them out, while the references above provide details. While comparing these options with our proposals below in a continuous time point process setting, we take the inter-event times as the observations, which leads to a discrete time series.
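A hedged sketch of the D_n = max_k D_{k,n} scheme follows, with an illustrative scaled mean-difference choice of D_{k,n} and a permutation-based estimate of the threshold h; the function names are ours, not from the references above:

```python
import numpy as np

def scan_statistic(x):
    """D_n = max_k D_{k,n} over k = 2, ..., n-1, with an illustrative
    D_{k,n}: the scaled absolute mean difference between x[:k] and x[k:].
    Returns (D_n, argmax k)."""
    n = len(x)
    ks = np.arange(2, n)
    d = np.array([abs(x[:k].mean() - x[k:].mean()) * np.sqrt(k * (n - k) / n)
                  for k in ks])
    j = int(np.argmax(d))
    return float(d[j]), int(ks[j])

def detect_change(x, n_perm=500, level=0.05, rng=None):
    """Signal a change if D_n exceeds the (1 - level) quantile h of its
    permutation null distribution; return the estimated location k-hat or None."""
    rng = np.random.default_rng(rng)
    d_obs, k_hat = scan_statistic(x)
    null = [scan_statistic(rng.permutation(x))[0] for _ in range(n_perm)]
    return k_hat if d_obs > np.quantile(null, 1.0 - level) else None
```

Other D_{k,n} choices (variance- or trend-based, as in the tables below) slot into `scan_statistic` without altering the surrounding logic.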

Our proposal
Our approach (Bhaduri (2018) [2], Bhaduri (2020) [16]) towards detecting changes offers an algorithm that can operate in continuous time (i.e., the conversion to discrete-time Xs is not needed). Essentially, a test involving a block of n neighbouring event times needs to be conducted. If this test signals stationarity, and if a similar test involving a block of n + 1 neighbouring event times (the n-many from the previous stage and the immediate next) signals non-stationarity, we estimate a change-point between the n-th and the (n + 1)-th event times. More formally, it runs thus:
• Set up a series of hypotheses {H_1, H_2, ...} with corresponding p-values p_1, p_2, ....
• H_n tests stationarity on the first n + 1 events.
Once the earliest significant test (if any) is detected, the algorithm may be restarted with the detected change point as the fresh time origin to discover subsequent changes, if any. The technique is, therefore, free of the "at-most-one-change-point (AMOC)" assumption under which several parametric proposals work. Declaring significance through ordered p-values is done cautiously, since performing multiple correlated tests is known to inflate the type-I error probability. We follow the false discovery rate control suggested by Benjamini and Hochberg (1995) [17]. The actual testing is done through an array of novel statistics offered by Bhaduri (2018) [2]. Their definitions and crucial properties are summarized below. A deteriorating sequence inflates the value of one of the base statistics and deflates the value of the other. Randomized versions of the two through the maximum and the minimum signal general non-stationarities (i.e., both improvement and deterioration) through significantly large or small values. The critical thresholds are summarized in Bhaduri (2018) [2].
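A minimal sketch of the sequential scheme with Benjamini-Hochberg control might read as follows, where `pvalue_fn` stands in for any stationarity test of the user's choosing (the paper's permuted statistics are not reproduced here; all names are our own):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Indices of hypotheses rejected under Benjamini-Hochberg FDR control at level q."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / max(m, 1)
    if not below.any():
        return np.array([], dtype=int)
    cutoff = int(np.max(np.nonzero(below)[0]))  # largest k with p_(k) <= q*k/m
    return np.sort(order[:cutoff + 1])

def sequential_changepoints(times, pvalue_fn, q=0.05):
    """Sequential scheme of the text: H_n tests stationarity of the first n + 1
    events from the current origin; the earliest significant test places a change
    between the n-th and (n + 1)-th events, and the algorithm restarts there."""
    changes, start = [], 0
    while True:
        pvals = [pvalue_fn(times[start:start + n + 1])
                 for n in range(2, len(times) - start)]
        rejected = benjamini_hochberg(pvals, q)
        if len(rejected) == 0:
            return changes
        n_first = int(rejected[0]) + 2       # smallest significant block size n
        cp_idx = start + n_first - 1         # index of the n-th event in the block
        changes.append(float(times[cp_idx]))
        start = cp_idx + 1                   # restart with the change as the new origin
```

Because the loop restarts at each detected change, the sketch, like the method it mimics, is free of the AMOC assumption.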

Simulation studies
As a preliminary test to identify which one of the four statistics shown in Table 3 has the highest classification accuracy, we have conducted a power study, summarized in Fig. 2, where the immigrant intensity λ0(.) was taken to be time-inhomogeneous and the only change was brought in through the offspring kernel ω(.). The pre- and post-change β values are placed along the x and y axes, while the power of each test, the estimated probability of correctly identifying a non-stationary sequence as non-stationary, is plotted along the z axis. The power surface appeals to intuition. Along the diagonal, when the pre- and the post-change recollections are fairly identical, detection gets tougher, leading to a lower power, while along the edges and the corners, when the difference is stark, the power rises. We found that no one statistic shows the uniformly best power (although the minimum based L test occupies a larger region), and about the diagonal, there exists an asymmetry in the power surfaces.
Next, we investigated, through Fig. 3 through Fig. 5, the closeness of the change points estimated by our sequential proposals and their competitors to the true ones through a larger scale simulation study conducted with 10^4 runs. Each dot signifies a global time (plotted along the vertical axis) of change detection, and under each setting, we have conducted tests on both failure truncated (i.e., when we wait for a specified number of shocks, regardless of the time it takes to wait that long) and time truncated (i.e., when we wait for a specified time, regardless of the number of events seen by then) cases. Fig. 3 shows the average run length comparisons under the assumption of no change. The heaviest crowding is observed towards the end of the process. This, owing to how non-detections are expressed (please see the previous section), confirms that all our proposals and most of the rest pick up stationarity adequately. The next scenario, graphed in Fig. 4, shows the results under a true change in the memory kernel's β value from 3.8 to 0.8 at time 100 (shown through the broken horizontal line), with the baseline intensity held constant. Most of our sequential proposals, especially L and R, generate estimates that crowd around this true change point. Another observation is that quite a few of our competitors estimate a change point prior to the true change, clearly a false alarm. The sequential proposals are largely free of such a problem. Finally, in Fig. 5, we bring about changes in both the baseline intensity (from 1 to 2) at time point 50 (shown through the heavy broken horizontal line) and the memory kernel (from 3.8 to 0.8) at time point 75 or 100 (shown through the fainter broken horizontal line). Again, we find it is our sequential proposals that demonstrate the strongest and clearest clustering around these two true change points without sounding too many false alarms.
A natural question to ask at this stage is how these estimators will react to a large influx of data. Asymptotic consistency of these types of estimators, as pointed out by Truong et al. (2020) [18], can be examined through (i) P(|τ̂| = k) and (ii) ||τ̂ − τ*||_∞, where

||τ̂ − τ*||_∞ := max{ max_{τ̂_i ∈ τ̂} min_{τ*_j ∈ τ*} |τ̂_i − τ*_j|, max_{τ*_j ∈ τ*} min_{τ̂_i ∈ τ̂} |τ̂_i − τ*_j| },

with τ* representing the set of true change points, τ̂ representing the set of estimated change points, and k representing the size of τ* (i.e., the true number of changes). A change point algorithm is said to be asymptotically consistent if the first probability converges to 1 and the second norm converges to 0 as the terminal time of the process is pushed to ∞. The norm in (ii) is the Hausdorff norm, needed to quantify the "distance" between two sets not necessarily of the same size. It isn't hard to verify that this distance penalizes overestimation quite harshly. With our simulation cases, calculation of the long-run probabilities of type (i) is shown through Fig. 6. We observe (especially in the time-truncated scenario) that this asymptotic probability drops fast for most of our competitors (primarily due to their sounding too many false alarms), while ours either holds steady or increases. This suggests that with our sequential offerings, if one waits sufficiently long (either in terms of time or in terms of data), the right number of change points will be picked, i.e., one won't over- or underestimate with a high chance.
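The Hausdorff norm above is straightforward to compute; the sketch below (names ours) also illustrates why overestimation is penalized harshly: a single spurious estimate far from the true set inflates the distance on its own.

```python
def hausdorff(est, true):
    """||tau_hat - tau*||_inf: the larger of the two one-sided farthest-nearest
    distances between the estimated and true change point sets."""
    d_est = max(min(abs(e - t) for t in true) for e in est)    # worst estimate -> nearest truth
    d_true = max(min(abs(e - t) for e in est) for t in true)   # worst truth -> nearest estimate
    return max(d_est, d_true)
```

For instance, with true changes {50, 75}, the estimate {50, 75, 120} incurs a Hausdorff distance of 45 solely because of the spurious third point.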

NASDAQ-100 movements, crude oil prices, and tsunami occurrences
The NASDAQ-100 is an index made up of the top 100 companies listed on the NASDAQ exchange, excluding firms in the financial sector. The data for the NASDAQ-100 analysis has been collected from the National Association of Securities Dealers Automated Quotations (NASDAQ -https://www.nasdaq.com/). With 5 years of historical data, ranging from 6/3/2015 to 6/2/2020, we identified 315 days where opening prices jumped $40.39 or more, a value equivalent to the third quartile of price changes. The subsequent dates were then used to create a vector of global times.  The raw data for the crude oil analysis has been collected from the National Association of Securities Dealers Automated Quotations (NASDAQ -https://www.nasdaq.com/). This institution collects daily data on stocks including, but not limited to, their opening and closing prices. Using 5 years of historical data, ranging from 6/4/2015 to 6/3/2020, we identified 314 days where opening prices jumped $0.67 or more, a value equivalent to the third quartile of price changes. The subsequent dates were then used to create a vector of global times.
The data for the tsunami analysis has been collected from The National Centers for Environmental Information (NCEI - https://www.ngdc.noaa.gov/ngdc.html), a reliable source that maps real time data on natural phenomena such as earthquakes, tsunamis, and volcanoes. Using this source, we collected data on all tsunami instances between the years 1900 and 2019. While all the data was accurately collected by the NCEI, some of the tsunami instance timings were missing exact dates. To remedy this issue, we evenly "sprayed" the data while ensuring that each year maintained the exact number of tsunami instances it originally saw. Using the sprayed data, we created a vector filled with the corresponding global times for the 1,316 tsunami occurrences.
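The "spraying" step can be sketched as follows (a hypothetical helper of our own, assuming uniform placement within each period, as described above):

```python
import numpy as np

def spray(counts, period=1.0, rng=None):
    """Given per-period event counts (e.g. tsunamis per year), place each
    period's events uniformly at random within it and return sorted global times."""
    rng = np.random.default_rng(rng)
    times = [i * period + rng.uniform(0.0, period, size=int(c))
             for i, c in enumerate(counts)]    # period i occupies [i*period, (i+1)*period)
    return np.sort(np.concatenate(times))
```

Each period keeps exactly its observed count, so aggregate totals are preserved while a continuous-time arrival record becomes available for the change point analysis.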

Bootstrapping
Through the NASDAQ example, we now proceed to describe a way of finding interval estimates in addition to the point estimates shown through the vertical lines. Such an exercise will quantify the volatility in our point guesses and help us gauge the inherent uncertainty. We use bootstrapping to accomplish such an end. Purely random bootstrap estimates of the kind initially introduced by Efron (1979) [19], however, will not be applicable, since any underlying auto-correlation will be destroyed. Research into discrete-domain time series offers a remedy. Bootstrapping here is done in blocks of some ideal size, and such methods have been used by Ho and Bhaduri (2017) [9] and others to estimate standard errors of certain statistics. In the current context, however, the presence of a continuous-time point process necessitates a further modification. We follow Braun and Kulperger (1998) [20]'s block resampling algorithm to generate point processes similar to the original one. The method works best with roughly stationary point processes such as our NASDAQ or crude oil price examples. So, we choose the former to generate 100 resamples, each with block size b = 15, apply our sequential testing approach with Z, ZB, R, L, along with the competitors described in the previous section, on each, and generate a bootstrapped change-point distribution. For illustration purposes, we separately show the first three replications over four runs in Fig. 8. The black curve represents the observed process, and the others are simulated through the above algorithm. It is a reasonable expectation, confirmed, for instance, through the lower panel diagrams of Fig. 8, that when the replicates are extremely similar to the original, the resulting change point estimates should also be close to the ones from the actual process. Table 4 reports the 95% bootstrapped intervals from each method. We notice that the ones from our sequential proposals are consistently tighter than the rest.
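A rough sketch of one block-bootstrap replicate in the spirit of Braun and Kulperger (1998) [20] follows (our simplified rendering under an assumed block length b, not their exact algorithm; names are ours):

```python
import numpy as np

def block_bootstrap_pp(times, T, b, rng=None):
    """One block-bootstrap replicate of a point process observed on (0, T]:
    cut (0, T] into blocks of length b, draw blocks with replacement, and
    re-stitch their events with shifted times."""
    rng = np.random.default_rng(rng)
    n_blocks = int(np.ceil(T / b))
    origins = b * rng.integers(0, n_blocks, size=n_blocks)   # sampled block start times
    pieces = [times[(times >= s) & (times < s + b)] - s + j * b
              for j, s in enumerate(origins)]                # shift block into slot j
    return np.sort(np.concatenate(pieces))
```

Repeating this call, say, 100 times and re-running the change detection on each replicate yields the bootstrapped change-point distribution summarized in Table 4.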

Coronavirus infections in India
The initial COVID-19 data for India was obtained from Our World in Data, a reputable source for coronavirus data. The data contained values on the number of infections per day in the country. India observed its first case of COVID-19 on January 30th, 2020, and the virus has spread rapidly across the country since then. The COVID-19 data was obtained on July 10th, 2020, and by then the country had observed 793,802 infections. As the data contained values on the number of infections per day in the country, it had to be "sprayed" first to obtain the global times of each infection before the change point analysis could be conducted. This was done by using the uniform distribution to randomly distribute the number of infections on each day across the 24-hour period. The final dataset that was used for the analysis contains 793,802 entries, one entry or global time for each individual infection in the country (as of July 10th, 2020). A few observations are in order:
• The estimates of the true change points are visually acceptable. Very often, the optimal guesses are placed around points in time where the underlying process undergoes significant improvements or deteriorations, seen through prominent edges. This is especially vivid in the first panel figures, where the relatively sparse crowding of change estimates enables one to visualize the twists and turns. In particular, the sharp rise in cases around a global time of 800 is detected by all proposals.
• Our algorithm picks up the complexity of the data. As the number of cases rises (please notice the increase in the vertical scale over time), the likelihood of the underlying point process suffering changes through either exogenous (e.g., visitors arriving from other countries) or endogenous (e.g., community transmission) impacts rises as well. It is natural, therefore, to expect the number of change detections in such a case to correlate with the data volume. This is evidenced through the graphs in the lower panel analyzing data over the more recent months, where a heavy crowding of detected change points suggests the underlying process must have gone through several regime shifts.
• Too many detections often raise a valid concern over false alarms. The asymptotic consistency result described in the previous section, however, establishes that our method is considerably free of such pitfalls. Consequently, the change points we detect may, with some confidence, be claimed as valid estimates, not artifacts of random noise.
• Despite the crowding in the bottom panel, gaps are noticeable. Those periods are ones over which the sequence is expected to stay relatively stable. Probing deeper, we found these gaps make intuitive sense. They surround times at which different states, or the country as a whole, imposed preventive measures such as "stay-at-home" requests or bars on travelers arriving from other countries.

A note on forecasting
We observe in passing that under a simplifying assumption of ω(.) ≡ 0 and a two-step baseline of the form λ0(t) := Σ_{i=1}^{2} λ_i 1_{(τ_{i−1}, τ_i]}(t), change point-based forecasts can be had too. We exploit the connection between a homogeneous Poisson process and the exponential distribution: a simple point process is a homogeneous Poisson process with rate λ if and only if the inter-event times are exponentially distributed with mean 1/λ. Using an estimated change point τ̂, we can break up the data into a pre-change and a post-change piece and calculate the mean inter-event time on each. If the post-change step height is estimated to be λ̂2, then to wait for, say, k future events, one needs to wait k/λ̂2 time units on average. Using τ̂ = 1135.1752 from one of our sequential tests, we found λ̂2 = 3.5968. So to wait for around 4 more infections, one needs to wait roughly one time unit. In practice, we found this count over the next time unit to be close, at 3. Such an analysis may help health providers estimate the number of hospital beds or respirators to arrange over the immediate future. The validity of the two-step model can be confirmed through a goodness-of-fit test.
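The forecasting recipe reduces to a one-line calculation; the sketch below (function name and arguments are our own assumptions) estimates the post-change rate λ̂2 from the mean inter-event time after the estimated change point and returns the expected wait for k further events:

```python
import numpy as np

def forecast_wait(times, cp, k):
    """Estimate the post-change rate lam2 as the reciprocal of the mean
    inter-event time after the change point cp, and return (lam2, expected
    wait for k further events) under a homogeneous Poisson assumption."""
    post = times[times > cp]
    lam2_hat = 1.0 / np.mean(np.diff(post))  # events per time unit after cp
    return lam2_hat, k / lam2_hat
```

With the paper's estimates τ̂ = 1135.1752 and λ̂2 = 3.5968, this gives a wait of about k/3.6 time units for k further infections.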

Conclusions
Random tessellations are plentiful in the natural and social sciences. Spatio-temporal mathematical tools that narrate the evolution of earthquakes and hurricanes often, with certain variations, condense a friendship-network propagation over time quite effortlessly. Ordinary point processes of the stationary Poisson type frequently fall short in modeling such count patterns for several reasons, primary among which is their inherent "without aftereffect" feature, which guarantees that help from the relevant history would be ignored. This work studies more realistic Hawkes models where the occurrence probability of any shock is influenced by its (potentially infinite) history. The underlying intensity process, in a sense, is thus random in itself. Such a data dependent intensity is known to generate a branching process structure, where observations in the first generation come from some deterministic intensity describing the ongoing exogenous (i.e., external) environment (the arrivals, for instance, of COVID-19 patients from neighbouring countries), and those in the second (offspring) generation come from some random intensity depicting the current endogenous (i.e., internal) situation (the community spread, for example, of the virus within a given country). Detecting changes in these intensities remains crucial for incorporating new measures or anticipating a larger influx of patients. This work popularizes a way of estimating possible changes in both types of intensities through conducting a sequence of hypothesis tests using variations of trend permutation statistics. Originally proposed by the first author in the context of deterministic intensities, the technique, with minor modifications, is now demonstrated to work under more hostile stochastic processes, offering reliable, distribution-free change-estimates without sounding too many false alarms, and without restrictions of the "at most one change point" type.
Comparisons with time series-based change detection methods document smaller estimation errors (viewed through Hausdorff distances) and better inferential properties, including asymptotic consistency and tighter bootstrapped estimation intervals. Real examples sampled from NASDAQ-100 movements, tsunami occurrences, crude-oil prices, and COVID-19 infections in India demonstrate the methods' easy applicability and graphic appeal. Accurate model-based forecasting tools are briefly described, with data-dependent ones in the works.