Modeling the dynamics of population immunity to influenza in Russian cities

Prediction models which will explicitly include the immunity levels of the population are required to plan effective measures for the containment of seasonal epidemics of influenza. The aim of the current work is to develop an approach to herd immunity dynamics modeling, with the long–term goal of employing it as a part of multicomponent model of influenza incidence dynamics. Based on serological studies performed for 52 Russian cities and 2 virus strains (A(H1N1)pdm09, A(H3N2)) in 11 years period, we propose statistical models which allow to analyze and predict the strain–specific immunity dynamics.


Introduction
Epidemic outbreaks of influenza, along with the corresponding increase in the incidence of acute respiratory infections (ARI), cause 3-5 million cases of severe illness and 250-500 thousand deaths annually, according to WHO estimates [1]. Effective planning of measures to contain influenza epidemics requires forecasts of influenza dynamics over the season. In this regard, the world's health authorities started to use prognostic models of ARI and influenza propagation, based on methods of statistical data processing and mathematical modeling, which complement traditional epidemiological surveillance systems. The factors which influence the intensity and timing of seasonal epidemics and thus might be considered in the models are temperature and humidity [2], [3], [4], contact patterns between the individuals [5], [6], biological and antigenic properties of the pathogen [7], [8], and, last but not least, levels of population immunity determined by vaccination coverage and the history of previous influenza infections in the population [9]. In vast majority of works devoted to the modeling of influenza epidemics, the dynamics of the population immunity levels from season to season is either added as an abstract parameter of the model, without its direct assessment from serological studies, -sometimes due to the lack of the corresponding data, -or it is not considered at all. Nevertheless, immunity plays a crucial role in case of repetitive seasonal influenza epidemics (as opposed to the pandemic outbreaks where the immunity could be considered negligible). In particular, due to the growing influence of collective immunity on the dynamics of morbidity, the prognostic model of the spread of influenza in the USSR, which was used in 1970s at the Research Institute of Influenza [10], showed a decline in the accuracy of predictions and due to that reason was put out of service [11]. The calibration of influenza dynamics models to contemporary Russian ARI data [12] confirmed that in case when ARI incidence is the only type of data used for the calibration procedure, the forecasting accuracy becomes low [13], [14]. Also, the population immunity level might not be accurately assessed by such model calibration when it is treated as a free parameter [15]. This issue badly affects the plausibility of predicted influenza dynamics. In these circumstances, it seems important to incorporate additional sources of data related to immunity levels into the predictive models.
Among the prognostic models that explicitly include the dynamics of collective immunity, we can name the following ones.
• A model with an age structure designed to assess the real number of cases in London during the 2009 influenza pandemic [16].
• A multicomponent model used to analyze the epidemics caused by the A(H1N1)pdm09 influenza strain in the UK in 2009-2011 [17].
• A stochastic model that reproduces the dynamics of influenza in certain regions of England and Wales for 14 years [18].
• A SIRS model taking into account the age structure and differences of individuals in the levels of acquired immunity, calibrated to data of the 2009 pandemic in Hong Kong [19].
Due to the fact that these models were calibrated to data sets gathered at specific locations (and usually within a limited time frame), the conclusions obtained in the mentioned studies cannot be fully generalized to influenza epidemics in other countries, in particular, in the Russian Federation. Also, none of these works demonstrated a comparison of the dynamics of the population immunity in different areas of the country under consideration.
The aim of the presented research is to develop an approach to modeling and forecasting of collective immunity dynamics in urban populations based on the data obtained from biological samples. Our long-term goal is to build multicomponent prognostic models of the spread of influenza epidemics taking into account the levels of the population immunity.

Data
The dataset we used to assess the immunity levels is based on serological studies performed for 52 Russian cities and 2 virus strains (A(H1N1)pdm09, A(H3N2)) in 11 years period (2009-2019). Each epidemic season (July to June) two serological samples were analyzed to measure a pre-epidemic (October-November) and a post-epidemic (April-May) immunity levels. The analysis was performed by the local healthcare facilities and verified by the specialists of Research Institute of Influenza, the resulting datasets were stored in .xlsx tables. In a number of results obtained from local healthcare units, incorrect data were identified and excluded. The immunity dynamics to both regarded virus strains in all of the cities is shown in Fig. 1.
Currently, most of the healthcare units do not report age-specific immunity, nor do they distinguish the vaccinated individuals from the non-vaccinated ones when collecting the samples. We assume that there is a need for separate accounting of data for vaccinated and nonvaccinated cohorts, since the antibody responses of these cohorts are determined by distinct reasons and have statistically significant differences. It is supposed to verify this hypothesis as the data on donors with known vaccination status are replenished (currently, the data is only available starting from the 2017-2018 epidemic season).
As it can be seen from the gaps in the graph trajectories, the reporting rate of different cities varies considerably. In some cases the amount of omitted data did not allow assessment of the immunity dynamics, so the corresponding cities were excluded from further analysis. There are 34 cities with the immunity data for both virus strains containing 10 points (i.e. 5 years of measurements) or more. Having added to our consideration less complete data for the cities with the number of data points from 6 to 9, for further analysis we have A(H1N1)pdm09 strain immunity data for 38 cities and A(H3N2) strain immunity data for 39 cities.

Analyzing immunity dynamics
Based on the selected data, we analyzed the dynamics of collective immunity over the years and highlighted significant factors affecting the levels of population immunity. It was revealed that, depending on the dominant subtype of influenza viruses in the epidemic, fluctuations in the level of collective immunity are observed in the pre-and post-epidemic periods. In particular, in the post-epidemic period, there is an increase in the number of seropositive donors to the subtype that is most intensively circulating during the epidemic. Also, the preliminary analysis results in a hypothesis that the year of the onset of circulation of the strain in question has a great influence on the change in population immunity levels. Indeed, there is no city to demonstrate immunity levels close to 100% to A(H1N1)pdm09 in the first years of observation (see Fig. 1, top). The immunity levels to A(H3N2) are much higher and almost do not have a tendency to grow (see Fig. 1, bottom).
Next was the stage of constructing a model of the immunity dynamics. A review of mathematical methods suitable for modeling the dynamics of immunity was conducted. Three approaches were considered as candidates: a mathematical model based on discrete difference equations, regression analysis, and a recurrent neural network. Guided by the results of the review, as well as the results of data analysis (in particular, by the estimated samples size and by available knowledge from the field domain), we chose an approach to short-term forecasting of levels of population immunity based on the regression analysis. A software module was created for predicting the dynamics of population immunity using the retrospective data. For this purpose, we have chosen Python programming language, because it is freely available and allows easy batch processing of data, unlike many GUI-oriented statistical software tools. The module was initially tested on average data on the levels of population immunity in the Russian Federation to strains of A (H1N1)pdm09 and A (H3N2) for 2009-2017 (two measurements each year) which were available from the open sources [9], [20] (see Fig. 2). The prediction accuracy assessment was carried out by predicting the points excluded from the test set (see Fig. 2, light-blue dots). The number of excluded points varied. The module was searching for the best regression function, selecting it from the several predefined functions (based on combinations of logarithmic functions, trigonometrical functions and polynomial terms). The accuracy comparison was made based on the two output characteristics: values of the coefficient of determination R 2 demonstrated for all the points, including the predicted ones, and the values of mean squared error (MSE) for the predicted points. The first parameter is aimed at characterizing the ability of the selected model to describe the whole dataset, whereas the second one demonstrates the predictive ability of the model. The best result corresponds to the maximal R 2 and minimal MSE. The obtained selection of functions to describe the dynamics of average all-Russian immunity levels to A(H1N1)pdm09 and A(H3N2) over time was verified against the data for particular Russian cities, described in the section 2. The examples of the results are shown in Fig. 3

Results and discussion
Based on the output, we conclude that the logarithmic function demonstrated better accuracy for predicting immunity to A (H1N1) pdm09 among the tested regression models, being the best by R 2 value for 17 cities and by MSE value for 13 cities out of 38. The reason is in the fast growth of immunity levels to this strain due to its seasonal circulation after the pandemic outbreak in 2009. The selection of the best model is more difficult in case of A (H3N2). The levels of immunity to A (H3N2) demonstrate fluctuations around the linear trend (close to the horizontal line in some cities). These fluctuations are better captured by the periodic model,best fit by R 2 for 17 cities out of 38, -but due to their irregular nature, the linear trend function has much more predictive power (the lowest MSE for 22 cities). The achieved results are consistent with the previously formulated hypothesis that the dynamics of immunity depends on the year the strain began to circulate: strain A (H1N1) pdm09 began to circulate in 2009, as a result of which the level of population immunity to it increases in time, while population immunity to strains A (H3N2) should be more stable due to the circulation of viruses of this subtype from the late 60s. It is known that epidemiologists observe that in the absence of virus circulation in a population, immunity to it begins to decline, and its drop to a critical level leads to an epidemic of influenza caused by this strain. For example, the work [21] provides an estimate of this level at 0.5, i.e., 50% of the population should not have immunity to the circulating strain. The indicated effect is not observed on the data set available to the authors. In the future, it will be interesting to consider a sample with laboratory tests of immunity for a longer period of time and to verify the mentioned observation quantitatively. The obtained statistical models will be used to perform data imputation to assess the immunity levels which were not reported by the healthcare units. Also the mentioned models will be incorporated into the models of influenza dynamics to enhance their descriptive and predictive force. For this case, it should be considered that the data on immunity levels we regarded in this study are related solely to biological immunity, whereas the factor of the indirect protection due to peculiarities of the contact patterns of the population cannot be measured in such a way. This fact might result in underestimating of the real immunity level when using the statistical data in the models. To take into the account the immunity connected with indirect protection, spatially explicit models could be used which consider the contact structure of the population [22].

Acknowledgments
The work was supported by Russian Foundation for Basic Research [grant number 18-37-00428].