Analysis of Relationship Between Personality and Favorite Places with Poisson Regression Analysis

The relationship between human personality and preferred locations has long been a conjecture in human mobility research. In this paper, we analyze the relationship between personality and visited places with Poisson regression, which can model the relationship between a count-valued dependent variable and independent variables. For this analysis, 33 volunteers provided their personality data, and location data in 49 categories are used. The raw location data are preprocessed: outliers are pruned and the counts are normalized into rates of visit. In the regression analysis, the personality data are the independent variables and the preprocessed location data are the dependent variables. Several meaningful results are found. For example, persons who frequently visit a university laboratory tend to have high conscientiousness and low openness. Other meaningful location categories are also presented in this paper.

ITM Web of Conferences 16, 02001 (2018), https://doi.org/10.1051/itmconf/20181602001, AMCSE 2017. © The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/).

I. Burbey [1] tried to predict the next location based on past movement patterns. S. Y. Kim and H. Y. Song [2] used a Back Propagation Network (BPN) to analyze mobility data. P. T. Costa and R. R. McCrae [3] categorized personality into five elements: openness, conscientiousness, extraversion, agreeableness, and neuroticism. Each personality element can be expressed as a number, so the corresponding values can be used directly in our research. The most closely related work is H. Y. Song and E. Y. Lee [4], which used stepwise regression to analyze the relationship between personality and visited places.
Stepwise regression is a repeated iteration of regression analysis, and is known to find the relationship between dependent and independent variables. However, stepwise regression is not suited to count data, while the frequency of visits to a location is clearly count data. Therefore, we use Poisson regression to analyze the relationship in this paper.

Table 1. Personality Data (BFF)

Person     O    C    E    A    N
person1    3.3  3.9  3.3  3.7  2.6
person2    2.7  3.2  3.3  2.7  2.8
person3    4.3  3.1  2.3  3.2  2.9
person4    3.6  3.3  2.8  3.2  2.8
person5    4.2  4.3  3.5  3.6  2.6
person6    4.0  3.7  4.0  3.9  2.8
person7    3.5  3.8  3.4  3.2  3.0
person8    2.2  3.4  3.0  3.1  2.6
person9    2.6  2.8  3.4  3.1  2.6
person10   3.3  2.9  3.1  3.1  3.3
person11   3.4  3.2  3.4  3.3  3.1
person12   3.1  3.7  3.4  3.2  3.5
person13   3.4  3.6  3.6  2.9  2.5

O: Openness, C: Conscientiousness, E: Extraversion, A: Agreeableness, N: Neuroticism


Introduction
Location Based Service (LBS) is one of the emerging topics with wide possibilities for future services. In particular, understanding human mobility patterns is one of the core parts of LBS.
In addition, it is widely recognized that human personality may affect a person's favorite locations. The relationship between human personality and preferred locations is valuable to understand. In this paper, we analyze the relationship between human personality and favorite locations by regression analysis. The personality data are the independent variables and the location data are treated as the dependent variable.
Linear regression is a general tool for analyzing the relationship between independent and dependent variables. However, an ordinary linear regression model such as stepwise regression is not suitable in our case.
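In our paper, the independent variables come from the Big Five Factors (BFF) of human personality, and the dependent variable is the count of visits to favored location categories. Since a linear regression model is not adequate for analyzing such count data, we use Poisson regression, which is based on the Poisson distribution with the probability mass function shown in equation (1).

$$P(y) = \lim_{n \to \infty} \binom{n}{y} \left(\frac{\lambda}{n}\right)^{y} \left(1 - \frac{\lambda}{n}\right)^{n - y} = \frac{\lambda^{y} e^{-\lambda}}{y!} \qquad (1)$$

Here y is the number of occurrences, n is the total number of events, and λ is the expectation of the random variable y. The Poisson distribution is the limit of the binomial distribution as n goes to infinity.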
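The limiting relationship between the binomial and Poisson distributions described above can be checked numerically. The following is a minimal sketch, not part of the original study: the function names and the values λ = 3 and y = 2 are chosen here purely for illustration.

```python
import math

def binomial_pmf(y: int, n: int, p: float) -> float:
    """Probability of exactly y successes in n independent trials."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)

def poisson_pmf(y: int, lam: float) -> float:
    """Probability of exactly y occurrences when the expected count is lam."""
    return lam**y * math.exp(-lam) / math.factorial(y)

lam, y = 3.0, 2  # illustrative values, not taken from the paper
for n in (10, 100, 10_000):
    # Keep the expected count n * p fixed at lam while n grows.
    print(n, binomial_pmf(y, n, lam / n), poisson_pmf(y, lam))
```

As n grows with the expected count held fixed, the binomial probability approaches the Poisson probability, which is why the Poisson distribution is a natural model for visit counts.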
The structure of this paper is as follows. Section 2 discusses related research. Section 3 describes the data used in this paper. Section 4 discusses the preprocessing of the data for Poisson regression. Section 5 presents and analyzes the results of the Poisson regression. Section 6 concludes this paper and discusses possible future research topics.

Previous Studies
Recently, positioning data can be collected by many commercial devices, including smartphones. Thus, mobility data can be collected easily and used for various research purposes. For example, users can voluntarily check in at their favorite places.
The question arises whether human personality and favorite locations are related. Since these data are personal and require privacy protection, few previous works can be found. I. Burbey [1] tried to predict the next location based on past movement patterns. S. Y. Kim and H. Y. Song [2] used a Back Propagation Network (BPN) to analyze mobility data.

Personality Data
For the personality data in our research, the Five Factor Model (FFM) is utilized. The Big Five Factors (BFF) of a person can be obtained by a survey using the Big Five Inventory (BFI), and 33 volunteers provided their BFF. BFF is composed of five personality factors: openness, conscientiousness, extraversion, agreeableness, and neuroticism (L. R. Goldberg [5]).
The advantage of BFF is that the personality factors are presented in numerical form, between 0 and 5, and these values can serve as the independent variables of a regression analysis. In Table 1, O stands for Openness, C for Conscientiousness, E for Extraversion, A for Agreeableness, and N for Neuroticism.
Table 1 shows part of the BFF obtained from the BFI survey of the 33 volunteers; the BFF of thirteen volunteers are presented. Personality can therefore be compared directly in a quantitative manner. For example, person1 has higher Openness and lower Neuroticism than person2.

Location Data
Location data can be collected when users check in at their favorite places. For this purpose, the volunteers used a smartphone app called SWARM [6]. Table 2 shows part of the location data collected. It contains place names, place categories, and the counts of visits to the categories for person2.
The topmost count shown in Table 2 is 69, at the university laboratory, by person2. The university laboratory is categorized as a university building. The count for a category is the sum of the counts for the places belonging to that category. For example, the student restaurant, the university library, and so on belong to the university building category, and the counts for the student restaurant, university library, and university laboratory are summed into the count of the university building category.
When the category count of a specific place is less than or equal to five, the place is placed in the etc. category. This avoids the effect of very small counts on the regression analysis. In sum, 49 categories are used in this paper.
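The aggregation described above can be sketched as follows. This is an illustrative reading rather than the authors' code: it assumes the threshold of five is applied to category totals, and all names and counts other than the 69 visits to the university laboratory are hypothetical.

```python
from collections import Counter

ETC_THRESHOLD = 5  # counts of five or fewer are folded into "etc." (from the text)

def aggregate_by_category(visits, place_to_category):
    """Sum per-place counts into categories, then merge small categories into 'etc.'."""
    totals = Counter()
    for place, count in visits.items():
        totals[place_to_category[place]] += count
    merged = Counter()
    for category, count in totals.items():
        key = category if count > ETC_THRESHOLD else "etc."
        merged[key] += count
    return dict(merged)

# Hypothetical check-in counts; only the value 69 appears in the paper.
visits = {
    "university laboratory": 69,
    "university library": 12,
    "student restaurant": 20,
    "bakery": 3,
    "bookstore": 2,
}
place_to_category = {
    "university laboratory": "university building",
    "university library": "university building",
    "student restaurant": "university building",
    "bakery": "bakery",
    "bookstore": "bookstore",
}
category_counts = aggregate_by_category(visits, place_to_category)
print(category_counts)
```

With these made-up numbers, the three university places are summed into one category and the two rarely visited places end up together in the etc. category.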

Data Preprocessing
The counts of the location categories are utilized as the dependent variable of the regression analysis. However, several preprocessing stages are required before the regression analysis. First, the counts need to be pruned. Raw counts may bias the regression analysis. Therefore, counts that are too high or too low must be excluded before the regression analysis. Out of the 36 data sets of volunteers, two are excluded because their counts are too high and one is excluded because its count is too low.
In addition, the counts need to be normalized. Therefore, we normalized the raw counts into rates by equation (2).
The normalized data can be used as the dependent variable.
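The two preprocessing stages can be sketched as below. This is only an assumed reading: the text does not give the pruning thresholds or the exact form of equation (2), so the sketch prunes on a volunteer's total count with hypothetical bounds and normalizes each category count by that volunteer's total. All names and numbers are invented for illustration.

```python
def prune_by_total(counts_by_person, low, high):
    """Keep only volunteers whose total visit count lies within [low, high]."""
    return {
        person: counts
        for person, counts in counts_by_person.items()
        if low <= sum(counts.values()) <= high
    }

def to_visit_rates(counts):
    """Convert one volunteer's category counts into rates that sum to 1 (assumed form)."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# Hypothetical raw counts, not data from the paper.
raw = {
    "personA": {"theater": 4, "cafe": 6},
    "personB": {"theater": 1, "cafe": 0},       # too few check-ins
    "personC": {"theater": 900, "cafe": 100},   # too many check-ins
}
kept = prune_by_total(raw, low=5, high=500)     # hypothetical bounds
rates = {person: to_visit_rates(c) for person, c in kept.items()}
print(rates)
```

Normalizing by each volunteer's total makes heavy and light app users comparable, which is one plausible motivation for converting counts into rates.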

Regression Analysis Result
Poisson regression is the major tool we used to analyze the relationship between location data and personality data.
The calibrated count of location categories in Table 4 is used as the dependent variable, and the personality data in Table 1 are used as the independent variables. The Poisson regression library incorporated in RStudio is used. Figure 1 shows a sample analysis result executed in the RStudio environment. The Poisson regression analysis is done for the location category 'theater' and the personality data.
The main result in Fig 1 shows the effect of the five independent variables, O, C, E, A, and N, on the location category theater. First, results with a p-value less than 0.05 should be examined. This is because, statistically, a regression result with a p-value below 0.05 is judged to be meaningful. In this example, visits to the theater and factor O (Openness) are related. The values in the Estimate column are the quantitative effects of the independent variables on the dependent variable.
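To make the Estimate and p-value columns concrete, the following is a minimal pure-Python sketch of a Poisson regression with a log link, fitted by Newton's method and tested with Wald z-statistics. It is not the R routine used in the study, the function names are introduced here, and the toy data are invented so that the exact solution is known (the counts double with each unit of x, so the slope should be log 2).

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def covariance(X, beta):
    """Return fitted means and the inverse Fisher information (X^T diag(mu) X)^-1."""
    k = len(beta)
    mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    info = [[sum(row[a] * row[b] * m for row, m in zip(X, mu)) for b in range(k)]
            for a in range(k)]
    return mu, invert(info)

def fit_poisson(X, y, max_iter=50, tol=1e-10):
    """Fit log(E[y]) = X @ beta; column 0 of X must be the intercept column of ones.

    Returns (estimates, standard errors, two-sided Wald p-values).
    """
    k = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)
    for _ in range(max_iter):
        mu, cov = covariance(X, beta)
        score = [sum(row[j] * (yi - m) for row, yi, m in zip(X, y, mu)) for j in range(k)]
        step = [sum(cov[a][b] * score[b] for b in range(k)) for a in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, cov = covariance(X, beta)
    std_err = [math.sqrt(cov[j][j]) for j in range(k)]
    p_values = [math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(beta, std_err)]
    return beta, std_err, p_values

# Invented toy data: counts double with each unit of x, so the slope is log(2).
x = [0, 1, 2, 3]
y = [1, 2, 4, 8]
X = [[1.0, xi] for xi in x]
beta, std_err, p_values = fit_poisson(X, y)
print("estimates:", beta)
print("p-values:", p_values)
```

In the study itself there would be five predictor columns (O, C, E, A, N) next to the intercept, and one such fit per location category.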

Fig 1. Poisson Regression Example
The estimate of O in Figure 1 is 2.46986, which leads to the conclusion that a person with high Openness tends to visit the theater frequently. Conversely, when the Estimate for the effect of one of the BFF on a location category is negative, it is judged that a low value of that factor leads to a high tendency to visit the location.
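One way to read the Estimate column, assuming the fit uses the canonical log link (the default for the Poisson family in R's glm; the text does not state which function or link was used), is as a multiplicative effect on the expected visit count λ. The coefficient symbols β are notation introduced here for illustration.

```latex
\log \lambda = \beta_0 + \beta_O O + \beta_C C + \beta_E E + \beta_A A + \beta_N N
\quad\Longrightarrow\quad
\frac{\lambda(O + 1)}{\lambda(O)} = e^{\beta_O},
\qquad e^{2.46986} \approx 11.8
```

Under this assumption, a one-point increase in Openness, with the other factors held fixed, multiplies the expected theater count by roughly 11.8, while a negative estimate gives a factor below one.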
Table 6 shows the summarized results of the whole regression analysis. Of the 49 location categories analyzed, 23 are found to be meaningful. Poisson regression analysis is done for each location category, and a result is regarded as meaningful when the p-value of the test is less than 0.05. The estimate value of each BFF is recorded in Table 6. The Results column in Table 6 shows the symbols of the effective personality factors. The sign '+' indicates a positive effect, and the sign '++' is used when the estimate value is greater than 2. Conversely, the sign '-' indicates a negative effect between personality and location category, and the sign '--' indicates a strong negative effect, used when the estimate value is less than -2. A blank cell in Table 6 stands for a result of the Poisson regression analysis that is not statistically meaningful.
An interesting tendency is shown in Table 6. The Poisson regression analysis indicates that University Laboratory is positively related to C (Conscientiousness), with an Estimate value of 2.80482, and it is thus marked as ++C for the location category university laboratory, meaning a strong positive effect of C on that category.
In contrast, O (Openness) has a negative effect on University Laboratory, with an estimate value of -2.111, and is marked as --O. As a result, a person with high conscientiousness and low openness has a strong tendency to visit the university laboratory frequently.
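The symbol coding used for Table 6 can be written as a small rule. This is a sketch with a hypothetical function name; the thresholds (0.05 and ±2) come from the text, while the p-values passed in the example are placeholders because the paper does not report them here.

```python
def effect_symbol(factor, estimate, p_value, alpha=0.05, strong=2.0):
    """Return the Table 6 style symbol for one factor, or "" if not significant."""
    if p_value >= alpha or estimate == 0:
        return ""  # blank cell: no statistically meaningful effect
    if estimate > 0:
        sign = "++" if estimate > strong else "+"
    else:
        sign = "--" if estimate < -strong else "-"
    return sign + factor

# Estimates from the text; the p-value 0.01 is a placeholder, not a reported value.
print(effect_symbol("C", 2.80482, 0.01))  # ++C
print(effect_symbol("O", -2.111, 0.01))   # --O
```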

Conclusion
In this study, we analyzed the relationship between the places a person frequently visits and that person's personality by use of Poisson regression. Personality can be represented by BFF. Since the frequency of visits is represented by the count of visits, Poisson regression is regarded as an adequate method. In addition, we categorized and normalized the location data.
Compared with the results of previous research by H. Y. Song and E. B. Lee [4], our results show similarity for several locations. For example, University Building shows -O, ++N in our research, which is very similar to the previous research, which shows --O for University Building. The difference may be due to various reasons, such as the data used and the regression method, while the trend of the relationship remains valid. As a result of this study, several location categories are found to have meaningful relations with BFF.
Our research can also be applied to many related areas in order to increase the value of each area, such as the enhancement of mobility models (D. Alberg, M. Last, and S. Elnekave [7]; D. Guo and W. Cui [8]).
We hope our research results can be applied to various location based services. For example, a higher prediction rate may be achieved when the user's personality is incorporated into a location prediction system.
Many analysis methods remain untried, such as zero-inflated negative binomial (ZINB) regression and Spearman's rank correlation coefficient, and applying other methods to our dataset may reveal latent relationships or show the relationships more clearly.
We will use Poisson regression to analyze the data set containing count data. The Poisson regression model is based on the Poisson distribution, a probability distribution of the count of occurrences in a given unit of time or area, whose probability mass function is shown as equation (1).

Table 1. Personality Data

Table 2. Location Data

Table 4. Raw Location Data

Table 5. Pruned and Normalized Location Data

Table 6. Regression Analysis Result