Modelling a Behavioral Scoring System for Lending Loans using Twitter

Traditionally, banks follow a risk assessment model when sanctioning loans, in which risk is assessed by computing a credit score from certain financial factors. This work proposes a behavioral score that can be computed from social media data, which covers almost all aspects of a person's life. Integrating the credit score with the behavioral score of a person lowers the risk that comes with traditional assessment models. The behavioral score is measured by the profile score, financial attitude, and twit score. A general profile score is computed for the data fetched from Twitter. The twit score of a person is calculated by considering multiple parameters such as relevance, usage, and authenticity. Additionally, to strengthen the model, a novel multi-level voting ensemble with 84% accuracy is implemented to scrutinize the financial attitude of individuals. Pairwise comparison is used to reveal the importance of the various criteria analyzed, and the behavioral score is derived by aggregating the three scores accordingly. This research work proposes fusing social media details as an added risk evaluation feature in granting loans.


Introduction
There has been a major change in banking paradigms to cater to Digital India. The emergence of a cashless economy, coupled with today's social and digital technologies, has put even more pressure on banks to scale down their structures and focus on their most profitable core business segments. To keep pace with these changes, Indian banks are increasingly introducing technology innovations using mobile, analytics, and social media [11] to suit their customers' needs and serve them more efficiently. Banks can look beyond KYC forms to get to know a customer: they can tap into the digital identities associated with the customer to perform various analyses and forecasts [10]. One major use case of this is the loans sector.
Loans and lending have evolved into a major banking sub-sector in the 21st century. When an individual applies for a loan, lenders traditionally use proprietary mathematical algorithms to compute a credit score based on the applicant's financial circumstances and account history. The score is generally computed considering an individual's capacity, collateral, and capital. Despite this, there has been a steady increase in the rate of defaulters, i.e. unpaid loans. A growing number of such cases can even lead to the closure of banks in the long run. This shows that potential risks are not confined to previous financial transactions but extend to behavior as well. In a growing digital era, the best way to assess the behavioral score of a person would be to tap into his or her digital footprint on social media, which also reflects the characteristics of their behavior. That behavior tends to throw light on their approach towards loan repayment. Because of its popularity and the ease of acquiring data, Twitter [9] has been chosen as the source of social media data. Twitter users' behavior can be shaped by optimistic or pessimistic views of the future market [12], and the authors of [14] have developed a systematic risk model for predicting financial market prices from financial tweets.
Twitter is a micro-blogging and social networking service on which users post and interact with messages known as "tweets". Twitter users can post, like, and retweet tweets, and videos and pictures can also be shared in tweets. A person's online presence on a micro-blogging platform like Twitter can serve as a litmus test in evaluating their characteristics and personality, and their tweets on stock markets [13] or the latest financial initiatives can elucidate their financial attitude.
This paper proposes a behavior scoring model that classifies customers based on their presence on Twitter. It aims to generate behavior scores that can be used in portfolio sanctioning decisions [15] in conjunction with an existing credit score. The profile score and financial attitude are evaluated using a multi-level voting classifier. A multi-linear approach is defined over these parameters, and the Bradley-Terry model is subsequently implemented to aggregate the scores.

Behavioural Score
Rao et al. [1] have developed a social score known as the Klout score, which assigns an influence rating to users based on their profiles across various social networking platforms. They use supervised models to determine feature weights, and the final Klout score is obtained by hierarchically combining information from multiple communities and networks. Another approach [2] uses deep learning along with network science to model social influence and predict human responses across various events. Its authors discuss the correlation between social influence and various activities by analyzing and inferring real-life human behavior through online social networks. This indicates that a person's social presence reflects their behavior. Jie Tang et al. [5] describe different methods and algorithms for calculating social influence measures based on statistical media. Similarly, Yanhao Wei et al. [7] model social behavior with an exogenous network formed through ties and analyze the effect of judiciously constructed networks on attaining higher scores. They investigate how the accuracy of social-network-based scores changes when consumers can strategically construct their social networks to attain higher scores. They also show how strategic networks can be used to improve customer scores by analyzing the strength of ties formed through friendship. In the context of companies, the authors of [8] use sentiment analysis to compute the social media score of companies and perform classification using a Support Vector Machine. The authors of [16] propose a graphical Gaussian model that calculates systemic risk from financial tweets. TweetCred, a semi-supervised ranking model, assigns a credibility score to each tweet [17].

Credit Score
Yuejin Zhanga et al. [3] address the issue of loan default using the FICO score and identify the significant factors that contribute to it. They also incorporate certain social media aspects when computing the criteria, and they use a decision tree to implement credit scoring for an online peer-to-peer lending channel. This highlights the importance of assigning proper weightage to the attributes involved. Similarly, after a comparative analysis of credit classification models, the authors of [4] present a case study on the 2006 Taiwan credit flow crisis. They consider the models with and without imputation and conclude that an artificial neural network (ANN) is a better alternative to a decision tree and logistic regression when data availability in the dataset is high.

Gap Analysis
These papers have helped us understand the existing features and develop a new assessment model. The major conclusion that can be drawn is that the relevance of people on social media, i.e. their presence among their circle of friends [2][7], plays a vital role in determining their rating. It can also be established that the behavioral score should be developed considering both profile factors and an analysis of the person's tweets [8]. Concerning techniques, Artificial Neural Networks [6], Logistic Regression, Decision Trees, and Support Vector Machines [8] are the most commonly applied. SVM yields good results because the task is two-class (binary) classification. This suggests that these routines can be combined into a multi-level voting ensemble classifier that yields better results. Regarding the credit score, the factors that lead to a loan default can be identified [3] and weighted accordingly. To combine it with the behavioral score, Gül, Sait, et al. [8] rely mainly on sentiment analysis using SVM to define the credit rating of companies. They then define aggregation operators with credibility distributions to find the final rating of companies. That approach cannot be replicated as such for individuals. While a company can be assessed primarily on its financial standing and tweets, an individual's profile needs further analysis to reflect his or her attitude and habits. The financial score has therefore been developed specifically to segregate the user's financial tweets, which gives a perspective on his or her financial views. Unlike [8], sentiment analysis is performed with a novel multi-level voting classifier that scrutinizes the tweets and boosts accuracy. Further, the Twit Score has been developed to measure the quality of the Twitter user.

System Design
The system workflow, illustrated in Fig. 1, computes a novel score defined as the aggregate of the behavioral score and the credit score. The behavioral score is derived by analyzing the Twitter profile and tweets of a person, while the credit score is based on the traditional loan application.

Data Collection
The details of the profile and the tweets posted by the users are accessed using Tweepy, an open-source Python library, hosted on GitHub, that lets Python programs communicate with the Twitter API. The users whose data is collected need to have posted relevant financial tweets. To achieve this, tweets containing the popular hashtags and handles used by various banks are collected, and the ids of their authors are read from the response JSON objects. These user ids are passed as input to the next stage and aggregated to form the user dataset, which is then filtered on the number of financial tweets each user has posted. Using Tweepy, the details and tweets of the remaining users are retrieved. The Twit, Financial Attitude, and Profile scores are computed as stated in Table 1.
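The collection and filtering steps above can be sketched as two pure functions. This is a minimal illustration, not the authors' code. It assumes the tweets have already been fetched: with Tweepy 4.x this could be done through `API.search_tweets` for the bank hashtags and `API.user_timeline` for each candidate user. It also assumes the tweets are available as dictionaries shaped like Twitter API v1.1 JSON. The function names, the bank tags, the financial terms, and the `min_count` threshold are hypothetical placeholders.

```python
def _norm(word):
    """Lower-case a token and strip surrounding punctuation."""
    return word.strip(".,!?:;\"'()").lower()


def candidate_user_ids(tweets, bank_tags):
    """Return ids of users whose tweets mention any bank hashtag or handle.

    `tweets` is a list of dicts shaped like Twitter API v1.1 JSON; only the
    "text" field and the nested "user" -> "id" field are used here.
    """
    tags = {t.lower() for t in bank_tags}
    ids = set()
    for tweet in tweets:
        words = {_norm(w) for w in tweet["text"].split()}
        if words & tags:
            ids.add(tweet["user"]["id"])
    return ids


def filter_by_financial_tweets(timelines, financial_terms, min_count):
    """Keep users whose timeline holds at least `min_count` financial tweets.

    `timelines` maps a user id to that user's tweet texts. Returns a dict of
    user id -> number of financial tweets for the users that pass the filter.
    """
    terms = {t.lower() for t in financial_terms}
    kept = {}
    for user_id, texts in timelines.items():
        n = sum(1 for text in texts if terms & {_norm(w) for w in text.split()})
        if n >= min_count:
            kept[user_id] = n
    return kept
```

Keeping the filtering separate from the API calls lets the selection logic be tested without network access or credentials.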

Score Definition
Twit Score
The user profile of the person forms the basis of the Twit score. It is an attempt to rate the quality of a Twitter user using various metrics available through the API. A low Twit score is more likely to indicate a spam account or a less safe user.

Profile Score
The tweets of the user form the basis of this score. They are pre-processed by tokenization, lemmatization, stop-word removal, and similar steps. Sentiment analysis is then performed, and the percentage of positive tweets is taken as the profile score.
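The profile score definition can be sketched as follows. The sentiment model is abstracted as any callable that returns `"positive"` or `"negative"`, and `profile_score` is a hypothetical name. Returning `None` for a user with no tweets is an assumption made so that the undefined case stays visible.

```python
def profile_score(tweets, classify):
    """Profile score: percentage (0-100) of a user's tweets labelled positive.

    `classify` is any callable mapping a pre-processed tweet to the string
    "positive" or "negative" (a stand-in for the sentiment model).
    Returns None when the user has no tweets, since the score is undefined.
    """
    if not tweets:
        return None
    positive = sum(1 for tweet in tweets if classify(tweet) == "positive")
    return 100.0 * positive / len(tweets)
```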

Financial Score
The financial tweets of the user form the basis of this score. Financial tweets are identified by checking them against a corpus of financial terms, and a threshold is set on the minimum number of financial tweets that must be present for the analysis. To classify the tweets as positive or negative, a multi-level voting classifier is implemented, and the number of positive tweets is taken as the financial attitude.
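A minimal sketch of the financial attitude computation is shown below. The paper's corpus of financial terms is not given, so `FINANCIAL_TERMS` is a hypothetical stand-in. The voting classifier is again abstracted as a `classify` callable. Returning `None` below the threshold is an assumption for users who cannot be analyzed.

```python
import re

# Hypothetical stand-in for the paper's corpus of financial terms.
FINANCIAL_TERMS = {"loan", "emi", "stock", "sensex", "bank", "interest", "fund"}


def financial_tweets(tweets, corpus=FINANCIAL_TERMS):
    """Return the tweets that contain at least one term from `corpus`."""
    return [t for t in tweets if corpus & set(re.findall(r"[a-z]+", t.lower()))]


def financial_attitude(tweets, classify, min_financial, corpus=FINANCIAL_TERMS):
    """Number of positive financial tweets, or None below the threshold."""
    fin = financial_tweets(tweets, corpus)
    if len(fin) < min_financial:
        return None  # too few financial tweets to analyse
    return sum(1 for t in fin if classify(t) == "positive")
```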

Bradley-Terry Model
Pairwise comparison is used to rank the three criteria and reveal their relative importance. Each of the three scores has a priority weight attached to it, and the scores are combined according to these weights so that the result reflects their relative importance. The Bradley-Terry model, a probabilistic model, has been chosen to derive the priority weights: the number of times a factor is preferred to the others becomes the weight assigned to it.
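The counting rule stated above ("the number of times a factor is preferred to the others becomes the weight") can be sketched as follows. The pairwise judgements are hypothetical inputs, and normalizing the counts to sum to 1 is an optional presentation step, not something the text specifies.

```python
from collections import Counter


def preference_counts(judgements, criteria):
    """Count how often each criterion is preferred in pairwise judgements.

    `judgements` is a list of (preferred, other) pairs. Criteria that never
    win still appear with a count of 0.
    """
    wins = Counter({c: 0 for c in criteria})
    for preferred, _other in judgements:
        wins[preferred] += 1
    return dict(wins)


def normalise(counts):
    """Scale the win counts so that the weights sum to 1."""
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}
```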

Credit Score
The credit score is based on several factors, such as payment history, credit exposure, and age of credit. These parameters in turn depend on multiple factors, and weights are assigned to all of them based on the traditional loan application.

Methodology
This section describes the implementation of the scoring system: the methods and modules used to compute the Twit, Profile, and Financial Attitude scores and to execute the Bradley-Terry model. The modules were implemented in Python along with the necessary external libraries.

Modelling the Twit Score
The Twit Score is an attempt to rate the quality of Twitter users using metrics such as the friend/follower (FF) ratio, profile completeness, and other factors, all of which are available through the Twitter API. It gives equal weightage to four elements: FF ratio, relevance, usage, and authenticity. An FF ratio of around 4 suggests that the user is likely to be followed back by the people he or she follows. This indicates that the user is probably relevant within his or her circle of friends and has acquired this ratio because people like what he or she talks about; the relevance score further supports this factor. An FF ratio greater than 10 indicates that the person does not have enough followers compared with the number of people he or she follows. This reflects a less socially relevant user and is complemented by the usage score. The Twit score computation factors are described further in Table 2. The Twit score is first computed out of 20 and then rescaled to a scale of 100.
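The aggregation described above can be sketched as follows: four equally weighted components, each worth up to 5 points, giving a total out of 20 that is rescaled to 100. The text does not give the mapping from raw API metrics to the 0-5 component scores, so `twit_score` takes those sub-scores as inputs. All function and key names are hypothetical. Only the listed ratio (Listed Count / Followers Count) is computed directly, because Table 2 defines it explicitly.

```python
# Equal weightage: four components, each scored out of 5 (total 20).
COMPONENTS = ("ff_ratio", "relevance", "usage", "authenticity")
MAX_PER_COMPONENT = 5.0


def listed_ratio(listed_count, followers_count):
    """Listed Count / Followers Count, one input to the relevance score."""
    return listed_count / followers_count if followers_count else 0.0


def twit_score(subscores):
    """Sum the four component scores (out of 20) and rescale to 0-100."""
    total = 0.0
    for name in COMPONENTS:
        value = subscores[name]
        if not 0.0 <= value <= MAX_PER_COMPONENT:
            raise ValueError(f"{name} must lie in [0, {MAX_PER_COMPONENT}]")
        total += value
    return 100.0 * total / (MAX_PER_COMPONENT * len(COMPONENTS))
```

For example, sub-scores of 4, 3.5, 2.5, and 4.5 sum to 14.5 out of 20, which corresponds to 72.5 out of 100.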

Modelling the Profile and Financial Score
First, standard NLP pre-processing techniques are applied to the raw tweets to extract contextual features. The result is fed to the multi-level voting ensemble to train the model. The financial tweets are segregated from the rest based on the predefined list of handles and hashtags.

Feature Extraction
Data Pre-processing. Twitter data may be incomplete, inconsistent, and noisy, which can produce misleading mining results. Data pre-processing is a standard way of resolving such issues: it transforms raw data into a cleaner, more understandable format and filters out useless data.
The steps involved are data cleaning, integration, feature reduction, and transformation. The pre-processing rules applied are listed in Table 3.

Table 3. Pre-processing rules
Stopword removal: Remove stopwords such as "a" and "an", using the list provided by the standard NLTK package.
Smiley removal: Remove emoticons from tweets.
Replace user mentions: Replace '@' mentions in a tweet.
Punctuation removal: Remove punctuation marks and non-English characters.
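The rules in Table 3 can be sketched with regular expressions. This is an illustration under stated assumptions, not the authors' pipeline:

- `STOPWORDS` is a small hard-coded stand-in for NLTK's English list (`nltk.corpus.stopwords.words("english")`), so the sketch runs without downloading NLTK data.
- The placeholder token that replaces mentions (`"USER"`) is a hypothetical choice.
- The emoticon pattern only covers common ASCII smileys.
- Tokenization and lemmatization beyond whitespace splitting are omitted.

```python
import re

# Small stand-in for NLTK's English stopword list.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in", "my", "for"}

MENTION = re.compile(r"@\w+")                      # @user mentions
EMOTICON = re.compile(r"[:;=][\-o']?[)(DPp/\\|]")  # ASCII smileys such as :) ;-( :D
NON_ALPHA = re.compile(r"[^A-Za-z\s]")             # punctuation, digits, emoji, non-English text


def preprocess(tweet, mention_token="USER"):
    """Apply the Table 3 rules and return the remaining lower-case tokens."""
    text = MENTION.sub(mention_token, tweet)  # replace user mentions
    text = EMOTICON.sub(" ", text)            # smiley removal (before letters like D/P survive)
    text = NON_ALPHA.sub(" ", text)           # punctuation / non-English removal
    tokens = [w.lower() for w in text.split()]
    return [w for w in tokens if w not in STOPWORDS]
```

Emoticons are removed before the punctuation pass because smileys such as `:D` would otherwise leave a stray letter behind.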

Table 2. Twit score computation factors (the four components I to IV carry equal weight: a maximum of 5 points each, i.e. 25% of the Twit score)
I. FF Ratio (Friend/Follower Ratio): A person's follower count compared with the number of people they follow is a good measure of an interesting or integral Twitter user.
II. Relevance Score: Verifies how influential the user is among his or her followers, i.e. the relevance of the user in his or her circle. This indicates the impact factor of the user.
II. a. Listed Ratio: Listed Count / Followers Count, i.e. the ratio of followers who have listed the user to the user's total followers. This indicates how many followers find the user impactful.
II. b. Re-tweet: The average number of re-tweets shows the impact of the user.
III. Usage Score: Spending too much time on social media is not a great quality, so highly active users receive a lower usage score and less active users a higher score. Having a profile picture is also seen as a positive factor, i.e. a sign of confidence.
IV. Authenticity Score: Tracks the legitimacy of the user and how sound the profile is for consideration.
IV. a. Duration: Account duration is the best way to verify the legitimacy of a person's Twitter profile.
Followers Count: The number of people a user follows can play quite a large part in calculating Twitter handle interest scores; a person who follows a large number of people may not be an ideal candidate.

Hard voting ensures that the majority class label predicted at each level is allocated to the tweet. Thus, the classifier is optimized to achieve the highest accuracy of prediction.

Pairwise Comparison
The Bradley-Terry model [18] is a probability model that can predict the outcome of a paired comparison. Given a pair of individuals $i$ and $j$ drawn from some population, it estimates the probability that the pairwise comparison $i > j$ turns out true as
$$P(i > j) = \frac{p_i}{p_i + p_j},$$
where $p_i > 0$ is a positive real-valued score assigned to individual $i$. If the comparisons are assumed to be mutually independent, as in our model, then, writing $p_i = e^{\beta_i}$, the probability satisfies the logit model
$$\operatorname{logit} P(i > j) = \log \frac{P(i > j)}{1 - P(i > j)} = \beta_i - \beta_j.$$
The three scores (Twit, Profile, and Financial Attitude) are compared in this way and the probabilities allocated. The outcomes of these pairwise competitions are then used to derive the priority weights.
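The text does not say how the strengths $p_i$ are estimated from the comparison outcomes. One standard choice, shown here as an assumption, is the minorization-maximization (Zermelo) iteration for the Bradley-Terry maximum-likelihood estimate. Each round sets $p_i \leftarrow W_i / \sum_{j \ne i} n_{ij}/(p_i + p_j)$, where $W_i$ is the number of wins of $i$ and $n_{ij}$ is the number of comparisons between $i$ and $j$. The function names and the `wins` dictionary layout are hypothetical.

```python
def fit_bradley_terry(wins, items, iterations=1000):
    """Estimate Bradley-Terry strengths p_i with the MM (Zermelo) iteration.

    wins[(i, j)] is the number of comparisons in which i was preferred to j.
    Convergence assumes every item wins and loses at least once against
    the others (the comparison graph is strongly connected).
    The returned strengths are normalised to sum to 1.
    """
    p = {i: 1.0 / len(items) for i in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            w_i = sum(wins.get((i, j), 0) for j in items if j != i)
            denom = sum(
                (wins.get((i, j), 0) + wins.get((j, i), 0)) / (p[i] + p[j])
                for j in items
                if j != i
            )
            updated[i] = w_i / denom
        total = sum(updated.values())
        p = {i: v / total for i, v in updated.items()}
    return p


def prob_preferred(p, i, j):
    """P(i > j) = p_i / (p_i + p_j) under the fitted model."""
    return p[i] / (p[i] + p[j])
```

At convergence, each item's observed number of wins equals its expected number of wins under the model, which is the maximum-likelihood condition.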

Computing the Behavioral Score
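The behavioral score of the user is computed by aggregating the Twit, Profile, and Financial Attitude scores using the priority weights derived from the pairwise comparison.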
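The exact aggregation formula did not survive extraction, so the following is only a sketch of a weighted linear combination under stated assumptions:

- Each of the three scores is already on a 0-100 scale.
- The weights are positive priority weights (for example, Bradley-Terry strengths).
- The result is mapped to a 0-300 range. This range is inferred from the 400-point total used later minus the 100-point credit score.
- A missing (null) Profile or Financial Attitude score yields 0, as the results section states.

The function name is hypothetical.

```python
def behavioral_score(twit, profile, financial, weights):
    """Weighted aggregate of the three scores on a 0-300 scale.

    Assumes each score is already on a 0-100 scale and that `weights`
    holds positive priority weights keyed by "twit", "profile" and
    "financial". Users with a missing (None) profile or financial score
    get 0, mirroring the treatment of null values.
    """
    if profile is None or financial is None:
        return 0.0
    scores = {"twit": twit, "profile": profile, "financial": financial}
    total_weight = sum(weights[k] for k in scores)
    # Normalised weights sum to 1; multiplying by 3 maps 0-100 inputs to 0-300.
    return len(scores) * sum(weights[k] / total_weight * s for k, s in scores.items())
```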

Experimental Results
This section presents the experimental results recorded during this work. The classification algorithms discussed were applied to two datasets (training and testing), and their results are reported.

Data Collection
The tweets and user details required for the analysis are not readily available, so the tweet dataset was constructed by collecting tweets from the Twitter API (using Tweepy), as described above. The dataset was collected in February and March 2020. All the pre-processing rules described in Table 3 were applied, and each cleaned tweet is displayed against the original tweet in Table 5.

Evaluation Metrics of the Multi-level Voting Classifier
Evaluation metrics such as true positives, false positives, true negatives, and false negatives were computed for each classifier. VC2 (Tf-Idf) performed better than VC1 (CV) because the features are unigrams: Tf-Idf is generally found to represent unigram features better, whereas adding extra features to the CV may lead to over-fitting.

Fig. 3. Classes predicted by the classifier for the tweets
The multi-layer perceptron gives the best result when combined with the best-performing classifier of Model I. The final voting classifier achieves an accuracy of 84%. Table 6 illustrates the tweets and the classes predicted for them by the ensemble.
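The metrics mentioned above can be computed as in the following sketch. It uses the standard definitions: FP rate = FP / (FP + TN) and accuracy = (TP + TN) / total. The label strings and function names are assumptions, and this is not the authors' evaluation code.

```python
def confusion_counts(y_true, y_pred, positive="positive"):
    """Return (TP, FP, TN, FN) for a binary sentiment task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def fp_rate(fp, tn):
    """False positive rate FP / (FP + TN), usable to rank the voters."""
    return fp / (fp + tn) if fp + tn else 0.0


def accuracy(tp, fp, tn, fn):
    """Fraction of tweets whose predicted class matches the true class."""
    return (tp + tn) / (tp + fp + tn + fn)
```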

Behavioral Score
Following the method described in the previous sections, the behavioral score of each user is computed as illustrated in Table 7. Pairwise comparison is implemented on the basis of the Bradley-Terry model, and the weights are derived as the average of the resulting probabilistic measures. The threshold value of the behavioral score is also derived from the pairwise comparison. The algorithm computed this score for 51 relevant users. As mentioned previously, users with a null value reflect a negative Profile and Financial Attitude, so their score is 0. Users whose scores exceed the fixed threshold of 170 have a good behavioral score. Further analysis of the scores is presented as case studies. The behavioral score computation for Twitter users is shown in Table 8.

Evaluation of Credit Score
In simple terms, the credit score expresses how likely a person is to pay back a debt. Typical credit scores, summarized over several factors, range from 300 to 900 and are generally calculated by credit bureaus. This score not only affects a person's chances of getting a loan sanctioned but also plays a crucial role in deciding the interest rate. The weights assigned to the credit score factors are fixed based on the age group, and the factors themselves are based on a traditional bank application. The weights are allocated on a scale of 100. On the scale of 900, the threshold is fixed at 500, i.e. applicants scoring above it were traditionally granted a loan; correspondingly, the threshold on the scale of 100 is set at 55. The weights assigned are listed in Table 9.
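A weighted credit score on a 0-100 scale, and the mapping of the 500-out-of-900 threshold onto that scale, can be sketched as follows. The weights in `EXAMPLE_WEIGHTS` are illustrative placeholders, not the values of Table 9. The factor names follow the credit score factors listed earlier, and each factor is assumed to be rated in [0, 1]. The paper does not state its exact rescaling rule. One reading consistent with its numbers is proportional rescaling: 500 × 100 / 900 ≈ 55.6, truncated to 55.

```python
# Illustrative weights on a scale of 100 (NOT the values of Table 9).
EXAMPLE_WEIGHTS = {"payment_history": 40, "credit_exposure": 35, "credit_age": 25}


def credit_score(factors, weights=EXAMPLE_WEIGHTS):
    """Weighted credit score on a 0-100 scale; each factor rated in [0, 1]."""
    return sum(weights[name] * factors[name] for name in weights)


def rescale(value, old_max, new_max=100):
    """Proportionally map a value from a 0-old_max scale to 0-new_max."""
    return value * new_max / old_max
```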

Case Studies
The threshold value for the behavioural score is fixed at 170, i.e. only users whose behavioural score exceeds this limit are considered. Similarly, for the credit score on a scale of 100, the threshold is 55. Because the details or attributes required to compute the credit score were not available, the credit score values presented here are based on assumptions. The total score is computed on a scale of 400, and applicants with values above 226 (the thresholds combined) can be sanctioned a loan. The case studies are presented in Table 10.
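The sanction rule can be sketched as follows, using the thresholds stated in the text (170 behavioral, 55 credit, 226 total on a 400-point scale). The text does not fix the comparison directions. Strict `>` for the behavioral and total thresholds and `>=` for the credit threshold are assumptions, as is the function name.

```python
# Thresholds stated in the text; the 0-300 behavioral range is inferred
# from the 400-point total minus the 100-point credit score.
BEHAVIORAL_THRESHOLD = 170
CREDIT_THRESHOLD = 55
TOTAL_THRESHOLD = 226


def loan_decision(behavioral, credit):
    """Return (total score out of 400, whether the loan can be sanctioned).

    Strict ">" for the behavioral and total thresholds and ">=" for the
    credit threshold are assumptions; the text does not pin them down.
    """
    total = behavioral + credit
    sanctioned = (
        behavioral > BEHAVIORAL_THRESHOLD
        and credit >= CREDIT_THRESHOLD
        and total > TOTAL_THRESHOLD
    )
    return total, sanctioned
```

Checking the component thresholds as well as the total prevents a very high credit score from masking a weak behavioral score, and vice versa.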

Conclusion
Introducing a behavioural score in the banking sector is an advancement over the traditional credit score computation methods and factors. This paper proposes a scoring system that considers both the financial aspects and the social attitude of a person. The first major contribution of this paper is the set of three measures used to calculate the behavioural score. The profile score and financial attitude throw light on the personality of the user and are calculated by performing sentiment analysis. While traditional sentiment analysis models rely on individual classification algorithms, this system applies a novel multi-level voting classifier. This improves the accuracy of the results and overcomes the weaknesses of individual prediction models. The model composed of logistic regression and random forest in Level I, together with a multi-layer perceptron in Level II, reports the best performance, with an overall accuracy of 84%.
Apart from these two scores, a third, novel score, the Twit score, is introduced. The Twit score performs a thorough analysis of the profile and scores factors such as authenticity and influence. The second major contribution of this paper is the implementation of pairwise comparison (the Bradley-Terry model), which ensures proper aggregation of the three scores into the behavioural score. A key advantage of this approach is that the weights precisely reflect the relative magnitude of the three scores. The behavioural score is summed with the credit score, and the total is evaluated against the threshold for loan sanction. This paper assumes the value of the credit score because data sources for it were not available. The third and main contribution of this paper is that this novel multi-linear approach could be incorporated by banks to sanction loans in real time, which would also lower the risk that comes with traditional assessment models.

Fig. 2. System Architecture
Count Vector (CV). The count vector measures how frequently each word occurs in a document: it converts a collection of text documents into a matrix of token counts. Consider a corpus C of D documents {d1, d2, ..., dD} (rows) and N unique tokens (columns) extracted from C. The count vector matrix M then has size D × N. For example, count vector tokenization is applied to the two documents D1: "He is a lazy boy. She is also lazy." and D2: "Neeraj is lazy."
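The count vector construction above can be sketched in plain Python and applied to the two example documents. Lower-casing and splitting on letters only are simplifying assumptions. scikit-learn's `CountVectorizer` behaves similarly, but its default `token_pattern` drops single-character tokens such as "a", so its vocabulary would differ slightly.

```python
import re


def count_vectorize(docs):
    """Build a D x N token-count matrix with a sorted vocabulary."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: k for k, tok in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)  # one column per unique token
        for tok in toks:
            row[index[tok]] += 1
        matrix.append(row)
    return vocab, matrix


vocab, M = count_vectorize(["He is a lazy boy. She is also lazy.", "Neeraj is lazy."])
# vocab: a, also, boy, he, is, lazy, neeraj, she  (N = 8)
# M[0] counts D1, M[1] counts D2                 (D = 2)
```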

Table 4. Calculation of CV
Voting Classifier 3 combines the results of both feature extraction phases. This third level of the voting classifier is obtained by merging Voting Classifiers 1 and 2 based on their FP rates. It combines the FP values of classifiers 1 and 2, along with their contradicting TP results, for testing. This overcomes the weaknesses of the individual learning models of the feature extraction phases and gives a better result.
The classifiers are used to identify the sentiment of the tweets as positive or negative. The machine learning algorithms used are Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVC), and Multilayer Perceptron (MLP). The ensemble also involves a variety of bagging and boosting algorithms, such as Random Forest (RF), Gradient Boosting (GB), and Adaptive Boosting (AB), along with the Passive-Aggressive Classifier (PA).
Multi-level Voting Model. In this module, an ensemble classifier is developed and implemented as a multi-level voting model involving various classifiers. The financial corpus built is limited to 925 tweets. Multi-level voting classifiers harness the diversity of the individual base models and can perform better than standalone classifiers, which helps with the problem posed by a limited volume of data. The classifiers are repeatedly combined on the basis of minimum FP ratio and hence overcome the weaknesses of the base learning models, as illustrated in the results section. Majority voting is implemented by assigning a higher weight to the better-performing classifier. Fig. 2 illustrates the architecture of the model.
Voting Classifier 2. Based on the performance of the set of classifiers in Model II, they are combined with hard voting based on the FP rate: the lower a classifier's FP rate, the better it is considered to perform and the greater the weight attached to it.
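FP-rate-weighted hard voting, as described above, can be sketched in plain Python. The text says only that a lower FP rate earns a greater weight. The rule `weight = 1 - FP rate`, the classifier names, and the function names are hypothetical. scikit-learn's `VotingClassifier(voting="hard", weights=...)` offers the same weighted vote for fitted estimators.

```python
from collections import defaultdict


def fp_weights(fp_rates):
    """Hypothetical weighting rule: lower FP rate -> higher weight (1 - FP rate)."""
    return {name: 1.0 - rate for name, rate in fp_rates.items()}


def weighted_hard_vote(predictions, weights):
    """Weighted majority (hard) vote over per-classifier label lists.

    predictions maps classifier name -> list of predicted labels, all lists
    aligned on the same tweets. Ties go to the label seen first.
    """
    names = list(predictions)
    n_tweets = len(predictions[names[0]])
    combined = []
    for k in range(n_tweets):
        tally = defaultdict(float)
        for name in names:
            tally[predictions[name][k]] += weights[name]
        combined.append(max(tally, key=tally.get))
    return combined
```

For a multi-level arrangement, the output of one vote can be treated as the predictions of a new "classifier" and fed, with its own weight, into the next level's vote.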

Table 5. Tweets and their pre-processed text

Table 8. Behavioral Score Computed

Table 9. Loans Criteria

Table 10. Case Studies
Case Study 1. The user was found to have an average Profile score, with a better trend in the Twit and Financial scores. This is an example of a low-risk neutral profile. On analysis of the Twit score, the person has a genuine account, but the user's frequency score does not seem convincing. Apart from this, the user enjoys a decent influence in his social circle and has a sound financial attitude. A profile score of 50 serves as a litmus test of a neutral presence. Since the total score is above the threshold of 226, the user is sanctioned a loan.
Case Study 2. The user was found to have a sound Profile and Financial score, while the Twit score was relatively average. This case is very similar to Case Study 1. In the earlier case, the authenticity score was found to be 0. Here, an authenticity score of 4.5 out of 5 indicates a strong, genuine profile, even if it is slightly pulled down by the FF ratio. A decent influence and frequency score ensure that the person is accepted. The combined Behavioral score exceeds the threshold, and since the person has a good credit score, the loan can be sanctioned.
Case Study 3. The user has a very good behavioral score. The person seems to have a good financial attitude, as indicated by the type of tweets, and the Twit score was relatively decent. A strong FF ratio and authenticity score verify the credibility of the person. Despite a slightly lower frequency score, a remarkable relevance score indicates a certain amount of influence within the user's circle. The computed Behavioral score exceeds the threshold, and since the person has a good credit score, the loan can be sanctioned.