Network-Aware Credit Scoring System For Telecom Subscribers Using Machine Learning and Network Analysis
https://www.emerald.com/insight/1355-5855.htm
APJML 34,5 (p. 1010)
Hongming Gao and Hongwei Liu
School of Management, Guangdong University of Technology, Guangzhou, China
Haiying Ma
School of Internet Finance and Information Engineering, Guangdong University of Finance, Guangzhou, China
Cunjun Ye
School of Management, Guangdong University of Technology, Guangzhou, China, and
Mingjun Zhan
Business School, Foshan University, Foshan, China
Received 11 December 2020
Revised 13 July 2021 and 2 August 2021
Accepted 23 August 2021
Abstract
Purpose – A good decision support system for credit scoring enables telecom operators to measure
subscribers’ creditworthiness in a fine-grained manner. This paper proposes a robust credit scoring
system that leverages latent information embedded in the telecom subscriber relation network, based on
multi-source data including telecom inner data, online app usage, and offline consumption footprint.
Design/methodology/approach – Rooted in network science, the relation network model and singular
value decomposition are integrated to infer different subscriber subgroups. Using the results of network
inference, the paper proposes a network-aware credit scoring system that predicts continuous credit scores
with several state-of-the-art techniques, i.e. multivariate linear regression, random forest regression,
support vector regression, multilayer perceptron, and a deep learning algorithm. The authors use a data set
of 926 users of a major Chinese telecom operator, collected within one month of 2018, to verify the proposed
approach.
Findings – The distribution of the telecom subscriber relation network follows a power-law function instead of
the Gaussian function previously assumed. The network-aware inference divides the subscriber population into a
connected subgroup and a discrete subgroup. The findings also demonstrate that the network-aware
decision support system achieves more accurate prediction performance. In particular, when stochastic
equivalence is considered, the forecasting error of the connected-subgroup models is significantly reduced
by 7.89–25.64% compared with the benchmark. Deep learning performs best, which might indicate that a
non-linear relationship exists between telecom subscribers’ credit scores and their multi-channel behaviours.
Originality/value – This paper contributes to the existing literature on business intelligence analytics and
continuous credit scoring by incorporating latent information of the relation network and external information
from multi-source data (e.g. online app usage and offline consumption footprint). Also, the authors have
proposed a power-law distribution-based network-aware decision support system to reinforce the prediction
performance of individual telecom subscribers’ credit scoring for the telecom marketing domain.
Keywords Credit scoring, Relation network, Stochastic equivalence, Power-law distribution,
Machine learning, Deep learning
Paper type Research paper
2. Literature review
2.1 Marketing in the telecom industry
Telecom operators pursue two major research directions to sharpen their competitive edge
in the telecom industry. One is to improve network traffic performance, for example through
self-organizing network automation (Gomez-Andrades et al., 2016), real-time
troubleshooting and root cause analysis (Imran et al., 2014), and service failure prevention
(Wallin and Landen, 2008).
On the other hand, marketing revenue is considered the top priority in the telecom
industry. To improve revenue, researchers have integrated the characteristics of telecom data
and the experience of experts to segment different customer lifecycle values (Bayer, 2010; Han
et al., 2012; Phau et al., 2014). Retaining an existing subscriber costs much less than
attracting a new one. Because churn entails revenue losses, telecom subscriber churn
management serves as a tool for retaining current subscribers by satisfying their demands
(Gao and Bai, 2014; Hung et al., 2006; Ram and Wu, 2016; Verbeke et al., 2012). For example,
Hung et al. (2006) utilized telecom subscribers’ demographics, billing information, call detail
records, and service changelogs to score their propensity to churn. Verbeke et al. (2012) found
that profit increases significantly in a retention marketing campaign that targets the optimal
fraction of telecom subscribers with the highest predicted probabilities of attrition. In addition,
telecom fraud detection provides a basis for identifying at-risk subscribers who harm other
telecom subscribers (Xing and Girolami, 2007). In a nutshell, extant marketing studies are
limited because they consider only traditional inner telecom data. This suggests a research gap:
conventional telecom business models build the subscriber persona profile from
low-dimensional, single-source data.
Multi-source data have proven to be an efficient complement in financial credit scoring
(Djeundje et al., 2021; Óskarsdóttir et al., 2019; San Pedro et al., 2015; Yu et al., 2020).
However, telecom inner data (e.g. mobile phone call-detail records, telecom billing
information) have served as input for explaining subscribers’ credit scores in the financial field,
but not for assessing the creditworthiness of telecom subscribers in the telecom marketing
domain itself (Olowe et al., 2021). A high-performance, comprehensive credit scoring model is
the core of the subscriber persona profile. Telecom operators aim not only to achieve credit
control of defaulting subscribers but also to facilitate intelligent marketing strategies based on
the respective credit ratings of existing subscribers, in order to make more profitable sales.
2.2 State-of-art techniques for credit scoring
2.2.1 Homogeneity models. A variety of state-of-art techniques have been exploited to evaluate
personal credit scores and risk of default with multi-source data (San Pedro et al., 2015; Yu
et al., 2020; Luo, 2019; Fayyaz et al., 2020; Óskarsdóttir et al., 2019; Zhang and Dai, 2020).
San Pedro et al. (2015) modelled users’ financial risk and credit scores with
logistic regression, support vector machines, and gradient boosted trees. Their findings show
that the proposed approach, which incorporates consumption, network, and mobility features
from mobile phone usage data, is more accurate than, or comparable to, a model without financial
history. Yu et al. (2020) used logistic regression to model online social media data from
Douban.com (a community like Twitter where users can follow each other and post reviews of
movies or books). Their credit evaluation system validated the forecasting capabilities of
complex information within online social media. Artificial neural networks have also been
applied to the financial credit risk classification problem: the multilayer perceptron
algorithm was compared with four machine learning algorithms and showed its generalization
ability in the credit default swaps market (Luo, 2019).
These recent studies have established a connection between multi-source data and state-of-art
credit scoring techniques. However, the extant literature treats all individual users
as a static and homogeneous population. That is, all individual users are evaluated by a
homogeneity model without considering heterogeneity. The unobserved heterogeneity needs
to be revealed through different customer segments (Le et al., 2019; Zhan et al., 2020).
2.2.2 Heterogeneity models. A few recent studies have captured user heterogeneity for credit
scoring classification problems in peer-to-peer lending platforms (Ahelegbey et al., 2019;
Giudici et al., 2019). In one research stream of network science, user heterogeneity is
captured and identified as stochastic equivalence (Ahelegbey et al., 2019; Giudici et al., 2019;
Gao et al., 2021). Stochastic equivalence is a phenomenon often seen in a group relation network,
in which the users (vertices) can be divided into different segments such that members of the same
segment have analogous patterns (Hoff, 2008). The distinct interconnectedness within each
user subgroup thus represents that subgroup’s own stochastic equivalence.
Specifically, Giudici et al. (2019) extracted topological features of borrower similarity
networks and identified different communities. They found that network inference enhances
credit risk accuracy. Following Giudici et al. (2019), Ahelegbey et al. (2019) employed the
inner product of borrower-specific eigenvectors to construct a borrower relation network.
They then hypothesized that borrowers’ relationships in the network obey a Gaussian function,
and classified the population of borrowers into a connected subgroup and an unconnected one.
Owing to the stochastic equivalences of those two subgroups, their heterogeneity models improve
prediction capability in a binary credit-granting decision problem.
Multi-source data can reveal latent relations among telecom subscribers
(Hung et al., 2006; San Pedro et al., 2015; Óskarsdóttir et al., 2019). For example, the call detail
record reflects the actual relationship between telecom subscribers. Moreover, both their
online mobile app usage patterns and their offline consumption disclose relationships
between their behaviour patterns. To the best of our knowledge, most literature still lacks the
context of continuous credit score management, in which telecom operators can precisely
recommend new products or services, or up-sell upgrades and add-ons, according to the
creditworthiness of subscribers in a fine-grained manner. We assume that the stochastic
equivalence of different telecom subscriber subgroups can be inferred from the latent
relation from the perspective of network science. Following Hoff (2008) and Ahelegbey
et al. (2019), we argue that a network-aware credit scoring system that integrates the
identified stochastic equivalence might mitigate the high level of systemic risk and
enhance the prediction performance of continuous credit scoring in the field of telecom
marketing.
3. Methodology
This study presents a novel network-aware credit scoring model using multiple sources,
e.g. traditional telecom inner data, online app usage data and offline consumption footprint
data. The research framework is shown in Figure 1. We first achieve information fusion using
singular value decomposition and propose a power-law-based relation network model.
Furthermore, several state-of-art algorithms are exploited for business intelligence analytics.
After that, the proposed models are compared with a benchmark that cannot account for
stochastic equivalence. Last, the forecasting error analysis assesses the performance of these
techniques.
\[ X = A + E = U D V^{T} + E \tag{2} \]
Ahelegbey et al. (2019) assumed the relationship follows a Gaussian function $\Phi(\theta + r_{ij})$,
wherein $\Phi$ is the cumulative distribution function and $\theta = \Phi^{-1}\left(\frac{2}{n-1}\right)$ is a constant. We relax this
hypothetical restriction. The data-driven discovery in Subsection 5.2 validates that the
subscribers’ relations obey a power-law function instead of a Gaussian distribution. This can
be described by a probability density $p(r)$ such as
In Equation (5), $r_{ij}$ is the observed value of the relation between subscribers $i$ and $j$, and $C$ is a
normalization constant. The constant parameter $\alpha$ of the distribution, known as the scaling
parameter, is to be estimated. In the discrete case, the probability of the power-law distribution is
$P(r) \propto r^{-\alpha}$ (Clauset et al., 2009). It can be defined as the probability $C r_{ij}^{-\alpha}$ of an edge between
vertices $i$ and $j$ (i.e. $A_{ij} = A_{ji} = 1$ in an undirected network):
Wherein $\lambda$ is a parameter that reflects the sparse degree of the subscriber relation network. After
inferring the latent subscriber relation network, the stochastic equivalence of different
telecom subscriber subgroups can be identified by $A_{ij}$.
The forecasting error for RF is the quantity on the right of Equation (10). Its convergence
indicates that RF will not have an over-fitting problem.
3.2.3 Support vector regression (SVR). Support vector machines and SVR are popular
machine learning algorithms for dealing with nonlinear problems (Benkedjouh et al.,
2015). The purpose is to determine an optimal hyperplane with a maximum margin that acts
as the decision boundary. SVR has been applied to evaluate credit scores and has demonstrated
superior performance (Baesens et al., 2003; Goh and Lee, 2019).
Based on the observed data $X = (x_i, d_i),\ i = 1, \ldots, n$, where $d_i \in \mathbb{R}$, SVR aims to solve
the problem of inferring a function $y_j = f(x_j)$. We need to train an SVR, which is equivalent to its
regression form:
\[ f(x_j) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\, k(x_i, x_j) + b \tag{11} \]
Wherein $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)^{T}$, $\alpha^{*} = (\alpha_1^{*}, \alpha_2^{*}, \ldots, \alpha_n^{*})^{T}$ and $b$ are the parameters, and
$k(x_i, x_j)$ denotes a positive definite kernel function. We compute $\alpha_i$ and $\alpha_i^{*}$ $(i = 1, \ldots, n)$
by minimizing the following objective function:
\[ \frac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})\, k(x_i, x_j) + \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^{*}) - \sum_{i=1}^{n} d_i (\alpha_i - \alpha_i^{*}) \tag{12} \]
Subject to:
\[ \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) = 0 \quad \text{and} \quad \alpha_i, \alpha_i^{*} \in [0, C] \tag{13} \]
Wherein $\varepsilon$ and $C$ are the hyper-parameters used to minimize the learning error. With the
notion of support vectors (SV), the output prediction in Equation (11) is:
\[ \hat{y}_j = \sum_{x_i \in SV} (\alpha_i - \alpha_i^{*})\, k(x_i, x_j) + b \tag{14} \]
In this study, we use a Gaussian kernel function with kernel width $\sigma$:
\[ k(x_i, x_j) = \exp\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma} \right) \tag{15} \]
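As an illustration, the prediction rule in Equation (14) with the Gaussian kernel of Equation (15) can be sketched in plain Python. This is a minimal sketch, not the study's implementation: the function names, the toy support vectors, the coefficient differences and the bias are all hypothetical. The kernel follows Equation (15) as printed, with 2σ in the denominator; many references use 2σ² instead.

```python
import math

def gaussian_kernel(xi, xj, sigma):
    """Gaussian kernel as printed in Equation (15): exp(-||xi - xj||^2 / (2 * sigma))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma))

def svr_predict(x_new, support_vectors, dual_coefs, b, sigma):
    """Equation (14): sum over support vectors of (alpha_i - alpha_i*) * k(x_i, x_new), plus b.

    dual_coefs[i] holds the difference alpha_i - alpha_i* for support vector i.
    """
    return b + sum(c * gaussian_kernel(sv, x_new, sigma)
                   for sv, c in zip(support_vectors, dual_coefs))

# Hypothetical toy values, chosen only to exercise the formula.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [0.5, -0.25]
prediction = svr_predict([0.0, 0.0], support_vectors, dual_coefs, b=600.0, sigma=1.0)
```

In practice the coefficients would come from solving the constrained problem in Equations (12) and (13); the sketch only shows how a fitted model turns support vectors into a predicted score.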
3.2.4 Multi-layer perceptron (MLP). MLP is an artificial neural network technique with
multiple layers between the input and output layers, and it has a good capability of
approximating any finite set of real numbers (Juhos et al., 2009; Chong, 2013). MLP is widely used
for credit risk assessment (Dahiya et al., 2016; Luo, 2019). The two-hidden-layer MLP is
deemed to have better performance (Juhos et al., 2009; Chester, 1990), and it non-linearizes
several linear regression models through the typical sigmoid activation function:
\[ f(x) = \frac{1}{1 + e^{-x}} \tag{16} \]
Both the input and output layers have linear units. The vector $x_i$ of $m$ observable
features of subscriber $i$ takes the form $(x_{i1}, \ldots, x_{im})$. Equation (17) displays the output for this
telecom subscriber produced by the one-hidden-layer MLP, called the perceptron, which is the
building block of the multi-layer perceptron. Equation (18) shows the result obtained from
Equation (17).
\[ o_i^{1s} = f\left( \sum_{t=1}^{m} w_t^{1s} x_{it} + w_{bias}^{1s} \right) \tag{17} \]
\[ y_i = \sum_{r=1}^{l_2} w_r\, o_i^{2r} = \sum_{r=1}^{l_2} w_r\, f\left( \sum_{s=1}^{l_1} w_s^{2r}\, o_i^{1s} + w_{bias}^{2r} \right) \tag{18} \]
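The forward pass described by Equations (16) to (18) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the all-zero toy weights are hypothetical, and the layout `w1[s][t]` (weight from input `t` to first-layer unit `s`) is just one convenient way to store the weights.

```python
import math

def sigmoid(x):
    """Sigmoid activation, Equation (16)."""
    return 1.0 / (1.0 + math.exp(-x))

def hidden_layer(inputs, weights, biases):
    """One sigmoid layer: unit s outputs f(sum_t weights[s][t] * inputs[t] + biases[s])."""
    return [sigmoid(sum(w * x for w, x in zip(w_row, inputs)) + b)
            for w_row, b in zip(weights, biases)]

def mlp_forward(x, w1, b1, w2, b2, w_out):
    """Two-hidden-layer forward pass with a linear output unit."""
    o1 = hidden_layer(x, w1, b1)    # first hidden layer, Equation (17)
    o2 = hidden_layer(o1, w2, b2)   # second hidden layer, inner term of Equation (18)
    return sum(w * o for w, o in zip(w_out, o2))  # linear output, Equation (18)

# Hypothetical toy network: 3 inputs, 2 units per hidden layer, all weights zero.
x = [0.2, 0.5, 0.9]
w1 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b1 = [0.0, 0.0]
w2 = [[0.0, 0.0], [0.0, 0.0]]
b2 = [0.0, 0.0]
w_out = [1.0, 2.0]
y = mlp_forward(x, w1, b1, w2, b2, w_out)
```

With zero weights every hidden unit outputs f(0) = 0.5, so the toy output is simply 0.5 times the sum of the output weights, which makes the structure easy to check by hand.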
Figure 2. The proposed deep architecture (network diagram; input nodes are labelled with subscriber features such as ID3_CheckName, OAS1_AppFinan and OCF1_LuxMarket)
denote the connections between each layer and the corresponding weights on each
connection, while the blue lines denote the biases. Mathematically, layer $l$ computes an
output vector $z_l$ from the output $z_{l-1}$ of the previous layer, the biases $b_l$, and the weights $w_l$:
In Equation (19), $\tanh(a) = (e^{a} - e^{-a}) / (e^{a} + e^{-a})$ is a rescaled and shifted logistic function.
The final output layer $z_L$ is used to predict the credit score. Because the credit score is
continuous, we specify a Gaussian distribution function for this response
variable. The loss function can then be derived:
\[ L(w_l, b_l \mid y) = \frac{1}{2} \sum_{i \in n} (y_i - \hat{y}_i)^{2} \tag{20} \]
Wherein $\hat{y}_i$ and $y_i$ represent the predicted and actual credit scores of subscriber $i$,
respectively, whereas $n$ denotes the total number of subscribers. Multi-threaded and distributed
parallel computation is used to optimize the loss function of DL (Candel et al., 2016).
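Two statements in this passage lend themselves to a quick numerical check. The phrase "rescaled and shifted logistic function" corresponds to the identity tanh(a) = 2·logistic(2a) − 1, and Equation (20) is half the sum of squared errors. The sketch below uses hypothetical function names and made-up scores.

```python
import math

def logistic(a):
    """Standard logistic function 1 / (1 + e^-a)."""
    return 1.0 / (1.0 + math.exp(-a))

def tanh_from_logistic(a):
    """tanh written as a rescaled and shifted logistic: 2 * logistic(2a) - 1."""
    return 2.0 * logistic(2.0 * a) - 1.0

def half_squared_error(y_true, y_pred):
    """Loss of Equation (20): one half of the sum of squared prediction errors."""
    return 0.5 * sum((y - p) ** 2 for y, p in zip(y_true, y_pred))

# Made-up credit scores, used only to exercise the loss.
loss = half_squared_error([600.0, 650.0], [610.0, 640.0])
```

Both predictions in the toy example miss by 10 points, so the loss is 0.5 × (100 + 100).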
4. Data description
A leading Chinese telecom operator confronts a challenge: it cannot implement
precision marketing campaigns on the basis of the existing subscriber persona profile. The
telecom operator provided real-world data, a randomly selected sample of 926
subscribers in one month of 2018, drawn from three sources: a traditional inner database,
online apps usage, and offline consumption footprint.
Table 1 shows the description and definition of our data. The dataset is at the individual
level, including a dependent variable (DV), 20 inner-database (ID), 8 online-apps-usage (OAS),
and 8 offline-consumption-footprint (OCF) features. The DV, the continuous credit score of
telecom subscribers, ranges approximately from 400 to 700, as shown in Table 1. Meanwhile, the 20 ID
features can be mapped into the demographic characteristics (ID1 to ID5) and the billing information
and call detail records (ID6 to ID20) of subscribers.
When it comes to the OAS characteristics in Table 2, the mobile online apps usage is
available. We observe that subscribers browsed video apps most frequently, with the highest
average (1218.02) among the app categories this month. The standard deviation of every feature
is larger than its average, so subscribers behaved heterogeneously from the perspective of mobile
online apps usage. This model-free evidence indicates that stochastic equivalence might exist among
telecom subscribers. That is, evaluating all users with a credit scoring homogeneity model,
without considering the unique stochastic equivalence of different subscriber subgroups,
might not address over-lapping information between subscribers, resulting in a high
level of systemic risk and biased predictions.

Table 2. Summary and definition of online-apps-usage (OAS) features
Variable  Name           Description                               Mean     SD       Min  Max
OAS1      AppFinan       Total freq of using financial apps        607.05   862.67   0    3,964
OAS2      AppLogistic    Total freq of using logistics apps        0.56     5.61     0    90
OAS3      AppOnlineShop  Total freq of using online shopping apps  604.19   1137.37  0    9,677
OAS4      AppPlane       Total freq of using air-ticket apps       0.94     24.18    0    729
OAS5      AppTrain       Total freq of using train apps            0.37     4.15     0    92
OAS6      AppTravel      Total freq of using tourism apps          10.84    57.21    0    1,068
OAS7      AppVideo       Total freq of using video apps            1218.02  2056.82  0    12,296
OAS8      NumApp         Total freq of using apps                  2441.96  3026.14  0    12,938
We also observe the offline consumption footprint of telecom subscribers in Table 3.
About 43% of subscribers have been to attractions and 33% to stadiums for sports exercises.
One-third of subscribers often go to markets, while the average frequency of going to
market(s) is 24.48 in the recent three months.
As shown in Table 1, the units of the 36 explanatory variables vary,
so we apply min-max normalization:
\[ \hat{x} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{21} \]
Wherein $x_{\max}$ and $x_{\min}$ are the maximum and minimum of the focal variable, respectively.
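A minimal sketch of the min-max normalization in Equation (21), applied to one feature column at a time. The function name and the toy column are hypothetical, and mapping a constant column to zeros is an assumption made here to avoid division by zero; the paper does not discuss that case.

```python
def min_max_scale(values):
    """Rescale a feature column to [0, 1] following Equation (21)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Toy column with minimum 0 and maximum 90.
scaled = min_max_scale([0, 45, 90])
```

After scaling, every feature lies in the same [0, 1] range regardless of its original unit, which is the purpose stated above.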
Multicollinearity might exist between two or more explanatory variables,
resulting in redundancy and high correlation. To inspect the correlation among
explanatory variables, the heat map is shown in Figure 3. Because the correlation matrix is
symmetric, Figure 3 displays only its upper triangle. In terms of the
absolute values of Pearson correlations, four correlation coefficients exceed 0.65.
The strongest correlation, between OCF1 and OCF7, is 0.92, implying that Wanda
supermarket could be one of the luxury supermarkets. For ID6 and ID20, the
correlation coefficient is 0.91, which indicates that a majority of subscribers spent a similar
amount on their bill in the current month as the average amount over the last
6 months. A high correlation (0.87) exists between ID17 and ID18 because they are the absolute
and relative measures of sensitivity of phone bills. It is also noticeable that OAS7 is
correlated with OAS8 (0.85), suggesting that sample subscribers might use video-type
applications most frequently among all apps. In sum, the multicollinearity problem is not
severe in our data.
relation network, singular value decomposition (SVD) executes information fusion with
traditional telecom inner data, online apps usage data, and offline consumption footprint
data. It is worth noting that the dependent variable, continuous credit scores, is not included.
SVD enables more accurate dimensionality reduction with an appropriate number of
latent factors and addresses the over-lapping network community detection problem (Sarkar
and Dong, 2011; Ahelegbey et al., 2019). The eigenvalues $(\sigma_1, \sigma_2, \ldots, \sigma_{36})$ of SVD, namely the
diagonal elements of $D = \Lambda^{1/2}$ estimated by Equation (2), are shown in Table 4 and Figure 4.
From Figure 4, we observe that the first $s = 4$ eigenvalues explain 99.99% of the
variance of information within the 36 features. More intuitively, Figure 4 shows the decay of
eigenvalues, in which a pronounced “elbow” is revealed. The magnitude of the eigenvalues
plummets at first and $s = 4$ is the elbow point; this characteristic gives the optimal number
of latent factors according to Sarkar and Dong (2011). The first four common factors are
$f_i = u_i D$, wherein $f_i = (f_{i,1}, \ldots, f_{i,s})'$ and $s = 4 < 36$, a lower-dimensional vector of latent
factor scores that accounts for 99.99% of the information in all 36 observed multi-source
subscriber features.
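The dimensionality-reduction step can be sketched with NumPy. This is an illustrative sketch on synthetic data, not the study's pipeline: the function name is hypothetical, the toy matrix is random and exactly rank 2, and measuring "explained" information as the share of squared singular values is an assumption, since the section does not spell out how the 99.99% figure was computed.

```python
import numpy as np

def truncated_svd_scores(X, s):
    """Return latent factor scores f_i = u_i D for the first s factors.

    Also returns the share of each squared singular value and the right
    singular vectors, so the low-rank reconstruction can be checked.
    """
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    explained = d ** 2 / np.sum(d ** 2)   # assumed definition of explained share
    scores = U[:, :s] * d[:s]             # row i is the factor-score vector f_i
    return scores, explained, Vt

# Synthetic stand-in for the subscriber-by-feature matrix (50 rows, 6 features, rank 2).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 6))
scores, explained, Vt = truncated_svd_scores(X, s=2)
```

Because the toy matrix has rank 2, two factors reproduce it exactly, mirroring the way four factors capture nearly all the information in the 36 features reported above.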
Figure 4.
Eigenvalues of SVD
subscribers’ relation is to 0, the more orthogonal the two subscriber-level vectors, which
indicates the lower the probability that the two subscribers belong to the same network
subgroup.
Because of the undirected characteristic of the adjacency matrix, the upper and lower
triangle elements are equal, whereas the diagonal elements represent the relation of individual
subscribers with themselves. As illustrated in Figure 5, we remove the upper and diagonal
elements to reduce redundancy. The horizontal axis denotes the value of the relation between any two
subscribers, while the vertical axis is the density. We observe that the 95th percentile
of subscribers’ relation is 0.262, represented by the dashed vertical line. On the left
side of that dashed line, the head of the distribution lies in the
interval (0.0013, 0.2621), which holds most of the density but covers only a minor part of the total range.
In other words, 95% of subscribers’ relations are nearly “orthogonal”, i.e. relatively weakly
connected. The long tail (0.2621, 1.8096) of the subscribers’ relation distribution, on the
right part of Figure 5, accounts for only 5% of the whole density. It thus seems
that the subscribers’ relations obey some kind of power-law distribution instead of a
Gaussian distribution (Ahelegbey et al., 2019), because the density of the subscribers’ relation
varies as a power of the subscribers’ relation itself (Clauset et al., 2009).
Figure 5. Distribution of subscribers’ relation
We posit that the value of subscribers’ relation (i.e. the elements of the adjacency matrix R)
obeys a power-law function. Following Equations (5) and (6), our proposed power-law network
inference model needs to estimate the constant $C$ and the slope parameter $\alpha$ in the first step. We
transform Equation (5) into natural logarithm form:
\[ \ln P(A_{ij} = 1 \mid r_{ij}, \alpha) = \ln C - \alpha \ln r_{ij} \tag{22} \]
We then estimate the above equation by least squares; the estimates are
$\hat{C} = 0.0008$ and $\hat{\alpha} = 1.678$ in our case. Figure 6 shows the observed telecom
subscribers’ relations (the black triangles) and the fitted power-law function with these
estimates (the blue curve). Figure 6 demonstrates that the estimated function fits
the observed data very closely, which suggests that the adjacency matrix R of the subscriber
relation network obeys the power-law distribution. Thus we transform it back into the
power-law function form:
\[ \Pr(r = r_{ij}) = 0.0008 \cdot r_{ij}^{-1.678} \tag{23} \]
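The estimation step in Equation (22) amounts to an ordinary least-squares line fit on log-transformed values: the slope gives −α and the intercept gives ln C. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and the input is noise-free synthetic data generated from the reported estimates (C = 0.0008, α = 1.678), not the study's observed relations.

```python
import math

def fit_power_law(r_values, p_values):
    """Least-squares fit of ln p = ln C - alpha * ln r; returns (C, alpha)."""
    xs = [math.log(r) for r in r_values]
    ys = [math.log(p) for p in p_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic points lying exactly on the curve of Equation (23).
r_grid = [0.01 * k for k in range(1, 101)]
p_grid = [0.0008 * r ** -1.678 for r in r_grid]
C_hat, alpha_hat = fit_power_law(r_grid, p_grid)
```

On exact data the fit recovers both parameters. With real, noisy data the log-log regression is only one possible estimator; Clauset et al. (2009), cited above, discuss maximum-likelihood alternatives.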
This is evidence that the distribution of subscribers’ relations goes far beyond the managerial
Pareto principle, in which 80% of outcomes result from 20% of all causes (Craft and Leake,
2002). Our findings show that 95% of the density results from the lowest 14% of the range of
subscribers’ relation values, computed as (0.2621 − 0.0013)/(1.8096 − 0.0013) = 0.1442. For this sparse
network model, the parameter λ is set to 0.95, in light of the sparse degree of the network.
Derived from the inference of our proposed latent relation network model, a power-law-based
network is revealed in Figure 7.
Figure 6. The fit of the estimated power-law distribution
Two latent subgroups exist among the focal telecom subscribers. In other words, two
unique stochastic equivalences are identified in the network. Figure 7
displays the full network in a global view, demonstrating that a large majority of subscribers
(591 of 926) are disconnected and distributed discretely in the outer ring, whereas the other
subscriber subgroup in the centre is closely connected. Stochastic equivalence is a
phenomenon often seen in a group relation network, where the vertices can be divided
into different subgroups such that members of the same subgroup have analogous patterns
(Hoff, 2008). In a nutshell, this network-aware inference identifies two unique stochastic
equivalences: it divides the subscriber population into a connected subgroup (centre
part) and a discrete subgroup (outer ring), while the two subgroups lie at a sparse distance from each other.
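The split into a connected and a discrete subgroup can be illustrated with a simple thresholding sketch. This is not the paper's inference rule, which relies on Equation (6) and is not reproduced in this section. The sketch assumes a hard cut-off: an edge is drawn when the relation value exceeds a threshold (here 0.262, the 95th percentile reported above), and a subscriber with no edges is placed in the discrete subgroup. The function name and the 4-by-4 relation matrix are hypothetical.

```python
def split_subgroups(relation, threshold):
    """Build a 0/1 adjacency matrix by thresholding and split nodes by connectivity.

    Assumption: an edge exists when relation[i][j] > threshold (i != j);
    nodes with at least one edge are 'connected', the rest are 'discrete'.
    """
    n = len(relation)
    adjacency = [[1 if i != j and relation[i][j] > threshold else 0
                  for j in range(n)] for i in range(n)]
    connected = [i for i in range(n) if any(adjacency[i])]
    discrete = [i for i in range(n) if not any(adjacency[i])]
    return adjacency, connected, discrete

# Hypothetical symmetric relation matrix for four subscribers.
relation = [
    [0.00, 0.90, 0.05, 0.01],
    [0.90, 0.00, 0.02, 0.03],
    [0.05, 0.02, 0.00, 0.04],
    [0.01, 0.03, 0.04, 0.00],
]
adjacency, connected, discrete = split_subgroups(relation, threshold=0.262)
```

In the toy matrix only subscribers 0 and 1 share a strong relation, so they form the connected subgroup and the other two remain isolated.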
In particular, Figure 8 zooms in on the connected subgroup to probe its stochastic
equivalence. Subscribers who belong to the connected subgroup exhibit a good deal of
interconnection in terms of demographics, telecom billing information, call detail records,
online apps usage, and offline consumption footprint. Note that the source of this
interconnection is the latent information between subscribers, not necessarily their actual
contact. Owing to its stochastic equivalence, the connected subscribers have an analogous
pattern. On the other hand, the subscribers in the discrete subgroup reveal another stochastic
equivalence: their interconnection is rather sparse. The average and standard deviation of
the credit scores for the 335 connected subscribers are 629.98 and 36.27, respectively. After adding the
discrete subgroup, the statistics for all telecom subscribers are 616.11 and 41.10, respectively.
This suggests that, when the two unique stochastic equivalences are ignored, the measured
creditworthiness of subscribers becomes lower and less stable. The nature of stochastic equivalence merely
indicates that the members of a subscriber subgroup behave similarly; they do not have to share an
identical magnitude of creditworthiness. These findings provide model-free evidence that
credit scoring homogeneity models, which do not consider stochastic equivalence, might cause a
high level of systemic risk in credit scores and a bias in prediction performance. We
argue that such latent information embedded in the subscriber relation network enhances
the predictive capabilities of credit scoring models. Thus, we propose a power-law-
distributed network-aware credit scoring system that considers stochastic equivalence.
Figure 7.
Full network in a
global view
Figure 8.
The connected
subgroup in a
local view
5.3 Forecasting performance and comparison
In this section, several algorithms, including MLR, RF, SVR, MLP, and DL, are implemented to
build credit scoring models and to analyze and compare their forecasting performance.
Credit scoring models within each algorithm are constructed with three sample subsets
separately. The full-sample models, which serve as benchmark models, are built on all 926
subscribers. To account for the stochastic equivalence, the network-aware credit scoring
system is proposed for predicting the credit scores of telecom subscribers within two
subgroups. The connected-subgroup models consider the 335 interconnected subscribers,
whereas the discrete-subgroup models are based on the other 591 subscribers. Note that
the observation of any individual subscriber comprises 36 explanatory features, which come
from their telecom record, online apps usage, and offline consumption footprint, as seen in
Table 1. Following managerial practice in prediction, we use 60% of each sample subset to train
the models and the rest as a testing dataset to validate the performance and analyze
forecasting errors.
Specifically, we compute the predicted credit scores $\hat{y}_i$ of individual subscribers. Mean
absolute error (MAE) and root mean squared error (RMSE) are extensively employed to
evaluate the effectiveness of credit scoring models (Chang and Yeh, 2012; Ince and Aktan,
2009; Zhan et al., 2020):
\[ \mathrm{MAE} = \frac{1}{n} \sum_{i \in n} \lvert y_i - \hat{y}_i \rvert \tag{24} \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i \in n} (y_i - \hat{y}_i)^{2}} \tag{25} \]
Where $\hat{y}_i$ and $y_i$ represent the predicted and sample credit scores of subscriber $i$,
respectively, whereas $n$ denotes the total number of subscribers in the testing dataset of each data sample.
Lower values of MAE and RMSE represent better accuracy.
Furthermore, in order to compare the relative prediction performance of network-aware
models with that of the benchmark models, forecasting error reduction (FER) is exploited
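(Shang et al., 2020). We let $\mathrm{MAE}_0$ and $\mathrm{RMSE}_0$ be the performance of a benchmark model using
a full sample without network-aware knowledge. Then we denote $\mathrm{MAE}_\tau$ and $\mathrm{RMSE}_\tau$ as the
performance of a model $\tau$ using the connected-subgroup sample subset or the discrete-subgroup
sample subset. Hence, the gain from considering the latent information of stochastic
equivalence can be calculated by FER:

\[ \mathrm{FER}_{\mathrm{MAE}} = \frac{\left| \mathrm{MAE}_\tau - \mathrm{MAE}_0 \right|}{\mathrm{MAE}_0} \times 100 \tag{26} \]

\[ \mathrm{FER}_{\mathrm{RMSE}} = \frac{\left| \mathrm{RMSE}_\tau - \mathrm{RMSE}_0 \right|}{\mathrm{RMSE}_0} \times 100 \tag{27} \]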
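The FER calculation can be checked against the MLR rows of Table 5. The sketch below is illustrative: the function name `fer` is hypothetical, and the sign convention (a positive value means the network-aware model has a lower error than the benchmark) is an assumption chosen to match the positive values reported in Table 5.

```python
def fer(benchmark_error, model_error):
    """Forecasting error reduction in percent, relative to the benchmark.

    Assumed sign convention: positive when model_error < benchmark_error.
    """
    return (benchmark_error - model_error) / benchmark_error * 100.0


# MLR benchmark vs. MLR connected-subgroup values taken from Table 5.
fer_mae_mlr = fer(18.712, 15.170)
fer_rmse_mlr = fer(24.441, 19.207)

print(round(fer_mae_mlr, 3), round(fer_rmse_mlr, 3))
```

Under this convention the computed values agree with the FER entries reported for the connected-subgroup MLR model (18.929 and 21.415).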
Table 5 summarizes the performance of the different credit scoring models on the full sample
(benchmark), the connected-subgroup sample, and the discrete-subgroup sample datasets.
We observe that, compared with the benchmark model
without network information, the network-aware credit scoring models (i.e. connected-subgroup-based
and discrete-subgroup-based models) improve forecasting performance in
terms of $\mathrm{FER}_{\mathrm{RMSE}}$ and $\mathrm{FER}_{\mathrm{MAE}}$. The MAEs and RMSEs of MLR, RF, SVR, MLP, and DL
decrease after using the latent network-aware information. Specifically, the results
demonstrate that the connected-subgroup models achieve a substantial forecasting error
reduction in the range of 7.89–25.64% over the benchmark models, whereas the
discrete-subgroup models also outperform the benchmark models, with an average forecasting
accuracy improvement of 3.43%. As our proposed models show improved prediction
performance, we find strong support that the network-aware credit scoring system can
leverage latent information embedded in the subscriber relation network.

Table 5. Credit scoring models performance on the testing dataset

Model            Network-aware   MAE      RMSE     FER_MAE/%   FER_RMSE/%
MLR benchmark    No              18.712   24.441   /           /
MLR connected    Yes             15.170   19.207   18.929      21.415
MLR discrete     Yes             18.001   23.470   3.799       3.973
RF benchmark     No              16.030   20.990   /           /
RF connected     Yes             14.738   19.334   8.060       7.889
RF discrete      Yes             15.389   20.392   3.999       2.849
SVR benchmark    No              16.729   22.865   /           /
SVR connected    Yes             13.894   18.845   16.947      17.581
SVR discrete     Yes             16.107   21.830   3.718       4.527
MLP benchmark    No              14.573   18.836   /           /
MLP connected    Yes             11.096   14.006   23.859      25.642
MLP discrete     Yes             14.538   18.655   0.2402      0.961
DL benchmark     No              12.783   16.177   /           /
DL connected     Yes             10.844   13.720   15.169      15.188
DL discrete      Yes             12.009   15.508   6.055       4.136
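The two summary figures quoted in this paragraph (the 7.89–25.64% range and the 3.43% average) can be recomputed from the FER columns of Table 5. The sketch below is only a consistency check; the dictionary layout and variable names are illustrative, and pooling the MAE-based and RMSE-based FER values into one average is an assumption about how the 3.43% figure was obtained.

```python
# (FER_MAE, FER_RMSE) pairs in percent, copied from Table 5.
fer_connected = {
    "MLR": (18.929, 21.415),
    "RF": (8.060, 7.889),
    "SVR": (16.947, 17.581),
    "MLP": (23.859, 25.642),
    "DL": (15.169, 15.188),
}
fer_discrete = {
    "MLR": (3.799, 3.973),
    "RF": (3.999, 2.849),
    "SVR": (3.718, 4.527),
    "MLP": (0.2402, 0.961),
    "DL": (6.055, 4.136),
}

# Flatten both FER columns into one list per subgroup.
connected_values = [v for pair in fer_connected.values() for v in pair]
discrete_values = [v for pair in fer_discrete.values() for v in pair]

connected_min = min(connected_values)
connected_max = max(connected_values)
# Assumption: the average pools the MAE-based and RMSE-based reductions.
discrete_mean = sum(discrete_values) / len(discrete_values)

print(connected_min, connected_max, round(discrete_mean, 2))
```

Under that pooling assumption, the minimum and maximum for the connected subgroup and the mean for the discrete subgroup agree with the figures quoted in the text.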
In particular, MLR performs the worst in terms of MAE and RMSE. The best credit
scoring model is DL, with the lowest MAE and RMSE, although the connected-subgroup
MLP model has the highest forecasting error reductions, $\mathrm{FER}_{\mathrm{MAE}}$ and $\mathrm{FER}_{\mathrm{RMSE}}$. This suggests
that the stochastic equivalence in the connected subgroup has a comparative advantage in
improving the predictive capability of MLP. The connected-subgroup DL model performs
better than the other algorithms; its estimated architecture, which is the deep
neural network architecture proposed in this study, is shown in Figure 2. The finding indicates that a
non-linear relationship exists between telecom subscribers’ credit scores and their
multi-channel behaviours.
On the condition of stochastic equivalence, the predictive capability of the connected-subgroup
models is superior to that of the discrete-subgroup models. This finding implies
that, within the identified connected subgroup, subscribers are more homogeneous
in how they use apps online, consume offline, and behave with the focal telecom operator.
A managerial direction for the telecom operator is to build an intelligent subscriber persona
profile, leading to more accurate cross-selling and precision marketing campaigns. Additionally,
although the discrete-subgroup models outperform the benchmark models, the diversity of
these discrete telecom subscribers deserves further study.
References
Ahelegbey, D.F., Giudici, P. and Hadji-Misheva, B. (2019), “Latent factor models for credit scoring in
P2P systems”, Physica A: Statistical Mechanics and Its Applications, Vol. 522, pp. 112-121.
Baesens, B., Van Gestel, T., Viaene, S., Stepanova, M., Suykens, J. and Vanthienen, J. (2003),
“Benchmarking state-of-the-art classification algorithms for credit scoring”, Journal of the
Operational Research Society, Vol. 54 No. 6, pp. 627-635.
Bayer, J. (2010), “Customer segmentation in the telecommunications industry”, Journal of Database
Marketing and Customer Strategy Management, Vol. 17 Nos 3-4, pp. 247-256.
Benkedjouh, T., Medjaher, K., Zerhouni, N. and Rechak, S. (2015), “Health assessment and life
prediction of cutting tools based on support vector regression”, Journal of Intelligent
Manufacturing, Vol. 26 No. 2, pp. 213-223.
Candel, A., Parmar, V., LeDell, E. and Arora, A. (2016), Deep Learning with H2O, H2O. ai, Mountain
View, pp. 1-21, available at: https://www.h2o.ai/resources/.
Chang, S.-Y. and Yeh, T.-Y. (2012), “An artificial immune classifier for credit scoring analysis”, Applied
Soft Computing, Vol. 12 No. 2, pp. 611-618.
Chester, D.L. (1990), “Why two hidden layers are better than one”, Proc. IJCNN, Washington, District
of Columbia, Vol. 1, pp. 265-268.
Chong, A.Y.-L. (2013), “Predicting m-commerce adoption determinants: a neural network approach”,
Expert Systems with Applications, Vol. 40 No. 2, pp. 523-530.
Clauset, A., Shalizi, C.R. and Newman, M.E. (2009), “Power-law distributions in empirical data”, SIAM
Review, Vol. 51 No. 4, pp. 661-703.
Craft, R.C. and Leake, C. (2002), “The Pareto principle in organizational decision making”,
Management Decision, Vol. 40 No. 8, pp. 729-733.
Dahiya, S., Handa, S.S. and Singh, N.P. (2016), “A rank aggregation algorithm for ensemble of multiple
feature selection techniques in credit risk evaluation”, International Journal of Advanced
Research in Artificial Intelligence, Vol. 5 No. 9, pp. 1-8.
Djeundje, V.B., Crook, J., Calabrese, R. and Hamid, M. (2021), “Enhancing credit scoring with
alternative data”, Expert Systems with Applications, Vol. 163, 113766, doi: 10.1016/j.eswa.2020.113766.
Fayyaz, M.R., Rasouli, M.R. and Amiri, B. (2020), “A data-driven and network-aware approach for
credit risk prediction in supply chain finance”, Industrial Management and Data Systems,
Vol. 121 No. 4, pp. 785-808.
Gao, H., Liu, H. and Yi, M. (2021), “Inferring values of recommendation links: analysis of co-purchase
network based on ERGM and product involvement”, 2021 IEEE International Conference on
Consumer Electronics and Computer Engineering, pp. 108-113.
Gao, L. and Bai, X. (2014), “An empirical study on continuance intention of mobile social networking
services”, Asia Pacific Journal of Marketing and Logistics, Vol. 26 No. 2, pp. 168-189.
Giudici, P., Hadji-Misheva, B. and Spelta, A. (2019), “Network based scoring models to improve credit risk
management in peer to peer lending platforms”, Frontiers in Artificial Intelligence, Vol. 2, p. 3.
Goh, R. and Lee, L.S. (2019), “Credit scoring: a review on support vector machines and metaheuristic
approaches”, Advances in Operations Research, Vol. 2019, pp. 1-30.
Gomez-Andrades, A., Barco, R., Munoz, P. and Serrano, I. (2016), “Data analytics for diagnosing the
RF condition in self-organizing networks”, IEEE Transactions on Mobile Computing, Vol. 16
No. 6, pp. 1587-1600.
Han, S.H., Lu, S.X. and Leung, S.C. (2012), “Segmentation of telecom customers based on customer
value by decision tree model”, Expert Systems with Applications, Vol. 39 No. 4, pp. 3964-3973.
Hand, D.J. and Henley, W.E. (1997), “Statistical classification methods in consumer credit scoring: a
review”, Journal of the Royal Statistical Society: Series A (Statistics in Society), Vol. 160 No. 3,
pp. 523-541.
Hoff, P. (2008), “Modeling homophily and stochastic equivalence in symmetric relational data”,
Advances in Neural Information Processing Systems, Vol. 20 No. 6, pp. 657-664.
Hung, S.-Y., Yen, D.C. and Wang, H.-Y. (2006), “Applying data mining to telecom churn management”,
Expert Systems with Applications, Vol. 31 No. 3, pp. 515-524.
Imran, A., Zoha, A. and Abu-Dayya, A. (2014), “Challenges in 5G: how to empower SON with big data
for enabling 5G”, IEEE Network, Vol. 28 No. 6, pp. 27-33.
Ince, H. and Aktan, B. (2009), “A comparison of data mining techniques for credit scoring in banking:
a managerial perspective”, Journal of Business Economics and Management, Vol. 10 No. 3,
pp. 233-240.
Juhos, I., Makra, L. and Toth, B. (2009), “The behaviour of the multi-layer perceptron and the support
vector regression learning methods in the prediction of NO and NO2 concentrations in Szeged,
Hungary”, Neural Computing and Applications, Vol. 18 No. 2, pp. 193-205.
Kimura, M. (2021), “Customer segment transition through the customer loyalty program”, Asia Pacific
Journal of Marketing and Logistics, Vol. ahead-of-print, doi: 10.1108/APJML-09-2020-0630.
Le, A.N.H., Tran, M.D., Nguyen, D.P. and Cheng, J.M.S. (2019), “Heterogeneity in a dual personal
values–dual purchase consequences–green consumption commitment framework”, Asia Pacific
Journal of Marketing and Logistics, Vol. 31 No. 2, pp. 480-498.
Luo, C. (2019), “A comprehensive decision support approach for credit scoring”, Industrial
Management and Data Systems, Vol. 120 No. 2, pp. 280-290.
Olowe, A., Olorundare, J.K. and Phillips, T. (2021), “Using open APIs to drive financial inclusion via
credit scoring built on telecoms data”, International Journal on Data Science and Technology,
Vol. 7 No. 1, pp. 17-22.
Óskarsdóttir, M., Bravo, C., Sarraute, C., Vanthienen, J. and Baesens, B. (2019), “The value of big data
for credit scoring: enhancing financial inclusion using mobile phone data and social network
analytics”, Applied Soft Computing, Vol. 74, pp. 26-39.
Phau, I., Quintal, V. and Shanka, T. (2014), “Examining a consumption values theory approach of
young tourists toward destination choice intentions”, International Journal of Culture, Tourism
and Hospitality Research, Vol. 8 No. 2, pp. 125-139.
Ram, J. and Wu, M.-L. (2016), “A fresh look at the role of switching cost in influencing customer
loyalty”, Asia Pacific Journal of Marketing and Logistics, Vol. 28 No. 4, pp. 616-633.
San Pedro, J., Proserpio, D. and Oliver, N. (2015), “MobiScore: towards universal credit scoring from
mobile phone data”, International Conference on User Modeling, Adaptation, and
Personalization, pp. 195-207.
Sarkar, S. and Dong, A. (2011), “Community detection in graphs using singular value decomposition”,
Physical Review E, Vol. 83 No. 4, 046114.
Segal, M.R. (2004), Machine Learning Benchmarks and Random Forest Regression, Center for
Bioinformatics and Molecular Biostatistics, UC, San Francisco, CA, available at:
https://escholarship.org/uc/item/35x3v9t4.
Shang, G., McKie, E.C., Ferguson, M.E. and Galbreth, M.R. (2020), “Using transactions data to improve
consumer returns forecasting”, Journal of Operations Management, Vol. 66 No. 3, pp. 326-348.
Teles, G., Rodrigues, J.J., Kozlov, S.A., Rabêlo, R.A. and Albuquerque, V.H.C. (2020), “Decision support
system on credit operation using linear and logistic regression”, Expert Systems, Vol. 38 No. 6,
e12578.
Verbeke, W., Dejaeger, K., Martens, D., Hur, J. and Baesens, B. (2012), “New insights into churn
prediction in the telecommunication sector: a profit driven data mining approach”, European
Journal of Operational Research, Vol. 218 No. 1, pp. 211-229.
Wallin, S. and Landen, L. (2008), “Telecom alarm prioritization using neural networks”, 22nd
International Conference on Advanced Information Networking and Applications-Workshops
(AINA Workshops 2008), pp. 1468-1473.
Xing, D. and Girolami, M. (2007), “Employing latent Dirichlet allocation for fraud detection in
telecommunications”, Pattern Recognition Letters, Vol. 28 No. 13, pp. 1727-1734.
Yu, X., Yang, Q., Wang, R., Fang, R. and Deng, M. (2020), “Data cleaning for personal credit scoring by
utilizing social media data: an empirical study”, IEEE Intelligent Systems, Vol. 35 No. 2, pp. 7-15.
Zhan, M., Gao, H., Liu, H., Peng, Y., Lu, D. and Zhu, H. (2020), “Identifying market structure to monitor
product competition using a consumer-behavior-based intelligence model”, Asia Pacific Journal
of Marketing and Logistics, Vol. 33 No. 1, pp. 99-123.
Zhang, Z. and Dai, Y. (2020), “Combination classification method for customer relationship
management”, Asia Pacific Journal of Marketing and Logistics, Vol. 32 No. 5, pp. 1004-1022.