This paper presents a multi-label user classification method for social media using the ML-KNN algorithm, addressing the limitations of existing single-label classification methods. It proposes a user topic classification approach based on heterogeneous networks and community detection, demonstrating improved accuracy and efficiency in classifying users with multiple themes. The study contributes to the field by enhancing user representation and classification techniques in social media contexts.


Technological Forecasting & Social Change 188 (2023) 122271

Contents lists available at ScienceDirect

Technological Forecasting & Social Change

journal homepage: www.elsevier.com/locate/techfore

Research on multi-label user classification of social media based on ML-KNN algorithm

Anzhong Huang a, Rui Xu a, Yu Chen b, Meiwen Guo c,*

a School of Economics and Management, Jiangsu University of Science and Technology, PR China
b School of Electromechanical Engineering, Guangdong University of Technology, PR China
c School of Management, Xinhua College of Sun Yat-sen University, School of Management, Guangzhou Xinhua University, Dongguan City 523133, PR China

ARTICLE INFO

Keywords: Social media; Account classification; Multi-label; ML-KNN algorithm; Heterogeneous network

ABSTRACT

Several research studies have been conducted on multi-label classification algorithms for text and images, but few have been conducted on multi-label classification for users. Moreover, the existing multi-label user classification algorithms do not provide an effective representation of users and are difficult to use directly in social media scenarios. By analyzing complex social networks, this paper aims to achieve multi-label classification of users, building on research in single-label classification.

Considering the limitations of existing research, this paper proposes a user topic classification method based on heterogeneous networks as well as a user multi-label classification method based on community detection. The model is trained using the ML-KNN multi-label classification algorithm. In actual scenarios, the algorithm is more effective than existing multi-label classification methods when applied to multi-label classification tasks for social media users.

According to the results of the analysis, the algorithm classifies users of different themes with a high level of accuracy across a variety of scenarios. Furthermore, this study contributes to the advancement of classification research by expanding its perspective.

* Corresponding author.
E-mail address: [email protected] (M. Guo).

https://doi.org/10.1016/j.techfore.2022.122271
Received 9 May 2022; Received in revised form 26 August 2022; Accepted 29 August 2022
Available online 30 December 2022
0040-1625/© 2023 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

The classification of social media users is an effective method for analyzing and managing social media. Most existing methods, however, classify users by a single label. Although single-label classification can serve the needs of social media management to some degree, in practice a user is associated with multiple topics of information. Classifying users by only one label therefore often misses other topical attributes of users. It is thus of great importance, in both academia and industry, to study the multi-label classification of social media users. Nonetheless, most existing multi-label classification algorithms focus on text, image, and audio processing, and little research has been conducted on multi-label classification for social media users. The purpose of this paper is to classify social media users with multiple labels.

1.1. Multi-label classification algorithms

A number of algorithms address the multi-label classification problem. They can be categorized into two types based on how they solve the problem: one is based on the problem transformation method, and the other is based on algorithm application.

The problem transformation method converts a multi-label problem into one or more single-label problems that existing single-label algorithms can solve. Problem transformation methods can be divided into three categories: binary relevance methods, classifier chain methods, and Label PowerSet methods. In the binary relevance method, each label is treated as a separate binary classification problem. A sample is input into each of the binary classifiers, and the labels whose classifiers predict the positive class form the label set of the sample to be predicted (Montañes et al., 2014; Yuan et al., 2018; Hadi and Kusprasapta, 2021). The classifier chain method trains as many classifiers as there are labels. The first classifier is trained on the input data only, the second classifier is trained on the input data together with the prediction of the first classifier, and so on,
leading to a chain of classifiers that together generate multiple labels in the end (Lughofer, 2022; Nitin and Pramod, 2022; Douglas et al., 2022). In the Label PowerSet method, each distinct label combination in the training set is converted into a unique single label, so that a multi-topic classification model can be trained in the single-label scenario using a traditional machine learning algorithm or deep learning algorithm (L. and Rokach, 2010; Abdullahi et al., 2021; Yap and Raymer, 2021).

In the algorithm application method, a particular algorithm is extended to handle the multi-label problem directly rather than transforming the problem into multiple binary subproblems. Several classical multi-label classification algorithms are adapted from traditional machine learning algorithms, such as ML-KNN (M. and Zhou, 2007; Zhu et al., 2021; Agarwa et al., 2021), Rank-SVM (Jiarong et al., 2014; Kassim et al., 2021; Guoqiang et al., 2022), and ML-DT (Victor et al., 2022; Helena et al., 2021). The traditional KNN, SVM, and decision tree algorithms are only applicable to single-label classification problems, but their improved versions can be applied directly to multi-label classification problems. In recent years, as hardware computing capabilities have improved, several novel multi-label classification methods based on deep learning models have been developed. Owing to the way deep learning models are constructed, these algorithms can take into account how labels are related and learn a finer-grained representation of samples at the same time.

1.2. Multi-label classification methods

Most existing user multi-label classification algorithms construct user interaction networks in order to propagate labels and extract user features, and thereby achieve user multi-label classification.

Some scholars proposed an edge-centered clustering method based on user relationship networks to extract the social dimension features of each user node, and developed a supervised user multi-label classification algorithm (Edgecluster) based on the SVM algorithm (Tang and L., 2009; Guoqiang et al., 2020; Zhongwei et al., 2020). A multi-tag relational neighborhood classifier (SCRN) using social context features has also been proposed. It propagates node class labels iteratively using the idea of the RL algorithm and, assuming that the number of labels per node is fixed, selects the k categories with the highest probability as the final labels of each node (Wang and Sukthankar, 2013). As another option, the overlapping community division method (MORC) uses layer clustering to characterize each user before the model is trained (Lu et al., 2018; Palla and Derényi, 2005; Gregory, 2007). The graph embedding algorithm (HCGE) takes advantage of Gaussian distributions for heterogeneous networks and optimizes two objective functions in order to train a model that embeds each node in the network into a latent space and a classification model (Santos et al., 2016). A concise summary of all algorithms and methods is presented in Table 1.

Table 1
The existing multi-label classification algorithms and methods.

Multi-label classification | Main ideas and methods
Algorithms | Problem transformation method: converts the problem into single-label problems. The main methods include the binary relevance method, the classifier chain method, and the Label PowerSet method.
Algorithms | Algorithm application method: extends a specific algorithm to handle multi-label problems. The main methods include ML-KNN, Rank-SVM, and ML-DT.
Methods | Currently, multi-label user classification algorithms mainly construct user interaction networks for label propagation and feature extraction. The main methods include edge-centered clustering, a multi-tag relational neighborhood classifier (SCRN), an overlapping community division method (MORC), and a graph embedding algorithm (HCGE).

To summarize, the existing multi-label user classification algorithms primarily rely on network structure to characterize users. Building a network can effectively describe the multiple relationships between users, which makes the extracted features effective in describing the differences and similarities between them. Describing users with more than one label is therefore necessary. The current methods are primarily affected by two problems when describing users: their time and space complexity is high, and the features used are too simple. For model training, existing methods rely on SVMs and other algorithms to train multi-label classification models. To address the shortcomings of existing methods, the paper examines the selection of user representations and training models and proposes an algorithm that can effectively solve the problem of multi-label classification of social media users.

Given the inadequacy of existing research, this paper proposes a supervised multi-label user classification algorithm based on community detection. Firstly, a heterogeneous information network is constructed for the multi-label scenario. As users with similar beliefs and interests tend to interact, this study divides the user nodes in the network into communities using an overlapping community detection algorithm and uses the results of the community division to represent the community features of each user. Secondly, both textual and behavioral information can be used in this method to describe user topic information. In linear time, this paper also extracts users' multi-label relationship features to compensate for the insufficient representation strength of community features alone. Finally, the ML-KNN multi-label classification algorithm is used to train the user multi-label classification model. Compared with other multi-label classification algorithms, this method is more efficient, training the model in O(n) time. A comparison of the proposed method with existing multi-label user classification algorithms on actual data sets shows that the proposed method classifies social media users more effectively and with better performance.

2. Problem description

2.1. The problem

Classifying social media users is intended to screen out abnormal users or to identify the set of users relevant to a particular topic from a large number of social media users. Generally, each social media user belongs to multiple themes. With multi-label classification of social media users, the goal is to assign multiple theme labels to each user at the same time.

Consider a classification task in which the number of instances is $N$ and the number of labels is $q$. The sample set can be expressed as $V_I = (I_1, I_2, \ldots, I_N)$, every instance has a feature vector in the input space, and the set of all labels can be expressed as $V_t = (l_1, l_2, \ldots, l_q)$.

Now assume that the first $n_L$ instances in the instance set are labeled multi-label samples, expressed as $V_I^L = (I_1, I_2, \ldots, I_{n_L})$, in which each instance has a multi-label set $Y_i = (Y_i^1, Y_i^2, \ldots, Y_i^q)^T \in \{0, 1\}^q$. If $I_i$ belongs to the $j$th label in the theme set $V_t$, then $Y_i^j = 1$; otherwise, $Y_i^j = 0$. For the remaining unlabeled samples $V_I^u = V_I - V_I^L$, the task of the multi-label classification algorithm is to make multi-label predictions.

The main difficulty of multi-label classification is that the number of possible output label sets grows exponentially with the number of labels $q$. Most multi-label classification algorithms convert the multi-label classification task into multiple binary classification tasks in order to solve this problem. Nevertheless, in this scheme the binary tasks are independent of one another, which disregards the correlation between labels. It

is therefore necessary to implement the classifier chain method. However, the results of this method depend on the order of the labels in the chain.

2.2. The whole process

A general description of how to solve the problem of multi-label classification of social media users can be found in Fig. 1. Firstly, heterogeneous networks are constructed in multi-label scenarios. In contrast to the binary classification setting, these networks contain users and entities of different topics. Secondly, multi-label relationship features and community features are extracted from the heterogeneous network. The multi-label relationship features comprise the user relationship features and the user entity relationship features in multi-label scenarios, while community feature extraction applies the overlapping community detection algorithm to the heterogeneous network to divide user nodes into overlapping communities. Thirdly, each user's community features are characterized using the results of the community division. Lastly, the ML-KNN multi-label classification algorithm is used to train the model and predict the unlabeled users.

Fig. 1. The flow chart.

Because each user belongs to multiple topic labels, it is difficult to distinguish the topic information of different users based solely on attribute information, the number of neighbor nodes in the user network, or PageRank values. Although Word2vec and LDA topic models can directly represent user themes for identifying different users, such features are limited to extracting user theme information at the text level and cannot analyze the possibility of a user belonging to multiple themes from the perspective of user behavior. Moreover, this type of algorithm has high time complexity when processing text.

Multi-label relationship features and community features are extracted from heterogeneous networks. The multi-label relationship features effectively represent the relationships between users of different topics in the network as well as the relationships between users and entities of different topics. The community features are derived from the overlapping community division.

The fact that a user belongs to more than one community indicates that the user has a high probability of belonging to more than one topic. In contrast to the LDA and Word2vec models, community features consider the text and behavioral information of the user simultaneously, which enables them to describe whether the user belongs to several topics at the same time. Additionally, considering that supervised learning has more reliable and stable classification performance, the supervised ML-KNN algorithm is selected in this paper to train the user multi-label classification model. The algorithm considers each label comprehensively and does not require a large training set.

3. ML-KNN model and algorithm

ML-KNN, the multi-label KNN algorithm, is an effective algorithm for multi-label classification that improves on the conventional K-nearest neighbor algorithm with a Bayesian model (M. and Zhou, 2007). In essence, each sample is labeled by determining its nearest neighbors in the training set and then obtaining statistical information from their label sets, namely the number of neighbors belonging to each possible category. According to the maximum a posteriori principle, the labels of the sample are then determined.

For a given sample $x$, $y$ denotes its binary label vector: $y(l) = 1$ signifies that sample $x$ belongs to the $l$th label, and $y(l) = 0$ otherwise (where $l \in Y$, and $Y$ represents the set of all labels). Let $N(x)$ denote the $K$ samples in the training set closest to $x$, and let $C_x(l)$ count how many of these $K$ neighbors carry label $l$:

$$C_x(l) = \sum_{a \in N(x)} y_a(l), \quad l \in Y \tag{1}$$

For every sample $t$ to be tested, the ML-KNN algorithm first determines $N(t)$ and $C_t(l)$. Let $H_1^l$ and $H_0^l$ denote the events that sample $t$ does and does not belong to label $l$, respectively. Meanwhile, let $E_j^l$ ($j \in \{0, 1, \ldots, K\}$) denote the event that exactly $j$ of the $K$ neighbors of $t$ belong to label $l$. The label of sample $t$ is then obtained from $C_t(l)$ and $E_j^l$:

$$y_t(l) = \arg\max_{b \in \{0,1\}} \frac{P(H_b^l)\, P(E_{C_t(l)}^l \mid H_b^l)}{P(E_{C_t(l)}^l)} = \arg\max_{b \in \{0,1\}} P(H_b^l)\, P(E_{C_t(l)}^l \mid H_b^l) \tag{2}$$

Algorithm 1.

Input: Initializing the node
Output: Random sequence array P(H)

The smoothing factor $s$ is used in calculating the prior probability $P(H_b^l)$.

(1) For $l \in Y$ do
(2)     $P(H_1^l) = \left(s + \sum_{i=1}^{m} y_{x_i}(l)\right) / (s \times 2 + m)$; $P(H_0^l) = 1 - P(H_1^l)$

Calculate the posterior probability $P(E_{C_t(l)}^l \mid H_b^l)$.

(3) For $l \in Y$ do
(4)     For $j \in \{0, 1, \ldots, K\}$ do
(5)         $c[j] = 0$; $c'[j] = 0$
(6)     For $i \in \{1, \ldots, m\}$ do
(7)         $\delta = C_{x_i}(l) = \sum_{a \in N(x_i)} y_a(l)$
(8)         If $y_{x_i}(l) == 1$, then $c[\delta] = c[\delta] + 1$
(9)         Otherwise, $c'[\delta] = c'[\delta] + 1$
(10)    For $j \in \{0, 1, \ldots, K\}$ do
(11)        $P(E_j^l \mid H_1^l) = (s + c[j]) / \left(s \times (K + 1) + \sum_{p=0}^{K} c[p]\right)$
(12)        $P(E_j^l \mid H_0^l) = (s + c'[j]) / \left(s \times (K + 1) + \sum_{p=0}^{K} c'[p]\right)$

Calculate $y_t$.

(13) For $l \in Y$ do
(14)     $y_t(l) = \arg\max_{b \in \{0,1\}} P(H_b^l)\, P(E_{C_t(l)}^l \mid H_b^l)$
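The steps of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes NumPy, brute-force Euclidean nearest neighbors, and ties in the argmax resolved toward "label absent". The function names `mlknn_fit` and `mlknn_predict` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def mlknn_fit(X, Y, k=3, s=1.0):
    """Estimate the ML-KNN priors and likelihoods of Algorithm 1.

    X: (m, d) feature matrix, Y: (m, q) binary label matrix,
    k: number of neighbours, s: smoothing factor.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=int)
    m, q = Y.shape

    # Step (2): smoothed prior P(H_1^l) for every label l.
    prior1 = (s + Y.sum(axis=0)) / (2 * s + m)

    # k nearest training neighbours of each training sample (self excluded).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :k]

    # Step (7): delta[i, l] = C_{x_i}(l), neighbours of x_i that carry label l.
    delta = Y[nbrs].sum(axis=1)

    # Steps (5)-(9): c[l, j] counts samples WITH label l that have j such
    # neighbours, c_prime[l, j] counts samples WITHOUT label l.
    c = np.zeros((q, k + 1))
    c_prime = np.zeros((q, k + 1))
    for l in range(q):
        for i in range(m):
            if Y[i, l] == 1:
                c[l, delta[i, l]] += 1
            else:
                c_prime[l, delta[i, l]] += 1

    # Steps (11)-(12): smoothed likelihoods P(E_j^l | H_1^l) and P(E_j^l | H_0^l).
    like1 = (s + c) / (s * (k + 1) + c.sum(axis=1, keepdims=True))
    like0 = (s + c_prime) / (s * (k + 1) + c_prime.sum(axis=1, keepdims=True))
    return {"X": X, "Y": Y, "k": k,
            "prior1": prior1, "like1": like1, "like0": like0}


def mlknn_predict(model, t):
    """Steps (13)-(14): MAP decision for every label of a test sample t."""
    X, Y, k = model["X"], model["Y"], model["k"]
    nbrs = np.argsort(np.linalg.norm(X - np.asarray(t, dtype=float), axis=1))[:k]
    ct = Y[nbrs].sum(axis=0)                 # C_t(l) for every label l
    labels = np.arange(Y.shape[1])
    p1 = model["prior1"] * model["like1"][labels, ct]
    p0 = (1.0 - model["prior1"]) * model["like0"][labels, ct]
    return (p1 > p0).astype(int)             # ties fall back to 0 (assumption)


# Toy data (hypothetical): two well-separated groups, one label each.
X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                   [5, 5], [5, 6], [6, 5], [6, 6]])
Y_demo = np.array([[1, 0]] * 4 + [[0, 1]] * 4)
demo_model = mlknn_fit(X_demo, Y_demo, k=3, s=1.0)
print(mlknn_predict(demo_model, [0.5, 0.5]))
```

The sketch stores the training set inside the returned dictionary because prediction needs the neighbors of the test sample; a production version would typically replace the brute-force distance matrix with an indexed neighbor search.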

4. Multi-label user classification based on ML-KNN

4.1. Calculating multi-label correlation coefficients

This paper uses a heterogeneous network to describe user interaction and extracts user features using RS scores in multi-label scenarios. Heterogeneous networks therefore need to be constructed in multi-label scenarios. As a first step, 60 seed users are manually selected for various topics. The second step is to extract the 300 most recent tweets published by each seed user and then extract all @-mentioned user names from these 300 tweets, which results in a set U. Finally, the seed users are deleted from U, and the remaining users are used as user nodes to build the heterogeneous network.

This paper selects some users and manually labels them for each topic, for training the follow-up model and calculating the RS score. In addition, the heterogeneous relational network in the multi-label scenario is constructed. It contains user nodes and entity nodes under various topics. The paper connects user nodes in the network based on three types of user connection relationships: following/being followed, RT forwarding/being forwarded, and @ mentioning/being mentioned. Users and entity nodes are connected based on affiliation.

Finally, the RS score of each node in the network is calculated. Rather than a single correlation coefficient, one value is calculated for each topic, and each value indicates the degree of interest of a user in the topic or the possibility that an entity belongs to the topic. These values are expressed as $RS_{Topic_1}, RS_{Topic_2}, \ldots, RS_{Topic_k}$, the correlation coefficients of topics 1, 2, …, k, respectively.

4.2. Extracting features of multi-label relationship

During the extraction of multi-label user relationship features, the user relationship features do not consist of a single K-dimensional vector but of several K-dimensional vectors, which represent the probability distribution for each topic to which the user belongs. To alleviate the problems caused by feature redundancy and sparse connections between users of different topics, when extracting RS scores for a user under each topic, this section takes into account all neighbor nodes under the three user connection relationships: the follow relationship, the RT forwarding relationship, and the @ mention relationship.

Fig. 2 illustrates how user relationship features can be extracted in multi-label scenarios. The RT forwarding relationship between the user and U2 indicates that the message sent by the user is forwarded by U2, and the @ mention relationship between the user and U3 and U5 indicates that the user @-mentions U3 and U5 when sending the message. The follow relationship between the user and U1 and U4 indicates that the user follows U1 and U4. The figure also depicts the RS scores for U1, U2, U3, U4 and U5.

Fig. 2. Schematic diagram of extraction.

As a result, the 8-dimensional user relationship characteristics on topic 1 are [0,1,0,1,1,1], the 8-dimensional user relationship characteristics on topic 2 are [0,2,0,1,1,0,1,2,0], and the 8-dimensional user relationship characteristics on topic k are [0,1,1,0,1,2,0].

During multi-label user entity relationship feature extraction, the RS scores of each entity on the various topics are calculated first. For each user, the distribution of all of the user's entities over each of the k topics is then computed, which gives the user entity relationship features under each topic. Concatenating the user entity relationship features of the k topics in series yields the user entity relationship features of each user in multi-label scenarios. In multi-label scenarios, these features can be used to show how the messages posted by users are distributed across different topics.

4.3. Extracting the user community

Prior to dividing nodes into communities in a heterogeneous network, it is necessary to calculate the similarity between any two nodes, since only similar nodes can be placed in the same community. To determine user similarity, each user is first represented as a feature vector, and the similarity of any two users is then calculated based on Euclidean distance or cosine similarity. The Latent Dirichlet Allocation (LDA) topic model can extract shallow semantic information from texts and represent users as topic distributions. Therefore, this paper uses the LDA model to represent user interests.

Consider a set of documents D that contains M documents, where document d has $N_d$ words, and all the words contained in D constitute a dictionary. First, suppose that the number of topics is K and that the topic distribution follows a Dirichlet distribution. For any document d, the paper therefore uses the Dirichlet distribution as the prior distribution of its topic distribution $\theta_d = \langle p_1, p_2, \ldots, p_K \rangle$, which can be calculated as follows:

$$\theta_d = \mathrm{Dirichlet}(\alpha), \quad d = 1, 2, \ldots, M \tag{3}$$

where $\alpha$ represents the hyperparameter of the distribution, which is a K-dimensional vector and represents the initial probability of document d belonging to each of the K topics. Meanwhile, the LDA model assumes that the prior distribution of words in a topic is also a Dirichlet distribution. In other words, for any topic k, its word distribution $\beta_k = \langle p_{w_1}, p_{w_2}, \ldots, p_{w_V} \rangle$ is given by the following formula:

$$\beta_k = \mathrm{Dirichlet}(\eta), \quad k = 1, 2, \ldots, K \tag{4}$$

where $\eta$ represents the hyperparameter of the distribution, which is a |V|-dimensional vector and represents the probability of generating any word in V under the topic. For the nth word in any document d in the data set, the topic index $z_{dn}$ is drawn from the topic distribution $\theta_d$ according to a multinomial distribution, as shown in the following formula:

$$z_{dn} = \mathrm{multi}(\theta_d), \quad d = 1, 2, \ldots, M \tag{5}$$

Finally, M + K Dirichlet-multinomial conjugate prior distributions are obtained, and estimating these M + K distributions is essential to solving the LDA topic model, which is calculated as follows:

$$p(w \mid d) = \sum_{k=1}^{K} p(w \mid k)\, p(k \mid d) \tag{6}$$

where $p(k \mid d)$ is $\theta_d$ and $p(w \mid k)$ is $\beta_k$.

4.4. User overlap community division

There are two types of existing community detection algorithms: overlapping community detection algorithms and non-overlapping community detection algorithms. A non-overlapping community detection algorithm assigns each node in the network to a community that does not overlap with any other community. Classic non-overlapping community detection algorithms include the Infomap algorithm (Rosvall and B., 2008), the LPA label propagation algorithm (Raghavan et al., 2007) and the GN algorithm (Newman and Girvan, 2004). Most networks, however, have overlapping rather than non-overlapping community structures; that is, a node may belong to more than one community simultaneously. Several classical algorithms can be used to detect overlapping communities, including K-Clique (Palla and Derényi, 2005), CONGO (Gregory, 2007), and edge community detection algorithms (Ahn et al., 2010; Noor et al., 2020).

4.5. Extracting community features

In a large-scale network structure, there is usually some degree of similarity between nodes belonging to the same community. Two nodes may be similar in network structure as well as in topic. By dividing users into communities based on topic similarity, this paper extracts features that represent users' multi-semantic attributes. Using the overlapping community detection algorithm, large-scale network structures can be divided into overlapping communities in linear time. Compared with other overlapping community detection algorithms, this algorithm further improves the accuracy and stability of community detection. This paper therefore uses it to divide users into communities.

For an unweighted network G = (V, E, w), V represents all nodes in the network and E represents the network edge set. If there is an edge between two nodes x and y, then w(x, y) = 1. Otherwise, w(x, y) = 0. Node importance is defined as follows:

$$NI(u) = \frac{1}{2} + \frac{1}{2} \cdot \frac{(d_u + Tri_u) - \min_{i \in V}(d_i + Tri_i)}{\max_{j \in V}(d_j + Tri_j) - \min_{i \in V}(d_i + Tri_i)} \tag{7}$$

$$Tri_u = \sum_{x, y \in Ng(u),\; e_{xy} \in E} \frac{w(x, y) + w(x, u) + w(y, u)}{3} \tag{8}$$

$$d_u = \sum_{v \in Ng(u)} w(v, u) \tag{9}$$

where $d_u$ represents the number of neighbor nodes of u, and $Tri_u$ represents the total number of triangles formed between node u and its neighbor nodes. The importance of a node can be evaluated by the value of $d_u + Tri_u$.

There is a strong correlation between the similarity of two nodes and their likelihood of belonging to the same category. Before measuring the similarity of nodes, however, it is necessary to characterize the nodes using feature vectors. For each node in the network, the paper extracts the 300 most recent tweets published by the node and concatenates them end-to-end to create one long text. The LDA topic model is then used to represent each user as a topic probability distribution $T_u = \langle p_{t_1}, p_{t_2}, \ldots, p_{t_k} \rangle$. Lastly, the similarity between any two nodes in the network is determined using cosine similarity, as shown in Formula (10).

$$sim(x, y) = \cos(\theta) = \frac{T_x \cdot T_y}{|T_x| \times |T_y|} \tag{10}$$

where $T_x$ and $T_y$ represent the LDA topic probability distributions of nodes x and y, respectively.

A key concept in label propagation is the influence weight of neighbor nodes. The labels of a node can only be influenced by neighbor nodes with relatively large weights. Consequently, this paper uses the NI coefficient and the similarity to measure the influence weight of nodes, as shown in Formula (11).

$$NNI_v(u) = \sqrt{NI(v) \times \frac{sim(v, u)}{\max_{h \in Ng(u)} sim(h, u)}} \tag{11}$$

$NNI_v(u)$ represents the influence weight that neighbor v imposes on node u. Nodes with higher importance and similarity have a greater influence on their neighbors. The whole process of the community detection algorithm is presented below.

(1) Calculate the values of NI, sim and NNI for all nodes in the network, and initialize the community of each node according to the serial number of the node in the network. In other words, the number of communities in the network is equal to the number of nodes in the initial scenario.
(2) Sort the nodes by NI value in ascending order so that the label information of nodes whose topic features are prominent is propagated to other nodes. During label propagation, the nodes with smaller NI values are therefore updated first, because they have less influence on the labeling of other nodes and need to be influenced by nodes with larger NI values.
(3) Obtain the neighbor label sequence $L_{Ng} = \{l(c_1, b_1), l(c_2, b_2), \ldots, l(c_v, b_v)\}$ of node u according to the label information of its neighbors, where $l(c_i, b_i)$ represents the label information of neighbor node i; $c_i$ represents the label to which node i is most likely to belong among all the labels of node i; and $b_i$ represents the possibility that node i belongs to $c_i$.
(4) Calculate the possibility $b'(c, u)$ that node u belongs to each neighbor label by means of $L_{Ng}$ and NNI. The formulas are expressed as follows:

$$b'(c, u) = \frac{\sum_{l(c_v, b_v) \in L_{Ng},\; v \in Ng(u),\; c_v = c} b(c_v, v) \times NNI_v(u)}{\sum_{l(c_v, b_v) \in L_{Ng},\; v \in Ng(u)} b(c_v, v) \times NNI_v(u)} \tag{12}$$

5
A. Huang et al. Technological Forecasting & Social Change 188 (2023) 122271

$$L' = \{\, l(c_1, b'_1),\ l(c_2, b'_2),\ \ldots,\ l(c_{|L'|}, b'_{|L'|}) \,\}, \qquad \sum_{l(c, b') \in L'} b'(c, u) = 1 \tag{13}$$

(5) Update the label set of each node according to the latest L'. In this paper, only the labels that satisfy b'(c, u) ≥ 1/|L'| are retained in the final multi-label set of the node.

(6) Obtain b''(c, u) by normalizing b' in L_u, and select the category c with the largest b''(c, u) as the most likely label of node u. The formula is expressed as follows:

$$b''(c, u) = \frac{b'(c, u)}{\sum_{l(c, b') \in L''} b'(c, u)}, \qquad \sum_{l(c, b') \in L''} b''(c, u) = 1 \tag{14}$$

(7) If the community division is the same as that obtained in the previous round, i.e., the community structure does not change, the iteration is terminated.

(8) Number the communities that are finally output by the overlapping community detection algorithm and represent them with a one-hot vector, in which each dimension corresponds to one community. For each user node in the network, the dimension corresponding to a community is 1 if the node is a member of that community and 0 otherwise.

4.6. Training and classifying of the multi-label model

Based on the supervised ML-KNN algorithm, the user multi-label classification model proposed in this paper learns from labeled data sets in order to label unlabeled samples. Every sample in the multi-label training data set must first be vectorized so that the data take the form (x, y), where x is the sample's feature vector and y is its label set. Because the dimension of the feature vector extracted in this paper depends directly on the number of target topics, the characterization method is introduced by taking politics, military, and economy as example topics for multi-label classification.

For the feature vector x in multi-label scenarios, each user's multi-label relationship features and community features are extracted. Multi-label relationship features include both user relationship features and user entity relationship features. The user relationship feature is the concatenation of the user relationship features under the three themes. In the same way, the user entity relationship feature is the concatenation of the user entity relationship features under the three themes.

To extract users' community features, the overlapping community detection algorithm is used to divide the multi-label heterogeneous network into overlapping communities. The network is divided into 616 communities, so the final community feature is a 616-dimensional one-hot vector. In multi-label scenarios, as shown in Table 2, the final feature vector of each node is obtained by concatenating the community features, user relationship features, and user entity relationship features.

y is a binary label vector. Suppose, for example, that a sample belongs to both labels 2 and 3 in a classification task with three labels; then y = [0, 1, 1]. During training, the supervised ML-KNN multi-label classification model is fitted on the labeled training set. For a user to be labeled, x_test, the input of the ML-KNN model is a 664-dimensional feature vector and the output is y = [l1, l2, l3]. During prediction, if li = 1, the sample belongs to label i; otherwise, the sample does not belong to label i.

Table 2
Summary of multi-label classification features.

| Category | Feature | Dimension |
|---|---|---|
| User relationship features | User relationship features of political topics | 8 |
| | User relationship features of economic topics | 8 |
| | User relationship features of military topics | 8 |
| User entity relationship features | User entity relationship features of political topics | 8 |
| | User entity relationship features of economic topics | 8 |
| | User entity relationship features of military topics | 8 |
| Community features | Community features | 616 |
| Mixed features | Mixed features | 664 |

5. Experimental results and analysis

5.1. The data source

A Twitter data set is used to evaluate the performance of the MLUCHNCD algorithm (Multi-label User Classification with Heterogeneous Network and Community Detection) proposed in this paper. Two data sets were chosen in order to test the applicability of the MLUCHNCD algorithm in practical scenarios. Politics, military, and economy are the topics of data set 1, while economy, education, and sports are the topics of data set 2. To make the constructed heterogeneous network include more users of related topics, 60 seed users were manually selected for each topic in the two data sets. In addition, 60 seed users were manually selected for each of six multi-label themes: economy-politics, economy-military, politics-military, economy-education, education-sports, and economy-sports. The latest 400 tweets were extracted from each of the 360 seed users in data set 1, and all 12,308 users @-mentioned by the 300 seed users were obtained by processing the collected tweets. These 12,308 users were used to construct the multi-label heterogeneous relationship network, and 1653 of them were randomly selected for labeling. For data set 2, a heterogeneous network comprising 8000 user nodes, of which 1678 were labeled, was constructed.

The following indicators are used for evaluation: Hamming Loss, Precision, Recall, Accuracy, and F1 score. The latter four are calculated as follows:

$$Precision_{multi} = \frac{1}{N} \sum_{i=1}^{N} \frac{\| h(x_i) \cap Y_i \|_1}{\| h(x_i) \|_1} \tag{15}$$

$$Recall_{multi} = \frac{1}{N} \sum_{i=1}^{N} \frac{\| h(x_i) \cap Y_i \|_1}{\| Y_i \|_1} \tag{16}$$

$$Accuracy_{multi} = \frac{1}{N} \sum_{i=1}^{N} \frac{\| h(x_i) \cap Y_i \|_1}{\| h(x_i) \cup Y_i \|_1} \tag{17}$$

$$F1\ score = \frac{1}{N} \sum_{i=1}^{N} \frac{2\, | h(x_i) \cap Y_i |}{| Y_i | + | h(x_i) |} \tag{18}$$

where h(xi) represents the predicted label set and Yi represents the manually annotated label set.

5.2. Experimental results

5.2.1. Evaluating the performance of the MLUCHNCD algorithm

Table 3 presents the test results. The proposed multi-label classification algorithm for social media users performs well on a variety of classification tasks. Classification performance is better on data set 1 than on data set 2. This is because, in data set 1, the


contents of each topic overlap, which suggests some interaction between the users of each topic and results in a denser heterogeneous network that represents users better. In data set 2, by contrast, the content of each topic is very different and rarely overlaps with that of the others, which indicates that users of different topics seldom interact; the heterogeneous network constructed is therefore sparse and represents users less effectively.

Table 3
Results of the performance evaluation of multi-label classification.

| | Hamming Loss | Accuracy_multi | Precision_multi | Recall_multi | F1_score |
|---|---|---|---|---|---|
| Data set 1 | 0.0357 | 0.911 | 0.879 | 0.862 | 0.868 |
| Data set 2 | 0.048 | 0.911 | 0.890 | 0.803 | 0.829 |

Tables 4 and 5 present the results for each category on data sets 1 and 2. Some users have no neighbors during feature extraction, which causes feature extraction to fail for them. As a result, the numbers of labeled users used in the experiment are 1470 for data set 1 and 1462 for data set 2. For training and prediction, 70% of the labeled data are used as the training set and 30% are used as the test set to evaluate the model's performance.

Table 4
Matrix of multi-label classification confusion (data set 1).

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 117 | 0 | 9 | 0 | 0 | 0 | 0.911 | 0.929 | 0.908 |
| 2 | 0 | 31 | 1 | 1 | 0 | 1 | 0.969 | 0.912 | 0.938 |
| 3 | 9 | 0 | 33 | 0 | 0 | 0 | 0.750 | 0.799 | 0.769 |
| 4 | 3 | 2 | 0 | 134 | 0 | 1 | 0.950 | 0.968 | 0.949 |
| 5 | 1 | 0 | 1 | 0 | 96 | 3 | 1.00 | 0.948 | 0.980 |
| 6 | 4 | 0 | 0 | 8 | 0 | 17 | 0.770 | 0.619 | 0.678 |

Note: 1, 2, 3, 4, 5 and 6 represent economy, economy-military, economy-political, military, political, and political-military, respectively. 7, 8 and 9 represent Precision, Recall Rate, and F1 value, respectively.

Table 5
Matrix of multi-label classification confusion (data set 2).

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 147 | 1 | 0 | 0 | 0 | 0 | 0.879 | 0.991 | 0.929 |
| 2 | 6 | 16 | 0 | 0 | 2 | 0 | 0.340 | 0.712 | 0.79 |
| 3 | 5 | 0 | 29 | 6 | 0 | 0 | 0.677 | 0.619 | 0.648 |
| 4 | 3 | 0 | 2 | 138 | 0 | 0 | 0.931 | 0.842 | 0.891 |
| 5 | 2 | 3 | 0 | 0 | 41 | 0 | 1.00 | 0.630 | 0.771 |
| 6 | 1 | 3 | 2 | 5 | 1 | 21 | 0.928 | 0.969 | 0.950 |

Note: 1, 2, 3, 4, 5 and 6 represent economy, economy-military, economy-political, military, political, and political-military, respectively. 7, 8 and 9 represent Precision, Recall Rate, and F1 value, respectively.

Tables 4 and 5 show that classification performance is better for single-label users than for multi-label users. This is consistent with practical application scenarios: for a multi-label user, the classification algorithm must predict every label correctly, whereas for a single-label user it must predict only one. In practice, however good the algorithm is, it is likely to mispredict one or more labels when the label set contains more than one label.

5.2.2. Validation tests of various features

Because the multi-label scenario involves a large number of labels, treating different kinds of neighbor user nodes separately gives the extracted multi-label relationship features a higher dimension and can cause feature redundancy. Therefore, this paper does not distinguish the types of neighbor user nodes when extracting users' multi-label relationship features: following/followed, RT forward/forwarded, and @ Mention/Mentioned neighbor nodes are unified into one node type. Treating all types of neighbor users as one still allows the value brought by the different types of neighbor users to be fully utilized. Experiments are therefore conducted to show that multi-label relational features constructed from union-type-neighbor nodes perform similarly to those constructed from separate single-type-neighbor nodes.

Fig. 3 shows the performance difference between the two ways of constructing multi-label relational features on data set 1. Both achieve good classification performance on the five evaluation indices, and the features constructed from single-type neighbors perform slightly better. That method will, however, result in feature redundancy and increase computational complexity when the number of multi-label topics is large. Therefore, in contrast to the binary classification scenario, multi-label relationship features can be extracted directly from union-type neighbors in the multi-label scenario.

Fig. 3. Discriminating and undiscriminating neighbor node types.

To demonstrate the validity of the community features proposed in this paper, the performance obtained when users are represented by the community feature alone, the relation feature alone, and the hybrid feature is compared.

Fig. 4 illustrates the performance evaluation results for each of the three kinds of features. A user representation based on multi-label relational features performs much better in classification than one based on community features. The reason is that the community detection algorithm is unsupervised, and it is difficult to divide users accurately by subject without supervision. In practical applications, most community detection algorithms are used only for a preliminary division of the network into communities. In contrast, multi-label relationship features represent the distribution of users' interests across different topics, so they are better able to represent the possibility that users have interests in several topics.

Fig. 4. Performance analysis under different characteristics.

Although community features alone perform slightly worse, adding the community detection results during feature mixing improves the performance of the final algorithm, which demonstrates the effectiveness of community detection for multi-label user classification.

5.2.3. Performance comparison of different multi-label classification algorithms

To demonstrate the effectiveness of ML-KNN as the multi-label learning algorithm in this paper, two existing multi-label classification algorithms, BR (Binary Relevance) and LP (Label Powerset), are also used to train the model, and the performance of the three algorithms is compared.

Fig. 5. Evaluation results of different multi-label classification algorithms.

Fig. 6. Evaluation results of the multi-label classification algorithms for different users.

Fig. 5 presents the performance evaluation results for the ML-KNN, BR, and LP multi-label learning algorithms, where Precision, Recall, and F1 values are computed with the Micro Average strategy. The figure shows that the ML-KNN algorithm used in this paper performs best, followed by the BR algorithm, and that the LP algorithm performs worst. The LP algorithm converts each combination of labels into a single label, so that the multi-label problem can be solved as a single-label classification problem. However, this approach has limitations: it requires a large amount of training data, and it can only predict label combinations that already exist in the training set. Because the BR algorithm uses multiple binary classifiers to solve the multi-label problem, its performance lies between the two. Since the BR algorithm is essentially a set of binary classification problems, it requires less training data than the LP algorithm. Additionally, binary classification problems generally have higher classification performance than multi-classification


problems, which is also the reason why BR is superior to LP.

5.2.4. Performance comparison of algorithms for different users

To test the effectiveness of the proposed algorithm, this paper compares the performance of the MLUCHNCD algorithm, the MROC algorithm, and the edgecluster algorithm.

The MROC algorithm uses a hierarchical clustering strategy on the constructed topological network to divide the nodes into communities. The partition results are used to represent each user, and an SVM algorithm is then applied for training and prediction. The idea behind the edgecluster algorithm is to express each edge in the network as a one-hot vector based on its nodes, cluster the edges with the k-means algorithm, and then express each node as a k-dimensional feature vector (k being the number of k-means clusters) based on the types of the edges connected to that user node. Finally, the SVM algorithm is used to train and predict multi-label models for users.

Fig. 6 illustrates the multi-label classification performance for social media users under the different algorithms. As shown in the figure, MLUCHNCD, the multi-label classification algorithm proposed in this paper, performs better than MROC and edgecluster.

The poor performance of the MROC algorithm can be attributed to the fact that it divides communities based only on user connections, which makes the community division unreliable. In addition, during community merging, the algorithm merges a center community with the neighbor community that has the highest Jaccard coefficient. However, this only shows that, among the neighbor communities, that community is the most similar to the center community; it remains unknown whether the neighbor community with the highest Jaccard coefficient is actually related to the center community. The edgecluster algorithm first characterizes each edge by its nodes, then clusters the edges, and finally represents each node by the types of the edges connected to it. Compared with a representation based only on community division, this node characterization is more interpretable and provides more useful information. However, because the method clusters edges, it is difficult to apply to large networks, as it requires powerful hardware.

6. Conclusion

Considering the complexity of existing multi-label classification algorithms, feature extraction and model training in this paper are carried out in linear time. Because existing multi-label classification algorithms are limited in how they characterize users, this study extracts user relationship features and user entity relationship features in multi-label scenarios using heterogeneous networks. These features can effectively describe situations in which users belong to several topics. In addition, since users with common beliefs and hobbies tend to contact each other, users with common themes can be divided into one community. To accomplish this, overlapping communities are detected; a user who belongs to more than one community is assigned to multiple communities at the same time. The results of overlapping community division can therefore be used as users' community features, identifying the topics to which a user belongs in the most direct manner. The method can also effectively describe the topic similarity between two users without a direct connection. A comparison with existing user classification methods shows that the proposed method significantly improves social media user classification.

Although the heterogeneous network-based social media user classification algorithm proposed in this paper improves classification performance to some extent, it also has some shortcomings. Future studies can improve on the following aspects:

Firstly, during feature extraction based on heterogeneous networks, this paper considers only the one-hop correlation between neighbor nodes, although the heterogeneous network contains a great deal of other valuable information. Secondly, users in this paper are represented only by network structure features. In the future, users could be classified based on text information, personal attribute information, and network information. Thirdly, this paper uses ML-KNN to train a supervised user classification model for the multi-label classification problem. In contrast, deep learning frameworks such as RNN and LSTM can represent users more finely while taking label correlation into account. Multi-label classification with deep learning models may therefore achieve better classification results if future conditions permit. Finally, because of time constraints, only a few topics were examined in the data sets for this study. Future studies can determine whether the proposed method is effective on additional topics.

CRediT authorship contribution statement

Anzhong Huang and Meiwen Guo contributed equally to this work. Rui Xu and Xiaofang Liu analyzed the data.

Funding information

This study was supported by the “13th Five-Year” plan research project of philosophy and social sciences of Guangdong province (grant number: GD17YGL03) and by the collaborative education project of industry-university cooperation of the Ministry of Education, “Research on the curriculum reform of online marketing based on the integration of online and offline interaction” (grant number: 202102209025).

Declaration of competing interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Data availability

No data was used for the research described in the article.


Anzhong Huang. E-mail: [email protected]. Detailed-Address: 666#, Changhui Road, Dant District, Zhenjiang City, China, Postcode 212100. Degree: Ph.D. Affiliation: School of Economics and Management, Jiangsu University of Science and Technology. Expertise: Inclusive finance and financial technology. Anzhong Huang received his doctorate from Nanjing University in June 2006. After graduating in 2006, he worked in the School of Economics of Anhui University of Technology as an associate professor and master's tutor. In April 2012, he became a postdoctoral researcher at the School of Economics and Management, Beijing Jiaotong University. After completing his postdoctoral work in 2014, he worked as a full professor and master's tutor at Jiangsu Normal University. Since 2019, he has worked in the School of Economics and Management, Jiangsu University of Science and Technology. He has been devoted to research on inclusive finance and financial technology. Since 2006, he has published 42 papers in SSCI, SCI, and CSSCI journals as sole or first author, as well as a monograph.

Rui Xu. E-mail: [email protected]. Detailed-Address: 666#, Changhui Road, Dant District, Zhenjiang City, China, Postcode 212100. Degree: Master. Affiliation: School of Economics and Management, Jiangsu University of Science and Technology. Expertise: Financial and regional economic development. Rui Xu graduated from Nanjing Institute of Technology, majoring in material mechanics, and entered the School of Economics and Management, Jiangsu University of Science and Technology, for a master's degree in economics in 2020. He applies system simulation and other foundations from his materials major to the development of finance and regional economics. In 2021, his thesis “Research on Micro-credit Practice Mode Innovation and Poverty Alleviation Efficiency” won the Excellence Award of the Jiangsu Graduate Forum.

Yu Chen. E-mail: [email protected]. Detailed-Address: 100#, Outside the West Road, University town, Panyu District, Guangzhou City, Guangdong Province, China, Postcode 510006. Degree: Master. Affiliation: School of Electromechanical Engineering, Guangdong University of Technology. Expertise: System simulation and financial innovation. Yu Chen is currently a third-year master's student in the School of Mechanical and Electrical Engineering of Guangdong University of Technology. His research direction is system simulation and financial innovation. During the first year, he completed the academic courses and studied papers in his major and in finance through the Internet. During the second year, he was actively involved in laboratory experiments and testing work. At present, the theory and process research on the parts of aircraft engine guide has been completed. During his school years, he won a university-level scholarship, participated in the publication of an SCI paper and a journal paper, and became a CPC member.

Meiwen Guo. E-mail: [email protected]. Detailed-Address: No. 19, Huamei Road, Tianhe District, Guangzhou City, China, Postcode 510520. Degree: Master, PhD candidate. Affiliation: Guangzhou Xinhua University. Expertise: Management science, information management, business intelligence system. Meiwen Guo is among the first group of e-commerce engineers in China and the Top 100 e-commerce masters in Guangdong Province. She is a PhD candidate at USM, an Associate Professor in the School of Management, Guangzhou Xinhua University, and a Senior Research Professor at the Entrepreneurship Center, Sun Yat-sen University. She has published more than 20 papers in the field of management science and information management, indexed by SCI, SSCI, EI, CSSCI, and other databases, as well as three books and a patent. She has presided over and participated in 10 projects, all of which have been completed; some of them have won excellence awards. Specialization: management science and engineering, sustainable development, information management, intelligent system.

From Everand
Fundamentals of Machine Learning: a Simplified Approach
Er. Sudhir Goswami
No ratings yet
Multi-Label Classification Via Closed Frequent Labelsets and Label Taxonomies
No ratings yet
Multi-Label Classification Via Closed Frequent Labelsets and Label Taxonomies
34 pages
Self-Supervised Learning: Teaching AI with Unlabeled Data
From Everand
Self-Supervised Learning: Teaching AI with Unlabeled Data
Robert Johnson
No ratings yet
Bootstrapping Language-Image Pretraining: The Complete Guide for Developers and Engineers
From Everand
Bootstrapping Language-Image Pretraining: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Data Science: Concepts, Strategies, and Applications
From Everand
Data Science: Concepts, Strategies, and Applications
Zemelak Goraga
No ratings yet
Building Support Structures, 2nd Ed., Analysis and Design with SAP2000 Software
From Everand
Building Support Structures, 2nd Ed., Analysis and Design with SAP2000 Software
Wolfgang Schueller
4.5/5 (15)
Extreme Learning Machine For Multi-Label Classification
No ratings yet
Extreme Learning Machine For Multi-Label Classification
12 pages
Pragmatic Machine Learning with Python: Learn How to Deploy Machine Learning Models in Production
From Everand
Pragmatic Machine Learning with Python: Learn How to Deploy Machine Learning Models in Production
Avishek Nag
No ratings yet
A Novel Cognitive Multi-Label Classification Model
No ratings yet
A Novel Cognitive Multi-Label Classification Model
13 pages
MPT: Architecture, Training, and Applications: The Complete Guide for Developers and Engineers
From Everand
MPT: Architecture, Training, and Applications: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Addressing Imbalance Problem For Multi Label Class
No ratings yet
Addressing Imbalance Problem For Multi Label Class
19 pages
Grounded Language-Image Pre-training Approaches: The Complete Guide for Developers and Engineers
From Everand
Grounded Language-Image Pre-training Approaches: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
DINO: Self-Supervised Vision Transformers Explained
From Everand
DINO: Self-Supervised Vision Transformers Explained
William Smith
No ratings yet
CHATGPT DALL.E 3: Complete Guide. Third Edition
From Everand
CHATGPT DALL.E 3: Complete Guide. Third Edition
Hesham Mohamed Elsherif
No ratings yet
PLDL-A Novel Method For LDL
No ratings yet
PLDL-A Novel Method For LDL
1 page
Multi-Label Classification Using Labels As Hidden Nodes
No ratings yet
Multi-Label Classification Using Labels As Hidden Nodes
34 pages
Bharat Hic ML 2010
No ratings yet
Bharat Hic ML 2010
8 pages
Reverse PRL
No ratings yet
Reverse PRL
8 pages
IGNOU BCA System Analysis and Design Previous Year Solved Papers MCS 014
From Everand
IGNOU BCA System Analysis and Design Previous Year Solved Papers MCS 014
Manish Soni
No ratings yet
Cloud-Based Multi-Modal Information Analytics
From Everand
Cloud-Based Multi-Modal Information Analytics
Tanushri Kaniyar
No ratings yet
Machine Learning Fundamentals: Concepts, Models, and Applications
From Everand
Machine Learning Fundamentals: Concepts, Models, and Applications
Amar Sahay
No ratings yet
Survey On Multiclass Classification Methods
No ratings yet
Survey On Multiclass Classification Methods
9 pages
Classification Clustering Recommender System
No ratings yet
Classification Clustering Recommender System
12 pages
R Journal
No ratings yet
R Journal
14 pages
Workshop Master Revealed
From Everand
Workshop Master Revealed
Anil Soni
No ratings yet
NN MLC Ecml2014 Camera Ready
No ratings yet
NN MLC Ecml2014 Camera Ready
17 pages
Foundational Models and Architectures S1: Generative AI, #1
From Everand
Foundational Models and Architectures S1: Generative AI, #1
Leaster Startx
No ratings yet
Deep Learning For Multi-Label Classification: Jesse Read, Fernando Perez-Cruz
No ratings yet
Deep Learning For Multi-Label Classification: Jesse Read, Fernando Perez-Cruz
8 pages
Deep Reinforcement Learning: An Essential Guide
From Everand
Deep Reinforcement Learning: An Essential Guide
Robert Johnson
No ratings yet
Peerj Cs 2093
No ratings yet
Peerj Cs 2093
31 pages
MACHINE LEARNING FOR BEGINNERS: A Practical Guide to Understanding and Applying Machine Learning Concepts (2023 Beginner Crash Course)
From Everand
MACHINE LEARNING FOR BEGINNERS: A Practical Guide to Understanding and Applying Machine Learning Concepts (2023 Beginner Crash Course)
Elaine Tate
No ratings yet
Honeycomb BubbleUp for Effective Observability: The Complete Guide for Developers and Engineers
From Everand
Honeycomb BubbleUp for Effective Observability: The Complete Guide for Developers and Engineers
William Smith
No ratings yet
Applied Data Mining with Weka: Definitive Reference for Developers and Engineers
From Everand
Applied Data Mining with Weka: Definitive Reference for Developers and Engineers
Richard Johnson
No ratings yet
Ultimate Enterprise Data Analysis and Forecasting using Python: Leverage Cloud platforms with Azure Time Series Insights and AWS Forecast Components for Deep learning Modeling using Python (English Edition)
From Everand
Ultimate Enterprise Data Analysis and Forecasting using Python: Leverage Cloud platforms with Azure Time Series Insights and AWS Forecast Components for Deep learning Modeling using Python (English Edition)
Shanthababu Pandian
No ratings yet
Machine Learning with Python: Foundations and Applications: ML, #1
From Everand
Machine Learning with Python: Foundations and Applications: ML, #1
Mohammed Nurudeen
No ratings yet
2019 Sleep Apnea Detection Based On Rician Modelling of Feature Variations in Multi Band EEG
No ratings yet
2019 Sleep Apnea Detection Based On Rician Modelling of Feature Variations in Multi Band EEG
9 pages
Artificial Intelligence - Wikipedia
No ratings yet
Artificial Intelligence - Wikipedia
42 pages
Accounting Financial Fraud Text Analysis TN
No ratings yet
Accounting Financial Fraud Text Analysis TN
13 pages
Final Project List
No ratings yet
Final Project List
15 pages
2B MultiLayer Perceptron Assignment
No ratings yet
2B MultiLayer Perceptron Assignment
3 pages
Real-Time Tiny Part Defect Detection System in Manufacturing Using Deep Learning
No ratings yet
Real-Time Tiny Part Defect Detection System in Manufacturing Using Deep Learning
14 pages
PDF Proceedings of The Fourteenth International Conference On Management Science and Engineering Management: Volume 1 Jiuping Xu Download
100% (2)
PDF Proceedings of The Fourteenth International Conference On Management Science and Engineering Management: Volume 1 Jiuping Xu Download
55 pages
Synopsis - Diabetes Prediction
No ratings yet
Synopsis - Diabetes Prediction
28 pages
Chapter 1 From Book Duda
No ratings yet
Chapter 1 From Book Duda
19 pages
Spam Detection Synopsis
No ratings yet
Spam Detection Synopsis
8 pages
Studio 9 Questions
No ratings yet
Studio 9 Questions
6 pages
BigData&Analytics Module5
No ratings yet
BigData&Analytics Module5
21 pages
120CS0121 - R - B Naga Pravallika
No ratings yet
120CS0121 - R - B Naga Pravallika
15 pages
AIMl Syllabus
No ratings yet
AIMl Syllabus
4 pages
FIFA Video Game - Players Classification
No ratings yet
FIFA Video Game - Players Classification
26 pages
Concept of A Technology Classification For Country Comparisons
No ratings yet
Concept of A Technology Classification For Country Comparisons
15 pages
Code Mania 2019
No ratings yet
Code Mania 2019
3 pages
CS 4650/7650: Natural Language Processing: Neural Text Classification
No ratings yet
CS 4650/7650: Natural Language Processing: Neural Text Classification
85 pages
AI For AI Using AI Methods For Classifyi
No ratings yet
AI For AI Using AI Methods For Classifyi
14 pages
University Institute of Engineering Department of Computer Science & Engineering
No ratings yet
University Institute of Engineering Department of Computer Science & Engineering
8 pages
An Introduction To Bayesian Methods For Analyzing Chemistry Data Part 2 PDF
No ratings yet
An Introduction To Bayesian Methods For Analyzing Chemistry Data Part 2 PDF
10 pages
Skin Cancer Detection
No ratings yet
Skin Cancer Detection
16 pages
A Review of Feature Selection Methods With Applications
No ratings yet
A Review of Feature Selection Methods With Applications
6 pages
SC Exp1 Shruti
No ratings yet
SC Exp1 Shruti
7 pages
A Complete Tutorial On Tree Based Modeling From Scratch (In R & Python) PDF
No ratings yet
A Complete Tutorial On Tree Based Modeling From Scratch (In R & Python) PDF
28 pages
Machine Learning Interview Questions
100% (1)
Machine Learning Interview Questions
4 pages
Applied Machine Learning
No ratings yet
Applied Machine Learning
49 pages
Lect 2 Common Architectural Principles of Deep Networks
No ratings yet
Lect 2 Common Architectural Principles of Deep Networks
20 pages
Emerging Artificial Intelligence Applications in Computer Engineering Real Word AI Systems With Applications in EHealth, HCI, Information Retrieval and Pervasive Technologies 1st edition by Ilias Maglogiannis, Kostas Karpouzis, Manolis Wallace, John Soldatos ISBN 1586037803 978-1586037802 - Download the full ebook now to never miss any detail
100% (9)
Emerging Artificial Intelligence Applications in Computer Engineering Real Word AI Systems With Applications in EHealth, HCI, Information Retrieval and Pervasive Technologies 1st edition by Ilias Maglogiannis, Kostas Karpouzis, Manolis Wallace, John Soldatos ISBN 1586037803 978-1586037802 - Download the full ebook now to never miss any detail
82 pages
Pattern Recognition Handwritten Notes
No ratings yet
Pattern Recognition Handwritten Notes
64 pages

Fig. 1. The flow chart.

Fig. 2. Schematic diagram of extraction.

Fig. 3. Discriminating and undiscriminating neighbor node types.

Fig. 4. Performance analysis under different characteristics.

Fig. 5. Evaluation results of different multi-label classification algorithms.
