Parallel Computing for Machine Learning in Social Network Analysis
George Cybenko
Thayer School of Engineering
Dartmouth College
Hanover NH 03755 USA
Email: [email protected]
I. INTRODUCTION
In this decade, we are witnessing the rapid maturation of
four major computing trends, namely
• Ubiquitous social media and networking;
• Network science;
• Machine learning at scale; and
• Commodity parallel, distributed and cloud computing.
This paper discusses a possible way in which these maturing
technologies can converge and how that convergence can
shape parallel computing in the future.
Figure 1. 1st Generation of machine learning for social network modeling, analysis and prediction. This is currently the processing being done and involves separate machine learning of agent profiles/behaviors and network structures.
We briefly review progress and trends in social systems and network science, machine learning and parallel/distributed computing before proposing some projections. In particular, we envision three distinct phases of the convergence and maturation process that can be summarized as follows:
1st Generation
The first step in convergence has involved the separate application of parallelized advanced machine learning techniques:
– to individual social agents; and
– to social networks' structures formed by the agents.
That is, there are many examples of using machine learning techniques to characterize individual social network profiles and behaviors [1], [2]. Similarly, machine learning is being used to model social network structural properties such as link prediction [3], [4]. This step, depicted in Figure 1, is being actively developed today.
2nd Generation
The second step in convergence involves the application of parallelized advanced machine learning techniques to individual social agents and to social networks simultaneously but offline [4], [5], [6]. That is, while individual agent models and their interrelationships are simultaneously being modeled and learned, the learning is done in batch mode offline, in the sense that new transactional data such as profile updates, page visits and communications are not being integrated into the learned models dynamically as they occur. This step, depicted in Figure 2, has only recently started to be developed.
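The difference between batch (offline) learning and continuous (online) updating can be made concrete with a deliberately tiny example. The following is a minimal sketch under stated assumptions: the per-agent "model" is just the running mean of one numeric feature (say, minutes of activity per session), and the names AgentModel and batch_fit are hypothetical, not taken from any system cited here. The batch path refits from stored history, as in the 2nd Generation. The online path folds each new event into the model as it arrives, which is the kind of update the 3rd Generation, described next, calls for.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AgentModel:
    """Hypothetical per-agent model: the mean of one numeric feature."""
    count: int = 0
    mean: float = 0.0

    def update(self, x: float) -> None:
        # Online update: fold one new observation in, O(1) time and memory.
        self.count += 1
        self.mean += (x - self.mean) / self.count


def batch_fit(history: list[float]) -> AgentModel:
    # Batch (offline) fit: recompute the model from the full stored history.
    # Events arriving after this call are ignored until the next refit.
    model = AgentModel()
    if history:
        model.count = len(history)
        model.mean = sum(history) / len(history)
    return model


history = [12.0, 30.0, 18.0]
offline = batch_fit(history)
online = AgentModel()
for x in history:
    online.update(x)
print(offline.mean, online.mean)  # 20.0 20.0

# A new event arrives: only the online model reflects it immediately.
online.update(40.0)
print(offline.mean, online.mean)  # 20.0 25.0
```

The incremental update reaches the same value as a full refit but costs constant time per event, which is what makes continuous assimilation plausible at the event rates cited in Section II. Real agent models would, of course, be far richer than a single mean.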
3rd Generation
The third step in convergence involves the application of parallelized advanced machine learning techniques to individual social agents and to social networks both simultaneously and online. That is, after initial models of agents and their network structure are learned, those models are updated as new data and events arrive so that the learning and modeling is dynamic and real-time. This step, depicted in Figure 3, will be developed in the near future [7], [8], but because of the data, computing and operational requirements, these implementations may end up being done in commercial and government settings, not academic or research ones.
To clarify our terminology, an individual social agent (or "agent" as we use below for simplicity) is an atomic element of a social network, usually represented as a node in a social network. Non-human resources such as webpages, devices and locations can also be nodes in a social network, with links between nodes indicating some relationship. A social network structure refers to the connectivity (that is, the edges or links) in the network as well as the dynamics of the connectivity. This can include, for example, the addition and deletion of links between nodes or agents in the network, as well as weights or multiple possible attributes of the edges.
After this introductory section, Section II discusses the current status of social media and networking, Section III discusses the status of network science, Section IV covers large-scale machine learning and Section V provides a review of recent developments and trends in commodity parallel, distributed and cloud computing. Section VI synthesizes these trends into a forecast of future developments, particularly in what we have called 3rd Generation Social Network Machine Learning on Parallel Machines.
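The terminology above (agents and non-human resources as nodes, a structure made of weighted, attributed edges that are added and deleted over time) can be illustrated with a small data structure. The following Python sketch is an illustration under stated assumptions, not a representation used by any system cited in this paper: DynamicSocialGraph is a hypothetical class in which each node has a kind ("agent", "webpage", and so on), each undirected edge carries a weight and optional attributes, and every edge addition or deletion is logged so that the dynamics of the connectivity, not only a snapshot, are retained.

```python
from __future__ import annotations


class DynamicSocialGraph:
    """Hypothetical sketch of a social network whose structure changes over time."""

    def __init__(self) -> None:
        self.node_kind: dict[str, str] = {}           # node -> "agent", "webpage", ...
        self.adj: dict[str, dict[str, dict]] = {}     # node -> neighbor -> edge data
        self.events: list[tuple[str, str, str]] = []  # change log: (op, u, v)

    def add_node(self, node: str, kind: str = "agent") -> None:
        self.node_kind[node] = kind
        self.adj.setdefault(node, {})

    def add_edge(self, u: str, v: str, weight: float = 1.0, **attrs) -> None:
        if u == v:
            raise ValueError("self-loops are not modeled in this sketch")
        for n in (u, v):
            if n not in self.node_kind:
                self.add_node(n)  # unknown endpoints default to "agent"
        data = {"weight": weight, **attrs}
        # Undirected: both endpoints share the same edge-data dictionary.
        self.adj[u][v] = data
        self.adj[v][u] = data
        self.events.append(("add", u, v))

    def remove_edge(self, u: str, v: str) -> None:
        if v in self.adj.get(u, {}):
            del self.adj[u][v]
            del self.adj[v][u]
            self.events.append(("remove", u, v))

    def neighbors(self, node: str) -> list[str]:
        return sorted(self.adj.get(node, {}))

    def num_edges(self) -> int:
        # Each undirected edge appears in two adjacency lists.
        return sum(len(nbrs) for nbrs in self.adj.values()) // 2


g = DynamicSocialGraph()
g.add_node("alice")
g.add_node("bob")
g.add_node("example.org", kind="webpage")
g.add_edge("alice", "bob", relation="friend")
g.add_edge("alice", "example.org", relation="like")
g.remove_edge("alice", "bob")
print(g.neighbors("alice"), g.num_edges())  # ['example.org'] 1
```

Keeping an explicit change log is one simple way to expose structural dynamics to downstream learners. At the scales discussed in this paper, such a structure would have to be partitioned across many cores or machines rather than held in one process.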
Authorized licensed use limited to: AMRITA VISHWA VIDYAPEETHAM AMRITA SCHOOL OF ENGINEERING. Downloaded on August 25,2023 at 17:57:46 UTC from IEEE Xplore. Restrictions apply
II. UBIQUITOUS SOCIAL MEDIA AND NETWORKING
Social network analysis has been an active area of study for several decades, originally investigated in the social sciences [9], [10] on relatively small scales dictated by the computing, data storage and data collection capabilities of the time. The main scientific questions concerned how people in the real world structured their social and business networks, how influence could be modelled and how data, which was sparse and hard to come by, could support various scientific hypotheses about social systems. An example of early analytic work was the discovery of the "small world" phenomenon and its formal analyses [11], [12].
By contrast with the early, relatively small-scale work on social networks, today's social networks are digitally represented and vast. Table I lists the major online social networks together with their reported sizes as of February 2017. According to current world population estimates [13], the number of Facebook accounts is about one quarter of the world's population today.
More recently, data has become much more widely available thanks to networked communications, social media and data collection in general. This has allowed the field to investigate more sophisticated questions dealing with temporal and semantic aspects of social interaction. In fact, the field has matured and grown to the point where social systems, markets and networks are commonly taught at the undergraduate level [14].

Table I
SOCIAL NETWORKS HAVE REACHED GLOBAL PROPORTIONS IN 2017 [15] AND CONTINUE TO GROW SIGNIFICANTLY [16]

Social Network          Size (in millions, as of January 2017)
Facebook                1,871
WhatsApp                1,000
Facebook Messenger      1,000
QQ                        877
WeChat                    846
QZone                     632
Instagram                 600
Tumblr                    550
Twitter                   317
Baidu Tieba               300
Snapchat                  300
Skype                     300
Sina Weibo                297
Viber                     249
LINE                      217
Pinterest                 150
YY                        122
LinkedIn                  106
BBM                       100
Telegram                  100
VKontakte                  90
KakaoTalk                  49

The following statistics about online social networks are important to note [16].
• Facebook has over 1.86 billion active users every month, representing a 17 percent increase year over year (as of 02/01/17). This growth rate is of course not sustainable because it is higher than the growth rate of the worldwide human population.
• Of the 1.86 billion active Facebook users, about 1.15 billion are daily mobile users. A significant implication of this fact is that most users can, and often do, access their social networks throughout the day and not just when at a desktop machine. This means that data about users and their network connections is typically updated continuously during waking hours and not just during work breaks or when users are at home.
• Facebook's Like and Share buttons are used on about 10 million websites daily. This observation implies that the links between social agents and webpages are highly dynamic, and algorithms that strive to relate agents by their liking and sharing behaviors must update at this rate.
• Five new Facebook profiles are created every second, implying a highly dynamic network structure.
• About 4.5%, or 83 million, of Facebook profiles are fake, so filtering out duplicate or fake accounts is a large-scale problem.
• There are 300 million image uploads on Facebook per day. As a result, machine learning approaches that strive to cluster images or users by image content face a significant real-time challenge.
• 510,000 comments and 293,000 statuses are updated on Facebook every minute. This is another statistic implying that attempts to update models or perform machine learning in real time need to be highly scalable.
• As of May 2013, 16 million local business pages had been created. This statistic indicates that social networks and their analyses have high commercial and consumer, not just social, value.
More nuanced and specific questions about the intent and meaning of social systems' structures and dynamics have been investigated as well [17], [18], [19], [20], [21], [2]. In many such analyses, the dynamics are postulated, hypothesized or inferred through offline analyses.
A key trend we are observing is that, in addition to modelling and analyzing the structural properties of social systems, which can be viewed as static on smaller time scales, we are now getting access to data [22] and applications that make the dynamics of social systems feasible to model for economic, business and security purposes.
As a result, at a high level of abstraction, we can now reasonably collect data about how nodes in a network behave, build models of the logics or automata that drive
those nodes, and model how the ensemble of nodes in a networked system can and do evolve together.

III. NETWORK SCIENCE
As a result of the increasing availability of data about social, computer/wireless, economic and biological networks, a specialized theory of network science has emerged over the past two decades [23], [24], [25], [26].
Network science theory uses concepts from graph theory, computer science, combinatorics, statistics and probability, among other areas, to explain properties of networks observed in nature and technology. Of specific interest is that many of the core concepts have trickled down to undergraduate education [14].
Much network science deals with asymptotic properties of networks [23], [24] as the number of nodes grows to infinity. But as we see from Table I, current social network sizes are large enough that those asymptotic analyses might actually be applicable.
For example, considering Facebook as a prototypical social network, if we assume that each user has about 350 friends (this is a reasonable approximation for the average number of friends a user has [27]), then the Facebook social network graph has about

1,800,000,000 × 350 = 630,000,000,000

directed edges (each friendship counted once from each side). This means that even the largest computer today [28] will require allocating about 180 agents to each core, with each core possibly needing to communicate with as many as 180 × 350 = 63,000 other cores to interact with all friends of the 180 agents it manages.

IV. DEEP LEARNING
Just as social network analysis is a "classical" subject that has recently been revitalized and transformed by data availability, advances in the underlying science, and phenomenal increases in network and computing power, machine learning and pattern recognition have been active areas of study for several decades [29], and they too have been transformed over the past three decades by the same trends, namely data availability, algorithmic innovation, software implementation maturity and compute power.
Early theoretical work on multilayered neural networks showed that even simple network architectures could solve many problems [30]. Among the major innovations in recent work on so-called deep learning are [31]:
• the ability to learn parameter values efficiently in more complex, deep networks (that is, networks with more layers);
• the ability of such complex, deep networks to solve real-world problems using fewer overall nodes and parameters.
As a result, the evolution and impact of neural-network-based machine learning from the early days of the 1980's and 1990's to today's powerful deep learning techniques have been remarkable.
In addition to a revolution in the underlying architectures and algorithms (as outlined in [32], [31], for example), modern machine learning has benefited greatly from freely available, sophisticated software for implementing these algorithms [33], [34]. Table II illustrates the dramatic growth in machine learning implementations, especially those using multiple cores and GPUs.

Table II
LOPES AND RIBEIRO HAVE DOCUMENTED THE SIGNIFICANT INCREASE IN GPU IMPLEMENTATIONS OF MACHINE LEARNING ALGORITHMS BETWEEN 2004 AND 2011. THE ORIGINAL FIGURE AND REFERENCES TO THE DEPICTED IMPLEMENTATIONS CAN BE FOUND IN LOPES AND RIBEIRO [35]

Open Source
2008  Support Vector Machines (Catanzaro et al.)
      Genetic Algorithms (Langdon and Banzhaf)
      K-Nearest Neighbor (Garcia et al.)
2009  Spiking Neural Networks (Nageswaran et al.)
      Multiple Back-Propagation, Back-Propagation (Lopes and Ribeiro)
2010  Non-negative Matrix Factorization (Lopes and Ribeiro)
Closed Source
2004  Multilayer Perceptrons (forward phase) (Oh and Jung)
2005  Self-Organizing Maps (Campbell et al., Luo et al.)
      Genetic Algorithms (Wong et al., Yu et al.)
      Back-Propagation (two-layer) (Steinkrau et al.)
2006  Convolutional Neural Networks (Chellapilla et al.)
      Spiking Neural Networks (Bernhard and Keriven)
      Belief Propagation (Brunton et al., Yang et al.)
2007  Fuzzy ART neural networks (Martínez-Zarzuela et al.)
2008  K-Means Clustering (Shalom et al.)
      Recurrent networks (Trebaticky and Pospichal)
      Decision Trees and Forests (Sharp)
      Neural Network based text detection (Jang et al.)
      Linear Radial Basis Functions (Brandstetter and Artusi)
2009  Deep Belief Networks, Sparse Coding (Raina et al.)
      Back-Propagation (three-layer) (Guzhva et al.)

V. PARALLEL COMPUTING
Parallel and distributed computing are now largely available as commodity services through a variety of cloud computing platforms. Perhaps most revealing about future trends is the emergence of dedicated platforms for distributed and parallel implementations of machine learning code [36], [37], [38], [39].
Graphics Processing Units (GPUs) are having a major impact on machine learning implementations, as depicted in Tables II and III [40], [35]. It is interesting to note that Nvidia is making a major investment in the design and implementation of its GPUs to support machine learning applications [41].
To get a sense of the computational problems thus arising, one machine learning example using networks with 38 million parameters required 3 days of training time using a
single GPU [42]. Fortunately, machine learning seems to scale very well with the number of processors used, so that speedups are quite effective when using multiple cores, CPUs or GPUs.
Similar performance numbers for deep learning exist for image classification and clustering [43]. As another example, current cutting-edge social networking and modeling applications have used 16,000 machines [44]. Other performance results and issues are discussed in [45].

Table III
GPU IMPLEMENTATIONS OF MACHINE LEARNING ALGORITHMS CAN SHOW SIGNIFICANT PERFORMANCE IMPROVEMENT OVER STANDARD CPU IMPLEMENTATIONS. THIS BAR GRAPH SHOWS THE NUMBER OF DATA SAMPLES THAT CAN BE PROCESSED PER SECOND USING GPU VERSUS BASELINE CPU PROCESSORS [40].

Implementation              Data entries processed per second
Theano, GPU                 38,310.00
Theano, CPU                  6,573.73
  GPU speedup over CPU ≈ 5.8
Matlab with GPUmat, GPU      5,864.98
Matlab, CPU                  4,093.03
  GPU speedup over CPU ≈ 1.4

The specific data distributions, access requirements and communication patterns needed to conduct dynamic social systems analysis (from learning the underlying dynamics using deep learning, for example, to simulating the future evolution of such systems) are quite variable and can be radically different from those of traditional scientific and engineering applications for high performance computing systems.
The performance requirements to support such novel applications remain to be fully articulated and explored [45].

VI. THE FUTURE OF PARALLEL COMPUTING IN DEEP LEARNING FOR SOCIAL NETWORK ANALYSIS
The main thesis of this paper is that the four major trends in recent computing, namely
• Ubiquitous social media and networking;
• Network science;
• Machine learning at scale; and
• Commodity parallel, distributed and cloud computing,
are converging and enabling a new class of modeling and analysis techniques that include machine learning at the node level (that is, individual agent behaviors) as well as at the network level (that is, cooperative and competitive dynamics among the various agents connected through a network).
These trends suggest that a new generation of machine learning systems and computing architectures might be required to continuously assimilate data at huge rates [22], update tens of millions of individual models (at the nodes of a network) and refresh the connectivity of the networks that define the interactions. Novel approaches to load balancing, interprocessor communication and system adaptation will likely be required [46].
A summary of the stages of this progression is offered by the following three generations of high performance machine learning applied to social network analysis:
1st Generation
The first step in convergence has involved the separate application of parallelized advanced machine learning techniques: a.) to individual social agents; and b.) to social networks' structures formed by the agents. That is, there are many examples of using machine learning techniques to characterize individual social network profiles and behaviors. This step was depicted in Figure 1.
2nd Generation
The second step in convergence involves the application of parallelized advanced machine learning techniques to individual social agents and to social networks simultaneously but offline. That is, while individual agent models and their interrelationships are simultaneously being modeled and learned, the learning is done in batch mode offline, in the sense that new transactional data such as profile updates, page visits and communications are not being integrated into the learned models dynamically as they occur. This step was depicted in Figure 2.
3rd Generation
The third step in convergence involves the application of parallelized advanced machine learning techniques to individual social agents and to social networks both simultaneously and online. That is, after initial models of agents and their network structure are learned, those models are updated as new data and events arrive so that the learning and modeling is dynamic and real-time. This step was depicted in Figure 3, and because of the data, computing and operational requirements, these implementations may end up being done in commercial and government settings, not academic or research ones.
As we have seen above, social networks with 1.8 billion users already exist, each user having on average 350 friends (graph edges), and the properties of nodes and edges are being updated at very high rates: Like and Share buttons are used on about 10 million websites daily, 300 million images are uploaded daily and over 800,000 comments and status updates are posted per minute.

VII. CONCLUSION
We have outlined several major trends in recent computing technology that are maturing quickly in this decade. These trends and challenges will define the next generation of
computing and machine learning technologies across commercial, scientific and security applications. Specifically, we will see new solutions for ingesting large-scale data sets in real time, learning and updating models of individual nodes (machines, biological systems and combinations thereof), learning and updating the relationships and links between nodes and, finally, simulating the overall system to anticipate future configurations and outcomes.

Table IV
TOP 10 SUPERCOMPUTERS [28]

ACKNOWLEDGMENT
The author thanks Eunice Santos, John Korah, Kate Farris, Ben Priest, Eugene Santos and the Dartmouth ENGG 177 students for discussions that shaped this paper. This work was supported in part by the Army Research Office award W911NF-13-1-0421.

REFERENCES
[2] P. Jain, P. Kumaraguru, and A. Joshi, "@i seek 'fb.me': Identifying users across multiple online social networks," in Proceedings of the 22nd International Conference on World Wide Web. ACM, 2013, pp. 1259–1268.
[3] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," Journal of the Association for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.
[4] Y. Ding, S. Yan, Y. Zhang, W. Dai, and L. Dong, "Predicting the attributes of social network users using a graph-based machine learning method," Computer Communications, vol. 73, pp. 3–11, 2016.
[5] J. Chayes, "Graphons and machine learning: Modeling and estimation of sparse massive networks," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 1–1.
[7] R. Nishihara, P. Moritz, S. Wang, A. Tumanov, W. Paul, J. Schleier-Smith, R. Liaw, M. I. Jordan, and I. Stoica, "Real-time machine learning: The missing pieces," arXiv preprint arXiv:1703.03924, 2017.
[8] D. J. King and C. Bennett, "An investigation of two real time machine learning techniques that could enhance the adaptability of game AI agents," GameOn'2016 Proceedings, 2016.
[9] E. Katz and P. F. Lazarsfeld, Personal Influence: The Part Played by People in the Flow of Mass Communications. Transaction Publishers, 1966.
[10] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications. Cambridge University Press, 1994, vol. 8.
[11] J. Travers and S. Milgram, "The small world problem," Psychology Today, vol. 1, pp. 61–67, 1967.
[12] D. J. Watts, Small Worlds: The Dynamics of Networks between Order and Randomness. Princeton University Press, 1999.
[21] R. Savell and G. Cybenko, "Mining for social processes in intelligence data streams," in Social Computing, Behavioral Modeling, and Prediction. Springer, 2008, pp. 110–119.
[22] http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html/.
[23] T. G. Lewis, Network Science: Theory and Applications. John Wiley & Sons, 2011.
[24] A.-L. Barabasi, "Linked: How everything is connected to everything else and what it means," Plume Editors, 2002.
[25] B. O. Holzbauer, B. K. Szymanski, T. Nguyen, and A. Pentland, "Social ties as predictors of economic development," in International Conference and School on Network Science. Springer, 2016, pp. 178–185.
[26] S. P. Borgatti, A. Mehra, D. J. Brass, and G. Labianca, "Network analysis in the social sciences," Science, vol. 323, no. 5916, pp. 892–895, 2009.
[27] http://bigthink.com/praxis/do-you-have-too-many-facebook-friends.
[28] https://www.top500.org/list/2016/11/?page=1.
[29] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. John Wiley, 1973.
[40] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio, "Theano: A CPU and GPU math compiler in Python," in Proc. 9th Python in Science Conf, 2010, pp. 1–7.
[41] https://developer.nvidia.com/deep-learning.
[42] D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos et al., "Deep Speech 2: End-to-end speech recognition in English and Mandarin," arXiv preprint arXiv:1512.02595, 2015.
[44] https://www.technologyreview.com/s/513696/deep-learning/.