Multi-core Parallelization in Clojure -
            a Case Study

     Johann M. Kraus and Hans A. Kestler

    AG Bioinformatics and Systems Biology
  Institute of Neural Information Processing
               University of Ulm

                 29.06.2009
Outline


1. Concepts of parallel programming


2. Short introduction to Clojure


3. Multi-core parallel K-means - the case study


4. Analysis and Results


5. Summary
Parallel Programming
Definition:
Parallel programming is a form of programming where many calculations
are performed simultaneously.




•   Physical constraints prevent frequency scaling of processors


•   This led to an increasing interest in parallel hardware and parallel
    programming


•   Multi-core hardware is standard on desktop computers


•   Parallel software can use this hardware to the full capacity
•   Large problems are divided into smaller ones and the sub-problems are
    solved simultaneously

•   Speedup S is limited by the fraction of parallelizable code P

•   Amdahl's law for N processors: $S = \frac{1}{(1 - P) + \frac{P}{N}}$

[Figure: Amdahl's law. Speedup (0 to 20) against number of processors (1 to 65536) for parallelizable fractions of 0.50, 0.75, 0.90 and 0.95]
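Amdahl's law can be evaluated directly. The following is a minimal sketch; the function names `amdahl-speedup` and `max-speedup` are chosen here for illustration and do not come from the slides. It prints the speedup for P = 0.95 on a few processor counts and shows that the speedup is bounded by 1 / (1 - P) however many processors are added.

```clojure
(defn amdahl-speedup
  "Speedup S for a parallelizable fraction p on n processors."
  [p n]
  (/ 1.0 (+ (- 1.0 p) (/ p n))))

(defn max-speedup
  "Upper bound of the speedup as n grows without bound: 1 / (1 - p)."
  [p]
  (/ 1.0 (- 1.0 p)))

;; Speedup for 95 % parallelizable code on an increasing number of processors
(doseq [n [1 2 4 8 16 1024 65536]]
  (println n (amdahl-speedup 0.95 n)))

(println "limit:" (max-speedup 0.95))
```

For P = 0.95 the bound is 1 / 0.05 = 20, which matches the top of the speedup axis in the plot.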
Concepts of Parallel Programming

              Explicit vs. implicit parallelization


•   Explicitly define communication and synchronization details for
    each task:
                 • MPI
                •   Java Threads


•   Functional programming allows implicit parallelization:

                •   Parallel processing of functions

                •   Functions are free of side-effects

                •   Data is immutable
Distributed vs. local hardware


•   Master - Slave parallelization (e.g. Message Passing Interface)

•   Shared memory parallelization (e.g. Open Multi-Processing)

[Figure: left, a master connected to slaves 0 to 4 (send data, send result); right, CPUs 0 to 4 around a shared memory (read, write)]
Thread programming

•   Threads are refinements of a process that share the same memory and
    can be processed separately and simultaneously


•   Available in many languages, e.g. PThreads (C), Java Threads (Java),
    OpenMP Threads (C, Fortran)


•   Execution of threads is handled by a scheduler that manages the available
    processing time

•   Communication between threads is faster than communication between
    processes

•   Creating threads is also faster than forking and joining processes

[Figure: thread life cycle with the states new, runnable, running, waiting and terminated and the transitions start, schedule, block, awake and end]
Concurrency control via locking and synchronizing

• Concurrency control ensures that threads can access shared memory
 without violating data integrity


• The most popular approach to concurrency is locking and synchronizing
               public class Counter {
                   private int value = 0;
                   // synchronized: only one thread at a time may run incr() on an instance
                   public synchronized void incr() {
                       value = value + 1;
                   }
               }
               Counter counter = new Counter();
               counter.incr();

• Problems might occur when using too many locks, too few locks, wrong
  locks, or locks in the wrong order


• Using locks is error-prone and can be fatal, e.g. deadlocks
Concurrency control via transactional memory


• Transactional memory offers a flexible alternative to lock-based
  concurrency control


• Functionality is analogous to controlling simultaneous access to database
  management systems


• Transactions ensure the following properties:
  •   Atomicity: Either all changes of a transaction occur or none do

  •   Consistency: Only valid changes are committed

  •   Isolation: No transaction sees the effect of other transactions

  •   Durability: Changes from transactions will be persistent
• Software transactional memory maps transactional memory to
 concurrency control in parallel programming

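[Figure: sequence diagram over time with the lifelines Transaction 0, Data and Transaction 1. Message order: Transaction 0 gets data; Transaction 1 gets data; Transaction 1 sends modified data; Transaction 0 sends modified data; Transaction 0 gets data again; Transaction 0 sends modified data. Every "send modified data" is guarded by [consistent data]]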
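The retry behaviour in the diagram can be observed in Clojure. The following is a sketch under stated assumptions: the names `shared-data`, `attempts` and `increment!` are hypothetical, and the `swap!` inside the transaction is a deliberate side effect used only to count how often the transaction body runs. Transactions that find the data changed underneath them are re-executed, so the attempt count can exceed the number of transactions while the final value stays exact.

```clojure
(def shared-data (ref 0))   ; transactional storage location
(def attempts (atom 0))     ; counts executions of the transaction body

(defn increment! []
  (dosync
    (swap! attempts inc)    ; side effect on purpose: runs again on every retry
    (alter shared-data inc)))

;; Start 100 transactions on background threads and wait for all of them.
(let [workers (doall (repeatedly 100 #(future (increment!))))]
  (doseq [w workers] @w))

(println "final value:" @shared-data)
(println "transaction attempts:" @attempts)
```

When this is run as a script rather than at the REPL, a final `(shutdown-agents)` call lets the JVM exit promptly, because futures run on a thread pool that is shared with agents.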
Clojure


•   Functional programming language hosted on the JVM


•   Extends the code-as-data paradigm to maps and vectors


•   Based on immutable data structures


•   Provides built-in concurrency support via software transactional
    memory


•   Completely symbiotic with Java, e.g. easy access to Java libraries


•   Platform independent
•   Java interaction
        ;; Import a Java class and call one of its static methods from Clojure
        (import '(cern.jet.random.sampling RandomSamplingAssistant))
        (defn sample
          [n k]
          (seq (. RandomSamplingAssistant
                  (sampleArray k (int-array (range n))))))


•   Dynamic typing and multi-methods

    •   An object is defined as the sum of what it can do (methods),
        rather than the sum of what it is (type hierarchy)


•   Add type hints to speed up code

        ;; ^doubles is the current spelling of the older #^doubles type hint
        (defn da+ [^doubles as ^doubles bs]
          (amap as i ret
            (+ (aget as i) (aget bs i))))
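The multi-method bullet above can be illustrated with a small sketch. The names `area`, `:shape`, `:circle` and `:rect` are hypothetical and chosen only for this example. Dispatch happens on a value computed from the argument (here the `:shape` key of a plain map) rather than on a position in a type hierarchy.

```clojure
;; The dispatch function is the keyword :shape, applied to the argument map.
(defmulti area :shape)

(defmethod area :circle [{:keys [r]}]
  (* Math/PI r r))

(defmethod area :rect [{:keys [w h]}]
  (* w h))

(println (area {:shape :rect :w 2 :h 3}))
(println (area {:shape :circle :r 1.0}))
```

New shapes can be supported later by adding another `defmethod`, without touching the existing ones.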
Transactional references and STM


•   Transactional references ensure safe coordinated synchronous
    changes to mutable storage locations


•   Are bound to a single storage location for their lifetime


•   Only allow mutation of that location to occur within transactions


•   Available operations are ref-set, alter, and commute


•   No explicit locking is required


                 (def counter (ref 0))
                 (dosync (alter counter inc))
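A slightly larger sketch shows why the changes are called coordinated: two refs are modified in one transaction, so no thread can observe one change without the other. The refs `account-a`, `account-b` and `transfers` and the function `transfer!` are assumptions made for this example. It uses all three operations named above: `alter`, `commute` and `ref-set`.

```clojure
(def account-a (ref 100))   ; hypothetical balances
(def account-b (ref 0))
(def transfers (ref 0))     ; number of completed transfers

(defn transfer! [from to amount]
  (dosync
    (alter from - amount)       ; both alters commit together or not at all
    (alter to + amount)
    (commute transfers inc)))   ; order of increments does not matter

;; 50 concurrent transfers of 1 unit each, without explicit locks
(let [workers (doall (repeatedly 50 #(future (transfer! account-a account-b 1))))]
  (doseq [w workers] @w))

(println @account-a @account-b @transfers)   ; prints 50 50 50

;; ref-set overwrites the value inside a transaction
(dosync (ref-set transfers 0))
```

The sum of both balances is 100 before and after every transaction, which is the invariant the STM protects here.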
Agents

•   Agents allow independent asynchronous change of mutable
    locations

•   Are bound to a single storage location for their lifetime

•   Only allow mutation of that location to a new state to occur as a
    result of an action

•   Actions are functions that are asynchronously applied to the state
    of an Agent

•   The return value of an action becomes the new state of the Agent

•   Agents are integrated with the STM
                    (def counter (agent 0))
                    (send counter inc)
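Because actions are applied asynchronously, the caller has to wait before reading a final state. The sketch below extends the counter example; the name `hits` is hypothetical. It sends many actions, blocks with `await` until all actions sent so far from this thread have been applied, and then dereferences the agent.

```clojure
(def hits (agent 0))

;; send returns immediately; the actions are queued and applied one at a time
(dotimes [_ 1000] (send hits inc))

;; additional arguments are passed to the action after the current state
(send hits + 10)

;; block until every action sent above has been applied
(await hits)

(println @hits)   ; prints 1010
```

When this is run as a script rather than at the REPL, a final `(shutdown-agents)` call is needed so that the agent thread pool stops and the JVM can exit.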
Cluster analysis

•   Given a data set X compute a partition of X into k disjoint clusters
    C_1, ..., C_k, such that:

    (1) $\bigcup_{i=1}^{k} C_i = X$

    (2) $C_i \neq \emptyset$ and $C_i \cap C_j = \emptyset$ for $i \neq j$


•   How many clusters are in the data set?

[Figure: the same data set partitioned into 3 clusters and into 9 clusters]
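The two partition conditions can be checked mechanically. The following is a sketch, assuming the data set and the clusters are given as Clojure sets; the name `partition-of?` is made up for this example. Pairwise disjointness is tested indirectly: if the union of the clusters equals X and the cluster sizes add up to the size of X, no element can occur in two clusters.

```clojure
(require '[clojure.set :as set])

(defn partition-of?
  "True if clusters (a collection of sets) is a partition of the set x."
  [x clusters]
  (and (every? seq clusters)                             ; every C_i is non-empty
       (= x (apply set/union clusters))                  ; (1) the union of all C_i is X
       (= (count x) (reduce + (map count clusters)))))   ; (2) the C_i are pairwise disjoint

(println (partition-of? #{1 2 3 4} [#{1 2} #{3} #{4}]))
(println (partition-of? #{1 2 3 4} [#{1 2} #{2 3 4}]))
```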
Cluster algorithms
•   For all possible partitions evaluate the objective function f and
    search the optimum.

•   The cardinality of the set of all possible partitions is given by the
    Stirling numbers of the second kind:

    $S_N^{(k)} = \frac{1}{k!} \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} i^N$

[Figure: runtime (nanoseconds) against number of clusters (0 to 35) and number of data points (0 to 35)]
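To get a feeling for how quickly the number of partitions grows, the Stirling formula can be evaluated with exact integer arithmetic. This is an illustrative sketch; the helper names `binomial`, `pow` and `stirling2` are assumptions and not part of the case study. `1N` starts the computation in arbitrary-precision integers so that large values do not overflow.

```clojure
(defn binomial
  "Binomial coefficient (n choose k), computed exactly."
  [n k]
  (reduce (fn [acc i] (/ (* acc (- n i)) (inc i))) 1N (range k)))

(defn pow
  "base raised to the non-negative integer power e, computed exactly."
  [base e]
  (reduce * 1N (repeat e base)))

(defn stirling2
  "Number of ways to partition n data points into k non-empty clusters."
  [n k]
  (/ (reduce + (for [i (range (inc k))]
                 (* (if (even? (- k i)) 1 -1)
                    (binomial k i)
                    (pow i n))))
     (reduce * 1N (range 1 (inc k)))))   ; k!

(println (stirling2 4 2))     ; 4 points, 2 clusters: 7 partitions
(println (stirling2 10 3))    ; 10 points, 3 clusters: 9330 partitions
(println (stirling2 100 5))   ; already far too many to enumerate
```

Even for small data sets an exhaustive search over all partitions is therefore out of reach.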




Cluster algorithms provide a heuristic for this search:

•   Partitional clustering (K-means, Neural gas, SOM, Fuzzy C-means, ...)

•   Hierarchical clustering (Divisive/agglomerative, Complete linkage, ...)

•   Graph-based clustering (Spectral clustering, NMF, Affinity propagation, ...)

•   Model-based clustering, Biclustering, Semi-supervised clustering
K-means algorithm
Function KMeans

 Input:  X = {x_1, ..., x_n} (Data to be clustered)
         k (Number of clusters)

Output:  C = {c_1, ..., c_k} (Cluster centroids)
         m: X -> C (Cluster assignments)

Initialize C (e.g. random selection from X)
While C has changed
  For each x_i in X
    m(x_i) = argmin_j distance(x_i, c_j)
  End
  For each c_j in C
    c_j = centroid({x_i | m(x_i) = j})
  End
End
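The pseudocode can be written as a short sequential Clojure function. This is a minimal sketch and not the multi-core implementation of the case study. Assumptions: points are vectors of numbers, `distance` is the squared Euclidean distance, the initial centroids are passed in explicitly instead of being drawn at random, a `max-iter` limit is added as a safeguard, and an empty cluster keeps its previous centroid.

```clojure
(defn distance
  "Squared Euclidean distance between two points."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn nearest-cluster
  "Index j of the centroid that is closest to point x."
  [centroids x]
  (apply min-key #(distance x (nth centroids %)) (range (count centroids))))

(defn centroid
  "Component-wise mean of a non-empty collection of points."
  [points]
  (mapv #(/ % (double (count points))) (apply map + points)))

(defn kmeans
  "Alternates assignment and centroid update until the centroids no longer
   change or max-iter iterations have been run."
  [xs initial-centroids max-iter]
  (loop [centroids initial-centroids, iter 0]
    (let [m       (mapv #(nearest-cluster centroids %) xs)   ; assignment step
          groups  (group-by first (map vector m xs))
          updated (vec (map-indexed                           ; update step
                         (fn [j c]
                           (if-let [members (seq (map second (groups j)))]
                             (centroid members)
                             c))                              ; empty cluster: keep c
                         centroids))]
      (if (or (= updated centroids) (>= (inc iter) max-iter))
        {:centroids updated :assignments m}
        (recur updated (inc iter))))))

(println (kmeans [[0.0 0.0] [0.0 1.0] [10.0 10.0] [10.0 11.0]]
                 [[0.0 0.0] [10.0 10.0]]
                 100))
```

Both inner loops of the pseudocode are independent per data point and per cluster, which is what makes the algorithm a candidate for parallelization.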
Cluster Validation
•   Evaluation requires repeated runs of clustering, e.g.:
       •   Resampled data sets

       •   Different parameters

•   MCA-index: mean proportion of samples being consistent over
    different clusterings

    $MCA = \frac{1}{n} \max_{\pi} \sum_{i=1}^{k} |A_i \cap B_{\pi(i)}|$

    where π ranges over the permutations of the k cluster labels
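For a small number of clusters the MCA index can be computed by brute force over all label permutations. The sketch below is written under assumptions: both clusterings are given as vectors of integer labels 0 to k-1 for the same n items, and the names `permutations` and `mca` are made up for this example. Trying all k! permutations is only feasible for small k; an assignment-problem solver would be the usual replacement for larger k.

```clojure
(defn permutations
  "All orderings of the distinct elements of xs."
  [xs]
  (if (empty? xs)
    [[]]
    (for [x xs
          p (permutations (remove #{x} xs))]
      (cons x p))))

(defn mca
  "MCA index of two label vectors a and b with labels 0..k-1."
  [a b k]
  (let [n         (count a)
        ;; pairs maps [i j] to |A_i intersect B_j|
        pairs     (frequencies (map vector a b))
        agreement (fn [perm]
                    (reduce + (map-indexed (fn [i j] (get pairs [i j] 0)) perm)))]
    (/ (apply max (map agreement (permutations (range k))))
       (double n))))

;; Same grouping with renamed labels: perfect agreement
(println (mca [0 0 1 1 2 2] [1 1 2 2 0 0] 3))   ; prints 1.0

;; One of four items is grouped differently
(println (mca [0 0 1 1] [0 1 1 1] 2))           ; prints 0.75
```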
Estimation of the expected value of a validation index

Random label: randomly assign each item to a cluster k

Random partition: choose a random partition

Random prototype: assign each item to its nearest prototype

Mean value from 100 runs

[Figure: mean MCA index (0.0 to 1.0) against number of clusters (0 to 50)]
Multi-core K-means with Clojure
•   Split the data set into smaller pieces that are handled by agents

•   Each cluster is represented by an agent

•   Add a commutative list of cluster members within a transactional
    reference to accelerate the centroid update step



[Figure: data agents 0 to n, cluster agents 0 to k and member refs 0 to k, connected by read and write arrows]

[Figure: simultaneous read, with data agents 0 to n reading from cluster agents 0 to k]

[Figure: simultaneous write, with data agents 0 to n writing to the member refs]
read:  (nearest-cluster)

write: (commute)
       (assoc)

;; Assign one data point: read the nearest cluster, record the point in that
;; cluster's member list and return the updated point.
(defn update-datapoint [datapoint]
  (let [newass (nearest-cluster datapoint)]
    ;; commute: the order in which members are added does not matter
    (dosync (commute (nth MemberRefs newass)
                     conj (:data datapoint)))
    (assoc datapoint :assignment newass)))

;; Agent action: doall forces the lazy map so the work runs on the agent thread.
(defn update-dataagent [datapoints]
  (doall (map update-datapoint datapoints)))

;; Send the action to every data agent; dorun forces the lazy map of sends.
(defn assignment []
  (dorun (map #(send % update-dataagent) DataAgents)))
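The assignment step can be tried out on a toy data set. The following is a self-contained sketch, not the McKmeans source: the one-dimensional `centroids`, the body of `nearest-cluster` and the sample data are assumptions made so that the example runs on its own. `assignment` here also waits for the agents, which the slide code does not show.

```clojure
;; Hypothetical setup: two fixed centroids and four points in two data agents
(def centroids [0.0 10.0])

(defn nearest-cluster [datapoint]
  (apply min-key #(Math/abs (- (:data datapoint) (nth centroids %)))
         (range (count centroids))))

;; One transactional member list per cluster
(def MemberRefs (vec (repeatedly (count centroids) #(ref []))))

(def DataAgents
  [(agent [{:data 1.0} {:data 9.0}])
   (agent [{:data 11.0} {:data -1.0}])])

(defn update-datapoint [datapoint]
  (let [newass (nearest-cluster datapoint)]
    (dosync (commute (nth MemberRefs newass) conj (:data datapoint)))
    (assoc datapoint :assignment newass)))

(defn update-dataagent [datapoints]
  (mapv update-datapoint datapoints))   ; mapv is eager, so the work runs on the agent thread

(defn assignment []
  (doseq [a DataAgents] (send a update-dataagent))
  (apply await DataAgents))             ; wait until every data agent has finished

(assignment)

(println "members of cluster 0:" @(nth MemberRefs 0))
(println "members of cluster 1:" @(nth MemberRefs 1))
(println "assignments:" (map #(map :assignment @%) DataAgents))
```

The order of the values inside a member list depends on thread scheduling, which is acceptable because the centroid is a mean and does not depend on that order. This is why `commute` is sufficient here. When run as a script, a final `(shutdown-agents)` call lets the JVM exit.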
Benchmark results
Large data sets (artificial):

•   Each data point is sampled from N(0,1)

•   Summary for 10 runs of K-means

[Figure: left panel, runtime (seconds) for 10,000 cases, 100 dimensions, 20 clusters: ParaKMeans, K-means R, McKmeans. Right panel, runtime (minutes) for 1,000,000 cases, 200 dimensions, 20 clusters: K-means R, McKmeans]
•   Number of computer cores used

[Figure: runtime (seconds) against number of computer cores (1, 4, 8) for 100,000 x 500, 20 clusters]

•   Number of data agents used

[Figure: runtime (seconds) against number of data agents (4, 6, 8, 10) for 100,000 x 500, 20 clusters]
Large data sets with cluster structure


•   Data sampled from a multi-variate normal distribution

•   100000 samples, 200/500 dimensions, 10/20 clusters

[Figure: runtime (seconds) of K-means R and McKmeans for 200 / 10, 200 / 20, 500 / 10 and 500 / 20 (dimensions / clusters)]
Accuracy compared to the known grouping of data


•   Measured with the MCA index

•   Red bars indicate the random-prototype baseline

[Figure: MCA index (0.0 to 1.0) of McKmeans and K-means R for 100,000 x 200 with 10 and 20 clusters and for 100,000 x 500 with 10 and 20 clusters]
Real world data set

•   Microarray data (Radiation-induced changes in human gene expression)

•   22277 samples (genes) and 465 features (profiles)

[Figure: runtime (seconds) of K-means R and McKmeans for 2, 5, 10 and 20 clusters]
    Smirnov D, Morley M, Shin E, Spielman R, Cheung V: Genetic analysis of radiation-induced changes in human gene expression. Nature 2009, 459:587–591
Application to Cluster Number Estimation
•   Repeated clustering with different subsets of data


•   Repeated for different number of clusters k


•   Most stable clustering is produced for the ‘real’ cluster number

•   Jackknife resampling

•   Evaluation with MCA index

•   Data set: 100000 samples, 100 features, 3 clusters

•   10 runs per cluster number

•   49.26 minutes on a dual quad-core 3.2 GHz machine

[Figure: MCA index (0.0 to 1.0) against number of clusters (2 to 7)]
Java GUI
(import '(javax.swing JFrame JLabel JTextField JButton)
        '(java.awt.event ActionListener)
        '(java.awt GridLayout))

(let [frame        (new JFrame "Hello, World!")
      hello-button (new JButton "Say hello")
      hello-label  (new JLabel "")]
  (. hello-button
     (addActionListener
       (proxy [ActionListener] []
         (actionPerformed [evt]
           (. hello-label
              (setText "Hello, World!"))))))
  (doto frame
    (.setLayout (new GridLayout 1 1 3 3))
    (.add hello-button)
    (.add hello-label)
    (.setSize 300 80)
    (.setVisible true)))
Summary

•   Writing parallel programs usually requires careful software design
    and deep knowledge of thread-safe programming


•   Concurrency control via transactional memory circumvents
    problems of lock-based concurrency strategies


•   Immutable data structures play a key role in software transactional
    memory


•   Clojure combines Lisp, Java and a powerful STM system


•   This enables fast parallelization of algorithms, even for rapid
    prototyping


•   Our simulations show good performance of the parallelized code
Thank you for your attention.
Statistical computing library


• http://wiki.github.com/liebke/incanter
• Clojure-based statistical computing
• R-like semantics
• COLT library for numerical computation
• JFreeChart library for graphics

More Related Content

PPTX
FlowER Erlang Openflow Controller
Holger Winkelmann
 
PDF
Kafka in action - Tech Talk - Paytm
Sumit Jain
 
PDF
Thousands of Threads and Blocking I/O
George Cao
 
PDF
mSwitch: A Highly-Scalable, Modular Software Switch
micchie
 
PPT
No Heap Remote Objects for Distributed real-time Java
Universidad Carlos III de Madrid
 
PDF
Cpu Caches
shinolajla
 
PDF
Cacheconcurrencyconsistency cassandra svcc
srisatish ambati
 
PDF
Theta and the Future of Accelerator Programming
inside-BigData.com
 
FlowER Erlang Openflow Controller
Holger Winkelmann
 
Kafka in action - Tech Talk - Paytm
Sumit Jain
 
Thousands of Threads and Blocking I/O
George Cao
 
mSwitch: A Highly-Scalable, Modular Software Switch
micchie
 
No Heap Remote Objects for Distributed real-time Java
Universidad Carlos III de Madrid
 
Cpu Caches
shinolajla
 
Cacheconcurrencyconsistency cassandra svcc
srisatish ambati
 
Theta and the Future of Accelerator Programming
inside-BigData.com
 

What's hot (19)

PDF
Building high traffic http front-ends. theo schlossnagle. зал 1
rit2011
 
PPT
Simple asynchronous remote invocations for distributed real-time Java
Universidad Carlos III de Madrid
 
PDF
Recent advance in netmap/VALE(mSwitch)
micchie
 
PPT
Hs java open_party
Open Party
 
PPTX
Fast Userspace OVS with AF_XDP, OVS CONF 2018
Cheng-Chun William Tu
 
PPT
2011.jtr.pbasanta.
Universidad Carlos III de Madrid
 
PDF
Userspace networking
Stephen Hemminger
 
PPTX
DevOops - Lessons Learned from an OpenStack Network Architect
James Denton
 
PDF
Training Slides: Basics 102: Introduction to Tungsten Clustering
Continuent
 
PDF
Performance challenges in software networking
Stephen Hemminger
 
PDF
What's New and Upcoming in HDFS - the Hadoop Distributed File System
Cloudera, Inc.
 
PDF
General Purpose GPU Computing
GlobalLogic Ukraine
 
PDF
Presentation: Optimal Power Management for Server Farm to Support Green Compu...
Sivadon Chaisiri
 
PPT
Jvm Performance Tunning
guest1f2740
 
PDF
Methods of NoSQL database systems benchmarking
Транслируем.бел
 
PDF
Prerequisite knowledge for shared memory concurrency
Viller Hsiao
 
PPT
Enhancing the region model of RTSJ
Universidad Carlos III de Madrid
 
Building high traffic http front-ends. theo schlossnagle. зал 1
rit2011
 
Simple asynchronous remote invocations for distributed real-time Java
Universidad Carlos III de Madrid
 
Recent advance in netmap/VALE(mSwitch)
micchie
 
Hs java open_party
Open Party
 
Fast Userspace OVS with AF_XDP, OVS CONF 2018
Cheng-Chun William Tu
 
Userspace networking
Stephen Hemminger
 
DevOops - Lessons Learned from an OpenStack Network Architect
James Denton
 
Training Slides: Basics 102: Introduction to Tungsten Clustering
Continuent
 
Performance challenges in software networking
Stephen Hemminger
 
What's New and Upcoming in HDFS - the Hadoop Distributed File System
Cloudera, Inc.
 
General Purpose GPU Computing
GlobalLogic Ukraine
 
Presentation: Optimal Power Management for Server Farm to Support Green Compu...
Sivadon Chaisiri
 
Jvm Performance Tunning
guest1f2740
 
Methods of NoSQL database systems benchmarking
Транслируем.бел
 
Prerequisite knowledge for shared memory concurrency
Viller Hsiao
 
Enhancing the region model of RTSJ
Universidad Carlos III de Madrid
 
Ad

Similar to Multi-core Parallelization in Clojure - a Case Study (20)

PPTX
Coding For Cores - C# Way
Bishnu Rawal
 
PPTX
Concurrency Programming in Java - 01 - Introduction to Concurrency Programming
Sachintha Gunasena
 
PDF
gevent at TellApart
TellApart
 
PDF
Gevent at TellApart
Kevin Ballard
 
PDF
Building Big Data Streaming Architectures
David Martínez Rego
 
PDF
STORMPresentation and all about storm_FINAL.pdf
ajajkhan16
 
PPT
Lecture 2
Mr SMAK
 
ODP
Concept of thread
Munmun Das Bhowmik
 
PDF
Simon Peyton Jones: Managing parallelism
Skills Matter
 
PDF
Peyton jones-2011-parallel haskell-the_future
Takayuki Muranushi
 
PPS
Storm presentation
Shyam Raj
 
PDF
Building a Database for the End of the World
jhugg
 
PPTX
Multi core programming 2
Robin Aggarwal
 
PDF
Scaling up java applications on windows
Juarez Junior
 
PPTX
VTU 6th Sem Elective CSE - Module 3 cloud computing
Sachin Gowda
 
PDF
Scalability, Availability & Stability Patterns
Jonas Bonér
 
PDF
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
LinkedIn
 
PDF
Real-Time Analytics with Kafka, Cassandra and Storm
John Georgiadis
 
PPT
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
VAISHNAVI MADHAN
 
PPTX
Distributed Model Validation with Epsilon
Sina Madani
 
Coding For Cores - C# Way
Bishnu Rawal
 
Concurrency Programming in Java - 01 - Introduction to Concurrency Programming
Sachintha Gunasena
 
gevent at TellApart
TellApart
 
Gevent at TellApart
Kevin Ballard
 
Building Big Data Streaming Architectures
David Martínez Rego
 
STORMPresentation and all about storm_FINAL.pdf
ajajkhan16
 
Lecture 2
Mr SMAK
 
Concept of thread
Munmun Das Bhowmik
 
Simon Peyton Jones: Managing parallelism
Skills Matter
 
Peyton jones-2011-parallel haskell-the_future
Takayuki Muranushi
 
Storm presentation
Shyam Raj
 
Building a Database for the End of the World
jhugg
 
Multi core programming 2
Robin Aggarwal
 
Scaling up java applications on windows
Juarez Junior
 
VTU 6th Sem Elective CSE - Module 3 cloud computing
Sachin Gowda
 
Scalability, Availability & Stability Patterns
Jonas Bonér
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
LinkedIn
 
Real-Time Analytics with Kafka, Cassandra and Storm
John Georgiadis
 
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
VAISHNAVI MADHAN
 
Distributed Model Validation with Epsilon
Sina Madani
 
Ad

More from elliando dias (20)

PDF
Clojurescript slides
elliando dias
 
PDF
Why you should be excited about ClojureScript
elliando dias
 
PDF
Functional Programming with Immutable Data Structures
elliando dias
 
PPT
Nomenclatura e peças de container
elliando dias
 
PDF
Geometria Projetiva
elliando dias
 
PDF
Polyglot and Poly-paradigm Programming for Better Agility
elliando dias
 
PDF
Javascript Libraries
elliando dias
 
PDF
How to Make an Eight Bit Computer and Save the World!
elliando dias
 
PDF
Ragel talk
elliando dias
 
PDF
A Practical Guide to Connecting Hardware to the Web
elliando dias
 
PDF
Introdução ao Arduino
elliando dias
 
PDF
Minicurso arduino
elliando dias
 
PDF
Incanter Data Sorcery
elliando dias
 
PDF
Rango
elliando dias
 
PDF
Fab.in.a.box - Fab Academy: Machine Design
elliando dias
 
PDF
The Digital Revolution: Machines that makes
elliando dias
 
PDF
Hadoop + Clojure
elliando dias
 
PDF
Hadoop - Simple. Scalable.
elliando dias
 
PDF
Hadoop and Hive Development at Facebook
elliando dias
 
PDF
From Lisp to Clojure/Incanter and RAn Introduction
elliando dias
 
Clojurescript slides
elliando dias
 
Why you should be excited about ClojureScript
elliando dias
 
Functional Programming with Immutable Data Structures
elliando dias
 
Nomenclatura e peças de container
elliando dias
 
Geometria Projetiva
elliando dias
 
Polyglot and Poly-paradigm Programming for Better Agility
elliando dias
 
Javascript Libraries
elliando dias
 
How to Make an Eight Bit Computer and Save the World!
elliando dias
 
Ragel talk
elliando dias
 
A Practical Guide to Connecting Hardware to the Web
elliando dias
 
Introdução ao Arduino
elliando dias
 
Minicurso arduino
elliando dias
 
Incanter Data Sorcery
elliando dias
 
Fab.in.a.box - Fab Academy: Machine Design
elliando dias
 
The Digital Revolution: Machines that makes
elliando dias
 
Hadoop + Clojure
elliando dias
 
Hadoop - Simple. Scalable.
elliando dias
 
Hadoop and Hive Development at Facebook
elliando dias
 
From Lisp to Clojure/Incanter and RAn Introduction
elliando dias
 

Multi-core Parallelization in Clojure - a Case Study

  • 1. Multi-core Parallelization in Clojure - a Case Study Johann M. Kraus and Hans A. Kestler AG Bioinformatics and Systems Biology Institute of Neural Information Processing University of Ulm 29.06.2009
  • 2. Outline 1. Concepts of parallel programming 2. Short introduction to Clojure 3. Multi-core parallel K-means - the case study 4. Analysis and Results 5. Summary
  • 3. Parallel Programming Definition: Parallel programming is a form of programming where many calculations are performed simultaneously. • Physical constraints prevent frequency scaling of processors • This led to an increasing interest in parallel hardware and parallel programming • Multi-core hardware is standard on desktop computers • Parallel software can use this hardware to the full capacity
  • 4. Large problems are divided into smaller ones and the sub-problems are solved simultaneously • Speedup S is limited by the fraction of parallelizable code P • Amdahl's law: $S = \frac{1}{(1 - P) + \frac{P}{N}}$, where N is the number of processors [Figure: Amdahl's law. Speedup (0 to 20) versus number of processors (1 to 65536) for parallelizable fractions 0.95, 0.90, 0.75, and 0.50]
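As a small worked example of Amdahl's law, the sketch below evaluates the speedup for the four parallelizable fractions shown in the plot. The function names `amdahl-speedup` and `max-speedup` are illustrative and not part of the original slides.

```clojure
(defn amdahl-speedup
  "Speedup S = 1 / ((1 - P) + P / N) for a parallelizable
  fraction p (between 0 and 1) running on n processors."
  [p n]
  (/ 1.0 (+ (- 1.0 p) (/ p n))))

(defn max-speedup
  "Limit of the speedup as the number of processors grows: 1 / (1 - P)."
  [p]
  (/ 1.0 (- 1.0 p)))

;; Print the speedup on 8 processors and the upper bound for each fraction.
(doseq [p [0.50 0.75 0.90 0.95]]
  (println "P =" p
           "S(8) =" (amdahl-speedup p 8)
           "upper bound =" (max-speedup p)))
```

For P = 0.95 the upper bound 1 / (1 - P) is 20, which matches the top of the speedup axis in the figure.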
  • 5. Concepts of Parallel Programming Explicit vs. implicit parallelization • Explicitly define communication and synchronization details for each task: • MPI • Java Threads • Functional programming allows implicit parallelization: • Parallel processing of functions • Functions are free of side-effects • Data is immutable
  • 6. Distributed vs. local hardware • Master-slave parallelization (e.g. Message Passing Interface) • Shared memory parallelization (e.g. Open Multi-Processing) [Figure: left, a master sends data to slaves 0 to 4 and the slaves send results back; right, CPUs 0 to 4 read from and write to a shared memory]
  • 7. Thread programming • Threads are refinements of a process that share the same memory and can be processed separately and simultaneously • Available in many languages, e.g. PThreads (C), Java Threads (Java), OpenMP Threads (C, Fortran) • Execution of threads is handled by a scheduler that manages the available processing time • Communication between threads is faster than communication between processes • Invoking threads is also faster than fork/join processes [Figure: thread life cycle with the states new, runnable, running, waiting, and terminated, and the transitions start, schedule, block, awake, and end]
  • 8. Concurrency control via locking and synchronizing • Concurrency control ensures that threads can access shared memory without violating data integrity • The most popular approach to concurrency is locking and synchronizing:
      public class Counter {
          private int value = 0;
          // synchronized: only one thread at a time may run incr() on this object
          public synchronized void incr() {
              value = value + 1;
          }
      }
      Counter counter = new Counter();
      counter.incr();
    • Problems might occur when using too many locks, too few locks, wrong locks, or locks in the wrong order • Using locks can be fatally error-prone, e.g. deadlocks
  • 9. Concurrency control via transactional memory • Transactional memory offers a flexible alternative to lock-based concurrency control • Functionality is analogous to controlling simultaneous access to database management systems • Transactions ensure properties: • Atomicity: Either all changes of a transaction occur or none do • Consistency: Only valid changes are committed • Isolation: No transaction sees the effect of other transactions • Durability: Changes from transactions will be persistent
  • 10. Software transactional memory maps transactional memory to concurrency control in parallel programming [Figure: sequence diagram over time for Transaction 0, Data, and Transaction 1. Both transactions get the data; each sends its modified data only under the guard "consistent data"; a further get data / send modified data exchange follows]
  • 11. Clojure • Functional programming language hosted on the JVM • Extends the code-as-data paradigm to maps and vectors • Based on immutable data structures • Provides built-in concurrency support via software transactional memory • Completely symbiotic to Java, e.g. easy access to Java libraries • Platform independent
  • 12. Java interaction
      (import '(cern.jet.random.sampling RandomSamplingAssistant))
      ;; Draw k of the integers 0..n-1 using the COLT sampling helper.
      (defn sample [n k]
        (seq (. RandomSamplingAssistant
                (sampleArray k (int-array (range n))))))
    • Dynamic typing and multi-methods • An object is defined as the sum of what it can do (methods), rather than the sum of what it is (type hierarchy) • Add type hints to speed up code
      ;; Element-wise sum of two double arrays; the slide's #^doubles hint is written ^doubles in current Clojure.
      (defn da+ [^doubles as ^doubles bs]
        (amap as i ret (+ (aget as i) (aget bs i))))
  • 13. Transactional references and STM • Transactional references ensure safe coordinated synchronous changes to mutable storage locations • Are bound to a single storage location for their lifetime • Only allow mutation of that location to occur within transactions • Available operations are ref-set, alter, and commute • No explicit locking is required
      (def counter (ref 0))
      (dosync (alter counter inc))
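To illustrate the three ref operations named on this slide, here is a minimal sketch in which many threads update two refs in one transaction. The names `members`, `record!`, and `reset-all!` are hypothetical and chosen only for this example.

```clojure
(def counter (ref 0))    ; shared count
(def members (ref []))   ; shared collection

(defn record!
  "Increment the counter and store x, both in a single transaction."
  [x]
  (dosync
    (alter counter inc)         ; alter: the transaction retries if counter changed meanwhile
    (commute members conj x)))  ; commute: the order of concurrent updates does not matter

(defn reset-all!
  "Overwrite both refs with fixed values using ref-set."
  []
  (dosync
    (ref-set counter 0)
    (ref-set members [])))

;; Run 100 updates on separate threads and wait for all of them.
(let [tasks (doall (map #(future (record! %)) (range 100)))]
  (doseq [t tasks] @t))

(println "counter:" @counter "members stored:" (count @members))
```

Because both updates happen inside one `dosync`, no reader can observe a counter value that disagrees with the number of stored members. In a standalone script, calling `(shutdown-agents)` at the end lets the JVM exit promptly after using `future`.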
  • 14. Agents • Agents allow independent asynchronous change of mutable locations • Are bound to a single storage location for their lifetime • Only allow mutation of that location to a new state to occur as a result of an action • Actions are functions that are asynchronously applied to the state of an Agent • The return value of an action becomes the new state of the Agent • Agents are integrated with the STM
      (def counter (agent 0))
      (send counter inc)
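The following sketch shows the asynchronous behaviour of agents and their integration with the STM. The names `balance` and `audit-log` are made up for this illustration.

```clojure
(def counter (agent 0))

;; send returns immediately; the actions are applied to the agent one at a time.
(dotimes [_ 1000] (send counter inc))

;; await blocks until every action sent so far from this thread has run.
(await counter)
(println "counter:" @counter)

;; STM integration: an action sent inside a transaction is held back
;; until the transaction commits, so a retried transaction sends it only once.
(def balance (ref 0))
(def audit-log (agent []))

(dosync
  (alter balance + 10)
  (send audit-log conj :deposit))

(await audit-log)
(println "balance:" @balance "log:" @audit-log)
```

Reading an agent with `@` never blocks, so `await` is what makes the printed value reflect all pending actions.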
  • 15. Cluster analysis • Given a data set X, compute a partition of X into k disjoint clusters C, such that: (1) $\bigcup_{i=1}^{k} C_i = X$ (2) $C_i \neq \emptyset$ and $C_i \cap C_j = \emptyset$ for $i \neq j$ • How many clusters are in the data set? [Figure: example partitions with 3 clusters and with 9 clusters]
  • 16. Cluster algorithms • For all possible partitions, evaluate the objective function f and search for the optimum. • The cardinality of the set of all possible partitions is given by the Stirling numbers of the second kind: $S_N^{(k)} = \frac{1}{k!} \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} i^N$ [Figure: plot with the axes "Number of clusters", "Number of data points", and "Runtime (nanosecond)"] • Cluster algorithms provide a heuristic for this search: • Partitional clustering (K-means, Neuralgas, SOM, Fuzzy C-means, ...) • Hierarchical clustering (Divisive/agglomerative, Complete linkage, ...) • Graph-based clustering (Spectral clustering, NMF, Affinity propagation, ...) • Model-based clustering, Biclustering, Semi-supervised clustering
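To give a feeling for how quickly the number of partitions grows, the sketch below evaluates the Stirling number of the second kind directly from the sum formula. The helper names are illustrative; the arithmetic uses Clojure's auto-promoting operators `*'` and `+'` so that large values do not overflow.

```clojure
(defn factorial [n]
  (reduce *' 1 (range 1 (inc n))))

(defn binomial [n i]
  (/ (factorial n) (*' (factorial i) (factorial (- n i)))))

(defn pow
  "b raised to the non-negative integer power e."
  [b e]
  (reduce *' 1 (repeat e b)))

(defn stirling2
  "Number of ways to partition n items into k non-empty clusters:
  (1 / k!) * sum over i = 0..k of (-1)^(k-i) * C(k, i) * i^n."
  [n k]
  (/ (reduce +'
             (for [i (range (inc k))]
               (*' (if (even? (- k i)) 1 -1)
                   (binomial k i)
                   (pow i n))))
     (factorial k)))

(println "4 items, 2 clusters:" (stirling2 4 2))
(println "30 items, 3 clusters:" (stirling2 30 3))
```

Four items can be split into two non-empty clusters in 7 ways, which is small enough to check by hand. Exhaustive search stops being feasible already for a few dozen data points, which is why heuristics such as K-means are used.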
  • 17. K-means algorithm
      Function KMeans
        Input:  X = {x_1, ..., x_n}   (Data to be clustered)
                k                     (Number of clusters)
        Output: C = {c_1, ..., c_k}   (Cluster centroids)
                m: X -> C             (Cluster assignments)
        Initialize C (e.g. random selection from X)
        While C has changed
          For each x_i in X
            m(x_i) = argmin_j distance(x_i, c_j)
          End
          For each c_j in C
            c_j = centroid({x_i | m(x_i) = j})
          End
        End
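The pseudocode can be sketched as a plain sequential Clojure function. This is not the authors' McKmeans implementation; it is a minimal illustration that assumes squared Euclidean distance, takes the initial centroids as an argument instead of sampling them, and keeps the old centroid when a cluster becomes empty. All function names are hypothetical.

```clojure
(defn sq-dist
  "Squared Euclidean distance between two numeric vectors."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn nearest
  "Index j of the centroid closest to point x."
  [centroids x]
  (apply min-key #(sq-dist x (nth centroids %)) (range (count centroids))))

(defn centroid
  "Component-wise mean of a non-empty collection of points."
  [points]
  (let [n (double (count points))]
    (mapv #(/ % n) (apply map + points))))

(defn update-centroids
  "One iteration: assign every point, then recompute each centroid.
  Assumption: a cluster without members keeps its previous centroid."
  [xs centroids]
  (let [groups (group-by #(nearest centroids %) xs)]
    (vec (map-indexed (fn [j c]
                        (if-let [pts (get groups j)] (centroid pts) c))
                      centroids))))

(defn kmeans
  "Iterate until the centroids no longer change."
  [xs init-centroids]
  (loop [cs (vec init-centroids)]
    (let [new-cs (update-centroids xs cs)]
      (if (= cs new-cs)
        {:centroids cs
         :assignments (mapv #(nearest cs %) xs)}
        (recur new-cs)))))

(def result
  (kmeans [[0.0 0.0] [0.0 1.0] [10.0 10.0] [10.0 11.0]]
          [[0.0 0.0] [10.0 10.0]]))

(println result)
```

Passing the initial centroids explicitly keeps the example reproducible; a random selection from X, as suggested in the pseudocode, could be obtained with `(take k (shuffle xs))`.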
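  • 18. Cluster Validation • Evaluation requires repeated runs of clustering, e.g.: • Resampled data sets • Different parameters • MCA-index: mean proportion of samples being consistent over different clusterings: $MCA = \frac{1}{n} \max_{\pi} \sum_{i=1}^{k} |A_i \cap B_j|$, where $j = \pi(i)$ is the cluster of the second clustering that the permutation $\pi$ matches to cluster $i$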
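A brute-force sketch of the MCA index for two label vectors follows. It assumes that cluster labels are the integers 0 to k-1 and tries every permutation of the labels, which is only practical for small k; for larger k an assignment-problem solver such as the Hungarian algorithm could replace the exhaustive search. The function names are illustrative.

```clojure
(defn permutations
  "All orderings of the distinct, non-nil items in xs."
  [xs]
  (if (empty? xs)
    [[]]
    (for [x xs
          p (permutations (remove #{x} xs))]
      (cons x p))))

(defn mca
  "MCA index of two clusterings a and b, given as label vectors of equal
  length with labels in 0..k-1: the largest fraction of samples on which
  the clusterings agree under any matching of cluster labels."
  [a b k]
  (let [n       (count a)
        ;; pairs maps [i j] to |A_i intersect B_j|
        pairs   (frequencies (map vector a b))
        overlap (fn [perm]
                  (reduce + (map-indexed (fn [i j] (get pairs [i j] 0)) perm)))]
    (/ (apply max (map overlap (permutations (range k))))
       (double n))))

;; Same grouping with renamed labels, and two unrelated groupings.
(println (mca [0 0 1 1 2 2] [1 1 2 2 0 0] 3))
(println (mca [0 0 1 1] [0 1 0 1] 2))
```

The first call yields 1.0 because the two clusterings differ only in how the clusters are named; the second yields 0.5 because no relabeling makes more than half of the samples agree.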
  • 19. Estimation of the expected value of a validation index • Random label: randomly assign each item to a cluster k • Random partition: choose a random partition • Random prototype: assign each item to its nearest prototype [Figure: mean MCA index (0.0 to 1.0) versus number of clusters (0 to 50); mean value from 100 runs]
  • 20. Multi-core K-means with Clojure • Split the data set into smaller pieces that are handled by agents • Each cluster is represented by an agent • Add a commutative list of cluster members within a transactional reference to accelerate the centroid update step [Figure: data agents 0 to n, cluster agents 0 to k, and member refs 0 to k, connected by read and write arrows]
  • 21. [Figure: two panels. Simultaneous read: data agents 0 to n read from cluster agents 0 to k. Simultaneous write: data agents 0 to n write to the member refs]
  • 22. read: (nearest-cluster) • write: (commute), (assoc)
      ;; Helpers come first so that each function is defined before it is used.
      (defn update-datapoint [datapoint]
        (let [newass (nearest-cluster datapoint)]
          ;; commute lets concurrent data agents add members without conflicts
          (dosync (commute (nth MemberRefs newass) conj (:data datapoint)))
          (assoc datapoint :assignment newass)))
      (defn update-dataagent [datapoints]
        ;; doall forces the lazy seq so the updates run inside the agent action
        (doall (map update-datapoint datapoints)))
      (defn assignment []
        ;; dorun forces the lazy seq so every send is actually dispatched
        (dorun (map #(send % update-dataagent) DataAgents)))
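The assignment step from this slide can be tried out in a self-contained toy setting. Everything below that is not on the slide is an assumption made for illustration: the fixed `centroids` vector (the slides use cluster agents instead), the four sample points, the squared Euclidean distance, and the lower-case names `member-refs` and `data-agents`.

```clojure
;; Assumed fixed centroids; the original design reads them from cluster agents.
(def centroids [[0.0 0.0] [10.0 10.0]])

;; One transactional member list per cluster.
(def member-refs (vec (repeatedly (count centroids) #(ref []))))

;; Two data agents, each holding its own chunk of the data set.
(def data-agents
  (mapv agent
        [[{:data [0.0 1.0]} {:data [1.0 0.0]}]
         [{:data [9.0 10.0]} {:data [10.0 11.0]}]]))

(defn sq-dist [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn nearest-cluster
  "Index of the centroid closest to the data point."
  [datapoint]
  (apply min-key #(sq-dist (:data datapoint) (nth centroids %))
         (range (count centroids))))

(defn update-datapoint [datapoint]
  (let [newass (nearest-cluster datapoint)]
    (dosync (commute (nth member-refs newass) conj (:data datapoint)))
    (assoc datapoint :assignment newass)))

(defn update-dataagent [datapoints]
  (mapv update-datapoint datapoints))

(defn assignment
  "Send the update action to every data agent and wait for completion."
  []
  (doseq [a data-agents] (send a update-dataagent))
  (apply await data-agents))

(assignment)
(println "members of cluster 0:" @(nth member-refs 0))
(println "members of cluster 1:" @(nth member-refs 1))
```

Each agent processes its own chunk independently, and `commute` allows several agents to append to the same member list without forcing each other's transactions to retry.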
  • 23. Benchmark results • Large data sets (artificial): • Each data point is sampled from N(0,1) • Summary for 10 runs of K-means [Figure: two runtime bar charts. Left: 10.000 cases, 100 dimensions, 20 clusters, runtime in seconds for ParaKMeans, K-means R, and McKmeans. Right: 1.000.000 cases, 200 dimensions, 20 clusters, runtime in minutes for K-means R and McKmeans]
  • 24. Number of computer cores used • Number of data agents used [Figure: two runtime plots (seconds) for a 100.000 x 500 data set with 20 clusters. One shows runtime versus number of computer cores (1, 4, 8), the other runtime versus number of data agents (4, 6, 8, 10)]
  • 25. Large data sets with cluster structure • Data sampled from a multi-variate normal distribution • 100000 samples, 200/500 dimensions, 10/20 clusters [Figure: runtime in seconds (0 to 2000) of K-means R and McKmeans for the settings 200 / 10, 200 / 20, 500 / 10, and 500 / 20; x-axis labeled "Number of samples / Number of clusters"]
  • 26. Accuracy compared to the known grouping of data • Measured with the MCA index • Red bars indicate the random-prototype baseline [Figure: MCA index (0.0 to 1.0) of McKmeans and K-means R for four data sets: 100.000 x 200 with 10 clusters, 100.000 x 200 with 20 clusters, 100.000 x 500 with 10 clusters, and 100.000 x 500 with 20 clusters]
  • 27. Real world data set • Microarray data (Radiation-induced changes in human gene expression) • 22277 samples (genes) and 465 features (profiles) [Figure: runtime in seconds of K-means R and McKmeans for 2, 5, 10, and 20 clusters] • Smirnov D, Morley M, Shin E, Spielman R, Cheung V: Genetic analysis of radiation-induced changes in human gene expression. Nature 2009, 459:587–591
  • 28. Application to Cluster Number Estimation • Repeated clustering with different subsets of data • Repeated for different number of clusters k • Most stable clustering is produced for the ‘real’ cluster number • Jackknife resampling • Evaluation with MCA index • Data set: 100000 samples, 100 features, 3 clusters • 10 runs per cluster number • 49.26 minutes on dual-quad core 3.2 GHz [Figure: MCA index (0.0 to 1.0) versus number of clusters (2 to 7)]
  • 29. Java GUI
      (import '(javax.swing JFrame JLabel JTextField JButton)
              '(java.awt.event ActionListener)
              '(java.awt GridLayout))
      (let [frame (new JFrame "Hello, World!")
            hello-button (new JButton "Say hello")
            hello-label (new JLabel "")]
        ;; proxy creates an ActionListener whose handler sets the label text
        (. hello-button
           (addActionListener
             (proxy [ActionListener] []
               (actionPerformed [evt]
                 (. hello-label (setText "Hello, World!"))))))
        (doto frame
          (.setLayout (new GridLayout 1 1 3 3))
          (.add hello-button)
          (.add hello-label)
          (.setSize 300 80)
          (.setVisible true)))
  • 31. Summary • Writing parallel programs usually requires careful software design and a deep knowledge of thread-safe programming • Concurrency control via transactional memory circumvents problems of lock-based concurrency strategies • Immutable data structures play a key role in software transactional memory • Clojure combines Lisp, Java and a powerful STM system • This enables fast parallelization of algorithms, even for rapid prototyping • Our simulations show good performance of the parallelized code
  • 32. Thank you for your attention.
  • 33. Statistical computing library • http://wiki.github.com/liebke/incanter • Clojure-based statistical computing • R-like semantics • COLT library for numerical computation • JFreeChart library for graphics