Python threads: Dive into GIL!

       PyCon India 2011
       Pune, Sept 16-18



Vishal Kanaujia & Chetan Giridhar
Python: A multithreading example
Setting up the context!!
“Hmm… my threads should be twice as fast on dual cores!”

[Chart: execution time of a CPU-bound multithreaded program, Python v2.7]

      Python v2.7     Execution Time
      Single Core     74 s
      Dual Core       116 s

      A 57% increase in execution time (performance dip) on dual core!!
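The slides do not include the benchmark source; what follows is a minimal sketch, assuming a pure-Python countdown split across two threads (the COUNT value and timing style are illustrative, not the authors' exact code):

    # Two CPU-bound threads timed end to end; on CPython the GIL serializes them.
    import threading
    import time

    def countdown(n):
        # Pure-Python CPU-bound loop; it never releases the GIL voluntarily.
        while n > 0:
            n -= 1

    COUNT = 50000000  # illustrative workload size; tune for your machine

    start = time.time()
    t1 = threading.Thread(target=countdown, args=(COUNT // 2,))
    t2 = threading.Thread(target=countdown, args=(COUNT // 2,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    print("Two threads took %.2f s" % (time.time() - start))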
Getting into the problem space!!
(Python v2.7, GIL)
Python Threads
• Real system threads (POSIX/Windows threads)
• The Python VM has no notion of thread management
   – No thread priorities or pre-emption
• The native operating system supervises thread scheduling
• The Python interpreter just does per-thread bookkeeping
What’s wrong with Py threads?

• Each ‘running’ thread requires exclusive access to data structures in the Python interpreter
• The Global Interpreter Lock (GIL) provides this synchronization (bookkeeping)
• The GIL is necessary mainly because CPython's memory management is not thread-safe
GIL: Code Details
• A thread create request in Python is just a pthread_create() call
• The function Py_Initialize() creates the GIL
• The GIL is simply a synchronization primitive; it can be implemented with a semaphore/mutex
• A “runnable” thread acquires this lock and starts execution
GIL Management
• How does Python manage the GIL?
   – The Python interpreter regularly performs a “check” on the running thread
   – The check accomplishes thread switching and signal handling
• What is a “check”?
   – A counter of ticks; ticks decrement as a thread runs
   – A tick roughly maps to a Python VM byte-code instruction
   – The check interval can be set with sys.setcheckinterval(interval)
   – Checks dictate the CPU time-slice available to a thread
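As a small sketch of this knob (CPython 2.x; the value 1000 is just an illustration):

    import sys

    print(sys.getcheckinterval())   # defaults to 100 ticks
    sys.setcheckinterval(1000)      # check less often: less switching overhead,
                                    # but slower signal handling / thread switching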
GIL Management: Implementation
• Involves two global variables:
        • PyAPI_DATA(volatile int) _Py_Ticker;
        • PyAPI_DATA(int) _Py_CheckInterval;
• As soon as the ticks reach zero:
   •   The ticker is refilled
   •   The active thread releases the GIL
   •   It signals sleeping threads to wake up
   •   Everyone competes for the GIL
Two CPU-bound threads on a single-core machine

[Timeline diagram: thread1 runs until a “check” (check1), releases the GIL and
signals thread2; thread2, which was suspended waiting for the GIL, wakes up,
acquires the GIL and runs until the next check (check2), while thread1 is now
the one suspended, waiting for the GIL.]
GIL impact
• There is considerable time lag with
   – Communication (signaling)
   – Thread wake-up
   – GIL acquisition
• GIL management is independent of host/native OS scheduling
• Result (try Ctrl+C: does it stop execution?)
   – Significant overhead
   – A thread waits if the GIL is unavailable
   – Threads run sequentially, rather than concurrently
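A hedged sketch of the Ctrl+C note above (assuming CPython 2.x; exact behaviour varies by version and platform):

    import threading

    def spin():
        # Pure-Python CPU-bound loop, holding/contending for the GIL.
        n = 10 ** 8
        while n > 0:
            n -= 1

    t = threading.Thread(target=spin)
    t.start()
    # Press Ctrl+C while the main thread is blocked here: the KeyboardInterrupt
    # is often delayed (or seems ignored) until the worker finishes, because
    # signals are only handled in the main thread at "check" boundaries.
    t.join()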
Curious case of multicore system

[Timeline diagram: the host OS schedules thread1 on core0 and thread2 on core1
at the same time, but only one of them can hold the GIL at any instant.]

• Conflicting goals of the OS scheduler and the Python interpreter
• The host OS can schedule threads concurrently on multi-core
• GIL battle
The ‘Priority inversion’
• In a mixed [CPU, I/O]-bound application, the I/O-bound thread may starve!
• “Cache-hotness” may influence the choice of the new GIL owner; it is usually the most recent owner!
• The CPU-bound thread is thus preferred over the I/O-bound thread
• Python presents a priority inversion on multi-core systems
Understanding the new story!!
(Python v3.2, GIL)
New GIL: Python v3.2
• The regular “checks” are discontinued
• A new time-out mechanism replaces them
   • Default time-out = 5 ms
   • Configurable through sys.setswitchinterval()
• On every time-out, the current GIL holder is forced to release the GIL
• It then signals the other waiting threads
• It waits for a signal from the new GIL owner (acknowledgement)
• A sleeping thread wakes up, acquires the GIL, and signals the last owner
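A minimal sketch of the new knob (the 0.001 value is illustrative):

    import sys

    print(sys.getswitchinterval())   # 0.005 by default, i.e. 5 ms
    sys.setswitchinterval(0.001)     # request more frequent switching (seconds)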
Curious Case of Multicore: Python v3.2

[Timeline diagram: the CPU-bound thread on core0 runs until the time-out,
releases the GIL and waits; the CPU-bound thread on core1, previously
suspended waiting for the GIL, wakes up, acquires the GIL and runs, while the
first thread stays suspended until the next switch.]
Positive impact with new GIL
• Better GIL arbitration
   • Ensures that a thread runs for at most 5 ms before releasing the GIL
• Less context switching and fewer signals
• Multicore perspective: the GIL battle is eliminated!
• More responsive threads (fair scheduling)
• All iz well ☺
I/O threads in Python
• An interesting optimization by the interpreter
   – I/O calls are assumed to be blocking
• Python I/O extensively exercises this optimization for file and socket
  operations (e.g. read, write, send, recv calls)
   ./Python3.2.1/Include/ceval.h
       Py_BEGIN_ALLOW_THREADS
           Do some blocking I/O operation ...
       Py_END_ALLOW_THREADS
• An I/O thread always releases the GIL around the blocking call
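A sketch (not from the slides) of what this buys you: the blocking call below releases the GIL, so the CPU-bound thread keeps running in parallel with it; time.sleep() stands in for any blocking file/socket call:

    import threading
    import time

    def cpu_work():
        n = 10 ** 7
        while n > 0:
            n -= 1

    def io_work():
        # Blocking calls like this are wrapped in Py_BEGIN_ALLOW_THREADS /
        # Py_END_ALLOW_THREADS inside the interpreter, so the GIL is released.
        time.sleep(1.0)

    start = time.time()
    threads = [threading.Thread(target=cpu_work), threading.Thread(target=io_work)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Elapsed time is roughly max(cpu_time, 1.0 s), not their sum, because the
    # sleeping thread does not hold the GIL.
    print("elapsed: %.2f s" % (time.time() - start))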
Convoy effect: Fallout of I/O optimization
• When an I/O thread releases the GIL, another ‘runnable’ CPU-bound thread can
  acquire it (remember, we are on multiple cores)
• That leaves the I/O thread waiting for another time-out (default: 5 ms)!
• Once the CPU thread releases the GIL, the I/O thread acquires and releases it again
• This cycle goes on => performance suffers
Convoy “in” effect

Convoy effect: observed in an application comprising I/O-bound and CPU-bound threads.

[Timeline diagram: the I/O thread on core0 releases the GIL for a blocking
call and later finishes its I/O, but the CPU thread on core1 has acquired the
GIL and keeps it until the time-out; the I/O thread sits suspended, waiting
for the GIL, before it can run again.]
Performance measurements!
• Curious to know how the convoy effect translates into performance numbers
• We performed the following test with Python 3.2:
   • An application with one CPU-bound and one I/O-bound thread
   • Executed on a dual-core machine
   • The CPU thread runs for only a few seconds (<10 s)!

      I/O thread with CPU thread    I/O thread without CPU thread
             97 seconds                      23 seconds
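The slides give only the numbers; a minimal sketch of such a test could look like this, assuming a Unix-like machine (/dev/zero, chunk counts and workload sizes are illustrative):

    import threading
    import time

    def cpu_bound():
        n = 5 * 10 ** 7
        while n > 0:
            n -= 1

    def io_bound(path="/dev/zero", chunks=200000):
        # Many small blocking reads; each one releases and re-acquires the GIL.
        with open(path, "rb") as f:
            for _ in range(chunks):
                f.read(64)

    def timed(threads):
        start = time.time()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return time.time() - start

    print("I/O alone: %.2f s" % timed([threading.Thread(target=io_bound)]))
    print("I/O + CPU: %.2f s" % timed([threading.Thread(target=io_bound),
                                       threading.Thread(target=cpu_bound)]))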
Comparing: Python 2.7 & Python 3.2

      Python v2.7     Execution Time          Python v3.2     Execution Time
      Single Core     74 s                    Single Core     55 s
      Dual Core       116 s                   Dual Core       65 s

[Charts: execution time on single core and on dual core, v2.7 vs v3.2, and
execution time of Python v3.2 on single core vs dual core]

      A performance dip is still observed on dual cores!
From the problem space (Python v2.7 / v3.2, GIL)
to the solution space!!
GIL free world: Jython
• Jython is free of the GIL ☺
• It can fully exploit multiple cores, as per our experiments
• Experiments with Jython 2.5
   – Run with two CPU-bound threads in tandem

      Jython2.5      Execution time
      Single core    44 s
      Dual core      25 s

• The experiment shows a performance improvement on a multi-core system
Avoiding GIL impact with multiprocessing
• multiprocessing — a process-based “threading” interface
• The “multiprocessing” module spawns a new Python interpreter instance per process
• Each process is independent, and the GIL is irrelevant:
   – Utilizes multiple cores better than threads
   – Shares its API with the “threading” module

      Python v2.7        Single Core    Dual Core
      threading          76 s           114 s
      multiprocessing    72 s           43 s

  Cool! 40% improvement in execution time on dual core!! ☺
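A sketch of the API parity (workload and sizes are illustrative, not the authors' benchmark): swapping threading.Thread for multiprocessing.Process is often the only change needed, and only the latter uses both cores for CPU-bound work.

    import multiprocessing
    import threading
    import time

    def countdown(n):
        while n > 0:
            n -= 1

    def run(worker_cls, count=5 * 10 ** 7):
        # worker_cls is either threading.Thread or multiprocessing.Process;
        # both expose the same target/args/start/join interface.
        start = time.time()
        workers = [worker_cls(target=countdown, args=(count // 2,)) for _ in range(2)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return time.time() - start

    if __name__ == "__main__":   # required for multiprocessing on some platforms
        print("threading:       %.2f s" % run(threading.Thread))
        print("multiprocessing: %.2f s" % run(multiprocessing.Process))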
Conclusion
• Multi-core systems are becoming ubiquitous
• Python applications should exploit this abundant power
• CPython inherently suffers from the GIL limitation
• An informed awareness of the Python interpreter’s behavior helps in developing multithreaded applications
• Understand and use ☺
Questions

Thank you for your time and attention ☺

• Please share your feedback/comments/suggestions with us at:
   • cjgiridhar@gmail.com, http://technobeans.com
   • vishalkanaujia@gmail.com, http://freethreads.wordpress.com
References
• Understanding the Python GIL, http://dabeaz.com/talks.html
• GlobalInterpreterLock, http://wiki.python.org/moin/GlobalInterpreterLock
• Thread State and the Global Interpreter Lock, http://docs.python.org/c-api/init.html#threads
• Python v3.2.2 and v2.7.2 documentation, http://docs.python.org/
• Concurrency and Python, http://drdobbs.com/open-source/206103078?pgno=3
Backup slides
Python: GIL
• A thread needs the GIL before updating Python objects or calling C/Python API functions
• Concurrency is emulated with regular ‘checks’ to switch threads
• Applicable only to CPU-bound threads
• A blocking I/O operation implies relinquishing the GIL
   – ./Python2.7.5/Include/ceval.h
       Py_BEGIN_ALLOW_THREADS
           Do some blocking I/O operation ...
       Py_END_ALLOW_THREADS
• Python file I/O extensively exercises this optimization
GIL: Internals
• The function Py_Initialize() creates the GIL
• A thread create request in Python is just a pthread_create() call
• ../Python/ceval.c
      static PyThread_type_lock interpreter_lock = 0; /* This is the GIL */
• thread_PyThread_start_new_thread() is called for each user-defined thread
• It calls PyEval_InitThreads() -> PyThread_acquire_lock()
GIL: in action
• Each CPU-bound thread requires the GIL
• The ‘tick count’ determines the duration of a GIL hold
• new_threadstate() -> tick_counter
• The interpreter keeps a list of Python threads, and each thread-state has its own tick_counter value
• As soon as the ticks decrement to zero, the thread releases the GIL
GIL: Details
thread_PyThread_start_new_thread() ->
void PyEval_InitThreads(void)
{
  if (interpreter_lock)
     return;
  interpreter_lock = PyThread_allocate_lock();
  PyThread_acquire_lock(interpreter_lock, 1);
  main_thread = PyThread_get_thread_ident();
}
Convoy effect: Python v2?
• The convoy effect holds true for Python v2 also
• The smaller ‘check’ interval saves the day!
   – I/O threads don’t have to wait for a longer time (5 ms) for CPU threads to finish
   – One should choose setswitchinterval() wisely
• The effect is not so visible in Python v2
Stackless Python
• A different way of creating threads:
  Microthreads!
• No improvement from multi-core perspective
• Round-robin scheduling for “tasklets”
• Sequential execution
