Wednesday, 11 September 2019

Python threading vs multiprocessing

Hi all,

I've been given access to a supercomputer! That's fun!

At first I thought I was doing something clever by importing the threading module into my Python script, but I quickly discovered that all was not as it seemed. I'll cut to the nub of it.

I've written some code that rolls Yahtzees (five of a kind with six-sided dice). It's a nice, simple test that generates CPU load. The first thing I noticed was that my code, running on a supercomputer with the threading module, was no faster than on my local machine. This was a problem.
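For anyone who wants to picture the workload, here is a minimal sketch of a Yahtzee roller. It is not the script I ran on the cluster: the function names (roll_set, roll_yahtzees), the seed parameter and the rule of abandoning a set at the first die that doesn't match are all assumptions made for illustration. The counters mirror the ones in the results files below.

```python
import random


def roll_set(rng, dice=5, sides=6):
    """Roll one set of dice, giving up at the first die that doesn't match.

    Returns (is_yahtzee, rolls_used). The early abandon is an assumption,
    not necessarily how the original script behaves.
    """
    first = rng.randint(1, sides)
    rolls = 1
    for _ in range(dice - 1):
        rolls += 1
        if rng.randint(1, sides) != first:
            return False, rolls
    return True, rolls


def roll_yahtzees(target, seed=None):
    """Keep rolling sets until `target` Yahtzees have been seen."""
    rng = random.Random(seed)
    yahtzees = rolls = abandoned = 0
    while yahtzees < target:
        hit, used = roll_set(rng)
        rolls += used
        if hit:
            yahtzees += 1
        else:
            abandoned += 1
    return {
        "yahtzees": yahtzees,
        "rolls": rolls,
        "abandoned": abandoned,
        "total_sets": yahtzees + abandoned,
    }


if __name__ == "__main__":
    stats = roll_yahtzees(100, seed=2019)
    print("Number of yahtzees:", stats["yahtzees"])
    print("Number of dice rolls:", stats["rolls"])
    print("Number of abandoned sets:", stats["abandoned"])
    print("Total sets:", stats["total_sets"])
```

It's pure Python arithmetic in a tight loop, which is exactly the kind of CPU-bound work where the choice of library matters.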

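After some reading I discovered that, although the threading module allows a form of concurrent execution, it was never designed to do what I considered to be multi-threaded computation. In CPython, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so CPU-bound threads take turns rather than running in parallel.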
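Here is a minimal sketch of what the threaded approach looks like, again as an illustration rather than my actual script. The names count_yahtzees and run_threaded are made up, and each thread gets its own seeded random generator so the result is repeatable. On a standard CPython build with the GIL, the threads in this sketch would be expected to share roughly one core between them, however many cores the job has been given.

```python
import random
import threading


def count_yahtzees(sets, seed):
    """Roll `sets` sets of five dice and count the five-of-a-kinds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sets):
        dice = [rng.randint(1, 6) for _ in range(5)]
        if len(set(dice)) == 1:
            hits += 1
    return hits


def run_threaded(n_threads, sets_per_thread):
    """Run the same CPU-bound job in several threads and collect the counts."""
    results = [0] * n_threads

    def worker(index):
        # Each thread writes only to its own slot, so no lock is needed here.
        results[index] = count_yahtzees(sets_per_thread, seed=index)

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_threaded(4, 50_000))
```

Threads are still a good fit for work that spends its time waiting (network, disk), because the GIL is released while a thread blocks on I/O. Dice rolling never waits on anything.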

So I wrote a new version of the code to use the multiprocessing library instead. All of a sudden we're actually doing some computation in parallel. Here are the tests I performed:

4 Threads on 4 Cores:


Multiprocess:


[xxxxx@xxxxx ~]$ cat results.txt
Number of yahtzees: 10000
Number of dice rolls: 3436750
Number of abandoned sets: 1990095
Total sets: 2000095
Running SLURM prolog script on xxxxx.cluster.local
===============================================================================
Job started on Wed Sep 11 10:08:10 BST 2019
Job ID          : 61100
Job name        : processyatzee.sh
WorkDir         : /mainfs/home/xxxx
Command         : /mainfs/home/xxxxxx/processyatzee.sh
Partition       : scavenger
Num hosts       : 1
Num cores       : 4
Num of tasks    : 4
Hosts allocated : xxxxxxx
Job Output Follows ...
===============================================================================
Writing to file
==============================================================================
Running epilogue script on xxxxxxxxx.
Submit time  : 2019-09-11T10:08:06
Start time   : 2019-09-11T10:08:10
End time     : 2019-09-11T10:08:20
Elapsed time : 00:00:10 (Timelimit=00:15:00)
Job Efficiency is: 0.00%

Threading:



[xxxxx@xxxx ~]$ cat results.txt

Number of yahtzees: 10000
Number of dice rolls: 27977719
Number of abandoned sets: 10899334
Total sets: 10909334
Running SLURM prolog script on xxxxx.cluster.local
===============================================================================
Job started on Wed Sep 11 10:07:35 BST 2019
Job ID          : 61098
Job name        : threadyahtzee.sh
WorkDir         : /mainfs/home/xxxxx
Command         : /mainfs/home/xxxxx/threadyahtzee.sh
Partition       : scavenger
Num hosts       : 1
Num cores       : 4
Num of tasks    : 4
Hosts allocated :xxxxx
Job Output Follows ...
===============================================================================
==============================================================================
Running epilogue script on xxxxxx
Submit time  : 2019-09-11T10:07:34
Start time   : 2019-09-11T10:07:35
End time     : 2019-09-11T10:08:48
Elapsed time : 00:01:13 (Timelimit=00:15:00)
Job Efficiency is: 38.01%

Job efficiency is interesting here: it suggests that threading is more efficient even though it took about seven times longer (73 seconds against 10). Something to look into.


20 Threads on 4 Cores:

Multiprocess:

Number of yahtzees: 10000
Number of dice rolls: 648418
Number of abandoned sets: 1416784
Total sets: 1426784
Running SLURM prolog script on xxxx.cluster.local
===============================================================================
Job started on Wed Sep 11 10:10:45 BST 2019
Job ID          : 61102
Job name        : processyatzee.sh
WorkDir         : /mainfs/home/xxxxxx
Command         : /mainfs/home/xxxxx/processyatzee.sh
Partition       : scavenger
Num hosts       : 1
Num cores       : 4
Num of tasks    : 4
Hosts allocated : xxxxxxxx
Job Output Follows ...
===============================================================================
Writing to file
==============================================================================
Running epilogue script on xxxxxx.
Submit time  : 2019-09-11T10:10:44
Start time   : 2019-09-11T10:10:45
End time     : 2019-09-11T10:10:55
Elapsed time : 00:00:10 (Timelimit=00:15:00)
Job Efficiency is: 0.00%

Threading: 


Number of yahtzees: 10000
Number of dice rolls: 20623770
Number of abandoned sets: 12071981
Total sets: 12081981
Running SLURM prolog script on xxxxx.cluster.local
===============================================================================
Job started on Wed Sep 11 10:11:06 BST 2019
Job ID          : 61103
Job name        : threadyahtzee.sh
WorkDir         : /mainfs/home/xxxxxxxx
Command         : /mainfs/home/xxxxxx/threadyahtzee.sh
Partition       : scavenger
Num hosts       : 1
Num cores       : 4
Num of tasks    : 4
Hosts allocated : xxxxxx
Job Output Follows ...
===============================================================================
Running epilogue script on xxxxx.
Submit time  : 2019-09-11T10:10:50
Start time   : 2019-09-11T10:11:05
End time     : 2019-09-11T10:12:09
Elapsed time : 00:01:04 (Timelimit=00:15:00)
Job Efficiency is: 38.28%

Efficiency is still 0% for multiprocess, and it finishes in the same 10 seconds as before, still far ahead of threading. Efficiency for threading has barely moved (38.28% against 38.01%), so piling 20 threads onto 4 cores made little difference either way. I'm asking for something silly on a service which isn't hyper-threaded, with a library that says threading but actually isn't. And if that makes sense, then nothing else will :-)

20 Threads on 20 Cores:

Multiprocess:

Number of yahtzees: 10000
Number of dice rolls: 487953
Number of abandoned sets: 1865100
Total sets: 1875100
[xxxxx@xxxxx1 ~]$ cat slurm-61106.out
Running SLURM prolog script on xxxxxxx.cluster.local
===============================================================================
Job started on Wed Sep 11 10:13:34 BST 2019
Job ID          : 61106
Job name        : processyatzee.sh
WorkDir         : /mainfs/home/xxxxxx
Command         : /mainfs/home/xxxxxx/processyatzee.sh
Partition       : scavenger
Num hosts       : 1
Num cores       : 20
Num of tasks    : 20
Hosts allocated : xxxxx
Job Output Follows ...
===============================================================================
Writing to file
==============================================================================
Running epilogue script on xxxxxxxx.
Submit time  : 2019-09-11T10:13:33
Start time   : 2019-09-11T10:13:33
End time     : 2019-09-11T10:13:47
Elapsed time : 00:00:14 (Timelimit=00:15:00)
Job Efficiency is: 56.79%

Threading:


Number of yahtzees: 10000
Number of dice rolls: 18743572
Number of abandoned sets: 10097238
Total sets: 10107238
Running SLURM prolog script on xxxxxx.cluster.local
===============================================================================
Job started on Wed Sep 11 10:11:53 BST 2019
Job ID          : 61107
Job name        : threadyahtzee.sh
WorkDir         : /mainfs/home/xxxxx
Command         : /mainfs/home/xxxxxx/threadyahtzee.sh
Partition       : scavenger
Num hosts       : 1
Num cores       : 20
Num of tasks    : 20
Hosts allocated : xxxxxx
Job Output Follows ...
===============================================================================
==============================================================================
Running epilogue script on xxxxx.
Submit time  : 2019-09-11T10:13:44
Start time   : 2019-09-11T10:13:50
End time     : 2019-09-11T10:14:57
Elapsed time : 00:01:07 (Timelimit=00:15:00)
Job Efficiency is: 7.61%

This last one is very interesting. Why would multiprocess efficiency suddenly jump to 57%? Why would threading fall to 8%? I'm launching one thread per core, just as in the 4 on 4 test.
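One possible reading of the threading numbers, and it is only a guess: if the epilogue script works out "Job Efficiency" as CPU time used divided by (allocated cores x elapsed time), then efficiency multiplied by the core count gives the average number of cores kept busy. I haven't confirmed that this is how the cluster calculates it, so treat the formula as an assumption. The figures in the sketch below are copied from the three threading logs above.

```python
# Threading runs from the logs above:
# (label, allocated cores, elapsed seconds, reported efficiency in percent)
runs = [
    ("4 threads on 4 cores", 4, 73, 38.01),
    ("20 threads on 4 cores", 4, 64, 38.28),
    ("20 threads on 20 cores", 20, 67, 7.61),
]


def busy_cores(cores, efficiency_pct):
    """Average cores in use, ASSUMING efficiency = CPU time / (cores * elapsed)."""
    return cores * efficiency_pct / 100


for label, cores, elapsed, efficiency in runs:
    busy = busy_cores(cores, efficiency)
    cpu_seconds = busy * elapsed
    print(f"{label}: ~{busy:.2f} cores busy, ~{cpu_seconds:.0f} CPU-seconds")
```

Under that assumption all three threading runs come out at about 1.5 cores busy, whether the job had 4 cores or 20. That would fit with the threads taking turns instead of spreading out, and the drop to 8% would then just be the same amount of work divided by a bigger allocation. It doesn't explain the 0% figures for the 10 second multiprocess runs.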

The conclusion is a simple one though. If you want parallel compute in Python, don't use the threading module. Use the multiprocessing module. It actually does what you wanted in the first place, and it's just as easy to write for.
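To show how little there is to it, here is a minimal sketch using multiprocessing.Pool from the standard library. As before, this is an illustration and not the script from my tests: count_yahtzees and run_in_processes are made-up names, and handing each worker a fixed number of sets (rather than stopping at a target number of Yahtzees) is a simplification.

```python
import multiprocessing
import random


def count_yahtzees(job):
    """Roll a batch of five-dice sets and count the five-of-a-kinds.

    `job` is a (sets, seed) tuple so it can be passed straight to Pool.map.
    """
    sets, seed = job
    rng = random.Random(seed)
    hits = 0
    for _ in range(sets):
        dice = [rng.randint(1, 6) for _ in range(5)]
        if len(set(dice)) == 1:
            hits += 1
    return hits


def run_in_processes(n_procs, sets_per_proc):
    """Give each worker process its own batch and return the per-batch counts."""
    jobs = [(sets_per_proc, seed) for seed in range(n_procs)]
    with multiprocessing.Pool(processes=n_procs) as pool:
        return pool.map(count_yahtzees, jobs)


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this file on startup.
    counts = run_in_processes(4, 50_000)
    print("Yahtzees per process:", counts, "total:", sum(counts))
```

Each worker is a separate Python process with its own interpreter and its own GIL, so the batches can run on different cores at the same time. The catch is that processes don't share variables the way threads do, so results have to be returned (as here) or passed through something like a multiprocessing.Queue.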

Thanks for reading