A Guide to Python Multiprocessing and Parallel Programming

Speeding up calculations is a goal that everyone wants to achieve. What if your script could run ten times faster than it currently does? In this article, we will look at Python multiprocessing and a library called multiprocessing. We will talk about what multiprocessing is, its advantages, and how to improve the running time of your Python programs using parallel programming.

OK, then let’s go!

Introduction to Parallelism

Before we dive into the Python code, we need to talk about parallel computing, which is an important concept in computer science.

Typically, when you run a Python script, your code becomes a process at some point, and the process runs on a single core of your CPU. But modern computers have more than one core, so what if you could use more cores for your calculations? It turns out that your calculations will be faster.
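As a quick check, you can see how many cores your machine exposes with a minimal snippet like the one below, which uses only the standard library:

import multiprocessing
import os

# Both calls report the number of CPUs visible to the Python interpreter.
print("multiprocessing.cpu_count():", multiprocessing.cpu_count())
print("os.cpu_count():", os.cpu_count())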

Let’s take “more cores means faster calculations” as a general principle for now; later in this article, we will see that it is not universally true.

Without getting into too many details, the idea behind parallelism is that you write your code in such a way that it can utilize multiple cores of the CPU.

To make things easier, let’s look at an example.

Parallel and Serial Computing

Imagine that you have a big problem to solve, and you are alone. You need to calculate the square root of eight different numbers. What do you do? Well, you don’t have much choice. You start with the first number, and you calculate the result. Then, you move on with the others.

What if you have three good math friends who are willing to help you? Each of you calculates the square roots of two of the numbers, and your job becomes easier because the workload is distributed equally among you and your friends. This means that your problem gets solved faster.

OK, so is everything clear? In this example, each friend represents one core of the CPU. In the first scenario, the entire task is solved by you sequentially; this is called serial computing. In the second scenario, since the work is split across a total of four “cores”, you are using parallel computing. Parallel computing involves the use of parallel processes, that is, processes divided among multiple cores of a processor.

Models for Parallel Programming

We have established what parallel programming is, but how do we use it? We said earlier that parallel computing involves executing multiple tasks across multiple cores of the processor, which means those tasks are executed simultaneously. There are a few questions you should consider before approaching parallelization. For example, are there other optimizations that could speed up our calculations?

For now, let’s assume that parallelization is the best solution for your problem. There are mainly three models in parallel computing:

  • Embarrassingly parallel. Tasks can be run independently, and they do not need to communicate with each other.
  • Shared memory parallelism. Processes (or threads) need to communicate, so they share a global address space.
  • Message passing. Processes exchange messages when needed.

In this article, we will describe the first model, which is also the simplest.

Python Multiprocessing: Process-Based Parallelism in Python

One way to achieve parallelism in Python is to use the multiprocessing module. The multiprocessing module allows you to create multiple processes, each with its own Python interpreter. For this reason, Python multiprocessing accomplishes process-based parallelism.

You may have heard of other libraries, such as threading, which is also built into Python, but there are significant differences between them: the multiprocessing module creates new processes, while threading creates new threads.
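The two APIs are intentionally similar, as the small side-by-side sketch below shows; the work function is just a placeholder of my own, standing in for any CPU-heavy job:

from multiprocessing import Process
from threading import Thread

def work(label):
    # Placeholder workload; imagine a CPU-heavy computation here.
    print("working in", label)

if __name__ == "__main__":
    # A new process, with its own interpreter and memory space...
    p = Process(target=work, args=("a process",))
    # ...versus a new thread inside the current process.
    t = Thread(target=work, args=("a thread",))
    p.start()
    t.start()
    p.join()
    t.join()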

In the next section, we’ll look at the benefits of using multiprocessing.

Benefits of using multiprocessing

Here are some benefits of multiprocessing:

  • Better CPU utilization when dealing with highly CPU-intensive tasks
  • More control over child processes than over threads
  • Easy to code

The first benefit relates to performance. Since multiprocessing creates new processes, you can make better use of your CPU’s computational power by splitting your tasks among other cores. Most processors nowadays are multi-core processors, and if you optimize your code you can save time by solving calculations in parallel.

The second advantage concerns the alternative to multiprocessing, which is multithreading. Threads, however, are not processes, and this has consequences: killing or interrupting a thread is riskier than doing the same to a normal process. Since the comparison between multiprocessing and multithreading is not within the scope of this article, I encourage you to do some further reading on it.
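For example, a child process can be stopped from the outside with terminate(), something the threading API deliberately does not offer. Here is a minimal sketch, where the endless loop is just a stand-in for a task that never finishes on its own:

from multiprocessing import Process
import time

def spin():
    # Stand-in workload that would never finish on its own.
    while True:
        pass

if __name__ == "__main__":
    p = Process(target=spin)
    p.start()
    time.sleep(1)
    p.terminate()   # forcefully stop the child process
    p.join()
    print("Child exit code:", p.exitcode)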

The third advantage of multiprocessing is that it is fairly easy to implement, given that the task you are trying to handle is well suited to parallel programming.

Getting Started with Python Multiprocessing

We are finally ready to write some Python code!

We’ll start with a very basic example and use it to illustrate the main aspects of Python multiprocessing. In this example, we will have two processes:

  • The parent process. There is only one parent process, and it can have multiple children.
  • The child process. It is spawned by the parent. Each child can also have children of its own.

We are going to use the child process to perform a certain task. In the meantime, the parent can carry on with its own execution.

A Simple Python Multiprocessing Example

We’ll use the following code for this example:

from multiprocessing import Process

def bubble_sort(array):
    # Naive bubble sort: keep sweeping until a full pass makes no swaps.
    check = True
    while check == True:
        check = False
        for i in range(0, len(array) - 1):
            if array[i] > array[i + 1]:
                check = True
                temp = array[i]
                array[i] = array[i + 1]
                array[i + 1] = temp
    print("Array sorted: ", array)

if __name__ == '__main__':
    p = Process(target=bubble_sort, args=([1,9,4,5,2,6,8,4],))
    p.start()
    p.join()

In this snippet, we have defined a function called bubble_sort(array). This function is a really naive implementation of the bubble sort algorithm. If you don’t know what that is, don’t worry, because it’s not that important. The key thing to know is that it is a function that does some work.

The Process Class

From multiprocessing, we import the class Process. This class represents an activity that will be run in a separate process. In fact, you can see that we’ve passed it some arguments:

  • target=bubble_sort, which means our new process will run the bubble_sort function
  • args=([1,9,4,5,2,6,8,4],), which is the array passed as an argument to the target function

Once we have created an instance of the Process class, all we need to do is start the process. This is done by writing p.start(). At this point, the process starts running.

Before we exit, we have to wait for the child process to finish its computation. The join() method waits for the process to terminate.
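If you don’t want the parent to block indefinitely, join() also accepts a timeout, and is_alive() lets you check on the child. A minimal sketch, with a sleeping worker as a placeholder for a long computation:

from multiprocessing import Process
import time

def slow_task():
    time.sleep(5)   # placeholder for a long computation

if __name__ == "__main__":
    p = Process(target=slow_task)
    p.start()
    p.join(timeout=1)                     # wait at most one second
    print("Still running?", p.is_alive())
    p.join()                              # now wait until it actually finishes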

In this example, we have created only one child process. As you can guess, we can create more child processes by creating more instances of the Process class.
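For instance, here is a minimal sketch that sorts several arrays at the same time, one child process per array; it assumes the bubble_sort function from the example above is defined in the same file, and the sample arrays are made up for illustration:

from multiprocessing import Process

if __name__ == "__main__":
    arrays = [[1, 9, 4, 5, 2], [3, 7, 0, 8], [6, 2, 5, 1, 4]]
    processes = [Process(target=bubble_sort, args=(a,)) for a in arrays]
    for p in processes:
        p.start()    # launch every child process
    for p in processes:
        p.join()     # then wait for all of them to finish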

The Pool Class

What if we need to create many processes to handle more CPU-intensive tasks? Do we always need to start them one by one and explicitly wait for each to terminate? The solution here is to use the Pool class.

The Pool class allows you to create a pool of worker processes, and in the following example, we’ll see how we can use it. This is our new example:

from multiprocessing import Pool
import math

N = 5000000

def cube(x):
    # Despite its name, this function returns the square root of x.
    return math.sqrt(x)

if __name__ == "__main__":
    with Pool() as pool:
      result = pool.map(cube, range(10,N))
    print("Program finished!")

In this code snippet, we have a cube(x) function that simply takes an integer and, despite its name, returns its square root. Easy, isn’t it?

Then, we create an instance of the Pool class, without specifying any attributes. The Pool class creates one worker process per CPU core by default. Next, we call the map method with some arguments.

The map method applies the cube function to every element of the iterable we provide – which, in this case, is the sequence of numbers from 10 to N.

The big advantage is that the iteration over the list is executed in parallel!
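If you want more control, Pool also lets you set the number of worker processes explicitly. A minimal sketch, where the worker count of 4 and the smaller range are chosen arbitrarily:

from multiprocessing import Pool
import math

def cube(x):
    # Same placeholder workload as above: the square root of x.
    return math.sqrt(x)

if __name__ == "__main__":
    with Pool(processes=4) as pool:       # use exactly four worker processes
        result = pool.map(cube, range(10, 100))
    print(result[:5])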

Making the Best Use of Python Multiprocessing

Creating multiple processes and doing computations in parallel is not necessarily faster than serial computing. For tasks that are not very CPU-intensive, serial computation can be faster than parallel computation. For this reason, it’s important to understand when you should use multiprocessing – it depends on what you’re doing.

To explain this to you, let’s look at a simple example:

from multiprocessing import Pool
import time
import math

N = 5000000

def cube(x):
    # Despite its name, this returns the square root of x.
    return math.sqrt(x)

if __name__ == "__main__":
    
    start_time = time.perf_counter()
    with Pool() as pool:
      result = pool.map(cube, range(10,N))
    finish_time = time.perf_counter()
    print("Program finished in  seconds - using multiprocessing".format(finish_time-start_time))
    print("---")
    
    start_time = time.perf_counter()
    result = []
    for x in range(10,N):
      result.append(cube(x))
    finish_time = time.perf_counter()
    print("Program finished in  seconds".format(finish_time-start_time))

This snippet is based on the previous example. We are solving the same problem, computing the square root of N numbers, but in two ways: the first uses Python multiprocessing, while the second does not. We are using the perf_counter() method from the time library to measure the execution time.

On my laptop, I get this result:

> python code.py
Program finished in 1.6385094 seconds - using multiprocessing
---
Program finished in 2.7373942999999996 seconds

As you can see, there is a difference of more than a second. So in this case, multiprocessing is better.

Let’s change something in the code, namely the value of N. Let’s lower it to N=10000 and see what happens.

This is what I get now:

> python code.py
Program finished in 0.3756742 seconds - using multiprocessing
---
Program finished in 0.005098400000000003 seconds

What happened? Looks like multiprocessing is a bad choice now. Why?

The overhead introduced by splitting the computation among processes is much higher than the time saved by solving such a small task in parallel. You can see how big the difference is in terms of timing performance.
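When each individual task is tiny but there are still many of them, one common mitigation is to hand each worker larger batches of work through map’s chunksize argument. A minimal sketch, with the chunk size of 10000 picked arbitrarily:

from multiprocessing import Pool
import math

def cube(x):
    return math.sqrt(x)

if __name__ == "__main__":
    with Pool() as pool:
        # Larger chunks mean fewer round trips between the parent and the workers.
        result = pool.map(cube, range(10, 5000000), chunksize=10000)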

Conclusion

In this article, we have talked about performance optimization of Python code using Python multiprocessing.

First, we briefly introduced what parallel computing is and the main models for using it. Then, we started talking about multiprocessing and its advantages. Finally, we saw that parallelizing computations is not always the best option, and that the multiprocessing module should be reserved for CPU-bound tasks. As always, it is a matter of considering the specific problem you are facing and evaluating the pros and cons of the different solutions.

I hope you found learning about Python multiprocessing as useful as I did.
