Speeding up calculations is a goal that everyone wants to achieve. What if you have a script that can run ten times faster than its current running time? In this article, we will look at Python multiprocessing and a library called
multiprocessing, We will talk about what is multiprocessing, its advantages, and how to improve the running time of your Python program using parallel programming.
OK, then let’s go!
Introduction to Parallelism
Before we dive into the Python code, we need to talk about parallel computingWhich is an important concept in computer science.
Typically, when you run a Python script, your code becomes a process at some point, and the process runs on a single core of your CPU. But modern computers have more than one core, so what if you could use more cores for your calculations? It turns out that your calculations will be faster.
Let’s take this as a general principle for now, but later, in this article, we will see that it is not universally true.
Without getting into too many details, the idea behind parallelism is that you write your code in such a way that it can utilize multiple cores of the CPU.
To make things easier, let’s look at an example.
Parallel and Serial Computing
Imagine that you have a big problem to solve, and you are alone. You need to calculate the square root of eight different numbers. What do you do? Well, you don’t have much choice. You start with the first number, and you calculate the result. Then, you move on with the others.
What if you have three good math friends who are willing to help you? Each of them will calculate the square root of the two numbers, and your job becomes easier as the workload is distributed equally among your friends. This means that your problem will be resolved faster.
OK, so everything is clear? In these examples, each friend represents one core of the CPU. In the first example, the entire task is solved by you sequentially. it is called serial computing, In the second example, since you are working with a total of four cores, you are using parallel computing, Parallel computing involves the use of parallel processes or processes that are divided among multiple cores in a processor.
Models for Parallel Programming
We have established what parallel programming is, but how do we use it? Well, we said earlier that parallel computing involves the execution of multiple tasks between multiple cores of the processor, which means those tasks are executed simultaneously. There are a few questions you should consider before approaching parallelization. For example, are there any other optimizations that can speed up our calculations?
For now, let’s assume that parallelization is the best solution for you. there are mainly three model In parallel computing:
- exactly parallel, Tasks can be run independently, and they do not need to communicate with each other.
- shared memory parity, Processes (or threads) need to communicate, so they share a global address space.
- Messaging, Processes need to share messages when needed.
In this article, we will describe the first model, which is also the simplest.
Python Multiprocessing: Process-Based Parallelism in Python
One way to achieve equality in Python is to use multiprocessing module,
multiprocessing The module allows you to create multiple processes, each with its own Python interpreter. For this reason, Python multiprocessing accomplishes process-based parallelism.
You must have heard of other libraries, such as
threadingwhich also comes with Python built-in, but there are significant differences between them.
multiprocessing module creates new procedures, while
threading Creates new threads.
In the next section, we’ll look at the benefits of using multiprocessing.
Benefits of using multiprocessing
Here are some benefits of multiprocessing:
- Better CPU utilization when dealing with high CPU-intensive tasks
- More control over the child than the thread
- easy to code
The first benefit relates to performance. Since multiprocessing creates new processes, you can make better use of your CPU’s computational power by splitting your tasks among other cores. Most processors nowadays are multi-core processors, and if you optimize your code you can save time by solving calculations in parallel.
The second advantage sees the option of multiprocessing, which is multithreading. However threads are not processes, and have consequences. If you create a thread, it is also dangerous to kill or interrupt it as you would with a normal process. Since the comparison between multiprocessing and multithreading is not within the scope of this article, I encourage you to do some more reading on it.
The third advantage of multiprocessing is that it is fairly easy to implement, given that the task you are trying to handle is well suited to parallel programming.
Getting Started with Python Multiprocessing
We are finally ready to write some Python code!
We’ll start with a very basic example and use it to illustrate the main aspects of Python multiprocessing. In this example, we would have two processes:
parentprocess. There is only one parent process, which can have multiple children.
childprocess. It is bred by the parents. Each child can also have new children.
we are going to use
child The process of performing a certain task. thus,
parent can go on with its execution.
A Simple Python Multiprocessing Example
We’ll use the code for this example:
from multiprocessing import Process def bubble_sort(array): check = True while check == True: check = False for i in range(0, len(array)-1): if array[i] > array[i+1]: check = True temp = array[i] array[i] = array[i+1] array[i+1] = temp print("Array sorted: ", array) if __name__ == '__main__': p = Process(target=bubble_sort, args=([1,9,4,5,2,6,8,4],)) p.start() p.join()
In this snippet, we have defined a function called
bubble_sort(array), This function is a really naive implementation of the bubble sort sorting algorithm. If you don’t know what it is, don’t worry, because it’s not that important. The important thing to know is that it is a function that takes some work.
multiprocessingwe import class
Process, This class represents an activity that will be run in a separate process. In fact, you can see that we’ve passed some arguments:
target=bubble_sortwhich means our new process will run
args=([1,9,4,52,6,8,4],)which is the array passed as an argument to the target function
Once we have created an instance for the Process class, all we need to do is start the process. This is done by writing
p.start(), At this point, the process is started.
Before we exit, we have to wait for the child process to finish its computation.
join() The method waits for the process to finish.
In this example, we have only created one child process. As you can guess, we can create more child process by creating more instances in it
What if we need to create multiple processes to handle more CPU-intensive tasks? Do we always need to start over and explicitly wait for termination? The solution here is to use
Pool The class allows you to create a pool of worker processes, and in the following example, we’ll see how we can use it. This is our new example:
from multiprocessing import Pool import time import math N = 5000000 def cube(x): return math.sqrt(x) if __name__ == "__main__": with Pool() as pool: result = pool.map(cube, range(10,N)) print("Program finished!")
In this code snippet, we have a
cube(x) Function that simply takes an integer and returns its square root. easy, isn’t it?
Then, we create an instance of
Pool class, without specifying any attributes. The Pool class creates one process per CPU core by default. Next, we run
map method with some arguments.
map the law applies
cube function for each element of the iterable we supply – which, in this case, is a list of each number
Its biggest advantage is that the enumeration in the list is done in parallel!
Making the Best Use of Python Multiprocessing
It is not necessary to create multiple processes and do parallel computation as compared to serial computing. For less CPU-intensive tasks, serial computation is faster than parallel computation. For this reason, it’s important to understand when you should be using multiprocessing – depending on what you’re doing.
To explain this to you, let’s look at a simple example:
from multiprocessing import Pool import time import math N = 5000000 def cube(x): return math.sqrt(x) if __name__ == "__main__": start_time = time.perf_counter() with Pool() as pool: result = pool.map(cube, range(10,N)) finish_time = time.perf_counter() print("Program finished in seconds - using multiprocessing".format(finish_time-start_time)) print("---") start_time = time.perf_counter() result =  for x in range(10,N): result.append(cube(x)) finish_time = time.perf_counter() print("Program finished in seconds".format(finish_time-start_time))
This snippet is based on the previous example. We are solving the same problem which is computing the square root of
N Numbers, but in two ways. The first one involves the use of Python multiprocessing, while the second does not. we are using
perf_counter() method from
time Library for measuring time performance.
On my laptop, I get this result:
> python code.py Program finished in 1.6385094 seconds - using multiprocessing --- Program finished in 2.7373942999999996 seconds
As you can see, there is a difference of more than a second. So in this case, multiprocessing is better.
Let’s change something in the code, like the value of
N, let’s down it
N=10000 And see what happens.
This is what I get now:
> python code.py Program finished in 0.3756742 seconds - using multiprocessing --- Program finished in 0.005098400000000003 seconds
What happened? Looks like multiprocessing is a bad choice now. Why?
The overhead introduced by splitting the computations between processes is much higher than the task solved. You can see how much of a difference there is in terms of timing performance.
In this article, we have talked about performance optimization of Python code using Python multiprocessing.
First, we briefly introduce what parallel computing is and the main models for its use. Then, we started talking about multiprocessing and its advantages. Finally, we saw that parallelizing computations is not always the best option and
multiprocessing Modules should be used to parallelize CPU-bound tasks. As always, it is a matter of considering the specific problem you are facing and evaluating the pros and cons of the different solutions.
I hope you found learning about Python multiprocessing as useful as I did.