Python Multiprocessing: Process, Queue, and Lock
The multiprocessing module in Python provides several key components for working with processes:
- Process class: This is the main way to create and manage processes. Each Process represents a separate task that can run independently. You define what the process does by passing a target function to the constructor. The start() method launches the process, and join() waits for it to finish.
- Queue class: Queues allow processes to communicate by passing messages back and forth. A Queue is like a pipe with two ends. Processes can put items into one end and get items out of the other end. This is useful for sending data between processes or coordinating their actions.
- Lock class: Locks are used to control access to shared resources. If multiple processes try to modify the same data at the same time, it can cause errors. A Lock acts like a traffic light, allowing only one process to access the resource at a time. The acquire() method grabs the lock, and release() frees it up for another process.
Example showing how to use Process and Queue together:
Python
from multiprocessing import Process, Queue

def worker(q):
    result = 0
    while True:
        num = q.get()
        if num == 'stop':
            break
        result += num
    q.put(result)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    for i in range(5):
        q.put(i)
    q.put('stop')
    result = q.get()  # read the result before joining, so the worker can flush the queue
    p.join()
    print(result)

Output
10
In this code, the main process creates a Queue q and a worker process p. The worker runs in a loop, taking numbers from the queue and adding them up; when it sees the 'stop' sentinel, it puts the total back on the queue and exits. The main process sends the numbers, sends the sentinel, reads back the final result, and then joins the worker.
Python multiprocessing Process class
The Process class is the core of the multiprocessing module. It represents a separate process that can run independently from the main program. Let's see how to use it effectively:
Creating a Process
To create a new process, you instantiate the Process class and pass it a target function to run. This function is what the process will execute when it starts. You can also specify arguments to pass to the function using the args parameter.
Python
import multiprocessing

def worker(num):
    print(f'Worker process {num} starting')
    # do some work here
    print(f'Worker process {num} finishing')

if __name__ == '__main__':
    processes = []
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        processes.append(p)
        p.start()

Output (the exact interleaving of lines may vary between runs, since the processes run concurrently)
Worker process 0 starting
Worker process 0 finishing
Worker process 1 starting
Worker process 1 finishing
Worker process 2 starting
Worker process 2 finishing
Worker process 3 starting
Worker process 3 finishing
Worker process 4 starting
Worker process 4 finishing
Starting a Process
Once you've created a Process instance, you start it by calling the start() method. This launches the process and runs the target function in a separate memory space. The start() method returns immediately, allowing the main program to continue running while the process executes in the background.
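As a minimal sketch of this behavior (the slow_task function here is just a stand-in for real work):
import time
from multiprocessing import Process

def slow_task():
    time.sleep(2)  # simulate a long-running job
    print('task done')

if __name__ == '__main__':
    p = Process(target=slow_task)
    p.start()  # returns immediately
    print('main keeps running while the task executes')
    p.join()   # wait for the child to finish before exiting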
Waiting for a Process
If you need to wait for a process to finish before continuing, you can use the join() method. This blocks until the process terminates, either normally or because of an exception. It's important to call join() if you need to ensure that the process has completed before using any results it produced.
for p in processes:
    p.join()
Terminating a Process
If you need to stop a process before it finishes, you can use the terminate() method. This forcibly kills the process; note that any child processes it spawned are not terminated and will simply be orphaned. It's generally better to use a graceful mechanism, such as sending the process a message or setting a shared event asking it to exit cleanly, as sketched below.
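Here is one common graceful-shutdown pattern, sketched with a multiprocessing.Event; the worker body and timing values are illustrative:
import time
from multiprocessing import Process, Event

def worker(stop_event):
    while not stop_event.is_set():  # keep working until asked to stop
        time.sleep(0.1)             # placeholder for real work
    print('worker exiting cleanly')

if __name__ == '__main__':
    stop_event = Event()
    p = Process(target=worker, args=(stop_event,))
    p.start()
    time.sleep(1)
    stop_event.set()  # request a clean exit
    p.join()
    # by contrast, p.terminate() would kill the process immediately,
    # skipping any cleanup code it might have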
Process Pool
If you have a large number of independent tasks to run, you can use a Pool to manage a fixed number of worker processes. The Pool class takes care of creating the processes, distributing the tasks, and collecting the results. You can use the map() method to apply a function to a list of arguments in parallel.
Python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        results = pool.map(square, [1, 2, 3, 4, 5])
        print(results)

Output
[1, 4, 9, 16, 25]
The Process class provides a simple and flexible way to run multiple tasks concurrently in Python. By using separate memory spaces, processes avoid the issues of shared state that can plague multithreaded programs. However, communication between processes is more limited and requires explicit mechanisms like queues or pipes.
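As a brief sketch of the pipe option, multiprocessing.Pipe() returns two connected endpoints that a parent and child can use to exchange picklable objects:
from multiprocessing import Process, Pipe

def child(conn):
    conn.send('hello from the child')  # send an object through the pipe
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()   # two connected endpoints
    p = Process(target=child, args=(child_conn,))
    p.start()
    print(parent_conn.recv())          # receive what the child sent
    p.join()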
Python multiprocessing Queue class
The Queue class in Python's multiprocessing module provides a safe and efficient way for processes to exchange data. A Queue is a first-in, first-out (FIFO) data structure that allows multiple producers to put items in one end and multiple consumers to get items from the other end. Here's what you need to know to use the Queue class effectively:
Creating a Queue
To create a queue, you simply instantiate the Queue class. You can optionally specify a maximum size for the queue using the maxsize parameter. If maxsize is 0 or negative, the queue size is infinite.
from multiprocessing import Queue

q = Queue()            # unbounded
q = Queue(maxsize=10)  # holds at most 10 items
Putting items
To put an item into the queue, you use the put() method. This adds the item to the end of the queue. If the queue is full (i.e., it has reached its maximum size), put() will block until a slot becomes available. You can use the optional block and timeout parameters to control this behavior.
q.put(42)
q.put('hello')
q.put([1, 2, 3])
Getting items
To get an item from the queue, you use the get() method. This removes and returns the item at the front of the queue. If the queue is empty, get() will block until an item is available. Again, you can use the block and timeout parameters to control this behavior.
item = q.get()
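To illustrate the block and timeout parameters on put() and get(), here is a small sketch; note that the Full and Empty exceptions come from the standard queue module:
import queue
from multiprocessing import Queue

q = Queue(maxsize=1)
q.put('first')
try:
    q.put('second', block=False)  # the queue is full, so this raises immediately
except queue.Full:
    print('queue is full')

q.get()                           # drain the queue
try:
    item = q.get(timeout=0.5)     # wait up to half a second for an item
except queue.Empty:
    print('queue is empty')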
Checking queue size
You can check the current size of the queue using the qsize() method. This returns the approximate number of items in the queue. However, because of the asynchronous nature of multiprocessing, this number may not be exact. Note also that qsize() raises NotImplementedError on platforms such as macOS where the underlying sem_getvalue() is not implemented.
size = q.qsize()
Closing a queue
When you're done with a queue, you can close it using the close() method.
This indicates that the current process will put no more data on the queue; the background feeder thread exits once it has flushed any buffered items. A process that tries to put items into a closed queue will get a ValueError, but items already in the queue can still be retrieved until it is empty.
q.close()
Here is an example that shows how to use a Queue to communicate between two processes:
from multiprocessing import Process, Queue

def worker(q):
    while True:
        item = q.get()
        if item is None:
            break
        print(f'Processing {item}')

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    for i in range(5):
        q.put(i)
    q.put(None)
    p.join()
In this example, the main process creates a Queue and a worker process. The worker runs in a loop, taking items from the queue and processing them. The main process puts some items into the queue, and then puts a None item to signal the worker to exit. Finally, it waits for the worker to finish using join().
The Queue class makes it easy to safely pass data between processes in Python. By using a queue, you can avoid many of the synchronization issues that arise when processes directly share memory. However, queues do have some overhead, so they may not be suitable for extremely high-throughput scenarios.
Python multiprocessing Lock Class
The Lock class in Python's multiprocessing module is used to synchronize access to shared resources across multiple processes. A Lock acts like a traffic light, allowing only one process to access the resource at a time. This prevents race conditions, where two or more processes try to modify the same data simultaneously, leading to unpredictable results.
Here's what you need to know to use the Lock class effectively:
Creating a Lock
To create a lock, you simply instantiate the Lock class. The lock is initially unlocked.
from multiprocessing import Lock
lock = Lock()
Acquiring a Lock
Before accessing a shared resource, a process must acquire the lock using the acquire() method. If the lock is already held by another process, acquire() will block until it becomes available. You can use the optional blocking and timeout parameters to control this behavior.
lock.acquire()
# access shared resource here
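For instance, a time-limited acquire lets a process give up instead of waiting indefinitely; a short sketch:
from multiprocessing import Lock

lock = Lock()
if lock.acquire(timeout=1):  # give up after one second
    try:
        pass  # access the shared resource here
    finally:
        lock.release()
else:
    print('could not acquire the lock in time')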
Releasing a Lock
After a process has finished accessing the shared resource, it must release the lock using the release() method. This allows other processes to acquire the lock and access the resource.
# access shared resource here
lock.release()
Context Manager
You can use a Lock as a context manager with the with statement. This automatically acquires the lock before entering the block and releases it after the block is finished, even if an exception is raised.
with lock:
    # access shared resource here
Here is an example that shows how to use a Lock to synchronize access to a shared variable:
from multiprocessing import Process, Lock, Value

def worker(lock, count):
    for i in range(1000):
        with lock:
            count.value += 1

if __name__ == '__main__':
    lock = Lock()
    count = Value('i', 0)
    processes = []
    for i in range(4):
        p = Process(target=worker, args=(lock, count))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    print(f'Final count: {count.value}')
In this example, we create a shared integer variable count using the Value class. We also create a Lock and pass it to four worker processes. Each worker increments the count 1000 times, but they use the lock to ensure that only one process can access count at a time. Without the lock, the final value of count would be unpredictable due to race conditions.
The Lock class is a fundamental tool for synchronizing access to shared resources in Python multiprocessing. By using locks judiciously, you can prevent many common concurrency bugs and ensure that your parallel programs are correct and efficient. However, locks can also introduce performance bottlenecks if overused, so it's important to strike a balance.
Python multiprocessing example
In this example, we'll create a program that uses multiple processes to count the occurrences of each word in a set of text files.
Python
import multiprocessing
import string
import sys
from collections import Counter

def count_words(filename):
    # read the file, normalize case, strip punctuation, and count the words
    with open(filename, 'r') as file:
        text = file.read().lower()
    words = text.translate(str.maketrans('', '', string.punctuation)).split()
    return Counter(words)

def merge_counters(counters):
    result = Counter()
    for counter in counters:
        result += counter
    return result

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print(f'Usage: {sys.argv[0]} file1 [file2 ...]')
        sys.exit(1)
    filenames = sys.argv[1:]
    num_processes = min(len(filenames), multiprocessing.cpu_count())
    with multiprocessing.Pool(processes=num_processes) as pool:
        counters = pool.map(count_words, filenames)
    final_count = merge_counters(counters)
    for word, count in final_count.most_common(10):
        print(f'{word}: {count}')

Let's see what's happening in this code:
- We define a count_words function that takes a filename. It reads the text from the file, converts it to lowercase, removes punctuation, splits it into words, and returns a Counter of the word counts.
- We define a merge_counters function that takes a list of Counter objects and merges them into a single Counter.
- In the main block, we first check that the user has provided at least one filename as a command-line argument. If not, we print a usage message and exit.
- We create a Pool of worker processes, using at most one process per input file and no more than the number of CPU cores.
- We use pool.map() to run count_words on each filename in parallel. map() distributes the calls across the worker processes and returns the resulting Counter objects in the same order as the input filenames.
- The with statement closes the pool and waits for all worker processes to finish when the block exits.
- We merge the per-file Counter objects into a single Counter using merge_counters.
- Finally, we print out the 10 most common words and their counts using Counter.most_common().
This example demonstrates how to use multiprocessing to parallelize a task across multiple files. By giving each worker its own file and merging the per-file counts afterwards, we avoid sharing mutable state between processes entirely. When processes genuinely do need to share an object, the Manager class provides managed versions of common types (such as dict and list) that can be safely accessed from multiple processes.
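As a hedged sketch of that approach, a managed dict can hold counts that several processes update. Since a read-modify-write on a managed dict is not atomic, the sketch guards it with a manager lock (the word list is illustrative):
from multiprocessing import Manager, Process

def tally(shared, lock, word):
    # guard the read-modify-write, which is not atomic across processes
    with lock:
        shared[word] = shared.get(word, 0) + 1

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.dict()
        lock = manager.Lock()
        words = ['spam', 'eggs', 'spam']
        processes = [Process(target=tally, args=(shared, lock, word))
                     for word in words]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        print(dict(shared))  # {'spam': 2, 'eggs': 1}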
Python multiprocessing Pool
The Pool class in Python's multiprocessing module provides a convenient way to parallelize the execution of a function across multiple input values. A pool of worker processes is created, and the task is split among them. The Pool class takes care of the details of creating the processes, distributing the work, and collecting the results.
Here's what you need to know to use the Pool class effectively:
Creating a Pool
To create a pool of worker processes, you instantiate the Pool class. You can optionally specify the number of processes to create using the processes parameter. If omitted, it defaults to the number of CPU cores.
from multiprocessing import Pool
pool = Pool(processes=4)
Applying a function
The most common way to use a Pool is to apply a function to a list of arguments using the map() method. This distributes the function calls across the worker processes and returns a list of the results in the same order as the input arguments.
Python
def square(x):
    return x * x

numbers = [1, 2, 3, 4, 5]
results = pool.map(square, numbers)
print(results)

Output
[1, 4, 9, 16, 25]
Asynchronous execution
If you don't need to wait for the results immediately, you can use the apply_async() or map_async() methods to execute the function asynchronously. These return an AsyncResult object that you can use to retrieve the result later.
Python
result = pool.apply_async(square, (10,))
print(result.get())

Output
100
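map_async() works the same way for a whole list of inputs; here is a short sketch reusing the square function from above:
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        async_result = pool.map_async(square, [1, 2, 3, 4, 5])
        # do other work here while the pool computes
        print(async_result.get())  # [1, 4, 9, 16, 25]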
Callback functions
You can specify a callback function to be called whenever a worker process finishes a task. This can be useful for handling the results as they become available, rather than waiting for all tasks to complete.
def print_result(result):
    print(f'Result: {result}')

pool.apply_async(square, (10,), callback=print_result)
Closing the Pool
When you're done with the Pool, you should close it using the close() method. This prevents any more tasks from being submitted to the pool. You can then use the join() method to wait for all worker processes to finish.
pool.close()
pool.join()
Here's a complete example that demonstrates how to use the Pool class to parallelize a computationally intensive function:
import multiprocessing
import time

def factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

if __name__ == '__main__':
    start_time = time.time()
    numbers = [100000, 200000, 300000, 400000, 500000]
    pool = multiprocessing.Pool(processes=4)
    results = pool.map(factorial, numbers)
    pool.close()
    pool.join()
    end_time = time.time()
    print(f'Results: {results}')
    print(f'Execution time: {end_time - start_time:.2f} seconds')
In this example, we use a Pool of 4 worker processes to calculate the factorials of several large numbers in parallel. The map() method distributes the factorial function across the worker processes, which speeds up the overall computation. We measure the execution time to demonstrate the performance benefit of using multiprocessing.

The Pool class is a high-level abstraction that makes it easy to parallelize function execution in Python. By using a Pool, you can take advantage of multiple CPU cores and significantly speed up CPU-bound tasks. However, it's important to choose the right number of processes based on the available hardware resources and the nature of the task.
Frequently Asked Questions
What's the difference between multiprocessing and multithreading in Python?
Multiprocessing runs separate processes with independent memory, while multithreading runs threads within a process, sharing memory.
How can I share data between processes?
Use multiprocessing queues, pipes, or managers to safely pass data between processes, or shared memory with synchronization.
What are some best practices for using multiprocessing?
Use for CPU-bound tasks, limit processes based on cores, avoid shared state, use pools for parallel processing.
Conclusion
In this article, we've learned about the Python multiprocessing module & how it can be used to write parallel programs that take advantage of multiple CPU cores. Multiprocessing is a powerful tool for speeding up CPU-bound tasks in Python, but it requires careful design & synchronization to avoid common concurrency pitfalls.