Table of Contents
- 1. Introduction
- 2. Comparison of Multithreading and Multiprocessing
- 3. Basic Concepts of Threads and Processes
- 4. Implementing Multithreading in Python
- 5. Multithreading Use Cases
- 6. Cautions and Best Practices for Using Multithreading
- 7. Comparison of Multithreading and Multiprocessing
- 8. Summary and FAQ
  - 8.1 Key takeaways
  - 8.2 FAQ (Frequently Asked Questions)
    - 8.2.1 Q1: When using multithreading, how many threads should I use?
    - 8.2.2 Q2: Is there a way to completely avoid the constraints of the GIL?
    - 8.2.3 Q3: How do multithreading and asynchronous programming (asyncio) differ?
    - 8.2.4 Q4: What are the benefits of using a thread pool in Python?
    - 8.2.5 Q5: Does using multithreading increase memory consumption?
  - 8.3 Conclusion
1. Introduction
Python is a programming language used by a wide range of users, from beginners to advanced developers, thanks to its simple, easy-to-use syntax and extensive libraries. Among its features, multithreading is an important technique that can dramatically improve processing efficiency in certain situations.
Why use multithreading in Python
As hardware has become more capable, programs are expected to handle more data at once and to do so faster. Multithreading can be especially effective in the following scenarios:
- Processing large amounts of data: When retrieving data from databases or handling a large number of files, running the operations concurrently can reduce total processing time.
- Improving I/O efficiency: In programs with heavy I/O, such as file reads/writes or network communication, you can minimize waiting time.
- Real-time requirements: In game or user interface programming, where multiple tasks must make progress at the same time, multithreading helps keep the program responsive.
Benefits and challenges of multithreading
Advantages
- Increased processing speed: When tasks spend part of their time waiting, running them in concurrent threads lets those waits overlap, which shortens the total run time.
- Effective use of resources: While some threads are blocked (for example, waiting on I/O), others can use the CPU.
Challenges
- Global Interpreter Lock (GIL) limitations: In Python, the GIL limits the benefit of multithreading for CPU-bound work.
- Debugging complexity: Issues like race conditions and deadlocks are more likely to occur, which can make debugging time-consuming.
Purpose of this article
This article explains the basic concepts and concrete methods for implementing multithreading in Python. It also includes practical examples and points to watch for, so you can learn how to apply these techniques in real-world work. The material is presented step by step to be easy to understand for beginners to intermediate users, so please read through to the end.
2. Comparison of Multithreading and Multiprocessing
In programming, both multithreading and multiprocessing are important techniques for achieving parallel processing, but each has different characteristics and use cases. This section explains their differences and how to choose between them in Python.
Basic differences between threads and processes
What is a thread
A thread is a unit of parallel execution within a single process. Because threads share the same memory space, data exchange is fast.
- Characteristics:
  - Share the same memory space
  - Lightweight and fast to start
  - Easy data sharing
What is a process
A process is an execution unit with its own independent memory space. Because each process has its own resources, they are less likely to affect each other.
- Characteristics:
  - Has an independent memory space
  - More heavyweight and slower to start
  - Additional mechanisms are required for data sharing
Impact of the GIL (Global Interpreter Lock) in Python
CPython, the standard Python implementation, has a constraint called the GIL (Global Interpreter Lock). This lock allows only one thread to execute Python bytecode at a time. Because of the GIL, multithreaded Python code cannot fully use a multicore CPU for CPU-bound work.
- Cases that are more affected by the GIL:
  - CPU-intensive computations (e.g., numerical calculations or image processing written in pure Python)
- Cases that are less affected by the GIL:
  - I/O-bound workloads (e.g., network communication, file operations), because a thread releases the GIL while it waits for I/O
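The effect on CPU-bound code can be observed with a small timing experiment. The sketch below is illustrative only: `count_down` and the workload size `N` are names chosen for this example, and the measured times depend on the machine and the Python build. On a standard GIL-enabled CPython build, the two-thread run typically takes about as long as the sequential run rather than half the time.

```python
import threading
import time

def count_down(n):
    # Pure-Python loop: CPU-bound, so the thread needs the GIL to make progress
    while n > 0:
        n -= 1

N = 2_000_000  # hypothetical workload size, chosen only for this demo

# Run the work twice, one call after the other
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same two calls in two threads
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"Sequential: {sequential:.2f}s, two threads: {threaded:.2f}s")
```

If `count_down` were replaced by a function that mostly waits (for example, `time.sleep`), the threaded run would finish in roughly the time of a single call, which is the I/O-bound case described above.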
Choosing between multithreading and multiprocessing
When to choose multithreading
- Use cases:
  - Programs with a lot of I/O operations
  - When you need to run lightweight tasks in parallel
- Examples: Web scraping, concurrent file downloads
When to choose multiprocessing
- Use cases:
  - CPU-intensive computations
  - When you want to avoid the GIL’s limitations
- Examples: Training machine learning models, image processing
Simple comparison examples in Python
Below are simple Python code examples that use the threading module and the multiprocessing module to demonstrate basic parallel processing.
Multithreading example
import threading
import time

def task(name):
    print(f"{name} started")
    time.sleep(2)  # Simulate waiting (e.g., for I/O)
    print(f"{name} finished")

threads = []
for i in range(3):
    thread = threading.Thread(target=task, args=(f"Thread {i+1}",))
    threads.append(thread)
    thread.start()

# Wait for every thread to finish
for thread in threads:
    thread.join()

print("All threads finished")

Multiprocessing example
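from multiprocessing import Process
import time

def task(name):
    print(f"{name} started")
    time.sleep(2)
    print(f"{name} finished")

# The guard is needed on platforms that start child processes by
# re-importing this module (e.g., Windows and macOS).
if __name__ == "__main__":
    processes = []
    for i in range(3):
        process = Process(target=task, args=(f"Process {i+1}",))
        processes.append(process)
        process.start()

    # Wait for every process to finish
    for process in processes:
        process.join()

    print("All processes finished")

Conclusion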
Both multithreading and multiprocessing have their appropriate uses. When implementing parallel processing in Python, it’s important to consider the nature of your program and the effects of the GIL, and choose the best approach accordingly.
3. Basic Concepts of Threads and Processes
To correctly understand and make use of multithreading and multiprocessing, it’s important to know their basic mechanisms and characteristics. This section explains how threads and processes operate and in which situations each is appropriate.
Basic Concepts of Threads
Role of Threads
A thread refers to an independent flow of execution within a process. Multiple threads within the same process share the memory space, which allows smooth data sharing and communication.
- Characteristics:
  - A lightweight unit that runs within a process.
  - Because they share memory space, data exchange is fast.
  - Access to shared data must be synchronized to avoid race conditions.
Advantages and Challenges of Threads
- Advantages:
  - High memory efficiency.
  - Lightweight, with fast startup and switching.
- Challenges:
  - There is a risk of data races and deadlocks on shared data.
  - In Python, threads are affected by the GIL, so they are not suitable for CPU-bound tasks.
Basic Concepts of Processes
Role of Processes
A process is an independent execution environment allocated by the operating system. Each process has its own memory space, so one process cannot directly modify another process’s data.
- Characteristics:
  - Uses a completely independent memory space.
  - Strong isolation and stability: a crash in one process does not corrupt the memory of another.
  - Exchanging data requires inter-process communication (IPC), which adds some complexity.
Advantages and Challenges of Processes
- Advantages:
  - Each process has its own interpreter and its own GIL, so processes are ideal for CPU-bound workloads.
  - Because processes are independent, they offer higher stability.
- Challenges:
  - Starting and switching processes incurs overhead.
  - Increases memory usage.
Comparison of Threads and Processes
| Feature | Thread | Process |
|---|---|---|
| Memory Space | Share the same memory space | Independent memory space |
| Overhead | Low (lightweight) | High (heavyweight) |
| Startup Speed | Fast | Slower |
| Data Sharing | Easy | Requires IPC (inter-process communication) |
| Impact of the GIL | Affected | Not affected |
| Use Cases | I/O-bound tasks | CPU-intensive computations |
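The "Data Sharing" row can be illustrated for threads with a short sketch. The names `results` and `worker` are made up for this example. Each thread writes to its own key of one dictionary that lives in the process’s shared memory, and the main thread reads the combined result after joining.

```python
import threading

results = {}  # a single dict, visible to every thread in this process

def worker(key):
    # Each thread writes to a different key, so no lock is used in this sketch
    results[key] = key * key

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

With processes, the same code would leave the parent’s dictionary empty, because each child process works on its own copy of memory; that is why the table lists IPC as a requirement for processes.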
How the Global Interpreter Lock (GIL) Works
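In CPython, the GIL controls thread execution. The GIL ensures that only one thread can execute Python bytecode at a time. This protects the interpreter’s internal state (for example, reference counts), but it also prevents threads from running Python code on multiple cores at the same time.
- Advantages of the GIL:
  - Keeps the interpreter’s internals consistent without fine-grained locking. It does not make your own code thread-safe: a compound operation such as counter += 1 can still be interleaved with another thread, so shared data still needs locks.
- Disadvantages of the GIL:
  - For CPU-bound tasks, multithreading performance is limited.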
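One caveat is worth making concrete: the GIL does not turn a read-modify-write sequence into an atomic step. The sketch below forces the problematic interleaving deterministically with a `threading.Barrier`, which is an artificial device used here only so the outcome is reproducible; in real programs the same lost update happens occasionally and unpredictably. The names `unsafe_increment` and `N` are chosen for illustration.

```python
import threading

N = 4
counter = 0
barrier = threading.Barrier(N)

def unsafe_increment():
    global counter
    value = counter      # 1) read the shared value
    barrier.wait()       # wait until all N threads have finished reading
    counter = value + 1  # 2) write back a value based on a stale read

threads = [threading.Thread(target=unsafe_increment) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # prints 1, not 4: three increments were lost
```

Protecting the read and the write with a single `threading.Lock` (and removing the barrier) makes the result equal to `N`.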
Criteria for Choosing Between Threads and Processes
When doing parallel processing in Python, it’s good to choose between threads and processes based on the following criteria.
- When to Choose Threads:
  - Most of the work spends time waiting for I/O (e.g., network communication).
  - You want to keep memory usage low.
- When to Choose Processes:
  - CPU-intensive workloads (e.g., numerical computations).
  - You want to efficiently utilize multiple cores.
4. Implementing Multithreading in Python
When implementing multithreading in Python, use the threading module from the standard library. This section covers, with concrete code examples, everything from creating basic threads to advanced control.
Basic Usage of the threading Module
Creating and Running Threads
In the threading module, you create and run threads using the Thread class. Below is a basic example.
import threading
import time

def print_message(message):
    print(f"Start: {message}")
    time.sleep(2)
    print(f"End: {message}")

# Create threads
thread1 = threading.Thread(target=print_message, args=("Thread 1",))
thread2 = threading.Thread(target=print_message, args=("Thread 2",))

# Start threads
thread1.start()
thread2.start()

# Wait for threads to finish
thread1.join()
thread2.join()

print("All threads finished")

Explanation of the Output
In this code, the two threads are started one right after the other and then run concurrently, each independent of the other. By calling the join() method on each thread, the main thread waits until all threads have finished.
Implementing Threads Using a Class
You can also implement more complex thread behavior by subclassing the Thread class.
import threading
import time

class MyThread(threading.Thread):
    def __init__(self, name):
        # Pass the name to the base class so it appears in thread listings and logs
        super().__init__(name=name)

    def run(self):
        # run() contains the work performed in the new thread
        print(f"{self.name} started")
        time.sleep(2)
        print(f"{self.name} finished")

# Create threads
thread1 = MyThread("Thread 1")
thread2 = MyThread("Thread 2")

# Start threads
thread1.start()
thread2.start()

# Wait for threads to finish
thread1.join()
thread2.join()

print("All threads finished")

Explanation of the Output
Define the thread’s behavior in the run() method and start the thread with the start() method. Calling run() directly would execute the code in the current thread instead of a new one. This approach is useful when you want to reuse complex thread logic as a class.
Thread Synchronization and Locks
When multiple threads operate on shared data concurrently, race conditions and inconsistencies can occur. To prevent such issues, use a Lock object to synchronize between threads.
Example Using a Lock
import threading

lock = threading.Lock()
shared_resource = 0

def increment():
    global shared_resource
    with lock:  # Acquire the lock; it is released when the block exits
        local_copy = shared_resource
        local_copy += 1
        shared_resource = local_copy

threads = []
for i in range(5):
    thread = threading.Thread(target=increment)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print(f"Final value of shared resource: {shared_resource}")

Explanation of the Output
The with lock syntax acquires the lock on entry and releases it on exit, even if the block raises an exception. In this example, the lock restricts access to the shared resource to one thread at a time.
Thread Timeouts and Daemon Threads
Thread Timeouts
By passing a timeout to the join() method, you can wait for a thread to finish for only a specified amount of time. join() always returns None, so call is_alive() afterwards to find out whether the thread is still running.
thread.join(timeout=5)

Daemon Threads
Daemon threads do not keep the program alive: once all non-daemon threads (including the main thread) have finished, any remaining daemon threads are stopped abruptly. To make a thread a daemon, set the daemon attribute to True before calling start().
thread = threading.Thread(target=print_message, args=("Daemon thread",))
thread.daemon = True
thread.start()

Practical Examples of Multithreading
The following is an example of parallelizing file downloads.
import threading
import time

def download_file(file_name):
    print(f"Starting download of {file_name}")
    time.sleep(2)  # Simulate download
    print(f"Finished downloading {file_name}")

files = ["file1", "file2", "file3"]
threads = []
for file in files:
    thread = threading.Thread(target=download_file, args=(file,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All files downloaded")

Conclusion
This section explained basic multithreading implementations in Python as well as practical application examples. The next section will dive deeper into concrete use cases for multithreading.
5. Multithreading Use Cases
Python’s multithreading is particularly well-suited to I/O-bound tasks. This section presents several concrete examples of applying multithreading. Through these examples, you’ll learn how to use it in real-world projects.
1. Improving Web Scraping Efficiency
When collecting data from websites, sending requests to multiple URLs concurrently can significantly reduce processing time.
Sample code
Below is an example of web scraping using the third-party requests library and the threading module.
import threading
import requests
import time

urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
]

def fetch_url(url):
    print(f"Starting fetch for {url}")
    response = requests.get(url, timeout=10)  # Give up instead of waiting indefinitely
    print(f"Fetch complete for {url}: status code {response.status_code}")

threads = []
start_time = time.time()
for url in urls:
    thread = threading.Thread(target=fetch_url, args=(url,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

end_time = time.time()
print(f"Processing time: {end_time - start_time:.2f} seconds")

Explanation of the results
In this code, requests to each URL are executed in parallel, reducing total processing time. However, when making many requests, be careful about server load and potential violations of site policies.
2. Concurrent File Downloads
When downloading multiple files from the internet, multithreading can handle the task more efficiently.
Sample code
import threading
import time

def download_file(file_name):
    print(f"Starting download of {file_name}")
    time.sleep(2)  # Simulate download
    print(f"Download complete for {file_name}")

files = ["file1.zip", "file2.zip", "file3.zip"]
threads = []
for file in files:
    thread = threading.Thread(target=download_file, args=(file,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All downloads complete")

Explanation of the results
In this code, each file is downloaded in its own thread, reducing total processing time. In real applications, you would use libraries like urllib or requests to implement the actual download.
3. Parallel Execution of Database Queries
When retrieving large amounts of data from a database, using multithreading to execute queries in parallel can improve performance.
Sample code
import threading
import time

def query_database(query):
    print(f"Running query: {query}")
    time.sleep(2)  # Simulate query execution
    print(f"Query complete: {query}")

queries = ["SELECT * FROM users", "SELECT * FROM orders", "SELECT * FROM products"]
threads = []
for query in queries:
    thread = threading.Thread(target=query_database, args=(query,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All queries completed")

Explanation of the results
In this example, running different queries in parallel reduces data retrieval time. In real applications, you would connect using database libraries (e.g., sqlite3, psycopg2).
4. Parallelizing Video Processing
Tasks that process video files frame by frame can be made more efficient with multithreading.
Sample code
import threading
import time

def process_frame(frame_number):
    print(f"Starting processing for frame {frame_number}")
    time.sleep(1)  # Simulate processing
    print(f"Processing complete for frame {frame_number}")

frame_numbers = range(1, 6)
threads = []
for frame in frame_numbers:
    thread = threading.Thread(target=process_frame, args=(frame,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All frames processed")

Explanation of the results
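By parallelizing per-frame operations like video editing or effects processing, you can improve overall processing speed. Note that this holds when the per-frame work is dominated by I/O (reading and writing frames) or runs in native code that releases the GIL. Frame processing written in pure Python is CPU-bound and gains little from threads.
Conclusion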
Multithreading is highly effective for systems that perform many I/O operations and applications that require real-time responsiveness. However, for CPU-intensive tasks you should consider the impact of the GIL and evaluate using multiprocessing appropriately.
6. Cautions and Best Practices for Using Multithreading
Multithreading in Python can make processing more efficient, but there are points to watch out for and common pitfalls. This section introduces multithreading challenges and best practices to avoid them.
Cautions
1. Impact of the Global Interpreter Lock (GIL)
Python’s GIL (Global Interpreter Lock) enforces the constraint that only one thread can execute Python bytecode at a time. Because of this, CPU-bound tasks (e.g., numerical computations) gain little or nothing from multithreading.
- Cases affected:
  - Heavy computational workloads
  - Algorithms that require high CPU usage
- Mitigations:
  - Use the multiprocessing module to parallelize with multiple processes.
  - Use C extension modules or optimized libraries like NumPy, which can release the GIL during heavy computations.
2. Deadlocks
A deadlock, where multiple threads wait on resources held by each other, is a common issue in multithreaded programs. This can cause the entire program to halt.
- Example: Thread A holds resource X and waits for resource Y, while Thread B holds resource Y and waits for resource X.
- Mitigations:
  - Always acquire resources in a consistent order.
  - Use RLock (reentrant lock) from the threading module when the same thread may need to acquire a lock it already holds. An RLock prevents a thread from deadlocking on itself; it does not prevent deadlocks between different threads.
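To make the role of RLock concrete, the following sketch compares what happens when a thread tries to acquire a lock it already holds. It uses `acquire(blocking=False)` so that the attempt on the plain `Lock` returns `False` instead of hanging; the variable names are chosen for this example.

```python
import threading

plain_lock = threading.Lock()
reentrant_lock = threading.RLock()

with plain_lock:
    # A second acquire from the same thread cannot succeed on a plain Lock.
    # blocking=False makes the call return False instead of waiting forever.
    plain_second = plain_lock.acquire(blocking=False)

with reentrant_lock:
    # The thread that owns an RLock may acquire it again.
    reentrant_second = reentrant_lock.acquire(blocking=False)
    if reentrant_second:
        reentrant_lock.release()  # release once per successful acquire

print(plain_second, reentrant_second)  # False True
```

A blocking `plain_lock.acquire()` at the same point would never return, which is the self-deadlock that RLock is designed to avoid.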
Sample code (deadlock avoidance)
import threading

lock1 = threading.Lock()
lock2 = threading.Lock()

# Both tasks acquire the locks in the same order (lock1, then lock2).
# If task2 took lock2 first, each thread could end up holding one lock
# while waiting for the other, which is a deadlock.
def task1():
    with lock1:
        print("Task1 acquired lock1")
        with lock2:
            print("Task1 acquired lock2")

def task2():
    with lock1:
        print("Task2 acquired lock1")
        with lock2:
            print("Task2 acquired lock2")

thread1 = threading.Thread(target=task1)
thread2 = threading.Thread(target=task2)
thread1.start()
thread2.start()
thread1.join()
thread2.join()

print("Both tasks completed")

3. Race conditions
When multiple threads operate on the same data simultaneously, unexpected behavior can occur. This is called a “race condition.”
- Example: If two threads try to increment a counter variable at the same time, the counter may not increase as expected.
- Mitigations:
  - Use Lock from the threading module to synchronize access to shared data.
  - Minimize data sharing between threads.
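One way to minimize data sharing is to have each thread hand its result to a `queue.Queue` instead of updating a shared variable. The sketch below is one possible arrangement, not the only one; `worker` and `results` are names chosen for illustration. `queue.Queue` performs its own locking, so the example needs no explicit `Lock`.

```python
import queue
import threading

results = queue.Queue()  # thread-safe container from the standard library

def worker(n):
    # Hand the result over instead of mutating a shared counter
    results.put(n * n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All workers have finished, so the queue can be drained safely
total = 0
while not results.empty():
    total += results.get()

print(total)  # 0 + 1 + 4 + 9 + 16 = 30
```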
Sample code (avoidance using locks)
import threading

lock = threading.Lock()
counter = 0

def increment():
    global counter
    with lock:  # Only one thread at a time may read and update the counter
        local_copy = counter
        local_copy += 1
        counter = local_copy

threads = [threading.Thread(target=increment) for _ in range(100)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(f"Counter value: {counter}")

Best practices
1. Setting the appropriate number of threads
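- When setting the number of threads, consider the number of CPU cores and I/O wait times.
- Recommendation: For I/O-bound tasks, it’s usually fine to use more threads than cores, because most threads are waiting at any given moment. For CPU-intensive tasks, extra threads do not help in CPython because of the GIL; use processes instead and limit the worker count to the number of cores.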
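The rule of thumb above can be written down as a small helper. `suggest_worker_count` is a hypothetical function invented for this sketch, not part of any library. The I/O-bound bound of `min(32, cores + 4)` mirrors the default that `ThreadPoolExecutor` has applied since Python 3.8 when `max_workers` is omitted; for CPU-bound work the returned number is meant as a process count.

```python
import os

def suggest_worker_count(task_count, io_bound=True):
    """Hypothetical helper: pick a worker count from the rule of thumb above."""
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    if io_bound:
        # More workers than cores is fine for I/O waits, but keep an upper bound
        limit = min(32, cores + 4)
    else:
        # CPU-bound: one worker (process) per core at most
        limit = cores
    # Never more workers than tasks, and always at least one
    return max(1, min(task_count, limit))

print(suggest_worker_count(100))                  # depends on the machine
print(suggest_worker_count(100, io_bound=False))  # depends on the machine
print(suggest_worker_count(2))                    # 2
```

Treat the result as a starting point and measure with the real workload, since the best value also depends on the remote services or disks involved.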
2. Debugging and logging
- Multithreaded programs are harder to debug, so proper logging is important.
- Recommendation: Use Python’s logging module and include the thread name in each log record.
Sample code (logging)
import threading
import logging

# %(threadName)s adds the name of the emitting thread to every message
logging.basicConfig(level=logging.DEBUG, format='%(threadName)s: %(message)s')

def task():
    logging.debug("Task running")

threads = [threading.Thread(target=task) for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

logging.debug("All tasks completed")

3. Using high-level libraries
Using high-level libraries like concurrent.futures.ThreadPoolExecutor makes thread management easier.
Sample code (ThreadPoolExecutor)
from concurrent.futures import ThreadPoolExecutor

def task(name):
    print(f"{name} running")

# Leaving the with block waits for all submitted tasks to finish
with ThreadPoolExecutor(max_workers=3) as executor:
    executor.map(task, ["Task 1", "Task 2", "Task 3"])

Conclusion
To effectively use multithreading in Python, it’s important to pay attention to the GIL and synchronization issues and aim for a safe, efficient design. Appropriate use of locks, debugging techniques, and leveraging high-level libraries when needed are key to building successful multithreaded programs.
7. Comparison of Multithreading and Multiprocessing
In Python, there are two main approaches to achieving parallelism: multithreading and multiprocessing. Each has its own characteristics and is suited to different situations. This section compares the two in detail and offers guidance on when to use each.
Basic Differences Between Multithreading and Multiprocessing
| Feature | Multithreading | Multiprocessing |
|---|---|---|
| Execution unit | Multiple threads within the same process | Multiple independent processes |
| Memory space | Shared (use the same memory space) | Independent (isolated memory space per process) |
| Overhead | Lightweight and fast to start | Heavy and slower to start |
| GIL impact | Affected | Not affected |
| Data sharing | Easy (uses the same memory) | Complex (requires inter-process communication) |
| Use cases | I/O-bound tasks | CPU-bound tasks |
Detailed Explanation
- Multithreading: Because multiple threads run within the same process, it’s lightweight and data sharing is easy. However, in Python, the GIL can limit performance for CPU-intensive tasks.
- Multiprocessing: Each process runs its own Python interpreter with its own GIL and its own memory space, so processes do not block one another and can fully utilize multiple CPU cores. However, if inter-process communication (IPC) is required, the implementation can be somewhat more complex.
When to Choose Multithreading
- Examples:
  - Web scraping
  - File operations (read/write)
  - Network communication (asynchronous operations)
- Reason: Multithreading can efficiently utilize I/O waiting time, increasing parallelism. Also, because threads share the same memory space, exchanging data is easy.
Code example: I/O-bound tasks
import threading
import time

def file_operation(file_name):
    print(f"Start processing {file_name}")
    time.sleep(2)  # Simulate a file operation
    print(f"Finished processing {file_name}")

files = ["file1.txt", "file2.txt", "file3.txt"]
threads = []
for file in files:
    thread = threading.Thread(target=file_operation, args=(file,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All file operations are complete")

When to Choose Multiprocessing
- Examples:
  - Large-scale data processing
  - Training machine learning models
  - Image processing and numerical computations
- Reason: You can avoid the GIL and fully utilize multiple CPU cores to achieve high computational performance. However, sharing data between processes can be cumbersome.
Code example: CPU-intensive tasks
from multiprocessing import Process

def compute_heavy_task(task_id):
    print(f"Running task {task_id}")
    # CPU-bound work: a pure-Python computation that keeps one core busy
    total = sum(i * i for i in range(5_000_000))
    print(f"Task {task_id} complete (result: {total})")

# The guard is needed on platforms that start child processes by
# re-importing this module (e.g., Windows and macOS).
if __name__ == "__main__":
    tasks = ["Computation 1", "Computation 2", "Computation 3"]
    processes = []
    for task in tasks:
        process = Process(target=compute_heavy_task, args=(task,))
        processes.append(process)
        process.start()

    for process in processes:
        process.join()

    print("All computation tasks are complete")

Combining Both
In some projects, combining multithreading and multiprocessing can yield optimal performance. For example, you might parallelize data retrieval (I/O) with multithreading and then process that data with CPU-intensive computations using multiprocessing.
Criteria for Choosing Between Multithreading and Multiprocessing
Consider the following points when choosing.
- Nature of the task:
  - If there’s a lot of I/O waiting: Multithreading
  - For compute-heavy tasks: Multiprocessing
- Resource constraints:
  - If you want to minimize memory usage: Multithreading
  - If you want to fully utilize CPU cores: Multiprocessing
- Code complexity:
  - If you want to share data easily: Multithreading
  - If you can handle inter-process communication: Multiprocessing

8. Summary and FAQ
This article provided a detailed explanation of using multithreading and multiprocessing in Python, covering basic concepts, implementation examples, caveats, and guidance on when to use each. In this section, we summarize the key points of the article and supplement the explanation with an FAQ format to answer questions readers are likely to have.
Key takeaways
- Characteristics of multithreading
  - Well-suited for reducing I/O wait time and makes sharing data easy.
  - Affected by the GIL, so not suitable for CPU-bound tasks.
- Characteristics of multiprocessing
  - Not constrained by the GIL and performs well for CPU-intensive workloads.
  - Uses separate memory spaces, so inter-process communication may be required.
- Choosing the right approach is key
  - Use multithreading for I/O-bound tasks and multiprocessing for CPU-bound tasks.
  - Combining both when appropriate can yield optimal performance.
FAQ (Frequently Asked Questions)
Q1: When using multithreading, how many threads should I use?
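A: Consider the following when setting the number of threads.
- I/O-bound tasks: You can use more threads than CPU cores, because most of them are waiting at any given moment. A common approach is to match the thread count to the number of tasks you want in flight at once, with an upper bound (for example, a thread pool’s max_workers) so that a long task list does not create thousands of threads.
- CPU-bound tasks: In CPython, adding threads does not speed up CPU-bound code because of the GIL, and extra threads only add switching overhead. Use multiprocessing instead, with a worker count at or below the number of cores.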
Q2: Is there a way to completely avoid the constraints of the GIL?
A: In a standard (GIL-enabled) CPython build you cannot switch the GIL off, but you can work around its effects using the methods below.
- Use multiprocessing: The multiprocessing module sidesteps the GIL through process-level parallelism, because each process has its own interpreter and its own GIL.
- Use external libraries: Libraries implemented in C, such as NumPy and Pandas, can release the GIL during many heavy operations and operate very efficiently.
Q3: How do multithreading and asynchronous programming (asyncio) differ?
A:
- Multithreading: Uses threads to run tasks concurrently. Because threads share resources, synchronization may be necessary.
- Asynchronous programming: Uses asyncio to switch between tasks within an event loop. Tasks run on a single thread and switch only at await points, which avoids most thread contention and locking issues. It is designed for I/O waiting, and a task is cheaper to create than a thread.
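A minimal asyncio sketch makes the single-thread point visible. The coroutine `fetch` and its delays are placeholders invented for this example, with `asyncio.sleep` standing in for a real non-blocking I/O call. Each coroutine records the identifier of the thread it runs on.

```python
import asyncio
import threading

async def fetch(name, delay):
    # asyncio.sleep stands in for a non-blocking I/O wait
    await asyncio.sleep(delay)
    return name, threading.get_ident()

async def main():
    # Run the three coroutines concurrently on one event loop
    return await asyncio.gather(
        fetch("Task 1", 0.2),
        fetch("Task 2", 0.1),
        fetch("Task 3", 0.3),
    )

results = asyncio.run(main())
names = [name for name, _ in results]
thread_ids = {thread_id for _, thread_id in results}

print(names)            # ['Task 1', 'Task 2', 'Task 3'] (gather keeps argument order)
print(len(thread_ids))  # 1: every coroutine ran on the same thread
```

The three waits overlap, so the whole run takes about as long as the longest delay, yet no second thread is created.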
Q4: What are the benefits of using a thread pool in Python?
A: A thread pool reuses a fixed set of worker threads instead of creating and destroying one thread per task, which reduces overhead and caps the number of threads running at once. It’s especially useful when handling a large number of tasks. Using concurrent.futures.ThreadPoolExecutor makes thread management easier. Example:
from concurrent.futures import ThreadPoolExecutor

def task(name):
    print(f"{name} is running")

# At most 5 tasks run at the same time; the with block waits for all of them
with ThreadPoolExecutor(max_workers=5) as executor:
    executor.map(task, ["Task 1", "Task 2", "Task 3", "Task 4", "Task 5"])
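A pool is also a convenient way to get return values back from worker threads. The sketch below shows two standard options under a made-up workload (`square` is a placeholder function): `executor.map`, which yields results in input order, and `submit` combined with `as_completed`, which yields each `Future` as soon as it finishes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    return n * n

with ThreadPoolExecutor(max_workers=3) as executor:
    # map() yields results in the order of the inputs
    ordered = list(executor.map(square, range(5)))

    # submit() returns a Future per task; as_completed() yields them as they finish
    futures = {executor.submit(square, n): n for n in range(5)}
    by_input = {futures[f]: f.result() for f in as_completed(futures)}

print(ordered)                   # [0, 1, 4, 9, 16]
print(sorted(by_input.items()))  # [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]
```

Consuming the results matters for error handling as well: an exception raised inside a task is re-raised only when its result is retrieved, either by iterating over `map()` or by calling `result()` on the future.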



