This article was peer reviewed by Christopher Pitt. Thanks to all of SitePoint’s peer reviewers for making SitePoint content the best it can be!
PHP developers rarely seem to use parallelism. The simplicity of synchronous, single-threaded programming is certainly appealing, but a little concurrency can sometimes bring worthwhile performance improvements.
In this article, we will take a look at how threading can be achieved in PHP with the pthreads extension. This requires a ZTS (Zend Thread Safety) build of PHP 7.x with pthreads v3 installed. (At the time of writing, PHP 7.1 users will need to install from the master branch of the pthreads repo; see this article’s section for details on building third-party extensions from source.)
Just as a quick clarification: pthreads v2 targets PHP 5.x and is no longer supported; pthreads v3 targets PHP 7.x and is being actively developed.
A big thank you to Joe Watkins (creator of the pthreads extension) for proofreading and helping to improve my article!
When not to use pthreads
Before we move on, I would first like to clarify when you should not (as well as cannot) use the pthreads extension.
In pthreads v2, the recommendation was that pthreads should not be used in a web server environment (i.e. in an FCGI process). As of pthreads v3, this recommendation has been enforced, so now you simply cannot use it in a web server environment. The two prominent reasons for this are:
- It is not safe to use multiple threads in such an environment (causing IO issues, amongst other problems).
- It does not scale well. For example, let’s say you have a PHP script that creates a new thread to handle some work, and that script is executed upon each request. This means that for each request, your application will create one new thread (this is a 1:1 threading model: one thread per request). If your application is serving 1,000 requests per second, then it is creating 1,000 threads per second! Having this many threads running on a single machine will quickly inundate it, and the problem will only be exacerbated as the request rate increases.
That’s why threading is not a good solution in such an environment. If you’re looking for threading as a solution to IO-blocking tasks (such as performing HTTP requests), then let me point you in the direction of asynchronous programming, which can be achieved via frameworks such as Amp. SitePoint has released some excellent articles that cover this topic (such as writing asynchronous libraries and Modding Minecraft in PHP), in case you’re interested.
With that out of the way, let’s jump straight into things!
Handling one-off tasks
Sometimes, you will want to handle one-off tasks in a multi-threaded way (such as performing some IO-bound task). In such instances, the Thread class may be used to create a new thread and run some unit of work in that separate thread.
For example:
$task = new class extends Thread {
    // public, because the result is read from outside the class below
    public $response;

    public function run()
    {
        $content = file_get_contents("http://google");
        preg_match("~<title>(.+)</title>~", $content, $matches);
        $this->response = $matches[1];
    }
};

$task->start() && $task->join();

var_dump($task->response); // string(6) "Google"
In the above, the run method is our unit of work that will be executed inside the new thread. When Thread::start is invoked, the new thread is spawned and the run method is called. We then join the spawned thread back to the main thread (via Thread::join), which blocks until the separate thread has finished executing. This ensures that the task has finished executing before we attempt to output the result (stored in $task->response).
It may not be desirable to pollute a class’s responsibility with thread-related logic (including having to define a run method). We can segregate such classes by having them extend the Threaded class instead, so that they can then be run inside other threads:
class Task extends Threaded
{
    public $response;

    public function someWork()
    {
        $content = file_get_contents('http://google');
        preg_match('~<title>(.+)</title>~', $content, $matches);
        $this->response = $matches[1];
    }
}

$task = new Task;

$thread = new class($task) extends Thread {
    private $task;

    public function __construct(Threaded $task)
    {
        $this->task = $task;
    }

    public function run()
    {
        $this->task->someWork();
    }
};

$thread->start() && $thread->join();

var_dump($task->response);
Any class that needs to be run inside a separate thread must extend the Threaded class in some way. This is because it provides the necessary abilities to run inside different threads, as well as implicit safety and useful interfaces (for things like resource synchronization).
Let’s take a quick look at the hierarchy of classes exposed by pthreads:
Threaded (implements Traversable, Collectable)
    Thread
        Worker
    Volatile
Pool
We’ve already seen and learnt the basics of the Thread and Threaded classes, so now let’s take a look at the remaining three (Worker, Volatile, and Pool).
Recycling threads
Spinning up a new thread for every task to be parallelised is expensive. This is because a shared-nothing architecture must be employed by pthreads in order to achieve threading inside PHP. What this means is that the entire execution context of the current instance of PHP’s interpreter (including every class, interface, trait, and function) must be copied for each thread created. Since this incurs a noticeable performance impact, a thread should always be reused when possible. Threads may be reused in two ways: with Workers or with Pools.
The Worker class is used to execute a series of tasks synchronously inside another thread. This is done by creating a new Worker instance (which creates a new thread), and then stacking the tasks onto that separate thread (via Worker::stack).
Here’s a quick example:
class Task extends Threaded
{
    private $value;

    public function __construct(int $i)
    {
        $this->value = $i;
    }

    public function run()
    {
        usleep(250000);
        echo "Task: {$this->value}\n";
    }
}

$worker = new Worker();
$worker->start();

for ($i = 0; $i < 15; ++$i) {
    $worker->stack(new Task($i));
}

while ($worker->collect());

$worker->shutdown();
Output: the tasks print in stacked order, from "Task: 0" through "Task: 14".
The above stacks 15 tasks onto the new $worker object via Worker::stack, and then processes them in the stacked order. The Worker::collect method, as seen above, is used to clean up the tasks once they have finished executing. By using it inside a while loop, we block the main thread until all stacked tasks have finished executing and have been cleaned up before we trigger Worker::shutdown. Shutting down the worker prematurely (i.e. whilst there are still tasks to be executed) will still block the main thread until all tasks have finished executing; the tasks will simply not be garbage collected (causing memory leaks).
The Worker class provides a few other methods pertaining to its task stack, including Worker::unstack to remove the oldest stacked item, and Worker::getStacked for the number of items on the execution stack. The worker’s stack only holds the tasks that are to be executed. Once a task in the stack has been executed, it is removed and then placed on a separate (internal) stack to be garbage collected (using Worker::collect).
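As a rough illustration of these stack-inspection methods, the sketch below stacks a few tasks, reads Worker::getStacked, and removes one pending task with Worker::unstack. SleepTask is a hypothetical class invented for this example, and the extension_loaded guard is an added convenience so that the script exits cleanly where pthreads is unavailable. The printed counts depend on timing, so no fixed output is shown.

```php
<?php
// Sketch only: assumes the pthreads v3 extension on a ZTS build of PHP 7.x (CLI).
if (!extension_loaded('pthreads')) {
    exit("pthreads is not loaded; nothing to demonstrate.\n");
}

// Hypothetical task, used only for this illustration.
class SleepTask extends Threaded
{
    public function run()
    {
        usleep(100000); // simulate 100 ms of work
    }
}

$worker = new Worker();
$worker->start();

for ($i = 0; $i < 5; ++$i) {
    $worker->stack(new SleepTask());
}

// Number of tasks still waiting to be executed (timing dependent).
$pending = $worker->getStacked();
printf("Pending after stacking: %d\n", $pending);

// Remove the oldest task that has not been executed yet, if any remain.
if ($worker->getStacked() > 0) {
    $worker->unstack();
}
printf("Pending after unstacking: %d\n", $worker->getStacked());

// Clean up finished tasks, then shut the worker down.
while ($worker->collect());
$worker->shutdown();
```

Because the worker starts executing as soon as tasks are stacked, the first count may already be lower than 5 by the time it is read.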
Another way to reuse a thread when executing many tasks is to use a thread pool (via the Pool class). Thread pools are powered by a group of Workers to enable tasks to be executed concurrently, where the concurrency factor (the number of threads the pool runs on) is specified upon pool creation.
Let’s adapt the above example to use a pool of workers instead:
class Task extends Threaded
{
    private $value;

    public function __construct(int $i)
    {
        $this->value = $i;
    }

    public function run()
    {
        usleep(250000);
        echo "Task: {$this->value}\n";
    }
}

$pool = new Pool(4);

for ($i = 0; $i < 15; ++$i) {
    $pool->submit(new Task($i));
}

while ($pool->collect());

$pool->shutdown();
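Output: the same 15 lines, though not necessarily in submission order, since the four workers execute tasks concurrently.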
There are a few notable differences between using a pool and using a worker. Firstly, pools do not need to be manually started; they begin executing tasks as soon as they become available. Secondly, we submit tasks to the pool, rather than stack them. Also, the Pool class does not extend Threaded, and so it may not be passed around to other threads (unlike Worker).
As a matter of good practice, workers and pools should always have their tasks collected once finished, and be manually shut down. Threads created via the Thread class should also be joined back to the creator thread.
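To make the collect-then-shut-down pattern a little more concrete, here is a minimal sketch that keeps references to the submitted tasks so that their results can be read once the pool has finished. SquareTask and its $result property are hypothetical names made up for this example, and the extension_loaded guard is an added convenience rather than part of pthreads usage.

```php
<?php
// Sketch only: assumes the pthreads v3 extension on a ZTS build of PHP 7.x (CLI).
if (!extension_loaded('pthreads')) {
    exit("pthreads is not loaded; nothing to demonstrate.\n");
}

// Hypothetical task that stores its result on itself.
class SquareTask extends Threaded
{
    private $n;
    public $result;

    public function __construct(int $n)
    {
        $this->n = $n;
    }

    public function run()
    {
        $this->result = $this->n * $this->n;
    }
}

$pool = new Pool(2);
$tasks = [];

for ($i = 0; $i < 4; ++$i) {
    $tasks[$i] = new SquareTask($i); // keep a reference so the result can be read later
    $pool->submit($tasks[$i]);
}

// Collect finished tasks, then shut the pool down explicitly.
while ($pool->collect());
$pool->shutdown();

// All tasks have run by now, so their results are safe to read.
$results = [];
foreach ($tasks as $i => $task) {
    $results[$i] = $task->result;
}
echo implode(', ', $results), "\n";
```

Reading the results only after Pool::shutdown avoids having to synchronize with tasks that may still be running.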
pthreads and (im)mutability
The final class to cover is Volatile, a new addition to pthreads v3. Immutability has become an important concept in pthreads, since without it, performance is severely degraded. Therefore, by default, the properties of Threaded classes that are themselves Threaded objects are now immutable, and so they cannot be reassigned after initial assignment. Explicit mutability for such properties is now favoured, and can still be achieved by using the new Volatile class.
Let’s take a quick look at an example to demonstrate the new immutability constraints:
class Task extends Threaded // a Threaded class
{
    public function __construct()
    {
        $this->data = new Threaded();
        // $this->data is not overwritable, since it is a Threaded property of a Threaded class
    }
}

$task = new class(new Task()) extends Thread { // a Threaded class, since Thread extends Threaded
    public function __construct($tm)
    {
        $this->threadedMember = $tm;
        var_dump($this->threadedMember->data); // object(Threaded)#3 (0) {}
        $this->threadedMember = new StdClass(); // invalid, since the property is a Threaded member of a Threaded class
    }
};
Threaded properties of Volatile classes, on the other hand, are mutable:
class Task extends Volatile
{
    public function __construct()
    {
        $this->data = new Threaded();
        $this->data = new StdClass(); // valid, since we are in a volatile class
    }
}

$task = new class(new Task()) extends Thread {
    public function __construct($vm)
    {
        $this->volatileMember = $vm;

        var_dump($this->volatileMember->data); // object(stdClass)#4 (0) {}

        // still invalid, since Volatile extends Threaded, so the property is still a Threaded member of a Threaded class
        $this->volatileMember = new StdClass();
    }
};
We can see that the Volatile class overrides the immutability enforced by its parent Threaded class, so that Threaded properties can be reassigned (as well as unset()).
There’s just one last fundamental topic to cover with respect to mutability and the Volatile class: arrays. Arrays in pthreads are automatically coerced to Volatile objects when assigned to a property of a Threaded class. This is because it simply isn’t safe to manipulate an array from multiple contexts in PHP.
Let’s again take a quick look at an example to better understand things:
$array = [1, 2, 3];

$task = new class($array) extends Thread {
    private $data;

    public function __construct(array $array)
    {
        $this->data = $array;
    }

    public function run()
    {
        $this->data[3] = 4;
        $this->data[] = 5;

        print_r($this->data);
    }
};

$task->start() && $task->join();

/* Output:
Volatile Object
(
    [0] => 1
    [1] => 2
    [2] => 3
    [3] => 4
    [4] => 5
)
*/
We can see that Volatile objects can be treated as if they were arrays, since they support array-based operations (as shown above) with the subset operator ([]). Volatile classes are not, however, supported by the common array-based functions, such as array_pop and array_shift. Instead, the Threaded class provides such operations as built-in methods.
As a demonstration:
$data = new class extends Volatile {
    public $a = 1;
    public $b = 2;
    public $c = 3;
};

var_dump($data);
var_dump($data->pop());
var_dump($data->shift());
var_dump($data);

/* Output:
object(class@anonymous)#1 (3) {
  ["a"]=> int(1)
  ["b"]=> int(2)
  ["c"]=> int(3)
}
int(3)
int(1)
object(class@anonymous)#1 (1) {
  ["b"]=> int(2)
}
*/
Other supported operations include Threaded::chunk and Threaded::merge.
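A small sketch of these two methods follows. It assumes the documented signatures Threaded::merge($from) and Threaded::chunk($size), and that chunk takes the returned entries off the object (my understanding of pthreads v3, worth verifying on your build). The variable names and the extension_loaded guard are illustrative additions.

```php
<?php
// Sketch only: assumes the pthreads v3 extension on a ZTS build of PHP 7.x (CLI).
if (!extension_loaded('pthreads')) {
    exit("pthreads is not loaded; nothing to demonstrate.\n");
}

$data = new Threaded();

// Copy the entries of a plain array into the Threaded object.
$data->merge([10, 20, 30, 40, 50]);

// Fetch the first two entries as a plain PHP array.
$firstTwo = $data->chunk(2);
var_dump($firstTwo);

// Print how many entries are left, to check whether chunk removed the ones it returned.
var_dump($data->count());
```

This pairing is convenient for queue-like structures, where one thread merges work in and another takes it off in batches.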
Synchronization
The final topic we will be covering in this article is synchronization in pthreads. Synchronization is a technique for enabling controlled access to shared resources.
For example, let’s implement a naive counter:
$counter = new class extends Thread {
    public $i = 0;

    public function run()
    {
        for ($i = 0; $i < 10; ++$i) {
            ++$this->i;
        }
    }
};

$counter->start();

for ($i = 0; $i < 10; ++$i) {
    ++$counter->i;
}

$counter->join();

var_dump($counter->i); // outputs a number between 10 and 20
Without using synchronization, the output isn’t deterministic. Multiple threads writing to a single variable without controlled access has caused updates to be lost.
Let’s rectify this by adding synchronization so that we receive the correct output of 20:
$counter = new class extends Thread {
    public $i = 0;

    public function run()
    {
        $this->synchronized(function () {
            for ($i = 0; $i < 10; ++$i) {
                ++$this->i;
            }
        });
    }
};

$counter->start();

$counter->synchronized(function ($counter) {
    for ($i = 0; $i < 10; ++$i) {
        ++$counter->i;
    }
}, $counter);

$counter->join();

var_dump($counter->i); // int(20)
Synchronized blocks of code can also cooperate with one another using Threaded::wait and Threaded::notify (along with Threaded::notifyOne).
Here’s a staggered increment from two synchronized for loops:
$counter = new class extends Thread {
    public $cond = 1;

    public function run()
    {
        $this->synchronized(function () {
            for ($i = 0; $i < 10; ++$i) {
                var_dump($i);
                $this->notify();

                if ($this->cond === 1) {
                    $this->cond = 2;
                    $this->wait();
                }
            }
        });
    }
};

$counter->start();

$counter->synchronized(function ($counter) {
    if ($counter->cond !== 2) {
        $counter->wait(); // wait for the other to start first
    }

    for ($i = 10; $i < 20; ++$i) {
        var_dump($i);
        $counter->notify();

        if ($counter->cond === 2) {
            $counter->cond = 1;
            $counter->wait();
        }
    }
}, $counter);

$counter->join();

/* Output:
int(0)
int(10)
int(1)
int(11)
int(2)
int(12)
int(3)
int(13)
int(4)
int(14)
int(5)
int(15)
int(6)
int(16)
int(7)
int(17)
int(8)
int(18)
int(9)
int(19)
*/
You may have noticed the additional conditions that have been placed around the invocations of Threaded::wait. These conditions are crucial because they only allow a synchronized callback to resume when it has received a notification and the specified condition is true. This is important because notifications may come from places other than calls to Threaded::notify. Thus, if the calls to Threaded::wait were not enclosed within conditions, we would be open to spurious wakeup calls, which would lead to unpredictable code.
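A common way to guard against spurious wakeups is to re-check the condition in a loop around the wait call, rather than testing it once. The sketch below shows that pattern with a simple ready flag; the $ready property and variable names are hypothetical, and the extension_loaded guard is an added convenience for environments without pthreads.

```php
<?php
// Sketch only: assumes the pthreads v3 extension on a ZTS build of PHP 7.x (CLI).
if (!extension_loaded('pthreads')) {
    exit("pthreads is not loaded; nothing to demonstrate.\n");
}

$flag = new class extends Thread {
    public $ready = false;

    public function run()
    {
        usleep(50000); // simulate some work before signalling

        $this->synchronized(function () {
            $this->ready = true; // change the condition first...
            $this->notify();     // ...then wake up the waiting thread
        });
    }
};

$flag->start();

$flag->synchronized(function ($flag) {
    // Re-check the condition after every wakeup: a wakeup that arrives
    // without the flag being set simply sends us back to waiting.
    while (!$flag->ready) {
        $flag->wait();
    }
}, $flag);

$flag->join();

var_dump($flag->ready); // bool(true)
```

Because the flag is tested before the first wait, the main thread also behaves correctly if the child thread happens to signal before the main thread enters its synchronized block.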
Conclusion
We have seen the five classes that pthreads ships with (Threaded, Thread, Worker, Volatile, and Pool), and covered when each of them is used. We have also looked at the new immutability concept in pthreads, and taken a quick tour of the synchronization features it supports. With these fundamentals covered, we can now begin to look into applying pthreads to some real-world use cases! That will be the topic of our next post.
In the meantime, if you have some application ideas regarding pthreads, don’t hesitate to drop them in the comments below!
Translated from: https://www.sitepoint/parallel-programming-pthreads-php-fundamentals/