SPSC thread safe with fences

I just want my code to be as simple as possible and thread safe.

With C11 atomics

Regarding section "7.17.4 Fences" of the ISO/IEC 9899/201X draft:

A release fence A synchronizes with an acquire fence B if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value written by X or a value written by any side effect in the hypothetical release sequence X would head if it were a release operation.

Is this code thread safe (with "w_i" as "object M")? Do "w_i" and "r_i" both need to be declared as _Atomic? If only w_i is _Atomic, can the main thread keep an old value of r_i in cache, consider the queue not full (while it is full) and write data? What happens if I read an atomic without atomic_load?

I have run some tests, and all of my attempts seem to give the right results. However, I know that my tests do not prove much where multithreading is concerned: I just run my program several times and look at the result.

Even if neither w_i nor r_i is declared as _Atomic, my program works, but fences alone are not sufficient according to the C11 standard, right?

    typedef int rbuff_data_t;

    struct rbuf {
        rbuff_data_t * buf;
        unsigned int bufmask;
        _Atomic unsigned int w_i;
        _Atomic unsigned int r_i;
    };
    typedef struct rbuf rbuf_t;

    static inline int thrd_tryenq(struct rbuf * queue, rbuff_data_t val)
    {
        size_t next_w_i;

        next_w_i = (queue->w_i + 1) & queue->bufmask;

        /* if ring full */
        if (atomic_load(&queue->r_i) == next_w_i) {
            return 1;
        }

        queue->buf[queue->w_i] = val;
        atomic_thread_fence(memory_order_release);
        atomic_store(&queue->w_i, next_w_i);

        return 0;
    }

    static inline int thrd_trydeq(struct rbuf * queue, rbuff_data_t * val)
    {
        size_t next_r_i;

        /* if ring empty */
        if (queue->r_i == atomic_load(&queue->w_i)) {
            return 1;
        }

        next_r_i = (queue->r_i + 1) & queue->bufmask;
        atomic_thread_fence(memory_order_acquire);
        *val = queue->buf[queue->r_i];
        atomic_store(&queue->r_i, next_r_i);

        return 0;
    }

I call these functions as follows. The main thread enqueues some data:

    while (thrd_tryenq(thrd_get_queue(&tinfo[tnum]), i)) {
        usleep(10);
        continue;
    }

The other threads dequeue data:

    static void * thrd_work(void *arg)
    {
        struct thrd_info *tinfo = arg;
        int elt;

        atomic_init(&tinfo->alive, true);

        /* busy waiting when queue empty */
        while (atomic_load(&tinfo->alive)) {
            if (thrd_trydeq(&tinfo->queue, &elt)) {
                sched_yield();
                continue;
            }
            printf("Thread %zu deq %d\n", tinfo->thrd_num, elt);
        }

        pthread_exit(NULL);
    }

With asm fences

On a specific platform, x86 with lfence and sfence, suppose I remove all the C11 code and just replace the fences with

    asm volatile ("sfence" ::: "memory");

and

    asm volatile ("lfence" ::: "memory");

(My understanding of these statements is: a compiler barrier that prevents memory accesses from being reordered or optimized away, plus a hardware fence.)

Do my variables then need to be declared as volatile, for instance?

I have already seen the ring buffer code above written with only these asm fences and no atomic types. I was really surprised, and I want to know whether that code is correct.

Best answer

I will only answer regarding C11 atomics; platform specifics are too complicated and should be phased out.

In C11, synchronization between threads is guaranteed only through certain library calls (e.g. those operating on mtx_t) and through atomics. Don't even try to do it without them.

That said, synchronization works via atomics: the visibility of side effects is guaranteed to propagate through the visibility of effects on atomics. For example, in the simplest consistency model, sequential consistency, whenever thread T2 sees a modification that thread T1 has made to an atomic variable A, all side effects before that modification in thread T1 are visible to T2.
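As an illustration of that guarantee, here is a minimal sketch of a plain variable being published through a sequentially consistent atomic flag. The names `payload`, `ready`, `t1_publish` and `t2_try_consume` are made up for this example, and thread creation is left out; each function is assumed to run on its own thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain, non-atomic shared data */
static atomic_bool ready;  /* the atomic that carries the synchronization;
                              static storage, so it starts out false */

/* Meant to run on thread T1: write the data, then publish it. */
static void t1_publish(int value)
{
    payload = value;  /* side effect sequenced before the atomic store */
    ready = true;     /* plain assignment to an atomic: seq_cst store */
}

/* Meant to run on thread T2: returns true and fills *out only once
   T1's store to `ready` has been observed. */
static bool t2_try_consume(int *out)
{
    if (!ready)       /* plain evaluation of an atomic: seq_cst load */
        return false;
    *out = payload;   /* safe: the write to payload happens before this read */
    return true;
}
```

Only `ready` is atomic here; `payload` is never accessed by T2 before it has seen the flag, which is what makes the plain read race-free.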

So not all of your shared variables need to be atomic; you only have to ensure that your state is properly propagated via an atomic. In that sense, fences buy you nothing when you use sequential or acquire-release consistency; they only complicate the picture.
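To make this concrete for a single-producer, single-consumer ring, the following is a sketch (not the code from the question) in which the two indices are atomic, the buffer is plain memory, and no standalone fence is used. The type `spsc_t`, the functions `spsc_try_enq` and `spsc_try_deq`, and the fixed capacity `SPSC_CAP` are assumptions made for the example; the return convention (0 on success, 1 on full or empty) mirrors the question.

```c
#include <stdatomic.h>

#define SPSC_CAP 8u  /* hypothetical capacity; must be a power of two */

typedef struct {
    int buf[SPSC_CAP];  /* plain data, protected by the index protocol */
    atomic_uint w_i;    /* stored only by the producer */
    atomic_uint r_i;    /* stored only by the consumer */
} spsc_t;

/* Producer side. Returns 0 on success, 1 if the ring is full. */
static int spsc_try_enq(spsc_t *q, int val)
{
    /* Our own index: nobody else writes it, so relaxed is enough. */
    unsigned w = atomic_load_explicit(&q->w_i, memory_order_relaxed);
    unsigned next = (w + 1u) & (SPSC_CAP - 1u);

    /* Acquire pairs with the consumer's release store to r_i: the
       consumer is done reading any slot we may now overwrite. */
    if (next == atomic_load_explicit(&q->r_i, memory_order_acquire))
        return 1;

    q->buf[w] = val;
    /* Release: the write to buf[w] is visible before the new w_i. */
    atomic_store_explicit(&q->w_i, next, memory_order_release);
    return 0;
}

/* Consumer side. Returns 0 on success, 1 if the ring is empty. */
static int spsc_try_deq(spsc_t *q, int *val)
{
    unsigned r = atomic_load_explicit(&q->r_i, memory_order_relaxed);

    /* Acquire pairs with the producer's release store to w_i. */
    if (r == atomic_load_explicit(&q->w_i, memory_order_acquire))
        return 1;

    *val = q->buf[r];
    /* Release: our read of buf[r] is complete before the slot is freed. */
    atomic_store_explicit(&q->r_i, (r + 1u) & (SPSC_CAP - 1u),
                          memory_order_release);
    return 0;
}
```

Dropping the `_explicit` variants and their ordering arguments gives the sequentially consistent version, which is the simpler one to reason about. As in the question, one slot always stays empty, so a ring of 8 holds at most 7 elements. The sketch assumes a queue object with static storage duration, so both indices start at zero.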

Some more general remarks:

Since you seem to use the sequential consistency model, which is the default, the functional notation for atomic operations (e.g. atomic_load) is superfluous. Just evaluating the atomic variable is exactly the same.

I have the impression that you are attempting optimization much too early in your development. I think you should first write an implementation for which you can prove correctness. Then, if and only if you notice a performance problem, you should start to think about optimization. It is very unlikely that such an atomic data structure is a real bottleneck for your application. You would need a very large number of threads all hammering simultaneously on your poor little atomic variable to see a measurable bottleneck here.
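On the remark that atomic_load is superfluous under the default ordering, a small sketch may help. The variable `idx` and the function `loads_agree` are hypothetical names used only here; the three reads below are all sequentially consistent loads of the same object.

```c
#include <stdatomic.h>

static atomic_uint idx;  /* hypothetical shared index */

/* Returns 1 if all three ways of reading the atomic agree. */
static int loads_agree(void)
{
    idx = 5u;                        /* plain assignment: seq_cst store */

    unsigned a = idx;                /* plain evaluation: seq_cst load */
    unsigned b = atomic_load(&idx);  /* same operation, function form */
    unsigned c = atomic_load_explicit(&idx, memory_order_seq_cst);

    return a == 5u && b == 5u && c == 5u;
}
```

The function forms only become necessary once a weaker ordering is wanted, through the `_explicit` variants.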
