Non blocking algorithms are based on the idea that processor hardware can provide a way to implement some thread unsafe operations in an atomic way. Usually only few such operations are needed to implement non blocking data container.
On Intel platform Windows API provides implementation of interlocked functions which perform thread unsafe operations in an atomic way. This helps application code to avoid using kernel objects and critical sections for synchronization between threads. In general performance advantage of such solution is obvious.
But note that some non blocking algorithms can be actually slower than their blocking counterparts because atomic primitives are based on “probe, repeat, set” principle which can set another problem of tuning an operating system properly and setting correct thread priorities in the software.
I was wondering how portable can be software which is based partially on non blocking algorithms.
I made a sample which runs fine both on Windows and Linux Intel based machines.
This sample implementation is based on non blocking stack originally proposed by
R. K. Treiber in “Systems Programming: Coping with Parallelism” (In RJ 5118, IBM Almaden Research Center, April 1986).
#include "iostream"
#ifdef _WIN32
//Intel ASM syntax supported by VC
template<class T>
bool compare_and_swap(T * addr, const T old_val, const T new_val)
{
T addr_tmp = *addr;
bool result = 0;
__asm{
MOV eax, old_val
MOV ecx, new_val
LOCK CMPXCHG addr_tmp, ecx
JNZ exit_label
MOV result, 1
exit_label:
};
*addr = addr_tmp;
return result;
}
#else
//AT&T asm syntax supported by gcc
template<class T>
bool compare_and_swap(T * ptr, const T old, const T new_val)
{
unsigned char ret=0;
/* Note that sete sets a 'byte' not the word */
__asm__ __volatile__ (" lock\n" \
" cmpxchg %2,%1\n" \
" sete %0\n" \
: "=q" (ret), "=m" (*ptr) \
: "r" (new_val), "m" (*ptr), "a" (old) \
: "memory");
return ret;
}
#endif
template<class T>
class NBStack{
typedef T value_type;
struct Node{
value_type Value;
Node * pNext;
};
private:
Node * pStackTop;
public:
NBStack(){ pStackTop = NULL; }
void Push(const value_type & val){
Node * pNewNode = new Node();
pNewNode->Value = val;
pNewNode->pNext = NULL;
Node * pTmp;
do{
pTmp = pStackTop;
pNewNode->pNext = pTmp;
}while( !(compare_and_swap(&pStackTop, pTmp, pNewNode)) );
}
bool Pop(value_type & val){
Node * pTmp;
do{
pTmp = pStackTop;
if (NULL == pTmp) return false;
}
while(! (compare_and_swap(&pStackTop, pTmp, pTmp->pNext)) );
val = pTmp->Value;
delete pTmp;
return true;
}
};
int main(int argc, char * argv[]){
using namespace std;
NBStack<int> s;
{
for(int i = 0; i<10; ++i)
s.Push(i);
}
{
int x;
for(int i = 0; i<12; ++i){
if (s.Pop(x)){
cout << x << endl;
}else{
cout << "stack is empty" << endl;
}
}
}
return 0;
};
However, I am not sure how well this code runs on AMD 32 and 64 bits since I don’t have an opportunity to run it on these machines.