The theoretical component.It so happened, the bottleneck in many computers is the interface memory - processor. This is due to the significant time of forming a request to the memory and the fact that the memory frequency is lower than the frequency of the processor. In general, at the time of receiving data from memory, the execution of the program stops. To improve the situation, use specialized high-speed memory - cache.
If there is a cache, if the processor needs to access a certain byte in RAM, then the memory area containing this byte (the size of the area depends on the processor) is first selected in the cache, and then the processor reads the data from the cache. If the processor writes data to the memory, they first get into the cache, and then into the RAM.
')
In a multi-core and multi-processor system, each core or processor has its own personal cache. When modifying data in the cache with a single processor, the changes should be synchronized with other processors using intersecting address space.
If changes occur frequently and the data areas overlap strongly, performance will fall.
I have a question how bad things can be. In addition, it was interesting to see how the data length affects.
Experiment.For checking, 2 multi-threaded C ++ programs were written for two cases of intersection of working areas:
- The areas do not intersect and are in the stack of each of the threads;
- The working area for threads is common; it is an array of 1, 4, and 8 byte words, with a total length of 1024 bytes, each of the threads modifying its words (for example, even or even odd).
Working threads - 2, each, changing its words, passes the whole array and, having reached the border, returns to the beginning and so on. Each of the threads makes 100,000,000 changes.
UPD Thank you mikhanoid for the identified typo in the name of the CPU.
For testing, the HP xw4600 workstation, CPU Core 2 Quad (Q9300), OS Linux Slamd64 12.1, kernel 2.6.24.5, compiler gcc 4.2.3 was used. The result was a 64-bit program, so any arithmetic operation for words of 1.4 and 8 bytes in length was performed in 1 clock cycle.
The execution time was measured with the command time (1), and averaged over 5 experiments.
First, a program was written that simply increments the word and puts it back, but it turned out that in the case of an array on the stack, optimization -O2 creates something indecent. Therefore, a second program was written that does something more complex with the data.
Results.UPD: Thnx for noticing mt_ and Google Charts for api charts. Result in graphic form.
Those who want to see the numbers, can squander further.
On the y-axis, the real time of program execution in seconds. X axis word length (in pairs).
Program 1.

Program 2.

Program 2 + -O2 optimization.
findingsIt can be said for sure that in the case of obviously non-intersecting areas it works faster, and significantly. Optimization somehow affects, but the trend continues.
But the dependence on the length of the word is somehow hardly noticeable.
The idea of this opus is inspired by the article
www.ddj.com/architect/208200273 .
UPD Attempt to transfer to the Perfect Code blog, it seems to me that this is the right place.
Results in tabular form.Results are presented in seconds.
Program 1 (shared memory).
Word length | 64 | 32 | eight |
---|
Real time | 1,297 | 1.532 | 0.846 |
---|
User time regime | 2,461 | 3,049 | 1.664 |
---|
Program 1 (stack)
Word length | 64 | 32 | eight |
---|
Real time | 0.453 | 0.431 | 0.441 |
---|
User time regime | 0.851 | 0.842 | 0.866 |
---|
Program 2 (shared memory).
Word length | 64 | 32 | eight |
---|
Real time | 9,191 | 9.033 | 8,921 |
---|
User time regime | 18,365 | 18,039 | 17,824 |
---|
Program 2 (stack)
Word length | 64 | 32 | eight |
---|
Real time with | 8,432 | 8,412 | 8,808 |
---|
User time Mode with | 16.843 | 16.796 | 17,548 |
---|
Program 2 + optimization-O2 (shared memory).
Word length | 64 | 32 | eight |
---|
Real time | 4,247 | 4,380 | 3,473 |
---|
User time regime | 8,415 | 8,960 | 6,781 |
---|
Program 2 + optimization-O2 (stack)
Word length | 64 | 32 | eight |
---|
Real time | 3.282 | 3.718 | 3,308 |
---|
User time regime | 6.550 | 7.384 | 5.565 |
---|
Source.Program 1.#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define TBYTES 1024
//
#define TOTAL 2
// u_int8_t u_int32_t u_int64_t
#define MTYPE u_int8_t
//
#define MAXAR TBYTES/ sizeof (MTYPE)
#define DOIT 100000000
MTYPE *array;
pthread_mutex_t mux = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int workers = 2;
//
//#define SHARED 1
void *
runner( void * args){
unsigned int which = *(unsigned int *)args; delete (unsigned int *)args;
unsigned int index = which;
MTYPE array1[MAXAR];
//
for ( unsigned int i = 0; i < DOIT; ++i){
if ( index >= MAXAR)
index = which;
//
#if defined(SHARED)
array[index] = array[index] + 1;
#else
array1[index] = array1[index] + 1;
#endif
//
index += TOTAL;
}
//
pthread_mutex_lock(&mux);
-- workers;
pthread_mutex_unlock(&mux);
pthread_cond_broadcast(&cond);
//
return 0;
};
int
main() {
//
array = (MTYPE*)calloc(MAXAR, sizeof (MTYPE));
//
unsigned int *which;
pthread_t t1;
pthread_attr_t attr;
//
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
//
which = new unsigned int (0);
pthread_create(&t1, &attr, runner, ( void *)which);
//
which = new unsigned int (1);
pthread_create(&t1, &attr, runner, ( void *)which);
//
pthread_mutex_lock(&mux);
while (!pthread_cond_wait(&cond, &mux)){
if ( !workers)
break ;
};
pthread_mutex_unlock(&mux);
//
return 0;
};
//
* This source code was highlighted with Source Code Highlighter .
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define TBYTES 1024
//
#define TOTAL 2
// u_int8_t u_int32_t u_int64_t
#define MTYPE u_int8_t
//
#define MAXAR TBYTES/ sizeof (MTYPE)
#define DOIT 100000000
MTYPE *array;
pthread_mutex_t mux = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int workers = 2;
//
//#define SHARED 1
void *
runner( void * args){
unsigned int which = *(unsigned int *)args; delete (unsigned int *)args;
unsigned int index = which;
MTYPE array1[MAXAR];
//
for ( unsigned int i = 0; i < DOIT; ++i){
if ( index >= MAXAR)
index = which;
//
#if defined(SHARED)
array[index] = array[index] + 1;
#else
array1[index] = array1[index] + 1;
#endif
//
index += TOTAL;
}
//
pthread_mutex_lock(&mux);
-- workers;
pthread_mutex_unlock(&mux);
pthread_cond_broadcast(&cond);
//
return 0;
};
int
main() {
//
array = (MTYPE*)calloc(MAXAR, sizeof (MTYPE));
//
unsigned int *which;
pthread_t t1;
pthread_attr_t attr;
//
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
//
which = new unsigned int (0);
pthread_create(&t1, &attr, runner, ( void *)which);
//
which = new unsigned int (1);
pthread_create(&t1, &attr, runner, ( void *)which);
//
pthread_mutex_lock(&mux);
while (!pthread_cond_wait(&cond, &mux)){
if ( !workers)
break ;
};
pthread_mutex_unlock(&mux);
//
return 0;
};
//
* This source code was highlighted with Source Code Highlighter .
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define TBYTES 1024
//
#define TOTAL 2
// u_int8_t u_int32_t u_int64_t
#define MTYPE u_int8_t
//
#define MAXAR TBYTES/ sizeof (MTYPE)
#define DOIT 100000000
MTYPE *array;
pthread_mutex_t mux = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int workers = 2;
//
//#define SHARED 1
void *
runner( void * args){
unsigned int which = *(unsigned int *)args; delete (unsigned int *)args;
unsigned int index = which;
MTYPE array1[MAXAR];
//
for ( unsigned int i = 0; i < DOIT; ++i){
if ( index >= MAXAR)
index = which;
//
#if defined(SHARED)
array[index] = array[index] + 1;
#else
array1[index] = array1[index] + 1;
#endif
//
index += TOTAL;
}
//
pthread_mutex_lock(&mux);
-- workers;
pthread_mutex_unlock(&mux);
pthread_cond_broadcast(&cond);
//
return 0;
};
int
main() {
//
array = (MTYPE*)calloc(MAXAR, sizeof (MTYPE));
//
unsigned int *which;
pthread_t t1;
pthread_attr_t attr;
//
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
//
which = new unsigned int (0);
pthread_create(&t1, &attr, runner, ( void *)which);
//
which = new unsigned int (1);
pthread_create(&t1, &attr, runner, ( void *)which);
//
pthread_mutex_lock(&mux);
while (!pthread_cond_wait(&cond, &mux)){
if ( !workers)
break ;
};
pthread_mutex_unlock(&mux);
//
return 0;
};
//
* This source code was highlighted with Source Code Highlighter .
Program 2.#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define TBYTES 1024
//
#define TOTAL 2
// u_int8_t u_int32_t u_int64_t
#define MTYPE u_int64_t
//
#define MAXAR TBYTES/ sizeof (MTYPE)
//#define DOIT 100000000 * (sizeof(u_int64_t) / sizeof(MTYPE))
#define DOIT 100000000
MTYPE *array;
pthread_mutex_t mux = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int workers = 2;
template<typename RTYPE, int TLEN>
RTYPE scramble(u_int16_t *seed, RTYPE a){
int d;
//
for ( d = 0; d < 8; ++d){
u_int16_t mask;
//
mask = (*seed & 0x1) ^ ( (*seed >> 1) &0x1);
mask = mask & 0x1;
*seed = (mask << 15) | (*seed >> 1);
//
a = a ^ ((RTYPE)mask << d);
}
//
return a;
};
//
//#define SHARED 1
void *
runner( void * args){
unsigned int which = *(unsigned int *)args; delete (unsigned int *)args;
unsigned int index = which;
u_int16_t seed = 0x4a80;
MTYPE array1[MAXAR];
//
for ( unsigned int i = 0; i < DOIT; ++i){
MTYPE data;
if ( index >= MAXAR)
index = which;
//
#if defined(SHARED)
data = array[index];
array[index] = scramble<MTYPE, 8* sizeof (MTYPE)>(&seed, data);
#else
data = array1[index];
array1[index] = scramble<MTYPE, 8* sizeof (MTYPE)>(&seed, data);
#endif
//
index += TOTAL;
}
//
pthread_mutex_lock(&mux);
-- workers;
pthread_mutex_unlock(&mux);
pthread_cond_broadcast(&cond);
//
return 0;
};
int
main() {
//
array = (MTYPE*)calloc(MAXAR, sizeof (MTYPE));
//
unsigned int *which;
pthread_t t1;
pthread_attr_t attr;
//
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
//
which = new unsigned int (0);
pthread_create(&t1, &attr, runner, ( void *)which);
//
which = new unsigned int (1);
pthread_create(&t1, &attr, runner, ( void *)which);
//
pthread_mutex_lock(&mux);
while (!pthread_cond_wait(&cond, &mux)){
if ( !workers)
break ;
};
pthread_mutex_unlock(&mux);
//
return 0;
};
//
* This source code was highlighted with Source Code Highlighter .
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define TBYTES 1024
//
#define TOTAL 2
// u_int8_t u_int32_t u_int64_t
#define MTYPE u_int64_t
//
#define MAXAR TBYTES/ sizeof (MTYPE)
//#define DOIT 100000000 * (sizeof(u_int64_t) / sizeof(MTYPE))
#define DOIT 100000000
MTYPE *array;
pthread_mutex_t mux = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int workers = 2;
template<typename RTYPE, int TLEN>
RTYPE scramble(u_int16_t *seed, RTYPE a){
int d;
//
for ( d = 0; d < 8; ++d){
u_int16_t mask;
//
mask = (*seed & 0x1) ^ ( (*seed >> 1) &0x1);
mask = mask & 0x1;
*seed = (mask << 15) | (*seed >> 1);
//
a = a ^ ((RTYPE)mask << d);
}
//
return a;
};
//
//#define SHARED 1
void *
runner( void * args){
unsigned int which = *(unsigned int *)args; delete (unsigned int *)args;
unsigned int index = which;
u_int16_t seed = 0x4a80;
MTYPE array1[MAXAR];
//
for ( unsigned int i = 0; i < DOIT; ++i){
MTYPE data;
if ( index >= MAXAR)
index = which;
//
#if defined(SHARED)
data = array[index];
array[index] = scramble<MTYPE, 8* sizeof (MTYPE)>(&seed, data);
#else
data = array1[index];
array1[index] = scramble<MTYPE, 8* sizeof (MTYPE)>(&seed, data);
#endif
//
index += TOTAL;
}
//
pthread_mutex_lock(&mux);
-- workers;
pthread_mutex_unlock(&mux);
pthread_cond_broadcast(&cond);
//
return 0;
};
int
main() {
//
array = (MTYPE*)calloc(MAXAR, sizeof (MTYPE));
//
unsigned int *which;
pthread_t t1;
pthread_attr_t attr;
//
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
//
which = new unsigned int (0);
pthread_create(&t1, &attr, runner, ( void *)which);
//
which = new unsigned int (1);
pthread_create(&t1, &attr, runner, ( void *)which);
//
pthread_mutex_lock(&mux);
while (!pthread_cond_wait(&cond, &mux)){
if ( !workers)
break ;
};
pthread_mutex_unlock(&mux);
//
return 0;
};
//
* This source code was highlighted with Source Code Highlighter .
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define TBYTES 1024
//
#define TOTAL 2
// u_int8_t u_int32_t u_int64_t
#define MTYPE u_int64_t
//
#define MAXAR TBYTES/ sizeof (MTYPE)
//#define DOIT 100000000 * (sizeof(u_int64_t) / sizeof(MTYPE))
#define DOIT 100000000
MTYPE *array;
pthread_mutex_t mux = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int workers = 2;
template<typename RTYPE, int TLEN>
RTYPE scramble(u_int16_t *seed, RTYPE a){
int d;
//
for ( d = 0; d < 8; ++d){
u_int16_t mask;
//
mask = (*seed & 0x1) ^ ( (*seed >> 1) &0x1);
mask = mask & 0x1;
*seed = (mask << 15) | (*seed >> 1);
//
a = a ^ ((RTYPE)mask << d);
}
//
return a;
};
//
//#define SHARED 1
void *
runner( void * args){
unsigned int which = *(unsigned int *)args; delete (unsigned int *)args;
unsigned int index = which;
u_int16_t seed = 0x4a80;
MTYPE array1[MAXAR];
//
for ( unsigned int i = 0; i < DOIT; ++i){
MTYPE data;
if ( index >= MAXAR)
index = which;
//
#if defined(SHARED)
data = array[index];
array[index] = scramble<MTYPE, 8* sizeof (MTYPE)>(&seed, data);
#else
data = array1[index];
array1[index] = scramble<MTYPE, 8* sizeof (MTYPE)>(&seed, data);
#endif
//
index += TOTAL;
}
//
pthread_mutex_lock(&mux);
-- workers;
pthread_mutex_unlock(&mux);
pthread_cond_broadcast(&cond);
//
return 0;
};
int
main() {
//
array = (MTYPE*)calloc(MAXAR, sizeof (MTYPE));
//
unsigned int *which;
pthread_t t1;
pthread_attr_t attr;
//
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
//
which = new unsigned int (0);
pthread_create(&t1, &attr, runner, ( void *)which);
//
which = new unsigned int (1);
pthread_create(&t1, &attr, runner, ( void *)which);
//
pthread_mutex_lock(&mux);
while (!pthread_cond_wait(&cond, &mux)){
if ( !workers)
break ;
};
pthread_mutex_unlock(&mux);
//
return 0;
};
//
* This source code was highlighted with Source Code Highlighter .