The firewood keeps getting sawn, the saws keep getting better, the boards keep getting longer and longer, but the speed of our programs is nowhere near the size of those boards...
At eighteen I set out to write a big, ambitious compiler and filled a whole notebook with ideas for it. It died in the endless optimization of its own code... =)
I have decided to share some of my ideas with the public, and if anything here interests you, please get in touch so we can figure out where to go next. Simply put: I am looking for like-minded people to develop a self-optimizing compiler based on data mining and genetic algorithms, plus many fun goodies for the standard library.
That is how the short preface to my first post on Habr begins. This note does not aim to cover the topic in full; it simply explains my position on the existing systems for compiling and processing code that I use in my own work.
Let's start…
We all know that threads are a good thing: they let us split our programs apart and use multi-core architectures "to the full 100%".
People often think: "Hmm, maybe I'd better declare several threads to handle this loop, and maybe one day it will run on a 64-core machine." And they don't realize how paradoxical that sounds.
First: with all respect to our grandfather coders, hearing this would give them a second heart attack. Why declare a static number of structures to handle a dynamic number of other structures?
Second: is it even profitable? Imagine 16 threads on a 4-core processor, a standard situation, isn't it? It turns out we split everything into 16 small pieces and pile the lion's share of repeated thread-API overhead on top of them. Wouldn't it be easier to declare only 4 threads and win back performance by cutting the time spent on creating and destroying them?
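To make the overhead point concrete, here is a minimal C++ sketch, not from the original post: it times the same dummy loop split across a thread count matched to the hardware versus 16 oversubscribed threads. The workload size and the helper names are made up; only the creation/join-cost comparison matters.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

// Time `work_items` dummy iterations split across `n_threads` threads,
// including the cost of creating and joining the threads themselves.
double run_with_threads(unsigned n_threads, std::size_t work_items) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    std::vector<double> partial(n_threads, 0.0);
    std::size_t chunk = work_items / n_threads;
    for (unsigned t = 0; t < n_threads; ++t) {
        pool.emplace_back([t, chunk, &partial] {
            double acc = 0.0;
            for (std::size_t i = 0; i < chunk; ++i)
                acc += static_cast<double>(i) * 0.5;   // stand-in for real work
            partial[t] = acc;
        });
    }
    for (auto& th : pool) th.join();
    volatile double sink = std::accumulate(partial.begin(), partial.end(), 0.0);
    (void)sink;   // keep the result from being optimized away
    return std::chrono::duration<double, std::milli>(
               std::chrono::steady_clock::now() - start).count();
}

int main() {
    const std::size_t work = 100000000;           // arbitrary workload size
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 4;                    // fallback if unknown
    std::cout << cores << " threads: " << run_with_threads(cores, work) << " ms\n";
    std::cout << "16 threads: " << run_with_threads(16, work) << " ms\n";
}
```

On a CPU-bound loop like this, the extra threads buy nothing; they only add scheduling and setup/teardown cost.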
Third: who said the work will be divided evenly among all the threads? It happens all too easily that 3 threads end up waiting for the result of one (one way around this is sketched below).
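Here is a small illustrative sketch of that idea, not from the post: instead of handing each thread a fixed slice up front, the threads pull task indices from a shared atomic counter, so no thread sits idle while another grinds through a heavy slice. The toy `process` function and its cost pattern are invented for the example.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Toy task whose cost varies a lot with the index, to mimic uneven work.
static void process(std::size_t i) {
    volatile double acc = 0.0;
    for (std::size_t k = 0; k < (i % 7) * 100000; ++k) acc = acc + k;
}

// Dynamic scheduling: every thread grabs the next task index from a
// shared atomic counter instead of being given a static 1/N chunk.
void run_balanced(std::size_t n_tasks, unsigned n_threads) {
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) {
        pool.emplace_back([&] {
            for (std::size_t i = next.fetch_add(1); i < n_tasks;
                 i = next.fetch_add(1))
                process(i);
        });
    }
    for (auto& th : pool) th.join();
}

int main() {
    run_balanced(1000, std::max(1u, std::thread::hardware_concurrency()));
}
```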
Fourth: memory corruption, memory poisoning, deadlocks and so on... Every user of OpenMP and
pthreads has run into problems like these, even though they were solved long ago in Erlang, where processes share no state and only exchange messages.
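For the curious, here is a rough C++ rendering of that message-passing style, not from the post and obviously no substitute for Erlang itself: the two threads share nothing but a small mailbox, so the worker's state cannot be corrupted from outside and there is no lock ordering to get wrong. The `Mailbox` type is my own illustrative name.

```cpp
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

// A tiny message queue: the only thing the two threads ever share.
struct Mailbox {
    std::mutex m;
    std::condition_variable cv;
    std::queue<int> q;
    bool closed = false;

    void send(int msg) {
        { std::lock_guard<std::mutex> lk(m); q.push(msg); }
        cv.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m); closed = true; }
        cv.notify_one();
    }
    std::optional<int> receive() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !q.empty() || closed; });
        if (q.empty()) return std::nullopt;   // closed and drained
        int msg = q.front(); q.pop();
        return msg;
    }
};

int main() {
    Mailbox box;
    // The worker owns its state (`total`); no other thread can touch it.
    std::thread worker([&] {
        long total = 0;
        while (auto msg = box.receive()) total += *msg;
        std::cout << "sum = " << total << "\n";
    });
    for (int i = 1; i <= 10; ++i) box.send(i);
    box.close();
    worker.join();
}
```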
Fifth: CUDA, OpenCL, DirectCompute... everyone suddenly remembered that the video card is a processor too. Not much time has passed since then. And what are these technologies for? Higher speed, and so on.
Yet a profile dump of good old Photoshop CS4 shows that 40-60% of the time can be spent on "switching CUDA core modes", as the Nvidia developers call it. People forget that the shader pipeline has limited precision and a limited instruction set, so part of the code sometimes has to be pushed back to the CPU for execution.
Sixth: how many graphics extensions do we know? Plenty. And how many of them can be used in CUDA?
Sure, maybe you are an advanced mega-coder and not a mere student like me, and juggling threads (grandfather-style) has been routine for you for the last 3 years. Fine, let's say you really know your stuff... you do... But is the game worth the candle? How much harder does debugging get? How complicated do thread synchronization and asynchrony become? Or do you just not worry about it? It works, the customer's word is law, and so on.
So I started untangling this mess. Not just for fun and not without purpose... it needs to be automated and optimized.
To be continued. Maybe this isn't much, but it is to the point... In the second post I will tell you what exactly I want to do.
I hope for understanding and gentle criticism "on topic."
Thanks for your attention.