Today we are open-sourcing Qizmt, an internal distributed computing framework created by the Data Mining team at MySpace. Qizmt can be used for many kinds of operations that require processing large amounts of data, such as filtering in recommendation systems and analytics.
Some sources have already reported on this and wrote that it is a framework for a recommendation system. This is not true. It is a complete implementation of MapReduce, written for Windows.
It is not often that .NET fans come across open source projects of this caliber. Although the system is labeled Alpha, it claims quite a lot of functionality (which is not surprising, since it appears to be what MySpace itself runs on):
Rapid development of mapreduce jobs in C#
Easy installer
Integrated IDE/debugger (including step-through debugging of jobs on a cluster)
From any machine in the cluster:
Cluster Assembly Cache (CAC) - a cache of .NET assemblies for mapreduce jobs
3 types of jobs:
- Mapreduce - set logic over large amounts of data
- Remote - for tasks that do not fit the mapreduce model
- Local - orchestration of Mapreduce and Remote jobs
3 ways to exchange data in mapreduce:
- Sorted - key/value pairs are sorted evenly across the cluster
- Grouped - not sorted, but identical keys go to the same reducer
- Hash-sorted - a very fast way to sort random data
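To make the difference between the Sorted and Grouped exchange modes more concrete, here is a rough sketch in plain Python. It is not Qizmt code and does not use the Qizmt API: the function names `grouped_exchange`, `sorted_exchange` and `reduce_counts` are hypothetical, and the routing rules (a CRC32 hash for Grouped, contiguous key ranges for Sorted) are my assumptions about how such modes are typically built. The hash-sorted mode is left out, since nothing above says how it works.

```python
import zlib
from collections import defaultdict


def grouped_exchange(pairs, n_reducers):
    """Grouped-style routing: equal keys meet on one reducer, with no ordering."""
    buckets = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        # crc32 instead of hash() so that routing is stable between runs.
        home = zlib.crc32(key.encode("utf-8")) % n_reducers
        buckets[home].append((key, value))
    return buckets


def sorted_exchange(pairs, n_reducers):
    """Sorted-style routing: each reducer owns a contiguous range of keys,
    so reading the reducers in order yields globally sorted output."""
    keys = sorted({key for key, _ in pairs})
    # Split the distinct keys (not the data volume) as evenly as possible.
    per_reducer = max(1, -(-len(keys) // n_reducers))  # ceiling division
    home = {key: i // per_reducer for i, key in enumerate(keys)}
    buckets = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        buckets[home[key]].append((key, value))
    return [sorted(bucket, key=lambda kv: kv[0]) for bucket in buckets]


def reduce_counts(bucket):
    """A toy reducer: sum the values seen for each key."""
    totals = defaultdict(int)
    for key, value in bucket:
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    pairs = [("pear", 1), ("apple", 1), ("fig", 1), ("apple", 1), ("kiwi", 1)]
    for mode in (grouped_exchange, sorted_exchange):
        print(mode.__name__, [reduce_counts(b) for b in mode(pairs, 2)])
```

In both modes every occurrence of a key ends up on a single reducer, so the word counts come out the same. The sorted variant pays for an extra pass over the keys and in return gives output that is ordered across reducers.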
All this looks quite impressive, although I think it no longer matters much what language or platform such frameworks are written in. They are accessed through platform-independent means anyway, such as lightweight services à la REST / REST2. Even Bing is said to use Hadoop. In any case, it is nice that our colleagues at MySpace shared the code.