
World Wide GRID: the future is near

The 21st century may well become the era of mass adoption of grid technologies. Humanity is on the verge of another computer revolution, in which the familiar WWW (World Wide Web) will be transformed into the WWG (World Wide GRID), a global grid network.

A magical grid environment that can virtualize processors, memory and communications promises to turn all the computer resources of the world into a kind of giant multiprocessor with almost unlimited computing capabilities.

Informatization today is entering the fourth stage of its development. The first was associated with the emergence of large computers (mainframes), the second with personal computers, the third with the advent of the Internet, which united users into a single information space. The first decade of the 21st century, according to many experts, is marked by the beginning of the transition to new grid technologies.
However, the specific forms and mechanisms of this “great transition to the WWG” are not yet clearly defined. Even among proponents of grid technologies there is no consensus on whether the WWG will be built on top of existing Internet capacity or from scratch, as a universal system for emulating personal computers whose users would no longer need a full-featured computer or their own software. Many issues also remain unresolved in protocol standardization, the integration of heterogeneous computing resources, and the security of data storage and transmission.

A supercomputer from the wall socket
Grid computing first took shape as a way to pool computational resources for solving various “resource-hungry” scientific problems. The idea of using computing power more efficiently by connecting multiple computers into a single structure emerged in the scientific community quite long ago, back in the era of mainframes (large computers). Already in the 1980s, scientists (primarily nuclear physicists) were trying to link workstations together and use their idle processors to cut the time needed to solve complex mathematical problems.

In 1994, a project was launched to create the global computer network GLORIAD (Global Ring Network for Advanced Applications Development) - a fiber-optic ring in the Northern Hemisphere combining the computing resources of research organizations in the USA, Canada, Europe, Russia, China and South Korea (again, mainly physics centers). Russia joined the project in 1996, and today the country is represented in it by the Russian Research Center Kurchatov Institute and the Russian Research Institute for the Development of Public Networks.

Nevertheless, the formal authors of the grid concept are Ian Foster of Argonne National Laboratory and the University of Chicago and Carl Kesselman of the Information Sciences Institute at the University of Southern California. It was Foster and Kesselman who, in 1998, first proposed the term grid computing to denote a universal software and hardware infrastructure uniting computers and supercomputers into a geographically distributed information and computing system. According to their now classic definition, “the grid is a coherent, open and standardized environment that provides flexible, secure, coordinated sharing of resources within a virtual organization.”

The term grid computing was coined by analogy with the term power grid.

Users of computing power will be able to plug directly into a remote computer network (just as they plug into the electricity grid through household sockets), without worrying about where exactly the required computing resources come from, which transmission lines deliver them, and so on.

The main resource elements of grid systems are supercomputers and supercomputer centers, and the most important infrastructure component is the high-speed data transmission network.

Supercomputers that are not combined into a geographically distributed system have at least three significant drawbacks. First, they are very expensive machines that quickly become obsolete: supercomputers from the top hundred of the Top 500 list, as a rule, slide to the very end of the list within two or three years or drop out of it altogether. Second, the computing power of a supercomputer is essentially static: it hardly lends itself to serious upgrades, which often makes it unusable for problems of a new level of complexity. And the third “big minus” is low efficiency caused by uneven processor load.

Ideally, these shortcomings can be eliminated by combining supercomputers into a grid network. However, for efficient operation of grid systems, it is first necessary to come to a consensus in the field of standardization (definition of service standards, interfaces, databases, etc.).

Distribution of computing environments
The authors of the grid computing idea, Foster and Kesselman, also stood at the origins of the first de facto standard for building grid systems - the open-source Globus Toolkit.

To develop the Globus Toolkit further, a dedicated organization, the Global Grid Forum (GGF), was created in 1999; alongside academic institutions it included many manufacturers of computer systems and software.

In 2002, GGF and IBM introduced, as part of Globus Toolkit 3.0, the Open Grid Services Architecture (OGSA), which brought the concepts and standards of web services into the grid. In this architecture, a grid service is defined as a special type of web service, making it possible to work with grid resources through standard Internet protocols.
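
Conceptually, working with an OGSA-style grid service looks like calling any other web service over standard Internet protocols. The following Python sketch is purely illustrative: the endpoint URL, the JSON job description and the response format are hypothetical and are not part of the Globus Toolkit or the OGSA specification.

```python
# Illustrative sketch only: the endpoint URL and payload schema are hypothetical,
# not an actual Globus Toolkit / OGSA interface.
import json
import urllib.request

GRID_SERVICE_URL = "https://grid.example.org/services/job-factory"  # hypothetical endpoint


def submit_job(executable: str, arguments: list[str], cpu_count: int) -> str:
    """Submit a compute job to a (hypothetical) grid service and return its job id."""
    job_description = {
        "executable": executable,
        "arguments": arguments,
        "requirements": {"cpus": cpu_count},
    }
    request = urllib.request.Request(
        GRID_SERVICE_URL,
        data=json.dumps(job_description).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["job_id"]


if __name__ == "__main__":
    job_id = submit_job("/bin/simulate", ["--steps", "1000000"], cpu_count=64)
    print("Submitted grid job:", job_id)
```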

Today, leading players in the computer market are actively adapting the Globus Toolkit to their main products for business computing - in particular, the same IBM and the two ERP giants, Oracle and SAP.

In addition to Globus, today's most popular project, other grid frameworks are being developed in parallel - for example, Legion, Condor and UNICORE. In 2004, GGF also got a serious competitor: the Enterprise Grid Alliance (EGA) consortium, which includes such giants as Fujitsu Siemens Computers, Hewlett-Packard, Intel, NEC, Oracle, Sun Microsystems and EMC.

Moreover, while GGF's main task was to formulate grid requirements for manufacturers of IT solutions, EGA was from the outset geared toward the needs of corporate users.

At the end of June 2006, GGF and EGA, which had managed to fray each other's nerves, officially announced their merger and the creation of the Open Grid Forum (OGF), an open forum on distributed computing. As the new president and executive director of OGF, Mark Linesch, who previously chaired the GGF board of directors, noted: “This step will consolidate the community of grid supporters and allow us to cooperate more effectively with the main market participants in different countries. We will be able to speak with one voice in all matters related to the development and deployment of grids and distributed computing environments.”

Of course, this happy merger does not mean that universal standardization of grid technologies is now a settled matter. One of the most serious obstacles to the triumphant spread of grid networks has been and remains the traditional software licensing model, under which customers pay according to the number of processors on which an application runs. The grid effectively destroys this model, since within a grid network no particular processor has a stable association with a specific application.

So far, no software vendor has openly announced its intention to change its pricing model to reflect the new specifics of grid computing.

Another weak point of the global grid design is the almost complete absence of standards for commercial grid software. One of the defining characteristics of early grid applications (used in scientific computing) is that each task is independent of the results of the others. In large grid applications for complex mathematical calculations, for example, the work is divided into independent parts whose results can be assembled in any order. Many enterprise applications, by contrast, are highly interdependent: one calculation or process cannot proceed until another has finished.
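
The difference is easy to see in code. Below is a minimal Python sketch on a toy numerical task (the task itself and all names are illustrative): the independent variant splits the work into chunks that can be computed in any order and on any node, while the dependent variant is a recurrence in which one step cannot start until the previous one has finished.

```python
# Minimal sketch contrasting independent ("grid-friendly") work with
# strictly sequential, dependent work. The numerical task is a toy example.
from concurrent.futures import ProcessPoolExecutor


def partial_sum(bounds: tuple[int, int]) -> float:
    """Independent chunk: sum of 1/i^2 over a half-open range of integers."""
    start, stop = bounds
    return sum(1.0 / (i * i) for i in range(start, stop))


def independent_total(n: int, chunks: int = 8) -> float:
    """The chunks do not depend on one another, so they can run on any
    processors, in any order, and be combined at the end."""
    step = n // chunks
    ranges = [(1 + k * step, 1 + (k + 1) * step) for k in range(chunks)]
    with ProcessPoolExecutor() as pool:
        return sum(pool.map(partial_sum, ranges))


def dependent_sequence(x0: float, steps: int) -> float:
    """Each step needs the result of the previous one (a recurrence),
    so the work cannot be split across independent processors."""
    x = x0
    for _ in range(steps):
        x = 3.7 * x * (1.0 - x)  # depends on the previous value
    return x


if __name__ == "__main__":
    print(independent_total(1_000_000))        # parallel, order does not matter
    print(dependent_sequence(0.5, 1_000_000))  # inherently sequential
```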

According to Ian Foster, “approaches based on open standards (like the Globus Toolkit) will eventually make the grid the dominant direction in the development of corporate information infrastructures,” but experts still refrain from precise predictions about when this turning point in corporate IT will come.

Search for extraterrestrial intelligence
Grid technologies are advancing much more successfully in the scientific and educational sphere, which is largely due to the active financial support of various grid projects by government agencies.

Grid networks are used today in a wide variety of fundamental research and engineering work: the evolution of protoplanetary matter, the planets and the Earth; genomics and proteomics; weather forecasting and the prediction of natural disasters (tsunamis, earthquakes, volcanic eruptions); modeling and analysis of experiments in nuclear physics; nuclear weapons; nanotechnology; the design of aerospace vehicles and cars, and so on. It will probably soon be easier to name the scientific discipline in which supercomputers and distributed computing have not yet been applied.

Therefore, we will limit ourselves below to a list of the most serious grid projects implemented over the past few years or currently under way.

In 2001, the United States launched the TeraGrid project, funded by the National Science Foundation (NSF), whose main task was to create a distributed infrastructure for high-performance (teraflop) calculations.

In May 2004, the European Union created an analogue of the US TeraGrid - the DEISA consortium (Distributed European Infrastructure for Supercomputing Applications), partly funded under the 6th Framework Programme, which united the leading national supercomputer centers of the EU into a grid network.

At the end of March 2004, the three-year European DataGrid (EDG) project was completed, within which a test computing and data exchange infrastructure was built for the needs of the European scientific community.

On the basis of these developments, a new international project to build a high-performance grid network, Enabling Grids for E-sciencE (EGEE), was launched. It is led by CERN (the European Organization for Nuclear Research in Geneva) and funded by the European Union and the governments of the participating countries. The project currently includes 70 scientific institutions from 27 countries, united in 12 federations. Within this project, the world's largest grid, with a total computing capacity of 20,000 CPUs, is to be built.

CERN's leading role stems from the fact that in 2007 it plans to launch the world's largest particle accelerator, the Large Hadron Collider (LHC), which will produce a huge amount of data. The new computing infrastructure, created primarily for the LHC, will have to ensure efficient processing of this information, whose expected average annual volume is estimated at 10 petabytes (1 petabyte = ~10^15 bytes). The task of EGEE, however, is far from limited to nuclear physics; it is to realize the potential of the grid for many other scientific and technological fields. The project's immediate plans thus include the creation of a separate bio-informatics “grid block”.
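
To put that figure in perspective, a back-of-the-envelope calculation (taking the article's 10 petabytes per year and the definition of a petabyte as 10^15 bytes) gives the average data rate the grid infrastructure has to sustain:

```python
# Back-of-the-envelope: average data rate implied by ~10 PB of LHC data per year.
PETABYTE = 10 ** 15              # bytes, as defined above
SECONDS_PER_YEAR = 365 * 24 * 3600

annual_volume = 10 * PETABYTE    # ~10 PB per year (the article's estimate)
average_rate = annual_volume / SECONDS_PER_YEAR  # bytes per second

print(f"{average_rate / 10**6:.0f} MB/s, i.e. about "
      f"{average_rate * 8 / 10**9:.1f} Gbit/s sustained on average")
# -> roughly 317 MB/s, or about 2.5 Gbit/s averaged over the year
```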

The pan-European research and education network GEANT is also developing in close cooperation with the EGEE project. In the middle of last year, the intergovernmental organization DANTE announced the launch of the new-generation research and education network GEANT2, which covers 3 million users from 3.5 thousand academic institutions in 34 European countries (including Russia). The new network will qualitatively change the processing of data from radio astronomy complexes whose recording systems are located far apart from one another, and will also carry part of the data flows after the launch of the LHC.

Under the leadership of the University of Pennsylvania (USA), a national digital mammography archive has been built on grid technologies, with a total data volume of 5.6 petabytes of mammograms, giving physicians instant access to the records of millions of patients.

The SETI@home project, initiated by astronomers at the University of California, Berkeley, is also worth mentioning. Within this project a virtual grid network was created that regularly analyzes data from the Arecibo radio telescope in Puerto Rico in search of signals from extraterrestrial intelligence. Through the Internet, SETI@home has combined the computing power of more than 5 million personal computers and has already performed computational work equivalent to more than 600 thousand years of operation of a single PC (although so far the project coordinators have reported no aliens found).

China starts and can win
The United States today is the undisputed world leader in the practical construction of grid networks. In 2004, George W. Bush officially announced the launch of the Presidential Strategic GRID Program (Strategic Grid Computing Initiative), whose main goal is to “create a single national high-performance computing space” (National High Performance Computing Environment).

Four national grid networks overseen by key government agencies are already operating successfully in the United States: the National Science Foundation's computing grid (NSF Comp. Grid), the NASA Information Power Grid, the Department of Defense's global information grid (DOD GI Grid) and the Department of Energy's supercomputing initiative network (DOE ASCI Grid).

Private American companies are also contributing to the process of “universal gridding”. The Sun Grid project of Sun Microsystems, launched last year, is quite original: the company leases out the computer time of a network of data centers containing a total of about 10 thousand processors, at a rate of $1 per processor per hour. Sun Grid time is sold under contract through Archipelago Holdings, the Chicago-based electronic stock exchange, through which a buyer can also resell unused hours. In addition, Sun offers data storage at $1 per gigabyte per month. The service is aimed at organizations with occasional needs for significant computing power.
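
As a quick illustration of this pay-per-use model, here is a small cost estimate using the rates quoted above ($1 per processor-hour, $1 per gigabyte per month); the job parameters themselves are hypothetical:

```python
# Rough cost estimate under the Sun Grid rates quoted above.
# The job parameters (500 CPUs, 48 hours, 200 GB kept for 3 months) are hypothetical.
CPU_HOUR_PRICE = 1.0   # USD per processor-hour
STORAGE_PRICE = 1.0    # USD per gigabyte per month


def sun_grid_cost(cpus: int, hours: float, storage_gb: float, months: float) -> float:
    compute = cpus * hours * CPU_HOUR_PRICE
    storage = storage_gb * months * STORAGE_PRICE
    return compute + storage


if __name__ == "__main__":
    total = sun_grid_cost(cpus=500, hours=48, storage_gb=200, months=3)
    print(f"Estimated cost: ${total:,.0f}")  # 500*48 + 200*3 = $24,600
```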

Oracle's concept of grid computing assumes using the grid network as a universal data management system built around the Oracle 10g database. Its Automatic Storage Management (ASM) feature virtualizes sets of disks into a single virtual disk group, with Oracle taking on the functions of a file and volume manager. Oracle itself works with this disk group (the virtual disks), placing and managing its files on it: it divides the entire space of the virtual disk into equal 1 MB pieces and assembles database files, tablespaces, volumes and so on out of those pieces.
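
The allocation scheme described above can be modelled with a short sketch. This is not Oracle code - the class, the disk names and the round-robin placement below are a simplified, hypothetical model of the idea that a disk group is carved into 1 MB extents which are spread across the member disks:

```python
# Simplified model of ASM-style allocation: a disk group is divided into 1 MB
# extents, and each file's extents are spread round-robin across the member disks.
# An illustration of the idea only, not Oracle's actual implementation.

class DiskGroup:
    def __init__(self, disk_sizes_mb: dict[str, int]):
        # free space per disk, counted in 1 MB allocation units
        self.free = dict(disk_sizes_mb)
        self.files: dict[str, list[tuple[str, int]]] = {}

    def create_file(self, name: str, size_mb: int) -> None:
        """Allocate size_mb one-megabyte extents, striping them across the disks."""
        disks = sorted(self.free)
        extents = []
        for i in range(size_mb):
            disk = disks[i % len(disks)]   # round-robin placement
            if self.free[disk] == 0:
                raise RuntimeError(f"disk {disk} is full")
            self.free[disk] -= 1
            extents.append((disk, i))      # (disk, extent number within the file)
        self.files[name] = extents


group = DiskGroup({"disk1": 1024, "disk2": 1024, "disk3": 1024})  # three 1 GB disks
group.create_file("users01.dbf", size_mb=300)
print({d: sum(1 for disk, _ in group.files["users01.dbf"] if disk == d)
       for d in group.free})  # -> 100 one-megabyte extents on each disk
```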

A separate item in the long list of grid projects is the global grid system promoted by Google. The Google model is the transformation of computing into a consumer service delivered like electricity (which has much in common with the idea implemented by Sun Microsystems). In a sense, Google is returning to the mainframe architecture: in this scheme, all computing devices (PCs, mobile phones, TVs, etc.) become mere terminals connected to Google's server grid with its application services.

In other words, Google today is trying to position itself as a universal system for delivering applications to any device anywhere in the world, and thus to become a real alternative to the familiar personal computer. A strategically important competitive advantage of the Google project is the reduced cost of processing a bit of information. To this end, Google is actively building out its core transport network and preparing space to host huge server farms with direct connections to the world's leading telecommunications operators. (According to unconfirmed reports, in 2005, under heightened secrecy, Google installed some 4 thousand sea containers with server racks in various parts of the world's oceans.) This should allow the company to significantly reduce telecommunication costs and keep control over the delivery of most of its content and of global Internet traffic.

China has been developing grid technologies since 2000. Its national ChinaGrid project unites the computing resources of leading Chinese universities, and in 2006 it was joined by the China Educational Grid Project (CEGP).

In 2006, the joint EUChinaGRID project was launched to link the European and Chinese grid infrastructures and ensure their interoperability.

India, in turn, is building its own national grid network, GARUDA, which links research and academic centers in 17 cities across the country.


Creating a dedicated high-speed data network for grid systems is a key element in reducing the cost of processing a bit of information and the main advantage in the price competition between global projects. And it is precisely participation in building the communication infrastructure for global grid computing that is perhaps Russia's last chance to take part in the transformation of the global infocommunication space.

Based on materials from Expert magazine.

Source: https://habr.com/ru/post/775/

