
Today we will talk about something high up... in the clouds. Specifically, about a new, or rather little-known, startup that has built its own platform for cloud computing. That is not quite accurate, though: it is a platform provided as a service that gives you a flexible, scalable execution environment for web projects. In some ways it resembles the familiar, almost household-name Amazon EC2, but it differs from it significantly. How and in what ways, we will see below.
The startup 10gen offers developers its platform-as-a-service technology: a flexible, almost unlimitedly scalable platform (how many times have we repeated that word) for deploying applications that must work under a variable load that nonetheless tends to grow steadily. Such applications need a secure and simple platform, so that adding a new server or upgrading the database does not take the entire system down for hours or days.
10gen offers its own technology stack, and in this it resembles the recently launched but already widely known Google App Engine. Unlike the hosting options more traditional and familiar to users (VDS or dedicated servers), which can also scale and claim the title of a cloud (I mean Aptana Cloud, which we already wrote about), 10gen relied on its own technologies and developed the full stack needed for web projects on top of traditional Linux (as well as the less traditional Mac OS X and, soon, Windows).
In more detail, the “cloud” from 10gen consists of the following levels:
- Hardware level: a cluster of servers running Linux (or another operating system that supports the stack). As far as I understand, these are ordinary servers without virtualization; a higher level takes care of that. No specialized solutions such as network-attached storage (NAS) are required.
- Mongo: a special object-oriented database (more precisely, a structured data store similar to Amazon SimpleDB), developed in-house by the company. The project is interesting in its own right, and we look at it in more detail below.
- File system: there are several, the main one being the scalable general-purpose GridFS, which also serves as the data storage for the Mongo database. Project files (script source code, CSS, HTML and so on) are stored in the SCF (Source Code Filesystem) add-on, which runs on top of the Git version control system. In projects like this it is very convenient to combine the version control system with the project directory served by the web server; Aptana Cloud uses the same approach, though with the more familiar SVN.
- Application server: the main tier, which provides a web server and a runtime for user applications. It runs on top of the JVM and scales automatically to provide the necessary resources. Currently supported are JavaScript (server-side, a very popular choice for clouds right now; Aptana, in particular, has the excellent Jaxer application server) and Ruby (Ruby on Rails support is still being finalized). According to 10gen, the system itself is completely language-independent: it converts code to intermediate Java source, which is then compiled to JVM bytecode with javac. That makes the whole range of Java technologies for distributed execution and scaling available (and probably gives access to all components of the Java platform). We should therefore expect other languages in the future: Java itself, for example, as well as PHP and Python, which are more familiar to web developers.
- Management layer: above all of this sits a control system that automatically distributes the load and scales the system when necessary. Projects are deployed, as already mentioned, with the distributed version control system Git.
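To make the idea of the top layer a bit more tangible, here is a minimal sketch of "distribute the load and scale when necessary". It is not 10gen's actual control system: the `Node` and `Cluster` classes, the `max_per_node` capacity and the least-loaded policy are all my own assumptions, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical application-server node."""
    name: str
    active_requests: int = 0


@dataclass
class Cluster:
    """Toy control layer: least-loaded dispatch plus scale-out (an assumption)."""
    max_per_node: int = 2                      # hypothetical capacity of one node
    nodes: list = field(default_factory=list)

    def add_node(self) -> Node:
        node = Node(name=f"node-{len(self.nodes) + 1}")
        self.nodes.append(node)
        return node

    def dispatch(self) -> Node:
        # Scale out when there are no nodes yet or every node is at capacity.
        if not self.nodes or all(
            n.active_requests >= self.max_per_node for n in self.nodes
        ):
            self.add_node()
        # Route the request to the node with the fewest active requests.
        node = min(self.nodes, key=lambda n: n.active_requests)
        node.active_requests += 1
        return node


cluster = Cluster()
for _ in range(5):
    cluster.dispatch()

for n in cluster.nodes:
    print(n.name, n.active_requests)
```

Five requests with a capacity of two per node end up spread over three nodes. A real system would also release capacity when requests finish and remove idle nodes, which this sketch leaves out.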
Let's dwell a little on the database; I am sure it will interest anyone with experience developing ordinary web applications. On the 10gen platform, the developers urge us to abandon the usual, familiar database system. In the traditional LAMP stack we use MySQL at the low level and, in most cases, some ORM layer on top of it. The latest trend, by contrast, is object-oriented databases that have a much simpler API (not SQL) and rely on the file system, which does not hold them back at all and gives them strong scalability and load distribution. The pioneer and chief advocate of this approach was probably Google with its BigTable; later the open project Apache Hadoop appeared, which is used in some very serious projects, for example as the data storage system in the search engine Nutch. So, given the object nature of most of the data an application uses, and the ability to scale and virtualize such a file system almost without limit, we can abandon traditional relational databases completely. Of course, this approach requires reworking an existing application almost entirely, or designing a new one around the new platform and even, I am not afraid to say, a new philosophy, but the benefits of such a decision are quite obvious.
Mongo's object-oriented database lets you store an unlimited number of objects, each of which can be either text (a string) or a packet of binary data, for example a multimedia file. This gives you a single access mechanism and lets you store all types of content together. Mongo currently provides basic functionality: queries for retrieving and modifying data (the counterparts of SELECT, INSERT, UPDATE and DELETE) as well as indexes. Full-text search, a replication system and locking mechanisms are under development.
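To show what "basic functionality plus indexes" can look like without SQL, here is a toy in-memory object store. It is emphatically not Mongo's API: the `ToyObjectStore` class and all of its method names are hypothetical, invented only to illustrate insert, find, update and delete over objects that hold either text or binary data, with one indexed field.

```python
from collections import defaultdict
from itertools import count


class ToyObjectStore:
    """Hypothetical in-memory object store with a single indexed field."""

    def __init__(self, indexed_field):
        self._objects = {}                 # object id -> object (a dict)
        self._ids = count(1)               # simple id generator
        self._indexed_field = indexed_field
        self._index = defaultdict(set)     # field value -> set of object ids

    def insert(self, obj):
        oid = next(self._ids)
        self._objects[oid] = dict(obj)
        self._index[obj.get(self._indexed_field)].add(oid)
        return oid

    def get(self, oid):
        return self._objects[oid]

    def find(self, **criteria):
        # Use the index when the query mentions the indexed field,
        # otherwise fall back to scanning every object.
        if self._indexed_field in criteria:
            candidates = self._index.get(criteria[self._indexed_field], set())
        else:
            candidates = self._objects.keys()
        return [
            oid for oid in sorted(candidates)
            if all(self._objects[oid].get(k) == v for k, v in criteria.items())
        ]

    def update(self, oid, **changes):
        obj = self._objects[oid]
        if self._indexed_field in changes:
            # Keep the index consistent when the indexed value changes.
            self._index[obj.get(self._indexed_field)].discard(oid)
            self._index[changes[self._indexed_field]].add(oid)
        obj.update(changes)

    def delete(self, oid):
        obj = self._objects.pop(oid)
        self._index[obj.get(self._indexed_field)].discard(oid)


store = ToyObjectStore(indexed_field="kind")
page = store.insert({"kind": "text", "body": "Hello, cloud"})
clip = store.insert({"kind": "binary", "body": b"\x00\x01\x02"})
store.update(page, body="Hello again")
store.delete(clip)
print(store.find(kind="text"), store.get(page)["body"])
```

Text and binary objects go through the same four operations, which is the "single access mechanism" described above.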
For now the database is accessed through native APIs for the supported languages, JavaScript in particular. Once Ruby is added, support for popular ORM systems is promised (I assume this means ActiveRecord), and developers will be able to write plug-ins for other systems. That will make it possible to port an application without significant redesign, provided it uses an ORM to abstract data access.
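Why an abstraction layer makes porting cheap can be sketched as follows. Neither ActiveRecord nor any 10gen API is used here; `ArticleRepository` and both backends are hypothetical stand-ins. The point is only that the application function `publish` never changes when the storage behind it does.

```python
from abc import ABC, abstractmethod


class ArticleRepository(ABC):
    """Hypothetical data-access interface the application is written against."""

    @abstractmethod
    def save(self, slug, title): ...

    @abstractmethod
    def title_of(self, slug): ...


class SqlLikeRepository(ArticleRepository):
    """Stand-in for an ORM over a relational table: rows are (slug, title)."""

    def __init__(self):
        self._rows = []

    def save(self, slug, title):
        # Replace any existing row with the same slug, then append the new one.
        self._rows = [r for r in self._rows if r[0] != slug] + [(slug, title)]

    def title_of(self, slug):
        return next((t for s, t in self._rows if s == slug), None)


class ObjectStoreRepository(ArticleRepository):
    """Stand-in for an object database: whole objects keyed by slug."""

    def __init__(self):
        self._objects = {}

    def save(self, slug, title):
        self._objects[slug] = {"title": title}

    def title_of(self, slug):
        obj = self._objects.get(slug)
        return obj["title"] if obj else None


def publish(repo, slug, title):
    """Application code: depends only on the interface, not on the backend."""
    repo.save(slug, title)
    return repo.title_of(slug)


results = [
    publish(repo, "10gen", "Cloud platform")
    for repo in (SqlLikeRepository(), ObjectStoreRepository())
]
print(results)
```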
Of course, such a database is not suitable everywhere or in every case, and its creators do not claim otherwise. It is a data store rather than an analytical tool, so business intelligence and data analysis tools can hardly be built on an object database; it is simply not intended for that. I therefore see the ideal system as one that combines both approaches, even if that means running two database servers: a regular one, for example MySQL Cluster, and an object store for content and other information that does not require special processing. Such a system lets you use each tool for exactly the functions it suits best. So far the 10gen stack has no relational database, but because the infrastructure is built on Java, adding a database and a corresponding interface, for example Hibernate, should not be too complicated.
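The combined setup I have in mind can be sketched in a few lines. This is only an illustration under loud assumptions: SQLite from Python's standard library stands in for the relational server (the text names MySQL Cluster), a plain dictionary stands in for the object store, and the `views` table and helper functions are invented for the example.

```python
import sqlite3

# Object-store stand-in: content is kept whole and fetched by key.
content_store = {}

# Relational stand-in: data that needs aggregation and analysis.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE views (content_key TEXT, country TEXT, hits INTEGER)")


def publish(key, payload):
    """Store a piece of content (text or binary) in the object store."""
    content_store[key] = payload


def record_views(key, country, hits):
    """Store analytical data in the relational database."""
    db.execute("INSERT INTO views VALUES (?, ?, ?)", (key, country, hits))


publish("intro.mp4", b"\x00\x01")
publish("about.html", "<h1>About</h1>")
record_views("intro.mp4", "RU", 120)
record_views("intro.mp4", "US", 80)
record_views("about.html", "RU", 15)

# The analytical question goes to SQL, which is built for aggregation.
hits_per_key = dict(
    db.execute("SELECT content_key, SUM(hits) FROM views GROUP BY content_key")
)
print(hits_per_key)
```

Content is only ever read back by key, while the per-item totals come from a `GROUP BY` query, so each store does the job it is suited to.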
By the way, have I mentioned it yet? 10gen provides the source code and binary distributions of its system to everyone. Yes, as with the Eucalyptus system (which we have already written about), you have the complete source code at your disposal and can build your own cloud computing system for use in your projects. All of this is distributed under the GNU AGPL 3.0 license; only the JavaScript libraries and the main framework modules are available under the Apache License 2.0. So far only 64-bit Linux and Mac OS X are supported, but a Win32 version of the platform is promised soon. If you need the 32-bit Linux version now, you will have to use a nightly build or the source code from the repository. All you need is an OS (Fedora 6 or 8, Mac OS X 10.5, Ubuntu 7.10 or 8.04) with Java SE 5 or 6 installed. By the way, it would be interesting to try deploying all this in a VirtualBox virtual machine on Windows.
In conclusion, let me summarize. Virtualization and cloud computing are very much in demand on the market, as we can see from the many startups trying to create a simple, scalable environment for developers. Some rely on the LAMP scheme, tested and refined over the years; others have chosen full virtualization and let customers deploy OS images themselves; still others build their own stack on top of already virtualized hardware (for example, using Xen or other technologies, or through the Java platform), not hesitating to offer developers a completely new concept, implementation language, and even web server and database. It seems to me that for startups it’s the most ...
P.S. The original material is posted on my blog, Alpha-Beta-Release Blog.