Greetings, cloud computing enthusiasts.
I would like to present a comparison of Windows Azure Blob Storage and Google Cloud Storage (the author also does not forget to mention Amazon S3).
I thought it would be nice to write an article comparing Google App Engine and Windows Azure storage. In this article, we will compare Windows Azure Blob Storage and Google Cloud Storage.
Earlier parts of this series:
- Part one: Comparing Windows Azure Table Storage and Amazon DynamoDB
- Part two: Comparing Windows Azure Blob Storage and Amazon Simple Storage Service (S3), Part I
- Part three: Comparing Windows Azure Blob Storage and Amazon Simple Storage Service (S3), Part II, Summary
Abbreviations: Windows Azure Blob Storage is WABS, Google Cloud Storage is GCS, and Amazon S3 is AS3.
Conceptually, WABS and GCS provide similar functionality — to put it simply, both systems are cloud file systems that allow storing large amounts of unstructured data (usually in the form of files).
Both systems provide a REST API for working with files and folders, plus libraries for high-level languages, which are usually wrappers around the REST API. Each API release has its own version: WABS uses a date value, GCS uses a number. At the time of writing, the WABS version was 2011-08-18 and the GCS version was 2.0.
Functionality the two systems share:
- Both systems are cloud-based file systems with a two-tier hierarchy.
- Both systems let you store large amounts of data for a long time at low cost.
- Both systems protect content from unauthorized access.
- Both systems provide their own access control mechanisms to protect data. In GCS these are ACLs and Query String Authentication; in WABS, ACLs and Shared Access Signatures.
- Both systems let you store an arbitrarily large number of versions of the original object, but the versioning mechanism differs between the two.
Concepts
Before we talk more about these two services, I find it important to clarify some concepts. If you are familiar with the basic concepts of WABS and GCS, you can skip this section.
Blob containers and buckets: If these services are file systems in the cloud, think of a WABS blob container and a GCS bucket as a folder or directory. A WABS storage account or a GCS account can hold zero or more blob containers or buckets, which in turn contain blobs or objects, respectively.
Comments:
- There is no such thing as a nested blob container or bucket. Both services provide a two-level hierarchy without nesting. However, both systems let you create the illusion of a folder hierarchy using prefixes.
- There are no restrictions on the number of containers and buckets.
- Both systems can log resource requests; this feature is called “logging” in GCS and “Storage Analytics” in WABS. The difference is that in GCS logging works at the bucket level, while in WABS it works at the storage account level. In GCS, log data is placed in a separate user-defined bucket, while in WABS it is placed in predefined tables and containers that are created automatically when logging is enabled.
Blobs and objects: WABS blobs and GCS objects are the files in your cloud file system, stored in blob containers and buckets.
Comments:
- Neither system sets a limit on the number of stored blobs or objects as such. For GCS no figure is given at all, while in WABS the total is bounded by the size of the storage account (100 TB).
- The maximum object size in WABS is 1 TB; in GCS it is not defined.
- WABS has two types of blobs: block blobs, convenient for streaming content (for example, pictures, video, documents) and limited to 200 GB, and page blobs, convenient for random read/write operations and limited to 1 TB. A common use of a page blob is a VHD mounted as a disk in a Windows Azure role. There is no such distinction in GCS.
- Both systems provide rich functionality for managing blobs and objects. You can copy, upload, download and perform other operations.
- Both systems can protect content from unauthorized access. The access control list mechanism is more fine-grained in GCS, where you can create a separate ACL for each file in a bucket. In WABS, everything happens at the blob container level.
The two most important functions are upload and download. Let's discuss them first and then compare the remaining functions.
Uploading blobs and objects
Let's talk about uploading blobs and objects into containers and buckets. There are two upload mechanisms: you can upload a blob or object in its entirety within a single request, or split it into chunks (blocks or pages in WABS; in GCS they have no special name).
Uploading in one request
If the data is small and you have a good connection, you can upload it completely in one request. In WABS, Put Blob is used for this; in GCS, PUT Object or POST Object.
Uploading in chunks
Large data that cannot be uploaded efficiently in a single request can be split up. Both systems let you split data into chunks (blocks or pages in WABS; in GCS they have no special name) and upload them one by one. In WABS, you use Put Block and Put Block List for block blobs and Put Page for page blobs. GCS uses the POST Object and PUT Object functions for this.
There are many reasons why you might decide to upload data in chunks:
- You need to upload very large data. Note that in WABS a block blob is limited to 200 GB and a page blob to 1 TB, and in GCS a single object can be up to 5 TB. Such volumes are impractical to upload in one request.
- Low connection speed.
- Both systems are cloud services designed to handle requests from hundreds and thousands of users at the same time, and both will cut off your requests if they run longer than a set limit. In WABS the limit is 10 minutes per 1 MB of uploaded data.
- Splitting large data into chunks allows parallel uploads, and therefore faster uploads.
- If the upload of one chunk fails, you can retry just that chunk. If a large upload sent in a single request is interrupted, you have to upload everything again, which is inefficient.
- System restrictions: WABS does not allow data to be uploaded in one request if its size exceeds 64 MB.
Let's see how to upload data in chunks in each of the systems. Say you want to upload a 100 MB file in chunks.
WABS
Suppose each chunk is 1 MB (chunks do not have to be the same size), so you need to upload 100 chunks. We use a block blob, in which each block (chunk) has a unique identifier (BlockId). To upload a block, use the Put Block function. A BlockId is a Base64-encoded string whose maximum size is 64 bytes. All BlockIds (100 in our case) must be the same length. It does not matter in what order you upload the blocks, so you can upload them in parallel. After a block is uploaded, WABS puts it somewhere in storage and keeps it for 7 days. After uploading all the blocks, we call Put Block List to commit them. Until this function is called, the blob cannot be accessed, and if you have not committed the blocks within 7 days, the system deletes them. When the function is called, WABS assembles the blob according to the order of the BlockId list and marks it as available. The values of the BlockIds make no difference (they can all be GUIDs), but the order in which you send them in Put Block List matters.
Limitations:
- A blob can be split into a maximum of 50,000 blocks.
- A blob can have a maximum of 100,000 uncommitted blocks at any given time.
- The set of uncommitted blocks cannot be larger than 400 GB.
- All BlockIds of one blob must be the same length, i.e. a set such as block8, block9, block11 is not acceptable.
- The maximum length of a BlockId is 64 bytes.
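The block workflow described above can be sketched without touching the real service. The snippet below is only an illustration: `FakeBlockBlob` is a hypothetical in-memory stand-in, the `block-000000` naming scheme is an assumption, and the two limits are the ones quoted in this article. It shows equal-length Base64 BlockIds, out-of-order staging, and the commit order deciding the final content.

```python
import base64

MAX_BLOCKS = 50_000        # limit quoted in the article
MAX_BLOCK_ID_BYTES = 64    # limit quoted in the article


def make_block_ids(count):
    """Return `count` Base64-encoded block IDs that all have the same length."""
    if count > MAX_BLOCKS:
        raise ValueError("a blob can be split into at most 50,000 blocks")
    # Zero-padding the counter keeps every ID the same length.
    ids = [base64.b64encode(f"block-{i:06d}".encode("ascii")).decode("ascii")
           for i in range(count)]
    if any(len(block_id) > MAX_BLOCK_ID_BYTES for block_id in ids):
        raise ValueError("block ID longer than 64 bytes")
    return ids


def split_into_blocks(data, block_size):
    """Split `data` into consecutive chunks of at most `block_size` bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class FakeBlockBlob:
    """In-memory stand-in for a block blob (hypothetical, not the real service)."""

    def __init__(self):
        self.uncommitted = {}   # block ID -> bytes, staged by put_block
        self.content = None     # stays None until put_block_list is called

    def put_block(self, block_id, chunk):
        self.uncommitted[block_id] = chunk

    def put_block_list(self, block_ids):
        # The order of the ID list, not the upload order, defines the blob.
        self.content = b"".join(self.uncommitted[b] for b in block_ids)
        self.uncommitted.clear()
```

Staging the blocks in reverse order and then committing the IDs in their original order still yields the original bytes, which is the property that makes parallel uploads safe.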
GCS
In GCS, uploading a large file in chunks is called “Resumable Uploads”. You first tell GCS that you are starting an upload by calling POST Object. This function is normally used to upload a file through HTML forms, but in this case you do not supply the file; instead, you set request headers that tell GCS an upload is starting. GCS returns a response containing an Upload Id that uniquely identifies the upload. Save this Id, because you will need it when uploading chunks. Next, try to upload the file using the PUT Object function, passing it the Upload Id and the contents of the object. If everything works out, GCS responds with HTTP code 200 OK. If the operation fails, you have to ask GCS how many bytes it has received; GCS answers with HTTP code 308 Resume Incomplete. You can then continue uploading the remaining data using PUT Object.
Thoughts:
- I think we can skip the initial call in which we try to upload the entire file in the hope of getting 200 OK. If I try to upload a 100 MB file, I’m pretty sure it will not go through in one go. Instead of trying to upload the entire file, I can simply upload a piece of it, check its status and then either re-upload it or upload the next piece.
- I'm not sure how chunks can be uploaded in parallel in GCS, given that GCS returns a Range header when queried for the number of bytes uploaded. WABS has the BlockId and AS3 has the part number, which make parallel uploads easier.
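To make the resume step more concrete, here is a small sketch of how a client might turn the Range value from a 308 response into the next byte to send. The exact header format (`bytes=0-1048575`) is an assumption on my part, and both function names are hypothetical; no request is actually made.

```python
import re


def next_offset(range_header):
    """Return the offset of the first byte the server has not yet stored.

    `range_header` is the Range value from a 308 response, assumed here to
    look like "bytes=0-1048575". A missing header means nothing was stored.
    """
    if not range_header:
        return 0
    match = re.fullmatch(r"bytes=(\d+)-(\d+)", range_header.strip())
    if match is None:
        raise ValueError(f"unexpected Range value: {range_header!r}")
    return int(match.group(2)) + 1


def remaining_chunks(total_size, uploaded, chunk_size):
    """Yield (start, end) byte offsets, end inclusive, still to be uploaded."""
    start = uploaded
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield start, end
        start = end + 1
```

Because the server reports a single contiguous range, this scheme is naturally sequential, which matches the doubt about parallel uploads expressed above.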
Downloading blobs and objects
Let's see how you can download blobs and objects. There are two mechanisms: download the whole blob or object in one request, or download it in chunks.
Each system has only one download function: Get Blob in WABS and GET Object in GCS.
Download in one request
If the data is small and you have a good connection, you can download the entire object using Get Blob in WABS and GET Object in GCS.
Download in chunks
If the object is large and you are not sure you can download it in one go, you can download it in pieces using the same function, adding a Range header that defines the range of bytes to download.
Download process:
- Determine the size of the object. For example, it "weighs" 100 MB.
- Determine the size of the pieces. For example, you are comfortable downloading pieces of 1 MB.
- Call Get Blob or GET Object and pass the appropriate values in the Range header. If you download sequentially, your first request will have the header value “0-1048575” (0 to 1 MB), the second “1048576-2097151” (1 to 2 MB), and so on.
- After downloading a piece, store it somewhere.
- After downloading all the pieces, create an empty 100 MB file and fill it with the downloaded pieces.
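The steps above can be sketched as two small helpers: one computes the Range values for every piece, the other reassembles the pieces at their offsets. The helper names are hypothetical, the `bytes=` prefix is the standard HTTP Range syntax, and an in-memory buffer stands in for the empty file.

```python
def range_headers(total_size, chunk_size):
    """Return the Range header value for each chunk, in order."""
    headers = []
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1   # Range ends are inclusive
        headers.append(f"bytes={start}-{end}")
    return headers


def assemble(total_size, pieces):
    """Rebuild the object from {start_offset: bytes} pieces in any order."""
    buffer = bytearray(total_size)          # the "empty file" of the final size
    for start, piece in pieces.items():
        buffer[start:start + len(piece)] = piece
    return bytes(buffer)
```

For a 100 MB object and 1 MB pieces this gives 100 header values, the first two covering the byte ranges quoted in the list above.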
Common ground between WABS, AS3 and GCS
All three systems have things in common, for example:
- All three are cloud-based file systems with a two-tier hierarchy.
- All three let you store large amounts of data for a long time at low cost.
- All three provide a two-level hierarchy (buckets / objects in AS3 and GCS, blob containers / blobs in WABS).
- All three provide a RESTful interface to their services, plus libraries for high-level languages, which are usually REST wrappers.
Common with AS3
When I first read about GCS, I found that GCS and AS3 have much in common, for example:
- Same terminology: both systems use similar terms such as buckets and objects (in WABS they are called blob containers and blobs).
- Same operation names: both systems use the same names for operations. For example, the API function that returns a list of buckets is called GET Service in both systems.
- Same pricing structure: both systems have a similar pricing structure. In WABS all transactions cost the same, whereas in AS3 and GCS the transaction cost varies depending on the operation performed.
- Same hosting styles: both systems support virtual-hosted-style (for example, http://mybucket.s3.amazon.com/myobject ) and path-style (for example, http://s3-eu-west-1.amazonaws.com/mybucket/myobject ), whereas WABS only supports path-style (for example, http://myaccount.blob.core.windows.net/myblobcontainer/myblob ).
- Similar consistency model: both systems provide strong read-after-write consistency for all PUT requests and eventual consistency for all List (GET) operations.
Unique features of GCS
When we started discussing the basic functionality of GCS, it might seem that GCS provides less functionality than WABS and AS3, but GCS has functions that are not found in any other platform. For example:
- OAuth 2.0 authentication : This is a unique and modern feature that eliminates the need for users and applications to provide credentials when they need to access data. Learn more: https://developers.google.com/storage/docs/authentication#oauth .
- Cookie-based authentication : GCS allows you to make requests authenticated in the browser (for those who do not have a GCS account). To do this, you need to configure the ACL and give users the URL of the object. Learn more: https://developers.google.com/storage/docs/authentication#cookieauth .
- Cross-Origin Resource Sharing (CORS) : Another unique and modern feature available only in GCS. Client-side applications are subject to the same-origin policy, which prevents interactions between resources from different origins. That policy blocks not only dangerous behavior but also quite useful and legitimate interactions between known origins. The CORS specification, developed by the W3C, defines how such interactions can be allowed. GCS supports this specification, letting you configure a bucket to return CORS-compatible responses. Learn more: https://developers.google.com/storage/docs/cross-origin . Please note that the feature is at the “Experimental” stage (in other words, in beta :)). I am not 100% sure, but it seems that the same can be achieved using the $root blob container in WABS.
Pricing
Neither system involves any “capital” costs. The pricing model is relatively simple and based on consumption. In both systems, the bill can consist of three components:
- Number of transactions : You pay for the number of transactions made; roughly speaking, one transaction is one function call to the system. There is a significant difference between the two systems: in WABS the transaction cost is fixed ($0.01 per 10,000 transactions), while in GCS it varies depending on the type of transaction. For PUT, COPY, POST and LIST operations you pay a higher price ($0.01 per 1,000 transactions); for GET and other operations you pay a lower price ($0.01 per 10,000 transactions). Delete requests are not mentioned, but I assume that they are free in GCS.
- Storage : You pay for the amount of data stored in each system.
- Traffic : You pay for the amount of data transferred to and from the system. At the time of writing, incoming traffic is free in both systems. It is not mentioned whether data transfer within one data center is charged in GCS.
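As a rough illustration of the transaction component, the sketch below applies the per-transaction prices quoted above to a dictionary of operation counts. The function names are hypothetical, the prices are only those mentioned in this article (real prices change over time), and DELETE is treated as free in GCS purely because that is the author's assumption.

```python
from decimal import Decimal

# Prices as quoted in the article; real prices change over time.
WABS_PER_10K = Decimal("0.01")                  # any transaction
GCS_PER_1K_WRITE = Decimal("0.01")              # PUT, COPY, POST, LIST
GCS_PER_10K_OTHER = Decimal("0.01")             # GET and others
GCS_WRITE_OPS = {"PUT", "COPY", "POST", "LIST"}


def wabs_transaction_cost(counts):
    """Flat rate: every transaction costs the same."""
    total = sum(counts.values())
    return WABS_PER_10K * total / 10_000


def gcs_transaction_cost(counts):
    """Rate depends on the operation type."""
    cost = Decimal(0)
    for op, n in counts.items():
        if op == "DELETE":
            continue    # assumed free, following the author's guess
        if op in GCS_WRITE_OPS:
            cost += GCS_PER_1K_WRITE * n / 1_000
        else:
            cost += GCS_PER_10K_OTHER * n / 10_000
    return cost
```

With one million PUT and one million GET requests, the flat WABS rate comes to $2, while the GCS rates come to $11, almost all of it from the PUT requests.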
Special pricing models are also available, and both systems offer different payment packages. Learn more about pricing at https://www.windowsazure.com/en-us/pricing/details/ for WABS and https://developers.google.com/storage/docs/pricingandterms for GCS.
Functions
The table summarizes the functions provided by WABS and GCS. It contains only functions supported by both systems.
The following table lists the functions supported only in WABS.
Let us consider these functions in more detail.
| Function | WABS | GCS |
|---|---|---|
| Create Container / PUT Bucket | Yes | Yes |
This function creates a new blob container or bucket.
An important point to remember is that blob containers are scoped to the storage account, while GCS buckets are scoped to the GCS project. When you create a WABS storage account, you choose its location (data center), so your blob containers live in a specific data center in a specific geographic location. When you create a bucket in GCS, you define the region in which it will be created, so you can spread buckets across GCS data centers if necessary. To do the same in WABS, you need to create a storage account in each data center where you want to place containers.
There are several rules for naming blob containers and buckets; they are tabulated below.
| Rule | WABS | GCS |
|---|---|---|
| Minimum / maximum name length | 3 / 63 | 3 / 63 |
| Case | Lower case | Lower case |
| Allowed characters | Alphanumeric and hyphen (-) | Alphanumeric, hyphen (-) and period (.) |
More naming rules:
- Blob container names must begin with a letter or a number, not a hyphen. Every hyphen must be followed by a letter or a number, so consecutive hyphens are not allowed.
- GCS bucket names must consist of labels separated by dots, where each label begins and ends with a lowercase letter or a number, and the bucket name must not look like an IP address (for example, 127.0.0.1).
- Although bucket names normally contain from 3 to 63 characters, a name that contains dots can be up to 222 characters long, dots included.
- Bucket names cannot begin with the goog prefix.
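The naming rules above translate fairly directly into regular expressions. The sketch below encodes only the rules listed in this article (the services may enforce more), and the function names are hypothetical.

```python
import re

# Lowercase alphanumerics, with single hyphens allowed only between them.
_WABS_NAME = re.compile(r"[a-z0-9](?:-?[a-z0-9])*")
# A label starts and ends with a lowercase letter or digit.
_GCS_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")
_IP_LIKE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def is_valid_wabs_container_name(name):
    """3-63 chars, lowercase alphanumerics and single hyphens between them."""
    return 3 <= len(name) <= 63 and _WABS_NAME.fullmatch(name) is not None


def is_valid_gcs_bucket_name(name):
    """Dot-separated labels; 3-63 chars, or up to 222 when the name has dots."""
    max_len = 222 if "." in name else 63
    if not 3 <= len(name) <= max_len:
        return False
    if name.startswith("goog") or _IP_LIKE.fullmatch(name):
        return False
    return all(_GCS_LABEL.fullmatch(label) for label in name.split("."))
```

For example, `my-container` passes the WABS check while `my--container` does not, and `127.0.0.1` is rejected as a GCS bucket name because it looks like an IP address.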
Notes:
- When creating a blob container, you can optionally set its ACL; if none is specified, the container is private, that is, available only to the owner. In GCS, an ACL cannot be defined during creation; the System Default ACL is applied instead. Learn more: https://developers.google.com/storage/docs/accesscontrol#default . You can change the ACL or apply CORS to the bucket after it has been created.
- WABS lets you define your own metadata for a container, as a collection of key-value pairs with a maximum size of 8 KB. This is not available in GCS.
| Function | WABS | GCS |
|---|---|---|
| List Containers / GET Service | Yes | Yes |
The function returns a list of all blob containers in the storage account (WABS) or of all buckets that belong to the authenticated owner (GCS).
Comments:
- One call to this function in WABS returns a maximum of 5000 containers; if the storage account holds more, a continuation token is returned as well. 5000 is the default, but you can specify a smaller number. No such number is mentioned for GCS.
- In WABS, you can filter on the server side by a prefix that the names of the returned containers must start with.
- In WABS, you can specify whether to return the metadata for each blob container along with the list.
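The paging behaviour described above can be imitated with an in-memory list. In this sketch the continuation token is modelled, as an assumption, as the first name of the next page; the real token is opaque, and both function names are hypothetical.

```python
def list_page(names, prefix="", max_results=5000, marker=None):
    """Return (page, next_marker) the way a paged listing call might.

    `names` must be sorted; `marker` is the continuation token from the
    previous call, modelled here as the first name of the next page.
    """
    matching = [n for n in names if n.startswith(prefix)]
    if marker is not None:
        matching = [n for n in matching if n >= marker]
    page = matching[:max_results]
    next_marker = matching[max_results] if len(matching) > max_results else None
    return page, next_marker


def list_all(names, prefix="", max_results=5000):
    """Follow continuation tokens until the listing is exhausted."""
    results, marker = [], None
    while True:
        page, marker = list_page(names, prefix, max_results, marker)
        results.extend(page)
        if marker is None:
            return results
```

A client that ignores the token would silently see only the first 5000 containers, which is why the loop in `list_all` keeps going until no token comes back.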
| Function | WABS | GCS |
|---|---|---|
| Delete Container / DELETE Bucket | Yes | Yes |
The function deletes the blob container or bucket.
Comments:
- This operation may look synchronous, but in reality it is not. When you send a request to delete a blob container, it is marked for deletion and becomes unavailable, and it is then removed during garbage collection, so the actual time needed to delete a container depends on the amount of data in it. In my experience, deleting a very large container can take hours, and during that time an attempt to create a container with the same name fails with an error (Conflict, HTTP 409). You need to plan for this.
- In GCS, the bucket must be empty before deletion. You must first remove all objects from the bucket and then delete it. Otherwise, a 409 Conflict error is returned.
| Function | WABS | GCS |
|---|---|---|
| List Blobs / GET Bucket (List Objects) | Yes | Yes |
The function is used to get a list of blobs or objects in a container or bucket. The functions in the two systems do the same thing, in particular:
- Both functions let you limit the result to the desired number of objects.
- Both functions have a maximum number of objects that they can return in one call: 5000 in WABS and 1000 in GCS.
- Both functions support delimiters, a character that groups blobs or objects. The most commonly used delimiter is /. As mentioned above, both systems maintain a two-level hierarchy, and a delimiter can create the illusion of a folder-like hierarchy. For example, you have the following objects: images/a.png, images/b.png, images/c.png, logs/1.txt, logs/2.txt, files.txt. When you call the function and pass it the delimiter /, both systems return the following values: images, logs, files.txt.
- Both functions support server-side filtering by prefix. When your request contains a prefix, both systems return the objects whose names start with that prefix. Using the example above, if we pass the prefix “images” without a delimiter, both systems return the following values: images/a.png, images/b.png, images/c.png.
- Both functions accept a continuation token, which tells the system where to resume the listing.
- Both systems return objects in alphabetical order.
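The delimiter and prefix behaviour from the example above can be reproduced with a few lines of plain Python. This is only a model of the grouping logic with a hypothetical function name; it follows the article's example in reporting groups without the trailing delimiter, whereas the real services typically include it in the reported prefix.

```python
def list_with_delimiter(names, prefix="", delimiter=None):
    """Return sorted entries the way a delimiter-aware listing groups them."""
    entries = set()
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Collapse everything after the first delimiter into one "folder".
            entries.add(prefix + rest.split(delimiter, 1)[0])
        else:
            entries.add(name)
    return sorted(entries)
```

Called with the six object names from the example and the delimiter `/`, it yields the three entries files.txt, images and logs; called with the prefix `images` and no delimiter, it yields the three image objects.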
Differences:
- One function call returns at most 5000 blobs in WABS and at most 1000 objects in GCS.
- When listing, you can ask WABS to return blob snapshots as well. This is not possible in GCS.
- When listing, you can ask WABS to return metadata for the blobs. In GCS, metadata for objects is not returned; for that you must use HEAD Object.
- When listing, you can ask WABS to include blobs that are not yet committed, i.e. partially uploaded. GCS can only return objects that are fully uploaded.
- In GCS, you can also use this function to get the ACL or CORS configuration for the bucket.
| Function | WABS | GCS |
|---|---|---|
| Set Container ACL / PUT Bucket (ACL or CORS) | Yes | Yes |
The function is used to set the ACL for a container or bucket. In WABS, one or more access policies can be specified as well. In GCS, you can also configure CORS (but you cannot configure CORS and ACLs in the same request).
For a blob container, the ACL values ​​can be:
For buckets, the ACL values can be:
- READ : Allows listing the objects in the bucket.
- WRITE : Allows creating, overwriting and deleting objects in the bucket.
- FULL_CONTROL : Grants both READ and WRITE permissions.
What is convenient in GCS is that you can give different users different sets of permissions; for example, user1 can have the READ ACL and user2 the WRITE ACL. WABS has no such flexibility: permissions are set only on the blob container as a whole.
What is convenient in WABS is that, in addition to the ACL, you can define up to 5 container access policies, each granting a time-limited set of permissions on the container. For example, you can create an access policy that permits writing to the blob container and is valid for only one day. Policies let you generate a special signed URL and hand it to users (the flexible Shared Access Signatures functionality). Signatures let you grant access rights to containers and blobs at a finer level of detail and for a specific period of time.
| Function | WABS | GCS |
|---|---|---|
| Get Container ACL / GET Bucket (ACL or CORS) | Yes | Yes |
The function is used to get the ACL for the blob container or bucket; in WABS it also returns the access policies defined for the container.
To get the bucket ACL, call GET Bucket with the query string parameter “acl”; to get the CORS configuration, call it with the query string parameter “cors”. If neither is specified, the list of objects in the bucket is returned.
| Function | WABS | GCS |
|---|---|---|
| Put Blob / PUT Object | Yes | Yes |
The function adds a blob to a blob container or an object to a bucket. In GCS, it can also be used to set an ACL on an existing object or to copy an object from one bucket to another.
Comments:
- In both systems, the function will overwrite the existing object with the specified name.
- Both systems allow you to define properties for objects (cache control, content type, etc.)
- Both systems allow you to send an MD5 hash of content to check the consistency of the data.
- In GCS, when creating an object, you can assign an ACL to it, which cannot be done in WABS.
- Both systems let you specify metadata for blobs and objects as a collection of key-value pairs. In WABS, the maximum size of this metadata is 8 KB; for GCS it is not known.
- When creating a page blob with this function, you only initialize the page blob and do not put any data into it. To insert data, you must use the Put Page function.
- When creating a block blob, or an object in GCS, the data is sent in the request.
- The maximum size of a block blob created with this function is 64 MB. If the data is larger, it must be split into blocks and uploaded using Put Block and Put Block List.
- WABS allows you to define preconditions that must be met for the function to complete successfully (If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match).
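The MD5 consistency check mentioned in the comments above usually means sending a Base64-encoded MD5 digest alongside the content. The sketch below computes such a value; using it in a `Content-MD5` request header is my assumption about how the hash would be transmitted, and both function names are hypothetical.

```python
import base64
import hashlib


def content_md5(data):
    """Return the Base64-encoded MD5 digest of `data` (Content-MD5 style)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def matches(data, expected_md5):
    """Check received bytes against the hash that accompanied them."""
    return content_md5(data) == expected_md5
```

Note that the value is the Base64 form of the raw 16-byte digest, not of its hexadecimal string, so it is always 24 characters long.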
In GCS, the POST Object function adds an object to the specified bucket using an HTML form. POST is an alternative to PUT and allows objects to be uploaded from a browser. Parameters that PUT passes as HTTP headers are passed by POST in the body of a multipart/form-data encoded message.
| Function | WABS | GCS |
|---|---|---|
| Get Blob / GET Object | Yes | Yes |
The function lets you download a blob or object from a container or bucket.
Comments:
- You can download in pieces by specifying a byte range in the Range header.
- In WABS, the function also returns the metadata of the blob being downloaded.
- In GCS, you can use this function to get the contents of an object or its ACL.
- Both systems allow you to define preconditions that must be met for the function to complete successfully (If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match).
- You can use this function to get blob versions: to get a version of a blob, specify the date / time of the blob snapshot.
| Function | WABS | GCS |
|---|---|---|
| Delete Blob / DELETE Object | Yes | Yes |
The function removes a blob or object from storage.
Comments:
- In WABS, you can use this function to remove only the snapshots without deleting the source blob. If the source is deleted, all its snapshots are deleted too.
- This function can be used to delete specific versions of a blob; to do this in WABS, specify the date / time of the blob snapshot.
- WABS allows you to define preconditions that must be met for the function to complete successfully (If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match).
| Function | WABS | GCS |
|---|---|---|
| Copy Blob / PUT Object - Copy | Yes | Yes |
The function copies a blob or object from its original location to another one.
Comments:
- Both systems allow you to define preconditions that must be met for the function to complete successfully (If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match). In WABS these conditions can be defined on both the source and the destination; in GCS, only on the source.
- WABS lets you copy objects from container to container only within one storage account. There is no such limitation in GCS: as long as the source and destination buckets belong to the same project, the object is copied. However, if you created an object using the chunked upload API, you cannot copy it from region to region.
- Both systems let you either copy the existing metadata or specify new metadata for the destination.
- In GCS, the existing ACL is dropped during the copy and a default ACL is placed on the new object. You can define your own ACL during the copy using the appropriate request header.
Tips:
- Neither system supports renaming objects. An object can be renamed by copying it under the new name and then deleting the original.
- You can also promote a version of a blob or object to the “current” version. To do this, specify the versioned blob (by its snapshot) or object (by its Version Id) as the copy source and the unversioned blob or object as the destination.
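The copy-then-delete rename from the tips above can be sketched against an in-memory stand-in. `FakeBucket` and `rename` are hypothetical names used only for illustration; a real implementation would call Copy Blob or PUT Object - Copy followed by the delete function, and would need to handle a failure between the two steps.

```python
class FakeBucket:
    """In-memory stand-in for a container or bucket (hypothetical)."""

    def __init__(self):
        self.objects = {}   # name -> (content, metadata)

    def put(self, name, content, metadata=None):
        self.objects[name] = (content, dict(metadata or {}))

    def copy(self, source, destination, metadata=None):
        content, old_metadata = self.objects[source]
        # Either keep the existing metadata or replace it for the copy.
        new_metadata = dict(old_metadata if metadata is None else metadata)
        self.objects[destination] = (content, new_metadata)

    def delete(self, name):
        del self.objects[name]


def rename(bucket, old_name, new_name):
    """Emulate a rename: copy first, delete the original only afterwards."""
    if old_name == new_name:
        return
    bucket.copy(old_name, new_name)
    bucket.delete(old_name)
```

Copying before deleting means that an interruption leaves two copies rather than none, which is the safer failure mode.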
| Function | WABS | GCS |
|---|---|---|
| Get Blob Properties / HEAD Object | Yes | Yes |
The function is used to get blob properties and object metadata, but does not return the contents of the blob or object.
Comments:
- Get Blob Properties in WABS and HEAD Object in GCS return the user-defined metadata, standard HTTP properties and system properties of a blob or object.
- WABS allows you to define preconditions that must be met for the function to complete successfully (If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match).
- In WABS, user-defined metadata alone can be retrieved with Get Blob Metadata.
| Function | WABS | GCS |
|---|---|---|
| Get Blob Metadata / HEAD Object | Yes | Yes |
The function returns user-defined metadata for a blob or object. It can be used to get the properties of a particular version of a blob or object; in WABS, you must specify the date / time of the blob snapshot to do so.
Summary
As we saw in this article, both systems provide a similar set of functions, although some functions are present in one system but not in the other. Even so, the difference in functionality is not large.
Note from the translator
Reading this review, I could not shake the impression that Google sensibly decided not to reinvent the wheel but to follow Amazon's well-trodden, successful path; the near-complete identity of some parameters suggests as much. Given that Amazon launched its service in 2006 and Google in 2010, that may well be the case. However, Google has some really great features that other services lack, CORS for example. In general, one could even argue that Google's and Microsoft's services are developing faster over time than Amazon's.
Thank you for your attention. As soon as the next articles appear, I will definitely translate them and bring them to your attention.