
Recently, Microsoft announced a preview of a new service, Azure Stream Analytics, built for processing streaming data in near real time.
The current version of Azure Stream Analytics connects to Azure Event Hubs and Azure Blob Storage to receive data streams (called Inputs), and to Event Hubs, Blob Storage, and Azure SQL Database to write results (Outputs). The stream processor is defined in a SQL-like language that lets you describe how the streaming data is processed and transformed into reliable information in real time.
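To give an idea of what such a definition looks like, here is a minimal sketch of a query in the Stream Analytics query language; the input alias (input), the field names (DeviceId, EventTime) and the window size are illustrative assumptions, not the ones from my actual job:

SELECT DeviceId, COUNT(*) AS EventCount
FROM input TIMESTAMP BY EventTime
GROUP BY DeviceId, TumblingWindow(second, 5)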
This is where the power of the cloud really shows: in just a few steps and a couple of hours you can build a reliable infrastructure capable of handling tens of thousands of events or messages per second.
I was very curious to see how much could be achieved with this service, so I put together a test scenario. The basis for my experiment was the guide that can be found at this link.
The guide has a slight inaccuracy in the “Start the Job” step. It says you should go to the Configure section of your job in order to set the job output start time. However, this setting is not in the Configure section; it is configured in the dialog that appears when you start the job.
To make the test more interesting, I made the following changes:
- Scaled the Event Hub to 10 throughput units, which potentially allows up to 10,000 events per second.
- Changed the Event Hub sample code to send a larger number of messages.
- Created a small PowerShell script to run N simultaneous instances of the command line application.
- Ran all of this from a virtual machine in the same Azure data center (West Europe) where the Event Hub and the Stream Analytics job run.
Changes to the Service Bus Event Hub source code
I removed all the code I did not need (for example, the code that creates the Event Hub). In the end, my Program.cs looks like this:
static void Main(string[] args)
{
    // Allow a large number of concurrent connections to the Event Hub
    System.Net.ServicePointManager.DefaultConnectionLimit = 1024;

    eventHubName = "salhub";

    Console.WriteLine("Start sending ...");
    Stopwatch sw = new Stopwatch();
    sw.Start();

    Paralelize();

    sw.Stop();
    Console.WriteLine("Completed in {0} ms", sw.ElapsedMilliseconds);
    Console.WriteLine("Press enter key to stop worker.");
    Console.ReadLine();
}

static void Paralelize()
{
    // 25 parallel tasks, each sending 2,000 events = 50,000 events per process
    Task[] tasks = new Task[25];
    for (int i = 0; i < 25; i++)
    {
        tasks[i] = new Task(() => Send(2000));
    }
    Parallel.ForEach(tasks, (t) => { t.Start(); });
    Task.WaitAll(tasks);
}

public static void Send(int eventCount)
{
    Sender s = new Sender(eventHubName, eventCount);
    s.SendEvents();
}
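The Sender class referenced above comes from the original Event Hubs sample and is not shown here. For completeness, here is a minimal sketch of what it might look like, assuming the Service Bus connection string sits in app.config (the sample's usual Microsoft.ServiceBus.ConnectionString setting) and using an illustrative JSON payload rather than the sample's exact event schema:

using System;
using System.Text;
using Microsoft.ServiceBus.Messaging;

class Sender
{
    private readonly string eventHubName;
    private readonly int eventCount;

    public Sender(string eventHubName, int eventCount)
    {
        this.eventHubName = eventHubName;
        this.eventCount = eventCount;
    }

    public void SendEvents()
    {
        // EventHubClient.Create reads the Service Bus connection string from the config file
        EventHubClient client = EventHubClient.Create(eventHubName);
        Random random = new Random();

        for (int i = 0; i < eventCount; i++)
        {
            // Illustrative payload; the real sample defines its own event schema
            string body = string.Format(
                "{{ \"DeviceId\": {0}, \"Temperature\": {1} }}",
                random.Next(100), random.Next(20, 40));

            client.Send(new EventData(Encoding.UTF8.GetBytes(body)));
        }

        client.Close();
    }
}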
Now each run of this command line application sends 25 x 2,000 = 50,000 messages. To make things even more fun, I launch the application in a pseudo-parallel fashion, simply starting it 20 times with the following PowerShell script:
for($i=1; $i -le 20; $i++) { start .\BasicEventHubSample.exe }
This starts the processes almost simultaneously, and then I wait until they all finish sending their messages. Twenty runs of 50,000 messages add up to 1,000,000 messages, and I take the time of the slowest process as the overall result. All of these figures are somewhat approximate, of course, but they are enough to give me an idea of the possibilities available to me, without having to invest in expensive hardware or develop a complex solution.
One more thing: I started my Stream Analytics job before launching the command line applications that push the data, just to make sure the stream processor was already running before I dropped data on it.
A couple of things to note. First, Stream Analytics is still in preview, so there may be hiccups. But the end result is still just amazing.
Look at the Event Hub and Stream Analytics charts - they are just awesome. By the way, I also convinced myself that the new Azure SQL Database performance tiers are equally impressive.
With this amount of data flowing through Stream Analytics, the service had no trouble writing the results into a single Basic-tier database (5 DTUs)! Results started appearing in my SQL database table as soon as I switched from the running programs to SQL Server Management Studio, and I could watch them arrive in real time.
And finally, I pumped 1,000,000 events into the Event Hub in just 75 seconds - that is over 13,000 events per second, all with just a couple of lines of code.
How cool it is to look at charts like this:

And how cool it is to look at these Azure Event Hubs charts:

Azure Event Hubs, millions of messages. Just think how long it would take to build an on-premises test lab to process this amount of data.
Here are some of the most important limitations and known issues of Stream Analytics:
- Geographic availability of the preview service is limited (Central US and West Europe)
- Streaming unit quota (12 streaming units per region per subscription)
- UTF-8 is the only supported encoding for input CSV and JSON data sources.
- In the preview, some useful performance counters, such as latency, are not available.
Looking at the results, I am convinced that Azure Event Hubs really can provide throughput of millions of events per second, and that Stream Analytics can really handle that amount of data.
Useful links