
Storm ("Hadoop in real time") is now Open Source

As promised, Twitter has published Storm, the distributed real-time data processing system that originated at BackType, on GitHub. It is now an open source project.

In the accompanying announcement, project author Nathan Marz explains that over the last decade technologies such as MapReduce and Hadoop have revolutionized the processing of large volumes of data. Unfortunately, they were not designed for real-time work. Storm offers an alternative. In effect, Storm can be called "Hadoop in real time": it follows the same approach of exposing a small set of basic primitives. According to Marz, it is an extremely reliable and scalable system that supports any programming language, and it can be installed on Amazon EC2 with a single command.
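To make the batch versus real-time distinction concrete, here is a minimal sketch in plain Python. It is not Storm's API: the function names (`sentence_source`, `split_words`, `running_counts`, `run_pipeline`) and the sample sentences are hypothetical, and the whole thing runs in a single process. It only illustrates the idea of chaining simple stream primitives so that each record updates the result as it arrives, instead of waiting for a complete batch.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def sentence_source() -> Iterator[str]:
    """Hypothetical stand-in for an unbounded input stream."""
    # A real stream never ends; two fixed sentences keep the sketch finite.
    for sentence in ["storm is fast", "storm is open source"]:
        yield sentence


def split_words(sentences: Iterable[str]) -> Iterator[str]:
    """Processing step 1: break each incoming sentence into words."""
    for sentence in sentences:
        for word in sentence.split():
            yield word


def running_counts(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Processing step 2: emit an updated count as soon as a word arrives."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def run_pipeline() -> List[Tuple[str, int]]:
    """Wire the steps together and collect every incremental update."""
    return list(running_counts(split_words(sentence_source())))


if __name__ == "__main__":
    for word, count in run_pipeline():
        print(word, count)
```

A batch job would produce the final counts only after reading all of the input. Here every word yields an updated count immediately, which is the property a real-time system is meant to provide at scale across many machines.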

The closest analogue to Storm is arguably S4 (developed by Yahoo). Storm's main differences are that it does not lose data and that it is easier to use.

Nathan Marz is the lead programmer at BackType, which Twitter acquired in July 2011. In the comments on Hacker News (HN), he lists resources that are useful when working with Storm:
Documentation wiki: github.com/nathanmarz/storm/wiki
One-click installation on EC2: github.com/nathanmarz/storm-deploy
Adapter for using the Kestrel message queue server with Storm: github.com/nathanmarz/storm-kestrel
A tutorial project with sample topologies that can be run in local mode: github.com/nathanmarz/storm-starter
Mailing list where Nathan Marz answers questions: groups.google.com/group/storm-user

Source: https://habr.com/ru/post/128808/

