Machine learning, cloud technologies, visualization, Hadoop, Spark, scalability, analytics, terabytes, petabytes, faster, bigger, more reliable, better - all these words are still spinning around in your head after three days on the show floor of the Strata + Hadoop conference. And, of course, mountains of toy elephants everywhere - the main symbol of the conference.
My colleagues from DataArt and DeviceHive not only attended the conference but also helped our friends from Canonical. At their booth, they demonstrated Juju - a powerful tool for setting up and deploying services in the cloud quickly and painlessly. We also brought our favorite demo there - a device for monitoring industrial equipment. No tedium, no PowerPoint, everything live: an accelerometer SensorTag was mounted on a fan to track its vibration.
To simulate vibration, we glued a piece of electrical tape to one of the fan blades. This threw off the balance and made the whole structure noticeably unstable. The sensor data was transmitted to the DeviceHive server as a time series, processed in Spark Streaming, and displayed on nice-looking graphs. All of this was deployed with Juju, which integrates beautifully with Amazon Web Services (AWS).
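To give an idea of what the processing step looked like, here is a minimal Spark Streaming sketch - not our actual demo code. It assumes the readings arrive as comma-separated `timestamp,x,y,z` lines on a local socket (a stand-in for the real DeviceHive feed) and simply computes the average vibration magnitude for each micro-batch:

```python
# Minimal Spark Streaming sketch (PySpark). The socket feed below is a
# hypothetical stand-in for the DeviceHive time series.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="FanVibrationDemo")
ssc = StreamingContext(sc, 1)  # 1-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)  # "timestamp,x,y,z" lines

def magnitude(line):
    # Vibration strength as the Euclidean norm of the acceleration vector.
    _, x, y, z = line.split(",")
    return (float(x) ** 2 + float(y) ** 2 + float(z) ** 2) ** 0.5

# Average magnitude per batch: sum (value, count) pairs, then divide.
avg_vibration = (lines.map(magnitude)
                      .map(lambda m: (m, 1))
                      .reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]))
                      .map(lambda s: s[0] / s[1]))

avg_vibration.pprint()  # in the demo, this fed the graphs instead
ssc.start()
ssc.awaitTermination()
```

A wobble shows up as a jump in the average magnitude from batch to batch, which is exactly what made the live graphs fun to watch.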
For all the abundance of companies with cool products, the main topic of the conference, it seemed to me, was Spark. Spark was discussed, Spark was taught, Spark was launched, Spark was integrated. Spark was here, Spark was there, Spark was everywhere. Virtually everyone, regardless of company size, shared their experience integrating and using Spark in their products.
In just a few years, Spark has proven to be an excellent platform for data processing, machine learning, and distributed computing. Its ecosystem is constantly expanding; it is changing how we work with data and making development faster.
The next generation of analytics tools will most likely be built around Spark one way or another, allowing companies to use their data more efficiently. And the next generation of parallel computing tools will help businesses, engineers, and data specialists pool their development efforts.
Building on Spark, Databricks introduced its new data analysis product - an interactive shell for creating Spark jobs, launching them on an AWS cluster, writing queries, and visualizing data. Add Spark Streaming to this, and you can run models against data streams in real time. While Databricks hosts the front end with the user interface, the data and the infrastructure that runs Spark live on your own AWS machines. It will be interesting to compare all of this with Space Needle, which Amazon has promised to unveil at re:Invent 2015 in Las Vegas.
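To make "running models on live streams" a bit more concrete, here is a hedged sketch using MLlib's StreamingKMeans - an illustrative choice of ours, not something Databricks showed. One hypothetical socket feed continuously updates the model, while a second feed is scored against it:

```python
# Sketch: a clustering model that keeps learning from live data (PySpark).
# Both socket feeds are hypothetical streams of space-separated features.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.mllib.clustering import StreamingKMeans
from pyspark.mllib.linalg import Vectors

sc = SparkContext(appName="StreamingModelDemo")
ssc = StreamingContext(sc, 5)  # 5-second micro-batches

def parse(line):
    return Vectors.dense([float(v) for v in line.split()])

training = ssc.socketTextStream("localhost", 9998).map(parse)
queries = ssc.socketTextStream("localhost", 9999).map(parse)

# Two clusters over 3-dimensional points, starting from random centers.
model = (StreamingKMeans(k=2, decayFactor=1.0)
         .setRandomCenters(dim=3, weight=1.0, seed=42))

model.trainOn(training)            # the model adapts as new data arrives
model.predictOn(queries).pprint()  # cluster labels for incoming points

ssc.start()
ssc.awaitTermination()
```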
Obviously, working with large amounts of data requires more than just choosing a specific database or distributed system. Entire platforms are emerging for developing Big Data technologies, and the world is beginning to think in terms of these platforms: sets of technologies and architectural design patterns meant to work together to solve various Big Data tasks. Data platforms largely determine how we access, store, transfer, process, and search structured, unstructured, and sensor data. A great example of such a platform is the Basho Data Platform, where Basho takes its Riak database and makes it part of something bigger than just a key-value store.
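For reference, the plain key-value baseline that the Basho Data Platform builds on looks roughly like this with the official Python client for Riak (the connection parameters and bucket name here are assumptions for a local node):

```python
# Sketch of plain key-value access with the official Python Riak client.
# Host, port, and bucket name are assumptions; install with `pip install riak`.
import riak

client = riak.RiakClient(protocol='pbc', host='127.0.0.1', pb_port=8087)
readings = client.bucket('fan_vibration')

# Store a JSON value under a key...
obj = readings.new('2015-09-30T12:00:00Z', data={'magnitude': 1.27})
obj.store()

# ...and fetch it back by the same key.
fetched = readings.get('2015-09-30T12:00:00Z')
print(fetched.data)  # {'magnitude': 1.27}
```

The platform's point is everything layered on top of this simple get/put model: replication, integration, and coordination across the rest of the stack.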