📜 ⬆️ ⬇️

Airpal: a web application for working with SQL



The other day we released our new instrument, Airpal. This is a web application designed to work with databases, which is designed to complement Facebook's PrestoDB when analyzing information. And in this post we would like to talk about its capabilities and features.

Airpal is an open source project. A distinctive feature of the application is that it was originally created with an eye on users who are not technical specialists. However, we wanted to give them a tool that allows them to fully work with data arrays, and at the same time help professionals improve the efficiency of the analysis.

People who are regularly involved in sampling and processing data from SQL know that these processes are not always conveniently organized. It is necessary to remember how requests were formulated, to copy and paste them into the command line, to run numerous terminals — all this slows down the work and is not particularly pleasant. In addition, if there are beginners in the team, then for them the learning curve can be quite steep. And we decided that all these annoying problems could be solved with the help of a good UI.
')
Airpal works in conjunction with PrestoDB , an open SQL engine for working with big data. In this case, not only programmers and data processing specialists can work with Presto, but also employees of other professions. In some cases, such a tandem can be an inexpensive alternative to data storage. In addition, Airpal can act as a more user-friendly interface for accessing large amounts of data stored in a Hadoop cluster.

Airpal allowed us to make access to more data much more democratic. The application has been used internally for about a year, and about a third of all employees work with databases through this tool. How many bookings were made per day? How many people overnight spent the night in places found through our service? How many offers are there in a particular region, city, district? How many users speak German or Spanish? Today, in the form of Hive tables in HDFS , we have about 1.5 petabytes of data stored - 25 million bookings in 34,000 cities around the world. And with only a relatively small number of important, key tables, can we use Presto as the default query handler.

Previously, we used the Redshift web service to work with the database. But for several reasons, he did not satisfy us: he demanded installation of a set of ETL tools (extract, transform, load) for preliminary data preparation, had a limit on the number of simultaneously executed queries. In addition, Redshift has very uninformative error messages, which seriously made debugging difficult.

In terms of arbitrary queries and iterations during analysis, Presto is much smarter and faster than traditional MapReduce tools. However, for us the most useful consequence of implementing Presto was the absence of the need to complicate the process of “interactive” query generation. Since we work with our own Hive-storage, we can afford to have “the only source of reliable data”, without heavy copies at a separate storage level. And the fact that we do not need to change the RC format in which data is stored has made Presto an ideal choice for our infrastructure.



Key features of Airpal:


Trying to preserve the ideology of Presto, we tried to simplify the installation of Airpal, so that it can be tested without any difficulties. Detailed information can be found on GitHub .

Source: https://habr.com/ru/post/252859/


All Articles