
Materials from VLDB, a conference about the future of databases

The VLDB conference (Very Large Data Bases, www.vldb.org ), as the name suggests, is dedicated to databases. Very large databases. What the name does not tell you is that very serious people present there regularly. Do you know many conferences where Michael Stonebraker (creator of Vertica, VoltDB, PostgreSQL, and SciDB) gives a talk almost every year? Wouldn't it be great to find out what these people are working on now, so that in a few years, when a new database takes over the market, you won't be kicking yourself?


VLDB is the conference to attend if you are thinking about the future.
It will not help you much if you are choosing among existing databases. There is a small share of industry talks (Microsoft, Oracle, Teradata, SAP HANA, Exadata, Tableau (!)), but the most interesting part is the research papers from universities. Although you quickly discover that almost every university team has one or two people who work at Google, Facebook, or Alibaba ... or who moved there right after submitting the paper.


I hope I have managed to get you interested, so now let's take a look at the papers.



I will not try to describe all 232 papers; instead, I will single out a few key groups and highlight several notable representatives of each.


1. The database of the future


Very soon we will have cheap non-volatile memory (combining the roles of RAM and disk). RAM, CPU cores, and graphics cards are rapidly getting cheaper. What should the database of the future look like to benefit from all this technological abundance? And what new problems arise?


1.1 Distributed Join Algorithms on Thousands of Cores


Self-explanatory from the title: a study of how distributed join algorithms behave on systems with thousands of cores.


1.2 Adaptive Work Placement for Query Processing


Distribution of tasks across a heterogeneous cluster.


1.3 SAP HANA Adoption of Non-Volatile Memory


The first experiments with non-volatile memory.


2. Transactions in distributed (cluster) databases


Life is good and simple on a single server. But what if the database needs to be deployed on a cluster? What if one database suddenly has to be split into dozens of small ones, microservice-style? How do you deal with transactions then?


2.1 An Evaluation of Distributed Concurrency Control


A Stonebraker paper. Simple and honest: they wrote a database from scratch to compare half a dozen distributed transaction algorithms for OLTP systems. No PR and no advertising, just honest graphs and asymptotic behavior for different scenarios.


2.2 The End Of A Myth: Distributed Transactions Can Scale


A very optimistic claim about how well distributed transactions can scale.


3. Swapping out the storage layer


The fashionable approach now is to replace the storage layer of an old database with something fast. For example, an in-memory key-value store. Or two parallel stores at once, one row-oriented and one column-oriented. Or six stores on different physical machines ...
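To make the "two parallel stores" idea concrete, here is a minimal Python sketch of my own (it is not taken from any of the papers; the DualStore class and its methods are purely illustrative): the same table is kept row-wise for point lookups and column-wise for aggregations, and every write goes to both.

```python
# Toy illustration (not any specific system from the papers): two parallel
# representations of the same table, one row-oriented for point lookups
# (OLTP-style) and one column-oriented for aggregations (OLAP-style).

class DualStore:
    def __init__(self, columns):
        self.column_names = list(columns)
        self.rows = {}                           # row store: row_id -> dict
        self.cols = {c: [] for c in columns}     # column store: name -> list of values

    def insert(self, row_id, row):
        """A write is applied to both representations."""
        self.rows[row_id] = dict(row)
        for c in self.column_names:
            self.cols[c].append(row[c])

    def get(self, row_id):
        """Point lookup, served by the row store."""
        return self.rows[row_id]

    def total(self, column):
        """Aggregation, served by the column store (one contiguous list)."""
        return sum(self.cols[column])


store = DualStore(["user", "amount"])
store.insert(1, {"user": "alice", "amount": 10})
store.insert(2, {"user": "bob", "amount": 32})
print(store.get(2))            # {'user': 'bob', 'amount': 32}
print(store.total("amount"))   # 42
```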


3.1 Fast Scans on Key-Value Stores


What it takes to run OLAP workloads on top of a key-value store.


3.2 PaxosStore: High-availability


A paper about how Tencent (WeChat) runs its databases. 800 million active users; go tell them about high load.


3.3 Parallel Replication for Scaling Out Mixed OLTP/OLAP Workloads


OLTP + OLAP load on a single database.


4. Query optimization


As I understand it, the main trend now is query optimization in distributed systems. Ideally on the fly, adjusting and restructuring the plan as the data arrives.


4.1 Runtime Optimization of Join Location in Parallel Data Management Systems


4.2 SquirrelJoin: Network-Aware Distributed Join Processing with Lazy Partitioning


You are running a query on a cluster, the cluster is loaded with other tasks in parallel, and unevenly at that. What do you do when individual nodes start running noticeably slower than the others? The answer is in the paper; a toy sketch of the general idea follows below.
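A minimal Python sketch of the general idea, not the SquirrelJoin algorithm itself: if workers pull join partitions from a shared queue instead of receiving a fixed assignment up front, a straggling node naturally ends up doing less of the work. All names, counts, and delays here are made up for illustration.

```python
# Toy illustration of deferring partition assignment until runtime (NOT the
# SquirrelJoin algorithm): workers pull the next join partition from a shared
# queue when they are free, so a slow (straggling) node simply takes fewer
# partitions, without an explicit rebalancing step.

import queue
import random
import threading
import time

partitions = queue.Queue()
for p in range(12):                 # 12 hypothetical join partitions
    partitions.put(p)

done = {}                           # worker name -> partitions it processed

def worker(name, delay):
    done[name] = []
    while True:
        try:
            p = partitions.get_nowait()
        except queue.Empty:
            return
        time.sleep(delay * random.uniform(0.8, 1.2))   # simulate processing
        done[name].append(p)

threads = [
    threading.Thread(target=worker, args=("fast-1", 0.01)),
    threading.Thread(target=worker, args=("fast-2", 0.01)),
    threading.Thread(target=worker, args=("slow-1", 0.05)),  # straggler
]
for t in threads:
    t.start()
for t in threads:
    t.join()

for name, ps in done.items():
    print(name, "processed", len(ps), "partitions")
```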


5. Visualization and data analysis


5.1 ASAP: Prioritizing Attention via Time Series Smoothing


How to smooth a chart, removing the noise while keeping the anomalies visible.
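A toy Python sketch of the idea, not the paper's actual algorithm: smooth with a moving average and pick the largest window whose kurtosis (a proxy for "spikiness") stays close to that of the original series, so noise gets flattened while spikes survive. The 0.8 tolerance and the window range are arbitrary choices of mine.

```python
# Toy sketch of "smooth aggressively, but keep the anomalies":
# choose the widest moving-average window that still preserves most of the
# original series' kurtosis, so a genuine spike is not smoothed away.

import numpy as np


def moving_average(x, w):
    return np.convolve(x, np.ones(w) / w, mode="valid")


def kurtosis(x):
    x = np.asarray(x, dtype=float)
    m, s = x.mean(), x.std()
    return ((x - m) ** 4).mean() / (s ** 4)


def smooth_preserving_anomalies(x, max_window=50, tolerance=0.8):
    """Return the most aggressive smoothing whose kurtosis stays within
    `tolerance` of the original series' kurtosis."""
    base = kurtosis(x)
    best = np.asarray(x, dtype=float)
    for w in range(2, max_window + 1):
        smoothed = moving_average(x, w)
        if kurtosis(smoothed) >= tolerance * base:
            best = smoothed
    return best


# Noisy series with one genuine spike at position 500.
rng = np.random.default_rng(0)
series = rng.normal(0, 1, 1000)
series[500] += 15
print(len(smooth_preserving_anomalies(series)))
```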


5.2 Effortless Data Exploration with zenvisage: An Expressive and Interactive Visual Analytics System


A very curious interactive tool.


6. Human-machine interface :)


6.1 Data Vocalization: Optimizing Voice Output of Relational Data


"Data vocalization" sounds absolutely fantastic, but the essence is simple: how to compress a query result into a limited number of words, so that you can listen to Siri read it out instead of smashing your phone.
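A toy illustration of the idea, not the paper's method: instead of reading out every row, squeeze a numeric result set into one short, speakable sentence. The function and field names are mine.

```python
# Toy illustration: compress a relational result into a one-sentence summary
# short enough to be read aloud, rather than dictating every row.

def vocalize(rows, value_key, label_key, max_words=25):
    """Turn a list of result rows into a short spoken-style summary."""
    values = [r[value_key] for r in rows]
    top = max(rows, key=lambda r: r[value_key])
    sentence = (
        f"{len(rows)} results, values range from {min(values)} to {max(values)}, "
        f"with an average of {sum(values) / len(values):.1f}; "
        f"the highest is {top[label_key]} at {top[value_key]}."
    )
    # Crude word budget: a voice assistant should not read a novel.
    assert len(sentence.split()) <= max_words
    return sentence


result = [
    {"region": "north", "sales": 120},
    {"region": "south", "sales": 340},
    {"region": "west", "sales": 95},
]
print(vocalize(result, value_key="sales", label_key="region"))
```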


6.2 Provenance for Natural Language Queries


Best paper of VLDB 2017. Yes, exactly: how to query data in natural language. More precisely, how to translate natural-language questions into data queries, and the results back into human language.


Finally


That is actually all. It may not seem like much: I have collected only 14 papers here for you. But I would be very curious to know how many people actually read all of them to the end. If you do, write in the comments how long it took. For the brave, the remaining 218 papers are at this link: http://confer.csail.mit.edu/vldb2017/papers . And here is a photo from the conference organizers' talk.



P.S. VLDB 2017 was held in Munich, and there was a small Oktoberfest for the participants (nice :)). The next VLDB will be in Brazil; come join! I will try to go there with a talk of my own (in 2015 I did not manage to).



Source: https://habr.com/ru/post/338180/

