Hi, Habr! Here is a translation of the article "Tableau talks up natural language interface for creating visualizations" by Peter Sayer.
The BI vendor is looking to simplify and automate data analysis, part of a growing trend toward building AI (artificial intelligence) capabilities into BI tools.
How many statisticians does it take to build a new data model? According to Tableau Software, none at all. The company claims that the next version of its widely used analytics tool will do the job by itself.
Tableau demonstrated this last week with a new feature called Ask Data, which lets users create visualizations by describing what they want in natural language. The demonstration took place at a customer event in New Orleans. The company also showed new automation features in its data preparation tool.
This is part of a growing trend among enterprise software developers to automate and simplify tasks that once required specialized skills, allowing enterprises to use their data more efficiently and to redeploy qualified staff to less routine work.
Dawn of AI technology in BI
Advances in artificial intelligence are making it easier for enterprise software developers to accept input in natural language (spoken or typed) and work out what information the user needs, instead of forcing users to learn specific commands or manipulate objects on the screen to reach their goals. AI is increasingly being used in leading BI tools in the hope of “democratizing” analytics and data science.
Microsoft Power BI, a competitor to Tableau, introduced a feature called “Q&A” (questions and answers) several years ago, but even in recent demonstrations it appears to be fussier about grammar and spelling than Tableau's Ask Data. Nevertheless, both are ahead of Dundas BI and similar products, which still rely on drag and drop to create visualizations.
The Tableau implementation will let users query the database and leave it to the software to figure out how the database tables should be joined, which columns should be selected, and which operations should be performed to get the required answer. This and other new features will appear in Tableau 2019.1, which is expected early next year, with a beta version released in late October.
“Such automation features are welcome and necessary,” said Martha Bennett, principal analyst at Forrester. “We are getting more and more data, but the people who work with it do not have that much time.”
According to her, data specialists spend up to 80 percent of their time preparing data, and the less time they spend on that, the more they can devote to the BI work that directly benefits the business.
One way to overcome specialists' shortage of time is to hand most of the work over to machines. Another is to make data easier to work with for people who previously could not handle it on their own because it required special skills. This is the so-called “democratization” of data.
Disadvantages of using AI
“But there are risks in giving more employees access to data: data cannot substitute for domain expertise and a sober assessment of the situation,” said Martha Bennett.
“Before making new automation features widely available, CIOs should try them out for themselves to determine whether they are suitable,” she advises.
Tools that offer data analysis without clear recommendations can leave users unsure what action to take.
“If you don’t give someone detailed instructions, you shouldn’t expect them to get everything right the first time.”
- Martha Bennett, Forrester Principal Analyst
However, you can’t just put all the responsibility on the software.
“Automation is not the same as control. All these things still need to be monitored. It will not sound very good in court if you say that the computer did it by itself and we have no idea why,” Martha Bennett warns. This problem has long been known as the AI black box problem.
Additionally, you need to find out if your data is suitable for an automation tool: in particular, machine learning systems require a lot of data to work.
“If you apply machine learning algorithms to data where you have more exceptions than normal cases, it will not work,” she said.
Demo details
At the event in New Orleans, visual analytics manager Andrew Vigno demonstrated Ask Data on a database of Kickstarter crowdfunding projects, showing that, unlike most compilers, Ask Data does not need perfect punctuation to work.
The software turned his request “what was the total funding” (typed exactly like that) into “amount of funding” and returned the answer. When he typed “by year” and “by status”, Ask Data expanded the query to “amount of funding by year and by status”. Then, with no further input, it drew a color-coded line chart showing funding for successful projects (green) rising every year, while funding for unsuccessful, canceled, or suspended projects (red, orange, and yellow) stayed flat.
The question “which categories were successful” produced another visual answer: Ask Data added “by category, filter status - successful” to the previous query and drew a bar chart ranking Kickstarter categories by the number of successful projects in descending order.
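The way each typed phrase is folded into the previous query can be illustrated with a toy sketch. This is not how Ask Data is implemented (the article does not describe its internals); the `QueryState` class, the `refine` function, and the keyword table are all hypothetical, and the matching rules are deliberately naive.

```python
from dataclasses import dataclass, field

# Hypothetical vocabulary: words a user might type, mapped to dimension names.
DIMENSION_WORDS = {
    "year": "year",
    "status": "status",
    "category": "category",
    "categories": "category",
}


@dataclass
class QueryState:
    """Running query that each new phrase refines (illustrative only)."""
    measure: str = ""
    group_by: list = field(default_factory=list)
    filters: dict = field(default_factory=dict)

    def describe(self) -> str:
        parts = [self.measure]
        parts += [f"by {dim}" for dim in self.group_by]
        parts += [f"filter {key} = {value}" for key, value in self.filters.items()]
        return ", ".join(part for part in parts if part)


def refine(state: QueryState, phrase: str) -> QueryState:
    """Fold one typed phrase into the running query using naive keyword rules."""
    words = phrase.lower().split()
    if "funding" in words:
        state.measure = "amount of funding"
    for word in words:
        dim = DIMENSION_WORDS.get(word)
        if dim and dim not in state.group_by:
            state.group_by.append(dim)
    if "successful" in words:
        state.filters["status"] = "successful"
    return state


state = QueryState()
for phrase in ["what was the total funding", "by year", "by status",
               "which categories were successful"]:
    refine(state, phrase)
    print(state.describe())
```

The point of the sketch is only that the query is stateful: later phrases add dimensions and filters to what was asked before rather than starting over.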
Employees have long wanted enterprise software to do what they mean even when they cannot phrase the task precisely, and Andrew Vigno showed that Tableau is getting closer. When he typed “compare with the average funding” (verbatim), Ask Data showed him how the number of projects varied alongside the average funding for the subcategories of technology projects he had been viewing earlier.
Some things in Tableau are still faster to do with the mouse, especially if you type slowly: adding the “fashion” and “game” subcategories to the scatter plot takes only four clicks.
Creating new data models
A few clicks were all his colleague Tyler Doyle needed to create a new data model, which maps the fields Tableau uses for analysis onto SQL queries that the underlying database can understand.
“All I have to do is click on one line, ‘Add related objects’, and the data model is ready, without having to decide which tables to use, how they are related, or whether it should be a left or right join. The new data modeling features in Tableau just do it for you.”
- Tyler Doyle
“How did the data model know the correct relationships between these tables?” Doyle asked. It turns out that Tableau relies on CIOs, as well as their database administrators and data specialists. For Tableau to pull off this trick, they need to make sure that the necessary information is stored in the data warehouse.
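As a rough illustration of why that stored information matters, the sketch below assumes the "necessary information" is a set of declared foreign keys and derives join conditions from it with a breadth-first search. The article does not say which metadata Tableau actually reads or how it uses it; the table names, the `FOREIGN_KEYS` list, and the `join_path` function are hypothetical.

```python
from collections import deque

# Hypothetical foreign-key metadata, as a warehouse catalog might expose it:
# (child_table, child_column, parent_table, parent_column)
FOREIGN_KEYS = [
    ("orders", "customer_id", "customers", "id"),
    ("order_items", "order_id", "orders", "id"),
    ("order_items", "product_id", "products", "id"),
]


def build_graph(foreign_keys):
    """Turn foreign keys into an undirected graph of tables and join conditions."""
    graph = {}
    for child, child_col, parent, parent_col in foreign_keys:
        condition = f"{child}.{child_col} = {parent}.{parent_col}"
        graph.setdefault(child, []).append((parent, condition))
        graph.setdefault(parent, []).append((child, condition))
    return graph


def join_path(foreign_keys, start, goal):
    """Return the join conditions linking start to goal, or None if unrelated."""
    graph = build_graph(foreign_keys)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, conditions = queue.popleft()
        if table == goal:
            return conditions
        for neighbor, condition in graph.get(table, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, conditions + [condition]))
    return None


for condition in join_path(FOREIGN_KEYS, "customers", "products"):
    print(condition)
```

If the keys are not declared in the warehouse, a search like this has nothing to work with, which is the dependency on administrators that the paragraph above describes.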
Data preparation is another area that Tableau is working on. Senior engineer Zahira Valani showed how Tableau Prep can automate data cleansing with “roles”. Tableau uses them to identify fields that have a specific role, such as URLs, email addresses, or geographic values (states or postal codes). Valani showed how, in just a couple of clicks, Tableau Prep can inspect the contents of a field to determine the most appropriate role, then pick out invalid values that do not match the role and either set them to “null” or filter out those rows. The same can be done with user-defined roles, for example enumerated types.
According to Tableau’s Chief Product Officer, François Ajenstat, Tableau Prep will be updated monthly, in contrast to the schedule of three releases per year for the main Tableau software.
Scheduling is the job of another tool that the company is now testing: Tableau Prep Conductor. It will allow enterprises to automate the preparation of their data sources, loading them into Tableau on a schedule of their choosing. This is a separate product from Tableau and will require a separate license. Sales are scheduled to start next year.
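The role-based cleanup described above can be approximated in a few lines. This is a minimal sketch under stated assumptions, not Tableau Prep's actual logic: the regular expressions are rough stand-ins for real validators, and `ROLE_PATTERNS`, `infer_role`, and `apply_role` are hypothetical names.

```python
import re

# Hypothetical, deliberately loose patterns for a few roles.
ROLE_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "url": re.compile(r"https?://\S+"),
    "us_zip": re.compile(r"\d{5}(-\d{4})?"),
}


def infer_role(values):
    """Pick the role whose pattern matches the most values in the field."""
    def score(role):
        pattern = ROLE_PATTERNS[role]
        return sum(1 for value in values if pattern.fullmatch(value))
    return max(ROLE_PATTERNS, key=score)


def apply_role(values, role, drop_invalid=False):
    """Null out values that do not fit the role, or drop them entirely."""
    pattern = ROLE_PATTERNS[role]
    if drop_invalid:
        return [value for value in values if pattern.fullmatch(value)]
    return [value if pattern.fullmatch(value) else None for value in values]


field_values = ["a@example.com", "not-an-email", "b@example.org"]
role = infer_role(field_values)
print(role)
print(apply_role(field_values, role))
print(apply_role(field_values, role, drop_invalid=True))
```

A user-defined role such as an enumerated type would fit the same shape, with a set-membership check in place of the regular expression.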