
Text Analytics as Commodity: Text Analytics Application Overview

[Figure: text analytics landscape]

If I were given a billion dollars for research, I would create a large NASA-scale natural language processing (NLP) program. [from the Reddit AMA with Michael Jordan, 2015]. From this publication, you will find out whether there is a market for text analytics applications, and whether the respected professor M. Jordan is too optimistic about the potential of NLP and the billion dollars would be better spent on something else.

Introduction


First, let us define the terms. Text mining is a technology for obtaining structured information from collections of text documents. Typically, this concept includes such major tasks as:

Often, when people talk about applying text mining in business (text analytics), they mean not just structured information but an in-depth understanding of the subject of analysis (so-called insights) that helps in making business decisions. The well-known expert Seth Grimes defines text analytics as the technological and business processes of applying algorithmic approaches to processing text, extracting information from it and gaining a deep understanding.

It is generally accepted that a new market for cognitive computing products is taking shape. MarketsandMarkets estimates that the global market for products based on natural language processing will reach $13.4 billion by 2020, at a compound annual growth rate (CAGR) of 18.4%. This implies that the market is currently worth about $5.8 billion. In recent years, this growing market has seen a number of high-profile deals, such as IBM's acquisition of AlchemyAPI. According to other estimates, a similar market in Europe already exceeds half a billion dollars and will double by 2019. The North American market accounts for almost 40% of the global text analytics market, and its growth estimates are optimistic.
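The current-market estimate above can be reproduced by discounting the 2020 forecast at the quoted CAGR. Here is a minimal sketch of that arithmetic. It assumes a five-year horizon (2015 to 2020), which the text implies but does not state, and the function name is purely illustrative:

```python
def discount_by_cagr(future_value: float, cagr: float, years: int) -> float:
    """Back out the present value implied by a forecast and a compound annual growth rate."""
    return future_value / (1.0 + cagr) ** years


# Figures quoted in the text; the 5-year horizon (2015 -> 2020) is an assumption.
forecast_2020_bn = 13.4
cagr = 0.184

current_bn = discount_by_cagr(forecast_2020_bn, cagr, years=5)
print(f"Implied current market: ${current_bn:.1f} billion")
```

With these inputs the result rounds to the $5.8 billion figure quoted in the text.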

The reader is certainly familiar with the success of the IBM Watson platform. The purpose of this publication is to describe other interesting and possibly lesser-known text analytics applications in such areas as:


Corporate search


Searching an organization's documents is a well-known information retrieval application in the field of corporate document management. Clients of such solutions include large and medium-sized commercial organizations as well as government agencies. The reader may ask a reasonable question: why build a separate search engine when Yandex and Google exist? As it turns out, web search and corporate search differ in a number of major ways:

In addition, the task has important features of its own, such as the availability of the organization's structured directories and knowledge bases, the need to integrate with various storage and analytics subsystems, and support for multiple data formats.

[Figure: Gartner enterprise search]

Wikipedia provides an impressive list of products in the field of corporate search. Gartner names HP Autonomy and Coveo among the world leaders. However, none of them is without flaws (for example, in their support for the Russian language). Therefore, this area is still promising for new applications.

E-commerce search


Search products for online stores can be considered a special type of corporate search. Here the importance of search is almost decisive for the client's business: online retailers are constantly looking to increase conversion rates, average order value, margins and the speed at which goods sell. According to a recent survey of Russian e-commerce prepared by the analytics agency DataInsight, at least 20% of shoppers point out the importance of search as a feature of an online store. Moreover, it is known that users who search the site are themselves a high-conversion group of visitors. And in an online store's mobile application (m-commerce), search is essentially the only function used. A specific feature of the task is the need to support an exploratory scenario for customers who cannot state precisely what product they want to buy. In particular, this is achieved by suggesting search queries as the user types (real-time query suggestions).
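One common way to produce real-time query suggestions is prefix matching over a log of past queries, ranked by popularity. The sketch below illustrates that idea only. It is not how any product mentioned here works, and the class name, the sample queries and their counts are all hypothetical:

```python
from bisect import bisect_left


class QuerySuggester:
    """Prefix-based suggestions over a log of past queries (hypothetical data)."""

    def __init__(self, query_counts: dict[str, int]) -> None:
        # Normalize to lowercase so matching is case-insensitive.
        self._counts = {q.lower(): c for q, c in query_counts.items()}
        # A sorted list lets us jump to the first candidate with binary search.
        self._sorted = sorted(self._counts)

    def suggest(self, prefix: str, limit: int = 3) -> list[str]:
        prefix = prefix.lower()
        start = bisect_left(self._sorted, prefix)
        matches = []
        for query in self._sorted[start:]:
            if not query.startswith(prefix):
                break  # sorted order: no later entry can match the prefix
            matches.append(query)
        # Most popular first; ties are broken alphabetically.
        matches.sort(key=lambda q: (-self._counts[q], q))
        return matches[:limit]


# Made-up query log: query text -> number of times it was searched.
suggester = QuerySuggester({
    "red dress": 120,
    "red shoes": 300,
    "red sofa": 45,
    "running shoes": 210,
})
print(suggester.suggest("red"))
```

A production system would typically add typo tolerance, personalization and catalog awareness. The sketch covers only the basic step of completing what the user has typed so far.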

Successful exit stories in this niche market include:

[Figure: Oracle Commerce platform]

Oracle Commerce Guided Search (based on Endeca) is a powerful solution with flexible ranking options and personalized search. SAP has a similar solution: the search technology in its e-commerce platform Hybris. Also worth mentioning is A9, a spin-off of Amazon, the largest American online retailer. It is noteworthy that the leading search engines have tried to enter this promising market, but their success is not obvious. Google quietly offers its Commerce Search, covering the segment of expensive implementations, including in Russia. One shortcoming of this solution is the lack of search customization. Yandex, as far as the author knows, did not carry out a pilot for one of the largest Russian online retailers. In general, on this topic I strongly recommend the materials of Anton Terekhov, who leads the Shopolog project.

Monitoring object references in the media environment


The Internet generates petabytes of text every day (according to BI, Facebook alone accounts for 500 TB/day; "the reader is already waiting for the term Big Data, so here, take it quickly!"). Modern companies are interested in news, press releases, reviews, comments, blog posts and social media posts. Collecting and packaging such data into a single stream, the so-called firehose, is becoming a serious business whose significant players are Gnip (recently acquired by Twitter), Datasift, Xignite, Diffbot, Kimono and Connotate. The APIs of the largest search engines also belong to this category. But data is only half the battle. Other companies offer their customers those very insights.

[Figure: Sysomos dashboard]


The first group consists of companies that work with social media (social media listeners). Meltwater analyzes billions of posts on social networks, tracking mentions of the customer's brand, finding opinion leaders and comparing the brand with its competitors. The end users of this product are brand managers. Cision complements this functionality with the ability to measure the effectiveness of brand promotion campaigns. Sysomos offers similar functionality plus the ability to respond to reviews by joining the conversation. Luminoso Dashboard offers a nontrivial visualization of mentions based on a keyword cloud. Radian6 adds lead generation and direct sales capabilities. NewsWhip identifies early trends, high-profile stories and memes, which is in demand among online publishers and marketers. Russian social media monitoring systems include YouScan, Kribrum, BrandAnalytics and SemanticForce.

Another group includes companies that deliver, in real time, the results of so-called external due diligence on companies of interest, that is, structured information about customers, leads or competitors (new contracts, products, court decisions, takeovers, hiring, changes in management, etc.). Among the leaders in this area are the LexisNexis News Company Research and InsideView for Sales products.



Marketing


Laboratory market research with focus groups has significant drawbacks: the long search for participants, the weak representativeness of the sample, and differences in motivation and behavior between research participants and real customers. Feedback from real customers is much more valuable. For example, Attensity Analyze provides structured feedback on a product whose sales have been launched in test mode. Its clients (for example, a telecommunications company) receive, in a matter of hours, a list of the top problems with their products as described by real users on social networks. This makes it possible to correct deficiencies and make the necessary changes before a full launch.

[Figure: Clarabridge dashboard]

Clarabridge analyzes customer opinions expressed in questionnaires, social media and support calls for the purpose of predictive analytics. By combining various signals (for example, purchase history and demographic data) with the results of sentiment analysis of reviews, it is possible to predict the likelihood of churn, repeat purchase and lifetime value (LTV).
The Bloomreach Relevance Engine analyzes the content of web pages, patterns of user behavior and user queries in order to create highly relevant content, which significantly increases conversion to purchase.

Smart sales


Open data about a company's activities on the Internet provide enough information to be put to commercial use. For example, Datanyze, BuiltWith and HG Data track information about the technologies used by various companies. Datanyze customers use this information for smart sales: offering support services or their own competing technologies to the right leads.

Spiderbook builds a database of companies, making life easier for its customers' sales departments in the B2B market. The data include a company profile and a list of recommended contacts within the company. Quid has advanced search capabilities and easy-to-use graphical visualization for decision making. Introhive focuses only on gathering contacts, their relationships with other people, and recommendations for establishing personal contact.

[Figure: SalesPredict dashboard]


SalesPredict, Infer and LeadSpace collect all available information on lead companies (including by licensing corporate knowledge bases such as Orb Intelligence, Dun & Bradstreet and LinkedIn): number of employees, number of open positions, level of presence on the Web and social networks, patents, technologies, trademarks, events, etc. They build a complex mathematical model that combines these data with information from your CRM and calculate the so-called lead score, an estimate of the likelihood that a specific lead will buy your service.
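The text does not describe the models these vendors actually use. As an illustration only, lead scoring can be sketched with a simple logistic model that maps a few company features to a probability-like score. The feature names, weights, bias and sample leads below are all hypothetical, and a real system would learn its weights from historical CRM outcomes:

```python
import math

# Hypothetical weights; in practice they would be fitted to past won/lost deals.
WEIGHTS = {
    "open_positions": 0.04,        # hiring activity
    "uses_competitor_tech": 0.9,   # 1 if a competing technology was detected
    "recent_funding": 0.7,         # 1 if a funding event was reported recently
}
BIAS = -2.0


def lead_score(features: dict[str, float]) -> float:
    """Logistic score in (0, 1): a rough estimate of the chance that the lead buys."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Made-up leads for demonstration.
leads = {
    "Acme Corp": {"open_positions": 25, "uses_competitor_tech": 1, "recent_funding": 1},
    "Globex": {"open_positions": 2, "uses_competitor_tech": 0, "recent_funding": 0},
}

# Rank leads from most to least promising.
ranked = sorted(leads, key=lambda name: lead_score(leads[name]), reverse=True)
for name in ranked:
    print(f"{name}: {lead_score(leads[name]):.2f}")
```

The sales team would then work the list from the top. The text analytics part of such a product lies in extracting the input features (events, technologies, hiring signals) from unstructured sources in the first place.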

Information Security


Internal corporate data are of particular value. Modern large companies need to control the dissemination of information by intercepting email and messages in business messengers and on collaboration servers, both to prevent leaks of personal data and client databases and to identify unscrupulous employees engaged in espionage or sabotage. Another classic example of a text analysis task is the fight against spam. Products of this kind, built on the basic text analytics toolkit, are available from Kaspersky Lab and Infowatch. Digital Reasoning is developing a platform to combat financial abuse.

[Figure: Palantir dashboard]


Government agencies are increasingly turning to applications based on text analysis, as a rule in the fight against extremism and in intelligence work: collecting, summarizing and analyzing information from a huge number of sources. The famous American startup Palantir develops products for military, intelligence and investigative agencies, among others, as well as for business, such as the Palantir Gotham system.

Personal assistants


Text is a universal presentation format. Messages can be converted into text from other digital sources, such as images, telephone recordings, voice messages and the audio tracks of video files. Recent breakthroughs in the theory of deep neural networks (deep learning) have made it possible to reach a qualitatively new level in speech and image recognition. An accessible tool for speech recognition in Russian is the Yandex SpeechKit technology.

It is clear that combining speech recognition with text processing can produce impressive results. Voice search from Google, Siri, and Cortana from Microsoft have already become traditional applications. In addition, I would point to the demo video of the DragonDrive product from Nuance, one of the leaders in this field. DragonDrive is a smart assistant that helps the driver read emails, send messages and create calendar entries by voice control. Another spectacular example is the promo of the Gridspace Memo product, which keeps a log of corporate meetings. Finally, Pop Up Archive aims to implement search across all types of audio files, which is useful for organizations that work a lot with audio: radio stations, media, museums and libraries. Kasisto develops technology for communicating with a virtual assistant on financial matters. Viv, from the creators of Siri, and Robin Labs, with its contextualization and multilingual capabilities, are two more ambitious projects in this area.

Conclusion


Summing up, the situation in the global market for text analytics applications is aptly captured by the phrase in the title of this publication: text analytics has become a commodity available to any developer. Technologies for analyzing unstructured data have reached a qualitatively new level and open up tremendous business opportunities.

In recent years, tools have emerged, including cloud-based ones, that solve with high quality the basic tasks underlying text analytics and machine learning. Relying on such products, technology companies can develop new applications for the end user while reducing technological risks and the costs of development, infrastructure and licenses for third-party text processing products. An example is the Textocat API, a multilingual text analysis service that implements a standard set of functions:

Another project of our company will soon be publicly available: TextoKit, a free and open library that implements lower-level functions for working with text and is built on the same platform as IBM Watson. We plan to organize a community of developers around TextoKit and to share our unique experience in building a scalable natural language processing pipeline. If you are interested in the rapid development of this project and are ready to help prepare documentation for the existing code, please write to us at mail@textocat.com.

As examples of products based on the Textocat API, I will cite two other products of our company: Textocat News and Textocat E-Commerce Search.

Textocat News is currently being prepared for release. It is a technology for collecting news about companies with automatic classification by event type. The technology is integrated with the Orb Intelligence API, which offers profiles (the so-called company DNA) of several million American and international companies. The recent case of the phenomenal growth of Google shares shows how such information can be used in real time.

At the moment, our company is running a closed, limited beta test of Textocat E-commerce Search, a specialized search technology for online retailers built on the Textocat API platform. We invite representatives of online stores who care about the effectiveness of their business to test our product.

Analysis of the text analytics market in Russia / CIS


Textocat is conducting a study of the text analytics market in Russia and the countries of the former Soviet Union. We invite you to fill out a short form. A report based on the survey results will be available to survey participants.

The author thanks colleagues from Textocat (Alik_Kirillovich, Aldvin, nomemm), Maxim Gubina (Google) and Maria Grineva (Orb Intelligence) for valuable comments.

About Textocat

Textocat is a technology company that creates business solutions for extracting useful information from unstructured data. The company's mission, text analytics as commodity, is to make text processing easy for the modern software developer.

Source: https://habr.com/ru/post/259035/

