
Big Data in Hadoop by Subscription in the SAP Cloud

Today we will talk about one SAP service that reflects our new approach to building products and working with customers: SAP Cloud Platform Big Data Services, which lets customers work with big data in Hadoop under a cloud subscription model.

In this first article, we will look at how Big Data analysis can help a business in practice, how cloud and on-premise Hadoop differ, and what the basic functions, services and technologies of SAP Cloud Platform Big Data Services are. In the following articles we will examine the technological features and individual services of this solution in more detail.

Big Data in Business

Everyone knows that SAP's customers include many large Russian and global companies in manufacturing, metallurgy, oil and gas and other "conservative" industries, for which we develop and implement IT solutions and systems. These companies are now investing more and more in new technologies: the Internet of Things, machine learning and Big Data (in particular, trying to extract new value from that data). For steel companies, for example, finding new sources of profit or ways to cut costs is crucial under current economic and geopolitical conditions. One such way is to look for new insights in big data, which tells us about the business, its work processes and the outside world as a whole.

There are many solutions on the market for storing and working with big data, both free open source and commercial. The most popular is Hadoop with its additional components. Among the reasons for its demand:


The popularity of free open source solutions is easy to understand. However, when Hadoop is deployed for production use, the free open source versions are, as a rule, not used in their pure form. In the business world, commercial distributions of the open source Hadoop products have gained popularity. They are supplied by Cloudera, Hortonworks and other vendors, and in this case the vendor is responsible for the reliability of the software and the interaction of all components. There are also alternative services that let you work with big data in the cloud, by subscription.

Businesses often face a dilemma: which approach to big data to choose, on-premise (local) or cloud. Most internal IT departments, of course, vote for the first option because of traditional concerns about the cloud.

The research company Forrester surveyed companies that work with Big Data about how they run their Hadoop solutions, in the cloud or on-premise. 37% of respondents said they plan to increase investment in cloud services for Big Data by 5% to 10%. Another 14% said they would increase spending on cloud Hadoop solutions by more than 10%. Why do they opt for the cloud?

Running Hadoop on your own servers is easy in the early stage of working with Big Data, while you are experimenting with data and testing hypotheses. It is another story when you need to put a solution into production with specific requirements: an availability SLA of 99.9%, highly reliable storage of huge volumes of data, and target performance KPIs that must be met.
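
To get a feel for what a 99.9% availability SLA means, the figure can be converted into a downtime budget. The sketch below is plain arithmetic; the function name and the 30-day month are assumptions made for illustration and are not part of any SAP tooling.

```python
HOURS_PER_YEAR = 365 * 24   # 8,760 hours, ignoring leap years
HOURS_PER_MONTH = 30 * 24   # assumption: a 30-day month


def allowed_downtime_hours(availability_pct: float, period_hours: float) -> float:
    """Return the maximum downtime (in hours) that still meets the SLA."""
    return period_hours * (100.0 - availability_pct) / 100.0


if __name__ == "__main__":
    yearly = allowed_downtime_hours(99.9, HOURS_PER_YEAR)
    monthly = allowed_downtime_hours(99.9, HOURS_PER_MONTH)
    print(f"Per year:  {yearly:.2f} hours")
    print(f"Per month: {monthly * 60:.1f} minutes")
```

In other words, 99.9% leaves roughly 8.76 hours of downtime per year, or about 43 minutes per month, for all failures and maintenance combined.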

If you choose to run Hadoop on-premise in production, you will have to solve the following tasks:


Keep in mind that this preparatory stage takes considerable time. That is why companies weigh on-premise deployment against a cloud service.

One of the reports by the consulting firm Bain & Co. cites Netflix as an example. In 2016, the company announced that to process Big Data it has to run thousands of data nodes under heavy load. Every day it processes 350 billion user events and petabytes of data from its services. At that scale you cannot get by with your own servers alone, unless you are prepared to keep building data centers.

Another example, from the more “traditional industries”, is General Electric. In 2013, the company began moving from its own data centers to the cloud. The oil and gas divisions switched to the new service first, and then the migration of more than 9,000 of the company's infrastructure applications began. As a result, General Electric reduced the number of its own data centers from 30 to 4, and with it the cost of personnel, equipment and so on.
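
The scale of that daily figure is easier to grasp as a per-second rate. The following is a rough back-of-the-envelope sketch that uses only the 350 billion events per day quoted above; it gives an average, so real peak load would be higher.

```python
EVENTS_PER_DAY = 350_000_000_000   # figure quoted in the text
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds


def average_events_per_second(events_per_day: int) -> float:
    """Average event rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = average_events_per_second(EVENTS_PER_DAY)
    print(f"{rate:,.0f} events per second on average")
```

That is about 4 million events every second, around the clock.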

SAP has not stayed away from the cloud trend. In 2016, we were joined by the team from Altiscale, one of the world's leading providers of Big Data as a Service. Their solution became a new product, SAP Cloud Platform Big Data Services, which is available to SAP customers under a cloud subscription model and has been integrated into SAP's overall cloud landscape.

The developers of this solution are the former Chief Technology Officer (CTO) of Yahoo and his colleagues who were involved in developing Hadoop at the company. Over 7 years at Yahoo, they turned a small Hadoop project into a production system with more than 42,000 data nodes.

What is SAP Cloud Platform Big Data Services, the Hadoop service from SAP

SAP Cloud Platform Big Data Services is a set of tools for working with big data under the SaaS (Software-as-a-Service) model.

Let us look at the architecture of SAP Cloud Platform Big Data Services.

image

The service includes three main parts:

Apache Hadoop Cluster

The cluster uses an ODPi-certified Hadoop distribution. This means that applications and scripts that run in the ODPi-compliant environments of other services will also run on SAP Big Data Services.

* For reference, ODPi (Open Data Platform initiative) is a non-profit organization that standardizes Hadoop and its components. Apart from SAP, ODPi includes such well-known vendors as Hortonworks, IBM, SAS and many others.

Besides the data nodes, the cluster includes three types of control nodes that serve them: the namenode, the secondary namenode and the resource manager (YARN is included in the initial configuration of the service).

The secondary namenode also hosts additional services such as Oozie and Hive Metastore. On connection, each client is given a separate cluster with the resources it needs. Resources are defined by storage capacity and the number of machine hours. If necessary, cluster resources can be expanded flexibly, either temporarily for critical computations or on a permanent basis.

Workbench is a single point of access to the Big Data Service.

For security reasons, direct access to the Hadoop cluster is limited to service personnel and Workbench. The client has access only to Workbench, which includes local Hadoop as well as Hive, Spark, Oozie, Pig and other components needed for data science and data engineering, including SAP Lumira and SAP Predictive Analytics.

More information about the service can be found on the website.

Using Workbench, a client can run scripts, explore data using Business Intelligence tools, and perform other tasks. In turn, Workbench works closely with the Hadoop cluster over a high-speed channel.

Big Data Service Portal

The portal is used to manage users, generate access keys to the Big Data Service, view cluster usage statistics, and perform other operational tasks on the client side.

A jumphost server connects the Big Data Service to the outside world. All network traffic stays within a private IP address space, a virtual private cloud. The standard way to access the Big Data Service is SSH; other connection options are available on customer request. The Big Data Service also supports Kerberos authentication, which makes Single Sign-On (SSO) possible.
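
As an illustration of SSH access through a jumphost, here is a minimal sketch that assembles a standard OpenSSH command line using the `-J` (ProxyJump) option, which OpenSSH has supported since version 7.3. The host names, the user name and the idea that the target behind the jumphost is the Workbench are assumptions made for this example; the actual connection details are issued by the service.

```python
from typing import List


def build_ssh_command(user: str, jumphost: str, target: str) -> List[str]:
    """Build an OpenSSH argument list that reaches `target` via `jumphost`."""
    # -J tells OpenSSH to open the connection through the jump host first.
    return ["ssh", "-J", f"{user}@{jumphost}", f"{user}@{target}"]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    command = build_ssh_command("alice", "jumphost.example.com", "workbench.internal")
    print(" ".join(command))
```

Building the command as a list rather than a single string means it can be passed directly to `subprocess.run` without shell quoting issues.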

Big Data Service can interact with other SAP cloud services as well as with on-premise solutions. The following options are available for integration:


The communication channels connected to the Big Data Service are set up so that data can be loaded from client systems at high speed.

The Big Data Service roadmap for next year includes integration with SAP's solutions for working with big data, SAP Vora and SAP Data Hub. We will cover them in more detail in one of the following articles.

How SAP Cloud Platform Big Data Services differs from other cloud Hadoop solutions

The main difference between the SAP solution and others is that it fits naturally into business processes through integration with other SAP services and systems. This is the key factor in monetizing big data in practice. If only data scientists see the results of data analysis in Hadoop, they still have to convince business users to act on the new ideas, and there is no guarantee that the hypotheses will ever be applied. SAP Cloud Platform Big Data Services can be integrated directly with the company's internal IT systems as part of a business process. In the next article we will describe in more detail how the SAP solution differs from others and how to embed the results of big data specialists' work into business processes.

Customer Cases Using SAP Cloud Platform Big Data Services

Glu Mobile

Glu Mobile is one of the major global mobile game developers, with successful titles including Cooking Dash, Deer Hunter, Contract Killer, Kim Kardashian: Hollywood and Frontline Commando. The company has development studios around the world, one of which is located in Moscow.

Glu Mobile develops and supports free-to-play games, which are free to download and are monetized through in-game microtransactions. For such games, it is important to keep players engaged over a long period.

Glu Mobile's projects have a daily audience of more than 5 million active users, and the company's games have been installed more than 1.3 billion times in total. With an audience on this scale, the company has two goals: to make the games comfortable and fun to play, and to increase the LTV (lifetime value) of each player.

To do this, the company collects huge amounts of data from its projects in real time:


Initially, Glu Mobile tried to use an on-premise Hadoop solution, but ran into the following difficulties:


After moving to SAP Cloud Platform Big Data Services, the Glu Mobile team achieved the following results:


How Glu Mobile uses the SAP Cloud Platform Big Data Services:

image


Neustar MarketShare DecisionCloud

Neustar is a company that provides clients with services for analyzing the results of marketing campaigns and user behavior. The company collects a variety of data in many industries: retail, finance, pharmaceuticals, automotive and technology.

Neustar currently stores about 2.5 petabytes of data in SAP Cloud Platform Big Data Services.

When Neustar used its previous Big Data platform, it had the following problems:


After moving to the SAP Cloud Platform Big Data Services solution, the company received the following benefits:


First Data

First Data is a company that processes bank card transactions. It is the largest bank card processing service in the United States (up to 45% of the market).

In the first stage of implementation, in 2015, SAP Cloud Platform Big Data Services expanded First Data's functionality for small businesses and reduced costs by $500,000. In the second stage, in 2017, the solution helped introduce detection of bank card fraud and saved the company another two million dollars in ACV.

Using the SAP solution also enabled First Data customers to obtain the following information:


Problems associated with the previous infrastructure for working with Big Data:


When choosing a SAP solution, First Data was guided by the following objectives:


What benefits did the transition to the SAP solution bring?


One result of the move to the SAP solution: SQL queries on the new platform run 30 times faster than expected.

A brief summary of the article about SAP Cloud Platform Big Data Services:


image

In the next article we will talk more about the development plans for SAP Cloud Platform Big Data Services: integration with other SAP services and solutions, new functions and applications, and so on.

If you have read this far and want to try SAP Cloud Platform Big Data Services hands-on, please contact us for free trial access to the service.

Source: https://habr.com/ru/post/344720/

