Today we will talk about an SAP service that reflects our new approach to building products and working with clients: SAP Cloud Platform Big Data Services, which lets customers work with big data in Hadoop under a cloud subscription model.
In this first article, we look at how Big Data analysis can be useful for business in practice, how cloud and on-premise Hadoop differ, and the basic functions, services and technologies in SAP Cloud Platform Big Data Services. In the following articles we will examine the technological features and individual services of this solution in more detail.
Big Data in Business
Everyone knows that SAP customers include many large Russian and global companies from manufacturing, metallurgy, oil and gas and other "conservative industries", for which we develop and implement IT solutions and systems. These companies are now increasingly investing in new technologies such as the Internet of Things, machine learning and Big Data, seeking in particular to extract new value from their data. For example, for steel companies in the current economic and geopolitical conditions it is crucial to find new sources of profit or ways to reduce costs. One such way is to search big data for new insights about the business, its work processes and the outside world as a whole.
There are many solutions for storing and working with big data on the market - both free open source and commercial products. The most popular solution is Hadoop and its additional components. Among the reasons for its demand:
Reliability
Scalability
The optimal cost of storing information
A large number of additional open source components for processing data in Hadoop, such as Spark and Hive
A large number of specialists on the market who know how to work with Hadoop
The popularity of free open source solutions is obvious. However, free open source versions are rarely used in their pure form when Hadoop is deployed in production. In the business world, commercial distributions of the open source Hadoop products have gained popularity, supplied by Cloudera, Hortonworks and other vendors. In this case, the vendor is responsible for the reliability of the software and the interaction of all components. There are also alternative services that let you work with big data in the cloud, by subscription.
Businesses often face a dilemma: which approach to working with big data to choose, on-premise (local) or cloud. Of course, most internal IT departments vote for the first option because of the traditional concerns about clouds.
The research company Forrester surveyed companies that work with Big Data about how they use their Hadoop solutions, in the cloud or on-premise. 37% of respondents said they plan to increase investment in cloud services for Big Data by 5% to 10%. Another 14% said they would increase spending on cloud Hadoop solutions by more than 10%. Why do they opt for the cloud?
Running Hadoop on your own servers is easy at the initial stage of working with Big Data, while you are experimenting with data and testing hypotheses. It is a different story when you need to put a solution into commercial operation with specific requirements: an availability SLA of 99.9%, highly reliable storage of huge amounts of data, and meeting target performance KPIs.
If you choose to run Hadoop on-premise in production, you will have to solve the following tasks:
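To put the availability figure in perspective, the short Python sketch below converts an availability percentage into the downtime it allows. The function name and the 365-day year are our own assumptions for illustration and are not part of any SAP tooling.

```python
def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = 365 * 24) -> float:
    """Return the downtime (in hours) that an availability target permits over a period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    # The unavailable fraction of the period is (1 - availability).
    return period_hours * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

For a 99.9% target this comes to roughly 8.76 hours per year, which is the budget an on-premise team would have to meet with its own staff and hardware.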
Find and hire experienced IT professionals
Purchase the necessary equipment
Purchase the necessary distributions, then install and configure the software
Launch the solution in production
Keep the solution running, with regular operating costs (staff salaries, equipment maintenance, etc.)
Keep in mind that this preparatory stage takes considerable time. This is why companies weigh on-premise deployment against a cloud service.
One of the reports by the consulting firm Bain & Co. gives the example of Netflix. In 2016, the company said that to process its Big Data it has to run thousands of data nodes under a huge load. Every day it processes 350 billion user events and petabytes of data from its services. At this scale, relying only on your own servers is impractical: you would have to keep building new data centers.
Another example, from the more “traditional industries”, is General Electric. In 2013, the company began moving from its own data centers to the cloud. The oil and gas divisions switched to the new service first, followed by the migration of more than 9,000 of the company's infrastructure applications. As a result, General Electric reduced the number of its own data centers from 30 to 4, along with the associated costs for personnel, equipment and so on.
SAP has not stayed away from the cloud trend. In 2016, we were joined by a team from Altiscale, one of the world's leading providers of Big Data as a Service. Their solution became a new product, SAP Cloud Platform Big Data Services, which is available to SAP customers under the cloud subscription model and has been integrated into the overall SAP cloud structure.
The developers of this solution are the former Chief Technology Officer (CTO) of Yahoo and his colleagues who worked on Hadoop at the company. Over 7 years at Yahoo, they grew a small Hadoop project into a production system with more than 42,000 data nodes.
What is SAP Cloud Platform Big Data Services: the Hadoop service from SAP
SAP Cloud Platform Big Data Services is a set of tools for working with big data under the SaaS (Software-as-a-Service) model.
Let us look at the architecture of SAP Cloud Platform Big Data Services.
The service includes three main parts:
Apache Hadoop Cluster
The cluster uses an ODPi-certified Hadoop build. This means that applications and scripts that run in the ODPi environments of other services will also run on SAP Big Data Services.
* For reference, ODPi (Open Data Platform initiative) is a non-profit organization that standardizes Hadoop and its components. Apart from SAP, ODPi includes such well-known vendors as Hortonworks, IBM, SAS and many others.
The cluster includes three types of control nodes that serve the data nodes: the name node, the secondary name node and the resource manager (YARN is included in the initial configuration of the service).
The secondary name node also hosts additional services such as Oozie and the Hive Metastore. When a client is connected, it is allocated a separate cluster with the required resources. Resources are described by storage capacity and the number of machine hours. If necessary, cluster resources can be expanded flexibly, either for the duration of critical calculations or on an ongoing basis.
Workbench is a single point of access to the Big Data Service.
For security reasons, direct access to the Hadoop cluster is limited to service personnel and the Workbench. The client has access only to the Workbench, which includes local Hadoop as well as Hive, Spark, Oozie, Pig and other components needed for data science and data engineering, including SAP Lumira and SAP Predictive Analytics. More information about the service can be found on the website.
Using Workbench, a client can run scripts, explore data using Business Intelligence tools, and perform other tasks. In turn, Workbench works closely with the Hadoop cluster over a high-speed channel.
Big Data Service Portal
It is used to maintain users, generate access keys to the Big Data Service, view cluster usage statistics, and perform other operational tasks encountered by the client.
A jump host server connects the Big Data Service to the outside world. All network interaction takes place within a space of local IP addresses, a virtual private cloud. The standard way to access the Big Data Service is SSH. Other connection options are available on customer request. The Big Data Service also supports Kerberos authentication, which makes it possible to use Single Sign-On (SSO).
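As an illustration of this SSH access path, the sketch below builds an OpenSSH command line that reaches the Workbench through the jump host using the standard `-J` (ProxyJump) option. The function name, host names and user name are hypothetical placeholders; the actual connection details are issued by the service.

```python
import shlex
from typing import List


def build_ssh_command(user: str, jump_host: str, workbench_host: str) -> List[str]:
    """Build an OpenSSH command that connects to the Workbench via a jump host."""
    # -J (ProxyJump, OpenSSH 7.3+) tunnels the connection through the jump host,
    # so the Workbench itself never needs a publicly reachable address.
    return ["ssh", "-J", f"{user}@{jump_host}", f"{user}@{workbench_host}"]


if __name__ == "__main__":
    # All values below are hypothetical placeholders, not real service endpoints.
    cmd = build_ssh_command("alice", "jumphost.example.com", "workbench.example.internal")
    print(shlex.join(cmd))
```

Returning the command as a list rather than a single string keeps it safe to pass to `subprocess.run` without shell quoting issues.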
Big Data Service can interact with other SAP cloud services as well as with on-premise solutions. The following options are available for integration:
Collecting and processing sensor data with Kafka Streaming
Extraction of data from relational databases using Kafka Connectors or SAP Data Services
Interaction with SAP systems on the SAP HANA platform via Smart Data Access and Smart Data Integration
Interoperability with the Hadoop Distributed File System (HDFS) of an on-premise Hadoop cluster
The communication channels connected to the Big Data Service are organized so that data can be loaded from client systems at high speed.
The Big Data Service roadmap for next year includes integration with the SAP solutions for working with big data, SAP Vora and SAP Data Hub. We will cover them in more detail in one of the following articles.
How SAP Cloud Platform Big Data Services differs from other cloud Hadoop solutions
The main difference between the SAP solution and others is that it can be embedded organically into business processes through integration with SAP services and other SAP systems. This is a key factor in monetizing big data in practice. If only data scientists see the results of data analysis in Hadoop, they still have to convince business users to act on the new ideas, and there is no guarantee that the hypotheses will ever be applied. SAP Cloud Platform Big Data Services can be integrated directly with the company's internal IT systems as part of a business process. In the next article we will describe in more detail how the SAP solution differs from others and how to embed the results of big data specialists' work into business processes.
Client Cases Using SAP Cloud Platform Big Data Services
Glu Mobile
Glu Mobile is one of the major global mobile game developers, with successful titles including Cooking Dash, Deer Hunter, Contract Killer, Kim Kardashian: Hollywood and Frontline Commando. The company has development studios around the world, one of which is located in Moscow.
Glu Mobile develops and supports free-to-play games, which are free to download and are monetized through in-game microtransactions. For such games, it is important that players stay engaged over the long term.
Glu Mobile projects have a daily audience of more than 5 million active users, and the company's games have been installed more than 1.3 billion times in total. With an audience of this scale, the company has to keep the games comfortable and fun to play while increasing the lifetime value (LTV) of each player.
To do this, the company collects huge volumes of data from its projects in real time:
More than 30 thousand user actions every second
About 2 billion user activity reports every day
Over 100 million events from various metrics
2 trillion user events stored in the SAP Cloud Platform solution
Initially, Glu Mobile tried to use an on-premise Hadoop solution, but faced the following difficulties:
The larger the data volumes grew, the harder they were to work with
Limited Hadoop expertise in the internal team
Poor system reliability, periodic server crashes
Poor performance when querying databases
As a result of the transition to the SAP Cloud Platform Big Data Services, the Glu Mobile team obtained the following results:
The solution meets the company's data processing needs
Ability to work with huge volumes of rapidly emerging new data
One of the best solutions on the market in terms of performance and reliability
The internal team was freed from the need to spend time on Hadoop and switched to Data Science
Simple scalability based on business needs
Case Neustar MarketShare DecisionCloud
Neustar provides clients with services for analyzing the results of marketing campaigns and user behavior. The company collects a variety of data across many industries, including retail, finance, pharmaceuticals, automotive and technology.
Neustar currently keeps about 2.5 petabytes of data in SAP Cloud Platform Big Data Services.
When Neustar used the previous platform for Big Data, they had the following problems:
Too much time spent on operations.
Poor service reliability
Difficulties in product development
Rising infrastructure maintenance costs
Ability to work only with a limited number of clients
After moving to the SAP Cloud Platform Big Data Services solution, the company received the following benefits:
High performance and reliability of service
Ability to focus on analytics instead of operating Hadoop
More efficient resource allocation and cost management
Increased competitiveness of its solution on the market
First Data Company
First Data is a company that handles bank card transactions. It is the largest bank card processor in the United States (up to 45% of the market).
In the first stage of implementation, in 2015, SAP Cloud Platform Big Data Services expanded First Data's functionality for small businesses and reduced costs by $500,000. In the second stage, in 2017, the solution helped introduce bank card fraud detection and saved the company another $2 million in ACV.
The SAP solution also enabled First Data customers to:
Link transaction information with third-party data
Analyze customer data and the results of promotional campaigns by geography or demographic factors
Compare their results with those of similar businesses
Receive recommendations, based on big data, for improving sales and marketing activities and increasing customer loyalty
Problems associated with the previous infrastructure for working with Big Data:
The investment required made it impractical to scale up the use of the proprietary solution
Inability to analyze data in detail
Limited number of visualization options available
Weak vendor support
When choosing an SAP solution, First Data was guided by the following objectives:
Expand product usage among more customers.
Support for analyzing more detailed data and a larger set of visualizations
The ability to add new features and greater interactivity to the product over time
What benefits did the transition to the SAP solution bring?
Significant cost reduction
Performance targets met
Flexibility in analyzing detailed information
Extensive data visualization capabilities
Broad vendor support, including technical specialists
High-performance platform for working with Big Data
One result of the transition to the SAP solution is that SQL queries on the new solution run 30 times faster than expected.
A brief summary of SAP Cloud Platform Big Data Services:
Quick project start
Infrastructure ready for production launch within days, not months
Rapid return on investment from using the cloud service (confirmed by experts and analysts)
Reliability and a 99.99% availability SLA, which meets the requirements of production-grade solutions
High data processing speed due to innovative architecture and specially developed software versions
Cloud Hadoop service successfully coexists with existing on-premise Hadoop clusters and other systems
The client does not need to worry about hardware, Hadoop administration or component updates: the provider takes on these tasks
The SAP Big Data Service offers its customers support comparable to the well-known SAP MaxAttention premium support service. The client can ask a team of professionals for help on a range of issues, including recommendations on the performance of calculations.
In the next article we will talk more about the development plans for SAP Cloud Platform Big Data Services: integration with other SAP services and solutions, new functions and applications, and more.
If you have read this material to the end and want to independently test how to work with the SAP Cloud Platform Big Data Services in practice, please contact us to get free trial access to the service.