Interviewed by Rafael Knuth
This is the 10th interview in a series of conversations on the Mirantis blog with technical leaders of OpenStack projects. Our goal is to educate as many members of the technical community as possible and to help them understand how to contribute to OpenStack and how to benefit from it. Of course, what follows is the point of view of the interviewee, not of Mirantis.
Here is our interview with Sergey Lukyanov, technical lead of the OpenStack Savanna project.
Mirantis: Please tell us about yourself.
Sergey Lukyanov: I am a senior developer and technical manager at Mirantis Inc., where I have worked for more than three years. I am mainly responsible for architecture design and community activities related to OpenStack. I have experience with large-scale data processing projects and the relevant technologies (Hadoop, HDFS, Cassandra, Twitter Storm, and others), as well as with developing industrial-scale projects. At the moment I am involved in various open-source projects, including Twitter Storm and OpenStack.
Question: How did you come to OpenStack? Why do you participate in the project?
Answer: I have been active in OpenStack for about a year, and before that I had followed its development since the Diablo release. My active involvement began with writing the part of the Swift object storage code that lets outside callers find out which physical machines hold a given piece of data (this later helped in implementing data-local computing for Savanna). Then I started working directly on Savanna while also contributing to other OpenStack projects: Oslo, Swift, Nova client, Hacking, Pbr, Jeepyb, and others. My main goal within OpenStack is to increase the number of services and capabilities it provides, so that application developers find the platform easier to use and it becomes as widely adopted as possible.
Question: What are you responsible for as the technical lead of the Savanna project?
Answer: I mainly deal with project management. This includes tracking and managing bugs and blueprints on Launchpad, coordinating code review in Gerrit, and holding weekly IRC meetings with our team as well as sessions at the OpenStack Design Summit. It seems to me that the technical lead of a project is, first of all, the person who coordinates the work of all the teams within the project and ensures that its general direction of development matches its tasks and goals. In addition, I am among the top contributors to the project by number of changes, and among the top reviewers of code written by other team members.
Question: What is the role of the Savanna project in OpenStack? What is its significance?
Answer: As I see it, OpenStack is not so much a technical infrastructure as an extensive community of developers working on an incredibly large ecosystem of closely related, actively developing projects. It is all of this together that makes up the cloud platform. I see an excellent opportunity to grow this ecosystem further by integrating it with other open-source initiatives and the communities that develop them. The integration of OpenStack with Apache Hadoop is a great example. From a user's point of view, the ability to process large amounts of data can ultimately be useful for most OpenStack projects.
Question: What is truly unique and new in the Savanna project?
Answer: Savanna applied to become an official incubated OpenStack project late in the Havana cycle, as part of the Data Processing program. Today, Savanna provides basic infrastructure operations in the following two areas:
• provisioning and managing Hadoop clusters, including through Hadoop vendor tools such as Apache Ambari, which provides access to the Hortonworks data processing platform;
• scheduling and processing Hadoop jobs, including creating them, executing them, and so on.
I would also like to clarify that Savanna does not offer any Data API, because of the very long list of potential problems that come with big data processing. In the future, we plan to support not only Hadoop but also other tools for processing large amounts of data.
Question: Tell us about the Savanna community. Who participates in this project?
Answer: The project began with a small team at Mirantis. Today, about 30 people have worked on it during the Havana cycle. The backbone of the team consists of employees of Mirantis, Red Hat, and Hortonworks; the other participants are employees of HP, IBM, UnitedStack, and Rackspace.
Question: What has the Savanna community achieved so far?
Answer: Today we have a service that provisions and manages clusters, with support for scaling (both growing and shrinking a cluster, including adding new types of nodes), anti-affinity (among other things, to ensure the reliability of data nodes), and data locality, that is, running computations close to where the data is stored, for more efficient execution of Hadoop jobs. To store cluster configuration data, we use node group templates and cluster templates.
Our second, and main, piece of functionality is Elastic Data Processing (EDP). Here Savanna supports simple execution of jar, Pig, and Hive jobs through the Oozie job scheduler, including the ability to read data from and write data to Swift object storage.
Extensibility is provided by a plugin mechanism, which currently includes two plugins for deploying Hadoop clusters: the Vanilla plugin, which simply installs all the necessary services, and the Hortonworks Data Platform plugin, which installs Apache Ambari to deploy and configure Hadoop clusters. And, of course, there is a plugin for the OpenStack Dashboard that exposes all of our project's functionality.
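To make the anti-affinity idea mentioned above more concrete, here is a minimal sketch in Python. It is not Savanna code: the function name `place_with_anti_affinity`, its parameters, and the least-loaded tie-breaking rule are all assumptions made for illustration. It only shows the general constraint that instances of a protected process (for example, Hadoop data nodes) should not end up on the same physical host.

```python
from collections import defaultdict


def place_with_anti_affinity(instances, hosts, anti_affinity):
    """Assign each instance to a host (hypothetical helper, not Savanna's API).

    instances: list of (instance_name, process_name) pairs.
    hosts: list of host names.
    anti_affinity: set of process names whose instances must not share a host.
    Returns a dict mapping instance_name -> host.
    """
    used = defaultdict(set)             # process name -> hosts already running it
    load = {host: 0 for host in hosts}  # number of instances placed on each host
    placement = {}
    for name, process in instances:
        # A host is eligible unless the process is protected by anti-affinity
        # and the host already runs another instance of that process.
        candidates = [
            h for h in hosts
            if process not in anti_affinity or h not in used[process]
        ]
        if not candidates:
            raise ValueError(f"no host left for {name} ({process})")
        # Pick the least-loaded eligible host; ties go to the earliest host.
        host = min(candidates, key=lambda h: load[h])
        placement[name] = host
        used[process].add(host)
        load[host] += 1
    return placement


if __name__ == "__main__":
    instances = [
        ("dn-1", "datanode"),
        ("dn-2", "datanode"),
        ("tt-1", "tasktracker"),
    ]
    print(place_with_anti_affinity(instances, ["h1", "h2"], {"datanode"}))
```

In this sketch, the two data nodes are forced onto different hosts, while the task tracker is free to share a host with either of them. A real scheduler would weigh many more factors, such as capacity and data locality.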
Question: What features will Savanna provide in the OpenStack Icehouse release?
Answer: The main goal is to integrate more effectively with other projects and with the OpenStack infrastructure. The main change planned for Icehouse is Heat support for resource management, which will replace direct calls to other OpenStack services. We are also working on integrating Savanna with the DevStack gate to check new changes to the project (DevStack itself already supports Savanna), and we are moving toward API testing and integration testing in Tempest. In addition, I hope that Savanna in Icehouse will include so-called guest agents, which should resolve the current access problems between the controller and the guest operating systems it manages by removing the need for direct SSH or API calls. On the EDP side, we would like to improve the execution of job workflows in general and to add support for new functions, job types, data sources, and so on. I also expect that at least one more vendor-supported plugin will be implemented, for example for IDH (Intel Distribution for Apache Hadoop).
Question: What would you like people to know about the project?
Answer: The goal of the Savanna project is to provide the OpenStack community with data processing tools. At the moment, our focus is on the Hadoop ecosystem, but discussions are already underway and concepts are being developed to support other tools, for example Apache Spark and Twitter Storm. In other words, we are currently collecting requirements for EDP and adding new features and data processing tools.
Question: Are there any common misconceptions about the Savanna project?
Answer: The belief that our project has a Data API. Savanna does not have a Data API, but it does have two levels of management API: one handles cluster provisioning and management; the other controls the execution of jobs and their workflows. And once again, about the purpose of the project: we would like to offer integrated solutions and tools for data processing, rather than a one-off solution for a single framework. Our field of activity is data processing.
Question: What are the use cases for Savanna?
Answer: While implementing Savanna we keep several usage scenarios in mind. The first is managing data processing clusters (today, Hadoop clusters). The second, for cloud platforms, is putting unused computing capacity to work when peak loads occur. The third is running data processing workloads (currently, various Hadoop jobs) in a few clicks, without specialized knowledge of data processing tools.
Question: What is your vision for the Savanna project?
Answer: I see Savanna as a service that provides tools for data processing and cluster management, and whose main function is elastic data processing, for example running specific jobs.
Question: Who would you like to see among the participants of the Savanna project?
Answer: I would like to see two types of participants. We need people who are interested in integrating various Hadoop distributions and (especially) other data processing frameworks. We also really need operators: people who will start using Savanna to run their data processing workloads and who will help us by sending comments and suggestions for improving the project.
Question: What functionality needs to be improved and tested now?
Answer: Integration with Heat needs testing. It will replace a very large part of the resource management code. We are working on moving our integration tests to Tempest, and here we need help both with porting the old tests to this platform and with writing new ones. We also need to continue testing Savanna on various operating systems in combination with various guest operating systems.
Question: How can people start working with Savanna?
Answer: I hope it is not very difficult now. Installation can be done using DevStack; you only need to upload to Glance a disk image built with diskimage-builder, which is available from the CDN.
Detailed usage guides for developers, administrators, and users are available at docs.openstack.org/developer/savanna. And, of course, our team is working to simplify this process, especially since we expect developers of new plugins and, as a result, new project participants. If you have questions, you can find our team on the IRC channel #savanna at freenode.net or on the openstack-dev@lists.openstack.org mailing list (use the prefix [savanna] in the subject line).
Question: Thank you for your time, Sergey.
Answer: Thank you.