
Translation of an interview with Julian Dragos (Scala)

Nazariy Shimansky, a trainer at the Luxoft Training education center, interviewed Julian Dragos, a well-known developer who has made major contributions to the Scala language. Below is a translation of the interview.
Julian has been working on Scala since 2004, when he joined Martin Odersky's research laboratory at EPFL in Lausanne. He wrote the JVM backend of the Scala compiler and the bytecode optimizer, and worked on various other parts of the compiler. He also implemented type specialization for Scala.

In 2010, Julian received his PhD from EPFL. He worked at Typesafe from its founding by Martin Odersky, the creator of Scala, building development tools (in particular, the Scala plugin for Eclipse). He then led the Spark team at Lightbend (formerly Typesafe) and made a significant contribution to that project. He also leads training courses and helps clients implement Spark projects.

Translated and published by permission of the author.

NS: Good afternoon, Julian. My name is Nazariy Shimansky.
JD: Good afternoon.

NS: I work at Luxoft as an engineer and a trainer at the training center. I am glad we have the opportunity to talk with you. I know that you and Martin Odersky have been working on Scala for a long time. In your opinion, what kinds of projects is the language best suited for?

JD: First of all, thanks for the invitation; I am very happy to talk with you. As for where Scala fits, two years ago I would have said the language is best suited to server-side applications, that is, where Java is widely used today, so it works well for any application running on the Java Virtual Machine. But the client side is not excluded either. As you probably know, for the last two years there has been a new compiler for Scala code, Scala.js, which compiles programs written in Scala to JavaScript so they can run in the browser, and this approach has been quite successful. So now I would change my answer and say that Scala is well suited for both the client and server parts of an application. That is a very powerful combination, because you can apply the same language and the same skills on both sides of the application.
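To make the code-sharing point concrete, here is a hypothetical sketch (not from the interview): pure Scala logic with no platform-specific APIs, so the same source file can be compiled to JVM bytecode for the server and, with Scala.js, to JavaScript for the browser.

```scala
// Hypothetical shared business logic: it touches no platform-specific
// APIs, so both scalac (JVM) and Scala.js (browser) can compile it.
object PriceRules {
  def discount(total: Double): Double =
    if (total > 100) total * 0.9 else total
}
```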
Scala is probably still not a great fit for lower-level systems programming, where C is typically used, or for command-line utilities, where the startup time of the Java Virtual Machine can be a problem, although of course there are workarounds.

In addition, there is a new, fast-growing area in which Scala is widely used: big data. I think that is one of the reasons for our conversation today. Scala is a functional language that is particularly well suited to working with data, and thanks to the Spark platform it has found wide application in big data processing. In conclusion, I would say those three areas are the most promising for Scala right now. It is a general-purpose programming language, of course, and can be used in many other areas.

NS: Speaking of Spark, which language would you recommend for building Spark applications?

JD: First of all, I would choose Scala. There are several reasons for this. First, Scala is great for working with data, especially for parallel processing and side-effect-free transformations. Second, the Spark platform itself is developed in Scala, so you get access to all the new features: anything new in Spark is available in the Scala API first, and only after a fairly long time does the Spark team offer the new APIs in other languages. It seems to me you will always have some advantage when you use the same language the platform itself is written in. That said, for data scientists Python is an excellent choice, especially if it is the language they have studied; I would recommend Python for purely exploratory work, and Scala when you need to build a finished product.
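As an illustration of those side-effect-free transformations (not from the interview), here is the classic word count in Scala; the file paths are placeholders, and Spark running on a local master is assumed.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("word-count").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Each step is a pure transformation of an RDD; no shared mutable
    // state, which is what lets Spark parallelize the work safely.
    val counts = sc.textFile("input.txt")   // placeholder input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("counts")         // placeholder output directory
    sc.stop()
  }
}
```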

NS: Thank you. I know you are planning to hold a workshop in Russia in the spring, so I would like to know: which cluster manager do you prefer to use in your projects, and what will you use for demonstrations during the training?

JD: Yes, a great question. I usually use Mesos. It is a general-purpose cluster manager, and I like it because it lets you combine workloads; for example, you can run Spark jobs alongside other applications your company uses. Besides, Mesos keeps evolving. It came out of UC Berkeley, from the same laboratory where the Spark platform was created, so there is a kind of synergy there. As for the workshop, since we will have only one day and a lot to cover, I will run Spark locally. Participants will be able to install it from scratch, run Spark jobs, and see the results quickly.
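For reference (not from the interview): the cluster manager is selected through the master URL alone, so the same application can run locally during a workshop and on Mesos in production. A minimal sketch, with placeholder Mesos host and port:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MasterUrlDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("master-url-demo")
      .setMaster("local[*]")              // local mode: all cores of this machine
      // .setMaster("mesos://host:5050")  // Mesos: placeholder host and port

    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).sum())  // trivial job to confirm the setup
    sc.stop()
  }
}
```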

NS: And which parts of the Spark API will the training focus on?

JD: Since I will have only one day, I will mostly cover the fundamentals. Once listeners have a good idea of how Spark works, I think they can easily move on to other libraries. To answer your question directly: we will spend a lot of time on the RDD API, as well as on what happens behind the scenes. Having covered the basic concepts, we can move to the higher-level APIs; it is especially important to understand Spark SQL well. So we will focus on those two things.
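The two levels Julian mentions can be shown side by side. The following is a minimal sketch (not from the interview), assuming Spark 2.x, where SparkSession is the entry point for Spark SQL:

```scala
import org.apache.spark.sql.SparkSession

object RddVsSparkSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-vs-sql")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Low-level RDD API: explicit, functional transformations.
    val words = spark.sparkContext
      .parallelize(Seq("spark", "scala", "rdd", "sql"))
      .map(w => (w, w.length))

    // Higher-level Spark SQL over the same data: declarative queries
    // that the engine is free to optimize behind the scenes.
    val df = words.toDF("word", "length")
    df.createOrReplaceTempView("words")
    spark.sql("SELECT word FROM words WHERE length > 3").show()

    spark.stop()
  }
}
```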

NS: Thank you. Are you planning to give examples of when to use which of the data formats Spark supports?

JD: Of course. We will discuss ingesting and processing data, that is, what kinds of data Spark can read and write, in particular from Spark SQL. We will look at this question in the second part of the course.
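As a small taste of that data source API (not from the interview; the file paths are placeholders): reading a self-describing format such as JSON and writing a columnar one such as Parquet are one-liners in Spark SQL.

```scala
import org.apache.spark.sql.SparkSession

object DataFormats {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("data-formats")
      .master("local[*]")
      .getOrCreate()

    // JSON: convenient for ingestion, the schema is inferred from the data.
    val people = spark.read.json("people.json")   // placeholder input

    // Parquet: columnar storage, a common default for analytics, since
    // Spark can skip the columns a query does not touch.
    people.write.parquet("people.parquet")        // placeholder output

    spark.stop()
  }
}
```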

NS: What are your thoughts on the Spark modules such as Streaming, GraphX and Machine Learning? Are you going to talk about them during the training?

JD: Yes, of course we will talk about them, but without practical exercises. These modules are very important: first, because of the algorithms and capabilities they offer, and second, because they show how Spark builds on the RDD building blocks. Those blocks are used to create higher-level applications and services.

The Spark Streaming module is an extremely interesting project, and stream processing itself is a highly competitive area today, with quite a few mature solutions and libraries already available.
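The "RDD building blocks" point is easy to see in Spark Streaming's classic DStream API. Below is a minimal sketch (not from the interview), with a placeholder socket source, where every micro-batch is processed with ordinary RDD operations:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    // At least two local threads: one for the receiver, one for processing.
    val conf = new SparkConf().setAppName("streaming-sketch").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Each 10-second micro-batch becomes an RDD, so the word count
    // from the core API carries over almost unchanged.
    val lines = ssc.socketTextStream("localhost", 9999)  // placeholder source
    lines.flatMap(_.split("\\s+"))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```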

I know the Spark Streaming API has been changing and the team has its own vision of how it should evolve, so I will definitely be following that. The Machine Learning module is also very interesting, since many people now working with big data want to use machine learning algorithms and techniques in their data processing systems.
We will talk briefly about this module without dwelling on its practical application. As I said, my goal for this training is to give listeners an understanding and a mental model of how Spark works. If they need Spark in the future, they will be able to study the system in more detail on their own.
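For a flavor of the Machine Learning module (not from the interview): a minimal k-means clustering sketch on a tiny in-memory dataset, assuming Spark 2.x MLlib, which expects a DataFrame with a "features" vector column.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object KMeansSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kmeans-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Four points forming two obvious clusters.
    val data = Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)
    ).map(Tuple1.apply).toDF("features")

    val model = new KMeans().setK(2).setSeed(1L).fit(data)
    model.clusterCenters.foreach(println)  // centers near (0,0) and (9,9)

    spark.stop()
  }
}
```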

NS: And what do you think about vendors who offer big data distributions and services with Spark included? What, in your opinion, are the strengths and weaknesses of such offerings?

JD: I think such offerings are perfectly reasonable. I would say it all depends on what kind of business you are building. Say you publish a local city newspaper (sadly, newspapers are becoming a thing of the past) and you do not have the budget for a team that would handle big data and product development for your business. In that case it makes sense to use a commercial service that gives you a finished product.
However, if big data plays an important role in your business, then you probably need your own development team. So really, it depends on how closely your core business is tied to big data.

NS: Thank you very much. That is all the questions I wanted to ask you.

JD: Thank you.

The master class by Julian Dragos will be held on April 6 in Moscow and will be devoted to developing Apache Spark applications in Scala. More details can be found here.

Source: https://habr.com/ru/post/322952/

