
R and Python - worthy rivals?





Happy Friday, dear readers!



In the history of the computer books division of the Piter publishing house, few books have been as successful as Michael Dawson's "Programming in Python", and no topic has been as controversial as the remarkable R language, which has become a firm fixture among Amazon's best-selling subjects. We are currently negotiating with the rights holders for a great new book on Python, but we also wanted to gauge public opinion on R: is it worth publishing new books about this elite language for statistics gurus, or will Python easily overcome it, unlike in its encounter with Apollo?


Read on below the cut!





Python and R compete for the title of the “best” data tool, and both contenders have their strengths and weaknesses. Which language to choose depends on the specific situation, the cost of learning it, and the other common tools the problem requires.



Students often ask which language, R or Python (or both), is better for everyday data analysis tasks. I usually suggest interactive tutorials on R, but I point out that in each case the choice depends on the type of task and on the data to be analyzed.



Python and R are both popular programming languages for statistics. While R was designed specifically with statisticians' needs in mind (just think of its powerful data visualization features!), Python is renowned for its clear syntax.



This article looks at the major differences between R and Python, as well as the place each occupies in the world of data and statistics. If you prefer infographics, take a look at "Data Science Wars: R vs Python".



Meet R



Ross Ihaka and Robert Gentleman created R in 1995 as a free implementation of the S programming language. They set out to develop a language that would offer a better and more understandable way to do data analysis, statistics, and graphical models. At first, R was used primarily in academia and research, but relatively recently it has begun to make its way into the corporate world. As a result, R is now one of the fastest-growing statistical languages in corporate use.



One of R's main advantages is its huge community, which supports the language through mailing lists, user-contributed documentation, and a very active group on Stack Overflow. There is also CRAN, a giant repository of curated R packages to which anyone can contribute. These packages are collections of R functions and data; they give instant access to the latest techniques and functionality, sparing the programmer the need to reinvent everything from scratch.



Finally, if you are an experienced developer, you will probably pick up R quickly. A novice programmer may struggle, however, because R's learning curve is very steep. Fortunately, there are now many excellent learning resources for R.



Meet Python



Python was created by Guido van Rossum in 1991. The language emphasizes productivity and code readability. Many programmers who want to get into data analysis and apply statistical techniques are active Python users. The closer you work to an engineering environment, the more you will probably like Python. This flexible language is great for building something new. Thanks to its simplicity and readability, its learning curve is relatively flat.



Like R, Python has packages. PyPI, the Python Package Index, is a repository of Python libraries to which any user can contribute. Like R, Python has a large developer community, but it is somewhat more scattered, since Python is a general-purpose language. Nevertheless, data science is rapidly claiming a more prominent place in the Python universe: expectations are growing, and new data applications appear one after another.



R and Python: General indicators



On the web you can find many quantitative comparisons of the adoption and popularity of R and Python. Although such figures give a good sense of how the two languages are evolving within the broader computing landscape, it is not easy to compare them directly. The main reason is that R is used only in data science, whereas Python, as a general-purpose language, is widely used in many areas, such as web development. As a result, rankings are often skewed in favor of Python, while salaries are skewed in favor of R specialists.







When and how to use R?



R is mainly used when the data analysis task calls for standalone computing or for analysis on individual servers. R is great for exploratory work and handy for almost any kind of data analysis, because its many packages and ready-made tests give you the right tools for a quick start. R can even be part of a big data solution.

When getting started with R, it is a good idea to begin by installing the excellent RStudio IDE. After that, I recommend getting acquainted with the following popular packages:







When and how to use Python?



Python comes in handy when data analysis tasks need to be integrated with web applications, or when statistical code has to be incorporated into a production database. As a full-featured programming language, Python is great for implementing algorithms that will then be used in production. Not long ago, Python's data analysis packages were still in their infancy, which was a real problem, but the situation has improved significantly in recent years. To equip Python for data analysis, be sure to install NumPy/SciPy (scientific computing) and pandas (data manipulation). Also take a look at matplotlib for plotting and scikit-learn for machine learning.
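To make the point about weaving analysis into application code more concrete, here is a minimal sketch. It deliberately uses only the standard-library statistics module so that it runs without installing anything; in real work you would more likely reach for NumPy or pandas. The function names, the payload key and the sample numbers are all hypothetical and chosen purely for illustration.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def handle_request(payload):
    """Hypothetical web handler: takes a dict (e.g. parsed JSON) and returns a dict."""
    return {"status": "ok", "summary": summarize(payload["response_times_ms"])}


# Made-up sample data standing in for a real request body.
result = handle_request({"response_times_ms": [120, 135, 110, 150, 125]})
print(result)
```

The statistical code is an ordinary function, so the same summarize() could be called from a web framework, a scheduled job, or a database loading script.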



Unlike R, Python has no clear “winning” IDE. It is worth trying Spyder, IPython Notebook and Rodeo and choosing the one that suits you best.



R and Python: shares in the data science segment



If we look at recent polls on the popularity of the languages used for data analysis, R often comes out as the clear leader. Comparing Python and R specifically within this community gives a similar picture.







Despite the infographic above, there is reason to believe that more and more professionals are moving from R to Python. What is more, a growing share of programmers know both languages and use one or the other as needed. This is the approach I recommend to my students.



If you are planning a career in data science, you will do well to master both languages. Labor market trends point to growing demand for both skills, and salaries in this segment are significantly above average.



R: Pros and cons



Plus: A picture is worth a thousand words

Visualized data is often more expressive and easier to grasp than bare numbers. R is simply made for visualization. Be sure to check out the ggplot2, ggvis, googleVis and rCharts visualization packages.



Plus: R ecosystem

R has a rich ecosystem of cutting-edge packages and an active community. Packages are available from CRAN, Bioconductor and GitHub. All R packages can be browsed at Rdocumentation.



Plus: R is the lingua franca of data science

R was developed by statisticians for statisticians. They can exchange ideas and concepts through R code and packages, and they do not need a computer science background to get started. In addition, the language is increasingly being adopted outside academia.



Plus / minus: R is a slow language

R was created to make life easier for statisticians, not for your computer. R can seem slow because of poorly written code, but there are many packages and projects that improve its performance: pqR, renjin, FastR, Riposte, and many others.



Minus: R is difficult to learn

R's learning curve is non-trivial, especially if you are used to doing statistical analysis through a graphical interface. Even finding the right package can take a long time if you are new to this.



Python: Pros and Cons



Plus: IPython Notebook

IPython Notebook makes it easier to work with Python and data. It is easy to share a notebook with colleagues, and they do not even have to install anything. This sharply reduces the overhead of organizing code, output files and notes, so you can devote more time to useful work.



Plus: a general-purpose language

Python is a general-purpose language that is simple and intuitive. Its learning curve is relatively flat, and you can write programs in it faster. In short, you spend less time on the code and more time on the interesting things!



Moreover, Python comes with a built-in testing framework that has a very low barrier to entry and encourages good test coverage. This helps keep your code reliable and reusable.
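As a rough illustration of how low that barrier is, here is a minimal sketch using unittest, the testing framework in Python's standard library. The normalize() function and its test cases are hypothetical examples, not taken from any particular project.

```python
import unittest


def normalize(values):
    """Scale numbers linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("cannot normalize constant data")
    return [(v - low) / (high - low) for v in values]


class NormalizeTest(unittest.TestCase):
    def test_range(self):
        self.assertEqual(normalize([10, 15, 20]), [0.0, 0.5, 1.0])

    def test_constant_input_rejected(self):
        with self.assertRaises(ValueError):
            normalize([3, 3, 3])


def run_tests():
    """Load and run the test case, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Saved as a file, the same tests can also be run from the command line with python -m unittest.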



Plus: a multi-purpose language

Python brings together people from different backgrounds. Because it is a simple and widespread language, understood by many programmers and easy for statisticians to pick up, you can use it to build a single tool that integrates every step of your workflow.
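A single-tool workflow might look roughly like the sketch below, which loads data, analyzes it and reports the result in one script. It uses only the standard library, and the CSV data, the column names and the three step functions are invented for illustration; a real pipeline would typically read files or a database and use pandas.

```python
import csv
import io
import statistics

# Made-up inline data standing in for a real CSV file.
RAW_CSV = """group,score
control,71
control,68
control,75
treatment,80
treatment,77
treatment,83
"""


def load(text):
    """Step 1: read CSV rows into a dict mapping group name to a list of scores."""
    groups = {}
    for row in csv.DictReader(io.StringIO(text)):
        groups.setdefault(row["group"], []).append(float(row["score"]))
    return groups


def analyze(groups):
    """Step 2: compute the mean score for each group."""
    return {name: statistics.mean(scores) for name, scores in groups.items()}


def report(means):
    """Step 3: format the results as plain-text lines, sorted by group name."""
    return [f"{name}: mean score {mean:.1f}" for name, mean in sorted(means.items())]


lines = report(analyze(load(RAW_CSV)))
print("\n".join(lines))
```

Each step is a plain function, so a programmer can swap the loader for a database query while a statistician changes only the analysis step.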



Plus / Minus: Visualization

Visualization capabilities are an important criterion when choosing data analysis software. Although Python has some nice visualization libraries, such as Seaborn, Bokeh and Pygal, there may be too many to choose from. Moreover, compared to R, visualization in Python is usually more convoluted, and the results are not always as pleasing to the eye.



Minus: Python is playing on someone else's turf

Python is a challenger to R, but it has no alternatives to hundreds of essential R packages. Even though Python is catching up, it is unclear whether people will abandon R for it.



And the winner is...



It's up to you! As a data professional, you have to choose the language for the job yourself. Try to answer the following questions:



  1. What problems do you need to solve?
  2. What will it cost you to learn a new language?
  3. What tools are actively used in your professional field?
  4. What alternatives are there for these tools?




Good luck!

Source: https://habr.com/ru/post/263457/


