
The R manual has recently become the most cited non-academic publication in academic papers

In the bibliographic database Web of Science, "R: a language and environment for statistical computing" recently* overtook the other sources cited in the References section (References and Notes) of publications indexed by this database. Unfortunately, access to the database is limited, and it is hard to give a link (a separate link is generated for each session), but a number of users** can reproduce my observations. How to do so is described below, along with the caveats needed to read the headline correctly.

image

The illustration shows the sources most cited in publications indexed by WoS that are not themselves indexed in the WoS main database (Core Collection) and appear only in its database of cited references.

Besides the fact that three indexed publications (all in biology) are still ahead of the R manual, this is in many other ways a rather limited record that rests on a number of assumptions. First, it concerns only WoS. In the Scopus database, which is often mentioned alongside WoS, the classification "Diagnostic and Statistical Manual of Mental Disorders" is still ahead of the R manual (though, judging by the growth rates, not for long). Second, I am of course aware that this is an absolute record, without normalization by field of knowledge, year of publication, etc. Third, my counting is possibly not the fairest: I sum the citations of all versions of the manual (as I do for other such bibliographic references: all editions of the DSM, all volumes of Numerical Recipes, etc.), whereas in the usual count, without any summation, the manual appears only in 40th place (it also appears further down, in 51st, 61st place, etc., but with a different year, a different version of the manual, a capitalized "a" in the title, etc.).
image
The top 25 WoS categories in which the manual is cited. The situation is similar in Scopus.

image
Growth in the number of citations of the manual in Scopus; the values for WoS are similar.

It is also worth bearing in mind that authors of an academic publication who used some tool (in the broad sense: hardware, software, a theorem, a logical argument, etc.) do not always cite it. So it is a subject for a separate study to what extent such frequent citation of the manual reflects its frequent use in scientific work. R is known to be popular in science; the question is rather about the numbers. Perhaps there is some other non-academic source that is de facto used more often but is not mentioned in bibliographies.

For example, according to this review of de facto usage, based on a Google Scholar search and data for 2018, SPSS is used one and a half times more often in academic papers. The author attributes this to how hard R is to learn. A comparative analysis across different databases would be desirable, however, because they differ in which publications they index and, accordingly, in their citation counts.

Why is R so important to scientists? Andy Wills, writing in the Linux Journal, discusses R in light of the idea of Open Science and of the reproducibility crisis in psychology. The psychologist and data scientist Yevgeny Tomilov, whom I asked about this, explained the importance of R for science as follows:
"R allows you to create reproducible research protocols, including the data and their processing. Given pervasive falsification and the urgent need to increase the reproducibility and credibility of scientific work, using this tool is useful at the very least and, at most, a matter of ethics."
P.S. It is also interesting that Google Scholar has an R Core Team profile, similar to the profiles of individual researchers, with a respectable Hirsch index (h-index) of 50. To reach that value, you need at least 50 publications, and the publication ranked 50th by number of citations must itself have at least 50 citations.
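
The Hirsch index definition above can be made concrete with a short sketch. It is only an illustration in Python: the function name `h_index` and the sample citation counts are made up for the example and are not taken from the R Core Team profile.

```python
def h_index(citations):
    """Return the largest h such that h publications have at least h citations each."""
    # Rank publications from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the publication at this rank still has enough citations
        else:
            break  # every later publication has even fewer citations
    return h


# Illustrative, invented citation counts for five publications.
example = [10, 8, 5, 4, 3]
print(h_index(example))  # prints 4
```

With this definition, an index of 50 means that the 50th publication in the ranking has at least 50 citations, while the 51st (if there is one) has fewer than 51.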

* The exact date is hard to pin down because of how the data are counted and how detailed they are; apparently it happened within the past few months.

** Namely, holders of a library card of the NLR, the RSL, or the Gorky Library, and holders of a student card of St. Petersburg State University or of a number of other universities.

How to reproduce this:

In the "Cited Reference Search" section, you can search by year with the query 1000-2999 and get a sample of 264 million results out of 268 million (the remaining ones probably have no year given, but they are unlikely to matter for the steps that follow). Sort the results by number of citations. Next, export them and keep only the records that have a value in the Source column but not in the Title column. For a journal article, for example, the first column holds the journal's title and the second holds the article's title; if a book is indexed, both columns are the same; and only for non-indexed sources is the Title column empty. Then, manually or with a script, sum the citations for each unique record, that is, merge the exported bibliographic references that are cited with different spellings, different editions, individual pages, etc.
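
The filtering and summation steps above could be scripted roughly as follows. This is a minimal sketch in Python, not the script actually used for the post: the column names `Source`, `Title` and `Citations`, the `normalize` helper and the sample rows with their numbers are all assumptions made for illustration, and a real export may be laid out differently.

```python
import re
from collections import defaultdict


def normalize(source):
    """Crude merge key (an assumption): lowercase, drop punctuation, collapse spaces."""
    return re.sub(r"[^a-z0-9]+", " ", source.lower()).strip()


def rank_unindexed(rows):
    """Sum citations per source for rows that have a Source but an empty Title."""
    totals = defaultdict(int)
    for row in rows:
        source = (row.get("Source") or "").strip()
        title = (row.get("Title") or "").strip()
        if not source or title:
            continue  # skip indexed records and rows without a source
        totals[normalize(source)] += int(row.get("Citations") or 0)
    # Highest total first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented sample rows; the citation numbers are not real data.
rows = [
    {"Source": "R: a language and environment for statistical computing",
     "Title": "", "Citations": "120"},
    {"Source": "R: A Language and Environment for Statistical Computing",
     "Title": "", "Citations": "80"},
    {"Source": "Some Journal", "Title": "Some indexed article", "Citations": "500"},
    {"Source": "Numerical recipes", "Title": "", "Citations": "150"},
]

ranking = rank_unindexed(rows)
for key, total in ranking:
    print(total, key)
```

For a real export saved as CSV, the rows could come from `csv.DictReader` instead of the hard-coded list. The normalization here only merges differences in case and punctuation; variants that differ by year, edition or page numbers would need extra rules or manual review.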

Source: https://habr.com/ru/post/460169/

