How to safely store and use secret data in R

From time to time the question comes up of how to store a login and password safely in R without writing them explicitly in your script. I think there are several possible solutions. You can store your parameters:

Directly in the script.
In the file inside the folder with the project that you do not show.
In the .Rprofile file.
In the .Renviron file.
In a JSON or YAML file.
In a secure repository that you access from R.
Using the digest package.
Using the sodium package.
Using the secure package.

Let's consider the main idea, the advantages (or disadvantages) of each of the approaches.
[From translator: ordered as utility increases.]

Directly in the script

The first approach is to store your parameters directly in the script.

    id <- "my login name"
    pw <- "my password"
    call_service(id, pw, ...)

Despite its simplicity, nobody seriously recommends this because of an obvious drawback: you cannot share your code without sharing these parameters as well.

In the file inside the folder with the project that you do not show

The second option is almost as simple. You put your parameters in a separate file, for example "keys.R", in the project folder and read them with, say, source(). You then exclude "keys.R" from version control; if you use git, add "keys.R" to .gitignore.

The disadvantage is that you can still accidentally show this file if you are not careful enough.

    # keys.R
    id <- "my login name"
    pw <- "my password"

    # script.R
    source("keys.R")
    call_service(id, pw, ...)
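One possible refinement, offered here as an assumption rather than as part of the original approach, is to evaluate the keys file in its own environment so that id and pw do not land in the global environment. The helper name load_keys() is hypothetical, and the example writes a throwaway keys file with placeholder values to a temporary directory so that it runs on its own.

```r
# Hypothetical helper: evaluate a keys file in a private environment
# instead of the global one, and return that environment.
load_keys <- function(path) {
  if (!file.exists(path)) {
    stop("Keys file not found: ", path)
  }
  keys <- new.env()
  sys.source(path, envir = keys)
  keys
}

# Demo with a throwaway keys file (placeholder values only).
keys_file <- file.path(tempdir(), "keys.R")
writeLines(c('id <- "my login name"', 'pw <- "my password"'), keys_file)

keys <- load_keys(keys_file)
keys$id  # the login, read from the private environment
```

The credentials are then passed on as keys$id and keys$pw, and clearing the global environment does not touch them.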

In the .Rprofile file

The third option is to store the parameters in one of the .Rprofile files. This option is quite popular because:
You can store data in another folder, i.e. not in the project folder. Therefore, it is less likely that you will accidentally open access to the file.
In .Rprofile you can write regular code in R.

    # ~/.Rprofile
    id <- "my login name"
    pw <- "my password"

    # script.R
    # id and pw are defined in .Rprofile
    call_service(id, pw, ...)

One of the drawbacks of defining the "id" and "pw" objects in .Rprofile is that they become part of the global environment. Once they are there, they are easy to change from the script. For example, using rm() to clean up the global environment will remove them.

A slightly more flexible version of the same method also uses .Rprofile, but declares your parameters as environment variables. Use Sys.setenv() to set environment variables and Sys.getenv() to read them.

    # ~/.Rprofile
    Sys.setenv(id = "my login name")
    Sys.setenv(pw = "my password")

    # script.R
    # id and pw are defined in .Rprofile
    call_service(id = Sys.getenv("id"), pw = Sys.getenv("pw"), ...)
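By default Sys.getenv() returns an empty string when a variable is not set, so a missing credential can go unnoticed until the service rejects the call. A minimal sketch of a stricter reader follows; get_secret() is a hypothetical helper name and the variable names are only illustrative.

```r
# Hypothetical helper: read an environment variable and stop with a
# clear message when it is missing or empty.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set")
  }
  value
}

# Normally this would be set in .Rprofile or .Renviron.
Sys.setenv(id = "my login name")
login <- get_secret("id")

# A variable that does not exist fails loudly instead of returning "".
result <- tryCatch(
  get_secret("no_such_variable_for_demo"),
  error = function(e) conditionMessage(e)
)
```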

In the .Renviron file

R also has a mechanism for defining environment variables in a special external file, .Renviron. Working with .Renviron is similar to .Rprofile. The main difference is that variables can be set directly in .Renviron, without using Sys.setenv() .

Environment variables are language independent.

    # ~/.Renviron
    id = "my login name"
    pw = "my password"

    # script.R
    # id and pw are defined in .Renviron
    call_service(id = Sys.getenv("id"), pw = Sys.getenv("pw"), ...)
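R reads .Renviron at startup, so edits made during a session are not picked up automatically. Base R's readRenviron() loads such a file on demand. The sketch below uses a temporary file and illustrative variable names (demo_id, demo_pw) so that it does not touch your real ~/.Renviron.

```r
# Write a throwaway environment file in name=value form (placeholder values).
env_file <- file.path(tempdir(), "demo.Renviron")
writeLines(c('demo_id="my login name"', 'demo_pw="my password"'), env_file)

# Load it into the current session; returns TRUE invisibly on success.
ok <- readRenviron(env_file)

demo_id <- Sys.getenv("demo_id")
demo_pw <- Sys.getenv("demo_pw")
```

The same call can be pointed at a project-specific file kept outside version control.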

In a JSON or YAML file

The JSON format is mainly used for exchanging data with web services, so most modern languages parse JSON files easily. The same applies to YAML files. If you want to store parameters in a format that other languages, such as Python, can read, JSON can be a good solution.

    # keys.json
    {
      "id": ["my login name"],
      "pw": ["my password"]
    }

    # script.R
    library(jsonlite)
    keys <- fromJSON("keys.json")  # read the file once
    call_service(id = keys$id, pw = keys$pw, ...)

In a secure repository that you access from R

One big disadvantage of all the previous approaches is that the parameters are stored unencrypted somewhere on disk. You may already be using a tool such as Keychain or LastPass to store passwords.

For storage, you can also use an encrypted disk. Once it is mounted, you can work with its contents as with plain text. That is, from the point of view of the R user, the secret data is simply stored as "plain text" somewhere on the encrypted disk.

Using the digest package

Another alternative is to use the digest package, which is maintained by Dirk Eddelbuettel. Stefan Doyen proposed this solution:

I use the digest package, which implements AES encryption.
I use two functions: one writes AES-encrypted files, the other reads and decrypts them. These functions are on github.
I then use the digest package to generate the key used to encrypt and decrypt the files.
Once all this is ready, I create a data frame with the login and password.
I use the write.aes() function to write the parameters locally to an encrypted file.
read.aes() decrypts the parameters and imports them into R.

Thus, the secret data never appears in plain text in the code. This also makes it possible to store the parameters somewhere else (a remote server, a USB disk, etc.). In addition, this solution does not require entering a password every time.

Stefan offers this code to illustrate:

    source("crypt.R")
    load("key.RData")

    credentials <- data.frame(login = "foo", password = "bar", stringsAsFactors = FALSE)
    write.aes(df = credentials, filename = "credentials.txt", key = key)
    rm(credentials)

    credentials <- read.aes(filename = "credentials.txt", key = key)
    print(credentials)
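The contents of crypt.R are not shown here, so the following is only a sketch of what such a pair of functions might look like, not Stefan's actual code. It assumes the digest package is installed and uses its AES() constructor in CBC mode. The names make_key(), encrypt_df() and decrypt_df() are hypothetical, the passphrase is a placeholder, and the IV comes from sample(), which is not a cryptographically secure source and is used only to keep the example short.

```r
library(digest)

# Hypothetical helper: derive a 32-byte AES key from a passphrase via SHA-256.
make_key <- function(passphrase) {
  digest(passphrase, algo = "sha256", serialize = FALSE, raw = TRUE)
}

# Hypothetical helper: write a data frame to an AES-encrypted file.
encrypt_df <- function(df, filename, key) {
  csv <- paste(utils::capture.output(utils::write.csv(df, row.names = FALSE)),
               collapse = "\n")
  bytes <- charToRaw(csv)
  # AES works on 16-byte blocks, so zero-pad the text to a multiple of 16.
  pad <- (16 - length(bytes) %% 16) %% 16
  bytes <- c(bytes, as.raw(rep(0, pad)))
  # Assumption: sample() stands in for a proper random source for the IV.
  iv <- as.raw(sample(0:255, 16, replace = TRUE))
  aes <- AES(key, mode = "CBC", IV = iv)
  # Store the IV in front of the ciphertext; it is not secret.
  writeBin(c(iv, aes$encrypt(bytes)), filename)
  invisible(filename)
}

# Hypothetical helper: read the file back and decrypt it into a data frame.
decrypt_df <- function(filename, key) {
  blob <- readBin(filename, "raw", n = file.size(filename))
  iv <- blob[1:16]
  aes <- AES(key, mode = "CBC", IV = iv)
  bytes <- aes$decrypt(blob[-(1:16)], raw = TRUE)
  # Drop the zero padding before converting back to text.
  txt <- rawToChar(bytes[bytes != as.raw(0)])
  utils::read.csv(text = txt, stringsAsFactors = FALSE)
}

key <- make_key("a placeholder passphrase")
secret_file <- file.path(tempdir(), "credentials.bin")

credentials <- data.frame(login = "foo", password = "bar", stringsAsFactors = FALSE)
encrypt_df(credentials, secret_file, key)
restored <- decrypt_df(secret_file, key)
```

The key still has to live somewhere, so this moves the problem from protecting the credentials to protecting the key file.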

Using the sodium package

Another option is to use the sodium package created by Jeroen Ooms. The sodium package is an R wrapper for the libsodium cryptographic library.

Bindings to libsodium: a modern, easy-to-use library for encryption, decryption, signatures, password hashing, and more. Sodium uses curve25519, a state-of-the-art Diffie-Hellman function by Daniel Bernstein, which became very popular after it was discovered that the NSA had backdoored Dual EC DRBG.

This means that sodium can be used to set up secure communications, including using asymmetric keys, directly from R. To use sodium to encrypt your data, use the approach described above for digest.
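As a minimal sketch of that approach with sodium, assuming the package is installed: the credentials are serialized, encrypted with a symmetric key, saved, and later decrypted. The file name and the placeholder credentials are illustrative, and in practice the key would be stored separately from the encrypted file.

```r
library(sodium)

credentials <- list(id = "my login name", pw = "my password")

# Random 32-byte secret key.
key <- keygen()

# data_encrypt() attaches the random nonce to its result as an attribute.
cipher <- data_encrypt(serialize(credentials, NULL), key)

# saveRDS() keeps that attribute, so the nonce travels with the ciphertext.
secret_file <- file.path(tempdir(), "credentials.rds")
saveRDS(cipher, secret_file)

# Later: read the ciphertext back and decrypt it with the same key.
restored <- unserialize(data_decrypt(readRDS(secret_file), key))
```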

Using the secure package

Finally, the last option is to use the secure package written by Hadley Wickham. From the package description:

The secure package provides secure storage in a publicly accessible repository. This allows you to store secret data in a public repository, so that they are accessible only to selected users. This is especially useful for testing, because using the package, you can store personal data in a public repository, not showing them to the world.

Secure is built on the basis of asymmetric encryption (with public and private keys). Secure generates a random master key and uses it to encrypt (AES256) each file in vault/ . The master key is not stored anywhere in the unencrypted form; instead, for each user there is a copy encrypted with his public key. Each user can decrypt the master key using his private key, and then use it to decrypt each file.

Understanding exactly how this works may take some study. But in general, the idea is this:

Secret data is stored in the repository, encrypted for the public keys of all the people you want to be able to decrypt it.
You can use the public key available to every user on github.
You can also use the Travis public key if continuous integration is used.
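The master-key scheme described above can be illustrated with a small sketch. It does not use the secure package's own functions or its exact algorithms; it swaps in sodium's key pairs to show the general pattern, assuming sodium is installed. The user names and the secret are placeholders.

```r
library(sodium)

# Each user has a key pair; only the public halves are shared.
alice_key <- keygen()
alice_pub <- pubkey(alice_key)
bob_key <- keygen()
bob_pub <- pubkey(bob_key)

# A random master key encrypts the secret itself.
master_key <- keygen()
vault_item <- data_encrypt(charToRaw("my password"), master_key)

# The master key is never stored in the clear: each user gets a copy
# encrypted with their own public key.
wrapped <- list(
  alice = simple_encrypt(master_key, alice_pub),
  bob   = simple_encrypt(master_key, bob_pub)
)

# Bob unwraps his copy with his private key, then decrypts the secret.
bob_master <- simple_decrypt(wrapped$bob, bob_key)
bob_secret <- rawToChar(data_decrypt(vault_item, bob_master))
```

Adding a user means encrypting one more copy of the master key; the stored secrets themselves do not need to be re-encrypted.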

Hadley provides step-by-step instructions on how to use the package in his github repository.

Source: https://habr.com/ru/post/275757/
