The following loading methods are compared:

- read.csv from utils, the standard way to read csv files in R;
- read_csv from readr, which has replaced the previous method in RStudio;
- load and readRDS from base; and
- read_feather from feather and fread from data.table.

First, generate a test data frame with 1.5 million rows:

```r
set.seed(123)
# 10 integer columns of 1.5 million values drawn from 0:2000, plus 10 columns of
# random 5-character strings (1000 per column, recycled to fill all rows)
df <- data.frame(replicate(10, sample(0:2000, 15 * 10^5, replace = TRUE)),
                 replicate(10, stringi::stri_rand_strings(1000, 5)))
```

Besides csv, we also need feather, RDS and RData files:

```r
library(feather)
library(data.table)

path_csv     <- '../assets/data/fast_load/df.csv'
path_feather <- '../assets/data/fast_load/df.feather'
path_rdata   <- '../assets/data/fast_load/df.RData'
path_rds     <- '../assets/data/fast_load/df.rds'

# Write the same data frame in each format
write.csv(df, file = path_csv, row.names = FALSE)
write_feather(df, path_feather)
save(df, file = path_rdata)
saveRDS(df, path_rds)

# Compare the file sizes on disk, in megabytes
files <- c(path_csv, path_feather, path_rdata, path_rds)
info <- file.info(files)
info$size_mb <- info$size / (1024 * 1024)
print(subset(info, select = c("size_mb")))
##                                       size_mb
## ../assets/data/fast_load/df.csv     1780.3005
## ../assets/data/fast_load/df.feather 1145.2881
## ../assets/data/fast_load/df.RData    285.4836
## ../assets/data/fast_load/df.rds      285.4837
```

The csv and feather files take up much more disk space than RDS and RData: the csv file is about 6 times larger, and the feather file just over 4 times larger.

The microbenchmark package was used to compare reading times over 10 runs.
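Before turning to the reading benchmark, the size ratios quoted above can be checked directly. This is a minimal sketch in base R: the sizes are copied by hand from the file.info() output, not re-measured from the files.

```r
# Sizes in MB, copied from the file.info() output above (not re-measured)
size_mb <- c(csv = 1780.3005, feather = 1145.2881, RData = 285.4836, rds = 285.4837)

# How many times larger each file is than the RDS file
ratio_vs_rds <- round(size_mb / size_mb[["rds"]], 2)
print(ratio_vs_rds)
```

The csv file comes out at roughly 6.2 times the RDS size and the feather file at about 4.0 times, while the RData and RDS files are practically the same size.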
Benchmark of the reading methods:

```r
library(microbenchmark)

# Time each reading method 10 times on the same data
benchmark <- microbenchmark(
  readCSV     = utils::read.csv(path_csv),
  readrCSV    = readr::read_csv(path_csv, progress = FALSE),
  fread       = data.table::fread(path_csv, showProgress = FALSE),
  loadRdata   = base::load(path_rdata),
  readRds     = base::readRDS(path_rds),
  readFeather = feather::read_feather(path_feather),
  times = 10
)
print(benchmark, signif = 2)
## Unit: seconds
##        expr   min    lq       mean median    uq   max neval
##     readCSV 200.0 200.0 211.187125  210.0 220.0 240.0    10
##    readrCSV  27.0  28.0  29.770890   29.0  32.0  33.0    10
##       fread  15.0  16.0  17.250016   17.0  17.0  22.0    10
##   loadRdata   4.4   4.7   5.018918    4.8   5.5   5.9    10
##     readRds   4.6   4.7   5.053674    5.1   5.3   5.6    10
## readFeather   1.5   1.8   2.988021    3.4   3.6   4.1    10
```

Reading feather is the fastest! However, using feather requires converting the files to this format beforehand.

Using load or readRDS also improves performance (they come second and third in speed), and storing a small, compressed file is an added advantage. In both cases, you first need to convert your file to the appropriate format.

When reading from csv, fread significantly outperforms read_csv and read.csv, and is therefore the best option for reading csv files.

We chose the feather format: the conversion from csv was a one-time operation, and we had no strict limit on file size, so we did not consider RDS or RData. The resulting workflow:

- read the csv file provided by the customer using fread;
- convert it to feather via write_feather; and
- load the feather files when launching the application using read_feather.

Source: https://habr.com/ru/post/326616/
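The one-time conversion workflow described above can be sketched as a small caching helper. This is a hedged sketch, not the authors' code: load_cached() is a hypothetical name, and to keep the example runnable without extra packages it swaps in base R's read.csv(), saveRDS() and readRDS() for data.table::fread(), feather::write_feather() and feather::read_feather(). Replacing those three calls gives the csv-to-feather pipeline from the article.

```r
# Hypothetical helper: read a csv once, cache it in a faster binary format,
# and serve the cached copy on later calls.
# Assumption: base R functions stand in for data.table::fread(),
# feather::write_feather() and feather::read_feather().
load_cached <- function(csv_path, cache_path) {
  # The cache is stale if it does not exist or is older than the csv
  stale <- !file.exists(cache_path) ||
    file.mtime(cache_path) < file.mtime(csv_path)
  if (stale) {
    df <- utils::read.csv(csv_path, stringsAsFactors = FALSE)  # slow, one-time read
    saveRDS(df, cache_path)                                     # one-time conversion
  }
  readRDS(cache_path)                                           # fast load on every call
}

# Usage example with small temporary files
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), csv_path, row.names = FALSE)

first  <- load_cached(csv_path, rds_path)  # reads the csv and writes the cache
second <- load_cached(csv_path, rds_path)  # cache is fresh: reads only the rds
```

Comparing modification times with file.mtime() means the slow csv read happens only when a newer csv arrives; every other call pays only the cost of the fast binary read.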