Methods compared:

read.csv from utils, the standard way to read csv files in R;
read_csv from readr, which replaced the previous method in RStudio;
load and readRDS from base;
read_feather from feather;
fread from data.table.

The test data frame:

    set.seed(123)
    # 10 integer columns of 1.5 million rows plus 10 string columns
    # (1000 random 5-character strings each, recycled to the full length)
    df <- data.frame(
      replicate(10, sample(0:2000, 15 * 10^5, replace = TRUE)),
      replicate(10, stringi::stri_rand_strings(1000, 5))
    )
In addition to csv, we still need files in the feather, RDS and RData formats.

    path_csv <- '../assets/data/fast_load/df.csv'
    path_feather <- '../assets/data/fast_load/df.feather'
    path_rdata <- '../assets/data/fast_load/df.RData'
    path_rds <- '../assets/data/fast_load/df.rds'

    library(feather)
    library(data.table)

    # Write the same data frame in all four formats
    write.csv(df, file = path_csv, row.names = FALSE)
    write_feather(df, path_feather)
    save(df, file = path_rdata)
    saveRDS(df, path_rds)
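The two base formats differ in how the data comes back: save/load store an object together with its name, while saveRDS/readRDS store a single object that the caller names on reading. A minimal sketch on a small data frame; small_df and the temporary paths are illustrative assumptions, not part of the article:

```r
# Small illustrative data frame (hypothetical; not the benchmark data).
small_df <- data.frame(id = 1:5, label = letters[1:5], stringsAsFactors = FALSE)

rdata_file <- tempfile(fileext = ".RData")
rds_file <- tempfile(fileext = ".rds")

save(small_df, file = rdata_file)  # stores the object together with its name
saveRDS(small_df, rds_file)        # stores only the object itself

# load() restores the object under its saved name into the given environment
# and returns the names of the restored objects.
env <- new.env()
restored_names <- load(rdata_file, envir = env)

# readRDS() returns the object, so the caller chooses the name.
from_rds <- readRDS(rds_file)

print(restored_names)
print(identical(env$small_df, from_rds))
```

Loading into a separate environment, as here, avoids silently overwriting an existing variable that happens to share the saved name.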
    files <- c('../assets/data/fast_load/df.csv',
               '../assets/data/fast_load/df.feather',
               '../assets/data/fast_load/df.RData',
               '../assets/data/fast_load/df.rds')
    info <- file.info(files)
    info$size_mb <- info$size / (1024 * 1024)
    print(subset(info, select = c("size_mb")))

    ##                                       size_mb
    ## ../assets/data/fast_load/df.csv     1780.3005
    ## ../assets/data/fast_load/df.feather 1145.2881
    ## ../assets/data/fast_load/df.RData    285.4836
    ## ../assets/data/fast_load/df.rds      285.4837
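To put the sizes above on a common scale, each one can be divided by the smallest. The sketch below only reuses the four numbers from the printed output; the vector name size_mb is chosen here for illustration:

```r
# File sizes in MB, copied from the output above.
size_mb <- c(csv = 1780.3005, feather = 1145.2881, RData = 285.4836, rds = 285.4837)

# Ratio of each file to the smallest one.
ratio <- size_mb / min(size_mb)
print(round(ratio, 1))
```

This gives roughly 6.2 for csv and 4.0 for feather, with RData and RDS at 1.0.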
Both csv and feather take up much more disk space: csv about 6 times more, and feather more than 4 times more, than RDS and RData.

The microbenchmark library was used to compare the reading time over 10 rounds. Methods:

    library(microbenchmark)
    benchmark <- microbenchmark(
      readCSV     = utils::read.csv(path_csv),
      readrCSV    = readr::read_csv(path_csv, progress = FALSE),
      fread       = data.table::fread(path_csv, showProgress = FALSE),
      loadRdata   = base::load(path_rdata),
      readRds     = base::readRDS(path_rds),
      readFeather = feather::read_feather(path_feather),
      times = 10
    )
    print(benchmark, signif = 2)

    ## Unit: seconds
    ##         expr   min    lq       mean median    uq   max neval
    ##      readCSV 200.0 200.0 211.187125  210.0 220.0 240.0    10
    ##     readrCSV  27.0  28.0  29.770890   29.0  32.0  33.0    10
    ##        fread  15.0  16.0  17.250016   17.0  17.0  22.0    10
    ##    loadRdata   4.4   4.7   5.018918    4.8   5.5   5.9    10
    ##      readRds   4.6   4.7   5.053674    5.1   5.3   5.6    10
    ##  readFeather   1.5   1.8   2.988021    3.4   3.6   4.1    10
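Before committing to the full benchmark, which takes minutes per round for read.csv, it can help to try the same idea on a scaled-down file. The sketch below is an assumption-laden stand-in, not the article's setup: it uses base system.time instead of microbenchmark so that it runs without extra packages, a hypothetical 20,000-row data frame, and a helper named time_reads that is introduced here for illustration only.

```r
# Scaled-down, hypothetical stand-in for the benchmark data (20,000 rows).
set.seed(123)
small_df <- data.frame(
  x = sample(0:2000, 2e4, replace = TRUE),
  y = sample(letters, 2e4, replace = TRUE),
  stringsAsFactors = FALSE
)

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")
write.csv(small_df, csv_file, row.names = FALSE)
saveRDS(small_df, rds_file)

# Time `times` reads of one file and return the median elapsed seconds.
time_reads <- function(read_fun, path, times = 10) {
  elapsed <- vapply(
    seq_len(times),
    function(i) system.time(read_fun(path))[["elapsed"]],
    numeric(1)
  )
  median(elapsed)
}

timings <- c(
  read.csv = time_reads(function(p) read.csv(p, stringsAsFactors = FALSE), csv_file),
  readRDS  = time_reads(readRDS, rds_file)
)
print(timings)
```

On a file this small the absolute numbers mean little; the point is only to check that every reader runs and returns the same shape of data before scaling up.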
The winner is feather! However, using feather requires converting the files into this format first.

Using load or readRDS can also improve performance (second and third fastest), and storing a small, compressed file is an additional advantage. In both cases, you first need to convert your file to the appropriate format.

For reading the csv format, fread significantly outperforms read_csv and read.csv and is therefore the best option for reading from a csv file.

In our case, we chose the feather file, since the conversion from csv to this format was a one-time step and we had no strict limit on file size, so we did not consider RData or RDS.

As a result, we read the csv file provided by the customer using fread, convert it to feather via write_feather, and load the feather files when launching the application using read_feather.

Source: https://habr.com/ru/post/326616/
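The final workflow (fread once, write_feather once, read_feather at every launch) could be wrapped in a small helper. This is a sketch under assumptions rather than the author's code: the function name load_via_feather and the staleness check based on file modification times are introduced here for illustration, and the demo only runs when data.table and feather are installed.

```r
# Hypothetical helper (not from the article): convert the csv to feather
# only when the feather copy is missing or older than the csv, then load
# the feather file.
load_via_feather <- function(csv_path, feather_path) {
  stale <- !file.exists(feather_path) ||
    file.mtime(feather_path) < file.mtime(csv_path)
  if (stale) {
    df <- data.table::fread(csv_path, showProgress = FALSE, data.table = FALSE)
    feather::write_feather(df, feather_path)
  }
  feather::read_feather(feather_path)
}

# Demo on a tiny temporary file; runs only if both packages are installed.
if (requireNamespace("data.table", quietly = TRUE) &&
    requireNamespace("feather", quietly = TRUE)) {
  csv_path <- tempfile(fileext = ".csv")
  feather_path <- tempfile(fileext = ".feather")
  write.csv(data.frame(id = 1:3, label = c("a", "b", "c")),
            csv_path, row.names = FALSE)

  first <- load_via_feather(csv_path, feather_path)   # converts, then reads
  second <- load_via_feather(csv_path, feather_path)  # reads the cached feather file
  print(nrow(second))
}
```

Keeping the conversion behind a modification-time check means a new csv from the customer is picked up automatically, while ordinary launches pay only for the fast feather read.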