The summarise() function is used with aggregation functions that take a vector of values as input and return a single value. The summarise_each() function offers a different approach to summarise() with the same results.

This article compares summarise() and summarise_each(), taking into account two factors that we can control: the number of variables being summarised and the number of functions applied to them. Each case is shown both with and without the group_by() option.

The examples use the mtcars data, converted to a tbl_df object. Compared with the standard data.frame object nothing changes, but a much better print method becomes available. Only the cyl, mpg and disp columns are kept.

library(dplyr)
mtcars <- mtcars %>% tbl_df() %>% select(cyl, mpg, disp)
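In recent dplyr releases tbl_df() is deprecated in favor of as_tibble(). The following is a minimal sketch of the same preparation step under that assumption (dplyr 1.0.0 or later). The name cars_tbl is illustrative only and is used here to avoid overwriting the built-in mtcars.

```r
library(dplyr)

# Convert the built-in data frame to a tibble and keep the three
# columns used throughout the article.
# cars_tbl is a hypothetical name, not one used in the original text.
cars_tbl <- mtcars %>%
  as_tibble() %>%
  select(cyl, mpg, disp)

# A tibble is still a data.frame, it only prints more compactly.
class(cars_tbl)
```

Any of the later examples should run unchanged on such an object, since both summarise() and group_by() accept tibbles.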
With one variable and one function, summarise() produces a simple result:

mtcars %>% summarise(mean_mpg = mean(mpg))
## Source: local data frame [1 x 1]
##
##   mean_mpg
##      (dbl)
## 1 20.09062

mtcars %>% group_by(cyl) %>% summarise(mean_mpg = mean(mpg))
## Source: local data frame [3 x 2]
##
##     cyl mean_mpg
##   (dbl)    (dbl)
## 1     4 26.66364
## 2     6 19.74286
## 3     8 15.10000
The summarise_each() function could also be used here, but it is the less reasonable choice from the point of view of code clarity:

mtcars %>% summarise_each(funs(mean), mean_mpg = mpg)
## Source: local data frame [1 x 1]
##
##   mean_mpg
##      (dbl)
## 1 20.09062

mtcars %>% group_by(cyl) %>% summarise_each(funs(mean), mean_mpg = mpg)
## Source: local data frame [3 x 2]
##
##     cyl mean_mpg
##   (dbl)    (dbl)
## 1     4 26.66364
## 2     6 19.74286
## 3     8 15.10000
When several functions are applied to one variable, both summarise() and summarise_each() can be used. The summarise() function has a more intuitive syntax:

mtcars %>% summarise(min_mpg = min(mpg), max_mpg = max(mpg))
## Source: local data frame [1 x 2]
##
##   min_mpg max_mpg
##     (dbl)   (dbl)
## 1    10.4    33.9

mtcars %>% group_by(cyl) %>% summarise(min_mpg = min(mpg), max_mpg = max(mpg))
## Source: local data frame [3 x 3]
##
##     cyl min_mpg max_mpg
##   (dbl)   (dbl)   (dbl)
## 1     4    21.4    33.9
## 2     6    17.8    21.4
## 3     8    10.4    19.2
With summarise(), each output name is set in the usual simple form, for example max_mpg = max(mpg). The summarise_each() function uses a more compact and neat syntax:

mtcars %>% summarise_each(funs(min, max), mpg)
## Source: local data frame [1 x 2]
##
##     min   max
##   (dbl) (dbl)
## 1  10.4  33.9

mtcars %>% group_by(cyl) %>% summarise_each(funs(min, max), mpg)
## Source: local data frame [3 x 3]
##
##     cyl   min   max
##   (dbl) (dbl) (dbl)
## 1     4  21.4  33.9
## 2     6  17.8  21.4
## 3     8  10.4  19.2
The output names are min and max. In this case, we lose the name of the variable to which the functions are applied. If you need something like min_mpg and max_mpg, you have to rename the functions inside funs():

mtcars %>% summarise_each(funs(min_mpg = min, max_mpg = max), mpg)
## Source: local data frame [1 x 2]
##
##   min_mpg max_mpg
##     (dbl)   (dbl)
## 1    10.4    33.9

mtcars %>% group_by(cyl) %>% summarise_each(funs(min_mpg = min, max_mpg = max), mpg)
## Source: local data frame [3 x 3]
##
##     cyl min_mpg max_mpg
##   (dbl)   (dbl)   (dbl)
## 1     4    21.4    33.9
## 2     6    17.8    21.4
## 3     8    10.4    19.2
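In dplyr 1.0.0 and later, summarise_each() and funs() are deprecated in favor of across(). The following is a minimal sketch of the same one-variable, two-function summary under that assumption. It is not part of the original comparison, and the result name res is illustrative only.

```r
library(dplyr)

# One variable (mpg), two functions (min and max), grouped by cyl.
# The named list supplies the function labels, and .names controls how
# each label is combined with the column name in the output.
res <- mtcars %>%
  group_by(cyl) %>%
  summarise(across(mpg, list(min = min, max = max),
                   .names = "{.fn}_{.col}"))

names(res)
```

Here the .names template plays the role that renaming inside funs() plays in the older interface.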
When one function is applied to several variables, both summarise() and summarise_each() can be used. The summarise() function again has a more intuitive syntax, and the names of the output variables can be set in the usual simple form, such as max_mpg = max(mpg):

mtcars %>% summarise(mean_mpg = mean(mpg), mean_disp = mean(disp))
## Source: local data frame [1 x 2]
##
##   mean_mpg mean_disp
##      (dbl)     (dbl)
## 1 20.09062  230.7219

mtcars %>% group_by(cyl) %>% summarise(mean_mpg = mean(mpg), mean_disp = mean(disp))
## Source: local data frame [3 x 3]
##
##     cyl mean_mpg mean_disp
##   (dbl)    (dbl)     (dbl)
## 1     4 26.66364  105.1364
## 2     6 19.74286  183.3143
## 3     8 15.10000  353.1000
The summarise_each() function uses a more compact and neat syntax:

mtcars %>% summarise_each(funs(mean), mpg, disp)
## Source: local data frame [1 x 2]
##
##        mpg     disp
##      (dbl)    (dbl)
## 1 20.09062 230.7219

mtcars %>% group_by(cyl) %>% summarise_each(funs(mean), mpg, disp)
## Source: local data frame [3 x 3]
##
##     cyl      mpg     disp
##   (dbl)    (dbl)    (dbl)
## 1     4 26.66364 105.1364
## 2     6 19.74286 183.3143
## 3     8 15.10000 353.1000
The output names are mpg and disp. In this case, we lose the name of the function applied to the variables, mean(). We would probably prefer something like mean_mpg and mean_disp. To achieve this, rename the variables passed to the "..." argument of summarise_each():

mtcars %>% summarise_each(funs(mean), mean_mpg = mpg, mean_disp = disp)
## Source: local data frame [1 x 2]
##
##   mean_mpg mean_disp
##      (dbl)     (dbl)
## 1 20.09062  230.7219

mtcars %>% group_by(cyl) %>% summarise_each(funs(mean), mean_mpg = mpg, mean_disp = disp)
## Source: local data frame [3 x 3]
##
##     cyl mean_mpg mean_disp
##   (dbl)    (dbl)     (dbl)
## 1     4 26.66364  105.1364
## 2     6 19.74286  183.3143
## 3     8 15.10000  353.1000
When several functions are applied to several variables, both functions, summarise() and summarise_each(), have their advantages. The summarise() function again has a more intuitive syntax, and the names of the output variables can be set in the usual simple form, such as max_mpg = max(mpg):

mtcars %>% summarise(min_mpg = min(mpg), min_disp = min(disp), max_mpg = max(mpg), max_disp = max(disp))
## Source: local data frame [1 x 4]
##
##   min_mpg min_disp max_mpg max_disp
##     (dbl)    (dbl)   (dbl)    (dbl)
## 1    10.4     71.1    33.9      472

mtcars %>% group_by(cyl) %>% summarise(min_mpg = min(mpg), min_disp = min(disp), max_mpg = max(mpg), max_disp = max(disp))
## Source: local data frame [3 x 5]
##
##     cyl min_mpg min_disp max_mpg max_disp
##   (dbl)   (dbl)    (dbl)   (dbl)    (dbl)
## 1     4    21.4     71.1    33.9    146.7
## 2     6    17.8    145.0    21.4    258.0
## 3     8    10.4    275.8    19.2    472.0
The summarise_each() function uses a more compact and neat syntax:

mtcars %>% summarise_each(funs(min, max), mpg, disp)
## Source: local data frame [1 x 4]
##
##   mpg_min disp_min mpg_max disp_max
##     (dbl)    (dbl)   (dbl)    (dbl)
## 1    10.4     71.1    33.9      472

mtcars %>% group_by(cyl) %>% summarise_each(funs(min, max), mpg, disp)
## Source: local data frame [3 x 5]
##
##     cyl mpg_min disp_min mpg_max disp_max
##   (dbl)   (dbl)    (dbl)   (dbl)    (dbl)
## 1     4    21.4     71.1    33.9    146.7
## 2     6    17.8    145.0    21.4    258.0
## 3     8    10.4    275.8    19.2    472.0
The output names have the form variable_function, i.e. mpg_min, disp_min, etc. The reverse form, function_variable, is not possible within the summarise_each() call itself. It can be obtained with a separate command:

mtcars %>% summarise_each(funs(min, max), mpg, disp) %>% setNames(c("min_mpg", "min_disp", "max_mpg", "max_disp"))
## Source: local data frame [1 x 4]
##
##   min_mpg min_disp max_mpg max_disp
##     (dbl)    (dbl)   (dbl)    (dbl)
## 1    10.4     71.1    33.9      472

mtcars %>% group_by(cyl) %>% summarise_each(funs(min, max), mpg, disp) %>% setNames(c("cyl", "min_mpg", "min_disp", "max_mpg", "max_disp"))
## Source: local data frame [3 x 5]
##
##     cyl min_mpg min_disp max_mpg max_disp
##   (dbl)   (dbl)    (dbl)   (dbl)    (dbl)
## 1     4    21.4     71.1    33.9    146.7
## 2     6    17.8    145.0    21.4    258.0
## 3     8    10.4    275.8    19.2    472.0
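Typing the new names by hand is easy to get wrong, especially for the grouping column. As a hedged alternative, the swap from variable_function to function_variable can be derived from the existing names with base R's sub(). The sketch below uses a hand-built data frame, res, as a stand-in for the grouped result above, so that it runs without dplyr. It assumes that function names contain no underscore.

```r
# Stand-in for the grouped summarise_each() result shown above;
# the values are copied from the article's output.
res <- data.frame(
  cyl      = c(4, 6, 8),
  mpg_min  = c(21.4, 17.8, 10.4),
  disp_min = c(71.1, 145.0, 275.8),
  mpg_max  = c(33.9, 21.4, 19.2),
  disp_max = c(146.7, 258.0, 472.0)
)

# Split each name at its last underscore and swap the two parts.
# Names without an underscore (here: cyl) do not match and stay as they are.
names(res) <- sub("^(.+)_(.+)$", "\\2_\\1", names(res))

names(res)
```

Because the pattern only touches names that contain an underscore, the grouping column keeps its name and does not have to be listed explicitly.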
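To sum up the comparison of summarise() and summarise_each(): the summarise() function has a simpler syntax, and the summarise_each() function has a more compact one. Thus summarise() is more suitable for one variable and a single function. The greater the number of variables or functions, the more justified is the use of summarise_each().

The summarise_each() function has its own way of naming output variables: function names when several functions are applied to one variable, variable names when one function is applied to several variables, and variable_function when several functions are applied to several variables. In that last case, no other naming is possible within the summarise_each() call.

Source: https://habr.com/ru/post/281747/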