In dplyr, summarise() is used with aggregate functions that take a vector of values as input and return a single value. The summarise_each() function offers a different approach to the same task and gives the same results. Below we compare summarise() and summarise_each(), taking into account two factors that we can control: the number of functions applied and the number of variables they are applied to. Each case is shown both without grouping and with the group_by() option.

We will use the mtcars data set. First, convert mtcars to a tbl_df object and keep only the columns we need. The data itself does not change (a tbl_df is still a data.frame), but a much more convenient print method becomes available:

    library(dplyr)

    mtcars <- mtcars %>% tbl_df() %>% select(cyl, mpg, disp)

With one function applied to one variable, summarise() produces a simple result:

    # without grouping
    mtcars %>% summarise(mean_mpg = mean(mpg))
    ## Source: local data frame [1 x 1]
    ##
    ##   mean_mpg
    ##      (dbl)
    ## 1 20.09062

    # grouped by cyl
    mtcars %>% group_by(cyl) %>% summarise(mean_mpg = mean(mpg))
    ## Source: local data frame [3 x 2]
    ##
    ##     cyl mean_mpg
    ##   (dbl)    (dbl)
    ## 1     4 26.66364
    ## 2     6 19.74286
    ## 3     8 15.10000

summarise_each() could also be used here, but it makes the code less clear:

    # without grouping
    mtcars %>% summarise_each(funs(mean), mean_mpg = mpg)
    ## Source: local data frame [1 x 1]
    ##
    ##   mean_mpg
    ##      (dbl)
    ## 1 20.09062

    # grouped by cyl
    mtcars %>% group_by(cyl) %>% summarise_each(funs(mean), mean_mpg = mpg)
    ## Source: local data frame [3 x 2]
    ##
    ##     cyl mean_mpg
    ##   (dbl)    (dbl)
    ## 1     4 26.66364
    ## 2     6 19.74286
    ## 3     8 15.10000

When several functions are applied to one variable, both summarise() and summarise_each() can be used. summarise() has the more intuitive syntax, and each output name is set in the familiar form max_mpg = max(mpg):

    # without grouping
    mtcars %>% summarise(min_mpg = min(mpg), max_mpg = max(mpg))
    ## Source: local data frame [1 x 2]
    ##
    ##   min_mpg max_mpg
    ##     (dbl)   (dbl)
    ## 1    10.4    33.9

    # grouped by cyl
    mtcars %>% group_by(cyl) %>% summarise(min_mpg = min(mpg), max_mpg = max(mpg))
    ## Source: local data frame [3 x 3]
    ##
    ##     cyl min_mpg max_mpg
    ##   (dbl)   (dbl)   (dbl)
    ## 1     4    21.4    33.9
    ## 2     6    17.8    21.4
    ## 3     8    10.4    19.2

summarise_each() has a more compact and neater syntax:

    # without grouping
    mtcars %>% summarise_each(funs(min, max), mpg)
    ## Source: local data frame [1 x 2]
    ##
    ##     min   max
    ##   (dbl) (dbl)
    ## 1  10.4  33.9

    # grouped by cyl
    mtcars %>% group_by(cyl) %>% summarise_each(funs(min, max), mpg)
    ## Source: local data frame [3 x 3]
    ##
    ##     cyl   min   max
    ##   (dbl) (dbl) (dbl)
    ## 1     4  21.4  33.9
    ## 2     6  17.8  21.4
    ## 3     8  10.4  19.2

The output columns are named after the functions, min and max, so the name of the variable they were applied to is lost. If you need names such as min_mpg and max_mpg, name the functions inside funs():

    # without grouping
    mtcars %>% summarise_each(funs(min_mpg = min, max_mpg = max), mpg)
    ## Source: local data frame [1 x 2]
    ##
    ##   min_mpg max_mpg
    ##     (dbl)   (dbl)
    ## 1    10.4    33.9

    # grouped by cyl
    mtcars %>% group_by(cyl) %>% summarise_each(funs(min_mpg = min, max_mpg = max), mpg)
    ## Source: local data frame [3 x 3]
    ##
    ##     cyl min_mpg max_mpg
    ##   (dbl)   (dbl)   (dbl)
    ## 1     4    21.4    33.9
    ## 2     6    17.8    21.4
    ## 3     8    10.4    19.2

When one function is applied to several variables, both summarise() and summarise_each() can again be used. summarise() again has the more intuitive syntax, and the output names are set in the usual name = value form:

    # without grouping
    mtcars %>% summarise(mean_mpg = mean(mpg), mean_disp = mean(disp))
    ## Source: local data frame [1 x 2]
    ##
    ##   mean_mpg mean_disp
    ##      (dbl)     (dbl)
    ## 1 20.09062  230.7219

    # grouped by cyl
    mtcars %>% group_by(cyl) %>% summarise(mean_mpg = mean(mpg), mean_disp = mean(disp))
    ## Source: local data frame [3 x 3]
    ##
    ##     cyl mean_mpg mean_disp
    ##   (dbl)    (dbl)     (dbl)
    ## 1     4 26.66364  105.1364
    ## 2     6 19.74286  183.3143
    ## 3     8 15.10000  353.1000

summarise_each() has a more compact and neater syntax:

    # without grouping
    mtcars %>% summarise_each(funs(mean), mpg, disp)
    ## Source: local data frame [1 x 2]
    ##
    ##        mpg     disp
    ##      (dbl)    (dbl)
    ## 1 20.09062 230.7219

    # grouped by cyl
    mtcars %>% group_by(cyl) %>% summarise_each(funs(mean), mpg, disp)
    ## Source: local data frame [3 x 3]
    ##
    ##     cyl      mpg     disp
    ##   (dbl)    (dbl)    (dbl)
    ## 1     4 26.66364 105.1364
    ## 2     6 19.74286 183.3143
    ## 3     8 15.10000 353.1000

The output columns are named mpg and disp, so the name of the applied function, mean(), is lost. You would probably prefer names such as mean_mpg and mean_disp.
To get these names, rename the variables passed to the "..." argument of summarise_each():

    # without grouping
    mtcars %>% summarise_each(funs(mean), mean_mpg = mpg, mean_disp = disp)
    ## Source: local data frame [1 x 2]
    ##
    ##   mean_mpg mean_disp
    ##      (dbl)     (dbl)
    ## 1 20.09062  230.7219

    # grouped by cyl
    mtcars %>% group_by(cyl) %>% summarise_each(funs(mean), mean_mpg = mpg, mean_disp = disp)
    ## Source: local data frame [3 x 3]
    ##
    ##     cyl mean_mpg mean_disp
    ##   (dbl)    (dbl)     (dbl)
    ## 1     4 26.66364  105.1364
    ## 2     6 19.74286  183.3143
    ## 3     8 15.10000  353.1000

When several functions are applied to several variables, both summarise() and summarise_each() have their advantages. summarise() again has the more intuitive syntax, and the output names are set in the usual name = value form:

    # without grouping
    mtcars %>% summarise(min_mpg = min(mpg), min_disp = min(disp),
                         max_mpg = max(mpg), max_disp = max(disp))
    ## Source: local data frame [1 x 4]
    ##
    ##   min_mpg min_disp max_mpg max_disp
    ##     (dbl)    (dbl)   (dbl)    (dbl)
    ## 1    10.4     71.1    33.9      472

    # grouped by cyl
    mtcars %>% group_by(cyl) %>%
      summarise(min_mpg = min(mpg), min_disp = min(disp),
                max_mpg = max(mpg), max_disp = max(disp))
    ## Source: local data frame [3 x 5]
    ##
    ##     cyl min_mpg min_disp max_mpg max_disp
    ##   (dbl)   (dbl)    (dbl)   (dbl)    (dbl)
    ## 1     4    21.4     71.1    33.9    146.7
    ## 2     6    17.8    145.0    21.4    258.0
    ## 3     8    10.4    275.8    19.2    472.0

summarise_each() has a more compact and neater syntax:

    # without grouping
    mtcars %>% summarise_each(funs(min, max), mpg, disp)
    ## Source: local data frame [1 x 4]
    ##
    ##   mpg_min disp_min mpg_max disp_max
    ##     (dbl)    (dbl)   (dbl)    (dbl)
    ## 1    10.4     71.1    33.9      472

    # grouped by cyl
    mtcars %>% group_by(cyl) %>% summarise_each(funs(min, max), mpg, disp)
    ## Source: local data frame [3 x 5]
    ##
    ##     cyl mpg_min disp_min mpg_max disp_max
    ##   (dbl)   (dbl)    (dbl)   (dbl)    (dbl)
    ## 1     4    21.4     71.1    33.9    146.7
    ## 2     6    17.8    145.0    21.4    258.0
    ## 3     8    10.4    275.8    19.2    472.0

Here the output names follow the pattern variable_function, i.e. mpg_min, disp_min, and so on. Names following the pattern function_variable (min_mpg, min_disp, ...) cannot be produced within the summarise_each() call itself; they require a separate step, for example with setNames():
    # without grouping
    mtcars %>% summarise_each(funs(min, max), mpg, disp) %>%
      setNames(c("min_mpg", "min_disp", "max_mpg", "max_disp"))
    ## Source: local data frame [1 x 4]
    ##
    ##   min_mpg min_disp max_mpg max_disp
    ##     (dbl)    (dbl)   (dbl)    (dbl)
    ## 1    10.4     71.1    33.9      472

    # grouped by cyl: the grouping column must be named too
    mtcars %>% group_by(cyl) %>% summarise_each(funs(min, max), mpg, disp) %>%
      setNames(c("cyl", "min_mpg", "min_disp", "max_mpg", "max_disp"))
    ## Source: local data frame [3 x 5]
    ##
    ##     cyl min_mpg min_disp max_mpg max_disp
    ##   (dbl)   (dbl)    (dbl)   (dbl)    (dbl)
    ## 1     4    21.4     71.1    33.9    146.7
    ## 2     6    17.8    145.0    21.4    258.0
    ## 3     8    10.4    275.8    19.2    472.0

To sum up, summarise() has the simpler syntax, while summarise_each() has the more compact one. summarise() is better suited to applying a single function to a single variable; the more variables or functions there are, the more justified summarise_each() becomes. summarise_each() also has its own way of naming output variables: with one function the columns take the variable names, with several functions and one variable they take the function names, and with several of each they follow the pattern variable_function. Any other naming is not possible within the summarise_each() call itself.

Source: https://habr.com/ru/post/281747/
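The setNames() step shown above hard-codes every output name, including the grouping column, so the vector has to be kept in sync by hand whenever the variables, the functions or the grouping change. As a minimal sketch of a less fragile alternative, the names can be rewritten with a regular expression in base R. swap_name_order() is a hypothetical helper introduced here for illustration, not part of dplyr, and it assumes the function names contain no regular-expression metacharacters.

```r
# Hypothetical helper, not part of dplyr: rewrite "variable_function" column
# names as "function_variable". Names that do not end in "_<function>"
# (such as the grouping column "cyl") are returned unchanged.
# Assumes the function names contain no regular-expression metacharacters.
swap_name_order <- function(nms, fun_names) {
  # e.g. "^(.+)_(min|max)$": group 1 is the variable, group 2 the function
  pattern <- paste0("^(.+)_(", paste(fun_names, collapse = "|"), ")$")
  sub(pattern, "\\2_\\1", nms)
}

# Column names as produced by summarise_each(funs(min, max), mpg, disp)
# after group_by(cyl)
old_names <- c("cyl", "mpg_min", "disp_min", "mpg_max", "disp_max")
new_names <- swap_name_order(old_names, c("min", "max"))
new_names
# new_names is c("cyl", "min_mpg", "min_disp", "max_mpg", "max_disp")

# With the dplyr version used in this article, the helper could replace the
# hard-coded vector passed to setNames(), e.g.:
#   mtcars %>% group_by(cyl) %>% summarise_each(funs(min, max), mpg, disp) %>%
#     setNames(swap_name_order(names(.), c("min", "max")))
```

Because the greedy (.+) group matches as much as possible before the final "_min" or "_max", variable names that themselves contain underscores are still split correctly.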
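In later dplyr releases, summarise_each() and funs() are deprecated. Assuming dplyr 1.0.0 or later, where across() is available, the grouped min/max summary of mpg and disp can be sketched with summarise() and across(), whose .names argument requests function_variable names directly, the naming that the summarise_each() call above could not produce on its own.

```r
# Assumes dplyr >= 1.0.0, where across() is available;
# summarise_each() and funs() are deprecated in these versions.
library(dplyr)

res <- mtcars %>%
  select(cyl, mpg, disp) %>%
  group_by(cyl) %>%
  # A named list of functions; .names builds each output column name
  # from the function name ({.fn}) and the column name ({.col}).
  summarise(across(c(mpg, disp), list(min = min, max = max),
                   .names = "{.fn}_{.col}"))

res
# Columns: cyl, min_mpg, max_mpg, min_disp, max_disp
```

across() loops over the columns first and the functions second, so the column order (min_mpg, max_mpg, min_disp, max_disp) differs from the one summarise_each() produces.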