Data frame and summarizing

January 20, 2022

My dataset:

dt <- data.frame(GrossIncome = seq(0, 10000, by = 1000),
                 Turnover = seq(0, 100000, by = 10000),
                 Sellers = seq(0, 1, by = 0.1),
                 Buyers = seq(0, 1, by = 0.1))

Now I want to summarize this data, dividing the GrossIncome and Turnover totals by 1000.

OUTPUT <- data.frame(
  GrossIncome = round(sum(dt$GrossIncome) / 1000, 1),
  Turnover = round(sum(dt$Turnover) / 1000, 1),
  # Sum of both totals, divided by 1000
  GrossIncomeAndTurnover = round((sum(dt$GrossIncome) + sum(dt$Turnover)) / 1000, 1),
  Sellers = round(sum(dt$Sellers), 1),
  Buyers = round(sum(dt$Buyers), 1))

Output:

  GrossIncome Turnover GrossIncomeAndTurnover Sellers Buyers
1          55      550                    605     5.5    5.5

Is there a more elegant solution than the one above? I tried the code below, but it only handles the first two items (GrossIncome and Turnover) and not the rest.

dt %>%
  dplyr::select(GrossIncome, Turnover) %>%
  dplyr::summarise_all(sum, na.rm = TRUE) / 1000

Can anybody help me solve this problem?

>Solution :

We can use across() from dplyr inside summarize() to apply different functions to different groups of columns.

library(dplyr)

dt %>%
  summarize(
    across(c(GrossIncome, Turnover), ~ round(sum(.) / 1000, 1)),
    GrossIncomeAndTurnover = GrossIncome + Turnover,
    across(c(Sellers, Buyers), ~ round(sum(.), 1))
  )
#   GrossIncome Turnover GrossIncomeAndTurnover Sellers Buyers
# 1          55      550                    605     5.5    5.5

Note that summarize() evaluates its arguments in order, so the GrossIncome and Turnover summaries are computed first, and these newly created values (55 and 550) are what the GrossIncomeAndTurnover calculation sees. My code relies on this and simply adds them.
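
As a quick check of the point above, here is a minimal sketch that rebuilds the total from the raw columns and compares it with the value obtained by adding the already-summarized columns. It assumes dplyr 1.0 or later is installed (across() was introduced in that release); the names res and total_from_raw are only illustrative and not part of the question.

```r
library(dplyr)

dt <- data.frame(GrossIncome = seq(0, 10000, by = 1000),
                 Turnover = seq(0, 100000, by = 10000),
                 Sellers = seq(0, 1, by = 0.1),
                 Buyers = seq(0, 1, by = 0.1))

res <- dt %>%
  summarize(
    across(c(GrossIncome, Turnover), ~ round(sum(.x) / 1000, 1)),
    # From here on, GrossIncome and Turnover refer to the summarized values
    GrossIncomeAndTurnover = GrossIncome + Turnover,
    across(c(Sellers, Buyers), ~ round(sum(.x), 1))
  )

# The same total computed directly from the raw columns (illustrative name)
total_from_raw <- round((sum(dt$GrossIncome) + sum(dt$Turnover)) / 1000, 1)

# Compare the two approaches
all.equal(res$GrossIncomeAndTurnover, total_from_raw)
```

For this dataset the two values agree. In general they can differ slightly, because the summarize() version rounds each total before adding, while the direct version rounds once after adding.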