Home Should I be using unnest_wider and rowMeans to get the average of a list column?

Questions

Should I be using unnest_wider and rowMeans to get the average of a list column?

November 29, 2022

I have a simple data set. The row names are a meaningful index and column 1 has a list of values. What I eventually want is the average of that list for each row name.

What it looks like now:

row name	years
108457	[1200, 1200, 1540, 1890]
237021	[1600, 1270, 1270]

What I eventually want it to look like:

row name	years
108457	mean of list
237021	mean of list

Currently, I’m trying to use unnest_wider(years). My plan is to then afterwards use rowMeans() to find the mean of the unnested row. I can then merge the row name and average value with my main data set, so I’m not too concerned with deleting the new columns.

However, this whole process is taking a while and I’m having some issues with unnest_wider. Currently, when I try:

unnest_wider(dataset, colname)

I get the following error:

Error in as_indices_impl():
! Must subset columns with a valid subscript vector.
✖ Subscript has the wrong type data.frame<years:list>.
ℹ It must be numeric or character.

When I try:

unnest_wider(colname)

My computer just runs endlessly and it looks like it’s counting… it doesn’t stop and I have to quit the application to terminate processing.

I had previously tried to directly apply rowMeans, use mean(df$ColName), and use apply(ColName, mean).

I wonder if there’s a more efficient way?

It may be that I shouldn’t have created the list in the first place. It looks like it does now because I converted it from this format:

Column A	Column B
108457	1200
108457	1200
108457	1540
237021	1600
108457	1890
237021	1270

I converted it using pivot_wider and then as.data.frame.(t(dataset))

Should I have tried to get the averages directly from this format? If so, how would I do that?

>Solution :

If your original data look like they do in the latter table, you could simply find the mean by group based on ColumnA:

Data

df <- read.table(text = "ColumnA    ColumnB
108457  1200
108457  1200
108457  1540
237021  1600
108457  1890
237021  1270", header = TRUE)

Base R

aggregate(df$ColumnB, list(df$ColumnA), FUN=mean) 

# Group.1      x
# 1  108457 1457.5
# 2  237021 1435.0

Dplyr

library(dplyr)
df %>% 
  group_by(ColumnA) %>%
  summarise(mean_years = mean(ColumnB))

#  ColumnA mean_years
#    <int>      <dbl>
#1  108457      1458.
#2  237021      1435