Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Should I be using unnest_wider and rowMeans to get the average of a list column?

I have a simple data set. The row names are a meaningful index and column 1 has a list of values. What I eventually want is the average of that list for each row name.

What it looks like now:

row name years
108457 [1200, 1200, 1540, 1890]
237021 [1600, 1270, 1270]

What I eventually want it to look like:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

row name years
108457 mean of list
237021 mean of list

Currently, I’m trying to use unnest_wider(years). My plan is to then afterwards use rowMeans() to find the mean of the unnested row. I can then merge the row name and average value with my main data set, so I’m not too concerned with deleting the new columns.

However, this whole process is taking a while and I’m having some issues with unnest_wider. Currently, when I try:

unnest_wider(dataset, colname)

I get the following error:

Error in as_indices_impl():
! Must subset columns with a valid subscript vector.
âś– Subscript has the wrong type data.frame<years:list>.
ℹ It must be numeric or character.

When I try:

unnest_wider(colname)

My computer just runs endlessly and it looks like it’s counting… it doesn’t stop and I have to quit the application to terminate processing.

I had previously tried to directly apply rowMeans, use mean(df$ColName), and use apply(ColName, mean).

I wonder if there’s a more efficient way?

It may be that I shouldn’t have created the list in the first place. It looks like it does now because I converted it from this format:

Column A Column B
108457 1200
108457 1200
108457 1540
237021 1600
108457 1890
237021 1270

I converted it using pivot_wider and then as.data.frame.(t(dataset))

Should I have tried to get the averages directly from this format? If so, how would I do that?

>Solution :

If your original data look like they do in the latter table, you could simply find the mean by group based on ColumnA:

Data

df <- read.table(text = "ColumnA    ColumnB
108457  1200
108457  1200
108457  1540
237021  1600
108457  1890
237021  1270", header = TRUE)

Base R

aggregate(df$ColumnB, list(df$ColumnA), FUN=mean) 

# Group.1      x
# 1  108457 1457.5
# 2  237021 1435.0

Dplyr

library(dplyr)
df %>% 
  group_by(ColumnA) %>%
  summarise(mean_years = mean(ColumnB))

#  ColumnA mean_years
#    <int>      <dbl>
#1  108457      1458.
#2  237021      1435 
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading