I have a simple data set. The row names are a meaningful index and column 1 has a list of values. What I eventually want is the average of that list for each row name.
What it looks like now:
| row name | years |
|---|---|
| 108457 | [1200, 1200, 1540, 1890] |
| 237021 | [1600, 1270, 1270] |
What I eventually want it to look like:
| row name | years |
|---|---|
| 108457 | mean of list |
| 237021 | mean of list |
Currently, I’m trying to use unnest_wider(years). My plan is to then afterwards use rowMeans() to find the mean of the unnested row. I can then merge the row name and average value with my main data set, so I’m not too concerned with deleting the new columns.
However, this whole process is taking a while and I’m having some issues with unnest_wider. Currently, when I try:
unnest_wider(dataset, colname)
I get the following error:
Error in
as_indices_impl():
! Must subset columns with a valid subscript vector.
âś– Subscript has the wrong typedata.frame<years:list>.
ℹ It must be numeric or character.
When I try:
unnest_wider(colname)
My computer just runs endlessly and it looks like it’s counting… it doesn’t stop and I have to quit the application to terminate processing.
I had previously tried to directly apply rowMeans, use mean(df$ColName), and use apply(ColName, mean).
I wonder if there’s a more efficient way?
It may be that I shouldn’t have created the list in the first place. It looks like it does now because I converted it from this format:
| Column A | Column B |
|---|---|
| 108457 | 1200 |
| 108457 | 1200 |
| 108457 | 1540 |
| 237021 | 1600 |
| 108457 | 1890 |
| 237021 | 1270 |
I converted it using pivot_wider and then as.data.frame.(t(dataset))
Should I have tried to get the averages directly from this format? If so, how would I do that?
>Solution :
If your original data look like they do in the latter table, you could simply find the mean by group based on ColumnA:
Data
df <- read.table(text = "ColumnA ColumnB
108457 1200
108457 1200
108457 1540
237021 1600
108457 1890
237021 1270", header = TRUE)
Base R
aggregate(df$ColumnB, list(df$ColumnA), FUN=mean)
# Group.1 x
# 1 108457 1457.5
# 2 237021 1435.0
Dplyr
library(dplyr)
df %>%
group_by(ColumnA) %>%
summarise(mean_years = mean(ColumnB))
# ColumnA mean_years
# <int> <dbl>
#1 108457 1458.
#2 237021 1435