Home Collapsing multiple observations based on specific parameters in R

Questions

Collapsing multiple observations based on specific parameters in R

March 7, 2023

I am quite new to R. I have a dataset with 8081 observations for 113 variables. The data was collected in 4 waves (panels), with some individuals being interviewed multiple times. They were sometimes asked the same questions, but some questions were only asked during one wave. Most answers were on a scale (e.g. how much do you agree) or a binary. Each individual has a unique numeric ID so I know which rows to collapse. My dependent variable was only surveyed in wave 4.

The data looks something like this:

df <- data.frame (ID  = c(1, 1, 2, 2, 2, 2, 3, 4, 4), PANEL = c(1, 4, 1, 2, 3, 4, 2, 3, 4),
AGE = c(68, 68, 52, 52, 52, 52, 43, 33, 33), Q4 = c(2, 2, 1, 1, 1, 1, 2, 2, 1),
Q4_1 = c(2, 2, 1, 1, 1, 1, 2, 2, 1), Q4_1 = c(2, NA, NA, 3, NA, 3, 2, 3, NA),
Q5 = c(10, 10, 8, 9, 8, 7, 6, 6, 5))

df

  ID PANEL AGE Q4 Q4_1 Q4_1.1 Q5
1  1     1  68  2    2      2 10
2  1     4  68  2    2     NA 10
3  2     1  52  1    1     NA  8
4  2     2  52  1    1      3  9
5  2     3  52  1    1     NA  8
6  2     4  52  1    1      3  7
7  3     2  43  2    2      2  6
8  4     3  33  2    2      3  6
9  4     4  33  1    1     NA  5

etc...

Sometimes each individual’s answers are the same across waves, but not always. I do not need to know how the answers vary in time and the waves were relatively close in time. I just need a profile of each individual surveyed.

Ideally, for each individual I’d want to keep the answers given in wave 4 (if they took part in it), but substituting the NA answers with what they answered in previous waves. I was wondering if there is any way to do this without going through every individual one by one, given the amount of data. I’ll also have to remove data for individuals who did not take part in wave 4 at all.

If successful, the chunk of data above would end up looking something like this (+ I’ll remove the panel column later):

   ID  PANEL  AGE  Q4  Q4_1 Q5
1   1     4    68   2    2  10 
2   2     4    52   1    3   7 
3   4     4    33   1    3   5

etc...

I’ve been looking into dplyr’s summarise() function but it doesn’t seem like I can be that specific with what I need to merge and not merge. It wouldn’t be a problem if some of the answers were merged by getting an average of the individual’s responses across waves, but that would not work for the binary answers if the individual changed their mind in between waves.

>Solution :

You may be looking for tidyr::fill().. You can do the following on your provided example dataset:

df %>% group_by(ID) %>% fill(., starts_with("Q")) %>% filter(PANEL == 4)

Output:

  ID PANEL AGE Q4 Q4_1 Q4_1.1 Q5
1  1     4  68  2    2      2 10
2  2     4  52  1    1      3  7
3  4     4  33  1    1      3  5

summarize

byMR

Published March 07, 2023

Add a comment

Can not reshape 2 dimensional array to 3 dimensional array for RNN in python

byMR

March 7, 2023

Questions

How to select the columns of a dataframe based on a vector of strings, matching for exact coincidence?

byMR

March 7, 2023

Questions

In reactJS how to wait for one api call to execute and then call next api

byMR

March 7, 2023

Questions

Python – How to unpack a string?

byMR

March 7, 2023

Questions

How, with Ruby, can I access and compare these nested array values?

byMR

March 7, 2023

Questions

Use data from variable in another function

byMR

March 7, 2023

Collapsing multiple observations based on specific parameters in R

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

Can not reshape 2 dimensional array to 3 dimensional array for RNN in python

How to select the columns of a dataframe based on a vector of strings, matching for exact coincidence?

In reactJS how to wait for one api call to execute and then call next api

Python – How to unpack a string?

How, with Ruby, can I access and compare these nested array values?

Use data from variable in another function

Keep Up to Date with the Most Important News

Collapsing multiple observations based on specific parameters in R

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Can not reshape 2 dimensional array to 3 dimensional array for RNN in python

How to select the columns of a dataframe based on a vector of strings, matching for exact coincidence?

In reactJS how to wait for one api call to execute and then call next api

Python – How to unpack a string?

How, with Ruby, can I access and compare these nested array values?

Use data from variable in another function

Discover more from Dev solutions