Group data based on intervals and assign group to new column

I have the following data:

> dput(DF)
structure(list(NAME = c("Gait", "Roc", "Bo", "Hernd", 
"Bet", "Oln", "Gai", "Rock", "Mil", "Arli", "Re", "Fred", "Ro", 
"Rock", "Wheat", "Germa", "Rock", "Nort", "Arli", 
"Rockv"), AGE = c(33, 43, 37, 45, 44, 35, 22, 30, 
38, 23, 45, 43, 67, 43, 28, 47, 16, 29, 22, 31)), 
class = "data.frame", row.names = c(NA, -20L))

I want to group the data into specific intervals: the first group covers AGE 0-19, and the remaining groups are 10-year intervals (20-29, 30-39, etc.) up to the maximum AGE.

Desired output is:

    NAME AGE  GROUP
1   Gait  33  3
2    Roc  43  4
3     Bo  37  3
4  Hernd  45  4
5    Bet  44  4
6    Oln  35  3
7    Gai  22  2
8   Rock  30  3
9    Mil  38  3
10  Arli  23  2
11    Re  45  4
12  Fred  43  4
13    Ro  67  6
14  Rock  43  4
15 Wheat  28  2
16 Germa  47  4
17  Rock  16  1
18  Nort  29  2
19  Arli  22  2
20 Rockv  31  3

Please keep in mind that this is just a sample; the actual data is larger. My goal is to have one odd-sized interval for group 1, while all remaining groups span the same range of 10 years.

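Solution:

You can use cut() to assign each AGE to a group based on a vector of break points. With labels = FALSE, cut() returns integer group codes rather than factor labels.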

transform(DF, GROUP = cut(AGE, c(0, seq(19, max(AGE) + 10, 10)), labels = FALSE))

#    NAME AGE GROUP
#1   Gait  33     3
#2    Roc  43     4
#3     Bo  37     3
#4  Hernd  45     4
#5    Bet  44     4
#6    Oln  35     3
#7    Gai  22     2
#8   Rock  30     3
#9    Mil  38     3
#10  Arli  23     2
#11    Re  45     4
#12  Fred  43     4
#13    Ro  67     6
#14  Rock  43     4
#15 Wheat  28     2
#16 Germa  47     4
#17  Rock  16     1
#18  Nort  29     2
#19  Arli  22     2
#20 Rockv  31     3
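
To see which age range each group number stands for, one option is to call cut() without labels = FALSE and inspect the factor levels. The following is a minimal sketch, not part of the original answer: it writes the sample maximum (67) out by hand and uses a handful of ages from the sample so that it runs without DF.

```r
# Break points as in the answer, with the sample maximum (67) written out
# so the snippet runs on its own.
breaks <- c(0, seq(19, 67 + 10, 10))

# A few ages taken from the sample data.
ages <- c(16, 22, 33, 45, 67)

# Without labels = FALSE, cut() returns a factor whose levels name the intervals.
groups <- cut(ages, breaks)
levels(groups)
# "(0,19]" "(19,29]" "(29,39]" "(39,49]" "(49,59]" "(59,69]"

# The integer codes of that factor match the GROUP values shown above.
as.integer(groups)
# 1 2 3 4 6
```

The level names show that each interval is open on the left and closed on the right, so AGE 29 belongs to group 2 and AGE 30 to group 3.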

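The key part is the vector of break points, built with c() and seq(). By default cut() treats each interval as open on the left and closed on the right, so these breaks define the groups (0, 19], (19, 29], (29, 39], and so on. Adding 10 to max(AGE) guarantees that the last break lies at or above the largest age.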

c(0, seq(19, max(DF$AGE) + 10, 10))
#[1]  0 19 29 39 49 59 69
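
One edge case is worth checking on the larger data set: because the first interval is open on the left, an AGE of exactly 0 falls outside every interval and gets NA. The sketch below uses hypothetical edge-case ages (they are not in the sample data) and shows one possible way to handle this, the include.lowest argument of cut(), which closes the first interval on the left.

```r
# Same break points as above, with the sample maximum (67) written out.
breaks <- c(0, seq(19, 67 + 10, 10))

# Hypothetical boundary ages, chosen only to illustrate the interval edges.
edge_ages <- c(0, 19, 20)

# Default behaviour: intervals are (0,19], (19,29], ..., so 0 is not covered.
default_groups <- cut(edge_ages, breaks, labels = FALSE)
default_groups
# NA 1 2

# include.lowest = TRUE turns the first interval into [0,19].
closed_groups <- cut(edge_ages, breaks, labels = FALSE, include.lowest = TRUE)
closed_groups
# 1 1 2
```

If the real data can contain an AGE of 0, passing include.lowest = TRUE in the transform() call keeps those rows in group 1; for the sample data shown here the result is unchanged.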