How to divide a large dataset into smaller datasets by birth year using very few commands?

Suppose I have a dataset with people born in different years:

      ID year birth_year outcome
1  10021 2015       1960       1
2  10021 2016       1960       1
3  10021 2017       1960       1
4  10021 2018       1960       0
5  10021 2019       1960       0
6  10022 2015       1968       1
7  10022 2016       1968       0
8  10022 2017       1968       0
9  10022 2018       1968       0
10 10022 2019       1968       0
11 10023 2015       1968       1
12 10023 2016       1968       1
13 10023 2017       1968       1
14 10023 2018       1968       1
15 10023 2019       1968       1
16 10024 2015       1961       0
17 10024 2016       1961       0
18 10024 2017       1961       0
19 10024 2018       1961       1
20 10024 2019       1961       1

I want to split this dataset into smaller datasets according to birth year, and store them as year1960, year1961 and year1968. Specifically,

> year1960

      ID year birth_year outcome
1  10021 2015       1960       1
2  10021 2016       1960       1
3  10021 2017       1960       1
4  10021 2018       1960       0
5  10021 2019       1960       0

> year1961

     ID year birth_year outcome
1 10024 2015       1961       0
2 10024 2016       1961       0
3 10024 2017       1961       0
4 10024 2018       1961       1
5 10024 2019       1961       1

> year1968

      ID year birth_year outcome
1  10022 2015       1968       1
2  10022 2016       1968       0
3  10022 2017       1968       0
4  10022 2018       1968       0
5  10022 2019       1968       0
6  10023 2015       1968       1
7  10023 2016       1968       1
8  10023 2017       1968       1
9  10023 2018       1968       1
10 10023 2019       1968       1

How do I do this in the fewest steps possible?

Solution:

There are probably shorter or better ways to do this, but this will work, and you'll end up with an individual data frame for each birth year.

# read data
df <- read.csv('data.csv')

# split data by 'birth_year' into a list of data frames
df_split <- split(df, df$birth_year)

# rename elements of the list (e.g. '1960' becomes 'year1960')
names(df_split) <- paste0('year', names(df_split))

# create individual data frames in the global environment from the list
list2env(df_split, envir = .GlobalEnv)
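As a possible variation on the answer above, here is a minimal self-contained sketch that rebuilds the question's sample data inline instead of reading data.csv, and keeps the pieces in a named list. The name by_year and the row-name reset are assumptions added for illustration: split() keeps the original row names (for example 16 to 20 for birth year 1961), whereas the desired output in the question numbers each piece from 1.

```r
# Sample data from the question, built inline (stands in for data.csv)
df <- data.frame(
  ID         = rep(c(10021, 10022, 10023, 10024), each = 5),
  year       = rep(2015:2019, times = 4),
  birth_year = rep(c(1960, 1968, 1968, 1961), each = 5),
  outcome    = c(1, 1, 1, 0, 0,   # ID 10021
                 1, 0, 0, 0, 0,   # ID 10022
                 1, 1, 1, 1, 1,   # ID 10023
                 0, 0, 0, 1, 1)   # ID 10024
)

# Split into a list of data frames, one per birth year
# ('by_year' is a hypothetical name chosen for this sketch)
by_year <- split(df, df$birth_year)

# Prefix each element name with 'year'
names(by_year) <- paste0("year", names(by_year))

# Reset row names so every piece is numbered from 1, as in the question
by_year <- lapply(by_year, function(d) {
  rownames(d) <- NULL
  d
})

names(by_year)          # "year1960" "year1961" "year1968"
nrow(by_year$year1968)  # 10
```

Each piece is then available as by_year$year1960, by_year$year1961 and so on. Keeping them in a list makes it easy to loop over every birth year with lapply(); if separate objects are still wanted, list2env(by_year, envir = .GlobalEnv) can be applied to this list as well.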