Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

split values of a column and store only numeric values in a different column

I have a dataframe in long table format like this one below:

mydf <- data.frame(id = c(123, 123, 123, 123, 123),
                    name =c("test_2001", "test_2002", "test_2003", "test_2004", "test_2005"),
                    value = c(15, 20, 25, 30, 35))
mydf
#>    id      name value
#> 1 123 test_2001    15
#> 2 123 test_2002    20
#> 3 123 test_2003    25
#> 4 123 test_2004    30
#> 5 123 test_2005    35

Now, I want to split attributes from column name in such a way that I could remove those numeric values and store them in another column named year, output like this:

desired <- data.frame(id = c(123, 123, 123, 123, 123),
                      name =c("test", "test", "test", "test", "test"),
                      year =c(2001:2005),
                      value = c(15, 20, 25, 30, 35))
desired
#>    id name year value
#> 1 123 test 2001    15
#> 2 123 test 2002    20
#> 3 123 test 2003    25
#> 4 123 test 2004    30
#> 5 123 test 2005    35

How can I automate this step considering I have to apply this change to a lot of data?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

base R

cbind(mydf, strcapture("(.*)_(.*)", mydf$name, list(name="", year=1L)))
#    id      name value name year
# 1 123 test_2001    15 test 2001
# 2 123 test_2002    20 test 2002
# 3 123 test_2003    25 test 2003
# 4 123 test_2004    30 test 2004
# 5 123 test_2005    35 test 2005

dplyr/tidyr

library(dplyr)
library(tidyr)
mydf %>%
  separate(name, into = c("name", "year"), sep = "_")
#    id name year value
# 1 123 test 2001    15
# 2 123 test 2002    20
# 3 123 test 2003    25
# 4 123 test 2004    30
# 5 123 test 2005    35

(Note that with this, year is a string, can be converted to ints with as.integer.)

data.table

library(data.table)
as.data.table(mydf)[, c("name", "year") := tstrsplit(name, "_")][]
#       id   name value   year
#    <num> <char> <num> <char>
# 1:   123   test    15   2001
# 2:   123   test    20   2002
# 3:   123   test    25   2003
# 4:   123   test    30   2004
# 5:   123   test    35   2005

(Again, year here is still a string.)

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading